October 13, 2023
At Timspark, we understand that staying at the forefront of software development and deployment processes is not merely a choice but our go-to strategy. We have gathered two specialists in the field to shed light on the future of DevOps tools and which solutions are genuinely effective. Our guest and developer advocate at Upbound, Viktor Francis, and Timspark’s leading DevOps expert, Mikhail Shayunov, bring a wealth of experience and share their thoughts on the tools that are shaping the future of software development, deployment, and operational excellence.
Let’s kick off with a burning question to warm up the conversation. It is our task in the current project, and we’ve had a lot of discussions today with the team. Running databases in Kubernetes — what are the pros and cons?
The best option is not to run databases at all and opt for managed ones. Just use a database service from your favorite cloud provider, like AWS, Google Cloud, or Azure. Whatever works. If you do run one yourself, Cloud Native Postgres is actually one of the best options, to my mind.
If you don’t have to run a database, simply do not bother with that. Yet, if you want to manage it yourself, Kubernetes becomes part of the conversation. Kubernetes is the baseline upon which all the cloud vendors build their next generation of projects. Now, there are downsides to running a database on Kubernetes. And the main ones are usually two.
First, databases are typically managed by database administrators. And they might not be sufficiently familiar with Kubernetes. And when I say familiar enough, I don’t mean, ‘Hey, I played with Kubernetes for a week or a couple of weeks.’ I mean, you need experience of running Kubernetes in production. So, the downside is that the person who will manage that database does not have production experience with Kubernetes itself.
The bigger downside is that many databases were never designed, rewritten, or redesigned to run in Kubernetes. Many of them have been running for 20 years on virtual machines or bare metal. We learned how to package them in a container image. We learned how to write a Dockerfile. Then we just run the same thing in Kubernetes… and we have a miserable failure. That does not work straightforwardly, because Kubernetes primitives do not have everything required to run a database. You need to create your own custom resource definitions and controllers; when I say you, I mean the vendor managing that database project.
So, it is really a huge problem that many databases were not adapted to run properly in Kubernetes. A good example of something that has been adapted is PostgreSQL. PostgreSQL has good controllers and good custom resource definitions, like Cloud Native Postgres, that allow people to manage it the way things are meant to be managed in Kubernetes. On the other hand, running an Oracle database on Kubernetes is just senseless, simply because, as far as I know, nobody ever bothered to design how it should run properly on Kubernetes.
Let’s talk about Argo CD workflows for Kubernetes. Can it replace previously popular CI/CD tools, in your opinion?
Absolutely not. It cannot come even close to replacing them. And not just previously popular CI/CD tools: it cannot replace CI/CD tools, period.
And the reason is very simple. Argo CD ensures that the data from Git is synchronized with the data in some Kubernetes clusters. So, it’s about synchronization, or what people in the past would call deployments.
Now, what is CI/CD? Continuous integration, or continuous delivery, or continuous deployment? It’s a whole process from the beginning to the end.
With CI/CD, which I prefer to call pipelines precisely because people are getting confused today, we build images, run tests, and perform security scanning, to name a few. We have dozens of different steps in a pipeline that our code has to pass before it gets to production. And one or two of them would be the deployment itself. So, the correct answer is that Argo CD can replace a part of the CI/CD process, which is currently performed by pipeline tools like Jenkins or GitHub Actions.
Yet, we can use Cloud CI or Jenkins for deploying our solution to Kubernetes without running any additional services.
Let’s say that you deployed your application right now, and you released it. And let’s say, for the sake of argument, that three hours from now the process fails. What happens after that? The actual state has changed compared to the desired state, and there is nothing that will reconcile the two. So the question to ask here is: do you want drift detection and reconciliation that repeat continuously? And how do you know the desired state, not only the actual one?
We never use just one tool; we use one tool to orchestrate pipelines. And that orchestration involves many, many different tools. You’re not building with Jenkins. You’re telling Jenkins to execute something, and I’m using Jenkins as an example. Pipelines orchestrate the whole process and use different tools to do it. From that perspective, we are not changing how our pipelines work. We are just orchestrating the execution of tasks differently. Instead of executing kubectl in the pipeline, we are pushing the changes to Git.
And apart from drift detection, reconciliation, and security, there is another benefit. As a human, I can check for myself what the desired state is. Many people say we can consider Git as the source of truth. I think that’s wrong because the source of truth is only your system and is never what you want. But Git becomes your source of information.
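The idea of a pipeline that orchestrates many steps and ends by pushing a change to Git, rather than touching the cluster itself, could be sketched roughly as follows. This is only an illustration: every step name, the registry address, and the commit message are hypothetical, and real steps would call real build, test, and scanning tools.

```python
from typing import Callable

# Every name below is hypothetical; real steps would call real tools.
Step = Callable[[dict], None]


def build_image(ctx: dict) -> None:
    ctx["image"] = f"registry.example.com/app:{ctx['commit']}"


def run_tests(ctx: dict) -> None:
    ctx["tests_passed"] = True


def scan_image(ctx: dict) -> None:
    ctx["scan_passed"] = True


def push_manifest_to_git(ctx: dict) -> None:
    # Instead of changing the cluster, record the new desired state in Git.
    ctx["git_commits"].append(f"set image to {ctx['image']}")


def run_pipeline(steps: list[Step], ctx: dict) -> list[str]:
    """Run each step once, in order, and return the names of the executed steps."""
    executed = []
    for step in steps:
        step(ctx)
        executed.append(step.__name__)
    return executed


context = {"commit": "abc123", "git_commits": []}
executed = run_pipeline(
    [build_image, run_tests, scan_image, push_manifest_to_git], context
)
print(executed)
print(context["git_commits"])
```

The pipeline still runs once per push and still orchestrates the same kind of work. Only the last step differs: it writes the desired state to Git and leaves the synchronization to a tool running inside the cluster.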
So why is Argo CD becoming so popular right now? Do you personally use it?
I use it all the time. I mean, I use both Argo CD and Flux, sometimes even kapp-controller. That way, I have certain guarantees that the state I have defined in Git is going to be the state of the actual system. But I don’t have that guarantee if I use pipelines. And the reason is relatively simple.
Pipelines perform one-shot actions, meaning when I push something to Git, certain actions will be performed, and they can be performed alongside other things. For instance, deployment with Helm. Once we execute these commands, we get the exit code. And once I get the exit code and then the notification, I do have a guarantee of the desired state from the pipelines. The problem is that those guarantees stop a second later. Whenever any change happens to my system, it starts drifting. The application might fail five seconds later and stop running, or a whole zone might go down. What GitOps tools do is continuous drift detection and reconciliation.
So with pipelines, I’m getting a guarantee that something will happen when I tell it to happen, while with GitOps tools I have a guarantee that the actual state will continuously be kept the same as the desired state. We can constantly monitor those two states, and if there is a drift, we will reconcile one with the other.
In some cases, you do want things to happen once; in other cases, you want something to happen continuously. For instance, running tests is typically performed with pipelines, as you run them once and get the results. I’m excluding the case of flaky tests that are randomly failing. But GitOps is a good thing for something that should be maintained continuously in your system. Your infrastructure, applications, and services should be in the desired state 100% of the time. And the only way to accomplish that is to ensure that drift is being searched for all the time.
Another reason why people like Argo CD, Flux, and other similar tools is security. When we use pipelines, we need to open access to our system so the pipeline can enter and change its state. If I’m using Jenkins or GitHub Actions, the only way to change the state of my Kubernetes cluster is by opening the port. Surely, you will have credentials securely stored, but realistically, when you open the port, you allow other people, tools, or processes to potentially enter and threaten the system.
However, with GitOps, no external entity comes to your cluster and modifies it. The processes are running inside that cluster. They are very efficient and do not need much memory or CPU.
So, apart from drift detection and reconciliation, which it was primarily built for, security is another reason for using Argo CD. It’s pulling information from Git while sitting in my system and performing changes to that very system.
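To make the difference concrete, here is a minimal sketch of drift detection and reconciliation. It is not how Argo CD or Flux are implemented; the state is modeled as a plain dictionary of hypothetical workload names and replica counts, and a real tool would compare Git manifests with live cluster resources on a timer.

```python
from dataclasses import dataclass

_MISSING = object()  # marks a resource that is absent from one of the states


@dataclass(frozen=True)
class Drift:
    name: str
    desired: object
    actual: object


def detect_drift(desired: dict, actual: dict) -> list[Drift]:
    """Return every resource whose actual value differs from the desired one."""
    names = sorted(set(desired) | set(actual))
    return [
        Drift(name, desired.get(name, _MISSING), actual.get(name, _MISSING))
        for name in names
        if desired.get(name, _MISSING) != actual.get(name, _MISSING)
    ]


def reconcile(desired: dict, actual: dict) -> list[Drift]:
    """Move the actual state toward the desired state and report what drifted."""
    drifts = detect_drift(desired, actual)
    for drift in drifts:
        if drift.desired is _MISSING:
            # Present in the system but not declared in Git: remove it.
            actual.pop(drift.name, None)
        else:
            actual[drift.name] = drift.desired
    return drifts


# Desired state as it might be declared in Git (hypothetical replica counts).
desired_state = {"frontend": 3, "backend": 2}
# Actual state right after a successful one-shot deployment.
actual_state = {"frontend": 3, "backend": 2}

# Later, something changes outside the pipeline: a workload crashes.
actual_state["backend"] = 0

# A GitOps agent would repeat this continuously; one iteration is shown here.
fixed = reconcile(desired_state, actual_state)
print([d.name for d in fixed])
print(actual_state == desired_state)
```

A pipeline would have exited successfully before the crash and never looked again. The loop notices the difference on its next iteration because it keeps comparing the two states.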
Let’s talk about AI for Kubernetes and whether you think it has potential now.
So, it depends on how we define potential. If you define potential as the future tense, AI will certainly play a crucially important role in the future. I have zero doubt about it. AI will completely change how we as humans operate on many different levels, and that includes Kubernetes.
But does it provide value today? I would say a little bit. And that will probably sound strange because if you Google or go to a conference, you will see AI everywhere. We are on a hype train right now. Everybody talks about AI. ‘Let’s do something with it! Joe, can you please set up something over the weekend?’ And in 99% of the cases, literally anybody can come up with a similar solution. Let’s say you create wrapper code that sends a message to the ChatGPT API, gets the message back, and then shows you the response on the screen. Those solutions can be replaced with cURL. To make it clear, I’m not saying the same thing about AI in general.
Regarding Kubernetes, there is a low impact and no investment in AI right now, particularly in this field. The majority of AI solutions for Kubernetes are not groundbreaking. They are basic and not secure. This happens for two interconnected reasons.
First, as an industry, we didn’t have enough time to come up with thorough solutions. Most people found out the importance of AI less than a year ago. Until last year, the buzzword for possible investment was ‘security’. Now, it’s about AI. Roughly speaking, these are mostly ad hoc solutions or even rapid prototyping, so you can put an AI sticker, market it, or probably get investment.
And second, the companies aspiring to come up with solutions for Kubernetes do not have enough ML or AI experts in their organization.
In terms of potential, the main area for AI must be management itself. What I want to see within the scope of Kubernetes is a tool to fix my problems. I don’t want the tool to depress me and tell me, ‘Hey, this is wrong.’ I already know it’s wrong. Can you please fix it for me? Sometime in the future, we will see AI intelligent enough to change your resources, scale them up and down to fix problems, and do whatever we as humans are doing, but done by a machine.
Are there any not-obvious but must-have and must-learn DevOps tools you would recommend delving into today? Can you name a few?
First tools are never good, so keep an eye on what comes next. I would say I’m fortunate. A significant part of my time is spent discovering things that almost nobody else knows about and figuring out what to do with them: testing them, researching, and delivering them to the public.
I strongly recommend Timoni. It is one of those tools that are relatively new; it started maybe half a year ago. With Timoni, one can define Kubernetes manifests using CUE as a configuration language.
Essentially, it replaces Helm, and personally, I consider Helm the worst possible option anybody can use to define Kubernetes manifests. And this is an ongoing problem: we never really thought about how we should manage Kubernetes manifests.
Another tool I really like and which is relatively new is Port. It’s a front-end part of what people would call an internal developer platform. Right now, we have Backstage as the most popular and widely adopted solution. Backstage is great, mainly because it shows us what we should do, but it fails miserably because it’s too complicated and tedious and requires too many people to operate. Port is providing that front-end part of the platform in a much more elegant way.
Mikhail: What tools do you use to improve security for your solutions on Kubernetes?
GitHub. I’m kidding, of course. First of all, many of the things I’m using are not even security tools, but more like practices. For instance, there are a lot of tools that scan your container images, which is great. You should be doing that for security purposes. But designing images right from the start is even more important than scanning them. And that brings me to Chainguard. What it essentially does is provide secure images with zero vulnerabilities.
And they provide images that are rebuilt daily in the first place. A base image might have zero vulnerabilities today but have some tomorrow, and Chainguard guards against this.
There are a couple of solutions coming. Komodor, for example, is doing scanning of the cluster itself. The tool has already been released, and to me, it is extremely interesting. It performs dynamic scanning rather than static scanning. This helps identify vulnerabilities more effectively.
Where scanning is concerned, my assumption is that tools will be moving to runtime scanning. They will get the context of something running in the system and then get the information about vulnerabilities. When they have the context, those tools can decide, within this context, whether something is truly a vulnerability affecting you or just a CVE that does not affect you.
This allows you to use kernel extensions, and by extending the kernel, you can, among many other things, contribute a lot to security simply because you are able to control kernel processes without disrupting the system.
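As a rough illustration of what deciding within the runtime context could mean, the sketch below splits the findings of a static scan by whether the affected package is actually loaded in the running workload. It does not describe any particular scanner: the package names are made up, the CVE identifiers are placeholders, and a real tool would collect the runtime information from the live system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str   # placeholder identifiers, not real CVEs
    package: str  # hypothetical package names


def split_by_runtime_context(findings, loaded_packages):
    """Separate findings in packages that are loaded at runtime from the rest."""
    reachable = [f for f in findings if f.package in loaded_packages]
    dormant = [f for f in findings if f.package not in loaded_packages]
    return reachable, dormant


# What a static image scan might report.
static_scan = [
    Finding("CVE-EXAMPLE-0001", "libfoo"),
    Finding("CVE-EXAMPLE-0002", "libbar"),
    Finding("CVE-EXAMPLE-0003", "libbaz"),
]
# What is observed to be loaded by the running process (assumed input).
loaded_at_runtime = {"libfoo", "libqux"}

reachable, dormant = split_by_runtime_context(static_scan, loaded_at_runtime)
print("affecting the running workload:", [f.cve_id for f in reachable])
print("present in the image only:", [f.cve_id for f in dormant])
```

The static list stays the same; the runtime context only changes which entries deserve attention first.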
Thank you so much. All worth checking out. And what about backup and disaster recovery for Kubernetes? Do you know any tools for this?
There are two things you need to think about: restoring the data and restoring the definition of the system. By data, I mean databases. Disaster recovery of databases in Kubernetes does not differ significantly from restoring databases running elsewhere. You have a database and a specific way to back it up.
For everything else that is not data, and now I’m not joking, the answer is GitOps. If you have accurately defined the desired state of the system in Git, and let’s say I destroy the whole cluster, I can create a new cluster and restore everything from Git.
The only exception for using GitOps is data. And, of course, we should be running data in at least three data centers that are geographically close to each other but with separate electricity, network, and everything. That way, we can spread the cluster across three zones in three data centers, and the chances of recovery are higher.
And does it make sense to use Kubernetes Cluster Federation for high availability if you want to run your application in different data centers that are not geographically collocated?
Yes, in this case, you need multiple clusters, with or without Cluster Federation. Only a very small number of companies truly need it, though: companies like Adobe or Netflix. But for the vast majority of companies, I would say ‘no.’ Simply because it doesn’t work well enough, and it proves to be more trouble than it is worth. We still do not have a decent way to spread a Kubernetes cluster across multiple regions. And by regions, I mean data centers thousands of kilometers apart.
So, in this case, we can say that the Kubernetes cluster is not for huge architecture and huge infrastructure projects.
Quite the opposite. Kubernetes came from Borg, and there is nothing bigger than Borg. Massive companies are using Kubernetes; we just have challenges running data across different continents.
And that has nothing to do with Kubernetes. The reason why we do not deploy the same Kubernetes cluster across nodes in multiple regions is latency. And latency is a problem of physics: nothing can travel faster than the speed of light.
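The physics can be put into numbers with a short calculation. The sketch below computes only the theoretical lower bound on round-trip time for a straight path. The fiber factor of about two thirds of the speed of light is an assumption based on the typical refractive index of optical fiber, the two distances are arbitrary examples, and real networks add routing and processing delays on top.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in vacuum
FIBER_FACTOR = 2 / 3  # assumption: light in optical fiber moves at roughly 2/3 of c


def min_round_trip_ms(distance_km: float, medium_factor: float = 1.0) -> float:
    """Lower bound on round-trip time over a straight path of the given length."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * medium_factor)
    return 2 * one_way_s * 1000


# Nearby zones versus regions thousands of kilometers apart (example distances).
for distance_km in (50, 5_000):
    vacuum = min_round_trip_ms(distance_km)
    fiber = min_round_trip_ms(distance_km, FIBER_FACTOR)
    print(f"{distance_km:>5} km: >= {vacuum:.2f} ms in vacuum, >= {fiber:.2f} ms in fiber")
```

For data centers 5,000 km apart, the bound is roughly 33 ms per round trip in vacuum and about 50 ms in fiber, before any real-world overhead. For zones 50 km apart, it stays well under a millisecond, which is why spreading one cluster across nearby zones is workable while spreading it across continents is not.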
Let’s create a list of tools and services that we should provide for the support team in production so that they do not need to connect to the cluster.
Essentially, you need two things. First, you need to be able to define the desired state of your deployment services. The most commonly used way to do that is GitOps tools, Argo CD, Flux, or kapp-controller. You’re defining whatever you want in Git, and they will get the information synchronized.
And the second thing you need is observability-related tooling. I’m using observability as a very generic term, meaning whatever you need to observe in any form or way. This applies equally even if you use kapp-controller. You need metrics, logs, and traces. Some tools, like Elasticsearch, are able to store all three. The problem is that these are very different types of data. If you try to store them in one place, you will not be able to use specialized tools, only generic ones like Elasticsearch. And those become very inefficient, using much more memory and CPU and being much slower. The alternative is to use separate tools for each of the three: Prometheus for metrics, Loki for logs (I personally prefer it), and Grafana Tempo or Jaeger for traces.
We have never had that before. Now we finally have a standard, so the tools do not matter anymore. And that’s awesome.
Feel free to drop us a line to schedule a face-to-face meeting or look for us at the event.
About the speakers
Viktor Francis: Developer Advocate at Upbound, a member of the Google Developer Experts, GitHub Stars, and CD Foundation groups, and a published author. The host of the DevOps Toolkit YouTube channel and a co-host of the DevOps Paradox podcast. His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD), and Test-Driven Development (TDD).
Mikhail Shayunov: DevOps expert at Timspark with 17+ years of experience in system administration and security infrastructure development and 10+ years of in-depth experience designing, implementing, and scaling highly efficient technical environments for banking IT systems and technologies.