Conf42 Platform Engineering 2024 - Online

- premiere 5PM GMT

How to Build Cloud-Native Platforms with Kubernetes


Abstract

Platform Engineering and SRE focus on empowering developers and organizations by creating and maintaining internal software products known as platforms. In this talk, we will explore what platforms are and why they are important, and uncover best practices for well-architected platforms.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Piotr Zaniewski, and I'm Head of Engineering Enablement at Loft Labs. Today I will talk to you about architecting developer platforms, focusing specifically on infrastructure developer platforms. We will talk about the design principles and tools used to build them. This will be heavily Kubernetes-centric, and I also have a really cool demo.

A little bit about me: I specialize in the cloud native ecosystem, meaning Kubernetes, Docker, Linux, and so on. I typically spend way more free time than I should tweaking my dotfiles and working with Neovim, but it's a lot of fun. As I mentioned, I work at Loft Labs, so go ahead and check them out if you would like to work with vCluster or other cool projects. The easiest way to contact me is through my web page, cloudrumble.net, or simply send me an invite on LinkedIn.

Before we jump into building and architecting the platform, let's talk about what a platform actually is. I like to do this by contrasting it with, or putting it in the context of, existing knowledge. For me, there are three kinds of platforms. The first is a business platform. If you have used an application like Uber or DoorDash, those are really platforms for consumers: they enable consumers and vendors to connect and sell their services. That kind of platform is essentially a product consumed by the end user. The second type is the domain-specific platform. These are typically services that encapsulate cross-cutting functionality for user-facing services. For example, if an application has map functionality, you might move features like geofencing and address translation into a separate service. That service is your platform, which your developers can use. Finally, the third kind is the domain-agnostic platform, and that is the one we are going to talk about today.
Domain-agnostic platforms are really building blocks that provide essential tools. They can be infrastructure-focused, like the one we'll talk about today, or they can cover something else outside your business domain, such as security. All right, with these definitions out of the way, let's talk about why this is useful. What's the point of building platforms? Imagine somebody already has access to a cloud like AWS or Azure. Why do they need something on top of that? The simple answer is that cloud infrastructure, and infrastructure in general, is built for everyone. It is not built specifically for your developers: every Azure customer has to be able to find what they need among its services, so a cloud offers far more than your developers need. Thus, we need to simplify resource management, isolate only the features that are useful and important, and actually empower our developers rather than expose everything.

This in turn increases development efficiency, because developers don't have to spend time learning Kubernetes or all kinds of complex cloud services; they can rely on the internal platform to provide what they need. Platforms are also typically built with scalability in mind, so they are scalable and reliable from the ground up. Clouds, of course, already have this, but when you design your own platform, it's probably a concern you will think about at the very beginning. So what are the building blocks of a typical cloud native developer platform? This is not an exhaustive list, but these are the ones you would see in most modern platforms. First, you have some form of self-service portal. This portal can be a web interface, like we will see in the demo, but it can also be a script, a program, or an API exposed directly at an endpoint.
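The argument above, that a raw cloud exposes far more than developers need, can be illustrated with a minimal Python sketch of a narrow platform API wrapped around a broad one. RawStorageRequest, request_storage, and every default value are hypothetical illustrations, not part of any real cloud SDK or of the demo.

```python
from dataclasses import dataclass, field


@dataclass
class RawStorageRequest:
    """Hypothetical stand-in for a broad cloud API with many knobs."""
    name: str
    region: str
    sku: str
    replication: str
    access_tier: str
    public_network_access: bool
    tags: dict = field(default_factory=dict)


# Opinionated choices made once by the platform team (hypothetical values).
PLATFORM_DEFAULTS = {
    "region": "westeurope",
    "sku": "Standard",
    "replication": "LRS",
    "access_tier": "Hot",
    "public_network_access": False,
}


def request_storage(team: str, name: str) -> RawStorageRequest:
    """Narrow platform API: developers supply only a team and a short name."""
    if not (name.isalnum() and name.islower()):
        raise ValueError("name must be lowercase letters and digits")
    return RawStorageRequest(
        name=f"{team}{name}",
        tags={"team": team, "managed-by": "platform"},
        **PLATFORM_DEFAULTS,
    )
```

A developer calls request_storage("devops", "blobs"); every other knob stays under the platform team's control and can change centrally without touching developer code.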
At the end of the day, a self-service portal encapsulates APIs and makes it simpler to access services and execute repeatable tasks like provisioning a service, running a test, or creating an ephemeral testing environment. The second point is really the key one, and I would like to emphasize it in this talk: platforms are an additional abstraction layer over existing APIs, and they provide programmatic APIs for developers to interact with. Again, a self-service portal is one way of doing this, but you can also access the APIs directly if you need to.

Third, automated workflows. Think of GitOps and similar principles. Platforms by design use these to automate the flow of data through the pipeline of whatever tasks developers need to execute. A developer provisions a service, and behind the scenes, tools like Argo CD or other GitOps tooling automate the whole provisioning process. Fourth, monitoring and observability. This is not platform-specific, but it's very important. Especially in the age of distributed systems, monitoring and observability, the ability to roll back, and techniques like canary deployments are critical. Baking them into the platform is something we see every time, and it is a very important building block. Fifth, security and governance controls. I would also bundle isolation here. If your platform needs to isolate workloads between tenants, you might use tools like vCluster internally to achieve multi-tenancy. You might use projects such as Falco to harden your security and provide an end product that is secure in a way that meets your compliance rules. This also means providing audit trails and similar features. These are things you don't necessarily see as a developer, but they are important building blocks. And finally, a platform is ever evolving. A platform is a product.
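The idea of a platform as an abstraction layer whose requests flow through a pipeline of tasks can be sketched as a chain of small stages: a developer-facing request is validated, enriched with platform defaults, and translated into an outbound action. This is a minimal Python illustration with assumed field names (app, replicas, port) and hypothetical stage functions; it is not how Port, Argo CD, or Crossplane are implemented.

```python
from typing import Callable

# Each stage takes a request dict and returns a new dict.
Stage = Callable[[dict], dict]


def validate(req: dict) -> dict:
    """Reject requests that break the contract (hypothetical rule)."""
    if "app" not in req:
        raise ValueError("missing required field: app")
    return req


def apply_defaults(req: dict) -> dict:
    """Fill in platform defaults; values supplied by the caller win."""
    return {"replicas": 1, "port": 8080, **req}


def to_outbound(req: dict) -> dict:
    """Translate the developer-facing request into a downstream action."""
    return {"action": "deploy", "target": "kubernetes", "payload": req}


def run_pipeline(req: dict, stages: list[Stage]) -> dict:
    """Feed the request through each stage in order."""
    for stage in stages:
        req = stage(req)
    return req


PIPELINE: list[Stage] = [validate, apply_defaults, to_outbound]
```

Keeping each stage a pure function makes it easy to add, say, a policy check between validation and defaults without changing the developer-facing input.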
As a product, a platform will continuously evolve with your users. You need an organizational structure around the platform that controls its improvement, its evolution, and its features. Those are the six building blocks you should definitely see in any successful platform.

Now, best practices for successful platforms. As I mentioned earlier, being API-driven is the key point. If you look at AWS, Azure, Google Cloud, or any other successful cloud platform, they provide access to their services through APIs. You might, of course, consume them through a UI, but underneath there is an API driving your actions in the cloud: you can create a cloud service, manipulate it, and do all kinds of things through the API. I think having an API-driven mindset when building your platform is the key, and it sets you on the path to success. Next, cloud native principles. This means designing the platform to be friendly to cloud native applications: containerization, orchestration, and leveraging the various cloud native projects in that space. And as I mentioned earlier, GitOps workflows, automated rollbacks, canary releases, A/B testing, and following GitOps principles one way or another will help ensure the success of your platform.

What is an API? Let's do a quick refresher. API stands for application programming interface, and it is a set of rules and protocols for building and interacting with software. I emphasize two words here, interface and interacting, because this is typically how the logic is driven. Design your platform by asking how your developers will interface with the set of services and APIs you provide, and how those APIs will interact with existing software. That gives you a kind of pipeline. At one end, you have developer teams using whatever interface is comfortable for them.
At the other end, you have outbound processes that interact with various software, say Argo CD, or that deploy something to Kubernetes. Having this mindset of an API pipeline really helps you design a very healthy platform. So why does it all matter? Why do we want a platform, and why do we want to follow an API-driven design? First of all, it simplifies a lot of things: you can create an abstraction that corresponds exactly to your developers' needs. You can build on existing standards such as HTTP and gRPC instead of inventing new communication protocols. It helps automate all kinds of tasks at various levels and layers of automation. And finally, APIs scale very well, and you can also secure them relatively easily.

A brief reminder: I want to repeat that a platform is a product. This simplistic diagram shows that designing a platform is no different from designing any other product. You want to understand your customers' needs, design something that fulfills a need or two, test it, implement it, deliver it to your customers, gather feedback, improve on it, rinse and repeat. The platform should be approached in the same way.

Let's transition to a demo. During this demo, I want you to think of the principles we just discussed and see how they translate to an actual platform: how the platform empowers developers, uses cloud native principles, is API-driven, and is backed by a GitOps workflow. I am going to wear different hats. I'm starting as a developer, and what's on the screen is my developer world. This is my application, called Azure Storage Blob Reader. Its task is to read the content of Azure Blob Storage and display it on the screen. I might not know much about Kubernetes or cloud services.
My knowledge of containerization ends with this very simple Dockerfile right here. Maybe somebody from the platform team helped me create it, but that's about it. My application is a Node.js app; we don't need to go into the code in detail. I just want to show you that, as a developer, this is where I spend my time. I create new features every day, I run tests, I make sure everything works, and I use cloud native and cloud services in conjunction with my application. Most importantly, this is where I feel most comfortable. I don't need to deal with all the cloud complexity.

All right, that's the developer. Now I would like to deploy this application and test it, to see if everything works correctly. How do I do it? Remember from our principles, we are using a self-service portal. For this demo, I am using Port. Unfortunately, Port still doesn't have dark mode, apologies for that; I tried to keep everything in dark mode for the presentation. Regardless, we can see, as developers, that we have two actions on our home page: one, called Crossplane storage account reader, creates the environment, and another one removes it. This is the extent of the knowledge I need in order to create my infrastructure. Let's kick it off. When I hit create, it asks me for just three variables that I might be interested in. One is the connection string, and how I want to name it. Another is my image; maybe I want to iterate on new versions of the image, so I can bump it. The third is the application port. I am happy to accept the defaults. Behind the scenes, Port kicks off a CI/CD pipeline, which in turn creates a PR that adds the necessary file. This step is really just for convenience: we could have created the file manually, but we did it through Port. As developers, we interact with Port, or with Backstage, and create various things.
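The file that Port's pipeline commits can be approximated with a small renderer that merges the three form inputs over defaults and produces the claim. This minimal Python sketch assumes hypothetical defaults, a hypothetical API group (platform.example.org/v1alpha1), and a hypothetical team namespace (devops-team); only the three inputs, the AppClaim kind, the hard-coded namespace, and the parameters living under spec come from the talk, and the real manifest in the demo repository may differ.

```python
import json
import re

# Hypothetical defaults for the three form inputs described in the talk.
DEFAULTS = {
    "connection_secret": "storage-connection",
    "image": "registry.example.com/blob-reader:0.1.0",
    "port": 3000,
}
TEAM_NAMESPACE = "devops-team"  # hard-coded per team; not a form field
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def render_claim(app_name: str, **inputs) -> str:
    """Merge form inputs over defaults, validate them, and render the claim."""
    unknown = set(inputs) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    params = {**DEFAULTS, **inputs}
    if len(app_name) > 63 or not DNS_LABEL.match(app_name):
        raise ValueError("app name must be a DNS-1123 label")
    port = int(params["port"])
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    claim = {
        "apiVersion": "platform.example.org/v1alpha1",  # hypothetical group
        "kind": "AppClaim",
        "metadata": {"name": app_name, "labels": {"team": TEAM_NAMESPACE}},
        "spec": {
            "namespace": TEAM_NAMESPACE,
            "parameters": {
                "connectionSecret": params["connection_secret"],
                "image": params["image"],
                "port": port,
            },
        },
    }
    # JSON is also valid YAML, so this text can be committed as a .yaml file.
    return json.dumps(claim, indent=2)
```

Validating in the renderer means a bad port or name fails in the CI job, before a PR is ever opened for review.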
Let's see what's happening. If we refresh, in a moment we should see a pull request created by Port. I don't know why it doesn't auto-refresh yet; maybe that's a good feature request. Now we have a pull request into a repository called apps deployment. Let's click on our pull request. We can see some details. Again, this is from a developer's point of view: I can interact with it, read various things, and see logs, runs, and so on. That's not important at this point. But here I have a link to my repository, apps deployment. This repository is specifically designed to deploy my application.

Now I change hats and become the platform team, or maybe an admin, and I see that there's a new PR on the deployment repository. By the way, this review is obviously not mandatory; you can skip it, but I want to show you that it's possible. So what does this PR do? It creates a single YAML file. We talked about APIs, and this is the API in action. The API we all agreed on is a Kubernetes-style API: both platform engineers and developers agree that the way we talk to each other, and the way we make things happen, is by standardizing on the Kubernetes API. Why the Kubernetes API? There are various reasons: Kubernetes already exists and has a strong ecosystem, Kubernetes lends itself very well to designing custom APIs using CRDs, and the list goes on. So what does this file look like? It has an API version, which you will recognize if you're familiar with Kubernetes, and a kind defined by a custom resource definition: this is an AppClaim. It has a name and labels, and it also has a spec, following the Kubernetes API design of spec and status. Within the spec are the parameters you might remember specifying in Port when we triggered the action. We also have a namespace, which we couldn't specify.
This namespace is hard-coded for our team, and then we have those parameters. This simple YAML file is all that's necessary to create our application, its associated cloud infrastructure, and other things. Let me show you one more thing before we move on: I'm going to launch Argo CD. I have a handy script for this, so I just type "just launch argo", and you will see that inside Argo CD not much is happening yet. Let me log in real quick. We have a simple bootstrap application, and this Argo CD app observes the apps deployment repository: the one we saw a moment ago, and the one we opened the PR to. For now, the app is empty; nothing happens. So far so good.

Let's approve the PR. Let's pretend I reviewed it, and merge it right now. When we merge the PR, Argo CD will pick it up in a moment. I'll help Argo CD by refreshing the screen, and things start happening. We see a lot of new resources being created, just from the small configuration YAML that I submitted through the API. We have the AppClaim, which is exactly the YAML we saw earlier, plus some annotations and additional fields that Kubernetes added. You can still recognize the spec, our parameters, and the kind, AppClaim. So we have applied it to the cluster. How come we have all those things here? We'll talk about that later; let's go back to being a developer. Remember, I went to Port and clicked create. How do I know when my app is ready? Maybe I went to grab a coffee, but my goal is really to test my application. What better way than a Slack notification? As you can see, I have an app notify channel that I might be subscribed to, and a notification telling me my application is deployed to localhost. Let's give it a go.
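The talk doesn't show how the Slack notification is triggered, but the decision behind it ("is my app ready yet?") follows from the Kubernetes spec/status convention: controllers report progress as conditions under status.conditions, and Crossplane claims, for example, report Synced and Ready conditions. This minimal Python sketch checks both; condition_is_true, notification_for, and the message text are hypothetical.

```python
from typing import Optional


def condition_is_true(obj: dict, cond_type: str) -> bool:
    """Look up a Kubernetes-style condition under status.conditions."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False  # a missing condition counts as "not yet"


def notification_for(claim: dict, url: str) -> Optional[str]:
    """Return a message once the claim is Synced and Ready, otherwise None."""
    if condition_is_true(claim, "Synced") and condition_is_true(claim, "Ready"):
        return f"Your application {claim['metadata']['name']} is deployed to {url}"
    return None
```

A notifier (hypothetical here) would watch the claim and post the message to Slack the first time it becomes non-None.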
Opening that address, the demo indeed loads. Let me make it a little bigger. This is my application, Azure Storage Blob Reader. It's working, but there are no documents found inside. Just to prove that all the infrastructure is there, let's go to Azure for a moment: you can see platform demo, which is the resource group that was deployed. Sorry for all the jumping, but going back to Argo CD, you can see that one of the resources we deployed is that resource group, corresponding to my resource group in Azure. We also have vCluster, Slack, and some resources of kind Object: quite a few resources that we as developers didn't need to specify. Back in Azure, we have platform demo storage, our storage account. If you're not familiar with Azure, Blob Storage is roughly comparable to S3 in AWS. Inside it, we have a simple container called sample blob, which is currently empty. But our app works, and we as developers just needed to click one button.

So how do we test this API all the way down? Let's go back. By the way, you can see Crossplane resources here; Crossplane is the magic behind creating all the cloud infrastructure. I also have a Kubernetes cluster viewer open, where we can see all the events; we will look at it a little later. For now, let's switch hats again and be a developer. I don't want to use GitOps or anything complicated; I just want to quickly create something to test my app. My app needs an actual file inside the blob container of the storage account. So I can use kubectl to apply a file: I go to the examples folder, where I have a blob content YAML file. We'll see its content in a second, but remember, it's just an API.
I want to create blob content inside my newly created infrastructure. Let's see what happens. It says created. If we go back to our web page and refresh, you can see that our Storage Blob Reader actually reads the content of the file, a short greeting for Platform Engineering. And just to prove that the file is there, you can see there's indeed an example file in my storage account. From the developer's point of view, I was able to very easily spin up the infrastructure, create my application, and do everything necessary to interact with it.

Let's quickly go back to the cluster and see what's happening inside my application. If we go to namespaces, you can see that there's a DevOps team namespace. Inside it, I have three replicas of my app pod, and I also have a vCluster and some CoreDNS pods. So why do I have a vCluster? Imagine that as developers we've tested this, but maybe we are a bit more Kubernetes-savvy and want to experiment with Kubernetes. Of course, our Kubernetes cluster is populated by other tenants, and we don't want to get in anybody's way. Maybe we want to try out a new ingress controller or something else. If we go back to our Slack notification, it tells us we have a dedicated vCluster: if I have the vCluster CLI, I can connect to it and do whatever I want inside. Let me open a new window. I connect, and as you can see, we are connected. I can, for example, run a debug pod or any other pod I want, and it would run in my vCluster. So we're not only giving developers an ephemeral testing environment; we're also giving them their own fully fledged virtual Kubernetes cluster where they can test whatever they want. They don't have to open tickets and keep following up.
So we are expanding developers' ability to use various testing tools. All right, let's close it and go back to our regular cluster. We have seen the whole flow so far as a developer, and I am very happy: I didn't need to interact with any of the underlying infrastructure. I can see my application clearly, and I can test it. I can also look at observability data in Grafana, on a dashboard dedicated just to me. As you can see, there's only a little bit of data here so far. It's a dashboard that my platform team may have prepared, and it only observes my namespace and my vCluster, so I can see what's happening; maybe I want to tweak resource quotas for my application. So we are giving developers various tools, and they are all deployed automatically. We don't need to create a ticket or anything. All of this was deployed from the small YAML file we saw earlier, this one here.

So how did it all happen? Where is the secret sauce? Obviously, complexity cannot completely disappear; it has to live somewhere. Let's open the composition. I'm using Crossplane and a Crossplane composition. As you can see, it's 285 lines of YAML. As the platform team, this is my task: I encapsulate this complexity inside a composition and hide it away from developers. Let's see what this composition contains. If we search for name, you can see it has an account, a service, an ingress, a deployment, and a namespace, all represented as Kubernetes resources. It also has the vCluster, a resource group, a container, and so on. All of these are deployed for us by Crossplane, and we can hide all the complexity using these tools. You can read the whole file; I will leave a link somewhere in the presentation so you can check the repository later. But that's how it works.
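Conceptually, the composition maps one small claim onto many lower-level resources. The toy Python function below mirrors that fan-out using the resource kinds mentioned in the demo; it is not Crossplane's composition or function API, and the naming scheme, the owner field, and the replica count (three, as seen in the demo) are illustrative assumptions.

```python
def compose(claim: dict) -> list[dict]:
    """Expand one claim into the lower-level resources it stands for (toy)."""
    name = claim["metadata"]["name"]
    ns = claim["spec"]["namespace"]
    params = claim["spec"]["parameters"]
    owner = {"claim": name}  # every child records which claim produced it

    def child(kind: str, child_name: str, **spec) -> dict:
        return {"kind": kind, "name": child_name, "owner": owner, "spec": spec}

    return [
        # Cloud infrastructure (Azure in the demo).
        child("ResourceGroup", f"{name}-rg"),
        child("Account", f"{name}-storage", resourceGroup=f"{name}-rg"),
        child("Container", f"{name}-blobs", account=f"{name}-storage"),
        # Kubernetes resources for the application itself.
        child("Namespace", ns),
        child("Deployment", name, namespace=ns, image=params["image"], replicas=3),
        child("Service", name, namespace=ns, port=params["port"]),
        child("Ingress", name, namespace=ns, service=name),
        # A per-team virtual cluster for experiments.
        child("VCluster", f"{name}-vcluster", namespace=ns),
    ]
```

Because every child carries its owner, removing the claim later gives a clear handle for cleaning up everything it produced.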
Again, I'm wearing my platform engineer hat here: that's how we make it possible. All right, now I'm done as a developer. I have successfully tested everything, and I want to delete all the infrastructure, including all the Kubernetes resources. As you might imagine, there is another action I can perform in Port, namely deleting the resource. I select the right repository and enter a file name, so I don't accidentally delete somebody else's file. I click delete, and as you can imagine, that opens another PR. We need to wait a second for it to arrive. Once we approve this PR, everything will be cleanly removed, including our cloud infrastructure as well as the Kubernetes resources. That is a really lean flow, implementing the principles we talked about and showing how you could potentially implement an infrastructure platform and expose it to your developers.

Again, let's pretend I've reviewed it. It simply removes the file; nothing crazy is happening here. It's a simple GitHub Actions workflow. Confirm, and we are done. Now if I go back to Argo CD and refresh, you can see that everything is gone or in the process of being deleted. You can also see on the left-hand side that all the Crossplane resources are being removed, and my application is being removed. Everything is cleaned up. That hopefully showed you how you can do this; let's quickly go back to the presentation and summarize what we've just seen. What tools have we used? We used Kubernetes, not as a container orchestrator but as a control plane. We leveraged the Kubernetes API first and foremost, and we used it to reconcile everything.
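Reconciliation, both when the claim file was merged and when it was removed, comes down to comparing the desired state in Git with the live state in the cluster. The minimal Python sketch below computes that diff over hypothetical resource keys; the prune flag mirrors the fact that a GitOps tool such as Argo CD only deletes resources removed from Git when pruning is enabled. In the demo, deleting the claim then lets Crossplane tear down the composed cloud resources, a cascade this sketch does not model.

```python
def plan_sync(desired: dict, live: dict, prune: bool = True) -> dict:
    """Diff desired state (from Git) against live state (in the cluster)."""
    create = sorted(k for k in desired if k not in live)
    update = sorted(k for k in desired if k in live and desired[k] != live[k])
    delete = sorted(k for k in live if k not in desired) if prune else []
    return {"create": create, "update": update, "delete": delete}


def apply_plan(plan: dict, desired: dict, live: dict) -> dict:
    """Return the new live state after executing the plan (no side effects)."""
    new_live = dict(live)
    for key in plan["create"] + plan["update"]:
        new_live[key] = desired[key]
    for key in plan["delete"]:
        del new_live[key]
    return new_live
```

Merging the PR adds a key to the desired state and yields a create; the delete PR removes it and, with pruning on, yields a delete.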
We also used vCluster to give our developers a virtual Kubernetes cluster if they need a little more to play around with or test. We used Crossplane to deploy everything and to reconcile our deployments, keeping the actual state synchronized with the desired state. We used various Crossplane providers; providers are like Terraform providers, letting you target various infrastructure. We used the Azure provider, the Kubernetes, Helm, and HTTP providers, and Crossplane functions: they are like Lego building blocks you can combine. We used Port, a developer portal similar to Backstage, and let our developers interact with Port rather than directly with the YAML file. I want to emphasize that it is possible to work directly with the YAML, or even directly with the Kubernetes API, depending on the need and on your developers' level of Kubernetes knowledge. We used GitHub as the interface driving the exchange between the platform team and developers, and we used PRs as the messaging system. Our PRs are actionable messages that developers send to the platform team; the team reviews them, and then the magic happens. For GitOps, we used Argo CD. We could have used Flux or another mechanism, but Argo CD's UI made it nice to show. And believe it or not, everything I showed you runs on my local kind cluster, which is Kubernetes in Docker. I was able to encapsulate all of this inside my local Kubernetes cluster. You can, of course, run it equally well in a cloud, such as AKS, EKS, or GKE.

All right, that was the tooling. Let's look at a helpful diagram that will guide us again through the journey we just saw; I would like to pinpoint certain aspects. We started as a developer.
We interacted with the portal: through its UI, we triggered the creation of our ephemeral testing environment, including our app. This in turn triggered an action against Git, and as the third step, our Git repository received a PR. The PR was reviewed and approved by platform engineers. We committed those changes to the repository, where Argo CD reconciled them and applied them to the cluster. From there, Crossplane talked to the Kubernetes API and used CRDs and various other mechanisms to reconcile all the infrastructure and the application. This resulted in provisioning a vCluster, our application (various Kubernetes resources), and cloud services. That's a quick overview of what we did in the demo.

In conclusion, what were we able to do? We reduced friction for developers to almost zero. The only touch point between developers and the infrastructure was a PR approval from somebody on the platform team, and we could easily eliminate that approval to reach truly zero friction in the whole process. We eliminated waiting times related to tickets: we anticipated that our developers might need a bit more to experiment with, so we added vCluster, which gives them essentially admin-level privileges in a virtual cluster, and they don't need to keep asking us, "Hey, can you create this for me? Can you create that for me?" So we both catered to the immediate need of testing the application and gave developers a little playground encapsulated just for the team, or the person, that opens the PR. We used Git PRs as a unified interface between the platform and developer teams, and we encapsulated the API calls to Kubernetes. Those API calls take the form of YAML.
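The end-to-end flow from the diagram, including the optional PR approval gate discussed in the conclusion, can be modeled as a tiny state machine. This is an illustrative Python sketch; the step names are hypothetical labels for the stages described in the talk, not states exposed by Port, GitHub, Argo CD, or Crossplane.

```python
from enum import Enum, auto


class Step(Enum):
    REQUESTED = auto()    # developer submits the form in the portal
    PR_OPEN = auto()      # CI opens a pull request with the claim file
    MERGED = auto()       # the change lands on the main branch
    SYNCED = auto()       # the GitOps controller applies it to the cluster
    PROVISIONED = auto()  # Crossplane has reconciled app and cloud resources


def next_step(step: Step, approved: bool, require_approval: bool = True) -> Step:
    """Advance one step; the PR merges only when approval is given or not required."""
    if step is Step.PROVISIONED:
        return step
    if step is Step.PR_OPEN and require_approval and not approved:
        return step  # wait for a platform engineer to review
    order = list(Step)
    return order[order.index(step) + 1]


def run(require_approval: bool, approved: bool, max_ticks: int = 10) -> list[Step]:
    """Advance until the flow stops moving and return the steps visited."""
    step, trace = Step.REQUESTED, [Step.REQUESTED]
    for _ in range(max_ticks):
        nxt = next_step(step, approved, require_approval)
        if nxt is step:
            break
        step = nxt
        trace.append(step)
    return trace
```

Setting require_approval to False models the zero-friction variant: the flow runs straight from request to provisioned resources.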
That YAML is just configuration, but in the end it is a set of instructions for the Kubernetes API and for projects like Crossplane to act on. The magic is in the collaboration: the platform team prepared something up front, the Crossplane composition and all the setup, and developers were then able to collaborate by calling various APIs. We also followed zero trust security principles, because our developers didn't need any access to Azure. They could have it, but we encapsulated things to the point where developers interacted only with an API through Kubernetes, without even needing to know that Kubernetes is behind it. They just apply files, work with configuration, and achieve the results they need. And we didn't even mention AI once.

The key takeaway from this presentation is this: when designing developer platforms, whether infrastructure or other types, think of it like designing an API. An API is really a data pipeline: it has inputs, it has outputs, and it has various mutations along the way. If you think carefully about what your API should look like, and how you design and craft the contract between developers and the platform team, that's a recipe for success. Thank you for your attention and your time, and please enjoy the rest of the conference. If you would like to reach out, you can visit my web page, cloudrumble.net, or simply connect with me on LinkedIn. I would be really interested to hear your thoughts about Kubernetes, platforms, and all the tools we've used.

Piotr Zaniewski

Head of Engineering Enablement @ Loft Labs



