Conf42 Cloud Native 2021 - Online

Cgroups for the best: Using container technologies as a PaaS provider

Abstract

Kubernetes, Docker, LXC, virtual machines, microservices… What if the whole trendy tale of containers were just f** up? Let’s see how we run isolation and containers in our PaaS using POSIX and Linux native features: cgroups!

We power a PaaS platform, but we chose not to rely on fancy container technologies to run isolation. Instead, we rely heavily on permissions and cgroups. Here’s an explanation of our architecture, how we use it in production, and why we made those choices, which we have supported for a while now. A proper introduction to isolation in production, for a lightweight environment, far away from Docker, Kubernetes, and other trendy stuff.

Summary

  • Hi everyone, thank you for being here for this new edition of Cloud Native. I'm here to talk to you about cgroups, about containerization and process isolation. Let's also talk a bit about the future and how you could architect your whole application.
  • A microservices architecture is an application built as a collection of different services. Each one is responsible for a small business concern in your application. They are loosely coupled, which means that you do not depend on one service to run another. If you need real scaling capabilities, then yes, you do have to run microservices.
  • Most of our production servers run Linux, and Linux is POSIX compatible. The underlying technology beneath every container technology is a feature in the Linux kernel called cgroups. It's not the same thing as containers, because it's something more complex. But it's easier in the long term than running containers.
  • Cgroups-based isolation is a mechanism that isolates processes per user. It allows you to fine-tune the permissions and the resources that you give to a dedicated process. Nevertheless, you will still need an orchestrator to maintain everything.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thank you for being here for this new edition of Cloud Native. I am really happy to be with you to talk about cgroups, about containerization and process isolation. Let's start by talking a bit about the future and how you could architect your whole application to have something more maintainable and more sustainable in your everyday app development. You have probably heard about microservices, what they are, and how to use them in your everyday development. The fact is, microservices are probably not that widely used in production right now, simply because they are not really easy to use or to build: they consist of a bunch of different patterns that you have to put in production. If you want microservices, you've probably heard about containers and running containers. The fact is that things are not that easy when you want to run different things, because in production, running containers is often something like this: a complex task with a lot of frustration and a lot of things to do and to maintain to be sure that everything runs okay in production.
Let's dive into microservices architecture, what it is and how to use it. As a reminder, a microservices architecture is an application built as a collection of different services. Each one is responsible for a small business concern in your application, and they are loosely coupled, which means that you do not depend on one service to run another. They can run in parallel and they are all independently deployable on the fly, which means that you can scale up or scale down one microservice rather than the overall application in case of heavy consumption or heavy traffic at some point. So it's something like this: a kind of galaxy of different services that you have to keep running and interacting together. So what are microservices exactly?
Firstly, they are business oriented, which means you have one service to do one thing and just one thing. For example, you could have a service acting as a settings manager, and this is where you get and set the overall settings of your application. Each service exposes an API, which means that every service that wants to interact with the service in question does so through a standardized API. So this is something that is well documented and well standardized, and that allows the different parts of your application to talk together. In the case of our settings manager module, it exposes getters and setters for each available setting in your application. Each service is independent: it does not require another one to run. So the interface service will get settings from the manager service, but it will fall back to defaults if the settings manager is unavailable. So your whole application can keep a degree of availability even if some parts, some modules, some services are missing. They all rely on a message bus to transmit information, which means that a service can publish various changes to a common bus, and the other services that want to use the changes pushed by the first service just consume them from this common bus. In our case, the settings manager will push every change in the settings to the common bus, and the interface service, which has registered to the common bus, will capture the messages and update accordingly. And finally they're stateless, which means that the data is stored on a dedicated storage backend. You could deploy a lot of settings managers if you want to load balance between them, or you can scale them up, or you can scale them down and have just one service. You do not need to rely on the same local shared storage. There is a dedicated backend for that.
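As an illustration of the settings manager example above, here is a minimal Python sketch of the pattern: a service exposing getters and setters, publishing changes to a common bus, and a consumer that falls back to defaults when the manager is unavailable. The class names (`Bus`, `SettingsManager`, `InterfaceService`), the `DEFAULTS` values and the in-memory bus and store are all hypothetical stand-ins chosen for this sketch; a real system would put a message broker and a dedicated storage backend behind them.

```python
from typing import Callable, Dict, List

# Hypothetical defaults the interface service falls back to.
DEFAULTS: Dict[str, str] = {"theme": "light", "language": "en"}


class Bus:
    """In-memory stand-in for the common message bus."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, key: str, value: str) -> None:
        for handler in self._subscribers:
            handler(key, value)


class SettingsManager:
    """One service, one job: get and set application settings."""

    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self._store: Dict[str, str] = {}  # stand-in for the storage backend
        self.available = True             # toggled to simulate an outage

    def get(self, key: str) -> str:
        if not self.available:
            raise ConnectionError("settings manager unavailable")
        return self._store[key]

    def set(self, key: str, value: str) -> None:
        self._store[key] = value
        self._bus.publish(key, value)     # push every change to the bus


class InterfaceService:
    """Consumer that stays usable even when the manager is down."""

    def __init__(self, manager: SettingsManager, bus: Bus) -> None:
        self._manager = manager
        self._seen: Dict[str, str] = {}
        bus.subscribe(self._on_change)    # register to the common bus

    def _on_change(self, key: str, value: str) -> None:
        self._seen[key] = value

    def setting(self, key: str) -> str:
        if key in self._seen:
            return self._seen[key]
        try:
            return self._manager.get(key)
        except (ConnectionError, KeyError):
            return DEFAULTS[key]          # fall back to the default value
```

The interface service never blocks on the manager: it uses the changes it has captured from the bus, then asks the manager, and only then falls back to the defaults.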
So if you want to run microservices, you do have to run a complex architecture with orchestration for every service in this architecture. And then you will also have to run some kind of storage backend and a data messaging system, like Redis or RabbitMQ or something like that. So it's a complex architecture. It's complex to run, complex to maintain, and complex to deploy. So do you need microservices for your business right now? If you have a complex monolithic architecture that you want to split into small, separate modules, then yes, you can use microservices. If your business is spread across different units, yes, you could develop one service per unit, so each unit will embed just one business concern and not the whole. If you really need scaling capabilities, and I really mean real scaling, because most applications do not really need scaling: you just need to go up and down sometimes, and maybe add more resources to your application, not massive upscaling. But if you need real scaling capabilities, then yes, you do have to run microservices. And if your team can be split across multiple small projects, because each small team will be responsible for just one unit. So you need to have a big team that you can split and spread across your multiple projects. If you do, yes, you could do microservices. And you do need a team that is pretty DevOps skilled, because there will be a lot to maintain, to install, to deploy and so on. So you need that in your team. So when should you not use microservices? If you're not ready for observability: each service will run in its own container, in its own sandbox, so you will need to observe them and be sure that everything runs well. You need a strong culture of observability. If you don't have one, it will be really complex for you to maintain the system and keep it in a stable state.
If your team does not have DevOps skills, as we said, you're probably not prepared to use microservices. And if you don't know containers and isolation, then no, you won't be ready to run microservices. But most of all, if you want to run the web version of Flappy Bird, then no, you don't need microservices to do that. Microservices are really dedicated to really big apps at really big scale. So if you don't want that, if you just want to run a simple game, then you really do not need microservices. So my preference, rather than microservices, is a multi-language architecture. This is something a bit different, where I prefer to develop components rather than services. The components can be responsible for multiple parts of the logic, but I choose my components with regard to the technology I can use, because I prefer to use Rust for some tasks and Python for others, and to run the front end with Node.js and an Ember PWA application. So I prefer to pick the right technology and make the right choice rather than just splitting everything into really small chunks, really small services. The advantage is that you do not need DevOps capabilities to do that. You just need a platform that is able to run the different interpreters and the different parts of your code, not a big, large architecture. So it's pretty simple. And you still have the benefits of microservices, because you still have scalability, flexibility, maintainability and so on. It mitigates a bit the complexity of microservices architectures. This is something I really prefer, and it's Flappy Bird compatible. So yeah, definitely, I prefer to do multi-language rather than microservices. So let's see how to do it in a real application context, and let's talk about containers.
So the goal of containers is to run your processes in isolation, to prevent any drama coming from a service that is killed at some point or that crashes because there is a bug inside it. It prevents that service from crashing the overall application, and it allows you to just restart or reset a service at some point without affecting the rest of your application. It is also about preventing any kind of data leak. It allows you to distribute the resources of your hardware to the different parts of your application according to their own consumption. It keeps environments sandboxed, which means that you can pick a dedicated version of a language specifically for a module rather than fixing it for the overall platform. So it allows you to tweak your environment, and it improves your observability, because it is easier to observe just a small part of your application rather than the whole. So you may think that isolation means containers and something like Kubernetes or LXC or jails or virtual machines and so on. But you're wrong if you think that, because you can run isolation without containers. The fact is most of our production servers run Linux, and Linux is POSIX compatible. POSIX is a standard, and in its basics you find different patterns that are really useful for us. First there are processes, which allow you to isolate the different execution parts. There are I/O controls, which allow you to isolate access. There is message passing, which allows you to isolate the communication between the modules. There are permissions, which allow you to isolate your resources. Every one of them is really useful. So you do not need containers, you need a safe isolation system, because once you have overall isolation for each kind of module, you do not need a container to keep it in a sandbox. Linux can do that, and it does it very well.
The underlying technology beneath every container technology, Docker, containerd, LXC, whatever you want, is a feature in the Linux kernel called cgroups. It's also used at system level by systemd, and it's kernel-level isolation; more about it in a minute. There is no real difficulty in running it on your own because it's a built-in feature of the kernel, so it's just something to activate. There's a lot of documentation, it's standard, so it's really easy to run and to deploy in production. But you will need some advanced Linux skills, and it's not the same thing as containers, because it's not like composing an image and pushing it to a repo. It's something more complex, because you have to do it directly on the machine itself. But it's easier in the long term than running containers. The fact is, the choice is between isolating containers with full images, which is almost like a virtual machine, or isolating processes. And isolating processes is definitely the easier task.
What about the platform I'm working for, alwaysdata? We're a cloud provider, and we built a cloud platform before the cloud era, 15 years ago. At first we were just a hosting provider, but we had to find a solution to properly isolate the various processes owned by our customers. And 15 years ago we didn't have Docker or containers or whatever, we just had Linux. So we had to deal with that. Fortunately there were cgroups available.
The basic definition of cloud is that you need high availability: your service needs to be up at any time, at any moment, and can be restarted at any point. So we need to maintain this state of high availability. You need elastic scalability, to be able to scale your application up and down by adding more resources at some point if you need them. You need embedded services: database, storage, messaging system and so on.
You need an edge infrastructure, so something that allows you to be really close to your end customers. And you need native isolation, to be able to simply isolate your processes from those of other users. So cgroups-based isolation is a mechanism that isolates processes per user, so it's bound to a user and dedicated to them. You will have one cgroup per process per user, which allows you to fine-tune the permissions and the resources that you give to a dedicated process. You use POSIX permissions for the resources, so it is really easy to do. In fact it's something like this. There is a pseudo file system in your operating system, exposed by the kernel. And here we have a cgroup created by the container service, which is our own orchestrator, and beneath it one group per user, named after the internal user. There will be a proxy and an Apache upstream, and the Apache upstream is a cgroup dedicated to the Apache process for this user specifically. In it I have a bunch of files, some of them prefixed with the cgroup prefix, and each of them allows you to manipulate the cgroup. There is one file which is really interesting, which is cgroup.procs. In this file you will find the process IDs of the various processes you want to bind to this cgroup. So it's as easy as writing a number, which is the ID of the process you want to add to the group. You write it in the file and the process is isolated in the group. That's all. You don't have anything more to do. It's pretty easy to keep things isolated in your running system, in production. It also allows you to set the limits that will be assigned to a dedicated process. It's a native cgroup capability, so it's something that is already available in your kernel. It works the same way as attaching a process to the cgroup: you just write some values into a file, and it allows you to cap some hardware resources.
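Attaching a process by writing its ID into cgroup.procs, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes cgroup v2 mounted at /sys/fs/cgroup, and the `container/<user>/<service>` layout and the helper names (`cgroup_path`, `attach_process`, `list_processes`) are hypothetical, not the actual alwaysdata implementation. Running it against the real cgroup file system requires sufficient privileges.

```python
from pathlib import Path
from typing import List

# Assumption: cgroup v2 is mounted at its usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_path(root: Path, user: str, service: str) -> Path:
    """Hypothetical layout: <root>/container/<user>/<service>."""
    return root / "container" / user / service


def attach_process(group: Path, pid: int) -> None:
    """Move the process `pid` into the cgroup `group`."""
    # On the cgroup file system, creating a directory creates the cgroup;
    # the kernel then exposes control files such as cgroup.procs in it.
    group.mkdir(parents=True, exist_ok=True)
    # Writing a PID into cgroup.procs attaches that process to the group.
    (group / "cgroup.procs").write_text(f"{pid}\n")


def list_processes(group: Path) -> List[int]:
    """Return the PIDs currently listed in the group's cgroup.procs."""
    return [int(pid) for pid in (group / "cgroup.procs").read_text().split()]
```

On a real host, `attach_process(cgroup_path(CGROUP_ROOT, "alice", "apache-upstream"), 1234)` would move process 1234 into a group dedicated to the hypothetical user `alice`. Passing the root as a parameter also lets the functions be tried against a temporary directory without touching the kernel.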
It relies on the kernel to balance the different resources assigned to a process at any point. So you could have multiple processes, an application running multiple languages, multiple components, and each of them will be properly balanced by the kernel, which assigns the resources so that one process does not impact another if a process in the system is consuming heavily. So it's a really easy system. And as previously, you've got in your cgroup a file named memory.max, for example. In this one I just put the amount of memory that I want to assign to my Apache upstream process, and that's all. Here I can set 4 GB of memory for the Apache upstream process. I just write the number in the file and that's all; the kernel will do it for me. It's pretty easy to do. Nevertheless, you will still need an orchestrator to maintain everything, because if you have a lot of processes, like we do on our cloud platform, where we have a lot of users on the same servers, you need an orchestrator to automate the various tasks. This is something central, something like Kubernetes. It's just here to automatically create, delete and update the cgroups according to the user-provided settings. So it's not that complex, because it's just a central piece of code that creates cgroups, assigns them to the user, puts the process numbers in them, and sets the limits by editing the files in the cgroup pseudo file system. And that's all. So it's not as complex as a full Kubernetes, and not as complex to maintain, to understand and to use. It is easier to do, but you will need it if you want to automate the tasks. If you do not need to automate them, you can perfectly well do it manually and it will work. Definitely. Another great capability of the cgroups feature is that you can interface it with other systems.
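The memory limit and the small orchestrator described above can be sketched as follows. This is a hedged illustration, not the actual alwaysdata orchestrator: the `reconcile` function, the shape of the `settings` dictionary and the `<root>/<user>/<service>` layout are assumptions made for this example, and "4 GB" is taken here to mean 4 GiB.

```python
from pathlib import Path
from typing import Dict

GIB = 1024 ** 3  # assumption: "4 GB" in the talk is read as 4 GiB


def set_memory_limit(group: Path, limit_bytes: int) -> None:
    """Cap the memory of a cgroup by writing a byte count to memory.max."""
    # On a real cgroup v2 host, memory.max only exists if the memory
    # controller is enabled in the parent's cgroup.subtree_control.
    (group / "memory.max").write_text(f"{limit_bytes}\n")


def reconcile(root: Path, settings: Dict[str, Dict[str, dict]]) -> None:
    """Create the cgroups and apply limits from user-provided settings.

    `settings` is a hypothetical structure:
    {user: {service: {"memory_max": <bytes>, "pids": [<pid>, ...]}}}
    """
    for user, services in settings.items():
        for service, spec in services.items():
            group = root / user / service
            group.mkdir(parents=True, exist_ok=True)  # creates the cgroup
            set_memory_limit(group, spec["memory_max"])
            for pid in spec["pids"]:
                # One PID per write: each write moves a single process.
                (group / "cgroup.procs").write_text(f"{pid}\n")
```

Calling `reconcile(Path("/sys/fs/cgroup/container"), {"alice": {"apache-upstream": {"memory_max": 4 * GIB, "pids": [1234]}}})` as root would, under these assumptions, create the group, cap it at 4 GiB and attach process 1234. Deleting and updating groups would follow the same file-based approach.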
We interface it with PAM, the Pluggable Authentication Modules system in Linux, because we use it to sandbox any kind of process. In our case, when a user connects to the platform over SSH: OpenSSH doesn't support cgroups natively, but we authenticate the user using PAM and a custom script, and we instantly attach the SSH process for the user's current session to the user's own cgroup. So it is instantly isolated in its own cgroup, and it's completely transparent for the user. Same thing for the network: you can easily create private IPs for your internal processes and internal interfaces, and you can rely on iptables to route the traffic from the external interfaces to the internal ones of your different systems. In fact, we are supposed to use iptables, but Netfilter is really complex and not that handy, so we prefer to patch the kernel to avoid using iptables. It's a really simple patch in the kernel that allows us to create a bunch of private IPs and cap them in the cgroup, authorizing access only if the private IP is declared to be used by the user. So it's pretty easy to do. We just compare some GID numbers and registered addresses and that's all. It's not that complex, but if you want to do it another way, iptables is definitely a good way if you don't want to patch the kernel itself. Another really good feature is the ability to forbid read access to the processes of other users. It's just an option to enable in cgroups, and if you do that, when your user runs ps to list the different processes running on the system, they won't be able to see the other users' processes; they are sandboxed in their own environment. You don't want the different users on your platform to be able to read the processes of the other users on the platform. So it's pretty easy. Cgroups come with another system, which is called namespaces.
It's another layer that lives in parallel with cgroups and allows you to partition kernel resources, network interfaces, that kind of stuff. And it's the same thing: it's used in every container technology, Docker, LXC and so on. It's really useful, but it's not mandatory if you just want to run a simple process isolation platform. We do not use it at alwaysdata. It's really overkill for our use cases. Do you need to use POSIX and cgroups instead of containers? You could, because it's as easy as running mkdir and tee instructions to write into files in the POSIX file system. So it's really great to use. It's fully agnostic, it's POSIX compatible, and you don't need to create images and push them to repos to pull them in production later. You do not need some kind of orchestrator to deploy and manage services. You can just use a simple platform with isolated processes and run your different components or your different services, if you want to target a microservices architecture, to simply run a really well sandboxed and ready-to-scale application without having to worry about complex orchestrators. The fact is you don't need Kubernetes or k3s or whatever. You need fair isolation on a reliable platform, and that's all. You do not need anything more. The good thing is that, using those technologies, we are ready for a kind of bright future with WASM-based serverless architecture, which will be some kind of WebAssembly server running with cgroups and POSIX capabilities to definitely kill containers and run WebAssembly lambdas directly on your server, running in isolated processes, relying on the security model of WebAssembly and the capabilities of cgroups, and we will not need containers anymore. And this will be really great for production processes, but that will be for another talk, probably. I'm M4dz. I'm a tech evangelist at alwaysdata, a cloud provider.
Cgroups is a technology we have been using for 15 years, and it is something really useful and really reliable. We do not run containers in production and we are really proud of it. Thank you very much. If you have any questions, we'll be happy to talk with you on the Discord of the conference. See you later.

M4dz

Tech Evangelist - Developer Advocate @ alwaysdata
