Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thank you for being here for this new edition
of Cloud Native. I am really happy to be with you
to talk about cgroups, a technology for containerization
and isolation of processes. Let's use it to talk a bit about the future, and
how you could architect your whole application to
have something more maintainable and more sustainable in
your everyday app development. You have probably
heard about microservices, what they
are, and how to use them in your everyday development.
The fact is, microservices are probably not
that used in production right now, just because they are not
really easy to use and really easy to do:
they consist of a bunch of
different patterns that you may have to put in production.
If you want microservices, you've probably heard about
containers and running containers. The fact is, things are
not that easy when you want to run different pieces, because in
production, running containers is often a
complex task, with a lot of frustration and a
lot of things to do and to maintain to be sure that everything
will run okay. Let's dive into
microservices architecture, what it is, and how to use
it. So, as a reminder, a microservices architecture
is an application made of a collection of different services.
Each one is responsible for a small piece of business in
your application, and they are loosely coupled, which means that
you do not depend on one service to
run another service. They can run in parallel,
and they are all deployable independently, on the fly,
which means that you can scale one microservice up or down
rather than the overall application in case of a
spike in consumption or traffic at some point. So it's
some kind of galaxy
of different services that you have to keep running and make interact
together. So what are microservices exactly?
Firstly, they are business oriented, which means you have one
service to do just one thing, and just one thing.
For example, you could have a service responsible for
settings management: this is where you get and set the
overall settings of your application. Each service exposes
an API, which means that every service that wants to
interact with the service in question goes
through a standardized API. So this is something that
is well documented, well standardized, and that allows different
parts of your application to talk together. In the case of our settings-manager
module, it exposes getters and setters for
each available setting in your application.
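
As a rough illustration, a minimal settings-manager service could look like this, using only the Python standard library; the route scheme and the example settings are hypothetical:

```python
# A minimal sketch of a settings-manager service exposing getters and
# setters over HTTP, using only the Python standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SETTINGS = {"theme": "dark", "language": "en"}  # in-memory, for the sketch

class SettingsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /settings/<name> -> the setting, as JSON
        name = self.path.rstrip("/").split("/")[-1]
        if name not in SETTINGS:
            self.send_error(404, "unknown setting")
            return
        body = json.dumps({name: SETTINGS[name]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        # PUT /settings/<name> with a JSON body -> updates the setting
        name = self.path.rstrip("/").split("/")[-1]
        length = int(self.headers.get("Content-Length", 0))
        SETTINGS[name] = json.loads(self.rfile.read(length))
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), SettingsHandler).serve_forever()
```
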
Each service is independent; it does not require another one to run.
So the interface service will get
settings from the settings-manager service, but it
will fall back to defaults if the settings manager
is unavailable. That way, your application
can keep a decent level of
availability even if some parts,
some modules, some services, are missing.
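
For instance, the fallback pattern could be sketched like this; the service URL and the default values are assumptions:

```python
# A minimal sketch of the fallback pattern, using only the standard library.
import json
from urllib.error import URLError
from urllib.request import urlopen

DEFAULTS = {"theme": "light", "language": "en"}  # hypothetical defaults

def get_setting(name: str):
    try:
        url = f"http://settings-manager:8080/settings/{name}"  # hypothetical host
        with urlopen(url, timeout=2) as resp:
            return json.load(resp)[name]
    except (URLError, TimeoutError, KeyError):
        # The settings manager is unavailable: degrade gracefully.
        return DEFAULTS.get(name)

print(get_setting("theme"))
```
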
They all rely on a message bus to transmit
information, which means that a service publishes its
changes to a common bus, and the other services
that want to use the changes pushed by
the first service just consume them from
this bus. So in our case, the settings
manager will push every change in the settings to the bus,
and the interface service, which has registered
on the bus, will capture the messages and update accordingly.
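
A minimal sketch of that pattern, assuming a Redis bus and the third-party redis-py client; the channel name and payload shape are hypothetical:

```python
# Publish/subscribe over a Redis message bus (pip install redis).
import json
import redis

bus = redis.Redis(host="localhost", port=6379)

# Publisher side: the settings manager pushes every settings change.
def publish_setting_change(name, value):
    bus.publish("settings.changed", json.dumps({"name": name, "value": value}))

# Subscriber side: the interface service consumes changes and updates itself.
def listen_for_changes():
    pubsub = bus.pubsub()
    pubsub.subscribe("settings.changed")
    for message in pubsub.listen():
        if message["type"] == "message":
            change = json.loads(message["data"])
            print(f"applying {change['name']} = {change['value']}")
```
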
And then, finally, they are stateless, which means that the
data is stored on a dedicated storage backend.
You could deploy a lot of settings-manager instances
if you want to load balance between them, or you can scale
them up, or scale them down to just
one instance. You do not need to rely on
shared local storage:
there is a dedicated backend for that. So if you want to run microservices,
you do have to run a complex architecture, with
orchestration for every service in this architecture,
and you also have to run some kind of storage backend and a data
messaging system, like Redis or RabbitMQ.
So it's a complex architecture: something complex to run, complex to
maintain, and complex to deploy. So, do you need microservices
for your business right now? If you have a complex monolith
architecture that you want to split into small,
separated modules, then yes, you can
use microservices. If your business is spread across
different units, yes, you could develop one service per
unit, so each unit embeds just one business domain
and not the overall one. If you really need scaling
capabilities, and I really mean real scaling,
because most applications do not really need scaling
(you just need to scale up and down sometimes, and
maybe add more resources to your application, but that is
not big scaling), but if you need real scaling capabilities,
then yes, you may have to run microservices.
And if your team can be split across multiple
small projects, because each small team will be responsible for
just one unit: you need a big team that you can
split and spread across your multiple projects, but if you do,
yes, you could do microservices. And
you do need a team that is pretty DevOps-skilled, because
there will be a lot to maintain, to install, to deploy, and so
on. So you need that in your team.
So, when should you not use microservices? If you're not ready for observability:
each service will run
in its own container, in its own sandbox,
so you will need to observe them, to be sure
that everything runs well. So you
do have to have a strong culture of observability.
If you don't, it will be really complex for you to maintain and
to keep the system in a stable state. If your team does not
have DevOps skills, as we said, you're probably not prepared
to use microservices. And if you don't know containers
and isolation, then no, you won't be ready
to run microservices. But most of all, if you want to run the
web version of Flappy Bird, then no, you don't need microservices to
do that. Microservices are really dedicated to really big apps,
at really big scale. So if you don't want that, if you
just want to run a simple game, then you really do not
need microservices. So my preference,
rather than microservices, is more about a multi-language
architecture. This is something a bit different, where
I prefer to develop components rather than services.
A component can be responsible for multiple parts
of the logic, but I choose my components with
regard to the technology I could use, because I prefer to
use Rust for some tasks, Python for other ones, and to run
the front end with Node.js and an Ember PWA.
So I prefer to pick the right technology
and make the right choice rather than just splitting everything into
really small chunks, really small services. The advantage is that
you do not need DevOps capabilities to do that. You just need a
platform that is able to run the different interpreters and
the different parts of your code, but not a big, large architecture.
So it's much simpler. And you still get the good parts of
microservices, because you still have scalability, flexibility,
maintainability, and so on. It mitigates the
complexity of microservices architectures. This is something I really prefer,
and it's Flappy Bird compatible. So yeah,
definitely, I prefer to go multi-language rather than microservices.
So let's see how to do it in a real
application context, and let's talk about containers. The
goal of containers is to run your processes in
isolation, to prevent any drama coming from
a service that is killed at some point, or that crashed because there
is a bug inside it: isolation prevents it from crashing
the overall application, and it allows you to
just reboot or reset a service
at some point without affecting the rest of your application.
It is also about preventing any kind
of data leak. It allows you to distribute your
hardware resources to the different parts of your application
according to their own needs. It keeps environments
sandboxed, which means that you can pick a dedicated
version of a language specifically for a module rather
than fixing it for the overall platform, so it allows
you to tweak your environment. And it improves
your observability, because it is easier to observe
just a small part of your application rather than the whole one.
So you may think that isolation means containers,
something like Kubernetes, or LXC, or jails,
or virtual machines, and so on. But you're wrong if you think that, because
you can run isolation without containers. The fact is, most
of our production servers run Linux, and Linux
is POSIX compatible. POSIX is a standard, and in
its basics you find different primitives that are really
useful for us, different patterns. First, there
are processes, which allow you to isolate
the different execution parts. There is I/O
control, which allows you to isolate access.
There is message passing, which allows you to isolate the communication between the
modules. There are permissions, allowing you to isolate your
resources. Every one of them is really useful. So you do
not need containers; you need a safe isolation
system, because when you have this kind of
isolation for each module, you do not need a container
to keep it in a sandbox. Linux can
do that, and it can do it very well.
do that and it could do it very well. The underlying
technology under every container technology docker,
container D, Addix C whatever you want is a feature
in the Linux kernel called C groups. It's also
used at system level by systemd and it's a kernel level isolation,
more about it in a minute. There is no really
difficulty to run it on your own because it's a built in feature
in the canal. So it's just something to activate. There's a lot of documentation,
it's those standard so it's really easy to run and
to deploy in production. But you will need some high
Linux skills and it's not the same things that
containers because it's not like pushing an image,
composing an image and pushing an image on a repo. It's something more
complex because you will have to do that
in the hardware by itself directly.
But it's more easier in the long term than running containers.
The fact is it's probably more something like isolating containers
and full images or like almost a virtual machine,
or isolating processes. And definitely isolating
processes is an easier task. What about building the platform
I'm working for? Always data. We're a cloud provider
and we made a cloud platform before the cloud era
15 years ago. At first we were just hosting providers,
but we did have to find solution to properly
isolate the various processes owned by our
customers. And 15 years ago we didn't have
docker or containers or whatever, we just had Linux.
So we did have to deal with that.
Fortunately there were c groups available.
The basic definition of cloud is: you need
high availability, meaning your service needs to be up at
any time, at any moment, and could be restarted at any
point, so you need to maintain that state of
high availability. You need elastic scalability, to be able
to scale your application up and down by adding more resources
at some point if you need them. You need embedded services:
databases, storage, messaging systems, and so on.
You need an edge-first architecture, something that allows
you to be really close to your end customers. And you
need native isolation, to be able to simply
isolate your processes from the other users' ones.
So, the cgroups-based isolation: it's a mechanism that
isolates processes per user, so a cgroup is
bound to a user and dedicated
to it. You will have one cgroup per process per
user, which allows you to finely control the permissions and
the resources that you grant to a dedicated process.
You will use POSIX permissions for the resources, so it is
really easy to do. In practice, it looks like this: there is a
pseudo filesystem in your operating system,
exposed by the kernel. Here we have a
cgroup created by the container service,
which is our own orchestrator, and per user (here,
the intranet user) there is a proxy and
an Apache upstream; the Apache upstream
is a cgroup dedicated to the Apache process for
this user specifically. Inside it, I have a bunch of files,
some of them prefixed with cgroup.,
and each of them allows you to manipulate the cgroup.
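
For illustration, the hierarchy described above might look like this in the pseudo filesystem; the mount point and names are assumptions, based on how our orchestrator is described:

```
/sys/fs/cgroup/
└── container/                 # created by our orchestrator
    └── intranet/              # one sub-tree per user
        ├── proxy/
        └── apache-upstream/   # cgroup dedicated to this user's Apache
            ├── cgroup.procs
            └── memory.max
```
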
There is one file specifically which is really interesting: cgroup.procs.
In this file you will just find the
process IDs of the various processes you want
to bind to this cgroup. So it's as easy as just
writing a number, the ID of the
process you want to add to the group. You write it in the file, and
the process is isolated in the group. That's all; you don't have anything more
to do. It's pretty easy to keep things
isolated in your running system,
in production. Cgroups also allow you to set the
limits that will be applied to
a dedicated process. It's a native cgroups capability,
so it's something that is already available in your kernel. It's the same
mechanism as filling the cgroup: you just write something into a
file, and it allows you to cap some hardware resources.
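
As a sketch, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and the hypothetical paths from the layout above (root privileges required):

```python
# Create a cgroup and move a process into it via cgroup.procs.
import os

CGROUP = "/sys/fs/cgroup/container/intranet/apache-upstream"

# On cgroup v2, creating the directory is what creates the cgroup.
os.makedirs(CGROUP, exist_ok=True)

def isolate(pid: int) -> None:
    # Writing a PID into cgroup.procs migrates that process into the cgroup.
    with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(pid))

isolate(os.getpid())  # e.g. isolate the current process
```
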
The kernel will balance the different resources
granted to the processes. So you could have multiple processes,
an application running multiple languages, multiple components,
and each of them will be properly balanced by the kernel when allocating resources,
so that one process is not starved
because another process in the system is consuming a lot.
So it's a really easy system.
And as previously, in your cgroup you've got a file
named memory.max, for example. In this one, I
just put the amount of memory
that I want to grant to my Apache upstream process, and that's
all. Here I can fix 4 GB
of memory for the Apache upstream process: I just write the number in the file,
and the kernel will do it for me. It's pretty easy to do.
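
The same idea as a sketch, with the same hypothetical path assumptions:

```python
# Cap the Apache upstream cgroup at 4 GiB by writing into memory.max.
GIB = 1024 ** 3

with open("/sys/fs/cgroup/container/intranet/apache-upstream/memory.max", "w") as f:
    f.write(str(4 * GIB))  # the kernel enforces the cap from now on
```
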
Nevertheless, you will still need an orchestrator
to maintain everything, because if you have a lot of processes, like we
do on our cloud platform, with a lot of users on
the same servers, you do need an
orchestrator to automate the various tasks. It is
something central, a bit like Kubernetes,
but it's just here to automatically create,
delete, and update the cgroups according to the
settings provided by the user. So it's not
that complex, because it's just a central piece of code
that creates cgroups, assigns them to users,
puts the process numbers in them, and fixes the limits by
editing the files in the cgroup pseudo filesystem. And that's all.
So it doesn't have the complexity of a full Kubernetes;
it's not something as complex to maintain, to understand, and to
use. It's easier to do, but you
will need it if you want to automate the tasks. But if you do not
need to automate them, you could perfectly do it manually, and it
will work, definitely.
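
A toy sketch of that orchestrator idea, reconciling hypothetical per-user settings into cgroups; every name and path here is an assumption:

```python
# Reconcile user settings into cgroups: create directories, write limits.
import os

USERS = {"intranet": {"memory.max": str(4 * 1024**3)}}  # hypothetical settings

def reconcile(root="/sys/fs/cgroup/container"):
    for user, limits in USERS.items():
        cg = os.path.join(root, user)
        os.makedirs(cg, exist_ok=True)       # create the user's cgroup
        for knob, value in limits.items():
            with open(os.path.join(cg, knob), "w") as f:
                f.write(value)               # apply the limit

reconcile()
```
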
Another great capability of the cgroups feature is that you can interface with it.
We interfaced it with PAM, the Pluggable Authentication
Modules system in Linux, because we use it to sandbox any kind
of process. In our case, when a user SSHes onto the
platform: OpenSSH doesn't support cgroups natively,
but we authenticate the user using PAM, and using a custom
script we instantly attach the SSH process
of the current session to the user's own cgroup.
So it is instantly isolated in its own cgroup,
and it's completely transparent for the user.
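
Our script itself is not public, but a hypothetical sketch of such a hook, assuming it is wired into the sshd PAM stack via pam_exec (which exposes the authenticated user as the PAM_USER environment variable), could look like this:

```python
#!/usr/bin/env python3
# Hypothetical PAM hook: attach the user's SSH session to its own cgroup.
# Every name and path here is an assumption.
import os

user = os.environ["PAM_USER"]           # set by pam_exec
session_pid = os.getppid()              # the sshd process for this session

cgroup = f"/sys/fs/cgroup/container/{user}/ssh"  # hypothetical hierarchy
os.makedirs(cgroup, exist_ok=True)

with open(os.path.join(cgroup, "cgroup.procs"), "w") as f:
    f.write(str(session_pid))           # the session is now isolated
```
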
Same thing for the network:
you can easily create private IPs for your internal
processes and internal interfaces,
and you could just rely on iptables to redirect the traffic
from the external interfaces to the internal ones of
your different systems. In fact, we were supposed to use
iptables, but Netfilter is really complex and
not that handy, so we
preferred to patch the kernel to avoid the
iptables layer. It's a really simple kernel patch that allows
us to create a bunch of private IPs and
cap them in the cgroup, authorizing access only if
the private IP is declared as usable by the
user. So it's pretty easy to do: we just compare some
GID numbers and registered addresses, and
that's all. It's not that complex, but if you want to do
it another way, iptables is definitely a good option if
you don't want to patch the kernel itself.
Another really good feature is the ability to forbid process read access to
the other users. It's just an option to enable,
and if you do that, when a user
runs ps to analyze the different processes
running on the system, they won't be
able to see the other users' processes; they are
sandboxed in their own environment.
You don't want the different users on your platform to
be able to read the processes of the other users on the
platform. So it's pretty easy.
Cgroups come with another system, called namespaces.
It's another layer that lives alongside cgroups and allows
you to partition kernel resources:
network interfaces, that kind of thing. And it's
the same thing: it's used in every container technology, Docker,
LXC, and so on. It's really useful, but it's not
mandatory if you just want to run
a simple platform of isolated processes. We do not use it
at alwaysdata; it's really overkill
for our use cases.
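
As a side note, a minimal sketch of namespace partitioning without any container runtime, assuming Python 3.12+ (which exposes the underlying unshare(2) syscall) and root privileges:

```python
# Give the current process its own, empty network namespace.
import os

os.unshare(os.CLONE_NEWNET)  # detach from the host's network interfaces

# From here on, this process only sees its own loopback interface:
# exactly the kind of partitioning container runtimes rely on.
```
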
So, do you need to use POSIX and cgroups instead of containers? You could do it,
because it's as easy as a mkdir and a tee instruction
to write into files in the POSIX filesystem. So it's really great to
use: it's fully agnostic, it's POSIX compatible, and it doesn't require you to
create images and push them to repos to pull them in
production later. You do not need some kind of orchestrator to deploy
and manage some kind of services. You could just use
a simple platform of isolated processes and run your different
components, or your different services if you want to target a microservices
architecture, to simply run a well-sandboxed,
easy-to-scale application without having to worry
about complex orchestrators. The fact is, you
don't need Kubernetes or K3s or whatever. You need
fair isolation on a reliable platform, and
that's all you need.
The good thing is that, using those technologies, we are ready for some kind
of bright future with WASM serverless-based
architectures: some kind of WebAssembly server running
with cgroups' POSIX capabilities, to definitely kill
the containers and run WebAssembly lambdas directly
on your server, in isolated processes,
relying on the security model of WebAssembly and
the capabilities of cgroups, and we will not need
containers anymore. This will be really great for
production workloads, but that is probably for another talk.
I'm Mad, a tech evangelist at alwaysdata, a cloud
provider. Cgroups is a technology we have been using
for 15 years, and it is something really, really useful and really
reliable; we do not run containers in production,
and we are really proud of it. Thank you very much. If you have any
questions, I'll be happy to talk to you on the Discord of the conference.
See you later.