Conf42 Kube Native 2023 - Online

Introduction to Ambient Mesh


Ambient mesh is a new mode in Istio. This talk is an introduction to the new architecture.


  • Abdel is a cloud developer advocate with Google. He works on Kubernetes and service mesh. Today he will talk about ambient mesh.
  • Istio is a service mesh tool. Its control plane, istiod, is a stateless deployment that runs in the istio-system namespace and can be scaled up and down depending on traffic. In Kubernetes, a Service is the way to implement service discovery and load balancing.
  • The concept of service mesh is not really new. It used to be implemented through proxies. Sidecars are invasive: they require restarting workloads in order for the proxy to be injected and used. Ambient mesh should stay compatible with existing sidecar-based Istio.
  • Istio took its CNI and made it work better with the ztunnel. With HBONE, ztunnels can tunnel connections through a single mTLS connection using HTTP CONNECT. According to the talk, this gives better performance than sidecars.
  • The Kubernetes community has worked on a new open source API called the Gateway API. The Gateway API is essentially a set of APIs that are going to be the next generation of Ingress, and it will eventually replace Ingress as an API. One of the key implementation details of the Gateway API is that you can do cross-namespace routing.


This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Thank you for having me. Today we're going to talk about ambient mesh. My name is Abdel. I'm a cloud developer advocate with Google. I work on Kubernetes and service mesh. I've been with the company for almost ten years. I'm also a co-host of a Kubernetes podcast called the Kubernetes Podcast from Google. And I'm also a CNCF ambassador, and that's my Twitter/X handle if you want to reach out about anything related to this topic. So today I'm going to do an introduction to ambient mesh. But before I get going, we have to understand what Istio is and what it does. So Istio is a service mesh tool. It's open source, it is part of the CNCF landscape. It's actually a graduated project in the CNCF. Like any Kubernetes tool, it basically follows the same architecture, which is based on a control plane and a data plane. In the case of Istio, the control plane is literally called istiod, which is a deployment. If you are familiar with Kubernetes, it's essentially a deployment that runs in the istio-system namespace, and it's completely stateless, so it can be scaled up and down depending on traffic. The data plane for Istio is what we call the proxies. These are based on an open source proxy called Envoy. So Envoy is a C++ proxy, which was written and open sourced by Lyft, and they have made it available for everybody to use. And the way Istio works is essentially you would deploy the control plane, then you would typically label namespaces in Kubernetes to say, I want this namespace to be part of the service mesh. And I'm going to explain why we say a service mesh later. And what would happen is the proxy will be automatically injected next to your workloads, and it will be set up in such a way that it transparently intercepts traffic coming in and out of your application. So in the example you see on the screen, the little rectangle, the gray rectangle, is the pod. Inside the pod you have the service, which is one container, and then the proxy is a second container.
It has a name. In Kubernetes we call it a sidecar. A sidecar is not a Kubernetes native object per se. Well, that's not quite correct: since Kubernetes 1.28, Kubernetes has implemented a way to handle sidecars in a more native way. But sidecar is just a common term. It's an agreed-on pattern where we decided that inside a pod, if you have two containers and one container is providing extra features, then we would call it a sidecar. So when your pod boots, the proxy would boot, it will connect to the Istio control plane, and download all its configuration, including any policies to enforce, any routing to do, where to send telemetry, et cetera. It also downloads the entire routing table, so it keeps in memory a list of all the other pods in Kubernetes, each represented by an IP and port. So the proxy is aware of all the other pods, and then of any certificates. And one of the key things that people use Istio for is mTLS, so mutual TLS, where you have certificates on both the client and the server. So this is how Istio stands today. It also has a set of what we call special proxies. One of them is the ingress gateway and the other one is the egress gateway. They are just standalone Envoy proxies that are also configured through the control plane. One of them is used for ingress, so all traffic coming from outside the service mesh to inside the service mesh, and you can also use it to enforce some policies on the perimeter. And then the egress gateway is used for traffic leaving the service mesh, which can also be used to implement some policy enforcement. Now why do we call this a service mesh? Where is the term mesh coming from? Well, if you're familiar with Kubernetes, in the Kubernetes space you would use a Service, capital S, as a way to implement service discovery and load balancing.
So if you have two applications, application A and application B, inside the Kubernetes cluster, you would create a Service for application B, and then that Service would create a DNS entry in kube-dns, CoreDNS, or whatever DNS you're using. And then you would use the FQDN of the Service in order to (a) discover all the pods behind the Service and (b) have a single point of entry toward the Service. Typically a Service gives you a stable VIP, a virtual IP, regardless of where you are in the cluster. So if application A is talking to application B, application A would use Service B for service discovery. It will get an IP address in return, it will send traffic to that IP address, and then some magic behind the scenes, which is typically implemented through iptables or other mechanisms, would implement load balancing, which is typically round robin. In a service mesh scenario, that's completely different. Services, which are again capital S Services, are still used for service discovery, so you still implement service discovery the same way from an application perspective. If you have service B, you have to create a Kubernetes Service for it, and then you would use that Kubernetes Service to call it from application A. But the communication path is completely different, because every proxy knows every other proxy. The service discovery is implemented the same way, but at the moment when application A sends traffic, that traffic is intercepted by the proxy, and the proxy sends traffic directly to the other proxies that represent service B, directly to their IP address and port. So the VIP is not used for traffic routing. That cluster IP created by the Service that you have created is not used in this case. And that's why we call it a service mesh, because you basically have a mesh of communication. Every proxy, or every pod, talks to every other pod.
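The difference between the two communication paths can be shown with a small toy model. This is only an illustrative sketch, not how Istio or kube-proxy are implemented: the registry, the IP addresses, and the `SidecarProxy` class are all hypothetical. It shows the application still discovering the Service by name and getting a VIP, while the proxy intercepts that traffic and picks a pod endpoint directly.

```python
import itertools

# Hypothetical registry: Service name -> stable VIP plus the pod endpoints behind it.
SERVICES = {
    "service-b": {
        "vip": "10.96.0.20",
        "endpoints": ["10.1.0.4:8080", "10.1.1.7:8080", "10.1.2.9:8080"],
    },
}


def resolve(service_name):
    """Service discovery as the application sees it: a name resolves to the VIP."""
    return SERVICES[service_name]["vip"]


class SidecarProxy:
    """Toy sidecar that holds the endpoint table pushed by the control plane."""

    def __init__(self, services):
        # One round-robin iterator per Service, over its pod endpoints.
        self._cycles = {
            name: itertools.cycle(svc["endpoints"]) for name, svc in services.items()
        }
        self._name_by_vip = {svc["vip"]: name for name, svc in services.items()}

    def route(self, dest_vip):
        """Intercept traffic addressed to a VIP and send it to a pod endpoint instead."""
        name = self._name_by_vip[dest_vip]
        return next(self._cycles[name])


vip = resolve("service-b")             # the app only ever knows the VIP
proxy = SidecarProxy(SERVICES)
picked = [proxy.route(vip) for _ in range(4)]
print("app dials:", vip)
print("proxy sends to:", picked)
```

In this model the VIP never appears as an actual destination, which is the point made in the talk: the Service is used for discovery, but the proxies talk to each other's pod addresses directly.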
So I explained a little bit how Istio works. Istio is used for a lot of things, including policy enforcement. You can do things like timeouts, retries, and circuit breakers. You can do things like authorization using JWT tokens, or using SPIFFE, which is one of the identity standards used in the Kubernetes ecosystem. You can do a lot of traffic shaping, like content-based routing, canaries, A/B testing, et cetera. So all of these things are implemented in the infrastructure layer. That's the key point with a service mesh like Istio: without it, if you want to implement any of these things as a developer, you will have to write code for it. With Istio, you basically let the network layer, if you want, handle all these things for you. So the app itself doesn't even have to be aware that there is a timeout, or a retry policy, or a circuit breaker, or any of these things, right? The concept of service mesh is not really new. It existed for a very long time. People had to do this kind of traffic routing, et cetera, and it used to be implemented through proxies. What a service mesh really introduced is just this concept of the sidecar that we have talked about, right? So sidecars give us a lot of things. They allowed us to implement smart network features in the infrastructure layer without having to implement them in the code. And while they are useful and important, sidecars have some complications. One of them is that they are very invasive. What do we mean by invasive? Well, imagine that you have a scenario where you don't have Istio, and the steps in order for you to implement Istio are that you would start by deploying the control plane. That's typically just a deployment. There are a bunch of CRDs that you have to deploy, because in the Istio world it has its own objects for traffic routing. So all these CRDs get created, and then you need to add existing applications to the service mesh.
Here I'm talking about a scenario where you're going from "I don't have a service mesh" to "I want to have Istio". If you're starting fresh, this is probably not a problem for you. But then what you would do is you would tag or label namespaces, and then you will have to restart your pods. And that's why we say it's invasive, because it requires restarting workloads in order for the proxy to be injected and used. And that's typically not a problem, depending on the scenario. If you don't want to restart your workloads, then it would be hard. And typically what people do is they wait until the next time they upgrade their Kubernetes clusters and then they install Istio, which is fine, except that you are basically implementing too many changes at once, and that's typically not recommended from a change management perspective. It also doesn't work with some protocols. Sidecar-based Istio doesn't implement TCP, sorry, it doesn't implement WebSockets. There are a bunch of things that are not implementable. Only HTTP communication is implementable. And then the last thing, and this has really been a contention point for people who have been using Istio for the last five years, is the resource requirements. In the last benchmark executed on Istio, I think 1.18, the result is something like 0.3 or 0.4 virtual CPU and around 40 or 50 megabytes of memory per sidecar for a service which is serving 1000 requests per second. Again, don't forget that there is a sidecar per pod. So for each pod there is your container plus the sidecar. So 0.35 or 0.4 vCPU and 50 megabytes of memory might not sound like a lot, but if you are running inside a cluster that contains 1000 containers or 2000 containers, that could add up. Essentially, the moment you add Istio, you're doubling the number of containers in your cluster, and that's an issue.
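To make "that could add up" concrete, here is a back-of-the-envelope sketch using the per-sidecar figures quoted in the talk (about 0.35 vCPU and 50 MB at 1000 requests per second). It assumes one sidecar per pod and that every sidecar sits at that load, so it should be read as a rough upper bound rather than a measurement. The function name and defaults are made up for illustration.

```python
def sidecar_overhead(pods, vcpu_per_sidecar=0.35, mem_mb_per_sidecar=50):
    """Estimate the extra resources consumed by sidecars alone.

    Assumes one sidecar per pod, each using the per-sidecar figures
    quoted in the talk for a service handling 1000 requests per second.
    """
    total_vcpu = pods * vcpu_per_sidecar
    total_mem_gb = pods * mem_mb_per_sidecar / 1000  # MB -> GB (decimal)
    return total_vcpu, total_mem_gb


for pods in (1000, 2000):
    vcpu, mem_gb = sidecar_overhead(pods)
    print(f"{pods} pods: ~{vcpu:.0f} vCPU and ~{mem_gb:.0f} GB of memory for sidecars")
```

Under these assumptions, a 1000-pod cluster spends on the order of hundreds of vCPUs and tens of gigabytes of memory on proxies before any application code runs.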
So the community and the maintainers of Istio got together and tried to figure out a way to solve this. And they came up with this idea of ambient mesh. So the whole idea of ambient mesh is to change the data path. The control plane remains the same. The data path, the way we insert intelligence into the network, had to meet a set of requirements. One of them is that it has to be non-disruptive to workloads. In other terms, adding or removing the proxies, or whatever is going to replace the proxies, should be transparent to the workloads. Ambient mesh should be compatible with sidecar-based Istio, because we are aware that the way people will adopt ambient mesh will be through a migration of existing Istio workloads, and that's a very complicated thing to do. So one of the requirements is traffic interoperability between traditional sidecars and no sidecars, which is what ambient mesh is aiming to do. And then they wanted enabling and disabling it to be simple. So in the new architecture for ambient mesh, sidecars are gone and they are replaced by two types of proxies. Those proxies treat the mesh as two different layers: a secure overlay layer and a layer 7 processing layer. The secure layer is implemented through a proxy per node. So it's a multi-tenant, per-node proxy. There is no per-pod proxy anymore. It's per node, and it's called ztunnel. Ztunnel runs as a DaemonSet, so it runs one proxy per node. It's completely stateless, which means it can be scaled up and down. It has built-in authentication and encryption, and it implements some of the layer 4 policies and telemetry. If we want full layer 7 policies, like authorization policies for example, which require something to look at the HTTP headers to implement the authorization, then they added another thing called the waypoint proxy. This is a per-namespace proxy which still uses Envoy.
So ztunnel is a newly developed proxy written in Rust, but the waypoint proxy is based on Envoy. And then they used this new protocol called HBONE for encryption and authentication. Now in this new architecture, there are a bunch of things that we have managed to solve. One of them is that if people only want to do mTLS, then you don't have to implement the layer 7 processing layer. You don't need it. You can just disable it and only have the secure overlay layer through the ztunnel. If you want some basic traffic management, like TCP routing, et cetera, you can also do that. By the way, I said earlier that Istio doesn't support TCP. That was wrong. It doesn't support UDP, not TCP. And then if you want some advanced traffic management or security with authorization policies, then you can implement the layer 7 processing layer. And through this new architecture, the aim is also to make adopting service mesh as easy as possible. So this is what the ztunnel looks like. I talked about the fact that ztunnel runs per node. You can consider the little purple squares as the nodes. Each node has a ztunnel running in it as a DaemonSet, completely scalable up and down if there is a lot of traffic. All the containers in the pods, which now don't have sidecars anymore, send traffic to ztunnel. And then the ztunnel implements HTTP tunneling as an overlay to basically encrypt traffic as it goes between two nodes. One of the things ztunnel also does is that it keeps the identity of the pod. So if you have pod C1 sending traffic to pod S1, then S1 will see the traffic coming with the identity of pod C1. So it will see the service account, essentially. That's what I'm trying to say. Then if you want to add those layer 7 policies, then we create the waypoint proxy for you, or you will have to deploy it manually. And then if there are any policies to be enforced, they will be enforced by the waypoint proxy.
Again, the waypoint proxy runs per namespace, so there are no more sidecars. It's just a special proxy that runs somewhere and is responsible for one namespace, and it's completely scalable as well if there is more traffic, because it's stateless. So we talked quickly about how you would deploy the Istio service mesh in the traditional deployment model, which uses sidecars. You would deploy the control plane, tag namespaces, and then restart the workloads to inject the sidecar. In the new mode with ambient mesh, you don't have to do any of that. You deploy the control plane, obviously, and then you can just enable the ztunnel, and the ztunnel will be implemented through the network CNI, because Istio does have a CNI. So they basically took the CNI and made it work better with the ztunnel. So what is HBONE? Traditionally, with the proxies in sidecar-based Istio, every connection from the client creates a new TCP connection between the proxies. So you see here I have two containers, C1 and S1. Container C1 talks to three different ports, and for each of those ports there is a new TCP connection created between the proxies. With HBONE, one of the things this protocol can do is tunnel those connections through a single mTLS connection using HTTP CONNECT. So it's actually better performance than sidecars. And by the way, although this is what HBONE is able to do, the picture is actually not visually correct, because there are no sidecars. It's the ztunnels talking to each other. The ztunnels will have a single mTLS connection, and they will tunnel all traffic through that connection. I don't have a demo, so I just want to quickly talk about some things that are important to keep in mind. In Istio with the existing sidecar-based proxy, this is typically how you would do traffic management.
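Going back to the two onboarding models compared at the start of this passage, the difference often comes down to which label a namespace carries. The sketch below builds two Namespace manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The label keys are the ones documented by Istio (`istio-injection` for sidecar injection and `istio.io/dataplane-mode` for ambient); they are not spelled out in the talk, and the namespace names are hypothetical.

```python
import json


def namespace(name, labels):
    """Build a minimal Kubernetes Namespace manifest with the given labels."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": labels},
    }


# Sidecar mode: pods created after labeling get a proxy injected,
# so existing workloads must be restarted to join the mesh.
sidecar_ns = namespace("shop-sidecar", {"istio-injection": "enabled"})

# Ambient mode: traffic from pods in the namespace is redirected to the
# node's ztunnel, with no sidecar injection and no pod restart.
ambient_ns = namespace("shop-ambient", {"istio.io/dataplane-mode": "ambient"})

for manifest in (sidecar_ns, ambient_ns):
    print(json.dumps(manifest, indent=2))
```

Removing the ambient label takes the namespace back out of the mesh, which lines up with the talk's requirement that enabling and disabling should be simple.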
So if you're familiar with Kubernetes, you know that you create deployments and services and so on. But if you add Istio, then you remember all the CRDs I talked about. All these CRDs give you objects that allow you to do traffic management in Istio. So here is an example. I have a virtual service and a destination rule. We have service A on one side and service B on the other side, and then we added service B version two. And I want to send part of the traffic from service A to service B version two, in this case 5%. I can't do that with Kubernetes natively. So what I have to do is deploy what we call destination rules. What destination rules essentially do is create virtual destinations, not to be confused with the actual object called VirtualService. They basically take service B v1 and service B v2 and make them look like two different destinations. And then with the virtual service, you can say, I want 5% to be sent to v2 and 95% to be sent to v1. Right? And because of the mesh concept I talked about earlier, the sidecar on the service A side is able to do that fine-grained tuning of sending traffic between A and B. What happened over the last few years or so is that the Kubernetes community has worked on a new open source API called the Gateway API. So the Gateway API is essentially a set of APIs that are going to be the next generation of Ingress. They will eventually replace Ingress as an API. And the Gateway API was implemented with a bunch of lessons learned from the Ingress API. One of them is being able to do things natively in the API itself instead of relying on extra CRDs or extra annotations.
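The 95/5 split between service B v1 and v2 described above could be expressed with a DestinationRule and a VirtualService roughly like this. This is a minimal sketch, not the manifests shown in the talk: the Service name `service-b` and the `version` pod labels are assumptions, and the `networking.istio.io/v1beta1` API version may differ depending on the Istio release installed. The manifests are built as Python dictionaries and printed as JSON.

```python
import json

HOST = "service-b"  # hypothetical Kubernetes Service name

# The DestinationRule carves the pods behind the Service into named subsets,
# here assumed to be distinguished by a "version" pod label.
destination_rule = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "DestinationRule",
    "metadata": {"name": "service-b"},
    "spec": {
        "host": HOST,
        "subsets": [
            {"name": "v1", "labels": {"version": "v1"}},
            {"name": "v2", "labels": {"version": "v2"}},
        ],
    },
}


def weighted_route(subset, weight):
    """One weighted destination entry inside a VirtualService HTTP route."""
    return {"destination": {"host": HOST, "subset": subset}, "weight": weight}


# The VirtualService splits traffic across the subsets by weight.
virtual_service = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "VirtualService",
    "metadata": {"name": "service-b"},
    "spec": {
        "hosts": [HOST],
        "http": [{"route": [weighted_route("v1", 95), weighted_route("v2", 5)]}],
    },
}

for manifest in (destination_rule, virtual_service):
    print(json.dumps(manifest, indent=2))
```

The weights across the destinations of a single route are expected to add up to 100, which is why the pair is 95 and 5.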
If you have implemented Ingress in Kubernetes, you would know that it can get very long, because the Ingress API in Kubernetes covers the lowest common denominator across all cloud providers and all the open source tools that exist. And it is up to each cloud provider, each open source tool, each API gateway, each whatever, to add the layer of customization they need. And those annotations that you see in an Ingress object are typically not compatible with each other. The Gateway API was designed to have a single standard way of implementing most of what people care about, things like routing rules, path-based and host-based routing, and those kinds of things, right? And so the Gateway API comes as three different objects. You have what we call a GatewayClass, a Gateway, and an HTTPRoute. There are also TCP routes and TLS routes. A GatewayClass is essentially something that the cloud provider implements or installs for you, and it defines the type of load balancer. The Gateway object creates an actual load balancer. The HTTPRoute is what maps the actual service, or the backend, to that load balancer. You can have multiple personas deploying these things. In the Ingress space, it's up to the service owner inside the namespace to deploy the Ingress object to expose their application outside of the cluster. Here, you can have a platform admin implement the load balancers for you, and then it's up to each service owner to implement their own routing rules. Also, one of the key implementation details of the Gateway API is the fact that you can do cross-namespace routing, which you couldn't do with Ingress. This is just an example where you have a Gateway object called foo, which is using a GatewayClass provided by the cloud provider or by the infrastructure provider.
In that foo Gateway object, which deploys the load balancer, you can decide what the domain is, you can have TLS certificates, you can have policies, et cetera. And then you can allow the store developer and the site developer, which are two different namespaces and two different apps, to use an HTTPRoute to define how traffic gets from the load balancer to their backend. This is what an object looks like, and there is a reason why I'm talking about this in the context of ambient mesh. So this is an example where you have an HTTPRoute that says, if I have a rule that matches the canary header, then I send the request to the canary version of the service. Otherwise I just want to send it to the existing version. You can also do things like weight-based balancing, like 80% and 20%, stuff like that. So what's happening right now is that the Istio community has decided that they are going to take the Gateway API and use it as a way to do traffic management. There are multiple reasons for this. The main reason, the straightforward reason, is that no one needs more CRDs. So we're trying to get rid of CRDs. That's reason number one. Reason two: since the Gateway API already comes with a bunch of those routing rules natively implemented through the API itself, Istio has decided, well, if this is the way forward for Kubernetes, and eventually at some point all Kubernetes clusters will have the Gateway API installed out of the box, because it is an upstream API like Ingress is, then we might as well leverage it to implement this. And so as of today, both Istio and Linkerd actually support the Gateway API. So you can just use the Gateway API to define all your routes and routing rules instead of using the built-in CRDs of Istio or Linkerd. And there are more to come down the path. That's it. I hope this was useful as a basic introduction to ambient mesh.
I know I talk a lot and I talk very fast, so maybe you can just go back and slow it down a little bit. In the slides and in the show notes there are a bunch of links to material to read, and I hope that was useful. Don't forget to reach out to me on Twitter if you have any questions or if you need any help, and subscribe to the podcast. Thank you.
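For readers who did not see the slide, the canary HTTPRoute described in the talk could look roughly like the sketch below, built as Python dictionaries and printed as JSON. The field names follow the upstream Gateway API HTTPRoute schema. Everything else is an assumption made for illustration: the header name `env`, the hostname, the namespaces, and the backend Service names. The API version (`v1` here) depends on which Gateway API release is installed in the cluster.

```python
import json


def backend(name, port=8080, weight=None):
    """Build a backendRef entry; the weight is only set for traffic splitting."""
    ref = {"name": name, "port": port}
    if weight is not None:
        ref["weight"] = weight
    return ref


http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "store", "namespace": "store"},
    "spec": {
        # Attach to the shared "foo" Gateway, assumed to live in another namespace.
        "parentRefs": [{"name": "foo", "namespace": "infra"}],
        "hostnames": ["store.example.com"],
        "rules": [
            {
                # Requests carrying the canary header go to the canary backend.
                "matches": [
                    {"headers": [{"type": "Exact", "name": "env", "value": "canary"}]}
                ],
                "backendRefs": [backend("store-canary")],
            },
            # Everything else goes to the existing version.
            {"backendRefs": [backend("store")]},
        ],
    },
}

# Alternative rule: weight-based splitting, 80% existing and 20% canary.
weighted_rule = {
    "backendRefs": [backend("store", weight=80), backend("store-canary", weight=20)]
}

print(json.dumps(http_route, indent=2))
print(json.dumps(weighted_rule, indent=2))
```

Attaching a route from one namespace to a Gateway in another only works if the Gateway's listeners allow routes from that namespace, which is how the platform admin keeps control of the shared load balancer.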

Abdel Sghiouar

Senior Cloud Developer Advocate @ Google
