Conf42 Cloud Native 2025 - Online

- premiere 5PM GMT

Life of a packet in Amazon EKS


Abstract

Ever wonder how packets navigate the Amazon EKS maze? Join me for a wild ride from internet to pod, demystifying Route53, ALB, and VPC magic along the way. No more being stuck between teams during outages – become the go-to troubleshooter. It’s like ‘The Magic School Bus’ for cloud networking!


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Welcome to this Conf42 Cloud Native session on the life of a packet in Amazon EKS. My name is Dumlu Timuralp. I'm a solutions architect at AWS, helping global financial services customers architect, scale, and optimize their business applications on AWS Cloud. My background is in networking and security infrastructure engineering, and I've been in the industry for over 20 years. Throughout my career I've worked at various systems integrators and vendors such as Nortel Networks, Brocade, and VMware, in roles like network solutions architect, network support engineer, solutions engineer, and solutions architect. In this session, I will walk you through various packet walks in Amazon EKS, diving deep into container network traffic forwarding in Kubernetes. Here's a quick disclaimer: all the points I provide in this session I've observed and obtained using public tools, and anything I mention represents my own opinions, not my employer's. I'm going to quickly touch upon Kubernetes and Amazon EKS high-level architecture, talk about the Kubernetes network model and pod connectivity, do deep-dive packet walks, and then introduce Kubernetes Services and Ingress. With those concepts introduced, I'll do some more packet walks. There are things I kept out of scope in the interest of time, but also to emphasize the foundational concepts for you to focus on. So let's start with the Kubernetes architecture. Kubernetes is comprised of two main components: a highly available control plane, made up of two or more nodes, and a data plane, which could be comprised of thousands of nodes. Within the control plane there's the API server, which exposes the Kubernetes API to the external world; it's the front end for the Kubernetes control plane. Then there is etcd, the key-value store that keeps all the configuration and state of the Kubernetes cluster. There is the scheduler, which decides on which node an application pod should run. There is the controller manager, which is comprised of various controllers; in Kubernetes, a controller's responsibility is to make sure that the actual resources implemented in the cluster always match the desired state, which the admin or developer defines. There is the cloud controller manager, which embeds cloud-specific control logic; it allows the Kubernetes cluster to provision, monitor, and remove the cloud-native resources required for the operation of the cluster. All these components in the control plane are implemented with high availability, and there's a leader-election process that takes place for each of them. Within the data plane, you have the worker nodes where your application workloads run in the form of Kubernetes pods. Keep in mind that a pod in Kubernetes is the smallest deployment unit, and a pod can consist of multiple containers. There's an agent called kubelet running on each node, and it's responsible for running the pods assigned to the node and checking their health. And lastly, there is the kube-proxy agent running on each node. It configures and maintains network rules for traffic destined to a Kubernetes Service, a concept which I'm going to explain later in the session. Let's now look at how this architecture is implemented in Amazon EKS.
The control plane is implemented in a VPC which is owned and managed by the Amazon EKS service. It spans multiple AZs. It runs a minimum of two API server instances, which also run the kube-scheduler and kube-controller-manager components. There are three etcd instances, spread across all three AZs. One thing to mention is that these control plane nodes actually run in an AWS EC2 Auto Scaling group; for simplicity, I kept that out of the scope of this diagram. The Kubernetes API is exposed to the public internet by an AWS Network Load Balancer. Next is the data plane. In the data plane, you run your worker nodes in your own VPC. These worker nodes communicate with the Kubernetes API through cross-account ENIs, also called X-ENIs, that are implemented across at least two AZs. Let's explain the Kubernetes network model. Kubernetes imposes a few fundamental requirements: each pod gets its own IP address; pods can communicate with each other without NAT (network address translation); and agents on a node can communicate with all the pods on that node. A pod can consist of multiple containers. A pod has a loopback interface, and the containers in a pod communicate with each other through this loopback interface; this is also called localhost communication. The pod also has an eth0 interface, and the containers in the pod communicate with other resources through this eth0 interface. Let's now look at how pod connectivity works. You have the network interface of the node itself, which lives in the root network namespace of the node. Then you have a separate, dedicated network namespace for the pod. The pod network namespace is connected to the root network namespace through a Linux construct called a veth pair. The same approach is used for the other pods on the node as well, and the way the veth interfaces connect to the Linux IP stack differs across different solutions. This is because various compute environments may have different connectivity characteristics and requirements, and for that reason Kubernetes itself does not own the configuration of the pod connectivity part. Instead, there's a specification called the Container Network Interface (CNI in short), which leverages plugins to implement pod connectivity. There are default CNI plugins like loopback, bridge, and ipvlan, as well as third-party plugins like Calico, Cilium, or the Amazon VPC CNI. A CNI plugin is basically an executable file which runs on each node, and it is responsible for various tasks: it adds and deletes the network interface of the pod, it calls the IPAM plugin to assign an IP to the network interface of the pod, and it sets up the veth pair to connect the pod network namespace to the host. This is how it looks with EKS and what tasks the VPC CNI performs. The VPC CNI implements pod connectivity. It leverages either the secondary IP address or prefix mode capabilities of the EC2 service in order to assign IPs to pods from the VPC CIDR. In essence, pod IPs are mapped to specific ENIs, and as scale grows the VPC CNI also needs to add or delete ENIs on the node to make sure there are enough available IPs for the pods. The VPC CNI also configures the respective routing entries on the node, and it configures routing and ARP entries in the pod network namespace as well.
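As an illustration of the pod network model described above, here is a minimal, hypothetical pod manifest (the names and images are placeholders, not taken from the talk): both containers share the pod's single network namespace, so they reach each other over localhost, while everything outside the pod reaches it through its eth0 IP.

```yaml
# Hypothetical two-container pod: both containers share one network
# namespace (one pod IP, one lo, one eth0), so they can talk to each
# other over localhost without any NAT.
apiVersion: v1
kind: Pod
metadata:
  name: app-1-pod           # placeholder name
spec:
  containers:
    - name: web
      image: nginx:1.25     # placeholder image
      ports:
        - containerPort: 80
    - name: sidecar
      image: busybox:1.36   # placeholder image
      # The sidecar reaches the web container at http://127.0.0.1:80
      command: ["sh", "-c", "while true; do wget -qO- http://127.0.0.1:80 >/dev/null; sleep 30; done"]
```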
Let's examine the various interfaces on the node and the pod. We have two ENIs on this node, and when you look at the pod's network interfaces, there's a loopback interface and an eth0 interface, as mentioned previously. Notice that the subnet mask of the eth0 IP in this pod is /32. The routing table of the pod is set up with a default gateway of 169.254.1.1, and this next-hop IP is configured as an on-link next hop with the scope link parameter. The ARP entry for this default gateway IP is installed as a static, manually configured entry by the VPC CNI; notice the M notation in the Flags Mask column. And when you check the interfaces of the node, you notice that the default gateway MAC address you saw in the ARP table of the pod earlier actually belongs to the veth interface in the root network namespace. So the interface ID you see here on the top right is the actual veth ID. We will now examine ingress and egress traffic on a node with pods. In this diagram, pod 51's IP is a secondary IP on ENI 1 and pod 61's IP is a secondary IP on ENI 2. Let's imagine traffic coming ingress to the node, destined for pod 61's IP. The node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in the main routing table. In the main routing table the traffic matches the highlighted entry and gets forwarded to pod 61. At this stage, please notice the third entry in this main routing table, because it's for the directly connected subnet on the eth0 interface; this will become relevant in the next packet walk. Next, let's imagine traffic generated by pod 61, sent to a destination IP in the VPC. The node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in routing table number 2. In that routing table there is only a single entry, a default gateway through eth1, which is ENI 2, the secondary ENI on this node. So the traffic will be sent through ENI 2 to the default gateway, and this is regardless of whether the destination of the traffic is on the same subnet as pod 61 or not, because there's a single policy rule that points to routing table number 2, and in that routing table the single entry just points out through eth1, the ENI 2 on this node. Let's now look at pod-to-pod traffic on the same node. Pod 51 generates traffic destined for pod 61. Notice the source MAC is pod 51's MAC, and the destination MAC is the veth interface's MAC. The node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in the main routing table. In the main routing table the traffic matches the highlighted entry and gets forwarded to pod 61 through the respective veth interface; the source MAC becomes the veth MAC and the destination MAC becomes pod 61's MAC. This time we will look at pod-to-pod traffic across nodes. Pod 51 generates traffic destined for pod 81. The node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in the main routing table, and in the main routing table the traffic matches the highlighted entry
because pod 81's IP is on the same subnet as the ENI 1 interface of the node, and the traffic gets forwarded by the node. Notice the source and destination MACs become the ENI MACs of the respective nodes. The receiving node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in the main routing table. In the main routing table the traffic matches the highlighted entry and gets forwarded to pod 81 through the respective veth interface. Here, the source MAC becomes the veth MAC and the destination MAC becomes pod 81's MAC. Now let's have a look at the return flow. Pod 81 responds to the previous request from pod 51. The node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in routing table number 2. In routing table 2 the traffic matches the highlighted entry, so the node actually sends the traffic to the default gateway, even though pod 81 and pod 51 are in the same subnet. This means that if a pod's IP is a secondary IP of a secondary ENI, the traffic is always sent through the default gateway; this applies even if the pod and the destination are on the same subnet, as you can see on this diagram. Then the default gateway forwards the traffic to the other node. The default gateway here is the VPC router, which is maintained and run by AWS, implicitly for you as the end user. The receiving node performs a route lookup in its policy-based routing table and matches the traffic to the highlighted entry, and that entry requires a route lookup in the main routing table. In the main routing table the traffic matches the highlighted entry, and finally the traffic gets forwarded to pod 51 through the respective veth interface. So now I'm going to explain the concept called a Kubernetes Service. Imagine we have the Kubernetes Deployment manifest shown at the top. This Deployment manifest represents a stateless microservice application which has three identical pods, also called replicas. These pods have labels as metadata, which are just like resource tags in AWS, and each pod is running the same application on the same port, like TCP port 80. In a typical scenario, you would expose your application to other applications and sometimes to external clients. In the Kubernetes context this brings up an interesting point, because you need to consider how to make the individual pods of the application accessible in an efficient and optimal way. In Kubernetes, every pod gets its own IP address, but pods are considered ephemeral rather than durable entities. This means that when any of the pods in this application gets killed or terminated as a result of a failure or an upgrade, the Deployment controller automatically instantiates new pods as replacements in order to match the actual state to the desired state, which is the controller's main job. However, the new pods come up with new IPs, and this sequence of events might happen thousands of times a day due to the nature of agile software development practices. Hence, there must be a stable and consistent way to access these microservice applications at a single address, every time, in a seamless manner. That takes us to the next piece: the Kubernetes Service.
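The exact manifest from the slide isn't reproduced in this transcript, but a minimal Deployment along the lines described (three replicas, a label that a Service can later select on, the app listening on TCP 80) might look like this; all names and the image are placeholders.

```yaml
# Hypothetical Deployment matching the description: three identical
# replicas, labelled so a Service can select them, serving on TCP 80.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-1                 # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-1              # placeholder label, reused by the Service selector later
  template:
    metadata:
      labels:
        app: app-1
    spec:
      containers:
        - name: web
          image: nginx:1.25   # placeholder image
          ports:
            - containerPort: 80
```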
A Kubernetes Service, in essence, is an abstraction which groups pods together based on label selectors. Here you can see that the pods carrying the app-1 label become part of the Kubernetes Service. The Service provides a front end for those pods, and the pods are called endpoints in Kubernetes Service terminology. Yet another controller, the Endpoints controller, is responsible for keeping an up-to-date list of these endpoints in case of any failures or new instantiations, making the application highly resilient. The Kubernetes Service construct supports the TCP, UDP, and SCTP protocols. Any request that comes inbound to the DNS name or IP address of the Service gets forwarded to one of the pods that is part of (or backing) that Service, so it's just like a load balancer. There are different Kubernetes Service types, such as ClusterIP, NodePort, and LoadBalancer. These Service types address different communication patterns, and we will now go through those patterns one by one. First, Service type ClusterIP. This is the default Service type in Kubernetes. It's used to expose your application on a Service virtual IP address that is only internal to the cluster. This virtual IP is the same on each node, and it's not shared or announced anywhere other than within the cluster itself, so access to this Service is only from within the cluster. Let's dig further into how it's implemented. When you post a ClusterIP type of Service manifest to the Kubernetes API, a component called kube-proxy, which exists on all the nodes, watches the Kubernetes API and detects that a new Service has just been configured. It then configures the virtual IP address of the Service and a set of forwarding rules in Linux iptables on every node, and at this stage the Service is available for access from within the cluster. On a side note, Kubernetes has its own DNS, which automatically assigns an internal DNS name to each Service using a predefined format. When a request comes to the Service virtual IP on a given node, it will be destination-NATed to the IP address of one of the pods that is part of that Service; the algorithm for this is round robin. As another side note, the kube-proxy functionality can be implemented using other data plane options such as Linux IPVS (IP Virtual Server) or eBPF (extended Berkeley Packet Filter), which can enhance the characteristics of the load balancing algorithm and decision, although for simplicity that's not depicted in this diagram. Keep in mind that the pod belonging to application 2 and the pod belonging to application 1 could be on the same node, and this flow would still be the same. So let's do a packet walk in this scenario to understand it better. Pod 51 generates traffic destined to the Service virtual IP. The iptables rules on node 51 perform three tasks at this stage. The first is load balancing the traffic to one of the pod IPs that is part of that Service, and in this instance it sends the traffic to pod 71. It's extremely important to note that, as shown in this diagram, even if there were a pod belonging to that Service running on the same node (node 51), the iptables load balancing could still send the traffic to pod 71 or any other remote pod; iptables does not differentiate between local and remote pods that are part of that Service.
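A minimal ClusterIP Service selecting the pods from the hypothetical Deployment above could look like this; the name, label, and ports are assumptions for illustration.

```yaml
# Hypothetical ClusterIP Service: a stable virtual IP and DNS name in
# front of the pods carrying the app: app-1 label.
apiVersion: v1
kind: Service
metadata:
  name: app-1-svc            # placeholder; cluster DNS derives the Service name from this
spec:
  type: ClusterIP            # default type; reachable only from inside the cluster
  selector:
    app: app-1               # must match the Deployment's pod labels
  ports:
    - protocol: TCP
      port: 80               # port exposed on the Service virtual IP
      targetPort: 80         # containerPort on the backing pods
```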
The second task is that iptables also applies destination NAT to the traffic. The third task: since iptables is a stateful engine, it needs to keep state of the flows, so it marks this flow to be able to match any return traffic. Node 51 then forwards the traffic to node 71, and notice that the destination IP has become pod 71's IP. On node 71 the traffic gets forwarded to pod 71. Keep in mind that in this scenario iptables works alongside all the policy routing tables and the other routing table logic that we explained previously. Another thing worth mentioning is that pod 51 would not know the Service's virtual IP out of the box; it would first need to resolve the Service's DNS name to its virtual IP. For that, it's important to highlight that the Kubernetes DNS itself actually runs behind a ClusterIP Service as well. To walk you through it really quickly: pod 51 in this topology generates a DNS request for the respective Service's DNS name, that DNS request is sent to the Kubernetes DNS Service's virtual IP, it gets load balanced to one of the DNS pods in the cluster, and then pod 51 receives the virtual IP address of the Kubernetes Service it wanted to access in the first place. Let's have a look at the return flow here. Pod 71 responds to the previous request from pod 51, and node 71 forwards the traffic to node 51. The iptables rules on node 51 first identify this traffic as return traffic of an existing flow, so they apply source NAT (SNAT in short) to the traffic, swapping the source IP of the packet with the virtual IP of the Service, because pod 51 was originally communicating with the Kubernetes Service virtual IP, not with pod 71 directly. Then the traffic gets forwarded to pod 51. Let's explain the NodePort type of Service. NodePort is basically used to expose your applications to external resources, for example for testing purposes. It's built on top of ClusterIP: all the characteristics I mentioned for the ClusterIP Service type apply here as well, but there are additional rules that enable the NodePort capability, so let's explain how that works. You post a NodePort type of Service manifest to the API. kube-proxy watches the API and detects that there is a new Service, and then it configures forwarding rules and network address translation rules in Linux iptables. But this time it configures additional rules for two reasons: first, to process the requests that come to a specific port on the node, and second, to forward those requests to the actual Kubernetes Service. Keep in mind that the port configured on the nodes is the same on every node, and that port has to be from a specific range, which is shown on the top right-hand side of the diagram here. So let's do a packet walk with the Kubernetes Service type NodePort. This time the request comes from an external client or service, destined to the node port on node 51. The iptables rules on node 51 perform four tasks this time. First, they load balance the traffic to one of the pod IPs that is part of that Service, and as mentioned in the ClusterIP section, please keep in mind that even when there is a local pod on node 51 that is part of that Kubernetes Service, iptables may still forward the traffic to a remote pod. In this instance, let's assume that it sends the traffic to pod 71.
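A NodePort variant of the same hypothetical Service might look like the following; the nodePort value is an assumption, and if it is omitted Kubernetes picks one from the node port range itself (30000-32767 by default).

```yaml
# Hypothetical NodePort Service: builds on ClusterIP and additionally
# opens the same port on every node, forwarding it to the Service.
apiVersion: v1
kind: Service
metadata:
  name: app-1-nodeport       # placeholder name
spec:
  type: NodePort
  selector:
    app: app-1
  ports:
    - protocol: TCP
      port: 80               # ClusterIP port
      targetPort: 80         # pod port
      nodePort: 30080        # assumed value; must fall inside the node port range
```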
It then applies destination NAT (DNAT) to the traffic, and it also has to apply SNAT (source NAT) to the traffic to keep the flow symmetric, so that the client does not receive the response traffic from the pod IP directly, which would break the flow, because the client is originally communicating with the node port on node 51. Since iptables keeps the state of the flows, the fourth task is that iptables marks this flow to be able to match any future return traffic. Node 51 then forwards the traffic to node 71, and node 71 forwards the traffic locally to pod 71. It's worth mentioning that the client IP would always be SNATed, even when the destination pod is running locally on node 51; something to keep in mind. The return traffic would look like this: pod 71 responds to the request, and node 71 forwards the traffic to node 51. The iptables rules on node 51 first identify the traffic as return traffic of an existing flow, and then reverse both the destination NAT and the source NAT, swapping the IPs back so that the client sees the response coming from the node IP and node port it originally targeted, and node 51 sends the traffic back to the client. The downside of NodePort is that you are supposed to send the requests explicitly to a node IP on the node port. This means you need to figure out the IP addresses of the individual nodes and keep track of them in case one of the nodes fails, or when you need to perform upgrades, et cetera. An additional consideration would be how to distribute the client requests across different nodes. Sounds like we need an external load balancer, because that would be the perfect solution to cover the node failure and upgrade scenarios. Let's look at how the LoadBalancer type of Service is used. We use this type of Service to expose applications to clients external to the cluster. It's built on top of NodePort, and when you post a LoadBalancer type of Service manifest to the API, everything we have explained previously is configured within the Kubernetes cluster and works exactly the same way; to be more specific, requests that hit the node port on any of the nodes get forwarded to one of the pods. But this time Kubernetes expects an external component to detect the new Service and configure an external load balancer on the fly, so that the load balancer starts forwarding requests from clients to the nodes. That external component is a Kubernetes controller called the service controller, which comes as part of the open source Kubernetes distribution; it's actually a controller bundled within the controller manager component that I mentioned when introducing the high-level Kubernetes architecture. AWS currently maintains two such controllers. One is the in-tree service controller I just mentioned, which comes with Kubernetes out of the box. The other one is the AWS Load Balancer Controller, which is a Kubernetes Special Interest Group (SIG) project on GitHub that anyone can contribute to. You can install the AWS Load Balancer Controller to replace the service controller, as it has additional capabilities such as Kubernetes Ingress support and target type IP support, both of which I'm going to explain later in this session. A Kubernetes Service is a layer 4 construct, so it does not address layer 7 load balancing.
Whenever you configure a Service of type LoadBalancer, the controller provisions an AWS Classic Load Balancer type of Elastic Load Balancer. The service controller that you see on this slide configures a Classic Load Balancer by default, and the Classic Load Balancer is a legacy load balancer type. It can also configure a Network Load Balancer, and the Network Load Balancer performs health checks against the node port on each node. Here you see that we are instructing the service controller to provision an NLB rather than a CLB, the legacy type of load balancer. In short, the Kubernetes Service type handles layer 4 traffic. So let's do a packet walk with the LoadBalancer scenario. The request comes from an external client or service, destined to the NLB listener port on the listener IP. The load balancer has all the nodes in its target group and performs health checks on the node port. An important thing to note here is that all nodes, including the ones which do not have a pod belonging to that respective Service, would look healthy in the Network Load Balancer target group, which means node 51 would also be a healthy target from the NLB's point of view, ready to receive traffic. So let's assume that the NLB forwards the traffic to node 51, and node 51 receives the traffic on its node port. The rest of the flow is almost identical to the NodePort flow that we saw earlier. The iptables rules on node 51 perform four tasks, and as you can see, there are no pods on node 51 that are part of the destination Kubernetes Service. When node 51 receives the traffic on its node port, iptables first load balances the traffic to one of the pod IPs that is part of that Service. Keep in mind that I'm intentionally showing the most suboptimal forwarding scenarios in this session. If there were a local pod on node 51 that is part of the respective Service, iptables might select that local pod as the destination, or it might still select pod 71 or any other remote pod; there is no guarantee that it will pick a pod on the same node where the traffic was received. Let's assume that iptables picks pod 71 as the destination pod backing the destination Service. Next, iptables performs DNAT (destination NAT) to swap the destination IP with pod 71's IP, and it also applies SNAT so that the source IP of the traffic is swapped with node 51's IP. Lastly, iptables marks this flow to be able to match any future return traffic. Node 51 then forwards the traffic to node 71, and node 71, when it receives the traffic, forwards it locally to pod 71. Now let's look at the return traffic. Pod 71 responds to the request, and node 71 forwards the traffic to node 51. The iptables rules on node 51 first identify this traffic as return traffic of an existing flow, and then reverse both the destination NAT and the source NAT on the traffic. Node 51 then sends the traffic back to the NLB, and the NLB responds back to the client. Let's now tackle this suboptimal traffic pattern that I keep repeating, from the NLB to node 51, which does not have any local pod that is part of that Service.
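The manifest from the slide isn't included in this transcript, but a LoadBalancer Service instructing the in-tree service controller to provision an NLB instead of the default Classic Load Balancer typically carries an annotation along these lines; the names are placeholders.

```yaml
# Hypothetical LoadBalancer Service: the annotation asks the in-tree
# service controller for an NLB instead of the default Classic ELB.
apiVersion: v1
kind: Service
metadata:
  name: app-1-nlb            # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: app-1
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
```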
Previously we saw that there could be traffic tromboning between the NLB, across nodes, and the pod. In addition, the traffic also gets source-NATed to achieve flow symmetry. In the cloud, that tromboning could manifest itself as cross-AZ traffic, traffic across availability zones, which could add latency and also incur data transfer charges. External traffic policy is a Kubernetes feature that can be configured in the spec section of the Kubernetes Service manifest. When it is set to Local, the load balancer sends requests only to the nodes which have pods that are part of that Kubernetes Service; basically, this feature makes sure that the traffic is forwarded only to the pods local to those nodes. Additionally, the traffic no longer gets SNATed by the node, so the application can see the actual client IP. Let's do a packet walk to better understand this scenario. The client sends the traffic to the NLB, and this time the load balancer performs an additional health check on a port different from the node port. If the respective node does not have any local pods that are part of that Kubernetes Service, then this additional health check fails on that node. In our case, this means that the NLB would never load balance the traffic to node 51. Let's assume that the NLB load balances the traffic to node 71 on its node port. This time, the iptables rules on node 71 load balance the traffic to the local pod, which is pod 71, and would not pick a remote pod; they just apply destination NAT. Lastly, the traffic gets forwarded to pod 71. But as you can see, there is still iptables processing on node 71, which swaps the destination IP with pod 71's IP. So can we make it even more efficient than that? Let's look at that. It is possible with a feature called target type IP of the AWS Elastic Load Balancers. If you deploy and use the AWS Load Balancer Controller instead of the service controller that comes out of the box with Kubernetes, then the target type IP feature is fully supported; basically, for this feature you need the AWS Load Balancer Controller. In this mode, the pods that are part of the Service become targets in the target group of the AWS Network Load Balancer, and neither the node port nor iptables is used in this scenario. However, you need to keep in mind the ELB (Elastic Load Balancing) service quotas, because if you have, for instance, a Service with hundreds of pods, this could lead to many targets being part of the same target group on the ELB. I'll provide the ELB service quotas link at the end of the session. So let's do a packet walk and see the difference here. The client sends traffic to the NLB listener IP on the listener port. The load balancer has the pods as targets in its target group, and let's assume that the NLB sends the traffic to pod 71. Node 71 receives the traffic, but this time iptables does not play any role for this traffic, because the destination IP is already pod 71's IP, so the traffic simply gets forwarded to pod 71.
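As a hedged sketch of how these two optimizations are usually expressed in the Service spec, here is one Service with externalTrafficPolicy set to Local and one using IP-mode targets through the AWS Load Balancer Controller; the annotation keys follow that controller's documented conventions, and all names are placeholders rather than anything shown in the talk.

```yaml
# Hypothetical Service with externalTrafficPolicy: Local: the NLB's extra
# health check only passes on nodes that run a local endpoint pod, and
# the node no longer SNATs the client IP.
apiVersion: v1
kind: Service
metadata:
  name: app-1-nlb-local      # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: app-1
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
# Hypothetical Service using the AWS Load Balancer Controller in IP target
# mode: pod IPs become the NLB targets, bypassing the node port and iptables.
apiVersion: v1
kind: Service
metadata:
  name: app-1-nlb-ip         # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external        # handled by the AWS Load Balancer Controller
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip   # register pod IPs as targets
spec:
  type: LoadBalancer
  selector:
    app: app-1
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
```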
It's worth mentioning that in this session the assumption is that default settings are used with the Network Load Balancer, so in this flow you might have noticed that the source IP becomes the NLB IP. If the application needs to track the client IP, though, this can be achieved by enabling the AWS Network Load Balancer attribute called preserve client IP, which is configured through another annotation in the Kubernetes Service manifest. Having said that, let's move on to the last topic, which is Ingress. You may be asking: wasn't what we just investigated with NodePort or LoadBalancer also ingress traffic? The way the Kubernetes community uses the term ingress is slightly different. Ingress is a different mechanism than Service. Ingress is used to implement layer 7 load balancing and routing rules, such as hostname or URL path based routing definitions. As seen in this manifest, it enables you to expose your Kubernetes Services through HTTP or HTTPS routes: here, HTTP requests to /order would be forwarded to the order Service, and requests to /rating would be forwarded to the rating Service, according to this Ingress manifest. That's the definition. When you apply this manifest to the Kubernetes API, just like in the previous section, Kubernetes expects an external component to implement a load balancer which can process the client traffic according to the layer 7 routing definitions you put in place. That external component is a custom controller provided by the respective solution. I talked about the AWS Load Balancer Controller provisioning an AWS Network Load Balancer previously; in this context, the AWS Load Balancer Controller is also capable of provisioning an AWS Application Load Balancer to fulfill the Kubernetes Ingress request. Let's do a quick and simple packet walk with Ingress as well. This will be a scenario where target type IP is used; as I mentioned previously, in this mode the pods that are part of the Service become the targets of the AWS Application Load Balancer, and neither the node port nor iptables is used. Let's do the packet walk and see what it looks like. We have an Ingress manifest applied, shown on the top left, and a ClusterIP Service manifest is also applied. The client sends the traffic to the ALB, and the ALB decides to load balance this traffic to pod 71. The ALB always applies SNAT to the request, so the source IP is always the ALB IP; you cannot preserve the client IP through an ALB, but you can use capabilities like the X-Forwarded-For HTTP header. Then, on the node itself, iptables is bypassed; it does not perform anything, and the packet gets forwarded to pod 71 on the node directly. Let's examine the return flow. Pod 71 responds to the traffic, and again iptables is bypassed; the node sends the traffic to the Application Load Balancer, and the Application Load Balancer sends the traffic back to the client, again with its own IP.
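The slide's Ingress manifest isn't carried in the transcript, but a path-based Ingress of the kind described, fulfilled by the AWS Load Balancer Controller with IP targets, could look roughly like this; the Service names, ports, scheme, and ingress class are assumptions for illustration.

```yaml
# Hypothetical Ingress: layer 7 path routing, fulfilled by the AWS Load
# Balancer Controller provisioning an ALB with pod IPs as targets.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress                             # placeholder name
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # register pod IPs directly as ALB targets
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /order
            pathType: Prefix
            backend:
              service:
                name: order                     # placeholder ClusterIP Service
                port:
                  number: 80
          - path: /rating
            pathType: Prefix
            backend:
              service:
                name: rating                    # placeholder ClusterIP Service
                port:
                  number: 80
```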
Let's stitch it all together in the full AWS context. The client sends a DNS request for an application DNS record, let's say portal.example.com, and the domain is hosted on Route 53. The portal.example.com record is an alias record on Route 53, and it points to the ALB. Route 53 responds with a multivalue answer containing the public IPs of the ALB. The client picks one of the public IPs in the DNS response and sends the actual request to the application; let's say the client picks public IP number 3. In this scenario, the internet gateway applies destination NAT to the request by swapping the destination IP of the traffic with the private IP of the ALB in availability zone 3. So basically, the original destination IP in the client request was public IP number 3, but the internet gateway swapped that with the private IP of the ALB rather than its public IP, so the private IP of the ALB becomes the destination of the traffic. The ALB then performs a load balancing decision and decides to forward the request to one of the pods, which could be any of the pods in any of the availability zones. It's important to note that by default, cross-zone load balancing is enabled on an ALB, so the ALB may decide to forward this request to any of the pods in the other AZs; that's the reason. But if you'd like to force the ALB to forward requests to availability-zone-local resources, you can disable the cross-zone load balancing feature. To make it easier for the novice user or admin, everything works out of the box with the VPC CNI. When I say everything, let me rephrase that: everything as in pod-initiated, pod-generated traffic always works with the VPC CNI. The reason is that, out of the box, the VPC CNI applies SNAT (source NAT) to all pod-initiated traffic which is destined to external subnets, that is, to non-VPC CIDRs. So on this slide, when the pod with the .51 IP sends traffic destined to a public IP on the internet, the VPC CNI automatically swaps the source IP with the node's primary ENI's primary IP, which is .50 in this scenario. Then the .50 source IP gets source-NATed again, but this time by the internet gateway, to the public IP that is associated with the .50 IP, as public subnets always map the instance IP to a public IP. This is the way pod-initiated traffic automatically always works. I am sharing additional resources on this slide, especially when it comes to cross-AZ traffic: how to localize the traffic within each AZ, even when the Kubernetes traffic patterns are as complex and hop-by-hop as this. Please have a look at these resources, which I believe will help you tremendously; the EKS Best Practices Guide in particular also goes through some of the scenarios I mentioned here. With that, thank you so much for joining my session today, and I hope you found it useful. Thanks again and have a great day.
...

Dumlu Timuralp

Senior Solutions Architect, Global Financial Services @ AWS

Dumlu Timuralp's LinkedIn account


