Conf42 Platform Engineering 2025 - Online

- premiere 5PM GMT

Platform Engineering's Hidden Network Challenge: Why Cloud Platforms Ditched Multicast and How It Impacts Your Infrastructure Automation

Abstract

Ever wonder why GCP, AWS, and Azure killed native multicast? This hidden decision is breaking your K8s scaling, slowing your automation, and costing you millions in bandwidth. Discover the 5 technical barriers that forced this choice and the game-changing solutions coming next.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Shankar. I'm here to present my topic on the platform engineer's hidden network challenge, specifically multicast in cloud-native environments. Traditionally, multicast has been proven to reduce bandwidth consumption in on-prem environments, but most of the major cloud providers don't natively support it. So I'm going to talk about the pros and cons of multicast and some of the alternate options that could potentially be implemented within cloud-native environments.

So when I say multicast, what exactly does it mean? It's replicating the same data to multiple recipients. Consider unicast first: it's one-to-one. The sender sends the packet and a single receiver gets it. For that to happen, they have to establish a session, and if we assume one sender and hundreds or thousands of receivers, that single sender has to build hundreds or thousands of TCP sessions, one with each receiver, which is CPU intensive. You're literally wasting the sender's resources just establishing those sessions, which is inefficient.

The better approach is multicast: one sender sends the packet to a multicast group, and whoever joins that group starts receiving the packets. There is no session between the sender and the receivers. Whoever joins the group gets the packets, and if you leave, you're out of the stream. You could think of multicast like a radio stream or a TV channel. The broadcaster broadcasts the programs on a channel, and if you tune into that channel you start seeing the picture on your TV; everyone tuned in sees the same thing at that given time. Same with radio: the broadcaster streams on a certain frequency, and whoever tunes in hears the broadcast.
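In code, the join/leave model looks like this. A minimal sketch using standard UDP sockets; the group address and port are arbitrary example values, not anything from the talk:

```python
import socket
import struct

# Arbitrary example values: 239.0.0.0/8 is the administratively scoped range.
GROUP = "239.1.2.3"
PORT = 5007

def membership_request(group: str) -> bytes:
    """Build the ip_mreq structure used to join a multicast group:
    4-byte group address followed by the 4-byte local interface (INADDR_ANY)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))

def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Receiver side: bind a UDP socket and join the group. The join emits an
    IGMP membership report; no session is ever established with any sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

def send_once(payload: bytes, group: str = GROUP, port: int = PORT) -> None:
    """Sender side: a single sendto() reaches every current group member."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay local
    sock.sendto(payload, (group, port))
    sock.close()
```

Note there is no per-receiver state anywhere on the sender side: leaving the group (dropping the membership) is all a receiver has to do to stop the stream.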
It's the same concept with multicast: one sender streams data, and whoever joins the group starts receiving it. But although there's a dramatic efficiency gain, implementing that in large-scale virtual environments has been complex and difficult, which I think is why it's still not natively supported by some of the major cloud providers.

So what are the critical barriers that broke multicast? The first is state explosion in software-defined networks. With multicast, the elements in the network have to maintain a forwarding table, and it has to be consistent across the cloud's network infrastructure. As receivers join and leave the group, a huge amount of recalculation happens, which can stress the control plane of the backend infrastructure. That's one reason multicast may not be implemented across the wide network of a public cloud.

Security is another consideration. Cloud-native platforms strictly depend on multi-tenancy: different customers' workloads run on shared physical infrastructure in the backend, but they're kept separated; there is no leakage of data, and resources don't get shared into other customers' environments. Multicast, however, was designed to operate under one administrative control, so there's a potential that data could leak into other tenants' environments. That's probably another reason multicast hasn't been implemented.

The third is operational complexity and automation challenges. With unicast you can define static routing, but multicast has dynamically changing behavior.
That behavior is hard to predict, and as I said, it potentially causes a lot of forwarding-table recalculation and a lot of stress on the control plane.

The fourth is performance degradation in virtual environments. Theoretically multicast reduces bandwidth consumption, but it comes with implementation complexity, and the cost of processing multicast packets through the virtualization layers could approach or even exceed the bandwidth savings.

And the fifth is the architectural mismatch with cloud-native patterns. Microservices emphasize loose coupling and explicit service boundaries, but multicast creates implicit coupling between publishers and subscribers. Those are the five constraints that have potentially been restricting multicast from being natively supported in the cloud.

Now, how does having multicast restricted in the cloud impact infrastructure automation? One impact is migration challenges. Many financial data, telecommunication, and media streaming services rely heavily on multicast for efficient data distribution, since most of their models depend on a couple of senders and multiple receivers. But moving a legacy, multicast-based application from on-prem into the cloud changes the whole infrastructure architecture, because multicast isn't natively supported there. You have to employ alternate solutions, which becomes complex, and that results in migration challenges. The second impact is on infrastructure as code, like Terraform and CloudFormation.
Since there is no natively supported resource, you cannot implement multicast directly using the declarative approach. Instead you need complex workarounds through custom scripts, and that breaks the process platform engineers rely on: consistent, repeatable code backed by state files. Those are two of the impacts missing native multicast support has on infrastructure automation.

Then there are container orchestration complications. Service discovery in Kubernetes typically depends on DNS, which is a point-to-point communication pattern. Applications that use multicast for dynamic service discovery must be modified to work with the Kubernetes service abstraction model; that's one big change to consider if you're going from a legacy app to microservices. There are also security policy gaps: you can apply Kubernetes security policies to pod-to-pod communication, but those policies do not control multicast traffic, and that could leave a potential security gap. And there's pod lifecycle management: pods can be created and destroyed so quickly that it can happen even before the multicast recalculations finish. So those are complications if you want to implement multicast within Kubernetes in a cloud-native environment.

So does that mean we cannot do multicast at all? I would say there are a couple of approaches that could be employed, though they're a little complex and have their own pros and cons. One is at the application level: the developers take the network layer's job into the application itself and code the group membership management, the message delivery, and the error handling within the application.
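To make that concrete, here is a minimal sketch of what moving multicast into the application means — a hypothetical in-process group registry that replicates each payload to every member over plain UDP unicast. The names and structure are my illustration, not a reference implementation:

```python
import socket
from typing import Iterable, List, Tuple

Addr = Tuple[str, int]

def fan_out(payload: bytes, receivers: Iterable[Addr]) -> List[Addr]:
    """Deliver one unicast copy per receiver; return the addresses that failed.
    This replication loop is exactly the work the network does for free with
    native multicast."""
    failed: List[Addr] = []
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for addr in receivers:
        try:
            sock.sendto(payload, addr)
        except OSError:
            failed.append(addr)
    sock.close()
    return failed

class Group:
    """Group membership, now the application's problem instead of IGMP's."""
    def __init__(self) -> None:
        self.members: set = set()

    def join(self, addr: Addr) -> None:
        self.members.add(addr)

    def leave(self, addr: Addr) -> None:
        self.members.discard(addr)

    def publish(self, payload: bytes) -> List[Addr]:
        return fan_out(payload, sorted(self.members))
```

The sender's cost now grows linearly with membership, and error handling, retries, and membership consistency all become application code — the overhead shift the talk describes.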
The application then maintains the whole communication path from sender to receivers. Though it increases complexity, it does provide an approach for migrating your applications into the cloud; the trade-off is that the networking overhead now falls on the developers, and with a networking layer added into the application there could be performance issues as well.

The other approach is creating overlay networks, and there are a couple of vendors that support overlay networks within public cloud environments. One way is to deploy a marketplace router and build GRE tunnels from the senders and receivers to that cloud router, which acts as a central point, then build BGP between the end hosts and that centralized router. The sender sends packets to the router, and you configure PIM, which manages the multicast protocol. That's one way it can be achieved. As I said, it has its own complexity, but it's one way to take that overhead off the application and bring it back into the network layer.

Compared to the application-level replication we discussed earlier, there are also a couple of cloud-native alternatives that could be reviewed. The main ones are messaging services like Amazon SQS and Cloud Pub/Sub, which provide reliable, scalable message delivery; if properly planned and designed, the network approach could be avoided and the application could be built with cloud-native resources instead. And if the application relies on content distribution rather than real-time data, every cloud provider supports a CDN.
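Going back to the managed messaging option (SQS, Pub/Sub): that pattern replaces the multicast group with a named topic that the service fans out for you. A hedged, in-memory sketch of that publish/subscribe shape — this is just the pattern, not a cloud SDK:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]

class TopicBus:
    """In-memory stand-in for a managed pub/sub service: publishers address a
    named topic, the service fans each message out to every subscriber, and
    the two sides never hold a session with each other."""
    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        handlers = self._subs.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)  # number of deliveries made
```

With a real managed service, the fan-out, retries, and membership bookkeeping happen inside the provider's control plane, which is why this route avoids both the application-level overhead and the overlay network.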
So CDNs are one more option that could be employed. There are also software-defined solutions: the networking industry is coming up with offerings, including marketplace solutions, that could potentially be supported in the cloud, and that's an option we may see more of in the future.

Now, even though we have these workarounds, they have their own cons. The first is the complexity of the workarounds themselves. Whether you do it as an overlay or inside the application, the lack of native multicast support forces development teams to learn new things and work them into the application; the whole developer experience is affected, and it might slow down the software development lifecycle as well.

The second is environment inconsistency. You might have a legacy application that works fine locally, but with the alternate approach you implement in the cloud, there could be a huge disconnect between how the application behaves on-prem and how it works with the alternate solutions in the public cloud. Testing differs a lot, too, between the legacy environment and the solution implemented in the cloud.

The third is operational overhead and complexity: monitoring complexity, capacity planning challenges, and multiple communication patterns. Monitoring and troubleshooting become complex because, rather than using cloud-native services, you'd be implementing overlay networks with different vendor elements in them.
It's hard to figure out where an issue is, and monitoring the different elements centrally becomes a big problem. For capacity planning, if multicast is implemented inside the application, it's hard to know the actual usage, and scaling becomes difficult when it's managed entirely within the application. That's another challenge that can arise from implementing the alternate solutions. All of this typically results in higher costs: incident resolution takes longer, SLAs can be missed, and there can be service disruptions and production issues. You can implement the alternate approaches, but complexities and their own cons come along with them.

As for future prospects and emerging technologies: the first is SDN-native multicast protocols. The networking vendors will probably build something natively supported in future cloud environments, or maybe something not implemented across the whole wide-area network but limited to a certain scope for a specific purpose. Second, advances in NICs, like SmartNICs and data processing units, could help reduce the challenges we see in virtualized environments and potentially reduce the performance issues multicast poses in the cloud. Third, service mesh integrations like Istio and Consul: there have been great advancements there, and that's one area that could have multicast-like functionality in the future.

One cloud-native option being researched is serverless event-driven architectures, because with so many issues in the alternate approaches, whether doing multicast within the application or over overlay networks, complexity keeps increasing.
I think going with the cloud-native approach, redesigning your applications to be more cloud-native, is probably the way forward, and serverless event-driven architectures could be something that gets employed. As I mentioned, the cloud-native alternatives are one option that could be used.

To sum up: given the operational complexity and architectural constraints in cloud environments, the total cost of ownership includes not just resource consumption but also operational overhead and development complexity. The future of efficient group communication in cloud environments likely lies in solutions designed specifically for software-defined, centrally controlled architectures, rather than adaptations of traditional distribution protocols.

That ends my topic. Thank you all for listening. Thanks.

Srinivas Shamkura

Senior Cloud Infrastructure Engineer @ SADA



