Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Thank you for joining today.
We are going to explore something absolutely foundational to the rapid expansion of AI: the network fabric.
Before we dive in, let me quickly introduce myself.
My name is Nam Khan and I'm a solutions engineer at Cisco on the data center
and the AI infrastructure team.
My job is to make sure the solutions we build align with our customers' real business needs.
I've been in the industry for over 23 years, working with companies like Motorola, Qwest Communications, and Cisco.
I hold a dual CCIE certification and have delivered multiple sessions at Cisco Live across the US and Europe.
Now, why are we here today?
Because we are living through one of the biggest infrastructure
shifts in the history of computing.
Every organization is racing to build or consume AI models.
And the interesting part is this: the bottleneck in AI isn't just GPU speed anymore.
It's how fast those GPUs can talk to each other.
You can buy the fastest GPUs in the world, but if they're connected through a slow or congested network, it's like driving a Ferrari in a traffic jam.
All that horsepower, completely wasted.
So in this session we'll look at how Ethernet, the same technology that powers the internet, has evolved into a high performance fabric for scalable, secure AI training and inference.
Here's a roadmap for our session.
We'll start with the silicon building blocks: the CPU, GPU, and DPU.
Then we'll look at the different types of AI clusters and the specific
network requirements for each.
The core of our discussion will be on network architecture, focusing on RDMA, RoCEv2, and congestion management.
Finally, we look at the future with the Ultra Ethernet
Consortium and discuss security.
Let's start by breaking down what actually sits in a modern AI server.
It typically contains three major processing units.
The CPU, or Central Processing Unit, is the brain of the AI system. It manages the operating system, orchestrates data loading, and handles all general purpose tasks.
Think of it as the conductor of an orchestra, making sure everything stays in sync and in harmony.
Next is the GPU, or Graphics Processing Unit, which can be termed the muscle of the AI system.
This is where the heavy lifting happens.
GPUs excel at massive parallelism, performing thousands of matrix multiplications at once.
In AI clusters, GPUs are the most expensive resources, and everything about the network is designed for one purpose: keep the GPUs busy.
If a GPU is waiting for data, you're literally burning money.
Finally, the DPU, or Data Processing Unit, is the traffic controller of the AI system. As networks scale to 400 gig and 800 gig, processing packets eats up CPU cycles, what we call the infrastructure tax.
The DPU offloads that tax, handling encryption, packet inspection, and routing, so the CPU and GPU stay focused on their workload.
Think of the DPU as a personal assistant, making sure the main performer can shine.
These three components together form the backbone of modern AI infrastructure.
So what is an AI cluster?
Isn't it just a rack of servers?
Essentially, it is an interconnected network of high performance GPUs,
acting as a single supercomputer.
The network here is not just the plumbing.
It is part of the compute fabric.
If the network slows down, the entire cluster stalls.
This brings us to collective communication.
In standard networking, A talks to B. In AI, a whole group of GPUs needs to exchange data simultaneously to function as a unit.
During training, every GPU calculates a gradient, and they must all share that data to update the model.
We use topology-aware algorithms like rings or trees to ensure this happens almost instantly.
If this synchronization lags, training time explodes.
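To make that gradient exchange concrete, here is a minimal sketch of the all-reduce step, assuming PyTorch with the NCCL backend and a process group that has already been initialized; the helper name sync_gradients is just for illustration.

```python
import torch.distributed as dist

def sync_gradients(model):
    """Average gradients across all GPUs after the backward pass (illustrative only)."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Every rank contributes its local gradient; the sum travels over the
            # GPU-to-GPU (backend) fabric, then we divide to get the average.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

# Typical placement inside a training loop, assuming dist.init_process_group("nccl")
# was called at startup and the model lives on this rank's GPU:
#   loss.backward()
#   sync_gradients(model)
#   optimizer.step()
```

In a real cluster, a ring or tree all-reduce inside the collective library moves those tensors, which is exactly why the network fabric sits on the critical path of every training step.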
Now let's distinguish between the workloads.
Training is where the model learns. It uses massive data sets to teach the model patterns.
It's like teaching a robot to read, walk, dance, whatever the robot was built for.
Inference is the application phase.
This is when you use the trained model to make predictions or generate text.
It's like asking that robot a question and getting an answer.
Let's look at the specific requirements for each of these clusters, as shown in the table.
First, bandwidth: training requires high node-to-node bandwidth because of that gradient synchronization we talked about. In inference, the bandwidth requirement is relatively lower.
Next, the key metrics for training and inference: for training, the key metric is how much time it takes to train the model.
For inference, it is about latency and high reliability.
So basically, if you think of it as building a robot: training is how much time it takes to build and train the robot so that it can perform the task it's intended for.
Inferencing is like operating the robot: the end user commands the robot to walk, dance, or whatever purpose it was built for, and it executes that purpose.
Training happens offline, like when the robot is being built.
While it is being trained, it is not available to the user; it's still being manufactured by the company or the system which is developing it.
Inference is online, right?
The robot is available for service whenever it's required.
From an infrastructure standpoint, training clusters are massive.
They're like centralized networks, because they require a lot of bandwidth, and the more resources they have, the less time it takes to train a particular model.
Inference clusters are often small and distributed, similar to the regular networks we have been using in our data centers; those could be used as inference clusters.
So with those differences in mind, let's talk about what it takes to build a
network that supports these workloads, specifically a lossless network fabric.
This is where technologies like RDMA, congestion control, and specialized architectures come into play.
To get the performance we need for training, we use RDMA, or Remote Direct Memory Access.
Standard networking is too slow because data has to pass through the CPU. RDMA allows the NIC to transfer data directly into GPU memory, bypassing the CPU entirely.
This gives us the low latency required for a lossless fabric.
I often joke that if humans had RDMA, you could just sit in the room
and by the end of the session, all my AI networking knowledge would be
copied into your brain instantly.
But for RDMA to work, the network must be lossless, meaning no packet drops even during spikes of congestion.
This requires careful planning around buffers, queues, and traffic engineering.
Now, we've said we are using RDMA as the technology. How about the physical network for AI?
Here is a high level topology.
We typically split the network into two: the front-end network and the backend network.
The front-end network is your standard Ethernet network, which you use for storage, out-of-band management, and user access. Just the regular data center network you have been using.
The backend network is where the magic happens.
This is a dedicated high speed GPU-to-GPU fabric.
So any model you're training, any robot you're training, it has to happen in the backend network.
It is purpose built and it carries only compute traffic: the GPU gradients we talked about, the activations, the model parameters. All of that happens in the backend network.
And of course, since the backend network uses RDMA for GPU-to-GPU communication, it has to be lossless, because if any packet drops, the entire collective operation, the collective communication we talked about between the GPUs, will stall or have to restart.
For the backend network, which runs RDMA, we have two choices in terms of technology or infrastructure: either we use InfiniBand, or we use Ethernet.
InfiniBand has been used in high performance computing. It's traditional, it's a standard, but InfiniBand is proprietary.
It is fast and it has better latency compared to Ethernet, but it is proprietary, and all its components, all the infrastructure to build an InfiniBand cluster, it's expensive.
RDMA is natively supported on InfiniBand. It does not require any tweaks or adjustments to run RDMA over InfiniBand; you can just run RDMA over InfiniBand as is.
However, when you talk about Ethernet, Ethernet as we know it is a best-effort technology. It's not lossless, and RDMA requires losslessness, as we have been discussing.
What we have done to run RDMA over Ethernet is develop an industry standard protocol, which we're going to talk about on the upcoming slide, and we also have to configure Ethernet with certain congestion mechanisms to make it on par with InfiniBand without compromising on its features.
Ethernet, as we know, is a standard that is widely used and very popular.
Most folks out there know how to operate Ethernet; the switches, the infra, the optics, all the infrastructure is quite standard, and we can run the entire backend network over Ethernet.
We don't have to change anything except for the part where we adapt RDMA to run over Ethernet.
So we have to customize Ethernet for RDMA, and nothing is non-standard; everything is quite a standard procedure.
So the big question here: what does it take to get Ethernet on par with InfiniBand?
The answer is RoCE, which is nothing but RDMA over Converged Ethernet.
What we do here is encapsulate RDMA inside an Ethernet frame.
We started with RoCE version one, which was limited to layer two domains because it did not have a UDP/IP header.
That was then upgraded to RoCE version two, where we added the UDP/IP headers so RDMA would be able to route across layer three networks.
This is what allows Ethernet to scale AI clusters across racks, rows, and entire data center halls.
So let's talk more about RoCEv2.
With RoCEv2, GPUs still use RDMA, but now the RDMA packets ride over Ethernet.
The NIC can place data directly into GPU memory without involving the CPU kernel, which cuts latency from microseconds to nanoseconds.
This is the magic that makes Ethernet a serious contender for AI networking.
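As an illustration of what this looks like in practice, here is a hedged sketch of how a training job is commonly pointed at a RoCEv2 backend fabric when using NCCL. The device names, interface name, and GID index below are placeholders for illustration; the right values depend on your NICs and switch configuration.

```python
import os

# Placeholder values, assuming a host with RDMA-capable NICs on the backend
# network and a separate front-end interface for bootstrap traffic.
os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")    # RDMA NICs to use (example device names)
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")          # GID index that maps to the RoCEv2 (UDP/IP) address
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")      # front-end interface for NCCL's bootstrap/control traffic
os.environ.setdefault("NCCL_DEBUG", "INFO")              # check the logs to confirm the IB/RoCE transport was selected

import torch.distributed as dist

# Assumes a torchrun-style launcher has set RANK, WORLD_SIZE, and MASTER_ADDR.
dist.init_process_group(backend="nccl")  # collectives now run RDMA over the Ethernet backend
```

If the RDMA transport isn't available, the job can silently fall back to a TCP socket path, which is why checking the NCCL logs is worth the trouble.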
Of course, using Ethernet introduces one challenge.
RDMA assumes a lossless network, as we talked about, and Ethernet is best effort: if the buffer fills up, Ethernet drops packets.
But RDMA cannot tolerate drops.
So we introduce two mechanisms here, starting with ECN, which is Explicit Congestion Notification.
ECN does not drop packets; instead, it marks them.
When the switch starts to build queue pressure, the endpoints then slow down based on those marks.
This keeps the network stable and predictable, exactly what AI workloads need.
The second mechanism is PFC, or Priority Flow Control. PFC pauses traffic at specific priorities when congestion occurs.
This gives us per-priority lossless behavior, even when multiple traffic types share the same links.
Proper tuning of PFC, the headroom, the pause thresholds, and the queue management is critical to deploying RoCEv2 successfully.
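To see how these two mechanisms fit together, here is a toy model, not a real switch implementation, of a single lossless egress queue with an ECN marking threshold and a PFC pause threshold. The threshold numbers are arbitrary illustrative values, not tuning guidance.

```python
# Toy model of one lossless egress queue (illustrative only).
ECN_MARK_THRESHOLD_KB = 150    # above this depth, packets get marked instead of dropped
PFC_XOFF_THRESHOLD_KB = 700    # above this depth, pause the upstream sender on this priority
QUEUE_CAPACITY_KB = 1000       # headroom above XOFF absorbs packets already in flight

def handle_arrival(queue_depth_kb: int, packet_kb: int) -> str:
    """Return the action this simplified lossless queue takes for one arriving packet."""
    new_depth = queue_depth_kb + packet_kb
    if new_depth > PFC_XOFF_THRESHOLD_KB:
        return "PFC: send a pause frame for this priority (no drop, upstream holds traffic)"
    if new_depth > ECN_MARK_THRESHOLD_KB:
        return "ECN: enqueue the packet but mark it so the endpoints slow down"
    return "enqueue normally"

for depth in (50, 300, 800):
    print(f"queue depth {depth} KB -> {handle_arrival(depth, 4)}")
```

In a real deployment the goal is for ECN marking to slow senders early enough that the PFC threshold is rarely reached, since widespread pausing can spread congestion upstream.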
Looking ahead, the UEC, or Ultra Ethernet Consortium, is an industry group optimizing Ethernet specifically for AI.
They're developing standards for better congestion control, lower latency, and broad interoperability to replace proprietary solutions completely.
So what we are seeing here is the UEC, because we know that Ethernet in its raw form cannot be used for the backend network; that's why we discussed how we use RoCEv2 and congestion mechanisms.
The goal of the Ultra Ethernet Consortium is to develop a standard, modern Ethernet, what they call Ultra Ethernet, which will work flawlessly for building AI backend networks.
Performance is critical, but so is security.
AI introduces new attack surfaces that traditional tools don't always cover.
Some key risks, as I've put on the slide, are data poisoning, model extraction, privacy violations, and insider threats.
As AI systems handle more sensitive and proprietary data, securing the pipeline becomes just as important as accelerating it.
To address these risks, we rely on a few core strategies: encryption of data in transit and at rest, secure designs, and continuous monitoring.
We monitor the AI network continuously to discover any anomalies in traffic flows or any anomalies in user behavior.
And then access control, which is all very important for securing our network.
So these guardrails ensure AI workloads remain secure even
as they scale dramatically.
Talking about DPUs, they play a huge role here. They enforce these security policies, like encryption and isolation, right at the server edge, offloading the work so the GPU can focus a hundred percent on training.
They also help with model partitioning by managing data movement efficiently.
Let's bring everything together.
What we discussed so far is the scalability aspect: how we are making AI networks more scalable.
AI is moving from a single server to massive clusters, like we talked about, because we require massive numbers of GPUs to cater to the AI demand we have.
So the network must also scale with it.
We discussed performance: how Ethernet with RoCEv2 can provide the high throughput, low latency fabric AI workloads require.
We also talked about the UEC, or Ultra Ethernet Consortium, initiatives, where they're trying to develop Ultra Ethernet, which would work flawlessly with the AI backend network.
And if you get the network architecture right, you can build a system that's far more secure and ready for the future of AI.
Thank you for your time.
I hope this was informative, and I wish you a great conference.