Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
I'm Fanish, a product management professional from Synopsys.
Welcome to this presentation on reimagining SerDes for scalable AI architectures: bottlenecks and breakthroughs.
In today's presentation, I'll be using the term SerDes often. It is short for serializer/deserializer, a high-speed interface used for data transfers.
Today's agenda covers AI workload impact on interconnect designs, signal integrity challenges at multi-gigabit data rates, architectural tradeoffs, and future directions for SerDes.
Generative AI is driving exponential growth in compute demands. Model sizes are doubling every three to four months, and despite parallelism, training times stretch to weeks and months. Memory and interconnect bandwidths are falling behind these compute demands, which puts a lot of pressure on interconnect and memory technologies to deliver higher bandwidth at lower power and lower latency.
The conflicting demands for SerDes in the AI era: as I mentioned earlier, to support the increased compute demands, interconnect bandwidths need to increase, which means SerDes data rates keep climbing. In the current transition, Ethernet SerDes are moving from 112 Gbps to 224 Gbps data rates, and PCIe is moving from a 64 Gbps to a 128 Gbps data rate. Even with increased data rates, power efficiency remains key.
To meet the system requirements, we have to ensure the power efficiency targets stay well within limits. Current power efficiencies are around four to five picojoules per bit while supporting long-reach channels with on the order of 45 to 50 dB of loss.
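To get a feel for what those efficiency numbers imply at the system level, here is a minimal sketch (simple arithmetic on assumed values) of per-lane power:

```python
# Rough per-lane power estimate: power = efficiency (J/bit) x data rate (bit/s).
# The 4-5 pJ/bit and 224 Gbps figures come from the talk; the rest is illustrative.

data_rate_bps = 224e9          # one 224 Gbps Ethernet SerDes lane
for pj_per_bit in (4.0, 5.0):
    watts = pj_per_bit * 1e-12 * data_rate_bps
    print(f"{pj_per_bit} pJ/bit at 224 Gbps -> {watts:.2f} W per lane")
# ~0.9-1.1 W per lane; with hundreds of lanes this becomes a system-level power problem.
```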
In addition, we need to deal with signal integrity challenges at higher data rates. As data rates increase, these challenges keep growing: channel loss is very high, discontinuities in the channel cause more reflections, and the crosstalk impact is also very high.
Beyond that, we need to understand the workloads where a particular SerDes is being utilized. Understanding the workloads helps us pin down the use case so we can optimize the SerDes to meet all the requirements. These conflicting demands create a design paradox that calls for innovative approaches to meet all the SerDes requirements.
Let's try to understand the AI workload impact on interconnect designs. As we know, there are two different workload types: training workloads and inference workloads. Let's start with inference. Inference workloads are sporadic, with variable loads that need asymmetric bandwidth; they are more memory bound than compute bound, and latency is very key here. Training workloads, in contrast, are compute bound and need very high bandwidth to move large volumes of data, and latency is also very key here. These distinct workload characteristics require specialized SerDes design techniques. For AI, before we start designing the SerDes, we need to understand the exact use case and workload requirements; that helps us optimize performance, power, area, and latency.
Let's touch on signal integrity at multi-gigabit data rates. Channel characteristics vary with frequency, and channel loss is very high at higher data rates. Ethernet SerDes at a 224 gigabits per second data rate have to support channel loss on the order of 45 to 50 dB to meet system-level constraints. Similarly, PCIe Gen 7 SerDes at 128 gigabits per second need to handle channel loss of 35 to 40 dB. With increased data rates we also have tighter design constraints on jitter, along with increased inter-symbol interference, increased reflections due to discontinuities in the channels, and increased crosstalk. As we move toward 224G and beyond, we need to implement novel equalization techniques to address all these signal integrity challenges.
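To make the loss numbers concrete, here is a small sketch (assuming PAM4 signaling at the stated rate) of what 45 to 50 dB means at the Nyquist frequency:

```python
# A 224 Gbps PAM4 lane carries 2 bits per symbol, so it signals at 112 GBd
# and its Nyquist frequency is ~56 GHz. The loss figures are from the talk.
baud = 224e9 / 2
nyquist_ghz = baud / 2 / 1e9
for loss_db in (45, 50):
    ratio = 10 ** (-loss_db / 20)      # dB to voltage ratio
    print(f"{loss_db} dB at ~{nyquist_ghz:.0f} GHz -> amplitude down to "
          f"{ratio * 100:.2f}% of launch")
# 45-50 dB leaves well under 1% of the launched amplitude for the receiver.
```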
Let's touch on innovations in SerDes design. A traditional NRZ-based SerDes implements a feed-forward equalizer (FFE) on the transmit side, and a continuous-time linear equalizer (CTLE) followed by a decision feedback equalizer (DFE) on the receive side, to support long-reach channel requirements. As data rates increase, the signaling scheme has moved from NRZ to PAM4. The PAM4 SerDes are DSP based; they leverage a multi-tap feed-forward equalizer, a multi-tap decision feedback equalizer, and optimized continuous-time linear equalizers, along with high-performance data converters, to meet the stringent signal integrity constraints.
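As a minimal sketch of how those receive-side stages fit together (toy channel, made-up tap values, no CTLE model), consider:

```python
import numpy as np

def ffe(samples, taps):
    """Feed-forward equalizer: weighted sum of neighboring samples."""
    return np.convolve(samples, taps, mode="same")

def dfe(samples, fb_tap, levels=(-3, -1, 1, 3)):
    """One-tap decision feedback equalizer with PAM4 slicing."""
    out = np.zeros_like(samples)
    prev = 0.0
    for i, x in enumerate(samples):
        corrected = x - fb_tap * prev                         # cancel trailing ISI
        prev = min(levels, key=lambda l: abs(corrected - l))  # slice to nearest level
        out[i] = prev
    return out

tx = np.random.choice([-3.0, -1.0, 1.0, 3.0], size=64)   # PAM4 symbols
rx = np.convolve(tx, [0.1, 1.0, 0.35], mode="same")      # toy ISI channel
eq = dfe(ffe(rx, taps=[-0.08, 1.0, -0.25]), fb_tap=0.3)
print("symbol errors:", int(np.sum(eq != tx)))
```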
If you look at adaptive equalization, real-time adaptation techniques are implemented to fine-tune the SerDes parameters to track the channel characteristics. On machine learning for SerDes: machine learning algorithms are being implemented for calibration and adaptation.
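One common, hardware-friendly adaptation scheme is sign-sign LMS; here is a minimal sketch (NRZ toy channel, assumed step size) of adapting FFE taps in real time:

```python
import numpy as np

def sign_sign_lms(rx, ideal, n_taps=3, mu=0.005, iters=2000):
    """Adapt FFE taps with the sign-sign LMS update rule."""
    taps = np.zeros(n_taps)
    taps[n_taps // 2] = 1.0                       # start as a pass-through cursor
    rng = np.random.default_rng(0)
    for _ in range(iters):
        i = rng.integers(n_taps, len(rx) - n_taps)
        window = rx[i - n_taps // 2 : i + n_taps // 2 + 1][::-1]
        error = taps @ window - ideal[i]
        taps -= mu * np.sign(error) * np.sign(window)   # signs only: cheap in hardware
    return taps

tx = np.random.default_rng(1).choice([-1.0, 1.0], size=5000)   # NRZ for simplicity
rx = np.convolve(tx, [0.15, 1.0, 0.3], mode="same")            # toy ISI channel
print("adapted taps:", np.round(sign_sign_lms(rx, tx), 3))
```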
Advanced CDR and forward error correction techniques. Clock and data recovery (CDR) is a key function implemented on the receiver side to recover the clock from the received data. Advanced clock and data recovery circuits use a digital bang-bang phase detector and multi-phase sampling to improve sampling accuracy, and spread-spectrum clocking is implemented to minimize electromagnetic interference. In addition, hybrid architectures are selected to balance performance versus power.
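Here is a minimal sketch of the bang-bang (Alexander) phase detector decision at the heart of such a digital CDR loop, with an assumed proportional step:

```python
def bang_bang_pd(prev_data, edge, curr_data):
    """Return +1 (clock late), -1 (clock early), or 0 (no information)."""
    if prev_data == curr_data:
        return 0                       # no data transition, no phase info
    # If the edge sample already matches the new bit, we sampled after the
    # transition, i.e. the clock is late; otherwise it is early.
    return +1 if edge == curr_data else -1

# Toy loop: nudge a phase accumulator based on the detector output.
phase_ui = 0.0
for prev_d, edge_s, curr_d in [(0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)]:
    phase_ui -= 0.01 * bang_bang_pd(prev_d, edge_s, curr_d)   # proportional step
print(f"accumulated phase correction: {phase_ui:+.2f} UI")
```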
Forward error correction (FEC) circuitry helps optimize the bit error rate by correcting bit errors within the SerDes link. Various standard FEC schemes are available in the market, like Reed-Solomon FEC and LDPC FEC. A specific FEC can be selected based on the channel performance and the power and latency constraints.
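To illustrate that selection tradeoff, here is a toy sketch; the coding gain and latency entries are rough, representative numbers I am assuming, not figures from the talk:

```python
fec_options = {
    # name              (approx. coding gain in dB, approx. latency in ns) -- assumed
    "RS(544,514) KP4":  (7.0, 100),
    "strong LDPC":      (11.0, 400),
    "none":             (0.0, 0),
}

def pick_fec(required_gain_db, latency_budget_ns):
    """Lowest-latency FEC that still delivers the required coding gain."""
    feasible = [(lat, name) for name, (gain, lat) in fec_options.items()
                if gain >= required_gain_db and lat <= latency_budget_ns]
    return min(feasible)[1] if feasible else None

print(pick_fec(required_gain_db=6.0, latency_budget_ns=150))   # -> RS(544,514) KP4
```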
Let's look into the architectural tradeoffs for SerDes design. As I mentioned earlier, performance, power, area, and latency are the key parameters, and this leads to a multi-dimensional system design challenge. We need to address and optimize all four parameters to make sure the design meets the system requirements.
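One simple way to reason about a multi-dimensional tradeoff like this is a weighted score across the four axes; the candidates and weights below are entirely illustrative:

```python
candidates = {
    #               (perf, power, area, latency), normalized so higher is better
    "analog NRZ":   (0.60, 0.85, 0.90, 0.90),
    "DSP PAM4":     (0.95, 0.60, 0.65, 0.70),
}
weights = (0.4, 0.3, 0.1, 0.2)   # application-specific priorities (assumed)

for name, scores in candidates.items():
    total = sum(w * s for w, s in zip(weights, scores))
    print(f"{name}: {total:.2f}")
# Which architecture "wins" depends entirely on the weights, i.e. on the use case.
```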
The next consideration is the architectural options: analog versus digital architectures. Legacy SerDes are mostly analog architectures. With higher data rates and the signaling scheme moving to PAM4, the latest PAM4 SerDes are more DSP based and are moving toward digital architectures, to gain process scalability advantages and to implement voltage scaling to further minimize power.
Third, configurability and scalability are very key parameters. As I mentioned earlier, to support a wide variety of bandwidth requirements, a SerDes can be configured from one lane to 16 lanes for Ethernet, and similarly for the PCIe use case. Configurability is a key parameter for SerDes, and reconfigurability is equally important. With a multi-protocol SerDes, the same SerDes can be configured for Ethernet as well as PCIe based on the use case.
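To make the multi-protocol, multi-lane idea concrete, here is a hypothetical configuration record; the field names and values are mine, not from any product:

```python
from dataclasses import dataclass

@dataclass
class SerdesConfig:
    protocol: str        # "ethernet" or "pcie"
    lanes: int           # e.g. 1 to 16 lanes for Ethernet, per the talk
    rate_gbps: float     # per-lane data rate
    modulation: str      # "NRZ" or "PAM4"

    def aggregate_gbps(self) -> float:
        return self.lanes * self.rate_gbps

# The same PHY, configured two different ways depending on the use case.
eth  = SerdesConfig("ethernet", lanes=8,  rate_gbps=224.0, modulation="PAM4")
pcie = SerdesConfig("pcie",     lanes=16, rate_gbps=128.0, modulation="PAM4")
for cfg in (eth, pcie):
    print(f"{cfg.protocol}: {cfg.aggregate_gbps():.0f} Gbps aggregate")
```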
The power challenge: AI workloads demand a few hundred SerDes lanes to meet the compute demands, so power efficiency is very key for the SerDes. Various techniques can be implemented, from the circuit level up to the system level, to optimize power. Circuit techniques include supply voltage scaling, adaptive biasing, and clock gating. Architectural optimizations include power islanding and workload-aware power states. System approaches include dynamic voltage and frequency scaling, and thermal-aware floorplanning and placement techniques also help optimize power.
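As a back-of-the-envelope illustration of why DVFS is so effective, dynamic power scales roughly as C·V²·f; all the numbers below are assumed:

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """Classic dynamic power model: P = C * V^2 * f."""
    return c_farads * v_volts**2 * f_hz

nominal = dynamic_power(1e-9, 0.9, 1e9)     # full-rate operating point (assumed)
scaled  = dynamic_power(1e-9, 0.7, 0.5e9)   # lighter workload: lower V and f
print(f"nominal {nominal:.2f} W vs scaled {scaled:.2f} W "
      f"({scaled / nominal:.0%} of nominal)")
# Lowering voltage and frequency together compounds: (0.7/0.9)^2 * 0.5 ~= 30%.
```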
Holistic solution strategies for SerDes design: before we start designing the SerDes, we need to analyze a few things and understand the use case; then we can optimize the SerDes solution for a specific application use case. We start with workload analysis. Understanding the exact workload patterns and traffic use case helps us better understand the SerDes use case and address the performance, signal integrity, and power integrity challenges. We then need to build an end-to-end model that includes the transmitter, receiver, and channel, including the package and PCB traces, analyze the system-level performance, and fine-tune the characteristics of the individual modules within the transmitter and receiver based on the channel characteristics. That helps us optimize the performance, power, area, and latency of the individual modules; a minimal sketch of such a link model follows below.
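Here is that end-to-end sketch, assuming simple dB bookkeeping; every value is illustrative, not from any real design:

```python
def link_margin_db(tx_launch_db, channel_loss_db, eq_recovery_db, rx_floor_db):
    """Received level after channel loss and equalization, versus the RX floor."""
    received = tx_launch_db - channel_loss_db + eq_recovery_db
    return received - rx_floor_db

# Package + PCB + connector losses lumped into one channel figure (assumed).
margin = link_margin_db(tx_launch_db=2.0,
                        channel_loss_db=47.0,   # long-reach figure from the talk
                        eq_recovery_db=32.0,    # FFE + CTLE + DFE combined, assumed
                        rx_floor_db=-16.0)
print(f"link margin: {margin:+.1f} dB")         # positive margin -> the link closes
```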
Next is architecture selection, which means we need to understand the application use case. For example, Ultra Accelerator Link (UALink) can be used for scale-up architectures within data centers, and Ultra Ethernet can be used for scale-out data center architectures. These configurations are optimized for latency, power, and performance.
In addition to these three, we need to look at the physical design as well. We need to optimize the SerDes for north-south as well as east-west placement on the chip. Most of these AI applications need hundreds of SerDes lanes, so we have to place the SerDes on the north, south, east, and west sides of the die and find optimal package escape routes so as not to impact performance.
Let's touch on future directions in SerDes for AI. As we discussed earlier, current Ethernet SerDes are at a 224 Gbps data rate, and we'll be moving to a 448 gigabits per second data rate in the next two to three years. Similarly, we'll be transitioning from PCIe Gen 7, at 128 gigabits per second, to PCIe Gen 8 at 256 gigabits per second. UALink will also transition from 224G to 448G in a couple of years.
In addition to these next-generation standards, co-packaged optics is becoming more and more popular. It moves the optics closer to the switches and AI accelerators, which minimizes the channel loss requirements and eliminates the need for long-reach SerDes, in turn optimizing power and performance. Alongside co-packaged optics, SerDes for advanced packaging is also becoming more popular: multi-die solutions, chiplets, and 3D packaging applications require fine-tuning many SerDes requirements to fit these latest technologies.
These technologies promise a five to eight x improvement in bandwidth with a two to three x reduction in power, which will help us further improve performance and minimize power.
The key takeaways from the presentation: AI workloads are fundamentally reshaping SerDes requirements; power efficiency and latency are key design constraints for SerDes; signal integrity requires increasingly sophisticated approaches; and heterogeneous SerDes architectures are the future.
This concludes my presentation for today.
Thanks everyone.