Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
I'm Fanish, a product management professional from Synopsys.
Welcome to this presentation on reimagining SerDes for scalable AI architectures: bottlenecks and breakthroughs.
In today's presentation, I'll be using the term SerDes often. It is short for serializer/deserializer, a high-speed interface used for data transfers.
Today's agenda covers AI workload impact on interconnect designs, signal integrity challenges at multi-gigabit data rates, architectural tradeoffs, and future directions for SerDes.
Generative AI is driving exponential growth in compute demands. Model sizes are doubling every three to four months, and despite parallelism, training times stretch to weeks and months. Memory and interconnect bandwidths are falling behind these compute demands, which puts a lot of pressure on interconnect and memory technologies to deliver higher bandwidth at lower power and lower latency.
The conflicting demands for SerDes in the AI era: as I mentioned earlier, to support the increased compute demands, interconnect bandwidths need to increase, which means SerDes data rates keep climbing. In the current transition, Ethernet SerDes are moving from 112 Gbps to 224 Gbps data rates, and PCIe is moving from a 64 Gbps to a 128 Gbps data rate. Even with increased data rates, power efficiency remains key.
To meet the system requirements, we have to ensure the power efficiency targets stay well within limits. Current power efficiencies are around four to five picojoules per bit while supporting long-reach channels with on the order of 45 to 50 dB of loss.
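To get a feel for what those efficiency numbers imply at the system level, here is a minimal sketch (simple arithmetic on assumed values) of per-lane power:

```python
# Rough per-lane power estimate: power = efficiency (J/bit) x data rate (bit/s).
# The 4-5 pJ/bit and 224 Gbps figures come from the talk; the rest is illustrative.

data_rate_bps = 224e9          # one 224 Gbps Ethernet SerDes lane
for pj_per_bit in (4.0, 5.0):
    watts = pj_per_bit * 1e-12 * data_rate_bps
    print(f"{pj_per_bit} pJ/bit at 224 Gbps -> {watts:.2f} W per lane")
# ~0.9-1.1 W per lane; with hundreds of lanes this becomes a system-level power problem.
```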
In addition, we need to deal with signal integrity challenges at higher data rates. As data rates increase, these challenges keep growing: channel loss is very high, discontinuities in the channel cause more reflections, and the crosstalk impact is also very high.
Beyond that, we need to understand the workloads where a particular SerDes is being utilized. Understanding the workloads helps us pin down the use case so we can optimize the SerDes to meet all the requirements. These conflicting demands create a design paradox that calls for innovative approaches to meet all the SerDes requirements.
Let's try to understand the AI workload impact on interconnect designs. As we know, there are two different workload types: training workloads and inference workloads. Let's start with inference. Inference workloads are sporadic, with variable loads that need asymmetric bandwidth; they are more memory bound than compute bound, and latency is very key here. Training workloads, in contrast, are compute bound and need very high bandwidth to move large volumes of data, and latency is also very key here. These distinct workload characteristics require specialized SerDes design techniques. For AI, before we start designing the SerDes, we need to understand the exact use case and workload requirements; that helps us optimize performance, power, area, and latency.
Let's touch on signal integrity at multi-gigabit data rates. Channel characteristics vary with frequency, and channel loss is very high at higher data rates. Ethernet SerDes at a 224 gigabits per second data rate have to support channel loss on the order of 45 to 50 dB to meet system-level constraints. Similarly, PCIe Gen 7 SerDes at 128 gigabits per second need to handle channel loss of 35 to 40 dB. With increased data rates we also have tighter design constraints on jitter, along with increased inter-symbol interference, increased reflections due to discontinuities in the channels, and increased crosstalk. As we move toward 224G and beyond, we need to implement novel equalization techniques to address all these signal integrity challenges.
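To make the loss numbers concrete, here is a small sketch (assuming PAM4 signaling at the stated rate) of what 45 to 50 dB means at the Nyquist frequency:

```python
# A 224 Gbps PAM4 lane carries 2 bits per symbol, so it signals at 112 GBd
# and its Nyquist frequency is ~56 GHz. The loss figures are from the talk.
baud = 224e9 / 2
nyquist_ghz = baud / 2 / 1e9
for loss_db in (45, 50):
    ratio = 10 ** (-loss_db / 20)      # dB to voltage ratio
    print(f"{loss_db} dB at ~{nyquist_ghz:.0f} GHz -> amplitude down to "
          f"{ratio * 100:.2f}% of launch")
# 45-50 dB leaves well under 1% of the launched amplitude for the receiver.
```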
Let's touch on innovations in SerDes design. A traditional NRZ-based SerDes implements a feed-forward equalizer (FFE) on the transmit side, and a continuous-time linear equalizer (CTLE) followed by a decision feedback equalizer (DFE) on the receive side, to support long-reach channel requirements. As data rates increase, the signaling scheme has moved from NRZ to PAM4. The PAM4 SerDes are DSP based; they leverage a multi-tap feed-forward equalizer, a multi-tap decision feedback equalizer, and optimized continuous-time linear equalizers, along with high-performance data converters, to meet the stringent signal integrity constraints.
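As a minimal sketch of how those receive-side stages fit together (toy channel, made-up tap values, no CTLE model), consider:

```python
import numpy as np

def ffe(samples, taps):
    """Feed-forward equalizer: weighted sum of neighboring samples."""
    return np.convolve(samples, taps, mode="same")

def dfe(samples, fb_tap, levels=(-3, -1, 1, 3)):
    """One-tap decision feedback equalizer with PAM4 slicing."""
    out = np.zeros_like(samples)
    prev = 0.0
    for i, x in enumerate(samples):
        corrected = x - fb_tap * prev                         # cancel trailing ISI
        prev = min(levels, key=lambda l: abs(corrected - l))  # slice to nearest level
        out[i] = prev
    return out

tx = np.random.choice([-3.0, -1.0, 1.0, 3.0], size=64)   # PAM4 symbols
rx = np.convolve(tx, [0.1, 1.0, 0.35], mode="same")      # toy ISI channel
eq = dfe(ffe(rx, taps=[-0.08, 1.0, -0.25]), fb_tap=0.3)
print("symbol errors:", int(np.sum(eq != tx)))
```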
If you look at adaptive equalization, real-time adaptation techniques are implemented to fine-tune the SerDes parameters to track the channel characteristics. On machine learning for SerDes: machine learning algorithms are being implemented for calibration and adaptation.
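One common, hardware-friendly adaptation scheme is sign-sign LMS; here is a minimal sketch (NRZ toy channel, assumed step size) of adapting FFE taps in real time:

```python
import numpy as np

def sign_sign_lms(rx, ideal, n_taps=3, mu=0.005, iters=2000):
    """Adapt FFE taps with the sign-sign LMS update rule."""
    taps = np.zeros(n_taps)
    taps[n_taps // 2] = 1.0                       # start as a pass-through cursor
    rng = np.random.default_rng(0)
    for _ in range(iters):
        i = rng.integers(n_taps, len(rx) - n_taps)
        window = rx[i - n_taps // 2 : i + n_taps // 2 + 1][::-1]
        error = taps @ window - ideal[i]
        taps -= mu * np.sign(error) * np.sign(window)   # signs only: cheap in hardware
    return taps

tx = np.random.default_rng(1).choice([-1.0, 1.0], size=5000)   # NRZ for simplicity
rx = np.convolve(tx, [0.15, 1.0, 0.3], mode="same")            # toy ISI channel
print("adapted taps:", np.round(sign_sign_lms(rx, tx), 3))
```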
Advanced CDR and forward error correction techniques. Clock and data recovery (CDR) is a key function implemented on the receiver side to recover the clock from the received data. Advanced clock and data recovery circuits use a digital bang-bang phase detector and multi-phase sampling to improve sampling accuracy, and spread-spectrum clocking is implemented to minimize electromagnetic interference. In addition, hybrid architectures are selected to balance performance versus power.
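Here is a minimal sketch of the bang-bang (Alexander) phase detector decision at the heart of such a digital CDR loop, with an assumed proportional step:

```python
def bang_bang_pd(prev_data, edge, curr_data):
    """Return +1 (clock late), -1 (clock early), or 0 (no information)."""
    if prev_data == curr_data:
        return 0                       # no data transition, no phase info
    # If the edge sample already matches the new bit, we sampled after the
    # transition, i.e. the clock is late; otherwise it is early.
    return +1 if edge == curr_data else -1

# Toy loop: nudge a phase accumulator based on the detector output.
phase_ui = 0.0
for prev_d, edge_s, curr_d in [(0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)]:
    phase_ui -= 0.01 * bang_bang_pd(prev_d, edge_s, curr_d)   # proportional step
print(f"accumulated phase correction: {phase_ui:+.2f} UI")
```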
Forward error correction (FEC) circuitry helps optimize the bit error rate by correcting bit errors within the SerDes link. Various standard FEC schemes are available in the market, like Reed-Solomon FEC and LDPC FEC. A specific FEC can be selected based on the channel performance and the power and latency constraints.
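To illustrate that selection tradeoff, here is a toy sketch; the coding gain and latency entries are rough, representative numbers I am assuming, not figures from the talk:

```python
fec_options = {
    # name              (approx. coding gain in dB, approx. latency in ns) -- assumed
    "RS(544,514) KP4":  (7.0, 100),
    "strong LDPC":      (11.0, 400),
    "none":             (0.0, 0),
}

def pick_fec(required_gain_db, latency_budget_ns):
    """Lowest-latency FEC that still delivers the required coding gain."""
    feasible = [(lat, name) for name, (gain, lat) in fec_options.items()
                if gain >= required_gain_db and lat <= latency_budget_ns]
    return min(feasible)[1] if feasible else None

print(pick_fec(required_gain_db=6.0, latency_budget_ns=150))   # -> RS(544,514) KP4
```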
Let's look into the architectural tradeoffs for SerDes design. As I mentioned earlier, performance, power, area, and latency are the key parameters, and this leads to a multi-dimensional system design challenge. We need to address and optimize all four parameters to make sure the design meets the system requirements.
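One simple way to reason about a multi-dimensional tradeoff like this is a weighted score across the four axes; the candidates and weights below are entirely illustrative:

```python
candidates = {
    #               (perf, power, area, latency), normalized so higher is better
    "analog NRZ":   (0.60, 0.85, 0.90, 0.90),
    "DSP PAM4":     (0.95, 0.60, 0.65, 0.70),
}
weights = (0.4, 0.3, 0.1, 0.2)   # application-specific priorities (assumed)

for name, scores in candidates.items():
    total = sum(w * s for w, s in zip(weights, scores))
    print(f"{name}: {total:.2f}")
# Which architecture "wins" depends entirely on the weights, i.e. on the use case.
```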
The next consideration is the architectural options: analog versus digital architectures. Legacy SerDes are mostly analog architectures. With higher data rates and the signaling scheme moving to PAM4, the latest PAM4 SerDes are more DSP based and are moving toward digital architectures, to gain process scalability advantages and to implement voltage scaling to further minimize power.
Third, configurability and scalability are very key parameters. As I mentioned earlier, to support a wide variety of bandwidth requirements, a SerDes can be configured from one lane to 16 lanes for Ethernet, and similarly for the PCIe use case. Configurability is a key parameter for SerDes, and reconfigurability is equally important. With a multi-protocol SerDes, the same SerDes can be configured for Ethernet as well as PCIe based on the use case.
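To make the multi-protocol, multi-lane idea concrete, here is a hypothetical configuration record; the field names and values are mine, not from any product:

```python
from dataclasses import dataclass

@dataclass
class SerdesConfig:
    protocol: str        # "ethernet" or "pcie"
    lanes: int           # e.g. 1 to 16 lanes for Ethernet, per the talk
    rate_gbps: float     # per-lane data rate
    modulation: str      # "NRZ" or "PAM4"

    def aggregate_gbps(self) -> float:
        return self.lanes * self.rate_gbps

# The same PHY, configured two different ways depending on the use case.
eth  = SerdesConfig("ethernet", lanes=8,  rate_gbps=224.0, modulation="PAM4")
pcie = SerdesConfig("pcie",     lanes=16, rate_gbps=128.0, modulation="PAM4")
for cfg in (eth, pcie):
    print(f"{cfg.protocol}: {cfg.aggregate_gbps():.0f} Gbps aggregate")
```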
The power challenge: AI workloads demand a few hundred SerDes lanes to meet the compute demands, so power efficiency is very key for the SerDes. Various techniques can be implemented, from the circuit level up to the system level, to optimize power. Circuit techniques include supply voltage scaling, adaptive biasing, and clock gating. Architectural optimizations include power islanding and workload-aware power states. System approaches include dynamic voltage and frequency scaling, and thermal-aware floorplanning and placement techniques also help optimize power.
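As a back-of-the-envelope illustration of why DVFS is so effective, dynamic power scales roughly as C·V²·f; all the numbers below are assumed:

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """Classic dynamic power model: P = C * V^2 * f."""
    return c_farads * v_volts**2 * f_hz

nominal = dynamic_power(1e-9, 0.9, 1e9)     # full-rate operating point (assumed)
scaled  = dynamic_power(1e-9, 0.7, 0.5e9)   # lighter workload: lower V and f
print(f"nominal {nominal:.2f} W vs scaled {scaled:.2f} W "
      f"({scaled / nominal:.0%} of nominal)")
# Lowering voltage and frequency together compounds: (0.7/0.9)^2 * 0.5 ~= 30%.
```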
Holistic solution strategies for SerDes design: before we start designing the SerDes, we need to analyze a few things and understand the use case; then we can optimize the SerDes solution for a specific application use case. We start with workload analysis. Understanding the exact workload patterns and traffic use case helps us better understand the SerDes use case and address the performance, signal integrity, and power integrity challenges. We then need to build an end-to-end model that includes the transmitter, receiver, and channel, including the package and PCB traces, analyze the system-level performance, and fine-tune the characteristics of the individual modules within the transmitter and receiver based on the channel characteristics. That helps us optimize the performance, power, area, and latency of the individual modules; a minimal sketch of such a link model follows below.
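Here is that end-to-end sketch, assuming simple dB bookkeeping; every value is illustrative, not from any real design:

```python
def link_margin_db(tx_launch_db, channel_loss_db, eq_recovery_db, rx_floor_db):
    """Received level after channel loss and equalization, versus the RX floor."""
    received = tx_launch_db - channel_loss_db + eq_recovery_db
    return received - rx_floor_db

# Package + PCB + connector losses lumped into one channel figure (assumed).
margin = link_margin_db(tx_launch_db=2.0,
                        channel_loss_db=47.0,   # long-reach figure from the talk
                        eq_recovery_db=32.0,    # FFE + CTLE + DFE combined, assumed
                        rx_floor_db=-16.0)
print(f"link margin: {margin:+.1f} dB")         # positive margin -> the link closes
```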
Next is architecture selection, which means we need to understand the application use case. For example, Ultra Accelerator Link (UALink) can be used for scale-up architectures within data centers, and Ultra Ethernet can be used for scale-out data center architectures. These configurations are optimized for latency, power, and performance.
In addition to these three, we need to look at the physical design as well. We need to optimize the SerDes for north-south as well as east-west placement on the chip. Most of these AI applications need hundreds of SerDes lanes, so we have to place the SerDes on the north, south, east, and west sides of the die and find optimal package escape routes so as not to impact performance.
Let's touch on future directions in SerDes for AI. As we discussed earlier, current Ethernet SerDes are at a 224 Gbps data rate, and we'll be moving to a 448 gigabits per second data rate in the next two to three years. Similarly, we'll be transitioning from PCIe Gen 7, at 128 gigabits per second, to PCIe Gen 8 at 256 gigabits per second. UALink will also transition from 224G to 448G in a couple of years.
In addition to these next-generation standards, co-packaged optics is becoming more and more popular. It moves the optics closer to the switches and AI accelerators, which minimizes the channel loss requirements and eliminates the need for long-reach SerDes, in turn optimizing power and performance. Alongside co-packaged optics, SerDes for advanced packaging is also becoming more popular: multi-die solutions, chiplets, and 3D packaging applications require fine-tuning many SerDes requirements to fit these latest technologies.
These technologies promise a five to eight x improvement in bandwidth with a two to three x reduction in power, which will help us further improve performance and minimize power.
The key takeaways from the presentation: AI workloads are fundamentally reshaping SerDes requirements; power efficiency and latency are key design constraints for SerDes; signal integrity requires increasingly sophisticated approaches; and heterogeneous SerDes architectures are the future.
This concludes my presentation for today.
Thanks everyone.