Scaling Prompt Engineering: How Ultra Ethernet and UALink Accelerate Token-to-Token Performance
Abstract
Prompt engineering at scale needs more than clever text; it needs blazing-fast infrastructure. Learn how Ultra Ethernet and UALink boost throughput, reduce inference latency, and accelerate real-time LLM performance across distributed AI systems.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
My name is Raje.
I'm a principal engineer at Synopsys.
My presentation is about how Ultra Ethernet and UALink accelerate token-to-token performance.
In my presentation, I'll cover the AI infrastructure bottleneck, the need for enhanced interconnect technology, UALink, Ultra Ethernet, and finally the conclusion.
AI Infrastructure Bottleneck. In the present AI infrastructure, there is exponential growth in compute demand. Despite parallelization, model training times have risen from weeks to months.
If you look at the picture on the right-hand side, model parameters are doubling every three to four months.
The system is being stretched along multiple dimensions: network bandwidth, network latency, memory bandwidth, memory capacity, and compute. Design complexity is also increasing, driven by memory bandwidth and interconnect bandwidth demands.
Now, if you look at the extreme right side, the DDR, HBM, PCIe, and die-to-die standards are all advancing exponentially.
Therefore, we need AI infrastructure with enhanced interconnect technology
to meet current and future demand.
Need for Enhanced Interconnect Technology. In this slide, let's try to understand why GPU performance is important.
If you consider the AI/ML lifecycle in the picture: to build a model, we need to prep the data first, build the model, train it, test it, and then fine-tune it. Continuous feedback is provided to fine-tune the model, and this loop continues, during which data is split and fed to multiple GPUs, and sometimes to multiple machines at larger scale.
Therefore, GPU performance influences the timeline of deep learning.
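To make that loop concrete, here is a minimal data-parallel training sketch, assuming PyTorch's DistributedDataParallel; the model, sizes, and loop are my own illustration, not from the talk. Each GPU trains on its shard of the data, and gradients are synchronized over the interconnect after every backward pass, which is exactly where fabric bandwidth and latency show up in the training timeline.

```python
# Minimal data-parallel sketch (illustrative, not from the talk).
# Launch one process per GPU, e.g. with torchrun; DistributedDataParallel
# all-reduces gradients across GPUs after every backward pass, so the
# interconnect sits directly on the training-time critical path.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int) -> None:
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).cuda(rank)
    model = DDP(model, device_ids=[rank])       # gradient sync over the fabric
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):
        x = torch.randn(32, 1024, device=rank)  # this rank's shard of the batch
        loss = model(x).square().mean()
        loss.backward()                          # all-reduce happens here
        opt.step()
        opt.zero_grad()
```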
UALink: Scale Up. UALink is an open-standard interconnect technology developed to scale up accelerators for AI workloads.
If you look at the picture here, we have a pod, which is also called a cluster. Racks are stacked vertically in it, and each rack has GPUs.
All these GPUs are interconnected through UALink. What we are trying to achieve here is to connect all the GPUs together to create one big, giant GPU: to enable memory sharing and synchronization between the accelerators, so that direct load, store, and atomic operations are possible between them.
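As a rough conceptual model of those semantics (my own illustration, not a real UALink API), the sketch below treats the pod as one flat address space in which any accelerator can load, store, or atomically update a peer's memory:

```python
# Toy model of scale-up memory semantics: every accelerator can load, store,
# and perform atomics directly on any peer's memory, as if the pod were one
# big GPU. A threading lock stands in for fabric-level atomicity.
import threading

class PodMemory:
    """One flat address space stitched from per-accelerator memories."""
    def __init__(self, num_accelerators: int, words_per_accelerator: int):
        self.words = words_per_accelerator
        self.mem = [[0] * words_per_accelerator for _ in range(num_accelerators)]
        self.lock = threading.Lock()

    def _locate(self, addr: int):
        return divmod(addr, self.words)   # (owning accelerator, local offset)

    def load(self, addr: int) -> int:     # remote read, no explicit message
        acc, off = self._locate(addr)
        return self.mem[acc][off]

    def store(self, addr: int, value: int) -> None:
        acc, off = self._locate(addr)
        self.mem[acc][off] = value

    def atomic_add(self, addr: int, value: int) -> int:
        acc, off = self._locate(addr)
        with self.lock:                   # the fabric guarantees atomicity
            old = self.mem[acc][off]
            self.mem[acc][off] = old + value
            return old

pod = PodMemory(num_accelerators=4, words_per_accelerator=1024)
pod.store(3000, 42)       # accelerator 0 writes into accelerator 2's memory
print(pod.load(3000))     # any accelerator reads it back directly
pod.atomic_add(3000, 1)   # e.g., a shared counter used for synchronization
```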
UALink can connect hundreds to thousands of GPUs together.
Ultra Ethernet: Scale Out.
Ultra Ethernet is an open-standard, high-performance networking technology developed by the Ultra Ethernet Consortium for AI and HPC workloads.
If you look at the picture on the right-hand side, we have already discussed UALink; now we are talking about Ultra Ethernet, which is highlighted in the picture. These links connect all the clusters together, which is called scale-out.
This establishes a high-bandwidth, multi-path, open-standard, highly configurable interface, which is very important for AI clustering. The Ultra Ethernet stack also introduces a new transport layer with enhanced congestion control and enhanced RDMA capabilities. Here we are talking about interconnecting up to millions of GPUs.
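To illustrate the flavor of those transport ideas (a toy model of my own, not the Ultra Ethernet Transport specification), the sketch below sprays one message's packets across multiple paths and adjusts a per-path congestion window on a simulated congestion signal:

```python
# Toy model of two transport ideas from the talk: spraying a message's packets
# across multiple network paths, and a simple additive-increase /
# multiplicative-decrease congestion window per path.
import random

class Path:
    def __init__(self, name: str):
        self.name = name
        self.cwnd = 4.0                          # packets allowed in flight

    def send(self, packet_id: int) -> bool:
        congested = random.random() < 0.05       # pretend ECN/loss signal
        if congested:
            self.cwnd = max(1.0, self.cwnd / 2)  # multiplicative decrease
        else:
            self.cwnd += 1.0 / self.cwnd         # additive increase
        return not congested

def spray(message_packets: int, paths: list[Path]) -> None:
    # Weight path choice by current window, so congested paths get less traffic.
    for pkt in range(message_packets):
        path = random.choices(paths, weights=[p.cwnd for p in paths])[0]
        ok = path.send(pkt)
        print(f"pkt {pkt} -> {path.name} "
              f"({'ok' if ok else 'marked'}), cwnd={path.cwnd:.1f}")

spray(message_packets=8, paths=[Path("path-A"), Path("path-B"), Path("path-C")])
```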
Conclusion: Token-to-Token Performance. UALink is used to connect accelerators together so that memory synchronization happens between them. This setup is optimized for AI workloads because of rapid token passing. Ultra Ethernet establishes the low latency and high throughput needed for rapid token exchange and scalability.
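As a closing illustration, token-to-token performance is straightforward to measure from the client side. In the sketch below (my own example; generate_stream is a hypothetical stand-in for any streaming LLM endpoint), we record the gap between consecutive tokens, which is the metric these fabrics ultimately improve:

```python
# Hedged sketch: measuring token-to-token latency for a streaming LLM.
# `generate_stream` is a hypothetical placeholder; in practice it would be
# a distributed LLM inference endpoint.
import time
from typing import Iterator

def generate_stream(prompt: str) -> Iterator[str]:
    for tok in ["Scaling", "prompt", "engineering", "needs", "fast", "fabrics"]:
        time.sleep(0.02)                  # simulated per-token compute + network
        yield tok

last = time.perf_counter()
gaps = []
for token in generate_stream("hello"):
    now = time.perf_counter()
    gaps.append(now - last)               # token-to-token latency sample
    last = now
print(f"mean token-to-token latency: {1000 * sum(gaps) / len(gaps):.1f} ms")
```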
Thank you for taking the time to watch my presentation.