Transcript
Hello everyone.
My name is Murli Kris Manum.
Welcome to the Conf42 Golang conference.
Today we will discuss real-time AI inference at the edge for self-driving cars.
Self-driving cars represent one of the most challenging applications of edge computing and AI. Modern autonomous vehicles must process over a gigabyte of sensor data per second from high-resolution cameras, lidar, and radar units to make the split-second decisions that ensure passenger safety.
In this presentation, we will explore how real-time AI inference at the edge enables autonomous vehicles to function safely and efficiently. We'll also examine the computational challenges, hardware innovations, and optimization techniques that make this possible.
Welcome. Let's get into the details.
First, let's understand the challenge autonomous vehicles pose in terms of processing requirements. One, the data they generate is massive. Two, the latency requirements for safe operation of the vehicle on the road are very strict. And three, the compute requirements vary significantly based on the environment the vehicle operates in.
From the data perspective, autonomous vehicles generate anywhere from 1.4 to 19 terabytes of raw data per hour: from multiple high-resolution cameras, which operate at 30 to 60 frames per second at full HD resolution; from lidars, which generate a hundred thousand to 4.5 million points per frame; and from radar systems operating at 24 to 77 gigahertz.
With this amount of data, the strict latency requirements pose another challenge: the complete perception, decision, and action pipeline must execute within 100 to 300 milliseconds for collision avoidance at high speeds. Each 10 milliseconds of processing delay translates to approximately 0.3 to 0.5 meters of extra stopping distance, which is crucial for the safe operation of the car.
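To make that arithmetic concrete, here is a minimal Go sketch; only the 10 millisecond delay and 0.3 to 0.5 meter figures come from the talk, the example speeds are assumptions.

```go
package main

import "fmt"

func main() {
	// Extra stopping distance is roughly speed x processing delay.
	// At highway speeds of 30-50 m/s, each 10 ms of added delay costs
	// about 0.3-0.5 m, matching the figure above.
	const delay = 0.010 // seconds of extra processing delay
	for _, speed := range []float64{30, 40, 50} { // metres per second (assumed speeds)
		fmt.Printf("at %.0f m/s: +%.2f m of stopping distance per 10 ms of delay\n", speed, speed*delay)
	}
}
```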
Next, environmental variability. Computational load can vary by up to 480% between minimal-complexity scenarios, such as open highways, and maximum-complexity environments like dense urban intersections, where vehicles crisscross, pedestrians are present, and many more complex situations arise. This requires adaptive computation architectures.
Let's understand how edge computing evolved for autonomous cars.
First generation: repurposed consumer GPUs. Early autonomous prototypes used adapted consumer GPUs delivering up to 12 tera operations per second, with significant limitations, namely high power consumption of up to 300 watts. This high power posed additional constraints with respect to thermal management, which required liquid cooling, and also created data transfer bottlenecks consuming up to 67% of the processing time.
In the second generation, automotive-grade accelerators were designed as purpose-built processors with improved efficiency of up to two to four tera operations per second per watt, with memory traffic reduced by up to 60% through pruning and compression techniques, and with support for reduced-precision computing for up to a 4x throughput improvement.
Let's get into the next level of detail on all these techniques.
Edge versus cloud computing trade-offs. Let's understand the differences and advantages between edge and cloud computing. Edge computing advantages: near-instantaneous processing with 5 to 15 milliseconds of latency versus up to 500 milliseconds for the cloud; maintained functionality during connectivity interruptions, which may happen on up to 38% of a route; reduced security vulnerabilities, by up to 50%; and better privacy protection by keeping sensitive data within the vehicle boundaries.
And of course, cloud computing has its own advantages. It offers two to three orders of magnitude greater computational throughput. It enables more sophisticated algorithms with higher accuracy. It externalizes power consumption that would be up to 1,500 watts for edge processing; since the power is delivered from the grid, there is effectively no limit on the power it can consume. It is ideal for non-safety-critical tasks like high-definition map generation, et cetera.
So with that, now let's get into the hybrid approach. Cloud processing handles high-bandwidth analytics, mapping, and fleet learning; this is up to 72% of the tasks. Hybrid processing is where intelligent load balancing with seamless task migration across the platforms is done. The last one is edge processing, used primarily for the mission-critical perception and control systems, which is up to 82% of the processing needs in an autonomous car. The industry has converged on a hybrid architecture that strategically allocates computational workloads between onboard systems and cloud infrastructure.
This sophisticated approach delivers almost one hundred percent functional reliability while reducing the vehicle's computational requirements by up to 45% compared to pure edge solutions. By leveraging both paradigms' inherent strengths, the immediacy of edge processing and the scalability of cloud computing, this hybrid framework creates an optimal balance of safety, efficiency, and capability while effectively neutralizing the limitations of either approach in isolation.
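As a rough sketch of how such an allocation policy might look in Go, consider the routing below; the task names, latency budgets, and thresholds are illustrative assumptions, not the production policy, and only the 500 millisecond cloud round-trip figure comes from the talk.

```go
package main

import (
	"fmt"
	"time"
)

// Task describes a workload with a latency budget and a safety flag.
// The fields and example values are illustrative assumptions.
type Task struct {
	Name           string
	SafetyCritical bool
	LatencyBudget  time.Duration
}

// route sends safety-critical or tight-deadline work to the edge and
// offloads the rest (mapping, analytics, fleet learning) to the cloud.
func route(t Task, cloudRTT time.Duration) string {
	if t.SafetyCritical || t.LatencyBudget < cloudRTT {
		return "edge"
	}
	return "cloud"
}

func main() {
	cloudRTT := 500 * time.Millisecond // worst-case cloud round trip mentioned above
	tasks := []Task{
		{"collision avoidance", true, 100 * time.Millisecond},
		{"object detection", true, 50 * time.Millisecond},
		{"HD map generation", false, 10 * time.Second},
		{"fleet learning upload", false, time.Hour},
	}
	for _, t := range tasks {
		fmt.Printf("%-22s -> %s\n", t.Name, route(t, cloudRTT))
	}
}
```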
Now let's understand some of the requirements related to various aspects of autonomous driving.
First, object detection and classification. This is done mostly at the edge. In terms of performance requirements, with state-of-the-art models we can achieve up to 87% precision while operating within strict latency constraints of up to 50 milliseconds per frame on automotive-grade processors.
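One idiomatic way to enforce such a per-frame budget in Go is a context deadline around the detector call. This is only a sketch: detectObjects is a hypothetical stand-in for a real inference call, and the simulated timings are made up.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// detectObjects stands in for a real inference call on the
// automotive-grade accelerator; here it just simulates work.
func detectObjects(ctx context.Context, frame []byte) ([]string, error) {
	select {
	case <-time.After(30 * time.Millisecond): // simulated inference time
		return []string{"car", "pedestrian"}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func main() {
	frame := make([]byte, 1920*1080*3) // one full-HD RGB frame

	// Enforce the ~50 ms per-frame latency budget mentioned above.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	objects, err := detectObjects(ctx, frame)
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("frame dropped: detection exceeded the 50 ms budget")
		return
	}
	fmt.Println("detected:", objects)
}
```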
Second, multi-stage processing. In this technique, initial region proposal networks operate at 22 to 30 Hz, followed by more intensive classification networks that process only the identified regions of interest, reducing computational requirements by up to 75%.
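The cascade idea can be sketched in a few lines of Go: a cheap proposal stage scans the whole frame, and the expensive classifier runs only on confident regions. The functions, scores, and threshold here are hypothetical placeholders, not a real detector.

```go
package main

import "fmt"

// Region is a candidate area flagged by the cheap proposal stage;
// Score is its objectness confidence (assumed representation).
type Region struct {
	X, Y, W, H int
	Score      float64
}

// proposeRegions stands in for a lightweight region-proposal network
// that scans the full frame at 22-30 Hz.
func proposeRegions(frame []byte) []Region {
	return []Region{
		{X: 100, Y: 200, W: 64, H: 64, Score: 0.91},
		{X: 400, Y: 180, W: 96, H: 96, Score: 0.35},
		{X: 700, Y: 220, W: 48, H: 48, Score: 0.78},
	}
}

// classify stands in for the heavier classification network that only
// runs on regions of interest, which is where the ~75% saving comes from.
func classify(frame []byte, r Region) string {
	return "vehicle" // placeholder label
}

func main() {
	frame := make([]byte, 1920*1080*3)
	const threshold = 0.5 // objectness cut-off (assumed)

	for _, r := range proposeRegions(frame) {
		if r.Score < threshold {
			continue // skip low-confidence regions, saving compute
		}
		fmt.Printf("region at (%d,%d): %s\n", r.X, r.Y, classify(frame, r))
	}
}
```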
There is one more optimization technique: quantization. At full precision we use 32 bits, but reducing from 32-bit to 8-bit integer (INT8) precision cuts memory requirements by up to 76% and inference time by up to 3.4 times, with an accuracy degradation of only about 1.8 to 2% compared to the full-precision 32-bit parameters.
Next, let's understand the lane detection and trajectory planning compute requirements. Lane detection consumes up to 12% of the perception budget, processing camera feeds in either 720p or 1080p resolution to extract lane markings with five to eight centimeters of accuracy at distances up to 80 meters.
Multimodal fusion: here RGB camera data is combined with lidar reflectivity and radar returns to maintain accuracy during adverse weather, when visual data quality degrades by almost 60%.
Third, trajectory planning. In this task the compute evaluates 1,500 to 3,000 candidate trajectories every 100 milliseconds, which comprises almost 25% of the computational budget, optimizing for safety, comfort, and efficiency within the strict latency constraints we have.
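A minimal Go sketch of that loop is below: score candidates with a weighted cost and stop when the 100 millisecond cycle budget runs out. The cost terms, weights, and random candidates are assumptions for illustration; only the candidate count and budget come from the talk.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// Trajectory is a candidate plan with simple per-term costs (lower is better).
// The cost weights below are illustrative assumptions.
type Trajectory struct {
	ID                        int
	Safety, Comfort, Duration float64
}

func cost(t Trajectory) float64 {
	return 0.6*t.Safety + 0.25*t.Comfort + 0.15*t.Duration
}

func main() {
	deadline := time.Now().Add(100 * time.Millisecond) // planning cycle budget
	best, bestCost := Trajectory{}, math.Inf(1)

	// Evaluate on the order of 1,500-3,000 candidates per cycle,
	// stopping early if the 100 ms budget is exhausted.
	for i := 0; i < 3000 && time.Now().Before(deadline); i++ {
		cand := Trajectory{ID: i, Safety: rand.Float64(), Comfort: rand.Float64(), Duration: rand.Float64()}
		if c := cost(cand); c < bestCost {
			best, bestCost = cand, c
		}
	}
	fmt.Printf("selected trajectory %d with cost %.3f\n", best.ID, bestCost)
}
```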
Next, let's understand a few of the techniques used to optimize the model itself further.
Quantization converts high-precision 32-bit floating point to an efficient 8-bit integer representation, delivering 4x weight compression and 2x activation compression with minimal accuracy loss of up to 1.2%. This is critical for edge deployment: 8-bit operations consume nine times less energy compared to floating-point calculations.
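Here is a minimal sketch of the usual symmetric post-training scheme in Go, assuming a single per-tensor scale of max|w|/127; real deployments typically use per-channel scales and calibration, which this example omits.

```go
package main

import (
	"fmt"
	"math"
)

// quantize maps float32 weights to int8 with one symmetric scale
// (scale = max|w| / 127), giving the 4x weight compression above.
func quantize(w []float32) ([]int8, float32) {
	var maxAbs float64
	for _, v := range w {
		maxAbs = math.Max(maxAbs, math.Abs(float64(v)))
	}
	scale := float32(maxAbs / 127)
	q := make([]int8, len(w))
	for i, v := range w {
		q[i] = int8(math.Round(float64(v / scale)))
	}
	return q, scale
}

// dequantize recovers approximate float values for comparison.
func dequantize(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	weights := []float32{0.42, -1.37, 0.05, 2.11, -0.88}
	q, scale := quantize(weights)
	fmt.Println("int8 weights:", q, "scale:", scale)
	fmt.Println("reconstructed:", dequantize(q, scale)) // small rounding error only
}
```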
Next, the pruning technique. Here we systematically eliminate the neural network parameters, up to 70% of them, that contribute negligibly to performance. Structured pruning techniques yield up to 3.8 times computational efficiency gains while maintaining model integrity; with fine-tuning, accuracy drops only by up to 2%.
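A tiny Go sketch of magnitude-based pruning is below: zero out the smallest-magnitude weights until a target sparsity is reached. The weight values are made up; the 70% figure is the only number taken from the talk, and structured pruning would additionally remove whole channels or filters rather than individual weights.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// pruneByMagnitude zeroes the smallest-magnitude weights until the
// requested sparsity (e.g. 0.7 for the ~70% figure above) is reached.
func pruneByMagnitude(w []float64, sparsity float64) []float64 {
	mags := make([]float64, len(w))
	for i, v := range w {
		mags[i] = math.Abs(v)
	}
	sort.Float64s(mags)
	// Everything below this threshold is considered negligible.
	threshold := mags[int(sparsity*float64(len(w)))]

	pruned := make([]float64, len(w))
	for i, v := range w {
		if math.Abs(v) >= threshold {
			pruned[i] = v // keep the significant weight
		} // else leave it at zero
	}
	return pruned
}

func main() {
	weights := []float64{0.91, -0.03, 0.004, -1.2, 0.06, 0.5, -0.01, 0.002, 0.7, -0.09}
	fmt.Println("pruned at 70% sparsity:", pruneByMagnitude(weights, 0.7))
}
```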
Another technique for model optimization is knowledge distillation. In this technique, we use larger teacher models to guide the training of compact student networks. This enables dramatically smaller models, with up to eight times fewer parameters, to capture the essential capabilities of larger architectures while sacrificing only up to 3% accuracy. This is ideal for resource-constrained edge devices.
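The core of the training objective can be sketched in Go: blend the usual hard-label loss with a soft loss that pulls the student's temperature-softened distribution towards the teacher's. The logits, temperature, and blending weight here are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// softmaxT applies a temperature-scaled softmax; a higher T softens
// the teacher's distribution so the student can learn from it.
func softmaxT(logits []float64, T float64) []float64 {
	out := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		out[i] = math.Exp(l / T)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// distillLoss blends the hard-label cross-entropy with a soft loss that
// matches the student's softened distribution to the teacher's.
func distillLoss(teacher, student []float64, label int, T, alpha float64) float64 {
	pT := softmaxT(teacher, T)
	pS := softmaxT(student, T)
	hard := -math.Log(softmaxT(student, 1)[label]) // standard cross-entropy on the true label
	var soft float64
	for i := range pT {
		soft -= pT[i] * math.Log(pS[i])
	}
	return alpha*hard + (1-alpha)*soft*T*T // T^2 keeps gradient scales comparable
}

func main() {
	teacherLogits := []float64{4.0, 1.5, 0.2} // large teacher model's outputs (made up)
	studentLogits := []float64{2.5, 1.0, 0.4} // compact student's outputs (made up)
	fmt.Printf("distillation loss: %.4f\n", distillLoss(teacherLogits, studentLogits, 0, 2.0, 0.5))
}
```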
Now let's get into hardware-aware neural architecture search.
First, automated discovery. In this optimization, the search discovers novel network architectures tailored to specific hardware, outperforming handcrafted designs by up to 12% in accuracy while reducing inference latency by 2.3 times, which is very good.
The second technique is hardware modeling. Here we incorporate detailed models of the memory access patterns, operator execution times, and parallelization capabilities of the hardware to maximize hardware utilization. The models are aware of the actual underlying hardware and its capabilities, which helps achieve better utilization.
Third, sensor fusion optimization. This is particularly effective for sensor fusion tasks, reducing inference latency by up to 48% compared to conventional architectures on automotive-grade accelerators.
Fourth, attention mechanisms. The search discovers efficient attention-based architectures that selectively process high-information regions, reducing overall computational requirements by up to 55%.
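As a toy illustration of how these searches weigh accuracy against the latency predicted by the hardware model, here is a minimal Go sketch; the candidate names, numbers, latency target, and penalty are all made up.

```go
package main

import "fmt"

// Candidate is one architecture produced by the search, annotated with
// accuracy from evaluation and latency from the hardware model.
type Candidate struct {
	Name      string
	Accuracy  float64 // validation accuracy, 0-1 (placeholder values)
	LatencyMs float64 // predicted on the target accelerator (placeholder values)
}

// score rewards accuracy but penalises any latency beyond the target,
// which is how hardware awareness typically enters the search objective.
func score(c Candidate, targetMs, penalty float64) float64 {
	s := c.Accuracy
	if c.LatencyMs > targetMs {
		s -= penalty * (c.LatencyMs - targetMs) / targetMs
	}
	return s
}

func main() {
	const targetMs, penalty = 50.0, 0.5
	candidates := []Candidate{
		{"handcrafted-baseline", 0.87, 60},
		{"nas-variant-a", 0.89, 48},
		{"nas-variant-b", 0.91, 72},
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if score(c, targetMs, penalty) > score(best, targetMs, penalty) {
			best = c
		}
	}
	fmt.Println("selected architecture:", best.Name)
}
```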
There is also another emerging trend, which is neuromorphic computing. Before getting to neuromorphic computing, let's understand what event-driven processing is. In event-driven processing, instead of processing data frame by frame, we process data based on events. This reduces power consumption by up to 95% compared to the frame-based approach by allocating resources only when significant changes occur. It also brings temporal advantages: microsecond-scale temporal resolution, down to 10 microseconds, improves detection latency by 20 to 45 milliseconds for high-speed objects compared to conventional vision systems. It also helps with lighting adaptability, maintaining consistent detection across illumination ranges from 0.1 lux to a hundred thousand lux, addressing high-dynamic-range limitations in real-world environments.
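The event-driven idea can be shown in a few lines of Go: only pixels whose change exceeds a threshold generate events, so work scales with scene change rather than frame size. The synthetic frames and threshold are assumptions; a real event camera does this in hardware per pixel.

```go
package main

import "fmt"

// processEvents counts the pixels whose change between two frames exceeds
// a threshold; only those would trigger downstream processing.
func processEvents(prev, curr []uint8, threshold uint8) int {
	events := 0
	for i := range curr {
		diff := int(curr[i]) - int(prev[i])
		if diff < 0 {
			diff = -diff
		}
		if uint8(diff) > threshold {
			events++ // this pixel generates an event
		}
	}
	return events
}

func main() {
	// Two tiny synthetic "frames": a mostly static scene with a few changes.
	prev := []uint8{10, 10, 10, 200, 10, 10, 10, 10}
	curr := []uint8{10, 12, 10, 90, 10, 10, 11, 10}

	events := processEvents(prev, curr, 20)
	fmt.Printf("%d events out of %d pixels -> work scales with change, not frame size\n",
		events, len(curr))
}
```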
With this understanding, let's see what neuromorphic computing is. Neuromorphic computing represents a fundamental departure from conventional architectures, drawing its inspiration from biological neural systems. These systems integrate processing and memory in artificial neuron and synapse structures that mimic their biological counterparts. Prototype neuromorphic processing units can process 50 to 100 million events per second while consuming only 100 to 300 milliwatts of power, representing a two-orders-of-magnitude improvement in efficiency compared to GPU-based solutions.
Okay, let's summarize the future. Where are we headed with this? Distributed AI and continuous learning are the future. Distributed AI architectures maintain 85 to 92% of critical functionality even with failures in 30% of the processing nodes, and they reduce latency by up to 47% for complex perception tasks through better parallelization and reduced data movement. Continuous learning systems improve detection accuracy by 15 to 20% in novel environments not represented in the initial training data, and they employ safety-aware incremental updates that limit parameter changes to only up to 2% per cycle.
Federated learning: vehicles contribute to collective intelligence while transmitting only about 0.5% of the data required for centralized approaches. With this, synchronization every 100 to 500 kilometers provides up to 85% of the continuous-connectivity benefits, which is great.
Thank you. With this, I conclude my presentation.