Conf42: Machine Learning 2021


Deploying ML solutions with low latency in Python

Aditya Lohia
Machine Learning Engineer @ Tod'Aers


When we aim for higher accuracy, we sometimes forget that our algorithms become larger and slower. This can make them unusable in real-time scenarios. How do you deploy your solution? Which framework should you use? Can you use Python to deploy your solution? Can you use a Jetson Nano for multi-stream inference? If you are curious about these questions, join me in this talk to discover TensorRT and DeepStream and how they reduce your algorithm's latency and memory footprint.

NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for inference applications. DeepStream offers a multi-platform, scalable framework with TLS security for deploying on the edge and connecting to any cloud. If you are using a GPU with CUDA/Tensor cores, you can leverage these SDKs to deploy bigger and better algorithms in real-time scenarios. The main focus of this talk is to demonstrate why, where, and how to use TensorRT and DeepStream.
