Conf42 Machine Learning 2025 - Online

- premiere 5PM GMT

Bringing AI to the Edge: How ML is Powering the Future of IoT


Abstract

By 2025, 75% of enterprise data will be processed at the edge, revolutionizing AI-powered IoT. Discover how ML deployment is transforming smart cities, healthcare, and manufacturing, enabling real-time intelligence and thereby powering the future of IoT.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to the Conf42 Machine Learning conference. I am Gayathri Jegan Mohan, and today I'm going to talk about bringing AI to the edge: how machine learning models are powering the future of IoT. Let's get started.

The first thing I want to talk about is what edge AI means. In IoT, machine learning models are usually deployed in the cloud rather than at the edge. Edge AI is a little different, because you have the capability to deploy models locally on the edge device and reduce latency. In the edge AI scenario, the devices are connected to a local pre-trained model that can do inferencing, and the pre-processing is done at the edge itself; any result goes directly to the user interface. In the usual cloud scenario there is an extra hop: the data first goes to the cloud for inferencing, and only once the results are out does the user get notified. With edge AI you remove those extra levels of hopping, everything is done at the edge itself, and that helps with real-time tracking. For example, if you want to do traffic light control, or you want to know whether an appliance is broken, that information is immediately available, and you can track the real-time status of your devices. It also helps with privacy: once you send data to the cloud, it can be moved geographically from one location to another, and you avoid that by keeping the model inferencing very close to the source where the data is generated. So that is edge AI in a nutshell.

Now let's see what this looks like in an actual scenario. The first example is a surveillance camera. If you want to do inferencing right where the cameras are located, for example to track anomalies in the output, you keep the inferencing models close to the source where the data is generated, which in this case is the camera, and detect anomalies there. In the same way, you can do this on a manufacturing robot and track it in real time without sending the images to the cloud. And then there is the smart speaker in your home, which recognizes your voice locally instead of sending your voice and your prompt to the cloud for further processing: you keep everything at the source for faster response and to address privacy concerns. These are some of the examples you can see today.

The next question is why edge is better than cloud. We are moving the AI inferencing to the edge, so how much better is it? In the cloud model the inferencing happens in the cloud; with edge AI it happens on the device itself. The first advantage is latency, as we have seen, because you get a real-time response as the data is generated and the inferencing is done. It also reduces bandwidth, because you are not sending a lot of data to the cloud: only minimal data is processed and sent upstream.
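To make the idea of on-device inferencing concrete, here is a minimal sketch of what a local inference step might look like on a small edge device. It assumes a TensorFlow Lite model file and the tflite-runtime package are already on the device; the model path and the dummy input are hypothetical stand-ins.

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime meant for edge devices

# Load a pre-trained, already-converted model that lives on the device itself
interpreter = tflite.Interpreter(model_path="anomaly_detector.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Stand-in for a camera frame or sensor reading captured locally
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

# Inference happens entirely on the device: no round trip to the cloud
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print("local inference result:", result)
```

Because the model, the data, and the result all stay on the device, the latency and bandwidth savings described above follow directly.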
Next comes security, which we already touched upon: you are reducing the number of hops the data makes. In this scenario you build a much more secure solution because you are not moving the data around as often. It is also very reliable, because it operates even with no internet connectivity or with a limited connection, so you can build reliable solutions. And last but not least there is energy: because you are not spending that much compute in the cloud, you reduce energy use as well. So those are some of the differences between edge AI and cloud AI.

The biggest paradigm shift for IoT today is that, as of this year, 75% of data is actually processed at the edge, which is revolutionizing AI-powered IoT. This is changing the way we approach IoT solutions, because you can do a lot more at the edge than you were able to do before. But doing this poses its own challenges in the background.

The first challenge is limited computing resources, because now you are doing more than you used to. These devices have usually been resource-constrained: they have low processing power compared to the powerful GPUs and TPUs running in the cloud, and limited memory and storage for the AI models to use. Most of these devices are battery operated, so there is a power constraint as well. You cannot run a complex model like a full-fledged ResNet-50, because you simply don't have that much compute, storage, or processing power.

Model deployment and updates also become a challenge, because it is very hard to push updates out: you might have 10,000 devices at the edge, so it is very hard to go and update each and every device when a new model is available, and you also run the risk of devices ending up on the wrong model version. And while we have already seen that edge AI helps with security, it also poses security issues on the other side of the spectrum: the devices are mostly exposed to public networks, so you are prone to malware attacks, you need secure boot on the device, and you need to use lightweight standard protocols that provide security from the start. Otherwise the edge becomes hugely prone to threats and attacks, even though it avoids the risks of sending data to the cloud.

There are other challenges too. The second set of challenges starts with connectivity. As I said before, these devices are usually deployed in remote sites, and sometimes they have no internet at all, so it is very hard to sync to the cloud even after you have inferencing results that you want to store there or use later for post-processing. That becomes a hard problem. The next challenge is model optimization and compression: you have to make sure you are creating a model that is actually suitable for the edge.
The model has to be lightweight, so accuracy might be compromised: you are not going to get the full-blown result of the large model, because it is tailored to the scenario at the edge. Observability is also very hard, because you have to monitor performance and failures, but you can only use simple logs and tools; you cannot collect complex telemetry with something like OpenTelemetry or Jaeger, because that is just not possible with the resources at hand. And there is a hardware diversity problem: there are so many devices from so many different vendors, each with their own OS and protocols, that it becomes a challenge to aggregate those devices and make sure you can run the models everywhere in a uniform fashion. Those are the main challenges we have.

Now let's get into the nitty-gritty details of best practices. Even though we have these challenges, there is a way to make things better by following some of these practices. There are four practices I'm going to cover today. The first is deploying ML models at scale: we have 10,000 devices and the number keeps increasing, so how do we make sure the models get deployed to every device? The second is updating models on the edge: if a new version of a model becomes available, how do we make sure we follow the best practice for rolling it out? The third is using model compression techniques, which reduce the model size and make it fit the scenario at hand. And last but not least, using cloud-managed AI pipelines: the cloud is not completely out of the picture, and you can still use it for other purposes, which we will get into.

The first practice I want to cover is how to deploy models at the edge at scale. Since we have thousands and thousands of devices, it becomes critical to ensure model portability. You want a runtime on the device that is portable across models: for example, use formats like ONNX or TensorFlow Lite that work across different hardware, because if a model works on one vendor's hardware, it should also work on another vendor's hardware; otherwise you get model incompatibility. You should also use hardware abstractions for target accelerators such as NVIDIA Jetson, Intel OpenVINO, or Arm NPUs, because these are designed around unified runtimes, and you can use those unified runtimes to deploy any model of your choice that is targeted specifically at edge AI. The last best practice for deploying models at scale is containerization: once you package all the dependencies in a container, you can use Docker to run it, so you get consistent execution across different platforms and a steady way to deploy these models. You package the models inside the containers as well, so you can expect the same consistent result on every device you deploy to.
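As a rough illustration of the portability idea, here is a minimal sketch of exporting a trained PyTorch model to ONNX and running it with ONNX Runtime. The model, file name, and input shape are hypothetical, and it assumes the torch, onnx, and onnxruntime packages are installed.

```python
import torch
import torch.nn as nn
import numpy as np
import onnxruntime as ort

# A stand-in for whatever model was trained in the cloud
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Export once to a portable ONNX file...
dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "edge_model.onnx",
                  input_names=["input"], output_names=["output"])

# ...and run it with a runtime that exists for many different edge targets
session = ort.InferenceSession("edge_model.onnx")
outputs = session.run(None, {"input": np.random.randn(1, 16).astype(np.float32)})
print("ONNX Runtime output:", outputs[0])
```

The same .onnx file could then be baked into a container image, so every device runs it against the same runtime and dependencies regardless of the underlying hardware.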
The next practice is updating the models. If a new model comes out after you have trained it on your historic data, you need to understand how to roll it out, because frequent updates are required to improve efficiency, and if there is any vulnerability you want to make sure that is taken care of as well. You need to adapt to an environment that is constantly shifting. Some of the best practices here are, first, over-the-air (OTA) updates with version control, which means sending the model to many devices over the air, that is, over the internet. Then there is A/B testing: you first deploy the new model to a subset of devices, see how it performs, and only then roll it out to the rest of the fleet. And there is also the digital twin approach: you mimic the physical device in a digital environment, put the new model into the digital twin, see what happens, and if everything goes fine you go ahead with the real update. So that is how to update the models.

The third practice is using model compression techniques. Since edge AI uses different models than the usual cloud models, we need to understand what kinds of compression go into them. The first is quantization. In the cloud, on a beefy system, you would use 32-bit or 64-bit precision. You no longer have that luxury on a small, constrained device like a Raspberry Pi, so you compress the model to 8-bit integers, which makes it smaller and also makes inferencing faster. Another technique is pruning, where you remove insignificant weights or neurons from the model; this speeds up computation and gives you inference results more quickly. The third is knowledge distillation, where you take a larger teacher model and train a smaller student model to reproduce its behavior, so the student does the basic operations correctly; this way you get higher efficiency while keeping good accuracy. These are some of the techniques used to adapt models to the edge AI scenario.
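For example, post-training quantization in TensorFlow Lite can be done roughly like this. This is a minimal sketch, assuming a TensorFlow SavedModel directory; the paths are hypothetical.

```python
import tensorflow as tf

# Convert a full-precision SavedModel into a smaller TFLite model
converter = tf.lite.TFLiteConverter.from_saved_model("trained_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training (dynamic range) quantization of weights
tflite_model = converter.convert()

with open("edge_model_quant.tflite", "wb") as f:
    f.write(tflite_model)

print("quantized model size (bytes):", len(tflite_model))
```

Full integer quantization would additionally require a representative dataset during conversion, while pruning and distillation are usually applied at training time, before this conversion step.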
The last, but not least, practice is cloud-managed AI pipelines. You are not taking the cloud entirely out of the picture; rather, you make cloud and edge work together. Instead of managing the lifecycles of all these models manually, you can do it with cloud-based MLOps (machine learning operations). These pipelines can train and retrain models in the cloud and update the dataset; behind the scenes they do all the training and keep making the model better, so you don't have to do it by hand anymore. CI/CD pipelines are there to automate testing as well as deployment. You can also collect telemetry from the edge to continuously improve the models: some basic telemetry is still fine, because you need it to understand what is going on in the edge device, and you can use that information to train a better version of the model than the previous one. Azure ML and AWS SageMaker are some examples; the cloud providers have their own solutions for AI pipelines, so you can integrate and continuously deploy AI solutions for real-world applications.
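As a small illustration of the kind of lightweight telemetry an edge device might send back to a cloud pipeline for retraining, here is a sketch using MQTT via the paho-mqtt package. The broker address, topic name, and payload fields are hypothetical, and a real deployment would also need authentication and TLS.

```python
import json
import time
import paho.mqtt.publish as publish

# A small summary of what the on-device model saw, not the raw sensor data
telemetry = {
    "device_id": "edge-device-0042",
    "timestamp": time.time(),
    "model_version": "1.3.0",
    "inference_count": 1024,
    "anomaly_rate": 0.012,
}

# Publish a single message to a hypothetical broker run by the cloud-side pipeline
publish.single(
    topic="factory/line1/telemetry",
    payload=json.dumps(telemetry),
    hostname="broker.example.com",
    port=1883,
)
```

Keeping the payload to aggregated statistics rather than raw data preserves the bandwidth and privacy advantages of edge AI while still giving the cloud pipeline enough signal to retrain and version the next model.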
Now let's look at some real-world applications that use the principles we just talked about. The first is a deployment-at-scale example. Walmart deployed thousands of cameras with AI models in their retail stores to monitor inventory levels, understand where each item is placed, analyze customer behavior, and detect theft or other anomalies. They used compact edge servers like NVIDIA Jetson, with computer vision models optimized specifically for the edge, running on TensorFlow Lite, which is a lighter version of the usual TensorFlow. The result was reduced stockouts, optimized restocking schedules, and improved operational efficiency across different locations. That covers the first best practice, deploying at scale.

Now the second best practice: over-the-air updates with versioning, delivering the right model to the right devices. Tesla did over-the-air model updates. They routinely push AI updates, for example the Autopilot vision and driving behavior models, to cars globally. They used secure OTA pipelines to validate and deploy the models with rollback capabilities, so for each model increment they could ship major or minor improvements without the customer having to go to a service center. The customer receives the models over the internet, which makes deployment much easier and saves time for both customers and service centers. That is a very good real-world example of Tesla cars getting updated.

Next is model compression. Google did model compression with MobileNet on Android devices. The MobileNet family is targeted specifically at lightweight mobile inferencing, so you can do image classification or object detection with the compute power available on the phone. They used quantization as well as the pruning techniques we talked about to reduce the model size while retaining accuracy for image and object detection. The result was real-time AI features on the phone, available instantly without internet access: even on a plane, you can do image and object detection pretty easily. That is a very good example of model compression.

The last example is the cloud-managed AI pipeline. Siemens did predictive maintenance in their factories: they deployed AI across factory floors so that whenever machines were about to fail, or even before they reached that state, they could predict it. They used Azure IoT as well as ML pipelines to train models in the cloud and deploy the different versions that were needed. This reduced downtime, because whenever a machine goes into maintenance mode it takes a long time to come back up, which creates downtime, cost, and inconvenience for the customer. Machine downtime was reduced by 30%, which is pretty nice. They also use the cloud to continuously retrain models on the telemetry collected from the edge, with pipelines making this seamless, so they use both the edge and the cloud. This way they reduced maintenance effort and made sure they get real-time updates from their devices using the AI models.

So these are some of the best practices, and we saw some real-world use cases. That's pretty much what I had. How AI is revolutionizing IoT at the edge is amazing, and hopefully this gave you some insight into the background of IoT solutions and what is being followed right now. Thank you so much.
...

Gayathri Jegan Mohan

Software Engineer @ Microsoft



