Conf42 Machine Learning 2021 - Online

Leverage the Power of Machine Learning with ONNX


Abstract

Have you ever wanted to make your apps “smarter”? This session will cover what every ML/AI developer should know about the Open Neural Network Exchange (ONNX): why it’s important and how it can reduce friction in incorporating machine learning models into your apps. We will show how to train models using the framework of your choice, save or convert models into ONNX, and deploy to cloud and edge using a high-performance runtime.

Summary

  • ONNX is short for Open Neural Network Exchange. It is an open format for machine learning models. The best place to learn more about ONNX is onnx.ai.
  • ONNX is kind of like PDF: you create your document in Microsoft Word, convert it, and then you can display it on different types of devices using Acrobat or another PDF viewer. Step one is how to create ONNX models; step two is how to deploy them.
  • The ONNX Model Zoo offers existing, pre-converted models that you can download off the Internet and start incorporating into your application. You can also train your own models in a typical end-to-end machine learning process.
  • Just as a successful bakery needs both a baker and someone to run the business, a successful ML project needs both model creation and deployment. It matters where we deploy these machine learning models: cloud or edge, where edge means how close it is to your customers or users.
  • ONNX Runtime is a high-performance inference engine for your ONNX models. It is open-sourced by Microsoft under the MIT license. Its extensible architecture supports different hardware accelerators, giving you flexibility in where you deploy and run inferencing.
  • Using ML.NET, I convert a model trained in ML.NET into ONNX. The demo uses a CSV file with two columns, years of experience and salary, and the model guesses the salary from the years of experience. A simple example shows how to do it.
  • As long as we can integrate these machine learning models into our applications, whether existing or greenfield, we can start incorporating them through ONNX. Use the right tool for the right job, and run the model efficiently on the target platform.
  • Ron Dagdag is a lead software engineer at Spacee and a Microsoft MVP. The best way to contact him is through LinkedIn or Twitter. Have a good day.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good morning, good afternoon, good evening, wherever you are in our virtual world. My name is Ron Dagdag. I'm a lead software engineer at Spacee. Today I will be talking about leveraging the power of machine learning with ONNX. For all you Pokemon fans out there, I will not be talking about the Pokemon Onix, nor will I be talking about the mineral onyx. I will be talking about ONNX, the Open Neural Network Exchange.

All right, let's go back to basics. What is programming? Traditionally, you have an input, you write an algorithm, you combine them and run it, and it spits out answers for you. In the machine learning world, you have an input, you have examples of what the answers should be, and the computer's goal is to produce the algorithm for you. So as a primer: you still have your input, answers, and algorithm, but in the machine learning world we call the input and the answers your training data. You use a training framework to produce a machine learning model, and then you use that model in your application; that's what we call inferencing. A runtime processes your input against the model and gives you the answers. And now that you have more answers, they can become a feedback loop that improves your training data.

Typically, a data scientist would create a program in PyTorch and run it locally on their machine using the CPU. If you're a JavaScript developer, you've seen all the different JavaScript frameworks and the many ways you can build web applications; it's the same in the machine learning world. There are many different training frameworks you can use, and the ecosystem is growing. And we're not limited to deploying locally on our laptops: sometimes you deploy to a phone or to the cloud; sometimes you want better performance, so you run on a GPU, an FPGA, or an ASIC; you might even want to run on a microcontroller. That's where ONNX comes into the picture. ONNX is the bridge between how you train and where you deploy.

ONNX is short for Open Neural Network Exchange. It is an open format for machine learning models. Notice that it's not limited to neural networks; it also supports traditional machine learning models. It's on GitHub at github.com/onnx, and the best place to learn more about ONNX is onnx.ai. When you go to that website, you'll notice new partners coming in all the time to grow the ecosystem. It started as a partnership between Microsoft and Facebook, and more and more partners have joined. The GitHub project has about 11,000 stars, almost 2,000 pull requests, about 200 contributors, and about 2,000 forks, and there's also a model zoo available. ONNX is a graduated project of LF AI (Linux Foundation AI), so there's a lot of traction behind it. When would you use ONNX?
Use ONNX when you have something trained in Python and you want to deploy it to a C# application, or incorporate it into a Java or JavaScript application. Use it when you have high inference latency and need production-grade performance: if the model is too slow to run in its original training framework, converting it can help. If you want to deploy to an IoT or edge device, it might make sense to convert the model to ONNX. It also helps when a model is trained on one OS and you want to run it on a different OS or different hardware. Use it when you want to combine models: say you have a team of data scientists, some models were created in PyTorch and some in Keras, and you want to build a pipeline that combines models trained in different frameworks. Another case is when training takes too long, which is when you start talking about transformer models, or when you want to train locally at the edge.

So we've talked about what ONNX is. One good way to describe it: ONNX is kind of like PDF. You create your documents in Microsoft Word or some other word processing application and convert them to PDF; now you can display them on different types of devices using Acrobat or another PDF viewer. We've also talked about when to use ONNX. Next is step one, how to create ONNX models, and then step two, how to deploy them.

Let's focus on step one. Have you ever baked a cake? There are a lot of different ingredients and procedures, and bakers specialize in this. My analogy is that the bakers are your data scientists: they're the ones who make the secret recipe for your business. They try different tweaks, ingredients, and procedures to create these AI models, which become your secret recipe. So how do you create ONNX models? One way is the ONNX Model Zoo: there are existing models out there that you can download off the Internet and start incorporating into your application. You can use Azure Custom Vision or another service that exports to ONNX. You can convert an existing model, and you can also train models in Azure Machine Learning or with automated machine learning. In the ONNX Model Zoo, someone has already pre-converted many of the popular machine learning models out there. If you're interested in ResNet, it's already converted to ONNX for you and you can just download it. The zoo lists the sizes of the models once they're converted to ONNX, and it's not limited to image models; there are sound models and other types you can download too.
Another option is Custom Vision, a low-code vision service where you upload photos, tag them, train a machine learning model, and then export it to ONNX. Another way is to convert a model from an existing training framework. Say you have it in PyTorch, Keras, TensorFlow, or scikit-learn; there's a way to convert it to an ONNX model. There are three steps: load the existing model into memory, convert it to ONNX, and save the ONNX model. For PyTorch, you load the model, provide some sample input, and use torch.onnx to export it. If you have a Keras model, it's the same steps: load the Keras model, convert it to ONNX, and save it as a protobuf; the onnxmltools package, which you can pip install, handles this. You can also convert a TensorFlow model from the command line, where you specify your saved model and your output file; there are a lot of good examples of how to do this on GitHub. For scikit-learn, there's skl2onnx, which converts scikit-learn models into the ONNX format. A sketch of a couple of these conversions follows below.
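As a rough sketch of those conversion steps (this is not code from the talk), here is what the PyTorch export and the scikit-learn conversion look like in Python. The model choices, data, and file names are illustrative; you'd need torch, torchvision, scikit-learn, and skl2onnx installed.

```python
import numpy as np
import torch
import torchvision
from sklearn.linear_model import LinearRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# --- PyTorch -> ONNX: load a model, provide sample input, export ---
model = torchvision.models.resnet18(pretrained=True)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)  # sample input with the shape the model expects
torch.onnx.export(model, dummy_input, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

# --- scikit-learn -> ONNX: convert with skl2onnx and save the protobuf ---
X = np.array([[1.0], [2.0], [5.0]], dtype=np.float32)
y = np.array([40000.0, 50000.0, 80000.0])
sk_model = LinearRegression().fit(X, y)
onnx_model = convert_sklearn(
    sk_model, initial_types=[("input", FloatTensorType([None, 1]))])
with open("linear.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# A TensorFlow SavedModel converts from the command line with tf2onnx:
#   python -m tf2onnx.convert --saved-model ./saved_model --output model.onnx
```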
There's a tool called Netron (netron.app) that visualizes an ONNX model for you. It helps software engineers understand the inputs and outputs of an existing model without going back to the data scientist or the original training code. It visualizes the inputs and shows what the graph of operations looks like: go to netron.app, open an ONNX file, and you can explore it. You can also use ONNX as an intermediate format. Say you have a PyTorch model and you want to convert it to TensorFlow: you can convert from PyTorch to ONNX and then from ONNX to TensorFlow. There's also an ONNX to Core ML converter. There are ways to fine-tune an ONNX model and do transfer learning on an existing one. And of course you can train models in the cloud on GPU clusters.

The important part for me is the typical end-to-end machine learning process. You run experiments and build your training application from your IDE. Once you train it, you have a machine learning model. You register it somewhere in the cloud, manage those models, and version them, kind of like pushing Docker images to Docker Hub. Azure Machine Learning gives you a place to upload and version your models and to build a pipeline that downloads them and bakes them into an image.

So we've covered step one, creating. Once we have an ONNX model, we can start deploying it. We said your data scientists are like chefs or bakers building your secret recipe. Now, let me ask you one thing: what is the difference between being a baker and starting a bakery? The main difference is the skill set. To create a successful bakery business, you need both: you need the baker, and you also need someone who actually manages the bakery. Software engineers are great at figuring out how to start the bakery. They know where to put the cashier and how to collect money, how to create the pipelines, how the application will be used, how to build the different parts of the business system, how the machine learning model fits in, and what the customer experience should be. So whenever we create these machine learning models, it matters where we're going to deploy them. Some things to think about: you might deploy to a VM, to a Windows, Linux, or Mac device, or to edge devices and phones. There are many ways to deploy and use these AI models.

Every time we think about deployment, we ask: are we deploying to the cloud or to the edge? Edge means how close it is to your customers or your users. My analogy is McDonald's versus Subway: what's the difference in how they make the bread? At McDonald's, the baking is outsourced; it doesn't happen at the store. At Subway, the bread is baked at the store, which is a different experience. What I'm getting at is: when we talk about deployment, where do we want to run these AI models? Do we send the data to the cloud, run the inferencing there, and return the results? Or do we run at the edge, closer to the user, maybe on the phone, on the camera itself, or on a gateway? Those are things to consider when we deploy machine learning models, especially ONNX models.

Of course, you can also deploy in the cloud. Since you've already registered your ONNX models there, you build your image and create your pipeline. You can deploy through an app service, or run it in a Docker container or in a Kubernetes service. Speaking of Docker images, there are ONNX Docker images you can start using: there's an ONNX base image with minimal dependencies if you want to use ONNX in your application, and there's an ONNX ecosystem image that lets you convert models without installing anything locally. Say you just want to convert an existing model written in PyTorch: you don't have to download all the converters onto your machine; you can just use these Docker images.

So when we talk about the edge, what is the edge? Remember, the definition is how close it is to your customers or users. When we deploy to the cloud, we're deploying to data centers, maybe thousands of devices. If we deploy to 5G infrastructure, to the fog, that's maybe millions of devices and millions of models. And when we talk about the edge, it might be billions of devices, each of which may need a different deployment structure.
So why would you want to run your machine learning model at the edge? One reason is low latency: say you're doing inferencing on video or sound; you want it fast, so it makes sense to run it locally on the device itself. Another is cost and scalability: if you had to ship every image and every frame to the cloud, that would cost money, so running at the edge scales better. Another is flexibility: running locally removes the need for an Internet connection. And there are privacy rules: you may not want to send personally identifiable information (PII), or local laws may restrict data to certain geographical areas. Running at the edge gives you that flexibility in where you deploy and where you run the inferencing.

That's where ONNX Runtime comes in. It's a high-performance inference engine for your ONNX models, open-sourced by Microsoft under the MIT license. It's not limited to neural networks; it also supports traditional machine learning. It has an extensible architecture that allows different hardware accelerators, and it ships as part of Windows 10 as WinML. If you want to learn more, there's the onnxruntime.ai website. One part I think is pretty neat: you can pick your platform and API, say a Linux application using the C# API on x86, or ARM64 if that's your target, then pick a hardware accelerator, CUDA if you want to use the GPU, or the default CPU, and it gives you instructions for incorporating it into your application. Notice the different hardware accelerators: for example, to target OpenVINO, you don't have to convert a PyTorch model into something OpenVINO-specific; you can go PyTorch to ONNX and then use ONNX Runtime with the OpenVINO execution provider.

Like I said, ONNX Runtime ships with the Windows AI platform as part of the WinML API, a practical, simple, model-based API for inferencing on Windows. So if you have an existing WinForms application and you want to add machine learning to it, this lets you do that. There's also the DirectML API: if you're creating a game, DirectML runs on top of DirectX 12 and gives you a real-time, high-control machine learning operator API. And there's a robust driver model that automatically detects whether you have a GPU or a VPU and switches accordingly: if it can only run on the CPU, it uses the CPU, and if a faster device is available, it uses that. That's how it accesses those drivers.
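As a minimal sketch of what that accelerator selection looks like from the Python API (the talk uses the configuration picker on onnxruntime.ai instead; the file name here is illustrative), assuming the onnxruntime package is installed:

```python
import onnxruntime as ort

# Execution providers compiled into this onnxruntime build
print(ort.get_available_providers())

# Ask for CUDA first and fall back to the default CPU provider;
# the same model file runs unchanged on either accelerator.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(session.get_providers())  # the providers that were actually loaded
```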
There's also ONNX.js, a JavaScript library to run ONNX models in the browser or even in Node. It uses WebGL and WebAssembly and can automatically use the CPU or GPU. Think about what that means for a browser app: ONNX.js downloads the ONNX model to the browser and does the inferencing locally. Instead of sending data to the cloud, the ONNX model lives in the browser itself, Chrome, Edge, Firefox, or Opera, and inferencing happens right there. If you're building an Electron app, you can also integrate it with your Node application. And it's not just desktop; it works on mobile browsers too.

All right, let's do a bit of a demo. If you're interested in the materials I'm using, here is the link, and I'll show it again later. If you go to that link, it brings up this website; click through to try it out, and it pulls up the Dockerfile and creates an instance of a Jupyter notebook using Binder. This is what it looks like. Now that the kernel is ready: this is a C# application. I'm running a Jupyter notebook using .NET Interactive so I can write C#. What I want to demo today is converting a model trained in ML.NET into ONNX. First I pull down some NuGet packages. While they're downloading, let me talk a little about the code: I'm using System.IO, Microsoft.Data.Analysis, and XPlot.Plotly, plus a small helper library that formats DataFrames so they display properly in the Jupyter notebook.

I have a CSV file, salary.csv, with two columns: years of experience and salary. I wanted the simplest possible example, one input and one output; this is not the best example of a production machine learning application, but it's easy to follow. The input is years of experience, the output is salary, and we want a model that guesses the salary from the years of experience. It's a contrived example. Now that the packages are downloaded, I load the CSV file into a DataFrame. Looking at the data, you can see that as the years increase, so does the salary, and the describe output gives me the min and the max; I only have 30 rows. Using ML.NET, I create a pipeline to train a model: you create the MLContext, build the pipeline, and then call Fit and Transform; there's always that pair. Once you have the transformer, you have your model. Now that I have that model, I want to convert it to ONNX, so I call the context's Model.ConvertToOnnx, passing in a stream and my data, and it creates model.onnx for me.
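Before opening the file, one way to sanity-check a generated model from Python (not shown in the talk) is the onnx package's checker; this is a small sketch assuming onnx is pip-installed, and it prints the same input/output information that Netron shows graphically:

```python
import onnx

# Load the exported file and validate that it is a well-formed ONNX graph
model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# List the graph's declared inputs and outputs
for inp in model.graph.input:
    print("input: ", inp.name)
for out in model.graph.output:
    print("output:", out.name)
```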
Let's find where that model.onnx file landed. There you go: the ONNX model was generated for me. Now that I have an ONNX model, let me verify it and see how I can run it. I'll open another project, an ONNX inference Python notebook. This is not a C# application, so I change the kernel to Python. This time I want to use ONNX Runtime in Python to do inferencing on that model.onnx file, so I pip install onnxruntime, which downloads all the necessary requirements for the ONNX Runtime library. Then I import it and create an InferenceSession.

If you open this model.onnx in netron.app, it displays the contents of the ONNX model: the inputs and the outputs. The inputs are years of experience and salary. Salary is the value we're trying to guess, so at inference time that input is ignored; even if you place a number there, it won't be used, and you can see in the graph that it isn't connected. The model only uses years of experience, feeding it through a feature vectorizer into a linear regressor to produce the output. Several of the declared outputs are just stubs; the one we're interested in is the score output. In the notebook, I read the name, shape, and type of the years-of-experience input, the same for the salary input, and the shape and type of the output. How did I get four? It's the fifth output: 0, 1, 2, 3, 4. Now I can pass in the data, the input experience and the input salary, as arrays of the right type. Notice I put zero for salary because it's going to be ignored anyway. If I set years to ten, the result shows the predicted value; if I change it to three and a half years, I get a different output.
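Condensed, the Python side of the demo looks roughly like the sketch below. The input names ("YearsExperience", "Salary") are assumptions based on the CSV column headers; check session.get_inputs() for the actual names your exported model exposes.

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")

# Always inspect the real input/output names, shapes, and types first
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print(out.name, out.shape, out.type)

# Input names below are assumptions; use the names printed above in practice
years = np.array([[10.0]], dtype=np.float32)
salary = np.array([[0.0]], dtype=np.float32)  # in the graph, but ignored at inference time
outputs = sess.run(None, {"YearsExperience": years, "Salary": salary})
print(outputs)  # the predicted salary appears among the outputs
```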
So what happened so far? I trained a model and exported it to an ONNX file using ML.NET in C#, and from that ONNX model I used ONNX Runtime to do inferencing in my Python application. This is how it feels after learning all these things: it connects everything. ONNX is a way for a lead software engineer to talk to data scientists, and for data scientists to talk to software engineers, so that we can take these secret recipes, these machine learning models, and integrate them into our applications. At the end of the day, as long as we can integrate them into our applications, whether an existing application or a greenfield one, we can start incorporating machine learning through ONNX.

As a recap: what is ONNX? It's an open standard. Use the right tool for the right job, and run the model efficiently on your target platform; ONNX separates how you train a model from how you run inferencing on it. How do you create an ONNX model? I showed you how to download one from the ONNX Model Zoo, and how you can create and convert models using the various ONNX converters. And how do you deploy an ONNX model? You can deploy through Windows, .NET, or JavaScript using ONNX.js, and I demoed how to use ONNX Runtime in Python to run a model with high performance.

If you want to learn more about me, my name is Ron Dagdag. I'm a lead software engineer at Spacee and a Microsoft MVP. The best way to contact me is through LinkedIn or on Twitter at Ron Dagdag. I appreciate you geeking out with me about ONNX, ONNX Runtime, Jupyter notebooks, bakeries, bakers, and bread. Thank you very much. Have a good day.

Ron Lyle Dagdag

Lead Software Engineer @ Spacee



