Conf42 Machine Learning 2021 - Online

Convolutional Neural Networks in Action



Neural networks are great for complex data sets, but some sets have more features to figure out than others. Many times these features are initialized based on heuristics and they have to be tuned as the model returns predictions. With convolutional neural networks, the model tunes the features for itself.

In this talk, you will learn some use cases for CNNs, how they work under the hood, and how you can create a CNN in Python. You’ll be able to see how convolutions and max-pooling help decrease the amount of pre-processing you have to do. By the end of the talk, you should have a good understanding of the basics of CNNs and how to implement them.


  • Milecia McGregor is a developer advocate at Iterative. Today she'll talk to you about convolutional neural networks in action. We'll walk through the code that you would use to make a CNN in Python, and we'll run one quick training experiment with DVC.
  • Neural networks are basically just algorithms that can be used to make predictions. They're made of multiple layers of nodes. The goal of a neural network is to use deep learning to try to imitate the way our brain works. There are a lot of different types of CNNs.
  • You don't have to use Visual Studio Code to do any of the things that I'm doing. You can even run all of these commands in a regular old console. Let's break down this convolutional neural net. Pretty much all of this is making a machine learning model.


This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Milecia McGregor, and I'm a developer advocate at Iterative, where I specifically work on DVC (Data Version Control), an open source tool you can use to, well, version control your machine learning projects. But today I'm going to talk to you about convolutional neural networks in action. If you have any questions at any point, feel free to reach out to me personally on Twitter at @FlippedCoding, or you can reach out to the whole team on … Just so you have a little background about me: I have my master's in mechanical and aerospace engineering. Then I did some machine learning work in robotics, where I got to work on this cool autonomous car that interfaced with pedestrians and passengers. From there I've done front end, back end, DevOps, and database admin work; I've kind of been all over the place in tech. But convolutional neural networks are something I work with a lot on my personal projects, which probably says a lot about what I do with my free time, and I wanted to talk about them with you all so you can see how they're actually used in action. So here's a quick overview of what we'll cover today. I'll give a quick background on neural networks in general. Then we'll go over some basics of CNNs, I'll touch on a few use cases for CNNs, and we'll actually make one: I'll walk through the code you would use to make a CNN in Python, and we'll run one quick training experiment with DVC so we can see how good our model actually is. Finally, I'll wrap up with a few key takeaways, some stuff I really hope helps you after this is over. So to get started, a little background on neural networks. These are basically just algorithms that can be used to make predictions. They're made of multiple layers of nodes, and this is what one node looks like.
The goal of a neural network is to use deep learning to try to imitate the way our brain works, so each node is like a neuron in your brain. You give it a certain number of inputs, and each input is assigned a weight that says how important it is to the problem you're trying to solve. Then the node combines the weighted inputs (a weighted sum plus a bias) and passes the result through an activation function to get your output, or your prediction, or whatever it is you're looking for. Now let's talk about some basics of CNNs. A convolutional neural network has convolutions, and these are just math: a linear operation that multiplies a set of weights with the inputs. Those weights form a filter that is smaller than the input data, so you take the filter and a filter-sized patch of the image, the little part of the image your CNN is currently going over, and compute their dot product. Here's a picture that shows what that looks like a little better. The squares you see in orange (or yellow, I'm not sure which color that is) are the filter-sized patch of the image, and the filter itself is the three-by-three matrix laid over that image. As we perform these convolutions, you move across the image in steps called strides. With a stride length of one, you scan this three-by-three section, then shift the whole three-by-three matrix over by one column and scan again until you run out of columns. Then you drop down one row and repeat until you've scanned the whole image with that filter. The convolved feature you end up with is basically a smaller representation of the image.
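The paragraph above describes two computations: a single node (weighted inputs plus a bias, then an activation) and a convolution that slides a 3x3 filter across an image with a given stride. Below is a minimal NumPy sketch of both, assuming ReLU as the activation; the function names `node` and `conv2d`, the 5x5 image, and the filter values are illustrative assumptions, not code from the talk. Like CNN libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation, which is what deep learning layers call convolution.

```python
import numpy as np


def node(inputs, weights, bias):
    """One node: weighted sum of the inputs plus a bias, passed through ReLU."""
    z = np.dot(inputs, weights) + bias
    return max(z, 0.0)


def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding), taking a dot product at each stop."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):              # move down one stride at a time
        for j in range(out_w):          # move across one stride at a time
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return out


image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

print(node(np.array([0.5, -1.0, 2.0]), np.array([0.4, 0.3, 0.1]), 0.1))  # about 0.2
print(conv2d(image, kernel))  # 3x3 convolved feature: [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With a stride of one, the 5x5 image shrinks to a 3x3 convolved feature; `conv2d(image, kernel, stride=2)` would shrink it further to 2x2.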
Usually we use convolutions to pick out the features in a photo, like edges or large landscape features: things that are really big and help define the overall image. Here's the gap CNNs fill: when you use a regular neural network or some other algorithm to classify images, it might flatten the image into a one-dimensional vector, and making an image 1D takes away a lot of its context. You need to keep at least two dimensions for your images so that you preserve the spatial (and, for video, temporal) context of what's actually happening in the image. Convolutions handle a lot of the preprocessing for us, and they help us get through our data faster because they squeeze the image down into the features that really matter. That's what we're doing with our convolutions. Something else that's a major part of CNNs is the max pooling layer, or multiple layers if that's what your model needs. Max pooling is how we decrease the computational load needed to process all of our data. It returns the max value from the portion of the image covered by the kernel. If I go back to the previous picture, the filter over our orange area is the kernel, so our kernel size is three, because it covers a three-by-three chunk of the image, and max pooling returns the largest value within that kernel. This picture was made for something else, so it might not be the best example, but here it would return a one, because that's the largest number in this filter-sized patch. That's what's meant by max pooling. Because it keeps only the maximum value from each kernel, max pooling also acts as a noise suppressant: that max value represents some really bold feature in the image.
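To make max pooling concrete, here is a small NumPy sketch that keeps only the largest value in each window. It assumes a 2x2 window with a stride of 2, a common choice that is not taken from the talk's slide (which used a 3x3 kernel); the function name `max_pool2d` and the feature-map values are illustrative.

```python
import numpy as np


def max_pool2d(feature_map, kernel_size=2, stride=2):
    """Return the largest value in each kernel-sized window of a 2-D feature map."""
    h, w = feature_map.shape
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + kernel_size,
                                 j * stride:j * stride + kernel_size]
            out[i, j] = window.max()  # keep only the strongest response
    return out


fm = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
])
print(max_pool2d(fm))  # [[6 5]
                       #  [7 9]]
```

A 2x2 window with stride 2 cuts the height and width in half, so later layers have a quarter as many values to process; `torch.nn.MaxPool2d(2)` performs the same operation inside a PyTorch model.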
For example, that might be a large object that takes up a good chunk of an image, or something like a person standing in the foreground. As the convolution scans through the image, it picks out the most important features in each of those kernels. There are a lot of different types of CNNs; it just depends on the model you need and the problem you're trying to tackle. 1D CNNs are usually used on time series data. Like I mentioned earlier, 1D isn't the greatest for processing images because you lose a lot of the context of what's happening in an image. But if you have some kind of time series data, like weather changes (I'm not sure why you'd need a CNN for the weather, but I'm sure there's an application for it), or anything else that is time dependent, 1D CNNs will probably do a good job. Then there are 2D CNNs, which is what we've been talking about the most. These are used for image labeling and classification problems; they're kind of the standard now when we're trying to classify images. And then there are three-dimensional CNNs, which get into more high-tech imagery. You'll see these a lot in healthcare with things like CT scans and MRIs. You'll probably also see them in advanced scientific labs, maybe ones doing things with electric fields, I'm not sure. Whatever it is they use to make 3D images, you might be able to process it with a 3D CNN. You've seen what a 2D CNN looks like when you're scanning through with the convolutions and doing your max pooling. A 3D CNN works on a cube instead: you scan through different chunks of that cube, doing your convolutions and your max pooling, to get the most important features out of it. That's how 3D CNNs are used for things like tumor detection or weird tissue issues. Tissue issues, sorry.
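The 1D, 2D, and 3D variants map directly onto PyTorch's `nn.Conv1d`, `nn.Conv2d`, and `nn.Conv3d`; the difference is how many spatial dimensions the kernel slides over. The sketch below only checks output shapes; the input sizes (a 100-step series, a 28x28 image, a 32x64x64 volume) and the choice of 4 output channels are illustrative assumptions, not values from the talk.

```python
import torch
import torch.nn as nn

# Hypothetical inputs, sized only to show what each layer type expects.
series = torch.randn(1, 1, 100)          # (batch, channels, time steps)
image = torch.randn(1, 1, 28, 28)        # (batch, channels, height, width)
volume = torch.randn(1, 1, 32, 64, 64)   # (batch, channels, depth, height, width)

conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)  # slides along time
conv2d = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)  # slides over a plane
conv3d = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3)  # slides through a cube

# With no padding, a kernel of size 3 trims 2 from every spatial dimension.
print(conv1d(series).shape)  # torch.Size([1, 4, 98])
print(conv2d(image).shape)   # torch.Size([1, 4, 26, 26])
print(conv3d(volume).shape)  # torch.Size([1, 4, 30, 62, 62])
```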
But that is a real, practical use for convolutional neural nets. And just so we're clear, there are a few differences between convolutional neural nets and regular ones. A big one is that CNNs save time on preprocessing data. When they scan through the image and look at the kernels, they're actually doing feature extraction for us, so you don't need a predefined set of features you're looking for; CNNs do this discovery as they go. I just got a little ahead of my second bullet point, but that's okay. Like I was saying, CNNs figure out the important characteristics as they go through the convolutional part of the process. But there is one area where regular neural nets shine a little more than CNNs, and that's when you don't have super large data sets. Typically with anything under, I think it was, 10,000 images, you might not get the best accuracy with CNNs, so convolutional neural nets do need a lot more data than regular neural nets to be really effective. So here are a few use cases for CNNs. Maybe you want to recognize different handwriting, which segues into the MNIST example we'll be going through shortly. Or maybe you're working on something like an autonomous car. Or maybe you're trying to get a computer to identify certain parts as they pass a camera; that's one of those times you would consider a convolutional neural net. You might also use one to help prevent bank fraud, because reading the digits on checks is a really important task: if you've ever deposited a check at an ATM or in a mobile app, there's probably been some kind of CNN behind it. Post offices, too, handle a lot of mail throughout the day, so they need some help as things go down the conveyor belts, I imagine.
So, you know, the system needs to make sure the zip codes and addresses make sense on handwritten letters, and even labels that have been printed still need to be processed. This is where CNNs really shine. So now for the fun part. I tested this literally one minute before I started this talk, so this live demo should work, but we'll see how things go. Let me go ahead and switch screens. Okay, so you can see my instance of VS Code. I want to emphasize right now that you don't have to use Visual Studio Code to do any of the things I'm doing. You can use whatever IDE you prefer; nothing I'm doing is VS Code specific. You can even run all of the commands I'm about to run in a regular old console, but I just like VS Code. I have this example convolutional neural net set up for MNIST, and I also have it inside a DVC pipeline, because that's how I like to track my experiments: it makes it easier to see which models, code changes, hyperparameter values, or data sets really make a big difference when I'm training my model. But let's break down this convolutional neural net. To start with, I'm just using PyTorch, and all of this is built in: the torch neural net module, the convolution, max pool, and linear layers, and this ReLU activation function are all part of PyTorch. What we're doing initially is creating the neural net itself, and up here is where you'll have to know something about your data. For the data set I'm working with, I know I have one input channel for each image. You might have multiple channels if you're working with RGB, that is, color images, but if you're working with something that's black and white, you can probably get away with just one input channel. Then we'll have eight channels going out, and we have a kernel size of three.
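The "one channel in, eight channels out, kernel size three" choice can be checked by looking at the layer's weights and output. In the sketch below, the batch size of 16, the 28x28 image size, and the RGB comparison layer are assumptions for illustration, and padding is left off here. Each output channel is one learned 3x3 filter that spans every input channel, so the number of input channels has to match the data.

```python
import torch
import torch.nn as nn

gray_batch = torch.randn(16, 1, 28, 28)   # black-and-white images: 1 channel
rgb_batch = torch.randn(16, 3, 28, 28)    # color (RGB) images: 3 channels

gray_conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
rgb_conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# Weight shape is (out_channels, in_channels, kernel_h, kernel_w):
# eight filters, each covering every input channel.
print(gray_conv.weight.shape)  # torch.Size([8, 1, 3, 3])
print(rgb_conv.weight.shape)   # torch.Size([8, 3, 3, 3])

# Eight output channels means eight feature maps per image.
print(gray_conv(gray_batch).shape)  # torch.Size([16, 8, 26, 26])
print(rgb_conv(rgb_batch).shape)    # torch.Size([16, 8, 26, 26])
```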
The reason we have eight channels going out is that I know the size of my images, and that's about how much we'll be able to scan over and extract with our convolution. We've also added padding, because we want to make sure we're capturing the edges of the image in case there's some important information there; without padding, a three-by-three filter can never be centered on the border pixels. So that's how we made our first convolution. Next we have our max pool layer, where we go through each kernel-sized patch and keep only its max value. This is where we get our noise suppression, and it's how we make sure we're pulling out the more important features of our image. Then we have our next convolution. You can have as many layers of convolutions and max pools as you need, and you can throw in some batch normalization layers if you need them. Models can get really intense if you're working with complex images, and it doesn't take much. So don't be afraid to play around with multiple layers of convolutions and max pools. One thing to keep in mind is that the input of each convolutional layer must match the output of the previous layer. In this case, eight channels come in and we go out to 16 channels, so we're getting more definition on the image area we're going over and trying to figure out what those important features are. We'll keep the same kernel size, and we'll add padding again to make sure we're not losing anything. Then we have a few linear (fully connected) layers at the end that turn the extracted features into predictions. Last, we have our forward method, which is where the data actually flows through those layers and the model gets built up from the features. We have an activation function after the first convolution, in this case ReLU, which just returns the value if it's positive and zero otherwise.
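The transcript doesn't show the demo's source, so here is a sketch of a network with the layer sizes described: `Conv2d(1, 8, 3, padding=1)`, max pooling, then `Conv2d(8, 16, 3, padding=1)`, followed by linear layers. Everything beyond those two convolutions is an assumption: 28x28 grayscale MNIST inputs, 2x2 max pooling, one hidden linear layer of 64 units, and 10 output classes. The class name `Net` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Two-convolution CNN sketch for 28x28 grayscale digits (sizes partly assumed)."""

    def __init__(self, num_classes=10):
        super().__init__()
        # 1 input channel -> 8 feature maps; padding=1 keeps 28x28 and the edges
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        # 2x2 max pooling halves height and width
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # in_channels must equal conv1's out_channels (8)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        # after two poolings the feature maps are 16 channels x 7 x 7
        self.fc1 = nn.Linear(16 * 7 * 7, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (N, 8, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # (N, 16, 7, 7)
        x = torch.flatten(x, 1)                # (N, 784)
        x = F.relu(self.fc1(x))
        return self.fc2(x)                     # raw class scores (logits)


model = Net()
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```

The `16 * 7 * 7` input size of `fc1` depends on both the image size and the pooling, which is one reason the layer numbers have to follow from your data.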
You'll see ReLU activation used pretty much as standard in neural networks, because its output is either positive or zero, so you don't have to filter out as much. So we apply the activation function, handle the max pooling on the image, apply the activation for the next convolution, pass the result through the remaining layers, and finally return the model's output. That's how you make a convolutional neural net in Python using PyTorch. A lot of it really depends on how well you know your data. If you're a little confused about what numbers to use for your convolutional layers or your max pooling, take a look at the PyTorch docs and then just play around with them. Pretty much all of model making in machine learning is experimenting with different values, and that's why I like DVC. For example, I'm going to run an experiment to show you what it looks like. I told you I tested this a literal minute before, and okay, it's not broken, but I haven't made any changes. So let's say I didn't know how many output channels to have, and I put nine here just to see what happens. When I run this experiment, DVC detects the change and runs the training stage. But you'll notice that we get an error, because I changed eight to nine while the next layer still expects eight input channels, and the error shows you what was expected for the given input. So we change that back. Maybe we change the kernel size and try to run it again. Let's see. No, that didn't work, so we'll change that back. Let's change something down here: say we want ten instead of 16. I wonder if that'll work. No. The good thing about DVC is that if any of those code changes had run, they would have been tracked as an experiment. Let's try changing this to 64 and see what happens. Did that run? No. Good. So you see, you really have to know the data you're working with to get the right values.
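The error in the demo comes from a channel mismatch: if the first convolution outputs nine channels but the second still expects eight, PyTorch refuses to run the second layer. This sketch reproduces that with a hypothetical helper, `build_layers`, whose layer sizes follow the talk's description; the 28x28 input and 2x2 pooling are assumptions.

```python
import torch
import torch.nn as nn


def build_layers(first_out_channels):
    """Hypothetical helper: first conv's output channels vary; second conv still expects 8."""
    return nn.Sequential(
        nn.Conv2d(1, first_out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(8, 16, kernel_size=3, padding=1),  # in_channels fixed at 8
    )


x = torch.randn(1, 1, 28, 28)
print(build_layers(8)(x).shape)  # torch.Size([1, 16, 14, 14])

try:
    build_layers(9)(x)           # 9 channels come out, but 8 are expected
except RuntimeError as err:
    print("Shape mismatch:", err)
```

The fix is the rule from earlier: whenever you change a layer's `out_channels`, change the next layer's `in_channels` to match.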
But let me just try one more thing and see if that works. All right, this is exactly what it's like sometimes when you're training a model. Instead, I'm going to change a hyperparameter, my learning rate, just to show you what it's like to run an experiment, because that's something else you might change when working with a convolutional neural net or a lot of other machine learning problems. I changed the learning rate because I'm trying to find the best model for my MNIST data set, and I'm going to do this a lot. It's common in machine learning to keep changing values, changing code, maybe adding new data to your training set, and seeing how that affects your model. In this case, you can see some epochs running, along with our loss and how accurate the model is with this learning rate. I'm going to stop this training run and show you our table real quick. If you look up here, you'll see my last run with this learning rate. It's not as great as we would want it to be, but 83% accuracy isn't that bad. Now that you've seen how we make a CNN and how we run an experiment, I'm going to switch back over to the slides so we can finish up. So, a few things I hope you'll take away from this. First, make sure you compare a few different algorithms before you decide what's best for your application. A CNN might be great for most images, but maybe you'll find something else that works better for your data set. It's okay to play around with different things to find what gives you the best results. Second, take advantage of what's already out there. You don't have to write everything from scratch to prove that you're a great ML engineer. It's fine. We already know you're great. Just take it easy. You don't have to do the hard math stuff anymore.
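The experiment loop in the demo (change the learning rate, retrain, compare loss and accuracy) can be imitated without DVC. In this sketch, random tensors stand in for MNIST, the tiny model, SGD optimizer, five epochs, and the three learning rates are all assumptions, and the helper `train_once` is hypothetical. Because the data is random, its numbers say nothing about the 83% from the talk; the point is only the pattern of rerunning the same setup with one hyperparameter changed.

```python
import torch
import torch.nn as nn


def train_once(lr, epochs=5, seed=0):
    """Hypothetical helper: train a tiny CNN with one learning rate on synthetic data."""
    torch.manual_seed(seed)                    # same data and initial weights every run
    images = torch.randn(64, 1, 28, 28)        # synthetic stand-in for MNIST images
    labels = torch.randint(0, 10, (64,))       # synthetic stand-in for digit labels
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(8 * 14 * 14, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # loss of this epoch's forward pass
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        accuracy = (model(images).argmax(dim=1) == labels).float().mean().item()
    return loss.item(), accuracy


results = {lr: train_once(lr) for lr in (0.001, 0.01, 0.1)}
for lr, (loss, acc) in results.items():
    print(f"lr={lr:<6} loss={loss:.3f} accuracy={acc:.2%}")
```

In the talk, DVC does this bookkeeping instead of a plain dictionary: it records each run's parameters and metrics so they can be compared in a table, such as the one printed by `dvc exp show`.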
Use existing libraries like PyTorch, and maybe even TensorFlow if you want to take the time to learn it. And third, when you have a problem, especially in machine learning, try breaking it down into multiple steps. That's something a tool like DVC can really help with, because you're able to reproduce every experiment you run. So if you're looking at your model and trying to figure out which value you changed to make it this good, you'll have a record of the exact changes that produced such a great model. But that's all I have for you today. I hope you were able to learn a lot about CNNs. And again, if you have any questions for me, feel free to reach out on Twitter at @FlippedCoding. Thank you.

Milecia McGregor

Developer Advocate @ Iterative

