Conf42 JavaScript 2020 - Online

- premiere 5PM GMT

Image Super-resolution in JavaScript

Abstract

Ever wanted to zoom and enhance an image, as seen in the movies? Now you can, with JavaScript! UpscalerJS is a tool built with TensorFlow.js to upscale images to 2x, 3x, or 4x, all in the browser. In this talk, you’ll learn how to leverage the power of neural networks in your apps.

Summary

  • UpscalerJS lets you upscale images to 2x, 3x, or even 4x, all in your browser. Applying this technology on the back end has a number of things going for it, but there are two big drawbacks to deploying it there.
  • Running it in JavaScript means relying on your user's browser. There's no GPU to provision or keep running. The third compelling argument for doing this in your browser is bandwidth savings. There are some drawbacks to doing this on the front end.
  • Almost all machine learning research gets posted publicly and is accessible for free. Metrics are a tricky thing for something like super-resolution. You don't necessarily need a full understanding of the technology to use it in your work.
  • The first invocation of a model tends to be significantly slower than subsequent invocations. If you're using TensorFlow.js, you'll want to know about something called warm-ups. The first run takes 900 milliseconds, but subsequent runs are a lot faster.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Kevin Scott, and I'm excited to talk to you all about UpscalerJS, a new open source tool that I wrote for doing image upscaling in the browser using machine learning. UpscalerJS is built with TensorFlow.js, and it lets you upscale images to 2x, 3x, or even 4x, all in your browser. In this talk, I'm going to discuss a little bit about how I built it and how you can start to leverage the power of neural nets in your apps. I want to start by outlining a use case that I think is really appropriate for something like this. This use case is inspired by a situation I encountered at work. Let's say you're working on an ecommerce platform. Images are critical for attracting people to products. Images sell. It could be real estate, fashion, software. Almost anything performs better with images. But if you're dealing with user-generated content, you know how difficult it can be to get high quality imagery. A lot of the time, you take what you can get, which, frankly, is not much. You design that site with high quality, beautiful images in mind, and your design looks great. And then you get to deploying the actual site, and suddenly your users are uploading low quality, pixelated images. It's not their fault. It's probably all that they have, but it kills the design. So this actually happened to me. I was working with a team and we put up a site that was extremely image dependent. It looked amazing in the designs, but when we actually got to deploying the site, it fell flat. Without high quality imagery, the design just didn't work. So what's a nontechnical solution to this problem? Well, I can tell you, because we do this today: you go back and you ask for better images. Sometimes people can oblige, but often they can't. The images they've given you are all they've got. Maybe it's an image screenshotted from a PDF, or maybe it's an old image and they can't get a better one.
Even if they can get better images, it's a labor-intensive ask of your users to go back and fix their images for you, even if it's for their benefit. So what else can we do? Well, there's a whole realm of research in machine learning called super-resolution. The idea is to take a low-resolution image and make it look, well, higher resolution. "Can you enhance it?" "Can we enhance this?" "Can you enhance it?" "Hold on a second, I'll enhance." If you've watched CSI, you've probably seen this fake technology in action. This technology is now real, and though it's not perfect, it's pretty good and it's getting better. One option is to apply it on the back end. There are lots of techniques for doing this, and most of them are in Python. Applying this technology on the back end has a number of things going for it. For one, back-end code can benefit from beefy hardware. This lets you run the most accurate, most powerful models. If getting the most highly accurate images is important, this is probably the way to go. Also, a lot of use cases are upload once, display often. So processing the images ahead of time, even if it takes a while, is not a big deal because you only do it once. But there are two big drawbacks that I see to deploying this on the back end. One, it takes a lot longer to get immediate feedback. If I'm a user of your site and I upload an image, it has to go to your server, get processed there, and then get sent back down to my computer. There's also the issue of deployment. This can be nontrivial, especially because so many cutting-edge implementations are at the bleeding edge, with unstable dependencies and changing requirements. And if your deployment requires GPUs, that could end up being an expensive proposition as well, and hard to scale. So what about deploying this on the front end? Would that work?
The issues with deploying this on the back end motivated me to explore whether it'd be feasible to perform upscaling in the browser using JavaScript. And it turns out that it is. I'll discuss some of the technical hurdles and the code in a minute. But first, let's talk about the advantages of running this technology in JavaScript. First, that issue of deployment from the previous slide is gone. Running it in JavaScript means relying on your user's browser. There's no GPU to provision or keep running. It all happens on your user's computer, and in fact you can go to this link right now and upscale an image without installing anything. Not installing anything: that's a really powerful argument, particularly if you don't have any machine learning experts on your team. The second big argument is immediacy. In the back-end example, whenever a user uploads an image, they have to wait for the round trip. They have to upload it to your servers and get it processed, which, depending on your technology, might mean a server needs to be provisioned, or you need to wait for a GPU or a Lambda to spin up, or any number of other technical issues, before you send it back down the pipe to their computer. If you do it in JavaScript, it's already there in your browser. No waiting around. It only takes as long as it takes to do the inference. The third, and in my mind most compelling, argument for doing this in your browser is bandwidth savings. In the back-end example, we can upscale images ahead of time, but the image we're sending down is still the full resolution. It could be a megabyte or multiple megabytes large. If you do it on the front end, you can send a smaller image, sometimes a much smaller image. That's huge. Let's say you're doing 4x scaling. That's an image with 16 times fewer pixels, and potentially close to 16 times smaller in file size. That's a huge file savings. But that's only assuming that, one, the front end can perform decently fast, and two, that the image that we upscale looks good.
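The 16x figure follows from the scale factor applying to both width and height. Here is a minimal sketch of that arithmetic; the helper name `pixelRatio` and the 256x256 example dimensions are assumptions for illustration and do not come from the talk. Actual file-size savings depend on the image format and compression, so the pixel ratio is only a rough guide.

```javascript
// Hypothetical helper: how many times more pixels the upscaled image has
// than the small source image, for a given per-side scale factor.
function pixelRatio(scale) {
  return scale * scale; // width and height both grow by `scale`
}

// Example (assumed dimensions): a 256x256 source upscaled 4x is 1024x1024.
const source = { width: 256, height: 256 };
const scale = 4;
const target = {
  width: source.width * scale,
  height: source.height * scale,
};

const sourcePixels = source.width * source.height;
const targetPixels = target.width * target.height;

// The server only has to send `sourcePixels` worth of image data;
// the browser reconstructs the remaining pixels locally.
console.log(targetPixels / sourcePixels === pixelRatio(scale)); // true
```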
Can we depend on that? Turns out that mostly we can, and where we can't, things are getting better. Now, of course, there are some drawbacks to doing this on the front end. One drawback is that if you do have those coveted machine learning experts on your team and those capabilities, front-end performance could be worse, particularly if bandwidth is not a concern. Maybe your users primarily use desktops. Then keeping things on the back end will probably perform better, and doing so gives you access to all the latest cutting-edge techniques that might not translate yet to the browser. The second big concern is that neural networks running on devices benefit significantly from cutting-edge hardware. The good news here is that consumer companies, namely Apple and Google, have invested huge sums in increasing the power of their devices' hardware, specifically the ability to process neural networks on device, what's also known as edge AI. The downside is that because the improvements are so significant year after year, the disparity for users running older devices is that much more significant. Some older devices will just be awful. So if you want a consistent experience, you may want to look at a server-side solution. So while there are absolutely tradeoffs to be made between back end and front end, the point is that JavaScript is absolutely a first-class citizen when it comes to applications of machine learning and neural networks. No longer are you forced into some heavy-duty back-end solution. You can run this technology right now, today, in your browser, and in this case in particular, doing it client side can be a much better choice than keeping things on the back end. Now, if you're a JavaScript developer and you're ensconced in the world of JavaScript, maybe you're wondering how you go about hearing about new machine learning technologies, and how you know whether they're applicable to your work or whether you can use them.
How would you even know super-resolution is a thing unless you happened to see it in that CSI video? "Hold on a second, I'll enhance." So I want to briefly touch on how I became familiar with this research and how you might leverage a similar strategy to learn about opportunities that might be relevant to you. The first thing to know is that almost all machine learning research gets posted publicly and is accessible for free. These are academic research papers that tend to be theory and math heavy and are sometimes pretty hard to penetrate. This can scare off a lot of people. It certainly scared me off at first. I don't want to minimize the importance of fully understanding the research. If you have a deep understanding of the theory, that can often lead you to novel insights and new developments that are relevant to your field. But you don't necessarily need a full understanding of the technology to use it in your work. Particularly if you're focused on implementing prebuilt models, like we are here, you can rely on others to evaluate the research as well as implement a lot of the code for you. I like to rely on a website called Papers with Code. This website lists cutting-edge research organized by topic. You can see the latest papers measured against metrics. You can also see available code implementations, as well as information about the frameworks that they're using. In our specific example, super-resolution, there's actually a whole category dedicated to that research, and we can see the various implementations ranked. Metrics are a tricky thing for something like super-resolution. The two most common metrics are called PSNR and SSIM. They're both measurements of how different one image is from another. But as humans, we perceive images differently than a computer does. A set of pixels that is, say, less saturated but maybe sharper may lead to a lower metric score by the computer, but a more aesthetically pleasing result for a human.
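To make the first of those metrics concrete: PSNR (peak signal-to-noise ratio) is derived from the mean squared error between two images, as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below works on flat arrays of 8-bit pixel values; the function names and the sample pixel values are assumptions for illustration, not code from the talk.

```javascript
// Mean squared error between two equally sized arrays of pixel values.
function meanSquaredError(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Inputs must be non-empty and the same length');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum / a.length;
}

// PSNR in decibels. Higher means the candidate is numerically closer to
// the reference; identical inputs give Infinity.
function psnr(reference, candidate, maxValue = 255) {
  const mse = meanSquaredError(reference, candidate);
  if (mse === 0) return Infinity;
  return 10 * Math.log10((maxValue * maxValue) / mse);
}

// Made-up pixel values: every pixel is off by exactly 1, so the MSE is 1.
const reference = [0, 64, 128, 255];
const slightlyOff = [1, 63, 129, 254];
console.log(psnr(reference, slightlyOff).toFixed(2)); // roughly 48 dB
```

Note that the score only reflects per-pixel numeric differences, which is exactly why it can disagree with what a person finds visually pleasing.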
And this is not just a theoretical concern. At a certain point, people rate more aesthetically pleasing images as more similar than the ones the computer's metrics favor. And in fact, for popular metrics, the authors often note that better-performing filters can tend toward a blurry, washed-out kind of look. So metrics are absolutely important, but it's also important to bring your critical eye to them and consider your own use cases. For our purposes for super-resolution, we're looking for good accuracy, yes, but not necessarily the best accuracy. Just as important is that it be fast and that it be compatible with JavaScript, because not all of the code that we're looking at is. The paper I ended up exploring was something called ESRGAN, and the particular implementation was this one. You can check out my blog for more information on how I went about evaluating the different implementations out there. So with a viable architecture in hand, we can take the model offered by the author and see how to make it work in JavaScript. We can start off by converting our model. For this example you can go to a website called Google Colab, which is a free notebook for running Python code in the browser. It also offers a GPU if you don't have access to one. Along the bottom here is a link where you can run this in your browser. Here I've set up a number of cells that demonstrate the code running and upscaling in this notebook. This cell in particular is very important. This saves the model and not just the weights. You can do either, but for our purposes we need the full model to be saved and converted, otherwise our JavaScript code won't know how to interpret what we give it. Another thing to note here is this highlighted line, which I found that I needed. I'm not sure if this is something that I'm doing wrong, or if maybe this is a bug in the software.
But I found that I needed to manually change this bit of JSON configuration in order to get my JavaScript code to run. So if you run into a similar issue, just know that you may need to run this bit of code in order to get it to run. Once we have our model saved, we can zip it up, download it, and then upload it in the next step in JavaScript. Then, over in JavaScript: this is CodeSandbox, and here's a link so you can follow along in your browser yourself. What we can do here is create a folder that'll hold our model, and we can then upload the files into it. And there they go. And there they are. So we can check that it uploaded correctly. Now, on the right is the panel showing our code running. This image of a baboon is what we're considering our source image. This is what we're going to be upscaling. We load our model here, and the entry point is the JSON file, not the bin files. The bin files contain information about the weights, and they're sharded to basically enable caching in the browser. But you'll always want to give it the JSON input here. On button click, we set up this function that will start a timer and then do the conversion of the image into a tensor. A tensor is sort of the core data structure that all machine learning works with. You can think of it as a multidimensional numeric array. We need to convert our image into a tensor so that we can put it through our model. So that's what this bit of code is doing: it's taking the original image and making it into a tensor. Then we await the promise of our model if it hasn't loaded yet. And then we put the tensor through our model's predict function, which will return a new tensor that represents what it thinks is the upscaled image. We then put that through this tensor-as-base64 function, which will take that tensor and turn it back into a base64 source representation, which we can attach to the image, and we can run it. And voila, we've got an upscaled image.
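To illustrate the idea of an image as a "multidimensional numeric array", here is a small sketch in plain JavaScript. It is not the code from the talk: in TensorFlow.js, `tf.browser.fromPixels` is the built-in way to turn an image element into a tensor. The names `rgbaToRgbTensor` and `valueAt`, and the object layout with `shape` and `values`, are assumptions made up for this example.

```javascript
// Convert flat RGBA pixel data (the layout used by canvas ImageData) into
// a flat RGB buffer plus a [height, width, channels] shape.
function rgbaToRgbTensor(rgba, width, height) {
  if (rgba.length !== width * height * 4) {
    throw new Error('Expected width * height * 4 RGBA values');
  }
  const values = new Float32Array(width * height * 3);
  for (let pixel = 0; pixel < width * height; pixel++) {
    values[pixel * 3] = rgba[pixel * 4]; // red
    values[pixel * 3 + 1] = rgba[pixel * 4 + 1]; // green
    values[pixel * 3 + 2] = rgba[pixel * 4 + 2]; // blue
    // The alpha channel (index pixel * 4 + 3) is dropped.
  }
  return { shape: [height, width, 3], values };
}

// Look up one channel value at (row, col) in the row-major buffer.
function valueAt(tensor, row, col, channel) {
  const [, width, channels] = tensor.shape;
  return tensor.values[(row * width + col) * channels + channel];
}

// A 2x1 image: one red pixel followed by one blue pixel.
const rgba = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const imageTensor = rgbaToRgbTensor(rgba, 2, 1);
console.log(imageTensor.shape); // [ 1, 2, 3 ]
console.log(valueAt(imageTensor, 0, 1, 2)); // 255 (blue channel, 2nd pixel)
```

A model that upscales by 4x would take a tensor shaped like this one and return another whose height and width are four times larger.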
That's really cool. It worked. We can see that it took north of 900 milliseconds. One thing that's really interesting here is that the first run takes 900 milliseconds, but subsequent runs are a lot faster. They take around 100 milliseconds or so. So what's going on here? If you're using TensorFlow.js, you'll want to know about something called warm-ups. Based on how TensorFlow.js interacts with your GPU, the first invocation of a model tends to be significantly slower than subsequent invocations. The way around this is that when your site loads up, you send some initial dummy data into your model, and that will warm it up and avoid the cold start. For this to work, the image has to be the same size, which will be a particular problem here, as we probably won't have consistently sized images. And on top of this, this technique doesn't help the fact that the UI blocks. Another option is to explore web workers, and they help somewhat, but they're not a silver bullet. Again, you can check out my blog post. I go into more technical detail on exactly what's going on there and why they're not a silver bullet. So at this point, we've got a working implementation. We're able to upscale an image in our browser, which is really cool. When I first got this working, I was blown away that this is even possible, but it's still pretty slow. And the solution that we have in place for speeding it up only works if we're giving it consistently sized images, which we can't really rely on. Plus it's still locking up the UI, so we still have a number of roadblocks before we're able to use this in a consumer app. So what if, instead of feeding our image directly into the model at its full size, we cut the image into pieces and try to process those pieces one by one? If we subdivide our image into sections, we can take a single long task and break it into four smaller tasks, where after each one we can release the UI thread.
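The warm-up described above can be sketched as a small helper that runs one throwaway prediction at page load. This is a sketch under assumptions, not the talk's code: `warmUp` is a hypothetical name, and the stand-in `fakeModel` only imitates a cold first call so that the example runs without any library. With TensorFlow.js you would pass the loaded model and a factory such as `() => tf.zeros([1, 64, 64, 3])`, where the 64x64 size is an assumed input size that must match the size you later send through the model.

```javascript
// Hypothetical helper: run one throwaway prediction so that the slow first
// invocation happens at page load instead of on the user's first click.
function warmUp(model, makeDummyInput) {
  const input = makeDummyInput();
  const output = model.predict(input);
  // With real tensors, reading the result forces the work to finish,
  // and disposing frees the memory. The checks let the stand-in model
  // below skip these steps.
  if (typeof output.dataSync === 'function') output.dataSync();
  if (typeof output.dispose === 'function') output.dispose();
  if (typeof input.dispose === 'function') input.dispose();
}

// Stand-in model for illustration only: it reports whether a call was the
// very first ("cold") one.
const fakeModel = {
  calls: 0,
  predict(input) {
    this.calls += 1;
    return { cold: this.calls === 1, shape: input.shape };
  },
};

// At load time: send dummy data of the expected shape through the model.
warmUp(fakeModel, () => ({ shape: [1, 64, 64, 3] }));

// Later, on the user's click: this call is no longer the cold one.
const firstRealResult = fakeModel.predict({ shape: [1, 64, 64, 3] });
console.log(firstRealResult.cold); // false
```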
But we run into a new problem. Now these upscaled images tend to have artifacting around the edges. This is a common thing that happens with a lot of upscaling algorithms: they introduce this issue of artifacting around the edges. It's sort of inherent to how they work. It's generally not very obvious in a full upscaled image, but when we cut it into pieces like this, it becomes very obvious. To fix it, what we can do is add padding to each of our chunks. And the interesting thing about this solution goes back to the issue of warming up, where we said our images have to be the same size. So long as we set our patch size small enough, smaller than the smallest image we expect to receive, we'll always be able to pass a consistently sized image in and avoid hitting the warm-up. I have an implementation at this link here where you can see where we're doing this, where we're doing some of the math to basically take an incoming image and split it into smaller chunks that have a consistent size, which allows us to avoid hitting that warm-up. There are also other things we can do if we want to make this run faster in the browser. We can quantize our model, which means making it smaller and also easier to zip as you're passing it down the wire. We can prune our model, which means dropping poorly performing weights during training, which makes it run faster. We can also improve the accuracy of our model by giving it more data or training on a specific domain. These are all things that, if you're looking to deploy this in production, are absolutely worth pursuing. But the point that I want to emphasize is that JavaScript is absolutely a first-class contender when considering an application of this technology, and it's a contender that, in my opinion, is arguably a better option in a lot of cases than the pure Python solution.
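One way to do the patch math described above is sketched below. It is an illustration under assumptions, not the implementation behind the link in the talk: the names `tileAxis` and `tileImage`, and the choice to shift border patches inward so every patch has the same size, are made up for this example. Each patch reads `patchSize` pixels per side from the source, but only its inner "keep" region is copied into the final result, so the padded borders (where artifacts appear) are discarded.

```javascript
// Split one axis of `length` pixels into patches of exactly `patchSize`
// pixels, each surrounded by up to `padding` pixels of extra context.
function tileAxis(length, patchSize, padding) {
  const step = patchSize - 2 * padding; // pixels each patch contributes
  if (step <= 0 || patchSize > length) {
    throw new Error('patchSize must fit in the image and exceed 2 * padding');
  }
  const tiles = [];
  for (let keepStart = 0; keepStart < length; keepStart += step) {
    const keepEnd = Math.min(keepStart + step, length);
    // Shift the patch inward at the borders so it never leaves the image
    // and is always exactly `patchSize` long.
    const start = Math.min(
      Math.max(keepStart - padding, 0),
      length - patchSize
    );
    tiles.push({ start, keepStart, keepEnd });
  }
  return tiles;
}

// Combine both axes into a list of square patches in source coordinates.
function tileImage(width, height, patchSize, padding) {
  const patches = [];
  for (const row of tileAxis(height, patchSize, padding)) {
    for (const col of tileAxis(width, patchSize, padding)) {
      patches.push({
        x: col.start,
        y: row.start,
        size: patchSize,
        keep: {
          x0: col.keepStart,
          x1: col.keepEnd,
          y0: row.keepStart,
          y1: row.keepEnd,
        },
      });
    }
  }
  return patches;
}

// A 10x10 image, 6x6 patches, 1 pixel of padding on each side.
const patches = tileImage(10, 10, 6, 1);
console.log(patches.length); // 9
```

Because every patch has the same dimensions, the model always sees the input size it was warmed up with. After upscaling, the keep coordinates would be multiplied by the scale factor before copying pixels into the output.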
You don't need machine learning experts, although it probably doesn't hurt, and you don't need to be a machine learning expert yourself, although again, it probably doesn't hurt. All the code I showed today is available in UpscalerJS, the open source tool that I built using TensorFlow.js. You can head to npm right now and install the package and then run an image through it, and voila, you've got a working upscaler in your browser. As I continue to work in this area, I'll keep improving the package as well as the models that ship with it, exploring domain-specific models like perhaps face-specific models or illustration models. Imagine the implications of something like this for video technology. What if we could reduce the size of a video coming down the pipe by 80 or 90%? What if we got improved models so that, instead of upscaling by 4x, we could do 8x or 16x in the browser? Those improvements are not outlandish; they're very feasible. I wouldn't be surprised if we see that in the next year, and that's all applicable in JavaScript. That's all technology that is very feasible in JavaScript. That's really exciting. Those are huge savings that we could be seeing in the browser. I hope you've enjoyed this talk and learned a little bit about machine learning and JavaScript today. If you're interested, you can find me on Twitter, on GitHub, and at my website, where I write and talk about this technology. Thanks for listening.

Kevin Scott

Creator @ Upscaler.js
