Conf42 JavaScript 2022 - Online

Interactive command-line tutorials with WebAssembly


Abstract

In this talk, I’ll dive into how sandbox.bio was built, with a focus on how WebAssembly enabled bringing command-line tools to the web. Although these command-line tools were originally written in C/C++, they all run directly in the browser, thanks to WebAssembly! Since the computations run on each user’s computer, this makes the application highly scalable and cost-effective.

Along the way, I’ll discuss how to get started with WebAssembly, along with its benefits and pitfalls (it’s a great technology but not always the right tool for the job!).

Summary

  • WebAssembly is another language that you can use in the browser. It's a compilation target, meaning that you write code in another language and then compile it to WebAssembly. WebAssembly has been really powerful so far for three reasons: reusing code, performance, and versatility.
  • WebAssembly lets you run commands in the browser without reaching out to a server. It's also more secure to execute arbitrary commands within the sandbox of the browser. Using WebAssembly is cheaper and more responsive, but there are disadvantages.
  • Finally, I wanted to share some resources that could be useful. sandbox.bio is primarily focused on bioinformatics, but also has interesting command-line tutorials, and its playgrounds use WebAssembly as well. Another resource is biowasm, and finally, there's also my book at levelupwasm.com.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Today I wanted to walk you through how I built interactive command-line tutorials using WebAssembly. So the application I want to focus on today is sandbox.bio, and this is an application that features interactive command-line tutorials. It's mostly aimed at bioinformaticians, but it also has tutorials for general command-line usage. So here I have an awk tutorial. On the left you have the instructions, and on the right you have this playground where you can start writing commands and executing them right away. So here I'm taking the first few lines of a file. I can also make more complex commands, like taking the output of awk that prints the third column and piping it into the head command. And what's interesting about this is that it's running the real awk. This is not a simulation; it's running awk in the browser. There are no servers that do any of this computation. How, you ask? Well, that's where WebAssembly comes in. And so let me start by telling you a bit more about WebAssembly itself. To me, WebAssembly is just another language that you can use in the browser. We can use HTML, CSS, JavaScript. Now we can also use WebAssembly. The key difference, though, is that WebAssembly looks a little strange. So here's a very simple piece of code in WebAssembly that defines a function, and this function returns a string that has Conf42 in it. That's all it does. This looks pretty complicated, but the thing about WebAssembly is that you don't write this code directly. It's a compilation target, meaning that you write code in another language, or you take existing code in another language like C, and then you compile it to WebAssembly so that you can run it in the browser.
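To make the "compilation target" idea concrete, here is a minimal TypeScript sketch that loads a hand-assembled WebAssembly binary. The module and its exported function name `answer` are illustrative assumptions, not the example from the slides: instead of returning a string, this function simply returns the number 42. In practice a compiler emits these bytes for you.

```typescript
// A tiny hand-assembled WebAssembly module (illustrative, not from the talk).
// It exports one function, "answer", that takes no arguments and returns the i32 value 42.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type section: one type, () -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x0a, 0x01, 0x06, 0x61, 0x6e, 0x73, 0x77, 0x65, 0x72, 0x00, 0x00, // export "answer" = function 0
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // code section: i32.const 42; end
]);

// Synchronous compilation is fine for a module this small; larger modules
// are normally loaded with the asynchronous WebAssembly.instantiate instead.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const answer = wasmInstance.exports.answer as () => number;

console.log(answer()); // prints 42
```

The JavaScript side only sees an exported function that takes and returns numbers; everything else about the original source language is hidden behind the binary.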
So that's why people talk about WebAssembly as being a compilation target. The best support that we have today is for C, C++, and Rust, but there are other languages that you can absolutely compile to WebAssembly. So why use it? The reason WebAssembly has been really powerful so far is three things. Number one, reusing code. All these are examples of tools that were on a desktop or on the command line that have been ported to the web without having to start from scratch. Number two, performance. In some cases you can replace slow, heavy JavaScript computation with faster, more optimized WebAssembly, and you can get speedups. And number three, versatility: there's this idea that you can really run WebAssembly wherever a runtime for it exists. So there are WebAssembly runtimes in the browser, but there are also WebAssembly runtimes outside the browser, right? If you do edge computing like Cloudflare or Fastly, if you use Node or Deno, you can run it there as well, or you can run it on small devices. Now, how do you concretely get started? How do you compile things to WebAssembly practically? If you're compiling C and C++ tools, I would say by far the best choice is Emscripten. It's a fantastic toolkit. It helps simplify this compilation and offers a whole bunch of utilities that I will mention in a bit. All right, so let's take a look at a concrete example. We have this command-line utility called seqtk. This is a tool commonly used in bioinformatics, and what you should note is, number one, it's a useful tool; number two, it's written in C; and number three, I want to run it in the browser. How do I do that? And so if we put WebAssembly aside for a second, how do you compile this tool in order to run it on your own computer outside the browser? Well, you would use a C compiler like GCC. And so here you tell the compiler I want to output a binary file called seqtk, and I have a whole bunch of flags.
If you want to do the same thing, but compile it to WebAssembly, what you can do is use Emscripten's emcc. So this stands for Emscripten C compiler. It's basically a wrapper around GCC that makes this compilation to WebAssembly easier. So it looks fairly similar. Instead of outputting a binary file, you output seqtk.js. Note here that this actually asks Emscripten to output both a .wasm and a .js file. So you may be wondering, what do I need the JS file for? I thought this was WebAssembly, and it is. But one thing that Emscripten does is give you this JavaScript file, if you want it, that helps you initialize the module, helps you deal with calling various functions, and has a bunch of utilities around file systems that I'll mention in a second. So that could be really powerful to avoid having to rewrite all that yourself. And so you can see the other flags are fairly similar, except when we get to -lz. So this means I want the zlib library. And so instead of using that, you tell Emscripten USE_ZLIB=1. Because the alternative is you would have to bring in the zlib code and have it also be compiled to WebAssembly. You don't have to figure that out; you can just tell Emscripten, yeah, I want zlib, and Emscripten does that for a whole bunch of other libraries that are commonly used. zlib is very commonly used for compression, but if you use PNG files, you can use libpng. If you do a lot of graphics or games, you can ask Emscripten to load SDL the same way. And the last thing I'll mention is this FORCE_FILESYSTEM. You technically don't need to tell Emscripten that; it will figure it out. But I just want to make it explicit here that most command-line tools expect there to be a file system: they operate on files, they output files. And so to make it possible to use that tool as is in the browser, Emscripten creates a virtual file system in the browser, in memory.
It doesn't affect your real files, it's just a mock file system, but it helps you do things like ask the user to give you a local file and then mount that file on the virtual file system, giving it a path that you can then give to your command-line utility. And so then it can work the same way it normally does. And so this is another thing that you get out of outputting this JS file. Okay, so how do I actually call seqtk then? Well, if I'm on the command line, I just call seqtk like this and give it the parameters. With Emscripten, you would do Module.callMain, and this is JavaScript code, right? And then you give it an array of parameters that you want to give the WebAssembly module. And then behind the scenes Emscripten will figure out how to convert this to something that the WebAssembly module will understand. Because keep in mind, WebAssembly only understands numbers, right? So you can't pass in strings; you have to do this transformation. So this was using GCC, but Emscripten has a whole bunch of wrappers for other build tools. If you use g++, you can use em++. If you're making a library, emar. Same thing for make, cmake, and configure: you can use these wrappers from Emscripten to do the compilation. Now one thing to keep in mind is that I just showed you a pretty simple example. It can get pretty complex to compile something to WebAssembly. Some things use threads. Emscripten has some tools to make that easier, using web workers for that. Some tools use SIMD. Now that's not entirely supported. WebAssembly currently supports 128-bit SIMD, but if you're using something different, it might not work. If you have assembly code, actual assembly code in your C program, you absolutely cannot compile that, right? And so in those cases, if that code is only there for optimizations, there are usually flags that you can use to disable that to get around it. And there are other things like this that make it a little harder to compile.
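The point that WebAssembly only understands numbers can be sketched as follows. This is a simplified illustration of the idea, not Emscripten's actual implementation: the `memory` buffer, the bump allocator, and the helper names `writeCString`, `readCString`, and `marshalArgs` are all hypothetical. Each string argument is copied into a byte buffer standing in for the module's linear memory, and what gets passed across the boundary is just an argument count and a list of addresses.

```typescript
// Simplified stand-in for a WebAssembly linear memory: just a block of bytes.
const memory = new Uint8Array(256);
// Hypothetical bump allocator; address 0 is left unused, like a null pointer.
let nextFree = 8;

// Copy a string into memory as NUL-terminated UTF-8 and return its address.
function writeCString(text: string): number {
  const bytes = new TextEncoder().encode(text);
  if (nextFree + bytes.length + 1 > memory.length) throw new Error("out of memory");
  const address = nextFree;
  memory.set(bytes, address);
  memory[address + bytes.length] = 0; // C strings end with a zero byte
  nextFree += bytes.length + 1;
  return address;
}

// Read a NUL-terminated UTF-8 string back out of memory.
function readCString(address: number): string {
  let end = address;
  while (memory[end] !== 0) end++;
  return new TextDecoder().decode(memory.subarray(address, end));
}

// Turn string arguments into the numbers a C-style main() expects:
// an argument count plus one address per argument.
function marshalArgs(args: string[]): { argc: number; argv: number[] } {
  return { argc: args.length, argv: args.map(writeCString) };
}

// "mytool" and "--version" are placeholder arguments.
const { argc, argv } = marshalArgs(["mytool", "--version"]);
console.log(argc, argv, argv.map(readCString));
```

In a real module the list of addresses is itself written into linear memory and passed as a single pointer; it is kept as a plain array here to keep the sketch short.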
Sockets are also really tricky; you have to work around that. Anyway, if you're curious about learning more about how to compile things to WebAssembly for use in the browser, I wrote a book a few years ago focusing on that called Level Up with WebAssembly, and you can check it out at levelupwasm.com. Okay, so now back to sandbox.bio. We have these tools that I want to be able to run in the browser, like awk, grep, jq, and a whole bunch of coreutils like ls and head and tail. These are all written in C and C++, so I can use the process I talked about earlier and compile these tools from C to WebAssembly. And now I'm able to run these tools in the browser. So just to put it into the context of the application, where do I actually execute these WebAssembly modules? So the first thing is we're going to use xterm.js, which is a library that helps you simulate the look and feel of a command line. But of course this library will only make it look like a terminal. You still have to interpret the commands and do something with them. And so what I do is essentially parse the user's input into an abstract syntax tree. So this lets me get a clear view of which programs are running and what parameters we give each program. And we need to be able to handle operations such as piping, where the output of one command is the input of another. Process substitution is also common on the command line. Things like variables, you need to be able to handle that. And so you need to parse that ahead of time and have it in a data structure that you can then go through one step at a time. For example, you say, okay, first I start with awk: I'm going to call the main function from awk.wasm, I'm going to give it these parameters, and then I want the output of this to be the input of the head.wasm module that I will call. So that's kind of how WebAssembly fits into the application.
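The parse-then-execute flow described above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not sandbox.bio's actual code: it parses into a flat list of commands rather than a full abstract syntax tree, it only understands the pipe operator, and the `head` and `column` entries are plain functions standing in for calls into WebAssembly modules.

```typescript
// One command in a pipeline: a program name plus its arguments.
interface Command {
  program: string;
  args: string[];
}

// A tool takes its arguments and stdin text and returns stdout text.
// In a real app each entry would call into a WebAssembly module; these
// plain functions are hypothetical stand-ins so the sketch runs on its own.
type Tool = (args: string[], stdin: string) => string;

const tools: Record<string, Tool> = {
  // Print the first N lines (like `head -n N`, defaulting to 10).
  head: (args, stdin) => {
    const count = args[0] === "-n" ? Number(args[1]) : 10;
    return stdin.split("\n").slice(0, count).join("\n");
  },
  // Print one tab-separated column (a tiny subset of what awk can do).
  column: (args, stdin) => {
    const index = Number(args[0]) - 1;
    return stdin.split("\n").map((line) => line.split("\t")[index] ?? "").join("\n");
  },
};

// Parse "a x | b y" into a list of commands. Quoting, variables, and
// process substitution are deliberately left out of this sketch.
function parsePipeline(input: string): Command[] {
  return input.split("|").map((segment) => {
    const [program, ...args] = segment.trim().split(/\s+/);
    return { program, args };
  });
}

// Run each command in order, feeding the output of one into the next.
function runPipeline(input: string, stdin: string): string {
  return parsePipeline(input).reduce((data, { program, args }) => {
    const tool = tools[program];
    if (!tool) throw new Error(`command not found: ${program}`);
    return tool(args, data);
  }, stdin);
}

// Made-up sample data: three tab-separated rows.
const orders = "1\tapple\t3.50\n2\tpear\t1.25\n3\tplum\t0.80";
console.log(runPipeline("column 3 | head -n 2", orders));
```

Keeping the parsed commands as data, separate from execution, is what makes it possible to walk the pipeline one step at a time and decide which module to call for each stage.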
And then in the background I have a process that stores the file system state in IndexedDB. So this is because I want users to be able to make modifications to these files on Emscripten's virtual file system, but still be able to see them when they refresh the page. So if I modify this orders.tsv file, for example, I want that to be maintained across sessions. So why use WebAssembly for this use case? What are the alternatives? Well, so here's what it looks like with WebAssembly. You have a browser, you have a server. All the server needs to do really is give static assets to the browser. This is the JavaScript for the app logic and the utility code that we get from Emscripten, and it also has the .wasm binaries. So then once these are in the browser, anytime you need to run a command, you just need to execute it in the browser. You don't have to reach out to the server at all. And also, like I mentioned, we keep track of the file system state in the browser itself. And so here's what it would look like without WebAssembly. If we can't run things in the browser, then we have to run them on the server. And so the server would provide the browser with some application logic. And now every time you want to run a command, let's say it's an awk command, you have to go to the server. The server has to be managing, spinning up and down some sort of workers that can execute arbitrary user commands on demand and give the answer back to the browser. But now this is a lot more complicated if you want to maintain file system state, and in a way you have to, because in the browser the state is at least maintained until refresh, even if you don't have this system. But on a server you would need another way to track which user is making which request, on which files, and what the state of each one of those files is. So the advantage of using WebAssembly is, first of all, it's a lot cheaper. In the WebAssembly case all I'm doing is serving static assets. This is very cheap to do.
I can put that behind a CDN and I'm done. On the server side, I would have to be managing a lot of compute resources and a lot of storage resources, and so that would get quite expensive. And because of that, it's a lot easier to scale this on the WebAssembly side. I can easily support millions of users, whereas without WebAssembly this would be trickier. The other advantage is that it's more secure to execute arbitrary commands within the sandbox of the browser and WebAssembly, whereas if you want to do the same thing on your servers, you have to absolutely make sure that users are not escaping the sandbox that you have. It's also more responsive to use WebAssembly because it doesn't need to reach out to the server, wait for a worker to be ready, execute the request, and go back to the browser. That makes it a lot slower, and so we can make it more responsive with WebAssembly. And it's a lot easier to maintain the state. With WebAssembly, I just store the state in each user's browser. It could be temporary, that's fine, but on the server I have to associate a file system with each user, because if you send me a command that modifies a certain file, that file may be different depending on where the user is in the tutorial. Now, there are disadvantages. The first one is that data size is limited, in the sense that the files that you use in the tutorials can't be too large. If they're too large and you're doing too much computation, the browser just won't support it. It's going to take too long. It's going to slow things down very dramatically. And so the way around that is the tutorials use a very small subset of large data sets to illustrate the point of using some of these tools. And that's okay, that's not that big of a disadvantage. These are tutorials, after all. They're meant to show how to use the tools, not necessarily to fully analyze hundreds of gigabytes of data.
The biggest disadvantage, I would say, is that all the tools that are featured in the tutorials have to be compilable to WebAssembly somehow. And like I mentioned earlier, that can get really tricky. In some cases, it's just not practical to do so. To me, that's the biggest disadvantage for this website. Now, I've talked a lot about how awesome WebAssembly can be. I think it's important to keep in mind when it doesn't make sense to use WebAssembly. I want to say three things. Number one, too little or too much computation in the browser. When you're facing that situation, it's probably not a good use case for WebAssembly. So concretely, an example of too little computation is if you use a language like Rust, for example, to write front-end UI, and then that gets compiled to WebAssembly. To me that's too little computation. It adds a lot of complexity, first of all, but it also adds a lot of WebAssembly overhead. And you're absolutely not going to get a speedup for this sort of simple UI. And so for that, I would say it's probably not a good use case for WebAssembly. The other example is too much computation. If you're running some analysis that takes two dozen CPUs and 50 GB of RAM, probably steer clear of using WebAssembly for that purpose as well. I think really the sweet spot for WebAssembly in the browser is things like audio and video processing, gaming (it's been used by games a lot), simulations and subsets of computations, playgrounds like sandbox.bio, and these sorts of things where you're not doing too little or too much computation, but just enough that it makes sense given what you're doing in the browser and given the complexity that you're introducing into your code by bringing in WebAssembly. So number two is, you don't need to use WebAssembly yourself if someone has already done the hard work of compiling the tool you're interested in to WebAssembly.
So make sure you leverage libraries like sql.js or Pyodide if you want to use SQLite or Python in the browser. The idea being that now you're just using an off-the-shelf JavaScript library. As far as you're concerned, whether they use WebAssembly or not is kind of irrelevant. And that is a great place to be in because it means that you don't have to deal with all the maintenance burden and the compilation burden. And number three, don't try to replace containers, right? So far I've mostly talked about WebAssembly in the browser. You can also use WebAssembly outside the browser. And so here's a hypothetical example. You have a whole bunch of containers that are used for your Python web application. You have Nginx, Postgres for the database, and then you have the Python side of things that uses Gunicorn and Flask. You're not going to compile every single one of these containers into a WebAssembly binary instead. First of all, that's going to be really complicated. Dealing with things like sockets, especially for Postgres, is going to be nontrivial. But also, when you compile Python to WebAssembly, that adds a significant amount of overhead and typically you'll see a lot of slowdown. And also the benefits just aren't really there. And so to me, this blind replacement of containers with WebAssembly does not make sense. And I think most people in the field agree that WebAssembly will not replace containers. It's just that in certain situations, WebAssembly becomes another option. So to me, where it really makes sense to use WebAssembly outside the browser is, first, if you want to safely run user-provided code. And so what this means is if you have an application and you want to let users write code to extend the functionality, using a sandbox like WebAssembly outside the browser makes a ton of sense, and that's a really good use case. Another one is edge computing.
Edge computing is the idea that you can spread your code all over the world, and depending on where your users are, they will execute the code in the data center that is closest to them. And so clearly speed matters if you're doing that. And so one thing that's nice about WebAssembly is that it is more lightweight than containers, and so it can initialize a lot more quickly. So that's another use case where it kind of makes sense. Finally, I wanted to share some resources with you that I thought could be useful. The first one I'll mention is sandbox.bio itself. It is primarily focused on bioinformatics, but it also has interesting command-line tutorials that would be applicable to a general audience, like awk and jq. We also have playgrounds. So I often find myself writing an awk or sed command that I want to test really quickly without having to type something, press Enter, press the up arrow, modify, Enter, up, modify. And so this playground lets you do that very easily. So anything that you type in here gets immediately executed in the browser and shows you the output of the command. And this obviously also uses WebAssembly. Another resource: I have this open source package called biowasm. This is a library of mostly bioinformatics tools that are compiled from C to WebAssembly, but I think it could be a useful resource if you're looking for other examples of compiling complex applications to WebAssembly. And finally, there's also my book at levelupwasm.com, and I also have a whole bunch of free articles and other talks that I've given that you might find interesting. And with that, thank you very much for being here.

Robert Aboukhalil

Senior Software Engineer @ Chan Zuckerberg Initiative



