Conf42 Python 2023 - Online

PyO3: Rust <3 Python

Abstract

Rust is great at low-level zero-overhead abstractions. Python is great for quick prototyping. With PyO3, get the best of both worlds. Migrate code to Rust when you need to without changing its external interface and without violating DRY.

Summary

  • Moshe Zadka: I live in Belmont, on the San Francisco Bay Area peninsula, which is the ancestral homeland of the Ramaytush Ohlone people. Today I want to talk to you about PyO3 and how Python loves Rust.
  • Rust is a low-level, memory-safe language which supports zero-cost abstractions, abstractions that don't have any runtime cost. This combination is rare, and that's what makes Rust useful. A toy counting example shows a number of Rust features and how to use them from Python.
  • Using Rust in Python is surprisingly easy. Rust is very high performance but has a learning curve. Python is almost the opposite. But if you combine them, you get the best of all worlds. You can prototype in Python and then move the performance bottlenecks to Rust.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everybody. My name is Moshe Zadka. My website is cobordism.com, where you can find every way of reaching out to me known to humankind. And today I want to talk to you about PyO3 and how Python loves Rust. I want to start with an acknowledgment of country. I live in Belmont, on the San Francisco Bay Area peninsula, which is the ancestral homeland of the Ramaytush Ohlone people.
Let's start by talking a little bit about Rust: what it is, why using it might be a good idea, and how to use it. What is Rust? Rust is a low-level language which supports zero-cost abstractions, abstractions that don't have any runtime cost, and it is memory safe. This combination is pretty rare, and that's what makes Rust useful.
So why is this combination useful? Well, sometimes, as they say, the algorithm needs to go vroom, right? Performance is useful. However, the algorithm often runs on untrusted data. We'd really like it if, even when the data is bad, even when there's a bug in the algorithm, the way it comes out is not harmful, right? It doesn't, say, open a security issue.
The prototypical example that you want to keep in mind when you think about Rust is low-level parsing. Let's say that you need to parse some new, weird low-level format. Your parser will probably have to deal with some untrusted data. You can't use a higher-level library because this is a new format. So you have to go byte by byte, which means you want something that can give you the performance of reading something byte by byte while still keeping safety in mind and giving you the zero-cost abstractions to make sure that your code is still readable and easy to debug.
So for my example, I'll use counting as kind of my low-level parsing example. It's a nice toy example that will show us a number of features in Rust and how to use them from Python. So in this example, we need to check if a character appears more than x times, not to say how many times it appears. We just need to see if it goes above a threshold value.
That's an interesting twist. And that means that we can't use a lot of other things that might be super useful in general for counting, but will, for example, read the whole file even if we only need to check for three appearances and those all appeared at the beginning. We also want to enable resetting counts on spaces and newlines. Again, this is the example, so you can think of a use case, right? We want to see if there is a word that includes the character more than x times, or if there's a line that includes it more than x times. Again, this is not very complicated to implement, which is what makes it a good example, and it does have enough nooks and crannies that we'll be able to use it to get into some of the interesting parts of interfacing between Python and Rust. So it's really just interesting enough.
So let's start writing some Rust code. The first thing we know is that we need to support three options: reset on newlines, reset on spaces, and no reset. So we make it an enum, which is probably the equivalent of what we would choose in Python, a three-way enum. And now we have the structure that keeps the parameters of the problem, right? We want to define the character, we want to define the minimum number, and what the reset is. And in our example we just have one method, has_count, where you give it some data and it checks if it has the count.
In order to make the code a bit more slide friendly, I've moved some of the implementation into a separate internal has_count function that won't be exposed to Python later on. Spoiler alert. But that makes it easier to fit different things on a slide. So the function takes a counter, it takes the data, and it returns a boolean, and it loops over the characters. It will update the current count. You'll notice that I define the current count as mutable. And when I pass it, I pass it as a mutable reference, which means that at the calling site
I can tell that this is a function that might update the current count as well as get it as an input. And if it returns true, then I can stop. I know that I already have enough of a count, so I can stop.
So now we need to implement that counting step. We do maybe_reset, which might reset if we encounter a reset character. It maybe increments, if this is the right character to count, and if it sees that the current count has reached the minimum number, it returns true. I do that by just having the last expression evaluate to that boolean. The way Rust interprets a semicolon-separated list of expressions is by evaluating them in order and returning the value of the last expression, which in this case is a boolean saying whether we've reached the minimum count.
maybe_reset uses a pretty fun feature of Rust, which is match. So in this case I match on a tuple. The tuple is the character and what the reset is. If it's a newline and the reset is the newlines reset, or if it's a space and the reset is the spaces reset, then I reset the count, and otherwise I do nothing. Every function in Rust has to return a value, but in this case the value is the empty tuple, which is always the same value.
maybe_incr checks if the character equals the target character, and if so, it increments the count. You'll notice that here I dereference the current count, and you'll also notice that, again, both at the calling site and in the function, I take care to note that the current count is a mutable reference, right? Which means that you always know that it's a mutable reference at every point. And the important thing is that this is not the default. So if you don't see that, you know, for example, in this case, that the counter cannot be changed, because that's not a mutable reference. So again, I wrote this code to fit on slides. This is not necessarily best practice.
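The functions described above can be sketched in plain Rust as follows. This is a reconstruction from the spoken description, not the speaker's slides: the names Reset, Counter, what, min_number, has_count_internal, got_count, maybe_reset and maybe_incr are assumptions, and so is the use of >= for the threshold check (chosen so that a minimum of three is satisfied by exactly three occurrences, as in the demo later in the talk).

```rust
/// When, if ever, the running count goes back to zero.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Reset {
    NewlinesReset,
    SpacesReset,
    NoReset,
}

/// Parameters of the problem: which character, how many, and when to reset.
/// (Field names are illustrative assumptions.)
struct Counter {
    what: char,
    min_number: u64,
    reset: Reset,
}

impl Counter {
    /// The one public method: does `data` reach the threshold?
    fn has_count(&self, data: &str) -> bool {
        has_count_internal(self, data.chars())
    }
}

/// Loops over the characters and stops as soon as the threshold is reached.
fn has_count_internal(cntr: &Counter, chars: std::str::Chars<'_>) -> bool {
    let mut current_count: u64 = 0;
    for c in chars {
        // `&mut` at the call site signals that the callee may update the count.
        if got_count(cntr, c, &mut current_count) {
            return true;
        }
    }
    false
}

/// Updates the count for one character and reports whether we have enough.
fn got_count(cntr: &Counter, c: char, current_count: &mut u64) -> bool {
    maybe_reset(cntr, c, current_count);
    maybe_incr(cntr, c, current_count);
    // The last expression (no trailing semicolon) is the return value.
    *current_count >= cntr.min_number
}

/// Resets the count when the character matches the configured reset mode.
fn maybe_reset(cntr: &Counter, c: char, current_count: &mut u64) {
    match (c, cntr.reset) {
        ('\n', Reset::NewlinesReset) | (' ', Reset::SpacesReset) => {
            *current_count = 0;
        }
        _ => {}
    }
}

/// Increments the count when the character is the one being counted.
fn maybe_incr(cntr: &Counter, c: char, current_count: &mut u64) {
    if c == cntr.what {
        // Dereference the mutable reference to change the caller's value.
        *current_count += 1;
    }
}
```

With this sketch, a Counter built with what set to 'c', min_number set to 3 and reset set to Reset::NewlinesReset reports true for a single line containing three c's, and false when a newline separates them, because the count is reset before it reaches three.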
I didn't format the code according to the best formatting guidelines, again with the idea that it fits onto a slide and is reasonably readable. Here, the API between the functions is definitely not ideal. Even the higher-level API is not what I would choose in other cases. But this is enough to have working code. It's not really bad Rust, it's just not ideal Rust, but it's nice to see it on slides.
And now we've seen that code, right? So again, in our imagination, that code was mostly written, or pre-written, before we start, but we have to understand it so that we can properly wrap it for Python. So now let's go to the wrapping part. There are obviously a few ways to wrap it, but using PyO3 is really nice, because all we do is add annotations to the Rust code. We don't have to write any Python code or any kind of glue code. The only thing we do is go inside the Rust code and add the proper annotations, so it's purely inline. That means that as the code is modified, it's modified together with the wrapping code.
We start with something like an include: this is the Rust equivalent of an import statement. So we use the PyO3 prelude, which imports a lot of stuff that we'll use later on. And now we have to decorate the Reset enum a little bit. So we say this is a pyclass, which means please expose it to Python, and it can't be exposed to Python unless it implements Clone or Copy. These are basically things that we would need to implement ourselves, except that in Rust, often you want to say: copy it the obvious way that you would have copied it, and clone it the obvious way that you would have cloned it. There's a specific way to spell these things, and that's what derive Clone and derive Copy mean. It means there's only one reasonable way to copy that, there's only one reasonable way to clone that, so please just write the code yourself, don't make me do it. And Rust will happily do that for you. The counter is a little bit simpler.
All we need to do is just wrap it with pyclass, and the impl block we wrap with pymethods, and we add a new method called new. We didn't have to have it for Rust, because we could create a new counter object directly from Rust. It wasn't hidden. But in our case we do want to expose it to Python, and that means that we need to expose a constructor to Python. So we expose a constructor to Python that takes the parameters and sets them in the structure. And with all of that done, we just tell the module to include the counter and to include the reset, and we return Ok. The question marks mean that if there's a problem in adding either class, that will raise an exception. Well, it will return an error PyResult, which in Python, when it's unwrapped, will raise an exception saying I had a problem initializing that. And the Ok means don't raise an exception, it's fine.
Now, the tool to use for all of that is maturin. maturin develop is the equivalent of pip install --editable. It installs it inside the current virtual environment, and you'll still have to rerun it because it's Rust and not Python, but it more or less automatically keeps it up to date. And when you finally want to upload a wheel to PyPI, you use maturin build, and it will give you a great wheel. And by a great wheel I mean it will be CPU specific. So if you want to support more than one CPU, you need to build it on more than one CPU architecture. And obviously Linux, Macs, and Windows will need different wheels, but it will provide you with a manylinux wheel, right? So that's all taken care of for you just by using maturin, without much extra work.
And once you have either the wheel installed, or you've installed it via maturin for testing it out, you import it just like any other module and you create a new counter. In this case I created a counter that uses the newline reset method. And if I count something that has three c's in it, then it will return True.
And if I count something that has three c's in it, but there's a newline between them, then obviously it resets on the newline, and so it never gets to three. And so it gives me False. Right. So now we have fast code that implements the desired algorithm.
So what do I want you to take away from this? Using Rust in Python is surprisingly easy. If you already have the Rust code, you decorate it. If it's someone else's Rust code, then you have to thinly wrap it in your own layer of Rust and then decorate that.
Obviously these two languages are very different from each other. Rust is very high performance. I didn't put any measurements here because that was beyond the scope of what I have time to talk about, and also, there are many ways to optimize Rust, but even in this case, implementing more or less equivalent code in Python would be much, much slower. It's safe, right? I looped over characters. I potentially could have done pretty complicated things. There was no way that this code could have an out-of-bounds error or a memory issue or something like that, because I wrote it on top of high-level abstractions that are safe even though they don't have any runtime cost. But it does have a learning curve, right? You saw, I have to understand stuff like mutable references, and if you do slightly more complicated things, you start to have to understand the borrow checker and lifetimes and a lot of fun things like that. It's not trivial, and that makes it kind of awkward for prototyping. If I want to quickly change a function's interface, I also have to change the type information.
Python is almost the opposite. It's very easy to get started. It supports really tight iteration loops, but there is a speed cap. You can optimize Python, but at some point you run into pretty hard limits. But if you combine them, you get the best of all worlds.
You can prototype in Python and then move the performance bottlenecks to Rust, and they're stronger together. You can do development, and then you send it off to be deployed and you reap the benefits. So I hope that will help you in your own projects. Thank you so much for listening, and I hope you enjoy the rest of the conference.

Moshe Zadka

Principal Engineer @ Twisted
