Conf42 Python 2021 - Online

Python - best practices

Abstract

I will share my experience with Python over the last 4 years, specifically the practices that have benefited our workflow in terms of future refactoring and performance.


Python easily makes it into the top 5 in several reports on widely used programming languages, largely due to its ease of use. That popularity makes it all the more important to enforce best practices while coding, which is what we'll cover in this session. Anyone interested in Python programming is welcome.

Summary

  • Today's session will be about best practices with respect to Python development. The first suggestion is to use built-in methods, since built-in functions offer a very good performance boost. The second suggestion is to fail fast.
  • Let's compare the performance of loops, list comprehensions, and map. The list comprehension is the clear winner for simple scenarios. But when you start generating values in a more complicated manner, with several conditions or multiple objects, a list comprehension is not very developer-friendly to read.
  • Join, rather than plus, is the preferred method of concatenating strings. For dictionaries, the regular way of updating a value is compared against using get. Another suggestion for collections is to use generators as opposed to fully built lists.
  • Whenever we have a huge list of values, there are two ways to check whether a value is present. One is to use a for loop and check each element; the other is simply to use the in keyword, which is way more efficient in most scenarios.
  • Due to time constraints, I had to restrict the number of best practices I could discuss, but I would love to hear feedback and any other best practices you may recommend. Any constructive criticism is more than welcome.

Transcript

Good day everyone. I'm Ranjan Mohan, a senior software engineer at Thales, and today this session will be about best practices with respect to Python development.

A few things before we get into the session. I'll be using a few metrics to highlight why certain suggestions are better, and those metrics should be taken with a pinch of salt, simply because metrics like the execution time don't exclude the thread sleep time, that is, the time taken to context switch on the CPU. The eventual goal is to exclude that from the measurements in a future iteration, but as of now they don't. Because of the steep gaps in the time differences, though, we can work under the assumption that context switch time is not playing a big role here: I'm not running anything CPU intensive, and the time taken to run the code is pretty close to the overall thread execution time. The second thing is that this is a fragment of my experience; I have only covered best practices that I resonate with closely. It's not the whole list altogether, just a few cherry-picked practices I'd like to share with you over the next few minutes.

So let's get into the session, starting with the general section. The first suggestion I have is to use built-in methods. Here we have a collection, numbers, which contains the values from 100 down to 1 (100, 99, and so on, all the way to 1; zero is excluded), followed by squared numbers, which stores the squared values of those numbers. We could use a for loop for this, but we have chosen to use map, which is a built-in function, and we give it a lambda to say: apply this lambda to every element of the numbers collection. The lambda takes a single input, squares it, and returns that as the output. Similarly, when we want the sum of the squared numbers, we just use the sum function, and when we want the product of all the squared numbers, we use reduce with a lambda that multiplies its two inputs. Applied consecutively across the collection of squared numbers, this lambda reduces the collection pair by pair until, at the end, it yields a single value, which is the product of all the squared numbers.

Like I said before, all these goals can be achieved using for loops, but there are two key reasons why built-in functions are much better for these scenarios. Number one, most of the time they offer a very good performance boost, especially when you're dealing with a large collection of values. Number two, they simplify the code and make it more elegant: as opposed to a for loop, where you need to maintain variable names and other references, using a map, a sum, or a reduce states clearly what operation you're performing. So: efficiency in terms of performance, memory, and CPU, and cleaner code, as opposed to trying to reinvent the wheel.
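As a rough sketch of what that code might look like (the variable names and exact values here are my reconstruction, not the talk's verbatim code):

```python
from functools import reduce

# Values from 100 down to 1; zero is excluded by range's stop argument.
numbers = list(range(100, 0, -1))

# Built-in map applies the lambda to every element of the collection.
squared_numbers = list(map(lambda n: n ** 2, numbers))

# Built-in sum adds all the squared values together.
sum_of_squares = sum(squared_numbers)

# reduce repeatedly combines pairs until a single value remains:
# the product of all the squared numbers.
product_of_squares = reduce(lambda a, b: a * b, squared_numbers)

print(sum_of_squares)
```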
Now, the second suggestion is to fail fast. Consider the requirement where I need to convert years into months, with the number of years as the input. One approach I can take is to check the success criteria first, to see if it's a valid year; if so, go ahead and compute the number of months and return it, else throw an exception and fail. That's the fail-late solution: you check for validity, you compute, and only if the check didn't pass do you fail, so you're failing later. The alternative is to check for the invalid condition first; if it's true, you fail immediately, else you compute the necessary output.

Let's run both methods for invalid values, from -1000 up to -2 (the range's end point of -1 is excluded), find the average execution time for each, and see which is more efficient: failing first or failing late. As you can see, failing first takes about 0.005 milliseconds, whereas failing late takes about 0.013 milliseconds, so there's clearly more than a 50% cut in the time taken when you fail fast.

This is under the premise that the condition used to check the failure criteria and the condition used to check the validity criteria take pretty much the same time. If the validity check takes far less time than the failure check, you'll obviously see the opposite result. So the conclusion assumes that the failure check and the validity check consume the same amount of resources in terms of CPU time and memory.
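A minimal sketch of the two shapes, with a toy validity check of my own invention (the talk's actual validation logic isn't shown); as noted above, which version wins depends on the relative cost of the two checks:

```python
def years_to_months_fail_late(years: int) -> int:
    # Fail late: check the success criteria, compute, and only then fail.
    if years >= 0:  # toy validity check (an assumption, not the talk's code)
        return years * 12
    raise ValueError(f"invalid number of years: {years}")


def years_to_months_fail_fast(years: int) -> int:
    # Fail fast: check the invalid condition up front and bail out immediately.
    if years < 0:
        raise ValueError(f"invalid number of years: {years}")
    return years * 12


# Average each style over the invalid inputs -1000..-2, as in the talk.
import timeit

for fn in (years_to_months_fail_late, years_to_months_fail_fast):
    def attempt_all():
        for y in range(-1000, -1):
            try:
                fn(y)
            except ValueError:
                pass
    print(fn.__name__, timeit.timeit(attempt_all, number=100))
```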
Now, the third suggestion is to import only when necessary. I have a baseline function here which does absolutely nothing; it just returns. And I have another function which imports two commonly used libraries, numpy and urllib. I'm going to call both functions and time them to see how long each takes. As you can see, the baseline function did practically nothing; its execution time was probably in microseconds, so it rounded off to zero milliseconds. The function with the imports, on the other hand, took about 264 milliseconds.

Now imagine a scenario where you're calling a function which calls another function, which calls another function in another file, and so on. For each file it touches, if there is a global import right at the top that pulls in things not needed for the function being called, it keeps adding several hundreds of milliseconds to the runtime, and that can end up being your performance bottleneck; it can add a few seconds to your overall execution time. So, especially for expensive imports, please import only when necessary. I understand there is a trade-off: the convenience of managing all imports at the top of the file is lost. But with IDEs that can refactor and analyze code, and all the other tools you can use to analyze dependencies, keeping everything at the top becomes a bit of a moot point. For expensive imports, do the import only in the context where it's needed, be it within a method, a class, or a code block; that's for you to decide. Doing so can save you a few seconds of runtime, or at the very least a hundred milliseconds or so.

The fourth suggestion is to use caches for methods where a unique sequence of inputs always produces the same specific output; the output is not going to change if the input is the same. For such scenarios it's wise to use a cache: the cache maps an input that was previously used to the output it generated, so the next time the same input is supplied and the method is called, the cached output is returned and the computation doesn't need to happen again. Here we're computing the sum of the first n Fibonacci numbers, and we're doing it using recursion. Both method bodies are the same, except that the first method doesn't use a cache, whereas the second uses an LRU cache that stores up to 128 elements, as in 128 unique combinations of inputs and outputs. LRU stands for least recently used: when the cache becomes full, it evicts the entries that haven't been used recently. That is the strategy of this particular caching technique.

Let's run both of these and see what the gain in execution time is. As you can see, with the LRU cache the time stays pretty much at zero; we don't see any noticeable increase as the number of terms to sum increases, whereas for the method without a cache the time rises exponentially. So this is a scenario where an LRU cache, or any cache for that matter, helps significantly. One caution, though: even when the same input always yields the same output, I wouldn't recommend a cache if the input objects are very expensive memory-wise, because in that case your cache can end up occupying a lot of memory and reduce the amount of free memory available to the rest of the code. That is the only caution you need to exercise. In scenarios like this one, where the method just takes a measly integer value, you can jolly well use a cache to improve performance manifold.
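Two compact sketches of these ideas, with names of my own choosing (and assuming numpy is available in the environment):

```python
import timeit
from functools import lru_cache


def baseline():
    return  # does absolutely nothing


def with_deferred_imports():
    # Deferred imports: the cost is paid here, only when this function runs.
    import numpy            # assumes numpy is installed
    import urllib.request
    return


print("baseline:", timeit.timeit(baseline, number=1))
print("imports :", timeit.timeit(with_deferred_imports, number=1))


def fib(n):
    # Naive recursion: exponential time without a cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=128)  # remembers up to 128 input/output pairs
def fib_cached(n):
    # Same logic; previously seen inputs come straight from the cache.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print(sum(fib(i) for i in range(25)))         # slows down as n grows
print(sum(fib_cached(i) for i in range(25)))  # near-instant
```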
Now that we're done with the general category, let's move on to the loops category and compare the performance of a for loop, a list comprehension, and map. The test code accepts a list of numbers and returns a list containing the square of each of those numbers. When I run this, I'm calling all three methods for input sizes between zero and 1 million, in increments of 10,000. It takes a bit of time for the execution to complete, but we get a very good idea of how the three approaches to the same goal perform.

As you can see, the list comprehension is the clear winner; it uses much less time than the others, especially as the size of the input increases dramatically. The loop is somewhere in between, and map ends up taking the most time. One thing I would still strongly recommend: even if map takes a bit more time, on the order of a few tens of milliseconds, I would consider using map over a for loop, simply because it's cleaner code, and over time, especially for very large inputs, you really start seeing the efficacy of map compared to a for loop.

So for simple scenarios like this, list comprehension is the clear winner; go ahead and knock yourselves out with list comprehensions. But when you start generating values in a much more complicated manner, with several conditions or multiple objects, reading a list comprehension is not very developer-friendly; such code can add cognitive complexity and force a developer to take extra time to understand it. At that point it makes more sense to use a map or even a loop. Performance-wise, list comprehension is the winner, and even though the for loop had less overhead up to 1 million numbers, I would personally still prefer map, because it gives cleaner code and much better throughput at a much, much higher scale.
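A sketch of the comparison; the function names are mine, and I've used coarser increments than the talk's 10,000 so the output stays short:

```python
import timeit


def squares_loop(numbers):
    result = []
    for n in numbers:
        result.append(n ** 2)
    return result


def squares_listcomp(numbers):
    return [n ** 2 for n in numbers]


def squares_map(numbers):
    return list(map(lambda n: n ** 2, numbers))


# Time the three approaches across growing input sizes.
for size in range(0, 1_000_001, 250_000):
    data = list(range(size))
    for fn in (squares_loop, squares_listcomp, squares_map):
        elapsed = timeit.timeit(lambda: fn(data), number=1)
        print(f"{fn.__name__:18s} size={size:>9,d} {elapsed:.4f}s")
```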
Now that we're done with the loops section, let's move on to the strings section. In any PR where I see string concatenation, about 90% of the time I see a comment that says: use join, don't use plus. And there's a very good reason why. There's a method here which uses plus to concatenate a given list of strings, and another method which uses join to concatenate the same list. Let's run this across a sequence of lists, where the first list has one value, the second list has two values, and so on up to 1000 values, and see the performance difference between concatenating with the plus operator and using join. The execution time when using plus is a bit erratic; it goes up and down, and as we move to larger numbers of strings, the trend of the execution time increases. When using join, on the other hand, the time is almost zero; it's not even observable. Join is way more efficient in terms of performance, and it's the preferred method of concatenating strings as opposed to plus, for the reasons we can see here.

Now that we're done with the strings section, let's move on to the collections section. Whenever we maintain a dictionary, we may have to initialize values that are not yet present inside it. Assume we keep a dictionary that tracks the number of each fruit added, and an add_fruit function that takes a fruit argument. The primitive approach is to check if the fruit is in the dictionary, initialize it to zero if it's not there, and then increment its value by one. A more elegant approach, which can also be a performance gain, is to use the dictionary's get function: it tries to get the value for the fruit, and if the fruit doesn't exist, it returns the default value we specified, which is zero, and then we increment by one. So we're not only finishing the job elegantly in one line, we may also gain performance. That's what we're testing here, with two operations: adding a fruit for the first time versus updating an existing fruit.

When we add a new fruit the regular way, it takes about 0.001299 milliseconds, whereas adding a new fruit using get takes about 0.000797 milliseconds. To be fair, for adding a new fruit there's not much of a difference; if I run it again, you may see the gap shrink, or even the get version coming out higher. Where you see a difference in the opposite direction is updating an existing fruit: the regular way takes less time than using get in that scenario, simply because the get function involves an extra indirection to supply the default value, which is a bit of overhead, though not much. In a nutshell, if you average the time across both operations, adding and updating, the average using get will generally be lower than the average the regular way. That's why comparing the operations individually may not yield meaningful results; comparing the total of all operations done the regular way against all operations done using get makes more sense in this scenario.

The second suggestion in the collections category is to use generators as opposed to fully built collections when you want to produce a sequence of numbers. One way is to use a list comprehension, or even a for loop, to generate an actual list. The other approach is to set up a generator. The key difference is that the list comprehension actually builds the list: every number you're generating is computed and stored in memory. With a generator, you're just storing a generator object that lazily produces each number when you ask for it, so the memory used is significantly lower. Let's run this piece of code and see whether it gives us any gain. If we look at the size occupied, especially as the number of values increases, we see a steady increase when we generate the full list, whereas with the generator the memory usage is constant and very minimal, because it's not generating and storing all the values up front; it only produces them as we iterate through the generator. So that is also one of the best practices I would like to suggest.
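Minimal sketches of the three idioms from this section, under assumptions of my own (the names and sizes are illustrative):

```python
import sys

# String building: join over repeated plus-concatenation.
words = [str(i) for i in range(1000)]
joined = "".join(words)      # preferred: one pass, one allocation
concatenated = ""
for w in words:              # plus builds a new string every iteration
    concatenated += w
assert joined == concatenated

# Dictionary counting: get with a default, in one line.
fruit_counts = {}

def add_fruit(fruit):
    fruit_counts[fruit] = fruit_counts.get(fruit, 0) + 1

add_fruit("apple"); add_fruit("apple"); add_fruit("pear")
print(fruit_counts)  # {'apple': 2, 'pear': 1}

# Generators: a constant-size object versus a fully stored list.
squares_list = [n ** 2 for n in range(1_000_000)]
squares_gen = (n ** 2 for n in range(1_000_000))
print(sys.getsizeof(squares_list))  # grows with the number of values
print(sys.getsizeof(squares_gen))   # small and constant
```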
Now that we're done with collections, let's move on to the conditions section. Whenever we have a huge list of values and we want to check whether a given value is present, there are two ways to go about it. One is to use a for loop and check each element to see if the number matches. The other approach is simply to use the in keyword. This is not only more elegant, which comes back to using the built-in methods and keywords we have available, it is also way more efficient in most scenarios. Let's take a look at how much of a time gain we get from using in rather than iterating and checking ourselves.

On the chart there are four sections; the main ones are the red, which is matching using in, and the orange, which is matching using the for loop. The red line is mostly overlapped by the green and the other colors. There is one momentary red spike, which may be due to context switching on the CPU, whereas for the for loop we see far more context switching and a much greater increase in duration compared to using in. That is the key thing we observe here, and the reason we recommend using in: it's cleaner code, and there's a performance gain compared to the for loop.
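A small sketch of the two membership checks (names and sizes are mine):

```python
import timeit

values = list(range(1_000_000))
target = 999_999  # worst case: the match sits at the very end


def contains_loop(items, needle):
    # Manual scan: compare every element until a match is found.
    for item in items:
        if item == needle:
            return True
    return False


def contains_in(items, needle):
    # The in keyword delegates the scan to optimized built-in code.
    return needle in items


print("loop:", timeit.timeit(lambda: contains_loop(values, target), number=10))
print("in:  ", timeit.timeit(lambda: contains_in(values, target), number=10))
```

As a side note, if you test membership repeatedly against the same collection, a set is usually the better structure, since its lookups are constant time on average.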
Now that we're done with conditions, we have successfully concluded the session. I will be sharing the repository containing this code, as well as a best-practices markdown file that contains all the best practices we have discussed, and more in fact. Due to time constraints, I had to restrict the number of best practices I could discuss with you folks, but I would love to hear feedback from you, and any other best practices you may recommend; I would be glad to include them in a future session. Any constructive criticism is more than welcome. Thank you so much for investing your time in this presentation. I hope to see you in another one soon. Thank you.

Ranjan Mohan

Senior Software Engineer @ Thales



