Conf42 Python 2021 - Online

Python Memory Understanding

Video size:

Abstract

While writing python code, you might have wondered about pointers and memory? Am I right? I think yes, most of the developers try to write code which consumes less memory as well as which executes faster. But have you explored how memory management works in Python? What happens when you define a variable? In C/C++, you can define a pointer or any integer, but is it so with Python?

We’ll have a look at many topics of memory in this talk suck as allocation, de-allocation, garbage collector and many more.

Summary

  • Nisaraksha will be talking about Python memory and rabbit collection. After this presentation you'll have a thorough understanding how the memory deallocation and deallocations works. Now you might be able to write some efficient Python code.
  • In Python everything is object only. As soon as any assignment is done, an object will be created. The garbage collection automatically tracks all the objects and it deallocation the memory automatically. To speed up all these operations, Python has a manager called Pymeloc.
  • The reference counting algorithm scans the reference count. If it finds any reference count equal to equal to zero, then it deallocation the memory for that object. But this algorithm can't detect the circular reference. Also, there are some performance issues and all that. This is where the generational GC algorithm comes into picture.
  • The processing is done by TP underscore traverse function. It marks all this temporarily unreachable tag and all that. At the end, all the objects of this unreachable container will be deleted by the algorithm. I would highly recommend you to have a look at the design of the c Python's garbage collector.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, myself Nisaraksha and today I'll be talking about Python memory and rabbit collection and so how it works and all. Myself Nisaksha and I am an undergraduate can student and right now I'm interning at fountain and startup. You guys can connect with me on the following social media links. And yeah, if you have any feedback for me on this presentation then also feel free to give me and if you have any doubt or if you want to discuss any topic then also, yeah, I'm open to that. Now let's have a look at what will be the main points covered in this talk. What will be the giveaway of this talk? So first of all, I'll start with the Python objects. Then we'll shift to the memory storage, that how memory is being stored and all. And after that we'll have a look at a couple of garbage collection algorithms, what it does, how they works and all that. After this presentation you'll have a thorough understanding how the memory deallocation and deallocation works. And so you might be able to write some efficient Python code because now you know how the objects are getting deallocation. So you might consider that thing while designing your application. Also, as a Python developer, you might have only heard of objects. So yeah, in Python everything is object only. Now let's say here I have assigned nine to a variable, and after that if I do id of variable, then I'll get an id, and if I do type of variable, then I'll get a type which is like class of int. So yeah, as soon as I assign this nine, then Python. So object will be created which will have unique id, its type value, and a reference count. So like as soon as any assignment is done, an object will be created which will have all these properties. Now, how the memory storage actually works in Python, the objects and the instance. So the objects and the instance variables are actually created in a heap memory. So as soon as the functions are written, so at that point there is no need of that function or variable so that objects will be collected by a garbage Collector. Now what's garbage collection? We'll see that in upcoming slides. And the methods and the variables, like the local variables and the methods are created in the stack memory. So a stack frame is created. Then as soon as the method returns, those frames get destroyed. In Python, you can have an object into an object, right? So a list into list or a dictionary into dictionary, or a list into dictionary, dictionary into list, any kind of thing, right? Because of this Python's dynamic nature, we need to have smart memory allocation scheme and also the speedy memory allocation speed. Now, to speed up all these operations, Python has a manager which is called Pymeloc, and it actually sits on the top of the general memory allocator. So the main work of Pymeloc is to allocate the memory and speed up the memory operations. Now let's say in any Python application you might have hundreds of objects and a couple of long running Python processes, right? So there might be need of huge memory. Now if you as a developer are handling the deallocation of memory, then it might be a case that by mistake you might deallocation some wrong object, some wrong variable which may be in use in further program. Then your program will be crashed and debugging will take a lot of time. So who does this garbage collection thing? So who actually deallocation the memory? So for that, the garbage collection has a bag. It collects the objects which are of no longer use and it de allocates them. It actually releases the memory whenever the object is of no longer use. So you can think this system as a kind of trash bin in our computer. So if you don't need any folder, then you'll just shift it into the trash and then it will be deleted. So it's a kind of a similar system. The garbage collection automatically tracks all the objects and it deallocation the memory automatically. So it's a huge relief for a programmer that the programmer does not need to deallocation the memory by himself. And the thing is, the garbage collection algorithms actually track these objects and it finds an optimal time at which to deallocation. There are these algorithms which does all this, and it is very useful for us also. So as a programmer, we don't need to do anything. Now let's look at the GC algorithms. So one is the reference counting algorithm, and the second is a generational GC reference counting algorithm is pretty much straightforward, it's easy and efficient also. And it works as like whenever there is no reference. So as the name suggests, reference fountain. Right? So whenever there is no reference of an object at that moment, it delocates the memory of that object. Now, to keep track of all the references, every object has an extra field called reference count, which we saw in the first slide that each python object, so each object has its id, its type and reference count and a couple of other fields, right? So actually that reference count is increased or decreased based on the allocation and all that. And the generational GC algorithm. So it is based on a trace based algorithm so it actually scans a list of objects and all that. So we'll look into much detail in the upcoming slides. But these two algorithms are there which work in different cases. And so actually generational DC algorithm detects the circular references, cycle of references, and it is used in that case. So let's have a look what all this means and let's understand it. So first of all, we'll see the reference count. Now here I have assigned 19 to variable a. So if I do id of a, then I'll get the id of this object. So as soon as I do this assignment, like this object will be created, and of course the id will also be there. So this object will be created, which has its type, value and reference count. And all these properties are automatically detected by the python. So we don't need to give it here you can see the reference count is one, so we can say that a is referencing towards this object. So like this kind of thing. Now let's say if I introduce like new variable b, and if I do b equal to a or b equal to 19, then what will happen? If I do b equal to a, then you can see when I do id of b, then I'll get the same id, which means b is also referencing to the same id to the same object. And as soon as I do b equal to a, the reference count will be incremented by one. So now it will be two. Now as soon as I will do c equal to 19, at that moment also the reference count will be incremented by one, which means here reference count will be three and C will be also pointing towards this object. You can get all this count by sys get ref count. You just need to pass the object in that and you'll get the reference count. But here note one point that passing a variable into the function, it also increments the reference count by one because it creates a local copy of that, and that also is referred to the same object. So if I do sys get ref count of c, then I'll get the reference count as four, not three. Now let's have a look how the list works. So let's say I have one list, which is l one equal to one two three, then I'll do id. I got an id. If I do l two equal to l one, then I'll get the same id, right? But now let's say I copy the list, like copy using copy copy. If I copy the list one into l three, then you can see that the id of l three and id of l one are not same. It means the copy actually creates another object and it is referred to that object. Of course, if I do l one equal to one three, four, then the id of l one gets changed because now it is referencing to some other object. Now let's see what circular reference means because like generational dc algorithm detects it. So as the name suggests, some kind of a circular. So like here it's a circle. Here it's a circle, right? So circle reference means either an object is referencing towards itself or like two objects are differing towards. Here, let's say I have one list l one, or list one equal to this, and I am appending list one. Append list one. I am appending the object into the same object. Or else let's say I have two objects, object one and object two, and I am referencing object two into object one and then object one into object two. Now, in this kind of scenarios, if you look, then the reference count will always be greater than one. So the reference count won't be equal to zero at any moment. Because if I am referencing towards myself only then the reference never be equal to zero. So in this moment, the reference fountain algorithm won't work. And so the generational DC algorithm comes into picture. Now let's see how the reference counting garbage collection algorithm works and how it handles the reference counting algorithm. It actually always counts the reference number to the objects, right? And the reference counts are kept in the memory so that the programs are executed effectively. Now, first thing, we need to have a look at that, in which case the count will be increased, right? So one will be, if I pass an argument into the function, at that moment, the count will be incremented. If I append an object to an object, then also the reference count will be incremented. And if I do any assignment, operator, then also the count will be incremented. The reference counting algorithm scans the reference count. And if it finds any reference count equal to equal to zero, then it deallocation the memory for that object. Now, let's say I have one list, which is one, two, three, and if I'm appending that list into list two, if I have four, five, six, and I am appending that list one to list two, at this point the reference count will be one of l one. But now, as soon as I do l two equal to four, five, six, and I am append l one into the l two, at that moment, the reference count will be incremented by one, and now it will be two. And if I delete the l two. Then at that moment the reference count of l one will be decremented by one. Also. Now for global variables. So what about global variables? Right, so if the reference count of global variables becomes zero, then this algorithm will deallocation the memory, and after that we won't be able to use that global variable. But the main use of the global variable is that if that variable is not in use at any state, then also I'll be able to access that anywhere in my program. Right? So here in this case, it is made sure that the reference count of the global variable never drops to zero, which means they won't be de allocated and they will be deallocation only once the Python program execution is finished. So that's of no issue here. Now let's look at one example here. I have one function named poo here. So I have assigned one string to my name variable. So at this moment the reference count will be one. As soon as I pass this, my name into this poo function, the reference count of this object will become two. And as soon as the execution is completed, the reference count will become one. You can delete the objects using del method, right. When you use del method in Python, it actually decreases or it actually decrements the reference count by one. So if you delete an object, then also at that moment the reference count will be decremented by one for that particular object. Now let's see, what are the issues with this algorithm? Right. First thing is, it can't detect circular reference, because the reference count for the circular references, or that cycles won't be equal to zero at any point. And the other thing is that this algorithm actually has many memory and performance issues. And also it's kind of a big algorithm, but the main advantage is that it is real. So the main advantage of this approach of scanning the reference count and check if it is equal to equal to zero is that the objects can be immediately and easily destroyed after they are of no longer use. So as soon as the count is found equal, equal to, to zero, at that moment the object will be deleted, destroyed. So yeah, that's a plus point. But the thing is that this algorithm can't detect the circular reference. Also, there are some performance issues and all that. Now for that, the generational GC algorithm comes into picture. So it can actually detect the cycle. It could reach to the object which was unreachable by the reference counting, and it could delete that objects and it could free the memory. One more thing to know that in what cases, reference cycles could occur, right? If we refer any object into objects. So let's say a list into a list, dictionary into dictionary, the same thing, or let's say two objects pointing towards each other, something like that. So in this kind of scenario, the reference cycles could be created, and these kind of scenarios are found by the generational DC. And if they are of no use or something, then the memory deallocation will be done. Now, one thing to note is that this algorithm does not run in real time. It runs periodically while the reference counting was running in real time. So in real time it would scan if the reference count is equal to equal to zero, and then all the things were done there. But the generational Gc algorithm does not run in real time. It runs periodically. So let's say at x moment of time, I have declared like five variables, and so I've created five objects, let's say then after some moment, so let's say now at y period of time, the generational DC algorithm started running. Now, between that period, let's say at y moment, now only three objects are induced, and the two objects which I had declared, like their executes or their function in the program, has been completed. So now at that y moment, first of all, it will check if the objects are newly created. If it is newly created, it will insert into the generational zero list. Actually, GC algorithm, it defines three generation list, and it inserts the objects into that. And then it checks whether the object is of use or no use, right? So at that now by moment, it will see if the objects are newly created. If it is so, then it will insert into the generation zero list. Now it will check for the references. So it will scan, and it will check whether all the inserted objects are of use or of no use. We'll see how it detects that these objects are of user, of no use, or if it has circular references, let's say. Now the algorithm found that there is a circular reference, or two objects are of no use, it will discard those objects and then insert the remaining objects into the generation one list. Now, the same thing will happen with the generation one list. So it will also check for the references. It will discard the objects which are of no use, and then it will insert the remaining objects into the generation tool list. And then this generation tool list, this generation two list, also scans it, and it discards the necessary objects. And now the objects which are there in this second generation list, it remains until the Python execution is not, until the Python code execution is not completed. Now one thing is each and every generation list maintains its own individual counter and its own threshold. So let's say if it exceeds the threshold, then also the scan will be done and it will try to find some objects which are of no use or something, and it will try to discard it, right? So now let's see how it detects the cycle or how it detects which objects to discard and how the circular reference is detected, and to discard it, right? Which is like this step, discard the objects. Now this algorithm has two containers. One is the objects to can container. Second is the unreachable container. And each and every object that supports this garbage collection will also have an extra reference count. So this reference count is the basic count of Python memory, which is incremented on any assignment or like something. But this now GC underscore ref will be an extra reference count initialized by this. So as soon as the algorithm starts, GC underscore reference is initialized. So how the initialization takes place. So this link one object is referenced by the outer variable, which is a so the reference can will be incremented to one. Then like this will be one because this link two is referenced by link one. Then link three will be also one, and then link one will be incremented to two because it is also referenced by link three. And here like link for the GC underscore RF will be one. Now just imagine that this link one, link two, link three are objects and let's say a equal to link one object. And then internally this link two and link three objects are referenced by this object. After this initialization is done, algorithm makes one more iteration, and then it finds that which objects are truly referenced by the outer world. So here link one object is directly referenced by the outer world, while link two, link three, and link four are not directly referenced by the outer world. Now, it will scan all these objects, it will find this and it will decrement the GC underscore reference field. It will actually decrement the GC underscore reference field by one if it does not find the outer reference directly. So here link one object has direct reference from outside world, so this will be one and this will be zero, zero and zero because this object does not have the direct reference from the outside world. Now as soon as this object has been scanned and this number has been allocated, so let's see what happens next. So now this will be the state, but now notice that GC underscore reference equal, equal to zero does not mean that these objects are not reachable. Of course, GC underscore reference greater than zero means that it is actually directly accessible from the outside world. But GC underscore reference equal equal to zero does not mean that they are not reachable. Actually, here link two is reachable by link one. Link three is reachable by link two. So now one more scan will be done. And now for the objects which has GC underscore reference equal to equal to zero count, they will be marked as temporarily unreachable, and they will be shifted to the unreachable container like this. So let's say it saw that GCN square reference of link four is zero. It will shift this into this container. Then for three, it will shift this into this container. Now, let's say it scanned link one. Then the GC underscore reference was one, but so as it is greater than zero, then it will not do anything. So this processing is actually done by TP underscore traverse function, and it actually marks all this temporarily unreachable tag and all that. Now as soon as this is done, then the TP underscore traverse again can and it finds all the objects which has an immediate object from the object which is actually referenced to the outside world. So here link two has an immediate reference from the link one, and the link one is actually referenced by the outside world. So now this link two will be shifted to this objects to scan, and its reference count will be incremented by one. Then it will scan link three. Yes, now it will check whether the link to object has an immediate reference or not. So yes, it has the immediate reference. So it will get this link three object into this object to scan container, and it will increment its GC underscore reference count. And after that, like link three, immediate reference is link two. So here also the GCNs reference will be incremented. But now if you can see this link four is all alone by itself. So it is neither referenced by any object, and also it has no reference from any object which is accessible from outside world. So now in this kind of scenarios, this will be now marked as finally unreachable. And at the end, all the objects of this unreachable container will be deleted by the algorithm. So this is just an overview of the working if we have three objects which are referenced by each other, and then if we have fourth object which is referenced by itself, now all these three objects are. So the first object has direct reference by some variable, and now the circular object will be referencing to itself only, right? So in this kind of situation, these cycles will be detected and it will be discarded. So this is just an overview. I would highly prefer to have a look at the design of the c Python's garbage collector. It will give you more clarity and also they have explained by some code, snippets and all. Also, I would highly recommend you to have a look at the GC garbage function, the GC collect function. And once you'll see, you'll get some idea about how the garbage collection so how the scanning and all that takes place. So actually, I have referred these things, and I also referred to some of the python memory model blocks at medium. So they are also good. So yeah. Now if you have more interest in knowing this, then I would highly recommend you to refer these references. Thank you. If you have any questions or anything then feel free to connect with me. Drop me an email, or if you have any suggestions for me then also you are more than welcome to have a chat with me. Thank you.
...

Nisarg Shah

Software Developer Intern @ Fountain9

Nisarg Shah's LinkedIn account Nisarg Shah's twitter account



Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways