Conf42 Golang 2022 - Online

Dissecting Slices, Maps and Channels in Go

Video size:

Abstract

Slices, Maps and Channels in go, are first class citizens of the language, they are used widely in our day to day work, but… do you know how they works under the hood?

Do you know the implications of adding elements to an slice, or new keys to a map? Do you know why you can’t relay in maps order?

Do you know how channels handle the buffer or the blocked goroutines?

If you don’t know about that, this is your talk. I going to access the go runtime memory state of the maps, slices and channels, and show you how they evolve over time while we change them.

Summary

  • Andela has matched thousands of technologists across the globe to their next career adventure. Now the future of work is yours to create. Anytime, anywhere. The world is at your fingertips.
  • In go. We are going to talk about the most commonly used building structures. Slices, maps and channels. We will analyze how they behave in memory when we modify, create or access these structures.
  • A map is just a set of map metadata and aSet of buckets. That is where the real data, the data that just stored in the map, is stored. The data structure in maps is way more complex than the data structures in slices. We will talk more about that later.
  • When a resize happens, it's going to reserve another chunk of memory and migrate all the data. This way you don't have a big blocked. When you need to resize a map, you have just resized the map. Getting that structure that in a readable way.
  • Reading the go runtime code is not complicated. It's not super straightforward, but it's not especially complicated. Understanding the building blocks of the language is going to help you to understand the implications that that building blocks comes with. Dissecting go and just reproduce these experiments is a lot of fun and some conclusions.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
What if you could work with some of the world's most innovative companies, all from the comfort of a remote workplace? Andela has matched thousands of technologists across the globe to their next career adventure. We're empowering new talent worldwide, from Sao Paulo to Egypt and Lagos to Warsaw. Now the future of work is yours to create. Anytime, anywhere. The world is at your fingertips. This is Andela. Welcome to the secting channel. So, slices and maps in go. We are going to talk about the most commonly used building structures. In go. Slices, maps and channels. We probably all know how to use them, but not necessarily how they work under the hood. In this talk, we are going to follow an experimental approach to analyze how they behave in memory when we modify, create or access these structures. Okay, we are going to need some class materials. The scalpel, the microscope, and the subject. Let's start with the slices. What is our escalpel in a slice? We are going to use a function called escalpel that is going to receive a slice. In this case, I'm going to use a slice of integrals because it's simpler. And I'm going to use unsafe to get access to the memory address of the slice and store that data in a structure. Then I have a microscope function. The microscope function is going to show me the data in a readable way to analyze what is going on. And the subject in this case is a slice. A slices is not other than array or one or more slices. We are going to see what I mean. This is the structure of anslice. A slice is an array. It's a pointer to an array in memory, just a chunk of memory in the heap. In this case, it's a chunk of memory that stores integers. Then I have the length and the capacity of the slice. What happened when I create a new slice of integers? In this case, I'm creating empty slice of integers. The memory address is not especially important here because we are not storing anything yet. The slice length and the slice capacity is zero and the stored data is just an empty, empty. Okay, what happened if I open a new integral? The slice code is going to reserve a chunk of memory to store that integral and store there the integral value. In this case one. The slice length is one and the slice capacity is one. Let's see what happened when I add more data there. If I add four more elements, it's going to change the memory address. Why? It's changing the memory address because I don't have enough space in the original reserved memory. If I go here and see the slice capacity. When I added one element, the slice capacity is one. So I'm not able to store more data in that array because the slice capacity is saying the amount of data that is able to store that array in memory. This case is one. So every time that we reach the capacity limit of the slice is going to happen at resize. A resize is just reserving another, a bigger chunk of memory in the heap and migrating all the data to this new chunk of memory. In this case we added five elements and in that process the slice has been growing. The slices is not going to grow one element at a time. The slice capacity is going to grow more than one element normally because you don't want to resize the slice with every single insertion. So in this case, at some dont, I added enough data to grow the slice and the slice growth to the capacity of eight. That means that I reserve a memory for eight elements, but in this case I'm only using five of them. So the store data in memory is 123-4500 because I have eight positions in memory. Eight integrate space for eight integrates in memory. Okay, what happened if I create a super slice? A super slice is just a slices of a slice. In this case, I'm creating a super slice from the position one to the position four. What is interesting here is the memory address. You can see that the memory address here is c one, a five 40, and here the memory address is c five four eight. That eight is because we are using integral. And in my architecture that means integral 64, what is eight bytes? Because this is the same, actually it's the same chunk of memory, but one byte after, well, sorry, eight bytes after, eight bytes after because I'm starting in the position one. So I'm skipping the position zero of the original array or the original slice. And also it's interesting, the slice capacity, because it's exactly the same chunk of memory. The slice capacity is seven because before was eight. But I'm not able to use the whole array. I'm just able to use the rest of the array, the memory that is reserved from the address that I'm using. Okay, let's see what happened if I modify a super slices. If I modify, for example, the position zero of the super slice, that is the position one of the original slice, what is going to happen? Is the original slice also get modified? It gets modified in the position one, and the super slice is modified in the position zero as I did in the code. Okay, what happened if I append something that is even more interesting? I'm appending six, the value six, to the super slice. But the super slices was from position one to position four. It is just three elements and is adding a fourth element at the end. But the end of the super slice is not the end of the original slice. So we are overriding the position, the fifth position in the original array, and we are appending in the slice. So the behavior from the perspective of the original array is different, and the behavior from the super slice, as you can see there, the value six is affecting both the super slice and the original slices. Even more interesting is if I happen more data and then modify something in the super slice, if I happen more data to the original slice and modify something in the super slice, what I get is because I added enough data to trigger a resize in the original slice, we are no longer sharing the array memory address. We are pointing to a different chunk of in memory. So that generates a disconnection between both slices. So a super slice can be considered a window to another slices if that slice hasn't been resized or the super slices hasn't been resized. That is probably a very unexpected behavior. So I wouldn't try to relay in this behavior for anything. Okay, I have the code that I used to run this experiment here. I have a link at the end of the talk with all the code. So if you want to reproduce this experiment, you can. So just check out the link at the end of the talk. Okay, let's talk about maps. The scalpel for maps is going to be pretty much the same. I'm just using unsafe to access the memory and store that memory in a structure. In this case, as an example, I'm using a maps of integral values and integral keys because it's simpler. The microscope is more complex because the data structure in maps is way more complex than the data structure in slices. And the subject is the map. A map is just a set of map metadata and a set of buckets. That is where the real data, the data that just stored in the map, is stored. Okay, what is the structure that store a map? Well, we have the count, that is the number of elements in the map. We have the flags. That is not important for this talk. We have the b value that represents the number of resizes of the map. Well, it's related to the number of buckets. Not super important, but yeah, it is related to the number of buckets. The number of overflows give you an approximation of the number of overflow buckets. The hashiro is a completely random number that is generated whenever you create a new map. And this random number is used to generate the hash of the keys to decide where the keys are stored in the maps. This is interesting because this gives the certain feature to the maps that it is not predictable in which bucket the keys are going to fall. It's not predictable before you create this random number, before you instantiate the map. So you can't relay in the order of a map or things like that, or in what the order of a map is not relatable in go. One of the reasons is because the hash zero, the buckets is just a pointer to a chunk of memory that store a set of buckets. We are going to see what a bucket is right away. Then we have the all buckets. That is another pointer to another chunk of memory. Storing buckets again. And evacuated is the number of evacuated buckets. We are going to talk more about that later and some extra metadata that is not super important for this talk. We have the bucket extract also. That is the structure that is stored in the heap, storing the real data. Each bucket is going to contain eight elements and is going to have a top hash that is just a set of chunks of the hashes of the keys stored in the bucket. Well, not especially important for this talk either. And then we have the keys and the values. The keys and the values are two arrays of eight elements storing each key and each value. Each position in the keys correspond to the same position in the values. And then we have the overflow pointer. The overflow pointer is another pointer that points to another memory address that contains a bucket, an overflow bucket. We are going to see what an overflow bucket is later. Well, let's start creating a maps. An empty map. In this case, I'm creating an empty map with integral values and integrate keys. The map size, because the maps is empty, is going to be zero, the flux zero, the b zero. What is for zero size? Well, the b zero means that there's only one bucket, the hashid that is a random number. The number of overflow buckets is zero. Well, the buckets is pointing to a chunk in memory that is storing an empty bucket. An empty bucket means basically top hash zero. All the keys are zero or the values are zero, and the overflow pointer is null or zero. In this case, the old buckets is not pointing anywhere and the number of evacuated buckets is zero for now. Okay, what happened if I add one element to this map? If I, for example, store in the key one the value ten? Now the map size is one. And everything else keeps the same except for the bucket. If I go to the bucket, I see that I store one key, that is the key one and one value, that is the value ten, both in the same position in the keys and the value of those arrays. The top hash is just a chunk of the hash of the keys, but it's not super important. Okay, let's keep going. If I add more data, in this case, I'm adding eight more elements. Eventually this is going to trigger a resize. In this case, now the map size is nine, the b is one. Because this number of elements has triggered a resize. When a resize happens, it's going to reserve another chunk of memory and it's going to migrate all the data. It's going to migrate the data from the original bucket to the new set of buckets in memory. Normally it duplicates the number of buckets each time. And in this process of migration, it's going to evacuate the old bucket. That means pick all the keys and the values and store in the corresponding bucket in the new set of packets. That implies that the keys have to be. It needs to recalculate the hash to decide where it's going to fall in the new buckets. Okay. And the number of evacuated buckets is going to be one in this case because the old bucket was evacuated and moved to the new buckets, as you can see here. Well the elements doesn't need to be the same of the old set of buckets. What an overflow bucket is, the match gets resized when it reach certain threshold, but cant happen that you have a maps with some data that is not enough to justify a resize, but one of the buckets is completely full and then suddenly you try to add something and it's going to fall in that bucket. And instead of triggering a resize, simplest approach and better approach to optimize the usage of the map is just create an overflow bucket that is going to reserve a chunk of memory to store a new bucket and it's going to store in that bucket a new key and link that to the existing bucket. In this case, as you say, the bucket one have an overflow pointer that points to another bucket in another position in memory that have just more information related to the bucket one. And also you cant see that the number of overflow buckets is now one. What happened when you have big resizes, when you have big resizes in go, what it does is using these buckets and old buckets, variables that are pointing to memory addresses. So what it does is reserve a new set of buckets, doubling the size of the number of buckets, and are migrating the data to that buckets. That new buckets is going to be buckets and the previous existing buckets is going to become all buckets. And it's not going to migrate all the data in one time because that will take extra time that can have some performance impact. So what it does is just start migrating the data gradually with each operation. This way you don't have a big blocked. When you need to resize a map, you have just resized the map and it's not blocking that much and it's just getting some performance degradation during certain time. And whenever everything is migrated, the map backs to normal. All buckets and number of evacuated buckets is used for this process of migration. Once all the buckets in the old buckets are evacuated, the system just released that memory. Okay, here is the code. If you want to reproduce that, you are going to find them at the end of the talk channels. For channels, I'm going to use a channel of infirty two. Again just using unsafe to access the memory. Storing that in a structure, the microscope more or less the same. Getting that structure, printing that in a readable way. The subject in this case is a channel. More specifically, I'm going to use the buffered channels because it's more interesting and you can infer how a non buffer channel works. Basically not assigning a buffer of zero to this. And you are going to have exactly the same behavior. You can infer the behavior based on that. A channel is going to have a queue count that is the number of elements stored in the buffer. A data queue size, that is the number of elements that the buffer cant store. The buffer itself, that is just a chunk of memory. In this case, it's a chunk of four positions of in 32 because it's a buffer. Channels with size four and the channel type is in 32. The element size is the size of each element in the buffer. The closed defines if the channel is open or closed. The element type is appointed to, in this case in 32 representation. So it's a pointer to the type that is stored in the channel buffer. The send x and receive x are two indexes that points to the position in the buffer that is going to be the next place where you send something or where you receive something. The receive queue and the send queue is a two wait queue list. A wait queue is basically a place where you can store go routines that are waiting for something. So in this case, the recipe and the send q are where the channels, sorry, where the goroutines wait for the channel to have data or to have a space for data. Okay, let's see an example. If I create a channel of infertility, two with four positions, with a buffer of four, what I'm going to have is a Q count of zero data queue size of four, because I have space for four elements. And the element size is going to be four because it's an in 32, that means four bytes. The buffer is going to contain for now, because it's just a four positions buffer of four integrals. That is no elements yet. Okay, what happened if I add one element to the channel? If I add one element to the channel, the Q count is going to be increased to one because I'm adding a five. That five is going to be a store in the buffer. Where the send X index was pointing to, was pointing to the first position in the buffer. So I store five in the first position in the buffer and increment the senex to the next position in the buffer. What happened if I add more data than the buffer can store? In this case, I'm adding 4321. If I add 4321, I'm going to add four in the senex position. That was the second position. The senex is going to be increased and I add three, and then I add two and so on. But whenever I try to add the one, the Q count is already four, so there's no enough space for adding a new element. So what the goroutines is going to do is going to store himself in the send queue and it's going to park himself to wait until, to wait until there is some space in the buffer. What happened if somebody reads something? If somebody reads something, it's going to receive that five because it's going to read the data in the receive X position. That was the first element in the buffer. So it's going to read that data and return that data to the requester and it's going to increase the receive x. The data queue size hasn't changed, but hasn't changed because once we read that data, we see that the San queue have somebody waiting there. So we wake up that go routine and that goroutines automatically inserts the one value. The value that was waiting before to try to insert is going to get inserted in the position where the ten x was. And that's the reason why we have still a four in the q count and the first element in the buffer now is one. And the receive index and the send index have moved one position. Okay, if I read more data from the channel. What is going to happen is now it's going to decrease the amount of data. The Q count to three. Because there's nothing. Adding more data and there's no more data. Now it's going to read where the recifex was. That is in the second position. So it's going to return a four. That is what was in the second position before. And it's going to increase the recifex. And that's it. What happened if I read more data than I have in the buffer? Well, it's going to start reading data. It's going to read the three, the two, the one. And when it reached the point where there's no more data. The goroutines that is trying to get data from the channel that is empty. Is going to add himself to the receive queue. And park himself waiting for new data to come in in the channel. If somebody sends something to the channel, it's going to wake up that goroutines. But what happened if I close the channel? If I close the channel, everybody that is connected to receive data from the channel. Is going to receive the zero. Well, the closed message from the channel in this case is going to go to the receive queue. Send that message to any receiver that is in the receive queue. And it's going to change the closed value of the channel to one. And that's it. Here is the code if you want to try it. It's interesting. Okay, some references. If you want to know more about slices, maps and channels. Probably reading the go runtime code is not complicated. It's not super straightforward, but it's not especially complicated. You can get more or less what is going on. Actually, I wrote this talk just reading that information. If you want to reproduce that experiment, you can go to my GitHub. Repo. Dissecting go and just reproduce these experiments is just a lot of fun and some conclusions. Well, understanding the building blocks of the language is going to help you to understand the implications that that building blocks comes with. Because sometimes these building blocks have some trade offs in it. Have some decisions that were made by the. By the Go team. And if you understand why that decisions are there and how they really behave, you can use it better. Well, also take into consideration that there are some unexpected behaviors that can be surprising, especially the slices. One is really interesting. And probably this knowledge is not going to be needed for anything. That you are going to do in your work, but it can help in very specific situations. So that's it. Thank you,
...

Jesus Espino

Staff Engineer @ Mattermost

Jesus Espino's LinkedIn account Jesus Espino's twitter account



Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways