Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Concurrency primitives of Golang, when to and when not
            
            
            
              to use some of them. Over the years I've seen a lot of code being
            
            
            
              refactored every time there's a marginal change in
            
            
            
              complexity because of some concurrency primitive, and I really wish
            
            
            
              I knew some of these things which I probably will talk about.
            
            
            
              And that could actually save a lot of channels and
            
            
            
              a lot of toil just having to go about refactoring this
            
            
            
              while speaking with concurrency. One of the important things to discuss
            
            
            
              is what exactly is a race? A data
            
            
            
              race? To be fairly honest, the most common definition
            
            
            
              that comes to the mind is reading and writing to a same memory
            
            
            
              location. But before we get to that, there's an important concept
            
            
            
              that needs to be covered, which is memory ordering.
            
            
            
              I'll probably show a small snippet of a code and
            
            
            
              I want you to guess what the output would be.
            
            
            
              Here's a simple piece of code. There's a declaration of VAR a
            
            
            
              and b which is type int. There's a function f assigns
            
            
            
              a equal to one, b equal to two. There's a function g which does
            
            
            
              print b and print a. You can
            
            
            
              ignore the fact that there will be a data race in here,
            
            
            
              because the interesting bit is it is possible
            
            
            
              that g prints two first and then zero.
            
            
            
              Pay slight attention to this. What is the possibility
            
            
            
              when it prints two and zero when print b observed
            
            
            
              the value of b as two, whereas the print
            
            
            
              a which happens after print b did not yet
            
            
            
              observe the value of a equal to one which was
            
            
            
              executed before b equal to two.
            
            
            
              Why does this happen? It is because compilers at
            
            
            
              the time of compilation or at runtime may
            
            
            
              do perform instruction ordering change.
            
            
            
              So it's not guaranteed that what you see as a equal to one
            
            
            
              and b equal to two outside of the block outside of
            
            
            
              these scope of that function would be observed in the same manner
            
            
            
              and things is a memory ordering problem. And this results
            
            
            
              int everything else where we need to now start serializing,
            
            
            
              because what we're trying to do is we are solving this unpredictability.
            
            
            
              Here's a sample piece of code. It's a pseudocode
            
            
            
              to comes it may look like Python, to others it may look like go.
            
            
            
              We have a bunch of URLs. There is an
            
            
            
              array of responses where we will gather all the responses that
            
            
            
              we get by fetching the URLs. There's a small counter of total
            
            
            
              bytes received. Simply put, for I in URLs,
            
            
            
              fetch the URL using a pool. So we're trying to restrict
            
            
            
              the number of parallel connections that can be made outbound.
            
            
            
              Once we receive the data. We want to write it to a response array
            
            
            
              and also update total bytes received. The control
            
            
            
              must block until all the URLs are fetched
            
            
            
              and return the response altogether. The total
            
            
            
              bytes received are also being sent by telemetry, but that is not
            
            
            
              relevant. There are four aspects of
            
            
            
              concurrency control which are happening here.
            
            
            
              First is the handoff. The handoff is once
            
            
            
              these parallelism is done, or I'm liberally using
            
            
            
              the word parallel here because that's confusingly used, parallel and
            
            
            
              concurrent. Once that is done, the control must pass
            
            
            
              on to the main routine, and that's a handoff. So somebody
            
            
            
              has to wait to see if all the workers
            
            
            
              are done working on what they have done, and only then it must
            
            
            
              proceed forward. There's a concept of data sharing here.
            
            
            
              Each of the fetch URLs we want to do in a
            
            
            
              non blocking manner, which should be able to send
            
            
            
              data to be returned to a response array.
            
            
            
              There's a serialization happening as well because we do
            
            
            
              not want all the URLs to be executed together.
            
            
            
              There is a control mechanism which will allow only
            
            
            
              certain number of URLs to be hit, and that's these property of
            
            
            
              a pool. Then there's also a data consistency problem here because
            
            
            
              we want to channel the total bytes received, but because
            
            
            
              there are multiple threads that may be performing this, multiple go routines
            
            
            
              that may be performing this, multiple processes that may be performing this. Based on the
            
            
            
              language implementation, the value of total bytes received
            
            
            
              must increase collectively. That's a data concurrency
            
            
            
              part as well. So to highlight these again, what are the four challenges?
            
            
            
              A control handoff which is passing on
            
            
            
              the work, the control from a set of workers
            
            
            
              routines to the main serialization. We want
            
            
            
              the ability to control what gets executed
            
            
            
              at what point of time and what gets to wait, and a control
            
            
            
              to block unblock data sharing work
            
            
            
              while may be happening in parallel or in non
            
            
            
              blocking asynchronous manner. The data must be receivable
            
            
            
              by some other process as well, and data consistency at
            
            
            
              all points of time. If we have agreed to increment the
            
            
            
              integer, then everybody should see a fresh value of an integer
            
            
            
              which is incremented. This is also
            
            
            
              a derivative of saying that while multiple processes are coordinating with
            
            
            
              each other, they will usually have a consensus problem. To solve
            
            
            
              a consensus problem, the control needs
            
            
            
              to be offloaded to somebody outside the system.
            
            
            
              And this is exactly why we need synchronization primitives.
            
            
            
              I'm going to call it a marshall. Not to be confused with Marshall Json.
            
            
            
              Like a game marshall. What kind of
            
            
            
              marshals exist to achieve this synchronization?
            
            
            
              There is a sync weight group. There's a sync mutex,
            
            
            
              anatomic channels and lockless data structures.
            
            
            
              I'll probably not go into the depths of the definition of
            
            
            
              defining what these are, because that is almost pretty much available.
            
            
            
              I want to narrow this down to what semantics can be used
            
            
            
              for what aspect of the problem statement which we discussed
            
            
            
              just a few slides back over here. Control handoff,
            
            
            
              serialization, data sharing, and data consistency. The first
            
            
            
              one is a sync weight group. Fairly straightforward. You must have seen
            
            
            
              this all over Golang, blogs, et cetera. Pretty straightforward to use,
            
            
            
              but what is it used for? It's precisely only and only used
            
            
            
              for control handoff. What is it not used for is it's
            
            
            
              not used for serialization. What I mean by that is that using weight
            
            
            
              groups you are more often than not only and only
            
            
            
              just waiting for control. As the name suggests, it's a wait group
            
            
            
              to reach the main routine while some other parallel
            
            
            
              routines are doing their job. It's not used for data sharing.
            
            
            
              There is no way you can share data, and it's not used for data consistency
            
            
            
              either. Using things definition I'm going to through
            
            
            
              the slides, there'll always be a fun section where we talk about some
            
            
            
              fun mistakes. These fun mistakes are usually
            
            
            
              things that we would miss out on, and it
            
            
            
              gives us an insight on how these things work at the depth.
            
            
            
              Now here's a simple program. What is wrong here? To walk
            
            
            
              you through the code, there's a funk main which declares a weight
            
            
            
              group. There is a for loop zero to 10 weight group add
            
            
            
              simple construct. There's a go routine being called where we call hello
            
            
            
              inside hello, we do a print and then we just wait.
            
            
            
              Classic example of control handoff where we are waiting.
            
            
            
              What could possibly go wrong here? What would go
            
            
            
              wrong here is that when you run this code,
            
            
            
              it runs int this problem of all goroutine are asleep.
            
            
            
              It runs into a deadlock problem. But weight group
            
            
            
              is supposed to solve a concurrency problem, not create another problem.
            
            
            
              And the reason for that is that even concurrency semantics need to
            
            
            
              respect the principles of the language. Now what is the principle
            
            
            
              of the language? The principle of the language is I'm
            
            
            
              passing weight group by value and not as
            
            
            
              a reference. So the goroutine see a copy of
            
            
            
              that synchronization primitive and they call a
            
            
            
              done on that and hence it runs into a deadlocks.
            
            
            
              So even the synchronization primitives need
            
            
            
              to cater to the principle of the language.
            
            
            
              The next one is sync mutex. What is the sync
            
            
            
              mutex used for? It's used for serialization
            
            
            
              by using mutexes also locks,
            
            
            
              we have the ability of controlling which routine should work, which routine
            
            
            
              should not work. When I say which, you're not identifying
            
            
            
              a routine in particular, but you have the ability to rewrites code.
            
            
            
              These only one can get executed while the other is waiting
            
            
            
              for a read or a write operation to happen. So things ability to serialize
            
            
            
              operations is what you get because that's what locks do effectively.
            
            
            
              And it also allows data consistency,
            
            
            
              because while inside a lock you can increment a
            
            
            
              value, so the other reader cannot read the value. But what
            
            
            
              it doesn't do, it doesn't do a control handoff because
            
            
            
              it's used as a communication exchange control
            
            
            
              mechanism between two processes, and usually not used for a handoff, which is where
            
            
            
              a weight group comes into the picture. And it's also not used for data sharing
            
            
            
              because there is no construct of it carrying its own data.
            
            
            
              What are the fun mistakes around mutexes int?
            
            
            
              A simple counter which has an embedded mutex inside it,
            
            
            
              a pretty well known construct in Golang int also has an
            
            
            
              integer n. There are two methods. There's an add method which
            
            
            
              acquires a lock and increments n by the integer pass. There's another
            
            
            
              value method which also rewrites a lock and
            
            
            
              returns the c n. So effectively there should
            
            
            
              not be any race condition here, because we are judiciously using unlocks.
            
            
            
              Now what can go wrong here? If I run this code, when I
            
            
            
              run this code, I surprisingly actually run into a race condition.
            
            
            
              Now why is that? Because we just discussed that even synchronization
            
            
            
              primitives need to respect the principles of the language.
            
            
            
              So what's happening here? What's happening here is
            
            
            
              counter is not a method which has been
            
            
            
              bound. Value is not a method which has been bound. Cto the pointer reference.
            
            
            
              What that means is that when value is invoked,
            
            
            
              the state of the counter is copied and so is the
            
            
            
              sync mutex. And while it is getting copied, it tries
            
            
            
              to read n because somebody else had still acquired a lock
            
            
            
              on that n. Effectively, it doesn't even matter
            
            
            
              what you have inside the body of the code just by accessing value,
            
            
            
              because the process of calling value invokes
            
            
            
              a copy to happen, and by the virtue of that a
            
            
            
              read to happen on n, and hence it results in a race condition.
            
            
            
              So even if I was to replace that entire body of the
            
            
            
              code with a return one, this would be the exact same
            
            
            
              race condition. How do we solve this? We just
            
            
            
              make sure that we actually use a pass by reference here.
            
            
            
              So the counter should be bound, the pointer should be binding,
            
            
            
              the value method, not the other way around. It's a fair conclusion
            
            
            
              to say that while resolving a consensus problem,
            
            
            
              the marshals themselves are not immune to the consensus
            
            
            
              problem. They themselves have to adhere. CTo all of these principles,
            
            
            
              underneath all these mutexes weight groups is the
            
            
            
              third Marshall, which is the bedrock of all synchronization, which is these
            
            
            
              sync atomic. Now surprisingly, atomic has very
            
            
            
              limited applicability as such directly.
            
            
            
              And most of the times we end up using the constructs
            
            
            
              on top of this. But it is the underlying construct which
            
            
            
              is used behind most of these other primitives,
            
            
            
              which are higher order. Now, what is an atomic use for
            
            
            
              data consistency? All int, all through, nothing else.
            
            
            
              There is a value. I want to atomically change that value,
            
            
            
              update an integer and do nothing else.
            
            
            
              It is not uses for control handoff. It cannot be used for serialization.
            
            
            
              I mean there can be derivatives of it, but by direct virtue of its own
            
            
            
              properties, it doesn't do this by primitives.
            
            
            
              Weight group is a derivative of this. So those allow because using atomics you
            
            
            
              can create such higher other primitives as well.
            
            
            
              Like we'll discuss later, lock free data structure. And it doesn't
            
            
            
              do data sharing. It's not used for data sharing because it doesn't carry
            
            
            
              it from data like channels do.
            
            
            
              Now here's an applicability of atomically performing
            
            
            
              a load and store. There's a simple map type map
            
            
            
              string string. There's an atomic value inside m store.
            
            
            
              We basically save a map. Then there's a read function and there's an
            
            
            
              insert function. Read function just atomically does
            
            
            
              a m load and converts the value of type
            
            
            
              to map. Insert, however, tries to update
            
            
            
              this value so it performs can m dot store.
            
            
            
              Now just int this simple piece of code while,
            
            
            
              and this is a piece of code which has been copied from the Golang documentation.
            
            
            
              So what can possibly go wrong here? What can possibly go wrong
            
            
            
              these is that if multiple writers
            
            
            
              try to invoke the insert method, the construct is not immune
            
            
            
              to it. Now what we do is we are doing a
            
            
            
              mix and match. So while we are discussing atomics, but we have started using
            
            
            
              other constructs to solve our problems. So we introduced
            
            
            
              a mutex here. And on the insert we just
            
            
            
              guarantee serialization by performing a mutex
            
            
            
              lock and then we just defer it. What this does is that if multiple
            
            
            
              routines are trying cto invoke the insert method only
            
            
            
              prone of them runs at a time because mutexes are great at serialization.
            
            
            
              We just agreed that and now we will
            
            
            
              just perform this mutex block. Next up,
            
            
            
              what would you expect the output to be here there's
            
            
            
              a funk a. There's a main method. Inside the main method
            
            
            
              there is xy two integers. There is
            
            
            
              a print ln, which is basically calling the a
            
            
            
              function. Inside the function we pass the pointer of
            
            
            
              the integer inside the body of an a. It accepts
            
            
            
              an integer and it returns a channel. The channel,
            
            
            
              just before it returns, it spawns a go routine and it
            
            
            
              does can atomic increment of an integer value.
            
            
            
              There are two operations of this. In one variation we
            
            
            
              simply call log print ln and we pass the
            
            
            
              two fun we call the two functions and we read the channel value from
            
            
            
              it. In another variation we store it into two
            
            
            
              variables, ab and bv. There are two functions, a and
            
            
            
              a called again, and the second line of it will
            
            
            
              read from that. They don't look entirely different if
            
            
            
              you think about it, because what we have just done is that in the second
            
            
            
              variation we have taken the initialization one step ahead.
            
            
            
              So it's two stage. Now it's not just one stage. What would you expect
            
            
            
              the output to be? It's interesting that when I
            
            
            
              run this code the first execution
            
            
            
              gives me a two and a four, whereas the second
            
            
            
              execution gives me a four and a four. Now which is understandable,
            
            
            
              the four and four part. Why? Because it's the same atomic
            
            
            
              operation. And remember, atomic operations were about data
            
            
            
              consistency. So if it's the same integer
            
            
            
              which has been incremented twice and read on the channel it will
            
            
            
              obviously be four. But why did the two happen? To answer
            
            
            
              this, let's add some log statement. Now what we do
            
            
            
              is just after the atomic ad I've added a log print ln
            
            
            
              and there's a time sleep of 2 seconds. So what that means is
            
            
            
              that before the value is returned it's actually going to sleep for 2 seconds.
            
            
            
              Now when we run this, the first execution
            
            
            
              prints one prone and two four. The second also prints one, one and
            
            
            
              four four. But interestingly, if you observe the timestamp,
            
            
            
              the first int happens at the 16th second and then
            
            
            
              it waits for 2 seconds and then int prints one and waits another
            
            
            
              2 seconds and returns two and four. Whereas the second implementation
            
            
            
              on the 20th 2nd, both of them print one and then after
            
            
            
              2 seconds because both waited for 2 seconds return a four and four.
            
            
            
              So effectively what's happening is the arguments inside
            
            
            
              the function are executed serially. There's a classic
            
            
            
              mistake that we do when we're dealing with this. A simple way to solve
            
            
            
              is use the lower construct, which is actually concurrent
            
            
            
              and not the first one because that would happen one after the other. Now this
            
            
            
              can result in a situation because it is possible that you
            
            
            
              may get blocked on the other value because the one hasn't worked.
            
            
            
              Clearly evident in this code here because it happened as
            
            
            
              two and these, the other value came as four. We also discussed that
            
            
            
              atomics and mutexes can both actually be uses for data
            
            
            
              consistency. So are they interchangeable? Is a common question.
            
            
            
              Atomic, as we said, are the bedrock of int
            
            
            
              using that others are done now? Yes and no,
            
            
            
              because there's only one case where you can actually interchangeably call
            
            
            
              them as performing the same role, which is assume there was
            
            
            
              an integer operation where we had to increment something. Now I may use atomic
            
            
            
              add int, or I may actually take a lock and do an unlock and apply
            
            
            
              it. Int will give me the same behavior at the end of the day.
            
            
            
              And this is the only time where we say that oh yeah, there may be
            
            
            
              overlapping concerns, but interestingly atomics are actually way faster,
            
            
            
              so you may be able to do more number of atomic operations compared with
            
            
            
              the locks because atomics are sent directly to the CPU and
            
            
            
              these are no language. VM ordered locks which are
            
            
            
              being performed. So you would actually see a drastic performance gain. But all
            
            
            
              over Golang we've always read and seen please do not communicate
            
            
            
              by sharing memory. Instead share memory by communicating, which takes
            
            
            
              us to the channels. Now what are channels used for like
            
            
            
              atomics are only and only used for primarily data consistency.
            
            
            
              Channels are used for everything, which is other than that, which is control,
            
            
            
              handoff, serialization, data sharing. It's surprising how much you can actually
            
            
            
              start doing with channels. I mean it's only not used for data consistency
            
            
            
              now because channels are so widely used.
            
            
            
              I'm going to cover a lot of these fun mistakes.
            
            
            
              Fun for me, maybe not fun for your others. Look at
            
            
            
              things. Piece of code. What could be going wrong here?
            
            
            
              There's a funk request which returns an InT. There's a channel,
            
            
            
              there is an I, it loops over zero to
            
            
            
              five increments at plus plus, and it
            
            
            
              writes the value to a channel. What could go prone?
            
            
            
              These. What can go prone here is that
            
            
            
              those goroutine are waiting to write on
            
            
            
              a channel. Now if the receiver or the reader of
            
            
            
              the request only read it once, there are multiple
            
            
            
              invocations of those functions which are just blocked forever
            
            
            
              because they are just waiting for the channel to clear up to rewrites.
            
            
            
              To solve for this, we have constructs. What we do is we
            
            
            
              wrap it inside a select clause which tries to write to
            
            
            
              a channel, and if it doesn't work it falls to the default,
            
            
            
              which is an empty. Surprisingly, this is also not the correct
            
            
            
              way, because this can also run into a unique condition, which is the
            
            
            
              last line can block forever now, because it's possible that
            
            
            
              even before the channel is read from
            
            
            
              outside of the request, the other goroutine tried
            
            
            
              writing, found nothing, and fell through
            
            
            
              the default section. So now the return channel
            
            
            
              is waiting forever. One of the most interesting questions
            
            
            
              that I always face when
            
            
            
              I'm writing code as well, is when and who should close an
            
            
            
              open channel. There's one of the things that I never thought, and it always takes
            
            
            
              me the hard way, a simple piece of code. Again, it's a for loop.
            
            
            
              Inside the for loop there's a value which is returned to a channel which has
            
            
            
              been produced. So you can think of it like a producer. And underneath there
            
            
            
              is a receiver which basically ends the value, appends it to an array.
            
            
            
              Once we are done, we print. When I print, I find that
            
            
            
              one is missing. Now, this would happen if
            
            
            
              my program exited, obviously this loop. After code C, my programs
            
            
            
              exited. Now, because WG weight was a weight
            
            
            
              group on these producers. Once producers were
            
            
            
              done producing and the channel was closed immediately,
            
            
            
              it's possible that the receiver has not yet
            
            
            
              performed the operation on receive and the function got exited and
            
            
            
              the program got exited. And hence that's the reason why I did not see
            
            
            
              a one. So how do I guarantee this, that if ten were
            
            
            
              sent, ten were received in this particular things? First of
            
            
            
              the implementation, the very first implementation, what we do is we
            
            
            
              detach the close responsibility from the sender.
            
            
            
              What we do is we basically make the WG wait
            
            
            
              in another goroutine, and that is the one which closes.
            
            
            
              And the execution of reading through the channel now
            
            
            
              is passed through the main routine in
            
            
            
              which I range over the channel. And even if it is closed
            
            
            
              by the goroutine, there is still an iteration that
            
            
            
              will happen which will still be able to read from the channel. This is one
            
            
            
              way of implementing it. Another way of implementing it, which is actually
            
            
            
              far more prevalent, is when both producers and senders
            
            
            
              and receivers are actually two different go routines which need
            
            
            
              to communicate. And this is where we introduce the dungeon. So you're
            
            
            
              producing and the consumer is receiving. But at
            
            
            
              comes point of time, the control flow must not leave
            
            
            
              these parent function. And hence we have two
            
            
            
              weight constructs. Now we are actually waiting via the wait
            
            
            
              group. We close the channel and then we wait on
            
            
            
              another channel, which is triggered after
            
            
            
              the receivers are done receiving. So these are two
            
            
            
              constructs of it. What else can go wrong with the channel?
            
            
            
              Let's look at things long running piece of code. It accepts
            
            
            
              a messages channel. What we're trying to achieve here is read from
            
            
            
              messages, and if any of those reads take more than
            
            
            
              a minute, then just return. And a bot, it's a
            
            
            
              fairly straightforward code. It compiles. But what's going wrong here,
            
            
            
              what's going wrong here is that channel can still decommemary, because inside each
            
            
            
              code operations we are creating a new time after.
            
            
            
              And this keeps on getting created, created, created.
            
            
            
              All we wanted to do was we just wanted to wait for
            
            
            
              a minute. A better way to deal with this is to
            
            
            
              initialize a time dot after, save an allocation, keep reusing it,
            
            
            
              keep looking for t channels output. If int happens after a minute,
            
            
            
              it will return otherwise formed. Print ln. Once it's
            
            
            
              done with the select, just do a reset. What this does is it doesn't
            
            
            
              keep creating those channels and the time dot after in
            
            
            
              every single iteration. Now all of these are about when two people
            
            
            
              are trying to compete for a resource here,
            
            
            
              everything, every such construct was about that. One of the best ways to
            
            
            
              resolve a conflict is to avoid int from happening. Now this is where
            
            
            
              we introduce the notion of lockless data structures. It's a study
            
            
            
              in itself on how lockless data structures are implemented.
            
            
            
              Why I stress on lockless data structures is because generalized
            
            
            
              lock free algorithms are very hard to implement. Designing lock free
            
            
            
              data structures is relatively easier. My slides have a bunch of
            
            
            
              these links which are from Microsoft research and other
            
            
            
              research papers which talk about building these common data structures which
            
            
            
              are actually lockless. And most of them use atomic underneath. Because as
            
            
            
              we discussed earlier, atomic is almost the bedrock of all concurrency control.
            
            
            
              Many a times. What also happens is that a simple
            
            
            
              solution is actually way better than a smart solution. If you look
            
            
            
              at this piece of code, there's an input array of integers,
            
            
            
              and there's an output array of integers. We need to run over the input array,
            
            
            
              perform a go routine, which is something,
            
            
            
              it does something, get the output of it and append it to the
            
            
            
              output array. The simplest way to do this would be using a lock,
            
            
            
              because now we're trying to serialize the access to the result output
            
            
            
              array. Or do we really need a lock here?
            
            
            
              A simpler way would just be to use an array and
            
            
            
              use positional index. Now arrays, as long as they
            
            
            
              are positionally accessed, can actually leave and do
            
            
            
              not require a need of a lock. We can just simply do run
            
            
            
              with an index. Once we get the output, we access the
            
            
            
              array via the positional index, and it does
            
            
            
              the job so many times we actually may not even need a
            
            
            
              very sophisticated concurrency control mechanism because this itself will do our
            
            
            
              job. Just to recap, what is the
            
            
            
              need for concurrency constructs? Because we want to solve
            
            
            
              the unpredictability problem, we want control.
            
            
            
              There are two kinds of that. There are control constructs
            
            
            
              which are used for control handoff. There are data constructs which are
            
            
            
              used for data sharing, and data consistency. Constructs also
            
            
            
              need to adhere to language principles. We saw how weight groups,
            
            
            
              how counters, or how channels as well can actually misbehave
            
            
            
              if you're not careful about how we are applying them. And now
            
            
            
              we also probably understand which prone to use when to recap
            
            
            
              were to look at the same problem again. Now when
            
            
            
              we look at this, we probably are better equipped to handle this problem.
            
            
            
              Thank you. I'm CTO and founder at last
            
            
            
              nine. We are an observability company and I'm Pushova.