Conf42 Golang 2021 - Online

DevOps automation with Go


Abstract

In his talk, Oliver will show the many ways his team is using Go to automate every aspect of their development workflow, from race condition reporting, deploying releases, bridging Freshdesk and GitLab, and versioning internal libraries to many more.

Summary

  • Oliver is the lead developer at Restorepoint, a network automation device backup and restore solution. The company has 120,000 lines of Go, plus roughly 2.8 million from external libraries. We use GitLab for our whole development lifecycle. What does our DevOps look like?
  • Restorepoint is hiring software engineers, either remote in the UK or in the EU. We're looking for driven and analytical software engineers, ideally with Go experience. We can also consider you if you are really experienced in another language. Please come and talk to me or hit our careers page.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Oliver, welcome to my talk about DevOps automation with Go. I've been a software engineer for more than 20 years, I discovered Go back in 2017, and I immediately fell in love with it. It's a great language to write, but especially to read when you have a large code base to get into. I've been the lead developer at Restorepoint since 2019. Restorepoint is the name of our company, but also of our main product, which is a network automation device backup and restore solution. It's all written as a Go monolith, so we have a single binary which is highly concurrent. We have our own scheduler, HTTP server, FTP server, TFTP server, a Lua environment, et cetera. And all this runs inside a Linux environment which we tightly control. Most of our customers run it on premise or in their own cloud, and it's updated either manually or automatically by an update server. We currently have around 120,000 lines of Go, not counting comments, plus roughly 2.8 million from external libraries. And we use GitLab for our whole development lifecycle.

So what does our DevOps look like? We have three different release versions and two target environments. We do weekly production releases. We could actually release every day if we wanted to, but most of our customers prefer a weekly release, so we release in the middle of the week. Our development releases are internal, and they are released whenever there's a change, so that's continuous. And we have multiple internal tools that make our lives easier. As you can see down here in that image, that is how our pipeline looks at the moment.

One of our internal tools is the release API, which saves us from having to copy the build artifacts from our build server to the update server. It's a tightly controlled solution, it's used by several of our products, and it's a single binary service as well. And it has two sides.
So the build server sends a call, by a POST of course. It sends the final build artifacts as a TGZ, it MD5-sums the TGZ, and then it sends additional metadata. Down here I've copied the call that we actually send to our server. As you can see, there's a lot of additional metadata, which doesn't apply to all products, but to most of them. And then there's a shared secret between the build server and the release API, so that the release API will only react to calls that contain that shared secret.

And then on the receiving side, the release API receives that POST request that I mentioned, checks that all required metadata fields for a product have been passed, and checks the shared secret, of course. Then it writes the file that's been passed and calculates the MD5 sum at the same time, which is quite a nice trick you can do in Go by using a TeeReader. If the calculated MD5 sum is not the same as the one that has been sent in the request, then the release is also aborted. Once all the checks are done, the metadata is written to an env file alongside the TGZ, and then it's passed to an individual release script based on the product. This is a single binary service, as I mentioned, and it's maybe 100 lines of code. It's one of the powers of Go, in my opinion, that you can actually write a web server in very few lines.

Another tool that we have is the Freshdesk GitLab bridge. For our first line support we use Freshdesk. As developers we only deal with issues in GitLab, and our support engineers decide when to escalate issues to us as developers. We've written a Tampermonkey script around that which injects a button into the Freshdesk UI, so it's quite easy to trigger that escalation process. It will copy all comments and all attachments from Freshdesk into an issue in GitLab. And it avoids creating duplicates as well.
And it also makes sure that both sides have a link, so you know which ones have been escalated and which ones have not. I can show that real quick. So this is a video that I just took. You can see that button over here. This is the injected button, and it will ask you if you really want to do this. Then it will copy the files from Freshdesk and create a GitLab issue out of the Freshdesk issue. That's quite a neat way for us to deal with customer support without having to expose the whole team to all customer issues. Not all of them are related to development. And this is a single binary service as well.

And then we have another tool which we call the automatic version check. Because we have more than one production release (we have three, actually), it warns us if we are trying to merge mismatched versions. So, as you can see here in the screenshot, we have a 5.3.1 version and a 5.4 version. When trying to merge those, I get this warning as a comment. And the way it works with merge requests internally is that you cannot merge a merge request unless you have resolved all discussions on it. So this will keep the merge request from being accidentally merged. This works by a webhook, so this is also a service that's running on a server. GitLab basically signals all merge requests via webhook to this endpoint, and then we use the GitLab API to check the version of the source and target branch.

And then we have an additional thing for automating our development workflow. GitLab has these things called boards, and you can use different statuses, which are labels in GitLab. We use these labels for everything: for the area of the product an issue applies to, whether it's a UI or an API issue, whether it's a Freshdesk ticket, for example, but also for process.
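
The comparison at the heart of the automatic version check might look something like the sketch below. The talk does not say how versions are compared, so this assumes, purely for illustration, that two versions are mismatched when their major.minor release line differs. The names `releaseLine` and `mismatched` are hypothetical, and fetching the versions through the GitLab API and posting the warning comment are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// releaseLine reduces a version such as "5.3.1" to its "major.minor"
// release line ("5.3"). Treating major.minor as the release line is an
// assumption made for this sketch.
func releaseLine(version string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("cannot parse version %q", version)
	}
	return parts[0] + "." + parts[1], nil
}

// mismatched reports whether the source and target versions belong to
// different release lines.
func mismatched(source, target string) (bool, error) {
	s, err := releaseLine(source)
	if err != nil {
		return false, err
	}
	t, err := releaseLine(target)
	if err != nil {
		return false, err
	}
	return s != t, nil
}

func main() {
	// The two versions from the screenshot described in the talk.
	warn, err := mismatched("5.3.1", "5.4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if warn {
		fmt.Println("warning: source and target versions do not match")
	}
}
```

A webhook service built around this would post the warning as a discussion on the merge request, which blocks the merge until someone resolves it.
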
So our GitLab issues always go through these stages: from open to "to do", "in development", "in review", "test" and "testing", and then eventually they get closed. We just make sure that we automatically transition issues when a merge request is opened. The only thing a developer has to do is mention the number of the GitLab issue in their merge request, and then the ticket will automatically be set to "in review". When the merge request is merged, it's changed to "test". This really reduces the amount of manual updates that we have to do, because as developers we tend to always forget these things. But it's nice to have our issues in the right state, so it's clear where we are, what the progress is, et cetera.

And then another thing: because we have a highly concurrent piece of software with a lot of lines of code, from time to time we have data races. Go has this nice way of allowing you to detect race conditions, so it will see if a variable is read and written to at the same time. Therefore all of our internal development builds have race condition detection enabled, which has a performance impact. I can't remember by how much it increases CPU usage, but it definitely takes more CPU cycles, and especially memory. I think it doubles the memory usage. So we only do this for development builds internally. The reason why we have to do this is that most of our race conditions only happen when a certain code path is hit. We have of course fixed all the low hanging fruit, but there's always something left somewhere, and sometimes it's library code. We have discovered quite a lot of race conditions in external libraries and reported those as well. So we have a lot of internal boxes that replicate all the common usage scenarios that we have, and they run 24/7, and they write race condition error messages into their log files.
And then we run this race condition check tool once every day on these individual machines. If a race condition is found in the logs, it will automatically create a GitLab issue for each entry. And if an entry already exists, it will add a comment instead, to keep the issue fresh. I copied here an example of how that looks in a log. It starts with "WARNING: DATA RACE", that's the start marker. Then it usually goes like "Write at" blah blah memory address "by goroutine" number something, and then the function where this occurs, which is what we use as the title. Then everything below, so everything between the start and the end marker, we put into the issue, and this ends up looking like this. I had to blur the details, of course, for obvious reasons, but it basically shows where it occurred, where the write was, where the previous write was, where a read was, and it will automatically label it with the race conditions tag, which is important, so we can actually see that this was an actual race condition problem. Yeah, and that's a really nice solution for that.

And then we have another tool, which is for automatic library versioning. We have roughly 20 internal libraries that are being used by different products, and these are consumed via Go modules, of course, and Go-style semantic version tags. So you do a go get, and then you say the name or the URL of the library, then an @ and then the version tag. We built a tool around that, which is a job that's run on the individual library's CI/CD pipeline. It's a tag job, and whenever the master branch of the library is updated, it will tag the library automatically, using the last commit message as the description of the tag. It increases the patch level of the previous tag and thereby creates a new version, which can then be used in the product that is using the library.
It will either increment an existing tag or, if no tags exist, just create a new one.

Yeah, and this is it. So this is how we automate our own DevOps at Restorepoint. And I have to do a shameless plug at the end, of course. We are hiring, either remote in the UK or in the EU, and our pitch is of course: if you're tired of the same old "Go microservice on Kubernetes" pitch, then maybe have a chat with us. As I explained, we ship an on-premise Go monolith wrapped in a Linux box every week, and our customers love it. And yeah, we're looking for driven and analytical software engineers, ideally with Go experience. But we can also consider you if you are really experienced in another language and you want to cross-train, because Go is relatively easy to pick up. So please come and talk to me or hit our careers page. Thank you very much.
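
The patch-level bump performed by the automatic library versioning job can be illustrated with a small function. This is only a sketch: `nextTag` is a hypothetical name, the starting version `v0.0.1` for an untagged library is an assumption, and the surrounding git commands that read the latest tag and push the new one are not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nextTag returns the tag to create, given the latest existing tag.
// An empty string means the library has no tags yet.
func nextTag(latest string) (string, error) {
	if latest == "" {
		return "v0.0.1", nil // assumed first version for an untagged library
	}
	parts := strings.Split(strings.TrimPrefix(latest, "v"), ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("not a semantic version tag: %q", latest)
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil {
		return "", fmt.Errorf("bad patch level in %q: %w", latest, err)
	}
	// Go modules expect version tags to carry a leading "v".
	return fmt.Sprintf("v%s.%s.%d", parts[0], parts[1], patch+1), nil
}

func main() {
	for _, latest := range []string{"", "v1.4.9"} {
		tag, err := nextTag(latest)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("latest %q -> new tag %s\n", latest, tag)
	}
}
```

Parsing the patch level as an integer rather than manipulating the string matters once a library passes patch level 9, since "v1.4.9" has to become "v1.4.10".
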

Oliver Fuerst

Lead Developer @ Restorepoint



