Conf42 DevSecOps 2021 - Online

Building our own custom Code Insight tool at Form3


Abstract

Form3 are on a journey in scaling up - we are expanding our codebase and our engineering teams as fast as we can! In this talk, we present Code Insight, our tool for scanning our code for vulnerabilities:

  • First, we introduce how we work and deliver code at Form3 to set the scene of our DevSecOps practices.
  • Then, we move on to discuss the requirements Code Insight had to meet to fit our practices at Form3.
  • Next up, we present the Code Insight architecture that we have built using GitHub webhooks and AWS technologies.
  • Finally, we round off the presentation with our lessons learned and next steps.

Join us to learn how we used Code Insight to scale and deliver faster than ever before! 🚀

Summary

  • Form3 was founded in 2016 and we've been growing and scaling ever since. We currently have around 260 employees, of which about 130 are engineers. We are a fully remote company and we're hiring. The talk covers our journey with code analysis at Form3.
  • Form3 has over 500 repositories in different languages such as Terraform, Go, Java and YAML. We needed a centralized source code scanning solution that could integrate well with development workflows. The Code Insight project aims to address some of the problems that we were seeing with secscan.
  • Code Insight is installed as a GitHub app on each of our repos. Even the quietest project has its code scanned every 24 hours. Centralizing the config keeps the honest people honest, and the infrastructure scales to meet our demands.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome to our talk about Code Insight. You know who we are from our introductions, but let me introduce Form3 as well. The name Form3 is derived from third-generation cloud technology and a shortened version of the word platform: Form3. We have some great customers, which you can see on this slide, who choose our payment platform as their payment processing solution. Form3 was founded in 2016 and we've been growing and scaling ever since. We currently have around 260 employees, of which about 130 are engineers. We are a fully remote company and we're hiring.

Today, we will be learning about our journey with code analysis at Form3, as well as how we built our custom tool, Code Insight. We will begin with an introduction to engineering at Form3. This will set the stage for how we deliver code at Form3 and what challenges our teams face. Next, we discuss the requirements that we needed from our custom tool, followed by the architecture we have implemented as part of the Code Insight project. Finally, we round off our journey by discussing its adoption into our teams and some insights or lessons learned from the project. We have a lot of ground to cover, so let's get started.

Let's begin with a discussion of engineering at Form3. As I mentioned in the intro, Form3 has been around since 2016 and we've been growing ever since. The way we deliver software has evolved together with our organization. We now have over 500 repositories in different languages such as Terraform, Go, Java and YAML, which our developers contribute to at different frequencies. Some of our repos are under development and some are not actively maintained, but are important to our platform. We are a rapidly growing engineering organization. This means that we have engineers who are quite new to the codebase contributing to our live services. They need support and quick feedback on the code they deliver.
Our platform is compliant with the highest standards of security and should be actively maintained to remain free from vulnerabilities. We use Travis CI as our continuous integration tool and all of our code checks integrate with it. But most importantly, code ownership and the DevSecOps mindset are at the heart of everything we do at Form3. We firmly believe that security must also play an integrated role in the full lifecycle of our apps in order to take advantage of the agility of DevOps. Our code analysis tools should make it easy for teams to deliver secure solutions.

The first static analysis tool we used was Salus, open sourced by Coinbase. It gave us a consistent set of static analysis tools across all of our repos. All the analysis tools were bundled into one Docker container. The Docker container was then downloaded, the source mounted, and the scans run from the container for each build. The solution worked well for a while, but the containers became sluggish and heavy.

Afterwards, we implemented our own lightweight solution called secscan as a replacement for Salus. It was designed to wrap multiple static analysis tools and to allow our tooling to be easily tweaked and reused across multiple repos. Checks were run in a Docker container with a configured token, which meant that each service needed its own token injected. Scans were configured for each repo via a Makefile or a Travis YAML file. Secscan brought us two big advantages: standardized scanning across all repos, and a single place to maintain our scanning tools.

However, while secscan delivered on its promise of a more lightweight scanning tool, it introduced some other problems, which were accentuated by our growing engineering team. First, there was no enforcement. Secscan had to be manually configured on every new repo and could be made optional. Second, we had no visibility of which repos were configured to use it and which repos were failing their scans.
Furthermore, secscan only ran on PRs, so the repos that were no longer under active development were never scanned. And finally, it was difficult to manage adoption and updates, as the config was spread out across every repo. All of them needed changing once updates were rolled out. As we continued to grow and have more services and repositories, it became necessary to move to a new solution.

Some key requirements for the new code scanning solution, Code Insight, were identified by the infosec team. We needed a centralized source code scanning solution that could integrate well with our development workflows. A central configuration makes it easy to maintain and change system wide. The custom Code Insight tool should provide metrics for scanned repositories, even those that are not being actively contributed to. It is very scary to have repositories that are not scanned and that could get new vulnerabilities over time. The custom Code Insight tool should be easy to add and enforce for new repositories. It should also provide a workaround to stop it from blocking emergency releases.

With all this in mind, the team kicked off the new Code Insight project to address some of the problems that we were seeing with secscan. The project has been the work of many of our amazing engineers, whom you can see on this slide. I will now hand over to one of these awesome contributors and my co-presenter, Ross, who will walk you through the rest of our exciting journey with Code Insight.

Thanks, Adelina. Right, let's take a look at how Code Insight scans one of our repositories. Code Insight is installed as a GitHub app on each of our repos, and when a pull request is created, our GitHub app is notified by a webhook from GitHub. On our side, that looks like a Lambda function behind an API Gateway. So this first Lambda function fetches the code from GitHub and calls GitHub back to say that we're going to be performing some scans on that PR.
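As a rough illustration of the entry point just described, here is a minimal Python sketch of a webhook handler. It is not Form3's implementation: the function name `handle_webhook`, the set of pull request actions, the response codes, and the two injected callables (`enqueue` for the request queue, `mark_pending` for the call back to GitHub) are all assumptions made for this example. Only the `X-GitHub-Event` header and the `action`, `repository.full_name`, `pull_request.number` and `pull_request.head.sha` payload fields come from GitHub's webhook format.

```python
import json

# Pull request actions that trigger a scan (an assumption for this sketch).
SCAN_ACTIONS = {"opened", "reopened", "synchronize"}


def handle_webhook(event, enqueue, mark_pending):
    """Handle an API Gateway proxy event that carries a GitHub webhook.

    `enqueue` and `mark_pending` are hypothetical callables standing in for
    the request queue and for the call back to GitHub.
    """
    # Header names are matched case-insensitively.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if headers.get("x-github-event") != "pull_request":
        return {"statusCode": 204, "body": ""}

    payload = json.loads(event["body"])
    if payload.get("action") not in SCAN_ACTIONS:
        return {"statusCode": 204, "body": ""}

    request = {
        "repo": payload["repository"]["full_name"],
        "pr": payload["pull_request"]["number"],
        "sha": payload["pull_request"]["head"]["sha"],
    }
    mark_pending(request)  # tell GitHub that scans are about to run
    enqueue(request)       # hand the work over to the rest of the pipeline
    return {"statusCode": 202, "body": json.dumps(request)}
```

Passing the queue and the GitHub call in as plain callables keeps the sketch runnable without any cloud credentials; a real handler would wire them to a queue client and a GitHub API client.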
That function looks at the content of the code and uses that to decide which scans need to be performed, and then it puts a message onto a request queue to say, please go and run these scans. That request queue is then consumed by an orchestrator Lambda, which records some details about the scan and then creates a message on the task pending queue, which gets picked up by the scheduler Lambda. Now, the scheduler Lambda actually runs the scan as a task on Amazon's Elastic Container Service, and for this we're using a serverless Fargate cluster.

Now, inside Fargate, each of our scans runs multiple containers in a single task. We wanted to keep the scan container itself really simple so it's easy to add new ones, so all the peripheral jobs have been pushed out to other containers. The first container clones a repository and writes it to disk, so that the scan container can just read it from disk and write its results to disk. Once that's complete, two more containers kick in to process those results. One of them writes comments back to GitHub, which we'll see later, and another takes the results and puts them in an S3 bucket for persistence. Finally, once that's all done, a notification container takes over, which writes a message on a queue to say that the task is complete.

So when that task complete message is consumed by the orchestrator, it can update the records and emit an event for the notifier, which in turn will update GitHub. Now, once all of the scans are complete, the notifier can set the final status of the check on the PR. So we run this whole process for every PR, every time there's a change coming into one of our repositories. But we also scan all of our repositories' default branches nightly, meaning that even the quietest project has its code scanned every 24 hours. If one of those default branch scans fails, we'll notify the owning team over Slack. So what does this look like to one of our engineers?
Well, in the GitHub check information, they'll be able to see each of the scans that were performed, the result, and a link to the scan in the Code Insight UI. In some cases we want to allow a build to pass even if a check has failed, and those are labeled here as soft failures. So this can be really handy for existing repositories where the team is still working through some issues, or when we're experimenting with new scans and we don't want to risk breaking everybody's builds. If a user clicks through, they'll be able to see details of the scans. So this is the full suite, each of the scans that have been performed, and they can click through and see the full content of the logs from the S3 bucket we saw before.

But really the best user interface for Code Insight is within GitHub. So my colleague Adam recently finished off the feature so that engineers will get fast feedback as comments directly on their PR, right alongside the offending line of code. To do this, we need the scan output to be consistent, and we're adapting our scans to use the SARIF standard wherever possible.

So this shift from secscan to Code Insight has given us a few real benefits. Centralizing the config keeps the honest people honest. So now switching a scan off requires a pull request into a central repository, which the information security team will review. It also prevents an attacker from introducing some malicious code in a PR and also disabling the check that would look for that malicious code in the same PR. And also this infrastructure scales to meet our demands. So it's a fairly diurnal flow. We see engineers creating a load of PRs during the day, and it tends to be quite quiet overnight, apart from the nightly builds. And the infrastructure can scale to adapt to that.
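The soft-failure behaviour mentioned in this part of the talk can be sketched in a few lines of Python. This is an illustrative assumption, not the Code Insight logic: the function name `final_status`, the label strings and the idea of passing the soft-failing scans in as a set are all made up for the example. The `"success"` and `"failure"` values mirror conclusions that GitHub checks can report.

```python
def final_status(results, soft_fail_scans):
    """Combine per-scan results into one overall conclusion.

    `results` maps a scan name to True (passed) or False (failed).
    `soft_fail_scans` is the set of scans whose failures are reported
    but do not block the build (a hypothetical, centrally managed list).
    """
    labels = {}
    for name, passed in results.items():
        if passed:
            labels[name] = "passed"
        elif name in soft_fail_scans:
            labels[name] = "soft failure"
        else:
            labels[name] = "failed"

    # Only hard failures turn the overall check red.
    conclusion = "failure" if "failed" in labels.values() else "success"
    return conclusion, labels
```

For example, `final_status({"secrets": True, "terraform": False}, {"terraform"})` reports the Terraform scan as a soft failure while the overall check still succeeds.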
The infrastructure is mostly serverless. I should mention that this might change over time, because we've just hired our first Canadian employee, with more to come, so we could see changes in this pattern, but we think that the infrastructure is going to scale well for that.

So how did we actually bring this in at Form3? We didn't just drop this in and break everybody's builds on day one. For existing repositories, we started with soft failures at first, so scans would just report failures for information but not block builds. And we took a ratcheting approach to things. So once a repository was passing, we would then enforce it, and we wouldn't allow it to go back. For new repositories, once we were content that this worked, we enforced Code Insight everywhere. So people are getting that out of the box by default. And we started to gather metrics recently so that we can assess vulnerabilities that are raised by Code Insight scans, but also so that we can check Code Insight's performance and make sure it's not getting in people's way.

But for those existing repositories, we needed to drive adoption across teams. We needed to encourage our engineering teams to fix the issues and get their builds going green. And we did this in a few different ways. We created some batch PRs for issues that affected multiple teams. So where we found something that we could fix in multiple places, we would use a tool, something like turbolift, to be able to create PRs across multiple repositories. And we could offer them up to the feature teams to review and then pull them into their codebase to fix an issue. We arranged some mob sessions, so we got people from across the engineering team together to work on improving Code Insight coverage. And also there was a feature of Code Insight that was meant to introduce some gamification, which is the team leaderboard. So this is a view of our team leaderboard.
But I have to be honest, this has not been terribly successful in driving adoption. These metrics unfairly reward teams who write very little code and stigmatize those with lots of repositories to manage. So on the whole, I don't think gamification is right for this sort of thing. Engineers can just see right through it. So that's one of the things that hasn't worked terribly well.

There are a few others. Firstly, it can be tempting to think that because this system isn't processing billions of pounds' worth of payments like our other code, it's not critical. But if Code Insight stops working, if somebody introduces a scan that just fails everything, then we can bring the whole company's work to a screeching halt, as I found. So we have to treat it like production infrastructure. And to that end, we've introduced ways of canarying new versions of scans, and we're also adding a better test environment and better monitoring.

The next thing is that we've had a few chunky repositories, particularly with Terraform code, that have proved to be a stumbling block for adoption. It's much easier to get a small codebase passing and then improve coverage incrementally than it is to wrangle some sprawling repository, so where you can, splitting out repositories into smaller units can help. Finally, many of the tools we are using to scan our code are based on databases of vulnerabilities that are held externally. So a minor vulnerability added to one of those databases can cause a lot of our builds to break suddenly if we're not careful.

So to address some of these issues, we've got some upcoming features. We want to be able to distinguish between new and existing issues so that we can target feedback to users. This is going to help in those legacy codebases where the existing issues can make it hard to spot new ones being introduced.
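One way to picture the distinction between new and existing issues is a comparison against a baseline scan, for instance the latest nightly scan of the default branch. The Python sketch below is only an assumption about how such a feature could work, not a description of Code Insight: the issue fields (`rule`, `file`, `message`) and the function names are hypothetical.

```python
def fingerprint(issue):
    """Identify an issue by rule, file and message.

    The line number is deliberately left out, so that an old issue does not
    look new just because unrelated edits moved it within the file.
    """
    return (issue["rule"], issue["file"], issue["message"])


def split_issues(baseline, current):
    """Split the issues of the current scan into (new, existing) lists."""
    known = {fingerprint(issue) for issue in baseline}
    new = [i for i in current if fingerprint(i) not in known]
    existing = [i for i in current if fingerprint(i) in known]
    return new, existing
```

With a split like this, feedback on a PR could be limited to the `new` list, while the `existing` list feeds reporting on the legacy backlog.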
We're also going to do more with metrics, for example, spotting flakiness of scans or some scans that take too long to run. And finally, we want to introduce the concepts of age and severity into Code Insight so that we only hard fail a build if it violates our policies on remediation.

So to wrap it up then, Code Insight has allowed us to streamline the way we do application security at Form3. We're running scans on every pull request and nightly on every repository, so that even the quietest project is being checked for vulnerabilities regularly. And for our engineers, it's easy to interact with Code Insight just as part of their normal GitHub workflow. So having Code Insight comment on your PR is like having an eagle-eyed, slightly pedantic colleague looking over your shoulder all the time, which is pretty helpful. On the other hand, gamification didn't add much for us. If you're going to try it, make sure that the metrics driving rewards are fair and meaningful to you. I think a big part of the success of Code Insight is the central management and configuration of it, which raises our overall compliance and allows us to drive improvements in the tool. It's dead easy for us to add new scans, and we can take comfort that every piece of our code is being looked at continuously.

And with that, I'd like to say thanks for listening, and thank you to my co-presenter, Adelina. We look forward to getting your questions on Discord, and please don't hesitate to reach out to us on Twitter. It's always nice to make new friends. Please take a look at the Form3 engineering site. Once you've worked up your appetite there, I'm sure you'll like to get to the careers site and become one of our colleagues. And also check out the excellent Form3 tech podcast run by my colleague Kev. There are some excellent guests on there. Thanks again for listening.

Adelina Simion

Technology Evangelist @ Form3


Ross McFarlane

Technical Lead, Information Security @ Form3



