Conf42 Site Reliability Engineering (SRE) 2024 - Online

When Infrastructure as Code Ends - The practical guide to creating Terraform Providers


Adding new functionality to Terraform can be daunting: it’s written only in Go (which you may not know) and you have to understand the architecture and work through less than welcoming documentation. I’ll provide a walkthrough from my experience with it, going from zero to publishing a provider.


  • Harel Safra talks about creating Terraform providers. There were more than 4,100 published providers as of this recording. Anyone can create a provider as long as there's a valid API to work with. The talk covers how to create a provider, with code and a live demo.
  • Schemas and attributes map between Terraform configuration files and the code itself. Attribute descriptions are later picked up by the documentation framework to generate the documentation files. You will probably want to run the provider locally and debug it in case there are errors.
  • To wrap up: my journey started with no provider. I started by learning Go and the Terraform plugin framework. I then created the code for the provider and released more features. People appear to be using it. If you have any questions, feel free to reach out to me.


This transcript was autogenerated. To make changes, submit a PR.
Hi, Conf42 viewers, and welcome to this Conf42 SRE 2024 session about creating Terraform providers and my journey when I wanted to create one, from learning the documentation up until publishing it to the Terraform Registry. We'll cover a bit of a refresher about what infrastructure as code is, then discuss where to start and how to create a provider, with code and a live demo.
I'm Harel Safra. I'm a data platform engineering team lead at Riskified. Riskified is a fraud detection company centered around e-commerce solutions, and my team manages all the online databases that are part of our systems. We use SQL, NoSQL, search engines, and graph databases to provide our services. Before that, I've been in the infrastructure domain for 20-plus years, managing servers, network switches, storage, databases, and anything to that effect.
What is infrastructure as code? Infrastructure as code is a programmatic definition of infrastructure elements. That means we want to define infrastructure, which could be things like servers and network switches, but could also be database users or Elasticsearch indexes, in a programmatic way that allows for repeatable and documented processes, where the knowledge is baked into the process itself and doesn't depend on humans remembering what they need to do.
There are two general approaches to infrastructure as code. The first is declarative: we ask the user to define what they want to achieve, and the framework compiles that into infrastructure elements. You can think of Terraform in that domain. The other approach is imperative: the user defines how they want to achieve something. It's more of a code-based approach, and the framework again creates the infrastructure for the user. You can think of Pulumi in that domain.
Terraform providers are the plugins that interface with the infrastructure API on one side and with Terraform core over RPC on the other. They bridge the gap between what Terraform core asks for and what the infrastructure API needs in order to create the infrastructure elements. There are 4,100-plus published providers as of this recording, and the main point is that anyone can create more providers. You don't have to be associated with the infrastructure vendor itself. Anyone can create a provider as long as there's a valid API that you can work with.
You need to understand a bit about the architecture first. The first element is Terraform core, which is the terraform executable that runs when you run a Terraform command. If you run terraform plan, it actually runs what the documentation calls Terraform core. Terraform core communicates over RPC with a Terraform provider, and the provider uses native calls inside its process to communicate with a client library, which then communicates over the infrastructure's native APIs and protocol with the infrastructure itself to provision resources. This native communication can happen over HTTP, as you'll find in various documentation, but it could also be gRPC, SQL, system calls, or anything the infrastructure knows how to interpret.
So obviously the first thing you need to understand is the API that's supported by your infrastructure. Find the correct API to interface with and find an easy way to do that. If there's an existing Golang client for your infrastructure, try to use that. It will probably be easiest. Otherwise you have to reverse engineer the protocol, and that can be a bit annoying.
After you understand the API, you should learn Go. Go is the language that Terraform providers are written in. It's a compiled, high-level programming language.
You don't need to understand Go too deeply, but you need to understand control structures, a bit about interfaces, and how you go about creating code. I used the step-by-step tutorial found on the Go site, A Tour of Go, and it's a good tutorial. It will take you from not knowing Go to having a working knowledge of how to use it. I like Go for its simplicity. It's easy to understand and to learn. I very much like that it's compiled: the compiler finds problems before you do in production. There's one thing you need to remember: there are no exceptions. If you're used to exception handling from other languages, you have to check method return values for errors. Otherwise you'll find that your code errors out for various reasons because it didn't check the returned error.
So you understand the API and you have a working knowledge of Go. Now you need to understand how to create providers. I used HashiCorp's documentation to learn how to do that. It's located in the developer portal under Terraform plugins. You don't need to learn everything in advance. Just read the overview and then continue from there to the sections you need.
After you have the basics, let's see how you create the provider itself. We'll use a demo of a provider that manages lines inside text files. It's a simple, made-up example, but it will allow you to understand what we do. All the files are managed under a single path that's defined in the provider configuration. There's a single type of resource, a file, that has a file name and a lines array, which is just an array of strings, nothing too fancy. The file API that's provided is limited, and limited by design, because when you work with other types of APIs there will be limitations that you need to understand and work around, and I didn't want to just provide an API that allows you to do everything on files.
As an example of this sort of resource configuration, you can see on the left a file resource named file one that has file name equal to file one and the lines line one and line two, and the provider will translate that into a text file named file one containing line one and line two. You can clone the code from my GitHub repository, hsafra/terraform-provider-filedata, and it's also published in the Terraform Registry for you to see.
After you understand the API that you work with, the documentation will point you to creating providers with the plugin framework. This is the correct and recommended way to create new providers. It abstracts a lot of the interactions that happen with Terraform core and allows you to focus on your logic. You start by cloning the terraform-provider-scaffolding-framework repo to your GitHub profile, and then you can tweak and customize it from there.
Terraform providers have four basic operations that they need to support for each managed resource. These operations provision, change, or delete the infrastructure itself, but they also amend the Terraform state. Or rather, they provide the instructions for Terraform core to amend the Terraform state after they've done the operation. So after a create operation, you need to amend the state so that it includes the new resource that was created. A create operation obviously provisions a new resource. A read operation gets the infrastructure's current state: it reads the infrastructure state over the API and returns it to Terraform core. An update operation changes attributes that can be changed, and obviously not every attribute of a resource can be changed. A delete operation removes the resource.
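The four operations and their effect on state can be sketched with plain Go maps. This is a simplified illustration, not the plugin framework's interfaces: the `infra` and `state` types and the function names are assumptions made for the example, and in a real provider Terraform core owns the state while the provider only returns what the state should contain.

```go
package main

import "fmt"

// File is the demo resource: a name plus its lines.
type File struct {
	Name  string
	Lines []string
}

// infra is a hypothetical in-memory stand-in for the real infrastructure API.
type infra map[string][]string

// state stands in for the Terraform state kept by Terraform core.
type state map[string]File

// create provisions a new resource and records it in state.
func create(in infra, st state, f File) error {
	if _, exists := in[f.Name]; exists {
		return fmt.Errorf("file %q already exists", f.Name)
	}
	in[f.Name] = append([]string(nil), f.Lines...)
	st[f.Name] = f
	return nil
}

// read refreshes state from what the infrastructure currently holds.
func read(in infra, st state, name string) {
	lines, ok := in[name]
	if !ok {
		delete(st, name) // resource vanished outside Terraform
		return
	}
	st[name] = File{Name: name, Lines: append([]string(nil), lines...)}
}

// update changes the mutable attribute (the lines) of an existing resource.
func update(in infra, st state, f File) error {
	if _, ok := in[f.Name]; !ok {
		return fmt.Errorf("file %q does not exist", f.Name)
	}
	in[f.Name] = append([]string(nil), f.Lines...)
	st[f.Name] = f
	return nil
}

// remove deletes the resource and drops it from state.
func remove(in infra, st state, name string) {
	delete(in, name)
	delete(st, name)
}

func main() {
	in, st := infra{}, state{}
	if err := create(in, st, File{Name: "file1", Lines: []string{"line1", "line2"}}); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	// Simulate drift: someone edits the file outside Terraform.
	in["file1"] = []string{"line1", "edited"}
	read(in, st, "file1")
	fmt.Println(st["file1"].Lines)
	remove(in, st, "file1")
	fmt.Println(len(st))
}
```

The read step is what lets a plan detect drift: state is refreshed from the infrastructure before it is compared with the configuration.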
Terraform core sometimes uses the delete operation to change resources that cannot be changed in place: if an attribute is immutable, Terraform core will destroy and recreate the resource to change that attribute.
If you look at the code, this is how the repository looks when you clone it. We have a documentation directory, and all the other code sits under internal/provider, including the resources. If we look at the file resource, it has a few different methods that cover the operations: the delete method covers the delete operation, and likewise update, read, and create. If you look inside the methods, you can see they all start in the same way. You start by getting the parameters that Terraform core has sent to the provider, then you do a bit of logic, and then you return a response to Terraform core to allow it to amend the state correctly. Take the create operation, for example: it starts by creating a full name from the base path in the provider and the file name provided to the create operation. Then it iterates over all the lines in the lines array and writes them to the file with the API's write-line operation.
Schemas and attributes are used to map between Terraform configuration files and the code itself. Every configuration block, whether for a resource or for the provider itself, has a schema that defines the needed parameters, and schemas contain attributes that define the data elements themselves. Each attribute has a type. It could be a primitive, like an int64 or a string, or a complex type like a map, an object, or a list.
Each attribute also has properties, like a description, whether it's optional, whether it's sensitive, and others. It can also have optional validators that check the user-supplied values against what the provider expects, so you don't have to check them later on: invalid values fail the validation checks.
If you look at the code, you can see that the file resource has a schema method that defines the schema the file resource expects to receive. It starts with a description of the file resource itself and then has an attribute named file name, which is of type string. It has a description, it's required, obviously, and it has a validator that checks the correctness of the file name, in this example with a regex. It also has a lines attribute, which is a list attribute. A list in Terraform is a collection of elements of the same type. This is also required, and the element type is string, as I mentioned earlier. It has a list validator that requires the list to have at least two elements, just as an example. Nothing too fancy about it.
Try to add descriptions wherever you can, because these descriptions are later picked up by the documentation framework to create these lovely documentation files. You can see the description is copied from here, and the same goes for every other attribute. So if you have the descriptions inside your resource file, they will be copied to the documentation, and you can use that when you publish your provider later on.
The types of the Terraform plugin framework are not native Golang types. An int64 in the plugin framework is not a native int64, because the framework types have additional methods to handle null and unknown values. For example, Int64, like any other type, has an IsNull method that returns true or false depending on whether the value is null.
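Why the framework needs its own types can be seen in a small stand-in. The `Int64` type below is a simplified illustration written for this example, not the framework's implementation; only the method names `IsNull`, `IsUnknown`, and `ValueInt64` mirror methods on the framework's Int64 type, and the constructor names are hypothetical. A native `int64` cannot distinguish "not set" or "not known until apply" from zero, so the wrapper carries that state next to the value.

```go
package main

import "fmt"

// valueState records whether a value is null, unknown, or known.
type valueState int

const (
	stateNull valueState = iota // zero value: not set in configuration
	stateUnknown                // will only be known after apply
	stateKnown
)

// Int64 is a simplified, hypothetical stand-in for a framework int64 value.
type Int64 struct {
	state valueState
	value int64
}

// Hypothetical constructors for this sketch.
func nullInt64() Int64           { return Int64{state: stateNull} }
func unknownInt64() Int64        { return Int64{state: stateUnknown} }
func knownInt64(v int64) Int64   { return Int64{state: stateKnown, value: v} }

// IsNull reports whether the value was left unset.
func (i Int64) IsNull() bool { return i.state == stateNull }

// IsUnknown reports whether the value is not yet known.
func (i Int64) IsUnknown() bool { return i.state == stateUnknown }

// ValueInt64 returns the native int64; null and unknown values yield 0.
func (i Int64) ValueInt64() int64 {
	if i.state != stateKnown {
		return 0
	}
	return i.value
}

func main() {
	values := []Int64{nullInt64(), unknownInt64(), knownInt64(42)}
	for _, v := range values {
		// Check the state first; only then is the native value meaningful.
		if v.IsNull() || v.IsUnknown() {
			fmt.Println("no usable value yet")
			continue
		}
		fmt.Println("value:", v.ValueInt64())
	}
}
```

The habit this encourages in provider code is to check for null or unknown before converting to a native Go value.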
When you want to access the values of a primitive type, you use the Value-type method, like ValueInt64, which returns a native int64. If it's a collection, you can convert the values into Golang types with the As methods. For example, List has an ElementsAs method that returns the elements as a native Go type. If we look at the code again inside the file resource, you can see, just above the breakpoint, that the same create method accesses the file name attribute, which is a string attribute, with ValueString. This is used in various other places. ValueString converts the framework type into a native string that you can work with.
After you've created your code, and the code runs, at least hopefully, you probably want to run it locally and debug it in case there are errors. The first thing you can use to run Terraform providers locally is a .terraformrc-style CLI config file with a provider_installation dev_overrides stanza, which translates a registry address into a local path that holds your code. This allows you to run the code without publishing it to the Terraform Registry. You can use log-based debugging for simple cases, but for more complex cases use debugger-based debugging. It allows you to set breakpoints and run your code like any other code. You do that by passing the flag -debug=true to the provider, setting the environment variable that it outputs, and then running the action you want to run, which will then break inside the provider code.
Let's see an example of that in action. We start with a configuration directory that has a provider file defining a base path, and this base path is the same directory we're currently in. It also has a resources file that defines two files, file one and file two, with these names and the lines inside them.
And we can see that if we cat file two, we see the lines AA, BBB, and CCC, the same as they are defined in the configuration. If we run terraform plan now, there are no changes, because the resources have the same values as the infrastructure.
Say you want to debug the plan stage, and specifically the read operation inside terraform plan. We'll start by setting a breakpoint inside the read method of the provider. Then we'll make sure the run configuration has -debug=true as a program argument, and we click the debug button. It instructs us to copy these values and set them as an environment variable. This allows the core executable to reattach to the running session rather than just use what it has in this directory. So we export this value, and if you run terraform plan again, you'll see that the debugger has kicked in and breaks inside the read method. From here you can use the regular debugging operations to debug your code and see what happens. In this case you can see that this is the read operation for file two. If we resume running the code, there will be another break for the read operation of file one. While it's paused, the Terraform process hangs: it's waiting for the response to be returned from the provider. If we resume the operation, it continues and, in this case, again shows that there are no changes. So that's how you debug operations. Use it. It's very powerful, and it's very easy to debug like that.
After we've created the code, debugged it, and it seems to be working, add acceptance tests. They will be used for automatic tests during deploys inside GitHub Actions, but you can also use them locally to make sure that your changes are valid and that you didn't break anything. You can have automatic testing for resources, data sources, providers, and anything else you created. The state inside the acceptance tests is checked for you.
Basically, you don't have to check that the state changes are done correctly, but if you have any change that you want to validate on the infrastructure side, you need to do that inside the acceptance test. You can run all the tests manually with make testacc. The way you structure them is that you have a file named after the resource with a _test suffix inside the same directory. So if we take a look at the code, you can see that the file resource has a file resource test file, and each function in it is actually a test. So you have multiple functions, one per test, and there's a helper function here that defines the configuration that needs to be run. You can see there's a configuration block for the provider and another configuration block for a resource of type file data with a name that gets passed in as a parameter here, and with the lines formatted as lines here. This allows you to use the same kind of configuration block for different tests. You can see, for example, that in this test I passed file one with one and two as the lines, and in this case, the update test, I passed the same file name but with two and three. This allows the same configuration function to create different configuration files.
For the test to pass, you have to define each check that must pass. This one is just a check that tests that the resource's file name attribute is indeed equal to file one. This check is a bit more interesting, because it tests that the first value in lines is equal to two, as we defined here. As I said, you can run the acceptance tests as part of the automation, and you can also run them by running make testacc. That will just run the acceptance tests, and we can run it again and again to make sure that everything we created and changed is indeed valid and didn't break any other functionality.
So you've finished creating your provider, you've debugged it, and it all looks like it's working.
Once you've created acceptance tests and they're all passing, you can publish to the Terraform Registry and allow other people to use your work. You first define a GPG key and then set the repo secrets GPG_PRIVATE_KEY and PASSPHRASE to your values. An important thing that took me some time to understand is that you need to create a tag named v followed by the version, for example v0.1.0, to allow the automation to pick up the changes. After you've created a tag and pushed it to GitHub, you can log into the Terraform Registry and add the repo. For the initial load, the Terraform Registry will read the current contents of the repo, but it will also set webhooks that push any new changes to the Terraform Registry. And new changes are new tags. So if you have a tag named v1 and then you push a new tag named v2, there's a default GitHub Action that will compile that into release artifacts, and then the Terraform Registry will go and grab these artifacts and publish them. You can see this in this example: the version is 0.1.0 and it's backed by a tag named v0.1.0.
So to wrap it up: my journey with the provider started with no provider, just manual management scripts, manual procedures, and a lot of documentation in Confluence. I started by learning Go and the Terraform plugin framework. I then created the code for the provider and released more features. In my case, I created a provider that defines users and sets inside the Aerospike database. That's something that wasn't available. After the code was created, I published it to the Terraform Registry, and people appear to be using it.
Thank you for your time today. If you have any questions, feel free to reach out to me by email, through my LinkedIn profile, or by opening an issue on the provider, which is published under hsafra on GitHub. Thank you, and I hope it's been instructive for you.

Harel Safra

Data Platform Engineering Team Lead @ Riskified

Harel Safra's LinkedIn account
