Conf42 Cloud Native 2022 - Online

Terraform Practices - The Good, the Bad and the Ugly

Abstract

Terraform is a GREAT tool, but like a lot of other things in life, it has its pitfalls and bad practices.

Since you are working with Terraform, you probably went through its documentation, which can tell you what resources can be used - BUT do you always have a clear path towards using these resources? How should modules be constructed? What should we call these modules? How should you structure your Terraform code in general?

In this talk, I’ll cover the good, the bad, and the ugly when it comes to Terraform. I will show best practices for working with Terraform that were put together with a lot of blood, sweat, and tears, so you’ll ultimately have a go-to approach and a paved way of working with Terraform, whether it’s an existing codebase or a new functionality altogether.

Summary

  • Hila Fish is a senior DevOps engineer currently working for Wix. Her talk covers Terraform practices: the good, the bad and the ugly. Terraform is a great tool, and the talk is suitable for any cloud provider that you work with.
  • The good thing here is to always lock the versions of the modules, the providers and Terraform itself. The bad thing is to lock in a way that still allows breaking changes to get through. The ugly thing is to have no version lock whatsoever.
  • The next thing I want to talk about is tagging resources. This is a must-implement practice because it allows you to filter cloud provider expenses and sort them. The bad thing here is to tag inconsistently. And the ugly thing is to have no tagging whatsoever.
  • By default, Terraform works with local state. The good thing here is to have the state remote and secured. A lot of the time the state holds sensitive information such as secrets. You really need to make sure that the state is secured and backed up.
  • The first thing is using community modules versus creating them. The good thing here is to use official modules wherever possible. The bad thing is to write your own modules while official modules exist. These are things that you can do day to day in order to gain a lot of value out of Terraform.
  • Unlike variable values, local values can use dynamic expressions and resource arguments. Locals also don't change values during or between Terraform runs. You can enforce guidelines and practices through locals on the live side. Just remember to keep the live section as simple as possible.
  • Use environment variables. Use locals to hard-code names and tags which are set only once. If main.tf gets complex, if it has a lot of things in it, then consider breaking it down into submodules. That way you will get a decent logical arrangement for faster access.
  • The next thing that I want to talk to you about is applying classic code best practices. Keep your Terraform code in source control management like GitHub, GitLab or Bitbucket. Functional programming is another approach to writing Terraform code. I really encourage you to check this road as well.
  • The long haul means the things that you should prepare for and plan ahead in order to work with Terraform in the best, most efficient way. Consider using a Terraform wrapper to avoid human errors. Workspaces isolate their state. Take that into consideration when you're considering using workspaces.
  • Make sure that you always strive for remote execution. You should run apply with a Terraform plan file, and you should set up a Terraform timeout. The ugly here is to execute locally and hit Ctrl+C while Terraform is running.
  • The most important part of every module, even if it's a private module, is readability and cleanliness of the code. In order to keep things in check, you should use practices enforcement. These checks can easily remind developers to keep a high quality standard in their PRs.
  • Don't use Terraform with an ad hoc mindset. Plan for your future needs. Planning ahead will allow you to enable others on their Terraform journey. Even if you're a startup, you should still think about scale.
  • Thank you so much for listening. I want to do a quick shout-out to some people from Wix that helped with the visuals of the presentation. If you want to approach me on LinkedIn, Twitter or mail and consult about Terraform or other SRE aspects, I would be more than happy to help.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thanks for joining me. I'm going to talk about Terraform practices: the good, the bad and the ugly. Terraform is a great tool, and it is used widely across the world. This talk will cover not only the specific bits and pieces regarding Terraform usage and best practices, but will hopefully also make you think about your unique use cases and scenarios, help you see the big picture, and utilize Terraform in a broader context rather than just as an infrastructure tool.

A little bit about myself. I'm Hila Fish. I'm a senior DevOps engineer currently working for Wix. I have 15 years of experience in the tech industry. I'm a DevOps culture fan; I think this is what helps companies achieve great things. I'm a co-organizer of conferences: DevOpsDays Tel Aviv in Israel, and StatsCraft, which is a monitoring conference. I'm a mentor in OpsSchool, which is a course for DevOps, and in Baot, which is a community in tech for women. And I'm a lead singer in a cover band, as you can see in this picture.

Okay, so Terraform implementations can be good, bad and ugly, right? We will talk about that here. Just one disclaimer beforehand: I'm going to show mostly examples from AWS, but this talk is suitable for any cloud provider that you work with. Also, I have only worked with Terraform open source and not Enterprise or Cloud, so I don't know if what I'm going to show you applies to them as well. Just bear that in mind.

Okay, quick wins. I'm going to briefly show you things that you can do in Terraform that will achieve great value in a low amount of time, just a matter of seconds or minutes. The first thing is version lock. Just like we have requirements.txt or package-lock.json, this is exactly the same thing. When you think about modules, providers and Terraform versions, each of them has a version that gets deployed, which means a specific syntax that is accepted and the features that come with it.
So if you lock the versions, you know exactly which features you can use, which syntax is acceptable and valid, and so on. The good thing here is to always lock the versions of the modules, the providers and Terraform itself. The bad thing is to lock in a way that still allows breaking changes to get through. You have version constraints, so if you lock the version to the major, or to the latest of major X until breaking changes get introduced, then that's fine, because small changes get applied and no breaking changes get through. But if you lock it in a way that still allows breaking changes, then this could be bad. And the ugly thing here is to have no version lock whatsoever. Trust me, it is really, really ugly. You will see a lot of messages in plan and apply like "I don't know this syntax, what is this?" So just lock your versions.

The next thing I want to talk about is tagging resources. I think this is a must-implement practice because it allows you to filter cloud provider expenses and sort them. If you use tags, you can sort by whatever you choose, and that way you also gain visibility into ownership and overall projects. If you tag correctly, you can really track your cost management and other things along the way. So the good thing here is to tag everything, because why not? You can use default tags on the provider level. For example, if you use the "managed by" tag that I showed before in the examples, then "managed by" is Terraform, of course. If you set it on the provider level, you can just forget about it; you don't need to set it on other resources along the way. Once it is set on the provider level, it gets applied to any resource under that provider. Another good thing to do is to add enforcement that fails a PR if tags weren't added.
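As a minimal sketch of the two quick wins described so far (version locks and provider-level default tags), the configuration below pins Terraform, the AWS provider and a community module with pessimistic `~>` constraints, and sets a `ManagedBy` tag once on the provider. The version numbers, region, tag values and module inputs are assumptions chosen for illustration, not values from the talk.

```hcl
terraform {
  # Pin the Terraform CLI: allows 1.1 and later 1.x releases, but not 2.0.
  required_version = "~> 1.1"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Allows any 4.x release; a new major version (5.0) is rejected.
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumed region

  # Applied to every taggable resource created through this provider,
  # so individual resources do not need to repeat these tags.
  default_tags {
    tags = {
      ManagedBy = "terraform"
      Team      = "bi" # hypothetical ownership tag
    }
  }
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # Community modules get a version lock too: >= 3.14 and < 4.0.
  version = "~> 3.14"

  name = "example" # hypothetical inputs
  cidr = "10.0.0.0/16"
}
```

The "bad" variant from the talk would be a constraint such as `>= 4.0`, which counts as a lock on paper but still lets the next major release through.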
I'm going to talk about practices enforcement later on, so you'll see that. The bad thing here is to tag inconsistently because, as we know, consistency is key. Tagging inconsistently means that sometimes you will be able to sort and sometimes not, and that's not great. So try to be as consistent as possible. And the ugly thing is to have no tagging whatsoever. Again, trust me: if you tag things, the FinOps team will thank you, your team leader will thank you, and the management and the company will thank you. So do it.

The last thing in the quick wins section I want to show you is remote state. By default, Terraform works with local state. Even if you're working alone, let's say you're the only one managing infrastructure, you should still think about backups, about having the state secured, and about redundancy, because if something happens to your local machine, that's not good. So you should really use a backend. In this example it is S3, but again, all cloud providers provide one, and this way the state will be kept remotely.

The good thing here is to have the state remote and secured, because a lot of the time the state holds sensitive information like secrets, so you really need to make sure that the state is secured. Have the state backed up: if you use S3 as the backend, enable versioning and then it will be backed up. And ensure that the tfstate lock occurs, because when you run write operations and a lock doesn't happen, there will be conflicts if other people are trying to run it as well. The bad and the ugly here are quite the opposite: the state is kept locally, or it is remote but not backed up or not secured, or the Terraform state lock doesn't occur during write operations.
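A hedged sketch of the remote state setup described above, using the S3 backend: the state is stored remotely, encrypted at rest, and locked during write operations. The bucket, key and table names are hypothetical. Versioning is a property of the S3 bucket itself, so it is enabled where the bucket is created rather than in this block.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"                    # hypothetical bucket, created with versioning enabled
    key    = "bi/airflow/aws/us-east-1/terraform.tfstate" # hypothetical state path
    region = "us-east-1"

    # Encrypt the state object at rest, since state can contain secrets.
    encrypt = true

    # State locking through a DynamoDB table, so concurrent writes cannot conflict.
    dynamodb_table = "example-terraform-locks" # hypothetical table name
  }
}
```

Locking options for this backend have changed across Terraform releases, so it is worth checking the backend documentation for the version you have pinned.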
Okay, so we talked about quick wins, things that you can do day to day in order to gain a lot of value out of Terraform. Let's talk about second nature. What do I mean by second nature? I mean the things that you should keep in your awareness day to day in order to really work with Terraform in the best way possible.

The first thing is using community modules versus creating them. Why reinvent the bicycle, right? If it exists, use it. Using official community modules is good because they are proven over time, they are supported by the community, and you eliminate the need to support them and keep them up to date, because the community does that for you. A lot of well-known cloud provider features are already covered by modules, so do your research and check the available modules before implementing your own.

So the good thing here is to use official modules wherever possible. The bad thing is to write your own modules while official modules exist; although if you maintain them to the same standard, with enforcements and so on, then it's fine. Using community modules without a version lock is also a bad thing, as I mentioned before. And the ugly thing here is to write your own modules and have no checks or consistency whatsoever, with no code practices applied. We will talk about that later on.

Also remember that community modules, as I will show in a bit, usually use enforced linters, formatters and logical checks, because the maintainers want to keep important aspects in place that allow new users to get involved, contribute to the module and use it right. They have everything set up to create the best quality code, so why not use it? If you do have to create your own modules, make sure that they are stateless, clean and generic, that you do not repeat yourself, that they are kept as simple as possible, and that you use enforcements. Community modules have these enforcements, so use them as well.
I will cover that later on as well; just bear in mind that the code should always be clean and readable.

Okay, variables and locals. Unlike variable values, local values can use dynamic expressions and resource arguments. Locals also don't change values during or between Terraform runs such as plan, apply or destroy. You can use locals to give a name to the result of any Terraform expression and reuse that name throughout your configuration, like this example with the tiers that I showed right here.

On the module side, you should use variables for the settings the module config itself needs. If you set a variable on the module, you should set a default or a validation, because if the module expects a variable to get passed to it and doesn't get it, the module will break. So make sure to set up a default or a validation. A local should be a constant to be honored or relied on. For example, say you have a bucket module that creates buckets. If you want consistency, so that all buckets have the same naming convention, you can do that by putting the convention in the locals, as I showed in this example. So you can really enforce guidelines and practices through locals.

On the live side (when I say live, I mean the place where I call the modules, because modules are generic and nothing happens there; the live section is where we call the actual modules and do the actual creation of the resources), variables-wise: if you use a variable once, just set a default on the variable. But if you use it per region, or per another logical breakdown that you have, then use a tfvars file and specify each variable value in tfvars based on each region or breakdown. Avoid using locals, and use data sources to pass their outputs to the module itself. Basically, just remember to keep the live section as simple as possible: no logic, only calls to the modules themselves.
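To make the module-side advice above concrete, here is a small sketch of a bucket module in which each variable carries a default or a validation and the naming convention lives in a local. The variable names, the allowed environments and the `acme` prefix are assumptions for illustration only.

```hcl
# modules/bucket/variables.tf (hypothetical module)
variable "environment" {
  type        = string
  description = "Deployment environment the bucket belongs to."
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "purpose" {
  type        = string
  description = "Short name describing what the bucket stores."

  # No default here, so a validation guards the value that callers pass in.
  validation {
    condition     = can(regex("^[a-z0-9-]+$", var.purpose))
    error_message = "The purpose may contain only lowercase letters, digits and hyphens."
  }
}

# modules/bucket/main.tf
locals {
  # The naming convention is defined once and every bucket follows it.
  bucket_name = "acme-${var.environment}-${var.purpose}"
}

resource "aws_s3_bucket" "this" {
  bucket = local.bucket_name
}
```

Callers on the live side then pass only `environment` and `purpose` and cannot drift from the naming convention.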
Okay, so to sum things up regarding variables and locals: use tfvars wherever possible. I haven't mentioned it before, but use environment variables. If you have environment variables already set up, just utilize them instead of creating a new variable. Use locals to hard-code names and tags which are set only once, or to decrease code repetition, and keep things as generic as possible.

The bad thing here is to use multiple locals blocks when it's not necessary. Terraform allows you to create multiple blocks, but if you don't need to, why burden the eye with a lot of extra text? Just have one block if more aren't necessary. Ignoring environment variables could also be bad, because it forces you to maintain more variables than you need to. And the ugly thing here is to hard-code values on variables that should support multiple scenarios. That's what variables are for, right? To set them up according to our needs. So if you hard-code values where that isn't appropriate, it could get ugly. So that's about that.

File structure. When you think about the file structure, and I say this in regard to both modules and the live section, you should think about it in terms of better logical arrangement and easy management. This is how it is structured in community modules, so they are basically the standard for us: main.tf is the main logic, and then variables, data and outputs. Usually, at least from my experience, if we take a VM module for example, the creation of the VM resources themselves is in main.tf, and if it has complementary resources like security groups, they should go in an SG file. If it has definitions for a load balancer, put them in alb.tf. And if main.tf gets complex, if it has a lot of things in it, then consider breaking it down into submodules. Take IAM, for example: the community IAM module is broken down into submodules.
Also in EKS, one of the submodules is the creation of node pools. It shouldn't be in the top-level main because it's not the main logic, but it is relevant, so that's why it was broken out into a submodule. When you break things down, the variables for each submodule stay with that submodule, so your variables.tf file doesn't get huge, and it really is easier to maintain that way. It's also best to have a naming convention which reflects the actual purpose of each file. That way you get a decent logical arrangement for faster access, better readability and cleanliness of the code.

Okay, the next thing that I want to talk to you about is applying classic code best practices. Yes, Terraform is not a pure programming language, I know that, and I think everyone can agree on that. But similar rules of writing code apply to Terraform as well. Terraform has progressed over the years in a way that adopts code best practices. For example, you might remember that before Terraform 0.13 you couldn't even use for_each for modules. In August 2020, with the release of Terraform 0.13, HashiCorp finally introduced the ability to loop over modules with a single module call. So even HashiCorp realized that Terraform should follow best practices for its code, and that's why they introduced these capabilities.

So keep your Terraform code in source control management like GitHub, GitLab or Bitbucket. Keep it simple, stupid, as much as possible, of course. Do not repeat yourself. And make sure that the modules you create and everything you use are idempotent, which means that whenever you create something, the result of the logic that runs is always the same. You expect the same result. Because if not, well, there's a saying about that, that maybe you're crazy. I don't know, let's leave it aside. But everything should be idempotent, because you want to make sure that everything is as expected.
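The module looping mentioned above (available since Terraform 0.13) can be sketched as follows: one `module` block with `for_each` creates one module instance per map entry instead of repeating the call. The local module path, its `name` and `versioning` inputs and its `id` output are hypothetical.

```hcl
variable "buckets" {
  description = "Buckets to create, keyed by bucket name."
  type = map(object({
    versioning = bool
  }))
  default = {
    "example-logs"    = { versioning = false }
    "example-reports" = { versioning = true }
  }
}

module "bucket" {
  source = "./modules/bucket" # hypothetical local module

  # One instance of the module per map entry, from a single module call.
  for_each = var.buckets

  name       = each.key
  versioning = each.value.versioning
}

output "bucket_ids" {
  # Instances are addressed by key, for example module.bucket["example-logs"].
  value = { for name, instance in module.bucket : name => instance.id }
}
```

Because the instances are keyed by name rather than by position, adding or removing one entry does not disturb the others, which fits the "do not repeat yourself" and idempotency points in this part of the talk.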
You always expect the same results. Functional programming is another approach to writing Terraform code. It is great. I haven't done it myself yet, but I spoke with other developers who are applying functional programming in their Terraform code, which is very interesting and fascinating. So I really encourage you to check this road as well.

And about humans and cleanliness: there's an interesting read by Tiexin Guo. I hope that I pronounce his name properly. He writes in a really clear manner about applying classic code best practices in Terraform, so I want to quote him on something about humans and clean code. The computer that processes your code doesn't care if the variable names are ambiguous or inaccurate, right? If used correctly, the code still gets executed. But since human beings are the ones who maintain this code, we need to make sure the code is readable. Things like refactoring, clean code and naming conventions were invented so that we humans can read the code better, for the sake of us humans and not the computer. So that's about that. I really encourage you to read the article because it's really interesting.

Okay, so we spoke about the quick wins, things that you can do in minutes in order to get a lot of value out of Terraform for your company in the long term. We talked about second nature, things that you need to think about day to day when you work with Terraform. Let's talk about the long haul. The long haul means the things that you should prepare for and plan ahead in order to work with Terraform in the best, most efficient way.

So, structuring your Terraform code. How do you structure your code? There are a lot of ways to do it, so let me show you how we do it here at Wix. At Wix, we structured it like this: team, project, cloud provider and region. This is actually a feature-oriented approach. When you look at the example here, you see the live section, right?
And we have BI, which is the team; Airflow is the project; AWS is the cloud provider; and us-east-1 is the region. That way, the state also stays very small, because the state is only for the Airflow project, for BI, for us-east-1 in AWS. It gives you better flexibility and control over what you are inserting and what you are managing at that specific point, so it really is very beneficial to have this structure.

Also, when you come to think about accounts: currently we manage the accounts on the region level, which is not great. That's why we are restructuring it, or thinking of restructuring it, on top of tiers. So it's an ongoing process. But think about it: if you have multiple accounts that you need to manage, and different projects, and the code doesn't repeat itself, as I will show in the next example, then maybe the account should also be taken into consideration when structuring the code.

Another example of structuring the code is using workspaces. Workspaces isolate their state. If you run terraform plan in one workspace, you will see only the state for that workspace and not the other one that is just around the corner. One example is when you have the same Terraform config but different customers. I talked about AWS until now, so let's say GCP. Each project in GCP is a different customer, and it's exactly the same code, just different customers. In that case, you can use a workspace for each customer, each project: the code is the same, the workspaces are different, and each workspace is a customer. So this is one example.

Another example, which really links and couples with the one I just showed, is when you have the same service but different regions. We have different customers, right, but all customers need to go to one service, a financial service, for example.
So if I have the financial service in different regions, us-east-1, us-east-2 and so on, then I can also use workspaces for that.

Okay, some comments about workspaces. If you use them, consider using a Terraform wrapper to avoid human errors. When you use workspaces, you switch through the CLI with terraform workspace select X, and if I forget to change the workspace, I run against the wrong one, so it's not great. So consider creating a Terraform wrapper that will actually run the code for you, and then run the wrapper instead of running Terraform directly. That way the wrapper handles the switching and management of the workspaces for you.

The second thing is that you have less visibility, because I just switched through the CLI, right? If I haven't used the terraform.workspace built-in variable in the code, then I don't even have a way to know that we have other workspaces unless I run the terraform workspace list command. It really is important to know that you have less visibility and to take that into consideration when you're considering using workspaces.

A couple more things about workspaces. The official Terraform documentation says to use workspaces to manage multiple non-overlapping groups of resources with the same configuration. Okay, so that suggests these usages qualify: multiple environments, dev, staging and so on; multiple regions, like I showed in the previous example; or multiple accounts or subscriptions. Okay, cool. Now, also from the official Terraform documentation: it says that for different development stages, like staging versus production, named workspaces are not a suitable isolation mechanism. So if you do go with workspaces (maybe I read it wrong, I don't know), just make sure that you go into it with open eyes and that you know what you're doing. And I think that we can all agree on at least one workspace usage.
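One way to regain some of the visibility discussed above is to reference the built-in `terraform.workspace` value in the configuration itself. The sketch below assumes the "same service, different regions" case, with a hypothetical workspace-to-region map and resource names.

```hcl
locals {
  # Name of the workspace currently selected with `terraform workspace select`.
  workspace = terraform.workspace

  # Hypothetical mapping from workspace name to the region it deploys to.
  # Listing the workspaces here documents which ones are expected to exist.
  workspace_regions = {
    "default"   = "us-east-1"
    "us-east-2" = "us-east-2"
  }
}

provider "aws" {
  # Indexing the map fails at plan time for a workspace that is not listed,
  # instead of silently deploying somewhere unexpected.
  region = local.workspace_regions[local.workspace]
}

resource "aws_s3_bucket" "reports" {
  # The workspace name becomes part of the resource name and tags,
  # so each workspace's resources are identifiable in the cloud console.
  bucket = "financial-service-reports-${local.workspace}"

  tags = {
    Workspace = local.workspace
  }
}
```

This does not replace a wrapper that selects the workspace for you; it only makes the active workspace visible in the code and in the created resources.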
Both the documentation and other people I have worked with, who showed me that they are doing it, point to this one: when you have workspaces, you have a default workspace, which is the main one. If you create another one, you can call it whatever you want, and it can act as a side branch. You can test out any code that you want to introduce, see that everything works okay, and then apply this code to the default workspace. So create a new workspace, do whatever you want, test it out, and then if all looks okay, apply it to the default workspace.

Okay, so to sum things up regarding structuring your Terraform code base. The good thing is really thinking things through and planning ahead. If you take the first example that I showed you, the feature-oriented one, it allows a small state setup, and small state is a very good practice to have. The feature-oriented structure also really allows you to set up Terraform as a platform, because that way you can let any team in your organization use Terraform. Each team has control over their own folder. Also, in GitHub, each folder is listed in the code owners, so teams can approve their own PRs and so on. So it really gives you flexibility, enables independence, and offloads responsibilities to others.

The bad thing is that if you don't think and plan ahead, organizational changes could cause a need to restructure the code. And you don't want to restructure the code just because you didn't plan. If things evolved, great. But if you need to restructure just because you didn't plan correctly, it could be a bummer. Another bad thing is to use workspaces for the wrong reasons. I spoke about that just before, so make sure you're doing it for the right reasons. And the ugly thing here is that if you structure the code in a way that allows huge states to occur, this could lead to invalid dependencies.
It happened to me quite a lot that I made change X, ran a plan, and then saw in the plan that it was going to change Y, and I was like, what? I changed X, not Y. Huge states can lead to that. So make sure you choose a structure that allows states that are as small as possible.

The next thing I want to talk about is executing Terraform. Make sure that you always strive for remote execution, because that way you don't need to set up local credentials or local configurations, and you have a better audit of who ran what. So it is always great to have remote execution. You should run apply with a Terraform plan file: you can pass an argument saying which plan file to run, and then you know exactly what is getting applied. And you should set up a Terraform timeout, because I had cases where I ran auto scaling that was based on spot instances, and Terraform just waited for the price to fall into the right range. I just had to wait and wait and wait, and it's not nice. So set up a Terraform timeout that makes sense to you.

The bad thing is to execute Terraform locally, either on your computer or on a server, because then you have no audit. And the ugly here is to execute locally and hit Ctrl+C while Terraform is running. If you don't want to wait for the timeout, I understand, but it's best to just go and grab a cup of coffee or cocoa, whatever. Ctrl+C while Terraform is running could lead to disruptions in the state and to conflicts. It could really, really get ugly. So don't do it.

Okay, practices enforcement. We talked about this: the most important part of every module, even a private module which is only going to be used internally, is the readability and cleanliness of the code, right? In order to keep things in check, to make sure that everything is clean and right and everyone has guidelines, you should use enforcements.
These enforcements already happen on community modules, so you should also do them yourself on your internal modules. This is an example from the AWS auto scaling community module. As you can see, on each PR there's a set of checks that run through GitHub Actions. It checks whether the contributor added documentation, whether he or she formatted the code, Terraform lint, Terraform format, what else? End of file. So a lot of things are being checked, and it's really awesome to have these checks, because these simple checks can easily remind developers to keep a high quality standard in their PRs.

Okay, so to sum things up regarding practices enforcement. I tried to think about bad things to say about it. Maybe the bad thing would be, I don't know, that it forces the developer to revisit the code and add more to it. But that's not really a waste of time, because it is good to add these things. It's not just on a whim; these are important things that we need to add. So I only have good things to say about practices enforcement.

You should add pre-commit or pre-PR linters, formatters and logical checks, either through GitHub Actions or CI pipeline checks. You could also, if you want, create a Slack bot that tells you if there was a drift between the plan that you made and the actual environment. And speaking of the actual environment, you should always make sure that the enforcement verifies that master is always your source of truth, your actual environment. For example, in our case, when we push the code to GitHub, GitHub checks whether tags were added, with more checks to come. Once everything is cleared and everything is okay, it runs the plan for me. I see that everything looks good, and then I do atlantis apply, because we use Atlantis for the actual run. Atlantis apply does two things.
It merges the code and it applies it. That way I know that what was applied is what was merged to the main branch. This is awesome, and it really keeps the situation as it should be: main is the actual environment. On the right, I put some open source enforcement tools and helpers that you can use after this talk, when you sit down and read about enforcements and how to do them. These are a few checks and a few tools that will help you with this enforcement journey and setup.

Okay, so we talked about a lot of things here, right? I showed you a lot of things you can do in Terraform or think about with Terraform. Maybe some of it will stay with you, maybe not. But the thing that I really, really want to stay with you after this presentation is to think and ask yourself, when you work with Terraform: how do you envision the infrastructure and the company's needs? You should really think things through. Planning ahead will allow you to enable others on their Terraform journey. You will be able to set up guidelines and best practices of your own, like tagging, usage and whatnot. That way you will make sure that everything is utilized in an organized and orderly fashion. And this is what we need in a company, right? We need structure, and we need to make sure everything is aligned. We can really keep things in check and make sure that everything is manageable that way.

So take into consideration your use cases, your pain points and Terraform's constraints. Where do you see yourself and your company in the long term? Then plan accordingly. If we hadn't planned ahead, we wouldn't have been able to set up Terraform as a platform as we did here at Wix. So this is one takeaway. And even if you're a startup, you should still think about scale, and think about how you should address and prepare for the changes to come.
And then you will be able to utilize Terraform in the best way possible. So, like with any other tool, don't use Terraform with an ad hoc mindset. Plan for your future needs.

I spoke about Tiexin Guo before, and I want to quote him on another thing. Programs evolve and code changes. It is really rare that you write Terraform code and it stays like that, because this is not how projects work. If that were the case, we wouldn't be talking here about Terraform practices; you would use it once, in one way, and that's that. But we will always have projects, because businesses want to improve, and a project is the way to move from the current state to the next desired state. Changing from one state to another is a project, and by nature a project means change, so the code also changes constantly. So think about your structure and how you structure things, and allow projects to evolve and get introduced to your environment and to your company.

Thank you so much for listening. I hope that it was beneficial for you. I want to do a quick shout-out to some people from Wix that helped me: liberal, who helped with the visuals of the presentation and the logical flow. Without her it wouldn't look like that, so thanks to her, and to the other people: Ilya Schenking from my team, ran Schneider, Oprah Velez and thermal cupak. They all pitched in and gave me some input, so thank you guys. And again, thank you all for listening. If you want to approach me on LinkedIn, Twitter or mail and consult about Terraform or other SRE aspects, I would be more than happy to help. Thanks a lot.
Hila Fish

Senior DevOps Engineer @ Wix


