Terraform collaboration challenges and how to tackle them
If you have never heard of or used Terraform, I recommend reading this article first (link to Infrastructure as Code article). There you will find a short introduction to Terraform and its workflow.
Here I would like to highlight common issues that arise when a team works on the same Terraform code together.
Problem: Shared access to state file and its encryption
By default, Terraform stores the state file on the local machine where the command line interface is run. While this is fine for a single engineer testing things out, it becomes a challenge on bigger projects for several reasons.
To update the infrastructure, each engineer needs to know about the existing resources and needs access to consistent data stored in the state file. This means the state file should be available in a shared location (storage). Moreover, several people writing to the same file simultaneously may lead to conflicts, data loss or even file corruption, so an additional protection mechanism is needed: file locking. A version control system (GitHub, Bitbucket) may sound like the perfect solution here. It does make the file easy to share and, through merge conflicts, offers some protection against concurrent edits.
However, there are additional facts to take into account. All data in the state file is stored in plain text, including any sensitive values such as passwords or keys. Because of that, the state file should not be stored in VCS.
Moreover, such a solution is prone to human error. It is enough for one engineer to forget to pull the latest version before running terraform apply. In that case, the entire infrastructure or individual resources may be rolled back to a previous state by accident.
Solution: To address the above collaboration issues, Terraform offers built-in support for a so-called remote backend. It is nothing more than a remote (shared) location for storing state files. When a remote backend is used, Terraform downloads the latest version of the state file before applying changes to the infrastructure, and then automatically uploads the updated version when the deployment is completed. If you rely on Microsoft Azure, a Storage Account is a natural choice for a remote backend. It supports file locking and encryption out of the box, together with optional features such as file versioning (reviewing state file versions can tell us more about how the infrastructure changed over time with each deployment).
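As a rough illustration, a remote backend on an Azure Storage Account might be configured as in the sketch below. The resource group, storage account, container, and key names are placeholders invented for this example; they do not come from a real environment.

```hcl
# Minimal sketch of a remote backend on Azure Blob Storage.
# All names below are hypothetical placeholders.
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"      # resource group holding the storage account
    storage_account_name = "sttfstateexample"        # storage account used for state
    container_name       = "tfstate"                 # blob container inside the account
    key                  = "prod.terraform.tfstate"  # blob name of this project's state file
  }
}
```

After adding such a block, running terraform init initializes the backend; if a local state file already exists, Terraform can migrate it to the shared location.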
Problem: Terraform versions
The previous section covered collaboration problems related to sharing and locking the state file. Terraform itself introduces another one: it is a single binary that can be downloaded from the HashiCorp site in many different versions. Imagine that our Terraform code is stored in a version control system with a remote backend configured. That works fine until we actually need to make changes and run Terraform against our infrastructure. One engineer may use an older version of the Terraform binary, while another runs his or her changes with the latest one. Changes between major Terraform releases can significantly affect how the state file is stored and organized, and a given state file is not compatible with every version of Terraform. Moreover, Terraform code is not always backward compatible, and some functions or features may be deprecated.
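Solution: Integrate Terraform with CI/CD pipelines (Jenkins, Azure DevOps, etc.). When Terraform runs only from a pipeline, every change is planned and applied with the single binary version installed on the build agent, regardless of what each engineer has installed locally.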
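One complementary safeguard, not part of the solution described above and shown here only as a sketch, is to declare the accepted Terraform version in the code itself. A mismatched binary then fails early, whether it runs on a laptop or on a pipeline agent. The version numbers below are placeholders chosen for illustration.

```hcl
# Sketch: pin the Terraform and provider versions the code is written for.
# The concrete version numbers are illustrative assumptions.
terraform {
  # Terraform refuses to run if the binary version does not satisfy this constraint.
  required_version = "~> 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # allow minor and patch updates within 3.x only
    }
  }
}
```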
Problem: Code and Terraform Plan Review within team
Many of you will think that a proper Terraform setup together with CI/CD pipelines is the golden remedy for all the problems cited so far. However, you still need to take into account the most important aspect of any project: collaboration.
What about having the option to see infrastructure changes once a Pull Request is raised?
What about having another team member review the Terraform code and plan output before approving the PR?
What about running the approved changes (terraform apply) once the PR is reviewed and approved?
The above logic can be implemented within CI/CD pipelines via conditional checkouts; however, we can achieve the same result much more easily…
Solution: Terraform Pull Request Automation using the Atlantis tool (http://runatlantis.io)
Terraform Pull Request Automation
I think this section header is self-explanatory: you can run Terraform directly from a Pull Request and get its output, such as the terraform plan output, as a PR comment. Atlantis is self-hosted, which means the service is deployed within your own infrastructure, so you manage it fully and sensitive data doesn’t leave your infrastructure. It runs as a Go binary or a Docker image, which makes it easy to deploy on a VM or on Kubernetes. The application listens for webhooks, so it integrates with the common version control platforms (GitHub/GitLab/Bitbucket/Azure DevOps). The feature that makes the magic happen is that Atlantis runs the terraform commands itself, on the server where it is deployed, and comments back on the Pull Request with their output.
I will not do a deep dive into its implementation and integration, as these depend on your infrastructure and are well documented on the official site. Instead, let’s go through the Atlantis workflow to see how it can improve your collaboration on Terraform code and make sure that your infrastructure changes are properly reviewed and approved by your teammates.
STEP 1: Raise a Pull Request with your Terraform code changes
STEP 2: Option 1: Atlantis runs terraform plan automatically, if configured to do so
Option 2: You add a comment to the pull request (atlantis plan) to run terraform plan through Atlantis
In both cases, Atlantis comments back on the pull request with the terraform plan output
STEP 3: A team member reviews the plan and, based on it, either approves the Pull Request or asks for changes
STEP 4: Once the PR is approved, you run terraform apply by adding the corresponding comment (atlantis apply) to the PR
STEP 5: You merge the Pull Request after a successful run in the previous step
To summarize, here are the key advantages of Atlantis from a collaboration point of view:
you can catch errors before deployment, because a review puts an extra pair of eyes on your code
you ensure that the code deployed to your infrastructure has been approved and is merged to a protected branch (master)
you do not need to worry about the Terraform binary version or the credentials Terraform requires to plan and apply changes, because both live on the Atlantis server