Zara Kay, Software Engineering Manager, Sydney, Australia
It's good practice to have your infrastructure defined in code via tools like Terraform, but what do you do with the secrets?
May 12th 2022
Terraform Secrets and the AWS Parameter Store

Introduction

Building and deploying an application always starts very simply: you might begin in the console of your favourite cloud service, and a few button presses later it's done. Of course, this is just how it starts - sooner or later you accumulate more bits and pieces, until one day you need another environment. Then you go through the hassle of setting up your new environment: repeating every click you made the first time, and making sure you don't miss a step or forget anything.

Fortunately, there exists the concept of Infrastructure as Code (IaC), where we can declare our infrastructure in files (and commit them to git!) and deploy it to multiple environments. At Bugcrowd, we use Terraform to manage our infrastructure. For example:

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-005e54dee72cc1d00"
  instance_type = "t3.micro"

  tags = {
    Name = "HelloWorld"
  }
}

output "ip_address" {
  value = aws_instance.web.private_ip
}

While this is easy in the beginning, not everything can be tracked inside Terraform - especially secrets - and you will have to come up with your own solution.

But what secrets?

If Terraform is all about infrastructure configuration as code, then you may ask how "secret" data enters the system. You would assume that you can use Terraform providers to generate secrets as part of the configuration. Unfortunately, this is not always true: there are things you can't generate using Terraform. One example is a third-party service that doesn't have an equivalent Terraform provider. On our platform, we use Bugsnag for error tracking, and it doesn't have a provider. Bugsnag requires an API key per project, which we need to manually generate and inject into our Terraform configuration in some way.

What we used to do

At Bugcrowd, our calls to Terraform are wrapped in a script. This is done for a variety of reasons, such as making it easy to switch between different environments. Previously, the script contained two different methods of distributing secrets back into the application.

Secrets using state

One method was to use the script to parse the outputs of deployments and inject them back into Terraform as variables. For example:

variable "super_secret_key" {
  description = "My super secret key"
}

output "_super_secret_key" {
  value = var.super_secret_key
}

This doesn't look very intuitive, as it's hard to see what happens when the output is not yet defined.

When the deployment is applied for the first time, the value of _super_secret_key is null, so Terraform asks the user to enter a value for the undefined variable. Then, at the end of execution, Terraform stores the computed output in the state file and pushes it to remote state.

During subsequent executions, the script first calls terraform output on the deployment and grabs any outputs that were prefixed by an underscore. Then it exports a shell variable to fill in the required value. In the above example, the script generates export TF_VAR_super_secret_key=$(value_from_state). By doing this, the variable now has a value and the user no longer needs to remember to enter the secret. With newer versions of Terraform, we can mark variable and output declarations as sensitive to ensure that they aren't leaked.
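The export step can be sketched roughly like this - a hypothetical shell sketch, not the real script, with all names illustrative:

```shell
#!/bin/sh
# Parse `terraform output` lines that look like `_super_secret_key = "value"`
# and re-export every underscore-prefixed output as a TF_VAR_ variable,
# which Terraform picks up on the next run.
export_secret_outputs() {
  while IFS= read -r line; do
    case "$line" in
      _*" = "*)
        name=${line%% = *}                    # e.g. _super_secret_key
        value=${line#* = }                    # e.g. "s3cret" (still quoted)
        value=${value#\"}; value=${value%\"}  # strip surrounding quotes
        export "TF_VAR_${name#_}=$value"
        ;;
    esac
  done
}

# In the real wrapper, the input comes from: terraform output
export_secret_outputs <<'EOF'
_super_secret_key = "s3cret"
EOF
```

The underscore prefix is just a naming convention that lets the script tell secret-carrying outputs apart from ordinary ones.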

One problem we had with this system was that rotating a secret was a pain. To rotate one, you had to hardcode the new value in the output, do an apply, then revert that change and apply again to propagate the secret.

// Step 1: Change the output

variable "super_secret_key" {

}

output "_super_secret_key" {
  value = "new-super-secret-key" // var.super_secret_key
}

// Step 2: Apply

// Step 3: Put it back
variable "super_secret_key" {

}

output "_super_secret_key" {
  value = var.super_secret_key
}

// Step 4: Apply

Repeating these four steps every time we rotate credentials is a big pain, so we wanted to move to a system where this can be done easily - without re-applying every deployment multiple times.

Secrets from file

Another approach to secrets, especially files, was to use an encrypted tarball. In most cases we could get away with storing the secret in state, but some Terraform providers need an actual file on disk to work. We use Keybase widely at Bugcrowd, and one of its features is a file system known as KBFS, where you can mount the files shared in a chat as a volume.

We create a folder and add secrets for each environment to it. Then we tar it up and encrypt it using Keybase, such that it is decryptable by everyone in the infrastructure owners' chat group.
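The bundling step looked roughly like this - a sketch with stand-in paths, file names, and team name; the Keybase commands are shown rather than run:

```shell
# Bundle per-environment secrets into one tarball (stand-in contents).
mkdir -p secrets
echo "STAGING_API_KEY=example" > secrets/staging.env
tar czf secrets.tar.gz secrets/

# Then encrypt it for the infrastructure owners' team (requires Keybase):
#   keybase encrypt --team infra_owners -i secrets.tar.gz -o secrets.tar.gz.encrypted
# The wrapper later reverses this with `keybase decrypt` and `tar xzf`.
```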

The CLI wrapper scripts are then adjusted to take the secrets file as an argument. During execution, the file is decrypted, unzipped, and mounted on disk. Any file paths inside the Terraform code are configured to pull secrets from this folder and use them during execution.

We had a few problems with this method:

  1. Granting people access to Terraform required giving them access to this secrets file for all environments, which meant we had no way to grant temporary access.
  2. Rotating secrets in the tarball was a bit of a pain: it required coordination between team members to ensure that the fresh secrets were always used.
  3. We didn't like being dependent on Keybase to deploy our infrastructure. In fact, at the time of writing, KBFS seems to have issues, and Keybase as a whole is not well supported.

As a result, we decided that we needed to move away from this approach.

Using the AWS SSM Parameter Store

We decided that we needed to move off our custom secrets system as much as possible. One approach raised was to store the secrets somewhere remote that Terraform can access natively, without us writing anything ourselves. This would also enable us in the future to use systems like Terraform Cloud, which prefer a more standardized use of Terraform.

There are several AWS offerings for secret storage. We decided that the SSM Parameter Store would work the best. We already use it with the services that we deploy, so it felt like a good choice to extend its use to the problem at hand.

SSM in Terraform

An SSM parameter is simply the combination of a name (which looks like a path) and a value. We can further secure the value by declaring it a SecureString and providing a KMS key to encrypt it. Creating a new secret in Terraform looks like this:

resource "aws_ssm_parameter" "super_secret_key" {
  name   = "/foo/bar/super-secret-key"
  value  = "something"
  type   = "SecureString"
  key_id = aws_kms_key.key.id
}

We can also read an existing Parameter Store secret and reference it in code:

data "aws_ssm_parameter" "super_secret_key" {
  name = "/foo/bar/super-secret-key"
}

resource "my_other_resource" "thing_that_needs_a_key" {
  secret = data.aws_ssm_parameter.super_secret_key.value
}

As long as Terraform can decrypt using the KMS key, we can retrieve the value in the code.

Putting it all together

Using what we learnt above, we create and track secrets separately in SSM using a separate CLI tool or just the AWS console, then reference those secrets in Terraform without any special hacks.

In Terraform, you have a collection of deployments that can be thought of as layers of your infrastructure; each layer builds on the previous one, referencing objects already created in the cloud. Since creating secrets requires a KMS key that already exists, we can set up a special bootstrap deployment containing the basics needed to deploy the rest of the Terraform.

So in bootstrap we would have:

resource "aws_kms_key" "terraform_secrets" {
  description         = "Manage terraform secrets"
  policy              = data.aws_iam_policy_document.terraform_secrets.json
  enable_key_rotation = true
}

resource "aws_kms_alias" "terraform_secrets" {
  name          = "alias/terraform-secrets"
  target_key_id = aws_kms_key.terraform_secrets.key_id
}

The referenced IAM policy is omitted for brevity.

Now that this key exists, we can go on the CLI and create all the secrets we need. We decided that the path of an SSM parameter would follow: /service/terraform/$deployment_name/$secret_name. For example, a Bugsnag API key for a service named Bugcrowd would have a path /service/terraform/bugcrowd/bugsnag-api-key. We would then mark this as a SecureString and pick our Terraform secrets key to encrypt it.
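Creating one of these parameters from the CLI looks roughly like this. The values are hypothetical, and the put-parameter call needs AWS credentials, so it is shown rather than run:

```shell
# Build a parameter path following the convention above.
param_path() {
  printf '/service/terraform/%s/%s' "$1" "$2"   # $1 = deployment, $2 = secret
}

path=$(param_path bugcrowd bugsnag-api-key)
echo "$path"   # prints: /service/terraform/bugcrowd/bugsnag-api-key

# Create the secret, encrypted with the bootstrap key's alias:
#   aws ssm put-parameter \
#     --name "$path" \
#     --type SecureString \
#     --key-id alias/terraform-secrets \
#     --value "the-api-key"
```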

Back in Terraform land, we reference the secret and assign it to the service:

data "aws_ssm_parameter" "bugsnag_api_key" {
  name = "/service/terraform/bugcrowd/bugsnag-api-key"
}

resource "aws_ssm_parameter" "bugcrowd_bugsnag_key" {
  name   = "/service/bugcrowd/bugsnag-api-key"
  value  = data.aws_ssm_parameter.bugsnag_api_key.value
  type   = "SecureString"
  key_id = "" # TODO: the service's own KMS key
}

Now, this might seem a bit convoluted: why create another SSM parameter when the value is already in SSM? This method lets us avoid giving the service access to the Terraform secrets key; instead, the service gets its own key. It also shows in code any manual steps that must take place before the deployment can be applied - so if we forget a secret, the apply process will error.

How do I use this with providers?

Providers in Terraform need credentials to function as well. For example, we use the Datadog Provider to create dashboards, monitors and more. To use the Datadog provider, you need API keys. If we were to follow our current approach, it would look like this:

data "aws_ssm_parameter" "datadog_api_key" {
  name = "/service/terraform/bugcrowd/datadog-api-key"
}

data "aws_ssm_parameter" "datadog_app_key" {
  name = "/service/terraform/bugcrowd/datadog-app-key"
}

provider "datadog" {
  api_key = data.aws_ssm_parameter.datadog_api_key.value
  app_key = data.aws_ssm_parameter.datadog_app_key.value
}

Unfortunately, this will not work in Terraform. There is an outstanding issue in Terraform to fix this, but it has been open since 2015 and it doesn't appear that it will ever be solved. At a high level, during execution Terraform builds a dependency graph between resources so it can determine the order of execution, but this graph does not account for dependencies flowing into providers.

In the above example, there is a dependency between the datadog provider and the aws provider, since the Datadog provider needs AWS data to be read before it can be configured. Terraform is unable to resolve this dependency.

We needed an alternative solution for this case. One approach is to split the dependency into two layers: the first layer is computed during one apply, and the second layer uses its outputs. Applied to this example:

// In another deployment, called 'core'
data "aws_ssm_parameter" "datadog_api_key" {
  name = "/service/terraform/bugcrowd/datadog-api-key"
}

data "aws_ssm_parameter" "datadog_app_key" {
  name = "/service/terraform/bugcrowd/datadog-app-key"
}

output "datadog_api_key" {
  value = data.aws_ssm_parameter.datadog_api_key.value
  sensitive = true
}

output "datadog_app_key" {
  value = data.aws_ssm_parameter.datadog_app_key.value
  sensitive = true // Since Terraform 0.14 this will be implicit as the AWS provider declares that an SSM Parameter is sensitive
}

// And then in the datadog deployment
data "terraform_remote_state" "core" {
  // configuration depending on the backend you are using
}

provider "datadog" {
  api_key = data.terraform_remote_state.core.outputs.datadog_api_key
  app_key = data.terraform_remote_state.core.outputs.datadog_app_key
}

In this example, we have a separate deployment called 'core', which uses our SSM-based method to retrieve the credentials we need, then exports them as sensitive outputs. Once applied, this deployment's state file contains the credentials. In the deployment where we want to use them, we fetch the 'core' state file and use it to fill in the credentials required by the Datadog provider. This works because there is a defined order between state file retrieval and provider initialization.

But what about files?

Sadly, we have to admit we couldn't find a good solution for cases where a file is needed on disk. Most providers accept either a string value or a file path, but there are cases where it must be a file. Fortunately, in our case only a single provider requires a file on disk.

The solution was to fall back to our wrapper scripts. We start by storing the contents of the file in remote state as a sensitive output, as previously discussed. Then, during apply, our wrapper scripts grab the output using terraform output -raw <output_name> and pipe it into a file. The provider is then able to use this file during initialization.
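That wrapper step can be sketched like this. This is a hypothetical sketch with illustrative names; TF_BIN exists only so the function can be exercised without real state - the real script just calls terraform directly:

```shell
# Write a secret held in remote state out to disk just before apply.
write_secret_file() {
  tf=${TF_BIN:-terraform}   # overridable for testing; normally plain terraform
  output_name=$1
  dest=$2
  umask 077                 # keep the secret file private to this user
  "$tf" output -raw "$output_name" > "$dest"
}

# Usage in the wrapper, before provider initialization:
#   write_secret_file provider_credential ./secrets/credential.json
```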

This isn't the cleanest solution, but since we only have a single case, it works for us. It should also be stated that we always run our Terraform inside of an ephemeral container, so any sensitive data that may exist on disk is erased after each run - avoiding accidentally committing any secrets to version control.

Wrapping Up

To wrap up, our solution to the problem of Terraform secrets can be described as:

  • Store secrets in an external source like AWS SSM Parameter Store
  • Use data blocks to retrieve credentials and provide a coupling between a deployment and the credentials it needs to run
  • If the credentials are used by a service, re-export them in the form the service needs
  • If the credentials are used by a provider, export them in a separate deployment and retrieve them from remote state
  • And finally, if you need it in a file, you are out of luck :)