Unexpected Terraform lock file changes

I bumped into an interesting issue recently where a job that runs Terraform as part of the CI/CD setup (which I’ll refer to as CI for short) failed with:

Error: Provider dependency changes detected

Changes to the required provider dependencies were detected, but the lock
file is read-only. To use and record these requirements, run "terraform init"
without the "-lockfile=readonly" flag.

What is this about? The Terraform setup uses lock files as recommended by the Terraform documentation:

You should include this file in your version control repository so that you can discuss potential changes to your external dependencies via code review, just as you would discuss potential changes to your configuration itself.

Then in CI the variant of the terraform init command used is

terraform init -input=false -lockfile=readonly

which verifies that the lock file is complete and up to date, and fails instead of modifying it if it’s not.

I know what you’re thinking: I must have forgotten to run terraform init locally and commit the lock file changes. Nine times out of ten you’d be right.

This is the tenth time.

Some clues to lead us to the root cause:

  1. The exact same commit that generated the error above actually passed CI with flying colors a day before. I re-ran it for reasons that are unimportant here.

  2. When I ran terraform init locally to check whether it would change anything, it did generate some new entries in the lock file:

    diff --git a/terraform/preprod/.terraform.lock.hcl b/terraform/preprod/.terraform.lock.hcl
    index a735c763bd09..1aad821d2305 100644
    --- a/terraform/preprod/.terraform.lock.hcl
    +++ b/terraform/preprod/.terraform.lock.hcl
    @@ -113,6 +112,48 @@ provider "registry.terraform.io/hashicorp/local" {
       ]
     }
    
    +provider "registry.terraform.io/hashicorp/null" {
    +  version     = "3.2.4"
    +  constraints = ">= 3.0.0"
    +  hashes = [
    ...
    +  ]
    +}
    +
    +provider "registry.terraform.io/hashicorp/time" {
    +  version     = "0.13.1"
    +  constraints = ">= 0.9.0"
    +  hashes = [
    ...
    +  ]
    +}
    +
     provider "registry.terraform.io/hashicorp/tls" {
       version     = "4.1.0"
       constraints = ">= 3.0.0"
    
  3. When I ran terraform init locally the day before it didn’t add the lock file entries.

  4. A copy of the Terraform configuration for a different deployment configuration, with the same providers and the exact same lock file, did not have this problem: it worked fine in CI, and terraform init did not want to add any lock file entries.

So, what gives?

Here’s what you need to know (or remember, if you already knew) about Terraform:

  1. It has configuration (your “.tf” files etc.).

  2. It has state where it tracks what’s present in your infrastructure.

  3. It makes it so that the state converges to the configuration, in particular:

    1. If a resource (like a VM) is configured to be there but it isn’t, it’s created.
    2. If a resource exists but there is no corresponding configuration, it’s removed.

I direct your attention to the last point. This is where the “issue” lies.

Terraform has to be able to tear down resources that aren’t present in the configuration. The providers needed for that may not be referenced anywhere in the configuration, yet they still have to be recorded in the lock file.
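To make the reasoning concrete, here is a toy model in Python. It is not how Terraform is implemented; it only assumes that resources can be treated as (provider, resource address) pairs, and the resource addresses below are hypothetical. The point it illustrates is that the set of required providers comes from the union of configuration and state, so a resource that exists only to be destroyed still pulls in its provider.

```python
from __future__ import annotations

# Toy model only: Terraform's real planner is far more involved.
Resource = tuple  # (provider, resource address)


def plan(config: set[Resource], state: set[Resource]):
    """Return (to_create, to_destroy, providers_needed) for a toy plan."""
    to_create = config - state    # configured but not yet present
    to_destroy = state - config   # present but no longer configured
    # Providers are needed for everything in config OR state,
    # including resources that only exist to be torn down.
    providers_needed = {provider for provider, _ in config | state}
    return to_create, to_destroy, providers_needed


# Hypothetical resource addresses, chosen to mirror the providers in this post.
config = {
    ("hashicorp/local", "local_file.example"),
    ("hashicorp/tls", "tls_private_key.example"),
}
state = config | {
    ("hashicorp/null", "null_resource.example"),
    ("hashicorp/time", "time_sleep.example"),
}

to_create, to_destroy, providers_needed = plan(config, state)

# The committed lock file only knows the providers the configuration mentions.
locked = {"hashicorp/local", "hashicorp/tls"}
missing_from_lock = sorted(providers_needed - locked)
print(missing_from_lock)  # ['hashicorp/null', 'hashicorp/time']
```

In this model nothing needs to be created, two resources need to be destroyed, and those two are exactly what makes the lock file incomplete.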

The terraform providers command helps identify the problem, behold:

> terraform providers

Providers required by configuration:
...

Providers required by state:
...
    provider[registry.terraform.io/hashicorp/null]

    provider[registry.terraform.io/hashicorp/time]
...
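If you want to automate that comparison, the following Python sketch finds providers that the state requires but the lock file does not mention. It assumes the output of terraform providers has the shape shown above (a "Providers required by state:" heading followed by provider[...] lines, with nothing unrelated after it) and that lock file entries start with provider "..." at the beginning of a line. The function names are made up for this example, and the sample strings are illustrative rather than real output.

```python
from __future__ import annotations

import re

# provider[registry.terraform.io/hashicorp/null] in `terraform providers` output
PROVIDER_IN_LISTING = re.compile(r"provider\[([^\]]+)\]")
# provider "registry.terraform.io/hashicorp/null" { in .terraform.lock.hcl
PROVIDER_IN_LOCK = re.compile(r'^provider\s+"([^"]+)"', re.MULTILINE)


def providers_required_by_state(providers_output: str) -> set[str]:
    """Collect addresses listed after the 'Providers required by state:' heading."""
    _, heading, state_section = providers_output.partition("Providers required by state:")
    if not heading:
        return set()  # no state section in the output
    return set(PROVIDER_IN_LISTING.findall(state_section))


def locked_providers(lock_file_text: str) -> set[str]:
    """Collect provider addresses that have an entry in the lock file."""
    return set(PROVIDER_IN_LOCK.findall(lock_file_text))


def missing_from_lock(providers_output: str, lock_file_text: str) -> list[str]:
    """Providers the state needs that the lock file does not record."""
    return sorted(providers_required_by_state(providers_output) - locked_providers(lock_file_text))


# Illustrative inputs, not captured from a real run.
sample_output = """\
Providers required by configuration:
provider[registry.terraform.io/hashicorp/tls] >= 3.0.0

Providers required by state:

    provider[registry.terraform.io/hashicorp/tls]

    provider[registry.terraform.io/hashicorp/null]

    provider[registry.terraform.io/hashicorp/time]
"""
sample_lock = '''\
provider "registry.terraform.io/hashicorp/tls" {
  version     = "4.1.0"
  constraints = ">= 3.0.0"
}
'''

print(missing_from_lock(sample_output, sample_lock))
```

With real data you would pass in the captured output of terraform providers and the contents of .terraform.lock.hcl; a non-empty result points at state-only providers like the ones in this story.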

Now, how come the state needed these providers? I had deployed a different Terraform configuration to that environment from another branch. The failing commit knew nothing about that configuration, so to undo its resources Terraform needed providers that were required only by the state, not by the configuration.

To resolve the problem I updated the failing branch with the changes from the already deployed branch (which included the lock file additions) and all was well again.

I’m sure there’s a critique of Terraform’s statefulness in here somewhere, but that’s a subject for another day.

For now just remember: it’s not just the Terraform configuration that decides what providers are needed, it’s also its state.