Terraform State Management Basics

Intermediate 10 min read

What you'll learn

✓ Why Terraform needs state in the first place
✓ Local vs remote backends and why remote always wins
✓ State locking and how to recover from stuck locks
✓ Workspaces, state splitting, and import strategies
✓ How to keep state files safe and auditable

Prerequisites

• Basic Terraform familiarity

What and why

Terraform state is a JSON file that maps the resources in your configuration to the real-world objects in your provider (AWS instance IDs, GCP project numbers, Postgres role names). Without state, Terraform would have no way to know that the aws_instance.web in your code is the same EC2 instance it created yesterday.

The state file is also a cache of resource attributes. By default, plan still refreshes every resource from the provider, but on large estates teams often skip that step with -refresh=false and plan against the cached values. That speedup is real, but it means a stale state can produce wrong plans.

Mental model

State sits between your configuration and the real world. Configuration is what you want. The real world is what exists. State is what Terraform last saw.

+---------------------+    terraform plan     +------------------+
|  main.tf            | --------------------> |  Provider APIs   |
|  desired state      |                       |  real resources  |
+---------------------+                       +------------------+
          |                                            ^
          |  compares with                             |
          v                                            |
+---------------------+     refresh / apply            |
|  terraform.tfstate  | -------------------------------+
|  last-known state   |
+---------------------+

Plan  = diff between config and (refreshed) state
Apply = call provider APIs to make real resources match config,
        then update the state file

State as the bridge between config and reality

Hands-on example

A remote backend on S3 with DynamoDB locking:

terraform {
  required_version = ">= 1.7"
  backend "s3" {
    bucket         = "acme-tf-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}

Bootstrap the backend with another small Terraform project (or by hand) before you can use it:

resource "aws_s3_bucket" "tf_state" {
  bucket = "acme-tf-state"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_dynamodb_table" "tf_locks" {
  name         = "tf-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}

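Versioning on the bucket keeps every earlier revision of the state object, so you can roll back to a known-good state file after a bad write. DynamoDB provides the lock so two engineers cannot apply at the same time. If a crashed run leaves the lock behind, terraform force-unlock LOCK_ID releases it; confirm first that nobody else is mid-apply. Terraform 1.10 and later can also lock natively in S3 with use_lockfile = true, without a DynamoDB table.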

Initialize and apply:

terraform init
terraform plan -out tfplan
terraform apply tfplan

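init configures the backend and downloads providers. plan writes a binary plan file that you can review with terraform show tfplan. apply tfplan executes exactly that plan, so a config change made in between cannot slip in; if the state has changed since the plan was written, Terraform rejects the plan as stale.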

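When you need to bring an existing resource under management, write the matching resource block first, then import it:

terraform import aws_s3_bucket.logs my-existing-logs-bucket

terraform import only writes state, and it fails if aws_s3_bucket.logs is not declared in the configuration. After the import, adjust the resource block and rerun plan until the diff is empty.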
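Terraform 1.5 and later also accept a declarative import block, which lets the import go through the normal plan and apply review. The sketch below reuses the bucket name from the command above; the single bucket argument is an assumption about what the existing bucket needs, and a real bucket will likely require more arguments before the plan is clean.

```hcl
# Declarative import (Terraform >= 1.5): the import happens during apply.
import {
  to = aws_s3_bucket.logs
  id = "my-existing-logs-bucket"
}

# The target resource block must exist in the configuration.
# Only the bucket name is set here (assumption: no other arguments differ).
resource "aws_s3_bucket" "logs" {
  bucket = "my-existing-logs-bucket"
}
```

Once the import has been applied and plan shows no changes, the import block can be deleted. If you would rather not write the resource block by hand, terraform plan -generate-config-out=generated.tf can draft one for review.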

For multiple environments, prefer separate state files per environment (different key paths in the backend) over workspaces. Workspaces share the same configuration and backend settings, so selecting the wrong workspace is enough to hit the wrong environment; separate states isolate blast radius.
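As a rough illustration of separate state per environment, each environment can live in its own root module whose backend block differs only in key. The directory names and the staging key below are hypothetical; the bucket, region, and lock table are the ones used earlier.

```hcl
# --- envs/staging/backend.tf (hypothetical path) ---
terraform {
  backend "s3" {
    bucket         = "acme-tf-state"
    key            = "staging/network/terraform.tfstate" # staging-only state
    region         = "us-east-1"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}

# --- envs/prod/backend.tf (hypothetical path, a separate root module) ---
# terraform {
#   backend "s3" {
#     bucket         = "acme-tf-state"
#     key            = "prod/network/terraform.tfstate" # prod-only state
#     region         = "us-east-1"
#     dynamodb_table = "tf-locks"
#     encrypt        = true
#   }
# }
```

The second block is commented out only because one root module can declare a single backend; in practice it sits uncommented in its own directory. Running terraform destroy in envs/staging then cannot reach the production state at all.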

Common pitfalls

Committing state to git. State files contain secrets in plain text (database passwords, generated keys) and grow until git is unhappy. Add *.tfstate* to .gitignore from day one.

Running terraform apply from a laptop with local state against shared infrastructure. With no lock and no central state, two engineers can apply from different copies of the state and overwrite or duplicate each other's resources. Always use a remote backend with locking.

Editing state by hand. terraform state has subcommands (mv, rm, replace-provider) for safe surgery. Editing the JSON directly is a last resort and almost always wrong.
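For the two most common kinds of surgery, renaming a resource and dropping one from state without destroying it, recent Terraform versions also offer declarative moved (1.1+) and removed (1.7+) blocks that go through plan review like any other change. This is a minimal sketch; aws_instance.frontend, aws_instance.legacy, and the AMI ID are hypothetical.

```hcl
# Rename: equivalent in effect to `terraform state mv aws_instance.web aws_instance.frontend`.
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}

# The resource must now be declared under its new name.
resource "aws_instance" "frontend" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
}

# Forget: similar to `terraform state rm aws_instance.legacy`.
# The resource block for aws_instance.legacy must already be deleted from the config.
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false # drop from state only; leave the real instance running
  }
}
```

Because these blocks live in code, the change is reviewed in a pull request and shows up in plan output before anything is written to state.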

terraform destroy on the wrong workspace. Workspaces look identical at the CLI prompt; setting TF_WORKSPACE explicitly, checking terraform workspace show, or putting the workspace in the shell prompt helps prevent catastrophes.

Drifting state. Someone clicks in the console, the real resource changes, the state file does not. The next plan tries to “fix” the drift, sometimes destructively. Run terraform plan -detailed-exitcode on a schedule in CI for every environment (exit code 2 means the plan contains changes), and treat unexpected diffs as alerts.

Production tips

Encrypt state at rest and in transit. The S3 backend with encrypt = true and a KMS key is the standard pattern. Restrict who can read the bucket; state is privileged.
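On the bucket side, the same idea might look like the sketch below, added to the bootstrap project. It assumes the aws_s3_bucket.tf_state resource from the hands-on example and AWS provider v4 or later; the KMS key and its description are illustrative.

```hcl
# Customer-managed key for state objects (illustrative settings).
resource "aws_kms_key" "tf_state" {
  description         = "Encrypts Terraform state objects"
  enable_key_rotation = true
}

# Default-encrypt every object written to the state bucket with that key.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.tf_state.arn
    }
  }
}

# State must never be public, whatever ACLs or policies are added later.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

The backend block can then point at the same key through its kms_key_id argument. Backend blocks cannot reference resources or variables, so the key ARN has to be written there as a literal.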

Split state by blast radius, not by tidiness. Network, IAM, data, and application layers in separate states means a bad apply in the app layer cannot accidentally destroy the VPC.
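Separate states still need to share a few values, such as subnet IDs. One common way, sketched here, is a read-only terraform_remote_state data source in the application layer. It assumes the network state from the hands-on example exports an output named private_subnet_id, which is a hypothetical name, and the AMI ID is a placeholder.

```hcl
# Application layer: read outputs from the network layer's state (read-only).
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "acme-tf-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"

  # private_subnet_id is a hypothetical output of the network state.
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

The application state can read the network outputs but never owns the VPC, so destroying the app layer leaves the network untouched. Note that reading remote state requires read access to the whole state object, which matters when state holds secrets.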

Pin everything. Pin Terraform versions in required_version, pin providers in required_providers with exact versions, and commit the dependency lock file (.terraform.lock.hcl). Reproducible plans depend on it.
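A minimal sketch of what that pinning can look like. The version numbers are placeholders chosen for illustration, not recommendations.

```hcl
terraform {
  # Allow only 1.7.x patch releases (placeholder constraint).
  required_version = "~> 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.40.0" # exact pin (placeholder version)
    }
  }
}
```

terraform init records the selected provider version and its checksums in .terraform.lock.hcl, and terraform init -upgrade is the deliberate step that moves the pin forward.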

Run terraform plan -refresh-only periodically to detect drift. It reports how real resources differ from state without proposing any changes to them.

Treat state as an audit trail. Versioned S3 plus access logs on the bucket show who changed state and when. Pair this with an OIDC-based CI role so only the pipeline can touch state.
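Access logging on the state bucket could be sketched as follows, again as an addition to the bootstrap project. The log bucket name and prefix are hypothetical, and the example assumes the aws_s3_bucket.tf_state resource defined earlier.

```hcl
# Separate bucket that receives access logs (hypothetical name).
resource "aws_s3_bucket" "tf_state_logs" {
  bucket = "acme-tf-state-access-logs"
}

# Deliver server access logs for the state bucket into the log bucket.
resource "aws_s3_bucket_logging" "tf_state" {
  bucket        = aws_s3_bucket.tf_state.id
  target_bucket = aws_s3_bucket.tf_state_logs.id
  target_prefix = "tf-state/"
}
```

The log bucket also needs a policy that lets the S3 logging service write to it, which is omitted here to keep the sketch short.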

For very large estates, evaluate Terragrunt or a registry-backed module pattern. Both help you keep configuration DRY without coupling unrelated state files.

Wrap-up

State is the memory of your infrastructure. Use a remote backend with locking and encryption, never commit state to git, never edit it by hand, and split it by blast radius. Pin versions, plan in CI, watch for drift, and treat the state file like a database: backed up, versioned, and locked down. With those habits, Terraform becomes predictable instead of scary.