AI is writing more of your Terraform

12 min read

TL;DR Overview

  • AI-generated Terraform fails in four recurring ways (permissive defaults, missing security blocks, hardcoded values, and stale provider patterns) that map to the same misconfigurations behind real cloud breaches.
  • Peer-reviewed benchmarks from NeurIPS 2024, FSE 2026, and ICSE 2026 show frontier LLMs passing real-world IaC tasks at rates below 40%, with Terraform's HCL harder for them than YAML- and JSON-based IaC formats.
  • terraform validate and terraform plan will not catch these failures because they verify syntax and state delta rather than security posture, which requires semantic analysis.
  • SonarQube covers Terraform from authoring to merge: IDE feedback via SonarQube for IDE, pull request and merge-gate analysis via SonarQube Cloud or SonarQube Server, and agent-side verification via Agentic Analysis on SonarQube Cloud.

According to Sonar's 2026 State of Code Developer Survey, 42% of code committed today is written or assisted by an AI agent, with that share expected to grow to 65% by 2027. The same AI coding agents that ship your application code are also writing your infrastructure as code: scaffolding modules, generating resource blocks, and filling in IAM policies on the fly.

Agents are now producing IaC faster than traditional manual reviews and pipelines can absorb it, and misconfigurations are reaching production faster than ever. That raises two questions: when an AI agent writes your Terraform, what does it get wrong? And what catches those mistakes before they reach production?

How often does AI get Terraform wrong?

Often enough to take seriously. The IaC-Eval benchmark from NeurIPS 2024 measured GPT-4 on 458 human-curated AWS Terraform scenarios. Pass@1 accuracy was a measly 19.36%. The same model scored 86.6% on Python coding tasks in the EvalPlus benchmark, a pass rate more than four times its infrastructure-as-code score.

The pattern hasn't reversed on newer models. The DPIaC-Eval study, accepted at FSE 2026, tested six frontier LLMs across 153 real-world IaC tasks. First-attempt deployment success ranged from 20.8% to 30.2%. Filtered compliance under standard Checkov policy checks was lower still, at 8.4%. The benchmark is CloudFormation-anchored, but the underlying dynamics (training-corpus age, missing intent context, and permissive defaults) carry across IaC formats.

A 2026 paper accepted at ICSE, TerraFormer, tested 17 frontier LLMs and explains why Terraform takes the hit harder than other IaC formats: HCL is less prevalent in LLM training data than the YAML and JSON syntaxes used by Kubernetes, CloudFormation, and Ansible, so models see fewer examples of well-formed Terraform to learn from.

Generating Terraform that looks right is one problem. Generating Terraform that's secure, current, and free of fabricated references is another. Across these studies, a small set of recognizable patterns emerges.

The four ways AI gets Terraform wrong

Each of the four failure modes below is named for what the agent does, not for the resulting vulnerability category, which is downstream of that behavior.

1. The reach for * problem

When an AI agent can't reason about the access set the team intended, it often defaults to permissive: wildcard IAM, public S3 ACLs, security groups open to 0.0.0.0/0 on SSH or RDP, etc. Restrictive configurations need boundary context that the prompt rarely supplies; permissive defaults satisfy the immediate request and ship.

A real pattern, generated:

# Generated by an AI assistant: it looks reasonable, but isn't
resource "aws_iam_policy" "app_policy" {
  name = "app-service-policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "*"
      Resource = "*"
    }]
  })
}

The scoped version the agent should have written:

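variable "bucket_arn" {
  type        = string
  description = "ARN of the S3 bucket the app service reads from and writes to"
}

resource "aws_iam_policy" "app_policy" {
  name = "app-service-policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      # Only the two object operations the service needs
      Action   = ["s3:GetObject", "s3:PutObject"]
      # Objects inside one bucket, not every resource in the account
      Resource = "${var.bucket_arn}/*"
    }]
  })
}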

In our example, SonarQube catches this with rule S6304, which flags policies granting access to all resources of an account.
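The same reach for permissive defaults shows up in network rules. Below is a minimal sketch of the scoped form of the SSH case mentioned above; the variables vpc_id and admin_cidr are hypothetical stand-ins for the target VPC and the team's administrative network range.

```hcl
# Hypothetical inputs: the target VPC and the admin network's CIDR block
variable "vpc_id" {
  type = string
}

variable "admin_cidr" {
  type        = string
  description = "CIDR block of the administrative network, such as a VPN range"
}

resource "aws_security_group" "bastion" {
  name   = "bastion"
  vpc_id = var.vpc_id

  ingress {
    description = "SSH from the admin network only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    # The permissive variant an agent tends to emit: cidr_blocks = ["0.0.0.0/0"]
    cidr_blocks = [var.admin_cidr]
  }
}
```

Because this group declares no egress block, Terraform removes the allow-all egress rule AWS adds by default; add an egress block if the host needs outbound access.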

2. The silent omission problem

Sometimes AI-generated resources look complete but skip the security-relevant blocks: for example, an aws_db_instance without storage_encrypted, an aws_cloudfront_distribution without a logging_config block, or an Azure storage account without encryption configuration. The resource comes up, but the protection doesn't. Encryption and logging aren't load-bearing for "does this resource exist?", so they're easy to omit, and the omission is invisible in code review unless the reviewer knows what to look for.

The pattern, generated:

resource "aws_db_instance" "primary" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  username          = "appadmin"
  password          = var.db_password
}

The safe equivalent:

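# Customer-managed key for the database's storage encryption
resource "aws_kms_key" "db" {
  description         = "Encryption key for the app-db RDS instance"
  enable_key_rotation = true
}

resource "aws_db_instance" "primary" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  username          = "appadmin"
  password          = var.db_password
  storage_encrypted = true
  kms_key_id        = aws_kms_key.db.arn
}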

In our example, SonarQube rule S6303 catches unencrypted RDS resources. The sibling rule S6258 catches the disabled-logging variant across AWS, Azure, and GCP services.
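Logging omissions follow the same shape. As an illustration of our own choosing (S3 server access logging, not one of the resources named above), here is a minimal sketch with hypothetical bucket names. The log bucket would also need a bucket policy that lets the S3 logging service write to it, which is omitted here.

```hcl
# Hypothetical buckets: app_data holds the data, access_logs receives the logs
resource "aws_s3_bucket" "app_data" {
  bucket = "my-app-bucket"
}

resource "aws_s3_bucket" "access_logs" {
  bucket = "my-app-bucket-access-logs"
}

# Server access logging is its own resource in AWS provider v4 and later.
# Leaving it out still yields a working bucket, just an unaudited one.
resource "aws_s3_bucket_logging" "app_data" {
  bucket        = aws_s3_bucket.app_data.id
  target_bucket = aws_s3_bucket.access_logs.id
  target_prefix = "s3-access/"
}
```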

3. The hardcoded everything problem

Agents commonly hardcode values: literal secrets in *.tf or *.tfvars files, fabricated credentials that look real enough to commit, magic numbers, and CIDR blocks inlined where the team uses variables or data sources. In our experience, agents reach for literals when the prompt doesn't supply variable scaffolding. Once a literal credential lands in Git history, it stays there, even after a later commit removes it from the file.

The generated pattern:

resource "aws_db_instance" "primary" {
  identifier = "app-db"
  username   = "appadmin"
  password   = "P@ssw0rd-Summer-2026!"
}

The safe pattern, sourced from a secret manager:

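# Reads the current value of an existing secret
# (aws_secretsmanager_secret.db is defined elsewhere in the configuration)
data "aws_secretsmanager_secret_version" "db_creds" {
  secret_id = aws_secretsmanager_secret.db.id
}

resource "aws_db_instance" "primary" {
  identifier = "app-db"
  username   = "appadmin"
  # Assumes the secret stores the password as a plain string; for a JSON secret, use
  # jsondecode(data.aws_secretsmanager_secret_version.db_creds.secret_string)["password"]
  password   = data.aws_secretsmanager_secret_version.db_creds.secret_string
}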
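Reading the secret through a data source keeps it out of Git, but Terraform still records the resulting password in its state file in plain text, so the state needs the same protection as the secret itself. For RDS, recent AWS provider versions offer an alternative: let the service generate the master password and store it in Secrets Manager. A minimal sketch, assuming a provider version that supports the manage_master_user_password argument:

```hcl
resource "aws_db_instance" "primary" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  username          = "appadmin"
  storage_encrypted = true

  # RDS generates the master password and keeps it in AWS Secrets Manager,
  # so Terraform never handles it. Cannot be combined with `password`.
  manage_master_user_password = true
}
```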

Sonar's secrets detection flags hardcoded credentials across source files in your project, and Sonar's IaC analysis includes its own credential-detection rules. Both are available in SonarQube Server and SonarQube Cloud.

4. The stale training data problem

AI agents sometimes emit attribute names and resource arguments that were correct a year or two ago but are now deprecated, renamed, or removed. The AWS provider's S3 bucket refactor is the canonical case: older provider versions let you set the ACL inline on an aws_s3_bucket, but v4 moved that setting into a dedicated aws_s3_bucket_acl resource type and deprecated the inline argument.

The pattern, generated in the pre-v4 style:

# Generated by an AI assistant: pre-v4 style; the inline acl argument has been deprecated since AWS provider v4
resource "aws_s3_bucket" "app_data" {
  bucket = "my-data-bucket"
  acl    = "private"
}

The current pattern:

resource "aws_s3_bucket" "app_data" {
  bucket = "my-data-bucket"
}

resource "aws_s3_bucket_acl" "app_data" {
  bucket = aws_s3_bucket.app_data.id
  acl    = "private"
}
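The provider schema is not the only thing that drifts. Since April 2023, AWS creates new S3 buckets with ACLs disabled (Object Ownership set to BucketOwnerEnforced), and an aws_s3_bucket_acl resource can be rejected on such a bucket unless ownership controls re-enable ACLs. For a bucket that only needs to stay private, one alternative sketch, used in place of the aws_s3_bucket_acl resource rather than alongside it, keeps ACLs disabled and blocks public access explicitly:

```hcl
resource "aws_s3_bucket_ownership_controls" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  rule {
    # ACLs disabled: the bucket owner owns every object
    object_ownership = "BucketOwnerEnforced"
  }
}

# Explicitly block every form of public access, whatever the account defaults are
resource "aws_s3_bucket_public_access_block" "app_data" {
  bucket                  = aws_s3_bucket.app_data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```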

TerraFormer explains the mechanism: training corpora age, Terraform providers don't.
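One mitigation worth considering is to pin the provider's major version in the configuration the agent works from, so the agent and your reviewers both know which schema applies. A minimal sketch; the constraint shown is an example, not a recommendation:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0" accepts any 5.x release (>= 5.0, < 6.0); adjust it to the
      # major version your team actually runs.
      version = "~> 5.0"
    }
  }
}
```

terraform init also records the exact selected provider version in .terraform.lock.hcl, and including that file in the agent's context gives it the same signal.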

Why your existing checks miss these problems

terraform validate checks that your HCL is syntactically valid and internally consistent. HashiCorp's docs are explicit: “It does not validate remote services, such as remote state or provider APIs.” terraform plan previews the state delta the provider will apply, but it doesn't evaluate whether the resulting configuration is secure or appropriate.

A syntactically perfect aws_s3_bucket with a public ACL passes both. So does an aws_iam_policy with Action = "*", and so does an aws_db_instance with no storage_encrypted argument. Apart from schema errors and deprecation warnings for stale arguments, the failure modes highlighted in this article fall into a gap these tools leave open by design. Catching them requires analysis that understands what the configuration means, not just whether it parses.

SonarQube as the IaC verification layer

SonarQube's Infrastructure as Code analysis covers Terraform, Azure Resource Manager, and CloudFormation in both SonarQube Server and SonarQube Cloud, and it also analyzes adjacent platform and pipeline configurations such as Ansible, Docker, and Kubernetes. The analyzer parses your HCL into an AST and applies rules covering AWS, Azure, and GCP misconfiguration patterns, including the four failure modes above. The rule corpus is maintained as new provider versions ship (it was recently expanded in SonarQube Server 2026.1 LTA), which is how detection keeps pace with the schema drift behind failure mode 4.

The same rules run at four touchpoints:

  • In the IDE: SonarQube for IDE flags issues as the Terraform is written.
  • Inside the agent session: Agentic Analysis, on SonarQube Cloud, lets an AI agent ask SonarQube to check its work in agent modes and CLIs before the code ever reaches a PR.
  • On the pull request: SonarQube Cloud and SonarQube Server decorate the PR with rule findings across GitHub, GitLab, Azure DevOps, and Bitbucket.
  • At the merge gate: the quality gate decides pass or fail. If you run SonarQube Cloud with Terraform Cloud, the Run Task integration checks the latest gate status at the pre-plan stage.

The same Sonar survey that put AI's authoring share at 42% also found that SonarQube users report being 44% less likely to experience outages caused by AI-generated code. That gap is what a consistent quality gate buys you: every commit that touches Terraform runs the same rules and faces the same merge decision.

The Agent Centric Development Cycle, applied to infrastructure

AI code generation tools are moving so fast that manual IaC reviews and CI/CD pipelines struggle to keep pace with the volume of incoming changes. Infrastructure teams need a verification layer that catches mistakes and misconfigurations as they are introduced, not after deployment.

That's the Agent Centric Development Cycle in practice for infrastructure. SonarQube applies the same rules, the same severity, and the same merge decision to every Terraform commit. Whether an AI agent ships a wildcard IAM policy, omits an encryption block, or inlines a literal credential, the mistake gets caught at the source, before it enters production.

What's next

Build trust into every line of code

Integrate SonarQube into your workflow and start finding vulnerabilities today.
