What NIST should know when updating the SSDF for AI

Luis Villa

VP Legal, focused on product

TL;DR overview

  • The Secure Software Development Framework must evolve to address non-deterministic AI code generation, massive agentic code volumes, and new active adversarial attack surfaces.
  • An upcoming SSDF update should mandate automated, independent verification layers because AI models cannot reliably self-certify their own probabilistic outputs.
  • Reviewing AI-generated code requires scaling up supply chain controls to handle automated workflows that routinely exceed human review capacities.
  • Combining deterministic AI code review, SCA, and CI validation ensures consistent code quality and security outcomes before software is shipped.

The Secure Software Development Framework was designed for a world where developers write code. That world is changing fast—and the update needs to reflect it.

Sonar analyzes more than 750 billion lines of code every day across 7 million software developers. We have a front-row seat to what AI is actually doing to software quality and security at scale. If NIST takes up revising SP 800-218 (SSDF) to cover AI technologies, we want to offer some observations grounded in that data—not as theoretical concerns, but as patterns we are seeing in production codebases right now.

The short version: the SSDF's current framing assumes deterministic tooling, human-scale code volumes, and software developers who can be trained to follow secure coding practices. None of those assumptions hold for AI. The update needs to address three structural problems, and then translate them into specific changes to existing controls.

The three structural problems

1. Non-determinism is not a quirk; it's a category shift

The SSDF's existing controls assume tools behave deterministically. Run the same test suite against the same code twice, you get the same result. That assumption is foundational to how the framework thinks about code verification, review, and testing.

AI breaks it. A prompt that produced safe, correct code yesterday has no guarantee of doing so today. This is not a limitation of any specific model; it is a structural property of probabilistic systems. The consequence is direct: you cannot rely on a code-generating tool to self-certify its own output. A deterministic, independent verification layer is not optional; it is the only technically valid substitute for the assumption of deterministic behavior that the current SSDF takes for granted.
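To make the idea of a deterministic, independent verification layer concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real analyzer: the `BANNED_CALLS` policy and the `find_banned_calls` function are hypothetical names invented for this example. The point is only that the check depends on the code itself, not on what the generating tool says about it, so the same input always yields the same findings.

```python
import ast

# Hypothetical policy for this sketch: builtins treated as unsafe to call.
BANNED_CALLS = {"eval", "exec"}


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for every direct call to a banned builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                findings.append((node.lineno, node.func.id))
    # Sorting makes the report order independent of traversal order.
    return sorted(findings)


# Stand-in for a model's output; the check never asks the model anything.
generated = "x = input()\nprint(eval(x))\n"
first_run = find_banned_calls(generated)
second_run = find_banned_calls(generated)
```

Running the check twice on the same text gives identical results, which is the property a probabilistic generator cannot offer about its own output.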

The same logic applies to AI used as a code reviewer. LLMs reviewing their own output have documented consistency problems and self-review blind spots that differ fundamentally from systematic code analysis. AI-based review is a useful complement to, but not a substitute for, deterministic verification, and the SSDF should say so explicitly.

2. Developer review does not scale to agentic output

AI-generated PRs are substantially larger than developer-written ones. LLM patches modify an average of 14x more code than traditional tools, and agentic workflows correlate with a 3–5x increase in lines added per project. Thirty-eight percent of software developers already report that reviewing AI-generated code is more effortful than reviewing developer-written code. And research from Microsoft confirms what many already know intuitively: code review quality degrades sharply past ~20 files—a threshold agentic PRs routinely exceed by an order of magnitude.

This is not a staffing problem. It is a structural mismatch between a framework designed around human-velocity code production and a reality where AI can generate more code in an afternoon than a team ships in a sprint. SSDF controls that assume peer review as a primary quality gate need to explicitly account for automated code verification tooling. That tooling is not a supplement to review; it is the mechanism that makes review tractable at all.
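One way to picture automated verification as the mechanism that makes review tractable is a routing rule that sits in front of human reviewers. The sketch below is hypothetical: the `PullRequest` type, the route names, and the use of the ~20-file figure as a hard cutoff are all assumptions made for illustration, not a recommended policy.

```python
from dataclasses import dataclass

# Assumption: cutoff borrowed from the ~20-file figure cited above.
REVIEW_DEGRADATION_THRESHOLD = 20


@dataclass
class PullRequest:
    files_changed: int
    automated_checks_passed: bool


def review_route(pr: PullRequest) -> str:
    """Decide how a pull request reaches human reviewers."""
    if not pr.automated_checks_passed:
        # Reviewers never spend time on code that failed verification.
        return "blocked"
    if pr.files_changed > REVIEW_DEGRADATION_THRESHOLD:
        # Too large to read end to end: reviewers focus on flagged areas.
        return "review-findings-and-hotspots"
    return "full-review"
```

For example, `review_route(PullRequest(files_changed=200, automated_checks_passed=True))` sends a large agentic PR to a focused review instead of asking a person to read every file.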

The volume problem also cascades into the supply chain. More code velocity means more dependencies introduced per unit time. AI-assisted vulnerability remediation can accelerate dependency churn further, by identifying more CVEs and in turn causing more updates and upgrades. In other words, the core challenge of supply chain risk has not changed since SSDF 1.0, but its volume and velocity have. Existing controls (PO.3, PS.3) remain the right tools; the SSDF update just needs to make clear they must scale to match AI-driven development pace.

3. AI is an active attack surface, not just an unreliable tool

The first two problems are about AI being untrustworthy. The third is about AI being exploitable. AI agents with access to code repositories, CI/CD systems, and deployment pipelines introduce adversarial attack surfaces the SSDF has not had to reckon with: prompt injection (where malicious instructions hidden in files, tickets, or API responses cause an agent to execute attacker-controlled actions), memory-based privilege escalation across sessions, and backdoor persistence through adversarially crafted inputs that survive safety fine-tuning. These are documented attack classes, not theoretical concerns (see Anthropic's Zero Trust for AI Agents, May 2026), and controls for agentic systems should reflect this expanded threat model.

What this means for specific SSDF controls

These three structural problems map directly to gaps in existing SSDF language. Here is where the update should focus:

PW.5 (Create Source Code by Adhering to Secure Coding Practices) was designed around a developer who can be trained. When AI is writing the code, that splits into two distinct problems.

First: context injection replaces training. Secure coding practices can be fed to an AI agent as pre-generation context. However, this requires deliberate, structured effort: architectural rules, security constraints, and organizational standards injected into the agent's prompt at the right moment. This is fundamentally different from how organizations train developers, and the SSDF should recognize it as a distinct implementation path, not just an analogue of software developer training. 

Second: PW.7 and PW.8 shift from recommended to mandatory. Even with well-structured context injection, non-determinism means the generating tool cannot self-certify compliance with the standards it was given. Code review (PW.7) and testing (PW.8) therefore become mandatory, rather than optional, when AI is the author.
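The two halves of this split can be sketched together. In the example below, `build_prompt` shows context injection as a deliberate, structured step, and `may_merge` shows review and testing as hard requirements for AI-authored code. Both function names, the prompt layout, and the sample rules are hypothetical and exist only for illustration; real agent tooling will expose its own way of supplying context.

```python
def build_prompt(task: str, rules: list[str]) -> str:
    """Place organizational rules ahead of the task in the agent's prompt."""
    if not rules:
        # Generating with no security context is treated as a process error.
        raise ValueError("refusing to generate without security context")
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"Follow these organizational rules:\n{numbered}\n\nTask: {task}"


def may_merge(ai_authored: bool, reviewed: bool, tested: bool) -> bool:
    """AI-authored code needs both review (PW.7) and testing (PW.8)."""
    if ai_authored:
        return reviewed and tested
    # Placeholder: human-authored changes follow the existing policy,
    # which this sketch does not model.
    return True


prompt = build_prompt(
    "Add a login endpoint",
    ["Use parameterized SQL queries", "Never log credentials"],
)
```

Note that a well-formed prompt does not change the merge decision: `may_merge(ai_authored=True, reviewed=True, tested=False)` is still `False`, because supplying the rules is not evidence that the output followed them.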

Verification must be layered. Different types of review, like AI code review, SCA, or CI validation, each cover failure modes the others do not. This is not redundancy; it is the only combination that is consistent, auditable, and repeatable at generation speed.
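A toy version of layering, assuming two deliberately simple checks, shows why each layer matters. Everything here is hypothetical: `quality_layer` stands in for static analysis, `dependency_layer` stands in for SCA, and the `ALLOWED_IMPORTS` allowlist is invented for the example. Real tools are far more thorough; the sketch only shows the shape of the combination.

```python
import ast
from typing import Callable

Layer = Callable[[str], list[str]]  # a layer maps source code to findings

ALLOWED_IMPORTS = {"json", "hashlib"}  # hypothetical dependency allowlist


def quality_layer(source: str) -> list[str]:
    """Toy static analysis: flag bare `except:` handlers."""
    return [
        f"line {node.lineno}: bare except"
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def dependency_layer(source: str) -> list[str]:
    """Toy SCA: flag `import x` statements outside the allowlist."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name not in ALLOWED_IMPORTS:
                    findings.append(f"line {node.lineno}: unapproved import {alias.name}")
    return findings


LAYERS: dict[str, Layer] = {"quality": quality_layer, "dependencies": dependency_layer}


def verify(source: str) -> dict[str, list[str]]:
    """Run every layer and keep per-layer findings as an audit record."""
    return {name: layer(source) for name, layer in LAYERS.items()}


def passed(report: dict[str, list[str]]) -> bool:
    """The change passes only if no layer reported anything."""
    return all(not findings for findings in report.values())


sample = "import pickle\ntry:\n    pass\nexcept:\n    pass\n"
report = verify(sample)
```

Each layer catches something the other cannot see: the dependency check knows nothing about error handling, and the quality check knows nothing about which packages are approved.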

This matters because the safety net most software developers rely on is insufficient. Sonar's own research (analyzing over 4,400 AI-authored code submissions with SonarQube) found that passing unit tests has no correlation with code security or quality. The same research confirms a broader point worth making explicit in the SSDF: readable, maintainable code is also more likely to be secure. Code quality and security are not separate concerns, and AI makes both more urgent simultaneously.

The code verification gap is empirically documented. Sonar's data shows that coding models, left unverified, produce roughly 1,200 security issues per million lines of code analyzed. Ninety-six percent of software developers don't fully trust AI-generated code, yet only 48% consistently verify it before committing (State of Code report). The SSDF update is an opportunity to close that gap by prescription, not persuasion.

How does the NIST SSDF align with the EU Cyber Resilience Act?

The SSDF update is also an opportunity NIST should not miss to deliberately align with EU Cyber Resilience Act requirements—particularly around vulnerability handling and secure development documentation. Organizations following SSDF should not be forced into duplicative or conflicting compliance work for software shipped into the EU market. CRA implementation guidance is actively being developed; building in harmonization now is far cheaper than retrofitting it later.

How should organizations verify AI-generated code before it ships?

Across all of these updates, the organizing principle should shift from tracking AI use to validating outcomes. Use tracking (which tools were used, by whom, when) is table stakes—especially for organizations subject to the EU AI Act—but it is not sufficient. What customers are demanding, and what the SSDF should reflect, is outcome guarantees: proof that code, regardless of how it was generated, met defined security and quality standards before it shipped.
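The difference between use tracking and outcome validation can be shown with a small record type. The sketch below is an assumption-laden illustration: the `outcome_record` function, its field names, and the zero-findings default are hypothetical, and a real attestation would also need signing and tool version information that this example leaves out.

```python
import hashlib
import json


def outcome_record(source: str, findings: dict[str, int], max_allowed: int = 0) -> dict:
    """Bind verification results to the exact code that was checked."""
    return {
        # The hash identifies the code, regardless of who or what wrote it.
        "sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "findings": findings,
        "passed": all(count <= max_allowed for count in findings.values()),
    }


record = outcome_record("print('hi')\n", {"security": 0, "quality": 0})
serialized = json.dumps(record, sort_keys=True)
```

The record says nothing about which tool generated the code. It only states what was checked and whether it met the standard, which is the kind of evidence an outcome-based requirement would ask for.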

Sonar's Guide-Verify-Solve framework illustrates how that outcome validation works in practice: providing the right context before generation, applying deterministic automated analysis after, and enabling targeted remediation. But the principle is framework-agnostic: whatever the implementation, the SSDF needs to establish that in an AI-assisted world, verified outcomes are the standard, not a best practice.
