When linting is not enough

10 min read

Choosing the right code analysis for AI-assisted development

TL;DR overview

  • For AI-assisted development, modern software teams require multilayered verification that goes beyond basic linting.
  • Multilayered verification engines detect deep semantic bugs, like SQL injection, through data flow and taint analysis over program graphs.
  • Automated security tools mitigate supply chain risks by identifying malicious packages and hardcoded secrets.
  • Architecture-as-code enforcement prevents structural decay and technical debt caused by verbose, agent-generated code.

AI coding tools have changed a basic assumption of software development. When an assistant can generate hundreds of lines of valid, well-formatted code in seconds, “the developer wrote it and reviewed it” no longer describes what actually happened. When an agent can modify 20 files across a service boundary in an hour, “the team reviewed the pull request” does not mean what it used to.

The risk profile has shifted along three axes simultaneously. The bugs are deeper. The attack surface is wider. The structural decay is faster. A linter addresses none of these. This article examines the three categories of risk that agent-centric development (AC/DC) introduces and why each requires analysis that operates beyond pattern matching against syntax.

What linters do, and where they stop

A linter parses source code into an abstract syntax tree (AST) and applies rules against its structure. Rules operate as pattern matchers: they see what expressions exist and how they nest, but nothing about what the code does when it runs. This gives linters a well-defined and genuinely useful scope. They catch syntax errors, undefined variables, unused imports, style violations, and simple type mismatches. They run in milliseconds. For a team with no static analysis at all, adopting a linter produces immediate quality improvements.
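
To make that boundary concrete, here is a minimal sketch (the function and values are illustrative): the first issue is visible in the syntax tree, the second requires knowing what the values mean at runtime.

import os  # a linter flags this immediately: imported but never used

def apply_discount(price: float, rate: float) -> float:
    # a linter sees valid syntax here and stays silent, but the sign is
    # wrong: the discount is added to the price instead of subtracted
    return price + price * rate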

However, linters have no model of program execution, no representation of how values move through a system, and no ability to reason about what happens when function A in module B passes a value to function C in module D. That kind of analysis requires an engine that builds graphs of program behavior and reasons over them mathematically.

A multilayered code verification engine like SonarQube operates at this mathematical reasoning level, covering issues from syntactic pattern matching through control flow, data flow, and taint analysis, in a single integrated platform. A team adopting it gets linting as one layer within a much deeper stack. 

This allows development teams to address a range of critical security, maintainability, and reliability issues that would otherwise be missed. 

Three key risk vectors that linters miss:


1. Deep bugs and vulnerabilities that syntax cannot reveal

Static analysis is a spectrum. Linting occupies the first level. Each level beyond it exists because there are real, consequential categories of bugs that the previous level cannot detect.

Control flow analysis

Consider a function where one branch initializes a database connection and another does not, but both paths later attempt to use that connection. Or a function where a condition is always true, making an entire else branch dead code that silently hides the logic it was supposed to provide. A linter sees individual statements. It does not model the relationships between them.

Detecting these requires constructing a control-flow graph (CFG), a directed graph of every possible execution path through a function. That means every branch, loop, exception handler, and early return. The analysis engine walks each path to determine whether variables are initialized before use, whether conditions are satisfiable, and whether all branches are reachable. A CFG answers a question linting cannot: “what can this code actually do?”
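
A minimal sketch of both failure modes, with illustrative names:

def summarize(values: list[int], verbose: bool) -> str:
    if verbose:
        label = "summary"
    total = sum(values)
    # walking the CFG exposes the verbose=False path, on which 'label'
    # was never assigned and the f-string raises UnboundLocalError
    if total >= 0 or total < 0:  # satisfiability analysis: always true...
        return f"{label}: {total}"
    return "unreachable"  # ...so this return is dead code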

AI-generated code is particularly prone to these issues. AI coding tools frequently generate redundant guard clauses, impossible condition combinations, and dead branches that look plausible but never execute. The code reads well. It just does not behave as expected.

Data flow analysis

A more dangerous category of bug emerges when the problem is not about which paths execute, but about what data does as it moves through them. Consider a function that receives a user object, extracts the user’s role, passes it through a formatting function, then uses it to construct a file path. Is the role validated before it reaches the file system call? The answer depends on tracing the value across multiple assignments and function boundaries.

This is data flow analysis: constructing a data-flow graph (DFG) that models how values are assigned, transformed, and consumed across functions, files, and modules. Without it, you cannot determine whether a null can propagate from a failed database lookup to a crash three function calls later, or whether two concurrent code paths are operating on stale vs. fresh copies of the same data. These bugs cause production incidents, and they are invisible at the AST level because the syntax at every individual point is perfectly valid.
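
A minimal sketch of the null-propagation case, with hypothetical names; every line is individually valid, and only a graph connecting the three functions reveals the crash:

USERS = {1: {"role": "admin"}}  # toy user store

def find_user(user_id: int):
    return USERS.get(user_id)  # a miss returns None instead of raising

def format_role(user) -> str:
    return user["role"].upper()  # nothing local says 'user' can be None

def greet(user_id: int) -> str:
    user = find_user(user_id)  # None enters the data flow on a miss
    return f"Hello, {format_role(user)}"  # and crashes one hop later

# greet(2) raises TypeError; the defect spans three functions, while
# the syntax at each individual point is perfectly valid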

Taint analysis

Taint analysis applies data flow reasoning specifically to security. The engine identifies sources (places where untrusted data enters: HTTP parameters, file contents, environment variables) and sinks (places where data is consumed dangerously: SQL queries, shell commands, file system operations). It then applies graph reachability algorithms to determine whether any execution path connects a source to a sink without passing through an adequate sanitizer. This is mathematical reasoning in the formal sense—the codebase is modeled as a graph, and properties of that graph are computed and checked against security conditions. The question is not “does this line look dangerous?” but “can untrusted data reach this dangerous operation through any sequence of calls?”
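
The reachability computation itself is ordinary graph traversal. The following is not SonarQube’s implementation, just a minimal sketch of the idea over a toy flow graph with hypothetical node names:

from collections import deque

# each edge means "a value flows from A to B"
FLOWS = {
    "http_param": ["format_input"],
    "format_input": ["build_query"],
    "build_query": ["db_execute"],
}
SANITIZERS = set()  # {"parameterize"} on the path would break the taint

def taint_reaches(source: str, sink: str) -> bool:
    # breadth-first search from the source, refusing to pass sanitizers
    queue, seen = deque([source]), {source}
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in FLOWS.get(node, ()):
            if nxt not in seen and nxt not in SANITIZERS:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(taint_reaches("http_param", "db_execute"))  # True: a vulnerable path exists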

Consider this Python function, exactly the kind of code an AI assistant might generate:

def get_user_profile(username: str) -> dict:
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query).fetchone()

A linter finds nothing wrong. The syntax is valid, the type hint is present, the f-string is well formed. SonarQube identifies username as a taint source (external input), traces it through the f-string interpolation into db.execute() (a SQL sink), determines no parameterization or sanitization occurs on that path, and raises a confirmed SQL injection vulnerability.
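
The fix that breaks the taint path is parameterization. A minimal sketch, assuming the same db handle and a standard DB-API driver (the placeholder syntax varies by driver):

def get_user_profile(username: str) -> dict:
    # the driver transmits the value separately from the SQL text, so no
    # execution path carries tainted data into the query string
    query = "SELECT * FROM users WHERE username = ?"
    return db.execute(query, (username,)).fetchone()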

This example is simple enough that an experienced reviewer might catch it. Real taint flows are not. In a documented analysis of the OpenAPI Generator project, SonarQube traced a taint flow that propagated user-controlled data through 28 distinct steps across multiple files before reaching a dangerous file system operation, leading to the discovery of CVE-2024-35219, an arbitrary file read and deletion vulnerability rated CVSS 8.3. No linter rule, and no practical code review process, would catch a 28-step cross-file taint flow. It requires graph traversal across a model of the full program.

The scale of the problem is quantifiable. A Carnegie Mellon University study (Zhao et al., 2025) benchmarked an AI coding agent on 200 real-world feature request tasks drawn from open-source projects. Although 61 percent of the agent’s solutions were functionally correct, only 10.5 percent were secure. Roughly 80 percent of solutions that passed behavioral tests still failed security tests, with common failures including timing side-channels in authentication checks and redirect vulnerabilities that allowed header manipulation. Functional correctness does not imply security: code that works is not necessarily code that is safe. AI-generated code passes linting reliably. The vulnerabilities it introduces are semantic, not syntactic. Mathematical reasoning over a program model is designed to catch them.

Even at the syntactic and semantic levels where linters also operate, SonarQube’s analysis covers categories that typical linters do not target: null pointer dereference detection, resource leak detection (file handles, database connections, or streams never closed on all execution paths), exception handling anti-patterns (swallowed exceptions, overly broad catch blocks), and redundant logic detection (identical if/else branches, conditions that are always true). These are the bugs that cause production incidents, and detecting them requires reasoning about program behavior, not just program structure.
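
The resource-leak case shows why path reasoning is required even in these seemingly simple categories; a minimal sketch:

import json

def read_config(path: str) -> dict:
    f = open(path)
    # if json.loads raises on malformed input, execution leaves the
    # function before f.close() runs, and the handle leaks on that path;
    # a linter sees four valid statements, a CFG sees the leaking path
    data = json.loads(f.read())
    f.close()
    return data

# the structural fix, 'with open(path) as f:', closes on every path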


2. Supply chain security in an agent-centric development world

The second risk vector is not in the code a team writes, but in the code it imports. AI assistants routinely suggest dependencies. Agents install them autonomously. The supply chain attack surface has expanded accordingly, and the trend is accelerating.

Malicious packages on PyPI and npm are no longer rare occurrences. Typosquatting campaigns, dependency confusion attacks, and packages that exfiltrate credentials on install have become a persistent feature of the ecosystem. In March 2026 alone, attackers compromised the Axios npm package (over 100 million weekly downloads) through social engineering of its lead maintainer, publishing versions that installed a remote access trojan. Days earlier, the LiteLLM AI infrastructure library on PyPI was compromised through a poisoned CI/CD pipeline, exfiltrating cloud credentials from every environment where the package was installed. An AI assistant that suggests the wrong package name, or an agent that resolves a dependency to a malicious fork, introduces a vulnerability that no amount of source code analysis will catch by examining the project’s own code alone. The problem is not in the code you write. It is in the code you trust.

SonarQube Advanced Security addresses this with several capabilities that operate at different points in the supply chain. Malicious package detection, drawing on the Open Source Security Foundation (OSSF) Malicious Packages dataset, raises blocker-level alerts when a known malicious package appears in a project’s dependency tree on PyPI and npm. Software composition analysis (SCA) maps dependencies to known CVEs, surfacing vulnerability information directly in the IDE and CI pipeline so developers see the risk at the point of decision rather than in a separate security report weeks later. For organizations subject to compliance requirements like the EU Cyber Resilience Act or US executive orders on software supply chain security, SCA also provides the foundation for generating a Software Bill of Materials (SBOM) — an increasingly mandatory inventory of every component in a deployed system.

SonarQube’s secrets detection covers more than 450 patterns for API keys, tokens, and credentials, the kind of sensitive data that agents are particularly prone to hardcoding. The SonarQube CLI, designed to run as a pre-commit hook, catches leaked credentials before they ever enter version control. For organizations managing incident response, this shifts the timeline from “discovered in a scan after merge” to “blocked before commit.”
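
The pattern being matched is a credential literal committed in source; the token below is fabricated for illustration:

import os

# fabricated key for illustration: a provider-style literal like this
# matches a detection pattern and is blocked at the pre-commit hook
API_KEY = "sk_live_51H8aXbFAKEFAKEFAKEFAKE"

# the remediation keeps the secret out of version control entirely
API_KEY = os.environ.get("API_KEY", "")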

This matters because agent-centric development compresses the window between “dependency added” and “code deployed.” When a human developer adds a dependency, there is typically a moment of judgment to determine whether the package is trustworthy. When an agent adds one as part of a larger autonomous task, that judgment step may not exist. The verification has to be automated, and it has to operate at the dependency level, not just the source level.


3. Architectural sanity and the compounding cost of AI slop

The third risk vector is the most insidious because it does not announce itself as a bug or a vulnerability. It announces itself as a codebase that gradually becomes more complex and harder to work with until agents themselves start failing.

Today, AI assistants are stateless across files and sessions. They regenerate similar logic independently in different parts of a codebase, producing near-duplicate implementations with subtle behavioral differences. They introduce dependencies between modules that were designed to be independent, and they generate deeply nested control flow that passes lint checks but cannot be safely maintained. Researchers in the AI code quality space describe the result as “comprehension debt” (code that works but cannot be understood) and “context debt” (implementations that ignore existing patterns because the assistant lacked awareness of them).

The pattern is familiar to teams that have adopted agents. Even if the first 80 percent of a task gets done quickly, the mess of architectural inconsistencies and compounding technical debt starts surfacing shortly after. Agents enter loops of fixing one thing and breaking another, a form of whack-a-mole where each fix introduces a new inconsistency. A 2026 benchmark study (SlopCodeBench) formalized this by testing 11 coding agents on iterative development tasks where specifications evolve over time, the way real software actually works. Quality degraded in 80 percent of trajectories. Agent-generated code was 2.2 times more verbose than human code. No agent solved any problem end-to-end. The root cause is that no verification of structural quality happened along the way. By the time the decay is visible, unwinding it is expensive.

SonarQube addresses this at multiple levels. Cognitive complexity scoring, calibrated to human readability rather than cyclomatic complexity, flags functions that have become too complex to safely modify. Token-level duplication detection across all projects catches the near-duplicate implementations that agents produce, even when variable names differ. The technical debt ratio (TDR) expresses remediation cost as a percentage of development cost, making invisible decay visible and quantifiable.
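
The calibration difference matters because nesting, not branch count, is what defeats human readers; a minimal sketch:

from dataclasses import dataclass, field

@dataclass
class Order:
    paid: bool = False
    items: list = field(default_factory=list)

def can_ship(order) -> bool:
    # three nested levels: the reader carries more context at each
    # depth, and cognitive complexity penalizes exactly that nesting
    if order is not None:
        if order.paid:
            if order.items:
                return True
    return False

def can_ship_flat(order) -> bool:
    # same behavior as guard clauses: similar cyclomatic complexity,
    # far lower cognitive complexity
    if order is None or not order.paid or not order.items:
        return False
    return True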

The most direct answer to architectural drift is SonarQube’s architecture management capability. This feature allows teams to define their intended architecture as code: specifying components and the allowed dependencies between them. SonarQube reverse-engineers the actual component relationships from the codebase and detects violations, places where the code has drifted from the intended design. These violations surface as maintainability issues in quality gates, meaning architectural drift blocks a pull request the same way a vulnerability would. The feature currently supports Java, JavaScript, TypeScript, Python, and C#.
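
The underlying check is conceptually simple. The sketch below is not SonarQube’s syntax; it illustrates the idea of dependency rules evaluated against a module’s imports, with hypothetical component names:

import ast

# intended architecture as data: which components may depend on which
ALLOWED = {
    "api": {"services"},
    "services": {"repository"},
    "repository": set(),
}

def check_imports(component: str, source: str) -> list[str]:
    """Return imports that cross a boundary the rules do not allow."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            target = node.module.split(".")[0]
            if (target in ALLOWED and target != component
                    and target not in ALLOWED[component]):
                violations.append(node.module)
    return violations

# an agent-generated shortcut from the data layer straight into the API
print(check_imports("repository", "from api.routes import handler"))
# -> ['api.routes']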

This is particularly valuable in agentic workflows. An agent modifying 20 files has no awareness of architectural boundaries unless those boundaries are enforced programmatically. Architecture-as-code makes the intended structure machine-readable and verifiable at each step of an agent’s work, not just at the end. The alternative is discovering after hundreds of agent-generated commits that the module boundaries have dissolved into a monolith that no agent or human can safely modify.

Quality gates tie all of these capabilities together operationally. A gate defines conditions that must pass before a pull request can merge: zero new vulnerabilities, zero unreviewed security hotspots, no architectural violations, duplication below a threshold. Teams make deliberate decisions about existing issues while ensuring new code meets their defined standard. In an agentic workflow, the quality gate is the automated reviewer that does not lose focus after the fourteenth file change.
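
Conceptually, a gate is a set of conditions evaluated against metrics for new code; the sketch below uses illustrative metric names, not SonarQube’s API:

# illustrative gate conditions on new code
GATE = {
    "new_vulnerabilities": lambda v: v == 0,
    "unreviewed_hotspots": lambda v: v == 0,
    "architecture_violations": lambda v: v == 0,
    "duplicated_lines_pct": lambda v: v < 3.0,
}

def gate_passes(metrics: dict) -> bool:
    # every condition must hold, or the pull request does not merge
    return all(check(metrics[name]) for name, check in GATE.items())

print(gate_passes({
    "new_vulnerabilities": 0,
    "unreviewed_hotspots": 0,
    "architecture_violations": 1,  # the agent crossed a boundary
    "duplicated_lines_pct": 1.2,
}))  # -> False: the merge is blocked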

The core distinction

Linters and multilayered verification engines are not competing for the same job. A linter is the right tool for syntactic quality: fast feedback on formatting, style, and obvious anti-patterns. It is fast precisely because it does not build a model of program execution. It is limited for the same reason. It is a necessary first layer, but not complete.

AI-assisted and agent-centric development has shifted the risk along three vectors that linting cannot reach. The bugs are deeper: semantic vulnerabilities that span files and functions, invisible at the syntax level, detectable only through mathematical reasoning over program graphs. The attack surface is wider: supply chain threats from malicious packages, leaked secrets, and vulnerable dependencies that agents introduce without human judgment. The structural decay is faster: architectural drift, compounding duplication, and complexity that accumulates at generation speed until the codebase resists further modification.

Whether that depth of analysis is necessary depends on your codebase, your team, and your risk tolerance. For teams deploying AI tools in production systems at scale, the question is no longer whether linting is enough. It is how quickly the gaps become visible.
