Blog post

What is 'taint analysis' and why do I care?

February 10, 2020

G. Ann Campbell

Community Manager

TL;DR overview

Taint analysis is a static analysis technique that tracks untrusted user input (sources) through application code paths to security-sensitive operations (sinks), identifying injection vulnerabilities without executing the code.
The technique detects critical vulnerability classes including SQL injection, cross-site scripting, command injection, and path traversal by modeling all possible data flows across functions and files.
SonarQube's taint analysis is cross-function and cross-file, reducing false positives by only raising issues when a proven, exploitable path exists from source to sink without sufficient sanitization.
Taint analysis is available in SonarQube Server and SonarQube Cloud commercial editions and complements traditional SAST by finding deeply hidden vulnerabilities that pattern-based detection misses.

He covered a wet, hacking cough with his hand, then pushed through the door off the ward. I reached the same door, and hesitated. The Cougher had just tainted the door with his germs. If I touched it, I'd be tainted too.

---

These days we all know what germs are and how they're passed from person to person, and from hand to door to hand. The fact is that, particularly in cold and flu season, you have to regard every doorknob and every elevator button as suspicious. You always wash your hands afterward, because you never know which doorknob is tainted with germs. You have to assume they all are.

And the same is true for the data you get from your users. Not every user is a bad actor. In fact, most aren't. But some are. Some want to infect your systems - to get access to your users, their passwords, their mothers' maiden names, and anything else they can sell - and they'll do anything to accomplish that. So you have to treat every user's data as if it contained The Plague, and sanitize accordingly.

Unfortunately, in large systems that's easier said than done. First you have to find all the places you accept data from users, and then you have to sanitize the data before you use it. The hard part is making sure you've found all the sources of user data and intervened before any kind of use. That's where taint analysis comes in.

Taint analysis identifies every source of user data - form inputs, headers, you name it - and follows each piece of data all the way through your system to make sure it gets sanitized before you do anything with it. And by "all the way through" I mean all the way through. Here's a simple example from the OWASP Benchmark project, an intentionally insecure application built to test analyzers:

Here, SonarQube Server shows us that:

At line 47, data provided by the user is retrieved and assigned to the variable 'param'. 'param' is now tainted by user input.
Line 51, 'param' gets manipulated - but not sanitized! It's still tainted.
Line 54, 'param' is incorporated into the value of 'sql'. 'sql' is now tainted too!
Lines 58-59, 'sql', which is tainted with raw user input, is sent to the database :-(
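
To make those four steps concrete, here is a minimal sketch of the same flow. It is not the OWASP Benchmark code: it is a Python analogue using the standard `sqlite3` module, and the `users` table, the `request` dictionary, and the function names are all assumptions made up for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_tainted(conn: sqlite3.Connection, request: dict) -> list:
    # Source: the value comes straight from the user, so it is tainted.
    param = request.get("name", "")
    # Manipulated, but not sanitized: stripping whitespace does not remove taint.
    param = param.strip()
    # The taint propagates into the SQL string through concatenation.
    sql = "SELECT name, secret FROM users WHERE name = '" + param + "'"
    # Sink: the tainted string is executed by the database.
    return conn.execute(sql).fetchall()


def lookup_safe(conn: sqlite3.Connection, request: dict) -> list:
    param = request.get("name", "").strip()
    # The query text is constant; the user value is bound as data, not as SQL.
    sql = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(sql, (param,)).fetchall()


conn = build_demo_db()
attack = {"name": "nobody' OR '1'='1"}
leaked = lookup_tainted(conn, attack)  # the injected OR clause matches every row
blocked = lookup_safe(conn, attack)    # no user has that literal name
```

In this sketch the tainted version returns every row in the table for the crafted input, while the parameterized version returns nothing, because the whole input is compared as a plain name.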

Of course, in that example, everything is contained in a single method. The problem is easy to spot... if you know what to look for... and where to look... and that you should look.

So let's look at something slightly more complicated. This one's from Securibench micro, another test-the-analyzers project:

Here, in the 'doGet' method, user-supplied data is stored in a collection. Then, in another method in a different file, it's retrieved from the collection and sent to the database. Again, without being sanitized. In the SonarQube Server UI this example is easy to understand because all the relevant files are shown together, with each propagation of the taint highlighted, but it would be much harder than the first example to find manually. If you start from the 'doGet' method, you have to find every place the method is called from and then follow the data it returns until it's no longer "live" to make sure it's not misused. On the other hand, you could start from the other end and go backward to the source of every value sent to this "sink" (the place where the data is stored or used). That might be a little cleaner, but it's no less painful.
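
A rough sketch of that shape, again in Python rather than the Securibench micro code itself, might look like this. The `pending_names` list, the `audit` table, and every function name are hypothetical; the two functions are shown in one block only to keep the example self-contained, where the scenario above has them in different files.

```python
import sqlite3

# Hypothetical shared collection; in the scenario above it would sit between two files.
pending_names = []


def handle_request(request: dict) -> None:
    """Plays the role of 'doGet': stores user input, touches no database."""
    pending_names.append(request.get("name", ""))  # source: a tainted value is stored


def flush_unsafe(conn: sqlite3.Connection) -> None:
    """Plays the role of the method in the other file: reads the list, reaches the sink."""
    for name in pending_names:  # the taint survives the trip through the list
        conn.execute("INSERT INTO audit (name) VALUES ('" + name + "')")


def flush_safe(conn: sqlite3.Connection) -> None:
    """Same flow, but the value is bound as a parameter instead of concatenated."""
    for name in pending_names:
        conn.execute("INSERT INTO audit (name) VALUES (?)", (name,))


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]


handle_request({"name": "x'), ('injected"})

unsafe_conn = sqlite3.connect(":memory:")
unsafe_conn.execute("CREATE TABLE audit (name TEXT)")
flush_unsafe(unsafe_conn)
unsafe_count = count_rows(unsafe_conn)  # one request, but the payload added an extra row

safe_conn = sqlite3.connect(":memory:")
safe_conn.execute("CREATE TABLE audit (name TEXT)")
flush_safe(safe_conn)
safe_count = count_rows(safe_conn)  # exactly one row, holding the payload as plain text
```

Nothing in `handle_request` looks dangerous on its own, and nothing in `flush_unsafe` mentions a user. The problem only shows up when you connect the two, which is the part that's painful to do by hand.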

And that's why you want taint analysis. Because it traces user-tainted data from its source to your sinks, and raises the alarm when you use that data without sanitizing it. It helps you protect your data, your users, and your reputation from hackers and accidents.
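
If you're curious what "tracing" means mechanically, here is a toy sketch of the idea. It is an illustration under heavy assumptions, not how SonarQube implements it: the program is a made-up list of tuples, it runs in a straight line with no branches or function calls, and the statement kinds (`source`, `assign`, `sanitize`, `sink`) are invented for this example.

```python
def find_tainted_sinks(program):
    """Return (sink_name, path) pairs where unsanitized source data reaches a sink.

    `program` is a list of tuples in a made-up mini language:
      ("source", var)             var receives user input
      ("assign", var, [inputs])   var is computed from the input variables
      ("sanitize", var, input)    var is a cleaned copy of input
      ("sink", name, [inputs])    a sensitive operation consumes the inputs
    """
    taint_path = {}  # tainted variable -> list of variables the taint travelled through
    findings = []
    for statement in program:
        kind = statement[0]
        if kind == "source":
            taint_path[statement[1]] = [statement[1]]
        elif kind == "assign":
            _, target, inputs = statement
            tainted_inputs = [name for name in inputs if name in taint_path]
            if tainted_inputs:
                # Taint spreads: remember the route it took to get here.
                taint_path[target] = taint_path[tainted_inputs[0]] + [target]
            else:
                taint_path.pop(target, None)  # overwritten with clean data
        elif kind == "sanitize":
            taint_path.pop(statement[1], None)  # the sanitized copy is clean
        elif kind == "sink":
            _, name, inputs = statement
            for var in inputs:
                if var in taint_path:
                    findings.append((name, taint_path[var]))
    return findings


program = [
    ("source", "param"),
    ("assign", "trimmed", ["param"]),
    ("assign", "sql", ["prefix", "trimmed"]),
    ("sink", "execute", ["sql"]),
    ("sanitize", "clean", "param"),
    ("assign", "safe_sql", ["prefix", "clean"]),
    ("sink", "execute_safe", ["safe_sql"]),
]
findings = find_tainted_sinks(program)
```

Here `findings` holds a single entry for the `execute` sink, with the path `param`, `trimmed`, `sql`. The `execute_safe` sink stays quiet because its input went through the sanitize step first. A real analyzer has to do the same bookkeeping across branches, method calls, collections, and files, which is why you want a tool for it.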

Taint analysis of Java, C#, PHP, and Python is free on SonarQube Cloud for open source projects, and available in SonarQube Server commercial editions as part of SonarSource's larger SAST (Static Application Security Testing) offering. Later in 2020, SonarSource's SAST offering will expand to include JavaScript, TypeScript, C and C++.
