Blog post

Crafting regexes to avoid stack overflows

February 23, 2021

Sebastian Hungerecker

Developer

Due to the way regular expression matching is implemented in Java (and many other languages/libraries), matching a pattern may - depending on the regex - require stack space proportional ...

TL;DR overview

Poorly crafted regular expressions can cause catastrophic backtracking—known as ReDoS (Regular Expression Denial of Service)—where the regex engine exponentially increases execution time or causes a stack overflow on certain inputs.
SonarQube's regex analysis rules detect patterns known to cause catastrophic backtracking—such as nested quantifiers and ambiguous alternation—and flag them before they reach production.
The fix is often straightforward: using possessive quantifiers, atomic groups, or simplifying the regex to eliminate the ambiguity that causes backtracking.
Developers writing validation logic, parsers, or input sanitization should treat regex complexity with the same attention as algorithmic complexity—worst-case regex performance is a real attack surface.

We've been working recently on adding rules to help write better regular expressions in Java. I've talked already about rules to find errors in character classes and with the use of boundary markers and overly complex regular expressions. Those rules help you make sure your regular expressions are accurate and maintainable. Today I will show you how to make sure that the regular expression won’t crash your program.

As a brief reminder: regular expressions are a terse and powerful mechanism for matching patterns in strings. Part of the power of regular expressions is that a pattern can concisely describe a string that might be much larger than the original pattern. Sometimes much, much larger. Sometimes enough to overflow the stack and crash your application.

Due to the way regular expression matching is implemented in Java (and many other languages/libraries), matching a pattern may - depending on the regex - require stack space proportional to the length of the input. This means large inputs could cause the program to crash with a StackOverflowException when you try to use the regex.

We ran into this problem in our own code when analysis tried to find comments containing at least one word with more than a given number of characters. For this we used a regex similar to #(.|\n)*\w{3,}, which would cause the analysis to crash on source files containing sufficiently long comments. To prevent problems like this for our future selves as well as for you, dear users, we implemented a rule which can detect problems like this:

java:S5998 - Regular expressions should not overflow the stack

An issue is raised on a repetition in a regex which could lead to a stack overflow.

In theory any regex is susceptible to stack overflow if it contains a repetition that contains some sort of branching, such as another repetition or an alternative. However, for some regular expressions the length of input required to make them crash is much larger than for others.

To account for this, our rule tries to estimate how much stack space a regular expression will consume relative to the input size and only raises issues on regular expressions whose stack consumption exceeds a configurable threshold.

One way to avoid stack consumption in regular expressions is to use possessive quantifiers. Possessive quantifiers are created by adding a + to a quantifier (e.g. x*+ instead of x*). Doing so disables backtracking. In addition to avoiding issues with catastrophic backtracking, making a quantifier possessive allows a pattern to be matched without consuming stack space. The problem is that sometimes backtracking is necessary and using possessive quantifiers to disable it may leave you with a regex that can never match any input.

Consider for example the regex I cited above: #(.|\n)*\w{3,}: If we try to fix the stack overflow by making the quantifier possessive, we end up with #(.|\n)*+\w{3,}. This regex can never match anything because any input that could be matched by \w will already have been matched by (.|\n)*+ and, being possessive, it won’t give it back. Luckily we have a rule that warns you about issues like this:

java:S5994 - Regex patterns following a possessive quantifier should not always fail

An issue is raised on an impossible-to-reach sub-pattern

So how would one properly fix this issue? In our case, we decided to get rid of the regex altogether since there wasn’t really a good reason to use a regex here. But the regex could also be salvaged easily enough by getting rid of the alternation. The intent of .|\n was to match any character, including line breaks, because . does not match line breaks by default. The proper way to match any character would be to enable the DOTALL flag, which can also be enabled from within the regex using (?s). So either Pattern.compile(“#.*\\w{3,}”, Pattern.DOTALL) or ”(?s)#.*\\{3,}” would work (the latter of which also works when using methods that don’t involve Pattern.compile).

Another regex susceptible to stack overflows would be something as simple as (a|b)*. Here the alternation can be removed by using a character class instead, which doesn’t require stack space to match: [ab]*. Luckily we have a rule that finds alternations that can be replaced by character classes and suggests just that replacement like in the following Spring Boot code:

java:S6035 - Single-character alternations in regular expressions should be replaced with character classes

An issue is raised on an alternation can can be replaced with a character class.

Regular expressions are a truly powerful feature. Because great power comes with great responsibility (and great opportunities for screwing things up, or confusing your colleagues and future-you), we now offer a total of 24 rules targeting regular expressions. They are all available today in SonarQube Cloud. These three rules around avoiding stack overflows are available starting from SonarQube Server 8.7, but the rest are already released. You can see them all in the product. You can view all our supported programming languages in our documentation.

Finally, note that some of the rules perform in-depth automata-based analyses on your regular expressions to identify issues as accurately as possible. This is an extremely promising approach and we will probably discuss this feature in a future blog post, because we truly believe it’s amazing. This will for sure bring A LOT to regex code quality in general, so stay tuned!

---

This is the third installment in a series on what can go wrong in writing Regular Expressions:

Something to add? Join us in the community!

Free Video! See common mistakes while writing regular expressions and how to test them.

Watch now

SonarQube Cloud

SonarQube Server

SonarQube for IDE

SonarQube Advanced Security

GitarNew

Sonar VortexNew

SonarQube Remediation Agent

MCP Server / SonarQube CLI

SonarSweepEarly access

SonarQube Cloud

SonarQube Server

SonarQube for IDE

SonarQube Advanced Security

GitarNew

Sonar VortexNew

SonarQube Remediation Agent

MCP Server / SonarQube CLI

SonarSweepEarly access

AI code quality

Developer-led security

Automated code review

Platform engineering

Compliance & reporting

SDLC governance

Secrets detection

Supply chain security

All use cases

Agent Centric Development Cycle (ACDC)

AI solutions

Architecture management

Security solutions

Code quality solutions

ROI calculator

LLM leaderboard

SonarQube vs GitHub Code Quality

Healthcare

Financial services

Retail

Federal government

Our customers

Customer stories

AI code quality

Developer-led security

Automated code review

Platform engineering

Compliance & reporting

SDLC governance

Secrets detection

Supply chain security

All use cases

Agent Centric Development Cycle (ACDC)

AI solutions

Architecture management

Security solutions

Code quality solutions

ROI calculator

LLM leaderboard

SonarQube vs GitHub Code Quality

Healthcare

Financial services

Retail

Federal government

Our customers

Customer stories

Developer hub

Learning center

Commitment to open source

Community

Developer guides

SonarQube Server

SonarQube Cloud

SonarQube for IDE

Sonar Vulnerability database

GitHub

Bitbucket

Azure DevOps

GitLab

See all

Java

JavaScript