CVE-2025-22872 - Critical HTML Tokenizer Bug Exposes DOM Manipulation Flaws in Foreign Content Like `<svg>` and `<math>`

CVE-2025-22872 is a vulnerability in the HTML tokenizer of Go's `golang.org/x/net/html` package. The tokenizer misinterprets tags whose unquoted attribute value ends with a slash (`/`) as self-closing. This can lead to incorrect DOM construction in contexts like `<svg>` and `<math>`, where *foreign content* is parsed.

Let’s break down what this means, how it happens, and what developers should do.

Understanding the Core Vulnerability

In HTML, a trailing slash as in `<img src="x.jpg" />` is optional. `img` is a void element that never wraps content, so the parser ignores the slash. Only in foreign content (SVG and MathML) does a `/` before `>` actually close the element. Now consider `<path d=foo/>` inside `<svg>`. Because the `d` value has no quotes, the spec reads the `/` as the last character of the value, giving `d="foo/"`. The vulnerable tokenizer instead treats that trailing `/` as a self-closing marker.

What Should Happen

The HTML tokenizer reads an unquoted attribute value up to the next whitespace character or `>`. A `/` inside that run is simply part of the value. Only a `/` that follows whitespace, the tag name, an attribute name, or a quoted value can start a self-closing tag. Given input like:

```html
<svg>
  <path d=foo/>
  <circle cx=50 cy=50 r=40 />
</svg>
```

a spec-compliant tokenizer marks only `<circle>` as self-closing, because its `/` comes after a space, outside any attribute value. The `/` in the `<path>` tag sits directly after `foo`, so it belongs to the value. The tag is `<path d="foo/">`, the element stays open, and `<circle>` becomes its child.
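The rule can be sketched as a tiny tokenizer for a single start tag. `tokenizeStartTag` is a hypothetical helper written for illustration, not code from any parser. It follows the spec's tag-name, attribute and self-closing states but skips character references, comments and error reporting.

```javascript
// Hypothetical helper: tokenizes ONE start tag following the HTML spec's
// tag-name, attribute and self-closing states. Simplified: no character
// references, no error reporting, no end tags.
function tokenizeStartTag(src) {
  const isSpace = (c) => c === ' ' || c === '\t' || c === '\n' || c === '\f';
  let i = 1; // skip '<'
  let name = '';
  while (i < src.length && !isSpace(src[i]) && src[i] !== '/' && src[i] !== '>') {
    name += src[i++].toLowerCase();
  }
  const attrs = {};
  let selfClosing = false;
  while (i < src.length) {
    const c = src[i];
    if (isSpace(c)) { i++; continue; }
    if (c === '>') break;
    if (c === '/') {
      // Self-closing start tag state: the flag is set only by "/>".
      if (src[i + 1] === '>') { selfClosing = true; break; }
      i++;
      continue;
    }
    // Attribute name state (the first character is taken as-is, even '=').
    let attrName = src[i++].toLowerCase();
    while (i < src.length && !isSpace(src[i]) && !'/>='.includes(src[i])) {
      attrName += src[i++].toLowerCase();
    }
    while (i < src.length && isSpace(src[i])) i++;
    let value = '';
    if (src[i] === '=') {
      i++;
      while (i < src.length && isSpace(src[i])) i++;
      const quote = src[i];
      if (quote === '"' || quote === "'") {
        // Quoted value: everything up to the matching quote.
        const end = src.indexOf(quote, i + 1);
        const stop = end === -1 ? src.length : end;
        value = src.slice(i + 1, stop);
        i = stop + 1;
      } else {
        // Unquoted value: runs until whitespace or '>'. A '/' here is just
        // another character of the value; treating it as "self-closing" is
        // exactly the mistake behind CVE-2025-22872.
        while (i < src.length && !isSpace(src[i]) && src[i] !== '>') value += src[i++];
      }
    }
    // The spec keeps the first occurrence of a duplicated attribute.
    if (!Object.prototype.hasOwnProperty.call(attrs, attrName)) attrs[attrName] = value;
  }
  return { name, attrs, selfClosing };
}

console.log(tokenizeStartTag('<path d=foo/>'));               // d is "foo/", selfClosing false
console.log(tokenizeStartTag('<circle cx=50 cy=50 r=40 />')); // selfClosing true
console.log(tokenizeStartTag('<path d="foo"/>'));             // d is "foo", selfClosing true
```

Note the last case: once the value is quoted, the same `/>` does close the tag, which is why quoting attribute values removes the ambiguity.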

What Actually Happens (The Bug)

The tokenizer affected by CVE-2025-22872 breaks this rule: it treats a `/` at the end of an unquoted attribute value as a self-closing marker. Consider:

```html
<svg>
  <path d=foo/>
  <rect width=100 height=50>
    <animate attributeName="x" from= to=100 dur=1s/>
  </rect>
</svg>
```

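- `<path d=foo/>` is wrongly marked self-closing, even though its `d` value is `foo/`.
- `<animate ... dur=1s/>` is hit the same way, since `1s/` is also an unquoted value ending in `/`.
- Content following such tags is placed in the wrong scope in the DOM tree: `<rect>` becomes a sibling of `<path>` instead of its child.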

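The misplacement only occurs in foreign content, such as elements inside `<svg>` or `<math>`, because that is the only place where tree construction honors the self-closing flag. On ordinary HTML elements the flag is ignored. Code that reads the tokenizer's output directly, however, sees the wrong flag in any context.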
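The tree-level effect can be sketched with a simplified model of tree construction in foreign content, where a self-closing flag is honored by popping the element immediately. The token lists stand for the `<svg>`/`<rect>`/`<animate>` example above. `buildTree`, `show` and `tokens` are hypothetical names, and the model ignores namespaces, integration points and text nodes.

```javascript
// Simplified model of tree construction inside <svg>/<math> (foreign content).
// Assumptions: every element is foreign, no integration points, no text nodes.
function buildTree(tokens) {
  const root = { name: '#root', children: [] };
  const stack = [root];
  for (const t of tokens) {
    if (t.type === 'start') {
      const el = { name: t.name, children: [] };
      stack[stack.length - 1].children.push(el);
      // The self-closing flag is honored: the element is never pushed,
      // so nothing that follows can be nested inside it.
      if (!t.selfClosing) stack.push(el);
    } else {
      // End tag: pop up to and including the nearest open element with that name.
      const idx = stack.map((e) => e.name).lastIndexOf(t.name);
      if (idx > 0) stack.length = idx;
    }
  }
  return root;
}

// Render a subtree as nested name(children...) for easy comparison.
function show(node) {
  return node.children.length
    ? `${node.name}(${node.children.map(show).join(' ')})`
    : node.name;
}

// `bug` is the flag a tokenizer with CVE-2025-22872 reports for tags whose
// unquoted attribute value ends in "/".
function tokens(bug) {
  return [
    { type: 'start', name: 'svg', selfClosing: false },
    { type: 'start', name: 'path', selfClosing: bug },    // <path d=foo/>
    { type: 'start', name: 'rect', selfClosing: false },
    { type: 'start', name: 'animate', selfClosing: bug }, // <animate ... dur=1s/>
    { type: 'end', name: 'rect' },
    { type: 'end', name: 'svg' },
  ];
}

console.log('spec:', show(buildTree(tokens(false)).children[0])); // spec: svg(path(rect(animate)))
console.log('bug: ', show(buildTree(tokens(true)).children[0]));  // bug:  svg(path rect(animate))
```

Only two flags differ between the runs, yet `<rect>` moves from being a child of `<path>` to being its sibling.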

Code Demonstration

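Here's a snippet that parses the same kind of input in JavaScript, which you can run in a browser console. Browsers follow the HTML spec and are not affected by this CVE, so the output is the reference result to compare a suspect parser against: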

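```javascript
// Run in a browser console. Browsers implement the HTML spec, so this shows
// the reference tree, not the buggy one.
const svgString = `<svg>
  <path d=foo/>
  <circle cx=50 cy=50 r=40 />
  <text>Hello</text>
</svg>`;

// Parse as HTML: unquoted attributes are not valid XML, so 'image/svg+xml'
// would return a <parsererror> document instead of an <svg> tree.
const doc = new DOMParser().parseFromString(svgString, 'text/html');

const path = doc.querySelector('path');
console.log(doc.querySelectorAll('path').length); // 1
console.log(path.getAttribute('d'));              // "foo/": the slash is part of the value
console.log(path.children.length);                // 2: <circle> and <text> are nested in <path>
console.log(doc.querySelector('svg').innerHTML);  // full serialized subtree
```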

A parser with the CVE-2025-22872 bug would instead close `<path>` immediately, leaving it with no children and making `<circle>` and `<text>` its siblings directly under `<svg>`.

Attack & Exploit Possibilities

A malicious actor could *craft SVG or MathML payloads* that exploit this parsing confusion, for example to fool sanitization logic that depends on an accurate DOM structure.

If a sanitizer decides what to keep based on where an element sits in the tree, and its parser builds a different tree from the one the browser builds, content can slip through. Consider this MathML:

```html
<math>
  <mi x=abc/>
  <mo>+</mo>
  <mi>x</mi>
</math>
```

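A spec-compliant parser keeps the first `<mi>` open (its `x` value is `abc/`), so `<mo>` and the second `<mi>` are nested inside it. Because `<mi>` is a MathML text integration point, they are even created as HTML elements. A parser with this bug closes the first `<mi>` at once and makes `<mo>` and the second `<mi>` its MathML siblings. A sanitizer that approves one tree while the browser builds the other is not checking what the user actually gets.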
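Sanitizer bypasses of this kind are parser differentials: two parsers disagree about the tree. A rough way to see the disagreement for the MathML example is to record each element's parent under both readings of `<mi x=abc/>`. `parents` and `mathTokens` are hypothetical helpers in a simplified model (no namespaces, integration points or text), not part of any sanitizer.

```javascript
// Hypothetical parser-differential check (simplified model, not a real parser):
// records the parent of each element as "child in parent".
function parents(tokens) {
  const stack = ['#root'];
  const placed = [];
  for (const t of tokens) {
    if (t.end) {
      // End tag: pop up to and including the nearest open element with this name.
      const idx = stack.lastIndexOf(t.name);
      if (idx > 0) stack.length = idx;
      continue;
    }
    placed.push(`${t.name} in ${stack[stack.length - 1]}`);
    if (!t.selfClosing) stack.push(t.name); // self-closing: never becomes a parent
  }
  return placed;
}

// Start/end tags of the <math> example; `bug` marks <mi x=abc/> as self-closing.
const mathTokens = (bug) => [
  { name: 'math' },
  { name: 'mi', selfClosing: bug },
  { name: 'mo' }, { name: 'mo', end: true },
  { name: 'mi' }, { name: 'mi', end: true },
  { name: 'math', end: true },
];

const spec = parents(mathTokens(false));
const buggy = parents(mathTokens(true));
spec.forEach((where, i) => {
  if (where !== buggy[i]) console.log(`spec: ${where.padEnd(10)} | bug: ${buggy[i]}`);
});
```

It reports that `<mo>` and the second `<mi>` change parent, which is exactly the kind of difference a structure-based sanitizer rule can trip over.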

Exploit Example

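The Node.js snippet below runs a similar payload through parse5. parse5 implements the HTML spec's tokenizer and is not the subject of this CVE, so it serves as a spec-following baseline. Swap in the parser you want to test and compare the trees it produces: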

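```javascript
// npm install parse5  (parse5 follows the HTML spec's tokenizer rules)
const parse5 = require('parse5');

const foreignPayload = `
<svg>
  <desc foo=bar/>
  <g>
    <script>alert('EXPLOIT');</script>
  </g>
</svg>
`;

// Print each element with its namespace so nesting and namespace are visible.
function dump(node, depth = 0) {
  for (const child of node.childNodes || []) {
    if (!child.tagName) continue; // skip text and comment nodes
    console.log(`${'  '.repeat(depth)}<${child.tagName}> ${child.namespaceURI}`);
    dump(child, depth + 1);
  }
}

dump(parse5.parseFragment(foreignPayload));

// Spec behavior: <desc foo=bar/> stays open (foo="bar/"), so <g> and <script>
// are nested inside <desc>. A tokenizer with this bug closes <desc> at once,
// which moves <g> and <script> out to be siblings of <desc>.
```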

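With a spec-compliant parser, `<g>` and `<script>` end up inside `<desc>`, and because `<desc>` is an HTML integration point they are created as HTML elements rather than SVG ones. If a tokenizer marks `<desc foo=bar/>` as self-closing, they land outside `<desc>` as SVG elements instead, so a sanitizer using that tokenizer reasons about a different tree from the one the browser builds.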

Which Libraries Are Affected?

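CVE-2025-22872 was reported against Go's `golang.org/x/net/html` package and fixed in `golang.org/x/net` v0.38.0. Code that uses its `Tokenizer` directly sees tags wrongly marked as self-closing; code that uses its `Parse` functions gets content placed in the wrong scope, but only in foreign content.

- Parsers that follow the HTML Living Standard are not affected, since the spec itself treats a `/` in an unquoted value as part of the value. A parser is only at risk if its tokenizer departs from the spec in the same way.
- Some JavaScript implementations (tokenizer-js, custom forks of parse5) and server-side parsers may contain the same mistake. They are not covered by this advisory, so test them directly.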

References

- Original issue on GitHub
- Tokenizer rules in HTML Spec
- parse5 parser project
- SVG and MathML in HTML

Mitigation

Update your HTML parsing library to the latest version (for Go code, the patched `golang.org/x/net`).

- If you maintain your own tokenizer, make sure a `/` inside an unquoted attribute value is appended to the value and never sets the self-closing flag.
- Add unit tests for SVG/MathML with various attributes and bogus self-closing tags.
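A minimal sketch of such tests, assuming you also want a cheap guard alongside them. The hypothetical `hasRiskyUnquotedSlash` flags markup containing an unquoted attribute value that ends in `/` right before `>`, the pattern the vulnerable tokenizer misreads. It is a regex heuristic, not a parser, and deliberately errs toward flagging; it could be used to reject input before it reaches an unpatched parser.

```javascript
// Hypothetical, conservative pre-filter for the CVE-2025-22872 pattern: an
// unquoted attribute value that ends in "/" immediately before ">".
// It is a heuristic, not a parser: it can also match inside quoted values,
// comments or text, which errs on the side of flagging too much.
// Whitespace is the HTML set (tab, LF, FF, CR, space), not JavaScript's \s.
const RISKY_UNQUOTED_SLASH = /=[\t\n\f\r ]*(?!["'])[^\t\n\f\r >]*\/>/;

function hasRiskyUnquotedSlash(markup) {
  return RISKY_UNQUOTED_SLASH.test(markup);
}

// Regression inputs: [markup, should it be flagged?]
const cases = [
  ['<svg><path d=foo/></svg>', true],               // d="foo/" per spec
  ['<math><mi x=abc/><mo>+</mo></math>', true],
  ['<svg><desc foo=bar/><g></g></svg>', true],
  ['<svg><path d= /></svg>', true],                 // d="/" per spec
  ['<svg><circle cx=50 cy=50 r=40 /></svg>', false],
  ['<svg><path d="foo"/></svg>', false],
  ['<img src="x.jpg" />', false],
];

for (const [markup, expected] of cases) {
  const flagged = hasRiskyUnquotedSlash(markup);
  console.log(`${flagged === expected ? 'ok  ' : 'FAIL'} ${markup}`);
}
```

In a real suite, pair these inputs with assertions on the tree your parser builds (for example, that `d` is `foo/` and that `<path>` still contains the elements after it) instead of relying on the regex alone.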

Summary

CVE-2025-22872 is a subtle but significant vulnerability for anyone parsing HTML that contains foreign content like SVG or MathML with the affected tokenizer, whether directly or through its parse functions.

> The bug lets unquoted attribute values ending in / trigger a self-closing tag when they shouldn't, scrambling the DOM scope and risking exploits.

Check your parsers, update your dependencies, and always quote your attribute values, especially in foreign elements!

*This article is exclusive content synthesized for clarity and security awareness. For the latest, always refer to security advisories and the references above.*

Timeline

Published on: 04/16/2025 18:16:04 UTC
Last modified on: 05/16/2025 23:15:19 UTC