In early 2023, security researchers disclosed CVE-2023-29469, a vulnerability in libxml2, one of the world’s most widely used XML parsing libraries (used by Python’s lxml, PHP, and many others). The bug is subtle but dangerous: when hashing *empty dictionary strings* in a crafted XML document, libxml2’s key function (xmlDictComputeFastKey in dict.c) can produce non-deterministic values. That exposes affected applications to problems ranging from *logic errors* to *memory corruption*, including double frees.

This post explains what CVE-2023-29469 is, shows its causes in simple terms, and walks through a practical (non-malicious) code example. We’ll wrap up with links to the official details and patches.

The Root Cause: Hashing Empty Strings The Wrong Way

Libxml2 uses a custom hash table (the dictionary) for many internal operations, such as tracking tags and attribute values. When adding a string to this hash table, libxml2 calculates a *key* using the function xmlDictComputeFastKey.

The problem arises when trying to hash an *empty string* (""). Here’s a rough version of the vulnerable function from dict.c before the fix:

static unsigned long
xmlDictComputeFastKey(const unsigned char *name, int namelen, int seed) {
    const unsigned char *ptr = name;
    unsigned long value = seed;

    value += *ptr; // <-- BAD if name is empty!
    value += namelen;
    // ...more hash mixing...
    return value;
}

If name is an empty string (zero length), *ptr still reads one byte from the input. In a well-formed C string that byte is always the terminator '\0', but *attackers can arrange for name to point to memory where any value is possible*, not just '\0'. The computed hash can therefore differ from one lookup to the next, so different runs or different inputs may produce different behavior.
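To make the effect concrete, here is a minimal Python sketch that mirrors the rough C version above. It is a toy model, not libxml2’s real hash: the function name fast_key_vulnerable, the buffer contents, and the 32-bit mask are all assumptions made for illustration. The point is only that a zero-length string still has a "first byte" underneath it, and that byte leaks into the key.

```python
def fast_key_vulnerable(buf: bytes, offset: int, namelen: int, seed: int = 0) -> int:
    """Toy model of the pre-fix logic: always mixes in the byte at buf[offset]."""
    value = seed
    value += buf[offset]      # the read happens even when namelen == 0
    value += namelen
    return value & 0xFFFFFFFF  # keep the toy key in 32 bits (an assumption)


# One buffer, two zero-length "strings" that start at different offsets.
buf = b"<a b=''/>\x00"
key_at_0 = fast_key_vulnerable(buf, 0, 0)   # the byte underneath is '<'
key_at_9 = fast_key_vulnerable(buf, 9, 0)   # the byte underneath is '\0'

print(key_at_0, key_at_9)
```

Both calls hash the same logical value (the empty string), yet the two keys differ because the byte sitting at each offset differs.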

Why This Matters

Any application that parses untrusted XML with a vulnerable libxml2, directly or through language bindings, could be exposed. That includes web servers, desktop software, and some cloud systems.

For real attackers, a bug that fails the same way every time (one value, one crash) is often less useful than one that leaves the program in unexpected memory states. If such a state can be reproduced reliably, it is ripe for exploitation.

Exploit Example (Safe Version)

This isn’t a full malicious exploit, but it shows *how* unusual XML input can reach the affected code path. Normally, an empty XML attribute is safe. But a document that gets libxml2 to hash empty strings lets you test for the broken logic.

Example Python code (using lxml, which wraps libxml2)

import lxml.etree as ET

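# XML with an empty attribute value, so the parser has to handle a zero-length string
weird_xml = '''<root>
    <tag foo=""></tag>
    <tag></tag>
</root>'''

try:
    root = ET.fromstring(weird_xml)
    print("OK!", root.tag)
except ET.XMLSyntaxError as e:
    # Only parse errors land here; a native crash (segfault) would kill
    # the whole process before any Python handler runs
    print("Parse error:", e)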

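On a vulnerable libxml2 (before 2.10.4), this benign input will most likely still parse normally; it only exercises the empty-string path. The non-deterministic behavior (a crash on one run, none on the next, or odd results) depends on what happens to sit in memory where the empty string starts.

A carefully crafted document (not shown here, to avoid helping attackers) can push this into memory corruption or crashes.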

Real Exploit: Double Free

This hashing inconsistency can make libxml2 *think it’s inserting a new value when the value is already stored*, or vice versa. When the affected structures get cleaned up, the same memory can be freed twice: a double free.
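The "new value or existing value" confusion can be sketched with a toy interning table in Python. Everything here is hypothetical: ToyDict and lookup_or_add are illustration names, and the bucket layout is not libxml2’s. Python cannot double free, so the sketch stops at the broken invariant that leads up to it: the same string ends up stored twice because two lookups used different keys.

```python
class ToyDict:
    """Toy string-interning table. An illustration only, not libxml2's layout."""

    def __init__(self, nbuckets: int = 8) -> None:
        self.buckets = [[] for _ in range(nbuckets)]
        self.entries = 0

    def lookup_or_add(self, name: bytes, key: int) -> bool:
        """Search only the bucket chosen by key; return True if name was added."""
        bucket = self.buckets[key % len(self.buckets)]
        if name in bucket:
            return False          # already interned, nothing new stored
        bucket.append(name)
        self.entries += 1
        return True


d = ToyDict()
added_first = d.lookup_or_add(b"", key=60)   # key derived from a stray '<' byte
added_second = d.lookup_or_add(b"", key=0)   # same string, key derived from '\0'
added_third = d.lookup_or_add(b"", key=0)    # consistent key: found this time

print(added_first, added_second, added_third, d.entries)
```

The table now holds two entries for one string. In C, code that assumes each string is stored exactly once (for example, to decide which pointer it is responsible for freeing) can go wrong from this point on.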

How to Fix

Upgrade to libxml2 version 2.10.4 or newer.  
The maintainer patched xmlDictComputeFastKey to handle empty strings safely, always giving a standard value.
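If your application reaches libxml2 through lxml, a quick way to see which version is loaded is lxml’s etree.LIBXML_VERSION tuple. The sketch below wraps the comparison in a small helper; is_patched is a hypothetical name, not part of lxml. Treat the result as a hint rather than proof, since some distributions backport security fixes without changing the upstream version number.

```python
def is_patched(version: tuple) -> bool:
    """Return True if a libxml2 version tuple is at or above the fixed release."""
    return tuple(version) >= (2, 10, 4)


try:
    from lxml import etree
    runtime_version = etree.LIBXML_VERSION   # libxml2 version in use at runtime
except ImportError:
    runtime_version = None                   # lxml is not installed here

if runtime_version is None:
    print("lxml is not installed; cannot inspect libxml2 from Python")
else:
    print("libxml2", runtime_version, "patched:", is_patched(runtime_version))
```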

Official patch

Gnome libxml2 Merge Request: Fix empty string hashing bug

In simplified form, the patched logic looks like this:

if (namelen == 0)
    value += 0;
else
    value += *ptr;
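The same guard can be checked in a small Python model. As before, this is a sketch under assumptions (the name fast_key_patched, the buffer, and the 32-bit mask are made up for illustration), not the library’s actual code. It shows that once the length is checked first, an empty string gets the same key no matter which byte sits underneath it, while non-empty strings still hash by content.

```python
def fast_key_patched(buf: bytes, offset: int, namelen: int, seed: int = 0) -> int:
    """Toy model of the guarded logic: only read buf[offset] for non-empty input."""
    value = seed
    if namelen == 0:
        value += 0            # empty string: never look at buf[offset]
    else:
        value += buf[offset]
    value += namelen
    return value & 0xFFFFFFFF  # keep the toy key in 32 bits (an assumption)


buf = b"<a b=''/>\x00"

# Hash a zero-length string at every offset in the buffer.
empty_keys = {fast_key_patched(buf, off, 0) for off in range(len(buf))}

key_a = fast_key_patched(buf, 1, 1)   # one-byte string "a"
key_b = fast_key_patched(buf, 3, 1)   # one-byte string "b"

print(empty_keys, key_a, key_b)
```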

References and Further Reading

- CVE Description (NVD)
- GNOME Release Notes (2.10.4)
- HackerOne Report (Original Disclosure)
- libxml2 GitLab Commit

Bottom Line

CVE-2023-29469 is a reminder that even old, trusted C libraries can have devastating bugs hiding in plain sight. If you use libxml2, upgrade to 2.10.4+, and audit all XML inputs if you can. Some bugs don’t need complex files—they just need one empty string in the wrong place.

For technical readers, here’s the TL;DR: *Never blindly dereference memory, even if you think the input is safe!*

Timeline

Published on: 04/24/2023 21:15:00 UTC
Last modified on: 05/04/2023 16:06:00 UTC