Software that parses XML is everywhere, from web browsers and databases to small IoT devices. Expat (libexpat) is a popular open-source XML parser written in C, and it's widely used as a core component by many applications, including Python, LibreOffice, and Apache.

Early in 2022, a critical vulnerability was found in Expat’s buffer management code for certain configurations: CVE-2022-23852. This post explains the bug in simple terms, shows you what the vulnerable code looks like, and discusses how attackers might exploit it — so you can understand both the risk and the fix.

What is CVE-2022-23852?

CVE-2022-23852 is a signed integer overflow in the function XML_GetBuffer in Expat (versions before 2.4.4). It affects builds in which the compile-time setting XML_CONTEXT_BYTES is nonzero, which is the common case, since Expat's own build scripts define it as 1024 by default.

If your Expat build has a nonzero XML_CONTEXT_BYTES and an application passes an attacker-influenced size to XML_GetBuffer, the library can miscalculate buffer sizes, potentially leading to heap corruption, denial of service (DoS), information leaks, or arbitrary code execution.

The Bug: Signed Integer Overflow

First, let's look at what happens in the source code.

In C, if the sum of two signed integers exceeds INT_MAX, the result is a signed integer overflow. The C standard makes this undefined behavior; on common hardware the value usually "wraps around" to a large negative number, and code that goes on to use that number as a size can be exploited.
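The standard way to avoid the undefined behavior is to test the operands before adding them. The sketch below uses a hypothetical helper, `checked_add_int` (not part of Expat), that rejects any sum falling outside the `int` range; the comparisons themselves cannot overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not an Expat API): stores a + b in *out and returns
 * true only if the sum fits in an int. Each comparison is written so that it
 * cannot overflow itself, so no undefined behavior occurs. */
static bool checked_add_int(int a, int b, int *out) {
    if (b > 0 && a > INT_MAX - b)
        return false; /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false; /* sum would fall below INT_MIN */
    *out = a + b;
    return true;
}
```

For example, `checked_add_int(INT_MAX, 1, &r)` returns false and leaves `r` untouched, while `checked_add_int(2, 3, &r)` stores 5.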

Expat grows its input buffer based on a size supplied by the caller, but it did not check every step of that size calculation for overflow.

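Here's a simplified, hypothetical snippet to illustrate the pattern:

int neededSize = oldSize + someCalculatedValue;  /* may overflow: undefined behavior */
char *newBuffer = malloc(neededSize);            /* a negative int converts to a huge size_t */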

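If oldSize and someCalculatedValue (influenced by attacker-controlled input) are large enough, neededSize can wrap to a negative number. Passed directly to malloc, that value converts to an enormous size_t and the allocation simply fails; the dangerous cases are when the wrapped value is used in a size comparison or yields a too-small allocation, after which later writes run past the end of the buffer.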
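The same idea applies to unsigned `size_t` arithmetic, which is what `malloc` and `realloc` actually take. Here is a sketch of a checked buffer-growth step; `grow_buffer` and its parameters are hypothetical names used only for illustration, not Expat code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical illustration (not Expat code): grow a heap buffer by `extra`
 * bytes. The addition is checked against SIZE_MAX before realloc is called,
 * so a huge `extra` cannot wrap around into a small allocation. */
static char *grow_buffer(char *buf, size_t oldSize, size_t extra, size_t *newSize) {
    if (extra > SIZE_MAX - oldSize)
        return NULL; /* oldSize + extra would wrap around */
    size_t total = oldSize + extra;
    char *p = realloc(buf, total);
    if (p == NULL)
        return NULL; /* allocation failed; the caller still owns `buf` */
    *newSize = total;
    return p;
}
```

On failure the function returns NULL without touching `*newSize`, and the original buffer remains valid, which matches how `realloc` itself behaves.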

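Actual Expat Code (Vulnerable Version)

The overflow is in XML_GetBuffer(parser, len), which an application calls to obtain space for len bytes of input before handing that input to XML_ParseBuffer. Here's a simplified excerpt of the relevant logic before 2.4.4:

int neededSize = (int)((unsigned)len + (unsigned)(parser->m_bufferEnd - parser->m_bufferPtr));
if (neededSize < 0) {  /* this first sum is checked */
    parser->m_errorCode = XML_ERROR_NO_MEMORY;
    return NULL;
}
#ifdef XML_CONTEXT_BYTES
keep = (int)(parser->m_bufferPtr - parser->m_buffer);  /* already-parsed bytes to retain */
if (keep > XML_CONTEXT_BYTES)
    keep = XML_CONTEXT_BYTES;
neededSize += keep;  /* unchecked: can overflow past INT_MAX */
#endif

If len is close to INT_MAX (2,147,483,647, roughly 2 GiB; int is 32 bits on all common platforms, 64-bit ones included), adding up to XML_CONTEXT_BYTES of retained context pushes neededSize past INT_MAX. The result typically wraps to a negative number, which compares as smaller than the existing buffer, so Expat can conclude that no reallocation is needed and return a buffer far smaller than len bytes. When the caller then writes len bytes into it, the result is a classic heap buffer overflow.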
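To make the arithmetic concrete, here is a small, self-contained model of the buffer-size calculation: the request is `len` plus the bytes not yet parsed, plus up to the context limit of already-parsed bytes (`keep`). It is a sketch under stated assumptions, not Expat's source: `needed_size_vulnerable`, `needed_size_fixed`, and their parameters are hypothetical, `CONTEXT_BYTES` stands in for a nonzero `XML_CONTEXT_BYTES`, and it assumes a 32-bit `int` with two's-complement conversion, as mainstream compilers provide.

```c
#include <limits.h>

#define CONTEXT_BYTES 1024 /* stands in for a nonzero XML_CONTEXT_BYTES */

/* Hypothetical model (not Expat's code). `len` is the caller's request,
 * `unparsed` the bytes still waiting to be parsed, `parsed` the bytes already
 * consumed from the buffer. Returns the computed size, or -1 on error. */

/* Pre-fix behavior: the final `+ keep` is unchecked. The wraparound is
 * emulated with unsigned arithmetic so this demo has no undefined behavior. */
static int needed_size_vulnerable(int len, int unparsed, int parsed) {
    int neededSize = (int)((unsigned)len + (unsigned)unparsed);
    if (neededSize < 0)
        return -1; /* the first sum was already checked */
    int keep = parsed < CONTEXT_BYTES ? parsed : CONTEXT_BYTES;
    return (int)((unsigned)neededSize + (unsigned)keep); /* may wrap negative */
}

/* Fixed behavior: refuse the request if adding `keep` would exceed INT_MAX. */
static int needed_size_fixed(int len, int unparsed, int parsed) {
    int neededSize = (int)((unsigned)len + (unsigned)unparsed);
    if (neededSize < 0)
        return -1;
    int keep = parsed < CONTEXT_BYTES ? parsed : CONTEXT_BYTES;
    if (keep > INT_MAX - neededSize)
        return -1; /* overflow detected; report an error instead */
    return neededSize + keep;
}
```

With `len = INT_MAX - 100` and at least 1024 bytes already parsed, the vulnerable model returns a negative size while the fixed model returns -1. Ordinary requests, such as `len = 100`, give identical results in both.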

Demo: Triggering the Vulnerability

Note: For educational and defensive testing ONLY. DO NOT use exploits against systems without explicit permission.

You need a libexpat older than 2.4.4, compiled with a nonzero XML_CONTEXT_BYTES (e.g., 1024).

Crafting Malicious XML

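The attacker's goal is to make the size passed to XML_GetBuffer (its len argument) so large that neededSize crosses INT_MAX once up to XML_CONTEXT_BYTES of retained context is added. Because len is chosen by the application, this is reachable only when the application derives it from attacker-influenced data, such as a file size or a length field; entity expansion happens inside the parser and does not go through XML_GetBuffer. To produce a large test input, you can use a simple Python script: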

# Generate a large XML document with a single ~3 MB internal entity
xml_content = "<!DOCTYPE r [\n<!ENTITY a \"" + ("A" * 1024 * 1024 * 3) + "\">\n]>\n<r>&a;</r>\n"
with open("huge.xml", "w") as f:
    f.write(xml_content)

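Feed this to an Expat-based test program linked against a vulnerable libexpat built with a nonzero XML_CONTEXT_BYTES. A 3 MB file by itself stays far below INT_MAX, so for the overflow to occur the program must request close to INT_MAX bytes from XML_GetBuffer in a single call, after some input has already been parsed.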

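To get such a build, define the macro when compiling libexpat itself, for example via CFLAGS (defining it only in your own program has no effect on an already-built library):

-DXML_CONTEXT_BYTES=1024

Or see the Expat build documentation for details.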

Run the vulnerable parser on huge.xml

./vulnerable_parser huge.xml

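Outcome:
If the overflow is triggered, the program may crash with heap corruption or a segmentation fault; running it under a memory-error detector such as AddressSanitizer will usually report the out-of-bounds write.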

Can It Be Exploited for Code Execution?

In principle, yes. The typical result is a crash, but a heap buffer overflow can sometimes be turned into code execution, especially if the attacker can manipulate the heap layout (for example, with heap spraying).

- On 32-bit systems, large values and memory allocations are easier to control.
- If the application is setuid/setgid or runs with elevated privileges, the impact is more severe.

How Was It Fixed?

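The fix, released in Expat 2.4.4, adds an overflow check right before the previously unchecked addition. In simplified form:

/* Detect and prevent integer overflow before adding the retained context */
if (keep > INT_MAX - neededSize) {
    parser->m_errorCode = XML_ERROR_NO_MEMORY;
    return NULL;
}
neededSize += keep;

Instead of computing a wrapped size, XML_GetBuffer now reports XML_ERROR_NO_MEMORY and returns NULL, so the caller never receives an undersized buffer.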
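Independently of the library fix, an application can avoid ever making such oversized requests. A minimal sketch, assuming input is fed to Expat in chunks: cap each XML_GetBuffer request at a fixed size. `CHUNK_SIZE` and `next_chunk_len` are hypothetical names, and 64 KiB is an arbitrary choice.

```c
#include <stddef.h>

/* Hypothetical application-side guard (not part of Expat): never ask
 * XML_GetBuffer for more than CHUNK_SIZE bytes at once, however large the
 * remaining input is. Returns int because XML_GetBuffer takes an int len. */
#define CHUNK_SIZE 65536

static int next_chunk_len(size_t remaining) {
    return remaining < CHUNK_SIZE ? (int)remaining : CHUNK_SIZE;
}
```

A read loop would then call `XML_GetBuffer(parser, next_chunk_len(remaining))`, check the result for NULL, fill at most that many bytes, and pass them to `XML_ParseBuffer`.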

Upgrade to Expat 2.4.4 or later.
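To check programmatically, a version comparison can be sketched as follows. The function name is hypothetical; the three numbers would come from `XML_ExpatVersionInfo()`, which returns a struct with `major`, `minor`, and `micro` fields. Distributions often backport the fix without changing the upstream version number, so treat a "not fixed" result as a prompt to check your vendor's advisory rather than as proof.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the given Expat version is 2.4.4 or newer,
 * i.e. an upstream release that contains the CVE-2022-23852 fix. */
static bool expat_has_cve_2022_23852_fix(int major, int minor, int micro) {
    if (major != 2)
        return major > 2;
    if (minor != 4)
        return minor > 4;
    return micro >= 4;
}
```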

- Download from the official site.

- Python, LibreOffice, and many Linux distributions updated quickly. Check your OS or package status:
  - Debian Security Tracker
  - Red Hat Bugzilla
  - NVD entry

Run with Least Privilege

- Don't parse XML as root/admin unless you must. Parsing XML from untrusted sources is risky, and a memory-corruption bug in a privileged process does far more damage.

Conclusion

CVE-2022-23852 is a strong reminder that unchecked arithmetic, especially on memory sizes, is a real risk in security-critical C/C++ code. If you use or ship Expat, patch immediately, especially in embedded environments or anywhere you parse untrusted XML. Stay alert for integer overflow bugs: a little validation can save a world of trouble.

Have questions, or want to know whether your software is affected? Comment below or contact your vendor for a security audit.

Stay safe, and always keep your libraries up to date!

Timeline

Published on: 01/24/2022 02:15:00 UTC
Last modified on: 06/14/2022 11:15:00 UTC