In early 2022, a significant vulnerability was discovered in Expat, also known as libexpat — a C library that’s widely used for parsing XML files. The vulnerability, tracked as CVE-2022-23990, is due to an integer overflow in the doProlog function and affects all versions before 2.4.4.
This post breaks down this CVE simply, explains how the bug works, shows code snippets, details possible exploitation, and gives you the essential references.
What Is Expat (libexpat)?
Expat is a stream-oriented XML parser library written in C. It's heavily used in open-source projects such as Python, Ruby, Apache, and others for parsing XML data. Because it's so widely adopted, bugs in Expat can affect many downstream users.
What’s CVE-2022-23990?
CVE-2022-23990 is all about an integer overflow in the doProlog function inside the Expat codebase. If an attacker exploits this overflow, they could potentially trigger a heap buffer overflow, which can lead to a crash (denial of service) or even remote code execution, depending on how Expat is used in an application.
Short Description (from NVD)
> Expat (aka libexpat) before 2.4.4 has an integer overflow in the doProlog function.
Original NVD Entry:
https://nvd.nist.gov/vuln/detail/CVE-2022-23990
Official Expat Changelog:
https://github.com/libexpat/libexpat/blob/master/expat/Changes
Where Is the Bug? (Vulnerable Code)
The problem lies in how the doProlog function calculates the size of some internal buffers before allocating them. When parsing crafted XML, an attacker can make those calculations wrap around so that less memory is allocated than needed, causing an overflow later.
Here's a simplified illustration of what happens:
// inside xmlparse.c (heavily simplified; variable names are illustrative)
int bufferSize = elementEndPtr - s;  // pointer difference narrowed to int
// "elementEndPtr" and "s" are derived from the XML input
// The bug: the result is never checked for overflow
buffer = (char *)malloc(bufferSize);  // sized from the possibly wrapped value
memcpy(buffer, s, bufferSize);
If the size computation wraps around (to a negative number, or to a value far smaller than the real amount of data), malloc may return a buffer that is too small or fail outright, while later writes are still sized for the full data and run past the end of the allocation.
How Does the Overflow Happen?
If an attacker sends a giant or specifically crafted XML document, it's possible that elementEndPtr and s are manipulated such that their subtraction overflows the integer, resulting in a tiny (or even zero) malloc, but with a much larger memcpy. That's how heap memory gets overwritten.
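To make the wraparound concrete, here is a minimal sketch of an unchecked capacity-doubling step. The helper name double_unchecked is hypothetical and is not taken from Expat; it only illustrates how fixed-width unsigned arithmetic silently wraps once the value no longer fits.

```c
#include <stdint.h>

/* Hypothetical helper (not Expat code): doubles a 32-bit capacity
 * `times` times with no overflow check, the way a growable buffer
 * might. uint32_t arithmetic wraps modulo 2^32, so after enough
 * doublings the "capacity" collapses to a small value or zero. */
static uint32_t double_unchecked(uint32_t size, unsigned times) {
  while (times-- > 0) {
    size *= 2u;
  }
  return size;
}
```

Starting from a capacity of 32, 26 doublings still fit in 32 bits, but the 27th wraps to zero. Any code that then allocates from the wrapped value while still writing the real amount of data ends up overrunning the heap buffer.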
How Can This Be Exploited?
The most direct result is a crash (denial of service). In certain scenarios, especially if Expat parses XML from untrusted sources or runs in a privileged process, the bug could enable more dangerous attacks such as code execution.
Craft a Malicious XML
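The XML is crafted so that the size calculation overflows. The upstream changelog describes the trigger as large content in element type declarations (the DTD part of the document), so deeply nested ordinary elements on their own are not expected to reach the vulnerable path.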
Memory corruption
- Under rare circumstances, the attacker could execute code (if they can control overflow data precisely).
Sample Proof-of-Concept (PoC) Code
Below is a conceptual PoC. This only shows the idea — real exploitation often requires fuzzing and fine-tuning.
<!-- Example of a deep recursive element -->
<root>
<a><a><a><a><a>... (repeat many times) ...</a></a></a></a></a>
</root>
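And a C code snippet that parses this XML using Expat:
#include <stdio.h>
#include <expat.h>

/* Stub handler: does nothing, it only needs to be registered. */
static void XMLCALL startElement(void *userData, const XML_Char *name,
                                 const XML_Char **atts) {
  (void)userData; (void)name; (void)atts;
}

int main(void) {
  XML_Parser parser = XML_ParserCreate(NULL);
  FILE *f = fopen("evil.xml", "rb");
  if (parser == NULL || f == NULL) {
    fprintf(stderr, "Could not create parser or open evil.xml\n");
    if (f != NULL) fclose(f);
    if (parser != NULL) XML_ParserFree(parser);
    return 1;
  }
  XML_SetStartElementHandler(parser, startElement);

  char buf[8192];
  int done;
  do {
    size_t len = fread(buf, 1, sizeof(buf), f);
    done = len < sizeof(buf); /* short read: end of file or read error */
    if (XML_Parse(parser, buf, (int)len, done) == XML_STATUS_ERROR) {
      fprintf(stderr, "Parse error at line %lu\n",
              (unsigned long)XML_GetCurrentLineNumber(parser));
      break;
    }
  } while (!done);

  XML_ParserFree(parser);
  fclose(f);
  return 0;
}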
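If you feed a vulnerable build of this parser a malicious XML document crafted to trigger the overflow, it could crash. Note that the upstream changelog says the vulnerable path is only reached when an element declaration handler has been registered with XML_SetElementDeclHandler, which this minimal harness does not do.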
The Expat maintainers fixed the issue in release 2.4.4 by adding integer overflow checks before the affected allocations.
- Download link: https://github.com/libexpat/libexpat/releases/tag/R_2_4_4
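The general shape of an overflow check before an allocation can be sketched as follows. This is not the upstream patch; alloc_array_checked is a hypothetical helper that shows the common pattern of validating the size computation before calling malloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the actual Expat fix): allocate room for
 * `count` elements of `elem_size` bytes each, refusing any request
 * whose total byte count would not fit in size_t.
 * Returns NULL on overflow or on allocation failure. */
static void *alloc_array_checked(size_t count, size_t elem_size) {
  if (elem_size != 0 && count > SIZE_MAX / elem_size) {
    return NULL; /* count * elem_size would wrap around */
  }
  return malloc(count * elem_size);
}
```

The check divides instead of multiplying, so the test itself cannot overflow. A caller treats a NULL result the same way as an out-of-memory condition and stops parsing, which turns a potential heap overflow into a clean error.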
References
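- NVD Details: CVE-2022-23990 (https://nvd.nist.gov/vuln/detail/CVE-2022-23990)
- GitHub Release: Expat 2.4.4 (https://github.com/libexpat/libexpat/releases/tag/R_2_4_4)
- Expat Changelog: expat/Changes (https://github.com/libexpat/libexpat/blob/master/expat/Changes)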
- Blog with more technical insight: Google OSS-Fuzz bug report
Summary
CVE-2022-23990 is a real threat, especially for applications that use old versions of Expat to parse XML from untrusted sources. Understanding how the integer overflow in doProlog arises, and how a crafted XML document can reach it, makes clear why updating this library in your software supply chain matters.
Simple advice:
Update Expat to 2.4.4 or newer now to prevent crashes or worse.
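As a rough aid for auditing, the version comparison can be sketched like this. The helper expat_version_is_patched is hypothetical. In a program linked against libexpat, the three numbers could be taken from the major, minor, and micro fields returned by XML_ExpatVersionInfo(). Treat the result as a hint only: distributions sometimes backport security fixes without changing the version number.

```c
/* Hypothetical helper: returns 1 if version major.minor.micro is
 * 2.4.4 or newer, 0 otherwise. The comparison is purely numeric and
 * knows nothing about fixes backported by a distribution. */
static int expat_version_is_patched(int major, int minor, int micro) {
  if (major != 2) {
    return major > 2;
  }
  if (minor != 4) {
    return minor > 4;
  }
  return micro >= 4;
}
```

For example, 2.4.3 is reported as unpatched, while 2.4.4 and 2.5.0 pass the check.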
*Stay safe, keep your dependencies up to date, and always validate your inputs!*
Timeline
Published on: 01/26/2022 19:15:00 UTC
Last modified on: 06/14/2022 11:15:00 UTC