In early 2022, security researchers discovered a significant vulnerability in libexpat, a widely used XML parsing library. This flaw, tracked as CVE-2022-22823, affects the build_model function within xmlparse.c prior to version 2.4.3. If exploited, it allows for an integer overflow, leading to potential denial of service or even remote code execution in some cases.

In this long read, we'll break down what this vulnerability is, how it can be exploited, and what you can do to stay safe. I'll even show you a simplified vulnerability demo using C code, so you can see the problem in action.

1. What Is libexpat?

Expat (or libexpat) is an open-source XML parser written in C. It is found in many systems, ranging from embedded devices to web servers and programming languages (like Python and PHP).

Why Should You Care?

If an attacker can feed specially crafted XML to an application that parses it with a vulnerable libexpat, that application may crash or have its memory corrupted.

2. The Heart of the Bug: build_model and Integer Overflows

In C programming, an integer overflow happens when an operation tries to create a numeric value that is outside of the range that can be represented with a given number of bits.

Where Did the Overflow Occur?

In the vulnerable versions of Expat, the allocation sizes in the build_model function (xmlparse.c) could be calculated incorrectly—resulting in a much smaller buffer being allocated than needed.

Main Issue:  
If the allocated buffer is smaller than the data later written into it, adjacent memory is overwritten. This is a classic route to program crashes or arbitrary code execution.

Here's an illustrative sketch of the vulnerable pattern in xmlparse.c in libexpat < 2.4.3 (simplified; the names are placeholders rather than Expat's actual identifiers):

int length = group->length;
Model *model = (Model *)MALLOC((length + 1) * sizeof(Model));
if (!model) { return XML_ERROR_NO_MEMORY; }
// ... further code that writes to model array

What Goes Wrong?

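If length is large enough, the size calculation overflows. Adding 1 to the largest possible int is undefined behavior in C and in practice often wraps to a negative value, and the multiplication by sizeof(Model) can wrap around as well. The result? The allocated buffer is far smaller than the data it is supposed to hold. When libexpat writes to this buffer, it can go out of bounds!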

3. Simplified Simulation in C

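#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Unsigned types are used so the wraparound is well defined;
    // overflowing a signed int would be undefined behavior.
    unsigned int length = UINT_MAX; // stands in for an attacker-influenced count
    // UINT_MAX + 1 wraps to 0, so the requested size collapses to 0 bytes.
    unsigned int alloc_size = (length + 1u) * (unsigned int)sizeof(int);
    printf("Requesting %u bytes for %u elements\n", alloc_size, length);
    int *model = malloc(alloc_size); // malloc(0) may return NULL or a tiny block
    if (!model) {
        printf("Allocation failed.\n");
        return 1;
    }
    model[length] = 1; // Intentional out-of-bounds write! Run only in a sandbox.
    free(model);
    return 0;
}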

Running this code could crash or corrupt memory—perfect conditions for an exploit.

4. How Can Attackers Exploit This?

An attacker supplies a malicious XML document that forces libexpat to process a huge group, making length too large. This triggers the overflow and leads to the out-of-bounds write. Possible outcomes:

- Denial of Service: The service parsing the XML crashes.
- Arbitrary Code Execution: In certain circumstances, the attacker might run code in the context of the process that uses the parser.

Note: Expat is used by many system services, some of which run with root privileges, which makes this a critical issue.

5. Timeline and Patches

- Reported: Early Jan 2022 (Initial GitHub Issue)
- Patched: v2.4.3 (Release notes)
- Public Advisory: NVD CVE-2022-22823

Commit Fix

Commit on GitHub

The core fix: Prevent the length + 1 calculation from overflowing by checking for possible overflows before the allocation, and by switching to size_t types.

7. Further Reading & References

- Expat Security Announcements
- NVD Detail for CVE-2022-22823
- GitHub commit fixing CVE-2022-22823
- Original Vulnerability Report

8. Conclusion

CVE-2022-22823 is an important example of how simple programming mistakes—like missing a check in arithmetic operations—can lead to serious security problems in upstream software, affecting millions of downstream users.

Update your dependencies. And remember, "big numbers" that wrap around are more dangerous than they look!


*Feel free to share this post and help keep the open-source community safe.*

Timeline

Published on: 01/10/2022 14:12:00 UTC
Last modified on: 06/14/2022 11:15:00 UTC