Overview
CVE-2024-35333 is a newly discovered vulnerability affecting html2xhtml 1.3, an open-source tool for converting HTML documents into XHTML. This vulnerability is a stack buffer overflow in the function read_charset_decl, caused by improper bounds checking. Attackers can exploit this by supplying a specially crafted input, leading to denial of service, data corruption, or even arbitrary code execution.
This post will cover how the vulnerability works, provide some code examples, walk through a sample exploit, and link to important references.
What is html2xhtml?
html2xhtml is a command-line utility that parses HTML files and converts them into well-formed XHTML. Written in C, it's often used in automated web processing pipelines and data conversion tools.
The Vulnerable Code: read_charset_decl
The vulnerability is located in the read_charset_decl function. Here’s a simplified version of the function illustrating the issue:
void read_charset_decl(char* input) {
    char buf[64]; // Fixed-size buffer on stack
    // Vulnerable usage: no check on input length.
    strcpy(buf, input);
    // ... Further processing
}
How Does the Exploit Work?
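An attacker can pass read_charset_decl a string of 64 characters or more. Because strcpy also writes the terminating null byte, such a string needs at least 65 bytes and does not fit in buf. strcpy keeps copying past the end of the buffer and corrupts adjacent data on the stack, which can include saved registers and the function's return address.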
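To make the arithmetic concrete, here is a minimal sketch in C that computes how many bytes strcpy would write past the end of the buffer for a given input. The names overflow_bytes and CHARSET_BUF_SIZE are illustrative and do not come from html2xhtml; the 64-byte size is taken from the simplified function above. The helper only measures the input and never performs the unsafe copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed size: matches buf[64] in the simplified function above. */
#define CHARSET_BUF_SIZE 64

/* Illustrative helper (not part of html2xhtml).
 * Returns the number of bytes strcpy would write past the end of a
 * buffer of buf_size bytes when copying `input`, or 0 if the copy fits. */
size_t overflow_bytes(const char *input, size_t buf_size)
{
    assert(input != NULL);
    size_t needed = strlen(input) + 1; /* strcpy also copies the '\0' */
    return needed > buf_size ? needed - buf_size : 0;
}
```

For an 80-character string, the copy needs 81 bytes, so overflow_bytes reports 17 bytes written beyond the 64-byte buffer. A 63-character string is the longest input that fits.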
Step-By-Step Exploit (Proof-of-Concept)
Let’s walk through how a malicious user might exploit this vulnerability.
1. Crafting the Malicious Input
Suppose an attacker prepares an input string that is 80 bytes long.
# Python code to create the payload
payload = b"A" * 80  # 80 bytes, all 'A'
with open("exploit.txt", "wb") as f:
    f.write(payload)
2. Feeding Input to the Program
Assume html2xhtml allows specifying a charset via an argument or config file that calls read_charset_decl. The attacker feeds the payload from "exploit.txt":
html2xhtml --charset "$(cat exploit.txt)" input.html output.html
How to Fix
The correct way to eliminate this vulnerability is to use a safe string copy that checks boundaries, like strncpy or, better yet, snprintf, and to ensure the buffer is always null-terminated.
Vulnerable
strcpy(buf, input);
Fixed
strncpy(buf, input, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0'; // guarantee null-termination
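The snprintf variant mentioned above can be sketched as follows. This is an illustration under assumptions, not the project's actual patch: copy_charset is a hypothetical helper name. Unlike strncpy, snprintf always null-terminates when the size is nonzero, and its return value shows whether the input was truncated, so the caller can reject an oversized charset declaration instead of silently shortening it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of html2xhtml).
 * Copies src into dst, writing at most dst_size bytes including the
 * terminating '\0'. Returns 0 on success, or -1 if src did not fit
 * (dst then holds a truncated, still null-terminated copy). */
int copy_charset(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf returns the length src would need, excluding the '\0'. */
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0 || (size_t)n >= dst_size) {
        return -1; /* encoding error or truncation */
    }
    return 0;
}
```

A caller would invoke it as copy_charset(buf, sizeof(buf), input) and treat a return value of -1 as invalid input rather than continuing with the truncated string.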
References
- CVE-2024-35333 Record at NVD (pending)
- html2xhtml Project Page
- Common C Mistakes: Buffer Overflows (CWE-121)
- An Introduction to Stack Smashing Attacks
1. Do not use html2xhtml 1.3 with untrusted input.
2. Monitor the SourceForge project for patches.
Conclusion
CVE-2024-35333 is a classic but dangerous stack buffer overflow caused by the unchecked use of strcpy in the read_charset_decl function. As history shows, such vulnerabilities can let attackers crash programs or take control of systems when user input is not length-checked before it is copied. Until a patch is released, use html2xhtml 1.3 only with trusted input, and always validate input lengths in C programs.
Timeline
Published on: 05/29/2024 16:15:11 UTC
Last modified on: 08/19/2024 16:35:15 UTC