A major security flaw has been found in Apache Tika, affecting its tika-core (1.13-3.2.1), tika-pdf-module (2.0.0-3.2.1), and tika-parsers (1.13-1.28.5) modules on all platforms. Tracked as CVE-2025-66516, this issue allows attackers to perform XML External Entity (XXE) injection using a specially crafted XFA form inside a PDF file.

This post explains what the vulnerability is, why the scope is broader than the previous CVE-2025-54988, how it can be exploited, and what you should do right now.

What is XXE and Why is it Dangerous?

XXE (XML External Entity) injection is a vulnerability that lets attackers interfere with how an application parses XML data. By injecting malicious XML content, attackers can:

- Read sensitive files on the server (such as /etc/passwd on Unix)
- Potentially execute code remotely (depending on configuration)

All the attacker needs is for your application to process a malicious XML file, often hidden inside common formats like PDF.
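To make the defensive side of this concrete, here is a minimal sketch of an XML parser configured to refuse DOCTYPE declarations and external entities. It uses only the JDK's standard JAXP API and is not the patch that was applied inside Tika; the class and method names (`XxeHardeningSketch`, `newHardenedBuilder`, `isRejected`) are hypothetical and chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only (not part of Tika).
class XxeHardeningSketch {

    // Builds a DOM parser that refuses DOCTYPE declarations and external entities.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the place where entities are declared.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth, in case DOCTYPE ever has to be allowed again.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the hardened parser refuses the document.
    // Note: any SAX parse error counts as a rejection, including malformed XML.
    static boolean isRejected(String xml) {
        try {
            newHardenedBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String malicious = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>"
            + "<data>&xxe;</data>";
        System.out.println("malicious rejected: " + isRejected(malicious));
        System.out.println("plain rejected: " + isRejected("<data>ok</data>"));
    }
}
```

Rejecting the DOCTYPE entirely is the bluntest of these settings, and it is usually acceptable for form data, which rarely needs a DTD.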

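Affected modules and versions:

- tika-core: 1.13 up to 3.2.1
- tika-pdf-module: 2.0.0 up to 3.2.1
- tika-parsers: 1.13 up to 1.28.5 (old branch, PDF parsing in this module)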

All platforms (Windows, Linux, Mac) are vulnerable.

CVE-2025-66516 shows that the real vulnerability (and the actual fix) is in tika-core.

- If you only updated tika-parser-pdf-module (as previous advisories said), and not tika-core, you are _still vulnerable_!

Legacy Module

- Previous reports missed that, in Tika 1.x versions, PDF parsing (and the vulnerable code) was part of tika-parsers (org.apache.tika:tika-parsers).

Technical Details & Exploitation

When Tika processes a PDF with an interactive form (XFA), some code paths try to parse the embedded XML without disabling external entity resolution. An attacker can weaponize this using a malicious PDF file:

- The PDF contains an embedded XFA XML form.

- The XML references an external entity pointing to a sensitive file (such as /etc/passwd on Unix).

Malicious XFA XML inside the PDF

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<xfa:data>
    <username>&xxe;</username>
    <password>password123</password>
</xfa:data>

If your Java code parses PDFs with _any Tika version before the fix_, the xxe entity will be resolved, so the contents of /etc/passwd are read and may be disclosed to the attacker.

A simple illustration in Java (ANY vulnerable Tika version)

InputStream pdfStream = ...; // Attacker's malicious PDF
PDFParser parser = new PDFParser();
Metadata metadata = new Metadata();
parser.parse(pdfStream, new BodyContentHandler(), metadata, new ParseContext());

If the embedded XFA content triggers an XXE, your server data is at risk.

- Upgrade tika-core and tika-pdf-module to at least version 3.2.2

- If using Tika 1.x, tika-parsers needs to be at least 1.28.6 (or, preferably, upgrade to the latest branch)
- Do not rely on only updating the tika-pdf-module or individual parser modules – the root fix is in the core library.
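When auditing a dependency tree, it can help to check resolved version strings against the affected ranges quoted in this post. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, versions are assumed to be plain dotted numbers with an optional `-qualifier` suffix, and the ranges are copied from the advisory text above rather than queried from any authoritative source.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: flags versions inside the affected ranges.
class TikaVersionCheckSketch {

    // Parses "major.minor.patch" into three ints; missing parts default to 0.
    // Anything after the first '-' (for example "-SNAPSHOT") is ignored.
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // True if low <= version <= high, compared component by component.
    static boolean inRange(String version, String low, String high) {
        int[] v = parse(version);
        return Arrays.compare(v, parse(low)) >= 0 && Arrays.compare(v, parse(high)) <= 0;
    }

    // tika-core: 1.13 up to 3.2.1 is affected (range taken from this post).
    static boolean isAffectedCore(String version) {
        return inRange(version, "1.13", "3.2.1");
    }

    // tika-parsers (1.x branch): 1.13 up to 1.28.5 is affected (range taken from this post).
    static boolean isAffectedLegacyParsers(String version) {
        return inRange(version, "1.13", "1.28.5");
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.12", "2.9.0", "3.2.1", "3.2.2"}) {
            System.out.println("tika-core " + v + " affected: " + isAffectedCore(v));
        }
    }
}
```

A check like this only covers versions you feed into it; transitive dependencies still need to be listed with your build tool's dependency report first.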

Detection

- Look in your logs for exceptions during PDF parsing that reference external entities or unexpected system file reads.
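As a complement to log review, already-extracted XFA or XML content can be screened for external entity declarations like the one in the payload shown earlier. The sketch below is a heuristic only, with hypothetical class and method names. It assumes you already have the decoded XML text, since XFA streams inside a PDF are often compressed and would not be visible in a raw byte scan, and a regular expression like this can be evaded by unusual encodings.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic scanner, for illustration only (not part of Tika).
class ExternalEntityScanSketch {

    // Matches general or parameter entity declarations that use SYSTEM or PUBLIC,
    // e.g. <!ENTITY xxe SYSTEM "file:///etc/passwd"> or <!ENTITY % dtd SYSTEM "...">.
    private static final Pattern EXTERNAL_ENTITY = Pattern.compile(
        "<!ENTITY\\s+(%\\s+)?[^\\s>]+\\s+(SYSTEM|PUBLIC)\\s");

    // Returns true if the XML text appears to declare an external entity.
    static boolean declaresExternalEntity(String xml) {
        return EXTERNAL_ENTITY.matcher(xml).find();
    }

    public static void main(String[] args) {
        String suspicious = "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>"
            + "<xfa:data><username>&xxe;</username></xfa:data>";
        String benign = "<xfa:data><username>alice</username></xfa:data>";
        System.out.println("suspicious flagged: " + declaresExternalEntity(suspicious));
        System.out.println("benign flagged: " + declaresExternalEntity(benign));
    }
}
```

A match is a reason to quarantine and inspect the file, not proof of an attack, and a clean result does not replace upgrading.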

References

- Apache Tika Security Advisory (2025-66516) *(NVD listing may lag)*
- GitHub Tika Project
- CVE-2025-54988 (previous/related advisory)
- What is XXE? (OWASP Guide)

Summary

CVE-2025-66516 is a critical flaw in Apache Tika's PDF handling chain, affecting old and new versions through both core and parser modules. Attackers can extract server files by embedding crafted XFA XML in PDFs. The fix is to update both tika-core and the relevant parser libraries. Don't just patch one module: audit your dependency tree and push the updates to production.

Stay safe. Patch now.

*This post is exclusive and offers a simple breakdown of a complex, real-world vulnerability affecting a common document-processing stack. For more technical help, see the references above.*

Timeline

Published on: 12/04/2025 16:17:24 UTC
Last modified on: 12/30/2025 16:15:46 UTC