CVE-2025-32381 - Unbounded Memory Cache in XGrammar Library Can Crash Your Servers

XGrammar is a popular open-source library designed for efficient, flexible, and portable structured generation of data. If you are building AI, ML, or NLP apps—especially those working with structured formats like JSON schemas—there’s a good chance you or your dependencies use XGrammar. However, if you are using any version before .1.18, your systems are exposed to a serious Denial of Service (DoS) risk.

This post breaks down CVE-2025-32381, explains the issue in simple terms, provides testing code, and gives clear advice on how to stay safe.

What is CVE-2025-32381?

In all versions before XGrammar .1.18, there is a bug in the way the library caches (remembers) compiled grammars. There are zero limits on how many grammars are stored in memory ("unbounded memory cache"). This means that attackers—or even just heavy use—can fill up your RAM and crash your server or application.

Fixed in: XGrammar .1.18
Exploitable in: All versions before .1.18

Short Version

If you keep sending unique schema definitions to a system using XGrammar, the library compiles and caches each new grammar forever. Since there’s no eviction or limit, over time, this causes memory usage to grow and grow—until something breaks.

Example Story

Say you’re running an LLM (Large Language Model) inference server that checks every request’s JSON structure using XGrammar. If a user (accidentally or on purpose) sends 1,000,000 unique, slightly different schemas, each is compiled and cached. This memory never gets freed—even for schemas that will never be used again. This keeps eating into your RAM until Linux’s OOM Killer or Docker or Kubernetes steps in and terminates the process—or worse, the entire node.

Code Snippet: How to Reproduce the DoS

Below is a simple Python example. Suppose your server uses XGrammar (Python bindings) to compile grammars on each request:

import xgrammar

for i in range(10**7):  # 10 million requests
    unique_json_schema = f'''{{
        "type": "object",
        "properties": {{
            "field_{i}": {{"type": "string"}}
        }},
        "required": ["field_{i}"]
    }}'''
    xgrammar.compile(unique_json_schema)
    if i % 100000 == :
        print(f"Sent {i} unique grammars...")

*With this loop, each iteration creates a grammar XGrammar will compile and cache forever. Let it run, and eventually your process will hit the memory limit.*

Real-World Attack Scenario

An attacker or botnet could abuse an LLM-based API or any public-facing endpoint which passes arbitrary schemas to XGrammar. They send repeated requests with unique payloads. Each request increases your server’s memory footprint. In a few minutes or hours—crash! Users get errors, downtime spikes, and your observability goes red.

Security Advisory:

GitHub Advisory GHSA-5a7q-qx8q-pvqg

CVE Entry:

CVE-2025-32381 on MITRE

XGrammar Repository:

github.com/lgramling/xgrammar
- Fixed Commit / Release .1.18:
Changelog Entry

Rate-limiting and API Gateway Protections:

Prevent mass request flooding and unique schemas from the same user/IP.

Conclusion

CVE-2025-32381 is a classic example of how “smart” optimizations—like aggressive caching—can backfire if not done safely. If you rely on XGrammar for parsing or validating user-submitted structures, do not delay upgrading. Unpatched, even a curious user could accidentally fill your RAM; a malicious actor could take down your whole service.

Stay secure, and keep your dependencies up-to-date!

*Exclusive content — for more security news and technical breakdowns, follow our feed or subscribe.*

Important:
If you think your application has been targeted by this attack, do a memory audit, check logs for large patterns of unique schemas, and rotate any potentially compromised keys or tokens.

Timeline

Published on: 04/09/2025 16:15:26 UTC
Last modified on: 04/09/2025 20:02:41 UTC