Summary: A Denial of Service (DoS) vulnerability has been identified in vLLM's use of the Outlines library: a malicious user can fill the Outlines compilation cache with unique entries, potentially exhausting the filesystem's storage space. This vulnerability affects the V0 engine and has been fixed in version 0.8.0.
Background
vLLM is a high-throughput and memory-efficient inference and serving engine used for Language Models (LLMs). It leverages the Outlines library as a backend to support structured output (a.k.a. guided decoding). The Outlines library provides an optional cache for compiled grammars on the local filesystem, and this cache is enabled by default in vLLM. The Outlines library is also accessible by default through the OpenAI compatible API server.
Affected Code
The affected code is located within the vLLM module vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache provided by the Outlines library.
# vllm/model_executor/guided_decoding/outlines_logits_processors.py
# Simplified illustration of the affected pattern (identifiers paraphrased,
# not the verbatim source): the Outlines filesystem cache is used
# unconditionally, so every compiled grammar is persisted to disk.
cache = FilesystemCache(base_path)
Exploit Details
A malicious user could send a stream of very short decoding requests, each containing a unique decoding schema, so that every request adds a new entry to the cache. Because the cache grows without bound, this can lead to a Denial of Service once the affected filesystem exhausts its available storage space.
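The mechanism can be sketched as follows. This is a hypothetical simulation, not vLLM or Outlines code: it assumes the cache is keyed by a hash of the schema source, so every distinct schema yields a distinct on-disk entry (`make_schema` and `cache_key` are illustrative names):

```python
import hashlib
import json

def make_schema(i: int) -> str:
    # Each request embeds a unique property name, yielding a unique schema.
    return json.dumps({"type": "object",
                       "properties": {f"field_{i}": {"type": "string"}}})

def cache_key(schema: str) -> str:
    # Assumed cache-keying scheme: hash of the schema source.
    return hashlib.sha256(schema.encode()).hexdigest()

# Simulate 1000 short requests, each with a fresh schema: every single one
# produces a new cache entry, so disk usage grows linearly with requests.
cache = {cache_key(make_schema(i)) for i in range(1000)}
print(len(cache))  # → 1000
```

Because the attacker controls the schema, nothing bounds the number of distinct entries except available disk space.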
Note that even if vLLM is configured to use a different backend by default, the Outlines backend remains selectable on a per-request basis via the guided_decoding_backend key of the extra_body field within the request:
{
  "extra_body": {
    "guided_decoding_backend": "outlines"
  }
}
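A minimal sketch of constructing such a request body, e.g. as a Python dict to be passed to an OpenAI-compatible client (the model name and message content are placeholders):

```python
# Per-request backend override mirroring the JSON above; "model" and the
# message content are placeholder values, not part of the advisory.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "extra_body": {
        "guided_decoding_backend": "outlines",
    },
}
print(payload["extra_body"]["guided_decoding_backend"])  # → outlines
```

This is why server-side configuration alone does not close the exposure: the backend choice travels with each request.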
Vulnerability Scope and Fix
This vulnerability specifically impacts the V0 engine. The development team resolved the issue in vLLM version 0.8.0 by disabling the Outlines cache by default. Users can opt back in via the VLLM_V0_USE_OUTLINES_CACHE environment variable:
# Simplified illustration of the fix in vLLM 0.8.0 (identifiers paraphrased):
# the filesystem cache is only used when explicitly enabled via the
# VLLM_V0_USE_OUTLINES_CACHE environment variable.
cache = FilesystemCache(base_path) if envs.VLLM_V0_USE_OUTLINES_CACHE else None
Recommendations and Conclusion
Users of the vLLM V0 engine are strongly advised to upgrade to vLLM 0.8.0, which disables the Outlines cache by default and thereby mitigates this Denial of Service risk. As defense in depth, users can also monitor incoming request streams and apply rate limiting to prevent abuse.
Timeline
Published on: 03/19/2025 16:15:31 UTC