CVE-2024-27134 is a recently disclosed security vulnerability in MLflow, the popular open-source machine learning platform. It affects the Spark User Defined Function (UDF) implementation. The vulnerability arises from excessive directory permissions, which a local attacker can exploit to escalate privileges. The issue is only relevant when the spark_udf() MLflow API is called.

In this post, we take a deep dive into the vulnerability, walk through an illustrative code snippet, and discuss mitigation strategies. Links to the original references are included for further reading.

Background

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducibility, and deployment. One of its useful features is its integration with Apache Spark through Spark UDFs: mlflow.pyfunc.spark_udf() wraps an MLflow model as a UDF so that it can be applied to the columns of a Spark DataFrame.

Vulnerability Details

Consider a local attacker who has an unprivileged account on a machine where another, more privileged user runs MLflow code that calls spark_udf(). A directory created in this code path is given broader permissions than it needs, so the attacker can modify it. The attacker can then gain elevated permissions through a time-of-check to time-of-use (ToCToU) attack.

A ToCToU attack exploits a race condition. A program checks the state of a resource (for example, a file or directory) and then uses it, and in the short window between the check and the use, the attacker changes that resource. In this case, the attacker replaces the directory with a symbolic link pointing to a target file or folder. The victim's later operations then read or modify the target instead of the intended directory.
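The check-then-use race can be reproduced outside MLflow in a few lines. The following is a minimal, deterministic sketch, not MLflow's code. naive_write, fd_write, and demo are hypothetical helpers, the sandbox's "target" directory stands in for /etc, and the swap callback simulates the attacker winning the race at exactly the wrong moment. The naive version validates the directory by path and then resolves the path again when writing, so the swap redirects the write. The descriptor-based version opens the directory once with O_NOFOLLOW and creates the file relative to that descriptor, so the swap has no effect. It assumes a POSIX system such as Linux.

```python
import os
import tempfile


def naive_write(staging_dir, name, data, swap=None):
    """Check-then-use by path: vulnerable to a symlink swap."""
    # Time of check: staging_dir is a real directory, not a symlink.
    if os.path.islink(staging_dir) or not os.path.isdir(staging_dir):
        raise ValueError("unsafe staging directory")
    if swap:
        swap()  # simulates the attacker acting inside the race window
    # Time of use: the path is resolved again and follows any new symlink.
    with open(os.path.join(staging_dir, name), "w") as f:
        f.write(data)


def fd_write(staging_dir, name, data, swap=None):
    """Check and use through one directory file descriptor."""
    # O_NOFOLLOW makes this open fail if staging_dir is already a symlink.
    dir_fd = os.open(staging_dir, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        if swap:
            swap()
        # dir_fd still refers to the original directory, even if its path
        # has since been renamed and replaced with a symlink.
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW,
                     0o600, dir_fd=dir_fd)
        with os.fdopen(fd, "w") as f:
            f.write(data)
    finally:
        os.close(dir_fd)


def demo(writer):
    """Run writer with a simulated swap; report where the file landed."""
    with tempfile.TemporaryDirectory() as sandbox:
        staging = os.path.join(sandbox, "staging")
        target = os.path.join(sandbox, "target")  # stands in for /etc
        os.mkdir(staging)
        os.mkdir(target)

        def swap():
            # The attacker moves the real directory aside and puts a
            # symlink to the target in its place.
            os.rename(staging, staging + "-moved")
            os.symlink(target, staging)

        writer(staging, "proof", "exploited", swap=swap)
        return sorted(os.listdir(target)), sorted(os.listdir(staging + "-moved"))


if __name__ == "__main__":
    print("naive:   ", demo(naive_write))  # -> (['proof'], [])
    print("fd-based:", demo(fd_write))     # -> ([], ['proof'])
```

In the naive run the file ends up in the target directory. In the descriptor-based run it stays in the original (moved) staging directory.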

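The following snippet shows, conceptually, where the victim and the attacker act. It is an illustration rather than a working exploit. <run_id> and <artifact_path> are placeholders, df is assumed to be an existing Spark DataFrame containing the model's input columns, and the attacker's step appears as comments because MLflow does not expose the staging directory's location through its public API:

import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative attacker payload: writes "exploited" to /etc/proof_of_exploit.
# Writing there requires root, so it only succeeds if a privileged process
# executes it. Nothing in this snippet runs it; it only shows the kind of
# code an attacker would try to get executed.
malicious_script = """
import os
os.system('echo "exploited" > /etc/proof_of_exploit')
"""

with open("malicious.py", "w") as f:
    f.write(malicious_script)

# Victim side: load a logged model as a Spark UDF. On affected MLflow
# versions, spark_udf() stages model files in a local directory whose
# permissions are broader than necessary.
udf = mlflow.pyfunc.spark_udf(spark, model_uri="runs:/<run_id>/<artifact_path>")

# Attacker side (a different local account), during the ToCToU window:
# after the staging directory has been checked but before it is used, the
# attacker moves it aside and puts a symbolic link to a target such as /etc
# in its place, for example:
#     os.symlink("/etc", "<staging_dir>")
# "<staging_dir>" is a placeholder; the attacker must locate the directory.

# Victim side: applying the UDF is where MLflow uses the staged files.
df_with_predictions = df.withColumn("prediction", udf(*df.columns))

In this sketch, the attacker prepares a payload that writes "exploited" to /etc/proof_of_exploit, the victim creates the UDF with mlflow.pyfunc.spark_udf(), and the attacker, racing the victim, swaps the staging directory for a symbolic link to /etc. When the victim then applies the UDF, anything written through the swapped path lands in /etc with the victim's privileges instead of in the staging directory. If the victim runs with more privileges than the attacker, the attacker has modified a location they could not otherwise write to, which is the privilege escalation.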

References

1. MLflow GitHub Repository: https://github.com/mlflow/mlflow
2. Spark UDF Documentation: https://docs.databricks.com/applications/machine-learning/model-serving/ml-models.html

Mitigation

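The most direct mitigation is to upgrade MLflow to a release that the CVE advisory lists as fixed. Where an immediate upgrade is not possible, reduce exposure in three ways: limit which local accounts can log in to machines that call spark_udf(), make sure the directories MLflow uses to stage model files are not writable by other users, and restrict which users and jobs call the spark_udf() API at all, since the vulnerability is only reachable through it.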
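To check a host for the kind of directory this vulnerability relies on, a small audit script can flag directories that any local user can freely modify. This is a sketch under stated assumptions, not an MLflow tool. find_loose_dirs is a hypothetical helper, and it only flags directories that are world-writable without the sticky bit. The sticky bit, as set on /tmp, prevents users from renaming or deleting entries they do not own. Point it at the temporary directory your MLflow jobs use; which path that is depends on your environment.

```python
import os
import stat
import tempfile


def find_loose_dirs(root: str) -> list[str]:
    """Return directories under root that any local user can freely modify."""
    flagged = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, _filenames in os.walk(root):
        # lstat inspects the entry itself rather than a symlink target.
        mode = os.lstat(dirpath).st_mode
        world_writable = bool(mode & stat.S_IWOTH)
        sticky = bool(mode & stat.S_ISVTX)
        if world_writable and not sticky:
            flagged.append(dirpath)
    return flagged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # mkdtemp creates a directory accessible only by its owner (0o700).
        tempfile.mkdtemp(dir=root)
        loose = os.path.join(root, "loose")
        os.mkdir(loose)
        # chmod is not affected by the umask, so this really is 0o777.
        os.chmod(loose, 0o777)
        print(find_loose_dirs(root))  # prints a list containing only the "loose" path
```

The example also shows why tempfile.mkdtemp is a safer default for staging directories: it creates the directory readable, writable, and searchable only by the user who created it.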

Conclusion

CVE-2024-27134 is a local privilege escalation vulnerability in MLflow's Spark UDF implementation. A local attacker can exploit it through a ToCToU attack to gain elevated permissions on the targeted machine. Understanding its cause and how it can be exploited lets developers and administrators take appropriate measures to reduce the risk.

Timeline

Published on: 11/25/2024 14:15:06 UTC