CVE-2024-52338 is a vulnerability affecting the Apache Arrow R package, versions 4.0.0 through 16.1.0. It stems from the deserialization of untrusted data in the package's IPC and Parquet readers. An attacker could execute arbitrary code if an application reads Arrow IPC, Feather, or Parquet data from an untrusted source, such as a user-supplied input file.
The issue is specific to the Arrow R package. Other Apache Arrow implementations and bindings are not affected unless they are used through the R package, for example when an R application calls into them. Users of the Arrow R package should upgrade to version 17.0.0 or later, and downstream libraries should raise their dependency requirement to Arrow 17.0.0 or later.
For those who must stay on an affected version, the workaround is to read untrusted data into a Table and convert it with the Table's internal to_data_frame() method (e.g., read_parquet(..., as_data_frame = FALSE)$to_data_frame()).
Exploit Details
An attacker could exploit this vulnerability by crafting a malicious IPC, Feather, or Parquet file and tricking a user into reading it using an affected version of the Apache Arrow R package. The file contents would then be deserialized, allowing the attacker's code to be executed within the application. This could further lead to unauthorized access, data leakage, or data corruption, based on the attacker's intent and the privileges of the compromised application.
Here is a code snippet showing the affected functionality in the Arrow R package, followed by the workaround:
library(arrow)
# Reading a Parquet file directly into a data frame (vulnerable)
data_frame <- read_parquet('malicious.parquet')
# Reading an IPC/Feather file directly into a data frame (vulnerable)
data_frame <- read_ipc_file('malicious.arrow')
# Workaround: read into an Arrow Table first, then convert it with to_data_frame()
tbl <- read_parquet('possibly_malicious.parquet', as_data_frame = FALSE)
data_frame <- tbl$to_data_frame()
As a temporary fix, users can adopt the workaround above: read the untrusted data into a Table and convert it with the to_data_frame() method, which reduces the potential impact of the vulnerability. This is a stopgap only; upgrading to Apache Arrow R package version 17.0.0 or later should remain the priority.
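To keep the workaround from being skipped by accident, it can be wrapped in a small helper. The following is a minimal sketch, not an official API: the function name read_untrusted and its reader argument are hypothetical and introduced only for illustration. It assumes the reader accepts as_data_frame = FALSE and returns an Arrow Table, as read_parquet does.

```r
# Hypothetical helper; the name and the `reader` argument are illustrative
# and are not part of the arrow package.
read_untrusted <- function(path, reader = arrow::read_parquet) {
  # Ask the reader for an Arrow Table rather than a data frame.
  tbl <- reader(path, as_data_frame = FALSE)
  # Convert with the Table's own to_data_frame() method, as the
  # advisory's workaround describes.
  tbl$to_data_frame()
}
```

With arrow installed, a call such as read_untrusted("upload.parquet") would read a Parquet file, and passing reader = arrow::read_feather should cover Feather files in the same way.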
To mitigate this vulnerability, follow these steps:
1. Upgrade the Apache Arrow R package to version 17.0.0 or later. Installing the current release from your package repository does this:
install.packages("arrow")
2. Update any downstream libraries to require Apache Arrow R package version 17.0.0 or later as a dependency.
3. If upgrading is not possible, read untrusted IPC, Feather, or Parquet files with as_data_frame = FALSE and convert the resulting Table with to_data_frame(), for example read_parquet(..., as_data_frame = FALSE)$to_data_frame().
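To confirm whether an installation falls in the affected range, the installed version can be compared against the bounds given in the advisory. The sketch below uses only base R; the helper name is_affected_arrow is hypothetical, and the check assumes the affected range is exactly 4.0.0 through 16.1.0 inclusive, so development builds with extra version components are not specially handled.

```r
# Hypothetical helper: TRUE if `version` lies in the affected range
# stated in the advisory (4.0.0 through 16.1.0, inclusive).
is_affected_arrow <- function(version) {
  v <- package_version(version)
  v >= "4.0.0" && v <= "16.1.0"
}

# Check the installed arrow package, if there is one.
if (requireNamespace("arrow", quietly = TRUE)) {
  installed <- utils::packageVersion("arrow")
  if (is_affected_arrow(installed)) {
    warning("arrow ", as.character(installed),
            " is affected by CVE-2024-52338; upgrade to 17.0.0 or later.")
  }
}
```

A downstream package could run a similar check at load time, although declaring the minimum version in its dependency metadata is the more direct fix.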
Stay safe and ensure the prompt resolution of this vulnerability by keeping your software up-to-date and always being cautious when dealing with data from untrusted sources.
Timeline
Published on: 11/28/2024 17:15:48 UTC
Last modified on: 11/29/2024 15:15:17 UTC