A critical vulnerability, CVE-2023-47248, was discovered in the PyArrow library. It affects versions 0.14.0 to 14.0.0 and stems from the deserialization of untrusted data in the Inter-Process Communication (IPC) and Parquet readers. If exploited, it allows arbitrary code execution, putting applications and their data at risk.

If your application reads Arrow IPC, Feather, or Parquet data from untrusted sources, such as user-supplied input files, it is exposed to this flaw. The vulnerability affects only PyArrow, not other Apache Arrow implementations or bindings.

To mitigate this issue, PyArrow users are urged to upgrade to version 14.0.1, which resolves the vulnerability. Downstream libraries should also update their dependency requirements to PyArrow 14.0.1 or later. PyPI packages for the fixed version are already available, and conda-forge packages are expected to be available soon.
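
As a rough illustration of the version range described above, the following sketch checks whether a given PyArrow version string falls between 0.14.0 and 14.0.0 inclusive. The helper names (parse_version, is_affected, installed_pyarrow_is_affected) are hypothetical and not part of PyArrow. The parser only looks at the leading numeric components, so pre-release or development suffixes are ignored.

```python
import re
from importlib import metadata

# Range taken from the advisory: 0.14.0 through 14.0.0 are affected,
# 14.0.1 is the first fixed release.
FIRST_AFFECTED = (0, 14, 0)
FIRST_FIXED = (14, 0, 1)


def parse_version(version):
    """Return the leading numeric release components as a 3-tuple of ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (for example "2.0") are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version):
    """True if the version string lies in the affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


def installed_pyarrow_is_affected():
    """Check the installed PyArrow; return None if it is not installed."""
    try:
        return is_affected(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_pyarrow_is_affected() at application startup is one way to log a warning or refuse to read untrusted files. It is not a substitute for pinning pyarrow>=14.0.1 in your dependency requirements.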

For those who cannot upgrade, a separate package called pyarrow-hotfix is provided, which disables the vulnerability on older PyArrow versions. Instructions for using the package are on its PyPI project page.
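
A minimal sketch of applying the hotfix follows. It assumes the package is installed under the module name pyarrow_hotfix and that importing it is what activates the patch, so the import has to run before any untrusted data is read. The wrapper function apply_hotfix_if_present is a hypothetical helper, not part of the package. Check the project page for the authoritative instructions.

```python
import importlib
import importlib.util


def apply_hotfix_if_present():
    """Import pyarrow_hotfix if it is installed; return True when imported.

    Assumption: importing the module is sufficient to apply the patch.
    """
    if importlib.util.find_spec("pyarrow_hotfix") is None:
        return False
    importlib.import_module("pyarrow_hotfix")
    return True


# Run this early, before any Arrow IPC, Feather, or Parquet data is read.
hotfix_applied = apply_hotfix_if_present()
```

If hotfix_applied is False on an affected PyArrow version, the application is still exposed and should not read untrusted files.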

The following snippet shows the kind of code paths that are exposed when they receive untrusted data. It is not a working exploit:

import io

import pyarrow as pa
import pyarrow.parquet as pq


def unsafe_deserialize(data):
    # Parse the bytes as an Arrow IPC stream. On affected PyArrow versions,
    # doing this with untrusted input reaches the vulnerable code.
    reader = pa.ipc.open_stream(data)
    return reader.read_all()


def unsafe_read_parquet(parquet_data):
    # Parse the bytes as an in-memory Parquet file.
    file = io.BytesIO(parquet_data)
    return pq.read_table(file)


# Usage of unsafe_deserialize and unsafe_read_parquet with untrusted data
received_data = b'...'  # Imagine this coming from an untrusted source
try:
    # Deserialize Arrow IPC data (potentially unsafe)
    arrow_data = unsafe_deserialize(received_data)

    # Read data from a possibly unsafe Parquet file
    parquet_data = unsafe_read_parquet(received_data)
except Exception as e:
    # The placeholder bytes are not valid Arrow or Parquet data, so this runs.
    print(f"Error: {e}")

In the above example, both unsafe_deserialize and unsafe_read_parquet are potentially dangerous on affected versions because they parse data from untrusted sources. Upgrading to PyArrow 14.0.1 or applying the pyarrow-hotfix package mitigates the issue. If you use the hotfix, follow the instructions on its PyPI project page.

Timeline

Published on: 11/09/2023 09:15:08 UTC
Last modified on: 11/29/2023 03:15:42 UTC