In early 2024, a vulnerability tracked as CVE-2023-52497 was found and fixed in the Linux kernel’s EROFS (Enhanced Read-Only File System) subsystem. The issue affected how EROFS handles in-place decompression with the LZ4 algorithm, and it could lead to data corruption under certain hardware conditions, particularly on newer Intel processors with the FSRM (Fast Short REP MOV) feature.

This is an easy-to-understand deep dive into what happened, how the problem was found, and what the fix looks like, with some code snippets and a basic proof-of-concept (PoC) outline.

What is EROFS and LZ4?

- EROFS: Enhanced Read-Only File System, a high-performance and space-efficient compressed read-only filesystem used in Linux, commonly on mobile and embedded devices.

- LZ4: A fast lossless compression algorithm designed for high-speed decompression.

- In-place decompression: Uncompressing data directly into its final location, so the output may overlap the compressed input.

Technical Background

Many LZ77-family compression methods (like LZ4) expect the compressed data to sit at the *end* of the decompressed buffer when decoding in place. The decoder writes from the start of the buffer while reading from its tail, so the output can grow into input that has already been consumed, and the code is written to handle this overlap.

Example memory structure

|------------ destination buffer for decompressed data ------------|
|  --> (decompression direction)           |--- compressed data ---|

LZ4 handles this overlap by calling memmove() in the decompressor, which copies overlapping memory regions safely as long as both pointers refer to the same mapping of that memory.
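To make the layout concrete, here is a minimal sketch of in-place decoding in C. It uses a toy run-length format as a stand-in for LZ4, so the format, the function name `rle_decode_inplace`, and the rule that every run is at least 2 bytes long are all assumptions made for illustration. Real LZ4 relies on a safety margin rather than such a rule.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for LZ4 (NOT the real format): the input is a sequence of
 * (count, byte) pairs, each expanding to `count` copies of `byte`.
 *
 * buf     : output buffer of out_len bytes
 * in_len  : size of the compressed data, which the caller has stored in
 *           the LAST in_len bytes of buf
 * Returns the number of bytes produced, or 0 on malformed or empty input.
 *
 * Assumption: every count is >= 2, so each pair produces at least as many
 * bytes as it consumes and the write position never passes the read
 * position.
 */
size_t rle_decode_inplace(unsigned char *buf, size_t out_len, size_t in_len)
{
    size_t rd, wr = 0;

    if (in_len > out_len || in_len % 2 != 0)
        return 0;
    rd = out_len - in_len;          /* read position: tail of the buffer */

    while (rd < out_len) {
        unsigned char count = buf[rd];
        unsigned char value = buf[rd + 1];

        rd += 2;                    /* consume the pair before writing */
        if (count < 2 || wr + count > rd)
            return 0;               /* would overwrite unread input */
        memset(buf + wr, value, count);
        wr += count;
    }
    return wr;
}
```

For an 8-byte buffer whose last four bytes hold the pairs (3, 'a') and (5, 'b'), the call `rle_decode_inplace(buf, 8, 4)` fills the buffer with `aaabbbbb`, overwriting the compressed bytes only after they have been read.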

The Vulnerability (CVE-2023-52497): What's the Issue?

EROFS arranged the compressed data at the end of the decompressed buffer as LZ4 expects, but it typically mapped the compressed and decompressed pages as two *separate* kernel virtual mappings (virtual buffers), so the relative order of the two addresses was uncertain. For years this went unnoticed, because LZ4 only uses memmove() for short overlapped literals and the memmove() implementations on common x86/arm64 platforms happened to cover the problem up. But:

- On newer Intel CPUs with the FSRM feature, the rep movsb instruction used by memmove() exposed the bug.
- memmove() picks its copy direction from the virtual addresses it is given. With two mappings, those addresses say nothing reliable about how the underlying pages overlap, so memmove() could copy in the wrong order and corrupt the output.

TL;DR:
On certain processors, decompressing files in-place with EROFS+LZ4 could randomly corrupt data when pages overlap, even though the code *looked* safe. This bug didn’t show up for years due to previous CPU and kernel implementation quirks.

Here’s an abstract version of how the code *used* to work

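// Map the compressed and decompressed pages as two separate virtual buffers
void *compressed_vb = kmap(compressed_page);
void *inplace_vb    = kmap(decompressed_page);

// Danger! The two addresses have no guaranteed relative order,
// even though the underlying pages overlap.
memmove(inplace_vb, compressed_vb, size);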

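memmove() decides whether to copy forwards or backwards by comparing inplace_vb with compressed_vb. Because the two pointers come from different mappings, that comparison does not reflect how the underlying pages actually overlap, so the copy can run in the wrong direction and corrupt data. Older memmove() implementations happened to hide this for the short copies LZ4 issues; the rep movsb path used on FSRM processors does not.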
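The two-mapping problem can be approximated in userspace. The sketch below maps one page of a temporary file at two virtual addresses with mmap() and shifts data inside that page, once through a single mapping and once through both. `bytewise_memmove` and `aliased_copy_differs` are hypothetical names, and the byte-wise copy is a simplified stand-in chosen to make the effect deterministic; it is not the kernel's or any libc's memmove().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Simplified memmove: picks the copy direction purely from the pointer
 * values, then copies one byte at a time. */
static void bytewise_memmove(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    if ((uintptr_t)dst < (uintptr_t)src) {
        for (size_t i = 0; i < n; i++)   /* forward copy */
            dst[i] = src[i];
    } else {
        while (n--)                      /* backward copy */
            dst[n] = src[n];
    }
}

/*
 * Maps the same file page at two addresses (lo < hi), then moves 8 bytes
 * from page offset 4 to page offset 0 in two ways.
 * Returns 1 if the results differ, 0 if they match, -1 on setup failure.
 */
int aliased_copy_differs(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    unsigned char same[12], split[12];

    if (!f || ps <= 0 || ftruncate(fileno(f), ps) != 0)
        return -1;

    /* Reserve two adjacent pages so the address order is known. */
    unsigned char *lo = mmap(NULL, 2 * ps, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (lo == MAP_FAILED)
        return -1;
    unsigned char *hi = lo + ps;

    /* Both halves of the reservation now alias the same file page. */
    if (mmap(lo, ps, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
             fileno(f), 0) == MAP_FAILED ||
        mmap(hi, ps, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
             fileno(f), 0) == MAP_FAILED)
        return -1;

    /* Source and destination in the SAME mapping: the overlap is visible
     * in the pointer values, so the forward direction is chosen. */
    memcpy(lo, "0123456789AB", 12);
    bytewise_memmove(lo, lo + 4, 8);
    memcpy(same, lo, 12);

    /* Destination through the high mapping, source through the low one:
     * dst now compares greater than src, so the copy runs backward even
     * though the bytes overlap the other way round. */
    memcpy(lo, "0123456789AB", 12);
    bytewise_memmove(hi, lo + 4, 8);
    memcpy(split, lo, 12);

    munmap(lo, 2 * ps);
    fclose(f);
    return memcmp(same, split, 12) != 0;
}
```

Through a single mapping the page ends up starting with `456789AB`, which is the intended result. Through two mappings the same logical move leaves `89AB89AB` there, because source bytes are overwritten before they are read.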

Imagine this simple overlap in memory

// compressed_buffer and decompressed_buffer might overlap incorrectly:
memmove(decompressed_buffer, compressed_buffer, size);

If decompressed_buffer starts after compressed_buffer and the two regions overlap, a naive forward copy (or a hardware-accelerated one) overwrites source bytes before it has read them.
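The effect of copy direction on overlapping regions can be shown with two small helpers. This is an illustrative sketch only; `copy_forward` and `copy_backward` are hypothetical names and are not taken from the kernel or from LZ4.

```c
#include <stddef.h>

/* Copies low addresses first. With overlapping regions this is only
 * correct when dst starts below src. */
void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copies high addresses first. With overlapping regions this is only
 * correct when dst starts above src. */
void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        dst[n] = src[n];
}
```

Moving the first six bytes of `abcdefgh` two positions to the right gives `ababcdef` with `copy_backward`, which is correct, but `abababab` with `copy_forward`, because the forward loop re-reads bytes it has just written. A real memmove() chooses between the two strategies by comparing the pointers, which is why it needs pointer values that reflect the true overlap.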




## Exploit/Proof-of-Concept (PoC) Outline

Note: Since this is a data corruption bug and not a classic privilege escalation, the threat is mostly accidental data loss, not direct code execution. But in theory, a malicious file system image could trigger silent corruption of user data.

Tiny PoC Snippet (pseudo-code)

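# Only reproduces on a susceptible kernel/hardware combination
sudo modprobe erofs

# Create an LZ4-compressed EROFS image from a directory
# (crafting data that triggers the bug: details omitted)
mkfs.erofs -zlz4 image.erofs my_data/

# Mount and read
sudo mount -t erofs -o loop image.erofs /mnt/test
cat /mnt/test/my_data_file > /tmp/output

# Output may be silently corrupt! Compare it with the original:
cmp my_data/my_data_file /tmp/output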

From the commit message

> "Let’s strictly use the decompressed buffer for lz4 inplace decompression for now..."

In practice:
Always make sure the overlapped region is laid out as LZ4 expects, i.e., with the compressed data addressed through the same mapping as the decompressed buffer and tied to its *end*, or, safer, by *not using in-place decompression* if that layout can’t be guaranteed.

Patch (simplified)

// Use only the decompressed buffer's mapping for LZ4 in-place decompression:
// the compressed input is addressed as an offset into that same buffer
decompress_lz4(decompressed_buffer, decompressed_buffer + offset, size);

Full fix:
Kernel patch: erofs: fix lz4 inplace decompression
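The rule described above can be sketched as a small decision helper in C. This is one possible way to express the check, not the kernel's actual code. The `struct req` descriptor, the integer page IDs standing in for page pointers, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor: pages are identified by small integer
 * IDs, standing in for page pointers in a real kernel. */
struct req {
    const int *in_pages;    /* pages holding the compressed data */
    size_t     n_in;
    const int *out_pages;   /* pages receiving the decompressed data */
    size_t     n_out;
};

/* True only if the input pages are exactly the LAST n_in output pages,
 * in the same order: the layout LZ4 expects for in-place decoding. */
bool input_is_output_tail(const struct req *rq)
{
    if (rq->n_in > rq->n_out)
        return false;
    for (size_t i = 0; i < rq->n_in; i++)
        if (rq->out_pages[rq->n_out - rq->n_in + i] != rq->in_pages[i])
            return false;
    return true;
}

/*
 * Returns the page offset inside the single output mapping at which the
 * compressed data starts, or -1 if the caller should instead copy the
 * input to a separate buffer and decompress out of place.
 */
long inplace_input_page_offset(const struct req *rq)
{
    if (!input_is_output_tail(rq))
        return -1;
    return (long)(rq->n_out - rq->n_in);
}
```

With output pages {10, 11, 12, 13} and input pages {12, 13}, the helper returns 2, so the input pointer can be derived from the output mapping and both pointers share one address range. With input pages {13, 12} it returns -1 and the in-place path is skipped.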

What Should You Do?

- Users: Update to a Linux kernel with the fix (look for the commit above: kernel 6.8 or newer, or a stable release that carries the backport).
- Operators: Ensure your kernel isn’t vulnerable, especially if using EROFS with LZ4 compression.
- Developers: Be wary whenever mixing overlapping memory operations and hardware-specific memory moves (memmove, rep movsb, etc.)!

References

- Upstream kernel commit: erofs: fix lz4 inplace decompression
- CVE-2023-52497 NVD entry
- EROFS Documentation
- LZ4 Documentation
- FSRM (Fast Short REP MOV) explanation

Conclusion

CVE-2023-52497 highlights how even high-performance kernel code can break down under new hardware, exposing assumptions about memory operations. If you’re using compressed filesystems on Linux, update your kernel — and make sure to keep an eye on subtle details like buffer overlaps!

Timeline

Published on: 03/01/2024 14:15:53 UTC
Last modified on: 01/09/2025 20:20:02 UTC