A severe flaw (CVE-2024-53219) was discovered in the Linux kernel's virtio-fs file system, in the way the kernel handles direct I/O with large kernel vectors (kvecs). The bug can trigger a kernel warning and a hang during module loading when a large file (e.g., 10MB) is read with caching disabled. This post covers how the bug occurs, who is affected, the underlying kernel code path step by step, how to trigger it, and the details of the kernel patch, all in straightforward language for sysadmins and practitioners.

What is virtio-fs?

virtio-fs is a modern shared file system for virtual machines (VMs). It lets Linux VMs share files quickly with their host without complex networking. It is efficient because guest VMs access file contents directly over a virtio device.

- Who’s at risk: Linux guests that mount virtio-fs with the page cache disabled

- Impact: Kernel warnings and denial of service (hangs while loading large modules via insmod, caused by memory allocation failures)

- Attack vector: Reading a large file (10MB in the reported case) from a virtio-fs share with caching disabled

- Resolution: Use page structures for direct I/O transfers instead of a raw pointer and one big bounce buffer
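To find out whether a guest has any virtio-fs mounts at all, one option is to scan /proc/mounts for the `virtiofs` filesystem type. The helper below is a minimal sketch of that check; the function names are hypothetical, and it says nothing about whether caching is disabled on a given mount.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a /proc/mounts-style line describes a virtiofs mount, else 0.
 * Expected layout: <source> <mountpoint> <fstype> <options> <dump> <pass> */
int is_virtiofs_mount_line(const char *line)
{
    char source[256], target[256], fstype[64];

    if (sscanf(line, "%255s %255s %63s", source, target, fstype) != 3)
        return 0;
    return strcmp(fstype, "virtiofs") == 0;
}

/* Counts virtiofs entries in a mounts file such as /proc/mounts.
 * Returns -1 if the file cannot be opened. */
int count_virtiofs_mounts(const char *path)
{
    char line[1024];
    int count = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        count += is_virtiofs_mount_line(line);
    fclose(f);
    return count;
}
```

Calling `count_virtiofs_mounts("/proc/mounts")` inside the guest gives the number of virtio-fs mounts; a result of 0 means this code path is not in use there.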

The Vulnerability in 5 Steps

Let’s walk through the vulnerability by following how the kernel loads a module file from a virtio-fs mount:

- The user runs insmod or similar to insert a large kernel module stored on a virtio-fs mount.

- The kernel uses the finit_module() syscall, which eventually calls kernel_read_file() to read the file data.

No Split Read & DMA Buffer Issue

- Because the virtio-fs mount has the page cache disabled, the kernel takes a direct I/O path.

- FUSE’s internal config (max_read) allows huge reads (up to UINT_MAX for virtio-fs).

- The kernel hands the whole 10MB buffer, as a single block, to virtio-fs’s request queue with no size cap.
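The effect of max_read on request size can be shown with a little arithmetic. The sketch below is a simplified userspace model, not kernel code, and both function names are hypothetical. It shows that a 10MB read stays in one piece when the cap is UINT_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of read requests needed for `len` bytes when each request may
 * carry at most `max_read` bytes. Simplified model; max_read must be > 0. */
size_t read_requests_needed(size_t len, uint32_t max_read)
{
    return len / max_read + (len % max_read != 0);
}

/* Size in bytes of the largest single request in that split. */
size_t largest_request(size_t len, uint32_t max_read)
{
    return len < max_read ? len : (size_t)max_read;
}
```

With a cap of UINT32_MAX, a 10MB read needs one request of the full 10MB. With a hypothetical 1MB cap it would be split into ten 1MB requests, which is why capping max_read would avoid the oversized allocation at the cost of smaller reads everywhere.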

Large kmalloc Call Fails

- virtio-fs needs DMA (Direct Memory Access) buffers, so it tries to use kmalloc() to allocate a 10MB “bounce” buffer.
- PROBLEM: kmalloc() cannot satisfy a request this large. The page allocator rejects anything above its maximum order (typically 4MB with 4KB pages and the default configuration), so the allocation fails and triggers a kernel warning:

WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551

- The kernel retries the allocation in a background worker, but it fails every time. The operation hangs, and no error is returned to userspace.
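Why a 10MB request can never succeed comes down to the page allocator's order limit. The sketch below models that check in userspace. It assumes 4KB pages and a maximum order of 10 (a common default, but configurable), and the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4KB pages */
#define MAX_ORDER_ASSUMED 10u   /* assumption: default maximum page order */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;

    /* The second condition stops before `span` could overflow. */
    while (span < size && span <= SIZE_MAX / 2) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Returns 1 if a physically contiguous allocation of `size` bytes would
 * exceed the assumed maximum order, else 0. */
int allocation_too_large(size_t size)
{
    return order_for_size(size) > MAX_ORDER_ASSUMED;
}
```

Under these assumptions a 4MB request is order 10 and still fits, while a 10MB request rounds up to order 12 and is rejected on every attempt. Retrying cannot help because the size never changes.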

Example: Where It Breaks

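// Abbreviated sketch of the failing allocation (not the literal kernel source)

int virtio_fs_enqueue_req(...) {
    // One bounce buffer for all FUSE args: about 10MB in this case
    void *bounce = kmalloc(large_size, GFP_KERNEL); // Fails if large_size ~10MB
    if (!bounce)
        return -ENOMEM; // Request is requeued and retried later, which fails the same way
    ...
}

// This call chain triggers the bug:
// finit_module()
//   -> kernel_read_file()
//     -> kernel_read()            (builds the kvec iterator via iov_iter_kvec())
//       -> fuse_file_read_iter()
//         -> fuse_direct_io()
//           -> virtio_fs_enqueue_req()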

If you hit the bug, you’ll see a warning like the one below in dmesg:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551
...
__alloc_pages+0x2bf/0x380
...
virtio_fs_enqueue_req+0x240/0x6d0
...

The request comes from copy_args_to_argbuf(), which invokes kmalloc() with a 10MB size.

Why Is This a Critical Bug?

- Kernel hangs or crashes: If you try to load a large module or file in a VM with virtio-fs (no cache), the system stalls or spews kernel warnings.
- Denial of Service: Any code path that makes the guest kernel read a large file from such a mount through a kvec (as finit_module() does) can trigger the oversized allocation.
- Root cause: The kernel did not limit buffer size nor did it use paged buffers (the preferred large-data memory structure in the kernel for DMA-able memory).
- No easy workaround: Limiting max_read would cripple virtio-fs performance generally; so the solution must handle big files safely.

Here’s an example of how an attacker (or admin) could trigger the issue:

# 1. Prepare: Place a large file (e.g., big_module.ko, 10MB+) on the virtio-fs mount with cache disabled.
# 2. Run insmod
insmod /mnt/virtiofs/big_module.ko

# 3. The guest kernel tries to read the module; kmalloc fails, triggers warning/hang.

Result: The guest kernel hangs or fills logs with warnings, requiring a hard reboot. This scenario is more about DoS (Denial of Service) than privilege escalation.

How Was It Fixed? The Kernel Patch

Instead of restricting all users to slow, small reads, developers changed the I/O path so that FUSE and virtiofs pass kvec data as “pages” (struct page pointers) rather than as a single pointer.

- For direct I/O using a kvec, pass an array of page pointers instead of one big buffer for DMA.

- If the buffer comes from vmalloc (as the one kernel_read_file provides does), flush the kernel mapping before DMA on writes and invalidate it after DMA on reads, to avoid data corruption from stale CPU caches.
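To get a feel for the size of the page array this approach needs, the sketch below counts how many pages a buffer touches. It is a userspace model assuming 4KB pages, and `pages_spanned` is a hypothetical name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4KB pages */

/* Number of pages touched by the byte range [addr, addr + len). */
size_t pages_spanned(uintptr_t addr, size_t len)
{
    size_t offset = addr % PAGE_SIZE_BYTES;   /* start offset inside the first page */

    if (len == 0)
        return 0;
    return (offset + len + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

A page-aligned 10MB buffer spans 2560 pages, and an unaligned one spans 2561. Each page is an ordinary 4KB unit, so no single large contiguous allocation is required.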

Patch Snippet

*(shortened for clarity & readability)*

struct fuse_conn {
    ...
    bool use_pages_for_kvec_io; // NEW: Enable paged buffer for virtiofs
    ...
};

// Only for virtiofs — set this when creating the connection
fc->use_pages_for_kvec_io = true;

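// In FUSE read/write IO path (illustrative pseudocode, not the literal patch):
if (fc->use_pages_for_kvec_io) {
    // Instead of passing one pointer that forces a big kmalloc bounce buffer,
    // collect the struct page pointers that already back the kvec buffer
    for (i = 0; i < num_pages; i++)
        pages[i] = lookup_backing_page(buf + i * PAGE_SIZE); // placeholder name, not a kernel API
    // Hand the pages to virtiofs for DMA
    // (plus flush/invalidate if the buffer comes from vmalloc)
}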

Extra: Flushing for vmalloc

if (is_vmalloc_addr(buf)) {
    flush_kernel_vmap_range(buf, size);
    // ... or invalidate after DMA read
}
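Which cache operation applies depends on the direction of the transfer. The sketch below captures that decision as a small userspace model; the enum and function names are hypothetical and do not appear in the kernel.

```c
/* Direction of the transfer as seen from the guest kernel. */
enum io_dir { IO_WRITE, IO_READ };

/* Cache maintenance a buffer needs around the DMA transfer. */
enum cache_op { CACHE_NONE, CACHE_FLUSH_BEFORE_DMA, CACHE_INVALIDATE_AFTER_DMA };

/* Picks the cache operation for a buffer, given whether it is vmalloc-backed
 * and which way the data moves. */
enum cache_op cache_op_for(int is_vmalloc, enum io_dir dir)
{
    if (!is_vmalloc)
        return CACHE_NONE;              /* no extra vmalloc alias to keep coherent */
    return dir == IO_WRITE
        ? CACHE_FLUSH_BEFORE_DMA        /* push CPU-written data out before the device reads it */
        : CACHE_INVALIDATE_AFTER_DMA;   /* drop stale lines after the device has written */
}
```

A module load is a read, so for the vmalloc buffer from kernel_read_file the relevant case is the invalidate after DMA.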

Conclusion

CVE-2024-53219 is a great example of how modern kernel features (like direct file system I/O and virtualization file shares) can trip up traditional memory allocation — especially with massive files and no cache. The fix solidifies the code, makes large direct I/O safe, and avoids subtle hangs that can ruin uptime for busy virtualized servers.

If you use virtiofs, check that your guest kernel includes this fix.

Still have questions? Want help checking your kernel?

Drop your concerns in the comments or reach out to your vendor’s security channels. And always, always keep an eye on the Linux kernel security list.

Timeline

Published on: 12/27/2024 14:15:29 UTC
Last modified on: 05/04/2025 09:56:14 UTC