CVE-2024-45000 - Data Race Fix in Linux Kernel’s fscache_cookie – How a Missed Counter Check Led to Kernel NULL Pointer Dereference

---

Introduction

In April 2024, a subtle but serious bug in the Linux kernel’s fscache subsystem was identified and patched. The flaw (tracked as CVE-2024-45000) could lead to a kernel NULL pointer dereference and system crash under rare race conditions. This article explains the issue, how it could be triggered, shows relevant kernel code, links to technical resources, and clarifies why this fix matters—even if you’ve never heard of the fscache_cookie.

What Is fscache and Why Does It Matter?

fscache is a kernel subsystem that helps network filesystems (like NFS or AFS) cache data locally to speed up repeated accesses and reduce server load. At its core, fscache manages “cookies”—small internal objects representing cacheable resources.

If these cookies are mishandled (especially during complex scenarios like rapid file access and eviction), the kernel might try to access memory that’s already been freed. That leads straight to kernel panics, page faults, and potential denial of service.

Sysadmins began seeing logs like this when under heavy load (or with aggressive fscache usage)

BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
...
RIP: 001:cachefiles_prepare_write+x30/xa
...
struct cachefiles_object *object = cachefiles_cres_object(cres);
struct cachefiles_cache *cache = object->volume->cache;  // <--- crash here

2. On a different CPU, fscache_unuse_cookie() is called, triggering LRU discard and setting a discard flag.

4. Meanwhile, cachefiles_prepare_write() is called *again*, finds the pointer unexpectedly NULL, and dereferences it.

Why? A Missing Check for Active Accesses

There’s an n_accesses counter (“how many things are using this cookie right now?”). Good code checks this counter before deleting things—but the path for LRU discards *forgot* to check. Withdraw logic usually waits for all accesses to end before cleanup, but not if the LRU path triggers it as described.

Vulnerable Code (before the patch):

switch (cookie->state) {
...
case FSCACHE_COOKIE_STATE_ACTIVE:
    if (cookie->flags & FSCACHE_COOKIE_DO_LRU_DISCARD) {
        /* LRU discard can occur while there are outstanding accesses */
        cookie->state = FSCACHE_COOKIE_STATE_LRU_DISCARDING;
        fscache_stat(&fscache_lru_discards);
        break;
    }
    ...

Here, the LRU discard (FSCACHE_COOKIE_DO_LRU_DISCARD) switches state without checking for n_accesses.

Patched Code (after the fix):

case FSCACHE_COOKIE_STATE_ACTIVE:
    if (cookie->flags & FSCACHE_COOKIE_DO_LRU_DISCARD) {
        if (atomic_read(&cookie->n_accesses) > ) {
            /* There are still accesses, wait before discarding */
            break;
        }
        cookie->state = FSCACHE_COOKIE_STATE_LRU_DISCARDING;
        fscache_stat(&fscache_lru_discards);
        break;
    }

Now, if any accesses remain, discard does *not* proceed—avoiding the NULL pointer dereference.

Patch Reference:
- The full, original patch is available at kernel.org commit 69fdb36f5e

Exploitation Details

This is a denial-of-service bug: a local user (or buggy third-party service) repeatedly accessing and evicting cache files can trip the race. This will cause a kernel panic but has *no* known escalation to privilege or info leak.

How Could Someone Reproduce It?

- Set up a system with heavy network cache activity (NFS/AfS with fscache).
- Rapidly open and discard many files, while also forcing cache pressure (e.g., low disk space, unmounting caches).
- Given async kernel workqueues, eventually the state machine and another cachefiles process may overlap just right—triggering the bug.

Proof-of-Concept (in C, Simulated)

Practical remote or local exploitation depends on system load/unmount cycles—deliberate triggering is tricky. Here’s a demonstration pseudocode to approximate the conditions:

void stress_fscache() {
  #pragma omp parallel for
  for (int i = ; i < 100000; i++) {
    // Touch file to cache
    system("cat /netfs/mount/filename > /dev/null &");
    // Force cache pressure (simulate LRU)
    system("fstrim /var/cache/fscache");
  }
}

> Note: This just increases chances; the actual race is deep in async kernel threads.

Linux Kernel: Versions before the April 2024 fix, typically around 6.8.y.

- Distributions: Any that backport fscache and cachefilesd support (Debian, Ubuntu, Fedora, RHEL, custom kernels).

Users: Primarily those using fscache with network filesystems.

Check your kernel version and watch for patches in distribution advisories.

Mitigation & Patch Guidance

- Upgrade to a kernel including this patch (first public in April 2024).

Original References and Further Reading

- Linux Kernel Commit Fixing CVE-2024-45000
- Canonical Advisory USN-6783-1
- Red Hat Bugzilla - Bug 2273956
- LWN: fscache, a persistent file cache for Linux

Conclusion

CVE-2024-45000 is a classic example where a simple forgotten check in concurrency logic can bring down the world’s most-used operating system kernel. If you operate NFS or AFS caching on Linux, *patch now*—and remember, even tiny race windows in kernel code are worth closing.

*Stay tuned for more kernel vulnerability deep-dives. If you need help patching, check your distro’s advisory or reach out to the community!*

Timeline

Published on: 09/04/2024 20:15:08 UTC
Last modified on: 09/06/2024 16:27:31 UTC