---
Introduction
A critical bug, CVE-2024-26958, has been found and fixed in the Linux kernel NFS (Network File System) subsystem, specifically within the code handling direct writes. This bug could result in a use-after-free (UAF) scenario, risking data integrity and possible privilege escalation.
In this post, we break down the issue, show what caused it, examine the warning, and walk through the patch and its exploitability.
The Warning
On large-scale production servers (for instance, a RocksDB deployment across 200 nodes), sysadmins observed kernel warnings like the following:
    ------------[ cut here ]------------
    refcount_t: underflow; use-after-free.
    WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe
    Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
    ...
    nfs_direct_write_schedule_work+0x237/0x250 [nfs]
    process_one_work+0x12f/0x4a
    worker_thread+0x14e/0x3b
    ...
The root cause: the kernel completed the same nfs_direct_request structure (struct nfs_direct_req in the kernel source) twice, so its reference count dropped below zero, which is classic UAF territory.
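To spot this signature in collected logs, a small scanner can look for the underflow message and check whether the same report mentions the NFS direct-write worker. This is a minimal sketch, assuming the log is available as plain text (for example, saved dmesg output). The function name, the SAMPLE text, and the 40-line look-ahead window are illustrative assumptions, not part of any kernel tooling.

```python
# Strings copied from the warning shown in this post.
UNDERFLOW_MARKER = "refcount_t: underflow; use-after-free."
NFS_DIRECT_SYMBOL = "nfs_direct_write_schedule_work"
CUT_MARKER = "[ cut here ]"
WINDOW = 40  # assumed maximum number of lines to inspect after the marker


def find_nfs_refcount_warnings(log_text):
    """Return 0-based line indexes of underflow warnings whose report
    also mentions the NFS direct-write worker."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if UNDERFLOW_MARKER not in line:
            continue
        for follow in lines[i + 1 : i + 1 + WINDOW]:
            if CUT_MARKER in follow:
                break  # the next report starts here
            if NFS_DIRECT_SYMBOL in follow:
                hits.append(i)
                break
    return hits


# Hypothetical log excerpt: one NFS-related report and one unrelated report.
SAMPLE = "\n".join([
    "------------[ cut here ]------------",
    "refcount_t: underflow; use-after-free.",
    "Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]",
    "------------[ cut here ]------------",
    "refcount_t: underflow; use-after-free.",
    "Workqueue: events some_other_work",
])

if __name__ == "__main__":
    print(find_nfs_refcount_warnings(SAMPLE))
```

Only the first report in the sample is flagged, because the second underflow warning does not involve the NFS direct-write worker.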
What Happened in the Code
Each NFS direct write is tracked by an nfs_direct_request object that records when the request has completed. The code that schedules NFS commit operations sends them asynchronously, and a commit can finish before the next one has been submitted. In rare, high-throughput cases, this caused the completion logic to run twice on the same request.
Before the Patch
    if (nfs_commit_end(cinfo.mds))
        nfs_direct_write_complete(dreq);
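Because the commits are asynchronous, this check can succeed more than once for the same request, so the completion handler runs more than once. Since nfs_direct_write_complete() is meant to finalize and clean up the request, a second call operates on memory referenced by dreq that may already have been freed (a double free or UAF).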
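The double completion can be reproduced in a toy model. The sketch below is plain Python, not kernel code: DirectRequest, CommitCounter, and submit_without_bracket are hypothetical names, and the model assumes the worst-case ordering in which every commit finishes before the next one is submitted.

```python
class DirectRequest:
    """Toy stand-in for the direct-write request (not the kernel structure)."""

    def __init__(self):
        self.refcount = 1      # the reference that completion is meant to drop
        self.completions = 0

    def complete(self):
        self.completions += 1
        self.refcount -= 1     # a second call drives this below zero


class CommitCounter:
    """Toy counter of in-flight commits; end() reports when it reaches zero."""

    def __init__(self):
        self.in_flight = 0

    def begin(self):
        self.in_flight += 1

    def end(self):
        self.in_flight -= 1
        return self.in_flight == 0


def submit_without_bracket(num_commits):
    """Submit commits one by one; each finishes before the next is submitted."""
    dreq = DirectRequest()
    counter = CommitCounter()
    for _ in range(num_commits):
        counter.begin()        # commit submitted
        if counter.end():      # ...and it completes immediately
            dreq.complete()    # counter hit zero, so completion runs
    return dreq


if __name__ == "__main__":
    dreq = submit_without_bracket(2)
    print(dreq.completions, dreq.refcount)
```

With two commits the counter reaches zero twice, so the request is completed twice and the toy refcount ends below zero, mirroring the underflow warning.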
The classic pattern, as seen elsewhere in NFS:
    nfs_commit_begin();
    /* work */
    nfs_commit_end();
Or, in direct request handling:
    get_dreq();
    ...
    put_dreq();
The Patch / Fix
The fix applies the same bracketing pattern around submission of the commit requests, so that the request is completed (and freed) only once.
Simplified Sketch of the Fix
    get_dreq(dreq);
    /* Proceed with async commit scheduling */
    if (nfs_commit_end(cinfo.mds))
        nfs_direct_write_complete(dreq);
    put_dreq(dreq);
With this in place, each request is completed and released exactly *once*, regardless of how the commit completions interleave with submission.
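Why bracketing helps can be seen in a small counting model. This is a hedged Python sketch, not the kernel implementation: count_completions and hold_during_submit are hypothetical names. The model again assumes the worst case, where each commit completes before the next one is submitted, and compares submission with and without the submitter holding its own count.

```python
def count_completions(num_commits, hold_during_submit):
    """Count how often completion fires in a toy in-flight counter model."""
    in_flight = 0
    completions = 0

    def end():
        # Drop one count and report whether the counter reached zero.
        nonlocal in_flight
        in_flight -= 1
        return in_flight == 0

    if hold_during_submit:
        in_flight += 1         # submitter takes its own count first
    for _ in range(num_commits):
        in_flight += 1         # commit submitted
        if end():              # worst case: it completes before the next submit
            completions += 1
    if hold_during_submit and end():
        completions += 1       # submitter drops its count last
    return completions


if __name__ == "__main__":
    print(count_completions(3, hold_during_submit=False))
    print(count_completions(3, hold_during_submit=True))
```

Without the bracket the counter reaches zero once per commit. With it, the counter can only reach zero after the submitter drops its own count, so completion fires a single time.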
Who Can Trigger It?
A local or remote user with write access to an exported NFS mount can issue rapid, overlapping direct writes. If timed (or fuzzed) correctly, these writes could trigger the UAF, causing the kernel to act on memory that has already been freed.
What Could Go Wrong?
A use-after-free in the kernel is bad news. Depending on the underlying allocator and kernel build options, the potential impact ranges from kernel warnings and crashes to corrupted data and, in the worst case, privilege escalation.
PoC (Proof-of-Concept) Outline
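A full working exploit would require precise timing (racing NFS direct writes and commits). The simplified idea is to issue many overlapping direct writes to a file on an NFS mount so that commit completions race with commit submission.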
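Example in Python
    import mmap
    import os
    import threading

    def direct_write(fd, buf):
        # Write one block at offset 0; O_DIRECT was set when the file was opened.
        os.pwrite(fd, buf, 0)

    # O_DIRECT is an open() flag, not a pwrite() argument.
    fd = os.open("/mnt/nfs/testfile", os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    # O_DIRECT generally expects an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, 4096)
    buf.write(b"A" * 4096)
    threads = []
    for _ in range(100):
        t = threading.Thread(target=direct_write, args=(fd, buf))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    os.close(fd)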
In production, stress tests with RocksDB and other heavy-duty, parallel write workloads reliably hit this bug roughly every 10 minutes, producing the refcount warning and the associated kernel stack trace.
Conclusion
CVE-2024-26958 demonstrates how subtle race conditions in async kernel code can lead to deep reliability (and security) problems. If your servers, appliances, or containers use the NFS client with direct IO, you must update the kernel to include this fix.
This kind of kernel UAF isn’t just theoretical: the bug was hitting real-world production systems at scale. The fix is concise but crucial, because reference counting must be handled *precisely* with concurrency and async completion in mind.
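As a first triage step, it can help to confirm whether a host uses the NFS client at all. The sketch below parses mount-table text in the /proc/mounts layout (device, mount point, filesystem type, options, ...) and lists entries whose type is nfs or nfs4. The function name and the sample text are illustrative assumptions, and the sketch does not check whether direct I/O is in use or whether the running kernel already contains the fix.

```python
NFS_FSTYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text):
    """Return (mount_point, fstype) for each NFS client mount found in
    /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 1 is the mount point, field 2 the filesystem type.
        if len(fields) >= 3 and fields[2] in NFS_FSTYPES:
            found.append((fields[1], fields[2]))
    return found


# Hypothetical mount table with one NFS mount and one local filesystem.
SAMPLE_MOUNTS = (
    "server:/export /mnt/nfs nfs4 rw,relatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

if __name__ == "__main__":
    print(nfs_mounts(SAMPLE_MOUNTS))
```

On a live Linux system, the same function could be fed the contents of /proc/mounts instead of the sample text.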
Timeline
Published on: 05/01/2024 06:15:12 UTC
Last modified on: 12/23/2024 13:22:45 UTC