CVE-2024-53052 - Linux Kernel io_uring O_DIRECT Write Deadlock Vulnerability Explained
A critical bug (CVE-2024-53052) was recently fixed in the Linux kernel. It affects the interaction between io_uring asynchronous I/O, O_DIRECT file operations, and the filesystem freeze mechanism, and it can deadlock the kernel so that the filesystem becomes unresponsive. Unprivileged users cannot trigger it on their own, because freezing a filesystem requires root privileges (CAP_SYS_ADMIN), but understanding the vulnerability is key for administrators and kernel developers. Here’s a plain-English deep dive into how this happened, the implications, and the technical fix.
What is io_uring and Why Does it Matter?
io_uring is a modern Linux API for high-performance asynchronous I/O. Instead of issuing one blocking syscall per operation, an application queues many I/O requests at once and collects their completions later. With O_DIRECT writes, data bypasses the page cache and goes directly to the storage device, which avoids double buffering and gives the application tighter control over when data reaches disk.
The Role of Filesystem Freezing
fsfreeze is an admin tool that safely suspends all write operations on a filesystem (for example, before taking a snapshot or backup). When freezing, the kernel takes that filesystem’s freeze-protection lock (a per-superblock percpu rwsem) for writing, which ensures that no write is in flight and none can start until the filesystem is unfrozen.
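For reference, the fsfreeze utility drives this through the FIFREEZE and FITHAW ioctls defined in <linux/fs.h>. The following is a minimal sketch of that call sequence, not the utility’s actual source: the helper name set_frozen is hypothetical, error handling is reduced to returning a negative errno, and freezing a real mount requires CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze (freeze != 0) or thaw (freeze == 0) the
 * filesystem that contains `mountpoint`.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_frozen(const char *mountpoint, int freeze)
{
    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* The ioctl argument is unused for both requests. */
    int ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
    int err = (ret < 0) ? -errno : 0;

    close(fd);
    return err;
}
```

A caller would pair set_frozen(path, 1) with a later set_frozen(path, 0). Without the privilege, the freeze call is expected to fail with a negative errno such as -EPERM.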
When a write is submitted through io_uring:
- The kernel calls kiocb_start_write() to signal that a write is starting. This takes the superblock’s freeze-protection lock (a percpu_rwsem) for reading, and the lock stays held until the write completes.
- If a freeze is in progress or about to start, the freezer takes the same rwsem for writing, which blocks any new write from starting.
The deadlock then unfolds as follows:
1. A user or process is running many direct io_uring writes (with O_DIRECT).
2. An admin initiates a filesystem freeze (with CAP_SYS_ADMIN privileges). The freezer takes the lock for writing, which blocks new writes and waits for all in-flight writes to finish.
3. While the freeze is pending, the task submits another io_uring write. kiocb_start_write() tries to take the read lock _after_ the freezer has claimed it, so the task goes to sleep.
4. The earlier in-flight writes _never complete_, because the task that is supposed to complete them is the one sleeping in step 3. The freezer, in turn, keeps waiting for those writes.
Result: Filesystem freeze is stuck forever; new writes are blocked, and the system can partially hang.
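The sequence above can be replayed in a small single-threaded model. This is an illustrative sketch only, not kernel code: the struct, the function names, and the WOULD_BLOCK marker are all hypothetical, and "blocking" is represented by a return value instead of an actual sleep. The nowait parameter anticipates the fix described later in this article.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one superblock's freeze protection (not kernel code). */
struct sb_model {
    int  active_writes;   /* writes currently holding freeze protection */
    bool freeze_pending;  /* a freezer has claimed the lock for writing */
};

enum { WOULD_BLOCK = 1 }; /* hypothetical marker: the caller would sleep here */

/* Models starting a write, either blocking or in NOWAIT mode. */
int model_start_write(struct sb_model *sb, bool nowait)
{
    if (sb->freeze_pending)
        return nowait ? -EAGAIN : WOULD_BLOCK;
    sb->active_writes++;
    return 0;
}

/* Models a write completion releasing freeze protection. */
void model_end_write(struct sb_model *sb)
{
    assert(sb->active_writes > 0);
    sb->active_writes--;
}

/* The freeze can finish only once no writes remain in flight. */
bool model_freeze_done(const struct sb_model *sb)
{
    return sb->freeze_pending && sb->active_writes == 0;
}

/* Replays steps 1-4 and returns true if the scenario ends in a deadlock. */
bool model_scenario_deadlocks(bool nowait)
{
    struct sb_model sb = { 0, false };

    /* Step 1: the task gets one write in flight. */
    int ret = model_start_write(&sb, nowait);
    assert(ret == 0);

    /* Step 2: the freezer claims the lock and waits for that write. */
    sb.freeze_pending = true;

    /* Step 3: the same task tries to start a second write. */
    ret = model_start_write(&sb, nowait);

    /* Step 4: only a task that is not asleep can reap the first completion. */
    if (ret != WOULD_BLOCK)
        model_end_write(&sb);

    return !model_freeze_done(&sb);
}
```

In this model, the unconditional blocking path leaves the first write in flight forever, while the NOWAIT path lets the task reap the completion so the freeze can finish.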
Here's a real-world stack trace showing this frozen state:
task:fio state:D stack:0 pid:886 tgid:886 ppid:876
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_rwsem_wait+0x1e8/0x3f8
__percpu_down_read+0xe8/0x500
io_write+0xbb8/0xff8
io_issue_sqe+0x10c/0x1020
io_submit_sqes+0x614/0x2110
__arm64_sys_io_uring_enter+0x524/0x1038
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
...
And for the freezing logic:
task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995
Call trace:
__switch_to+0x1d8/0x348
...
freeze_super+0x248/0x8a8
do_vfs_ioctl+0x149c/0x1b18
__arm64_sys_ioctl+0xd0/0x1a0
...
The root cause comes down to three points:
- io_uring _always_ calls kiocb_start_write(), which can block if a freeze is in progress.
- There’s no non-blocking (“NOWAIT”) check before starting the write, so the submitting task sleeps on the lock and stops making forward progress.
- Meanwhile, the freezer waits for all in-flight writes to finish before the freeze can complete. Those writes can only be completed by the sleeping task, so neither side can proceed.
The solution is to respect the IOCB_NOWAIT flag:
- If the write is requested as non-blocking (NOWAIT) and the lock cannot be taken immediately, fail fast with -EAGAIN.
- io_uring’s core logic can then retry the request from a context where blocking is safe, while the submitting task stays free to run completions. In-flight writes finish, and the freeze can proceed.
Patch Logic (Simplified)
if ((kiocb->ki_flags & IOCB_NOWAIT) && sb_start_write_trylock(inode->i_sb)) {
    /* Lock is available now, do the write */
} else if (kiocb->ki_flags & IOCB_NOWAIT) {
    /* Lock could not be immediately acquired: fail fast */
    return -EAGAIN;
} else {
    /* No NOWAIT flag, block as before to get the lock */
    sb_start_write(inode->i_sb);
}
In other words: If you can’t get the lock _right now_ in non-blocking mode, don’t wait—just error out, so we never deadlock.
Commit Details:
- io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
- Official Kernel Patch
Normal users cannot directly exploit this to attack a system:
- Only privileged users (root/CAP_SYS_ADMIN) can freeze a filesystem.
- This is mostly a denial-of-service (DoS) or availability issue, not a privilege escalation or data leak.
If you want to test on a vulnerable kernel (not recommended on production):
# 1. Use fio or custom program to spam O_DIRECT writes via io_uring to a mounted fs
# 2. In another terminal (as root):
fsfreeze --freeze /home
# If the bug exists, the freeze will hang and writes will never complete.
Fix status and mitigation:
- Patched in Linux 6.12 (and backported to stable as necessary).
- Update to the latest kernel if you use io_uring and perform administrative filesystem operations.
References
- CVE-2024-53052 — NVD Detail (pending)
- Original Kernel Patch Email
- Linux fs/io_uring code
- Filesystem Freezing documentation
- io_uring Documentation
Conclusion
CVE-2024-53052 is a subtle but significant deadlock bug in the Linux kernel related to io_uring and filesystem freezing. While it can’t be triggered by most users, it’s a classic example of why even privilege-required operations need careful concurrency handling. If your workloads or scripts depend on both modern I/O and filesystem maintenance operations, update your kernel as soon as possible.
Stay safe, and always apply the latest security updates!
Timeline
Published on: 11/19/2024 18:15:25 UTC
Last modified on: 12/19/2024 09:38:04 UTC