CVE-2024-50079 is a Linux kernel vulnerability, published in October 2024, that has since been patched. It affects the io_uring subsystem’s SQPOLL thread model, specifically the handling of task work during thread exit or request cancellation. This article walks through what io_uring and SQPOLL are, what goes wrong, how the patch fixes it, and how to reproduce the issue and test the fix.

This article is designed for sysadmins, developers, and anyone interested in Linux internals.

What is io_uring and SQPOLL?

io_uring is a high-performance async I/O API in the Linux kernel, designed to cut per-request overhead for I/O-bound apps. SQPOLL (Submission Queue Polling) lets a dedicated kernel thread poll the submission queue and submit I/O requests on behalf of user threads, which reduces syscalls and context switches.

But sometimes performance tweaks have subtle side effects…

Short Summary

When the SQPOLL thread (iou-sqp-*) is shutting down, it may need to run some "task_work". If this happens while canceling in-flight I/O (e.g., via io_uring_cancel_generic()), the kernel can end up running potentially blocking operations while the thread’s state is something other than TASK_RUNNING.

If the thread is not in the proper state (TASK_RUNNING) at that point, the kernel’s debug checks fire, and in rare cases the result could be a hang, a deadlock, or possibly a security problem if exploited carefully.

Kernel users saw crashes and warnings like

WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
do not call blocking ops when !TASK_RUNNING; state=1 set at [<...>] prepare_to_wait+0x88/0x2fc

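Translation: The kernel tried to do something that may block (here, taking a mutex), but the thread had already marked itself as about to sleep: state=1 is TASK_INTERRUPTIBLE, set in prepare_to_wait(). A blocking call at that point can overwrite the state the thread set up for its own wait, so the kernel’s debug check flags it.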

Stack Trace Example

Call trace:
 __might_sleep+0xf4/0x140
 mutex_lock+0x84/0x124
 io_handle_tw_list+0xf4/0x260
 tctx_task_work_run+0x94/0x340
 io_run_task_work+0x1ec/0x3c0
 io_uring_cancel_generic+0x364/0x524
 io_sq_thread+0x820/0x124c
 ret_from_fork+0x10/0x20

Technical Reason

A thread’s state (TASK_RUNNING, TASK_INTERRUPTIBLE, etc) controls what operations it’s allowed to do safely:

- TASK_RUNNING: Thread is runnable and may call functions that can block, such as mutex_lock().

- TASK_INTERRUPTIBLE: Thread is sleeping or preparing to sleep; it should NOT start another blocking operation until it is back in TASK_RUNNING.

The SQPOLL cancel path didn’t always restore TASK_RUNNING before running task work. This caused the kernel to call blocking ops while in TASK_INTERRUPTIBLE, which the scheduler flagged.
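The state rule described above can be sketched with a small userspace toy model. This is not kernel code: every `toy_*` name is hypothetical and invented for illustration, and the "blocking operation" is reduced to a check that counts how often it would have been flagged. The only detail taken from the kernel is that TASK_RUNNING is 0 and TASK_INTERRUPTIBLE is 1, which matches the `state=1` in the warning.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for the kernel's task states (hypothetical names). */
enum toy_state { TOY_TASK_RUNNING = 0, TOY_TASK_INTERRUPTIBLE = 1 };

struct toy_task {
    enum toy_state state;
    int warnings; /* how many times the "might sleep" check fired */
};

/* Models the debug check: blocking is only legal in the RUNNING state. */
void toy_might_sleep(struct toy_task *t)
{
    assert(t != NULL);
    if (t->state != TOY_TASK_RUNNING) {
        t->warnings++;
        fprintf(stderr, "toy: blocking op while state=%d\n", (int)t->state);
    }
}

/* Models task work that takes a mutex, i.e. a call that may block. */
void toy_task_work(struct toy_task *t)
{
    toy_might_sleep(t);
}

/*
 * Models the cancel path: the thread prepares to wait, then runs task work.
 * With fix_applied != 0 the state is reset first, as the patch does.
 * Returns the number of warnings the check produced.
 */
int toy_cancel_path(int fix_applied)
{
    struct toy_task t = { TOY_TASK_RUNNING, 0 };

    t.state = TOY_TASK_INTERRUPTIBLE;   /* stands in for prepare_to_wait() */
    if (fix_applied)
        t.state = TOY_TASK_RUNNING;     /* stands in for set_current_state(TASK_RUNNING) */
    toy_task_work(&t);
    return t.warnings;
}
```

Calling `toy_cancel_path(0)` models the unpatched path and trips the check once; `toy_cancel_path(1)` models the patched path and trips it zero times.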

In theory, an attacker could potentially use the bug as an infoleak or denial-of-service in tightly controlled workloads.

Most real-world impact is kernel warnings, crashes, or panics; privilege escalation is *not* easy from this bug alone.

The Patch

The fix is simple but critical: make sure the thread’s state is set to TASK_RUNNING before running task_work in the cancel path—just as it’s done in other places.

Bad (pre-patch, simplified)

// Simplified illustration, not the literal kernel source.
// Task work could run while the thread's state != TASK_RUNNING
io_run_task_work(ctx->sqo_task);

Good (post-patch, simplified)

// Make sure task state is correct before running task_work
set_current_state(TASK_RUNNING);
io_run_task_work(ctx->sqo_task);

Real Upstream Patch

- Link to mainline patch
- Linux stable tree commit

Commit diff summary (simplified illustration, not the verbatim upstream diff)

--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ ... @@
+        set_current_state(TASK_RUNNING);
         io_run_task_work(task);

---

How to Reproduce (and Test the Fix)

To trigger this, you would set up io_uring with SQPOLL, submit several async operations, and try to cancel/close the ring from another thread simultaneously. This is easiest to do from C, using the liburing library.

Warning: Do not run on production systems!

#include <liburing.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

// Tear down the ring from a second thread, potentially racing the SQPOLL exit
void *cancel_thread(void *ring_ptr) {
    struct io_uring *ring = (struct io_uring *) ring_ptr;
    io_uring_queue_exit(ring);
    return NULL;
}

int main(void) {
    struct io_uring ring;

    // liburing returns a negative errno value on failure
    int ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQPOLL);
    if (ret < 0) {
        fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
        return 1;
    }

    // Submit some I/O
    // ...

    pthread_t tid;
    if (pthread_create(&tid, NULL, cancel_thread, &ring) != 0) {
        io_uring_queue_exit(&ring);
        return 1;
    }

    // Main thread does more ring operations / cancels
    // ...

    pthread_join(tid, NULL);
    return 0;
}

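On a patched kernel, running this should not produce the warning quoted earlier or cause a panic. Check the kernel log (for example with dmesg) after the run.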
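One way to check the result is to scan kernel log text for the warning quoted earlier in this article. The sketch below is a minimal helper under that assumption: the function names `line_has_warning` and `count_warnings` are hypothetical, and the signature string is copied from the warning text above, so it may need adjusting if a given kernel words the message differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Signature text taken from the warning quoted earlier in this article. */
static const char *const WARN_SIGNATURE =
    "do not call blocking ops when !TASK_RUNNING";

/* Returns 1 if the log line contains the warning signature, 0 otherwise. */
int line_has_warning(const char *line)
{
    assert(line != NULL);
    return strstr(line, WARN_SIGNATURE) != NULL;
}

/* Counts matching lines in a stream, such as kernel log text piped in. */
int count_warnings(FILE *in)
{
    char buf[1024];
    int hits = 0;

    /* Lines longer than the buffer are read in pieces; a signature that
     * straddles two pieces would be missed, which is acceptable here. */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (line_has_warning(buf))
            hits++;
    }
    return hits;
}
```

To use it, add a `main` that calls `count_warnings(stdin)` and pipe the kernel log into the program. A count of zero after running the reproducer is consistent with a patched kernel, though it does not prove the race was actually exercised.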

Conclusion

CVE-2024-50079 was a subtle Linux kernel bug in the task-state handling behind async I/O with io_uring’s SQPOLL. Thanks to the Linux kernel community, it has been patched in the mainline and stable trees; the CVE was published in October 2024.

References

- Mainline patch commit
- Patch on lore.kernel.org
- io_uring documentation on kernel.org
- liburing userspace library

Timeline

Published on: 10/29/2024 01:15:04 UTC
Last modified on: 10/30/2024 17:05:40 UTC