A critical race condition vulnerability, now tracked as CVE-2024-50135, was discovered and resolved in the Linux kernel’s NVMe PCI driver. The flaw could allow unsafe interactions between device reset operations and queue management, potentially leading to kernel panics or undefined device behavior. In this article, we break down the bug, explain the root cause, and walk through both the exploitation possibilities and the fix, in a way that’s easy to understand.

What Is the Problem? (In Plain Language)

Modern NVMe SSDs connect to computers through PCIe. The Linux kernel controls these devices using its nvme stack, which uses multiple hardware queues to maximize performance. When things go wrong—like a device reset or shutdown—these queues are updated, brought online, or removed.

The race condition arises between two actions:

- Device Shutdown (nvme_dev_disable()): Disables the NVMe device and changes how many queues are counted as active (dev->online_queues).
- Queue Update (nvme_pci_update_nr_queues()): Updates the number of hardware queues, reading dev->online_queues to do so.

Without proper locking, these two actions can happen at the same time, leading to invalid queue states—and that means system errors or crashes.

How Did the Bug Look in Real Life?

Kernel warning logs (as shown below) were triggered, often after system suspend/resume or hotplugging an NVMe device:

WARNING: CPU: 39 PID: 61303 at drivers/pci/msi/api.c:347
         pci_irq_get_affinity+0x187/0x210
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
RIP: 0010:pci_irq_get_affinity+0x187/0x210
Call Trace:
 <TASK>
 ? blk_mq_pci_map_queues+0x87/0x3c0
 ? pci_irq_get_affinity+0x187/0x210
 blk_mq_pci_map_queues+0x87/0x3c0
 nvme_pci_map_queues+0x189/0x460 [nvme]
 blk_mq_update_nr_hw_queues+0x2a/0x40
 nvme_reset_work+0x1be/0x2a0 [nvme]

This log shows the queue-mapping path running from the nvme reset worker while, elsewhere, the driver was disabling the device and changing the queue count: a classic sign of racing threads.

Here is a simplified pseudo-code sequence:

// Thread 1
nvme_dev_disable(nvme_device);
//   dev->online_queues gets changed

// Thread 2
nvme_pci_update_nr_queues(nvme_device);
//   Reads dev->online_queues, calls blk_mq_update_nr_hw_queues()

If these two run concurrently, blk_mq_update_nr_hw_queues() can be handed a stale or invalid queue count, resulting in out-of-bounds accesses or other kernel misbehavior.
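
To make the window concrete, here is a minimal user-space analogy in plain C with pthreads. It is only a sketch of the pattern, not kernel code: every name in it (fake_dev, disable_thread, update_thread, queue_data) is invented for illustration. One thread plays the role of nvme_dev_disable() and tears queues down while shrinking the count; the other plays nvme_pci_update_nr_queues(), reads the count with no lock held, and then walks that many entries.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_QUEUES 64

/* Invented stand-in for "struct nvme_dev": a queue count plus per-queue state. */
struct fake_dev {
    int online_queues;            /* analogue of dev->online_queues  */
    int *queue_data[MAX_QUEUES];  /* analogue of per-queue resources */
};

/* Plays the role of nvme_dev_disable(): tear queues down, shrink the count. */
static void *disable_thread(void *arg)
{
    struct fake_dev *dev = arg;

    for (int i = dev->online_queues - 1; i >= 0; i--) {
        free(dev->queue_data[i]);
        dev->queue_data[i] = NULL;
        dev->online_queues = i;   /* changes while the other thread runs */
    }
    return NULL;
}

/* Plays the role of nvme_pci_update_nr_queues(): read the count, then use it. */
static void *update_thread(void *arg)
{
    struct fake_dev *dev = arg;
    int nr = dev->online_queues;  /* unlocked read: stale the moment it happens */

    usleep(100);                  /* widen the race window for the demo */
    for (int i = 0; i < nr; i++) {
        if (dev->queue_data[i] == NULL) {
            printf("queue %d of %d already torn down: stale count!\n", i, nr);
            return NULL;          /* the kernel trips a WARN or crashes here */
        }
    }
    printf("walked %d queues without hitting the race this run\n", nr);
    return NULL;
}

int main(void)
{
    struct fake_dev dev = { .online_queues = MAX_QUEUES };
    pthread_t t1, t2;

    for (int i = 0; i < MAX_QUEUES; i++)
        dev.queue_data[i] = calloc(1, sizeof(int));

    pthread_create(&t1, NULL, update_thread, &dev);
    pthread_create(&t2, NULL, disable_thread, &dev);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Run it and the "stale count" line fires: the update side acted on a queue count that no longer matched what the disable side had left behind. Hold a single mutex across both the teardown loop and the read-then-walk and that can no longer happen, which is the shape of the kernel fix described below.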

Where Was the Lock Missing?

Nothing serialized access to dev->online_queues between these two paths: nvme_dev_disable() modified it while nvme_pci_update_nr_queues() read it without taking any lock that the disable path also holds. The sequence above therefore ran with no protection against racing.

Exploitability: Why Should You Care?

While this is a kernel crash bug (not privilege escalation), any situation where you can trigger NVMe resets or reinitialization (for example, hot-swapping SSDs, triggering PCI errors, or suspending/resuming a laptop) could be used to crash the host. In a cloud or hosting environment, this can cause denial of service.

A simple exploit, if you have local access:

# On a vulnerable kernel
echo 1 > /sys/class/nvme/nvme0/reset_controller
# (causing a reset while heavy IO is ongoing)
# Repeat, and you might hit the race and crash the system.

The Fix: Lock It Down

The Linux kernel patch closes the race with the driver's shutdown_lock mutex, which nvme_dev_disable() holds for the duration of the teardown. The queue-update path now takes the same lock before reading dev->online_queues, so only one of the two can touch it at a time; if a shutdown is already in progress, the update is simply skipped.

Patch snippet (simplified):

static bool nvme_pci_update_nr_queues(struct nvme_dev *dev)
{
    /* Give up if we are racing with nvme_dev_disable(), which holds
       shutdown_lock for the whole teardown. */
    if (!mutex_trylock(&dev->shutdown_lock))
        return false;

    /* The controller may already have been disabled; skip the update. */
    if (!dev->ctrl.admin_q) {
        mutex_unlock(&dev->shutdown_lock);
        return false;
    }

    /* Safe now: online_queues cannot change under us.
       (online_queues includes the admin queue, hence the -1.) */
    blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1);

    mutex_unlock(&dev->shutdown_lock);
    return true;
}
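
The same back-off pattern can be seen in a runnable user-space sketch using pthread_mutex_trylock(). This is only an analogy of the locking idea, not kernel code: fake_dev, disable_dev, and try_update_queues are invented names. The disable path holds the lock for its whole teardown; the update path gives up if it cannot take the lock or finds the device already disabled.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for the device: a queue count guarded by shutdown_lock. */
struct fake_dev {
    pthread_mutex_t shutdown_lock;
    bool enabled;
    int online_queues;
};

/* Analogue of nvme_dev_disable(): hold the lock for the whole teardown. */
static void disable_dev(struct fake_dev *dev)
{
    pthread_mutex_lock(&dev->shutdown_lock);
    dev->enabled = false;
    dev->online_queues = 0;
    pthread_mutex_unlock(&dev->shutdown_lock);
}

/* Analogue of the fixed update path: give up if a disable is in progress. */
static bool try_update_queues(struct fake_dev *dev)
{
    if (pthread_mutex_trylock(&dev->shutdown_lock) != 0)
        return false;                      /* racing with disable: skip it */

    if (!dev->enabled) {                   /* re-check the state under the lock */
        pthread_mutex_unlock(&dev->shutdown_lock);
        return false;
    }

    printf("updating to %d hardware queues\n", dev->online_queues);
    pthread_mutex_unlock(&dev->shutdown_lock);
    return true;
}

int main(void)
{
    struct fake_dev dev = {
        .shutdown_lock = PTHREAD_MUTEX_INITIALIZER,
        .enabled = true,
        .online_queues = 8,
    };

    try_update_queues(&dev);   /* succeeds: device still enabled */
    disable_dev(&dev);
    try_update_queues(&dev);   /* returns false: device already disabled */
    return 0;
}

Giving up instead of blocking matters here: the reset worker should not sit waiting on a shutdown that may never hand the device back, so skipping the queue update is the safe outcome.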

Original References

- Linux Kernel Commit (nvme): "nvme-pci: fix race condition between reset and nvme_dev_disable()"
- Linux Kernel Mailing List Patch Discussion
- CVE-2024-50135 at NVD (awaiting update)

How to Stay Safe

- Upgrade your kernel: the fix has been merged upstream; check whether your Linux distribution has backported it to the kernel you are running.

Conclusion

CVE-2024-50135 is a textbook example of a race condition in complex, multithreaded kernel code. While not a privilege escalation or remote code execution bug, it can still crash production systems unexpectedly. The fix is simple—lock around shared state—but the impact is real.

Have questions about this or want to share your experience? Comment below or check the official patch links for more technical details!


Timeline

Published on: 11/05/2024 18:15:16 UTC
Last modified on: 11/08/2024 14:34:11 UTC