CVE-2024-26891 - Linux Kernel IOMMU/vt-d ATS Invalidation Hard Lockup Vulnerability Explained

CVE-2024-26891 is a vulnerability, now fixed, in the Linux kernel's Intel VT-d IOMMU (Input-Output Memory Management Unit) driver, in the code that handles ATS invalidation for PCIe devices. The bug could cause a complete system hang ("hard lockup") when a device in a hotplug-capable PCIe slot was removed, because the kernel kept sending invalidation requests to a device that no longer existed. In this post, you'll learn how the bug works, how it can crash your system, what the patch changes, and why it matters.

The Short Version

- Some servers and workstations use hotplug-capable PCIe slots for devices like network cards or GPUs. These devices can be "hot-removed" (powered off and unplugged at runtime).
- The Linux kernel’s IOMMU subsystem manages DMA address translation for these devices. Devices that support ATS (Address Translation Services) cache translations in their own device-TLB, which the kernel has to invalidate when mappings change or the device is released.
- When a device was unplugged, the kernel still tried to invalidate its device-TLB with an "ATS Invalidation" request, even though the device was gone.
- This triggered a loop of retry attempts and fault interrupts, leading to a hard system lockup or panic.
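
To see which slots on a machine are hotplug-capable in the first place, a small helper along these lines can walk sysfs. It is only a sketch: the function name list_hotplug_slots is made up for this post, and it assumes the hotplug driver exposes a power attribute under /sys/bus/pci/slots/, which is the same file the PoC later in this post writes to.

```shell
# list_hotplug_slots [SYSFS_ROOT]
# Hypothetical helper: prints one line per slot that has a "power" attribute.
# SYSFS_ROOT defaults to /sys; it is a parameter only so the function can be
# pointed at a test directory.
list_hotplug_slots() {
    root="${1:-/sys}"
    for slot in "$root"/bus/pci/slots/*; do
        # Slots without a hotplug driver behind them have no "power" file.
        [ -f "$slot/power" ] || continue
        name=$(basename "$slot")
        addr=$(cat "$slot/address" 2>/dev/null || echo "unknown")
        printf '%s\t%s\tpower=%s\n' "$name" "$addr" "$(cat "$slot/power")"
    done
}
```

Calling list_hotplug_slots with no arguments prints the slot name, its PCI address, and the current power state (1 for on, 0 for off). No output means no hotplug-managed slots were found.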

What makes this dangerous?
Anyone able to trigger removal of such a device, whether through physical access to the slot or through root access to the slot's sysfs controls, could freeze the whole system and cause a denial of service.

Technical Dive: What Happened?

Normally, when a hotpluggable PCIe device is removed, the kernel sends a "Device TLB Invalidate" request so that no stale translations remain cached on the device. But if the device is already gone at the hardware level, the request can never complete: it times out, the kernel retries it, and it times out again. Because this retry loop runs in the interrupt handling path, it locks up the CPU, as seen in this kernel log:

[ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
[ 4223.822627] Kernel panic - not syncing: Hard LOCKUP

These frames from the call trace show the code path involved:

qi_submit_sync+0x2c0/0x490
qi_flush_dev_iotlb+0xb1/0xd0
__dmar_remove_one_dev_info+0x224/0x250
dmar_remove_one_dev_info+0x3e/0x50
intel_iommu_release_device+0x1f/0x30
iommu_release_device+0x33/0x60
iommu_bus_notifier+0x7f/0x90
blocking_notifier_call_chain+0x60/0x90

The retry loop is endless, as the hardware never acknowledges the request to a non-existent device.
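
If a machine has already gone down this way, the two signatures above are easy to search for in a saved console log. A minimal sketch, assuming the log was captured somewhere that survives the panic (serial console, netconsole, pstore, or kdump); the function name and the file path in the usage line are placeholders, not anything the kernel provides:

```shell
# has_ats_lockup_signature LOGFILE
# Hypothetical helper: succeeds (exit status 0) only if LOGFILE contains both
# a hard-lockup message and the qi_flush_dev_iotlb frame from the call trace.
has_ats_lockup_signature() {
    log="$1"
    grep -qi 'hard lockup' "$log" && grep -q 'qi_flush_dev_iotlb' "$log"
}
```

For example, `has_ats_lockup_signature /path/to/console.log && echo "looks like the ATS invalidation lockup"`. A match is only a hint: other bugs can produce a hard lockup with similar frames, so treat it as a reason to check the kernel version rather than as proof.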

PoC: Crashing the Kernel

> NOTE: This is for education only. DO NOT use on production systems.

Here’s a simplified way a user with root access might repeatedly exercise the removal path:

# This assumes the pciehp (PCI Express Hot Plug) driver manages the slot and a device is present
for i in {1..10}; do
    echo 0 | sudo tee /sys/bus/pci/slots/<slot_number>/power # Power the slot off (device is removed)
    sleep 1
    echo 1 | sudo tee /sys/bus/pci/slots/<slot_number>/power # Power the slot back on (device is re-added)
    sleep 1
done

Replace <slot_number> with your actual PCIe slot id.

If unlucky, the kernel starts sending ATS invalidations to the missing device, causing a watchdog timeout and panic.

Vulnerable Example Code

The core bug is in handling device removal. Before the patch, the IOMMU code didn’t check if the device was already gone:

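// Before the patch (simplified pseudocode; the arguments are illustrative, not the real kernel signature)

// Unconditionally send the ATS invalidation
qi_submit_sync(iommu_dev, ATS_INV_CMD);

If the device is gone from the bus, this request can never complete.

The patch adds a check that skips the invalidation request if the device has already been removed:

// After the patch (simplified pseudocode; device_exists_on_bus() is a stand-in name, not a kernel function)
if (device_exists_on_bus(iommu_dev)) {
    qi_submit_sync(iommu_dev, ATS_INV_CMD);
} // else, skip the invalidation

In the real kernel code, the check relies on the PCI core's notion of a disconnected device (see pci_dev_is_disconnected()) rather than on a helper with the name used above.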

Real-World Impact

- Platforms affected: Linux with Intel IOMMU (VT-d), especially servers supporting PCIe hotplug/hot-swap
- Risks: Any admin or script that frequently powers off/removes PCIe devices (including in blade servers and high-availability clusters) might inadvertently crash their machine
- Exploitability: Local only. An attacker with physical access to a slot, or with enough privilege to control hotplug slots, could use this as a system-wide denial of service
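
A rough way to check whether a given host fits this profile is to look for active VT-d units and for devices with ATS switched on. This is a sketch under assumptions: the function names are invented here, it relies on the Intel IOMMU driver registering its units as dmar* under /sys/class/iommu, and it matches the "ATSCtl: Enable+" text that lspci -vv prints for the ATS capability (run lspci as root, otherwise extended capabilities are hidden).

```shell
# vtd_active [SYSFS_ROOT]
# Hypothetical helper: succeeds if at least one dmar* IOMMU unit is registered.
# SYSFS_ROOT defaults to /sys and exists only to make the function testable.
vtd_active() {
    root="${1:-/sys}"
    for unit in "$root"/class/iommu/dmar*; do
        [ -e "$unit" ] && return 0
    done
    return 1
}

# ats_enabled_devices
# Hypothetical helper: reads `lspci -vv` output on stdin and prints the header
# line of every device whose ATS control register shows "Enable+".
ats_enabled_devices() {
    awk '
        /^[0-9a-f]/ { dev = $0 }           # an unindented line starts a new device
        /ATSCtl:.*Enable\+/ { print dev }  # ATS is enabled for the current device
    '
}
```

Typical use would be `vtd_active && sudo lspci -vv | ats_enabled_devices`. A host that prints at least one device here and also has hotplug-managed slots is the kind of system this CVE is about; an empty result suggests the ATS invalidation path is not in play.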

Patch your kernel!

Fixed kernels have been available upstream since March 2024, and major distributions have backported the fix.

Conclusion

CVE-2024-26891 is a good example of how complex kernel memory and device management can fail in edge cases, putting the stability of the entire system at risk, not just its security.

If you manage systems with Intel IOMMU and hotpluggable PCIe, make sure to update your Linux kernel as soon as possible. Sometimes, a simple “check if device exists first” is the only thing between uptime and a server-room panic.

Timeline

Published on: 04/17/2024 11:15:10 UTC
Last modified on: 05/07/2025 17:42:36 UTC