Linux kernel users on IBM Power Systems, especially those running Linux in LPARs (Logical Partitions), should be aware of CVE-2024-36926. The bug is triggered when a PCI partitionable endpoint (PE) is “frozen” by firmware at the time the LPAR boots, and it leads to a kernel panic caused by a NULL pointer dereference. In simple terms: if firmware has frozen a PCI device and Linux tries to set it up during boot, the system can crash.

A patch has been merged to fix the problem. Here’s what went wrong, why it matters, and how to stay safe.

What is CVE-2024-36926?

CVE-2024-36926 is a kernel bug in the PowerPC “pSeries” platform code, affecting the code path that configures PCI devices at boot. If partition firmware presents a frozen PCI device (“frozen PE”) to a Linux guest (LPAR), the kernel still expects to find an ibm,dma-window property in the Open Firmware device tree. When this property is missing because the device is frozen, the kernel dereferences a NULL pointer, which produces an oops and a kernel panic during boot.

- Kernel panic and boot failure (oops) on IBM POWER LPARs.
- Typically triggered after a hardware or firmware error that *freezes* a PCI device, causing it to be visible to Linux without a valid DMA window property.

Technical Details: Breaking Down the Bug

At boot, the Linux kernel tries to discover resources for each PCI device (PE). Normally, an ibm,dma-window property is present in the device tree for every PE that the firmware hands to the operating system.

Special Firmware Behavior

If a PCI device is misbehaving (due to hardware or firmware error), IBM system firmware may freeze it, and *withhold* the ibm,dma-window property. This freeze usually lasts 24 hours, or until full system reset.
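On a running Linux system the device tree is exposed as a directory hierarchy under /proc/device-tree, where each property appears as a file. The following is a minimal user-space sketch of how one might test a node directory for the property. The helper name node_has_property is hypothetical, the node path is system-specific and must be supplied by the caller, and a missing file on one node is not conclusive because the window may be defined on a parent bus node.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: reports whether a device-tree node directory
 * (for example a PCI node somewhere under /proc/device-tree) contains
 * a property file with the given name.
 *
 * Returns 1 if the file exists, 0 if it does not, and -1 on bad
 * arguments or if the combined path does not fit the buffer.
 */
static int node_has_property(const char *node_dir, const char *prop)
{
    char path[4096];
    int n;

    if (node_dir == NULL || prop == NULL)
        return -1;

    n = snprintf(path, sizeof(path), "%s/%s", node_dir, prop);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* F_OK only tests for existence; the property value is not read. */
    return access(path, F_OK) == 0 ? 1 : 0;
}
```

A caller would pass the node's directory and the string "ibm,dma-window", then repeat the check on parent directories if the result is 0.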

The Bug

The kernel code did not check whether the property was missing. Here’s a simplified illustration of the faulty logic:

/* Simplified vulnerable code (illustrative, not verbatim kernel source) */
struct device_node *np = pci_bus_to_OF_node(bus);
const void *ibm_dma_window;
int len;

ibm_dma_window = of_get_property(np, "ibm,dma-window", &len);
/* Missing: no check for a NULL return when the property is absent */
dma_addr = *((const u64 *)ibm_dma_window); /* crashes if ibm_dma_window is NULL */

If ibm_dma_window is NULL, the dereference causes a kernel oops (crash). Here’s part of a real crash log:

BUG: Kernel NULL pointer dereference on read at 0x000000c8
NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
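The fault address in the log ends in c8 rather than being exactly zero. A small non-zero address like this is what one would typically see when code reads a field through a NULL structure pointer: the CPU computes the NULL base plus the field's offset. The sketch below demonstrates that arithmetic in user space. The structure demo_node and its layout are invented for illustration and are not the kernel's real definition; the log excerpt does not say which field was being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout chosen so that `data` sits at offset 0xc8.
 * This is NOT the kernel's real structure.
 */
struct demo_node {
    unsigned char other_fields[0xc8];
    void *data;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct demo_node, data) == 0xc8,
              "data must sit at offset 0xc8 for this illustration");

/*
 * Address the CPU would try to read when evaluating
 * ((struct demo_node *)base)->data. With base == 0 (NULL),
 * the result is simply the field offset.
 */
static uintptr_t data_field_address(uintptr_t base)
{
    return base + offsetof(struct demo_node, data);
}
```

With a base of 0 the computed address is 0xc8, which matches the shape of the address reported in the oops.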

Exploitability and Impact

This bug is caused by unexpected hardware/firmware events, not by attacker-controlled input. Remote exploitation is not likely.

However:

- Any script, tool, or administrator action that leaves a PCI device in the “frozen” state, intentionally or accidentally, could repeatedly trigger boot failures for affected LPARs.
- In data centers, a device frozen by a power or hardware incident could make automatic restarts of the affected LPARs fail.

Denial of service is the main practical impact: LPARs will fail to boot until the frozen device is cleared or the host is power-cycled.

Patch and Mitigation

The fix is very simple: *check if the property exists before using it*. Here’s a simplified sketch of the fix:

ibm_dma_window = of_get_property(np, "ibm,dma-window", &len);
if (!ibm_dma_window) {
    dev_info(&bus->dev, "Frozen PE (phb/PE missing ibm,dma-window), skipping\n");
    return;
}
// safe to dereference ibm_dma_window ...
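The same guard can be exercised outside the kernel. The following is a minimal user-space sketch, not kernel code: mock_property, mock_node, mock_get_property and setup_dma_window are hypothetical stand-ins, with mock_get_property imitating the contract of of_get_property (return the value pointer, or NULL when the property does not exist).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for device-tree data; not kernel types. */
struct mock_property {
    const char *name;
    const void *value;
    int length;            /* length of value in bytes */
};

struct mock_node {
    const struct mock_property *props;
    size_t nprops;
};

/* Returns the property value, or NULL if the node has no such property. */
static const void *mock_get_property(const struct mock_node *node,
                                     const char *name, int *lenp)
{
    assert(node != NULL && name != NULL);

    for (size_t i = 0; i < node->nprops; i++) {
        if (strcmp(node->props[i].name, name) == 0) {
            if (lenp)
                *lenp = node->props[i].length;
            return node->props[i].value;
        }
    }
    return NULL;
}

/*
 * Returns 0 and stores the first cell of the window on success,
 * or -1 when the property is missing or too short (the frozen-PE case).
 */
static int setup_dma_window(const struct mock_node *node,
                            unsigned int *first_cell)
{
    int len = 0;
    const unsigned int *win = mock_get_property(node, "ibm,dma-window", &len);

    if (win == NULL || len < (int)sizeof(*win))
        return -1;         /* skip DMA setup instead of dereferencing NULL */

    *first_cell = win[0];
    return 0;
}
```

A node with the property yields 0 and the first cell; a node without it yields -1 and the caller can skip DMA setup for that device instead of crashing.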

Patch Reference

- Mainline kernel commit: 6f89cfbff92c ("powerpc/pseries/iommu: Handle missing ibm,dma-window property on frozen PE")
- Fix included in Linux 6.8 and later.
- SUSE, RHEL and Ubuntu have backported this to their stable series. See the SUSE Bugzilla entry listed under Original References.

How to Tell If You’re Vulnerable

- If you run POWER pSeries LPARs with PCI devices, and your kernel is older than Linux 6.8 or not patched, you’re at risk.
- If you see boot-time panics with NULL pointer dereference in pci_dma_bus_setup_pSeriesLP, you have hit this bug.
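The second check can be automated when scanning saved console or crash logs. The following is a minimal sketch, assuming the log text is already in memory; the function name log_matches_cve_2024_36926 is hypothetical, and the two marker strings are taken from the crash log excerpt quoted earlier. A match is only a hint, since other bugs could in principle fault in the same function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if `log` contains both markers of this crash
 * (a NULL pointer dereference report and the faulting function name),
 * otherwise 0. A NULL input is treated as "no match".
 */
static int log_matches_cve_2024_36926(const char *log)
{
    if (log == NULL)
        return 0;

    return strstr(log, "NULL pointer dereference") != NULL &&
           strstr(log, "pci_dma_bus_setup_pSeriesLP") != NULL;
}
```

Requiring both markers avoids flagging unrelated NULL dereferences or harmless mentions of the function in other log lines.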

Recommendations

- Update your kernel to a version that includes the patch (see the commit under Patch Reference).
- Check vendor advisories (SUSE, Red Hat, Ubuntu) for hotfixes or patches for your distro.
- If you hit a crash: either wait 24 hours for the PE to unfreeze, or perform a full hardware reset to clear the frozen device state.

Original References

- Mainline kernel patch
- Openwall CVE listing
- SUSE Bugzilla 1222647

Summary

CVE-2024-36926 illustrates how low-level details (firmware-provided PCI device properties) can crash entire LPARs when kernel code makes bad assumptions. If you run Linux on IBM hardware with LPARs, patch your kernel. Otherwise, a frozen PCI device could lock you out of your systems until the next maintenance window.

Always validate external inputs, even when they come from the firmware!

Timeline

Published on: 05/30/2024 16:15:15 UTC
Last modified on: 07/03/2024 02:03:51 UTC