CVE-2024-53153 - How a Simple PCIe Endpoint Timing Bug Could Crash Your Qualcomm-based Linux Kernel (And How It Got Fixed)

In June 2024, a subtle but critical bug was quietly patched in the Linux kernel’s PCI subsystem. If you work with Qualcomm platforms, especially ones that act as endpoints in PCIe setups, you’ll want to understand CVE-2024-53153, its root cause, and how developers resolved it. This post breaks it down in clear, practical terms, with code snippets, so your embedded system doesn’t fall over the next time the host pulses the PERST# line.

What Is CVE-2024-53153, In Plain English?

It’s a Linux kernel bug affecting the qcom-ep (Qualcomm Endpoint) PCIe driver. When a Qualcomm SoC acts as a PCIe endpoint, the host controls a signal called PERST# ("PCI Express Reset", active low). When the host asserts PERST# (drives it low), it tells its endpoints, including your Qualcomm device, to reset.

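The endpoint’s PCIe controller also needs a reference clock (refclk) to stay operational, and its hardware registers are only reachable while that clock is running. On most Qualcomm endpoints, this refclk comes from the host; the SoC can’t generate its own. Shortly after asserting PERST#, the host may switch refclk off as well.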

- If the Linux kernel tries to fiddle with hardware registers after refclk has gone away, the whole chip can crash.
- The original driver did exactly this: it performed _endpoint cleanup_ (hardware accesses) when refclk could already be gone.

Result: every time the host resets the link, your endpoint can crash, because cleanup steps such as tearing down DMA or powering down endpoint function devices all touch hardware registers.
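To make the failure mode concrete, here is a minimal userspace sketch of a clock-gated register. It is a toy model, not driver code: `struct ep_model`, `ep_write_reg()`, and the other names are invented for illustration, and the model assumes the worst case where refclk stops at the same moment PERST# is asserted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (not driver code): an endpoint whose registers need host refclk. */
struct ep_model {
	bool refclk_on;    /* true while the host drives refclk */
	bool crashed;      /* latched on the first unclocked register access */
	uint32_t dma_ctrl; /* stand-in for one controller register */
};

/* Host side: in this model, asserting PERST# also stops refclk. */
void host_assert_perst(struct ep_model *ep)
{
	ep->refclk_on = false;
}

/* Host side: deasserting PERST# implies refclk is running again. */
void host_deassert_perst(struct ep_model *ep)
{
	ep->refclk_on = true;
}

/* Register write; an unclocked access is recorded as a crash instead of hanging. */
void ep_write_reg(struct ep_model *ep, uint32_t val)
{
	if (!ep->refclk_on) {
		ep->crashed = true;
		return;
	}
	ep->dma_ctrl = val;
}

/* Cleanup needs at least one register write, e.g. to stop a DMA engine. */
void ep_cleanup(struct ep_model *ep)
{
	ep_write_reg(ep, 0);
}
```

Calling `ep_cleanup()` while `refclk_on` is true clears the register. Calling it after `host_assert_perst()` latches `crashed` and leaves the register untouched, which is the model's stand-in for the SoC-wide crash described above.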

The Faulty Flow (Old Code)

// qcom_pcie_perst_assert(): runs when the host asserts PERST#
dw_pcie_ep_cleanup();   // Needs live hardware registers
pci_epc_deinit_notify();
// ... but the host cuts refclk shortly after asserting PERST#,
// so these calls can hit registers that are no longer clocked

At this point, the endpoint crashes when it desperately tries to finish those cleanups with *no hardware clocks* available.
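Whether a given PERST# assertion actually crashes the endpoint comes down to a race between the cleanup and the host stopping refclk. The sketch below expresses that race as a simple inequality. The struct, its fields, and any numbers you feed it are hypothetical; real latencies depend on the host, the board, and the kernel configuration.

```c
#include <stdbool.h>
#include <stddef.h>

/* All times are microseconds measured from the moment the host asserts PERST#. */
struct perst_timing {
	unsigned int irq_latency_us; /* delay until the endpoint handler starts */
	unsigned int cleanup_us;     /* time the cleanup spends touching registers */
	unsigned int refclk_off_us;  /* when the host stops refclk */
};

/* True if the cleanup would still be touching registers when refclk stops. */
bool cleanup_outlives_refclk(const struct perst_timing *t)
{
	return t->irq_latency_us + t->cleanup_us > t->refclk_off_us;
}

/* Count how many of the given scenarios would hit an unclocked access. */
size_t count_unsafe(const struct perst_timing *cases, size_t n)
{
	size_t unsafe = 0;

	for (size_t i = 0; i < n; i++)
		if (cleanup_outlives_refclk(&cases[i]))
			unsafe++;
	return unsafe;
}
```

The endpoint has no control over `refclk_off_us`, so no amount of tuning on the endpoint side makes the old flow safe. That is why the fix changes where the cleanup runs rather than trying to make it faster.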

Why Not Just "Make Your Own Clock"?

According to Qualcomm engineers, as relayed in the patch description, _some endpoint hardware designs can’t do this._ They must consume refclk from the host. So generating it locally isn’t a general solution.

The Fix: Cleaning Up _After_ Refclk Is Guaranteed

The correct place to access hardware registers is when you *know* refclk is available. That is guaranteed once the host deasserts PERST# (drives it high again): refclk is running by then, and once the driver has re-enabled its resources the registers are reachable.

The New, Safe Flow (Patched Code)

// qcom_pcie_perst_deassert()
// Host brings back refclk, power rails up

enable_resources();
dw_pcie_ep_cleanup();   // CLEANUP SAFE: hardware now accessible
pci_epc_deinit_notify();
// Continue with the usual PERST# deassert handling...

A simplified pseudo-diff to clarify the change

  static void qcom_pcie_perst_assert(...)
  {
-     dw_pcie_ep_cleanup(ep);
-     pci_epc_deinit_notify(epc);
      // ... resources are disabled; host may already have switched off refclk
  }

  static int qcom_pcie_perst_deassert(...)
  {
      enable_resources(...);   // Turn on core, clocks, etc.
+     dw_pcie_ep_cleanup(ep);  // Safe: refclk present
+     pci_epc_deinit_notify(epc);
      // Continue normal initialization
  }
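The effect of moving the cleanup can be checked with a small userspace simulation that runs both orderings through repeated PERST# cycles. This is an illustrative sketch only: `struct ep_sim`, `enum flow`, and the `sim_*` functions are invented names, and the model again assumes refclk is already gone by the time the assert handler runs.

```c
#include <stdbool.h>

enum flow { FLOW_OLD, FLOW_PATCHED };

struct ep_sim {
	bool refclk_on;       /* host-driven reference clock */
	bool needs_cleanup;   /* state left over from the previous link session */
	int unclocked_access; /* register accesses attempted without refclk */
};

/* Stand-in for the register-touching cleanup work. */
void sim_cleanup(struct ep_sim *ep)
{
	if (!ep->refclk_on) {
		ep->unclocked_access++; /* would crash real hardware */
		return;
	}
	ep->needs_cleanup = false;
}

/* PERST# asserted: worst case, refclk is already gone when the handler runs. */
void sim_perst_assert(struct ep_sim *ep, enum flow flow)
{
	ep->refclk_on = false;
	if (flow == FLOW_OLD)
		sim_cleanup(ep);
}

/* PERST# deasserted: refclk is back, so cleanup is safe before re-init. */
void sim_perst_deassert(struct ep_sim *ep, enum flow flow)
{
	ep->refclk_on = true;
	if (flow == FLOW_PATCHED)
		sim_cleanup(ep);
	ep->needs_cleanup = true; /* new session creates state to clean up later */
}

/* Run n assert/deassert cycles and return the number of unclocked accesses. */
int sim_run_cycles(enum flow flow, int n)
{
	struct ep_sim ep = { .refclk_on = true, .needs_cleanup = true };

	for (int i = 0; i < n; i++) {
		sim_perst_assert(&ep, flow);
		sim_perst_deassert(&ep, flow);
	}
	return ep.unclocked_access;
}
```

In this model the old ordering records one unclocked access per cycle, while the patched ordering records none, because the stale state from the previous session is only cleaned up once refclk is known to be running.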

Exploiting the Vulnerability

This isn’t a typical remote exploit of the kind you’d see against a server. It’s a denial-of-service class vulnerability, and triggering it requires control of the PCIe host:
- If a Qualcomm SoC acts as a PCIe endpoint, a hostile (or simply buggy) host can assert PERST# repeatedly.
- On a vulnerable kernel, the SoC will *crash hard* whenever endpoint cleanup runs with refclk missing.
- In embedded environments (test setups, developer boards, industrial systems), this can be triggered deliberately, or it can simply cause repeated downtime until the kernel is patched.

How to Protect Your System

1. Kernel Patch: Apply the Linux kernel patch "PCI: qcom-ep: Move controller cleanups to qcom_pcie_perst_deassert()", already upstream as commit 7e769b01019f.
2. Check Vendor Trees: For downstream/Android/Board Support Package kernels, check if your vendor has pulled this fix.
3. Test Robustness: If you design products with PCIe endpoint mode, stress-test with assert/deassert cycles and monitor for panics.
4. Review Clock Sourcing: Know whether your endpoint can *survive* without host refclk—some newer designs can, but many still can’t!
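For the stress test in step 3, a host-side soak loop could be structured roughly as follows. This is a sketch under assumptions: the `soak_hooks` callbacks are hypothetical and must be backed by whatever your rig actually offers for driving PERST# and checking that the endpoint is still alive (for example, a heartbeat over a debug UART). The `fake_*` functions exist only so the loop can be dry-run without hardware.

```c
#include <stdbool.h>

/* Hooks the test rig must provide; the names are hypothetical. */
struct soak_hooks {
	void (*set_perst)(void *ctx, bool asserted); /* drive PERST# from the host side */
	bool (*endpoint_alive)(void *ctx);           /* e.g. a heartbeat over UART */
	void *ctx;
};

/* Returns 0 if all cycles pass, else the 1-based cycle where the endpoint died. */
int run_perst_soak(const struct soak_hooks *h, int cycles)
{
	for (int i = 1; i <= cycles; i++) {
		h->set_perst(h->ctx, true);
		h->set_perst(h->ctx, false);
		if (!h->endpoint_alive(h->ctx))
			return i;
	}
	return 0;
}

/* Fake rig for a dry run: the endpoint "dies" on cycle die_at (0 = never). */
struct fake_rig {
	int asserts;
	int die_at;
};

void fake_set_perst(void *ctx, bool asserted)
{
	struct fake_rig *rig = ctx;

	if (asserted)
		rig->asserts++;
}

bool fake_alive(void *ctx)
{
	const struct fake_rig *rig = ctx;

	return rig->die_at == 0 || rig->asserts < rig->die_at;
}
```

On real hardware the `set_perst` hook would also need to respect the reset and clock timing your host requires between toggles, and varying the delay between assert and deassert is worthwhile because the bug is timing dependent.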

References & Further Reading

- Upstream kernel patch commit (qcom-ep refclk)
- NVD CVE-2024-53153 entry
- Linux PCI Endpoint Framework documentation

Takeaway

CVE-2024-53153 is an example of a small timing bug with big consequences. If a driver ignores the hardware’s clocking rules, even trusted Linux code can crash your device after a simple PCIe bus event. The fix is subtle: wait until the clocks come back before touching the hardware. Don’t take your hardware’s heartbeat for granted, especially in the world of endpoints.

If you’re deploying Qualcomm PCIe endpoints, patch up or risk mysterious, nasty SoC crashes anytime the host has a bad day.

Got questions, need step-by-step patch instructions, or wonder if your platform is safe? Feel free to reach out on the kernel mailing list or ask below!

*Author: PCIe Debugger | June 2024 | Exclusive Tech Insights*

Timeline

Published on: 12/24/2024 12:15:23 UTC
Last modified on: 10/08/2025 14:43:14 UTC