CVE-2025-22010 - Resolving a Soft Lockup in Linux RDMA/hns With Large Buffers

CVE-2025-22010 is a Linux kernel vulnerability, now fixed, in the RDMA/hns driver. Registering a very large memory region (100+ GB) for RDMA operations could keep a CPU busy in kernel loops long enough to trigger soft lockups, causing system instability and a potential denial of service (DoS). This article explains what happened, walks through the technical details, and shows how the fix works.

## Vulnerability Explained: What Is a Soft Lockup?

A soft lockup occurs when a CPU stays in kernel mode for too long without scheduling, so no other task gets to run on that core. Other CPUs keep working, but the affected core is effectively hogged and the system can stall or lag. Linux catches this with its soft-lockup watchdog. If you see a message like this in your logs:

```
watchdog: BUG: soft lockup - CPU#27 stuck for 22s!
```

it means that CPU core did not schedule for 22 seconds, past the default 20-second threshold, typically because of a driver bug or a runaway loop.

## Where Did This Happen? (RDMA/hns)

The bug lies in the hns_roce_hw_v2 driver, which provides RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) for Huawei network hardware. RDMA is used in big data, HPC, AI, and database deployments, where moving huge blocks of memory directly between servers is critical.

## What Triggered It?

When a massive memory region (MR), for example one over 100 GB, is registered, the driver runs loops that allocate "bt" (base address table) pages and map the buffer pages backing the user's memory into them. The loop count grows with the buffer size, and the loops contained no scheduling point such as cond_resched(), so a CPU could spend tens of seconds in code like this:

```c
for (i = 0; i < num_bt_pages; i++) {
    /* Allocate a bt page and map buffer pages into it */
    /* No scheduling point: the CPU is never yielded */
}
```

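With a large enough num_bt_pages, the loop outlasts the watchdog threshold. Because the scheduler never gets a chance to run another task on that CPU, the watchdog reports repeated soft lockups.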

Here are the two call traces that accompany the fix:

```
Call trace:
 hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2]
 hns_roce_hem_list_request+0xf8/0x160 [hns_roce_hw_v2]
 hns_roce_mtr_create+0x2e4/0x360 [hns_roce_hw_v2]
 alloc_mr_pbl+0xd4/0x17c [hns_roce_hw_v2]
 hns_roce_reg_user_mr+0xf8/0x190 [hns_roce_hw_v2]
 ib_uverbs_reg_mr+0x118/0x290
```

and

```
Call trace:
 hns_roce_hem_list_find_mtt+0x7c/0xb0 [hns_roce_hw_v2]
 mtr_map_bufs+0xc4/0x204 [hns_roce_hw_v2]
 hns_roce_mtr_create+0x31c/0x3c4 [hns_roce_hw_v2]
 alloc_mr_pbl+0xb0/0x160 [hns_roce_hw_v2]
 hns_roce_reg_user_mr+0x108/0x1c0 [hns_roce_hw_v2]
 ib_uverbs_reg_mr+0x120/0x2bc
```

## Attack & Exploit Details

This vulnerability is not a remote exploit, but it is a local DoS: any user on the box who can register a large enough RDMA memory region can keep a CPU stuck for tens of seconds at a time, stalling latency-sensitive work such as real-time applications, databases, or network services scheduled on that core.

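The problem can be reproduced from user space by registering, for example, a 100+ GB memory region with the RDMA verbs API. This requires access to the RDMA device and permission to pin that much memory (a large enough RLIMIT_MEMLOCK, or CAP_IPC_LOCK):

```c
/* pd (protection domain) and buf (a 100 GB buffer) are set up earlier */
mr = ibv_reg_mr(pd, buf, 100ULL * 1024 * 1024 * 1024, IBV_ACCESS_LOCAL_WRITE);
```

Registering the region drives the driver into the long loops described above, so any application, user, or container with access to the RDMA subsystem and enough pinnable memory could trigger soft lockups on the system.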

## How Was It Fixed?

The official patch added cond_resched() calls inside the critical loops. cond_resched() is a kernel helper that checks if the scheduler needs to switch tasks, and voluntarily yields the CPU if so.

Here’s what the new logic looks like (simplified):

```c
#define SOFTLOCKUP_THRESH_NUM_BT_PAGES ((100ULL * 1024 * 1024 * 1024) / PAGE_SIZE)

for (i = 0; i < num_bt_pages; i++) {
    /* ... existing allocation and mapping ... */

    /* Only loops above the threshold reschedule, so normal-size
     * buffers pay no extra cost; big ones yield every 256 iterations. */
    if (num_bt_pages > SOFTLOCKUP_THRESH_NUM_BT_PAGES && (i & 0xFF) == 0)
        cond_resched();
}
```

> Key idea: Only very large mappings (like 100+ GB) will trigger the reschedule, so normal operation is unimpaired, but huge ones no longer lock CPUs up.

## Patch & Discussion References

- Main commit: RDMA/hns: Fix soft lockup during bt pages loop
- LKML patch discussion
- NVD entry: CVE-2025-22010

## Who Is Affected?

- Systems with Huawei Hip08/Hip09 RoCE adapters
- Kernels that ship the hns_roce_hw_v2 driver without this fix (mainline and backports)
- Systems where untrusted users or containers can allocate large RDMA memory regions

To mitigate:

- Update to a kernel that includes the fix.
- Restrict RDMA privileges: don't let untrusted users run RDMA applications unless necessary.
- For embedded or specialty systems, backport this patch to avoid soft lockups in production.

## Conclusion

*CVE-2025-22010* is a clear example of how kernel code that works for typical inputs can trip up on the extreme sizes seen in high-performance computing. If your servers run the hns RDMA driver and register big memory regions, update soon: this fix protects your critical workloads and your uptime.

Want to dig deeper?

- Patching guide for Linux
- Linux RDMA subsystem documentation

*Stay safe and keep your kernel fresh! For more kernel vulnerabilities explained plainly, follow this space.*


## Timeline

Published on: 04/08/2025 09:15:24 UTC
Last modified on: 04/10/2025 13:15:50 UTC