A vulnerability (CVE-2025-22010) in the Linux kernel has been identified and resolved. It affects the RDMA/hns subsystem, which could trigger soft lockup errors while looping over BT pages. When a large buffer is allocated, such as a memory region (MR) over 100GB, the for-loops in the driver run for so many iterations that the kernel watchdog reports a soft lockup. This post covers the details of the vulnerability, its impact, and the fix.

Exploit Details

The issue arises in the for-loops the driver runs when allocating BT pages and then mapping them to buffer pages. For a large buffer, these loops iterate so many times without yielding the CPU that the watchdog reports a soft lockup. Two example reports follow, one from the allocation path and one from the mapping path:

watchdog: BUG: soft lockup - CPU#27 stuck for 22s!
...

Call trace:

hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2]
hns_roce_hem_list_request+0xf8/0x160 [hns_roce_hw_v2]
hns_roce_mtr_create+0x2e4/0x360 [hns_roce_hw_v2]
alloc_mr_pbl+0xd4/0x17c [hns_roce_hw_v2]
hns_roce_reg_user_mr+0xf8/0x190 [hns_roce_hw_v2]
ib_uverbs_reg_mr+0x118/0x290

watchdog: BUG: soft lockup - CPU#35 stuck for 23s!
...

Call trace:

hns_roce_hem_list_find_mtt+0x7c/0xb0 [hns_roce_hw_v2]
mtr_map_bufs+0xc4/0x204 [hns_roce_hw_v2]
hns_roce_mtr_create+0x31c/0x3c4 [hns_roce_hw_v2]
alloc_mr_pbl+0xb0/0x160 [hns_roce_hw_v2]
hns_roce_reg_user_mr+0x108/0x1c0 [hns_roce_hw_v2]
ib_uverbs_reg_mr+0x120/0x2bc
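
To get a feel for the loop counts involved, the arithmetic can be sketched in a few lines of C. This is a rough estimate, not driver code: the 4 KiB page size, the 8-byte table entry size, and the helper names `pages_for_buffer` and `table_pages_for` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the driver. */
#define ASSUMED_PAGE_SIZE  4096ULL  /* 4 KiB buffer pages */
#define ASSUMED_ENTRY_SIZE 8ULL     /* bytes per address entry in a table page */

/* Number of pages needed to cover buf_bytes, rounding up. */
static uint64_t pages_for_buffer(uint64_t buf_bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (buf_bytes + page_size - 1) / page_size;
}

/* Number of bottom-level table pages needed to hold one entry per buffer page. */
static uint64_t table_pages_for(uint64_t buffer_pages, uint64_t page_size,
                                uint64_t entry_size)
{
    assert(entry_size > 0 && page_size >= entry_size);
    uint64_t entries_per_page = page_size / entry_size;
    return (buffer_pages + entries_per_page - 1) / entries_per_page;
}
```

Under these assumptions, a 100 GiB buffer spans 26,214,400 pages and needs 51,200 bottom-level table pages, so any loop that touches every page runs tens of millions of iterations in one uninterrupted stretch.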

Fix Details

The fix adds calls to cond_resched() inside these loops, which gives the scheduler a chance to run other tasks and so prevents the soft lockup. To avoid affecting allocation performance for normal-size buffers, cond_resched() is called only once the loop count reaches a threshold, which is set to the loop count of a 100GB MR.
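
The general pattern, yielding only after a fixed number of iterations, can be illustrated with a small userspace sketch. This is not the kernel patch: `sched_yield()` stands in for `cond_resched()`, and `walk_pages`, `struct walk_stats`, and the interval parameter are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Counters that make the loop's behaviour observable. */
struct walk_stats {
    size_t visited; /* pages processed */
    size_t yields;  /* times the CPU was voluntarily given up */
};

/*
 * Walk page_count pages and yield the CPU once every yield_interval
 * iterations. Loops shorter than yield_interval never yield, so small
 * buffers pay no scheduling overhead.
 */
static struct walk_stats walk_pages(size_t page_count, size_t yield_interval)
{
    struct walk_stats stats = {0, 0};

    assert(yield_interval > 0);
    for (size_t i = 0; i < page_count; i++) {
        /* Placeholder for the per-page work (allocation or mapping). */
        stats.visited++;

        if ((i + 1) % yield_interval == 0) {
            sched_yield(); /* userspace stand-in for cond_resched() */
            stats.yields++;
        }
    }
    return stats;
}
```

Choosing a large interval is the design point: a loop below the threshold behaves exactly as before, while a very long loop periodically lets other tasks run.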

Conclusion

The vulnerability (CVE-2025-22010) in the Linux kernel's RDMA/hns subsystem has been fixed. The soft lockup errors occurred in the loops over BT pages when allocating large buffers, and calling cond_resched() in those loops resolves them. Keep your Linux kernel up to date to pick up this fix and to maintain the security and stability of your system.

Timeline

Published on: 04/08/2025 09:15:24 UTC
Last modified on: 04/10/2025 13:15:50 UTC