A bug tracked as CVE-2024-56589 was found in the Linux kernel’s hisi_sas SCSI driver. On kernels built with the no-forced-preemption model, it can cause a CPU soft lockup. The bug shows up in high-performance setups with many SAS SSDs attached. This post breaks down what the bug is, what causes it, how it can be triggered (intentionally or unintentionally), and how it was fixed, using simple, clear language.

## What Happened? (The Problem)

When using the hisi_sas driver (mainly on ARM64 servers with Huawei/HiSilicon SAS controllers), you could hit a scenario where a CPU gets stuck for tens of seconds, disrupting the whole system. Here's a real kernel log snippet showing the issue:

watchdog: BUG: soft lockup - CPU#240 stuck for 22s! [irq/149-hisi_sa:3211]
...
Call trace:
 fput_many+0x8c/0xdc
 fput+0x1c/0xf0
 aio_complete_rw+0xd8/0x1fc
 blkdev_bio_end_io+0x98/0x140
 bio_endio+0x160/0x1bc
 blk_update_request+0x1c8/0x3bc
 ...
 slot_complete_v3_hw+0x260/0x760 [hisi_sas_v3_hw]
 cq_thread_v3_hw+0xbc/0x190 [hisi_sas_v3_hw]
 irq_thread_fn+0x34/0xa4
 irq_thread+0xc4/0x130
 kthread+0x108/0x13c
 ret_from_fork+0x10/0x18

A soft lockup means the CPU stayed in kernel code for too long without scheduling anything else, so even the watchdog thread on that CPU could not run. Under certain high-I/O workloads this makes the system unresponsive, and it can even crash.
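To check a saved kernel log for reports like the one above, a small helper can pull out the CPU number, the stall duration, and the task name. This is a minimal sketch: the function name `soft_lockups` is made up for this post, and the pattern assumes only the message format shown in the snippet above.

```bash
#!/usr/bin/env bash
# soft_lockups FILE
# Print "CPU SECONDS TASK" for every soft lockup report found in FILE.
# Hypothetical helper; it relies only on the message format shown above.
soft_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\([^]]*\)].*/\1 \2 \3/p' "$1"
}
```

For example, save the log with `dmesg > kern.log` and then run `soft_lockups kern.log`. Lines that are not soft lockup reports are ignored.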

The root cause is that the driver’s threaded interrupt handler can end up in what is effectively a busy loop:

- When you attach, say, 12 high-speed SAS SSDs and they generate a flood of interrupts, the driver’s IRQ thread and the hardware IRQ handler both run on the same CPU.
- The function irq_wait_for_interrupt() always returns 0 because new interrupts keep arriving so fast, so the IRQ thread never sleeps and the CPU never gets a chance to run anything else (like the watchdog thread).
- On kernels configured with no forced preemption (PREEMPT_NONE), the kernel doesn’t automatically break out of long-running loops to let other processes run.

Eventually, the watchdog detects the CPU hasn’t checked in, and logs a soft lockup.

In other words: with high I/O and no forced preemption, a CPU can stay stuck in this thread for as long as the interrupt load lasts, and the OS can hang.
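One rough way to see whether the hisi_sas interrupt threads are concentrated on a few CPUs is to count them per CPU. The sketch below is a hypothetical helper, not part of any kernel tooling. It reads `PID PSR COMM` lines on standard input (for example from `ps -eo pid,psr,comm`) and assumes the thread names look like `irq/149-hisi_sa`, as in the log above. Note that `PSR` is only the CPU a thread last ran on, so this is a snapshot rather than proof of where the hardware interrupt is delivered.

```bash
#!/usr/bin/env bash
# hisi_irq_threads_per_cpu
# Reads "PID PSR COMM" lines on stdin and prints "CPU COUNT" for threads
# whose name matches irq/<number>-hisi_sa (pattern taken from the log above).
# Hypothetical helper for illustration only.
hisi_irq_threads_per_cpu() {
    awk '$3 ~ /^irq\/[0-9]+-hisi_sa/ { n[$2]++ }
         END { for (c in n) print c, n[c] }' | sort -n
}
```

Typical use would be `ps -eo pid,psr,comm | hisi_irq_threads_per_cpu`.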


## Proof-of-Concept / Exploit

This kind of bug isn’t a classic remote code exec or info leak, but it’s a denial-of-service (DoS) vector. Here’s how you might reliably trigger it (assuming vulnerable kernel and hardware):

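1. Setup: Use a server with a supported HiSilicon SAS controller and 12+ fast SAS SSDs.

2. Kernel Config: Make sure you’re using a kernel built without forced preemption (CONFIG_PREEMPT_NONE).

3. Load: Drive heavy random I/O at all of the SSDs at once, for example with fio:

```bash
# Destructive: this writes to the raw devices. fio separates multiple files with ':'.
fio --name=test --filename="$(printf '/dev/sd%s:' {a..l} | sed 's/:$//')" --ioengine=libaio --direct=1 --rw=randrw --bs=4k --numjobs=12 --iodepth=64 --runtime=120
```

4. Observe: Watch dmesg for soft lockup errors like the one shown above. If the bug isn’t fixed in your kernel, you’ll likely stall a CPU and possibly hang the whole system.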
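For the kernel config step, the preemption model can be read from the kernel’s config file. The following is a minimal sketch: `preempt_model` is a hypothetical helper name, and the location of the config file (often `/boot/config-$(uname -r)`) depends on the distribution. On kernels built with CONFIG_PREEMPT_DYNAMIC=y, the model can also be overridden at boot with the `preempt=` parameter, so the config file only shows the default.

```bash
#!/usr/bin/env bash
# preempt_model CONFIG_FILE
# Print the preemption model selected in a kernel config file.
# Hypothetical helper; anything it does not recognise is reported as "unknown".
preempt_model() {
    local cfg=$1
    if grep -q '^CONFIG_PREEMPT_NONE=y' "$cfg"; then
        echo none          # the no-forced-preemption model discussed in this post
    elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y' "$cfg"; then
        echo voluntary
    elif grep -q '^CONFIG_PREEMPT=y' "$cfg"; then
        echo full
    else
        echo unknown
    fi
}
```

A call such as `preempt_model "/boot/config-$(uname -r)"` printing `none` means the kernel defaults to the model affected here.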

## The Fix

The kernel community fixed this by adding a call to cond_resched() inside the driver’s irq thread. cond_resched() is a helper that says: “If someone else needs the CPU, let’s let them run for a moment.” This breaks the busy loop, even on kernels that don’t have forced preemption enabled.

### Key Patch Snippet

/* before (schematic): completion processing never yields the CPU */

while (some_irq_condition) {
    ... // process completions
    // no scheduling point here
}

/* after (schematic): cond_resched() gives other tasks a chance to run */

while (some_irq_condition) {
    ... // process completions
    cond_resched(); // <------ added by the fix
}

Effect:
This change means the kernel can run other tasks (like the watchdog or other critical system tasks) even in heavy interrupt conditions, thus preventing CPU starvation and system hang.

## Who is Affected?

- Servers using HiSilicon SAS controllers with many SSDs or high I/O rate workloads.
- Anyone pushing high interrupt rates on such systems without the fix.

## How To Patch / Mitigate

- If you build kernels yourself, check your hisi_sas_v3_hw.c for the cond_resched() call that the fix adds to the interrupt-thread completion path.
- As a workaround, consider using a kernel built with voluntary or full preemption, or spreading high I/O loads across CPUs.
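The source check can be scripted. This is a minimal sketch with a hypothetical helper name, `has_cond_resched`. It assumes the usual in-tree location of the file, `drivers/scsi/hisi_sas/hisi_sas_v3_hw.c`. It is only a heuristic: it confirms that the file calls cond_resched() somewhere, not that the call sits where the fix puts it.

```bash
#!/usr/bin/env bash
# has_cond_resched KERNEL_SRC_DIR
# Succeeds (exit status 0) if hisi_sas_v3_hw.c in the given kernel tree
# contains a cond_resched() call. Hypothetical helper, heuristic only.
has_cond_resched() {
    grep -q 'cond_resched()' "$1/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c"
}
```

For example: `has_cond_resched ~/src/linux && echo "cond_resched() found" || echo "not found"`, where `~/src/linux` stands in for wherever your kernel tree lives.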

## TL;DR

- *CVE-2024-56589* is a denial-of-service bug in the Linux kernel’s hisi_sas driver that shows up under high load on kernels without forced preemption.
- Certain I/O patterns can lock up a CPU, potentially freezing the box.
- The fix is merged; update your kernel if you run on affected hardware.

## Questions/Comments?

Feel free to ask if you want help checking if your system is vulnerable or have questions about preemption and Linux scheduling!


*This post is original content, for educational and operational awareness. If you run Linux servers with HiSilicon SAS controllers, please patch promptly.*

## Timeline

Published on: 12/27/2024 15:15:18 UTC
Last modified on: 05/04/2025 09:59:09 UTC