CVE-2024-26931 - Linux Kernel scsi: qla2xxx Command Flush Vulnerability Explained

In February 2024, the Linux community fixed a serious bug in the qla2xxx SCSI driver that could lead to a full system crash ("kernel oops") when a Fibre Channel cable was pulled under memory pressure. In this post, I'll explain the vulnerability, show what happened behind the scenes, and walk you through how the patch solves the problem.

What is CVE-2024-26931?

CVE-2024-26931 affects the Linux kernel's qla2xxx SCSI driver, which supports QLogic Fibre Channel host bus adapters (common in enterprise storage). When a cable was physically disconnected ("pulled") while memory was low, the driver sometimes failed to flush outstanding SCSI commands back to the SCSI layer before tearing down its session state. A later flush could then make the kernel dereference a NULL pointer, which crashes the system immediately.

- Impact: Kernel NULL pointer dereference, crash
- Patched in: Linux 6.9 (upstream commit listed under More Resources and References)

1. The Set-Up

Suppose a storage array is attached via Fibre Channel and the host is under heavy I/O and memory pressure. To carry out error recovery, the driver has to allocate an SRB (SCSI request block). If that allocation fails because memory is exhausted, error recovery stalls.

2. The Trigger

When a user or admin pulls the Fibre Channel cable, the kernel notices and schedules a teardown of the affected storage session. Normally, all outstanding SCSI commands should be flushed back to the SCSI midlayer for cleanup.

3. The Bug

If the driver *can't allocate* the resources needed to flush pending commands (due to low memory or otherwise), it skips the flush. The pending SCSI commands are left as "orphans": the driver never returned them, yet the upper layer goes on to modify them.

Later, once memory has been freed, a subsequent cable pull (or another recovery operation) triggers a second command flush. The driver then touches the half-torn-down structures and, critically, tries to DMA unmap an SGL (scatter-gather list) pointer that is now NULL, which causes a fatal NULL pointer dereference.
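To make the sequence concrete, here is a small user-space C model of it. This is only a sketch: `toy_cmd`, `toy_flush`, `toy_upper_layer_reclaim`, and `toy_replay_bug` are names I made up for illustration, none of them exist in the qla2xxx driver, and the model reports the NULL SGL instead of crashing on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an outstanding SCSI command; not a kernel structure. */
struct toy_cmd {
    void *sgl;       /* scatter-gather list; NULL once the upper layer reclaimed it */
    bool  returned;  /* true once the command went back to the SCSI layer */
};

enum flush_result { FLUSH_OK, FLUSH_SKIPPED, FLUSH_NULL_SGL };

/* Flush one command. srb_available models whether the recovery SRB
 * could be allocated (false = system is out of memory). */
enum flush_result toy_flush(struct toy_cmd *cmd, bool srb_available)
{
    if (!srb_available)
        return FLUSH_SKIPPED;   /* the bug: the command stays outstanding */
    if (cmd->sgl == NULL)
        return FLUSH_NULL_SGL;  /* the unpatched driver would oops at this point */
    cmd->sgl = NULL;            /* stands in for the DMA unmap */
    cmd->returned = true;
    return FLUSH_OK;
}

/* Models the upper layer modifying a command it never got back. */
void toy_upper_layer_reclaim(struct toy_cmd *cmd)
{
    cmd->sgl = NULL;
}

/* Replays the sequence: flush under memory pressure, upper layer
 * modifies the command, second cable pull flushes again. */
enum flush_result toy_replay_bug(void)
{
    int payload = 0;
    struct toy_cmd cmd = { .sgl = &payload, .returned = false };

    enum flush_result first = toy_flush(&cmd, false);  /* first cable pull, no memory */
    assert(first == FLUSH_SKIPPED);
    toy_upper_layer_reclaim(&cmd);
    return toy_flush(&cmd, true);                      /* second cable pull */
}
```

Calling `toy_replay_bug()` from a `main` function returns `FLUSH_NULL_SGL`, which is the state in which the real driver dereferences the NULL pointer.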

The call trace will show something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:__wake_up_common+0x4c/0x190
...
qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ...

The crash happens in the SCSI command flush logic during session teardown.

Sample Exploit Scenario

Although this is not remotely exploitable (it depends on a cable pull and on the memory state), a local admin or an attacker *with physical or privileged VM access* could crash the machine with a sequence like this:

# (In a test environment only, never on production hardware)
# 1. Stress the system memory (in the background, so the IO can start while it runs)
stress-ng --vm 8 --vm-bytes 90% --timeout 120s &

# 2. Initiate heavy read-only storage IO with fio (replace /dev/sda with a Fibre Channel LUN)
fio --name=fc-read --filename=/dev/sda --readwrite=read --ioengine=libaio --direct=1 --bs=4k --numjobs=16 --size=1G

# 3. At peak memory and IO, physically pull the Fibre Channel cable

# Result: Kernel logs show a bug; system may immediately panic and reboot.

SYSLOG output example

Oops: 0000 [#1] SMP NOPTI
CPU: 27 PID: 793455 Comm: kworker/u130:6 ...
RIP: 0010:__wake_up_common+0x4c/0x190
...
qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 ...

How Does the Patch Fix It?

Patch summary: The driver now *always* checks, when a session is torn down, that pending commands have been flushed back to the SCSI layer. Because no command is left behind, a later flush never reaches a command whose SGL is already NULL, and the crash is prevented.
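The idea behind that check can be sketched in user-space C. Treat this as a model under stated assumptions rather than the driver's code: all `sim_` names are hypothetical, and the step where leftover commands are forced back through a heavier recovery (`reset_requested`) is my assumption about how stragglers get handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_CMDS 4

/* Hypothetical stand-ins; these are not qla2xxx structures. */
struct sim_cmd {
    void *sgl;          /* NULL when nothing is mapped */
    bool  outstanding;  /* true until returned to the SCSI layer */
};

struct sim_session {
    struct sim_cmd cmds[SIM_MAX_CMDS];
    bool reset_requested;  /* assumption: stragglers trigger a heavier recovery */
};

/* Count commands that the SCSI layer has not received back yet. */
size_t sim_pending(const struct sim_session *s)
{
    size_t n = 0;
    for (size_t i = 0; i < SIM_MAX_CMDS; i++)
        if (s->cmds[i].outstanding)
            n++;
    return n;
}

/* Return one command, unmapping its SGL only if one is still attached. */
void sim_return_cmd(struct sim_cmd *c)
{
    if (c->sgl != NULL)
        c->sgl = NULL;  /* stands in for the DMA unmap */
    c->outstanding = false;
}

/* Patched-style teardown: do not finish while commands are outstanding.
 * Returns the number of commands still pending afterwards (0 = clean). */
size_t sim_teardown(struct sim_session *s)
{
    if (sim_pending(s) > 0) {
        s->reset_requested = true;
        for (size_t i = 0; i < SIM_MAX_CMDS; i++)
            if (s->cmds[i].outstanding)
                sim_return_cmd(&s->cmds[i]);
    }
    return sim_pending(s);
}
```

The point of the design is that teardown itself verifies the invariant "nothing is outstanding" instead of trusting that an earlier flush succeeded.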

Patched code snippet (pseudocode)

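// Illustrative pseudocode only: names are simplified and this is not the upstream diff.
// During session teardown:
if (pending_scsi_commands) {
    flush_commands_to_scsi_layer(); // Return every outstanding command for cleanup
}
if (cmd->sgl == NULL) {
    // Nothing is mapped, so there is no SGL to DMA unmap
    return;
}
// Safe to continue: the scatter-gather list is still valid
dma_unmap_sg(dev, cmd->sgl, nents, direction);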

The kernel patch

The full change is in the upstream commit listed under More Resources and References.

Upgrade your kernel to 6.9 or later, or apply your vendor's security backport.

- See Red Hat advisory or upstream Linux commit
- For enterprise Linux: Install latest RHEL/CentOS/Oracle Linux updates
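If you want a quick first look at where a machine stands, a version check like the C sketch below can help. It is a rough heuristic under one big assumption: it only compares the release string against mainline 6.9, so it cannot see backports. An older vendor or stable kernel may already carry the fix, which means a "no" here means "check your vendor advisory", not "vulnerable". The function names are mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor" from a release string such as "6.8.0-45-generic". */
bool parse_release(const char *release, int *major, int *minor)
{
    assert(release != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* True when the release string is at or above mainline 6.9.
 * Backported fixes in older kernels are NOT detected. */
bool at_least_6_9(const char *release)
{
    int major, minor;

    if (!parse_release(release, &major, &minor))
        return false;
    return major > 6 || (major == 6 && minor >= 9);
}

/* Check the running kernel via uname(2).
 * Returns 1 or 0 for yes/no, or -1 if uname() fails. */
int running_kernel_at_least_6_9(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return at_least_6_9(u.release) ? 1 : 0;
}
```

Call `running_kernel_at_least_6_9()` from a small `main` and print the result; on distribution kernels, always confirm against the vendor's advisory for this CVE.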

More Resources and References

- Upstream Linux Kernel commit
- Red Hat CVE summary
- qla2xxx source code

Summary:
CVE-2024-26931 is a command-flush bug in the Linux qla2xxx SCSI driver that could let a simple cable pull crash production servers when system memory is low. It is fixed in Linux 6.9 and in vendor backports, and you should update as soon as practical.

Stay safe!
If you're in enterprise storage or using VM hosts with Fibre Channel, apply this fix ASAP. If you have questions or want advice on checking your drivers, leave a comment or DM.

Timeline

Published on: 05/01/2024 06:15:07 UTC
Last modified on: 03/03/2025 17:47:59 UTC