In early 2024, a race condition was found and fixed in the Linux kernel’s s390/cio subsystem, which serves IBM mainframe hardware. The flaw, tracked as CVE-2024-27009, could leave a device in an inconsistent state, blocking further access to it and causing reliability problems. This post explains the issue in simple terms, lays out the root cause, lists the original references, and shows how the bug could be triggered.

What’s CVE-2024-27009 All About?

The core of this CVE is a race condition in ccw_device_set_online(), a function in the s390 channel I/O subsystem (cio), which controls hardware devices on mainframe systems. When a device is brought “online”, the process could fail because of a timing issue, leaving the device stuck so that nothing else can use it until a reboot.

Why Is This Bad?

- Leaves hardware devices in an undiscoverable/broken state (ENODEV)

- Subsequent online commands fail, hurting system stability and availability

- Especially problematic on boot or on systems with complex multi-path I/O

Technical Background

Let’s break down how and why this bug happens.

Normally, ccw_device_set_online() runs through a sequence to bring a device online. It waits for the device to reach a final state and then checks whether that state is OK. However, a path verification request might arrive at just the wrong time: after the wait is done, but before the result is checked. Because the code didn’t hold the lock across this boundary, the device state could change between the wait and the check, which is a textbook *race condition*.

Here’s simplified pseudo-code for the problematic sequence (before the fix):

wait_event(device->state_wq, device->online_done);
// No lock here!
if (device->state != ONLINE) {
    return -ENODEV;
}

Say a path verification request hits just after the wait completes and changes the device state. By the time the if-statement runs, the device is no longer in the online state, so the function wrongly returns failure.
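The window described above can be replayed deterministically in user space. The following is a minimal, single-threaded C sketch, not kernel code: the names model_state, model_dev, model_path_verify() and model_set_online_racy() are hypothetical and exist only for this illustration, and the real interleaving happens between concurrent kernel contexts rather than in one function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; not the kernel's real state enum. */
enum model_state { MODEL_ONLINE, MODEL_VERIFY };

struct model_dev {
    enum model_state state;
    bool verify_pending;   /* a path verification request is about to arrive */
};

/* Models a path verification request moving the device out of ONLINE. */
static void model_path_verify(struct model_dev *dev)
{
    if (dev->verify_pending) {
        dev->state = MODEL_VERIFY;
        dev->verify_pending = false;
    }
}

/*
 * Models the pre-fix tail of the online sequence: nothing stops
 * model_path_verify() from running between the wait and the check.
 */
static int model_set_online_racy(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->state = MODEL_ONLINE;   /* the wait for a final state has completed */
    model_path_verify(dev);      /* request lands in the unprotected window */
    return dev->state == MODEL_ONLINE ? 0 : -ENODEV;
}
```

In this model, a device with verify_pending set to false comes online and the function returns 0, while the same call with verify_pending set to true returns -ENODEV even though the device had already reached the online state.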

The Commit That Made Things Worse

Commit 2297791c92d ("s390/cio: don’t unregister subchannel from child-drivers") made path verification requests much more likely during boot, which increased the odds of hitting this race while devices are coming up.

Patch and Solution

The fix for CVE-2024-27009 is to hold the CCW device lock from the point where the final device state is determined until the result has been checked. That way, no rogue path verification can sneak in between the two steps.

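Fixed Code (Simplified Pseudo-code)

mutex_lock(&device->lock);
// Sleep without the lock, but re-check the condition with the lock held
while (!device->online_done) {
    mutex_unlock(&device->lock);
    wait_event(device->state_wq, device->online_done);
    mutex_lock(&device->lock);
}
// Lock is still held: the state cannot change before this check
if (device->state != ONLINE) {
    mutex_unlock(&device->lock);
    return -ENODEV;
}
mutex_unlock(&device->lock);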

See the fix upstream.
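The locking discipline behind the fix can also be modelled in user space. The sketch below is single-threaded C under stated assumptions, not the kernel implementation: sim_state, sim_dev, sim_final(), sim_deliver() and sim_set_online_fixed() are hypothetical names, state changes are scripted as an array of events, and the model assumes the lock is dropped while sleeping and retaken before the state is tested again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for this model; not the kernel's real state enum. */
enum sim_state { SIM_STARTING, SIM_ONLINE, SIM_VERIFY, SIM_NOT_OPER };

struct sim_dev {
    enum sim_state state;
    bool locked;                   /* stands in for the device lock */
    const enum sim_state *events;  /* scripted state changes, in arrival order */
    size_t n_events;
    size_t next;                   /* index of the next undelivered event */
};

/* A state is "final" once the device has settled (online or not operational). */
static bool sim_final(const struct sim_dev *d)
{
    return d->state == SIM_ONLINE || d->state == SIM_NOT_OPER;
}

/* Apply the next scripted state change, but only while the lock is free. */
static void sim_deliver(struct sim_dev *d)
{
    if (!d->locked && d->next < d->n_events)
        d->state = d->events[d->next++];
}

/* Fixed ordering: the final-state test and the result check share one lock hold. */
static int sim_set_online_fixed(struct sim_dev *d)
{
    assert(d != NULL);
    d->locked = true;
    while (!sim_final(d) && d->next < d->n_events) {
        d->locked = false;  /* drop the lock while "sleeping" */
        sim_deliver(d);     /* state changes can only happen here */
        d->locked = true;   /* retake it before re-testing the state */
    }
    sim_deliver(d);         /* badly timed request: blocked by the lock */
    int ret = (d->state == SIM_ONLINE) ? 0 : -ENODEV;
    d->locked = false;
    return ret;
}
```

With the scripted events VERIFY, ONLINE, VERIFY, the loop keeps waiting through the first verification, leaves once the state is ONLINE, and the trailing VERIFY cannot be applied before the check, so the function returns 0. A device that settles in NOT_OPER still returns -ENODEV, which is the legitimate failure case.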

How Could This Be Exploited?

Direct exploitation of this CVE is tricky: it’s not a “remote code execution” bug or a privilege escalation by itself. However, an attacker *with sufficient access* (i.e., root privileges or access to management APIs) might:

- Script repeated online/offline events for devices while triggering path verification events (e.g., faking unplug/plug or via sysfs)
- Intentionally induce the device into a stuck offline state, leading to a denial of service for critical devices (e.g., network or storage paths)

Here’s a conceptual script (run as root) to amplify the race and trigger the bug on vulnerable kernels:

#!/bin/bash
# Simultaneously bounce the device and hammer sysfs to cause state thrashing
DEVICE=/sys/bus/ccw/devices/..1234

for i in {1..100}; do
  echo 1 > "$DEVICE/online" &
  # In the background, induce path checks or offlines
  echo 0 > "$DEVICE/online" &
done
wait

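After a successful trigger, the device stays offline and further attempts to set it online fail:

cat "$DEVICE/online"        # shows 0: the device is stuck offline
echo 1 > "$DEVICE/online"   # fails with ENODEV (No such device)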

Note: Only try this on test systems! On production/mainframes, this could cause critical device outages.

References

- Upstream Linux Kernel Patch
- Problematic Commit
- Linux Kernel Security Mailing List Announcement

Conclusion & Mitigation

- Upgrade your kernel: Anyone running Linux on s390 mainframes should move to a kernel that includes the fix above.
- Be aware: This bug doesn’t give attackers instant root, but it can be devastating for system reliability, especially at boot time or if relied-upon devices go missing.
- Vulnerability management: Add CVE-2024-27009 to your tracking and confirm that none of your systems run an affected kernel without the patch.

Timeline

Published on: 05/01/2024 06:15:19 UTC
Last modified on: 05/04/2025 09:02:04 UTC