In 2024, security researchers and Linux kernel developers resolved a subtle but important issue affecting IBM’s s390 PCI subsystem. The issue, tracked as CVE-2024-56699, could cause a “double remove” of a hotplug slot in certain error paths. While not a classic memory corruption bug, this flaw could destabilize the system or lead to unpredictable behavior on affected hardware, especially in mainframe environments that rely on robust PCI hotplug handling.
This post explains the bug in simple terms, demonstrates how it could be provoked, references the fixes, and outlines how you can detect and protect against similar issues.
Background: What is the s390 PCI Subsystem?
The s390 architecture underpins IBM mainframes, and modern models support attaching PCI devices. Hotplug allows PCI devices to be added and removed dynamically, a critical feature for availability and resource management.
The Bug: What Went Wrong?
Previously, in commit 6ee600bfbef, the Linux kernel maintainers moved the call to zpci_exit_slot(), the function that cleans up hotplug slot resources, so that it runs when the device is actually released rather than earlier. The intent was to keep the slot around until all users were done with the device.
However, zpci_release_device() still contained teardown code for multiple device states (reserved, configured, standby), and the standby flow *already* tore down the hotplug slot. If the function were ever called for a device in an unexpected state (especially standby), it could attempt to remove the hotplug slot *twice* for the same device, leading to a possible double free or other undefined behavior.
Here’s a simplified snippet showing the essence of the problem (not actual kernel code):

    // Pseudo-code illustrating the bug pattern
    void zpci_release_device(struct zpci_dev *zdev)
    {
        switch (zdev->state) {
        case CONFIGURED:
            // ... (code to tear down)
            /* fallthrough */
        case STANDBY:
            remove_hotplug_slot(zdev);
            break;
        case RESERVED:
            // Only now should we remove the slot
            remove_hotplug_slot(zdev);
            break;
        }
    }
Notice that both the STANDBY and RESERVED cases call remove_hotplug_slot(). A device that goes through the STANDBY path and is later released in the RESERVED state would have its slot removed twice.
The Fix
The fix is straightforward: permit hotplug slot cleanup only in the "reserved" state, and warn if the function is somehow called in any other state.
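Simplified sketch based on the patch (not the literal kernel code; remove_hotplug_slot() is the same placeholder used above):

    void zpci_release_device(struct zpci_dev *zdev)
    {
        /* Releasing a device in any state other than reserved is a bug. */
        if (WARN_ON(zdev->state != ZPCI_FN_STATE_RESERVED))
            return;
        remove_hotplug_slot(zdev);
    }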
- Full fix, see Linux commit 5702581da978
- Original bug discussion from Linux s390 mailing list
How Could It Have Been Exploited?
This bug is not attacker-controlled memory corruption. Still, a rogue user or buggy system-level software that accidentally (or intentionally) removed a PCI device in a standby or configured state could have triggered a double removal of a hotplug slot. That could have destabilized the system or led to unpredictable behavior; the Security Impact section below lists the concrete risks.
Proof-of-Concept (PoC)
Here is a user-space simulation in C that mimics the pattern behind the real issue. It does not touch the kernel and is for educational demonstration only:

    // WARNING: For educational use only!
    #include <stdio.h>
    #include <stdbool.h>

    bool slot_removed = false;

    void remove_hotplug_slot(void) {
        if (slot_removed) {
            printf("Error: hotplug slot double removal!\n");
            // In the real kernel, this could double-free memory or worse
        }
        slot_removed = true;
        printf("Hotplug slot removed.\n");
    }

    void release_device(int device_state) {
        // State: 0 = RESERVED, 1 = CONFIGURED, 2 = STANDBY
        switch (device_state) {
        case 1: // CONFIGURED
        case 2: // STANDBY
            remove_hotplug_slot();
            break;
        case 0: // RESERVED
            remove_hotplug_slot();
            break;
        }
    }

    int main(void) {
        // Simulate sequence that leads to double-remove
        release_device(2); // Remove slot during standby
        release_device(0); // Remove slot again during reserved
        return 0;
    }
Expected output:

    Hotplug slot removed.
    Error: hotplug slot double removal!
    Hotplug slot removed.
Security Impact
- Availability Risk: If you run workloads relying on reliable PCI hotplug, a kernel crash or panic could occur during unexpected device removals.
- Denial of Service: A local user with sufficient privileges performing PCI hotplug operations could cause system instability.
- Affected versions: Linux kernels with s390 PCI hotplug support that contain commit 6ee600bfbef but not the fix.
- Upgrade kernels as soon as possible if you run mainframes or z/VM guests using PCI hotplug features.
References
- Linux kernel CVE-2024-56699 NVD entry (TBA)
- Fix commit in Linux mainline
- Linux s390 PCI/PCIe documentation
Conclusion
CVE-2024-56699 serves as a cautionary tale: even when dealing with low-level hardware routines on non-x86 architectures, resource management must be precise. Failing to properly gate cleanup routines can have real-world consequences, especially in environments where reliability is crucial. System administrators and kernel developers should patch proactively and consider additional code reviews for state machine logic in similar subsystems.
Pro tip: Look for similar patterns (“double free”, “resource recycling”) in codebases you review — whether or not you’re working with mainframes, the same logic applies.
*Stay safe, patch early, and keep your mainframes healthy!*
Timeline
Published on: 12/28/2024 10:15:17 UTC
Last modified on: 05/04/2025 10:02:45 UTC