---
What is CVE-2024-42268?
CVE-2024-42268 describes a race condition in the Linux kernel's net/mlx5 code, specifically in how the driver calls into the *devlink* interface during device reloads. A required lock was not taken, which could lead to unexpected kernel behavior and potentially system crashes; privilege escalation is at most a theoretical concern. The warning quoted below was captured on a *6.10.0-rc2+* build, and the fix landed upstream after that.
Affected Systems:
Kernels that do not yet contain the fix, running Mellanox mlx5 network adapters with devlink enabled. The reported trace comes from a 6.10.0-rc2+ build.
Technical Background – What Happened?
The kernel exposes advanced network card administration through devlink, a subsystem for managing network devices. When a remote system triggers a device reload through devlink, the operations that follow must run under the devlink lock to prevent concurrent access to shared resources. In mlx5_sync_reset_reload_work, this lock was missing, so two or more parts of the kernel could touch the same data at the same time: a race condition.
The warning from the kernel looked like this:

    WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50
    ...
    CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S W 6.10.0-rc2+ #116
    Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015
    Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core]
    RIP: 0010:devl_assert_locked+0x3e/0x50
    ...
    Call Trace:
     <TASK>
     ? __warn+0xa4/0x210
     ? devl_assert_locked+0x3e/0x50
     ? report_bug+0x160/0x280
    ...

This warning shows that a function (devlink_remote_reload_actions_performed()) tried to operate on shared data without the proper lock; devl_assert_locked() is the check that caught it. If the code continues without the lock, the outcome is unpredictable: corruption of device state, incorrect behavior, or crashes (kernel panics).
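
To check whether a machine has already logged this assertion, the kernel log can be searched for the function name that appears in the trace. The following is a minimal sketch rather than an official detection tool: the helper name `has_devlink_lock_warning` is hypothetical, and it assumes the log line has the same shape as the WARNING line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the log text on stdin contains a kernel
# WARNING raised from devl_assert_locked, as in the trace quoted above.
has_devlink_lock_warning() {
    grep -q 'WARNING:.*devl_assert_locked'
}

# Example usage (reading the kernel log usually requires root):
#   dmesg | has_devlink_lock_warning && echo "devlink lock warning found"
```

A match only means the assertion fired at some point since boot; the absence of a match does not prove the kernel is patched.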
The vulnerable code (simplified) before the fix might have looked like this:

    static void mlx5_sync_reset_reload_work(struct work_struct *work)
    {
        // ... some setup code ...
        // HERE: the devlink lock should be held before the next call
        devlink_remote_reload_actions_performed(devlink, ...); // <-- missing lock
        // ... rest of the code ...
    }

In short, the function called devlink_remote_reload_actions_performed() without holding the required lock. The call must be wrapped in the lock so that only one part of the code can touch the shared structures at a time.
After the fix, the function explicitly locks and unlocks around the devlink call:

    static void mlx5_sync_reset_reload_work(struct work_struct *work)
    {
        // ... some setup code ...
        devl_lock(devlink);   // take the devlink lock
        devlink_remote_reload_actions_performed(devlink, ...);
        devl_unlock(devlink); // release it once the call is done
        // ... rest of the code ...
    }

Exploit Details – What Could Attackers Do?
* In practice, this bug is not directly exploitable by remote attackers for instant code execution, but it significantly weakens system stability.
* Malicious local users or remote users with permission to reload network devices through devlink might trigger this race, causing kernel panics or possibly destabilizing the network stack.
* In multi-user or shared environments (such as certain cloud or HPC clusters), a user may intentionally trigger repeated reloads and crash the host, leading to a Denial of Service (DoS).
Proof of Concept
To *trigger* the bug, one could rapidly reload the mlx5 device from different threads or scripts using the devlink interface:

    # On a system with an affected kernel and the mlx5 driver.
    # Replace 0000:xx:yy.z with the PCI address of the mlx5 device.
    while true; do
        devlink dev reload pci/0000:xx:yy.z action driver_reinit
    done &
    while true; do
        devlink dev reload pci/0000:xx:yy.z action fw_activate
    done &

This could theoretically trigger the race and, with unlucky timing, cause the lock warning or possibly worse.
The bug was discussed and patched upstream:
- Upstream Patch Commit
- Kernel Mailing List Discussion
- CVE Entry at NVD *(pending full assessment at the time of writing)*
Who Should Care and What to Do?
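If you run Linux servers with Mellanox mlx5 cards (common in high-speed datacenters and research clusters), check whether your kernel is affected. Update to a kernel release that contains the fix (your distribution's advisory for CVE-2024-42268 lists the exact package versions) or apply the above patch.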
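
As a first triage step, it helps to know whether a host loads the mlx5 driver at all and which kernel release it runs. The sketch below assumes a Linux system where loaded modules are listed in /proc/modules; the function name `mlx5_core_loaded` and its optional file argument are hypothetical, added for illustration. It does not decide whether the kernel is patched; the printed release still has to be compared against your distribution's advisory.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if mlx5_core appears as a loaded module.
# The optional argument is a module list in /proc/modules format
# (module name in the first column); it defaults to /proc/modules.
mlx5_core_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] || return 1
    awk '$1 == "mlx5_core" { found = 1 } END { exit !found }' "$modules_file"
}

# Example usage:
#   if mlx5_core_loaded; then
#       echo "mlx5_core is loaded; running kernel: $(uname -r)"
#   fi
```

A driver compiled directly into the kernel does not appear in /proc/modules, so a negative result from this check is not conclusive on such builds.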
Summary
CVE-2024-42268 is a classic example of how missing a simple lock can result in serious system problems. Even though it doesn’t directly offer attackers code execution, it can let a malicious or buggy user cause system crashes or instability on affected Linux systems. As always, keep your kernels updated, especially in high-speed networking environments!
Stay safe, and watch your locks when coding for the kernel!
Timeline
Published on: 08/17/2024 09:15:08 UTC
Last modified on: 08/19/2024 20:52:49 UTC