Linux is widely relied upon for networking due to its performance and hardware offloading capabilities. Mellanox (now NVIDIA) mlx5 drivers are among the most popular for high-speed Ethernet and InfiniBand adapters. However, a vulnerability was found and patched in the mlx5e network driver, tracked as CVE-2021-46931. The bug triggered a kernel panic during transmission (TX) timeout recovery because of an incorrect void * pointer cast. Let's break down the vulnerability, how it could be exploited, the code involved, and the fix, in simple language.
Technical Context
When the Linux kernel detects that packet transmission (“TX”) has stalled on an mlx5 interface, it schedules recovery on a workqueue (mlx5e_tx_timeout_work). For debugging and recovery, the driver uses “devlink health reporters”, which can dump internal queue state when an error is reported.
The function mlx5e_tx_reporter_dump_sq() is supposed to take a pointer to struct mlx5e_txqsq (the send queue), but in the TX timeout recovery path, it's passed a pointer to a different structure: struct mlx5e_tx_timeout_ctx.
As a result, when the dump function read fields from the struct it expected, it was actually interpreting the memory of an unrelated structure with a different layout. This led to a kernel stack overflow and panic, crashing the whole machine.
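To make the misread concrete, here is a minimal user-space sketch, not driver code. The demo_* structs are hypothetical stand-ins with made-up fields (the real mlx5e_txqsq is far larger), and memcpy is used so the demonstration itself stays well-defined. It contrasts the value the caller intended the dump to see with the bytes a miscast field read would actually pick up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct mlx5e_txqsq (the real one has many more fields). */
struct demo_txqsq {
    uint32_t sqn; /* send queue number */
    uint16_t cc;  /* consumer counter */
    uint16_t pc;  /* producer counter */
};

/* Hypothetical stand-in for struct mlx5e_tx_timeout_ctx. */
struct demo_timeout_ctx {
    struct demo_txqsq *sq; /* the queue that timed out */
    int status;
};

/* Keep the simulated read inside the context object. */
_Static_assert(offsetof(struct demo_txqsq, sqn) + sizeof(uint32_t)
               <= sizeof(struct demo_timeout_ctx),
               "simulated read must stay inside the context");

/* What the caller meant: follow the embedded pointer, then read the field. */
static uint32_t demo_intended_sqn(const struct demo_timeout_ctx *ctx)
{
    assert(ctx != NULL && ctx->sq != NULL);
    return ctx->sq->sqn;
}

/*
 * What a miscast read of "sq->sqn" observes: the bytes stored at the sqn
 * offset inside the context object itself, i.e. part of the sq pointer.
 */
static uint32_t demo_miscast_sqn(const struct demo_timeout_ctx *ctx)
{
    uint32_t seen;

    assert(ctx != NULL);
    memcpy(&seen,
           (const unsigned char *)ctx + offsetof(struct demo_txqsq, sqn),
           sizeof(seen));
    return seen;
}
```

Calling both helpers on the same context shows the gap: demo_intended_sqn() returns the queue number, while demo_miscast_sqn() returns a fragment of a pointer value. In the real driver the misinterpreted fields were then used further, which is where the crash came from.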
Example Kernel Log
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66eadc..000000004d932dae)
kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
...
Kernel panic - not syncing: Fatal exception
In the buggy kernel source, the key function looked like this (simplified):
// This function is a 'dump' callback for the devlink reporter
static int mlx5e_tx_reporter_dump_sq(struct devlink_fmsg *fmsg, void *ctx)
{
    struct mlx5e_txqsq *sq = ctx; // Casts void* to expected type
    // ... read fields from sq ...
}
But in the TX timeout recovery path, this callback was called with a pointer to a different structure:
struct mlx5e_tx_timeout_ctx {
    struct mlx5e_txqsq *sq;
    // ...other fields...
};
So the callback treated the context's memory as if it were a struct mlx5e_txqsq, and acting on those bogus field values crashed the kernel.
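Part of why this kind of bug slips through is that a void * parameter accepts any object pointer without a compiler diagnostic. The following user-space sketch shows one way a callback can detect a mismatched context at run time. It is an illustration only: the demo_* names are hypothetical, and the leading type tag is a defensive technique swapped in for the example, not something the mlx5e driver does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag identifying which context type a void * really points to. */
enum demo_ctx_kind { DEMO_CTX_SQ = 1, DEMO_CTX_TIMEOUT = 2 };

/* Stand-in for struct mlx5e_txqsq; kind must stay the first member. */
struct demo_txqsq {
    enum demo_ctx_kind kind;
    uint32_t sqn;
};

/* Stand-in for struct mlx5e_tx_timeout_ctx; kind must stay the first member. */
struct demo_timeout_ctx {
    enum demo_ctx_kind kind;
    struct demo_txqsq *sq;
};

/*
 * Dump callback with the same shape as the vulnerable one: it receives an
 * untyped context. Both structs begin with the tag, so reading the tag first
 * is valid for either type and lets the callback reject a mismatch.
 */
static int demo_dump_sq(void *ctx, uint32_t *sqn_out)
{
    assert(ctx != NULL && sqn_out != NULL);

    if (*(enum demo_ctx_kind *)ctx != DEMO_CTX_SQ)
        return -EINVAL; /* wrong context type: refuse instead of misreading */

    *sqn_out = ((struct demo_txqsq *)ctx)->sqn;
    return 0;
}
```

Passing a demo_txqsq succeeds and fills in the queue number; passing a demo_timeout_ctx returns -EINVAL and leaves the output untouched. The cost is one extra field per context and the discipline of keeping it first in every struct.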
Exploitability
Triggering the bug requires a TX timeout on an mlx5 interface, either a genuine hardware stall or one induced deliberately (for example, by flooding TX queues or simulating a failure). A local attacker with the right permissions could therefore crash a system running a vulnerable kernel, causing a denial of service:
1. The attacker forces a TX timeout (e.g., via crafted ioctls or massive packet floods).
2. When the recovery work runs, the dump callback miscasts its context, causing a stack overflow and kernel panic.
Note: No privilege escalation or code execution here, just a system crash (DoS).
The Fix
To fix this, developers added a wrapper function that extracts the correct queue pointer from the mlx5e_tx_timeout_ctx structure before calling the original dump function.
Patched Code
// Wrapper added
static int mlx5e_tx_reporter_dump_sq_wrap(struct devlink_fmsg *fmsg, void *ctx)
{
    struct mlx5e_tx_timeout_ctx *timeout_ctx = ctx;
    struct mlx5e_txqsq *sq = timeout_ctx->sq;

    return mlx5e_tx_reporter_dump_sq(fmsg, sq);
}

// When registering dump callback for timeout, use the wrapper
reporter->dump = mlx5e_tx_reporter_dump_sq_wrap;
Now, the dump function always receives the correct pointer type, preventing invalid memory access, stack overflows, and panics.
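The wrapper pattern can be tried out in user space. The sketch below is a simplified model under stated assumptions: every demo_* type and function is hypothetical, and demo_err_ctx only loosely imitates how a context pointer and its dump callback travel together. It shows the timeout path registering a wrapper that unwraps the context before delegating to the original dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types. */
struct demo_txqsq { uint32_t sqn; };
struct demo_timeout_ctx { struct demo_txqsq *sq; int status; };

/* Callback signature: the caller only ever sees an untyped context. */
typedef int (*demo_dump_fn)(void *ctx, uint32_t *sqn_out);

/* Models a record that pairs a context with the dump callback meant for it. */
struct demo_err_ctx {
    void *ctx;
    demo_dump_fn dump;
};

/* Original dump: requires ctx to point at a demo_txqsq. */
static int demo_dump_sq(void *ctx, uint32_t *sqn_out)
{
    struct demo_txqsq *sq = ctx;

    assert(sq != NULL && sqn_out != NULL);
    *sqn_out = sq->sqn;
    return 0;
}

/* Wrapper for the timeout path: unwrap the context, then delegate. */
static int demo_timeout_dump(void *ctx, uint32_t *sqn_out)
{
    struct demo_timeout_ctx *to_ctx = ctx;

    assert(to_ctx != NULL);
    return demo_dump_sq(to_ctx->sq, sqn_out);
}

/* Models the generic code invoking whichever callback was registered. */
static int demo_run_dump(const struct demo_err_ctx *err, uint32_t *sqn_out)
{
    assert(err != NULL && err->dump != NULL);
    return err->dump(err->ctx, sqn_out);
}
```

With ctx pointing at a demo_timeout_ctx and dump set to demo_timeout_dump, demo_run_dump() reports the queue number of the embedded queue. The design point is that each callback is paired with the one context type it knows how to interpret, so the original dump function never needs to change.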
Full Example: Simplified PoC
While a real-world exploit would require kernel manipulation or repeated link failures, here’s how the call sequence looked, simplified:
// Broken flow (vulnerable)
mlx5e_tx_reporter_dump_sq(fmsg, (void *)&tx_timeout_ctx); // wrong pointer type
// Fixed flow
mlx5e_tx_reporter_dump_sq_wrap(fmsg, (void *)&tx_timeout_ctx);
// ^ extracts .sq pointer and passes it correctly.
References and Further Reading
- Linux Kernel Mailing List Patch
- CVE-2021-46931 at MITRE
- Official commit in kernel git
- mlx5e Driver code (LXR search)
Summary
- CVE-2021-46931 is a crash bug in the Mellanox mlx5 Linux driver, triggered by a wrong pointer cast in TX timeout error recovery.
- Fixed by properly extracting and passing the expected queue pointer.
- All users of affected kernels should upgrade or apply the backported patch, especially on servers with Mellanox/NVIDIA adapters.
If you're running a data center or cluster on Mellanox hardware, be sure to check your kernel version and apply this fix to avoid crashes!
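For checking a kernel version programmatically, a rough sketch follows. It assumes a POSIX system with uname(), all demo_* names are hypothetical, and the fixed release to compare against is deliberately left to the caller because it must come from your distribution's advisory. Distributions often backport fixes without changing the upstream version number, so treat the comparison as a first hint rather than proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper type: the numeric part of a kernel release string. */
struct demo_kver { int major, minor, patch; };

/*
 * Parse the leading "X.Y" or "X.Y.Z" of a release string such as
 * "5.10.89-generic". A missing patch level is treated as 0.
 * Returns 0 on success, -1 if the string does not start with a version.
 */
static int demo_parse_kver(const char *release, struct demo_kver *out)
{
    int n;

    assert(release != NULL && out != NULL);
    out->patch = 0;
    n = sscanf(release, "%d.%d.%d", &out->major, &out->minor, &out->patch);
    return n >= 2 ? 0 : -1;
}

/* Three-way comparison: negative if a < b, zero if equal, positive if a > b. */
static int demo_kver_cmp(const struct demo_kver *a, const struct demo_kver *b)
{
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch)
        return a->patch < b->patch ? -1 : 1;
    return 0;
}

/* Read the running kernel's release via uname() and parse it. */
static int demo_running_kver(struct demo_kver *out)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return demo_parse_kver(u.release, out);
}
```

A caller would fill one demo_kver with demo_running_kver(), parse the fixed release named in the advisory with demo_parse_kver(), and flag the host for follow-up when demo_kver_cmp() returns a negative value.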
*This post is written in plain language for clarity. Please check the official links for up-to-date patches and advisories.*
Timeline
Published on: 02/27/2024 10:15:07 UTC
Last modified on: 04/10/2024 16:31:14 UTC