The Linux kernel sits at the heart of modern computing, running everything from servers to cloud platforms, so security flaws in its code can have sweeping consequences. Kernel maintainers fixed a severe bug, published as CVE-2024-53120 in December 2024, that can crash the kernel by dereferencing a NULL pointer deep within the networking stack. The bug affects systems that use the Mellanox mlx5 driver's connection-tracking offload feature. In this article, we break down what the bug is, how it is triggered, and why every affected system should be updated.
What is CVE-2024-53120?
CVE-2024-53120 is a vulnerability in the connection tracking (CT) offload code of the Linux kernel's mlx5e network driver, specifically in the error-handling path of the mlx5_tc_ct_entry_add_rule() function. When adding a rule fails while a CT entry is being offloaded to the device, the cleanup code uses a pointer that has not been initialized yet. The result is a NULL pointer dereference and a kernel crash.
Affected Component
- File: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
- Function: mlx5_tc_ct_entry_add_rule()
- Hardware: Mellanox ConnectX-5, ConnectX-6 and newer network adapters (mlx5 driver)
- Kernel versions: all kernels without the upstream fix
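Here is a simplified code fragment of the vulnerable logic:
// Pseudo code (simplified): vulnerable error flow
int mlx5_tc_ct_entry_add_rule(...)
{
    struct mlx5_ct_zone_rule *zone_rule = &entry->zone_rules[nat];
    struct mlx5_flow_attr *attr;
    int err;

    // zone_rule->attr is still NULL here: it is only assigned on success
    attr = mlx5_alloc_flow_attr(...);
    /* ...create the modify header, build the flow spec... */

    // Try to add the rule via the ct_rule_add() callback
    zone_rule->rule = ct_rule_add(attr);
    if (IS_ERR(zone_rule->rule)) {
        err = PTR_ERR(zone_rule->rule);
        goto err_rule;
    }

    zone_rule->attr = attr;
    /* ... more logic ... */
    return 0;

err_rule:
    // BUG: the error path passes zone_rule->attr, which is still NULL
    mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, zone_rule->attr, zone_rule->mh); // CRASH!
    // Should have used the local 'attr', not zone_rule->attr
    /* ...remaining cleanup, kfree(attr)... */
    return err;
}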
If ct_rule_add() fails, the error path hands zone_rule->attr to the cleanup code, but that field is still NULL because it is only assigned after the rule has been added successfully. The cleanup code dereferences it, the kernel hits a NULL pointer dereference, and the host crashes. The fix is to use the local attr pointer instead, which already holds the needed pointer value.
If this code path is hit, administrators will see kernel logs similar to the following:
BUG: kernel NULL pointer dereference, address: 0000000000000110
RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
Call Trace:
mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core]
nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
flow_offload_work_handler+0x142/0x320 [nf_flow_table]
process_one_work+0x16c/0x320
worker_thread+0x28c/0x3a0
kthread+0xb8/0xf0
ret_from_fork+0x2d/0x50
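The crash happens in the nf_flow_table offload worker (flow_offload_work_handler() in the trace), that is, while the kernel is offloading flowtable entries to the NIC. Depending on configuration (for example, the panic_on_oops setting), this either panics the host outright or leaves it in a broken state, typically without working networking, until it is rebooted.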
Exposure is highest on systems that rely on CT offload, such as high-performance computing (HPC) clusters running Open vSwitch with CT acceleration. If your kernel uses the mlx5_core driver with connection-tracking offload and does not include the fix, you're at risk.
How to Exploit (Trigger) This Bug
This is not a remote code execution bug, but anyone who controls the host's network configuration or flow rules can trigger it, given a way to make the hardware rule insertion fail:
1. Configure Connection Tracking Offload: Enable CT offload with a malformed or unsupported configuration that makes ct_rule_add(attr) fail.
Here is a conceptual sketch in pseudocode:
# Not real code! Demonstration only.
# Step 1: Install a CT flow on an mlx5-backed bridge with hardware offload enabled,
#         in a configuration that makes ct_rule_add() fail.
command = "ovs-ofctl add-flow br0 actions=ct(commit),NORMAL"  # or similar via tooling
# Step 2: The nf_flow_table offload worker calls into mlx5, and ct_rule_add() returns an error
# Step 3: The error path of mlx5_tc_ct_entry_add_rule() dereferences zone_rule->attr (NULL)
# Step 4: The kernel oopses and the host crashes
Anyone with a privileged shell or control over the host's networking setup can therefore cause a denial of service, provided they can make the hardware rule insertion fail.
Patch & Mitigation
Upstream Fix:
Linux kernel commit 6ec16bf1d348
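Patched code snippet:
// FIXED: the error path passes the local 'attr', not 'zone_rule->attr'
zone_rule->rule = ct_rule_add(attr);
if (IS_ERR(zone_rule->rule)) {
    err = PTR_ERR(zone_rule->rule);
    goto err_rule;
}
zone_rule->attr = attr;
/* ... */
err_rule:
    mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, attr, zone_rule->mh);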
Takeaway
CVE-2024-53120 shows how a single wrong pointer in an error path can take down production hosts. All users of Mellanox mlx5 hardware with CT offloading should update to a kernel that includes the fix to prevent accidental or malicious kernel crashes. Stay patched, and review error-handling paths as carefully as the success path: here, one pointer reference was enough.
Timeline
Published on: 12/02/2024 14:15:12 UTC
Last modified on: 12/19/2024 09:39:38 UTC