A serious race condition has been found and fixed in the Linux kernel’s i40e network driver. The bug, assigned CVE-2024-53088, can lead to MAC/VLAN filter corruption, memory leaks, and unpredictable network errors, especially on systems running heavy network virtualization workloads. This post breaks down the issue, a simplified view of the kernel code involved, an exploit scenario, and how the official fix works.
## Background: The i40e Driver and MAC/VLAN Filtering
The i40e driver supports Intel’s Ethernet 700 Series adapters (for example the X710 and XL710), which run at up to 40 Gb/s. When SR-IOV Virtual Functions (VFs) are in use, the driver programs MAC and VLAN filters that decide which packets reach which VF.
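As a rough illustration of what such a filter table does, here is a minimal user-space sketch. The struct layout, the `lookup_vf` helper, and the use of -1 as a wildcard VLAN are assumptions made for this example and are not taken from the driver, where the matching is done by the NIC hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical filter entry: one MAC/VLAN pair owned by one VF. */
struct mac_vlan_filter {
    uint8_t mac[6];
    int16_t vlan;   /* -1 means "any VLAN" in this sketch */
    int vf_id;
};

/* Return the VF that should receive a frame with this destination MAC
 * and VLAN, or -1 if no filter matches (the frame is not delivered). */
static int lookup_vf(const struct mac_vlan_filter *tbl, size_t n,
                     const uint8_t mac[6], int16_t vlan)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].mac, mac, 6) != 0)
            continue;
        if (tbl[i].vlan == -1 || tbl[i].vlan == vlan)
            return tbl[i].vf_id;
    }
    return -1;
}
```

A corrupted or leaked entry in a table like this is what turns into misdelivered or dropped traffic for a VF.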
Filters can be modified concurrently: for example, admin scripts can change MAC addresses while automated tools alter VLAN assignments. If this concurrency is not managed properly, it opens the door to race conditions, bugs that only show up when the timing of two operations lines up badly.
## What’s the bug?
The race condition in the i40e driver happens when two threads (possibly on different CPU cores) and the driver’s service task interleave as follows, almost at the same time:
1. Thread T0 calls i40e_add_filter() (typically from i40e_ndo_set_vf_port_vlan()). This allocates a new MAC/VLAN filter object, which then sits on a temporary working list (tmp_add_list) until it is synced to the hardware.
2. Thread T1 calls __i40e_del_filter() (typically from i40e_ndo_set_vf_mac()) and frees (deletes) the same filter, unaware that it’s still being referenced elsewhere.
3. Afterwards, a service task (i40e_service_task() → i40e_sync_vsi_filters()) runs and accesses the same filter, though its memory has been freed. Result: memory and filter list corruption.
On an affected host, this shows up in dmesg as errors like "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX, please set promiscuous on manually for VF XX".
Here’s a simplified version, highlighting the critical path:

```c
// Simplified pseudocode: signatures and list handling are condensed from the real driver.

// Called when setting a port VLAN
int i40e_add_filter(struct i40e_vsi *vsi, const struct i40e_mac_filter *filter) {
    struct i40e_mac_filter *newf;

    newf = kzalloc(sizeof(*newf), GFP_KERNEL);
    if (!newf)
        return -ENOMEM;
    // ... setup filter fields ...
    // Add to tmp_add_list (intermediate state)
    list_add(&newf->list, &vsi->tmp_add_list);
    // ... continues
    return 0;
}

// Called when changing MAC address, possibly by another thread
int __i40e_del_filter(struct i40e_vsi *vsi, const struct i40e_mac_filter *filter) {
    // ... find filter in hash list
    kfree(filter); // DANGER: memory might still be referenced elsewhere!
    return 0;
}

// Periodic service task
void i40e_sync_vsi_filters(struct i40e_vsi *vsi) {
    // Walks filters, including ones just freed!
}
```
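The interleaving can also be replayed deterministically in user space, which makes the ordering problem easier to see than in a real, timing-dependent race. The sketch below is a model under stated assumptions: all names are hypothetical, a `freed` flag stands in for kfree() so the program stays well-defined, and the rule that a never-synced filter is freed immediately on delete is my simplification of the unpatched behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
enum filter_state { FILTER_NEW, FILTER_ACTIVE, FILTER_REMOVE };

struct filter {
    enum filter_state state;
    bool freed; /* stands in for kfree() in this model */
};

/* Step 1 (T0): create a filter that is not yet synced to hardware. */
static void add_filter(struct filter *f)
{
    f->state = FILTER_NEW;
    f->freed = false;
}

/* The pending filter is remembered on a temporary add list. */
static struct filter *put_on_tmp_add_list(struct filter *f)
{
    return f->state == FILTER_NEW ? f : NULL;
}

/* Step 2 (T1): the unpatched delete frees a never-synced filter at once. */
static void del_filter_buggy(struct filter *f)
{
    if (f->state == FILTER_NEW)
        f->freed = true;        /* real driver: kfree() */
    else
        f->state = FILTER_REMOVE;
}

/* Step 3: the sync task returns to its list entry.
 * Returns true when it would touch freed memory. */
static bool sync_hits_freed(struct filter *pending)
{
    assert(pending != NULL);
    if (pending->freed)
        return true;
    pending->state = FILTER_ACTIVE;
    return false;
}

/* Replays steps 1 to 3 in the problematic order. */
static bool replay_race(void)
{
    struct filter f;

    add_filter(&f);
    struct filter *pending = put_on_tmp_add_list(&f);
    del_filter_buggy(&f);
    return sync_hits_freed(pending);
}
```

In this model `replay_race()` returns true: the list entry outlives the filter it points to, which is the use-after-free pattern described in step 3.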
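To provoke the race, change MAC addresses on the VFs in a tight loop while port VLANs are being changed concurrently. The MAC-changing half of such a load can be as simple as the loop below, where PF (the physical function’s interface name) and i (a VF index) are placeholders:

```shell
# PF and i are placeholders: set them to your PF interface and a VF index
while true; do
  ip link set "$PF" vf "$i" mac "$(openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/:$//')"
done
```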
Result: within seconds to minutes, you should see the error listed above. MAC/VLAN assignments may also become inconsistent or nonfunctional.
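If you want to watch for this symptom automatically, a log scanner only needs to recognize the fixed part of the message. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the VF is printed as a decimal number right after "on VF ", as in the message quoted in this post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the VF number named in an ENOSPC filter error line,
 * or -1 if the line does not contain that error. */
static int enospc_vf_from_log_line(const char *line)
{
    static const char marker[] =
        "Error I40E_AQ_RC_ENOSPC adding RX filters on VF ";
    const char *hit;
    int vf;

    assert(line != NULL);
    hit = strstr(line, marker);
    if (hit == NULL)
        return -1;
    /* Parse the decimal VF index that follows the marker. */
    if (sscanf(hit + strlen(marker), "%d", &vf) != 1 || vf < 0)
        return -1;
    return vf;
}
```

Feeding each dmesg line through a check like this gives a per-VF signal that the filter table is in trouble, without depending on the timestamp or device prefix in front of the message.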
#### Source: Intel LKML Patch Discussion
Note: Intel does not provide an open-source “proof-of-concept exploit”, arguing that the concurrent MAC/VLAN configuration described above is a sufficient reproducer.
## The Fix: Introducing an Intermediate Filter State
The race occurs because a filter that is being added (but not yet fully synced) can be deleted and freed. The fix adds an intermediate filter state, I40E_FILTER_NEW_SYNC, and ensures that filters in this state cannot be deleted and freed directly.
- While a new filter is on the temporary add list, its state is I40E_FILTER_NEW_SYNC.
- Deletion does not free filters in this state. Instead, it marks the filter for removal through the full filter management process, so the memory is freed only once nothing references it.
Key idea of the patch, as a simplified sketch:

```c
// Define new filter state
enum i40e_filter_state {
    // ... earlier states
    I40E_FILTER_NEW_SYNC, // <--- New intermediate state
    I40E_FILTER_ACTIVE,
    // ... other states
};

// While the filter is on the temporary add list, give it the new state
filter->state = I40E_FILTER_NEW_SYNC;

// When deleting, check the state:
if (filter->state == I40E_FILTER_NEW_SYNC) {
    // Don't kfree yet! Instead, mark for full removal process.
    filter->state = I40E_FILTER_REMOVE;
}
```
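To make the state rule concrete, here is a small user-space model of the patched behavior. It is a sketch under assumptions, not driver code: the type and function names are hypothetical, and the delete helper only reports whether an immediate free would be allowed instead of freeing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the relevant filter states. */
enum filter_state {
    FILTER_NEW,       /* created, not yet picked up by the sync task */
    FILTER_NEW_SYNC,  /* on the temporary add list, sync in progress */
    FILTER_ACTIVE,
    FILTER_REMOVE     /* scheduled for the full removal process */
};

struct filter {
    enum filter_state state;
};

/* Sync task: taking a new filter onto the temporary list changes its state. */
static void mark_in_sync(struct filter *f)
{
    assert(f != NULL);
    if (f->state == FILTER_NEW)
        f->state = FILTER_NEW_SYNC;
}

/* Delete path: returns true only when freeing right away is safe.
 * A filter in FILTER_NEW_SYNC is still referenced by the sync task,
 * so it is marked for removal instead. */
static bool del_filter(struct filter *f)
{
    assert(f != NULL);
    if (f->state == FILTER_NEW)
        return true;
    f->state = FILTER_REMOVE;
    return false;
}
```

The design point is that the decision to free is driven by state that the sync task updates before it drops its lock, so a concurrent delete can tell "never seen by sync" apart from "currently held by sync".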
Full official patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a17240ac10e
## Who’s affected?
- Any system using Intel 40G cards with the i40e driver (common in datacenters and cloud environments).
- Especially severe in SR-IOV or virtualized setups with frequent MAC/VLAN churn.
## Mitigations and Recommendations
- Patch your kernel. This issue was resolved in Linux mainline as of June 2024 (see the official kernel commit linked below).
- If you cannot update immediately, avoid high rates of concurrent MAC/VLAN changes via VFs.
## References
Official kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a17240ac10e
NVD listing:
https://nvd.nist.gov/vuln/detail/CVE-2024-53088
Kernel mailing list discussion:
https://lore.kernel.org/netdev/20240531103851.1328289-1-ryazanov.s.a@gmail.com/
## Conclusion
CVE-2024-53088 is a significant but non-remote Linux bug that can destabilize high-throughput virtualized hosts, especially in environments juggling network isolation. It is a classic case of concurrency meeting kernel complexity, and it shows why timely patching remains crucial for operational stability and security.
## Timeline
Published on: 11/19/2024 18:15:27 UTC
Last modified on: 11/25/2024 13:38:07 UTC