A critical vulnerability, CVE-2025-21681, affecting the Open vSwitch (OVS) module was recently patched in the Linux kernel. The bug can lock up the system when OVS transmits a packet to a network device that is being unregistered but still reports a carrier. Because Open vSwitch is widely used in cloud and network virtualization environments, the flaw could affect many deployments.

In this article, we'll break down how the bug happened, why it is serious, and how it has been fixed, with code snippets and pointers to the official patch.

The Root Cause: Infinite Loop on Unregistering Netdev

When an OVS user space process or the Linux kernel attempts to transmit a packet, the following function call-chain is triggered:

do_output
 └─> ovs_vport_send
     └─> dev_queue_xmit
         └─> __dev_queue_xmit
             └─> netdev_core_pick_tx
                 └─> skb_tx_hash

Inside skb_tx_hash(), the code tries to pick an appropriate queue for packet transmission based on a hash and the number of TX queues (dev->real_num_tx_queues). The issue occurs here:

- During device unregistration, dev->real_num_tx_queues drops to zero.

- The wrap-around loop in the hashing logic then never terminates: its condition (hash >= qcount) is always true when qcount is zero, because an unsigned hash can never be smaller than zero.

The result is a system hang: the kernel spins in the "while" loop forever, and the only way to recover is a forced reboot. The bug is especially easy to hit with "dummy" network devices, which are commonly used for testing and analysis in OVS environments, because they can report a carrier even while being unregistered.

Here's a simplified version of the problematic logic in skb_tx_hash():

unsigned int qcount = dev->real_num_tx_queues;
unsigned int hash = skb_get_rx_queue(skb);

/* Wrap the recorded queue index into the range [0, qcount) */
while (unlikely(hash >= qcount)) {
    /* Infinite loop if qcount == 0: subtracting zero never changes hash */
    hash -= qcount;
}
return hash;
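To make the failure mode concrete, here is a minimal user-space sketch of that loop. It is not kernel code: the function name pick_queue_bounded and the max_steps cap are assumptions added for illustration, so that the non-terminating case can be observed without hanging the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space model of the wrap-around loop in skb_tx_hash().
 * max_steps is a hypothetical safety cap used only by this demo;
 * the kernel loop has no such cap.
 *
 * Returns true and stores the chosen queue in *queue if the loop ends.
 * Returns false if the loop is still spinning after max_steps iterations.
 */
static bool pick_queue_bounded(uint16_t hash, uint16_t qcount,
                               unsigned int max_steps, uint16_t *queue)
{
    unsigned int steps = 0;

    assert(queue != NULL);

    while (hash >= qcount) {
        if (steps++ == max_steps)
            return false;   /* the kernel would keep spinning here */
        hash -= qcount;     /* makes no progress when qcount == 0 */
    }

    *queue = hash;
    return true;
}
```

With a non-zero qcount the function settles on a queue index after a few subtractions. With qcount set to 0 it gives up at the cap for every possible hash value, which is the situation the kernel cannot escape from.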

An earlier fix tried to close this hole by checking whether the device still has a carrier (shown here in simplified form):

if (!netif_carrier_ok(dev))
    return -ENETDOWN;

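However, some devices keep reporting carrier ON even while they are being unregistered. net/dummy is one example: it sets the carrier on when the device is started but never turns it off when the device is stopped, since it has no real hardware whose link state it could reflect. Relying on carrier status alone is therefore not safe.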

The Final, Proper Fix

The patch adds an additional check for whether the device is *running*, not just if it has carrier. The running state is properly managed by the net core and is always unset during device unregistration.

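Simplified sketch of the fixed check, which guards the call to ovs_vport_send() (the upstream code tests the same two conditions in do_output() and drops the packet instead of returning an error):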

if (!netif_running(dev) || !netif_carrier_ok(dev))
    return -ENETDOWN;

This ensures that packets are only sent to devices that are both running *and* have carrier, the same pair of checks that other core transmit paths such as __dev_direct_xmit() perform before sending a packet.
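The difference between the two checks can be illustrated with a small user-space model. Everything in it is hypothetical: struct model_dev, its fields, and the helper functions are simplified stand-ins chosen for this sketch, not kernel definitions. The model assumes, as described above, that the core always clears the running state on unregistration while the carrier state is left to the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct model_dev {
    bool running;                 /* models netif_running() */
    bool carrier;                 /* models netif_carrier_ok() */
    bool clears_carrier_on_stop;  /* false for a dummy-like driver */
    unsigned int real_num_tx_queues;
};

/* Bring the device up: the core marks it running, the driver sets carrier. */
static void model_open(struct model_dev *dev, unsigned int queues)
{
    assert(queues > 0);
    dev->running = true;
    dev->carrier = true;
    dev->real_num_tx_queues = queues;
}

/* Unregister: the core clears running; carrier is up to the driver. */
static void model_unregister(struct model_dev *dev)
{
    dev->running = false;
    if (dev->clears_carrier_on_stop)
        dev->carrier = false;
    dev->real_num_tx_queues = 0;
}

/* Earlier check: carrier only. */
static bool may_transmit_carrier_only(const struct model_dev *dev)
{
    return dev->carrier;
}

/* Fixed check: the device must be running and have carrier. */
static bool may_transmit_fixed(const struct model_dev *dev)
{
    return dev->running && dev->carrier;
}
```

For a dummy-like device (clears_carrier_on_stop set to false), the carrier-only check still allows transmission after model_unregister() even though the queue count is already zero, while the combined check rejects it.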

You can see the official commit here

- Linux kernel git commit

Example

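# Run as root on a test machine only
ovs-vsctl add-br br0
ip link add dummy0 type dummy
ip link set dummy0 up
ovs-vsctl add-port br0 dummy0
# Start traffic through br0 toward dummy0
ip link del dummy0  # Can trigger the lockup on unpatched kernels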

Result: The kernel hangs and the only way out is to reboot.

- Impact: Denial of Service (kernel lockup).

- Who is affected: Any Linux system running Open vSwitch that uses dummy or similar devices in OVS bridges, especially for debugging or monitoring.

- Apply the fix by updating to a distribution kernel that includes the patch, or by backporting the patch yourself.

- Avoid deleting OVS ports/devices while traffic is flowing (a workaround only, not recommended as a permanent solution).

References

- Linux kernel patch commit
- Open vSwitch patch discussion
- Kernel Bugzilla: CVE-2025-21681

Conclusion

The CVE-2025-21681 vulnerability highlights how subtle logic bugs in kernel networking code can have major availability implications, even when only virtual network interfaces are involved. The fix is simple: check both the carrier *and* the running status of a device before transmitting. It is nonetheless critical for OVS and kernel reliability. Make sure your systems are patched!

Timeline

Published on: 01/31/2025 12:15:29 UTC
Last modified on: 02/21/2025 16:54:12 UTC