CVE-2024-46791 - Deadlock Resolved in Linux Kernel MCP251x CAN Driver

A serious deadlock bug was discovered and fixed in the Linux kernel's MCP251x controller area network (CAN) driver. Tracked as CVE-2024-46791, it could deadlock the driver under certain timing conditions, affecting stability and potentially leading to denial of service on systems using this CAN interface. This article gives a beginner-friendly breakdown, including code walk-throughs, how the problem appeared, and the specifics of the fix. If you’re maintaining systems with CAN hardware (like automotive controllers or industrial gear) running on Linux, this post is for you.

What is the MCP251x Driver?

The MCP251x is a family of CAN controllers popular in embedded and industrial Linux environments. The kernel driver drivers/net/can/spi/mcp251x.c manages communication between system software and an MCP2510, MCP2515, or MCP25625 chip over SPI.

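This driver services the chip from a threaded IRQ (interrupt request) handler, registered with request_threaded_irq(). A threaded handler is allowed to sleep, which it must do here because every register access is an SPI transfer, and that is also why it can take a mutex. The driver therefore has to carefully synchronize "waking up" the hardware with processing interrupts, so things don't go out of order.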

What Was the Vulnerability?

This is a deadlock bug in the way the MCP251x driver manages its locks and IRQs during device initialization.

Imagine two CPUs (or CPU cores) at work:

- CPU0: Running mcp251x_open() (which opens/initializes the CAN device)
- CPU1: Running the interrupt handler mcp251x_can_ist() once the device raises an interrupt

Here’s the sequence:

CPU0: mcp251x_open()
  -> mutex_lock(&priv->mcp_lock)        // Grab lock
  -> request_threaded_irq()             // Register interrupt handler
       (Meanwhile, before we continue...)

CPU1: <interrupt fires>
  -> mcp251x_can_ist()                  // Interrupt handler runs
     -> mutex_lock(&priv->mcp_lock)     // Tries to grab same lock (blocks)
     
CPU0 continues:
  -> mcp251x_hw_wake()
     -> disable_irq()                   // Waits for handler to finish (blocks)

Problem:

- CPU0 holds mcp_lock and, inside disable_irq(), waits for the interrupt handler to finish.
- CPU1's handler can't get the same lock (mcp_lock), so it's stuck waiting for CPU0 to release it.

Result: Both are waiting on each other. This is a classic deadlock. The system (or at least CAN device access) hangs.
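The circular wait described above can be checked mechanically with a tiny wait-for graph. The following is a minimal userspace sketch, not kernel code: the actor names (OPEN_PATH, IRQ_HANDLER) and the helper functions are hypothetical and exist only for this illustration. Each actor records who it is blocked on, and a deadlock is a chain of waits that leads back to where it started.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actor IDs for this model; these are not kernel identifiers. */
enum { OPEN_PATH = 0, IRQ_HANDLER = 1, NUM_ACTORS = 2 };
#define WAITS_ON_NOBODY (-1)

/*
 * waits_on[a] is the actor that `a` is blocked on, or WAITS_ON_NOBODY.
 * Each actor waits on at most one other actor, so following the chain
 * from any start either ends or returns to the start within n steps.
 */
bool has_deadlock(const int *waits_on, size_t n)
{
    for (size_t start = 0; start < n; start++) {
        int cur = waits_on[start];
        for (size_t steps = 0; steps < n && cur != WAITS_ON_NOBODY; steps++) {
            if ((size_t)cur == start)
                return true;            /* chain led back to the start: cycle */
            cur = waits_on[cur];
        }
    }
    return false;
}

/* Wait-for edges when the open path calls disable_irq() under mcp_lock. */
bool deadlock_with_disable_irq(void)
{
    int waits_on[NUM_ACTORS];

    waits_on[OPEN_PATH]   = IRQ_HANDLER;  /* disable_irq() waits for the handler */
    waits_on[IRQ_HANDLER] = OPEN_PATH;    /* handler waits for the mcp_lock owner */
    return has_deadlock(waits_on, NUM_ACTORS);
}

/* Same scenario when the open path does not wait for the handler. */
bool deadlock_without_waiting(void)
{
    int waits_on[NUM_ACTORS];

    waits_on[OPEN_PATH]   = WAITS_ON_NOBODY; /* open path keeps running */
    waits_on[IRQ_HANDLER] = OPEN_PATH;       /* handler still waits for the lock */
    return has_deadlock(waits_on, NUM_ACTORS);
}
```

In this model the first scenario contains a cycle and the second does not: removing a single wait edge is enough to break the deadlock, even though the handler still has to wait for the lock.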

Before the Fix

// mcp251x_open()
mutex_lock(&priv->mcp_lock);
...
request_threaded_irq(..., mcp251x_can_ist, ...);
...
mcp251x_hw_wake(priv);

// mcp251x_hw_wake()
disable_irq(priv->irq); // Will wait until interrupt handler is done

Interrupt Handler

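static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
{
    struct mcp251x_priv *priv = dev_id;   // driver state passed at registration

    mutex_lock(&priv->mcp_lock);          // blocks while mcp251x_open() holds the lock
    ... // handle CAN event
    mutex_unlock(&priv->mcp_lock);
    return IRQ_HANDLED;
}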

How Was It Fixed?

The fix replaces disable_irq() with disable_irq_nosync() in mcp251x_hw_wake().

- disable_irq() disables the interrupt and then waits for any handler that is already running to finish before it returns. That is dangerous if the caller holds a lock the handler wants.
- disable_irq_nosync() disables the interrupt and returns immediately, whether or not a handler is still running. In this code path that is safe, because the handler does all of its work while holding mcp_lock.

Main fix in mcp251x_hw_wake

- disable_irq(priv->irq);
+ disable_irq_nosync(priv->irq); // No waiting; just disables future IRQs

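Why is this safe?
The IRQ handler does everything while holding the mutex. If it has already started when mcp251x_hw_wake() runs, it simply sits blocked on mcp_lock until the open path releases the lock, so it can never touch the device in the middle of the wake-up sequence. Waiting for it to finish adds no protection, and dropping the wait removes the deadlock while preserving correctness.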
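To see the difference between the two calls in action, here is a minimal userspace model using POSIX threads. It is only a sketch under stated assumptions: struct model, handler_thread(), wait_for_handler(), and open_path_deadlocks() are hypothetical names, a pthread mutex stands in for priv->mcp_lock, and a 200 ms timeout replaces the unbounded wait inside disable_irq() so the demonstration cannot hang.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct model {
    pthread_mutex_t mcp_lock;    /* stands in for priv->mcp_lock */
    pthread_mutex_t state_lock;  /* protects handler_done */
    pthread_cond_t  state_cond;
    bool handler_done;
};

/* Stand-in for mcp251x_can_ist(): all work happens under mcp_lock. */
static void *handler_thread(void *arg)
{
    struct model *m = arg;

    pthread_mutex_lock(&m->mcp_lock);
    /* ... the real handler would service the CAN controller here ... */
    pthread_mutex_unlock(&m->mcp_lock);

    pthread_mutex_lock(&m->state_lock);
    m->handler_done = true;
    pthread_cond_signal(&m->state_cond);
    pthread_mutex_unlock(&m->state_lock);
    return NULL;
}

/* Models the waiting half of disable_irq(), but gives up after 200 ms.
 * Returns true if the handler finished within that time. */
static bool wait_for_handler(struct model *m)
{
    struct timespec deadline;
    bool done;

    timespec_get(&deadline, TIME_UTC);
    deadline.tv_nsec += 200L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&m->state_lock);
    while (!m->handler_done) {
        if (pthread_cond_timedwait(&m->state_cond, &m->state_lock, &deadline) != 0)
            break;  /* timed out: the handler is still blocked */
    }
    done = m->handler_done;
    pthread_mutex_unlock(&m->state_lock);
    return done;
}

/* Returns true if the modelled open path would have deadlocked. */
bool open_path_deadlocks(bool wait_for_running_handler)
{
    struct model m = { .handler_done = false };
    pthread_t irq;
    bool stuck = false;

    pthread_mutex_init(&m.mcp_lock, NULL);
    pthread_mutex_init(&m.state_lock, NULL);
    pthread_cond_init(&m.state_cond, NULL);

    pthread_mutex_lock(&m.mcp_lock);                 /* open path takes the lock */
    pthread_create(&irq, NULL, handler_thread, &m);  /* IRQ fires right after registration */

    if (wait_for_running_handler)                    /* disable_irq() behaviour */
        stuck = !wait_for_handler(&m);
    /* otherwise: disable_irq_nosync() behaviour, carry on without waiting */

    pthread_mutex_unlock(&m.mcp_lock);               /* end of the open path */
    pthread_join(irq, NULL);                         /* handler can now finish */

    pthread_cond_destroy(&m.state_cond);
    pthread_mutex_destroy(&m.state_lock);
    pthread_mutex_destroy(&m.mcp_lock);
    return stuck;
}
```

Calling open_path_deadlocks(true) reports a deadlock, because the handler cannot finish while the caller still holds the lock. Calling open_path_deadlocks(false) does not: the handler stays blocked only until the open path unlocks, then runs to completion. Depending on the toolchain, the program may need to be built with -pthread.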

Can this be triggered by an attacker?

Yes, with hardware access or a malicious device, an attacker could nudge the system to trigger interrupts at precisely the right (wrong!) time—causing CAN device initialization hangs.

It's more of a denial-of-service against local device users, especially in embedded/automotive, than remote code execution.

Demo/Trigger Sketch (Pseudo-C)

// Not directly exploitable from userspace; the race needs an interrupt at exactly the wrong moment.
// CAN devices are network interfaces, so "open" means bringing the interface up (can0 is an assumed name):
thread 1: "ip link set can0 up"   // kernel calls mcp251x_open(), which locks the mutex
   // meanwhile, hardware asserts IRQ line
thread 2: threaded interrupt handler fires, tries to lock the same mutex

-> Both threads block; the CAN interface stays stuck until reboot.

System Impact

- Affected: All kernels with MCP251x CAN support up to ~Apr/May 2024.
- Fixed: By commit 2c38ce274e98e6...

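Takeaways for driver developers:

- Prefer disable_irq_nosync() when the caller holds a lock that the interrupt handler also takes, provided the handler does all of its work under that lock.
- Read kernel API docs carefully. The difference between disable_irq() and disable_irq_nosync() is subtle but crucial.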

References and Further Reading

- Linux kernel commit fixing CVE-2024-46791
- CAN: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open (lore.kernel.org)
- Kernel documentation: disable_irq() vs disable_irq_nosync()

Conclusion

CVE-2024-46791 is a great lesson in the pitfalls of interrupt handling and lock management in kernel space. The fix is now mainlined; update your kernels if you use MCP251x hardware! While the bug’s exploitation potential is mostly local, it's a stark reminder: concurrency bugs in kernel code are always lurking, and timing is everything.

If you work with Linux drivers or embedded systems, check your code for similar locking patterns—and always respect the delicate dance between IRQs and mutexes!

Timeline

Published on: 09/18/2024 08:15:06 UTC
Last modified on: 09/20/2024 18:21:19 UTC