CVE-2021-46995 - Linux Kernel CAN Driver Vulnerability and How It Was Fixed

The Linux kernel is the core of millions of computers, servers, and devices, so security vulnerabilities in it can have severe consequences. In 2021, the kernel developers fixed a bug in the CAN (Controller Area Network) subsystem, specifically in the mcp251xfd driver for Microchip's MCP251xFD CAN FD controllers. This post breaks down what happened, how the bug could be triggered, and how the developers fixed it. You'll find code snippets, explanations, and links for further reading.

- Vulnerable Function: mcp251xfd_probe()
- Type: Error pointer dereference (leads to kernel crash/Oops)

What Went Wrong?

When the CAN driver for the Microchip MCP251xFD chip was converted to use the kernel's dev_err_probe() helper, a subtle bug sneaked in: the conversion accidentally removed a return statement. Previously, if getting the chip's clock failed, the probe function exited immediately. After the change, it kept running with an error pointer in place of a valid clock, which led to a kernel Oops as soon as the code tried to use that clock.
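To see why the lost return is easy to overlook, it helps to recall the contract of dev_err_probe(): it reports the failure and hands the same error code back, so a call site can log and bail out in one statement. The userspace sketch below mimics that contract under simplifying assumptions. The names mock_dev_err_probe(), mock_get_clock(), and MOCK_EPROBE_DEFER are hypothetical stand-ins, not the kernel's implementation, and the mock simply skips logging for deferred probes rather than reproducing the kernel's exact logging behavior.

```c
#include <stdarg.h>
#include <stdio.h>

/* Userspace stand-in for the kernel-internal EPROBE_DEFER code (assumption:
 * only the "deferred probe is not a hard failure" idea matters here). */
#define MOCK_EPROBE_DEFER 517

/* Counts messages that this mock reports at error level. */
static int mock_error_logs;

/*
 * Hypothetical mock of dev_err_probe(): report the failure, then return the
 * same error code so the caller can write `return mock_dev_err_probe(...)`.
 * Deferred probes are not reported as errors in this mock.
 */
static int mock_dev_err_probe(int err, const char *fmt, ...)
{
    if (err != -MOCK_EPROBE_DEFER) {
        va_list args;

        va_start(args, fmt);
        fprintf(stderr, "error %d: ", err);
        vfprintf(stderr, fmt, args);
        va_end(args);
        mock_error_logs++;
    }
    return err;
}

/* Call site in the one-statement form: log and leave together. */
static int mock_get_clock(int clk_err)
{
    if (clk_err < 0)
        return mock_dev_err_probe(clk_err, "Failed to get clock\n");
    return 0;
}
```

Because the helper's return value is the only thing that stops the caller, calling it without `return` in front compiles cleanly and silently changes control flow, which is the slip described above.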

Relevant Commit and Patch

You can read the original patch here:
Linux kernel Git commit: can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe

Here's a simplified version of the code that caused the problem:

struct clk *clk;
unsigned long clk_rate;

clk = devm_clk_get(&spi->dev, NULL);
if (IS_ERR(clk)) {
    dev_err_probe(&spi->dev, PTR_ERR(clk), "Failed to get clock\n");
    // The return was accidentally removed!!!
}
// This call is unsafe if clk is an error pointer.
clk_rate = clk_get_rate(clk);

Notice what happens if devm_clk_get() fails. In the old code, the function returned right away. Now, without the return, clk_get_rate(clk) still runs, and because clk holds an error pointer rather than a valid clock, the kernel crashes.
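It may help to see why an error pointer is so dangerous to pass along. The kernel encodes a negative errno value directly into the pointer, which yields a non-NULL address at the very top of the address space that was never meant to be dereferenced. The following is a minimal userspace sketch of that encoding. The sketch_* names are hypothetical, and the functions only approximate what the kernel's ERR_PTR(), PTR_ERR(), and IS_ERR() helpers do.

```c
#include <errno.h>
#include <stdint.h>

/* Largest errno value the encoding reserves; mirrors the kernel's MAX_ERRNO. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value near the top of the address space. */
static void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno from an encoded pointer. */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer value falls inside the reserved error range. */
static int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

Note that an encoded error is not NULL, so a plain NULL check in the callee does not catch it. Only an explicit IS_ERR()-style test at the call site, followed by an early return, keeps the value from being used as a real pointer.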

How Could Someone Exploit This?

If a local user can get the driver to probe in an environment where devm_clk_get() fails (for example, with a device tree that omits the clock the driver expects), the kernel will dereference an error pointer. The result is a kernel Oops, or a full system crash where the kernel is configured to panic on Oops. It's not a privilege escalation, but it is an easy way to cause denial of service.

Note: In almost all setups, this requires root (admin) privileges to load kernel modules or change device trees.

Here’s an example illustrating the problem (simplified pseudocode):

// Assume "spi->dev" is set up for a non-existent hardware clock
clk = devm_clk_get(&spi->dev, NULL);   // returns an error pointer
// No return after error!
clk_rate = clk_get_rate(clk);          // Oops! Triggers a crash

If you set up an environment where the device tree omits the clock definition, the above will trigger the bug on load.

The fix is as simple as restoring the missing return statement:

clk = devm_clk_get(&spi->dev, NULL);
if (IS_ERR(clk)) {
    // The missing return is now back; dev_err_probe() returns the error code it was given.
    return dev_err_probe(&spi->dev, PTR_ERR(clk), "Failed to get clock\n");
}
clk_rate = clk_get_rate(clk);

Now, the probe function will exit if the clock can’t be acquired, preventing the dereference of an error pointer.
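The difference between the two control flows can be sketched in userspace. This is an illustration under stated assumptions, not the driver's code: struct mock_clk, the mock_* functions, and the 40 MHz rate are all hypothetical, failure is modeled with NULL plus an errno instead of an error pointer, and the rate getter counts misuse where the real kernel would fault.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct clk; only the rate matters here. */
struct mock_clk {
    unsigned long rate;
};

/* Number of times the rate getter was handed an unusable clock. */
static int mock_bad_clk_uses;

/* Number of failures the probe functions noticed and logged. */
static int mock_logged_failures;

/* Mock clock lookup: returns NULL on failure and reports the errno via *err. */
static struct mock_clk *mock_clk_get(int have_clock, int *err)
{
    static struct mock_clk osc = { .rate = 40000000UL }; /* made-up rate */

    *err = have_clock ? 0 : -ENOENT;
    return have_clock ? &osc : NULL;
}

/* Mock rate getter: records misuse instead of crashing. */
static unsigned long mock_clk_get_rate(const struct mock_clk *clk)
{
    if (!clk) {
        mock_bad_clk_uses++;
        return 0;
    }
    return clk->rate;
}

/* Flow with the return missing: the error is logged but not acted on. */
static int mock_probe_buggy(int have_clock)
{
    int err;
    struct mock_clk *clk = mock_clk_get(have_clock, &err);

    if (err)
        mock_logged_failures++; /* log only, then fall through */
    (void)mock_clk_get_rate(clk);
    return 0;
}

/* Flow with the return restored: leave before touching the clock. */
static int mock_probe_fixed(int have_clock)
{
    int err;
    struct mock_clk *clk = mock_clk_get(have_clock, &err);

    if (err) {
        mock_logged_failures++;
        return err;
    }
    (void)mock_clk_get_rate(clk);
    return 0;
}
```

Calling mock_probe_fixed(0) returns -ENOENT and never reaches the rate getter, while mock_probe_buggy(0) logs the failure and then uses the bad clock anyway. In the real driver that second path is where the Oops happens.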

You can see the commit diff here:
https://git.kernel.org/linus/fcabe9427ac7f764eba5e6b838051ac32b1ecec

References

- CVE-2021-46995 on NVD
- Linux Kernel Patch
- CAN Bus on Wikipedia
- Linux Kernel Documentation: Error Pointer Functions

Conclusion

CVE-2021-46995 is a great reminder that even experienced kernel developers sometimes slip up. A single missing return after an error check left a driver that could crash the whole system when its clock lookup failed. Luckily, the bug was identified and fixed quickly. If you maintain embedded Linux devices or work with CAN-equipped hardware, make sure your kernel is up to date!

Got questions or want to see more bug deep-dives? Let me know in the comments!

Timeline

Published on: 02/28/2024 09:15:37 UTC
Last modified on: 12/06/2024 14:55:32 UTC