CVE-2021-46925 - Fixing a Kernel Panic in Linux SMC Sockets

CVE-2021-46925 covers a serious kernel bug in the Linux SMC (Shared Memory Communications) networking subsystem, fixed upstream in December 2021. The bug caused random kernel panics (crashes) because of a race between two code paths: one releasing an SMC socket, and another (a send-completion handler) still trying to use it. Let’s break down what happened, why it was dangerous, and how it was fixed.

What’s the SMC Subsystem?

SMC stands for Shared Memory Communications, a Linux socket protocol that enables high-throughput, low-latency communication over RDMA (SMC-R, typically on RoCE network adapters). It is mainly used in cloud and data center environments.

What Was the Bug?

The issue was a race condition between smc_cdc_tx_handler() and smc_release().

smc_cdc_tx_handler() runs in tasklet (softirq) context when the send of a CDC (Connection Data Control) message completes. smc_release() is called when the socket is being closed and destroyed.

The handler checks whether the pending message still belongs to a connection (i.e., whether conn is non-NULL). But between that check and locking the socket, the release path may already have freed the socket, leaving the pointer dangling. When the handler then touches the freed memory, you get a classic use-after-free, which in this case showed up as a kernel panic.

Here’s what such a kernel panic might look like in dmesg:

[  457.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[  457.696048] #PF: supervisor write access in kernel mode
[  457.696728] #PF: error_code(0x0002) - not-present page
...
[  457.711446] Call Trace:
[  457.711992]  smc_cdc_tx_handler+0x41/0xc0
[  457.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
[  457.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
[  457.713489]  tasklet_action_common.isra.17+0x66/0x140
[  457.714083]  __do_softirq+0x123/0x2f4

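Here’s a simplified view of the race condition:

// Context 1: CDC send completion (tasklet / softirq)
smc_cdc_tx_handler() {
    if (!conn)              // slot already dismissed? nothing to do
        return;
    smc = container_of(conn, struct smc_sock, conn);
    // <-- window: smc_release() can run to completion right here
    bh_lock_sock(&smc->sk); // CRASH! smc_sock may already be freed
    // ...
}

// Context 2: socket close
smc_release() {
    smc_cdc_tx_dismiss_slots(); // clears conn in pending CDC slots, too late for a handler already past its check
    // ...
    sock_put(&smc->sk);     // last reference: frees smc_sock
}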

Why Was This Dangerous?

Because SMC is used in high-throughput deployments (Alibaba Cloud, for example), a race that ends in a kernel panic is severe: heavy socket churn or high concurrency makes the race more likely to hit, and each hit takes the whole host down.

How Was It Fixed?

The fix adds a counter of in-flight CDC messages to the SMC connection object, acting as a reference count. As long as a CDC message’s completion handler might still access the socket, the smc_connection (and therefore the socket’s memory, which embeds it) will not be freed.

- Increase the count every time a CDC message is posted.
- Don’t release (free) the smc_connection until all in-flight CDC messages complete (successfully or with an error).
- When tearing down an RDMA device, wait for all QPs (Queue Pairs) to be destroyed before destroying the Completion Queues. Otherwise, not all counts would drop to zero.
- On a link reset, make sure a CQE (Completion Queue Entry, a completion event from the hardware) is generated for every outstanding message, so the count can actually reach zero. To do this, the fix uses smc_ib_modify_qp_error() instead of smc_ib_modify_qp_reset(): moving a QP to the error state flushes outstanding work requests with error completions, whereas moving it to the reset state discards them without generating any.

Simplified Pseudocode (Fix Style)

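// Increase the count when a CDC message is posted
atomic_inc(&conn->cdc_pending_count);

// On CDC completion (CQE received), decrease it and wake any waiter
if (atomic_dec_and_test(&conn->cdc_pending_count))
    wake_up(&conn->cdc_pending_wq);  // cdc_pending_wq: illustrative wait queue

// When releasing the smc_connection, sleep until no CDC message is in flight
wait_event(conn->cdc_pending_wq, atomic_read(&conn->cdc_pending_count) == 0);

// Only now is it safe to free the connection and the smc_sock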

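Here’s a conceptual snippet showing the idea of the fix (not the actual production patch; field names are illustrative):

struct smc_connection {
    atomic_t cdc_pending_count;        // CDC messages posted but not yet completed
    wait_queue_head_t cdc_pending_wq;  // release path sleeps here
    // ...
};

void post_cdc_message(struct smc_connection *conn) {
    atomic_inc(&conn->cdc_pending_count);  // take a reference before posting
    // post work, etc.
}

void cdc_complete(struct smc_connection *conn) {
    // CDC CQE received (success or error): drop the reference
    if (atomic_dec_and_test(&conn->cdc_pending_count))
        wake_up(&conn->cdc_pending_wq);    // nothing left in flight
    // Do NOT free conn here: it is embedded in smc_sock, and the count
    // also drops to zero during normal operation between messages
}

void smc_release(struct smc_sock *smc) {
    // Sleep until every in-flight CDC message has completed
    wait_event(smc->conn.cdc_pending_wq,
               atomic_read(&smc->conn.cdc_pending_count) == 0);
    // No completion handler can touch smc anymore; safe to free
    kfree(smc);  // conceptually; real code drops the last socket reference
}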

Exploit Scenario

While this bug is mainly a reliability/availability risk (DoS via kernel panic), a malicious process able to open and close many SMC sockets rapidly could keep retrying the race, crashing the server each time it wins.

This could be abused for denial of service, especially in public cloud or multi-tenant systems where untrusted users can create SMC sockets.


Conclusion

CVE-2021-46925 was a serious concurrency bug in the Linux kernel’s SMC handling. The refcounting added to the SMC connection code closes this race: a socket can no longer be freed while a CDC completion might still touch it. The bug also highlights how intricate and dangerous kernel-level race conditions can be.

If you run a kernel with SMC/RDMA enabled, make sure you’re patched!


Timeline

Published on: 02/27/2024 10:15:07 UTC
Last modified on: 04/10/2024 15:22:29 UTC