A recently resolved vulnerability in the Linux kernel's net/smc module, CVE-2021-46925, caused a kernel panic through a race on the smc_sock. The crash occurred when smc_cdc_tx_handler() tried to access the smc_sock after smc_release() had already freed it.

- Linux Kernel Mailing List - Patch
- GitHub Merge Commit

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |    smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

The fix adds a refcount on the smc_connection for inflight CDC messages, meaning messages that have been posted to the QP (queue pair) but have not yet received their related CQE (completion queue entry). The smc_connection is not released until all inflight CDC messages have either completed or failed.
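The refcounting idea can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel code: `struct conn`, `conn_hold()`, `conn_put()`, `post_cdc_msg()`, `complete_cdc_msg()`, and `refcount_demo()` are hypothetical names, and the kernel uses its own reference-counting primitives rather than C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection; not the kernel's struct. */
struct conn {
    atomic_int refcnt;   /* one reference per owner or inflight message */
    bool freed;          /* set when the last reference is dropped */
};

static void conn_init(struct conn *c)
{
    atomic_init(&c->refcnt, 1);  /* initial reference held by the socket */
    c->freed = false;
}

static void conn_hold(struct conn *c)
{
    atomic_fetch_add(&c->refcnt, 1);
}

/* Drop one reference; mark the connection freed if it was the last one. */
static void conn_put(struct conn *c)
{
    if (atomic_fetch_sub(&c->refcnt, 1) == 1)
        c->freed = true;  /* real code would release the memory here */
}

/* Posting a message pins the connection until its completion arrives. */
static void post_cdc_msg(struct conn *c)
{
    conn_hold(c);
}

/* Completion handler: the inflight reference keeps the connection alive. */
static void complete_cdc_msg(struct conn *c)
{
    assert(!c->freed);
    conn_put(c);
}

/* Returns 0 if the connection is freed only after the completion runs. */
static int refcount_demo(void)
{
    struct conn c;

    conn_init(&c);
    post_cdc_msg(&c);       /* refcnt is now 2 */
    conn_put(&c);           /* release path drops its reference: 1 */
    if (c.freed)
        return 1;           /* would be the use-after-free scenario */
    complete_cdc_msg(&c);   /* last reference dropped: 0 */
    return c.freed ? 0 : 2;
}
```

In this model the release path can run before the completion handler without freeing the object, which is the ordering shown in the race diagram.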

However, holding a refcount for inflight CDC messages created a new problem. When a link is about to be destroyed, smcr_link_clear() resets the QP, which removes all pending CQEs related to that QP from the CQ. Those completions then never arrive, so the references they hold are never dropped. To ensure that every CQE always comes back and the refcount on the smc_connection can reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error().
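The difference between the two teardown paths can be illustrated with a toy model. This is only a sketch of the behavior described above, not real InfiniBand verbs code: `struct model_qp`, `struct model_conn`, `qp_to_reset()`, `qp_to_error()`, and `teardown_leftover_refs()` are hypothetical names, and the model assumes one reference per pending work request.

```c
#include <assert.h>

/* Hypothetical model of a queue pair with pending work requests. */
struct model_qp {
    int pending;   /* work requests posted but not yet completed */
};

/* Hypothetical model of a connection pinned by inflight messages. */
struct model_conn {
    int refcnt;
};

/* Completion handler: every completion, success or error, drops a reference. */
static void on_completion(struct model_conn *c)
{
    assert(c->refcnt > 0);
    c->refcnt--;
}

/* Modeled reset: pending completions are discarded, so no handler runs. */
static void qp_to_reset(struct model_qp *qp, struct model_conn *c)
{
    (void)c;
    qp->pending = 0;
}

/* Modeled error state: each pending request is flushed and its handler runs. */
static void qp_to_error(struct model_qp *qp, struct model_conn *c)
{
    while (qp->pending > 0) {
        qp->pending--;
        on_completion(c);
    }
}

/* Post n messages, tear the QP down, and return the leftover refcount. */
static int teardown_leftover_refs(int n, int use_error_state)
{
    struct model_qp qp = { .pending = 0 };
    struct model_conn c = { .refcnt = 0 };

    for (int i = 0; i < n; i++) {
        qp.pending++;
        c.refcnt++;    /* each inflight message holds one reference */
    }
    if (use_error_state)
        qp_to_error(&qp, &c);
    else
        qp_to_reset(&qp, &c);
    return c.refcnt;
}
```

With three inflight messages, the modeled error path leaves no references behind, while the modeled reset path leaves all three, so the connection could never be released.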

Additionally, the timeout in smc_wr_tx_wait_no_pending_sends() was removed so that the function waits until all pending sends have finished. Returning early on a timeout could lead to a use-after-free when the remaining CQEs are handled later.

Finally, the IB device removal routine must wait for all QPs on the device to be destroyed before destroying the CQs on that device. This ensures that the refcount on the smc_connection can reach 0, allowing the smc_sock to be properly released.

To sum up, CVE-2021-46925 has been fixed in the Linux kernel, resolving the kernel panic caused by the race on the smc_sock. Users and developers are encouraged to update to a kernel version that incorporates this fix.

Timeline

Published on: 02/27/2024 10:15:07 UTC
Last modified on: 04/10/2024 15:22:29 UTC