A vulnerability has been discovered and resolved in the Linux kernel's nvmet-rdma subsystem (the RDMA transport of the NVMe over Fabrics target). It can lead to a NULL pointer dereference when a SEND operation completes with an error, such as when the transport retry counter is exceeded.

Description of Vulnerability

When traffic is running on a link and the connection is taken down on the peer, the target receives a "transport retry counter exceeded" completion error. This error triggers nvmet_rdma_error_comp, which tries to obtain the queue from cq_context. Since the introduction of the shared CQ mechanism, however, cq_context is no longer valid for this purpose. The queue should instead be obtained from wc->qp, the same way the other completion handlers obtain it.
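
The difference between the two lookups can be illustrated with a minimal userspace sketch. The structs below are hypothetical, heavily simplified stand-ins for the kernel's ib_cq, ib_qp and ib_wc (only the fields named above are modeled), and the assumption that a shared CQ carries a NULL cq_context is made for illustration, consistent with the NULL pointer dereference described here. The helper names queue_from_cq and queue_from_wc are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct ib_cq, struct ib_qp and struct ib_wc. Only the fields
 * discussed in the text are modeled; these are NOT the real definitions. */
struct ib_cq { void *cq_context; };
struct ib_qp { void *qp_context; };
struct ib_wc { struct ib_qp *qp; int status; };

/* Hypothetical per-connection queue; the real nvmet_rdma_queue is far larger. */
struct nvmet_rdma_queue { int idx; };

/* Lookup that assumes the CQ belongs to exactly one queue.
 * With a shared CQ this assumption no longer holds. */
static struct nvmet_rdma_queue *queue_from_cq(const struct ib_cq *cq)
{
	return cq->cq_context;
}

/* Lookup through the QP that produced the completion. A QP is owned by
 * exactly one queue, even when its CQ is shared between queues. */
static struct nvmet_rdma_queue *queue_from_wc(const struct ib_wc *wc)
{
	assert(wc->qp != NULL);
	return wc->qp->qp_context;
}
```

For example, with a shared CQ whose cq_context is NULL and a completion coming from qp2, queue_from_cq() returns NULL, while queue_from_wc() returns the queue that owns qp2.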

Here is example kernel log output when the issue occurs:

[ 905.786331] nvmet_rdma: SEND for CQE 0x00000000e3337f90 failed with status transport retry counter exceeded (12).
[ 905.832048] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
...
[ 905.872135] RIP: 0010:nvmet_rdma_error_comp+0x5/0x1b [nvmet_rdma]
...
[ 905.961855] Call Trace:
[ 906.012778] __ib_process_cq+0x89/0x170 [ib_core]
[ 906.017509] ib_cq_poll_work+0x26/0x80 [ib_core]
...

Exploit Details

Triggering the bug causes a kernel NULL pointer dereference (an oops), which can crash the system or cause other unintended behavior. There are currently no known attacks that specifically leverage this vulnerability, but applying the fix is necessary to maintain the stability and security of affected systems.

Solution

The vulnerability is resolved by no longer relying on cq_context in the error completion path. Instead, the queue is obtained from the work completion's QP (wc->qp), as in the other completion handlers. This ensures that the correct queue is accessed and avoids the NULL pointer dereference.
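
A minimal userspace sketch of a SEND completion handler that uses the QP-based lookup follows. Everything in it is hypothetical and simplified: the structs are stand-ins rather than the kernel definitions, send_done_model and error_comp_model are illustrative names, the error_seen flag stands in for whatever teardown the real error path starts, and any non-success status is treated as an error, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel definitions. */
struct ib_cq { void *cq_context; };
struct ib_qp { void *qp_context; };
struct ib_wc { struct ib_qp *qp; int status; };

/* Hypothetical status codes; 12 mirrors the numeric status printed in the log. */
enum { WC_SUCCESS = 0, WC_RETRY_EXC_ERR = 12 };

struct nvmet_rdma_queue {
	int idx;
	bool error_seen; /* hypothetical flag standing in for "start teardown" */
};

/* Stand-in for the error path: it dereferences the queue unconditionally,
 * which is where a NULL queue would fault in the kernel. */
static void error_comp_model(struct nvmet_rdma_queue *queue)
{
	queue->error_seen = true;
}

/* Hypothetical SEND completion handler using the fixed lookup. */
static void send_done_model(struct ib_cq *cq, struct ib_wc *wc)
{
	/* Derive the queue from the QP that completed, never from the CQ. */
	struct nvmet_rdma_queue *queue = wc->qp->qp_context;

	(void)cq; /* a shared CQ does not identify which queue completed */
	assert(queue != NULL);
	if (wc->status != WC_SUCCESS)
		error_comp_model(queue);
}
```

Because the CQ argument is ignored for the lookup, the handler behaves the same whether the CQ is private to one queue or shared among several.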

Users running affected Linux kernel versions should update to a kernel that includes this fix.

Timeline

Published on: 02/28/2024 09:15:37 UTC
Last modified on: 02/28/2024 14:06:45 UTC