CVE-2024-50062 - How a Linux Kernel Null Pointer Bug Was Fixed in RDMA RTRS

CVE-2024-50062 is a vulnerability that was found, reported, and patched in the Linux kernel's RDMA (Remote Direct Memory Access) subsystem, specifically in the server side of RTRS (RDMA Transport). The flaw could let an attacker crash the kernel, a likely denial-of-service scenario, by triggering a NULL pointer dereference during RTRS path establishment. In simple terms, the kernel tried to use a pointer that hadn't been set yet, which could bring the system down.

This post walks through what happened in plain language, details the fix, shows simplified code snippets, and explains the potential risks, so it should be easy to follow even if you aren't an RDMA expert.

What is RDMA and RTRS?

RDMA lets one computer read or write another computer's memory directly over the network. The network adapter moves the data, so the CPU and operating system stay off the data path. This lowers latency and CPU load, which is why RDMA is widely used in data centers.

RTRS (RDMA Transport) is a Linux kernel transport library for building reliable RDMA connections between clients and servers. RTRS is used for high-performance cluster file systems, distributed storage, etc.

About the Vulnerability

When a client wants to establish a path (communication channel) to a server via RTRS, it sets up several connections. Once all the connections are up, the client and server exchange a special message called info_req. At this moment, all connections must be up and the RTRS path must be fully connected.

But there was a flaw:
The server code didn't check that every connection had actually been established and that the path was in the right state. If these conditions weren't met, the code could try to use a pointer that was still NULL (not pointing to anything), causing a kernel oops (a crash from a NULL pointer dereference).

- Name: CVE-2024-50062
- Affected Component: Linux Kernel rtrs-srv (drivers/infiniband/ulp/rtrs/rtrs-srv.c)
- Type: Null pointer dereference (DoS risk)
- Fixed in: Linux kernel after commit fbeb6fa83008

Distributions ship the fix on their own schedules, so check your vendor's advisory for the kernel version that includes it. Exploiting the bug requires network access to an RTRS server endpoint, which is uncommon on public-facing servers but normal inside clusters and HPC environments.

Code Walkthrough

The issue centers on a missing check that all the required connections are actually set up. Here's a simplified illustration of what used to happen:

// Old, buggy code: (stripped down)
static int process_info_req(struct rtrs_srv_path *srv_path, ...)
{
    // ... omitted ...
    // No check if all connections present or path state is CONNECTED
    srv_con = srv_path->srv_con[con_idx]; // Can be NULL!
    // ... use srv_con ...
}

If srv_con was still NULL (not initialized due to an incomplete connection), the kernel would crash when trying to use it. Typical kernel logs for administrators would show a stack trace ending with a null pointer deref in RTRS code.

Why doesn't the compiler catch this? C has no built-in null safety. Whether a pointer is NULL depends on run-time state the compiler can't see, so kernel code has to check explicitly.

How Was CVE-2024-50062 Fixed?

The fix? Check that all connections are correctly set up and that the path is in the right state before proceeding. If not, gracefully bail out instead of accessing a NULL pointer.

Here's the key part of the patch (simplified):

// New, safe code:
static int process_info_req(struct rtrs_srv_path *srv_path, ...)
{
    // ... omitted ...
    if (srv_path->state != RTRS_PATH_CONNECTED) {
        pr_warn("Path not connected (state: %d)\n", srv_path->state);
        return -EINVAL;
    }
    for (i = 0; i < srv_path->con_num; i++) {
        if (!srv_path->srv_con[i]) {
            pr_warn("Connection %d not established\n", i);
            return -EINVAL;
        }
    }
    srv_con = srv_path->srv_con[con_idx];
    // ... safe usage ...
}

Now, if any srv_con is missing or the path hasn't reached CONNECTED state, the function aborts early.
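To make the guard concrete outside the kernel, here is a minimal userspace C sketch of the same idea. Everything in it (the `model_` types, `MODEL_MAX_CON`, and `model_get_con`) is hypothetical and invented for illustration. It is not the kernel's code or API, only a model of "validate the state and every slot before dereferencing".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model. These are NOT the kernel's definitions. */
enum model_path_state { MODEL_PATH_CONNECTING, MODEL_PATH_CONNECTED };

struct model_con { int cid; };

#define MODEL_MAX_CON 4

struct model_path {
    enum model_path_state state;
    size_t con_num;                        /* connections the path expects */
    struct model_con *con[MODEL_MAX_CON];  /* NULL until established */
};

/*
 * Stores the requested connection in *out and returns 0 only when the path
 * is CONNECTED and every expected slot is populated. Otherwise returns
 * -EINVAL and leaves *out untouched.
 */
int model_get_con(const struct model_path *path, size_t con_idx,
                  struct model_con **out)
{
    assert(path != NULL && out != NULL);

    if (path->state != MODEL_PATH_CONNECTED)
        return -EINVAL;                    /* path not ready yet */
    if (path->con_num > MODEL_MAX_CON || con_idx >= path->con_num)
        return -EINVAL;                    /* index out of range */
    for (size_t i = 0; i < path->con_num; i++) {
        if (path->con[i] == NULL)
            return -EINVAL;                /* a connection is still missing */
    }
    *out = path->con[con_idx];
    return 0;
}
```

With this shape, a caller that asks for a connection on a half-built path gets `-EINVAL` back rather than a NULL pointer it might go on to dereference.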

Exploit Details

Under normal conditions, this vulnerability cannot be exploited without network access to the RTRS infrastructure. But if an attacker, or simply a buggy client, can initiate RTRS path setup and send crafted or ill-timed messages before all connections are established, the kernel can be forced to dereference a NULL pointer, crashing the node (DoS).

Requirements:

- Ability to control RTRS connection setup
- Knowledge/timing to force incomplete connection state

Impact:

- Denial of service for applications using RDMA over RTRS

No code execution or privilege escalation is possible from this alone. But causing production node downtime is serious in HPC/data center operations!

How to Tell If You're At Risk

- You are using RDMA RTRS (drivers/infiniband/ulp/rtrs)
- You are running a kernel built from a tree that predates the patch, or a distribution kernel without the CVE fix

Mitigation: Update your kernel, or restrict RTRS access to trusted clients only.
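As a rough first check for the first condition, the sketch below looks for the RTRS server module in text formatted like `/proc/modules`. It assumes the module shows up there as `rtrs_server`, which is an assumption to verify on your own system; the helper names `module_listed` and `read_proc_modules` are made up for this example. It also cannot see RTRS built directly into the kernel, so a negative result does not prove you are unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if modules_text (in /proc/modules format: one module per
 * line, module name first, followed by a space) lists a module called name.
 */
bool module_listed(const char *modules_text, const char *name)
{
    assert(modules_text != NULL && name != NULL);

    size_t len = strlen(name);
    const char *line = modules_text;

    while (line && *line) {
        /* Require an exact name match: the name must end at a space. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return true;
        line = strchr(line, '\n');
        if (line)
            line++;                        /* step past the newline */
    }
    return false;
}

/*
 * Reads up to cap - 1 bytes of /proc/modules into buf and NUL-terminates it.
 * Returns false if the file cannot be opened. Longer lists are truncated.
 */
bool read_proc_modules(char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    FILE *f = fopen("/proc/modules", "r");
    if (!f)
        return false;
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return true;
}
```

A caller would fill a buffer with `read_proc_modules` and then pass it to `module_listed(buf, "rtrs_server")`. If the module is listed, compare the running kernel version against your vendor's advisory.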

Reference Links

- Patch (Git commit): RTRS: Avoid null pointer deref during path establishment
- CVE Record: CVE-2024-50062 entry on MITRE *(may take a few days to appear)*
- Linux Kernel Source: drivers/infiniband/ulp/rtrs/rtrs-srv.c

Conclusion

CVE-2024-50062 is a good reminder that kernel code must always check its pointers, especially in complex networking and storage subsystems. Thanks to good code review, this null pointer deref bug is fixed with proper sanity checks—preventing system crashes that could affect uptime and reliability.

If you run kernel RDMA servers, patch now! For everyone else, see it as another lesson in robust programming—and why open-source kernel fixes really matter.

Timeline

Published on: 10/21/2024 20:15:18 UTC
Last modified on: 10/23/2024 21:48:57 UTC