When it comes to the Linux kernel, even the smallest bug can lead to system-wide consequences. One such issue, now tracked as CVE-2024-42246, concerned the handling of permission errors (EPERM) in the kernel’s SunRPC stack during TCP socket setup. This post offers an exclusive deep-dive into the bug, its risks, how it was patched, and what it means for users and sysadmins.

What is CVE-2024-42246?

CVE-2024-42246 refers to a vulnerability in the way SunRPC handles kernel-level TCP connections when BPF (Berkeley Packet Filter) programs are in play. To put it simply, the Linux kernel could enter an endless loop and eventually freeze if certain errors weren’t handled properly while connecting sockets.

When an application or service uses SunRPC over TCP, the kernel sets up a socket.

- If a BPF program is hooked into the kernel_connect() process (common in modern observability or security tools), it can sometimes cause the connection to fail with an -EPERM ("Operation not permitted") error.
- The kernel’s function xs_tcp_setup_socket() wasn’t ready for this error code. Instead of stopping, it kept retrying the operation. This led to an infinite loop, filled the system logs, and in worst cases, could freeze the whole system.

Neil Brown, a kernel developer, summarized the concern with passing the error through unchanged and suggested a remedy:

> “This will propagate -EPERM up into other layers which might not be ready to handle it. It might be safer to map EPERM to an error we would be more likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.”

- Resource exhaustion: The endless loop consumed CPU, flooded the logs, and could sap system resources.
- Potential DoS: Malicious or misconfigured BPF programs could trigger the bug and bring down a server.
- Unexpected error propagation: Higher-level network software might not know how to handle -EPERM at this stage, which could cause silent failures or unpredictable behavior.
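
To make the error-propagation concern concrete, here is a minimal user-space sketch in C. It is not kernel code: the helper name `is_expected_connect_error` and the exact list of errno values are assumptions chosen for illustration, and real clients differ in what they handle.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: does this errno belong to the set of failures
 * that a network client typically plans for after a connect attempt?
 */
static bool is_expected_connect_error(int err)
{
    switch (err) {
    case ECONNREFUSED:   /* peer actively refused the connection */
    case ECONNRESET:     /* connection reset by peer */
    case ENETDOWN:       /* local network is down */
    case ENETUNREACH:    /* no route to the network */
    case EHOSTUNREACH:   /* no route to the host */
    case ETIMEDOUT:      /* connection attempt timed out */
        return true;
    default:
        return false;    /* EPERM and anything else lands here */
    }
}
```

With a table like this, EPERM ends up in the unexpected branch, while ECONNREFUSED is treated like any other refused connection. That is the motivation for remapping the error before it travels upward.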

Kernel function at the heart of this bug

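/* Simplified illustration of the fix, not the verbatim kernel source. */
static int xs_tcp_setup_socket(struct sock_xprt *transport)
{
    int ret;

    // ... setup code ...

    // Establish the TCP connection (flags is a placeholder argument)
    ret = kernel_connect(sock, addr, addrlen, flags);

    // Before the fix, -EPERM was not handled here, so the connect was
    // retried indefinitely. The fix remaps it to a network-native error.
    if (ret == -EPERM)
        ret = -ECONNREFUSED;

    // ... cleanup and return ...
    return ret;
}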

Before the fix:
When kernel_connect() returned -EPERM, xs_tcp_setup_socket() just retried, leading to infinite looping.

After the fix:
Now, when -EPERM is encountered, it is remapped to -ECONNREFUSED, an error code that network layers and applications are designed to handle.
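
The before/after behavior can be modeled in user space. The following is a toy sketch under stated assumptions: `handle_connect_status`, `attempts_until_stop`, the `next_step` enum, and the `remap_eperm` switch are hypothetical names, and the real kernel function handles more status codes than shown here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of one connect attempt in this model. */
enum next_step { STEP_DONE, STEP_RETRY, STEP_FAIL };

/*
 * Toy model of the status dispatch after a connect attempt.
 * With remap_eperm == false, -EPERM matches no explicit case and
 * falls into the generic "try again" path, as described for
 * unpatched kernels.
 */
static enum next_step handle_connect_status(int status, bool remap_eperm)
{
    if (remap_eperm && status == -EPERM)
        status = -ECONNREFUSED;   /* the remap introduced by the fix */

    switch (status) {
    case 0:
    case -EINPROGRESS:
        return STEP_DONE;         /* connected, or connect in flight */
    case -ECONNREFUSED:
    case -ENETUNREACH:
        return STEP_FAIL;         /* surfaced to the caller as an error */
    default:
        return STEP_RETRY;        /* unrecognized status: try again */
    }
}

/* Count attempts until the model stops retrying, capped at max_attempts. */
static int attempts_until_stop(int status, bool remap_eperm, int max_attempts)
{
    int n = 0;

    while (n < max_attempts) {
        n++;
        if (handle_connect_status(status, remap_eperm) != STEP_RETRY)
            break;
    }
    return n;
}
```

Calling `attempts_until_stop(-EPERM, false, 1000)` exhausts the cap, whereas passing `true` stops after a single attempt. The cap exists only so the model terminates; the point of the bug is that the unpatched path had no such stop condition for this error.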

Exploit Scenario and Impact

While this bug is more of a denial-of-service (DoS) hazard than a privilege escalation or code execution vulnerability, it’s still serious:

- Any user or attacker able to load and attach a BPF program that makes kernel_connect() return -EPERM (normally a privileged operation) could freeze a server that uses SunRPC services such as NFS.
- Because the kernel gets stuck in an infinite loop, the machine may become unresponsive and require a hard reset.

Suppose a system has a malicious or misconfigured BPF program that blocks TCP connections:

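#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("cgroup/connect4")
int block_connect(struct bpf_sock_addr *ctx)
{
    /* Returning 0 rejects the connect(); the caller then sees -EPERM.
     * Returning 1 would allow the connection. */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";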

If this BPF program is loaded and attached, and an application then tries to mount an NFS share (which invokes SunRPC over TCP), an unpatched kernel enters the retry loop.

The issue was fixed in the following kernel commit:

- linux commit d99eecb899a ("net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket")

1. Update your kernel to a version that includes this fix. Confirm against your distribution's advisory for CVE-2024-42246 instead of relying on the build date alone.
2. Limit BPF program loading to trusted users only. Use bpf-lsm and other security mechanisms.
3. Monitor system logs for repeated connection errors or flooding. This could indicate either this issue or similar resource-exhaustion problems.
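
For the log-monitoring step, a small counter is often enough to notice flooding. This is a sketch rather than a monitoring tool: `flood_detector` and `flood_detector_feed` are hypothetical names, and the needle string mentioned in the usage note is an assumption about the log text that you should check against your own logs.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical state for spotting a flood of similar log lines. */
struct flood_detector {
    const char *needle;   /* substring that marks a suspicious line */
    unsigned threshold;   /* matches needed before raising the alarm */
    unsigned matches;     /* matching lines seen so far */
};

/*
 * Feed one log line to the detector.
 * Returns true once the number of matching lines has reached the
 * threshold, and keeps returning true afterwards.
 */
static bool flood_detector_feed(struct flood_detector *d, const char *line)
{
    if (strstr(line, d->needle) != NULL)
        d->matches++;
    return d->matches >= d->threshold;
}
```

A caller would read the kernel log line by line (for example with `fgets`) and feed each line in, using a needle such as "connect returned unhandled error" and a threshold that suits the environment. Resetting `matches` on a timer would turn the total into a rate.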

Conclusion

CVE-2024-42246 is a classic case of how subtle error handling can have outsized effects in a complex system like the Linux kernel. By handling -EPERM correctly and mapping it to a more expected network error, the kernel now avoids dangerous infinite loops and freezes.

Sysadmins should promptly update their kernels and restrict who can run BPF programs, especially on servers that provide RPC-based services.
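
One knob related to restricting BPF is the `kernel.unprivileged_bpf_disabled` sysctl. The sketch below reads it from procfs and interprets the value. The enum and function names are hypothetical, and attaching cgroup programs like the one in this post generally requires elevated privileges regardless of this setting, so treat it as one layer among several.

```c
#include <stdio.h>

/* Interpretation of kernel.unprivileged_bpf_disabled values. */
enum unpriv_bpf {
    UNPRIV_BPF_UNKNOWN    = -1,  /* missing, unreadable, or unrecognized */
    UNPRIV_BPF_ENABLED    = 0,   /* unprivileged bpf() calls allowed */
    UNPRIV_BPF_LOCKED_OFF = 1,   /* disabled, cannot be re-enabled */
    UNPRIV_BPF_OFF        = 2    /* disabled, an admin may change it */
};

/* Parse the textual sysctl value, e.g. "2\n". */
static enum unpriv_bpf parse_unpriv_bpf(const char *text)
{
    int value;

    if (text == NULL || sscanf(text, "%d", &value) != 1)
        return UNPRIV_BPF_UNKNOWN;

    switch (value) {
    case 0:  return UNPRIV_BPF_ENABLED;
    case 1:  return UNPRIV_BPF_LOCKED_OFF;
    case 2:  return UNPRIV_BPF_OFF;
    default: return UNPRIV_BPF_UNKNOWN;
    }
}

/* Read and interpret the sysctl from the given procfs path. */
static enum unpriv_bpf read_unpriv_bpf(const char *path)
{
    char buf[16];
    enum unpriv_bpf state = UNPRIV_BPF_UNKNOWN;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return state;
    if (fgets(buf, sizeof buf, f) != NULL)
        state = parse_unpriv_bpf(buf);
    fclose(f);
    return state;
}
```

Calling `read_unpriv_bpf("/proc/sys/kernel/unprivileged_bpf_disabled")` returns the current state on systems that expose the sysctl. A result of `UNPRIV_BPF_UNKNOWN` means the file was missing, unreadable, or held an unrecognized value.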

Timeline

Published on: 08/07/2024 16:15:47 UTC
Last modified on: 08/08/2024 14:52:35 UTC