The Linux kernel is the heart of most servers, desktops, and embedded systems today, powering billions of devices. A bug deep in its core networking code, such as CVE-2024-36938, can have serious security and availability consequences. This article breaks down the vulnerability: what caused it, how it can be triggered, and how it was fixed, using plain American English and relevant code snippets and links.

What is CVE-2024-36938?

CVE-2024-36938 is a bug in the Linux kernel’s BPF (Berkeley Packet Filter) infrastructure, specifically in the way socket message (skmsg) buffers are handled internally. The flaw is a data race that can end in a NULL pointer dereference, which could lead to a kernel panic (crash) and denial of service (DoS). It was first reported by syzbot, an automated tool that finds Linux kernel bugs.

In detail:
The problem is in a function called sk_psock_skb_ingress_enqueue(), which handles incoming socket messages for BPF programs attached to sockets. Under some conditions, because of improper locking, the pointer it tries to use can be set to NULL by another thread in the middle of its operation.

What Actually Happened? (The Root Cause)

When two threads operate on the same kernel socket's data, they must synchronize to avoid data races. In this case, one thread could clear (i.e., set to NULL) a callback pointer while another thread was about to call it. KCSAN, the kernel's data-race detector, reported the race like this (abbreviated):

BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
write to ... by task ... sk_psock_drop ...
read to ... by task ... sk_psock_skb_ingress_enqueue ...
value changed: 0xffffffff83d7feb0 -> 0x0000000000000000

Here’s a simplified diagram:

Thread A:                    Thread B:
sk_psock_drop()              sk_psock_skb_ingress_enqueue()
   |                            |
   |                            |--- checks callback != NULL
   |--- sets callback = NULL    |
   |                            |--- calls callback (now NULL)

Here’s an annotated snippet from the kernel, showing where the danger was:

/* net/core/skmsg.c */

/* BAD: No lock, can race! */
if (psock->saved_data_ready)
        psock->saved_data_ready(sk);

If another thread clears saved_data_ready between the check and the call, the kernel jumps through a NULL pointer.
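To make the interleaving concrete, here is a minimal user-space sketch that replays the unlocked sequence one step at a time. It is not kernel code: `fake_psock`, `fake_drop()`, and `replay_unlocked_path()` are hypothetical stand-ins invented for illustration, and the "other thread" is simulated by an ordinary function call so the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
typedef void (*data_ready_fn)(int *calls);

struct fake_psock {
    data_ready_fn saved_data_ready;
};

/* Stand-in callback: just counts how often it was invoked. */
void fake_data_ready(int *calls) { (*calls)++; }

/* Plays the role of the teardown path: clears the callback pointer. */
void fake_drop(struct fake_psock *psock) { psock->saved_data_ready = NULL; }

/*
 * Replays "if (ptr) ptr(...)" as two separate loads. When clear_between
 * is nonzero, the teardown runs between the check and the call.
 * Returns 1 if the call step would have gone through a NULL pointer,
 * 0 if the callback ran or the check skipped it safely.
 */
int replay_unlocked_path(struct fake_psock *psock, int clear_between, int *calls)
{
    assert(psock != NULL && calls != NULL);

    data_ready_fn seen_by_check = psock->saved_data_ready; /* load 1: the if */
    if (!seen_by_check)
        return 0;                  /* check failed, nothing is called */

    if (clear_between)
        fake_drop(psock);          /* interleaved write from "Thread A" */

    data_ready_fn seen_by_call = psock->saved_data_ready;  /* load 2: the call */
    if (!seen_by_call)
        return 1;                  /* the real code would dereference NULL here */

    seen_by_call(calls);
    return 0;
}
```

Calling `replay_unlocked_path()` with `clear_between` set to 0 runs the callback normally; with it set to 1, the function reports that the second load observed NULL even though the check passed.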

The fix was to take a read lock on sk_callback_lock while checking and invoking the callback, pairing with the write lock that the teardown side holds when it clears the pointer:

/* net/core/skmsg.c - Patched Version */
read_lock_bh(&sk->sk_callback_lock);
if (psock->saved_data_ready)
        psock->saved_data_ready(sk);
read_unlock_bh(&sk->sk_callback_lock);

With the read lock held across both the check and the call, the writer can no longer clear the pointer in between.
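The same reader/writer pairing can be sketched in user space with a POSIX rwlock. This is only an analogy under stated assumptions: `fake_sock`, `locked_data_ready()`, `locked_clear()`, and `run_locked_demo()` are hypothetical names, and `pthread_rwlock_t` does not model the bottom-half disabling that the kernel's `read_lock_bh()` also performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*data_ready_fn)(int *calls);

struct fake_sock {
    pthread_rwlock_t callback_lock;  /* stand-in for sk->sk_callback_lock */
    data_ready_fn saved_data_ready;  /* stand-in for psock->saved_data_ready */
};

void count_call(int *calls) { (*calls)++; }

/* Reader side: check and call under the read lock. Returns 1 if it ran. */
int locked_data_ready(struct fake_sock *sk, int *calls)
{
    int ran = 0;
    int rc = pthread_rwlock_rdlock(&sk->callback_lock);
    assert(rc == 0);
    if (sk->saved_data_ready) {
        sk->saved_data_ready(calls);
        ran = 1;
    }
    pthread_rwlock_unlock(&sk->callback_lock);
    return ran;
}

/* Writer side: clear the pointer under the write lock. */
void locked_clear(struct fake_sock *sk)
{
    int rc = pthread_rwlock_wrlock(&sk->callback_lock);
    assert(rc == 0);
    sk->saved_data_ready = NULL;
    pthread_rwlock_unlock(&sk->callback_lock);
}

struct reader_args { struct fake_sock *sk; int iterations; int calls; };

void *reader_thread(void *arg)
{
    struct reader_args *a = arg;
    for (int i = 0; i < a->iterations; i++)
        locked_data_ready(a->sk, &a->calls);
    return NULL;
}

void *writer_thread(void *arg)
{
    locked_clear(arg);
    return NULL;
}

/* One reader races one writer. Returns the reader's call count, or -1. */
int run_locked_demo(int iterations)
{
    struct fake_sock sk;
    struct reader_args args = { &sk, iterations, 0 };
    pthread_t reader, writer;

    sk.saved_data_ready = count_call;
    if (pthread_rwlock_init(&sk.callback_lock, NULL) != 0)
        return -1;
    if (pthread_create(&reader, NULL, reader_thread, &args) != 0)
        return -1;
    if (pthread_create(&writer, NULL, writer_thread, &sk) != 0) {
        pthread_join(reader, NULL);
        return -1;
    }
    pthread_join(reader, NULL);
    pthread_join(writer, NULL);

    /* The writer has finished, so the callback must be gone by now. */
    int ran_after = locked_data_ready(&sk, &args.calls);
    pthread_rwlock_destroy(&sk.callback_lock);
    return ran_after ? -1 : args.calls;
}
```

However the two threads interleave, the reader either runs the callback or skips it; it never calls through a pointer that was cleared after the check. The exact call count returned by `run_locked_demo()` varies from run to run.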

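Who Is Affected?

- Linux users running a kernel that does not yet include the fix (the fix is listed for Linux 6.8+; check your distribution's advisory for the exact patched release)

- Any system using BPF and socket maps, especially those using advanced networking (e.g., containers, servers, proxies)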

Exploitability

- Denial-of-Service (crash): Winning the race produces a kernel oops or panic, causing the device to reboot or freeze. Because the window is narrow, an attacker may need many attempts.
- Local exploit: An attacker with the ability to create and close BPF socket maps can trigger the race. Remote triggering is plausible only if untrusted parties can drive the relevant socket operations on an affected socket.

Exploit Proof-of-Concept

While a full exploit requires careful thread timing, the following pseudo-code demonstrates the concept:

// Thread 1: tears down the psock (simulated here by close), which
// clears saved_data_ready in sk_psock_drop()
close(sock);

// Thread 2: drives traffic at the same time so that the kernel runs
// sk_psock_skb_ingress_enqueue() for the same socket
send(sock, packet, ...);

Fuzzers such as syzkaller, which powers syzbot, can trigger races like this automatically.
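The "careful thread timing" part is usually handled by releasing both threads from a barrier at the same instant and repeating the attempt many times. The sketch below shows only that timing harness, on a plain AF_UNIX socketpair with no BPF program or sockmap attached, so it does not reach the vulnerable code path. The names `race_ctx` and `race_once()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct race_ctx {
    pthread_barrier_t start;  /* releases both threads together */
    int closer_fd;            /* end of the pair that gets closed */
    int sender_fd;            /* peer end that sends one byte */
    ssize_t send_result;      /* 1 on success, -1 if the close won */
    int send_errno;           /* errno from send(), 0 on success */
};

void *closer_thread(void *arg)
{
    struct race_ctx *ctx = arg;
    pthread_barrier_wait(&ctx->start);
    close(ctx->closer_fd);
    return NULL;
}

void *sender_thread(void *arg)
{
    struct race_ctx *ctx = arg;
    const char byte = 'x';
    pthread_barrier_wait(&ctx->start);
    /* MSG_NOSIGNAL: get an error return instead of SIGPIPE if close wins. */
    ctx->send_result = send(ctx->sender_fd, &byte, 1, MSG_NOSIGNAL);
    ctx->send_errno = ctx->send_result < 0 ? errno : 0;
    return NULL;
}

/* Runs one close-versus-send attempt. Returns 0 if both threads ran. */
int race_once(struct race_ctx *ctx)
{
    int sv[2];
    pthread_t closer, sender;

    assert(ctx != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    ctx->closer_fd = sv[0];
    ctx->sender_fd = sv[1];

    if (pthread_barrier_init(&ctx->start, NULL, 2) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pthread_create(&closer, NULL, closer_thread, ctx) != 0) {
        pthread_barrier_destroy(&ctx->start);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pthread_create(&sender, NULL, sender_thread, ctx) != 0) {
        /* Take the missing thread's place so the closer is not stuck. */
        pthread_barrier_wait(&ctx->start);
        pthread_join(closer, NULL);
        pthread_barrier_destroy(&ctx->start);
        close(sv[1]);
        return -1;
    }
    pthread_join(closer, NULL);
    pthread_join(sender, NULL);
    pthread_barrier_destroy(&ctx->start);
    close(sv[1]);   /* sv[0] was already closed by the closer thread */
    return 0;
}
```

Calling `race_once()` in a loop gives a different interleaving on each attempt: sometimes the send succeeds, sometimes it fails because the close landed first. A real reproducer would additionally need a socket attached to a BPF sockmap, which this sketch deliberately leaves out.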

Patch, Mitigation, and Recommendations

Patch Commit:
The fix has been merged into the kernel mainline. You can review it here:
- Mainline Linux Commit: bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
- syzbot Bug Reference

Upgrade ASAP:
Kernel maintainers and distributions should apply the fix, especially in environments with dynamic BPF socket maps or untrusted users.

Manual Mitigation:

- Limit creation and manipulation of BPF socket maps to trusted code

- Use kernel hardening features (see Kernel Self Protection Project)

Conclusion

CVE-2024-36938 is a classic example of how concurrency and pointer management issues in the kernel can lead to crashes and vulnerabilities. Thanks to automated fuzzers like syzbot and quick work by the kernel community, it was found and fixed quickly.

If you’re running an affected system, upgrade your kernel as soon as possible. For advanced users, review your kernel logs for crashes matching the bug’s signature. Stay safe!
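As a rough aid for that log review, the following sketch counts log lines that mention the function named in the KCSAN report quoted earlier. Matching on that one symbol is an assumption: the exact text of an oops or KCSAN report varies by kernel version and configuration, so a hit is a reason to look closer, not proof of this bug. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line mentions the function from the quoted report. */
int line_matches_signature(const char *line)
{
    assert(line != NULL);
    return strstr(line, "sk_psock_skb_ingress_enqueue") != NULL;
}

/*
 * Counts matching lines in an already opened log stream.
 * Lines longer than the buffer are read in pieces, so a very long
 * line may be counted more than once or have a match split across pieces.
 */
long count_matching_lines(FILE *log)
{
    char line[1024];
    long hits = 0;

    assert(log != NULL);
    while (fgets(line, sizeof line, log) != NULL) {
        if (line_matches_signature(line))
            hits++;
    }
    return hits;
}
```

A small main() that calls `count_matching_lines(stdin)` could be fed the output of `dmesg` or `journalctl -k`.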

References

1. Linux Kernel Patch: bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
2. syzbot Bug Tracker
3. Kernel Concurrency Sanitizer (KCSAN)
4. Linux BPF Documentation
5. Kernel Self Protection Project


If you want to stay current with new Linux vulnerabilities, follow projects like syzkaller and subscribe to oss-security.

Timeline

Published on: 05/30/2024 16:15:16 UTC
Last modified on: 07/29/2024 07:15:03 UTC