CVE-2024-50138 - Linux Kernel BPF RingBuffer Race – Preemption Bug and Resolution

---

Introduction

Recently, a significant bug was patched in the Linux kernel that affected the behavior of the BPF (Berkeley Packet Filter) ring buffer. Catalogued as CVE-2024-50138, this vulnerability involved the use of an ordinary spinlock_t in a context where it could cause kernel warnings and instability, especially on real-time (RT) Linux kernels. Here’s an explainer in simple language, so you understand the risk, the fix, and the journey from problem to patch.

Background

- BPF ring buffers are important kernel data structures used for fast communication between the kernel and user space, especially for tracing and monitoring.
- These ring buffers are accessed concurrently, so code that reserves space in them takes a lock to avoid data races.

The function under focus, __bpf_ringbuf_reserve, is meant to safely reserve space in the ring buffer, and it used a spinlock_t for protection.
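To make the idea of "reserving space under a lock" concrete, here is a minimal userspace sketch. It is not the kernel's implementation: the struct, the function names, the 64-byte size, and the offset-based return value are all assumptions made for illustration. The lock is a busy-wait lock built on C11 atomics, so taking it never puts the caller to sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; it does not mirror the kernel's struct bpf_ringbuf. */
#define TOY_RB_SIZE 64u /* power of two, so positions can be masked */

struct toy_ringbuf {
    atomic_flag lock;      /* busy-wait lock standing in for the kernel lock */
    uint64_t producer_pos; /* total bytes ever reserved */
    uint64_t consumer_pos; /* total bytes ever consumed */
    uint8_t data[TOY_RB_SIZE];
};

static void toy_lock(struct toy_ringbuf *rb)
{
    /* Spin until the flag was previously clear; this never sleeps. */
    while (atomic_flag_test_and_set_explicit(&rb->lock, memory_order_acquire))
        ;
}

static void toy_unlock(struct toy_ringbuf *rb)
{
    atomic_flag_clear_explicit(&rb->lock, memory_order_release);
}

/*
 * Reserve len bytes. Returns the starting offset inside data[], or -1 if
 * the request is empty or does not fit. A reservation may wrap past the
 * end of data[]; handling that wrap is left to the caller in this sketch.
 */
static int64_t toy_reserve(struct toy_ringbuf *rb, uint32_t len)
{
    int64_t offset = -1;

    assert(rb != NULL);
    toy_lock(rb);
    if (len > 0 && rb->producer_pos - rb->consumer_pos + len <= TOY_RB_SIZE) {
        offset = (int64_t)(rb->producer_pos & (TOY_RB_SIZE - 1));
        rb->producer_pos += len;
    }
    toy_unlock(rb);
    return offset;
}
```

Initialize the structure with `struct toy_ringbuf rb = { .lock = ATOMIC_FLAG_INIT };` before the first call. The point of the sketch is only that the position check and the position update happen as one step while the lock is held.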

The Root Problem

This function can be called from a tracepoint context, which runs with *preemption* disabled (the task cannot be scheduled out). On a real-time (RT) kernel, an ordinary spinlock_t is a sleeping lock: acquiring it may block and put the task to sleep. Sleeping is not allowed while preemption is disabled, so taking the lock in that context is invalid. The kernel reports it with the dreaded "sleep in atomic" warning, a tell-tale sign that the kernel is about to get into trouble.

Here's what it looks like when things go wrong:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556208, name: test_progs
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffd33a5c88ea44>] migrate_enable+0xc0/0x39c
CPU: 7 PID: 556208 Comm: test_progs Tainted: G
...
__bpf_ringbuf_reserve+0xc4/0x254
...

Technical Dive: Why Did This Happen?

- spinlock_t: On real-time (RT) Linux kernels, the regular spinlock_t can be transformed into a “sleeping” lock, to ensure the kernel remains preemptible — but only where that’s safe.
- raw_spinlock_t: For contexts where sleeping is forbidden (like with preemption disabled), developers must use raw_spinlock_t, which is guaranteed not to sleep and is safe to use in these atomic sections.

If you use spinlock_t with preemption disabled (like in a tracepoint) on an RT kernel, the kernel might try to sleep while acquiring the lock. That's Not Okay™.
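The rule above can be modeled in a few lines of userspace C. This is a toy, not kernel code: every name below (toy_cpu, toy_lock_acquire, and so on) is hypothetical, and the check is a simplified stand-in for the kernel's debug check that produced the warning shown earlier. It only captures the decision: a lock that may sleep must not be taken while the preempt count is non-zero, whereas a raw lock is fine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names mimic kernel concepts but are NOT kernel APIs. */
enum toy_lock_kind {
    TOY_SLEEPING_LOCK, /* models spinlock_t on an RT kernel */
    TOY_RAW_LOCK       /* models raw_spinlock_t: always spins */
};

struct toy_cpu {
    int preempt_count; /* > 0 means preemption is disabled */
    int warnings;      /* count of "sleep in atomic" style reports */
};

static void toy_preempt_disable(struct toy_cpu *cpu)
{
    cpu->preempt_count++;
}

static void toy_preempt_enable(struct toy_cpu *cpu)
{
    assert(cpu->preempt_count > 0); /* catch an unbalanced enable */
    cpu->preempt_count--;
}

/*
 * Model of taking a lock. Returns true when the acquisition is legal in
 * the current context. Returns false, and records a warning, when a lock
 * that may sleep is taken while preemption is disabled.
 */
static bool toy_lock_acquire(struct toy_cpu *cpu, enum toy_lock_kind kind)
{
    if (kind == TOY_SLEEPING_LOCK && cpu->preempt_count > 0) {
        cpu->warnings++;
        return false;
    }
    return true;
}
```

In this model, calling toy_preempt_disable() and then acquiring a TOY_SLEEPING_LOCK records a warning, while acquiring a TOY_RAW_LOCK in the same state does not. That is the whole difference the fix relies on.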

Exploit Potential

While this bug did not allow a typical privilege escalation or arbitrary code execution, it could be exploited to crash the kernel or force it into an unstable state. An attacker with the ability to run BPF programs (typically root, but sometimes less privileged users on misconfigured systems) could potentially trigger this warning, or even a more severe failure, by repeatedly running a program that reserves ring buffer space from a tracepoint on an affected RT kernel.

The Patch

To fix the problem, maintainers replaced the use of spinlock_t with raw_spinlock_t in these critical paths. Here’s the essence of the patch:

// old code
spinlock_t lock;

// new code
raw_spinlock_t lock;

And wherever the lock is used, the associated API is swapped to raw_spin_lock_irqsave() and friends.

Why does this work?
Because a raw_spinlock_t never sleeps. It is always safe to use in preempt-disabled contexts such as tracepoints, interrupt handlers, and so on.

Patch Snippet

Here’s a pseudocode sketch, inspired by the real commit:

- spinlock_t lock;
+ raw_spinlock_t lock;

// Locking code
- spin_lock_irqsave(&lock, flags);
+ raw_spin_lock_irqsave(&lock, flags);

// Unlocking
- spin_unlock_irqrestore(&lock, flags);
+ raw_spin_unlock_irqrestore(&lock, flags);

How To Mitigate Now

Upgrade your kernel!
The best fix is to update the kernel to a version that includes the patch (see the references below).
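Since the warning shows up on RT kernels, it can help to know whether a given host runs one when deciding how urgently to patch. The sketch below is one possible check, under the assumption that the kernel build embeds the token "PREEMPT_RT" in the version string reported by uname(2). Older RT patch sets may spell it differently, so treat a negative result as a hint, not as proof. The helper names are made up for this example, and the check says nothing about whether the fix is already applied.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * True when a kernel version string (the "version" field of uname)
 * contains the PREEMPT_RT token. Assumption: RT builds embed that token.
 */
static bool version_string_is_rt(const char *version)
{
    return version != NULL && strstr(version, "PREEMPT_RT") != NULL;
}

/* Returns 1 if the running kernel looks like RT, 0 if not, -1 if uname() failed. */
static int running_kernel_is_rt(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return version_string_is_rt(u.version) ? 1 : 0;
}
```

A caller could print a reminder to check the installed kernel version when running_kernel_is_rt() returns 1. Upgrading is still the actual fix.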

If you develop with BPF:
Be careful with locking primitives in tracepoint and preempt-disabled environments. Always match the locking primitive to the context.

References

- CVE-2024-50138 at Mitre
- Upstream Patch Commit
- Linux Kernel BPF Ringbuf Code
- LKML Original Patch Discussion

Conclusion

CVE-2024-50138 is a classic example of how deep-kernel mechanics, like which locking primitive is used where, can lead to instability and possible denial-of-service, especially on specialized kernels like RT Linux. The fix is simple: use the right tool for the right job — in this case, raw_spinlock_t for atomic, preemption-disabled contexts.

If you’re a developer, learn from this bug. If you’re an admin, patch now.

Timeline

Published on: 11/05/2024 18:15:16 UTC
Last modified on: 11/08/2024 14:27:41 UTC