In June 2024, a subtle but important vulnerability was resolved in the Linux kernel affecting the s390 architecture (IBM System/390 and zSeries). Tracked as CVE-2024-57838, this bug didn't directly lead to an exploit, but it could cause resource exhaustion and stability issues on kernels built with KASAN (the Kernel Address Sanitizer) and kernel preemption enabled. More visibly, it produced kernel warnings that alarmed sysadmins and kernel developers alike.
Let's break down in simple language what happened, why it mattered, and how the fix was implemented, with some code snippets and technical details for those who want to peek under the hood.
The Problem: Stack Depot Overload
The stack depot is a feature of the Linux kernel that manages "stack traces" for things like memory allocations and error reports. It’s a kind of database for stack traces used by debugging tools (like KASAN, the Kernel Address Sanitizer) to help developers identify problems and deduplicate reports.
However, stack depot isn’t infinite. If it stores too many unique stack traces, it gets full and throws warnings like:
Stack depot reached limit capacity
WARNING: CPU: 0 PID: 286113 at lib/stackdepot.c:252 depot_alloc_stack+0x39a/0x3c8
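To make the capacity problem concrete, here is a minimal toy model in Python. It is not the kernel's implementation: the class name `ToyStackDepot`, the `capacity` parameter, and the frame names are all made up for illustration. It only shows the two behaviors described above: identical traces are stored once, and new traces are rejected when the store is full.

```python
class ToyStackDepot:
    """Toy model: stores each unique stack trace once, up to a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity   # hypothetical limit, not the kernel's real value
        self.handles = {}          # trace (tuple of frame names) -> handle
        self.dropped = 0           # traces rejected because the depot was full

    def save(self, frames):
        """Return a handle for the trace, or None if the depot is full."""
        key = tuple(frames)
        if key in self.handles:
            # Deduplication: an identical trace reuses the existing handle.
            return self.handles[key]
        if len(self.handles) >= self.capacity:
            # The real kernel prints a warning here; the toy just counts it.
            self.dropped += 1
            return None
        handle = len(self.handles) + 1
        self.handles[key] = handle
        return handle
```

With a capacity of 2, saving the same trace twice returns the same handle, a second distinct trace gets a new handle, and a third distinct trace gets `None`.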
Why So Many Stack Traces?
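On the s390 platform, the entry points for hardware interrupts were not marked in a way that let the stack depot recognize them. The depot normally discards everything beyond the top interrupt context as "uninteresting", because the code that happened to be interrupted is essentially random. Without the marking, traces that shared the same interrupt code path but started from different interrupted points were each stored as unique, and the depot filled up quickly.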
What’s special about .irqentry.text and .softirqentry.text?
- These are *sections* of code in the kernel, meant to group the entry points for different types of interrupts.
- The stack depot treats a function in these sections as the "last interesting point": once a trace reaches one, nothing further up the stack is saved.
But on s390, the .irqentry.text section was *empty*. Common code places the softirq entry point into .softirqentry.text, but populating .irqentry.text is left to each architecture. So the stack depot couldn't tell where interrupt handling started, and it dutifully recorded the entire stack for every interrupt. The result: warnings and a full stack depot.
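The filtering idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not kernel code: the function name `truncate_at_irq_entry` and all frame names are invented, and a set of function names stands in for "functions that live in .irqentry.text".

```python
def truncate_at_irq_entry(frames, irq_entry_funcs):
    """Keep frames from the innermost one up to and including the
    first function that counts as an IRQ entry point."""
    for i, name in enumerate(frames):
        if name in irq_entry_funcs:
            return list(frames[: i + 1])
    # No IRQ entry found: the whole trace is kept.
    return list(frames)


# Two traces taken in the same handler, but the interrupt hit different code.
# Frames are listed innermost first; all names are illustrative.
trace_a = ["kmalloc", "device_handler", "io_interrupt", "vfs_read", "sys_read"]
trace_b = ["kmalloc", "device_handler", "io_interrupt", "ext4_write", "sys_write"]

marked = {"io_interrupt"}  # handler is in .irqentry.text
unmarked = set()           # .irqentry.text is empty, as on s390 before the fix
```

With `marked`, both traces collapse to the same three frames, so a depot would store them once. With `unmarked`, each trace is kept in full and stays unique.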
- Impact: Kernel warnings, potential resource exhaustion, debugging instability.
- Root Cause: IRQ handlers (especially for IO and External interrupts) were not in the right code section (.irqentry.text), so stack depot couldn't filter them.
- Exploitability: No direct remote or local exploit, but denial of service could be possible in debugging-heavy workloads.
The Fix: Mark Those IRQ Entries!
The solution involved moving the IO/EXT interrupt handler code from the .kprobes.text section into .irqentry.text and updating the kprobes blacklist to include .irqentry.text (so that kprobes still cannot be placed on those routines after the move).
Here’s a peek at the kind of changes made (Full commit diff):
Before
.section .kprobes.text
.globl io_interrupt
io_interrupt:
// ... IRQ handling code ...
After
.section .irqentry.text
.globl io_interrupt
io_interrupt:
// ... IRQ handling code ...
And in the kprobes blacklist logic (conceptually):
// Previously, probes were refused only in:
.section .kprobes.text
// Now they are also refused in:
.section .irqentry.text
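The blacklist side can be pictured as a list of address ranges in which probes are refused. The sketch below is a toy Python model under that assumption. `ToyKprobeBlacklist`, its methods, and the addresses are hypothetical and do not mirror the kernel's data structures.

```python
class ToyKprobeBlacklist:
    """Toy model: a probe is refused if its address falls in any listed range."""

    def __init__(self):
        self.ranges = []  # list of (start, end) pairs, end exclusive

    def add_area(self, start, end):
        """Blacklist every address in [start, end)."""
        if start >= end:
            raise ValueError("empty or inverted range")
        self.ranges.append((start, end))

    def is_blacklisted(self, addr):
        """Return True if addr lies inside any blacklisted range."""
        return any(start <= addr < end for start, end in self.ranges)
```

In this picture, the fix corresponds to one extra `add_area` call covering the section the handlers moved into, so an address that was protected before the move stays protected after it.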
##### Commit message snippet (Linux Kernel Patch):
> Fix this by moving the IO/EXT interrupt handlers from .kprobes.text into the .irqentry.text section and updating the kprobes blacklist to include the .irqentry.text section.
Exploit Details
While CVE-2024-57838 does not allow remote code execution, privilege escalation, or information leaks, it can amount to a local denial of service on debugging-enabled systems:
- As interrupts arrive, the stack depot fills with near-duplicate traces, and the logs show the *capacity warning*.
- If this happens often enough, legitimate diagnostic traces may be dropped or debugging tools may misbehave.
This affects troubleshooting but does not compromise isolation or escalate privileges.
With the fix applied, there are no more endless unique interrupt stack contexts, and the depot stays healthy.
The fix *also* affects ftrace's function graph tracer, which can optionally use the same IRQ entry filtering.
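If you want to check whether a system has hit this, one simple approach is to search saved kernel logs for the message quoted earlier. A minimal sketch, assuming the log has already been read into a list of lines (the function name is illustrative):

```python
DEPOT_WARNING = "Stack depot reached limit capacity"


def count_depot_warnings(log_lines):
    """Count log lines that contain the stack depot capacity message."""
    return sum(1 for line in log_lines if DEPOT_WARNING in line)
```

A non-zero count on an s390 kernel built with KASAN and preemption is a hint that the kernel predates the fix.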
Official References
- Linux Kernel Patch on lore.kernel.org
- Mainline Commit af19b902e9ab (kernel.org)
- Red Hat Bugzilla entry (CVE-2024-57838)
- Description on CVE database
- CVE-2024-57838 caused resource warnings on s390 Linux systems using heavy debugging.
- IRQ entries weren't marked for stack depot filtering, resulting in stack depot warnings and potential resource exhaustion.
- There is no "active exploit", but the bug could be a nuisance when debugging; a robust fix is now in the mainline kernel.
Stay safe, and remember — even the smallest bug in the kernel can cause surprising headaches down the line. Always apply your updates!
Timeline
Published on: 01/11/2025 14:15:25 UTC
Last modified on: 05/04/2025 10:05:19 UTC