A recent vulnerability in the Linux kernel’s memory management subsystem, CVE-2024-26960, shows how much complexity and danger can hide behind a seemingly rare race condition involving swap devices. Exploitation “in the wild” has not been demonstrated, but the kernel developers considered the race, identified through code review, and its theoretical attack paths serious enough to warrant a patch.

This post dives deep into the vulnerability, explains the root cause in clear terms, walks through a hypothetical exploit scenario, and includes helpful links and code snippets to make it easy for even non-experts to follow.

What is CVE-2024-26960?

CVE-2024-26960 is a race condition vulnerability in the Linux kernel’s handling of swap devices, specifically around how pages are freed and how swapoff (the operation to remove a swap device) synchronizes with ongoing memory operations.

- When two threads are manipulating swap entries at the same time – one freeing swap and another (swapoff) tearing down the entire swap device – there is a narrow window where code may still access already-freed memory, causing a classic use-after-free scenario.
- This could, in theory, allow for memory corruption or even privilege escalation, depending on what memory is overwritten and how an attacker is able to control execution.

swapoff() is the operation an administrator runs to disable swap on a device.

- It tears down (frees) swap_info_struct memory, including the swap_map that tracks used/free slots.

The problem: If one thread is in free_swap_and_cache() and another thread runs swapoff() at just the right time, then after the swap device memory is freed, the first thread could accidentally use a dangling pointer. This is most dangerous if an attacker can somehow control the timing and contents of swap, possibly leading to memory corruption.

Consider two different processes (P1 and P2) that each own one subpage in swap.

Step-by-step (based on David Hildenbrand’s analysis):

1. Process 1 (P1) references subpage 0 in swap.
2. Process 2 (P2) references subpage 1 in swap.

3. P1 exits, triggers free_swap_and_cache():
   - count == SWAP_HAS_CACHE for the entry.
   - *P1 is preempted here (not finished yet).*

4. P2 exits, triggers free_swap_and_cache():
   - count == SWAP_HAS_CACHE for the entry.
   - P2 passes swap_page_trans_huge_swapped() (possibly the last user),
     calls __try_to_reclaim_swap(), which eventually:
     - Frees the swap cache and drops si->inuse_pages to 0.

5. swapoff() sees that si->inuse_pages == 0.
   - swapoff proceeds to free swap_info_struct (including swap_map).

6. P1 resumes,
   - It is still running within free_swap_and_cache(),
   - Attempts to access swap_map (already freed!) via swap_page_trans_huge_swapped().

-> Use-after-free occurs! (Dangling pointer is now live.)
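
The interleaving above can be replayed deterministically in user space. The following is a minimal sketch, not kernel code: struct toy_swap_info, TOY_HAS_CACHE, and every toy_* function are hypothetical names invented for illustration, and the stale access in step 6 is only reported through a flag, never actually performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy marker for "only the swap cache still references this slot". */
#define TOY_HAS_CACHE 0x40

/* Hypothetical user-space stand-in for the kernel's swap_info_struct. */
struct toy_swap_info {
    unsigned char *swap_map; /* per-slot usage counts */
    int inuse_pages;         /* slots still in use */
    bool map_freed;          /* set once the toy swapoff freed swap_map */
};

/* Step 4: P2 reclaims the swap cache and releases both slots. */
static void toy_p2_reclaim(struct toy_swap_info *si)
{
    si->swap_map[0] = 0;
    si->swap_map[1] = 0;
    si->inuse_pages -= 2;
}

/* Step 5: swapoff frees swap_map, but only once no slot is in use. */
static bool toy_swapoff(struct toy_swap_info *si)
{
    if (si->inuse_pages != 0)
        return false;
    free(si->swap_map);
    si->swap_map = NULL;
    si->map_freed = true;
    return true;
}

/*
 * Replays steps 1-6 in order. Returns true if P1 would read swap_map
 * after it was freed.
 */
bool toy_replay_race(void)
{
    struct toy_swap_info si = { calloc(2, 1), 2, false };
    assert(si.swap_map != NULL);

    /* Steps 1-3: both slots are down to the cache reference, and P1 is
     * preempted before it inspects swap_map. */
    si.swap_map[0] = TOY_HAS_CACHE;
    si.swap_map[1] = TOY_HAS_CACHE;

    toy_p2_reclaim(&si);                 /* step 4 */
    bool torn_down = toy_swapoff(&si);   /* step 5 */

    /* Step 6: P1 resumes and is about to read swap_map. */
    bool stale_access = torn_down && si.map_freed;

    if (!si.map_freed)
        free(si.swap_map);
    return stale_access;
}
```

Calling toy_replay_race() reports that P1's pending read would land on freed memory, because nothing in the toy model (as in the vulnerable code) tells swapoff that P1 is still inside free_swap_and_cache().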

Code Snippet: Old Vulnerable Logic

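// Vulnerable logic before the fix (simplified):
void free_swap_and_cache(swp_entry_t entry) {
    // _swap_info_get() looks up the device but does not pin it, so a
    // concurrent swapoff() is free to tear it down.
    struct swap_info_struct *si = _swap_info_get(entry);
    if (!si)
        return;
    // Reads si->swap_map, which swapoff() may already have freed.
    swap_page_trans_huge_swapped(si, entry); // potential use-after-free!
    ...
}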

How Was It Fixed?

The official kernel patch addresses this by ensuring that swapoff cannot complete while free_swap_and_cache() holds a reference to the swap device:

- Add get_swap_device() and put_swap_device() calls to free_swap_and_cache() to take and release a reference on the swap device, stalling swapoff until all users are done.
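
The get/put idea can be modeled in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel's implementation: struct toy_swap_dev and the toy_* functions are hypothetical, and where the real swapoff waits for outstanding references, the toy version simply returns false so the caller can retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a swap device with a usage reference count. */
struct toy_swap_dev {
    atomic_int users;        /* sections currently between get and put */
    atomic_bool dying;       /* set once teardown has started */
    unsigned char *swap_map; /* the memory that teardown frees */
};

void toy_dev_init(struct toy_swap_dev *d, size_t slots)
{
    atomic_init(&d->users, 0);
    atomic_init(&d->dying, false);
    d->swap_map = calloc(slots, 1);
    assert(d->swap_map != NULL);
}

/* Take a reference; refuse if teardown has already begun. */
bool toy_get_swap_device(struct toy_swap_dev *d)
{
    atomic_fetch_add(&d->users, 1);
    if (atomic_load(&d->dying)) {
        atomic_fetch_sub(&d->users, 1);
        return false;
    }
    return true;
}

/* Drop a reference taken by toy_get_swap_device(). */
void toy_put_swap_device(struct toy_swap_dev *d)
{
    atomic_fetch_sub(&d->users, 1);
}

/*
 * Toy teardown: block new users first, then free swap_map only if no
 * reference is outstanding. Returns false if a user is still active.
 */
bool toy_swapoff_try(struct toy_swap_dev *d)
{
    atomic_store(&d->dying, true);
    if (atomic_load(&d->users) != 0)
        return false;
    free(d->swap_map);
    d->swap_map = NULL;
    return true;
}
```

With this ordering, a caller that succeeded in toy_get_swap_device() can keep using swap_map until its matching toy_put_swap_device(), and a teardown attempt made in between leaves the map intact. Marking the device as dying before checking the count is what keeps a late caller from slipping in after the check.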

Code Snippet: With the Fix

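// Fixed logic (simplified):
void free_swap_and_cache(swp_entry_t entry) {
    struct swap_info_struct *si;

    /* Pin the swap device; NULL means it is gone or being torn down. */
    si = get_swap_device(entry);
    if (!si)
        return;

    /*
     * Ensure the swap entry is not already free. swap_entry_is_active()
     * is shorthand used in this simplified listing for checking that the
     * entry's swap_map count is nonzero.
     */
    if (!swap_entry_is_active(si, entry)) {
        put_swap_device(si);
        return;
    }

    swap_page_trans_huge_swapped(si, entry); /* si is pinned, so this is safe */
    ...
    put_swap_device(si); /* swapoff may proceed once all users are done */
}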

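Without the fix, the race ends with a freed pointer being dereferenced (use-after-free), which can lead to:

- Memory corruption or a system crash (oops/panic), or, in theory, the privilege escalation mentioned earlier.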

Who is at risk?

- Any multi-threaded or server environment that uses swap and runs workloads with rapid process exits and heavy swap activity.
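
A first step in judging exposure is to check whether a machine has any swap enabled at all. The sketch below counts the entries in /proc/swaps, assuming the usual layout of one header line followed by one line per active swap area; the function names are hypothetical helpers written for this post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count swap areas listed in an open /proc/swaps-style stream:
 * the first line is a header, every later non-empty line is one area.
 */
int count_swap_devices(FILE *f)
{
    char line[512];
    int count = 0;
    int header_seen = 0;
    int at_line_start = 1;

    assert(f != NULL);
    while (fgets(line, sizeof line, f)) {
        size_t len = strlen(line);
        int starts_line = at_line_start;

        /* A chunk without a trailing newline continues on the next read. */
        at_line_start = (len > 0 && line[len - 1] == '\n');
        if (!starts_line)
            continue;
        if (!header_seen) {
            header_seen = 1;
            continue;
        }
        if (line[0] != '\n')
            count++;
    }
    return count;
}

/* Returns the number of active swap areas, or -1 if /proc/swaps is unreadable. */
int count_active_swap_devices(void)
{
    FILE *f = fopen("/proc/swaps", "r");
    if (!f)
        return -1;
    int n = count_swap_devices(f);
    fclose(f);
    return n;
}
```

A result of 0 from count_active_swap_devices() means no swap area is active, so the swapoff race described here has nothing to act on. Any positive count only says swap is in use, not that the running kernel is vulnerable.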

References

- Kernel.org Patch Discussion
- Official Patch Commit (linux.git)
- Red Hat Security Impact (CVE-2024-26960)

Summary

CVE-2024-26960 is a little-known but technically impressive race condition bug in the Linux kernel’s swap code. While not known to be actively exploited, its potential for use-after-free and memory corruption highlights the critical importance of careful concurrency handling, even in apparently “safe” subsystems.

System administrators should update Linux kernel packages as soon as patches are released to avoid possible risk, and kernel developers can use this as a case-study in the importance of reference counting and operation ordering for safe code.
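
To check an update programmatically, the running kernel release can be compared against the fixed version listed in your distribution's advisory. The sketch below does not hard-code any fixed version: the threshold is a parameter you supply, the function names are hypothetical, and distribution kernels often backport fixes without changing the upstream version number, so treat a negative result as a prompt to read the advisory rather than as proof of exposure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Compare a release string such as "6.8.3-generic" against a
 * major.minor.patch threshold. A missing patch level counts as 0.
 */
bool release_at_least(const char *release, int maj, int min, int patch)
{
    int r_maj = 0, r_min = 0, r_patch = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d.%d", &r_maj, &r_min, &r_patch) < 2)
        return false; /* not a recognizable version string */
    if (r_maj != maj)
        return r_maj > maj;
    if (r_min != min)
        return r_min > min;
    return r_patch >= patch;
}

/* Apply the same comparison to the kernel this process is running on. */
bool running_kernel_at_least(int maj, int min, int patch)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return release_at_least(u.release, maj, min, patch);
}
```

For example, release_at_least("6.8.2", 6, 8, 3) is false while release_at_least("6.8.3-generic", 6, 8, 3) is true; the 6.8.3 threshold here is only an illustrative input, so substitute the version your vendor documents.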

Stay safe! Always keep your kernel up-to-date.



Timeline

Published on: 05/01/2024 06:15:12 UTC
Last modified on: 08/02/2024 00:21:06 UTC