CVE-2024-26987 - Linux Kernel mm/memory-failure Deadlock with hugetlb_optimize_vmemmap

In early 2024, a serious bug was found and patched in the Linux kernel's memory management. This flaw, now tracked as CVE-2024-26987, could deadlock the system and hang the whole machine when huge pages were used together with the hugetlb vmemmap optimization. Let's break down what happened, how it could be triggered, and why it mattered.

What Is This Bug About?

The bug affected Linux kernels running with hugetlb_optimize_vmemmap enabled. This is HugeTLB Vmemmap Optimization (HVO), a memory-saving feature for huge pages (very large memory allocations, common in databases and HPC). For each huge page, it frees most of the memory that holds the page's struct page metadata. During memory "hard offline" operations (such as retiring faulty pages), a deadlock could occur because of a circular locking dependency between two kernel locks.

Put simply: under the right conditions, two kernel code paths each held one lock and waited for the lock held by the other. Neither could proceed, the kernel had no way to break the cycle, and the system froze.
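What hugetlb_optimize_vmemmap actually saves is memory. Every base page is described by a struct page, and HVO frees most of the pages holding those structs for each huge page. The back-of-the-envelope arithmetic is sketched below in C. It assumes a 4 KiB base page, a 64-byte struct page, and that HVO keeps exactly one vmemmap page per huge page. These are typical x86-64 values and are not queried from a running kernel. The function names are illustrative, not kernel APIs.

```c
#include <assert.h>

/* Assumptions: typical x86-64 values, not read from the running kernel. */
#define BASE_PAGE_SIZE   4096UL  /* bytes per base page */
#define STRUCT_PAGE_SIZE 64UL    /* assumed sizeof(struct page) */
#define VMEMMAP_KEPT     1UL     /* vmemmap pages HVO is assumed to keep */

/* Base pages of vmemmap (the struct page array) describing one huge page. */
unsigned long vmemmap_pages_per_huge_page(unsigned long huge_page_bytes)
{
    assert(huge_page_bytes % BASE_PAGE_SIZE == 0);
    unsigned long nr_struct_pages = huge_page_bytes / BASE_PAGE_SIZE;
    unsigned long vmemmap_bytes = nr_struct_pages * STRUCT_PAGE_SIZE;
    return vmemmap_bytes / BASE_PAGE_SIZE;
}

/* Base pages HVO could free per huge page under the assumptions above. */
unsigned long hvo_freed_pages(unsigned long huge_page_bytes)
{
    unsigned long total = vmemmap_pages_per_huge_page(huge_page_bytes);
    return total > VMEMMAP_KEPT ? total - VMEMMAP_KEPT : 0;
}
```

Under these assumptions, a 2 MiB huge page is described by 8 vmemmap pages and 7 of them can be freed. A 1 GiB huge page is described by 4096 pages and 4095 can be freed. So 1,000 huge pages of 2 MiB each would return roughly 27 MiB.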

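Two locks are involved:

- cpu_hotplug_lock (taken for writing while CPUs go online or offline; code that needs a stable set of online CPUs takes it for reading)
- pcp_batch_high_lock (protects per-CPU page structures during memory operations)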

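In the hard-offline path, memory_failure() reached __page_handle_poison(). That function took pcp_batch_high_lock (via zone_pcp_disable()) and then called dissolve_free_huge_page(). With the vmemmap optimization enabled, restoring the huge page's vmemmap on that path ends in static_key_slow_dec(), which takes cpu_hotplug_lock. Another code path acquired the same two locks in the opposite order, with cpu_hotplug_lock first. If two CPUs ran these paths at the same time, each could end up holding one lock while waiting for the other, and neither could proceed.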

Here's a simplified illustration of the problematic order:

```
// CPU 0 (memory-failure path):
lock(pcp_batch_high_lock);
...
lock(cpu_hotplug_lock);     // waits for CPU 1

// CPU 1 (opposite order):
lock(cpu_hotplug_lock);
...
lock(pcp_batch_high_lock);  // waits for CPU 0

// System stuck forever waiting!
```
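The lock pattern above can be modeled in user space without actually hanging the process. In the C sketch below, two POSIX mutexes stand in for pcp_batch_high_lock and cpu_hotplug_lock. This is a simplification: in the kernel, cpu_hotplug_lock is a per-CPU reader/writer semaphore. A barrier guarantees that each thread holds its first lock before reaching for the second, and pthread_mutex_trylock() returns EBUSY instead of blocking. demo_abba() and run_path() are hypothetical names used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* User-space stand-ins for the two kernel locks (a model, not kernel code). */
static pthread_mutex_t pcp_batch_high_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock    = PTHREAD_MUTEX_INITIALIZER;

/* first_held: both threads hold their first lock.
 * second_tried: both threads have attempted their second lock. */
static pthread_barrier_t first_held, second_tried;

struct path {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
    int second_rc;               /* pthread_mutex_trylock() result */
};

static void *run_path(void *arg)
{
    struct path *p = arg;

    pthread_mutex_lock(p->first);
    pthread_barrier_wait(&first_held);
    /* A blocking lock() here would hang; trylock reports EBUSY instead. */
    p->second_rc = pthread_mutex_trylock(p->second);
    if (p->second_rc == 0)
        pthread_mutex_unlock(p->second);
    pthread_barrier_wait(&second_tried);  /* keep the first lock until both tried */
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns how many of the two paths found their second lock busy (2 = circular wait). */
int demo_abba(int *rc_cpu0, int *rc_cpu1)
{
    struct path cpu0 = { &pcp_batch_high_lock, &cpu_hotplug_lock, -1 };  /* memory-failure order */
    struct path cpu1 = { &cpu_hotplug_lock, &pcp_batch_high_lock, -1 };  /* reverse order */
    pthread_t t0, t1;

    assert(rc_cpu0 != NULL && rc_cpu1 != NULL);
    pthread_barrier_init(&first_held, NULL, 2);
    pthread_barrier_init(&second_tried, NULL, 2);
    if (pthread_create(&t0, NULL, run_path, &cpu0) != 0 ||
        pthread_create(&t1, NULL, run_path, &cpu1) != 0)
        abort();                 /* a sketch: no graceful recovery */
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_barrier_destroy(&first_held);
    pthread_barrier_destroy(&second_tried);

    *rc_cpu0 = cpu0.second_rc;
    *rc_cpu1 = cpu1.second_rc;
    return (cpu0.second_rc == EBUSY) + (cpu1.second_rc == EBUSY);
}
```

A return value of 2 means both paths found their second lock taken, which is the circular wait. Replacing the trylock in run_path() with pthread_mutex_lock() would turn the model into a genuine hang.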

The original lockdep warning (excerpt) looked like this:

```
WARNING: possible circular locking dependency detected
...
bash/46904 is trying to acquire lock: cpu_hotplug_lock
but task is already holding lock: pcp_batch_high_lock
```
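lockdep raises this warning without waiting for an actual deadlock. It remembers, for each acquisition, which locks were already held, and it complains as soon as those "held A, then took B" dependencies form a cycle. The C sketch below captures only that core idea with a two-lock adjacency matrix. The enum values and record_dependency() are hypothetical names; real lockdep tracks lock classes, contexts, and far more state.

```c
#include <assert.h>
#include <stdbool.h>

enum { PCP_BATCH_HIGH_LOCK, CPU_HOTPLUG_LOCK, NR_LOCKS };

/* dep[a][b] is true if lock b was acquired while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/*
 * Record "took `acquired` while holding `held`".
 * Returns true if the new edge closes a cycle (a possible deadlock).
 */
bool record_dependency(int held, int acquired)
{
    bool seen[NR_LOCKS] = { false };

    assert(held >= 0 && held < NR_LOCKS);
    assert(acquired >= 0 && acquired < NR_LOCKS);
    /* A cycle exists if `held` is already reachable from `acquired`. */
    bool cycle = reachable(acquired, held, seen);
    dep[held][acquired] = true;
    return cycle;
}
```

Recording the reverse-order path first (cpu_hotplug_lock, then pcp_batch_high_lock) returns false. Recording the memory-failure order afterwards returns true, which mirrors the order in which the two dependencies appear in the warning above.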

How Can This Be Exploited?

While this is not a remote code execution or privilege escalation bug, it is a local Denial of Service (DoS). On a system with hugetlb vmemmap optimization enabled, anything that drives hard-offline handling of huge pages can freeze the machine. That includes a deliberate attack, an admin mistake, or a genuine uncorrectable memory error. Deliberately injecting a hard offline is a privileged operation (for example, madvise(MADV_HWPOISON) requires CAP_SYS_ADMIN), so an attacker would first need root-level access.

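Enable HugeTLB Vmemmap Optimization

Boot a kernel built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP with hugetlb_free_vmemmap=on on the command line, or enable it at runtime with sysctl -w vm.hugetlb_optimize_vmemmap=1. Then reserve huge pages and change the memory block state (as root):

```bash
# Reserve one 1 GiB huge page on NUMA node X
echo 1 > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages
# Change the state of memory block Y
echo x... > /sys/devices/system/memory/memoryY/state
```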
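In the first command, the path encodes the NUMA node (nodeX) and the huge page size in kB. 1048576kB is 1 GiB, and 2048kB would be 2 MiB. A hypothetical C helper that builds the path for a given node and size might look like this. It only formats the string and does not touch sysfs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the per-node nr_hugepages sysfs path for a NUMA
 * node and a huge page size given in kB (2048 = 2 MiB, 1048576 = 1 GiB).
 * Returns the path length, or -1 if it did not fit in buf.
 */
int nr_hugepages_path(char *buf, size_t len, int node, unsigned long size_kb)
{
    assert(buf != NULL && len > 0 && node >= 0);
    int n = snprintf(buf, len,
                     "/sys/devices/system/node/node%d/hugepages/"
                     "hugepages-%lukB/nr_hugepages",
                     node, size_kb);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Writing a count to that file as root asks the kernel to reserve that many huge pages on the node. Reading the file back shows how many pages the kernel actually managed to allocate, which can be fewer for 1 GiB pages on a fragmented system.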

The Patch: How Was It Fixed?

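The kernel maintainers fixed the issue by no longer holding pcp_batch_high_lock across the huge page dissolve step in the memory-failure path. Instead of disabling the per-CPU page lists with zone_pcp_disable(), which takes pcp_batch_high_lock, the fixed __page_handle_poison() drains them with drain_all_pages() after the huge page is dissolved. As a result, cpu_hotplug_lock is never requested while pcp_batch_high_lock is held.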
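The key property of the fix is that the memory-failure path no longer requests cpu_hotplug_lock while holding pcp_batch_high_lock. The pcp_batch_high_lock → cpu_hotplug_lock edge therefore disappears, and the cycle cannot form. The C sketch below models this with ordinary blocking mutexes standing in for the kernel locks. memory_failure_fixed(), reverse_path(), and run_fixed_model() are hypothetical names, and the model deliberately ignores everything except lock ordering.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* User-space stand-ins for the two kernel locks (a model, not kernel code). */
static pthread_mutex_t pcp_batch_high_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t count_lock          = PTHREAD_MUTEX_INITIALIZER;
static int paths_completed;

static void path_done(void)
{
    pthread_mutex_lock(&count_lock);
    paths_completed++;
    pthread_mutex_unlock(&count_lock);
}

/* Fixed ordering: the step that may take cpu_hotplug_lock runs with
 * pcp_batch_high_lock NOT held, so no pcp -> hotplug edge exists. */
static void *memory_failure_fixed(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cpu_hotplug_lock);    /* models the vmemmap restore */
    pthread_mutex_unlock(&cpu_hotplug_lock);
    path_done();
    return NULL;
}

/* The other path keeps its original order: hotplug first, then pcp. */
static void *reverse_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cpu_hotplug_lock);
    pthread_mutex_lock(&pcp_batch_high_lock);
    pthread_mutex_unlock(&pcp_batch_high_lock);
    pthread_mutex_unlock(&cpu_hotplug_lock);
    path_done();
    return NULL;
}

/* Runs both paths concurrently `rounds` times; returns completed path count. */
int run_fixed_model(int rounds)
{
    assert(rounds >= 0);
    paths_completed = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        if (pthread_create(&a, NULL, memory_failure_fixed, NULL) != 0 ||
            pthread_create(&b, NULL, reverse_path, NULL) != 0)
            abort();   /* a sketch: no graceful recovery */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
    }
    return paths_completed;
}
```

Only one path ever nests the two locks, so every round completes and run_fixed_model(100) returns 200 instead of hanging.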

Commit Reference:
- LKML Patch Discussion
- Kernel.org Patch

Fixed-in versions:

Which Systems Are Affected?

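- Linux kernels without the fix, built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP and running with HVO enabled (hugetlb_free_vmemmap=on at boot or vm.hugetlb_optimize_vmemmap=1 at runtime). The fix postdates the 6.8 release, so 6.8 is affected unless a stable or vendor update includes it. Check your distribution's advisory for exact versions.
- Most at risk: systems that combine huge pages with memory offlining, such as databases, HPC clusters, and virtualization stacks.
- Not all distros enable HVO by default, but cloud, enterprise, and custom HPC kernels might.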
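One quick way to check whether a running system meets the HVO precondition is to read the runtime knob. The C sketch below reads /proc/sys/vm/hugetlb_optimize_vmemmap, which is where the vm.hugetlb_optimize_vmemmap sysctl appears on kernels that support it. A missing or unreadable file is reported as -1. The helper names are hypothetical. This checks only one precondition and says nothing about whether your kernel already carries the fix.

```c
#include <assert.h>
#include <stdio.h>

/* Runtime HVO knob (the vm.hugetlb_optimize_vmemmap sysctl). */
#define HVO_SYSCTL "/proc/sys/vm/hugetlb_optimize_vmemmap"

/* Interpret the knob's contents: 1 = enabled, 0 = disabled, -1 = unexpected. */
int parse_hvo_flag(const char *text)
{
    assert(text != NULL);
    if (text[0] == '1' && (text[1] == '\n' || text[1] == '\0'))
        return 1;
    if (text[0] == '0' && (text[1] == '\n' || text[1] == '\0'))
        return 0;
    return -1;
}

/* Read the knob at `path`; returns 1 or 0, or -1 if it is missing or unreadable. */
int hvo_enabled(const char *path)
{
    char buf[16] = { 0 };
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;  /* e.g. kernel built without HVO support */
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_hvo_flag(buf);
}
```

On a live system, hvo_enabled(HVO_SYSCTL) returning 1 means newly allocated huge pages are being optimized.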

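Workaround if you cannot update yet: add hugetlb_free_vmemmap=off to your kernel command line, or run sysctl -w vm.hugetlb_optimize_vmemmap=0. The runtime switch only affects huge pages allocated afterwards, and pages that are already optimized stay that way. Apply it before reserving huge pages, or prefer the boot parameter.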
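The runtime part of the workaround is a single write to the sysctl file, sketched below in C. write_knob() is a hypothetical helper. On a real system it needs root, and the target would be /proc/sys/vm/hugetlb_optimize_vmemmap with the value "0".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `value` to a sysctl file such as
 * /proc/sys/vm/hugetlb_optimize_vmemmap (root required for the real knob).
 * Returns 0 on success, -1 on any error.
 */
int write_knob(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t len = strlen(value);
    int ok = fwrite(value, 1, len, f) == len;
    /* fclose() flushes the buffer; a rejected write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, write_knob("/proc/sys/vm/hugetlb_optimize_vmemmap", "0\n") has the same effect as sysctl -w vm.hugetlb_optimize_vmemmap=0.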

> Summary:
A subtle deadlock could bring your whole Linux system down (a local DoS) when hugetlb vmemmap optimization was combined with certain memory operations.
Patched in upstream kernels. Update and stay secure!

## More Reading / References

- CVE-2024-26987 at NVD
- LKML: Patch submit thread
- Kernel commit diff

Be cautious when managing or offlining huge pages—kernel internals are tricky, and small mistakes can have big, system-down consequences!

Timeline

Published on: 05/01/2024 06:15:16 UTC
Last modified on: 06/17/2024 17:46:47 UTC