The security of the Linux kernel is critical for the functioning of millions of systems around the world, from servers to Android phones. One vulnerability that has recently come to light — CVE-2023-40791 — involves the scatterlist code within the kernel’s memory management routines. In this write-up, I will explain what this bug is, offer a code snippet, explore the root cause and possible attack scenarios, and provide resources for further reading. My aim is to make this easy to understand, whether you’re a developer, sysadmin, or just interested in computer security.
What is CVE-2023-40791?
CVE-2023-40791 is a vulnerability discovered in the Linux kernel function extract_user_to_sg located in lib/scatterlist.c. Specifically, kernels before version 6.4.12 fail to "unpin" (or release) certain memory pages under specific conditions. This results in what’s called a "dangling pin," which can cause subtle stability, performance, or security issues in the system.
In one proof-of-concept, a WARNING in the kernel logs for try_grab_page can be triggered, indicating mishandling of page references.
The Technical Details
At a high level, when user data is passed to kernel code, the kernel manages "pins" on the corresponding physical memory pages to safely manipulate that data. Unpinning simply means releasing the reference to a page after you’re done with it. Forgetting to unpin leads to a memory leak — the kernel thinks pages are still in use, and over time, the system can run out of available memory.
The vulnerable function is meant to take a user buffer, take references on the pages behind it, and add those pages to a scatter-gather list while keeping the reference counts balanced. When the operation only partly succeeds (for example, an error occurs after some pages have already been pinned), some paths returned without the matching cleanup.
Excerpt from scatterlist.c
Below is an annotated sketch, simplified for clarity and not the verbatim kernel source, that shows how memory pages are pinned and supposed to be released:
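int extract_user_to_sg(const void __user *uaddr, size_t len, struct scatterlist *sg)
{
	struct page *pages[MAX_PAGES];
	/* Pages spanned by the user buffer (capping at MAX_PAGES omitted for brevity). */
	int nr = DIV_ROUND_UP(offset_in_page(uaddr) + len, PAGE_SIZE);
	int npages, i;

	/* Takes one reference on each page it returns. */
	npages = get_user_pages_fast((unsigned long)uaddr, nr, FOLL_WRITE, pages);
	if (npages < 0)
		return npages; /* error: no references were taken */

	for (i = 0; i < npages; ++i)
		sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0);

	/* Suppose something goes wrong here, cleanup is skipped: */
	if (some_error_condition) {
		/* MISSING:
		 * for (i = 0; i < npages; ++i)
		 *	put_page(pages[i]);
		 */
		return -EFAULT;
	}
	return 0;
}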
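If get_user_pages_fast() takes references on some pages and an error occurs later, each of those references has to be dropped again with put_page(). (Pages pinned through the pin_user_pages*() family are released with unpin_user_page() instead.) In kernels before 6.4.12, certain code paths in extract_user_to_sg did not release them correctly.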
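The difference between leaking and unwinding on an error path can be modeled in user space. The following is a minimal sketch, not kernel code: `struct toy_page`, `toy_pin`, `toy_unpin`, and both `toy_extract_*` functions are hypothetical names invented for illustration, and a plain integer stands in for the kernel's page reference accounting.

```c
#include <stddef.h>

/* Toy stand-in for a page: just a pin counter. Not a kernel structure. */
struct toy_page {
    int pins;
};

/* Hypothetical helpers that mimic taking and dropping one pin. */
static void toy_pin(struct toy_page *p)   { p->pins++; }
static void toy_unpin(struct toy_page *p) { p->pins--; }

/* Sum of the pins still held across all pages. */
int toy_outstanding(const struct toy_page *pages, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += pages[i].pins;
    return total;
}

/* Buggy shape: pins n pages, then returns on failure without cleanup. */
int toy_extract_buggy(struct toy_page *pages, size_t n, int fail)
{
    for (size_t i = 0; i < n; i++)
        toy_pin(&pages[i]);
    if (fail)
        return -1; /* the n pins taken above are leaked here */
    return 0;
}

/* Fixed shape: the error exit unwinds every pin taken so far. */
int toy_extract_fixed(struct toy_page *pages, size_t n, int fail)
{
    size_t i;
    for (i = 0; i < n; i++)
        toy_pin(&pages[i]);
    if (fail) {
        while (i > 0)
            toy_unpin(&pages[--i]);
        return -1;
    }
    return 0;
}
```

Calling `toy_extract_buggy` with `fail` set leaves `toy_outstanding` at `n`, while `toy_extract_fixed` brings it back to zero. On success both keep the pins, since the caller is then responsible for releasing them once the I/O is done.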
Demonstrating the Issue
Researchers found that intentionally triggering an error after some pages have been pinned can make the kernel log a WARNING from try_grab_page. The warning indicates that a page's reference accounting is not in the state the kernel expects. In practice, bugs of this kind are often found by tools such as Syzkaller, an automated kernel fuzz-testing platform.
Here’s what an admin might see in dmesg or system logs (XYZ, 0xAB, and 0xCD are placeholders for the real line number and offsets):
WARNING: CPU: 3 PID: 12345 at mm/gup.c:XYZ try_grab_page+0xAB/0xCD
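An admin who wants to scan saved logs for this symptom could use a small filter along these lines. This is a sketch under assumptions: the function names are made up for illustration, and it only assumes that the word WARNING: and the symbol try_grab_page appear on the same line, as in the sample above. Real log formats can vary between kernel versions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a log line looks like a kernel WARNING that mentions try_grab_page. */
bool is_try_grab_page_warning(const char *line)
{
    return line != NULL &&
           strstr(line, "WARNING:") != NULL &&
           strstr(line, "try_grab_page") != NULL;
}

/* Counts matching lines in a stream. Lines longer than the buffer are
 * read in pieces, so extremely long lines may be miscounted. */
size_t count_try_grab_page_warnings(FILE *in)
{
    char line[1024];
    size_t hits = 0;

    while (fgets(line, sizeof line, in) != NULL)
        if (is_try_grab_page_warning(line))
            hits++;
    return hits;
}
```

Passing `stdin` to `count_try_grab_page_warnings` lets the filter sit at the end of a pipe fed by dmesg or a saved log file. A hit is only a hint that page reference accounting went wrong somewhere; it does not by itself prove that this particular CVE was triggered.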
Exploit Potential
While this bug does not directly allow an attacker to run code as root or crash the machine, it can, over time, cause a denial of service. A local attacker (someone who can run programs on your machine) could intentionally leak pinned pages, causing system memory exhaustion. Eventually, legitimate applications and the kernel itself may be unable to allocate memory, resulting in hangs or crashes.
This is a local resource exhaustion vulnerability.
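To get a feel for the scale, the leak can be estimated with simple arithmetic: pinned bytes = failed calls × pages leaked per call × page size. The sketch below is a back-of-the-envelope model only. The function names are hypothetical, and the number of pages leaked per call and the page size are assumptions the caller supplies, not measured values.

```c
#include <stdint.h>

/* Bytes left pinned after `calls` failed calls that each leak
 * `pages_per_call` pages of `page_size` bytes. */
uint64_t leaked_bytes(uint64_t calls, uint64_t pages_per_call, uint64_t page_size)
{
    return calls * pages_per_call * page_size;
}

/* Failed calls needed to pin at least `target_bytes`, rounded up.
 * Returns 0 when nothing is leaked per call. */
uint64_t calls_to_leak(uint64_t target_bytes, uint64_t pages_per_call, uint64_t page_size)
{
    uint64_t per_call = pages_per_call * page_size;

    if (per_call == 0)
        return 0;
    return (target_bytes + per_call - 1) / per_call;
}
```

For instance, assuming one 4096-byte page leaked per failed call, `calls_to_leak(1ULL << 30, 1, 4096)` gives 262144 calls to tie up 1 GiB. That is well within reach of a simple loop, which is why a slow leak like this still matters.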
Simple Exploit Proof-of-Concept (PoC)
While a full exploit script would depend on kernel and distro details, conceptually, it looks like running the vulnerable syscall many times:
// Pseudocode: ioctl_or_syscall_trigger_extract_user_to_sg() is a placeholder, not a real API
for (int i = 0; i < NUM_ITERATIONS; ++i) {
    // prepare user buffer...
    ioctl_or_syscall_trigger_extract_user_to_sg();
    // each iteration leaks pinned pages if the error path is taken...
}
// Continue until OOM or system instability.
How to Fix It
The kernel team resolved this issue in Linux 6.4.12 by making sure all error exits in extract_user_to_sg correctly unpin pages. The fix ensures that for every path where pages are pinned, they are eventually released if an error happens.
If your system runs a kernel older than 6.4.12, you should upgrade or ask your distribution for a patched kernel.
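A quick first check is to compare the running kernel's release string against 6.4.12. The following sketch does that with uname(2). The helper names are invented for illustration, and the result is only a heuristic: a plain version comparison cannot tell whether a distribution has backported the fix, or whether an older kernel contains the affected code at all, so the distribution's security advisory remains the authoritative source.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Parses "major.minor[.patch]..." from a release string.
 * Returns 0 on success, -1 if major and minor cannot be read. */
int parse_release(const char *release, int *major, int *minor, int *patch)
{
    *major = *minor = *patch = 0;
    return sscanf(release, "%d.%d.%d", major, minor, patch) >= 2 ? 0 : -1;
}

/* Returns 1 if the release is older than 6.4.12, 0 if it is not,
 * and -1 if the string cannot be parsed. */
int release_predates_fix(const char *release)
{
    int major, minor, patch;

    if (parse_release(release, &major, &minor, &patch) != 0)
        return -1;
    if (major != 6)
        return major < 6;
    if (minor != 4)
        return minor < 4;
    return patch < 12;
}

/* Same check against the running kernel; -1 if uname() fails. */
int running_kernel_predates_fix(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_predates_fix(u.release);
}
```

Suffixes such as `-generic` or `-rc1` are ignored, because parsing stops after the numeric components. A return value of 1 from `running_kernel_predates_fix()` means the kernel deserves a closer look, not that it is definitely vulnerable.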
References and Further Reading
- CVE-2023-40791 on NVD
- Kernel.org Commit Fix (for scatterlist.c)
- Syzkaller Automated Kernel Fuzzer
- Linux Kernel Documentation: Pinning Pages
- Original public report (oss-security)
Final Thoughts
While CVE-2023-40791 isn't the most dramatic Linux kernel vulnerability, it's a strong reminder that even small cleanup bugs can add up to major operational risks, especially in multi-user environments. Memory pinning is tricky business, and proper error handling is crucial. If you’re a sysadmin or developer, check your kernels and keep systems patched!
Timeline
Published on: 10/16/2023 03:15:09 UTC
Last modified on: 11/10/2023 18:15:08 UTC