CVE-2024-53138 is a page reference-counting bug in the kTLS (Kernel Transport Layer Security) transmit (TX) path of the mlx5e network driver. On affected Linux systems, the driver could release a folio more times than it had referenced it, which showed up when serving data from modern NFS setups that use large folios.
This post covers:
- How mixing up get_page() and page_ref_inc() led to trouble
- The conditions under which the bug could cause problems (hint: think zero-copy sendfile with NFS large folios)
What Is kTLS and Why TX Page Counting Matters
kTLS lets applications move TLS record encryption into the Linux kernel, and from there optionally onto a capable network card, resulting in faster, zero-copy serving of data (like sending files over HTTPS). To do this efficiently, the memory pages holding the data are handed to the network stack and reference-counted so they’re only freed once transmission no longer needs them.
Linux supports "folios"—memory chunks potentially larger than a single page—which recent kernels use for better performance. Now, network code must be careful not to confuse page counts when handling folios.
The Bug: kTLS + Folios + Reference Counting Chaos
The vulnerability was found in the kTLS transmit code of mlx5e, the Ethernet driver for Mellanox (now NVIDIA) ConnectX adapters. Here’s the breakdown:
- The kTLS TX code took page references with a mix of get_page() and page_ref_inc().
- On release (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() was called to drop them.
- get_page() and put_page() resolve the page to its folio and adjust the folio's refcount; page_ref_inc() adjusts the refcount of the exact page it is given, which for a large folio may be a subpage.
- Every reference taken with page_ref_inc() on a subpage was therefore released from the folio instead. The folio ended up with a lower refcount than it should have, so it could be freed while still in use (use-after-free), while the subpage kept a stray reference.
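The imbalance is easier to see in a small user-space model. The sketch below is not kernel code: `struct model_page` and the `model_*` helpers are hypothetical stand-ins that only mimic where get_page(), page_ref_inc() and put_page() keep their counts, under the assumption that the folio starts with one baseline reference.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: this mimics, but is not, the kernel's struct page. */
struct model_page {
    int refcount;             /* per-page counter */
    struct model_page *head;  /* first page of the folio this page belongs to */
};

#define FOLIO_PAGES 4

/* Set up one folio of FOLIO_PAGES pages holding a single baseline reference. */
static void model_folio_init(struct model_page pages[FOLIO_PAGES])
{
    for (size_t i = 0; i < FOLIO_PAGES; i++) {
        pages[i].refcount = 0;
        pages[i].head = &pages[0];
    }
    pages[0].refcount = 1;    /* e.g. the page cache's own reference */
}

/* Like get_page(): the reference is counted on the folio (head page). */
static void model_get_page(struct model_page *p)
{
    assert(p != NULL && p->head != NULL);
    p->head->refcount++;
}

/* Like page_ref_inc(): the reference is counted on exactly the page given. */
static void model_page_ref_inc(struct model_page *p)
{
    assert(p != NULL);
    p->refcount++;
}

/* Like put_page(): the reference is dropped from the folio (head page). */
static void model_put_page(struct model_page *p)
{
    assert(p != NULL && p->head != NULL);
    p->head->refcount--;
}

/*
 * Take one reference on each of two subpages, then release both with
 * model_put_page(). When mix_apis is non-zero, the second reference is taken
 * the page_ref_inc() way, which reproduces the mismatch.
 * Returns the folio refcount afterwards; the baseline is 1.
 */
static int model_tx_roundtrip(int mix_apis)
{
    struct model_page pages[FOLIO_PAGES];

    model_folio_init(pages);

    model_get_page(&pages[1]);
    if (mix_apis)
        model_page_ref_inc(&pages[2]);
    else
        model_get_page(&pages[2]);

    model_put_page(&pages[1]);
    model_put_page(&pages[2]);

    return pages[0].refcount;
}
```

Calling `model_tx_roundtrip(0)` leaves the folio at its baseline of 1, while `model_tx_roundtrip(1)` leaves it at 0: the folio has lost a reference it never received, which in a real kernel is the point at which it could be freed while still in use.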
Triggered scenario:
- A file is served using sendfile() over kTLS zero-copy, where the file’s backing memory is managed as a large folio (NFS setup).
- The kTLS code takes some references with get_page() and others with page_ref_inc(), but releases _all_ of them with put_page().
In simplified form, the old pattern looked like this:
// Some references were taken like this:
get_page(page); // counted on the folio that contains the page
// ...and others like this:
page_ref_inc(page); // counted on this exact page, even if it is a subpage
// On cleanup:
put_page(page); // always dropped from the folio, however the reference was taken
Right way:
Take and release each reference with APIs that operate on the same object; don’t mix them. With large folios this really matters, because a mismatch releases the folio too many times while leaving a stray reference on the subpage.
The Fix
The maintainers changed the code to always use matching APIs. For example, always use get_page()/put_page(), so that both taking and releasing a reference operate on the folio rather than on individual subpages.
Here is a simplified sample of the correct refcounting:
// Take every reference the same way:
get_page(page); // counted on the folio
// ... later, on completion:
put_page(page); // dropped from the folio, matching get_page()
Commit title: “net/mlx5e: kTLS, Fix incorrect page refcounting”
Impact
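Systems running recent kernels that combine the mlx5e driver, kTLS with zero-copy sendfile, and file data held in large folios (for example, large NFS deployments) could in rare cases hit this bug and see memory corruption or kernel crashes. Systems without that combination are not expected to exercise the affected code path.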
Exploiting the bug:
- If an attacker could reliably drive a folio's refcount too low and trigger a use-after-free, they might be able to corrupt kernel memory or escalate privileges.
- However, this would require a very specific setup: the mlx5e driver with kTLS in use, NFS with large folios, and a lot of network I/O.
Detection:
If you see unexplained kernel panics or memory errors using the Mellanox mlx5e driver with NFS and kTLS, you may have hit this bug.
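One way to check whether a machine uses NIC-offloaded kTLS transmit at all is to read the counters in /proc/net/tls_stat, where `TlsTxDevice` counts TX sessions opened with NIC cryptography. The helper below is a minimal sketch with a hypothetical name; it assumes the file's "Name value" line layout and says nothing about whether the bug was actually triggered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in a stream of "Name value" lines, the layout used by
 * /proc/net/tls_stat. Returns the value, or -1 if the name is not present.
 */
static long tls_stat_lookup(FILE *in, const char *name)
{
    char key[64];
    long value;

    assert(in != NULL && name != NULL);
    while (fscanf(in, "%63s %ld", key, &value) == 2) {
        if (strcmp(key, name) == 0)
            return value;
    }
    return -1;
}
```

Open the file with `fopen("/proc/net/tls_stat", "r")` and pass the stream together with `"TlsTxDevice"`. A result of 0 or -1 suggests the offloaded TX path has not been used on that host; a positive value only means the path is in use and the kernel version is worth checking.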
References
- Security tracker: CVE-2024-53138
- Linux kernel commit: net/mlx5e: kTLS, Fix incorrect page refcounting
- Related recent feature: NFS: Add support for large folios
What Should You Do?
- Admins: If you’re running Linux servers with NFS and Mellanox hardware, and have enabled kTLS, upgrade to a kernel that includes the CVE-2024-53138 fix.
- Developers: Audit network driver code for consistent use of refcount functions when handling folios.
Conclusion
CVE-2024-53138 is a classic example of how subtle memory management bugs—especially with the shift toward large folio support—can create security problems in low-level optimized code paths like kTLS. The Linux kernel community’s prompt action in documenting, patching, and explaining the bug demonstrates best practice in handling such complex vulnerabilities.
Got questions? Check lkml.org and your distro’s security advisories.
Timeline
Published on: 12/04/2024 15:15:13 UTC
Last modified on: 12/19/2024 09:40:07 UTC