In early 2021, a subtle but impactful security vulnerability, now identified as CVE-2021-46912, was found in the Linux kernel’s TCP congestion control sysctls. This bug affected the way the kernel handled sysctl settings for congestion control within network namespaces, inadvertently creating a namespace isolation leak.
If you’re running containers or isolated network environments on Linux, this flaw could allow one container to tamper with global TCP congestion control policies, affecting networking performance, stability, and potentially even security of *all* containers and the host. Let’s break down the bug, see how it happened, and explore the patch and exploitation details with code snippets.
What Are TCP Congestion Controls?
Linux supports multiple TCP congestion control algorithms (cubic, reno, and others). Two sysctls describe them: /proc/sys/net/ipv4/tcp_available_congestion_control is a read-only list of the algorithms currently available, and /proc/sys/net/ipv4/tcp_allowed_congestion_control is the subset that unprivileged processes may select, which can also be set by writing to it. For security and performance, the ability to change this setting should be constrained per network namespace (netns), especially in a containerized world.
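To look at both lists side by side, a minimal sketch along the following lines could be used. The function name list_cc and its optional directory argument are assumptions made for illustration (the argument exists only so the function can be pointed at a test directory); the two file names are the ones mentioned above.

```bash
#!/usr/bin/env bash
# Sketch: print the available and allowed congestion control lists.
# list_cc is a hypothetical helper name, not part of procps or iproute2.

list_cc() {
    # Optional first argument: directory holding the two sysctl files.
    local dir="${1:-/proc/sys/net/ipv4}"
    local name
    for name in tcp_available_congestion_control tcp_allowed_congestion_control; do
        if [ -r "$dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s: (not readable)\n' "$name"
        fi
    done
}

list_cc
```

Run inside a container or network namespace, this shows what that namespace sees for both lists.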
The Vulnerability
Issue:
- tcp_allowed_congestion_control was designed to be global but *writable* from inside any network namespace.
- Any modification from one namespace would “leak” into all others—even into the host net namespace.
```c
/* Simplified pre-patch entry in ipv4_net_table */
{
    .procname     = "tcp_allowed_congestion_control",
    .data         = NULL,   /* no per-netns storage */
    .proc_handler = proc_allowed_congestion_control,
    .mode         = 0644,   /* writable */
},
```
- When a process wrote to /proc/sys/net/ipv4/tcp_allowed_congestion_control inside a network namespace, it *actually* wrote to the global kernel value.
```bash
echo 'reno' > /proc/sys/net/ipv4/tcp_allowed_congestion_control
```
- Suddenly, all other containers (and the host) have tcp_allowed_congestion_control set to reno as well!
Root cause:
- The sysctls tcp_allowed_congestion_control and tcp_available_congestion_control have always been global.
- Moving them into ipv4_net_table (meant for per-netns sysctls) made them appear isolated, but they still had no per-namespace data behind them.
- The per-netns sysctl setup did *not* check whether .data was NULL, and the handlers for these entries have no way to reach a specific namespace, so every access went to global state.
- Any write therefore leaked everywhere, defeating the isolation that network namespaces are supposed to provide for these TCP parameters.
The Patch – Making the Sysctl Read-Only
Solution:
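Since the only reason to expose these sysctls inside each namespace is to let it *read* which algorithms are available or allowed, the fix forces table entries without per-namespace storage (a NULL .data pointer) to be read-only in every network namespace other than the initial one. The initial (host) namespace can still change the setting.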
Patch Example (Simplified)
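```diff
  /* effective mode of tcp_allowed_congestion_control in a non-initial netns */
- .mode = 0644, // writable by root
+ .mode = 0444, // write bits cleared; still 0644 in the initial (host) netns
```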
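The mode change is plain bit arithmetic: the upstream fix, titled "net: Make tcp_allowed_congestion_control readonly in non-init netns", clears the write bits (mask 0222) on entries that have no per-namespace data. The sketch below only reproduces that arithmetic in the shell; clear_write_bits is a hypothetical helper name and is not a kernel interface.

```bash
#!/usr/bin/env bash
# Sketch: effect of clearing the write bits (mask 0222) on a permission mode.
# clear_write_bits is an illustrative name, not a kernel or procps interface.

clear_write_bits() {
    # $1 is an octal mode such as 0644; the leading 0 makes the shell
    # parse it as octal inside the arithmetic expansion.
    printf '%04o\n' "$(( $1 & ~0222 ))"
}

clear_write_bits 0644   # prints 0444
clear_write_bits 0600   # prints 0400
```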
Now, write attempts from inside a non-initial network namespace fail:
```bash
$ echo 'reno' > /proc/sys/net/ipv4/tcp_allowed_congestion_control
# rejected with a permission error (the exact message depends on the shell)
```
Reproducing the Leak
The following steps, run as root on an unpatched kernel, demonstrate the leak.
1. Set Up Two Network Namespaces
```bash
# Create two isolated network namespaces
ip netns add ns1
ip netns add ns2
```
2. Write the Sysctl in ns1 and Read It in ns2
```bash
# Write from inside ns1
ip netns exec ns1 sh -c "echo 'reno' > /proc/sys/net/ipv4/tcp_allowed_congestion_control"
# Read from inside ns2
ip netns exec ns2 cat /proc/sys/net/ipv4/tcp_allowed_congestion_control
# Output: reno
```
Note: The change made in ns1 is visible in ns2, because the sysctl is global.
3. Confirm on Host
```bash
cat /proc/sys/net/ipv4/tcp_allowed_congestion_control
# Output: reno
```
This demonstrates container-to-container (and container-to-host) interference that bypasses namespace isolation, a serious security issue in shared or multi-tenant environments.
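Because the steps above change a host-wide setting on unpatched kernels, it may be worth saving the original value first and restoring it afterwards. The following is a minimal sketch; save_value and restore_value are hypothetical helper names, and the backup location is up to you.

```bash
#!/usr/bin/env bash
# Sketch: save a sysctl value before experimenting and restore it afterwards.
# save_value/restore_value are hypothetical helpers, not part of iproute2 or procps.

save_value() {      # usage: save_value <sysctl-file> <backup-file>
    cat "$1" > "$2"
}

restore_value() {   # usage: restore_value <sysctl-file> <backup-file>
    cat "$2" > "$1"
}
```

For example, call save_value with /proc/sys/net/ipv4/tcp_allowed_congestion_control and a backup path before step 2, call restore_value from the host namespace after step 3, and then remove the test namespaces with ip netns del ns1 and ip netns del ns2.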
Impact
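- Containers: A malicious or misconfigured container can change the allowed congestion control algorithms for every other container and for the host, degrading or breaking TCP performance policies globally.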
How to Mitigate
- Upgrade your kernel: Run a version in which this sysctl is read-only outside the initial network namespace (Linux 5.13 or later, or a stable or distribution kernel with the fix backported).
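One way to check a given system is to look at the permission bits the sysctl file exposes from inside a non-initial namespace. The sketch below assumes GNU coreutils stat (the -c '%a' form) and uses a hypothetical helper name, has_write_bits. It only inspects the file mode, so treat the result as a hint, not as proof that the kernel is patched.

```bash
#!/usr/bin/env bash
# Sketch: report whether a sysctl file exposes any write permission bits.
# has_write_bits is a hypothetical helper; stat -c is the GNU coreutils form.

has_write_bits() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2
    # Interpret the mode as octal and test the owner/group/other write bits.
    [ $(( 0$mode & 0222 )) -ne 0 ]
}

SYSCTL=/proc/sys/net/ipv4/tcp_allowed_congestion_control
if [ -e "$SYSCTL" ]; then
    if has_write_bits "$SYSCTL"; then
        echo "writable in this namespace"
    else
        echo "read-only in this namespace"
    fi
fi
```

Saved under a name of your choice and run with ip netns exec inside a test namespace, a fixed kernel would be expected to take the read-only branch, while the same script in the initial namespace should still report the file as writable.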
Further Reading
- Linux Kernel Commit Fixing CVE-2021-46912
- Red Hat Security Advisory for CVE-2021-46912
- LWN coverage: “Kernel TCP sysctl leak between namespaces”
- Linux Network Namespace Documentation
Conclusion
CVE-2021-46912 is a good reminder of how small design oversights in kernel sysctls can lead to major isolation failures. On modern Linux, containers and other isolated environments rely heavily on sysctl isolation, so make sure your systems are patched.
If you're running a kernel older than 5.13 that lacks the backported fix, upgrade promptly to prevent one container from tampering with TCP congestion control policy for every other container and the host.
Timeline
Published on: 02/27/2024 07:15:07 UTC
Last modified on: 04/17/2024 16:53:39 UTC