runc is the low-level command-line runtime that Docker, Podman, and related projects use to run Linux containers. It is widely deployed, and in early 2023 a security flaw, CVE-2023-25809, was found in how it handles cgroups in rootless containers. This post breaks down what happened, how it works, how to mitigate it, and why it matters, with code snippets and references.
What is the Problem? (TL;DR)
- In some setups, rootless runc (versions before 1.1.5) makes /sys/fs/cgroup writable inside containers.
- Attackers inside such containers can modify the cgroup hierarchy on the host that belongs to the user who launched the container. Other users' hierarchies are not affected.
1. Rootless in a User Namespace, No Unshared cgroup Namespace
If runc runs inside a user namespace (i.e., "rootless"), and the container is started with the host cgroup namespace (--cgroupns=host), /sys/fs/cgroup stays writable from inside the container.
Examples of affected launches
# Rootless Docker (the daemon itself runs as an unprivileged user; no --user flag is needed):
docker run --rm -it --cgroupns=host alpine sh
# Or with rootless Podman:
podman run --rm -it --cgroupns=host alpine sh
# Or nerdctl:
nerdctl run --rm -it --cgroupns=host alpine sh
On cgroup v2 hosts the default is --cgroupns=private, which is safe. On cgroup v1 hosts the default is host.
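To confirm which case a given container falls into, one option is to compare cgroup namespace IDs on the host and inside the container. The following is a minimal sketch, not part of the advisory: the helper names `cgroupns_id` and `same_cgroupns` are made up for this example, and it only relies on the `/proc/self/ns/cgroup` symlink that Linux exposes for every process.

```shell
# Print the cgroup namespace ID of the current process, e.g. cgroup:[4026531835].
# Prints "unknown" when the symlink is not available (hypothetical helper name).
cgroupns_id() {
    readlink /proc/self/ns/cgroup 2>/dev/null || echo "unknown"
}

# Return 0 only when both IDs are known and identical (hypothetical helper name).
same_cgroupns() {
    [ "$1" != "unknown" ] && [ "$1" = "$2" ]
}

echo "cgroup namespace of this shell: $(cgroupns_id)"
```

Run `cgroupns_id` once on the host and once inside the container, then pass both values to `same_cgroupns`. If they match, the container shares the host cgroup namespace and the first condition applies.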
2. runc Outside a User Namespace with rbind,ro on /sys (Very Rare)
This case applies when runc runs outside a user namespace and /sys is mounted with the rbind,ro options.
Example (unusual)
sudo runc spec --rootless
sudo runc run <container-id>
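Both conditions can be read off a bundle's config.json. The sketch below assumes `jq` is installed and that the file follows the OCI runtime spec layout (`linux.namespaces`, `mounts`); the function name `audit_config` is hypothetical. It reports whether a cgroup namespace is requested and whether /sys is mounted with both rbind and ro.

```shell
# Report the two settings relevant to CVE-2023-25809 for one OCI config.json.
# Assumes jq is available; audit_config is a made-up name for this example.
audit_config() {
    jq -r '
      any(.linux.namespaces[]?; .type == "cgroup") as $cgns
      | any(.mounts[]?;
            .destination == "/sys"
            and ((.options // []) | (any(. == "rbind") and any(. == "ro")))) as $sys
      | "cgroupns_unshared=\($cgns) sys_rbind_ro=\($sys)"
    ' "$1"
}

# Hypothetical usage from inside a bundle directory:
#   audit_config config.json
```

A result of `cgroupns_unshared=false` on a rootless setup points at the first condition, and `sys_rbind_ro=true` with runc running outside a user namespace points at the second. Treat the output as a hint for manual review rather than a verdict.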
What Can an Attacker Do?
If the exploit conditions are met, a user inside the container can get write access to their own cgroup hierarchy on the host (for example, /sys/fs/cgroup/user.slice/user-$(id -u).slice/...). This allows:
- Modifying resource limits from inside the container, possibly bypassing restrictions.
- Interfering with other containers/processes under the same user slice.
Demonstrating the Exploit
Let's see what exploitation could look like inside an affected container.
Step 1: Start a Rootless Docker Container With Host cgroup Namespace
# Assumes the Docker daemon runs in rootless mode; the container's root user then maps to your host user
docker run --rm -it --cgroupns=host alpine sh
Step 2: Inside Container, Check if Writable
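# id -u prints 0 inside a rootless container, so set your host UID by hand (1000 is only an example)
HOST_UID=1000
# On systemd hosts the user-owned subtree usually sits under user@$HOST_UID.service in this slice; adjust if your layout differs
CG=/sys/fs/cgroup/user.slice/user-$HOST_UID.slice/user@$HOST_UID.service
ls -ld "$CG"
# The owner shows as root because your host UID maps to root in the container; the real test is whether a write succeeds
mkdir "$CG/cve-test"
If the directory is created, you're vulnerable. (cgroupfs does not allow creating regular files, so touch is not a reliable test.) Clean up afterwards with rmdir "$CG/cve-test".
You could also write new limits, for example (not a typical use case, only a PoC):
echo 100 > "$CG/pids.max"
cat "$CG/pids.max"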
Warning: Don't do this on production! You may break your system.
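A read-only way to check for exposure, without touching the hierarchy at all, is to look at how /sys/fs/cgroup is mounted inside the container. The sketch below is a heuristic under stated assumptions: it parses the standard `/proc/self/mounts` format, the helper name `cgroup_mount_mode` is made up for this example, and an `rw` result only means the mount is not read-only, not that every file under it is writable by you.

```shell
# Print "rw", "ro", or "absent" for the /sys/fs/cgroup entry in a mounts file.
# $1 defaults to /proc/self/mounts; cgroup_mount_mode is a hypothetical helper name.
cgroup_mount_mode() {
    mounts_file=${1:-/proc/self/mounts}
    if [ ! -r "$mounts_file" ]; then
        echo "absent"
        return 0
    fi
    # Fields: device mountpoint fstype options ...; the last matching entry wins.
    awk '$2 == "/sys/fs/cgroup" {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "rw" || opts[i] == "ro") mode = opts[i]
         }
         END { print (mode == "" ? "absent" : mode) }' "$mounts_file"
}

echo "/sys/fs/cgroup is mounted: $(cgroup_mount_mode)"
```

Run inside the container, `ro` is the expected result for an unprivileged container; `rw` in a rootless container started with --cgroupns=host is the symptom described above.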
Official Fix
- Upgrade runc to 1.1.5 or newer (release notes).
- Most distributions have released a fixed package.
- If you use Docker/Podman/nerdctl, update the engine packages and confirm that the runc they use is 1.1.5 or newer.
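To check the installed version against the fixed release, a small comparison helper is enough. This is a sketch, not an official tool: `version_ge` is a made-up helper, it compares only the first three numeric components, and it ignores suffixes such as `-rc1` or `+ds1`. Distributions sometimes backport fixes into older version numbers, so a "too old" result deserves a look at the package changelog before drawing conclusions.

```shell
# Return 0 if dotted version $1 is >= $2 (first three numeric components only).
# Suffixes after -, + or ~ are stripped before comparing. Hypothetical helper name.
version_ge() {
    IFS=. read -r a1 a2 a3 rest <<EOF
${1%%[-+~]*}
EOF
    IFS=. read -r b1 b2 b3 rest <<EOF
${2%%[-+~]*}
EOF
    if [ "${a1:-0}" -ne "${b1:-0}" ]; then [ "${a1:-0}" -gt "${b1:-0}" ]; return; fi
    if [ "${a2:-0}" -ne "${b2:-0}" ]; then [ "${a2:-0}" -gt "${b2:-0}" ]; return; fi
    [ "${a3:-0}" -ge "${b3:-0}" ]
}

if command -v runc >/dev/null 2>&1; then
    # The first line of `runc --version` looks like: runc version 1.1.5
    installed=$(runc --version | awk '/^runc version/ {print $3}')
    if version_ge "$installed" 1.1.5; then
        echo "runc $installed: at or above 1.1.5"
    else
        echo "runc $installed: older than 1.1.5, upgrade recommended"
    fi
else
    echo "runc not found in PATH"
fi
```

Note that a container engine may ship its own runc binary outside PATH, so the result applies only to the binary this shell resolves.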
Workarounds
- Always use --cgroupns=private (docker run --cgroupns=private). This is now the default on cgroup v2 hosts.
- Mask /sys/fs/cgroup by adding it to maskedPaths (under linux in the OCI config.json):
```json
{
  "linux": {
    "maskedPaths": [
      "/sys/fs/cgroup"
    ]
  }
}
```
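For bundles you manage by hand, the masking workaround can be applied with a short filter. This sketch assumes `jq` is installed and that the bundle uses the OCI runtime spec layout where maskedPaths lives under `linux`; the function name `mask_cgroup` is hypothetical. It prints the updated document and leaves the input file untouched.

```shell
# Print config.json with /sys/fs/cgroup added to linux.maskedPaths, without duplicates.
# Assumes jq is available; mask_cgroup is a made-up name for this example.
# Note: unique also sorts the list, which does not change its meaning.
mask_cgroup() {
    jq '.linux.maskedPaths = (((.linux.maskedPaths // []) + ["/sys/fs/cgroup"]) | unique)' "$1"
}

# Hypothetical usage from inside a bundle directory:
#   mask_cgroup config.json > config.json.new && mv config.json.new config.json
```

Writing to a temporary file first keeps the original intact if `jq` fails partway through.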
More About runc and cgroups
- runc repo
- runc security docs
- Open Container Initiative (OCI) runtime spec
References
- CVE-2023-25809 at NVD
- runc official security advisory
- GitHub issue discussion
- runc 1.1.5 release notes
Conclusion
CVE-2023-25809 shows that even well-tested software like runc can expose underlying host resources in tricky scenarios. The best practice is to update your container runtime, avoid sharing the host cgroup namespace unless required, and use namespace or filesystem masking features to block access to sensitive paths.
Timeline
Published on: 03/29/2023 19:15:00 UTC
Last modified on: 04/06/2023 17:41:00 UTC