In June 2021, the security community discovered a serious vulnerability, now tracked as CVE-2021-47081, in the Linux kernel's habanalabs/gaudi device driver. The flaw lies in how the function gaudi_memset_device_memory handles a reference-counted command buffer: a use-after-free (UAF) that could lead to kernel panics or, potentially, code execution.
This post dissects the vulnerability, shows the risky code, explains its impact, describes how it was fixed, and provides original references. If you're running AI workloads on Habana Gaudi accelerators, or maintaining systems that use these drivers, this info might save you a big headache.
## What is habanalabs/gaudi?
*habanalabs/gaudi* is a driver in the Linux kernel for Habana Labs' Gaudi AI Processors. These chips are used for AI compute acceleration, meaning secure and robust memory handling is paramount.
## The Bug Location
The vulnerable function is gaudi_memset_device_memory, in the habanalabs/gaudi part of the kernel source. Inside it, a command buffer (cb) is created by hl_cb_kernel_create(), which returns the cb holding two references.
If allocating a job with hl_cs_allocate_job() fails, the code jumps to its cleanup branch, and that cleanup can cause a use-after-free (UAF) because it mishandles the command buffer's reference count:
- hl_cb_put(cb) drops one of the two references.
- If another thread has also released its reference (so the refcount drops to 0), the cb may be freed.
- After this, the code still reads cb->id even though cb could already be freed: a classic use-after-free!
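The sequence above can be replayed in a small userspace model. This is only a sketch: `demo_cb`, `demo_cb_create`, `demo_cb_put`, and `demo_object_freed_before_id_read` are hypothetical names invented for illustration, not driver code, and the "other thread" is simulated by a second sequential call rather than real concurrency. The model never touches freed memory; it only reports whether the object was already gone at the point where the vulnerable code would read cb->id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the driver's command buffer. */
struct demo_cb {
    int refcount;      /* plain int: this model is single-threaded */
    unsigned long id;
    bool *freed_flag;  /* set to true when the object is released */
};

/* Mirrors the behaviour described for hl_cb_kernel_create():
 * the new object starts with two references. */
static struct demo_cb *demo_cb_create(unsigned long id, bool *freed_flag)
{
    struct demo_cb *cb = malloc(sizeof(*cb));
    if (!cb)
        return NULL;
    cb->refcount = 2;
    cb->id = id;
    cb->freed_flag = freed_flag;
    *freed_flag = false;
    return cb;
}

/* Drop one reference; free the object when the count reaches zero. */
static void demo_cb_put(struct demo_cb *cb)
{
    if (--cb->refcount == 0) {
        *cb->freed_flag = true;
        free(cb);
    }
}

/* Replays the risky ordering sequentially. Returns true if the object
 * was already freed where the vulnerable code would read cb->id. */
static bool demo_object_freed_before_id_read(void)
{
    bool freed = false;
    struct demo_cb *cb = demo_cb_create(42, &freed);
    if (!cb)
        return false;
    demo_cb_put(cb);  /* the cleanup branch drops its reference */
    demo_cb_put(cb);  /* "another thread" drops the last reference */
    /* The vulnerable code would dereference cb here; we only report. */
    return freed;
}
```

Calling `demo_object_freed_before_id_read()` returns true in this model, which is exactly the state in which a later `cb->id` read would be a use-after-free.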
## Real Risk
A use-after-free leaves the code holding a dangling pointer. An attacker who can get the freed memory reallocated with data of their choosing controls what that pointer reads or writes, which can sometimes lead to arbitrary code execution. In the kernel, a UAF can be extremely dangerous, possibly leading to local privilege escalation.
## Vulnerable Code (Simplified Extract)

    cb = hl_cb_kernel_create(...);
    // cb has refcount 2
    ...
    job = hl_cs_allocate_job(...);
    if (!job)            // job allocation failed
        goto release_cb;
    ...
    release_cb:
    hl_cb_put(cb);       // cb might be freed if another thread also released
    // Use-after-free: cb could be invalid here!
    job_id = cb->id;
## Fixed Code (Patched)
The patch makes a simple but crucial change: it copies cb->id into a new local variable *before* the call that may free cb:

    cb = hl_cb_kernel_create(...);
    ...
    job = hl_cs_allocate_job(...);
    if (!job)            // job allocation failed
        goto release_cb;
    ...
    release_cb:
    id = cb->id;         // Copy value BEFORE releasing the cb!
    hl_cb_put(cb);
    // Now use id instead of cb->id
    job_id = id;

The pointer is no longer dereferenced after the put, so a concurrent free cannot turn the read into a use-after-free.
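The same "copy, then put" ordering can be wrapped in a helper so callers cannot get it wrong. The following is a minimal userspace sketch, not the driver's code: `safe_cb`, `safe_cb_create`, `safe_cb_put`, and `safe_release_and_get_id` are hypothetical names, and the reference count is a plain int because the example is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, standing in for a command buffer. */
struct safe_cb {
    int refcount;
    unsigned long id;
};

/* The object starts with two references, as described for the driver. */
static struct safe_cb *safe_cb_create(unsigned long id)
{
    struct safe_cb *cb = malloc(sizeof(*cb));
    if (!cb)
        return NULL;
    cb->refcount = 2;
    cb->id = id;
    return cb;
}

/* Drop one reference; free the object when the count reaches zero. */
static void safe_cb_put(struct safe_cb *cb)
{
    if (--cb->refcount == 0)
        free(cb);
}

/* Read every field we still need while our reference keeps the object
 * alive, then drop the reference. The caller must not touch cb again. */
static unsigned long safe_release_and_get_id(struct safe_cb *cb)
{
    unsigned long id = cb->id;  /* copy first */
    safe_cb_put(cb);            /* may free cb */
    return id;                  /* only the copy is used afterwards */
}
```

Even in the worst case, where the other holder has already dropped its reference and the helper's put is the one that frees the object, the id was read while the object was still valid.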
## The Actual Patch
The change in the official patch, abridged and annotated:

    diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
    index 5a445e679f39..db3fdff7d02 100644
    --- a/drivers/misc/habanalabs/gaudi/gaudi.c
    +++ b/drivers/misc/habanalabs/gaudi/gaudi.c
    @@ ... @@
    -    hl_cb_put(cb);     // This might free cb!
    -    job_id = cb->id;   // Use-after-free
    +    id = cb->id;       // Safe!
    +    hl_cb_put(cb);
    +    job_id = id;
While there's no published exploit for CVE-2021-47081, the theoretical attack scenario is:
1. The attacker gets a process to trigger a failure in hl_cs_allocate_job(), sending the code down the release_cb branch.
2. The attacker times their actions so that another thread releases the cb's last reference just before the original thread reads cb->id.
3. When cb->id is read, cb may point to freed memory, or to memory that has since been reallocated with attacker-controlled contents.
4. Further exploitation may be possible if the attacker can place chosen code or data in the reused region.
On a shared AI server, a malicious user with the right timing and patience could cause disruptions or possibly attack other tenants or users.
To mitigate the issue:
- Upgrade to a kernel version containing the patch for CVE-2021-47081.
- If you maintain a downstream or custom kernel, backport the fix to all deployed systems using habanalabs/gaudi.
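As a rough aid for the upgrade check, the sketch below compares the running kernel's release string against a minimum version supplied by the caller. The names `kver`, `kver_parse`, `kver_cmp`, and `running_kernel_at_least` are hypothetical helpers, and no fixed version is assumed here: take the minimum from your distribution's advisory. Because distributions often backport fixes without changing the upstream version number, treat the result as a hint rather than proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Upstream-style kernel version triple. */
struct kver {
    int major, minor, patch;
};

/* Parse strings such as "5.12.4-generic" or "5.13".
 * A missing patch level is treated as 0. Returns false if malformed. */
static bool kver_parse(const char *release, struct kver *out)
{
    out->patch = 0;
    int n = sscanf(release, "%d.%d.%d",
                   &out->major, &out->minor, &out->patch);
    return n >= 2;
}

/* Returns <0, 0, or >0 if a is older than, equal to, or newer than b. */
static int kver_cmp(struct kver a, struct kver b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch)
        return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if uname() reports a release at least as new as min.
 * Returns false if the release string cannot be read or parsed. */
static bool running_kernel_at_least(struct kver min)
{
    struct utsname u;
    struct kver cur;
    if (uname(&u) != 0 || !kver_parse(u.release, &cur))
        return false;
    return kver_cmp(cur, min) >= 0;
}
```

A caller would fill a `struct kver` with the first fixed release named in their vendor's advisory and pass it to `running_kernel_at_least()`.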
## References
- CVE-2021-47081 at cve.org / NVD
- Linux kernel source patch
- habanalabs/gaudi driver code
## Conclusion
CVE-2021-47081 is a textbook example of how a seemingly minor pointer handling mistake can jeopardize the whole system's security. With careful analysis and prompt adoption of the patch, the risk can be avoided. If your infrastructure involves AI hardware and drivers, pay special attention to kernel and driver security advisories like this one.
Stay updated, review your kernel’s release notes, and test updates in dev before pushing to production!
*This article is based on analysis of public patches, CVE summaries, and the author’s independent research.*
## Timeline
Published on: 03/01/2024 22:15:47 UTC
Last modified on: 12/09/2024 18:45:24 UTC