CVE-2021-4453 - How a Linux Kernel Memory Leak Was Patched in AMD GPU Drivers
Linux kernel developers patched a memory leak in the AMD GPU power management code. The vulnerability, tracked as CVE-2021-4453, stems from a missing free in the AMDGPU DRM driver's power management module. This post breaks down what the issue was, how it was fixed, and what the risk was, in easy-to-understand terms, with code snippets and references to the original sources.
The Vulnerability: Memory Leak in gpu_metrics_table
In Linux, the DRM (Direct Rendering Manager) subsystem manages graphics. The AMDGPU driver is responsible for AMD GPUs, including power management (changing performance and power usage profiles). Within this part of the code, a table named gpu_metrics_table is used to collect GPU metrics.
In the function renoir_init_smc_tables(), memory is allocated for gpu_metrics_table using kzalloc (which allocates zeroed memory). However, the matching cleanup function, smu_v12_0_fini_smc_tables(), never released that memory with kfree. As a result, every time the driver was torn down and re-initialized, the table allocated by the previous initialization was leaked.
Why Does This Matter?
If this function runs many times (for instance, if the kernel module is loaded/unloaded repeatedly), the system would slowly lose memory. Over time, this kind of "leak" could cause stability problems, especially on systems running a long time without rebooting.
Here’s a simplified version of what the offending code looked like:
// Allocates memory for gpu_metrics_table
static int renoir_init_smc_tables(struct smu_context *smu)
{
    smu->smu_table.gpu_metrics_table = kzalloc(size, GFP_KERNEL);
    // ... other code ...
}
// Cleans up allocations (the free for gpu_metrics_table is missing)
static int smu_v12_0_fini_smc_tables(struct smu_context *smu)
{
    // Missing: kfree(smu->smu_table.gpu_metrics_table);
    // ... other code ...
}
Notice that the cleanup function smu_v12_0_fini_smc_tables has no kfree call for gpu_metrics_table, so the pointer set during initialization is simply abandoned.
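The pattern above can be modeled in userspace C. This is a minimal sketch, not the driver's code: calloc stands in for kzalloc, and the names leaky_smu_table, leaky_init_tables, leaky_fini_tables, leaky_blocks_after, and the leaky_live_blocks counter are hypothetical, invented for illustration. Each init/fini round leaves one allocation behind because the cleanup never frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the driver's table bookkeeping. */
struct leaky_smu_table {
    void *gpu_metrics_table;
};

/* Number of allocations that have not been freed yet. */
static long leaky_live_blocks;

/* Mirrors the init path: allocate a zeroed metrics table. */
static int leaky_init_tables(struct leaky_smu_table *t, size_t size)
{
    assert(t != NULL);
    t->gpu_metrics_table = calloc(1, size); /* calloc stands in for kzalloc */
    if (!t->gpu_metrics_table)
        return -1;
    leaky_live_blocks++;
    return 0;
}

/* Mirrors the buggy cleanup path: the table is never released. */
static void leaky_fini_tables(struct leaky_smu_table *t)
{
    assert(t != NULL);
    /* Missing: free(t->gpu_metrics_table); */
    (void)t;
}

/* Runs `cycles` init/fini rounds and returns how many blocks are still live. */
static long leaky_blocks_after(int cycles, size_t size)
{
    struct leaky_smu_table t = { NULL };

    leaky_live_blocks = 0;
    for (int i = 0; i < cycles; i++) {
        if (leaky_init_tables(&t, size) != 0)
            break;
        /* The next init overwrites the pointer, orphaning this block. */
        leaky_fini_tables(&t);
    }
    return leaky_live_blocks;
}
```

Calling leaky_blocks_after(5, 64) should report five outstanding blocks, one per cycle, which is the same linear growth the driver showed on every teardown and re-initialization.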
The Fix
The fix added just two lines, ensuring that the memory is freed once gpu_metrics_table is no longer needed:
static int smu_v12_0_fini_smc_tables(struct smu_context *smu)
{
    kfree(smu->smu_table.gpu_metrics_table);
    smu->smu_table.gpu_metrics_table = NULL;
    // ... other code ...
}
This change prevents the memory leak by explicitly freeing the allocated memory during cleanup.
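The free-then-NULL pattern used by the fix can also be sketched in userspace C. As before, this is an illustrative model under stated assumptions rather than the driver's code: free stands in for kfree, and fixed_smu_table, fixed_init_tables, fixed_fini_tables, fixed_blocks_after, and fixed_live_blocks are hypothetical names. The sketch calls the cleanup twice per cycle to show that resetting the pointer makes a repeated cleanup harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the driver's table bookkeeping. */
struct fixed_smu_table {
    void *gpu_metrics_table;
};

/* Number of allocations that have not been freed yet. */
static long fixed_live_blocks;

/* Mirrors the init path: allocate a zeroed metrics table. */
static int fixed_init_tables(struct fixed_smu_table *t, size_t size)
{
    assert(t != NULL);
    t->gpu_metrics_table = calloc(1, size); /* calloc stands in for kzalloc */
    if (!t->gpu_metrics_table)
        return -1;
    fixed_live_blocks++;
    return 0;
}

/* Mirrors the fixed cleanup path: free the table, then clear the pointer. */
static void fixed_fini_tables(struct fixed_smu_table *t)
{
    assert(t != NULL);
    if (t->gpu_metrics_table)
        fixed_live_blocks--;
    free(t->gpu_metrics_table);   /* free(NULL) is a no-op, as is kfree(NULL) */
    t->gpu_metrics_table = NULL;  /* no stale pointer, so no double free */
}

/* Runs `cycles` init/fini rounds and returns how many blocks are still live. */
static long fixed_blocks_after(int cycles, size_t size)
{
    struct fixed_smu_table t = { NULL };

    fixed_live_blocks = 0;
    for (int i = 0; i < cycles; i++) {
        if (fixed_init_tables(&t, size) != 0)
            break;
        fixed_fini_tables(&t);
        fixed_fini_tables(&t); /* second call is harmless: pointer is NULL */
    }
    return fixed_live_blocks;
}
```

With the cleanup in place, fixed_blocks_after(5, 64) should report zero outstanding blocks. Clearing the pointer is what makes the second cleanup call safe: without it, the second free would operate on memory that was already released.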
Exploit Details
This vulnerability could not be used for privilege escalation or remote code execution. Its impact was limited to denial of service through gradual memory exhaustion, and triggering it by reloading the driver requires the privileges needed to load and unload kernel modules.
Practical Example
If a user or script repeatedly loaded and unloaded the GPU driver (e.g., via modprobe and rmmod), they could consume system memory until the machine slowed down or became unstable. On servers or workstations with long uptimes, this risked gradual performance degradation.
# Load and unload the module repeatedly; on vulnerable kernels, each round leaks some memory
while true; do
    sudo rmmod amdgpu
    sudo modprobe amdgpu
done
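Given enough iterations, the script above could steadily consume kernel memory that is not returned until the next reboot. How quickly that becomes a problem depends on the size of the leaked table and on how often the driver is reloaded. While this requires local root access, it is still an important type of bug to fix for system stability.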
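To put the growth rate in perspective, the leak is linear: total bytes leaked equals the number of reload cycles times the bytes leaked per cycle. The helper below is a small sketch of that arithmetic. The function name reload_cycles_to_leak is hypothetical, and the per-cycle size is a parameter because the actual size of gpu_metrics_table is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the number of load/unload cycles needed to leak at least
 * `target_bytes`, assuming every cycle leaks exactly `bytes_per_cycle`.
 * `bytes_per_cycle` must be non-zero.
 */
static uint64_t reload_cycles_to_leak(uint64_t target_bytes,
                                      uint64_t bytes_per_cycle)
{
    assert(bytes_per_cycle != 0);
    /* Ceiling division written to avoid overflow on large targets. */
    return target_bytes / bytes_per_cycle
           + (target_bytes % bytes_per_cycle != 0);
}
```

For example, with an assumed (not measured) leak of 256 bytes per cycle, reload_cycles_to_leak(1073741824, 256) gives 4194304, so leaking 1 GiB would take roughly four million reloads. This is why the leak is better described as slow degradation than as an immediate outage.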
The original patch and related records:
- drm/amd/pm: fix a potential gpu_metrics_table memory leak
- Upstream commit fixing the issue
- The CVE record
Summary Table
| Item | Details |
|---------------------------|---------------------------------------------|
| CVE ID | CVE-2021-4453 |
| Affected Component | AMDGPU DRM Driver (Power Management) |
| Vulnerability | Memory leak in gpu_metrics_table |
| Main Functions Affected | renoir_init_smc_tables, smu_v12_0_fini_smc_tables |
| Impact | System memory exhaustion over time (DoS) |
| Fixed in | Linux kernel mainline (late 2021) |
| Security Risk Level | Low (local DoS; reloading the module requires root) |
Conclusion
CVE-2021-4453 might not be as scary as a remote exploit, but it’s an excellent example of why memory management matters in kernel code. Over time, even tiny leaks contribute to system instability. Thanks to careful review by both AMD and Linux kernel developers, this bug was quickly fixed.
Further Reading
- Linux Kernel Documentation
- DRM AMDGPU Driver Code
- What is a Memory Leak? (simple explanation)
Timeline
Published on: 02/26/2025 06:37:29 UTC
Last modified on: 03/18/2025 18:52:26 UTC