Summary:
A security vulnerability was discovered and patched in the Linux kernel's AMDGPU driver. It affects the cleaner shader deinitialization path in the gfx_v9_0 module (GFX9 hardware, such as AMD Vega GPUs). The flaw could lead to memory leaks and unpredictable GPU state. Tracked as CVE-2024-56753, it was fixed by adding the missing cleaner shader cleanup to the module's teardown code.

## The Vulnerability Explained

The AMDGPU kernel driver, which supports modern AMD graphics hardware, includes special code paths for managing *cleaner shaders*. These are small GPU programs that clear on-chip state, such as local data share (LDS) memory and registers, between workloads so that one process cannot see another's leftover data. For GFX9 (Vega and similar GPUs), the initialization steps for the cleaner shader were present, but the matching shutdown and deallocation were missing. As a result, every time the device was initialized and torn down, the resources used by the cleaner shader were not freed, leading to:

- Unclean GPU state, risking unpredictable errors on re-initialization.

- Potential *denial of service* scenarios if resources are repeatedly leaked, though direct privilege escalation is unlikely via this bug alone.
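To make the leak pattern concrete, here is a minimal userspace model in C. It is not driver code: `struct mock_device`, `mock_sw_init()`, `mock_sw_fini()`, `simulate_cycles()` and the 4096-byte buffer size are hypothetical, chosen only to show how an allocation made at init but skipped at teardown accumulates once per device lifecycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each device instance owns one cleaner shader buffer. */
#define CLEANER_SHADER_BYTES 4096u /* assumed size, for illustration only */

static size_t system_bytes_in_use; /* memory held across all device instances */

struct mock_device {
	bool has_cleaner_shader; /* true while this instance holds the buffer */
};

/* sw_init: allocate the cleaner shader buffer (this step existed before the fix). */
static void mock_sw_init(struct mock_device *dev)
{
	dev->has_cleaner_shader = true;
	system_bytes_in_use += CLEANER_SHADER_BYTES;
}

/* sw_fini: only the patched variant releases the buffer. */
static void mock_sw_fini(struct mock_device *dev, bool patched)
{
	if (patched && dev->has_cleaner_shader) {
		/* The fix: release the buffer before the device goes away. */
		system_bytes_in_use -= CLEANER_SHADER_BYTES;
		dev->has_cleaner_shader = false;
	}
	/* Unpatched: the device instance disappears but its buffer stays counted. */
}

/* Simulate `cycles` remove/rescan cycles; each one creates a fresh device instance. */
static size_t simulate_cycles(unsigned int cycles, bool patched)
{
	system_bytes_in_use = 0;
	for (unsigned int i = 0; i < cycles; i++) {
		struct mock_device dev = { .has_cleaner_shader = false };
		mock_sw_init(&dev);
		mock_sw_fini(&dev, patched);
	}
	return system_bytes_in_use;
}
```

For example, `simulate_cycles(100, false)` returns 409600 (100 * 4096 bytes still held), while `simulate_cycles(100, true)` returns 0.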

## The Fix: Code Details

The patch fixes the missing cleanup by adding a single line to the gfx_v9_0_sw_fini() function, ensuring that the cleaner shader's resources are properly freed.

### Before the Fix: No Deinitialization

```c
// File: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

static int gfx_v9_0_sw_fini(void *handle)
{
    // ...finalization steps for GFX9...
    // Cleaner shader resources are NOT freed!
    return 0;
}
```

### After the Fix: Proper Cleanup

```c
// File: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

static int gfx_v9_0_sw_fini(void *handle)
{
    struct amdgpu_device *adev = (struct amdgpu_device *)handle;

    amdgpu_gfx_cleaner_shader_sw_fini(adev); // <-- New line added: free the cleaner shader
    // ...other finalization steps...
    return 0;
}
```

### What does amdgpu_gfx_cleaner_shader_sw_fini() do?

It unmaps and frees the buffer that holds the cleaner shader, so no memory or related resources are left behind in kernel space after device removal or shutdown.
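The helper's exact body depends on the kernel version. The sketch below is a userspace analogue, assuming it follows the common kernel teardown pattern: skip cleanup if the feature was never enabled, release the buffer, and clear the stored pointers so a second call is harmless. All `mock_*` names, the struct fields, and the fake GPU address are hypothetical, and `malloc()`/`free()` stand in for the driver's buffer-object allocator.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's per-device cleaner shader fields. */
struct mock_gfx {
	bool enable_cleaner_shader;        /* was the cleaner shader set up at all? */
	void *cleaner_shader_obj;          /* buffer holding the shader binary */
	uint64_t cleaner_shader_gpu_addr;  /* where the GPU would see the buffer */
	void *cleaner_shader_cpu_ptr;      /* CPU mapping of the buffer */
};

/* Init side: allocate and "map" the buffer when the feature is enabled. */
static int mock_cleaner_shader_sw_init(struct mock_gfx *gfx, size_t size)
{
	if (!gfx->enable_cleaner_shader)
		return 0;
	gfx->cleaner_shader_obj = malloc(size);
	if (!gfx->cleaner_shader_obj)
		return -1;
	gfx->cleaner_shader_cpu_ptr = gfx->cleaner_shader_obj;
	gfx->cleaner_shader_gpu_addr = 0x100000; /* fake address for the model */
	return 0;
}

/* Fini side: the counterpart that the unpatched GFX9 teardown never called. */
static void mock_cleaner_shader_sw_fini(struct mock_gfx *gfx)
{
	if (!gfx->enable_cleaner_shader)
		return; /* nothing was allocated */
	free(gfx->cleaner_shader_obj); /* free(NULL) is a no-op, so repeat calls are safe */
	gfx->cleaner_shader_obj = NULL;
	gfx->cleaner_shader_cpu_ptr = NULL;
	gfx->cleaner_shader_gpu_addr = 0;
}
```

Clearing the pointers after freeing is what makes a repeated or error-path call to the fini function safe in this model.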

## Exploit Potential and Realistic Impact

This bug is primarily a resource leak, which makes it a lower-risk vulnerability, but one that can be triggered repeatedly:

### Exploitation Scenario

- Any privileged user or process (in practice, root) that can trigger repeated GPU "remove" and "init" cycles (for example, via hotplug scripts or PCI remove/rescan through sysfs) can keep leaking kernel resources.
- In the worst case, this could eventually exhaust kernel memory, leading to general system instability or a denial of service (DoS).

Suppose you have root privileges and can remove and re-add the AMDGPU device repeatedly:

```bash
# Run as root; requires bash for the {1..100} range.
for i in {1..100}
do
   echo 1 > /sys/bus/pci/devices/0000:XX:00.0/remove
   echo 1 > /sys/bus/pci/rescan
done
```

Replace 0000:XX:00.0 with your GPU's full PCI address (`lspci -D` shows it with the domain prefix).
Warning: This is disruptive and can crash your display/session! Only run in a test environment.

Result on unpatched kernel:
Memory is leaked each cycle, possibly locking up the system after enough repetitions.
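One rough way to watch for the leak is to record `MemAvailable` from `/proc/meminfo` before and after the loop. This is a coarse signal at best: a single cleaner shader buffer is small, it may be placed in VRAM rather than system RAM, and normal system activity adds noise. The helpers `parse_meminfo_kb()` and `read_mem_available_kb()` below are hypothetical names; the parser works on any text in the `Key:   value kB` format that `/proc/meminfo` uses.

```c
#include <stdio.h>
#include <string.h>

/* Find "key:" at the start of a line in meminfo-style text and parse its kB value. */
static int parse_meminfo_kb(const char *text, const char *key, long *out_kb)
{
	size_t key_len = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, key_len) == 0 && line[key_len] == ':')
			return sscanf(line + key_len + 1, "%ld", out_kb) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++; /* move past the newline to the next line */
	}
	return -1; /* key not found */
}

/* Read /proc/meminfo (Linux only) and report MemAvailable in kB. */
static int read_mem_available_kb(long *out_kb)
{
	char buf[8192];
	FILE *f = fopen("/proc/meminfo", "r");
	if (!f)
		return -1;
	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0'; /* make the buffer a proper C string */
	return parse_meminfo_kb(buf, "MemAvailable", out_kb);
}
```

Call `read_mem_available_kb()` before and after the remove/rescan loop and compare the two values; a steady downward trend across several runs is more telling than a single difference.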

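To check whether your system has a GFX9 (Vega-class) AMD GPU:

```bash
lspci -nn | grep -Ei 'vga|display'
```

Look for "Vega" in the device name. Internally, GFX9 chips are identified by names such as GFX900 and GFX906, which do not normally appear in `lspci` output.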
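If you also need the PCI address for the remove/rescan loop, a small C helper can list AMD display devices from sysfs. It assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>/vendor` and `class`), treats PCI vendor ID 0x1002 as AMD/ATI, and keeps only base class 0x03 (display controllers) so the GPU's HDMI audio function is skipped. It does not tell you whether a GPU is GFX9, and the function names are illustrative.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>

#define PCI_VENDOR_ID_ATI_AMD 0x1002u /* PCI vendor ID used by AMD/ATI GPUs */

/* Base class 0x03 covers VGA (0x0300xx) and other display controllers (0x0380xx). */
static bool is_amd_display(unsigned int vendor, unsigned int pci_class)
{
	return vendor == PCI_VENDOR_ID_ATI_AMD && (pci_class >> 16) == 0x03u;
}

/* Read a single hex value such as "0x1002" from a sysfs attribute file. */
static int read_sysfs_hex(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%x", out) == 1; /* %x accepts an optional 0x prefix */
	fclose(f);
	return ok ? 0 : -1;
}

/* Print the PCI address of every AMD display device; returns the count, or -1. */
static int list_amd_gpus(void)
{
	DIR *dir = opendir("/sys/bus/pci/devices");
	if (!dir)
		return -1;
	struct dirent *ent;
	char path[512];
	int found = 0;
	while ((ent = readdir(dir)) != NULL) {
		if (ent->d_name[0] == '.')
			continue; /* skip "." and ".." */
		unsigned int vendor, pci_class;
		snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vendor", ent->d_name);
		if (read_sysfs_hex(path, &vendor) != 0)
			continue;
		snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/class", ent->d_name);
		if (read_sysfs_hex(path, &pci_class) != 0)
			continue;
		if (is_amd_display(vendor, pci_class)) {
			printf("%s\n", ent->d_name); /* e.g. domain:bus:device.function */
			found++;
		}
	}
	closedir(dir);
	return found;
}
```

Call `list_amd_gpus()` from your own `main()`; it prints one address per line, in the same form the `remove` path in the loop above expects.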

Fixed versions:
- Mainline Linux kernel commit 5b91160dfd3c
- Any kernel that includes this commit or an equivalent downstream backport.

---

## Solution: Patch or Upgrade Kernel

Recommended fix:
- Apply the upstream commit to your kernel, or
- Upgrade to a Linux release that includes this fix (5.4, 5.10, or 6.x, as appropriate for your distribution); check your distribution's security advisory for the exact version.
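To get a first hint about whether an upgrade is needed, you can compare the running kernel's release string with the first fixed version listed in your distribution's advisory. The sketch below uses `uname(2)`; `parse_kernel_version()`, `compare_kernel_versions()` and `running_kernel_at_least()` are hypothetical helpers. It compares only the numeric major.minor.patch prefix, so `-rc` and vendor suffixes are ignored. Because distributions often backport fixes without changing the upstream version number, treat the result as a hint, not proof that the fix is present.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor[.patch]" from the front of a release string like "6.12.4-arch1". */
static int parse_kernel_version(const char *release, int v[3])
{
	v[0] = v[1] = v[2] = 0;
	int n = sscanf(release, "%d.%d.%d", &v[0], &v[1], &v[2]);
	return n >= 2 ? 0 : -1; /* a missing patch level counts as 0 */
}

/* Set *cmp to <0, 0 or >0 if release a is older than, equal to, or newer than b. */
static int compare_kernel_versions(const char *a, const char *b, int *cmp)
{
	int va[3], vb[3];
	if (parse_kernel_version(a, va) != 0 || parse_kernel_version(b, vb) != 0)
		return -1;
	*cmp = 0;
	for (int i = 0; i < 3 && *cmp == 0; i++)
		*cmp = (va[i] > vb[i]) - (va[i] < vb[i]);
	return 0;
}

/* 1 if the running kernel is at least min_release, 0 if older, -1 if unknown. */
static int running_kernel_at_least(const char *min_release)
{
	struct utsname u;
	int cmp;
	if (uname(&u) != 0 || compare_kernel_versions(u.release, min_release, &cmp) != 0)
		return -1;
	return cmp >= 0;
}
```

Pass the version string from your vendor's advisory to `running_kernel_at_least()`; a result of 0 means the running kernel is older than that version.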

---

## References & More Information

- Linux 5.4 commit: drm/amdgpu/gfx9: add cleaner shader sw fini in gfx_v9_0
- CVE-2024-56753 entry at MITRE
- AMDGPU driver documentation (kernel.org)

---

## Conclusion

CVE-2024-56753 is a resource leak bug in the AMDGPU driver for Linux, now patched upstream.
If you’re running Linux with GFX9-generation AMD GPUs, you should update to a kernel containing this fix to prevent resource leaks and potential denial of service exploits.

Stay secure—keep your kernel updated!

Timeline

Published on: 12/29/2024 12:15:08 UTC
Last modified on: 01/06/2025 19:13:38 UTC