
The Linux kernel is the trusty engine room of just about every data center, server, and developer laptop on the planet. But even the best code has flaws. Today we’re diving into CVE-2024-43905, a null pointer dereference in the Linux kernel’s AMD GPU driver, specifically the Power Management (PM) code for Vega 10 cards. Left unpatched, this bug could let an attacker crash your system.

We’ll show you, step by step, what went wrong, why it matters, and how the kernel fixed it, including code snippets and exploit details (for educational purposes only!). If you’re worried about system stability, or just want to understand security at the driver layer, this one’s for you.

What Is CVE-2024-43905?

CVE-2024-43905 is a null pointer dereference flaw found in the drm/amd/pm subsystem of the Linux kernel — specifically in the vega10_hwmgr hardware manager code for AMD’s Vega 10 GPUs.

To put it simply, certain functions used the result of a call without checking whether it had succeeded, so the kernel could end up dereferencing a pointer that was never given a valid value (the infamous null pointer). A user who can trigger that failure path could cause a kernel oops or panic (a system crash). On modern kernels a null pointer dereference is rarely more than a denial of service; escalating privileges through it would require unusual circumstances.

From the official commit message

> drm/amd/pm: Fix the null pointer dereference for vega10_hwmgr
>
> Check return value and conduct null pointer handling to avoid null pointer dereference.

Where’s the Bug?

The issue emerged in the vega10_hwmgr (hardware manager for Vega10 cards). When the code tried to access certain structures, it *assumed* initialization had worked — but if it hadn’t, the pointer was *null*.

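Here’s a simplified illustration of the vulnerable pattern (vega10_do_pm_stuff, do_something, and field_foo are placeholder names, not real driver symbols)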

// Old, vulnerable pattern (illustrative)
int vega10_do_pm_stuff(struct pp_hwmgr *hwmgr) {
    struct vega10_hwmgr *data = hwmgr->backend;
    // directly uses 'data' without checking!
    do_something(data->field_foo);
    return 0;
}

If hwmgr->backend was never set up (for example, because an allocation failed during initialization while system memory was low), then data is null. Any attempt to access data->anything dereferences a null pointer inside the kernel, which produces an oops or a full panic.

How Was It Fixed?

A proper fix is simple: always check your pointers! Before using hwmgr->backend, make sure it isn’t null.

Here’s the same illustrative function with the check added (a simplified sketch of the pattern, not a verbatim excerpt of the upstream patch)

// New, fixed code!
int vega10_do_pm_stuff(struct pp_hwmgr *hwmgr) {
    struct vega10_hwmgr *data = hwmgr->backend;
    if (!data) {
        // log error, or just return failure
        pr_err("vega10_hwmgr: NULL backend pointer!\n");
        return -EINVAL;
    }
    // safe to use data!
    do_something(data->field_foo);
    return 0;
}

Not rocket science — but this simple check could save you from a lot of pain.
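The commit message talks about checking a return value, and that habit is easy to try outside the kernel. Below is a minimal userspace sketch of it. Every name here (model_hw_state, model_power_state, model_cast_state, model_get_level_count, MODEL_PS_MAGIC) is hypothetical and made up for this example; none of it is the driver's real code. The helper returns NULL when it is handed something it does not recognize, and the caller refuses to dereference the result until it has checked it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver types; NOT the kernel's real definitions. */
#define MODEL_PS_MAGIC 0x56454741u /* arbitrary tag chosen for this sketch */

struct model_hw_state {
    unsigned int magic; /* identifies which kind of state this is */
};

struct model_power_state {
    struct model_hw_state hw; /* must stay the first member so the cast below is valid */
    int level_count;
};

/* Returns NULL when the input is missing or is not the expected kind of state. */
static struct model_power_state *model_cast_state(struct model_hw_state *hw)
{
    if (hw == NULL || hw->magic != MODEL_PS_MAGIC)
        return NULL;
    return (struct model_power_state *)hw;
}

/* Checks the helper's return value before dereferencing it. */
static int model_get_level_count(struct model_hw_state *hw, int *out)
{
    struct model_power_state *ps = model_cast_state(hw);

    if (ps == NULL || out == NULL)
        return -EINVAL; /* fail cleanly instead of crashing */

    *out = ps->level_count;
    return 0;
}
```

Calling model_get_level_count with a NULL or wrongly tagged state returns -EINVAL rather than faulting, which is the whole point: the error travels back to the caller as a return code instead of taking the process (or, in the driver's case, the kernel) down.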

Exploit Details: What Could an Attacker Do?

Imagine a user-level program (with access to some GPU interfaces) that triggers a code path where the backend pointer was never properly initialized but is still dereferenced by the driver. The result is a kernel oops or panic, which means crashing the system on demand.

Exploit Example (Conceptual, not actual working code)

// THIS IS DEMONSTRATION CODE, NOT AN ACTUAL EXPLOIT!
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main() {
    int fd = open("/dev/dri/card0", O_RDWR); // assuming the first DRM device node
    if (fd < 0) return 1;
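    // Send a bogus command to the driver to try to hit the uninitialized backend.
    // MAGIC_VULNERABLE_IOCTL and struct foo_payload are placeholders; neither is
    // defined anywhere, so this listing does not compile as-is.
    int cmd = MAGIC_VULNERABLE_IOCTL;

    // Fill in a struct that triggers the bug (a guess: a real trigger would be more complex)
    struct foo_payload bogus = {0};
    ioctl(fd, cmd, &bogus);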

    close(fd);
    return 0;
}

If successful (and with the right buggy kernel version and environment), this could crash your system.

Bottom line: With a well-crafted userspace tool, a bad actor could knock vulnerable systems offline.

Who Fixed It, and When?

The fix was submitted to the Linux kernel by AMD engineers and has already landed upstream.

Patch reference:
- kernel.org patch – drm/amd/pm: Fix the null pointer dereference for vega10_hwmgr
- CVE Record at cve.org

Fixed in: Mainline kernels after 6.9 (check your distribution for backports to older stable releases).

- If you manage lots of servers: script a check of uname -r and compare the result against the fixed versions.
- If you package the kernel from source: apply the upstream commit manually.
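The uname -r check can be sketched in C roughly as follows. This is a minimal sketch, not a vetted tool: struct kver, parse_kernel_release, kver_compare, and running_kernel_at_least are names made up for this example, and the minimum version you pass in is your own assumption, to be taken from your distribution's advisory. Keep in mind that distributions backport fixes without changing the upstream version number, so a version comparison is only a first-pass filter.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper type for this sketch. */
struct kver {
    int major;
    int minor;
    int patch;
};

/* Parses the leading "X.Y" or "X.Y.Z" of a release string such as "6.9.3-arch1".
 * Returns 0 on success, -1 if the string does not start with at least "X.Y".
 * A missing patch component is treated as 0. */
static int parse_kernel_release(const char *release, struct kver *out)
{
    int major = 0, minor = 0, patch = 0;

    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;

    out->major = major;
    out->minor = minor;
    out->patch = patch;
    return 0;
}

/* Returns a negative value, 0, or a positive value if a is older than,
 * equal to, or newer than b. */
static int kver_compare(const struct kver *a, const struct kver *b)
{
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch)
        return a->patch < b->patch ? -1 : 1;
    return 0;
}

/* Returns 1 if the running kernel is at least `min`, 0 if it is older,
 * and -1 if the release string could not be read or parsed. */
static int running_kernel_at_least(const struct kver *min)
{
    struct utsname u;
    struct kver cur;

    if (uname(&u) != 0 || parse_kernel_release(u.release, &cur) != 0)
        return -1;
    return kver_compare(&cur, min) >= 0;
}
```

A fleet script would call running_kernel_at_least with the first fixed version for the kernel series each host runs and flag anything that returns 0 for a closer look at the vendor changelog.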

Final Thoughts

This bug is a reminder: even mature, critical code like the Linux kernel’s AMD GPU drivers can have simple pointer mistakes. Thankfully, the fix is straightforward and widely available — as long as you keep your system up-to-date!

Stay safe, keep patching, and remember: check your pointers!


Timeline

Published on: 08/26/2024 11:15:04 UTC
Last modified on: 08/27/2024 13:41:03 UTC