
The CVE-2024-27005 vulnerability affects the Linux kernel's interconnect ("ICC") subsystem. It is a race condition that can occur while the kernel manages hardware interconnect bandwidth requests. The bug can crash the kernel (use-after-free or kernel panic) or, depending on how it is exploited, potentially even escalate privileges.

Let’s break down what happened, how the race condition actually works, and walk through some example code that demonstrates the issue. At the end, you’ll find links to the original patch and references.

Vulnerability Explained in Simple Terms

Imagine you’re trying to manage a list of tasks on two separate sheets—one handles bandwidth (bw_lock), and another handles all other requests (icc_lock). But sometimes, you look at one while the other is being updated. If two people do this at the same time, you might accidentally see or change something that’s already been deleted or modified by someone else.

In Linux, the ICC subsystem organizes and aggregates bandwidth requests between device interconnects. Each node keeps a req_list (request list) – an internal hlist (kernel linked list) of the requests pending against that node.

Originally, developers split one "big" lock into two: icc_lock (all requests) and icc_bw_lock (just bandwidth). But they didn't ensure a consistent lock was held everywhere req_list is touched. This led to situations where one thread walks the list holding only icc_bw_lock while another adds or deletes entries holding only icc_lock.

That’s a classic race condition: two places reach into the same list, one to read, one to edit, but each holds a different lock.

Example A

CPU0                   | CPU1
-----------------------|-------------------------
icc_set_bw(path_a)     |
  mutex_lock(bw)       |
  aggregate_requests() |
    for r in req_list  | icc_put(path_b)
                       |   mutex_lock(icc)
                       |   hlist_del(r)  <-- delete node
    <r = dangling pointer>

Here, CPU0 is _reading_ from req_list while CPU1 _deletes_ from it. This can leave CPU0 dereferencing freed memory, which triggers undefined behavior: best case, a crash; worst case, an attacker pivot.

Example B

CPU0                   | CPU1
-----------------------|-------------------------
icc_set_bw(path_a)     |
  mutex_lock(bw)       |
  aggregate_requests() |
    for r in req_list  | path_b = of_icc_get()
                       |   mutex_lock(icc)
                       |   hlist_add_head(r)  <-- add node
    <r = stale pointer>

Similar, but this time an entry is added at the head of the list mid-iteration. Again, one thread iterates while another mutates.

Exploit Possibility

This kind of race is typically hard to exploit, but because req_list is a kernel linked list, a malicious local user may be able to trigger it by driving the racing kernel paths in parallel. Note that icc_set_bw(), of_icc_get(), and icc_put() are in-kernel APIs rather than syscalls; a local user would reach them indirectly through device drivers, hammering them from parallel threads/processes to deliberately widen the race window. This could lead to a use-after-free, a kernel crash, or, at worst, a privilege-escalation primitive.

A simplified attack might look like this (pseudocode/C):

// WARNING: PoC sketch, simplified for clarity (do NOT run on production!)
// Compile with: gcc -pthread poc.c
// __NR_icc_set_bw / __NR_icc_put and path_a / path_b are placeholders:
// no such syscalls exist; real triggers would go through device drivers.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

static long path_a, path_b; // placeholder handles

void* set_bw_thread(void* arg) {
    (void)arg;
    while (1)
        syscall(__NR_icc_set_bw, path_a); // placeholder syscall
    return NULL;
}

void* put_thread(void* arg) {
    (void)arg;
    while (1)
        syscall(__NR_icc_put, path_b); // placeholder syscall
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, set_bw_thread, NULL);
    pthread_create(&t2, NULL, put_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

> NOTE: The actual syscalls and arguments would be system-dependent, but the flavor is the same: hammer requests and deletes in parallel.

A real attacker might try to map freed memory to userland-controlled data, leading the kernel to read or write from/to where it shouldn’t.

PATCH: How The Linux Kernel Fixed It

Developers unified the locking: the icc_bw_lock mutex must now be held wherever req_list is read or modified — not just in some paths. That closes the race: no thread can walk or change the list without holding the same lock as every other thread.

Patch Snippet (aligned with the official commit)

// Before: req_list was modified under icc_lock alone,
// while readers held only icc_bw_lock
mutex_lock(&icc_lock);
// modify req_list
mutex_unlock(&icc_lock);

// After: icc_bw_lock is also held whenever req_list is touched,
// so writers are serialized against readers
mutex_lock(&icc_lock);
mutex_lock(&icc_bw_lock);
// modify req_list
mutex_unlock(&icc_bw_lock);
mutex_unlock(&icc_lock);

Check the official patch here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff2c887c3b24e57e7b064b2e3c2c33e4872a5c06

References

- Linux Kernel Patch Commit
- Original Discussion, LKML
- CVE-2024-27005 at Mitre
- Commit af42269c3523 ("interconnect: Fix locking for runpm vs reclaim")

TL;DR

- Old code guarded a shared list with two different mutexes, allowing a race that could crash the kernel or be abused for security impact.

- The patch ensures the same lock (icc_bw_lock) is held everywhere req_list is touched, fixing the race.

- Update your kernel to stay safe! This bug is public, easy to reproduce, and affects recent kernels supporting ICC.


This writeup and its examples are simplified for clarity and intended for educational use only. Patch promptly and watch for further advisories!

Timeline

Published on: 05/01/2024 06:15:18 UTC
Last modified on: 11/05/2024 16:35:15 UTC