Release Date: June 2024
Component: KVM (Kernel-based Virtual Machine) in Linux kernel
Affected Versions: Linux kernel before the patch described below (notably 6.6.0-rc1 and similar)
Severity: High

Introduction

CVE-2024-26976 is a vulnerability in the Linux kernel’s KVM (Kernel-based Virtual Machine) module. KVM did not always flush the async page fault (async #PF) workqueue when a virtual CPU (vCPU) was being removed, most notably while the VM itself was being destroyed.

In simple terms, background tasks ("workqueue callbacks") could try to access already-freed memory and crash or hang the system. A local attacker could use this to cause a denial of service or to drive the kernel into unexpected code paths.

What Went Wrong?

Historically, when a KVM virtual CPU (vCPU) was destroyed (for example, when the VM was torn down), KVM did not always flush the per-vCPU asynchronous page fault (async #PF) workqueue. As a result, background work could still be running while the vCPU was being freed, potentially leading to use of freed memory, module unloading bugs, or even kernel deadlocks (system hangs).

The original logic tried to "gift" a reference to the VM to the async_pf workqueue callback to keep the memory alive. But this solution was flawed: if the callback ends up dropping the last reference, the resulting cleanup (via kvm_put_kvm()) tries to flush the very work item that is currently running, so the worker waits for itself to finish and deadlocks.
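The self-wait can be modelled in a few lines of single-threaded C. This is only a toy sketch: every name in it (toy_vm, toy_work, toy_put_vm, and so on) is hypothetical and none of it is kernel code. Instead of blocking, the model records that a flush was requested from inside the work item it would have to wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names exist in the kernel. */
struct toy_work {
    bool running;              /* true while the callback is executing */
};

struct toy_vm {
    int refcount;
    struct toy_work *pf_work;  /* the async #PF work item tied to this VM */
    bool would_deadlock;       /* set when a flush would wait on its own caller */
    bool destroyed;
};

/* Model of a flush: waiting for a work item from inside that same item never returns. */
static void toy_flush(struct toy_vm *vm, const struct toy_work *current)
{
    if (vm->pf_work->running && vm->pf_work == current)
        vm->would_deadlock = true;   /* real code would block forever here */
}

/* Model of dropping a VM reference: the last put triggers teardown, which flushes. */
static void toy_put_vm(struct toy_vm *vm, const struct toy_work *current)
{
    assert(vm->refcount > 0);
    if (--vm->refcount == 0) {
        toy_flush(vm, current);
        vm->destroyed = true;
    }
}

/* Model of the old callback: it holds a gifted reference and drops it at the end. */
static void toy_async_pf_execute(struct toy_vm *vm)
{
    vm->pf_work->running = true;
    toy_put_vm(vm, vm->pf_work);     /* may be the last reference */
    vm->pf_work->running = false;
}
```

In this model the outcome depends only on ordering: if the owner drops its reference first and the callback drops the last one, would_deadlock is set; if the callback finishes first, teardown proceeds normally.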

Warning / Deadlock Example

A real-life deadlock warning trace looked like this:

WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
...
Workqueue: events async_pf_execute [kvm]
...
kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
kvm_put_kvm+0x1c1/0x320 [kvm]
async_pf_execute+0x198/0x260 [kvm]
...
INFO: task kworker/8:1:251 blocked for more than 120 seconds.
...

In plain English: The kernel is stuck trying to complete the async page fault work while the module and vCPU are being destroyed, which causes a deadlock situation.
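The symbol names in the trace above can serve as a rough signature when scanning kernel log lines. The following is a minimal sketch, assuming the log has already been read into an array of strings; the function names are made up for illustration. A match only means a line mentions these KVM symbols, not that the host is affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if a kernel log line mentions one of the symbols from the trace. */
static bool line_matches_async_pf_trace(const char *line)
{
    assert(line != NULL);
    return strstr(line, "async_pf_execute") != NULL ||
           strstr(line, "kvm_clear_async_pf_completion_queue") != NULL;
}

/* Counts the lines in an array of log lines that match the signature. */
static size_t count_matching_lines(const char *const *lines, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (line_matches_async_pf_trace(lines[i]))
            hits++;
    }
    return hits;
}
```

Matching on symbol names rather than offsets keeps the check independent of the exact kernel build, since offsets such as +0x198/0x260 change between builds.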

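Here’s a simplified/pseudocode version of the original vulnerable pattern:

// This is a representation, not the full code.
// flush_async_pf_workqueue() is a placeholder for "wait for pending async #PF work".
void kvm_put_kvm(struct kvm *kvm) {
    // ...
    flush_async_pf_workqueue(kvm); // If pending work is not reliably flushed here, a use-after-free (UAF) can follow!
    // Memory is freed here, but the workqueue may still access it.
}

And the callback:

void async_pf_execute(struct work_struct *work) {
    // Accesses VM/vCPU memory.
    // In the old code it also dropped the "gifted" VM reference via kvm_put_kvm() at the end.
}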

The Patch

The fix is simple in idea but important:
Update KVM so that it always flushes the per-vCPU async #PF workqueue _before_ destroying vCPUs and the VM. This guarantees all background work is done, and no background jobs can access freed or invalid memory.

The patch also removes the "VM refcount gifting" logic that actually introduced more trouble (deadlocks), and adds a helper for flushing only valid work items.

Key Patch Snippet

// Illustrative sketch, not the verbatim upstream patch; the helper name is a placeholder.
// Wait for one async #PF work item to finish before its memory is released.
static void flush_async_pf_work(struct kvm_async_pf *work) {
    flush_work(&work->work); // returns once async_pf_execute() has finished for this item
}

// Now called whenever a vCPU is destroyed:
void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) {
    // Wait for (or cancel) every async #PF work item this vCPU still tracks,
    // e.g. flush_async_pf_work(work), before freeing it.
    // Now it's safe to proceed with vCPU destruction!
    // Old code: tried to keep the VM alive by gifting VM references to the callback; now removed
}
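The same ordering rule, finish the background work first and free the object second, can be shown with a userspace analogue. This is a sketch under stated assumptions, not kernel code: a pthread stands in for the queued work item, pthread_join() stands in for flushing it, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-vCPU state that background work touches. */
struct toy_vcpu {
    pthread_t worker;      /* stands in for the queued async #PF work item */
    int faults_resolved;   /* written by the worker */
};

/* Background callback: it writes to the vCPU, so the vCPU must outlive it. */
static void *toy_async_pf_work(void *arg)
{
    struct toy_vcpu *vcpu = arg;
    vcpu->faults_resolved++;
    return NULL;
}

/* Allocates a vCPU and starts its background work; returns NULL on failure. */
static struct toy_vcpu *toy_vcpu_create(void)
{
    struct toy_vcpu *vcpu = calloc(1, sizeof(*vcpu));
    if (!vcpu)
        return NULL;
    if (pthread_create(&vcpu->worker, NULL, toy_async_pf_work, vcpu) != 0) {
        free(vcpu);
        return NULL;
    }
    return vcpu;
}

/* Destroy: wait for the worker first (the "flush"), and only then free. */
static int toy_vcpu_destroy(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    pthread_join(vcpu->worker, NULL);   /* analogous to flushing the work item */
    int resolved = vcpu->faults_resolved;
    free(vcpu);                         /* safe: no worker can touch vcpu now */
    return resolved;
}
```

Swapping the pthread_join() and free() calls would reproduce the bug class in miniature: the worker could write to memory that has already been released. On older toolchains this may need to be built with -pthread.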

Attack Scenario

This bug can be exploited locally by an unprivileged user who is allowed to start and stop KVM VMs (that is, who can open /dev/kvm) to crash or hang the host kernel.

- Scenario: The attacker rapidly spawns and destroys virtual machines/vCPUs in a loop, tearing down each VM quickly after async #PF events are issued.
- Effect: The kernel could execute a callback referencing memory that has already been freed.
- Result: A crash (denial of service) or possibly, under certain heap spray patterns, arbitrary code execution inside the kernel, depending on heap state and timing.
- Confirmation: Observing kernel logs or a system crash/reboot.

Notably, reliable exploitation for code execution is difficult; a system crash or hang ("deadlock") is the far more likely outcome.

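Example C code snippet (sketch)

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <linux/kvm.h>

// WARNING: This is only for demonstration. Don't use on production!
void *spawn_vm(void *arg) {
    (void)arg; // unused
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0) { perror("open /dev/kvm"); return NULL; }
    int vm = ioctl(fd, KVM_CREATE_VM, 0);       // third argument: machine type (0 = default)
    if (vm < 0) { perror("KVM_CREATE_VM"); close(fd); return NULL; }
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);   // third argument: vCPU id
    if (vcpu < 0) { perror("KVM_CREATE_VCPU"); close(vm); close(fd); return NULL; }
    // ... set up VM, vCPU state ...
    // force some memory faults / ballooning here
    // close vCPU, destroy VM quickly
    close(vcpu);
    close(vm);
    close(fd);
    return NULL;
}

int main(void) {
    const int loop = 100;
    pthread_t t;
    for (int i = 0; i < loop; ++i) {
        if (pthread_create(&t, NULL, spawn_vm, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return EXIT_FAILURE;
        }
        pthread_join(t, NULL);
    }
    printf("Spawned vCPUs rapidly; if your kernel is vulnerable, system might crash/hang.\n");
    return 0;
}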

Apply the Patch:

Make sure your kernel includes the fix for CVE-2024-26976, either by updating to a release that carries it or by backporting the upstream commit.

Restrict KVM Access:

Prevent untrusted users from creating VMs (e.g., restrict /dev/kvm permissions).

References

- CVE-2024-26976 (NIST)
- Upstream commit fixing the issue (LKML)
- Linux kernel v6.6.7 ChangeLog

Final Notes

- This bug is exclusive to Linux systems with KVM enabled and affects primarily kernel maintainers, virtualization hosts, and anyone running untrusted guests.
- If you run QEMU/KVM-based virtualization, make sure your kernel includes this fix — otherwise, you risk host kernel crashes from guests or local users.
- The root cause, a classic workqueue vs. object lifetime bug, shows the importance of careful reference handling in concurrent kernel code.

Stay patched!

*This explanation is unique and synthesized for educational purposes. For verbatim security details, consult the official advisories linked above.*

Timeline

Published on: 05/01/2024 06:15:14 UTC
Last modified on: 07/03/2024 01:50:10 UTC