Summary:
A critical flaw has been discovered in the Xen hypervisor's handling of x86 APIC (Advanced Programmable Interrupt Controller) error interrupts. CVE-2024-45817 allows a malicious or buggy guest to deadlock the hypervisor: if the APIC error interrupt is configured with an illegal vector, raising that interrupt generates a further error, and Xen's error handler recurses while already holding a lock. This post explains how the vulnerability works, includes code examples, references the original patches and advisories, and discusses possible exploit scenarios.

Background: x86 APIC and Xen

In the x86 architecture, the APIC manages interrupts. When an APIC error happens (for example, an interrupt is sent or received with an illegal vector), the APIC records the condition in its Error Status Register (ESR) and can optionally notify the operating system (or hypervisor) via a specific interrupt called the "APIC error interrupt." The OS chooses which interrupt vector to use by programming the LVT Error register.
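As a rough illustration of the registers involved, the sketch below decodes an LVT Error register value and checks whether the configured vector is legal. The register offsets and bit positions follow the Intel SDM; the two helper functions are hypothetical names introduced here for illustration and are not taken from Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets within the xAPIC MMIO page (Intel SDM, Vol. 3). */
#define APIC_ESR          0x280u       /* Error Status Register */
#define APIC_LVTERR       0x370u       /* LVT Error register */

/* LVT Error register fields. */
#define APIC_VECTOR_MASK  0x000000FFu  /* bits 7:0: interrupt vector */
#define APIC_LVT_MASKED   (1u << 16)   /* bit 16: interrupt masked */

/* ESR bits that report illegal vectors. */
#define APIC_ESR_SENDILL  (1u << 5)    /* send illegal vector */
#define APIC_ESR_RECVILL  (1u << 6)    /* received illegal vector */

/* Hypothetical helper: vectors 0-15 are illegal for APIC interrupts. */
static inline bool vector_is_legal(uint32_t vector)
{
    return vector >= 16 && vector <= 255;
}

/* Hypothetical helper: true when the error interrupt is unmasked and
 * its configured vector could actually be delivered. */
static inline bool lvterr_is_deliverable(uint32_t lvterr)
{
    if (lvterr & APIC_LVT_MASKED)
        return false;
    return vector_is_legal(lvterr & APIC_VECTOR_MASK);
}
```

An unmasked LVT Error value of 0x0F is the problematic case: the interrupt is enabled, but its vector can never be delivered.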

Xen virtualizes APICs for its virtual machines (called domUs), allowing each guest to set APIC registers including the error interrupt vector.

The Vulnerability

1. Illegal vector: An attacker (or a buggy guest kernel) inside a Xen HVM or PVH guest sets the APIC error interrupt vector to an illegal value, i.e. a vector in the range 0x00-0x0F, which the APIC architecture treats as invalid.
2. Error raised: When an APIC error then occurs in the guest, Xen's handler vlapic_error() is called to deal with it and tries to raise the error interrupt.
3. Recursion: Raising the interrupt with an illegal vector _itself_ causes another error, which again calls vlapic_error().
4. Deadlock: Each call tries to take the same lock protecting the error state; since that lock is not reentrant, Xen deadlocks.

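The recursion's depth is bounded: errors accumulate in the status register, and an interrupt is only generated when a new status bit becomes set. That bound does not help, however, because the nested call tries to re-acquire a lock that the same CPU already holds, which leads to a deadlock on the very first level of recursion.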
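The interaction between the bounded recursion and the non-recursive lock can be modelled in ordinary user-space C. The sketch below is a toy model, not Xen code: `struct toy_vlapic`, `toy_error()` and `toy_set_irq()` are hypothetical names, and the spinlock is replaced by a flag so that the re-entry is recorded instead of hanging the program.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_RECVILL  (1u << 6)   /* received illegal vector */
#define ESR_ILLREG   (1u << 7)   /* illegal register address */
#define LVT_MASKED   (1u << 16)  /* LVT mask bit */
#define VECTOR_MASK  0xFFu       /* LVT vector field */

/* Toy stand-in for the per-vCPU local APIC state (hypothetical). */
struct toy_vlapic {
    uint32_t esr;         /* accumulated error bits */
    uint32_t lvterr;      /* LVT Error register */
    bool lock_held;       /* models a non-recursive spinlock */
    bool would_deadlock;  /* set where a real spinlock would spin forever */
    unsigned int calls;   /* how many times the handler was entered */
};

static void toy_error(struct toy_vlapic *v, uint32_t err_bit);

/* Delivering an interrupt with an illegal vector is itself an APIC error. */
static void toy_set_irq(struct toy_vlapic *v, uint32_t vector)
{
    if (vector < 16)
        toy_error(v, ESR_RECVILL);
    /* A legal vector would be queued for the guest here. */
}

static void toy_error(struct toy_vlapic *v, uint32_t err_bit)
{
    v->calls++;
    if (v->lock_held) {           /* second acquisition on the same CPU */
        v->would_deadlock = true;
        return;                   /* real code would never get past here */
    }
    v->lock_held = true;

    /* Only a newly set bit raises the error interrupt: this bounds the depth. */
    if (!(v->esr & err_bit)) {
        v->esr |= err_bit;
        if (!(v->lvterr & LVT_MASKED))
            toy_set_irq(v, v->lvterr & VECTOR_MASK);
    }

    v->lock_held = false;
}
```

With `lvterr` set to an unmasked illegal vector such as 0x0F, a single call to `toy_error()` re-enters the handler while the lock flag is still set; with a legal vector such as 0x30, the handler runs once and returns normally.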

Here's a simplified pseudo-code flow illustrating the vulnerability:

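// Simplified pseudo-code: field and helper names are illustrative, not Xen's.
void vlapic_error(struct vlapic *vlapic, uint32_t err_bit) {
    spin_lock(&vlapic->apic_lock);   // not a recursive lock

    // Only a newly set error bit raises the error interrupt
    if (!(vlapic->esr & err_bit)) {
        vlapic->esr |= err_bit;

        // Try to send the error interrupt
        int vector = vlapic->lvt_error_vector;
        if (invalid_vector(vector)) {
            // Delivery fails, which is itself an APIC error...
            vlapic_error(vlapic, ESR_RECV_ILLEGAL_VECTOR); // <-- Recursion with the lock held!
        } else {
            send_interrupt(vector);
        }
    }

    spin_unlock(&vlapic->apic_lock);
}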

Result: on invalid vector, this falls back into itself, re-locks the spinlock, and deadlocks Xen.
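One way to break such a cycle is to decide, while the lock is held, whether the error interrupt can be delivered at all. If the configured vector is illegal, the follow-on "received illegal vector" error is recorded directly rather than by a nested call, and any real injection happens only after the lock is released. The sketch below is a toy user-space model under those assumptions; `struct fixed_vlapic` and `toy_error_fixed()` are hypothetical names, and the actual Xen patch should be consulted for the real change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_RECVILL  (1u << 6)   /* received illegal vector */
#define ESR_ILLREG   (1u << 7)   /* illegal register address */
#define LVT_MASKED   (1u << 16)  /* LVT mask bit */
#define VECTOR_MASK  0xFFu       /* LVT vector field */

/* Toy stand-in for the per-vCPU local APIC state (hypothetical). */
struct fixed_vlapic {
    uint32_t esr;           /* accumulated error bits */
    uint32_t lvterr;        /* LVT Error register */
    bool lock_held;         /* models a non-recursive spinlock */
    unsigned int injected;  /* error interrupts queued for the guest */
};

static inline void toy_error_fixed(struct fixed_vlapic *v, uint32_t err_bit)
{
    bool inject = false;

    assert(!v->lock_held);        /* never entered with the lock held */
    v->lock_held = true;

    if (!(v->esr & err_bit)) {
        v->esr |= err_bit;
        if (!(v->lvterr & LVT_MASKED)) {
            if ((v->lvterr & VECTOR_MASK) >= 16)
                inject = true;          /* delivery will succeed */
            else
                v->esr |= ESR_RECVILL;  /* fold the follow-on error in */
        }
    }

    v->lock_held = false;

    if (inject)
        v->injected++;            /* stands in for queuing the interrupt */
}
```

In this model an illegal vector leaves both error bits set in the status value and injects nothing, while a legal vector injects exactly one interrupt per newly set error bit.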

References and Patches

- Original Xen Security Advisory XSA-462
- Xen Patch fixing CVE-2024-45817
- CVE Entry at Mitre

This vulnerability can be exploited in the following settings:

- Malicious Guests: Kernel-level code in a Xen HVM or PVH guest can set the LVT Error register to an illegal vector with a single MSR write (x2APIC mode) or MMIO write (xAPIC mode).
- DoS Impact: Once the illegal vector is configured, the next APIC error in that guest deadlocks the CPU handling it inside Xen, effectively causing denial-of-service to all VMs on the affected host.
- Cloud Providers: In a shared or public cloud, a rogue customer could take out a whole compute node with a few instructions.

Inside a guest VM, running with kernel-level (ring 0) privileges:

// Illustrative sketch: apic_write() is a hypothetical helper that writes a
// local APIC register (MMIO in xAPIC mode, MSR in x2APIC mode).

// Set the APIC LVT Error register to an illegal vector (e.g., 0x0F)
uint32_t illegal_vector = 0x0F; // vectors 0x00-0x0F are illegal; 0x10-0xFF are valid
apic_write(APIC_LVTERR, illegal_vector); // mask bit (bit 16) left clear

// Trigger an APIC error, e.g., a self-IPI that also uses an illegal vector
apic_write(APIC_ICR, APIC_DEST_SELF | illegal_vector);

- Patch Xen: All users should upgrade to a version including the fix for CVE-2024-45817.
- Restrict guest APIC config: Cloud admins may wish to restrict guest access to APIC LVT configuration.

Conclusion

CVE-2024-45817 demonstrates how hardware and architecture quirks can turn into security issues in virtualization. Careful sanity-checking and thoughtful reentrancy design are crucial in low-level hypervisor code. While this issue may seem "niche," in the hands of an attacker it could be critical, especially in large-scale cloud environments.

Further Reading

- Xen APIC documentation
- Linux APIC documentation
- Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3 (APIC chapter)

Timeline

Published on: 09/25/2024 11:15:12 UTC
Last modified on: 11/21/2024 09:38:08 UTC