The Linux kernel’s KVM (Kernel-based Virtual Machine) subsystem is a cornerstone of virtualization technology today, allowing multiple operating systems to run as virtual machines (VMs) on Linux. But like any complex system, vulnerabilities can emerge. CVE-2021-47060 is a bug that was found and resolved in KVM, specifically relating to how it manages coalesced MMIO (Memory-Mapped I/O) zones and the KVM I/O bus. In this article, we'll break down the vulnerability in simple terms, provide sample code, and discuss its impact, referencing original sources along the way.

Background: What is Coalesced MMIO?

In virtualization, MMIO lets guest VMs interact with virtual hardware by reading and writing specific memory addresses. “Coalesced” MMIO is an optimization in KVM that batches guest writes to selected MMIO regions in a ring buffer shared with userspace, so that not every write forces a costly exit out of the guest.
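
To make the batching idea concrete, here is a minimal user-space sketch of a ring buffer that queues deferred writes and drains them later in one pass. It is only loosely modeled on the concept; the names write_ring, ring_push, ring_drain and the capacity RING_SLOTS are assumptions made for illustration and are not the kernel's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8            /* hypothetical capacity; one slot stays empty */

struct mmio_write {             /* one deferred guest write */
    uint64_t addr;
    uint32_t len;
    uint32_t value;
};

struct write_ring {
    struct mmio_write slot[RING_SLOTS];
    unsigned first;             /* next entry the consumer will drain */
    unsigned last;              /* next free slot for the producer */
};

/* Queue a write. Returns 1 if queued, 0 if the ring is full and the
 * caller has to handle the write immediately instead. */
static int ring_push(struct write_ring *r, uint64_t addr,
                     uint32_t len, uint32_t value)
{
    unsigned next = (r->last + 1) % RING_SLOTS;

    if (next == r->first)
        return 0;
    r->slot[r->last] = (struct mmio_write){ addr, len, value };
    r->last = next;
    return 1;
}

/* Copy up to max queued writes into out[], oldest first, and remove
 * them from the ring. Returns how many were drained. */
static unsigned ring_drain(struct write_ring *r, struct mmio_write *out,
                           unsigned max)
{
    unsigned n = 0;

    assert(out != NULL);
    while (r->first != r->last && n < max) {
        out[n++] = r->slot[r->first];
        r->first = (r->first + 1) % RING_SLOTS;
    }
    return n;
}
```

The producer only pays for a cheap enqueue per write, and the consumer processes the whole batch the next time it runs, which is the performance trade the optimization is after.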

The zone structures, which describe coalesced MMIO regions, are kept on a linked list. When you add or remove devices (with their own MMIO needs), these lists can be updated at runtime.
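
As a rough illustration of what a list of zones looks like, the sketch below keeps zones on a singly linked list and checks whether an access falls entirely inside one of them. The struct layout and the helper names zone_contains and find_zone are hypothetical; the kernel uses its own list macros and types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct zone {                   /* hypothetical stand-in for a coalesced zone */
    struct zone *next;
    uint64_t addr;              /* guest-physical start address */
    uint64_t size;              /* length of the region in bytes */
};

/* Return 1 if [addr, addr + len) lies entirely inside the zone. */
static int zone_contains(const struct zone *z, uint64_t addr, uint64_t len)
{
    assert(z != NULL);
    if (addr + len < addr)      /* reject arithmetic wrap-around */
        return 0;
    if (addr < z->addr)
        return 0;
    if (addr + len > z->addr + z->size)
        return 0;
    return 1;
}

/* Walk the list and return the first zone covering the access, or NULL. */
static const struct zone *find_zone(const struct zone *head,
                                    uint64_t addr, uint64_t len)
{
    for (const struct zone *z = head; z; z = z->next)
        if (zone_contains(z, addr, len))
            return z;
    return NULL;
}
```

Any walk like find_zone is only safe while the entries stay alive for the whole iteration, which is exactly the assumption this CVE violated.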

The Problem

When a device is removed from the KVM I/O bus, the function kvm_io_bus_unregister_dev() attempts to create a new bus instance without the target device (the one to be removed). If the memory allocation for the new bus fails, the function deletes all devices except the target device, effectively destroying the bus.
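
The copy-without-the-target behavior can be modeled in user space as follows. This is a sketch under stated assumptions, not the kernel code: struct bus, bus_alloc, bus_unregister and the fail flag used to inject an allocation failure are all hypothetical. The model reports the teardown through its return value, which is the information a caller needs in order to react.

```c
#include <assert.h>
#include <stdlib.h>

struct bus {                    /* hypothetical model of an I/O bus */
    int dev_count;
    int dev_id[];               /* flexible array of device ids */
};

/* Allocate a bus with room for count devices; fail != 0 simulates
 * an out-of-memory condition. */
static struct bus *bus_alloc(int count, int fail)
{
    struct bus *b;

    assert(count >= 0);
    if (fail)
        return NULL;
    b = malloc(sizeof(*b) + (size_t)count * sizeof(int));
    if (b)
        b->dev_count = count;
    return b;
}

/* Remove target by building a copy of *busp without it. Returns 0 on
 * success. If the copy cannot be allocated, the whole bus is torn down
 * (*busp becomes NULL) and -1 is returned, so the caller learns that
 * every other device is gone as well. */
static int bus_unregister(struct bus **busp, int target, int fail_alloc)
{
    struct bus *old = *busp, *new_bus;
    int i, k, j = 0;

    if (!old)
        return 0;
    for (i = 0; i < old->dev_count; i++)
        if (old->dev_id[i] == target)
            break;
    if (i == old->dev_count)
        return 0;               /* target is not on the bus */

    new_bus = bus_alloc(old->dev_count - 1, fail_alloc);
    if (new_bus)
        for (k = 0; k < old->dev_count; k++)
            if (k != i)
                new_bus->dev_id[j++] = old->dev_id[k];
    free(old);
    *busp = new_bus;
    return new_bus ? 0 : -1;
}
```

A caller that ignores the -1 and keeps using anything that lived on the old bus is in the same position as the vulnerable zone walk.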

The code that walks the coalesced MMIO zones was never told that this had happened: kvm_io_bus_unregister_dev() returned nothing, so its caller could not tell that the bus, and with it the other zones' list entries, had just been destroyed. The result was a use-after-free: the walk continued onto list entries that had already been deleted, which could crash the host or, in rare cases, be leveraged for privilege escalation.

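Here’s a simplified sketch of the problematic loop before the fix. It is modeled on the upstream code but abbreviated: zone_matches() and bus_idx are stand-ins for the real range check and bus selection.

list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list)
    if (zone_matches(dev, zone)) {
        kvm_io_bus_unregister_dev(kvm, bus_idx, &dev->dev);  /* returned void */
        kvm_iodevice_destructor(&dev->dev);
    }

If the unregister call had to destroy the bus, the saved next pointer (tmp) referred to an entry that had already been deleted, so the next loop step dereferenced freed memory.

The Patch

The fix was straightforward but crucial: kvm_io_bus_unregister_dev() now reports failure to its caller, and the caller stops iterating through the now-invalid list as soon as it sees that failure. For clarity, curly braces were also added to the for-loop, which spans many lines.

Here’s how the corrected loop looks in the same simplified form (zone_matches() and bus_idx are again stand-ins, not real kernel identifiers):

list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
    if (zone_matches(dev, zone)) {
        r = kvm_io_bus_unregister_dev(kvm, bus_idx, &dev->dev);
        kvm_iodevice_destructor(&dev->dev);
        /* On failure every other device is gone too: stop walking. */
        if (r)
            break;
    }
}

This ensures that if the bus is destroyed during iteration, the loop exits immediately instead of stepping onto a freed entry. There is no need to restart the walk, because no zones are left to visit.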

How Could This Be Exploited?

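A malicious user who can cause coalesced zones to be unregistered (for example by hot-unplugging virtual devices, which the VMM typically carries out through ioctls such as KVM_UNREGISTER_COALESCED_MMIO) and who can also provoke a kernel memory allocation failure at the right moment might be able to trigger this bug in practice. In most environments, this would “just” crash the VM or host (denial of service), but with more effort, a skilled attacker might use the dangling list entry to run code in kernel space (privilege escalation).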

References

- CVE-2021-47060 at NVD
- Linux Kernel Patch
- KVM Bugzilla Report

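Here’s a minimal user-space C example that simulates this type of bug. It is an analogy, not kernel code: by default it runs the fixed walk, and passing any command-line argument selects the buggy walk.

#include <stdio.h>
#include <stdlib.h>

typedef struct zone {
    struct zone *next;
    int id;
} zone;

/* Free every node after z, mimicking a failed unregister that tears
 * down the remaining devices. Returns nonzero to report the teardown. */
static int destroy_rest(zone *z) {
    zone *n = z->next;
    while (n) {
        zone *dead = n;
        n = n->next;
        free(dead);
    }
    z->next = NULL;
    return 1;
}

/* Walk the list with a saved next pointer, as list_for_each_entry_safe()
 * does. With stop_on_destroy == 0 this reproduces the bug. */
static void iterate_zones(zone *head, int stop_on_destroy) {
    zone *z, *tmp;
    for (z = head; z; z = tmp) {
        tmp = z->next;              // Saved before the body runs
        printf("Zone ID: %d\n", z->id);
        if (z->id == 2 && destroy_rest(z)) {
            if (stop_on_destroy)
                break;              // The fix: tmp is dangling, so stop
            // BUG: continuing makes the next step read freed memory
        }
    }
}

static zone *new_zone(int id, zone *next) {
    zone *z = malloc(sizeof(*z));
    if (!z) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    z->id = id;
    z->next = next;
    return z;
}

int main(int argc, char **argv) {
    zone *head = new_zone(1, new_zone(2, new_zone(3, NULL)));
    (void)argv;
    iterate_zones(head, argc < 2);  // Any argument selects the buggy walk
    free(head->next);               // Zone 3 was already freed in the walk
    free(head);
    return 0;
}

*Run with no arguments, the walk stops after zone 2 and exits cleanly. Run with any argument, it steps onto freed memory, which is undefined behavior: it may crash, print garbage, or appear to work. Tools such as AddressSanitizer will flag the access, which is why this class of bug is a big deal!*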

Conclusion

CVE-2021-47060 is a classic example of how edge-case error handling (like failing to allocate memory) can create dangerous security vulnerabilities. The Linux kernel team’s fix keeps the host safe even under memory pressure by reporting the failed unregister to the caller and stopping the zone walk as soon as that failure is seen.

Keep an eye out for unusual crashes if you’re running large numbers of KVM guests.

If you want more technical deep-dives on cutting-edge Linux vulnerabilities, follow major Linux mailing lists or the Kernel.org secdb.

Timeline

Published on: 02/29/2024 23:15:07 UTC
Last modified on: 11/07/2024 17:35:01 UTC