A vulnerability in the Linux kernel's KVM (Kernel-based Virtual Machine) nested virtualization code, tracked as CVE-2021-46978, has been found and resolved. The problem lies in how the eVMCS (enlightened Virtual Machine Control Structure) is mapped after a migration. This article explains the details of the vulnerability, shows a snippet of the fix, and lists the original references.

Background

The vulnerability affects the Linux kernel code that maps the eVMCS page when nested state is migrated. The mapping matters because, without it, KVM cannot reflect changes made to VMCS12 back into the eVMCS once the nested virtual machine resumes on the destination.

When eVMCS is in use and the nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the eVMCS page right away because the eVMCS GPA (Guest Physical Address) is not part of struct kvm_vmx_nested_state_hdr. KVM also can't read the GPA from the VP assist page at that point, since userspace may restore HV_X64_MSR_VP_ASSIST_PAGE only after restoring the nested state (QEMU does this, for instance).

To ensure that eVMCS is mapped, vmx_set_nested_state() raises the KVM_REQ_GET_NESTED_STATE_PAGES request.
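The deferred mapping described above can be illustrated with a small standalone model. This is only a sketch, not kernel code: struct toy_vcpu, the toy_* functions and the single request bit are hypothetical stand-ins invented for illustration, and the assumption that a zero MSR value means "not yet restored" is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names echo the text but are NOT kernel APIs. */
#define REQ_GET_NESTED_STATE_PAGES (1u << 0)

struct toy_vcpu {
    uint32_t requests;       /* pending request bits */
    uint64_t vp_assist_msr;  /* assumed 0 until userspace restores the MSR */
    bool     evmcs_mapped;   /* has the eVMCS page been mapped? */
};

/* Mapping needs the eVMCS GPA, reachable only through the VP assist page. */
static bool toy_map_evmcs(struct toy_vcpu *v)
{
    if (v->vp_assist_msr == 0)
        return false;        /* MSR not restored yet: nothing to map from */
    v->evmcs_mapped = true;
    return true;
}

/* Restoring nested state cannot map yet, so it only records a request. */
static void toy_set_nested_state(struct toy_vcpu *v)
{
    v->requests |= REQ_GET_NESTED_STATE_PAGES;
}

/* Userspace may restore the MSR after the nested state. */
static void toy_restore_vp_assist(struct toy_vcpu *v, uint64_t value)
{
    v->vp_assist_msr = value;
}

/* Before entering the guest, a pending request is cleared and serviced. */
static bool toy_vcpu_enter(struct toy_vcpu *v)
{
    if (v->requests & REQ_GET_NESTED_STATE_PAGES) {
        v->requests &= ~REQ_GET_NESTED_STATE_PAGES;
        return toy_map_evmcs(v);
    }
    return true;
}
```

In this model, calling toy_map_evmcs() directly after toy_set_nested_state() fails, while toy_vcpu_enter() succeeds once toy_restore_vp_assist() has run. That ordering is the reason the mapping is deferred to a request rather than done during state restore.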

However, an earlier commit (f2c7ef3ba955) made nested_vmx_vmexit() clear KVM_REQ_GET_NESTED_STATE_PAGES. The clearing was introduced so that the MSR (Model Specific Register) permission bitmap is not switched when an immediate exit from L2 (the nested guest) to L1 (the guest hypervisor) occurs right after migration. Unfortunately, in that same situation KVM still needs the eVMCS to be mapped so that nested_sync_vmcs12_to_shadow() can reflect the changes in VMCS12 to the eVMCS. With the request cleared, the mapping never happens.

The Fix

The fix restores the nested_get_evmcs_page() call at the point where nested_vmx_vmexit() clears KVM_REQ_GET_NESTED_STATE_PAGES. The fix is not perfect: a mapping failure can't easily be propagated from there, and it may already be too late to act on one. It does, however, solve the immediate problem of the eVMCS being left unmapped after migration.

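/* In nested_vmx_vmexit(): sketch of the change as the upstream commit describes it */
if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
	/*
	 * The request is also used to map the eVMCS after migration, so
	 * the mapping must still be attempted when an L2->L1 exit is
	 * forced before L2 runs for the first time. The result is
	 * ignored because a failure can't be propagated from here.
	 */
	(void)nested_get_evmcs_page(vcpu);
}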
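The difference between merely clearing the request and clearing it while still attempting the mapping can be shown with a small standalone model. This is a hedged sketch, not kernel code: struct toy_vcpu, the toy_* helpers and the fixed flag are hypothetical names invented for illustration, and the mapping step is assumed to always succeed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names echo the text but are NOT kernel APIs. */
#define REQ_GET_NESTED_STATE_PAGES (1u << 0)

struct toy_vcpu {
    uint32_t requests;      /* pending request bits */
    bool     evmcs_mapped;  /* has the eVMCS page been mapped? */
    bool     evmcs_synced;  /* did vmcs12 changes reach the eVMCS? */
};

/* Test-and-clear, in the spirit of kvm_check_request(). */
static bool toy_check_request(struct toy_vcpu *v, uint32_t req)
{
    if (!(v->requests & req))
        return false;
    v->requests &= ~req;
    return true;
}

/* L2->L1 exit: the request is always cleared; only the fixed variant maps. */
static void toy_nested_vmexit(struct toy_vcpu *v, bool fixed)
{
    if (toy_check_request(v, REQ_GET_NESTED_STATE_PAGES) && fixed)
        v->evmcs_mapped = true;  /* best effort; errors have nowhere to go */
}

/* Syncing vmcs12 into the eVMCS only works when the page is mapped. */
static void toy_sync_to_shadow(struct toy_vcpu *v)
{
    v->evmcs_synced = v->evmcs_mapped;
}

/* Migrate, then force an immediate exit before L2 ever runs. */
static bool toy_exit_right_after_migration(bool fixed)
{
    struct toy_vcpu v = { .requests = REQ_GET_NESTED_STATE_PAGES };

    toy_nested_vmexit(&v, fixed);
    toy_sync_to_shadow(&v);
    return v.evmcs_synced;
}
```

In this model, toy_exit_right_after_migration(false) reports that the sync was lost, while toy_exit_right_after_migration(true) reports that it went through. Both variants clear the request, so the MSR bitmap concern that motivated the earlier commit is left untouched.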

It's important to note that the whole idea of using KVM_REQ_GET_NESTED_STATE_PAGES to map eVMCS after migration seems to be fragile, and this fix is only a band-aid solution. A more robust and comprehensive approach may be needed to address the issue entirely.

References

- Original commit, "KVM: nVMX: Always make an attempt to map eVMCS after migration": Commit Link
- Link to the Linux kernel source code: Linux Kernel
- Original patch submission: Patch Submission

Conclusion

CVE-2021-46978 highlights a Linux kernel vulnerability related to the handling of eVMCS mapping after migration in KVM nested virtualization. While the current fix provides a temporary solution to the problem, a more solid approach may be necessary in the future to address this issue comprehensively.

Timeline

Published on: 02/28/2024 09:15:37 UTC
Last modified on: 02/28/2024 14:06:45 UTC