Interrupt Management high-level design¶
Overview¶
This document describes the interrupt management high-level design for the ACRN hypervisor.
The ACRN hypervisor implements a simple but fully functional framework to manage interrupts and exceptions, as shown in Figure 48. In its native layer, it configures the physical PIC, IOAPIC, and LAPIC to support different interrupt sources, from the local timer/IPI to external INTx/MSI. In its virtual guest layer, it emulates a virtual PIC, virtual IOAPIC, and virtual LAPIC, and provides full APIs allowing virtual interrupt injection from emulated or pass-thru devices.
In the software modules view shown in Figure 49, the ACRN hypervisor sets up the physical interrupt in its basic interrupt modules (e.g., IOAPIC/LAPIC/IDT). It dispatches the interrupt in the hypervisor interrupt flow control layer to the corresponding handlers, which could be a pre-defined IPI notification, a timer, or a runtime-registered pass-thru device. The ACRN hypervisor then uses its VM interfaces, based on the vPIC, vIOAPIC, and vMSI modules, to inject the necessary virtual interrupt into the specific VM.
Hypervisor Physical Interrupt Management¶
The ACRN hypervisor is responsible for all the physical interrupt handling. All physical interrupts are first handled in VMX root-mode. The “external-interrupt exiting” bit in the VM-Execution controls field is set to support this. The ACRN hypervisor also initializes all the interrupt related modules such as IDT, PIC, IOAPIC, and LAPIC.
Only a few physical interrupts (such as TSC-Deadline timer and IOMMU) are fully serviced in the hypervisor. Most interrupts come from pass-thru devices whose interrupts are remapped to a virtual INTx/MSI source and injected to the SOS or UOS, according to the pass-thru device configuration.
The ACRN hypervisor does not handle exceptions, and any exception coming from VMX root mode will lead to the CPU halting. For guest exceptions, the hypervisor only traps #MC (machine check), prints a warning message, and injects the exception back into the guest OS.
Physical Interrupt Initialization¶
After the ACRN hypervisor gets control from the bootloader, it initializes all physical interrupt-related modules for all the CPUs. The ACRN hypervisor creates a framework to manage the physical interrupts for hypervisor-local devices, pass-thru devices, and IPIs between CPUs.
IDT¶
The ACRN hypervisor builds its native Interrupt Descriptor Table (IDT) during interrupt initialization. For exceptions, it links to function dispatch_exception, and for external interrupts it links to function dispatch_interrupt. Please refer to arch/x86/idt.S for more details.
LAPIC¶
The ACRN hypervisor resets the LAPIC for each CPU, and provides basic APIs used, for example, by the local timer (TSC Deadline) program and the IPI notification program. These APIs include write_lapic_reg32, send_lapic_eoi, send_startup_ipi, and send_single_ipi.
PIC/IOAPIC¶
The ACRN hypervisor masks all interrupts from PIC, so all the legacy interrupts from PIC (<16) are linked to IOAPIC, as shown in Figure 50.
ACRN will pre-allocate vectors and mask them for these legacy interrupts in IOAPIC RTE. For others (>= 16) ACRN will mask them with vector 0 in RTE, and the vector will be dynamically allocated on demand.
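The initial RTE setup described above could be sketched as follows. This is a minimal illustration, not ACRN's code: the names initial_rte_low, NR_LEGACY_IRQ, and LEGACY_VECTOR_BASE are hypothetical, and the base vector 0x20 is an assumed value. Only the RTE layout (vector in bits 7:0, mask in bit 16) is taken from the IOAPIC register definition.

```c
#include <stdint.h>

#define NR_LEGACY_IRQ       16U          /* pins formerly routed through the PIC */
#define LEGACY_VECTOR_BASE  0x20U        /* assumed base vector, illustration only */
#define RTE_MASK_BIT        (1U << 16)   /* IOAPIC RTE bit 16: interrupt masked */

/* Build the initial low 32 bits of the RTE for a given IOAPIC pin. */
uint32_t initial_rte_low(uint32_t pin)
{
    uint32_t vector = 0U;                /* pins >= 16: vector allocated on demand */

    if (pin < NR_LEGACY_IRQ) {
        vector = LEGACY_VECTOR_BASE + pin;   /* pre-allocated legacy vector */
    }
    return RTE_MASK_BIT | vector;        /* every pin starts out masked */
}
```

Under these assumptions, pin 3 starts with a masked entry carrying vector 0x23, while pin 20 starts masked with vector 0 until a vector is allocated for it.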
Irq Desc¶
The ACRN hypervisor maintains a global irq_desc[] array shared among the CPUs and uses a flat mode to manage the interrupts. The same vector is linked to the same IRQ number for all CPUs.
The irq_desc[] array is indexed by the IRQ number. An irq_handler field can be set to a common edge, level, or quick handler called from dispatch_interrupt. The irq_desc structure also contains the dev_list field to maintain this IRQ’s action handler list.
The global array vector_to_irq[] is used to manage the vector resource. This array is initialized with value IRQ_INVALID for all vectors, and will be set to a valid IRQ number after the corresponding vector is registered.
For example, if the local timer registers interrupt with IRQ number 271 and vector 0xEF, then the arrays mentioned above will be set to:
irq_desc[271].irq = 271;
irq_desc[271].vector = 0xEF;
vector_to_irq[0xEF] = 271;
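The bookkeeping above can be sketched in C as follows. This is a simplified illustration under stated assumptions: the array sizes, the value of IRQ_INVALID, and the helper names init_vector_table and bind_irq_vector are hypothetical, and the real irq_desc structure carries more fields (for example irq_handler and dev_list).

```c
#include <stdint.h>

#define NR_IRQS      272U          /* assumed size, large enough for IRQ 271 */
#define NR_VECTORS   256U
#define IRQ_INVALID  0xffffffffU   /* assumed sentinel value */

struct irq_desc {
    uint32_t irq;
    uint32_t vector;
};

struct irq_desc irq_desc[NR_IRQS];
uint32_t vector_to_irq[NR_VECTORS];

/* Mark every vector as unused. */
void init_vector_table(void)
{
    for (uint32_t v = 0U; v < NR_VECTORS; v++) {
        vector_to_irq[v] = IRQ_INVALID;
    }
}

/* Bind an IRQ number to a vector; returns 0 on success, -1 on a bad
 * argument or when the vector is already taken. */
int bind_irq_vector(uint32_t irq, uint32_t vector)
{
    if (irq >= NR_IRQS || vector >= NR_VECTORS ||
        vector_to_irq[vector] != IRQ_INVALID) {
        return -1;
    }
    irq_desc[irq].irq = irq;
    irq_desc[irq].vector = vector;
    vector_to_irq[vector] = irq;
    return 0;
}
```

Calling bind_irq_vector(271U, 0xEFU) after init_vector_table() reproduces the three assignments listed above; a second attempt to bind another IRQ to 0xEF is rejected.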
Physical Interrupt Flow¶
When a physical interrupt occurs and the CPU is running under VMX root mode, the interrupt is triggered from the standard native irq flow: interrupt gate to irq handler. However, if the CPU is running under VMX non-root mode, an external interrupt will trigger a VM exit for reason “external-interrupt”. See Figure 51.
After an interrupt happens (in either case noted above), the ACRN hypervisor jumps to dispatch_interrupt. This function will check which vector caused this interrupt, and the corresponding irq_desc structure’s irq_handler will be called for the service.
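The lookup performed on each interrupt could be sketched as follows. The function dispatch_vector, the demo helpers, and the hits counter are hypothetical names used only for illustration; the sketch assumes the vector has already been read from the interrupt frame or the VM-exit information.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_IRQS      16U           /* small sizes, illustration only */
#define NR_VECTORS   256U
#define IRQ_INVALID  0xffffffffU

struct irq_desc;
typedef void (*irq_handler_t)(struct irq_desc *desc);

struct irq_desc {
    uint32_t irq;
    irq_handler_t irq_handler;
    uint32_t hits;                 /* hypothetical counter for the demo */
};

struct irq_desc irq_desc[NR_IRQS];
uint32_t vector_to_irq[NR_VECTORS];

/* Demo handler: only counts how often it was called. */
void count_handler(struct irq_desc *desc)
{
    desc->hits++;
}

/* Demo setup: all vectors invalid, then IRQ 3 registered on vector 0xEF. */
void dispatch_demo_setup(void)
{
    for (uint32_t v = 0U; v < NR_VECTORS; v++) {
        vector_to_irq[v] = IRQ_INVALID;
    }
    irq_desc[3].irq = 3U;
    irq_desc[3].irq_handler = count_handler;
    irq_desc[3].hits = 0U;
    vector_to_irq[0xEF] = 3U;
}

/* Vector -> IRQ -> irq_handler. Returns 0 if a handler ran, -1 if the
 * vector is not registered. */
int dispatch_vector(uint32_t vector)
{
    uint32_t irq;

    if (vector >= NR_VECTORS) {
        return -1;
    }
    irq = vector_to_irq[vector];
    if (irq == IRQ_INVALID || irq >= NR_IRQS ||
        irq_desc[irq].irq_handler == NULL) {
        return -1;
    }
    irq_desc[irq].irq_handler(&irq_desc[irq]);
    return 0;
}
```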
There are several irq_handlers defined in the ACRN hypervisor, as shown in Figure 51, designed for different uses. For example, quick_handler_nolock is used when no critical data needs protection in the action handlers; the VCPU notification IPI and local timer are good examples of this use case.
The more complicated common_dev_handler_level handler is intended for pass-thru devices with level-triggered interrupts. To avoid continuously triggering the interrupt, it initially masks the IOAPIC pin and unmasks it only when the corresponding vIOAPIC pin gets an explicit EOI ACK from the guest.
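The mask-until-EOI behavior can be modeled with a small state machine. This is a sketch of the idea only, not the code of common_dev_handler_level: the structure level_pin and both functions are hypothetical, and the masked flag stands in for the mask bit of the physical IOAPIC RTE.

```c
#include <stdbool.h>
#include <stdint.h>

struct level_pin {
    bool masked;        /* models the mask bit of the physical IOAPIC pin */
    uint32_t injected;  /* number of virtual interrupts forwarded to the guest */
};

/* Physical interrupt arrives: mask first so the still-asserted line
 * cannot fire again, then forward the interrupt to the guest. */
void level_irq_fired(struct level_pin *pin)
{
    if (pin->masked) {
        return;          /* a masked pin delivers nothing */
    }
    pin->masked = true;
    pin->injected++;
}

/* Guest wrote EOI for the matching vIOAPIC pin: re-enable the source. */
void guest_eoi(struct level_pin *pin)
{
    pin->masked = false;
}
```

While the guest has not yet acknowledged the interrupt, further assertions of the line are ignored; only after guest_eoi() can the next interrupt be forwarded.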
All the irq handlers finally call their own action handler list.
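Walking an action handler list could look like the sketch below. The structure dev_action, its fields, and run_action_list are hypothetical stand-ins for the entries kept on an IRQ's dev_list; the real layout in ACRN may differ.

```c
#include <stddef.h>

struct dev_action {
    void (*func)(void *priv);     /* action handler */
    void *priv;                   /* handler private data */
    struct dev_action *next;      /* next action registered on the same IRQ */
};

/* Call every action on the list; returns how many were run. */
int run_action_list(struct dev_action *head)
{
    int count = 0;

    for (struct dev_action *a = head; a != NULL; a = a->next) {
        if (a->func != NULL) {
            a->func(a->priv);
            count++;
        }
    }
    return count;
}

/* Example action: add one to an int counter passed as private data. */
void bump(void *priv)
{
    (*(int *)priv)++;
}
```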
The common APIs for registering, updating, and unregistering interrupt handlers include irq_to_vector, dev_to_irq, dev_to_vector, pri_register_handler, normal_register_handler, unregister_handler_common, and update_irq_handler.
Physical Interrupt Source¶
The ACRN hypervisor handles interrupts from many different sources, as shown in Table 4:
Interrupt Source | Vector | Description |
---|---|---|
TSC Deadline Timer | 0xEF | The TSC deadline timer implements the timer framework in the hypervisor based on the LAPIC TSC deadline. This interrupt’s target is specific to the CPU to which the LAPIC belongs. |
CPU Startup IPI | N/A | The BSP needs to trigger an INIT-SIPI sequence to wake up the APs. This interrupt’s target is specified by the BSP calling start_cpus(). |
VCPU Notify IPI | 0xF0 | Used when the hypervisor needs to kick the VCPU out of VMX non-root mode to handle requests such as virtual interrupt injection or EPT flush. This interrupt’s target is specified by function send_single_ipi(). |
IOMMU MSI | dynamic | IOMMU device supports an MSI interrupt. The vtd device driver in the hypervisor will register an interrupt to handle dmar fault. This interrupt’s target is specified by vtd device driver. |
PTdev INTx | dynamic | All native devices are owned by the guest (SOS or UOS), taking advantage of the pass-thru method. Each pass-thru device connected with IOAPIC/PIC (PTdev INTx) will register an interrupt when its attached interrupt controller pin first gets unmasked. This interrupt’s target is defined by an RTE entry in the IOAPIC. |
PTdev MSI | dynamic | All native devices are owned by the guest (SOS or UOS), taking advantage of pass-thru method. Each pass-thru device with enabled MSI (PTdev MSI) will register an interrupt when the SOS does an explicit hypercall. This interrupt’s target is defined by an MSI address entry. |
Softirq¶
The ACRN hypervisor implements a simple bottom-half softirq to execute the interrupt handler, as shown in Figure 51. The softirq is executed when interrupts are enabled. Several APIs for softirq are defined, including enable_softirq, disable_softirq, raise_softirq, and exec_softirq.
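A bottom-half mechanism of this kind can be sketched with a pending bitmap. The functions below are prefixed with sketch_ to make clear that they are hypothetical and do not reproduce the signatures of ACRN's softirq APIs; the sketch is also single-CPU and non-atomic, whereas a real implementation would need per-CPU state and atomic bit operations.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SOFTIRQS 8U             /* assumed count, illustration only */

typedef void (*softirq_fn)(void);

static softirq_fn softirq_table[NR_SOFTIRQS];
static uint32_t softirq_pending;   /* one bit per softirq number */
static int softirq_enabled = 1;
static int timer_runs;             /* demo counter */

void timer_softirq(void) { timer_runs++; }   /* demo bottom half */

void sketch_register_softirq(uint32_t nr, softirq_fn fn)
{
    if (nr < NR_SOFTIRQS) {
        softirq_table[nr] = fn;
    }
}

/* Called from the top half: only record that work is pending. */
void sketch_raise_softirq(uint32_t nr)
{
    if (nr < NR_SOFTIRQS) {
        softirq_pending |= (1U << nr);
    }
}

void sketch_enable_softirq(void)  { softirq_enabled = 1; }
void sketch_disable_softirq(void) { softirq_enabled = 0; }

/* Run and clear every pending softirq; returns the number executed. */
int sketch_exec_softirq(void)
{
    int ran = 0;

    if (!softirq_enabled) {
        return 0;
    }
    for (uint32_t nr = 0U; nr < NR_SOFTIRQS; nr++) {
        if ((softirq_pending & (1U << nr)) != 0U) {
            softirq_pending &= ~(1U << nr);
            if (softirq_table[nr] != NULL) {
                softirq_table[nr]();
                ran++;
            }
        }
    }
    return ran;
}
```

The top half only sets a bit, so it stays short; the deferred work runs later, once softirq execution is enabled.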
Physical Exception Handling¶
As mentioned earlier, the ACRN hypervisor does not handle any physical exceptions. The VMX root mode code path should guarantee no exceptions are triggered while the hypervisor is running.
Guest Virtual Interrupt Management¶
The previous sections describe physical interrupt management in the ACRN hypervisor. After a physical interrupt happens, a registered action handler is executed. Usually, the action handler represents a service for virtual interrupt injection. For example, if an interrupt is triggered from a pass-thru device, the appropriate virtual interrupt should be injected into its guest VM.
The virtual interrupt injection could also come from an emulated device. The I/O mediator in the Service OS (SOS) could trigger an interrupt through a hypercall, and then do the virtual interrupt injection in the hypervisor.
The following sections give an introduction to the ACRN guest virtual interrupt management, including VCPU request for virtual interrupt kick off, vPIC/vIOAPIC/vLAPIC for virtual interrupt injection interfaces, physical-to-virtual interrupt mapping for a pass-thru device, and the process of VMX interrupt/exception injection.
VCPU Request¶
As mentioned in Physical Interrupt Source, physical vector 0xF0 is used to kick the VCPU out of its VMX non-root mode, and make a request for virtual interrupt injection or other requests such as flushing the EPT.
The request-make API (vcpu_make_request) and eventid support virtual interrupt injection.
There are requests for exception injection (ACRN_REQUEST_EXCP), vLAPIC event (ACRN_REQUEST_EVENT), external interrupt from vPIC (ACRN_REQUEST_EXTINT) and non-maskable-interrupt (ACRN_REQUEST_NMI).
A call to vcpu_make_request is necessary for a virtual interrupt injection. If the target VCPU is running under VMX non-root mode, the call sends an IPI to kick it out, which results in an external-interrupt VM-Exit. The flow in Figure 51 can then be executed to complete the injection of a virtual interrupt.
There are some cases that do not need to send an IPI when making a request because the CPU making the request is the target VCPU. For example, the #GP exception request always happens on the current CPU when an invalid emulation happens. An external interrupt for a pass-thru device always happens on the VCPUs the device belongs to, so after it triggers an external-interrupt VM-Exit, the current CPU is also the target VCPU.
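The decision of whether an IPI is needed could be sketched as follows. All names here (sketch_vcpu, sketch_make_request, the REQ_ values and their bit positions) are hypothetical, and the ipi_count variable merely stands in for a call to send_single_ipi(); the actual vcpu_make_request may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_request {              /* bit positions are assumptions */
    REQ_EXCP = 0,
    REQ_EVENT = 1,
    REQ_EXTINT = 2,
    REQ_NMI = 3
};

struct sketch_vcpu {
    uint32_t pending_req;          /* bitmap of pending requests */
    uint32_t pcpu_id;              /* physical CPU the VCPU runs on */
    bool in_non_root;              /* true while the guest is executing */
};

static uint32_t ipi_count;         /* stands in for send_single_ipi() */

/* Record the request, then kick the target only if it is executing guest
 * code on a different physical CPU than the caller. */
void sketch_make_request(struct sketch_vcpu *vcpu, enum sketch_request req,
                         uint32_t current_pcpu)
{
    vcpu->pending_req |= (1U << (uint32_t)req);

    if (vcpu->in_non_root && vcpu->pcpu_id != current_pcpu) {
        ipi_count++;               /* notification IPI, vector 0xF0 */
    }
}
```

When the caller already runs on the target's physical CPU, the request bit alone is enough, because the pending requests are examined before the next VM-Entry.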
Virtual PIC¶
The ACRN hypervisor emulates a vPIC for each VM based on IO ranges 0x20-0x21, 0xa0-0xa1, or 0x4d0-0x4d1.
If an interrupt source from vPIC needs to inject an interrupt, the vpic_assert_irq, vpic_deassert_irq, or vpic_pulse_irq functions can be called to make a request for ACRN_REQUEST_EXTINT or ACRN_REQUEST_EVENT.
The vpic_pending_intr and vpic_intr_accepted APIs are used to query the vector being injected and ACK the service, by moving the interrupt from request service (IRR) to in service (ISR).
Virtual IOAPIC¶
ACRN hypervisor emulates a vIOAPIC for each VM based on MMIO VIOAPIC_BASE.
If an interrupt source from vIOAPIC needs to inject an interrupt, the vioapic_assert_irq, vioapic_deassert_irq, and vioapic_pulse_irq APIs are used to make a request for ACRN_REQUEST_EVENT.
As the vIOAPIC is always associated with a vLAPIC, the virtual interrupt injection from vIOAPIC will finally trigger a request for a vLAPIC event.
Virtual LAPIC¶
The ACRN hypervisor emulates a vLAPIC for each VCPU based on MMIO DEFAULT_APIC_BASE.
If an interrupt source from vLAPIC needs to inject an interrupt (e.g., from LVT such as an LAPIC timer, from vIOAPIC for a pass-thru device interrupt, or from an emulated device for an MSI), the vlapic_intr_level, vlapic_intr_edge, vlapic_set_local_intr, vlapic_intr_msi, and vlapic_deliver_intr APIs need to be called, resulting in a request for ACRN_REQUEST_EVENT.
The vlapic_pending_intr and vlapic_intr_accepted APIs are used to query the vector that needs to be injected and to ACK the service, which moves the interrupt from request service (IRR) to in service (ISR).
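The IRR-to-ISR handoff can be illustrated with a simplified model. The structure and the sketch_ functions are hypothetical and do not mirror the signatures of vlapic_pending_intr or vlapic_intr_accepted; the model also leaves out the comparison against the processor priority (PPR) that a real LAPIC performs.

```c
#include <stdint.h>

struct sketch_vlapic {
    uint32_t irr[8];   /* 256 request bits, 32 vectors per word */
    uint32_t isr[8];   /* 256 in-service bits */
};

/* Mark a vector as requested. */
void sketch_set_irr(struct sketch_vlapic *v, uint32_t vector)
{
    v->irr[(vector & 0xffU) >> 5] |= 1U << (vector & 31U);
}

/* Return the highest pending vector, or -1 if nothing is pending. */
int sketch_pending_intr(const struct sketch_vlapic *v)
{
    for (int word = 7; word >= 0; word--) {
        for (int bit = 31; bit >= 0; bit--) {
            if (((v->irr[word] >> bit) & 1U) != 0U) {
                return (word * 32) + bit;
            }
        }
    }
    return -1;
}

/* ACK the injection: move the vector from IRR to ISR. */
void sketch_intr_accepted(struct sketch_vlapic *v, uint32_t vector)
{
    uint32_t word = (vector & 0xffU) >> 5;
    uint32_t mask = 1U << (vector & 31U);

    v->irr[word] &= ~mask;
    v->isr[word] |= mask;
}
```

With vectors 0x31 and 0xA0 both pending, the query returns 0xA0 first because higher vectors have higher priority; once it is accepted, 0x31 becomes the next candidate.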
By default, the ACRN hypervisor enables vAPIC to improve the performance of a vLAPIC emulation.
Virtual Exception¶
When doing emulation, an exception may need to be triggered in the hypervisor. For example, if the guest accesses an invalid vMSR register, the hypervisor needs to inject a #GP; or, during instruction emulation, an instruction fetch may access a non-existent page from rip_gva, and a #PF must be injected.
The ACRN hypervisor implements virtual exception injection using the vcpu_queue_exception, vcpu_inject_gp, and vcpu_inject_pf APIs.
The ACRN hypervisor uses the vcpu_inject_gp and vcpu_inject_pf functions to queue exception requests, and follows the conditions for generating a double fault listed in the Intel Software Developer’s Manual, Vol. 3, Section 6.15, Table 6-5.
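The double-fault rule from that table can be expressed compactly. The sketch below is an illustration with hypothetical function names, not ACRN's queueing code, and it classifies only the classic vectors: #DE, #TS, #NP, #SS, and #GP as contributory, #PF as page fault, and everything else as benign.

```c
#include <stdint.h>

#define VEC_DF 8U    /* double fault */

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Classify an exception vector for the double-fault decision. */
enum exc_class classify_exception(uint32_t vector)
{
    switch (vector) {
    case 0U:    /* #DE */
    case 10U:   /* #TS */
    case 11U:   /* #NP */
    case 12U:   /* #SS */
    case 13U:   /* #GP */
        return EXC_CONTRIBUTORY;
    case 14U:   /* #PF */
        return EXC_PAGE_FAULT;
    default:
        return EXC_BENIGN;
    }
}

/* Vector to queue when `second` is raised while `first` is still pending. */
uint32_t merge_exceptions(uint32_t first, uint32_t second)
{
    enum exc_class a = classify_exception(first);
    enum exc_class b = classify_exception(second);

    if ((a == EXC_CONTRIBUTORY && b == EXC_CONTRIBUTORY) ||
        (a == EXC_PAGE_FAULT && b != EXC_BENIGN)) {
        return VEC_DF;
    }
    return second;   /* otherwise the exceptions are handled serially */
}
```

For example, a #GP raised while another #GP is pending becomes a #DF, whereas a #PF raised while a #GP is pending is still delivered as a #PF.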
Interrupt Mapping for a Pass-thru Device¶
A VM can control a PCI device directly through pass-thru device assignment. The pass-thru entry is the major info object, and it contains:
- A physical interrupt source, which could be an MSI/MSIX entry, PIC pins, or IOAPIC pins
- Pass-thru remapping information between the physical and virtual interrupt source. For MSI/MSIX it is identified by the PCI device’s BDF; for PIC/IOAPIC it is identified by the pin number.
As shown in Figure 52 above, a UOS pass-thru device entry is assigned by the DM, and its entry info is filled from:
- vPIC/vIOAPIC interrupt mask/unmask
- MSI IOReq from UOS then MSI hypercall from SOS
The SOS adds its pass-thru device entry at runtime and fills info for:
- vPIC/vIOAPIC interrupt mask/unmask
- MSI hypercall from SOS
While the pass-thru device entry info is being filled, the hypervisor builds the native IOAPIC RTE/MSI entry based on the vIOAPIC/vPIC/vMSI configuration, and registers the physical interrupt handler for it. Then, with the pass-thru device entry as the handler private data, the physical interrupt can be linked to a virtual pin of a guest’s vPIC/vIOAPIC or a virtual vector of a guest’s vMSI. The handler then injects the corresponding virtual interrupt into the guest, based on the vPIC/vIOAPIC/vLAPIC APIs described earlier.
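A pass-thru entry and its lookup key could be modeled as follows. The structure ptdev_entry, its fields, and ptdev_lookup are hypothetical simplifications; the sketch only captures the identification rule stated above (BDF for MSI/MSIX, pin number for PIC/IOAPIC) and stores a single virtual target per entry.

```c
#include <stddef.h>
#include <stdint.h>

enum ptdev_type { PTDEV_MSI, PTDEV_INTX };

struct ptdev_entry {
    enum ptdev_type type;
    uint32_t phys_id;    /* BDF for MSI/MSIX, pin number for PIC/IOAPIC */
    uint32_t virt_id;    /* virtual vector for vMSI, virtual pin otherwise */
};

/* Find the entry for a physical source; returns NULL if none is registered. */
const struct ptdev_entry *ptdev_lookup(const struct ptdev_entry *tbl, size_t n,
                                       enum ptdev_type type, uint32_t phys_id)
{
    for (size_t i = 0U; i < n; i++) {
        if (tbl[i].type == type && tbl[i].phys_id == phys_id) {
            return &tbl[i];
        }
    }
    return NULL;
}
```

A physical interrupt handler that receives such an entry as private data can read virt_id directly and pass it to the matching virtual interrupt controller API.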
Interrupt Storm Mitigation¶
When the Device Model (DM) launches a User OS (UOS), the ACRN hypervisor will remap the interrupts for this User OS’s pass-thru devices. When an interrupt occurs for a pass-thru device, the CPU core assigned to that User OS gets trapped into the hypervisor. The benefit of such a mechanism is that, should an interrupt storm happen in a particular UOS, it will have only a minimal effect on the performance of the Service OS.
Interrupt/Exception Injection Process¶
As shown in Figure 51, the ACRN hypervisor injects virtual interrupt/exception to the guest before its VM-Entry.
This is done by updating the VMX_ENTRY_INT_INFO_FIELD of the VCPU’s VMCS. As this field is unique, interrupt/exception injection must follow a priority rule and be handled one by one.
Figure 53 below shows the rules for injecting virtual interrupts/exceptions one by one. If a high-priority interrupt/exception was already injected, the next pending interrupt/exception will enable an interrupt window, and the next injection will be done on the following VM-Exit triggered by that interrupt window.
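The one-at-a-time injection could be sketched as follows. The field layout used by make_int_info (vector in bits 7:0, type in bits 10:8, error-code flag in bit 11, valid flag in bit 31) follows the VM-entry interruption-information format in the Intel SDM. Everything else is an assumption for illustration: the structure, the function names, and in particular the priority order exception, then NMI, then interrupt, which stands in for the rules of Figure 53.

```c
#include <stdint.h>

#define INT_INFO_VALID     (1U << 31)
#define INT_INFO_ERR_CODE  (1U << 11)
#define INT_TYPE_EXT_INT   0U
#define INT_TYPE_NMI       2U
#define INT_TYPE_HW_EXC    3U

/* Encode a VM-entry interruption-information value. */
uint32_t make_int_info(uint32_t vector, uint32_t type, int has_err_code)
{
    uint32_t info = INT_INFO_VALID | ((type & 7U) << 8) | (vector & 0xffU);

    if (has_err_code) {
        info |= INT_INFO_ERR_CODE;
    }
    return info;
}

struct sketch_pending {
    int excp_vector;   /* pending exception vector, -1 if none */
    int nmi;           /* nonzero if an NMI is pending */
    int intr_vector;   /* pending interrupt vector, -1 if none */
};

/* Pick the single event to inject on this VM-Entry (0 if none). Sets
 * *need_window when more events remain, meaning an interrupt window must
 * be requested so the next one is injected after the following VM-Exit.
 * The sketch injects exceptions without an error code. */
uint32_t pick_injection(struct sketch_pending *p, int *need_window)
{
    uint32_t info = 0U;

    if (p->excp_vector >= 0) {
        info = make_int_info((uint32_t)p->excp_vector, INT_TYPE_HW_EXC, 0);
        p->excp_vector = -1;
    } else if (p->nmi != 0) {
        info = make_int_info(2U, INT_TYPE_NMI, 0);
        p->nmi = 0;
    } else if (p->intr_vector >= 0) {
        info = make_int_info((uint32_t)p->intr_vector, INT_TYPE_EXT_INT, 0);
        p->intr_vector = -1;
    }
    *need_window = (p->excp_vector >= 0) || (p->nmi != 0) ||
                   (p->intr_vector >= 0);
    return info;
}
```

With a pending #UD (vector 6) and a pending interrupt on vector 0x41, the first call returns the encoding for the exception and asks for a window; the second call returns the encoding for the interrupt.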