Physical Interrupt High-Level Design

Overview

The ACRN hypervisor implements a simple but fully functional framework to manage interrupts and exceptions, as shown in Figure 136. In its native layer, it configures the physical PIC, IOAPIC, and LAPIC to support different interrupt sources, from the local timer/IPI to the external INTx/MSI. In its virtual guest layer, it emulates the virtual PIC, virtual IOAPIC, and virtual LAPIC/passthrough LAPIC. It provides full APIs, allowing virtual interrupt injection from emulated or passthrough devices. This section does not cover the passthrough LAPIC case; for that, refer to LAPIC Passthrough Based on vLAPIC.

../../_images/interrupt-image3.png

Figure 136 ACRN Interrupt Modules Overview

In the software modules view shown in Figure 137, the ACRN hypervisor sets up the physical interrupt in its basic interrupt modules (e.g., IOAPIC/LAPIC/IDT). It dispatches the interrupt in the hypervisor interrupt flow control layer to the corresponding handler, which could be a predefined IPI notification, a timer, or a runtime-registered passthrough device. The ACRN hypervisor then uses its VM interfaces, based on the vPIC, vIOAPIC, and vMSI modules, to inject the necessary virtual interrupt into the specific VM, or delivers the interrupt directly to the specific RT VM with passthrough LAPIC.

../../_images/interrupt-image2.png

Figure 137 ACRN Interrupt SW Modules Overview

The hypervisor implements the following functionalities for handling physical interrupts:

  • Configure interrupt-related hardware including IDT, PIC, LAPIC, and IOAPIC on startup.

  • Provide APIs to manipulate the registers of LAPIC and IOAPIC.

  • Acknowledge physical interrupts.

  • Set up a callback mechanism for the other components in the hypervisor to request an interrupt vector and register a handler for that interrupt.

HV owns all native physical interrupts and manages 256 vectors per CPU. All physical interrupts are first handled in VMX root mode. The “external-interrupt exiting” bit in the VM-execution controls field is set to support this. The ACRN hypervisor also initializes all the interrupt-related modules such as IDT, PIC, IOAPIC, and LAPIC.

HV does not own any host devices (except UART). All devices are by default assigned to the Service VM. Any interrupts received by VM (Service VM or User VM) device drivers are virtual interrupts injected by HV (via vLAPIC). HV manages a Host-to-Guest mapping. When a native IRQ/interrupt occurs, HV decides whether this IRQ/interrupt should be forwarded to a VM and which VM to forward to (if any). Refer to Virtual Interrupt Injection and Interrupt Remapping for more information.

HV does not own any exceptions. Guest VMCS are configured so no VM Exit happens, with some exceptions such as #INT3 and #MC. This is to simplify the design as HV does not support any exception handling itself. HV supports only static memory mapping, so there should be no #PF or #GP. If HV receives an exception indicating an error, an assert function is then executed with an error message printout, and the system then halts.

Native interrupts can be generated from one of the following sources:

  • GSI interrupts

    • PIC or Legacy devices IRQ (0~15)

    • IOAPIC pin

  • PCI MSI/MSI-X vectors

  • Inter CPU IPI

  • LAPIC timer

Physical Interrupt Initialization

After ACRN hypervisor gets control from the bootloader, it initializes all physical interrupt-related modules for all the CPUs. ACRN hypervisor creates a framework to manage the physical interrupt for hypervisor local devices, passthrough devices, and IPI between CPUs, as shown in Figure 138:

../../_images/interrupt-image66.png

Figure 138 Physical Interrupt Initialization

IDT Initialization

ACRN hypervisor builds its native IDT (interrupt descriptor table) during interrupt initialization and sets up the following handlers:

  • On an exception, the hypervisor dumps its context and halts the current physical processor (because physical exceptions are not expected).

  • For external interrupts, HV may mask the interrupt (depending on the trigger mode), followed by interrupt acknowledgement and dispatch to the registered handler, if any.

Most interrupts and exceptions are handled without a stack switch, except for machine-check, double fault, and stack fault exceptions which have their own stack set in TSS.
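The stack-switch rule above can be illustrated with a 64-bit IDT gate descriptor, whose IST field selects a dedicated stack from the TSS. The following is a minimal sketch and not ACRN's actual code: the names idt_gate64, ist_for_vector, and make_gate, as well as the IST slot numbers, are assumptions made for illustration. Only the descriptor layout and the exception vector numbers (#DF = 8, #SS = 12, #MC = 18) come from the x86-64 architecture.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit interrupt gate descriptor layout (16 bytes). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* bits 2:0: IST index, 0 = no stack switch */
    uint8_t  type_attr;    /* 0x8E = present, DPL 0, interrupt gate */
    uint16_t offset_mid;   /* handler address bits 31:16 */
    uint32_t offset_high;  /* handler address bits 63:32 */
    uint32_t reserved;
};

#define VEC_DF 8U    /* double fault */
#define VEC_SS 12U   /* stack fault */
#define VEC_MC 18U   /* machine check */

/* Hypothetical IST slot assignment: only #MC, #DF and #SS get their own stack. */
static uint8_t ist_for_vector(uint32_t vector)
{
    switch (vector) {
    case VEC_MC: return 1U;
    case VEC_DF: return 2U;
    case VEC_SS: return 3U;
    default:     return 0U;   /* handled on the current stack */
    }
}

/* Build the gate for one vector; `cs` is the hypervisor code selector. */
static struct idt_gate64 make_gate(uint32_t vector, uint64_t handler, uint16_t cs)
{
    struct idt_gate64 g;

    assert(vector < 256U);
    g.offset_low  = (uint16_t)(handler & 0xFFFFU);
    g.selector    = cs;
    g.ist         = ist_for_vector(vector);
    g.type_attr   = 0x8EU;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFU);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0U;
    return g;
}
```

A gate built for any other vector carries an IST index of 0, which means the CPU keeps using the current stack when it delivers that interrupt or exception.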

PIC/IOAPIC Initialization

ACRN hypervisor masks all interrupts from the PIC. All legacy interrupts from PIC (<16) will be linked to IOAPIC, as shown in the connections in Figure 139.

ACRN will pre-allocate vectors and set them for these legacy interrupts in IOAPIC RTEs. For others (>= 16), ACRN will set them with vector 0 in RTEs, and valid vectors will be dynamically allocated on demand.
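The RTE setup described above could look roughly like the following. This is a sketch under stated assumptions rather than the hypervisor's implementation: init_rtes and rte_vector are hypothetical helpers, the RTEs live in a plain array instead of IOAPIC registers, and starting every pin masked is an assumption. The vector field (bits 7:0) and mask bit (bit 16) follow the standard IOAPIC RTE layout, and the 0x20 base matches the legacy vector range used in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LEGACY_IRQ      16U
#define VECTOR_LEGACY_BASE 0x20U          /* IRQ0-15 map to 0x20-0x2F */
#define RTE_VECTOR_MASK    0xFFULL        /* RTE bits 7:0: vector */
#define RTE_MASK_BIT       (1ULL << 16)   /* RTE bit 16: interrupt masked */

/* Fill a software copy of the redirection table for nr_pins pins. */
static void init_rtes(uint64_t *rte, uint32_t nr_pins)
{
    uint32_t pin;

    assert(rte != NULL);
    for (pin = 0U; pin < nr_pins; pin++) {
        /* Legacy pins get a pre-allocated vector; others get vector 0. */
        uint64_t vector = (pin < NR_LEGACY_IRQ) ? (VECTOR_LEGACY_BASE + pin) : 0U;

        /* Assumption: every pin starts masked until a handler is registered. */
        rte[pin] = RTE_MASK_BIT | (vector & RTE_VECTOR_MASK);
    }
}

/* Extract the vector field from an RTE. */
static uint32_t rte_vector(uint64_t rte)
{
    return (uint32_t)(rte & RTE_VECTOR_MASK);
}
```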

All external IOAPIC pins are categorized as GSI interrupts according to the ACPI definition. HV supports multiple IOAPIC components. IRQ pin-to-GSI mappings are maintained internally to determine the source IOAPIC of a GSI. The native PIC is not used in the system.

../../_images/interrupt-image46.png

Figure 139 HV PIC/IOAPIC/LAPIC configuration

LAPIC Initialization

Physical LAPICs are in x2APIC mode in ACRN hypervisor. The hypervisor initializes LAPIC for each physical CPU by masking all interrupts in the local vector table (LVT), clearing all ISRs, and enabling LAPIC.

APIs are provided for the other components in the hypervisor to access the LAPIC, for purposes such as programming the local timer (TSC Deadline) and sending IPI notifications. See Data Structures and Interfaces for a complete list.

HV Interrupt Vectors and Delivery Mode

The interrupt vectors are assigned as shown here:

Vector: 0x0-0x1F

are exceptions that are not handled by HV. If such an exception does occur, the system then halts.

Vector: 0x20-0x2F

are allocated statically for legacy IRQ0-15.

Vector: 0x30-0xDF

are dynamically allocated vectors for PCI devices INTx or MSI/MSI-X usage. Depending on the interrupt delivery mode (FLAT or PER_CPU mode), an interrupt is assigned to a vector for all the CPUs or for a particular CPU.

Vector: 0xE0-0xFE

are high priority vectors reserved by HV for dedicated purposes. For example, 0xEF is used for timer, 0xF0 is used for IPI.

Vectors        Usage
0x0-0x14       Exceptions: NMI, INT3, page fault, GP, debug.
0x15-0x1F      Reserved
0x20-0x2F      Statically allocated for external IRQ (IRQ0-IRQ15)
0x30-0xDF      Dynamically allocated for IOAPIC IRQ from PCI INTx/MSI
0xE0-0xFE      Statically allocated for HV
0xEF           Timer
0xF0           IPI
0xF2           Posted Interrupt
0xF3           Hypervisor Callback HSM
0xF4           Performance Monitoring Interrupt
0xFF           SPURIOUS_APIC_VECTOR
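To make the vector ranges listed above concrete, here is a small sketch that classifies a vector by range. The enum and function names are hypothetical and only illustrate the layout; the range boundaries and the dedicated vectors are taken from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Vector classes, following the ranges given in this section. */
enum vector_class {
    VEC_CLASS_EXCEPTION,    /* 0x00-0x1F: exceptions and reserved */
    VEC_CLASS_LEGACY_IRQ,   /* 0x20-0x2F: static, legacy IRQ0-15 */
    VEC_CLASS_DYNAMIC,      /* 0x30-0xDF: dynamic, PCI INTx/MSI/MSI-X */
    VEC_CLASS_HV_RESERVED,  /* 0xE0-0xFE: reserved by HV */
    VEC_CLASS_SPURIOUS      /* 0xFF: spurious APIC vector */
};

#define VECTOR_TIMER 0xEFU
#define VECTOR_IPI   0xF0U

static enum vector_class classify_vector(uint32_t vector)
{
    assert(vector <= 0xFFU);
    if (vector <= 0x1FU) {
        return VEC_CLASS_EXCEPTION;
    }
    if (vector <= 0x2FU) {
        return VEC_CLASS_LEGACY_IRQ;
    }
    if (vector <= 0xDFU) {
        return VEC_CLASS_DYNAMIC;
    }
    if (vector <= 0xFEU) {
        return VEC_CLASS_HV_RESERVED;
    }
    return VEC_CLASS_SPURIOUS;
}
```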

Interrupts from either IOAPIC or MSI can be delivered to a target CPU. By default they are configured as Lowest Priority (FLAT mode), i.e. they are delivered to a CPU core that is currently idle or executing the lowest-priority ISR. There is no guarantee that a device’s interrupt will be delivered to a specific Guest’s CPU. Timer interrupts are an exception: they are always delivered to the CPU that programs the LAPIC timer.

x86-64 supports per-CPU IDTs, but ACRN uses a global shared IDT, so the interrupt/IRQ to vector mapping is the same on all CPUs. Vector allocation for CPUs is shown here:

../../_images/interrupt-image89.png

Figure 140 FLAT mode vector allocation

IRQ Descriptor Table

ACRN hypervisor maintains a global IRQ Descriptor Table shared among the physical CPUs, so the same vector will link to the same IRQ number for all CPUs.

The index of the irq_desc[] array is the IRQ number. dispatch_interrupt calls handle_irq, which provides common handling for edge-triggered and level-triggered IRQs and calls the registered action_fn.

Another reverse mapping from vector to IRQ is used in addition to the IRQ descriptor table which maintains the mapping from IRQ to vector.

On initialization, the descriptors of the legacy IRQs are initialized with proper vectors and the corresponding reverse mapping is set up. The descriptors of other IRQs are filled with an invalid vector, which will be updated on IRQ allocation.

For example, if the local timer registers an interrupt with IRQ number 254 and vector 0xEF, then the following data will be set up:

irq_desc[254].irq = 254
irq_desc[254].vector = 0xEF
vector_to_irq[0xEF] = 254
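The two mappings can be sketched as a pair of arrays that are always updated together. This is a simplified illustration, not the hypervisor's data structure: the struct tag irq_desc_entry, the helpers init_irq_tables and bind_irq_vector, the table sizes, and the VECTOR_INVALID value are assumptions, and the real descriptor holds more fields (such as the action function and flags).

```c
#include <assert.h>
#include <stdint.h>

#define NR_IRQS            256U         /* assumed table size */
#define NR_VECTORS         256U
#define NR_LEGACY_IRQ      16U
#define VECTOR_LEGACY_BASE 0x20U
#define VECTOR_INVALID     0xFFFFFFFFU  /* assumed marker value */
#define IRQ_INVALID        0xFFFFFFFFU

/* Reduced descriptor: only the fields used in the example above. */
struct irq_desc_entry {
    uint32_t irq;
    uint32_t vector;
};

static struct irq_desc_entry irq_desc[NR_IRQS];   /* IRQ -> vector */
static uint32_t vector_to_irq[NR_VECTORS];        /* vector -> IRQ */

/* Record a vector for an IRQ and keep the reverse mapping in sync. */
static void bind_irq_vector(uint32_t irq, uint32_t vector)
{
    assert((irq < NR_IRQS) && (vector < NR_VECTORS));
    irq_desc[irq].vector = vector;
    vector_to_irq[vector] = irq;
}

/* Legacy IRQs get fixed vectors; everything else starts out invalid. */
static void init_irq_tables(void)
{
    uint32_t i;

    for (i = 0U; i < NR_VECTORS; i++) {
        vector_to_irq[i] = IRQ_INVALID;
    }
    for (i = 0U; i < NR_IRQS; i++) {
        irq_desc[i].irq = i;
        irq_desc[i].vector = VECTOR_INVALID;
    }
    for (i = 0U; i < NR_LEGACY_IRQ; i++) {
        bind_irq_vector(i, VECTOR_LEGACY_BASE + i);
    }
}
```

Calling init_irq_tables() followed by bind_irq_vector(254U, 0xEFU) produces the three assignments shown in the timer example.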

External Interrupt Handling

While executing guest code, a CPU runs in VMX non-root mode. MSR_IA32_VMX_PINBASED_CTLS.bit[0] and MSR_IA32_VMX_EXIT_CTLS.bit[15] are set so that a vCPU takes a VM Exit to HV whenever an interrupt arrives at that physical CPU in non-root mode. The interrupt is acknowledged on the VM Exit and its vector is saved to the relevant VM Exit field for HV IRQ processing.
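The two control bits and the retrieval of the saved vector can be sketched as follows. The macro and function names are illustrative assumptions, and the sketch operates on plain integers rather than on real VMCS fields. The bit positions are those given in the text; the layout of the VM-exit interruption-information field (valid flag in bit 31, vector in bits 7:0) follows the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Control bits named in the text; macro names are illustrative. */
#define PINBASED_EXT_INTR_EXITING  (1U << 0)   /* external-interrupt exiting */
#define EXIT_CTL_ACK_INTR_ON_EXIT  (1U << 15)  /* acknowledge interrupt on exit */

/* VM-exit interruption-information field layout. */
#define EXIT_INTR_INFO_VALID       (1U << 31)
#define EXIT_INTR_INFO_VECTOR_MASK 0xFFU

static uint32_t enable_ext_intr_exiting(uint32_t pin_ctls)
{
    return pin_ctls | PINBASED_EXT_INTR_EXITING;
}

static uint32_t enable_ack_on_exit(uint32_t exit_ctls)
{
    return exit_ctls | EXIT_CTL_ACK_INTR_ON_EXIT;
}

/* Returns true and stores the vector if the field holds valid information. */
static bool exit_intr_vector(uint32_t intr_info, uint32_t *vector)
{
    assert(vector != NULL);
    if ((intr_info & EXIT_INTR_INFO_VALID) == 0U) {
        return false;
    }
    *vector = intr_info & EXIT_INTR_INFO_VECTOR_MASK;
    return true;
}
```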

Note that as discussed above, an external interrupt causing vCPU VM Exit to HV does not mean that the interrupt belongs to that Guest VM. When CPU executes VM Exit into root-mode, interrupt handling will be enabled and the interrupt will be delivered and processed as quickly as possible inside HV. HV may emulate a virtual interrupt and inject to Guest if necessary.

Interrupt and IRQ processing flow diagrams are shown below:

../../_images/interrupt-image48.png

Figure 141 Processing of physical interrupts

When a physical interrupt is raised and delivered to a physical CPU, the CPU may be running under either VMX root mode or non-root mode.

  • If the CPU is running under VMX root mode, the interrupt is handled following the standard native IRQ flow: interrupt gate to dispatch_interrupt(), IRQ handler, and finally the registered callback.

  • If the CPU is running under VMX non-root mode, an external interrupt causes a VM exit with reason “external-interrupt”, and then the VM exit processing flow calls dispatch_interrupt() to dispatch and handle the interrupt.

After an interrupt occurs from either path shown in Figure 141, ACRN hypervisor will jump to dispatch_interrupt. This function gets the vector of the generated interrupt from the context, gets IRQ number from vector_to_irq[], and then gets the corresponding irq_desc.

Though there is only one generic IRQ handler for registered interrupts, there are three different handling flows according to flags:

  • !IRQF_LEVEL

  • IRQF_LEVEL && !IRQF_PT

    To avoid continuous interrupt triggers, it masks the IOAPIC pin and unmasks it only after the IRQ action callback is executed.

  • IRQF_LEVEL && IRQF_PT

    For passthrough devices, to avoid continuous interrupt triggers, it masks the IOAPIC pin and leaves it masked until the corresponding vIOAPIC pin gets an explicit EOI ACK from the guest.
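The masking decisions in the three flows can be modeled in a few lines. This is a behavioral sketch only: struct sketch_irq, sketch_handle_irq, and sketch_guest_eoi are hypothetical, the flag bit values are assumptions, the IOAPIC pin is simulated with a boolean, and a counter stands in for the registered action callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names come from the text; the bit values are assumptions. */
#define IRQF_NONE  0U
#define IRQF_LEVEL (1U << 1)
#define IRQF_PT    (1U << 2)

struct sketch_irq {
    uint32_t flags;
    bool     pin_masked;    /* simulated IOAPIC pin mask state */
    uint32_t action_calls;  /* stands in for calling action_fn */
};

/* Generic handler: the flags decide whether and when the pin is masked. */
static void sketch_handle_irq(struct sketch_irq *d)
{
    bool level, pt;

    assert(d != NULL);
    level = (d->flags & IRQF_LEVEL) != 0U;
    pt = (d->flags & IRQF_PT) != 0U;

    if (level) {
        d->pin_masked = true;    /* stop the level source from retriggering */
    }
    d->action_calls++;           /* run the registered action */
    if (level && !pt) {
        d->pin_masked = false;   /* non-passthrough: unmask after the action */
    }
    /* level && pt: the pin stays masked until the guest EOI arrives */
}

/* Called when the vIOAPIC pin receives the guest's EOI. */
static void sketch_guest_eoi(struct sketch_irq *d)
{
    assert(d != NULL);
    if (((d->flags & IRQF_LEVEL) != 0U) && ((d->flags & IRQF_PT) != 0U)) {
        d->pin_masked = false;
    }
}
```

In this model an edge-triggered interrupt never touches the mask, which reflects that only level-triggered sources can retrigger continuously.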

Since interrupts are not shared for multiple devices, there is only one IRQ action registered for each interrupt.

The IRQ number inside HV is a software concept to identify GSIs and vectors. Each GSI is mapped to one IRQ, and the GSI number is usually the same as the IRQ number. IRQ numbers greater than the max GSI (nr_gsi) number are dynamically assigned. For example, when HV allocates an interrupt vector to a PCI device, an IRQ number is then assigned to that vector. When the vector later reaches a CPU, the corresponding IRQ action function is located and executed.
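The split between GSI-backed and dynamically assigned IRQ numbers can be illustrated with a small allocator. This is a sketch under assumptions: the sketch_* functions, the boolean usage table, and the table size are hypothetical and do not reproduce the hypervisor's reserve_irq_num or request_irq. IRQ_INVALID is used as described in the API reference, both as the "pick any free number" request and as the failure value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRQS     256U         /* assumed table size */
#define IRQ_INVALID 0xFFFFFFFFU

static bool irq_used[NR_IRQS];

/* Claim a specific IRQ number; fails if out of range or already taken. */
static uint32_t sketch_reserve_irq(uint32_t req_irq)
{
    if ((req_irq >= NR_IRQS) || irq_used[req_irq]) {
        return IRQ_INVALID;
    }
    irq_used[req_irq] = true;
    return req_irq;
}

/* IRQ numbers 0..nr_gsi-1 mirror the GSI numbers and are reserved up front. */
static void sketch_reserve_gsi_range(uint32_t nr_gsi)
{
    uint32_t gsi;

    assert(nr_gsi <= NR_IRQS);
    for (gsi = 0U; gsi < nr_gsi; gsi++) {
        (void)sketch_reserve_irq(gsi);
    }
}

/* With IRQ_INVALID as input, hand out the first free number instead. */
static uint32_t sketch_alloc_irq(uint32_t req_irq)
{
    uint32_t irq;

    if (req_irq != IRQ_INVALID) {
        return sketch_reserve_irq(req_irq);
    }
    for (irq = 0U; irq < NR_IRQS; irq++) {
        if (!irq_used[irq]) {
            irq_used[irq] = true;
            return irq;
        }
    }
    return IRQ_INVALID;
}
```

Because the GSI range is reserved first, every dynamically allocated number in this model ends up at or above nr_gsi.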

See Figure 142 for request IRQ control flow for different conditions:

../../_images/interrupt-image76.png

Figure 142 Request IRQ for different conditions

IPI Management

The only purpose of IPIs in HV is to kick a vCPU out of non-root mode so that the CPU enters HV mode. I/O requests and virtual interrupt injection use different IPI vectors: the I/O request uses the 0xF3 upcall vector, and virtual interrupt injection uses IPI vector 0xF0.

0xF3 upcall

A guest vCPU takes a VM Exit due to an EPT violation or an I/O instruction trap, which requires the Device Module to emulate the MMIO/port I/O instruction. However, the Service VM vCPU0 may still be in non-root mode. So an IPI (0xF3 upcall vector) is sent to physical CPU0 (which is running vCPU0 of the Service VM in non-root mode) to force vCPU0 to VM Exit due to the external interrupt. The virtual upcall vector is then injected into the Service VM, and vCPU0 inside the Service VM picks up the I/O request and performs the emulation for the other guest.

0xF0 IPI flow

If the Device Module inside the Service VM needs to inject an interrupt into another guest vCPU such as vCPU1, it first issues an IPI to kick CPU1 (assuming vCPU1 is running on CPU1) into root mode. CPU1 then injects the interrupt before VM Enter.
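The choice between the two IPI vectors can be summarized in a short sketch. The names kick_reason, ipi_request, and plan_kick are hypothetical, and the function only computes which physical CPU and vector would be used; it does not send anything. The vector values and the use of physical CPU0 for the upcall are taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_KICK_VCPU  0xF0U  /* IPI used for virtual interrupt injection */
#define VECTOR_HSM_UPCALL 0xF3U  /* upcall used for I/O requests */
#define SERVICE_VM_PCPU   0U     /* Service VM vCPU0 runs on physical CPU0 */

enum kick_reason {
    KICK_IO_REQUEST,   /* a guest needs MMIO/port I/O emulation */
    KICK_VIRT_INTR     /* an interrupt must be injected into a guest vCPU */
};

struct ipi_request {
    uint16_t pcpu_id;
    uint32_t vector;
};

/* Decide which physical CPU to kick and with which vector. */
static struct ipi_request plan_kick(enum kick_reason reason, uint16_t target_pcpu)
{
    struct ipi_request req;

    assert((reason == KICK_IO_REQUEST) || (reason == KICK_VIRT_INTR));
    if (reason == KICK_IO_REQUEST) {
        /* The request is picked up by the Service VM, not by the target guest. */
        req.pcpu_id = SERVICE_VM_PCPU;
        req.vector = VECTOR_HSM_UPCALL;
    } else {
        /* Kick the CPU running the target vCPU so it injects before VM Enter. */
        req.pcpu_id = target_pcpu;
        req.vector = VECTOR_KICK_VCPU;
    }
    return req;
}
```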

Data Structures and Interfaces

IOAPIC

The following APIs are external interfaces for IOAPIC related operations.

uint32_t ioapic_gsi_to_irq(uint32_t gsi)

Get irq num from gsi num.

Parameters
  • gsi[in] The gsi number

Returns

irq number

void ioapic_set_rte(uint32_t irq, union ioapic_rte rte)

Set the redirection table entry.

Set the redirection table entry of an interrupt

Parameters
  • irq[in] The number of irq to set

  • rte[in] Union of ioapic_rte to set

void ioapic_get_rte(uint32_t irq, union ioapic_rte *rte)

Get the redirection table entry.

Get the redirection table entry of an interrupt

Preconditions

rte != NULL

Parameters
  • irq[in] The number of irq to fetch RTE

  • rte[inout] Pointer to union ioapic_rte to return result RTE

void suspend_ioapic(void)

Suspend ioapic.

Suspend ioapic, mainly save the RTEs.

void resume_ioapic(void)

Resume ioapic.

Resume ioapic, mainly restore the RTEs.

LAPIC

The following APIs are external interfaces for LAPIC related operations.

void early_init_lapic(void)

Enable LAPIC in x2APIC mode.

Enable LAPIC in x2APIC mode via MSR writes.

void suspend_lapic(void)

Suspend LAPIC.

Suspend LAPIC by getting the APIC base addr and saving the registers.

void resume_lapic(void)

Resume LAPIC.

Resume LAPIC by setting the APIC base addr and restoring the registers.

uint32_t get_cur_lapic_id(void)

Get the LAPIC ID.

Get the LAPIC ID via MSR read.

Returns

LAPIC ID

IPI

The following APIs are external interfaces for IPI related operations.

void send_startup_ipi(uint16_t dest_pcpu_id, uint64_t cpu_startup_start_address)

Send an SIPI to a specific cpu.

Send a Startup IPI to a specific cpu, to notify the cpu to start booting.

Parameters
  • dest_pcpu_id[in] The id of destination physical cpu

  • cpu_startup_start_address[in] The address for the dest pCPU to start running

void send_dest_ipi_mask(uint32_t dest_mask, uint32_t vector)

Send an IPI to multiple pCPUs.

Parameters
  • dest_mask[in] The mask of destination physical cpus

  • vector[in] The vector of interrupt
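One way to picture a mask-based send is as a loop over the set bits of dest_mask, issuing one single-CPU IPI per bit. The sketch below is an assumption about the behavior, not the actual implementation: the sketch_* functions are hypothetical, bit N of the mask is assumed to select pCPU N, and the IPI itself is replaced by a recording array so the example can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PCPU_NUM 32U   /* one bit of the 32-bit mask per pCPU (assumed) */

/* Records the last vector "sent" to each pCPU; 0 means nothing was sent. */
static uint32_t last_vector[MAX_PCPU_NUM];

/* Stand-in for a real single-CPU IPI send. */
static void sketch_send_single_ipi(uint16_t pcpu_id, uint32_t vector)
{
    assert(pcpu_id < MAX_PCPU_NUM);
    last_vector[pcpu_id] = vector;
}

/* Walk the mask from bit 0 upward and send to every selected pCPU. */
static void sketch_send_dest_ipi_mask(uint32_t dest_mask, uint32_t vector)
{
    uint16_t pcpu_id = 0U;

    while (dest_mask != 0U) {
        if ((dest_mask & 1U) != 0U) {
            sketch_send_single_ipi(pcpu_id, vector);
        }
        dest_mask >>= 1U;
        pcpu_id++;
    }
}
```

For example, a mask of 0xA selects pCPU1 and pCPU3 in this model.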

void send_single_ipi(uint16_t pcpu_id, uint32_t vector)

Send an IPI to a single pCPU.

Parameters
  • pcpu_id[in] The id of destination physical cpu

  • vector[in] The vector of interrupt

Physical Interrupt

The following APIs are external interfaces for physical interrupt related operations.

uint32_t reserve_irq_num(uint32_t req_irq)

Reserve an interrupt num.

Reserved interrupt num will not be available for dynamic IRQ allocations. This is normally used by the hypervisor for static IRQ mappings and/or arch specific, e.g. IOAPIC, interrupts during initialization.

Parameters
  • req_irq[in] irq_num to be reserved

Returns

>=0 – on success, IRQ_INVALID on failure

int32_t request_irq(uint32_t req_irq, irq_action_t action_fn, void *priv_data, uint32_t flags)

Request an interrupt.

Request interrupt num if not specified, and register irq action for the specified/allocated irq.

Parameters
  • req_irq[in] irq_num to request, if IRQ_INVALID, a free irq number will be allocated

  • action_fn[in] Function to be called when the IRQ occurs

  • priv_data[in] Private data for action function.

  • flags[in] Interrupt type flags, including: IRQF_NONE; IRQF_LEVEL - 1: level trigger; 0: edge trigger; IRQF_PT - 1: for passthrough dev

Returns

  • >=0 – on success

  • IRQ_INVALID – on failure

void free_irq(uint32_t irq)

Free an interrupt.

Free irq num and unregister the irq action.

Parameters
  • irq[in] irq_num to be freed

void set_irq_trigger_mode(uint32_t irq, bool is_level_triggered)

Set interrupt trigger mode.

Set the irq trigger mode: edge-triggered or level-triggered

Parameters
  • irq[in] irq_num of interrupt to be set

  • is_level_triggered[in] Trigger mode to set

void do_irq(const uint32_t irq)

Process an IRQ.

To process an IRQ, an action callback will be called if registered.

Parameters
  • irq[in] irq_num to be processed

void init_interrupt(uint16_t pcpu_id)

Initialize interrupt.

Perform interrupt initialization for a cpu. This function is called once for each physical cpu.

Parameters
  • pcpu_id[in] The id of physical cpu to initialize