16 Jan, 2009

1 commit


31 Dec, 2008

5 commits

  • There is no point in doing the ready_for_nmi_injection/
    request_nmi_window dance with user space. First, we don't do this for
    in-kernel irqchip anyway, while the code path is the same as for user
    space irqchip mode. And second, there is nothing to lose if a pending
    NMI is overwritten by another one (in contrast to IRQs where we have to
    save the number). Actually, there is even the risk of raising spurious
    NMIs this way because the reason for the held-back NMI might already be
    handled while processing the first one.

    Therefore this patch creates a simplified user space NMI injection
    interface, exporting it under KVM_CAP_USER_NMI and dropping the old
    KVM_CAP_NMI capability. And this time we also take care to provide the
    interface only on archs supporting NMIs via KVM (right now only x86).

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Userspace might need to act differently.

    Signed-off-by: Avi Kivity

    Avi Kivity
     
  • This patch enables guest MSI and host MSI support. Userspace that wants to
    enable MSI should set KVM_DEV_IRQ_ASSIGN_ENABLE_MSI in the assigned_irq's
    flags. The function returns -ENOTTY if MSI cannot be enabled; userspace
    should not set the MSI Enable bit when KVM_ASSIGN_IRQ returns -ENOTTY with
    KVM_DEV_IRQ_ASSIGN_ENABLE_MSI set.

    Userspace can detect MSI device support with #ifdef KVM_CAP_DEVICE_MSI.

    Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Sheng Yang
     
  • Prepared for kvm_arch_assigned_device_msi_dispatch().

    Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Sheng Yang
     
  • Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
    injecting NMIs from user space and also extends the statistic report
    accordingly.

    Based on the original patch by Sheng Yang.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

28 Oct, 2008

1 commit


15 Oct, 2008

9 commits

  • With intel iommu hardware, we can assign devices to kvm/ia64 guests.

    Signed-off-by: Xiantao Zhang
    Signed-off-by: Avi Kivity

    Xiantao Zhang
     
  • To share with other archs, this patch moves device assignment
    logic to common parts.

    Signed-off-by: Xiantao Zhang
    Signed-off-by: Avi Kivity

    Xiantao Zhang
     
  • Based on a patch by: Kay, Allen M

    This patch enables PCI device assignment based on VT-d support.
    When a device is assigned to the guest, the guest memory is pinned and
    the mapping is updated in the VT-d IOMMU.

    [Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present
    and also control enable/disable from userspace]

    Signed-off-by: Kay, Allen M
    Signed-off-by: Weidong Han
    Signed-off-by: Ben-Ami Yassour
    Signed-off-by: Amit Shah

    Acked-by: Mark Gross
    Signed-off-by: Avi Kivity

    Ben-Ami Yassour
     
  • Based on a patch from: Amit Shah

    This patch adds support for handling PCI devices that are assigned to
    the guest.

    The device to be assigned to the guest is registered in the host kernel
    and interrupt delivery is handled. If a device is already assigned, or
    the device driver for it is still loaded on the host, the device
    assignment fails and -EBUSY is returned to userspace.

    Devices that share their interrupt line are not supported at the moment.

    By itself, this patch will not make devices work within the guest.
    The VT-d extension is required to enable the device to perform DMA.
    Another alternative is PVDMA.

    Signed-off-by: Amit Shah
    Signed-off-by: Ben-Ami Yassour
    Signed-off-by: Weidong Han
    Signed-off-by: Avi Kivity

    Ben-Ami Yassour
     
  • This patch adds a trace point for the instruction emulation on embedded powerpc
    utilizing the KVM_TRACE interface.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Christian Ehrhardt
     
  • This patch adds trace points to track powerpc TLB activities using the
    KVM_TRACE infrastructure.

    Signed-off-by: Jerone Young
    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Jerone Young
     
  • The current kvmtrace code uses get_cycles(), while interpretation would be
    easier using nanoseconds. ktime_get() should give at least the same
    accuracy as get_cycles() on all architectures (even better on 32bit archs) but
    in a better unit (e.g. comparable between hosts with different frequencies).

    [avi: avoid ktime_t in public header]

    Signed-off-by: Christian Ehrhardt
    Acked-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Christian Ehrhardt
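
    To illustrate why raw cycle counts are hard to compare between hosts, here
    is a minimal sketch in C that converts a cycle count to nanoseconds for a
    given clock frequency. The function name cycles_to_ns() and the idea of
    passing the frequency explicitly are assumptions made for this example;
    they are not part of kvmtrace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a cycle count to nanoseconds for a clock
 * running at hz cycles per second. Whole seconds and the remainder are
 * converted separately so the multiplication does not overflow for
 * frequencies below roughly 18 GHz.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    return (cycles / hz) * 1000000000ULL
         + (cycles % hz) * 1000000000ULL / hz;
}
```

    The same 3,000,000,000 cycles correspond to 1 s on a 3 GHz host but 1.5 s
    on a 2 GHz host, which is why a nanosecond timestamp is easier to interpret.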
     
  • This patch fixes kvmtrace use on big endian systems. When using bit fields,
    the compiler lays the data out in a different order than expected once it
    is written to a file.
    This fixes it by using one variable instead of bit fields.

    Signed-off-by: Jerone Young
    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Christian Ehrhardt
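
    The idea of replacing bit fields with one variable can be sketched in C as
    follows. The field widths (28-bit event id, 3-bit argument count, 1-bit
    timestamp flag) and every name below are assumptions chosen for
    illustration, not the kernel's definitions. Because the fields are placed
    with explicit shifts and masks, their positions inside the 32-bit value no
    longer depend on how the compiler orders bit fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-27 event id, bits 28-30 number of extra
 * arguments, bit 31 "timestamp present" flag. */
#define REC_EVENT_MASK  0x0fffffffu
#define REC_NARGS_SHIFT 28
#define REC_NARGS_MASK  0x7u
#define REC_TS_SHIFT    31

/* Pack the three fields into a single 32-bit value. */
static uint32_t rec_pack(uint32_t event, uint32_t nargs, int has_ts)
{
    assert(nargs <= REC_NARGS_MASK);
    return (event & REC_EVENT_MASK)
         | (nargs << REC_NARGS_SHIFT)
         | ((uint32_t)(has_ts != 0) << REC_TS_SHIFT);
}

/* Accessors that undo the packing. */
static uint32_t rec_event(uint32_t v)
{
    return v & REC_EVENT_MASK;
}

static uint32_t rec_nargs(uint32_t v)
{
    return (v >> REC_NARGS_SHIFT) & REC_NARGS_MASK;
}

static int rec_has_ts(uint32_t v)
{
    return (int)(v >> REC_TS_SHIFT);
}
```

    A reader of the trace file then only has to handle the byte order of one
    32-bit word rather than a compiler-specific bit field layout.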
     
  • Move KVM trace definitions from x86 specific kvm headers to common kvm
    headers to create a cross-architecture numbering scheme for trace
    events. This means the kvmtrace_format userspace tool won't need to know
    which architecture produced the log file being processed.

    Signed-off-by: Jerone Young
    Signed-off-by: Hollis Blanchard
    Signed-off-by: Avi Kivity

    Hollis Blanchard
     

25 Aug, 2008

1 commit

  • The following part of commit 9ef621d3be56e1188300476a8102ff54f7b6793f
    (KVM: Support mixed endian machines) changed the size of a struct
    that is exported to userspace:

    include/linux/kvm.h:

    @@ -318,14 +318,14 @@ struct kvm_trace_rec {
             __u32 vcpu_id;
             union {
                     struct {
    -                        __u32 cycle_lo, cycle_hi;
    +                        __u64 cycle_u64;
                             __u32 extra_u32[KVM_TRC_EXTRA_MAX];
                     } cycle;
                     struct {
                             __u32 extra_u32[KVM_TRC_EXTRA_MAX];
                     } nocycle;
             } u;
    -};
    +} __attribute__((packed));

    Packing a struct was the correct idea, but it packed the wrong struct.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Avi Kivity

    Adrian Bunk
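
    How the choice of which struct to pack affects the size seen by userspace
    can be checked with a small C sketch. The struct names, the three leading
    32-bit fields and the value of EXTRA_MAX are assumptions for illustration;
    the sketch also assumes a compiler that supports the GCC-style packed
    attribute. It shows one layout that keeps the old size, without claiming
    to reproduce the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXTRA_MAX 7 /* assumed value, for illustration only */

/* Old layout: the cycle counter is a pair of 32-bit halves. */
struct rec_old {
    uint32_t header, pid, vcpu_id;
    union {
        struct { uint32_t cycle_lo, cycle_hi; uint32_t extra[EXTRA_MAX]; } cycle;
        struct { uint32_t extra[EXTRA_MAX]; } nocycle;
    } u;
};

/* 64-bit counter with the OUTER struct packed. The inner struct still has
 * its natural alignment, so where uint64_t is 8-byte aligned it keeps tail
 * padding and the record typically grows (52 bytes on common LP64 ABIs). */
struct rec_outer_packed {
    uint32_t header, pid, vcpu_id;
    union {
        struct { uint64_t cycle_u64; uint32_t extra[EXTRA_MAX]; } cycle;
        struct { uint32_t extra[EXTRA_MAX]; } nocycle;
    } u;
} __attribute__((packed));

/* 64-bit counter with the INNER struct packed: no padding before or after
 * the counter, so size and offsets match the old layout. */
struct rec_inner_packed {
    uint32_t header, pid, vcpu_id;
    union {
        struct { uint64_t cycle_u64; uint32_t extra[EXTRA_MAX]; }
            __attribute__((packed)) cycle;
        struct { uint32_t extra[EXTRA_MAX]; } nocycle;
    } u;
};

static_assert(sizeof(struct rec_inner_packed) == sizeof(struct rec_old),
              "inner packing keeps the record size");
static_assert(offsetof(struct rec_inner_packed, u) == offsetof(struct rec_old, u),
              "inner packing keeps the union offset");
```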
     

29 Jul, 2008

1 commit


20 Jul, 2008

2 commits

  • Currently kvmtrace is not portable. This prevents copying a trace file
    from a big-endian target to a little-endian workstation for analysis.
    With this patch, the kernel outputs metadata containing a magic number to
    the trace log, and changes 64-bit words to be u64 instead of a pair of u32s.

    Signed-off-by: Tan Li
    Acked-by: Jerone Young
    Acked-by: Hollis Blanchard
    Signed-off-by: Avi Kivity

    Tan, Li
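
    A reader of such a trace log could use the magic number to detect the byte
    order of the machine that wrote it. The following C sketch shows the
    general technique; the magic value, the function names and the assumption
    that the magic is the first 32-bit word are all hypothetical and not taken
    from the kernel or the kvmtrace tools.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MAGIC 0x12345601u /* hypothetical value, not the kernel's */

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8)
         | ((v >> 8) & 0x0000ff00u) | (v >> 24);
}

static uint64_t swap64(uint64_t v)
{
    /* Swap each half, then exchange the halves. */
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Returns 0 if the log has the reader's byte order, 1 if every word must be
 * byte-swapped, and -1 if the magic number is not recognised. */
static int trace_needs_swap(const unsigned char *hdr)
{
    uint32_t magic;

    memcpy(&magic, hdr, sizeof magic);
    if (magic == TRACE_MAGIC)
        return 0;
    if (swap32(magic) == TRACE_MAGIC)
        return 1;
    return -1;
}

/* Read one 64-bit word from the log, swapping it when required. */
static uint64_t trace_read_u64(const unsigned char *p, int needs_swap)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return needs_swap ? swap64(v) : v;
}
```

    Storing a 64-bit word as a single u64 means one swap restores it, whereas
    a pair of u32 halves would also need their order to be known.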
     
  • This patch adds all needed structures to coalesce MMIOs.
    Until an architecture uses it, it is not compiled.

    Coalesced MMIO introduces two ioctl()s to define the MMIO zones that
    can be coalesced:

    - KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
    It requests one parameter (struct kvm_coalesced_mmio_zone) which defines
    a memory area where MMIOs can be coalesced until the next switch to
    user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.

    - KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
    the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).

    The userspace client can check kernel coalesced MMIO availability by asking
    ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
    The ioctl() call for KVM_CAP_COALESCED_MMIO returns 0 if not supported,
    or the page offset where the ring buffer is stored.
    The page offset depends on the architecture.

    After an ioctl(KVM_RUN), the first page of the mapped KVM memory points to
    a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
    an offset to the coalesced MMIO ring, expressed in units of PAGE_SIZE
    relative to the start address of the kvm_run structure. The MMIO ring buffer
    is defined by the structure kvm_coalesced_mmio_ring.

    [akio: fix oops during guest shutdown]

    Signed-off-by: Laurent Vivier
    Signed-off-by: Akio Takebe
    Signed-off-by: Avi Kivity

    Laurent Vivier
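
    The address calculation described in this commit message can be sketched in
    C as below. The helper name coalesced_ring_addr() is hypothetical; run is
    assumed to be the start of the mapped kvm_run area and page_offset the
    value returned by ioctl(KVM_CHECK_EXTENSION) for KVM_CAP_COALESCED_MMIO.
    The sketch only computes the address and does not interpret the ring
    contents.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: locate the coalesced MMIO ring relative to the
 * kvm_run mapping. Returns NULL when the capability check reported 0
 * (not supported) or an error.
 */
static void *coalesced_ring_addr(void *run, long page_offset, size_t page_size)
{
    assert(run != NULL);
    if (page_offset <= 0)
        return NULL;
    /* The offset is counted in pages, not bytes. */
    return (char *)run + (size_t)page_offset * page_size;
}
```

    In a real client the page size would come from sysconf(_SC_PAGESIZE) and
    the mapping must be large enough to cover the ring page.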
     

27 Apr, 2008

15 commits

  • Device Control Registers are essentially another address space found on PowerPC
    4xx processors, analogous to PIO on x86. DCRs are always 32 bits, and can be
    identified by a 32-bit number. We forward most DCR accesses to userspace for
    emulation (with the exception of CPR0 registers, which can be read directly
    for simplicity in timebase frequency determination).

    Signed-off-by: Hollis Blanchard
    Signed-off-by: Avi Kivity

    Hollis Blanchard
     
  • So userspace can save/restore the mpstate during migration.

    [avi: export the #define constants describing the value]
    [christian: add s390 stubs]
    [avi: ditto for ia64]

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Trace markers allow userspace to trace execution of a virtual machine
    in order to monitor its performance.

    Signed-off-by: Feng (Eric) Liu
    Signed-off-by: Avi Kivity

    Feng (Eric) Liu
     
  • This patch introduces interpretation of some diagnose instruction intercepts.
    Diagnose is our classic architected way of doing a hypercall. This patch
    features the following diagnose codes:
    - vm storage size, that tells the guest about its memory layout
    - time slice end, which is used by the guest to indicate that it waits
    for a lock and thus cannot use up its time slice in a useful way
    - ipl functions, which a guest can use to reset and reboot itself

    In order to implement ipl functions, we also introduce an exit reason that
    causes userspace to perform various resets on the virtual machine. All resets
    are described in the principles of operation book, except KVM_S390_RESET_IPL
    which causes a reboot of the machine.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch contains the s390 interrupt subsystem (similar to in kernel apic)
    including timer interrupts (similar to in-kernel-pit) and enabled wait
    (similar to in kernel hlt).

    In order to achieve that, this patch also introduces intercept handling
    for instruction intercepts, and it implements load control instructions.

    This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
    the vm file descriptors and the vcpu file descriptors. In case this ioctl is
    issued against a vm file descriptor, the interrupt is considered floating.
    Floating interrupts may be delivered to any virtual cpu in the configuration.

    The following interrupts are supported:
    SIGP STOP - interprocessor signal that stops a remote cpu
    SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
    (stopped) remote cpu
    INT EMERGENCY - interprocessor interrupt, usually used to signal need_resched
    and for smp_call_function() in the guest.
    PROGRAM INT - exception during program execution such as page fault, illegal
    instruction and friends
    RESTART - interprocessor signal that starts a stopped cpu
    INT VIRTIO - floating interrupt for virtio signalisation
    INT SERVICE - floating interrupt for signalisations from the system
    service processor

    struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
    an interrupt, also carries parameter data for interrupts along with the interrupt
    type. Interrupts on s390 usually have a state that represents the current
    operation, or identifies which device has caused the interruption on s390.

    kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
    disabled wait (that is, disabled for interrupts), we exit to userspace. In case
    of an enabled wait we set up a timer that equals the cpu clock comparator value
    and sleep on a wait queue.

    [christian: change virtio interrupt to 0x2603]

    Acked-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch introduces handling of sie intercepts in three flavors: Intercepts
    are either handled completely in-kernel by kvm_handle_sie_intercept(),
    or passed to userspace with corresponding data in struct kvm_run in case
    kvm_handle_sie_intercept() returns -ENOTSUPP.
    In case of partial execution in kernel with the need of userspace support,
    kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
    -EREMOTE.

    The trivial intercept reasons are handled in this patch:
    handle_noop() just does nothing for intercepts that don't require our support
    at all
    handle_stop() is called when a cpu enters stopped state, and it drops out to
    userland after updating our vcpu state
    handle_validity() faults in the cpu lowcore if needed, or passes the request
    to userland

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
    (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
    instruction SIE to run virtual machines with up to 64 virtual CPUs each.
    This port is only usable on 64bit host kernels, and can only run 64bit guest
    kernels. However, running 31bit applications in guest userspace is possible.

    The following source files are introduced by this patch:
    arch/s390/kvm/kvm-s390.c       similar to arch/x86/kvm/x86.c, this implements
                                   all arch callbacks for kvm. __vcpu_run calls
                                   back into sie64a to enter the guest machine
                                   context
    arch/s390/kvm/sie64a.S         assembler function sie64a, which enters guest
                                   context via SIE, and switches world before and
                                   after that
    include/asm-s390/kvm_host.h    contains all vital data structures needed to
                                   run virtual machines on the mainframe
    include/asm-s390/kvm.h         defines kvm_regs and friends for user access to
                                   guest register content
    arch/s390/kvm/gaccess.h        functions similar to uaccess to access guest
                                   memory
    arch/s390/kvm/kvm-s390.h       header file for kvm-s390 internals, extended by
                                   later patches

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • include/linux/kvm.h defines struct kvm_dirty_log to
    [...]
    union {
    void __user *dirty_bitmap; /* one bit per page */
    __u64 padding;
    };

    __user requires compiler.h to compile. Currently, this works on x86
    only coincidentally due to other include files. This patch makes
    kvm.h compile in all cases.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • Hypercall based pte updates are faster than faults, and also allow use
    of the lazy MMU mode to batch operations.

    Don't report the feature if two dimensional paging is enabled.

    [avi:
    - one mmu_op hypercall instead of one per op
    - allow 64-bit gpa on hypercall
    - don't pass host errors (-ENOMEM) to guest]

    [akpm: warning fix on i386]

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Add basic KVM paravirt support. Avoid vm-exits on IO delays.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Signed-off-by: Sheng Yang
    Signed-off-by: Avi Kivity

    Sheng Yang
     
  • The patch moves the PIT model from userspace to kernel, and increases
    the timer accuracy greatly.

    [marcelo: make last_injected_time per-guest]

    Signed-off-by: Sheng Yang
    Signed-off-by: Marcelo Tosatti
    Tested-and-Acked-by: Alex Davis
    Signed-off-by: Avi Kivity

    Sheng Yang
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • This is the host part of kvm clocksource implementation. As it does
    not include clockevents, it is a fairly simple implementation. We
    only have to register a per-vcpu area, and start writing to it periodically.

    The area is binary compatible with xen, as we use the same shadow_info
    structure.

    [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
    [avi: save full value of the msr, even if enable bit is clear]
    [avi: clear previous value of time_page]

    Signed-off-by: Glauber de Oliveira Costa
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Glauber de Oliveira Costa
     

03 Mar, 2008

1 commit

  • One of the use cases for the supported cpuid list is to create a "greatest
    common denominator" of cpu capabilities in a server farm. As such, it is
    useful to be able to get the list without creating a virtual machine first.

    Since the code does not depend on the vm in any way, all that is needed is
    to move it to the device ioctl handler. The capability identifier is also
    changed so that binaries made against -rc1 will fail gracefully.

    Signed-off-by: Avi Kivity

    Avi Kivity
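
    The "greatest common denominator" use case amounts to intersecting the
    feature bits reported by every host. A minimal C sketch, assuming each
    host's supported bits for one CPUID register have already been collected
    into a 32-bit word; the function name common_features() and this data
    layout are assumptions for illustration, not part of the KVM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: AND together the feature word reported by each host,
 * leaving only the bits that every host in the farm supports.
 */
static uint32_t common_features(const uint32_t *host_words, size_t nhosts)
{
    uint32_t common;
    size_t i;

    assert(host_words != NULL && nhosts > 0);
    common = host_words[0];
    for (i = 1; i < nhosts; i++)
        common &= host_words[i];
    return common;
}
```

    The resulting mask is what a management tool could expose to guests so
    that they can move between any of the hosts.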
     

31 Jan, 2008

4 commits