10 Oct, 2012

2 commits

  • Pull UML changes from Richard Weinberger:
    "UML receives this time only cleanups.

    The most outstanding change is the 'include "foo.h"' to 'include
    <foo.h>' conversion done by Al Viro.

    It touches many files, that's why the diffstat is rather big."

    * 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    typo in UserModeLinux-HOWTO
    hppfs: fix the return value of get_inode()
    hostfs: drop vmtruncate
    um: get rid of pointless include "..." where include <...> will do
    um: move sysrq.h out of include/shared
    um/x86: merge 32 and 64 bit variants of ptrace.h
    um/x86: merge 32 and 64bit variants of checksum.h

    Linus Torvalds
     
  • [it seems that I sent it to the wrong maintainer at first... sorry for that]
    copy_from_user was meant instead of copy_to_user.

    Signed-off-by: Richard Genoud
    Signed-off-by: Richard Weinberger

    Richard Genoud
     

23 Sep, 2012

1 commit

  • To emulate level triggered interrupts, add a resample option to
    KVM_IRQFD. When specified, a new resamplefd is provided that notifies
    the user when the irqchip has been resampled by the VM. This may, for
    instance, indicate an EOI. Also in this mode, posting of an interrupt
    through an irqfd only asserts the interrupt. On resampling, the
    interrupt is automatically de-asserted prior to user notification.
    This enables level triggered interrupts to be posted and re-enabled
    from vfio with no userspace intervention.

    All resampling irqfds can make use of a single irq source ID, so we
    reserve a new one for this interface.

    Signed-off-by: Alex Williamson
    Signed-off-by: Avi Kivity

    Alex Williamson
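
The resample behaviour described in the entry above can be illustrated with a small toy model. This is only a sketch and not KVM code: the class `ResampleIrqfd`, its methods, and the helper `repost_if_pending` are hypothetical names, and the resample event (for instance an EOI) is simulated by a plain method call.

```python
class ResampleIrqfd:
    """Toy model of a level-triggered irqfd with a resample notifier."""

    def __init__(self):
        self.asserted = False    # state of the emulated interrupt line
        self.notifications = 0   # how often the "resamplefd" was signaled

    def post(self):
        # Posting through the irqfd only asserts the line.
        self.asserted = True

    def resample(self):
        # On resample the line is de-asserted first, and only then is
        # the user notified.
        self.asserted = False
        self.notifications += 1


def repost_if_pending(irqfd, device_pending):
    # Hypothetical reaction to a notification: assert the line again
    # if the device still has its interrupt condition pending.
    if device_pending:
        irqfd.post()


irq = ResampleIrqfd()
irq.post()                     # device raises a level interrupt
irq.resample()                 # guest acknowledges it
repost_if_pending(irq, True)   # condition still pending, line goes up again
```

The point of the model is the ordering inside `resample`: the de-assert happens before the notification, so the notified side can simply re-post.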
     

18 Sep, 2012

1 commit


09 Sep, 2012

1 commit


22 Aug, 2012

1 commit

  • In current code, if we map a readonly memory space from host to guest
    and the page is not currently mapped in the host, we will get a fault
    pfn and async is not allowed, so the VM will crash.

    We introduce a readonly memory region to map ROM/ROMD to the guest. Read
    access works normally on a readonly memslot; write access to a readonly
    memslot causes a KVM_EXIT_MMIO exit.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Avi Kivity

    Xiao Guangrong
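
As a rough illustration of the readonly memslot semantics in the entry above, the following toy model serves reads from the slot and turns writes into an exit record. It is a sketch under stated assumptions: the class `ReadonlySlot` and the dictionary layout of the exit record are hypothetical, and only the name KVM_EXIT_MMIO comes from the commit message.

```python
class ReadonlySlot:
    """Toy model of a readonly memslot backing a ROM image."""

    def __init__(self, image):
        self.image = bytes(image)

    def read(self, offset, length):
        # Reads are satisfied directly from the slot contents.
        return self.image[offset:offset + length]

    def write(self, offset, payload):
        # Writes never modify the slot. They are handed to userspace as
        # an MMIO exit so the device model can decide what to do.
        return {
            "exit_reason": "KVM_EXIT_MMIO",
            "offset": offset,
            "data": bytes(payload),
            "is_write": True,
        }


rom = ReadonlySlot(b"\x55\xaa\x00\x00")
signature = rom.read(0, 2)
exit_info = rom.write(0, b"\xff")
```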
     

14 Aug, 2012

2 commits


25 Jul, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights include
    - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
    - relatively small ppc and s390 updates
    - PCID/INVPCID support in guests
    - EOI avoidance (3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
    - Lockless write faults during live migration
    - EPT accessed/dirty bits support for new Intel processors"

    Fix up conflicts in:
    - Documentation/virtual/kvm/api.txt:

    Stupid subchapter numbering, added next to each other.

    - arch/powerpc/kvm/booke_interrupts.S:

    PPC asm changes clashing with the KVM fixes

    - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

    Duplicated commits through the kvm tree and the s390 tree, with
    subsequent edits in the KVM tree.

    * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
    KVM: fix race with level interrupts
    x86, hyper: fix build with !CONFIG_KVM_GUEST
    Revert "apic: fix kvm build on UP without IOAPIC"
    KVM guest: switch to apic_set_eoi_write, apic_write
    apic: add apic_set_eoi_write for PV use
    KVM: VMX: Implement PCID/INVPCID for guests with EPT
    KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
    KVM: PPC: Critical interrupt emulation support
    KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
    KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
    KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
    KVM: PPC: bookehv64: Add support for std/ld emulation.
    booke: Added crit/mc exception handler for e500v2
    booke/bookehv: Add host crit-watchdog exception support
    KVM: MMU: document mmu-lock and fast page fault
    KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
    KVM: MMU: trace fast page fault
    KVM: MMU: fast path of handling guest page fault
    KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
    KVM: MMU: fold tlb flush judgement into mmu_spte_update
    ...

    Linus Torvalds
     

11 Jul, 2012

1 commit


03 Jul, 2012

1 commit


25 Jun, 2012

1 commit


30 May, 2012

2 commits

  • If there is a pending critical or machine check interrupt, the guest
    would like to capture it when it enables MSR.CE or MSR.ME respectively.
    Also, since MSR.CE and MSR.ME are mostly updated with rfi/rfci/rfmci,
    which trap anyway, remove the paravirt optimization for MSR.CE
    and MSR.ME.

    Signed-off-by: Bharat Bhushan
    Signed-off-by: Alexander Graf

    Bharat Bhushan
     
  • This adds a new ioctl to enable userspace to control the size of the guest
    hashed page table (HPT) and to clear it out when resetting the guest.
    The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
    a pointer to a u32 containing the desired order of the HPT (log base 2
    of the size in bytes), which is updated on successful return to the
    actual order of the HPT which was allocated.

    There must be no vcpus running at the time of this ioctl. To enforce
    this, we now keep a count of the number of vcpus running in
    kvm->arch.vcpus_running.

    If the ioctl is called when a HPT has already been allocated, we don't
    reallocate the HPT but just clear it out. We first clear the
    kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
    the kvm->lock mutex, it will prevent any vcpus from starting to run until
    we're done, and (b) it means that the first vcpu to run after we're done
    will re-establish the VRMA if necessary.

    If userspace doesn't call this ioctl before running the first vcpu, the
    kernel will allocate a default-sized HPT at that point. We do it then
    rather than when creating the VM, as the code did previously, so that
    userspace has a chance to do the ioctl if it wants.

    When allocating the HPT, we can allocate either from the kernel page
    allocator, or from the preallocated pool. If userspace is asking for
    a different size from the preallocated HPTs, we first try to allocate
    using the kernel page allocator. Then we try to allocate from the
    preallocated pool, and then if that fails, we try allocating decreasing
    sizes from the kernel page allocator, down to the minimum size allowed
    (256kB). Note that the kernel page allocator limits allocations to
    1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
    16MB (on 64-bit powerpc, at least).

    Signed-off-by: Paul Mackerras
    [agraf: fix module compilation]
    Signed-off-by: Alexander Graf

    Paul Mackerras
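
The HPT order arithmetic and the allocation fallback described in the entry above can be sketched as follows. This is a simplified model, not the kernel implementation: `page_alloc` and `pool_alloc` are hypothetical callbacks standing in for the kernel page allocator and the preallocated pool, and the two constants are taken from the sizes named in the commit message (256kB and 16MB).

```python
MIN_HPT_ORDER = 18             # 256kB, the minimum size named in the text
PAGE_ALLOCATOR_MAX_ORDER = 24  # 16MB, the default limit named in the text


def hpt_order_for_size(size_bytes):
    """Return log base 2 of size_bytes, which must be a power of two."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("HPT size must be a power of two")
    return size_bytes.bit_length() - 1


def allocate_hpt(requested_order, prealloc_order, page_alloc, pool_alloc):
    """Return the order actually allocated, or None if every attempt fails.

    page_alloc(order) and pool_alloc() are hypothetical callbacks that
    return True on success.
    """
    order = requested_order
    # 1. A size different from the preallocated HPTs is first requested
    #    from the page allocator.
    if order != prealloc_order:
        if page_alloc(order):
            return order
        order -= 1  # this order has already failed
    # 2. Next, try the preallocated pool, which has its own fixed order.
    if pool_alloc():
        return prealloc_order
    # 3. Finally, try decreasing sizes from the page allocator.
    while order >= MIN_HPT_ORDER:
        if page_alloc(order):
            return order
        order -= 1
    return None


# A 64MB request against a page allocator capped at 16MB and an empty pool.
limited = lambda order: order <= PAGE_ALLOCATOR_MAX_ORDER
actual = allocate_hpt(hpt_order_for_size(64 << 20), 24, limited, lambda: False)
```

The returned order mirrors the way the ioctl updates its u32 parameter to the order that was really allocated.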
     

25 May, 2012

1 commit

  • Pull KVM changes from Avi Kivity:
    "Changes include additional instruction emulation, page-crossing MMIO,
    faster dirty logging, preventing the watchdog from killing a stopped
    guest, module autoload, a new MSI ABI, and some minor optimizations
    and fixes. Outside x86 we have a small s390 and a very large ppc
    update.

    Regarding the new (for kvm) rebaseless workflow, some of the patches
    that were merged before we switched trees had to be rebased, while
    others are true pulls. In either case the signoffs should be correct
    now."

    Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
    arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.

    I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
    check effectively twice (it was done differently in two different
    commits), but better safe than sorry ;)

    * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
    KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
    KVM: s390: onereg for timer related registers
    KVM: s390: epoch difference and TOD programmable field
    KVM: s390: KVM_GET/SET_ONEREG for s390
    KVM: s390: add capability indicating COW support
    KVM: Fix mmu_reload() clash with nested vmx event injection
    KVM: MMU: Don't use RCU for lockless shadow walking
    KVM: VMX: Optimize %ds, %es reload
    KVM: VMX: Fix %ds/%es clobber
    KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
    KVM: VMX: unlike vmcs on fail path
    KVM: PPC: Emulator: clean up SPR reads and writes
    KVM: PPC: Emulator: clean up instruction parsing
    kvm/powerpc: Add new ioctl to retreive server MMU infos
    kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
    KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
    KVM: PPC: Book3S: Enable IRQs during exit handling
    KVM: PPC: Fix PR KVM on POWER7 bare metal
    KVM: PPC: Fix stbux emulation
    KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
    ...

    Linus Torvalds
     

22 May, 2012

1 commit


08 May, 2012

1 commit

  • PPC updates from Alex.

    * 'for-upstream' of git://github.com/agraf/linux-2.6:
    KVM: PPC: Emulator: clean up SPR reads and writes
    KVM: PPC: Emulator: clean up instruction parsing
    kvm/powerpc: Add new ioctl to retreive server MMU infos
    kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
    KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
    KVM: PPC: Book3S: Enable IRQs during exit handling
    KVM: PPC: Fix PR KVM on POWER7 bare metal
    KVM: PPC: Fix stbux emulation
    KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
    KVM: PPC: Book3S: PR: No isync in slbie path
    KVM: PPC: Book3S: PR: Optimize entry path
    KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.
    KVM: PPC: Restrict PPC_[L|ST]D macro to asm code
    KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies.
    KVM: PPC: Use clockevent multiplier and shifter for decrementer
    KVM: Use minimum and maximum address mapped by TLB1

    Signed-off-by: Avi Kivity

    Avi Kivity
     

06 May, 2012

2 commits

  • This is necessary for qemu to be able to pass the right information
    to the guest, such as the supported page sizes and corresponding
    encodings in the SLB and hash table, which can vary depending
    on the processor type, the type of KVM used (PR vs HV) and the
    version of KVM.

    Signed-off-by: Benjamin Herrenschmidt
    [agraf: fix compilation on hv, adjust for newer ioctl numbers]
    Signed-off-by: Alexander Graf

    Benjamin Herrenschmidt
     
  • cpuid eax should return the max leaf so that
    guests can find out the valid range.
    This matches Xen et al.
    Update documentation to match.

    Tested with -cpu host.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Michael S. Tsirkin
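
A guest-side view of the cpuid change in the entry above might look like the sketch below. The base leaf 0x40000000 and the sample register values used at the bottom are assumptions chosen for illustration, and the function names are hypothetical. No real CPUID instruction is executed here.

```python
import struct

KVM_CPUID_SIGNATURE = 0x40000000  # assumed base leaf of the KVM cpuid range


def decode_signature(ebx, ecx, edx):
    """Join the three signature registers into a 12-byte string."""
    return struct.pack("<III", ebx, ecx, edx)


def leaf_is_valid(leaf, eax_of_signature_leaf):
    """A leaf is valid if it lies between the base leaf and the max leaf.

    eax_of_signature_leaf is the value the hypervisor returned in eax
    for the signature leaf, i.e. the maximum supported leaf.
    """
    return KVM_CPUID_SIGNATURE <= leaf <= eax_of_signature_leaf


# Illustrative values for one query of the signature leaf.
eax, ebx, ecx, edx = 0x40000001, 0x4B4D564B, 0x564B4D56, 0x0000004D
signature = decode_signature(ebx, ecx, edx)
features_leaf_ok = leaf_is_valid(0x40000001, eax)
```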
     

28 Apr, 2012

3 commits

  • We can't run PIT IRQ injection work in the interrupt context of the host
    timer. This would allow the user to influence the handler complexity by
    asking for a broadcast to a large number of VCPUs. Therefore, this work
    was pushed into workqueue context in 9d244caf2e. However, this prevents
    prioritizing the PIT injection over other tasks as workqueues share
    kernel threads.

    This replaces the workqueue with a kthread worker and gives that thread
    a name in the format "kvm-pit/<owner-process-pid>". That makes it
    possible to identify the kthread and adjust its priority according to
    the VM process parameters.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • Add descriptions for KVM_CREATE_PIT2 and KVM_GET/SET_PIT2.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     
  • This helps to identify sections and it also fixes the numbering from
    4.54 to 4.61.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka
     

24 Apr, 2012

1 commit

  • Currently, MSI messages can only be injected to in-kernel irqchips by
    defining a corresponding IRQ route for each message. This is not only
    cumbersome if the MSI messages are generated "on the fly" by user space;
    IRQ routes are also a limited resource that user space has to manage
    carefully.

    By providing a direct injection path, we can both avoid using up limited
    resources and simplify the necessary steps for user land.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     

08 Apr, 2012

1 commit


29 Mar, 2012

1 commit

  • Pull kvm updates from Avi Kivity:
    "Changes include timekeeping improvements, support for assigning host
    PCI devices that share interrupt lines, s390 user-controlled guests, a
    large ppc update, and random fixes."

    This is with the sign-offs fixed; hopefully next merge window we won't
    have rebased commits.

    * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
    KVM: Convert intx_mask_lock to spin lock
    KVM: x86: fix kvm_write_tsc() TSC matching thinko
    x86: kvmclock: abstract save/restore sched_clock_state
    KVM: nVMX: Fix erroneous exception bitmap check
    KVM: Ignore the writes to MSR_K7_HWCR(3)
    KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask
    KVM: PMU: add proper support for fixed counter 2
    KVM: PMU: Fix raw event check
    KVM: PMU: warn when pin control is set in eventsel msr
    KVM: VMX: Fix delayed load of shared MSRs
    KVM: use correct tlbs dirty type in cmpxchg
    KVM: Allow host IRQ sharing for assigned PCI 2.3 devices
    KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
    KVM: x86 emulator: Allow PM/VM86 switch during task switch
    KVM: SVM: Fix CPL updates
    KVM: x86 emulator: VM86 segments must have DPL 3
    KVM: x86 emulator: Fix task switch privilege checks
    arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
    KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation
    KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
    ...

    Linus Torvalds
     

08 Mar, 2012

1 commit

  • PCI 2.3 allows IRQ sources to be disabled generically at device level.
    This enables us to share legacy IRQs of such devices with other host
    devices when passing them to a guest.

    The new IRQ sharing feature introduced here is optional; user space has
    to request it explicitly. Moreover, user space can inform us about its
    view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
    interrupt and signaling it if the guest masked it via the virtualized
    PCI config space.

    Signed-off-by: Jan Kiszka
    Acked-by: Alex Williamson
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Avi Kivity

    Jan Kiszka
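
The device-level masking that the entry above relies on comes down to two bits in PCI config space. The sketch below models them on plain integers. The helper names and the decision logic in `handle_shared_irq` are hypothetical simplifications, not the code from this commit.

```python
PCI_COMMAND_INTX_DISABLE = 0x400  # bit 10 of the 16-bit command register
PCI_STATUS_INTERRUPT = 0x08       # bit 3 of the 16-bit status register


def mask_intx(command):
    """Return the command register value with INTx disabled."""
    return command | PCI_COMMAND_INTX_DISABLE


def unmask_intx(command):
    """Return the command register value with INTx enabled again."""
    return command & ~PCI_COMMAND_INTX_DISABLE & 0xFFFF


def handle_shared_irq(command, status, guest_masked):
    """Return (new_command, signal_guest) for one interrupt on a shared line."""
    if not status & PCI_STATUS_INTERRUPT:
        # Some other device on the shared line raised the interrupt.
        return command, False
    # Quiet the assigned device at device level, then signal the guest
    # unless the guest has masked INTx in its virtualized config space.
    return mask_intx(command), not guest_masked


new_command, signal = handle_shared_irq(0x0007, 0x0008, guest_masked=False)
```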
     

07 Mar, 2012

1 commit


05 Mar, 2012

10 commits

  • Instead of keeping separate copies of struct kvm_vcpu_arch_shared (one in
    the code, one in the docs) that inevitably fail to be kept in sync
    (already sr[] is missing from the doc version), just point to the header
    file as the source of documentation on the contents of the magic page.

    Signed-off-by: Scott Wood
    Acked-by: Avi Kivity
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Scott Wood
     
  • Until now, we always set HIOR based on the PVR, but this is just wrong.
    Instead, we should be setting HIOR explicitly, so user space can decide
    what the initial HIOR value is - just like on real hardware.

    We keep the old PVR based way around for backwards compatibility, but
    once user space uses the SET_ONE_REG based method, we drop the PVR logic.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
     
  • Right now we transfer a static struct every time we want to get or set
    registers. Unfortunately, over time we realize that there are more of
    these than we thought of before and the extensibility and flexibility of
    transferring a full struct every time is limited.

    So this is a new approach to the problem. With these new ioctls, we can
    get and set a single register that is identified by an ID. This allows for
    very precise and limited transmittal of data. When we later realize that
    it's a better idea to shove over multiple registers at once, we can reuse
    most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
    interface.

    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Alexander Graf
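
The idea of addressing a single register by ID, as described in the entry above, can be sketched with a toy register file. Everything here is hypothetical: the class `ToyVcpu`, the method names, and the register IDs are made up for illustration and do not correspond to real KVM register IDs.

```python
class ToyVcpu:
    """Toy register file where each register is reached through an ID."""

    def __init__(self, registers):
        self._registers = dict(registers)

    def get_one_reg(self, reg_id):
        # Only the requested register is transferred, not a whole struct.
        if reg_id not in self._registers:
            raise KeyError("unknown register id %#x" % reg_id)
        return self._registers[reg_id]

    def set_one_reg(self, reg_id, value):
        if reg_id not in self._registers:
            raise KeyError("unknown register id %#x" % reg_id)
        self._registers[reg_id] = value


REG_ID_EXAMPLE_A = 0x1  # hypothetical ID
REG_ID_EXAMPLE_B = 0x2  # hypothetical ID

vcpu = ToyVcpu({REG_ID_EXAMPLE_A: 0, REG_ID_EXAMPLE_B: 0})
vcpu.set_one_reg(REG_ID_EXAMPLE_A, 0xFFF00000)
value = vcpu.get_one_reg(REG_ID_EXAMPLE_A)
```

New registers can be added later by defining a new ID, without changing the shape of the interface.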
     
  • This implements a shared-memory API for giving host userspace access to
    the guest's TLB.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Scott Wood
     
  • On some cpus the overhead for virtualization instructions is in the same
    range as a system call. Having to call multiple ioctls to get and set
    registers will make certain userspace handled exits more expensive than
    necessary. Let's provide a section in kvm_run that works as a shared save
    area for guest registers.
    We also provide two 64-bit flags fields (architecture specific) that
    specify
    1. which parts of the shared save area are valid
    2. which registers were modified by userspace

    Each bit in these flag fields defines a group of registers (like
    general purpose) or a single register.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Christian Borntraeger
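
The interplay of the two flag fields in the entry above can be modelled as follows. This is a sketch only: the group bits, the group names, and the three functions are hypothetical, and the real layout of the save area is architecture specific.

```python
GROUP_GPRS = 1 << 0    # hypothetical bit for the general purpose registers
GROUP_PREFIX = 1 << 1  # hypothetical bit for a single register
GROUP_NAMES = {GROUP_GPRS: "gprs", GROUP_PREFIX: "prefix"}


class SharedSaveArea:
    """Toy model of the register section shared through kvm_run."""

    def __init__(self):
        self.valid_regs = 0  # set by the kernel: which groups are valid
        self.dirty_regs = 0  # set by userspace: which groups it modified
        self.regs = {}


def publish_on_exit(area, cpu_state, groups):
    """Kernel side: copy the selected groups out and mark them valid."""
    for bit, name in GROUP_NAMES.items():
        if groups & bit:
            area.regs[name] = cpu_state[name]
    area.valid_regs = groups


def modify_from_userspace(area, bit, value):
    """Userspace side: change a group in place and flag it as dirty."""
    area.regs[GROUP_NAMES[bit]] = value
    area.dirty_regs |= bit


def consume_on_entry(area, cpu_state):
    """Kernel side: write back only the groups that userspace flagged."""
    for bit, name in GROUP_NAMES.items():
        if area.dirty_regs & bit:
            cpu_state[name] = area.regs[name]
    area.dirty_regs = 0


cpu = {"gprs": [0] * 16, "prefix": 0x2000}
area = SharedSaveArea()
publish_on_exit(area, cpu, GROUP_GPRS | GROUP_PREFIX)
modify_from_userspace(area, GROUP_PREFIX, 0x4000)
consume_on_entry(area, cpu)
```

No ioctl is involved in this round trip, which is the saving the commit message is after.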
     
  • This patch allows the user to fault in pages on a virtual cpu's
    address space for user controlled virtual machines. Typically this
    is superfluous because userspace can just create a mapping and
    let the kernel's page fault logic take care of it. There is one
    exception: SIE won't start if the lowcore is not present. Normally
    the kernel takes care of this [handle_validity() in
    arch/s390/kvm/intercept.c] but since the kernel does not handle
    intercepts for user controlled virtual machines, userspace needs to
    be able to handle this condition.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch exports the s390 SIE hardware control block to userspace
    via the mapping of the vcpu file descriptor. In order to do so,
    a new arch callback named kvm_arch_vcpu_fault is introduced for all
    architectures. It allows architecture specific pages to be mapped.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch introduces a new exit reason in the kvm_run structure
    named KVM_EXIT_S390_UCONTROL. This exit indicates that a virtual cpu
    has recognized a fault on the host page table. The idea is that
    userspace can handle this fault by mapping memory at the fault
    location into the cpu's address space and then continue to run the
    virtual cpu.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch introduces two ioctls for virtual cpus that are only
    valid for kernel virtual machines that are controlled by userspace.
    Each virtual cpu has its individual address space in this mode of
    operation, and each address space is backed by the gmap
    implementation just like the address space for regular KVM guests.
    KVM_S390_UCAS_MAP maps a part of the user's virtual address
    space to the vcpu. Starting offset and length in both the user and
    the vcpu address space need to be aligned to 1M.
    KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
    virtual cpu in a similar way.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
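
The 1M alignment rule from the entry above is easy to check before issuing a mapping request. The helper below is a minimal sketch with a hypothetical name and argument order. It only validates the three values and does not perform any ioctl.

```python
ONE_MB = 1 << 20


def ucas_mapping_is_aligned(user_addr, vcpu_addr, length):
    """Check that both start addresses and the length are 1M aligned."""
    return all(value % ONE_MB == 0 for value in (user_addr, vcpu_addr, length))


# A 2MB window at user address 16MB, mapped at vcpu address 0.
ok = ucas_mapping_is_aligned(16 * ONE_MB, 0, 2 * ONE_MB)
```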
     
  • This patch introduces a new config option for user controlled kernel
    virtual machines. It introduces a parameter to KVM_CREATE_VM that
    sets bits altering the capabilities of the newly created
    virtual machine.
    The parameter is passed to kvm_arch_init_vm for all architectures.
    The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
    This requires CAP_SYS_ADMIN privileges and creates a user controlled
    virtual machine on s390 architectures.

    Signed-off-by: Carsten Otte
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Carsten Otte
     

28 Jan, 2012

1 commit


12 Jan, 2012

1 commit


27 Dec, 2011

1 commit