04 Sep, 2014

2 commits

  • commit 350b8bdd689cd2ab2c67c8a86a0be86cfa0751a7 upstream.

    The third parameter of kvm_iommu_put_pages is wrong;
    it should be 'gfn - slot->base_gfn'.

    By making gfn very large, a malicious guest or userspace can cause KVM to
    go to this error path, and subsequently to pass a huge value as size.
    Alternatively, if gfn is small, then pages would be pinned but never
    unpinned, causing a host memory leak and local DoS.

    Passing a reasonable but large value could be the most dangerous case,
    because it would unpin a page that should have stayed pinned, and thus
    allow the device to DMA into arbitrary memory. However, this cannot
    happen because of the conditions that can trigger the error:

    - out of memory (where you can't allocate even a single page)
    should not be possible for the attacker to trigger

    - when exceeding the iommu's address space, guest pages after gfn
    will also exceed the iommu's address space, and inside
    kvm_iommu_put_pages() the iommu_iova_to_phys() will fail. The
    page thus would not be unpinned at all.
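
    The unwind arithmetic can be illustrated with a small userspace sketch.
    It is not the kernel code: `pinned`, `put_pages()` and `map_pages()` are
    hypothetical stand-ins, and the failure is simulated with a `fail_at`
    parameter.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define NPAGES 64

    /* Hypothetical pin state, indexed by guest frame number (gfn). */
    static bool pinned[NPAGES];

    /* Unpin `count` pages starting at frame `base`. */
    static void put_pages(size_t base, size_t count)
    {
        for (size_t i = 0; i < count && base + i < NPAGES; i++)
            pinned[base + i] = false;
    }

    /*
     * Pin frames [base, base + npages); frame `fail_at` simulates a mapping
     * failure. On failure only the frames pinned so far are released, which
     * is gfn - base of them. Returns 0 on success, -1 on failure.
     */
    static int map_pages(size_t base, size_t npages, size_t fail_at)
    {
        assert(base + npages <= NPAGES);
        for (size_t gfn = base; gfn < base + npages; gfn++) {
            if (gfn == fail_at) {
                put_pages(base, gfn - base); /* a count, not the absolute gfn */
                return -1;
            }
            pinned[gfn] = true;
        }
        return 0;
    }
    ```

    In this model, passing `gfn` instead of `gfn - base` as the count would
    release frames well past the failing one, including frames that some
    other mapping still relies on.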

    Reported-by: Jack Morgenstein
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jiri Slaby

    Michael S. Tsirkin
     
  • commit 0f6c0a740b7d3e1f3697395922d674000f83d060 upstream.

    Currently, the EOI exit bitmap (used for APICv) does not include
    interrupts that are masked. However, this can cause a bug that manifests
    as an interrupt storm inside the guest. Alex Williamson reported the
    bug and is the one who really debugged this; I only wrote the patch. :)

    The scenario involves a multi-function PCI device with OHCI and EHCI
    USB functions and an audio function, all assigned to the guest, where
    both USB functions use legacy INTx interrupts.

    As soon as the guest boots, interrupts for these devices turn into an
    interrupt storm in the guest; the host does not see the interrupt storm.
    Basically the EOI path does not work, and the guest continues to see the
    interrupt over and over, even after it attempts to mask it at the APIC.
    The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
    with not many changes in the area of APIC/IOAPIC handling).

    Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
    on in the eoi_exit_bitmap and TMR, and things then work. What happens
    is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
    It does not have set bit 59 because the RTE was masked, so the IOAPIC
    never sees the EOI and the interrupt continues to fire in the guest.

    My guess was that the guest is masking the interrupt in the redirection
    table in the interrupt routine, i.e. while the interrupt is set in a
    LAPIC's ISR. The simplest fix is to ignore the masking state; we would
    rather have an unnecessary exit than a missed IRQ ACK, and anyway
    IOAPIC interrupts are not as performance-sensitive as, for example, MSIs.
    Alex tested this patch and it fixed his bug.

    [Thanks to Alex for his precise description of the problem
    and initial debugging effort. A lot of the text above is
    based on emails exchanged with him.]
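
    A minimal sketch of the idea, assuming a simplified redirection table
    entry: the bitmap is computed from the trigger mode alone and the mask
    bit is not consulted. `struct rte` and `build_eoi_exit_bitmap()` are
    hypothetical names, not the kernel's.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, simplified redirection table entry. */
    struct rte {
        uint8_t vector;
        bool level_triggered;
        bool masked;
    };

    /* One bit per vector, 256 vectors in four 64-bit words. */
    static void build_eoi_exit_bitmap(const struct rte *rtes, size_t n,
                                      uint64_t bitmap[4])
    {
        memset(bitmap, 0, 4 * sizeof(bitmap[0]));
        for (size_t i = 0; i < n; i++) {
            /* The mask bit is deliberately ignored here. */
            if (rtes[i].level_triggered)
                bitmap[rtes[i].vector / 64] |= 1ULL << (rtes[i].vector % 64);
        }
    }
    ```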

    Reported-by: Alex Williamson
    Tested-by: Alex Williamson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jiri Slaby

    Paolo Bonzini
     

15 May, 2014

1 commit

  • commit 91021a6c8ffdc55804dab5acdfc7de4f278b9ac3 upstream.

    When dispatching an SGI with mode == 0, the vcpu of the VM should send
    the SGI to the cpus in the target_cpus list.
    So a "break" must be added to the case 0 branch.
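
    The effect of the missing "break" can be sketched as follows. The
    function name `sgi_targets()` and its parameters are assumptions made
    for illustration; only the three modes and the fall-through come from
    the description above.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: compute the bitmask of vcpus an SGI is sent to. */
    static uint8_t sgi_targets(int mode, uint8_t target_cpus, int self,
                               int nr_vcpus)
    {
        uint8_t all;

        assert(nr_vcpus > 0 && nr_vcpus <= 8 && self >= 0 && self < nr_vcpus);
        all = (uint8_t)((1u << nr_vcpus) - 1);

        switch (mode) {
        case 0: /* use the explicit target list */
            if (!target_cpus)
                return 0;
            break; /* without this, case 1 overwrites the list */
        case 1: /* everyone but the sender */
            target_cpus = (uint8_t)(all & ~(1u << self));
            break;
        case 2: /* the sender only */
            target_cpus = (uint8_t)(1u << self);
            break;
        }
        return target_cpus;
    }
    ```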

    Signed-off-by: Haibin Wang
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Jiri Slaby

    Haibin Wang
     

05 May, 2014

1 commit

  • commit 5678de3f15010b9022ee45673f33bcfc71d47b60 upstream.

    QE reported that they got the BUG_ON in ioapic_service to trigger.
    I cannot reproduce it, but there are two reasons why this could happen.

    The less likely but also easiest one, is when kvm_irq_delivery_to_apic
    does not deliver to any APIC and returns -1.

    Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
    function is never reached. However, you can target the similar loop in
    kvm_irq_delivery_to_apic_fast; just program a zero logical destination
    address into the IOAPIC, or an out-of-range physical destination address.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jiri Slaby

    Paolo Bonzini
     

23 Feb, 2014

1 commit

  • commit aac5c4226e7136c331ed384c25d5560204da10a0 upstream.

    If kvm_io_bus_register_dev() fails then it returns success but it should
    return an error code.
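
    The pattern being fixed looks roughly like the sketch below, where the
    error from the registration step is propagated instead of being
    dropped. `struct zone_dev`, `bus_register()` and `register_zone()` are
    hypothetical names and the failure is injected through a parameter.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>

    struct zone_dev {
        int registered;
    };

    /* Hypothetical registration hook; fails when `should_fail` is nonzero. */
    static int bus_register(struct zone_dev *dev, int should_fail)
    {
        if (should_fail)
            return -ENOSPC;
        dev->registered = 1;
        return 0;
    }

    /* Allocate and register a device; on failure free it and return the error. */
    static int register_zone(int should_fail, struct zone_dev **out)
    {
        struct zone_dev *dev = calloc(1, sizeof(*dev));
        int ret;

        assert(out != NULL);
        *out = NULL;
        if (!dev)
            return -ENOMEM;

        ret = bus_register(dev, should_fail);
        if (ret < 0) {
            free(dev);
            return ret; /* propagate the error instead of returning 0 */
        }
        *out = dev;
        return 0;
    }
    ```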

    I also did a little cleanup like removing an impossible NULL test.

    Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     

20 Dec, 2013

1 commit

  • commit 338c7dbadd2671189cec7faf64c84d01071b3f96 upstream.

    In multiple functions the vcpu_id is used as an offset into a bitfield. A
    malicious user could specify a vcpu_id greater than 255 in order to set or
    clear bits in kernel memory. This could be used to elevate privileges in the
    kernel. This patch verifies that the vcpu_id provided is less than 255.
    The api documentation already specifies that the vcpu_id must be less than
    max_vcpus, but this is currently not checked.
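
    A minimal sketch of such a check, assuming a fixed-size bitmap indexed
    by vcpu_id. `MAX_VCPUS`, `struct dest_map` and `dest_map_set()` are
    illustrative names; the limit of 255 is taken from the text above.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <limits.h>
    #include <stdint.h>

    #define MAX_VCPUS     255
    #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
    #define MAP_WORDS     ((MAX_VCPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

    struct dest_map {
        unsigned long bits[MAP_WORDS];
    };

    /* Set the bit for vcpu_id, rejecting ids that would index past the map. */
    static int dest_map_set(struct dest_map *m, uint32_t vcpu_id)
    {
        assert(m != NULL);
        if (vcpu_id >= MAX_VCPUS)
            return -EINVAL;
        m->bits[vcpu_id / BITS_PER_WORD] |= 1UL << (vcpu_id % BITS_PER_WORD);
        return 0;
    }
    ```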

    Reported-by: Andrew Honig
    Signed-off-by: Andrew Honig
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Andy Honig
     

30 Nov, 2013

1 commit

  • commit 27ef63c7e97d1e5dddd85051c03f8d44cc887f34 upstream.

    When determining the page size we could use to map with the IOMMU, the
    page size should also be aligned with the hva, not just the gfn. The
    gfn may not reflect the real alignment within the hugetlbfs file.

    Most of the time, this works fine. However, if the hugetlbfs file is
    backed by non-contiguous huge pages, a multi-huge page memslot starts at
    an unaligned offset within the hugetlbfs file, and the gfn is aligned
    with respect to the huge page size, kvm_host_page_size() will return the
    huge page size and we will use that to map with the IOMMU.

    When we later unpin that same memslot, the IOMMU returns the unmap size
    as the huge page size, and we happily unpin that many pfns in
    monotonically increasing order, not realizing we are spanning
    non-contiguous huge pages and partially unpin the wrong huge page.

    Ensure the IOMMU mapping page size is aligned with the hva corresponding
    to the gfn, which does reflect the alignment within the hugetlbfs file.
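
    The alignment rule can be sketched as a loop that halves the candidate
    page size until both the guest physical address and the hva are aligned
    to it. This is a simplified model; `iommu_map_size()` is a hypothetical
    name, the page shift of 12 is an assumption, and the hva is passed in
    directly rather than looked up from a memslot.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12

    /*
     * Shrink page_size (a power of two, at least one base page) until both
     * the guest physical address of gfn and the hva are aligned to it.
     */
    static uint64_t iommu_map_size(uint64_t gfn, uint64_t hva,
                                   uint64_t page_size)
    {
        uint64_t gpa = gfn << PAGE_SHIFT;

        assert(page_size >= (1ULL << PAGE_SHIFT));
        while (page_size > (1ULL << PAGE_SHIFT) &&
               ((gpa | hva) & (page_size - 1)) != 0)
            page_size >>= 1;
        return page_size;
    }
    ```

    With a 2 MiB candidate size, a gfn of 0x200 keeps the huge page size only
    when the hva is 2 MiB aligned as well; an hva that is offset by one base
    page drops the mapping size to 4 KiB.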

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Greg Edwards
    Signed-off-by: Gleb Natapov
    Signed-off-by: Greg Kroah-Hartman

    Greg Edwards
     

30 Oct, 2013

1 commit


03 Oct, 2013

1 commit


17 Sep, 2013

2 commits

  • When we cancel 'async_pf_execute()', we should behave as if the work was
    never scheduled in 'kvm_setup_async_pf()'.
    Fixes a bug where we can't unload the module because the VM wasn't destroyed.

    Signed-off-by: Radim Krčmář
    Reviewed-by: Paolo Bonzini
    Reviewed-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
     
  • Page tables in a read-only memory slot will currently cause a triple
    fault because the page walker uses gfn_to_hva and it fails on such a slot.

    OVMF uses such a page table; however, real hardware seems to be fine with
    that as long as the accessed/dirty bits are set. Save whether the slot
    is readonly, and later check it when updating the accessed and dirty bits.

    Reviewed-by: Xiao Guangrong
    Reviewed-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

05 Sep, 2013

1 commit

  • Pull vfs pile 1 from Al Viro:
    "Unfortunately, this merge window it'll have to be a lot of small piles -
    my fault, actually, for not keeping #for-next in anything that would
    resemble a sane shape ;-/

    This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
    cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
    components) + several long-standing patches from various folks.

    There definitely will be a lot more (starting with Miklos'
    check_submount_and_drop() series)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    direct-io: Handle O_(D)SYNC AIO
    direct-io: Implement generic deferred AIO completions
    add formats for dentry/file pathnames
    kvm eventfd: switch to fdget
    powerpc kvm: use fdget
    switch fchmod() to fdget
    switch epoll_ctl() to fdget
    switch copy_module_from_fd() to fdget
    git simplify nilfs check for busy subtree
    ibmasmfs: don't bother passing superblock when not needed
    don't pass superblock to hypfs_{mkdir,create*}
    don't pass superblock to hypfs_diag_create_files
    don't pass superblock to hypfs_vm_create_files()
    oprofile: get rid of pointless forward declarations of struct super_block
    oprofilefs_create_...() do not need superblock argument
    oprofilefs_mkdir() doesn't need superblock argument
    don't bother with passing superblock to oprofile_create_stats_files()
    oprofile: don't bother with passing superblock to ->create_files()
    don't bother passing sb to oprofile_create_files()
    coh901318: don't open-code simple_read_from_buffer()
    ...

    Linus Torvalds
     

04 Sep, 2013

1 commit


30 Aug, 2013

3 commits

  • For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in
    one word and since there are 32 private (per cpu) irqs, we have 8
    private u32 fields on the vgic_bytemap struct. We shift the offset from
    the base of the register group right by 2, giving us the word index
    instead of the field index. But then there are 8 private words, not 4,
    which is also why we subtract 8 words from the offset of the shared
    words.
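
    The index arithmetic can be sketched as below, assuming 32 private and
    96 shared interrupts and 4 cpus. The struct layout and the name
    `bytemap_get_reg()` are illustrative, not a copy of the vgic code.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define NR_CPUS         4
    #define NR_PRIVATE_IRQS 32
    #define NR_SHARED_IRQS  96
    #define PRIVATE_WORDS   (NR_PRIVATE_IRQS / 4) /* 4 one-byte fields per word */
    #define SHARED_WORDS    (NR_SHARED_IRQS / 4)

    struct bytemap {
        uint32_t percpu[NR_CPUS][PRIVATE_WORDS];
        uint32_t shared[SHARED_WORDS];
    };

    /* Map a byte offset from the register group base to its 32-bit word. */
    static uint32_t *bytemap_get_reg(struct bytemap *x, int cpu, uint32_t offset)
    {
        uint32_t word = offset >> 2; /* field index -> word index */

        assert(cpu >= 0 && cpu < NR_CPUS);
        assert(word < PRIVATE_WORDS + SHARED_WORDS);
        if (word < PRIVATE_WORDS) /* 8 private words, not 4 */
            return &x->percpu[cpu][word];
        return &x->shared[word - PRIVATE_WORDS];
    }
    ```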

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Gleb Natapov

    Christoffer Dall
     
  • All the code in handle_mmio_cfg_reg() assumes the offset has
    been shifted right to accommodate for the 2:1 bit compression,
    but this is only done when getting the register address.

    Shift the offset early so the code works mostly unchanged.

    Reported-by: Zhaobo (Bob, ERC)
    Signed-off-by: Marc Zyngier
    Signed-off-by: Gleb Natapov

    Marc Zyngier
     
  • vgic_get_target_reg is quite complicated, for no good reason.
    Actually, it is fairly easy to write it in a much more efficient
    way by using the target CPU array instead of the bitmap.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Gleb Natapov

    Marc Zyngier
     

28 Aug, 2013

1 commit


27 Aug, 2013

1 commit

  • The checks on PG_reserved in the page structure on head and tail pages
    aren't necessary because split_huge_page wouldn't transfer the
    PG_reserved bit from head to tail anyway.

    This was a forward-thinking check done in the case PageReserved was
    set by a driver-owned page mapped in userland with something like
    remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
    possible right now). It was meant to be very safe, but it's overkill
    as it's unlikely split_huge_page could ever run without the driver
    noticing and tearing down the hugepage itself.

    And if a driver in the future really wants to map a reserved
    hugepage in userland using a huge pmd, it should simply take care of
    marking all subpages reserved too to keep KVM safe. This of course
    would require such a hypothetical driver to tear down the huge pmd
    itself and split the hugepage itself, instead of relying on
    split_huge_page, but that sounds very reasonable, especially
    considering split_huge_page wouldn't currently transfer the reserved
    bit anyway.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Gleb Natapov

    Andrea Arcangeli
     

26 Aug, 2013

1 commit

  • KVM uses anon_inode_getfd() to allocate file descriptors as part
    of some of its ioctls. But those ioctls lack a flags argument that
    would allow userspace to choose options for the newly opened file descriptor.

    In such cases it's advised to use O_CLOEXEC by default so that
    userspace is allowed to choose, without race, if the file descriptor
    is going to be inherited across exec().

    This patch sets the O_CLOEXEC flag on all file descriptors created
    with anon_inode_getfd() to not leak file descriptors across exec().
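
    From userspace, the result can be observed with fcntl(F_GETFD). The
    sketch below is a userspace analogue, not KVM code: it opens an
    ordinary file with O_CLOEXEC and checks the FD_CLOEXEC flag. The helper
    names are made up for the example.

    ```c
    #include <assert.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Open a file read-only with close-on-exec set atomically at open time. */
    static int open_cloexec(const char *path)
    {
        assert(path != NULL);
        return open(path, O_RDONLY | O_CLOEXEC);
    }

    /* Return 1 if fd is close-on-exec, 0 if it is not, -1 on error. */
    static int fd_is_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0)
            return -1;
        return (flags & FD_CLOEXEC) ? 1 : 0;
    }
    ```

    A program that does want the descriptor inherited across exec() can
    clear the flag afterwards with fcntl(F_SETFD), which is race-free in
    that direction.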

    Signed-off-by: Yann Droneaud
    Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Yann Droneaud
     

29 Jul, 2013

1 commit

  • kvm_io_bus_sort_cmp is used also directly, not just as a callback for
    sort and bsearch. In these cases, it is handy to have a type-safe
    variant. This patch introduces such a variant, __kvm_io_bus_sort_cmp,
    and uses it throughout kvm_main.c.
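
    The split between a typed comparator and a thin `void *` callback can
    be sketched like this. `struct io_range` and the function names are
    simplified stand-ins, and the comparison only looks at the start
    address, which is an assumption made to keep the example short.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    struct io_range {
        unsigned long addr;
        int len;
    };

    /* Type-safe variant: callers get compile-time checking of arguments. */
    static int range_cmp(const struct io_range *r1, const struct io_range *r2)
    {
        if (r1->addr < r2->addr)
            return -1;
        if (r1->addr > r2->addr)
            return 1;
        return 0;
    }

    /* Callback variant with the signature qsort() and bsearch() expect. */
    static int range_cmp_cb(const void *p1, const void *p2)
    {
        return range_cmp(p1, p2);
    }

    static void sort_ranges(struct io_range *r, size_t n)
    {
        qsort(r, n, sizeof(*r), range_cmp_cb);
    }

    static struct io_range *find_range(struct io_range *r, size_t n,
                                       const struct io_range *key)
    {
        return bsearch(key, r, n, sizeof(*r), range_cmp_cb);
    }
    ```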

    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

18 Jul, 2013

2 commits

  • This is called right after the memslots are updated, i.e. when the result
    of update_memslots() gets installed in install_new_memslots(). Since
    the memslots need to be updated twice when we delete or move a memslot,
    kvm_arch_commit_memory_region() does not correspond to this exactly.

    In the following patch, x86 will use this new API to check if the mmio
    generation has reached its maximum value, in which case mmio sptes need
    to be flushed out.

    Signed-off-by: Takuya Yoshikawa
    Acked-by: Alexander Graf
    Reviewed-by: Xiao Guangrong
    Signed-off-by: Paolo Bonzini

    Takuya Yoshikawa
     
  • Add new functions kvm_io_bus_{read,write}_cookie() that allow users of
    the kvm io infrastructure to use a cookie value to speed up lookup of a
    device on an io bus.
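
    One way to picture a cookie-assisted lookup, assuming the cookie is
    simply the index of the range that matched last time: try that slot
    first and fall back to a full search if it no longer matches. The types
    and function names below are hypothetical and the fallback is a linear
    scan for brevity.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct io_range {
        unsigned long addr;
        unsigned long len;
        int dev_id;
    };

    struct io_bus {
        size_t count;
        struct io_range range[8];
    };

    static int in_range(const struct io_range *r, unsigned long addr)
    {
        return addr >= r->addr && addr < r->addr + r->len;
    }

    /* Full search: index of the range containing addr, or -1. */
    static long bus_find(const struct io_bus *bus, unsigned long addr)
    {
        for (size_t i = 0; i < bus->count; i++)
            if (in_range(&bus->range[i], addr))
                return (long)i;
        return -1;
    }

    /* Try the cookie first; a stale or garbage cookie falls back to search. */
    static long bus_find_cookie(const struct io_bus *bus, unsigned long addr,
                                long cookie)
    {
        if (cookie >= 0 && (size_t)cookie < bus->count &&
            in_range(&bus->range[cookie], addr))
            return cookie;
        return bus_find(bus, addr);
    }
    ```

    The return value doubles as the cookie to pass on the next access.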

    Signed-off-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Gleb Natapov

    Cornelia Huck
     

04 Jul, 2013

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "On the x86 side, there are some optimizations and documentation
    updates. The big ARM/KVM change for 3.11, support for AArch64, will
    come through Catalin Marinas's tree. s390 and PPC have misc cleanups
    and bugfixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
    KVM: PPC: Ignore PIR writes
    KVM: PPC: Book3S PR: Invalidate SLB entries properly
    KVM: PPC: Book3S PR: Allow guest to use 1TB segments
    KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
    KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
    KVM: PPC: Book3S PR: Fix proto-VSID calculations
    KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
    KVM: Fix RTC interrupt coalescing tracking
    kvm: Add a tracepoint write_tsc_offset
    KVM: MMU: Inform users of mmio generation wraparound
    KVM: MMU: document fast invalidate all mmio sptes
    KVM: MMU: document fast invalidate all pages
    KVM: MMU: document fast page fault
    KVM: MMU: document mmio page fault
    KVM: MMU: document write_flooding_count
    KVM: MMU: document clear_spte_count
    KVM: MMU: drop kvm_mmu_zap_mmio_sptes
    KVM: MMU: init kvm generation close to mmio wrap-around value
    KVM: MMU: add tracepoint for check_mmio_spte
    KVM: MMU: fast invalidate all mmio sptes
    ...

    Linus Torvalds
     

27 Jun, 2013

3 commits

  • KVM/ARM pull request for 3.11 merge window

    * tag 'kvm-arm-3.11' of git://git.linaro.org/people/cdall/linux-kvm-arm.git:
    ARM: kvm: don't include drivers/virtio/Kconfig
    Update MAINTAINERS: KVM/ARM work now funded by Linaro
    arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logic
    ARM: KVM: clear exclusive monitor on all exception returns
    ARM: KVM: add missing dsb before invalidating Stage-2 TLBs
    ARM: KVM: perform save/restore of PAR
    ARM: KVM: get rid of S2_PGD_SIZE
    ARM: KVM: don't special case PC when doing an MMIO
    ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDs
    ARM: KVM: remove dead prototype for __kvm_tlb_flush_vmid
    ARM: KVM: Don't handle PSCI calls via SMC
    ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq

    Gleb Natapov
     
  • This reverts most of the f1ed0450a5fac7067590317cbf027f566b6ccbca. After
    the commit kvm_apic_set_irq() no longer returns accurate information
    about interrupt injection status if injection is done into disabled
    APIC. RTC interrupt coalescing tracking relies on the information to be
    accurate and cannot recover if it is not.

    Signed-off-by: Gleb Natapov

    Gleb Natapov
     
  • The arch_timer irq numbers (or PPI numbers) are implementation dependent,
    so the host virtual timer irq number can be different from guest virtual
    timer irq number.

    This patch ensures that host virtual timer irq number is read from DTB and
    guest virtual timer irq is determined based on vcpu target type.

    Signed-off-by: Anup Patel
    Signed-off-by: Pranavkumar Sawargaonkar
    Signed-off-by: Christoffer Dall

    Anup Patel
     

04 Jun, 2013

1 commit

  • We can easily reach the 1000 limit by starting a VM with a couple
    hundred I/O devices (multifunction=on). The hardcoded limit
    has already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).

    In userspace, we already have the maximum file descriptor limit to
    bound the ioeventfd count. But kvm_io_bus devices are also used
    for pit, pic, ioapic, coalesced_mmio. They can't be limited
    by the maximum file descriptor count.

    Currently only ioeventfds take too many kvm_io_bus devices,
    so just exclude them from the kvm_io_range limit count.
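
    A sketch of the accounting, assuming the bus keeps a separate counter
    for ioeventfds and subtracts it before comparing against the limit.
    `struct io_bus`, `bus_register()` and the `is_ioeventfd` parameter are
    illustrative; only the limit of 1000 comes from the text above.

    ```c
    #include <assert.h>
    #include <errno.h>

    #define NR_IOBUS_DEVS 1000

    struct io_bus {
        int dev_count;       /* all devices on the bus */
        int ioeventfd_count; /* the subset that are ioeventfds */
    };

    /* Register one device; ioeventfds do not count toward the limit. */
    static int bus_register(struct io_bus *bus, int is_ioeventfd)
    {
        assert(bus != NULL);
        if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
            return -ENOSPC;
        bus->dev_count++;
        if (is_ioeventfd)
            bus->ioeventfd_count++;
        return 0;
    }
    ```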

    Also fixed one indentation issue in kvm_host.h.

    Signed-off-by: Amos Kong
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: Gleb Natapov

    Amos Kong
     

19 May, 2013

1 commit

  • As KVM/arm64 is looming on the horizon, it makes sense to move some
    of the common code to a single location in order to reduce duplication.

    The code could live anywhere. Actually, most of KVM is already built
    with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
    not exactly talking about style here. But maybe it is time to start
    moving into a less ugly direction.

    The include files must be in a "public" location, as they are accessed
    from non-KVM files (arch/arm/kernel/asm-offsets.c).

    For this purpose, introduce two new locations:
    - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
    virt/kvm, so this could be seen as a (very ugly) precedent.
    - include/kvm/ : there is already an include/xen, and while the
    intent is slightly different, this seems as good a location as
    any

    Eventually, we should probably have independent Makefiles at every
    level (just like everywhere else in the kernel), but this is just
    the first step.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Gleb Natapov

    Marc Zyngier
     

14 May, 2013

1 commit

  • Since the arrival of posted interrupt support we can no longer guarantee
    that coalesced IRQs are always reported to the IRQ source. Moreover,
    accumulated APIC timer events could cause a busy loop when a VCPU should
    rather be halted. The consensus is to remove coalesced tracking from the
    LAPIC.

    Signed-off-by: Jan Kiszka
    Acked-by: Marcelo Tosatti
    Signed-off-by: Gleb Natapov

    Jan Kiszka
     

12 May, 2013

1 commit


11 May, 2013

1 commit

  • Pull kvm fixes from Gleb Natapov:
    "Most of the fixes are in the emulator since now we emulate more than
    we did before for correctness sake we see more bugs there, but there
    is also an OOPS fixed and corruption of xcr0 register."

    * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: emulator: emulate SALC
    KVM: emulator: emulate XLAT
    KVM: emulator: emulate AAM
    KVM: VMX: fix halt emulation while emulating invalid guest sate
    KVM: Fix kvm_irqfd_init initialization
    KVM: x86: fix maintenance of guest/host xcr0 state

    Linus Torvalds
     

10 May, 2013

1 commit

  • Pull MIPS updates from Ralf Baechle:

    - More work on DT support for various platforms

    - Various fixes that were too late to make it straight into 3.9

    - Improved platform support, in particular the Netlogic XLR and
    BCM63xx, and the SEAD3 and Malta eval boards.

    - Support for several Ralink SOC families.

    - Complete support for the microMIPS ASE which basically reencodes the
    existing MIPS32/MIPS64 ISA to use non-constant size instructions.

    - Some fallout from LTO work which removes old cruft and will generally
    make the MIPS kernel easier to maintain and resistant to compiler
    optimization, even in absence of LTO.

    - KVM support. While MIPS has announced hardware virtualization
    extensions this KVM extension uses trap and emulate mode for
    virtualization of MIPS32. More KVM work to add support for VZ
    hardware virtualization extensions and MIPS64 will probably already
    be merged for 3.11.

    Most of this has been sitting in -next for a long time. All defconfigs
    have been built or run-time tested except three for which fixes are being
    sent by other maintainers.

    Semantic conflict with kvm updates done as per Ralf

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits)
    MIPS: Add new GIC clockevent driver.
    MIPS: Formatting clean-ups for clocksources.
    MIPS: Refactor GIC clocksource code.
    MIPS: Move 'gic_frequency' to common location.
    MIPS: Move 'gic_present' to common location.
    MIPS: MIPS16e: Add unaligned access support.
    MIPS: MIPS16e: Support handling of delay slots.
    MIPS: MIPS16e: Add instruction formats.
    MIPS: microMIPS: Optimise 'strnlen' core library function.
    MIPS: microMIPS: Optimise 'strlen' core library function.
    MIPS: microMIPS: Optimise 'strncpy' core library function.
    MIPS: microMIPS: Optimise 'memset' core library function.
    MIPS: microMIPS: Add configuration option for microMIPS kernel.
    MIPS: microMIPS: Disable LL/SC and fix linker bug.
    MIPS: microMIPS: Add vdso support.
    MIPS: microMIPS: Add unaligned access support.
    MIPS: microMIPS: Support handling of delay slots.
    MIPS: microMIPS: Add support for exception handling.
    MIPS: microMIPS: Floating point support.
    MIPS: microMIPS: Fix macro naming in micro-assembler.
    ...

    Linus Torvalds
     

09 May, 2013

2 commits


08 May, 2013

1 commit

  • In commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when
    kvm_init() is called the second time (e.g kvm-amd.ko and kvm-intel.ko),
    kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be
    called on the error handling path. This way, the kvm_irqfd system will
    not be ready.

    This patch fixes the following:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] _raw_spin_lock+0xe/0x30
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: vhost_net
    CPU 6
    Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
    RIP: 0010:[] [] _raw_spin_lock+0xe/0x30
    RSP: 0018:ffff880221721cc8 EFLAGS: 00010046
    RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
    RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
    RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
    R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
    R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
    FS: 00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
    Stack:
    ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
    0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
    ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
    Call Trace:
    [] __queue_work+0x45/0x2d0
    [] queue_work_on+0x8e/0xa0
    [] queue_work+0x19/0x20
    [] irqfd_deactivate+0x4b/0x60
    [] kvm_irqfd+0x39d/0x580
    [] kvm_vm_ioctl+0x207/0x5b0
    [] ? update_curr+0xf5/0x180
    [] do_vfs_ioctl+0x98/0x550
    [] ? finish_task_switch+0x4e/0xe0
    [] ? __schedule+0x2ea/0x710
    [] sys_ioctl+0x57/0x90
    [] ? trace_hardirqs_on_thunk+0x3a/0x3c
    [] system_call_fastpath+0x16/0x1b
    Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
    RIP [] _raw_spin_lock+0xe/0x30
    RSP
    CR2: 0000000000000000
    ---[ end trace 13fb1e4b6e5ab21f ]---

    Signed-off-by: Asias He
    Acked-by: Cornelia Huck
    Signed-off-by: Gleb Natapov

    Asias He
     

06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

05 May, 2013

1 commit


02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

27 Apr, 2013

2 commits