13 May, 2014

1 commit

  • commit 5678de3f15010b9022ee45673f33bcfc71d47b60 upstream.

    QE reported that they got the BUG_ON in ioapic_service to trigger.
    I cannot reproduce it, but there are two reasons why this could happen.

    The less likely, but also easiest, one is when kvm_irq_delivery_to_apic
    does not deliver to any APIC and returns -1.

    Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
    function is never reached. However, you can target the similar loop in
    kvm_irq_delivery_to_apic_fast; just program a zero logical destination
    address into the IOAPIC, or an out-of-range physical destination address.
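    A minimal sketch of the failure mode follows. It assumes that the caller stores the
    delivery result in a pending-EOI counter and that the BUG_ON checks that no EOIs are
    outstanding before the next delivery. Every name here (sketch_deliver,
    sketch_rtc_status, sketch_service) is hypothetical, and the clamp shown is one way
    to keep -1 out of the counter.

```c
#include <assert.h>

/* Hypothetical stand-in for kvm_irq_delivery_to_apic(): returns how many
 * vCPUs accepted the interrupt, or -1 when no APIC was targeted (e.g. a
 * zero logical destination or an out-of-range physical destination). */
static int sketch_deliver(unsigned int dest_mask)
{
    int delivered = 0;

    for (int vcpu = 0; vcpu < 8; vcpu++)
        if (dest_mask & (1u << vcpu))
            delivered++;
    return delivered ? delivered : -1;
}

/* Hypothetical bookkeeping: vCPUs that still owe an EOI. */
struct sketch_rtc_status {
    int pending_eoi;
};

/* Returns 0 on success, -1 where the kernel would hit the BUG_ON. */
static int sketch_service(struct sketch_rtc_status *s, unsigned int dest_mask)
{
    int ret;

    if (s->pending_eoi != 0)
        return -1;                      /* BUG_ON stand-in */
    ret = sketch_deliver(dest_mask);
    s->pending_eoi = ret < 0 ? 0 : ret; /* never store the -1 "nobody" result */
    return 0;
}
```

    Without the clamp, a delivery to nobody would leave pending_eoi at -1, and the check
    would fire on the next interrupt.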

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Paolo Bonzini
     

04 Apr, 2014

1 commit

  • commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream.

    Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference on the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count() is susceptible to racing with
    prep_compound_page() and seeing a NULL or dangling page->first_page
    pointer.

    This patch uses Andrea's implementation of compound_trans_head(), which
    deals with such a race, and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that,
    if PageTail(head) is true, we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set before the page
    is marked as a tail page.

    This is the safest way to ensure we see the head page that we are
    expecting; PageTail(page) is already in the unlikely() path, and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception: we don't enforce a store memory barrier
    during init since no race is possible.
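    The barrier pairing can be sketched with C11 atomics: the writer stores first_page,
    then a release fence (the analogue of the store barrier), then sets the tail flag; the
    reader checks the flag, then issues an acquire fence (the analogue of the read barrier)
    before reading first_page. This is a hypothetical user-space model, not the kernel's
    compound_head(); it covers only the initialization ordering, not the dangling-pointer
    case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified page: only the two fields involved in the race. */
struct sketch_page {
    struct sketch_page *first_page; /* head pointer, meaningful for tail pages */
    atomic_int tail;                /* stands in for the PageTail() flag */
};

static void sketch_page_init(struct sketch_page *p)
{
    p->first_page = NULL;
    atomic_init(&p->tail, 0);
}

/* Writer (cf. prep_compound_page()): store first_page, fence, then publish. */
static void sketch_prep_tail(struct sketch_page *p, struct sketch_page *head)
{
    p->first_page = head;
    atomic_thread_fence(memory_order_release);          /* ~ store barrier */
    atomic_store_explicit(&p->tail, 1, memory_order_relaxed);
}

/* Reader (cf. compound_head()): once the tail flag is observed, the acquire
 * fence makes the matching first_page store visible, so no NULL is returned. */
static struct sketch_page *sketch_compound_head(struct sketch_page *p)
{
    if (atomic_load_explicit(&p->tail, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);      /* ~ read barrier */
        return p->first_page;
    }
    return p;
}
```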

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Rientjes
     

23 Feb, 2014

1 commit

  • commit aac5c4226e7136c331ed384c25d5560204da10a0 upstream.

    If kvm_io_bus_register_dev() fails, the coalesced MMIO registration path
    returns success when it should return an error code.

    I also did a little cleanup, such as removing an impossible NULL test.
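    The pattern being restored is plain error propagation: if the bus registration fails,
    undo the allocation and hand the error code back instead of 0. In this minimal sketch
    the bus, its slot limit and both functions are hypothetical stand-ins, not the KVM
    code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bus with a fixed number of slots; registering past the
 * limit fails, standing in for kvm_io_bus_register_dev() failing. */
#define SKETCH_BUS_SLOTS 2

struct sketch_bus {
    int used;
};

static int sketch_bus_register(struct sketch_bus *bus)
{
    if (bus->used >= SKETCH_BUS_SLOTS)
        return -ENOSPC;
    bus->used++;
    return 0;
}

/* Hypothetical zone registration: allocate, register, and on failure free
 * the allocation and propagate the error rather than reporting success. */
static int sketch_register_zone(struct sketch_bus *bus, void **out)
{
    void *dev = malloc(64);
    int ret;

    if (!dev)
        return -ENOMEM;
    ret = sketch_bus_register(bus);
    if (ret < 0) {
        free(dev);
        return ret;
    }
    *out = dev;
    return 0;
}
```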

    Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     

20 Dec, 2013

1 commit

  • commit 338c7dbadd2671189cec7faf64c84d01071b3f96 upstream.

    In multiple functions the vcpu_id is used as an offset into a bitfield. A
    malicious user could specify a vcpu_id greater than 255 in order to set or
    clear bits in kernel memory. This could be used to elevate privileges in the
    kernel. This patch verifies that the vcpu_id provided is less than 255.
    The API documentation already specifies that the vcpu_id must be less than
    max_vcpus, but this is currently not checked.
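    The fix amounts to validating a user-supplied index before using it as a bit offset.
    In this minimal sketch SKETCH_MAX_VCPUS is a hypothetical limit chosen to match the
    "less than 255" rule above, and the bitmap and helper are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical limit matching the "less than 255" rule described above. */
#define SKETCH_MAX_VCPUS 255
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
    ((SKETCH_MAX_VCPUS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

static unsigned long sketch_vcpu_bitmap[SKETCH_WORDS];

/* Reject out-of-range ids before they can index past the bitmap. An unsigned
 * parameter also turns negative user values into large, rejected ones. */
static int sketch_set_vcpu_bit(unsigned int vcpu_id)
{
    if (vcpu_id >= SKETCH_MAX_VCPUS)
        return -EINVAL;
    sketch_vcpu_bitmap[vcpu_id / SKETCH_BITS_PER_WORD] |=
        1UL << (vcpu_id % SKETCH_BITS_PER_WORD);
    return 0;
}
```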

    Reported-by: Andrew Honig
    Signed-off-by: Andrew Honig
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Andy Honig
     

30 Nov, 2013

1 commit

  • commit 27ef63c7e97d1e5dddd85051c03f8d44cc887f34 upstream.

    When determining the page size we could use to map with the IOMMU, the
    page size should also be aligned with the hva, not just the gfn. The
    gfn may not reflect the real alignment within the hugetlbfs file.

    Most of the time, this works fine. However, if the hugetlbfs file is
    backed by non-contiguous huge pages, a multi-huge-page memslot starts at
    an unaligned offset within the hugetlbfs file, and the gfn is aligned
    with respect to the huge page size, then kvm_host_page_size() will return
    the huge page size and we will use that to map with the IOMMU.

    When we later unpin that same memslot, the IOMMU returns the unmap size
    as the huge page size, and we happily unpin that many pfns in
    monotonically increasing order, not realizing we are spanning
    non-contiguous huge pages, and so we partially unpin the wrong huge page.

    Ensure the IOMMU mapping page size is aligned with the hva corresponding
    to the gfn, which does reflect the alignment within the hugetlbfs file.
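    The alignment rule can be sketched as shrinking a candidate page size until it
    divides both the guest physical address (gfn << PAGE_SHIFT) and the corresponding
    hva. This is a hypothetical helper with an assumed 4 KiB base page, not the
    kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB base pages */

/* Halve the candidate mapping size until it is aligned with BOTH the gfn's
 * address and the hva; the hva check catches an unaligned hugetlbfs offset. */
static uint64_t sketch_iommu_page_size(uint64_t gfn, uint64_t hva,
                                       uint64_t page_size)
{
    while ((gfn << SKETCH_PAGE_SHIFT) & (page_size - 1))
        page_size >>= 1;
    while (hva & (page_size - 1))
        page_size >>= 1;
    return page_size;
}
```

    For example, with a 2 MiB candidate, an aligned gfn of 0x200 and an hva of
    0x7f0000201000, the result falls back to 4 KiB, because the hva is only 4 KiB aligned.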

    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Greg Edwards
    Signed-off-by: Gleb Natapov
    Signed-off-by: Greg Kroah-Hartman

    Greg Edwards
     

11 May, 2013

1 commit

  • Pull kvm fixes from Gleb Natapov:
    "Most of the fixes are in the emulator: since we now emulate more than
    we did before for correctness' sake, we see more bugs there. There is
    also an OOPS fix and a fix for corruption of the xcr0 register."

    * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: emulator: emulate SALC
    KVM: emulator: emulate XLAT
    KVM: emulator: emulate AAM
    KVM: VMX: fix halt emulation while emulating invalid guest sate
    KVM: Fix kvm_irqfd_init initialization
    KVM: x86: fix maintenance of guest/host xcr0 state

    Linus Torvalds
     

10 May, 2013

1 commit

  • Pull MIPS updates from Ralf Baechle:

    - More work on DT support for various platforms

    - Various fixes that were too late to make it straight into 3.9

    - Improved platform support, in particular the Netlogic XLR and
    BCM63xx, and the SEAD3 and Malta eval boards.

    - Support for several Ralink SOC families.

    - Complete support for the microMIPS ASE which basically reencodes the
    existing MIPS32/MIPS64 ISA to use non-constant size instructions.

    - Some fallout from LTO work, which removes old cruft and will generally
    make the MIPS kernel easier to maintain and more resistant to compiler
    optimization, even in the absence of LTO.

    - KVM support. While MIPS has announced hardware virtualization
    extensions, this KVM extension uses trap-and-emulate mode for
    virtualization of MIPS32. More KVM work to add support for VZ
    hardware virtualization extensions and MIPS64 will probably already
    be merged for 3.11.

    Most of this has been sitting in -next for a long time. All defconfigs
    have been build or run-time tested except three, for which fixes are being
    sent by other maintainers.

    Semantic conflict with the KVM updates resolved as per Ralf.

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits)
    MIPS: Add new GIC clockevent driver.
    MIPS: Formatting clean-ups for clocksources.
    MIPS: Refactor GIC clocksource code.
    MIPS: Move 'gic_frequency' to common location.
    MIPS: Move 'gic_present' to common location.
    MIPS: MIPS16e: Add unaligned access support.
    MIPS: MIPS16e: Support handling of delay slots.
    MIPS: MIPS16e: Add instruction formats.
    MIPS: microMIPS: Optimise 'strnlen' core library function.
    MIPS: microMIPS: Optimise 'strlen' core library function.
    MIPS: microMIPS: Optimise 'strncpy' core library function.
    MIPS: microMIPS: Optimise 'memset' core library function.
    MIPS: microMIPS: Add configuration option for microMIPS kernel.
    MIPS: microMIPS: Disable LL/SC and fix linker bug.
    MIPS: microMIPS: Add vdso support.
    MIPS: microMIPS: Add unaligned access support.
    MIPS: microMIPS: Support handling of delay slots.
    MIPS: microMIPS: Add support for exception handling.
    MIPS: microMIPS: Floating point support.
    MIPS: microMIPS: Fix macro naming in micro-assembler.
    ...

    Linus Torvalds
     

09 May, 2013

2 commits


08 May, 2013

1 commit

    Since commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when
    kvm_init() is called a second time (e.g. when both kvm-amd.ko and
    kvm-intel.ko are loaded), kvm_arch_init() fails with -EEXIST and
    kvm_irqfd_exit() is then called on the error-handling path. As a result,
    the kvm_irqfd system is no longer ready.

    This patch fixes the following oops:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] _raw_spin_lock+0xe/0x30
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: vhost_net
    CPU 6
    Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
    RIP: 0010:[] [] _raw_spin_lock+0xe/0x30
    RSP: 0018:ffff880221721cc8 EFLAGS: 00010046
    RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
    RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
    RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
    R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
    R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
    FS: 00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
    Stack:
    ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
    0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
    ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
    Call Trace:
    [] __queue_work+0x45/0x2d0
    [] queue_work_on+0x8e/0xa0
    [] queue_work+0x19/0x20
    [] irqfd_deactivate+0x4b/0x60
    [] kvm_irqfd+0x39d/0x580
    [] kvm_vm_ioctl+0x207/0x5b0
    [] ? update_curr+0xf5/0x180
    [] do_vfs_ioctl+0x98/0x550
    [] ? finish_task_switch+0x4e/0xe0
    [] ? __schedule+0x2ea/0x710
    [] sys_ioctl+0x57/0x90
    [] ? trace_hardirqs_on_thunk+0x3a/0x3c
    [] system_call_fastpath+0x16/0x1b
    Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
    RIP [] _raw_spin_lock+0xe/0x30
    RSP
    CR2: 0000000000000000
    ---[ end trace 13fb1e4b6e5ab21f ]---
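    One way to avoid the teardown is to set up the shared irqfd state only after
    kvm_arch_init() has succeeded, so a refused second caller never reaches or unwinds it.
    The sketch below models that ordering with hypothetical globals and functions; it
    illustrates the idea rather than reproducing the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical global state shared by every module that calls sketch_init(). */
static bool arch_registered;
static bool irqfd_ready;

static int sketch_arch_init(void)
{
    if (arch_registered)
        return -EEXIST; /* e.g. the second vendor module is refused */
    arch_registered = true;
    return 0;
}

static int sketch_irqfd_init(void) { irqfd_ready = true; return 0; }
static void sketch_irqfd_exit(void) { irqfd_ready = false; }

/* Arch init first: if it fails, shared irqfd state is left untouched. */
static int sketch_init(void)
{
    int r = sketch_arch_init();

    if (r)
        return r;
    r = sketch_irqfd_init();
    if (r) {
        arch_registered = false;
        return r;
    }
    return 0;
}

static void sketch_exit(void)
{
    sketch_irqfd_exit();
    arch_registered = false;
}
```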

    Signed-off-by: Asias He
    Acked-by: Cornelia Huck
    Signed-off-by: Gleb Natapov

    Asias He
     

06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

05 May, 2013

1 commit


02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.
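    Because the attribute number equals the interrupt source number, addressing one
    source's state is just a matter of filling in a device-attribute descriptor. This
    sketch redeclares a local struct with the layout of struct kvm_device_attr from
    <linux/kvm.h> so it builds on any host. The value 1 for the sources group is an
    assumption; check the powerpc uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct kvm_device_attr's layout. */
struct sketch_device_attr {
    uint32_t flags;
    uint32_t group;
    uint64_t attr;
    uint64_t addr; /* userspace pointer to the value being read or written */
};

/* Assumed value of KVM_DEV_XICS_GRP_SOURCES. */
#define SKETCH_XICS_GRP_SOURCES 1

/* Descriptor for one interrupt source: attribute number == source number. */
static struct sketch_device_attr sketch_xics_source_attr(uint32_t irq,
                                                         uint64_t *state)
{
    struct sketch_device_attr a = {
        .flags = 0,
        .group = SKETCH_XICS_GRP_SOURCES,
        .attr = irq,
        .addr = (uint64_t)(uintptr_t)state,
    };
    return a;
}
```

    Such a descriptor would then be passed to the KVM_SET_DEVICE_ATTR or
    KVM_GET_DEVICE_ATTR ioctl on the device fd returned by KVM_CREATE_DEVICE.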

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

27 Apr, 2013

9 commits

  • The hassle of getting refcounting right was greater than the hassle
    of keeping a list of devices to destroy on VM exit.
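    The list-based approach can be sketched as follows. Each created device is pushed onto
    a per-VM list, and VM teardown walks that list once, with no reference counts. The
    types and functions are hypothetical; the counter only lets the example observe the
    destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device: linked into its VM's list at creation time. */
struct sketch_device {
    struct sketch_device *next;
    int *destroyed_counter;
};

struct sketch_vm {
    struct sketch_device *devices;
};

static struct sketch_device *sketch_create_device(struct sketch_vm *vm,
                                                  int *counter)
{
    struct sketch_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->destroyed_counter = counter;
    dev->next = vm->devices;
    vm->devices = dev;
    return dev;
}

/* On VM exit, destroy every device on the list exactly once. */
static void sketch_destroy_vm(struct sketch_vm *vm)
{
    struct sketch_device *dev = vm->devices;

    while (dev) {
        struct sketch_device *next = dev->next;
        (*dev->destroyed_counter)++;
        free(dev);
        dev = next;
    }
    vm->devices = NULL;
}
```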

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Hook the MPIC code up to the KVM interfaces, add locking, etc.

    Signed-off-by: Scott Wood
    [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently, devices that are emulated inside KVM are configured in a
    hardcoded manner based on an assumption that any given architecture
    only has one way to do it. If there's any need to access device state,
    it is done through inflexible one-purpose-only IOCTLs (e.g.
    KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
    cumbersome and uses up a limited ioctl number space.

    This API provides a mechanism to instantiate a device of a certain
    type, returning an ID that can be used to set/get attributes of the
    device. Attributes may include configuration parameters (e.g.
    register base address), device state, operational commands, etc. It
    is similar to the ONE_REG API, except that it acts on devices rather
    than vcpus.

    Both device types and individual attributes can be tested for support
    without creating the device or getting/setting the attribute, so there is
    no need to manage separately enumerated capabilities.
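    The shape of the API can be modeled in-process. A device type can be probed without
    creating anything, creation returns an id, and (group, attr) pairs can be probed, set
    and read. Every name below is hypothetical. In the real interface these roles are
    played by ioctls such as KVM_CREATE_DEVICE (with its KVM_CREATE_DEVICE_TEST flag),
    KVM_HAS_DEVICE_ATTR, KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model: one known device type with one attribute. */
enum { SKETCH_DEV_TYPE_A = 1 };
enum { SKETCH_GRP_CONFIG = 0, SKETCH_ATTR_BASE_ADDR = 0 };
#define SKETCH_MAX_DEVICES 4

struct sketch_dev { int type; uint64_t base_addr; };
static struct sketch_dev sketch_devs[SKETCH_MAX_DEVICES];
static int sketch_ndevs;

/* With test_only set, report support for the type without creating it. */
static int sketch_dev_create(int type, int test_only)
{
    if (type != SKETCH_DEV_TYPE_A)
        return -ENODEV;
    if (test_only)
        return 0;
    if (sketch_ndevs == SKETCH_MAX_DEVICES)
        return -ENOSPC;
    sketch_devs[sketch_ndevs].type = type;
    return sketch_ndevs++; /* id used by later attribute calls */
}

/* Probe an attribute without reading or writing it. */
static int sketch_has_attr(int id, uint32_t group, uint64_t attr)
{
    (void)id;
    return (group == SKETCH_GRP_CONFIG && attr == SKETCH_ATTR_BASE_ADDR)
               ? 0 : -ENXIO;
}

static int sketch_set_attr(int id, uint32_t group, uint64_t attr, uint64_t val)
{
    if (id < 0 || id >= sketch_ndevs || sketch_has_attr(id, group, attr))
        return -ENXIO;
    sketch_devs[id].base_addr = val;
    return 0;
}

static int sketch_get_attr(int id, uint32_t group, uint64_t attr, uint64_t *val)
{
    if (id < 0 || id >= sketch_ndevs || sketch_has_attr(id, group, attr))
        return -ENXIO;
    *val = sketch_devs[id].base_addr;
    return 0;
}
```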

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Now that we have most irqfd code completely platform agnostic, let's move
    irqfd's resample capability return to generic code as well.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
    Setting up IRQ routes is not IOAPIC-specific. Extract everything
    that really is generic code into irqchip.c and leave only the
    IOAPIC-specific bits in irq_comm.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The current irq_comm.c file contains pieces of code that are generic
    across different irqchip implementations, as well as code that is
    fully IOAPIC specific.

    Split the generic bits out into irqchip.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The IRQ routing set ioctl lives in the hacky device assignment code inside
    of KVM today. This is definitely the wrong place for it. Move it to the much
    more natural kvm_main.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
    Quite a bit of code in KVM has been conditionalized on availability of
    IOAPIC emulation. However, most of it is generically applicable to
    platforms that don't have an IOAPIC, but a different type of irq chip.

    Make code that relies only on IRQ routing, not on an APIC itself, depend on
    CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
    The concept of routing interrupt lines to an irqchip is not
    IOAPIC-specific. Every irqchip has a maximum number of pins
    that can be linked to irq lines.

    So let's add a new define that allows us to reuse generic code for
    non-IOAPIC platforms.
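    With the pin count expressed as a generic per-irqchip define, the routing setup itself
    needs no IOAPIC knowledge. In this sketch the limits (24 pins, 64 GSIs), the route
    table and the setter are hypothetical; only the pin bound would vary by irqchip.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-irqchip pin limit and routing table size. */
#define SKETCH_IRQCHIP_NUM_PINS 24
#define SKETCH_MAX_GSI 64

struct sketch_route {
    int valid;
    unsigned int irqchip;
    unsigned int pin;
};

static struct sketch_route sketch_routes[SKETCH_MAX_GSI];

/* Generic route setup: validate against the generic pin limit only. */
static int sketch_set_route(unsigned int gsi, unsigned int irqchip,
                            unsigned int pin)
{
    if (gsi >= SKETCH_MAX_GSI || pin >= SKETCH_IRQCHIP_NUM_PINS)
        return -EINVAL;
    sketch_routes[gsi] = (struct sketch_route){ 1, irqchip, pin };
    return 0;
}
```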

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     

17 Apr, 2013

3 commits


16 Apr, 2013

7 commits

    The current interrupt coalescing logic, which is only used by the RTC,
    conflicts with Posted Interrupt.
    This patch introduces a new mechanism that uses EOI to track interrupts:
    when an interrupt is delivered, pending_eoi is set to the number of
    vcpus that received it, and it is decremented as each vcpu writes
    EOI. No subsequent RTC interrupt can be delivered to a vcpu until all
    vcpus have written EOI.
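    The gating rule can be sketched as a counter. A delivery sets it to the number of
    receiving vcpus, each EOI decrements it, and a new RTC interrupt is coalesced while it
    is non-zero. The struct and functions are hypothetical and only model the counting
    described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracker: vcpus that received the last RTC interrupt and
 * have not yet written EOI. */
struct sketch_rtc {
    int pending_eoi;
};

/* Deliver to nvcpus vcpus; refuse (coalesce) while EOIs are outstanding. */
static bool sketch_rtc_deliver(struct sketch_rtc *rtc, int nvcpus)
{
    if (rtc->pending_eoi > 0)
        return false;
    rtc->pending_eoi = nvcpus;
    return true;
}

/* Each vcpu's EOI write drops the count by one. */
static void sketch_rtc_eoi(struct sketch_rtc *rtc)
{
    if (rtc->pending_eoi > 0)
        rtc->pending_eoi--;
}
```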

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
    Userspace may deliver an RTC interrupt without querying the status, so we
    want to track RTC EOI for this case.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
    We need the EOI to track interrupt delivery status, so force a VM exit
    on EOI for the RTC interrupt when virtual interrupt delivery is enabled.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
    Restore rtc_status after migration or save/restore.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
    Add a new parameter to record which vcpus received the interrupt.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
    rtc_status is used to track RTC interrupt delivery status. pending_eoi
    is increased for each vcpu that receives the RTC interrupt and decreased
    when that vcpu issues EOI for it.
    We also use dest_map to record the destination vcpus, to avoid the case
    where a vcpu that did not get the RTC interrupt issues an EOI with the
    same vector as the RTC and decreases pending_eoi by mistake.
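    The dest_map guard can be sketched as follows. An EOI carrying the RTC vector only
    decrements pending_eoi if the issuing vcpu is recorded as a destination, and the
    entry is cleared so one vcpu cannot decrement twice. The struct, the 64-vcpu bound and
    the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_VCPUS 64 /* hypothetical bound */

/* Hypothetical rtc_status: outstanding EOIs plus who was targeted. */
struct sketch_rtc_status {
    int pending_eoi;
    bool dest_map[SKETCH_MAX_VCPUS];
};

static void sketch_rtc_delivered(struct sketch_rtc_status *s, int vcpu)
{
    s->dest_map[vcpu] = true;
    s->pending_eoi++;
}

/* Only a vcpu in dest_map may decrement; a vcpu that merely reuses the
 * RTC vector for its own EOI is ignored. */
static void sketch_rtc_eoi(struct sketch_rtc_status *s, int vcpu)
{
    if (s->dest_map[vcpu]) {
        s->dest_map[vcpu] = false;
        s->pending_eoi--;
    }
}
```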

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • Add vcpu info to ioapic_update_eoi, so we can know which vcpu
    issued this EOI.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     

08 Apr, 2013

2 commits

  • The routine kvm_spurious_fault() is an x86 specific routine, so
    move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c.

    Fixes this sparse warning when building on arm64:

    virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     
  • The routines get_user_page_nowait(), kvm_io_bus_sort_cmp(), kvm_io_bus_insert_dev()
    and kvm_io_bus_get_first_dev() are only referenced within kvm_main.c, so give them
    static linkage.

    Fixes sparse warnings like these:

    virt/kvm/kvm_main.c: warning: symbol 'get_user_page_nowait' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     

07 Apr, 2013

3 commits


21 Mar, 2013

1 commit

  • Merge reason:

    From: Alexander Graf

    "Just recently this really important patch got pulled into Linus' tree for 3.9:

    commit 1674400aaee5b466c595a8fc310488263ce888c7
    Author: Anton Blanchard samba.org>
    Date: Tue Mar 12 01:51:51 2013 +0000

    Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

    Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

    * upstream/master: (653 commits)
    PCI: Use ROM images from firmware only if no other ROM source available
    sparc: remove unused "config BITS"
    sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
    KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
    KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
    KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
    arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
    arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
    inet: limit length of fragment queue hash table bucket lists
    qeth: Fix scatter-gather regression
    qeth: Fix invalid router settings handling
    qeth: delay feature trace
    sgy-cts1000: Remove __dev* attributes
    KVM: x86: fix deadlock in clock-in-progress request handling
    KVM: allow host header to be included even for !CONFIG_KVM
    hwmon: (lm75) Fix tcn75 prefix
    hwmon: (lm75.h) Update header inclusion
    MAINTAINERS: Remove Mark M. Hoffman
    xfs: ensure we capture IO errors correctly
    xfs: fix xfs_iomap_eof_prealloc_initial_size type
    ...

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     

20 Mar, 2013

1 commit

    If the guest specifies an IOAPIC_REG_SELECT with an invalid value and follows
    that with a read of the IOAPIC_REG_WINDOW, KVM does not properly validate
    that request. ioapic_read_indirect contains an
    ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
    non-debug builds. In recent kernels this allows a guest to cause a kernel
    oops by reading invalid memory. In older kernels (pre-3.3) this allows a
    guest to read from large ranges of host memory.
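    The point is that the bound must be enforced by a real branch, not by an ASSERT that
    compiles away. This sketch models the indirect window with a hypothetical 24-entry
    redirection table. Entries start at register 0x10, each is split into two 32-bit
    halves, and an out-of-range index reads as all-ones instead of touching memory. The
    all-ones value and all names are choices of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IOAPIC_NUM_PINS 24

/* Hypothetical redirection table: one 64-bit entry per pin. */
static uint64_t sketch_redirtbl[SKETCH_IOAPIC_NUM_PINS];

static uint32_t sketch_read_indirect(uint32_t ioregsel)
{
    uint32_t redir_index;
    uint64_t entry;

    if (ioregsel < 0x10)
        return 0; /* ID/version/arbitration registers elided in this sketch */
    redir_index = (ioregsel - 0x10) >> 1;
    if (redir_index >= SKETCH_IOAPIC_NUM_PINS)
        return ~0u; /* enforced in every build, unlike a debug-only ASSERT */
    entry = sketch_redirtbl[redir_index];
    /* Odd register selects the high half of the 64-bit entry. */
    return (ioregsel & 1) ? (uint32_t)(entry >> 32) : (uint32_t)entry;
}
```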

    Tested: tested against apic unit tests.

    Signed-off-by: Andrew Honig
    Signed-off-by: Marcelo Tosatti

    Andy Honig
     

11 Mar, 2013

1 commit

    This further filters the eligible candidates and thus potentially
    lets preempted lock holders run sooner.
    Note that if a vcpu was spinning when it was preempted, we filter it out
    by checking whether it was preempted due to a pause loop exit.
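    The filtering can be sketched as a scan that skips the spinning vcpu itself, vcpus
    that are currently running (not preempted), and vcpus that were preempted after a
    pause loop exit, since they were spinning rather than holding a lock. The struct, its
    two flags and the simple first-match scan are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vcpu state for the yield-target scan. */
struct sketch_vcpu {
    bool preempted;     /* scheduled out while still runnable */
    bool ple_preempted; /* scheduled out after a pause-loop exit */
};

/* Return the index of the first eligible vcpu, or -1 if there is none. */
static int sketch_pick_yield_target(const struct sketch_vcpu *v, size_t n,
                                    size_t self)
{
    for (size_t i = 0; i < n; i++) {
        if (i == self || !v[i].preempted || v[i].ple_preempted)
            continue;
        return (int)i;
    }
    return -1;
}
```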

    Reviewed-by: Chegu Vinod
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Raghavendra K T
    Signed-off-by: Gleb Natapov

    Raghavendra K T