11 May, 2013

1 commit

  • Pull kvm fixes from Gleb Natapov:
    "Most of the fixes are in the emulator: since we now emulate more than
    we did before for correctness' sake, we see more bugs there. There
    is also an OOPS fix and a fix for corruption of the xcr0 register."

    * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: emulator: emulate SALC
    KVM: emulator: emulate XLAT
    KVM: emulator: emulate AAM
    KVM: VMX: fix halt emulation while emulating invalid guest sate
    KVM: Fix kvm_irqfd_init initialization
    KVM: x86: fix maintenance of guest/host xcr0 state

    Linus Torvalds
     

10 May, 2013

1 commit

  • Pull MIPS updates from Ralf Baechle:

    - More work on DT support for various platforms

    - Various fixes that were too late to make it straight into 3.9

    - Improved platform support, in particular the Netlogic XLR and
    BCM63xx, and the SEAD3 and Malta eval boards.

    - Support for several Ralink SOC families.

    - Complete support for the microMIPS ASE which basically reencodes the
    existing MIPS32/MIPS64 ISA to use non-constant size instructions.

    - Some fallout from LTO work which removes old cruft and will generally
    make the MIPS kernel easier to maintain and more resistant to compiler
    optimization, even in the absence of LTO.

    - KVM support. While MIPS has announced hardware virtualization
    extensions, this KVM extension uses trap and emulate mode for
    virtualization of MIPS32. More KVM work to add support for VZ
    hardware virtualization extensions and MIPS64 will probably already
    be merged for 3.11.

    Most of this has been sitting in -next for a long time. All defconfigs
    have been build-tested or run-time tested except three, for which fixes
    are being sent by other maintainers.

    Semantic conflict with kvm updates done as per Ralf

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits)
    MIPS: Add new GIC clockevent driver.
    MIPS: Formatting clean-ups for clocksources.
    MIPS: Refactor GIC clocksource code.
    MIPS: Move 'gic_frequency' to common location.
    MIPS: Move 'gic_present' to common location.
    MIPS: MIPS16e: Add unaligned access support.
    MIPS: MIPS16e: Support handling of delay slots.
    MIPS: MIPS16e: Add instruction formats.
    MIPS: microMIPS: Optimise 'strnlen' core library function.
    MIPS: microMIPS: Optimise 'strlen' core library function.
    MIPS: microMIPS: Optimise 'strncpy' core library function.
    MIPS: microMIPS: Optimise 'memset' core library function.
    MIPS: microMIPS: Add configuration option for microMIPS kernel.
    MIPS: microMIPS: Disable LL/SC and fix linker bug.
    MIPS: microMIPS: Add vdso support.
    MIPS: microMIPS: Add unaligned access support.
    MIPS: microMIPS: Support handling of delay slots.
    MIPS: microMIPS: Add support for exception handling.
    MIPS: microMIPS: Floating point support.
    MIPS: microMIPS: Fix macro naming in micro-assembler.
    ...

    Linus Torvalds
     

09 May, 2013

2 commits


08 May, 2013

1 commit

  • In commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when
    kvm_init() is called the second time (e.g. kvm-amd.ko and kvm-intel.ko),
    kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be
    called on the error handling path. This way, the kvm_irqfd system will
    not be ready.

    This patch fixes the following:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] _raw_spin_lock+0xe/0x30
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: vhost_net
    CPU 6
    Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
    RIP: 0010:[] [] _raw_spin_lock+0xe/0x30
    RSP: 0018:ffff880221721cc8 EFLAGS: 00010046
    RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
    RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
    RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
    R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
    R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
    FS: 00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
    Stack:
    ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
    0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
    ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
    Call Trace:
    [] __queue_work+0x45/0x2d0
    [] queue_work_on+0x8e/0xa0
    [] queue_work+0x19/0x20
    [] irqfd_deactivate+0x4b/0x60
    [] kvm_irqfd+0x39d/0x580
    [] kvm_vm_ioctl+0x207/0x5b0
    [] ? update_curr+0xf5/0x180
    [] do_vfs_ioctl+0x98/0x550
    [] ? finish_task_switch+0x4e/0xe0
    [] ? __schedule+0x2ea/0x710
    [] sys_ioctl+0x57/0x90
    [] ? trace_hardirqs_on_thunk+0x3a/0x3c
    [] system_call_fastpath+0x16/0x1b
    Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
    RIP [] _raw_spin_lock+0xe/0x30
    RSP
    CR2: 0000000000000000
    ---[ end trace 13fb1e4b6e5ab21f ]---
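
    The failure mode described above can be sketched as a small user-space
    model. This is only an illustration under stated assumptions: every
    model_* name is hypothetical, the booleans stand in for the real irqfd
    workqueue and the arch registration, and the "fixed" ordering is one way
    to avoid the problem, not necessarily what this patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
bool irqfd_ready;      /* models the shared irqfd cleanup workqueue */
bool arch_registered;  /* models "a vendor module is already loaded" */

int model_irqfd_init(void) { irqfd_ready = true; return 0; }
void model_irqfd_exit(void) { irqfd_ready = false; }

/* Only one caller may register; a second one gets -EEXIST. */
int model_arch_init(void)
{
    if (arch_registered)
        return -EEXIST;
    arch_registered = true;
    return 0;
}

/*
 * Problematic ordering: shared irqfd state is set up before the step that
 * can fail with -EEXIST, and the error path tears it down again.
 */
int model_kvm_init_buggy(void)
{
    int r = model_irqfd_init();

    if (r)
        return r;
    r = model_arch_init();
    if (r) {
        model_irqfd_exit();  /* destroys state the first caller still uses */
        return r;
    }
    return 0;
}

/*
 * Safer ordering: run the step that rejects a second caller first, and
 * only touch the shared irqfd state once it has succeeded.
 */
int model_kvm_init_fixed(void)
{
    int r = model_arch_init();

    if (r)
        return r;
    return model_irqfd_init();
}

/* Return the model to its initial, nothing-loaded state. */
void model_reset(void)
{
    irqfd_ready = false;
    arch_registered = false;
}
```

    With the buggy ordering, a second init call returns -EEXIST and leaves
    irqfd_ready false even though the first caller is still active. With the
    reordered variant, the second call fails without touching irqfd state.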

    Signed-off-by: Asias He
    Acked-by: Cornelia Huck
    Signed-off-by: Gleb Natapov

    Asias He
     

06 May, 2013

1 commit

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     

05 May, 2013

1 commit


02 May, 2013

1 commit

  • This adds the API for userspace to instantiate an XICS device in a VM
    and connect VCPUs to it. The API consists of a new device type for
    the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
    functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
    which is used to assert and deassert interrupt inputs of the XICS.

    The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
    Each attribute within this group corresponds to the state of one
    interrupt source. The attribute number is the same as the interrupt
    source number.

    This does not support irq routing or irqfd yet.

    Signed-off-by: Paul Mackerras
    Acked-by: David Gibson
    Signed-off-by: Alexander Graf

    Paul Mackerras
     

27 Apr, 2013

9 commits

  • The hassle of getting refcounting right was greater than the hassle
    of keeping a list of devices to destroy on VM exit.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Hook the MPIC code up to the KVM interfaces, add locking, etc.

    Signed-off-by: Scott Wood
    [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently, devices that are emulated inside KVM are configured in a
    hardcoded manner based on an assumption that any given architecture
    only has one way to do it. If there's any need to access device state,
    it is done through inflexible one-purpose-only IOCTLs (e.g.
    KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
    cumbersome and depletes a limited numberspace.

    This API provides a mechanism to instantiate a device of a certain
    type, returning an ID that can be used to set/get attributes of the
    device. Attributes may include configuration parameters (e.g.
    register base address), device state, operational commands, etc. It
    is similar to the ONE_REG API, except that it acts on devices rather
    than vcpus.

    Both device types and individual attributes can be tested without having
    to create the device or get/set the attribute, without the need for
    separately managing enumerated capabilities.
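
    The shape of such an API can be sketched as an in-process model. This is
    not the real ioctl interface: all model_* and MODEL_* names, the single
    device type, and the single attribute are hypothetical and exist only to
    illustrate create, probe, and get/set on a device ID.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_DEVICES 4
#define MODEL_CREATE_TEST 1u     /* hypothetical "probe only" flag */
#define MODEL_DEV_IRQCHIP 1u     /* hypothetical device type */
#define MODEL_ATTR_BASE_ADDR 0u  /* hypothetical attribute number */

struct model_device {
    int in_use;
    uint32_t type;
    uint64_t base_addr;
};

struct model_vm {
    struct model_device devs[MODEL_MAX_DEVICES];
};

/*
 * Returns a device ID (>= 0) or a negative errno. With MODEL_CREATE_TEST
 * set, only reports whether the type is supported and creates nothing.
 */
int model_create_device(struct model_vm *vm, uint32_t type, uint32_t flags)
{
    if (type != MODEL_DEV_IRQCHIP)
        return -ENODEV;
    if (flags & MODEL_CREATE_TEST)
        return 0;
    for (int i = 0; i < MODEL_MAX_DEVICES; i++) {
        if (!vm->devs[i].in_use) {
            vm->devs[i].in_use = 1;
            vm->devs[i].type = type;
            vm->devs[i].base_addr = 0;
            return i;
        }
    }
    return -ENOSPC;
}

/* Map a device ID back to its slot, or NULL if the ID is not valid. */
struct model_device *model_lookup(struct model_vm *vm, int id)
{
    if (id < 0 || id >= MODEL_MAX_DEVICES || !vm->devs[id].in_use)
        return NULL;
    return &vm->devs[id];
}

/* Probe an attribute without reading or writing it. */
int model_has_attr(struct model_vm *vm, int id, uint64_t attr)
{
    if (!model_lookup(vm, id))
        return -ENODEV;
    return attr == MODEL_ATTR_BASE_ADDR ? 0 : -ENXIO;
}

int model_set_attr(struct model_vm *vm, int id, uint64_t attr, uint64_t val)
{
    struct model_device *dev = model_lookup(vm, id);

    if (!dev)
        return -ENODEV;
    if (attr != MODEL_ATTR_BASE_ADDR)
        return -ENXIO;
    dev->base_addr = val;
    return 0;
}

int model_get_attr(struct model_vm *vm, int id, uint64_t attr, uint64_t *val)
{
    struct model_device *dev = model_lookup(vm, id);

    if (!dev)
        return -ENODEV;
    if (attr != MODEL_ATTR_BASE_ADDR)
        return -ENXIO;
    *val = dev->base_addr;
    return 0;
}
```

    The point of the sketch is that one generic entry point per operation
    replaces a dedicated call per device and per piece of state, and that
    support can be probed through the same entry points.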

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Now that we have most irqfd code completely platform agnostic, let's move
    irqfd's resample capability return to generic code as well.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • Setting up IRQ routes is nothing IOAPIC specific. Extract everything
    that really is generic code into irqchip.c and only leave the ioapic
    specific bits to irq_comm.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The current irq_comm.c file contains pieces of code that are generic
    across different irqchip implementations, as well as code that is
    fully IOAPIC specific.

    Split the generic bits out into irqchip.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The IRQ routing set ioctl lives in the hacky device assignment code inside
    of KVM today. This is definitely the wrong place for it. Move it to the much
    more natural kvm_main.c.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • Quite a bit of code in KVM has been conditionalized on availability of
    IOAPIC emulation. However, most of it is generically applicable to
    platforms that don't have an IOAPIC, but a different type of irq chip.

    Make code that only relies on IRQ routing, not an APIC itself, depend on
    CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     
  • The concept of routing interrupt lines to an irqchip is nothing
    that is IOAPIC specific. Every irqchip has a maximum number of pins
    that can be linked to irq lines.

    So let's add a new define that allows us to reuse generic code for
    non-IOAPIC platforms.

    Signed-off-by: Alexander Graf
    Acked-by: Michael S. Tsirkin

    Alexander Graf
     

17 Apr, 2013

3 commits


16 Apr, 2013

7 commits

  • The current interrupt coalescing logic, which is only used by the RTC,
    conflicts with Posted Interrupts.
    This patch introduces a new mechanism that uses EOI to track interrupts:
    when an interrupt is delivered to vcpus, pending_eoi is set to the number
    of vcpus that received it, and it is decremented each time one of those
    vcpus writes EOI. No subsequent RTC interrupt can be delivered to a vcpu
    until all of them have written EOI.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • Userspace may deliver an RTC interrupt without querying the status, so
    we want to track the RTC EOI for this case.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • We need the EOI to track interrupt delivery status, so force a vmexit
    on EOI for the RTC interrupt when virtual interrupt delivery is enabled.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • Restore rtc_status after migration or save/restore.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • Add a new parameter to record which vcpus received the interrupt.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • rtc_status is used to track RTC interrupt delivery status. pending_eoi
    is incremented for each vcpu that receives the RTC interrupt and
    decremented when that vcpu issues an EOI for it.
    We also use dest_map to record the destination vcpus, to avoid the case
    where a vcpu that did not get the RTC interrupt issues an EOI with the
    same vector as the RTC and decrements pending_eoi by mistake.
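
    The bookkeeping described here can be sketched as follows. This is a
    simplified model, not the kernel code: the struct layout, the 64-vcpu
    bitmap, and the model_* function names are assumptions made for the
    example. Only pending_eoi and dest_map are taken from the description.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_VCPUS 64  /* assumed limit so the map fits in one word */

struct model_rtc_status {
    int pending_eoi;    /* number of vcpus that still owe an EOI */
    uint64_t dest_map;  /* bit n set: vcpu n received the RTC interrupt */
};

/*
 * Try to deliver an RTC interrupt to the vcpus in dest_mask.
 * Returns false (interrupt coalesced) while EOIs are still outstanding.
 */
bool model_rtc_deliver(struct model_rtc_status *s, uint64_t dest_mask)
{
    if (s->pending_eoi > 0)
        return false;

    s->dest_map = dest_mask;
    s->pending_eoi = 0;
    for (int i = 0; i < MODEL_MAX_VCPUS; i++) {
        if (dest_mask & (UINT64_C(1) << i))
            s->pending_eoi++;
    }
    return true;
}

/* Record an EOI from vcpu_id; ignore vcpus that never got the interrupt. */
void model_rtc_eoi(struct model_rtc_status *s, int vcpu_id)
{
    uint64_t bit;

    if (vcpu_id < 0 || vcpu_id >= MODEL_MAX_VCPUS)
        return;
    bit = UINT64_C(1) << vcpu_id;
    if (!(s->dest_map & bit))
        return;  /* same vector, but not a destination: do not count it */

    s->dest_map &= ~bit;
    s->pending_eoi--;
}
```

    Clearing the vcpu's bit on EOI also means a repeated EOI from the same
    vcpu cannot decrement the counter twice in this model.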

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     
  • Add vcpu info to ioapic_update_eoi, so we can know which vcpu
    issued this EOI.

    Signed-off-by: Yang Zhang
    Reviewed-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Yang Zhang
     

08 Apr, 2013

2 commits

  • The routine kvm_spurious_fault() is an x86 specific routine, so
    move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c.

    Fixes this sparse warning when building on arm64:

    virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     
  • The routines get_user_page_nowait(), kvm_io_bus_sort_cmp(), kvm_io_bus_insert_dev()
    and kvm_io_bus_get_first_dev() are only referenced within kvm_main.c, so give them
    static linkage.

    Fixes sparse warnings like these:

    virt/kvm/kvm_main.c: warning: symbol 'get_user_page_nowait' was not declared. Should it be static?

    Signed-off-by: Geoff Levand
    Signed-off-by: Gleb Natapov

    Geoff Levand
     

07 Apr, 2013

3 commits


21 Mar, 2013

1 commit

  • Merge reason:

    From: Alexander Graf

    "Just recently this really important patch got pulled into Linus' tree for 3.9:

    commit 1674400aaee5b466c595a8fc310488263ce888c7
    Author: Anton Blanchard
    Date: Tue Mar 12 01:51:51 2013 +0000

    Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

    Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

    * upstream/master: (653 commits)
    PCI: Use ROM images from firmware only if no other ROM source available
    sparc: remove unused "config BITS"
    sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
    KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
    KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
    KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
    arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
    arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
    inet: limit length of fragment queue hash table bucket lists
    qeth: Fix scatter-gather regression
    qeth: Fix invalid router settings handling
    qeth: delay feature trace
    sgy-cts1000: Remove __dev* attributes
    KVM: x86: fix deadlock in clock-in-progress request handling
    KVM: allow host header to be included even for !CONFIG_KVM
    hwmon: (lm75) Fix tcn75 prefix
    hwmon: (lm75.h) Update header inclusion
    MAINTAINERS: Remove Mark M. Hoffman
    xfs: ensure we capture IO errors correctly
    xfs: fix xfs_iomap_eof_prealloc_initial_size type
    ...

    Signed-off-by: Marcelo Tosatti

    Marcelo Tosatti
     

20 Mar, 2013

1 commit

  • If the guest specifies an IOAPIC_REG_SELECT with an invalid value and
    follows that with a read of the IOAPIC_REG_WINDOW, KVM does not properly
    validate that request. ioapic_read_indirect contains an
    ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
    non-debug builds. In recent kernels this allows a guest to cause a kernel
    oops by reading invalid memory. In older kernels (pre-3.3) this allows a
    guest to read from large ranges of host memory.
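
    The difference between a compiled-out assertion and a real bounds check
    can be sketched like this. It is a stand-alone model rather than the
    kernel function: the MODEL_* constants, the struct, and the choice to
    return all-ones for an out-of-range index are assumptions made for the
    example.

```c
#include <stdint.h>

#define MODEL_NUM_PINS 24      /* assumed redirection table size */
#define MODEL_REDIR_BASE 0x10  /* assumed first redirection register */

/* Models a non-debug build: the assertion expands to nothing. */
#define MODEL_ASSERT(cond) do { } while (0)

struct model_ioapic {
    uint32_t ioregsel;                  /* guest-controlled selector */
    uint64_t redirtbl[MODEL_NUM_PINS];  /* 64-bit redirection entries */
};

/* Read the 32-bit half of a redirection entry chosen by ioregsel. */
uint32_t model_read_indirect(const struct model_ioapic *io)
{
    uint32_t redir_index;
    uint64_t content;

    if (io->ioregsel < MODEL_REDIR_BASE)
        return 0;  /* ID/version registers are not modeled here */

    redir_index = (io->ioregsel - MODEL_REDIR_BASE) >> 1;

    /* Documents the expectation, but checks nothing in this build. */
    MODEL_ASSERT(redir_index < MODEL_NUM_PINS);

    /* The explicit check is what actually keeps the access in range. */
    if (redir_index < MODEL_NUM_PINS)
        content = io->redirtbl[redir_index];
    else
        content = ~UINT64_C(0);

    /* Odd selectors address the high half, even ones the low half. */
    return (io->ioregsel & 1) ? (uint32_t)(content >> 32)
                              : (uint32_t)content;
}
```

    Without the explicit check, the guest-chosen selector would index past
    the end of the table whenever the assertion is compiled out.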

    Tested: tested against apic unit tests.

    Signed-off-by: Andrew Honig
    Signed-off-by: Marcelo Tosatti

    Andy Honig
     

11 Mar, 2013

2 commits

  • This helps in filtering out the eligible candidates further and
    thus potentially helps in quickly allowing preempted lockholders to run.
    Note that vcpus that were spinning when preempted are filtered out
    by checking whether they were preempted due to a pause loop exit.
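
    The filtering idea can be sketched as below, assuming a per-vcpu
    "preempted" flag that is set at sched-out only when the vcpu's task was
    still runnable. The struct and the model_* functions are hypothetical,
    and the existing pause-loop-exit eligibility check is left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_vcpu {
    bool preempted;  /* set at sched-out if the task was still runnable */
};

/* Called when the vcpu thread is scheduled out. */
void model_sched_out(struct model_vcpu *v, bool task_was_running)
{
    /* A vcpu that blocked voluntarily is not a preempted lock holder. */
    v->preempted = task_was_running;
}

/* Called when the vcpu thread is scheduled back in. */
void model_sched_in(struct model_vcpu *v)
{
    v->preempted = false;
}

/*
 * Pick a yield target for the spinning vcpu at index me: the first other
 * vcpu that is marked preempted. Returns its index, or -1 if none is.
 */
int model_pick_yield_target(const struct model_vcpu *vcpus, size_t n,
                            size_t me)
{
    for (size_t i = 0; i < n; i++) {
        if (i == me)
            continue;
        if (!vcpus[i].preempted)
            continue;  /* not preempted: yielding to it would not help */
        return (int)i;
    }
    return -1;
}
```

    Skipping vcpus that are not marked preempted narrows the candidates to
    those that could actually make progress if given the CPU.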

    Reviewed-by: Chegu Vinod
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Raghavendra K T
    Signed-off-by: Gleb Natapov

    Raghavendra K T
     
  • Note that we mark as preempted only when vcpu's task state was
    Running during preemption.

    Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
    for their precious suggestions. Thanks Srikar for an idea on avoiding
    rcu lock while checking task state that improved overcommit numbers.

    Reviewed-by: Chegu Vinod
    Reviewed-by: Marcelo Tosatti
    Signed-off-by: Raghavendra K T
    Signed-off-by: Gleb Natapov

    Raghavendra K T
     

06 Mar, 2013

2 commits

  • Enhance KVM_IOEVENTFD with a new flag that allows attaching to virtio-ccw
    devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     
  • Currently, eventfd introduces module_init/module_exit functions
    to initialize/cleanup the irqfd workqueue. This only works, however,
    if no other module_init/module_exit functions are built into the
    same module.

    Let's just move the initialization and cleanup to kvm_init and kvm_exit.
    This way, it is also clearer where kvm startup may fail.

    Signed-off-by: Cornelia Huck
    Signed-off-by: Marcelo Tosatti

    Cornelia Huck
     

05 Mar, 2013

2 commits