11 Mar, 2018

1 commit

  • commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream.

    Reported by syzkaller:

    pte_list_remove: ffff9714eb1f8078 0->BUG
    ------------[ cut here ]------------
    kernel BUG at arch/x86/kvm/mmu.c:1157!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
    Call Trace:
    drop_spte+0x83/0xb0 [kvm]
    mmu_page_zap_pte+0xcc/0xe0 [kvm]
    kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
    kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
    kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
    kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
    ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
    __mmu_notifier_release+0x79/0x110
    ? __mmu_notifier_release+0x5/0x110
    exit_mmap+0x15a/0x170
    ? do_exit+0x281/0xcb0
    mmput+0x66/0x160
    do_exit+0x2c9/0xcb0
    ? __context_tracking_exit.part.5+0x4a/0x150
    do_group_exit+0x50/0xd0
    SyS_exit_group+0x14/0x20
    do_syscall_64+0x73/0x1f0
    entry_SYSCALL64_slow_path+0x25/0x25

    The reason is that when a new memslot is created, there is no guarantee
    that it does not overlap with the private memslots. This can be triggered
    by the following program:

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    long r[16];

    int main()
    {
        void *p = valloc(0x4000);

        r[2] = open("/dev/kvm", 0);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);

        uint64_t addr = 0xf000;
        ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
        r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
        ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
        ioctl(r[6], KVM_RUN, 0);
        ioctl(r[6], KVM_RUN, 0);

        struct kvm_userspace_memory_region mr = {
            .slot = 0,
            .flags = KVM_MEM_LOG_DIRTY_PAGES,
            .guest_phys_addr = 0xf000,
            .memory_size = 0x4000,
            .userspace_addr = (uintptr_t) p
        };
        ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
        return 0;
    }

    This patch fixes the bug by refusing to add a new memslot if it
    overlaps with any existing memslot, including the private memslots.

    Reported-by: Dmitry Vyukov
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Dmitry Vyukov
    Cc: Eric Biggers
    Cc: stable@vger.kernel.org
    Signed-off-by: Wanpeng Li

    Wanpeng Li
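
The overlap check described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct slot_range`, `ranges_overlap()` and `new_slot_overlaps()` are hypothetical names, and the only slot that is skipped is the one being replaced, so private slots take part in the check like any other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memslot: a range of guest pages. */
struct slot_range {
    uint64_t base_gfn;
    uint64_t npages;   /* 0 means the slot is unused */
};

/* Half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(uint64_t a, uint64_t an, uint64_t b, uint64_t bn)
{
    return a < b + bn && b < a + an;
}

/*
 * Return true if [base_gfn, base_gfn + npages) collides with any slot
 * other than the one being replaced (skip_idx). No slot is exempt just
 * because it is private.
 */
static bool new_slot_overlaps(const struct slot_range *slots, size_t nslots,
                              size_t skip_idx, uint64_t base_gfn,
                              uint64_t npages)
{
    for (size_t i = 0; i < nslots; i++) {
        if (i == skip_idx || slots[i].npages == 0)
            continue;
        if (ranges_overlap(base_gfn, npages,
                           slots[i].base_gfn, slots[i].npages))
            return true;
    }
    return false;
}
```

With a one-page slot at gfn 0xf (modelling the identity map page from the reproducer) and a new four-page region starting at gfn 0xf, the check reports an overlap and the request would be rejected.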
     

25 Dec, 2017

2 commits

  • [ Upstream commit 46bea48ac241fe0b413805952dda74dd0c09ba8b ]

    The kvm slabs can consume a significant amount of system memory,
    and indeed in our production environment we have observed that
    a lot of machines are spending a significant amount of memory that
    cannot be left as system memory overhead. Also, the allocations
    from these slabs can be triggered directly by user space applications
    that have access to kvm, and thus a buggy application can leak
    such memory. So, these caches should be accounted to kmemcg.

    Signed-off-by: Shakeel Butt
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Shakeel Butt
     
  • [ Upstream commit 0292e169b2d9c8377a168778f0b16eadb1f578fd ]

    or VM memory is not put and is thus leaked in kvm_iommu_unmap_memslots()
    when the VM is destroyed.

    This is consistent with current vfio implementation.

    Signed-off-by: herongguang
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Herongguang (Stephen)
     

16 Dec, 2017

1 commit

  • commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream.

    The current pending table parsing code assumes that we keep the
    previous read of the pending bits, but keeps that variable in
    the current block, making sure it is discarded on each loop.

    We end up using whatever is on the stack. Who knows, it might
    just be the right thing...

    Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
    Cc: stable@vger.kernel.org # 4.8
    Reported-by: AKASHI Takahiro
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
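
The scoping problem described above can be illustrated with a small user-space sketch. It assumes a byte-array pending table with one bit per interrupt; `count_pending()` and its parameters are hypothetical names. The cached byte and its offset are declared outside the loop, so the previous read survives from one iteration to the next.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many of the given interrupt indices are pending. A table byte
 * is re-read only when the byte offset changes; *reads reports how many
 * byte loads were needed.
 */
static size_t count_pending(const uint8_t *table, const unsigned *intids,
                            size_t n, size_t *reads)
{
    /* Declared outside the loop so the cache survives each iteration. */
    size_t last_byte_offset = (size_t)-1;
    uint8_t pendmask = 0;
    size_t pending = 0;

    *reads = 0;
    for (size_t i = 0; i < n; i++) {
        size_t byte_offset = intids[i] / 8;
        unsigned bit_nr = intids[i] % 8;

        if (byte_offset != last_byte_offset) {
            pendmask = table[byte_offset];
            last_byte_offset = byte_offset;
            (*reads)++;
        }
        if (pendmask & (1u << bit_nr))
            pending++;
    }
    return pending;
}
```

Had `pendmask` been declared inside the loop body, a skipped read would leave it holding an indeterminate value, which is the failure mode the commit message describes.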
     

14 Dec, 2017

5 commits

  • [ Upstream commit a5e1e6ca94a8cec51571fd62e3eaec269717969c ]

    The ITS spec says that ITS commands are only processed when the ITS
    is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking
    this into account.
    Fix this by checking the enabled state before handling CWRITER writes.

    On the other hand that means that CWRITER could advance while the ITS
    is disabled, and enabling it would need those commands to be processed.
    Fix this case as well by refactoring actual command processing and
    calling this from both the GITS_CWRITER and GITS_CTLR handlers.

    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Andre Przywara
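
The refactoring described above, one command-processing routine called from both register handlers, might look like this in a simplified model. All names here (`struct its_model`, `process_cmds()`, `write_cwriter()`, `write_ctlr()`) are hypothetical, and the sketch ignores command queue wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ITS state. */
struct its_model {
    bool enabled;
    uint32_t creadr;     /* next command to process */
    uint32_t cwriter;    /* one past the last queued command */
    uint32_t processed;  /* commands handled so far */
};

/* Shared command processing: does nothing while the ITS is disabled. */
static void process_cmds(struct its_model *its)
{
    if (!its->enabled)
        return;
    while (its->creadr != its->cwriter) {
        its->processed++;   /* a real ITS would decode the command here */
        its->creadr++;
    }
}

/* Model of a CWRITER write: record the new write pointer, then try. */
static void write_cwriter(struct its_model *its, uint32_t val)
{
    its->cwriter = val;
    process_cmds(its);
}

/* Model of a CTLR write: enabling catches up on queued commands. */
static void write_ctlr(struct its_model *its, bool enable)
{
    its->enabled = enable;
    process_cmds(its);
}
```

Commands queued while the model is disabled stay pending and are processed as soon as it is enabled.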
     
  • commit 686f294f2f1ae40705283dd413ca1e4c14f20f93 upstream.

    We miss a test against NULL after allocation.

    Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation")
    Reported-by: AKASHI Takahiro
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 150009e2c70cc3c6e97f00e7595055765d32fb85 upstream.

    Using the size of the structure we're allocating is a good idea
    and avoids any surprise... In this case, we're happily confusing
    kvm_kernel_irq_routing_entry and kvm_irq_routing_entry...

    Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing")
    Reported-by: AKASHI Takahiro
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
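
A common way to avoid this class of mistake is to size the allocation from the pointer being assigned rather than from a type name. The sketch below uses hypothetical stand-in structs (`user_entry`, `kernel_entry`) and plain `calloc()`; it is an illustration of the idiom, not the kernel fix itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins: similarly named structs of different sizes. */
struct user_entry {
    uint32_t gsi;
    uint32_t type;
};

struct kernel_entry {
    uint32_t gsi;
    uint32_t type;
    uint64_t data[4];
};

static struct kernel_entry *alloc_entries(size_t n)
{
    struct kernel_entry *e;

    /* sizeof(*e) follows the pointer's type, so it cannot name the
     * wrong structure by accident. */
    e = calloc(n, sizeof(*e));
    return e;   /* NULL on failure; the caller must check */
}
```

Callers still need to test the result against NULL before using it.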
     
  • commit fc396e066318c0a02208c1d3f0b62950a7714999 upstream.

    We are incorrectly rearranging 32-bit words inside a 64-bit typed value
    for big endian systems, which would result in never marking a virtual
    interrupt as inactive on big endian systems (assuming 32 or fewer LRs on
    the hardware). Fix this by not doing any word order manipulation for
    the typed values.

    Acked-by: Christoffer Dall
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Christoffer Dall
     
  • commit b1394e745b9453dcb5b0671c205b770e87dedb87 upstream.

    Implementation of the unpinned APIC page didn't update the VMCS address
    cache when invalidation was done through range mmu notifiers.
    This became a problem when the page notifier was removed.

    Re-introduce the arch-specific helper and call it from ...range_start.

    Reported-by: Fabian Grünbichler
    Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr")
    Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
    Reviewed-by: Paolo Bonzini
    Reviewed-by: Andrea Arcangeli
    Tested-by: Wanpeng Li
    Tested-by: Fabian Grünbichler
    Signed-off-by: Radim Krčmář
    Signed-off-by: Greg Kroah-Hartman

    Radim Krčmář
     

10 Dec, 2017

1 commit

  • [ Upstream commit 63e41226afc3f7a044b70325566fa86ac3142538 ]

    When a VCPU blocks (WFI) and has programmed the vtimer, we program a
    soft timer to expire in the future to wake up the vcpu thread when
    appropriate. Because such a wake-up involves a vcpu kick, and the
    timer expire function can get called from interrupt context, and the
    kick may sleep, we have to schedule the kick in the work function.

    The work function currently has a warning that gets raised if it turns
    out that the timer shouldn't fire when it's run, which was added because
    the idea was that in that case the work should never have been cancelled.

    However, it turns out that this whole thing is racy and we can get
    spurious warnings. The problem is that we clear the armed flag in the
    work function, which may run in parallel with the
    kvm_timer_unschedule->timer_disarm() call. This results in a possible
    situation where the timer_disarm() call does not call
    cancel_work_sync(), which effectively synchronizes the completion of the
    work function with running the VCPU. As a result, the VCPU thread
    proceeds before the work function completes, causing changes to the
    timer state such that kvm_timer_should_fire(vcpu) returns false in the
    work function.

    All we do in the work function is to kick the VCPU, and an occasional
    rare extra kick never harmed anyone. Since the race above is extremely
    rare, we don't bother checking if the race happens but simply remove the
    check and the clearing of the armed flag from the work function.

    Reported-by: Matthias Brugger
    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christoffer Dall
     

28 Jul, 2017

1 commit

  • commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream.

    At the point where the kvm-vfio pseudo device wants to release its
    vfio group reference, we can't always acquire a new reference to make
    that happen. The group can be in a state where we wouldn't allow a
    new reference to be added. This new helper function allows a caller
    to match a file to a group to facilitate this. Given a file and
    group, report if they match. Thus the caller needs to already have a
    group reference to match to the file. This allows the deletion of a
    group without acquiring a new reference.

    Signed-off-by: Alex Williamson
    Reviewed-by: Eric Auger
    Reviewed-by: Paolo Bonzini
    Tested-by: Eric Auger
    Signed-off-by: Greg Kroah-Hartman

    Alex Williamson
     

14 Jun, 2017

2 commits

  • commit ddf42d068f8802de122bb7efdfcb3179336053f1 upstream.

    When an interrupt is injected with the HW bit set (indicating that
    deactivation should be propagated to the physical distributor),
    special care must be taken so that we never mark the corresponding
    LR with the Active+Pending state (as the pending state is kept in
    the physical distributor).

    Cc: stable@vger.kernel.org
    Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 3d6e77ad1489650afa20da92bb589c8778baa8da upstream.

    When an interrupt is injected with the HW bit set (indicating that
    deactivation should be propagated to the physical distributor),
    special care must be taken so that we never mark the corresponding
    LR with the Active+Pending state (as the pending state is kept in
    the physical distributor).

    Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

08 Apr, 2017

2 commits

  • commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

    No caller currently checks the return value of
    kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
    freeing their device. A stale reference will remain in the io_bus and
    will be used again, at the latest when the iobus gets torn down in
    kvm_destroy_vm(), leading to use-after-free errors.

    There is nothing the callers could do, except retrying over and over
    again.

    So let's simply remove the bus altogether, print an error and make
    sure no one can access this broken bus again (returning -ENOMEM on any
    attempt to access it).

    Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
    Reported-by: Dmitry Vyukov
    Reviewed-by: Cornelia Huck
    Signed-off-by: David Hildenbrand
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.

    When releasing the bus, let's clear the bus pointers to mark it out. If
    any further device unregister happens on this bus, we know that we're
    done if we found the bus being released already.

    Signed-off-by: Peter Xu
    Signed-off-by: Radim Krčmář
    Signed-off-by: Greg Kroah-Hartman

    Peter Xu
     

18 Mar, 2017

1 commit

  • commit 370a0ec1819990f8e2a93df7cc9c0146980ed45f upstream.

    Currently, if a vcpu thread tries to change the active state of an
    interrupt which is already on the same vcpu's AP list, it will loop
    forever. Since the VGIC mmio handler is called after a vcpu has
    already synced back the LR state to the struct vgic_irq, we can just
    let it proceed safely.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Jintack Lim
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Jintack Lim
     

12 Mar, 2017

1 commit

  • commit 0bdbf3b071986ba80731203683cf623d5c0cacb1 upstream.

    The IRQFD framework calls the architecture dependent function
    twice if the corresponding GSI type is edge triggered. For ARM,
    the function kvm_set_msi() is getting called twice whenever the
    IRQFD receives the event signal. The rest of the code path is
    trying to inject the MSI without any validation checks. There is no
    need to call the function vgic_its_inject_msi() a second time; skipping
    it avoids unnecessary overhead in the IRQ queue logic. It also avoids
    the possibility of the VM seeing the MSI twice.

    Simple fix: return -1 if the 'level' argument is zero.

    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Shanker Donthineni
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Shanker Donthineni
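
The effect of the fix can be modelled in a few lines. In this sketch, `struct msi_model`, `set_msi()` and `irqfd_event()` are hypothetical names; the model assumes that an event on an edge-triggered GSI is delivered as a call with level 1 followed by a call with level 0.

```c
/* Hypothetical model: counts how many MSIs reach the guest. */
struct msi_model {
    int injected;
};

/* Model of the MSI set function: only the rising edge injects. */
static int set_msi(struct msi_model *m, int level)
{
    if (!level)
        return -1;   /* level zero: nothing to inject */
    m->injected++;
    return 0;
}

/* Model of one irqfd event on an edge-triggered GSI: raise, then lower. */
static void irqfd_event(struct msi_model *m)
{
    set_msi(m, 1);
    set_msi(m, 0);
}
```

One event then results in exactly one injection instead of two.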
     

26 Jan, 2017

1 commit

  • commit 1193e6aeecb36c74c48c7cd0f641acbbed9ddeef upstream.

    Dmitry Vyukov reported that the syzkaller fuzzer triggered a
    deadlock in the vgic setup code when an error was detected, as
    the cleanup code tries to take a lock that is already held by
    the setup code.

    The fix is to avoid retaking the lock when cleaning up, by
    telling the cleanup function that we already hold it.

    Reported-by: Dmitry Vyukov
    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     

20 Jan, 2017

1 commit

  • commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.

    Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
    irqfd_shutdown+0x66/0xa0 [kvm]
    process_one_work+0x16b/0x480
    worker_thread+0x4b/0x500
    kthread+0x101/0x140
    ? process_one_work+0x480/0x480
    ? kthread_create_on_node+0x60/0x60
    ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

    The syzkaller folks reported a NULL pointer dereference caused by
    unregistering a consumer whose registration had failed earlier. Syzkaller
    occasionally creates two VMs with the same eventfd, so the second VM
    fails to register an irqbypass consumer. This marks the irqfd as inactive
    and queues a work item to shut down the irqfd and unregister the irqbypass
    consumer when the eventfd is closed. However, the second consumer has been
    initialized even though its registration failed. So when its token (the
    same as the first VM's) is used to unregister the consumer through the
    workqueue, the consumer of the first VM is found and unregistered, and a
    NULL dereference then occurs while deleting the consumer from the
    consumers list.

    This patch fixes it by making irq_bypass_register/unregister_consumer()
    look for the consumer entry based on the consumer pointer itself instead
    of token matching.

    Reported-by: Dmitry Vyukov
    Suggested-by: Alex Williamson
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Dmitry Vyukov
    Cc: Alex Williamson
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Wanpeng Li
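
The difference between token matching and pointer matching can be sketched with a singly linked list. This is a user-space model under stated assumptions: `struct consumer`, `register_consumer()` and `unregister_consumer()` are hypothetical names, and there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified consumer: a token plus a list link. */
struct consumer {
    void *token;
    struct consumer *next;
};

static struct consumer *consumers;   /* head of the registered list */

/* Registration fails if the token or the consumer itself is already listed. */
static int register_consumer(struct consumer *c)
{
    for (struct consumer *it = consumers; it; it = it->next)
        if (it->token == c->token || it == c)
            return -1;
    c->next = consumers;
    consumers = c;
    return 0;
}

/* Unregistration matches the consumer pointer, never the token. */
static bool unregister_consumer(struct consumer *c)
{
    for (struct consumer **pp = &consumers; *pp; pp = &(*pp)->next) {
        if (*pp != c)
            continue;
        *pp = c->next;
        c->next = NULL;
        return true;
    }
    return false;   /* c was never registered: leave the list alone */
}
```

A consumer whose registration failed is simply not found on unregistration, so the first VM's entry, which shares the same token, is left untouched.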
     

01 Dec, 2016

2 commits


24 Nov, 2016

1 commit

  • When we inject a level triggered interrupt (and unless it
    is backed by the physical distributor - timer style), we request
    a maintenance interrupt. Part of the processing for that interrupt
    is to feed to the rest of KVM (and to the eventfd subsystem) the
    information that the interrupt has been EOIed.

    But that notification only makes sense for SPIs, and not PPIs
    (such as the PMU interrupt). Skip over the notification if
    the interrupt is not an SPI.

    Cc: stable@vger.kernel.org # 4.7+
    Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
    Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
    Reported-by: Catalin Marinas
    Tested-by: Catalin Marinas
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Marc Zyngier
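
The SPI test can be sketched as follows. The sketch relies on the GIC interrupt ID layout (SGIs 0-15, PPIs 16-31, SPIs 32-1019); the names `is_spi()` and `count_eoi_notifications()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_PRIVATE_IRQS 32u     /* SGIs 0-15 and PPIs 16-31 */
#define MAX_SPI_INTID   1019u   /* last shared peripheral interrupt */

static bool is_spi(uint32_t intid)
{
    return intid >= NR_PRIVATE_IRQS && intid <= MAX_SPI_INTID;
}

/* Count how many EOIed interrupts would produce a notification. */
static unsigned count_eoi_notifications(const uint32_t *intids, unsigned n)
{
    unsigned notified = 0;

    for (unsigned i = 0; i < n; i++) {
        if (!is_spi(intids[i]))
            continue;   /* private interrupts (SGIs, PPIs) are skipped */
        notified++;
    }
    return notified;
}
```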
     

20 Nov, 2016

2 commits

  • This was reported by syzkaller:

    [ INFO: possible recursive locking detected ]
    4.9.0-rc4+ #49 Not tainted
    ---------------------------------------------
    kworker/2:1/5658 is trying to acquire lock:
    ([ 1644.769018] (&work->work)
    [< inline >] list_empty include/linux/compiler.h:243
    [] flush_work+0x0/0x660 kernel/workqueue.c:1511

    but task is already holding lock:
    ([ 1644.769018] (&work->work)
    [] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093

    stack backtrace:
    CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events async_pf_execute
    ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
    0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
    ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
    Call Trace:
    ...
    [] flush_work+0x93/0x660 kernel/workqueue.c:2846
    [] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
    [] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
    [] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
    [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
    [] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
    [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731
    [] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
    [] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
    [] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
    [] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
    [] kthread+0x244/0x2d0 kernel/kthread.c:209
    [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

    The reason is that kvm_put_kvm is causing the destruction of the VM, but
    the page fault is still on the ->queue list. The ->queue list is owned
    by the VCPU, not by the work items, so we cannot just add list_del to
    the work item.

    Instead, use work->vcpu to note async page faults that have been resolved
    and will be processed through the done list. There is no need to flush
    those.

    Cc: Dmitry Vyukov
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Radim Krčmář

    Paolo Bonzini
     
  • KVM/ARM updates for v4.9-rc6

    - Fix handling of the 32bit cycle counter
    - Fix cycle counter filtering

    Radim Krčmář
     

18 Nov, 2016

1 commit

  • KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
    But this function can't deal with PMCCFILTR correctly because the evtCount
    bits of PMCCFILTR, which are reserved as 0, conflict with the SW_INCR event
    type of the other PMXEVTYPER registers. To fix it, when eventsel == 0, this
    function shouldn't return immediately; instead it needs to check further
    if select_idx is ARMV8_PMU_CYCLE_IDX.

    Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTR
    blindly to attr.config. Instead it ought to convert the request to the
    "cpu cycle" event type (i.e. 0x11).

    To support this patch and to prevent duplicated definitions, a limited
    set of ARMv8 perf event types were relocated from perf_event.c to
    asm/perf_event.h.

    Cc: stable@vger.kernel.org # 4.6+
    Acked-by: Will Deacon
    Signed-off-by: Wei Huang
    Signed-off-by: Marc Zyngier

    Wei Huang
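
The selection logic described above can be sketched as a pure function. The constants mirror values named in the commit message (event 0x11 for CPU cycles, SW_INCR as event 0) plus a cycle counter index of 31; `EVENT_MASK` and `select_perf_config()` are hypothetical names, and the mask width is an assumption.

```c
#include <stdint.h>

#define PMU_CYCLE_IDX   31u      /* index of the cycle counter */
#define EVT_SW_INCR     0x00u    /* software increment event */
#define EVT_CPU_CYCLES  0x11u    /* "cpu cycle" event type */
#define EVENT_MASK      0xffffu  /* assumed width of the event field */

/*
 * Return the perf event config to use, or -1 if no perf event is needed.
 * An event selector of 0 only means SW_INCR for ordinary counters; for
 * the cycle counter the field is reserved and must be ignored.
 */
static int select_perf_config(uint32_t select_idx, uint32_t data)
{
    uint32_t eventsel = data & EVENT_MASK;

    if (eventsel == EVT_SW_INCR && select_idx != PMU_CYCLE_IDX)
        return -1;

    return select_idx == PMU_CYCLE_IDX ? (int)EVT_CPU_CYCLES
                                       : (int)eventsel;
}
```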
     

11 Nov, 2016

1 commit


05 Nov, 2016

3 commits

  • Pull KVM updates from Paolo Bonzini:
    "One NULL pointer dereference, and two fixes for regressions introduced
    during the merge window.

    The rest are fixes for MIPS, s390 and nested VMX"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: x86: Check memopp before dereference (CVE-2016-8630)
    kvm: nVMX: VMCLEAR an active shadow VMCS after last use
    KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
    KVM: x86: fix wbinvd_dirty_mask use-after-free
    kvm/x86: Show WRMSR data is in hex
    kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types
    KVM: document lock orders
    KVM: fix OOPS on flush_work
    KVM: s390: Fix STHYI buffer alignment for diag224
    KVM: MIPS: Precalculate MMIO load resume PC
    KVM: MIPS: Make ERET handle ERL before EXL
    KVM: MIPS: Fix lazy user ASID regenerate for SMP

    Linus Torvalds
     
  • In cases like IPI, we could be queueing an interrupt for a VCPU
    that is already running and is not about to exit, because the
    VCPU has entered the VM with the interrupt pending and would
    not trap on EOI'ing that interrupt. This could result in delays
    in interrupt delivery or even loss of interrupts.
    To guarantee prompt interrupt injection, here we have to try to
    kick the VCPU.

    Signed-off-by: Shih-Wei Li
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Shih-Wei Li
     
  • In our VGIC implementation we limit the number of SPIs to a number
    that the userland application told us. Accordingly we limit the
    allocation of memory for virtual IRQs to that number.
    However in our MMIO dispatcher we didn't check if we ever access an
    IRQ beyond that limit, leading to out-of-bound accesses.
    Add a test against the number of allocated SPIs in check_region().
    Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which
    is not implemented on ARM(32).

    [maz: cleaned-up original patch]

    Cc: stable@vger.kernel.org
    Reviewed-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Andre Przywara
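
A bounds check of this kind, together with a division-free offset-to-interrupt conversion, might be sketched as below. This is not the kernel macro: `addr_to_intid()` and `access_in_range()` are hypothetical names, and the sketch replaces the division with a shift, which assumes that the number of bits per interrupt is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PRIVATE_IRQS 32u   /* SGIs and PPIs precede the SPIs */

/* Map a register byte offset to the first interrupt ID it covers. */
static uint32_t addr_to_intid(uint32_t offset, unsigned bits_per_irq)
{
    unsigned shift = 0;

    /* Precondition: bits_per_irq is a non-zero power of two. */
    assert(bits_per_irq && !(bits_per_irq & (bits_per_irq - 1)));

    while ((1u << shift) < bits_per_irq)
        shift++;

    /* offset * 8 bits, divided by bits_per_irq using a shift. */
    return (offset * 8u) >> shift;
}

/* Reject accesses that start beyond the allocated interrupts. */
static bool access_in_range(uint32_t offset, unsigned bits_per_irq,
                            uint32_t nr_spis)
{
    return addr_to_intid(offset, bits_per_irq) < nr_spis + NR_PRIVATE_IRQS;
}
```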
     

26 Oct, 2016

1 commit

  • The conversion done by commit 3706feacd007 ("KVM: Remove deprecated
    create_singlethread_workqueue") is broken. It flushes a single work
    item &irqfd->shutdown instead of all of them, and even worse if there
    is no irqfd on the list then you get a NULL pointer dereference.
    Revert the virt/kvm/eventfd.c part of that patch; to avoid the
    deprecated function, just allocate our own workqueue---it does
    not even have to be unbound---with alloc_workqueue.

    Fixes: 3706feacd007
    Reviewed-by: Cornelia Huck
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

25 Oct, 2016

1 commit

  • This patch unexports the low-level __get_user_pages() function.

    Recent refactoring of the get_user_pages* functions allows flags to be
    passed through get_user_pages(), which eliminates the need for access to
    this function from its one user, kvm.

    We can see that the two calls to get_user_pages() which replace
    __get_user_pages() in kvm_main.c are equivalent by examining their call
    stacks:

    get_user_page_nowait():
    get_user_pages(start, 1, flags, page, NULL)
    __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, start, 1,
    flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

    check_user_page_hwpoison():
    get_user_pages(addr, 1, flags, NULL, NULL)
    __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
    NULL, NULL)

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Paolo Bonzini
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

19 Oct, 2016

1 commit

  • This removes the redundant 'write' and 'force' parameters from
    __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
    callers as use of this flag can result in surprising behaviour (and
    hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Paolo Bonzini
    Reviewed-by: Jan Kara
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

29 Sep, 2016

1 commit


28 Sep, 2016

2 commits

  • If the vgic hasn't been created and initialized, we shouldn't attempt to
    look at its data structures or flush/sync anything to the GIC hardware.

    This fixes an issue reported by Alexander Graf when using a userspace
    irqchip.

    Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
    Cc: stable@vger.kernel.org
    Reported-by: Alexander Graf
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • If userspace creates a PMU for the VCPU, but doesn't create an in-kernel
    irqchip, then we end up in a nasty path where we try to take an
    uninitialized spinlock, which can lead to all sorts of breakages.

    Luckily, QEMU always creates the VGIC before the PMU, so we can
    establish this as ABI and check for the VGIC in the PMU init stage.
    This can be relaxed at a later time if we want to support PMU with a
    userspace irqchip.

    Cc: stable@vger.kernel.org
    Cc: Shannon Zhao
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

22 Sep, 2016

5 commits

  • This patch allows vgic-v3 to be built and used in 32-bit mode.

    Unfortunately, it cannot be split into several steps without extra
    stubs to keep patches independent and bisectable. For instance,
    virt/kvm/arm/vgic/vgic-v3.c uses functions from vgic-v3-sr.c, and handling
    access to the GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
    to be already defined.

    This is how support has been implemented:

    * handle SGI requests from the guest

    * report configured SRE on access to GICv3 cpu interface from the guest

    * required vgic-v3 macros are provided via uapi.h

    * static keys are used to select GIC backend

    * to make vgic-v3 build, the KVM_ARM_VGIC_V3 guard is removed along with
    the static inlines

    Acked-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • We have a couple of 64-bit registers defined in the GICv3 architecture, so
    unsigned long accesses to these registers will only access a single
    32-bit part of that register. On the other hand, these registers can't
    be accessed as 64-bit with a single instruction like ldrd/strd or
    ldmia/stmia if we run a 32-bit host, because KVM does not support
    access to MMIO space done by these instructions.

    It means that a 32-bit guest accesses these registers in 32-bit
    chunks, so the only thing we need to do is to ensure that
    extract_bytes() always takes 64-bit data.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
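
The point about always passing 64-bit data can be illustrated with a user-space model of such a helper. `extract_bytes_model()` is a hypothetical name for this sketch; it takes an explicitly 64-bit value so that the upper half of a register is still reachable when the access is split into 32-bit chunks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return `num` bytes of `data` starting at byte `offset` (little-endian
 * byte numbering). Taking uint64_t rather than unsigned long keeps the
 * upper 32 bits available on a 32-bit host.
 */
static uint64_t extract_bytes_model(uint64_t data, unsigned offset,
                                    unsigned num)
{
    uint64_t mask;

    assert(offset < 8 && num >= 1 && offset + num <= 8);

    mask = (num == 8) ? ~UINT64_C(0) : ((UINT64_C(1) << (num * 8)) - 1);
    return (data >> (offset * 8)) & mask;
}
```

Reading a 64-bit register as two 32-bit accesses then means calling the helper with offsets 0 and 4.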
     
  • Well, this patch is looking ahead of time, but we'll get the following
    compiler warnings as soon as we introduce vgic-v3 to the 32-bit world:

    CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer':
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow]
    value = (mpidr & GENMASK(23, 0)) << 32;
    ^
    In file included from ./include/linux/kernel.h:10:0,
    from ./include/asm-generic/bug.h:13,
    from ./arch/arm/include/asm/bug.h:59,
    from ./include/linux/bug.h:4,
    from ./include/linux/io.h:23,
    from ./arch/arm/include/asm/arch_gicv3.h:23,
    from ./include/linux/irqchip/arm-gic-v3.h:411,
    from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14:
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi':
    ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow]
    #define BIT(nr) (1UL << (nr))
    ^
    arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT'
    broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
    ^
    Let's fix them now.

    Acked-by: Marc Zyngier
    Signed-off-by: Vladimir Murzin
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
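
Both warnings come from shifting a value that is only 32 bits wide on a 32-bit build. A portable way to write such expressions is to widen the operand before shifting, as in this sketch. `BIT64()`, `typer_affinity()` and `is_broadcast()` are hypothetical names; the routing mode bit position (40) is the one used for ICC_SGI1R.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit bit mask, independent of the width of unsigned long. */
#define BIT64(nr) (UINT64_C(1) << (nr))

#define SGI1R_ROUTING_MODE_BIT 40

/* Place the low 24 affinity bits of mpidr at bit 32 and above. */
static uint64_t typer_affinity(uint32_t mpidr)
{
    /* Widen first: shifting a 32-bit value by 32 is undefined. */
    return (uint64_t)(mpidr & 0xffffffu) << 32;
}

/* Test the routing mode bit of a 64-bit SGI register value. */
static bool is_broadcast(uint64_t sgi1r)
{
    return (sgi1r & BIT64(SGI1R_ROUTING_MODE_BIT)) != 0;
}
```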
     
  • Currently the ITS code is guarded by the KVM_ARM_VGIC_V3 config option, which
    was introduced to hide everything specific to vgic-v3 from the 32-bit world.
    We are going to support vgic-v3 in the 32-bit world and KVM_ARM_VGIC_V3
    will be gone, but we don't have support for ITS there yet and we need to
    continue keeping ITS away.
    Introduce a new config option to prevent the ITS code from being built in
    32-bit mode when support for vgic-v3 is done.

    Signed-off-by: Vladimir Murzin
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Vladimir Murzin
     
  • So we can reuse the code under arch/arm

    Signed-off-by: Vladimir Murzin
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Vladimir Murzin