07 Mar, 2017

2 commits

  • The ITS spec says that ITS commands are only processed when the ITS
    is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking
    this into account.
    Fix this by checking the enabled state before handling CWRITER writes.

    On the other hand, this means that CWRITER could advance while the ITS
    is disabled, and enabling it would require those commands to be processed.
    Fix this case as well by refactoring out the actual command processing and
    calling it from both the GITS_CWRITER and GITS_CTLR handlers.

    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Andre Przywara
    Signed-off-by: Marc Zyngier

    Andre Przywara
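    A minimal userspace sketch of the flow described above. The struct its_model type, its fields, and process_commands() are hypothetical stand-ins, not the kernel's vgic-its.c code, and the queue is modelled as command indices rather than byte offsets. CWRITER writes are always latched, commands are only consumed while GITS_CTLR.Enabled is set, and enabling the ITS drains whatever accumulated while it was off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ITS command queue state (not struct vgic_its). */
struct its_model {
	bool enabled;            /* GITS_CTLR.Enabled, bit[0] */
	uint64_t creadr;         /* index of the next command to process */
	uint64_t cwriter;        /* index one past the last command written */
	unsigned int processed;  /* number of commands handled so far */
};

/* Shared helper: consume every queued command, but only if the ITS is enabled. */
static void process_commands(struct its_model *its)
{
	if (!its->enabled)
		return;
	while (its->creadr != its->cwriter) {
		/* A real implementation would decode and execute the command here. */
		its->processed++;
		its->creadr++;
	}
}

/* Guest write to GITS_CWRITER: always record the new write pointer. */
static void write_cwriter(struct its_model *its, uint64_t val)
{
	its->cwriter = val;
	process_commands(its);
}

/* Guest write to GITS_CTLR: update Enabled, then drain any pending commands. */
static void write_ctlr(struct its_model *its, uint32_t val)
{
	its->enabled = val & 1u;
	process_commands(its);
}
```

    Routing both handlers through one helper keeps the "only while enabled" rule in a single place.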
     
  • Currently, if a vcpu thread tries to change the active state of an
    interrupt which is already on the same vcpu's AP list, it will loop
    forever. Since the VGIC mmio handler is called after a vcpu has
    already synced back the LR state to the struct vgic_irq, we can just
    let it proceed safely.

    Cc: stable@vger.kernel.org
    Reviewed-by: Marc Zyngier
    Signed-off-by: Jintack Lim
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Jintack Lim
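    The condition can be illustrated with a userspace sketch, assuming hypothetical trimmed stand-ins for struct kvm_vcpu and struct vgic_irq: a thread only needs to wait for a vcpu that is running and could still hold the IRQ in a list register, and never for its own vcpu, whose LR state was already synced back. The three-part test is an assumption about how such a check can be written, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for struct kvm_vcpu and struct vgic_irq. */
struct vcpu_model {
	int cpu;                  /* physical CPU the vcpu runs on, or -1 if not running */
};

struct irq_model {
	struct vcpu_model *vcpu;  /* vcpu whose AP list holds the IRQ, or NULL */
};

/*
 * Return true if the caller must wait before changing the active state.
 * The requester's own LRs were synced back on exit, so waiting for itself
 * could never end; only another running vcpu is worth waiting for.
 */
static bool must_wait_for_vcpu(const struct irq_model *irq,
                               const struct vcpu_model *requester)
{
	return irq->vcpu != NULL &&       /* IRQ may have state in an LR */
	       irq->vcpu != requester &&  /* not our own vcpu */
	       irq->vcpu->cpu != -1;      /* that vcpu is running */
}
```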
     

06 Mar, 2017

1 commit

  • Our GICv3 emulation always presents ICC_SRE_EL1 with DIB/DFB set to
    zero, which implies that there is a way to bypass the GIC and
    inject raw IRQ/FIQ by driving the CPU pins.

    Of course, we don't allow that when the GIC is configured, but
    we fail to indicate that to the guest. The obvious fix is to
    set these bits (and never let them be changed again).

    Reported-by: Peter Maydell
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
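    A small sketch of the resulting register behaviour. The bit positions (SRE = bit 0, DFB = bit 1, DIB = bit 2) come from the GICv3 architecture; the function names are hypothetical, and treating guest writes as ignored is an assumption matching the intent of never letting these bits change.

```c
#include <assert.h>
#include <stdint.h>

/* ICC_SRE_EL1 bit positions as defined by the GICv3 architecture. */
#define SRE_SRE (1u << 0)  /* system register interface enable */
#define SRE_DFB (1u << 1)  /* disable FIQ bypass */
#define SRE_DIB (1u << 2)  /* disable IRQ bypass */

/* Value the emulated register reports: system registers on, bypass disabled. */
static uint64_t emulated_sre_read(void)
{
	return SRE_SRE | SRE_DFB | SRE_DIB;
}

/* Guest writes are ignored, so the reported value never changes. */
static uint64_t emulated_sre_write(uint64_t current, uint64_t guest_val)
{
	(void)guest_val;
	return current;
}
```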
     

05 Mar, 2017

1 commit

  • Pull more KVM updates from Radim Krčmář:
    "Second batch of KVM changes for the 4.11 merge window:

    PPC:
    - correct assumption about ASDR on POWER9
    - fix MMIO emulation on POWER9

    x86:
    - add a simple test for ioperm
    - cleanup TSS (going through KVM tree as the whole undertaking was
    caused by VMX's use of TSS)
    - fix nVMX interrupt delivery
    - fix some performance counters in the guest

    ... and two cleanup patches"

    * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: nVMX: Fix pending events injection
    x86/kvm/vmx: remove unused variable in segment_base()
    selftests/x86: Add a basic selftest for ioperm
    x86/asm: Tidy up TSS limit code
    kvm: convert kvm.users_count from atomic_t to refcount_t
    KVM: x86: never specify a sample period for virtualized in_tx_cp counters
    KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
    KVM: PPC: Book3S HV: Fix software walk of guest process page tables

    Linus Torvalds
     

02 Mar, 2017

4 commits

  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …hed.h> into <linux/sched/signal.h>

    Fix up affected files that include this signal functionality via sched.h.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows us to avoid accidental
    refcount overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Radim Krčmář

    Elena Reshetova
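    To illustrate why a dedicated type helps, here is a userspace sketch of a saturating counter built on C11 atomics. struct my_refcount and its functions are hypothetical, and the policy is a simplification of refcount_t: once the counter reaches UINT_MAX it stays there, so an overflow leaks the object instead of wrapping to zero and freeing it while references remain.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcount_t; not the kernel implementation. */
struct my_refcount {
	atomic_uint refs;
};

/* Increment unless saturated; a saturated counter stays pinned at UINT_MAX. */
static void my_refcount_inc(struct my_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);
	do {
		if (old == UINT_MAX)
			return;  /* never wrap around to 0 */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement and report whether this dropped the last reference. */
static bool my_refcount_dec_and_test(struct my_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);
	do {
		if (old == UINT_MAX)
			return false;  /* saturated objects are leaked, never freed */
		assert(old != 0);  /* underflow would be a caller bug */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```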
     

28 Feb, 2017

2 commits

  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_users);/mmget\(\1\);/'
    git grep -l 'atomic_inc.*mm_users' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_users);/mmget\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)

    Link: http://lkml.kernel.org/r/20161218123229.22952-2-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
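    A userspace sketch of what the helper amounts to, assuming a hypothetical struct mm_model that carries only the two counters of struct mm_struct. Having one named entry point is what lets a later patch hook into every mm_users increment.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the two counters in struct mm_struct. */
struct mm_model {
	atomic_int mm_users;  /* users of the address space */
	atomic_int mm_count;  /* references keeping the struct itself alive */
};

/* Sketch of the helper: the single place that takes an mm_users reference. */
static inline void mmget(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_users, 1);
}
```

    With the two sed expressions above, `atomic_inc(&p->mm_users);` becomes `mmget(p);` and `atomic_inc(&init_mm.mm_users);` becomes `mmget(&init_mm);`.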
     
  • Apart from adding the helper function itself, the rest of the kernel is
    converted mechanically using:

    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
    git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    (Michal Hocko provided most of the kerneldoc comment.)

    Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
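    The companion helper follows the same pattern for mm_count, which pins the struct mm_struct itself rather than the address space. In this hypothetical sketch, mmdrop_model() only reports when the last reference is dropped, where the kernel's mmdrop() would free the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the counter that mmgrab() touches. */
struct mm_count_model {
	atomic_int mm_count;  /* references keeping the struct itself allocated */
};

/* Sketch of mmgrab(): pin the structure, not the address space. */
static inline void mmgrab(struct mm_count_model *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Hypothetical counterpart of mmdrop(): true when the last reference is gone. */
static inline bool mmdrop_model(struct mm_count_model *mm)
{
	return atomic_fetch_sub(&mm->mm_count, 1) == 1;
}
```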
     

25 Feb, 2017

1 commit

  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
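    The shape of the change can be sketched with trimmed, hypothetical versions of the structures: the handler now reaches the VMA through vmf->vma. The real struct vm_fault has many more fields, and real handlers return VM_FAULT_* codes rather than the illustrative 0/-1 used here.

```c
#include <assert.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct vm_area_struct { unsigned long vm_start, vm_end; };
struct vm_fault {
	struct vm_area_struct *vma;  /* the faulting VMA already travels in vmf */
	unsigned long address;
};

/* Old shape: int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf); */
/* New shape: the redundant vma parameter is gone. */
struct vm_ops_model {
	int (*fault)(struct vm_fault *vmf);
};

/* Example handler: finds the VMA via vmf->vma; 0/-1 are illustrative codes. */
static int demo_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	return (vmf->address >= vma->vm_start && vmf->address < vma->vm_end) ? 0 : -1;
}
```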
     

17 Feb, 2017

6 commits


08 Feb, 2017

9 commits


01 Feb, 2017

1 commit

  • The only benefit of having kvm_vgic_inject_mapped_irq separate from
    kvm_vgic_inject_irq is that we pass a boolean that we use for error
    checking on the injection path.

    While this could potentially help in some aspect of robustness, it's
    also a little bit of a defensive move, and arguably callers into the
    vgic should have made sure they have marked their virtual IRQs as mapped
    if required.

    Acked-by: Marc Zyngier
    Reviewed-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

30 Jan, 2017

5 commits

  • Userspace needs to save and restore line_level for
    level-triggered interrupts using KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO.

    Reviewed-by: Eric Auger
    Signed-off-by: Vijaya Kumar K
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
     
  • VGICv3 CPU interface registers are accessed using the
    KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed
    as 64-bit. The CPU's MPIDR value is passed along with the register ID
    and is used to identify the CPU whose registers are accessed.

    A VM that supports SEIs expects the destination machine to support
    them as well in order to handle guest aborts, so ICC_CTLR_EL1.SEIS is
    checked for compatibility. Similarly, a VM that supports Affinity Level 3,
    which is required for AArch64 mode, requires the destination machine to
    support it too, so ICC_CTLR_EL1.A3V is checked for compatibility.

    The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC
    CPU registers for AArch64.

    For AArch32 mode, the arch/arm/kvm/vgic-v3-coproc.c file is created,
    but its APIs are not implemented.

    Update arch/arm/include/uapi/asm/kvm.h with the new definitions
    required to compile for AArch32.

    The VGIC v3 specification this follows is documented in
    Documentation/virtual/kvm/devices/arm-vgic-v3.txt

    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Pavel Fedin
    Signed-off-by: Vijaya Kumar K
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
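    One way to express the compatibility rule is sketched below. The ICC_CTLR_EL1 bit positions (SEIS = bit 14, A3V = bit 15) come from the GICv3 architecture; ctlr_restore_compatible() and its policy (a capability present in the saved value must also be present on the host) are assumptions, and the kernel's exact checks may be stricter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICC_CTLR_EL1 read-only capability bits (GICv3 architecture). */
#define CTLR_SEIS (1u << 14)  /* SEI support */
#define CTLR_A3V  (1u << 15)  /* Affinity level 3 support */

/* Hypothetical restore-time check: saved capabilities must exist on the host. */
static bool ctlr_restore_compatible(uint64_t saved, uint64_t host)
{
	if ((saved & CTLR_SEIS) && !(host & CTLR_SEIS))
		return false;  /* guest relies on SEIs the host cannot deliver */
	if ((saved & CTLR_A3V) && !(host & CTLR_A3V))
		return false;  /* guest uses Aff3 the host cannot represent */
	return true;
}
```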
     
  • ICC_VMCR_EL2 supports virtual access to ICC_IGRPEN1_EL1.Enable
    and ICC_IGRPEN0_EL1.Enable fields. Add grpen0 and grpen1 member
    variables to struct vmcr to support read and write of these fields.

    Also refactor the vgic_set_vmcr() and vgic_get_vmcr() code.
    Drop ICH_VMCR_CTLR_SHIFT and ICH_VMCR_CTLR_MASK macros and instead
    use ICH_VMCR_EOI* and ICH_VMCR_CBPR* macros.

    Signed-off-by: Vijaya Kumar K
    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
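    A sketch of packing the new fields into ICH_VMCR_EL2, using the architectural bit positions for VENG0 (bit 0), VENG1 (bit 1), VCBPR (bit 4) and VEOIM (bit 9). struct vmcr_model and the pack/unpack helpers are hypothetical and omit the priority mask and binary point fields.

```c
#include <assert.h>
#include <stdint.h>

/* ICH_VMCR_EL2 field positions (GICv3 architecture). */
#define VMCR_ENG0_SHIFT 0  /* VENG0: virtual Group 0 enable */
#define VMCR_ENG1_SHIFT 1  /* VENG1: virtual Group 1 enable */
#define VMCR_CBPR_SHIFT 4  /* VCBPR: common binary point */
#define VMCR_EOIM_SHIFT 9  /* VEOIM: EOI mode */

/* Hypothetical trimmed struct vmcr with only the fields discussed above. */
struct vmcr_model {
	uint32_t grpen0;
	uint32_t grpen1;
	uint32_t cbpr;
	uint32_t eoim;
};

static uint32_t vmcr_pack(const struct vmcr_model *v)
{
	return ((v->grpen0 & 1u) << VMCR_ENG0_SHIFT) |
	       ((v->grpen1 & 1u) << VMCR_ENG1_SHIFT) |
	       ((v->cbpr & 1u) << VMCR_CBPR_SHIFT) |
	       ((v->eoim & 1u) << VMCR_EOIM_SHIFT);
}

static struct vmcr_model vmcr_unpack(uint32_t reg)
{
	struct vmcr_model v = {
		.grpen0 = (reg >> VMCR_ENG0_SHIFT) & 1u,
		.grpen1 = (reg >> VMCR_ENG1_SHIFT) & 1u,
		.cbpr = (reg >> VMCR_CBPR_SHIFT) & 1u,
		.eoim = (reg >> VMCR_EOIM_SHIFT) & 1u,
	};
	return v;
}
```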
     
  • VGICv3 Distributor and Redistributor registers are accessed using
    KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS
    with KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR ioctls.
    These registers are accessed as 32-bit, and the CPU MPIDR value
    passed along with the register offset is used to identify the
    CPU for redistributor register access.

    The VGIC v3 specification this follows is documented in
    Documentation/virtual/kvm/devices/arm-vgic-v3.txt

    Also update arch/arm/include/uapi/asm/kvm.h to compile for
    AArch32 mode.

    Signed-off-by: Vijaya Kumar K
    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
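    The attr encoding can be sketched as follows, assuming the layout described in arm-vgic-v3.txt: the redistributor's affinity packed as Aff3.Aff2.Aff1.Aff0 in bits [63:32] and the register offset in bits [31:0]. vgic_v3_reg_attr() is a hypothetical helper; its result would go into the attr field of struct kvm_device_attr.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the 64-bit attr for a VGICv3 (re)distributor
 * register access. Upper 32 bits: Aff3.Aff2.Aff1.Aff0; lower 32 bits: offset.
 */
static uint64_t vgic_v3_reg_attr(uint64_t mpidr_el1, uint32_t offset)
{
	uint64_t aff0 = mpidr_el1 & 0xff;
	uint64_t aff1 = (mpidr_el1 >> 8) & 0xff;
	uint64_t aff2 = (mpidr_el1 >> 16) & 0xff;
	uint64_t aff3 = (mpidr_el1 >> 32) & 0xff;  /* Aff3 sits at [39:32] in MPIDR_EL1 */
	uint64_t packed = (aff3 << 24) | (aff2 << 16) | (aff1 << 8) | aff0;

	return (packed << 32) | offset;
}
```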
     
  • Reading and writing some registers, like ISPENDR and ICPENDR,
    from userspace requires special handling compared to
    guest access to these registers.

    Refer to Documentation/virtual/kvm/devices/arm-vgic-v3.txt
    for the handling of the ISPENDR and ICPENDR registers.

    Add infrastructure to support guest and userspace reads
    and writes for the required registers.
    Also move vgic_uaccess from vgic-mmio-v2.c to vgic-mmio.c.

    Signed-off-by: Vijaya Kumar K
    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Vijaya Kumar K
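    The idea of separate guest and userspace accessors can be sketched with a hypothetical register descriptor. The write-1-to-set guest semantics of ISPENDR are architectural; the assumption here is that a userspace write restores the exact pending state, since a restore must be able to clear bits as well as set them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-register state: one pending-latch bit per interrupt. */
struct pend_state {
	uint32_t pending_latch;
};

typedef void (*write_fn)(struct pend_state *s, uint32_t val);

/* Hypothetical register descriptor with separate guest and userspace writers. */
struct reg_desc {
	write_fn write;          /* guest MMIO access */
	write_fn uaccess_write;  /* userspace access; NULL means "same as guest" */
};

/* Guest ISPENDR write: write-1-to-set, zero bits are ignored. */
static void guest_write_spending(struct pend_state *s, uint32_t val)
{
	s->pending_latch |= val;
}

/* Userspace ISPENDR write (assumed semantics): restore the exact state. */
static void uaccess_write_pending(struct pend_state *s, uint32_t val)
{
	s->pending_latch = val;
}

/* Dispatch: prefer the userspace handler when one is registered. */
static void reg_write(const struct reg_desc *r, struct pend_state *s,
                      uint32_t val, int is_user)
{
	if (is_user && r->uaccess_write)
		r->uaccess_write(s, val);
	else
		r->write(s, val);
}
```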
     

25 Jan, 2017

2 commits

  • Add a file to debugfs to read the in-kernel state of the vgic. We don't
    lock the entire VGIC state while traversing all the IRQs, so if the VM
    is running, the user/developer may not see a quiesced state and should
    take care to pause the VM using facilities in user space for that
    purpose.

    We also don't support LPIs yet, but they can be added easily if needed.

    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Tested-by: Andre Przywara
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • One of the goals behind the VGIC redesign was to get rid of cached or
    intermediate state in the data structures, but we decided to allow
    ourselves to precompute the pending value of an IRQ based on the line
    level and pending latch state. However, this has now become difficult
    to base proper GICv3 save/restore on, because there is a potential to
    modify the pending state without knowing if an interrupt is edge or
    level configured.

    See the following post and related message for more background:
    https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html

    This commit gets rid of the precomputed pending field in favor of a
    function that calculates the value when needed, irq_is_pending().

    The soft_pending field is renamed to pending_latch to represent that
    this latch is the equivalent hardware latch which gets manipulated by
    the input signal for edge-triggered interrupts and when writing to the
    SPENDR/CPENDR registers.

    After this commit save/restore code should be able to simply restore the
    pending_latch state, line_level state, and config state in any order and
    get the desired result.

    Reviewed-by: Andre Przywara
    Reviewed-by: Marc Zyngier
    Tested-by: Andre Przywara
    Signed-off-by: Christoffer Dall

    Christoffer Dall
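    The derived pending state can be sketched as below, using trimmed, hypothetical stand-ins for struct vgic_irq. Because nothing is cached, restoring pending_latch, line_level and config in any order yields the same answer.

```c
#include <assert.h>
#include <stdbool.h>

enum irq_config { CONFIG_EDGE, CONFIG_LEVEL };

/* Trimmed, hypothetical stand-in for struct vgic_irq. */
struct irq_state {
	bool pending_latch;      /* set by an edge or an SPENDR write, cleared by CPENDR */
	bool line_level;         /* current input level of the line */
	enum irq_config config;
};

/* Computed on demand: edge IRQs use the latch; level IRQs use latch or line. */
static bool irq_is_pending(const struct irq_state *irq)
{
	if (irq->config == CONFIG_EDGE)
		return irq->pending_latch;
	return irq->pending_latch || irq->line_level;
}
```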
     

17 Jan, 2017

1 commit


13 Jan, 2017

3 commits

  • Dmitry Vyukov reported that the syzkaller fuzzer triggered a
    deadlock in the vgic setup code when an error was detected, as
    the cleanup code tries to take a lock that is already held by
    the setup code.

    The fix is to avoid retaking the lock when cleaning up, by
    telling the cleanup function that we already hold it.

    Cc: stable@vger.kernel.org
    Reported-by: Dmitry Vyukov
    Reviewed-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier

    Marc Zyngier
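    The pattern of telling the cleanup function that the lock is already held can be sketched in userspace. The lock here is a hypothetical non-recursive lock that asserts where a real mutex would deadlock, and setup()/cleanup() are illustrative names, not the vgic functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: re-taking it would deadlock, so assert instead. */
struct model_lock { bool held; };

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);  /* a real mutex would block forever here */
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

static struct model_lock setup_lock;
static int resources;  /* hypothetical state built up by setup */

/* Cleanup is told by its caller whether the lock is already held. */
static void cleanup(bool lock_held)
{
	if (!lock_held)
		model_lock_acquire(&setup_lock);
	resources = 0;
	if (!lock_held)
		model_lock_release(&setup_lock);
}

static int setup(bool inject_error)
{
	int ret = 0;

	model_lock_acquire(&setup_lock);
	resources = 1;
	if (inject_error) {
		cleanup(true);  /* cleanup(false) would try to re-take the lock */
		ret = -1;
	}
	model_lock_release(&setup_lock);
	return ret;
}
```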
     
  • The current KVM world switch code unintentionally sets the wrong bits in
    CNTHCTL_EL2 when E2H == 1, which may allow the guest OS to access the
    physical timer. The bit positions in CNTHCTL_EL2 change depending on the
    HCR_EL2.E2H bit: EL1PCEN and EL1PCTEN are bits 1 and 0 when E2H is
    not set, but bits 11 and 10 respectively when E2H is set.

    In fact, on VHE we only need to set those bits once, not for every world
    switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
    1, which makes those bits have no effect for the host kernel execution.
    So we just set those bits once for guests, and that's it.

    Signed-off-by: Jintack Lim
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Jintack Lim
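    The E2H-dependent bit positions can be captured in a small helper; cnthctl_el1_access_bits() is a hypothetical name. Applying the non-E2H mask on a VHE host would touch bits 1 and 0 instead of 11 and 10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of the EL1 physical counter/timer access bits in CNTHCTL_EL2. */
static uint64_t cnthctl_el1_access_bits(bool e2h)
{
	const unsigned int pcten_shift = e2h ? 10 : 0;  /* EL1PCTEN: counter access */
	const unsigned int pcen_shift  = e2h ? 11 : 1;  /* EL1PCEN: timer access */

	return (UINT64_C(1) << pcten_shift) | (UINT64_C(1) << pcen_shift);
}
```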
     
  • When a VCPU blocks (WFI) and has programmed the vtimer, we program a
    soft timer to expire in the future to wake up the vcpu thread when
    appropriate. Because such a wake-up involves a vcpu kick, the
    timer expire function can get called from interrupt context, and the
    kick may sleep, we have to schedule the kick in the work function.

    The work function currently has a warning that gets raised if it turns
    out that the timer shouldn't fire when it's run, which was added because
    the idea was that in that case the work should never have been cancelled.

    However, it turns out that this whole thing is racy and we can get
    spurious warnings. The problem is that we clear the armed flag in the
    work function, which may run in parallel with the
    kvm_timer_unschedule->timer_disarm() call. This results in a possible
    situation where the timer_disarm() call does not call
    cancel_work_sync(), which effectively synchronizes the completion of the
    work function with running the VCPU. As a result, the VCPU thread
    proceeds before the work function completes, causing changes to the
    timer state such that kvm_timer_should_fire(vcpu) returns false in the
    work function.

    All we do in the work function is to kick the VCPU, and an occasional
    rare extra kick never harmed anyone. Since the race above is extremely
    rare, we don't bother checking if the race happens but simply remove the
    check and the clearing of the armed flag from the work function.

    Reported-by: Matthias Brugger
    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     

12 Jan, 2017

1 commit

  • Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
    irqfd_shutdown+0x66/0xa0 [kvm]
    process_one_work+0x16b/0x480
    worker_thread+0x4b/0x500
    kthread+0x101/0x140
    ? process_one_work+0x480/0x480
    ? kthread_create_on_node+0x60/0x60
    ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

    The syzkaller folks reported a NULL pointer dereference caused by
    unregistering a consumer whose registration had previously failed.
    Syzkaller occasionally creates two VMs with the same eventfd, so the
    second VM fails to register an irqbypass consumer. It then marks the
    irqfd as inactive and queues a work item to shut down the irqfd and
    unregister the irqbypass consumer when the eventfd is closed. However,
    the second consumer has been initialized even though its registration
    failed. Its token (the same as the first VM's) is then used to unregister
    the consumer from the workqueue, so the first VM's consumer is found and
    unregistered, and a NULL dereference occurs in the path that deletes the
    consumer from the consumers list.

    This patch fixes it by making irq_bypass_register/unregister_consumer()
    look up the consumer entry based on the consumer pointer itself instead
    of token matching.

    Reported-by: Dmitry Vyukov
    Suggested-by: Alex Williamson
    Cc: stable@vger.kernel.org
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Cc: Dmitry Vyukov
    Cc: Alex Williamson
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
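    A userspace sketch of the fix, using a hypothetical singly linked consumer list rather than the kernel's list_head: registration still rejects a duplicate token, but unregistration matches the consumer pointer, so tearing down a consumer that never registered leaves the first VM's entry untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumer: a token plus an intrusive singly linked list. */
struct consumer {
	void *token;
	struct consumer *next;
};

static struct consumer *consumers;  /* registered consumers */

/* Registration fails if the token (or the consumer itself) is already present. */
static int register_consumer(struct consumer *c)
{
	for (struct consumer *it = consumers; it; it = it->next)
		if (it->token == c->token || it == c)
			return -1;
	c->next = consumers;
	consumers = c;
	return 0;
}

/* Unregister by matching the consumer pointer itself, not its token. */
static void unregister_consumer(struct consumer *c)
{
	for (struct consumer **pp = &consumers; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			return;
		}
	}
	/* Not found: it never registered successfully, so nothing to do. */
}
```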
     

26 Dec, 2016

1 commit

  • Pull timer type cleanups from Thomas Gleixner:
    "This series does a tree wide cleanup of types related to
    timers/timekeeping.

    - Get rid of cycles_t and use a plain u64. The type is not really
    helpful and caused more confusion than clarity

    - Get rid of the ktime union. The union has become useless as we use
    the scalar nanoseconds storage unconditionally now. The 32-bit
    timespec-like storage was removed some time ago due to the Y2038
    limitations.

    That leaves the odd union access around for no reason. Clean it up.

    Both changes have been done with coccinelle and a small amount of
    manual mopping up"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ktime: Get rid of ktime_equal()
    ktime: Cleanup ktime_set() usage
    ktime: Get rid of the union
    clocksource: Use a plain u64 instead of cycle_t

    Linus Torvalds
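    With the union gone, ktime_t is simply a signed 64-bit nanosecond count. The sketch below assumes that representation; ktime_set_sketch() is a hypothetical name that mirrors the arithmetic of ktime_set() but omits its clamping at KTIME_MAX. Two values compare with plain ==, which is why ktime_equal() is no longer needed.

```c
#include <assert.h>
#include <stdint.h>

/* After the cleanup, ktime_t is a plain scalar count of nanoseconds. */
typedef int64_t ktime_t;

#define NSEC_PER_SEC 1000000000LL

/* Sketch of ktime_set() on a scalar ktime_t (overflow clamping omitted). */
static inline ktime_t ktime_set_sketch(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC + (int64_t)nsecs;
}
```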