14 Jul, 2017

1 commit

  • Pull VFIO updates from Alex Williamson:

    - Include Intel XXV710 in INTx workaround (Alex Williamson)

    - Make use of ERR_CAST() for error return (Dan Carpenter)

    - Fix vfio_group release deadlock from iommu notifier (Alex Williamson)

    - Unset KVM-VFIO attributes only on group match (Alex Williamson)

    - Fix release path group/file matching with KVM-VFIO (Alex Williamson)

    - Remove unnecessary lock uses triggering lockdep splat (Alex Williamson)

    * tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfio:
    vfio: Remove unnecessary uses of vfio_container.group_lock
    vfio: New external user group/file match
    kvm-vfio: Decouple only when we match a group
    vfio: Fix group release deadlock
    vfio: Use ERR_CAST() instead of open coding it
    vfio/pci: Add Intel XXV710 to hidden INTx devices

    Linus Torvalds
     

07 Jul, 2017

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "PPC:
    - Better machine check handling for HV KVM
    - Ability to support guests with threads=2, 4 or 8 on POWER9
    - Fix for a race that could cause delayed recognition of signals
    - Fix for a bug where POWER9 guests could sleep with interrupts pending.

    ARM:
    - VCPU request overhaul
    - allow timer and PMU to have their interrupt number selected from userspace
    - workaround for Cavium erratum 30115
    - handling of memory poisoning
    - the usual crop of fixes and cleanups

    s390:
    - initial machine check forwarding
    - migration support for the CMMA page hinting information
    - cleanups and fixes

    x86:
    - nested VMX bugfixes and improvements
    - more reliable NMI window detection on AMD
    - APIC timer optimizations

    Generic:
    - VCPU request overhaul + documentation of common code patterns
    - kvm_stat improvements"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
    Update my email address
    kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
    x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
    kvm: x86: mmu: allow A/D bits to be disabled in an mmu
    x86: kvm: mmu: make spte mmio mask more explicit
    x86: kvm: mmu: dead code thanks to access tracking
    KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
    KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
    KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
    KVM: x86: remove ignored type attribute
    KVM: LAPIC: Fix lapic timer injection delay
    KVM: lapic: reorganize restart_apic_timer
    KVM: lapic: reorganize start_hv_timer
    kvm: nVMX: Check memory operand to INVVPID
    KVM: s390: Inject machine check into the nested guest
    KVM: s390: Inject machine check into the guest
    tools/kvm_stat: add new interactive command 'b'
    tools/kvm_stat: add new command line switch '-i'
    tools/kvm_stat: fix error on interactive command 'g'
    KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
    ...

    Linus Torvalds
     

06 Jul, 2017

1 commit

  • Pull arm64 updates from Will Deacon:

    - RAS reporting via GHES/APEI (ACPI)

    - Indirect ftrace trampolines for modules

    - Improvements to kernel fault reporting

    - Page poisoning

    - Sigframe cleanups and preparation for SVE context

    - Core dump fixes

    - Sparse fixes (mainly relating to endianness)

    - xgene SoC PMU v3 driver

    - Misc cleanups and non-critical fixes

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
    arm64: fix endianness annotation for 'struct jit_ctx' and friends
    arm64: cpuinfo: constify attribute_group structures.
    arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()
    arm64: ptrace: Remove redundant overrun check from compat_vfp_set()
    arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails
    arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn()
    arm64: fix endianness annotation in get_kaslr_seed()
    arm64: add missing conversion to __wsum in ip_fast_csum()
    arm64: fix endianness annotation in acpi_parking_protocol.c
    arm64: use readq() instead of readl() to read 64bit entry_point
    arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm()
    arm64: fix endianness annotation for aarch64_insn_write()
    arm64: fix endianness annotation in aarch64_insn_read()
    arm64: fix endianness annotation in call_undef_hook()
    arm64: fix endianness annotation for debug-monitors.c
    ras: mark stub functions as 'inline'
    arm64: pass endianness info to sparse
    arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels
    arm64: signal: Allow expansion of the signal frame
    acpi: apei: check for pending errors when probing GHES entries
    ...

    Linus Torvalds
     

30 Jun, 2017

1 commit


29 Jun, 2017

2 commits

  • At the point where the kvm-vfio pseudo device wants to release its
    vfio group reference, we can't always acquire a new reference to make
    that happen. The group can be in a state where we wouldn't allow a
    new reference to be added. This new helper function allows a caller
    to match a file to a group to facilitate this. Given a file and
    group, report if they match. Thus the caller needs to already have a
    group reference to match to the file. This allows the deletion of a
    group without acquiring a new reference.

    Signed-off-by: Alex Williamson
    Reviewed-by: Eric Auger
    Reviewed-by: Paolo Bonzini
    Tested-by: Eric Auger
    Cc: stable@vger.kernel.org

    Alex Williamson
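
A minimal userspace sketch of the file/group match described in this commit message. The types `group_model`, `file_ops_model` and `file_model` and the function `group_match_file()` are hypothetical stand-ins, not the kernel's definitions; the sketch assumes a group file can be recognised by a shared ops table and carries its group in a private pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's group and file objects. */
struct group_model { int id; };

struct file_ops_model { const char *name; };

struct file_model {
    const struct file_ops_model *f_op; /* identifies what kind of file this is */
    void *private_data;                /* for a group file: the group it wraps */
};

/* The single ops table that every group file is assumed to share. */
static const struct file_ops_model group_fops = { "group" };

/*
 * Report whether filep is a group file that refers to test_group.
 * No new reference is taken: the caller must already hold one on test_group.
 */
static bool group_match_file(const struct group_model *test_group,
                             const struct file_model *filep)
{
    return filep->f_op == &group_fops && filep->private_data == test_group;
}
```

Because the check only compares pointers, it still works when the group is in a state where taking a new reference would be refused.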
     
  • Unset-KVM and decrement-assignment only when we find the group in our
    list. Otherwise we can get out of sync if the user triggers this for
    groups that aren't currently on our list.

    Signed-off-by: Alex Williamson
    Reviewed-by: Alexey Kardashevskiy
    Reviewed-by: Eric Auger
    Tested-by: Eric Auger
    Acked-by: Paolo Bonzini
    Cc: stable@vger.kernel.org

    Alex Williamson
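
The bookkeeping rule in this commit message can be sketched as follows. `dev_model`, `dev_add_group()` and `dev_del_group()` are hypothetical names for illustration, and the fixed-size array stands in for the real list; the point is only that the counter changes when, and only when, the group is found.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical model of the pseudo device's bookkeeping. */
struct dev_model {
    const void *groups[MAX_GROUPS]; /* groups currently on our list */
    size_t count;
    int assigned;                   /* stands in for the assignment counter */
};

static bool dev_add_group(struct dev_model *dev, const void *group)
{
    if (dev->count == MAX_GROUPS)
        return false;
    dev->groups[dev->count++] = group;
    dev->assigned++;
    return true;
}

/*
 * Remove group from the list. The counter is only touched when the group
 * was actually found, so a bogus request cannot push it out of sync.
 */
static bool dev_del_group(struct dev_model *dev, const void *group)
{
    for (size_t i = 0; i < dev->count; i++) {
        if (dev->groups[i] != group)
            continue;
        dev->groups[i] = dev->groups[--dev->count];
        dev->assigned--;
        return true;
    }
    return false; /* not on our list: leave all state untouched */
}
```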
     

27 Jun, 2017

2 commits


23 Jun, 2017

2 commits

  • Currently external aborts are unsupported by the guest abort
    handling. Add handling for SEAs so that the host kernel reports
    SEAs which occur in the guest kernel.

    When an SEA occurs in the guest kernel, the guest exits and is
    routed to kvm_handle_guest_abort(). Prior to this patch, a message
    about an unsupported FSC was printed and nothing else happened.
    With this patch, the abort is routed to the APEI handling of SEAs
    in the host kernel, which reports the SEA information.

    Signed-off-by: Tyler Baicar
    Acked-by: Catalin Marinas
    Acked-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Signed-off-by: Will Deacon

    Tyler Baicar
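
A rough sketch of the routing decision described in this commit message. The fault status values are an assumption taken from the ARMv8 fault status encoding, and `handle_guest_abort()` with its `abort_outcome` result is a hypothetical simplification; the real handler hands the error to the host's APEI code rather than returning a tag.

```c
#include <stdbool.h>

/* Fault status codes, assumed to follow the ARMv8 fault status encoding. */
enum {
    FSC_SEA       = 0x10, /* synchronous external abort, not on a table walk */
    FSC_SEA_TTW0  = 0x14, /* external abort on a table walk, level 0 */
    FSC_SEA_TTW3  = 0x17, /* external abort on a table walk, level 3 */
    FSC_SECC      = 0x18, /* synchronous parity/ECC error */
    FSC_SECC_TTW0 = 0x1c, /* parity/ECC error on a table walk, level 0 */
    FSC_SECC_TTW3 = 0x1f  /* parity/ECC error on a table walk, level 3 */
};

static bool is_abort_sea(unsigned int fsc)
{
    return fsc == FSC_SEA || fsc == FSC_SECC ||
           (fsc >= FSC_SEA_TTW0 && fsc <= FSC_SEA_TTW3) ||
           (fsc >= FSC_SECC_TTW0 && fsc <= FSC_SECC_TTW3);
}

enum abort_outcome { ABORT_SEA_REPORTED, ABORT_OTHER };

static enum abort_outcome handle_guest_abort(unsigned int fsc)
{
    if (is_abort_sea(fsc)) {
        /* Placeholder for handing the error to the host's APEI code. */
        return ABORT_SEA_REPORTED;
    }
    return ABORT_OTHER; /* regular stage-2 fault handling continues here */
}
```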
     
  • Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64, notifications for
    broken memory can call memory_failure() in mm/memory-failure.c to offline
    pages of memory, possibly signalling user space processes and notifying all
    the in-kernel users.

    memory_failure() has two modes, early and late. Early is used by
    machine-managers like Qemu to receive a notification when a memory error is
    reported to the host. The error can then be relayed to the guest before the
    affected page is accessed. To enable this, the process must call prctl()
    with PR_MCE_KILL, PR_MCE_KILL_SET and PR_MCE_KILL_EARLY.

    Once the early notification has been handled, nothing stops the
    machine-manager or guest from accessing the affected page. If the
    machine-manager does this the page will fail to be mapped and SIGBUS will
    be sent. This patch adds the equivalent path for when the guest accesses
    the page, sending SIGBUS to the machine-manager.

    These two signals can be distinguished by the machine-manager using their
    si_code: BUS_MCEERR_AO for 'action optional' early notifications, and
    BUS_MCEERR_AR for 'action required' synchronous/late notifications.

    Do as x86 does, and deliver the SIGBUS when we discover pfn ==
    KVM_PFN_ERR_HWPOISON. Use the hugepage size as si_addr_lsb if this vma was
    allocated as a hugepage. Transparent hugepages will be split by
    memory_failure() before we see them here.

    Cc: Punit Agrawal
    Signed-off-by: James Morse
    Signed-off-by: Marc Zyngier

    James Morse
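
From the machine-manager's side, the opt-in and the two kinds of SIGBUS could be handled roughly as below. This is a Linux-only sketch: `request_early_mce_signals()`, `classify_sigbus()` and `mce_region_size()` are hypothetical helper names, and the fallback constant values are an assumption for older headers.

```c
#include <signal.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values assumed to match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Opt this process in to early ("action optional") memory error signals. */
static int request_early_mce_signals(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

enum mce_kind { MCE_NONE, MCE_ACTION_OPTIONAL, MCE_ACTION_REQUIRED };

/* Map the si_code of a SIGBUS to the kind of memory error notification. */
static enum mce_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return MCE_ACTION_OPTIONAL; /* early notification, page not yet used */
    case BUS_MCEERR_AR:
        return MCE_ACTION_REQUIRED; /* the poisoned page was accessed */
    default:
        return MCE_NONE;            /* some other bus error */
    }
}

/* Size in bytes of the affected region, derived from si_addr_lsb. */
static unsigned long mce_region_size(unsigned int addr_lsb)
{
    return 1UL << addr_lsb;
}
```

A SIGBUS handler installed with SA_SIGINFO would pass `info->si_code` and `info->si_addr_lsb` to these helpers; an `si_addr_lsb` of 12 describes a 4 KiB page and 21 a 2 MiB hugepage.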
     

20 Jun, 2017

1 commit

  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
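
The entry/head distinction behind the rename can be illustrated with a simplified model. The field names and the singly linked list are assumptions made for brevity and do not mirror the kernel's layout; `add_wait_entry()` and `count_waiters()` are hypothetical helpers.

```c
#include <stddef.h>

/* One waiter: this is the object the new name 'wait_queue_entry' describes. */
struct wait_queue_entry {
    void *waiter;                  /* whoever is waiting, e.g. a task */
    struct wait_queue_entry *next; /* simplified singly linked list */
};
typedef struct wait_queue_entry wait_queue_entry_t;

/* The actual queue: the head that owns the list of entries. */
struct wait_queue_head {
    wait_queue_entry_t *first;
};

static void add_wait_entry(struct wait_queue_head *head,
                           wait_queue_entry_t *entry)
{
    entry->next = head->first;
    head->first = entry;
}

static size_t count_waiters(const struct wait_queue_head *head)
{
    size_t n = 0;

    for (const wait_queue_entry_t *e = head->first; e; e = e->next)
        n++;
    return n;
}
```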
     

15 Jun, 2017

28 commits

  • When reading cntpct_el0 in a guest with VHE (Virtualization Host
    Extensions) enabled in the host, the "Unsupported guest sys_reg access"
    error is reported. The reason is that cnthctl_el2.EL1PCTEN is not enabled,
    which is expected to be done in kvm_timer_init_vhe(). The problem is that
    kvm_timer_init_vhe() is called by cpu_init_hyp_mode(), which is only
    called when VHE is disabled. This patch removes the incorrect call to
    kvm_timer_init_vhe() from cpu_init_hyp_mode(), and calls
    kvm_timer_init_vhe() from cpu_hyp_reinit() to enable cnthctl_el2.EL1PCTEN.

    Fixes: 488f94d7212b ("KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hu Huajun
    Reviewed-by: Christoffer Dall
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Hu Huajun
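
The control-flow change can be modelled in a few lines. `cpu_model` and the three functions below are hypothetical userspace stand-ins, and the bit position used for EL1PCTEN in the VHE register layout is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of EL1PCTEN in the VHE layout of CNTHCTL_EL2. */
#define CNTHCTL_VHE_EL1PCTEN (UINT32_C(1) << 10)

struct cpu_model {
    bool vhe;              /* is the host kernel running at EL2? */
    uint32_t cnthctl;      /* models CNTHCTL_EL2 */
    bool hyp_mode_inited;  /* set by the non-VHE init path */
};

static void timer_init_vhe(struct cpu_model *cpu)
{
    cpu->cnthctl |= CNTHCTL_VHE_EL1PCTEN; /* let the guest read the counter */
}

static void init_hyp_mode(struct cpu_model *cpu)
{
    /* Only reached without VHE, so it must not do the VHE timer setup. */
    cpu->hyp_mode_inited = true;
}

static void hyp_reinit(struct cpu_model *cpu)
{
    if (cpu->vhe)
        timer_init_vhe(cpu); /* the call now lives on the VHE path */
    else
        init_hyp_mode(cpu);
}
```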
     
  • Per ARM DDI 0487B.a, the registers are named ICC_IGRPEN*_EL1 rather than
    ICC_GRPEN*_EL1. Correct our mnemonics and comments to match, before we
    add more GICv3 register definitions.

    Signed-off-by: Mark Rutland
    Cc: Catalin Marinas
    Cc: Marc Zyngier
    Cc: kvmarm@lists.cs.columbia.edu
    Acked-by: Christoffer Dall
    Acked-by: Will Deacon
    Signed-off-by: Christoffer Dall

    Mark Rutland
     
  • A write-to-read-only GICv3 access should UNDEF at EL1. But since
    we're in complete paranoia-land with broken CPUs, let's assume the
    worst and gracefully handle the case.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • A read-from-write-only GICv3 access should UNDEF at EL1. But since
    we're in complete paranoia-land with broken CPUs, let's assume the
    worst and gracefully handle the case.

    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • In order to facilitate debug, let's log which class of GICv3 system
    registers are trapped.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Now that we're able to safely handle common sysreg access, let's
    give the user the opportunity to enable it by passing a specific
    command-line option (vgic_v3.common_trap).

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Signed-off-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading/writing the guest's view of the ICC_PMR_EL1
    register, which is located in the ICH_VMCR_EL2.VPMR field.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
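
A minimal sketch of what such a handler boils down to: extracting and replacing an 8-bit field of a 32-bit register image. The functions `read_pmr()` and `write_pmr()` are hypothetical; the field position (bits 31:24) is taken from the GICv3 layout of ICH_VMCR_EL2 and should be treated as an assumption here.

```c
#include <stdint.h>

#define ICH_VMCR_PMR_SHIFT 24
#define ICH_VMCR_PMR_MASK  (UINT32_C(0xff) << ICH_VMCR_PMR_SHIFT)

/* Guest read: return the priority mask stored in the VPMR field. */
static uint32_t read_pmr(uint32_t vmcr)
{
    return (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
}

/* Guest write: return the register image with only VPMR replaced. */
static uint32_t write_pmr(uint32_t vmcr, uint32_t val)
{
    val = (val << ICH_VMCR_PMR_SHIFT) & ICH_VMCR_PMR_MASK;
    return (vmcr & ~ICH_VMCR_PMR_MASK) | val;
}
```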
     
  • Add a handler for reading/writing the guest's view of the ICV_CTLR_EL1
    register. Only EOImode and CBPR are of interest here, as all the other
    bits come directly from ICH_VTR_EL2 and are read-only.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
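
The write side can be sketched as copying the two writable bits into their virtual counterparts and ignoring everything else. The bit positions are assumptions based on the GICv3 register layouts, and `write_ctlr()` is a hypothetical helper operating on a plain register image.

```c
#include <stdint.h>

/* Writable bits of the guest-visible control register (assumed positions). */
#define ICC_CTLR_CBPR    (UINT32_C(1) << 0)
#define ICC_CTLR_EOIMODE (UINT32_C(1) << 1)

/* Where those bits live in the ICH_VMCR_EL2 image (assumed positions). */
#define ICH_VMCR_CBPR    (UINT32_C(1) << 4)
#define ICH_VMCR_EOIM    (UINT32_C(1) << 9)

/* Apply a guest write of val; all read-only bits in val are ignored. */
static uint32_t write_ctlr(uint32_t vmcr, uint32_t val)
{
    vmcr &= ~(ICH_VMCR_CBPR | ICH_VMCR_EOIM);
    if (val & ICC_CTLR_CBPR)
        vmcr |= ICH_VMCR_CBPR;
    if (val & ICC_CTLR_EOIMODE)
        vmcr |= ICH_VMCR_EOIM;
    return vmcr;
}
```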
     
  • Add a handler for reading the guest's view of the ICV_RPR_EL1
    register, returning the highest active priority.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
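
One way to compute the highest active priority is to scan the active priority registers for the lowest set bit. This sketch assumes that bit n of register i stands for priority level 32 * i + n and that a shift converts a level into an 8-bit priority; `highest_active_priority()` and its parameters are hypothetical.

```c
#include <stdint.h>

#define IDLE_PRIORITY 0xff /* value reported when nothing is active */

/*
 * ap0r/ap1r model the Group-0 and Group-1 active priority registers.
 * Lower levels are more urgent, so the first set bit found wins.
 */
static uint8_t highest_active_priority(const uint32_t *ap0r,
                                       const uint32_t *ap1r,
                                       unsigned int nr_regs,
                                       unsigned int prio_shift)
{
    for (unsigned int i = 0; i < nr_regs; i++) {
        uint32_t val = ap0r[i] | ap1r[i];
        unsigned int bit = 0;

        if (!val)
            continue;
        while (!(val & 1)) { /* find the lowest set bit */
            val >>= 1;
            bit++;
        }
        return (uint8_t)((32 * i + bit) << prio_shift);
    }
    return IDLE_PRIORITY;
}
```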
     
  • Add a handler for writing the guest's view of the ICC_DIR_EL1
    register, performing the deactivation of an interrupt if EOImode
    is set to 1.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Some Cavium Thunder CPUs suffer a problem where a KVM guest may
    inadvertently cause the host kernel to quit receiving interrupts.

    Use the Group-0/1 trapping in order to deal with it.

    [maz]: Adapted patch to the Group-0/1 trapping, reworked commit log

    Tested-by: Alexander Graf
    Acked-by: Catalin Marinas
    Reviewed-by: Eric Auger
    Signed-off-by: David Daney
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    David Daney
     
  • Now that we're able to safely handle Group-0 sysreg access, let's
    give the user the opportunity to enable it by passing a specific
    command-line option (vgic_v3.group0_trap).

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • In order to be able to trap Group-0 GICv3 system registers, we need to
    set ICH_HCR_EL2.TALL0 before entering the guest. This is conditionally
    done after having restored the guest's state, and cleared on exit.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
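
The set-on-entry, clear-on-exit pattern can be sketched on a plain register image; the Group-1 trap bit (TALL1) follows the same scheme. Bit positions are assumptions from the GICv3 layout of ICH_HCR_EL2, and `trap_config`, `hcr_on_guest_entry()` and `hcr_on_guest_exit()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICH_HCR_TALL0 (UINT32_C(1) << 11) /* trap all Group-0 accesses */
#define ICH_HCR_TALL1 (UINT32_C(1) << 12) /* trap all Group-1 accesses */

/* Hypothetical per-host configuration, e.g. from command-line options. */
struct trap_config {
    bool group0_trap;
    bool group1_trap;
};

/* Value to program once the guest's state has been restored. */
static uint32_t hcr_on_guest_entry(uint32_t hcr, const struct trap_config *cfg)
{
    if (cfg->group0_trap)
        hcr |= ICH_HCR_TALL0;
    if (cfg->group1_trap)
        hcr |= ICH_HCR_TALL1;
    return hcr;
}

/* Value to program on exit, so the host never runs with traps enabled. */
static uint32_t hcr_on_guest_exit(uint32_t hcr)
{
    return hcr & ~(ICH_HCR_TALL0 | ICH_HCR_TALL1);
}
```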
     
  • A number of Group-0 registers can be handled by the same accessors
    as those of Group-1, so let's add the required system register encodings
    and catch them in the dispatching function.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading/writing the guest's view of the ICC_IGRPEN0_EL1
    register, which is located in the ICH_VMCR_EL2.VENG0 field.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading/writing the guest's view of the ICC_BPR0_EL1
    register, which is located in the ICH_VMCR_EL2.BPR0 field.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Now that we're able to safely handle Group-1 sysreg access, let's
    give the user the opportunity to enable it by passing a specific
    command-line option (vgic_v3.group1_trap).

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Signed-off-by: Marc Zyngier
    Acked-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • In order to be able to trap Group-1 GICv3 system registers, we need to
    set ICH_HCR_EL2.TALL1 before entering the guest. This is conditionally
    done after having restored the guest's state, and cleared on exit.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Acked-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading the guest's view of the ICV_HPPIR1_EL1
    register. This is a simple parsing of the available LRs, extracting the
    highest available interrupt.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
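
The list register scan can be sketched as below. The LR field positions (state in bits 63:62, group in bit 60, priority in bits 55:48, INTID in the low 32 bits) and the spurious INTID 1023 are assumptions based on the GICv3 architecture; `highest_pending_group1()` is a hypothetical helper.

```c
#include <stdint.h>

#define LR_STATE_SHIFT    62
#define LR_STATE_PENDING  UINT64_C(1)          /* pending, not active */
#define LR_GROUP1         (UINT64_C(1) << 60)
#define LR_PRIORITY_SHIFT 48
#define LR_VINTID_MASK    UINT64_C(0xffffffff)
#define SPURIOUS_INTID    UINT32_C(1023)

/* Return the INTID of the most urgent pending Group-1 interrupt. */
static uint32_t highest_pending_group1(const uint64_t *lrs, unsigned int nr_lrs)
{
    uint32_t best_id = SPURIOUS_INTID;
    unsigned int best_prio = 0x100; /* worse than any 8-bit priority */

    for (unsigned int i = 0; i < nr_lrs; i++) {
        uint64_t lr = lrs[i];
        unsigned int prio = (unsigned int)((lr >> LR_PRIORITY_SHIFT) & 0xff);

        if (((lr >> LR_STATE_SHIFT) & 3) != LR_STATE_PENDING)
            continue; /* only purely pending interrupts qualify */
        if (!(lr & LR_GROUP1))
            continue; /* Group-0 interrupts are not reported here */
        if (prio < best_prio) { /* lower value means higher priority */
            best_prio = prio;
            best_id = (uint32_t)(lr & LR_VINTID_MASK);
        }
    }
    return best_id;
}
```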
     
  • Add a handler for reading/writing the guest's view of the ICV_AP1Rn_EL1
    registers. We just map them to the corresponding ICH_AP1Rn_EL2 registers.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for writing the guest's view of the ICC_EOIR1_EL1
    register. This involves dropping the priority of the interrupt,
    and deactivating it if required (EOImode == 0).

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
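
The two halves of an EOI can be sketched separately: the priority drop clears the most urgent active level, and deactivation follows only when EOImode is 0. This is a simplified Group-1-only model with hypothetical helpers `priority_drop()` and `eoi_also_deactivates()`; the meaning of the register bits is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Clear the lowest set bit (the most urgent active level) across the
 * active priority registers. Returns the dropped level, or -1 if no
 * priority was active.
 */
static int priority_drop(uint32_t *apr, unsigned int nr_regs)
{
    for (unsigned int i = 0; i < nr_regs; i++) {
        uint32_t val = apr[i];
        int bit = 0;

        if (!val)
            continue;
        while (!((val >> bit) & 1))
            bit++;
        apr[i] = val & (val - 1); /* clears exactly the lowest set bit */
        return (int)(32 * i) + bit;
    }
    return -1;
}

/* With EOImode == 0, an EOI write also deactivates the interrupt. */
static bool eoi_also_deactivates(unsigned int eoimode)
{
    return eoimode == 0;
}
```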
     
  • Add a handler for reading the guest's view of the ICC_IAR1_EL1
    register. This involves finding the highest priority Group-1
    interrupt, checking against both PMR and the active group
    priority, activating the interrupt and setting the group
    priority as active.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading/writing the guest's view of the ICC_IGRPEN1_EL1
    register, which is located in the ICH_VMCR_EL2.VENG1 field.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Add a handler for reading/writing the guest's view of the ICC_BPR1_EL1
    register, which is located in the ICH_VMCR_EL2.BPR1 field.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Reviewed-by: Christoffer Dall
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • In order to start handling guest access to GICv3 system registers,
    let's add a hook that will get called when we trap a system register
    access. This is gated by a new static key (vgic_v3_cpuif_trap).

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Reviewed-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • As we're about to trap CP15 accesses and handle them at EL2, we
    need to evaluate whether or not the condition flags are valid,
    as an implementation is allowed to trap despite the condition
    not being met.

    Tagging the function as __hyp_text allows this. We still rely on
    the cc_map array to be mapped at EL2 by virtue of being "const",
    and the linker to only emit relative references.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
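
A table-driven condition check of the kind this commit message refers to can be sketched as follows. The table is indexed by the 4-bit condition code, and each entry is a bitmap over the 16 possible NZCV values; the table contents are derived here from the ARM condition definitions, and `condition_passes()` is a hypothetical name, so treat this as an illustration rather than the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * For condition c, bit f of cc_map[c] is set when the condition holds
 * for flags f, where f packs N, Z, C, V into bits 3, 2, 1, 0.
 */
static const uint16_t cc_map[16] = {
    0xF0F0, /* EQ: Z set */
    0x0F0F, /* NE: Z clear */
    0xCCCC, /* CS: C set */
    0x3333, /* CC: C clear */
    0xFF00, /* MI: N set */
    0x00FF, /* PL: N clear */
    0xAAAA, /* VS: V set */
    0x5555, /* VC: V clear */
    0x0C0C, /* HI: C set and Z clear */
    0xF3F3, /* LS: C clear or Z set */
    0xAA55, /* GE: N == V */
    0x55AA, /* LT: N != V */
    0x0A05, /* GT: Z clear and N == V */
    0xF5FA, /* LE: Z set or N != V */
    0xFFFF, /* AL: always */
    0x0000  /* NV: never */
};

/* Should the trapped instruction have executed, given the guest's flags? */
static bool condition_passes(unsigned int cond, unsigned int nzcv)
{
    return (cc_map[cond & 0xf] >> (nzcv & 0xf)) & 1;
}
```

Keeping the table `const` and free of absolute pointers matters for the reason the commit message gives: the lookup must work when executed from a different mapping.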
     
  • As we're about to access the Active Priority registers a lot more,
    let's define accessors that take the register number as a parameter.

    Tested-by: Alexander Graf
    Acked-by: David Daney
    Reviewed-by: Eric Auger
    Signed-off-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Marc Zyngier
     
  • Marc Zyngier
     

08 Jun, 2017

1 commit

  • The PMU IRQ number is set through the VCPU device's KVM_SET_DEVICE_ATTR
    ioctl handler for the KVM_ARM_VCPU_PMU_V3_IRQ attribute, but there is no
    enforced or stated requirement that this must happen after initializing
    the VGIC. As a result, calling vgic_valid_spi() which relies on the
    nr_spis being set during the VGIC init can incorrectly fail.

    Introduce irq_is_spi, which determines if an IRQ number is within the
    SPI range without verifying it against the actual VGIC properties.

    Signed-off-by: Christoffer Dall
    Reviewed-by: Marc Zyngier

    Christoffer Dall
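
A range check of this kind needs no VGIC state at all. The sketch below assumes the usual GIC INTID layout (0-31 private, SPIs up to 1019); the constant names are hypothetical and `irq_is_spi()` is written as a function here, not necessarily the form the kernel uses.

```c
#include <stdbool.h>

#define NR_PRIVATE_IRQS 32   /* SGIs (0-15) and PPIs (16-31) */
#define MAX_SPI         1019 /* last SPI INTID, assumed from the GIC spec */

/* Purely architectural check: independent of how many SPIs are configured. */
static bool irq_is_spi(unsigned int irq)
{
    return irq >= NR_PRIVATE_IRQS && irq <= MAX_SPI;
}
```

Unlike a check against the configured number of SPIs, this gives the same answer before and after the VGIC has been initialised.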