19 Dec, 2014

1 commit

  • Pull KVM update from Paolo Bonzini:
    "3.19 changes for KVM:

    - spring cleaning: removed support for IA64, and for hardware-
    assisted virtualization on the PPC970

    - ARM, PPC, s390 all had only small fixes

    For x86:
    - small performance improvements (though only on weird guests)
    - usual round of hardware-compliancy fixes from Nadav
    - APICv fixes
    - XSAVES support for hosts and guests. XSAVES hosts were broken
    because the (non-KVM) XSAVES patches inadvertently changed the KVM
    userspace ABI whenever XSAVES was enabled; hence, this part is
    going to stable. Guest support is just a matter of exposing the
    feature and CPUID leaves support"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
    KVM: move APIC types to arch/x86/
    KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
    KVM: PPC: Book3S HV: Improve H_CONFER implementation
    KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
    KVM: PPC: Book3S HV: Remove code for PPC970 processors
    KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
    KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
    arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
    arch: powerpc: kvm: book3s_pr.c: Remove unused function
    arch: powerpc: kvm: book3s.c: Remove some unused functions
    arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
    KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
    KVM: PPC: Book3S HV: ptes are big endian
    KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
    KVM: PPC: Book3S HV: Fix KSM memory corruption
    KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
    KVM: PPC: Book3S HV: Fix computation of tlbie operand
    KVM: PPC: Book3S HV: Add missing HPTE unlock
    KVM: PPC: BookE: Improve irq inject tracepoint
    arm/arm64: KVM: Require in-kernel vgic for the arch timers
    ...

    Linus Torvalds
     

15 Dec, 2014

3 commits

  • …git/kvmarm/kvmarm into HEAD

    Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
    problems, clarifies VCPU init, and fixes a regression concerning the
    VGIC init flow.

    Conflicts:
    arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]

    Paolo Bonzini
     
  • It is currently possible to run a VM with architected timers support
    without creating an in-kernel VGIC, which will result in interrupts from
    the virtual timer going nowhere.

    To address this issue, move the architected timers initialization to the
    time when we run a VCPU for the first time, and then only initialize
    (and enable) the architected timers if we have a properly created and
    initialized in-kernel VGIC.

    When injecting interrupts from the virtual timer to the vgic, the
    current setup should ensure that this never calls an on-demand init of
    the VGIC, which is the only call path that could return an error from
    kvm_vgic_inject_irq(), so capture the return value and raise a warning
    if there's an error there.

    We also change the kvm_timer_init() function from returning an int to be
    a void function, since the function always succeeds.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Userspace assumes that it can wire up IRQ injections after having
    created all VCPUs and after having created the VGIC, but potentially
    before starting the first VCPU. This can currently lead to lost IRQs
    because the state of that IRQ injection is not stored anywhere and we
    don't return an error to userspace.

    We haven't seen this problem manifest itself yet, presumably because
    guests reset the devices on boot, but this could cause issues with
    migration and other non-standard startup configurations.

    Reviewed-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

13 Dec, 2014

3 commits

  • Some code paths will need to check to see if the internal state of the
    vgic has been initialized (such as when creating new VCPUs), so
    introduce such a macro that checks the nr_cpus field which is set when
    the vgic has been initialized.

    Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
    vgic_init() will call this function, and code should never erroneously
    assume the vgic to be properly initialized after an error.
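
    As a rough illustration of the idea above, the sketch below models an
    "initialized" check keyed on nr_cpus, with the destroy path resetting
    the field so that a failed init is never reported as initialized. All
    names ending in _sketch, the struct layout, and the simulated failure
    flag are assumptions made for this example; they are not the kernel's
    actual definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the distributor state. */
struct vgic_dist_sketch {
    int nr_cpus;   /* non-zero once internal state has been set up */
    bool ready;    /* set later, when the VM can actually be run */
};

/* Internal state is considered initialized once nr_cpus is non-zero. */
#define vgic_initialized_sketch(d) ((d)->nr_cpus > 0)

/*
 * Tear-down, also used on the init error path: reset nr_cpus so the
 * macro above cannot report a half-built vgic as initialized.
 */
void vgic_destroy_sketch(struct vgic_dist_sketch *d)
{
    d->nr_cpus = 0;
    d->ready = false;
}

/* Returns 0 on success, -1 on a simulated failure (fail == true). */
int vgic_init_sketch(struct vgic_dist_sketch *d, int nr_cpus, bool fail)
{
    d->nr_cpus = nr_cpus;
    if (fail) {
        vgic_destroy_sketch(d);
        return -1;
    }
    return 0;
}
```

    Keeping the reset inside the destroy helper means every error path
    that tears the state down also clears the "initialized" indication.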

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • The vgic_initialized() macro currently returns the state of the
    vgic->ready flag, which indicates if the vgic is ready to be used when
    running a VM, not specifically if its internal state has been
    initialized.

    Rename the macro accordingly in preparation for a more nuanced
    initialization flow.

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • VGIC initialization currently happens in three phases:
    (1) kvm_vgic_create() (triggered by userspace GIC creation)
    (2) vgic_init_maps() (triggered by userspace GIC register read/write
    requests, or from kvm_vgic_init() if not already run)
    (3) kvm_vgic_init() (triggered by first VM run)

    We were doing initialization of some state to correspond with the
    state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
    since it will overwrite changes made by userspace using the
    register access APIs before the VM is run. Move this initialization
    earlier, into the vgic_init_maps() phase.

    This fixes a bug where QEMU could successfully restore a saved
    VM state snapshot into a VM that had already been run, but could
    not restore it "from cold" using the -loadvm command line option
    (the symptoms being that the restored VM would run but interrupts
    were ignored).

    Finally, rename vgic_init_maps to vgic_init and rename kvm_vgic_init to
    kvm_vgic_map_resources.

    [ This patch is originally written by Peter Maydell, but I have
    modified it somewhat heavily, renaming various bits and moving code
    around. If something is broken, I am to be blamed. - Christoffer ]

    Acked-by: Marc Zyngier
    Reviewed-by: Eric Auger
    Signed-off-by: Peter Maydell
    Signed-off-by: Christoffer Dall

    Peter Maydell
     

04 Dec, 2014

6 commits

  • We currently track the pid of the task that runs the VCPU in vcpu_load.
    If a yield to that VCPU is triggered while the PID of the wrong thread
    is active, the wrong thread might receive a yield, but this will most
    likely not help the executing thread at all. Instead, if we only track
    the pid on the KVM_RUN ioctl, there are two possibilities:

    1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
    the VCPU thread is waiting for. In this case, the VCPU thread is not
    runnable, but we also do not do a wrong yield.

    2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
    something that does not block the VCPU thread. In this case, the
    VCPU thread can receive the directed yield correctly.

    Signed-off-by: Christian Borntraeger
    CC: Rik van Riel
    CC: Raghavendra K T
    CC: Michael Mueller
    Signed-off-by: Paolo Bonzini

    Christian Borntraeger
     
  • kvm_enter_guest() has to be called with preemption disabled and will
    set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread
    is running and therefore needs no yield.

    However, the check on PF_VCPU is wrong on s390, where preemption has
    to stay enabled in order to correctly process page faults. Thus,
    s390 reenables preemption and starts to execute the guest. The thread
    might be scheduled out between kvm_enter_guest() and kvm_exit_guest(),
    resulting in PF_VCPU being set but not being run. When this happens,
    the opportunity for directed yield is missed.

    However, this check is done already in kvm_vcpu_on_spin before calling
    kvm_vcpu_yield_loop:

    if (!ACCESS_ONCE(vcpu->preempted))
            continue;

    so the check on PF_VCPU is superfluous in general, and this patch
    removes it.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Paolo Bonzini

    David Hildenbrand
     
  • The current linear search doesn't scale well when a
    large number of memslots is used and the slot being looked up
    is not near the beginning of the memslots array.
    Taking into account that memslots don't overlap, it's
    possible to switch the sorting order of the memslots array from
    'npages' to 'base_gfn' and use binary search for
    memslot lookup by GFN.

    As a result of switching to binary search, lookup times
    are reduced when a large number of memslots is in use.

    Following is a table of search_memslot() cycles
    during WS2008R2 guest boot.

                                   boot,         boot + ~10 min
                                   mostly same   of using it,
                                   slot lookup   randomized lookup
                          max      average       average
                          cycles   cycles        cycles

    13 slots            : 1450     28            30

    13 slots            : 1400     30            40
    binary search

    117 slots           : 13000    30            460

    117 slots           : 2000     35            180
    binary search
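
    The lookup described in this commit message can be sketched roughly as
    follows. This is a minimal illustration, not the kernel's
    search_memslot(): the struct, the function name, and the choice of
    ascending base_gfn order are assumptions made for the example. It
    relies only on the property stated above, namely that slots do not
    overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified slot covering [base_gfn, base_gfn + npages). */
struct slot_sketch {
    uint64_t base_gfn;
    uint64_t npages;
};

/*
 * Binary search over slots sorted by ascending base_gfn. Because the
 * slots do not overlap, at most one slot can contain gfn.
 * Returns the matching slot, or NULL if gfn falls into a hole.
 */
const struct slot_sketch *search_slots_sketch(const struct slot_sketch *slots,
                                              size_t nslots, uint64_t gfn)
{
    size_t lo = 0, hi = nslots;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct slot_sketch *s = &slots[mid];

        if (gfn < s->base_gfn)
            hi = mid;              /* gfn lies before this slot */
        else if (gfn - s->base_gfn >= s->npages)
            lo = mid + 1;          /* gfn lies after this slot */
        else
            return s;              /* gfn lies inside this slot */
    }
    return NULL;
}
```

    Each probe halves the remaining range, so the number of comparisons
    grows with log2 of the slot count instead of linearly.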

    Signed-off-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Igor Mammedov
     
  • This will allow binary search to be used for GFN -> memslot
    lookups, reducing lookup cost when the number of slots is large.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Igor Mammedov
     
  • The UP/DOWN shift loops will shift the array in the needed
    direction and stop at the place where the new slot should
    be placed, regardless of the old slot size.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Igor Mammedov
     
  • If the number of pages hasn't changed, the sorting algorithm
    will do nothing, so there is no need for an extra check
    to avoid entering the sorting logic.

    Signed-off-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Igor Mammedov
     

26 Nov, 2014

3 commits

  • This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
    kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

    The problem being addressed by the patch above was that some ARM code
    based the memory mapping attributes of a pfn on the return value of
    kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
    be mapped as device memory.

    However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
    and the existing non-ARM users were already using it in a way which
    suggests that its name should probably have been 'kvm_is_reserved_pfn'
    from the beginning, e.g., whether or not to call get_page/put_page on
    it etc. This means that returning false for the zero page is a mistake
    and the patch above should be reverted.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Paolo Bonzini

    Ard Biesheuvel
     
  • If we detect another vCPU is running we just exit and return 0 as if we
    successfully created the VGIC, but the VGIC wouldn't actually be created.

    This shouldn't break in-kernel behavior because the kernel will not
    observe the failed attempt to create the VGIC, but userspace could
    be rightfully confused.

    Cc: Andre Przywara
    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier
    Signed-off-by: Paolo Bonzini

    Christoffer Dall
     
  • When calling kvm_vgic_inject_irq to inject an interrupt, we can know
    which vcpu the interrupt is for from the irq_num and the cpuid. So we
    should just kick this vcpu to avoid iterating through all of them.

    Reviewed-by: Christoffer Dall
    Signed-off-by: Shannon Zhao
    Signed-off-by: Marc Zyngier

    Shannon Zhao
     

25 Nov, 2014

3 commits

  • When 'injecting' an edge-triggered interrupt with a falling edge we
    shouldn't clear the pending state on the distributor. In fact, we
    don't, because the check in vgic_validate_injection would prevent us
    from ever reaching this bit of code.

    Remove the unreachable snippet.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Marc Zyngier

    Christoffer Dall
     
  • This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
    kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

    The problem being addressed by the patch above was that some ARM code
    based the memory mapping attributes of a pfn on the return value of
    kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
    be mapped as device memory.

    However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
    and the existing non-ARM users were already using it in a way which
    suggests that its name should probably have been 'kvm_is_reserved_pfn'
    from the beginning, e.g., whether or not to call get_page/put_page on
    it etc. This means that returning false for the zero page is a mistake
    and the patch above should be reverted.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier

    Ard Biesheuvel
     
  • When vgic_update_irq_pending is called for a level-sensitive interrupt
    with level false, it only needs to deactivate the interrupt and can then
    go to out directly. Return false here, because there is no need to kick.

    Signed-off-by: wanghaibin
    Signed-off-by: Marc Zyngier

    wanghaibin
     

24 Nov, 2014

1 commit

  • Now that ia64 is gone, we can hide deprecated device assignment in x86.

    Notable changes:
    - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

    The easy parts were removed from generic kvm code, remaining
    - kvm_iommu_(un)map_pages() would require new code to be moved
    - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier

    Signed-off-by: Radim Krčmář
    Signed-off-by: Paolo Bonzini

    Radim Krčmář
     

22 Nov, 2014

1 commit


20 Nov, 2014

1 commit

  • KVM for ia64 has been marked as broken not just once, but twice even,
    and the last patch from the maintainer is now roughly 5 years old.
    Time for it to rest in peace.

    Acked-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

17 Nov, 2014

2 commits


14 Nov, 2014

2 commits

  • This completes the optimization from the previous patch, by
    removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot.

    Reviewed-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c)
    would take O(n log n) time to update it; an optimized insertion sort will
    only cost O(n) on an array with just one item out of order.

    Replace sort() with a custom sort that takes advantage of memslots usage
    pattern and the known position of the changed slot.

    Performance change for 128 memslot insertions with gradually increasing
    size (the worst case):

              heap sort   custom sort
    max:      249747      2500          cycles

    with the custom sort algorithm taking ~98% less than the original
    update time.
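
    A minimal sketch of the single-element repositioning described above,
    assuming an array kept in descending npages order in which only the
    entry at a known index has changed. The struct and function names are
    hypothetical and the code is not taken from the patch itself.

```c
#include <stddef.h>

/* Hypothetical, simplified memslot. */
struct mslot_sketch {
    int id;
    unsigned long npages;
};

/*
 * slots[] is sorted by descending npages except possibly for the entry
 * at index i, whose npages has just changed. Shift the neighbours over
 * it until it is back in place: O(n) moves instead of a full re-sort.
 * Returns the entry's new index.
 */
size_t reposition_slot_sketch(struct mslot_sketch *slots, size_t n, size_t i)
{
    struct mslot_sketch changed = slots[i];

    /* Entry grew: move it towards the front. */
    while (i > 0 && slots[i - 1].npages < changed.npages) {
        slots[i] = slots[i - 1];
        i--;
    }
    /* Entry shrank: move it towards the back. */
    while (i + 1 < n && slots[i + 1].npages > changed.npages) {
        slots[i] = slots[i + 1];
        i++;
    }
    slots[i] = changed;
    return i;
}
```

    If the size did not change, neither loop body runs and the entry is
    written back to the same index.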

    Signed-off-by: Igor Mammedov
    Signed-off-by: Paolo Bonzini

    Igor Mammedov
     

03 Nov, 2014

2 commits

  • commit 72dc67a69690 ("KVM: remove the usage of the mmap_sem for the protection of the memory slots.")
    changed the lock which will be taken. This should be reflected in the function
    commentary.

    Signed-off-by: Dominik Dingel
    Signed-off-by: Paolo Bonzini

    Dominik Dingel
     
  • KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM
    (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
    FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
    physical destination modes."

    In addition, the local-apic enables cluster mode broadcast. As Intel SDM
    10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all
    destination bits to one." This patch enables cluster mode broadcast.

    The fix tries to handle broadcast in the different modes through unified code.

    One rare case occurs when the source of IPI has its APIC disabled. In such
    case, the source can still issue IPIs, but since the source is not obliged to
    have the same LAPIC mode as the enabled ones, we cannot rely on it.
    Since it is a rare case, it is unoptimized and done on the slow-path.
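
    The broadcast rule quoted from the SDM can be sketched as a single
    predicate. The names below are made up for illustration and do not
    reproduce the patch; the only assumptions are that an xAPIC
    destination is 8 bits wide and an x2APIC destination is 32 bits wide,
    so that "all destination bits set to one" means 0xff and 0xffffffff
    respectively.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: all destination bits set in each mode. */
#define XAPIC_BROADCAST_SKETCH  0xffu
#define X2APIC_BROADCAST_SKETCH 0xffffffffu

/* True if dest addresses every local APIC in the given mode. */
bool apic_is_broadcast_sketch(uint32_t dest, bool x2apic_mode)
{
    return dest == (x2apic_mode ? X2APIC_BROADCAST_SKETCH
                                : XAPIC_BROADCAST_SKETCH);
}
```

    Note that 0xff is an ordinary destination ID in x2APIC mode, which is
    why the mode has to be part of the check.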

    Signed-off-by: Nadav Amit
    Reviewed-by: Radim Krčmář
    Reviewed-by: Wanpeng Li
    [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from
    kvm_apic_broadcast. - Paolo]
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     

24 Oct, 2014

2 commits

  • After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically),
    kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
    (kvm-vfio: do not use module_init) moved the dynamic registration into
    kvm_init in order to fix broken unloading of the kvm module. However,
    kvm_device_ops of vfio remains registered after rmmod of the kvm-intel
    module, which leads to a device type collision detection warning when
    the kvm-intel module is inserted again.

    WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
    Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
    CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
    0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
    0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
    ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
    Call Trace:
    [] dump_stack+0x49/0x60
    [] warn_slowpath_common+0x7c/0x96
    [] ? kvm_init+0x234/0x282 [kvm]
    [] warn_slowpath_null+0x15/0x17
    [] kvm_init+0x234/0x282 [kvm]
    [] vmx_init+0x1bf/0x42a [kvm_intel]
    [] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
    [] do_one_initcall+0xe3/0x170
    [] ? __vunmap+0xad/0xb8
    [] do_init_module+0x2b/0x174
    [] load_module+0x43e/0x569
    [] ? do_init_module+0x174/0x174
    [] ? copy_module_from_user+0x39/0x82
    [] ? module_sect_show+0x20/0x20
    [] SyS_init_module+0x54/0x81
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 0626f4a3ddea56f3 ]---

    The bug can be reproduced by:

    rmmod kvm_intel.ko
    insmod kvm_intel.ko

    without rmmod/insmod kvm.ko
    This patch fixes the bug by unregistering kvm_device_ops of vfio when the
    kvm-intel module is removed.
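
    The collision can be pictured with a small table of registered device
    ops. The sketch below is a simplified model with hypothetical names
    and a hypothetical table size, not the kvm_main.c code: registering a
    type whose slot is still occupied fails, and unregistering on module
    removal is what lets the next registration succeed.

```c
#include <stddef.h>

#define MAX_DEV_TYPES_SKETCH 4   /* hypothetical table size */

/* Hypothetical, simplified device ops. */
struct dev_ops_sketch {
    const char *name;
};

static const struct dev_ops_sketch *ops_table_sketch[MAX_DEV_TYPES_SKETCH];

/* Returns 0 on success, -1 if type is out of range or already taken. */
int register_dev_ops_sketch(const struct dev_ops_sketch *ops, unsigned int type)
{
    if (type >= MAX_DEV_TYPES_SKETCH || ops_table_sketch[type] != NULL)
        return -1;
    ops_table_sketch[type] = ops;
    return 0;
}

/* Clears the slot so that a later re-registration succeeds. */
void unregister_dev_ops_sketch(unsigned int type)
{
    if (type < MAX_DEV_TYPES_SKETCH)
        ops_table_sketch[type] = NULL;
}
```

    In this model, a remove/insert cycle that skips the unregister step
    hits the "already taken" branch on the second registration.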

    Reported-by: Liu Rongrong
    Fixes: 3c3c29fd0d7cddc32862c350d0700ce69953e3bd
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     
  • The third parameter of kvm_unpin_pages() when called from
    kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
    and not the page size.

    This error was facilitated by an inconsistent API: kvm_pin_pages() takes
    a size, but kvm_unpin_pages() takes a number of pages, so fix the problem
    by matching the two.
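
    To illustrate why matching the units matters, here is a toy model in
    which both helpers take a page count and the byte size is converted
    exactly once. The helper names, the 4 KiB page size, and the small
    pin-count array are assumptions made for this sketch; they are not
    the kernel functions.

```c
#define PAGE_SHIFT_SKETCH 12   /* assumes 4 KiB pages */
#define MAX_PFN_SKETCH    64   /* size of the toy pin-count table */

static int pin_count_sketch[MAX_PFN_SKETCH];

/* Both helpers take a page count, so callers cannot mix up units. */
void pin_pages_sketch(unsigned long pfn, unsigned long npages)
{
    for (unsigned long i = 0; i < npages && pfn + i < MAX_PFN_SKETCH; i++)
        pin_count_sketch[pfn + i]++;
}

void unpin_pages_sketch(unsigned long pfn, unsigned long npages)
{
    for (unsigned long i = 0; i < npages && pfn + i < MAX_PFN_SKETCH; i++)
        pin_count_sketch[pfn + i]--;
}

/* Convert a mapping size in bytes to the page count the helpers expect. */
unsigned long bytes_to_pages_sketch(unsigned long bytes)
{
    return bytes >> PAGE_SHIFT_SKETCH;
}

/* Current pin count of a page frame (0 for out-of-range frames). */
int pin_count_of_sketch(unsigned long pfn)
{
    return pfn < MAX_PFN_SKETCH ? pin_count_sketch[pfn] : 0;
}
```

    Passing the byte size straight to the unpin helper in this model
    would drop pins on pages that were never pinned by the caller, which
    is the kind of mismatch the commit describes.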

    This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
    of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
    un-pinning for pages intended to be un-pinned (i.e. memory leak) but
    unfortunately potentially aggravated the number of pages we un-pin that
    should have stayed pinned. As far as I understand though, the same
    practical mitigations apply.

    This issue was found during review of Red Hat 6.6 patches to prepare
    Ksplice rebootless updates.

    Thanks to Vegard for his time on a late Friday evening to help me in
    understanding this code.

    Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
    Cc: stable@vger.kernel.org
    Signed-off-by: Quentin Casasnovas
    Signed-off-by: Vegard Nossum
    Signed-off-by: Jamie Iles
    Reviewed-by: Sasha Levin
    Signed-off-by: Paolo Bonzini

    Quentin Casasnovas
     

19 Oct, 2014

1 commit

  • Pull second batch of changes for KVM/{arm,arm64} from Marc Zyngier:
    "The most obvious thing is the sizeable MMU changes to support 48bit
    VAs on arm64.

    Summary:

    - support for 48bit IPA and VA (EL2)
    - a number of fixes for devices mapped into guests
    - yet another VGIC fix for BE
    - a fix for CPU hotplug
    - a few compile fixes (disabled VGIC, strict mm checks)"

    [ I'm pulling directly from Marc at the request of Paolo Bonzini, whose
    backpack was stolen at Düsseldorf airport and will do new keys and
    rebuild his web of trust. - Linus ]

    * tag 'kvm-arm-for-3.18-take-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
    arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs
    arm: kvm: STRICT_MM_TYPECHECKS fix for user_mem_abort
    arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE
    arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
    arm/arm64: KVM: map MMIO regions at creation time
    arm64: kvm: define PAGE_S2_DEVICE as read-only by default
    ARM: kvm: define PAGE_S2_DEVICE as read-only by default
    arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap
    arm/arm64: KVM: fix potential NULL dereference in user_mem_abort()
    arm/arm64: KVM: use __GFP_ZERO not memset() to get zeroed pages
    ARM: KVM: fix vgic-disabled build
    arm: kvm: fix CPU hotplug

    Linus Torvalds
     

16 Oct, 2014

1 commit

  • The EISR and ELRSR registers are 32-bit registers on GICv2, and we
    store these as an array of two such registers on the vgic vcpu struct.
    However, we access them as a single 64-bit value or as a bitmap pointer
    in the generic vgic code, which breaks BE support.

    Instead, store them as u64 values on the vgic structure and do the
    word-swapping in the assembly code, which already handles the byte order
    for BE systems.
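
    The byte-order problem can be illustrated in plain C. Reading a
    two-element u32 array through a u64 pointer puts element 0 in the
    upper half on a big-endian host, whereas combining the words
    explicitly gives the same value on any host. The helper names below
    are hypothetical and this is not the vgic code, only a sketch of the
    representation issue.

```c
#include <stdint.h>

/*
 * Build the 64-bit view explicitly: word0 supplies bits 31..0 and
 * word1 supplies bits 63..32, independent of host byte order.
 */
uint64_t regs_to_u64_sketch(uint32_t word0, uint32_t word1)
{
    return ((uint64_t)word1 << 32) | word0;
}

/* Test bit n (0..63) of the combined value. */
int reg_test_bit_sketch(uint64_t val, unsigned int n)
{
    return (int)((val >> n) & 1u);
}
```

    With the value held as a single u64, generic code can test bit n
    directly without knowing how the two hardware words were laid out in
    memory.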

    Tested-by: Victor Kamensky
    Acked-by: Marc Zyngier
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     

15 Oct, 2014

1 commit

  • Pull IOMMU updates from Joerg Roedel:
    "This pull-request includes:

    - change in the IOMMU-API to convert the former iommu_domain_capable
    function to just iommu_capable

    - various fixes in handling RMRR ranges for the VT-d driver (one fix
    requires a device driver core change which was acked by Greg KH)

    - the AMD IOMMU driver now assigns and deassigns complete alias
    groups to fix issues with devices using the wrong PCI request-id

    - MMU-401 support for the ARM SMMU driver

    - multi-master IOMMU group support for the ARM SMMU driver

    - various other small fixes all over the place"

    * tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
    iommu/vt-d: Work around broken RMRR firmware entries
    iommu/vt-d: Store bus information in RMRR PCI device path
    iommu/vt-d: Only remove domain when device is removed
    driver core: Add BUS_NOTIFY_REMOVED_DEVICE event
    iommu/amd: Fix devid mapping for ivrs_ioapic override
    iommu/irq_remapping: Fix the regression of hpet irq remapping
    iommu: Fix bus notifier breakage
    iommu/amd: Split init_iommu_group() from iommu_init_device()
    iommu: Rework iommu_group_get_for_pci_dev()
    iommu: Make of_device_id array const
    amd_iommu: do not dereference a NULL pointer address.
    iommu/omap: Remove omap_iommu unused owner field
    iommu: Remove iommu_domain_has_cap() API function
    IB/usnic: Convert to use new iommu_capable() API function
    vfio: Convert to use new iommu_capable() API function
    kvm: iommu: Convert to use new iommu_capable() API function
    iommu/tegra: Convert to iommu_capable() API function
    iommu/msm: Convert to iommu_capable() API function
    iommu/vt-d: Convert to iommu_capable() API function
    iommu/fsl: Convert to iommu_capable() API function
    ...

    Linus Torvalds
     

10 Oct, 2014

2 commits

  • Add support for read-only MMIO passthrough mappings by adding a
    'writable' parameter to kvm_phys_addr_ioremap. For the moment,
    mappings will be read-write even if 'writable' is false, but once
    the definition of PAGE_S2_DEVICE gets changed, those mappings will
    be created read-only.

    Acked-by: Marc Zyngier
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Christoffer Dall

    Ard Biesheuvel
     
  • Pull PCI updates from Bjorn Helgaas:
    "The interesting things here are:

    - Turn on Config Request Retry Status Software Visibility. This
    caused hangs last time, but we included a fix this time.
    - Rework PCI device configuration to use _HPP/_HPX more aggressively
    - Allow PCI devices to be put into D3cold during system suspend
    - Add arm64 PCI support
    - Add APM X-Gene host bridge driver
    - Add TI Keystone host bridge driver
    - Add Xilinx AXI host bridge driver

    More detailed summary:

    Enumeration
    - Check Vendor ID only for Config Request Retry Status (Rajat Jain)
    - Enable Config Request Retry Status when supported (Rajat Jain)
    - Add generic domain handling (Catalin Marinas)
    - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado)

    Resource management
    - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu)
    - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr)

    PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Move _HPP & _HPX handling into core (Bjorn Helgaas)
    - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas)
    - Apply _HPP/_HPX to display devices (Bjorn Helgaas)
    - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas)
    - Fix wait time in pciehp timeout message (Yinghai Lu)
    - Add more pciehp Slot Control debug output (Yinghai Lu)
    - Stop disabling pciehp notifications during init (Yinghai Lu)

    MSI
    - Remove arch_msi_check_device() (Alexander Gordeev)
    - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev)
    - Move D0 check into pci_msi_check_device() (Alexander Gordeev)
    - Remove unused kobject from struct msi_desc (Yijing Wang)
    - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang)
    - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang)
    - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang)
    - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang)
    - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang)

    Power management
    - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki)
    - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki)

    AER
    - Add additional AER error strings (Gong Chen)
    - Make standalone includable (Thierry Reding)

    Virtualization
    - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson)
    - Add ACS quirk for Intel 10G NICs (Alex Williamson)
    - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp)
    - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson)
    - Add device flag helpers (Ethan Zhao)
    - Assume all Mellanox devices have broken INTx masking (Gavin Shan)

    Generic host bridge driver
    - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau)
    - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau)
    - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau)
    - Fix the conversion of IO ranges into IO resources (Liviu Dudau)
    - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau)
    - Add support for parsing PCI host bridge resources from DT (Liviu Dudau)
    - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau)
    - Add arm64 architectural support for PCI (Liviu Dudau)

    APM X-Gene
    - Add APM X-Gene PCIe driver (Tanmay Inamdar)
    - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar)

    Freescale i.MX6
    - Probe in module_init(), not fs_initcall() (Lucas Stach)
    - Delay enabling reference clock for SS until it stabilizes (Tim Harvey)

    Marvell MVEBU
    - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni)

    NVIDIA Tegra
    - Make sure the PCIe PLL is really reset (Eric Yuen)
    - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang)
    - Fix extended configuration space mapping (Peter Daifuku)
    - Implement resource hierarchy (Thierry Reding)
    - Clear CLKREQ# enable on port disable (Thierry Reding)
    - Add Tegra124 support (Thierry Reding)

    ST Microelectronics SPEAr13xx
    - Pass config resource through reg property (Pratyush Anand)

    Synopsys DesignWare
    - Use NULL instead of false (Fabio Estevam)
    - Parse bus-range property from devicetree (Lucas Stach)
    - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach)
    - Remove pci_assign_unassigned_resources() (Lucas Stach)
    - Check private_data validity in single place (Lucas Stach)
    - Setup and clear exactly one MSI at a time (Lucas Stach)
    - Remove open-coded bitmap operations (Lucas Stach)
    - Fix configuration base address when using 'reg' (Minghuan Lian)
    - Fix IO resource end address calculation (Minghuan Lian)
    - Rename get_msi_data() to get_msi_addr() (Minghuan Lian)
    - Add get_msi_data() to pcie_host_ops (Minghuan Lian)
    - Add support for v3.65 hardware (Murali Karicheri)
    - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand)

    TI Keystone
    - Add TI Keystone PCIe driver (Murali Karicheri)
    - Limit MRSS for all downstream devices (Murali Karicheri)
    - Assume controller is already in RC mode (Murali Karicheri)
    - Set device ID based on SoC to support multiple ports (Murali Karicheri)

    Xilinx AXI
    - Add Xilinx AXI PCIe driver (Srikanth Thokala)
    - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter)

    Miscellaneous
    - Clean up whitespace (Quentin Lambert)
    - Remove assignments from "if" conditions (Quentin Lambert)
    - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri)
    - x86: Mark DMI tables as initialization data (Mathias Krause)
    - x86: Move __init annotation to the correct place (Mathias Krause)
    - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause)
    - x86: Constify pci_mmcfg_probes[] array (Mathias Krause)
    - x86: Mark PCI BIOS initialization code as such (Mathias Krause)
    - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya)
    - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)"

    * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits)
    arm64: dts: Add APM X-Gene PCIe device tree nodes
    PCI: Add ACS quirk for AMD A88X southbridge devices
    PCI: xgene: Add APM X-Gene PCIe driver
    PCI: designware: Remove open-coded bitmap operations
    PCI/MSI: Remove unnecessary temporary variable
    PCI/MSI: Use __write_msi_msg() instead of write_msi_msg()
    MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg()
    PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg()
    PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints
    PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib
    PCI/MSI: Remove unused kobject from struct msi_desc
    PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported()
    PCI/MSI: Move D0 check into pci_msi_check_device()
    PCI/MSI: Remove arch_msi_check_device()
    irqchip: armada-370-xp: Remove arch_msi_check_device()
    PCI/MSI/PPC: Remove arch_msi_check_device()
    arm64: Add architectural support for PCI
    PCI: Add pci_remap_iospace() to map bus I/O resources
    of/pci: Add support for parsing PCI host bridge resources from DT
    of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr()
    ...

    Conflicts:
    arch/arm64/boot/dts/apm-storm.dtsi

    Linus Torvalds
     

08 Oct, 2014

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "Fixes and features for 3.18.

    Apart from the usual cleanups, here is the summary of new features:

    - s390 moves closer towards host large page support

    - PowerPC has improved support for debugging (both inside the guest
    and via gdbstub) and support for e6500 processors

    - ARM/ARM64 support read-only memory (which is necessary to put
    firmware in emulated NOR flash)

    - x86 has the usual emulator fixes and nested virtualization
    improvements (including improved Windows support on Intel and
    Jailhouse hypervisor support on AMD), and adaptive PLE, which helps
    with overcommitting huge guests. Also included are some patches that
    make KVM more friendly to memory hot-unplug, and fixes for rare
    caching bugs.

    Two patches have trivial mm/ parts that were acked by Rik and Andrew.

    Note: I will soon switch to a subkey for signing purposes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
    kvm: do not handle APIC access page if in-kernel irqchip is not in use
    KVM: s390: count vcpu wakeups in stat.halt_wakeup
    KVM: s390/facilities: allow TOD-CLOCK steering facility bit
    KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
    arm/arm64: KVM: Report correct FSC for unsupported fault types
    arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
    kvm: Fix kvm_get_page_retry_io __gup retval check
    arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
    kvm: x86: Unpin and remove kvm_arch->apic_access_page
    kvm: vmx: Implement set_apic_access_page_addr
    kvm: x86: Add request bit to reload APIC access page address
    kvm: Add arch specific mmu notifier for page invalidation
    kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
    kvm: Fix page ageing bugs
    kvm/x86/mmu: Pass gfn and level to rmapp callback.
    x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
    kvm: x86: use macros to compute bank MSRs
    KVM: x86: Remove debug assertion of non-PAE reserved bits
    kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
    kvm: Faults which trigger IO release the mmap_sem
    ...

    Linus Torvalds
     

02 Oct, 2014

1 commit


27 Sep, 2014

1 commit

  • …marm/kvmarm into kvm-next

    Changes for KVM for arm/arm64 for 3.18

    This includes a bunch of changes:
    - Support read-only memory slots on arm/arm64
    - Various changes to fix Sparse warnings
    - Correctly detect write vs. read Stage-2 faults
    - Various VGIC cleanups and fixes
    - Dynamic VGIC data structure sizing
    - Fix SGI set_clear_pend offset bug
    - Fix VTTBR_BADDR Mask
    - Correctly report the FSC on Stage-2 faults

    Conflicts:
    virt/kvm/eventfd.c
    [duplicate, different patch where the kvm-arm version broke x86.
    The kvm tree instead has the right one]

    Paolo Bonzini
     

26 Sep, 2014

2 commits

  • There was confusion around -EBUSY and zero (inside a BUG_ON, no less).

    Reported-by: Andrea Arcangeli
    Signed-off-by: Andres Lagar-Cavilla
    Signed-off-by: Paolo Bonzini

    Andres Lagar-Cavilla
     
    The sgi values calculated in read_set_clear_sgi_pend_reg() and
    write_set_clear_sgi_pend_reg() were incorrectly multiplied by 4,
    with catastrophic results: subfunctions ended up overwriting memory
    that was not allocated for the expected purpose.

    This showed up as bugs in kfree() and the kernel complaining a lot if
    you turn on memory debugging.

    This addresses: http://marc.info/?l=kvm&m=141164910007868&w=2

    Reported-by: Shannon Zhao
    Signed-off-by: Christoffer Dall

    Christoffer Dall
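
The offset bug described in the commit above can be illustrated with a small model. This is only a sketch, not the kernel's code: `struct sgi_state`, `first_sgi_fixed()`, `first_sgi_buggy()` and `write_set_pending()` are hypothetical names, and the layout (16 SGIs, one pending byte per SGI, so a 32-bit access at byte offset N covers SGIs N to N+3) is an assumption based on the GICv2 SGI pending registers.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SGIS 16  /* assumed: 16 SGIs, one pending byte each */

/* Hypothetical stand-in for the per-vCPU SGI pending bytes. */
struct sgi_state {
    uint8_t sources[NR_SGIS];
};

/* One byte per SGI: the word-aligned byte offset is already the SGI index. */
static unsigned int first_sgi_fixed(unsigned int offset)
{
    return offset & ~0x3u;
}

/* The extra "* 4" pushes every offset except 0 past the end of the array. */
static unsigned int first_sgi_buggy(unsigned int offset)
{
    return (offset & ~0x3u) * 4;
}

/* Apply a 32-bit "set pending" write, one byte of val per SGI. */
static void write_set_pending(struct sgi_state *s, unsigned int offset,
                              uint32_t val)
{
    unsigned int min_sgi = first_sgi_fixed(offset);

    for (unsigned int i = 0; i < 4; i++) {
        unsigned int sgi = min_sgi + i;

        assert(sgi < NR_SGIS);  /* would fire with first_sgi_buggy() */
        s->sources[sgi] |= (uint8_t)(val >> (8 * i));
    }
}
```

With the buggy calculation, a write at offset 4 would start at index 16 and land outside the 16-entry array, which is consistent with the memory corruption the commit message reports.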