27 Mar, 2020

12 commits

  • Change the rpcrdma_xprt_disconnect() function so that it no longer
    waits for the DISCONNECTED event. This prevents blocking if the
    remote is unresponsive.

    In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is
    detached. Upon return from rpcrdma_xprt_disconnect(), the transport
    (r_xprt) is ready immediately for a new connection.

    The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now
    handled almost identically.

    However, because the lifetimes of rpcrdma_xprt structures and
    rpcrdma_ep structures are now independent, creating an rpcrdma_ep
    needs to take a module ref count. The ep now owns most of the
    hardware resources for a transport.

    Also, a kref is needed to ensure that rpcrdma_ep sticks around
    long enough for the cm_event_handler to finish.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
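
    The reference-counting idea in the rpcrdma_xprt_disconnect() change
    above can be sketched in user space as follows. This is a minimal
    illustration, not the kernel code: struct ep_sketch, ep_create(),
    ep_get() and ep_put() are hypothetical names, and a C11 atomic
    counter stands in for the kernel's kref. The point is only that the
    ep survives a transport detach for as long as an event handler still
    holds a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct rpcrdma_ep. */
struct ep_sketch {
	atomic_int refcount;	/* plays the role of the kref */
	bool *freed;		/* set to true when the ep is released */
};

/* Allocate an ep holding one reference, owned by the transport. */
static struct ep_sketch *ep_create(bool *freed)
{
	struct ep_sketch *ep = malloc(sizeof(*ep));

	if (!ep)
		return NULL;
	atomic_init(&ep->refcount, 1);
	ep->freed = freed;
	*freed = false;
	return ep;
}

/* Take an extra reference, e.g. on behalf of a CM event handler. */
static void ep_get(struct ep_sketch *ep)
{
	assert(atomic_load(&ep->refcount) > 0);
	atomic_fetch_add(&ep->refcount, 1);
}

/* Drop one reference; release the ep when the last one goes away.
 * Returns true if this call freed the ep.
 */
static bool ep_put(struct ep_sketch *ep)
{
	if (atomic_fetch_sub(&ep->refcount, 1) != 1)
		return false;
	*ep->freed = true;
	free(ep);
	return true;
}
```

    In this model the transport's ep_put() at disconnect time returns
    immediately, and the memory is released only when the handler's own
    ep_put() drops the final reference.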
     
  • rpcrdma_cm_event_handler() is always passed an @id pointer that is
    valid. However, in a subsequent patch, we won't be able to extract
    an r_xprt in every case. So instead of using the r_xprt's
    presentation address strings, extract them from struct rdma_cm_id.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • I eventually want to allocate rpcrdma_ep separately from struct
    rpcrdma_xprt so that on occasion there can be more than one ep per
    xprt.

    The new struct rpcrdma_ep will contain all the fields currently in
    rpcrdma_ia and in rpcrdma_ep. This is all the device and CM settings
    for the connection, in addition to per-connection settings
    negotiated with the remote.

    Take this opportunity to rename the existing ep fields from rep_* to
    re_* to disambiguate these from struct rpcrdma_rep.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Completion errors after a disconnect often occur much sooner than a
    CM_DISCONNECT event. Use this to try to detect connection loss more
    quickly.

    Note that other kernel ULPs do take care to disconnect explicitly
    when a WR is flushed.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up:
    The upper layer serializes calls to xprt_rdma_close, so there is no
    need for an atomic bit operation, saving 8 bytes in rpcrdma_ia.

    This enables merging rpcrdma_ia_remove directly into the disconnect
    logic.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Move rdma_cm_id creation into rpcrdma_ep_create() so that it is now
    responsible for allocating all per-connection hardware resources.

    With this clean-up, all three arms of the switch statement in
    rpcrdma_ep_connect are exactly the same now, thus the switch can be
    removed.

    Because device removal behaves a little differently than
    disconnection, there is a little more work to be done before
    rpcrdma_ep_destroy() can release the connection's rdma_cm_id. So
    it is not quite symmetrical with rpcrdma_ep_create() yet.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Make a Protection Domain (PD) a per-connection resource rather than
    a per-transport resource. In other words, when the connection
    terminates, the PD is destroyed.

    Thus there is one less HW resource that remains allocated to a
    transport after a connection is closed.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Simplify the synopses of functions in the connect and
    disconnect paths in preparation for combining the rpcrdma_ia and
    rpcrdma_ep structures.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: Simplify the synopses of functions in the post_send path
    by combining the struct rpcrdma_ia and struct rpcrdma_ep arguments.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Clean up: prepare for combining the rpcrdma_ia and rpcrdma_ep
    structures. Take the opportunity to rename the function to be
    consistent with the "subsystem _ object _ verb" naming scheme.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Refactor rpcrdma_ep_create(), rpcrdma_ep_disconnect(), and
    rpcrdma_ep_destroy().

    rpcrdma_ep_create will be invoked at connect time instead of at
    transport set-up time. It will be responsible for allocating per-
    connection resources. In this patch it allocates the CQs and
    creates a QP. More to come.

    rpcrdma_ep_destroy() is the inverse function, invoked at
    disconnect time. It will be responsible for releasing the CQs
    and QP.

    These changes should be safe to do because both connect and
    disconnect are guaranteed to be serialized by the transport send
    lock.

    This takes us another step closer to resolving the address and route
    only at connect time so that connection failover to another device
    will work correctly.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     
  • Two changes:
    - Show the number of SG entries that were mapped. This helps debug
    DMA-related problems.
    - Record the MR's resource ID instead of its memory address. This
    groups each MR with its associated rdma-tool output, and reduces
    needless exposure of memory addresses.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

16 Mar, 2020

9 commits

  • Linus Torvalds
     
  • Pull irq fix from Thomas Gleixner:
    "A single commit to handle an erratum in Cavium ThunderX to prevent
    access to GIC registers which are broken in the implementation"

    * tag 'irq-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2

    Linus Torvalds
     
  • Pull futex fix from Thomas Gleixner:
    "Fix for yet another subtle futex issue.

    The futex code used ihold() to prevent inodes from vanishing, but
    ihold() does not guarantee inode persistence. Replace the inode
    pointer with a per boot, machine wide, unique inode identifier.

    The second commit fixes the breakage of the hash mechanism which
    causes a 100% performance regression"

    * tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    futex: Unbreak futex hashing
    futex: Fix inode life-time issue

    Linus Torvalds
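
    The futex pull above replaces the inode pointer with a per-boot,
    machine-wide unique identifier. One way such an identifier can be
    handed out is sketched below in user-space C11. The source does not
    describe the mechanism, so the assign-on-first-use scheme and the
    names struct inode_sketch, next_seq and inode_sequence_number() are
    assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; 0 means "no number assigned yet". */
struct inode_sketch {
	_Atomic uint64_t seq;
};

/* Global counter; starts at 0 and never hands out 0 as an identifier. */
static _Atomic uint64_t next_seq;

/* Return the inode's identifier, assigning one on first use. */
static uint64_t inode_sequence_number(struct inode_sketch *inode)
{
	uint64_t old = atomic_load(&inode->seq);

	if (old)
		return old;

	for (;;) {
		uint64_t candidate = atomic_fetch_add(&next_seq, 1) + 1;

		if (candidate == 0)
			continue;	/* skip 0 if the counter ever wraps */

		old = 0;
		if (atomic_compare_exchange_strong(&inode->seq, &old, candidate))
			return candidate;
		return old;		/* another thread assigned one first */
	}
}
```

    Because the identifier is a number rather than a pointer, comparing
    two keys never depends on whether the inode is still in memory.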
     
  • Pull x86 fixes from Thomas Gleixner:
    "Two fixes for x86:

    - Map EFI runtime service data as encrypted when SEV is enabled.

    Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.

    - Remove the warning in the vector management code which triggered
    when a managed interrupt affinity changed outside of a CPU hotplug
    operation.

    The warning was correct until the recent core code change that
    introduced a CPU isolation feature which needs to migrate managed
    interrupts away from online CPUs under certain conditions to
    achieve the isolation"

    * tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/vector: Remove warning on managed interrupt migration
    x86/ioremap: Map EFI runtime services data as encrypted for SEV

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "A pile of perf fixes:

    Kernel side:

    - AMD uncore driver: Replace the open coded sanity check with the
    core variant, which provides the correct error code and also leaves
    a hint in dmesg

    Tooling:

    - Fix the stdio input handling with glibc versions >= 2.28

    - Unbreak the futex-wake benchmark which was reduced to 0 test
    threads due to the conversion to cpumaps

    - Initialize sigaction structs before invoking sys_sigaction()

    - Plug the mapfile memory leak in perf jevents

    - Fix off by one relative directory includes

    - Fix an undefined string comparison in perf diff"

    * tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
    tools: Fix off-by 1 relative directory includes
    perf jevents: Fix leak of mapfile memory
    perf bench: Clear struct sigaction before sigaction() syscall
    perf bench futex-wake: Restore thread count default to online CPU count
    perf top: Fix stdio interface input handling with glibc 2.28+
    perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
    perf symbols: Don't try to find a vmlinux file when looking for kernel modules
    perf bench: Share some global variables to fix build with gcc 10
    perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
    perf env: Do not return pointers to local variables
    perf tests bp_account: Make global variable static

    Linus Torvalds
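
    The "clear struct sigaction before sigaction()" item in the perf pull
    above follows a common POSIX pattern, sketched here. The helper
    install_handler() and the handler on_signal() are hypothetical names;
    the calls themselves (memset, sigemptyset, sigaction) are standard.
    Without the memset, fields such as sa_flags would hold whatever was
    on the stack.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler so callers can observe that it ran. */
static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/* Install on_signal() for signo; returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
	struct sigaction act;

	/* Zero every field first so sa_flags and friends are well defined. */
	memset(&act, 0, sizeof(act));
	sigemptyset(&act.sa_mask);
	act.sa_handler = on_signal;

	return sigaction(signo, &act, NULL);
}
```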
     
  • Pull timer fix from Thomas Gleixner:
    "A single fix adding the missing time namespace adjustment in
    sys/sysinfo which caused sys/sysinfo to be inconsistent with
    /proc/uptime when read from a task inside a time namespace"

    * tag 'timers-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sys/sysinfo: Respect boottime inside time namespace

    Linus Torvalds
     
  • Pull RAS fixes from Thomas Gleixner:
    "Two RAS related fixes:

    - Shut down the per CPU thermal throttling poll work properly when a
    CPU goes offline.

    The missing shutdown caused the poll work to be migrated to an
    unbound worker which triggered warnings about the usage of
    smp_processor_id() in preemptible context

    - Fix the PPIN feature initialization which failed to enable the
    functionality when PPIN_CTL was enabled but the MSR was locked against
    updates"

    * tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Fix logic and comments around MSR_PPIN_CTL
    x86/mce/therm_throt: Undo thermal polling properly on CPU offline

    Linus Torvalds
     
  • Pull EFI fixes from Thomas Gleixner:
    "Two EFI fixes:

    - Prevent a race and buffer overflow in the sysfs efivars interface
    which causes kernel memory corruption.

    - Add the missing NULL pointer checks in efivar_store_raw()"

    * tag 'efi-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi: Add a sanity check to efivar_store_raw()
    efi: Fix a race and a buffer overflow while reading efivars via sysfs

    Linus Torvalds
     
  • Pull IOMMU fixes from Joerg Roedel:

    - Intel VT-d fixes:
    - RCU list handling fixes
    - Replace WARN_TAINT with pr_warn + add_taint for reporting firmware
    issues
    - DebugFS fixes
    - Fix for hugepage handling in iova_to_phys implementation
    - Fix for handling VMD devices, which have a domain number which
    doesn't fit into 16 bits
    - Warning message fix

    - MSI allocation fix for iommu-dma code

    - Sign-extension fix for io page-table code

    - Fix for AMD-Vi to properly update the is-running bit when AVIC is
    used

    * tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
    iommu/vt-d: Populate debugfs if IOMMUs are detected
    iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE
    iommu/vt-d: Ignore devices with out-of-spec domain number
    iommu/vt-d: Fix the wrong printing in RHSA parsing
    iommu/vt-d: Fix debugfs register reads
    iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
    iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint
    iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
    iommu/vt-d: Silence RCU-list debugging warnings
    iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()
    iommu/dma: Fix MSI reservation allocation
    iommu/io-pgtable-arm: Fix IOVA validation for 32-bit
    iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
    iommu/vt-d: Fix RCU list debugging warnings

    Linus Torvalds
     

15 Mar, 2020

5 commits

  • …/maz/arm-platforms into irq/urgent

    Pull irqchip fixes from Marc Zyngier:

    - Add workaround for Cavium/Marvell ThunderX unimplemented GIC registers

    Thomas Gleixner
     
  • Pull i2c fixes from Wolfram Sang:
    "I2C has quite some regression fixes this time.

    One is also related to watchdogs; we have proper acks from Guenter for
    them"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: acpi: put device when verifying client fails
    misc: eeprom: at24: fix regulator underflow
    i2c: gpio: suppress error on probe defer
    macintosh: windfarm: fix MODINFO regression
    i2c: designware-pci: Fix BUG_ON during device removal
    i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device
    watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional
    watchdog: iTCO_wdt: Export vendorsupport

    Linus Torvalds
     
  • Pull ARC fixes from Vineet Gupta:

    - Fix __ALIGN_STR and __ALIGN to not use default junk padding

    - Misc Kconfig cleanups, header updates

    * tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: define __ALIGN_STR and __ALIGN symbols for ARC
    ARC: show_regs: reduce lines of output
    ARC: Replace by
    ARC: fpu: fix randconfig build error reported by 0-day test service
    ARC: fix some Kconfig typos
    ARC: Cleanup old Kconfig IO scheduler options

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "Bugfixes for x86 and s390"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
    KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
    KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
    KVM: s390: Also reset registers in sync regs for initial cpu reset
    KVM: fix Kconfig menu text for -Werror
    KVM: x86: remove stale comment from struct x86_emulate_ctxt
    KVM: x86: clear stale x86_emulate_ctxt->intercept value
    KVM: SVM: Fix the svm vmexit code for WRMSR
    KVM: X86: Fix dereference null cpufreq policy

    Linus Torvalds
     
  • Currently, the intel iommu debugfs directory (/sys/kernel/debug/iommu/intel)
    gets populated only when DMA remapping is enabled (dmar_disabled = 0)
    irrespective of whether interrupt remapping is enabled or not.

    Instead, populate the intel iommu debugfs directory if any IOMMUs are
    detected.

    Cc: Dan Carpenter
    Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support")
    Signed-off-by: Megha Dey
    Signed-off-by: Lu Baolu
    Signed-off-by: Joerg Roedel

    Megha Dey
     

14 Mar, 2020

14 commits

  • Pull clk fixes from Stephen Boyd:
    "A small collection of fixes. I'll make another sweep soon to look for
    more fixes for this -rc series.

    - Mark device node const in of_clk_get_parent APIs to ease landing
    changes in users later

    - Fix flag for Qualcomm SC7180 video clocks where we thought it would
    never turn off but actually hardware takes care of it

    - Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because
    this clk is always on anyway

    - Correct some bad dt-binding numbers for i.MX8MN SoCs"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
    clk: imx8mn: Fix incorrect clock defines
    clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk
    clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk
    of: clk: Make of_clk_get_parent_{count,name}() parameter const

    Linus Torvalds
     
  • Paolo Bonzini
     
  • When an EVMCS enabled L1 guest on KVM tries to do an enlightened VMEnter
    with EVMCS GPA = 0 the host crashes because the

    evmcs_gpa != vmx->nested.hv_evmcs_vmptr

    condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
    false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
    happen on vmx->nested.hv_evmcs pointer dereference.

    Another problematic EVMCS ptr value is '-1', but it only causes a host
    crash after nested_release_evmcs() invocation. The problem is exactly the
    same as with '0': we mistakenly think that the EVMCS pointer hasn't
    changed and thus nested.hv_evmcs_vmptr is valid.

    Resolve the issue by adding an additional !vmx->nested.hv_evmcs
    check to nested_vmx_handle_enlightened_vmptrld(). This way we will
    always try kvm_vcpu_map() when nested.hv_evmcs is NULL,
    which is supposed to catch all invalid EVMCS GPAs.

    Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
    to be consistent with initialization where we don't currently
    set hv_evmcs_vmptr to '-1'.

    Cc: stable@vger.kernel.org
    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Paolo Bonzini

    Vitaly Kuznetsov
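
    The EVMCS check described above can be modelled in isolation as
    follows. This is a simplified user-space sketch, not the KVM code:
    struct nested_sketch, needs_map_old() and needs_map_new() are
    hypothetical names, and only the two fields mentioned in the commit
    message are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two nested-state fields from the message. */
struct nested_sketch {
	uint64_t hv_evmcs_vmptr;	/* GPA recorded for the mapped EVMCS */
	void *hv_evmcs;			/* host mapping; NULL if nothing is mapped */
};

/* Check before the fix: only compares the GPA with the recorded one. */
static bool needs_map_old(const struct nested_sketch *n, uint64_t evmcs_gpa)
{
	return evmcs_gpa != n->hv_evmcs_vmptr;
}

/* Check after the fix: also (re)maps whenever no mapping exists yet. */
static bool needs_map_new(const struct nested_sketch *n, uint64_t evmcs_gpa)
{
	return !n->hv_evmcs || evmcs_gpa != n->hv_evmcs_vmptr;
}
```

    With a zeroed state and a GPA of 0, needs_map_old() reports that
    nothing needs mapping even though hv_evmcs is NULL, which is the
    case that leads to the dereference; needs_map_new() always attempts
    the mapping in that situation.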
     
  • …it/kvms390/linux into kvm-master

    KVM: s390: Fully do the CPU resets as intended

    With 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") we clarified
    the meaning of the reset ioctl to fully reset the CPU and not only the
    parts that can not be handled by userspace. Turns out that we missed
    some parts.

    Paolo Bonzini
     
  • Despite the architecture spec requiring that reserved registers in the GIC
    distributor memory map are RES0 (and thus are not allowed to generate
    an exception), the Cavium ThunderX (aka TX1) SoC explodes as such:

    [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
    [ 0.000000] GICv3: 128 SPIs implemented
    [ 0.000000] GICv3: 0 Extended SPIs implemented
    [ 0.000000] Internal error: synchronous external abort: 96000210 [#1] SMP
    [ 0.000000] Modules linked in:
    [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-00035-g3cf6a3d5725f #7956
    [ 0.000000] Hardware name: cavium,thunder-88xx (DT)
    [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
    [ 0.000000] pc : __raw_readl+0x0/0x8
    [ 0.000000] lr : gic_init_bases+0x110/0x560
    [ 0.000000] sp : ffff800011243d90
    [ 0.000000] x29: ffff800011243d90 x28: 0000000000000000
    [ 0.000000] x27: 0000000000000018 x26: 0000000000000002
    [ 0.000000] x25: ffff8000116f0000 x24: ffff000fbe6a2c80
    [ 0.000000] x23: 0000000000000000 x22: ffff010fdc322b68
    [ 0.000000] x21: ffff800010a7a208 x20: 00000000009b0404
    [ 0.000000] x19: ffff80001124dad0 x18: 0000000000000010
    [ 0.000000] x17: 000000004d8d492b x16: 00000000f67eb9af
    [ 0.000000] x15: ffffffffffffffff x14: ffff800011249908
    [ 0.000000] x13: ffff800091243ae7 x12: ffff800011243af4
    [ 0.000000] x11: ffff80001126e000 x10: ffff800011243a70
    [ 0.000000] x9 : 00000000ffffffd0 x8 : ffff80001069c828
    [ 0.000000] x7 : 0000000000000059 x6 : ffff8000113fb4d1
    [ 0.000000] x5 : 0000000000000001 x4 : 0000000000000000
    [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000
    [ 0.000000] x1 : 0000000000000000 x0 : ffff8000116f000c
    [ 0.000000] Call trace:
    [ 0.000000] __raw_readl+0x0/0x8
    [ 0.000000] gic_of_init+0x188/0x224
    [ 0.000000] of_irq_init+0x200/0x3cc
    [ 0.000000] irqchip_init+0x1c/0x40
    [ 0.000000] init_IRQ+0x160/0x1d0
    [ 0.000000] start_kernel+0x2ec/0x4b8
    [ 0.000000] Code: a8c47bfd d65f03c0 d538d080 d65f03c0 (b9400000)

    when reading the GICv4.1 GICD_TYPER2 register, which is unexpected...

    Work around it by adding a new quirk for the following variants:

    ThunderX: CN88xx
    OCTEON TX: CN83xx, CN81xx
    OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx*

    and use this flag to avoid accessing GICD_TYPER2. Note that all
    reserved registers (including redistributors and ITS) are impacted
    by this erratum, but that only GICD_TYPER2 has to be worked around
    so far.

    Signed-off-by: Marc Zyngier
    Tested-by: Robert Richter
    Tested-by: Mark Salter
    Tested-by: Tim Harvey
    Acked-by: Catalin Marinas
    Acked-by: Robert Richter
    Link: https://lore.kernel.org/r/20191027144234.8395-11-maz@kernel.org
    Link: https://lore.kernel.org/r/20200311115649.26060-1-maz@kernel.org

    Marc Zyngier
     
  • Previously, not all fields of structure kvm_lapic_irq were initialized
    before it was passed to kvm_bitmap_or_dest_vcpus(). This causes
    an issue when any of the uninitialized fields are used for processing a
    request. For example, not initializing the msi_redir_hint field before
    passing the structure to kvm_bitmap_or_dest_vcpus() may lead to
    misbehavior of kvm_apic_map_get_dest_lapic(). This happens specifically
    when kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
    value of msi_redir_hint, which should not happen as the request belongs
    to APIC fixed delivery mode and we do not want to deliver the
    interrupt only to the lowest priority candidate.

    This patch initializes all the fields of kvm_lapic_irq based on the
    values of ioapic redirect_entry object before passing it on to
    kvm_bitmap_or_dest_vcpus().

    Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
    Signed-off-by: Nitesh Narayan Lal
    Reviewed-by: Vitaly Kuznetsov
    [Set level to false since the value doesn't really matter. Suggested
    by Vitaly Kuznetsov. - Paolo]
    Signed-off-by: Paolo Bonzini

    Nitesh Narayan Lal
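
    The initialization issue above can be illustrated with a small
    user-space model. All names here (struct redirect_entry_sketch,
    struct lapic_irq_sketch, irq_from_entry(), lowest_prio_delivery(),
    and the DM_*_SKETCH encodings) are hypothetical, and the
    lowest-priority test is a simplified stand-in for the behaviour the
    message attributes to kvm_lowest_prio_delivery(). The sketch shows
    every field being set from the redirect entry, so no stale stack
    value can flip the delivery decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical delivery-mode encodings, for illustration only. */
#define DM_FIXED_SKETCH  0x0u
#define DM_LOWEST_SKETCH 0x1u

/* Simplified model of an IOAPIC redirect entry. */
struct redirect_entry_sketch {
	uint8_t vector;
	uint8_t delivery_mode;
	uint8_t dest_mode;
	uint8_t trig_mode;
	uint8_t dest_id;
};

/* Simplified model of the interrupt descriptor passed down the stack. */
struct lapic_irq_sketch {
	uint32_t vector;
	uint32_t delivery_mode;
	uint32_t dest_mode;
	uint32_t trig_mode;
	uint32_t dest_id;
	bool level;
	bool msi_redir_hint;
};

/* Build a fully initialized descriptor; no field is left to chance. */
static struct lapic_irq_sketch irq_from_entry(const struct redirect_entry_sketch *e)
{
	struct lapic_irq_sketch irq = {
		.vector = e->vector,
		.delivery_mode = e->delivery_mode,
		.dest_mode = e->dest_mode,
		.trig_mode = e->trig_mode,
		.dest_id = e->dest_id,
		.level = false,
		.msi_redir_hint = false,
	};

	return irq;
}

/* Simplified model: a set hint alone selects lowest-priority delivery. */
static bool lowest_prio_delivery(const struct lapic_irq_sketch *irq)
{
	return irq->delivery_mode == DM_LOWEST_SKETCH || irq->msi_redir_hint;
}
```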
     
  • Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
    the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1
    is not supported[*], i.e. intercepting ENCLS to inject a #UD is
    unnecessary.

    Avoiding ENCLS-exiting even when it is reported as supported by the CPU
    works around a reported issue where SGX is "hard" disabled after an S3
    suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
    are enumerated as unsupported. While the root cause of the S3 issue is
    unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
    workaround for what is most likely a hardware or firmware issue. As a
    bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
    vmcs02.

    Note, SGX must be disabled in BIOS to take advantage of this workaround.

    [*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
    globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
    cleared. Soft disabled meaning disabling SGX without clearing the
    primary CPUID bit (in leaf 0x7) and without poking into non-SGX
    CPU paths, e.g. for the VMCS controls.

    Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
    Reported-by: Toni Spets
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     
  • Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC
    (de-)activation code") accidentally left out the ir_data pointer when
    calling modify_irte_ga(), which causes the function amd_iommu_update_ga()
    to return prematurely because struct amd_ir_data.ref is NULL, so
    the "is_run" bit of the IRTE does not get updated properly.

    This results in bad I/O performance since IOMMU AVIC always generates a GA
    Log entry and notifies the IOMMU driver and KVM when it receives an
    interrupt from the PCI pass-through device instead of directly injecting
    the interrupt into the vCPU.

    Fix this by passing ir_data when calling modify_irte_ga() as done previously.

    Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
    Signed-off-by: Suravee Suthikulpanit
    Signed-off-by: Joerg Roedel

    Suravee Suthikulpanit
     
  • VMD subdevices are created with a PCI domain ID of 0x10000 or
    higher.

    These subdevices are also handled like all other PCI devices by
    dmar_pci_bus_notifier().

    However, when dmar_alloc_pci_notify_info() takes records of such devices,
    it will truncate the domain ID to a u16 value (in info->seg).
    The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if
    it is 0000:00:02.0.

    In the unlucky event that a real device also exists at 0000:00:02.0 and
    also has a device-specific entry in the DMAR table,
    dmar_insert_dev_scope() will crash on:
      BUG_ON(i >= devices_cnt);

    That's basically a sanity check that only one PCI device matches a
    single DMAR entry; in this case we seem to have two matching devices.

    Fix this by ignoring devices that have a domain number higher than
    what can be looked up in the DMAR table.

    This problem was carefully diagnosed by Jian-Hong Pan.

    Signed-off-by: Lu Baolu
    Signed-off-by: Daniel Drake
    Fixes: 59ce0515cdaf3 ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens")
    Signed-off-by: Joerg Roedel

    Daniel Drake
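
    The truncation described in the VMD commit above is plain integer
    narrowing and can be reproduced in a few lines. The helpers
    truncate_segment() and domain_fits_dmar() are hypothetical names for
    this sketch; the second one models the "ignore domains that cannot
    be looked up" rule under the assumption that the DMAR segment field
    is 16 bits wide, as the message states for info->seg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of storing a PCI domain number in a 16-bit segment field. */
static uint16_t truncate_segment(int pci_domain)
{
	return (uint16_t)pci_domain;	/* 0x10000 silently becomes 0x0000 */
}

/* Model of the fix: only domains representable in 16 bits are handled. */
static bool domain_fits_dmar(int pci_domain)
{
	return pci_domain >= 0 && pci_domain <= UINT16_MAX;
}
```

    A device at 10000:00:02.0 therefore collides with 0000:00:02.0 once
    truncated, while the range check rejects it before any lookup is
    attempted.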
     
  • When the base address in the RHSA structure doesn't match the base address
    in any DRHD structure, the base address of the last DRHD is printed out.

    This doesn't make sense when there are multiple DRHD units. Fix it
    by printing the buggy RHSA's base address.

    Signed-off-by: Lu Baolu
    Signed-off-by: Zhenzhong Duan
    Fixes: fd0c8894893cb ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables")
    Signed-off-by: Joerg Roedel

    Zhenzhong Duan
     
  • Pull SCSI fixes from James Bottomley:
    "Two small fixes, both in drivers: ipr and ufs"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: ipr: Fix softlockup when rescanning devices in petitboot
    scsi: ufs: Fix possible unclocked access to auto hibern8 timer register

    Linus Torvalds
     
  • Pull NFS client bugfixes from Anna Schumaker:
    "These are mostly fscontext fixes, but there is also one that fixes
    collisions seen in fscache:

    - Ensure the fs_context has the correct fs_type when mounting and
    submounting

    - Fix leaking of ctx->nfs_server.hostname

    - Add minor version to fscache key to prevent collisions"

    * tag 'nfs-for-5.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
    nfs: add minor version to nfs_server_key for fscache
    NFS: Fix leak of ctx->nfs_server.hostname
    NFS: Don't hard-code the fs_type when submounting
    NFS: Ensure the fs_context has the correct fs_type before mounting

    Linus Torvalds
     
  • Pull fuse fix from Miklos Szeredi:
    "Fix an Oops introduced in v5.4"

    * tag 'fuse-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: fix stack use after return

    Linus Torvalds
     
  • Pull overlayfs fixes from Miklos Szeredi:
    "Fix three bugs introduced in this cycle"

    * tag 'ovl-fixes-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix lockdep warning for async write
    ovl: fix some xino configurations
    ovl: fix lock in ovl_llseek()

    Linus Torvalds