17 Apr, 2019

2 commits

  • commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.

    When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc
    is not initialized, which causes a malfunction.
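
    A minimal sketch of the shape of the fix, assuming the !CONFIG_SPARSE_IRQ
    variant of early_irq_init() (surrounding per-descriptor setup elided):

        /* kernel/irq/irqdesc.c, !CONFIG_SPARSE_IRQ variant -- sketch */
        int __init early_irq_init(void)
        {
                struct irq_desc *desc = irq_desc;
                int count = ARRAY_SIZE(irq_desc), i;

                for (i = 0; i < count; i++) {
                        /* ... existing per-descriptor setup ... */
                        mutex_init(&desc[i].request_mutex); /* was missing */
                }
                return arch_early_irq_init();
        }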

    Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()")
    Signed-off-by: Kefeng Wang
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Mukesh Ojha
    Cc: Marc Zyngier
    Cc:
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Kefeng Wang
     
  • commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.

    If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
    has the IRQCHIP_SKIP_SET_WAKE flag set an error is returned.

    This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
    the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
    walk the chain of parents and set irq wake on any chips that don't have the
    flag set either. If the intent is to call the .irq_set_wake() callback of
    the parent irqchip, then we expect irqchip implementations to omit the
    IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
    calls irq_chip_set_wake_parent().

    The problem has been observed on a Qualcomm sdm845 device where set wake
    fails on any GPIO interrupts after applying work in progress wakeup irq
    patches to the GPIO driver. The chain of chips looks like this:

    QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)

    The GPIO controller's parent is the QCOM PDC irqchip, which in turn has the
    ARM GIC as its parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE
    flag set, and so does the grandparent ARM GIC.

    The GPIO driver doesn't know if the parent needs to set wake or not, so it
    unconditionally calls irq_chip_set_wake_parent() causing this function to
    return a failure because the parent irqchip (PDC) doesn't have the
    .irq_set_wake() callback set. Returning 0 instead makes everything work and
    irqs from the GPIO controller can be configured for wakeup.

    Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
    when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
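
    A sketch of the fixed function in kernel/irq/chip.c:

        int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on)
        {
                data = data->parent_data;

                /* Parent needs no wake configuration: report success. */
                if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE)
                        return 0;

                if (data->chip->irq_set_wake)
                        return data->chip->irq_set_wake(data, on);

                return -ENOSYS;
        }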

    [ tglx: Massaged changelog ]

    Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent")
    Signed-off-by: Stephen Boyd
    Signed-off-by: Thomas Gleixner
    Acked-by: Marc Zyngier
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-gpio@vger.kernel.org
    Cc: Lina Iyer
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     

06 Apr, 2019

1 commit

  • [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ]

    Waiman reported that on large systems with a large number of interrupts the
    readout of /proc/stat takes a long time to sum up the interrupt
    statistics. In principle this is not a problem, but for unknown reasons
    some enterprise quality software reads /proc/stat with a high frequency.

    The reason for this is that interrupt statistics are accounted per cpu. So
    the /proc/stat logic has to sum up the interrupt stats for each interrupt.

    This can be largely avoided for interrupts which are not marked as
    'PER_CPU' interrupts by simply adding a per interrupt summation counter
    which is incremented along with the per interrupt per cpu counter.

    The PER_CPU interrupts need to avoid that and use only per cpu accounting
    because they share the interrupt number and the interrupt descriptor and
    concurrent updates would conflict or require unwanted synchronization.
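
    A sketch of the accounting change, with field and helper shapes per the
    upstream patch (PER_CPU interrupt flows keep using the plain per-cpu
    increment via the double-underscore variant):

        struct irq_desc {
                /* ... */
                unsigned int    tot_count;      /* new summation counter */
                /* ... */
        };

        static inline void __kstat_incr_irqs_this_cpu(struct irq_desc *desc)
        {
                __this_cpu_inc(*desc->kstat_irqs);
                __this_cpu_inc(kstat.irqs_sum);
        }

        static inline void kstat_incr_irqs_this_cpu(struct irq_desc *desc)
        {
                __kstat_incr_irqs_this_cpu(desc);
                desc->tot_count++;      /* summed copy for /proc/stat */
        }

        unsigned int kstat_irqs(unsigned int irq)
        {
                struct irq_desc *desc = irq_to_desc(irq);
                unsigned int sum = 0;
                int cpu;

                if (!desc || !desc->kstat_irqs)
                        return 0;
                if (!irq_settings_is_per_cpu_devid(desc) &&
                    !irq_settings_is_per_cpu(desc))
                        return desc->tot_count; /* O(1), no CPU loop */

                for_each_possible_cpu(cpu)
                        sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
                return sum;
        }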

    Reported-by: Waiman Long
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Waiman Long
    Reviewed-by: Marc Zyngier
    Reviewed-by: Davidlohr Bueso
    Cc: Matthew Wilcox
    Cc: Andrew Morton
    Cc: Alexey Dobriyan
    Cc: Kees Cook
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Davidlohr Bueso
    Cc: Miklos Szeredi
    Cc: Daniel Colascione
    Cc: Dave Chinner
    Cc: Randy Dunlap
    Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de

    Thomas Gleixner
     

06 Mar, 2019

4 commits

  • [ Upstream commit bddda606ec76550dd63592e32a6e87e7d32583f7 ]

    If all CPUs in the irq_default_affinity mask are offline when an interrupt
    is initialized then irq_setup_affinity() can set an empty affinity mask for
    a newly allocated interrupt.

    Fix this by falling back to cpu_online_mask in case the resulting affinity
    mask is zero.
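
    The fix amounts to a two-line fallback in irq_setup_affinity(); sketched:

        /* kernel/irq/manage.c: irq_setup_affinity() -- sketch */
        cpumask_and(&mask, cpu_online_mask, set);
        if (cpumask_empty(&mask))
                cpumask_copy(&mask, cpu_online_mask);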

    Signed-off-by: Srinivas Ramana
    Signed-off-by: Thomas Gleixner
    Cc: linux-arm-msm@vger.kernel.org
    Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org
    Signed-off-by: Sasha Levin

    Srinivas Ramana
     
  • [ Upstream commit e8da8794a7fd9eef1ec9a07f0d4897c68581c72b ]

    On large systems with multiple devices of the same class (e.g. NVMe disks,
    using managed interrupts), the kernel can affinitize these interrupts to a
    small subset of CPUs instead of spreading them out evenly.

    irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask
    of possible target CPUs which has the lowest number of interrupt vectors
    allocated.

    This is done by searching the CPU with the highest number of available
    vectors. While this is correct for non-managed interrupts, it can select
    the wrong CPU for managed interrupts. Under certain constellations this
    results in affinitizing the managed interrupts of several devices to a
    single CPU in a set.

    The bookkeeping of available vectors works the following way:

    1) Non-managed interrupts:

    available is decremented when the interrupt is actually requested by
    the device driver and a vector is assigned. It's incremented when the
    interrupt and the vector are freed.

    2) Managed interrupts:

    Managed interrupts guarantee vector reservation when the MSI/MSI-X
    functionality of a device is enabled, which is achieved by reserving
    vectors in the bitmaps of the possible target CPUs. This reservation
    decrements the available count on each possible target CPU.

    When the interrupt is requested by the device driver then a vector is
    allocated from the reserved region. The operation is reversed when the
    interrupt is freed by the device driver. Neither of these operations
    affect the available count.

    The reservation persists up to the point where the MSI/MSI-X
    functionality is disabled and only this operation increments the
    available count again.

    For non-managed interrupts the available count is the correct selection
    criterion because the guaranteed reservations need to be taken into
    account. Using the allocated counter could lead to a failing allocation in
    the following situation (total vector space of 10 assumed):

                     CPU0   CPU1
     available:        2      0
     allocated:        5      3

    (Selecting by lowest allocated count would pick CPU1, which has no
    available vectors left.)

    Three successive non-managed allocations, which use the available count
    as the selection criterion, spread out properly from a common starting
    point:

     allocated:        3      3
     available:        4      4

     allocated:        4      3
     available:        3      4

     allocated:        4      4
     available:        3      3

    But the allocation of three managed interrupts starting from the same
    point will affinitize all of them to CPU0 because the available count is
    not affected by the allocation (see above). So the end result is:

                     CPU0   CPU1
     available:        5      4
     allocated:        5      3

    Introduce a "managed_allocated" field in struct cpumap to track the vector
    allocation for managed interrupts separately. Use this information to
    select the target CPU when a vector is allocated for a managed interrupt,
    which results in more evenly distributed vector assignments. The above
    example results in the following allocations:

                            CPU0   CPU1
     managed_allocated:       0      0
     allocated:               3      3

     managed_allocated:       1      0
     allocated:               3      4

     managed_allocated:       1      1
     allocated:               4      4

    The allocation of non-managed interrupts is not affected by this change and
    is still evaluating the available count.

    The overall distribution of interrupt vectors for both types of interrupts
    might still not be perfectly even depending on the number of non-managed
    and managed interrupts in a system, but due to the reservation guarantee
    for managed interrupts this cannot be avoided.

    Expose the new field in debugfs as well.
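
    A sketch of the new bookkeeping and the selection helper (shapes as in
    kernel/irq/matrix.c, surrounding code elided):

        struct cpumap {
                unsigned int    available;
                unsigned int    allocated;
                unsigned int    managed;
                unsigned int    managed_allocated;      /* new field */
                /* ... */
        };

        /* Find the CPU with the lowest number of active managed vectors */
        static unsigned int matrix_find_best_cpu_managed(struct irq_matrix *m,
                                                         const struct cpumask *msk)
        {
                unsigned int cpu, best_cpu, allocated = UINT_MAX;
                struct cpumap *cm;

                best_cpu = UINT_MAX;
                for_each_cpu(cpu, msk) {
                        cm = per_cpu_ptr(m->maps, cpu);

                        if (!cm->online || cm->managed_allocated > allocated)
                                continue;

                        best_cpu = cpu;
                        allocated = cm->managed_allocated;
                }
                return best_cpu;
        }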

    [ tglx: Clarified the background of the problem in the changelog and
    described it independent of NVME ]

    Signed-off-by: Long Li
    Signed-off-by: Thomas Gleixner
    Cc: Michael Kelley
    Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Long Li
     
  • [ Upstream commit 76f99ae5b54d48430d1f0c5512a84da0ff9761e0 ]

    Linux spreads out non-managed interrupts across the possible target CPUs
    to avoid vector space exhaustion.

    Managed interrupts are treated differently, as for them the vectors are
    reserved (with guarantee) when the interrupt descriptors are initialized.

    When the interrupt is requested a real vector is assigned. The assignment
    logic uses the first CPU in the affinity mask for assignment. If the
    interrupt has more than one CPU in the affinity mask, which happens when a
    multi-queue device has fewer queues than CPUs, then doing the same search
    as for non-managed interrupts makes sense as it puts the interrupt on the
    least interrupt-plagued CPU. For single-CPU affine vectors that's obviously
    a NOOP.

    Restructure the matrix allocation code so it does the 'best CPU' search,
    add the sanity check for an empty affinity mask and adapt the call site in
    the x86 vector management code.
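
    In outline, the reworked irq_matrix_alloc_managed() becomes (using the
    helper split out in the preparatory patch below; allocation internals
    elided):

        int irq_matrix_alloc_managed(struct irq_matrix *m,
                                     const struct cpumask *msk,
                                     unsigned int *mapped_cpu)
        {
                unsigned int cpu;

                if (cpumask_empty(msk))         /* new sanity check */
                        return -EINVAL;

                cpu = matrix_find_best_cpu(m, msk);
                if (cpu == UINT_MAX)
                        return -ENOSPC;

                /* ... allocate a vector from the reserved region on @cpu,
                 *     set *mapped_cpu and return the allocated bit ... */
        }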

    [ tglx: Added the empty mask check to the core and improved change log ]

    Signed-off-by: Dou Liyang
    Signed-off-by: Thomas Gleixner
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dou Liyang
     
  • [ Upstream commit 8ffe4e61c06a48324cfd97f1199bb9838acce2f2 ]

    Linux finds the CPU which has the lowest vector allocation count to spread
    out the non-managed interrupts across the possible target CPUs, but does
    not do so for managed interrupts.

    Split out the CPU selection code into a helper function for reuse. No
    functional change.
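
    The split-out helper, modulo context:

        /* Find the best CPU, i.e. the one with the most available vectors */
        static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
                                                 const struct cpumask *msk)
        {
                unsigned int cpu, best_cpu, maxavl = 0;
                struct cpumap *cm;

                best_cpu = UINT_MAX;

                for_each_cpu(cpu, msk) {
                        cm = per_cpu_ptr(m->maps, cpu);

                        if (!cm->online || cm->available <= maxavl)
                                continue;

                        best_cpu = cpu;
                        maxavl = cm->available;
                }
                return best_cpu;
        }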

    Signed-off-by: Dou Liyang
    Signed-off-by: Thomas Gleixner
    Cc: hpa@zytor.com
    Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dou Liyang
     

13 Feb, 2019

1 commit

  • [ Upstream commit b82592199032bf7c778f861b936287e37ebc9f62 ]

    If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
    which are allocated for a device, the interrupt affinity spreading code
    fails to spread them across all nodes.

    The reason is that the spreading code starts from node 0 and continues up
    to the number of interrupts requested for allocation. This leaves the
    nodes past the last interrupt unused.

    This results in interrupt concentration on the first nodes which violates
    the assumption of the block layer that all nodes are covered evenly. As a
    consequence the NUMA nodes above the number of interrupts are all assigned
    to hardware queue 0 and therefore NUMA node 0, which results in bad
    performance and has CPU hotplug implications, because queue 0 gets shut
    down when the last CPU of node 0 is offlined.

    Go over all NUMA nodes and assign them round-robin to all requested
    interrupts to solve this.
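
    Roughly, the node loop in the spreading code keeps cycling through the
    requested vectors instead of stopping at the last one (a sketch; variable
    names as in the upstream code):

        /* More nodes than vectors: assign nodes round-robin to the vectors */
        if (numvecs <= nodes) {
                for_each_node_mask(n, nodemsk) {
                        /* OR, not copy: several nodes may share a vector */
                        cpumask_or(masks + curvec, masks + curvec,
                                   node_to_cpumask[n]);
                        if (++curvec == last_affv)
                                curvec = affd->pre_vectors; /* wrap around */
                }
                goto done;
        }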

    [ tglx: Massaged changelog ]

    Signed-off-by: Long Li
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ming Lei
    Cc: Michael Kelley
    Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com
    Signed-off-by: Sasha Levin

    Long Li
     

14 Nov, 2018

1 commit

  • commit 746a923b863a1065ef77324e1e43f19b1a3eab5c upstream.

    Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
    threaded irqs") made detection of spurious interrupts work for threaded
    handlers by:

    a) incrementing a counter every time the thread returns IRQ_HANDLED, and
    b) checking whether that counter has increased every time the thread is
    woken.

    However for oneshot interrupts, the commit unmasks the interrupt before
    incrementing the counter. If another interrupt occurs right after
    unmasking but before the counter is incremented, that interrupt is
    incorrectly considered spurious:

    time
     |  irq_thread()
     |    irq_thread_fn()
     |      action->thread_fn()
     |    irq_finalize_oneshot()
     |      unmask_threaded_irq()  /* interrupt is unmasked */
     |
     |  /* interrupt fires, incorrectly deemed spurious */
     |
     |  atomic_inc(&desc->threads_handled);  /* counter is incremented */
     v

    This is observed with a hi3110 CAN controller receiving data at high volume
    (from a separate machine sending with "cangen -g 0 -i -x"): The controller
    signals a huge number of interrupts (hundreds of millions per day) and
    every second there are about a dozen which are deemed spurious.

    In theory with high CPU load and the presence of higher priority tasks, the
    number of incorrectly detected spurious interrupts might increase beyond
    the 99,900 threshold and cause disablement of the interrupt.

    In practice it just increments the spurious interrupt count. But that can
    cause people to waste time investigating it over and over.

    Fix it by moving the accounting before the invocation of
    irq_finalize_oneshot().
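
    After the fix, the thread functions account the result before finalizing
    the oneshot handling; a sketch of one of them:

        static irqreturn_t irq_thread_fn(struct irq_desc *desc,
                                         struct irqaction *action)
        {
                irqreturn_t ret;

                ret = action->thread_fn(action->irq, action->dev_id);
                if (ret == IRQ_HANDLED)
                        atomic_inc(&desc->threads_handled); /* count first */

                irq_finalize_oneshot(desc, action);         /* then unmask */
                return ret;
        }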

    [ tglx: Folded change log update ]

    Fixes: 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs")
    Signed-off-by: Lukas Wunner
    Signed-off-by: Thomas Gleixner
    Cc: Mathias Duckeck
    Cc: Akshay Bhat
    Cc: Casey Fitzpatrick
    Cc: stable@vger.kernel.org # v3.16+
    Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
    Signed-off-by: Greg Kroah-Hartman

    Lukas Wunner
     

14 Aug, 2018

1 commit

  • Pull genirq updates from Thomas Gleixner:
    "The irq departement provides:

    - A synchronization fix for free_irq() to synchronize just the
    removed interrupt thread on shared interrupt lines.

    - Consolidate the multi low level interrupt entry handling and move
    it to the generic code instead of adding yet another copy for
    RISC-V

    - Refactoring of the ARM LPI allocator and LPI exposure to the
    hypervisor

    - Yet another interrupt chip driver for the JZ4725B SoC

    - Speed up for /proc/interrupts as people seem to love reading this
    file with high frequency

    - Miscellaneous fixes and updates"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
    genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
    openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
    arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
    ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
    irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
    irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
    dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
    dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
    irqchip/ingenic: Add support for the JZ4725B SoC
    irqchip/stm32: Add exti0 translation for stm32mp1
    genirq: Remove redundant NULL pointer check in __free_irq()
    irqchip/gic-v3-its: Honor hypervisor enforced LPI range
    irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
    irqchip/gic-v3-its: Drop chunk allocation compatibility
    irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
    irqchip/gic-v3-its: Use full range of LPIs
    irqchip/gic-v3-its: Refactor LPI allocator
    genirq: Synchronize only with single thread on free_irq()
    genirq: Update code comments wrt recycled thread_mask
    ...

    Linus Torvalds
     

03 Aug, 2018

2 commits

  • The support for force threading interrupts which are set up with both a
    primary and a threaded handler wrecked the setup of regular requested
    threaded interrupts (primary handler == NULL).

    The reason is that it does not check whether the primary handler is set to
    the default handler which wakes the handler thread. Instead it replaces the
    thread handler with the primary handler, as it would do with force-threaded
    interrupts which have been requested via request_irq(). So both the primary
    and the thread handler become the same, which then triggers the warning
    that the thread handler tries to wake up a not configured secondary thread.

    Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
    when requesting the threaded interrupt, which is normally caught by the
    sanity checks when force irq threading is disabled.

    Fix it by skipping the force threading setup when a regular threaded
    interrupt is requested. As a consequence, an interrupt request which lacks
    the IRQF_ONESHOT flag is rejected correctly instead of being silently
    wrecked.
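
    The fix is an early return in irq_setup_forced_threading() for interrupts
    which were requested as threaded in the first place; sketched:

        static int irq_setup_forced_threading(struct irqaction *new)
        {
                if (!force_irqthreads)
                        return 0;
                if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
                        return 0;

                /*
                 * No further action required for interrupts which are
                 * requested as threaded interrupts already.
                 */
                if (new->handler == irq_default_primary_handler)
                        return 0;

                new->flags |= IRQF_ONESHOT;
                /* ... set up the secondary thread for primary+thread irqs ... */
                return 0;
        }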

    Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler")
    Reported-by: Kurt Kanzenbach
    Signed-off-by: Thomas Gleixner
    Tested-by: Kurt Kanzenbach
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • Now that every user of MULTI_IRQ_HANDLER has been converted over to use
    GENERIC_IRQ_MULTI_HANDLER, remove the references to MULTI_IRQ_HANDLER.

    Signed-off-by: Palmer Dabbelt
    Signed-off-by: Thomas Gleixner
    Cc: linux@armlinux.org.uk
    Cc: catalin.marinas@arm.com
    Cc: Will Deacon
    Cc: jonas@southpole.se
    Cc: stefan.kristiansson@saunalahti.fi
    Cc: shorne@gmail.com
    Cc: jason@lakedaemon.net
    Cc: marc.zyngier@arm.com
    Cc: Arnd Bergmann
    Cc: nicolas.pitre@linaro.org
    Cc: vladimir.murzin@arm.com
    Cc: keescook@chromium.org
    Cc: jinb.park7@gmail.com
    Cc: yamada.masahiro@socionext.com
    Cc: alexandre.belloni@bootlin.com
    Cc: pombredanne@nexb.com
    Cc: Greg KH
    Cc: kstewart@linuxfoundation.org
    Cc: jhogan@kernel.org
    Cc: mark.rutland@arm.com
    Cc: ard.biesheuvel@linaro.org
    Cc: james.morse@arm.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: openrisc@lists.librecores.org
    Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com

    Palmer Dabbelt
     

17 Jul, 2018

1 commit

  • The NULL pointer check in __free_irq() triggers a 'dereference before NULL
    pointer check' warning in static code analysis. It turns out that the check
    is redundant because all callers have a NULL pointer check already.

    Remove it.

    Signed-off-by: RAGHU Halharvi
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20180717102009.7708-1-raghuhack78@gmail.com

    RAGHU Halharvi
     

24 Jun, 2018

2 commits

  • When pciehp is converted to threaded IRQ handling, removal of unplugged
    devices below a PCIe hotplug port happens synchronously in the IRQ thread.
    Removal of devices typically entails a call to free_irq() by their drivers.

    If those devices share their IRQ with the hotplug port, __free_irq()
    deadlocks because it calls synchronize_irq() to wait for all hard IRQ
    handlers as well as all threads sharing the IRQ to finish.

    Actually it's sufficient to wait only for the IRQ thread of the removed
    device, so call synchronize_hardirq() to wait for all hard IRQ handlers to
    finish, but no longer for any threads. Compensate by rearranging the
    control flow in irq_wait_for_interrupt() such that the device's thread is
    allowed to run one last time after kthread_stop() has been called.

    kthread_stop() blocks until the IRQ thread has completed. On completion
    the IRQ thread clears its oneshot thread_mask bit. This is safe because
    __free_irq() holds the request_mutex, thereby preventing __setup_irq() from
    handing out the same oneshot thread_mask bit to a newly requested action.
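
    In outline, the teardown in __free_irq() then looks like this (error
    handling and bookkeeping elided):

        /* Wait only for the hard IRQ handlers of all shared actions ... */
        synchronize_hardirq(irq);

        /* ... and synchronize only with the thread of the removed action. */
        if (action->thread) {
                kthread_stop(action->thread);   /* blocks until completion */
                put_task_struct(action->thread);
                if (action->secondary && action->secondary->thread) {
                        kthread_stop(action->secondary->thread);
                        put_task_struct(action->secondary->thread);
                }
        }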

    Stack trace for posterity:
    INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
    schedule+0x28/0x80
    synchronize_irq+0x6e/0xa0
    __free_irq+0x15a/0x2b0
    free_irq+0x33/0x70
    pciehp_release_ctrl+0x98/0xb0
    pcie_port_remove_service+0x2f/0x40
    device_release_driver_internal+0x157/0x220
    bus_remove_device+0xe2/0x150
    device_del+0x124/0x340
    device_unregister+0x16/0x60
    remove_iter+0x1a/0x20
    device_for_each_child+0x4b/0x90
    pcie_port_device_remove+0x1e/0x30
    pci_device_remove+0x36/0xb0
    device_release_driver_internal+0x157/0x220
    pci_stop_bus_device+0x7d/0xa0
    pci_stop_bus_device+0x3d/0xa0
    pci_stop_and_remove_bus_device+0xe/0x20
    pciehp_unconfigure_device+0xb8/0x160
    pciehp_disable_slot+0x84/0x130
    pciehp_ist+0x158/0x190
    irq_thread_fn+0x1b/0x50
    irq_thread+0x143/0x1a0
    kthread+0x111/0x130

    Signed-off-by: Lukas Wunner
    Signed-off-by: Thomas Gleixner
    Cc: Bjorn Helgaas
    Cc: Mika Westerberg
    Cc: linux-pci@vger.kernel.org
    Link: https://lkml.kernel.org/r/d72b41309f077c8d3bee6cc08ad3662d50b5d22a.1529828292.git.lukas@wunner.de

    Lukas Wunner
     
  • Previously a race existed between __free_irq() and __setup_irq() wherein
    the thread_mask of a just removed action could be handed out to a newly
    added action and the freed irq thread would then tread on the oneshot
    mask bit of the newly added irq thread in irq_finalize_oneshot():

    time
     |  __free_irq()
     |    raw_spin_lock_irqsave(&desc->lock, flags);
     |
     |    raw_spin_unlock_irqrestore(&desc->lock, flags);
     |
     |  __setup_irq()
     |    raw_spin_lock_irqsave(&desc->lock, flags);
     |
     |    raw_spin_unlock_irqrestore(&desc->lock, flags);
     |
     |  irq_thread() of freed irq (__free_irq() waits in synchronize_irq())
     |    irq_thread_fn()
     |      irq_finalize_oneshot()
     |        raw_spin_lock_irq(&desc->lock);
     |        desc->threads_oneshot &= ~action->thread_mask;
     |        raw_spin_unlock_irq(&desc->lock);
     v

    The race was known at least since 2012 when it was documented in a code
    comment by commit e04268b0effc ("genirq: Remove paranoid warnons and bogus
    fixups"). The race itself is harmless as nothing touches any of the
    potentially freed data after synchronize_irq().

    In 2017 the race was closed by commit 9114014cf4e6 ("genirq: Add mutex to
    irq desc to serialize request/free_irq()"), apparently inadvertently so,
    because the race is neither mentioned in the commit message nor was the
    code comment updated. Make up for that.

    Signed-off-by: Lukas Wunner
    Signed-off-by: Thomas Gleixner
    Cc: Bjorn Helgaas
    Cc: Mika Westerberg
    Cc: linux-pci@vger.kernel.org
    Link: https://lkml.kernel.org/r/32fc25aa35ecef4b2692f57687bb7fc2a57230e2.1529828292.git.lukas@wunner.de

    Lukas Wunner
     

22 Jun, 2018

2 commits

  • Since commit 425a5072dcd1 ("genirq: Free irq_desc with rcu"),
    show_interrupts() can be switched to rcu locking, which removes possible
    contention on sparse_irq_lock.

    The per_cpu count scan and print can be done without holding desc spinlock.

    And there is no need to call kstat_irqs_cpu() and abuse irq_to_desc() while
    holding the rcu read lock, since desc and desc->kstat_irqs won't disappear
    or change.
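
    The locking in show_interrupts() then shrinks to roughly:

        rcu_read_lock();
        desc = irq_to_desc(i);
        if (!desc)
                goto outsparse;

        if (desc->kstat_irqs)
                for_each_online_cpu(j)
                        any_count |= *per_cpu_ptr(desc->kstat_irqs, j);
        /* ... print per-cpu counts and chip/action information ... */
    outsparse:
        rcu_read_unlock();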

    Signed-off-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Cc: Eric Dumazet
    Link: https://lkml.kernel.org/r/20180620150332.163320-1-edumazet@google.com

    Eric Dumazet
     
  • Debug is missing the IRQCHIP_SUPPORTS_LEVEL_MSI debug entry, making debugfs
    slightly less useful.

    Take this opportunity to also add a missing comment in the definition of
    IRQCHIP_SUPPORTS_LEVEL_MSI.
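
    The debugfs side is a one-line addition to the irqchip flags table in
    kernel/irq/debugfs.c; sketched:

        static const struct irq_bit_descr irqchip_flags[] = {
                BIT_MASK_DESCR(IRQCHIP_SET_TYPE_MASKED),
                /* ... */
                BIT_MASK_DESCR(IRQCHIP_SUPPORTS_LEVEL_MSI), /* new entry */
        };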

    Fixes: 6988e0e0d283 ("genirq/msi: Limit level-triggered MSI to platform devices")
    Signed-off-by: Marc Zyngier
    Signed-off-by: Thomas Gleixner
    Cc: Jason Cooper
    Cc: Alexandre Belloni
    Cc: Yang Yingliang
    Cc: Sumit Garg
    Link: https://lkml.kernel.org/r/20180622095254.5906-2-marc.zyngier@arm.com

    Marc Zyngier
     

19 Jun, 2018

2 commits

  • When the comment was reflowed to a wider format, the "*" snuck in.

    Fixes: ae88a23b32fa ("irq: refactor and clean up the free_irq() code flow")
    Signed-off-by: Jonathan Neuschäfer
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20180617124018.25539-1-j.neuschaefer@gmx.net

    Jonathan Neuschäfer
     
  • Jeremy Dorfman identified mutex contention when multiple threads
    parse /proc/stat concurrently.

    Since commit 425a5072dcd1 ("genirq: Free irq_desc with rcu"),
    kstat_irqs_usr() can be switched to rcu locking, which removes this mutex
    contention.

    show_interrupts() case will be handled in a separate patch.
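
    With irq descriptors freed via RCU, the helper reduces to (sketch):

        unsigned int kstat_irqs_usr(unsigned int irq)
        {
                unsigned int sum;

                rcu_read_lock();
                sum = kstat_irqs(irq);
                rcu_read_unlock();
                return sum;
        }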

    Reported-by: Jeremy Dorfman
    Signed-off-by: Eric Dumazet
    Signed-off-by: Thomas Gleixner
    Cc: Eric Dumazet
    Cc: Willem de Bruijn
    Link: https://lkml.kernel.org/r/20180618125612.155057-1-edumazet@google.com

    Eric Dumazet
     

11 Jun, 2018

1 commit

  • Pull x86 updates and fixes from Thomas Gleixner:

    - Fix the (late) fallout from the vector management rework, which
    caused hlist corruption and irq descriptor reference leaks due to a
    missing sanity check.

    The straightforward fix triggered another long-standing issue to
    surface. The pre-rework code hid the issue due to being way slower,
    but now the chance that user space sees an EBUSY error return when
    updating irq affinities is way higher, though quite a bunch of
    userspace tools do not handle it properly despite the fact that EBUSY
    could have been returned for at least 10 years.

    It turned out that the EBUSY return can be avoided completely by
    utilizing the existing delayed affinity update mechanism for irq
    remapped scenarios as well. That's a bit more error handling in the
    kernel, but avoids fruitless fingerpointing discussions with tool
    developers.

    - Decouple PHYSICAL_MASK from AMD SME as it's going to be required for
    the upcoming Intel memory encryption support as well.

    - Handle legacy device ACPI detection properly for newer platforms

    - Fix the wrong argument ordering in the vector allocation tracepoint

    - Simplify the IDT setup code for the APIC=n case

    - Use the proper string helpers in the MTRR code

    - Remove a stale unused VDSO source file

    - Convert the microcode update lock to a raw spinlock as it's used in
    atomic context.

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
    x86/apic/vector: Print APIC control bits in debugfs
    genirq/affinity: Defer affinity setting if irq chip is busy
    x86/platform/uv: Use apic_ack_irq()
    x86/ioapic: Use apic_ack_irq()
    irq_remapping: Use apic_ack_irq()
    x86/apic: Provide apic_ack_irq()
    genirq/migration: Avoid out of line call if pending is not set
    genirq/generic_pending: Do not lose pending affinity update
    x86/apic/vector: Prevent hlist corruption and leaks
    x86/vector: Fix the args of vector_alloc tracepoint
    x86/idt: Simplify the idt_setup_apic_and_irq_gates()
    x86/platform/uv: Remove extra parentheses
    x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
    x86: Mark native_set_p4d() as __always_inline
    x86/microcode: Make the late update update_lock a raw lock for RT
    x86/mtrr: Convert to use strncpy_from_user() helper
    x86/mtrr: Convert to use match_string() helper
    x86/vdso: Remove unused file
    x86/i8237: Register device based on FADT legacy boot flag

    Linus Torvalds
     

06 Jun, 2018

4 commits

  • The case that interrupt affinity setting fails with -EBUSY can be handled
    in the kernel completely by using the already available generic pending
    infrastructure.

    If an irq_chip::set_affinity() call fails with -EBUSY, handle it like the
    interrupts for which irq_chip::set_affinity() can only be invoked from
    interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
    set the affinity-pending bit. The next raised interrupt for the affected
    irq will check the pending bit and try to set the new affinity from the
    handler. This avoids returning -EBUSY to user space when an affinity change
    is requested while the previous change has not been cleaned up. The new
    affinity will take effect when the next interrupt is raised from the
    device.
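
    A sketch of the resulting logic in kernel/irq/manage.c:

        static inline int irq_set_affinity_pending(struct irq_data *data,
                                                   const struct cpumask *dest)
        {
                struct irq_desc *desc = irq_data_to_desc(data);

                irqd_set_move_pending(data);
                irq_copy_pending(desc, dest);
                return 0;
        }

        static int irq_try_set_affinity(struct irq_data *data,
                                        const struct cpumask *dest, bool force)
        {
                int ret = irq_do_set_affinity(data, dest, force);

                /*
                 * If the vector management is busy, defer the update via
                 * the generic pending mechanism instead of reporting -EBUSY.
                 */
                if (ret == -EBUSY && !force)
                        ret = irq_set_affinity_pending(data, dest);
                return ret;
        }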

    Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
    Signed-off-by: Thomas Gleixner
    Tested-by: Song Liu
    Cc: Joerg Roedel
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Dmitry Safonov
    Cc: stable@vger.kernel.org
    Cc: Mike Travis
    Cc: Borislav Petkov
    Cc: Tariq Toukan
    Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de

    Thomas Gleixner
     
  • The upcoming fix for the -EBUSY return from affinity settings requires to
    use the irq_move_irq() functionality even on irq remapped interrupts. To
    avoid the out of line call, move the check for the pending bit into an
    inline helper.

    Preparatory change for the real fix. No functional change.
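
    The helper keeps the cheap pending-bit test inline and only takes the out
    of line path when a move is actually pending (sketch, include/linux/irq.h):

        void __irq_move_irq(struct irq_data *data);  /* out of line worker */

        static inline void irq_move_irq(struct irq_data *data)
        {
                if (unlikely(irqd_is_setaffinity_pending(data)))
                        __irq_move_irq(data);
        }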

    Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
    Signed-off-by: Thomas Gleixner
    Cc: Joerg Roedel
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Dmitry Safonov
    Cc: stable@vger.kernel.org
    Cc: Mike Travis
    Cc: Borislav Petkov
    Cc: Tariq Toukan
    Cc: Dou Liyang
    Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de

    Thomas Gleixner
     
  • The generic pending interrupt mechanism moves interrupts from the interrupt
    handler on the original target CPU to the new destination CPU. This is
    required for x86 and ia64 due to the way the interrupt delivery and
    acknowledge works if the interrupts are not remapped.

    However that update can fail for various reasons. Some of them are valid
    reasons to discard the pending update, but the case, when the previous move
    has not been fully cleaned up is not a legit reason to fail.

    Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
    a pending cleanup, and rearm the pending move in the irq descriptor so it's
    tried again when the next interrupt arrives.
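
    In irq_move_masked_irq() the -EBUSY case then rearms the move instead of
    dropping it; roughly:

        ret = irq_do_set_affinity(&desc->irq_data, desc->pending_mask, false);
        /*
         * If there is a cleanup pending in the underlying vector
         * management, reschedule the move for the next interrupt.
         * Leave desc->pending_mask intact.
         */
        if (ret == -EBUSY) {
                irqd_set_move_pending(&desc->irq_data);
                return;
        }
        cpumask_clear(desc->pending_mask);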

    Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
    Signed-off-by: Thomas Gleixner
    Tested-by: Song Liu
    Cc: Joerg Roedel
    Cc: Peter Zijlstra
    Cc: Song Liu
    Cc: Dmitry Safonov
    Cc: stable@vger.kernel.org
    Cc: Mike Travis
    Cc: Borislav Petkov
    Cc: Tariq Toukan
    Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de

    Thomas Gleixner
     
  • The interrupts are enabled/disabled so the interrupt handler can run
    with enabled interrupts while serving the interrupt and not lose other
    interrupts especially the timer tick.
    If the system runs with force-threaded interrupts then there is no need
    to enable the interrupts.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: David S. Miller
    Signed-off-by: David S. Miller

    Sebastian Andrzej Siewior
     

05 Jun, 2018

1 commit

  • Pull irq updates from Thomas Gleixner:

    - Consolidation of softirq pending:

    The softirq mask and its accessors/mutators have many implementations
    scattered around many architectures. Most do the same things
    consisting in a field in a per-cpu struct (often irq_cpustat_t)
    accessed through per-cpu ops. We can provide instead a generic
    efficient version that most of them can use. In fact s390 is the only
    exception because the field is stored in lowcore.

    - Support for level!?! triggered MSI (ARM)

    Over the past couple of years, we've seen some SoCs coming up with
    ways of signalling level interrupts using a new flavor of MSIs, where
    the MSI controller uses two distinct messages: one that raises a
    virtual line, and one that lowers it. The target MSI controller is in
    charge of maintaining the state of the line.

    This allows for a much simplified HW signal routing (no need to have
    hundreds of discrete lines to signal level interrupts if you already
    have a memory bus), but results in a departure from the current idea
    the kernel has of MSIs.

    - Support for Meson-AXG GPIO irqchip

    - Large stm32 irqchip rework (suspend/resume, hierarchical domains)

    - More SPDX conversions

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    ARM: dts: stm32: Add exti support to stm32mp157 pinctrl
    ARM: dts: stm32: Add exti support for stm32mp157c
    pinctrl/stm32: Add irq_eoi for stm32gpio irqchip
    irqchip/stm32: Add suspend/resume support for hierarchy domain
    irqchip/stm32: Add stm32mp1 support with hierarchy domain
    irqchip/stm32: Prepare common functions
    irqchip/stm32: Add host and driver data structures
    irqchip/stm32: Add suspend support
    irqchip/stm32: Add falling pending register support
    irqchip/stm32: Checkpatch fix
    irqchip/stm32: Optimizes and cleans up stm32-exti irq_domain
    irqchip/meson-gpio: Add support for Meson-AXG SoCs
    dt-bindings: interrupt-controller: New binding for Meson-AXG SoC
    dt-bindings: interrupt-controller: Fix the double quotes
    softirq/s390: Move default mutators of overwritten softirq mask to s390
    softirq/x86: Switch to generic local_softirq_pending() implementation
    softirq/sparc: Switch to generic local_softirq_pending() implementation
    softirq/powerpc: Switch to generic local_softirq_pending() implementation
    softirq/parisc: Switch to generic local_softirq_pending() implementation
    softirq/ia64: Switch to generic local_softirq_pending() implementation
    ...

    Linus Torvalds
     

13 May, 2018

1 commit

  • So far, MSIs have been used to signal edge-triggered interrupts, as
    a write is a good model for an edge (you can't "unwrite" something).
    On the other hand, routing zillions of wires in an SoC because you
    need level interrupts is a bit extreme.

    People have come up with a variety of schemes to support this, which
    involves sending two messages: one to signal the interrupt, and one
    to clear it. Since the kernel cannot represent this, we've ended up
    with side-band mechanisms that are pretty awful.

    Instead, let's acknowledge the requirement, and ensure that, under the
    right circumstances, the irq_compose_msi_msg and irq_write_msi_msg
    callbacks can take as a parameter an array of two messages instead of a
    pointer to a single one. We also add some checking that the compose method
    only clobbers the second message if the MSI domain has been created with
    the MSI_FLAG_LEVEL_CAPABLE flag.
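
    The activation path then composes and writes an array of two messages; a
    sketch of the shape, based on the reworked msi_domain_activate():

        static int msi_domain_activate(struct irq_domain *domain,
                                       struct irq_data *irq_data, bool early)
        {
                /* msg[0] raises the virtual line, msg[1] lowers it */
                struct msi_msg msg[2] = { [1] = { }, };

                msi_check_level(irq_data->domain, msg);  /* warn on clobber */
                irq_chip_compose_msi_msg(irq_data, msg);
                irq_chip_write_msi_msg(irq_data, msg);
                return 0;
        }

    where msi_check_level() warns if the compose callback touched the second
    message without the domain advertising MSI_FLAG_LEVEL_CAPABLE.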

    Signed-off-by: Marc Zyngier
    Signed-off-by: Thomas Gleixner
    Cc: Rob Herring
    Cc: Jason Cooper
    Cc: Ard Biesheuvel
    Cc: Srinivas Kandagatla
    Cc: Thomas Petazzoni
    Cc: Miquel Raynal
    Link: https://lkml.kernel.org/r/20180508121438.11301-2-marc.zyngier@arm.com

    Marc Zyngier
     

27 Apr, 2018

1 commit

  • There is the SPDX license identifier now in the irq simulator. Remove the
    license boilerplate.

    While at it: update the copyright notice, since I did some changes in 2018.

    Signed-off-by: Bartosz Golaszewski
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20180426200747.8344-1-brgl@bgdev.pl

    Bartosz Golaszewski
     

06 Apr, 2018

5 commits

  • Commit 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    tried to spread the interrupts across all possible CPUs to make sure that
    in case of physical hotplug (e.g. virtualization) the CPUs which get
    plugged in after the device was initialized are targeted by a hardware
    queue and the corresponding interrupt.

    This has a downside in cases where the ACPI tables claim that there are
    more possible CPUs than present CPUs and the number of interrupts to spread
    out is smaller than the number of possible CPUs. These bogus ACPI tables
    are unfortunately not uncommon.

    In such a case the vector spreading algorithm assigns interrupts to CPUs
    which can never be utilized and as a consequence these interrupts are
    unused instead of being mapped to present CPUs. As a result the performance
    of the device is suboptimal.

    To fix this spread the interrupt vectors in two stages:

    1) Spread as many interrupts as possible among the present CPUs

    2) Spread the remaining vectors among non present CPUs

    On a 8 core system, where CPU 0-3 are present and CPU 4-7 are not present,
    for a device with 4 queues the resulting interrupt affinity is:

    1) Before 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    irq 39, cpu list 0
    irq 40, cpu list 1
    irq 41, cpu list 2
    irq 42, cpu list 3

    2) With 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    irq 39, cpu list 0-2
    irq 40, cpu list 3-4,6
    irq 41, cpu list 5
    irq 42, cpu list 7

    3) With the refined vector spread applied:
    irq 39, cpu list 0,4
    irq 40, cpu list 1,6
    irq 41, cpu list 2,5
    irq 42, cpu list 3,7

    On a 8 core system, where all CPUs are present the resulting interrupt
    affinity for the 4 queues is:

    irq 39, cpu list 0,1
    irq 40, cpu list 2,3
    irq 41, cpu list 4,5
    irq 42, cpu list 6,7

    This is independent of the number of CPUs which are online at the point of
    initialization, because in such a system the offline CPUs can be easily
    onlined afterwards, while non-present CPUs need to be plugged physically
    or virtually, which requires external interaction.

    The downside of this approach is that in case of physical hotplug the
    interrupt vector spreading might be suboptimal when CPUs 4-7 are physically
    plugged. Suboptimal from a NUMA point of view and due to the single target
    nature of interrupt affinities the later plugged CPUs might not be targeted
    by interrupts at all.

    Though, physical hotplug systems are not the common case while the broken
    ACPI table disease is widespread. So it's preferred to have as many
    interrupts as possible utilized at the point where the device is
    initialized.

    Block multi-queue devices like NVME create a hardware queue per possible
    CPU, so the goal of commit 84676c1f21 to assign one interrupt vector per
    possible CPU is still achieved even with physical/virtual hotplug.
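
    The heart of the change is to run the spreading pass twice, first over the
    present CPUs and then over the remaining possible ones; roughly (helper
    name per the upstream patch, bookkeeping elided):

        /* Stage 1: spread among present CPUs, starting at affd->pre_vectors */
        usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
                                            node_to_cpumask, cpu_present_mask,
                                            nmsk, masks);

        /* Stage 2: spread the remaining vectors among non-present CPUs */
        if (usedvecs >= affvecs)
                curvec = affd->pre_vectors;
        else
                curvec = affd->pre_vectors + usedvecs;
        cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
        usedvecs += irq_build_affinity_masks(affd, curvec, affvecs,
                                             node_to_cpumask, npresmsk,
                                             nmsk, masks);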

    [ tglx: Changed from online to present CPUs for the first spreading stage,
    renamed variables for readability sake, added comments and massaged
    changelog ]

    Reported-by: Laurence Oberman
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-5-ming.lei@redhat.com

    Ming Lei
     
  • To support two stage irq vector spreading, it's required to add a starting
    point to the spreading function. No functional change, just preparatory
    work for the actual two stage change.

    [ tglx: Renamed variables, tidied up the code and massaged changelog ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-4-ming.lei@redhat.com

    Ming Lei
     
  • No functional change, just prepare for converting to 2-stage irq vector
    spreading.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-3-ming.lei@redhat.com

    Ming Lei
     
  • The following patches will introduce two stage irq spreading for improving
    irq spread on all possible CPUs.

    No functional change.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-2-ming.lei@redhat.com

    Ming Lei
     
  • When the allocation of node_to_possible_cpumask fails, then
    irq_create_affinity_masks() returns with a pointer to the empty affinity
    masks array, which will cause malfunction.

    Reorder the allocations so the masks array allocation comes last and every
    failure path returns NULL.
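
    The fix reorders the allocations so the masks array comes last and every
    failure path can return NULL; in outline:

        /* irq_create_affinity_masks() allocation order after the fix */
        node_to_possible_cpumask = alloc_node_to_possible_cpumask();
        if (!node_to_possible_cpumask)
                return NULL;            /* fail before masks exist */

        masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
        if (!masks)
                goto out_free_node_cpumask;     /* free and return NULL */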

    Fixes: 9a0ef98e186d ("genirq/affinity: Assign vectors to all present CPUs")
    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: Ming Lei

    Thomas Gleixner
     

04 Apr, 2018

1 commit

  • These config switches enable the same code in the core and in the not yet
    converted architecture code. They can both be selected by randconfig
    builds, which causes linker errors because the same symbols are defined
    twice.

    Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to
    prevent that. The dependency will be removed once all architectures are
    converted over.
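
    The stopgap is a Kconfig dependency (kernel/irq/Kconfig):

        config GENERIC_IRQ_MULTI_HANDLER
                # Interim; removed once all architectures are converted
                depends on !MULTI_IRQ_HANDLER
                bool
                help
                  Allow to specify the low level IRQ handler at run time.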

    Signed-off-by: Palmer Dabbelt
    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Arnd Bergmann
    Link: https://lkml.kernel.org/r/20180404043130.31277-4-palmer@sifive.com

    Palmer Dabbelt
     

20 Mar, 2018

5 commits

  • Now that SPDX identifiers are in place, remove the boilerplate or
    references.

    The change in timings.c has been acked by the author.

    Signed-off-by: Thomas Gleixner
    Acked-by: Daniel Lezcano
    Cc: Kate Stewart
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Link: https://lkml.kernel.org/r/20180314212030.668321222@linutronix.de

    Thomas Gleixner
     
  • Add SPDX identifiers to files

    - which contain an explicit license boiler plate or reference

    - which do not contain a license reference and were not updated in the
    initial SPDX conversion because the license was deduced by the scanners
    via EXPORT_SYMBOL_GPL as GPL2.0 only.

    [ tglx: Moved adding identifiers from the patch which removes the
    references/boilerplate ]

    Signed-off-by: Thomas Gleixner
    Cc: Kate Stewart
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Link: https://lkml.kernel.org/r/20180314212030.668321222@linutronix.de

    Thomas Gleixner
     
  • Use the proper SPDX-Identifier format.

    Signed-off-by: Thomas Gleixner
    Acked-by: Marc Zyngier
    Cc: Kate Stewart
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Link: https://lkml.kernel.org/r/20180314212030.492674761@linutronix.de

    Thomas Gleixner
     
  • Remove pointless references to the file name itself and condense the
    information so it wastes less space.

    Signed-off-by: Thomas Gleixner
    Acked-by: Marc Zyngier
    Cc: Kate Stewart
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Link: https://lkml.kernel.org/r/20180314212030.412095827@linutronix.de

    Thomas Gleixner
     
  • Given that irq_to_desc() is a radix tree lookup, that the reverse
    operation is only a pointer dereference, and that all callers of
    __free_irq() already have the desc, pass the desc instead of the irq
    number.
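
    The signature change, in outline:

        /* before: __free_irq(unsigned int irq, void *dev_id) + irq_to_desc() */
        static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
        {
                unsigned int irq = desc->irq_data.irq;  /* reverse lookup is
                                                           a dereference */
                /* ... */
        }

        /* callers, e.g. free_irq(), already hold the desc: */
        action = __free_irq(desc, dev_id);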

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Thomas Gleixner
    Cc: kernel@pengutronix.de
    Link: https://lkml.kernel.org/r/20180319105202.9794-1-u.kleine-koenig@pengutronix.de

    Uwe Kleine-König