04 Jul, 2016

3 commits

  • Pull the irq affinity managing code which is in a separate branch for block
    developers to pull.

    Thomas Gleixner
     
  • Add an extra argument to the irq(domain) allocation functions, so we can hand
    down affinity hints to the allocator. That's necessary to implement proper
    support for multiqueue devices.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Interrupts marked with this flag are excluded from user space interrupt
    affinity changes. Contrary to the IRQ_NO_BALANCING flag, the kernel internal
    affinity mechanism is not blocked.

    This flag will be used for multi-queue device interrupts.
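A minimal user-space model of the rule above, assuming a hypothetical flag bit `MODEL_AFFINITY_MANAGED` (the commit text does not name the real flag) and an assumed `-EIO` error code: a user-space affinity request is refused for a managed interrupt, while a kernel-internal request still goes through.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for the real per-interrupt flag. */
#define MODEL_AFFINITY_MANAGED (1u << 0)

struct model_irq {
	unsigned int flags;
	unsigned long affinity; /* one bit per CPU */
};

/*
 * Apply a new affinity mask. Requests coming from user space are
 * refused for managed interrupts; kernel-internal requests are not.
 */
int model_set_affinity(struct model_irq *irq, unsigned long mask,
		       bool from_user)
{
	if (from_user && (irq->flags & MODEL_AFFINITY_MANAGED))
		return -EIO; /* assumed error code for this sketch */
	if (!mask)
		return -EINVAL;
	irq->affinity = mask;
	return 0;
}
```

Unlike a "no balancing" style flag, only the `from_user` path is blocked here, which is the distinction the commit draws.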

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-3-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

13 Jun, 2016

3 commits

  • Some IRQ chips may be located in a power domain outside of the CPU
    subsystem and hence will require device specific runtime power
    management. In order to support such IRQ chips, add a pointer for a
    device structure to the irq_chip structure, and if this pointer is
    populated by the IRQ chip driver and CONFIG_PM is selected in the kernel
    configuration, then the pm_runtime_get/put APIs for this chip will be
    called when an IRQ is requested/freed, respectively.
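A simplified sketch of the request/free pairing described above, assuming hypothetical types: `model_chip.parent_device` plays the role of the new device pointer, and a plain counter stands in for the runtime-PM usage count. The CONFIG_PM condition is not modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device's runtime-PM usage counter. */
struct model_device {
	int usage_count;
};

/* Hypothetical chip: parent_device stays NULL unless the driver sets it. */
struct model_chip {
	struct model_device *parent_device;
};

static void model_pm_get(struct model_device *dev) { dev->usage_count++; }
static void model_pm_put(struct model_device *dev) { dev->usage_count--; }

/* On request: power the chip up only if a device is attached. */
void model_request_irq(struct model_chip *chip)
{
	if (chip->parent_device)
		model_pm_get(chip->parent_device);
}

/* On free: drop the reference taken at request time. */
void model_free_irq(struct model_chip *chip)
{
	if (chip->parent_device)
		model_pm_put(chip->parent_device);
}
```

Chips that leave the pointer NULL behave exactly as before, which is why the change is opt-in per driver.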

    Reviewed-by: Kevin Hilman
    Signed-off-by: Jon Hunter
    Signed-off-by: Marc Zyngier

    Jon Hunter
     
  • As we now do for non-percpu interrupts, perform a lookup of the
    interrupt trigger if the user doesn't supply one. The difference
    here is that we can only do it at enable time (trigger configuration
    can be per-cpu as well).

    Signed-off-by: Marc Zyngier

    Marc Zyngier
     
  • For some devices, the IRQ trigger type is read from firmware, such as
    device-tree. The IRQ trigger type is typically read when the mapping
    for the IRQ is created, which is before the IRQ is requested. Hence,
    the IRQ trigger type is programmed when mapping the IRQ and not when
    requesting the IRQ.

    Although this works for most cases, in order to support IRQ chips which
    require runtime power management, which may not be accessible prior
    to requesting the IRQ, it is desirable to look up the IRQ trigger type
    when it is requested. Therefore, if the IRQ trigger type is not
    specified when __setup_irq() is called, look up the saved IRQ trigger
    type. This will allow us to defer the programming of the trigger type
    from when the IRQ is mapped to when it is actually requested.
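The fallback can be sketched as a single flag check. The trigger bit values below are hypothetical `MODEL_*` constants loosely modelled on the IRQF_TRIGGER_* flags, and `saved_trigger` stands in for whatever the descriptor recorded at mapping time.

```c
/* Hypothetical trigger-type bits, loosely modelled on IRQF_TRIGGER_*. */
#define MODEL_TRIGGER_RISING  0x1u
#define MODEL_TRIGGER_FALLING 0x2u
#define MODEL_TRIGGER_HIGH    0x4u
#define MODEL_TRIGGER_LOW     0x8u
#define MODEL_TRIGGER_MASK    0xfu

/* Hypothetical descriptor holding the trigger saved at mapping time. */
struct model_desc {
	unsigned int saved_trigger;
};

/*
 * If the caller passed no trigger bits, fall back to the type recorded
 * when the mapping was created (e.g. from device tree). An explicit
 * trigger from the caller always wins.
 */
unsigned int model_resolve_flags(const struct model_desc *desc,
				 unsigned int flags)
{
	if (!(flags & MODEL_TRIGGER_MASK))
		flags |= desc->saved_trigger & MODEL_TRIGGER_MASK;
	return flags;
}
```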

    Signed-off-by: Jon Hunter
    Reviewed-by: Marc Zyngier
    Signed-off-by: Marc Zyngier

    Jon Hunter
     

11 May, 2016

1 commit

  • In setup_irq(), we don't check that the descriptor returned from
    irq_to_desc() is valid before we start using it. For example,
    chip_bus_lock(), called from setup_irq(), assumes that the descriptor
    pointer is valid and doesn't check it before dereferencing it.

    Many other functions, including setup/free_percpu_irq(), do check
    that the returned descriptor is not NULL; therefore, add the same test
    to setup_irq() to ensure the returned descriptor is valid.
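A user-space model of the check, assuming a hypothetical sparse descriptor table where unallocated slots are NULL. All `model_*` names are illustrative, not kernel API.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_IRQS 16

struct model_desc {
	int in_use;
};

/* Sparse table: unused slots are NULL, like an unallocated descriptor. */
static struct model_desc *model_descs[MODEL_NR_IRQS];

/* Hypothetical lookup that returns NULL for unknown interrupt numbers. */
static struct model_desc *model_irq_to_desc(unsigned int irq)
{
	return irq < MODEL_NR_IRQS ? model_descs[irq] : NULL;
}

/* Validate the descriptor before anything dereferences it. */
int model_setup_irq(unsigned int irq)
{
	struct model_desc *desc = model_irq_to_desc(irq);

	if (!desc)
		return -EINVAL;
	desc->in_use = 1; /* safe: desc is known to be non-NULL */
	return 0;
}

/* Helper for the example: install a descriptor in a slot. */
void model_install_desc(unsigned int irq, struct model_desc *desc)
{
	if (irq < MODEL_NR_IRQS)
		model_descs[irq] = desc;
}
```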

    Signed-off-by: Jon Hunter
    Signed-off-by: Marc Zyngier

    Jon Hunter
     

23 Mar, 2016

1 commit

  • Use the more common logging method with the eventual goal of removing
    pr_warning altogether.

    Miscellanea:

    - Realign arguments
    - Coalesce formats
    - Add missing space between a few coalesced formats

    Signed-off-by: Joe Perches
    Acked-by: Rafael J. Wysocki [kernel/power/suspend.c]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

17 Mar, 2016

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the majority of changes go into cpufreq and they are
    significant.

    First off, the way CPU frequency updates are triggered is different
    now. Instead of having to set up and manage a deferrable timer for
    each CPU in the system to evaluate and possibly change its frequency
    periodically, cpufreq governors set up callbacks to be invoked by the
    scheduler on a regular basis (basically on utilization updates). The
    "old" governors, "ondemand" and "conservative", still do all of their
    work in process context (although that is triggered by the scheduler
    now), but intel_pstate does it all in the callback invoked by the
    scheduler with no need for any additional asynchronous processing.

    Of course, this eliminates the overhead related to the management of
    all those timers, but also it allows the cpufreq governor code to be
    simplified quite a bit. On top of that, the common code and data
    structures used by the "ondemand" and "conservative" governors are
    cleaned up and made more straightforward and some long-standing and
    quite annoying problems are addressed. In particular, the handling of
    governor sysfs attributes is modified and the related locking becomes
    more fine grained which allows some concurrency problems to be avoided
    (particularly deadlocks with the core cpufreq code).

    In principle, the new mechanism for triggering frequency updates
    allows utilization information to be passed from the scheduler to
    cpufreq. Although the current code doesn't make use of it, in the
    works is a new cpufreq governor that will make decisions based on the
    scheduler's utilization data. That should allow the scheduler and
    cpufreq to work more closely together in the long run.

    In addition to the core and governor changes, cpufreq drivers are
    updated too. Fixes and optimizations go into intel_pstate, the
    cpufreq-dt driver is updated on top of some modifications in the
    Operating Performance Points (OPP) framework and there are fixes and
    other updates in the powernv cpufreq driver.

    Apart from the cpufreq updates there is some new ACPICA material,
    including a fix for a problem introduced by previous ACPICA updates,
    and some less significant changes in the ACPI code, like CPPC code
    optimizations, ACPI processor driver cleanups and support for loading
    ACPI tables from initrd.

    Also updated are the generic power domains framework, the Intel RAPL
    power capping driver and the turbostat utility and we have a bunch of
    traditional assorted fixes and cleanups.

    Specifics:

    - Redesign of cpufreq governors and the intel_pstate driver to make
    them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers for
    that purpose (Rafael Wysocki).

    - Reorganization and cleanup of cpufreq governor code to make it more
    straightforward and fix some concurrency problems in it (Rafael
    Wysocki, Viresh Kumar).

    - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).

    - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).

    - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).

    - Operating Performance Points (OPP) framework updates to improve its
    handling of voltage regulators and device clocks and updates of the
    cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

    - Updates of the powernv cpufreq driver to fix initialization and
    cleanup problems in it and correct its worker thread handling with
    respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
    Bhat).

    - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

    - ACPICA updates including one fix for a regression introduced by
    previous changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
    Colin Ian King).

    - Support for installing ACPI tables from initrd (Lv Zheng).

    - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).

    - Support for _HID(ACPI0010) devices (ACPI processor containers) and
    ACPI processor driver cleanups (Sudeep Holla).

    - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).

    - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as a
    valid interrupt vector (Chen Fan).

    - ACPI APEI fixes related to resource leaks (Josh Hunt).

    - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).

    - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).

    - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

    - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

    - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).

    - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).

    - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).

    - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).

    - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).

    - Year 2038 fix for the process freezer (Abhilash Jindal).

    - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reduction of the number of syscalls
    made, fixes related to Xeon x200 processors, compiler warning
    fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

    * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
    tools/power turbostat: bugfix: TDP MSRs print bits fixing
    tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
    tools/power turbostat: call __cpuid() instead of __get_cpuid()
    tools/power turbostat: indicate SMX and SGX support
    tools/power turbostat: detect and work around syscall jitter
    tools/power turbostat: show GFX%rc6
    tools/power turbostat: show GFXMHz
    tools/power turbostat: show IRQs per CPU
    tools/power turbostat: make fewer systems calls
    tools/power turbostat: fix compiler warnings
    tools/power turbostat: add --out option for saving output in a file
    tools/power turbostat: re-name "%Busy" field to "Busy%"
    tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
    tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
    tools/power turbostat: allow sub-sec intervals
    ACPI / APEI: ERST: Fixed leaked resources in erst_init
    ACPI / APEI: Fix leaked resources
    intel_pstate: Do not skip samples partially
    intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
    intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
    ...

    Linus Torvalds
     

09 Mar, 2016

1 commit

  • Per the x86-specific footnote to PCI spec r3.0, sec 6.2.4, the value 255 in
    the Interrupt Line register means "unknown" or "no connection."
    Previously, when we couldn't derive an IRQ from the _PRT, we fell back to
    using the value from Interrupt Line as an IRQ. It's questionable whether
    we should do that at all, but the spec clearly suggests we shouldn't do it
    for the value 255 on x86.

    Calling request_irq() with IRQ 255 may succeed, but the driver won't
    receive any interrupts. Or, if IRQ 255 is shared with another device, it
    may succeed, and the driver's ISR will be called at random times when the
    *other* device interrupts. Or it may fail if another device is using IRQ
    255 with incompatible flags. What we *want* is for request_irq() to fail
    predictably so the driver can fall back to polling.

    On x86, assume 255 in the Interrupt Line means the INTx line is not
    connected. In that case, set dev->irq to IRQ_NOTCONNECTED so request_irq()
    will fail gracefully with -ENOTCONN.
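The decision can be modelled in a few lines. The sentinel value below is an assumption made for the sketch (the commit only names IRQ_NOTCONNECTED), and `model_request_irq()` is a hypothetical stand-in that treats every other line as available.

```c
#include <errno.h>

/* Assumed sentinel value for this sketch only. */
#define MODEL_IRQ_NOTCONNECTED (1u << 31)

/*
 * Map the PCI Interrupt Line value to an IRQ number when no _PRT
 * routing was found. On x86, 255 means "unknown / not connected".
 */
unsigned int model_fallback_irq(unsigned char interrupt_line, int is_x86)
{
	if (is_x86 && interrupt_line == 255)
		return MODEL_IRQ_NOTCONNECTED;
	return interrupt_line;
}

/* request_irq()-like check: fail predictably instead of guessing. */
int model_request_irq(unsigned int irq)
{
	if (irq == MODEL_IRQ_NOTCONNECTED)
		return -ENOTCONN;
	return 0; /* pretend every other line can be requested */
}
```

The point of the sentinel is that the failure no longer depends on what other devices happen to be using IRQ 255.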

    We found this problem on a system where Secure Boot firmware assigned
    Interrupt Line 255 to an i801_smbus device and another device was already
    using MSI-X IRQ 255. This was in v3.10, where i801_probe() fails if
    request_irq() fails:

    i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
    i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C
    i801_smbus 0000:00:1f.3: PCI INT C: no GSI
    genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa)
    CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1
    Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5
    Call Trace:
    dump_stack+0x19/0x1b
    __setup_irq+0x54a/0x570
    request_threaded_irq+0xcc/0x170
    i801_probe+0x32f/0x508 [i2c_i801]
    local_pci_probe+0x45/0xa0
    i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
    i801_smbus: probe of 0000:00:1f.3 failed with error -16

    After aeb8a3d16ae0 ("i2c: i801: Check if interrupts are disabled"),
    i801_probe() will fall back to polling if request_irq() fails. But we
    still need this patch because request_irq() may succeed or fail depending
    on other devices in the system. If request_irq() fails, i801_smbus will
    work by falling back to polling, but if it succeeds, i801_smbus won't work
    because it expects interrupts that it may not receive.

    Signed-off-by: Chen Fan
    Acked-by: Thomas Gleixner
    Acked-by: Bjorn Helgaas
    Signed-off-by: Rafael J. Wysocki

    Chen Fan
     

15 Feb, 2016

1 commit

  • The irq code walks the list of actions in several different ways to
    inspect the elements one by one. Although this is not a problem, for the
    sake of consistent code, provide a macro similar to for_each_irq_desc so
    the same loop is used to go through the actions list, and use it in the code.

    [ tglx: Renamed the macro ]
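The final name of the macro after the rename is not given here, so the sketch below uses a hypothetical `model_for_each_action()` over hypothetical trimmed-down types. It shows the single loop shape that replaces the ad hoc walks.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down action and descriptor types. */
struct model_action {
	int id;
	struct model_action *next;
};

struct model_desc {
	struct model_action *action; /* head of the (shared) action list */
};

/* One loop shape for every walk over a descriptor's actions. */
#define model_for_each_action(desc, act) \
	for ((act) = (desc)->action; (act); (act) = (act)->next)

/* Example user: count how many handlers share the line. */
int model_count_actions(const struct model_desc *desc)
{
	const struct model_action *act;
	int n = 0;

	model_for_each_action(desc, act)
		n++;
	return n;
}
```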

    Signed-off-by: Daniel Lezcano
    Link: http://lkml.kernel.org/r/1452765253-31148-1-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Thomas Gleixner

    Daniel Lezcano
     

12 Jan, 2016

1 commit

  • Pull irq updates from Thomas Gleixner:
    "The irq department provides:

    - Support for MSI to wire bridges and a first user of it

    - More ACPI support for ARM/GIC

    - A new TS-4800 interrupt controller driver

    - RCU based free of interrupt descriptors to support the upcoming
    Intel VMD technology without introducing a locking nightmare

    - The usual pile of fixes and updates to drivers and core code"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    irqchip/omap-intc: Add support for spurious irq handling
    irqchip/zevio: Use irq_data_get_chip_type() helper
    irqchip/omap-intc: Remove duplicate setup for IRQ chip type handler
    irqchip/ts4800: Add TS-4800 interrupt controller
    irqchip/ts4800: Add documentation for TS-4800 interrupt controller
    irq/platform-MSI: Increase the maximum MSIs the MSI framework can support
    irqchip/gicv2m: Miscellaneous fixes for v2m resources and SPI ranges
    irqchip/bcm2836: Make code more readable
    irqchip/bcm2836: Tolerate IRQs while no flag is set in ISR
    irqchip/bcm2836: Add SMP support for the 2836
    irqchip/bcm2836: Fix initialization of the LOCAL_IRQ_CNT timers
    irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support
    irqchip/gic-v2m: Refactor to prepare for ACPI support
    irqdomain: Introduce is_fwnode_irqchip helper
    acpi: pci: Setup MSI domain for ACPI based pci devices
    genirq/msi: Export functions to allow MSI domains in modules
    irqchip/mbigen: Implement the mbigen irq chip operation functions
    irqchip/mbigen: Create irq domain for each mbigen device
    irqchip/mgigen: Add platform device driver for mbigen device
    dt-bindings: Documents the mbigen bindings
    ...

    Linus Torvalds
     

14 Dec, 2015

1 commit

  • If an interrupt chip utilizes chip->buslock then free_irq() can
    deadlock in the following way:

    CPU0                                 CPU1
    interrupt(X) (Shared or spurious)
    free_irq(X)                          interrupt_thread(X)
    chip_bus_lock(X)
                                         irq_finalize_oneshot(X)
                                         chip_bus_lock(X)
    synchronize_irq(X)

    synchronize_irq() waits for the interrupt thread to complete,
    i.e. forever.

    Solution is simple: Drop chip_bus_lock() before calling
    synchronize_irq() as we do with the irq_desc lock. There is nothing to
    be protected after the point where irq_desc lock has been released.

    This adds chip_bus_lock/unlock() to the remove_irq() code path, but
    that's actually correct in the case where remove_irq() is called on
    such an interrupt. The current users of remove_irq() are not affected
    as none of those interrupts is on a chip which requires buslock.
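The fix is purely about ordering, so a sketch can record the sequence of steps in the free path and check it. The step names mirror the text; the trace buffer and `model_*` functions are hypothetical illustration, not kernel code.

```c
#include <string.h>

/* Hypothetical trace buffer recording the order of operations. */
static const char *model_trace[8];
static int model_steps;

static void model_step(const char *what)
{
	if (model_steps < 8)
		model_trace[model_steps++] = what;
}

/*
 * Sketch of the fixed free path: both the descriptor lock and the chip
 * bus lock are released before waiting for the irq thread, so the
 * thread can take the bus lock in irq_finalize_oneshot() and finish.
 */
void model_free_irq(void)
{
	model_step("chip_bus_lock");
	model_step("desc_lock");
	model_step("unlink_action");
	model_step("desc_unlock");
	model_step("chip_bus_unlock");
	model_step("synchronize_irq");
}

/* Return the position of a step in the trace, or -1 if absent. */
int model_step_index(const char *what)
{
	for (int i = 0; i < model_steps; i++)
		if (strcmp(model_trace[i], what) == 0)
			return i;
	return -1;
}
```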

    Reported-by: Fredrik Markström
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     

08 Dec, 2015

1 commit

  • Certain interrupt controller drivers have a register set that does not
    make it easy to save/restore the mask of enabled/disabled interrupts
    at suspend/resume time. At resume time, such drivers rely on the core
    kernel irq subsystem to tell whether a given interrupt is enabled or
    not, in order to restore the proper state in the interrupt controller
    register.

    While the irqd_irq_disabled() provides the relevant information for
    global interrupts, there is no similar function to query the
    enabled/disabled state of a per-CPU interrupt.

    Therefore, this commit complements the percpu_irq API with an
    irq_percpu_is_enabled() function.

    [ tglx: Simplified the implementation and added kerneldoc ]
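A user-space model of per-CPU enabled state, assuming a hypothetical 64-bit mask with one bit per CPU. Unlike the kernel function, which by its description answers for the calling CPU, this sketch takes the CPU number explicitly so it can be tested without real CPUs.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 64

/* Hypothetical per-CPU interrupt: one enabled bit per CPU. */
struct model_percpu_irq {
	unsigned long long enabled_mask;
};

void model_enable_percpu(struct model_percpu_irq *irq, unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		irq->enabled_mask |= 1ull << cpu;
}

void model_disable_percpu(struct model_percpu_irq *irq, unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		irq->enabled_mask &= ~(1ull << cpu);
}

/* Query used at resume time to decide what to restore in hardware. */
bool model_percpu_is_enabled(const struct model_percpu_irq *irq,
			     unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS && ((irq->enabled_mask >> cpu) & 1);
}
```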

    Signed-off-by: Thomas Petazzoni
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Tawfik Bayouk
    Cc: Nadav Haklai
    Cc: Lior Amsalem
    Cc: Andrew Lunn
    Cc: Sebastian Hesselbarth
    Cc: Gregory Clement
    Cc: Jason Cooper
    Cc: Marc Zyngier
    Link: http://lkml.kernel.org/r/1445347435-2333-2-git-send-email-thomas.petazzoni@free-electrons.com
    Signed-off-by: Thomas Gleixner

    Thomas Petazzoni
     

05 Nov, 2015

1 commit

  • Pull networking updates from David Miller:

    Changes of note:

    1) Allow ICMP packets to be scheduled in IPVS, from Alex Gartrell.

    2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
    David Ahern.

    3) Allow the user to ask for the statistics to be filtered out of
    ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.

    4) More work to pass the network namespace context around deep into
    various packet path APIs, starting with the netfilter hooks. From
    Eric W Biederman.

    5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
    Richter.

    6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.

    7) Support Very High Throughput in wireless MESH code, from Bob
    Copeland.

    8) Allow setting the ageing_time in switchdev/rocker. From Scott
    Feldman.

    9) Properly autoload L2TP type modules, from Stephen Hemminger.

    10) Fix and enable offload features by default in 8139cp driver, from
    David Woodhouse.

    11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
    Jiri Benc.

    12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
    Opstad.

    13) Fix IPSEC flowcache overflows on large systems, from Steffen
    Klassert.

    14) Convert bridging to track VLANs using rhashtable entries rather than
    a bitmap. From Nikolay Aleksandrov.

    15) Make TCP listener handling completely lockless, this is a major
    accomplishment. Incoming request sockets now live in the
    established hash table just like any other socket too.

    From Eric Dumazet.

    15) Provide more bridging attributes to netlink, from Nikolay
    Aleksandrov.

    16) Use hash based algorithm for ipv4 multipath routing, this was very
    long overdue. From Peter Nørlund.

    17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.

    18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.

    19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
    influences the port binding selection logic used by SO_REUSEPORT.

    20) Add ipv6 support to VRF, from David Ahern.

    21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.

    22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.

    23) Implement RACK loss recovery in TCP, from Yuchung Cheng.

    24) Support multipath routes in MPLS, from Roopa Prabhu.

    25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
    Dumazet.

    26) Add new QED QLogic driver, from Yuval Mintz, Manish Chopra, and
    Sudarsana Kalluru.

    27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
    Sowa.

    28) Support ipv6 geneve tunnels, from John W Linville.

    29) Add flood control support to switchdev layer, from Ido Schimmel.

    30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
    Hannes Frederic Sowa.

    31) Support persistent maps and progs in bpf, from Daniel Borkmann.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
    sh_eth: use DMA barriers
    switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
    net: sched: kill dead code in sch_choke.c
    irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
    net: dsa: mv88e6xxx: include DSA ports in VLANs
    net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
    net/core: fix for_each_netdev_feature
    vlan: Invoke driver vlan hooks only if device is present
    arcnet/com20020: add LEDS_CLASS dependency
    bpf, verifier: annotate verbose printer with __printf
    dp83640: Only wait for timestamps for packets with timestamping enabled.
    ptp: Change ptp_class to a proper bitmask
    dp83640: Prune rx timestamp list before reading from it
    dp83640: Delay scheduled work.
    dp83640: Include hash in timestamp/packet matching
    ipv6: fix tunnel error handling
    net/mlx5e: Fix LSO vlan insertion
    net/mlx5e: Re-eanble client vlan TX acceleration
    net/mlx5e: Return error in case mlx5e_set_features() fails
    net/mlx5e: Don't allow more than max supported channels
    ...

    Linus Torvalds
     

11 Oct, 2015

1 commit

  • If an irq chip does not implement the irq_disable callback, then we
    use a lazy approach for disabling the interrupt. That means that the
    interrupt is marked disabled, but the interrupt line is not
    immediately masked in the interrupt chip. It only becomes masked if
    the interrupt is raised while it's marked disabled. We use this to avoid
    possibly expensive mask/unmask operations for common case operations.

    Unfortunately there are devices which do not allow the interrupt to be
    disabled easily at the device level. They are forced to use
    disable_irq_nosync(). This can result in taking each interrupt twice.

    Instead of enforcing the non-lazy mode on all interrupts of an irq
    chip, provide a settings flag, which can be set by the driver for that
    particular interrupt line.
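A toy model of lazy versus non-lazy disable, assuming a hypothetical `unlazy` per-line setting in place of the real flag. It counts the extra interrupt entry that the lazy path costs; replaying the pending interrupt on enable is not modelled.

```c
#include <stdbool.h>

/* Hypothetical interrupt state for the lazy-disable model. */
struct model_irq {
	bool disabled;     /* software state set by disable_irq*() */
	bool masked;       /* hardware mask in the interrupt chip */
	bool unlazy;       /* per-line "mask immediately" setting */
	int handled;       /* times the device handler ran */
	int extra_entries; /* entries taken only to mask the line */
};

void model_disable(struct model_irq *irq)
{
	irq->disabled = true;
	/* Non-lazy lines are masked right away; lazy ones are not. */
	if (irq->unlazy)
		irq->masked = true;
}

/* The device raises its interrupt line. */
void model_raise(struct model_irq *irq)
{
	if (irq->masked)
		return; /* never reaches the CPU */
	if (irq->disabled) {
		/* Lazy path: take the interrupt, then mask it now. */
		irq->extra_entries++;
		irq->masked = true;
		return;
	}
	irq->handled++;
}
```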

    Reported-and-tested-by: Duc Dang
    Signed-off-by: Thomas Gleixner
    Cc: Marc Zyngier
    Cc: Jason Cooper
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1510092348370.6097@nanos

    Thomas Gleixner
     

10 Oct, 2015

1 commit

  • irq_set_vcpu_affinity() is needed when CONFIG_SMP=n, so move the
    definition out of "#ifdef CONFIG_SMP"

    Suggested-by: Paolo Bonzini
    Signed-off-by: Feng Wu
    Cc: jiang.liu@linux.intel.com
    Cc: pbonzini@redhat.com
    Link: http://lkml.kernel.org/r/1443860438-144926-1-git-send-email-feng.wu@intel.com
    Signed-off-by: Thomas Gleixner

    Feng Wu
     

30 Sep, 2015

2 commits


22 Sep, 2015

1 commit

  • Forced threading of interrupts does not really deal with interrupts
    which are requested with a primary and a threaded handler. The current
    policy is to leave them alone and let the primary handler run in
    interrupt context, but we set the ONESHOT flag for those interrupts as
    well.

    Kohji Okuno debugged a problem with the SDHCI driver where the
    interrupt thread waits for a hardware interrupt to trigger, which can't
    work well because the hardware interrupt is masked due to the ONESHOT
    flag being set. He proposed to set the ONESHOT flag only if the
    interrupt does not provide a thread handler.

    That does not work either, because these interrupts can be shared:
    another action on the same line would rightfully get the ONESHOT flag
    set, and therefore the same situation would happen again.

    To deal with this properly, we need to force thread the primary handler
    of such interrupts as well. That means that the primary interrupt
    handler is treated as any other primary interrupt handler which is not
    marked IRQF_NO_THREAD. The threaded handler becomes a separate thread
    so the SDHCI flow logic can be handled gracefully.
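The policy change can be summarised as a placement decision. The request and plan structures below are hypothetical; `no_thread` stands in for an IRQF_NO_THREAD-style opt-out.

```c
#include <stdbool.h>

/* Hypothetical request description: which handlers were supplied. */
struct model_request {
	bool has_primary; /* hard-irq handler supplied */
	bool has_thread;  /* threaded handler supplied */
	bool no_thread;   /* opt-out from forced threading */
};

/* Where each handler ends up when threading is forced. */
struct model_plan {
	bool primary_in_hardirq; /* primary runs in interrupt context */
	bool primary_threaded;   /* primary runs in its own thread */
	bool thread_fn_threaded; /* supplied thread handler runs in a thread */
};

struct model_plan model_force_thread(const struct model_request *req)
{
	struct model_plan plan = { false, false, false };

	plan.thread_fn_threaded = req->has_thread;
	if (req->no_thread) {
		plan.primary_in_hardirq = req->has_primary;
		return plan;
	}
	/* New policy: thread the primary even if a thread handler exists. */
	plan.primary_threaded = req->has_primary;
	return plan;
}
```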

    The same issue was reported against 4.1-rt.

    Reported-and-tested-by: Kohji Okuno
    Reported-By: Michal Smucr
    Reported-and-tested-by: Nathan Sullivan
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Andrzej Siewior
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509211058080.5606@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Sep, 2015

1 commit

  • The irq affinity mask is per-irq instead of per-irqchip, so move it into
    struct irq_common_data.

    Signed-off-by: Jiang Liu
    Cc: Konrad Rzeszutek Wilk
    Cc: Tony Luck
    Cc: Bjorn Helgaas
    Cc: Benjamin Herrenschmidt
    Cc: Randy Dunlap
    Cc: Yinghai Lu
    Cc: Borislav Petkov
    Cc: Jason Cooper
    Cc: Kevin Cernekee
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com
    Signed-off-by: Thomas Gleixner

    Jiang Liu
     

27 Jul, 2015

1 commit

  • Export these functions to be able to build the Qualcomm family A PMIC
    gpio and mpp drivers as modules.

    [ tglx: Made them GPL exports ]

    Signed-off-by: Bjorn Andersson
    Reviewed-by: Mark Brown
    Cc: Marc Zyngier
    Cc: Srinivas Kandagatla
    Cc: Linus Walleij
    Link: http://lkml.kernel.org/r/1437594184-22966-1-git-send-email-bjorn.andersson@sonymobile.com
    Signed-off-by: Thomas Gleixner

    Bjorn Andersson
     

12 Jul, 2015

5 commits


12 Jun, 2015

1 commit

  • Introduce helper function irq_data_get_node() and variants thereof to
    hide struct irq_data implementation details.

    Convert the core code to use them.

    Signed-off-by: Jiang Liu
    Cc: Konrad Rzeszutek Wilk
    Cc: Tony Luck
    Cc: Bjorn Helgaas
    Cc: Benjamin Herrenschmidt
    Cc: Randy Dunlap
    Cc: Yinghai Lu
    Cc: Borislav Petkov
    Cc: Jason Cooper
    Cc: Kevin Cernekee
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/1433145945-789-5-git-send-email-jiang.liu@linux.intel.com
    Signed-off-by: Thomas Gleixner

    Jiang Liu
     

19 May, 2015

1 commit

  • With Posted-Interrupts support in Intel CPU and IOMMU, an external
    interrupt from assigned devices could be directly delivered to a
    virtual CPU in a virtual machine. Instead of hacking KVM and Intel
    IOMMU drivers, we propose a platform independent interface to target
    an interrupt to a specific virtual CPU in a virtual machine, or set
    virtual CPU affinity for an interrupt.

    By adopting this new interface and the hierarchy irqdomain, we could
    easily support posted-interrupts on Intel platforms, and also provide
    flexible enough interfaces for other platforms to support similar
    features.

    Here is the usage scenario for this interface:
    Guest updates MSI/MSI-X interrupt configuration
    -->QEMU and KVM handle this
    -->KVM calls this interface (passing posted interrupts descriptor
       and guest vector)
    -->irq core transfers the control to IOMMU
    -->IOMMU does the real work of updating IRTE (IRTE has new
       format for VT-d Posted-Interrupts)
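The "irq core transfers the control to IOMMU" step can be sketched as a walk down a hierarchy of irq data levels until one implements the callback. The types, the `-ENOSYS` fallback and the example remapping callback are assumptions for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical chip: the callback is NULL if this level cannot do it. */
struct model_chip {
	int (*set_vcpu_affinity)(void *vcpu_info);
};

/* Hypothetical hierarchical irq data: each level may have a parent. */
struct model_irq_data {
	const struct model_chip *chip;
	struct model_irq_data *parent;
};

/* Hand the request to the first level that implements it. */
int model_set_vcpu_affinity(struct model_irq_data *data, void *vcpu_info)
{
	for (; data; data = data->parent)
		if (data->chip && data->chip->set_vcpu_affinity)
			return data->chip->set_vcpu_affinity(vcpu_info);
	return -ENOSYS;
}

/* Example lower-level callback: records the posted-interrupt target. */
void *model_last_vcpu_info;
int model_remap_set_vcpu(void *vcpu_info)
{
	model_last_vcpu_info = vcpu_info;
	return 0;
}
```

Because the core only forwards the request, KVM does not need to know which level (here the hypothetical remapping chip) does the real work.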

    Signed-off-by: Jiang Liu
    Signed-off-by: Feng Wu
    Link: http://lkml.kernel.org/r/1432026437-16560-2-git-send-email-feng.wu@intel.com
    Signed-off-by: Thomas Gleixner

    Jiang Liu
     

09 Apr, 2015

2 commits

  • There are a number of cases where a kernel subsystem may want to
    introspect the state of an interrupt at the irqchip level:

    - When a peripheral is shared between virtual machines,
    its interrupt state becomes part of the guest's state,
    and must be switched accordingly. KVM on arm/arm64 requires
    this for its guest-visible timer
    - Some GPIO controllers seem to require peeking into the
    interrupt controller they are connected to to report
    their internal state

    This seems to be a pattern that is common enough for the core code
    to try and support this without too many horrible hacks. Introduce
    a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
    to retrieve the bits that can be of interest to another subsystem:
    pending, active, and masked.

    - irq_get_irqchip_state returns the state of the interrupt according
    to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
    IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
    - irq_set_irqchip_state similarly sets the state of the interrupt.
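A user-space model of the get/set pair, operating on a hypothetical `model_line` instead of an irq number. The enum mirrors the state names listed above, but its values, and the choice to treat the line level as read-only, are assumptions of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the state names listed above; values are hypothetical. */
enum model_irqchip_state {
	MODEL_STATE_PENDING,
	MODEL_STATE_ACTIVE,
	MODEL_STATE_MASKED,
	MODEL_STATE_LINE_LEVEL,
};

/* Hypothetical chip-side view of one interrupt line. */
struct model_line {
	bool pending, active, masked, line_level;
};

int model_get_irqchip_state(const struct model_line *l,
			    enum model_irqchip_state which, bool *state)
{
	switch (which) {
	case MODEL_STATE_PENDING:    *state = l->pending;    return 0;
	case MODEL_STATE_ACTIVE:     *state = l->active;     return 0;
	case MODEL_STATE_MASKED:     *state = l->masked;     return 0;
	case MODEL_STATE_LINE_LEVEL: *state = l->line_level; return 0;
	}
	return -EINVAL;
}

int model_set_irqchip_state(struct model_line *l,
			    enum model_irqchip_state which, bool val)
{
	switch (which) {
	case MODEL_STATE_PENDING: l->pending = val; return 0;
	case MODEL_STATE_ACTIVE:  l->active = val;  return 0;
	case MODEL_STATE_MASKED:  l->masked = val;  return 0;
	default:                  return -EINVAL; /* line level: read-only here */
	}
}
```

A KVM-style user would read the active state of the guest timer line on exit and write it back on entry.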

    Signed-off-by: Marc Zyngier
    Reviewed-by: Bjorn Andersson
    Tested-by: Bjorn Andersson
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Abhijeet Dharmapurikar
    Cc: Stephen Boyd
    Cc: Phong Vo
    Cc: Linus Walleij
    Cc: Tin Huynh
    Cc: Y Vo
    Cc: Toan Le
    Cc: Bjorn Andersson
    Cc: Jason Cooper
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • conflict with pending GIC changes.

    Conflicts:
    drivers/usb/isp1760/isp1760-core.c

    Thomas Gleixner
     

05 Mar, 2015

1 commit

  • It currently is required that all users of NO_SUSPEND interrupt
    lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
    WARN_ON_ONCE() in irq_pm_install_action() will trigger. That is
    done to warn about situations in which unprepared interrupt handlers
    may be run unnecessarily for suspended devices and may attempt to
    access those devices by mistake. However, it may cause drivers
    that have no technical reasons for using IRQF_NO_SUSPEND to set
    that flag just because they happen to share the interrupt line
    with something like a timer.

    Moreover, the generic handling of wakeup interrupts introduced by
    commit 9ce7a25849e8 (genirq: Simplify wakeup mechanism) only works
    for IRQs without any NO_SUSPEND users, so the drivers of wakeup
    devices needing to use shared NO_SUSPEND interrupt lines for
    signaling system wakeup generally have to detect wakeup in their
    interrupt handlers. Thus if they happen to share an interrupt line
    with a NO_SUSPEND user, they also need to request that their
    interrupt handlers be run after suspend_device_irqs().

    In both cases the reason for using IRQF_NO_SUSPEND is not because
    the driver in question has a genuine need to run its interrupt
    handler after suspend_device_irqs(), but because it happens to
    share the line with some other NO_SUSPEND user. Otherwise, the
    driver would do without IRQF_NO_SUSPEND just fine.

    To make it possible to specify that condition explicitly, introduce
    a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
    that, when set, will indicate to the IRQ core that the interrupt
    user is generally fine with suspending the IRQ, but it also can
    tolerate handler invocations after suspend_device_irqs() and, in
    particular, it is capable of detecting system wakeup and triggering
    it as appropriate from its interrupt handler.
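    The resulting rule can be modeled as follows: once any action on a line
    is NO_SUSPEND, every other action sharing it must be either NO_SUSPEND or
    COND_SUSPEND. This is a sketch loosely following the per-line counting
    done in irq_pm_install_action(); the flag bit values and
    `struct model_desc` are made up for the example, and the model returns
    false where the kernel would trigger its WARN_ON_ONCE().

```c
/* Userspace model: flag values and model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bits only; the real IRQF_* values differ. */
#define MODEL_IRQF_NO_SUSPEND   (1u << 0)
#define MODEL_IRQF_COND_SUSPEND (1u << 1)

/* Per-line bookkeeping, loosely modeled on counters kept per interrupt. */
struct model_desc {
    unsigned int nr_actions;
    unsigned int no_suspend_depth;
    unsigned int cond_suspend_depth;
};

/*
 * Account one more action on the line. Returns false when the line has
 * NO_SUSPEND users but some other user set neither NO_SUSPEND nor
 * COND_SUSPEND, i.e. an unprepared handler could run after
 * suspend_device_irqs().
 */
static bool model_install_action(struct model_desc *desc, unsigned int flags)
{
    assert(desc != NULL);
    desc->nr_actions++;
    if (flags & MODEL_IRQF_NO_SUSPEND)
        desc->no_suspend_depth++;
    else if (flags & MODEL_IRQF_COND_SUSPEND)
        desc->cond_suspend_depth++;

    if (desc->no_suspend_depth == 0)
        return true;  /* line is suspended normally; nothing to check */
    return desc->no_suspend_depth + desc->cond_suspend_depth == desc->nr_actions;
}
```

    In the at91 scenario, the timer would be the NO_SUSPEND user and the
    other devices on the line would declare COND_SUSPEND.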

    That will allow us to work around a problem with a shared timer
    interrupt line on at91 platforms.

    Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
    Link: http://marc.info/?t=142252775300011&r=1&w=2
    Link: https://lkml.org/lkml/2014/12/15/552
    Reported-by: Boris Brezillon
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland

    Rafael J. Wysocki
     

18 Feb, 2015

1 commit

    For things like netpoll there is a need to disable an interrupt from
    atomic context. Currently netpoll uses disable_irq(), which will
    sleep-wait on threaded handlers, and thus forced_irqthreads breaks
    things.

    Provide disable_hardirq(), which uses synchronize_hardirq() to only wait
    for active hardirq handlers; also change synchronize_hardirq() to
    return the status of threaded handlers.

    This will allow one to try-disable an interrupt from atomic context, or
    in case of request_threaded_irq() to only wait for the hardirq part.
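    A userspace sketch of the try-disable pattern: the disable only
    busy-waits for a running hard-irq handler and reports whether threaded
    handlers are still active, so an atomic-context caller can back off
    instead of sleeping. `struct model_desc`, its atomic fields and the
    netpoll-like caller `model_poll_device()` are hypothetical; the model
    assumes, as described above, that a false return means a threaded
    handler is still running.

```c
/* Userspace model: all model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_desc {
    int depth;                    /* > 0 means the line is disabled */
    atomic_bool hardirq_running;  /* primary handler executing somewhere */
    atomic_int threads_active;    /* threaded handlers still running */
};

/* Spin until the primary handler is done; never waits for threads. */
static bool model_synchronize_hardirq(struct model_desc *desc)
{
    while (atomic_load(&desc->hardirq_running))
        ;  /* busy-wait is tolerable in atomic context, sleeping is not */
    return atomic_load(&desc->threads_active) == 0;
}

/* Disable the line; true only if it is now completely quiescent. */
static bool model_disable_hardirq(struct model_desc *desc)
{
    desc->depth++;
    return model_synchronize_hardirq(desc);
}

static void model_enable_irq(struct model_desc *desc)
{
    assert(desc->depth > 0);
    desc->depth--;
}

/* Hypothetical atomic-context caller that polls a device. */
static bool model_poll_device(struct model_desc *desc)
{
    if (!model_disable_hardirq(desc)) {
        model_enable_irq(desc);  /* a thread is busy: give up this round */
        return false;
    }
    /* ... poll the hardware here with its interrupt disabled ... */
    model_enable_irq(desc);
    return true;
}
```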

    Suggested-by: Sabrina Dubroca
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnd Bergmann
    Cc: David Miller
    Cc: Eyal Perry
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Quentin Lambert
    Cc: Randy Dunlap
    Cc: Russell King
    Link: http://lkml.kernel.org/r/20150205130623.GH5029@twins.programming.kicks-ass.net
    [ Fixed typos and such. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Feb, 2015

1 commit

    The recent set_affinity commit by me introduced some NULL
    pointer dereferences on driver unload, because some drivers
    call this function with a NULL argument. This fixes the issue
    by checking for NULL before setting the affinity mask.
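    The shape of the fix, modeled in userspace: the hint pointer is always
    stored, but the affinity is only applied when a mask was actually
    passed, so a driver clearing its hint with NULL on unload leaves the
    routing alone. The `unsigned long` bitmask standing in for a cpumask and
    the `model_*` names are simplifications, not the kernel's cpumask API.

```c
/* Userspace model: all model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

typedef unsigned long model_cpumask_t;  /* bit n = CPU n */

struct model_desc {
    const model_cpumask_t *affinity_hint;  /* what the driver suggested */
    model_cpumask_t affinity;              /* where the interrupt is routed */
};

static int model_set_affinity_hint(struct model_desc *desc, const model_cpumask_t *m)
{
    assert(desc != NULL);
    desc->affinity_hint = m;
    /* Clearing the hint (m == NULL) must not dereference m. */
    if (m)
        desc->affinity = *m;
    return 0;
}
```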

    Fixes: e2e64a932556 ("genirq: Set initial affinity in irq_set_affinity_hint()")
    Reported-by: Yinghai Lu
    Signed-off-by: Jesse Brandeburg
    CC: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150128185739.9689.84588.stgit@jbrandeb-cp2.jf.intel.com
    Signed-off-by: Ingo Molnar

    Jesse Brandeburg
     

23 Jan, 2015

1 commit

  • Problem:
    The default behavior of the kernel is somewhat undesirable, as all
    requested interrupts end up on CPU0 after registration. A user can
    run the irqbalance daemon, or can manually configure smp_affinity via
    the proc filesystem, but the default affinity of the interrupts for
    all devices is always CPU0. This can cause performance problems, or
    very heavy CPU use on only one core if not noticed and fixed by the
    user.

    Solution:
    Enable the setting of the initial affinity directly when the driver
    sets a hint.

    This enabling means that kernel drivers can include an initial
    affinity setting for the interrupt, instead of all interrupts starting
    out life on CPU0. Of course if irqbalance is still running then the
    interrupts will get moved as before.

    This function is currently called by drivers in block, crypto,
    infiniband, ethernet and scsi trees, but only a handful, so these will
    be the devices affected by this change.

    Tested on i40e, and default interrupts were spread across the CPUs
    according to the hint.

    drivers/block/mtip32xx/mtip32xx.c:3
    drivers/block/nvme-core.c:2
    drivers/crypto/qat/qat_dh895xcc/adf_isr.c:3
    drivers/infiniband/hw/qib/qib_iba7322.c:2
    drivers/net/ethernet/intel/i40e/i40e_main.c:3
    drivers/net/ethernet/intel/i40evf/i40evf_main.c:3
    drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3
    drivers/net/ethernet/mellanox/mlx4/en_cq.c:2
    drivers/scsi/hpsa.c:3
    drivers/scsi/lpfc/lpfc_init.c:3
    drivers/scsi/megaraid/megaraid_sas_base.c:8
    drivers/soc/ti/knav_qmss_acc.c:1
    drivers/soc/ti/knav_qmss_queue.c:2
    drivers/virtio/virtio_pci_common.c:2

    Signed-off-by: Jesse Brandeburg
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20141219012206.4220.27491.stgit@jbrandeb-cp2.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Jesse Brandeburg
     

23 Nov, 2014

1 commit

    Add IRQ_SET_MASK_OK_DONE in addition to IRQ_SET_MASK_OK and
    IRQ_SET_MASK_OK_NOCOPY to support stacked irqchips. To the irq core,
    IRQ_SET_MASK_OK_DONE is the same as IRQ_SET_MASK_OK. To a stacked
    irqchip, it means that the ascendant irqchips have done all the work
    and no more handling is needed in the descendant irqchips.
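    A sketch of how the three return codes might be consumed, assuming a
    two-level stack in which the child chip delegates to its parent. The
    enum ordering (OK = 0, then OK_NOCOPY, then OK_DONE) is, to my
    knowledge, the kernel's, but `struct model_chip`, the two callbacks and
    `model_do_set_affinity()` are simplified, hypothetical stand-ins for the
    core and a real stacked driver.

```c
/* Userspace model: model_* names and callbacks are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    IRQ_SET_MASK_OK = 0,     /* core copies the new mask */
    IRQ_SET_MASK_OK_NOCOPY,  /* chip already updated the stored mask */
    IRQ_SET_MASK_OK_DONE,    /* same as OK for the core; children skip work */
};

typedef unsigned long model_cpumask_t;

struct model_chip {
    int (*set_affinity)(struct model_chip *chip, model_cpumask_t mask);
    struct model_chip *parent;
    bool programmed;  /* did this level touch its hardware? */
};

/* Ascendant chip that does the whole job. */
static int parent_set_affinity(struct model_chip *chip, model_cpumask_t mask)
{
    (void)mask;
    chip->programmed = true;
    return IRQ_SET_MASK_OK_DONE;
}

/* Descendant chip: forward first, then do its own part only if needed. */
static int child_set_affinity(struct model_chip *chip, model_cpumask_t mask)
{
    assert(chip->parent != NULL);
    int ret = chip->parent->set_affinity(chip->parent, mask);
    if (ret == IRQ_SET_MASK_OK_DONE)
        return ret;            /* ascendant chip handled everything */
    chip->programmed = true;   /* otherwise program this level too */
    return ret;
}

/* Core side: OK and OK_DONE are handled identically. */
static int model_do_set_affinity(struct model_chip *chip, model_cpumask_t *stored,
                                 model_cpumask_t mask)
{
    int ret = chip->set_affinity(chip, mask);
    switch (ret) {
    case IRQ_SET_MASK_OK:
    case IRQ_SET_MASK_OK_DONE:
        *stored = mask;
        /* fall through */
    case IRQ_SET_MASK_OK_NOCOPY:
        ret = 0;
        break;
    }
    return ret;
}
```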

    Signed-off-by: Jiang Liu
    Cc: Bjorn Helgaas
    Cc: Grant Likely
    Cc: Marc Zyngier
    Cc: Yingjoe Chen
    Cc: Yijing Wang
    Signed-off-by: Thomas Gleixner

    Jiang Liu
     

01 Sep, 2014

2 commits


04 May, 2014

1 commit

  • Till reported that the spurious interrupt detection of threaded
    interrupts is broken in two ways:

    - note_interrupt() is called for each action thread of a shared
    interrupt line. That's wrong as we are only interested in whether none
    of the device drivers felt responsible for the interrupt, but by
    calling multiple times for a single interrupt line we account
    IRQ_NONE even if one of the drivers felt responsible.

    - note_interrupt() when called from the thread handler is not
    serialized. That leaves the members of irq_desc which are used for
    the spurious detection unprotected.

    To solve this we need to defer the spurious detection of a threaded
    interrupt to the next hardware interrupt context where we have
    implicit serialization.

    If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
    check whether the previous interrupt requested a deferred check. If
    not, we request a deferred check for the next hardware interrupt and
    return.

    If set, we check whether one of the interrupt threads signaled
    success. Depending on this information we feed the result into the
    spurious detector.

    If one primary handler of a shared interrupt returns IRQ_HANDLED we
    disable the deferred check of irq threads on the same line, as we have
    found at least one device driver who cared.
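    The deferral can be modeled with a per-line counter of successful thread
    runs plus a flag bit in a snapshot, so that one verdict is produced per
    line rather than per thread. This is a simplified userspace sketch: the
    flag is named after the kernel's SPURIOUS_DEFERRED but its value here is
    an assumption, and the `model_*` functions are hypothetical.

```c
/* Userspace model: model_* names are hypothetical; flag value assumed. */
#include <assert.h>
#include <stddef.h>

#define SPURIOUS_DEFERRED 0x80000000u

enum model_verdict { MODEL_DEFERRED, MODEL_HANDLED, MODEL_NONE };

struct model_desc {
    unsigned int threads_handled;       /* bumped by threads returning IRQ_HANDLED */
    unsigned int threads_handled_last;  /* snapshot plus SPURIOUS_DEFERRED flag */
};

static void model_thread_handled(struct model_desc *desc)
{
    desc->threads_handled++;  /* atomic in real code; plain in this model */
}

/* A primary handler on the shared line returned IRQ_HANDLED. */
static void model_primary_handled(struct model_desc *desc)
{
    desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
}

/*
 * Called in hard-irq context when the primary handler returned
 * IRQ_WAKE_THREAD. The verdict describes the previous interrupt's threads;
 * reading it here is safe because hard-irq context is serialized.
 */
static enum model_verdict model_note_threaded(struct model_desc *desc)
{
    assert(desc != NULL);
    if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
        desc->threads_handled_last |= SPURIOUS_DEFERRED;  /* arm the check */
        return MODEL_DEFERRED;
    }
    unsigned int handled = desc->threads_handled | SPURIOUS_DEFERRED;
    if (handled != desc->threads_handled_last) {
        desc->threads_handled_last = handled;  /* some thread made progress */
        return MODEL_HANDLED;
    }
    return MODEL_NONE;  /* would be fed to the detector as IRQ_NONE */
}
```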

    Reported-by: Till Straumann
    Signed-off-by: Thomas Gleixner
    Tested-by: Austin Schuh
    Cc: Oliver Hartkopp
    Cc: Wolfgang Grandegger
    Cc: Pavel Pisa
    Cc: Marc Kleine-Budde
    Cc: linux-can@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1303071450130.22263@ionos

    Thomas Gleixner
     

18 Apr, 2014

1 commit

    The current implementation of irq_set_affinity() rightfully refuses to
    route an interrupt to an offline cpu.

    But there is a special case, where this is actually desired. Some of
    the ARM SoCs have per cpu timers which require setting the affinity
    during cpu startup where the cpu is not yet in the online mask.

    If we can't do that, then the local timer interrupt for the
    about-to-become-online cpu is routed to some random online cpu.

    The developers of the affected machines tried to work around that
    issue, but that results in a massive mess in that timer code.

    We have a yet unused argument in the set_affinity callbacks of the irq
    chips, which I added back then for a similar reason. It was never
    required, so it was never used. But I'm happy that I never removed it.

    That allows us to implement a sane handling of the above scenario. So
    the affected SoC drivers can add the required force handling to their
    interrupt chip, switch the timer code to irq_force_affinity() and
    things just work.
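    A userspace model of the force semantics, assuming the flag is passed
    down to the chip's set_affinity callback: without force only online
    CPUs are acceptable, with force the target may be a CPU that is still
    coming up. A plain bitmask stands in for the cpumask, picking the lowest
    CPU is an arbitrary choice, and the `model_*` names are hypothetical;
    only irq_set_affinity() and irq_force_affinity() come from the message.

```c
/* Userspace model: model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long model_cpumask_t;  /* bit n = CPU n */

/* Lowest set bit, or -1 for an empty mask. */
static int model_first_cpu(model_cpumask_t m)
{
    for (int cpu = 0; cpu < (int)(CHAR_BIT * sizeof(m)); cpu++)
        if (m & (1ul << cpu))
            return cpu;
    return -1;
}

/* Chip callback: force bypasses the online-mask restriction. */
static int model_chip_set_affinity(model_cpumask_t mask, model_cpumask_t online,
                                   bool force, int *target)
{
    assert(target != NULL);
    int cpu = model_first_cpu(force ? mask : (mask & online));
    if (cpu < 0)
        return -EINVAL;  /* no acceptable CPU in the requested mask */
    *target = cpu;
    return 0;
}

static int model_irq_set_affinity(model_cpumask_t mask, model_cpumask_t online, int *target)
{
    return model_chip_set_affinity(mask, online, false, target);
}

/* Used by a CPU bringing up its own per-cpu timer before it is online. */
static int model_irq_force_affinity(model_cpumask_t mask, model_cpumask_t online, int *target)
{
    return model_chip_set_affinity(mask, online, true, target);
}
```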

    This does not affect any existing user of irq_set_affinity().

    Tagged for stable to allow a simple fix of the affected SoC clock
    event drivers.

    Reported-and-tested-by: Krzysztof Kozlowski
    Signed-off-by: Thomas Gleixner
    Cc: Kyungmin Park
    Cc: Marek Szyprowski
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Tomasz Figa
    Cc: Daniel Lezcano
    Cc: Kukjin Kim
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner