29 Jul, 2010

1 commit

  • A small number of users of IRQF_TIMER are using it for the implied no
    suspend behaviour on interrupts which are not timer interrupts.

    Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to
    __IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.
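
    A minimal userspace sketch of the flag split described above. The bit
    values are chosen for illustration only, and the two helper functions
    are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the authoritative ones are in the kernel headers. */
#define __IRQF_TIMER    0x00000200u
#define IRQF_NO_SUSPEND 0x00004000u
/* IRQF_TIMER keeps its old meaning: a timer interrupt that is not suspended. */
#define IRQF_TIMER      (__IRQF_TIMER | IRQF_NO_SUSPEND)

static_assert((__IRQF_TIMER & IRQF_NO_SUSPEND) == 0,
              "the two flags must use distinct bits");

/* Hypothetical helper: is this really a timer interrupt? */
bool irq_is_timer(unsigned int flags)
{
    return (flags & __IRQF_TIMER) != 0;
}

/* Hypothetical helper: should the interrupt stay enabled across suspend? */
bool irq_stays_enabled_in_suspend(unsigned int flags)
{
    return (flags & IRQF_NO_SUSPEND) != 0;
}
```

    With this split, a non-timer user that only wants the no-suspend
    behaviour can pass IRQF_NO_SUSPEND alone, while existing IRQF_TIMER
    users see no change.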

    Signed-off-by: Ian Campbell
    Cc: Jeremy Fitzhardinge
    Cc: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Grant Likely
    Cc: xen-devel@lists.xensource.com
    Cc: linux-input@vger.kernel.org
    Cc: linuxppc-dev@ozlabs.org
    Cc: devicetree-discuss@lists.ozlabs.org
    Signed-off-by: Thomas Gleixner

    Ian Campbell
     

09 Jun, 2010

1 commit

  • The set_type() function can change the chip implementation when the
    trigger mode changes. That might result in using a non-initialized
    irq chip when called from __setup_irq() or when called via
    set_irq_type() on an already enabled irq.

    The set_irq_type() function should not be called on an enabled irq,
    but because we forgot to put a check into it, we have a bunch of users
    which grew the habit of doing that and it never blew up as the
    function is serialized via desc->lock against all users of desc->chip
    and they never hit the non-initialized irq chip issue.

    The easy fix for the __setup_irq() issue would be to move the
    irq_chip_set_defaults(desc->chip) call after the trigger setting to
    make sure that a chip change is covered.

    But as we already have users which do the type setting after
    request_irq(), the safe fix for now is to call irq_chip_set_defaults()
    from __irq_set_trigger() when desc->set_type() changed the irq chip.
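
    The safe fix can be sketched as a small userspace model. All model_*
    names are hypothetical, and the "defaults" step is reduced to filling
    in one optional callback; the real irq_chip has many more.

```c
#include <assert.h>
#include <stddef.h>

enum model_trigger { MODEL_TRIGGER_EDGE, MODEL_TRIGGER_LEVEL };

struct model_desc;

struct model_chip {
    void (*unmask)(void);
    void (*enable)(void);   /* optional; filled in by model_chip_set_defaults() */
    int (*set_type)(struct model_desc *desc, enum model_trigger type);
};

struct model_desc {
    struct model_chip *chip;
};

static void model_unmask(void) { }

/* Fill in optional callbacks so later calls never hit a NULL pointer. */
void model_chip_set_defaults(struct model_chip *chip)
{
    if (!chip->enable)
        chip->enable = chip->unmask;
}

static int model_swap_chip(struct model_desc *desc, enum model_trigger type);

struct model_chip model_edge_chip  = { .unmask = model_unmask, .set_type = model_swap_chip };
struct model_chip model_level_chip = { .unmask = model_unmask, .set_type = model_swap_chip };

/* A set_type() implementation that switches chips with the trigger mode. */
static int model_swap_chip(struct model_desc *desc, enum model_trigger type)
{
    desc->chip = (type == MODEL_TRIGGER_LEVEL) ? &model_level_chip : &model_edge_chip;
    return 0;
}

/* Re-apply the defaults whenever set_type() replaced the chip. */
int model_set_trigger(struct model_desc *desc, enum model_trigger type)
{
    assert(desc && desc->chip && desc->chip->set_type);

    struct model_chip *old = desc->chip;
    int ret = old->set_type(desc, type);

    if (ret == 0 && desc->chip != old)
        model_chip_set_defaults(desc->chip);
    return ret;
}
```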

    It needs a deeper analysis whether we should refuse to change the chip
    on an already enabled irq, but that'd be a large scale change to fix
    all the existing users. So that's neither stable nor 2.6.35 material.

    Reported-by: Esben Haabendal
    Signed-off-by: Thomas Gleixner
    Cc: Benjamin Herrenschmidt
    Cc: linuxppc-dev
    Cc: stable@kernel.org

    Thomas Gleixner
     

12 May, 2010

1 commit

  • When an interrupt is disabled and torn down, the CPU mask returned
    through affinity_hint right now is all CPUs. Also, for drivers that
    don't provide an affinity_hint mask, this can be misleading. There
    should be no hint at all, meaning an empty CPU mask.

    [ tglx: use zalloc_cpumask_var instead of clearing it under the lock ]

    Signed-off-by: Peter P Waskiewicz Jr
    Cc: davem@davemloft.net
    Cc: arjan@linux.jf.intel.com
    Cc: bhutchings@solarflare.com
    Signed-off-by: Thomas Gleixner

    Peter P Waskiewicz Jr
     

03 May, 2010

1 commit

  • This patch adds a cpumask affinity hint to the irq_desc structure,
    along with a registration function and a read-only proc entry for each
    interrupt.

    This affinity_hint handle for each interrupt can be used by underlying
    drivers that need a better mechanism to control interrupt affinity.
    The underlying driver can register a cpumask for the interrupt, which
    will allow the driver to provide the CPU mask for the interrupt to
    anything that requests it. The intent is to extend the userspace
    daemon, irqbalance, to help hint to it a preferred CPU mask to balance
    the interrupt into.
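
    On the consumer side, a userspace tool would have to parse the mask it
    reads from the proc entry. The sketch below assumes the entry prints
    the mask in the comma-separated 32-bit hex groups used by other
    cpumask files (most significant group first); parse_cpumask_hex() is
    a hypothetical helper and keeps only the low 64 CPUs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/*
 * Parse a cpumask string such as "00000000,0000000f" into a 64-bit mask.
 * Returns 0 on success and -1 on malformed input. Groups beyond the low
 * 64 bits are shifted out, so this only suits systems with <= 64 CPUs.
 */
int parse_cpumask_hex(const char *text, uint64_t *mask)
{
    uint64_t acc = 0;
    uint32_t group = 0;
    int digits = 0;

    assert(text && mask);

    for (; *text && *text != '\n'; text++) {
        unsigned char c = (unsigned char)*text;

        if (c == ',') {
            if (digits == 0)
                return -1;          /* empty group */
            acc = (acc << 32) | group;
            group = 0;
            digits = 0;
            continue;
        }
        if (!isxdigit(c) || digits == 8)
            return -1;              /* not hex, or group wider than 32 bits */
        group = (group << 4) |
                (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
        digits++;
    }
    if (digits == 0)
        return -1;

    *mask = (acc << 32) | group;
    return 0;
}
```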

    [ tglx: Fixed compile warnings, added WARN_ON, made SMP only ]

    Signed-off-by: Peter P Waskiewicz Jr
    Cc: davem@davemloft.net
    Cc: arjan@linux.jf.intel.com
    Cc: bhutchings@solarflare.com
    Signed-off-by: Thomas Gleixner

    Peter P Waskiewicz Jr
     

13 Apr, 2010

4 commits

  • Remove all code which is related to IRQF_DISABLED from the core kernel
    code. IRQF_DISABLED still exists as a flag, but becomes a NOOP and
    will be removed after a grace period. That way we can easily revert to
    the previous behaviour by just restoring the core code.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: Andi Kleen
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds

    Thomas Gleixner
     
  • Running interrupt handlers with interrupts enabled can cause stack
    overflows. That has been observed with multiqueue NICs delivering all
    their interrupts to a single core. We might band aid that somehow by
    checking the interrupt stacks, but the real safe fix is to run the irq
    handlers with interrupts disabled.

    Drivers for whacky hardware still can reenable them in the handler
    itself, if the need arises. (They do already due to lockdep)

    The risk of doing this is rather low:

    - lockdep already enforces this
    - CONFIG_NOHZ has shaken out the drivers which relied on jiffies updates
    - timekeeping is no longer sensitive to the timer interrupt being delayed

    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: Andi Kleen
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds

    Ingo Molnar
     
  • Now that we enjoy threaded interrupts, we're starting to see irq_chip
    implementations (wm831x, pca953x) that make use of threaded interrupts
    for the controller, and nested interrupts for the client interrupt. It
    all works very well, with one drawback:

    Drivers requesting an IRQ must now know whether the handler will
    run in a thread context or not, and call request_threaded_irq() or
    request_irq() accordingly.

    The problem is that the requesting driver sometimes doesn't know
    about the nature of the interrupt, especially when the interrupt
    controller is a discrete chip (typically a GPIO expander connected
    over I2C) that can be connected to a wide variety of otherwise perfectly
    supported hardware.

    This patch introduces the request_any_context_irq() function that mostly
    mimics the usual request_irq(), except that it checks whether the irq
    level is configured as nested or not, and calls the right backend.
    On success, it also returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.
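
    The dispatch can be sketched as a userspace model. All model_* names
    are hypothetical stand-ins; only the decision logic (nested or not,
    which backend, which return code) follows the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_irqc { MODEL_IRQC_IS_HARDIRQ = 0, MODEL_IRQC_IS_NESTED = 1 };

enum model_backend {
    MODEL_BACKEND_NONE,
    MODEL_BACKEND_HARDIRQ,
    MODEL_BACKEND_THREADED,
};

struct model_irq {
    bool nested;                 /* configured as a nested interrupt? */
    enum model_backend backend;  /* which request path was taken */
};

/* Stand-in for request_irq(): handler runs in hard interrupt context. */
static int model_request_irq(struct model_irq *irq)
{
    irq->backend = MODEL_BACKEND_HARDIRQ;
    return 0;
}

/* Stand-in for request_threaded_irq(): handler runs in a thread. */
static int model_request_threaded_irq(struct model_irq *irq)
{
    irq->backend = MODEL_BACKEND_THREADED;
    return 0;
}

/* Negative errno on failure, otherwise the context the handler will run in. */
int model_request_any_context_irq(struct model_irq *irq)
{
    int ret;

    if (!irq)
        return -EINVAL;
    /* In this model an interrupt is requested at most once. */
    assert(irq->backend == MODEL_BACKEND_NONE);

    if (irq->nested) {
        ret = model_request_threaded_irq(irq);
        return ret ? ret : MODEL_IRQC_IS_NESTED;
    }
    ret = model_request_irq(irq);
    return ret ? ret : MODEL_IRQC_IS_HARDIRQ;
}
```

    Because both success codes are non-negative, a caller that does not
    care about the context can keep treating any negative value as an
    error.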

    [ tglx: Made return value an enum, simplified code and made the export
    of request_any_context_irq GPL ]

    Signed-off-by: Marc Zyngier
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • Reason: Get the upstream IRQF_DISABLED related changes.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

07 Apr, 2010

1 commit


31 Mar, 2010

1 commit

  • Network folks reported that directing all MSI-X vectors of their multi
    queue NICs to a single core can cause interrupt stack overflows when
    enough interrupts fire at the same time.

    This is caused by the fact that we run interrupt handlers by default
    with interrupts enabled unless the driver requests the interrupt with
    IRQF_DISABLED set. The NIC handlers do not set this flag, so
    simultaneous interrupts can nest without limit and cause the stack
    overflow.
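
    The nesting behaviour can be illustrated with a toy model. It assumes
    the worst case, where every pending interrupt preempts the running
    handler as soon as interrupts are enabled; model_* names are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cpu {
    int pending;    /* interrupts waiting to be delivered */
    int depth;      /* handlers currently on the stack */
    int max_depth;  /* deepest nesting observed */
    bool irqs_off;  /* true: handlers run with interrupts disabled */
};

static void model_deliver(struct model_cpu *cpu)
{
    cpu->pending--;
    cpu->depth++;
    if (cpu->depth > cpu->max_depth)
        cpu->max_depth = cpu->depth;

    /* With interrupts enabled, the next pending one preempts this handler. */
    if (!cpu->irqs_off && cpu->pending > 0)
        model_deliver(cpu);

    cpu->depth--;
}

/* Deepest handler nesting reached while draining `pending` interrupts. */
int model_max_nesting(int pending, bool irqs_off)
{
    struct model_cpu cpu = { .pending = pending, .irqs_off = irqs_off };

    assert(pending >= 0);
    while (cpu.pending > 0)
        model_deliver(&cpu);
    return cpu.max_depth;
}
```

    In this model the nesting depth grows with the number of simultaneous
    interrupts when handlers run with interrupts enabled, and stays at one
    when they run with interrupts disabled.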

    The only safe counter measure is to run the interrupt handlers with
    interrupts disabled. We can't switch to this mode in general right
    now, but it is safe to do so for MSI interrupts.

    Force IRQF_DISABLED for MSI interrupt handlers.

    Signed-off-by: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: stable@kernel.org

    Thomas Gleixner
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
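
    The first rule (gfp.h if only gfp is used, slab.h if slab is used)
    can be sketched in C. This is not the script itself: the token lists
    are a small hypothetical sample, and plain substring matching is a
    simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum needed_header { NEED_NONE, NEED_GFP_H, NEED_SLAB_H };

/* Decide which header a source text needs under the rule above. */
enum needed_header classify_source(const char *src)
{
    /* Hypothetical sample tokens; a real scan would need a longer list. */
    static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
    static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC" };
    size_t i;

    assert(src != NULL);

    /* Slab usage wins: a slab user gets slab.h even if it also names gfp flags. */
    for (i = 0; i < sizeof(slab_tokens) / sizeof(slab_tokens[0]); i++)
        if (strstr(src, slab_tokens[i]))
            return NEED_SLAB_H;

    for (i = 0; i < sizeof(gfp_tokens) / sizeof(gfp_tokens[0]); i++)
        if (strstr(src, gfp_tokens[i]))
            return NEED_GFP_H;

    return NEED_NONE;
}
```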

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

1 commit


24 Mar, 2010

3 commits


11 Mar, 2010

1 commit

  • Lars-Peter pointed out that the oneshot threaded interrupt handler
    code has the following race:

    CPU0                            CPU1
    handle_level_irq(irq X)
      mask_ack_irq(irq X)
      handle_IRQ_event(irq X)
        wake_up(thread_handler)
                                    thread handler(irq X) runs
                                    finalize_oneshot(irq X)
                                      does not unmask due to
                                      !(desc->status & IRQ_MASKED)

    return from irq
      does not unmask due to
      (desc->status & IRQ_ONESHOT)

    This leaves the interrupt line masked forever.

    The reason for this is the inconsistent handling of the IRQ_MASKED
    flag. Instead of setting it in the mask function the oneshot support
    sets the flag after waking up the irq thread.

    The solution for this is to set/clear the IRQ_MASKED status whenever
    we mask/unmask an interrupt line. That's the easy part, but that
    cleanup opens another race:

    CPU0                            CPU1
    handle_level_irq(irq)
      mask_ack_irq(irq)
      handle_IRQ_event(irq)
        wake_up(thread_handler)
                                    thread handler(irq) runs
                                    finalize_oneshot_irq(irq)
                                      unmask(irq)
    irq triggers again
    handle_level_irq(irq)
      mask_ack_irq(irq)
      return from irq due to IRQ_INPROGRESS

    return from irq
      does not unmask due to
      (desc->status & IRQ_ONESHOT)

    This requires that we synchronize finalize_oneshot_irq() with the
    primary handler. If IRQ_INPROGRESS is set we wait until the primary
    handler on the other CPU has returned before unmasking the interrupt
    line again.
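
    A single-threaded state model of the fixed protocol, assuming three
    status bits and hypothetical model_* helpers. Instead of really
    waiting, model_finalize_oneshot() reports that the caller must retry
    once the primary handler has returned.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MASKED     0x1u
#define MODEL_INPROGRESS 0x2u
#define MODEL_ONESHOT    0x4u

struct model_line {
    unsigned int status;
};

/* The masked state is tracked wherever the line is masked or unmasked. */
static void model_mask(struct model_line *l)   { l->status |= MODEL_MASKED; }
static void model_unmask(struct model_line *l) { l->status &= ~MODEL_MASKED; }

/* Primary handler entry: mask the line and mark it in progress. */
void model_primary_enter(struct model_line *l)
{
    model_mask(l);
    l->status |= MODEL_INPROGRESS;
}

/* Primary handler exit: oneshot lines are left masked for the thread. */
void model_primary_exit(struct model_line *l)
{
    l->status &= ~MODEL_INPROGRESS;
    if (!(l->status & MODEL_ONESHOT))
        model_unmask(l);
}

/*
 * Thread side. Returns false when the primary handler is still running,
 * in which case the caller has to wait and call again.
 */
bool model_finalize_oneshot(struct model_line *l)
{
    assert(l->status & MODEL_ONESHOT);

    if (l->status & MODEL_INPROGRESS)
        return false;
    if (l->status & MODEL_MASKED)
        model_unmask(l);
    return true;
}
```

    Replaying the problematic interleaving (thread finishes before the
    primary handler returns) against this model ends with the line
    unmasked rather than masked forever.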

    We probably have never seen that problem because it does not happen on
    UP and on SMP the irqbalancer protects us by pinning the primary
    handler and the thread to the same CPU.

    Reported-by: Lars-Peter Clausen
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

08 Mar, 2010

1 commit


18 Feb, 2010

3 commits


15 Feb, 2010

1 commit


11 Feb, 2010

2 commits

  • Fix the reference (in comment).

    Signed-off-by: Jean Delvare
    Signed-off-by: Jiri Kosina

    Jean Delvare
     
  • Keep chip_data in create_irq_nr and destroy_irq.

    When two drivers are setting up MSI-X at the same time via
    pci_enable_msix() there is a race. See this dmesg excerpt:

    [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
    [ 85.170611] alloc irq_desc for 99 on node -1
    [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
    [ 85.170614] alloc kstat_irqs on node -1
    [ 85.170616] alloc irq_2_iommu on node -1
    [ 85.170617] alloc irq_desc for 100 on node -1
    [ 85.170619] alloc kstat_irqs on node -1
    [ 85.170621] alloc irq_2_iommu on node -1
    [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
    [ 85.170626] alloc irq_desc for 101 on node -1
    [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
    [ 85.170630] alloc kstat_irqs on node -1
    [ 85.170631] alloc irq_2_iommu on node -1
    [ 85.170635] alloc irq_desc for 102 on node -1
    [ 85.170636] alloc kstat_irqs on node -1
    [ 85.170639] alloc irq_2_iommu on node -1
    [ 85.170646] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000088

    As you can see igb and ixgbe are both alternating on create_irq_nr()
    via pci_enable_msix() in their probe function.

    ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
    chooses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
    calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
    NULL via dynamic_irq_init().

    igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
    via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

        cfg_new = irq_desc_ptrs[102]->chip_data;
        if (cfg_new->vector != 0)
                continue;

    This hits the NULL deref.

    Another possible race exists via pci_disable_msix() in a driver or in
    the number of error paths that call free_msi_irqs():

    destroy_irq()
    dynamic_irq_cleanup() which sets desc->chip_data = NULL
    ...race window...
    desc->chip_data = cfg;

    Remove the save and restore code for cfg in create_irq_nr() and
    destroy_irq() and take the desc->lock when checking the irq_cfg.
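
    The "keep chip_data" half of the fix can be sketched as a userspace
    model. The model_* names and the keep_chip_data parameter are
    hypothetical, and the locking half is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cfg {
    int vector;              /* 0 means the slot is free */
};

struct model_desc {
    struct model_cfg *chip_data;
    unsigned int depth;
};

/*
 * Reset a descriptor. With keep_chip_data set, the pointer survives the
 * reset, so nobody scanning the table ever observes it as NULL.
 */
void model_irq_init(struct model_desc *desc, bool keep_chip_data)
{
    desc->depth = 1;
    if (!keep_chip_data)
        desc->chip_data = NULL;
}

/* The check done while scanning for a free slot. */
bool model_slot_is_free(const struct model_desc *desc)
{
    /* A NULL chip_data here corresponds to the reported crash. */
    assert(desc->chip_data != NULL);
    return desc->chip_data->vector == 0;
}
```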

    Reported-and-analyzed-by: Brandon Philips
    Signed-off-by: Yinghai Lu
    Signed-off-by: Brandon Philips
    Cc: stable@kernel.org
    Signed-off-by: H. Peter Anvin

    Brandon Philips
     

05 Feb, 2010

1 commit


15 Dec, 2009

1 commit


10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

09 Dec, 2009

1 commit


08 Dec, 2009

1 commit


06 Dec, 2009

1 commit


04 Dec, 2009

2 commits


20 Nov, 2009

1 commit


18 Nov, 2009

1 commit


08 Nov, 2009

2 commits

  • If a parent directory (i.e. /proc/irq/) could not be created,
    we should not attempt to create subdirectories. Otherwise the
    "smp_affinity" and "spurious" entries may be registered under
    the /proc root instead of in their proper place.
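
    A small model of why the early return matters, assuming (as in the
    interface modelled here) that a NULL parent means "create under the
    root". All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entry {
    const char *name;
    const struct model_entry *parent;
};

const struct model_entry model_proc_root = { "/proc", NULL };

/* A NULL parent silently falls back to the root, which is the trap. */
void model_create(struct model_entry *entry, const char *name,
                  const struct model_entry *parent)
{
    assert(entry && name);
    entry->name = name;
    entry->parent = parent ? parent : &model_proc_root;
}

/* Refuse to register anything when the parent directory is missing. */
bool model_register_smp_affinity(const struct model_entry *irq_dir,
                                 struct model_entry *out)
{
    if (!irq_dir)
        return false;
    model_create(out, "smp_affinity", irq_dir);
    return true;
}
```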

    Signed-off-by: Cyrill Gorcunov
    Cc: Rusty Russell
    Cc: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     
  • Prarit reported:
    =================================
    [ INFO: inconsistent lock state ]
    2.6.32-rc5 #1
    ---------------------------------
    inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
    swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    (&irq_desc_lock_class){?.-...}, at: [] try_one_irq+0x32/0x138
    {IN-HARDIRQ-W} state was registered at:
    [] __lock_acquire+0x2fc/0xd5d
    [] lock_acquire+0xf3/0x12d
    [] _spin_lock+0x40/0x89
    [] handle_level_irq+0x30/0x105
    [] handle_irq+0x95/0xb7
    [] do_IRQ+0x6a/0xe0
    [] ret_from_intr+0x0/0x16
    irq event stamp: 195096
    hardirqs last enabled at (195096): [] _spin_unlock_irq+0x3a/0x5c
    hardirqs last disabled at (195095): [] _spin_lock_irq+0x29/0x95
    softirqs last enabled at (195088): [] __do_softirq+0x1c1/0x1ef
    softirqs last disabled at (195093): [] call_softirq+0x1c/0x30

    other info that might help us debug this:
    1 lock held by swapper/0:
    #0: (kernel/irq/spurious.c:21){+.-...}, at: []
    run_timer_softirq+0x1a9/0x315

    stack backtrace:
    Pid: 0, comm: swapper Not tainted 2.6.32-rc5 #1
    Call Trace:
    [] valid_state+0x187/0x1ae
    [] mark_lock+0x129/0x253
    [] __lock_acquire+0x370/0xd5d
    [] lock_acquire+0xf3/0x12d
    [] _spin_lock+0x40/0x89
    [] try_one_irq+0x32/0x138
    [] poll_all_shared_irqs+0x41/0x6d
    [] poll_spurious_irqs+0x1c/0x49
    [] run_timer_softirq+0x239/0x315
    [] __do_softirq+0x102/0x1ef
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x59/0xca
    [] irq_exit+0x58/0xae
    [] smp_apic_timer_interrupt+0x94/0xba
    [] apic_timer_interrupt+0x13/0x20

    The reason is that try_one_irq() is called from hardirq context with
    interrupts disabled and from softirq context (poll_all_shared_irqs())
    with interrupts enabled.

    Disable interrupts before calling it from poll_all_shared_irqs().

    Reported-and-tested-by: Prarit Bhargava
    Signed-off-by: Yong Zhang
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

04 Nov, 2009

2 commits


12 Oct, 2009

1 commit


12 Sep, 2009

1 commit


29 Aug, 2009

1 commit