07 Apr, 2010

1 commit


31 Mar, 2010

1 commit

  • Network folks reported that directing all MSI-X vectors of their multi
    queue NICs to a single core can cause interrupt stack overflows when
    enough interrupts fire at the same time.

    This is caused by the fact that we run interrupt handlers by default
    with interrupts enabled unless the driver requests the interrupt with
    the IRQF_DISABLED flag set. The NIC handlers do not set this flag, so
    simultaneous interrupts can nest without limit and cause the stack
    overflow.

    The only safe countermeasure is to run the interrupt handlers with
    interrupts disabled. We can't switch to this mode in general right
    now, but it is safe to do so for MSI interrupts.

    Force IRQF_DISABLED for MSI interrupt handlers.
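
    As a rough illustration of the change, the sketch below forces the flag
    whenever an interrupt is known to be MSI. It is a user-space model, not
    the kernel patch: struct irq_request, is_msi and effective_irq_flags()
    are hypothetical names, and the IRQF_DISABLED value is defined locally
    for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Defined locally for this sketch; not taken from kernel headers. */
#define IRQF_DISABLED 0x20UL

/* Hypothetical description of a handler registration. */
struct irq_request {
    unsigned long flags; /* flags passed by the driver */
    bool is_msi;         /* true for MSI/MSI-X interrupts */
};

/*
 * Return the flags the core would use: MSI handlers always run with
 * interrupts disabled, all other handlers keep what the driver asked for.
 */
static unsigned long effective_irq_flags(const struct irq_request *req)
{
    unsigned long flags = req->flags;

    if (req->is_msi)
        flags |= IRQF_DISABLED;
    return flags;
}
```

    With the flag forced, an MSI handler cannot be interrupted by another
    handler on the same CPU, so the nesting depth no longer depends on how
    many vectors fire at once.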

    Signed-off-by: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: stable@kernel.org

    Thomas Gleixner
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
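
    The first step of the script can be pictured with the small C sketch
    below. It is only an illustration of the classification rule (slab
    usage needs slab.h, gfp-only usage needs gfp.h); the real script is
    written in Python, and the token lists and the names classify() and
    contains_any() are assumptions made for this example. Matching is by
    naive substring search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which header a source file should include directly. */
enum needed_header { NEED_NONE, NEED_GFP_H, NEED_SLAB_H };

/* Illustrative token lists; the real script's lists are not shown here. */
static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC" };

/* Return true if any of the n tokens occurs somewhere in src. */
static bool contains_any(const char *src, const char *const *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(src, tokens[i]))
            return true;
    return false;
}

/* slab usage -> slab.h (which includes gfp.h); gfp-only usage -> gfp.h. */
static enum needed_header classify(const char *src)
{
    if (contains_any(src, slab_tokens,
                     sizeof(slab_tokens) / sizeof(slab_tokens[0])))
        return NEED_SLAB_H;
    if (contains_any(src, gfp_tokens,
                     sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
        return NEED_GFP_H;
    return NEED_NONE;
}
```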

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

1 commit


24 Mar, 2010

2 commits


11 Mar, 2010

1 commit

  • Lars-Peter pointed out that the oneshot threaded interrupt handler
    code has the following race:

    CPU0                            CPU1
    handle_level_irq(irq X)
      mask_ack_irq(irq X)
      handle_IRQ_event(irq X)
        wake_up(thread_handler)
                                    thread handler(irq X) runs
                                    finalize_oneshot(irq X)
                                      does not unmask due to
                                      !(desc->status & IRQ_MASKED)

    return from irq
    does not unmask due to
    (desc->status & IRQ_ONESHOT)

    This leaves the interrupt line masked forever.

    The reason for this is the inconsistent handling of the IRQ_MASKED
    flag. Instead of setting it in the mask function the oneshot support
    sets the flag after waking up the irq thread.

    The solution for this is to set/clear the IRQ_MASKED status whenever
    we mask/unmask an interrupt line. That's the easy part, but that
    cleanup opens another race:

    CPU0                            CPU1
    handle_level_irq(irq)
      mask_ack_irq(irq)
      handle_IRQ_event(irq)
        wake_up(thread_handler)
                                    thread handler(irq) runs
                                    finalize_oneshot_irq(irq)
                                      unmask(irq)
        irq triggers again
        handle_level_irq(irq)
          mask_ack_irq(irq)
        return from irq due to IRQ_INPROGRESS

    return from irq
    does not unmask due to
    (desc->status & IRQ_ONESHOT)

    This requires that we synchronize finalize_oneshot_irq() with the
    primary handler. If IRQ_INPROGRESS is set we wait until the primary
    handler on the other CPU has returned before unmasking the interrupt
    line again.
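
    The synchronization rule can be modelled in a few lines of user-space
    C. This is a simplified sketch, not the kernel code: struct
    irq_desc_model and try_finalize_oneshot() are hypothetical names, the
    flag values are arbitrary, and the waiting is left to the caller, who
    simply retries until the function succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Status bits named after the text; values are arbitrary for this model. */
#define IRQ_INPROGRESS 0x1u
#define IRQ_MASKED     0x2u

struct irq_desc_model {
    unsigned int status;
};

/* Masking and unmasking always update IRQ_MASKED, keeping the flag consistent. */
static void mask_irq(struct irq_desc_model *desc)
{
    desc->status |= IRQ_MASKED;
}

static void unmask_irq(struct irq_desc_model *desc)
{
    desc->status &= ~IRQ_MASKED;
}

/*
 * One attempt of the thread-side finalization. Returns false while the
 * primary handler is still marked in progress, in which case the caller
 * has to wait and retry; returns true once the line has been unmasked.
 */
static bool try_finalize_oneshot(struct irq_desc_model *desc)
{
    if (desc->status & IRQ_INPROGRESS)
        return false;
    if (desc->status & IRQ_MASKED)
        unmask_irq(desc);
    return true;
}
```

    In this model the thread never unmasks while the primary handler is
    still running, so the primary handler cannot leave the line masked
    behind the thread's back.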

    We probably have never seen that problem because it does not happen on
    UP and on SMP the irqbalancer protects us by pinning the primary
    handler and the thread to the same CPU.

    Reported-by: Lars-Peter Clausen
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

08 Mar, 2010

1 commit


18 Feb, 2010

3 commits


15 Feb, 2010

1 commit


11 Feb, 2010

2 commits

  • Fix the reference (in comment).

    Signed-off-by: Jean Delvare
    Signed-off-by: Jiri Kosina

    Jean Delvare
     
  • Keep chip_data in create_irq_nr and destroy_irq.

    When two drivers are setting up MSI-X at the same time via
    pci_enable_msix() there is a race. See this dmesg excerpt:

    [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
    [ 85.170611] alloc irq_desc for 99 on node -1
    [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
    [ 85.170614] alloc kstat_irqs on node -1
    [ 85.170616] alloc irq_2_iommu on node -1
    [ 85.170617] alloc irq_desc for 100 on node -1
    [ 85.170619] alloc kstat_irqs on node -1
    [ 85.170621] alloc irq_2_iommu on node -1
    [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
    [ 85.170626] alloc irq_desc for 101 on node -1
    [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
    [ 85.170630] alloc kstat_irqs on node -1
    [ 85.170631] alloc irq_2_iommu on node -1
    [ 85.170635] alloc irq_desc for 102 on node -1
    [ 85.170636] alloc kstat_irqs on node -1
    [ 85.170639] alloc irq_2_iommu on node -1
    [ 85.170646] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000088

    As you can see igb and ixgbe are both alternating on create_irq_nr()
    via pci_enable_msix() in their probe function.

    ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
    chooses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
    calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
    NULL via dynamic_irq_init().

    igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
    via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

        cfg_new = irq_desc_ptrs[102]->chip_data;
        if (cfg_new->vector != 0)
                continue;

    This hits the NULL deref.

    Another possible race exists via pci_disable_msix() in a driver or in
    the number of error paths that call free_msi_irqs():

        destroy_irq()
          dynamic_irq_cleanup() which sets desc->chip_data = NULL
          ...race window...
          desc->chip_data = cfg;

    Remove the save and restore code for cfg in create_irq_nr() and
    destroy_irq() and take the desc->lock when checking the irq_cfg.

    Reported-and-analyzed-by: Brandon Philips
    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: Brandon Philips
    Cc: stable@kernel.org
    Signed-off-by: H. Peter Anvin

    Brandon Philips
     

05 Feb, 2010

1 commit


15 Dec, 2009

1 commit


10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

09 Dec, 2009

1 commit


08 Dec, 2009

1 commit


06 Dec, 2009

1 commit


04 Dec, 2009

2 commits


20 Nov, 2009

1 commit


18 Nov, 2009

1 commit


08 Nov, 2009

2 commits

  • If a parent directory (i.e. /proc/irq/) could not be created
    we should not attempt to create subdirectories. Otherwise the
    "smp_affinity" and "spurious" entries may be registered under
    the /proc root instead of their proper place.

    Signed-off-by: Cyrill Gorcunov
    Cc: Rusty Russell
    Cc: Yinghai Lu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     
  • Prarit reported:
    =================================
    [ INFO: inconsistent lock state ]
    2.6.32-rc5 #1
    ---------------------------------
    inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
    swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    (&irq_desc_lock_class){?.-...}, at: [] try_one_irq+0x32/0x138
    {IN-HARDIRQ-W} state was registered at:
    [] __lock_acquire+0x2fc/0xd5d
    [] lock_acquire+0xf3/0x12d
    [] _spin_lock+0x40/0x89
    [] handle_level_irq+0x30/0x105
    [] handle_irq+0x95/0xb7
    [] do_IRQ+0x6a/0xe0
    [] ret_from_intr+0x0/0x16
    irq event stamp: 195096
    hardirqs last enabled at (195096): [] _spin_unlock_irq+0x3a/0x5c
    hardirqs last disabled at (195095): [] _spin_lock_irq+0x29/0x95
    softirqs last enabled at (195088): [] __do_softirq+0x1c1/0x1ef
    softirqs last disabled at (195093): [] call_softirq+0x1c/0x30

    other info that might help us debug this:
    1 lock held by swapper/0:
    #0: (kernel/irq/spurious.c:21){+.-...}, at: []
    run_timer_softirq+0x1a9/0x315

    stack backtrace:
    Pid: 0, comm: swapper Not tainted 2.6.32-rc5 #1
    Call Trace:
    [] valid_state+0x187/0x1ae
    [] mark_lock+0x129/0x253
    [] __lock_acquire+0x370/0xd5d
    [] lock_acquire+0xf3/0x12d
    [] _spin_lock+0x40/0x89
    [] try_one_irq+0x32/0x138
    [] poll_all_shared_irqs+0x41/0x6d
    [] poll_spurious_irqs+0x1c/0x49
    [] run_timer_softirq+0x239/0x315
    [] __do_softirq+0x102/0x1ef
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x59/0xca
    [] irq_exit+0x58/0xae
    [] smp_apic_timer_interrupt+0x94/0xba
    [] apic_timer_interrupt+0x13/0x20

    The reason is that try_one_irq() is called from hardirq context with
    interrupts disabled and from softirq context (poll_all_shared_irqs())
    with interrupts enabled.

    Disable interrupts before calling it from poll_all_shared_irqs().

    Reported-and-tested-by: Prarit Bhargava
    Signed-off-by: Yong Zhang
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     

04 Nov, 2009

2 commits


12 Oct, 2009

1 commit


12 Sep, 2009

1 commit


29 Aug, 2009

1 commit


27 Aug, 2009

1 commit

  • Masking oneshot edge type interrupts is wrong as we might lose an
    interrupt which is issued when the threaded handler is handling the
    device. We can keep the irq unmasked safely as with edge type
    interrupts there is no danger of interrupt floods. If the threaded
    handler has not yet finished then IRQTF_RUNTHREAD is set which will
    keep the handler thread active.

    Debugged and verified in preempt-rt.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

25 Aug, 2009

1 commit


18 Aug, 2009

1 commit

  • The wake_up_process() of the new irq thread in __setup_irq() is too
    early as the irqaction is not yet fully initialized; in particular,
    action->irq is not yet set. The interrupt thread might dereference the
    wrong irq descriptor.

    Move the wakeup after the action is installed and action->irq has been
    set.

    Reported-by: Michael Buesch
    Signed-off-by: Thomas Gleixner
    Tested-by: Michael Buesch

    Thomas Gleixner
     

17 Aug, 2009

3 commits

  • Interrupt chips which are behind a slow bus (i2c, spi ...) and
    demultiplex other interrupt sources need to run their interrupt
    handler in a thread.

    The demultiplexed interrupt handlers need to run in thread context as
    well and need to finish before the demux handler thread can reenable
    the interrupt line. So the easiest way is to run the sub device
    handlers in the context of the demultiplexing handler thread.

    To avoid creating a separate thread for the subdevices, the function
    set_nested_irq_thread() is provided which sets the IRQ_NESTED_THREAD
    flag in the interrupt descriptor.

    A driver which calls request_threaded_irq() does not need to be aware
    of the fact that the threaded handler is called in the context of the
    demultiplexing handler thread. The setup code checks the
    IRQ_NESTED_THREAD flag which was set from the irq chip setup code and
    does not setup a separate thread for the interrupt. The primary
    function which is provided by the device driver is replaced by an
    internal dummy function which warns when it is called.

    For the demultiplexing handler a helper function handle_nested_irq()
    is provided which calls the demux interrupt thread function in the
    context of the caller and does the proper interrupt accounting and
    takes the interrupt disabled status of the demultiplexed subdevice
    into account.
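
    The flow described above can be sketched as a small user-space model.
    None of this is the kernel implementation: sub_irqs[], handle_nested(),
    demux_thread() and count_events() are hypothetical names, and the
    'pending' argument stands in for a status word that a real demux
    handler would read over the slow bus.

```c
#include <assert.h>

#define NR_SUB_IRQS 8

typedef void (*thread_fn_t)(unsigned int sub_irq, void *dev_id);

struct sub_irq {
    thread_fn_t thread_fn; /* threaded handler of the sub device, or NULL */
    void *dev_id;
    int disabled;          /* nonzero while the sub interrupt is disabled */
    unsigned int count;    /* per-interrupt accounting */
};

static struct sub_irq sub_irqs[NR_SUB_IRQS];

/* Run one sub handler in the caller's context, honouring the disabled state. */
static void handle_nested(unsigned int sub_irq)
{
    struct sub_irq *s = &sub_irqs[sub_irq];

    if (s->disabled || !s->thread_fn)
        return;
    s->count++;
    s->thread_fn(sub_irq, s->dev_id);
}

/* Demux thread body: dispatch every pending sub interrupt, then return. */
static void demux_thread(unsigned int pending)
{
    for (unsigned int i = 0; i < NR_SUB_IRQS; i++)
        if (pending & (1u << i))
            handle_nested(i);
}

/* Example sub device handler: counts invocations through dev_id. */
static void count_events(unsigned int sub_irq, void *dev_id)
{
    (void)sub_irq;
    (*(unsigned int *)dev_id)++;
}
```

    Because every sub handler has finished by the time demux_thread()
    returns, the demux handler can reenable its own interrupt line
    afterwards without further synchronization.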

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     
  • Some interrupt chips are connected to a "slow" bus (i2c, spi ...). The
    bus access needs to sleep and therefore cannot be called in atomic
    contexts.

    Some of the generic interrupt management functions like disable_irq(),
    enable_irq() ... call interrupt chip functions with the irq_desc->lock
    held and interrupts disabled. This does not work for such devices.

    Provide a separate synchronization mechanism for such interrupt
    chips. The irq_chip structure is extended by two optional functions
    (bus_lock and bus_sync_unlock).

    The idea is to serialize the bus access for those operations in the
    core code so that drivers which are behind that bus operated interrupt
    controller do not have to worry about it and just can use the normal
    interfaces. To achieve this we add two function pointers to the
    irq_chip: bus_lock and bus_sync_unlock.

    bus_lock() is called to serialize access to the interrupt controller
    bus.

    Now the core code can issue chip->mask/unmask ... commands without
    changing the fast path code at all. The chip implementation merely
    stores that information in a chip private data structure and
    returns. No bus interaction as these functions are called from atomic
    context.

    After that bus_sync_unlock() is called outside the atomic context. Now
    the chip implementation issues the bus commands, waits for completion
    and unlocks the interrupt controller bus.

    The irq_chip implementation as pseudo code:

    struct irq_chip_data {
            struct mutex            mutex;
            unsigned int            irq_offset;
            unsigned long           mask;           /* cached, wanted state */
            unsigned long           mask_status;    /* state in hardware */
    };

    /* Called from non-atomic context: serialize access to the bus */
    static void bus_lock(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            mutex_lock(&data->mutex);
    }

    /* Called from atomic context: only update the cached mask */
    static void mask(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            irq -= data->irq_offset;
            data->mask |= (1UL << irq);
    }

    /* Called from atomic context: only update the cached mask */
    static void unmask(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            irq -= data->irq_offset;
            data->mask &= ~(1UL << irq);
    }

    /* Called from non-atomic context: write back changes, then unlock */
    static void bus_sync_unlock(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            if (data->mask != data->mask_status) {
                    do_bus_magic_to_set_mask(data->mask);
                    data->mask_status = data->mask;
            }
            mutex_unlock(&data->mutex);
    }

    The device drivers can use request_threaded_irq, free_irq, disable_irq
    and enable_irq as usual with the only restriction that the calls need
    to come from non atomic context.
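
    To show how the core side is meant to use these callbacks, here is a
    self-contained user-space model. It is a sketch under stated
    assumptions rather than kernel code: struct slow_chip, the chip_*()
    helpers and model_disable_irq()/model_enable_irq() are hypothetical
    names, a plain integer stands in for the mutex, and a counter stands
    in for the sleeping bus transfer.

```c
#include <assert.h>

/* Model of a chip behind a slow bus; all names here are illustrative. */
struct slow_chip {
    int locked;                /* stands in for the mutex */
    unsigned long mask;        /* value the core wants */
    unsigned long mask_status; /* value last written to the hardware */
    unsigned int bus_writes;   /* number of simulated bus transfers */
};

static void chip_bus_lock(struct slow_chip *c)
{
    assert(!c->locked);        /* a real mutex would block here instead */
    c->locked = 1;
}

/* Atomic-context callbacks: they only touch the cached mask. */
static void chip_mask(struct slow_chip *c, unsigned int bit)
{
    c->mask |= 1UL << bit;
}

static void chip_unmask(struct slow_chip *c, unsigned int bit)
{
    c->mask &= ~(1UL << bit);
}

/* Flush the cached mask with at most one bus transfer, then unlock. */
static void chip_bus_sync_unlock(struct slow_chip *c)
{
    if (c->mask != c->mask_status) {
        c->bus_writes++;       /* the sleeping bus access would go here */
        c->mask_status = c->mask;
    }
    c->locked = 0;
}

/* What a core helper such as disable_irq() would do around the chip call. */
static void model_disable_irq(struct slow_chip *c, unsigned int bit)
{
    chip_bus_lock(c);
    chip_mask(c, bit);         /* atomic section in the real core code */
    chip_bus_sync_unlock(c);   /* non-atomic: may sleep on the bus */
}

static void model_enable_irq(struct slow_chip *c, unsigned int bit)
{
    chip_bus_lock(c);
    chip_unmask(c, bit);
    chip_bus_sync_unlock(c);
}
```

    Disabling an already masked interrupt causes no bus traffic in this
    model, because the cached mask and the hardware state already agree.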

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     
  • For threaded interrupt handlers we expect the hard interrupt handler
    part to mask the interrupt on the originating device. The interrupt
    line itself is reenabled after the hard interrupt handler has
    executed.

    This requires access to the originating device from hard interrupt
    context which is not always possible. There are devices which can only
    be accessed via a bus (i2c, spi, ...). The bus access requires thread
    context. For such devices we need to keep the interrupt line masked
    until the threaded handler has executed.

    Add a new flag IRQF_ONESHOT which allows drivers to request that the
    interrupt is not unmasked after the hard interrupt context handler has
    been executed and the thread has been woken. The interrupt line is
    unmasked after the thread handler function has been executed.

    Note that for now IRQF_ONESHOT cannot be used with IRQF_SHARED to
    avoid complex accounting mechanisms.

    For oneshot interrupts the primary handler simply returns
    IRQ_WAKE_THREAD and does nothing else. A generic implementation
    irq_default_primary_handler() is provided to avoid useless copies all
    over the place. It is automatically installed when
    request_threaded_irq() is called with handler=NULL and
    thread_fn!=NULL.
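
    The registration rules in this message can be modelled as follows.
    This is an illustrative user-space sketch, not the kernel's
    request_threaded_irq(): struct irq_action_model,
    model_request_threaded_irq(), default_primary_handler() and
    sample_thread_fn() are hypothetical names, the flag values are local
    to the example, and returning -EINVAL for rejected requests is an
    assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef enum { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* Flag values are local to this sketch. */
#define IRQF_SHARED  0x1UL
#define IRQF_ONESHOT 0x2UL

struct irq_action_model {
    irq_handler_t handler;   /* primary (hard interrupt) handler */
    irq_handler_t thread_fn; /* threaded handler */
    unsigned long flags;
};

/* Generic primary handler: do nothing except ask for the thread to run. */
static irqreturn_t default_primary_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return IRQ_WAKE_THREAD;
}

/* Example threaded handler; it may sleep, e.g. for i2c or spi access. */
static irqreturn_t sample_thread_fn(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return IRQ_HANDLED;
}

/* Model of the registration checks described in the text. */
static int model_request_threaded_irq(struct irq_action_model *action,
                                      irq_handler_t handler,
                                      irq_handler_t thread_fn,
                                      unsigned long flags)
{
    if ((flags & IRQF_ONESHOT) && (flags & IRQF_SHARED))
        return -EINVAL;      /* oneshot cannot be shared for now */
    if (!handler) {
        if (!thread_fn)
            return -EINVAL;  /* nothing to run at all */
        handler = default_primary_handler;
    }
    action->handler = handler;
    action->thread_fn = thread_fn;
    action->flags = flags;
    return 0;
}
```

    A driver for a device behind a slow bus would pass handler=NULL, its
    own thread function and IRQF_ONESHOT, and do all device access from
    the thread.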

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     

14 Aug, 2009

1 commit

  • free_irq() can remove an irqaction while the corresponding interrupt
    is in progress, but free_irq() sets action->thread to NULL
    unconditionally, which might lead to a NULL pointer dereference in
    handle_IRQ_event() when the hard interrupt context tries to wake up
    the handler thread.

    Prevent this by moving the thread stop after synchronize_irq(). No
    need to set action->thread to NULL either as action is going to be
    freed anyway.

    This fixes a boot crash reported against preempt-rt which uses the
    mainline irq threads code to implement full irq threading.

    [ tglx: removed local irqthread variable ]

    Signed-off-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner

    Linus Torvalds
     

09 Aug, 2009

1 commit


08 Aug, 2009

1 commit