09 Jun, 2010

1 commit

  • The set_type() function can change the chip implementation when the
    trigger mode changes. That might result in using a non-initialized
    irq chip when called from __setup_irq() or when called via
    set_irq_type() on an already enabled irq.

    The set_irq_type() function should not be called on an enabled irq,
    but because we forgot to put a check into it, a bunch of users grew
    the habit of doing so. It never blew up because the function is
    serialized via desc->lock against all users of desc->chip, so they
    never hit the non-initialized irq chip issue.

    The easy fix for the __setup_irq() issue would be to move the
    irq_chip_set_defaults(desc->chip) call after the trigger setting to
    make sure that a chip change is covered.

    But as we already have users which do the type setting after
    request_irq(), the safe fix for now is to call
    irq_chip_set_defaults() from __irq_set_trigger() when
    desc->set_type() changed the irq chip.

    It needs a deeper analysis whether we should refuse to change the chip
    on an already enabled irq, but that'd be a large scale change to fix
    all the existing users. So that's neither stable nor 2.6.35 material.
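    The chip-swap hazard and the fix can be sketched as a userspace toy
    model (every name below is a stand-in that merely mirrors the kernel
    ones; nothing here is real kernel code): when set_type() swaps
    desc->chip, __irq_set_trigger() re-applies the chip defaults.

```c
#include <stddef.h>

/* Userspace toy model: all names mirror the kernel's, but nothing here
 * is real kernel code. */
struct irq_chip {
        void (*mask)(void);
        void (*unmask)(void);
};

void chip_noop(void) { }

/* A freshly selected chip implementation may still have callbacks left
 * NULL until irq_chip_set_defaults() has run on it. */
struct irq_chip level_chip = { chip_noop, chip_noop };
struct irq_chip edge_chip  = { chip_noop, NULL };

struct irq_desc {
        struct irq_chip *chip;
};

void irq_chip_set_defaults(struct irq_chip *chip)
{
        /* Fill in any callback the chip implementation left empty. */
        if (!chip->unmask)
                chip->unmask = chip_noop;
}

/* set_type() may switch desc->chip when the trigger mode changes. */
int set_type(struct irq_desc *desc, int edge)
{
        desc->chip = edge ? &edge_chip : &level_chip;
        return 0;
}

/* The fix: compare the chip pointer around set_type() and re-apply the
 * defaults when it changed. */
int __irq_set_trigger(struct irq_desc *desc, int edge)
{
        struct irq_chip *old = desc->chip;
        int ret = set_type(desc, edge);

        if (!ret && desc->chip != old)
                irq_chip_set_defaults(desc->chip);
        return ret;
}

int trigger_demo(void)
{
        struct irq_desc desc = { &level_chip };

        if (__irq_set_trigger(&desc, 1) != 0 || desc.chip != &edge_chip)
                return 0;
        return desc.chip->unmask != NULL; /* defaults were re-applied */
}
```

    Without the pointer comparison, desc.chip->unmask would still be
    NULL after the trigger change, which is the crash the commit fixes.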

    Reported-by: Esben Haabendal
    Signed-off-by: Thomas Gleixner
    Cc: Benjamin Herrenschmidt
    Cc: linuxppc-dev
    Cc: stable@kernel.org

    Thomas Gleixner
     

03 May, 2010

1 commit

  • This patch adds a cpumask affinity hint to the irq_desc structure,
    along with a registration function and a read-only proc entry for each
    interrupt.

    The affinity_hint for each interrupt can be used by underlying
    drivers that need a better mechanism to control interrupt affinity.
    The underlying driver can register a cpumask for the interrupt,
    which allows it to provide the CPU mask for the interrupt to
    anything that requests it. The intent is to extend the userspace
    irqbalance daemon to take this hint into account as a preferred CPU
    mask when balancing the interrupt.
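    A userspace toy model of the registration and the proc read-out (the
    real entry point added by this commit is irq_set_affinity_hint();
    the model names and the plain-bitmask representation of cpumasks are
    simplifications):

```c
#include <stddef.h>

/* Toy model: a cpumask is just an unsigned long bitmask here. */
struct irq_desc {
        const unsigned long *affinity_hint;
};

int irq_set_affinity_hint_model(struct irq_desc *desc,
                                const unsigned long *mask)
{
        /* The driver only registers a pointer; nothing forces the
         * affinity. A balancer may read the hint via proc and act. */
        desc->affinity_hint = mask;
        return 0;
}

/* What the read-only proc entry would report: the hint if one is
 * registered, otherwise the full online mask. */
unsigned long proc_show_affinity_hint(const struct irq_desc *desc,
                                      unsigned long cpu_online_mask)
{
        return desc->affinity_hint ? *desc->affinity_hint
                                   : cpu_online_mask;
}

int affinity_hint_demo(void)
{
        static const unsigned long hint = 0x3; /* CPUs 0 and 1 */
        struct irq_desc desc = { NULL };

        if (proc_show_affinity_hint(&desc, 0xf) != 0xf)
                return 0;
        irq_set_affinity_hint_model(&desc, &hint);
        return proc_show_affinity_hint(&desc, 0xf) == 0x3;
}
```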

    [ tglx: Fixed compile warnings, added WARN_ON, made SMP only ]

    Signed-off-by: Peter P Waskiewicz Jr
    Cc: davem@davemloft.net
    Cc: arjan@linux.jf.intel.com
    Cc: bhutchings@solarflare.com
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter P Waskiewicz Jr
     

13 Apr, 2010

2 commits

  • Remove all code which is related to IRQF_DISABLED from the core kernel
    code. IRQF_DISABLED still exists as a flag, but becomes a NOOP and
    will be removed after a grace period. That way we can easily revert to
    the previous behaviour by just restoring the core code.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: Andi Kleen
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    LKML-Reference:

    Thomas Gleixner
     
  • Now that we enjoy threaded interrupts, we're starting to see irq_chip
    implementations (wm831x, pca953x) that make use of threaded interrupts
    for the controller, and nested interrupts for the client interrupt. It
    all works very well, with one drawback:

    Drivers requesting an IRQ must now know whether the handler will
    run in a thread context or not, and call request_threaded_irq() or
    request_irq() accordingly.

    The problem is that the requesting driver sometimes doesn't know
    about the nature of the interrupt, especially when the interrupt
    controller is a discrete chip (typically a GPIO expander connected
    over I2C) that can be connected to a wide variety of otherwise
    perfectly supported hardware.

    This patch introduces the request_any_context_irq() function that mostly
    mimics the usual request_irq(), except that it checks whether the irq
    level is configured as nested or not, and calls the right backend.
    On success, it also returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.

    [ tglx: Made return value an enum, simplified code and made the export
    of request_any_context_irq GPL ]
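    The dispatch can be modeled in userspace C. The struct and flag
    below are stand-ins for the real descriptor state; only the
    IRQC_IS_HARDIRQ / IRQC_IS_NESTED return values come from the commit:

```c
/* Userspace toy model of the dispatch done by
 * request_any_context_irq(). */
enum irqc_result { IRQC_IS_HARDIRQ, IRQC_IS_NESTED };

#define IRQ_NESTED_THREAD 0x1

struct irq_desc { unsigned int status; };

int request_any_context_irq_model(const struct irq_desc *desc)
{
        /* Nested irq level: use the nested/threaded request backend,
         * otherwise fall back to the regular request_irq() path. */
        if (desc->status & IRQ_NESTED_THREAD)
                return IRQC_IS_NESTED;
        return IRQC_IS_HARDIRQ;
}

int any_context_demo(void)
{
        struct irq_desc nested  = { IRQ_NESTED_THREAD };
        struct irq_desc regular = { 0 };

        return request_any_context_irq_model(&nested) == IRQC_IS_NESTED &&
               request_any_context_irq_model(&regular) == IRQC_IS_HARDIRQ;
}
```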

    Signed-off-by: Marc Zyngier
    Cc:
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     

31 Mar, 2010

1 commit

  • Network folks reported that directing all MSI-X vectors of their
    multi-queue NICs to a single core can cause interrupt stack
    overflows when enough interrupts fire at the same time.

    This is caused by the fact that we run interrupt handlers by default
    with interrupts enabled unless the driver requests the interrupt
    with IRQF_DISABLED set. The NIC handlers do not set this flag, so
    simultaneous interrupts can nest without limit and cause the stack
    overflow.

    The only safe counter measure is to run the interrupt handlers with
    interrupts disabled. We can't switch to this mode in general right
    now, but it is safe to do so for MSI interrupts.

    Force IRQF_DISABLED for MSI interrupt handlers.
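    The forced flag can be illustrated with a toy model (the flag value
    and the msi field are invented stand-ins for the real kernel
    definitions):

```c
/* Toy model of forcing IRQF_DISABLED for MSI interrupt handlers. */
#define IRQF_DISABLED 0x20

struct irq_desc { int msi; };

unsigned long setup_irq_flags(const struct irq_desc *desc,
                              unsigned long requested_flags)
{
        /* NIC drivers do not pass IRQF_DISABLED; force it for MSI so
         * their handlers run with interrupts off and cannot nest. */
        if (desc->msi)
                requested_flags |= IRQF_DISABLED;
        return requested_flags;
}

int msi_flags_demo(void)
{
        struct irq_desc msi = { 1 }, pin = { 0 };

        return (setup_irq_flags(&msi, 0) & IRQF_DISABLED) &&
               !(setup_irq_flags(&pin, 0) & IRQF_DISABLED);
}
```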

    Signed-off-by: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: stable@kernel.org

    Thomas Gleixner
     

24 Mar, 2010

1 commit


11 Mar, 2010

1 commit

  • Lars-Peter pointed out that the oneshot threaded interrupt handler
    code has the following race:

    CPU0                            CPU1
    handle_level_irq(irq X)
      mask_ack_irq(irq X)
      handle_IRQ_event(irq X)
        wake_up(thread_handler)
                                    thread handler(irq X) runs
                                    finalize_oneshot(irq X)
                                      does not unmask due to
                                      !(desc->status & IRQ_MASKED)
    return from irq
      does not unmask due to
      (desc->status & IRQ_ONESHOT)

    This leaves the interrupt line masked forever.

    The reason for this is the inconsistent handling of the IRQ_MASKED
    flag. Instead of setting it in the mask function the oneshot support
    sets the flag after waking up the irq thread.

    The solution for this is to set/clear the IRQ_MASKED status whenever
    we mask/unmask an interrupt line. That's the easy part, but that
    cleanup opens another race:

    CPU0                            CPU1
    handle_level_irq(irq)
      mask_ack_irq(irq)
      handle_IRQ_event(irq)
        wake_up(thread_handler)
                                    thread handler(irq) runs
                                    finalize_oneshot_irq(irq)
                                      unmask(irq)
    irq triggers again
    handle_level_irq(irq)
      mask_ack_irq(irq)
      return from irq due to IRQ_INPROGRESS

    return from irq
      does not unmask due to
      (desc->status & IRQ_ONESHOT)

    This requires that we synchronize finalize_oneshot_irq() with the
    primary handler. If IRQ_INPROGRESS is set, we wait until the primary
    handler on the other CPU has returned before unmasking the interrupt
    line again.

    We probably have never seen that problem because it does not happen on
    UP and on SMP the irqbalancer protects us by pinning the primary
    handler and the thread to the same CPU.
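    The repaired state machine can be sketched as a toy model: the
    IRQ_MASKED bit is kept consistent by mask()/unmask() themselves, and
    finalize_oneshot() backs off while the primary handler still runs
    (flag values below are invented):

```c
/* Toy model of the oneshot fix. */
#define IRQ_MASKED      0x1
#define IRQ_INPROGRESS  0x2

struct irq_desc { unsigned int status; };

void mask_irq(struct irq_desc *d)   { d->status |= IRQ_MASKED; }
void unmask_irq(struct irq_desc *d) { d->status &= ~IRQ_MASKED; }

/* Returns 1 when it actually unmasked, 0 when it had to back off. */
int finalize_oneshot(struct irq_desc *d)
{
        if (d->status & IRQ_INPROGRESS)
                return 0;               /* wait for the primary handler */
        if (d->status & IRQ_MASKED)
                unmask_irq(d);
        return !(d->status & IRQ_MASKED);
}

int oneshot_demo(void)
{
        struct irq_desc d = { 0 };

        mask_irq(&d);                   /* hard irq context: mask_ack */
        d.status |= IRQ_INPROGRESS;     /* primary handler re-entered */
        if (finalize_oneshot(&d))       /* thread must not unmask yet */
                return 0;
        d.status &= ~IRQ_INPROGRESS;    /* primary handler returned */
        return finalize_oneshot(&d);    /* now the line gets unmasked */
}
```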

    Reported-by: Lars-Peter Clausen
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org

    Thomas Gleixner
     

15 Dec, 2009

1 commit


09 Dec, 2009

1 commit


12 Sep, 2009

1 commit


18 Aug, 2009

1 commit

  • The wake_up_process() of the new irq thread in __setup_irq() happens
    too early: the irqaction is not yet fully initialized; in
    particular, action->irq is not yet set. The interrupt thread might
    dereference the wrong irq descriptor.

    Move the wakeup after the action is installed and action->irq has been
    set.

    Reported-by: Michael Buesch
    Signed-off-by: Thomas Gleixner
    Tested-by: Michael Buesch

    Thomas Gleixner
     

17 Aug, 2009

3 commits

  • Interrupt chips which are behind a slow bus (i2c, spi ...) and
    demultiplex other interrupt sources need to run their interrupt
    handler in a thread.

    The demultiplexed interrupt handlers need to run in thread context
    as well and need to finish before the demux handler thread can
    reenable the interrupt line. So the easiest way is to run the
    subdevice handlers in the context of the demultiplexing handler
    thread.

    To avoid creating a separate thread for the subdevices, the function
    set_irq_nested_thread() is provided, which sets the
    IRQ_NESTED_THREAD flag in the interrupt descriptor.

    A driver which calls request_threaded_irq() need not be aware that
    the threaded handler is called in the context of the demultiplexing
    handler thread. The setup code checks the IRQ_NESTED_THREAD flag,
    which was set from the irq chip setup code, and does not set up a
    separate thread for the interrupt. The primary handler function
    provided by the device driver is replaced by an internal dummy
    function which warns when it is called.

    For the demultiplexing handler a helper function handle_nested_irq()
    is provided which calls the demux interrupt thread function in the
    context of the caller and does the proper interrupt accounting and
    takes the interrupt disabled status of the demultiplexed subdevice
    into account.
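    A userspace toy model of handle_nested_irq(): the subdevice's thread
    handler runs directly in the caller's (the demux thread's) context,
    the disabled state is honoured, and the interrupt is accounted (all
    names and fields are stand-ins):

```c
/* Toy model of running a nested handler in the demux thread. */
struct nested_irq {
        int disabled;
        unsigned int count;             /* interrupt accounting */
        int (*thread_fn)(void);
};

static int got;
static int sub_handler(void) { got = 1; return 1; }

int handle_nested_irq_model(struct nested_irq *irq)
{
        if (irq->disabled)
                return 0;               /* subdevice irq is off: skip */
        irq->count++;
        return irq->thread_fn();        /* runs in the caller's thread */
}

int nested_demo(void)
{
        struct nested_irq irq = { 1, 0, sub_handler };

        handle_nested_irq_model(&irq);  /* disabled: nothing happens */
        if (got || irq.count)
                return 0;
        irq.disabled = 0;
        handle_nested_irq_model(&irq);
        return got && irq.count == 1;
}
```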

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com,
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     
  • Some interrupt chips are connected to a "slow" bus (i2c, spi ...).
    The bus access needs to sleep and therefore cannot happen in atomic
    context.

    Some of the generic interrupt management functions like disable_irq(),
    enable_irq() ... call interrupt chip functions with the irq_desc->lock
    held and interrupts disabled. This does not work for such devices.

    Provide a separate synchronization mechanism for such interrupt
    chips: the irq_chip structure is extended by two optional function
    pointers, bus_lock and bus_sync_unlock.

    The idea is to serialize the bus access for those operations in the
    core code so that drivers behind such a bus-operated interrupt
    controller do not have to worry about it and can just use the
    normal interfaces.

    bus_lock() is called to serialize access to the interrupt controller
    bus.

    Now the core code can issue chip->mask/unmask ... commands without
    changing the fast path code at all. The chip implementation merely
    stores that information in a chip private data structure and
    returns. No bus interaction happens, as these functions are called
    from atomic context.

    After that bus_sync_unlock() is called outside the atomic context. Now
    the chip implementation issues the bus commands, waits for completion
    and unlocks the interrupt controller bus.

    The irq_chip implementation as pseudo code:

    struct irq_chip_data {
            struct mutex mutex;
            unsigned int irq_offset;
            unsigned long mask;
            unsigned long mask_status;
    };

    static void bus_lock(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            mutex_lock(&data->mutex);
    }

    static void mask(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            irq -= data->irq_offset;
            data->mask |= (1 << irq);
    }

    static void unmask(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            irq -= data->irq_offset;
            data->mask &= ~(1 << irq);
    }

    static void bus_sync_unlock(unsigned int irq)
    {
            struct irq_chip_data *data = get_irq_desc_chip_data(irq);

            if (data->mask != data->mask_status) {
                    do_bus_magic_to_set_mask(data->mask);
                    data->mask_status = data->mask;
            }
            mutex_unlock(&data->mutex);
    }

    Device drivers can use request_threaded_irq(), free_irq(),
    disable_irq() and enable_irq() as usual, with the only restriction
    that the calls must come from non-atomic context.
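    The pseudo code above can be turned into a runnable userspace
    demonstration of the key property: mask/unmask only touch cached
    state, and the single (simulated) bus write happens in
    bus_sync_unlock(). The mutex is elided since the demo is
    single-threaded, and the "bus magic" is a plain register write:

```c
/* Runnable rendition of the pseudo code above. */
struct chip_data {
        unsigned long mask;            /* cached, written in atomic ctx */
        unsigned long mask_status;     /* last value pushed to hardware */
        unsigned long hw_reg;          /* stands in for the i2c device */
        int bus_writes;
};

void chip_mask(struct chip_data *d, unsigned int irq)
{
        d->mask |= 1UL << irq;         /* no bus access here */
}

void chip_unmask(struct chip_data *d, unsigned int irq)
{
        d->mask &= ~(1UL << irq);
}

void chip_bus_sync_unlock(struct chip_data *d)
{
        if (d->mask != d->mask_status) {
                d->hw_reg = d->mask;   /* the slow bus transaction */
                d->mask_status = d->mask;
                d->bus_writes++;
        }
}

int buslock_demo(void)
{
        struct chip_data d = { 0, 0, 0, 0 };

        chip_mask(&d, 3);
        chip_unmask(&d, 3);
        chip_mask(&d, 5);
        if (d.bus_writes != 0)         /* nothing hit the bus yet */
                return 0;
        chip_bus_sync_unlock(&d);      /* one write for three ops */
        return d.bus_writes == 1 && d.hw_reg == (1UL << 5);
}
```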

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com,
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     
  • For threaded interrupt handlers we expect the hard interrupt handler
    part to mask the interrupt on the originating device. The interrupt
    line itself is reenabled after the hard interrupt handler has
    executed.

    This requires access to the originating device from hard interrupt
    context which is not always possible. There are devices which can only
    be accessed via a bus (i2c, spi, ...). The bus access requires thread
    context. For such devices we need to keep the interrupt line masked
    until the threaded handler has executed.

    Add a new flag IRQF_ONESHOT which allows drivers to request that the
    interrupt is not unmasked after the hard interrupt context handler has
    been executed and the thread has been woken. The interrupt line is
    unmasked after the thread handler function has been executed.

    Note that for now IRQF_ONESHOT cannot be used with IRQF_SHARED to
    avoid complex accounting mechanisms.

    For oneshot interrupts the primary handler simply returns
    IRQ_WAKE_THREAD and does nothing else. A generic implementation
    irq_default_primary_handler() is provided to avoid useless copies all
    over the place. It is automatically installed when
    request_threaded_irq() is called with handler=NULL and
    thread_fn!=NULL.
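    The handler-selection rule can be sketched as a toy model (the
    helper pick_primary() is invented for illustration; only
    irq_default_primary_handler() and the handler/thread_fn contract
    come from the commit):

```c
#include <stddef.h>

/* Toy model of installing irq_default_primary_handler() when
 * request_threaded_irq() is called with handler == NULL. */
enum irqreturn { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD };
typedef enum irqreturn (*irq_handler_t)(void);

enum irqreturn irq_default_primary_handler(void)
{
        return IRQ_WAKE_THREAD;         /* just kick the thread */
}

static enum irqreturn my_thread_fn(void) { return IRQ_HANDLED; }

/* Returns the primary handler that would actually be installed, or
 * NULL for an invalid request (no handler and no thread function). */
irq_handler_t pick_primary(irq_handler_t handler, irq_handler_t thread_fn)
{
        if (!handler)
                return thread_fn ? irq_default_primary_handler : NULL;
        return handler;
}

int oneshot_request_demo(void)
{
        irq_handler_t h = pick_primary(NULL, my_thread_fn);

        return h != NULL && h() == IRQ_WAKE_THREAD &&
               pick_primary(NULL, NULL) == NULL;
}
```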

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com,
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     

14 Aug, 2009

1 commit

  • free_irq() can remove an irqaction while the corresponding interrupt
    is in progress, but free_irq() sets action->thread to NULL
    unconditionally, which might lead to a NULL pointer dereference in
    handle_IRQ_event() when the hard interrupt context tries to wake up
    the handler thread.

    Prevent this by moving the thread stop after synchronize_irq(). No
    need to set action->thread to NULL either as action is going to be
    freed anyway.

    This fixes a boot crash reported against preempt-rt which uses the
    mainline irq threads code to implement full irq threading.

    [ tglx: removed local irqthread variable ]

    Signed-off-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner

    Linus Torvalds
     

23 Jul, 2009

1 commit

  • Since genirq: Delegate irq affinity setting to the irq thread
    (591d2fb02ea80472d846c0b8507007806bdd69cc) compilation with
    CONFIG_SMP=n fails with following error:

    /usr/src/linux-2.6/kernel/irq/manage.c:
    In function 'irq_thread_check_affinity':
    /usr/src/linux-2.6/kernel/irq/manage.c:475:
    error: 'struct irq_desc' has no member named 'affinity'
    make[4]: *** [kernel/irq/manage.o] Error 1

    That commit adds a new function irq_thread_check_affinity() which
    uses struct irq_desc.affinity which is only available for CONFIG_SMP=y.
    Move that function under #ifdef CONFIG_SMP.

    [ tglx@brownpaperbag: compile and boot tested on UP and SMP ]

    Signed-off-by: Bruno Premont
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Bruno Premont
     

21 Jul, 2009

1 commit

  • irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might
    sleep, but irq_set_thread_affinity() is called with desc->lock held
    and can be called from hard interrupt context as well. The code has
    another bug as it does not hold a ref on the task struct as required
    by set_cpus_allowed_ptr().

    Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time
    the thread runs it migrates itself. Solves all of the above problems
    nicely.

    Add kerneldoc to irq_set_thread_affinity() while at it.
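    The deferral can be sketched as a toy model: the hard irq path only
    sets the flag, and the thread migrates itself on its next loop
    iteration, where sleeping is allowed (flag value and cpu fields are
    invented):

```c
/* Toy model of deferring thread affinity changes to the thread. */
#define IRQTF_AFFINITY 0x4

struct irqaction {
        unsigned long thread_flags;
        int thread_cpu;                 /* where the thread runs now */
        int target_cpu;                 /* requested affinity */
};

/* Called under desc->lock, possibly from hard interrupt context. */
void irq_set_thread_affinity(struct irqaction *a, int cpu)
{
        a->target_cpu = cpu;
        a->thread_flags |= IRQTF_AFFINITY;      /* no sleeping here */
}

/* Run by the irq thread itself, in schedulable context. */
void irq_thread_check_affinity(struct irqaction *a)
{
        if (a->thread_flags & IRQTF_AFFINITY) {
                a->thread_flags &= ~IRQTF_AFFINITY;
                a->thread_cpu = a->target_cpu;  /* set_cpus_allowed_ptr */
        }
}

int thread_affinity_demo(void)
{
        struct irqaction a = { 0, 0, 0 };

        irq_set_thread_affinity(&a, 2);
        if (a.thread_cpu != 0)          /* nothing migrated yet */
                return 0;
        irq_thread_check_affinity(&a);  /* next thread loop iteration */
        return a.thread_cpu == 2 && !(a.thread_flags & IRQTF_AFFINITY);
}
```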

    Signed-off-by: Thomas Gleixner
    LKML-Reference:

    Thomas Gleixner
     

21 Jun, 2009

1 commit


13 May, 2009

1 commit

  • While trying to implement a driver using threaded irqs, I was
    confused because the return value described in the comment above
    request_threaded_irq() was not defined.

    Turns out that the enum is IRQ_WAKE_THREAD, whereas the comment
    said IRQ_THREAD_WAKE.

    [Impact: do not confuse developers with wrong comments ]

    Signed-off-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     

28 Apr, 2009

1 commit

  • irq_set_affinity() and move_masked_irq() try to assign the affinity
    before calling the chip's set_affinity(). Some archs assign it
    again in ->set_affinity().

    We do something like:

    cpumask_copy(desc->affinity, mask);
    desc->chip->set_affinity(mask);

    But in the failure path, affinity should not be touched - otherwise
    we'll end up with a different affinity mask despite the failure to
    migrate the IRQ.

    So update the affinity only if set_affinity() returns 0. Also call
    irq_set_thread_affinity() accordingly.
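    The corrected ordering can be sketched as a toy model (the structs
    and the bitmask cpumask are stand-ins for the real kernel types):

```c
/* Toy model: copy desc->affinity only on chip success. */
struct irq_chip_model { int (*set_affinity)(unsigned long mask); };

struct irq_desc {
        unsigned long affinity;
        struct irq_chip_model *chip;
};

int irq_set_affinity_model(struct irq_desc *desc, unsigned long mask)
{
        int ret = desc->chip->set_affinity(mask);

        if (ret)
                return ret;             /* failure: affinity untouched */
        desc->affinity = mask;          /* success: record new mask */
        return 0;
}

static int fail_set_affinity(unsigned long mask) { (void)mask; return -1; }
static int ok_set_affinity(unsigned long mask)   { (void)mask; return 0; }

int affinity_demo(void)
{
        struct irq_chip_model chip = { fail_set_affinity };
        struct irq_desc desc = { 0x1, &chip };

        irq_set_affinity_model(&desc, 0x2);
        if (desc.affinity != 0x1)       /* must survive the failure */
                return 0;
        chip.set_affinity = ok_set_affinity;
        return irq_set_affinity_model(&desc, 0x2) == 0 &&
               desc.affinity == 0x2;
}
```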

    v2: update after "irq, x86: Remove IRQ_DISABLED check in process context IRQ move"
    v3: according to Ingo, change set_affinity() in irq_chip should return int.
    v4: update comments by removing moving irq_desc code.

    [ Impact: fix /proc/irq/*/smp_affinity setting corner case bug ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

23 Apr, 2009

1 commit

  • When requesting an IRQ, the DEBUG_SHIRQ code executes a fake IRQ
    just to make sure the driver is ready to receive an IRQ
    immediately. The problem was that this fake IRQ was executed even
    if the interrupt line failed to be allocated by __setup_irq().

    Signed-off-by: Luis Henriques
    LKML-Reference:
    Signed-off-by: Thomas Gleixner
    [ fixed bug pointed out by a warning reported by Stephen Rothwell ]
    Cc: Stephen Rothwell
    Signed-off-by: Ingo Molnar

    Luis Henriques
     

14 Apr, 2009

1 commit

  • As discussed in the thread here:

    http://marc.info/?l=linux-kernel&m=123964468521142&w=2

    Eric W. Biederman observed:

    > It looks like some additional bugs have slipped in since last I looked.
    >
    > set_irq_affinity does this:
    >
    > #ifdef CONFIG_GENERIC_PENDING_IRQ
    >     if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
    >         cpumask_copy(desc->affinity, cpumask);
    >         desc->chip->set_affinity(irq, cpumask);
    >     } else {
    >         desc->status |= IRQ_MOVE_PENDING;
    >         cpumask_copy(desc->pending_mask, cpumask);
    >     }
    > #else
    >
    >
    > That IRQ_DISABLED case is a software state and as such it has nothing to
    > do with how safe it is to move an irq in process context.

    [...]

    >
    > The only reason we migrate MSIs in interrupt context today is that there
    > wasn't infrastructure for support migration both in interrupt context
    > and outside of it.

    Yes. The idea here was to force the MSI migration to happen in process
    context. One of the patches in the series did

    disable_irq(dev->irq);
    irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
    enable_irq(dev->irq);

    with the above patch adding a check in the irq/manage code for the
    interrupt being disabled, and moving the interrupt in process
    context.

    IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
    code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
    when we eventually submitted the patch upstream. But, looks like I did a
    blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.

    Below patch fixes this. i.e., revert commit 932775a4ab622e3c99bd59f14cc
    and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
    in generic code as set_affinity routines are doing it internally.

    Reported-by: "Eric W. Biederman"
    Signed-off-by: Venkatesh Pallipadi
    Acked-by: "Eric W. Biederman"
    Cc: "Li Shaohua"
    Cc: Gary Hade
    Cc: "lcm@us.ibm.com"
    Cc: suresh.b.siddha@intel.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Pallipadi, Venkatesh
     

06 Apr, 2009

1 commit


31 Mar, 2009

1 commit

  • Introduce helper functions allowing us to prevent device drivers from
    getting any interrupts (without disabling interrupts on the CPU)
    during suspend (or hibernation) and to make them start to receive
    interrupts again during the subsequent resume. These functions make it
    possible to keep timer interrupts enabled while the "late" suspend and
    "early" resume callbacks provided by device drivers are being
    executed. In turn, this allows device drivers' "late" suspend and
    "early" resume callbacks to sleep, execute ACPI callbacks etc.

    The functions introduced here will be used to rework the handling of
    interrupts during suspend (hibernation) and resume. Namely,
    interrupts will only be disabled on the CPU right before suspending
    sysdevs, while device drivers will be prevented from receiving
    interrupts, with the help of the new helper function, before their
    "late" suspend callbacks run (and analogously during resume).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar

    Rafael J. Wysocki
     

28 Mar, 2009

1 commit


24 Mar, 2009

3 commits

  • Delta patch to address the review comments.

    - Implement a warning when IRQ_WAKE_THREAD is requested and no
      thread handler is installed
    - Coding style fixes

    Pointed-out-by: Christoph Hellwig
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Add support for threaded interrupt handlers:

    A device driver can request that its main interrupt handler runs in
    a thread. To achieve this the device driver requests the interrupt
    with request_threaded_irq() and provides, in addition to the
    handler, a thread function. The handler function is called in hard
    interrupt context and needs to check whether the interrupt
    originated from the device. If the interrupt originated from the
    device, the handler can either return IRQ_HANDLED or
    IRQ_WAKE_THREAD. IRQ_HANDLED is returned when no further action is
    required. IRQ_WAKE_THREAD causes the genirq code to invoke the
    threaded (main) handler. When IRQ_WAKE_THREAD is returned, the
    handler must have disabled the interrupt at the device level. This
    is mandatory for shared interrupt handlers, but we need to do it as
    well for obscure x86 hardware where disabling an interrupt on the
    IO_APIC level redirects the interrupt to the legacy PIC interrupt
    lines.
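    The return-value contract of the primary handler can be sketched as
    a userspace toy model (the struct and dispatch loop are stand-ins
    for the real genirq flow):

```c
/* Toy model of the hard irq flow for a threaded handler. */
enum irqreturn { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD };

struct action {
        enum irqreturn (*handler)(int from_our_device);
        int thread_woken;
};

void handle_irq_event_model(struct action *a, int from_our_device)
{
        switch (a->handler(from_our_device)) {
        case IRQ_WAKE_THREAD:
                a->thread_woken = 1;    /* genirq wakes the irq thread */
                break;
        case IRQ_HANDLED:               /* all done in hard irq context */
        case IRQ_NONE:                  /* not ours (shared line) */
                break;
        }
}

static enum irqreturn primary(int from_our_device)
{
        if (!from_our_device)
                return IRQ_NONE;
        /* The device-level interrupt must be disabled before this. */
        return IRQ_WAKE_THREAD;
}

int threaded_demo(void)
{
        struct action a = { primary, 0 };

        handle_irq_event_model(&a, 0);  /* spurious: no wakeup */
        if (a.thread_woken)
                return 0;
        handle_irq_event_model(&a, 1);
        return a.thread_woken == 1;
}
```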

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar

    Thomas Gleixner
     
  • Conflicts:
    arch/parisc/kernel/irq.c
    kernel/irq/handle.c

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Mar, 2009

1 commit


13 Mar, 2009

2 commits


12 Mar, 2009

3 commits


03 Mar, 2009

1 commit


18 Feb, 2009

2 commits


15 Feb, 2009

2 commits


13 Feb, 2009

1 commit