03 May, 2010

1 commit

  • This patch adds a cpumask affinity hint to the irq_desc structure,
    along with a registration function and a read-only proc entry for each
    interrupt.

    This affinity_hint handle for each interrupt can be used by underlying
    drivers that need a better mechanism to control interrupt affinity.
    The underlying driver can register a cpumask for the interrupt, which
    will allow the driver to provide the CPU mask for the interrupt to
    anything that requests it. The intent is to extend the userspace
    daemon, irqbalance, so that it can use the hint as the preferred CPU
    mask when balancing the interrupt.

    [ tglx: Fixed compile warnings, added WARN_ON, made SMP only ]

    Signed-off-by: Peter P Waskiewicz Jr
    Cc: davem@davemloft.net
    Cc: arjan@linux.jf.intel.com
    Cc: bhutchings@solarflare.com
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter P Waskiewicz Jr
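
A userspace consumer such as irqbalance would read the hint from the per-interrupt proc entry. The sketch below assumes the entry prints the mask in the same comma-separated hexadecimal form as smp_affinity, and that 64 CPUs are enough for the example. The helper names parse_cpumask_hex() and cpu_in_mask() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a hex cpumask string such as "0000000f" or "00000001,00000000"
 * into a 64-bit bitmap (bit N set means CPU N is in the mask).
 * Returns 0 on success, -1 on malformed input or if the mask does not
 * fit into 64 bits. Hypothetical helper, for illustration only.
 */
static int parse_cpumask_hex(const char *s, uint64_t *out)
{
    uint64_t mask = 0;
    int digits = 0;

    assert(s != NULL && out != NULL);

    for (; *s != '\0' && *s != '\n'; s++) {
        int v;

        if (*s == ',')          /* separator between 32-bit groups */
            continue;
        if (!isxdigit((unsigned char)*s))
            return -1;
        if (mask >> 60)         /* the next shift would drop set bits */
            return -1;

        if (isdigit((unsigned char)*s))
            v = *s - '0';
        else
            v = tolower((unsigned char)*s) - 'a' + 10;

        mask = (mask << 4) | (uint64_t)v;
        digits++;
    }

    if (digits == 0)
        return -1;

    *out = mask;
    return 0;
}

/* Return 1 if the given CPU is set in the parsed mask, 0 otherwise. */
static int cpu_in_mask(uint64_t mask, unsigned int cpu)
{
    return cpu < 64 && ((mask >> cpu) & 1);
}
```

A caller would read one line from the interrupt's affinity_hint entry under /proc/irq/ and pass it to parse_cpumask_hex(). Since the hint is only advisory, an all-zero mask can be treated as "no preference".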
     

04 Mar, 2010

1 commit

  • * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
    x86: Fix out of order of gsi
    x86: apic: Fix mismerge, add arch_probe_nr_irqs() again
    x86, irq: Keep chip_data in create_irq_nr and destroy_irq
    xen: Remove unnecessary arch specific xen irq functions.
    smp: Use nr_cpus= to set nr_cpu_ids early
    x86, irq: Remove arch_probe_nr_irqs
    sparseirq: Use radix_tree instead of ptrs array
    sparseirq: Change irq_desc_ptrs to static
    init: Move radix_tree_init() early
    irq: Remove unnecessary bootmem code
    x86: Add iMac9,1 to pci_reboot_dmi_table
    x86: Convert i8259_lock to raw_spinlock
    x86: Convert nmi_lock to raw_spinlock
    x86: Convert ioapic_lock and vector_lock to raw_spinlock
    x86: Avoid race condition in pci_enable_msix()
    x86: Fix SCI on IOAPIC != 0
    x86, ia32_aout: do not kill argument mapping
    x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path
    x86, irq: Update the vector domain for legacy irqs handled by io-apic
    x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
    ...

    Linus Torvalds
     

11 Feb, 2010

1 commit

  • Keep chip_data in create_irq_nr and destroy_irq.

    When two drivers are setting up MSI-X at the same time via
    pci_enable_msix() there is a race. See this dmesg excerpt:

    [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
    [ 85.170611] alloc irq_desc for 99 on node -1
    [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
    [ 85.170614] alloc kstat_irqs on node -1
    [ 85.170616] alloc irq_2_iommu on node -1
    [ 85.170617] alloc irq_desc for 100 on node -1
    [ 85.170619] alloc kstat_irqs on node -1
    [ 85.170621] alloc irq_2_iommu on node -1
    [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
    [ 85.170626] alloc irq_desc for 101 on node -1
    [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
    [ 85.170630] alloc kstat_irqs on node -1
    [ 85.170631] alloc irq_2_iommu on node -1
    [ 85.170635] alloc irq_desc for 102 on node -1
    [ 85.170636] alloc kstat_irqs on node -1
    [ 85.170639] alloc irq_2_iommu on node -1
    [ 85.170646] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000088

    As you can see igb and ixgbe are both alternating on create_irq_nr()
    via pci_enable_msix() in their probe function.

    ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
    chooses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
    calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
    NULL via dynamic_irq_init().

    igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
    via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

        cfg_new = irq_desc_ptrs[102]->chip_data;
        if (cfg_new->vector != 0)
            continue;

    This hits the NULL deref.

    Another possible race exists via pci_disable_msix() in a driver or in
    the number of error paths that call free_msi_irqs():

    destroy_irq()
        dynamic_irq_cleanup() which sets desc->chip_data = NULL
        ...race window...
        desc->chip_data = cfg;

    Remove the save and restore code for cfg in create_irq_nr() and
    destroy_irq() and take the desc->lock when checking the irq_cfg.

    Reported-and-analyzed-by: Brandon Philips
    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: Brandon Philips
    Cc: stable@kernel.org
    Signed-off-by: H. Peter Anvin

    Brandon Philips
     

13 Jan, 2010

1 commit


15 Dec, 2009

1 commit


04 Dec, 2009

1 commit


12 Sep, 2009

1 commit


25 Aug, 2009

1 commit


17 Aug, 2009

3 commits

  • Interrupt chips which are behind a slow bus (i2c, spi ...) and
    demultiplex other interrupt sources need to run their interrupt
    handler in a thread.

    The demultiplexed interrupt handlers need to run in thread context as
    well and need to finish before the demux handler thread can reenable
    the interrupt line. So the easiest way is to run the sub device
    handlers in the context of the demultiplexing handler thread.

    To avoid creating a separate thread for the subdevices, the function
    set_nested_irq_thread() is provided, which sets the
    IRQ_NESTED_THREAD flag in the interrupt descriptor.

    A driver which calls request_threaded_irq() need not be aware of the
    fact that the threaded handler is called in the context of the
    demultiplexing handler thread. The setup code checks the
    IRQ_NESTED_THREAD flag which was set from the irq chip setup code and
    does not setup a separate thread for the interrupt. The primary
    function which is provided by the device driver is replaced by an
    internal dummy function which warns when it is called.

    For the demultiplexing handler a helper function handle_nested_irq()
    is provided which calls the demux interrupt thread function in the
    context of the caller and does the proper interrupt accounting and
    takes the interrupt disabled status of the demultiplexed subdevice
    into account.

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
     
  • Some interrupt chips are connected to a "slow" bus (i2c, spi ...). The
    bus access needs to sleep and therefore cannot be called in atomic
    contexts.

    Some of the generic interrupt management functions like disable_irq(),
    enable_irq() ... call interrupt chip functions with the irq_desc->lock
    held and interrupts disabled. This does not work for such devices.

    Provide a separate synchronization mechanism for such interrupt
    chips. The irq_chip structure is extended by two optional functions
    (bus_lock and bus_sync_unlock).

    The idea is to serialize the bus access for those operations in the
    core code so that drivers which are behind that bus operated interrupt
    controller do not have to worry about it and just can use the normal
    interfaces. To achieve this we add two function pointers to the
    irq_chip: bus_lock and bus_sync_unlock.

    bus_lock() is called to serialize access to the interrupt controller
    bus.

    Now the core code can issue chip->mask/unmask ... commands without
    changing the fast path code at all. The chip implementation merely
    stores that information in a chip private data structure and
    returns. No bus interaction takes place, as these functions are
    called from atomic context.

    After that bus_sync_unlock() is called outside the atomic context. Now
    the chip implementation issues the bus commands, waits for completion
    and unlocks the interrupt controller bus.

    The irq_chip implementation as pseudo code:

    struct irq_chip_data {
        struct mutex mutex;
        unsigned int irq_offset;
        unsigned long mask;         /* desired mask, updated in atomic context */
        unsigned long mask_status;  /* mask last written to the chip over the bus */
    };

    static void bus_lock(unsigned int irq)
    {
        struct irq_chip_data *data = get_irq_desc_chip_data(irq);

        mutex_lock(&data->mutex);
    }

    static void mask(unsigned int irq)
    {
        struct irq_chip_data *data = get_irq_desc_chip_data(irq);

        irq -= data->irq_offset;
        data->mask |= (1UL << irq);
    }

    static void unmask(unsigned int irq)
    {
        struct irq_chip_data *data = get_irq_desc_chip_data(irq);

        irq -= data->irq_offset;
        data->mask &= ~(1UL << irq);
    }

    static void bus_sync_unlock(unsigned int irq)
    {
        struct irq_chip_data *data = get_irq_desc_chip_data(irq);

        /* Touch the slow bus only if the cached mask actually changed. */
        if (data->mask != data->mask_status) {
            do_bus_magic_to_set_mask(data->mask);
            data->mask_status = data->mask;
        }
        mutex_unlock(&data->mutex);
    }

    The device drivers can use request_threaded_irq, free_irq, disable_irq
    and enable_irq as usual with the only restriction that the calls need
    to come from non atomic context.

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
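
The deferred-write pattern described in this commit can be tried out in userspace. The sketch below is only a model under stated assumptions: the kernel mutex is replaced by a plain flag, the slow bus transfer by a counter, and every name (chip_model, model_bus_lock() and so on) is hypothetical. It shows that several mask/unmask calls inside one lock/sync pair collapse into at most one bus transfer.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the chip private data from the commit message. */
struct chip_model {
    bool bus_locked;            /* stand-in for the mutex */
    unsigned long mask;         /* desired mask, updated without bus access */
    unsigned long mask_status;  /* mask last "written" to the device */
    unsigned int bus_writes;    /* number of simulated slow-bus transfers */
};

static void chip_model_init(struct chip_model *c)
{
    c->bus_locked = false;
    c->mask = 0;
    c->mask_status = 0;
    c->bus_writes = 0;
}

/* Taken in sleepable context, before the atomic section starts. */
static void model_bus_lock(struct chip_model *c)
{
    assert(!c->bus_locked);
    c->bus_locked = true;
}

/* "Atomic context": only the cached mask is updated, no bus access. */
static void model_mask(struct chip_model *c, unsigned int line)
{
    c->mask |= 1UL << line;
}

static void model_unmask(struct chip_model *c, unsigned int line)
{
    c->mask &= ~(1UL << line);
}

/* Sleepable context again: flush the cached mask if it changed. */
static void model_bus_sync_unlock(struct chip_model *c)
{
    assert(c->bus_locked);
    if (c->mask != c->mask_status) {
        c->bus_writes++;        /* one simulated bus transfer */
        c->mask_status = c->mask;
    }
    c->bus_locked = false;
}
```

Masking line 3, unmasking it again and masking line 5 inside one lock/sync pair results in a single simulated transfer. A second pair with no mask change results in none.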
     
  • For threaded interrupt handlers we expect the hard interrupt handler
    part to mask the interrupt on the originating device. The interrupt
    line itself is reenabled after the hard interrupt handler has
    executed.

    This requires access to the originating device from hard interrupt
    context which is not always possible. There are devices which can only
    be accessed via a bus (i2c, spi, ...). The bus access requires thread
    context. For such devices we need to keep the interrupt line masked
    until the threaded handler has executed.

    Add a new flag IRQF_ONESHOT which allows drivers to request that the
    interrupt is not unmasked after the hard interrupt context handler has
    been executed and the thread has been woken. The interrupt line is
    unmasked after the thread handler function has been executed.

    Note that for now IRQF_ONESHOT cannot be used with IRQF_SHARED to
    avoid complex accounting mechanisms.

    For oneshot interrupts the primary handler simply returns
    IRQ_WAKE_THREAD and does nothing else. A generic implementation
    irq_default_primary_handler() is provided to avoid useless copies all
    over the place. It is automatically installed when
    request_threaded_irq() is called with handler=NULL and
    thread_fn!=NULL.

    Signed-off-by: Thomas Gleixner
    Cc: Mark Brown
    Cc: Dmitry Torokhov
    Cc: Trilok Soni
    Cc: Pavel Machek
    Cc: Brian Swetland
    Cc: Joonyoung Shim
    Cc: m.szyprowski@samsung.com
    Cc: t.fujak@samsung.com
    Cc: kyungmin.park@samsung.com
    Cc: David Brownell
    Cc: Daniel Ribeiro
    Cc: arve@android.com
    Cc: Barry Song

    Thomas Gleixner
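
The difference between a oneshot and a regular threaded interrupt comes down to when the line is unmasked. The following userspace model is a sketch under stated assumptions, not the genirq implementation: all types and functions are hypothetical stand-ins, and the primary handler mimics the behaviour described for irq_default_primary_handler() by only requesting a thread wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's handler return values. */
enum model_irqreturn {
    MODEL_IRQ_NONE,
    MODEL_IRQ_HANDLED,
    MODEL_IRQ_WAKE_THREAD
};

struct model_irq {
    bool oneshot;         /* models a request made with IRQF_ONESHOT */
    bool masked;          /* state of the interrupt line */
    bool thread_pending;  /* thread has been woken but has not run yet */
    int thread_runs;      /* how often the thread function has executed */
};

/* Models the default primary handler: wake the thread, do nothing else. */
static enum model_irqreturn model_default_primary(void)
{
    return MODEL_IRQ_WAKE_THREAD;
}

/* Hard interrupt path: mask, run the primary handler, maybe unmask. */
static void model_hardirq(struct model_irq *irq)
{
    assert(irq != NULL);
    irq->masked = true;
    if (model_default_primary() == MODEL_IRQ_WAKE_THREAD)
        irq->thread_pending = true;

    /* Without oneshot the line is reenabled right after the hard handler. */
    if (!(irq->oneshot && irq->thread_pending))
        irq->masked = false;
}

/* Thread path: run the handler; a oneshot line is unmasked only now. */
static void model_irq_thread(struct model_irq *irq)
{
    assert(irq != NULL);
    if (!irq->thread_pending)
        return;
    irq->thread_runs++;
    irq->thread_pending = false;
    if (irq->oneshot)
        irq->masked = false;
}
```

In the model, a oneshot interrupt stays masked between model_hardirq() and the end of model_irq_thread(). This is the window in which a device behind i2c or spi can be quiesced from thread context.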
     

05 Jul, 2009

1 commit


21 Jun, 2009

1 commit


14 Jun, 2009

1 commit

  • Fix kernel-doc warnings in linux/irq.h:

    Warning(include/linux/irq.h:201): No description found for parameter 'node'
    Warning(include/linux/irq.h:201): Excess struct/union/enum/typedef member 'cpu' description in 'irq_desc'
    Warning(include/linux/irq.h:434): No description found for parameter 'node'
    Warning(include/linux/irq.h:434): Excess function parameter 'cpu' description in 'alloc_desc_masks'

    Signed-off-by: Randy Dunlap
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

12 Jun, 2009

1 commit


02 May, 2009

1 commit

  • move_irq_desc() will try to move irq_desc to the home node if
    the allocated one is not correct, in create_irq_nr().

    ( This can happen on devices that are on different nodes that
    are using MSI, when drivers are loaded and unloaded randomly. )

    v2: fix non-smp build
    v3: add NUMA_IRQ_DESC to eliminate #ifdefs

    [ Impact: improve irq descriptor locality on NUMA systems ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

28 Apr, 2009

5 commits

  • Try to get irq_desc on the home node in create_irq_nr().

    v2: don't check if we can move it when sparse_irq is not used
    v3: use move_irq_desc, if that node is not what we want

    [ Impact: optimization, make MSI IRQ descriptors more NUMA aware ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • This simplifies the node awareness of the code. All our allocators
    only deal with a NUMA node ID locality not with CPU ids anyway - so
    there's no need to maintain (and transform) a CPU id all across the
    IRQ layer.

    v2: keep move_irq_desc related

    [ Impact: cleanup, prepare IRQ code to be NUMA-aware ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • According to Ingo, set_affinity() in irq_chip should return int,
    because that way we can handle failure cases in a much cleaner way, in
    the genirq layer.

    v2: fix two typos

    [ Impact: extend API ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    Cc: linux-arch@vger.kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • The original feature of dynamically migrating irq_desc was too fragile
    and was causing problems: it caused crashes on systems with lots of
    cards with MSI-X when user-space irq-balancer was enabled.

    We now have new patches that create irq_desc according to device
    numa node. This patch removes the leftover bits of the dynamic balancer.

    [ Impact: remove dead code ]

    Signed-off-by: Yinghai Lu
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    Cc: Rusty Russell
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • CPUMASKS_OFFSTACK is not defined anywhere (it is CPUMASK_OFFSTACK).
    It is a typo, and init_alloc_desc_masks() is called before affinity is
    set to all cpus...

    Split init_alloc_desc_masks() into alloc_desc_masks() and init_desc_masks().

    Also use CPUMASK_OFFSTACK in alloc_desc_masks().

    [ Impact: fix smp_affinity copying/setup when moving irq_desc between CPUs ]

    Signed-off-by: Yinghai Lu
    Acked-by: Rusty Russell
    Cc: Andrew Morton
    Cc: Suresh Siddha
    Cc: "Eric W. Biederman"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

10 Apr, 2009

1 commit

  • …or-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    printk: fix wrong format string iter for printk
    futex: comment requeue key reference semantics

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    irq: fix cpumask memory leak on offstack cpumask kernels

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)
    posix-timers: fix RLIMIT_CPU && fork()
    timers: add missing kernel-doc

    Linus Torvalds
     

06 Apr, 2009

1 commit


04 Apr, 2009

1 commit


31 Mar, 2009

1 commit

  • Introduce helper functions allowing us to prevent device drivers from
    getting any interrupts (without disabling interrupts on the CPU)
    during suspend (or hibernation) and to make them start to receive
    interrupts again during the subsequent resume. These functions make it
    possible to keep timer interrupts enabled while the "late" suspend and
    "early" resume callbacks provided by device drivers are being
    executed. In turn, this allows device drivers' "late" suspend and
    "early" resume callbacks to sleep, execute ACPI callbacks etc.

    The functions introduced here will be used to rework the handling of
    interrupts during suspend (hibernation) and resume. Namely,
    interrupts will only be disabled on the CPU right before suspending
    sysdevs, while device drivers will be prevented from receiving
    interrupts, with the help of the new helper function, before their
    "late" suspend callbacks run (and analogously during resume).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar

    Rafael J. Wysocki
     

29 Mar, 2009

1 commit

  • <linux/irq.h> relies on <linux/gfp.h> and <linux/topology.h> having been
    included previously. If not, errors like those below will result.

    CC arch/mips/mti-malta/malta-int.o
    In file included from arch/mips/mti-malta/malta-int.c:25:
    include/linux/irq.h: In function ‘init_alloc_desc_masks’:
    include/linux/irq.h:444: error: implicit declaration of function ‘cpu_to_node’
    include/linux/irq.h:446: error: ‘GFP_ATOMIC’ undeclared (first use in this function)
    include/linux/irq.h:446: error: (Each undeclared identifier is reported only once
    include/linux/irq.h:446: error: for each function it appears in.)
    make[3]: *** [arch/mips/mti-malta/malta-int.o] Error 1
    make[2]: *** [arch/mips/mti-malta] Error 2
    make[1]: *** [sub-make] Error 2

    Fixed by including the two missing headers.

    Signed-off-by: Ralf Baechle
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     

28 Mar, 2009

1 commit


24 Mar, 2009

2 commits

  • Add support for threaded interrupt handlers:

    A device driver can request that its main interrupt handler runs in a
    thread. To achieve this, the device driver requests the interrupt with
    request_threaded_irq() and provides a thread function in addition to
    the handler. The handler function is called in hard interrupt
    context and needs to check whether the interrupt originated from the
    device. If the interrupt originated from the device then the handler
    can either return IRQ_HANDLED or IRQ_WAKE_THREAD. IRQ_HANDLED is
    returned when no further action is required. IRQ_WAKE_THREAD causes
    the genirq code to invoke the threaded (main) handler. When
    IRQ_WAKE_THREAD is returned, the handler must have disabled the interrupt
    on the device level. This is mandatory for shared interrupt handlers,
    but we need to do it as well for obscure x86 hardware where disabling
    an interrupt on the IO_APIC level redirects the interrupt to the
    legacy PIC interrupt lines.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar

    Thomas Gleixner
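
The contract for the hard interrupt handler described above can be sketched as a small userspace model. It is an illustration under stated assumptions: the device is reduced to three flags, the return codes are local stand-ins for IRQ_NONE, IRQ_HANDLED and IRQ_WAKE_THREAD, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for IRQ_NONE, IRQ_HANDLED and IRQ_WAKE_THREAD. */
enum model_ret {
    MODEL_RET_NONE,
    MODEL_RET_HANDLED,
    MODEL_RET_WAKE_THREAD
};

/* Minimal device model: status bit, enable bit, and deferred work flag. */
struct model_dev {
    bool irq_pending;      /* the device raised the interrupt */
    bool irq_enabled;      /* device-level interrupt enable */
    bool needs_slow_work;  /* work that must run in thread context */
};

/* Primary handler, models hard interrupt context. */
static enum model_ret model_primary(struct model_dev *d)
{
    assert(d != NULL);

    if (!d->irq_pending)
        return MODEL_RET_NONE;       /* not ours (shared line) */

    if (!d->needs_slow_work) {
        d->irq_pending = false;      /* fully handled here */
        return MODEL_RET_HANDLED;
    }

    /* Must silence the device before deferring to the thread. */
    d->irq_enabled = false;
    return MODEL_RET_WAKE_THREAD;
}

/* Thread function: do the slow work, then reenable the device. */
static void model_thread_fn(struct model_dev *d)
{
    assert(d != NULL);
    d->needs_slow_work = false;
    d->irq_pending = false;
    d->irq_enabled = true;
}
```

The key point of the model is the ordering in the last branch of model_primary(): the device-level interrupt is disabled before the wake-thread code is returned, so a shared line cannot keep firing while the thread is still waiting to run.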
     
  • Conflicts:
    arch/parisc/kernel/irq.c
    kernel/irq/handle.c

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Mar, 2009

1 commit


13 Mar, 2009

2 commits


12 Mar, 2009

2 commits


22 Jan, 2009

1 commit

  • David Miller suggested, in response to a kstat_irqs build breakage:

    > Either linux/kernel_stat.h provides the kstat_incr_irqs_this_cpu
    > interface or linux/irq.h does, not both.

    So move them to kernel_stat.h.

    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

12 Jan, 2009

2 commits

  • Impact: fix bug where new irq_desc uses old cpumask pointers which are freed.

    As Yinghai pointed out, init_copy_one_irq_desc() copies the old desc to
    the new desc overwriting the cpumask pointers. Since the old_desc and
    the cpumask pointers are freed, then memory corruption will occur if
    these old pointers are used.

    Move the allocation of these pointers to after the copy.

    Signed-off-by: Mike Travis
    Cc: Yinghai Lu

    Mike Travis
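
The ordering problem fixed here is a general one: a structure copy also copies any pointers it contains. The userspace sketch below illustrates the corrected order (copy first, then allocate fresh storage) under stated assumptions: struct model_desc and model_copy_desc() are hypothetical, and a single unsigned long stands in for an off-stack cpumask.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified descriptor: one scalar field and one separately allocated mask. */
struct model_desc {
    unsigned int irq;
    unsigned long *affinity;  /* stand-in for an off-stack cpumask */
};

/*
 * Copy old_desc into new_desc. The structure copy comes first and drags
 * the old affinity pointer along; the allocation afterwards replaces that
 * pointer, so new_desc never aliases memory owned by old_desc.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int model_copy_desc(const struct model_desc *old_desc,
                           struct model_desc *new_desc)
{
    unsigned long *fresh;

    assert(old_desc != NULL && new_desc != NULL);
    assert(old_desc->affinity != NULL);

    memcpy(new_desc, old_desc, sizeof(*new_desc));

    fresh = malloc(sizeof(*fresh));
    if (fresh == NULL) {
        new_desc->affinity = NULL;  /* do not keep the stale pointer */
        return -1;
    }

    *fresh = *old_desc->affinity;   /* copy the contents, not the pointer */
    new_desc->affinity = fresh;
    return 0;
}
```

With the allocation done before the copy, the freshly allocated pointer would be overwritten by the old one, and freeing the old descriptor would leave the new descriptor pointing at freed memory.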
     
  • Impact: reduce memory usage, use new cpumask API.

    Replace the affinity and pending_masks with cpumask_var_t's. This adds
    to the significant size reduction done with the SPARSE_IRQS changes.

    The added functions (init_alloc_desc_masks & init_copy_desc_masks) are
    in the include file so they can be inlined (and optimized out for the
    !CONFIG_CPUMASK_OFFSTACK case). [Naming chosen to be consistent with
    the other init*irq functions, as well as the backwards arg declaration
    of "from, to" instead of the more common "to, from" standard.]

    Includes a slight change to the declaration of struct irq_desc to embed
    the pending_mask within ifdef(CONFIG_SMP) to be consistent with other
    references, and some small changes to Xen.

    Tested: sparse/non-sparse/cpumask_offstack/non-cpumask_offstack/nonuma/nosmp on x86_64

    Signed-off-by: Mike Travis
    Cc: Chris Wright
    Cc: Jeremy Fitzhardinge
    Cc: KOSAKI Motohiro
    Cc: Venkatesh Pallipadi
    Cc: virtualization@lists.osdl.org
    Cc: xen-devel@lists.xensource.com
    Cc: Yinghai Lu

    Mike Travis
     

11 Jan, 2009

1 commit


03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

29 Dec, 2008

1 commit

  • GCC has a bug with __weak alias functions: if the functions are in
    the same compilation unit as their call site, GCC can decide to
    inline them - and thus rob the linker of the opportunity to override
    the weak alias with the real thing.

    So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu