13 Dec, 2014

1 commit

  • Since the rework of the sparse interrupt code to actually free the
    unused interrupt descriptors there exists a race between the /proc
    interfaces to the irq subsystem and the code which frees the interrupt
    descriptor.

    CPU0                          CPU1
                                  show_interrupts()
                                    desc = irq_to_desc(X);
    free_desc(desc)
      remove_from_radix_tree();
      kfree(desc);
                                    raw_spinlock_irq(&desc->lock);

    /proc/interrupts is the only interface which can actively corrupt
    kernel memory via the lock access. /proc/stat can only read from freed
    memory. Extremely hard to trigger, but possible.

    The interfaces in /proc/irq/N/ are not affected by this because the
    removal of the proc file is serialized in procfs against concurrent
    readers/writers. The removal happens before the descriptor is freed.

    For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
    as the descriptor is never freed. It's merely cleared out with the irq
    descriptor lock held. So any concurrent proc access will either see
    the old correct value or the cleared out ones.

    Protect the lookup and access to the irq descriptor in
    show_interrupts() with the sparse_irq_lock.

    Provide kstat_irqs_usr() which is protecting the lookup and access
    with sparse_irq_lock and switch /proc/stat to use it.

    Document the existing kstat_irqs interfaces so it's clear that the
    caller needs to take care about protection. The users of these
    interfaces are either not affected due to SPARSE_IRQ=n or already
    protected against removal.

    Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
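
The locking rule described in this commit can be sketched in userspace C. This is a minimal mock-up, not the kernel code: a pthread mutex stands in for sparse_irq_lock, a fixed array stands in for the radix tree, and every name ending in `_sketch` is hypothetical. The point it illustrates is that the reader performs both the lookup and the access inside the critical section, while the free path unpublishes the descriptor under the same lock before freeing it.

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_SLOTS 16
#define NR_CPUS_SKETCH 4

/* Userspace stand-in for struct irq_desc: only the per-CPU counts. */
struct desc_sketch {
    unsigned int counts[NR_CPUS_SKETCH];
};

/* Stand-ins for the sparse irq tree and for sparse_irq_lock. */
static struct desc_sketch *slots[NR_SLOTS];
static pthread_mutex_t sparse_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a descriptor and publish it under the lock. Returns 0 or -1. */
int alloc_desc_sketch(unsigned int irq)
{
    struct desc_sketch *d;

    if (irq >= NR_SLOTS)
        return -1;
    d = calloc(1, sizeof(*d));
    if (!d)
        return -1;
    pthread_mutex_lock(&sparse_lock);
    if (slots[irq]) {
        pthread_mutex_unlock(&sparse_lock);
        free(d);
        return -1;
    }
    slots[irq] = d;
    pthread_mutex_unlock(&sparse_lock);
    return 0;
}

/* Count one interrupt; lookup and update both happen under the lock. */
void count_irq_sketch(unsigned int irq, unsigned int cpu)
{
    if (irq >= NR_SLOTS || cpu >= NR_CPUS_SKETCH)
        return;
    pthread_mutex_lock(&sparse_lock);
    if (slots[irq])
        slots[irq]->counts[cpu]++;
    pthread_mutex_unlock(&sparse_lock);
}

/*
 * Unpublish under the lock, then free. Readers only touch a descriptor
 * while holding the lock, so once the slot is cleared and the lock is
 * dropped, no reader can still hold the pointer.
 */
void free_desc_sketch(unsigned int irq)
{
    struct desc_sketch *d;

    if (irq >= NR_SLOTS)
        return;
    pthread_mutex_lock(&sparse_lock);
    d = slots[irq];
    slots[irq] = NULL;
    pthread_mutex_unlock(&sparse_lock);
    free(d);
}

/* Reader in the style of kstat_irqs_usr(): lookup and sum under the lock. */
unsigned int kstat_irqs_usr_sketch(unsigned int irq)
{
    unsigned int cpu, sum = 0;

    if (irq >= NR_SLOTS)
        return 0;
    pthread_mutex_lock(&sparse_lock);
    if (slots[irq])
        for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
            sum += slots[irq]->counts[cpu];
    pthread_mutex_unlock(&sparse_lock);
    return sum;
}
```

A reader that races with the free path either sees the descriptor and finishes before the slot is cleared, or finds an empty slot and reports 0. It never dereferences freed memory.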
     

03 Sep, 2014

1 commit

  • Calling irq_find_mapping from outside an irq_{enter,exit} section is
    unsafe. When coming from the idle state, the rcu_read_lock call in
    irq_find_mapping generates an unpleasant warning if CONFIG_PROVE_RCU
    is enabled:

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.16.0-rc1+ #135 Not tainted
    -------------------------------
    include/linux/rcupdate.h:871 rcu_read_lock() used illegally while idle!

    other info that might help us debug this:

    RCU used illegally from idle CPU!
    rcu_scheduler_active = 1, debug_locks = 0
    RCU used illegally from extended quiescent state!
    1 lock held by swapper/0/0:
    #0: (rcu_read_lock){......}, at: []
    irq_find_mapping+0x4c/0x198

    As this issue is fairly widespread and involves at least three
    different architectures, a possible solution is to add a new
    handle_domain_irq entry point into the generic IRQ code that
    the interrupt controller code can call.

    This new function takes an irq_domain, and calls into irq_find_mapping
    inside the irq_{enter,exit} block. An additional "lookup" parameter is
    used to allow non-domain architecture code to be replaced by this as well.

    Interrupt controllers can then be updated to use the new mechanism.

    This code is sitting behind a new CONFIG_HANDLE_DOMAIN_IRQ, as not all
    architectures implement set_irq_regs (yes, mn10300, I'm looking at you...).

    Reported-by: Vladimir Murzin
    Signed-off-by: Marc Zyngier
    Link: https://lkml.kernel.org/r/1409047421-27649-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Jason Cooper

    Marc Zyngier
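
The ordering this commit describes can be sketched as a userspace mock-up. All names ending in `_sketch` are hypothetical stand-ins, the domain is reduced to a plain lookup array, and the set_irq_regs handling mentioned in the message is left out. The sketch only shows that the hwirq-to-irq translation happens after entering interrupt context and that the `lookup` flag lets callers skip the translation.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS_SKETCH 8
#define EINVAL_SKETCH 22

/* Mock domain: hwirq -> Linux irq number, 0 means "not mapped". */
struct domain_sketch {
    unsigned int map[NR_IRQS_SKETCH];
};

static int irq_depth;              /* > 0 between enter and exit */
static int lookups_outside;        /* lookups made outside that section */
static unsigned int last_handled;  /* irq passed to the flow handler */

static void irq_enter_sketch(void) { irq_depth++; }
static void irq_exit_sketch(void)  { irq_depth--; }

static unsigned int find_mapping_sketch(const struct domain_sketch *d,
                                        unsigned int hwirq)
{
    /* A lookup at depth 0 is the case that triggers the RCU warning. */
    if (irq_depth == 0)
        lookups_outside++;
    return hwirq < NR_IRQS_SKETCH ? d->map[hwirq] : 0;
}

static void handle_irq_sketch(unsigned int irq) { last_handled = irq; }

/* Entry point for interrupt controller code. */
int handle_domain_irq_sketch(const struct domain_sketch *d,
                             unsigned int hwirq, bool lookup)
{
    unsigned int irq = hwirq;
    int ret = 0;

    irq_enter_sketch();
    if (lookup)
        irq = find_mapping_sketch(d, hwirq);  /* inside enter/exit */
    if (irq == 0 || irq >= NR_IRQS_SKETCH)
        ret = -EINVAL_SKETCH;                 /* unmapped or out of range */
    else
        handle_irq_sketch(irq);
    irq_exit_sketch();
    return ret;
}
```

With `lookup` set to false the hwirq is used directly as the irq number, which is how non-domain architecture code could share the same entry point.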
     

06 Jul, 2014

1 commit

  • irq_free_hwirqs() always calls irq_free_descs() with cnt == 0, which
    makes it a no-op, because the count of interrupts to free has already
    been decremented to zero inside irq_free_hwirqs() itself.

    Fixes: 7b6ef1262549f6afc5c881aaef80beb8fd15f908

    Signed-off-by: Keith Busch
    Acked-by: David Rientjes
    Link: http://lkml.kernel.org/r/1404167084-8070-1-git-send-email-keith.busch@intel.com
    Signed-off-by: Thomas Gleixner

    Keith Busch
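
The counting pitfall can be illustrated with a small userspace mock-up. The commit message does not quote the loop itself, so both functions below are an illustrative reconstruction of the described pattern, and all names are hypothetical. The buggy variant uses `cnt` as its own loop counter and then passes the exhausted value on. The fixed variant keeps a separate counter.

```c
#define MAX_IRQ_SKETCH 32

static int allocated[MAX_IRQ_SKETCH];  /* 1 = descriptor exists */
static int teardowns;                  /* number of arch teardown calls */

/* Stand-in for irq_free_descs(): frees cnt descriptors starting at from. */
static void free_descs_sketch(unsigned int from, unsigned int cnt)
{
    unsigned int i;

    for (i = from; i < from + cnt && i < MAX_IRQ_SKETCH; i++)
        allocated[i] = 0;
}

/* Stand-in for the per-interrupt architecture teardown. */
static void teardown_sketch(unsigned int irq)
{
    (void)irq;
    teardowns++;
}

/* Buggy shape: the loop consumes cnt, so the final call frees nothing. */
void free_hwirqs_buggy(unsigned int from, int cnt)
{
    unsigned int i;

    for (i = from; cnt > 0; i++, cnt--)
        teardown_sketch(i);
    free_descs_sketch(from, (unsigned int)cnt);  /* cnt is 0 here */
}

/* Fixed shape: separate loop counter, one descriptor freed per pass. */
void free_hwirqs_fixed(unsigned int from, int cnt)
{
    unsigned int i;
    int j;

    for (i = from, j = cnt; j > 0; i++, j--) {
        free_descs_sketch(i, 1);
        teardown_sketch(i);
    }
}
```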
     

16 May, 2014

5 commits

  • No more users. Get rid of the cruft.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Grant Likely
    Tested-by: Tony Luck
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140507154341.012847637@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Create a new interface and confine it with a config switch which makes
    clear that this is just legacy support and not to be used for new code.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Grant Likely
    Tested-by: Tony Luck
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140507154340.574437049@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • No more users. And it's not going to come back. If you need
    hotplugable irq chips, use irq domains.

    Signed-off-by: Thomas Gleixner
    Reviewed-and-acked-by: Grant Likely
    Tested-by: Tony Luck
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140507154340.302183048@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • We want to get rid of the public interface.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Grant Likely
    Tested-by: Tony Luck
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140507154340.061990194@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Not really the solution to the problem, but at least it confines the
    mess in the core code and allows to get rid of the create/destroy_irq
    variants from hell, i.e. 3 implementations with different semantics
    plus the x86 specific variants __create_irqs and create_irq_nr
    which have been invented in another circle of hell.

    x86 : x86 should be converted to irq domains and I'm deliberately
    making it impossible to do the multi-vector MSI support by
    adding more crap to the current mess. It's not that hard to do
    and I'm really tired of the trainwrecks which have been invented
    by band-aid engineering so far. Any attempt to do multi-vector
    MSI or ioapic hotplug without converting to irq domains is NAKed
    hereby.

    tile: Might use irq domains as well, but it has a very limited
    interrupt space, so handling it via this functionality might be
    the right thing to do even in the long run.

    ia64: That's a hopeless case, as I doubt that anyone has the stomach
    to rewrite the homebrewed dynamic allocation facilities. I stared
    at it for a couple of hours and gave up. The create/destroy_irq
    mess could be made private to itanic right away if the
    iommu/dmar driver were not shared with x86. So to
    do that I'm going to add a separate ia64 specific implementation
    later in order not to deep-six itanic right away.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Grant Likely
    Cc: Tony Luck
    Cc: Peter Zijlstra
    Cc: Chris Metcalf
    Cc: Fenghua Yu
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/20140507154334.208629358@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

28 Apr, 2014

1 commit

  • On x86 the allocation of irq descriptors may allocate interrupts which
    are in the range of the GSI interrupts. That's wrong as those
    interrupts are hardwired and we don't have the irq domain translation
    like PPC. So one of these interrupts can be hooked up later to one of
    the devices which are hard wired to it and the io_apic init code for
    that particular interrupt line happily reuses that descriptor with a
    completely different configuration so all hell breaks loose.

    Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
    except for a few usage sites which have not yet blown up in our face
    for whatever reason. But for drivers which need an irq range, like the
    GPIO drivers, we have no limit in place and we don't want to expose
    such a detail to a driver.

    To cure this introduce a function which an architecture can implement
    to impose a lower bound on the dynamic interrupt allocations.

    Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
    the end of the hardwired interrupt space, so all dynamic allocations
    happen above.

    That not only allows the GPIO driver to work sanely, it also protects
    the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
    htirq code. They need to be cleaned up as well, but that's a separate
    issue.

    Reported-by: Jin Yao
    Signed-off-by: Thomas Gleixner
    Tested-by: Mika Westerberg
    Cc: Mathias Nyman
    Cc: Linus Torvalds
    Cc: Grant Likely
    Cc: H. Peter Anvin
    Cc: Rafael J. Wysocki
    Cc: Andy Shevchenko
    Cc: Krogerus Heikki
    Cc: Linus Walleij
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
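
A minimal sketch of the lower-bound hook, written as a userspace mock-up. The names, the bitmap size and the value 24 for the end of the hardwired range are assumptions made for illustration. The allocator asks the architecture hook for the lowest permitted start and only then searches for a free range, so a driver asking for "any" range can never land in the hardwired interrupt space.

```c
#define IRQ_BITS_SKETCH 64

static unsigned char used[IRQ_BITS_SKETCH];
static unsigned int nr_gsi_sketch = 24;  /* hypothetical end of hardwired irqs */

/* Arch hook: lowest irq number that a dynamic allocation may use. */
unsigned int dynirq_lower_bound_sketch(unsigned int from)
{
    return from > nr_gsi_sketch ? from : nr_gsi_sketch;
}

/*
 * Dynamic allocation: find cnt consecutive free irqs at or above the
 * larger of `from` and the architecture bound. Returns the first irq
 * of the range, or -1 if no range is available.
 */
int alloc_dynamic_sketch(unsigned int from, unsigned int cnt)
{
    unsigned int start = dynirq_lower_bound_sketch(from);
    unsigned int i;

    while (start + cnt <= IRQ_BITS_SKETCH) {
        for (i = 0; i < cnt && !used[start + i]; i++)
            ;
        if (i == cnt) {
            for (i = 0; i < cnt; i++)
                used[start + i] = 1;
            return (int)start;
        }
        start += i + 1;  /* continue after the slot that was in use */
    }
    return -1;
}
```

An architecture without hardwired interrupts would simply return `from` unchanged from the hook.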
     

05 Mar, 2014

1 commit

  • There is a common pattern all over the place:

    kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));

    This results in a call to core code anyway. So provide a function
    which does the same thing in core.

    While at it, replace the butt ugly macro with an inline.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140223212737.422068876@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

11 Feb, 2014

1 commit

  • In allmodconfig builds for sparc and any other arch which does
    not set CONFIG_SPARSE_IRQ, the following will be seen at modpost:

    CC [M] lib/cpu-notifier-error-inject.o
    CC [M] lib/pm-notifier-error-inject.o
    ERROR: "irq_to_desc" [drivers/gpio/gpio-mcp23s08.ko] undefined!
    make[2]: *** [__modpost] Error 1

    This happens because commit 3911ff30f5 ("genirq: export
    handle_edge_irq() and irq_to_desc()") added one export for it, but
    there were actually two instances of it, in an if/else clause for
    CONFIG_SPARSE_IRQ. Add the second one.

    Signed-off-by: Paul Gortmaker
    Cc: Jiri Kosina
    Cc: stable@vger.kernel.org # 3.4+
    Link: http://lkml.kernel.org/r/1392057610-11514-1-git-send-email-paul.gortmaker@windriver.com
    Signed-off-by: Thomas Gleixner

    Paul Gortmaker
     

15 May, 2012

1 commit

  • Export handle_edge_irq() and irq_to_desc() to modules to allow them to
    do things such as

    __irq_set_handler_locked(...., handle_edge_irq);

    This fixes

    ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
    ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!

    when gpio-pch is being built as a module.

    This was introduced by commit df9541a60af0 ("gpio: pch9: Use proper flow
    type handlers") that added

    __irq_set_handler_locked(d->irq, handle_edge_irq);

    but handle_edge_irq() was not exported for modules (and inlined
    __irq_set_handler_locked() requires irq_to_desc() exported as well)

    Signed-off-by: Jiri Kosina
    Signed-off-by: Linus Torvalds

    Jiri Kosina
     

01 Nov, 2011

1 commit

  • Recent commit "irq: Track the owner of irq descriptor" in
    commit ID b6873807a7143b7 placed module.h into linux/irq.h
    but we are trying to limit module.h inclusion to just C files
    that really need it, due to its size and number of children
    includes. This targets just reversing that include.

    Add in the basic "struct module" since that is all we really need
    to ensure things compile. In theory, b687380 should have added the
    module.h include to the irqdesc.h header as well, but the implicit
    module.h everywhere presence masked this from showing up. So give
    it the "struct module" as well.

    As for the C files, irqdesc.c is only using THIS_MODULE, so it
    does not need module.h - give it export.h instead. The C file
    irq/manage.c is now (as of b687380) using try_module_get and
    module_put and so it needs module.h (which it already has).

    Also convert the irq_alloc_descs variants to macros, since all
    they really do is call the __irq_alloc_descs primitive.
    This avoids including export.h and no debug info is lost.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

03 Oct, 2011

1 commit

  • The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
    which are usually used to connect local timers to each core. Each CPU
    has its own private interface to the GIC, and only sees the PPIs that
    are directly connected to it.

    While these timers are separate devices and have a separate interrupt
    line to a core, they all use the same IRQ number.

    For these devices, request_irq() is not the right API as it assumes
    that an IRQ number is visible by a number of CPUs (through the
    affinity setting), but makes it very awkward to express that an IRQ
    number can be handled by all CPUs, and yet be a different interrupt
    line on each CPU, requiring a different dev_id cookie to be passed
    back to the handler.

    The *_percpu_irq() functions are designed to overcome these
    limitations, by providing a per-cpu dev_id vector:

    int request_percpu_irq(unsigned int irq, irq_handler_t handler,
    const char *devname, void __percpu *percpu_dev_id);
    void free_percpu_irq(unsigned int, void __percpu *);
    int setup_percpu_irq(unsigned int irq, struct irqaction *new);
    void remove_percpu_irq(unsigned int irq, struct irqaction *act);
    void enable_percpu_irq(unsigned int irq);
    void disable_percpu_irq(unsigned int irq);

    The API has a number of limitations:
    - no interrupt sharing
    - no threading
    - common handler across all the CPUs

    Once the interrupt is requested using setup_percpu_irq() or
    request_percpu_irq(), it must be enabled by each core that wishes its
    local interrupt to be delivered.

    Based on an initial patch by Thomas Gleixner.

    Signed-off-by: Marc Zyngier
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
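
The per-CPU dev_id idea can be modelled in userspace C. This mock-up does not use the kernel API listed above: the CPU is passed explicitly instead of being implied by the executing core, a plain array replaces the `__percpu` pointer, and all names ending in `_sketch` are hypothetical. It shows one common handler, one cookie per CPU, no sharing, and the rule that each CPU must enable its own line before anything is delivered.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4

typedef int (*handler_sketch_t)(unsigned int irq, void *dev_id);

/* One action per irq number: a common handler and one cookie per CPU. */
struct percpu_action_sketch {
    handler_sketch_t handler;
    void *dev_id[NR_CPUS_SKETCH];
    unsigned char enabled[NR_CPUS_SKETCH];
};

static struct percpu_action_sketch action;

/* Request: fails if already requested, since sharing is not supported. */
int request_percpu_sketch(handler_sketch_t handler,
                          void *dev_ids[NR_CPUS_SKETCH])
{
    unsigned int cpu;

    if (!handler || action.handler)
        return -1;
    action.handler = handler;
    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        action.dev_id[cpu] = dev_ids[cpu];
        action.enabled[cpu] = 0;  /* disabled until the CPU opts in */
    }
    return 0;
}

/* Each CPU enables its own local interrupt line. */
void enable_percpu_sketch(unsigned int cpu)
{
    if (cpu < NR_CPUS_SKETCH)
        action.enabled[cpu] = 1;
}

/* Deliver on one CPU: the handler sees only that CPU's cookie. */
int deliver_sketch(unsigned int irq, unsigned int cpu)
{
    if (cpu >= NR_CPUS_SKETCH || !action.handler || !action.enabled[cpu])
        return -1;
    return action.handler(irq, action.dev_id[cpu]);
}

/* Example device: a local timer that counts its own ticks. */
struct timer_sketch {
    unsigned int ticks;
};

int timer_handler_sketch(unsigned int irq, void *dev_id)
{
    struct timer_sketch *t = dev_id;

    (void)irq;
    t->ticks++;
    return 0;
}
```

Delivering the same irq number on two CPUs increments two different timers, which is the behaviour that is awkward to express with request_irq().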
     

19 Aug, 2011

1 commit


28 Jul, 2011

1 commit

  • Interrupt descriptors can be allocated from modules. The interrupts
    are used by other modules, but we have no refcount on the module which
    provides the interrupts and there is no way to establish one on the
    device level as the interrupt using module is agnostic to the fact
    that the interrupt is provided by a module rather than by some builtin
    interrupt controller.

    To prevent removal of the interrupt providing module, we can track the
    owner of the interrupt descriptor, which also provides the relevant
    irq chip functions in the irq descriptor.

    request/setup_irq() can now acquire a refcount on the owner module to
    prevent unloading. free_irq() drops the refcount.

    Signed-off-by: Sebastian Andrzej Siewior
    Link: http://lkml.kernel.org/r/20110711101731.GA13804@Chamillionaire.breakpoint.cc
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
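
The ownership tracking can be sketched with a userspace mock-up. The structures and function names below are hypothetical simplifications: the module is reduced to a reference count plus a "going away" flag, and the descriptor only records its owner. Requesting an interrupt pins the providing module and freeing the interrupt releases it.

```c
#include <stddef.h>

/* Mock module: a reference count and a flag set once unload has begun. */
struct module_sketch {
    int refcnt;
    int going;
};

/* Mock descriptor: remembers which module provides the irq chip. */
struct desc_sketch {
    struct module_sketch *owner;  /* NULL for built-in controllers */
    int nr_actions;
};

/* Returns 1 on success. A NULL owner (built-in) always succeeds. */
static int try_get_sketch(struct module_sketch *m)
{
    if (!m)
        return 1;
    if (m->going)
        return 0;
    m->refcnt++;
    return 1;
}

static void put_sketch(struct module_sketch *m)
{
    if (m)
        m->refcnt--;
}

/* Request path: pin the provider before installing the action. */
int request_irq_sketch(struct desc_sketch *d)
{
    if (!try_get_sketch(d->owner))
        return -1;
    d->nr_actions++;
    return 0;
}

/* Free path: remove the action, then drop the reference. */
void free_irq_sketch(struct desc_sketch *d)
{
    if (d->nr_actions == 0)
        return;
    d->nr_actions--;
    put_sketch(d->owner);
}

/* The provider may only be unloaded once nobody uses its interrupts. */
int can_unload_sketch(const struct module_sketch *m)
{
    return m->refcnt == 0;
}
```

The interrupt-using driver never sees the owner. It calls the request and free paths as usual, which is why the reference has to be taken in the core code.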
     

03 Jun, 2011

2 commits

  • When irq_alloc_descs() is called with no base IRQ specified then it will
    search for a range of IRQs starting from a specified base address. In the
    case where an IRQ is specified it still does this search in order to ensure
    that none of the requested range is already allocated and it still uses the
    from parameter to specify the base for the search. This means that in the
    case where a base is specified but from is zero (which is reasonable as
    any IRQ number is in the range specified by a zero from) the function will
    get confused and try to allocate the first suitably sized block of free IRQs
    it finds.

    Instead use a specified IRQ as the base address for the search, and insist
    that any from that is specified can support that IRQ.

    Signed-off-by: Mark Brown
    Link: http://lkml.kernel.org/r/1307037313-15733-1-git-send-email-broonie@opensource.wolfsonmicro.com
    Signed-off-by: Thomas Gleixner

    Mark Brown
     
  • The genirq changes are initializing descriptors for sparse IRQs quite
    differently from how non-sparse (stacked?) IRQs are initialized, with
    the effect that on my platform all IRQs are default-disabled on sparse
    IRQs and default-enabled if non-sparse IRQs are used, crashing some
    GPIO driver.

    Fix this by refactoring the non-sparse IRQs to use the same descriptor
    init function as the sparse IRQs.

    Signed-off-by: Linus Walleij
    Link: http://lkml.kernel.org/r/1306858479-16622-1-git-send-email-linus.walleij@stericsson.com
    Cc: stable@kernel.org # 2.6.39
    Signed-off-by: Thomas Gleixner

    Linus Walleij
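
The irq_alloc_descs() change in the first commit of this date can be sketched as a userspace mock-up. The bitmap, the error constants and all function names are hypothetical simplifications. When a specific irq is requested, the search starts at that irq rather than at `from`, and `from` is only checked for consistency.

```c
#define IRQ_BITS_SKETCH 32
#define EINVAL_SKETCH 22
#define EEXIST_SKETCH 17
#define ENOMEM_SKETCH 12

static unsigned char used[IRQ_BITS_SKETCH];

/* First index >= from with cnt free slots, or IRQ_BITS_SKETCH if none. */
static unsigned int find_free_area_sketch(unsigned int from, unsigned int cnt)
{
    unsigned int start, i;

    for (start = from; start + cnt <= IRQ_BITS_SKETCH; start++) {
        for (i = 0; i < cnt && !used[start + i]; i++)
            ;
        if (i == cnt)
            return start;
    }
    return IRQ_BITS_SKETCH;
}

/*
 * irq < 0:  allocate cnt irqs anywhere at or above `from`.
 * irq >= 0: allocate exactly at `irq`, or fail.
 * Returns the first irq of the range or a negative error code.
 */
int alloc_descs_sketch(int irq, unsigned int from, unsigned int cnt)
{
    unsigned int start, i;

    if (cnt == 0)
        return -EINVAL_SKETCH;
    if (irq >= 0) {
        if (from > (unsigned int)irq)
            return -EINVAL_SKETCH;   /* `from` cannot support this irq */
        from = (unsigned int)irq;    /* search from the requested irq */
    }
    start = find_free_area_sketch(from, cnt);
    if (irq >= 0 && start != (unsigned int)irq)
        return -EEXIST_SKETCH;       /* requested range is in use */
    if (start == IRQ_BITS_SKETCH)
        return -ENOMEM_SKETCH;
    for (i = 0; i < cnt; i++)
        used[start + i] = 1;
    return (int)start;
}
```

Without the `from = irq` step, a request for irq 5 with `from` 0 would search from 0, find a free block that does not start at 5, and be rejected even though irq 5 itself is free.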
     

18 May, 2011

3 commits

  • Export handle_simple_irq, irq_modify_status, irq_alloc_descs,
    irq_free_descs and generic_handle_irq to allow their usage in
    modules. First user is IIO, which wants to be built modular, but needs
    to be able to create irq chips, allocate and configure interrupt
    descriptors and handle demultiplexing interrupts.

    [ tglx: Moved the uninlining of generic_handle_irq to a separate patch ]

    Signed-off-by: Jonathan Cameron
    Link: http://lkml.kernel.org/r/%3C1305711544-505-1-git-send-email-jic23%40cam.ac.uk%3E
    Signed-off-by: Thomas Gleixner

    Jonathan Cameron
     
  • generic_handle_irq() is missing a NULL pointer check for the result of
    irq_to_desc. This was not a big problem, but we want to expose it to
    drivers, so we better have sanity checks in place. Add a return value
    as well, which indicates that the irq number was valid and the handler
    was invoked.

    Based on the pure code move from Jonathan Cameron.

    Signed-off-by: Thomas Gleixner
    Cc: Jonathan Cameron

    Thomas Gleixner
     
  • kernel/irq/ is only built when CONFIG_GENERIC_HARDIRQS=y. So making
    code inside of kernel/irq/ conditional on CONFIG_GENERIC_HARDIRQS is
    pointless.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
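
The sanity check added to generic_handle_irq() in the second commit of this date can be sketched as a userspace mock-up. The descriptor table, the error constant and the names ending in `_sketch` are hypothetical. An unknown irq number now yields an error return instead of a NULL dereference, and a zero return tells the caller that the handler was invoked.

```c
#include <stddef.h>

#define NR_IRQS_SKETCH 8
#define EINVAL_SKETCH 22

/* Mock descriptor: only the flow handler. */
struct desc_sketch {
    void (*handle)(unsigned int irq);
};

static struct desc_sketch *table[NR_IRQS_SKETCH];

static struct desc_sketch *irq_to_desc_sketch(unsigned int irq)
{
    return irq < NR_IRQS_SKETCH ? table[irq] : NULL;
}

/* Returns 0 if the handler was invoked, a negative error otherwise. */
int generic_handle_irq_sketch(unsigned int irq)
{
    struct desc_sketch *desc = irq_to_desc_sketch(irq);

    if (!desc)
        return -EINVAL_SKETCH;  /* no descriptor: invalid irq number */
    desc->handle(irq);
    return 0;
}

/* Example flow handler that counts its invocations. */
static int hits;

void count_handler_sketch(unsigned int irq)
{
    (void)irq;
    hits++;
}
```

A demultiplexing driver can use the return value to detect that it decoded an interrupt number for which no descriptor exists.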
     

28 Mar, 2011

1 commit


27 Mar, 2011

1 commit


24 Mar, 2011

1 commit


22 Feb, 2011

1 commit

  • The runtime expansion of nr_irqs does not take into account that
    bitmap_find_next_zero_area() returns "start" + size in case the search
    for a matching zero area fails. That results in a start value which
    can be completely off and is not covered by the following
    expand_nr_irqs() and possibly outside of the absolute limit. But we
    use it without further checking.

    Use IRQ_BITMAP_BITS as the limit for the bitmap search and expand
    nr_irqs when the start bit is beyond nr_irqs. So start is always
    pointing to the correct area in the bitmap. nr_irqs is just the limit
    for irq enumerations, not the real limit for the irq space.

    [ tglx: Let irq_expand_nr_irqs() take the new upper end so we do not
    expand nr_irqs more than necessary. Made changelog readable ]

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Yinghai Lu
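
The corrected search and expansion order can be sketched in userspace C. The sizes and names below are assumptions for illustration, and the search helper only mimics the failure convention described above, where a result with `start + cnt` beyond the limit means no area was found. The search always runs against the full bitmap size, and the enumeration limit is raised only as far as the new range requires.

```c
#define NR_IRQS_SKETCH 16
#define IRQ_BITMAP_BITS_SKETCH (NR_IRQS_SKETCH + 16)

static unsigned char used[IRQ_BITMAP_BITS_SKETCH];
static unsigned int nr_irqs_sketch = NR_IRQS_SKETCH;  /* enumeration limit */

/*
 * Find cnt free slots at or above `from`. On failure the returned
 * start satisfies start + cnt > limit, so callers must check that.
 */
static unsigned int find_area_sketch(unsigned int limit, unsigned int from,
                                     unsigned int cnt)
{
    unsigned int start = from, i;

    while (start + cnt <= limit) {
        for (i = 0; i < cnt && !used[start + i]; i++)
            ;
        if (i == cnt)
            return start;
        start += i + 1;
    }
    return start;
}

/* Returns the first irq of the allocated range, or -1 on failure. */
int alloc_sketch(unsigned int from, unsigned int cnt)
{
    unsigned int start, i;

    /* Search the whole bitmap, not just the first nr_irqs bits. */
    start = find_area_sketch(IRQ_BITMAP_BITS_SKETCH, from, cnt);
    if (start + cnt > IRQ_BITMAP_BITS_SKETCH)
        return -1;                      /* absolute limit reached */
    if (start + cnt > nr_irqs_sketch)
        nr_irqs_sketch = start + cnt;   /* expand only as far as needed */
    for (i = 0; i < cnt; i++)
        used[start + i] = 1;
    return (int)start;
}
```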
     

19 Feb, 2011

7 commits

  • Most of the managing functions get the irq descriptor and lock it -
    either with or without buslock. Instead of open coding this over and
    over provide a common function to do that.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Force the usage of wrappers by another nasty CPP substitution.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Keep status in sync until all abusers are fixed.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The irq_desc.status field will either go away or be renamed to
    settings. Either way we need to maintain compatibility to avoid breaking
    the world and some more. While moving bits into the core, I need to
    make sure that no typo lets me use any of the still existing IRQ_ bits
    in the core code. So that file will hold the inline wrappers and some
    nasty CPP tricks to break the build on such a typo.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • We face more and more the requirement to expand nr_irqs at
    runtime. The reason is irq expanders which cannot be detected in the
    early boot stage. So we speculate nr_irqs to have enough room. Further
    Xen needs extra irq numbers and we really want to avoid adding more
    "detection" code into the early boot. There is no real good reason why
    we need to limit nr_irqs at early boot.

    Allow the allocation code to expand nr_irqs. We have already 8k extra
    number space in the allocation bitmap, so let's use it.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Reason: Further patches are conflicting with mainline fixes

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Lars-Peter Clausen pointed out:

    I stumbled upon this while looking through the existing archs using
    SPARSE_IRQ. Even with SPARSE_IRQ the NR_IRQS is still the upper
    limit for the number of IRQs.

    Both PXA and MMP set NR_IRQS to IRQ_BOARD_START, with
    IRQ_BOARD_START being the number of IRQs used by the core.

    In various machine files the nr_irqs field of the ARM machine
    definition struct is then set to "IRQ_BOARD_START + NR_BOARD_IRQS".

    As a result "nr_irqs" will be greater than NR_IRQS which then again
    causes the "allocated_irqs" bitmap in the core irq code to be
    accessed beyond its size overwriting unrelated data.

    The core code really misses a sanity check there.

    This went unnoticed so far as by chance the compiler/linker places
    data behind that bitmap which gets initialized later on those affected
    platforms.

    So the obvious fix would be to add a sanity check in early_irq_init()
    and break all affected platforms. Though that check wants to be
    backported to stable as well, which will require to fix all known
    problematic platforms and probably some more yet not known ones as
    well. Lots of churn.

    A way simpler solution is to allocate a slightly larger bitmap and
    avoid the whole churn w/o breaking anything. Add a few warnings when
    an arch returns utter crap.

    Reported-by: Lars-Peter Clausen
    Signed-off-by: Thomas Gleixner
    Cc: stable@kernel.org # .37
    Cc: Haojian Zhuang
    Cc: Eric Miao
    Cc: Peter Zijlstra

    Thomas Gleixner
     

09 Feb, 2011

1 commit

  • CONFIG_KSTAT_IRQS_ONDEMAND does not exist, and it is not worth
    implementing. Use sparse irqs if you care about memory consumption of
    the interrupt layer.

    Found by undertaker: http://vamos.informatik.uni-erlangen.de/trac/undertaker

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

14 Jan, 2011

1 commit

  • Use modern per_cpu API to increment {soft|hard}irq counters, and use
    per_cpu allocation for (struct irq_desc)->kstats_irq instead of an array.

    This gives better SMP/NUMA locality and saves a few instructions per irq.

    With small nr_cpuids values (8 for example), kstats_irq was a small array
    (less than L1_CACHE_BYTES), a potential source of false sharing.

    In the !CONFIG_SPARSE_IRQ case, remove the huge, NUMA/cache unfriendly
    kstat_irqs_all[NR_IRQS][NR_CPUS] array.

    Note: we still populate kstats_irq for all possible irqs in
    early_irq_init(). We probably could use on-demand allocations. (Code
    included in alloc_descs()). Problem is not all IRQS are used with a prior
    alloc_descs() call.

    kstat_irqs_this_cpu() is not used anymore, remove it.

    Signed-off-by: Eric Dumazet
    Reviewed-by: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

28 Oct, 2010

1 commit

  • In /proc/stat, the number of events per IRQ is shown by summing each
    irq's events over all cpus. But we can make use of kstat_irqs().

    kstat_irqs() does the same calculation. If !CONFIG_GENERIC_HARDIRQ,
    it's not a big cost. (Both the number of cpus and the number of irqs
    are small.)

    If a system is very big and CONFIG_GENERIC_HARDIRQ is set, it does

    for_each_irq()
        for_each_cpu()
            - look up a radix tree
            - read desc->irq_stat[cpu]

    This is not efficient. This patch adds kstat_irqs() for
    CONFIG_GENERIC_HARDIRQ and changes the calculation to

    for_each_irq()
        look up radix tree
        for_each_cpu()
            - read desc->irq_stat[cpu]

    This reduces cost.

    A test on (4096 cpus, 256 nodes, 4592 irqs) host (by Jack Steiner)

    %time cat /proc/stat > /dev/null

    Before Patch: 2.459 sec
    After Patch : .561 sec

    [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
    [akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
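
The two loop orders can be compared with a small userspace mock-up. The table sizes, the lookup counter and the names ending in `_sketch` are assumptions for illustration, and a plain array stands in for the radix tree. Both variants compute the same total. Only the number of descriptor lookups differs.

```c
#include <stddef.h>

#define NR_IRQS_SKETCH 6
#define NR_CPUS_SKETCH 4

struct desc_sketch {
    unsigned int counts[NR_CPUS_SKETCH];
};

static struct desc_sketch descs[NR_IRQS_SKETCH];
static unsigned long lookups;  /* how often the "tree" was consulted */

/* Stand-in for the radix tree lookup; each call is counted. */
static struct desc_sketch *lookup_sketch(unsigned int irq)
{
    lookups++;
    return irq < NR_IRQS_SKETCH ? &descs[irq] : NULL;
}

/* Old order: one lookup for every (irq, cpu) pair. */
unsigned long sum_lookup_per_cpu(void)
{
    unsigned long sum = 0;
    unsigned int irq, cpu;

    for (irq = 0; irq < NR_IRQS_SKETCH; irq++)
        for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
            struct desc_sketch *d = lookup_sketch(irq);

            if (d)
                sum += d->counts[cpu];
        }
    return sum;
}

/* kstat_irqs()-style helper: one lookup, then the per-CPU sum. */
unsigned int kstat_irqs_sketch(unsigned int irq)
{
    struct desc_sketch *d = lookup_sketch(irq);
    unsigned int cpu, sum = 0;

    if (!d)
        return 0;
    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += d->counts[cpu];
    return sum;
}

/* New order: one lookup per irq. */
unsigned long sum_lookup_per_irq(void)
{
    unsigned long sum = 0;
    unsigned int irq;

    for (irq = 0; irq < NR_IRQS_SKETCH; irq++)
        sum += kstat_irqs_sketch(irq);
    return sum;
}
```

With I irqs and C cpus the first variant performs I * C lookups and the second performs I, which is consistent with the direction of the timing improvement reported above on a large machine.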
     

13 Oct, 2010

1 commit


12 Oct, 2010

4 commits