15 Dec, 2016

1 commit

  • Commit 34c3d9819fda ("genirq/affinity: Provide smarter irq spreading
    infrastructure") introduced a better IRQ spreading mechanism, taking
    account of the available NUMA nodes in the machine.

    The problem is that the algorithm for retrieving the nodemask iterates
    "linearly" based on the number of online nodes - some architectures
    present non-linear node distribution among the nodemask, like PowerPC.
    If this is the case, the algorithm leads to a wrong node count
    and therefore to a bad/incomplete IRQ affinity distribution.

    For example, this problem was found on a machine with 128 CPUs and two
    nodes, namely nodes 0 and 8 (instead of 0 and 1, if they were linearly
    distributed). This led to a wrong affinity distribution which then led to
    a bad mq allocation for the nvme driver.

    Finally, we take the opportunity to fix a comment regarding the affinity
    distribution when we have _more_ nodes than vectors.

    Fixes: 34c3d9819fda ("genirq/affinity: Provide smarter irq spreading infrastructure")
    Reported-by: Gabriel Krisman Bertazi
    Signed-off-by: Guilherme G. Piccoli
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Gabriel Krisman Bertazi
    Reviewed-by: Gavin Shan
    Cc: linux-pci@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: hch@lst.de
    Link: http://lkml.kernel.org/r/1481738472-2671-1-git-send-email-gpiccoli@linux.vnet.ibm.com
    Signed-off-by: Thomas Gleixner

    Guilherme G. Piccoli
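
    The node-count problem described above can be illustrated with a small
    userspace sketch. This is not the kernel code: the 64-bit mask type and
    the helpers count_nodes_linear() and count_nodes_sparse() are hypothetical
    names that only model a node mask whose nodes are numbered 0 and 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set means NUMA node n is present. */
typedef uint64_t nodemask_model_t;

/* Flawed approach: assumes the nodes are numbered 0..num_online-1. */
static int count_nodes_linear(nodemask_model_t mask, int num_online)
{
	int n, count = 0;

	assert(num_online >= 0 && num_online <= 64);
	for (n = 0; n < num_online; n++)
		if (mask & (UINT64_C(1) << n))
			count++;
	return count;
}

/* Robust approach: visit every set bit, wherever it sits in the mask. */
static int count_nodes_sparse(nodemask_model_t mask)
{
	int count = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		count++;
	return count;
}
```

    With nodes 0 and 8 present, the linear variant called with two online
    nodes only looks at bits 0 and 1 and therefore finds a single node, while
    the sparse variant finds both.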
     

13 Dec, 2016

1 commit

  • Pull irq updates from Thomas Gleixner:
    "The irq department provides:

    - a major update to the auto affinity management code, which is used
    by multi-queue devices

    - move of the microblaze irq chip driver into the common driver code
    so it can be shared between microblaze, powerpc and MIPS

    - a series of updates to the ARM GICV3 interrupt controller

    - the usual pile of fixes and small improvements all over the place"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    powerpc/virtex: Use generic xilinx irqchip driver
    irqchip/xilinx: Try to fall back if xlnx,kind-of-intr not provided
    irqchip/xilinx: Add support for parent intc
    irqchip/xilinx: Rename get_irq to xintc_get_irq
    irqchip/xilinx: Restructure and use jump label api
    irqchip/xilinx: Clean up print messages
    microblaze/irqchip: Move intc driver to irqchip
    ARM: virt: Select ARM_GIC_V3_ITS
    ARM: gic-v3-its: Add 32bit support to GICv3 ITS
    irqchip/gic-v3-its: Specialise readq and writeq accesses
    irqchip/gic-v3-its: Specialise flush_dcache operation
    irqchip/gic-v3-its: Narrow down Entry Size when used as a divider
    irqchip/gic-v3-its: Change unsigned types for AArch32 compatibility
    irqchip/gic-v3: Use nops macro for Cavium ThunderX erratum 23154
    irqchip/gic-v3: Convert arm64 GIC accessors to {read,write}_sysreg_s
    genirq/msi: Drop artificial PCI dependency
    irqchip/bcm7038-l1: Implement irq_cpu_offline() callback
    genirq/affinity: Use default affinity mask for reserved vectors
    genirq/affinity: Take reserved vectors into account when spreading irqs
    PCI: Remove the irq_affinity mask from struct pci_dev
    ...

    Linus Torvalds
     

22 Nov, 2016

1 commit

  • The generic MSI layer doesn't have any PCI ties anymore, and the
    build hack should have been removed some time ago.

    Fixes: d9109698be6e ("genirq: Introduce msi_domain_alloc/free_irqs()")
    Signed-off-by: Marc Zyngier
    Link: http://lkml.kernel.org/r/1479806476-20801-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     

17 Nov, 2016

2 commits

  • The reserved vectors at the beginning and the end of the vector space get
    cpu_possible_mask assigned as their affinity mask.

    All other non-auto affine interrupts get the default irq affinity mask
    assigned. Using cpu_possible_mask breaks that rule.

    Treat them like any other interrupt and use irq_default_affinity as target
    mask.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig

    Thomas Gleixner
     
  • The recent addition of reserved vectors at the beginning or the end of the
    vector space did not take the reserved vectors at the beginning into
    account for the various loop exit conditions. As a consequence the last
    vectors of the spread area are not included into the spread algorithm and
    are treated like the reserved vectors at the end of the vector space and
    get the default affinity mask assigned.

    Sum up the affinity vectors and the reserved vectors at the beginning and
    use the sum as exit condition.

    [ tglx: Fixed all conditions instead of only one and massaged changelog ]

    Signed-off-by: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1479201178-29604-2-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
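
    The corrected exit condition can be sketched in a simplified userspace
    model. The function mark_spread_vectors() and the MASK_* values are
    hypothetical; the sketch only shows that the spread loop has to end at
    the sum of the reserved vectors at the beginning and the affinity
    vectors, and that both reserved ranges keep the default mask.

```c
#include <assert.h>

enum { MASK_DEFAULT = 0, MASK_SPREAD = 1 };

/*
 * Hypothetical model: masks[] holds pre_vectors + affv + post_vectors
 * entries and records which vectors take part in the spreading.
 */
static void mark_spread_vectors(int *masks, int pre_vectors, int affv,
				int post_vectors)
{
	int total = pre_vectors + affv + post_vectors;
	int last_affv = pre_vectors + affv;	/* exit condition: the sum */
	int curvec;

	assert(masks && pre_vectors >= 0 && affv >= 0 && post_vectors >= 0);

	/* Reserved vectors at the beginning get the default mask. */
	for (curvec = 0; curvec < pre_vectors; curvec++)
		masks[curvec] = MASK_DEFAULT;

	/* Spread area: ends at pre_vectors + affv, not at affv. */
	for (; curvec < last_affv; curvec++)
		masks[curvec] = MASK_SPREAD;

	/* Reserved vectors at the end get the default mask as well. */
	for (; curvec < total; curvec++)
		masks[curvec] = MASK_DEFAULT;
}
```

    Using affv alone as the loop bound would leave the last pre_vectors
    entries of the spread area with the default mask, which is the behaviour
    the changelog describes.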
     

09 Nov, 2016

2 commits

  • Only calculate the affinity for the main I/O vectors, and skip the
    pre or post vectors specified by struct irq_affinity.

    Also remove the irq_affinity cpumask argument that has never been used.
    If we ever need it in the future we can pass it through struct
    irq_affinity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Acked-by: Bjorn Helgaas
    Acked-by: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Link: http://lkml.kernel.org/r/1478654107-7384-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     
  • Only calculate the affinity for the main I/O vectors, and skip the pre or
    post vectors specified by struct irq_affinity.

    Also remove the irq_affinity cpumask argument that has never been used. If
    we ever need it in the future we can pass it through struct irq_affinity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Acked-by: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Link: http://lkml.kernel.org/r/1478654107-7384-3-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
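
    A minimal sketch of the vector calculation, assuming that only the main
    I/O vectors are limited by the number of online CPUs while the pre and
    post vectors are always added on top. The function name and its plain
    integer parameters are hypothetical; the kernel passes this information
    through struct irq_affinity.

```c
#include <assert.h>

/*
 * Hypothetical model: how many vectors to request when pre_vectors and
 * post_vectors are reserved and the rest is spread over the online CPUs.
 */
static int calc_affinity_vectors_model(int maxvec, int pre_vectors,
				       int post_vectors, int online_cpus)
{
	int resv = pre_vectors + post_vectors;
	int vecs;

	assert(pre_vectors >= 0 && post_vectors >= 0 && online_cpus > 0);
	assert(maxvec >= resv);

	vecs = maxvec - resv;		/* main I/O vectors only */
	if (vecs > online_cpus)
		vecs = online_cpus;	/* no point in more than one per CPU */
	return vecs + resv;		/* reserved vectors are not spread */
}
```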
     

08 Nov, 2016

1 commit

  • The type flags in the irq descriptor are there for historical reasons and
    only updated via irq_modify_status() or irq_set_type(). Both functions also
    update the type flags in irqdata. __setup_irq() is the only left over user
    of the type flags in the irq descriptor.

    If __setup_irq() is called with empty irq type flags, then the type flags
    are retrieved from irqdata. If an interrupt is shared, then the type flags
    are compared with the type flags stored in the irq descriptor.

    On x86 the ioapic does not have an irq_set_type() callback because the type
    is defined in the BIOS tables and cannot be changed. The type is stored in
    irqdata at setup time without updating the type data in the irq
    descriptor. As a result the comparison described above fails.

    There is no point in updating the irq descriptor flags because the only
    relevant storage is irqdata. Use the type flags from irqdata for both
    retrieval and comparison in __setup_irq() instead.

    Aside from that, the printout in case of non-matching type flags has the
    old and new type flags arguments flipped. Fix that as well.

    For correctness' sake the flags stored in the irq descriptor should be
    removed, but this is beyond the scope of this bugfix and will be done in a
    later patch.

    Fixes: 4b357daed698 ("genirq: Look-up trigger type if not specified by caller")
    Reported-and-tested-by: Mika Westerberg
    Signed-off-by: Thomas Gleixner
    Cc: Marc Zyngier
    Cc: Jon Hunter
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611072020360.3501@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
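
    The failing comparison can be modelled in userspace as follows. The
    struct and function names are hypothetical, and the TRIGGER_* values are
    illustrative bit flags rather than a copy of the kernel headers. The
    sketch assumes an interrupt whose type was stored in irqdata at setup
    time while the descriptor copy was never updated.

```c
#include <assert.h>

#define TRIGGER_RISING	0x1u
#define TRIGGER_LOW	0x8u
#define TRIGGER_MASK	0xfu

/* Hypothetical model of the two places that hold the trigger type. */
struct irq_model {
	unsigned int irqdata_type;	/* authoritative storage */
	unsigned int desc_type;		/* legacy copy, may be stale */
};

/* Empty request flags mean "use whatever irqdata says". */
static unsigned int effective_trigger(const struct irq_model *irq,
				      unsigned int req_flags)
{
	unsigned int type = req_flags & TRIGGER_MASK;

	assert(irq);
	return type ? type : irq->irqdata_type;
}

/* Old behaviour: compare against the descriptor copy. */
static int shared_match_stale(const struct irq_model *irq,
			      unsigned int req_flags)
{
	return effective_trigger(irq, req_flags) == irq->desc_type;
}

/* Fixed behaviour: retrieval and comparison both use irqdata. */
static int shared_match_fixed(const struct irq_model *irq,
			      unsigned int req_flags)
{
	return effective_trigger(irq, req_flags) == irq->irqdata_type;
}
```

    For an interrupt with irqdata_type set to TRIGGER_LOW and desc_type left
    at zero, a request with empty type flags fails the stale comparison and
    passes the fixed one.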
     

21 Oct, 2016

1 commit

  • The TPS65217 driver grew interrupt support which uses
    irq_set_parent(). While it's not yet clear why this is used in the first
    place, building the driver as a module fails with:

    ERROR: ".irq_set_parent" [drivers/mfd/tps65217.ko] undefined!

    The correctness of the driver change is still being investigated, but for
    now it's less trouble to export irq_set_parent() than to deal with the
    build wreckage.

    [ tglx: Rewrote changelog and made the export GPL ]

    Fixes: 6556bdacf646 ("mfd: tps65217: Add support for IRQs")
    Signed-off-by: Sudip Mukherjee
    Cc: Sudip Mukherjee
    Cc: Marcin Niestroj
    Cc: Grygorii Strashko
    Cc: Tony Lindgren
    Cc: Lee Jones
    Link: http://lkml.kernel.org/r/1475775403-27207-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Thomas Gleixner

    Sudip Mukherjee
     

26 Sep, 2016

1 commit


21 Sep, 2016

1 commit


19 Sep, 2016

1 commit

  • There is no point in trying to configure the trigger of a chained
    interrupt if no trigger information has been configured. At best
    this is ignored, and at worst this confuses the underlying
    irqchip (which is likely not to handle such a thing), and
    unnecessarily alarms the user.

    Only apply the configuration if type is not IRQ_TYPE_NONE.

    Fixes: 1e12c4a9393b ("genirq: Correctly configure the trigger on chained interrupts")
    Reported-and-tested-by: Geert Uytterhoeven
    Signed-off-by: Marc Zyngier
    Link: https://lkml.kernel.org/r/CAMuHMdVW1eTn20=EtYcJ8hkVwohaSuH_yQXrY2MGBEvZ8fpFOg@mail.gmail.com
    Link: http://lkml.kernel.org/r/1474274967-15984-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     

16 Sep, 2016

1 commit


15 Sep, 2016

5 commits

  • No more users.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-5-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Switch MSI over to the new spreading code. If a pci device contains a valid
    pointer to a cpumask, then this mask is used for spreading otherwise the
    online cpu mask is used. This allows a driver to restrict the spread to a
    subset of CPUs, e.g. cpus on a particular node.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The current irq spreading infrastructure is just looking at a cpumask and
    tries to spread the interrupts over the mask. That's suboptimal as it does
    not take numa nodes into account.

    Change the logic so the interrupts are spread across numa nodes and inside
    the nodes. If there are more cpus than vectors per node, then we set the
    affinity to several cpus. If HT siblings are available we take that into
    account and try to set all siblings to a single vector.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-3-git-send-email-hch@lst.de

    Thomas Gleixner
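
    The "several cpus per vector" case inside one node can be sketched like
    this. It is only one plausible way to divide the CPUs and is not taken
    from the patch: the helper name is hypothetical, and HT siblings are not
    modelled at all.

```c
#include <assert.h>

/*
 * Hypothetical model: a node has ncpus CPUs and nvecs vectors assigned
 * to it (ncpus >= nvecs). Return how many CPUs vector v is affine to,
 * handing the remainder to the first vectors one CPU at a time.
 */
static int cpus_for_vector(int ncpus, int nvecs, int v)
{
	int base, extra;

	assert(nvecs > 0 && ncpus >= nvecs);
	assert(v >= 0 && v < nvecs);

	base = ncpus / nvecs;	/* every vector gets at least this many */
	extra = ncpus % nvecs;	/* leftover CPUs */
	return base + (v < extra ? 1 : 0);
}
```

    For a node with 10 CPUs and 4 vectors this yields 3, 3, 2 and 2 CPUs per
    vector, so every CPU of the node is covered exactly once.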
     
  • For irq spreading we want to store affinity masks in the msi_entry. Add the
    infrastructure for it.

    We allocate an array of cpumasks with an array size of the number of used
    vectors in the entry, so we can hand in the information per linux interrupt
    later.

    As we hand in the number of used vectors, we assign them right
    away. Convert all the call sites.

    Signed-off-by: Thomas Gleixner
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Cc: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1473862739-15032-2-git-send-email-hch@lst.de

    Thomas Gleixner
     
  • …rm-platforms into irq/core

    Merge the first drop of irqchip updates for 4.9 from Marc Zyngier:

    - ACPI IORT core code
    - IORT support for the GICv3 ITS
    - A few GIC cleanups

    Thomas Gleixner
     

14 Sep, 2016

1 commit

  • Information about interrupts is exposed via /proc/interrupts, but the
    format of that file has changed over kernel versions and differs across
    architectures. It also has varying column numbers depending on hardware.

    That all makes it hard for tools to parse.

    To solve this, expose the information through sysfs so each irq attribute
    is in a separate file in a consistent, machine parsable way.

    This feature is only available when both CONFIG_SPARSE_IRQ and
    CONFIG_SYSFS are enabled.

    Examples:
    /sys/kernel/irq/18/actions: i801_smbus,ehci_hcd:usb1,uhci_hcd:usb7
    /sys/kernel/irq/18/chip_name: IR-IO-APIC
    /sys/kernel/irq/18/hwirq: 18
    /sys/kernel/irq/18/name: fasteoi
    /sys/kernel/irq/18/per_cpu_count: 0,0
    /sys/kernel/irq/18/type: level

    /sys/kernel/irq/25/actions: ahci0
    /sys/kernel/irq/25/chip_name: IR-PCI-MSI
    /sys/kernel/irq/25/hwirq: 512000
    /sys/kernel/irq/25/name: edge
    /sys/kernel/irq/25/per_cpu_count: 29036,0
    /sys/kernel/irq/25/type: edge

    [ tglx: Moved kobject_del() under sparse_irq_lock, massaged code comments
    and changelog ]

    Signed-off-by: Craig Gallek
    Cc: David Decotigny
    Link: http://lkml.kernel.org/r/1473783291-122873-1-git-send-email-kraigatgoog@gmail.com
    Signed-off-by: Thomas Gleixner

    Craig Gallek
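
    A tool consuming these files might look roughly like the sketch below.
    Only the path layout and the comma separated per_cpu_count format are
    taken from the examples above; the helper names irq_attr_path() and
    sum_per_cpu_count() are hypothetical, and reading the file itself is
    left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "/sys/kernel/irq/<irq>/<attr>"; returns the snprintf() result. */
static int irq_attr_path(char *buf, size_t len, unsigned int irq,
			 const char *attr)
{
	assert(buf && attr && len > 0);
	return snprintf(buf, len, "/sys/kernel/irq/%u/%s", irq, attr);
}

/* Sum the comma separated counters of a per_cpu_count line. */
static unsigned long long sum_per_cpu_count(const char *s)
{
	unsigned long long total = 0;

	assert(s);
	while (*s) {
		char *end;
		unsigned long long v = strtoull(s, &end, 10);

		if (end == s)	/* no more digits, e.g. trailing newline */
			break;
		total += v;
		s = (*end == ',') ? end + 1 : end;
	}
	return total;
}
```

    Because every attribute lives in its own file, no column counting or
    architecture specific parsing is needed, which is the point of the
    interface.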
     

06 Sep, 2016

1 commit

  • Some callers of __irq_set_trigger() mask all flags except trigger mode
    flags. This is unnecessary, as __irq_set_trigger() already does this
    before usage of flags.

    [ tglx: Moved the flag mask and adjusted comment. Removed the hunk in
    enable_percpu_irq() as it is required there ]

    Signed-off-by: Alexander Kuleshov
    Link: http://lkml.kernel.org/r/20160719095408.13778-1-kuleshovmail@gmail.com
    Signed-off-by: Thomas Gleixner

    Alexander Kuleshov
     

05 Sep, 2016

1 commit

  • Commit 1bf4ddc46c5d ("irqdomain: Introduce irq_domain_create_{linear,
    tree}") introduced the use of fwnode_handle to identify the interrupt
    controller when calling __irq_domain_add but missed updating the kernel
    doc parameters for the function.

    Update this comment. While we are touching this code, also consolidate
    the declaration and assignment of of_node.

    Signed-off-by: Punit Agrawal
    Acked-by: Marc Zyngier
    Link: http://lkml.kernel.org/r/1464699409-23113-1-git-send-email-punit.agrawal@arm.com
    Signed-off-by: Thomas Gleixner

    Punit Agrawal
     

03 Sep, 2016

6 commits

  • Most (if not all) code here implicitly assumes that the maximum number of
    IRQs per chip will be 32, and thus uses 'u32' or 'unsigned long' for many
    tasks (for example "struct irq_data" declares its 'mask' field as 'u32',
    and "struct irq_chip_generic" declares its 'installed' field as 'unsigned
    long')

    However, there is no check to verify that irqs_per_chip is <= 32. Add a
    check to catch such cases.

    [ tglx: Reduced changelog to the essential information ]

    Signed-off-by: Sebastian Frias
    Cc: Marc Zyngier
    Cc: Mason
    Cc: Jason Cooper
    Link: http://lkml.kernel.org/r/57B31D94.5040701@laposte.net
    Signed-off-by: Thomas Gleixner

    Sebastian Frias
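
    Why 32 is the limit can be shown with a short sketch. The exact form of
    the check in the patch is not recoverable from the text above, so the
    function below is an assumption: it simply rejects values larger than
    the width of a u32 mask. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_IRQS_PER_CHIP 32u	/* width of a u32 mask */

/* Assumed check: refuse chips with more irqs than mask bits. */
static int check_irqs_per_chip(unsigned int irqs_per_chip)
{
	return irqs_per_chip > MAX_IRQS_PER_CHIP ? -EINVAL : 0;
}

/* Per-irq mask bit; idx >= 32 would not fit into the u32 mask. */
static uint32_t irq_mask_bit(unsigned int idx)
{
	assert(idx < MAX_IRQS_PER_CHIP);
	return UINT32_C(1) << idx;
}
```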
     
  • According to the xlate() callback definition, the 'out_type' parameter
    needs to be the "linux irq type".

    A mask for such bits exists, IRQ_TYPE_SENSE_MASK, which is correctly
    applied in irq_domain_xlate_twocell()

    So use it for irq_domain_xlate_onetwocell() as well.

    Signed-off-by: Sebastian Frias
    Cc: Grant Likely
    Cc: Marc Zyngier
    Cc: Mason
    Cc: Jason Cooper
    Link: http://lkml.kernel.org/r/57A05F5D.103@laposte.net
    Signed-off-by: Thomas Gleixner

    Sebastian Frias
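
    The effect of the mask can be sketched in userspace. The function below
    is a simplified model and not the kernel implementation; the constant
    values are believed to match the kernel headers but should be treated as
    assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define IRQ_TYPE_NONE		0x00000000u
#define IRQ_TYPE_SENSE_MASK	0x0000000fu

/*
 * Hypothetical model of a one-or-two cell translation: cell 0 is the
 * hardware irq number, the optional cell 1 carries the trigger type.
 */
static int xlate_onetwocell_model(const uint32_t *intspec,
				  unsigned int intsize,
				  unsigned long *out_hwirq,
				  unsigned int *out_type)
{
	assert(intspec && out_hwirq && out_type);

	if (intsize < 1)
		return -EINVAL;

	*out_hwirq = intspec[0];
	if (intsize > 1)
		/* Keep only the linux irq type bits. */
		*out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
	else
		*out_type = IRQ_TYPE_NONE;
	return 0;
}
```

    Without the mask, any extra bits a firmware description puts into the
    second cell would leak into out_type.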
     
  • Without this patch irq_domain_disassociate() cannot properly release the
    interrupt. In fact, irq_map_generic_chip() checks a bit on 'gc->installed'
    but said bit is never cleared, only set.

    Commit 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
    added irq_map_generic_chip() function and also stated "This lacks a removal
    function for now".

    This commit provides an implementation of an unmap function that can be
    called by irq_domain_disassociate().

    [ tglx: Made the function static and removed the export as we have neither
    a prototype nor a modular user. ]

    Fixes: 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
    Signed-off-by: Sebastian Frias
    Cc: Marc Zyngier
    Cc: Mason
    Cc: Jason Cooper
    Link: http://lkml.kernel.org/r/579F5C5A.2070507@laposte.net
    Signed-off-by: Thomas Gleixner

    Sebastian Frias
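
    The missing counterpart can be illustrated with a small model of the
    'installed' bitmap. The struct and function names are hypothetical, and
    the assumption that mapping an already installed index is refused with
    -EBUSY is made for the sake of the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the per-chip 'installed' bitmap. */
struct gen_chip_model {
	unsigned long installed;
};

/* Map: refuse an index that is already marked as installed. */
static int chip_map(struct gen_chip_model *gc, unsigned int idx)
{
	assert(gc && idx < 32);
	if (gc->installed & (1UL << idx))
		return -EBUSY;
	gc->installed |= 1UL << idx;
	return 0;
}

/* Unmap: clear the bit again so the index can be reused. */
static void chip_unmap(struct gen_chip_model *gc, unsigned int idx)
{
	assert(gc && idx < 32);
	gc->installed &= ~(1UL << idx);
}
```

    Without chip_unmap() the second chip_map() call for the same index can
    never succeed, which mirrors the problem described in the changelog.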
     
  • irq_map_generic_chip() contains about the same code as
    irq_get_domain_generic_chip() except for the return values.

    Split out the irq_get_domain_generic_chip() implementation so it can be
    reused.

    [ tglx: Removed the extra churn in irq_get_domain_generic_chip() callers
    and massaged changelog ]

    Signed-off-by: Sebastian Frias
    Cc: Marc Zyngier
    Cc: Mason
    Cc: Jason Cooper
    Link: http://lkml.kernel.org/r/579F5C69.8070006@laposte.net
    Signed-off-by: Thomas Gleixner

    Sebastian Frias
     
  • No module users.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The percpu_devid handler is not robust against spurious interrupts. If a
    spurious interrupt happens and no action is installed then the handler
    crashes with a NULL pointer dereference.

    Add a sanity check for this and log the wreckage once in dmesg.

    Reported-by: Majun
    Signed-off-by: Thomas Gleixner
    Cc: Mark Rutland
    Cc: Marc Zyngier
    Cc: guohanjun@huawei.com
    Cc: dingtianhong@huawei.com
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1609021436160.5647@nanos

    Thomas Gleixner
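
    A minimal sketch of such a sanity check, written as a userspace model.
    The types, the handler signature and the message text are hypothetical;
    the sketch only shows the pattern of testing for a missing action and
    logging once instead of dereferencing a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*handler_fn)(int irq, void *dev_id);

/* Hypothetical stand-in for an installed interrupt action. */
struct action_model {
	handler_fn handler;
	void *dev_id;
};

/* Example handler: counts invocations through dev_id. */
static int count_handler(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return 1;	/* handled */
}

/* Flow handler model: tolerate a spurious interrupt without action. */
static int handle_percpu_devid_model(int irq,
				     const struct action_model *action)
{
	static int warned;

	if (action == NULL) {
		if (!warned) {	/* log the wreckage only once */
			warned = 1;
			fprintf(stderr, "Spurious percpu IRQ%d\n", irq);
		}
		return 0;	/* not handled */
	}
	assert(action->handler);
	return action->handler(irq, action->dev_id);
}
```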
     

22 Aug, 2016

2 commits

  • Without locking out CPU mask operations we might end up with an inconsistent
    view of the cpumask in the function.

    Fixes: 5e385a6ef31f: "genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors"
    Signed-off-by: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1470924405-25728-1-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     
  • Obviously we should free action here if irq_chip_pm_get failed.

    Fixes: be45beb2df69: "genirq: Add runtime power management support for IRQ chips"
    Signed-off-by: Shawn Lin
    Cc: Jon Hunter
    Cc: Marc Zyngier
    Link: http://lkml.kernel.org/r/1471854112-13006-1-git-send-email-shawn.lin@rock-chips.com
    Signed-off-by: Thomas Gleixner

    Shawn Lin
     

17 Aug, 2016

1 commit

  • Commit 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ")
    moved the trigger configuration call from the irqdomain mapping to
    the interrupt being actually requested.

    This patch failed to handle the case where we configure a chained
    interrupt, which doesn't get requested through the usual path.

    In order to solve this, let's call __irq_set_trigger just before
    starting the cascade interrupt. Special care must be taken to
    make the flow handler stick, as the .irq_set_type method could
    have reset it (it doesn't know we're dealing with a chained
    interrupt).

    Based on an initial patch by Jon Hunter.

    Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ")
    Reported-by: John Stultz
    Reported-by: Linus Walleij
    Tested-by: John Stultz
    Acked-by: Jon Hunter
    Signed-off-by: Marc Zyngier

    Marc Zyngier
     

09 Aug, 2016

1 commit

  • Bharat Kumar Gogada reported issues with the generic MSI code, where the
    end-point ended up with garbage in its MSI configuration (both for the vector
    and the message).

    It turns out that the two MSI paths in the kernel are doing slightly different
    things:

    generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
    PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI

    And it turns out that end-points are allowed to latch the content of the MSI
    configuration registers as soon as MSIs are enabled. In Bharat's case, the
    end-point ends up using whatever was there already, which is not what you
    want.

    In order to make things converge, we introduce a new MSI domain flag
    (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
    this flag forces the programming of the end-point as soon as the MSIs are
    allocated.

    A consequence of this is that we have an extra activate in irq_startup, but
    that should be without much consequence.

    tglx:

    - Several people reported a VMWare regression with PCI/MSI-X passthrough. It
    turns out that the patch also cures that issue.

    - We need to have a look at the MSI disable interrupt path, where we write
    the msg to all zeros without disabling MSI in the PCI device. Is that
    correct?

    Fixes: 52f518a3a7c2 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
    Reported-and-tested-by: Bharat Kumar Gogada
    Reported-and-tested-by: Foster Snowhill
    Reported-by: Matthias Prager
    Reported-by: Jason Taylor
    Signed-off-by: Marc Zyngier
    Acked-by: Bjorn Helgaas
    Cc: linux-pci@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
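
    The difference between the two orderings can be modelled with a toy
    end-point that latches its message register at the moment MSI is
    enabled. This latching behaviour is the assumption the changelog
    describes as permitted; every name below is hypothetical.

```c
#include <assert.h>

/* Toy end-point: latches msg_reg when MSI gets enabled. */
struct endpoint_model {
	unsigned int msg_reg;		/* MSI configuration register */
	unsigned int latched_msg;	/* what the device will really use */
	int enabled;
};

static void ep_setup(struct endpoint_model *ep, unsigned int msg)
{
	assert(ep);
	ep->msg_reg = msg;
}

static void ep_enable(struct endpoint_model *ep)
{
	assert(ep);
	ep->enabled = 1;
	ep->latched_msg = ep->msg_reg;	/* latch on enable */
}

/* generic MSI order: enable MSI -> setup EP */
static unsigned int order_enable_then_setup(struct endpoint_model *ep,
					    unsigned int msg)
{
	ep_enable(ep);
	ep_setup(ep, msg);
	return ep->latched_msg;
}

/* PCI MSI order: setup EP -> enable MSI */
static unsigned int order_setup_then_enable(struct endpoint_model *ep,
					    unsigned int msg)
{
	ep_setup(ep, msg);
	ep_enable(ep);
	return ep->latched_msg;
}
```

    With stale register content, the first ordering latches the stale value
    and the second latches the intended message, which is why programming
    the end-point early makes both paths converge.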
     

19 Jul, 2016

1 commit

  • The new affinity hint argument of __irq_domain_alloc_irqs() is missing in
    irq_reserve_ipi(). Add it.

    This fixes the following compilation error:

    kernel/irq/ipi.c: In function ‘irq_reserve_ipi’:
    kernel/irq/ipi.c:85:9: error: too few arguments to function ‘__irq_domain_alloc_irqs’
    virq = __irq_domain_alloc_irqs(domain, virq, nr_irqs, NUMA_NO_NODE,
    ^
    Fixes: 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")
    Signed-off-by: Vincent Stehlé
    Cc: linux-pci@vger.kernel.org
    Cc: Christoph Hellwig
    Signed-off-by: Thomas Gleixner

    Vincent Stehle
     

11 Jul, 2016

1 commit

  • If an irq_domain is auto-recursive and irq_domain_alloc_irqs_recursive()
    for its parent has returned an error, then do return and avoid calling
    irq_domain_free_irqs_recursive() uselessly, because:
    - if domain->ops->alloc() had failed for an auto-recursive irq_domain,
    then irq_domain_free_irqs_recursive() had already been called;
    - if domain->ops->alloc() had failed for a not auto-recursive irq_domain,
    then there is nothing to free at all.

    Signed-off-by: Alexander Popov
    Acked-by: Marc Zyngier
    Link: http://lkml.kernel.org/r/1467505448-2850-1-git-send-email-alex.popov@linux.com
    Signed-off-by: Thomas Gleixner

    Alexander Popov
     

04 Jul, 2016

7 commits

  • virq is not required to be the same for all msi descs. Use the base irq number
    from the desc in the debug printk.

    Reported-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Pull the irq affinity managing code which is in a separate branch for block
    developers to pull.

    Thomas Gleixner
     
  • This is lifted from the blk-mq code and adapted to use the affinity mask
    concept just introduced in the irq handling code. It tries to keep the
    algorithm the same as the one currently used by blk-mq, but improvements
    like assigning vectors on a per-node basis instead of just per sibling
    are possible with this simple move and refactoring.

    Signed-off-by: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-7-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     
  • Allow the MSI code to provide affinity hints per MSI descriptor.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-6-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Use the affinity hint in the irqdesc allocator. The hint is used to determine
    the node for the allocation and to set the affinity of the interrupt.

    If multiple interrupts are allocated (multi-MSI) then the allocator iterates
    over the cpumask and for each set cpu it allocates on that cpu's node and
    sets the initial affinity to that cpu.

    If a single interrupt is allocated (MSI-X) then the allocator uses the first
    cpu in the mask to compute the allocation node and uses the mask for the
    initial affinity setting.

    Interrupts set up this way are marked with the AFFINITY_MANAGED flag to
    prevent userspace from messing with their affinity settings.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-5-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Add an extra argument to the irq(domain) allocation functions, so we can hand
    down affinity hints to the allocator. That's necessary to implement proper
    support for multiqueue devices.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Interrupts marked with this flag are excluded from user space interrupt
    affinity changes. Contrary to the IRQ_NO_BALANCING flag, the kernel internal
    affinity mechanism is not blocked.

    This flag will be used for multi-queue device interrupts.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-3-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
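
    The difference between the two flags can be summarised in a small
    decision function. This is a model of the behaviour described above and
    not the kernel's implementation; the flag values, the enum and the
    function name are hypothetical.

```c
#include <assert.h>

#define FLAG_NO_BALANCING	0x1u	/* illustrative values only */
#define FLAG_AFFINITY_MANAGED	0x2u

enum requester { FROM_USERSPACE, FROM_KERNEL };

/* Return 1 if the affinity of an interrupt may be changed by 'who'. */
static int may_change_affinity(unsigned int flags, enum requester who)
{
	assert(who == FROM_USERSPACE || who == FROM_KERNEL);

	if (flags & FLAG_NO_BALANCING)
		return 0;	/* modelled as blocking every requester */
	if ((flags & FLAG_AFFINITY_MANAGED) && who == FROM_USERSPACE)
		return 0;	/* only user space is locked out */
	return 1;
}
```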