28 Aug, 2019

1 commit

  • When CONFIG_CPUMASK_OFFSTACK isn't enabled, 'cpumask_var_t' is defined as

    'typedef struct cpumask cpumask_var_t[1]',

    so the argument 'node_to_cpumask' of alloc_nodes_vectors() can't be
    declared as 'const cpumask_var_t *'.

    Fixes the following warning:

    kernel/irq/affinity.c: In function '__irq_build_affinity_masks':
    alloc_nodes_vectors(numvecs, node_to_cpumask, cpu_mask,
    ^
    kernel/irq/affinity.c:128:13: note: expected 'const struct cpumask (*)[1]' but argument is of type 'struct cpumask (*)[1]'
    static void alloc_nodes_vectors(unsigned int numvecs,
    ^
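
    The underlying C rule can be reproduced outside the kernel with a few
    lines (a minimal sketch; the struct layout is abridged and takes_const()
    is an illustrative stand-in for alloc_nodes_vectors()):

      struct cpumask { unsigned long bits[4]; };
      typedef struct cpumask cpumask_var_t[1];

      /* 'const cpumask_var_t *' is 'const struct cpumask (*)[1]'. C (unlike
       * C++) has no implicit conversion from 'struct cpumask (*)[1]' to it,
       * because the const sits below the pointer-to-array type. */
      void takes_const(const cpumask_var_t *masks) { (void)masks; }

      int main(void)
      {
              cpumask_var_t masks[2];

              takes_const(masks); /* warning: incompatible pointer type */
              return 0;
      }
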
    Fixes: b1a5a73e64e9 ("genirq/affinity: Spread vectors on node according to nr_cpu ratio")
    Reported-by: kbuild test robot
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190828085815.19931-1-ming.lei@redhat.com

    Ming Lei
     

27 Aug, 2019

2 commits

  • Currently __irq_build_affinity_masks() spreads vectors evenly per node,
    but when the NUMA nodes have different numbers of CPUs not all vectors
    may get spread, which triggers the warning in the spreading code.

    Improve the spreading algorithm by

    - assigning vectors according to the ratio of the number of CPUs on a node
    to the number of remaining CPUs.

    - running the assignment from smaller nodes to bigger nodes to guarantee
    that every active node gets allocated at least one vector.

    This ensures that all vectors are spread out. Aside from that, the spread
    becomes fairer when the nodes have different numbers of CPUs.

    For example, on the following machine:
    CPU(s): 16
    On-line CPU(s) list: 0-15
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s): 2
    NUMA node(s): 2
    ...
    NUMA node0 CPU(s): 0,1,3,5-9,11,13-15
    NUMA node1 CPU(s): 2,4,10,12

    When a driver requests to allocate 8 vectors, the following spread results:

    irq 31, cpu list 2,4
    irq 32, cpu list 10,12
    irq 33, cpu list 0-1
    irq 34, cpu list 3,5
    irq 35, cpu list 6-7
    irq 36, cpu list 8-9
    irq 37, cpu list 11,13
    irq 38, cpu list 14-15

    So Node 0 now has 6 vectors and Node 1 has 2 vectors assigned. The
    original algorithm assigned 4 vectors on each node, which was unfair to
    Node 0.
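
    The ratio rule can be illustrated with a tiny userspace sketch (hedged:
    names and the exact rounding are illustrative, not lifted from
    kernel/irq/affinity.c). Nodes are visited from fewest to most CPUs and
    each gets at least one vector:

      #include <stdio.h>

      int main(void)
      {
              /* node1 (4 CPUs) first, node0 (12 CPUs) second, 8 vectors */
              unsigned int ncpus[] = { 4, 12 };
              unsigned int cpus_left = 16, vecs_left = 8;

              for (unsigned int n = 0; n < 2; n++) {
                      /* share proportional to remaining CPUs, rounded up */
                      unsigned int v = (vecs_left * ncpus[n] + cpus_left - 1)
                                       / cpus_left;
                      if (!v)
                              v = 1;
                      printf("node with %2u CPUs -> %u vectors\n",
                             ncpus[n], v);
                      cpus_left -= ncpus[n];
                      vecs_left -= v;
              }
              return 0; /* prints 2 and 6, matching the spread above */
      }
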

    [ tglx: Massaged changelog ]

    Reported-by: Jon Derrick
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Keith Busch
    Reviewed-by: Jon Derrick
    Link: https://lkml.kernel.org/r/20190816022849.14075-3-ming.lei@redhat.com

    Ming Lei
     
  • One invariant of __irq_build_affinity_masks() is that all CPUs in the
    specified masks (cpu_mask AND node_to_cpumask for each node) should be
    covered during the spread. Even though all requested vectors have been
    reached, it's still required to spread vectors among the remaining CPUs.
    A similar policy has already been taken in the case of 'numvecs <= nodes'.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190816022849.14075-2-ming.lei@redhat.com

    Ming Lei
     

08 Aug, 2019

1 commit

  • Since commit c66d4bd110a1f8 ("genirq/affinity: Add new callback for
    (re)calculating interrupt sets"), irq_create_affinity_masks() returns
    NULL in the single vector case. This change has caused a regression in
    some drivers, such as lpfc.

    The problem is that single vector requests can happen in some generic cases:

    1) kdump kernel

    2) the irq vector resource is close to exhaustion.

    If in that situation the affinity mask for a single vector is not created,
    every caller has to handle the special case.

    There is no reason why the mask cannot be created, so remove the check for
    a single vector and create the mask.

    Fixes: c66d4bd110a1f8 ("genirq/affinity: Add new callback for (re)calculating interrupt sets")
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190805011906.5020-1-ming.lei@redhat.com

    Ming Lei
     

18 Feb, 2019

4 commits

  • Now that the NVME driver is converted over to the calc_set() callback, the
    workarounds of the original set support can be removed.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ming Lei
    Acked-by: Marc Zyngier
    Cc: Christoph Hellwig
    Cc: Bjorn Helgaas
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Cc: Keith Busch
    Cc: Sumit Saxena
    Cc: Kashyap Desai
    Cc: Shivasharan Srikanteshwara
    Link: https://lkml.kernel.org/r/20190216172228.689834224@linutronix.de

    Thomas Gleixner
     
  • The interrupt affinity spreading mechanism supports spreading out
    affinities for one or more interrupt sets. An interrupt set contains one
    or more interrupts. Each set is mapped to a specific functionality of a
    device, e.g. general I/O queues and read I/O queues of multiqueue block
    devices.

    The number of interrupts per set is defined by the driver. It depends on
    the total number of available interrupts for the device, which is
    determined by the PCI capabilities and the availability of underlying CPU
    resources, and the number of queues which the device provides and the
    driver wants to instantiate.

    The driver passes initial configuration for the interrupt allocation via a
    pointer to struct irq_affinity.

    Right now the allocation mechanism is complex as it requires a loop in
    the driver to determine the maximum number of interrupts which are
    provided by the PCI capabilities and the underlying CPU resources. This
    loop would have to be replicated in every driver which wants to utilize
    this mechanism. That's unwanted code duplication and error prone.

    In order to move this into generic facilities it is required to have a
    mechanism, which allows the recalculation of the interrupt sets and their
    size, in the core code. As the core code does not have any knowledge about the
    underlying device, a driver specific callback is required in struct
    irq_affinity, which can be invoked by the core code. The callback gets the
    number of available interrupts as an argument, so the driver can calculate the
    corresponding number and size of interrupt sets.

    At the moment the struct irq_affinity pointer which is handed in from the
    driver and passed through to several core functions is marked 'const', but for
    the callback to be able to modify the data in the struct it's required to
    remove the 'const' qualifier.

    Add the optional callback to struct irq_affinity, which allows drivers to
    recalculate the number and size of interrupt sets and remove the 'const'
    qualifier.

    For simple invocations, which do not supply a callback, a default callback
    is installed, which just sets nr_sets to 1 and transfers the number of
    spreadable vectors to the set_size array at index 0.

    This is for now guarded by a check for nr_sets != 0 to keep the NVME driver
    working until it is converted to the callback mechanism.

    To make sure that the driver configuration is correct under all
    circumstances the callback is invoked even when there are no interrupts
    for queues left, i.e. the pre/post requirements already exhaust the
    number of available interrupts.

    At the PCI layer irq_create_affinity_masks() has to be invoked even for the
    case where the legacy interrupt is used. That ensures that the callback is
    invoked and the device driver can adjust to that situation.
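
    After this change the relevant part of struct irq_affinity looks roughly
    like the sketch below, and a driver callback splits the spreadable
    vectors into sets (the split policy shown is illustrative, not from this
    commit):

      #define IRQ_AFFINITY_MAX_SETS 4

      struct irq_affinity {
              unsigned int    pre_vectors;   /* not spread, not in any set */
              unsigned int    post_vectors;  /* not spread, not in any set */
              unsigned int    nr_sets;
              unsigned int    set_size[IRQ_AFFINITY_MAX_SETS];
              void            (*calc_sets)(struct irq_affinity *,
                                           unsigned int nvecs);
              void            *priv;
      };

      /* hypothetical driver callback: split vectors into two queue sets */
      void demo_calc_sets(struct irq_affinity *affd, unsigned int nvecs)
      {
              affd->nr_sets = 2;
              affd->set_size[0] = nvecs / 2;
              affd->set_size[1] = nvecs - nvecs / 2;
      }
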

    [ tglx: Fixed the simple case (no sets required). Moved the sanity check
    for nr_sets after the invocation of the callback so it catches
    broken drivers. Fixed the kernel doc comments for struct
    irq_affinity and de-'This patch'-ed the changelog ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Acked-by: Marc Zyngier
    Cc: Christoph Hellwig
    Cc: Bjorn Helgaas
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Cc: Keith Busch
    Cc: Sumit Saxena
    Cc: Kashyap Desai
    Cc: Shivasharan Srikanteshwara
    Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de

    Ming Lei
     
  • The interrupt affinity spreading mechanism supports spreading out
    affinities for one or more interrupt sets. An interrupt set contains one
    or more interrupts. Each set is mapped to a specific functionality of a
    device, e.g. general I/O queues and read I/O queues of multiqueue block
    devices.

    The number of interrupts per set is defined by the driver. It depends on
    the total number of available interrupts for the device, which is
    determined by the PCI capabilities and the availability of underlying CPU
    resources, and the number of queues which the device provides and the
    driver wants to instantiate.

    The driver passes initial configuration for the interrupt allocation via
    a pointer to struct irq_affinity.

    Right now the allocation mechanism is complex as it requires a loop in
    the driver to determine the maximum number of interrupts which are
    provided by the PCI capabilities and the underlying CPU resources. This
    loop would have to be replicated in every driver which wants to utilize
    this mechanism. That's unwanted code duplication and error prone.

    In order to move this into generic facilities it is required to have a
    mechanism, which allows the recalculation of the interrupt sets and
    their size, in the core code. As the core code does not have any
    knowledge about the underlying device, a driver specific callback will
    be added to struct irq_affinity, which will be invoked by the core
    code. The callback will get the number of available interrupts as an
    argument, so the driver can calculate the corresponding number and size
    of interrupt sets.

    To support this, two modifications for the handling of struct irq_affinity
    are required:

    1) The (optional) interrupt sets size information is contained in a
    separate array of integers and struct irq_affinity contains a
    pointer to it.

    This is cumbersome and as the maximum number of interrupt sets is small,
    there is no reason to have separate storage. Moving the size array into
    struct irq_affinity avoids indirections and makes the code simpler.

    2) At the moment the struct irq_affinity pointer which is handed in from
    the driver and passed through to several core functions is marked
    'const'.

    With the upcoming callback to recalculate the number and size of
    interrupt sets, it's necessary to remove the 'const'
    qualifier. Otherwise the callback would not be able to update the data.

    Implement #1 and store the interrupt sets size in 'struct irq_affinity'.

    No functional change.
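
    In sketch form (layouts abridged; the '_old'/'_new' names are only for
    illustration):

      /* before: set sizes behind a driver-owned pointer */
      struct irq_affinity_old {
              unsigned int pre_vectors, post_vectors;
              unsigned int nr_sets;
              unsigned int *sets;
      };

      /* after: fixed-size array embedded in the struct, no indirection */
      #define IRQ_AFFINITY_MAX_SETS 4

      struct irq_affinity_new {
              unsigned int pre_vectors, post_vectors;
              unsigned int nr_sets;
              unsigned int set_size[IRQ_AFFINITY_MAX_SETS];
      };
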

    [ tglx: Fixed the memcpy() size so it won't copy beyond the size of the
    source. Fixed the kernel doc comments for struct irq_affinity and
    de-'This patch'-ed the changelog ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Acked-by: Marc Zyngier
    Cc: Christoph Hellwig
    Cc: Bjorn Helgaas
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Cc: Keith Busch
    Cc: Sumit Saxena
    Cc: Kashyap Desai
    Cc: Shivasharan Srikanteshwara
    Link: https://lkml.kernel.org/r/20190216172228.423723127@linutronix.de

    Ming Lei
     
  • All information and calculations in the interrupt affinity spreading
    code are strictly unsigned int, though the code uses int all over the
    place.

    Convert it over to unsigned int.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ming Lei
    Acked-by: Marc Zyngier
    Cc: Christoph Hellwig
    Cc: Bjorn Helgaas
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Cc: Keith Busch
    Cc: Sumit Saxena
    Cc: Kashyap Desai
    Cc: Shivasharan Srikanteshwara
    Link: https://lkml.kernel.org/r/20190216172228.336424556@linutronix.de

    Thomas Gleixner
     

11 Feb, 2019

1 commit

  • 'node_to_cpumask' is just a temporary variable for
    irq_build_affinity_masks(), so move it into irq_build_affinity_masks().

    No functional change.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Bjorn Helgaas
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-pci@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com

    Ming Lei
     

19 Dec, 2018

3 commits

  • Devices which use managed interrupts usually have two classes of
    interrupts:

    - Interrupts for multiple device queues
    - Interrupts for general device management

    Currently both classes are treated the same way, i.e. as managed
    interrupts. The general interrupts get the default affinity mask assigned
    while the device queue interrupts are spread out over the possible CPUs.

    Treating the general interrupts as managed is both a limitation and under
    certain circumstances a bug. Assume the following situation:

    default_irq_affinity = 4..7

    So if CPUs 4-7 are offlined, then the core code will shut down the device
    management interrupts because the last CPU in their affinity mask went
    offline.

    It's also a limitation because it's desired to allow manual placement of
    the general device interrupts for various reasons. If they are marked
    managed then the interrupt affinity setting from both user and kernel space
    is disabled. That limitation was reported by Kashyap and Sumit.

    Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
    for truly managed interrupts (queue interrupts) and cleared for the general
    device interrupts.
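
    The resulting descriptor matches what this change adds (struct cpumask
    stubbed here to keep the sketch self-contained; the real one is sized by
    NR_CPUS):

      struct cpumask { unsigned long bits[1]; }; /* stub */

      struct irq_affinity_desc {
              struct cpumask  mask;
              unsigned int    is_managed : 1;
      };
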

    [ tglx: Simplify code and massage changelog ]

    Reported-by: Kashyap Desai
    Reported-by: Sumit Saxena
    Signed-off-by: Dou Liyang
    Signed-off-by: Thomas Gleixner
    Cc: linux-pci@vger.kernel.org
    Cc: shivasharan.srikanteshwara@broadcom.com
    Cc: ming.lei@redhat.com
    Cc: hch@lst.de
    Cc: bhelgaas@google.com
    Cc: douliyang1@huawei.com
    Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com

    Dou Liyang
     
  • The interrupt affinity management uses straight cpumask pointers to convey
    the automatically assigned affinity masks for managed interrupts. The core
    interrupt descriptor allocation also decides based on the pointer being non
    NULL whether an interrupt is managed or not.

    Devices which use managed interrupts usually have two classes of
    interrupts:

    - Interrupts for multiple device queues
    - Interrupts for general device management

    Currently both classes are treated the same way, i.e. as managed
    interrupts. The general interrupts get the default affinity mask assigned
    while the device queue interrupts are spread out over the possible CPUs.

    Treating the general interrupts as managed is both a limitation and under
    certain circumstances a bug. Assume the following situation:

    default_irq_affinity = 4..7

    So if CPUs 4-7 are offlined, then the core code will shut down the device
    management interrupts because the last CPU in their affinity mask went
    offline.

    It's also a limitation because it's desired to allow manual placement of
    the general device interrupts for various reasons. If they are marked
    managed then the interrupt affinity setting from both user and kernel space
    is disabled.

    To remedy that situation it's required to convey more information than the
    cpumasks through various interfaces related to interrupt descriptor
    allocation.

    Instead of adding yet another argument, create a new data structure
    'irq_affinity_desc' which for now just contains the cpumask. This struct
    can be expanded to convey auxiliary information in the next step.

    No functional change, just preparatory work.

    [ tglx: Simplified logic and clarified changelog ]

    Suggested-by: Thomas Gleixner
    Suggested-by: Bjorn Helgaas
    Signed-off-by: Dou Liyang
    Signed-off-by: Thomas Gleixner
    Cc: linux-pci@vger.kernel.org
    Cc: kashyap.desai@broadcom.com
    Cc: shivasharan.srikanteshwara@broadcom.com
    Cc: sumit.saxena@broadcom.com
    Cc: ming.lei@redhat.com
    Cc: hch@lst.de
    Cc: douliyang1@huawei.com
    Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com

    Dou Liyang
     
  • Plus other coding style issues which stood out while staring at that code.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

05 Nov, 2018

4 commits

  • A driver may have a need to allocate multiple sets of MSI/MSI-X interrupts,
    and have them appropriately affinitized.

    Add support for defining a number of sets in the irq_affinity structure, of
    varying sizes, and get each set affinitized correctly across the machine.

    [ tglx: Minor changelog tweaks ]

    Signed-off-by: Jens Axboe
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Ming Lei
    Reviewed-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Cc: linux-block@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181102145951.31979-5-ming.lei@redhat.com

    Jens Axboe
     
  • No functional change.

    Prepares for support of allocating and affinitizing sets of interrupts, in
    which each set of interrupts needs a full two stage spreading. The first
    vector argument is necessary for this so the affinitizing starts from the
    first vector of each set.

    [ tglx: Minor changelog tweaks ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Hannes Reinecke
    Cc: Keith Busch
    Cc: Sagi Grimberg
    Link: https://lkml.kernel.org/r/20181102145951.31979-4-ming.lei@redhat.com

    Ming Lei
     
  • No functional change. Prepares for supporting allocating and affinitizing
    interrupt sets.

    [ tglx: Minor changelog tweaks ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Hannes Reinecke
    Cc: Keith Busch
    Cc: Sagi Grimberg
    Link: https://lkml.kernel.org/r/20181102145951.31979-3-ming.lei@redhat.com

    Ming Lei
     
  • If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
    which are allocated for a device, the interrupt affinity spreading code
    fails to spread them across all nodes.

    The reason is that the spreading code starts from node 0 and continues up
    to the number of interrupts requested for allocation. This leaves the nodes
    past the last interrupt unused.

    This results in interrupt concentration on the first nodes which violates
    the assumption of the block layer that all nodes are covered evenly. As a
    consequence the NUMA nodes above the number of interrupts are all assigned
    to hardware queue 0 and therefore NUMA node 0, which results in bad
    performance and has CPU hotplug implications, because queue 0 gets shut
    down when the last CPU of node 0 is offlined.

    Go over all NUMA nodes and assign them round-robin to all requested
    interrupts to solve this.
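
    The wrap-around can be shown in a few lines (illustrative userspace
    sketch, not the kernel code):

      #include <stdio.h>

      int main(void)
      {
              unsigned int num_nodes = 4, num_vecs = 2, first_vec = 39;

              /* more nodes than vectors: wrap around instead of stopping,
               * so the nodes past the last interrupt are not all funneled
               * to the first vector */
              for (unsigned int n = 0; n < num_nodes; n++)
                      printf("node %u -> irq %u\n",
                             n, first_vec + n % num_vecs);
              return 0;
      }
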

    [ tglx: Massaged changelog ]

    Signed-off-by: Long Li
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ming Lei
    Cc: Michael Kelley
    Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com

    Long Li
     

06 Apr, 2018

5 commits

  • Commit 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    tried to spread the interrupts across all possible CPUs to make sure that
    in case of physical hotplug (e.g. virtualization) the CPUs which get
    plugged in after the device was initialized are targeted by a hardware
    queue and the corresponding interrupt.

    This has a downside in cases where the ACPI tables claim that there are
    more possible CPUs than present CPUs and the number of interrupts to spread
    out is smaller than the number of possible CPUs. These bogus ACPI tables
    are unfortunately not uncommon.

    In such a case the vector spreading algorithm assigns interrupts to CPUs
    which can never be utilized and as a consequence these interrupts are
    unused instead of being mapped to present CPUs. As a result the performance
    of the device is suboptimal.

    To fix this spread the interrupt vectors in two stages:

    1) Spread as many interrupts as possible among the present CPUs

    2) Spread the remaining vectors among non present CPUs

    On a 8 core system, where CPU 0-3 are present and CPU 4-7 are not present,
    for a device with 4 queues the resulting interrupt affinity is:

    1) Before 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    irq 39, cpu list 0
    irq 40, cpu list 1
    irq 41, cpu list 2
    irq 42, cpu list 3

    2) With 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
    irq 39, cpu list 0-2
    irq 40, cpu list 3-4,6
    irq 41, cpu list 5
    irq 42, cpu list 7

    3) With the refined vector spread applied:
    irq 39, cpu list 0,4
    irq 40, cpu list 1,6
    irq 41, cpu list 2,5
    irq 42, cpu list 3,7

    On a 8 core system, where all CPUs are present the resulting interrupt
    affinity for the 4 queues is:

    irq 39, cpu list 0,1
    irq 40, cpu list 2,3
    irq 41, cpu list 4,5
    irq 42, cpu list 6,7

    This is independent of the number of CPUs which are online at the point
    of initialization because in such a system the offline CPUs can easily be
    onlined afterwards, while non-present CPUs need to be plugged physically
    or virtually, which requires external interaction.

    The downside of this approach is that in case of physical hotplug the
    interrupt vector spreading might be suboptimal when CPUs 4-7 are
    physically plugged: suboptimal from a NUMA point of view, and due to the
    single target nature of interrupt affinities the later plugged CPUs might
    not be targeted by interrupts at all.

    Though, physical hotplug systems are not the common case while the broken
    ACPI table disease is widespread. So it's preferred to have as many
    interrupts as possible utilized at the point where the device is
    initialized.

    Block multi-queue devices like NVME create a hardware queue per possible
    CPU, so the goal of commit 84676c1f21 to assign one interrupt vector per
    possible CPU is still achieved even with physical/virtual hotplug.
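
    The two stages can be mimicked in userspace (a toy sketch; the real code
    additionally accounts for NUMA nodes and siblings, which is why the
    pairing in the example above differs slightly):

      #include <stdio.h>

      #define NCPUS 8

      int main(void)
      {
              /* CPUs 0-3 present, 4-7 possible but not present */
              int present[NCPUS] = { 1, 1, 1, 1, 0, 0, 0, 0 };
              int vec_of_cpu[NCPUS];
              unsigned int numvecs = 4, v;

              /* stage 1: spread all vectors over the present CPUs */
              v = 0;
              for (int cpu = 0; cpu < NCPUS; cpu++)
                      if (present[cpu])
                              vec_of_cpu[cpu] = v++ % numvecs;

              /* stage 2: spread them again over the non-present CPUs */
              v = 0;
              for (int cpu = 0; cpu < NCPUS; cpu++)
                      if (!present[cpu])
                              vec_of_cpu[cpu] = v++ % numvecs;

              for (int cpu = 0; cpu < NCPUS; cpu++)
                      printf("cpu %d -> irq %u\n",
                             cpu, 39 + vec_of_cpu[cpu]);
              return 0;
      }
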

    [ tglx: Changed from online to present CPUs for the first spreading stage,
    renamed variables for readability sake, added comments and massaged
    changelog ]

    Reported-by: Laurence Oberman
    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-5-ming.lei@redhat.com

    Ming Lei
     
  • To support two stage irq vector spreading, it's required to add a starting
    point to the spreading function. No functional change, just preparatory
    work for the actual two stage change.

    [ tglx: Renamed variables, tidied up the code and massaged changelog ]

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-4-ming.lei@redhat.com

    Ming Lei
     
  • No functional change, just prepare for converting to 2-stage irq vector
    spreading.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-3-ming.lei@redhat.com

    Ming Lei
     
  • The following patches will introduce two stage irq spreading for improving
    irq spread on all possible CPUs.

    No functional change.

    Signed-off-by: Ming Lei
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Laurence Oberman
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20180308105358.1506-2-ming.lei@redhat.com

    Ming Lei
     
  • When the allocation of node_to_possible_cpumask fails, then
    irq_create_affinity_masks() returns with a pointer to the empty affinity
    masks array, which will cause malfunction.

    Reorder the allocations so the masks array allocation comes last and every
    failure path returns NULL.

    Fixes: 9a0ef98e186d ("genirq/affinity: Assign vectors to all present CPUs")
    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: Ming Lei

    Thomas Gleixner
     

13 Jan, 2018

1 commit

  • Currently we assign managed interrupt vectors to all present CPUs. This
    works fine for systems where we only online/offline CPUs. But in case of
    systems that support physical CPU hotplug (or the virtualized version of
    it) this means the additional CPUs covered for in the ACPI tables or on
    the command line are not catered for. To fix this we'd either need to
    introduce new hotplug CPU states just for this case, or we can start
    assigning vectors to possible but not present CPUs.

    Reported-by: Christian Borntraeger
    Tested-by: Christian Borntraeger
    Tested-by: Stefan Haberland
    Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
    Cc: linux-kernel@vger.kernel.org
    Cc: Thomas Gleixner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the output
    of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
      <5 lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Jul, 2017

1 commit

  • Pull PCI updates from Bjorn Helgaas:

    - add sysfs max_link_speed/width, current_link_speed/width (Wong Vee
    Khee)

    - make host bridge IRQ mapping much more generic (Matthew Minter,
    Lorenzo Pieralisi)

    - convert most drivers to pci_scan_root_bus_bridge() (Lorenzo
    Pieralisi)

    - mutex sriov_configure() (Jakub Kicinski)

    - mutex pci_error_handlers callbacks (Christoph Hellwig)

    - split ->reset_notify() into ->reset_prepare()/reset_done()
    (Christoph Hellwig)

    - support multiple PCIe portdrv interrupts for MSI as well as MSI-X
    (Gabriele Paoloni)

    - allocate MSI/MSI-X vector for Downstream Port Containment (Gabriele
    Paoloni)

    - fix MSI IRQ affinity pre/post/min_vecs issue (Michael Hernandez)

    - test INTx masking during enumeration, not at run-time (Piotr Gregor)

    - avoid using device_may_wakeup() for runtime PM (Rafael J. Wysocki)

    - restore the status of PCI devices across hibernation (Chen Yu)

    - keep parent resources that start at 0x0 (Ard Biesheuvel)

    - enable ECRC only if device supports it (Bjorn Helgaas)

    - restore PRI and PASID state after Function-Level Reset (CQ Tang)

    - skip DPC event if device is not present (Keith Busch)

    - check domain when matching SMBIOS info (Sujith Pandel)

    - mark Intel XXV710 NIC INTx masking as broken (Alex Williamson)

    - avoid AMD SB7xx EHCI USB wakeup defect (Kai-Heng Feng)

    - work around long-standing Macbook Pro poweroff issue (Bjorn Helgaas)

    - add Switchtec "running" status flag (Logan Gunthorpe)

    - fix dra7xx incorrect RW1C IRQ register usage (Arvind Yadav)

    - modify xilinx-nwl IRQ chip for legacy interrupts (Bharat Kumar
    Gogada)

    - move VMD SRCU cleanup after bus, child device removal (Jon Derrick)

    - add Faraday clock handling (Linus Walleij)

    - configure Rockchip MPS and reorganize (Shawn Lin)

    - limit Qualcomm TLP size to 2K (hardware issue) (Srinivas Kandagatla)

    - support Tegra MSI 64-bit addressing (Thierry Reding)

    - use Rockchip normal (not privileged) register bank (Shawn Lin)

    - add HiSilicon Kirin SoC PCIe controller driver (Xiaowei Song)

    - add Sigma Designs Tango SMP8759 PCIe controller driver (Marc
    Gonzalez)

    - add MediaTek PCIe host controller support (Ryder Lee)

    - add Qualcomm IPQ4019 support (John Crispin)

    - add HyperV vPCI protocol v1.2 support (Jork Loeser)

    - add i.MX6 regulator support (Quentin Schulz)

    * tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits)
    PCI: tango: Add Sigma Designs Tango SMP8759 PCIe host bridge support
    PCI: Add DT binding for Sigma Designs Tango PCIe controller
    PCI: rockchip: Use normal register bank for config accessors
    dt-bindings: PCI: Add documentation for MediaTek PCIe
    PCI: Remove __pci_dev_reset() and pci_dev_reset()
    PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()
    PCI: xilinx: Make of_device_ids const
    PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts
    PCI: vmd: Move SRCU cleanup after bus, child device removal
    PCI: vmd: Correct comment: VMD domains start at 0x10000, not 0x1000
    PCI: versatile: Add local struct device pointers
    PCI: tegra: Do not allocate MSI target memory
    PCI: tegra: Support MSI 64-bit addressing
    PCI: rockchip: Use local struct device pointer consistently
    PCI: rockchip: Check for clk_prepare_enable() errors during resume
    MAINTAINERS: Remove Wenrui Li as Rockchip PCIe driver maintainer
    PCI: rockchip: Configure RC's MPS setting
    PCI: rockchip: Reconfigure configuration space header type
    PCI: rockchip: Split out rockchip_pcie_cfg_configuration_accesses()
    PCI: rockchip: Move configuration accesses into rockchip_pcie_cfg_atu()
    ...

    Linus Torvalds
     

23 Jun, 2017

1 commit

  • Currently the irq vector spread algorithm is restricted to online CPUs,
    which ties the IRQ mapping to the currently online devices and doesn't deal
    nicely with the fact that CPUs could come and go rapidly due to e.g. power
    management.

    Instead assign vectors to all present CPUs to avoid this churn.

    Build a map of all possible CPUs for a given node, as the architectures
    only provide a map of all online CPUs. Do this dynamically on each call
    for the vector assignments, which is a bit suboptimal and could be
    optimized in the future by providing a mapping from the arch code.
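
    A userspace analog of the per-node map construction (hedged sketch; the
    kernel builds one cpumask per node by walking the possible CPUs):

      #include <stdio.h>

      int main(void)
      {
              /* possible CPUs 0-7; CPUs 0-3 on node 0, 4-7 on node 1 */
              int node_of_cpu[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };
              unsigned long node_mask[2] = { 0, 0 };

              for (int cpu = 0; cpu < 8; cpu++)
                      node_mask[node_of_cpu[cpu]] |= 1UL << cpu;

              for (int node = 0; node < 2; node++)
                      printf("node %d possible mask: 0x%02lx\n",
                             node, node_mask[node]);
              return 0;
      }
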

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Thomas Gleixner
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: Sagi Grimberg
    Cc: Marc Zyngier
    Cc: Michael Ellerman
    Cc: linux-nvme@lists.infradead.org
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170603140403.27379-5-hch@lst.de

    Christoph Hellwig
     

23 May, 2017

1 commit

  • min_vecs is the minimum number of vectors needed to operate in MSI-X
    mode, which may just include the vectors that don't need affinity.

    Disabling affinity settings causes the qla2xxx driver scsi_add_host() to fail
    when blk_mq is enabled as the blk_mq_pci_map_queues() expects affinity masks
    on each vector.

    Fixes: dfef358bd1be ("PCI/MSI: Don't apply affinity if there aren't enough vectors left")
    Signed-off-by: Michael Hernandez
    Signed-off-by: Himanshu Madhani
    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Christoph Hellwig
    Cc: stable@vger.kernel.org # v4.10+

    Michael Hernandez
     

20 Apr, 2017

1 commit

  • The vectors_per_node is calculated from the remaining available vectors.
    The current vector starts after pre_vectors, so we need to subtract that
    from the current vector count to properly account for the number of
    remaining vectors to assign.
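
    In formula form (a hedged sketch; variable names follow the changelog,
    not necessarily the source):

      /* vectors still to place: exclude the pre_vectors offset that is
       * already baked into 'curvec' before dividing among the nodes */
      vecs_per_node = (max_vecs - (curvec - pre_vectors)) / nodes;
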

    Fixes: 3412386b531 ("irq/affinity: Fix extra vecs calculation")
    Reported-by: Andrei Vagin
    Signed-off-by: Keith Busch
    Link: http://lkml.kernel.org/r/1492645870-13019-1-git-send-email-keith.busch@intel.com
    Signed-off-by: Thomas Gleixner

    Keith Busch
     

14 Apr, 2017

1 commit

  • This fixes a math error in calculating the extra_vecs. The error assumed
    only 1 CPU per vector, but the value needs to account for the actual
    number of CPUs per vector in order to get the correct remainder for
    extra CPU assignment.
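
    In formula form (same caveat: a hedged sketch with changelog names):

      cpus_per_vec = ncpus / vecs_to_assign;
      /* remainder derived from the actual CPUs per vector instead of the
       * old implicit assumption of 1 CPU per vector */
      extra_vecs = ncpus - vecs_to_assign * cpus_per_vec;
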

    Fixes: 7bf8222b9bd0 ("irq/affinity: Fix CPU spread for unbalanced nodes")
    Reported-by: Xiaolong Ye
    Signed-off-by: Keith Busch
    Link: http://lkml.kernel.org/r/1492104492-19943-1-git-send-email-keith.busch@intel.com
    Signed-off-by: Thomas Gleixner

    Keith Busch
     

04 Apr, 2017

1 commit

  • The irq_create_affinity_masks routine is responsible for assigning a
    number of interrupt vectors to CPUs. The optimal assignment spreads
    requested vectors to all CPUs, with the fewest CPUs sharing a vector.

    The algorithm may fail to assign some vectors to any CPUs if a node's
    CPU count is lower than the average number of vectors per node. These
    vectors are unusable and create a suboptimal spread.

    Recalculate the number of vectors to assign at each node iteration by using
    the remaining number of vectors and nodes to be assigned, not exceeding the
    number of CPUs in that node. This will guarantee that every CPU is assigned
    at least one vector.
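
    A toy run of the per-iteration recalculation (hedged sketch, not the
    kernel code):

      #include <stdio.h>

      int main(void)
      {
              /* two nodes with 2 and 14 CPUs, 8 vectors requested */
              unsigned int ncpus[] = { 2, 14 };
              unsigned int nodes = 2, vecs_left = 8;

              for (unsigned int n = 0; n < nodes; n++) {
                      /* recompute the share from what is actually left,
                       * capped by this node's CPU count */
                      unsigned int share = vecs_left / (nodes - n);
                      unsigned int v = share < ncpus[n] ? share : ncpus[n];

                      printf("node %u: %u vectors\n", n, v);
                      vecs_left -= v;
              }
              return 0; /* 2 + 6 = 8: no vector is left unassigned, unlike
                         * a fixed average of 4 per node */
      }
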

    Signed-off-by: Keith Busch
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Cc: linux-nvme@lists.infradead.org
    Link: http://lkml.kernel.org/r/1491247553-7603-1-git-send-email-keith.busch@intel.com
    Signed-off-by: Thomas Gleixner

    Keith Busch
     

15 Dec, 2016

1 commit

  • Commit 34c3d9819fda ("genirq/affinity: Provide smarter irq spreading
    infrastructure") introduced a better IRQ spreading mechanism, taking
    account of the available NUMA nodes in the machine.

    The problem is that the algorithm for retrieving the nodemask iterates
    "linearly" based on the number of online nodes - some architectures
    present a non-linear node distribution among the nodemask, like PowerPC.
    If this is the case, the algorithm leads to a wrong node count and
    therefore to a bad/incomplete IRQ affinity distribution.

    For example, this problem was found on a machine with 128 CPUs and two
    nodes, namely nodes 0 and 8 (instead of 0 and 1, as if linearly
    distributed). This led to a wrong affinity distribution which then led to
    a bad mq allocation for the nvme driver.

    Finally, we take the opportunity to fix a comment regarding the affinity
    distribution when we have _more_ nodes than vectors.

    Fixes: 34c3d9819fda ("genirq/affinity: Provide smarter irq spreading infrastructure")
    Reported-by: Gabriel Krisman Bertazi
    Signed-off-by: Guilherme G. Piccoli
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Gabriel Krisman Bertazi
    Reviewed-by: Gavin Shan
    Cc: linux-pci@vger.kernel.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: hch@lst.de
    Link: http://lkml.kernel.org/r/1481738472-2671-1-git-send-email-gpiccoli@linux.vnet.ibm.com
    Signed-off-by: Thomas Gleixner

    Guilherme G. Piccoli
     

17 Nov, 2016

2 commits

  • The reserved vectors at the beginning and the end of the vector space get
    cpu_possible_mask assigned as their affinity mask.

    All other non-auto affine interrupts get the default irq affinity mask
    assigned. Using cpu_possible_mask breaks that rule.

    Treat them like any other interrupt and use irq_default_affinity as target
    mask.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig

    Thomas Gleixner
     
  • The recent addition of reserved vectors at the beginning or the end of the
    vector space did not take the reserved vectors at the beginning into
    account for the various loop exit conditions. As a consequence the last
    vectors of the spread area are not included into the spread algorithm and
    are treated like the reserved vectors at the end of the vector space and
    get the default affinity mask assigned.

    Sum up the affinity vectors and the reserved vectors at the beginning and
    use the sum as exit condition.

    [ tglx: Fixed all conditions instead of only one and massaged changelog ]

    Signed-off-by: Christoph Hellwig
    Link: http://lkml.kernel.org/r/1479201178-29604-2-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     

09 Nov, 2016

2 commits

  • Only calculate the affinity for the main I/O vectors, and skip the
    pre or post vectors specified by struct irq_affinity.

    Also remove the irq_affinity cpumask argument that has never been used.
    If we ever need it in the future we can pass it through struct
    irq_affinity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Acked-by: Bjorn Helgaas
    Acked-by: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Link: http://lkml.kernel.org/r/1478654107-7384-4-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     
  • Only calculate the affinity for the main I/O vectors, and skip the pre or
    post vectors specified by struct irq_affinity.

    Also remove the irq_affinity cpumask argument that has never been used. If
    we ever need it in the future we can pass it through struct irq_affinity.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Acked-by: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Link: http://lkml.kernel.org/r/1478654107-7384-3-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig
     

15 Sep, 2016

2 commits

  • No more users.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-5-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The current irq spreading infrastructure is just looking at a cpumask and
    tries to spread the interrupts over the mask. That's suboptimal as it does
    not take NUMA nodes into account.

    Change the logic so the interrupts are spread across NUMA nodes and
    inside the nodes. If there are more CPUs than vectors per node, then we
    set the affinity to several CPUs. If HT siblings are available we take
    that into account and try to set all siblings to a single vector.

    Signed-off-by: Thomas Gleixner
    Cc: Christoph Hellwig
    Cc: axboe@fb.com
    Cc: keith.busch@intel.com
    Cc: agordeev@redhat.com
    Cc: linux-block@vger.kernel.org
    Link: http://lkml.kernel.org/r/1473862739-15032-3-git-send-email-hch@lst.de

    Thomas Gleixner
     

04 Jul, 2016

1 commit

  • This is lifted from the blk-mq code and adapted to use the affinity mask
    concept just introduced in the irq handling code. It tries to keep the
    algorithm the same as the one currently used by blk-mq, but improvements
    like assigning vectors on a per-node basis instead of just per sibling
    are possible with this simple move and refactoring.

    Signed-off-by: Christoph Hellwig
    Cc: linux-block@vger.kernel.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Cc: axboe@fb.com
    Cc: agordeev@redhat.com
    Link: http://lkml.kernel.org/r/1467621574-8277-7-git-send-email-hch@lst.de
    Signed-off-by: Thomas Gleixner

    Christoph Hellwig