Eric Lee / smarc-fsl-linux-kernel

05 Aug, 2019

1 commit

556f36e90 blk-mq: balance mapping between present CPUs and queues ... Browse Code »

Spread queues among present CPUs first, then building mapping on other
non-present CPUs.

So we can minimize count of dead queues which are mapped by un-present
CPUs only. Then bad IO performance can be avoided by unbalanced mapping
between present CPUs and queues.

The similar policy has been applied on Managed IRQ affinity.

Cc: Yi Zhang
Reported-by: Yi Zhang
Reviewed-by: Bob Liu
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2019-08-05 11:43:12 +0800

01 Jun, 2019

2 commits

cd669f88b blk-mq: Document the blk_mq_hw_queue_to_node() arguments ... Browse Code »

Document the meaning of the blk_mq_hw_queue_to_node() arguments.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2019-06-01 05:12:34 +0800
ef025d7ec blk-mq: Fix spelling in a source code comment ... Browse Code »

Change one occurrence of 'performace' into 'performance'.

Cc: Max Gurtovoy
Fixes: fe631457ff3e ("blk-mq: map all HWQ also in hyperthreaded system") # v4.13.
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2019-06-01 05:12:34 +0800

01 May, 2019

1 commit

3dcf60bcb block: add SPDX tags to block layer files missing licensing information ... Browse Code »

Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-05-01 06:12:03 +0800

08 Nov, 2018

2 commits

843477d4c blk-mq: initial support for multiple queue maps ... Browse Code »

Add a queue offset to the tag map. This enables users to map
iteratively, for each queue map type they support.

Bump maximum number of supported maps to 2, we're now fully
able to support more than 1 map.

Reviewed-by: Hannes Reinecke
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:45:00 +0800
ed76e329d blk-mq: abstract out queue map ... Browse Code »

This is in preparation for allowing multiple sets of maps per
queue, if so desired.

Reviewed-by: Hannes Reinecke
Reviewed-by: Bart Van Assche
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:44:59 +0800

10 Apr, 2018

1 commit

bffa9909a blk-mq: don't keep offline CPUs mapped to hctx 0 ... Browse Code »

From commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU),
blk-mq doesn't remap queue after CPU topo is changed, that said when
some of these offline CPUs become online, they are still mapped to
hctx 0, then hctx 0 may become the bottleneck of IO dispatch and
completion.

This patch sets up the mapping from the beginning, and aligns to
queue mapping for PCI device (blk_mq_pci_map_queues()).

Cc: Stefan Haberland
Cc: Keith Busch
Cc: stable@vger.kernel.org
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU)
Tested-by: Christian Borntraeger
Reviewed-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-04-10 22:38:46 +0800

25 Jul, 2017

1 commit

76451d79b blk-mq: map queues to all present CPUs ... Browse Code »

We already do this for PCI mappings, and the higher level code now
expects that CPU on/offlining doesn't have an affect on the queue
mappings.

Signed-off-by: Christoph Hellwig
Tested-by: Max Gurtovoy
Reviewed-by: Max Gurtovoy
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-07-25 00:01:31 +0800

04 Jul, 2017

1 commit

03ffbcdd7 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull irq updates from Thomas Gleixner:
"The irq department delivers:

- Expand the generic infrastructure handling the irq migration on CPU
hotplug and convert X86 over to it. (Thomas Gleixner)

Aside of consolidating code this is a preparatory change for:

- Finalizing the affinity management for multi-queue devices. The
main change here is to shut down interrupts which are affine to a
outgoing CPU and reenabling them when the CPU comes online again.
That avoids moving interrupts pointlessly around and breaking and
reestablishing affinities for no value. (Christoph Hellwig)

Note: This contains also the BLOCK-MQ and NVME changes which depend
on the rework of the irq core infrastructure. Jens acked them and
agreed that they should go with the irq changes.

- Consolidation of irq domain code (Marc Zyngier)

- State tracking consolidation in the core code (Jeffy Chen)

- Add debug infrastructure for hierarchical irq domains (Thomas
Gleixner)

- Infrastructure enhancement for managing generic interrupt chips via
devmem (Bartosz Golaszewski)

- Constification work all over the place (Tobias Klauser)

- Two new interrupt controller drivers for MVEBU (Thomas Petazzoni)

- The usual set of fixes, updates and enhancements all over the
place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
irqchip/or1k-pic: Fix interrupt acknowledgement
irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
nvme: Allocate queues for all possible CPUs
blk-mq: Create hctx for each present CPU
blk-mq: Include all present CPUs in the default queue mapping
genirq: Avoid unnecessary low level irq function calls
genirq: Set irq masked state when initializing irq_desc
genirq/timings: Add infrastructure for estimating the next interrupt arrival time
genirq/timings: Add infrastructure to track the interrupt timings
genirq/debugfs: Remove pointless NULL pointer check
irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID
irqchip/gic-v3-its: Add ACPI NUMA node mapping
irqchip/gic-v3-its-platform-msi: Make of_device_ids const
irqchip/gic-v3-its: Make of_device_ids const
irqchip/irq-mvebu-icu: Add new driver for Marvell ICU
irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP
dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU
genirq/irqdomain: Remove auto-recursive hierarchy support
irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access
...

Linus Torvalds
2017-07-04 07:50:31 +0800

29 Jun, 2017

2 commits

fe631457f blk-mq: map all HWQ also in hyperthreaded system ... Browse Code »

This patch performs sequential mapping between CPUs and queues.
In case the system has more CPUs than HWQs then there are still
CPUs to map to HWQs. In hyperthreaded system, map the unmapped CPUs
and their siblings to the same HWQ.
This actually fixes a bug that found unmapped HWQs in a system with
2 sockets, 18 cores per socket, 2 threads per core (total 72 CPUs)
running NVMEoF (opens upto maximum of 64 HWQs).

Performance results running fio (72 jobs, 128 iodepth)
using null_blk (w/w.o patch):

bs IOPS(read submit_queues=72) IOPS(write submit_queues=72) IOPS(read submit_queues=24) IOPS(write submit_queues=24)
----- ---------------------------- ------------------------------ ---------------------------- -----------------------------
512 4890.4K/4723.5K 4524.7K/4324.2K 4280.2K/4264.3K 3902.4K/3909.5K
1k 4910.1K/4715.2K 4535.8K/4309.6K 4296.7K/4269.1K 3906.8K/3914.9K
2k 4906.3K/4739.7K 4526.7K/4330.6K 4301.1K/4262.4K 3890.8K/3900.1K
4k 4918.6K/4730.7K 4556.1K/4343.6K 4297.6K/4264.5K 3886.9K/3893.9K
8k 4906.4K/4748.9K 4550.9K/4346.7K 4283.2K/4268.8K 3863.4K/3858.2K
16k 4903.8K/4782.6K 4501.5K/4233.9K 4292.3K/4282.3K 3773.1K/3773.5K
32k 4885.8K/4782.4K 4365.9K/4184.2K 4307.5K/4289.4K 3780.3K/3687.3K
64k 4822.5K/4762.7K 2752.8K/2675.1K 4308.8K/4312.3K 2651.5K/2655.7K
128k 2388.5K/2313.8K 1391.9K/1375.7K 2142.8K/2152.2K 1395.5K/1374.2K

Signed-off-by: Max Gurtovoy
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Max Gurtovoy
2017-06-29 22:40:11 +0800
5f042e7cb blk-mq: Include all present CPUs in the default queue mapping ... Browse Code »

This way we get a nice distribution independent of the current cpu
online / offline state.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jens Axboe
Cc: Keith Busch
Cc: linux-block@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Link: http://lkml.kernel.org/r/20170626102058.10200-2-hch@lst.de
Signed-off-by: Thomas Gleixner

Christoph Hellwig
2017-06-29 05:00:06 +0800

09 Nov, 2016

1 commit

9e5a7e229 blk-mq: export blk_mq_map_queues ... Browse Code »

This will allow SCSI to have a single blk_mq_ops structure that either
lets the LLDD map the queues to PCIe MSIx vectors or use the default.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Sagi Grimberg
Reviewed-by: Jens Axboe
Signed-off-by: Martin K. Petersen

Christoph Hellwig
2016-11-09 06:30:00 +0800

15 Sep, 2016

1 commit

da695ba23 blk-mq: allow the driver to pass in a queue mapping ... Browse Code »

This allows drivers specify their own queue mapping by overriding the
setup-time function that builds the mq_map. This can be used for
example to build the map based on the MSI-X vector mapping provided
by the core interrupt layer for PCI devices.

Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-09-15 22:42:03 +0800

04 Dec, 2015

1 commit

bffed4571 blk-mq: Avoid memoryless numa node encoded in hctx numa_node ... Browse Code »

In architecture like powerpc, we can have cpus without any local memory
attached to it (a.k.a memoryless nodes). In such cases cpu to node mapping
can result in memory allocation hints for block hctx->numa_node populated
with node values which does not have real memory.

Instead use local_memory_node(), which is guaranteed to have memory.
local_memory_node is a noop in other architectures that does not support
memoryless nodes.

Signed-off-by: Raghavendra K T
Reviewed-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Raghavendra K T
2015-12-04 00:56:27 +0800

30 Sep, 2015

1 commit

5778322e6 blk-mq: avoid inserting requests before establishing new mapping ... Browse Code »

Notifier callbacks for CPU_ONLINE action can be run on the other CPU
than the CPU which was just onlined. So it is possible for the
process running on the just onlined CPU to insert request and run
hw queue before establishing new mapping which is done by
blk_mq_queue_reinit_notify().

This can cause a problem when the CPU has just been onlined first time
since the request queue was initialized. At this time ctx->index_hw
for the CPU, which is the index in hctx->ctxs[] for this ctx, is still
zero before blk_mq_queue_reinit_notify() is called by notifier
callbacks for CPU_ONLINE action.

For example, there is a single hw queue (hctx) and two CPU queues
(ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and
a request is inserted into ctx1->rq_list and set bit0 in pending
bitmap as ctx1->index_hw is still zero.

And then while running hw queue, flush_busy_ctxs() finds bit0 is set
in pending bitmap and tries to retrieve requests in
hctx->ctxs[0]->rq_list. But htx->ctxs[0] is a pointer to ctx0, so the
request in ctx1->rq_list is ignored.

Fix it by ensuring that new mapping is established before onlined cpu
starts running.

Signed-off-by: Akinobu Mita
Reviewed-by: Ming Lei
Cc: Jens Axboe
Cc: Ming Lei
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Akinobu Mita
2015-09-30 01:32:50 +0800

27 May, 2015

1 commit

06931e622 sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() ... Browse Code »

Rename topology_thread_cpumask() to topology_sibling_cpumask()
for more consistency with scheduler code.

Signed-off-by: Bartosz Golaszewski
Reviewed-by: Thomas Gleixner
Acked-by: Russell King
Acked-by: Catalin Marinas
Cc: Benoit Cousson
Cc: Fenghua Yu
Cc: Guenter Roeck
Cc: Jean Delvare
Cc: Jonathan Corbet
Cc: Linus Torvalds
Cc: Oleg Drokin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Russell King
Cc: Viresh Kumar
Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar

Bartosz Golaszewski
2015-05-27 21:22:15 +0800

10 Dec, 2014

1 commit

959f5f5b2 blk-mq: Use all available hardware queues ... Browse Code »

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

HWQ 0: 0 1
HWQ 1: 2 3
HWQ 2: 4 5
HWQ 3: (none)

This patch changes the queue assignment into:

HWQ 0: 0 1
HWQ 1: 2
HWQ 2: 3 4
HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
above example the range of the number of CPU cores associated
with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
it is now guaranteed that all CPU cores associated with a single
HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche
Reviewed-by: Sagi Grimberg
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Alexander Gordeev
Signed-off-by: Jens Axboe

Bart Van Assche
2014-12-10 00:08:21 +0800

25 Nov, 2014

1 commit

a33c1ba29 blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map ... Browse Code »

We currently use num_possible_cpus(), but that breaks on sparc64 where
the CPU ID space is discontig. Use nr_cpu_ids as the highest CPU ID
instead, so we don't end up reading from invalid memory.

Cc: stable@kernel.org # 3.13+
Signed-off-by: Jens Axboe

Jens Axboe
2014-11-25 06:02:42 +0800

29 May, 2014

1 commit

75bb4625b blk-mq: add file comments and update copyright notices ... Browse Code »

None of the blk-mq files have an explanatory comment at the top
for what that particular file does. Add that and add appropriate
copyright notices as well.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-29 00:15:41 +0800

28 May, 2014

1 commit

f14bbe77a blk-mq: pass in suggested NUMA node to ->alloc_hctx() ... Browse Code »

Drivers currently have to figure this out on their own, and they
are missing information to do it properly. The ones that did
attempt to do it, do it wrong.

So just pass in the suggested node directly to the alloc
function.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-28 02:06:53 +0800

16 Apr, 2014

1 commit

24d2f9030 blk-mq: split out tag initialization, support shared tags ... Browse Code »

Add a new blk_mq_tag_set structure that gets set up before we initialize
the queue. A single blk_mq_tag_set structure can be shared by multiple
queues.

Signed-off-by: Christoph Hellwig

Modular export of blk_mq_{alloc,free}_tagset added by me.

Signed-off-by: Jens Axboe

Christoph Hellwig
2014-04-16 04:18:02 +0800

21 Mar, 2014

1 commit

676141e48 blk-mq: don't dump CPU -> hw queue map on driver load ... Browse Code »

Now that we are out of initial debug/bringup mode, remove
the verbose dump of the mapping table.

Provide the mapping table in sysfs, under the hardware queue
directory, in the cpu_list file.

Signed-off-by: Jens Axboe

Jens Axboe
2014-03-21 03:31:44 +0800

25 Oct, 2013

1 commit

320ae51fe blk-mq: new multi-queue block IO queueing mechanism ... Browse Code »

Linux currently has two models for block devices:

- The classic request_fn based approach, where drivers use struct
request units for IO. The block layer provides various helper
functionalities to let drivers share code, things like tag
management, timeout handling, queueing, etc.

- The "stacked" approach, where a driver squeezes in between the
block layer and IO submitter. Since this bypasses the IO stack,
driver generally have to manage everything themselves.

With drivers being written for new high IOPS devices, the classic
request_fn based driver doesn't work well enough. The design dates
back to when both SMP and high IOPS was rare. It has problems with
scaling to bigger machines, and runs into scaling issues even on
smaller machines when you have IOPS in the hundreds of thousands
per device.

The stacked approach is then most often selected as the model
for the driver. But this means that everybody has to re-invent
everything, and along with that we get all the problems again
that the shared approach solved.

This commit introduces blk-mq, block multi queue support. The
design is centered around per-cpu queues for queueing IO, which
then funnel down into x number of hardware submission queues.
We might have a 1:1 mapping between the two, or it might be
an N:M mapping. That all depends on what the hardware supports.

blk-mq provides various helper functions, which include:

- Scalable support for request tagging. Most devices need to
be able to uniquely identify a request both in the driver and
to the hardware. The tagging uses per-cpu caches for freed
tags, to enable cache hot reuse.

- Timeout handling without tracking request on a per-device
basis. Basically the driver should be able to get a notification,
if a request happens to fail.

- Optional support for non 1:1 mappings between issue and
submission queues. blk-mq can redirect IO completions to the
desired location.

- Support for per-request payloads. Drivers almost always need
to associate a request structure with some driver private
command structure. Drivers can tell blk-mq this at init time,
and then any request handed to the driver will have the
required size of memory associated with it.

- Support for merging of IO, and plugging. The stacked model
gets neither of these. Even for high IOPS devices, merging
sequential IO reduces per-command overhead and thus
increases bandwidth.

For now, this is provided as a potential 3rd queueing model, with
the hope being that, as it matures, it can replace both the classic
and stacked model. That would get us back to having just 1 real
model for block devices, leaving the stacked approach to dm/md
devices (as it was originally intended).

Contributions in this patch from the following people:

Shaohua Li
Alexander Gordeev
Christoph Hellwig
Mike Christie
Matias Bjorling
Jeff Moyer

Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2013-10-25 18:56:00 +0800