15 Sep, 2016

1 commit

  • This allows drivers to specify their own queue mapping by overriding
    the setup-time function that builds the mq_map. This can be used, for
    example, to build the map based on the MSI-X vector mapping provided
    by the core interrupt layer for PCI devices.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe
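
    The idea can be pictured with a small userspace sketch (Python;
    the function name and data shapes are illustrative, not the
    kernel's actual API): build a cpu-to-queue map from the CPU
    affinity of each interrupt vector, so a CPU submits on the queue
    whose completions it will service.

```python
# Hedged sketch: derive a cpu -> hw queue map from the CPU affinity
# of each MSI-X vector, as a driver-supplied mapping might. Names and
# shapes here are illustrative, not the kernel's actual API.
def build_mq_map(nr_cpus, vector_affinity):
    """vector_affinity[q] is the set of CPUs whose interrupts are
    steered to vector (and thus hardware queue) q."""
    mq_map = [0] * nr_cpus          # default every CPU to queue 0
    for queue, cpus in enumerate(vector_affinity):
        for cpu in cpus:
            mq_map[cpu] = queue     # submit on the queue whose vector
                                    # will deliver the completion
    return mq_map
```

    For example, build_mq_map(4, [{0, 1}, {2, 3}]) maps CPUs 0-1 to
    queue 0 and CPUs 2-3 to queue 1.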


04 Dec, 2015

1 commit

  • On architectures like powerpc we can have CPUs without any local
    memory attached to them (a.k.a. memoryless nodes). In such cases the
    cpu-to-node mapping can populate the memory allocation hint in block's
    hctx->numa_node with node values that have no real memory.

    Instead use local_memory_node(), which is guaranteed to have memory.
    local_memory_node() is a no-op on architectures that do not support
    memoryless nodes.

    Signed-off-by: Raghavendra K T
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
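
    The concept behind the fallback can be modeled in a few lines of
    Python (a hedged sketch only: the real kernel helper consults
    precomputed zonelists rather than searching distances like this):
    if the home node is memoryless, pick the closest node that does
    have memory.

```python
# Hedged model of the idea behind local_memory_node(): if the CPU's
# home node has no memory, fall back to the closest node that does.
# The real kernel helper uses precomputed zonelists; this is only the
# concept, with illustrative data structures.
def local_memory_node(node, distances, nodes_with_memory):
    """distances[a][b] is the NUMA distance from node a to node b."""
    if node in nodes_with_memory:
        return node                 # no-op when the node has memory
    return min(nodes_with_memory, key=lambda n: distances[node][n])
```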


30 Sep, 2015

1 commit

  • Notifier callbacks for the CPU_ONLINE action can run on a CPU other
    than the one that was just onlined. It is therefore possible for a
    process running on the just-onlined CPU to insert a request and run a
    hw queue before the new mapping is established by
    blk_mq_queue_reinit_notify().

    This can cause a problem when the CPU is onlined for the first time
    since the request queue was initialized. At that point ctx->index_hw
    for the CPU, which is the index in hctx->ctxs[] for this ctx, is still
    zero before blk_mq_queue_reinit_notify() is called by the notifier
    callbacks for the CPU_ONLINE action.

    For example, suppose there is a single hw queue (hctx) and two CPU
    queues (ctx0 for CPU0 and ctx1 for CPU1). Now CPU1 is just onlined, a
    request is inserted into ctx1->rq_list, and bit 0 is set in the
    pending bitmap because ctx1->index_hw is still zero.

    Then, while running the hw queue, flush_busy_ctxs() finds bit 0 set in
    the pending bitmap and tries to retrieve requests from
    hctx->ctxs[0]->rq_list. But hctx->ctxs[0] is a pointer to ctx0, so the
    request in ctx1->rq_list is ignored.

    Fix it by ensuring that the new mapping is established before the
    onlined CPU starts running.

    Signed-off-by: Akinobu Mita
    Reviewed-by: Ming Lei
    Cc: Jens Axboe
    Cc: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
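
    The race can be demonstrated with a toy Python model (userspace
    code with illustrative names, not the kernel implementation): a
    stale index_hw of zero makes the flush look in the wrong software
    queue, stranding the request.

```python
# Toy model of the race described above. hctx.ctxs[] maps
# index_hw -> software queue; a stale index_hw of zero makes
# flush_busy_ctxs() drain the wrong software queue.
class Ctx:
    def __init__(self, index_hw):
        self.index_hw = index_hw
        self.rq_list = []

def insert_request(ctx, rq, pending):
    ctx.rq_list.append(rq)
    pending.add(ctx.index_hw)       # sets the bit for the stale index

def flush_busy_ctxs(hctx_ctxs, pending):
    flushed = []
    for i in sorted(pending):
        flushed += hctx_ctxs[i].rq_list
        hctx_ctxs[i].rq_list.clear()
    pending.clear()
    return flushed

ctx0, ctx1 = Ctx(0), Ctx(0)         # ctx1.index_hw should be 1 after reinit
hctx_ctxs = [ctx0, ctx1]
pending = set()
insert_request(ctx1, "rq", pending) # bit 0 set, but the request is in ctx1
lost = flush_busy_ctxs(hctx_ctxs, pending)
# the flush looked at hctx_ctxs[0] (ctx0), so "rq" is stranded in ctx1
```

    Updating ctx1.index_hw to 1 before the insert (the effect of
    establishing the mapping first) makes the flush find the request.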


27 May, 2015

1 commit

  • Rename topology_thread_cpumask() to topology_sibling_cpumask()
    for more consistency with scheduler code.

    Signed-off-by: Bartosz Golaszewski
    Reviewed-by: Thomas Gleixner
    Acked-by: Russell King
    Acked-by: Catalin Marinas
    Cc: Benoit Cousson
    Cc: Fenghua Yu
    Cc: Guenter Roeck
    Cc: Jean Delvare
    Cc: Jonathan Corbet
    Cc: Linus Torvalds
    Cc: Oleg Drokin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Russell King
    Cc: Viresh Kumar
    Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
    Signed-off-by: Ingo Molnar


10 Dec, 2014

1 commit

  • Suppose that a system has two CPU sockets, three cores per socket,
    that it does not support hyperthreading and that four hardware
    queues are provided by a block driver. With the current algorithm
    this will lead to the following assignment of CPU cores to hardware
    queues:

    HWQ 0: 0 1
    HWQ 1: 2 3
    HWQ 2: 4 5
    HWQ 3: (none)

    This patch changes the queue assignment into:

    HWQ 0: 0 1
    HWQ 1: 2
    HWQ 2: 3 4
    HWQ 3: 5

    In other words, this patch has the following three effects:
    - All four hardware queues are used instead of only three.
    - CPU cores are spread more evenly over hardware queues. For the
      above example the range of the number of CPU cores associated
      with a single HWQ is reduced from [0..2] to [1..2].
    - If the number of HWQs is a multiple of the number of CPU sockets,
      it is now guaranteed that all CPU cores associated with a single
      HWQ reside on the same CPU socket.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Sagi Grimberg
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Alexander Gordeev
    Signed-off-by: Jens Axboe
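
    The new assignment can be reproduced with a short Python sketch
    (illustrative names only, and assuming, per the last point above,
    that the number of queues divides evenly across sockets): split
    the queues across sockets, then spread each socket's cores over
    that socket's share of queues.

```python
# Hedged sketch of the improved assignment: split the hardware queues
# evenly across sockets, then spread each socket's cores over that
# socket's share of queues. Illustrative only, not the kernel code.
def assign_queues(sockets, nr_queues):
    """sockets is a list of per-socket CPU core lists; assumes
    nr_queues is a multiple of the number of sockets."""
    per_socket = nr_queues // len(sockets)
    mapping = {q: [] for q in range(nr_queues)}
    for s, cores in enumerate(sockets):
        base = s * per_socket
        for i, cpu in enumerate(cores):
            # i * per_socket // len(cores) spreads the cores evenly,
            # giving the earlier queues the larger groups
            mapping[base + i * per_socket // len(cores)].append(cpu)
    return mapping
```

    Running it on the commit's example, two sockets of three cores
    each and four queues, yields {0: [0, 1], 1: [2], 2: [3, 4],
    3: [5]}, matching the new table above.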


25 Nov, 2014

1 commit


29 May, 2014

1 commit


28 May, 2014

1 commit


16 Apr, 2014

1 commit


21 Mar, 2014

1 commit


25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
      request units for IO. The block layer provides various helper
      functionality to let drivers share code: things like tag
      management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
      block layer and the IO submitter. Since this bypasses the IO stack,
      drivers generally have to manage everything themselves.

    With drivers being written for new high-IOPS devices, the classic
    request_fn based model no longer works well enough. The design dates
    back to when both SMP and high IOPS were rare. It has problems
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines once you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi-queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into a number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
      be able to uniquely identify a request both in the driver and
      to the hardware. The tagging uses per-cpu caches for freed
      tags, to enable cache-hot reuse.

    - Timeout handling without tracking requests on a per-device
      basis. Basically, the driver should be able to get a notification
      if a request happens to fail.

    - Optional support for non-1:1 mappings between issue and
      submission queues. blk-mq can redirect IO completions to the
      desired location.

    - Support for per-request payloads. Drivers almost always need
      to associate a request structure with some driver-private
      command structure. Drivers can tell blk-mq about this at init
      time, and then any request handed to the driver will have the
      required amount of memory associated with it.

    - Support for merging of IO, and for plugging. The stacked model
      gets neither of these. Even for high-IOPS devices, merging
      sequential IO reduces per-command overhead and thus
      increases bandwidth.

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
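
    The funneling described above can be pictured with a toy Python
    model (illustrative names only, not the kernel's API): several
    per-CPU software queues drain into fewer hardware queues through
    a cpu-to-queue map.

```python
# Toy model of the blk-mq shape described above: per-CPU software
# queues funnel into a smaller set of hardware queues through a
# cpu -> hw-queue map (here 4 CPUs onto 2 hw queues, an N:M mapping).
NR_CPUS, NR_HW_QUEUES = 4, 2
mq_map = [0, 0, 1, 1]                      # cpu -> hardware queue
ctxs = [[] for _ in range(NR_CPUS)]        # per-CPU software queues
hctxs = [[] for _ in range(NR_HW_QUEUES)]  # hardware dispatch lists

def submit(cpu, rq):
    ctxs[cpu].append(rq)                   # queue on the submitting CPU

def run_hw_queue(q):
    # drain every software queue mapped to hardware queue q
    for cpu in range(NR_CPUS):
        if mq_map[cpu] == q:
            hctxs[q] += ctxs[cpu]
            ctxs[cpu].clear()

submit(0, "r0"); submit(3, "r3")
run_hw_queue(0); run_hw_queue(1)
```

    With a different mq_map the same machinery gives a 1:1 mapping
    instead; the funnel shape is entirely determined by the map.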
