Eric Lee / smarc-fsl-linux-kernel

01 Oct, 2018

1 commit

c0aac682f Merge tag 'v4.19-rc6' into for-4.20/block ... Browse Code »

Merge -rc6 in, for two reasons:

1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
they aren't in the 4.20 branch.

Signed-off-by: Jens Axboe

* tag 'v4.19-rc6': (780 commits)
Linux 4.19-rc6
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
cpufreq: qcom-kryo: Fix section annotations
perf/core: Add sanity check to deal with pinned event failure
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
selftests/powerpc: Fix Makefiles for headers_install change
blk-mq: I/O and timer unplugs are inverted in blktrace
dax: Fix deadlock in dax_lock_mapping_entry()
x86/boot: Fix kexec booting failure in the SEV bit detection code
bcache: add separate workqueue for journal_write to avoid deadlock
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
xen/blkfront: When purging persistent grants, keep them in the buffer
clocksource/drivers/timer-atmel-pit: Properly handle error cases
block: fix deadline elevator drain for zoned block devices
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
...

Signed-off-by: Jens Axboe

Jens Axboe
2018-10-01 22:58:57 +0800

26 Sep, 2018

1 commit

530ca2c9b blk-mq: Allow blocking queue tag iter callbacks ... Browse Code »

A recent commit runs tag iterator callbacks under the rcu read lock,
but existing callbacks do not satisfy the non-blocking requirement.
The commit intended to prevent an iterator from accessing a queue that's
being modified. This patch fixes the original issue by taking a queue
reference instead of reading it, which allows callbacks to make blocking
calls.

Fixes: f5bbbbe4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Acked-by: Jianchao Wang
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2018-09-26 10:17:59 +0800

22 Sep, 2018

1 commit

c7b1bf5cc blk-mq: Document the functions that iterate over requests ... Browse Code »

Make it easier to understand the purpose of the functions that iterate
over requests by documenting their purpose. Fix several minor spelling
and grammer mistakes in comments in these functions.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Jianchao Wang
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe

Bart Van Assche
2018-09-22 10:30:22 +0800

23 Aug, 2018

1 commit

5bed49adf Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block ... Browse Code »

Pull more block updates from Jens Axboe:

- Set of bcache fixes and changes (Coly)

- The flush warn fix (me)

- Small series of BFQ fixes (Paolo)

- wbt hang fix (Ming)

- blktrace fix (Steven)

- blk-mq hardware queue count update fix (Jianchao)

- Various little fixes

* tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits)
block/DAC960.c: make some arrays static const, shrinks object size
blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
blk-mq: init hctx sched after update ctx and hctx mapping
block: remove duplicate initialization
tracing/blktrace: Fix to allow setting same value
pktcdvd: fix setting of 'ret' error return for a few cases
block: change return type to bool
block, bfq: return nbytes and not zero from struct cftype .write() method
block, bfq: improve code of bfq_bfqq_charge_time
block, bfq: reduce write overcharge
block, bfq: always update the budget of an entity when needed
block, bfq: readd missing reset of parent-entity service
blk-wbt: fix IO hang in wbt_wait()
block: don't warn for flush on read-only device
bcache: add the missing comments for smp_mb()/smp_wmb()
bcache: remove unnecessary space before ioctl function pointer arguments
bcache: add missing SPDX header
bcache: move open brace at end of function definitions to next line
bcache: add static const prefix to char * array declarations
bcache: fix code comments style
...

Linus Torvalds
2018-08-23 04:38:05 +0800

21 Aug, 2018

1 commit

f5bbbbe4d blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter ... Browse Code »

For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
account the inflight requests. It will access the queue_hw_ctx and
nr_hw_queues w/o any protection. When updating nr_hw_queues and
blk_mq_in_flight/rw occur concurrently, panic comes up.

Before update nr_hw_queues, the q will be frozen. So we could use
q_usage_counter to avoid the race. percpu_ref_is_zero is used here
so that we will not miss any in-flight request. The access to
nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
under rcu critical section, __blk_mq_update_nr_hw_queues could use
synchronize_rcu to ensure the zeroed q_usage_counter to be globally
visible.

Signed-off-by: Jianchao Wang
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Jianchao Wang
2018-08-21 23:02:56 +0800

15 Aug, 2018

1 commit

73ba2fb33 Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.

This pull request contains:

- Fix for a thundering heard issue in the wbt block code (Anchal
Agarwal)

- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log suppor for NVMe target (Chaitanya Kulkarni)
* Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes

- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)

- Series improving how we handle sense information on the stack
(Kees)

- Lightnvm fixes and updates/improvements (Mathias/Javier et al)

- Zoned device support for null_blk (Matias)

- AIX partition fixes (Mauricio Faria de Oliveira)

- DIF checksum code made generic (Max Gurtovoy)

- Add support for discard in iostats (Michael Callahan / Tejun)

- Set of updates for BFQ (Paolo)

- Removal of async write support for bsg (Christoph)

- Bio page dirtying and clone fixups (Christoph)

- Set of bcache fix/changes (via Coly)

- Series improving blk-mq queue setup/teardown speed (Ming)

- Series improving merging performance on blk-mq (Ming)

- Lots of other fixes and cleanups from a slew of folks"

* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...

Linus Torvalds
2018-08-15 01:23:25 +0800

09 Aug, 2018

1 commit

d263ed992 blk-mq: count the hctx as active before allocating tag ... Browse Code »

Currently, we count the hctx as active after allocate driver tag
successfully. If a previously inactive hctx try to get tag first
time, it may fails and need to wait. However, due to the stale tag
->active_queues, the other shared-tags users are still able to
occupy all driver tags while there is someone waiting for tag.
Consequently, even if the previously inactive hctx is waked up, it
still may not be able to get a tag and could be starved.

To fix it, we count the hctx as active before try to allocate driver
tag, then when it is waiting the tag, the other shared-tag users
will reserve budget for it.

Reviewed-by: Ming Lei
Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2018-08-09 22:34:17 +0800

03 Aug, 2018

2 commits

2d5ba0e2d blk-mq: fix blk_mq_tagset_busy_iter ... Browse Code »

Commit d250bf4e776ff09d5("blk-mq: only iterate over inflight requests
in blk_mq_tagset_busy_iter") uses 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
to replace 'blk_mq_request_started(req)', this way is wrong, and causes
lots of test system hang during booting.

Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter().

Fixes: d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter")
Cc: Josef Bacik
Cc: Christoph Hellwig
Cc: Guenter Roeck
Cc: Mark Brown
Cc: Matt Hart
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Hannes Reinecke ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Bart Van Assche
Tested-by: Guenter Roeck
Reported-by: Mark Brown
Reported-by: Guenter Roeck
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-08-03 04:47:20 +0800
75d6e175f blk-mq: fix updating tags depth ... Browse Code »

The passed 'nr' from userspace represents the total depth, meantime
inside 'struct blk_mq_tags', 'nr_tags' stores the total tag depth,
and 'nr_reserved_tags' stores the reserved part.

There are two issues in blk_mq_tag_update_depth() now:

1) for growing tags, we should have used the passed 'nr', and keep the
number of reserved tags not changed.

2) the passed 'nr' should have been used for checking against
'tags->nr_tags', instead of number of the normal part.

This patch fixes the above two cases, and avoids kernel crash caused
by wrong resizing sbitmap queue.

Cc: "Ewan D. Milne"
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Omar Sandoval
Tested by: Marco Patalano
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-08-03 04:41:58 +0800

14 Jun, 2018

1 commit

e6c3456aa blk-mq: remove blk_mq_tagset_iter ... Browse Code »

Unused now that nvme stopped using it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jens Axboe

Christoph Hellwig
2018-06-14 23:01:45 +0800

31 May, 2018

1 commit

d250bf4e7 blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter ... Browse Code »

We already check for started commands in all callbacks, but we should
also protect against already completed commands. Do this by taking
the checks to common code.

Acked-by: Josef Bacik
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-31 01:31:34 +0800

25 May, 2018

1 commit

e6fc46498 blk-mq: avoid starving tag allocation after allocating process migrates ... Browse Code »

When the allocation process is scheduled back and the mapped hw queue is
changed, fake one extra wake up on previous queue for compensating wake
up miss, so other allocations on the previous queue won't be starved.

This patch fixes one request allocation hang issue, which can be
triggered easily in case of very low nr_request.

The race is as follows:

1) 2 hw queues, nr_requests are 2, and wake_batch is one

2) there are 3 waiters on hw queue 0

3) two in-flight requests in hw queue 0 are completed, and only two
waiters of 3 are waken up because of wake_batch, but both the two
waiters can be scheduled to another CPU and cause to switch to hw
queue 1

4) then the 3rd waiter will wait for ever, since no in-flight request
is in hw queue 0 any more.

5) this patch fixes it by the fake wakeup when waiter is scheduled to
another hw queue

Cc:
Reviewed-by: Omar Sandoval
Signed-off-by: Ming Lei

Modified commit message to make it clearer, and make it apply on
top of the 4.18 branch.

Signed-off-by: Jens Axboe

Ming Lei
2018-05-25 01:00:39 +0800

23 Dec, 2017

1 commit

4e5dff41b blk-mq: improve heavily contended tag case ... Browse Code »

Even with a number of waitqueues, we can get into a situation where we
are heavily contended on the waitqueue lock. I got a report on spc1
where we're spending seconds doing this. Arguably the use case is nasty,
I reproduce it with one device and 1000 threads banging on the device.
But that doesn't mean we shouldn't be handling it better.

What ends up happening is that a thread will fail to get a tag, add
itself to the waitqueue, and subsequently get woken up when a tag is
freed - only to find itself going back to sleep on the waitqueue.

Instead of waking all threads, use an exclusive wait and wake up our
sbitmap batch count instead. This seems to work well for me (massive
improvement for this use case), and it survives basic testing. But I
haven't fully verified it yet.

An additional improvement is running the queue and checking for a new
tag BEFORE needing to add ourselves to the waitqueue.

Signed-off-by: Jens Axboe

Jens Axboe
2017-12-23 02:09:37 +0800

19 Oct, 2017

2 commits

dab7487bd block: remove blk_mq_reinit_tagset ... Browse Code »

No callers left.

Reviewed-by: Jens Axboe
Reviewed-by: Bart Van Assche
Reviewed-by: Max Gurtovoy
Reviewed-by: Johannes Thumshirn
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig

Sagi Grimberg
2017-10-19 01:27:49 +0800
149e10f8f block: introduce blk_mq_tagset_iter ... Browse Code »

Iterator helper to apply a function on all the
tags in a given tagset. export it as it will be used
outside the block layer later on.

Reviewed-by: Bart Van Assche
Reviewed-by: Jens Axboe
Reviewed-by: Max Gurtovoy
Reviewed-by: Johannes Thumshirn
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig

Sagi Grimberg
2017-10-19 01:27:48 +0800

18 Aug, 2017

1 commit

d352ae205 blk-mq: Make blk_mq_reinit_tagset() calls easier to read ... Browse Code »

Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.

Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: James Smart
Cc: Johannes Thumshirn
Signed-off-by: Jens Axboe

Bart Van Assche
2017-08-18 22:36:58 +0800

10 Aug, 2017

1 commit

7f5562d5e blk-mq-tag: check for NULL rq when iterating tags ... Browse Code »

Since we introduced blk-mq-sched, the tags->rqs[] array has been
dynamically assigned. So we need to check for NULL when iterating,
since there's a window of time where the bit is set, but we haven't
dynamically assigned the tags->rqs[] array position yet.

This is perfectly safe, since the memory backing of the request is
never going away while the device is alive.

Reviewed-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2017-08-10 03:09:08 +0800

15 Apr, 2017

1 commit

229a92873 blk-mq: add shallow depth option for blk_mq_get_tag() ... Browse Code »

Wire up the sbitmap_get_shallow() operation to the tag code so that a
caller can limit the number of tags available to it.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2017-04-15 04:06:54 +0800

13 Mar, 2017

1 commit

0067d4b02 blk-mq: Fix tagset reinit in the presence of cpu hot-unplug ... Browse Code »

In case cpu was unplugged, we need to make sure not to assume
that the tags for that cpu are still allocated. so check
for null tags when reinitializing a tagset.

Reported-by: Yi Zhang
Signed-off-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Sagi Grimberg
2017-03-13 22:14:23 +0800

02 Mar, 2017

1 commit

415b806de blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset ... Browse Code »

Signed-off-by: Sagi Grimberg

Modified by me to also check at driver tag allocation time if the
original request was reserved, so we can be sure to allocate a
properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe

Sagi Grimberg
2017-03-02 23:56:04 +0800

28 Jan, 2017

1 commit

bd6737f1a blk-mq-sched: add flush insertion into blk_mq_sched_insert_request() ... Browse Code »

Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().

Signed-off-by: Jens Axboe

Jens Axboe
2017-01-28 00:03:14 +0800

27 Jan, 2017

1 commit

d96b37c0a blk-mq: move tags and sched_tags info from sysfs to debugfs ... Browse Code »

These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.

Reviewed-by: Hannes Reinecke
Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2017-01-27 23:17:44 +0800

25 Jan, 2017

1 commit

200e86b33 blk-mq: only apply active queue tag throttling for driver tags ... Browse Code »

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.

Signed-off-by: Jens Axboe

Jens Axboe
2017-01-25 23:11:38 +0800

21 Jan, 2017

1 commit

70f36b600 blk-mq: allow resize of scheduler requests ... Browse Code »

Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth, change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.

Signed-off-by: Jens Axboe
Reviewed-by: Omar Sandoval

Jens Axboe
2017-01-21 00:05:53 +0800

19 Jan, 2017

1 commit

8cecb07d7 blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL ... Browse Code »

We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant, kill it.

Reported-by: Dan Carpenter
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe

Jens Axboe
2017-01-19 22:43:05 +0800

18 Jan, 2017

2 commits

2af8cbe30 blk-mq: split tag ->rqs[] into two ... Browse Code »

This is in preparation for having two sets of tags available. For
that we need a static index, and a dynamically assignable one.

Signed-off-by: Jens Axboe
Reviewed-by: Omar Sandoval

Jens Axboe
2017-01-18 01:04:15 +0800
4941115be blk-mq-tag: cleanup the normal/reserved tag allocation ... Browse Code »

This is in preparation for having another tag set available. Cleanup
the parameters, and allow passing in of tags for blk_mq_put_tag().

Signed-off-by: Jens Axboe
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig
Reviewed-by: Omar Sandoval

Jens Axboe
2017-01-18 01:03:59 +0800

10 Oct, 2016

1 commit

12e3d3cdd Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block ... Browse Code »

Pull blk-mq irq/cpu mapping updates from Jens Axboe:
"This is the block-irq topic branch for 4.9-rc. It's mostly from
Christoph, and it allows drivers to specify their own mappings, and
more importantly, to share the blk-mq mappings with the IRQ affinity
mappings. It's a good step towards making this work better out of the
box"

* 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
blk_mq: linux/blk-mq.h does not include all the headers it depends on
blk-mq: kill unused blk_mq_create_mq_map()
blk-mq: get rid of the cpumask in struct blk_mq_tags
nvme: remove the post_scan callout
nvme: switch to use pci_alloc_irq_vectors
blk-mq: provide a default queue mapping for PCI device
blk-mq: allow the driver to pass in a queue mapping
blk-mq: remove ->map_queue
blk-mq: only allocate a single mq_map per tag_set
blk-mq: don't redistribute hardware queues on a CPU hotplug event

Linus Torvalds
2016-10-10 08:29:33 +0800

17 Sep, 2016

4 commits

98d95416d sbitmap: randomize initial alloc_hint values ... Browse Code »

In order to get good cache behavior from a sbitmap, we want each CPU to
stick to its own cacheline(s) as much as possible. This might happen
naturally as the bitmap gets filled up and the alloc_hint values spread
out, but we really want this behavior from the start. blk-mq apparently
intended to do this, but the code to do this was never wired up. Get rid
of the dead code and make it part of the sbitmap library.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2016-09-17 22:39:14 +0800
f4a644db8 sbitmap: push alloc policy into sbitmap_queue ... Browse Code »

Again, there's no point in passing this in every time. Make it part of
struct sbitmap_queue and clean up the API.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2016-09-17 22:39:12 +0800
40aabb674 sbitmap: push per-cpu last_tag into sbitmap_queue ... Browse Code »

Allocating your own per-cpu allocation hint separately makes for an
awkward API. Instead, allocate the per-cpu hint as part of the struct
sbitmap_queue. There's no point for a struct sbitmap_queue without the
cache, but you can still use a bare struct sbitmap.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2016-09-17 22:39:10 +0800
88459642c blk-mq: abstract tag allocation out into sbitmap library ... Browse Code »

This is a generally useful data structure, so make it available to
anyone else who might want to use it. It's also a nice cleanup
separating the allocation logic from the rest of the tag handling logic.

The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
selected by CONFIG_BLOCK for now.

This should be a complete noop functionality-wise.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2016-09-17 22:38:44 +0800

15 Sep, 2016

2 commits

1b157939f blk-mq: get rid of the cpumask in struct blk_mq_tags ... Browse Code »

Unused now that NVMe sets up irq affinity before calling into blk-mq.

Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-09-15 22:42:03 +0800
7d7e0f90b blk-mq: remove ->map_queue ... Browse Code »

All drivers use the default, so provide an inline version of it. If we
ever need other queue mapping we can add an optional method back,
although supporting will also require major changes to the queue setup
code.

This provides better code generation, and better debugability as well.

Signed-off-by: Christoph Hellwig
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-09-15 22:42:03 +0800

08 Jul, 2016

1 commit

486cf9899 blk-mq: Introduce blk_mq_reinit_tagset ... Browse Code »

The new nvme-rdma driver will need to reinitialize all the tags as part of
the error recovery procedure (realloc the tag memory region). Add a helper
in blk-mq for it that can iterate over all requests in a tagset to make
this easier.

Signed-off-by: Sagi Grimberg
Tested-by: Ming Lin
Reviewed-by: Stephen Bates
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Tested-by: Steve Wise
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-07-08 22:38:49 +0800

13 Apr, 2016

2 commits

e8f1e1630 blk-mq: Make blk_mq_all_tag_busy_iter static ... Browse Code »

No caller outside the blk-mq code so we can settle
with it static.

Signed-off-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-04-13 05:07:36 +0800
e0489487e blk-mq: Export tagset iter function ... Browse Code »

Its useful to iterate on all the active tags in cases
where we will need to fail all the queues IO.

Signed-off-by: Sagi Grimberg
[hch: carefully check for valid tagsets]
Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-04-13 03:43:53 +0800

02 Dec, 2015

1 commit

6f3b0e8bc blk-mq: add a flags parameter to blk_mq_alloc_request ... Browse Code »

We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-02 01:53:59 +0800

07 Nov, 2015

1 commit

d0164adc8 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep an… ... Browse Code »

…d avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Mel Gorman
2015-11-07 09:50:42 +0800

05 Nov, 2015

1 commit

d9734e0d1 Merge branch 'for-4.4/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block updates from Jens Axboe:
"This is the core block pull request for 4.4. I've got a few more
topic branches this time around, some of them will layer on top of the
core+drivers changes and will come in a separate round. So not a huge
chunk of changes in this round.

This pull request contains:

- Enable blk-mq page allocation tracking with kmemleak, from Catalin.

- Unused prototype removal in blk-mq from Christoph.

- Cleanup of the q->blk_trace exchange, using cmpxchg instead of two
xchg()'s, from Davidlohr.

- A plug flush fix from Jeff.

- Also from Jeff, a fix that means we don't have to update shared tag
sets at init time unless we do a state change. This cuts down boot
times on thousands of devices a lot with scsi/blk-mq.

- blk-mq waitqueue barrier fix from Kosuke.

- Various fixes from Ming:

- Fixes for segment merging and splitting, and checks, for
the old core and blk-mq.

- Potential blk-mq speedup by marking ctx pending at the end
of a plug insertion batch in blk-mq.

- direct-io no page dirty on kernel direct reads.

- A WRITE_SYNC fix for mpage from Roman"

* 'for-4.4/core' of git://git.kernel.dk/linux-block:
blk-mq: avoid excessive boot delays with large lun counts
blktrace: re-write setting q->blk_trace
blk-mq: mark ctx as pending at batch in flush plug path
blk-mq: fix for trace_block_plug()
block: check bio_mergeable() early before merging
blk-mq: check bio_mergeable() early before merging
block: avoid to merge splitted bio
block: setup bi_phys_segments after splitting
block: fix plug list flushing for nomerge queues
blk-mq: remove unused blk_mq_clone_flush_request prototype
blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c
fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read
fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write
block: kmemleak: Track the page allocations for struct request

Linus Torvalds
2015-11-05 12:28:10 +0800