22 Feb, 2018

1 commit

  • commit 5235553d821433e1f4fa720fd025d2c4b7ee9994 upstream.

    Mikulas reported a workload that saw bad performance, and figured
    out it was due to various other types of requests being accounted
    as reads. Flush requests, for instance. Due to the high latency of
    those, we heavily throttle the writes to keep the latencies in
    balance. But flushes really should be accounted as writes.

    Fix this by checking the exact type of the request: if it's a
    read, account it as a read; if it's a write or a flush, account it
    as a write. Any other request is disregarded. Previously everything
    would have been mistakenly accounted as reads.

    Reported-by: Mikulas Patocka
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
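    The check described above amounts to roughly the following (an
    illustrative sketch in blk-wbt style, not necessarily the exact
    upstream diff):

    static int wbt_data_dir(const struct request *rq)
    {
            const int op = req_op(rq);

            if (op == REQ_OP_READ)
                    return READ;            /* account as a read */
            if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
                    return WRITE;           /* writes and flushes count as writes */

            /* any other request type is not accounted at all */
            return -1;
    }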

25 Dec, 2017

1 commit

  • [ Upstream commit b5dc5d4d1f4ff9032eb6c21a3c571a1317dc9289 ]

    Similarly to CFQ, BFQ has its write-throttling heuristics, and it
    is better not to combine them with further write-throttling
    heuristics of a different nature.
    So this commit disables write-back throttling for a device if BFQ
    is used as I/O scheduler for that device.

    Signed-off-by: Luca Miccio
    Signed-off-by: Paolo Valente
    Tested-by: Oleksandr Natalenko
    Tested-by: Lee Tibbert
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Luca Miccio
     
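    The essence of the change is a single call made when BFQ is attached
    to a device's queue; a rough sketch (the hook shown is illustrative
    and its exact signature may differ):

    static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
    {
            /* ... existing BFQ setup ... */

            wbt_disable_default(q);         /* BFQ brings its own write throttling */
            return 0;
    }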

20 Jun, 2017

2 commits

  • So I've noticed a number of instances where it was not obvious from the
    code whether ->task_list was for a wait-queue head or a wait-queue entry.

    Furthermore, there's a number of wait-queue users where the lists are
    not for 'tasks' but other entities (poll tables, etc.), in which case
    the 'task_list' name is actively confusing.

    To clear this all up, name the wait-queue head and entry list structure
    fields unambiguously:

    struct wait_queue_head::task_list => ::head
    struct wait_queue_entry::task_list => ::entry

    For example, this code:

    rqw->wait.task_list.next != &wait->task_list

    ... was pretty unclear (to me) what it was doing, while now it's written this way:

    rqw->wait.head.next != &wait->entry

    ... which makes it pretty clear that we are checking whether our entry is the first one hanging off the wait-queue head.

    Other examples are:

    list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
    list_for_each_entry(wq, &fence->wait.task_list, task_list) {

    ... where it's unclear (to me) what we are iterating, and during review it's
    hard to tell whether it's trying to walk a wait-queue entry (which would be
    a bug), while now it's written as:

    list_for_each_entry_safe(pos, next, &x->head, entry) {
    list_for_each_entry(wq, &fence->wait.head, entry) {

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
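    Taken together, the two renames above leave the core wait-queue types
    looking roughly like this (abridged sketch of <linux/wait.h> after the
    change):

    struct wait_queue_entry {
            unsigned int            flags;
            void                    *private;
            wait_queue_func_t       func;
            struct list_head        entry;          /* was ->task_list */
    };
    typedef struct wait_queue_entry wait_queue_entry_t;

    struct wait_queue_head {
            spinlock_t              lock;
            struct list_head        head;           /* was ->task_list */
    };
    typedef struct wait_queue_head wait_queue_head_t;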

21 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • When CFQ is used as an elevator, it disables writeback throttling
    because they don't play well together. Later when a different elevator
    is chosen for the device, writeback throttling doesn't get enabled
    again as it should. Make sure CFQ enables writeback throttling (if it
    should be enabled by default) when we switch from it to another IO
    scheduler.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
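    The shape of the fix is roughly the following (illustrative sketch;
    the exact hook may differ): when CFQ is torn down for a queue,
    restore the default writeback throttling state.

    static void cfq_exit_queue(struct elevator_queue *e)
    {
            struct cfq_data *cfqd = e->elevator_data;
            struct request_queue *q = cfqd->queue;

            /* ... existing CFQ teardown ... */

            wbt_enable_default(q);  /* re-enable wbt if it is on by default */
    }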

11 Apr, 2017

1 commit

  • When CFQ calls wbt_disable_default(), it will call
    blk_stat_remove_callback() to stop gathering IO statistics for the
    purposes of writeback throttling. Later, when the request_queue is
    unregistered, wbt_exit() will call blk_stat_remove_callback() again,
    which will try to delete the callback from the list a second time
    and possibly cause list corruption.

    Fix the problem by making wbt_disable_default() call wbt_exit(),
    which is properly guarded against being called multiple times.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
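    A minimal sketch of the resulting call path (illustrative): wbt_exit()
    frees the throttling state and clears q->rq_wb, so reaching it a
    second time is harmless.

    void wbt_disable_default(struct request_queue *q)
    {
            if (q->rq_wb)
                    wbt_exit(q);    /* tears down wbt and clears q->rq_wb */
    }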

22 Mar, 2017

2 commits

  • Currently, statistics are gathered in ~0.13s windows, and users grab
    the statistics whenever they need them. This is not ideal for either
    of the two in-tree users:

    1. Writeback throttling wants its own dynamically sized window of
    statistics. Since the blk-stats statistics are reset after every
    window and the wbt windows don't line up with the blk-stats windows,
    wbt doesn't see every I/O.
    2. Polling currently grabs the statistics on every I/O. Again, depending
    on how the window lines up, we may miss some I/Os. It's also
    unnecessary overhead to get the statistics on every I/O; the hybrid
    polling heuristic would be just as happy with the statistics from the
    previous full window.

    This reworks the blk-stats infrastructure to be callback-based: users
    register a callback that they want called at a given time with all of
    the statistics from the window during which the callback was active.
    Users can dynamically bucketize the statistics. wbt and polling both
    currently use read vs. write, but polling can be extended to further
    subdivide based on request size.

    The callbacks are kept on an RCU list, and each callback has percpu
    stats buffers. There will only be a few users, so the overhead on the
    I/O completion side is low. The stats flushing is also simplified
    considerably: since the timer function is responsible for clearing the
    statistics, we don't have to worry about stale statistics.

    wbt is a trivial conversion. After the conversion, the windowing problem
    mentioned above is fixed.

    For polling, we register an extra callback that caches the previous
    window's statistics in the struct request_queue for the hybrid polling
    heuristic to use.

    Since we no longer have a single stats buffer for the request queue,
    this also removes the sysfs and debugfs stats entries. To replace those,
    we add a debugfs entry for the poll statistics.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • The stats buckets will become generic soon, so make the existing users
    use the common READ and WRITE definitions instead of one internal to
    blk-stat.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
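    A rough sketch of the callback-style usage described in the first of
    the two entries above (signatures approximate; the bucket function
    maps each completed request to a bucket, here read vs. write, and cb,
    q, rwb and win_msecs stand in for the caller's own variables):

    static int wb_stat_bucket(const struct request *rq)
    {
            return rq_data_dir(rq);         /* bucket 0 = reads, 1 = writes */
    }

    static void wb_timer_fn(struct blk_stat_callback *cb)
    {
            /* cb->stat[READ] / cb->stat[WRITE] hold this window's samples */
    }

    /* registration, e.g. at wbt init time: */
    cb = blk_stat_alloc_callback(wb_timer_fn, wb_stat_bucket, 2, rwb);
    blk_stat_add_callback(q, cb);
    blk_stat_activate_msecs(cb, win_msecs);  /* arm one window of gathering */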

02 Feb, 2017

1 commit

  • We will want to have struct backing_dev_info allocated separately
    from struct request_queue. As the first step, add a pointer to
    backing_dev_info to request_queue and convert all users touching it.
    No functional changes in this patch.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

03 Jan, 2017

2 commits


09 Dec, 2016

1 commit


01 Dec, 2016

1 commit

  • This adds a new block layer operation to zero out a range of
    LBAs. It allows zeroing to be implemented for devices that don't use
    either discard with a predictable zero pattern or WRITE SAME of
    zeroes. The prominent example is NVMe with the Write Zeroes command,
    but in the future this should also help improve the way zeroing
    discards work. For this operation, a suitable entry is exported in
    sysfs which indicates the maximum number of bytes allowed in one
    write zeroes operation by the device.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
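    Illustrative only: building such a request by hand would look roughly
    like this, using helper and field names of that era (bdev, sector and
    nr_sects stand in for the caller's values, and the length must respect
    the device's write_zeroes_max_bytes sysfs limit):

    struct bio *bio = bio_alloc(GFP_KERNEL, 0);     /* no data pages needed */

    bio->bi_iter.bi_sector = sector;                /* first LBA to zero */
    bio->bi_iter.bi_size = nr_sects << 9;           /* length in bytes */
    bio->bi_bdev = bdev;                            /* target block device */
    bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
    submit_bio(bio);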

29 Nov, 2016

3 commits


16 Nov, 2016

1 commit

  • The newly added writeback throttling code causes a harmless warning
    in some configurations:

    block/blk-wbt.c:250:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration]
    static bool inline stat_sample_valid(struct blk_rq_stat *stat)

    This makes it use the expected format for the declaration.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
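    That is, the qualifier moves in front of the return type, matching
    the usual kernel style:

    static inline bool stat_sample_valid(struct blk_rq_stat *stat)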

12 Nov, 2016

3 commits


11 Nov, 2016

1 commit

  • We can hook this up to the block layer, to help throttle buffered
    writes.

    wbt registers a few trace points that can be used to track what is
    happening in the system:

    wbt_lat: 259:0: latency 2446318
    wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
    wmean=518866, wmin=15522, wmax=5330353, wsamples=57
    wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32

    This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
    dumps the current read/write stats for that window, and wbt_step shows a
    step down event where we now scale back writes. Each trace includes the
    device, 259:0 in this case.

    Signed-off-by: Jens Axboe

    Jens Axboe