22 Feb, 2018

1 commit

  • commit 5235553d821433e1f4fa720fd025d2c4b7ee9994 upstream.

    Mikulas reported a workload that saw bad performance, and figured
    out it was due to various other types of requests being accounted
    as reads. Flush requests, for instance. Due to the high latency of
    those, we heavily throttle the writes to keep the latencies in
    balance. But flushes really should be accounted as writes.

    Fix this by checking the exact type of the request: if it's a
    read, account it as a read; if it's a write or a flush, account it
    as a write. Any other request is disregarded. Previously everything
    would have been mistakenly accounted as reads.

    Reported-by: Mikulas Patocka
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
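    The check described above amounts to roughly the following (an
    illustrative sketch in blk-wbt style, not necessarily the exact
    upstream diff):

    static int wbt_data_dir(const struct request *rq)
    {
            const int op = req_op(rq);

            if (op == REQ_OP_READ)
                    return READ;            /* account as a read */
            if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
                    return WRITE;           /* writes and flushes count as writes */

            /* any other request type is not accounted at all */
            return -1;
    }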

25 Dec, 2017

1 commit

  • [ Upstream commit b5dc5d4d1f4ff9032eb6c21a3c571a1317dc9289 ]

    Similarly to CFQ, BFQ has its write-throttling heuristics, and it
    is better not to combine them with further write-throttling
    heuristics of a different nature.
    So this commit disables write-back throttling for a device if BFQ
    is used as I/O scheduler for that device.

    Signed-off-by: Luca Miccio
    Signed-off-by: Paolo Valente
    Tested-by: Oleksandr Natalenko
    Tested-by: Lee Tibbert
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Luca Miccio
     
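    The essence of the change is a single call made when BFQ is attached
    to a device's queue; a rough sketch (the hook shown is illustrative
    and its exact signature may differ):

    static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
    {
            /* ... existing BFQ setup ... */

            wbt_disable_default(q);         /* BFQ brings its own write throttling */
            return 0;
    }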

20 Jun, 2017

2 commits

  • So I've noticed a number of instances where it was not obvious from the
    code whether ->task_list was for a wait-queue head or a wait-queue entry.

    Furthermore, there's a number of wait-queue users where the lists are
    not for 'tasks' but other entities (poll tables, etc.), in which case
    the 'task_list' name is actively confusing.

    To clear this all up, name the wait-queue head and entry list structure
    fields unambiguously:

    struct wait_queue_head::task_list => ::head
    struct wait_queue_entry::task_list => ::entry

    For example, this code:

    rqw->wait.task_list.next != &wait->task_list

    ... was pretty unclear (to me) what it was doing, while now it's written this way:

    rqw->wait.head.next != &wait->entry

    ... which makes it pretty clear that we are checking whether our entry is the first one hanging off the wait-queue head.

    Other examples are:

    list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
    list_for_each_entry(wq, &fence->wait.task_list, task_list) {

    ... where it's unclear (to me) what we are iterating, and during review it's
    hard to tell whether it's trying to walk a wait-queue entry (which would be
    a bug), while now it's written as:

    list_for_each_entry_safe(pos, next, &x->head, entry) {
    list_for_each_entry(wq, &fence->wait.head, entry) {

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
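    Taken together, the two renames above leave the core wait-queue types
    looking roughly like this (abridged sketch of <linux/wait.h> after the
    change):

    struct wait_queue_entry {
            unsigned int            flags;
            void                    *private;
            wait_queue_func_t       func;
            struct list_head        entry;          /* was ->task_list */
    };
    typedef struct wait_queue_entry wait_queue_entry_t;

    struct wait_queue_head {
            spinlock_t              lock;
            struct list_head        head;           /* was ->task_list */
    };
    typedef struct wait_queue_head wait_queue_head_t;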

21 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • When CFQ is used as an elevator, it disables writeback throttling
    because they don't play well together. Later when a different elevator
    is chosen for the device, writeback throttling doesn't get enabled
    again as it should. Make sure CFQ enables writeback throttling (if it
    should be enabled by default) when we switch from it to another IO
    scheduler.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
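    The shape of the fix is roughly the following (illustrative sketch;
    the exact hook may differ): when CFQ is torn down for a queue,
    restore the default writeback throttling state.

    static void cfq_exit_queue(struct elevator_queue *e)
    {
            struct cfq_data *cfqd = e->elevator_data;
            struct request_queue *q = cfqd->queue;

            /* ... existing CFQ teardown ... */

            wbt_enable_default(q);  /* re-enable wbt if it is on by default */
    }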

11 Apr, 2017

1 commit

  • When CFQ calls wbt_disable_default(), it will call
    blk_stat_remove_callback() to stop gathering IO statistics for the
    purposes of writeback throttling. Later, when the request_queue is
    unregistered, wbt_exit() will call blk_stat_remove_callback() again,
    which will try to delete the callback from the list a second time
    and possibly cause list corruption.

    Fix the problem by making wbt_disable_default() call wbt_exit(),
    which is properly guarded against being called multiple times.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
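    A minimal sketch of the resulting call path (illustrative): wbt_exit()
    frees the throttling state and clears q->rq_wb, so reaching it a
    second time is harmless.

    void wbt_disable_default(struct request_queue *q)
    {
            if (q->rq_wb)
                    wbt_exit(q);    /* tears down wbt and clears q->rq_wb */
    }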

22 Mar, 2017

2 commits

  • Currently, statistics are gathered in ~0.13s windows, and users grab
    the statistics whenever they need them. This is not ideal for either
    of the two in-tree users:

    1. Writeback throttling wants its own dynamically sized window of
    statistics. Since the blk-stats statistics are reset after every
    window and the wbt windows don't line up with the blk-stats windows,
    wbt doesn't see every I/O.
    2. Polling currently grabs the statistics on every I/O. Again, depending
    on how the window lines up, we may miss some I/Os. It's also
    unnecessary overhead to get the statistics on every I/O; the hybrid
    polling heuristic would be just as happy with the statistics from the
    previous full window.

    This reworks the blk-stats infrastructure to be callback-based: users
    register a callback that they want called at a given time with all of
    the statistics from the window during which the callback was active.
    Users can dynamically bucketize the statistics. wbt and polling both
    currently use read vs. write, but polling can be extended to further
    subdivide based on request size.

    The callbacks are kept on an RCU list, and each callback has percpu
    stats buffers. There will only be a few users, so the overhead on the
    I/O completion side is low. The stats flushing is also simplified
    considerably: since the timer function is responsible for clearing the
    statistics, we don't have to worry about stale statistics.

    wbt is a trivial conversion. After the conversion, the windowing problem
    mentioned above is fixed.

    For polling, we register an extra callback that caches the previous
    window's statistics in the struct request_queue for the hybrid polling
    heuristic to use.

    Since we no longer have a single stats buffer for the request queue,
    this also removes the sysfs and debugfs stats entries. To replace those,
    we add a debugfs entry for the poll statistics.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • The stats buckets will become generic soon, so make the existing users
    use the common READ and WRITE definitions instead of one internal to
    blk-stat.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
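    A rough sketch of the callback-style usage described in the first of
    the two entries above (signatures approximate; the bucket function
    maps each completed request to a bucket, here read vs. write, and cb,
    q, rwb and win_msecs stand in for the caller's own variables):

    static int wb_stat_bucket(const struct request *rq)
    {
            return rq_data_dir(rq);         /* bucket 0 = reads, 1 = writes */
    }

    static void wb_timer_fn(struct blk_stat_callback *cb)
    {
            /* cb->stat[READ] / cb->stat[WRITE] hold this window's samples */
    }

    /* registration, e.g. at wbt init time: */
    cb = blk_stat_alloc_callback(wb_timer_fn, wb_stat_bucket, 2, rwb);
    blk_stat_add_callback(q, cb);
    blk_stat_activate_msecs(cb, win_msecs);  /* arm one window of gathering */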

02 Feb, 2017

1 commit

  • We will want to have struct backing_dev_info allocated separately
    from struct request_queue. As the first step, add a pointer to
    backing_dev_info to request_queue and convert all users touching it.
    No functional changes in this patch.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

03 Jan, 2017

2 commits


09 Dec, 2016

1 commit


01 Dec, 2016

1 commit

  • This adds a new block layer operation to zero out a range of
    LBAs. It allows zeroing to be implemented for devices that don't use
    either discard with a predictable zero pattern or WRITE SAME of
    zeroes. The prominent example is NVMe with the Write Zeroes command,
    but in the future this should also help improve the way zeroing
    discards work. For this operation, a suitable entry is exported in
    sysfs which indicates the maximum number of bytes allowed in one
    write zeroes operation by the device.

    Signed-off-by: Chaitanya Kulkarni
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
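    Illustrative only: building such a request by hand would look roughly
    like this, using helper and field names of that era (bdev, sector and
    nr_sects stand in for the caller's values, and the length must respect
    the device's write_zeroes_max_bytes sysfs limit):

    struct bio *bio = bio_alloc(GFP_KERNEL, 0);     /* no data pages needed */

    bio->bi_iter.bi_sector = sector;                /* first LBA to zero */
    bio->bi_iter.bi_size = nr_sects << 9;           /* length in bytes */
    bio->bi_bdev = bdev;                            /* target block device */
    bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
    submit_bio(bio);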

29 Nov, 2016

3 commits


16 Nov, 2016

1 commit

  • The newly added writeback throttling code causes a harmless warning
    in some configurations:

    block/blk-wbt.c:250:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration]
    static bool inline stat_sample_valid(struct blk_rq_stat *stat)

    This makes it use the expected format for the declaration.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
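    That is, the qualifier moves in front of the return type, matching
    the usual kernel style:

    static inline bool stat_sample_valid(struct blk_rq_stat *stat)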

12 Nov, 2016

3 commits


11 Nov, 2016

1 commit

  • We can hook this up to the block layer, to help throttle buffered
    writes.

    wbt registers a few trace points that can be used to track what is
    happening in the system:

    wbt_lat: 259:0: latency 2446318
    wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
    wmean=518866, wmin=15522, wmax=5330353, wsamples=57
    wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32

    This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
    dumps the current read/write stats for that window, and wbt_step shows a
    step down event where we now scale back writes. Each trace includes the
    device, 259:0 in this case.

    Signed-off-by: Jens Axboe

    Jens Axboe