10 Jul, 2019

1 commit

  • Pull block updates from Jens Axboe:
    "This is the main block updates for 5.3. Nothing earth shattering or
    major in here, just fixes, additions, and improvements all over the
    map. This contains:

    - Series of documentation fixes (Bart)

    - Optimization of the blk-mq ctx get/put (Bart)

    - null_blk removal race condition fix (Bob)

    - req/bio_op() cleanups (Chaitanya)

    - Series cleaning up the segment accounting, and request/bio mapping
    (Christoph)

    - Series cleaning up the page getting/putting for bios (Christoph)

    - block cgroup cleanups and moving it to where it is used (Christoph)

    - block cgroup fixes (Tejun)

    - Series of fixes and improvements to bcache, most notably a write
    deadlock fix (Coly)

    - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

    - Series of improvements and fixes to BFQ (Douglas, Paolo)

    - debugfs_create() return value check removal for drbd (Greg)

    - Use struct_size(), where appropriate (Gustavo)

    - Two lightnvm fixes (Heiner, Geert)

    - MD fixes, including a read balance and corruption fix (Guoqing,
    Marcos, Xiao, Yufen)

    - block opal shadow mbr additions (Jonas, Revanth)

    - sbitmap compare-and-exchange improvements (Pavel)

    - Fix for potential bio->bi_size overflow (Ming)

    - NVMe pull requests:
        - improved PCIe suspend support (Keith Busch)
        - error injection support for the admin queue (Akinobu Mita)
        - Fibre Channel discovery improvements (James Smart)
        - tracing improvements including nvmet tracing support (Minwoo Im)
        - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
          Kulkarni)

    - Various little fixes and improvements to drivers and core"

    * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
    blk-iolatency: fix STS_AGAIN handling
    block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
    blk-mq: simplify blk_mq_make_request()
    blk-mq: remove blk_mq_put_ctx()
    sbitmap: Replace cmpxchg with xchg
    block: fix .bi_size overflow
    block: sed-opal: check size of shadow mbr
    block: sed-opal: ioctl for writing to shadow mbr
    block: sed-opal: add ioctl for done-mark of shadow mbr
    block: never take page references for ITER_BVEC
    direct-io: use bio_release_pages in dio_bio_complete
    block_dev: use bio_release_pages in bio_unmap_user
    block_dev: use bio_release_pages in blkdev_bio_end_io
    iomap: use bio_release_pages in iomap_dio_bio_end_io
    block: use bio_release_pages in bio_map_user_iov
    block: use bio_release_pages in bio_unmap_user
    block: optionally mark pages dirty in bio_release_pages
    block: move the BIO_NO_PAGE_REF check into bio_release_pages
    block: skd_main.c: Remove call to memset after dma_alloc_coherent
    block: mtip32xx: Remove call to memset after dma_alloc_coherent
    ...

    Linus Torvalds
     

07 Jul, 2019

1 commit

  • When the blk-mq debugfs file creation logic was "cleaned up", it was
    cleaned up too much, causing the queue file to not be created in the
    correct location. It turns out the check for the directory being
    present is needed: if the directory has not been created yet, the
    files should not be created either, and the function will be called
    again later in the initialization code so that the files end up in
    the correct location.

    Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions")
    Reported-by: Stephen Rothwell
    Cc: linux-block@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

    Greg Kroah-Hartman
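
    A minimal sketch of the restored pattern, with hypothetical function
    and field names (the real fix lives in the blk-mq debugfs
    registration code):

        /* Hypothetical names; illustrates the guard described above. */
        void example_register_queue_debugfs(struct example_queue *q)
        {
                /*
                 * The parent directory may not exist yet; in that case do
                 * nothing. This function is invoked again later in the
                 * initialization code, once the directory has been
                 * created, and the files are created then.
                 */
                if (!q->debugfs_dir)
                        return;

                debugfs_create_file("queue", 0400, q->debugfs_dir, q,
                                    &example_queue_fops);
        }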
     

06 Jul, 2019

1 commit

  • The iolatency controller is based on rq_qos. It increments on
    rq_qos_throttle() and decrements on either rq_qos_cleanup() or
    rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where
    blk_mq_make_request() may call both rq_qos_cleanup() and
    rq_qos_done_bio() on REQ_NOWAIT. So checking STS_AGAIN prevents the
    double decrement.

    The above works upstream, as the only way we can get STS_AGAIN is from
    blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
    problem, as the bio_endio() skipping only happens on reserved tag
    allocation failures, which can only be caused by driver bugs and
    already trigger a WARN.

    However, the fix creates a not so great dependency on how STS_AGAIN can
    be propagated. Internally, we (Facebook) carry a patch that kills
    readahead if a cgroup is io congested or a fatal signal is pending.
    This, combined with the fact that chained bios propagate their
    bi_status to the parent if it is not already set, can cause the parent
    bio to not clean up properly even though it was successful. This
    consequently leaks the inflight counter and can hang all IOs under that
    blkg.

    To nip the adverse interaction early, this removes the rq_qos_cleanup()
    callback in iolatency in favor of cleaning up always on the
    rq_qos_done_bio() path.

    Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios")
    Debugged-by: Tejun Heo
    Debugged-by: Josef Bacik
    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
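
    A sketch of the shape of the change (hook names as in blk-iolatency.c;
    the ops table is condensed here, so treat it as illustrative):

        static struct rq_qos_ops blkcg_iolatency_ops = {
                .throttle = blkcg_iolatency_throttle,
                /*
                 * No .cleanup hook anymore: the inflight counter is now
                 * decremented only in done_bio, which runs exactly once
                 * per bio regardless of how STS_AGAIN propagates.
                 */
                .done_bio = blkcg_iolatency_done_bio,
                .exit = blkcg_iolatency_exit,
        };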
     

03 Jul, 2019

3 commits

  • Fix a regression introduced when removing bi_phys_segments for Write Zeroes
    requests, which need to have a segment count of zero, as they don't have a
    payload.

    Fixes: 14ccb66b3f58 ("block: remove the bi_phys_segments field in struct bio")
    Reported-by: Jens Axboe
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
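
    A condensed sketch of the restored invariant (the real logic lives in
    the request segment accounting; the default arm is a placeholder):

        static unsigned int example_nr_phys_segments(struct bio *bio)
        {
                switch (bio_op(bio)) {
                case REQ_OP_DISCARD:
                case REQ_OP_SECURE_ERASE:
                case REQ_OP_WRITE_ZEROES:
                        return 0;  /* no payload, so nothing to map */
                default:
                        return 1;  /* placeholder: real code walks the bvecs */
                }
        }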
     
  • Move the blk_mq_bio_to_request() call in front of the if-statement.

    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Reviewed-by: Minwoo Im
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
    on preemption being disabled for its correctness. Since removing the CPU
    preemption calls does not measurably affect performance, simplify the
    blk-mq code by removing the blk_mq_put_ctx() function and also by not
    disabling preemption in blk_mq_get_ctx().

    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
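
    A sketch of the simplified helper after the change, close to the
    upstream shape in blk-mq.h:

        static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
        {
                /*
                 * raw_smp_processor_id() may race with migration, but any
                 * CPU's ctx is correct to use, so preemption need not be
                 * disabled and no matching blk_mq_put_ctx() is required.
                 */
                return __blk_mq_get_ctx(q, raw_smp_processor_id());
        }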
     

01 Jul, 2019

2 commits

  • 'bio->bi_iter.bi_size' is 'unsigned int', which can hold at most 4G - 1
    bytes.

    Before 07173c3ec276 ("block: enable multipage bvecs"), one bio could
    include only a very limited number of pages, usually at most 256, so
    the fs bio size rarely exceeded 1M bytes.

    Since we support multi-page bvecs, in theory one fs bio really can hold
    more than 1M pages, especially in case of hugepages, or big writeback
    with too many dirty pages. There is then a chance that .bi_size
    overflows.

    Fix this issue by using bio_full() to check whether the segment being
    added would overflow .bi_size.

    Cc: Liu Yiding
    Cc: kernel test robot
    Cc: "Darrick J. Wong"
    Cc: linux-xfs@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 07173c3ec276 ("block: enable multipage bvecs")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
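
    A sketch of the overflow-aware bio_full() check, close to the upstream
    shape reconstructed from the description:

        static inline bool bio_full(struct bio *bio, unsigned int len)
        {
                if (bio->bi_vcnt >= bio->bi_max_vecs)
                        return true;  /* no room for another bvec */

                if (bio->bi_iter.bi_size > UINT_MAX - len)
                        return true;  /* adding len would overflow .bi_size */

                return false;
        }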
     
  • Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
    fix. Otherwise we end up having conflicts with future patches between
    for-5.3/block and master that touch this area. In particular, it makes
    the bio_full() fix hard to backport to stable.

    * tag 'v5.2-rc6': (482 commits)
    Linux 5.2-rc6
    Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
    Bluetooth: Fix regression with minimum encryption key size alignment
    tcp: refine memory limit test in tcp_fragment()
    x86/vdso: Prevent segfaults due to hoisted vclock reads
    SUNRPC: Fix a credential refcount leak
    Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
    net :sunrpc :clnt :Fix xps refcount imbalance on the error path
    NFS4: Only set creation opendata if O_CREAT
    ARM: 8867/1: vdso: pass --be8 to linker if necessary
    KVM: nVMX: reorganize initial steps of vmx_set_nested_state
    KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
    habanalabs: use u64_to_user_ptr() for reading user pointers
    nfsd: replace Jeff by Chuck as nfsd co-maintainer
    inet: clear num_timeout reqsk_alloc()
    PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
    net: mvpp2: debugfs: Add pmap to fs dump
    ipv6: Default fib6_type to RTN_UNICAST when not set
    net: hns3: Fix inconsistent indenting
    net/af_iucv: always register net_device notifier
    ...

    Jens Axboe
     

30 Jun, 2019

3 commits

  • Check whether the shadow mbr fits into the space provided by the
    target. While a proper firmware should handle this case and return an
    error itself, checking up front may prevent problems or even damage
    with crappy firmwares.

    Signed-off-by: Jonas Rabenstein
    Signed-off-by: David Kozub
    Reviewed-by: Scott Bauer
    Reviewed-by: Jon Derrick
    Signed-off-by: Jens Axboe

    Jonas Rabenstein
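
    A minimal sketch of the idea with a hypothetical helper (the real
    check sits in the sed-opal write path):

        static int example_check_shadow_mbr_write(u64 offset, u64 len,
                                                  u64 mbr_size)
        {
                /* reject writes that would run past the shadow mbr area */
                if (offset > mbr_size || len > mbr_size - offset)
                        return -ENOSPC;
                return 0;
        }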
     
  • Allow modification of the shadow mbr. If the shadow mbr is not marked
    as done, this data will be presented read-only as the device content.
    Only after the shadow mbr is marked as done and a locking range is
    unlocked does the actual content become accessible.

    Co-authored-by: David Kozub
    Signed-off-by: Jonas Rabenstein
    Signed-off-by: David Kozub
    Reviewed-by: Scott Bauer
    Reviewed-by: Jon Derrick
    Signed-off-by: Jens Axboe

    Jonas Rabenstein
     
  • Enable users to mark the shadow mbr as done without completely
    deactivating the shadow mbr feature. This may be useful on reboots
    where the power to the disk is not disconnected in between and the
    shadow mbr stores the required boot files. Of course, this also saves
    the (few) commands required to enable the feature if it is already
    enabled and one only wants to mark the shadow mbr as done.

    Co-authored-by: David Kozub
    Signed-off-by: Jonas Rabenstein
    Signed-off-by: David Kozub
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Scott Bauer
    Reviewed-by: Jon Derrick
    Signed-off-by: Jens Axboe

    Jonas Rabenstein
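
    A sketch of driving the new ioctl from userspace; the names follow the
    uapi added by this series (include/uapi/linux/sed-opal.h), but treat
    the exact fields as illustrative:

        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/sed-opal.h>

        static int mark_shadow_mbr_done(const char *dev, const char *password)
        {
                struct opal_mbr_done req = { 0 };
                int fd, ret;

                fd = open(dev, O_RDWR);
                if (fd < 0)
                        return -1;

                req.key.key_len = strlen(password);
                memcpy(req.key.key, password, req.key.key_len);
                /* mark done without deactivating the shadow mbr feature */
                req.done_flag = OPAL_MBR_DONE;

                ret = ioctl(fd, IOC_OPAL_MBR_DONE, &req);
                close(fd);
                return ret;
        }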
     

28 Jun, 2019

1 commit

  • In reboot tests on several devices we were seeing a "use after free"
    when slub_debug or KASAN was enabled. The kernel complained about:

    Unable to handle kernel paging request at virtual address 6b6b6c2b

    ...which is a classic sign of use after free under slub_debug. The
    stack crawl in kgdb looked like:

    0 test_bit (addr=, nr=)
    1 bfq_bfqq_busy (bfqq=)
    2 bfq_select_queue (bfqd=)
    3 __bfq_dispatch_request (hctx=)
    4 bfq_dispatch_request (hctx=)
    5 0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
    6 0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
    7 0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
    8 0xc0568d94 in blk_mq_run_work_fn (work=)
    9 0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
    10 0xc024cff4 in worker_thread (__worker=0xec6d4640)

    Digging in kgdb, it could be found that, though bfqq looked fine,
    bfqq->bic had been freed.

    Through further digging, I postulated that perhaps it is illegal to
    access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
    because the "bic" can be freed at some point in time after this call
    is made. I confirmed that there certainly were cases where the exact
    crashing code path would access the "bic" after bfq_exit_icq() had
    been called. Specifically, I set "bfqq->bic" to (void *)0x7 and
    saw that the bic was 0x7 at the time of the crash.

    To understand a bit more about why this crash was fairly uncommon (I
    saw it only once in a few hundred reboots), you can see that much of
    the time bfq_exit_icq_bfqq() fully frees the bfqq and thus it can't
    access the ->bic anymore. The only case it doesn't is if
    bfq_put_queue() sees a reference still held.

    However, even in the case when bfqq isn't freed, the crash is still
    rare. Why? I tracked what happened to the "bic" after the exit
    routine. It doesn't get freed right away. Rather,
    put_io_context_active() eventually called put_io_context() which
    queued up freeing on a workqueue. The freeing then actually happened
    later than that through call_rcu(). Despite all these delays, some
    extra debugging showed that all the hoops could be jumped through in
    time and the memory could be freed causing the original crash. Phew!

    To make a long story short, assuming it truly is illegal to access an
    icq after the "exit_icq" callback is finished, this patch is needed.

    Cc: stable@vger.kernel.org
    Reviewed-by: Paolo Valente
    Signed-off-by: Douglas Anderson
    Signed-off-by: Jens Axboe

    Douglas Anderson
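
    A condensed sketch of the fix's core idea (the patch clears the
    pointer in the exit path; names and context are simplified here):

        static void example_exit_icq_bfqq(struct bfq_queue *bfqq, bool is_sync)
        {
                /*
                 * The bic may be freed (after an RCU grace period) once
                 * exit_icq returns; clear the queue's back-pointer so
                 * later dispatch paths cannot chase it into freed memory.
                 */
                if (bfqq && is_sync)
                        bfqq->bic = NULL;
        }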
     

27 Jun, 2019

2 commits

  • bio_flush_dcache_pages() is unused. Remove it.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
     
  • Some debug code suggested by Paolo was tripping when I did reboot
    stress tests. Specifically in bfq_bfqq_resume_state()
    "bic->saved_wr_start_at_switch_to_srt" was later than the current
    value of "jiffies". A bit of debugging showed that
    "bic->saved_wr_start_at_switch_to_srt" was actually 0 and a bit more
    debugging showed that was because we had run through the "unlikely"
    case in the bfq_bfqq_save_state() function.

    Let's init "saved_wr_start_at_switch_to_srt" in the unlikely case to
    something sane.

    NOTE: this fixes no known real-world errors.

    Reviewed-by: Paolo Valente
    Reviewed-by: Guenter Roeck
    Signed-off-by: Douglas Anderson
    Signed-off-by: Jens Axboe

    Douglas Anderson
     

26 Jun, 2019

1 commit

  • By mistake, there is a '&' instead of a '==' in the definition of the
    macro BFQQ_TOTALLY_SEEKY. This commit replaces the wrong operator with
    the correct one.

    Fixes: 7074f076ff15 ("block, bfq: do not tag totally seeky queues as soft rt")
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
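
    A sketch of the one-character bug (the real macro tests
    bfqq->seek_history; condensed here):

        /* '&' is true whenever ANY tracked request was seeky ... */
        #define BFQQ_TOTALLY_SEEKY_BUGGY(hist)  ((hist) & -1)
        /* ... '==' is true only when ALL tracked requests were seeky */
        #define BFQQ_TOTALLY_SEEKY_FIXED(hist)  ((hist) == -1)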
     

25 Jun, 2019

7 commits

  • Consider, on one side, a bfq_queue Q that remains empty while in
    service, and, on the other side, the pending I/O of bfq_queues that,
    according to their timestamps, have to be served after Q. If an
    uncontrolled amount of I/O from the latter bfq_queues were dispatched
    while Q is waiting for its new I/O to arrive, then Q's bandwidth
    guarantees would be violated. To prevent this, I/O dispatch is plugged
    until Q receives new I/O (except for a properly controlled amount of
    injected I/O). Unfortunately, preemption breaks I/O-dispatch plugging,
    for the following reason.

    Preemption is performed in two steps. First, Q is expired and
    re-scheduled. Second, the new bfq_queue to serve is chosen. The first
    step is needed by the second, as the second can be performed only
    after Q's timestamps have been properly updated (done in the
    expiration step), and Q has been re-queued for service. This
    dependency is a consequence of the way BFQ's scheduling algorithm
    is currently implemented.

    But Q is not re-scheduled at all in the first step, because Q is
    empty. As a consequence, an uncontrolled amount of I/O may be
    dispatched until Q becomes non-empty again. This breaks Q's service
    guarantees.

    This commit addresses this issue by re-scheduling Q even if it is
    empty. This in turn breaks the assumption that all scheduled queues
    are non-empty, so a few extra checks are now needed.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • BFQ enqueues the I/O coming from each process into a separate
    bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be
    served for at most timeout_sync milliseconds (default: 125 ms). This
    service scheme is prone to the following inaccuracy.

    While a bfq_queue Q1 is in service, some empty bfq_queue Q2 may
    receive I/O, and, according to BFQ's scheduling policy, may become the
    right bfq_queue to serve, in place of the currently in-service
    bfq_queue. In this respect, postponing the service of Q2 to after the
    service of Q1 finishes may delay the completion of Q2's I/O, compared
    with an ideal service in which all non-empty bfq_queues are served in
    parallel, and every non-empty bfq_queue is served at a rate
    proportional to the bfq_queue's weight. This additional delay is equal
    at most to the time Q1 may unjustly remain in service before switching
    to Q2.

    If Q1 and Q2 have the same weight, then this time is most likely
    negligible compared with the completion time to be guaranteed to Q2's
    I/O. In addition, first, one of the reasons why BFQ may want to serve
    Q1 for a while is that this boosts throughput and, second, serving Q1
    longer reduces BFQ's overhead. As a conclusion, it is usually better
    not to preempt Q1 if both Q1 and Q2 have the same weight.

    In contrast, as Q2's weight or priority becomes higher and higher
    compared with that of Q1, the above delay becomes larger and larger,
    compared with the I/O completion times that have to be guaranteed to
    Q2 according to Q2's weight. So reducing this delay may be more
    important than avoiding the costs of preempting Q1.

    Accordingly, this commit preempts Q1 if Q2 has a higher weight or a
    higher priority than Q1. Preemption causes Q1 to be re-scheduled, and
    triggers a new choice of the next bfq_queue to serve. If Q2 really is
    the next bfq_queue to serve, then Q2 will be set in service
    immediately.

    This change reduces the component of the I/O latency caused by the
    above delay by about 80%. For example, on an (old) PLEXTOR PX-256M5
    SSD, the maximum latency reported by fio drops from 15.1 to 3.2 ms for
    a process doing sporadic random reads while another process is doing
    continuous sequential reads.

    Signed-off-by: Nicola Bottura
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • A bfq_queue Q may happen to be synchronized with another
    bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to
    receive new I/O. We call Q2 "waker queue".

    If I/O plugging is being performed for Q, and Q is not receiving any
    more I/O because of the above synchronization, then, thanks to BFQ's
    injection mechanism, the waker queue is likely to get served before
    the I/O-plugging timeout fires.

    Unfortunately, this fact may not be sufficient to guarantee a high
    throughput during the I/O plugging, because the inject limit for Q may
    be too low to guarantee a lot of injected I/O. In addition, the
    duration of the plugging, i.e., the time before Q finally receives new
    I/O, may not be minimized, because the waker queue may happen to be
    served only after other queues.

    To address these issues, this commit introduces the explicit detection
    of the waker queue, and the unconditional injection of a pending I/O
    request of the waker queue on each invocation of
    bfq_dispatch_request().

    One may be concerned that this systematic injection of I/O from the
    waker queue delays the service of Q's I/O. Fortunately, it doesn't. On
    the contrary, Q's next I/O is brought forward dramatically, because it
    is not blocked for milliseconds.

    Reported-by: Srivatsa S. Bhat (VMware)
    Tested-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Until the base value for request service times gets finally computed
    for a bfq_queue, the inject limit for that queue does depend on the
    think-time state (short|long) of the queue. A timely update of the
    think time then guarantees a quicker activation or deactivation of the
    injection. Fortunately, the think time of a bfq_queue is updated in
    the same code path as the inject limit; but after the inject limit.

    This commit moves the update of the think time before the update of
    the inject limit. For coherence, it moves the update of the seek time
    too.

    Reported-by: Srivatsa S. Bhat (VMware)
    Tested-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • I/O injection gets reduced if it increases the request service times
    of the victim queue beyond a certain threshold. The threshold, in its
    turn, is computed as a function of the base service time enjoyed by
    the queue when it undergoes no injection.

    As a consequence, for injection to work properly, the above base value
    has to be accurate. In this respect, such a value may vary over
    time. For example, it varies if the size or the spatial locality of
    the I/O requests in the queue change. It is then important to update
    this value whenever possible. This commit performs this update.

    Reported-by: Srivatsa S. Bhat (VMware)
    Tested-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • One of the cases where the parameters for injection may be updated is
    when there are no more in-flight I/O requests. The number of in-flight
    requests is stored in the field bfqd->rq_in_driver of the descriptor
    bfqd of the device. So, the controlled condition is
    bfqd->rq_in_driver == 0.

    Unfortunately, this is wrong because the instruction that checks this
    condition is in the code path that handles the completion of a
    request, and, in particular, it is executed before bfqd->rq_in_driver
    is decremented in that code path.

    This commit fixes this issue by just replacing 0 with 1 in the
    comparison.

    Reported-by: Srivatsa S. Bhat (VMware)
    Tested-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
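
    A condensed sketch of the off-by-one (the surrounding code is
    illustrative, and the update helper is hypothetical):

        static void example_completed_request(struct bfq_data *bfqd)
        {
                /*
                 * This runs before rq_in_driver is decremented for the
                 * completing request, so "no other request in flight"
                 * means the counter still reads 1 here, not 0.
                 */
                if (bfqd->rq_in_driver == 1)  /* was: == 0, which never hit */
                        example_update_inject_params(bfqd);

                bfqd->rq_in_driver--;
        }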
     
  • Until the base value of the request service times gets finally
    computed for a bfq_queue, the inject limit does depend on the
    think-time state (short|long). The limit must be 0 or 1 if the think
    time is deemed, respectively, as short or long. However, such a check
    and possible limit update is performed only periodically, once per
    second. So, to make the injection mechanism much more reactive, this
    commit performs the update also every time the think-time state
    changes.

    In addition, in the following special case, this commit lets the
    inject limit of a bfq_queue bfqq remain equal to 1 even if bfqq's
    think time is short: bfqq's I/O is synchronized with that of some
    other queue, i.e., bfqq may receive new I/O only after the I/O of the
    other queue is completed. Keeping the inject limit to 1 allows the
    blocking I/O to be served while bfqq is in service. And this is very
    convenient both for bfqq and for the total throughput, as explained
    in detail in the comments in bfq_update_has_short_ttime().

    Reported-by: Srivatsa S. Bhat (VMware)
    Tested-by: Srivatsa S. Bhat (VMware)
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

21 Jun, 2019

10 commits

  • Improve the print_req_error with additional request fields which are
    helpful for debugging. Use newly introduced blk_op_str() to print the
    REQ_OP_XXX in the string format.

    Reviewed-by: Chao Yu
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • Now that we have a helper function blk_op_str() to convert a
    REQ_OP_XXX value to the string XXX, adjust the code to use it. Get rid
    of the duplicate op_name array, which is now present in blk-core.c
    (renamed to "blk_op_name"), and of the open coding in
    blk-mq-debugfs.c.

    Reviewed-by: Bart Van Assche
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     
  • In order to centralize the REQ_OP_XXX to string conversion so that it
    can be used in the block layer and in different places in the kernel,
    like f2fs, this patch adds a new helper function along with an array
    similar to the one present in blk-mq-debugfs.c.

    We keep this helper functionality centralized in blk-core.c instead of
    blk-mq-debugfs.c, since blk-core.c is configured using CONFIG_BLOCK
    and will not depend on blk-mq-debugfs.c, which is configured using
    CONFIG_BLK_DEBUG_FS.

    The next patch adjusts the code in blk-mq-debugfs.c to use the newly
    introduced helper.

    Reviewed-by: Bart Van Assche
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
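
    A sketch of the helper, close to the shape described (the table is
    condensed; the real one in blk-core.c covers every REQ_OP_XXX):

        static const char *const blk_op_name[] = {
                [REQ_OP_READ]           = "READ",
                [REQ_OP_WRITE]          = "WRITE",
                [REQ_OP_FLUSH]          = "FLUSH",
                [REQ_OP_DISCARD]        = "DISCARD",
                [REQ_OP_WRITE_ZEROES]   = "WRITE_ZEROES",
        };

        /* map a REQ_OP_XXX value to "XXX"; holes fall back to "UNKNOWN" */
        const char *blk_op_str(unsigned int op)
        {
                const char *op_str = "UNKNOWN";

                if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
                        op_str = blk_op_name[op];

                return op_str;
        }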
     
  • Print the calling function instead of print_req_error as a prefix, and
    print the operation and op_flags separately instead of the whole field.

    Reviewed-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This option is entirely bfq specific, so give it an appropriate name.

    Also make it depend on CONFIG_BFQ_GROUP_IOSCHED in Kconfig, as all
    the functionality already does so anyway.

    Acked-by: Tejun Heo
    Acked-by: Paolo Valente
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This function was moved from core block code and is way too generic.
    Fold it into the only caller and simplify it based on the arguments
    actually passed.

    Acked-by: Tejun Heo
    Acked-by: Paolo Valente
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This structure and assorted infrastructure is only used by the bfq I/O
    scheduler. Move it there instead of bloating the common code.

    Acked-by: Tejun Heo
    Acked-by: Paolo Valente
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • When sampling the blkcg counts we don't need atomics or per-cpu
    variables. Introduce a new structure just containing plain u64
    counters.

    Acked-by: Tejun Heo
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
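
    A sketch of the new snapshot type, matching the described design (see
    the blkcg rwstat definitions for the real one):

        /* a plain snapshot: no atomics, no per-cpu storage */
        struct blkg_rwstat_sample {
                /* indexed by BLKG_RWSTAT_{READ,WRITE,SYNC,ASYNC,DISCARD} */
                u64 cnt[BLKG_RWSTAT_NR];
        };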
     
  • Returning a structure generates rather bad code, so switch to passing
    by reference. Also don't require the structure to be zeroed and add
    to the 0-initialized counters, but actually set the counters to the
    calculated value.

    Acked-by: Tejun Heo
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
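
    A sketch of the calling-convention change (signatures condensed from
    the description; the "_old" name is just a label for the prior form):

        /* before: every call copies a multi-word struct out by value */
        struct blkg_rwstat_sample blkg_rwstat_recursive_sum_old(
                        struct blkcg_gq *blkg, struct blkcg_policy *pol,
                        int off);

        /* after: the caller passes the destination; the counters are set
         * to the computed value, so the sample need not be zeroed first */
        void blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
                        struct blkcg_policy *pol, int off,
                        struct blkg_rwstat_sample *sum);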
     
  • Break up the crazy statements into something readable. Also switch to
    an unsigned counter, as it can't ever turn negative.

    Reviewed-by: Chaitanya Kulkarni
    Acked-by: Tejun Heo
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig