31 May, 2018

25 commits

  • The maximum possible duration of the weight-raising period for
    interactive applications is limited to 13 seconds, as this is the time
    needed to load the largest application that we considered when tuning
    weight raising. Unfortunately, in such an evaluation, we did not
    consider the case of very slow virtual machines.

    For example, on a QEMU/KVM virtual machine
    - running on a slow PC;
    - with a virtual disk stacked on a slow low-end 5400rpm HDD;
    - serving a heavy I/O workload, such as the sequential reading of
      several files;
    mplayer takes 23 seconds to start, if constantly weight-raised.

    To address this issue, this commit conservatively sets the upper limit
    for weight-raising duration to 25 seconds.

    Signed-off-by: Davide Sapienza
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Davide Sapienza
     
  • BFQ computes the duration of weight raising for interactive
    applications automatically, using some reference parameters. In
    particular, BFQ uses the best durations (see comments in the code for
    how these durations have been assessed) for two classes of systems:
    slow and fast ones. Examples of slow systems are old phones or systems
    using micro HDDs. Fast systems are all the remaining ones. Using these
    parameters, BFQ computes the actual duration of the weight raising,
    for the system at hand, as a function of the relative speed of the
    system w.r.t. the speed of a reference system, belonging to the same
    class of systems as the system at hand.

    This slow vs fast differentiation proved to be useful in the past, but
    has little meaning with current hardware. Even worse, it causes
    problems in virtual systems, where the speed of the system can vary
    frequently, and so widely as to confuse the class-detection
    mechanism and, as we have verified experimentally, to cause BFQ to
    compute nonsensical weight-raising durations.

    This commit addresses this issue by removing the slow class and the
    class-detection mechanism.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
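
The duration computation described in the two commit messages above can be sketched roughly as follows: scale a reference duration by the speed of the reference system relative to the measured speed, then cap the result at 25 seconds. This is only an illustrative sketch. The function name `wr_duration_ms` and the reference values are hypothetical and are not taken from the BFQ sources; only the 25-second upper limit comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference values, chosen for illustration only. */
#define REF_RATE_KBPS    100000u /* throughput of the reference system */
#define REF_DURATION_MS  4000u   /* weight-raising duration on that system */
#define MAX_DURATION_MS  25000u  /* upper limit stated in the commit message */

/*
 * Scale the reference duration by the relative speed of the system:
 * a system half as fast as the reference gets twice the duration,
 * up to the upper limit.
 */
static uint64_t wr_duration_ms(uint64_t measured_rate_kbps)
{
    uint64_t dur;

    assert(measured_rate_kbps > 0); /* avoid division by zero */

    dur = (uint64_t)REF_DURATION_MS * REF_RATE_KBPS / measured_rate_kbps;
    return dur > MAX_DURATION_MS ? MAX_DURATION_MS : dur;
}
```

With a single class of systems there is one reference pair, so a very slow or fluctuating virtual machine only moves the result along this one curve until it reaches the cap.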
     
  • A description of how weight raising works is missing in BFQ
    sources. In addition, the code for handling weight raising is
    scattered across a few functions. This makes it rather hard to
    understand the mechanism and its rationale. This commit adds such a
    description at the beginning of the main source file.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Since bfq_finish_request() is always called on the request 'next',
    after bfq_requests_merged() is finished, and bfq_finish_request()
    removes 'next' from its bfq_queue if needed, it isn't necessary to do
    such a removal in advance in bfq_requests_merged().

    This commit removes such a useless 'next' removal.

    Signed-off-by: Filippo Muzzini
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Filippo Muzzini
     
  • The request rq passed to the function bfq_requests_merged is always in
    a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the
    beginning of bfq_requests_merged always succeeds, and the control
    flow systematically skips to the end of the function. This implies
    that the body of the function is never executed, i.e., the
    repositioning of rq is never performed.

    Conversely, a check is missing in the body of the function:
    'next' must be removed only if it is inside a bfq_queue.

    This commit removes the wrong check on rq, and adds the missing check
    on 'next'. In addition, this commit adds comments on
    bfq_requests_merged.

    Signed-off-by: Filippo Muzzini
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
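
The control-flow problem described above can be illustrated with a toy model. In this sketch, assuming a boolean `in_queue` as a stand-in for `!RB_EMPTY_NODE(&rq->rb_node)`, the wrong early exit on rq is gone and 'next' is removed only when it is actually queued. The types, the `deadline` field and the repositioning rule (rq inherits the earlier deadline) are hypothetical simplifications, not the BFQ code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_request {
    bool in_queue;          /* stand-in for !RB_EMPTY_NODE(&rq->rb_node) */
    unsigned long deadline; /* hypothetical key used for repositioning */
};

/* Returns true if 'next' was removed from its queue. */
static bool toy_requests_merged(struct toy_request *rq,
                                struct toy_request *next)
{
    assert(rq && next);

    /*
     * The buggy version began with "if (rq->in_queue) return;".
     * Since rq is always queued, the code below never ran.
     */

    /* Reposition rq: here it simply inherits the earlier deadline. */
    if (next->deadline < rq->deadline)
        rq->deadline = next->deadline;

    /* The missing check: remove 'next' only if it is in a queue. */
    if (!next->in_queue)
        return false;

    next->in_queue = false;
    return true;
}
```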
     
  • In bfq_requests_merged(), there is a deadlock because the lock on
    bfqq->bfqd->lock is held by the calling function, but the code of
    this function tries to grab the lock again.

    This deadlock is currently hidden by another bug (fixed by the next
    commit for this source file), which causes the body of
    bfq_requests_merged() never to be executed.

    This commit removes the deadlock by removing the lock/unlock pair.

    Signed-off-by: Filippo Muzzini
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Filippo Muzzini
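
The deadlock pattern described above, a callee taking a non-recursive lock that its caller already holds, can be sketched with a toy lock. To keep the example runnable, `toy_try_lock` returns false at the point where a real spinlock would spin forever. All names here are hypothetical; this is not the BFQ locking code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lock {
    bool held;
};

/* Returns false where a real non-recursive lock would block forever. */
static bool toy_try_lock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Buggy callee: grabs the lock that the caller already holds. */
static bool callee_buggy(struct toy_lock *l)
{
    if (!toy_try_lock(l))
        return false; /* would be a self-deadlock */
    toy_unlock(l);
    return true;
}

/* Fixed callee: no lock/unlock pair, relies on the caller's lock. */
static bool callee_fixed(struct toy_lock *l)
{
    assert(l->held);
    return true;
}

/* The caller holds the lock across the call, as in the commit message. */
static bool caller(struct toy_lock *l, bool (*callee)(struct toy_lock *))
{
    bool ok;

    if (!toy_try_lock(l))
        return false;
    ok = callee(l);
    toy_unlock(l);
    return ok;
}
```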
     
  • Missed converting the bioset_integrity_create() bounce bio set
    call.

    Fixes: 338aa96d5661 ("block: convert bounce, q->bio_split to bioset_init()/mempool_init()")
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • All users have been converted to bioset_init(), so kill off the
    old API.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
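
The idea behind the "embedded" conversions in this series can be sketched with toy types: instead of holding a pointer to a separately allocated pool, the owner embeds the pool structure and initialises it in place. The names `toy_pool`, `toy_pool_init`, `toy_pool_exit` and `toy_driver` are hypothetical stand-ins and do not reproduce the signature of bioset_init() or any other kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a pool-like object; not the kernel's struct bio_set. */
struct toy_pool {
    int  size;
    bool ready;
};

/* Initialise a pool in place; returns 0 on success, -1 on bad input. */
static int toy_pool_init(struct toy_pool *p, int size)
{
    if (size <= 0)
        return -1;
    p->size = size;
    p->ready = true;
    return 0;
}

/* Tear down a pool that was initialised in place. */
static void toy_pool_exit(struct toy_pool *p)
{
    assert(p->ready);
    p->ready = false;
    p->size = 0;
}

/* The owner embeds the pool, so no separate allocation is needed. */
struct toy_driver {
    struct toy_pool pool;
};
```

One consequence of embedding is that the pool's lifetime is tied to its owner, which removes one allocation and one failure path from the owner's setup code.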
     
  • Convert XFS to embedded bio sets.

    Acked-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert btrfs to embedded bio sets.

    Acked-by: Chris Mason
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert block DIO code to embedded bio sets.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert the target code to embedded bio sets.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert dm to embedded bio sets.

    Acked-by: Mike Snitzer
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert md to embedded bio sets.

    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert bcache to embedded bio sets.

    Reviewed-by: Coly Li
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert lightnvm to embedded bio sets.

    Reviewed-by: Javier González
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert pktcdvd to embedded bio sets.

    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert drbd to embedded bio sets and mempools.

    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Convert the core block functionality to embedded bio sets.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Change to return true/false only for bool type return code.

    Signed-off-by: Chengguang Xu
    Signed-off-by: Jens Axboe

    Chengguang Xu
     
  • We already check for started commands in all callbacks, but we should
    also protect against already completed commands. Do this by taking
    the checks to common code.

    Acked-by: Josef Bacik
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • When a userspace client requests that an NBD device be disconnected, the
    DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
    will not inform userspace when a connection is closed.

    Unfortunately the flag was never cleared, so once a disconnect was
    requested the driver would thereafter never tell userspace about a
    closed connection. Thus when connections failed due to timeout, no
    attempt to reconnect was made and eventually the device would fail.

    Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
    DISCONNECTED flag) once all connections are closed.

    Reviewed-by: Josef Bacik
    Signed-off-by: Kevin Vigor
    Signed-off-by: Jens Axboe

    Kevin Vigor
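
The flag handling described above can be modelled in a few lines. In this sketch, notifications to userspace are suppressed while a disconnect is pending, and the pending flag is cleared (and a disconnected flag set) once the last connection closes. The structure, the `TOY_` flag names and the functions are hypothetical simplifications, not the nbd driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DISCONNECT_REQUESTED 0x1u
#define TOY_DISCONNECTED         0x2u

struct toy_nbd {
    unsigned int flags;
    int live_connections;
};

static void toy_request_disconnect(struct toy_nbd *d)
{
    d->flags |= TOY_DISCONNECT_REQUESTED;
}

/* Returns true if userspace should be told about this closed connection. */
static bool toy_connection_closed(struct toy_nbd *d)
{
    bool notify = !(d->flags & TOY_DISCONNECT_REQUESTED);

    assert(d->live_connections > 0);
    d->live_connections--;

    /* The fix: once all connections are closed, clear the request flag. */
    if (d->live_connections == 0 &&
        (d->flags & TOY_DISCONNECT_REQUESTED)) {
        d->flags &= ~TOY_DISCONNECT_REQUESTED;
        d->flags |= TOY_DISCONNECTED;
    }
    return notify;
}
```

Without the clearing step, every later call would keep returning false, which corresponds to the driver never reporting a closed connection again.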
     
  • throtl_select_dispatch() uses tg before checking it. Since tg may be
    NULL, this risks a NULL pointer dereference. Fix it by doing the
    check before the first use.

    Signed-off-by: Joseph Qi
    Signed-off-by: Liu Bo
    Signed-off-by: Jens Axboe

    Liu Bo
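
The ordering issue above comes down to testing a pointer before its first dereference. A minimal sketch, with a hypothetical `toy_group` type in place of the real throttling group:

```c
#include <assert.h>
#include <stddef.h>

struct toy_group {
    int pending; /* hypothetical count of queued items */
};

/* Returns the number of items dispatched from tg, or 0 if tg is NULL. */
static int toy_select_dispatch(struct toy_group *tg)
{
    int dispatched;

    /* The buggy order would read tg->pending before this test. */
    if (!tg)
        return 0;

    assert(tg->pending >= 0);
    dispatched = tg->pending;
    tg->pending = 0;
    return dispatched;
}
```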
     
  • Currently, kyber is very unfriendly to merging. kyber depends
    on the ctx rq_list to do merging; however, most of the time, it will
    not leave any requests in the ctx rq_list. This is because even if the
    tokens of one domain are used up, kyber will try to dispatch requests
    from other domains and flush the rq_list there.

    To improve this, we set up a kyber_ctx_queue (kcq), which is similar
    to ctx but has rq_lists for the different domains, and build the same
    mapping between kcq and khd as between ctx and hctx. Then we can
    merge, insert and dispatch for different domains separately. At the
    same time, we only flush the rq_list of a kcq when a domain token is
    obtained successfully. Then, if the tokens of one domain are used up,
    the requests can be left in the rq_list of that domain and may be
    merged with following I/O.

    The following are my test results on a machine with 8 cores and an
    INTEL SSDPEKKR128G7 NVMe card:

    fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8
    (values are seq/random)
    +--------+----------+-----------+------------+-----------------+---------+
    | patch? | bw(MB/s) | iops      | slat(usec) | clat(usec)      | merge   |
    +--------+----------+-----------+------------+-----------------+---------+
    | w/o    | 606/612  | 151k/153k | 6.89/7.03  | 3349.21/3305.40 | 0/0     |
    +--------+----------+-----------+------------+-----------------+---------+
    | w/     | 1083/616 | 277k/154k | 4.93/6.95  | 1830.62/3279.95 | 223k/3k |
    +--------+----------+-----------+------------+-----------------+---------+

    When numjobs is set to 16, the bw and iops can reach 1662MB/s and 425k
    on my platform.

    Signed-off-by: Jianchao Wang
    Tested-by: Holger Hoffstätte
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jianchao Wang
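
The per-domain queueing idea described above can be sketched with a toy model: each domain has its own request list, a list is flushed only when a token for that domain is available, and requests left behind can absorb adjacent I/O that arrives later. Everything here is a hypothetical simplification (fixed-size arrays, back merges only, one token flushing the whole list) and does not reproduce the kyber data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DOMAINS   2 /* e.g. reads and writes; illustrative only */
#define TOY_QUEUE_MAX 8

struct toy_rq {
    unsigned long sector;
    unsigned int nr_sectors;
};

struct toy_kcq {
    struct toy_rq rq_list[TOY_DOMAINS][TOY_QUEUE_MAX];
    int count[TOY_DOMAINS];
    int tokens[TOY_DOMAINS];
    int merges;
};

/* Queue rq in its domain, back-merging with the last request if adjacent. */
static bool toy_insert(struct toy_kcq *q, int dom, struct toy_rq rq)
{
    int n;

    assert(dom >= 0 && dom < TOY_DOMAINS);
    n = q->count[dom];

    if (n > 0) {
        struct toy_rq *last = &q->rq_list[dom][n - 1];

        if (last->sector + last->nr_sectors == rq.sector) {
            last->nr_sectors += rq.nr_sectors;
            q->merges++;
            return true;
        }
    }
    if (n == TOY_QUEUE_MAX)
        return false;
    q->rq_list[dom][n] = rq;
    q->count[dom] = n + 1;
    return true;
}

/* Flush one domain only if a token is available; returns requests flushed. */
static int toy_dispatch(struct toy_kcq *q, int dom)
{
    int flushed;

    assert(dom >= 0 && dom < TOY_DOMAINS);
    if (q->tokens[dom] == 0)
        return 0; /* leave requests queued so later I/O can merge */

    q->tokens[dom]--;
    flushed = q->count[dom];
    q->count[dom] = 0;
    return flushed;
}
```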
     
  • No functional changes in this patch, just a prep patch for utilizing
    this in an IO scheduler.

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
     

30 May, 2018

2 commits

  • Bsg holding a reference to the parent device may result in a crash if a
    bsg file handle is closed after the parent device driver has unloaded.

    Holding a reference is not really needed: the parent device must exist
    between bsg_register_queue and bsg_unregister_queue. Before the device
    goes away the caller does blk_cleanup_queue so that all in-flight
    requests to the device are gone and all new requests cannot pass beyond
    the queue. The queue itself is a refcounted object and it will stay
    alive with a bsg file.

    Based on analysis, previous patch and changelog from Anatoliy Glagolev.

    Reported-by: Anatoliy Glagolev
    Reviewed-by: James E.J. Bottomley
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Pull NVMe changes from Christoph:

    "Here is the current batch of nvme updates for 4.18, we have a few more
    patches in the queue, but I'd like to get this pile into your tree
    and linux-next ASAP.

    The biggest item is support for file-backed namespaces in the NVMe
    target from Chaitanya; in addition to that we mostly have small fixes
    from all the usual suspects."

    * 'nvme-4.18-2' of git://git.infradead.org/nvme:
      nvme: fixup memory leak in nvme_init_identify()
      nvme: fix KASAN warning when parsing host nqn
      nvmet-loop: use nr_phys_segments when map rq to sgl
      nvmet-fc: increase LS buffer count per fc port
      nvmet: add simple file backed ns support
      nvmet: remove duplicate NULL initialization for req->ns
      nvmet: make a few error messages more generic
      nvme-fabrics: allow duplicate connections to the discovery controller
      nvme-fabrics: centralize discovery controller defaults
      nvme-fabrics: remove unnecessary controller subnqn validation
      nvme-fc: remove setting DNR on exception conditions
      nvme-rdma: stop admin queue before freeing it
      nvme-pci: Fix AER reset handling
      nvme-pci: set nvmeq->cq_vector after alloc cq/sq
      nvme: host: core: fix precedence of ternary operator
      nvme: fix lockdep warning in nvme_mpath_clear_current_path

    Jens Axboe
     

29 May, 2018

13 commits