07 Sep, 2018

1 commit


17 Aug, 2018

1 commit

  • The value that struct cftype .write() method returns is then directly
    returned to userspace as the value returned by write() syscall, so it
    should be the number of bytes actually written (or consumed) and not zero.

    Returning zero from write() syscall makes programs like /bin/echo or bash
    spin.

    Signed-off-by: Maciej S. Szmigiero
    Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Maciej S. Szmigiero
     

09 May, 2018

1 commit

  • cfq and bfq have some internal fields that use sched_clock() which can
    trivially use ktime_get_ns() instead. Their timestamp fields in struct
    request can also use ktime_get_ns(), which resolves the 8 year old
    comment added by commit 28f4197e5d47 ("block: disable preemption before
    using sched_clock()").

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

09 Jan, 2018

1 commit

  • For each pair [device for which bfq is selected as I/O scheduler,
    group in blkio/io], bfq maintains a corresponding bfq group. Each such
    bfq group contains a set of async queues, with each async queue
    created on demand, i.e., when some I/O request arrives for it. On
    creation, an async queue gets an extra reference, to make sure that
    the queue is not freed as long as its bfq group exists. Accordingly,
    to allow the queue to be freed after the group exited, this extra
    reference must released on group exit.

    The above holds also for a bfq root group, i.e., for the bfq group
    corresponding to the root blkio/io root for a given device. Yet, by
    mistake, the references to the existing async queues of a root group
    are not released when the latter exits. This causes a memory leak when
    the instance of bfq for a given device exits. In a similar vein,
    bfqg_stats_xfer_dead is not executed for a root group.

    This commit fixes bfq_pd_offline so that the latter executes the above
    missing operations for a root group too.

    Reported-by: Holger Hoffstätte
    Reported-by: Guoqing Jiang
    Tested-by: Holger Hoffstätte
    Signed-off-by: Davide Ferrari
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

15 Nov, 2017

1 commit

  • BFQ currently creates, and updates, its own instance of the whole
    set of blkio statistics that cfq creates. Yet, from the comments
    of Tejun Heo in [1], it turned out that most of these statistics
    are meant/useful only for debugging. This commit makes BFQ create
    the latter, debugging statistics only if the option
    CONFIG_DEBUG_BLK_CGROUP is set.

    By doing so, this commit also enables BFQ to enjoy a high perfomance
    boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then
    BFQ has to update far fewer statistics, and, in particular, not the
    heaviest to update. To give an idea of the benefits, if
    CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and
    with 8 threads doing random I/O in parallel on null_blk (configured
    with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS
    (+30%). We have measured similar or even much higher boosts with other
    CPUs: e.g., +45% with an ARM CortexTM-A53 Octa-core. Our results have
    been obtained and can be reproduced very easily with the script in [1].

    [1] https://www.spinics.net/lists/linux-block/msg18943.html

    Suggested-by: Tejun Heo
    Suggested-by: Ulf Hansson
    Tested-by: Lee Tibbert
    Tested-by: Oleksandr Natalenko
    Signed-off-by: Luca Miccio
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Luca Miccio
     

02 Sep, 2017

1 commit


08 Jun, 2017

1 commit

  • In blk-cgroup, operations on blkg objects are protected with the
    request_queue lock. This is no more the lock that protects
    I/O-scheduler operations in blk-mq. In fact, the latter are now
    protected with a finer-grained per-scheduler-instance lock. As a
    consequence, although blkg lookups are also rcu-protected, blk-mq I/O
    schedulers may see inconsistent data when they access blkg and
    blkg-related objects. BFQ does access these objects, and does incur
    this problem, in the following case.

    The blkg_lookup performed in bfq_get_queue, being protected (only)
    through rcu, may happen to return the address of a copy of the
    original blkg. If this is the case, then the blkg_get performed in
    bfq_get_queue, to pin down the blkg, is useless: it does not prevent
    blk-cgroup code from destroying both the original blkg and all objects
    directly or indirectly referred by the copy of the blkg. BFQ accesses
    these objects, which typically causes a crash for NULL-pointer
    dereference of memory-protection violation.

    Some additional protection mechanism should be added to blk-cgroup to
    address this issue. In the meantime, this commit provides a quick
    temporary fix for BFQ: cache (when safe) blkg data that might
    disappear right after a blkg_lookup.

    In particular, this commit exploits the following facts to achieve its
    goal without introducing further locks. Destroy operations on a blkg
    invoke, as a first step, hooks of the scheduler associated with the
    blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
    consequence, for any blkg associated with the request queue an
    instance of BFQ is attached to, we are guaranteed that such a blkg is
    not destroyed, and that all the pointers it contains are consistent,
    while that instance is holding its bfqd->lock. A blkg_lookup performed
    with bfqd->lock held then returns a fully consistent blkg, which
    remains consistent until this lock is held. In more detail, this holds
    even if the returned blkg is a copy of the original one.

    Finally, also the object describing a group inside BFQ needs to be
    protected from destruction on the blkg_free of the original blkg
    (which invokes bfq_pd_free). This commit adds private refcounting for
    this object, to let it disappear only after no bfq_queue refers to it
    any longer.

    This commit also removes or updates some stale comments on locking
    issues related to blk-cgroup operations.

    Reported-by: Tomas Konir
    Reported-by: Lee Tibbert
    Reported-by: Marco Piazza
    Signed-off-by: Paolo Valente
    Tested-by: Tomas Konir
    Tested-by: Lee Tibbert
    Tested-by: Marco Piazza
    Signed-off-by: Jens Axboe

    Paolo Valente
     

19 Apr, 2017

1 commit

  • The BFQ I/O scheduler features an optimal fair-queuing
    (proportional-share) scheduling algorithm, enriched with several
    mechanisms to boost throughput and reduce latency for interactive and
    real-time applications. This makes BFQ a large and complex piece of
    code. This commit addresses this issue by splitting BFQ into three
    main, independent components, and by moving each component into a
    separate source file:
    1. Main algorithm: handles the interaction with the kernel, and
    decides which requests to dispatch; it uses the following two further
    components to achieve its goals.
    2. Scheduling engine (Hierarchical B-WF2Q+ scheduling algorithm):
    computes the schedule, using weights and budgets provided by the above
    component.
    3. cgroups support: handles group operations (creation, destruction,
    move, ...).

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente