23 Aug, 2018

1 commit

  • A previous commit removed the ability to have per-rq flags. We used
    those flags to maintain inflight counts. Since we don't have those
    anymore, we have to always maintain inflight counts, even if wbt is
    disabled. This is clearly suboptimal.

    Add a queue quiesce around changing the wbt latency settings from sysfs
    to work around this. With that, we can reliably put the enabled check in
    our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be
    consistent for the lifetime of the request.

    Fixes: c1c80384c8f ("block: remove external dependency on wbt_flags")
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Jens Axboe
     

12 Aug, 2018

1 commit

  • For legacy queues the only call of blkg_root_lookup() happens after
    bypass mode has been enabled. Since blkg_lookup() returns NULL for
    queues in bypass mode, modify the blkg_root_lookup() such that it
    no longer depends on bypass mode. Rename the function into
    blk_queue_root_blkg() as suggested by Tejun.

    Suggested-by: Tejun Heo
    Fixes: 6bad9b210a22 ("blkcg: Introduce blkg_root_lookup()")
    Signed-off-by: Bart Van Assche
    Cc: Tejun Heo
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

09 Aug, 2018

1 commit

  • Several block drivers call alloc_disk() followed by put_disk() if
    something fails before device_add_disk() is called without calling
    blk_cleanup_queue(). Make sure that also for this scenario a request
    queue is dissociated from the cgroup controller. This patch avoids
    that loading the parport_pc, paride and pf drivers triggers the
    following kernel crash:

    BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
    Read of size 4 at addr 0000000000000008 by task modprobe/744
    Call Trace:
    dump_stack+0x9a/0xeb
    kasan_report+0x139/0x350
    pi_init+0x42e/0x580 [paride]
    pf_init+0x2bb/0x1000 [pf]
    do_one_initcall+0x8e/0x405
    do_init_module+0xd9/0x2f2
    load_module+0x3ab4/0x4700
    SYSC_finit_module+0x176/0x1a0
    do_syscall_64+0xee/0x2b0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    Reported-by: Alexandru Moise
    Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
    Signed-off-by: Bart Van Assche
    Tested-by: Alexandru Moise
    Reviewed-by: Johannes Thumshirn
    Cc: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Alexandru Moise
    Cc: Joseph Qi
    Cc:
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

09 Jul, 2018

1 commit


31 May, 2018

1 commit


25 May, 2018

1 commit

  • Convert the S_ symbolic permissions to their octal equivalents as
    using octal and not symbolic permissions is preferred by many as more
    readable.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrapped modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches
     

15 May, 2018

1 commit


09 Mar, 2018

1 commit

  • Introduce functions that modify the queue flags and that protect
    these modifications with the request queue lock. Except for moving
    one wake_up_all() call from inside to outside a critical section,
    this patch does not change any functionality.

    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Ming Lei
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

01 Mar, 2018

1 commit

  • Avoid that the following race can occur:

    blk_cleanup_queue() blkcg_print_blkgs()
    spin_lock_irq(lock) (1) spin_lock_irq(blkg->q->queue_lock) (2,5)
    q->queue_lock = &q->__queue_lock (3)
    spin_unlock_irq(lock) (4)
    spin_unlock_irq(blkg->q->queue_lock) (6)

    (1) take driver lock;
    (2) busy loop for driver lock;
    (3) override driver lock with internal lock;
    (4) unlock driver lock;
    (5) can take driver lock now;
    (6) but unlock internal lock.

    This change is safe because only the SCSI core and the NVME core keep
    a reference on a request queue after having called blk_cleanup_queue().
    Neither driver accesses any of the removed data structures between its
    blk_cleanup_queue() and blk_put_queue() calls.

    Reported-by: Joseph Qi
    Signed-off-by: Bart Van Assche
    Reviewed-by: Joseph Qi
    Cc: Jan Kara
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

19 Jan, 2018

1 commit

  • The __blk_mq_register_dev(), blk_mq_unregister_dev(),
    elv_register_queue() and elv_unregister_queue() calls need to be
    protected with sysfs_lock but other code in these functions not.
    Hence protect only this code with sysfs_lock. This patch fixes a
    locking inversion issue in blk_unregister_queue() and also in an
    error path of blk_register_queue(): it is not allowed to hold
    sysfs_lock around the kobject_del(&q->kobj) call.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

15 Jan, 2018

2 commits

  • Since I can remember DM has forced the block layer to allow the
    allocation and initialization of the request_queue to be distinct
    operations. Reason for this is block/genhd.c:add_disk() has requires
    that the request_queue (and associated bdi) be tied to the gendisk
    before add_disk() is called -- because add_disk() also deals with
    exposing the request_queue via blk_register_queue().

    DM's dynamic creation of arbitrary device types (and associated
    request_queue types) requires the DM device's gendisk be available so
    that DM table loads can establish a master/slave relationship with
    subordinate devices that are referenced by loaded DM tables -- using
    bd_link_disk_holder(). But until these DM tables, and their associated
    subordinate devices, are known DM cannot know what type of request_queue
    it needs -- nor what its queue_limits should be.

    This chicken and egg scenario has created all manner of problems for DM
    and, at times, the block layer.

    Summary of changes:

    - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
    that drivers may use to add a disk without also calling
    blk_register_queue(). Driver must call blk_register_queue() once its
    request_queue is fully initialized.

    - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
    is not set. It won't be set if driver used add_disk_no_queue_reg()
    but driver encounters an error and must del_gendisk() before calling
    blk_register_queue().

    - Export blk_register_queue().

    These changes allow DM to use add_disk_no_queue_reg() to anchor its
    gendisk as the "master" for master/slave relationships DM must establish
    with subordinate devices referenced in DM tables that get loaded. Once
    all "slave" devices for a DM device are known its request_queue can be
    properly initialized and then advertised via sysfs -- important
    improvement being that no request_queue resource initialization
    performed by blk_register_queue() is missed for DM devices anymore.

    Signed-off-by: Mike Snitzer
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • The original commit e9a823fb34a8b (block: fix warning when I/O elevator
    is changed as request_queue is being removed) is pretty conflated.
    "conflated" because the resource being protected by q->sysfs_lock isn't
    the queue_flags (it is the 'queue' kobj).

    q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
    from racing with blk_unregister_queue():
    1) By holding q->sysfs_lock first, __elevator_change() can complete
    before a racing blk_unregister_queue().
    2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
    in case elv_iosched_store() loses the race with blk_unregister_queue(),
    it needs a way to know the 'queue' kobj isn't there.

    Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
    held until after the 'queue' kobj is removed.

    To do so blk_mq_unregister_dev() must not also take q->sysfs_lock. So
    rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().

    Also, blk_unregister_queue() should use q->queue_lock to protect against
    any concurrent writes to q->queue_flags -- even though chances are the
    queue is being cleaned up so no concurrent writes are likely.

    Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
    Signed-off-by: Mike Snitzer
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

24 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

29 Aug, 2017

1 commit

  • There is a race between changing I/O elevator and request_queue removal
    which can trigger the warning in kobject_add_internal. A program can
    use sysfs to request a change of elevator at the same time another task
    is unregistering the request_queue the elevator would be attached to.
    The elevator's kobject will then attempt to be connected to the
    request_queue in the object tree when the request_queue has just been
    removed from sysfs. This triggers the warning in kobject_add_internal
    as the request_queue no longer has a sysfs directory:

    kobject_add_internal failed for iosched (error: -2 parent: queue)
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0

    To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
    changing the elevator and use the request_queue's sysfs_lock to
    serialize between clearing the flag and the elevator testing the flag.

    Signed-off-by: David Jeffery
    Tested-by: Ming Lei
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    David Jeffery
     

15 Jun, 2017

1 commit

  • Avoid that the following complaint is reported:

    BUG: sleeping function called from invalid context at kernel/workqueue.c:2790
    in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3
    1 lock held by rcuop/3/41:
    #0: (rcu_callback){......}, at: [] rcu_nocb_kthread+0x282/0x500
    Call Trace:
    dump_stack+0x86/0xcf
    ___might_sleep+0x174/0x260
    __might_sleep+0x4a/0x80
    flush_work+0x7e/0x2e0
    __cancel_work_timer+0x143/0x1c0
    cancel_work_sync+0x10/0x20
    blk_throtl_exit+0x25/0x60
    blkcg_exit_queue+0x35/0x40
    blk_release_queue+0x42/0x130
    kobject_put+0xa9/0x190

    This happens since we invoke callbacks that need to block from the
    queue release handler. Fix this by pushing the final release to
    a workqueue.

    Reported-by: Ross Zwisler
    Fixes: commit b425e5049258 ("block: Avoid that blk_exit_rl() triggers a use-after-free")
    Signed-off-by: Bart Van Assche
    Tested-by: Ross Zwisler

    Updated changelog
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

02 Jun, 2017

1 commit

  • Since the introduction of .init_rq_fn() and .exit_rq_fn() it is
    essential that the memory allocated for struct request_queue
    stays around until all blk_exit_rl() calls have finished. Hence
    make blk_init_rl() take a reference on struct request_queue.

    This patch fixes the following crash:

    general protection fault: 0000 [#2] SMP
    CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G D 4.12.0-rc2-dbg+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    task: ffff88013a108040 task.stack: ffffc9000071c000
    RIP: 0010:free_request_size+0x1a/0x30
    RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202
    RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003
    RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418
    RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009
    R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8
    R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a
    FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0
    Call Trace:
    mempool_destroy.part.10+0x21/0x40
    mempool_destroy+0xe/0x10
    blk_exit_rl+0x12/0x20
    blkg_free+0x4d/0xa0
    __blkg_release_rcu+0x59/0x170
    rcu_process_callbacks+0x260/0x4e0
    __do_softirq+0x116/0x250
    smpboot_thread_fn+0x123/0x1e0
    kthread+0x109/0x140
    ret_from_fork+0x31/0x40

    Fixes: commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request")
    Signed-off-by: Bart Van Assche
    Acked-by: Tejun Heo
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Cc: Jan Kara
    Cc: # v4.11+
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

26 May, 2017

1 commit

  • The code in blk-mq-debugfs.c assumes that it is working on a blk-mq
    queue and is not intended to work on a blk-sq queue. Hence only
    register blk-mq debugfs attributes for blk-mq queues.

    Fixes: commit 9c1051aacde8 ("blk-mq: untangle debugfs and sysfs")
    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Reviewed-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

04 May, 2017

2 commits

  • Originally, I tied debugfs registration/unregistration together with
    sysfs. There's no reason to do this, and it's getting in the way of
    letting schedulers define their own debugfs attributes. Instead, tie the
    debugfs registration to the lifetime of the structures themselves.

    The saner lifetimes mean we can also get rid of the extra mq directory
    and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
    just nvme0n1/hctx0/tags.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Preparation for adding more declarations.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

27 Apr, 2017

1 commit

  • A later patch in this series will modify blk_mq_debugfs_register()
    such that it uses q->kobj.parent to determine the name of a
    request queue. Hence make sure that that pointer is initialized
    before blk_mq_debugfs_register() is called. To avoid lock inversion,
    protect sysfs / debugfs registration with the queue sysfs_lock
    instead of the global mutex all_q_mutex.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

19 Apr, 2017

1 commit

  • When CFQ is used as an elevator, it disables writeback throttling
    because they don't play well together. Later when a different elevator
    is chosen for the device, writeback throttling doesn't get enabled
    again as it should. Make sure CFQ enables writeback throttling (if it
    should be enabled by default) when we switch from it to another IO
    scheduler.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

09 Apr, 2017

1 commit


08 Apr, 2017

1 commit

  • We've added a considerable amount of fixes for stalls and issues
    with the blk-mq scheduling in the 4.11 series since forking
    off the for-4.12/block branch. We need to do improvements on
    top of that for 4.12, so pull in the previous fixes to make
    our lives easier going forward.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Apr, 2017

1 commit

  • In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
    back to the original scheduler. However, at this point, we've already
    torn down the original scheduler's tags, so this causes a crash. Doing
    the fallback like the legacy elevator path is much harder for mq, so fix
    it by just falling back to none, instead.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

29 Mar, 2017

2 commits

  • CONFIG_DEBUG_TEST_DRIVER_REMOVE found a possible leak of q->rq_wb when a
    request queue is reregistered. This has been a problem since wbt was
    introduced, but the WARN_ON(!list_empty(&stats->callbacks)) in the
    blk-stat rework exposed it. Fix it by cleaning up wbt when we unregister
    the queue.

    Fixes: 87760e5eef35 ("block: hook up writeback throttling")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Now that the remaining drivers have been converted to one request queue
    per gendisk, let's warn if a request queue gets registered more than
    once. This will catch future drivers which might do it inadvertently or
    any old drivers that I may have missed.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

28 Mar, 2017

2 commits

  • The throtl_slice is 100ms by default. This is a long time for SSD, a lot
    of IO can run. To make cgroups have smoother throughput, we choose a
    small value (20ms) for SSD.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • throtl_slice is important for blk-throttling. It's called slice
    internally but it really is a time window blk-throttling samples data.
    blk-throttling will make decision based on the samplings. An example is
    bandwidth measurement. A cgroup's bandwidth is measured in the time
    interval of throtl_slice.

    A small throtl_slice meanse cgroups have smoother throughput but burn
    more CPUs. It has 100ms default value, which is not appropriate for all
    disks. A fast SSD can dispatch a lot of IOs in 100ms. This patch makes
    it tunable.

    Since throtl_slice isn't a time slice, the sysfs name
    'throttle_sample_time' reflects its character better.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

22 Mar, 2017

2 commits

  • Currently, statistics are gathered in ~0.13s windows, and users grab the
    statistics whenever they need them. This is not ideal for both in-tree
    users:

    1. Writeback throttling wants its own dynamically sized window of
    statistics. Since the blk-stats statistics are reset after every
    window and the wbt windows don't line up with the blk-stats windows,
    wbt doesn't see every I/O.
    2. Polling currently grabs the statistics on every I/O. Again, depending
    on how the window lines up, we may miss some I/Os. It's also
    unnecessary overhead to get the statistics on every I/O; the hybrid
    polling heuristic would be just as happy with the statistics from the
    previous full window.

    This reworks the blk-stats infrastructure to be callback-based: users
    register a callback that they want called at a given time with all of
    the statistics from the window during which the callback was active.
    Users can dynamically bucketize the statistics. wbt and polling both
    currently use read vs. write, but polling can be extended to further
    subdivide based on request size.

    The callbacks are kept on an RCU list, and each callback has percpu
    stats buffers. There will only be a few users, so the overhead on the
    I/O completion side is low. The stats flushing is also simplified
    considerably: since the timer function is responsible for clearing the
    statistics, we don't have to worry about stale statistics.

    wbt is a trivial conversion. After the conversion, the windowing problem
    mentioned above is fixed.

    For polling, we register an extra callback that caches the previous
    window's statistics in the struct request_queue for the hybrid polling
    heuristic to use.

    Since we no longer have a single stats buffer for the request queue,
    this also removes the sysfs and debugfs stats entries. To replace those,
    we add a debugfs entry for the poll statistics.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • The stats buckets will become generic soon, so make the existing users
    use the common READ and WRITE definitions instead of one internal to
    blk-stat.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

03 Mar, 2017

1 commit

  • For legacy scheduling, we always call ioc_exit_icq() with both the
    ioc and queue lock held. This poses a problem for blk-mq with
    scheduling, since the queue lock isn't what we use in the scheduler.
    And since we don't need the queue lock held for ioc exit there,
    don't grab it and leave any extra locking up to the blk-mq scheduler.

    Reported-by: Paolo Valente
    Tested-by: Paolo Valente
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Feb, 2017

1 commit

  • When a new disk shows up, sysfs queue directory is created before elevator
    is registered. This allows a user to attempt a scheduler switch even though
    the initial registration hasn't completed yet.

    In one scenario, blk_register_queue() calls elv_register_queue() and
    right before cfq_registered_queue() is called, another process executes
    elevator_switch() and replaces q->elevator with deadline scheduler. When
    cfq_registered_queue() executes it interprets e->elevator_data as struct
    cfq_data even though it is actually struct deadline_data.

    Grab q->sysfs_lock in blk_register_queue() to synchronize with sysfs
    callers.

    Signed-off-by: Tahsin Erdogan
    Signed-off-by: Jens Axboe

    Tahsin Erdogan
     

09 Feb, 2017

1 commit

  • Add a new merge strategy that merges discard bios into a request until the
    maximum number of discard ranges (or the maximum discard size) is reached
    from the plug merging code. I/O scheduler merging is not wired up yet
    but might also be useful, although not for fast devices like NVMe which
    are the only user for now.

    Note that for now we don't support limiting the size of each discard range,
    but if needed that can be added later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Feb, 2017

1 commit

  • I noticed that when booting with a default blk-mq I/O scheduler, the
    /sys/block/*/queue/iosched directory was missing. However, switching
    after boot did create the directory. This is because we skip the initial
    elevator register/unregister when we don't have a ->request_fn(), but we
    should still do it for the ->mq_ops case.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

03 Feb, 2017

1 commit


02 Feb, 2017

2 commits

  • Instead of storing backing_dev_info inside struct request_queue,
    allocate it dynamically, reference count it, and free it when the last
    reference is dropped. Currently only request_queue holds the reference
    but in the following patch we add other users referencing
    backing_dev_info.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • We will want to have struct backing_dev_info allocated separately from
    struct request_queue. As the first step add pointer to backing_dev_info
    to request_queue and convert all users touching it. No functional
    changes in this patch.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

28 Jan, 2017

1 commit


14 Dec, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main block pull request this series. Contrary to previous
    release, I've kept the core and driver changes in the same branch. We
    always ended up having dependencies between the two for obvious
    reasons, so makes more sense to keep them together. That said, I'll
    probably try and keep more topical branches going forward, especially
    for cycles that end up being as busy as this one.

    The major parts of this pull request is:

    - Improved support for O_DIRECT on block devices, with a small
    private implementation instead of using the pig that is
    fs/direct-io.c. From Christoph.

    - Request completion tracking in a scalable fashion. This is utilized
    by two components in this pull, the new hybrid polling and the
    writeback queue throttling code.

    - Improved support for polling with O_DIRECT, adding a hybrid mode
    that combines pure polling with an initial sleep. From me.

    - Support for automatic throttling of writeback queues on the block
    side. This uses feedback from the device completion latencies to
    scale the queue on the block side up or down. From me.

    - Support from SMR drives in the block layer and for SD. From Hannes
    and Shaun.

    - Multi-connection support for nbd. From Josef.

    - Cleanup of request and bio flags, so we have a clear split between
    which are bio (or rq) private, and which ones are shared. From
    Christoph.

    - A set of patches from Bart, that improve how we handle queue
    stopping and starting in blk-mq.

    - Support for WRITE_ZEROES from Chaitanya.

    - Lightnvm updates from Javier/Matias.

    - Supoort for FC for the nvme-over-fabrics code. From James Smart.

    - A bunch of fixes from a whole slew of people, too many to name
    here"

    * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
    blk-stat: fix a few cases of missing batch flushing
    blk-flush: run the queue when inserting blk-mq flush
    elevator: make the rqhash helpers exported
    blk-mq: abstract out blk_mq_dispatch_rq_list() helper
    blk-mq: add blk_mq_start_stopped_hw_queue()
    block: improve handling of the magic discard payload
    blk-wbt: don't throttle discard or write zeroes
    nbd: use dev_err_ratelimited in io path
    nbd: reset the setup task for NBD_CLEAR_SOCK
    nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
    nvme-fabrics: Add target support for FC transport
    nvme-fabrics: Add host support for FC transport
    nvme-fabrics: Add FC transport LLDD api definitions
    nvme-fabrics: Add FC transport FC-NVME definitions
    nvme-fabrics: Add FC transport error codes to nvme.h
    Add type 0x28 NVME type code to scsi fc headers
    nvme-fabrics: patch target code in prep for FC transport support
    nvme-fabrics: set sqe.command_id in core not transports
    parser: add u64 number parser
    nvme-rdma: align to generic ib_event logging helper
    ...

    Linus Torvalds