16 Dec, 2011

1 commit

  • While probing, fd sets up the queue, probes the hardware and tears down
    the queue if probing fails. In the process, blk_drain_queue() kicks the
    queue that failed to finish initialization, and fd is unhappy about
    that.

    floppy0: no floppy controllers found
    ------------[ cut here ]------------
    WARNING: at drivers/block/floppy.c:2929 do_fd_request+0xbf/0xd0()
    Hardware name: To Be Filled By O.E.M.
    VFS: do_fd_request called on non-open device
    Modules linked in:
    Pid: 1, comm: swapper Not tainted 3.2.0-rc4-00077-g5983fe2 #2
    Call Trace:
    [] warn_slowpath_common+0x7a/0xb0
    [] warn_slowpath_fmt+0x41/0x50
    [] do_fd_request+0xbf/0xd0
    [] blk_drain_queue+0x65/0x80
    [] blk_cleanup_queue+0xe3/0x1a0
    [] floppy_init+0xdeb/0xe28
    [] ? daring+0x6b/0x6b
    [] do_one_initcall+0x3f/0x170
    [] kernel_init+0x9d/0x11e
    [] ? schedule_tail+0x22/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? start_kernel+0x2be/0x2be
    [] ? gs_change+0xb/0xb

    Avoid it by making blk_drain_queue() kick the queue only if the dispatch
    queue has something on it.
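
    A minimal sketch of the resulting check (not necessarily the exact hunk;
    it only illustrates the idea):

    /* inside blk_drain_queue(), with queue_lock held */
    if (!list_empty(&q->queue_head))
            __blk_run_queue(q);     /* kick only when there is work to dispatch */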

    Signed-off-by: Tejun Heo
    Reported-by: Ralf Hildebrandt
    Reported-by: Wu Fengguang
    Tested-by: Sergei Trofimovich
    Signed-off-by: Jens Axboe

    Tejun Heo
     

23 Nov, 2011

1 commit

  • struct request_queue is allocated with __GFP_ZERO so its "node" field is
    zero before initialization. This causes an oops if node 0 is offline in
    the page allocator because its zonelists are not initialized. From Dave
    Young's dmesg:

    SRAT: Node 1 PXM 2 0-d0000000
    SRAT: Node 1 PXM 2 100000000-330000000
    SRAT: Node 0 PXM 1 330000000-630000000
    Initmem setup node 1 0000000000000000-000000000affb000
    ...
    Built 1 zonelists in Node order, mobility grouping on.
    ...
    BUG: unable to handle kernel paging request at 0000000000001c08
    IP: [] __alloc_pages_nodemask+0xb5/0x870

    and __alloc_pages_nodemask+0xb5 translates to a NULL pointer
    dereference of zonelist->_zonerefs.

    The fix is to initialize q->node at the time of allocation so the correct
    node is passed to the slab allocator later.

    Since blk_init_allocated_queue_node() is no longer needed, merge it with
    blk_init_allocated_queue().
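
    A minimal sketch of the fix, assuming the usual shape of
    blk_alloc_queue_node():

    struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
    {
            struct request_queue *q;

            q = kmem_cache_alloc_node(blk_requestq_cachep,
                                      gfp_mask | __GFP_ZERO, node_id);
            if (!q)
                    return NULL;

            q->node = node_id;      /* record the NUMA node before anything
                                       else allocates against q->node */
            ...
    }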

    [rientjes@google.com: changelog, initializing q->node]
    Cc: stable@vger.kernel.org [2.6.37+]
    Reported-by: Dave Young
    Signed-off-by: Mike Snitzer
    Signed-off-by: David Rientjes
    Tested-by: Dave Young
    Signed-off-by: Jens Axboe

    Mike Snitzer
     


04 Nov, 2011

1 commit

  • blk_cleanup_queue() may be called before the elevator is set up on a
    queue, which triggers the following oops.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] elv_drain_elevator+0x1c/0x70
    ...
    Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590
    Bochs Bochs
    RIP: 0010:[] [] elv_drain_elevator+0x1c/0x70
    ...
    Call Trace:
    [] blk_drain_queue+0x42/0x70
    [] blk_cleanup_queue+0xd0/0x1c0
    [] md_free+0x50/0x70
    [] kobject_release+0x8b/0x1d0
    [] kref_put+0x36/0xa0
    [] kobject_put+0x27/0x60
    [] mddev_delayed_delete+0x2f/0x40
    [] process_one_work+0x100/0x3b0
    [] worker_thread+0x15f/0x3a0
    [] kthread+0x87/0x90
    [] kernel_thread_helper+0x4/0x10

    Fix it by making blk_cleanup_queue() check whether q->elevator is set
    up before invoking blk_drain_queue().
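
    Roughly, the guard looks like this (a sketch, not the exact patch):

    /* in blk_cleanup_queue(): the elevator may not be set up yet */
    if (q->elevator)
            blk_drain_queue(q, true);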

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Jens Axboe

    Tejun Heo
     

24 Oct, 2011

3 commits

  • Jens Axboe
     
  • A dm-multipath user reported[1] a problem when trying to boot
    a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
    (block: fix flush machinery for stacking drivers with differring
    flush flags) applied. It turns out that an empty flush request
    can be sent into blk_insert_flush. When the BUG_ON was fixed
    to allow for this, I/O on the underlying device would stall. The
    reason is that blk_insert_cloned_request does not kick the queue.
    In the aforementioned commit, I had added a special case to
    kick the queue if data was sent down but the queue flags did
    not require a flush. A better solution is to push the queue
    kick up into blk_insert_cloned_request.

    This patch, along with a follow-on which fixes the BUG_ON, fixes
    the issue reported.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html
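
    A sketch of where the kick ends up, assuming the usual locking in
    blk_insert_cloned_request():

    spin_lock_irqsave(q->queue_lock, flags);
    ...
    add_acct_request(q, rq, ELEVATOR_INSERT_BACK);
    __blk_run_queue(q);             /* kick the queue so cloned requests,
                                       including empty flushes, make progress */
    spin_unlock_irqrestore(q->queue_lock, flags);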

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • bio originally had the functionality to set the completion CPU, but
    it is broken.

    Christoph said that "This code is unused, and from all the
    discussions lately pretty obviously broken. The only thing keeping
    it serves is creating more confusion and possibly more bugs."

    And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
    with leaving cpu control to the request based drivers, they are the
    only ones that can toggle the setting anyway".

    So this patch removes all the work of controlling the completion CPU
    from a bio.

    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     

19 Oct, 2011

7 commits

  • request_queue is refcounted but actually depends on lifetime
    management from the queue owner - on blk_cleanup_queue(), the block
    layer expects that there's no request passing through the
    request_queue and no new ones will be issued.

    This is fundamentally broken. The queue owner (e.g. the SCSI layer)
    doesn't have a way to know whether there are other active users before
    calling blk_cleanup_queue(), and other users (e.g. bsg) have no
    guarantee that the queue is and will stay valid while they hold a
    reference.

    With a delay added in blk_queue_bio() before queue_lock is grabbed, the
    following oops can easily be triggered when a device is removed with
    in-flight IOs.

    sd 0:0:1:0: [sdb] Stopping disk
    ata1.01: disabled
    general protection fault: 0000 [#1] PREEMPT SMP
    CPU 2
    Modules linked in:

    Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
    RIP: 0010:[] [] elv_rqhash_find+0x61/0x100
    ...
    Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
    ...
    Call Trace:
    [] elv_merge+0x84/0xe0
    [] blk_queue_bio+0xf4/0x400
    [] generic_make_request+0xca/0x100
    [] submit_bio+0x74/0x100
    [] dio_bio_submit+0xbc/0xc0
    [] __blockdev_direct_IO+0x92e/0xb40
    [] blkdev_direct_IO+0x57/0x60
    [] generic_file_aio_read+0x6d5/0x760
    [] do_sync_read+0xda/0x120
    [] vfs_read+0xc5/0x180
    [] sys_pread64+0x9a/0xb0
    [] system_call_fastpath+0x16/0x1b

    This happens because blk_cleanup_queue() destroys the queue and
    elevator whether IOs are in progress or not, and DEAD tests are
    sprinkled in the request processing path without proper
    synchronization.

    A similar problem exists for blk-throtl. On queue cleanup, blk-throtl
    is shut down whether it has requests in it or not. Depending on
    timing, it either oopses or throttled bios are lost, putting tasks
    which are waiting for bio completion into an eternal D state.

    The way it should work is to have the usual clear distinction between
    shutdown and release. Shutdown drains all currently pending requests,
    marks the queue dead, and performs partial teardown of the now
    unnecessary parts of the queue. Even after shutdown is complete,
    reference holders are still allowed to issue requests to the queue,
    although they will be immediately failed. The rest of the teardown
    happens on release.

    This patch makes the following changes to make blk_cleanup_queue()
    behave as a proper shutdown.

    * QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
    queue_lock.

    * Unsynchronized DEAD check in generic_make_request_checks() removed.
    This couldn't make any meaningful difference as the queue could die
    after the check.

    * blk_drain_queue() updated such that it can drain all requests and is
    now called during cleanup.

    * blk_throtl updated such that it checks DEAD on grabbing queue_lock,
    drains all throttled bios during cleanup and frees td when the queue
    is released.
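
    The shutdown half can be outlined roughly as follows (a sketch using the
    names from the text above; the exact calls and arguments may differ):

    /* blk_cleanup_queue() acting as a proper shutdown */
    mutex_lock(&q->exit_mutex);
    spin_lock_irq(q->queue_lock);
    queue_flag_set(QUEUE_FLAG_DEAD, q);     /* no new requests accepted */
    spin_unlock_irq(q->queue_lock);
    mutex_unlock(&q->exit_mutex);

    blk_drain_queue(q, true);               /* drain everything, including
                                               throttled bios */
    /* partial teardown here; the rest happens on release */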

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • attempt_plug_merge() accesses the elevator without holding queue_lock
    and may call into ->elevator_bio_merge_fn(). The elevator is guaranteed
    to be valid because it's accessed only if the plugged list has requests
    and the elevator is never exited with live requests, so as long as the
    elevator method can deal with unlocked access, this is safe.

    Explain the sync rules around attempt_plug_merge() and drop the
    unnecessary @tsk parameter.

    This patch doesn't introduce any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently get_request[_wait]() allocates a request whether the queue
    is dead or not. This patch makes get_request[_wait]() return NULL if
    @q is dead. blk_queue_bio() is updated to fail the submitted bio if
    request allocation fails. While at it, add docbook comments for
    get_request[_wait]().

    Note that the current code has a rather unclear assumption (there are
    spurious DEAD tests scattered around) that the owner of a queue
    guarantees that no request travels through the block layer if the
    queue is dead. This patch in itself doesn't change much; however, it
    will allow fixing the broken assumption in the next patch.
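
    A sketch of the new failure mode (the exact check may differ):

    static struct request *get_request(struct request_queue *q, int rw_flags,
                                       struct bio *bio, gfp_t gfp_mask)
    {
            if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
                    return NULL;    /* queue is being torn down */
            ...
    }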

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_throtl_bio() and throtl_get_tg() have a rather unusual interface.

    * throtl_get_tg() returns a pointer to a valid tg or ERR_PTR(-ENODEV),
    and drops queue_lock in the latter case. A different locking context
    depending on the return value is error-prone, and the DEAD state is
    scheduled to be protected by queue_lock anyway. Move the DEAD check
    inside queue_lock and return a valid tg or NULL.

    * blk_throtl_bio() indicates its status both with its return value
    and through the in/out param **@bio. The former indicates whether the
    queue was found to be dead during throtl processing; the latter
    whether the bio was throttled.

    There's no point in returning the DEAD check result from
    blk_throtl_bio(). The queue can die after blk_throtl_bio() has
    finished but before make_request_fn() grabs the queue lock.

    Make it take *@bio instead and return a boolean result indicating
    whether the bio is throttled or not.
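
    In other words, the interface becomes something like this (a sketch):

    bool blk_throtl_bio(struct request_queue *q, struct bio *bio);

    /* caller side, in the make_request path */
    if (blk_throtl_bio(q, bio))
            return;         /* bio was throttled and queued by blk-throttle */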

    This patch doesn't cause any visible functional difference.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Reorganize queue draining related code in preparation of queue exit
    changes.

    * Factor out actual draining from elv_quiesce_start() to
    blk_drain_queue().

    * Make elv_quiesce_start/end() responsible for their own locking.

    * Replace open-coded ELVSWITCH clearing in elevator_switch() with
    elv_quiesce_end().

    This patch doesn't cause any visible functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_alloc_request() and freed_request() take different combinations of
    REQ_* @flags, @priv and @is_sync even though @flags is a superset of
    the latter two. Make them take @flags only. This cleans up the code a
    bit and will ease updating allocation-related REQ_* flags.

    This patch doesn't introduce any functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Conflicts:
    block/blk-core.c
    include/linux/blkdev.h

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Sep, 2011

1 commit

  • A kernel crash is observed when a mounted ext3/ext4 filesystem is
    physically removed. The problem is that blk_cleanup_queue() frees up
    resources (e.g. by calling elevator_exit()) that are not checked for
    in normal operation. So we should rather move these calls to the
    destructor function blk_release_queue(), as at that point all
    remaining references are gone. However, in doing so we have to ensure
    that any externally supplied queue_lock is disconnected, as the driver
    might free the lock after the call to blk_cleanup_queue().
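
    The lock handover can be sketched like this (assuming the queue keeps an
    internal __queue_lock to fall back on):

    spinlock_t *lock = q->queue_lock;

    spin_lock_irq(lock);
    if (q->queue_lock != &q->__queue_lock)
            q->queue_lock = &q->__queue_lock;       /* stop using the
                                                       driver-supplied lock */
    spin_unlock_irq(lock);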

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

21 Sep, 2011

1 commit

  • Thus spake Andrew Morton:

    "And I have the usual maintainability whine. If someone comes up to
    vmscan.c and sees it calling blk_start_plug(), how are they supposed to
    work out why that call is there? They go look at the blk_start_plug()
    definition and it is undocumented. I think we can do better than this?"

    Adapted from the LWN article - http://lwn.net/Articles/438256/ by Jens
    Axboe and from an earlier attempt by Shaohua Li to document blk-plug.
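
    For reference, the basic usage pattern being documented looks like this
    (submit_batch_of_bios() is a hypothetical stand-in for real submissions):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* I/O submitted here is held on the per-task plug list instead of
     * being sent to the device queues one request at a time */
    submit_batch_of_bios();
    blk_finish_plug(&plug);         /* flush the plugged requests to the queues */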

    [akpm@linux-foundation.org: grammatical and spelling tweaks]
    Signed-off-by: Suresh Jayaraman
    Cc: Shaohua Li
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Suresh Jayaraman
     

15 Sep, 2011

1 commit

  • Move all the checks performed on a bio into a new helper, and call it
    as soon as the bio is submitted, even if it is a re-submission from
    ->make_request.

    We explicitly mark the new helper as being non-inlined, as the stack
    usage for printing the block device name in the failure case is quite
    high and this is a patch where we have to be extremely conservative
    about stack usage.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     


24 Aug, 2011

2 commits

  • Clean up the code a little bit. attempt_plug_merge() traverses the
    plug list anyway, so we can do the request counting there, which
    reduces stack size a little bit.
    The motivation here is that I suspect we should count the requests for
    each queue (a task could handle multiple disks in the meantime), but
    my tests don't show it's worth doing. If somebody proves we should do
    it, the change below will make that easier.

    Signed-off-by: Shaohua Li
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Do blk_flush_plug_list() first and then add the new request at the
    tail. The new request can't be merged into existing requests, but
    later new requests might be merged with this new one. If
    blk_flush_plug_list() is done later, the merge doesn't happen.
    Believe it or not, this fixes a 10% regression running a sysbench
    workload.

    Signed-off-by: Shaohua Li
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

20 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
    Revert "cfq: Remove special treatment for metadata rqs."
    block: fix flush machinery for stacking drivers with differring flush flags
    block: improve rq_affinity placement
    blktrace: add FLUSH/FUA support
    Move some REQ flags to the common bio/request area
    allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
    xen/blkback: Make description more obvious.
    cfq-iosched: Add documentation about idling
    block: Make rq_affinity = 1 work as expected
    block: swim3: fix unterminated of_device_id table
    block/genhd.c: remove useless cast in diskstats_show()
    drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
    drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
    bsg-lib: add module.h include
    cfq-iosched: Reduce linked group count upon group destruction
    blk-throttle: correctly determine sync bio
    loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
    loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
    loop: add management interface for on-demand device allocation
    loop: replace linked list of allocated devices with an idr index
    ...

    Linus Torvalds
     

16 Aug, 2011

1 commit

  • Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
    FLUSH/FUA to support merge, introduced a performance regression when
    running any sort of fsyncing workload using dm-multipath and certain
    storage (in our case, an HP EVA). The test I ran was fs_mark, and it
    dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
    that dm-multipath always advertised flush+fua support, and passed
    commands on down the stack, where those flags used to get stripped off.
    The above commit changed that behavior:

    static inline struct request *__elv_next_request(struct request_queue *q)
    {
    struct request *rq;

    while (1) {
    - while (!list_empty(&q->queue_head)) {
    + if (!list_empty(&q->queue_head)) {
    rq = list_entry_rq(q->queue_head.next);
    - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
    - (rq->cmd_flags & REQ_FLUSH_SEQ))
    - return rq;
    - rq = blk_do_flush(q, rq);
    - if (rq)
    - return rq;
    + return rq;
    }

    Note that previously, a command would come in here, have
    REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

    struct request *blk_do_flush(struct request_queue *q, struct request *rq)
    {
    unsigned int fflags = q->flush_flags; /* may change, cache it */
    bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
    bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
    bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
    REQ_FUA);
    unsigned skip = 0;
    ...
    if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
    rq->cmd_flags &= ~REQ_FLUSH;
    if (!has_fua)
    rq->cmd_flags &= ~REQ_FUA;
    return rq;
    }

    So, the flush machinery was bypassed in such cases (q->flush_flags == 0
    && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

    Now, however, we don't get into the flush machinery at all. Instead,
    __elv_next_request just hands a request with flush and fua bits set to
    the scsi_request_fn, even if the underlying request_queue does not
    support flush or fua.

    The agreed upon approach is to fix the flush machinery to allow
    stacking. While this isn't used in practice (since there is only one
    request-based dm target, and that target will now reflect the flush
    flags of the underlying device), it does future-proof the solution, and
    make it function as designed.

    In order to make this work, I had to add a field to the struct request,
    inside the flush structure (to store the original req->end_io). Shaohua
    had suggested overloading the union with rb_node and completion_data,
    but the completion data is used by device mapper and can also be used by
    other drivers. So, I didn't see a way around the additional field.
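
    The addition is essentially one pointer inside the existing flush
    bookkeeping (a sketch; surrounding members elided):

    struct request {
            ...
            struct {
                    unsigned int            seq;
                    struct list_head        list;
                    rq_end_io_fn            *saved_end_io;  /* original end_io,
                                                               restored after the
                                                               flush sequence */
            } flush;
            ...
    };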

    I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
    the lost performance. Comments and other testers, as always, are
    appreciated.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

04 Aug, 2011

1 commit

  • init_fault_attr_dentries() is used to export fault_attr via debugfs,
    but it can only export it in the debugfs root directory.

    Per Forlin is working on mmc_fail_request, which adds support for
    injecting data errors after a completed host transfer in the MMC
    subsystem.

    The fault_attr for mmc_fail_request should be defined per mmc host and
    exported in a per-host debugfs directory, e.g.
    /sys/kernel/debug/mmc0/mmc_fail_request.

    init_fault_attr_dentries() doesn't help for mmc_fail_request, so this
    introduces fault_create_debugfs_attr(), which can create the directory
    under an arbitrary parent directory, and replaces
    init_fault_attr_dentries().
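
    Usage then looks roughly like this (the mmc-side names are hypothetical):

    struct dentry *dir;

    dir = fault_create_debugfs_attr("fail_mmc_request",
                                    host_debugfs_root, &host->fail_attr);
    if (IS_ERR(dir))
            return PTR_ERR(dir);    /* debugfs unavailable or creation failed */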

    [akpm@linux-foundation.org: extraneous semicolon, per Randy]
    Signed-off-by: Akinobu Mita
    Tested-by: Per Forlin
    Cc: Jens Axboe
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Randy Dunlap
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

27 Jul, 2011

1 commit

  • This changes should_fail_request() into a more usable wrapper around
    should_fail(). It avoids putting #ifdef CONFIG_FAIL_MAKE_REQUEST in
    the middle of a function.
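
    The resulting shape is roughly (a sketch of the wrapper and its stub):

    #ifdef CONFIG_FAIL_MAKE_REQUEST
    static DECLARE_FAULT_ATTR(fail_make_request);

    static bool should_fail_request(struct hd_struct *part, unsigned int bytes)
    {
            return part->make_it_fail && should_fail(&fail_make_request, bytes);
    }
    #else
    static inline bool should_fail_request(struct hd_struct *part,
                                           unsigned int bytes)
    {
            return false;   /* fault injection compiled out */
    }
    #endif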

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

26 Jul, 2011

2 commits

  • After commit 5757a6d7 introduced an unsafe call to
    smp_processor_id(), with preempt debugging turned on we spew a lot of:

    BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514
    caller is __make_request+0x1b8/0x308
    [] (unwind_backtrace+0x0/0xe8) from [] (debug_smp_processor_id+0xbc/0xf0)
    [] (debug_smp_processor_id+0xbc/0xf0) from [] (__make_request+0x1b8/0x308)
    [] (__make_request+0x1b8/0x308) from [] (generic_make_request+0x4dc/0x558)
    [] (generic_make_request+0x4dc/0x558) from [] (submit_bio+0x114/0x138)
    [] (submit_bio+0x114/0x138) from [] (submit_bh+0x148/0x16c)
    [] (submit_bh+0x148/0x16c) from [] (__sync_dirty_buffer+0x88/0xd8)
    [] (__sync_dirty_buffer+0x88/0xd8) from [] (journal_commit_transaction+0x1198/0x1688)
    [] (journal_commit_transaction+0x1198/0x1688) from [] (kjournald+0xb4/0x224)
    [] (kjournald+0xb4/0x224) from [] (kthread+0x8c/0x94)
    [] (kthread+0x8c/0x94) from [] (kernel_thread_exit+0x0/0x8)

    Fix this by just using raw_smp_processor_id(); it's just a hint
    after all. There's no pinning of the CPU or access to per-cpu
    structures involved.
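
    The fix boils down to using the raw accessor where the value is only a
    hint (a sketch of the relevant line in __make_request()):

    if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
            req->cpu = raw_smp_processor_id();      /* being preempted right
                                                       after this read is harmless */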

    Reported-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
    block: strict rq_affinity
    backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
    block: fix patch import error in max_discard_sectors check
    block: reorder request_queue to remove 64 bit alignment padding
    CFQ: add think time check for group
    CFQ: add think time check for service tree
    CFQ: move think time check variables to a separate struct
    fixlet: Remove fs_excl from struct task.
    cfq: Remove special treatment for metadata rqs.
    block: document blk_plug list access
    block: avoid building too big plug list
    compat_ioctl: fix make headers_check regression
    block: eliminate potential for infinite loop in blkdev_issue_discard
    compat_ioctl: fix warning caused by qemu
    block: flush MEDIA_CHANGE from drivers on close(2)
    blk-throttle: Make total_nr_queued unsigned
    block: Add __attribute__((format(printf...) and fix fallout
    fs/partitions/check.c: make local symbols static
    block:remove some spare spaces in genhd.c
    block:fix the comment error in blkdev.h
    ...

    Linus Torvalds
     

24 Jul, 2011

1 commit

  • Some systems benefit from completions always being steered to the
    strict requester cpu rather than the looser "per-socket" steering that
    blk_cpu_to_group() attempts by default. This is because the first
    CPU in the group mask ends up being completely overloaded with work,
    while the others (including the original submitter) have power left
    to spare.

    Allow the strict mode to be set by writing '2' to the sysfs control
    file. This is identical to the scheme used for the nomerges file,
    where '2' is a more aggressive setting than just being turned on.

    echo 2 > /sys/block/<dev>/queue/rq_affinity

    Cc: Christoph Hellwig
    Cc: Roland Dreier
    Tested-by: Dave Jiang
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     

22 Jul, 2011

1 commit

  • USB surprise removal of sr is triggering an oops in
    scsi_dispatch_command(). What seems to be happening is that USB is
    hanging on to a queue reference until the last close of the upper
    device, so the crash is caused by surprise remove of a mounted CD
    followed by attempted unmount.

    The problem is that USB doesn't issue its final commands as part of
    the SCSI teardown path, but on last close when the block queue is long
    gone. The long term fix is probably to make sr do the teardown in the
    same way as sd (so remove all the lower bits on ejection, but keep the
    upper disk alive until last close of user space). However, the
    current oops can be simply fixed by not allowing any commands to be
    sent to a dead queue.

    Cc: stable@kernel.org
    Signed-off-by: James Bottomley

    James Bottomley
     

08 Jul, 2011

1 commit

  • When I test a fio script with a big I/O depth, I find the total
    throughput drops compared to some relatively small I/O depths. The
    reason is that the thread accumulates big requests in its plug list
    and causes some delays (surely this depends on CPU speed).
    I thought we'd better have a threshold for requests. When the
    threshold is reached, it means there is no request merge and queue
    lock contention isn't severe when pushing per-task requests to the
    queue, so the main advantages of blk plug don't exist. We can force a
    plug list flush in this case.
    With this, my test throughput actually increases and is almost equal
    to the small I/O depth case. Another side effect is that irq-off time
    decreases in blk_flush_plug_list() for big I/O depths.
    The BLK_MAX_REQUEST_COUNT is chosen arbitrarily, but 16 is efficient
    at reducing lock contention for me. But I'm open here, 32 is ok in my
    test too.
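
    The threshold check amounts to something like this in the plugging path
    (a sketch; the plug->count bookkeeping is an assumption):

    #define BLK_MAX_REQUEST_COUNT   16

    if (plug->count >= BLK_MAX_REQUEST_COUNT)
            blk_flush_plug_list(plug, false);       /* push the batch down before
                                                       it grows too large */
    plug->count++;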

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     


23 May, 2011

1 commit

  • Commit 73c101011926 ("block: initial patch for on-stack per-task
    plugging") removed calls to elv_bio_merged() when @bio was merged with
    @req. Re-add them.

    This in turn will update the merged stats in the associated group.
    That should be safe as long as the request holds a reference to the
    blkio_group.

    Signed-off-by: Namhyung Kim
    Cc: Divyesh Shah
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

21 May, 2011

2 commits

  • We don't need them anymore, so kill:

    - REQ_ON_PLUG checks in various places
    - !rq_mergeable() check in plug merging

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently, all the cfq_group or throtl_group allocations happen while
    we are holding ->queue_lock, where sleeping is not allowed.

    Soon, we will move to per-cpu stats and will also need to allocate the
    per-group stats. Since alloc_percpu() can sleep and thus cannot be
    called from atomic context, we need to drop ->queue_lock, allocate the
    group, retake the lock and continue processing.

    In the throttling code, I check the queue DEAD flag again to make sure
    that the driver did not call blk_cleanup_queue() in the meantime.
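
    The pattern described above looks roughly like this (tg and the
    allocation call are illustrative, not the exact code):

    spin_unlock_irq(q->queue_lock);

    /* sleeping allocations are fine here */
    tg = kzalloc_node(sizeof(*tg), GFP_KERNEL, q->node);

    spin_lock_irq(q->queue_lock);

    /* blk_cleanup_queue() may have run while the lock was dropped */
    if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
            kfree(tg);
            return NULL;
    }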

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

18 May, 2011

1 commit

  • Consider this scenario:
    1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
    2. blk_run_queue_async();
    The second call becomes a no-op, because q->delay_work already has
    WORK_STRUCT_PENDING_BIT set, so the delayed work will still only run
    after SCSI_QUEUE_DELAY. But blk_run_queue_async() actually wants the
    delayed work to run immediately.

    Fix this by doing a cancel on potentially pending delayed work
    before queuing an immediate run of the workqueue.
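
    A sketch of the resulting helper (the workqueue calls shown are the
    generic ones and may differ from the exact patch):

    void blk_run_queue_async(struct request_queue *q)
    {
            if (likely(!blk_queue_stopped(q))) {
                    /* drop a pending delayed run so the 0-delay run below
                     * is not silently ignored */
                    cancel_delayed_work(&q->delay_work);
                    queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
            }
    }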

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

19 Apr, 2011

2 commits

  • We don't pass in a 'force_kblockd' anymore, so get rid of the
    stale comment.

    Reported-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We are currently using this flag to check whether it's safe
    to call into ->request_fn(). If it is set, we punt to kblockd.
    But we get a lot of false positives and excessive punts to
    kblockd, which hurts performance.

    The only real abuser of this infrastructure is SCSI. So export
    the async queue run and convert SCSI over to use that. There's
    room for improvement in that SCSI need not always use the async
    call, but this fixes our performance issue and they can fix that
    up in due time.

    Signed-off-by: Jens Axboe

    Jens Axboe