27 May, 2011

2 commits


23 May, 2011

1 commit

  • Commit 73c101011926 ("block: initial patch for on-stack per-task plugging")
    removed calls to elv_bio_merged() when @bio merged with @req. Re-add them.

    This in turn will update merged stats in the associated group. That
    should be safe as long as the request holds a reference to the blkio_group.

    Signed-off-by: Namhyung Kim
    Cc: Divyesh Shah
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

21 May, 2011

2 commits

  • We don't need them anymore, so kill:

    - REQ_ON_PLUG checks in various places
    - !rq_mergeable() check in plug merging

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently, all the cfq_group or throtl_group allocations happen while
    we are holding ->queue_lock and sleeping is not allowed.

    Soon, we will move to per cpu stats and also need to allocate the
    per group stats. As one can not call alloc_percpu() from atomic
    context as it can sleep, we need to drop ->queue_lock, allocate the
    group, retake the lock and continue processing.

    In throttling code, I check the queue DEAD flag again to make sure
    that driver did not call blk_cleanup_queue() in the mean time.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
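    The drop-lock, allocate, retake-lock, re-check sequence described in
    this commit can be sketched in user space as below. This is only a toy
    model: toy_queue, toy_group and every toy_* helper are hypothetical
    names, the "lock" is a plain flag rather than a spinlock, and calloc()
    stands in for an allocation that may sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a group with separately allocated stats. */
struct toy_group {
    long *stats;
};

/* Hypothetical stand-in for the request queue. */
struct toy_queue {
    bool locked;             /* models holding ->queue_lock */
    bool dead;               /* models the queue DEAD flag */
    struct toy_group *grp;   /* group attached to this queue, if any */
};

static void toy_lock(struct toy_queue *q)   { assert(!q->locked); q->locked = true; }
static void toy_unlock(struct toy_queue *q) { assert(q->locked);  q->locked = false; }

/* Models an allocation that may sleep: the lock must not be held here. */
static struct toy_group *toy_group_alloc(const struct toy_queue *q)
{
    assert(!q->locked);
    struct toy_group *g = calloc(1, sizeof(*g));
    if (!g)
        return NULL;
    g->stats = calloc(4, sizeof(*g->stats));
    if (!g->stats) {
        free(g);
        return NULL;
    }
    return g;
}

static void toy_group_free(struct toy_group *g)
{
    if (g) {
        free(g->stats);
        free(g);
    }
}

/*
 * Called with the lock held and returns with the lock held.
 * Returns NULL if the allocation failed or the queue died while unlocked.
 */
static struct toy_group *toy_get_group(struct toy_queue *q)
{
    assert(q->locked);
    if (q->grp)
        return q->grp;

    toy_unlock(q);                       /* drop the lock before allocating */
    struct toy_group *g = toy_group_alloc(q);
    toy_lock(q);                         /* retake it and re-validate state */

    if (q->dead) {                       /* queue was torn down meanwhile */
        toy_group_free(g);
        return NULL;
    }
    if (q->grp) {                        /* someone else installed a group */
        toy_group_free(g);
        return q->grp;
    }
    q->grp = g;                          /* NULL here if the allocation failed */
    return g;
}
```

    The important part is that everything checked before the unlock has to
    be checked again after the lock is retaken, since the queue state may
    have changed in between.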
     

18 May, 2011

1 commit

  • Let's check a scenario:
    1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
    2. blk_run_queue_async();
    the second one will become a noop, because q->delay_work already has
    WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after
    SCSI_QUEUE_DELAY. But blk_run_queue_async actually hopes the delayed
    work runs immediately.

    Fix this by doing a cancel on potentially pending delayed work
    before queuing an immediate run of the workqueue.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
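    The scenario above can be illustrated with a small user-space model.
    This is a sketch under assumptions, not the kernel code: the
    toy_delayed_work type and all toy_* functions are hypothetical, and
    time is just an integer tick counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a delayed work item; not the kernel's struct delayed_work. */
struct toy_delayed_work {
    bool pending;            /* models WORK_STRUCT_PENDING_BIT */
    unsigned long run_at;    /* tick at which the work would run */
};

/* Queue the work; does nothing if it is already pending. */
static bool toy_queue_delayed(struct toy_delayed_work *w,
                              unsigned long now, unsigned long delay)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->run_at = now + delay;
    return true;
}

/* Cancel a pending work item; returns whether it was pending. */
static bool toy_cancel(struct toy_delayed_work *w)
{
    bool was_pending = w->pending;
    w->pending = false;
    return was_pending;
}

/* Behaviour before the fix: the immediate run is lost if work is pending. */
static void toy_run_async_old(struct toy_delayed_work *w, unsigned long now)
{
    toy_queue_delayed(w, now, 0);
}

/* Behaviour after the fix: cancel first, then queue the immediate run. */
static void toy_run_async_fixed(struct toy_delayed_work *w, unsigned long now)
{
    toy_cancel(w);
    bool queued = toy_queue_delayed(w, now, 0);
    assert(queued);          /* cannot fail: nothing is pending any more */
    (void)queued;
}
```

    With a delay of 3 ticks queued at tick 100, the old variant leaves the
    work scheduled for tick 103, while the fixed variant reschedules it for
    tick 100.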
     

19 Apr, 2011

3 commits


18 Apr, 2011

5 commits


16 Apr, 2011

1 commit

  • It's a pretty close match to what we had before: the timer triggering
    would mean that nobody unplugged the plug in due time. In the new
    scheme this matches very closely what the schedule() unplug now is.
    It's essentially the difference between an explicit unplug (IO unplug)
    or an implicit unplug (timer unplug, we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Apr, 2011

2 commits


12 Apr, 2011

6 commits


11 Apr, 2011

1 commit

  • If the request_fn ends up blocking, we could be re-entering
    the plug flush. Since the list is protected by explicitly
    not allowing schedule events, this isn't a terribly good idea.

    Additionally, it can cause us to recurse. As request_fn called by
    __blk_run_queue is allowed to 'schedule()' (after dropping the queue
    lock of course), it is possible to get a recursive call:

    schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
    -> __blk_run_queue -> request_fn -> schedule

    We must make sure that the second schedule does not call into
    blk_flush_plug again. So instead of leaving the list of requests on
    blk_plug->list, move them to a separate list leaving blk_plug->list
    empty.

    Signed-off-by: Jens Axboe

    NeilBrown
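    The fix described here, detaching the pending requests onto a separate
    list before running them, can be sketched with a toy singly linked
    list. All toy_* names are hypothetical, and toy_request_fn() simply
    assumes the worst case, that every request blocks and re-enters the
    flush through schedule().

```c
#include <assert.h>
#include <stddef.h>

struct toy_req {
    int id;
    struct toy_req *next;
};

struct toy_plug {
    struct toy_req *head;    /* models blk_plug->list */
    int flushed;             /* number of requests handed to request_fn */
    int depth;               /* current flush nesting depth */
    int max_depth;           /* deepest nesting seen so far */
};

static void toy_flush_plug(struct toy_plug *plug);

/* Models a request_fn that blocks: the "schedule()" flushes the plug again. */
static void toy_request_fn(struct toy_plug *plug, struct toy_req *rq)
{
    (void)rq;
    toy_flush_plug(plug);
}

static void toy_flush_plug(struct toy_plug *plug)
{
    plug->depth++;
    if (plug->depth > plug->max_depth)
        plug->max_depth = plug->depth;

    /* Move the requests to a local list, leaving the plug list empty. */
    struct toy_req *local = plug->head;
    plug->head = NULL;

    while (local) {
        struct toy_req *rq = local;
        local = rq->next;
        rq->next = NULL;
        plug->flushed++;
        toy_request_fn(plug, rq);   /* nested flush finds an empty list */
    }

    plug->depth--;
    assert(plug->depth >= 0);
}
```

    Because the nested call always finds an empty list, each request is
    processed exactly once and the nesting never goes deeper than one extra
    level.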
     

08 Apr, 2011

1 commit


06 Apr, 2011

2 commits


31 Mar, 2011

1 commit


26 Mar, 2011

2 commits

  • When the queue work handler was converted to delayed work, the
    stopping was inadvertently made sync as well. Change this back
    to being async stop, using __cancel_delayed_work() instead of
    cancel_delayed_work().

    Reported-by: Jeremy Fitzhardinge
    Reported-by: Chris Mason
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With the introduction of the on-stack plugging, we would assume
    that any request being inserted was a normal file system request.
    As flush/fua requires a special insert mode, this caused problems.

    Fix this up by checking for this in flush_plug_list() and use
    the appropriate insert mechanism.

    Big thanks go to Markus Tripplesdorf for tirelessly testing
    patches, and to Sergey Senozhatsky for helping find the real
    issue.

    Reported-by: Markus Tripplesdorf
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

21 Mar, 2011

1 commit

  • One of the disadvantages of on-stack plugging is that we potentially
    lose out on merging since all pending IO isn't always visible to
    everybody. When we flush the on-stack plugs, right now we don't do
    any checks to see if potential merge candidates could be utilized.

    Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
    It works just like ELEVATOR_INSERT_SORT, but first checks whether we can
    merge with an existing request before doing the insertion (if we fail
    merging).

    This fixes a regression with multiple processes issuing IO that
    can be merged.

    Thanks to Shaohua Li for testing and fixing
    an accounting bug.

    Signed-off-by: Jens Axboe

    Jens Axboe
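    The idea of "try to merge first, fall back to a sorted insert" can be
    sketched as follows. This is a simplified illustration rather than the
    elevator code: the toy_* types and functions are hypothetical, only
    back merges are attempted, and a merged request is not re-checked
    against its neighbours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq {
    unsigned long sector;    /* start sector */
    unsigned long nr;        /* number of sectors */
    struct toy_rq *next;
};

struct toy_queue {
    struct toy_rq *head;     /* requests sorted by start sector */
};

/* Back merge: append rq to a queued request that ends where rq starts. */
static bool toy_try_merge(struct toy_queue *q, const struct toy_rq *rq)
{
    for (struct toy_rq *it = q->head; it; it = it->next) {
        if (it->sector + it->nr == rq->sector) {
            it->nr += rq->nr;
            return true;
        }
    }
    return false;
}

/* Plain sorted insert, the analogue of the existing insert mode. */
static void toy_insert_sort(struct toy_queue *q, struct toy_rq *rq)
{
    struct toy_rq **pp = &q->head;

    while (*pp && (*pp)->sector < rq->sector)
        pp = &(*pp)->next;
    rq->next = *pp;
    *pp = rq;
}

/* Returns true if rq was merged into an existing request. */
static bool toy_insert_sort_merge(struct toy_queue *q, struct toy_rq *rq)
{
    assert(rq->nr > 0);
    if (toy_try_merge(q, rq))
        return true;
    toy_insert_sort(q, rq);
    return false;
}
```

    Inserting requests for sectors 0-7 and 100-107, followed by one for
    sectors 8-15, leaves two queued requests: the third is absorbed into
    the first, which then covers sectors 0-15.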
     

18 Mar, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
    [SCSI] scsi_dh_rdac: Add MD36xxf into device list
    [SCSI] scsi_debug: add consecutive medium errors
    [SCSI] libsas: fix ata list corruption issue
    [SCSI] hpsa: export resettable host attribute
    [SCSI] hpsa: move device attributes to avoid forward declarations
    [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
    [SCSI] sd: Logical Block Provisioning update
    [SCSI] Include protection operation in SCSI command trace
    [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
    [SCSI] target: Fix volume size misreporting for volumes > 2TB
    [SCSI] bnx2fc: Broadcom FCoE offload driver
    [SCSI] fcoe: fix broken fcoe interface reset
    [SCSI] fcoe: precedence bug in fcoe_filter_frames()
    [SCSI] libfcoe: Remove stale fcoe-netdev entries
    [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
    [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
    [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
    [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
    [SCSI] libfc: Fixing a memory leak when destroying an interface
    [SCSI] megaraid_sas: Version and Changelog update
    ...

    Fix up trivial conflicts due to whitespace differences in
    drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}

    Linus Torvalds
     

10 Mar, 2011

5 commits

  • Conflicts:
    block/blk-core.c
    block/blk-flush.c
    drivers/md/raid1.c
    drivers/md/raid10.c
    drivers/md/raid5.c
    fs/nilfs2/btnode.c
    fs/nilfs2/mdt.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
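    As a rough illustration of why an on-stack queuing context helps, the
    toy model below counts how often the queue lock would be taken with and
    without a plug. Every toy_* name is hypothetical and the model only
    counts lock acquisitions; it does not reproduce the kernel's plugging
    code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX 16

struct toy_queue {
    int lock_acquisitions;   /* how often the queue lock was taken */
    int dispatched;          /* pieces of IO handed to the scheduler */
};

struct toy_plug {
    int pending[TOY_PLUG_MAX];
    int count;
};

struct toy_task {
    struct toy_plug *plug;   /* points at a plug on the submitter's stack */
};

/* Submit everything batched in the plug under a single lock acquisition. */
static void toy_flush_plug(struct toy_queue *q, struct toy_task *t)
{
    struct toy_plug *plug = t->plug;

    if (!plug || plug->count == 0)
        return;
    q->lock_acquisitions++;
    q->dispatched += plug->count;
    plug->count = 0;
}

static void toy_start_plug(struct toy_task *t, struct toy_plug *plug)
{
    plug->count = 0;
    t->plug = plug;
}

static void toy_finish_plug(struct toy_queue *q, struct toy_task *t)
{
    toy_flush_plug(q, t);
    t->plug = NULL;
}

static void toy_submit(struct toy_queue *q, struct toy_task *t, int io)
{
    struct toy_plug *plug = t->plug;

    if (!plug) {             /* no plug: take the lock for every piece of IO */
        q->lock_acquisitions++;
        q->dispatched++;
        return;
    }
    if (plug->count == TOY_PLUG_MAX)
        toy_flush_plug(q, t);
    assert(plug->count < TOY_PLUG_MAX);
    plug->pending[plug->count++] = io;
}

/* Models the auto-unplug when the task hits a schedule event. */
static void toy_schedule(struct toy_queue *q, struct toy_task *t)
{
    toy_flush_plug(q, t);
}
```

    In this model, three unplugged submissions take the lock three times,
    while three submissions between toy_start_plug() and toy_finish_plug()
    take it once.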
     
  • Currently we use plugging for that, but as plugging is going away,
    we need an alternative mechanism.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Mar, 2011

1 commit

  • This merge creates two sets of conflicts. One is simple context
    conflicts caused by removal of throtl_scheduled_delayed_work() in
    for-linus and removal of throtl_shutdown_timer_wq() in
    for-2.6.39/core.

    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
    call directly into q->request_fn() __blk_run_queue()) in for-linus
    clashing with FLUSH reimplementation in for-2.6.39/core. The conflict
    isn't trivial but the resolution is straight-forward.

    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
    should be called with @force_kblockd set to %true.

    * elv_insert() in blk_kick_flush() should use
    %ELEVATOR_INSERT_REQUEUE.

    Both changes are to avoid invoking ->request_fn() directly from
    request completion path and closely match the changes in the commit
    255bb490c8.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

03 Mar, 2011

1 commit

  • Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is
    written in such a way that it needs queue lock. In blk_release_queue()
    there is no guarantee that ->queue_lock is still around.

    Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
    one problem.

    https://lkml.org/lkml/2010/10/23/86

    And a quick fix moved blk_throtl_exit() to blk_release_queue().

    commit 7ad58c028652753814054f4e3ac58f925e7343f4
    Author: Jens Axboe
    Date: Sat Oct 23 20:40:26 2010 +0200

    block: fix use-after-free bug in blk throttle code

    This patch reverts above change and does not try to shutdown the
    throtl work in blk_sync_queue(). By avoiding call to
    throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
    the problem reported by Ingo.

    blk_sync_queue() seems to be used only by md driver and it seems to be
    using it to make sure q->unplug_fn is not called as md registers its
    own unplug functions and it is about to free up the data structures
    used by unplug_fn(). Block throttle does not call back into unplug_fn()
    or into md. So there is no need to cancel blk throttle work.

    In fact I think cancelling block throttle work is bad because it might
    happen that some bios are throttled and scheduled to be dispatched later
    with the help of pending work and if work is cancelled, these bios might
    never be dispatched.

    Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
    blk_release_queue() time. That should be safe as we are also calling
    blk_throtl_exit() which should make sure all the throttling related
    data structures are cleaned up.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal