22 Apr, 2011

2 commits

  • DISK_EVENT_MEDIA_CHANGE is used both for the userland-visible event
    and for the internal event that drives revalidation of removable
    devices. Some legacy drivers don't implement proper event detection
    and continuously generate events under certain circumstances. For
    example, ide-cd continuously generates media-change events if
    there's no media in the drive, which can lead to an infinite loop of
    events bouncing back and forth between the driver and the userland
    event handler.

    This patch updates disk event infrastructure such that it never
    propagates events not listed in disk->events to userland. Those
    events are processed the same for internal purposes but uevent
    generation is suppressed.

    This also ensures that userland only gets events which are advertised
    in the @events sysfs node, lowering the risk of confusion.
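
    A minimal sketch of the masking idea (variable names here are
    hypothetical, not the exact patch):

        /* 'clearing' holds every event the driver just reported; all of
         * them are still processed internally, but only advertised ones
         * may generate a uevent */
        unsigned int visible = clearing & disk->events;

        if (visible)
                kobject_uevent_env(&disk_to_dev(disk)->kobj,
                                   KOBJ_CHANGE, envp);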

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • The sort insert is the one that goes to the IO scheduler. With
    the SORT_MERGE addition, we could bypass IO scheduler setup
    but still ask the IO scheduler to insert the request. This would
    cause an oops when switching IO schedulers through the sysfs
    interface, unless the disk just happened to be idle while it
    occurred.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

19 Apr, 2011

6 commits

  • In queue_requests_store, the code looks like:

        if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
                blk_set_queue_full(q, BLK_RW_SYNC);
        } else if (rl->count[BLK_RW_SYNC] + 1 <= q->nr_requests) {
                blk_clear_queue_full(q, BLK_RW_SYNC);
                wake_up(&rl->wait[BLK_RW_SYNC]);
        }

    If we don't satisfy the condition of the "if", we know that
    rl->count[BLK_RW_SYNC] < q->nr_requests, which is the same as
    rl->count[BLK_RW_SYNC] + 1 <= q->nr_requests.
    So every "else" path satisfies the "else if" check, and the check
    isn't actually needed.
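
    Concretely, the redundant check can become a plain else; a minimal
    sketch of the simplified branch (same fields as the snippet above):

        if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
                blk_set_queue_full(q, BLK_RW_SYNC);
        } else {
                /* count < nr_requests implies count + 1 <= nr_requests */
                blk_clear_queue_full(q, BLK_RW_SYNC);
                wake_up(&rl->wait[BLK_RW_SYNC]);
        }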

    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     
  • We do not call blk_trace_remove_sysfs() in the error return path
    if kobject_add() fails. This patch fixes it.

    Cc: stable@kernel.org
    Signed-off-by: Liu Yuan
    Signed-off-by: Jens Axboe

    Liu Yuan
     
  • We don't pass in a 'force_kblockd' argument anymore, so get rid of
    the stale comment.

    Reported-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We are currently using this flag to check whether it's safe
    to call into ->request_fn(). If it is set, we punt to kblockd.
    But we get a lot of false positives and excessive punts to
    kblockd, which hurts performance.

    The only real abuser of this infrastructure is SCSI. So export
    the async queue run and convert SCSI over to use that. There's
    room for improvement in that SCSI need not always use the async
    call, but this fixes our performance issue and they can fix that
    up in due time.
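
    A sketch of what the exported async run looks like (a minimal
    version, assuming the blk_run_queue_async() name used by the block
    layer of this era):

        /* defer the queue run to kblockd instead of calling straight
         * into ->request_fn(), so we cannot recurse into the driver */
        void blk_run_queue_async(struct request_queue *q)
        {
                if (likely(!blk_queue_stopped(q)))
                        queue_delayed_work(kblockd_workqueue,
                                           &q->delay_work, 0);
        }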

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • For some configurations of CONFIG_PREEMPT that is not true. So
    get rid of __call_for_each_cic() and always use the explicitly
    rcu_read_lock() protected call_for_each_cic() instead.

    This fixes a potential bug related to IO scheduler removal or
    online switching.
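
    A sketch of the surviving helper (assuming the cfq-iosched cic-list
    layout of this era):

        /* always walk the cic list under an explicit RCU read lock */
        static void call_for_each_cic(struct io_context *ioc,
                void (*func)(struct io_context *, struct cfq_io_context *))
        {
                struct cfq_io_context *cic;
                struct hlist_node *n;

                rcu_read_lock();
                hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list)
                        func(ioc, cic);
                rcu_read_unlock();
        }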

    Thanks to Paul McKenney for clarifying this.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With all drivers and file systems converted, we only have
    in-core use of this function. So remove the export.

    Reported-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Apr, 2011

5 commits


16 Apr, 2011

1 commit

  • It's a pretty close match to what we had before: the timer firing
    meant that nobody unplugged the plug in due time, and in the new
    scheme that corresponds closely to the schedule() unplug. It's
    essentially the difference between an explicit unplug (IO unplug)
    and an implicit unplug (timer unplug: we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Apr, 2011

2 commits

  • For the explicit unplugging, we'd prefer to kick things off
    immediately and not pay the penalty of the latency to switch
    to kblockd. So let blk_finish_plug() do the run inline, while
    the implicit-on-schedule-out unplug will punt to kblockd.
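
    A sketch of the split (assuming a from_schedule flag plumbed into
    the flush path; a minimal version, not the full function):

        static void queue_unplugged(struct request_queue *q,
                                    unsigned int depth, bool from_schedule)
        {
                if (from_schedule)
                        blk_run_queue_async(q);   /* implicit: punt to kblockd */
                else
                        __blk_run_queue(q);       /* explicit: run inline */
        }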

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's a bit of a mess currently. task->plug is being cleared
    and reset in __blk_finish_plug(), and blk_finish_plug() is
    testing for a NULL plug which cannot happen even from schedule()
    anymore since it uses blk_needs_flush_plug() to determine
    whether to call into this function at all.

    So get rid of some of the cruft.
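
    A sketch of the guard that makes the NULL test redundant (assuming
    the inline helper of this era):

        static inline bool blk_needs_flush_plug(struct task_struct *tsk)
        {
                struct blk_plug *plug = tsk->plug;

                /* schedule() only calls into the flush path when this is
                 * true, so it never passes a NULL plug down */
                return plug && !list_empty(&plug->list);
        }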

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

14 Apr, 2011

1 commit


12 Apr, 2011

6 commits


11 Apr, 2011

1 commit

  • If the request_fn ends up blocking, we could be re-entering
    the plug flush. Since the list is protected by explicitly
    not allowing schedule events, this isn't a terribly good idea.

    Additionally, it can cause us to recurse. As request_fn called by
    __blk_run_queue is allowed to 'schedule()' (after dropping the queue
    lock of course), it is possible to get a recursive call:

    schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
    -> __blk_run_queue -> request_fn -> schedule

    We must make sure that the second schedule does not call into
    blk_flush_plug again. So instead of leaving the list of requests on
    blk_plug->list, move them to a separate list leaving blk_plug->list
    empty.
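
    A minimal sketch of the approach (hypothetical local names):

        struct request *rq;
        LIST_HEAD(list);

        /* steal the plugged requests so a nested schedule() ->
         * blk_flush_plug() sees an empty plug->list and returns early */
        list_splice_init(&plug->list, &list);

        while (!list_empty(&list)) {
                rq = list_entry_rq(list.next);
                list_del_init(&rq->queuelist);
                /* insert rq into its request_queue here */
        }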

    Signed-off-by: Jens Axboe

    NeilBrown
     

08 Apr, 2011

1 commit


06 Apr, 2011

6 commits

  • The comparison function for list_sort() must be anticommutative,
    otherwise it is not a sort in the ordinary meaning.

    But fortunately list_sort() always checks ((*cmp)(priv, a, b) <= 0)
    and does not distinguish negative from zero, so the comparison
    function can implement just less-or-equal instead of a full
    three-way comparison.
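
    A sketch of a valid less-or-equal comparator for the plug list
    (field names assumed; it only has to return <= 0 when a sorts at or
    before b):

        static int plug_rq_cmp(void *priv, struct list_head *a,
                               struct list_head *b)
        {
                struct request *rqa = container_of(a, struct request,
                                                   queuelist);
                struct request *rqb = container_of(b, struct request,
                                                   queuelist);

                /* 0 when rqa->q <= rqb->q, 1 otherwise */
                return !(rqa->q <= rqb->q);
        }
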
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • The current block integrity (DIF/DIX) support in DM is verifying that
    all devices' integrity profiles match during DM device resume (which
    is past the point of no return). To some degree that is unavoidable
    (stacked DM devices force this late checking). But for most DM
    devices (which aren't stacking on other DM devices) the ideal time to
    verify all integrity profiles match is during table load.

    Introduce the notion of an "initialized" integrity profile: a profile
    that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
    template. Add blk_integrity_is_initialized() to allow checking if a
    profile was initialized.

    Update DM integrity support to:
    - check that all devices with _initialized_ integrity profiles match
      during table load; uninitialized profiles (e.g. for underlying DM
      device(s) of a stacked DM device) are ignored
    - disallow a table load that would result in an integrity profile
      that conflicts with a DM device's existing (in-use) integrity
      profile
    - avoid clearing an existing integrity profile
    - validate that all integrity profiles match during resume; if they
      don't, all we can do is report the mismatch (during resume we're
      past the point of no return)
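
    A sketch of the table-load check using the new helper, as described
    in the list above (hypothetical DM-side variable names;
    blk_integrity_is_initialized() is the API this commit introduces):

        /* ignore devices registered with a NULL blk_integrity template */
        if (!blk_integrity_is_initialized(dd->disk))
                continue;

        /* reject the load if the profiles don't match */
        if (blk_integrity_compare(template_disk, dd->disk) < 0)
                return -EINVAL;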

    Signed-off-by: Mike Snitzer
    Cc: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • xchg() does not work portably with types smaller than 32 bits.
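
    A minimal illustration of the portable choice (hypothetical flag;
    the point is the operand width, not the name):

        /* some architectures implement xchg() only for 32- and 64-bit
         * operands, so keep xchg'd fields at least int-sized */
        static unsigned int pending;    /* not u8 or bool */

        static unsigned int take_pending(void)
        {
                return xchg(&pending, 0);
        }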

    Signed-off-by: Andreas Schwab
    Signed-off-by: Jens Axboe

    Andreas Schwab
     
  • It's not a preempt type request; in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Merge it with __elv_add_request(); it's pretty pointless to
    have a function with only two callers. The main interface
    is elv_add_request()/__elv_add_request().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we just dump a non-informative 'request botched' message.
    Let's actually try to print something sane to help debug issues
    around this.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Mar, 2011

1 commit


26 Mar, 2011

2 commits

  • When the queue work handler was converted to delayed work, the
    stopping was inadvertently made sync as well. Change this back
    to being an async stop, using __cancel_delayed_work() instead of
    cancel_delayed_work().
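
    A sketch of the async stop (using the non-syncing cancel helper of
    this era; called with the queue lock held):

        /* __cancel_delayed_work() does not wait for a running handler,
         * so stopping the queue no longer blocks on the work item */
        __cancel_delayed_work(&q->delay_work);
        queue_flag_set(QUEUE_FLAG_STOPPED, q);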

    Reported-by: Jeremy Fitzhardinge
    Reported-by: Chris Mason
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With the introduction of the on-stack plugging, we would assume
    that any request being inserted was a normal file system request.
    As flush/fua requires a special insert mode, this caused problems.

    Fix this up by checking for this in flush_plug_list() and using
    the appropriate insert mechanism.
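
    A sketch of the check in the plug flush path (flag and insert-mode
    names from the block layer of this era; the exact patch may differ):

        /* flush/fua requests need the special flush insert mode */
        if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA))
                __elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
        else
                __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);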

    Big thanks go to Markus Trippelsdorf for tirelessly testing
    patches, and to Sergey Senozhatsky for helping find the real
    issue.

    Reported-by: Markus Trippelsdorf
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

23 Mar, 2011

5 commits

  • Remove the think time checking. A queue with a high think time might
    dispatch several requests and then go away; limiting such a queue
    seems meaningless. This also simplifies the code. Suggested by
    Vivek.

    Signed-off-by: Shaohua Li
    Acked-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Li, Shaohua
     
  • For v2, I added back lines to cfq_preempt_queue() that were removed
    during updates for accounting unaccounted_time. Thanks for pointing out
    that I'd missed these, Vivek.

    Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly
    cleared stats for preempting queues when it shouldn't have, because when
    we choose a queue to preempt, it still isn't necessarily scheduled next.

    Thanks to Vivek Goyal for figuring this out and understanding how the
    preemption code works.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     
  • Lina reported that if throttle limits are initially very high and
    then dropped, no new bio might be dispatched for a long time. The
    reason is that after dropping the limits we don't reset the existing
    slice, so we do the rate calculation with the new low rate while
    accounting the bios dispatched at the high rate. To fix it, reset
    the slice upon rate change.

    https://lkml.org/lkml/2011/3/10/298

    Another problem with a very high limit is that we never queued the
    bio on the throtl service tree. That means we kept on extending the
    group slice but never trimmed it. Fix that too by regularly
    trimming the slice even if no bio is being queued up.
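
    A sketch of the reset-on-change idea (assuming the throtl helper
    names of this era; the details of the real patch may differ):

        /* when userspace changes the limits, start a fresh slice so
         * accounting restarts at the new rate for both directions */
        if (tg->limits_changed) {
                throtl_start_new_slice(td, tg, READ);
                throtl_start_new_slice(td, tg, WRITE);
        }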

    Reported-by: Lina Lu
    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • This change moves unaccounted_time to only be reported when
    CONFIG_DEBUG_BLK_CGROUP is true.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     
  • Commit "Add unaccounted time to timeslice_used" changed the behavior of
    cfq_preempt_queue to set cfqq active. Vivek pointed out that other
    preemption rules might get involved, so we shouldn't manually set which
    queue is active.

    This cleans up the code to just clear the queue stats at preemption
    time.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest