04 Aug, 2011

1 commit

  • init_fault_attr_dentries() is used to export fault_attr via debugfs.
    But it can only export it in debugfs root directory.

    Per Forlin is working on mmc_fail_request which adds support to inject
    data errors after a completed host transfer in MMC subsystem.

    The fault_attr for mmc_fail_request should be defined per mmc host and
    export it in debugfs directory per mmc host like
    /sys/kernel/debug/mmc0/mmc_fail_request.

    init_fault_attr_dentries() doesn't help for mmc_fail_request. So this
    introduces fault_create_debugfs_attr() which is able to create a
    directory in the arbitrary directory and replace
    init_fault_attr_dentries().

    [akpm@linux-foundation.org: extraneous semicolon, per Randy]
    Signed-off-by: Akinobu Mita
    Tested-by: Per Forlin
    Cc: Jens Axboe
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Randy Dunlap
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

27 Jul, 2011

1 commit

  • This changes should_fail_request() to more usable wrapper function of
    should_fail(). It can avoid putting #ifdef CONFIG_FAIL_MAKE_REQUEST in
    the middle of a function.

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

26 Jul, 2011

2 commits

  • After commit 5757a6d7 introduced an unsafe calling of
    smp_processor_id(), with preempt debuggin turned on we spew a lot of:

    BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514
    caller is __make_request+0x1b8/0x308
    [] (unwind_backtrace+0x0/0xe8) from [] (debug_smp_processor_id+0xbc/0xf0)
    [] (debug_smp_processor_id+0xbc/0xf0) from [] (__make_request+0x1b8/0x308)
    [] (__make_request+0x1b8/0x308) from [] (generic_make_request+0x4dc/0x558)
    [] (generic_make_request+0x4dc/0x558) from [] (submit_bio+0x114/0x138)
    [] (submit_bio+0x114/0x138) from [] (submit_bh+0x148/0x16c)
    [] (submit_bh+0x148/0x16c) from [] (__sync_dirty_buffer+0x88/0xd8)
    [] (__sync_dirty_buffer+0x88/0xd8) from [] (journal_commit_transaction+0x1198/0x1688)
    [] (journal_commit_transaction+0x1198/0x1688) from [] (kjournald+0xb4/0x224)
    [] (kjournald+0xb4/0x224) from [] (kthread+0x8c/0x94)
    [] (kthread+0x8c/0x94) from [] (kernel_thread_exit+0x0/0x8)

    Fix this by just using raw_smp_processor_id(), it's just a hint
    after all. There's no pinning of the CPU or accessing per-cpu
    structures involved.

    Reported-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
    block: strict rq_affinity
    backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
    block: fix patch import error in max_discard_sectors check
    block: reorder request_queue to remove 64 bit alignment padding
    CFQ: add think time check for group
    CFQ: add think time check for service tree
    CFQ: move think time check variables to a separate struct
    fixlet: Remove fs_excl from struct task.
    cfq: Remove special treatment for metadata rqs.
    block: document blk_plug list access
    block: avoid building too big plug list
    compat_ioctl: fix make headers_check regression
    block: eliminate potential for infinite loop in blkdev_issue_discard
    compat_ioctl: fix warning caused by qemu
    block: flush MEDIA_CHANGE from drivers on close(2)
    blk-throttle: Make total_nr_queued unsigned
    block: Add __attribute__((format(printf...) and fix fallout
    fs/partitions/check.c: make local symbols static
    block:remove some spare spaces in genhd.c
    block:fix the comment error in blkdev.h
    ...

    Linus Torvalds
     

24 Jul, 2011

1 commit

  • Some systems benefit from completions always being steered to the strict
    requester cpu rather than the looser "per-socket" steering that
    blk_cpu_to_group() attempts by default. This is because the first
    CPU in the group mask ends up being completely overloaded with work,
    while the others (including the original submitter) has power left
    to spare.

    Allow the strict mode to be set by writing '2' to the sysfs control
    file. This is identical to the scheme used for the nomerges file,
    where '2' is a more aggressive setting than just being turned on.

    echo 2 > /sys/block//queue/rq_affinity

    Cc: Christoph Hellwig
    Cc: Roland Dreier
    Tested-by: Dave Jiang
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     

22 Jul, 2011

1 commit

  • USB surprise removal of sr is triggering an oops in
    scsi_dispatch_command(). What seems to be happening is that USB is
    hanging on to a queue reference until the last close of the upper
    device, so the crash is caused by surprise remove of a mounted CD
    followed by attempted unmount.

    The problem is that USB doesn't issue its final commands as part of
    the SCSI teardown path, but on last close when the block queue is long
    gone. The long term fix is probably to make sr do the teardown in the
    same way as sd (so remove all the lower bits on ejection, but keep the
    upper disk alive until last close of user space). However, the
    current oops can be simply fixed by not allowing any commands to be
    sent to a dead queue.

    Cc: stable@kernel.org
    Signed-off-by: James Bottomley

    James Bottomley
     

08 Jul, 2011

1 commit

  • When I test fio script with big I/O depth, I found the total throughput drops
    compared to some relative small I/O depth. The reason is the thread accumulates
    big requests in its plug list and causes some delays (surely this depends
    on CPU speed).
    I thought we'd better have a threshold for requests. When a threshold reaches,
    this means there is no request merge and queue lock contention isn't severe
    when pushing per-task requests to queue, so the main advantages of blk plug
    don't exist. We can force a plug list flush in this case.
    With this, my test throughput actually increases and almost equals to small
    I/O depth. Another side effect is irq off time decreases in blk_flush_plug_list()
    for big I/O depth.
    The BLK_MAX_REQUEST_COUNT is choosen arbitarily, but 16 is efficiently to
    reduce lock contention to me. But I'm open here, 32 is ok in my test too.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

27 May, 2011

2 commits


23 May, 2011

1 commit

  • Commit 73c101011926 ("block: initial patch for on-stack per-task plugging")
    removed calls to elv_bio_merged() when @bio merged with @req. Re-add them.

    This in turn will update merged stats in associated group. That
    should be safe as long as request has got reference to the blkio_group.

    Signed-off-by: Namhyung Kim
    Cc: Divyesh Shah
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

21 May, 2011

2 commits

  • We don't need them anymore, so kill:

    - REQ_ON_PLUG checks in various places
    - !rq_mergeable() check in plug merging

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently, all the cfq_group or throtl_group allocations happen while
    we are holding ->queue_lock and sleeping is not allowed.

    Soon, we will move to per cpu stats and also need to allocate the
    per group stats. As one can not call alloc_percpu() from atomic
    context as it can sleep, we need to drop ->queue_lock, allocate the
    group, retake the lock and continue processing.

    In throttling code, I check the queue DEAD flag again to make sure
    that driver did not call blk_cleanup_queue() in the mean time.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

18 May, 2011

1 commit

  • Let's check a scenario:
    1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
    2. blk_run_queue_async();
    the second one will became a noop, because q->delay_work already has
    WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after
    SCSI_QUEUE_DELAY. But blk_run_queue_async actually hopes the delayed
    work runs immediately.

    Fix this by doing a cancel on potentially pending delayed work
    before queuing an immediate run of the workqueue.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

19 Apr, 2011

3 commits

  • We don't pass in a 'force_kblockd' anymore, get rid of the
    stsale comment.

    Reported-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We are currently using this flag to check whether it's safe
    to call into ->request_fn(). If it is set, we punt to kblockd.
    But we get a lot of false positives and excessive punts to
    kblockd, which hurts performance.

    The only real abuser of this infrastructure is SCSI. So export
    the async queue run and convert SCSI over to use that. There's
    room for improvement in that SCSI need not always use the async
    call, but this fixes our performance issue and they can fix that
    up in due time.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With all drivers and file systems converted, we only have
    in-core use of this function. So remove the export.

    Reporteed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Apr, 2011

5 commits


16 Apr, 2011

1 commit

  • It's a pretty close match to what we had before - the timer triggering
    would mean that nobody unplugged the plug in due time, in the new
    scheme this matches very closely what the schedule() unplug now is.
    It's essentially the difference between an explicit unplug (IO unplug)
    or an implicit unplug (timer unplug, we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Apr, 2011

2 commits

  • For the explicit unplugging, we'd prefer to kick things off
    immediately and not pay the penalty of the latency to switch
    to kblockd. So let blk_finish_plug() do the run inline, while
    the implicit-on-schedule-out unplug will punt to kblockd.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's a bit of a mess currently. task->plug is being cleared
    and reset in __blk_finish_plug(), and blk_finish_plug() is
    testing for a NULL plug which cannot happen even from schedule()
    anymore since it uses blk_needs_flush_plug() to determine
    whether to call into this function at all.

    So get rid of some of the cruft.

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Apr, 2011

6 commits


11 Apr, 2011

1 commit

  • If the request_fn ends up blocking, we could be re-entering
    the plug flush. Since the list is protected by explicitly
    not allowing schedule events, this isn't a terribly good idea.

    Additionally, it can cause us to recurse. As request_fn called by
    __blk_run_queue is allowed to 'schedule()' (after dropping the queue
    lock of course), it is possible to get a recursive call:

    schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
    -> __blk_run_queue -> request_fn -> schedule

    We must make sure that the second schedule does not call into
    blk_flush_plug again. So instead of leaving the list of requests on
    blk_plug->list, move them to a separate list leaving blk_plug->list
    empty.

    Signed-off-by: Jens Axboe

    NeilBrown
     

08 Apr, 2011

1 commit


06 Apr, 2011

2 commits


31 Mar, 2011

1 commit


26 Mar, 2011

2 commits

  • When the queue work handler was converted to delayed work, the
    stopping was inadvertently made sync as well. Change this back
    to being async stop, using __cancel_delayed_work() instead of
    cancel_delayed_work().

    Reported-by: Jeremy Fitzhardinge
    Reported-by: Chris Mason
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • With the introduction of the on-stack plugging, we would assume
    that any request being inserted was a normal file system request.
    As flush/fua requires a special insert mode, this caused problems.

    Fix this up by checking for this in flush_plug_list() and use
    the appropriate insert mechanism.

    Big thanks goes to Markus Tripplesdorf for tirelessly testing
    patches, and to Sergey Senozhatsky for helping find the real
    issue.

    Reported-by: Markus Tripplesdorf
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

21 Mar, 2011

1 commit

  • One of the disadvantages of on-stack plugging is that we potentially
    lose out on merging since all pending IO isn't always visible to
    everybody. When we flush the on-stack plugs, right now we don't do
    any checks to see if potential merge candidates could be utilized.

    Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
    It works just ELEVATOR_INSERT_SORT, but first checks whether we can
    merge with an existing request before doing the insertion (if we fail
    merging).

    This fixes a regression with multiple processes issuing IO that
    can be merged.

    Thanks to Shaohua Li for testing and fixing
    an accounting bug.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Mar, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
    [SCSI] scsi_dh_rdac: Add MD36xxf into device list
    [SCSI] scsi_debug: add consecutive medium errors
    [SCSI] libsas: fix ata list corruption issue
    [SCSI] hpsa: export resettable host attribute
    [SCSI] hpsa: move device attributes to avoid forward declarations
    [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
    [SCSI] sd: Logical Block Provisioning update
    [SCSI] Include protection operation in SCSI command trace
    [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
    [SCSI] target: Fix volume size misreporting for volumes > 2TB
    [SCSI] bnx2fc: Broadcom FCoE offload driver
    [SCSI] fcoe: fix broken fcoe interface reset
    [SCSI] fcoe: precedence bug in fcoe_filter_frames()
    [SCSI] libfcoe: Remove stale fcoe-netdev entries
    [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
    [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
    [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
    [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
    [SCSI] libfc: Fixing a memory leak when destroying an interface
    [SCSI] megaraid_sas: Version and Changelog update
    ...

    Fix up trivial conflicts due to whitespace differences in
    drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}

    Linus Torvalds