28 Apr, 2009

7 commits

  • Impact: code reorganization

    elv_next_request() and elv_dequeue_request() are part of the public
    block layer interface rather than the actual elevator implementation.
    They mostly deal with how requests interact with the block layer and
    low level drivers at the beginning of request processing, whereas
    __elv_next_request() is the actual elevator request fetching
    interface.

    Move the two functions to blk-core.c. This prepares for further
    interface cleanup.

    Signed-off-by: Tejun Heo

    Tejun Heo
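
    A hypothetical driver-side sketch (not part of the patch) of how
    these public entry points are typically used from a ->request_fn in
    this era of the block layer:

        #include <linux/blkdev.h>

        /* illustrative strategy routine; called with q->queue_lock held */
        static void example_request_fn(struct request_queue *q)
        {
                struct request *rq;

                /* public interface: peek at the next request offered */
                while ((rq = elv_next_request(q)) != NULL) {
                        /* take it off the queue before starting the hardware */
                        blkdev_dequeue_request(rq);
                        /* ... submit rq to the device, complete it later ... */
                }
        }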
     
  • Reorder request completion functions such that

    * All request completion functions are located together.

    * Functions which are used by only one caller are put right above
    the caller.

    * end_request() is put after other completion functions but before
    blk_update_request().

    This change is for completion function cleanup which will follow.

    [ Impact: cleanup, code reorganization ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
    Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is
    now only used in elevator proper and for discard requests.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
    RQ_NOMERGE_FLAGS already defines which REQ flags aren't mergeable.
    There is no reason to specify it superfluously; it only adds to
    confusion. Don't set REQ_NOMERGE for barriers and requests with a
    specific queueing directive. REQ_NOMERGE is now exclusively used
    by the merging code.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
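
    A minimal before/after sketch of the conversion (illustrative, not a
    literal hunk from the patch):

        /* before: kick request processing via the now-removed helper */
        blk_start_queueing(q);

        /* after: with q->queue_lock already held ... */
        __blk_run_queue(q);
        /* ... or, if the lock is not held */
        blk_run_queue(q);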
     
  • __blk_run_queue wraps blk_invoke_request_fn() such that it
    additionally removes plug and bails out early if the queue is empty.
    Both extra operations have their own pending mechanisms and don't
    cause any harm correctness-wise when they are done superfluously.

    The only user of blk_invoke_request_fn() being blk_start_queue(),
    there isn't much reason to keep both functions around. Merge
    blk_invoke_request_fn() into __blk_run_queue() and make
    blk_start_queue() use __blk_run_queue() instead.

    [ Impact: merge two subtly different internal functions ]

    Signed-off-by: Tejun Heo

    Tejun Heo
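
    A simplified sketch of the merged function (the stopped-queue check
    and the unplug-work rescheduling on recursion are omitted; this is
    not the exact kernel source):

        void __blk_run_queue(struct request_queue *q)
        {
                /* the extra operations __blk_run_queue() already did */
                blk_remove_plug(q);
                if (elv_queue_empty(q))
                        return;

                /* what used to be blk_invoke_request_fn(): call
                 * ->request_fn unless we are already inside it */
                if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
                        q->request_fn(q);
                        queue_flag_clear(QUEUE_FLAG_REENTER, q);
                }
        }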
     
  • Impact: subtle behavior change

    For fs requests, the rq is only a carrier of bios and the rq error
    status as a whole doesn't mean much. This is the reason why
    rq->errors is cleared on each partial completion of a request: on
    each partial completion the error status is transferred to the
    respective bios.

    For pc requests, rq->errors is used to carry the error status to the
    issuer, and thus __end_that_request_first() doesn't clear it in such
    cases.

    The condition was fine till now as only fs and pc requests have used
    bio and thus the bio completion path. However, future changes will
    unify data accesses through bio, and all non-fs users care about the
    rq error status. Clear rq->errors on bio completion only for fs
    requests.

    In general, the implicit clearing is a bit too subtle especially as
    the meaning of rq->errors is completely dependent on low level
    drivers. Unifying / cleaning up rq->errors usage and letting llds
    manage it would be better. TODO comment added.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe

    Tejun Heo
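
    A hedged sketch of the resulting check in the bio completion path
    (illustrative, not the literal patch):

        /* the error status has already been transferred to the bios, so
         * the request-wide status is only reset for fs requests; pc
         * requests keep rq->errors for the issuer */
        if (blk_fs_request(rq))
                rq->errors = 0;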
     

24 Apr, 2009

1 commit

  • This simplifies I/O stat accounting switching code and separates it
    completely from I/O scheduler switch code.

    Requests are accounted according to the state of their request queue
    at the time of the request allocation. There is no longer any need
    to flush the request queue when switching the I/O accounting state.

    Signed-off-by: Jerome Marchand
    Signed-off-by: Jens Axboe

    Jerome Marchand
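
    A sketch of the idea, assuming the flag names used here
    (blk_queue_io_stat(), REQ_IO_STAT) match the patch and rw_flags is
    the flag word handed to the request constructor:

        /* snapshot the queue's iostat setting into the request at
         * allocation time; later accounting checks the request flag, so
         * flipping the sysfs knob needs no queue flush */
        if (blk_queue_io_stat(q))
                rw_flags |= REQ_IO_STAT;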
     

08 Apr, 2009

1 commit

  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
    branch tracer: Fix for enabling branch profiling makes sparse unusable
    ftrace: Correct a text align for event format output
    Update /debug/tracing/README
    tracing/ftrace: alloc the started cpumask for the trace file
    tracing, x86: remove duplicated #include
    ftrace: Add check of sched_stopped for probe_sched_wakeup
    function-graph: add proper initialization for init task
    tracing/ftrace: fix missing include string.h
    tracing: fix incorrect return type of ns2usecs()
    tracing: remove CALLER_ADDR2 from wakeup tracer
    blktrace: fix pdu_len when tracing packet command requests
    blktrace: small cleanup in blk_msg_write()
    blktrace: NUL-terminate user space messages
    tracing: move scripts/trace/power.pl to scripts/tracing/power.pl

    Linus Torvalds
     

07 Apr, 2009

2 commits


06 Apr, 2009

3 commits


03 Apr, 2009

1 commit

  • Impact: output all of packet commands - not just the first 4 / 8 bytes

    Since commit d7e3c3249ef23b4617393c69fe464765b4ff1645 ("block: add
    large command support"), struct request->cmd has been changed from
    unsigned char cmd[BLK_MAX_CDB] to unsigned char *cmd.

    v1 -> v2 (by FUJITA Tomonori):

    - make sure rq->cmd_len is always initialized, and then we can use
      rq->cmd_len instead of BLK_MAX_CDB.

    Signed-off-by: Li Zefan
    Acked-by: FUJITA Tomonori
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
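
    An illustrative fragment (pdu_buf is a hypothetical destination
    buffer) of why the fix is needed:

        /* rq->cmd used to be `unsigned char cmd[BLK_MAX_CDB]`, so
         * sizeof(rq->cmd) was the CDB size; now that it is a pointer,
         * the recorded length must be used instead */
        memcpy(pdu_buf, rq->cmd, rq->cmd_len);  /* not sizeof(rq->cmd) */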
     

26 Mar, 2009

1 commit

  • Put a WARN_ON in __blk_put_request if it is about to
    leak bio(s). This is a serious bug that can happen in error
    handling code paths.

    For this to work I have fixed a couple of places in block/ where
    request->bio != NULL ownership was not honored. And a small cleanup
    at sg_io() while at it.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Boaz Harrosh
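
    The guard itself is tiny; a sketch of what it amounts to in
    __blk_put_request():

        /* freeing a request that still owns bios means those bios (and
         * their completions) are leaked - make such bugs loud */
        WARN_ON(req->bio != NULL);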
     

24 Mar, 2009

2 commits


02 Feb, 2009

1 commit


30 Jan, 2009

3 commits


29 Dec, 2008

5 commits

  • We just want to hand the first bits of IO to the device as fast
    as possible. Gains a few percent on the IOPS rate.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * Because barrier mode can be changed dynamically, whether barrier is
    supported or not can be determined only when actually issuing the
    barrier and there is no point in checking it earlier. Drop barrier
    support check in generic_make_request() and __make_request(), and
    update comment around the support check in blk_do_ordered().

    * There is no reason to check discard support in both
    generic_make_request() and __make_request(). Drop the check in
    __make_request(). While at it, move the error action block to the
    end of the function and add unlikely() to the q existence test.

    * A barrier request, be it empty or not, is never passed to a low
    level driver and thus it's meaningless to try to copy back
    req->sector to bio->bi_sector on error. In addition, the notion of a
    failed sector doesn't make any sense for an empty barrier to begin
    with. Drop the code block from __end_that_request_first().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • After many improvements on kblockd_flush_work, it is now identical to
    cancel_work_sync, so a direct call to cancel_work_sync is suggested.

    The only difference is that cancel_work_sync is a GPL symbol,
    so it is no longer available to non-GPL modules.

    Signed-off-by: Cheng Renquan
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Cheng Renquan
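
    A before/after sketch of a typical call site (q->unplug_work being
    the usual work item):

        /* before */
        kblockd_flush_work(&q->unplug_work);

        /* after: kblockd_flush_work() had become a thin wrapper
         * around this */
        cancel_work_sync(&q->unplug_work);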
     
    Allow the scsi request REQ_QUIET flag to be propagated to the buffer
    / file system layer. The basic idea is to pass the flag from the
    scsi request to the bio (block IO) and then to the buffer layer. The
    buffer layer can then suppress needless printks.

    This patch declutters the kernel log by removing the 40-50 (per lun)
    buffer I/O error messages seen during a boot in my multipath setup.
    There is a good chance any real errors would be missed in the
    "noise" in the logs without this patch.

    During boot I see blocks of messages like
    "
    __ratelimit: 211 callbacks suppressed
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242847
    Buffer I/O error on device sdm, logical block 1
    Buffer I/O error on device sdm, logical block 5242878
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242879
    Buffer I/O error on device sdm, logical block 5242872
    "
    in my logs.

    My disk environment is multipath fiber channel using the SCSI_DH_RDAC
    code and multipathd. This topology includes an "active" and "ghost"
    path for each lun. IOs to the "ghost" path will never complete, and
    the SCSI layer, via the scsi device handler rdac code, quickly
    returns the IOs on these paths and sets the REQ_QUIET scsi flag to
    suppress the scsi layer messages.

    I want to extend the QUIET behavior to include the buffer / file
    system layer to deal with these errors as well. I have been running this
    patch for a while now on several boxes without issue. A few runs of
    bonnie++ show no noticeable difference in performance in my setup.

    Thanks to John Stultz for the quiet_error finalization.

    Submitted-by: Keith Mannthey
    Signed-off-by: Jens Axboe

    Keith Mannthey
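
    A hedged sketch of the propagation chain; the bit names (BIO_QUIET,
    BH_Quiet) are given as recalled and should be treated as
    illustrative:

        /* block layer: copy the request flag onto each completing bio */
        if (rq->cmd_flags & REQ_QUIET)
                set_bit(BIO_QUIET, &bio->bi_flags);

        /* fs/buffer.c: carry it over to the buffer head ... */
        if (bio_flagged(bio, BIO_QUIET))
                set_bit(BH_Quiet, &bh->b_state);

        /* ... and skip the "Buffer I/O error" printk for quiet buffers */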
     
  • For sync IO, we'll often do them serialized. This means we'll be touching
    the queue timer for every IO, as opposed to only occasionally like we
    do for queued IO. Instead of deleting the timer when the last request
    is removed, just let it continue running. If a new request comes in
    soon, we don't have to re-add the timer. If no new requests arrive,
    the timer will expire without side effect later.

    This improves high iops sync IO by ~1%.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Dec, 2008

1 commit


03 Dec, 2008

2 commits

  • Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
    devices.

    When stacking devices (LVM over MD over SCSI) some of the request queue
    parameters are not set up correctly in some cases by default, namely
    max_segment_size and seg_boundary mask.

    If you create an MD device over SCSI, these attributes are zeroed.

    The problem arises when another device-mapper mapping is stacked
    over this one - the queue attributes are then set up in DM this way:

    request_queue    max_segment_size    seg_boundary_mask
    SCSI             65536               0xffffffff
    MD RAID1         0                   0
    LVM              65536               -1 (64bit)

    Unfortunately bio_add_page() (resp. bio_phys_segments()) calculates
    the number of physical segments according to these parameters.

    During generic_make_request() the segment count is recalculated and
    can increase bio->bi_phys_segments over the allowed limit (after
    bio_clone() in a stacking operation).

    This is especially a problem in the CCISS driver, where it produces
    an oops here:

    BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

    (MAXSGENTRIES is 31 by default.)

    Sometimes even this command is enough to cause oops:

    dd iflag=direct if=/dev// of=/dev/null bs=128000 count=10

    This command generates bios with 250 sectors, allocated in 32 4k-pages
    (last page uses only 1024 bytes).

    For the LVM layer, it allocates a bio with 31 segments (still OK for
    CCISS); unfortunately on a lower layer it is recalculated to 32
    segments, and this violates the CCISS restriction and triggers the
    BUG_ON().

    The patch tries to fix it by:

    * initializing the attributes above in the queue request constructor
    blk_queue_make_request()

    * making sure that blk_queue_stack_limits() inherits the settings

    (DM uses its own function to set the limits because
    blk_queue_stack_limits() was introduced later. It should probably
    switch to the generic stack limit function too.)

    * setting the default seg_boundary value in one place (blkdev.h)

    * using this mask as the default in DM (instead of -1, which differs
    on 64bit)

    Bugs related to this:
    https://bugzilla.redhat.com/show_bug.cgi?id=471639
    http://bugzilla.kernel.org/show_bug.cgi?id=8672

    Signed-off-by: Milan Broz
    Reviewed-by: Alasdair G Kergon
    Cc: Neil Brown
    Cc: FUJITA Tomonori
    Cc: Tejun Heo
    Cc: Mike Miller
    Signed-off-by: Jens Axboe

    Milan Broz
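
    A simplified sketch of the inheritance part of the fix in
    blk_queue_stack_limits(), where t is the top (stacked) queue and b
    the bottom one; min_not_zero() keeps an unset 0 from winning:

        t->max_segment_size = min_not_zero(t->max_segment_size,
                                           b->max_segment_size);
        t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
                                            b->seg_boundary_mask);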
     
    blkdev_dequeue_request() and elv_dequeue_request() are equivalent
    and both start the timeout timer. The barrier code dequeues the
    original barrier request but doesn't pass the request itself to the
    lower level driver, only the broken-down proxy requests; however, as
    the original barrier request goes through the same dequeue path, the
    timeout timer is started on it. If the barrier sequence takes long
    enough, this timer expires, but the low level driver has no idea
    about this request and an oops follows.

    The timeout timer shouldn't have been started on the original
    barrier request as it never goes through actual IO. This patch
    unexports elv_dequeue_request(), which has no external user anyway,
    makes it operate on the elevator proper without adding the timer,
    and makes blkdev_dequeue_request() call elv_dequeue_request() and
    add the timer. Internal users which don't pass the request to the
    driver - the barrier code and end_that_request_last() - are
    converted to use elv_dequeue_request().

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo
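
    A sketch of the resulting split (close to, but not necessarily
    identical with, the final code):

        void blkdev_dequeue_request(struct request *req)
        {
                /* plain elevator-level dequeue, no timer */
                elv_dequeue_request(req->q, req);

                /* the request is now headed for the hardware, so start
                 * the timeout timer here - and only here */
                blk_add_timer(req);
        }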
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
    This was a forward port of work done by Mathieu Desnoyers; I changed
    it to encode the 'what' parameter in the tracepoint name, so that
    one can register interest in specific events rather than in classes
    of events that then require checking the 'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
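
    Illustrative examples of the resulting per-event tracepoints (names
    as recalled from the block tracepoint set of that time):

        /* one tracepoint per event instead of a single tracepoint plus
         * a 'what' code, so consumers can register for exactly the
         * events they care about */
        trace_block_rq_insert(q, rq);
        trace_block_rq_issue(q, rq);
        trace_block_rq_complete(q, rq);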
     

06 Nov, 2008

1 commit


18 Oct, 2008

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    block: remove __generic_unplug_device() from exports
    block: move q->unplug_work initialization
    blktrace: pass zfcp driver data
    blktrace: add support for driver data
    block: fix current kernel-doc warnings
    block: only call ->request_fn when the queue is not stopped
    block: simplify string handling in elv_iosched_store()
    block: fix kernel-doc for blk_alloc_devt()
    block: fix nr_phys_segments miscalculation bug
    block: add partition attribute for partition number
    block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT
    softirq: Add support for triggering softirq work on softirqs.

    Linus Torvalds
     

17 Oct, 2008

4 commits

  • The only out-of-core user is IDE, and that should be using
    blk_start_queueing() instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
    "modprobe loop; rmmod loop" effectively creates a blk_queue and
    destroys it, which results in q->unplug_work being canceled without
    ever having been initialized.

    Therefore, move the initialization of q->unplug_work from
    blk_queue_make_request() to blk_alloc_queue*().

    Reported-by: Alexey Dobriyan
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Jens Axboe

    Peter Zijlstra
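
    A sketch of the move, assuming the handler is the existing
    blk_unplug_work() worker:

        /* now done at queue allocation time, so the work struct is
         * valid for the queue's whole lifetime, not only once a
         * make_request_fn has been installed */
        INIT_WORK(&q->unplug_work, blk_unplug_work);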
     
  • Fix block kernel-doc warnings:

    Warning(linux-2.6.27-git4//fs/block_dev.c:1272): No description found for parameter 'path'
    Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'cpu'
    Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'part'
    Warning(/var/linsrc/linux-2.6.27-git4//block/genhd.c:544): No description found for parameter 'partno'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Randy Dunlap
     
  • Callers should use either blk_run_queue/__blk_run_queue, or
    blk_start_queueing() to invoke request handling instead of calling
    ->request_fn() directly as that does not take the queue stopped
    flag into account.

    Also add appropriate comments on the above functions to detail
    their usage.

    Signed-off-by: Jens Axboe

    Jens Axboe
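
    In sketch form:

        /* wrong: ignores the QUEUE_FLAG_STOPPED state */
        q->request_fn(q);

        /* right, when q->queue_lock is not held */
        blk_run_queue(q);

        /* right, when q->queue_lock is already held */
        __blk_run_queue(q);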
     

13 Oct, 2008

1 commit

  • Multipath is best at handling transport errors. If it gets a device
    error then there is not much the multipath layer can do. It will just
    access the same device but from a different path.

    This patch breaks up failfast into device, transport and driver errors.
    The multipath layers (md and dm multipath) only ask the lower levels to
    fast fail transport errors. The user of failfast, read ahead, will ask
    to fast fail on all errors.

    Note that blk_noretry_request will return true if any failfast bit
    is set. This allows drivers that do not support the multipath failfast
    bits to continue to fail on any failfast error like before. Drivers
    like scsi that are able to fail fast specific errors can check
    for the specific fail fast type. In the next patch I will convert
    scsi.

    Signed-off-by: Mike Christie
    Cc: Jens Axboe
    Signed-off-by: James Bottomley

    Mike Christie
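
    A hedged sketch of how a submitter picks a failfast class (bio flag
    names as recalled from this series):

        /* multipath: only transport problems are worth failing fast;
         * a device error will look the same down every path */
        bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);

        /* readahead: cheap to throw away, so fail fast on everything */
        bio->bi_rw |= (1 << BIO_RW_FAILFAST_DEV) |
                      (1 << BIO_RW_FAILFAST_TRANSPORT) |
                      (1 << BIO_RW_FAILFAST_DRIVER);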
     

09 Oct, 2008

1 commit

  • This patch removes end_queued_request() and end_dequeued_request(),
    which are no longer used.

    As a result, the only user of __end_request() is now end_request(),
    so the actual code in __end_request() is moved into end_request()
    and __end_request() is removed.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda