28 Apr, 2009

40 commits

  • There are many [__]blk_end_request() call sites which call it with
    full request length and expect full completion. Many of them ensure
    that the request actually completes by doing BUG_ON() on the return
    value, which is awkward and error-prone.

    This patch adds [__]blk_end_request_all() which takes @rq and @error
    and fully completes the request. BUG_ON() is added to ensure that
    this actually happens.
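
    The pattern can be illustrated with a small userspace sketch. This is
    not the kernel implementation: struct toy_request and both toy_*
    functions are hypothetical stand-ins, and assert() plays the role of
    BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct request: only tracks the bytes still pending. */
struct toy_request {
    unsigned int bytes_left;
    int error;
    bool finished;
};

/* Partial completion: returns true while the request still has data pending. */
static bool toy_end_request(struct toy_request *rq, int error,
                            unsigned int nr_bytes)
{
    if (nr_bytes > rq->bytes_left)
        nr_bytes = rq->bytes_left;
    rq->bytes_left -= nr_bytes;
    if (rq->bytes_left)
        return true;
    rq->error = error;
    rq->finished = true;
    return false;
}

/* Full completion: the caller no longer has to check the return value. */
static void toy_end_request_all(struct toy_request *rq, int error)
{
    bool pending = toy_end_request(rq, error, rq->bytes_left);

    assert(!pending);   /* stands in for BUG_ON(pending) */
}
```

    A caller that used to write the check by hand around the partial
    completion call would simply call toy_end_request_all(rq, error).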

    Most conversions are simple but there are a few noteworthy ones.

    * cdrom/viocd: viocd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/block/dasd: dasd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/char/tape_block: tapeblock_end_request() replaced with direct
    calls to blk_end_request_all().

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Stephen Rothwell
    Cc: Mike Miller
    Cc: Martin Schwidefsky
    Cc: Jeff Garzik
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: James Bottomley

    Tejun Heo
     
  • rq->start_time was initialized in init_request_from_bio() so special
    requests didn't have start_time set. This has been okay as start_time
    has been used only for fs requests; however, there is no indication of
    whether this is actually the case. Set rq->start_time in blk_rq_init()
    and guarantee that all initialized rq's have their start_time set. This
    improves consistency at virtually no cost and future changes will make
    use of the timestamp for !bio requests.

    [ Impact: rq->start_time is valid for all requests ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Request completion has gone through several changes and has become a
    bit messy over time. Clean it up.

    1. end_that_request_data() is a thin wrapper around
    end_that_request_data_first() which checks whether bio is NULL
    before doing anything and handles bidi completion.
    blk_update_request() is a thin wrapper around
    end_that_request_data() which clears nr_sectors on the last
    iteration but doesn't use the bidi completion.

    Clean it up by moving the initial bio NULL check and nr_sectors
    clearing on the last iteration into end_that_request_data() and
    renaming it to blk_update_request(), which makes blk_end_io() the
    only user of end_that_request_data(). Collapse
    end_that_request_data() into blk_end_io().

    2. There are four visible completion variants - blk_end_request(),
    __blk_end_request(), blk_end_bidi_request() and end_request().
    blk_end_request() and blk_end_bidi_request() use blk_end_io() as
    the backend but __blk_end_request() and end_request() use a
    separate implementation in __blk_end_request() due to different
    locking rules.

    blk_end_bidi_request() is identical to blk_end_io(). Collapse
    blk_end_io() into blk_end_bidi_request(), separate out request
    update into internal helper blk_update_bidi_request() and add
    __blk_end_bidi_request(). Redefine [__]blk_end_request() as thin
    inline wrappers around [__]blk_end_bidi_request().

    3. As the whole request issue/completion usages are about to be
    modified and audited, it's a good chance to convert completion
    functions to return bool, which better indicates the intended
    meaning of return values.

    4. The function name end_that_request_last() is from the days when it
    was a public interface and is slightly confusing. Give it a proper
    internal name - blk_finish_request().

    5. Add description explaining that blk_end_bidi_request() can be safely
    used for uni requests as suggested by Boaz Harrosh.

    The only visible behavior change is from #1. nr_sectors counts are
    cleared after the final iteration no matter which function is used to
    complete the request. I couldn't find any place where the code
    assumes those nr_sectors counters contain the values for the last
    segment and this change is good as it makes the API much more
    consistent: the end result is now the same whether a request is
    completed using [__]blk_end_request() alone or in combination with
    blk_update_request().
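
    The resulting layering can be sketched in userspace as follows. This
    is a simplified model, not the kernel code: struct toy_rq and the
    toy_* functions are hypothetical, error handling and locking are left
    out, and a single nr_sectors field stands in for the rq->*nr_sectors
    counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq {
    unsigned int nr_bytes;      /* bytes still pending */
    unsigned int nr_sectors;    /* stand-in for the rq->*nr_sectors counters */
    struct toy_rq *next_rq;     /* bidi counterpart, NULL for uni requests */
    bool finished;
};

/* Returns true if @rq still has pending data after completing @bytes. */
static bool toy_update_request(struct toy_rq *rq, unsigned int bytes)
{
    if (bytes >= rq->nr_bytes) {
        rq->nr_bytes = 0;
        rq->nr_sectors = 0;     /* cleared on the final iteration */
        return false;
    }
    rq->nr_bytes -= bytes;
    rq->nr_sectors = rq->nr_bytes / 512;
    return true;
}

/* Backend: safe for uni requests because next_rq is simply NULL. */
static bool toy_end_bidi_request(struct toy_rq *rq, unsigned int bytes,
                                 unsigned int bidi_bytes)
{
    if (toy_update_request(rq, bytes))
        return true;
    if (rq->next_rq && toy_update_request(rq->next_rq, bidi_bytes))
        return true;
    rq->finished = true;
    return false;
}

/* Thin inline wrapper, mirroring how the uni variant is redefined. */
static inline bool toy_end_request(struct toy_rq *rq, unsigned int bytes)
{
    return toy_end_bidi_request(rq, bytes, 0);
}
```

    In this model the bool return value reads as "still pending", and the
    sector counter ends at zero regardless of which entry point completes
    the final chunk.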

    API further cleaned up per Christoph's suggestion.

    [ Impact: cleanup, rq->*nr_sectors always updated after req completion ]

    Signed-off-by: Tejun Heo
    Reviewed-by: Boaz Harrosh
    Cc: Christoph Hellwig

    Tejun Heo
     
  • With recent IDE updates, blk_end_request_callback() doesn't have any
    user now. Kill it.

    [ Impact: removal of unused convoluted interface ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: code reorganization

    elv_next_request() and elv_dequeue_request() are public block layer
    interfaces rather than actual elevator implementation. They mostly
    deal with how requests interact with block layer and low level
    drivers at the beginning of request processing whereas
    __elv_next_request() is the actual elevator request fetching
    interface.

    Move the two functions to blk-core.c. This prepares for further
    interface cleanup.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Reorder request completion functions such that

    * All request completion functions are located together.

    * Functions which are used by only one caller are put right above the
    caller.

    * end_request() is put after other completion functions but before
    blk_update_request().

    This change is for completion function cleanup which will follow.

    [ Impact: cleanup, code reorganization ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • * In blk_rq_timed_out_timer(), else { if } to else if

    * In blk_add_timer(), simplify if/else block

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
    Don't set it. Combined with recent ide updates, REQ_SOFTBARRIER is
    now only used in elevator proper and for discard requests.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • RQ_NOMERGE_FLAGS already defines which REQ flags aren't
    mergeable. There is no reason to specify it superfluously. It only
    adds to confusion. Don't set REQ_NOMERGE for barriers and requests
    with specific queueing directive. REQ_NOMERGE is now exclusively used
    by the merging code.

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • __blk_run_queue wraps blk_invoke_request_fn() such that it
    additionally removes plug and bails out early if the queue is empty.
    Both extra operations have their own pending mechanisms and don't
    cause any harm correctness-wise when they are done superfluously.

    The only user of blk_invoke_request_fn() being blk_start_queue(),
    there isn't much reason to keep both functions around. Merge
    blk_invoke_request_fn() into __blk_run_queue() and make
    blk_start_queue() use __blk_run_queue() instead.
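
    A rough userspace model of the merged function is shown below. It is
    only a sketch: struct toy_queue, its fields and the toy_* functions
    are hypothetical, and where the real code would defer a nested run,
    this model merely counts it.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_queue {
    int nr_pending;         /* requests waiting to be dispatched */
    bool plugged;
    bool in_request_fn;     /* recursion guard */
    int nr_dispatched;
    int nr_deferred;        /* nested runs that were not executed directly */
};

static void toy_run_queue(struct toy_queue *q);

/* Toy request_fn: dispatches everything and re-enters the run path. */
static void toy_request_fn(struct toy_queue *q)
{
    while (q->nr_pending > 0) {
        q->nr_pending--;
        q->nr_dispatched++;
        toy_run_queue(q);   /* nested call must not recurse into request_fn */
    }
}

static void toy_run_queue(struct toy_queue *q)
{
    q->plugged = false;         /* remove plug; harmless if not plugged */
    if (q->nr_pending == 0)
        return;                 /* bail out early on an empty queue */
    if (q->in_request_fn) {
        q->nr_deferred++;       /* assumption: real code defers the run */
        return;
    }
    q->in_request_fn = true;
    toy_request_fn(q);
    q->in_request_fn = false;
}
```

    Both extra steps are idempotent in this model, which is the property
    that makes it safe for every caller to go through the same function.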

    [ Impact: merge two subtly different internal functions ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Doing a proper block dev ->readpages() speeds up the crazy dump(8)
    approach of using interleaved process IO.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • Enable by default support for large devices and files (CONFIG_LBD):

    - With 1TB disks being commodity hardware it is quite easy to hit the 2TB
    limitation while building RAIDs etc. and many distros have been using
    CONFIG_LBD=y by default already (at least Fedora 10 and openSUSE 11.1).

    - This should also prevent a subtle ext4 filesystem compatibility issue:
    mke2fs.ext4 defaults to creating filesystems with huge_files feature
    enabled and such filesystems cannot be later mounted read-write on
    machines with CONFIG_LBD=n (it should be quite easy to hit this issue
    when trying to use a filesystem created using a distro kernel on a system
    running a self-built kernel, think about USB disk enclosures & co.).

    While at it:

    - Clarify config option help text w.r.t. mounting ext4 filesystems
    (they can be mounted with CONFIG_LBD=n but in the read-only mode).

    Cc: "Theodore Ts'o"
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
     
  • Impact: drop unnecessary code

    Now that everything uses bio and block operations, there is no need to
    reset request fields manually when retrying a request. Every field is
    guaranteed to be always valid. Drop unnecessary request field
    resetting from ide_dma_timeout_retry().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: remove code path which is no longer necessary

    All IDE data transfers now use rq->bio. Simplify ide_map_sg()
    accordingly.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: remove fields and code paths which are no longer necessary

    Now that ide-tape uses standard mechanisms to transfer data, special
    case handling for bh can be dropped from ide-atapi. Drop the
    following.

    * pc->cur_pos, b_count, bh and b_data
    * drive->pc_update_buffers() and pc_io_buffers().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: cleanup

    idetape_chrdev_read/write() functions are unnecessarily complex when
    everything can be handled in a single loop. Collapse
    idetape_add_chrdev_read/write_request() into the rw functions and
    simplify the implementation.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: cleanup

    Byte size is what most issue functions deal with. Make
    idetape_queue_rw_tail() and its wrappers take byte size instead of
    sector counts. idetape_chrdev_read() and write() functions are
    converted to use tape->buffer_size instead of ctl from tape->cap.

    This cleans up code a little bit and will ease the next r/w
    reimplementation.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: cleanup

    Read and write init paths are almost identical. Unify them into
    idetape_init_rw().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: kill now unnecessary idetape_bh

    With everything using standard mechanisms, there is no need for
    idetape_bh anymore. Kill it and use tape->buf, cur and valid to
    describe data buffer instead.

    Changes worth mentioning are...

    * idetape_queue_rw_tail() now always queues tape->buf and adjusts
    buffer state properly before completion.

    * idetape_pad_zeros() clears the buffer only once.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: use standard way to transfer data

    ide-tape uses rq in an interesting way. For r/w requests, rq->special
    is used to carry a private buffer management structure idetape_bh and
    rq->nr_sectors and current_nr_sectors are initialized to the number of
    idetape blocks which aren't necessarily 512 bytes. Also,
    rq->current_nr_sectors is used to report back the residual count in
    units of idetape blocks.

    This peculiarity taxes both block layer and ide. ide-atapi has
    different paths and hooks to accommodate it, what a rq means becomes
    quite confusing, and making changes at the block layer becomes quite
    difficult and error-prone.

    This patch makes ide-tape use bio instead. With the previous patch,
    ide-tape is currently using a single contiguous buffer so replacing it
    isn't difficult. Data buffer is mapped into bio using
    blk_rq_map_kern() in idetape_queue_rw_tail(). idetape_io_buffers()
    and idetape_update_buffers() are dropped and pc->bh is set to null to
    tell ide-atapi to use standard data transfer mechanism and idetape_bh
    byte counts are updated by the issuer on completion using the residual
    count.

    This change also nicely removes the FIXME in ide_pc_intr() where
    ide-tape rqs need to be completed using ide_rq_bytes() instead of
    blk_rq_bytes() (although this didn't really matter as the request
    didn't have bio).

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: simpler buffer allocation and handling, kills OOM, fix DMA transfers

    ide-tape has its own multiple buffer mechanism using struct
    idetape_bh. It allocates buffer with decreasing order-of-two
    allocations so that it results in minimum number of segments.
    However, the implementation is quite complex and works in a way that
    no other block or ide driver works necessitating a lot of special case
    handling.

    The benefit this complex allocation scheme brings is questionable as,
    for PIO or DMA, the number of segments (16 maximum) doesn't make any
    noticeable difference and it also doesn't negate the need for multiple
    order allocation which can fail under memory pressure or high
    fragmentation although it does lower the highest order necessary by
    one when the buffer size isn't a power of two.

    As the first step to remove the custom buffer management, this patch
    makes ide-tape allocate a single contiguous buffer. The maximum order is
    four. I doubt the change would cause any trouble but if it ever
    matters, it should be converted to regular sg mechanism like everyone
    else and even in that case dropping custom buffer handling and moving
    to standard mechanism first makes sense as an intermediate step.
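
    For readers unfamiliar with allocation orders, the following sketch
    shows how a buffer size maps to an order. It is illustrative only:
    toy_get_order() is a hypothetical userspace helper and the 4096-byte
    page size is an assumption, under which order four corresponds to 16
    contiguous pages.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u     /* assumption: 4 KiB pages */

/* Smallest order such that (TOY_PAGE_SIZE << order) covers @size bytes. */
static unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;
    size_t span = TOY_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

    A size that is not a power of two is rounded up to the next order, so
    a single contiguous allocation may reserve more memory than the
    buffer strictly needs.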

    This patch makes the first bh contain the whole buffer and drops
    multi bh handling code. Following patches will make further changes.

    This patch has the side effect of killing OOM triggered by allocation
    path and fixing DMA transfers. Previously, a bug in the alloc path
    triggered OOM on command issue and commands were passed to DMA engine
    without DMA-mapping all the segments.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: allow residual count implementation in ->pc_callback()

    rq->data_len has two duties - carrying the number of input bytes on
    issue and carrying residual count back to the issuer on completion.
    ide-atapi completion callback ->pc_callback() is the right place to do
    this but currently ide-atapi depends on rq->data_len carrying the
    original request size after calling ->pc_callback() to complete the pc
    request.

    This patch makes ide_pc_intr(), ide_tape_issue_pc() and
    ide_floppy_issue_pc() cache length to complete before calling
    ->pc_callback() so that it can modify rq->data_len as necessary.

    Note: As using rq->data_len for two purposes can make cases like this
    incorrect in subtle ways, future changes will introduce separate
    field for residual count.
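
    The ordering issue can be shown with a minimal sketch. The names
    below (struct toy_rq, toy_pc_callback, toy_callback_short,
    toy_complete_pc) are hypothetical and only model the idea of caching
    the length before the callback runs.

```c
#include <assert.h>

struct toy_rq {
    unsigned int data_len;  /* request size on issue, residual on completion */
};

typedef void (*toy_pc_callback)(struct toy_rq *rq);

/* Hypothetical callback: the transfer fell 512 bytes short. */
static void toy_callback_short(struct toy_rq *rq)
{
    rq->data_len = 512;     /* overwrite with the residual count */
}

/* Returns the number of bytes to complete for @rq. */
static unsigned int toy_complete_pc(struct toy_rq *rq, toy_pc_callback cb)
{
    unsigned int done = rq->data_len;   /* cache before the callback */

    cb(rq);                 /* free to modify rq->data_len now */
    return done;            /* unaffected by what the callback stored */
}
```

    Reading rq->data_len after the callback instead would complete only
    512 bytes in this model, which is the dependency the patch removes.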

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: fix infinite retry loop

    After a command failed, ide-tape and floppy insert REQUEST_SENSE in
    front of the failed command and according to the result, set
    pc->retries, flags and errors. After REQUEST_SENSE is complete, the
    failed command is again at the front of the queue and if the verdict
    was to terminate the request, the issue functions try to complete it
    directly by calling drive->pc_callback() and returning ide_stopped.

    However, drive->pc_callback() doesn't complete a request. It only
    prepares for completion of the request. As a result, this creates an
    infinite loop where the failed request is retried perpetually.

    Fix it by actually ending the request by calling ide_complete_rq().
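
    The loop can be reproduced in a toy model. Everything here is
    hypothetical (struct toy_drive, the toy_* functions, the apply_fix
    switch and the max_rounds bound, which exists only so the broken
    variant terminates in a test).

```c
#include <assert.h>
#include <stdbool.h>

struct toy_drive {
    int queued;             /* requests sitting at the head of the queue */
    bool terminate;         /* REQUEST_SENSE verdict for the head request */
    int callbacks;
};

/* Only prepares for completion; the request stays on the queue. */
static void toy_pc_callback(struct toy_drive *d)
{
    d->callbacks++;
}

/* Actually ends the request and removes it from the queue. */
static void toy_complete_rq(struct toy_drive *d)
{
    d->queued--;
    d->terminate = false;
}

static void toy_issue_pc(struct toy_drive *d, bool apply_fix)
{
    if (d->terminate) {
        toy_pc_callback(d);
        if (apply_fix)
            toy_complete_rq(d);
        return;             /* models returning ide_stopped */
    }
    toy_complete_rq(d);     /* normal path, simplified to instant success */
}

/* Services the queue and returns how many issue attempts were made. */
static int toy_service_queue(struct toy_drive *d, bool apply_fix,
                             int max_rounds)
{
    int rounds = 0;

    while (d->queued > 0 && rounds < max_rounds) {
        toy_issue_pc(d, apply_fix);
        rounds++;
    }
    return rounds;
}
```

    Without the completion call the head request is picked up again on
    every round; with it, the queue drains after a single attempt.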

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: cleanup rq->data usage

    ide-pm uses rq->data to carry pointer to struct request_pm_state
    through request queue and rq->special is used to carry pointer to
    local struct ide_cmd, which isn't necessary. Use rq->special for
    request_pm_state instead and use local ide_cmd in
    ide_start_power_step().

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: unify request data buffer handling

    rq->data is used mostly to pass kernel buffer through request queue
    without using bio. There are only a couple of places which still do
    this in kernel and converting to bio isn't difficult.

    This patch converts ide-cd and atapi to use bio instead of rq->data
    for request sense and internal pc commands. With previous change to
    unify sense request handling, this is relatively easily achieved by
    adding blk_rq_map_kern() during sense_rq prep and PC issue.

    If blk_rq_map_kern() fails for sense, the error is deferred till sense
    issue and aborts the failed command which triggered the sense. Note
    that this is a slim possibility as sense prep is done on each command
    issue, so for the above condition to actually trigger, all preps since
    the last sense issue till the issue of the request which would require
    a sense should fail.

    * do_request functions might sleep now. This should be okay as ide
    request_fn - do_ide_request() - is invoked only from make_request
    and plug work. Make sure this is the case by adding might_sleep()
    to do_ide_request().

    * Functions which access the read sense data before the sense request
    is complete now should access bio_data(sense_rq->bio) as the sense
    buffer might have been copied during blk_rq_map_kern().

    * ide-tape updated to map sg.

    * cdrom_do_block_pc() now doesn't have to deal with REQ_TYPE_ATA_PC
    special case. Simplified.

    * tp_ops->output/input_data path dropped from ide_pc_intr().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Since we're issuing REQ_TYPE_SENSE now we need to allow those types of
    rqs in the ->do_request callbacks. As a future improvement, sense_len
    assignment might be unified across all ATAPI devices. Borislav to
    check with specs and test.

    As a result, get rid of ide_queue_pc_head() and
    drive->request_sense_rq.

    tj: * Init request sense ide_atapi_pc from sense request. In the
    longer term, it would probably be better to fold
    ide_create_request_sense_cmd() into its only current user -
    ide_floppy_get_format_progress().

    * ide_retry_pc() no longer takes @disk.

    CC: Bartlomiej Zolnierkiewicz
    CC: FUJITA Tomonori
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tejun Heo

    Borislav Petkov
     
  • Preallocate a sense request in the ->do_request method and reinitialize
    it only on demand, in case it's been consumed in the IRQ handler path.
    The reason for this is that we don't want to be mapping rq to bio in
    the IRQ path and introduce all kinds of unnecessary hacks to the block
    layer.

    tj: * Both user and kernel PC requests expect sense data to be stored
    in separate storage other than drive->sense_data. Copy sense
    data to rq->sense on completion if rq->sense is not NULL. This
    fixes bogus sense data on PC requests.

    As a result, remove cdrom_queue_request_sense.

    CC: Bartlomiej Zolnierkiewicz
    CC: FUJITA Tomonori
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tejun Heo

    Borislav Petkov
     
  • This is in preparation for removing the queueing of a sense request out
    of the IRQ handler path.

    Use struct request_sense as a general sense buffer for all ATAPI
    devices ide-{floppy,tape,cd}.

    tj: * blk_get_request(__GFP_WAIT) can't be called from do_request() as
    it can cause deadlock. Converted to use inline struct request
    and blk_rq_init().

    * Added xfer / cdb len selection depending on device type.

    * All sense prep logic folded into ide_prep_sense() which never
    fails.

    * hwif->rq clearing and sense_rq used handling moved into
    ide_queue_sense_rq().

    * blk_rq_map_kern() conversion is moved to later patch.

    CC: Bartlomiej Zolnierkiewicz
    CC: FUJITA Tomonori
    Signed-off-by: Borislav Petkov
    Signed-off-by: Tejun Heo

    Borislav Petkov
     
  • Impact: rq->buffer usage cleanup

    ide-cd uses rq->buffer to carry pointer to the original request when
    issuing REQUEST_SENSE. Use rq->special instead.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: rq->buffer usage cleanup

    ide-atapi uses rq->buffer as private opaque value for internal special
    requests. rq->special isn't used for these cases (the only case where
    rq->special is used is for ide-tape rw requests). Use rq->special
    instead.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: rq->buffer usage cleanup

    ide_raw_taskfile() directly uses rq->buffer to carry pointer to the
    data buffer. This complicates both block interface and ide backend
    request handling. Use blk_rq_map_kern() instead and drop special
    handling for REQ_TYPE_ATA_TASKFILE from ide_map_sg().

    Note that REQ_RW setting is moved upwards as blk_rq_map_kern() uses it
    to initialize bio rw flag.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: remove unnecessary code path

    Block pc requests always use bio and rq->data is always NULL. No need
    to worry about !rq->bio cases in idefloppy_block_pc_cmd(). Note that
    ide-atapi uses ide_pio_bytes() for bio PIO transfer which handles sg
    fine.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe

    Tejun Heo
     
  • Impact: code simplification

    ide_cd_request_sense_fixup() clears the tail of the sense buffer if
    the device didn't completely fill it. This patch makes
    cdrom_queue_request_sense() clear the sense buffer before issuing the
    command instead of clearing it afterwards. This simplifies code and
    eases future changes.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: removal of unused field

    No one uses ide_cmd->special anymore. Kill it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • ide doesn't have to worry about REQ_SOFTBARRIER. Don't set it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is being phased out in favor of
    [__]blk_run_queue(). Switch.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: fix an oops which always triggers

    ide_tape_issue_pc() assumed drive->pc isn't NULL on invocation when
    checking for back-to-back request sense issues but drive->pc can be
    NULL and even when it's not NULL, it's not safe to dereference it once
    the previous command is complete because pc could have been freed or
    was on stack. Kill back-to-back REQUEST_SENSE detection.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: subtle behavior change

    For fs requests, rq is only a carrier of bios and rq error status as a
    whole doesn't mean much. This is the reason why rq->errors is being
    cleared on each partial completion of a request as on each partial
    completion the error status is transferred to the respective bios.

    For pc requests, rq->errors is used to carry error status to the
    issuer and thus __end_that_request_first() doesn't clear it in such
    cases.

    The condition was fine till now as only fs and pc requests have used
    bio and thus the bio completion path. However, future changes will
    unify data accesses to bio and all non fs users care about rq error
    status. Clear rq->errors on bio completion only for fs requests.

    In general, the implicit clearing is a bit too subtle especially as
    the meaning of rq->errors is completely dependent on low level
    drivers. Unifying / cleaning up rq->errors usage and letting llds
    manage it would be better. TODO comment added.
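
    The rule itself is small enough to state as code. This is a
    hypothetical model (enum toy_rq_type, struct toy_rq and
    toy_partial_complete() are invented for illustration), not the block
    layer source.

```c
#include <assert.h>

enum toy_rq_type { TOY_RQ_FS, TOY_RQ_OTHER };

struct toy_rq {
    enum toy_rq_type type;
    int errors;
};

/* Called on each partial completion that goes through the bio path. */
static void toy_partial_complete(struct toy_rq *rq)
{
    /*
     * For fs requests the error status lives in the bios, so the
     * request-wide value is reset. Everyone else keeps it for the issuer.
     */
    if (rq->type == TOY_RQ_FS)
        rq->errors = 0;
}
```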

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe

    Tejun Heo
     
  • Now that the bio list management stuff is generic, convert loop to use
    bio lists instead of its own private bio list implementation.
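
    For context, a bio list is a singly linked FIFO with head and tail
    pointers. The sketch below is a userspace approximation with
    hypothetical toy_* names; it is not the kernel's bio_list code and
    struct toy_bio carries only an id and a link.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio {
    int id;
    struct toy_bio *next;
};

struct toy_bio_list {
    struct toy_bio *head;
    struct toy_bio *tail;
};

static void toy_bio_list_init(struct toy_bio_list *bl)
{
    bl->head = bl->tail = NULL;
}

static int toy_bio_list_empty(const struct toy_bio_list *bl)
{
    return bl->head == NULL;
}

/* Append @bio at the tail so that bios are processed in arrival order. */
static void toy_bio_list_add(struct toy_bio_list *bl, struct toy_bio *bio)
{
    bio->next = NULL;
    if (bl->tail)
        bl->tail->next = bio;
    else
        bl->head = bio;
    bl->tail = bio;
}

/* Detach and return the oldest bio, or NULL if the list is empty. */
static struct toy_bio *toy_bio_list_pop(struct toy_bio_list *bl)
{
    struct toy_bio *bio = bl->head;

    if (bio) {
        bl->head = bio->next;
        if (!bl->head)
            bl->tail = NULL;
        bio->next = NULL;
    }
    return bio;
}
```

    A driver thread in this model would add bios as they arrive and pop
    them one at a time until the list is empty.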

    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Akinobu Mita
    Signed-off-by: Jens Axboe

    Akinobu Mita