29 Apr, 2010

3 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

29 Dec, 2009

1 commit


02 Oct, 2009

2 commits

  • Currently we set the bio size to the byte equivalent of the blocks to
    be trimmed when submitting the initial DISCARD ioctl. That means it
    is subject to the max_hw_sectors limitation of the HBA which is
    much lower than the size of a DISCARD request we can support.
    Add a separate max_discard_sectors tunable to limit the size for discard
    requests.

    We limit the max discard request size in bytes to 32bit as that is the
    limit for bio->bi_size. This could be much larger if we had a way to pass
    that information through the block layer.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • prepare_discard_fn() was being called in a place where memory allocation
    was effectively impossible. This makes it inappropriate for all but
    the most trivial translations of Linux's DISCARD operation to the block
    command set. Additionally adding a payload there makes the ownership
    of the bio backing unclear as it's now allocated by the device driver
    and not the submitter as usual.

    It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
    the queue supports discard operations or not. blkdev_issue_discard now
    allocates a one-page, sector-length payload which is the right thing
    for the common ATA and SCSI implementations.

    The mtd implementation of prepare_discard_fn() is replaced with simply
    checking for the request being a discard.

    Largely based on a previous patch from Matthew Wilcox
    which did the prepare_discard_fn but not the different payload allocation
    yet.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

14 Sep, 2009

1 commit

  • blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
    the only difference between the two is that blkdev_issue_discard needs to
    send a barrier discard request and blk_ioctl_discard a non-barrier one,
    and blk_ioctl_discard needs to wait on the request. To facilitates this
    add a flags argument to blkdev_issue_discard to control both aspects of the
    behaviour. This will be very useful later on for using the waiting
    funcitonality for other callers.

    Based on an earlier patch from Matthew Wilcox .

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

23 May, 2009

1 commit


20 May, 2009

1 commit


11 May, 2009

3 commits

  • Till now block layer allowed two separate modes of request execution.
    A request is always acquired from the request queue via
    elv_next_request(). After that, drivers are free to either dequeue it
    or process it without dequeueing. Dequeue allows elv_next_request()
    to return the next request so that multiple requests can be in flight.

    Executing requests without dequeueing has its merits mostly in
    allowing drivers for simpler devices which can't do sg to deal with
    segments only without considering request boundary. However, the
    benefit this brings is dubious and declining while the cost of the API
    ambiguity is increasing. Segment based drivers are usually for very
    old or limited devices and as converting to dequeueing model isn't
    difficult, it doesn't justify the API overhead it puts on block layer
    and its more modern users.

    Previous patches converted all block low level drivers to dequeueing
    model. This patch completes the API transition by...

    * renaming elv_next_request() to blk_peek_request()

    * renaming blkdev_dequeue_request() to blk_start_request()

    * adding blk_fetch_request() which is combination of peek and start

    * disallowing completion of queued (not started) requests

    * applying new API to all LLDs

    Renamings are for consistency and to break out of tree code so that
    it's apparent that out of tree drivers need updating.

    [ Impact: block request issue API cleanup, no functional change ]

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc: James Bottomley
    Cc: Mike Miller
    Cc: unsik Kim
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Laurent Vivier
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Grant Likely
    Cc: Adrian McMenamin
    Cc: Stephen Rothwell
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Alex Dubov
    Cc: Pierre Ossman
    Cc: David Woodhouse
    Cc: Markus Lidel
    Cc: Stefan Weinhuber
    Cc: Martin Schwidefsky
    Cc: Pete Zaitcev
    Cc: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • With recent cleanups, there is no place where low level driver
    directly manipulates request fields. This means that the 'hard'
    request fields always equal the !hard fields. Convert all
    rq->sectors, nr_sectors and current_nr_sectors references to
    accessors.

    While at it, drop superflous blk_rq_pos() < 0 test in swim.c.

    [ Impact: use pos and nr_sectors accessors ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Tested-by: Adrian McMenamin
    Acked-by: Adrian McMenamin
    Acked-by: Mike Miller
    Cc: James Bottomley
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Eric Moore
    Cc: Alan Stern
    Cc: FUJITA Tomonori
    Cc: Pete Zaitcev
    Cc: Stephen Rothwell
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Dario Ballabio
    Cc: David S. Miller
    Cc: Rusty Russell
    Cc: unsik Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Implement accessors - blk_rq_pos(), blk_rq_sectors() and
    blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
    and rq->hard_cur_sectors respectively and convert direct references of
    the said fields to the accessors.

    This is in preparation of request data length handling cleanup.

    Geert : suggested adding const to struct request * parameter to accessors
    Sergei : spotted error in patch description

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Acked-by: Stephen Rothwell
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Ackec-by: Sergei Shtylyov
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

28 Apr, 2009

1 commit

  • There are many [__]blk_end_request() call sites which call it with
    full request length and expect full completion. Many of them ensure
    that the request actually completes by doing BUG_ON() the return
    value, which is awkward and error-prone.

    This patch adds [__]blk_end_request_all() which takes @rq and @error
    and fully completes the request. BUG_ON() is added to to ensure that
    this actually happens.

    Most conversions are simple but there are a few noteworthy ones.

    * cdrom/viocd: viocd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/block/dasd: dasd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/char/tape_block: tapeblock_end_request() replaced with direct
    calls to blk_end_request_all().

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Stephen Rothwell
    Cc: Mike Miller
    Cc: Martin Schwidefsky
    Cc: Jeff Garzik
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: James Bottomley

    Tejun Heo
     

15 Apr, 2009

1 commit


30 Jan, 2009

1 commit


29 Dec, 2008

6 commits

  • Empty barrier on write-through (or no cache) w/ ordered tag has no
    command to execute and without any command to execute ordered tag is
    never issued to the device and the ordering is never achieved. Force
    draining for such cases.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Empty barrier required special handling in __elv_next_request() to
    complete it without letting the low level driver see it.

    With previous changes, barrier code is now flexible enough to skip the
    BAR step using the same barrier sequence selection mechanism. Drop
    the special handling and mask off q->ordered from start_ordered().

    Remove blk_empty_barrier() test which now has no user.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier completion had the following assumptions.

    * start_ordered() couldn't finish the whole sequence properly. If all
    actions are to be skipped, q->ordseq is set correctly but the actual
    completion was never triggered thus hanging the barrier request.

    * Drain completion in elv_complete_request() assumed that there's
    always at least one request in the queue when drain completes.

    Both assumptions are true but these assumptions need to be removed to
    improve empty barrier implementation. This patch makes the following
    changes.

    * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
    steps complete and notify __elv_next_request() that it should fetch
    the next request if the whole barrier has completed inside
    start_ordered().

    * Make drain completion path in elv_complete_request() check whether
    the queue is empty. Empty queue also indicates drain completion.

    * While at it, convert 0/1 return from blk_do_ordered() to false/true.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • In all barrier sequences, the barrier write itself was always assumed
    to be issued and thus didn't have corresponding control flag. This
    patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in
    start_ordered() such that any barrier action can be skipped.

    This patch doesn't introduce any visible behavior changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * Because barrier mode can be changed dynamically, whether barrier is
    supported or not can be determined only when actually issuing the
    barrier and there is no point in checking it earlier. Drop barrier
    support check in generic_make_request() and __make_request(), and
    update comment around the support check in blk_do_ordered().

    * There is no reason to check discard support in both
    generic_make_request() and __make_request(). Drop the check in
    __make_request(). While at it, move error action block to the end
    of the function and add unlikely() to q existence test.

    * Barrier request, be it empty or not, is never passed to low level
    driver and thus it's meaningless to try to copy back req->sector to
    bio->bi_sector on error. In addition, the notion of failed sector
    doesn't make any sense for empty barrier to begin with. Drop the
    code block from __end_that_request_first().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Separate out ordering type (drain,) and action masks (preflush,
    postflush, fua) from visible ordering mode selectors
    (QUEUE_ORDERED_*). Ordering types are now named QUEUE_ORDERED_BY_*
    while action masks are named QUEUE_ORDERED_DO_*.

    This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
    optional to improve empty barrier implementation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

03 Dec, 2008

1 commit

  • blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
    both start the timeout timer. Barrier code dequeues the original
    barrier request but doesn't passes the request itself to lower level
    driver, only broken down proxy requests; however, as the original
    barrier code goes through the same dequeue path and timeout timer is
    started on it. If barrier sequence takes long enough, this timer
    expires but the low level driver has no idea about this request and
    oops follows.

    Timeout timer shouldn't have been started on the original barrier
    request as it never goes through actual IO. This patch unexports
    elv_dequeue_request(), which has no external user anyway, and makes it
    operate on elevator proper w/o adding the timer and make
    blkdev_dequeue_request() call elv_dequeue_request() and add timer.
    Internal users which don't pass the request to driver - barrier code
    and end_that_request_last() - are converted to use
    elv_dequeue_request().

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo
     

09 Oct, 2008

4 commits

  • Two mods to blkdev_issue_discard(), thinking ahead to its use on swap:

    1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL
    might deadlock but GFP_NOIO is safe.

    2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is
    enough to cover a whole swap area, but sector_t suits any partition.

    Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen
    for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Jens Axboe

    Hugh Dickins
     
  • But blkdev_issue_discard() still emits requests which are interpreted as
    soft barriers, because naïve callers might otherwise issue subsequent
    writes to those same sectors, which might cross on the queue (if they're
    reallocated quickly enough).

    Callers still _can_ issue non-barrier discard requests, but they have to
    take care of queue ordering for themselves.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

    David Woodhouse
     
  • Barriers should be submitted with the WRITE flag set.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

    OGAWA Hirofumi
     
  • Some block devices benefit from a hint that they can forget the contents
    of certain sectors. Add basic support for this to the block core, along
    with a 'blkdev_issue_discard()' helper function which issues such
    requests.

    The caller doesn't get to provide an end_io functio, since
    blkdev_issue_discard() will automatically split the request up into
    multiple bios if appropriate. Neither does the function wait for
    completion -- it's expected that callers won't care about when, or even
    _if_, the request completes. It's only a hint to the device anyway. By
    definition, the file system doesn't _care_ about these sectors any more.

    [With feedback from OGAWA Hirofumi and
    Jens Axboe
    Signed-off-by: Jens Axboe

    David Woodhouse
     

01 May, 2008

1 commit


29 Apr, 2008

3 commits

  • This rename rq_init() blk_rq_init() and export it. Any path that hands
    the request to the block layer needs to call it to initialize the
    request.

    This is a preparation for large command support, which needs to
    initialize the request in a proper way (that is, just doing a memset()
    will not work).

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • This patch fixes the following build error with UML and gcc 4.3:

    ...
    CC block/blk-barrier.o
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c: In function ‘blk_do_ordered’:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:57: sorry, unimplemented: inlining failed in call to ‘blk_ordered_cur_seq’: function body not available
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:252: sorry, unimplemented: called from here
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:57: sorry, unimplemented: inlining failed in call to ‘blk_ordered_cur_seq’: function body not available
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:253: sorry, unimplemented: called from here
    make[2]: *** [block/blk-barrier.o] Error 1

    Signed-off-by: Adrian Bunk
    Signed-off-by: Jens Axboe

    Adrian Bunk
     
  • This requires moving rq_init() from get_request() to blk_alloc_request().
    The upside is that we can now require an rq_init() from any path that
    wishes to hand the request to the block layer.

    rq_init() will be exported for the code that uses struct request
    without blk_get_request.

    This is a preparation for large command support, which needs to
    initialize struct request in a proper way (that is, just doing a
    memset() will not work).

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

04 Mar, 2008

1 commit


01 Feb, 2008

1 commit


30 Jan, 2008

1 commit