09 Oct, 2008

10 commits

  • Since all bio_split calls refer to the same single bio_split_pool, the
    bio_split function can use bio_split_pool directly instead of taking a
    mempool_t parameter;

    the mempool_t parameter can then be removed from the bio_split parameter
    list, and bio_split_pool, now referenced only in fs/bio.c, can be marked
    static.
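
    A minimal sketch of the resulting interface, assuming the fs/bio.c
    layout of that era (prototypes are not copied from a specific tree):

    /* fs/bio.c: the pool is now file-local */
    static mempool_t *bio_split_pool;

    /* old: struct bio_pair *bio_split(struct bio *bi, mempool_t *pool,
     *                                 int first_sectors);
     * new: callers no longer pass a pool */
    struct bio_pair *bio_split(struct bio *bi, int first_sectors);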

    Signed-off-by: Denis ChengRq
    Signed-off-by: Jens Axboe

    Denis ChengRq
     
  • Helper function to find the sector offset in a bio given bvec index
    and page offset.
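
    A hedged usage sketch; the signature is assumed from the description
    rather than taken from a specific tree:

    /* number of sectors from the start of @bio up to byte @offset
     * within the bio_vec at @index */
    sector_t sectors = bio_sector_offset(bio, index, offset);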

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Not all callers need (or want!) the mempool backing guarantee; it
    essentially means that you can only use bio_alloc() for short-lived
    allocations and not for preallocating some bios at setup or init time.

    So add bio_kmalloc() which does the same thing as bio_alloc(), except
    it just uses kmalloc() as the backing instead of the bio mempools.
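
    A minimal sketch of the intended difference, assuming bio_kmalloc()
    mirrors the bio_alloc() signature:

    /* mempool-backed: meant for short-lived allocations on the I/O path */
    struct bio *bio = bio_alloc(GFP_NOIO, nr_vecs);

    /* kmalloc-backed: fine for preallocating bios at setup/init time,
     * simply returns NULL if memory is not available */
    struct bio *pre = bio_kmalloc(GFP_KERNEL, nr_vecs);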

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch changes blk_rq_map_user to accept a NULL user-space buffer
    with a READ command if rq_map_data is not NULL. A caller can thus pass
    page frames to blk_rq_map_user to just set up a request and bios with
    those page frames properly. bio_uncopy_user (called via blk_rq_unmap_user)
    doesn't copy data to user space for such a request.
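
    A hedged sketch of the call pattern this enables (the argument order is
    assumed from the related patches in this series):

    /* map_data->pages already holds the caller's page frames;
     * a NULL user buffer just builds the request and bios */
    ret = blk_rq_map_user(q, rq, &map_data, NULL, len, gfp_mask);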

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • bio_copy_kern and bio_copy_user are very similar. This converts
    bio_copy_kern to use bio_copy_user.

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • This patch introduces struct rq_map_data to let bio_copy_user_iov()
    use reserved pages.

    Currently, bio_copy_user_iov allocates bounce pages but
    drivers/scsi/sg.c wants to allocate pages by itself and use
    them. struct rq_map_data can be used to pass allocated pages to
    bio_copy_user_iov.

    The current users of bio_copy_user_iov simply pass NULL (they don't
    want to use pre-allocated pages).
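
    A hedged sketch of the structure (member names are assumed from the
    description and may not match the final definition):

    struct rq_map_data {
            struct page **pages;    /* caller-provided pages to copy into */
            int page_order;
            int nr_entries;
    };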

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Cc: Douglas Gilbert
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • Currently, blk_rq_map_user and blk_rq_map_user_iov always do
    GFP_KERNEL allocation.

    This adds a gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
    so sg can use it (sg always does GFP_ATOMIC allocation).
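
    A hedged sketch of the resulting prototype (parameter order assumed
    from the related patches in this series):

    int blk_rq_map_user(struct request_queue *q, struct request *rq,
                        struct rq_map_data *map_data, void __user *ubuf,
                        unsigned long len, gfp_t gfp_mask);

    /* sg can now pass GFP_ATOMIC instead of being forced into GFP_KERNEL */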

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Douglas Gilbert
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • This patch adds support for controlling the IO completion CPU of
    either all requests on a queue, or on a per-request basis. We export
    a sysfs variable (rq_affinity) which, if set, migrates completions
    of requests to the CPU that originally submitted them. A bio helper
    (bio_set_completion_cpu()) is also added, so that queuers can ask
    for completion on that specific CPU.
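
    A hedged sketch of per-request usage (the helper's exact form is
    assumed from the description):

    /* queue-wide behaviour: echo 1 > /sys/block/<dev>/queue/rq_affinity */

    /* per-bio: ask for completion on the submitting CPU */
    bio_set_completion_cpu(bio, smp_processor_id());
    submit_bio(rw, bio);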

    In testing, this has been shown to cut the system time by as much
    as 20-40% on synthetic workloads where CPU affinity is desired.

    This requires a little help from the architecture, so it'll only
    work as designed for archs that are using the new generic smp
    helper infrastructure.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Remove hw_segments field from struct bio and struct request. Without virtual
    merge accounting they have no purpose.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     
  • Remove virtual merge accounting.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     

27 Aug, 2008

2 commits

  • The commit c5dec1c3034f1ae3503efbf641ff3b0273b64797 introduced
    __bio_copy_iov() to add bounce support to blk_rq_map_user_iov.

    __bio_copy_iov() uses bio->bv_len to copy data for READ commands after
    completion, but it doesn't work with a request that completed only
    partially. SCSI always completes a PC request as a whole, but it seems
    some others don't.

    Signed-off-by: FUJITA Tomonori
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • The commit 68154e90c9d1492d570671ae181d9a8f8530da55 introduced
    bio_copy_kern() to add bounce support to blk_rq_map_kern.

    bio_copy_kern() uses bio->bv_len to copy data for READ commands after
    completion, but it doesn't work with a request that completed only
    partially. SCSI always completes a PC request as a whole, but it seems
    some others don't.

    This patch fixes bio_copy_kern to handle the above case. As
    bio_copy_user does, bio_copy_kern uses struct bio_map_data to store
    struct bio_vec.

    Signed-off-by: FUJITA Tomonori
    Reported-by: Nix
    Tested-by: Nix
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

06 Aug, 2008

1 commit


27 Jul, 2008

1 commit

  • Use get_user_pages_fast in the common/generic block and fs direct IO paths.
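
    A hedged sketch of the call being switched to (signature of that era
    assumed):

    /* pin nr_pages of user memory starting at uaddr; write is non-zero
     * when the device will write into these pages (i.e. a READ) */
    ret = get_user_pages_fast(uaddr, nr_pages, write, pages);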

    Signed-off-by: Nick Piggin
    Cc: Dave Kleikamp
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Dave Kleikamp
    Cc: Badari Pulavarty
    Cc: Zach Brown
    Cc: Jens Axboe
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

03 Jul, 2008

3 commits

  • When devices are stacked, one device's merge_bvec_fn may need to perform
    the mapping and then call one or more functions for its underlying devices.

    The following bio fields are used:
    bio->bi_sector
    bio->bi_bdev
    bio->bi_size
    bio->bi_rw using bio_data_dir()

    This patch creates a new struct bvec_merge_data holding a copy of those
    fields to avoid having to change them directly in the struct bio when
    going down the stack only to have to change them back again on the way
    back up. (And then when the bio gets mapped for real, the whole
    exercise gets repeated, but that's a problem for another day...)
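
    A hedged sketch of the new structure, with the member set taken from
    the field list above (exact names assumed):

    struct bvec_merge_data {
            struct block_device *bi_bdev;
            sector_t bi_sector;
            unsigned int bi_size;
            unsigned long bi_rw;
    };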

    Signed-off-by: Alasdair G Kergon
    Cc: Neil Brown
    Cc: Milan Broz
    Signed-off-by: Jens Axboe

    Alasdair G Kergon
     
  • Some block devices support verifying the integrity of requests by way
    of checksums or other protection information that is submitted along
    with the I/O.

    This patch implements support for generating and verifying integrity
    metadata, as well as correctly merging, splitting and cloning bios and
    requests that have this extra information attached.

    See Documentation/block/data-integrity.txt for more information.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Move struct bio_set and biovec_slab definitions to bio.h so they can
    be used outside of bio.c.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

08 May, 2008

1 commit


07 May, 2008

1 commit


29 Apr, 2008

1 commit

  • This patch adds bio_copy_kern, similar to
    bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of
    bio_map_kern if necessary.

    bio_copy_kern uses temporary pages, and the bi_end_io callback frees
    these pages. bio_copy_kern saves the original kernel buffer in
    bio->bi_private; it doesn't use something like struct bio_map_data to
    store information about the caller.
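
    A hedged sketch of the selection in blk_rq_map_kern (the condition is
    hypothetical; the real check is based on the queue's DMA alignment):

    if (buffer_needs_bounce)        /* e.g. kbuf misaligned for the queue */
            bio = bio_copy_kern(q, kbuf, len, gfp_mask, reading);
    else
            bio = bio_map_kern(q, kbuf, len, gfp_mask);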

    Signed-off-by: FUJITA Tomonori
    Cc: Tejun Heo
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

21 Apr, 2008

1 commit

  • This patch enables bio_copy_user to take a struct sg_iovec (the new
    function is named bio_copy_user_iov). bio_copy_user uses bio_copy_user_iov
    internally, just as bio_map_user uses bio_map_user_iov; see the sketch
    after the change list below.

    The major changes are:

    - adds sg_iovec array to struct bio_map_data

    - adds __bio_copy_iov, which copies data between a bio and an
    sg_iovec; bio_copy_user_iov and bio_uncopy_user use it.
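
    A hedged sketch of the resulting wrapper (its shape is assumed from
    the description):

    struct bio *bio_copy_user(struct request_queue *q, unsigned long uaddr,
                              unsigned int len, int write_to_vm)
    {
            struct sg_iovec iov = {
                    .iov_base = (void __user *)uaddr,
                    .iov_len  = len,
            };

            return bio_copy_user_iov(q, &iov, 1, write_to_vm);
    }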

    Signed-off-by: FUJITA Tomonori
    Cc: Tejun Heo
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

18 Mar, 2008

1 commit

  • Outside users like asmlib use the mapping functions. API-wise, the
    export is definitely sane. It's a better idea to keep this export
    than to require external users to open-code this piece of code instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

19 Feb, 2008

1 commit

  • Commit b2e895dbd80c420bfc0937c3729b4afe073b3848 #if 0'ed this code stating:

    [PATCH] revert blockdev direct io back to 2.6.19 version

    Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
    slowdown due to the recent special-casing of direct-io against
    blockdevs. We don't know why either of these things are occurring.

    The patch minimally reverts us back to the 2.6.19 code for a 2.6.20
    release.

    It has since been dead code, and unless someone wants to revive it now,
    it's time to remove it.

    This patch also makes bio_release_pages() static again and removes the
    ki_bio_count member from struct kiocb, reverting changes that had been
    done for this dead code.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Jens Axboe

    Adrian Bunk
     

28 Jan, 2008

1 commit


16 Oct, 2007

2 commits


10 Oct, 2007

3 commits

  • As bi_end_io is only called once, when the request is complete,
    the 'size' argument is now redundant. Remove it.

    Now there is no need for bio_endio to subtract the size completed
    from bi_size. So don't do that either.

    While we are at it, change bi_end_io to return void.
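
    A hedged before/after sketch of the method signature (exact types
    assumed):

    /* old */ int  (*bi_end_io)(struct bio *, unsigned int bytes_done, int error);
    /* new */ void (*bi_end_io)(struct bio *, int error);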

    Signed-off-by: Neil Brown
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • The only caller of bio_endio that does not pass the full bi_size
    is end_that_request_first. Also, no ->bi_end_io method is really
    interested in bi_size being decremented.

    So move the decrement and related code into ll_rw_blk and merge it
    with order_bio_endio to form req_bio_endio which does endio functionality
    specific to request completion.

    As some ->bi_end_io methods do check for a bi_size of 0, we still set it
    thus for now, but that will go away in the next patch.

    Signed-off-by: Neil Brown

    ### Diffstat output
    ./block/ll_rw_blk.c | 42 +++++++++++++++++++++++++++---------------
    ./fs/bio.c | 23 +++++++++++------------
    2 files changed, 38 insertions(+), 27 deletions(-)

    diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • Currently bi_end_io can be called multiple times as sub-requests
    complete. However no ->bi_end_io function wants to know about that.
    So only call when the bio is complete.

    Signed-off-by: Neil Brown

    ### Diffstat output
    ./fs/bio.c | 4 +++-
    1 file changed, 3 insertions(+), 1 deletion(-)

    diff .prev/fs/bio.c ./fs/bio.c
    Signed-off-by: Jens Axboe

    NeilBrown
     

24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's lots left. So do a full sweep of
    the kernel, get rid of this typedef, and replace its uses with
    the proper type.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

10 Jul, 2007

1 commit


08 May, 2007

1 commit

  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
            int a,b,c;
            struct list_head;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

    will create a new slab named "test_slab" of the size sizeof(struct
    test_slab) and aligned to the alignment of struct test_slab. If it fails,
    we panic.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

30 Apr, 2007

1 commit

  • Currently we scale the mempool sizes depending on memory installed
    in the machine, except for the bio pool itself which sits at a fixed
    256 entry pre-allocation.

    There's really no point in "optimizing" this OOM path; we just need
    enough preallocated to make progress. A single unit is enough; let's
    scale it down to 2 just to be on the safe side.

    This patch saves ~150kb of pinned kernel memory on a 32-bit box.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Dec, 2006

1 commit

  • Implement block device specific .direct_IO method instead of going through
    generic direct_io_worker for block device.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT
    on file systems, where it needs to perform block allocation, hole detection,
    extents file on write, and tons of other corner cases. The end result is
    that it takes tons of CPU time to submit an I/O.

    For a block device, the block allocation is much simpler, and a tight triple
    loop can be written to iterate over each iovec and each page within the
    iovec in order to construct/prepare bio structures and then submit them to
    the block layer. This significantly speeds up O_DIRECT on block devices.
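
    A heavily simplified sketch of that loop structure (helper names are
    hypothetical; page pinning, length bookkeeping and error handling are
    omitted):

    bio = bio_alloc(GFP_KERNEL, nr_vecs);
    for (seg = 0; seg < nr_segs; seg++) {                 /* each iovec */
            for (i = 0; i < nr_pages_in_seg(seg); i++) {  /* each page in it */
                    if (!bio_add_page(bio, pages[i], this_len, page_off)) {
                            submit_bio(rw, bio);          /* bio is full */
                            bio = bio_alloc(GFP_KERNEL, nr_vecs);
                            bio_add_page(bio, pages[i], this_len, page_off);
                    }
            }
    }
    submit_bio(rw, bio);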

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

05 Dec, 2006

1 commit


01 Dec, 2006

2 commits

  • This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c
    users so that it supports requests larger than bio by chaining them together.

    Signed-off-by: Mike Christie
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • The target mode support is mapping in bios using bio_map_user. The
    current targets do not need their len to be aligned with a queue limit,
    so this check is causing some problems. Note: pointers passed into the
    kernel are properly aligned by userspace tgt code, so the uaddr check
    in bio_map_user is ok.

    The major user, blk_bio_map_user checks for the len before mapping
    so it is not affected by this patch.

    And the semi-newly added user blk_rq_map_user_iov has been failing
    out when the len is not aligned properly, so maybe people have been
    good and not sending misaligned lens, or that path is not used very
    often, and this change will not be very dangerous. st and sg do not
    check the length, and we have not seen any problem reports from those
    more widely used paths, so this patch should be fairly safe, at least
    for -mm and wider testing.

    Signed-off-by: Mike Christie
    Signed-off-by: FUJITA Tomonori
    Signed-off-by: James Bottomley
    Signed-off-by: Jens Axboe

    Mike Christie
     

22 Nov, 2006

1 commit

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.
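
    A hedged sketch of the resulting shape of a work function (the device
    structure and names are hypothetical):

    static void my_work_handler(struct work_struct *work)
    {
            struct my_dev *dev = container_of(work, struct my_dev, work);

            /* operate on dev instead of a passed-in context pointer */
    }

    INIT_WORK(&dev->work, my_work_handler);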

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells