12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

11 Jun, 2009

1 commit

  • As reported by sparse:

    fs/bio.c:720:13: warning: incorrect type in assignment (different address spaces)
    fs/bio.c:720:13: expected char *iov_addr
    fs/bio.c:720:13: got void [noderef] *
    fs/bio.c:724:36: warning: incorrect type in argument 2 (different address spaces)
    fs/bio.c:724:36: expected void const [noderef] *from
    fs/bio.c:724:36: got char *iov_addr

    Signed-off-by: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Michal Simek
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...
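    The "binary tracing without printf overhead" point above can be sketched in
    user space: the fast path stores raw fixed-size records, and printf-style
    formatting is deferred until a reader asks. The record layout and names here
    are illustrative stand-ins, not the kernel's actual structures:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical trace record: names are illustrative, not the kernel's. */
    struct trace_rec {
        uint32_t dev;      /* device number */
        uint64_t sector;   /* start sector  */
        uint32_t nr;       /* sector count  */
    };

    static struct trace_rec ring[64];
    static int head;

    /* Fast path: copy raw fields into the ring, no formatting at all. */
    static void trace_block_rq(uint32_t dev, uint64_t sector, uint32_t nr)
    {
        struct trace_rec *r = &ring[head++ % 64];
        r->dev = dev;
        r->sector = sector;
        r->nr = nr;
    }

    /* Slow path: formatting happens only when the record is read out. */
    static void format_rec(const struct trace_rec *r, char *buf, size_t len)
    {
        snprintf(buf, len, "%u,%u %llu + %u",
                 r->dev >> 20, r->dev & 0xfffff,
                 (unsigned long long)r->sector, r->nr);
    }

    int main(void)
    {
        char buf[64];
        trace_block_rq((8u << 20) | 0, 102736992ULL, 8);
        format_rec(&ring[0], buf, sizeof(buf));
        assert(strcmp(buf, "8,0 102736992 + 8") == 0);
        return 0;
    }
    ```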

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not
    TP_print, while blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

    run    dd                  dd + ioctl blktrace   dd + TRACE_EVENT (splice)
    1      7.36s, 42.7 MB/s    7.50s, 42.0 MB/s      7.41s, 42.5 MB/s
    2      7.43s, 42.3 MB/s    7.48s, 42.1 MB/s      7.43s, 42.4 MB/s
    3      7.38s, 42.6 MB/s    7.45s, 42.2 MB/s      7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and there is no regression
    when using these trace events instead of blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store the hex dump of rq->cmd.

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

23 May, 2009

3 commits


19 May, 2009

1 commit

  • When a read bio_copy_kern() request fails, the content of the bounce
    buffer is not copied back. However, as request failure doesn't
    necessarily mean complete failure, the buffer state can be useful.
    This behavior is also inconsistent with the user map counterpart,
    and the subtle difference between bounced and unbounced IO causes
    confusion.

    This patch makes bio_copy_kern_endio() ignore @err and always copy
    back data on request completion.
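    A minimal user-space sketch of the fixed behavior, with hypothetical
    names standing in for the bio bounce-buffer machinery:

    ```c
    #include <assert.h>
    #include <string.h>

    /* Hypothetical stand-ins for the bounce-buffer completion path. */
    struct copy_ctx {
        char *bounce;   /* bounce buffer filled by the device */
        char *dest;     /* original kernel buffer             */
        int   len;
    };

    /* Before the fix, a nonzero err skipped the copy-back; now the data
     * is copied unconditionally, since a partial failure may still have
     * transferred useful bytes. */
    static void copy_kern_endio(struct copy_ctx *ctx, int err)
    {
        (void)err;                       /* @err deliberately ignored */
        memcpy(ctx->dest, ctx->bounce, ctx->len);
    }

    int main(void)
    {
        char bounce[8] = "partial", dest[8] = {0};
        struct copy_ctx ctx = { bounce, dest, 8 };
        copy_kern_endio(&ctx, -5 /* e.g. -EIO */);
        assert(strcmp(dest, "partial") == 0);
        return 0;
    }
    ```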

    Signed-off-by: Tejun Heo
    Cc: Boaz Harrosh
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

29 Apr, 2009

1 commit


22 Apr, 2009

2 commits

  • Impact: remove possible deadlock condition

    There is no reason to use mempool backed allocation for map functions.
    Also, because kern mapping is used inside LLDs (e.g. for EH), using
    mempool backed allocation can lead to deadlock under extreme
    conditions (mempool already consumed by the time a request reached EH
    and requests are blocked on EH).

    Switch copy/map functions to bio_kmalloc().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Impact: fix bio_kmalloc() and its destruction path

    bio_kmalloc() was broken in two ways.

    * bvec_alloc_bs() first allocates bvec using kmalloc() and then
    ignores it and allocates again like non-kmalloc bvecs.

    * bio_kmalloc_destructor() didn't check for and free bio integrity
    data.

    This patch fixes the above problems. The kmalloc path is separated
    out from bio_alloc_bioset() and allocates the requested number of
    bvecs as inline bvecs.

    * bio_alloc_bioset() no longer takes NULL @bs. None other than
    bio_kmalloc() used it and outside users can't know how it was
    allocated anyway.

    * Define and use BIO_POOL_NONE so that pool index check in
    bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
    there.

    * Relocate destructors on top of each allocation function so that how
    they're used is more clear.

    Jens Axboe suggested allocating bvecs inline.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

15 Apr, 2009

1 commit


30 Mar, 2009

1 commit


24 Mar, 2009

3 commits

  • The integrity bio allocation needs its own bio_set to avoid violating
    the mempool allocation rules and risking deadlocks.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • If we don't have CONFIG_BLK_DEV_INTEGRITY set, then we don't have
    any external dependencies on the bio_vec slabs. So don't create
    the ones that we will inline anyway.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This warning (which got fixed by commit b2bf968):

    fs/bio.c: In function ‘bio_alloc_bioset’:
    fs/bio.c:305: warning: ‘p’ may be used uninitialized in this function

    Triggered because the code flow in bio_alloc_bioset() is correct
    but a bit complex for the compiler to see through.

    Streamline it a bit - this also makes the code a tiny bit more compact:

    text   data   bss    dec    hex   filename
    7540    256    40   7836   1e9c   bio.o.before
    7539    256    40   7835   1e9b   bio.o.after

    Also remove an older compiler-warnings annotation from this function,
    it's not needed.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Jens Axboe

    Ingo Molnar
     

15 Mar, 2009

2 commits


26 Feb, 2009

1 commit


18 Feb, 2009

1 commit


03 Jan, 2009

3 commits

  • The commit 818827669d85b84241696ffef2de485db46b0b5e (block: make
    blk_rq_map_user take a NULL user-space buffer) extended
    blk_rq_map_user to accept a NULL user-space buffer with a READ
    command. It was necessary to convert sg to use the block layer mapping
    API.

    This patch extends blk_rq_map_user again for a WRITE command. It is
    necessary to convert the st and osst drivers to use the block layer
    mapping API.

    Signed-off-by: FUJITA Tomonori
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     
  • This fixes bio_copy_user_iov to properly handle the partial mappings
    with struct rq_map_data (which only sg uses for now but st and osst
    will shortly). It adds the offset member to struct rq_map_data and
    changes blk_rq_map_user to update it so that bio_copy_user_iov can add
    an appropriate page frame via bio_add_pc_page().
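    A user-space mirror of the offset idea (a preallocated page area plus a
    running offset updated after each partial mapping); the names here are
    illustrative, not the actual struct rq_map_data fields:

    ```c
    #include <assert.h>

    #define PAGE_SIZE 4096

    /* Hypothetical mirror of rq_map_data: preallocated pages plus a
     * running offset that records how much of them is already mapped. */
    struct map_data {
        int  nr_pages;
        long offset;      /* bytes of the preallocated area already used */
    };

    /* Returns the page index and in-page offset for the next mapping,
     * then advances the running offset, as a subsequent mapping call
     * would need. */
    static void map_chunk(struct map_data *md, long len,
                          int *page, long *pg_off)
    {
        *page = md->offset / PAGE_SIZE;
        *pg_off = md->offset % PAGE_SIZE;
        md->offset += len;
    }

    int main(void)
    {
        struct map_data md = { 4, 0 };
        int page; long off;

        map_chunk(&md, 6144, &page, &off);   /* first partial mapping */
        assert(page == 0 && off == 0);

        map_chunk(&md, 2048, &page, &off);   /* resumes mid-page 1 */
        assert(page == 1 && off == 2048);
        return 0;
    }
    ```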

    Signed-off-by: FUJITA Tomonori
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     
  • This fixes bio_add_page misuse in bio_copy_user_iov with rq_map_data,
    which only sg uses now.

    rq_map_data carries page frames for bio_add_pc_page. bio_copy_user_iov
    uses bio_add_pc_page with a larger size than PAGE_SIZE. It's clearly
    wrong.

    Signed-off-by: FUJITA Tomonori
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     

29 Dec, 2008

5 commits

  • We don't need to clear the memory used for adding bio_vec entries,
    since nobody should be looking at uninitialized members. Any valid
    use should be below bio->bi_vcnt, and members up until that count
    must be valid, since they were added through bio_add_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When we go and allocate a bio for IO, we actually do two allocations.
    One for the bio itself, and one for the bi_io_vec that holds the
    actual pages we are interested in.

    This feature inlines a definable amount of io vecs inside the bio
    itself, so we eliminate the bio_vec array allocation for IO's up
    to a certain size. It defaults to 4 vecs, which is typically 16k
    of IO.
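    The layout idea can be sketched with a flexible array member: the vecs
    live at the end of the bio itself, so small IOs need a single allocation.
    This is an illustrative user-space model, not the kernel's struct bio:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Illustrative layout only: a bio-like header with a flexible array
     * of inline vecs appended in the same allocation. */
    struct vec { void *page; unsigned len, offset; };

    struct small_bio {
        unsigned short vcnt, max_vecs;
        struct vec *io_vec;        /* points at inline_vecs for small IOs */
        struct vec inline_vecs[];
    };

    static struct small_bio *bio_alloc_inline(unsigned short nvecs)
    {
        struct small_bio *b =
            malloc(sizeof(*b) + nvecs * sizeof(struct vec));
        if (!b)
            return NULL;
        b->vcnt = 0;
        b->max_vecs = nvecs;
        b->io_vec = b->inline_vecs;    /* no second allocation needed */
        return b;
    }

    int main(void)
    {
        /* 4 inline vecs of one 4k page each covers 16k of IO. */
        struct small_bio *b = bio_alloc_inline(4);
        assert(b && b->io_vec == b->inline_vecs && b->max_vecs == 4);
        assert(4 * 4096 == 16384);
        free(b);
        return 0;
    }
    ```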

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of having a global bio slab cache, add a reference to one
    in each bio_set that is created. This allows for personalized slabs
    in each bio_set, so that they can have bios of different sizes.

    This means we can personalize the bios we return. File systems may
    want to embed the bio inside another structure, to avoid allocating
    more items (and stuffing them in ->bi_private) after they get a bio.
    Or we may want to embed a number of bio_vecs directly at the end
    of a bio, to avoid doing two allocations to return a bio. This is now
    possible.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • In preparation for adding differently sized bios.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We only very rarely need the mempool backing, so it makes sense to
    get rid of all but one of the mempool in a bio_set. So keep the
    largest bio_vec count mempool so we can always honor the largest
    allocation, and "upgrade" callers that fail.
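    The fallback logic can be sketched as follows; bvec_pool_alloc() and the
    OOM flag are hypothetical stand-ins used to make the "upgrade on failure"
    path testable in user space:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    #define BIO_MAX_PAGES 256

    /* Sketch of the fallback: try a plain allocation for the requested
     * vec count; if it fails, "upgrade" to the single guaranteed pool,
     * which is sized for the largest request and can always satisfy us.
     * bvec_pool_alloc() is a stand-in name, not the kernel API. */
    static char guaranteed_pool[BIO_MAX_PAGES * 16];

    static void *bvec_pool_alloc(int nr, int simulate_oom, int *from_pool)
    {
        void *p = simulate_oom ? NULL : malloc(nr * 16);
        if (p) {
            *from_pool = 0;
            return p;
        }
        *from_pool = 1;           /* upgraded to the largest mempool */
        return guaranteed_pool;
    }

    int main(void)
    {
        int from_pool;
        void *p = bvec_pool_alloc(4, /*simulate_oom=*/1, &from_pool);
        assert(p == guaranteed_pool && from_pool == 1);

        p = bvec_pool_alloc(4, 0, &from_pool);
        assert(p != NULL && from_pool == 0);
        free(p);
        return 0;
    }
    ```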

    Signed-off-by: Jens Axboe

    Jens Axboe
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
  • This was a forward port of work done by Mathieu Desnoyers, I changed it to
    encode the 'what' parameter on the tracepoint name, so that one can register
    interest in specific events and not on classes of events to then check the
    'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

09 Oct, 2008

10 commits

  • Since all bio_split calls refer to the same single bio_split_pool,
    the bio_split function can use bio_split_pool directly instead of
    taking a mempool_t parameter; the mempool_t parameter can then be
    removed from the bio_split parameter list, and bio_split_pool, now
    referenced only in fs/bio.c, can be marked static.

    Signed-off-by: Denis ChengRq
    Signed-off-by: Jens Axboe

    Denis ChengRq
     
  • Helper function to find the sector offset in a bio given bvec index
    and page offset.
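    The arithmetic plausibly reduces to summing the lengths of the vecs
    before the given index, adding the in-page offset, and converting bytes
    to 512-byte sectors. A user-space mirror of that idea, with illustrative
    names rather than the actual helper's signature:

    ```c
    #include <assert.h>

    /* Hypothetical mirror of the helper's arithmetic: walk the vecs
     * before @idx, add the in-page @offset, convert bytes to sectors. */
    struct vec { unsigned len; };

    static unsigned long sector_offset(const struct vec *vecs, int idx,
                                       unsigned offset)
    {
        unsigned long bytes = offset;
        for (int i = 0; i < idx; i++)
            bytes += vecs[i].len;
        return bytes >> 9;        /* 512-byte sectors */
    }

    int main(void)
    {
        struct vec vecs[] = { { 4096 }, { 4096 }, { 2048 } };
        /* two full 4k vecs plus 1024 bytes into the third = 9216 bytes */
        assert(sector_offset(vecs, 2, 1024) == 18);
        assert(sector_offset(vecs, 0, 0) == 0);
        return 0;
    }
    ```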

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Not all callers need (or want!) the mempool backing guarantee; it
    essentially means that you can only use bio_alloc() for short-lived
    allocations and not for preallocating some bios at setup or init time.

    So add bio_kmalloc() which does the same thing as bio_alloc(), except
    it just uses kmalloc() as the backing instead of the bio mempools.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch changes blk_rq_map_user to accept a NULL user-space buffer
    with a READ command if rq_map_data is not NULL. Thus a caller can pass
    page frames to blk_rq_map_user to just set up a request and bios with
    page frames properly. bio_uncopy_user (called via blk_rq_unmap_user)
    doesn't copy data to user space with such a request.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • bio_copy_kern and bio_copy_user are very similar. This converts
    bio_copy_kern to use bio_copy_user.

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • This patch introduces struct rq_map_data to enable bio_copy_user_iov()
    to use reserved pages.

    Currently, bio_copy_user_iov allocates bounce pages but
    drivers/scsi/sg.c wants to allocate pages by itself and use
    them. struct rq_map_data can be used to pass allocated pages to
    bio_copy_user_iov.

    The current users of bio_copy_user_iov simply pass NULL (they don't
    want to use pre-allocated pages).

    Signed-off-by: FUJITA Tomonori
    Cc: Jens Axboe
    Cc: Douglas Gilbert
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • Currently, blk_rq_map_user and blk_rq_map_user_iov always do
    GFP_KERNEL allocation.

    This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
    so sg can use it (sg always does GFP_ATOMIC allocation).

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Douglas Gilbert
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • This patch adds support for controlling the IO completion CPU of
    either all requests on a queue, or on a per-request basis. We export
    a sysfs variable (rq_affinity) which, if set, migrates completions
    of requests to the CPU that originally submitted it. A bio helper
    (bio_set_completion_cpu()) is also added, so that queuers can ask
    for completion on that specific CPU.

    In testing, this has been shown to cut the system time by as much
    as 20-40% on synthetic workloads where CPU affinity is desired.

    This requires a little help from the architecture, so it'll only
    work as designed for archs that are using the new generic smp
    helper infrastructure.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Remove hw_segments field from struct bio and struct request. Without virtual
    merge accounting they have no purpose.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     
  • Remove virtual merge accounting.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     

27 Aug, 2008

1 commit

  • The commit c5dec1c3034f1ae3503efbf641ff3b0273b64797 introduced
    __bio_copy_iov() to add bounce support to blk_rq_map_user_iov.

    __bio_copy_iov() uses bio->bv_len to copy data for READ commands
    after completion, but it doesn't work with a request that has
    partially completed. SCSI always completes a PC request as a whole,
    but it seems some drivers don't.

    Signed-off-by: FUJITA Tomonori
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    FUJITA Tomonori