16 Nov, 2011

1 commit

  • This is just a cleanup patch to silence a static checker warning.

    The problem is that we cap "nr_iovecs" so it can't be larger than
    "UIO_MAXIOV" but we don't check for negative values. It turns out this is
    prevented at other layers, but logically it doesn't make sense to have
    negative nr_iovecs so making it unsigned is nicer.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Dan Carpenter
     

24 Oct, 2011

1 commit

  • bio originally has the functionality to set the complete cpu, but
    it is broken.

    Chirstoph said that "This code is unused, and from the all the
    discussions lately pretty obviously broken. The only thing keeping
    it serves is creating more confusion and possibly more bugs."

    And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
    with leaving cpu control to the request based drivers, they are the
    only ones that can toggle the setting anyway".

    So this patch tries to remove all the work of controling complete cpu
    from a bio.

    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     

27 May, 2011

1 commit


31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

23 Mar, 2011

1 commit

  • printk()s without a priority level default to KERN_WARNING. To reduce
    noise at KERN_WARNING, this patch set the priority level appriopriately
    for unleveled printks()s. This should be useful to folks that look at
    dmesg warnings closely.

    Signed-off-by: Mandeep Singh Baines
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

17 Mar, 2011

1 commit

  • MD and DM create a new bio_set for every metadevice. Each bio_set has an
    integrity mempool attached regardless of whether the metadevice is
    capable of passing integrity metadata. This is a waste of memory.

    Instead we defer the allocation decision to MD and DM since we know at
    metadevice creation time whether integrity passthrough is needed or not.

    Automatic integrity mempool allocation can then be removed from
    bioset_create() and we make an explicit integrity allocation for the
    fs_bio_set.

    Signed-off-by: Martin K. Petersen
    Reported-by: Zdenek Kabelac
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

08 Mar, 2011

1 commit


10 Nov, 2010

2 commits


08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This allows to more easily trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superflous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

19 Mar, 2010

1 commit


08 Mar, 2010

2 commits

  • Conflicts:
    Documentation/filesystems/proc.txt
    arch/arm/mach-u300/include/mach/debug-macro.S
    drivers/net/qlge/qlge_ethtool.c
    drivers/net/qlge/qlge_main.c
    drivers/net/typhoon.c

    Jiri Kosina
     
  • merge_bvec_fn() returns bvec->bv_len on success. So we have to check
    against this value. But in case of fs_optimization merge we compare
    with wrong value. This patch must be included in
    b428cd6da7e6559aca69aa2e3a526037d3f20403
    But accidentally i've forgot to add this in the initial patch.
    To make things straight let's replace all such checks.
    In fact this makes code easy to understand.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

03 Mar, 2010

1 commit


01 Mar, 2010

1 commit

  • merge_bvec_fn() returns bvec->bv_len on success. So we have to check
    against this value. But in case of fs_optimization merge we compare
    with wrong value. This patch must be included in
    b428cd6da7e6559aca69aa2e3a526037d3f20403
    But accidentally i've forgot to add this in the initial patch.
    To make things straight let's replace all such checks.
    In fact this makes code easy to understand.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

26 Feb, 2010

1 commit


05 Feb, 2010

1 commit

  • In commit 451a9ebf653d28337ba53ed5b4b70b0b9543cca1 bio_alloc_bioset()
    was refactored not to take NULL as a valid argument for bs. This patch
    changes the comment for that function accordingly. Currently, passing
    NULL as argument to parameter bs would result in a NULL pointer
    dereference.

    Signed-off-by: Jaak Ristioja
    Signed-off-by: Jiri Kosina

    Jaak Ristioja
     

28 Jan, 2010

1 commit

  • We have to properly decrease bi_size in order to merge_bvec_fn return
    right result. Otherwise this result in false merge rejects for two
    absolutely valid bio_vecs. This may cause significant performance
    penalty for example fs_block_size == 1k and block device is raid0 with
    small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in
    to bio which already has 6 fs-blocks.

    Cc:
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

19 Jan, 2010

1 commit


10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

26 Nov, 2009

1 commit

  • Mtdblock driver doesn't call flush_dcache_page for pages in request. So,
    this causes problems on architectures where the icache doesn't fill from
    the dcache or with dcache aliases. The patch fixes this.

    The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
    pointless empty cache-thrashing loops on architectures for which
    flush_dcache_page() is a no-op. Every architecture was provided with this
    flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
    equal 1 or do nothing otherwise.

    See "fix mtd_blkdevs problem with caches on some architectures" discussion
    on LKML for more information.

    Signed-off-by: Ilya Loginov
    Cc: Ingo Molnar
    Cc: David Woodhouse
    Cc: Peter Horton
    Cc: "Ed L. Cashin"
    Signed-off-by: Jens Axboe

    Ilya Loginov
     

02 Nov, 2009

2 commits


02 Oct, 2009

1 commit


11 Jul, 2009

1 commit

  • I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
    the block layer mapping API (2.6.28).

    Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:

    http://www.spinics.net/lists/linux-scsi/msg37135.html

    =
    The semantics of SG_DXFER_TO_FROM_DEV were:
    - copy user space buffer to kernel (LLD) buffer
    - do SCSI command which is assumed to be of the DATA_IN
    (data from device) variety. This would overwrite
    some or all of the kernel buffer
    - copy kernel (LLD) buffer back to the user space.

    The idea was to detect short reads by filling the original
    user space buffer with some marker bytes ("0xec" it would
    seem in this report). The "resid" value is a better way
    of detecting short reads but that was only added this century
    and requires co-operation from the LLD.
    =

    This patch changes the block layer mapping API to support this
    semantics. This simply adds another field to struct rq_map_data and
    enables __bio_copy_iov() to copy data from user space even with READ
    requests.

    It's better to add the flags field and kills null_mapped and the new
    from_user fields in struct rq_map_data but that approach makes it
    difficult to send this patch to stable trees because st and osst
    drivers use struct rq_map_data (they were converted to use the block
    layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
    mapping API.

    zhou sf reported this regiression and tested this patch:

    http://www.spinics.net/lists/linux-scsi/msg37128.html
    http://www.spinics.net/lists/linux-scsi/msg37168.html

    Reported-by: zhou sf
    Tested-by: zhou sf
    Cc: stable@kernel.org
    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

01 Jul, 2009

1 commit

  • This patch restores stacking ability to the block layer integrity
    infrastructure by creating a set of dedicated bip slabs. Each bip slab
    has an embedded bio_vec array at the end. This cuts down on memory
    allocations and also simplifies the code compared to the original bvec
    version. Only the largest bip slab is backed by a mempool. The pool is
    contained in the bio_set so stacking drivers can ensure forward
    progress.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

16 Jun, 2009

1 commit


13 Jun, 2009

1 commit


12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

11 Jun, 2009

1 commit

  • As reported by sparse:

    fs/bio.c:720:13: warning: incorrect type in assignment (different address spaces)
    fs/bio.c:720:13: expected char *iov_addr
    fs/bio.c:720:13: got void [noderef] *
    fs/bio.c:724:36: warning: incorrect type in argument 2 (different address spaces)
    fs/bio.c:724:36: expected void const [noderef] *from
    fs/bio.c:724:36: got char *iov_addr

    Signed-off-by: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Michal Simek
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the deivce from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print.
    While blktrace do the convertion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

    dd dd + ioctl blktrace dd + TRACE_EVENT (splice)
    1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s
    2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s
    3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and no regression when using
    those trace events vs blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

23 May, 2009

3 commits


19 May, 2009

1 commit

  • When a read bio_copy_kern() request fails, the content of the bounce
    buffer is not copied back. However, as request failure doesn't
    necessarily mean complete failure, the buffer state can be useful.
    This behavior is also inconsistent with the user map counterpart and
    causes the subtle difference between bounced and unbounced IO causes
    confusion.

    This patch makes bio_copy_kern_endio() ignore @err and always copy
    back data on request completion.

    Signed-off-by: Tejun Heo
    Cc: Boaz Harrosh
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

29 Apr, 2009

1 commit


22 Apr, 2009

2 commits

  • Impact: remove possible deadlock condition

    There is no reason to use mempool backed allocation for map functions.
    Also, because kern mapping is used inside LLDs (e.g. for EH), using
    mempool backed allocation can lead to deadlock under extreme
    conditions (mempool already consumed by the time a request reached EH
    and requests are blocked on EH).

    Switch copy/map functions to bio_kmalloc().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Impact: fix bio_kmalloc() and its destruction path

    bio_kmalloc() was broken in two ways.

    * bvec_alloc_bs() first allocates bvec using kmalloc() and then
    ignores it and allocates again like non-kmalloc bvecs.

    * bio_kmalloc_destructor() didn't check for and free bio integrity
    data.

    This patch fixes the above problems. kmalloc patch is separated out
    from bio_alloc_bioset() and allocates the requested number of bvecs as
    inline bvecs.

    * bio_alloc_bioset() no longer takes NULL @bs. None other than
    bio_kmalloc() used it and outside users can't know how it was
    allocated anyway.

    * Define and use BIO_POOL_NONE so that pool index check in
    bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
    there.

    * Relocate destructors on top of each allocation function so that how
    they're used is more clear.

    Jens Axboe suggested allocating bvecs inline.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo