14 Jan, 2013

1 commit

  • bio completion didn't kick block_bio_complete TP. Only dm was
    explicitly triggering the TP on IO completion. This makes
    block_bio_complete TP useless for tracers which want to know about
    bios, and all other bio based drivers skip generating blktrace
    completion events.

    This patch makes all bio completions via bio_endio() generate
    block_bio_complete TP.

    * Explicit trace_block_bio_complete() invocation removed from dm and
    the trace point is unexported.

    * @rq dropped from trace_block_bio_complete(). bios may fly around
    w/o queue associated. Verifying and accessing the associated queue
    belongs to TP probes.

    * blktrace now gets both request and bio completions. Make it ignore
    bio completions if request completion path is happening.

    This makes all bio based drivers generate blktrace completion events
    properly and makes the block_bio_complete TP actually useful.

    v2: With this change, block_bio_complete TP could be invoked on sg
    commands which have bios with %NULL bi_bdev. Update TP
    assignment code to check whether bio->bi_bdev is %NULL before
    dereferencing.

    Signed-off-by: Tejun Heo
    Original-patch-by: Namhyung Kim
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     

23 Oct, 2012

1 commit


28 Sep, 2012

1 commit


20 Sep, 2012

1 commit

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
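
    The semantics described above can be illustrated in userspace. This is
    only a sketch of the idea, not the block-layer implementation:
    sketch_write_same(), SKETCH_BLOCK_SIZE and the in-memory "device" are
    hypothetical names introduced here, and a 512-byte logical block is
    assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 512-byte logical blocks. */
#define SKETCH_BLOCK_SIZE 512u

/*
 * Replicate one block-sized buffer into every block of the range
 * [first, first + count) on an in-memory "device".
 * Returns 0 on success, -1 if the range does not fit on the device.
 */
static int sketch_write_same(unsigned char *dev, size_t dev_blocks,
                             size_t first, size_t count,
                             const unsigned char *block)
{
    size_t i;

    assert(dev != NULL && block != NULL);

    /* Range check written so that first + count cannot overflow. */
    if (first > dev_blocks || count > dev_blocks - first)
        return -1;

    /* Only one block of payload is supplied; it is copied count times. */
    for (i = 0; i < count; i++)
        memcpy(dev + (first + i) * SKETCH_BLOCK_SIZE, block,
               SKETCH_BLOCK_SIZE);
    return 0;
}
```

    The point of the real command is that this loop runs inside the storage
    device, so the host transfers a single block regardless of count.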
     

09 Sep, 2012

6 commits

  • Previously, there was bio_clone() but it only allocated from the fs bio
    set; as a result various users were open coding it and using
    __bio_clone().

    This changes bio_clone() to become bio_clone_bioset(), and then we add
    bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
    the functionality the last patch added.

    This will also help in a later patch changing how bio cloning works.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: NeilBrown
    CC: Alasdair Kergon
    CC: Boaz Harrosh
    CC: Jeff Garzik
    Acked-by: Jeff Garzik
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
    differently because there was some almost-duplicated code - this fixes
    some of that.

    The important change is that previously bio_kmalloc() always set
    bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
    bio_alloc_bioset(). This would cause bio_has_data() to return true; I
    don't know if this resulted in any actual bugs but it was certainly
    wrong.

    bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
    limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
    (BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
    at least they're enforced closer together and hopefully they will be
    fixed in a later patch.

    This'll also help with some future cleanups - there are a fair number of
    functions that allocate bios (e.g. bio_clone()), and now they don't have
    to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    v7: Re-add dropped comments, improve patch description
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Now that we've got generic code for freeing bios allocated from bio
    pools, this isn't needed anymore.

    This patch also makes bio_free() static, since without bi_destructor
    there should be no need for it to be called anywhere else.

    bio_free() is now only called from bio_put, so we can refactor those a
    bit - move some code from bio_put() to bio_free() and kill the redundant
    bio->bi_next = NULL.

    v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
    v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
    v7: No #define BIO_KMALLOC_POOL anymore

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Reusing bios is something that's been highly frowned upon in the past,
    but driver code keeps doing it anyways. If it's going to happen anyways,
    we should provide a generic method.

    This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
    was open coding it, by doing a bio_init() and resetting bi_destructor.

    This required reordering struct bio, but the block layer is not yet
    nearly fast enough for any cacheline effects to matter here.

    v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
    bio->bi_flags are saved.
    v6: Further commenting verbosity, per Tejun
    v9: Add a function comment

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Kent Overstreet
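
    One way to picture a generic reset is a struct laid out so that all
    per-I/O state comes first and can be cleared with a single memset(),
    while a few flag bits are saved across the reset. The following is a
    userspace sketch under that assumption; struct sketch_bio, its fields,
    and the value of SKETCH_RESET_BITS are hypothetical and are not taken
    from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct bio. Fields before max_vecs hold
 * per-I/O state; fields from max_vecs onward describe the allocation
 * and must survive a reset.
 */
struct sketch_bio {
    unsigned long sector;
    unsigned long flags;
    unsigned int  size;
    unsigned int  vcnt;
    /* everything above this line is cleared on reset */
    unsigned int  max_vecs;
    void         *pool;
};

/* Number of leading bytes that a reset clears. */
#define SKETCH_RESET_BYTES offsetof(struct sketch_bio, max_vecs)

/* Assumed split: flag bits below this index are per-I/O state. */
#define SKETCH_RESET_BITS 12
#define SKETCH_KEEP_MASK  (~0UL << SKETCH_RESET_BITS)

static void sketch_bio_reset(struct sketch_bio *bio)
{
    unsigned long keep;

    assert(bio != NULL);

    /* Save the flag bits that describe the allocation. */
    keep = bio->flags & SKETCH_KEEP_MASK;

    /* Clear per-I/O state in one go, then restore the saved bits. */
    memset(bio, 0, SKETCH_RESET_BYTES);
    bio->flags = keep;
}
```

    Grouping the resettable fields at the front is what makes the single
    memset() possible, which is one reason a change like this would need
    the struct to be reordered.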
     
  • Now that bios keep track of where they were allocated from,
    bio_integrity_alloc_bioset() becomes redundant.

    Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
    related functions and make them use bio->bi_pool.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Martin K. Petersen
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • With the old code, when you allocate a bio from a bio pool you have to
    implement your own destructor that knows how to find the bio pool the
    bio was originally allocated from.

    This adds a new field to struct bio (bi_pool) and changes
    bio_alloc_bioset() to use it. This makes various bio destructors
    unnecessary, so they're then deleted.

    v6: Explain the temporary if statement in bio_put

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: NeilBrown
    CC: Alasdair Kergon
    CC: Nicholas Bellinger
    CC: Lars Ellenberg
    Acked-by: Tejun Heo
    Acked-by: Nicholas Bellinger
    Signed-off-by: Jens Axboe

    Kent Overstreet
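
    The idea of an object remembering where it was allocated from, so that
    one generic release path can return it, can be sketched in userspace as
    below. All names (sketch_pool, sketch_obj, sketch_alloc, sketch_put)
    are hypothetical, and the "pool" here only counts outstanding objects
    rather than managing memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool: just tracks how many objects are outstanding. */
struct sketch_pool {
    int outstanding;
};

/* Hypothetical object that records its origin at allocation time. */
struct sketch_obj {
    struct sketch_pool *pool;   /* NULL means "plain malloc, no pool" */
    int refcount;
};

static struct sketch_obj *sketch_alloc(struct sketch_pool *pool)
{
    struct sketch_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return NULL;
    obj->pool = pool;           /* remember where we came from */
    obj->refcount = 1;
    if (pool)
        pool->outstanding++;
    return obj;
}

/* Generic release: no per-caller destructor is needed. */
static void sketch_put(struct sketch_obj *obj)
{
    assert(obj != NULL && obj->refcount > 0);

    if (--obj->refcount > 0)
        return;
    if (obj->pool)
        obj->pool->outstanding--;
    free(obj);
}
```

    Because the origin travels with the object, callers that use their own
    pool no longer have to supply a callback that knows which pool to give
    the object back to.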
     

26 Aug, 2012

1 commit

  • Pull block-related fixes from Jens Axboe:

    - Improvements to the buffered and direct write IO plugging from
    Fengguang.

    - Abstract out the mapping of a bio in a request, and use that to
    provide a blk_bio_map_sg() helper. Useful for mapping just a bio
    instead of a full request.

    - Regression fix from Hugh, fixing up a patch that went into the
    previous release cycle (and marked stable, too) attempting to prevent
    a loop in __getblk_slow().

    - Updates to discard requests, fixing up the sizing and how we align
    them. Also a change to disallow merging of discard requests, since
    that doesn't really work properly yet.

    - A few drbd fixes.

    - Documentation updates.

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: replace __getblk_slow misfix by grow_dev_page fix
    drbd: Write all pages of the bitmap after an online resize
    drbd: Finish requests that completed while IO was frozen
    drbd: fix drbd wire compatibility for empty flushes
    Documentation: update tunable options in block/cfq-iosched.txt
    Documentation: update tunable options in block/cfq-iosched.txt
    Documentation: update missing index files in block/00-INDEX
    block: move down direct IO plugging
    block: remove plugging at buffered write time
    block: disable discard request merge temporarily
    bio: Fix potential memory leak in bio_find_or_create_slab()
    block: Don't use static to define "void *p" in show_partition_start()
    block: Add blk_bio_map_sg() helper
    block: Introduce __blk_segment_map_sg() helper
    fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
    block: split discard into aligned requests
    block: reorganize rounding of max_discard_sectors

    Linus Torvalds
     

09 Aug, 2012

1 commit


04 Aug, 2012

1 commit


30 May, 2012

1 commit

  • Merge block/IO core bits from Jens Axboe:
    "This is a bit bigger on the core side than usual, but that is purely
    because we decided to hold off on parts of Tejun's submission on 3.4
    to give it a bit more time to simmer. As a consequence, it's seen a
    long cycle in for-next.

    It contains:

    - Bug fix from Dan, wrong locking type.
    - Relax splice gifting restriction from Eric.
    - A ton of updates from Tejun, primarily for blkcg. This improves
    the code a lot, making the API nicer and cleaner, and also includes
    fixes for how we handle and tie policies and re-activate on
    switches. The changes also include generic bug fixes.
    - A simple fix from Vivek, along with a fix for doing proper delayed
    allocation of the blkcg stats."

    Fix up annoying conflict just due to different merge resolution in
    Documentation/feature-removal-schedule.txt

    * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
    blkcg: tg_stats_alloc_lock is an irq lock
    vmsplice: relax alignement requirements for SPLICE_F_GIFT
    blkcg: use radix tree to index blkgs from blkcg
    blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
    block: fix elvpriv allocation failure handling
    block: collapse blk_alloc_request() into get_request()
    blkcg: collapse blkcg_policy_ops into blkcg_policy
    blkcg: embed struct blkg_policy_data in policy specific data
    blkcg: mass rename of blkcg API
    blkcg: style cleanups for blk-cgroup.h
    blkcg: remove blkio_group->path[]
    blkcg: blkg_rwstat_read() was missing inline
    blkcg: shoot down blkgs if all policies are deactivated
    blkcg: drop stuff unused after per-queue policy activation update
    blkcg: implement per-queue policy activation
    blkcg: add request_queue->root_blkg
    blkcg: make request_queue bypassing on allocation
    blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
    blkcg: make blkg_conf_prep() take @pol and return with queue lock held
    blkcg: remove static policy ID enums
    ...

    Linus Torvalds
     

11 May, 2012

1 commit

  • The value returned by bio_get_nr_vecs() is passed down via bio_alloc()
    to bvec_alloc_bs(), which fails the bio allocation if
    nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
    unexpected bio allocation failure.
    Limiting to queue_max_segments() is not sufficient, as max_segments
    also might be very large.

    bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs > BIO_MAX_PAGES
    bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
    bio_alloc(GFP_NOIO, nvecs)
    xfs_alloc_ioend_bio()

    Signed-off-by: Bernd Schubert
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Bernd Schubert
     

02 Apr, 2012

1 commit

  • cgroup/for-3.5 contains the following changes which blk-cgroup needs
    to proceed with the on-going cleanup.

    * Dynamic addition and removal of cftypes to make config/stat file
    handling modular for policies.

    * cgroup removal update to not wait for css references to drain to fix
    blkcg removal hang caused by cfq caching cfqgs.

    Pull in cgroup/for-3.5 into block/for-3.5/core. This causes the
    following conflicts in block/blk-cgroup.c.

    * 761b3ef50e "cgroup: remove cgroup_subsys argument from callbacks"
    conflicts with blkiocg_pre_destroy() addition and blkiocg_attach()
    removal. Resolved by removing @subsys from all subsys methods.

    * 676f7c8f84 "cgroup: relocate cftype and cgroup_subsys definitions in
    controllers" conflicts with ->pre_destroy() and ->attach() updates
    and removal of modular config. Resolved by dropping forward
    declarations of the methods and applying updates to the relocated
    blkio_subsys.

    * 4baf6e3325 "cgroup: convert all non-memcg controllers to the new
    cftype interface" builds upon the previous item. Resolved by adding
    ->base_cftypes to the relocated blkio_subsys.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

07 Mar, 2012

1 commit

  • IO scheduling and cgroup are tied to the issuing task via io_context
    and cgroup of %current. Unfortunately, there are cases where IOs need
    to be routed via a different task which makes scheduling and cgroup
    limit enforcement applied completely incorrectly.

    For example, all bios delayed by blk-throttle end up being issued by a
    delayed work item, get assigned the io_context of the worker task
    which happens to serve the work item, and are dumped into the default
    block cgroup. This is doubly confusing as bios which aren't delayed end
    up in the correct cgroup, and it makes using blk-throttle and cfq propio
    together impossible.

    Any code which punts IO issuing to another task is affected which is
    getting more and more common (e.g. btrfs). As both io_context and
    cgroup are firmly tied to task including userland visible APIs to
    manipulate them, it makes a lot of sense to match up tasks to bios.

    This patch implements bio_associate_current() which associates the
    specified bio with %current. The bio will record the associated ioc
    and blkcg at that point and block layer will use the recorded ones
    regardless of which task actually ends up issuing the bio. bio
    release puts the associated ioc and blkcg.

    It grabs and remembers ioc and blkcg instead of the task itself
    because the task may already be dead by the time the bio is issued,
    making ioc and blkcg inaccessible, and those are all the block layer
    cares about.

    elevator_set_req_fn() is updated such that the bio for which elvdata is
    being allocated is available to the elevator.

    This doesn't update block cgroup policies yet. Further patches will
    implement the support.

    -v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in
    rq_ioc() to fix build breakage.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Tejun Heo
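
    The association described above amounts to capturing a reference to the
    submitter's context at submission time and preferring it over the
    context of whichever task later issues the I/O. A minimal userspace
    sketch follows. The types and functions (sketch_ctx, sketch_bio,
    sketch_associate, sketch_effective_ctx, sketch_release) are
    hypothetical, and a single context object stands in for the ioc/blkcg
    pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ioc/blkcg pair a task carries. */
struct sketch_ctx {
    int refs;   /* reference count */
    int id;     /* identifies the owner, for demonstration only */
};

/* Hypothetical stand-in for struct bio. */
struct sketch_bio {
    struct sketch_ctx *ctx;     /* NULL until associated */
};

/* Record the submitter's context in the bio and take a reference. */
static int sketch_associate(struct sketch_bio *bio,
                            struct sketch_ctx *submitter)
{
    assert(bio != NULL);

    if (bio->ctx || !submitter)
        return -1;              /* already associated, or nothing to record */
    submitter->refs++;
    bio->ctx = submitter;
    return 0;
}

/* The recorded context wins over that of the task issuing the bio. */
static struct sketch_ctx *sketch_effective_ctx(const struct sketch_bio *bio,
                                               struct sketch_ctx *issuer)
{
    return bio->ctx ? bio->ctx : issuer;
}

/* Releasing the bio drops the reference taken at association time. */
static void sketch_release(struct sketch_bio *bio)
{
    if (bio->ctx) {
        bio->ctx->refs--;
        bio->ctx = NULL;
    }
}
```

    Holding a reference to the context rather than to the task keeps the
    context valid even if the submitting task has exited before the bio is
    issued.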
     

29 Feb, 2012

1 commit


09 Feb, 2012

1 commit

  • There were two places bio_get_nr_vecs() could overflow:

    First, it did a left shift to convert from sectors to bytes immediately
    before dividing by PAGE_SIZE. If PAGE_SIZE ever was less than 512 a great
    many things would break, so dividing by PAGE_SIZE >> 9 is safe and will
    generate smaller code too.

    The nastier overflow was in the DIV_ROUND_UP() (that's what the code was
    effectively doing, anyways). If n + d overflowed, the whole thing would
    return 0 which breaks things rather effectively.

    bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the
    DIV_ROUND_UP() is silly; we could do a straight divide except if a
    device's queue_max_sectors was less than PAGE_SIZE we'd return 0. So we
    just add 1; this should always be safe - things will break badly if
    bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start
    failing) but it's queue_max_segments that must guard against this; if
    queue_max_sectors is what prevents this from happening, things are going
    to explode on architectures with a different PAGE_SIZE.

    Signed-off-by: Kent Overstreet
    Cc: Tejun Heo
    Acked-by: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Kent Overstreet
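
    The arithmetic problem can be reproduced in userspace with 32-bit
    unsigned values. This is a sketch only: the function names are
    hypothetical, and 4 KiB pages with 512-byte sectors are assumed. It
    contrasts a round-up computed in bytes with the "divide by sectors per
    page, then add 1" form the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 512-byte sectors. */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_SECTOR_SHIFT 9

/*
 * Round-up style: convert sectors to bytes, then round up to pages.
 * Both the shift and the "+ PAGE_SIZE - 1" can wrap around in 32 bits,
 * which can make the result 0 for very large limits.
 */
static uint32_t sketch_nr_vecs_round_up(uint32_t max_sectors)
{
    uint32_t bytes = max_sectors << SKETCH_SECTOR_SHIFT;

    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Safe style: divide by sectors-per-page, then add 1 so that a limit
 * smaller than one page still yields a non-zero count.
 */
static uint32_t sketch_nr_vecs_safe(uint32_t max_sectors)
{
    uint32_t sectors_per_page = SKETCH_PAGE_SIZE >> SKETCH_SECTOR_SHIFT;

    assert(sectors_per_page != 0);
    return max_sectors / sectors_per_page + 1;
}
```

    With max_sectors set to UINT32_MAX the round-up form wraps and returns
    0, while the safe form returns a large non-zero value that a separate
    segment limit is then expected to cap.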
     

16 Nov, 2011

1 commit

  • This is just a cleanup patch to silence a static checker warning.

    The problem is that we cap "nr_iovecs" so it can't be larger than
    "UIO_MAXIOV" but we don't check for negative values. It turns out this is
    prevented at other layers, but logically it doesn't make sense to have
    negative nr_iovecs so making it unsigned is nicer.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Dan Carpenter
     

24 Oct, 2011

1 commit

  • bio originally had the ability to set the completion CPU, but
    it is broken.

    Christoph said that "This code is unused, and from the all the
    discussions lately pretty obviously broken. The only thing keeping
    it serves is creating more confusion and possibly more bugs."

    And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
    with leaving cpu control to the request based drivers, they are the
    only ones that can toggle the setting anyway".

    So this patch removes all the code for controlling the completion CPU
    from a bio.

    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     

27 May, 2011

1 commit


31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

23 Mar, 2011

1 commit

  • printk()s without a priority level default to KERN_WARNING. To reduce
    noise at KERN_WARNING, this patch sets the priority level appropriately
    for unleveled printk()s. This should be useful to folks that look at
    dmesg warnings closely.

    Signed-off-by: Mandeep Singh Baines
    Cc: Jens Axboe
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     

17 Mar, 2011

1 commit

  • MD and DM create a new bio_set for every metadevice. Each bio_set has an
    integrity mempool attached regardless of whether the metadevice is
    capable of passing integrity metadata. This is a waste of memory.

    Instead we defer the allocation decision to MD and DM since we know at
    metadevice creation time whether integrity passthrough is needed or not.

    Automatic integrity mempool allocation can then be removed from
    bioset_create() and we make an explicit integrity allocation for the
    fs_bio_set.

    Signed-off-by: Martin K. Petersen
    Reported-by: Zdenek Kabelac
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

08 Mar, 2011

1 commit


10 Nov, 2010

2 commits


08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

19 Mar, 2010

1 commit


08 Mar, 2010

2 commits

  • Conflicts:
    Documentation/filesystems/proc.txt
    arch/arm/mach-u300/include/mach/debug-macro.S
    drivers/net/qlge/qlge_ethtool.c
    drivers/net/qlge/qlge_main.c
    drivers/net/typhoon.c

    Jiri Kosina
     
  • merge_bvec_fn() returns bvec->bv_len on success, so we have to check
    against this value. But in the fs_optimization merge case we compare
    with the wrong value. This fix should have been included in
    b428cd6da7e6559aca69aa2e3a526037d3f20403
    but I accidentally forgot to add it to the initial patch.
    To make things consistent, let's replace all such checks.
    This also makes the code easier to understand.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

03 Mar, 2010

1 commit


01 Mar, 2010

1 commit

  • merge_bvec_fn() returns bvec->bv_len on success, so we have to check
    against this value. But in the fs_optimization merge case we compare
    with the wrong value. This fix should have been included in
    b428cd6da7e6559aca69aa2e3a526037d3f20403
    but I accidentally forgot to add it to the initial patch.
    To make things consistent, let's replace all such checks.
    This also makes the code easier to understand.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

26 Feb, 2010

1 commit


05 Feb, 2010

1 commit

  • In commit 451a9ebf653d28337ba53ed5b4b70b0b9543cca1 bio_alloc_bioset()
    was refactored not to take NULL as a valid argument for bs. This patch
    changes the comment for that function accordingly. Currently, passing
    NULL as argument to parameter bs would result in a NULL pointer
    dereference.

    Signed-off-by: Jaak Ristioja
    Signed-off-by: Jiri Kosina

    Jaak Ristioja
     

28 Jan, 2010

1 commit

  • We have to properly decrease bi_size in order for merge_bvec_fn to return
    the right result. Otherwise this results in false merge rejects for two
    absolutely valid bio_vecs. This may cause a significant performance
    penalty, for example when fs_block_size == 1k and the block device is
    raid0 with a small chunk_size = 8k. Then it is impossible to merge the
    7th fs-block into a bio which already has 6 fs-blocks.

    Cc:
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

19 Jan, 2010

1 commit


10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds