20 Sep, 2012

2 commits

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Remove special-casing of non-rw fs style requests (discard). The nomerge
    flags are consolidated in blk_types.h, and rq_mergeable() and
    bio_mergeable() have been modified to use them.

    bio_is_rw() is used in place of bio_has_data() in a few places. This
    is done to distinguish true reads and writes from other fs-type
    requests that carry a payload (e.g. write same).

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

09 Sep, 2012

3 commits

  • Now that we've got generic code for freeing bios allocated from bio
    pools, this isn't needed anymore.

    This patch also makes bio_free() static, since without bi_destructor
    there should be no need for it to be called anywhere else.

    bio_free() is now only called from bio_put, so we can refactor those a
    bit - move some code from bio_put() to bio_free() and kill the redundant
    bio->bi_next = NULL.

    v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
    v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
    v7: No #define BIO_KMALLOC_POOL anymore

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Reusing bios is something that's been highly frowned upon in the past,
    but driver code keeps doing it anyway. If it's going to happen anyway,
    we should provide a generic method.

    This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
    was open coding it, by doing a bio_init() and resetting bi_destructor.

    This required reordering struct bio, but the block layer is not yet
    nearly fast enough for any cacheline effects to matter here.

    v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
    bio->bi_flags are saved.
    v6: Further commenting verbosity, per Tejun
    v9: Add a function comment

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • With the old code, when you allocate a bio from a bio pool you have to
    implement your own destructor that knows how to find the bio pool the
    bio was originally allocated from.

    This adds a new field to struct bio (bi_pool) and changes
    bio_alloc_bioset() to use it. This makes various bio destructors
    unnecessary, so they're then deleted.

    v6: Explain the temporary if statement in bio_put

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: NeilBrown
    CC: Alasdair Kergon
    CC: Nicholas Bellinger
    CC: Lars Ellenberg
    Acked-by: Tejun Heo
    Acked-by: Nicholas Bellinger
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

01 Aug, 2012

1 commit

  • This patch adds two new APIs get_kernel_pages() and get_kernel_page() that
    may be used to pin a vector of kernel addresses for IO. The initial user
    is expected to be NFS for allowing pages to be written to swap using
    aops->direct_IO(). Strictly speaking, swap-over-NFS only needs to pin one
    page for IO but it makes sense to express the API in terms of a vector and
    add a helper for pinning single pages.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Cc: Mark Salter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

07 Mar, 2012

1 commit

  • IO scheduling and cgroup are tied to the issuing task via io_context
    and cgroup of %current. Unfortunately, there are cases where IOs need
    to be routed via a different task which makes scheduling and cgroup
    limit enforcement applied completely incorrectly.

    For example, all bios delayed by blk-throttle end up being issued by a
    delayed work item and get assigned the io_context of the worker task
    which happens to serve the work item and dumped to the default block
    cgroup. This is doubly confusing, as bios which aren't delayed end up
    in the correct cgroup, and it makes using blk-throttle and cfq propio
    together impossible.

    Any code which punts IO issuing to another task is affected, and such
    code is getting more and more common (e.g. btrfs). As both io_context
    and cgroup are firmly tied to the task, including the userland-visible
    APIs to manipulate them, it makes a lot of sense to match up tasks
    to bios.

    This patch implements bio_associate_current() which associates the
    specified bio with %current. The bio will record the associated ioc
    and blkcg at that point, and the block layer will use the recorded
    ones regardless of which task actually ends up issuing the bio. bio
    release puts the associated ioc and blkcg.

    It grabs and remembers the ioc and blkcg instead of the task itself
    because the task may already be dead by the time the bio is issued,
    making the ioc and blkcg inaccessible, and those are all the block
    layer cares about.

    elevator_set_req_fn() is updated so that the bio for which elvdata is
    being allocated is available to the elevator.

    This doesn't update block cgroup policies yet. Further patches will
    implement the support.

    -v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in
    rq_ioc() to fix build breakage.

    Signed-off-by: Tejun Heo
    Cc: Vivek Goyal
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Tejun Heo
     

24 Oct, 2011

1 commit

  • bio originally had the functionality to set the completion CPU, but
    it is broken.

    Christoph said that "This code is unused, and from all the
    discussions lately pretty obviously broken. The only purpose it
    serves is creating more confusion and possibly more bugs."

    And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
    with leaving cpu control to the request based drivers, they are the
    only ones that can toggle the setting anyway".

    So this patch removes all the machinery for controlling the
    completion CPU from a bio.

    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     

23 Aug, 2011

1 commit

  • Add a new REQ_PRIO to let requests preempt others in the cfq I/O
    scheduler, and leave REQ_META purely for marking requests as metadata
    in blktrace.

    All existing callers of REQ_META except for XFS are updated to also
    set REQ_PRIO for now.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

11 Aug, 2011

1 commit


20 Jun, 2011

1 commit


21 May, 2011

1 commit


10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Jan, 2011

1 commit

  • rq == &q->flush_rq was used to determine whether a rq is part of a
    flush sequence, which worked because all requests in a flush sequence
    were sequenced using the single dedicated request. This is about to
    change, so introduce REQ_FLUSH_SEQ flag to distinguish flush sequence
    requests.

    This patch doesn't cause any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

10 Nov, 2010

1 commit

  • REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
    at this point is:

    - various checks inside the block layer.
    - sanity checks in bio based drivers.
    - now unused bio_empty_barrier helper.
    - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's been dead for a
    while, but Xen really needs to sort out its barrier situation.
    - setting of ordered tags in uas - dead code copied from old scsi
    drivers.
    - scsi different retry for barriers - it's dead and should have been
    removed when flushes were converted to FS requests.
    - blktrace handling of barriers - removed. Someone who knows blktrace
    better should add support for REQ_FLUSH and REQ_FUA, though.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

15 Oct, 2010

1 commit

  • Previously we tracked whether the integrity metadata had been remapped
    using a request flag. This was fine for low-level retries. However, if
    an I/O was redriven by upper layers we would end up remapping again,
    causing the retry to fail.

    Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY
    which enables filesystems to notify lower layers that the bio in
    question has already been remapped.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

16 Sep, 2010

1 commit


10 Sep, 2010

3 commits

  • Currently __blk_rq_prep_clone() copies only REQ_WRITE and REQ_DISCARD.
    There's no reason to omit other command flags and REQ_FUA needs to be
    copied to implement FUA support in request-based dm.

    REQ_COMMON_MASK which specifies flags to be copied from bio to request
    already identifies all the command flags. Define REQ_CLONE_MASK to be
    the same as REQ_COMMON_MASK for clarity and make __blk_rq_prep_clone()
    copy all flags in the mask.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Now that the backend conversion is complete, export sequenced
    FLUSH/FUA capability through REQ_FLUSH/FUA flags. REQ_FLUSH means the
    device cache should be flushed before executing the request. REQ_FUA
    means that the data in the request should be on non-volatile media on
    completion.

    Block layer will choose the correct way of implementing the semantics
    and execute it. The request may be passed to the device directly if
    the device can handle it; otherwise, it will be sequenced using one or
    more proxy requests. Devices will never see REQ_FLUSH and/or REQ_FUA
    flags they don't support.

    Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are
    never failed with -EOPNOTSUPP. If the underlying device doesn't
    support FLUSH/FUA, the block layer simply makes them a no-op. IOW, it
    no longer distinguishes between a writeback cache which doesn't
    support cache flush and writethrough/no cache. Devices which have a WB
    w/o flush are very difficult to come by these days and there's nothing
    much we can do anyway, so it doesn't make sense to require everyone to
    implement -EOPNOTSUPP handling. This will simplify filesystems and
    block drivers as they can drop -EOPNOTSUPP retry logic for barriers.

    * QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into
    blk-flush.c.

    * REQ_FLUSH w/o data can also be directly passed to drivers without
    sequencing, but some drivers assume that zero length requests don't
    have rq->bio, which isn't true for these requests, requiring the use
    of proxy requests.

    * REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are
    copied from bio to request.

    * WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and
    WRITE_FLUSH_FUA are added.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Filesystems will take all the responsibilities for ordering requests
    around commit writes and will only indicate how the commit writes
    themselves should be handled by block layers. This patch drops
    barrier ordering by queue draining from block layer. Ordering by
    draining implementation was somewhat invasive to request handling.
    A list of notable changes follows.

    * Each queue has 1 bit color which is flipped on each barrier issue.
    This is used to track whether a given request is issued before the
    current barrier or not. REQ_ORDERED_COLOR flag and coloring
    implementation in __elv_add_request() are removed.

    * Requests which shouldn't be processed yet for draining were stalled
    by returning -EAGAIN from blk_do_ordered() according to the test
    result between blk_ordered_req_seq() and blk_ordered_cur_seq().
    This logic is removed.

    * Draining completion logic in elv_completed_request() removed.

    * All barrier sequence requests were queued to the request queue and
    then trickled to the lower layer according to progress, and thus
    maintaining request order during requeue was necessary. This is replaced by
    queueing the next request in the barrier sequence only after the
    current one is complete from blk_ordered_complete_seq(), which
    removes the need for multiple proxy requests in struct request_queue
    and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of
    elv_insert().

    * As barriers no longer have ordering constraints, there's no need to
    dump the whole elevator onto the dispatch queue on each barrier.
    Insert barriers at the front instead.

    * If other barrier requests come to the front of the dispatch queue
    while one is already in progress, they are stored in
    q->pending_barriers and restored to dispatch queue one-by-one after
    each barrier completion from blk_ordered_complete_seq().

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Aug, 2010

1 commit

  • Secure discard is the same as discard except that all copies of the
    discarded sectors (perhaps created by garbage collection) must also be
    erased.

    Signed-off-by: Adrian Hunter
    Acked-by: Jens Axboe
    Cc: Kyungmin Park
    Cc: Madhusudhan Chikkature
    Cc: Christoph Hellwig
    Cc: Ben Gardiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Hunter
     

11 Aug, 2010

1 commit

  • These form the basis of the basic WRITE etc primitives, so we
    need them to be always visible. Otherwise we see errors like:

    mm/filemap.c:2164: error: 'REQ_WRITE' undeclared
    fs/read_write.c:362: error: 'REQ_WRITE' undeclared
    fs/splice.c:1108: error: 'REQ_WRITE' undeclared
    fs/aio.c:1496: error: 'REQ_WRITE' undeclared

    Reported-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Aug, 2010

1 commit

  • linux/fs.h hard-coded READ/WRITE constants which should match BIO_RW_*
    flags. This is fragile and caused breakage during BIO_RW_* flag
    rearrangement. The hardcoding is to avoid include dependency hell.

    Create linux/blk_types.h, which contains definitions for bio data
    structures and flags, and include it from bio.h and fs.h, and make fs.h
    define all READ/WRITE related constants in terms of BIO_RW_* flags.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo