24 Mar, 2013

1 commit

  • Just a little convenience macro - the main reason to add it now is
    preparing for immutable bio vecs: it'll reduce the size of the patch
    that puts bi_sector/bi_size/bi_idx into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Lars Ellenberg
    CC: Jiri Kosina
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: linux-s390@vger.kernel.org
    CC: Chris Mason
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse

    Kent Overstreet
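
    A minimal sketch of the kind of convenience helper described above.
    The macro is not named in the message, so the name below is
    hypothetical; the fields are those of struct bio before the bvec_iter
    conversion:

        #include <linux/bio.h>

        /* hypothetical helper: the sector just past the end of a bio,
         * saving callers from open-coding bi_sector + (bi_size >> 9) at
         * every site the later bvec_iter patch would have to touch */
        #define bio_last_sector(bio) \
                ((bio)->bi_sector + ((bio)->bi_size >> 9))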
     

11 Oct, 2012

2 commits


02 Apr, 2012

1 commit


19 Mar, 2012

2 commits

  • These personalities currently set a max request size of one page
    when any member device has a merge_bvec_fn because they don't
    bother to call that function.

    This causes extra work in splitting and combining requests.

    So make the extra effort to call the merge_bvec_fn when it exists
    so that we end up with larger requests out the bottom.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
    mddev. However it uses the 'safe' version of list_for_each_entry,
    and so requires an extra variable, but doesn't include 'safe' in the
    name, which would be useful documentation.

    Consequently some places use this safe version without needing it, and
    many use an explicit list_for_each_entry.

    So:
    - rename rdev_for_each to rdev_for_each_safe
    - create a new rdev_for_each which uses the plain
    list_for_each_entry,
    - use the 'safe' version only where needed, and convert all other
    list_for_each_entry calls to use rdev_for_each.

    Signed-off-by: NeilBrown

    NeilBrown
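
    Roughly what the two macros look like after the split, assuming the
    rdev list is kept on mddev->disks and linked through the rdev's
    same_set member (as in md.h of that era):

        /* plain iteration - nothing may be removed while walking */
        #define rdev_for_each(rdev, mddev) \
                list_for_each_entry(rdev, &((mddev)->disks), same_set)

        /* 'safe' variant - needs the extra 'tmp' cursor, for loops that
         * may remove the current entry */
        #define rdev_for_each_safe(rdev, tmp, mddev) \
                list_for_each_entry_safe(rdev, tmp, &((mddev)->disks), same_set)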
     

23 Dec, 2011

1 commit

  • commit d70ed2e4fafdbef0800e73942482bb075c21578b
    broke hot-add to a linear array.
    After that commit, metadata is not written to devices until they
    have been fully integrated into the array as determined by
    saved_raid_disk. That patch arranged to clear that field after
    a recovery completed.

    However for linear arrays, there is no recovery - the integration is
    instantaneous. So we need to explicitly clear the saved_raid_disk
    field.

    Signed-off-by: NeilBrown

    NeilBrown
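
    The fix described above presumably amounts to a single assignment in
    the linear hot-add path (exact placement assumed):

        /* no recovery pass runs for linear, so drop the marker that
         * keeps superblock writes deferred until recovery completes */
        rdev->saved_raid_disk = -1;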
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

05 Nov, 2011

1 commit

  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

01 Nov, 2011

1 commit


11 Oct, 2011

5 commits


12 Sep, 2011

1 commit

  • There is very little benefit in letting a ->make_request
    instance update the bio's device and sector and loop around it in
    __generic_make_request when we can achieve the same by calling
    generic_make_request from the driver and letting the loop in
    generic_make_request handle it.

    Note that various drivers got the return value from ->make_request and
    returned non-zero values for errors.

    Signed-off-by: Christoph Hellwig
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
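
    A hedged sketch of the driver-side conversion; the device type and
    its fields are hypothetical, and the make_request signature is the
    void form this series introduces:

        static void example_make_request(struct request_queue *q,
                                         struct bio *bio)
        {
                struct example_dev *dev = q->queuedata;

                /* remap onto the backing device... */
                bio->bi_bdev = dev->backing_bdev;
                bio->bi_sector += dev->data_offset;

                /* ...and resubmit it ourselves; the loop in
                 * generic_make_request handles the recursion instead of
                 * __generic_make_request looping on a non-zero return */
                generic_make_request(bio);
        }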
     

21 Jul, 2011

1 commit


17 Mar, 2011

1 commit

  • MD and DM create a new bio_set for every metadevice. Each bio_set has an
    integrity mempool attached regardless of whether the metadevice is
    capable of passing integrity metadata. This is a waste of memory.

    Instead we defer the allocation decision to MD and DM since we know at
    metadevice creation time whether integrity passthrough is needed or not.

    Automatic integrity mempool allocation can then be removed from
    bioset_create() and we make an explicit integrity allocation for the
    fs_bio_set.

    Signed-off-by: Martin K. Petersen
    Reported-by: Zdenek Kabelac
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
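
    A sketch of the resulting pattern, assuming an explicit helper along
    the lines of bioset_integrity_create(); the call site, function name
    and pool size are illustrative, not the exact md/dm hunks:

        /* only pay for an integrity mempool if the metadevice can
         * actually pass integrity metadata through */
        static int example_setup_integrity(struct bio_set *bs,
                                           struct gendisk *disk)
        {
                if (!blk_get_integrity(disk))
                        return 0;
                return bioset_integrity_create(bs, BIO_POOL_SIZE);
        }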
     

10 Mar, 2011

2 commits


21 Feb, 2011

1 commit

  • blk_throtl_exit assumes that ->queue_lock still exists,
    so make sure that it does.
    To do this, we stop redirecting ->queue_lock to conf->device_lock
    and leave it pointing where it is initialised - __queue_lock.

    As the blk_plug functions check the ->queue_lock is held, we now
    take that spin_lock explicitly around the plug functions. We don't
    need the locking, just the warning removal.

    This is needed for any kernel with the blk_throtl code, which is
    2.6.37 and later.

    Cc: stable@kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
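
    A sketch of the workaround described above (function name
    hypothetical); the lock is held purely so the plug helper's
    lock-held check is satisfied:

        static void example_plug(struct request_queue *q)
        {
                unsigned long flags;

                /* md's own state is protected elsewhere; take
                 * ->queue_lock only so blk_plug_device() sees it held */
                spin_lock_irqsave(q->queue_lock, flags);
                blk_plug_device(q);
                spin_unlock_irqrestore(q->queue_lock, flags);
        }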
     

10 Sep, 2010

1 commit

  • This patch converts md to support REQ_FLUSH/FUA instead of the now
    deprecated REQ_HARDBARRIER. In the core part (md.c), the following
    changes are notable.

    * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
    processing of other requests and thus there is no reason to mark the
    queue congested while FLUSH/FUA is in progress.

    * REQ_FLUSH/FUA failures are final and their users don't need retry
    logic. Retry logic is removed.

    * Preflush needs to be issued to all member devices but FUA writes can
    be handled the same way as other writes - their processing can be
    deferred to request_queue of member devices. md_barrier_request()
    is renamed to md_flush_request() and simplified accordingly.

    For linear, raid0 and multipath, the core changes are enough. raid1,
    5 and 10 need the following conversions.

    * raid1: Handling of FLUSH/FUA bio's can simply be deferred to
    request_queues of member devices. Barrier related logic removed.

    * raid5: Queue draining logic dropped. FUA bit is propagated through
    biodrain and stripe reconstruction such that all the updated parts
    of the stripe are written out with FUA writes if any of the dirtying
    writes was FUA. preread_active_stripes handling in make_request()
    is updated as suggested by Neil Brown.

    * raid10: FUA bit needs to be propagated to write clones.

    linear, raid0, 1, 5 and 10 tested.

    Signed-off-by: Tejun Heo
    Reviewed-by: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
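
    A sketch of the per-personality side of the conversion, with names
    and signatures as assumed for kernels of that era (mddev_t, an
    int-returning make_request):

        static int example_make_request(struct request_queue *q,
                                        struct bio *bio)
        {
                mddev_t *mddev = q->queuedata;

                /* hand FLUSH bios to the renamed core helper, which
                 * fans the preflush out to every member device */
                if (unlikely(bio->bi_rw & REQ_FLUSH)) {
                        md_flush_request(mddev, bio);
                        return 0;
                }

                /* normal mapping follows; REQ_FUA is simply left set on
                 * the bios sent down to the member devices */
                return 0;
        }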
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
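
    A small illustration of what the unification buys (function name
    hypothetical): the same REQ_* names now work on a bio's bi_rw and on
    a request's cmd_flags.

        static bool example_is_sync(struct bio *bio)
        {
                /* previously this needed the bio-only BIO_RW_* names
                 * via bio_rw_flagged() */
                return (bio->bi_rw & REQ_SYNC) != 0;
        }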
     

22 May, 2010

1 commit


18 May, 2010

3 commits


17 May, 2010

1 commit

  • Since commit ef286f6fa673cd7fb367e1b145069d8dbfcc6081
    it has been important that each personality clears
    ->private in the ->stop() function, or sets it to an
    attribute group to be removed.
    linear.c doesn't. This can sometimes lead to an oops,
    though it doesn't always.

    Suitable for 2.6.33-stable and 2.6.34.

    Signed-off-by: NeilBrown
    Cc: stable@kernel.org

    NeilBrown
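
    A sketch of the missing cleanup in linear.c's ->stop method; the type
    names and the surrounding teardown are assumptions based on the code
    of that era:

        static int example_linear_stop(mddev_t *mddev)
        {
                linear_conf_t *conf = mddev->private;

                /* ... quiesce and free conf as before ... */
                kfree(conf);
                mddev->private = NULL; /* the assignment linear.c lacked */

                return 0;
        }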
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
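
    In the simplest case the per-file change the sweep produces is one
    line: a file that uses the slab API names the header it needs instead
    of inheriting it via percpu.h.

        #include <linux/slab.h> /* kmalloc()/kfree(); no longer implied
                                   by sched.h or module.h */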
     

16 Mar, 2010

1 commit

  • If a component device has a merge_bvec_fn then as we never call it
    we must ensure we never need to. Currently this is done by setting
    max_sectors to 1 PAGE; however, this does not stop a bio being created
    with several sub-page iovecs that would violate the merge_bvec_fn.

    So instead set max_segments to 1 and set the segment boundary to the
    same as a page boundary to ensure there is only ever one single-page
    segment of IO requested at a time.

    This can particularly be an issue when 'xen' is used as it is
    known to submit multiple small buffers in a single bio.

    Signed-off-by: NeilBrown
    Cc: stable@kernel.org

    NeilBrown
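
    A sketch of the new restriction; type names follow the md code of
    that era and the exact call site is assumed:

        static void example_limit_queue(mddev_t *mddev, mdk_rdev_t *rdev)
        {
                if (bdev_get_queue(rdev->bdev)->merge_bvec_fn) {
                        /* one segment per request, never crossing a page
                         * boundary, so the member's merge_bvec_fn can
                         * never be needed */
                        blk_queue_max_segments(mddev->queue, 1);
                        blk_queue_segment_boundary(mddev->queue,
                                                   PAGE_CACHE_SIZE - 1);
                }
        }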
     

26 Feb, 2010

1 commit

  • The block layer calling convention is blk_queue_<limit name>.
    blk_queue_max_sectors predates this practice, leading to some confusion.
    Rename the function to appropriately reflect that its intended use is to
    set max_hw_sectors.

    Also introduce a temporary wrapper for backwards compatibility. This can
    be removed after the merge window is closed.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
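
    The call-site change is a straight rename; the limit shown is only an
    example value:

        /* was: blk_queue_max_sectors(q, 1024); */
        blk_queue_max_hw_sectors(q, 1024);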
     

14 Dec, 2009

2 commits

  • Suggested by Oren Held

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Previously barriers were only supported on RAID1. This is because
    other levels require synchronisation across all devices and so needed
    a different approach.
    Here is that approach.

    When a barrier arrives, we send a zero-length barrier to every active
    device. When that completes - and if the original request was not
    empty - we submit the barrier request itself (with the barrier flag
    cleared) and then submit a fresh load of zero length barriers.

    The barrier request itself is asynchronous, but any subsequent
    request will block until the barrier completes.

    The reason for clearing the barrier flag is that a barrier request is
    allowed to fail. If we pass a non-empty barrier through a striping
    raid level it is conceivable that part of it could succeed and part
    could fail. That would be way too hard to deal with.
    So if the first run of zero length barriers succeed, we assume all is
    sufficiently well that we send the request and ignore errors in the
    second run of barriers.

    RAID5 needs extra care as write requests may not have been submitted
    to the underlying devices yet. So we flush the stripe cache before
    proceeding with the barrier.

    Note that the second set of zero-length barriers are submitted
    immediately after the original request is submitted. Thus when
    a personality finds mddev->barrier to be set during make_request,
    it should not return from make_request until the corresponding
    per-device request(s) have been queued.

    That will be done in later patches.

    Signed-off-by: NeilBrown
    Reviewed-by: Andre Noll

    NeilBrown
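
    A very rough outline of the flow described above. Every helper named
    here is hypothetical; the real md.c code tracks the in-flight
    zero-length barriers with reference counts and completion callbacks:

        /* 1. zero-length barrier to every active member device, then
         *    wait for all of them to complete */
        submit_empty_barriers(mddev);            /* hypothetical */
        wait_for_empty_barriers(mddev);          /* hypothetical */

        /* 2. the original request with its barrier flag cleared, so a
         *    partial failure across striped members cannot happen */
        bio->bi_rw &= ~(1 << BIO_RW_BARRIER);
        if (bio->bi_size)
                submit_original_request(mddev, bio);  /* hypothetical */

        /* 3. a second round of zero-length barriers, submitted
         *    immediately, that later requests wait on */
        submit_empty_barriers(mddev);            /* hypothetical */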
     

23 Sep, 2009

1 commit


11 Sep, 2009

1 commit


03 Aug, 2009

2 commits

  • As revalidate_disk calls check_disk_size_change, it will cause
    any capacity change of a gendisk to be propagated to the blockdev
    inode. So use that instead of mucking about with locks and
    i_size_write.

    Also add a call to revalidate_disk in do_md_run and a few other places
    where the gendisk capacity is changed.

    Signed-off-by: NeilBrown

    NeilBrown
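
    A sketch of the pattern described above (field names as in the md of
    that era, call placement assumed):

        static void example_update_size(mddev_t *mddev)
        {
                /* publish the new size on the gendisk and let
                 * revalidate_disk() push it through
                 * check_disk_size_change() to the bdev inode */
                set_capacity(mddev->gendisk, mddev->array_sectors);
                revalidate_disk(mddev->gendisk);
        }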
     
  • This patch replaces md_integrity_check() by two new public functions:
    md_integrity_register() and md_integrity_add_rdev() which are both
    personality-independent.

    md_integrity_register() is called from the ->run and ->hot_remove
    methods of all personalities that support data integrity. The
    function iterates over the component devices of the array and
    determines if all active devices are integrity capable and if their
    profiles match. If this is the case, the common profile is registered
    for the mddev via blk_integrity_register().

    The second new function, md_integrity_add_rdev() is called from the
    ->hot_add_disk methods, i.e. whenever a new device is being added
    to a raid array. If the new device does not support data integrity,
    or has a profile different from the one already registered, data
    integrity for the mddev is disabled.

    For raid0 and linear, only the call to md_integrity_register() from
    the ->run method is necessary.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
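
    A sketch of how a simple personality uses the first helper (return
    value handling is assumed, not taken from the patch):

        static int example_run(mddev_t *mddev)
        {
                /* ... assemble the array as before ... */

                /* register a common integrity profile if every active
                 * member is integrity capable; personalities with
                 * hot-add call md_integrity_add_rdev() there as well */
                md_integrity_register(mddev);
                return 0;
        }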
     

01 Jul, 2009

1 commit


18 Jun, 2009

2 commits

  • Currently, when we update the 'conf' structure while adding a
    drive to a linear array, we keep the old version around until
    the array is finally stopped, as it is not safe to free it
    immediately.

    Now that we have rcu protection on all accesses to 'conf',
    we can use call_rcu to free it more promptly.

    Signed-off-by: NeilBrown

    NeilBrown
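
    A sketch of the promptly-freed path; the struct and field names are
    assumptions based on the description:

        static void example_free_conf(struct rcu_head *head)
        {
                linear_conf_t *conf = container_of(head, linear_conf_t, rcu);

                kfree(conf);
        }

        /* where the old conf is swapped out:
         *      call_rcu(&old_conf->rcu, example_free_conf);
         */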
     
  • Due to the lack of memory ordering guarantees, we may have races around
    mddev->conf.

    In particular, the correct contents of the structure we get from
    dereferencing ->private might not be visible to this CPU yet, and
    they might not be correct w.r.t. mddev->raid_disks.

    This patch addresses the problem using rcu protection to avoid
    such race conditions.

    Signed-off-by: SandeepKsinha
    Signed-off-by: NeilBrown

    SandeepKsinha
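
    A sketch of the reader side of that protection (function name and the
    elided lookup are placeholders; the writer pairs this with
    rcu_assign_pointer() when installing a new conf):

        static void example_reader(mddev_t *mddev, sector_t sector)
        {
                linear_conf_t *conf;

                rcu_read_lock();
                conf = rcu_dereference(mddev->private);
                /* ... find the member device covering 'sector', touching
                 * conf only inside the read-side critical section ... */
                rcu_read_unlock();
        }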