26 Feb, 2013

3 commits

  • Mentioning "bad disk number -1" exposes irrelevant internal detail.
    Just say they are inactive and must be removed.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Create_stripe_zones returns an error slightly differently to
    raid0_run and to raid0_takeover_*.

    The error return used by the latter was wrong, and an error would
    result in mddev->private being set to NULL and sooner or later a
    crash.

    So never return NULL from create_stripe_zones; return ERR_PTR(err)
    instead.

    This bug has been present since 2.6.35 so the fix is suitable
    for any kernel since then.

    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
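
The ERR_PTR convention described in the commit above can be sketched in userspace roughly as follows. The three helpers mirror the kernel's ERR_PTR/IS_ERR/PTR_ERR; `struct demo_conf`, `demo_create_zones()` and `demo_run()` are hypothetical stand-ins and are not taken from raid0.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the RAID0 private configuration. */
struct demo_conf {
	int nr_zones;
};

/*
 * Hypothetical analogue of create_stripe_zones(): every failure path
 * returns ERR_PTR(-errno) and never NULL.
 */
static struct demo_conf *demo_create_zones(int nr_disks)
{
	struct demo_conf *conf;

	if (nr_disks <= 0)
		return ERR_PTR(-EINVAL);

	conf = malloc(sizeof(*conf));
	if (!conf)
		return ERR_PTR(-ENOMEM);

	conf->nr_zones = 1;
	return conf;
}

/*
 * Hypothetical caller: on failure it propagates the error code and
 * leaves *private untouched instead of storing NULL there.
 */
static int demo_run(int nr_disks, struct demo_conf **private)
{
	struct demo_conf *conf = demo_create_zones(nr_disks);

	assert(private != NULL);
	if (IS_ERR(conf))
		return (int)PTR_ERR(conf);

	*private = conf;
	return 0;
}
```

Because both callers test the result with IS_ERR(), a single return convention is enough, and a NULL can never be mistaken for success.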
     
  • You cannot resize a RAID0 array (in terms of making the devices
    bigger), but the code doesn't entirely stop you.
    So:

    disable setting of the available size on each device for
    RAID0 and Linear devices. This must not change as doing so
    can change the effective layout of data.

    Make sure that the size that raid0_size() reports is accurate,
    by rounding device sizes to chunk sizes. As the device sizes
    cannot change now, this isn't so important, but it is best to be
    safe.

    Without this change:
    mdadm --grow /dev/md0 -z max
    mdadm --grow /dev/md0 -Z max
    then read to the end of the array

    can cause a BUG in a RAID0 array.

    These bugs have been present ever since it became possible
    to resize any device, which is a long time. So the fix is
    suitable for any -stable kernel.

    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
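
A minimal sketch of the size rounding mentioned in the commit above, assuming sizes are counted in sectors and each member contributes only whole chunks. The function names are hypothetical, and the kernel's raid0_size() may compute this differently (for example with a mask when the chunk size is a power of two).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round one device's size down to a whole number of chunks. */
static uint64_t round_down_to_chunk(uint64_t dev_sectors, uint32_t chunk_sectors)
{
	assert(chunk_sectors > 0);
	return dev_sectors - (dev_sectors % chunk_sectors);
}

/*
 * Hypothetical analogue of raid0_size(): the array size is the sum of
 * the chunk-aligned sizes of all member devices.
 */
static uint64_t demo_raid0_size(const uint64_t *dev_sectors, size_t nr_devs,
				uint32_t chunk_sectors)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr_devs; i++)
		total += round_down_to_chunk(dev_sectors[i], chunk_sectors);
	return total;
}
```

With 128-sector chunks, a 1030-sector member contributes 1024 sectors; the trailing 6 sectors cannot hold a full chunk and are not reported.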
     

11 Oct, 2012

1 commit


20 Sep, 2012

1 commit

  • The WRITE SAME command supported on some SCSI devices allows the same
    block to be efficiently replicated throughout a block range. Only a
    single logical block is transferred from the host and the storage device
    writes the same data to all blocks described by the I/O.

    This patch implements support for WRITE SAME in the block layer. The
    blkdev_issue_write_same() function can be used by filesystems and block
    drivers to replicate a buffer across a block range. This can be used to
    efficiently initialize software RAID devices, etc.

    Signed-off-by: Martin K. Petersen
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
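
The semantics of WRITE SAME described above can be modelled in memory as below. This is only an illustration of "one block in, many blocks written"; `demo_write_same()` and its parameters are hypothetical and do not reflect the signature of blkdev_issue_write_same().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical in-memory model of WRITE SAME: copy a single logical
 * block into every block of the range [start, start + nr_blocks).
 * Returns 0 on success, -1 if the range falls outside the device.
 */
static int demo_write_same(uint8_t *dev, size_t dev_blocks, size_t block_size,
			   size_t start, size_t nr_blocks, const uint8_t *block)
{
	assert(dev != NULL && block != NULL && block_size > 0);

	/* Reject ranges that run past the end of the device. */
	if (start > dev_blocks || nr_blocks > dev_blocks - start)
		return -1;

	for (size_t i = 0; i < nr_blocks; i++)
		memcpy(dev + (start + i) * block_size, block, block_size);
	return 0;
}
```

The caller supplies only one block of payload regardless of the range length, which is the property that makes the real command cheap to transfer.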
     

03 Apr, 2012

1 commit


02 Apr, 2012

1 commit


19 Mar, 2012

2 commits

  • These personalities currently set a max request size of one page
    when any member device has a merge_bvec_fn because they don't
    bother to call that function.

    This causes extra work in splitting and combining requests.

    So make the extra effort to call the merge_bvec_fn when it exists
    so that we end up with larger requests out the bottom.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
    mddev. However it uses the 'safe' version of list_for_each_entry,
    and so requires the extra variable, but doesn't include 'safe' in the
    name, which is useful documentation.

    Consequently some places use this safe version without needing it, and
    many use an explicit list_for_each_entry.

    So:
    - rename rdev_for_each to rdev_for_each_safe
    - create a new rdev_for_each which uses the plain
    list_for_each_entry,
    - use the 'safe' version only where needed, and convert all other
    list_for_each_entry calls to use rdev_for_each.

    Signed-off-by: NeilBrown

    NeilBrown
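
The difference between the plain and the 'safe' iterator can be sketched with a simplified singly linked list. The kernel macros operate on struct list_head; every `demo_` name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rdev on an mddev's device list. */
struct demo_rdev {
	int raid_disk;
	struct demo_rdev *next;
};

/* Plain iteration: the loop body must not unlink or free 'pos'. */
#define demo_rdev_for_each(pos, head) \
	for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Safe iteration: 'tmp' caches the successor, so the body may free 'pos'. */
#define demo_rdev_for_each_safe(pos, tmp, head) \
	for ((pos) = (head); \
	     (pos) != NULL && ((tmp) = (pos)->next, 1); \
	     (pos) = (tmp))

/* Prepend a new device to the list and return the new head. */
static struct demo_rdev *demo_push(struct demo_rdev *head, int raid_disk)
{
	struct demo_rdev *rdev = malloc(sizeof(*rdev));

	assert(rdev != NULL);
	rdev->raid_disk = raid_disk;
	rdev->next = head;
	return rdev;
}

/* Read-only walk: the plain iterator is sufficient. */
static int demo_count(struct demo_rdev *head)
{
	struct demo_rdev *rdev;
	int n = 0;

	demo_rdev_for_each(rdev, head)
		n++;
	return n;
}

/* Removes entries while walking, so it needs the safe iterator. */
static struct demo_rdev *demo_drop_inactive(struct demo_rdev *head)
{
	struct demo_rdev *rdev, *tmp, *prev = NULL;

	demo_rdev_for_each_safe(rdev, tmp, head) {
		if (rdev->raid_disk < 0) {
			if (prev)
				prev->next = tmp;
			else
				head = tmp;
			free(rdev);
		} else {
			prev = rdev;
		}
	}
	return head;
}
```

Only demo_drop_inactive() pays for the extra variable, which matches the intent of using the 'safe' variant only where entries are removed during the walk.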
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

05 Nov, 2011

1 commit

  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

01 Nov, 2011

1 commit


11 Oct, 2011

4 commits


07 Oct, 2011

2 commits


12 Sep, 2011

1 commit

  • There is very little benefit in letting a ->make_request
    instance update the bio's device and sector and loop around it in
    __generic_make_request when we can achieve the same through calling
    generic_make_request from the driver and letting the loop in
    generic_make_request handle it.

    Note that various drivers got the return value from ->make_request wrong
    and returned non-zero values for errors.

    Signed-off-by: Christoph Hellwig
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Mar, 2011

1 commit

  • MD and DM create a new bio_set for every metadevice. Each bio_set has an
    integrity mempool attached regardless of whether the metadevice is
    capable of passing integrity metadata. This is a waste of memory.

    Instead we defer the allocation decision to MD and DM since we know at
    metadevice creation time whether integrity passthrough is needed or not.

    Automatic integrity mempool allocation can then be removed from
    bioset_create() and we make an explicit integrity allocation for the
    fs_bio_set.

    Signed-off-by: Martin K. Petersen
    Reported-by: Zdenek Kabelac
    Acked-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

10 Mar, 2011

2 commits


21 Feb, 2011

1 commit

  • blk_throtl_exit assumes that ->queue_lock still exists,
    so make sure that it does.
    To do this, we stop redirecting ->queue_lock to conf->device_lock
    and leave it pointing where it is initialised - __queue_lock.

    As the blk_plug functions check the ->queue_lock is held, we now
    take that spin_lock explicitly around the plug functions. We don't
    need the locking, just the warning removal.

    This is needed for any kernel with the blk_throtl code,
    which is 2.6.37 and later.

    Cc: stable@kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
     

14 Feb, 2011

1 commit

  • Takeover raid1->raid0 did not succeed. This kernel message is shown:
    "md/raid0:md126: too few disks (1 of 2) - aborting!"

    Problem was that we weren't updating ->raid_disks for that
    takeover, unlike all the others.

    Signed-off-by: Krzysztof Wojcik
    Signed-off-by: NeilBrown

    Krzysztof Wojcik
     

31 Jan, 2011

1 commit


10 Sep, 2010

1 commit

  • This patch converts md to support REQ_FLUSH/FUA instead of now
    deprecated REQ_HARDBARRIER. In the core part (md.c), the following
    changes are notable.

    * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
    processing of other requests and thus there is no reason to mark the
    queue congested while FLUSH/FUA is in progress.

    * REQ_FLUSH/FUA failures are final and its users don't need retry
    logic. Retry logic is removed.

    * Preflush needs to be issued to all member devices but FUA writes can
    be handled the same way as other writes - their processing can be
    deferred to request_queue of member devices. md_barrier_request()
    is renamed to md_flush_request() and simplified accordingly.

    For linear, raid0 and multipath, the core changes are enough. raid1,
    5 and 10 need the following conversions.

    * raid1: Handling of FLUSH/FUA bio's can simply be deferred to
    request_queues of member devices. Barrier related logic removed.

    * raid5: Queue draining logic dropped. FUA bit is propagated through
    biodrain and stripe reconstruction such that all the updated parts
    of the stripe are written out with FUA writes if any of the dirtying
    writes was FUA. preread_active_stripes handling in make_request()
    is updated as suggested by Neil Brown.

    * raid10: FUA bit needs to be propagated to write clones.

    linear, raid0, 1, 5 and 10 tested.

    Signed-off-by: Tejun Heo
    Reviewed-by: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
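
The FUA propagation rules for raid5 and raid10 stated above can be sketched as simple flag handling. The flag values and function names below are hypothetical; the kernel's REQ_* bits and the raid5/raid10 code paths differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, chosen only for this sketch. */
#define DEMO_REQ_WRITE (1u << 0)
#define DEMO_REQ_FUA   (1u << 1)

/*
 * raid5-style rule: the updated parts of a stripe are written with FUA
 * if any of the writes that dirtied the stripe carried FUA.
 */
static bool demo_stripe_needs_fua(const unsigned int *write_flags,
				  size_t nr_writes)
{
	assert(write_flags != NULL || nr_writes == 0);

	for (size_t i = 0; i < nr_writes; i++)
		if (write_flags[i] & DEMO_REQ_FUA)
			return true;
	return false;
}

/*
 * raid10-style rule: each write clone inherits the FUA bit of the
 * original bio.
 */
static unsigned int demo_clone_flags(unsigned int master_flags)
{
	return DEMO_REQ_WRITE | (master_flags & DEMO_REQ_FUA);
}
```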
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jun, 2010

3 commits

  • Only level 5 with layout=PARITY_N can be taken over to raid0 now.
    Let's allow level 4 as well.

    Signed-off-by: Maciej Trela
    Signed-off-by: NeilBrown

    Maciej Trela
     
  • After takeover from raid5/10 -> raid0 mddev->layout is not cleared.

    Signed-off-by: Maciej Trela
    Signed-off-by: NeilBrown

    Maciej Trela
     
  • Most array level changes leave the list of devices largely unchanged,
    possibly causing one at the end to become redundant.
    However conversions between RAID0 and RAID10 need to renumber
    all devices (except 0).

    This renumbering is currently being done in the ->run method when the
    new personality takes over. However this is too late as the common
    code in md.c might already have invalidated some of the devices if
    they had a ->raid_disk number that appeared too high.

    Moving it into the ->takeover method is too early as the array is
    still active at that time and wrong ->raid_disk numbers could cause
    confusion.

    So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
    the new raid_disk number.
    Now the common code knows exactly which devices need to be renumbered,
    and which can be invalidated, and can do it all at a convenient time
    when the array is suspended.
    It can also update some symlinks in sysfs which previously were not
    updated correctly.

    Reported-by: Maciej Trela
    Signed-off-by: NeilBrown

    NeilBrown
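
One way to picture the renumbering between RAID0 and RAID10 is the mapping below. It assumes a RAID10 layout with two adjacent copies of each data device and that the first copy of each pair is the one kept on conversion back to RAID0; both assumptions, and all `demo_` names, are illustrative and not taken from the commit.

```c
#include <assert.h>

/* Hypothetical marker for a device that has no slot in the new array. */
#define DEMO_DISK_REMOVED (-1)

/*
 * RAID0 -> RAID10 (assumed two copies): RAID0 device i becomes the
 * first device of mirror pair i, i.e. slot 2 * i. Device 0 keeps its
 * number; every other device is renumbered.
 */
static int demo_raid0_to_raid10_slot(int raid_disk)
{
	assert(raid_disk >= 0);
	return raid_disk * 2;
}

/*
 * RAID10 -> RAID0 (assumed two copies): the even slot of each pair
 * maps back to slot / 2; the odd slot becomes redundant.
 */
static int demo_raid10_to_raid0_slot(int raid_disk)
{
	assert(raid_disk >= 0);
	if (raid_disk % 2)
		return DEMO_DISK_REMOVED;
	return raid_disk / 2;
}
```

Recording the result in a separate field such as ->new_raid_disk, rather than overwriting ->raid_disk immediately, lets the common code apply all changes in one place once the array is suspended.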
     

22 May, 2010

1 commit


18 May, 2010

5 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Mar, 2010

1 commit

  • If a component device has a merge_bvec_fn then as we never call it
    we must ensure we never need to. Currently this is done by setting
    max_sector to 1 PAGE, however this does not stop a bio being created
    with several sub-page iovecs that would violate the merge_bvec_fn.

    So instead set max_segments to 1 and set the segment boundary to the
    same as a page boundary to ensure there is only ever one single-page
    segment of IO requested at a time.

    This can particularly be an issue when 'xen' is used as it is
    known to submit multiple small buffers in a single bio.

    Signed-off-by: NeilBrown
    Cc: stable@kernel.org

    NeilBrown
     

26 Feb, 2010

1 commit

  • The block layer calling convention is blk_queue_.
    blk_queue_max_sectors predates this practice, leading to some confusion.
    Rename the function to appropriately reflect that its intended use is to
    set max_hw_sectors.

    Also introduce a temporary wrapper for backwards compatibility. This can
    be removed after the merge window is closed.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

14 Dec, 2009

1 commit