26 Feb, 2010

1 commit

  • The block layer calling convention is blk_queue_<limit name>.
    blk_queue_max_sectors predates this practice, leading to some confusion.
    Rename the function to appropriately reflect that its intended use is to
    set max_hw_sectors.

    Also introduce a temporary wrapper for backwards compatibility. This can
    be removed after the merge window is closed.
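
    A minimal sketch of what the rename plus compatibility wrapper could
    look like (the bodies here are assumed for illustration, not taken
    from the patch):

        /* renamed: makes clear that this sets max_hw_sectors */
        void blk_queue_max_hw_sectors(struct request_queue *q,
                                      unsigned int max_hw_sectors)
        {
                q->limits.max_hw_sectors = max_hw_sectors;
        }

        /* temporary wrapper for backwards compatibility; to be
         * removed once the merge window closes */
        void blk_queue_max_sectors(struct request_queue *q,
                                   unsigned int max_sectors)
        {
                blk_queue_max_hw_sectors(q, max_sectors);
        }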

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

14 Dec, 2009

2 commits

  • Suggested by Oren Held

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Previously barriers were only supported on RAID1. This is because
    other levels require synchronisation across all devices and so needed
    a different approach.
    Here is that approach.

    When a barrier arrives, we send a zero-length barrier to every active
    device. When that completes - and if the original request was not
    empty - we submit the barrier request itself (with the barrier flag
    cleared) and then submit a fresh load of zero-length barriers.

    The barrier request itself is asynchronous, but any subsequent
    request will block until the barrier completes.

    The reason for clearing the barrier flag is that a barrier request is
    allowed to fail. If we pass a non-empty barrier through a striping
    raid level it is conceivable that part of it could succeed and part
    could fail. That would be way too hard to deal with.
    So if the first run of zero-length barriers succeeds, we assume all is
    sufficiently well that we send the request and ignore errors in the
    second run of barriers.

    RAID5 needs extra care as write requests may not have been submitted
    to the underlying devices yet. So we flush the stripe cache before
    proceeding with the barrier.

    Note that the second set of zero-length barriers is submitted
    immediately after the original request is submitted. Thus when
    a personality finds mddev->barrier to be set during make_request,
    it should not return from make_request until the corresponding
    per-device request(s) have been queued.

    That will be done in later patches.
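
    A simplified sketch of that sequence (the function and helper names
    are assumptions for illustration, not from the patch):

        /* hypothetical helper: send a zero-length barrier to every
         * active device and wait for all of them to complete */
        static void submit_zero_length_barriers(mddev_t *mddev);

        static void md_handle_barrier(mddev_t *mddev, struct bio *bio)
        {
                /* first round; any subsequent request blocks until
                 * this completes */
                submit_zero_length_barriers(mddev);

                if (bio->bi_size) {
                        /* resubmit the payload with the barrier flag
                         * cleared: a partly-failed striped barrier
                         * write would be unrecoverable */
                        bio->bi_rw &= ~(1UL << BIO_RW_BARRIER);
                        generic_make_request(bio);
                }

                /* second round, submitted immediately; errors in
                 * this round are ignored */
                submit_zero_length_barriers(mddev);
        }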

    Signed-off-by: NeilBrown
    Reviewed-by: Andre Noll

    NeilBrown
     

03 Aug, 2009

1 commit

  • This patch replaces md_integrity_check() by two new public functions:
    md_integrity_register() and md_integrity_add_rdev() which are both
    personality-independent.

    md_integrity_register() is called from the ->run and ->hot_remove
    methods of all personalities that support data integrity. The
    function iterates over the component devices of the array and
    determines if all active devices are integrity capable and if their
    profiles match. If this is the case, the common profile is registered
    for the mddev via blk_integrity_register().

    The second new function, md_integrity_add_rdev() is called from the
    ->hot_add_disk methods, i.e. whenever a new device is being added
    to a raid array. If the new device does not support data integrity,
    or has a profile different from the one already registered, data
    integrity for the mddev is disabled.

    For raid0 and linear, only the call to md_integrity_register() from
    the ->run method is necessary.
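
    The registration step might look roughly like this (a sketch, not
    the verbatim patch):

        int md_integrity_register(mddev_t *mddev)
        {
                mdk_rdev_t *rdev, *reference = NULL;

                list_for_each_entry(rdev, &mddev->disks, same_set) {
                        if (test_bit(Faulty, &rdev->flags))
                                continue;
                        if (!reference) {
                                reference = rdev; /* first active device */
                                continue;
                        }
                        /* all active devices must share one profile */
                        if (blk_integrity_compare(reference->bdev->bd_disk,
                                                  rdev->bdev->bd_disk) < 0)
                                return -EINVAL;
                }
                if (!reference || !bdev_get_integrity(reference->bdev))
                        return 0;       /* nothing to register */
                return blk_integrity_register(mddev->gendisk,
                                bdev_get_integrity(reference->bdev));
        }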

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     

18 Jun, 2009

4 commits

  • If the superblock of a component device indicates the presence of a
    bitmap but the corresponding raid personality does not support bitmaps
    (raid0, linear, multipath, faulty), then something is seriously wrong
    and we'd better refuse to run such an array.

    Currently, this check is performed while the superblocks are examined,
    i.e. before entering personality code. Therefore the generic md layer
    must know which raid levels support bitmaps and which do not.

    This patch avoids this layer violation without adding identical code
    to various personalities. This is accomplished by introducing a new
    public function to md.c, md_check_no_bitmap(), which replaces the
    hard-coded checks in the superblock loading functions.

    A call to md_check_no_bitmap() is added to the ->run method of each
    personality which does not support bitmaps and assembly is aborted
    if at least one component device contains a bitmap.
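
    The helper itself can be tiny; a plausible shape (sketch, field
    names assumed from the mddev structure of the era):

        int md_check_no_bitmap(mddev_t *mddev)
        {
                if (!mddev->bitmap_file && !mddev->bitmap_offset)
                        return 0;
                printk(KERN_ERR "%s: bitmaps are not supported for %s\n",
                       mdname(mddev), mddev->pers->name);
                return 1;
        }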

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • This is currently ensured by common code, but it is more reliable to
    ensure it where it is needed in personality code.
    All the other personalities that care already round the size to
    the chunk_size. raid0 and linear are the only hold-outs.
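
    A sketch of the rounding, using sector_div so that non-power-of-2
    chunk sizes are handled too (exact placement and form assumed):

        sector_t sectors = rdev->sectors;

        /* round the usable size down to a whole number of chunks */
        sector_div(sectors, mddev->chunk_sectors);
        rdev->sectors = sectors * mddev->chunk_sectors;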

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Following the conversion to chunk_sectors, there is room
    for cleaning up a little.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • This patch renames the chunk_size field to chunk_sectors with the
    implied change of semantics. Since

    is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
    = is_power_of_2(chunk_sectors)

    these bits don't need an adjustment for the shift.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     

16 Jun, 2009

15 commits

  • Maintain two flows: one for power-of-2 chunk sizes (which uses masks
    and shifts) and one for the general case (which uses sector_div).
    This is for the sake of performance.

    - introduce map_sector and is_io_in_chunk_boundary to encapsulate
    those two flows better for raid0_make_request
    - fix blk_mergeable to support the two flows.
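
    The core of the two flows might look like this (a simplified
    sketch; the real map_sector takes different arguments):

        static sector_t map_sector(mddev_t *mddev, sector_t sector,
                                   unsigned int *sect_in_chunk)
        {
                unsigned int chunk_sects = mddev->chunk_sectors;

                if (is_power_of_2(chunk_sects)) {
                        /* fast path: mask for the offset in the
                         * chunk, shift for the chunk number */
                        *sect_in_chunk = sector & (chunk_sects - 1);
                        return sector >> ffz(~chunk_sects);
                }
                /* general case: one division via sector_div, which
                 * leaves the quotient in 'sector' */
                *sect_in_chunk = sector_div(sector, chunk_sects);
                return sector;
        }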

    Signed-off-by: raziebe@gmail.com
    Signed-off-by: NeilBrown

    raz ben yehuda
     
  • Have raid0 check the chunk size in its run method instead of in md.
    This is part of a series moving the checks from common code to
    the personalities where they belong.

    hardsect is a short and the chunk size is an int, so it is safe to use %.
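
    The kind of check that moves into raid0's run method, per component
    rdev (a sketch; the queue accessor name varied around this time):

        if (mddev->chunk_size == 0) {
                printk(KERN_ERR "md/raid0: chunk size must be set.\n");
                return -EINVAL;
        }
        /* hardsect is a short and chunk_size an int, so '%' is safe */
        if (mddev->chunk_size %
            queue_hardsect_size(rdev->bdev->bd_disk->queue)) {
                printk(KERN_ERR "md/raid0: chunk size is not a multiple"
                       " of the device sector size.\n");
                return -EINVAL;
        }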

    Signed-off-by: raziebe@gmail.com
    Signed-off-by: NeilBrown

    raz ben yehuda
     
  • Report the raid zones to the user.

    Signed-off-by: raziebe@gmail.com
    Signed-off-by: NeilBrown

    raz ben yehuda
     
  • Because of the removal of the device list from the strips, raid0 did
    not compile with the MD_DEBUG flag on.

    Signed-off-by: NeilBrown

    raz ben yehuda
     
  • Having a macro just to cast a void* isn't really helpful.
    I would much rather see that we are simply de-referencing ->private,
    than have to know what the macro does.

    So open code the macro everywhere and remove the pointless cast.
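
    The change is mechanical (illustrative; the macro body is assumed):

        raid0_conf_t *conf;

        /* before (the macro hides a cast of ->private):
         *   #define mddev_to_conf(mddev) \
         *           ((raid0_conf_t *) (mddev)->private)
         */
        conf = mddev_to_conf(mddev);

        /* after: open coded; the cast from void * is implicit in C */
        conf = mddev->private;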

    Signed-off-by: NeilBrown

    NeilBrown
     
  • This setting doesn't seem to make sense (half the chunk size??) and
    shouldn't be needed.
    The segment boundary exported by raid0 should simply be the minimum
    of the segment boundary of all component devices. And we already
    get that right.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • If we treat conf->devlist more like a 2-dimensional array,
    we can get the devlist for a particular zone simply by indexing
    that array, so we don't need to store the pointers to subarrays
    in strip_zone. This makes strip_zone smaller and so (hopefully)
    makes searches faster.
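
    Illustratively, with the flat devlist laid out raid_disks entries
    per zone, the per-zone device array is found by indexing (sketch):

        /* devices of zone 'zn'; no pointer stored in strip_zone */
        mdk_rdev_t **devs = conf->devlist + zn * mddev->raid_disks;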

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Storing ->sectors is redundant as it can be computed from the
    difference z->zone_end - (z-1)->zone_end.

    In the one place where it is used, it is just as efficient to use
    the zone_end value instead.

    And removing it makes strip_zone smaller, so the array of these that
    is searched on every request has a better chance to stay in cache.

    So discard the field and get the value from elsewhere.
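
    For illustration, the per-zone size is recovered as follows (zone 0
    starts at sector 0):

        sector_t zone_sectors = (z == 0)
                ? conf->strip_zone[0].zone_end
                : conf->strip_zone[z].zone_end -
                  conf->strip_zone[z - 1].zone_end;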

    Signed-off-by: NeilBrown

    NeilBrown
     
  • raid0_stop() removes all references to the raid0 configuration but
    neglects to free the ->devlist buffer.

    This patch closes this leak, removes a pointless initialization and
    fixes a coding style issue in raid0_stop().

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • Currently the raid0 configuration is allocated in raid0_run() while
    the buffers for the strip_zone and the dev_list arrays are allocated
    in create_strip_zones(). On errors, all three buffers are freed
    in raid0_run().

    It's easier and more readable to do the allocation and cleanup within
    a single function. So move that code into create_strip_zones().

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • Currently raid0_run() always returns -ENOMEM on errors. This is
    incorrect as running the array might fail for other reasons, for
    example because not all component devices were available.

    This patch changes create_strip_zones() so that it returns a proper
    error code (either -ENOMEM or -EINVAL) rather than 1 on errors and
    makes raid0_run(), its single caller, return that value instead
    of -ENOMEM.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • The "sector_shift" and "spacing" fields of struct raid0_private_data
    were only used for the hash table lookups. So the removal of the
    hash table allows us to get rid of these fields as well, which simplifies
    create_strip_zones() and raid0_run() quite a bit.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • The raid0 hash table has become unused due to the changes in the
    previous patch. This patch removes the hash table allocation and
    setup code and kills the hash_table field of struct raid0_private_data.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • 1/ remove current_start. The same value is available in
    zone->dev_start and storing it separately doesn't gain anything.
    2/ rename curr_zone_start to curr_zone_end as we are now more
    focused on the 'end' of each zone. We end up storing the
    same number though - the old name was a little confusing
    (and what does 'current' mean in this context, anyway?).

    Signed-off-by: NeilBrown

    NeilBrown
     
  • The number of strip_zones of a raid0 array is bounded by the number of
    drives in the array and is in fact much smaller for typical setups. For
    example, any raid0 array containing identical disks will have only
    a single strip_zone.

    Therefore, the hash tables which are used for quickly finding the
    strip_zone that holds a particular sector are of questionable value
    and add quite a bit of unnecessary complexity.

    This patch replaces the hash table lookup by equivalent code which
    simply loops over all strip zones to find the zone that holds the
    given sector.

    In order to make this loop as fast as possible, the zone->start field
    of struct strip_zone has been renamed to zone_end, and it now stores
    the beginning of the next zone in sectors. This saves one
    addition in the loop.

    Subsequent cleanup patches will remove the hash table structure.
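
    A plausible shape of the replacement lookup (a sketch): because
    zone_end holds the start of the next zone, each iteration needs one
    comparison and no addition.

        static struct strip_zone *find_zone(raid0_conf_t *conf,
                                            sector_t sector)
        {
                int i;
                struct strip_zone *z = conf->strip_zone;

                for (i = 0; i < conf->nr_strip_zones; i++)
                        if (sector < z[i].zone_end)
                                return z + i;
                BUG();  /* callers never pass a sector past the array */
        }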

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     

31 Mar, 2009

7 commits

  • Allow userspace to set the size of the array according to the following
    semantics:

    1/ size must be <= the size returned by pers->size(mddev, 0, 0)
    a) If size is set before the array is running, do_md_run will fail
    if size is greater than the default size
    b) A reshape attempt that reduces the default size to less than the set
    array size should be blocked
    2/ once userspace sets the size the kernel will not change it
    3/ writing 'default' to this attribute returns control of the size to the
    kernel and reverts to the size reported by the personality

    Also, convert locations that need to know the default size from directly
    reading ->array_sectors to <pers>_size. Resync/reshape operations
    always follow the default size.

    Finally, fixup other locations that read a number of 1k-blocks from
    userspace to use strict_blocks_to_sectors() which checks for unsigned
    long long to sector_t overflow and blocks to sectors overflow.
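
    An overflow-checked conversion along the lines described (a sketch;
    the exact checks are assumed):

        static int strict_blocks_to_sectors(const char *buf,
                                            sector_t *sectors)
        {
                unsigned long long blocks;
                sector_t new;

                if (strict_strtoull(buf, 10, &blocks) < 0)
                        return -EINVAL;
                if (blocks & 1ULL << (8 * sizeof(blocks) - 1))
                        return -EINVAL; /* blocks-to-sectors overflow */

                new = blocks * 2;       /* 1K blocks -> 512B sectors */
                if ((unsigned long long)new != blocks * 2)
                        return -EINVAL; /* u64 to sector_t overflow */

                *sectors = new;
                return 0;
        }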

    Reviewed-by: Andre Noll
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Get personalities out of the business of directly modifying
    ->array_sectors. Lays groundwork to introduce policy on when
    ->array_sectors can be modified.

    Reviewed-by: Andre Noll
    Signed-off-by: Dan Williams

    Dan Williams
     
  • In preparation for giving userspace control over ->array_sectors we need
    to be able to retrieve the 'default' size, and the 'anticipated' size
    when a reshape is requested. For personalities that do not reshape,
    emit a warning if anything but the default size is requested.

    In the raid5 case we need to update ->previous_raid_disks to make the
    new 'default' size available.
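
    A plausible shape for such a method, with 0 meaning "use the
    current value" (details assumed):

        sector_t (*size)(mddev_t *mddev, sector_t sectors, int raid_disks);

        /* the default size is then */
        mddev->array_sectors = mddev->pers->size(mddev, 0, 0);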

    Reviewed-by: Andre Noll
    Signed-off-by: Dan Williams

    Dan Williams
     
  • This patch renames the "size" field of struct mdk_rdev_s to
    "sectors" and changes this field to store sectors instead of
    blocks.

    All users of this field, linear.c, raid0.c and md.c, are fixed up
    accordingly, which gets rid of many multiplications and divisions.
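
    The flavour of the simplification (illustrative only):

        /* before: rdev->size counted 1K blocks */
        array_sectors += rdev->size * 2;

        /* after: rdev->sectors counts 512-byte sectors directly */
        array_sectors += rdev->sectors;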

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • It really is nicer to keep related code together.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • This makes the includes more explicit, and is preparation for moving
    md_k.h to drivers/md/md.h

    Remove include/raid/md.h as its only remaining use was to #include
    other files.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • Move the headers with the local structures for the disciplines and
    bitmap.h into drivers/md/ so that they are more easily grepable for
    hacking and not far away. md.h is left where it is for now as there
    are some uses from the outside.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: NeilBrown

    Christoph Hellwig
     

09 Jan, 2009

5 commits

  • The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
    list_for_each_entry_safe from <linux/list.h>; it should be defined to
    use list_for_each_entry_safe instead of reinventing the wheel.

    But some calls to rdev_for_each don't really need the safe version;
    a direct list_for_each_entry is enough, which saves a temp variable
    (tmp) in every function that used rdev_for_each.

    In this patch, most rdev_for_each loops are replaced by
    list_for_each_entry, saving many tmp variables; the safe version is
    kept only in the places that call list_del to delete an entry.
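
    The two forms, sketched (the macro body is assumed):

        /* safe variant, only needed where list_del may run */
        #define rdev_for_each(rdev, tmp, mddev) \
                list_for_each_entry_safe(rdev, tmp, \
                                         &((mddev)->disks), same_set)

        /* plain walk, no 'tmp' variable required */
        list_for_each_entry(rdev, &mddev->disks, same_set) {
                /* ... read-only iteration over component devices ... */
        }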

    Signed-off-by: Cheng Renquan
    Signed-off-by: NeilBrown

    Cheng Renquan
     
  • This patch renames the hash_spacing and preshift members of struct
    raid0_private_data to spacing and sector_shift respectively and
    changes the semantics as follows:

    We always have spacing = 2 * hash_spacing. In case
    sizeof(sector_t) > sizeof(u32) we also have sector_shift = preshift + 1
    while sector_shift = preshift = 0 otherwise.

    Note that the values of nb_zone and zone are unaffected by these changes
    because in the sector_div() preceding the assignment of these two
    variables, both arguments double.
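
    Illustratively (sector_div() leaves the quotient in its first
    argument; variable uses are assumptions):

        /* both operands have doubled, so the quotient is unchanged:
         * old_blocks / hash_spacing == (2*old_blocks) / (2*hash_spacing)
         */
        sector_t s = mddev->array_sectors; /* 2x the old 1K figure */
        sector_div(s, conf->spacing);      /* spacing == 2*hash_spacing */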

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • This completes the block -> sector conversion of struct strip_zone.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • This patch consists only of these trivial changes.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
     
  • current_offset and curr_zone_offset stored the corresponding offsets
    as 1K quantities. Rename them to current_start and curr_zone_start
    to match the naming of struct strip_zone and store the offsets as
    sector counts.

    Also, add KERN_INFO to the printk() affected by this change to make
    checkpatch happy.

    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll