24 Mar, 2011

1 commit

  • Implement a merge function in the striped target.

    When the striped target's underlying devices provide a merge_bvec_fn
    (like all DM devices do via dm_merge_bvec) it is important to call down
    to them when building a biovec that doesn't span a stripe boundary.

    Without the merge method, a striped DM device stacked on DM devices
    causes bios with a single page to be submitted which results
    in unnecessary overhead that hurts performance.

    This change really helps filesystems (e.g. XFS and now ext4) which take
    care to assemble larger bios. By implementing stripe_merge(), DM and the
    stripe target no longer undermine the filesystem's work by only allowing
    a single page per bio. Buffered IO sees the biggest improvement
    (particularly uncached reads, buffered writes to a lesser degree). This
    is especially so for more capable "enterprise" storage LUNs.

    The performance improvement has been measured to be ~12-35% -- when a
    reasonable chunk_size is used (e.g. 64K) in conjunction with a stripe
    count that is a power of 2.

    In contrast, the performance penalty is ~5-7% for the pathological worst
    case stripe configuration (small chunk_size with a stripe count that is
    not a power of 2). The reason for this is that stripe_map_sector() is
    now called once for every call to dm_merge_bvec(). stripe_map_sector()
    will use slower division if stripe count isn't a power of 2.
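
    The cost difference described above can be sketched in userspace C.
    This is a minimal illustration, not the kernel's stripe_map_sector():
    the struct, its field names and layout_init() are hypothetical, and
    the chunk size is assumed to be a power of 2.

```c
#include <stdint.h>

/* Hypothetical layout description; the names are illustrative only. */
struct stripe_layout {
    uint32_t stripes;       /* number of stripe devices, must be non-zero */
    int      stripes_shift; /* log2(stripes), or -1 if not a power of 2 */
    uint32_t chunk_shift;   /* log2(chunk size in sectors) */
};

static void layout_init(struct stripe_layout *l, uint32_t stripes,
                        uint32_t chunk_shift)
{
    l->stripes = stripes;
    l->chunk_shift = chunk_shift;
    l->stripes_shift = -1;
    if (stripes != 0 && (stripes & (stripes - 1)) == 0) {
        int shift = 0;
        while ((1u << shift) != stripes)
            shift++;
        l->stripes_shift = shift;
    }
}

/* Map a sector of the striped device to (stripe index, sector on that stripe). */
static void map_sector(const struct stripe_layout *l, uint64_t sector,
                       uint32_t *stripe, uint64_t *dev_sector)
{
    uint64_t chunk = sector >> l->chunk_shift;
    uint64_t offset_in_chunk = sector & ((UINT64_C(1) << l->chunk_shift) - 1);

    if (l->stripes_shift >= 0) {
        /* Power-of-2 stripe count: mask and shift only. */
        *stripe = (uint32_t)(chunk & (l->stripes - 1));
        chunk >>= l->stripes_shift;
    } else {
        /* Any other stripe count: 64-bit modulo and division. */
        *stripe = (uint32_t)(chunk % l->stripes);
        chunk /= l->stripes;
    }
    *dev_sector = (chunk << l->chunk_shift) | offset_in_chunk;
}
```

    With a merge function in place, a mapping like this runs once per
    merge query, so the division branch is the part that costs more.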

    Signed-off-by: Mustafa Mesanovic
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mustafa Mesanovic
     

14 Jan, 2011

1 commit

  • kstriped only serves sc->kstriped_ws which runs dm_table_event().
    This doesn't need to be executed from an ordered workqueue w/ rescuer.
    Drop kstriped and use the system_wq instead. While at it, rename
    kstriped_ws to trigger_event so that it's consistent with other dm
    modules.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     

10 Sep, 2010

1 commit

  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    now deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array wide cache flush w/ member specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bio's going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bio's which are passed onto dm
    targets are zero length. bio_empty_barrier() tests are replaced
    with REQ_FLUSH tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request
    based dm code, dm->flush_error is added for bio based dm while
    request based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Aug, 2010

5 commits


08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Mar, 2010

1 commit


17 Feb, 2010

1 commit

  • If a table containing zero as stripe count is passed into stripe_ctr
    the code attempts to divide by zero.

    This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
    is zero.

    We now get the following error messages:
    device-mapper: table: 253:0: striped: Invalid stripe count
    device-mapper: ioctl: error adding target to table
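
    The guard can be sketched as follows. This is a hedged userspace
    illustration, not the code from stripe_ctr: the function name and
    signature are hypothetical, and the divisibility check is an extra
    assumption added for completeness.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute the per-stripe width in sectors.
 * Returns 0 on success or -EINVAL for an unusable stripe count.
 */
static int stripe_width(uint64_t target_len, uint32_t stripes, uint64_t *width)
{
    if (stripes == 0)
        return -EINVAL;     /* reject before dividing */
    if (target_len % stripes != 0)
        return -EINVAL;     /* assumption: length must split evenly */
    *width = target_len / stripes;
    return 0;
}
```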

    Signed-off-by: Nikanth Karthikesan
    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Nikanth Karthikesan
     

14 Sep, 2009

1 commit


11 Sep, 2009

1 commit


05 Sep, 2009

1 commit

  • Set sensible I/O hints for striped DM devices in the topology
    infrastructure added for 2.6.31 for userspace tools to
    obtain via sysfs.

    Add .io_hints to 'struct target_type' to allow the I/O hints portion
    (io_min and io_opt) of the 'struct queue_limits' to be set by each
    target and implement this for dm-stripe.
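
    A minimal sketch of what such hints could look like for a striped
    device, assuming one chunk as the minimum I/O size and one full
    stripe (chunk size times stripe count) as the optimal I/O size. The
    commit message does not state the values, and the struct and
    function names below are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors */

/* Hypothetical stand-in for the io_min/io_opt part of the queue limits. */
struct io_hints {
    uint32_t io_min;  /* bytes */
    uint32_t io_opt;  /* bytes */
};

static struct io_hints stripe_hints(uint32_t chunk_sectors, uint32_t stripes)
{
    struct io_hints h;

    h.io_min = chunk_sectors << SECTOR_SHIFT; /* one chunk */
    h.io_opt = h.io_min * stripes;            /* one full stripe */
    return h;
}
```

    For a 64K chunk (128 sectors) across 4 stripes this gives an io_min
    of 65536 bytes and an io_opt of 262144 bytes.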

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

24 Jul, 2009

1 commit

  • Incorrect device area lengths are being passed to device_area_is_valid().

    The regression appeared in 2.6.31-rc1 through commit
    754c5fc7ebb417b23601a6222a6005cc2e7f2913.

    With the dm-stripe target, the size of the target (ti->len) was used
    instead of the stripe_width (ti->len/#stripes). An example of a
    consequent incorrect error message is:

    device-mapper: table: 254:0: sdb too small for target
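
    The difference between the two lengths can be shown with a small
    sketch. The helper names and the numbers in the usage note are
    illustrative assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Returns 1 if [start, start + len) lies within a device of dev_size sectors. */
static int area_fits(uint64_t dev_size, uint64_t start, uint64_t len)
{
    return start <= dev_size && len <= dev_size - start;
}

/* Length each underlying device must provide for a striped target. */
static uint64_t per_device_len(uint64_t target_len, uint32_t stripes)
{
    return stripes ? target_len / stripes : 0;
}
```

    For a 200-sector target over two 100-sector devices, checking the
    full target length against one device fails, while checking the
    per-device length succeeds.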

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

22 Jun, 2009

2 commits

  • Add .iterate_devices to 'struct target_type' to allow a function to be
    called for all devices in a DM target. Implemented it for all targets
    except those in dm-snap.c (origin and snapshot).

    (The raid1 version number jumps to 1.12 because we originally reserved
    1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
    instead.)

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon
    Cc: martin.petersen@oracle.com

    Mike Snitzer
     
  • Flush support for the stripe target.

    This sets ti->num_flush_requests to the number of stripes and
    remaps individual flush requests to the appropriate stripe devices.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

06 Jan, 2009

1 commit

  • Change dm_unregister_target to return void and use BUG() for error
    reporting.

    dm_unregister_target can only fail because of a programming bug in the
    target driver. It can't fail because of user behavior or disk errors.

    This patch changes dm_unregister_target to return void and use BUG() if
    someone tries to unregister a target that is not registered or that
    is in use.

    This patch removes code duplication (testing of error codes in all dm
    targets) and reports bugs in just one place, in dm_unregister_target. In
    some target drivers, these return codes were ignored, which could lead
    to a situation where bugs could be missed.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

14 Nov, 2008

1 commit


22 Oct, 2008

2 commits

  • Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
    Targets should not need direct access to internal DM structures.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Move array_too_big to include/linux/device-mapper.h because it is
    used by targets.

    Remove the test from dm-raid1 as the number of mirror legs is limited
    such that it can never fail. (Even for stripes it seems rather
    unlikely.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

09 Oct, 2008

1 commit

  • * Implement disk_devt() and part_devt() and use them to directly
    access devt instead of computing it from ->major and ->first_minor.

    Note that all references to ->major and ->first_minor outside of the
    block layer are used to determine the devt of the disk (the part0), and
    as ->major and ->first_minor will continue to represent devt for the
    disk, converting these users isn't strictly necessary. However,
    convert them for consistency.

    * Implement disk_max_parts() to avoid directly dereferencing
    genhd->minors.

    * Update bdget_disk() such that it doesn't assume consecutive minor
    space.

    * Move devt computation from register_disk() to add_disk() and make it
    the only one (all other usages use the initially determined value).

    These changes clean up the code and will help disk->part dereference
    fix and extended block device numbers.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

08 Feb, 2008

2 commits

  • This patch adds additional information to the status line. It is added at the
    end of the returned text so it will not interfere with existing
    implementations using this data. The addition of this information will allow
    for a common return interface to match that returned with the dm-raid1.c
    status line (with Jonathan Brassow's patches).

    Here is a sample of what is returned with a mirror "status" call:
    isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core

    Here's what's returned with this patch for a stripe "status" call:
    isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA

    Signed-off-by: Brian Wood
    Signed-off-by: Alasdair G Kergon

    Brian Wood
     
  • This patch adds the stripe_end_io function to process errors that might
    occur after an IO operation. As part of this there are a number of
    enhancements made to record and trigger events:

    - New atomic variable in struct stripe to record the number of
    errors each stripe volume device has experienced (could be used
    later with uevents to report back directly to userspace)

    - New workqueue/work struct setup to process the trigger_event function

    - New end_io function. It is here that testing for BIO error conditions
    takes place. It determines the exact stripe that caused the error,
    records this in the new atomic variable, and calls the queue_work() function

    - New trigger_event function to process failure events. This
    calls dm_table_event()
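
    The bookkeeping described in the list above might look roughly like
    this in userspace C11. It is a sketch under stated assumptions: the
    struct, the MAX_STRIPES limit and the events_pending counter (a
    stand-in for queue_work() and the trigger_event work item) are
    hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_STRIPES 8  /* illustrative limit */

struct stripe_set {
    size_t     stripes;
    atomic_int error_count[MAX_STRIPES]; /* per-device error counters */
    atomic_int events_pending;           /* stand-in for queued event work */
};

/*
 * Called when an I/O completes. A non-zero error is charged to the
 * stripe that handled the I/O and an event is requested.
 * Returns 1 if an event was requested, 0 otherwise.
 */
static int record_io_result(struct stripe_set *s, size_t stripe, int error)
{
    if (!error || stripe >= s->stripes)
        return 0;
    atomic_fetch_add(&s->error_count[stripe], 1);
    atomic_fetch_add(&s->events_pending, 1);
    return 1;
}
```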

    Signed-off-by: Brian Wood
    Signed-off-by: Alasdair G Kergon

    Brian Wood
     

20 Oct, 2007

1 commit


09 Dec, 2006

1 commit

  • Update existing targets to use the new symbols for return values from target
    map and end_io functions.

    There is no effect on behaviour.

    Test results:
    Done build test without errors.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon
    Cc: dm-devel@redhat.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kiyoshi Ueda
     

27 Jun, 2006

1 commit


28 Mar, 2006

2 commits

  • Signed-off-by: Kevin Corry
    Cc: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Corry
     
  • We don't know what type sector_t has. Sometimes it's unsigned long, sometimes
    it's unsigned long long. For example on ppc64 it's unsigned long with
    CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.

    The way to handle all of this is to always use unsigned long long and to
    always typecast the sector_t when printing it.
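
    A short sketch of the convention, using a local typedef as a
    stand-in for the kernel's sector_t (format_sector() is a
    hypothetical helper):

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in: the real type may be unsigned long or unsigned long long. */
typedef unsigned long sector_t;

/* Format a sector number portably, whatever sector_t happens to be. */
static int format_sector(char *buf, size_t len, sector_t s)
{
    /* Always cast to unsigned long long and print with %llu. */
    return snprintf(buf, len, "%llu", (unsigned long long)s);
}
```

    The same call compiles without format warnings if the typedef is
    changed to unsigned long long.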

    Acked-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Mar, 2006

1 commit

  • The dm-stripe target currently does not enforce that the size of a stripe
    device be a multiple of the chunk-size. Under certain conditions, this can
    lead to I/O requests going off the end of an underlying device. This
    test-case shows one example.

    echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
    echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
    echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
    dmsetup create stripe0
    dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k

    This will produce the output:
    dd: writing '/dev/mapper/stripe0': Input/output error
    97+0 records in
    96+0 records out

    And in the kernel log will be:
    attempt to access beyond end of device
    dm-0: rw=0, want=104, limit=100

    The patch will check that the table size is a multiple of the stripe
    chunk-size when the table is created, which will prevent the above striped
    device from being created.

    This should not affect tools like LVM or EVMS, since in all the cases I can
    think of, striped devices are always created with the sizes being a
    multiple of the chunk-size.

    The size of a stripe device must be a multiple of its chunk-size.
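
    The rule can be sketched as a single predicate. The function name is
    hypothetical and sizes are in sectors.

```c
#include <stdint.h>

/* Returns 1 if a stripe device of stripe_len sectors is a whole number of chunks. */
static int stripe_size_ok(uint64_t stripe_len, uint32_t chunk_size)
{
    if (chunk_size == 0)
        return 0;
    return stripe_len % chunk_size == 0;
}
```

    In the test case above each stripe device is 100 sectors with a
    chunk-size of 32, so the check fails (100 is not a multiple of 32),
    whereas 128-sector devices would pass.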

    (akpm: that typecast is quite gratuitous)

    Signed-off-by: Kevin Corry
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Corry
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds