09 Dec, 2006

2 commits

  • The complete_resync_work function only provides the ability to change an
    out-of-sync region to in-sync. This patch enhances the function to allow us
    to change the status from in-sync to out-of-sync as well, something that is
    needed when a mirror write to one of the devices or an initial resync on a
    given region fails.

    Signed-off-by: Jonathan E Brassow
    Signed-off-by: Alasdair G Kergon
    Cc: dm-devel@redhat.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan E Brassow
     
  • Update existing targets to use the new symbols for return values from target
    map and end_io functions.

    There is no effect on behaviour.

    Test results:
    Done build test without errors.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon
    Cc: dm-devel@redhat.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kiyoshi Ueda
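
The complete_resync_work change in the first entry above can be pictured with a small userspace model. This is only a sketch: the `sync_log` bitset, `NR_REGIONS`, and the function signature are assumptions made for illustration, not the kernel's actual dirty-log interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_REGIONS 64  /* hypothetical log size for this sketch */

/* Hypothetical in-memory log: one bit per region, 1 = in-sync. */
struct sync_log {
    unsigned char sync_bits[NR_REGIONS / 8];
};

static bool region_in_sync(const struct sync_log *log, size_t region)
{
    assert(region < NR_REGIONS);
    return (log->sync_bits[region / 8] >> (region % 8)) & 1u;
}

/*
 * Record the outcome of work on a region. With only the success path,
 * a region could go from out-of-sync to in-sync but never back; the
 * 'success' flag also lets a failure mark the region out-of-sync.
 */
static void complete_resync_work(struct sync_log *log, size_t region,
                                 bool success)
{
    assert(region < NR_REGIONS);
    unsigned char mask = (unsigned char)(1u << (region % 8));

    if (success)
        log->sync_bits[region / 8] |= mask;
    else
        log->sync_bits[region / 8] &= (unsigned char)~mask;
}
```

A failed mirror write or failed initial resync would then call `complete_resync_work(log, region, false)`, leaving the region to be picked up by a later recovery pass.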
     

22 Nov, 2006

1 commit


09 Nov, 2006

1 commit

  • All device-mapper targets must complete outstanding I/O before suspending.
    The mirror target generates I/O in its recovery phase and fails to wait for
    it. It needs to be tracked so we can ensure that it has completed before we
    suspend.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Jonathan E Brassow
    Signed-off-by: Alasdair G Kergon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan E Brassow
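
The tracking described in this entry can be sketched as a counter of outstanding recovery I/O that the suspend path checks. This is a userspace illustration only: `struct mirror_state`, the `recovery_in_flight` field, and the helper names are assumptions, and a kernel implementation would sleep on a wait queue rather than test a predicate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mirror state; the field name is an assumption. */
struct mirror_state {
    atomic_int recovery_in_flight;  /* recovery I/Os issued, not yet done */
};

/* Called when the target starts a recovery copy for a region. */
static void recovery_start(struct mirror_state *ms)
{
    atomic_fetch_add(&ms->recovery_in_flight, 1);
}

/* Called from the completion path of a recovery copy. */
static void recovery_end(struct mirror_state *ms)
{
    int prev = atomic_fetch_sub(&ms->recovery_in_flight, 1);

    assert(prev > 0);  /* every end must match an earlier start */
    (void)prev;
}

/*
 * Suspend may only proceed once no recovery I/O is outstanding.
 * Real code would block until this becomes true instead of polling.
 */
static bool safe_to_suspend(struct mirror_state *ms)
{
    return atomic_load(&ms->recovery_in_flight) == 0;
}
```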
     

03 Oct, 2006

1 commit


28 Aug, 2006

1 commit

  • On an nForce4-equipped machine with two SATA disks in a raid1 setup using
    dmraid, we experienced frequent deadlocks of the system under high i/o load.
    'cat /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
    after a few GB, 'cp' would be left in 'D' state along with kjournald and
    kmirrord. The functions in which cp and kjournald were blocked varied, but
    kmirrord's wchan always pointed to 'mempool_alloc()'. We've seen this pattern
    on 2.6.15 and 2.6.17 kernels. http://lkml.org/lkml/2005/4/20/142 indicates
    that this problem has been around even before that.

    So much for the facts, here's my interpretation: mempool_alloc() first tries
    to atomically allocate the requested memory, or falls back to hand out
    preallocated chunks from the mempool. If both fail, it puts the calling
    process (kmirrord in this case) on a private waitqueue until somebody refills
    the pool. But the only 'somebody' is kmirrord itself, so we have a
    deadlock.

    I worked around this problem by falling back to a (blocking) kmalloc in the
    cases where kmirrord would previously have ended up on the waitqueue. This
    defeats part of the benefit of using the mempool, but at least keeps the
    system running. And it could be done with a two-line change. Note that
    mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to
    decide whether to wait or return an error if immediate allocation fails, so
    the attached patch doesn't change behaviour in the non-deadlocking case. The
    patch is against current git (2.6.18-rc4), but should apply to earlier
    versions as well. I've tested on 2.6.15, where this patch makes the
    difference between random lockup and a stable system.

    Signed-off-by: Daniel Kobras
    Acked-by: Alasdair G Kergon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kobras
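
The workaround in this entry amounts to "try the pool without sleeping, then fall back to a plain allocation". The userspace model below illustrates that shape only. The toy `region_pool`, `POOL_RESERVE`, and all helper names are hypothetical, and `malloc` stands in for the blocking kmalloc; it is not the kernel mempool API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_RESERVE 2  /* hypothetical reserve size */

struct region { int state; };

/* Toy stand-in for a mempool: a fixed reserve of preallocated elements. */
struct region_pool {
    struct region slots[POOL_RESERVE];
    int in_use[POOL_RESERVE];
};

/* Non-blocking attempt: returns NULL instead of waiting for a refill. */
static struct region *pool_alloc_nowait(struct region_pool *p)
{
    for (size_t i = 0; i < POOL_RESERVE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return &p->slots[i];
        }
    }
    return NULL;
}

/* Returns 1 if 'r' is one of the pool's reserved elements. */
static int from_pool(const struct region_pool *p, const struct region *r)
{
    for (size_t i = 0; i < POOL_RESERVE; i++)
        if (r == &p->slots[i])
            return 1;
    return 0;
}

/*
 * Allocation as in the workaround: never sleep waiting on the pool.
 * If the reserve is exhausted, fall back to an ordinary allocation.
 */
static struct region *region_alloc(struct region_pool *p)
{
    struct region *r = pool_alloc_nowait(p);

    if (!r)
        r = malloc(sizeof(*r));  /* stands in for the blocking kmalloc */
    return r;
}

/* Return an element to the reserve, or free it if it came from malloc. */
static void region_free(struct region_pool *p, struct region *r)
{
    for (size_t i = 0; i < POOL_RESERVE; i++) {
        if (r == &p->slots[i]) {
            p->in_use[i] = 0;
            return;
        }
    }
    free(r);
}
```

The point of the design is that the only thread able to refill the pool never waits on the pool itself; the cost is that allocations beyond the reserve are no longer guaranteed by preallocation.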
     

27 Jun, 2006

5 commits

  • Tidy device-mapper error messages to include context information
    automatically.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • kcopyd should accumulate errors - otherwise I/O failures may be ignored
    unintentionally.

    And invert 'success' (used in a future patch), using a more intuitive
    !(read_err || write_err).

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Brassow
     
  • On-disk logs for dm-mirror devices are currently hard-coded to use 512 byte
    hard-sector-sizes. This patch fixes dm-log so it will work with devices with
    non-512-byte hard-sector-sizes.

    To maintain full compatibility, instead of moving the clean-bits bitset,
    this patch enlarges the disk-header buffer to encompass both the header and
    the bitset. The I/O routines for the bitset are removed, and the I/O routines
    for the disk-header now also read/write the bitset.

    Signed-off-by: Kevin Corry
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Corry
     
  • The device-mapper core does not perform any remapping of bios before passing
    them to the targets. If a particular mapping begins part-way into a device,
    targets obtain the sector relative to the start of the mapping by subtracting
    ti->begin.

    The dm-raid1 target didn't do this everywhere: this patch fixes it, taking
    care to subtract ti->begin exactly once for each bio.

    [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]

    Signed-off-by: Neil Brown
    Signed-off-by: Alasdair G Kergon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
     
  • This patch converts the combination of list_del(A) and list_add(A, B) to
    list_move(A, B) under drivers/.

    Acked-by: Corey Minyard
    Cc: Ben Collins
    Acked-by: Roland Dreier
    Cc: Alasdair Kergon
    Cc: Gerd Knorr
    Cc: Paul Mackerras
    Cc: Frank Pavlic
    Acked-by: Matthew Wilcox
    Cc: Andrew Vasquez
    Cc: Mikael Starvik
    Cc: Greg Kroah-Hartman
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
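
For the kcopyd entry above, error accumulation and the inverted 'success' test can be sketched as follows. The `copy_job` structure and helper names are assumptions for illustration; only `read_err`, `write_err`, and the expression `!(read_err || write_err)` come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical job record; field names follow the commit text. */
struct copy_job {
    int read_err;
    unsigned long write_err;  /* assumed: one bit per failed destination */
};

static void job_init(struct copy_job *job)
{
    job->read_err = 0;
    job->write_err = 0;
}

/* Record the result of one read; a failure is never cleared again. */
static void record_read(struct copy_job *job, int err)
{
    if (err)
        job->read_err = 1;
}

/* OR the error bits in, so a later success cannot overwrite a failure. */
static void record_write(struct copy_job *job, unsigned long err_bits)
{
    job->write_err |= err_bits;
}

/* The job succeeded only if no read and no write ever failed. */
static bool job_success(const struct copy_job *job)
{
    return !(job->read_err || job->write_err);
}
```

Assigning the latest result instead of OR-ing it in would let the final, successful sub-I/O hide an earlier failure, which is the situation the commit describes as errors being ignored unintentionally.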
     
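
For the ti->begin entry above, the rule "subtract ti->begin exactly once for each bio" can be illustrated with minimal stand-in types. The structures below contain only the fields needed here, and `target_offset`, `sector_to_region`, and `region_shift` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;  /* assumption for this sketch */

/* Minimal stand-ins; only the fields needed here. */
struct dm_target {
    sector_t begin;  /* first sector of the mapping on the mapped device */
    sector_t len;    /* length of the mapping in sectors */
};

struct bio {
    sector_t bi_sector;  /* not remapped by the core before reaching us */
};

/*
 * Sector of 'bio' relative to the start of the target's mapping.
 * Derive the offset here, once, from the unmodified bio, and pass the
 * result around instead of subtracting again further down.
 */
static sector_t target_offset(const struct dm_target *ti,
                              const struct bio *bio)
{
    assert(bio->bi_sector >= ti->begin);
    assert(bio->bi_sector - ti->begin < ti->len);
    return bio->bi_sector - ti->begin;
}

/* Region index for a target-relative sector. */
static sector_t sector_to_region(sector_t offset, unsigned region_shift)
{
    return offset >> region_shift;
}
```

With a mapping that begins at sector 1000 and 1024-sector regions, a bio at sector 1500 belongs to region 0; using the raw sector would wrongly place it in region 1.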

28 Mar, 2006

2 commits

  • We don't know what type sector_t has. Sometimes it's unsigned long, sometimes
    it's unsigned long long. For example on ppc64 it's unsigned long with
    CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.

    The way to handle all of this is to always use unsigned long long and to
    always typecast the sector_t when printing it.

    Acked-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • dm-mirror has a potential data corruption problem: while the on-disk log
    shows that all disk contents are in-sync, the actual contents of the disks
    are not synchronized. This problem occurs if the initial recovery (syncing)
    is interrupted and resumed.

    The attached patch fixes this problem.

    Background:

    rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
    (in-sync), which results in the corresponding bit of clean_bits being set.

    This is harmful if on-disk log is used and the map is removed/suspended
    before the initial sync is completed. The clean_bits is written down to
    the on-disk log at the map removal, and, upon resume, it's read and copied
    to sync_bits. Since the recovery process refers to the sync_bits to find a
    region to be recovered, the region whose state was changed from RH_NOSYNC
    to RH_CLEAN is no longer recovered.

    If you haven't applied dm-raid1-read-balancing.patch, proposed on dm-devel
    some time ago, the contents of the mirrored disk are silently corrupted. If
    you have, a balanced read may get bogus data from out-of-sync disks.

    The patch keeps the RH_NOSYNC state unchanged. It will be changed to
    RH_RECOVERING when recovery starts and the region is reclaimed when the
    recovery completes. So it doesn't leak the region hash entry.

    Description:

    Keep RH_NOSYNC state unchanged when I/O on the region completes.

    The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts
    on the region and will be reclaimed when the recovery completes. So it
    doesn't leak the region hash entry.

    Alasdair said:

    I've analysed the relevant part of the state machine and I believe that
    the patch is correct.

    (Further work on this code is still needed - this patch has the
    side-effect of holding onto memory unnecessarily for long periods of time
    under certain workloads - but better that than corrupting data.)

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
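
For the sector_t entry above, the recommended pattern is to cast to unsigned long long and print with `%llu`, whatever the underlying type happens to be. A minimal userspace sketch follows; the typedef and the `format_sector` helper are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: could equally be unsigned long long on another config. */
typedef unsigned long sector_t;

/*
 * Format a sector number portably: always cast to unsigned long long
 * and always use %llu, so the format string matches on every platform.
 */
static int format_sector(char *buf, size_t len, sector_t s)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "sector %llu", (unsigned long long)s);
}
```

Changing the typedef to `unsigned long long` requires no change to the format string or the call sites, which is the purpose of the cast.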
     
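
The RH_NOSYNC behaviour described in the dm-mirror entry above can be modelled with a small state machine. This is a simplified sketch: the state names come from the commit text, but the `region` structure is an assumption, and the model leaves out locking, the region lists, and the recovery path.

```c
#include <assert.h>

enum region_state { RH_CLEAN, RH_DIRTY, RH_NOSYNC, RH_RECOVERING };

/* Hypothetical, simplified region record. */
struct region {
    enum region_state state;
    int pending;  /* writes in flight on this region */
};

/* A write starts on the region; a clean region becomes dirty. */
static void rh_inc(struct region *reg)
{
    reg->pending++;
    if (reg->state == RH_CLEAN)
        reg->state = RH_DIRTY;
}

/*
 * A write completes. When the last write finishes, a dirty region
 * becomes clean again. An RH_NOSYNC region keeps its state, so it is
 * never reported as in-sync before recovery has actually copied it.
 */
static void rh_dec(struct region *reg)
{
    assert(reg->pending > 0);
    if (--reg->pending == 0 && reg->state == RH_DIRTY)
        reg->state = RH_CLEAN;
}
```

In this model, a write to a region that has not yet been synchronized leaves it in RH_NOSYNC after completion, so a log written out at suspend time would still list the region as needing recovery.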

27 Mar, 2006

1 commit

  • This patch changes several mempool users, all of which are basically just
    wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
    than their own wrapper function, removing a bunch of duplicated code.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     

07 Jan, 2006

1 commit


23 Nov, 2005

1 commit

  • The spinlock region_lock is held while calling mark_region which can sleep.
    Drop the spinlock before calling that function.

    A region's state and inclusion in the clean list are altered by rh_inc and
    rh_dec. The state variable is set to RH_CLEAN in rh_dec, but only if
    'pending' is zero. It is set to RH_DIRTY in rh_inc, but not if it is already
    so. The changes to 'pending', the state, and the region's inclusion in the
    clean list need to be atomic.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan E Brassow
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

10 Sep, 2005

1 commit

  • Fix another bug in dm-raid1.c where a dirty region may stay in, or be moved
    to, the clean list and be freed while in use.

    It happens as follows:

    CPU0                                    CPU1
    ------------------------------------------------------------------------------
    rh_dec()
      if (atomic_dec_and_test(pending))
                                            rh_inc()
                                              if the region is clean
                                                 mark the region dirty
                                                 and remove from clean list
         mark the region clean
         and move to clean list
                                                 atomic_inc(pending)

    At this stage, the region is in the clean list and will be mistakenly
    reclaimed by rh_update_states() later.

    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     

05 Aug, 2005

1 commit


08 Jul, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds