12 Apr, 2012

3 commits

  • If a bitmap is added while the array is active, it is possible
    for bitmap_daemon_work to run while the bitmap is being
    initialised.
    This is particularly a problem if bitmap_daemon_work sees
    bitmap->filemap as non-NULL before it has been filled in properly.
    So hold bitmap_info.mutex while filling in ->filemap
    to prevent problems.

    This patch is suitable for any -stable kernel, though it might not
    apply cleanly before about 3.1.

    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
     
  • If r1bio->sectors % 8 != 0, then the memcmp and a later
    memcpy will omit the last bio_vec.
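
    To illustrate the arithmetic, here is a minimal sketch assuming 512-byte
    sectors and 4 KiB pages (8 sectors per page). The helper names are
    hypothetical, not the raid1.c code. Truncating division drops a final,
    partially filled page, while rounding up keeps it:

```c
#include <stddef.h>

#define SECTOR_SIZE      512
#define PAGE_SIZE_BYTES  4096  /* assumed 4 KiB pages */
#define SECTORS_PER_PAGE (PAGE_SIZE_BYTES / SECTOR_SIZE) /* 8 */

/* Buggy count: truncates, so a trailing partial page is ignored. */
static size_t pages_truncated(size_t sectors)
{
	return sectors / SECTORS_PER_PAGE;
}

/* Fixed count: rounds up so the last, partially filled page is included. */
static size_t pages_rounded_up(size_t sectors)
{
	return (sectors + SECTORS_PER_PAGE - 1) / SECTORS_PER_PAGE;
}
```

    For example, 12 sectors span two pages, but truncation counts only one.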

    This is suitable for any stable kernel since 3.1 when bad-block
    management was introduced.

    Cc: stable@vger.kernel.org
    Signed-off-by: majianpeng
    Signed-off-by: NeilBrown

    majianpeng
     
  • bitmap_new_disk_sb() would still create V3 bitmap superblock
    with host-endian layout.

    Perhaps I'm confused, but shouldn't bitmap_new_disk_sb() be
    creating a V4 bitmap superblock instead, that is portable,
    as per comment in bitmap.h?

    Signed-off-by: Andrei Warkentin
    Signed-off-by: NeilBrown

    Andrei Warkentin
     

03 Apr, 2012

5 commits

  • When comparing two pages read from different legs of a mirror, only
    compare the bytes that were read, not the whole page.

    In most cases we read a whole page, but in some cases with
    bad blocks or odd-sized devices we might read fewer than that.
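
    The fix amounts to bounding the comparison by the number of bytes
    actually read. A minimal sketch, assuming 4 KiB pages; the function
    name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096 /* assumed page size */

/*
 * Compare two mirror copies of one page, but only the bytes actually read.
 * 'bytes_read' may be less than a page (bad blocks, odd-sized devices);
 * bytes beyond it are stale and must not count as a mismatch.
 */
static bool mirror_pages_match(const unsigned char *a, const unsigned char *b,
			       size_t bytes_read)
{
	if (bytes_read > PAGE_BYTES)
		bytes_read = PAGE_BYTES;
	return memcmp(a, b, bytes_read) == 0;
}
```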

    This bug has been present "forever" but at worst it might cause
    a report of too many mismatches and generate a little bit
    extra resync IO, so there is no need to back-port to -stable
    kernels.

    Reported-by: majianpeng
    Signed-off-by: NeilBrown

    NeilBrown
     
  • When a raid5 array is created with assume-clean and "check" or
    "repair" is written to sync_action, the component disks perform no
    IO, yet the check/resync runs faster than normal.
    This is caused by the test in analyse_stripe():

        if (do_recovery ||
            sh->sector >= conf->mddev->recovery_cp)
                s->syncing = 1;
        else
                s->replacing = 1;

    During check or repair, recovery_cp == MaxSector, so syncing is set
    to zero rather than one.
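
    A simplified model of that test (not the raid5.c code) shows why no
    stripe is marked as syncing: during check or repair every sector is
    below MaxSector. The check_requested parameter is a hypothetical
    illustration of one way to treat a requested check or repair as a
    sync; this message does not describe how the fix is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Stands in for md's MaxSector (recovery_cp == MaxSector: no resync pending). */
#define MAX_SECTOR (~(sector_t)0)

enum stripe_action { STRIPE_SYNCING, STRIPE_REPLACING };

/*
 * Simplified model of the analyse_stripe() test quoted above.
 * 'check_requested' is a HYPOTHETICAL flag for a user-requested
 * check/repair; pass false to reproduce the original behaviour.
 */
static enum stripe_action classify_stripe(bool do_recovery, sector_t sector,
					  sector_t recovery_cp,
					  bool check_requested)
{
	if (do_recovery || sector >= recovery_cp || check_requested)
		return STRIPE_SYNCING;
	return STRIPE_REPLACING;
}
```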

    This bug was introduced by commit 9a3e1101b827
    (md/raid5: detect and handle replacements during recovery),
    so this patch is suitable for 3.3-stable.

    Cc: stable@vger.kernel.org
    Signed-off-by: majianpeng
    Signed-off-by: NeilBrown

    majianpeng
     
  • Because rdev->nr_pending > 0, the disk cannot be removed.
    And in any case, we aren't holding rcu_read_lock().

    Signed-off-by: majianpeng
    Signed-off-by: NeilBrown

    majianpeng
     
  • raid1 arrays do not have the notion of chunk size. Calculate the
    largest chunk sector size we can use to avoid a divide by zero OOPS
    when aligning the size of the new array to the chunk size.
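
    One way to derive such a chunk size is sketched below, under
    assumptions: start from a 64 KiB default (128 sectors), halve it until
    the array size is an exact multiple, and give up if the result would be
    smaller than a 4 KiB page. The constants and function name are
    illustrative and may differ from the patch:

```c
#include <stdint.h>

typedef uint64_t sector_t;

#define DEFAULT_CHUNK_SECTORS 128 /* assumed 64 KiB starting point */
#define PAGE_SECTORS          8   /* 4 KiB page in 512-byte sectors */

/*
 * Return the largest power-of-two chunk size (in sectors), no larger than
 * DEFAULT_CHUNK_SECTORS, that divides array_sectors exactly, or 0 if no
 * chunk of at least one page qualifies.
 */
static unsigned int largest_chunk_sectors(sector_t array_sectors)
{
	unsigned int chunk = DEFAULT_CHUNK_SECTORS;

	/* chunk is a power of two, so (chunk - 1) masks the remainder. */
	while (chunk && (array_sectors & (chunk - 1)))
		chunk >>= 1;
	if (chunk < PAGE_SECTORS)
		return 0;
	return chunk;
}
```

    A return value of 0 lets the caller refuse the conversion instead of
    dividing by zero.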

    Signed-off-by: Jes Sorensen
    Signed-off-by: NeilBrown

    Jes Sorensen
     
  • 1/ We can only treat a known-bad-block like a read-error if we
    have the data that belongs in that block. So fix that test.

    2/ If we cannot recover a stripe due to insufficient data,
    don't tell "md_done_sync" that the sync failed unless we really
    did fail something. If we successfully record bad blocks,
    that is success.

    Reported-by: "majianpeng"
    Signed-off-by: NeilBrown

    NeilBrown
     

02 Apr, 2012

3 commits


29 Mar, 2012

25 commits

  • This device-mapper target creates a read-only device that transparently
    validates the data on one underlying device against a pre-generated tree
    of cryptographic checksums stored on a second device.

    Two checksum device formats are supported: version 0 which is already
    shipping in Chromium OS and version 1 which incorporates some
    improvements.
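
    The verification idea can be sketched as a small binary hash tree. This
    is an illustration under stated simplifications, not dm-verity's
    on-disk format: FNV-1a stands in for a cryptographic hash (it is not
    secure), the tree is binary with a power-of-two block count, and the
    "hash device" is an in-memory array. Each level is checked against the
    stored hash, and the recomputed root against a trusted root value:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET      14695981039346656037ULL
#define FNV_PRIME       1099511628211ULL
#define DATA_BLOCK_SIZE 16
#define NBLOCKS         4 /* must be a power of two in this sketch */

/* FNV-1a: a toy, NON-cryptographic stand-in for the real hash. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

static uint64_t hash_block(const unsigned char *block)
{
	return fnv1a(block, DATA_BLOCK_SIZE, FNV_OFFSET);
}

static uint64_t hash_pair(uint64_t left, uint64_t right)
{
	uint64_t pair[2] = { left, right };

	return fnv1a(pair, sizeof(pair), FNV_OFFSET);
}

/*
 * Heap layout: tree[1] is the root, leaves are tree[NBLOCKS..2*NBLOCKS-1].
 * This array plays the role of the pre-generated hash device.
 */
static void build_tree(unsigned char data[NBLOCKS][DATA_BLOCK_SIZE],
		       uint64_t tree[2 * NBLOCKS])
{
	for (size_t i = 0; i < NBLOCKS; i++)
		tree[NBLOCKS + i] = hash_block(data[i]);
	for (size_t n = NBLOCKS - 1; n >= 1; n--)
		tree[n] = hash_pair(tree[2 * n], tree[2 * n + 1]);
}

/* Verify one data block on read, walking from its leaf up to the root. */
static bool verify_block(const unsigned char *block, size_t index,
			 const uint64_t tree[2 * NBLOCKS], uint64_t trusted_root)
{
	size_t n = NBLOCKS + index;
	uint64_t h = hash_block(block);

	while (n > 1) {
		if (h != tree[n])
			return false; /* data or hash device corrupted */
		/* Odd n is a right child (sibling n - 1), even n a left one. */
		h = (n & 1) ? hash_pair(tree[n - 1], h) : hash_pair(h, tree[n + 1]);
		n /= 2;
	}
	return h == trusted_root;
}
```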

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Will Drewry
    Signed-off-by: Elly Jones
    Cc: Milan Broz
    Cc: Olof Johansson
    Cc: Steffen Klassert
    Cc: Andrew Morton
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch introduces a new function dm_bufio_prefetch. It prefetches
    the specified range of blocks into dm-bufio cache without waiting
    for i/o completion.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Add dm thin target arguments to control discard support.

    ignore_discard: Disables discard support

    no_discard_passdown: Don't pass discards down to the underlying data
    device, but just remove the mapping within the thin provisioning target.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Support discards in the thin target.

    On discard the corresponding mapping(s) are removed from the thin
    device. If the associated block(s) are no longer shared the discard
    is passed to the underlying device.

    All bios other than discards now have an associated deferred_entry
    that is saved to the 'all_io_entry' in endio_hook. When non-discard
    IO completes and associated mappings are quiesced any discards that
    were deferred, via ds_add_work() in process_discard(), will be queued
    for processing by the worker thread.
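
    The rule in the first paragraph can be modelled in a few lines. This toy
    sketch uses hypothetical types and names and ignores the deferral and
    quiescing described above. It removes the mapping and reports whether
    the discard may be passed down, which happens only when no other mapping
    still shares the data block:

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_MAPPING ((size_t)-1)
#define NR_VIRT    8
#define NR_DATA    8

/* Toy thin device: virtual block -> data block, plus sharing ref counts. */
struct toy_pool {
	size_t map[NR_VIRT];            /* NO_MAPPING if unprovisioned */
	unsigned int refcount[NR_DATA]; /* mappings sharing each data block */
};

/*
 * Discard one virtual block: drop its mapping and return true only if the
 * data block is now unused, i.e. the discard may be passed down.
 */
static bool toy_discard(struct toy_pool *p, size_t vblock)
{
	size_t d = p->map[vblock];

	if (d == NO_MAPPING)
		return false; /* nothing mapped, nothing to pass down */
	p->map[vblock] = NO_MAPPING;
	return --p->refcount[d] == 0;
}
```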

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    drivers/md/dm-thin.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++----
    drivers/md/dm-thin.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++-----
    1 file changed, 158 insertions(+), 14 deletions(-)

    Joe Thornber
     
  • This patch contains the ground work needed for dm-thin to support discard.

    - Adds endio function that replaces shared_read_endio.

    - Introduce an explicit 'quiesced' flag into the new_mapping structure.
    Before, this was implicitly indicated by m->list being empty.

    - The map_info->ptr remains constant for the duration of a bio's trip
    through the thin target, making it easier to reason about.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Use dm_target_offset wrapper instead of referencing the awkward ti->begin
    explicitly.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Support the use of an external _read only_ device as an origin for a thin
    device.

    Any read to an unprovisioned area of the thin device will be passed
    through to the origin. Writes trigger allocation of new blocks as
    usual.
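
    The read and write paths can be modelled with a toy in-memory pool.
    This sketch assumes whole-block writes (a partial write would first need
    the origin data copied in), and all names are hypothetical:

```c
#include <string.h>

#define BLK           4
#define NBLK          4
#define UNPROVISIONED (-1)

struct toy_thin {
	int map[NBLK];             /* pool data block, or UNPROVISIONED */
	char data[NBLK][BLK];      /* thin pool data blocks */
	int next_free;             /* next unallocated pool block (no bounds check) */
	const char (*origin)[BLK]; /* external read-only origin */
};

/* Reads of unprovisioned blocks pass through to the origin. */
static void toy_read(const struct toy_thin *t, int block, char out[BLK])
{
	if (t->map[block] == UNPROVISIONED)
		memcpy(out, t->origin[block], BLK);
	else
		memcpy(out, t->data[t->map[block]], BLK);
}

/* Writes never touch the origin: allocate a pool block on first write. */
static void toy_write(struct toy_thin *t, int block, const char in[BLK])
{
	if (t->map[block] == UNPROVISIONED)
		t->map[block] = t->next_free++;
	memcpy(t->data[t->map[block]], in, BLK);
}
```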

    One possible use case for this would be VM hosts that want to run
    guests on thinly-provisioned volumes but have the base image on another
    device (possibly shared between many VMs).

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • The thin metadata format can only make use of a device that is <= 15.9375 GB [...] >= 1 GB, physical extents).

    Rather than rejecting a larger metadata device during thin-pool device
    construction, allow it but issue a warning if a device larger than
    THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is provided. Any space over
    15.9375 GB will not be used.
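
    The new behaviour can be sketched as "clamp and warn". Assumptions:
    sizes are in 512-byte sectors, GB is taken to mean GiB, and the macro
    and function names are illustrative stand-ins for
    THIN_METADATA_MAX_SECTORS_WARNING and the pool constructor logic:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t sector_t;

/* Assumed values, in 512-byte sectors (1 GiB = 2097152 sectors). */
#define METADATA_MAX_SECTORS         ((sector_t)16320 * 2048) /* 15.9375 GB */
#define METADATA_MAX_SECTORS_WARNING ((sector_t)16384 * 2048) /* 16 GB */

/*
 * Accept any metadata device size, warn if it exceeds the warning
 * threshold, and return the number of sectors that will actually be used.
 */
static sector_t usable_metadata_sectors(sector_t dev_sectors, bool *warned)
{
	*warned = dev_sectors > METADATA_MAX_SECTORS_WARNING;
	if (*warned)
		fprintf(stderr, "metadata device larger than 16 GB; "
			"space over 15.9375 GB will not be used\n");
	return dev_sectors < METADATA_MAX_SECTORS ? dev_sectors
						   : METADATA_MAX_SECTORS;
}
```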

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Save space by removing entries from the space map ref_count tree if
    they're no longer needed.

    Ref counts are stored in two places: a bitmap if the ref_count is
    below 3, or a btree of uint32_t if 3 or above.

    When a ref_count that was 3 or above drops below 3, we can remove it
    from the tree and save some metadata space. This removal was commented
    out before because I was unsure why it was causing under-populated
    btree nodes. Earlier patches have fixed this issue.
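
    The two-tier layout can be illustrated with a toy model. Assumptions:
    the "tree" is a plain array standing in for the uint32_t btree, a
    bitmap value of 3 means "look in the tree", and all names are
    hypothetical. Setting a count below 3 drops the tree entry, which is
    the saving this patch re-enables:

```c
#include <stdint.h>

#define SM_NBLOCKS 8
#define SM_IN_TREE 3 /* bitmap value meaning "count lives in the tree" */

struct toy_sm {
	uint8_t bitmap[SM_NBLOCKS];  /* counts 0..2, or SM_IN_TREE */
	uint32_t tree[SM_NBLOCKS];   /* stand-in for the ref_count btree */
	uint8_t in_tree[SM_NBLOCKS]; /* 1 if tree[b] holds a live entry */
};

static uint32_t sm_get(const struct toy_sm *sm, unsigned int b)
{
	return sm->bitmap[b] == SM_IN_TREE ? sm->tree[b] : sm->bitmap[b];
}

static void sm_set(struct toy_sm *sm, unsigned int b, uint32_t count)
{
	if (count >= SM_IN_TREE) {
		sm->bitmap[b] = SM_IN_TREE;
		sm->tree[b] = count;
		sm->in_tree[b] = 1;
	} else {
		sm->bitmap[b] = (uint8_t)count;
		sm->tree[b] = 0;
		sm->in_tree[b] = 0; /* entry removed: metadata space saved */
	}
}
```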

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Commit unwritten data every second to prevent too much building up.

    Released blocks don't become available until after the next commit
    (for crash resilience). Prior to this patch commits were only
    triggered by a message to the target or a REQ_{FLUSH,FUA} bio. This
    allowed far too big a position to build up.

    The interval is hard-coded to 1 second. This is a sensible setting.
    I'm not making this user configurable, since there isn't much to be
    gained by tweaking this - and a lot lost by setting it far too high.
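
    The commit triggers described above can be summarised as a single
    predicate. This is a sketch, not dm-thin's code: wall-clock seconds
    stand in for the kernel's timer, and need_commit() and
    COMMIT_PERIOD_SEC are hypothetical names:

```c
#include <stdbool.h>
#include <time.h>

#define COMMIT_PERIOD_SEC 1 /* hard-coded, as the message describes */

/*
 * Commit on an explicit request (target message or REQ_{FLUSH,FUA} bio),
 * or once the period has elapsed since the previous commit.
 */
static bool need_commit(time_t now, time_t last_commit, bool flush_requested)
{
	return flush_requested ||
	       difftime(now, last_commit) >= COMMIT_PERIOD_SEC;
}
```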

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Device mapper uses sscanf to convert arguments to numbers. The problem is that
    the way we use it ignores additional unmatched characters in the scanned string.

    For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
    but also it will match number with some garbage appended, like "123abc".

    As a result, device mapper accepts garbage after some numbers. For example
    the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
    will pass without an error.

    This patch fixes all sscanf uses in device mapper. It appends "%c" with
    a pointer to a dummy character variable to every sscanf statement.

    The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
    only if string is a null-terminated number (optionally preceded by some
    whitespace characters). If there is some character appended after the number,
    sscanf matches "%c", writes the character to the dummy variable and returns 2.
    We check the return value for 1 and consequently reject numbers with some
    garbage appended.
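
    The same idiom works in ordinary userspace C. A minimal sketch (the
    function name is hypothetical):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a decimal int, rejecting trailing garbage such as "123abc".
 * %d still skips leading whitespace; any character after the number
 * is consumed by %c, making sscanf() return 2 instead of 1.
 */
static bool parse_int_strict(const char *s, int *out)
{
	int number;
	char dummy;

	if (sscanf(s, "%d%c", &number, &dummy) != 1)
		return false;
	*out = number;
	return true;
}
```

    Trailing whitespace such as "123 " is also rejected, because %c matches
    the space. sscanf() behaviour on out-of-range input is undefined;
    strtol() reports overflow via errno when that matters.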

    Signed-off-by: Mikulas Patocka
    Acked-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • The dm-raid code currently fails to create a RAID array if any of the
    superblocks cannot be read. This was an oversight as there is already
    code to handle this case if the values ('- -') were provided for the
    failed array position.

    With this patch, if a superblock cannot be read, the array position's
    fields are initialized as though '- -' was set in the table. That is,
    the device is failed and the position should not be used, but if there
    is sufficient redundancy, the array should still be activated.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     
  • Fix a harmless typo.

    The root is a chunk of data that gets written to the superblock. This
    data is used to recreate the space map when opening a metadata area.
    We have two space maps: one tracking space on the metadata device and
    one for the data device. Both of these use the same format for their
    root, so this typo was harmless.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Now that the value_size is held within every node of the btrees we can
    remove this argument from value_ptr().

    For the last few months a BUG_ON has been checking this argument is
    the same as that held in the node. No issues were reported. So this
    is a safe change.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • The map_context pointer should always be set. However, we have reports
    that upon requeuing it is not set correctly. So add set and clear
    functions with a BUG_ON() to track the issue properly.

    Signed-off-by: Jun'ichi Nomura
    Cc: Mike Snitzer
    Acked-by: Hannes Reinecke
    Tested-by: Heiko Carstens
    Acked-by: Dave Wysochanski
    Signed-off-by: Alasdair G Kergon

    Jun'ichi Nomura
     
  • As a precaution, set bi_end_io to NULL when failing to remap.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Hannes Reinecke
     
  • free_devices in dm_table.c already uses list_for_each(), so we don't
    need to check if the list is empty.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Hannes Reinecke
     
  • Remove documentation for unimplemented 'trim' message.

    I'd planned a 'trim' target message for shrinking thin devices, but
    this is better handled via the discard ioctl.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • The dm raid module (using md) is becoming the preferred way of creating long-lived
    mirrors through userspace LVM so remove the EXPERIMENTAL tag.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Drop EXPERIMENTAL tag from dm-uevent.

    It hasn't changed for a while and some userspace tools are relying upon it.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • When we remove an entry from a node we sometimes rebalance with its
    two neighbours. This wasn't being done correctly; in some cases
    entries have to move all the way from the right neighbour to the left
    neighbour, or vice versa. This patch pretty much re-writes the
    balancing code to fix it.

    This code is barely used currently; only when you delete a thin
    device, and then only if you have hundreds of them in the same pool.
    Once we have discard support, which removes mappings, this will be used
    much more heavily.

    Signed-off-by: Joe Thornber
    Cc: stable@kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Avoid using the bi_next field for the holder of a cell when deferring
    bios because a stacked device below might change it. Store the
    holder in a new field in struct cell instead.

    When a cell is created, the bio that triggered creation (the holder) was
    added to the same bio list as subsequent bios. In some cases we pass
    this holder bio directly to devices underneath. If those devices use
    the bi_next field there will be trouble...

    This also simplifies some code that had to work out which bio was the
    holder.

    Signed-off-by: Joe Thornber
    Cc: stable@kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Always set io->error to -EIO when an error is detected in dm-crypt.

    There were cases where an error code would be set only if we finish
    processing the last sector. If there were other encryption operations in
    flight, the error would be ignored and the bio would be returned with
    success as if no error had happened.

    This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
    and kcryptd_async_done.
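
    The pattern the fix moves to can be sketched as a pending counter plus a
    sticky error. Assumptions: C11 atomics stand in for the kernel's
    atomic_t, and struct toy_io and toy_io_complete() are hypothetical
    names, not dm-crypt's:

```c
#include <errno.h>
#include <stdatomic.h>

/* Toy per-bio state: several crypto operations may be in flight. */
struct toy_io {
	atomic_int pending; /* operations not yet finished */
	atomic_int error;   /* 0, or -EIO once any operation has failed */
};

/*
 * Called when one crypto operation finishes. A failure is recorded at
 * once, not only when the last operation completes. Returns the final
 * status (0 or -EIO) for the last operation, or 1 while others remain.
 */
static int toy_io_complete(struct toy_io *io, int failed)
{
	if (failed)
		atomic_store(&io->error, -EIO);
	if (atomic_fetch_sub(&io->pending, 1) == 1)
		return atomic_load(&io->error);
	return 1;
}
```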

    Signed-off-by: Mikulas Patocka
    Cc: stable@kernel.org
    Reviewed-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch fixes a possible deadlock in dm-crypt's mempool use.

    Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
    It allocates the first MIN_BIO_PAGES pages with a non-failing allocation
    (the allocation cannot fail and waits until the mempool is refilled).
    Further pages are allocated with different gfp flags that allow failing.

    Because allocations may be done in parallel, this code can deadlock. Example:
    There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
    run simultaneously.
    It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
    pages. The mempool is exhausted. Each process waits for more pages to be freed
    to the mempool, which never happens.

    To avoid this deadlock scenario, this patch changes the code so that only
    the first page is allocated with non-failing gfp mask. Allocation of further
    pages may fail.
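
    The race can be replayed deterministically with a toy pool. This is a
    model under assumptions, not dm-crypt code: "blocking" is recorded
    rather than actually waiting, the pool size and MIN_BIO_PAGES are both
    taken as 4, and take_page() and race() are hypothetical. With the old
    policy both requests end up waiting while each holds half the pool;
    with only the first page non-failing, both stop with a partial
    allocation and can make progress:

```c
#include <stdbool.h>

/* Toy page pool shared by concurrent allocators. */
struct toy_pool {
	int free_pages;
};

/*
 * Try to take one page for a request that already holds 'held' pages.
 * If the pool is empty and this page falls within the non-failing part
 * of the request, the caller would wait: '*would_block' is set.
 */
static bool take_page(struct toy_pool *p, int held, int nonfailing_pages,
		      bool *would_block)
{
	*would_block = false;
	if (p->free_pages > 0) {
		p->free_pages--;
		return true;
	}
	if (held < nonfailing_pages)
		*would_block = true;
	return false; /* otherwise: a failed, partial allocation */
}

/* Interleave two requests, each wanting 'want' pages, one page at a time. */
static void race(int pool_pages, int want, int nonfailing,
		 int held[2], bool blocked[2])
{
	struct toy_pool p = { pool_pages };

	for (int r = 0; r < 2; r++) {
		held[r] = 0;
		blocked[r] = false;
	}
	for (int step = 0; step < want; step++)
		for (int r = 0; r < 2; r++)
			if (!blocked[r] && held[r] < want &&
			    take_page(&p, held[r], nonfailing, &blocked[r]))
				held[r]++;
}
```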

    Signed-off-by: Mikulas Patocka
    Cc: stable@kernel.org
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Call the correct exit function on failure in dm_exception_store_init.

    Signed-off-by: Andrei Warkentin
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Andrei Warkentin
     

23 Mar, 2012

1 commit

  • Pull md updates for 3.4 from Neil Brown:
    "Mostly tidying up code in preparation for some bigger changes next
    time.

    A few bug fixes tagged for -stable.

    Main functionality change is that some RAID10 arrays can now grow to
    use extra space that may have been made available on the individual
    devices."

    Fixed up trivial conflicts with the k[un]map_atomic() cleanups in
    drivers/md/bitmap.c.

    * tag 'md-3.4' of git://neil.brown.name/md: (22 commits)
    md: Add judgement bb->unacked_exist in function md_ack_all_badblocks().
    md: fix clearing of the 'changed' flags for the bad blocks list.
    md/bitmap: discard CHUNK_BLOCK_SHIFT macro
    md/bitmap: remove unnecessary indirection when allocating.
    md/bitmap: remove some pointless locking.
    md/bitmap: change a 'goto' to a normal 'if' construct.
    md/bitmap: move printing of bitmap status to bitmap.c
    md/bitmap: remove some unused noise from bitmap.h
    md/raid10 - support resizing some RAID10 arrays.
    md/raid1: handle merge_bvec_fn in member devices.
    md/raid10: handle merge_bvec_fn in member devices.
    md: add proper merge_bvec handling to RAID0 and Linear.
    md: tidy up rdev_for_each usage.
    md/raid1,raid10: avoid deadlock during resync/recovery.
    md/bitmap: ensure to load bitmap when creating via sysfs.
    md: don't set md arrays to readonly on shutdown.
    md: allow re-add to failed arrays.
    md/raid5: use atomic_dec_return() instead of atomic_dec() and atomic_read().
    md: Use existed macros instead of numbers
    md/raid5: removed unused 'added_devices' variable.
    ...

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds
     

21 Mar, 2012

1 commit

  • Pull trivial tree from Jiri Kosina:
    "It's indeed trivial -- mostly documentation updates and a bunch of
    typo fixes from Masanari.

    There are also several linux/version.h include removals from Jesper."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
    kcore: fix spelling in read_kcore() comment
    constify struct pci_dev * in obvious cases
    Revert "char: Fix typo in viotape.c"
    init: fix wording error in mm_init comment
    usb: gadget: Kconfig: fix typo for 'different'
    Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
    writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
    writeback: fix typo in the writeback_control comment
    Documentation: Fix multiple typo in Documentation
    tpm_tis: fix tis_lock with respect to RCU
    Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
    Doc: Update numastat.txt
    qla4xxx: Add missing spaces to error messages
    compiler.h: Fix typo
    security: struct security_operations kerneldoc fix
    Documentation: broken URL in libata.tmpl
    Documentation: broken URL in filesystems.tmpl
    mtd: simplify return logic in do_map_probe()
    mm: fix comment typo of truncate_inode_pages_range
    power: bq27x00: Fix typos in comment
    ...

    Linus Torvalds
     

20 Mar, 2012

1 commit