26 Apr, 2018

1 commit

  • [ Upstream commit 9b28a1102efc75d81298198166ead87d643a29ce ]

    Fixes:
    1. The use of "exceeds" when the opposite of exceeds, falls below,
    was meant.
    2. Properly speaking, a table can not exceed a threshold.

    It emphasizes the important point, which is that it is the userspace
    daemon's responsibility to check for low free space when a device
    is resumed, since it won't get a special event indicating low free
    space in that situation.

    Signed-off-by: mulhern
    Signed-off-by: Mike Snitzer
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    mulhern
     

06 Oct, 2017

1 commit

  • There are three important fields that indicate the overall health and
    status of an array: dev_health, sync_ratio, and sync_action. They tell
    us the condition of the devices in the array, and the degree to which
    the array is synchronized.

    This commit fixes a condition that is reported incorrectly. When a member
    of the array is being rebuilt or a new device is added, the "recover"
    process is used to synchronize it with the rest of the array. When the
    process is complete, but the sync thread hasn't yet been reaped, it is
    possible for the state of MD to be:
    mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
    curr_resync_completed = (but not MaxSector)
    and all rdevs to be In_sync.
    This causes the 'array_in_sync' output parameter that is passed to
    rs_get_progress() to be computed incorrectly and reported as 'false' --
    or not in-sync. This in turn causes the dev_health status characters to
    be reported as all 'a', rather than the proper 'A'.

    This can cause erroneous output for several seconds at a time when tools
    will want to be checking the condition due to events that are raised at
    the end of a sync process. Fix this by properly calculating the
    'array_in_sync' return parameter in rs_get_progress().

    Also, remove an unnecessary intermediate 'recovery_cp' variable in
    rs_get_progress().

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Mike Snitzer

    Jonathan Brassow
     

26 Jul, 2017

1 commit

  • Bump dm-raid target version to 1.12.1 to reflect that commit cc27b0c78c
    ("md: fix deadlock between mddev_suspend() and md_write_start()") is
    available.

    This version change allows userspace to detect that the MD fix is available.
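
    A minimal sketch of how a userspace tool might perform that check,
    assuming it has already obtained the dm-raid target version as a dotted
    string (for example from the output of `dmsetup targets`). The helper
    names are hypothetical and not part of any existing tool:

```python
# Hypothetical helpers: the version string is assumed to look like
# "1.12.1" or "v1.12.1".
MD_DEADLOCK_FIX_VERSION = (1, 12, 1)

def parse_target_version(text):
    """Turn 'v1.12.1' or '1.12.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

def has_md_deadlock_fix(version_text):
    """True if the dm-raid target version is at least 1.12.1."""
    return parse_target_version(version_text) >= MD_DEADLOCK_FIX_VERSION

print(has_md_deadlock_fix("v1.12.1"))
print(has_md_deadlock_fix("1.11.0"))
```

    Comparing tuples of integers avoids the pitfall of comparing version
    strings lexically, where "1.9.0" would sort after "1.12.1".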

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

19 Jun, 2017

1 commit

  • The dm-zoned device mapper target provides transparent write access
    to zoned block devices (ZBC and ZAC compliant block devices).
    dm-zoned hides from the device user (a file system or an application
    doing raw block device accesses) any constraint imposed on write
    requests by the device, equivalent to a drive-managed zoned block
    device model.

    Write requests are processed using a combination of on-disk buffering
    using the device conventional zones and direct in-place processing for
    requests aligned to a zone sequential write pointer position.
    A background reclaim process implemented using dm_kcopyd_copy ensures
    that conventional zones are always available for executing unaligned
    write requests. The reclaim process overhead is minimized by managing
    buffer zones in a least-recently-written order and first targeting the
    oldest buffer zones. Doing so, blocks under regular write access (such
    as metadata blocks of a file system) remain stored in conventional
    zones, resulting in no apparent overhead.

    The dm-zoned implementation focuses on simplicity and on minimizing overhead
    (CPU, memory and storage overhead). For a 14TB host-managed disk with
    256 MB zones, dm-zoned memory usage per disk instance is at most about
    3 MB and as little as 5 zones will be used internally for storing metadata
    and performing buffer zone reclaim operations. This is achieved using
    zone level indirection rather than a full block indirection system for
    managing block movement between zones.
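
    To give a feel for why zone-level indirection keeps the mapping small,
    the following sketch counts mapping entries for the disk described
    above. The unit interpretation (decimal terabytes, binary megabytes) and
    the 4 KiB block size used for comparison are assumptions of this sketch,
    not figures taken from the commit:

```python
# Assumptions (not from the commit): decimal terabytes for the disk size,
# binary megabytes for the zone size, and a 4 KiB block for comparison.
DISK_BYTES = 14 * 10**12
ZONE_BYTES = 256 * 2**20
BLOCK_BYTES = 4096

zones = DISK_BYTES // ZONE_BYTES      # entries needed for zone-level mapping
blocks = DISK_BYTES // BLOCK_BYTES    # entries needed for block-level mapping

print("zone-level mapping entries: ", zones)
print("block-level mapping entries:", blocks)
print("blocks per zone:            ", ZONE_BYTES // BLOCK_BYTES)
```

    Under these assumptions a zone-level table needs tens of thousands of
    entries, while a full block-level table would need billions.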

    The primary target of dm-zoned is host-managed zoned block devices, but it can
    also be used with host-aware device models to mitigate potential
    device-side performance degradation due to excessive random writing.

    Zoned block devices can be formatted and checked for use with the dm-zoned
    target using the dmzadm utility available at:

    https://github.com/hgst/dm-zoned-tools

    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    [Mike Snitzer partly refactored Damien's original work to cleanup the code]
    Signed-off-by: Mike Snitzer

    Damien Le Moal
     

04 May, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A major update for DM cache that reduces the latency for deciding
    whether blocks should migrate to/from the cache. The bio-prison-v2
    interface supports this improvement by enabling direct dispatch of
    work to workqueues rather than having to delay the actual work
    dispatch to the DM cache core. So the dm-cache policies are much more
    nimble by being able to drive IO as they see fit. One immediate
    benefit from the improved latency is a cache that should be much more
    adaptive to changing workloads.

    - Add a new DM integrity target that emulates a block device that has
    additional per-sector tags that can be used for storing integrity
    information.

    - Add a new authenticated encryption feature to the DM crypt target
    that builds on the capabilities provided by the DM integrity target.

    - Add MD interface for switching the raid4/5/6 journal mode and update
    the DM raid target to use it to enable raid4/5/6 journal write-back
    support.

    - Switch the DM verity target over to using the asynchronous hash
    crypto API (this helps work better with architectures that have
    access to off-CPU algorithm providers, which should reduce CPU
    utilization).

    - Various request-based DM and DM multipath fixes and improvements from
    Bart and Christoph.

    - A DM thinp target fix for a bio structure leak that occurs for each
    discard IFF discard passdown is enabled.

    - A fix for a possible deadlock in DM bufio and a fix to re-check the
    new buffer allocation watermark in the face of competing admin
    changes to the 'max_cache_size_bytes' tunable.

    - A couple DM core cleanups.

    * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
    dm bufio: check new buffer allocation watermark every 30 seconds
    dm bufio: avoid a possible ABBA deadlock
    dm mpath: make it easier to detect unintended I/O request flushes
    dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
    dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
    dm: introduce enum dm_queue_mode to cleanup related code
    dm mpath: verify __pg_init_all_paths locking assumptions at runtime
    dm: verify suspend_locking assumptions at runtime
    dm block manager: remove an unused argument from dm_block_manager_create()
    dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
    dm mpath: delay requeuing while path initialization is in progress
    dm mpath: avoid that path removal can trigger an infinite loop
    dm mpath: split and rename activate_path() to prepare for its expanded use
    dm ioctl: prevent stack leak in dm ioctl call
    dm integrity: use previously calculated log2 of sectors_per_block
    dm integrity: use hex2bin instead of open-coded variant
    dm crypt: replace custom implementation of hex2bin()
    dm crypt: remove obsolete references to per-CPU state
    dm verity: switch to using asynchronous hash crypto API
    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    ...

    Linus Torvalds
     

25 Apr, 2017

2 commits

  • The DM integrity block size can now be 512, 1k, 2k or 4k. Using larger
    blocks reduces metadata handling overhead. The block size can be
    configured at table load time using the "block_size:" option, whose
    value is expressed in bytes (default is still 512 bytes).

    It is safe to use larger block sizes with DM integrity, because the
    DM integrity journal makes sure that the whole block is updated
    atomically even if the underlying device doesn't support atomic writes
    of that size (e.g. 4k block on top of a 512b device).

    Depends-on: 2859323e ("block: fix blk_integrity_register to use template's interval_exp if not 0")
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     
  • Some coding style changes.

    Fix a bug where the test_tag array has insufficient size if the digest
    size of the internal hash is bigger than the tag size.

    The function __fls is undefined for a zero argument; this patch fixes
    the undefined behavior that occurs if the user sets interleave_sectors
    to zero.

    Fix the limit of optional arguments to 8.

    Don't allocate crypt_data on the stack to avoid a BUG with a debug kernel.

    Rename all optional argument names to have underscores rather than
    dashes.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

28 Mar, 2017

1 commit

  • Commit 63c32ed4afc ("dm raid: add raid4/5/6 journaling support") added
    journal support to close the raid4/5/6 "write hole" -- in terms of
    writethrough caching.

    Introduce a "journal_mode" feature and use the new
    r5c_journal_mode_set() API to add support for switching the journal
    device's cache mode between write-through (the current default) and
    write-back.

    NOTE: If the journal device is not layered on resilient storage and it
    fails, write-through mode will cause the "write hole" to reoccur. But
    if the journal fails while in write-back mode it will cause data loss
    for any dirty cache entries unless resilient storage is used for the
    journal.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

27 Mar, 2017

1 commit

  • Commit 3a1c1ef2f ("dm raid: enhance status interface and fixup
    takeover/raid0") added new table line arguments and introduced an
    ordering flaw. The sequence of the raid10_copies and raid10_format
    raid parameters got reversed which causes lvm2 userspace to fail by
    falsely assuming a changed table line.

    Sequence those 2 parameters as before so that old lvm2 can function
    properly with new kernels by adjusting the table line output as
    documented in Documentation/device-mapper/dm-raid.txt.

    Also, add the missing version 1.10.1 highlight to the documentation.

    Fixes: 3a1c1ef2f ("dm raid: enhance status interface and fixup takeover/raid0")
    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

25 Mar, 2017

5 commits

  • In recovery mode, we don't:
    - replay the journal
    - check checksums
    - allow writes to the device

    This mode can be used as a last resort for data recovery. The
    motivation for recovery mode is that when there is a single error in the
    journal, the user should not lose access to the whole device.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     
  • Add optional "sector_size" parameter that specifies encryption sector
    size (atomic unit of block device encryption).

    The parameter can be in the range 512 - 4096 bytes and must be a power of two.
    For compatibility reasons, the maximal IO must fit into the page limit,
    so the limit is set to the minimal page size possible (4096 bytes).

    NOTE: this device cannot yet be handled by cryptsetup if this parameter
    is set.

    The IV for the sector is calculated from the 512-byte sector offset unless
    the iv_large_sectors option is used.

    Test script using dmsetup:

    DEV="/dev/sdb"
    DEV_SIZE=$(blockdev --getsz $DEV)
    KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac"
    BLOCK_SIZE=4096

    # dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
    # dmsetup table --showkeys test_crypt
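
    The sector_size constraint described above (a power of two between 512
    and 4096 bytes) can be sketched as a pre-flight check before a table
    line is handed to dmsetup. The function names are hypothetical, and the
    table layout simply mirrors the test script above:

```python
def is_valid_sector_size(size):
    """Power of two within 512..4096 bytes, as the commit text requires."""
    return 512 <= size <= 4096 and (size & (size - 1)) == 0

def crypt_table(dev_size, cipher, key, dev, sector_size):
    """Build a table line shaped like the one in the test script above."""
    if not is_valid_sector_size(sector_size):
        raise ValueError(f"unsupported sector_size: {sector_size}")
    return (f"0 {dev_size} crypt {cipher} {key} 0 {dev} 0 "
            f"1 sector_size:{sector_size}")

for candidate in (512, 1024, 3000, 4096, 8192):
    print(candidate, is_valid_sector_size(candidate))
```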

    Signed-off-by: Milan Broz
    Signed-off-by: Mike Snitzer

    Milan Broz
     
  • For the new authenticated encryption we have to support generic composed
    modes (combination of encryption algorithm and authenticator) because
    this is how the kernel crypto API accesses such algorithms.

    To simplify the interface, we accept an algorithm directly in crypto API
    format. The new format is recognised by the "capi:" prefix. The
    dmcrypt internal IV specification is the same as for the old format.

    The crypto API cipher specifications format is:
    capi:cipher_api_spec-ivmode[:ivopts]
    Examples:
    capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
    capi:xts(aes)-plain64 (equivalent to old aes-xts-plain64)
    Examples of authenticated modes:
    capi:gcm(aes)-random
    capi:authenc(hmac(sha256),xts(aes))-random
    capi:rfc7539(chacha20,poly1305)-random
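
    As an illustration of the format, here is a small parser sketch. It is
    not the kernel's parsing code; in particular, splitting at the last '-'
    is an assumption of this sketch that happens to work for the examples
    above:

```python
def parse_capi(spec):
    """Split 'capi:cipher_api_spec-ivmode[:ivopts]' into its three parts.

    Assumption of this sketch: the IV mode follows the last '-' in the
    string, because the crypto API part may contain nested parentheses.
    """
    prefix = "capi:"
    if not spec.startswith(prefix):
        raise ValueError("not a capi: cipher specification")
    cipher_api, sep, iv = spec[len(prefix):].rpartition("-")
    if not sep or not cipher_api or not iv:
        raise ValueError("expected cipher_api_spec-ivmode[:ivopts]")
    ivmode, _, ivopts = iv.partition(":")
    return cipher_api, ivmode, ivopts or None

for example in ("capi:cbc(aes)-essiv:sha256",
                "capi:authenc(hmac(sha256),xts(aes))-random"):
    print(parse_capi(example))
```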

    Authenticated modes can only be configured using the new cipher format.
    Note that this format allows the user to specify arbitrary combinations
    that can be insecure. (The policy decision is made in cryptsetup userspace.)

    Authenticated encryption algorithms can be of two types, either native
    modes (like GCM) that perform both encryption and authentication
    internally, or composed modes where the user can compose AEAD from separate
    specifications of the encryption algorithm and the authenticator.

    For composed mode with HMAC (length-preserving encryption mode like an
    XTS and HMAC as an authenticator) we have to calculate HMAC digest size
    (the separate authentication key is the same size as the HMAC digest).
    Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get
    HMAC algorithm and retrieve digest size from it.

    Also, for HMAC composed mode we need to parse the crypto API string to
    get the cipher mode nested in the specification. For native AEAD mode
    (like GCM), we can use crypto_tfm_alg_name() API to get the cipher
    specification.

    Because the HMAC composed mode is not processed the same as the native
    AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and
    "hmac" specification for the table integrity argument is removed.

    Signed-off-by: Milan Broz
    Signed-off-by: Mike Snitzer

    Milan Broz
     
  • Allow the use of per-sector metadata, provided by the dm-integrity
    module, for integrity protection and persistently stored per-sector
    Initialization Vector (IV). The underlying device must support the
    "DM-DIF-EXT-TAG" dm-integrity profile.

    The per-bio integrity metadata is allocated by dm-crypt for every bio.

    Example of low-level mapping table for various types of use:
    DEV=/dev/sdb
    SIZE=417792

    # Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
    SIZE_INT=389952
    dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
    dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
    11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
    00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
    0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"

    # AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
    # GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space)
    SIZE_INT=393024
    dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
    dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
    11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
    0 /dev/mapper/x 0 1 integrity:28:aead"

    # Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
    SIZE_INT=401272
    dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
    dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
    11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
    0 /dev/mapper/x 0 1 integrity:16:none"

    # Random IV with XTS + HMAC integrity protection
    SIZE_INT=377656
    dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
    dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
    11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
    00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
    0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"
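
    The per-sector metadata sizes used in the four examples above (32, 28,
    16 and 48 bytes) follow from adding the stored IV and the authentication
    tag. A small sketch of that arithmetic; the constant names are
    illustrative only:

```python
def tag_bytes(iv_bits=0, auth_bits=0):
    """Per-sector metadata size: stored IV plus authentication tag, in bytes."""
    return (iv_bits + auth_bits) // 8

HMAC_SHA256_BITS = 256   # HMAC-SHA256 digest
GCM_IV_BITS = 96         # GCM IV size stated in the comment above
GCM_TAG_BITS = 128       # GCM auth tag size stated in the comment above
RANDOM_IV_BITS = 128     # 16-byte random IV, as in the integrity:16:none example

print(tag_bytes(auth_bits=HMAC_SHA256_BITS))                          # CBC-ESSIV + HMAC
print(tag_bytes(iv_bits=GCM_IV_BITS, auth_bits=GCM_TAG_BITS))         # GCM, random IV
print(tag_bytes(iv_bits=RANDOM_IV_BITS))                              # XTS, random IV only
print(tag_bytes(iv_bits=RANDOM_IV_BITS, auth_bits=HMAC_SHA256_BITS))  # XTS + HMAC
```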

    Both AEAD and HMAC protection authenticate not only data but also
    sector metadata.

    HMAC protection is implemented through the authenc wrapper (so it is
    processed the same way as an authenticated mode).

    In HMAC mode there are two keys (concatenated in dm-crypt mapping
    table). First is the encryption key and the second is the key for
    authentication (HMAC). (It is userspace decision if these keys are
    independent or somehow derived.)
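
    A sketch of how the concatenated key from the HMAC examples above could
    be split back into its two parts. It assumes the authentication key has
    the size of the HMAC digest (32 bytes for SHA-256); the function name is
    hypothetical:

```python
def split_hmac_key(hex_key, digest_bytes):
    """Split 'encryption key + HMAC key' (hex) into two byte strings.

    Assumes the trailing digest_bytes bytes are the authentication key.
    """
    raw = bytes.fromhex(hex_key)
    if len(raw) <= digest_bytes:
        raise ValueError("key too short to contain both parts")
    return raw[:-digest_bytes], raw[-digest_bytes:]

# Keys taken from the CBC-ESSIV + HMAC example above.
ENC_HEX = "11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b"
MAC_HEX = "00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff"

enc_key, mac_key = split_hmac_key(ENC_HEX + MAC_HEX, digest_bytes=32)
print(len(enc_key), len(mac_key))
```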

    The sector request for AEAD/HMAC authenticated encryption looks like this:
    |----- AAD -------|------ DATA -------|-- AUTH TAG --|
    | (authenticated) | (auth+encryption) |              |
    | sector_LE |  IV |  sector in/out    |  tag in/out  |

    For writes, the integrity fields are calculated during AEAD encryption
    of every sector and stored in bio integrity fields and sent to
    underlying dm-integrity target for storage.

    For reads, the integrity metadata is verified during AEAD decryption of
    every sector (they are filled in by dm-integrity, but the integrity
    fields are pre-allocated in dm-crypt).

    There is also experimental support in the cryptsetup utility for
    friendlier configuration (part of the LUKS2 format).

    Because the integrity fields are not valid on initial creation, the
    device must be "formatted". This can be done by direct-io writes to the
    device (e.g. dd in direct-io mode). For now, a trivial tool is available
    to do this, see: https://github.com/mbroz/dm_int_tools

    Signed-off-by: Milan Broz
    Signed-off-by: Ondrej Mosnacek
    Signed-off-by: Vashek Matyas
    Signed-off-by: Mike Snitzer

    Milan Broz
     
  • The dm-integrity target emulates a block device that has additional
    per-sector tags that can be used for storing integrity information.

    A general problem with storing integrity tags with every sector is that
    writing the sector and the integrity tag must be atomic - i.e. in case of
    a crash, either both the sector and the integrity tag are written or
    neither is.

    To guarantee write atomicity the dm-integrity target uses a journal. It
    writes sector data and integrity tags into a journal, commits the journal
    and then copies the data and integrity tags to their respective location.
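
    The three steps above (write to the journal, commit, copy to the final
    location) can be illustrated with a toy in-memory model. This is only a
    sketch of the idea and does not reflect the on-disk layout or the code
    of dm-integrity; all names are made up for the example:

```python
class JournaledDevice:
    """Toy model: 'data' and 'tags' are final locations, the rest is journal."""

    def __init__(self):
        self.data = {}
        self.tags = {}
        self.journal = []     # written to the journal but not yet committed
        self.committed = []   # committed but not yet copied in place

    def write(self, sector, payload, tag, crash_at=None):
        self.journal.append((sector, payload, tag))   # 1. write to the journal
        if crash_at == "before_commit":
            self.journal.clear()      # uncommitted entries are simply lost
            return
        self.committed.extend(self.journal)           # 2. commit the journal
        self.journal.clear()
        if crash_at == "before_copy":
            return                    # replay() finishes the job after restart
        self.replay()                                 # 3. copy in place

    def replay(self):
        """Copy every committed entry to its final location."""
        for sector, payload, tag in self.committed:
            self.data[sector] = payload
            self.tags[sector] = tag
        self.committed.clear()

dev = JournaledDevice()
dev.write(0, b"old", "tag-old")
dev.write(0, b"new", "tag-new", crash_at="before_copy")
print(dev.data[0], dev.tags[0])   # still the old pair, never a mix
dev.replay()
print(dev.data[0], dev.tags[0])   # now the new pair
```

    In both crash cases the sector and its tag stay consistent with each
    other: either the old pair survives or the committed new pair is
    replayed as a whole.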

    The dm-integrity target can be used with the dm-crypt target - in this
    situation the dm-crypt target creates the integrity data and passes them
    to the dm-integrity target via bio_integrity_payload attached to the bio.
    In this mode, the dm-crypt and dm-integrity targets provide authenticated
    disk encryption - if the attacker modifies the encrypted device, an I/O
    error is returned instead of random data.

    The dm-integrity target can also be used as a standalone target, in this
    mode it calculates and verifies the integrity tag internally. In this
    mode, the dm-integrity target can be used to detect silent data
    corruption on the disk or in the I/O path.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Milan Broz
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

19 Mar, 2017

1 commit


28 Feb, 2017

1 commit


17 Feb, 2017

1 commit

  • If "metadata2" is provided as a table argument when creating/loading a
    cache target a more compact metadata format, with separate dirty bits,
    is used. "metadata2" improves speed of shutting down a cache target.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

25 Jan, 2017

2 commits

  • Add md raid4/5/6 journaling support (upstream commit bac624f3f86a started
    the implementation) which closes the write hole (i.e. non-atomic updates
    to stripes) using a dedicated journal device.

    Background:
    raid4/5/6 stripes hold N data payloads per stripe plus one parity payload
    (raid4/5) or two P/Q syndrome payloads (raid6) in an in-memory stripe cache.
    Parity or P/Q syndromes used to recover any data payloads in case of a disk
    failure are calculated from the N data payloads and need to be updated on the
    different component devices of the raid device. Those are non-atomic,
    persistent updates. Hence a crash can cause failure to update all stripe
    payloads persistently and thus cause data loss during stripe recovery.
    This problem gets addressed by writing whole stripe cache entries (together with
    journal metadata) to a persistent journal entry on a dedicated journal device.
    Only if that journal entry is written successfully, the stripe cache entry is
    updated on the component devices of the raid device (i.e. writethrough type).
    In case of a crash, the entry can be recovered from the journal and be written
    again thus ensuring consistent stripe payload suitable to data recovery.
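
    The write hole described above can be demonstrated with plain XOR
    parity, as used by raid4/5. The sketch below is a simplified
    illustration with made-up chunk contents and is not MD code:

```python
from functools import reduce

def xor_parity(chunks):
    """Byte-wise XOR of equally sized chunks (raid4/5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def rebuild(surviving_chunks, parity):
    """Recover the one missing data chunk from the survivors plus parity."""
    return xor_parity(list(surviving_chunks) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Consistent stripe: losing chunk 1 is recoverable.
print(rebuild([data[0], data[2]], parity) == b"BBBB")

# Simulated crash: chunk 0 reached its disk, the parity update did not.
data[0] = b"ZZZZ"
stale_parity = parity
print(rebuild([data[0], data[2]], stale_parity) == b"BBBB")
```

    The second check fails because the parity no longer matches the data.
    This is the inconsistency that the journal prevents by making the
    stripe update replayable.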

    Future dependencies:
    once writeback caching, which is being worked on to compensate for the
    throughput implications of writethrough overhead, is supported with
    journaling upstream, an additional patch based on this one will support
    it in dm-raid.

    Journal resilience related remarks:
    because stripes are recovered from the journal in case of a crash, the
    journal device had better be resilient. Resilience becomes mandatory with
    future writeback support, because losing the working set in the log
    means data loss, as opposed to writethrough, where the loss of the
    journal device 'only' reintroduces the write hole.

    Fix comment on data offsets in parse_dev_params() and initialize
    new_data_offset as well.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     
  • This fix addresses the following 3 failure scenarios:

    1) If a (transiently) inaccessible metadata device is being passed into the
    constructor (e.g. a device tuple '254:4 254:5'), it is processed as if
    '- -' was given. This erroneously results in a status table line containing
    '- -', which mistakenly differs from what has been passed in. As a result,
    userspace libdevmapper puts the device tuple separate from the RAID device
    thus not processing the dependencies properly.

    2) False health status char 'A' instead of 'D' is emitted on the
    status info line for the meta/data device tuple in this metadata device
    failure case.

    3) If the metadata device is accessible when passed into the constructor
    but the data device (partially) isn't, that leg may be set faulty by the
    raid personality on access to the (partially) unavailable leg. Restore
    tried in a second raid device resume on such failed leg (status char 'D')
    fails after the (partial) leg returned.

    Fixes for aforementioned failure scenarios:

    - don't release passed in devices in the constructor thus allowing the
    status table line to e.g. contain '254:4 254:5' rather than '- -'

    - emit device status char 'D' rather than 'A' for the device tuple
    with the failed metadata device on the status info line

    - when attempting to restore faulty devices in a second resume, allow the
    device hot remove function to succeed by setting the device to not in-sync

    In case userspace intentionally passes '- -' into the constructor to avoid that
    device tuple (e.g. to split off a raid1 leg temporarily for later re-addition),
    the status table line will correctly show '- -' and the status info line will
    provide a '-' device health character for the non-defined device tuple.
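
    A sketch of how a userspace tool might interpret the per-device health
    characters discussed above. The meanings of 'A', 'a' and 'D' follow the
    dm-raid documentation as recalled here and should be treated as an
    assumption of this sketch; '-' is the undefined-tuple case described
    above. The helper names are hypothetical:

```python
HEALTH_CHARS = {
    "A": "alive and in-sync",
    "a": "alive but not in-sync",
    "D": "dead or failed",
    "-": "device tuple not defined",
}

def describe_health(dev_health):
    """Map each health character to a description, one per device tuple."""
    return [HEALTH_CHARS.get(ch, "unknown") for ch in dev_health]

def failed_indexes(dev_health):
    """Indexes of device tuples reported as failed ('D')."""
    return [i for i, ch in enumerate(dev_health) if ch == "D"]

print(describe_health("AD-"))
print(failed_indexes("AD-"))
```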

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

15 Dec, 2016

1 commit


14 Dec, 2016

1 commit

  • According to `man blockdev':

    --getsize
    Print device size (32-bit!) in sectors.
    Deprecated in favor of the --getsz option.
    ...
    --getsz
    Get size in 512-byte sectors.

    Hence, occurrences of `--getsize' should be replaced with `--getsz',
    which this commit has achieved as follows:

    $ cd "$repo"
    $ git grep -l -e --getsz
    Documentation/device-mapper/delay.txt
    Documentation/device-mapper/dm-crypt.txt
    Documentation/device-mapper/linear.txt
    Documentation/device-mapper/log-writes.txt
    Documentation/device-mapper/striped.txt
    Documentation/device-mapper/switch.txt
    $ cd Documentation/device-mapper
    $ sed -i s/getsize/getsz/g *

    Signed-off-by: Michael Witten
    Signed-off-by: Jiri Kosina

    Michael Witten
     

09 Dec, 2016

2 commits

  • Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     
  • The kernel key service is a generic way to store keys for the use of
    other subsystems. Currently there is no way to use kernel keys in dm-crypt.
    This patch aims to fix that. Instead of a key, userspace may pass a key
    description preceded by ':'. So the message that constructs the encryption
    mapping now looks like this:

    <cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]

    where <key_string> is in format: <key_size>:<key_type>:<key_description>

    Currently we only support two elementary key types: 'user' and 'logon'.
    Keys may be loaded in dm-crypt either via <key_string> or using the
    classical method of passing the key in hex representation directly.

    A dm-crypt device initialised with a key passed in hex representation may
    have it replaced with a key passed in key_string format and vice versa.
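
    A sketch of how the key field of a table line could be classified. It
    assumes the key_string has the form key_size:key_type:key_description,
    as in the dm-crypt documentation; the function name and the returned
    tuples are made up for this example:

```python
SUPPORTED_KEY_TYPES = {"user", "logon"}

def parse_key_field(field):
    """Classify the key field of a dm-crypt table line.

    Returns ("hex", raw_bytes) for a classical hex key, or
    ("keyring", size, key_type, description) for a ':'-prefixed key_string.
    """
    if not field.startswith(":"):
        return ("hex", bytes.fromhex(field))
    # Split at most twice so the description itself may contain ':'.
    size, key_type, description = field[1:].split(":", 2)
    if key_type not in SUPPORTED_KEY_TYPES:
        raise ValueError(f"unsupported key type: {key_type}")
    return ("keyring", int(size), key_type, description)

print(parse_key_field(":32:logon:my_prefix:my_key"))
print(parse_key_field("00ff"))
```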

    (Based on original work by Andrey Ryabinin)

    Signed-off-by: Ondrej Kozina
    Reviewed-by: David Howells
    Signed-off-by: Mike Snitzer

    Ondrej Kozina
     

21 Nov, 2016

1 commit


18 Oct, 2016

1 commit

  • dm-raid 1.9.0 fails to activate existing RAID4/10 devices that have the
    old superblock format (which does not have takeover/reshaping support
    that was added via commit 33e53f06850f).

    Fix validation path for old superblocks by reverting to the old raid4
    layout and basing checks on mddev->new_{level,layout,...} members in
    super_init_validation().

    Cc: stable@vger.kernel.org # 4.8
    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Jun, 2016

1 commit


08 Jun, 2016

1 commit

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

06 May, 2016

2 commits


11 Mar, 2016

1 commit

  • smq seems to be performing better than the old mq policy in all
    situations, as well as using a quarter of the memory.

    Make 'mq' an alias for 'smq' when choosing a cache policy. The tunables
    that were present for the old mq are faked, and have no effect. mq
    should be considered deprecated now.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

10 Dec, 2015

2 commits

  • If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
    matching a zero hash without validating the content.

    Signed-off-by: Sami Tolvanen
    Signed-off-by: Mike Snitzer

    Sami Tolvanen
     
  • Add support for correcting corrupted blocks using Reed-Solomon.

    This code uses RS(255, N) interleaved across data and hash
    blocks. Each error-correcting block covers N bytes evenly
    distributed across the combined total data, so that each byte is a
    maximum distance away from the others. This makes it possible to
    recover from several consecutive corrupted blocks with relatively
    small space overhead.

    In addition, using verity hashes to locate erasures nearly doubles
    the effectiveness of error correction. Being able to detect
    corrupted blocks also improves performance, because only corrupted
    blocks need to be corrected.

    For a 2 GiB partition, RS(255, 253) (two parity bytes for each
    253-byte block) can correct up to 16 MiB of consecutive corrupted
    blocks if erasures can be located, and 8 MiB if they cannot, with
    16 MiB space overhead.
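
    The figures in the previous paragraph can be reproduced with a short
    calculation. This sketch assumes ideal interleaving, so that a run of
    consecutive corrupted bytes is spread evenly across codewords; the
    function name is illustrative:

```python
def rs_overhead_bytes(data_bytes, n, codeword=255):
    """Parity bytes needed when every n data bytes get (codeword - n) parity bytes."""
    parity = codeword - n
    return data_bytes * parity / n

MIB = 2**20
partition = 2 * 2**30    # 2 GiB

overhead = rs_overhead_bytes(partition, 253)

# With located erasures each parity byte repairs one byte; without
# erasure locations, two parity bytes are needed per corrupted byte.
print(f"space overhead:               {overhead / MIB:.1f} MiB")
print(f"correctable with erasures:    {overhead / MIB:.1f} MiB")
print(f"correctable without erasures: {overhead / (2 * MIB):.1f} MiB")
```

    Rounded to whole MiB this gives the 16 MiB overhead, 16 MiB with
    located erasures and 8 MiB without that the commit message quotes.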

    Signed-off-by: Sami Tolvanen
    Signed-off-by: Mike Snitzer

    Sami Tolvanen
     

05 Nov, 2015

1 commit

  • Pull device mapper updates from Mike Snitzer:
    "Smaller set of DM changes for this merge. I've based these changes on
    Jens' for-4.4/reservations branch because the associated DM changes
    required it.

    - Revert a dm-multipath change that caused a regression for
    unprivileged users (e.g. kvm guests) that issued ioctls when a
    multipath device had no available paths.

    - Include Christoph's refactoring of DM's ioctl handling and add
    support for passing through persistent reservations with DM
    multipath.

    - All other changes are very simple cleanups"

    * tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm switch: simplify conditional in alloc_region_table()
    dm delay: document that offsets are specified in sectors
    dm delay: capitalize the start of an delay_ctr() error message
    dm delay: Use DM_MAPIO macros instead of open-coded equivalents
    dm linear: remove redundant target name from error messages
    dm persistent data: eliminate unnecessary return values
    dm: eliminate unused "bioset" process for each bio-based DM device
    dm: convert ffs to __ffs
    dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
    dm: add support for passing through persistent reservations
    dm: refactor ioctl handling
    Revert "dm mpath: fix stalls when handling invalid ioctls"
    dm: initialize non-blk-mq queue data before queue is used

    Linus Torvalds
     

01 Nov, 2015

1 commit


10 Oct, 2015

1 commit

  • Commit 76c44f6d80 introduced the possibility for "Overflow" to be reported
    by the snapshot device's status. Older userspace (e.g. lvm2) does not
    handle the "Overflow" status response.

    Fix this incompatibility by requiring newer userspace code, that can
    cope with "Overflow", request the persistent store with overflow support
    by using "PO" (Persistent with Overflow) for the snapshot store type.

    Reported-by: Zdenek Kabelac
    Fixes: 76c44f6d80 ("dm snapshot: don't invalidate on-disk image on snapshot write overflow")
    Reviewed-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

01 Sep, 2015

1 commit


19 Aug, 2015

1 commit

  • If the user selected the precise_timestamps or histogram options, report
    it in the @stats_list message output.

    If the user didn't select these options, no extra tokens are reported,
    thus it is backward compatible with old software that doesn't know about
    precise timestamps and histogram.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 4.2

    Mikulas Patocka
     

16 Jul, 2015

2 commits