04 Aug, 2020

1 commit

  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds
     

24 Jul, 2020

1 commit

  • Commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended
    device during destroy") broke integrity recalculation.

    The problem is dm_suspended() returns true not only during suspend,
    but also during resume. So this race condition could occur:
    1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
    2. integrity_recalc (&ic->recalc_work) preempts the current thread
    3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
    4. integrity_recalc exits and no recalculating is done.

    To fix this race condition, add a function dm_post_suspending that is
    only true during the postsuspend phase and use it instead of
    dm_suspended().
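    The fix can be modeled in a few lines of plain C. This is a simplified
    userspace sketch, not the kernel code: the flag names mirror DM's DMF_*
    bits but the struct contents and helper names here are illustrative.

```c
#include <assert.h>

/* Separate bits let the recalc worker distinguish the post-suspend
 * phase from "suspended or resuming", so work queued by
 * dm_integrity_resume() is no longer skipped during resume. */
#define DMF_SUSPENDED       (1u << 0)
#define DMF_POST_SUSPENDING (1u << 1)

struct mapped_device { unsigned flags; };

static int dm_suspended_md(const struct mapped_device *md)
{
    return !!(md->flags & DMF_SUSPENDED);
}

static int dm_post_suspending_md(const struct mapped_device *md)
{
    return !!(md->flags & DMF_POST_SUSPENDING);
}

/* The recalc worker bails out only while actually tearing down,
 * instead of checking dm_suspended_md(), which is also true on resume. */
static int should_skip_recalc(const struct mapped_device *md)
{
    return dm_post_suspending_md(md);
}
```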

    Signed-off-by: Mikulas Patocka
    Fixes: adc0daad366b ("dm: report suspended device during destroy")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

09 Jul, 2020

1 commit

  • Except for pktdvd, the only places setting congested bits are file
    systems that allocate their own backing_dev_info structures. And
    pktdvd is a deprecated driver that isn't useful in stacked setups
    either. So remove the dead congested_fn stacking infrastructure.

    Signed-off-by: Christoph Hellwig
    Acked-by: Song Liu
    Acked-by: David Sterba
    [axboe: fixup unused variables in bcache/request.c]
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

26 Nov, 2019

1 commit

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Fix DM core to disallow stacking request-based DM on partitions.

    - Fix DM raid target to properly resync raidset even if bitmap needed
    additional pages.

    - Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
    IO and crypt workqueues.

    - Fix DM integrity metadata layout that was aligned on 128K boundary
    rather than the intended 4K boundary (removes 124K of wasted space
    for each metadata block).

    - Improve the DM thin, cache and clone targets to use spin_lock_irq
    rather than spin_lock_irqsave where possible.

    - Fix DM thin single thread performance that was lost due to needless
    workqueue wakeups.

    - Fix DM zoned target performance that was lost due to excessive
    backing device checks.

    - Add ability to trigger write failure with the DM dust test target.

    - Fix whitespace indentation in drivers/md/Kconfig.

    - Various small fixes and cleanups (e.g. use struct_size, fix
    uninitialized variable, variable renames, etc).

    * tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
    dm: Fix Kconfig indentation
    dm thin: wakeup worker only when deferred bios exist
    dm integrity: fix excessive alignment of metadata runs
    dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
    dm zoned: reduce overhead of backing device checks
    dm dust: add limited write failure mode
    dm dust: change ret to r in dust_map_read and dust_map
    dm dust: change result vars to r
    dm cache: replace spin_lock_irqsave with spin_lock_irq
    dm bio prison: replace spin_lock_irqsave with spin_lock_irq
    dm thin: replace spin_lock_irqsave with spin_lock_irq
    dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
    dm clone: replace spin_lock_irqsave with spin_lock_irq
    dm writecache: handle REQ_FUA
    dm writecache: fix uninitialized variable warning
    dm stripe: use struct_size() in kmalloc()
    dm raid: streamline rs_get_progress() and its raid_status() caller side
    dm raid: simplify rs_setup_recovery call chain
    dm raid: to ensure resynchronization, perform raid set grow in preresume
    ...

    Linus Torvalds
     

13 Nov, 2019

1 commit

  • Avoid the need to allocate a potentially large array of struct blk_zone
    in the block layer by switching the ->report_zones method interface to
    a callback model. Now the caller simply supplies a callback that is
    executed on each reported zone, and private data for it.
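    The callback model described above can be sketched in plain C. This is a
    userspace toy with reduced types, not the kernel signatures; `blk_zone`
    here carries only a start sector and length.

```c
#include <assert.h>
#include <stdint.h>

struct blk_zone { uint64_t start; uint64_t len; };

/* Caller-supplied callback, invoked once per reported zone. */
typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx,
                               void *data);

/* Driver-side ->report_zones sketch: hand each zone to the callback
 * instead of filling a large caller-allocated array. */
static int report_zones(struct blk_zone *zones, unsigned int nr_zones,
                        report_zones_cb cb, void *data)
{
    for (unsigned int i = 0; i < nr_zones; i++) {
        int ret = cb(&zones[i], i, data);
        if (ret)
            return ret;  /* the callback can stop the report early */
    }
    return 0;
}

/* Example callback: count zones via the private data pointer. */
static int count_cb(struct blk_zone *zone, unsigned int idx, void *data)
{
    (void)zone; (void)idx;
    (*(unsigned int *)data)++;
    return 0;
}
```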

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Shin'ichiro Kawasaki
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Nov, 2019

1 commit

  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct stripe_c {
    ...
    struct stripe stripe[0];
    };

    In this case alloc_context() and dm_array_too_big() are removed and
    replaced by the direct use of the struct_size() helper in kmalloc().

    Notice that open-coded form is prone to type mistakes.

    This code was detected with the help of Coccinelle.
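    The pattern is easy to reproduce in userspace. The `struct_size_of`
    macro below is a stand-in for the kernel's struct_size() helper (which
    additionally saturates on overflow); the stripe structs are trimmed to
    a single field for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct stripe { uint64_t physical_start; };

struct stripe_c {
    uint32_t stripes;
    struct stripe stripe[];   /* flexible array member */
};

/* Userspace stand-in for the kernel's struct_size(): size of the
 * struct plus n trailing array elements, computed from types rather
 * than open-coded arithmetic. */
#define struct_size_of(ptr, member, n) \
    (sizeof(*(ptr)) + (n) * sizeof((ptr)->member[0]))

static struct stripe_c *alloc_stripe_c(uint32_t n)
{
    /* sizeof does not evaluate its operand, so using sc in its own
     * initializer is well defined. */
    struct stripe_c *sc = malloc(struct_size_of(sc, stripe, n));
    if (sc)
        sc->stripes = n;
    return sc;
}
```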

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Mike Snitzer

    Gustavo A. R. Silva
     

19 Jul, 2019

1 commit

  • …t/device-mapper/linux-dm

    Pull more device mapper updates from Mike Snitzer:

    - Fix zone state management race in DM zoned target by eliminating the
    unnecessary DMZ_ACTIVE state.

    - A couple of fixes for issues that the DM snapshot target's optional
    discard support introduced during the first week of the 5.3 merge
    window.

    - Increase the default size of outstanding IO that is allowed for each
    dm-kcopyd client and introduce a tunable to allow users to adjust it.

    - Update DM core to use printk ratelimiting functions rather than
    duplicate them and in doing so fix an issue where DMDEBUG_LIMIT()
    rate limited KERN_DEBUG messages had excessive "callbacks suppressed"
    messages.

    * tag 'for-5.3/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: use printk ratelimiting functions
    dm kcopyd: Increase default sub-job size to 512KB
    dm snapshot: fix oversights in optional discard support
    dm zoned: fix zone state management race

    Linus Torvalds
     

18 Jul, 2019

1 commit

  • DM provided its own ratelimiting printk wrapper but given printk
    advances this is no longer needed.

    Also, switching DMDEBUG_LIMIT to using pr_debug_ratelimited() fixes the
    reported issue where DMDEBUG_LIMIT() still caused a flood of "callbacks
    suppressed" messages.

    Reported-by: Milan Broz
    Depends-on: 29fc2bc7539386 ("printk: pr_debug_ratelimited: check state first to reduce "callbacks suppressed" messages")
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

12 Jul, 2019

1 commit

  • Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
    preparation of using vmalloc() for large report buffer and zone array
    allocations used by this function, remove its "gfp_t gfp_mask" argument
    and rely on the caller context to use memalloc_noio_save/restore() where
    necessary (block layer zone revalidation and dm-zoned I/O error path).
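    The memalloc_noio_save/restore pattern the message relies on can be
    modeled simply. This is a single-threaded toy (the kernel stores the
    flag in per-task state); the names mirror the kernel helpers but the
    flag value is illustrative.

```c
#include <assert.h>

/* Instead of threading a gfp_mask argument through every caller, the
 * caller marks a scope in which allocations must not recurse into I/O. */
static unsigned int task_flags;
#define PF_MEMALLOC_NOIO 0x1u

static unsigned int memalloc_noio_save(void)
{
    unsigned int old = task_flags;
    task_flags |= PF_MEMALLOC_NOIO;
    return old;
}

static void memalloc_noio_restore(unsigned int old)
{
    task_flags = old;
}

/* An allocator would consult this instead of a passed-in gfp mask. */
static int alloc_may_do_io(void)
{
    return !(task_flags & PF_MEMALLOC_NOIO);
}
```

    Restoring the saved value (rather than clearing the bit) keeps nested
    scopes correct, as the test below shows.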

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

26 Apr, 2019

1 commit

  • After commit 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via
    blk_insert_cloned_request feedback"), map_request() will requeue the tio
    when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.

    Thus, if the device driver status is error, a tio may be requeued
    multiple times until the return value is not DM_MAPIO_REQUEUE. That
    means type->start_io may be called multiple times, while type->end_io
    is only called when the IO completes.

    In fact, even without commit 396eaf21ee17, setup_clone() failure can
    also cause tio requeue and associated missed call to type->end_io.

    The service-time path selector selects path based on in_flight_size,
    which is increased by st_start_io() and decreased by st_end_io().
    Missed calls to st_end_io() can lead to in_flight_size count error and
    will cause the selector to make the wrong choice. In addition,
    queue-length path selector will also be affected.

    To fix the problem, call type->end_io in ->release_clone_rq before the
    tio requeue. map_info is passed to ->release_clone_rq() for the
    map_request() error path that results in a requeue.

    Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
    Cc: stable@vger.kernel.org
    Signed-off-by: Yufen Yu
    Signed-off-by: Mike Snitzer

    Yufen Yu
     

06 Mar, 2019

2 commits

  • Add a "create" module parameter, which allows device-mapper targets to
    be configured at boot time. This enables early use of DM targets in the
    boot process (as the root device or otherwise) without the need of an
    initramfs.

    The syntax used in the boot param is based on the concise format from
    the dmsetup tool to follow the rule of least surprise:

    dmsetup table --concise /dev/mapper/lroot

    Which is:
    dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]

    Where,
    <name>          ::= The device name.
    <uuid>          ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
    <minor>         ::= The device minor number | ""
    <flags>         ::= "ro" | "rw"
    <table>         ::= <start_sector> <num_sectors> <target_type> <target_args>
    <target_type>   ::= "verity" | "linear" | ...

    For example, the following could be added in the boot parameters:
    dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0

    Only targets that were tested are allowed, and only the ones that
    don't change any block device when the device is created as read-only.
    For example, mirror and cache targets are not allowed. The rationale
    behind this is that if the user makes a mistake, choosing the wrong
    device to be the mirror or the cache can corrupt data.

    The only targets initially allowed are:
    * crypt
    * delay
    * linear
    * snapshot-origin
    * striped
    * verity
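
    Splitting one device description out of the concise format above can be
    sketched as below. This is illustrative only; the kernel's actual parser
    (in dm-init.c) handles multiple devices, quoting, and validation.

```c
#include <assert.h>
#include <string.h>

/* Split "name,uuid,minor,flags,table..." in place at the first four
 * commas, mirroring the field layout of dm-mod.create=. Returns the
 * number of fields found (up to 5; the fifth is the table string,
 * which may itself contain commas for multi-target tables). */
static int split_fields(char *s, char *fields[5])
{
    int n = 0;
    fields[n++] = s;
    for (char *p = s; *p && n < 5; p++) {
        if (*p == ',') {
            *p = '\0';
            fields[n++] = p + 1;
        }
    }
    return n;
}
```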

    Co-developed-by: Will Drewry
    Co-developed-by: Kees Cook
    Co-developed-by: Enric Balletbo i Serra
    Signed-off-by: Helen Koike
    Reviewed-by: Kees Cook
    Signed-off-by: Mike Snitzer

    Helen Koike
     
  • A dm-raid array with devices larger than 4GB won't assemble on
    a 32-bit host since _check_data_dev_sectors() was added in 4.16.
    This is because to_sector() treats its argument as an "unsigned long",
    which is 32 bits (4GB) on a 32-bit host. Using "unsigned long long"
    is more correct.

    Kernels as early as 4.2 can have other problems due to to_sector()
    being used on the size of a device.
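    The truncation is easy to demonstrate with fixed-width types. Here
    uint32_t stands in for a 32-bit host's "unsigned long"; the shift by 9
    converts bytes to 512-byte sectors, as to_sector() does.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the bug: a byte count over 4GB is truncated to 32 bits
 * before the shift, so the computed sector count is wrong. */
static uint32_t to_sector_32(uint32_t bytes) { return bytes >> 9; }

/* Model of the fix: a 64-bit argument survives the conversion. */
static uint64_t to_sector_64(uint64_t bytes) { return bytes >> 9; }
```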

    Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality")
    cc: stable@vger.kernel.org (v4.2+)
    Reported-and-tested-by: Guillaume Perréal
    Signed-off-by: NeilBrown
    Signed-off-by: Mike Snitzer

    NeilBrown
     

27 Oct, 2018

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - The biggest change this cycle is to remove support for the legacy IO
    path (.request_fn) from request-based DM.

    Jens has already started preparing for complete removal of the legacy
    IO path in 4.21 but this earlier removal of support from DM has been
    coordinated with Jens (as evidenced by the commit being attributed to
    him).

    Making request-based DM exclusively blk-mq cleans up that
    portion of DM core quite nicely.

    - Convert the thinp and zoned targets over to using refcount_t where
    applicable.

    - A couple fixes to the DM zoned target for refcounting and other races
    buried in the implementation of metadata block creation and use.

    - Small cleanups to remove redundant unlikely() around a couple
    WARN_ON_ONCE().

    - Simplify how dm-ioctl copies from userspace, eliminating some
    potential for a malicious user trying to change the executed ioctl
    after its processing has begun.

    - Tweaked DM crypt target to use the DM device name when naming the
    various workqueues created for a particular DM crypt device (makes
    the N workqueues for a DM crypt device more easily understood and
    enhances user's accounting capabilities at a glance via "ps")

    - Small fixup to remove dead branch in DM writecache's memory_entry().

    * tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm writecache: remove disabled code in memory_entry()
    dm zoned: fix various dmz_get_mblock() issues
    dm zoned: fix metadata block ref counting
    dm raid: avoid bitmap with raid4/5/6 journal device
    dm crypt: make workqueue names device-specific
    dm: add dm_table_device_name()
    dm ioctl: harden copy_params()'s copy_from_user() from malicious users
    dm: remove unnecessary unlikely() around WARN_ON_ONCE()
    dm zoned: target: use refcount_t for dm zoned reference counters
    dm thin: use refcount_t for thin_c reference counting
    dm table: require that request-based DM be layered on blk-mq devices
    dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED
    dm: remove legacy request-based IO path

    Linus Torvalds
     

26 Oct, 2018

1 commit

  • Dispatching a report zones command through the request queue is a major
    pain due to the command reply payload rewriting necessary. Given that
    blkdev_report_zones() is executing everything synchronously, implement
    report zones as a block device file operation instead, allowing major
    simplification of the code in many places.

    As sd, null-blk, dm-linear and dm-flakey are the only block device
    drivers that support exposing zoned block devices, these drivers are
    modified to provide the device side implementation of the
    report_zones() block device file operation.

    For device mappers, a new report_zones() target type operation is
    defined so that upper block layer calls to blkdev_report_zones() can
    be propagated down to the underlying devices of the dm targets.
    Implementation for this new operation is added to the dm-linear and
    dm-flakey targets.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    [Damien]
    * Changed method block_device argument to gendisk
    * Various bug fixes and improvements
    * Added support for null_blk, dm-linear and dm-flakey.
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Mike Snitzer
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

23 May, 2018

1 commit

  • Similar to the ->copy_from_iter() operation, a platform may want to
    deploy an architecture or device specific routine for handling reads
    from a dax_device like /dev/pmemX. On x86 this routine will point to a
    machine check safe version of copy_to_iter(). For now, add the plumbing
    to device-mapper and the dax core.
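    The shape of the plumbing can be modeled with a plain function-pointer
    table. This is a heavily simplified sketch (plain buffers instead of
    iov_iter, invented helper names); it only shows how a platform could
    substitute, say, a machine-check-safe routine for the read path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a dax_operations-style table gaining a ->copy_to_iter()
 * hook alongside the existing ->copy_from_iter(). */
struct dax_ops {
    size_t (*copy_from_iter)(void *dst, const void *src, size_t n);
    size_t (*copy_to_iter)(void *dst, const void *src, size_t n);
};

/* Default implementation; an arch could install an MC-safe variant. */
static size_t plain_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return n;  /* bytes copied */
}

static const struct dax_ops generic_dax_ops = {
    .copy_from_iter = plain_copy,
    .copy_to_iter   = plain_copy,
};
```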

    Cc: Ross Zwisler
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

07 Apr, 2018

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - DM core passthrough ioctl fix to retain reference to DM table, and
    that table's block devices, while issuing the ioctl to one of those
    block devices.

    - DM core passthrough ioctl fix to _not_ override the fmode_t used to
    issue the ioctl. Overriding by using the fmode_t that the block
    device was originally open with during DM table load is a liability.

    - Add DM core support for secure erase forwarding and update the DM
    linear and DM striped targets to support them.

    - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
    same, write zeroes) for targets that make use of the non-splitting IO
    variant (as is done for multipath or thinp when layered directly on
    NVMe).

    - Allow DM targets to return a payload in response to a DM message that
    they are sent. This is useful for DM targets that would like to
    provide statistics data in response to DM messages.

    - Update DM bufio to support non-power-of-2 block sizes. Numerous other
    related changes prepare the DM bufio code for this support.

    - Fix DM crypt to use a bounded amount of memory across the entire
    system. This is to avoid OOM that can otherwise occur in response to
    certain pathological IO workloads (e.g. discarding a large DM crypt
    device).

    - Add a 'check_at_most_once' feature to the DM verity target to allow
    verity to be used on mobile devices that have very limited resources.

    - Fix the DM integrity target to fail early if a keyed algorithm (e.g.
    HMAC) is to be used but the key isn't set.

    - Add non-power-of-2 support to the DM unstripe target.

    - Eliminate the use of a Variable Length Array in the DM stripe target.

    - Update the DM log-writes target to record metadata (REQ_META flag).

    - DM raid fixes for its nosync status and some variable range issues.

    * tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
    dm: remove fmode_t argument from .prepare_ioctl hook
    dm: hold DM table for duration of ioctl rather than use blkdev_get
    dm raid: fix parse_raid_params() variable range issue
    dm verity: make verity_for_io_block static
    dm verity: add 'check_at_most_once' option to only validate hashes once
    dm bufio: don't embed a bio in the dm_buffer structure
    dm bufio: support non-power-of-two block sizes
    dm bufio: use slab cache for dm_buffer structure allocations
    dm bufio: reorder fields in dm_buffer structure
    dm bufio: relax alignment constraint on slab cache
    dm bufio: remove code that merges slab caches
    dm bufio: get rid of slab cache name allocations
    dm bufio: move dm-bufio.h to include/linux/
    dm bufio: delete outdated comment
    dm: add support for secure erase forwarding
    dm: backfill abnormal IO support to non-splitting IO submission
    dm raid: fix nosync status
    dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
    dm stripe: get rid of a Variable Length Array (VLA)
    dm log writes: record metadata flag for better flags record
    ...

    Linus Torvalds
     

05 Apr, 2018

1 commit

  • Use the fmode_t that is passed to dm_blk_ioctl() rather than
    inconsistently (varies across targets) drop it on the floor by
    overriding it with the fmode_t stored in 'struct dm_dev'.

    All the persistent reservation functions weren't using the fmode_t they
    got back from .prepare_ioctl so remove them.

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

18 Mar, 2018

1 commit

  • It happens often while I'm preparing a patch for a block driver that
    I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
    available for this driver? Do I have to introduce definitions of these
    constants before I can use these constants? To avoid this confusion,
    move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
    <linux/blkdev.h> header file such that these become available for all
    block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
    header file conditional to avoid that including that header file after
    <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
    redefinition.

    Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
    not been removed from uapi header files nor from NAND drivers in
    which these constants are used for another purpose than converting
    block layer offsets and sizes into a number of sectors.
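    For reference, the shared definitions and the conversions they enable:
    one 512-byte unit used for all block layer offset/size arithmetic,
    regardless of a device's logical block size. The helper names below are
    illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1 << SECTOR_SHIFT)

/* Byte/sector conversions are just shifts by SECTOR_SHIFT. */
static uint64_t bytes_to_sectors(uint64_t bytes) { return bytes >> SECTOR_SHIFT; }
static uint64_t sectors_to_bytes(uint64_t sect)  { return sect << SECTOR_SHIFT; }
```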

    Cc: David S. Miller
    Cc: Mike Snitzer
    Cc: Dan Williams
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

30 Jan, 2018

1 commit

  • Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's
    multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue.
    This is beneficial to do if BLK_STS_RESOURCE is returned from the target
    (because target is busy).

    Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(),
    indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay.

    For old .request_fn: use blk_delay_queue().
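    The decision this adds to the completion path can be sketched as a
    three-way switch. The enum mirrors DM's return codes; the 100 ms delay
    is illustrative, not the kernel's actual constant.

```c
#include <assert.h>

/* The target's end_io hook now has a third answer besides "done"
 * and "requeue immediately". */
enum dm_endio_action {
    DM_ENDIO_DONE,
    DM_ENDIO_REQUEUE,
    DM_ENDIO_DELAY_REQUEUE,
};

/* Returns the requeue delay in ms, or -1 when the request is done. */
static int dm_done_delay_ms(enum dm_endio_action a)
{
    switch (a) {
    case DM_ENDIO_REQUEUE:       return 0;    /* kick queues right away */
    case DM_ENDIO_DELAY_REQUEUE: return 100;  /* back off: target is busy */
    default:                     return -1;   /* complete the request */
    }
}
```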

    bio-based multipath doesn't have feature parity with request-based for
    retryable error requeues; that is something that'll need fixing in the
    future.

    Suggested-by: Bart Van Assche
    Signed-off-by: Mike Snitzer
    Acked-by: Bart Van Assche
    [as interpreted from Bart's "... patch looks fine to me."]

    Mike Snitzer
     

20 Dec, 2017

1 commit

  • If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
    all devices in the DM table do not support partial completions. Also,
    the table has a single immutable target that doesn't require DM core to
    split bios.

    This will enable adding NVMe optimizations to bio-based DM.

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

17 Dec, 2017

1 commit

  • Eliminates need for a separate mempool to allocate 'struct dm_io'
    objects from. As such, it saves an extra mempool allocation for each
    original bio that DM core is issued.

    This complicates the per-bio-data accessor functions by needing to
    conditionally add extra padding to get to a target's per-bio-data. But
    in the end this provides a decent performance improvement for all
    bio-based DM devices.

    On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
    DM linear performance improved by 2% (went from 2665 to 2777 MB/s).
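    The layout change can be modeled with a flexible array member: the
    dm_io bookkeeping sits in front of the target's per-bio data, so one
    allocation serves both, and the accessor skips over the header. Struct
    contents and helper names here are placeholders, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dm_io { int magic; };  /* DM core's per-IO state (placeholder) */

struct clone_mem {
    struct dm_io io;          /* fronts the allocation */
    char target_data[];       /* target's per-bio data follows */
};

/* Accessor: step past the dm_io header to the target's data. */
static void *dm_per_bio_data(struct clone_mem *m)
{
    return m->target_data;
}

/* Inverse: recover the containing allocation from the data pointer. */
static struct clone_mem *dm_io_from_per_bio_data(void *data)
{
    return (struct clone_mem *)
        ((char *)data - offsetof(struct clone_mem, target_data));
}
```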

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

14 Dec, 2017

1 commit

  • No DM target provides num_write_bios and none has since dm-cache's
    brief use in 2013.

    Having the possibility of num_write_bios > 1 complicates bio
    allocation. So remove the interface and assume there is only one bio
    needed.

    If a target ever needs more, it must provide a suitable bioset and
    allocate itself based on its particular needs.

    Signed-off-by: NeilBrown
    Signed-off-by: Mike Snitzer

    NeilBrown
     

11 Sep, 2017

1 commit

  • Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
    buggy. A DM device may be composed of multiple underlying devices and
    all of them need to be flushed. That commit just routes the flush
    request to the first device and ignores the other devices.

    It could be fixed by adding more complex logic to the device mapper. But
    there is only one implementation of the method pmem_dax_ops->flush - that
    is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
    don't need the pmem_dax_ops->flush abstraction at all, we can call
    arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
    can't ever reach anything different from arch_wb_cache_pmem().

    It should be also pointed out that for some uses of persistent memory it
    is needed to flush only a very small amount of data (such as 1 cacheline),
    and it would be overkill if we go through that device mapper machinery for
    a single flushed cache line.

    Fix this by removing the pmem_dax_ops->flush abstraction and call
    arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
    mapper code that forwards the flushes.

    Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Dan Williams
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

28 Aug, 2017

2 commits

  • The arrays of 'struct dm_arg' are never modified by the device-mapper
    core, so constify them so that they are placed in .rodata.

    (Exception: the args array in dm-raid cannot be constified because it is
    allocated on the stack and modified.)

    Signed-off-by: Eric Biggers
    Signed-off-by: Mike Snitzer

    Eric Biggers
     
  • Using the same rate limiting state for different kinds of messages
    is wrong because this can cause a high frequency message to suppress
    a report of a low frequency message. Hence use a unique rate limiting
    state per message type.
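    The effect of per-site state is easy to demonstrate. This toy limiter
    models only a burst count (the kernel's struct ratelimit_state also
    tracks a time interval); names are illustrative.

```c
#include <assert.h>

/* Each message site gets its own state, so a chatty message cannot
 * exhaust the budget of a rare one. */
struct ratelimit_state { int burst; int used; };

/* Returns 1 if the message may be printed, 0 if suppressed. */
static int ratelimit_allow(struct ratelimit_state *rs)
{
    if (rs->used >= rs->burst)
        return 0;
    rs->used++;
    return 1;
}
```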

    Fixes: 71a16736a15e ("dm: use local printk ratelimit")
    Cc: stable@vger.kernel.org
    Signed-off-by: Bart Van Assche
    Signed-off-by: Mike Snitzer

    Bart Van Assche
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes the new 'struct dax_operations', which
    makes it possible to undo the abuse of copy_user_nocache() for copy
    operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

19 Jun, 2017

3 commits

  • A target driver supporting zoned block devices and exposing them as
    such may receive a REQ_OP_ZONE_REPORT request when the user wants to
    determine the mapped device's zone configuration. To properly process
    such a request, the target driver may need to remap the zone
    descriptors provided in the report reply. The helper function
    dm_remap_zone_report() does this generically using only the target
    start offset and length and the start offset within the target device.

    dm_remap_zone_report() will remap the start sector of all zones
    reported. If the report includes sequential zones, the write pointer
    position of these zones will also be remapped.
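    The remapping described above is plain offset arithmetic. A minimal
    sketch, with a reduced zone struct (field names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

struct zone {
    uint64_t start;  /* zone start sector */
    uint64_t wp;     /* write pointer, for sequential zones */
    int seq;         /* 1 if this is a sequential zone */
};

/* Shift a reported zone from the backing device's address space into
 * the target's; sequential zones get their write pointer moved too. */
static void remap_zone(struct zone *z, uint64_t target_start,
                       uint64_t dev_start)
{
    uint64_t offset = target_start - dev_start;

    z->start += offset;
    if (z->seq)
        z->wp += offset;
}
```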

    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    Signed-off-by: Mike Snitzer

    Damien Le Moal
     
  • 1) Introduce DM_TARGET_ZONED_HM feature flag:

    The target drivers currently available will not operate correctly if a
    table target maps onto a host-managed zoned block device.

    To avoid problems, introduce the new feature flag DM_TARGET_ZONED_HM to
    allow a target to explicitly state that it supports host-managed zoned
    block devices. This feature is checked for all targets in a table if
    any of the table's block devices are host-managed.

    Note that as host-aware zoned block devices are backward compatible with
    regular block devices, they can be used by any of the current target
    types. This new feature is thus restricted to host-managed zoned block
    devices.

    2) Check device area zone alignment:

    If a target maps to a zoned block device, check that the device area is
    aligned on zone boundaries to avoid problems with REQ_OP_ZONE_RESET
    operations (resetting a partially mapped sequential zone would not be
    possible). This also facilitates the processing of zone report with
    REQ_OP_ZONE_REPORT bios.

    3) Check block devices zone model compatibility

    When setting the DM device's queue limits, several possibilities exist
    for zoned block devices:
    1) The DM target driver may want to expose a different zone model
    (e.g. host-managed device emulation or regular block device on top of
    host-managed zoned block devices)
    2) Expose the underlying zone model of the devices as-is

    To allow both cases, the underlying block device zone model must be set
    in the target limits in dm_set_device_limits() and the compatibility of
    all devices checked similarly to the logical block size alignment. For
    this last check, introduce validate_hardware_zoned_model() to check that
    all targets of a table have the same zone model and that the zone size
    of the target devices are equal.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    [Mike Snitzer refactored Damien's original work to simplify the code]
    Signed-off-by: Mike Snitzer

    Damien Le Moal
     
  • Using pr_<level> is the more common logging style.

    Standardize style and use new macro DM_FMT().
    Use no_printk() in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.

    Signed-off-by: Joe Perches
    Signed-off-by: Mike Snitzer

    Joe Perches