21 May, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
    int stuff;
    struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that, dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    sizeof(flexible-array-member) triggers a warning because flexible array
    members have incomplete type[1]. There are some instances of code in
    which the sizeof operator is being incorrectly/erroneously applied to
    zero-length arrays and the result is zero. Such instances may be hiding
    some bugs. So, this work (flexible-array member conversions) will also
    help to get completely rid of those sorts of issues.

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Mike Snitzer

    Gustavo A. R. Silva
     

03 Apr, 2020

1 commit


06 Nov, 2019

1 commit

  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct stripe_c {
    ...
    struct stripe stripe[0];
    };

    In this case alloc_context() and dm_array_too_big() are removed and
    replaced by the direct use of the struct_size() helper in kmalloc().

    Notice that open-coded form is prone to type mistakes.

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Mike Snitzer

    Gustavo A. R. Silva
     

23 May, 2018

1 commit

  • Similar to the ->copy_from_iter() operation, a platform may want to
    deploy an architecture or device specific routine for handling reads
    from a dax_device like /dev/pmemX. On x86 this routine will point to a
    machine check safe version of copy_to_iter(). For now, add the plumbing
    to device-mapper and the dax core.

    Cc: Ross Zwisler
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

11 Apr, 2018

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "This cycle was was not something I ever want to repeat as there were
    several late changes that have only now just settled.

    Half of the branch up to commit d2c997c0f145 ("fs, dax: use
    page->mapping to warn...") have been in -next for several releases.
    The of_pmem driver and the address range scrub rework were late
    arrivals, and the dax work was scaled back at the last moment.

    The of_pmem driver missed a previous merge window due to an oversight.
    A sense of obligation to rectify that miss is why it is included for
    4.17. It has acks from PowerPC folks. Stephen reported a build failure
    that only occurs when merging it with your latest tree, for now I have
    fixed that up by disabling modular builds of of_pmem. A test merge
    with your tree has received a build success report from the 0day robot
    over 156 configs.

    An initial version of the ARS rework was submitted before the merge
    window. It is self contained to libnvdimm, a net code reduction, and
    passing all unit tests.

    The filesystem-dax changes are based on the wait_var_event()
    functionality from tip/sched/core. However, late review feedback
    showed that those changes regressed truncate performance to a large
    degree. The branch was rewound to drop the truncate behavior change
    and now only includes preparation patches and cleanups (with full acks
    and reviews). The finalization of this dax-dma-vs-trnucate work will
    need to wait for 4.18.

    Summary:

    - A rework of the filesytem-dax implementation provides for detection
    of unmap operations (truncate / hole punch) colliding with
    in-progress device-DMA. A fix for these collisions remains a
    work-in-progress pending resolution of truncate latency and
    starvation regressions.

    - The of_pmem driver expands the users of libnvdimm outside of x86
    and ACPI to describe an implementation of persistent memory on
    PowerPC with Open Firmware / Device tree.

    - Address Range Scrub (ARS) handling is completely rewritten to
    account for the fact that ARS may run for 100s of seconds and there
    is no platform defined way to cancel it. ARS will now no longer
    block namespace initialization.

    - The NVDIMM Namespace Label implementation is updated to handle
    label areas as small as 1K, down from 128K.

    - Miscellaneous cleanups and updates to unit test infrastructure"

    * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
    libnvdimm, of_pmem: workaround OF_NUMA=n build error
    nfit, address-range-scrub: add module option to skip initial ars
    nfit, address-range-scrub: rework and simplify ARS state machine
    nfit, address-range-scrub: determine one platform max_ars value
    powerpc/powernv: Create platform devs for nvdimm buses
    doc/devicetree: Persistent memory region bindings
    libnvdimm: Add device-tree based driver
    libnvdimm: Add of_node to region and bus descriptors
    libnvdimm, region: quiet region probe
    libnvdimm, namespace: use a safe lookup for dimm device name
    libnvdimm, dimm: fix dpa reservation vs uninitialized label area
    libnvdimm, testing: update the default smart ctrl_temperature
    libnvdimm, testing: Add emulation for smart injection commands
    nfit, address-range-scrub: introduce nfit_spa->ars_state
    libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
    nfit, address-range-scrub: fix scrub in-progress reporting
    dax, dm: allow device-mapper to operate without dax support
    dax: introduce CONFIG_DAX_DRIVER
    fs, dax: use page->mapping to warn if truncate collides with a busy page
    ext2, dax: introduce ext2_dax_aops
    ...

    Linus Torvalds
     

04 Apr, 2018

2 commits

  • Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
    data devices support secure erase.

    Also, add support for secure erase to both the linear and striped
    targets.

    Signed-off-by: Denis Semakin
    Signed-off-by: Mike Snitzer

    Denis Semakin
     
  • Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to
    the build args: https://lkml.org/lkml/2018/3/7/621

    This one is a simple case, since we don't actually need the VLA at all: we
    can just iterate over the stripes twice, once to emit their names, and the
    second time to emit status (i.e. trade memory for time). Since the number
    of stripes is probably low, this is hopefully not that expensive.

    Signed-off-by: Tycho Andersen
    Reviewed-by: Kees Cook
    Signed-off-by: Mike Snitzer

    Tycho Andersen
     

03 Apr, 2018

1 commit

  • Change device-mapper's DAX dependency to require the presence of at
    least one DAX_DRIVER. This allows device-mapper to be built without
    bringing the DAX core along which is especially wasteful when there are
    no DAX drivers, like BLK_DEV_PMEM, configured.

    Cc: Alasdair Kergon
    Reported-by: Bart Van Assche
    Reported-by: kbuild test robot
    Reported-by: Arnd Bergmann
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Sep, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Some request-based DM core and DM multipath fixes and cleanups

    - Constify a few variables in DM core and DM integrity

    - Add bufio optimization and checksum failure accounting to DM
    integrity

    - Fix DM integrity to avoid checking integrity of failed reads

    - Fix DM integrity to use init_completion

    - A couple DM log-writes target fixes

    - Simplify DAX flushing by eliminating the unnecessary flush
    abstraction that was stood up for DM's use.

    * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dax: remove the pmem_dax_ops->flush abstraction
    dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
    dm integrity: make blk_integrity_profile structure const
    dm integrity: do not check integrity for failed read operations
    dm log writes: fix >512b sectorsize support
    dm log writes: don't use all the cpu while waiting to log blocks
    dm ioctl: constify ioctl lookup table
    dm: constify argument arrays
    dm integrity: count and display checksum failures
    dm integrity: optimize writing dm-bufio buffers that are partially changed
    dm rq: do not update rq partially in each ending bio
    dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
    dm mpath: complain about unsupported __multipath_map_bio() return values
    dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through

    Linus Torvalds
     

11 Sep, 2017

1 commit

  • Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
    buggy. A DM device may be composed of multiple underlying devices and
    all of them need to be flushed. That commit just routes the flush
    request to the first device and ignores the other devices.

    It could be fixed by adding more complex logic to the device mapper. But
    there is only one implementation of the method pmem_dax_ops->flush - that
    is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
    don't need the pmem_dax_ops->flush abstraction at all, we can call
    arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
    can't ever reach anything different from arch_wb_cache_pmem().

    It should be also pointed out that for some uses of persistent memory it
    is needed to flush only a very small amount of data (such as 1 cacheline),
    and it would be overkill if we go through that device mapper machinery for
    a single flushed cache line.

    Fix this by removing the pmem_dax_ops->flush abstraction and call
    arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
    mapper code that forwards the flushes.

    Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Dan Williams
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes new 'struct dax_operations' enabling to
    undo the abuse of copy_user_nocache() for copy operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

16 Jun, 2017

1 commit

  • Allow device-mapper to route flush operations to the
    per-target implementation. In order for the device stacking to work we
    need a dax_dev and a pgoff relative to that device. This gives each
    layer of the stack the information it needs to look up the operation
    pointer for the next level.

    This conceptually allows for an array of mixed device drivers with
    varying flush implementations.

    Reviewed-by: Toshi Kani
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

10 Jun, 2017

1 commit

  • Allow device-mapper to route copy_from_iter operations to the
    per-target implementation. In order for the device stacking to work we
    need a dax_dev and a pgoff relative to that device. This gives each
    layer of the stack the information it needs to look up the operation
    pointer for the next level.

    This conceptually allows for an array of mixed device drivers with
    varying copy_from_iter implementations.

    Reviewed-by: Toshi Kani
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

09 Jun, 2017

3 commits

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep arround at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Turn the error paramter into a pointer so that target drivers can change
    the value, and make sure only DM_ENDIO_* values are returned from the
    methods.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • A few (but not all) dm targets use a special EWOULDBLOCK error code for
    failing REQ_RAHEAD requests that fail due to a lack of available resources.
    But no one else knows about this magic code, and lower level drivers also
    don't generate it when failing read-ahead requests for similar reasons.

    So remove this special casing and ignore all additional error handling for
    REQ_RAHEAD - if this was a real underlying error we'd get a normal read
    once the real read comes in.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 May, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has been in multiple -next releases. There were a few
    late breaking fixes and small features that got added in the last
    couple days, but the whole set has received a build success
    notification from the kbuild robot.

    Change summary:

    - Region media error reporting: A libnvdimm region device is the
    parent to one or more namespaces. To date, media errors have been
    reported via the "badblocks" attribute attached to pmem block
    devices for namespaces in "raw" or "memory" mode. Given that
    namespaces can be in "device-dax" or "btt-sector" mode this new
    interface reports media errors generically, i.e. independent of
    namespace modes or state.

    This subsequently allows userspace tooling to craft "ACPI 6.1
    Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
    requests and submit them via the ioctl path for NVDIMM root bus
    devices.

    - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
    by a request from Linus and feedback from Christoph this allows for
    dax capable drivers to publish their own custom dax operations.
    This fixes the broken assumption that all dax operations are
    related to a persistent memory device, and makes it easier for
    other architectures and platforms to add customized persistent
    memory support.

    - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
    available for storage appliance applications to manually trigger
    memory controllers to drain write-pending buffers that would
    otherwise be flushed automatically by the platform ADR
    (asynchronous-DRAM-refresh) mechanism at a power loss event.
    Support for "locked" DIMMs is included to prevent namespaces from
    surfacing when the namespace label data area is locked. Finally,
    fixes for various reported deadlocks and crashes, also tagged for
    -stable.

    - ACPI / nfit driver updates: General updates of the nfit driver to
    add DSM command overrides, ACPI 6.1 health state flags support, DSM
    payload debug available by default, and various fixes.

    Acknowledgements that came after the branch was pushed:

    - commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
    Tested-by: Yi Zhang

    - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
    Tested-by: Toshi Kani "

    * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
    libnvdimm, pfn: fix 'npfns' vs section alignment
    libnvdimm: handle locked label storage areas
    libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
    brd: fix uninitialized use of brd->dax_dev
    block, dax: use correct format string in bdev_dax_supported
    device-dax: fix sysfs attribute deadlock
    libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
    libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
    libnvdimm: rework region badblocks clearing
    acpi, nfit: kill ACPI_NFIT_DEBUG
    libnvdimm: fix clear length of nvdimm_forget_poison()
    libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
    libnvdimm, region: sysfs trigger for nvdimm_flush()
    libnvdimm: fix phys_addr for nvdimm_clear_poison
    x86, dax, pmem: remove indirection around memcpy_from_pmem()
    block: remove block_device_operations ->direct_access()
    block, dax: convert bdev_dax_supported() to dax_direct_access()
    filesystem-dax: convert to dax_direct_access()
    Revert "block: use DAX for partition table reads"
    ext2, ext4, xfs: retrieve dax_device for iomap operations
    ...

    Linus Torvalds
     

04 May, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A major update for DM cache that reduces the latency for deciding
    whether blocks should migrate to/from the cache. The bio-prison-v2
    interface supports this improvement by enabling direct dispatch of
    work to workqueues rather than having to delay the actual work
    dispatch to the DM cache core. So the dm-cache policies are much more
    nimble by being able to drive IO as they see fit. One immediate
    benefit from the improved latency is a cache that should be much more
    adaptive to changing workloads.

    - Add a new DM integrity target that emulates a block device that has
    additional per-sector tags that can be used for storing integrity
    information.

    - Add a new authenticated encryption feature to the DM crypt target
    that builds on the capabilities provided by the DM integrity target.

    - Add MD interface for switching the raid4/5/6 journal mode and update
    the DM raid target to use it to enable aid4/5/6 journal write-back
    support.

    - Switch the DM verity target over to using the asynchronous hash
    crypto API (this helps work better with architectures that have
    access to off-CPU algorithm providers, which should reduce CPU
    utilization).

    - Various request-based DM and DM multipath fixes and improvements from
    Bart and Christoph.

    - A DM thinp target fix for a bio structure leak that occurs for each
    discard IFF discard passdown is enabled.

    - A fix for a possible deadlock in DM bufio and a fix to re-check the
    new buffer allocation watermark in the face of competing admin
    changes to the 'max_cache_size_bytes' tunable.

    - A couple DM core cleanups.

    * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
    dm bufio: check new buffer allocation watermark every 30 seconds
    dm bufio: avoid a possible ABBA deadlock
    dm mpath: make it easier to detect unintended I/O request flushes
    dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
    dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
    dm: introduce enum dm_queue_mode to cleanup related code
    dm mpath: verify __pg_init_all_paths locking assumptions at runtime
    dm: verify suspend_locking assumptions at runtime
    dm block manager: remove an unused argument from dm_block_manager_create()
    dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
    dm mpath: delay requeuing while path initialization is in progress
    dm mpath: avoid that path removal can trigger an infinite loop
    dm mpath: split and rename activate_path() to prepare for its expanded use
    dm ioctl: prevent stack leak in dm ioctl call
    dm integrity: use previously calculated log2 of sectors_per_block
    dm integrity: use hex2bin instead of open-coded variant
    dm crypt: replace custom implementation of hex2bin()
    dm crypt: remove obsolete references to per-CPU state
    dm verity: switch to using asynchronous hash crypto API
    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    ...

    Linus Torvalds
     

26 Apr, 2017

1 commit

  • Arrange for dm to lookup the dax services available from member devices.
    Update the dax-capable targets, linear and stripe, to route dax
    operations to the underlying device. Changes the target-internal
    ->direct_access() method to more closely align with the dax_operations
    ->direct_access() calling convention.

    Cc: Toshi Kani
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

25 Apr, 2017

1 commit

  • A dm-crypt on dm-integrity device incorrectly advertises an integrity
    profile on the DM crypt device. It can be seen in the files
    "/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target
    advertise the integrity profile. That is incorrect, only the
    dm-integrity target should advertise the integrity profile.

    A general problem in DM is that if we have a DM device that depends on
    another device with an integrity profile, the upper device will always
    advertise the integrity profile, even when the target driver doesn't
    support handling integrity data.

    Most targets don't support integrity data, so we provide a whitelist of
    targets that support it (linear, delay and striped). The targets that
    support passing integrity data to the lower device are marked with the
    flag DM_TARGET_PASSES_INTEGRITY. The DM core will now advertise
    integrity data on a DM device only if all the targets support the
    integrity data.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

09 Apr, 2017

1 commit


08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokeness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Jul, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:

    - Replace pcommit with ADR / directed-flushing.

    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement
    either ADR, or provide one or more flush addresses per nvdimm.

    ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
    to the memory controller on a power-fail event.

    Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
    Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
    A flush hint is an mmio address that when written and fenced assures
    that all previous posted writes targeting a given dimm have been
    flushed to media.

    - On-demand ARS (address range scrub).

    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices. When latent errors are detected we re-scrub the
    media to refresh the bad block list, userspace can also request a
    re-scrub at any time.

    - Support for the Microsoft DSM (device specific method) command
    format.

    - Support for EDK2/OVMF virtual disk device memory ranges.

    - Various fixes and cleanups across the subsystem.

    * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
    libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
    nfit: do an ARS scrub on hitting a latent media error
    nfit: move to nfit/ sub-directory
    nfit, libnvdimm: allow an ARS scrub to be triggered on demand
    libnvdimm: register nvdimm_bus devices with an nd_bus driver
    pmem: clarify a debug print in pmem_clear_poison
    x86/insn: remove pcommit
    Revert "KVM: x86: add pcommit support"
    nfit, tools/testing/nvdimm/: unify shutdown paths
    libnvdimm: move ->module to struct nvdimm_bus_descriptor
    nfit: cleanup acpi_nfit_init calling convention
    nfit: fix _FIT evaluation memory leak + use after free
    tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
    tools/testing/nvdimm: add virtual ramdisk range
    acpi, nfit: treat virtual ramdisk SPA as pmem region
    pmem: kill __pmem address space
    pmem: kill wmb_pmem()
    libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
    fs/dax: remove wmb_pmem()
    libnvdimm, pmem: flush posted-write queues on shutdown
    ...

    Linus Torvalds
     

21 Jul, 2016

1 commit

  • Change dm-stripe to implement direct_access function,
    stripe_direct_access(), which maps bdev and sector and
    calls direct_access function of its physical target device.

    Signed-off-by: Toshi Kani
    Signed-off-by: Mike Snitzer

    Toshi Kani
     

08 Jun, 2016

2 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Separate the op from the rq_flag_bits and have dm
    set/get the bio using bio_set_op_attrs/bio_op.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

03 Sep, 2015

1 commit

  • Pull device mapper update from Mike Snitzer:

    - a couple small cleanups in dm-cache, dm-verity, persistent-data's
    dm-btree, and DM core.

    - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
    prison cells

    - a 4.2-stable fix that adds feature reporting for the dm-stats
    features added in 4.2

    - improve DM-snapshot to not invalidate the on-disk snapshot if
    snapshot device write overflow occurs; but a write overflow triggered
    through the origin device will still invalidate the snapshot.

    - optimize DM-thinp's async discard submission a bit now that late bio
    splitting has been included in block core.

    - switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
    improves performance on very low latency devices (eg. NVMe SSD).

    - document DM RAID 4/5/6's discard support

    [ I did not pull the slab changes, which weren't appropriate for this
    tree, and weren't obviously the right thing to do anyway. At the very
    least they need some discussion and explanation before getting merged.

    Because not pulling the actual tagged commit but doing a partial pull
    instead, this merge commit thus also obviously is missing the git
    signature from the original tag ]

    * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm cache: fix use after freeing migrations
    dm cache: small cleanups related to deferred prison cell cleanup
    dm cache: fix leaking of deferred bio prison cells
    dm raid: document RAID 4/5/6 discard support
    dm stats: report precise_timestamps and histogram in @stats_list output
    dm thin: optimize async discard submission
    dm snapshot: don't invalidate on-disk image on snapshot write overflow
    dm: remove unlikely() before IS_ERR()
    dm: do not override error code returned from dm_get_device()
    dm: test return value for DM_MAPIO_SUBMITTED
    dm verity: remove unused mempool
    dm cache: move wake_waker() from free_migrations() to where it is needed
    dm btree remove: remove unused function get_nr_entries()
    dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
    dm cache policy smq: change the mutex to a spinlock

    Linus Torvalds
     

14 Aug, 2015

1 commit

  • As generic_make_request() is now able to handle arbitrarily sized bios,
    it's no longer necessary for each individual block driver to define its
    own ->merge_bvec_fn() callback. Remove every invocation completely.

    Cc: Jens Axboe
    Cc: Lars Ellenberg
    Cc: drbd-user@lists.linbit.com
    Cc: Jiri Kosina
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Cc: linux-raid@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: "Martin K. Petersen"
    Acked-by: NeilBrown (for the 'md' bits)
    Acked-by: Mike Snitzer
    Signed-off-by: Kent Overstreet
    [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
    dm-era-target, and resolve merge conflicts]
    Signed-off-by: Dongsu Park
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

12 Aug, 2015

1 commit

  • Some of the device mapper targets override the error code returned by
    dm_get_device() and return either -EINVAL or -ENXIO. There is nothing
    gained by this override. It is better to propagate the returned error
    code unchanged to caller.

    This work was motivated by hitting an issue where the underlying device
    was busy but -EINVAL was being returned. After this change we get
    -EBUSY instead and it is easier to figure out the problem.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Mike Snitzer

    Vivek Goyal
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

30 May, 2015

1 commit


11 Oct, 2014

1 commit

  • Fix a potential struct stripe_c leak that would occur if the
    chunk_size exceeded the maximum allowed by dm_set_target_max_io_len
    (UINT_MAX). However, in practice there is no possibility of this
    occuring given that chunk_size is of type uint32_t. But it is good to
    fix this to future-proof in case dm_set_target_max_io_len's
    implementation were to change.

    Signed-off-by: Pavitra Kumar
    Signed-off-by: Mike Snitzer

    Pavitra Kumar
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     

06 Sep, 2013

1 commit

  • Eliminate the following sparse warnings:
    drivers/md/dm-stripe.c:443:12: warning: symbol 'dm_stripe_init' was not declared. Should it be static?
    drivers/md/dm-stripe.c:456:6: warning: symbol 'dm_stripe_exit' was not declared. Should it be static?

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

10 May, 2013

1 commit

  • Fix a regression in the calculation of the stripe_width in the
    dm stripe target which led to incorrect processing of device limits.

    The stripe_width is the stripe device length divided by the number of
    stripes. The group of commits in the range f14fa69 ("dm stripe: fix
    size test") to eb850de ("dm stripe: support for non power of 2
    chunksize") interfered with each other (a merging error) and led to the
    stripe_width being set incorrectly to the stripe device length divided by
    chunk_size * stripe_count.

    For example, a stripe device's table with: 0 33553920 striped 3 512 ...
    should result in a stripe_width of 11184640 (33553920 / 3), but due to
    the bug it was getting set to 21845 (33553920 / (512 * 3)).

    The impact of this bug is that device topologies that previously worked
    fine with the stripe target are no longer considered valid. In
    particular, there is a higher risk of seeing this issue if one of the
    stripe devices has a 4K logical block size. Resulting in an error
    message like this:
    "device-mapper: table: 253:4: len=21845 not aligned to h/w logical block size 4096 of dm-1"

    The fix is to swap the order of the divisions and to use a temporary
    variable for the second one, so that width retains the intended
    value.

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 3.6+
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

24 Mar, 2013

1 commit

  • Just a little convenience macro - main reason to add it now is preparing
    for immutable bio vecs, it'll reduce the size of the patch that puts
    bi_sector/bi_size/bi_idx into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Lars Ellenberg
    CC: Jiri Kosina
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: linux-s390@vger.kernel.org
    CC: Chris Mason
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse

    Kent Overstreet
     

02 Mar, 2013

2 commits

  • Use 'bio' in the name of variables and functions that deal with
    bios rather than 'request' to avoid confusion with the normal
    block layer use of 'request'.

    No functional changes.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Avoid returning a truncated table or status string instead of setting
    the DM_BUFFER_FULL_FLAG when the last target of a table fills the
    buffer.

    When processing a table or status request, the function retrieve_status
    calls ti->type->status. If ti->type->status returns non-zero,
    retrieve_status assumes that the buffer overflowed and sets
    DM_BUFFER_FULL_FLAG.

    However, targets don't return non-zero values from their status method
    on overflow. Most targets returns always zero.

    If a buffer overflow happens in a target that is not the last in the
    table, it gets noticed during the next iteration of the loop in
    retrieve_status; but if a buffer overflow happens in the last target, it
    goes unnoticed and erroneously truncated data is returned.

    In the current code, the targets behave in the following way:
    * dm-crypt returns -ENOMEM if there is not enough space to store the
    key, but it returns 0 on all other overflows.
    * dm-thin returns errors from the status method if a disk error happened.
    This is incorrect because retrieve_status doesn't check the error
    code, it assumes that all non-zero values mean buffer overflow.
    * all the other targets always return 0.

    This patch changes the ti->type->status function to return void (because
    most targets don't use the return code). Overflow is detected in
    retrieve_status: if the status method fills up the remaining space
    completely, it is assumed that buffer overflow happened.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

22 Dec, 2012

1 commit