23 Sep, 2020

1 commit

  • commit e2ec5128254518cae320d5dc631b71b94160f663 upstream.

    DM was calling generic_fsdax_supported() to determine whether a device
    referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
    they don't have to duplicate common generic checks. High level code
    should call dax_supported() helper which that calls into appropriate
    helper for the particular device. This problem manifested itself as
    kernel messages:

    dm-3: error: dax access failed (-95)

    when lvm2-testsuite run in cases where a DM device was stacked on top of
    another DM device.

    Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
    Cc:
    Tested-by: Adrian Huang
    Signed-off-by: Jan Kara
    Acked-by: Mike Snitzer
    Reported-by: kernel test robot
    Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

06 Jul, 2019

1 commit

  • This patch adds 'DAXDEV_SYNC' flag which is set
    for nd_region doing synchronous flush. This later
    is used to disable MAP_SYNC functionality for
    ext4 & xfs filesystem for devices don't support
    synchronous flush.

    Signed-off-by: Pankaj Gupta
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 May, 2019

2 commits

  • Convert the dax filesystem to the new internal mount API as the old
    one will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells
    cc: Dan Williams
    cc: Vishal Verma
    cc: Keith Busch
    cc: Dave Jiang
    cc: linux-nvdimm@lists.01.org
    Signed-off-by: Al Viro

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2019

2 commits

  • The device-dax fs is only there to allocate a common inode for each
    device-node that refers to the same device by major:minor. It is
    otherwise not user mountable and need not be displayed in
    /proc/filesystems.

    Reported-by: Al Viro
    Acked-by: Al Viro
    Signed-off-by: Dan Williams
    Signed-off-by: Al Viro

    Dan Williams
     
  • Pankaj reports that starting with commit ad428cdb525a "dax: Check the
    end of the block-device capacity with dax_direct_access()" device-mapper
    no longer allows dax operation. This results from the stricter checks in
    __bdev_dax_supported() that validate that the start and end of a
    block-device map to the same 'pagemap' instance.

    Teach the dax-core and device-mapper to validate the 'pagemap' on a
    per-target basis. This is accomplished by refactoring the
    bdev_dax_supported() internals into generic_fsdax_supported() which
    takes a sector range to validate. Consequently generic_fsdax_supported()
    is suitable to be used in a device-mapper ->iterate_devices() callback.
    A new ->dax_supported() operation is added to allow composite devices to
    split and route upper-level bdev_dax_supported() requests.

    Fixes: ad428cdb525a ("dax: Check the end of the block-device...")
    Cc:
    Cc: Ira Weiny
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Matthew Wilcox
    Cc: Vishal Verma
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Reviewed-by: Jan Kara
    Reported-by: Pankaj Gupta
    Reviewed-by: Pankaj Gupta
    Tested-by: Pankaj Gupta
    Tested-by: Vaibhav Jain
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

02 May, 2019

1 commit

  • we might want to drop ->destroy_inode() there - it's used only for
    WARN_ON() now, and AFAICS that could be moved to ->evict_inode()
    if we had one...

    Reviewed-by: Jan Kara
    Acked-by: Dan Williams
    Signed-off-by: Al Viro

    Al Viro
     

17 Mar, 2019

1 commit

  • Pull device-dax updates from Dan Williams:
    "New device-dax infrastructure to allow persistent memory and other
    "reserved" / performance differentiated memories, to be assigned to
    the core-mm as "System RAM".

    Some users want to use persistent memory as additional volatile
    memory. They are willing to cope with potential performance
    differences, for example between DRAM and 3D Xpoint, and want to use
    typical Linux memory management apis rather than a userspace memory
    allocator layered over an mmap() of a dax file. The administration
    model is to decide how much Persistent Memory (pmem) to use as System
    RAM, create a device-dax-mode namespace of that size, and then assign
    it to the core-mm. The rationale for device-dax is that it is a
    generic memory-mapping driver that can be layered over any "special
    purpose" memory, not just pmem. On subsequent boots udev rules can be
    used to restore the memory assignment.

    One implication of using pmem as RAM is that mlock() no longer keeps
    data off persistent media. For this reason it is recommended to enable
    NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
    at rest. We considered making this recommendation an actively enforced
    requirement, but in the end decided to leave it as a distribution /
    administrator policy to allow for emulation and test environments that
    lack security capable NVDIMMs.

    Summary:

    - Replace the /sys/class/dax device model with /sys/bus/dax, and
    include a compat driver so distributions can opt-in to the new ABI.

    - Allow for an alternative driver for the device-dax address-range

    - Introduce the 'kmem' driver to hotplug / assign a device-dax
    address-range to the core-mm.

    - Arrange for the device-dax target-node to be onlined so that the
    newly added memory range can be uniquely referenced by numa apis"

    NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
    we currently have special - and very annoying rules in the kernel about
    accessing PMEM only with the "MC safe" accessors, because machine checks
    inside the regular repeat string copy functions can be fatal in some
    (not described) circumstances.

    And apparently the PMEM modules can cause that a lot more than regular
    RAM. The argument is that this happens because PMEM doesn't necessarily
    get scrubbed at boot like RAM does, but that is planned to be added for
    the user space tooling.

    Quoting Dan from another email:
    "The exposure can be reduced in the volatile-RAM case by scanning for
    and clearing errors before it is onlined as RAM. The userspace tooling
    for that can be in place before v5.1-final. There's also runtime
    notifications of errors via acpi_nfit_uc_error_notify() from
    background scrubbers on the DIMM devices. With that mechanism the
    kernel could proactively clear newly discovered poison in the volatile
    case, but that would be additional development more suitable for v5.2.

    I understand the concern, and the need to highlight this issue by
    tapping the brakes on feature development, but I don't see PMEM as RAM
    making the situation worse when the exposure is also there via DAX in
    the PMEM case. Volatile-RAM is arguably a safer use case since it's
    possible to repair pages where the persistent case needs active
    application coordination"

    * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    device-dax: "Hotplug" persistent memory for use like normal RAM
    mm/resource: Let walk_system_ram_range() search child resources
    mm/memory-hotplug: Allow memory resources to be children
    mm/resource: Move HMM pr_debug() deeper into resource code
    mm/resource: Return real error codes from walk failures
    device-dax: Add a 'modalias' attribute to DAX 'bus' devices
    device-dax: Add a 'target_node' attribute
    device-dax: Auto-bind device after successful new_id
    acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
    device-dax: Add /sys/class/dax backwards compatibility
    device-dax: Add support for a dax override driver
    device-dax: Move resource pinning+mapping into the common driver
    device-dax: Introduce bus + driver model
    device-dax: Start defining a dax bus model
    device-dax: Remove multi-resource infrastructure
    device-dax: Kill dax_region base
    device-dax: Kill dax_region ida

    Linus Torvalds
     

21 Feb, 2019

1 commit

  • The checks in __bdev_dax_supported() helped mitigate a potential data
    corruption bug in the pmem driver's handling of section alignment
    padding. Strengthen the checks, including checking the end of the range,
    to validate the dev_pagemap, Xarray entries, and sector-to-pfn
    translation established for pmem namespaces.

    Acked-by: Jan Kara
    Cc: "Darrick J. Wong"
    Signed-off-by: Dan Williams

    Dan Williams
     

07 Jan, 2019

2 commits

  • In support of multiple device-dax instances per device-dax-region and
    allowing the 'kmem' driver to attach to dax-instances instead of the
    current device-node access, convert the dax sub-system from a class to a
    bus. Recall that the kmem driver takes reserved / special purpose
    memories and assigns them to be managed by the core-mm.

    Aside from the fact the device-dax instances are registered and probed
    on a bus, two other lifetime-management changes are made:

    1/ Delay attaching a cdev until driver probe time

    2/ A new run_dax() helper is introduced to allow restoring dax-operation
    after a kill_dax() event. So, at driver ->probe() time we run_dax()
    and at ->remove() time we kill_dax() and invalidate all mappings.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Towards eliminating the dax_class, move the dax-device-attribute
    enabling to a new bus.c file in the core. The amount of code
    thrash of sub-sequent patches is reduced as no logic changes are made,
    just pure code movement.

    A temporary export of unregister_dex_dax() and dax_attribute_groups is
    needed to preserve compilation, but those symbols become static again in
    a follow-on patch.

    Signed-off-by: Dan Williams

    Dan Williams
     

26 Aug, 2018

1 commit

  • Pull libnvdimm updates from Dave Jiang:
    "Collection of misc libnvdimm patches for 4.19 submission:

    - Adding support to read locked nvdimm capacity.

    - Change test code to make DSM failure code injection an override.

    - Add support for calculate maximum contiguous area for namespace.

    - Add support for queueing a short ARS when there is on going ARS for
    nvdimm.

    - Allow NULL to be passed in to ->direct_access() for kaddr and pfn
    params.

    - Improve smart injection support for nvdimm emulation testing.

    - Fix test code that supports for emulating controller temperature.

    - Fix hang on error before devm_memremap_pages()

    - Fix a bug that causes user memory corruption when data returned to
    user for ars_status.

    - Maintainer updates for Ross Zwisler emails and adding Jan Kara to
    fsdax"

    * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm: fix ars_status output length calculation
    device-dax: avoid hang on error before devm_memremap_pages()
    tools/testing/nvdimm: improve emulation of smart injection
    filesystem-dax: Do not request kaddr and pfn when not required
    md/dm-writecache: Don't request pointer dummy_addr when not required
    dax/super: Do not request a pointer kaddr when not required
    tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()
    s390, dcssblk: kaddr and pfn can be NULL to ->direct_access()
    libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access()
    acpi/nfit: queue issuing of ars when an uc error notification comes in
    libnvdimm: Export max available extent
    libnvdimm: Use max contiguous area for namespace size
    MAINTAINERS: Add Jan Kara for filesystem DAX
    MAINTAINERS: update Ross Zwisler's email address
    tools/testing/nvdimm: Fix support for emulating controller temperature
    tools/testing/nvdimm: Make DSM failure code injection an override
    acpi, nfit: Prefer _DSM over _LSR for namespace label reads
    libnvdimm: Introduce locked DIMM capacity support

    Linus Torvalds
     

31 Jul, 2018

1 commit


29 Jun, 2018

1 commit

  • Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This
    is needed for DM configurations where the first element in the dm-linear or
    dm-stripe target supports DAX, but other elements do not. Without this
    check __bdev_dax_supported() will pass for such devices, letting a
    filesystem on that device mount with the DAX option.

    Signed-off-by: Ross Zwisler
    Suggested-by: Mike Snitzer
    Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
    Cc: stable@vger.kernel.org
    Acked-by: Dan Williams
    Reviewed-by: Toshi Kani
    Signed-off-by: Mike Snitzer

    Ross Zwisler
     

09 Jun, 2018

3 commits

  • Pull libnvdimm updates from Dan Williams:
    "This adds a user for the new 'bytes-remaining' updates to
    memcpy_mcsafe() that you already received through Ingo via the
    x86-dax- for-linus pull.

    Not included here, but still targeting this cycle, is support for
    handling memory media errors (poison) consumed via userspace dax
    mappings.

    Summary:

    - DAX broke a fundamental assumption of truncate of file mapped
    pages. The truncate path assumed that it is safe to disconnect a
    pinned page from a file and let the filesystem reclaim the physical
    block. With DAX the page is equivalent to the filesystem block.
    Introduce dax_layout_busy_page() to enable filesystems to wait for
    pinned DAX pages to be released. Without this wait a filesystem
    could allocate blocks under active device-DMA to a new file.

    - DAX arranges for the block layer to be bypassed and uses
    dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
    However, the memcpy_mcsafe() facility is available through the pmem
    block driver. In order to safely handle media errors, via the DAX
    block-layer bypass, introduce copy_to_iter_mcsafe().

    - Fix cache management policy relative to the ACPI NFIT Platform
    Capabilities Structure to properly elide cache flushes when they
    are not necessary. The table indicates whether CPU caches are
    power-fail protected. Clarify that a deep flush is always performed
    on REQ_{FUA,PREFLUSH} requests"

    * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
    dax: Use dax_write_cache* helpers
    libnvdimm, pmem: Do not flush power-fail protected CPU caches
    libnvdimm, pmem: Unconditionally deep flush on *sync
    libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
    acpi, nfit: Remove ecc_unit_size
    dax: dax_insert_mapping_entry always succeeds
    libnvdimm, e820: Register all pmem resources
    libnvdimm: Debug probe times
    linvdimm, pmem: Preserve read-only setting for pmem devices
    x86, nfit_test: Add unit test for memcpy_mcsafe()
    pmem: Switch to copy_to_iter_mcsafe()
    dax: Report bytes remaining in dax_iomap_actor()
    dax: Introduce a ->copy_to_iter dax operation
    uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
    xfs, dax: introduce xfs_break_dax_layouts()
    xfs: prepare xfs_break_layouts() for another layout type
    xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
    mm, fs, dax: handle layout changes to pinned dax mappings
    mm: fix __gup_device_huge vs unmap
    mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
    ...

    Linus Torvalds
     
  • Dan Williams
     
  • Dan Williams
     

07 Jun, 2018

1 commit


31 May, 2018

2 commits

  • The function return values are confusing with the way the function is
    named. We expect a true or false return value but it actually returns
    0/-errno. This makes the code very confusing. Changing the return values
    to return a bool where if DAX is supported then return true and no DAX
    support returns false.

    Signed-off-by: Dave Jiang
    Signed-off-by: Ross Zwisler
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Jiang
     
  • Change bdev_dax_supported so it takes a bdev parameter. This enables
    multi-device filesystems like xfs to check that a dax device can work for
    the particular filesystem. Once that's in place, actually fix all the
    parts of XFS where we need to be able to distinguish between datadev and
    rtdev.

    This patch fixes the problem where we screw up the dax support checking
    in xfs if the datadev and rtdev have different dax capabilities.

    Signed-off-by: Darrick J. Wong
    [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
    Signed-off-by: Ross Zwisler
    Reviewed-by: Eric Sandeen

    Darrick J. Wong
     

23 May, 2018

1 commit

  • Similar to the ->copy_from_iter() operation, a platform may want to
    deploy an architecture or device specific routine for handling reads
    from a dax_device like /dev/pmemX. On x86 this routine will point to a
    machine check safe version of copy_to_iter(). For now, add the plumbing
    to device-mapper and the dax core.

    Cc: Ross Zwisler
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

22 May, 2018

1 commit

  • In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
    be able to rely on the fact that they will get wakeups on dev_pagemap
    page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
    generic_dax_page_free() as common indicator / infrastructure for dax
    filesytems to require. With this change there are no users of the
    MEMORY_DEVICE_HOST designation, so remove it.

    The HMM sub-system extended dev_pagemap to arrange a callback when a
    dev_pagemap managed page is freed. Since a dev_pagemap page is free /
    idle when its reference count is 1 it requires an additional branch to
    check the page-type at put_page() time. Given put_page() is a hot-path
    we do not want to incur that check if HMM is not in use, so a static
    branch is used to avoid that overhead when not necessary.

    Now, the FS_DAX implementation wants to reuse this mechanism for
    receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
    static-key into a generic mechanism that either HMM or FS_DAX code paths
    can enable.

    For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
    care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
    However, we still need to support FS_DAX in the FS_DAX_LIMITED case
    implemented by the s390/dcssblk driver.

    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Michal Hocko
    Reported-by: kbuild test robot
    Reported-by: Thomas Meyer
    Reported-by: Dave Jiang
    Cc: "Jérôme Glisse"
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

31 Mar, 2018

1 commit

  • In preparation for examining the busy state of dax pages in the truncate
    path, switch from sectors to pfns in the radix.

    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     

27 Feb, 2018

1 commit


20 Jan, 2018

1 commit

  • If a dax buffer from a device that does not map pages is passed to
    read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
    gdb attempts to examine the contents of a dax buffer from a device that
    does not map pages it triggers SIGBUS. If fork(2) is called on a process
    with a dax mapping from a device that does not map pages it triggers
    SIGBUS. 'struct page' is required otherwise several kernel code paths
    break in surprising ways. Disable filesystem-dax on devices that do not
    map pages.

    In addition to needing pfn_to_page() to be valid we also require devmap
    pages. We need this to detect dax pages in the get_user_pages_fast()
    path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
    drivers that have not supported get_user_pages() to date we allow them
    to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
    option which requires ->direct_access() to return pfn_t_special() pfns.
    This leaves DAX support in brd disabled and scheduled for removal.

    Note that when the initial dax support was being merged a few years back
    there was concern that struct page was unsuitable for use with next
    generation persistent memory devices. The theoretical concern was that
    struct page access, being such a hotly used data structure in the
    kernel, would lead to media wear out. While that was a reasonable
    conservative starting position it has not held true in practice. We have
    long since committed to using devm_memremap_pages() to support higher
    order kernel functionality that needs get_user_pages() and
    pfn_to_page().

    Cc: Jeff Moyer
    Cc: Ross Zwisler
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Nov, 2017

3 commits

  • Don't crash in case of allocation failure in dax_alloc_inode.

    syzkaller hit the following crash on e4880bc5dfb1

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    [..]
    RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348
    Call Trace:
    alloc_inode+0x65/0x180 fs/inode.c:208
    new_inode_pseudo+0x69/0x190 fs/inode.c:890
    new_inode+0x1c/0x40 fs/inode.c:919
    mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261
    mount_pseudo include/linux/fs.h:2137 [inline]
    dax_mount+0x2e/0x40 drivers/dax/super.c:388
    mount_fs+0x66/0x2d0 fs/super.c:1223

    Cc:
    Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider...")
    Reported-by: syzbot
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Dan Williams

    Mikulas Patocka
     
  • Now that dax_flush() is no longer a driver callback (commit c3ca015fab6d
    "dax: remove the pmem_dax_ops->flush abstraction"), stop requiring the
    dax_read_lock() to be held and the device to be alive. This is in
    preparation for switching filesystem-dax to store pfns instead of
    sectors in the radix.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Before we add another failure reason, quiet the existing log messages.
    Leave it to the caller to decide if bdev_dax_supported() failures are
    errors worth emitting to the log.

    Reported-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Sep, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Some request-based DM core and DM multipath fixes and cleanups

    - Constify a few variables in DM core and DM integrity

    - Add bufio optimization and checksum failure accounting to DM
    integrity

    - Fix DM integrity to avoid checking integrity of failed reads

    - Fix DM integrity to use init_completion

    - A couple DM log-writes target fixes

    - Simplify DAX flushing by eliminating the unnecessary flush
    abstraction that was stood up for DM's use.

    * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dax: remove the pmem_dax_ops->flush abstraction
    dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
    dm integrity: make blk_integrity_profile structure const
    dm integrity: do not check integrity for failed read operations
    dm log writes: fix >512b sectorsize support
    dm log writes: don't use all the cpu while waiting to log blocks
    dm ioctl: constify ioctl lookup table
    dm: constify argument arrays
    dm integrity: count and display checksum failures
    dm integrity: optimize writing dm-bufio buffers that are partially changed
    dm rq: do not update rq partially in each ending bio
    dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
    dm mpath: complain about unsupported __multipath_map_bio() return values
    dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through

    Linus Torvalds
     

11 Sep, 2017

1 commit

  • Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
    buggy. A DM device may be composed of multiple underlying devices and
    all of them need to be flushed. That commit just routes the flush
    request to the first device and ignores the other devices.

    It could be fixed by adding more complex logic to the device mapper. But
    there is only one implementation of the method pmem_dax_ops->flush - that
    is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
    don't need the pmem_dax_ops->flush abstraction at all, we can call
    arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
    can't ever reach anything different from arch_wb_cache_pmem().

    It should be also pointed out that for some uses of persistent memory it
    is needed to flush only a very small amount of data (such as 1 cacheline),
    and it would be overkill if we go through that device mapper machinery for
    a single flushed cache line.

    Fix this by removing the pmem_dax_ops->flush abstraction and call
    arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
    mapper code that forwards the flushes.

    Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Dan Williams
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

04 Sep, 2017

1 commit

  • The 0day kbuild robot reports:

    >> drivers//dax/super.c:64:20: error: redefinition of 'fs_dax_get_by_bdev'
    struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
    ^~~~~~~~~~~~~~~~~~
    In file included from drivers//dax/super.c:22:0:
    include/linux/dax.h:76:34: note: previous definition of 'fs_dax_get_by_bdev' was here
    static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
    ^~~~~~~~~~~~~~~~~~

    Protect the definition of fs_dax_get_by_bdev() in drivers/dax/super.c
    with an ifdef.

    Fixes: 78f354735081 ("dax: introduce a fs_dax_get_by_bdev() helper")
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Reviewed-by: Jan Kara
    Reported-by: kbuild test robot
    Signed-off-by: Dan Williams

    Dan Williams
     

31 Aug, 2017

1 commit

  • Add a helper that can replace the following common pattern:

    if (blk_queue_dax(bdev->bd_queue))
    fs_dax_get_by_host(bdev->bd_disk->disk_name);

    This will be used to move dax_device lookup from iomap-operation time to
    fs-mount time.

    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Dan Williams

    Dan Williams
     

27 Jul, 2017

1 commit

  • Currently dm_dax_flush() is not being called, even if underlying dax
    device supports write cache, because DAXDEV_WRITE_CACHE is not being
    propagated up to the DM dax device.

    If the underlying dax device supports write cache, set
    DAXDEV_WRITE_CACHE on the DM dax device. This will cause dm_dax_flush()
    to be called.

    Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support")
    Signed-off-by: Vivek Goyal
    Acked-by: Dan Williams
    Signed-off-by: Mike Snitzer

    Vivek Goyal
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes new 'struct dax_operations' enabling to
    undo the abuse of copy_user_nocache() for copy operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

30 Jun, 2017

2 commits

  • The dax_flush() operation can be turned into a nop on platforms where
    firmware arranges for cpu caches to be flushed on a power-fail event.
    The ACPI 6.2 specification defines a mechanism for the platform to
    indicate this capability so the kernel can select the proper default.
    However, for other platforms, the administrator must toggle this setting
    manually.

    Given this flush setting is a dax-specific mechanism we advertise it
    through a 'dax' attribute group hanging off a host device. For example,
    a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
    'write_cache' attribute to control response to dax cache flush requests.
    This is similar to the 'queue/write_cache' attribute that appears under
    block devices.

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Suggested-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • In preparation for adding more flags, convert the existing flag to a
    bit-flag.

    Signed-off-by: Dan Williams

    Dan Williams
     

28 Jun, 2017

1 commit

  • Require all dax-drivers to register a ->copy_from_iter() operation so
    that it is clear which dax_operations are optional and which must be
    implemented for filesystem-dax to operate.

    Cc: Gerald Schaefer
    Suggested-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2017

1 commit

  • Allow device-mapper to route flush operations to the
    per-target implementation. In order for the device stacking to work we
    need a dax_dev and a pgoff relative to that device. This gives each
    layer of the stack the information it needs to look up the operation
    pointer for the next level.

    This conceptually allows for an array of mixed device drivers with
    varying flush implementations.

    Reviewed-by: Toshi Kani
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams