14 Oct, 2020

1 commit

  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child', are all unused, wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.
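
    A minimal sketch of the conversion (illustrative only; the wrapper
    function is hypothetical, while the field names follow the description
    above):

        static void report_failure(struct dev_pagemap *pgmap)
        {
                /* the span now lives in a lean 'struct range', not a resource */
                struct range *range = &pgmap->range;

                /* open-coded replacement for the old "%pR" print */
                pr_warn("failed to add memory [mem %#llx-%#llx]\n",
                        range->start, range->end);
        }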

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

06 Oct, 2020

1 commit

  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface: specifically,
    which addresses are valid to pass as source and destination, and which
    faults / exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace the single top-level memcpy_mcsafe() with either
    copy_mc_to_user() or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    into its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.
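
    A rough sketch of the resulting interface split (prototypes paraphrased
    from the description above; the read_pmem() helper is illustrative):

        /* poison-tolerant copies, named by destination address space */
        unsigned long copy_mc_to_kernel(void *dst, const void *src, unsigned len);
        unsigned long copy_mc_to_user(void __user *dst, const void *src, unsigned len);

        /* e.g. a pmem read path: kernel-to-kernel copy that tolerates poison */
        static int read_pmem(void *buf, void *pmem_addr, unsigned len)
        {
                unsigned long rem = copy_mc_to_kernel(buf, pmem_addr, len);

                return rem ? -EIO : 0;  /* 'rem' bytes were left uncopied */
        }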

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

15 Nov, 2019

1 commit

  • The nvdimm core currently maps the full namespace to an ioremap range
    while probing the namespace mode. This can result in probe failures on
    architectures that have limited ioremap space.

    For example, with a large btt namespace that consumes most of the I/O
    remap range, and depending on the sequence of namespace initialization,
    the user can hit a pfn namespace initialization failure because the I/O
    remap space that the nvdimm core uses for temporary mappings is
    unavailable.

    The nvdimm core can avoid this failure by mapping only the reserved info
    block area to check for the pfn superblock type, and mapping the full
    namespace resource only just before the namespace is put to use.

    Given that personalities like BTT can be layered on top of any namespace
    type, create a generic form of devm_nsio_enable() (devm_namespace_enable())
    and use it inside the per-personality attach routines.
    devm_namespace_enable() is now always paired with devm_namespace_disable()
    unless the mapping is going to be used for long-term runtime access.
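
    A minimal sketch of the resulting probe flow (simplified; the
    probe_personality() wrapper and the info_block_reserve() size helper are
    illustrative stand-ins, and the real call sites live in the
    per-personality attach helpers):

        static int probe_personality(struct device *dev,
                        struct nd_namespace_common *ndns)
        {
                int rc;

                /* map only the reserved info block area, not the whole namespace */
                rc = devm_namespace_enable(dev, ndns, info_block_reserve());
                if (rc)
                        return rc;
                rc = nd_pfn_probe(dev, ndns);           /* inspect the superblock */
                devm_namespace_disable(dev, ndns);      /* drop the temporary mapping */

                /* the personality that attaches re-enables the full span later */
                return rc;
        }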

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
    [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
    Reported-by: kbuild test robot
    Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

06 Jul, 2019

1 commit

  • This patch adds functionality to perform a flush from guest to host
    over VIRTIO. A flush callback is registered based on the 'nd_region'
    type: the virtio_pmem driver requires this special flush function, while
    the rest of the region types register the existing flush function. Any
    error returned by a host fsync failure is reported to userspace.
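
    A sketch of the dispatch this sets up (simplified; the callback names
    follow the virtio_pmem driver):

        /* virtio_pmem registers its own region flush callback ... */
        ndr_desc.flush = async_pmem_flush;

        /* ... and the region-level flush dispatches on it */
        if (nd_region->flush)
                rc = nd_region->flush(nd_region, bio);  /* guest -> host flush */
        else
                rc = generic_nvdimm_flush(nd_region);   /* existing flush path */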

    Signed-off-by: Pankaj Gupta
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).
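
    In each affected file the paragraph above is replaced by a single tag,
    for example:

        // SPDX-License-Identifier: GPL-2.0-only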

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Jun, 2018

1 commit

  • Commit 60622d68227d "x86/asm/memcpy_mcsafe: Return bytes remaining"
    converted callers of memcpy_mcsafe() to expect a positive 'bytes
    remaining' value rather than a negative error code. The nsio_rw_bytes()
    conversion failed to return success. The failure is benign in that
    nsio_rw_bytes() will end up writing back what it just read.
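
    An illustrative shape of the corrected read path (not the literal diff;
    local names are assumed):

        if (rw == READ) {
                if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
                        return -EIO;
                return 0;       /* the previously missing success return */
        }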

    Fixes: 60622d68227d ("x86/asm/memcpy_mcsafe: Return bytes remaining")
    Cc: Dan Williams
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

15 May, 2018

1 commit

  • Machine check safe memory copies are currently deployed in the pmem
    driver whenever reading from persistent memory media, so that -EIO is
    returned rather than triggering a kernel panic. While this protects most
    pmem accesses, it is not complete in the filesystem-dax case. When
    filesystem-dax is enabled, reads may bypass the block layer and the
    driver via dax_iomap_actor() and its usage of copy_to_iter().

    In preparation for creating a copy_to_iter() variant that can handle
    machine checks, teach memcpy_mcsafe() to return the number of bytes
    remaining rather than -EFAULT when an exception occurs.
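
    The change to the contract, sketched (the old prototype is shown as a
    comment for contrast):

        /* old contract:
         *     int memcpy_mcsafe(void *dst, const void *src, size_t cnt);
         * returned 0 on success or -EFAULT on any exception.
         *
         * new contract: return the number of bytes remaining, so a
         * copy_to_iter()-style caller can account for a short copy,
         * e.g. copied = bytes - memcpy_mcsafe(to, from, bytes);
         */
        __must_check unsigned long memcpy_mcsafe(void *dst, const void *src, size_t cnt);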

    Co-developed-by: Tony Luck
    Signed-off-by: Dan Williams
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: hch@lst.de
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     

07 Mar, 2018

1 commit

  • Dynamic debug can be instructed to add the function name to the debug
    output using the +f switch, so there is no need for the libnvdimm
    modules to do it again. If a user decides to add the +f switch for
    libnvdimm's dynamic debug this results in double prints of the function
    name.

    Reported-by: Johannes Thumshirn
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

01 Sep, 2017

2 commits

  • Clearing errors or badblocks during a BTT write requires sending an ACPI
    DSM, which means potentially sleeping. Since BTT I/O happens in atomic
    context (preemption disabled, spinlocks may be held), we cannot perform
    error clearing in the course of an I/O. Due to this, error clearing for
    BTT I/Os has hitherto been disabled.

    In this patch we move error clearing out of the atomic section, and thus
    re-enable error clearing with BTTs. When we are about to add a block to
    the free list, we check if it was previously marked as an error, and if
    it was, we add it to the freelist but also set a flag that says error
    clearing will be required. We then drop the lane (ending the atomic
    context), and send a zero buffer so that the error can be cleared. The
    error flag in the free list is protected by the nd 'lane', and is set
    only by a thread while it holds that lane. When the error is cleared,
    the flag is cleared, but only while holding a mutex for that freelist
    index.

    When writing, we check for two things -
    1/ If the freelist mutex is held or if the error flag is set. If so,
    this is an error block that is being (or about to be) cleared.
    2/ If the block is a known badblock based on nsio->bb

    The second check is required because the BTT map error flag for a map
    entry only gets set when an error LBA is read. If we write to a new
    location that may not have the map error flag set, but still might be in
    the region's badblock list, we can trigger an EIO on the write, which is
    undesirable and completely avoidable.
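
    Simplified pseudocode of that flow (helper and field names are
    illustrative):

        /* while holding the lane (atomic context) */
        if (ent_e_flag(old_map_entry))
                arena->freelist[lane].has_err = 1;      /* defer the clear */
        nd_region_release_lane(nd_region, lane);        /* atomic section ends */

        /* process context: send a zero buffer so the poison is cleared */
        if (arena->freelist[lane].has_err)
                arena_clear_freelist_error(arena, lane);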

    Cc: Jeff Moyer
    Cc: Toshi Kani
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The IO context conversion for rw_bytes missed a case in the BTT write
    path (btt_map_write) which should have been marked as atomic.

    In reality this should not cause a problem, because map writes are too
    small for nsio_rw_bytes() to attempt error clearing, but it should be
    fixed for posterity.

    Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
    things like the nfit unit tests, which don't actually sleep, can catch
    bugs like this.
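
    For reference, a sketch of where the check lands (placement paraphrased):

        /* nsio_rw_bytes(): only the process-context path may clear errors */
        if (!(flags & NVDIMM_IO_ATOMIC)) {
                might_sleep();  /* catches atomic callers in the unit tests */
                /* ... error clearing via the DSM path ... */
        }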

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

01 Jul, 2017

1 commit

  • A leftover from the 'bandaid' fix that disabled BTT error clearing in
    rw_bytes resulted in an incorrect check. After we converted these checks
    over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
    redundant and incorrect. Remove it.

    Fixes: 3ae3d67ba705 ("libnvdimm: add an atomic vs process context flag to rw_bytes")
    Cc: Dave Jiang
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

30 Jun, 2017

1 commit

  • The UEFI 2.7 specification defines an updated BTT metadata format,
    bumping the revision to 2.0. Add support for the new format, while
    retaining compatibility for the old 1.1 format.

    Cc: Toshi Kani
    Cc: Linda Knippers
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

28 Jun, 2017

2 commits

  • Now that all callers of the pmem api have been converted to dax helpers that
    call back to the pmem driver, we can remove include/linux/pmem.h and
    asm/pmem.h.

    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: Oliver O'Halloran
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Kill this globally defined wrapper and move to libnvdimm so that we can
    ultimately remove include/linux/pmem.h and asm/pmem.h.

    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2017

1 commit

  • Starting with v1.2 labels, 'address abstractions' can be hinted via an
    address abstraction id that implies an info-block format. The standard
    address abstraction in the specification is the v2 format of the
    Block-Translation-Table (BTT). Support for that is saved for a later
    patch; for now we add support for the Linux-supported address
    abstractions BTT (v1), PFN, and DAX.

    The new 'holder_class' attribute for namespace devices is added for
    tooling to specify the 'abstraction_guid' to store in the namespace label.
    For v1.1 labels this field is undefined and any setting of
    'holder_class' away from the default 'none' value will only have effect
    until the driver is unloaded. Setting 'holder_class' requires that
    whatever device tries to claim the namespace must be of the specified
    class.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

10 Jun, 2017

1 commit

  • The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".

    Implement __copy_from_user_inatomic_flushcache(), memcpy_page_flushcache(),
    and memcpy_flushcache(), which guarantee that the destination buffer is not
    dirty in the cpu cache on completion. The new copy_from_iter_flushcache()
    and sub-routines will be used to replace the "pmem api"
    (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of
    copy_from_iter_flushcache() and memcpy_flushcache() is gated by the
    CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, with a fallback to
    copy_from_iter_nocache() and plain memcpy() otherwise.

    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].

    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
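
    A simplified illustration using the helper names above (the write_pmem()
    wrapper is hypothetical; fallbacks are paraphrased from the description):

        static void write_pmem(void *pmem_addr, const void *src, size_t len)
        {
        #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
                memcpy_flushcache(pmem_addr, src, len); /* non-temporal stores */
        #else
                memcpy(pmem_addr, src, len);            /* plain fallback */
        #endif
        }

        /* and the dax path copies from an iov_iter with the same semantics:
         *     copied = copy_from_iter_flushcache(pmem_addr, len, iter);
         */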

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

11 May, 2017

1 commit

  • nsio_rw_bytes can clear media errors, but this cannot be done while we
    are in an atomic context due to locking within ACPI. From the BTT,
    ->rw_bytes may be called either from atomic or process context depending
    on whether the calls happen during initialization or during IO.

    During init, we want to ensure error clearing happens, and the flag
    marking process context allows nsio_rw_bytes to do that. When called
    during IO, we're in atomic context, and error clearing can be skipped.
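
    A sketch of how the flag is consumed (the flag name is from this patch;
    the surrounding code and the 'bad_pmem' local are simplified):

        if (unlikely(bad_pmem) && !(flags & NVDIMM_IO_ATOMIC)) {
                /* process context (e.g. BTT init): clearing may sleep */
                nvdimm_clear_poison(&ndns->dev, nsio->res.start + offset, size);
        } else if (unlikely(bad_pmem)) {
                return -EIO;    /* atomic context: skip clearing */
        }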

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

02 May, 2017

1 commit

  • This continues the 4.11 status quo of disabling error clearing from
    the BTT I/O path. Toshi found that even though we have eliminated all
    the libnvdimm sources of sleeping-while-atomic triggers, we still have
    sleeping operations that will occur in the path to send the ACPI DSM to
    the DIMM to clear the error:

    BUG: sleeping function called from invalid context at mm/slab.h:432
    in_atomic(): 1, irqs_disabled(): 0, pid: 13353, name: dd
    Call Trace:
    dump_stack+0x86/0xc3
    ___might_sleep+0x17d/0x250
    __might_sleep+0x4a/0x80
    __kmalloc+0x1c0/0x2e0
    acpi_os_allocate_zeroed+0x2d/0x2f
    acpi_evaluate_object+0x59/0x3b1
    acpi_evaluate_dsm+0xbd/0x10c
    acpi_nfit_ctl+0x1ef/0x7c0 [nfit]
    ? nsio_rw_bytes+0x152/0x280
    nvdimm_clear_poison+0x77/0x140
    nsio_rw_bytes+0x18f/0x280
    btt_write_pg+0x1d4/0x3d0 [nd_btt]
    btt_make_request+0x119/0x2d0 [nd_btt]

    A solution for tracking and handling media errors natively in the BTT is
    needed.

    Cc: Jeff Moyer
    Cc: Dave Jiang
    Cc: Vishal Verma
    Reported-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     

01 May, 2017

1 commit

  • A debug patch to turn the standard device_lock() into something that
    lockdep can analyze yielded the following:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.11.0-rc4+ #106 Tainted: G O
    -------------------------------------------------------
    lt-libndctl/1898 is trying to acquire lock:
    (&dev->nvdimm_mutex/3){+.+.+.}, at: [] nd_attach_ndns+0x178/0x1b0 [libnvdimm]

    but task is already holding lock:
    (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [] nvdimm_bus_lock+0x21/0x30 [libnvdimm]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
    nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
    nd_pmem_probe+0x14/0x180 [nd_pmem]
    nvdimm_bus_probe+0xa9/0x260 [libnvdimm]

    -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
    __lock_acquire+0x1107/0x1280
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nd_attach_ndns+0x178/0x1b0 [libnvdimm]
    nd_namespace_store+0x308/0x3c0 [libnvdimm]
    namespace_store+0x87/0x220 [libnvdimm]

    In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.

    Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
    nd_{attach,detach}_ndns() operations.
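
    The shape of the fix, sketched as before/after on the attach path
    (illustrative only):

        /* before: takes &dev->mutex while reconfig_mutex is already held */
        device_lock(&ndns->dev);

        /* after: rely on the bus-level lock instead */
        nvdimm_bus_lock(&ndns->dev);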

    Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices")
    Reported-by: Yi Zhang
    Signed-off-by: Dan Williams

    Dan Williams
     

26 Apr, 2017

1 commit

  • memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
    serves no real benefit aside from affording a more generic function name
    than the x86-specific 'mcsafe'. However, this would not be the first time
    that x86 terminology leaked into the global namespace. For lack of a
    better name, just use memcpy_mcsafe() directly.

    This conversion also catches a place where we should have been using
    plain memcpy(): acpi_nfit_blk_single_io().
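
    The wrapper being removed was essentially (paraphrased):

        static inline int memcpy_from_pmem(void *dst, void const *src, size_t size)
        {
                return memcpy_mcsafe(dst, src, size);
        }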

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Acked-by: Tony Luck
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Apr, 2017

1 commit

  • This reverts commit 4aa5615e080a "libnvdimm: band aid btt vs clear
    poison locking".

    Now that poison list locking has been converted to a spinlock and poison
    list entry allocation during i/o has been converted to GFP_NOWAIT,
    revert the band-aid that disabled error clearing from btt i/o.

    Cc: Vishal Verma
    Cc: Dave Jiang
    Signed-off-by: Dan Williams

    Dan Williams
     

11 Apr, 2017

1 commit

  • The following warning results from holding a lane spinlock,
    preempt_disable(), or the btt map spinlock and then trying to take the
    reconfig_mutex to walk the poison list and potentially add new entries.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
    in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
    [..]
    Call Trace:
    dump_stack+0x85/0xc8
    ___might_sleep+0x184/0x250
    __might_sleep+0x4a/0x90
    __mutex_lock+0x58/0x9b0
    ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
    ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
    ? _raw_spin_unlock+0x27/0x40
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_forget_poison+0x25/0x50 [libnvdimm]
    nvdimm_clear_poison+0x106/0x140 [libnvdimm]
    nsio_rw_bytes+0x164/0x270 [libnvdimm]
    btt_write_pg+0x1de/0x3e0 [nd_btt]
    ? blk_queue_enter+0x30/0x290
    btt_make_request+0x11a/0x310 [nd_btt]
    ? blk_queue_enter+0xb7/0x290
    ? blk_queue_enter+0x30/0x290
    generic_make_request+0x118/0x3b0

    As a minimal fix, disable error clearing when the BTT is enabled for the
    namespace. For the final fix a larger rework of the poison list locking
    is needed.

    Note that this is not a problem in the blk case since that path never
    calls nvdimm_clear_poison().

    Fixes: 82bf1037f2ca ("libnvdimm: check and clear poison before writing to pmem")
    Cc: Dave Jiang
    [jeff: dynamically disable error clearing in the btt case]
    Suggested-by: Jeff Moyer
    Reviewed-by: Jeff Moyer
    Reported-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

17 Dec, 2016

1 commit

  • Colin, via static analysis, reports that the length could be negative
    from nvdimm_clear_poison() in the error case. There was a similar
    problem with commit 0a3f27b9a6a8 "libnvdimm, namespace: avoid multiple
    sector calculations" that I noticed when merging the for-4.10/libnvdimm
    topic branch into libnvdimm-for-next, but I missed this one. Fix both of
    them to use the following procedure:

    * if we clear a block's worth of media, clear that many blocks in
      badblocks

    * if we clear less than the requested size of the transfer, return an
      error

    * always invalidate the cache after any non-error / non-zero
      nvdimm_clear_poison result
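
    A simplified sketch of that procedure (argument order and helper names
    are illustrative, not the exact diff):

        int rc = 0;
        sector_t sector = offset >> 9;
        long cleared = nvdimm_clear_poison(&ndns->dev,
                        nsio->res.start + offset, size);

        if (cleared > 0 && cleared / 512)       /* whole blocks cleared */
                badblocks_clear(&nsio->bb, sector, cleared / 512);
        if (cleared != size)                    /* short clear */
                rc = -EIO;
        if (cleared > 0)                        /* always invalidate the cache */
                invalidate_pmem(nsio->addr + offset, size);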

    Fixes: 82bf1037f2ca ("libnvdimm: check and clear poison before writing to pmem")
    Fixes: 0a3f27b9a6a8 ("libnvdimm, namespace: avoid multiple sector calculations")
    Cc: Fabian Frederick
    Cc: Dave Jiang
    Reported-by: Colin Ian King
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Nov, 2016

1 commit

  • Here is an example /proc/iomem listing for a system with 2 namespaces,
    one in "sector" mode and one in "memory" mode:

    1fc000000-2fbffffff : Persistent Memory (legacy)
      1fc000000-2fbffffff : namespace1.0
    340000000-34fffffff : Persistent Memory
      340000000-34fffffff : btt0.1

    Here is the corresponding ndctl listing:

    # ndctl list
    [
      {
        "dev":"namespace1.0",
        "mode":"memory",
        "size":4294967296,
        "blockdev":"pmem1"
      },
      {
        "dev":"namespace0.0",
        "mode":"sector",
        "size":267091968,
        "uuid":"f7594f86-badb-4592-875f-ded577da2eaf",
        "sector_size":4096,
        "blockdev":"pmem0s"
      }
    ]

    Notice that the ndctl listing is purely in terms of namespace devices,
    while the iomem listing leaks the internal "btt0.1" implementation
    detail. Given that ndctl requires the namespace device name to change
    the mode, for example:

    # ndctl create-namespace --reconfig=namespace0.0 --mode=raw --force

    ...use the namespace name in the iomem listing to keep the claiming
    device name consistent across different mode settings.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

18 Jun, 2016

1 commit

  • Prompted by commit 287980e49ffc "remove lots of IS_ERR_VALUE abuses", I
    ran make coccicheck against drivers/nvdimm/ and found that:

    if (IS_ERR(x))
            return PTR_ERR(x);
    return 0;

    ...can be replaced with PTR_ERR_OR_ZERO().
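
    i.e. the three lines above collapse to:

        return PTR_ERR_OR_ZERO(x);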

    Reported-by: Linus Torvalds
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

22 May, 2016

1 commit

  • The ndctl unit tests discovered that the dax enabling omitted updates to
    nd_detach_and_reset(). This routine clears the device configuration
    when the namespace is detached. Without this clearing, userspace may
    assume that the device is in the process of being configured by another
    agent in the system.

    Signed-off-by: Dan Williams

    Dan Williams
     

10 May, 2016

1 commit

  • Device DAX is the device-centric analogue of Filesystem DAX
    (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and
    mapped without need of an intervening file system. This initial
    infrastructure arranges for a libnvdimm pfn-device to be represented as
    a different device-type so that it can be attached to a driver other
    than the pmem driver.

    Signed-off-by: Dan Williams

    Dan Williams
     

23 Apr, 2016

1 commit

  • In preparation for providing an alternative (to block device) access
    mechanism to persistent memory, convert pmem_rw_bytes() to
    nsio_rw_bytes(). This allows ->rw_bytes() functionality without
    requiring a 'struct pmem_device' to be instantiated.

    In other words, when ->rw_bytes() is in use i/o is driven through
    'struct nd_namespace_io', otherwise it is driven through 'struct
    pmem_device' and the block layer. This consolidates the disjoint calls
    to devm_exit_badblocks() and devm_memunmap() into a common
    devm_nsio_disable() and cleans up the init path to use a unified
    pmem_attach_disk() implementation.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Aug, 2015

1 commit

  • Implement the base infrastructure for libnvdimm PFN devices. Similar to
    BTT devices they take a namespace as a backing device and layer
    functionality on top. In this case the functionality is reserving space
    for an array of 'struct page' entries to be handed out through
    pfn_to_page(). For now this is just the basic libnvdimm-device-model for
    configuring the base PFN device.

    As the namespace claiming mechanism for PFN devices is mostly identical
    to that of BTT devices, drivers/nvdimm/claim.c is created to house the
    common bits.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams