14 Oct, 2020

1 commit

  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child', are all unused, wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.
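
    A minimal sketch of the conversion (illustrative only; the wrapper
    function is hypothetical, while the field names follow the description
    above):

        static void report_failure(struct dev_pagemap *pgmap)
        {
                /* the span now lives in a lean 'struct range', not a resource */
                struct range *range = &pgmap->range;

                /* open-coded replacement for the old "%pR" print */
                pr_warn("failed to add memory [mem %#llx-%#llx]\n",
                        range->start, range->end);
        }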

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

06 Oct, 2020

1 commit

  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface: specifically,
    which addresses are valid to pass as source and destination, and which
    faults / exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace the single top-level memcpy_mcsafe() with either
    copy_mc_to_user() or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    into its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.
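
    A rough sketch of the resulting interface split (prototypes paraphrased
    from the description above; the read_pmem() helper is illustrative):

        /* poison-tolerant copies, named by destination address space */
        unsigned long copy_mc_to_kernel(void *dst, const void *src, unsigned len);
        unsigned long copy_mc_to_user(void __user *dst, const void *src, unsigned len);

        /* e.g. a pmem read path: kernel-to-kernel copy that tolerates poison */
        static int read_pmem(void *buf, void *pmem_addr, unsigned len)
        {
                unsigned long rem = copy_mc_to_kernel(buf, pmem_addr, len);

                return rem ? -EIO : 0;  /* 'rem' bytes were left uncopied */
        }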

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

15 Nov, 2019

1 commit

  • The nvdimm core currently maps the full namespace to an ioremap range
    while probing the namespace mode. This can result in probe failures on
    architectures that have limited ioremap space.

    For example, with a large btt namespace that consumes most of the I/O
    remap range, and depending on the sequence of namespace initialization,
    the user can hit a pfn namespace initialization failure because the I/O
    remap space that the nvdimm core uses for temporary mappings is
    unavailable.

    The nvdimm core can avoid this failure by mapping only the reserved info
    block area to check for the pfn superblock type, and mapping the full
    namespace resource only just before the namespace is put to use.

    Given that personalities like BTT can be layered on top of any namespace
    type, create a generic form of devm_nsio_enable() (devm_namespace_enable())
    and use it inside the per-personality attach routines.
    devm_namespace_enable() is now always paired with devm_namespace_disable()
    unless the mapping is going to be used for long-term runtime access.
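
    A minimal sketch of the resulting probe flow (simplified; the
    probe_personality() wrapper and the info_block_reserve() size helper are
    illustrative stand-ins, and the real call sites live in the
    per-personality attach helpers):

        static int probe_personality(struct device *dev,
                        struct nd_namespace_common *ndns)
        {
                int rc;

                /* map only the reserved info block area, not the whole namespace */
                rc = devm_namespace_enable(dev, ndns, info_block_reserve());
                if (rc)
                        return rc;
                rc = nd_pfn_probe(dev, ndns);           /* inspect the superblock */
                devm_namespace_disable(dev, ndns);      /* drop the temporary mapping */

                /* the personality that attaches re-enables the full span later */
                return rc;
        }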

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
    [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
    Reported-by: kbuild test robot
    Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

06 Jul, 2019

1 commit

  • This patch adds functionality to perform a flush from guest to host
    over VIRTIO. A flush callback is registered based on the 'nd_region'
    type: the virtio_pmem driver requires this special flush function, while
    the rest of the region types register the existing flush function. Any
    error returned by a host fsync failure is reported to userspace.
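
    A sketch of the dispatch this sets up (simplified; the callback names
    follow the virtio_pmem driver):

        /* virtio_pmem registers its own region flush callback ... */
        ndr_desc.flush = async_pmem_flush;

        /* ... and the region-level flush dispatches on it */
        if (nd_region->flush)
                rc = nd_region->flush(nd_region, bio);  /* guest -> host flush */
        else
                rc = generic_nvdimm_flush(nd_region);   /* existing flush path */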

    Signed-off-by: Pankaj Gupta
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).
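
    In each affected file the paragraph above is replaced by a single tag,
    for example:

        // SPDX-License-Identifier: GPL-2.0-only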

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Jun, 2018

1 commit

  • Commit 60622d68227d "x86/asm/memcpy_mcsafe: Return bytes remaining"
    converted callers of memcpy_mcsafe() to expect a positive 'bytes
    remaining' value rather than a negative error code. The nsio_rw_bytes()
    conversion failed to return success. The failure is benign in that
    nsio_rw_bytes() will end up writing back what it just read.
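
    An illustrative shape of the corrected read path (not the literal diff;
    local names are assumed):

        if (rw == READ) {
                if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
                        return -EIO;
                return 0;       /* the previously missing success return */
        }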

    Fixes: 60622d68227d ("x86/asm/memcpy_mcsafe: Return bytes remaining")
    Cc: Dan Williams
    Reviewed-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

15 May, 2018

1 commit

  • Machine check safe memory copies are currently deployed in the pmem
    driver whenever reading from persistent memory media, so that -EIO is
    returned rather than triggering a kernel panic. While this protects most
    pmem accesses, it is not complete in the filesystem-dax case. When
    filesystem-dax is enabled, reads may bypass the block layer and the
    driver via dax_iomap_actor() and its usage of copy_to_iter().

    In preparation for creating a copy_to_iter() variant that can handle
    machine checks, teach memcpy_mcsafe() to return the number of bytes
    remaining rather than -EFAULT when an exception occurs.
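
    The change to the contract, sketched (the old prototype is shown as a
    comment for contrast):

        /* old contract:
         *     int memcpy_mcsafe(void *dst, const void *src, size_t cnt);
         * returned 0 on success or -EFAULT on any exception.
         *
         * new contract: return the number of bytes remaining, so a
         * copy_to_iter()-style caller can account for a short copy,
         * e.g. copied = bytes - memcpy_mcsafe(to, from, bytes);
         */
        __must_check unsigned long memcpy_mcsafe(void *dst, const void *src, size_t cnt);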

    Co-developed-by: Tony Luck
    Signed-off-by: Dan Williams
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: hch@lst.de
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     

07 Mar, 2018

1 commit

  • Dynamic debug can be instructed to add the function name to the debug
    output using the +f switch, so there is no need for the libnvdimm
    modules to do it again. If a user decides to add the +f switch for
    libnvdimm's dynamic debug this results in double prints of the function
    name.

    Reported-by: Johannes Thumshirn
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

01 Sep, 2017

2 commits

  • Clearing errors or badblocks during a BTT write requires sending an ACPI
    DSM, which means potentially sleeping. Since BTT I/O happens in atomic
    context (preemption disabled, spinlocks may be held), we cannot perform
    error clearing in the course of an I/O. Due to this, error clearing for
    BTT I/Os has hitherto been disabled.

    In this patch we move error clearing out of the atomic section, and thus
    re-enable error clearing with BTTs. When we are about to add a block to
    the free list, we check if it was previously marked as an error, and if
    it was, we add it to the freelist but also set a flag that says error
    clearing will be required. We then drop the lane (ending the atomic
    context), and send a zero buffer so that the error can be cleared. The
    error flag in the free list is protected by the nd 'lane', and is set
    only by a thread while it holds that lane. When the error is cleared,
    the flag is cleared, but only while holding a mutex for that freelist
    index.

    When writing, we check for two things -
    1/ If the freelist mutex is held or if the error flag is set. If so,
    this is an error block that is being (or about to be) cleared.
    2/ If the block is a known badblock based on nsio->bb

    The second check is required because the BTT map error flag for a map
    entry only gets set when an error LBA is read. If we write to a new
    location that may not have the map error flag set, but still might be in
    the region's badblock list, we can trigger an EIO on the write, which is
    undesirable and completely avoidable.
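
    Simplified pseudocode of that flow (helper and field names are
    illustrative):

        /* while holding the lane (atomic context) */
        if (ent_e_flag(old_map_entry))
                arena->freelist[lane].has_err = 1;      /* defer the clear */
        nd_region_release_lane(nd_region, lane);        /* atomic section ends */

        /* process context: send a zero buffer so the poison is cleared */
        if (arena->freelist[lane].has_err)
                arena_clear_freelist_error(arena, lane);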

    Cc: Jeff Moyer
    Cc: Toshi Kani
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • The IO context conversion for rw_bytes missed a case in the BTT write
    path (btt_map_write) which should have been marked as atomic.

    In reality this should not cause a problem, because map writes are too
    small for nsio_rw_bytes() to attempt error clearing, but it should be
    fixed for posterity.

    Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
    things like the nfit unit tests, which don't actually sleep, can catch
    bugs like this.
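
    For reference, a sketch of where the check lands (placement paraphrased):

        /* nsio_rw_bytes(): only the process-context path may clear errors */
        if (!(flags & NVDIMM_IO_ATOMIC)) {
                might_sleep();  /* catches atomic callers in the unit tests */
                /* ... error clearing via the DSM path ... */
        }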

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

01 Jul, 2017

1 commit

  • A leftover from the 'bandaid' fix that disabled BTT error clearing in
    rw_bytes resulted in an incorrect check. After we converted these checks
    over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
    redundant and incorrect. Remove it.

    Fixes: 3ae3d67ba705 ("libnvdimm: add an atomic vs process context flag to rw_bytes")
    Cc: Dave Jiang
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

30 Jun, 2017

1 commit

  • The UEFI 2.7 specification defines an updated BTT metadata format,
    bumping the revision to 2.0. Add support for the new format, while
    retaining compatibility for the old 1.1 format.

    Cc: Toshi Kani
    Cc: Linda Knippers
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

28 Jun, 2017

2 commits

  • Now that all callers of the pmem api have been converted to dax helpers that
    call back to the pmem driver, we can remove include/linux/pmem.h and
    asm/pmem.h.

    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: Oliver O'Halloran
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Kill this globally defined wrapper and move to libnvdimm so that we can
    ultimately remove include/linux/pmem.h and asm/pmem.h.

    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Jun, 2017

1 commit

  • Starting with v1.2 labels, 'address abstractions' can be hinted via an
    address abstraction id that implies an info-block format. The standard
    address abstraction in the specification is the v2 format of the
    Block-Translation-Table (BTT). Support for that is saved for a later
    patch; for now we add support for the Linux-supported address
    abstractions BTT (v1), PFN, and DAX.

    The new 'holder_class' attribute for namespace devices is added for
    tooling to specify the 'abstraction_guid' to store in the namespace label.
    For v1.1 labels this field is undefined and any setting of
    'holder_class' away from the default 'none' value will only have effect
    until the driver is unloaded. Setting 'holder_class' requires that
    whatever device tries to claim the namespace must be of the specified
    class.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

10 Jun, 2017

1 commit

  • The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".

    Implement __copy_from_user_inatomic_flushcache(), memcpy_page_flushcache(),
    and memcpy_flushcache(), which guarantee that the destination buffer is not
    dirty in the cpu cache on completion. The new copy_from_iter_flushcache()
    and sub-routines will be used to replace the "pmem api"
    (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of
    copy_from_iter_flushcache() and memcpy_flushcache() is gated by the
    CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, with a fallback to
    copy_from_iter_nocache() and plain memcpy() otherwise.

    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].

    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
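
    A simplified illustration using the helper names above (the write_pmem()
    wrapper is hypothetical; fallbacks are paraphrased from the description):

        static void write_pmem(void *pmem_addr, const void *src, size_t len)
        {
        #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
                memcpy_flushcache(pmem_addr, src, len); /* non-temporal stores */
        #else
                memcpy(pmem_addr, src, len);            /* plain fallback */
        #endif
        }

        /* and the dax path copies from an iov_iter with the same semantics:
         *     copied = copy_from_iter_flushcache(pmem_addr, len, iter);
         */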

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

11 May, 2017

1 commit

  • nsio_rw_bytes can clear media errors, but this cannot be done while we
    are in an atomic context due to locking within ACPI. From the BTT,
    ->rw_bytes may be called either from atomic or process context depending
    on whether the calls happen during initialization or during IO.

    During init, we want to ensure error clearing happens, and the flag
    marking process context allows nsio_rw_bytes to do that. When called
    during IO, we're in atomic context, and error clearing can be skipped.
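
    A sketch of how the flag is consumed (the flag name is from this patch;
    the surrounding code and the 'bad_pmem' local are simplified):

        if (unlikely(bad_pmem) && !(flags & NVDIMM_IO_ATOMIC)) {
                /* process context (e.g. BTT init): clearing may sleep */
                nvdimm_clear_poison(&ndns->dev, nsio->res.start + offset, size);
        } else if (unlikely(bad_pmem)) {
                return -EIO;    /* atomic context: skip clearing */
        }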

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

02 May, 2017

1 commit

  • This continues the 4.11 status quo of disabling error clearing from
    the BTT I/O path. Toshi found that even though we have eliminated all
    the libnvdimm sources of sleeping-while-atomic triggers, we still have
    sleeping operations that will occur in the path to send the ACPI DSM to
    the DIMM to clear the error:

    BUG: sleeping function called from invalid context at mm/slab.h:432
    in_atomic(): 1, irqs_disabled(): 0, pid: 13353, name: dd
    Call Trace:
    dump_stack+0x86/0xc3
    ___might_sleep+0x17d/0x250
    __might_sleep+0x4a/0x80
    __kmalloc+0x1c0/0x2e0
    acpi_os_allocate_zeroed+0x2d/0x2f
    acpi_evaluate_object+0x59/0x3b1
    acpi_evaluate_dsm+0xbd/0x10c
    acpi_nfit_ctl+0x1ef/0x7c0 [nfit]
    ? nsio_rw_bytes+0x152/0x280
    nvdimm_clear_poison+0x77/0x140
    nsio_rw_bytes+0x18f/0x280
    btt_write_pg+0x1d4/0x3d0 [nd_btt]
    btt_make_request+0x119/0x2d0 [nd_btt]

    A solution for tracking and handling media errors natively in the BTT is
    needed.

    Cc: Jeff Moyer
    Cc: Dave Jiang
    Cc: Vishal Verma
    Reported-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     

01 May, 2017

1 commit

  • A debug patch to turn the standard device_lock() into something that
    lockdep can analyze yielded the following:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.11.0-rc4+ #106 Tainted: G O
    -------------------------------------------------------
    lt-libndctl/1898 is trying to acquire lock:
    (&dev->nvdimm_mutex/3){+.+.+.}, at: [] nd_attach_ndns+0x178/0x1b0 [libnvdimm]

    but task is already holding lock:
    (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [] nvdimm_bus_lock+0x21/0x30 [libnvdimm]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
    nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
    nd_pmem_probe+0x14/0x180 [nd_pmem]
    nvdimm_bus_probe+0xa9/0x260 [libnvdimm]

    -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
    __lock_acquire+0x1107/0x1280
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nd_attach_ndns+0x178/0x1b0 [libnvdimm]
    nd_namespace_store+0x308/0x3c0 [libnvdimm]
    namespace_store+0x87/0x220 [libnvdimm]

    In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.

    Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
    nd_{attach,detach}_ndns() operations.
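
    The shape of the fix, sketched as before/after on the attach path
    (illustrative only):

        /* before: takes &dev->mutex while reconfig_mutex is already held */
        device_lock(&ndns->dev);

        /* after: rely on the bus-level lock instead */
        nvdimm_bus_lock(&ndns->dev);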

    Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices")
    Reported-by: Yi Zhang
    Signed-off-by: Dan Williams

    Dan Williams
     

26 Apr, 2017

1 commit

  • memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
    serves no real benefit aside from affording a more generic function name
    than the x86-specific 'mcsafe'. However, this would not be the first time
    that x86 terminology leaked into the global namespace. For lack of a
    better name, just use memcpy_mcsafe() directly.

    This conversion also catches a place where we should have been using
    plain memcpy(): acpi_nfit_blk_single_io().
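
    The wrapper being removed was essentially (paraphrased):

        static inline int memcpy_from_pmem(void *dst, void const *src, size_t size)
        {
                return memcpy_mcsafe(dst, src, size);
        }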

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Acked-by: Tony Luck
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Apr, 2017

1 commit

  • This reverts commit 4aa5615e080a "libnvdimm: band aid btt vs clear
    poison locking".

    Now that poison list locking has been converted to a spinlock and poison
    list entry allocation during i/o has been converted to GFP_NOWAIT,
    revert the band-aid that disabled error clearing from btt i/o.

    Cc: Vishal Verma
    Cc: Dave Jiang
    Signed-off-by: Dan Williams

    Dan Williams
     

11 Apr, 2017

1 commit

  • The following warning results from holding a lane spinlock,
    preempt_disable(), or the btt map spinlock and then trying to take the
    reconfig_mutex to walk the poison list and potentially add new entries.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
    in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
    [..]
    Call Trace:
    dump_stack+0x85/0xc8
    ___might_sleep+0x184/0x250
    __might_sleep+0x4a/0x90
    __mutex_lock+0x58/0x9b0
    ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
    ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
    ? _raw_spin_unlock+0x27/0x40
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_forget_poison+0x25/0x50 [libnvdimm]
    nvdimm_clear_poison+0x106/0x140 [libnvdimm]
    nsio_rw_bytes+0x164/0x270 [libnvdimm]
    btt_write_pg+0x1de/0x3e0 [nd_btt]
    ? blk_queue_enter+0x30/0x290
    btt_make_request+0x11a/0x310 [nd_btt]
    ? blk_queue_enter+0xb7/0x290
    ? blk_queue_enter+0x30/0x290
    generic_make_request+0x118/0x3b0

    As a minimal fix, disable error clearing when the BTT is enabled for the
    namespace. For the final fix a larger rework of the poison list locking
    is needed.

    Note that this is not a problem in the blk case since that path never
    calls nvdimm_clear_poison().

    Fixes: 82bf1037f2ca ("libnvdimm: check and clear poison before writing to pmem")
    Cc: Dave Jiang
    [jeff: dynamically disable error clearing in the btt case]
    Suggested-by: Jeff Moyer
    Reviewed-by: Jeff Moyer
    Reported-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

17 Dec, 2016

1 commit

  • Colin, via static analysis, reports that the length could be negative
    from nvdimm_clear_poison() in the error case. There was a similar
    problem with commit 0a3f27b9a6a8 "libnvdimm, namespace: avoid multiple
    sector calculations" that I noticed when merging the for-4.10/libnvdimm
    topic branch into libnvdimm-for-next, but I missed this one. Fix both of
    them to use the following procedure:

    * if we clear a block's worth of media, clear that many blocks in
      badblocks

    * if we clear less than the requested size of the transfer, return an
      error

    * always invalidate the cache after any non-error / non-zero
      nvdimm_clear_poison result
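
    A simplified sketch of that procedure (argument order and helper names
    are illustrative, not the exact diff):

        int rc = 0;
        sector_t sector = offset >> 9;
        long cleared = nvdimm_clear_poison(&ndns->dev,
                        nsio->res.start + offset, size);

        if (cleared > 0 && cleared / 512)       /* whole blocks cleared */
                badblocks_clear(&nsio->bb, sector, cleared / 512);
        if (cleared != size)                    /* short clear */
                rc = -EIO;
        if (cleared > 0)                        /* always invalidate the cache */
                invalidate_pmem(nsio->addr + offset, size);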

    Fixes: 82bf1037f2ca ("libnvdimm: check and clear poison before writing to pmem")
    Fixes: 0a3f27b9a6a8 ("libnvdimm, namespace: avoid multiple sector calculations")
    Cc: Fabian Frederick
    Cc: Dave Jiang
    Reported-by: Colin Ian King
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Nov, 2016

1 commit

  • Here is an example /proc/iomem listing for a system with 2 namespaces,
    one in "sector" mode and one in "memory" mode:

    1fc000000-2fbffffff : Persistent Memory (legacy)
      1fc000000-2fbffffff : namespace1.0
    340000000-34fffffff : Persistent Memory
      340000000-34fffffff : btt0.1

    Here is the corresponding ndctl listing:

    # ndctl list
    [
      {
        "dev":"namespace1.0",
        "mode":"memory",
        "size":4294967296,
        "blockdev":"pmem1"
      },
      {
        "dev":"namespace0.0",
        "mode":"sector",
        "size":267091968,
        "uuid":"f7594f86-badb-4592-875f-ded577da2eaf",
        "sector_size":4096,
        "blockdev":"pmem0s"
      }
    ]

    Notice that the ndctl listing is purely in terms of namespace devices,
    while the iomem listing leaks the internal "btt0.1" implementation
    detail. Given that ndctl requires the namespace device name to change
    the mode, for example:

    # ndctl create-namespace --reconfig=namespace0.0 --mode=raw --force

    ...use the namespace name in the iomem listing to keep the claiming
    device name consistent across different mode settings.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

18 Jun, 2016

1 commit

  • Prompted by commit 287980e49ffc "remove lots of IS_ERR_VALUE abuses", I
    ran make coccicheck against drivers/nvdimm/ and found that:

    if (IS_ERR(x))
            return PTR_ERR(x);
    return 0;

    ...can be replaced with PTR_ERR_OR_ZERO().
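
    i.e. the three lines above collapse to:

        return PTR_ERR_OR_ZERO(x);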

    Reported-by: Linus Torvalds
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

22 May, 2016

1 commit

  • The ndctl unit tests discovered that the dax enabling omitted updates to
    nd_detach_and_reset(). This routine clears the device configuration
    when the namespace is detached. Without this clearing, userspace may
    assume that the device is in the process of being configured by another
    agent in the system.

    Signed-off-by: Dan Williams

    Dan Williams
     

10 May, 2016

1 commit

  • Device DAX is the device-centric analogue of Filesystem DAX
    (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and
    mapped without need of an intervening file system. This initial
    infrastructure arranges for a libnvdimm pfn-device to be represented as
    a different device-type so that it can be attached to a driver other
    than the pmem driver.

    Signed-off-by: Dan Williams

    Dan Williams
     

23 Apr, 2016

1 commit

  • In preparation for providing an alternative (to block device) access
    mechanism to persistent memory, convert pmem_rw_bytes() to
    nsio_rw_bytes(). This allows ->rw_bytes() functionality without
    requiring a 'struct pmem_device' to be instantiated.

    In other words, when ->rw_bytes() is in use i/o is driven through
    'struct nd_namespace_io', otherwise it is driven through 'struct
    pmem_device' and the block layer. This consolidates the disjoint calls
    to devm_exit_badblocks() and devm_memunmap() into a common
    devm_nsio_disable() and cleans up the init path to use a unified
    pmem_attach_disk() implementation.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

29 Aug, 2015

1 commit

  • Implement the base infrastructure for libnvdimm PFN devices. Similar to
    BTT devices they take a namespace as a backing device and layer
    functionality on top. In this case the functionality is reserving space
    for an array of 'struct page' entries to be handed out through
    pfn_to_page(). For now this is just the basic libnvdimm-device-model for
    configuring the base PFN device.

    As the namespace claiming mechanism for PFN devices is mostly identical
    to that of BTT devices, drivers/nvdimm/claim.c is created to house the
    common bits.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams