14 Oct, 2020

3 commits

  • In support of device-dax growing the ability to front physically
    discontiguous ranges of memory, update devm_memremap_pages() to track
    multiple ranges with a single reference counter and devm instance.

    Convert all [devm_]memremap_pages() users to specify the number of ranges
    they are mapping in their 'struct dev_pagemap' instance.
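
    As a rough sketch, a single-range user now looks something like this
    (the type and range values are illustrative, not taken from this patch):

        struct dev_pagemap pgmap = {
                .type = MEMORY_DEVICE_FS_DAX,
                .nr_range = 1,                  /* number of ranges mapped */
                .range = {
                        .start = res->start,    /* physical span of range 0 */
                        .end = res->end,
                },
        };

        void *addr = devm_memremap_pages(dev, &pgmap);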

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields ('name', 'flags', 'desc',
    'parent', 'sibling', and 'child') are all unused, wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open-coded print of the
    range.
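
    A before / after sketch of the conversion (call sites paraphrased; assumes
    a range_len() style helper over 'struct range'):

        /* before: a full 'struct resource' carried in the pgmap */
        size = resource_size(&pgmap->res);
        dev_err(dev, "cannot remap %pR\n", &pgmap->res);

        /* after: only the span is kept */
        size = range_len(&pgmap->range);
        dev_err(dev, "cannot remap %#llx-%#llx\n",
                        pgmap->range.start, pgmap->range.end);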

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

06 Oct, 2020

1 commit

  • In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface: specifically, which
    addresses are valid to pass as source and destination, and which faults /
    exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other architectures likely need / want an explicit path
    for this case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace the single top-level memcpy_mcsafe() with either
    copy_mc_to_user() or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    to its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.
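
    Roughly, call sites change along these lines (sketch; error handling
    abbreviated, and as before a non-zero return means bytes left uncopied):

        /* before: one interface regardless of the destination */
        rem = memcpy_mcsafe(to, from, len);

        /* after: the destination address space selects the interface */
        rem = copy_mc_to_kernel(to, from, len);         /* kernel destination */
        rem = copy_mc_to_user(uto, from, len);          /* user destination */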

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Cc:
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

25 Sep, 2020

1 commit

  • BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
    decide if ->rw_page can be used on a block device. Just check for the
    presence of the method instead. The only complication is that zram needs
    a second set of block_device_operations, as it can switch between modes
    that actually support ->rw_page and those that don't.
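
    The swap-side check becomes a simple method lookup, roughly (paraphrased,
    not the exact call site):

        /* before */
        if (bdi_cap_synchronous_io(inode_to_bdi(inode)))
                p->flags |= SWP_SYNCHRONOUS_IO;

        /* after: just ask whether ->rw_page exists */
        if (p->bdev && p->bdev->bd_disk->fops->rw_page)
                p->flags |= SWP_SYNCHRONOUS_IO;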

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Sep, 2020

1 commit

  • The nvdimm block driver abuses revalidate_disk in a strange way, one
    totally unrelated to what other drivers do. Simplify this by just
    calling nvdimm_revalidate_disk (which seems rather misnamed) from the
    probe routines, as the additional bdev size revalidation is pointless
    at this point, and remove the revalidate_disk methods given that they
    can only be triggered from add_disk, which runs right before the
    manual calls.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Josef Bacik
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

15 Aug, 2020

1 commit

  • This function returns the number of bytes in a THP. It is like
    page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
    is disabled.
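
    Assuming this refers to the thp_size() helper, the shape of the definition
    is roughly (sketch, not the verbatim upstream code):

        #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        static inline unsigned long thp_size(struct page *page)
        {
                return page_size(page);   /* PAGE_SIZE << compound_order(page) */
        }
        #else
        #define thp_size(page) PAGE_SIZE  /* constant-folds when THP is off */
        #endif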

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Reviewed-by: Zi Yan
    Cc: David Hildenbrand
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

01 Jul, 2020

1 commit

  • The make_request_fn is a little weird in that it sits directly in
    struct request_queue instead of an operation vector. Replace it with
    a block_device_operations method called submit_bio (which describes much
    better what it does). Also remove the request_queue argument to it, as
    the queue can be derived pretty trivially from the bio.
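
    A sketch of a converted driver (pmem-style names used purely for
    illustration):

        static blk_qc_t pmem_submit_bio(struct bio *bio)
        {
                /* the queue is derived from the bio rather than passed in */
                struct pmem_device *pmem = bio->bi_disk->private_data;

                /* ... perform the I/O against pmem ... */
                bio_endio(bio);
                return BLK_QC_T_NONE;
        }

        static const struct block_device_operations pmem_fops = {
                .owner          = THIS_MODULE,
                .submit_bio     = pmem_submit_bio,
        };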

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Jun, 2020

1 commit

  • This seems to lead to some crazy include loops when using
    asm-generic/cacheflush.h on more architectures, so leave it to the arch
    header for now.

    [hch@lst.de: fix warning]
    Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Will Deacon
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Anton Ivanov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dan Williams
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Ira Weiny
    Cc: Arnd Bergmann
    Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

09 Apr, 2020

1 commit

  • Pull libnvdimm and dax updates from Dan Williams:
    "There were multiple touches outside of drivers/nvdimm/ this round to
    add cross arch compatibility to the devm_memremap_pages() interface,
    enhance numa information for persistent memory ranges, and add a
    zero_page_range() dax operation.

    This cycle I switched from the patchwork api to Konstantin's b4 script
    for collecting tags (from x86, PowerPC, filesystem, and device-mapper
    folks), and everything looks to have gone ok there. This has all
    appeared in -next with no reported issues.

    Summary:

    - Add support for region alignment configuration and enforcement to
    fix compatibility across architectures and PowerPC page size
    configurations.

    - Introduce 'zero_page_range' as a dax operation. This facilitates
    filesystem-dax operation without a block-device.

    - Introduce phys_to_target_node() to facilitate drivers that want to
    know resulting numa node if a given reserved address range was
    onlined.

    - Advertise a persistence-domain for of_pmem and papr_scm. The
    persistence domain indicates where cpu-store cycles need to reach
    in the platform-memory subsystem before the platform will consider
    them power-fail protected.

    - Promote numa_map_to_online_node() to a cross-kernel generic
    facility.

    - Save x86 numa information to allow for node-id lookups for reserved
    memory ranges, deploy that capability for the e820-pmem driver.

    - Pick up some miscellaneous minor fixes that missed v5.6-final,
    including some smatch reports in the ioctl path and some unit
    test compilation fixups.

    - Fixup some flexible-array declarations"

    * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
    dax: Move mandatory ->zero_page_range() check in alloc_dax()
    dax,iomap: Add helper dax_iomap_zero() to zero a range
    dax: Use new dax zero page method for zeroing a page
    dm,dax: Add dax zero_page_range operation
    s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
    dax, pmem: Add a dax operation zero_page_range
    pmem: Add functions for reading/writing page to/from pmem
    libnvdimm: Update persistence domain value for of_pmem and papr_scm device
    tools/test/nvdimm: Fix out of tree build
    libnvdimm/region: Fix build error
    libnvdimm/region: Replace zero-length array with flexible-array member
    libnvdimm/label: Replace zero-length array with flexible-array member
    ACPI: NFIT: Replace zero-length array with flexible-array member
    libnvdimm/region: Introduce an 'align' attribute
    libnvdimm/region: Introduce NDD_LABELING
    libnvdimm/namespace: Enforce memremap_compat_align()
    libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
    libnvdimm: Out of bounds read in __nd_ioctl()
    acpi/nfit: improve bounds checking for 'func'
    mm/memremap_pages: Introduce memremap_compat_align()
    ...

    Linus Torvalds
     

03 Apr, 2020

3 commits

  • The zero_page_range() dax operation is mandatory for dax devices. Right
    now that check happens in the dax_zero_page_range() function. Dan thinks
    that's too late and it's better to do the check earlier, in alloc_dax().

    I also modified alloc_dax() to return a pointer with an error code in it
    in case of failure. Right now it returns NULL and the caller assumes the
    failure happened due to -ENOMEM. But with this ->zero_page_range() check,
    I need to return -EINVAL instead.
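
    Callers change from a NULL check to an ERR_PTR check, roughly (sketch;
    argument names illustrative):

        dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
        if (IS_ERR(dax_dev))
                return PTR_ERR(dax_dev);        /* -EINVAL, -ENOMEM, ... */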

    Signed-off-by: Vivek Goyal
    Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     
  • Add a dax operation zero_page_range, to zero a page. This will also clear any
    known poison in the page being zeroed.

    As of now, zeroing of one page is allowed in a single call. There are no
    callers which are trying to zero more than a page in a single call. Once
    we grow callers which zero more than a page in a single call, we can add
    that support. The primary reason for not doing that yet is that it would
    add a little complexity to the dm implementation, where a range might span
    multiple underlying targets and one would have to split the range into
    multiple sub-ranges and call zero_page_range() on the individual targets.
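
    The new hook hangs off 'struct dax_operations'; roughly (sketch, with pmem
    as the illustrative provider):

        struct dax_operations {
                /* ... existing ops ... */
                /* zero nr_pages pages at pgoff, clearing known poison */
                int (*zero_page_range)(struct dax_device *dax_dev,
                                       pgoff_t pgoff, size_t nr_pages);
        };

        static const struct dax_operations pmem_dax_ops = {
                /* ... */
                .zero_page_range = pmem_dax_zero_page_range,
        };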

    Suggested-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Reviewed-by: Pankaj Gupta
    Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     
  • This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write().
    pmem_do_write() will also be used by the pmem zero_page_range()
    implementation, hence the split to share the same code.
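
    The split looks roughly like this (prototypes paraphrased; the exact
    parameter list may differ):

        static blk_status_t pmem_do_read(struct pmem_device *pmem,
                        struct page *page, unsigned int page_off,
                        sector_t sector, unsigned int len);
        static blk_status_t pmem_do_write(struct pmem_device *pmem,
                        struct page *page, unsigned int page_off,
                        sector_t sector, unsigned int len);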

    Suggested-by: Christoph Hellwig
    Signed-off-by: Vivek Goyal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Pankaj Gupta
    Link: https://lore.kernel.org/r/20200228163456.1587-2-vgoyal@redhat.com
    Signed-off-by: Dan Williams

    Vivek Goyal
     

28 Mar, 2020

1 commit

  • Current make_request-based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
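
    For a driver the change boils down to (sketch; pmem-style names
    illustrative):

        /* before */
        q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
        blk_queue_make_request(q, pmem_make_request);

        /* after: make_request pointer and numa node passed directly */
        q = blk_alloc_queue(pmem_make_request, dev_to_node(dev));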

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Feb, 2020

1 commit

  • After the removal of the device-public infrastructure there are only two
    ->page_free() callbacks in the kernel. One of those is a
    device-private callback in the nouveau driver, the other is a generic
    wakeup needed in the DAX case. In the hopes that all ->page_free()
    callbacks can be migrated to common core kernel functionality, move the
    device-private specific actions in __put_devmap_managed_page() under the
    is_device_private_page() conditional, including the ->page_free()
    callback. For the other page types just open-code the generic wakeup.

    Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
    does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
    case.
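
    Schematically, the final-put path becomes (sketch; details elided):

        if (is_device_private_page(page)) {
                /* device-private specific teardown, then the driver hook */
                page->pgmap->ops->page_free(page);
        } else {
                /* FSDAX / DEVDAX / PCI_P2PDMA: just the generic wakeup */
                wake_up_var(&page->_refcount);
        }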

    Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
    Signed-off-by: Dan Williams
    Signed-off-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jérôme Glisse
    Cc: Jan Kara
    Cc: Ira Weiny
    Cc: Alex Williamson
    Cc: Aneesh Kumar K.V
    Cc: Björn Töpel
    Cc: Daniel Vetter
    Cc: Hans Verkuil
    Cc: Jason Gunthorpe
    Cc: Jens Axboe
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Leon Romanovsky
    Cc: Mauro Carvalho Chehab
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

18 Nov, 2019

1 commit

  • The entire point of nd-core.h is to hide functionality that no leaf
    driver should touch. In fact, the commit that added it had no need to
    include it.

    Fixes: 06e8ccdab15f ("acpi: nfit: Add support for detect platform...")
    Cc: Ira Weiny
    Cc: Dave Jiang
    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Nov, 2019

1 commit

  • The nvdimm core currently maps the full namespace to an ioremap range
    while probing the namespace mode. This can result in probe failures on
    architectures that have limited ioremap space.

    For example, with a large btt namespace that consumes most of the ioremap
    range, depending on the sequence of namespace initialization, the user can
    see a pfn namespace initialization failure because the ioremap space the
    nvdimm core uses for temporary mappings is unavailable.

    The nvdimm core can avoid this failure by mapping only the reserved
    info-block area to check the pfn superblock type, and mapping the full
    namespace resource only just before the namespace is used.

    Given that personalities like BTT can be layered on top of any namespace
    type, create a generic form of devm_nsio_enable (devm_namespace_enable)
    and use it inside the per-personality attach routines. Now
    devm_namespace_enable() is always paired with a disable unless the mapping
    is going to be used for long-term runtime access.
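
    The resulting pattern in an attach routine is roughly (sketch; the size
    argument and helper names are illustrative):

        rc = devm_namespace_enable(dev, ndns, info_block_reserve);
        if (rc)
                return rc;
        /* ... read and validate the info block ... */
        devm_namespace_disable(dev, ndns);  /* re-enabled for runtime use */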

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
    [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
    Reported-by: kbuild test robot
    Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

06 Sep, 2019

1 commit

  • In order to support marking namespaces with unsupported features/versions
    as disabled, the nvdimm core should advance the namespace seed on these
    probe failures. Otherwise, these failed namespaces will be considered seed
    namespaces and will be wrongly used while creating new namespaces.

    Return -EOPNOTSUPP from the pmem probe callback to indicate namespace
    initialization failures due to a pfn superblock feature/version mismatch.

    Signed-off-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Dan Williams

    Aneesh Kumar K.V
     

27 Jul, 2019

1 commit

  • Pull libnvdimm fixes from Dan Williams:
    "A collection of locking and async operations fixes for v5.3-rc2. These
    had been soaking in a branch targeting the merge window, but missed
    due to a regression hunt. This fixed up version has otherwise been in
    -next this past week with no reported issues.

    In order to gain confidence in the locking changes the pull also
    includes a debug / instrumentation patch to enable lockdep coverage
    for libnvdimm subsystem operations that depend on the device_lock for
    exclusion. As mentioned in the changelog it is a hack, but it works
    and documents the locking expectations of the sub-system in a way that
    others can use lockdep to verify. The driver core touches got an ack
    from Greg.

    Summary:

    - Fix duplicate device_unregister() calls (multiple threads competing
    to do unregister work when scheduling device removal from a sysfs
    attribute of the self-same device).

    - Fix badblocks registration order bug. Ensure region badblocks are
    initialized in advance of namespace registration.

    - Fix a deadlock between the bus lock and probe operations.

    - Export device-core infrastructure to coordinate async operations
    via the device ->dead state.

    - Add device-core infrastructure to validate device_lock() usage with
    lockdep"

    * tag 'libnvdimm-fixes-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    driver-core, libnvdimm: Let device subsystems add local lockdep coverage
    libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
    libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
    libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
    libnvdimm/region: Register badblocks before namespaces
    libnvdimm/bus: Prevent duplicate device_unregister() calls
    drivers/base: Introduce kill_device()

    Linus Torvalds
     

19 Jul, 2019

2 commits

  • For good reason, the standard device_lock() is marked
    lockdep_set_novalidate_class() because there is simply no sane way to
    describe the myriad ways the device_lock() is ordered with other locks.
    However, that leaves subsystems that know their own local device_lock()
    ordering rules to find lock ordering mistakes manually. Instead,
    introduce an optional / additional lockdep-enabled lock that a subsystem
    can acquire in all the same paths that the device_lock() is acquired.

    A conversion of the NFIT driver and NVDIMM subsystem to a
    lockdep-validated device_lock() scheme is included. The
    debug_nvdimm_lock() implementation provides the correct lock class and
    stacking order for the libnvdimm device topology hierarchy.

    Yes, this is a hack, but hopefully it is a useful hack for other
    subsystems device_lock() debug sessions. Quoting Greg:

    "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
    using it as much as anything else, so user beware :)

    I don't object to it if it makes things easier for you to debug."
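
    The pattern, stripped to its essence (illustrative names, not the exact
    libnvdimm helpers):

        static void nd_device_lock(struct device *dev)
        {
                device_lock(dev);       /* novalidate class, opaque to lockdep */
                mutex_lock(&to_nd_debug(dev)->lock);    /* lockdep-visible shadow */
        }

        static void nd_device_unlock(struct device *dev)
        {
                mutex_unlock(&to_nd_debug(dev)->lock);
                device_unlock(dev);
        }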

    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Will Deacon
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Cc: Vishal Verma
    Cc: "Rafael J. Wysocki"
    Cc: Greg Kroah-Hartman
    Signed-off-by: Dan Williams
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Ira Weiny
    Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • Pull libnvdimm updates from Dan Williams:
    "Primarily just the virtio_pmem driver:

    - virtio_pmem

    The new virtio_pmem facility introduces a paravirtualized
    persistent memory device that allows a guest VM to use DAX
    mechanisms to access a host-file with host-page-cache. It arranges
    for MAP_SYNC to be disabled and instead triggers a host fsync()
    when a 'write-cache flush' command is sent to the virtual disk
    device.

    - Miscellaneous small fixups"

    * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    virtio_pmem: fix sparse warning
    xfs: disable map_sync for async flush
    ext4: disable map_sync for async flush
    dax: check synchronous mapping is supported
    dm: enable synchronous dax
    libnvdimm: add dax_dev sync flag
    virtio-pmem: Add virtio pmem driver
    libnvdimm: nd_region flush callback support
    libnvdimm, namespace: Drop uuid_t implementation detail

    Linus Torvalds
     

06 Jul, 2019

2 commits

  • This patch adds a 'DAXDEV_SYNC' flag which is set for an nd_region that
    performs synchronous flushes. It is later used to disable MAP_SYNC
    functionality in the ext4 and xfs filesystems for devices that don't
    support synchronous flush.
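
    A sketch of how the flag flows in at dax-device allocation time (call site
    paraphrased):

        unsigned long flags = 0;

        if (is_nvdimm_sync(nd_region))
                flags |= DAXDEV_F_SYNC;
        dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);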

    Signed-off-by: Pankaj Gupta
    Signed-off-by: Dan Williams

    Pankaj Gupta
     
  • This patch adds functionality to perform a flush from guest to host over
    VIRTIO. A callback is registered based on the 'nd_region' type: the
    virtio_pmem driver requires this special flush function, while the rest
    of the region types register the existing flush function. Errors returned
    by a host fsync failure are reported to userspace.

    Signed-off-by: Pankaj Gupta
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

03 Jul, 2019

5 commits

  • Add a flags field to struct dev_pagemap to replace the altmap_valid
    boolean and be a little more extensible. Also add a pgmap_altmap() helper
    to find the optional altmap, and use it to clean up the code that accesses
    the altmap.
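
    The helper is roughly (sketch):

        static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
        {
                if (pgmap->flags & PGMAP_ALTMAP_VALID)
                        return &pgmap->altmap;
                return NULL;
        }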

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • struct dev_pagemap is always embedded into a containing structure, so
    there is no need for an additional private data field.
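
    With the pgmap embedded, the containing structure is recovered with
    container_of() instead of a private pointer; a pmem-flavored illustration:

        struct pmem_device {
                /* ... */
                struct dev_pagemap pgmap;
        };

        static struct pmem_device *to_pmem(struct dev_pagemap *pgmap)
        {
                return container_of(pgmap, struct pmem_device, pgmap);
        }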

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Just check if there is a ->page_free operation set and take care of the
    static key enable, as well as the put using device managed resources.
    Also check that a ->page_free is provided for the pgmap types that
    require it, and check for a valid type as well while we are at it.

    Note that this also fixes the fact that hmm never called
    dev_pagemap_put_ops and thus would leave the slow path enabled forever,
    even after a device driver unload or disable.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ira Weiny
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • Passing the actual typed structure leads to more understandable code
    vs just passing the ref member.

    Reported-by: Logan Gunthorpe
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Logan Gunthorpe
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     
  • The dev_pagemap is growing too many callbacks. Move them into a
    separate ops structure so that they are not duplicated for multiple
    instances, and an attacker can't easily overwrite them.
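
    The rough shape of the new structure (membership and signatures evolved
    across this series; treat this as a sketch):

        struct dev_pagemap_ops {
                void (*page_free)(struct page *page);
                void (*kill)(struct dev_pagemap *pgmap);
                void (*cleanup)(struct dev_pagemap *pgmap);
                vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
        };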

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Logan Gunthorpe
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Dan Williams
    Tested-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Christoph Hellwig
     

14 Jun, 2019

1 commit

  • Logan noticed that devm_memremap_pages_release() kills the percpu_ref,
    drops all the page references that were acquired at init, and then
    immediately proceeds to unplug (arch_remove_memory()) the backing pages
    for the pagemap. If for some reason device shutdown actually collides
    with a busy / elevated-ref-count page, then arch_remove_memory() should
    be deferred until after that reference is dropped.

    As it stands the "wait for last page ref drop" happens *after*
    devm_memremap_pages_release() returns, which is obviously too late and
    can lead to crashes.

    Fix this situation by assigning the responsibility to wait for the
    percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
    callback. Implement the new cleanup callback for all
    devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
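
    Sketch of the resulting contract for a user (pmem-style names,
    illustrative):

        pmem->pgmap.ref = &q->q_usage_counter;
        pmem->pgmap.kill = pmem_freeze_queue;       /* stop new references */
        pmem->pgmap.cleanup = pmem_release_queue;   /* wait for the final put */
        addr = devm_memremap_pages(dev, &pmem->pgmap);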

    Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: 41e94a851304 ("add devm_memremap_pages")
    Signed-off-by: Dan Williams
    Reported-by: Logan Gunthorpe
    Reviewed-by: Ira Weiny
    Reviewed-by: Logan Gunthorpe
    Cc: Bjorn Helgaas
    Cc: "Jérôme Glisse"
    Cc: Christoph Hellwig
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 263 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

2 commits

  • Jeff discovered that performance improves from ~375K iops to ~519K iops
    on a simple psync-write fio workload when moving the location of 'struct
    page' from the default PMEM location to DRAM. This result is surprising
    because the expectation is that 'struct page' for dax is only needed for
    third party references to dax mappings. For example, a dax-mapped buffer
    passed to another system call for direct-I/O requires 'struct page' for
    sending the request down the driver stack and pinning the page. There is
    no usage of 'struct page' for first party access to a file via
    read(2)/write(2) and friends.

    However, this "no page needed" expectation is violated by
    CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
    copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
    check_heap_object() helper routine assumes the buffer is backed by a
    slab allocator (DRAM) page and applies some checks. Those checks are
    invalid (dax pages do not originate from the slab) and redundant
    (dax_iomap_actor() has already validated that the I/O is within bounds).
    Specifically that routine validates that the logical file offset is
    within bounds of the file, then it does a sector-to-pfn translation
    which validates that the physical mapping is within bounds of the block
    device.

    Bypass additional hardened usercopy overhead and call the 'no check'
    versions of the copy_{to,from}_iter operations directly.
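
    In the pmem dax copy routines this amounts to switching to the
    leading-underscore, no-check variants (sketch):

        /* before: routed through check_copy_size() / check_heap_object() */
        return copy_from_iter_flushcache(addr, bytes, i);

        /* after: the '_' variant skips the hardened-usercopy heap checks */
        return _copy_from_iter_flushcache(addr, bytes, i);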

    Fixes: 0aed55af8834 ("x86, uaccess: introduce copy_from_iter_flushcache...")
    Cc:
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reported-and-tested-by: Jeff Smits
    Acked-by: Kees Cook
    Acked-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Pankaj reports that starting with commit ad428cdb525a "dax: Check the
    end of the block-device capacity with dax_direct_access()" device-mapper
    no longer allows dax operation. This results from the stricter checks in
    __bdev_dax_supported() that validate that the start and end of a
    block-device map to the same 'pagemap' instance.

    Teach the dax-core and device-mapper to validate the 'pagemap' on a
    per-target basis. This is accomplished by refactoring the
    bdev_dax_supported() internals into generic_fsdax_supported() which
    takes a sector range to validate. Consequently generic_fsdax_supported()
    is suitable to be used in a device-mapper ->iterate_devices() callback.
    A new ->dax_supported() operation is added to allow composite devices to
    split and route upper-level bdev_dax_supported() requests.
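
    Schematically, device-mapper can now answer the question per target
    (sketch; the callback name and the generic_fsdax_supported() signature
    here are assumptions):

        static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
                                       sector_t start, sector_t len, void *data)
        {
                return generic_fsdax_supported(dev->dax_dev, dev->bdev,
                                               PAGE_SIZE, start, len);
        }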

    Fixes: ad428cdb525a ("dax: Check the end of the block-device...")
    Cc:
    Cc: Ira Weiny
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Matthew Wilcox
    Cc: Vishal Verma
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Reviewed-by: Jan Kara
    Reported-by: Pankaj Gupta
    Reviewed-by: Pankaj Gupta
    Tested-by: Pankaj Gupta
    Tested-by: Vaibhav Jain
    Reviewed-by: Mike Snitzer
    Signed-off-by: Dan Williams

    Dan Williams
     

08 Apr, 2019

1 commit

  • If the offset is not zero and the length is bigger than PAGE_SIZE, this
    will cause an out-of-bounds access to the page memory.

    Fixes: 98cc093cba1e ("block, THP: make block_device_operations.rw_page support THP")
    Co-developed-by: Liang ZhiCheng
    Signed-off-by: Liang ZhiCheng
    Signed-off-by: Li RongQing
    Reviewed-by: Ira Weiny
    Reviewed-by: Jeff Moyer
    Signed-off-by: Dan Williams

    Li RongQing
     

29 Dec, 2018

2 commits

  • Merge misc updates from Andrew Morton:

    - large KASAN update to use arm's "software tag-based mode"

    - a few misc things

    - sh updates

    - ocfs2 updates

    - just about all of MM

    * emailed patches from Andrew Morton: (167 commits)
    kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
    memcg, oom: notify on oom killer invocation from the charge path
    mm, swap: fix swapoff with KSM pages
    include/linux/gfp.h: fix typo
    mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
    hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    memory_hotplug: add missing newlines to debugging output
    mm: remove __hugepage_set_anon_rmap()
    include/linux/vmstat.h: remove unused page state adjustment macro
    mm/page_alloc.c: allow error injection
    mm: migrate: drop unused argument of migrate_page_move_mapping()
    blkdev: avoid migration stalls for blkdev pages
    mm: migrate: provide buffer_migrate_page_norefs()
    mm: migrate: move migrate_page_lock_buffers()
    mm: migrate: lock buffers before migrate_page_move_mapping()
    mm: migration: factor out code to compute expected number of page references
    mm, page_alloc: enable pcpu_drain with zone capability
    kmemleak: add config to select auto scan
    mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
    ...

    Linus Torvalds
     
  • The last step before devm_memremap_pages() returns success is to allocate
    a release action, devm_memremap_pages_release(), to tear the entire setup
    down. However, the result from devm_add_action() is not checked.

    Checking the error from devm_add_action() is not enough. The API
    currently relies on the fact that the percpu_ref it is using is killed by
    the time devm_memremap_pages_release() is run. Rather than continue
    this awkward situation, offload the responsibility of killing the
    percpu_ref to devm_memremap_pages_release() directly. This allows
    devm_memremap_pages() to do the right thing relative to init failures and
    shutdown.

    Without this change we could fail to register the teardown of
    devm_memremap_pages(). The likelihood of hitting this failure is tiny as
    small memory allocations almost always succeed. However, the impact of
    the failure is large given that any future reconfiguration, or
    disable/enable, of an nvdimm namespace will fail forever, as subsequent
    calls to devm_memremap_pages() will fail to set up the pgmap_radix since
    there will be stale entries for the physical address range.

    An argument could be made to require that the ->kill() operation be set in
    the @pgmap arg rather than passed in separately. However, it helps code
    readability, tracking the lifetime of a given instance, to be able to grep
    the kill routine directly at the devm_memremap_pages() call site.

    Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
    Reviewed-by: "Jérôme Glisse"
    Reported-by: Logan Gunthorpe
    Reviewed-by: Logan Gunthorpe
    Reviewed-by: Christoph Hellwig
    Cc: Balbir Singh
    Cc: Michal Hocko
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

25 Oct, 2018

1 commit

  • Pull libnvdimm updates from Dan Williams:

    - Improve the efficiency and performance of reading nvdimm-namespace
    labels. Reduce the amount of label data read at driver load time by a
    few orders of magnitude. Reduce heavyweight call-outs to
    platform-firmware routines.

    - Handle media errors located in the 'struct page' array stored on a
    persistent memory namespace. Let the kernel clear these errors rather
    than requiring an awkward userspace workaround.

    - Fix Address Range Scrub (ARS) completion tracking. Correct occasions
    where the kernel indicates completion of ARS before submission.

    - Fix asynchronous device registration reference counting.

    - Add support for reporting an nvdimm dirty-shutdown-count via sysfs.

    - Fix various small libnvdimm core and uapi issues.

    * tag 'libnvdimm-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
    acpi, nfit: Further restrict userspace ARS start requests
    acpi, nfit: Fix Address Range Scrub completion tracking
    UAPI: ndctl: Remove use of PAGE_SIZE
    UAPI: ndctl: Fix g++-unsupported initialisation in headers
    tools/testing/nvdimm: Populate dirty shutdown data
    acpi, nfit: Collect shutdown status
    acpi, nfit: Introduce nfit_mem flags
    libnvdimm, label: Fix sparse warning
    nvdimm: Use namespace index data to reduce number of label reads needed
    nvdimm: Split label init out from the logic for getting config data
    nvdimm: Remove empty if statement
    nvdimm: Clarify comment in sizeof_namespace_index
    nvdimm: Sanity check labeloff
    libnvdimm, dimm: Maximize label transfer size
    libnvdimm, pmem: Fix badblocks population for 'raw' namespaces
    libnvdimm, namespace: Drop the repeat assignment for variable dev->parent
    libnvdimm, region: Fail badblocks listing for inactive regions
    libnvdimm, pfn: during init, clear errors in the metadata area
    libnvdimm: Set device node in nd_device_register
    libnvdimm: Hold reference on parent while scheduling async init
    ...

    Linus Torvalds