Eric Lee / smarc-fsl-linux-kernel

18 Jul, 2017

1 commit

4e3f0701f libnvdimm: fix badblock range handling of ARS range

__add_badblock_range() does not account for sector alignment when
it sets 'num_sectors'. Therefore, an ARS error record range that
spans two sectors is set to a single sector length, which leaves
the 2nd sector unprotected.

Change __add_badblock_range() to set 'num_sectors' properly.
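
The difference between a length-only count and an alignment-aware count can be sketched in user-space C. This is a minimal illustration, not the kernel patch itself: the helper names sectors_from_length() and sectors_covering() are hypothetical, and 512-byte sectors are assumed.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/* Length-only count: ignores where the range starts inside a sector. */
static uint64_t sectors_from_length(uint64_t len)
{
	return (len + (1ULL << SECTOR_SHIFT) - 1) >> SECTOR_SHIFT;
}

/* Alignment-aware count: every sector touched by [offset, offset + len). */
static uint64_t sectors_covering(uint64_t offset, uint64_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = offset >> SECTOR_SHIFT;
	last = (offset + len - 1) >> SECTOR_SHIFT;
	return last - first + 1;
}
```

For a 512-byte error record that starts at byte offset 256, sectors_from_length(512) gives 1 while sectors_covering(256, 512) gives 2, which is the situation the commit message describes.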

Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Toshi Kani
Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams

Toshi Kani
2017-07-18 02:43:58 +0800

12 Jul, 2017

1 commit

130568d5e Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
"This is a followup for block changes, that didn't make the initial
pull request. It's a bit of a mixed bag, this contains:

- A followup pull request from Sagi for NVMe. Outside of fixups for
NVMe, it also includes a series for ensuring that we properly
quiesce hardware queues when browsing live tags.

- Set of integrity fixes from Dmitry (mostly), fixing various issues
for folks using DIF/DIX.

- Fix for a bug introduced in cciss, with the req init changes. From
Christoph.

- Fix for a bug in BFQ, from Paolo.

- Two followup fixes for lightnvm/pblk from Javier.

- Depth fix from Ming for blk-mq-sched.

- Also from Ming, performance fix for mtip32xx that was introduced
with the dynamic initialization of commands"

* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
block: call bio_uninit in bio_endio
nvmet: avoid unneeded assignment of submit_bio return value
nvme-pci: add module parameter for io queue depth
nvme-pci: compile warnings in nvme_alloc_host_mem()
nvmet_fc: Accept variable pad lengths on Create Association LS
nvme_fc/nvmet_fc: revise Create Association descriptor length
lightnvm: pblk: remove unnecessary checks
lightnvm: pblk: control I/O flow also on tear down
cciss: initialize struct scsi_req
null_blk: fix error flow for shared tags during module_init
block: Fix __blkdev_issue_zeroout loop
nvme-rdma: unconditionally recycle the request mr
nvme: split nvme_uninit_ctrl into stop and uninit
virtio_blk: quiesce/unquiesce live IO when entering PM states
mtip32xx: quiesce request queues to make sure no submissions are inflight
nbd: quiesce request queues to make sure no submissions are inflight
nvme: kick requeue list when requeueing a request instead of when starting the queues
nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
...

Linus Torvalds
2017-07-12 06:36:52 +0800

08 Jul, 2017

1 commit

b6ffe9ba4 Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"libnvdimm updates for the latest ACPI and UEFI specifications. This
pull request also includes new 'struct dax_operations' enabling that
undoes the abuse of copy_user_nocache() for copy operations to pmem.

The dax work originally missed 4.12 to address concerns raised by Al.

Summary:

- Introduce the _flushcache() family of memory copy helpers and use
them for persistent memory write operations on x86. The
_flushcache() semantic indicates that the cache is either bypassed
for the copy operation (movnt) or any lines dirtied by the copy
operation are written back (clwb, clflushopt, or clflush).

- Extend dax_operations with ->copy_from_iter() and ->flush()
operations. These operations and other infrastructure updates allow
all persistent memory specific dax functionality to be pushed into
libnvdimm and the pmem driver directly. It also allows dax-specific
sysfs attributes to be linked to a host device, for example:
/sys/block/pmem0/dax/write_cache

- Add support for the new NVDIMM platform/firmware mechanisms
introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
namespace label format, extensions to the address-range-scrub
command set, new error injection commands, and a new BTT
(block-translation-table) layout. These updates support inter-OS
and pre-OS compatibility.

- Fix a longstanding memory corruption bug in nfit_test.

- Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
capable.

- Miscellaneous fixes and small updates across libnvdimm and the nfit
driver.

Acknowledgements that came after the branch was pushed: commit
6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
libnvdimm, namespace: record 'lbasize' for pmem namespaces
acpi/nfit: Issue Start ARS to retrieve existing records
libnvdimm: New ACPI 6.2 DSM functions
acpi, nfit: Show bus_dsm_mask in sysfs
libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
acpi, nfit: Enable DSM pass thru for root functions.
libnvdimm: passthru functions clear to send
libnvdimm, btt: convert some info messages to warn/err
libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
libnvdimm: fix the clear-error check in nsio_rw_bytes
libnvdimm, btt: fix btt_rw_page not returning errors
acpi, nfit: quiet invalid block-aperture-region warnings
libnvdimm, btt: BTT updates for UEFI 2.7 format
acpi, nfit: constify *_attribute_group
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
libnvdimm, pmem, dax: export a cache control attribute
dax: convert to bitmask for flags
dax: remove default copy_from_iter fallback
libnvdimm, nfit: enable support for volatile ranges
libnvdimm, pmem: fix persistence warning
...

Linus Torvalds
2017-07-08 00:44:06 +0800

04 Jul, 2017

4 commits

9d92573ff Merge branch 'for-4.13/dax' into libnvdimm-for-next

Dan Williams
2017-07-04 07:54:58 +0800

2de5148ff libnvdimm, namespace: record 'lbasize' for pmem namespaces

Commit f979b13c3cc5 "libnvdimm, label: honor the lba size specified in
v1.2 labels") neglected to update the 'lbasize' in the label when the
namespace sector_size attribute was written. We need this value in the
label for inter-OS / pre-OS compatibility.

Fixes: f979b13c3cc5 ("libnvdimm, label: honor the lba size specified in v1.2 labels")
Signed-off-by: Dan Williams

Dan Williams
2017-07-04 07:30:44 +0800

b1fb2c52b block: guard bvec iteration logic

Currently, if someone tries to advance a bvec beyond its size, we simply
dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
This means that we end up dereferencing or corrupting a random memory
region.

A sane reaction would be to propagate the error back to the calling
context, but bvec_iter_advance's calling context is not always good for
error handling. For safety, truncate the iterator size to zero, which
breaks the external iteration loop and prevents unpredictable memory
range corruption. And even if the caller ignores the error, it will
corrupt its own bvecs, not others.

This patch:
- Returns an error to the caller in the hope that it will react to it
- Truncates the iterator size
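
The two behaviours listed above can be sketched with a simplified user-space iterator. This is only an illustration under stated assumptions: struct seg, struct seg_iter, and seg_iter_advance() are hypothetical stand-ins, not the kernel's struct bio_vec, struct bvec_iter, or bvec_iter_advance(), and the sketch assumes the iterator's size never exceeds the bytes remaining in the segment array.

```c
#include <stdbool.h>
#include <stddef.h>

struct seg {
	size_t len;		/* length of this segment in bytes */
};

struct seg_iter {
	size_t idx;		/* current segment index */
	size_t done;		/* bytes already consumed in the current segment */
	size_t size;		/* bytes remaining in the whole iteration */
};

/* Advance by 'bytes'; on an over-long request, refuse and stop the iteration. */
static bool seg_iter_advance(const struct seg *segs, struct seg_iter *it,
			     size_t bytes)
{
	if (bytes > it->size) {
		it->size = 0;	/* any "while (it->size)" loop now terminates */
		return false;
	}

	while (bytes) {
		size_t left = segs[it->idx].len - it->done;
		size_t step = bytes < left ? bytes : left;

		bytes -= step;
		it->size -= step;
		it->done += step;
		if (it->done == segs[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
	return true;
}
```

Because the size is zeroed before returning, a caller that ignores the return value still stops iterating instead of walking past the end of the array.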

The code was added a long time ago in 4550dd6c; luckily no one has hit it
in real life :)

Signed-off-by: Dmitry Monakhov
Reviewed-by: Ming Lei
Reviewed-by: Martin K. Petersen
[hch: switch to true/false returns instead of errno values]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Dmitry Monakhov
2017-07-04 06:56:26 +0800

e23947bd7 bio-integrity: fold bio_integrity_enabled to bio_integrity_prep

Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore its error code and fail the bio with EIO. Let's return the real
error to the upper layer, so the caller may react accordingly.

In fact no one wants to use bio_integrity_prep() without
bio_integrity_enabled(), so it is reasonable to fold them into one function.

Signed-off-by: Dmitry Monakhov
Reviewed-by: Martin K. Petersen
[hch: merged with the latest block tree,
return bool from bio_integrity_prep]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Dmitry Monakhov
2017-07-04 06:56:24 +0800

01 Jul, 2017

4 commits

53b85a449 libnvdimm: passthru functions clear to send

Have dsm functions called via the pass thru mechanism also
be checked against clear to send.

Signed-off-by: Jerry Hoemann
Signed-off-by: Dan Williams

Jerry Hoemann
2017-07-01 23:49:59 +0800

e6be2dcbe libnvdimm, btt: convert some info messages to warn/err

Some critical messages, such as IO errors and metadata failures, were
printed with dev_info. Make them louder by upgrading them to dev_warn or
dev_err.

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-07-01 23:49:59 +0800

6aa734a2f libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime

We need to hold a reference on the 'dirent' until we are sure there are
no more notifications that will be sent. As noted in the new comments we
take advantage of the fact that the references are taken and dropped
under device_lock() and that nd_device_notify() holds device_lock() over
new badblocks notifications. The notifications that happen when
badblocks are cleared only occur while the device is active.

Also take the opportunity to fix up the error messages to report the
user visible effect of a sysfs_get_dirent() failure.

Fixes: 975750a98c26 ("libnvdimm, pmem: Add sysfs notifications to badblocks")
Cc: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2017-07-01 09:56:03 +0800

7e5a21dfe libnvdimm: fix the clear-error check in nsio_rw_bytes

A leftover from the 'bandaid' fix that disabled BTT error clearing in
rw_bytes resulted in an incorrect check. After we converted these checks
over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
redundant, and incorrect. Remove it.

Fixes: 3ae3d67ba705 ("libnvdimm: add an atomic vs process context flag to rw_bytes")
Cc: Dave Jiang
Cc: Dan Williams
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-07-01 09:50:34 +0800

30 Jun, 2017

5 commits

c13c43d54 libnvdimm, btt: fix btt_rw_page not returning errors

btt_rw_page was not propagating errors from btt_do_bvec, resulting in any
IO errors via the rw_page path going unnoticed. The pmem driver recently
fixed this in e10624f ("pmem: fail io-requests to known bad blocks"),
but the same problem in BTT went neglected.

Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Cc: Toshi Kani
Cc: Dan Williams
Cc: Jeff Moyer
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-06-30 09:24:28 +0800

d5d51fece acpi, nfit: quiet invalid block-aperture-region warnings

This state is already visible to userspace since the BLK region will not
be enabled, and it is otherwise benign as it usually indicates that the
DIMM is not configured.

Signed-off-by: Dan Williams

Dan Williams
2017-06-30 04:50:38 +0800

14e494542 libnvdimm, btt: BTT updates for UEFI 2.7 format

The UEFI 2.7 specification defines an updated BTT metadata format,
bumping the revision to 2.0. Add support for the new format, while
retaining compatibility for the old 1.1 format.

Cc: Toshi Kani
Cc: Linda Knippers
Cc: Dan Williams
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-06-30 04:50:38 +0800

0b277961f libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region

The pmem driver attaches to both persistent and volatile memory ranges
advertised by the ACPI NFIT. When the region is volatile it is redundant
to spend cycles flushing caches at fsync(). Check if the hosting region
is volatile and do not set dax_write_cache() if it is.

Cc: Jan Kara
Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2017-06-30 00:29:50 +0800

6e0c90d69 libnvdimm, pmem, dax: export a cache control attribute

The dax_flush() operation can be turned into a nop on platforms where
firmware arranges for cpu caches to be flushed on a power-fail event.
The ACPI 6.2 specification defines a mechanism for the platform to
indicate this capability so the kernel can select the proper default.
However, for other platforms, the administrator must toggle this setting
manually.

Given this flush setting is a dax-specific mechanism we advertise it
through a 'dax' attribute group hanging off a host device. For example,
a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
'write_cache' attribute to control response to dax cache flush requests.
This is similar to the 'queue/write_cache' attribute that appears under
block devices.

Cc: Jan Kara
Cc: Jeff Moyer
Cc: Matthew Wilcox
Cc: Ross Zwisler
Suggested-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2017-06-30 00:29:50 +0800

28 Jun, 2017

5 commits

c9e582aa6 libnvdimm, nfit: enable support for volatile ranges

Allow volatile nfit ranges to participate in all the same infrastructure
provided for persistent memory regions. A resulting namespace
device will still be called "pmem", but the parent region type will be
"nd_volatile". This is in preparation for disabling the dax ->flush()
operation in the pmem driver when it is hosted on a volatile range.

Cc: Jan Kara
Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2017-06-28 07:44:13 +0800

c00b396ef libnvdimm, pmem: fix persistence warning

The pmem driver assumes if platform firmware describes the memory
devices associated with a persistent memory range and
CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
flush data to a power-fail safe zone. We warn if the firmware does not
describe memory devices, but we also need to warn if the architecture
does not claim pmem support.

Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2017-06-28 07:44:01 +0800

ca6a4657e x86, libnvdimm, pmem: remove global pmem api

Now that all callers of the pmem api have been converted to dax helpers that
call back to the pmem driver, we can remove include/linux/pmem.h and
asm/pmem.h.

Cc: Jeff Moyer
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Toshi Kani
Cc: Oliver O'Halloran
Cc: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2017-06-28 07:29:54 +0800

f2b612578 x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm

Kill this globally defined wrapper and move to libnvdimm so that we can
ultimately remove include/linux/pmem.h and asm/pmem.h.

Cc: Jeff Moyer
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Matthew Wilcox
Cc: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2017-06-28 07:29:00 +0800

0b0bcacc3 block: don't bother with bounce limits for make_request drivers

We only call blk_queue_bounce for request-based drivers, so stop messing
with it for make_request based drivers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-06-28 02:13:45 +0800

16 Jun, 2017

12 commits

4e4f00a9b x86, dax, libnvdimm: remove wb_cache_pmem() indirection

With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
libnvdimm and the pmem driver directly we do not need to provide global
wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
driver will simply not link to arch_wb_cache_pmem() in that case. Same
as before, pmem flushing is only defined for x86_64, via
clean_cache_range(), but it is straightforward to add other archs in the
future.

arch_wb_cache_pmem() is an exported function since the pmem module needs
to find it, but it is privately declared in drivers/nvdimm/pmem.h because
there are no consumers outside of the pmem driver.

Cc: Jan Kara
Cc: Jeff Moyer
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Oliver O'Halloran
Cc: Matthew Wilcox
Cc: Ross Zwisler
Suggested-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:35:24 +0800

3c1cebff2 dax, pmem: introduce an optional 'flush' dax_operation

Filesystem-DAX flushes caches whenever it writes to the address returned
through dax_direct_access() and when writing back dirty radix entries.
That flushing is only required in the pmem case, so add a dax operation
to allow pmem to take this extra action, but skip it for other dax
capable devices that do not provide a flush routine.

An example for this differentiation might be a volatile ram disk where
there is no expectation of persistence. In fact the pmem driver itself might
front such an address range specified by the NFIT. So, this "no flush"
property might be something passed down by the bus / libnvdimm.

Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:34:59 +0800

975750a98 libnvdimm, pmem: Add sysfs notifications to badblocks

Sysfs "badblocks" information may be updated at run time:
- MCE, SCI, and sysfs "scrub" may add new bad blocks
- Writes and ioctl() may clear bad blocks

Add support to send sysfs notifications to sysfs "badblocks" file
under region and pmem directories when their badblocks information
is re-evaluated (but is not necessarily changed) during run-time.
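
From userspace, a notification of this kind is typically consumed by polling the attribute file for POLLPRI | POLLERR. The following is a minimal sketch of that general sysfs pattern, not code from this commit: wait_for_sysfs_change() is a hypothetical helper name, and the path passed to it (for example /sys/block/pmem0/badblocks) is an assumption about the local system.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until the sysfs attribute at 'path' is notified or the timeout
 * (in milliseconds, -1 for no timeout) expires.
 * Returns 1 on notification, 0 on timeout, -1 on error.
 */
int wait_for_sysfs_change(const char *path, int timeout_ms)
{
	char buf[4096];
	struct pollfd pfd;
	int ret;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	/* Read the current contents first so that only later events wake us. */
	while (read(pfd.fd, buf, sizeof(buf)) > 0)
		;

	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;
	ret = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);

	if (ret < 0)
		return -1;
	return ret > 0 ? 1 : 0;
}
```

Since the commit notes that a notification is sent whenever the badblocks information is re-evaluated, even if nothing changed, a caller would re-read the file after each wakeup and compare it with the previous contents.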

Signed-off-by: Toshi Kani
Cc: Vishal Verma
Cc: Linda Knippers
Signed-off-by: Dan Williams

Toshi Kani
2017-06-16 05:31:41 +0800

8990cdf10 libnvdimm, label: switch to using v1.2 labels by default

The rules for which version of the label specification is in effect at
any given point in time are as follows:

1/ If a DIMM has an existing / valid index block then the version
specified is used, regardless of whether it is a previous version.

2/ By default when the kernel is initializing new index blocks the
latest specification version (v1.2 at time of writing) is used.

3/ An environment that wants to force create v1.1 label-sets must
arrange for userspace to disable all active regions / namespaces /
dimms and write a valid set of v1.1 index blocks to the dimms.

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:41 +0800

b3fde74ea libnvdimm, label: add address abstraction identifiers

Starting with v1.2 labels, 'address abstractions' can be hinted via an
address abstraction id that implies an info-block format. The standard
address abstraction in the specification is the v2 format of the
Block-Translation-Table (BTT). Support for that is saved for a later
patch, for now we add support for the Linux supported address
abstractions BTT (v1), PFN, and DAX.

The new 'holder_class' attribute for namespace devices is added for
tooling to specify the 'abstraction_guid' to store in the namespace label.
For v1.1 labels this field is undefined and any setting of
'holder_class' away from the default 'none' value will only have effect
until the driver is unloaded. Setting 'holder_class' requires that
whatever device tries to claim the namespace must be of the specified
class.

Cc: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:40 +0800

355d83887 libnvdimm, label: add v1.2 label checksum support

The v1.2 namespace label specification adds a fletcher checksum to each
label instance. Add generation and validation support for the new field.
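
As an illustration of what a Fletcher-style checksum computes, here is a minimal sketch of a 64-bit variant over 32-bit words. It is a generic example written for this note, not the kernel's helper: the name fletcher64_words() is hypothetical, the input is assumed to already be in host byte order, and no claim is made that it matches the label format bit for bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-style checksum: a running sum and a running sum of sums. */
static uint64_t fletcher64_words(const uint32_t *words, size_t count)
{
	uint32_t lo = 0;	/* sum of all words, modulo 2^32 */
	uint64_t hi = 0;	/* sum of every intermediate 'lo' value */
	size_t i;

	for (i = 0; i < count; i++) {
		lo += words[i];
		hi += lo;
	}

	/* The low 32 bits of 'hi' become the upper half of the result. */
	return (hi << 32) | lo;
}
```

A common way to validate a stored checksum is to recompute it over the structure with the checksum field zeroed and compare the two values; whether the label code does exactly that is not stated in this commit message.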

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:40 +0800

3934d8410 libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces

The v1.2 namespace label specification requires 'nlabel' and 'position'
to be valid for the first ("lowest dpa") label in the set. It also
requires all non-first labels to set those fields to 0xff.

Linux does not much care if these values are correct, because we can
just trust the count of labels with the matching uuid like the v1.1
case. However, we set them correctly in case other environments care.

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:40 +0800

8f2bc2430 libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces

Starting with the v1.2 definition of namespace labels, the isetcookie
field is populated and validated for blk-aperture namespaces. This adds
some safety against inadvertent copying of namespace labels from one
DIMM-device to another.

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:40 +0800

faec6f8a1 libnvdimm, label: populate the type_guid property for v1.2 namespaces

The type_guid refers to the "Address Range Type GUID" for the region
backing a namespace as defined in the ACPI NFIT (NVDIMM Firmware Interface
Table). This 'type' identifier specifies an access mechanism for the
given namespace. This capability replaces the confusing usage of the
'NSLABEL_FLAG_LOCAL' flag to indicate a block-aperture-mode namespace.

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:40 +0800

f979b13c3 libnvdimm, label: honor the lba size specified in v1.2 labels

Previously we only honored the lba size for blk-aperture mode
namespaces. For pmem namespaces the lba size was just assumed to be 512.
With the new v1.2 label definition and compatibility with other
operating environments, the ->lbasize property is now respected for pmem
namespaces.

Cc: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:39 +0800

c12c48ce8 libnvdimm, label: add v1.2 interleave-set-cookie algorithm

The interleave-set-cookie algorithm is extended to incorporate all the
same components that are used to generate an nvdimm unique-id. For
backwards compatibility we still maintain the old v1.1 definition.

Reported-by: Nicholas Moulin
Reported-by: Kaushik Kanetkar
Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:39 +0800

564e871aa libnvdimm, label: add v1.2 nvdimm label definitions

In support of improved interoperability between operating systems and pre-boot
environments, the Intel-proposed NVDIMM Namespace Specification [1] has been
adopted and modified into the UEFI 2.7 NVDIMM Label Protocol [2].

Update the definitions of the namespace label data structures so that the new
format can be supported alongside the existing label format.

The new specification changes the default label size to 256 bytes, so
everywhere that relied on sizeof(struct nd_namespace_label) must now use the
sizeof_namespace_label() helper.
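
The reason a compile-time sizeof() no longer works can be sketched as follows: once the label size is a run-time property, slot offsets have to be computed from a stored size. This is an illustration only; struct label_area, label_size_of(), and label_at() are hypothetical names, not the kernel's data structures or its sizeof_namespace_label() helper.

```c
#include <stddef.h>
#include <stdint.h>

struct label_area {
	uint8_t *base;		/* start of the label storage */
	size_t label_size;	/* 128 or 256 bytes, discovered at run time */
};

/* Run-time replacement for a compile-time sizeof(struct ...). */
static size_t label_size_of(const struct label_area *area)
{
	return area->label_size;
}

/* Address of label slot 'slot', using the run-time stride. */
static uint8_t *label_at(const struct label_area *area, size_t slot)
{
	return area->base + slot * label_size_of(area);
}
```

With a 128-byte format, slot 3 starts at byte 384; with a 256-byte format, the same slot starts at byte 768, so any code that hard-codes one struct size would index the wrong location for the other format.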

There should be no functional differences from these changes as the
default is still the v1.1 128-byte format. Future patches will move the
default to the v1.2 definition.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
[2]: http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf

Signed-off-by: Dan Williams

Dan Williams
2017-06-16 05:31:39 +0800

13 Jun, 2017

1 commit

fdd050b5b Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base

Christoph Hellwig
2017-06-13 17:45:14 +0800

10 Jun, 2017

1 commit

0aed55af8 x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations

The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol; they fall back to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: Jan Kara
Cc: Jeff Moyer
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Toshi Kani
Cc: "H. Peter Anvin"
Cc: Al Viro
Cc: Thomas Gleixner
Cc: Matthew Wilcox
Reviewed-by: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2017-06-10 00:09:56 +0800

09 Jun, 2017

1 commit

4e4cbee93 block: switch bios to blk_status_t

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep around at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-06-09 23:27:32 +0800

05 Jun, 2017

1 commit

ef40dda5b uuid: hoist uuid_is_null() helper from libnvdimm

Hoist the libnvdimm helper as an inline helper to linux/uuid.h
using an auxiliary const variable uuid_null in lib/uuid.c.

[hch: also add the guid variant. Both do the same but I'd like
to keep casts to a minimum]

The common helper uses the new abstract type uuid_t * instead of
u8 *.
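
The shape of such a helper, a typed 16-byte value compared against a constant all-zero instance, can be sketched in user-space C. The names example_uuid_t, example_uuid_null, and example_uuid_is_null() are hypothetical stand-ins chosen to avoid implying that this is the kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* An abstract 16-byte type instead of a bare byte pointer. */
typedef struct {
	uint8_t b[16];
} example_uuid_t;

/* Auxiliary constant: the all-zero UUID. */
static const example_uuid_t example_uuid_null = { { 0 } };

/* True when every byte of 'uuid' is zero. */
static inline bool example_uuid_is_null(const example_uuid_t *uuid)
{
	return memcmp(uuid, &example_uuid_null, sizeof(*uuid)) == 0;
}
```

Taking a pointer to the struct type rather than to raw bytes lets the compiler reject callers that pass an unrelated buffer, which is the kind of cast the commit message says it wants to keep to a minimum.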

Suggested-by: Christoph Hellwig
Signed-off-by: Amir Goldstein
[hch: added guid_is_null]
Signed-off-by: Christoph Hellwig
Acked-by: Dan Williams
Reviewed-by: Andy Shevchenko

Christoph Hellwig
2017-06-05 22:59:05 +0800

13 May, 2017

1 commit

0fcc3ab23 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:

- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.

- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.

- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.

These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

Linus Torvalds
2017-05-13 06:43:10 +0800

11 May, 2017

2 commits

b177fe85d libnvdimm, btt: ensure that initializing metadata clears poison

If we had badblocks/poison in the metadata area of a BTT, recreating the
BTT would not clear the poison in all cases, notably the flog area. This
is because rw_bytes will only clear errors if the request being sent
down is 512B aligned and sized.

Make sure that when writing the map and info blocks, the rw_bytes being
sent are of the correct size/alignment. For the flog, instead of doing
the smaller log_entry writes only, first do a 'wipe' of the entire area
by writing zeroes in large enough chunks so that errors get cleared.
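
The condition described above, that errors are only cleared for requests that are 512B aligned and sized, can be expressed as a small predicate. This is a sketch of the rule as stated in the commit message, not the driver's code; can_clear_errors() and CLEAR_UNIT are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_UNIT 512u	/* granularity named in the commit message */

/* True when a write at 'offset' of 'size' bytes is eligible to clear errors. */
static bool can_clear_errors(uint64_t offset, size_t size)
{
	return size != 0 &&
	       offset % CLEAR_UNIT == 0 &&
	       size % CLEAR_UNIT == 0;
}
```

Under this rule a small write such as can_clear_errors(4096, 64) is not eligible, while a wipe issued as can_clear_errors(4096, 4096) is, which is why zeroing the flog area in large aligned chunks first allows the poison to be cleared.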

Cc: Andy Rudoff
Cc: Dan Williams
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-05-11 12:46:22 +0800

3ae3d67ba libnvdimm: add an atomic vs process context flag to rw_bytes

nsio_rw_bytes can clear media errors, but this cannot be done while we
are in an atomic context due to locking within ACPI. From the BTT,
->rw_bytes may be called either from atomic or process context depending
on whether the calls happen during initialization or during IO.

During init, we want to ensure error clearing happens, and the flag
marking process context allows nsio_rw_bytes to do that. When called
during IO, we're in atomic context, and error clearing can be skipped.

Cc: Dan Williams
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2017-05-11 12:46:22 +0800