Eric Lee / smarc-fsl-linux-kernel

22 Oct, 2015

4 commits

4125a09b0 block, libnvdimm, nvme: provide a built-in blk_integrity nop profile

The libnvdimm-btt and nvme drivers use blk_integrity to reserve space
for per-sector metadata, but sometimes without protection checksums.
This property is generically useful, so teach the block core to
internally specify a nop profile if one is not provided at registration
time.
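
As a rough illustration of the fallback described above, here is a minimal userspace sketch. The type and function names (integrity_profile, integrity, integrity_register, nop_profile) are simplified stand-ins chosen for this example, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct integrity_profile {
    const char *name;
    int (*generate_fn)(void *data, size_t len);
    int (*verify_fn)(const void *data, size_t len);
};

struct integrity {
    const struct integrity_profile *profile;
    unsigned char tuple_size;   /* bytes of metadata reserved per sector */
};

static int nop_generate(void *data, size_t len)
{
    (void)data;
    (void)len;
    return 0;                   /* nothing to compute */
}

static int nop_verify(const void *data, size_t len)
{
    (void)data;
    (void)len;
    return 0;                   /* always passes */
}

static const struct integrity_profile nop_profile = {
    .name = "nop",
    .generate_fn = nop_generate,
    .verify_fn = nop_verify,
};

/* Copy the template; fall back to the built-in nop profile when the
 * caller reserves metadata space but supplies no checksum routines. */
void integrity_register(struct integrity *bi, const struct integrity *tmpl)
{
    bi->profile = tmpl->profile ? tmpl->profile : &nop_profile;
    bi->tuple_size = tmpl->tuple_size;
}
```

With this shape a driver only fills in tuple_size and leaves profile NULL, and callers can invoke the profile hooks unconditionally.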

Cc: Keith Busch
Cc: Matthew Wilcox
Suggested-by: Christoph Hellwig
[hch: kill the local nvme nop profile as well]
Acked-by: Martin K. Petersen
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe

Dan Williams
2015-10-22 04:43:45 +0800
9609b9942 md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown

Now that the integrity profile is statically allocated there is no work
to do when shutting down an integrity enabled block device.

Cc: Matthew Wilcox
Cc: Mike Snitzer
Cc: James Bottomley
Acked-by: NeilBrown
Acked-by: Keith Busch
Acked-by: Vishal Verma
Tested-by: Ross Zwisler
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe

Dan Williams
2015-10-22 04:43:37 +0800
25520d55c block: Inline blk_integrity in struct gendisk

Up until now the integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

- Moving the blk_integrity definition to genhd.h.

- Inlining blk_integrity in struct gendisk.

- Removing the dynamic allocation code.

- Adding helper functions which allow gendisk to set up and tear down
the integrity sysfs dir when a disk is added/deleted.

- Adding a blk_integrity_revalidate() callback for updating the stable
pages bdi setting.

- The calls that depend on whether a device has an integrity profile or
not now key off of the bi->profile pointer.

- Simplifying the integrity support routines in DM (Mike Snitzer).
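
The resulting layout can be pictured with the following userspace sketch. It is a simplified model under assumed names (disk, integrity, disk_get_integrity), not the kernel's struct gendisk; it only shows the embedded structure (four one-byte fields plus a pointer) and the "key off the profile pointer" test.

```c
#include <stdbool.h>
#include <stddef.h>

struct integrity_profile { const char *name; };

/* Small enough to live inside the disk structure itself. */
struct integrity {
    const struct integrity_profile *profile;  /* NULL: no integrity support */
    unsigned char flags;
    unsigned char tuple_size;
    unsigned char interval_exp;
    unsigned char tag_size;
};

struct disk {
    const char *name;
    struct integrity integrity;   /* embedded, not separately allocated */
};

/* Return the embedded profile data, or NULL when none is registered. */
struct integrity *disk_get_integrity(struct disk *d)
{
    return d->integrity.profile ? &d->integrity : NULL;
}

bool disk_has_integrity(struct disk *d)
{
    return disk_get_integrity(d) != NULL;
}
```

Because the structure is always present, registration can happen before the disk is made active, and "no profile" is expressed by a NULL pointer rather than by a missing allocation.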

Signed-off-by: Martin K. Petersen
Reported-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Mike Snitzer
Cc: Dan Williams
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe

Martin K. Petersen
2015-10-22 04:42:42 +0800
0f8087ecd block: Consolidate static integrity profile properties

We previously made a complete copy of a device's data integrity profile
even though several of the fields inside the blk_integrity struct are
pointers to fixed template entries in t10-pi.c.

Split the static and per-device portions so that we can reference the
template directly.

Signed-off-by: Martin K. Petersen
Reported-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Cc: Dan Williams
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe

Martin K. Petersen
2015-10-22 04:42:38 +0800

17 Sep, 2015

3 commits

ba8fe0f85 pmem: add proper fencing to pmem_rw_page()

pmem_rw_page() needs to call wmb_pmem() on writes to make sure that the
newly written data is durable. This flow was added to pmem_rw_bytes()
and pmem_make_request() with this commit:

commit 61031952f4c8 ("arch, x86: pmem api for ensuring durability of
persistent memory updates")

...the pmem_rw_page() path was missed.
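
A minimal userspace model of the fixed flow is sketched below. The model_* names are hypothetical, plain memcpy() stands in for memcpy_to_pmem(), and the barrier is replaced by a counter so the behaviour can be observed; a real wmb_pmem() would drain CPU and platform write buffers.

```c
#include <stddef.h>
#include <string.h>

enum rw_dir { RW_READ, RW_WRITE };

static unsigned long fence_count;   /* counts barriers, for illustration only */

/* Stand-in for wmb_pmem(). */
static void model_wmb_pmem(void)
{
    fence_count++;
}

/* Copy one segment between the page buffer and the pmem range. */
static void model_do_bvec(char *pmem, char *page, size_t len, size_t off,
                          enum rw_dir rw)
{
    if (rw == RW_READ)
        memcpy(page, pmem + off, len);
    else
        memcpy(pmem + off, page, len);   /* stand-in for memcpy_to_pmem() */
}

/* Page I/O entry point: every write is followed by a durability barrier. */
int model_rw_page(char *pmem, char *page, size_t len, size_t off,
                  enum rw_dir rw)
{
    model_do_bvec(pmem, page, len, off, rw);
    if (rw == RW_WRITE)
        model_wmb_pmem();
    return 0;
}

unsigned long model_fence_count(void)
{
    return fence_count;
}
```

Reads skip the barrier because they do not create new data that must reach persistent media.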

Cc:
Signed-off-by: Ross Zwisler
Signed-off-by: Dan Williams

Ross Zwisler
2015-09-17 23:49:28 +0800
4ca8b57a0 libnvdimm: pfn_devs: Fix locking in namespace_store

Always take device_lock() before nvdimm_bus_lock() to prevent deadlock.

Signed-off-by: Axel Lin
Signed-off-by: Dan Williams

Axel Lin
2015-09-17 23:47:50 +0800
4be9c1fc3 libnvdimm: btt_devs: Fix locking in namespace_store

Always take device_lock() before nvdimm_bus_lock() to prevent deadlock.

Cc:
Signed-off-by: Axel Lin
Signed-off-by: Dan Williams

Axel Lin
2015-09-17 23:37:16 +0800

09 Sep, 2015

1 commit

12f03ee60 Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().

Summary:

- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.

This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').

For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.

- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.

Completion of the conversion is targeted for v4.4.

- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.

- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.

- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...

Linus Torvalds
2015-09-09 05:35:59 +0800

03 Sep, 2015

1 commit

1081230b7 Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
"This first core part of the block IO changes contains:

- Cleanup of the bio IO error signaling from Christoph. We used to
rely on the uptodate bit and passing around of an error, now we
store the error in the bio itself.

- Improvement of the above from myself, by shrinking the bio size
down again to fit in two cachelines on x86-64.

- Revert of the max_hw_sectors cap removal from a revision again,
from Jeff Moyer. This caused performance regressions in various
tests. Reinstate the limit, bump it to a more reasonable size
instead.

- Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
Most devices have huge trim limits, which can cause nasty latencies
when deleting files. Enable the admin to configure the size down.
We will look into having a more sane default instead of UINT_MAX
sectors.

- Improvement of the SG gaps logic from Keith Busch.

- Enable the block core to handle arbitrarily sized bios, which
enables a nice simplification of bio_add_page() (which is an IO hot
path). From Kent.

- Improvements to the partition io stats accounting, making it
faster. From Ming Lei.

- Also from Ming Lei, a basic fixup for overflow of the sysfs pending
file in blk-mq, as well as a fix for a blk-mq timeout race
condition.

- Ming Lin has been carrying Kent's above-mentioned patches forward
for a while, and testing them. Ming also did a few fixes around
that.

- Sasha Levin found and fixed a use-after-free problem introduced by
the bio->bi_error changes from Christoph.

- Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
blk: Fix bio_io_vec index when checking bvec gaps
block: Replace SG_GAPS with new queue limits mask
block: bump BLK_DEF_MAX_SECTORS to 2560
Revert "block: remove artifical max_hw_sectors cap"
blk-mq: fix race between timeout and freeing request
blk-mq: fix buffer overflow when reading sysfs file of 'pending'
Documentation: update notes in biovecs about arbitrarily sized bios
block: remove bio_get_nr_vecs()
fs: use helper bio_add_page() instead of open coding on bi_io_vec
block: kill merge_bvec_fn() completely
md/raid5: get rid of bio_fits_rdev()
md/raid5: split bio for chunk_aligned_read
block: remove split code in blkdev_issue_{discard,write_same}
btrfs: remove bio splitting and merge_bvec_fn() calls
bcache: remove driver private bio splitting code
block: simplify bio_add_page()
block: make generic_make_request handle arbitrarily sized bios
blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
block: don't access bio->bi_error after bio_put()
block: shrink struct bio down to 2 cache lines again
...

Linus Torvalds
2015-09-03 04:10:25 +0800

29 Aug, 2015

3 commits

004f1afbe libnvdimm, pmem: direct map legacy pmem by default

The expectation is that the legacy / non-standard pmem discovery method
(e820 type-12) will only ever be used to describe small quantities of
persistent memory. Larger capacities will be described via the ACPI
NFIT. When "allocate struct page from pmem" support is added this default
policy can be overridden by assigning a legacy pmem namespace to a pfn
device, however this would only be necessary if a platform used the
legacy mechanism to define a very large range.

Cc: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-08-29 11:40:05 +0800
32ab0a3f5 libnvdimm, pmem: 'struct page' for pmem

Enable the pmem driver to handle PFN device instances. Attaching a pmem
namespace to a pfn device triggers the driver to allocate and initialize
struct page entries for pmem. Memory capacity for this allocation comes
exclusively from RAM for now which is suitable for low PMEM to RAM
ratios. This mechanism will be expanded later for setting an "allocate
from PMEM" policy.

Cc: Boaz Harrosh
Cc: Ross Zwisler
Cc: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-08-29 11:40:04 +0800
e1455744b libnvdimm, pfn: 'struct page' provider infrastructure

Implement the base infrastructure for libnvdimm PFN devices. Similar to
BTT devices they take a namespace as a backing device and layer
functionality on top. In this case the functionality is reserving space
for an array of 'struct page' entries to be handed out through
pfn_to_page(). For now this is just the basic libnvdimm-device-model for
configuring the base PFN device.

As the namespace claiming mechanism for PFN devices is mostly identical
to BTT devices drivers/nvdimm/claim.c is created to house the common
bits.

Cc: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2015-08-29 11:39:36 +0800

28 Aug, 2015

4 commits

96601adb7 x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB

Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.

The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available. When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.

The default fall back for persistent memory handling remains. Namely,
map it with the WT (write-through) cache-type and hope for the best.

arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Reviewed-by: Ross Zwisler
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-08-28 07:40:59 +0800
cb389b9c0 dax: drop size parameter to ->direct_access()

None of the implementations currently use it. The common
bdev_direct_access() entry point handles all the size checks before
calling ->direct_access().

Signed-off-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-08-28 07:40:58 +0800
4a9bf88a5 Merge branch 'pmem-api' into libnvdimm-for-next

Dan Williams
2015-08-28 07:40:26 +0800
a06a75765 nvdimm: change to use generic kvfree()

Signed-off-by: yalin wang
Reviewed-by: Ross Zwisler
Signed-off-by: Dan Williams

yalin wang
2015-08-28 07:35:48 +0800

21 Aug, 2015

1 commit

e2e05394e pmem, dax: have direct_access use __pmem annotation

Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer. This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.

Signed-off-by: Ross Zwisler
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams

Ross Zwisler
2015-08-21 02:07:24 +0800

19 Aug, 2015

1 commit

7a67832c7 libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option

We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it. Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty. Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in. Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:

1/ Letting e820_pmem support be a module allows building and testing
libnvdimm.ko changes without rebooting

2/ All the normal policy around modules can be applied to e820_pmem
(unbind to disable and/or blacklisting the module from loading by
default)

3/ Moving the driver to a generic location and converting it to scan
"iomem_resource" rather than "e820.map" means any other architecture can
take advantage of this simple nvdimm resource discovery mechanism by
registering a resource named "Persistent Memory (legacy)"

Cc: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-08-19 12:34:34 +0800

15 Aug, 2015

4 commits

708ab62be pmem: switch to devm_ allocations

Signed-off-by: Christoph Hellwig
[djbw: tools/testing/nvdimm/ and memunmap_pmem support]
Reviewed-by: Ross Zwisler
Signed-off-by: Dan Williams

Christoph Hellwig
2015-08-15 04:01:21 +0800
6ec689542 libnvdimm, btt: write and validate parent_uuid

When a BTT is instantiated on a namespace it must validate the namespace
uuid matches the 'parent_uuid' stored in the btt superblock. This
property enforces that changing the namespace UUID invalidates all
former BTT instances on that storage. For "IO namespaces" that don't
have a label or UUID, the parent_uuid is set to zero, and this
validation is skipped. For such cases, old BTTs have to be invalidated
by forcing the namespace to raw mode, and overwriting the BTT info
blocks.
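
The validation rule can be sketched in userspace C as follows. The function name parent_uuid_ok and the treatment of a NULL namespace UUID are assumptions made for this example; only the "zero parent_uuid skips the check, otherwise the UUIDs must match" rule comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

static bool uuid_is_null(const unsigned char *uuid)
{
    static const unsigned char zero[UUID_LEN];

    return memcmp(uuid, zero, UUID_LEN) == 0;
}

/* ns_uuid: UUID of the namespace, or NULL when it has no label/UUID.
 * sb_parent_uuid: the parent_uuid recorded in the BTT info block. */
bool parent_uuid_ok(const unsigned char *ns_uuid,
                    const unsigned char *sb_parent_uuid)
{
    /* A zero parent_uuid was written for a label-less namespace: skip. */
    if (uuid_is_null(sb_parent_uuid))
        return true;
    /* Assumption: a recorded parent cannot match a missing UUID. */
    if (!ns_uuid)
        return false;
    return memcmp(ns_uuid, sb_parent_uuid, UUID_LEN) == 0;
}
```

Changing the namespace UUID makes the comparison fail, which is what invalidates a stale BTT on that storage.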

Based on a patch by Dan Williams

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-08-15 01:43:04 +0800
ab45e7632 libnvdimm, btt: consolidate arena validation

Use arena_is_valid as a common routine for checking the validity of an
info block from both discover_arenas, and nd_btt_probe.

As a result, don't check for validity of the BTT's UUID, and lbasize.
The checksum in the BTT info block guarantees self-consistency, and when
we're called from nd_btt_probe, we don't have a valid uuid or lbasize
available to check against.

Also cleanup to return a bool instead of an int.

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-08-15 01:43:04 +0800
fbde1414a libnvdimm, btt: clean up internal interfaces

Consolidate the parameters passed to arena_is_valid into just nd_btt,
and an info block to increase re-usability.

Similarly, btt_arena_write_layout doesn't need to be passed a uuid, as
it can be obtained from arena->nd_btt.

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-08-15 01:43:04 +0800

01 Aug, 2015

1 commit

f6ef5a2a5 nvdimm: fix inline function return type warning

Fix multiple build warnings when CONFIG_BTT is not enabled:

In file included from ../drivers/nvdimm/bus.c:29:0:
../drivers/nvdimm/nd.h:169:15: warning: return type defaults to 'int' [-Wreturn-type]
static inline nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata)
^
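
For reference, a corrected stub would look roughly like the sketch below. The -ENODEV return value is an assumption about the stub's behaviour; the point of the example is the explicit int return type that the warning asks for.

```c
#include <errno.h>
#include <stddef.h>

struct nd_namespace_common;   /* opaque here; defined by libnvdimm */

/* Stub used when BTT support is compiled out: the explicit 'int'
 * return type is what silences -Wreturn-type. */
static inline int nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata)
{
    (void)ndns;
    (void)drvdata;
    return -ENODEV;   /* assumed stub behaviour: no BTT device available */
}
```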

Signed-off-by: Randy Dunlap
Cc: Dan Williams
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Dan Williams

Randy Dunlap
2015-08-01 06:17:09 +0800

29 Jul, 2015

1 commit

4246a0b63 block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
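
The idea can be modelled in a few lines of userspace C. This is a heavily reduced, hypothetical model (the bi_parent field and model_bio_endio() are invented for the example, and the real completion path also tracks outstanding children); it only shows the error living in the bio and surviving the walk from child to parent.

```c
#include <errno.h>
#include <stddef.h>

struct bio;
typedef void (*bio_end_io_t)(struct bio *);

/* Hypothetical, heavily reduced model of struct bio. */
struct bio {
    int bi_error;            /* 0 on success, negative errno on failure */
    struct bio *bi_parent;   /* set when this bio is chained to a parent */
    bio_end_io_t bi_end_io;  /* completion callback; no error argument */
};

/* Complete a bio: the error travels in the bio, and the first error
 * seen by a child is carried up to its parent. */
void model_bio_endio(struct bio *bio)
{
    while (bio) {
        struct bio *parent = bio->bi_parent;

        if (parent && bio->bi_error && !parent->bi_error)
            parent->bi_error = bio->bi_error;
        if (bio->bi_end_io)
            bio->bi_end_io(bio);
        bio = parent;
    }
}
```

A submitter then inspects bio->bi_error in its completion handler instead of checking a flag and a separate error argument.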

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: NeilBrown
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-07-29 22:55:15 +0800

28 Jul, 2015

2 commits

6b47496a6 libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZE

Based on the patch c8fa317 ("brd: Request from fdisk 4k alignment") by Boaz
Harrosh, allow fdisk to create properly aligned partitions for DAX. This
will also cause mkfs.ext4 to emit a warning if using a file system block
size of less than PAGE_SIZE.

Cc: Dan Williams
Cc: Ross Zwisler
Cc: Matthew Wilcox
Cc: Christoph Hellwig
Cc: Elliott, Robert
Signed-off-by: Vishal Verma
Acked-by: Boaz Harrosh
Acked-by: Ross Zwisler
Signed-off-by: Dan Williams

Vishal Verma
2015-07-28 10:53:19 +0800
5e3294062 libnvdimm, btt: sparse fix

Fix:
drivers/nvdimm/btt.c:635:29: warning: restricted __le64 degrades to integer

Signed-off-by: Dan Williams

Dan Williams
2015-07-28 10:53:19 +0800

26 Jul, 2015

1 commit

8ca243536 libnvdimm: fix namespace seed creation

A new BLK namespace "seed" device is created whenever the current seed
is successfully probed. However, if that namespace is assigned to a BTT
it may never directly experience a successful probe as it is a
subordinate device to a BTT configuration.

The effect of the current code is that no new namespaces can be
instantiated, after the seed namespace, to consume available BLK DPA
capacity. Fix this by treating a successful BTT probe event as a
successful probe event for the backing namespace.

Reported-by: Nicholas Moulin
Signed-off-by: Dan Williams

Dan Williams
2015-07-26 00:57:56 +0800

01 Jul, 2015

2 commits

daa1dee40 nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails

Return proper error if class_create() fails.
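
The pattern behind this fix can be imitated in userspace as sketched below. ERR_PTR(), PTR_ERR() and IS_ERR() are reimplemented here to mirror the kernel helpers of the same names, while model_class_create() and model_bus_init() are hypothetical stand-ins with a 'fail' switch added purely for demonstration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct class_model { const char *name; };

/* Hypothetical stand-in for class_create(); 'fail' forces the error path. */
static struct class_model *model_class_create(const char *name, int fail)
{
    static struct class_model cls;

    if (fail)
        return ERR_PTR(-ENOMEM);
    cls.name = name;
    return &cls;
}

/* Propagate the encoded errno instead of a stale or zero return code. */
int model_bus_init(int fail)
{
    struct class_model *cls = model_class_create("nd", fail);

    if (IS_ERR(cls))
        return (int)PTR_ERR(cls);
    return 0;
}
```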

Signed-off-by: Axel Lin
Signed-off-by: Dan Williams

Axel Lin
2015-07-01 02:30:34 +0800
af834d457 libnvdimm: smatch cleanups in __nd_ioctl

Drop use of access_ok() since we are already using copy_{to|from}_user()
which do their own access_ok().

Reported-by: Dan Carpenter
Signed-off-by: Dan Williams

Dan Williams
2015-07-01 02:10:09 +0800

26 Jun, 2015

11 commits

61031952f arch, x86: pmem api for ensuring durability of persistent memory updates

Based on an original patch by Ross Zwisler [1].

Writes to persistent memory have the potential to be posted to cpu
cache, cpu write buffers, and platform write buffers (memory controller)
before being committed to persistent media. Provide apis,
memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to
pmem and assert that it is durable in PMEM (a persistent linear address
range). A '__pmem' attribute is added so sparse can track proper usage
of pointers to pmem.

This continues the status quo of pmem being x86 only for 4.2, but
reworks to ioremap, and wider implementation of memremap() will enable
other archs in 4.3.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Signed-off-by: Ross Zwisler
[djbw: various reworks]
Signed-off-by: Dan Williams

Ross Zwisler
2015-06-26 23:23:38 +0800
74ae66c3b libnvdimm: Add sysfs numa_node to NVDIMM devices

Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.

An example of numa_node values on a 2-socket system with a single
NVDIMM range on each socket is shown below.
/sys/bus/nd/devices
|-- btt0.0/numa_node:0
|-- btt1.0/numa_node:1
|-- btt1.1/numa_node:1
|-- namespace0.0/numa_node:0
|-- namespace1.0/numa_node:1
|-- region0/numa_node:0
|-- region1/numa_node:1

These numa_node files are then linked under the block class of
their device names.
/sys/class/block/pmem0/device/numa_node:0
/sys/class/block/pmem1s/device/numa_node:1

This enables numactl(8) to accept 'block:' and 'file:' paths of
pmem and btt devices as shown in the examples below.
numactl --preferred block:pmem0 --show
numactl --preferred file:/dev/pmem1s --show

Signed-off-by: Toshi Kani
Signed-off-by: Dan Williams

Toshi Kani
2015-06-26 23:23:38 +0800
41d7a6d63 libnvdimm: Set numa_node to NVDIMM devices

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.
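
The inheritance step can be pictured with a small userspace model. The dev_model structure, NODE_NONE and model_device_add() are names invented for this sketch; they only mimic the rule that a child without a node of its own takes the node of its parent.

```c
#include <stddef.h>

#define NODE_NONE (-1)   /* stand-in for "no NUMA node assigned" */

struct dev_model {
    struct dev_model *parent;
    int numa_node;
};

/* Register a device: children without a node of their own inherit
 * the node of their parent (for example, a namespace under a region). */
void model_device_add(struct dev_model *dev, struct dev_model *parent)
{
    dev->parent = parent;
    if (parent && dev->numa_node == NODE_NONE)
        dev->numa_node = parent->numa_node;
}
```

Only the region needs an explicit node derived from the NFIT proximity ID; everything registered beneath it picks the value up at add time.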

Signed-off-by: Toshi Kani
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams

Toshi Kani
2015-06-26 23:23:38 +0800
581388209 libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only

Upon detection of an unarmed dimm in a region, arrange for descendant
BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
"unarmed" via flags passed by platform firmware (NFIT).

The flags in the NFIT memory device sub-structure indicate the state of
the data on the nvdimm relative to its energy source or last "flush to
persistence". For the most part there is nothing the driver can do but
advertise the state of these flags in sysfs and emit a message if
firmware indicates that the contents of the device may be corrupted.
However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
the block devices incorporating that nvdimm to be marked read-only.
This is a safe default as the data is still available and new writes are
held off until the administrator either forces read-write mode, or the
energy source becomes armed.

A 'read_only' attribute is added to REGION devices to allow for
overriding the default read-only policy of all descendant block devices.
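
A simplified model of this policy is sketched below. All names (dimm_model, region_model, region_init_ro, disk_is_read_only) are hypothetical; the sketch only captures the default "any unarmed dimm makes the region read-only" rule and the fact that the region attribute can be overridden afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

struct dimm_model { bool armed; };

struct region_model {
    const struct dimm_model *dimms;
    size_t ndimms;
    int ro;   /* region 'read_only' attribute; administrator may override */
};

/* Default policy: any unarmed dimm makes the whole region read-only. */
void region_init_ro(struct region_model *r)
{
    r->ro = 0;
    for (size_t i = 0; i < r->ndimms; i++)
        if (!r->dimms[i].armed)
            r->ro = 1;
}

/* Descendant block devices (BTT, PMEM, BLK) follow the region setting. */
bool disk_is_read_only(const struct region_model *r)
{
    return r->ro != 0;
}
```

Writing 0 to the modelled attribute after initialisation corresponds to the administrator forcing read-write mode.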

Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
0f51c4fa7 pmem: flag pmem block devices as non-rotational

...since they are effectively SSDs as far as userspace is concerned.

Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
f0dc089ce libnvdimm: enable iostat

This is disabled by default as the overhead is prohibitive, but if the
user takes the action to turn it on we'll oblige.

Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
edc870e54 pmem: make_request cleanups

Various cleanups:

1/ Kill the BUG_ON since we've already told the block layer we don't
support DISCARD on all these drivers.

2/ Kill the 'rw' variable, no need to cache it.

3/ Kill the local 'sector' variable. bio_for_each_segment() is already
advancing the iterator's sector number by the bio_vec length.

4/ Kill the check for accessing past the end of the device;
generic_make_request_checks() already does that.

Suggested-by: Christoph Hellwig
[hch: kill access past end of the device check]
Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
43d3fa3a0 libnvdimm, pmem: fix up max_hw_sectors

There is no hardware limit to enforce on the size of the i/o that can be passed
to an nvdimm block device, so set it to UINT_MAX.

Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
fcae69573 libnvdimm, blk: add support for blk integrity

Support multiple block sizes (sector + metadata) for nd_blk in the
same way as done for the BTT. Add the idea of an 'internal' lbasize,
which is properly aligned and padded, and store metadata in this space.
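
The arithmetic behind an 'internal' lbasize can be sketched as follows. The 64-byte alignment and the helper names are assumptions made for illustration; the commit message only states that the internal size is aligned and padded and that the metadata is stored in that space.

```c
#include <stdint.h>

/* Assumed alignment for the padded on-media block; the real value is
 * a driver detail not stated in the commit message. */
#define INT_LBASIZE_ALIGNMENT 64u

/* Round x up to a multiple of a (a must be a power of two). */
static inline uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* External view: sector_size bytes of data plus meta_size bytes of
 * metadata. Internal view: one padded, aligned slot holding both. */
uint32_t internal_lbasize(uint32_t sector_size, uint32_t meta_size)
{
    return align_up(sector_size + meta_size, INT_LBASIZE_ALIGNMENT);
}

/* Byte offset of the metadata inside the slot for logical block lba. */
uint64_t meta_offset(uint64_t lba, uint32_t sector_size, uint32_t meta_size)
{
    return lba * internal_lbasize(sector_size, meta_size) + sector_size;
}
```

Under these assumptions a 512-byte sector with 8 bytes of metadata occupies a 576-byte slot, and the metadata for block 2 starts at byte 1664.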

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-06-26 23:23:38 +0800
41cd8b70c libnvdimm, btt: add support for blk integrity

Support multiple block sizes (sector + metadata) using the blk integrity
framework. This registers a new integrity template that defines the
protection information tuple size based on the configured metadata size,
and simply acts as a passthrough for protection information generated by
another layer. The metadata is written to the storage as-is, and read back
with each sector.

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-06-26 23:23:38 +0800
047fc8a1f libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory

The libnvdimm implementation handles allocating dimm address space (DPA)
between PMEM and BLK mode interfaces. After DPA has been allocated from
a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
as a struct bio based block device. Unlike PMEM, BLK is required to
handle platform specific details like mmio register formats and memory
controller interleave. For this reason the libnvdimm generic nd_blk
driver calls back into the bus provider to carry out the I/O.

This initial implementation handles the BLK interface defined by the
ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
DCR (dimm control region), BDW (block data window), IDT (interleave
descriptor) NFIT structures and the hardware register format.
[1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
[2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Andy Lutomirski
Cc: Boaz Harrosh
Cc: H. Peter Anvin
Cc: Jens Axboe
Cc: Ingo Molnar
Cc: Christoph Hellwig
Signed-off-by: Ross Zwisler
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams

Ross Zwisler
2015-06-26 23:23:38 +0800