Eric Lee / smarc-fsl-linux-kernel

06 Mar, 2016

3 commits

87bf572e1 nfit: disable userspace initiated ars during scrub

While the nfit driver is issuing address range scrub commands and
reaping the results, do not permit an ars_start command issued from
userspace. The scrub thread assumes that all ars completions are for
scrubs initiated by platform firmware at boot, or by the nfit driver.
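
A minimal sketch of this gating, modelled in Python rather than kernel C. The class name `ScrubState`, its methods, and the choice of EBUSY as the rejection code are illustrative assumptions, not taken from the nfit driver.

```python
import errno
import threading


class ScrubState:
    """Toy model of a bus that rejects userspace ars_start during a scrub."""

    def __init__(self):
        self._lock = threading.Lock()
        self.scrub_busy = False  # True while a driver-initiated scrub is in flight

    def begin_scrub(self):
        with self._lock:
            self.scrub_busy = True

    def end_scrub(self):
        with self._lock:
            self.scrub_busy = False

    def userspace_cmd(self, cmd):
        """Return 0 if the command is accepted, or a negative errno if not."""
        with self._lock:
            # Only ars_start is gated; other commands pass through.
            if cmd == "ars_start" and self.scrub_busy:
                return -errno.EBUSY
            return 0
```

With this gate in place, every completion the scrub thread observes while `scrub_busy` is set can be attributed to firmware or to the driver itself.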

Signed-off-by: Dan Williams

Dan Williams
2016-03-06 04:24:06 +0800
7ae0fa439 nfit, libnvdimm: async region scrub workqueue

Introduce a workqueue that will be used to run address range scrub
asynchronously with the rest of nvdimm device probing.

Userspace still wants notification when probing operations complete, so
introduce a new callback to flush this workqueue when userspace is
awaiting probe completion.
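
The queue-then-flush pattern can be sketched as follows. This is a userspace Python analogy, not the kernel workqueue API; `ScrubWorkqueue`, `queue_scrub`, and `flush` are hypothetical names.

```python
import queue
import threading


class ScrubWorkqueue:
    """Single worker that processes scrub requests in the background."""

    def __init__(self):
        self._q = queue.Queue()
        self.done = []  # regions whose scrub has completed, in order
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()

    def _worker(self):
        while True:
            region = self._q.get()
            self.done.append(region)  # stand-in for the actual scrub work
            self._q.task_done()

    def queue_scrub(self, region):
        """Called during probing; returns immediately."""
        self._q.put(region)

    def flush(self):
        """Block until every queued scrub has finished."""
        self._q.join()
```

Probing calls `queue_scrub()` and moves on; a caller that needs "probe complete" semantics calls `flush()` first.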

Signed-off-by: Dan Williams

Dan Williams
2016-03-06 04:24:06 +0800
aef253382 libnvdimm, nfit: centralize command status translation

The return value from an 'ndctl_fn' reports the command execution
status, i.e. was the command properly formatted and was it successfully
submitted to the bus provider. The new 'cmd_rc' parameter allows the bus
provider to communicate command specific results, translated into
common error codes.

Convert the ARS commands to this scheme to:

1/ Consolidate status reporting

2/ Prepare for expanding ars unit test cases

3/ Make the implementation more generic
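
The split between the two status channels can be illustrated with a small Python model. The function name `submit_ars_start`, the string status values, and the translation table are assumptions made for the example; the real firmware status encoding is defined by the bus provider.

```python
import errno

# Hypothetical provider-specific statuses mapped to common error codes.
STATUS_TO_ERRNO = {
    "success": 0,
    "busy": -errno.EBUSY,
}


def submit_ars_start(well_formed, firmware_status):
    """Return (rc, cmd_rc).

    rc     : was the command properly formatted and submitted?
    cmd_rc : command-specific result, translated to a common error code;
             None when the command never reached the provider.
    """
    if not well_formed:
        return -errno.EINVAL, None
    # Unknown statuses fall back to a generic I/O error.
    cmd_rc = STATUS_TO_ERRNO.get(firmware_status, -errno.EIO)
    return 0, cmd_rc
```

A caller checks `rc` first and only then interprets `cmd_rc`, so submission failures and command-level failures are never confused.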

Cc: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2016-03-06 04:24:06 +0800

24 Feb, 2016

1 commit

4577b0665 nfit: update address range scrub commands to the acpi 6.1 format

The original format of these commands from the "NVDIMM DSM Interface
Example" [1] is superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
"9.20.7 NVDIMM Root Device _DSMs"

Changes include:
1/ New 'restart' fields in ars_status, unfortunately these are
implemented in the middle of the existing definition so this change
is not backwards compatible. The expectation is that shipping
platforms will only ever support the ACPI 6.1 definition.

2/ New status values for ars_start ('busy') and ars_status ('overflow').

Cc: Vishal Verma
Cc: Linda Knippers
Signed-off-by: Dan Williams

Dan Williams
2016-02-24 09:17:20 +0800

20 Feb, 2016

1 commit

747ffe11b libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing

Use the output length specified in the command to size the receive
buffer rather than the arbitrary 4K limit.

This bug was hiding the fact that the ndctl implementation of
ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.
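
The effect of the sizing change can be shown with a toy model. `recv_buf_len` and `receive` are hypothetical helpers, and the 6000-byte payload in the usage note is an arbitrary illustration.

```python
LEGACY_LIMIT = 4096  # the fixed 4K cap that this commit stops using


def recv_buf_len(cmd_out_length, use_cmd_length=True):
    """Pick the receive buffer length for an ars_status-style command."""
    if use_cmd_length:
        return cmd_out_length  # size taken from the command itself
    return LEGACY_LIMIT


def receive(payload, buf_len):
    """Copy at most buf_len bytes of command output into the buffer."""
    return payload[:buf_len]
```

With a 6000-byte status payload, the legacy sizing silently returns only the first 4096 bytes, while sizing from the command returns all of it. It also means a caller that specifies no output length gets no data, which is how the ndctl bug became visible.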

Cc: Vishal Verma
Signed-off-by: Dan Williams

Dan Williams
2016-02-20 07:21:52 +0800

10 Jan, 2016

1 commit

0caeef63e libnvdimm: Add a poison list and export badblocks

During region creation, perform Address Range Scrubs (ARS) for the SPA
(System Physical Address) ranges to retrieve known poison locations from
firmware. Add a new data structure 'nd_poison' which is used as a list
in nvdimm_bus to store these poison locations.

When creating a pmem namespace, if there is any known poison associated
with its physical address space, convert the poison ranges to bad sectors
that are exposed using the badblocks interface.
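
The range-to-sector conversion can be sketched in a few lines of Python. This is an illustration of the arithmetic only: the function name, the (address, length) tuple layout, and the 512-byte sector size are assumptions, not the kernel's data structures.

```python
SECTOR_SIZE = 512  # assumed sector size for the example


def poison_to_badblocks(ns_start, ns_size, poison):
    """Convert poison ranges to bad sectors for one namespace.

    poison is a list of (phys_addr, length) byte ranges.
    Returns a list of (first_sector, sector_count), relative to the
    start of the namespace.
    """
    ns_end = ns_start + ns_size
    bad = []
    for addr, length in poison:
        # Clip the poison range to the namespace's address span.
        lo = max(addr, ns_start)
        hi = min(addr + length, ns_end)
        if lo >= hi:
            continue  # no overlap with this namespace
        first = (lo - ns_start) // SECTOR_SIZE
        last = (hi - 1 - ns_start) // SECTOR_SIZE
        bad.append((first, last - first + 1))
    return bad
```

Any sector that contains at least one poisoned byte is reported as bad, so a range that straddles a sector boundary marks both sectors.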

Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2016-01-10 00:39:03 +0800

29 Aug, 2015

1 commit

004f1afbe libnvdimm, pmem: direct map legacy pmem by default

The expectation is that the legacy / non-standard pmem discovery method
(e820 type-12) will only ever be used to describe small quantities of
persistent memory. Larger capacities will be described via the ACPI
NFIT. When "allocate struct page from pmem" support is added this default
policy can be overridden by assigning a legacy pmem namespace to a pfn
device, however this would only be necessary if a platform used the
legacy mechanism to define a very large range.

Cc: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-08-29 11:40:05 +0800

26 Jun, 2015

5 commits

74ae66c3b libnvdimm: Add sysfs numa_node to NVDIMM devices

Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.

An example of numa_node values on a 2-socket system with a single
NVDIMM range on each socket is shown below.
/sys/bus/nd/devices
|-- btt0.0/numa_node:0
|-- btt1.0/numa_node:1
|-- btt1.1/numa_node:1
|-- namespace0.0/numa_node:0
|-- namespace1.0/numa_node:1
|-- region0/numa_node:0
|-- region1/numa_node:1

These numa_node files are then linked under the block class of
their device names.
/sys/class/block/pmem0/device/numa_node:0
/sys/class/block/pmem1s/device/numa_node:1

This enables numactl(8) to accept 'block:' and 'file:' paths of
pmem and btt devices as shown in the examples below.
numactl --preferred block:pmem0 --show
numactl --preferred file:/dev/pmem1s --show
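
A program can read the same attribute directly. The sketch below follows the /sys/class/block/<dev>/device/numa_node layout shown above; the helper name `read_numa_node` and its `sysfs_root` parameter (added so the function can be pointed at a test directory) are assumptions.

```python
from pathlib import Path


def read_numa_node(block_dev, sysfs_root="/sys"):
    """Return the NUMA node recorded for a pmem or btt block device."""
    path = (Path(sysfs_root) / "class" / "block" / block_dev
            / "device" / "numa_node")
    return int(path.read_text().strip())
```

For example, `read_numa_node("pmem0")` would return 0 on the 2-socket system described above.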

Signed-off-by: Toshi Kani
Signed-off-by: Dan Williams

Toshi Kani
2015-06-26 23:23:38 +0800
41d7a6d63 libnvdimm: Set numa_node to NVDIMM devices

ACPI NFIT table has System Physical Address Range Structure entries that
describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.
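
The flow from proximity ID to inherited node can be modelled as below. The `PXM_TO_NODE` table, the `Device` class, and `register_region` are hypothetical; in particular, resolving the node at lookup time is a simplification of whatever the device core does at registration.

```python
NUMA_NO_NODE = -1  # value used here for "no node known"

# Hypothetical firmware proximity-ID to node-ID table.
PXM_TO_NODE = {0: 0, 1: 1}


class Device:
    def __init__(self, name, parent=None, numa_node=None):
        self.name = name
        self.parent = parent
        self._numa_node = numa_node

    @property
    def numa_node(self):
        # Use our own node if set, otherwise inherit from the parent.
        if self._numa_node is not None:
            return self._numa_node
        if self.parent is not None:
            return self.parent.numa_node
        return NUMA_NO_NODE


def register_region(name, proximity_id, proximity_valid):
    """Create a region whose node comes from a valid proximity ID."""
    if proximity_valid:
        node = PXM_TO_NODE.get(proximity_id, NUMA_NO_NODE)
    else:
        node = NUMA_NO_NODE
    return Device(name, numa_node=node)
```

A namespace or btt created with `parent=region` then reports the region's node without carrying its own copy.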

Signed-off-by: Toshi Kani
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams

Toshi Kani
2015-06-26 23:23:38 +0800
581388209 libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only

Upon detection of an unarmed dimm in a region, arrange for descendant
BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
"unarmed" via flags passed by platform firmware (NFIT).

The flags in the NFIT memory device sub-structure indicate the state of
the data on the nvdimm relative to its energy source or last "flush to
persistence". For the most part there is nothing the driver can do but
advertise the state of these flags in sysfs and emit a message if
firmware indicates that the contents of the device may be corrupted.
However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
the block devices incorporating that nvdimm to be marked read-only.
This is a safe default as the data is still available and new writes are
held off until the administrator either forces read-write mode, or the
energy source becomes armed.

A 'read_only' attribute is added to REGION devices to allow for
overriding the default read-only policy of all descendant block devices.
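
The default policy and its override can be sketched as follows. `Region` and `BlockDevice` are illustrative classes, and checking the flag on every write is a simplification of how a block device would actually be marked read-only.

```python
class Region:
    def __init__(self, dimms_armed):
        # Default policy: read-only if any member dimm is unarmed.
        self.read_only = not all(dimms_armed)


class BlockDevice:
    """Stand-in for a BTT, PMEM, or BLK device below a region."""

    def __init__(self, region):
        self.region = region

    def write(self, data):
        if self.region.read_only:
            raise PermissionError("region is read-only")
        return len(data)  # number of bytes accepted
```

Setting `region.read_only = False` models the administrator forcing read-write mode through the region's 'read_only' attribute.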

Signed-off-by: Dan Williams

Dan Williams
2015-06-26 23:23:38 +0800
047fc8a1f libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory

The libnvdimm implementation handles allocating dimm address space (DPA)
between PMEM and BLK mode interfaces. After DPA has been allocated from
a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
as a struct bio based block device. Unlike PMEM, BLK is required to
handle platform specific details like mmio register formats and memory
controller interleave. For this reason the libnvdimm generic nd_blk
driver calls back into the bus provider to carry out the I/O.

This initial implementation handles the BLK interface defined by the
ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
DCR (dimm control region), BDW (block data window), IDT (interleave
descriptor) NFIT structures and the hardware register format.
[1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
[2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Andy Lutomirski
Cc: Boaz Harrosh
Cc: H. Peter Anvin
Cc: Jens Axboe
Cc: Ingo Molnar
Cc: Christoph Hellwig
Signed-off-by: Ross Zwisler
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams

Ross Zwisler
2015-06-26 23:23:38 +0800
5212e11fd nd_btt: atomic sector updates

BTT stands for Block Translation Table, and is a way to provide power
fail sector atomicity semantics for block devices that have the ability
to perform byte granularity IO. It relies on the capability of libnvdimm
namespace devices to do byte aligned IO.

The BTT works as a stacked block device, and reserves a chunk of space
from the backing device for its accounting metadata. It is a bio-based
driver because all IO is done synchronously, and there is no queuing or
asynchronous completions at either the device or the driver level.

The BTT uses 'lanes' to index into various 'on-disk' data structures,
and lanes also act as a synchronization mechanism in case there are more
CPUs than available lanes. We did a comparison between two lane lock
strategies - first where we kept an atomic counter around that tracked
which was the last lane that was used, and 'our' lane was determined by
atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
theoretically, no CPU would be blocked waiting for a lane. The other
strategy was to use the cpu number we're scheduled on and hash it to
a lane number. Theoretically, this could block an IO that could've
otherwise run using a different, free lane. But some fio workloads
showed that the direct cpu -> lane hash performed faster than tracking
'last lane' - my reasoning is the cache thrash caused by moving the
atomic variable made that approach slower than simply waiting out the
in-progress IO. This supports the conclusion that the driver can be a
very simple bio-based one that does synchronous IOs instead of queuing.
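
The two lane-selection strategies compared above can be sketched side by side. This is a Python illustration only: using modulo as the cpu-to-lane hash is an assumption, and `itertools.count` merely stands in for the shared atomic counter.

```python
import itertools


def lane_by_cpu(cpu, nr_lanes):
    """Strategy 2: derive the lane directly from the current cpu number."""
    return cpu % nr_lanes


class LastLaneCounter:
    """Strategy 1: a shared counter hands out lanes round-robin."""

    def __init__(self, nr_lanes):
        self.nr_lanes = nr_lanes
        self._counter = itertools.count()  # stand-in for an atomic counter

    def acquire(self):
        return next(self._counter) % self.nr_lanes
```

The counter spreads IOs evenly across lanes but every CPU touches the same shared variable; the direct hash touches no shared state, at the cost of two CPUs that hash to the same lane having to wait for each other.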

Cc: Andy Lutomirski
Cc: Boaz Harrosh
Cc: H. Peter Anvin
Cc: Jens Axboe
Cc: Ingo Molnar
Cc: Christoph Hellwig
Cc: Neil Brown
Cc: Jeff Moyer
Cc: Dave Chinner
Cc: Greg KH
[jmoyer: fix nmi watchdog timeout in btt_map_init]
[jmoyer: move btt initialization to module load path]
[jmoyer: fix memory leak in the btt initialization path]
[jmoyer: Don't overwrite corrupted arenas]
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams

Vishal Verma
2015-06-26 23:23:38 +0800

25 Jun, 2015

10 commits

1b40e09a1 libnvdimm: blk labels and namespace instantiation

A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm. They may alias with one or
more pmem interleave sets that include the given dimm.

This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later
patch will make these settings persistent by writing back the label(s).

Unlike pmem namespaces, multiple blk namespaces can be created per
region. Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated. As long as
a region has 'available_size' != 0 new child namespaces may be created.

Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
bf9bccc14 libnvdimm: pmem label sets and namespace instantiation.

A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.

Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes. A later patch will make
these settings persistent by writing back the label.

Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.
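
The two growth directions can be illustrated with a toy allocator over a single dimm's DPA space. `DimmDpa` is a hypothetical class, and it tracks only one contiguous extent per mode, which is a deliberate simplification.

```python
class DimmDpa:
    """DPA space where PMEM grows up from 0 and BLK grows down from the top."""

    def __init__(self, size):
        self.size = size
        self.pmem_end = 0      # PMEM occupies [0, pmem_end)
        self.blk_start = size  # BLK occupies [blk_start, size)

    def available(self):
        return self.blk_start - self.pmem_end

    def alloc_pmem(self, length):
        if length > self.available():
            raise ValueError("not enough free DPA")
        start = self.pmem_end
        self.pmem_end += length
        return (start, length)

    def alloc_blk(self, length):
        if length > self.available():
            raise ValueError("not enough free DPA")
        self.blk_start -= length
        return (self.blk_start, length)
```

Because the two modes allocate from opposite ends, they only collide once the free space between them is exhausted.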

Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
eaf961536 libnvdimm, nfit: add interleave-set state-tracking infrastructure

On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface. A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.

Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active. Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command. Adding/deleting namespaces from an active
interleave set is always possible via sysfs.

Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered. For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration. It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.

Cc: Neil Brown
Cc: Greg KH
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
3d88002e4 libnvdimm: support for legacy (non-aliasing) nvdimms

The libnvdimm region driver is an intermediary driver that translates
non-volatile "region"s into "namespace" sub-devices that are surfaced by
persistent memory block-device drivers (PMEM and BLK).

ACPI 6 introduces the concept that a given nvdimm may simultaneously
offer multiple access modes to its media through direct PMEM load/store
access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
If an nvdimm is single interfaced, then there is no need for dimm
metadata labels. For these devices we can take the region boundaries
directly to create a child namespace device (nd_namespace_io).

Acked-by: Christoph Hellwig
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
1f7df6f88 libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)

A "region" device represents the maximum capacity of a BLK range (mmio
block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
volatile memory), without regard for aliasing. Aliasing, in the
dimm-local address space (DPA), is resolved by metadata on a dimm to
designate which exclusive interface will access the aliased DPA ranges.
Support for the per-dimm metadata/label arrives in a subsequent
patch.

The name format of "region" devices is "regionN" where, like dimms, N is
a global ida index assigned at discovery time. This id is not reliable
across reboots nor in the presence of hotplug. Look to attributes of
the region or static id-data of the sub-namespace to generate a
persistent name. However, if the platform configuration does not change
it is reasonable to expect the same region id to be assigned at the next
boot.

"region"s have 2 generic attributes "size", and "mapping"s where:
- size: the BLK accessible capacity or the span of the
system physical address range in the case of PMEM.

- mappingN: a tuple describing a dimm's contribution to the region's
capacity in the format (,,). For a PMEM-region
there will be at least one mapping per dimm in the interleave set. For
a BLK-region there is only "mapping0" listing the starting DPA of the
BLK-region and the available DPA capacity of that space (matches "size"
above).

The max number of mappings per "region" is hard coded per the
constraints of sysfs attribute groups. That said the number of mappings
per region should never exceed the maximum number of possible dimms in
the system. If the current number turns out to not be enough then the
"mappings" attribute clarifies how many there are supposed to be. "32
should be enough for anybody...".

Cc: Neil Brown
Cc: Greg KH
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
4d88a97aa libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure

* Implement the device-model infrastructure for loading modules and
attaching drivers to nvdimm devices. This is a simple association of a
nd-device-type number with a driver that has a bitmask of supported
device types. To facilitate userspace bind/unbind operations 'modalias'
and 'devtype', that also appear in the uevent, are added as generic
sysfs attributes for all nvdimm devices. The reason for the device-type
number is to support sub-types within a given parent devtype, be it a
vendor-specific sub-type or otherwise.

* The first consumer of this infrastructure is the driver
for dimm devices. It simply uses control messages to retrieve and
store the configuration-data image (label set) from each dimm.

Note: nd_device_register() arranges for asynchronous registration of
nvdimm bus devices by default.
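
The association between a device-type number and a driver's bitmask of supported types can be sketched as follows. The type numbers, the `Driver` class, and `find_driver` are hypothetical and chosen only for illustration.

```python
# Hypothetical device-type numbers for the example.
TYPE_DIMM, TYPE_REGION_PMEM, TYPE_REGION_BLK = 1, 2, 3


class Driver:
    def __init__(self, name, *types):
        self.name = name
        # One bit per supported device type.
        self.type_mask = 0
        for t in types:
            self.type_mask |= 1 << t

    def matches(self, dev_type):
        return bool(self.type_mask & (1 << dev_type))


def find_driver(drivers, dev_type):
    """Return the first driver whose mask includes dev_type, else None."""
    for drv in drivers:
        if drv.matches(dev_type):
            return drv
    return None
```

A single driver can claim several types by setting several bits, for instance one region driver covering both the PMEM and BLK region types.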

Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
62232e45f libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices

Most discovery/configuration of the nvdimm-subsystem is done via sysfs
attributes. However, some nvdimm_bus instances, particularly the
ACPI.NFIT bus, define a small set of messages that can be passed to the
platform. For convenience we derive the initial libnvdimm-ioctl command
formats directly from the NFIT DSM Interface Example formats.

ND_CMD_SMART: media health and diagnostics
ND_CMD_GET_CONFIG_SIZE: size of the label space
ND_CMD_GET_CONFIG_DATA: read label space
ND_CMD_SET_CONFIG_DATA: write label space
ND_CMD_VENDOR: vendor-specific command passthrough
ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
ND_CMD_ARS_START: initiate scrubbing
ND_CMD_ARS_STATUS: report on scrubbing state
ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events

If a platform later defines different commands than this set it is
straightforward to extend support to those formats.

Most of the commands target a specific dimm. However, the
address-range-scrubbing commands target the bus. The 'commands'
attribute in sysfs of an nvdimm_bus, or nvdimm, enumerates the supported
commands for that object.

Cc: Robert Moore
Cc: Rafael J. Wysocki
Reported-by: Nicholas Moulin
Acked-by: Christoph Hellwig
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
e6dfb2de4 libnvdimm, nfit: dimm/memory-devices

Enable nvdimm devices to be registered on a nvdimm_bus. The kernel
assigned device id for nvdimm devices is dynamic. If userspace needs a
more static identifier it should consult a provider-specific attribute.
In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
'nmemX/nfit/serial' attributes may be used for this purpose.

Cc: Neil Brown
Cc: Greg KH
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
45def22c1 libnvdimm: control character device and nvdimm_bus sysfs attributes

The control device for a nvdimm_bus is registered as an "nd" class
device. The expectation is that there will usually only be one "nd" bus
registered under /sys/class/nd. However, we allow for the possibility
of multiple buses and they will be listed in discovery order as
ndctl0...ndctlN. This character device hosts the ioctl for passing
control messages. The initial command set has a 1:1 correlation with
the commands listed by the "NFIT DSM Example" document [1], but
this scheme is extensible to future command sets.

Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
a subsequent patch. This is simply the initial registrations and sysfs
attributes.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Neil Brown
Cc: Greg KH
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800
b94d5230d libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support

A struct nvdimm_bus is the anchor device for registering nvdimm
resources and interfaces, for example, a character control device,
nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware
Interface Table) is one possible platform description for such
non-volatile memory resources in a system. The nfit.ko driver attaches
to the "ACPI0012" device that indicates the presence of the NFIT and
parses the table to register a struct nvdimm_bus instance.

Cc: Lv Zheng
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Jeff Moyer
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Tested-by: Toshi Kani
Signed-off-by: Dan Williams

Dan Williams
2015-06-25 09:24:10 +0800