15 Feb, 2017
2 commits
-
commit bfb34527a32a1a576d9bfb7026d3ab0369a6cd60 upstream.
When vmemmap_populate() allocates space for the memmap it does so in 2MB
sized chunks. The libnvdimm-pfn driver incorrectly accounts for this
when the alignment of the device is set to 4K. When this happens we
trigger memory allocation failures in altmap_alloc_block_buf() and
trigger warnings of the form:

WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0
[..]
Call Trace:
dump_stack+0x86/0xc3
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
arch_add_memory+0xe4/0xf0
devm_memremap_pages+0x29b/0x4e0

Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman
-
commit 9d032f4201d39e5cf43a8709a047e481f5723fdc upstream.
Given that the naming of pmem devices changes from the pmemX form to the
pmemX.Y form when namespace id is greater than 0, arrange for namespaces
with id-0 to be exempt from deletion. Otherwise a simple reconfiguration
of an existing namespace to a new mode results in a name change of the
resulting block device:

# ndctl list --namespace=namespace1.0
{
"dev":"namespace1.0",
"mode":"raw",
"size":2147483648,
"uuid":"3dadf3dc-89b9-4b24-b20e-abc8a4707ce3",
"blockdev":"pmem1"
}

# ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
{
"dev":"namespace1.1",
"mode":"memory",
"size":2111832064,
"uuid":"7b4a6341-7318-4219-a02c-fb57c0bbf613",
"blockdev":"pmem1.1"
}

This change does require tooling changes to explicitly look for
namespaceX.0 if the seed has already advanced to another namespace.

Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Reviewed-by: Johannes Thumshirn
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman
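The naming rule described in this commit can be sketched in C as follows. This is only an illustration of the pmemX versus pmemX.Y convention, not the kernel's code: the helper name pmem_disk_name() and its parameters are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: build the block device name for a pmem namespace.
 * Namespace id 0 keeps the "pmemX" form; any other id gets "pmemX.Y".
 */
int pmem_disk_name(char *buf, size_t len, int region_id, int ns_id)
{
	if (ns_id == 0)
		return snprintf(buf, len, "pmem%d", region_id);
	return snprintf(buf, len, "pmem%d.%d", region_id, ns_id);
}
```

With this rule, reconfiguring namespace1.0 in place keeps the name pmem1, whereas deleting it and advancing to namespace1.1 would produce pmem1.1.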
26 Jan, 2017
1 commit
-
commit 1f19b983a8877f81763fab3e693c6befe212736d upstream.
Commit 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple
pmem-namespaces per region") added support for establishing additional
pmem namespace beyond the seed device, similar to blk namespaces.
However, it neglected to delete the namespace when the size is set to
zero.

Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman
09 Jan, 2017
1 commit
-
commit af7d9f0c57941b465043681cb5c3410f7f3f1a41 upstream.
Fix the format specifier so that the attribute can be parsed correctly.
Currently it returns decimal 1000 for a 4096-byte alignment.

Reported-by: Dave Jiang
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman
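A value of 4096 reading back as 1000 is what a hexadecimal conversion without a "0x" prefix would produce, since 4096 is 0x1000. The sketch below assumes that was the form of the bug; the function names are hypothetical and the exact specifiers used in the kernel are not quoted here.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed buggy form: hex digits with no "0x" prefix, so 4096 prints "1000". */
int align_show_hex(char *buf, size_t len, unsigned long align)
{
	return snprintf(buf, len, "%lx\n", align);
}

/* Decimal form: 4096 prints "4096", which a decimal parser reads back as-is. */
int align_show_dec(char *buf, size_t len, unsigned long align)
{
	return snprintf(buf, len, "%lu\n", align);
}
```

A tool that parses the attribute as a base-10 number gets the right alignment only from the second variant.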
07 Dec, 2016
1 commit
-
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.

The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".

Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.

ars_status00000000: 00020000 00000000 ........
BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
task: ffff8803332d2ec0 task.stack: ffffc9000174c000
RIP: 0010:[] [] __memcpy+0x12/0x20
RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246
RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
Stack:
ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
Call Trace:
[] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
[] nfit_test_probe+0x670/0xb1b [nfit_test]

Cc:
Fixes: 747ffe11b440 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams
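One defensive way to handle a size field that may be smaller than the header it is supposed to include is to never subtract without checking first. The sketch below is an assumption-laden illustration, not the code in acpi_nfit_ctl(): the function name is hypothetical and the header length is passed in by the caller rather than taken from the specification.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of payload bytes that follow a header of
 * header_len bytes, given a firmware-reported total size. A reported size
 * at or below the header length means there is no payload, and the
 * unsigned subtraction is never allowed to wrap around.
 */
uint32_t ars_payload_len(uint32_t output_size, uint32_t header_len)
{
	if (output_size <= header_len)
		return 0;
	return output_size - header_len;
}
```

With a header length of 8, reported sizes of 0, 4, and 8 all yield a payload length of 0 instead of a huge wrapped value being handed to a copy routine.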
28 Oct, 2016
1 commit
-
A bugfix just tried to address a randconfig build problem and introduced
a variant of the same problem: with CONFIG_LIBNVDIMM=y and
CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:

drivers/nvdimm/built-in.o: In function `to_nd_device_type':
bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_probe':
region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
drivers/nvdimm/built-in.o: In function `mode_show':
namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa55f): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa56e): undefined reference to `to_nd_dax'

This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
as it should be (it gets linked into the libnvdimm module). To fix
the original problem, I'm adding a dependency on LIBNVDIMM to
DEV_DAX_PMEM, which ensures we can't have that one built-in if the
rest is a module.

Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage")
Signed-off-by: Arnd Bergmann
Reviewed-by: Ross Zwisler
Signed-off-by: Dan Williams
20 Oct, 2016
2 commits
-
The ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform. pmem_clear_poison() returns without clearing
badblocks in such cases. This failure is detected at the next read
(-EIO).

This behavior can lead to an issue when a user keeps writing but does not
read immediately. For instance, a flight recorder file may only be read
when it is necessary for troubleshooting.

Change pmem_do_bvec() and pmem_clear_poison() to return -EIO so that
the filesystem can log an error message on a write error.

Cc: Vishal Verma
Signed-off-by: Toshi Kani
Signed-off-by: Dan Williams
-
If the kcalloc() fails then "devs" can be NULL and we dereference it
checking "devs[i]".

Fixes: 1b40e09a1232 ('libnvdimm: blk labels and namespace instantiation')
Signed-off-by: Dan Carpenter
Signed-off-by: Dan Williams
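The pattern behind this kind of bug can be shown in userspace C, with calloc() standing in for kcalloc(). This is a sketch only: struct demo_dev, alloc_devs() and free_devs() are hypothetical names, not the kernel's.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a namespace device. */
struct demo_dev {
	int id;
};

/* Free a NULL-terminated array of devices; safe when the array is NULL. */
void free_devs(struct demo_dev **devs)
{
	int i;

	if (!devs)
		return;	/* the array allocation itself failed */
	for (i = 0; devs[i]; i++)
		free(devs[i]);
	free(devs);
}

/* Allocate count devices plus a NULL terminator, or return NULL. */
struct demo_dev **alloc_devs(int count)
{
	struct demo_dev **devs;
	int i;

	if (count < 0)
		return NULL;
	devs = calloc((size_t)count + 1, sizeof(*devs));
	if (!devs)
		return NULL;
	for (i = 0; i < count; i++) {
		devs[i] = calloc(1, sizeof(*devs[i]));
		if (!devs[i]) {
			free_devs(devs);
			return NULL;
		}
		devs[i]->id = i;
	}
	return devs;
}
```

The early return in free_devs() is the point of the example: the cleanup loop must not index devs[i] before confirming that devs is non-NULL.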
08 Oct, 2016
12 commits
-
The function dax_pmem_probe() in drivers/dax/pmem.c is compiled under the
CONFIG_DEV_DAX_PMEM tri-state config option. This config option currently
only depends on CONFIG_NVDIMM_DAX, a bool, which means that the following
configuration is possible:

CONFIG_LIBNVDIMM=m
...
CONFIG_NVDIMM_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y

With this config LIBNVDIMM is compiled as a module with NVDIMM_DAX=y just
meaning that we will compile drivers/nvdimm/dax_devs.c into that module.
However, dax_pmem_probe() depends on several symbols defined in
drivers/nvdimm/dax_devs.c, which results in the following build errors:

drivers/built-in.o: In function `dax_pmem_probe':
linux/drivers/dax/pmem.c:70: undefined reference to `to_nd_dax'
linux/drivers/dax/pmem.c:74: undefined reference to
`nvdimm_namespace_common_probe'
linux/drivers/dax/pmem.c:80: undefined reference to `devm_nsio_enable'
linux/drivers/dax/pmem.c:81: undefined reference to `nvdimm_setup_pfn'
linux/drivers/dax/pmem.c:84: undefined reference to `devm_nsio_disable'
linux/drivers/dax/pmem.c:122: undefined reference to `to_nd_region'
drivers/built-in.o: In function `dax_pmem_init':
linux/drivers/dax/pmem.c:147: undefined reference to `__nd_driver_register'

Fix this by making NVDIMM_DAX a tristate. DEV_DAX_PMEM depends on
NVDIMM_DAX which depends on LIBNVDIMM. Since they are all now tristates,
if LIBNVDIMM is built as a kernel module DEV_DAX_PMEM will be as well.
This prevents dax_devs.c from being built as a built-in while its
dependencies are in the libnvdimm.ko module.

Signed-off-by: Ross Zwisler
Signed-off-by: Dan Williams
-
Similar to BLK regions, publish new seed namespace devices to allow
unused PMEM region capacity to be consumed by additional namespaces.

Signed-off-by: Dan Williams
-
Now that the rest of the infrastructure has been converted to handle
multi-pmem configurations, lift the artificial barrier at scan time.

Signed-off-by: Dan Williams
-
Short-circuit doomed-to-fail label validation attempts by skipping
labels that are outside the given region. For example a DIMM that has
multiple PMEM regions will waste time attempting to create namespaces
only to find that the interleave-set-cookie does not validate, e.g.:

nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2

Similar to how we skip BLK labels when performing PMEM validation we can
skip out-of-range labels early.

Signed-off-by: Dan Williams
-
Now that we have nd_region_available_dpa() able to handle the presence
of multiple PMEM allocations in aliased PMEM regions, reuse that same
infrastructure to track allocations from free space. In particular
handle allocating from an aliased PMEM region in the case where there
are discontiguous holes. The allocation rules for BLK and PMEM are
documented in the space_valid() helper:

BLK-space is valid as long as it does not precede a PMEM
allocation in a given region. PMEM-space must be contiguous
and adjacent to an existing allocation (if one
exists).

Signed-off-by: Dan Williams
-
Instead of assuming that there will only ever be one allocated range at
the start of the region, account for additional namespaces that might
start at an offset from the region base.

After this change pmem namespaces now have a reason to carry an array of
resources similar to blk. Unifying the resource tracking infrastructure
in nd_namespace_common is a future cleanup candidate.

Signed-off-by: Dan Williams
-
pmem devices are currently named /dev/pmemX. Preserve the
naming of the 0th device, but add a ".Y" suffix for other
devices.

Signed-off-by: Dan Williams
-
The free dpa (dimm-physical-address) space calculation reports how much
free space is available with consideration for aliased BLK + PMEM
regions. Recall that BLK capacity is allocated from high addresses and
PMEM is allocated from low addresses in their respective regions.

nd_region_available_dpa() accounts for the fact that the largest
encroachment (lowest starting address) into PMEM capacity by a BLK
allocation limits the available capacity to that point, regardless of
whether there is a BLK allocation hole at a higher address. Similarly, for the
multi-pmem case we need to track the largest encroachment (highest
ending address) of a PMEM allocation in BLK capacity regardless of
whether there is an allocation hole that a BLK allocation could fill at
a lower address.

Signed-off-by: Dan Williams
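The two encroachment rules above can be illustrated with a much simplified model. This is a sketch under stated assumptions, not nd_region_available_dpa(): struct demo_res and aliased_free_span() are hypothetical, end addresses are exclusive, and every allocation is assumed to lie inside the aliased range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation record; end is exclusive. */
struct demo_res {
	uint64_t start;
	uint64_t end;
	bool is_blk;
};

/*
 * Size of the span that neither kind of allocation has encroached on:
 * from the highest PMEM ending address up to the lowest BLK starting
 * address. Holes below the highest PMEM end or above the lowest BLK
 * start are deliberately not counted.
 */
uint64_t aliased_free_span(uint64_t range_start, uint64_t range_end,
			   const struct demo_res *res, size_t count)
{
	uint64_t pmem_high = range_start;
	uint64_t blk_low = range_end;
	size_t i;

	for (i = 0; i < count; i++) {
		if (res[i].is_blk) {
			if (res[i].start < blk_low)
				blk_low = res[i].start;
		} else if (res[i].end > pmem_high) {
			pmem_high = res[i].end;
		}
	}
	return blk_low > pmem_high ? blk_low - pmem_high : 0;
}
```

Tracking only the two extremes is what makes the result independent of any holes left between allocations of the same type.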
-
Add more determinism to initial namespace device-name assignments by
sorting the namespaces by starting dpa.

Signed-off-by: Dan Williams
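Sorting by starting dpa makes the assignment order independent of discovery order. A minimal userspace sketch follows, using qsort() and a hypothetical struct demo_ns; the kernel's own data structures and sort routine are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified namespace record. */
struct demo_ns {
	uint64_t start_dpa;	/* starting DIMM physical address */
	int id;
};

/* qsort() comparator: ascending by starting dpa. */
int cmp_dpa(const void *a, const void *b)
{
	const struct demo_ns *x = a;
	const struct demo_ns *y = b;

	if (x->start_dpa < y->start_dpa)
		return -1;
	return x->start_dpa > y->start_dpa;
}

/* Sort in place so that names can be handed out in address order. */
void sort_namespaces(struct demo_ns *ns, size_t count)
{
	qsort(ns, count, sizeof(*ns), cmp_dpa);
}
```

The comparator compares rather than subtracts, which avoids truncating a 64-bit difference into an int.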
-
If label scanning finds multiple valid pmem namespaces allow them to be
surfaced rather than fail namespace scanning. Support for creating
multiple namespaces per region is saved for a later patch.

Note that this adds some new error messages to clarify which of the pmem
namespaces in the set are potentially impacted by invalid labels.

Signed-off-by: Dan Williams
06 Oct, 2016
2 commits
-
In preparation for allowing multiple namespaces per pmem region, unify
blk and pmem label scanning. Given that blk regions already support
multiple namespaces, teaching that path how to do pmem namespace
scanning is an incremental step towards multiple pmem namespace support.
This should be functionally equivalent to the previous state in that
it stops after finding the first valid pmem label set.

Signed-off-by: Dan Williams
-
The ability to translate a generic struct device pointer into a
namespace uuid is a useful utility as we go to unify the blk and pmem
label scanning paths.

Signed-off-by: Dan Williams
01 Oct, 2016
5 commits
-
In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list. In particular this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state. Currently we only track one validated set per-region.

Signed-off-by: Dan Williams
-
Before we add more libnvdimm-private fields to nd_mapping make it clear
which parameters are input vs libnvdimm internals. Use struct
nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make
struct nd_mapping private to libnvdimm.

Signed-off-by: Dan Williams
-
The existing implementation writes to all the flush hint addresses for a
given ND region. This is not necessary as the flushes are per imc and
not per DIMM. Search the mappings and clear out the duplicates at init
to avoid multiple flushes to the same imc.

Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams
-
nvdimm_clear_poison cleared the user-visible badblocks, and sent
commands to the NVDIMM to clear the areas marked as 'poison', but it
neglected to clear the same areas from the internal poison_list which is
used to marshal ARS results before sorting them by namespace. As a
result, once on-demand ARS functionality was added:

37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand

A scrub triggered from either sysfs or an MCE was found to be adding
stale entries that had been cleared from gendisk->badblocks, but were
still present in nvdimm_bus->poison_list. Additionally, the stale entries
could be triggered into producing stale disk->badblocks by simply disabling
and re-enabling the namespace or region.

This adds the missing step of clearing poison_list entries when clearing
poison, so that it is always in sync with badblocks.

Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
-
pmem_do_bvec used to kmap_atomic at the beginning, and only unmap at the
end. Things like nvdimm_clear_poison may want to do nvdimm subsystem
bookkeeping operations that may involve taking locks or doing memory
allocations, and we can't do that from the atomic context. Reduce the
atomic context to just what needs it - the memcpy to/from pmem.

Cc: Ross Zwisler
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
25 Sep, 2016
1 commit
-
The definition of the flush hint table as:
void __iomem *flush_wpq[0][0];
...passed the unit test, but is broken as flush_wpq[0][1] and
flush_wpq[1][0] refer to the same entry. Fix this to use a helper that
calculates a slot in the table based on the geometry of flush hints in
the region. This is important to get right since virtualization
solutions use this mechanism to trigger hypervisor flushes to platform
persistence.

Reported-by: Dave Jiang
Tested-by: Dave Jiang
Signed-off-by: Dan Williams
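A helper that derives the slot from the table geometry can be as simple as row-major indexing into one flat array. The sketch below assumes a fixed number of hints per DIMM; flush_slot() and its parameters are hypothetical and do not reproduce the helper added by this commit.

```c
#include <stddef.h>

/*
 * Hypothetical geometry: num_hints flush hints per DIMM, stored
 * row-major in a single flat array. Each (dimm, hint) pair with
 * hint < num_hints maps to its own slot.
 */
size_t flush_slot(size_t num_hints, size_t dimm, size_t hint)
{
	return dimm * num_hints + hint;
}
```

With two hints per DIMM, (dimm 0, hint 1) lands in slot 1 and (dimm 1, hint 0) lands in slot 2, so the two entries no longer collide.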
22 Sep, 2016
3 commits
-
Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams
-
If platform firmware fails to populate unique / non-zero serial number
data for each nvdimm in an interleave-set it may cause pmem region
initialization to fail. Add a debug message for this case.

Signed-off-by: Dan Williams
-
The internal alloc_nvdimm_map() helper might fail, particularly if the
memory region is already busy. Report request_mem_region() failures and
check for the failure.

Reported-by: Ryan Chen
Signed-off-by: Dan Williams
19 Sep, 2016
1 commit
-
nd_activate_region() iomaps any hint addresses required when activating
a region. To prevent duplicate mappings it checks the PFN of the hint to
be mapped against the PFNs of the already mapped hints. Unfortunately it
doesn't convert the PFN back into a physical address before passing it
to devm_nvdimm_ioremap(). Instead it applies PHYS_PFN a second time
which ends about as well as you would imagine.

Signed-off-by: Oliver O'Halloran
Signed-off-by: Dan Williams
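The effect of applying the conversion twice is easy to see with userspace stand-ins for the kernel's PHYS_PFN() and PFN_PHYS() macros. The sketch assumes 4 KiB pages; the demo_ names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Physical address to page frame number, like PHYS_PFN(). */
uint64_t demo_phys_pfn(uint64_t phys)
{
	return phys >> DEMO_PAGE_SHIFT;
}

/* Page frame number back to physical address, like PFN_PHYS(). */
uint64_t demo_pfn_phys(uint64_t pfn)
{
	return pfn << DEMO_PAGE_SHIFT;
}
```

For a hint at physical address 0x12345000 the PFN is 0x12345. Converting back gives 0x12345000 again, while shifting right a second time gives 0x12, which is nowhere near the address that was meant to be mapped.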
10 Sep, 2016
1 commit
-
Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation
where legacy pmem is being used, or a pmem region is created by using the memmap
kernel parameter, the injected bad blocks are not cleared due to
nvdimm_clear_poison() failing for lack of an ndctl function pointer. In
this case we need to just return as handled and allow the bad blocks to
be cleared rather than fail.

Reviewed-by: Vishal Verma
Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams
02 Sep, 2016
2 commits
-
'ndctl list --buses --dimms' does not list any NVDIMM-Ns since
they are considered idle. ndctl checks if any driver is
attached to the nmem device. nvdimm_probe() always fails in
nvdimm_init_nsarea() since NVDIMM-Ns do not implement the optional
ND_CMD_GET_CONFIG_DATA command.

Change nvdimm_probe() to accept the case that the CONFIG_DATA
command is not implemented for NVDIMM-Ns. The driver attaches
without ndd, which keeps it no-op to the device.

Reported-by: Brian Boylston
Signed-off-by: Toshi Kani
Cc: Dan Williams
Tested-by: Johannes Thumshirn
Acked-by: Johannes Thumshirn
Signed-off-by: Dan Williams
-
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Dan Williams
30 Aug, 2016
1 commit
-
Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012
NVDIMM Root device, can receive health event notifications.

Given that these devices are precluded from registering a notification
handler via acpi_driver.acpi_device_ops (due to no _HID), we use
acpi_install_notify_handler() directly. The registered handler,
acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags
sysfs attribute when a health event notification is received.

Cc: Rafael J. Wysocki
Tested-by: Toshi Kani
Reviewed-by: Vishal Verma
Acked-by: Rafael J. Wysocki
Reviewed-by: Toshi Kani
Signed-off-by: Dan Williams
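From userspace, a poll(2) event on a sysfs attribute is usually consumed by reading the file once and then polling for POLLPRI. The sketch below follows that general sysfs convention; the function name is hypothetical and the caller supplies the full path to the attribute, since the device number is not fixed.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for a sysfs attribute to signal a change.
 * Returns 1 if notified, 0 on timeout, -1 on error.
 */
int wait_for_sysfs_event(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	char buf[64];
	int rc;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;

	/* Read once first so that only later changes wake the poll. */
	if (read(pfd.fd, buf, sizeof(buf)) < 0) {
		close(pfd.fd);
		return -1;
	}

	rc = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);
	if (rc < 0)
		return -1;
	return rc > 0 ? 1 : 0;
}
```

After a wakeup, a monitoring tool would typically reopen or rewind the attribute and read it again to see the updated flags.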
09 Aug, 2016
1 commit
-
To be consistent with other namespaces, expose a 'size' attribute for
BTT devices also.

Cc: Dan Williams
Reported-by: Linda Knippers
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
08 Aug, 2016
2 commits
-
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: Jens Axboe
-
Commit abf545484d31 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.

Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.

Reviewed-by: Mike Christie
Signed-off-by: Jens Axboe
05 Aug, 2016
1 commit
-
The rw_page users were not converted to use bio/req ops. As a result
bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
be sent down as reads.

Signed-off-by: Mike Christie
Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

Modified by me to:
1) Drop op_flags passing into ->rw_page(), as we don't use it.
2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

Signed-off-by: Jens Axboe