19 Jul, 2019
1 commit
-
For good reason, the standard device_lock() is marked
lockdep_set_novalidate_class() because there is simply no sane way to
describe the myriad ways the device_lock() is ordered with other locks.
However, that leaves subsystems that know their own local device_lock()
ordering rules to find lock ordering mistakes manually. Instead,
introduce an optional / additional lockdep-enabled lock that a subsystem
can acquire in all the same paths that the device_lock() is acquired.

A conversion of the NFIT driver and NVDIMM subsystem to a
lockdep-validated device_lock() scheme is included. The
debug_nvdimm_lock() implementation implements the correct lock-class and
stacking order for the libnvdimm device topology hierarchy.

Yes, this is a hack, but hopefully it is a useful hack for other
subsystems' device_lock() debug sessions. Quoting Greg:

"Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
using it as much as anything else, so user beware :)

I don't object to it if it makes things easier for you to debug."
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Will Deacon
Cc: Dave Jiang
Cc: Keith Busch
Cc: Peter Zijlstra
Cc: Vishal Verma
Cc: "Rafael J. Wysocki"
Cc: Greg Kroah-Hartman
Signed-off-by: Dan Williams
Acked-by: Greg Kroah-Hartman
Reviewed-by: Ira Weiny
Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
05 Jun, 2019
1 commit
-
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Alexios Zavras
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman
07 Mar, 2018
1 commit
-
Dynamic debug can be instructed to add the function name to the debug
output using the +f switch, so there is no need for the libnvdimm
modules to do it again. If a user decides to add the +f switch for
libnvdimm's dynamic debug this results in double prints of the function
name.
Reported-by: Johannes Thumshirn
Reported-by: Ross Zwisler
Signed-off-by: Dan Williams
03 Nov, 2017
1 commit
-
nfit_test needs to use the poison list manipulation code as well. Make
it more generic and in the process rename poison to badrange, and move
all the related helpers to a new file.
Signed-off-by: Dave Jiang
[vishal: Add badrange.o to nfit_test's Kbuild]
[vishal: add a missed include in bus.c for the new badrange functions]
[vishal: rename all instances of 'be' to 'bre']
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
12 Aug, 2017
1 commit
-
Prepare for another consumer of this size selection scheme that is
not a 'sector size'.
Cc: Oliver O'Halloran
Signed-off-by: Dan Williams
18 Jul, 2017
1 commit
-
__add_badblock_range() does not account for sector alignment when
it sets 'num_sectors'. Therefore, an ARS error record range
spanning across two sectors is set to a single sector length,
which leaves the 2nd sector unprotected.

Change __add_badblock_range() to set 'num_sectors' properly.
Cc:
Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Toshi Kani
Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams
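The alignment problem described in the commit above can be illustrated with a small model. This is only a sketch, not the kernel code: the function names are hypothetical, and a 512-byte sector is assumed. It contrasts a length-only rounding with a computation that accounts for where the range starts.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def naive_num_sectors(length):
    """Length-only rounding; ignores where the range starts (hypothetical)."""
    return (length + SECTOR_SIZE - 1) // SECTOR_SIZE


def bad_sector_span(start_byte, length):
    """Return (first_sector, num_sectors) covering [start_byte, start_byte + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_sector = start_byte // SECTOR_SIZE
    last_sector = (start_byte + length - 1) // SECTOR_SIZE
    return first_sector, last_sector - first_sector + 1
```

For a 512-byte error range that starts 256 bytes into sector 0, the length-only rounding yields one sector, while the alignment-aware version reports two sectors starting at sector 0.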
04 Jul, 2017
1 commit
28 Jun, 2017
1 commit
-
Allow volatile nfit ranges to participate in all the same infrastructure
provided for persistent memory regions. The resulting namespace
device will still be called "pmem", but the parent region type will be
"nd_volatile". This is in preparation for disabling the dax ->flush()
operation in the pmem driver when it is hosted on a volatile range.
Cc: Jan Kara
Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Ross Zwisler
Signed-off-by: Dan Williams
16 Jun, 2017
1 commit
-
Starting with v1.2 labels, 'address abstractions' can be hinted via an
address abstraction id that implies an info-block format. The standard
address abstraction in the specification is the v2 format of the
Block-Translation-Table (BTT). Support for that is saved for a later
patch; for now we add support for the Linux-supported address
abstractions BTT (v1), PFN, and DAX.

The new 'holder_class' attribute for namespace devices is added for
tooling to specify the 'abstraction_guid' to store in the namespace label.
For v1.1 labels this field is undefined and any setting of
'holder_class' away from the default 'none' value will only have effect
until the driver is unloaded. Setting 'holder_class' requires that
whatever device tries to claim the namespace must be of the specified
class.
Cc: Vishal Verma
Signed-off-by: Dan Williams
14 Apr, 2017
1 commit
-
The following warning results from holding a lane spinlock,
preempt_disable(), or the btt map spinlock and then trying to take the
reconfig_mutex to walk the poison list and potentially add new entries.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
[..]
Call Trace:
dump_stack+0x85/0xc8
___might_sleep+0x184/0x250
__might_sleep+0x4a/0x90
__mutex_lock+0x58/0x9b0
? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
? acpi_nfit_forget_poison+0x79/0x80 [nfit]
? _raw_spin_unlock+0x27/0x40
mutex_lock_nested+0x1b/0x20
nvdimm_bus_lock+0x21/0x30 [libnvdimm]
nvdimm_forget_poison+0x25/0x50 [libnvdimm]
nvdimm_clear_poison+0x106/0x140 [libnvdimm]
nsio_rw_bytes+0x164/0x270 [libnvdimm]
btt_write_pg+0x1de/0x3e0 [nd_btt]
? blk_queue_enter+0x30/0x290
btt_make_request+0x11a/0x310 [nd_btt]
? blk_queue_enter+0xb7/0x290
? blk_queue_enter+0x30/0x290
generic_make_request+0x118/0x3b0

A spinlock is introduced to protect the poison list. This means we no
longer have to acquire the reconfig_mutex to touch the poison list. The
add_poison() function has been broken out into two helper functions: one to
allocate the poison entry and the other to append the entry. This allows us
to unlock the poison_lock in the non-I/O path and continue to be able to
allocate the poison entry with GFP_KERNEL. We will use GFP_NOWAIT in the I/O
path in order to satisfy being in atomic context.
Reviewed-by: Vishal Verma
Signed-off-by: Dave Jiang
Signed-off-by: Dan Williams
13 Apr, 2017
1 commit
-
Provide a mechanism to clear the poison list via the ndctl ND_CMD_CLEAR_ERROR
call. We will update the poison list and also the badblocks at region level
if the region is in dax mode or in pmem mode and not active. In other
words we force badblocks to be cleared through write requests if the
address is currently accessed through a block device, otherwise it can
only be done via the ioctl+dsm path.
Signed-off-by: Dave Jiang
Reviewed-by: Johannes Thumshirn
Signed-off-by: Dan Williams
19 Oct, 2016
1 commit
-
nd_iostat_start() and nd_iostat_end() implement the same functionality
that generic_start_io_acct() and generic_end_io_acct() already provide.

Change nd_iostat_start() and nd_iostat_end() to call the generic iostat
interfaces. There is no change in the nd interfaces.
Signed-off-by: Toshi Kani
Cc: Andrew Morton
Cc: Alexander Viro
Cc: Dave Chinner
Cc: Ross Zwisler
Signed-off-by: Dan Williams
08 Oct, 2016
1 commit
01 Oct, 2016
1 commit
-
nvdimm_clear_poison cleared the user-visible badblocks, and sent
commands to the NVDIMM to clear the areas marked as 'poison', but it
neglected to clear the same areas from the internal poison_list which is
used to marshal ARS results before sorting them by namespace. As a
result, once on-demand ARS functionality was added:

37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand

A scrub triggered from either sysfs or an MCE was found to be adding
stale entries that had been cleared from gendisk->badblocks, but were
still present in nvdimm_bus->poison_list. Additionally, the stale entries
could be triggered into producing stale disk->badblocks by simply disabling
and re-enabling the namespace or region.

This adds the missing step of clearing poison_list entries when clearing
poison, so that it is always in sync with badblocks.
Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
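Keeping the poison list in sync with badblocks, as the commit above describes, means removing a cleared range from a list of recorded ranges. The sketch below models that as interval subtraction over (start, length) tuples. The function name and data layout are assumptions made for illustration and do not reproduce the kernel's list handling.

```python
def forget_range(entries, start, length):
    """Remove [start, start + length) from a list of (start, length) entries.

    Returns a new list. Entries that only partly overlap the cleared
    range are trimmed, and an entry that surrounds it is split in two.
    """
    clr_end = start + length  # exclusive end of the cleared range
    kept = []
    for e_start, e_len in entries:
        e_end = e_start + e_len  # exclusive end of this entry
        if e_end <= start or e_start >= clr_end:
            kept.append((e_start, e_len))  # no overlap, keep unchanged
            continue
        if e_start < start:
            kept.append((e_start, start - e_start))  # part left of the cleared range
        if e_end > clr_end:
            kept.append((clr_end, e_end - clr_end))  # part right of the cleared range
    return kept
```

Clearing 10 bytes at offset 20 from a single entry covering bytes 0 to 99 leaves two entries, (0, 20) and (30, 70), so a later scrub has no stale record for the cleared bytes.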
22 Sep, 2016
1 commit
-
The internal alloc_nvdimm_map() helper might fail, particularly if the
memory region is already busy. Report request_mem_region() failures and
check for the failure.
Reported-by: Ryan Chen
Signed-off-by: Dan Williams
24 Jul, 2016
2 commits
-
Normally, an ARS (Address Range Scrub) only happens at
boot/initialization time. There can however arise situations where a
bus-wide rescan is needed - notably, in the case of discovering a latent
media error, we should do a full rescan to figure out what other sectors
are bad, and thus potentially avoid triggering an mce on them in the
future. Also provide a sysfs trigger to start a bus-wide scrub.
Cc: Rafael J. Wysocki
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
-
A recent effort to add a new nvdimm bus provider attribute highlighted a
race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down.
The typical way to handle these races is to take the device_lock() in
the attribute method and validate that the device is still active. In
order for a device to be 'active' it needs to be associated with a
driver. So, we create the small boilerplate for a driver and register
nvdimm_bus devices on the 'nvdimm_bus_type' bus.

A result of this change is that ndbusX devices now appear under
/sys/bus/nd/devices. In fact this makes /sys/class/nd somewhat
redundant, but removing that will need to take a long deprecation period
given its use by ndctl binaries in the field.

This change naturally pulls code from drivers/nvdimm/core.c to
drivers/nvdimm/bus.c, so it is a nice code organization clean-up as
well.
Cc: Vishal Verma
Signed-off-by: Dan Williams
22 Jul, 2016
1 commit
-
Let the provider module be explicitly passed in rather than implicitly
assumed by the module that calls nvdimm_bus_register(). This is in
preparation for unifying the nfit and nfit_test driver teardown paths.
Reviewed-by: Lee, Chun-Yi
Signed-off-by: Dan Williams
08 Jul, 2016
1 commit
-
In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use case, provide a generic / reference counted mapping
api. Given the fact that a dimm may belong to multiple regions (PMEM
and BLK), the flush hint addresses need to be held valid as long as any
region associated with the dimm is active. This is similar to the
existing BLK-region case where multiple BLK-regions may share an
aperture mapping. Up-level this shared / reference-counted mapping
capability from the nfit driver to a core nvdimm capability.

This eliminates the need for the nd_blk_region.disable() callback. Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.
Signed-off-by: Dan Williams
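The shared, reference-counted mapping idea in the commit above can be sketched as a small cache: the first user creates the mapping, later users share it, and it is torn down only when the last user drops it. The class and callback names below are hypothetical and stand in for whatever the core API actually provides.

```python
class SharedMappings:
    """Reference-counted cache of mappings keyed by (offset, size)."""

    def __init__(self, map_fn, unmap_fn):
        self._map_fn = map_fn      # hypothetical callback that creates a mapping
        self._unmap_fn = unmap_fn  # hypothetical callback that tears one down
        self._live = {}            # (offset, size) -> [mapping, refcount]

    def get(self, offset, size):
        """Return the mapping for this range, creating it on first use."""
        key = (offset, size)
        entry = self._live.get(key)
        if entry is None:
            entry = [self._map_fn(offset, size), 0]
            self._live[key] = entry
        entry[1] += 1
        return entry[0]

    def put(self, offset, size):
        """Drop one reference; unmap when the last reference goes away."""
        key = (offset, size)
        entry = self._live[key]
        entry[1] -= 1
        if entry[1] == 0:
            self._unmap_fn(entry[0])
            del self._live[key]
```

Two regions that call get() for the same range receive the same mapping object, and the unmap callback runs only after both have called put().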
07 Jul, 2016
1 commit
-
Initialize struct blk_integrity to 0, because blk_integrity_register() takes
the otherwise uninitialized struct blk_integrity::flags and ORs it into the
resulting block integrity structure.
Signed-off-by: Johannes Thumshirn
Signed-off-by: Dan Williams
22 May, 2016
1 commit
21 May, 2016
1 commit
-
ida instances allocate some internal memory for ->free_bitmap in
addition to the base 'struct ida'. Use ida_destroy() to release that
memory at module_exit().
Reported-by: Johannes Thumshirn
Reviewed-by: Johannes Thumshirn
Signed-off-by: Dan Williams
29 Apr, 2016
1 commit
-
Clarify the distinction between "commands", the ioctls userspace calls
to request the kernel take some action on a given dimm device, and
"_DSMs", the actual function numbers used in the firmware interface to
the DIMM. _DSMs are ACPI specific whereas commands are Linux kernel
generic.

This is in preparation for breaking the 1:1 implicit relationship
between the kernel ioctl number space and the firmware specific function
numbers.
Cc: Jerry Hoemann
Cc: Christoph Hellwig
Signed-off-by: Dan Williams
08 Apr, 2016
1 commit
-
When section alignment padding is in effect we need to shift / truncate
the range that is queried for poison by the 'start_pad' or 'end_trunc'
reservations.

It's easiest if we just pass in an adjusted resource range rather than
deriving it from the passed in namespace. With the resource range
resolution pushed out to the caller we can also push the
namespace-to-region lookup to the caller and drop the implicit pmem-type
assumption about the passed in namespace object.
Cc: Vishal Verma
Signed-off-by: Dan Williams
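The range adjustment that the commit above pushes out to the caller amounts to simple arithmetic. A minimal sketch, assuming an inclusive [start, end] resource range and byte-sized 'start_pad' / 'end_trunc' values; the function name is hypothetical:

```python
def adjusted_range(res_start, res_end, start_pad, end_trunc):
    """Shrink an inclusive [res_start, res_end] range by the two reservations."""
    start = res_start + start_pad  # skip the padding at the front
    end = res_end - end_trunc      # drop the truncated tail
    if start > end:
        raise ValueError("padding consumes the whole range")
    return start, end
```

The caller would then query poison only for the returned range instead of the full namespace resource.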
06 Mar, 2016
2 commits
-
Introduce a workqueue that will be used to run address range scrub
asynchronously with the rest of nvdimm device probing.

Userspace still wants notification when probing operations complete, so
introduce a new callback to flush this workqueue when userspace is
awaiting probe completion.
Signed-off-by: Dan Williams
-
In preparation for making poison list retrieval asynchronous to region
registration, add protection for walking and mutating the bus-level
poison list.
Cc: Vishal Verma
Signed-off-by: Dan Williams
10 Jan, 2016
3 commits
-
If a device will ever have badblocks it should always have a badblocks
instance available. So, similar to md, embed a badblocks instance in
pmem_device. This reduces pointer chasing in the i/o fast path, and
simplifies the init path.
Reported-by: Vishal Verma
Signed-off-by: Dan Williams
-
If the badblocks list runs out of space it simply means that software is
unable to intercept all errors. This is no different than the latent
discovery of new badblocks case and should not be an initialization
failure condition.
Signed-off-by: Dan Williams
-
During region creation, perform Address Range Scrubs (ARS) for the SPA
(System Physical Address) ranges to retrieve known poison locations from
firmware. Add a new data structure 'nd_poison' which is used as a list
in nvdimm_bus to store these poison locations.

When creating a pmem namespace, if there is any known poison associated
with its physical address space, convert the poison ranges to bad sectors
that are exposed using the badblocks interface.
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
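The conversion from bus-level poison ranges to per-namespace bad sectors, as described in the commit above, can be sketched as clipping each range to the namespace and expressing what remains in namespace-relative sectors. The function name, the (start, length) tuple layout, and the 512-byte sector size are assumptions for illustration.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def poison_to_bad_sectors(ns_start, ns_size, poison):
    """Map (start, length) poison ranges onto namespace-relative sector spans."""
    ns_end = ns_start + ns_size  # exclusive end of the namespace
    spans = []
    for p_start, p_len in poison:
        lo = max(p_start, ns_start)
        hi = min(p_start + p_len, ns_end)  # exclusive
        if lo >= hi:
            continue  # poison lies entirely outside this namespace
        first = (lo - ns_start) // SECTOR_SIZE
        last = (hi - 1 - ns_start) // SECTOR_SIZE
        spans.append((first, last - first + 1))
    return spans
```

Poison that falls outside the namespace is skipped, and poison that straddles the namespace boundary is clipped before it is turned into a (first_sector, num_sectors) pair.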
22 Oct, 2015
3 commits
-
The libnvdimm-btt and nvme drivers use blk_integrity to reserve space
for per-sector metadata, but sometimes without protection checksums.
This property is generically useful, so teach the block core to
internally specify a nop profile if one is not provided at registration
time.
Cc: Keith Busch
Cc: Matthew Wilcox
Suggested-by: Christoph Hellwig
[hch: kill the local nvme nop profile as well]
Acked-by: Martin K. Petersen
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe
-
Up until now the integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM jumps through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

- Moving the blk_integrity definition to genhd.h.
- Inlining blk_integrity in struct gendisk.
- Removing the dynamic allocation code.
- Adding helper functions which allow gendisk to set up and tear down
  the integrity sysfs dir when a disk is added/deleted.
- Adding a blk_integrity_revalidate() callback for updating the stable
  pages bdi setting.
- The calls that depend on whether a device has an integrity profile or
  not now key off of the bi->profile pointer.
- Simplifying the integrity support routines in DM (Mike Snitzer).
Signed-off-by: Martin K. Petersen
Reported-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Mike Snitzer
Cc: Dan Williams
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe
-
We previously made a complete copy of a device's data integrity profile
even though several of the fields inside the blk_integrity struct are
pointers to fixed template entries in t10-pi.c.

Split the static and per-device portions so that we can reference the
template directly.
Signed-off-by: Martin K. Petersen
Reported-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Cc: Dan Williams
Signed-off-by: Dan Williams
Signed-off-by: Jens Axboe
26 Jun, 2015
3 commits
-
This is disabled by default as the overhead is prohibitive, but if the
user takes the action to turn it on we'll oblige.
Reviewed-by: Vishal Verma
Signed-off-by: Dan Williams
-
Support multiple block sizes (sector + metadata) for nd_blk in the
same way as done for the BTT. Add the idea of an 'internal' lbasize,
which is properly aligned and padded, and store metadata in this space.
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
-
Support multiple block sizes (sector + metadata) using the blk integrity
framework. This registers a new integrity template that defines the
protection information tuple size based on the configured metadata size,
and simply acts as a passthrough for protection information generated by
another layer. The metadata is written to the storage as-is, and read back
with each sector.
Signed-off-by: Vishal Verma
Signed-off-by: Dan Williams
25 Jun, 2015
5 commits
-
A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm. They may alias with one or
more pmem interleave sets that include the given dimm.

This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later
patch will make these settings persistent by writing back the label(s).

Unlike pmem namespaces, multiple blk namespaces can be created per
region. Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated. As long as
a region has 'available_size' != 0 new child namespaces may be created.
Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Signed-off-by: Dan Williams
-
A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.

Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes. A later patch will make
these settings persistent by writing back the label.

Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.
Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Signed-off-by: Dan Williams
-
On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface. A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.

Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active. Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command. Adding/deleting namespaces from an active
interleave set is always possible via sysfs.

Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered. For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration. It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.
Cc: Neil Brown
Cc:
Cc: Greg KH
Cc: Robert Moore
Cc: Rafael J. Wysocki
Acked-by: Christoph Hellwig
Acked-by: Rafael J. Wysocki
Signed-off-by: Dan Williams
-
The libnvdimm region driver is an intermediary driver that translates
non-volatile "region"s into "namespace" sub-devices that are surfaced by
persistent memory block-device drivers (PMEM and BLK).

ACPI 6 introduces the concept that a given nvdimm may simultaneously
offer multiple access modes to its media through direct PMEM load/store
access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
If an nvdimm is single interfaced, then there is no need for dimm
metadata labels. For these devices we can take the region boundaries
directly to create a child namespace device (nd_namespace_io).
Acked-by: Christoph Hellwig
Tested-by: Toshi Kani
Signed-off-by: Dan Williams
-
* Implement the device-model infrastructure for loading modules and
attaching drivers to nvdimm devices. This is a simple association of a
nd-device-type number with a driver that has a bitmask of supported
device types. To facilitate userspace bind/unbind operations 'modalias'
and 'devtype', that also appear in the uevent, are added as generic
sysfs attributes for all nvdimm devices. The reason for the device-type
number is to support sub-types within a given parent devtype, be it a
vendor-specific sub-type or otherwise.

* The first consumer of this infrastructure is the driver
for dimm devices. It simply uses control messages to retrieve and
store the configuration-data image (label set) from each dimm.

Note: nd_device_register() arranges for asynchronous registration of
nvdimm bus devices by default.
Cc: Greg KH
Cc: Neil Brown
Acked-by: Christoph Hellwig
Tested-by: Toshi Kani
Signed-off-by: Dan Williams
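The association described in the commit above, a device-type number matched against a driver's bitmask of supported types, can be modelled in a few lines. The type numbers and function names below are hypothetical placeholders; the real values live in the kernel headers.

```python
# Hypothetical device-type numbers, chosen only for illustration.
ND_TYPE_DIMM = 1
ND_TYPE_REGION_PMEM = 2
ND_TYPE_REGION_BLK = 3


def type_mask(*device_types):
    """Build a driver's bitmask from the device-type numbers it supports."""
    mask = 0
    for t in device_types:
        mask |= 1 << t
    return mask


def driver_matches(driver_mask, device_type):
    """A driver binds to a device when the device's type bit is set in its mask."""
    return bool(driver_mask & (1 << device_type))
```

A single region driver built with type_mask(ND_TYPE_REGION_PMEM, ND_TYPE_REGION_BLK) would match both region sub-types but not a dimm device.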