18 Nov, 2019

1 commit

  • Statically initialize the attribute groups for each libnvdimm
    device_type. This is a preparation step for removing unnecessary exports
    of attributes that can be included in the device_type by default.

    Also take the opportunity to mark 'struct device_type' instances const.

    Cc: Ira Weiny
    Cc: Vishal Verma
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309900111.1582359.2445687530383470348.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

19 Jul, 2019

1 commit

  • For good reason, the standard device_lock() is marked
    lockdep_set_novalidate_class() because there is simply no sane way to
    describe the myriad ways the device_lock() ordered with other locks.
    However, that leaves subsystems that know their own local device_lock()
    ordering rules to find lock ordering mistakes manually. Instead,
    introduce an optional / additional lockdep-enabled lock that a subsystem
    can acquire in all the same paths that the device_lock() is acquired.

    A conversion of the NFIT driver and NVDIMM subsystem to a
    lockdep-validate device_lock() scheme is included. The
    debug_nvdimm_lock() implementation implements the correct lock-class and
    stacking order for the libnvdimm device topology hierarchy.

    Yes, this is a hack, but hopefully it is a useful hack for other
    subsystems device_lock() debug sessions. Quoting Greg:

    "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
    using it as much as anything else, so user beware :)

    I don't object to it if it makes things easier for you to debug."

    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Will Deacon
    Cc: Dave Jiang
    Cc: Keith Busch
    Cc: Peter Zijlstra
    Cc: Vishal Verma
    Cc: "Rafael J. Wysocki"
    Cc: Greg Kroah-Hartman
    Signed-off-by: Dan Williams
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Ira Weiny
    Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Mar, 2019

1 commit


01 Mar, 2019

1 commit

  • The Linux BTT implementation assumes that log entries will never have
    the 'zero' flag set, and indeed it never sets that flag for log entries
    itself.

    However, the UEFI spec is ambiguous on the exact format of the LBA field
    of a log entry, specifically as to whether it should include the
    additional flag bits or not. While a zero bit doesn't make sense in the
    context of a log entry, other BTT implementations might still have it set.

    If an implementation does happen to have it set, we would happily read
    it in as the next block to write to for writes. Since a high bit is set,
    it pushes the block number out of the range of an 'arena', and we fail
    such a write with an EIO.

    Follow the robustness principle, and tolerate such implementations by
    stripping out the zero flag when populating the free list during
    initialization. Additionally, use the same stripped out entries for
    detection of incomplete writes and map restoration that happens at this
    stage.

    Add a sysfs file 'log_zero_flags' that indicates the ability to accept
    such a layout to userspace applications. This enables 'ndctl
    check-namespace' to recognize whether the kernel is able to handle zero
    flags, or whether it should attempt a fix-up under the --repair option.

    Cc: Dan Williams
    Reported-by: Dexuan Cui
    Reported-by: Pedro d'Aquino Filocre F S Barbuda
    Tested-by: Dexuan Cui
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

07 Mar, 2018

1 commit

  • Dynamic debug can be instructed to add the function name to the debug
    output using the +f switch, so there is no need for the libnvdimm
    modules to do it again. If a user decides to add the +f switch for
    libnvdimm's dynamic debug this results in double prints of the function
    name.

    Reported-by: Johannes Thumshirn
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

12 Aug, 2017

1 commit


30 Jun, 2017

1 commit

  • The UEFI 2.7 specification defines an updated BTT metadata format,
    bumping the revision to 2.0. Add support for the new format, while
    retaining compatibility for the old 1.1 format.

    Cc: Toshi Kani
    Cc: Linda Knippers
    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

16 Jun, 2017

1 commit

  • Starting with v1.2 labels, 'address abstractions' can be hinted via an
    address abstraction id that implies an info-block format. The standard
    address abstraction in the specification is the v2 format of the
    Block-Translation-Table (BTT). Support for that is saved for a later
    patch, for now we add support for the Linux supported address
    abstractions BTT (v1), PFN, and DAX.

    The new 'holder_class' attribute for namespace devices is added for
    tooling to specify the 'abstraction_guid' to store in the namespace label.
    For v1.1 labels this field is undefined and any setting of
    'holder_class' away from the default 'none' value will only have effect
    until the driver is unloaded. Setting 'holder_class' requires that
    whatever device tries to claim the namespace must be of the specified
    class.

    Cc: Vishal Verma
    Signed-off-by: Dan Williams

    Dan Williams
     

05 Jun, 2017

1 commit

  • Hoist the libnvdimm helper as an inline helper to linux/uuid.h
    using an auxiliary const variable uuid_null in lib/uuid.c.

    [hch: also add the guid variant. Both do the same but I'd like
    to keep casts to a minimum]

    The common helper uses the new abstract type uuid_t * instead of
    u8 *.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Amir Goldstein
    [hch: added guid_is_null]
    Signed-off-by: Christoph Hellwig
    Acked-by: Dan Williams
    Reviewed-by: Andy Shevchenko

    Christoph Hellwig
     

11 May, 2017

1 commit

  • nsio_rw_bytes can clear media errors, but this cannot be done while we
    are in an atomic context due to locking within ACPI. From the BTT,
    ->rw_bytes may be called either from atomic or process context depending
    on whether the calls happen during initialization or during IO.

    During init, we want to ensure error clearing happens, and the flag
    marking process context allows nsio_rw_bytes to do that. When called
    during IO, we're in atomic context, and error clearing can be skipped.

    Cc: Dan Williams
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

01 May, 2017

1 commit

  • A debug patch to turn the standard device_lock() into something that
    lockdep can analyze yielded the following:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.11.0-rc4+ #106 Tainted: G O
    -------------------------------------------------------
    lt-libndctl/1898 is trying to acquire lock:
    (&dev->nvdimm_mutex/3){+.+.+.}, at: [] nd_attach_ndns+0x178/0x1b0 [libnvdimm]

    but task is already holding lock:
    (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [] nvdimm_bus_lock+0x21/0x30 [libnvdimm]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nvdimm_bus_lock+0x21/0x30 [libnvdimm]
    nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
    nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
    nd_pmem_probe+0x14/0x180 [nd_pmem]
    nvdimm_bus_probe+0xa9/0x260 [libnvdimm]

    -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
    __lock_acquire+0x1107/0x1280
    lock_acquire+0xf6/0x1f0
    __mutex_lock+0x88/0x980
    mutex_lock_nested+0x1b/0x20
    nd_attach_ndns+0x178/0x1b0 [libnvdimm]
    nd_namespace_store+0x308/0x3c0 [libnvdimm]
    namespace_store+0x87/0x220 [libnvdimm]

    In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.

    Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
    nd_{attach,detach}_ndns() operations.

    Cc:
    Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices")
    Reported-by: Yi Zhang
    Signed-off-by: Dan Williams

    Dan Williams
     

09 Aug, 2016

1 commit


24 Jul, 2016

1 commit


23 Apr, 2016

2 commits

  • In preparation for providing an alternative (to block device) access
    mechanism to persistent memory, convert pmem_rw_bytes() to
    nsio_rw_bytes(). This allows ->rw_bytes() functionality without
    requiring a 'struct pmem_device' to be instantiated.

    In other words, when ->rw_bytes() is in use i/o is driven through
    'struct nd_namespace_io', otherwise it is driven through 'struct
    pmem_device' and the block layer. This consolidates the disjoint calls
    to devm_exit_badblocks() and devm_memunmap() into a common
    devm_nsio_disable() and cleans up the init path to use a unified
    pmem_attach_disk() implementation.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Pass the device performing the probe so we can use a devm allocation for
    the btt superblock.

    Cc: Vishal Verma
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams

    Dan Williams
     

17 Sep, 2015

1 commit


29 Aug, 2015

1 commit

  • Implement the base infrastructure for libnvdimm PFN devices. Similar to
    BTT devices they take a namespace as a backing device and layer
    functionality on top. In this case the functionality is reserving space
    for an array of 'struct page' entries to be handed out through
    pfn_to_page(). For now this is just the basic libnvdimm-device-model for
    configuring the base PFN device.

    As the namespace claiming mechanism for PFN devices is mostly identical
    to BTT devices drivers/nvdimm/claim.c is created to house the common
    bits.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

15 Aug, 2015

2 commits

  • When a BTT is instantiated on a namespace it must validate the namespace
    uuid matches the 'parent_uuid' stored in the btt superblock. This
    property enforces that changing the namespace UUID invalidates all
    former BTT instances on that storage. For "IO namespaces" that don't
    have a label or UUID, the parent_uuid is set to zero, and this
    validation is skipped. For such cases, old BTTs have to be invalidated
    by forcing the namespace to raw mode, and overwriting the BTT info
    blocks.

    Based on a patch by Dan Williams

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • Use arena_is_valid as a common routine for checking the validity of an
    info block from both discover_arenas, and nd_btt_probe.

    As a result, don't check for validity of the BTT's UUID, and lbasize.
    The checksum in the BTT info block guarantees self-consistency, and when
    we're called from nd_btt_probe, we don't have a valid uuid or lbasize
    available to check against.

    Also cleanup to return a bool instead of an int.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

26 Jun, 2015

3 commits

  • Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
    under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.

    An example of numa_node values on a 2-socket system with a single
    NVDIMM range on each socket is shown below.
    /sys/bus/nd/devices
    |-- btt0.0/numa_node:0
    |-- btt1.0/numa_node:1
    |-- btt1.1/numa_node:1
    |-- namespace0.0/numa_node:0
    |-- namespace1.0/numa_node:1
    |-- region0/numa_node:0
    |-- region1/numa_node:1

    These numa_node files are then linked under the block class of
    their device names.
    /sys/class/block/pmem0/device/numa_node:0
    /sys/class/block/pmem1s/device/numa_node:1

    This enables numactl(8) to accept 'block:' and 'file:' paths of
    pmem and btt devices as shown in the examples below.
    numactl --preferred block:pmem0 --show
    numactl --preferred file:/dev/pmem1s --show

    Signed-off-by: Toshi Kani
    Signed-off-by: Dan Williams

    Toshi Kani
     
  • Support multiple block sizes (sector + metadata) using the blk integrity
    framework. This registers a new integrity template that defines the
    protection information tuple size based on the configured metadata size,
    and simply acts as a passthrough for protection information generated by
    another layer. The metadata is written to the storage as-is, and read back
    with each sector.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     
  • BTT stands for Block Translation Table, and is a way to provide power
    fail sector atomicity semantics for block devices that have the ability
    to perform byte granularity IO. It relies on the capability of libnvdimm
    namespace devices to do byte aligned IO.

    The BTT works as a stacked blocked device, and reserves a chunk of space
    from the backing device for its accounting metadata. It is a bio-based
    driver because all IO is done synchronously, and there is no queuing or
    asynchronous completions at either the device or the driver level.

    The BTT uses 'lanes' to index into various 'on-disk' data structures,
    and lanes also act as a synchronization mechanism in case there are more
    CPUs than available lanes. We did a comparison between two lane lock
    strategies - first where we kept an atomic counter around that tracked
    which was the last lane that was used, and 'our' lane was determined by
    atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
    theoretically, no CPU would be blocked waiting for a lane. The other
    strategy was to use the cpu number we're scheduled on to and hash it to
    a lane number. Theoretically, this could block an IO that could've
    otherwise run using a different, free lane. But some fio workloads
    showed that the direct cpu -> lane hash performed faster than tracking
    'last lane' - my reasoning is the cache thrash caused by moving the
    atomic variable made that approach slower than simply waiting out the
    in-progress IO. This supports the conclusion that the driver can be a
    very simple bio-based one that does synchronous IOs instead of queuing.

    Cc: Andy Lutomirski
    Cc: Boaz Harrosh
    Cc: H. Peter Anvin
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Neil Brown
    Cc: Jeff Moyer
    Cc: Dave Chinner
    Cc: Greg KH
    [jmoyer: fix nmi watchdog timeout in btt_map_init]
    [jmoyer: move btt initialization to module load path]
    [jmoyer: fix memory leak in the btt initialization path]
    [jmoyer: Don't overwrite corrupted arenas]
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

25 Jun, 2015

1 commit

  • NVDIMM namespaces, in addition to accepting "struct bio" based requests,
    also have the capability to perform byte-aligned accesses. By default
    only the bio/block interface is used. However, if another driver can
    make effective use of the byte-aligned capability it can claim namespace
    interface and use the byte-aligned ->rw_bytes() interface.

    The BTT driver is the initial first consumer of this mechanism to allow
    adding atomic sector update semantics to a pmem or blk namespace. This
    patch is the sysfs infrastructure to allow configuring a BTT instance
    for a namespace. Enabling that BTT and performing i/o is in a
    subsequent patch.

    Cc: Greg KH
    Cc: Neil Brown
    Signed-off-by: Dan Williams

    Dan Williams