07 Sep, 2019

1 commit

  • The infrastructure to mock core libnvdimm routines for unit testing
    purposes is prone to bitrot relative to refactoring of that core. Arrange
    for the unit test core to be built when CONFIG_COMPILE_TEST=y. This does
    not result in a functional unit test environment, it is only a helper for
    0day to catch unit test build regressions.

    Note that there are a few x86isms in the implementation, so this does not
    bother compile testing this architectures other than 64-bit x86.

    Link: https://lore.kernel.org/r/156763690875.2556198.15786177395425033830.stgit@dwillia2-desk3.amr.corp.intel.com
    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams
    Signed-off-by: Jason Gunthorpe

    Dan Williams
     

06 Jul, 2019

1 commit

  • This patch adds virtio-pmem driver for KVM guest.

    Guest reads the persistent memory range information from
    Qemu over VIRTIO and registers it on nvdimm_bus. It also
    creates a nd_region object with the persistent memory
    range information so that existing 'nvdimm/pmem' driver
    can reserve this into system memory map. This way
    'virtio-pmem' driver uses existing functionality of pmem
    driver to register persistent memory compatible for DAX
    capable filesystems.

    This also provides function to perform guest flush over
    VIRTIO from 'pmem' driver when userspace performs flush
    on DAX memory range.

    Signed-off-by: Pankaj Gupta
    Reviewed-by: Yuval Shaia
    Acked-by: Michael S. Tsirkin
    Acked-by: Jakub Staron
    Tested-by: Jakub Staron
    Reviewed-by: Cornelia Huck
    Signed-off-by: Dan Williams

    Pankaj Gupta
     

14 Dec, 2018

1 commit

  • Add support to unlock the dimm via the kernel key management APIs. The
    passphrase is expected to be pulled from userspace through keyutils.
    The key management and sysfs attributes are libnvdimm generic.

    Encrypted keys are used to protect the nvdimm passphrase at rest. The
    master key can be a trusted-key sealed in a TPM, preferred, or an
    encrypted-key, more flexible, but more exposure to a potential attacker.

    Signed-off-by: Dave Jiang
    Co-developed-by: Dan Williams
    Reported-by: Randy Dunlap
    Signed-off-by: Dan Williams

    Dave Jiang
     

07 Apr, 2018

1 commit

  • This patch adds peliminary device-tree bindings for persistent memory
    regions. The driver registers a libnvdimm bus for each pmem-region
    node and each address range under the node is converted to a region
    within that bus.

    Signed-off-by: Oliver O'Halloran
    Signed-off-by: Dan Williams

    Oliver O'Halloran
     

18 Nov, 2017

1 commit

  • Pull libnvdimm and dax updates from Dan Williams:
    "Save for a few late fixes, all of these commits have shipped in -next
    releases since before the merge window opened, and 0day has given a
    build success notification.

    The ext4 touches came from Jan, and the xfs touches have Darrick's
    reviewed-by. An xfstest for the MAP_SYNC feature has been through
    a few round of reviews and is on track to be merged.

    - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
    'userspace flush' of persistent memory updates via filesystem-dax
    mappings. It arranges for any filesystem metadata updates that may
    be required to satisfy a write fault to also be flushed ("on disk")
    before the kernel returns to userspace from the fault handler.
    Effectively every write-fault that dirties metadata completes an
    fsync() before returning from the fault handler. The new
    MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
    is validated as supported by the filesystem's ->mmap() file
    operation.

    - Add support for the standard ACPI 6.2 label access methods that
    replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
    This enables interoperability with environments that only implement
    the standardized methods.

    - Add support for the ACPI 6.2 NVDIMM media error injection methods.

    - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
    latch last shutdown status, firmware update, SMART error injection,
    and SMART alarm threshold control.

    - Cleanup physical address information disclosures to be root-only.

    - Fix revalidation of the DIMM "locked label area" status to support
    dynamic unlock of the label area.

    - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
    (system-physical-address) command and error injection commands.

    Acknowledgements that came after the commits were pushed to -next:

    - 957ac8c421ad ("dax: fix PMD faults on zero-length files"):
    Reviewed-by: Ross Zwisler

    - a39e596baa07 ("xfs: support for synchronous DAX faults") and
    7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
    Reviewed-by: Darrick J. Wong "

    * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
    acpi, nfit: add 'Enable Latch System Shutdown Status' command support
    dax: fix general protection fault in dax_alloc_inode
    dax: fix PMD faults on zero-length files
    dax: stop requiring a live device for dax_flush()
    brd: remove dax support
    dax: quiet bdev_dax_supported()
    fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
    tools/testing/nvdimm: unit test clear-error commands
    acpi, nfit: validate commands against the device type
    tools/testing/nvdimm: stricter bounds checking for error injection commands
    xfs: support for synchronous DAX faults
    xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
    ext4: Support for synchronous DAX faults
    ext4: Simplify error handling in ext4_dax_huge_fault()
    dax: Implement dax_finish_sync_fault()
    dax, iomap: Add support for synchronous faults
    mm: Define MAP_SYNC and VM_SYNC flags
    dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
    dax: Allow dax_iomap_fault() to return pfn
    dax: Fix comment describing dax_iomap_fault()
    ...

    Linus Torvalds
     

03 Nov, 2017

1 commit

  • nfit_test needs to use the poison list manipulation code as well. Make
    it more generic and in the process rename poison to badrange, and move
    all the related helpers to a new file.

    Signed-off-by: Dave Jiang
    [vishal: Add badrange.o to nfit_test's Kbuild]
    [vishal: add a missed include in bus.c for the new badrange functions]
    [vishal: rename all instances of 'be' to 'bre']
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Dave Jiang
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 May, 2016

1 commit

  • Device DAX is the device-centric analogue of Filesystem DAX
    (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and
    mapped without need of an intervening file system. This initial
    infrastructure arranges for a libnvdimm pfn-device to be represented as
    a different device-type so that it can be attached to a driver other
    than the pmem driver.

    Signed-off-by: Dan Williams

    Dan Williams
     

29 Aug, 2015

1 commit

  • Implement the base infrastructure for libnvdimm PFN devices. Similar to
    BTT devices they take a namespace as a backing device and layer
    functionality on top. In this case the functionality is reserving space
    for an array of 'struct page' entries to be handed out through
    pfn_to_page(). For now this is just the basic libnvdimm-device-model for
    configuring the base PFN device.

    As the namespace claiming mechanism for PFN devices is mostly identical
    to BTT devices drivers/nvdimm/claim.c is created to house the common
    bits.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

19 Aug, 2015

1 commit

  • We currently register a platform device for e820 type-12 memory and
    register a nvdimm bus beneath it. Registering the platform device
    triggers the device-core machinery to probe for a driver, but that
    search currently comes up empty. Building the nvdimm-bus registration
    into the e820_pmem platform device registration in this way forces
    libnvdimm to be built-in. Instead, convert the built-in portion of
    CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
    rest of the logic to the driver for e820_pmem, for the following
    reasons:

    1/ Letting e820_pmem support be a module allows building and testing
    libnvdimm.ko changes without rebooting

    2/ All the normal policy around modules can be applied to e820_pmem
    (unbind to disable and/or blacklisting the module from loading by
    default)

    3/ Moving the driver to a generic location and converting it to scan
    "iomem_resource" rather than "e820.map" means any other architecture can
    take advantage of this simple nvdimm resource discovery mechanism by
    registering a resource named "Persistent Memory (legacy)"

    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

26 Jun, 2015

2 commits

  • The libnvdimm implementation handles allocating dimm address space (DPA)
    between PMEM and BLK mode interfaces. After DPA has been allocated from
    a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
    as a struct bio based block device. Unlike PMEM, BLK is required to
    handle platform specific details like mmio register formats and memory
    controller interleave. For this reason the libnvdimm generic nd_blk
    driver calls back into the bus provider to carry out the I/O.

    This initial implementation handles the BLK interface defined by the
    ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
    DCR (dimm control region), BDW (block data window), IDT (interleave
    descriptor) NFIT structures and the hardware register format.
    [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
    [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Cc: Andy Lutomirski
    Cc: Boaz Harrosh
    Cc: H. Peter Anvin
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Ross Zwisler
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Ross Zwisler
     
  • BTT stands for Block Translation Table, and is a way to provide power
    fail sector atomicity semantics for block devices that have the ability
    to perform byte granularity IO. It relies on the capability of libnvdimm
    namespace devices to do byte aligned IO.

    The BTT works as a stacked blocked device, and reserves a chunk of space
    from the backing device for its accounting metadata. It is a bio-based
    driver because all IO is done synchronously, and there is no queuing or
    asynchronous completions at either the device or the driver level.

    The BTT uses 'lanes' to index into various 'on-disk' data structures,
    and lanes also act as a synchronization mechanism in case there are more
    CPUs than available lanes. We did a comparison between two lane lock
    strategies - first where we kept an atomic counter around that tracked
    which was the last lane that was used, and 'our' lane was determined by
    atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
    theoretically, no CPU would be blocked waiting for a lane. The other
    strategy was to use the cpu number we're scheduled on to and hash it to
    a lane number. Theoretically, this could block an IO that could've
    otherwise run using a different, free lane. But some fio workloads
    showed that the direct cpu -> lane hash performed faster than tracking
    'last lane' - my reasoning is the cache thrash caused by moving the
    atomic variable made that approach slower than simply waiting out the
    in-progress IO. This supports the conclusion that the driver can be a
    very simple bio-based one that does synchronous IOs instead of queuing.

    Cc: Andy Lutomirski
    Cc: Boaz Harrosh
    Cc: H. Peter Anvin
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Neil Brown
    Cc: Jeff Moyer
    Cc: Dave Chinner
    Cc: Greg KH
    [jmoyer: fix nmi watchdog timeout in btt_map_init]
    [jmoyer: move btt initialization to module load path]
    [jmoyer: fix memory leak in the btt initialization path]
    [jmoyer: Don't overwrite corrupted arenas]
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

25 Jun, 2015

9 commits

  • NVDIMM namespaces, in addition to accepting "struct bio" based requests,
    also have the capability to perform byte-aligned accesses. By default
    only the bio/block interface is used. However, if another driver can
    make effective use of the byte-aligned capability it can claim namespace
    interface and use the byte-aligned ->rw_bytes() interface.

    The BTT driver is the initial first consumer of this mechanism to allow
    adding atomic sector update semantics to a pmem or blk namespace. This
    patch is the sysfs infrastructure to allow configuring a BTT instance
    for a namespace. Enabling that BTT and performing i/o is in a
    subsequent patch.

    Cc: Greg KH
    Cc: Neil Brown
    Signed-off-by: Dan Williams

    Dan Williams
     
  • This on media label format [1] consists of two index blocks followed by
    an array of labels. None of these structures are ever updated in place.
    A sequence number tracks the current active index and the next one to
    write, while labels are written to free slots.

    +------------+
    | |
    | nsindex0 |
    | |
    +------------+
    | |
    | nsindex1 |
    | |
    +------------+
    | label0 |
    +------------+
    | label1 |
    +------------+
    | |
    ....nslot...
    | |
    +------------+
    | labelN |
    +------------+

    After reading valid labels, store the dpa ranges they claim into
    per-dimm resource trees.

    [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

    Cc: Neil Brown
    Acked-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Prepare the pmem driver to consume PMEM namespaces emitted by regions of
    an nvdimm_bus instance. No functional change.

    Acked-by: Christoph Hellwig
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The libnvdimm region driver is an intermediary driver that translates
    non-volatile "region"s into "namespace" sub-devices that are surfaced by
    persistent memory block-device drivers (PMEM and BLK).

    ACPI 6 introduces the concept that a given nvdimm may simultaneously
    offer multiple access modes to its media through direct PMEM load/store
    access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
    interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
    If an nvdimm is single interfaced, then there is no need for dimm
    metadata labels. For these devices we can take the region boundaries
    directly to create a child namespace device (nd_namespace_io).

    Acked-by: Christoph Hellwig
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A "region" device represents the maximum capacity of a BLK range (mmio
    block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
    volatile memory), without regard for aliasing. Aliasing, in the
    dimm-local address space (DPA), is resolved by metadata on a dimm to
    designate which exclusive interface will access the aliased DPA ranges.
    Support for the per-dimm metadata/label arrvies is in a subsequent
    patch.

    The name format of "region" devices is "regionN" where, like dimms, N is
    a global ida index assigned at discovery time. This id is not reliable
    across reboots nor in the presence of hotplug. Look to attributes of
    the region or static id-data of the sub-namespace to generate a
    persistent name. However, if the platform configuration does not change
    it is reasonable to expect the same region id to be assigned at the next
    boot.

    "region"s have 2 generic attributes "size", and "mapping"s where:
    - size: the BLK accessible capacity or the span of the
    system physical address range in the case of PMEM.

    - mappingN: a tuple describing a dimm's contribution to the region's
    capacity in the format (,,). For a PMEM-region
    there will be at least one mapping per dimm in the interleave set. For
    a BLK-region there is only "mapping0" listing the starting DPA of the
    BLK-region and the available DPA capacity of that space (matches "size"
    above).

    The max number of mappings per "region" is hard coded per the
    constraints of sysfs attribute groups. That said the number of mappings
    per region should never exceed the maximum number of possible dimms in
    the system. If the current number turns out to not be enough then the
    "mappings" attribute clarifies how many there are supposed to be. "32
    should be enough for anybody...".

    Cc: Neil Brown
    Cc:
    Cc: Greg KH
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • * Implement the device-model infrastructure for loading modules and
    attaching drivers to nvdimm devices. This is a simple association of a
    nd-device-type number with a driver that has a bitmask of supported
    device types. To facilitate userspace bind/unbind operations 'modalias'
    and 'devtype', that also appear in the uevent, are added as generic
    sysfs attributes for all nvdimm devices. The reason for the device-type
    number is to support sub-types within a given parent devtype, be it a
    vendor-specific sub-type or otherwise.

    * The first consumer of this infrastructure is the driver
    for dimm devices. It simply uses control messages to retrieve and
    store the configuration-data image (label set) from each dimm.

    Note: nd_device_register() arranges for asynchronous registration of
    nvdimm bus devices by default.

    Cc: Greg KH
    Cc: Neil Brown
    Acked-by: Christoph Hellwig
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Enable nvdimm devices to be registered on a nvdimm_bus. The kernel
    assigned device id for nvdimm devicesis dynamic. If userspace needs a
    more static identifier it should consult a provider-specific attribute.
    In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
    'nmemX/nfit/serial' attributes may be used for this purpose.

    Cc: Neil Brown
    Cc:
    Cc: Greg KH
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The control device for a nvdimm_bus is registered as an "nd" class
    device. The expectation is that there will usually only be one "nd" bus
    registered under /sys/class/nd. However, we allow for the possibility
    of multiple buses and they will listed in discovery order as
    ndctl0...ndctlN. This character device hosts the ioctl for passing
    control messages. The initial command set has a 1:1 correlation with
    the commands listed in the by the "NFIT DSM Example" document [1], but
    this scheme is extensible to future command sets.

    Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
    a subsequent patch. This is simply the initial registrations and sysfs
    attributes.

    [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Cc: Neil Brown
    Cc: Greg KH
    Cc:
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A struct nvdimm_bus is the anchor device for registering nvdimm
    resources and interfaces, for example, a character control device,
    nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware
    Interface Table) is one possible platform description for such
    non-volatile memory resources in a system. The nfit.ko driver attaches
    to the "ACPI0012" device that indicates the presence of the NFIT and
    parses the table to register a struct nvdimm_bus instance.

    Cc:
    Cc: Lv Zheng
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Jeff Moyer
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams