01 Dec, 2015

1 commit

  • When support for _FIT was added, the code presumed that the data
    returned by the _FIT method is identical to the NFIT table, which
    starts with an acpi_table_header. However, the _FIT is defined
    to return data in the format of a series of NFIT type structure
    entries and as a method, has an acpi_object header rather than
    an acpi_table_header.

    To address the differences, explicitly save the acpi_table_header
    from the NFIT, since it is accessible through /sys, and change
    the nfit pointer in the acpi_desc structure to point to the
    table entries rather than the headers.
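
    A minimal userspace sketch of the distinction, not the driver code:
    the static NFIT image carries a table header in front of its
    entries, while a _FIT result is treated here as entries only, so
    the same walker can serve both once the header is skipped. All
    sketch_* names are hypothetical. The 36-byte ACPI table header and
    the 4 reserved NFIT bytes are stated from memory of the ACPI 6.0
    layout and should be checked against the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: standard ACPI table header, then 4 reserved NFIT bytes. */
#define SKETCH_ACPI_HEADER_LEN   36u
#define SKETCH_NFIT_RESERVED_LEN 4u

/* Every NFIT sub-structure begins with a 16-bit type and a 16-bit length. */
struct sketch_nfit_entry {
	uint16_t type;
	uint16_t length;
};

/* Skip the table header of a static NFIT image; returns the first entry. */
static const uint8_t *sketch_table_entries(const uint8_t *table, size_t len,
					   size_t *entries_len)
{
	size_t skip = SKETCH_ACPI_HEADER_LEN + SKETCH_NFIT_RESERVED_LEN;

	assert(table != NULL && entries_len != NULL);
	if (len < skip) {
		*entries_len = 0;
		return NULL;
	}
	*entries_len = len - skip;
	return table + skip;
}

/* Walk a headerless run of entries (for example, data returned by _FIT). */
static int sketch_count_entries(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off + sizeof(struct sketch_nfit_entry) <= len) {
		struct sketch_nfit_entry e;

		memcpy(&e, buf + off, sizeof(e));
		/* Stop on a malformed or zero length rather than loop forever. */
		if (e.length < sizeof(e) || e.length > len - off)
			break;
		off += e.length;
		count++;
	}
	return count;
}
```

    With this split, only the static-table path needs to know about the
    header; everything downstream sees a pointer to entries.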

    Reported-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Linda Knippers
    Acked-by: Vishal Verma
    [vishal: fix up unit test for new header assumptions]
    Signed-off-by: Dan Williams

    Linda Knippers
     

11 Nov, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "Outside of the new ACPI-NFIT hot-add support this pull request is more
    notable for what it does not contain, than what it does. There were a
    handful of development topics this cycle, dax get_user_pages, dax
    fsync, and raw block dax, that need more iteration and will wait
    for 4.5.

    The patches to make devm and the pmem driver NUMA aware have been in
    -next for several weeks. The hot-add support has not, but is
    contained to the NFIT driver and is passing unit tests. The coredump
    support is straightforward and was looked over by Jeff. All of it has
    received a 0day build success notification across 107 configs.

    Summary:

    - Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.

    - Teach the coredump implementation how to filter out DAX mappings.

    - Introduce NUMA hints for allocations made by the pmem driver, and
    as a side effect all devm allocations now hint their NUMA node by
    default"

    * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    coredump: add DAX filtering for FDPIC ELF coredumps
    coredump: add DAX filtering for ELF coredumps
    acpi: nfit: Add support for hot-add
    nfit: in acpi_nfit_init, break on a 0-length table
    pmem, memremap: convert to numa aware allocations
    devm_memremap_pages: use numa_mem_id
    devm: make allocations numa aware by default
    devm_memremap: convert to return ERR_PTR
    devm_memunmap: use devres_release()
    pmem: kill memremap_pmem()
    x86, mm: quiet arch_add_memory()

    Linus Torvalds
     

03 Nov, 2015

1 commit

  • Add a .notify callback to the acpi_nfit_driver that gets called on a
    hotplug event. From this, evaluate the _FIT ACPI method which returns
    the updated NFIT with handles for the hot-plugged NVDIMM.

    Iterate over the new NFIT, and add any new tables found, and
    register/enable the corresponding regions.
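
    The "add any new tables found" step can be pictured as a merge of
    the updated entry list into the set already known. This is only a
    userspace sketch with hypothetical sketch_* names. It compares
    entries byte for byte and stops at a 0-length entry, and it assumes
    the caller keeps both buffers alive while the set is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16

struct sketch_entry_ref {
	const uint8_t *data;	/* points into a caller-owned buffer */
	uint16_t length;
};

struct sketch_known {
	struct sketch_entry_ref entries[SKETCH_MAX_ENTRIES];
	int count;
};

static int sketch_is_known(const struct sketch_known *k,
			   const uint8_t *entry, uint16_t len)
{
	for (int i = 0; i < k->count; i++)
		if (k->entries[i].length == len &&
		    memcmp(k->entries[i].data, entry, len) == 0)
			return 1;
	return 0;
}

/* Merge a headerless run of entries; returns how many were new. */
static int sketch_merge(struct sketch_known *k, const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int added = 0;

	assert(k != NULL && buf != NULL);
	while (off + 4 <= len) {
		uint16_t elen;

		/* The length field follows the 16-bit type field. */
		memcpy(&elen, buf + off + 2, sizeof(elen));
		if (elen < 4 || elen > len - off)
			break;	/* covers the 0-length case as well */
		if (!sketch_is_known(k, buf + off, elen) &&
		    k->count < SKETCH_MAX_ENTRIES) {
			k->entries[k->count].data = buf + off;
			k->entries[k->count].length = elen;
			k->count++;
			added++;	/* a real driver would register a region here */
		}
		off += elen;
	}
	return added;
}
```

    Merging the same buffer twice adds nothing the second time, which
    is the property a repeated notification relies on.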

    In the nfit test framework, after normal initialization, update the NFIT
    with a new hot-plugged NVDIMM, and directly call into the driver to
    update its view of the available regions.

    Cc: Dan Williams
    Cc: Rafael J. Wysocki
    Cc: Toshi Kani
    Cc: Elliott, Robert
    Cc: Jeff Moyer
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

22 Oct, 2015

1 commit


28 Aug, 2015

1 commit

  • This should result in a pretty sizeable performance gain for reads. For
    rough comparison I did some simple read testing using PMEM to compare
    reads of write combining (WC) mappings vs write-back (WB). This was
    done on a random lab machine.

    PMEM reads from a write combining mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
    100000+0 records in
    100000+0 records out
    409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

    PMEM reads from a write-back mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

    To be able to safely support a write-back aperture I needed to add
    support for the "read flush" _DSM flag, as outlined in the DSM spec:

    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    This flag tells the ND BLK driver that it needs to flush the cache lines
    associated with the aperture after the aperture is moved but before any
    new data is read. This ensures that any stale cache lines from the
    previous contents of the aperture will be discarded from the processor
    cache, and the new data will be read properly from the DIMM. We know
    that the cache lines are clean and will be discarded without any
    writeback because either a) the previous aperture operation was a read,
    and we never modified the contents of the aperture, or b) the previous
    aperture operation was a write and we must have written back the dirtied
    contents of the aperture to the DIMM before the I/O was completed.

    In order to add support for the "read flush" flag I needed to add a
    generic routine to invalidate cache lines, mmio_flush_range(). This is
    protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
    only supported on x86.
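
    As a rough illustration of what a range flush has to cover, the
    sketch below walks every cache line that overlaps a buffer and
    hands it to a callback. It is not mmio_flush_range() itself. The
    64-byte line size, the callback type, and the sketch_* names are
    assumptions made for the example, and a real implementation would
    issue a cache-line flush instruction where the callback is invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u	/* assumed line size */

typedef void (*sketch_flush_fn)(const void *line, void *ctx);

/*
 * Visit each cache line overlapping [addr, addr + size).
 * Returns the number of lines visited.
 */
static size_t sketch_flush_range(const void *addr, size_t size,
				 sketch_flush_fn flush, void *ctx)
{
	uintptr_t start, end, p;
	size_t lines = 0;

	assert(flush != NULL);
	if (size == 0)
		return 0;
	/* Round down so a partially covered first line is included. */
	start = (uintptr_t)addr & ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
	end = (uintptr_t)addr + size;
	for (p = start; p < end; p += SKETCH_CACHE_LINE) {
		flush((const void *)p, ctx);
		lines++;
	}
	return lines;
}

/* Example callback: count the lines instead of flushing them. */
static void sketch_count_flush(const void *line, void *ctx)
{
	(void)line;
	(*(size_t *)ctx)++;
}
```

    In the read path described above, a call like this would sit
    between moving the aperture and copying data out of it.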

    Signed-off-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Ross Zwisler
     

28 Jul, 2015

1 commit

  • Add support for the three ARS DSM commands:
    - Query ARS Capabilities - Queries the firmware to check if a given
    range supports scrub, and if so, which type (persistent vs. volatile)
    - Start ARS - Starts a scrub for a given range/type
    - Query ARS Status - Checks status of a previously started scrub, and
    provides the error logs if any.

    The commands are described by the example DSM spec at:
    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Also add these commands to the nfit_test test framework, and return
    canned data.
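
    The three commands form a small query/start/poll sequence. The mock
    below imitates that sequence with canned answers, in the spirit of
    a test double. The flag values, structure layouts, and sketch_*
    names are invented for illustration and do not reflect the payload
    formats in the DSM document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits for the two scrub types. */
#define SKETCH_ARS_VOLATILE   0x1u
#define SKETCH_ARS_PERSISTENT 0x2u

struct sketch_ars_range {
	uint64_t base;
	uint64_t size;
	unsigned types;		/* scrub types this range supports */
};

struct sketch_ars_fw {
	const struct sketch_ars_range *ranges;
	size_t nr_ranges;
	int scrub_started;
};

/* Query ARS Capabilities: which scrub types does the range support? */
static unsigned sketch_ars_cap(const struct sketch_ars_fw *fw,
			       uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < fw->nr_ranges; i++) {
		const struct sketch_ars_range *r = &fw->ranges[i];

		if (addr >= r->base && len <= r->size &&
		    addr - r->base <= r->size - len)
			return r->types;
	}
	return 0;
}

/* Start ARS: refuse ranges or types that the capability query rejects. */
static int sketch_ars_start(struct sketch_ars_fw *fw, uint64_t addr,
			    uint64_t len, unsigned type)
{
	if (!(sketch_ars_cap(fw, addr, len) & type))
		return -1;
	fw->scrub_started = 1;
	return 0;
}

/* Query ARS Status: canned answer, a finished scrub with no errors. */
static int sketch_ars_status(const struct sketch_ars_fw *fw, size_t *nr_errors)
{
	assert(nr_errors != NULL);
	if (!fw->scrub_started)
		return -1;
	*nr_errors = 0;
	return 0;
}
```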

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
     

11 Jul, 2015

2 commits

  • Add support in the NFIT BLK I/O path for the "latch" flag
    defined in the "Get Block NVDIMM Flags" _DSM function:

    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    This flag requires the driver to read back the command register after it
    is written in the block I/O path. This ensures that the hardware has
    fully processed the new command and moved the aperture appropriately.

    Signed-off-by: Ross Zwisler
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Ross Zwisler
     
  • Update the nfit block I/O path to use the new PMEM API and to adhere to
    the read/write flows outlined in the "NVDIMM Block Window Driver
    Writer's Guide":

    http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

    This includes adding support for targeted NVDIMM flushes called "flush
    hints" in the ACPI 6.0 specification:

    http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

    For performance and media durability the mapping for a BLK aperture is
    moved to a write-combining mapping which is consistent with
    memcpy_to_pmem() and wmb_blk().

    Signed-off-by: Ross Zwisler
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Ross Zwisler
     

26 Jun, 2015

3 commits

  • Upon detection of an unarmed dimm in a region, arrange for descendant
    BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
    "unarmed" via flags passed by platform firmware (NFIT).

    The flags in the NFIT memory device sub-structure indicate the state of
    the data on the nvdimm relative to its energy source or last "flush to
    persistence". For the most part there is nothing the driver can do but
    advertise the state of these flags in sysfs and emit a message if
    firmware indicates that the contents of the device may be corrupted.
    However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
    the block devices incorporating that nvdimm to be marked read-only.
    This is a safe default as the data is still available and new writes are
    held off until the administrator either forces read-write mode, or the
    energy source becomes armed.

    A 'read_only' attribute is added to REGION devices to allow for
    overriding the default read-only policy of all descendant block devices.
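
    The policy reduces to a small decision: an explicit override wins,
    otherwise any unarmed dimm makes the region read-only. The sketch
    below models that with a three-state override. The flag bit, the
    enum, and the sketch_* names are assumptions for illustration and
    not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DIMM_UNARMED 0x1u	/* hypothetical "not armed" bit */

enum sketch_ro_override {
	SKETCH_RO_DEFAULT,	/* follow the dimm flags */
	SKETCH_RO_FORCE_RW,	/* administrator forces read-write */
	SKETCH_RO_FORCE_RO	/* administrator forces read-only */
};

struct sketch_region {
	const unsigned *dimm_flags;	/* one flags word per dimm */
	size_t nr_dimms;
	enum sketch_ro_override override;
};

/* Decide whether block devices under this region start read-only. */
static bool sketch_region_read_only(const struct sketch_region *r)
{
	assert(r != NULL);
	if (r->override == SKETCH_RO_FORCE_RW)
		return false;
	if (r->override == SKETCH_RO_FORCE_RO)
		return true;
	for (size_t i = 0; i < r->nr_dimms; i++)
		if (r->dimm_flags[i] & SKETCH_DIMM_UNARMED)
			return true;
	return false;
}
```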

    Signed-off-by: Dan Williams

    Dan Williams
     
  • 'libnvdimm' is the first driver sub-system in the kernel to implement
    mocking for unit test coverage. The nfit_test module gets built as an
    external module and arranges for external module replacements of nfit,
    libnvdimm, nd_pmem, and nd_blk. These replacements use the linker
    --wrap option to redirect calls to ioremap() + request_mem_region() to
    custom defined unit test resources. The end result is a fully
    functional nvdimm_bus, as far as userspace is concerned, but with the
    capability to perform otherwise destructive tests on emulated resources.
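
    For readers unfamiliar with the technique: linking with
    --wrap=symbol makes callers of symbol reach __wrap_symbol, and lets
    the wrapper reach the original as __real_symbol. The sketch below
    shows the shape of such a wrapper for a made-up mapping routine,
    sketch_map(). To stay buildable without the linker option, it calls
    a local stand-in, sketch_real_map(), where a wrapped build would
    call __real_sketch_map(). None of these names come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A fake physical range backed by an ordinary buffer. */
struct sketch_test_resource {
	uint64_t start;
	size_t size;
	void *buf;
};

static struct sketch_test_resource sketch_resources[4];
static size_t sketch_nr_resources;

/* Stand-in for __real_sketch_map(); this sketch maps nothing for real. */
static void *sketch_real_map(uint64_t addr, size_t size)
{
	(void)addr;
	(void)size;
	return NULL;
}

static int sketch_register_resource(uint64_t start, size_t size, void *buf)
{
	assert(buf != NULL);
	if (sketch_nr_resources >= 4)
		return -1;
	sketch_resources[sketch_nr_resources++] =
		(struct sketch_test_resource){ start, size, buf };
	return 0;
}

/* External linkage, as a wrapper resolved by the linker would need. */
void *__wrap_sketch_map(uint64_t addr, size_t size);

void *__wrap_sketch_map(uint64_t addr, size_t size)
{
	for (size_t i = 0; i < sketch_nr_resources; i++) {
		const struct sketch_test_resource *r = &sketch_resources[i];

		/* Requests inside a test range get the emulated buffer. */
		if (addr >= r->start && size <= r->size &&
		    addr - r->start <= r->size - size)
			return (uint8_t *)r->buf + (addr - r->start);
	}
	/* Everything else goes to the genuine routine. */
	return sketch_real_map(addr, size);
}
```

    The driver under test is unchanged; only the link step decides
    whether it talks to hardware or to the emulated buffer.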

    Q: Why not use QEMU for this emulation?
    QEMU is not suitable for unit testing. QEMU's role is to faithfully
    emulate the platform. A unit test's role is to unfaithfully implement
    the platform with the goal of triggering bugs in the corners of the
    sub-system implementation. As bugs are discovered in platforms, or the
    sub-system itself, the unit tests are extended to backstop a fix with a
    reproducer unit test.

    Another problem with QEMU is that it would require coordination of 3
    software projects instead of 2 (kernel + libndctl [1]) to maintain and
    execute the tests. The chances for bit rot and the difficulty of
    getting the tests running go up non-linearly the more components
    are involved.

    Q: Why submit this to the kernel tree instead of external modules in
    libndctl?
    Simple, to alleviate the same risk that out-of-tree external modules
    face. Updates to drivers/nvdimm/ can be immediately evaluated to see if
    they have any impact on tools/testing/nvdimm/.

    Q: What are the negative implications of merging this?
    It is a unique maintenance burden because the purpose of mocking an
    interface to enable a unit test is to purposefully short circuit the
    semantics of a routine to enable testing. For example
    __wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
    resource buffer allocated by dma_alloc_coherent(). The future
    maintenance burden hits when someone changes the semantics of
    ioremap_cache() and wonders what the implications are for the unit test.

    [1]: https://github.com/pmem/ndctl

    Cc: Lv Zheng
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The libnvdimm implementation handles allocating dimm address space (DPA)
    between PMEM and BLK mode interfaces. After DPA has been allocated from
    a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
    as a struct bio based block device. Unlike PMEM, BLK is required to
    handle platform specific details like mmio register formats and memory
    controller interleave. For this reason the libnvdimm generic nd_blk
    driver calls back into the bus provider to carry out the I/O.
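
    The call-back arrangement can be sketched as a function pointer
    that the provider fills in and the generic layer invokes. This is
    an illustration with hypothetical sketch_* names and a RAM-backed
    "provider"; the real interface and its argument order may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_blk_region;

typedef int (*sketch_do_io_fn)(struct sketch_blk_region *region, void *buf,
			       uint64_t dpa, size_t len, int write);

struct sketch_blk_region {
	sketch_do_io_fn do_io;	/* supplied by the bus provider */
	void *provider_data;	/* opaque to the generic layer */
	uint64_t size;		/* bytes of dimm address space */
};

/* Generic side: validate the request, then defer to the provider. */
static int sketch_blk_rw(struct sketch_blk_region *region, void *buf,
			 uint64_t dpa, size_t len, int write)
{
	assert(region != NULL && region->do_io != NULL);
	if (len > region->size || dpa > region->size - len)
		return -1;
	return region->do_io(region, buf, dpa, len, write);
}

/* Provider side: a RAM buffer stands in for registers and interleave. */
static int sketch_ram_do_io(struct sketch_blk_region *region, void *buf,
			    uint64_t dpa, size_t len, int write)
{
	uint8_t *media = region->provider_data;

	if (write)
		memcpy(media + dpa, buf, len);
	else
		memcpy(buf, media + dpa, len);
	return 0;
}
```

    Keeping the platform details behind one pointer is what lets the
    generic block driver stay free of register formats.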

    This initial implementation handles the BLK interface defined by the
    ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
    DCR (dimm control region), BDW (block data window), IDT (interleave
    descriptor) NFIT structures and the hardware register format.
    [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
    [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Cc: Andy Lutomirski
    Cc: Boaz Harrosh
    Cc: H. Peter Anvin
    Cc: Jens Axboe
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Ross Zwisler
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

    Ross Zwisler
     

25 Jun, 2015

4 commits

  • Most discovery/configuration of the nvdimm-subsystem is done via sysfs
    attributes. However, some nvdimm_bus instances, particularly the
    ACPI.NFIT bus, define a small set of messages that can be passed to the
    platform. For convenience we derive the initial libnvdimm-ioctl command
    formats directly from the NFIT DSM Interface Example formats.

    ND_CMD_SMART: media health and diagnostics
    ND_CMD_GET_CONFIG_SIZE: size of the label space
    ND_CMD_GET_CONFIG_DATA: read label space
    ND_CMD_SET_CONFIG_DATA: write label space
    ND_CMD_VENDOR: vendor-specific command passthrough
    ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
    ND_CMD_ARS_START: initiate scrubbing
    ND_CMD_ARS_STATUS: report on scrubbing state
    ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events

    If a platform later defines different commands than this set it is
    straightforward to extend support to those formats.

    Most of the commands target a specific dimm. However, the
    address-range-scrubbing commands target the bus. The 'commands'
    attribute in sysfs of an nvdimm_bus, or nvdimm, enumerates the supported
    commands for that object.
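
    One way to picture the 'commands' attribute is a bitmask of
    supported command numbers rendered as a space-separated list of
    names. The sketch below does that for a subset of the dimm
    commands. The numbering, the short names, and the sketch_*
    identifiers are illustrative assumptions, not the values in the
    kernel's uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical numbering for a subset of the dimm commands. */
enum sketch_cmd {
	SKETCH_CMD_SMART = 1,
	SKETCH_CMD_SMART_THRESHOLD,
	SKETCH_CMD_GET_CONFIG_SIZE,
	SKETCH_CMD_GET_CONFIG_DATA,
	SKETCH_CMD_SET_CONFIG_DATA,
	SKETCH_CMD_VENDOR,
	SKETCH_CMD_MAX
};

static const char *const sketch_cmd_names[SKETCH_CMD_MAX] = {
	[SKETCH_CMD_SMART]           = "smart",
	[SKETCH_CMD_SMART_THRESHOLD] = "smart_thresh",
	[SKETCH_CMD_GET_CONFIG_SIZE] = "get_size",
	[SKETCH_CMD_GET_CONFIG_DATA] = "get_data",
	[SKETCH_CMD_SET_CONFIG_DATA] = "set_data",
	[SKETCH_CMD_VENDOR]          = "vendor",
};

/* Render the set bits of mask as names; returns characters written. */
static size_t sketch_show_commands(unsigned long mask, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (int cmd = 1; cmd < SKETCH_CMD_MAX; cmd++) {
		int n;

		if (!(mask & (1UL << cmd)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", sketch_cmd_names[cmd]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space: stop early */
		used += (size_t)n;
	}
	return used;
}
```

    A bus object would use the same scheme with its own table holding
    the address-range-scrubbing commands.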

    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Reported-by: Nicholas Moulin
    Acked-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Enable nvdimm devices to be registered on a nvdimm_bus. The kernel
    assigned device id for nvdimm devices is dynamic. If userspace needs a
    more static identifier it should consult a provider-specific attribute.
    In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
    'nmemX/nfit/serial' attributes may be used for this purpose.
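
    As an example of consuming such an attribute, the sketch below
    parses the text of a handle value and splits it into topology
    fields. The bit layout (dimm, channel, memory controller, socket,
    node) is my reading of the NFIT device handle in ACPI 6.0 and
    should be verified against the specification. The textual format of
    the attribute and the sketch_* names are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_nfit_handle {
	unsigned dimm;		/* bits 3:0 */
	unsigned channel;	/* bits 7:4 */
	unsigned mem_ctrl;	/* bits 11:8 */
	unsigned socket;	/* bits 15:12 */
	unsigned node;		/* bits 27:16 */
};

/* Parse attribute text such as "0x10101\n"; returns 0 on success. */
static int sketch_parse_handle(const char *text, uint32_t *handle)
{
	char *end;
	unsigned long val;

	assert(text != NULL && handle != NULL);
	val = strtoul(text, &end, 0);	/* base 0 accepts a 0x prefix */
	if (end == text || val > UINT32_MAX)
		return -1;
	*handle = (uint32_t)val;
	return 0;
}

/* Split a handle into its assumed topology fields. */
static struct sketch_nfit_handle sketch_decode_handle(uint32_t h)
{
	struct sketch_nfit_handle d;

	d.dimm = h & 0xf;
	d.channel = (h >> 4) & 0xf;
	d.mem_ctrl = (h >> 8) & 0xf;
	d.socket = (h >> 12) & 0xf;
	d.node = (h >> 16) & 0xfff;
	return d;
}
```

    Because the fields describe where the dimm sits rather than the
    order in which it was discovered, the value is a better key than
    the X in nmemX.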

    Cc: Neil Brown
    Cc: Greg KH
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • The control device for a nvdimm_bus is registered as an "nd" class
    device. The expectation is that there will usually only be one "nd" bus
    registered under /sys/class/nd. However, we allow for the possibility
    of multiple buses and they will be listed in discovery order as
    ndctl0...ndctlN. This character device hosts the ioctl for passing
    control messages. The initial command set has a 1:1 correlation with
    the commands listed by the "NFIT DSM Example" document [1], but
    this scheme is extensible to future command sets.

    Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
    a subsequent patch. This is simply the initial registrations and sysfs
    attributes.

    [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Cc: Neil Brown
    Cc: Greg KH
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams
     
  • A struct nvdimm_bus is the anchor device for registering nvdimm
    resources and interfaces, for example, a character control device,
    nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware
    Interface Table) is one possible platform description for such
    non-volatile memory resources in a system. The nfit.ko driver attaches
    to the "ACPI0012" device that indicates the presence of the NFIT and
    parses the table to register a struct nvdimm_bus instance.
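
    The attach step amounts to matching a hardware ID and then creating
    the bus object. The sketch below shows only that control flow in
    plain C; the sketch_* types are placeholders and bear no relation
    to struct nvdimm_bus or the ACPI driver model, and the table
    parsing is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hardware IDs the sketch driver binds to, NULL-terminated. */
static const char *const sketch_nfit_ids[] = { "ACPI0012", NULL };

struct sketch_bus {
	const char *provider;
	bool registered;
};

static bool sketch_match(const char *hid)
{
	assert(hid != NULL);
	for (size_t i = 0; sketch_nfit_ids[i] != NULL; i++)
		if (strcmp(sketch_nfit_ids[i], hid) == 0)
			return true;
	return false;
}

/* "Probe": on a match, stand up the anchor object for later devices. */
static int sketch_probe(const char *hid, struct sketch_bus *bus)
{
	assert(bus != NULL);
	if (!sketch_match(hid))
		return -1;
	/* A real driver would parse the NFIT here before registering. */
	bus->provider = "ACPI.NFIT";
	bus->registered = true;
	return 0;
}
```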

    Cc: Lv Zheng
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Acked-by: Jeff Moyer
    Acked-by: Christoph Hellwig
    Acked-by: Rafael J. Wysocki
    Tested-by: Toshi Kani
    Signed-off-by: Dan Williams

    Dan Williams