01 Feb, 2016

1 commit

  • A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
    Don't truncate the address when doing the pfn conversion.

    Cc: Ross Zwisler
    Reported-by: Matthew Wilcox
    [willy: fix pfn_t_to_phys as well]
    Signed-off-by: Dan Williams

    Dan Williams
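
    A minimal sketch of the shape of the fix, assuming simplified pfn_t
    internals (the real layout and PFN_FLAGS_MASK live in
    include/linux/pfn_t.h): the conversions stay in phys_addr_t width so
    a narrower dma_addr_t cannot truncate high address bits.

    /* hedged sketch, not the verbatim patch;
     * assume: typedef struct { u64 val; } pfn_t; */
    static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
    {
            /* a dma_addr_t argument here could be 32-bit on some
             * architectures, silently dropping bits above 4GiB */
            return (pfn_t) { .val = (addr >> PAGE_SHIFT) | flags };
    }

    static inline phys_addr_t pfn_t_to_phys(pfn_t pfn)
    {
            /* mask off the PFN_* flag bits, widen, then shift */
            return (phys_addr_t)(pfn.val & ~PFN_FLAGS_MASK) << PAGE_SHIFT;
    }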
     

15 Dec, 2015

1 commit

  • The unit test infrastructure uses CMA and real memory to emulate nvdimm
    resources. The call to devm_memremap_pages() can simply be mocked in
    the same manner as memremap(), and we mock phys_to_pfn_t() to clear
    PFN_MAP since these resources are not registered in the pgmap_radix
    (see the sketch after this entry).

    Signed-off-by: Dan Williams

    Dan Williams
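
    A hedged sketch of such a mock using the linker's --wrap option
    (get_nfit_res() is a hypothetical lookup for emulated resources; the
    original routine stays reachable as __real_phys_to_pfn_t()):

    /* with "ld --wrap=phys_to_pfn_t" every call site resolves here */
    pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, u64 flags)
    {
            /* emulated resources are not in the pgmap_radix, so the
             * PFN_MAP flag must not be advertised for them */
            if (get_nfit_res(addr))
                    flags &= ~PFN_MAP;
            return __real_phys_to_pfn_t(addr, flags);
    }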
     

01 Dec, 2015

1 commit

  • When support for _FIT was added, the code presumed that the data
    returned by the _FIT method is identical to the NFIT table, which
    starts with an acpi_table_header. However, the _FIT is defined
    to return data in the format of a series of NFIT-type structure
    entries, and, as a method, it has an acpi_object header rather than
    an acpi_table_header.

    To address the differences, explicitly save the acpi_table_header
    from the NFIT, since it is accessible through /sys, and change
    the nfit pointer in the acpi_desc structure to point to the
    table entries rather than the headers.

    Reported-by: Jeff Moyer <jmoyer@redhat.com>
    Signed-off-by: Linda Knippers
    Acked-by: Vishal Verma
    [vishal: fix up unit test for new header assumptions]
    Signed-off-by: Dan Williams

    Linda Knippers
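
    A hedged sketch of what evaluating _FIT looks like (process_entries()
    is a hypothetical stand-in for the table-walking code): the method
    hands back an ACPI buffer object whose payload begins directly with
    NFIT sub-table entries, so there is no acpi_table_header to skip.

    struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
    union acpi_object *obj;
    acpi_status status;

    status = acpi_evaluate_object(handle, "_FIT", NULL, &buf);
    if (ACPI_FAILURE(status))
            return -ENODEV;

    obj = buf.pointer;
    if (obj->type == ACPI_TYPE_BUFFER)
            /* entries start at byte 0 of the buffer, no table header */
            process_entries(obj->buffer.pointer, obj->buffer.length);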
     

03 Nov, 2015

1 commit

  • Add a .notify callback to the acpi_nfit_driver that gets called on a
    hotplug event. From this, evaluate the _FIT ACPI method, which returns
    the updated NFIT with handles for the hot-plugged NVDIMM.

    Iterate over the new NFIT, add any new tables found, and
    register/enable the corresponding regions.

    In the nfit test framework, after normal initialization, update the NFIT
    with a new hot-plugged NVDIMM, and directly call into the driver to
    update its view of the available regions.

    Cc: Dan Williams
    Cc: Rafael J. Wysocki
    Cc: Toshi Kani
    Cc: Elliott, Robert
    Cc: Jeff Moyer
    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
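
    A hedged sketch of the driver hookup (acpi_nfit_update() is a
    hypothetical name for the re-evaluate/merge step described above;
    .add and .remove stand for the driver's existing callbacks):

    static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
    {
            struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(&adev->dev);

            if (!acpi_desc)
                    return;
            /* re-evaluate _FIT and register any new tables/regions */
            acpi_nfit_update(acpi_desc);
    }

    static struct acpi_driver acpi_nfit_driver = {
            .name = "nfit",
            .ops = {
                    .add    = acpi_nfit_add,
                    .remove = acpi_nfit_remove,
                    .notify = acpi_nfit_notify,
            },
    };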
     

29 Aug, 2015

2 commits

  • Enable the pmem driver to handle PFN device instances. Attaching a pmem
    namespace to a pfn device triggers the driver to allocate and initialize
    struct page entries for pmem. Memory capacity for this allocation comes
    exclusively from RAM for now, which is suitable for low PMEM-to-RAM
    ratios. This mechanism will be expanded later to allow setting an
    "allocate from PMEM" policy.

    Cc: Boaz Harrosh
    Cc: Ross Zwisler
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
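
    A minimal sketch of the core step, assuming the devm_memremap_pages()
    signature of that era (device plus resource; ns_res is a hypothetical
    name for the namespace's resource):

    /* map the pmem range and have the core allocate and initialize
     * struct page entries for it -- backed by regular RAM here */
    void *addr = devm_memremap_pages(dev, ns_res);

    if (IS_ERR(addr))
            return PTR_ERR(addr);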
     
  • Implement the base infrastructure for libnvdimm PFN devices. Similar to
    BTT devices, they take a namespace as a backing device and layer
    functionality on top. In this case the functionality is reserving space
    for an array of 'struct page' entries to be handed out through
    pfn_to_page(). For now this is just the basic libnvdimm device model for
    configuring the base PFN device.

    As the namespace claiming mechanism for PFN devices is mostly identical
    to that of BTT devices, drivers/nvdimm/claim.c is created to house the
    common bits.

    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
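
    Back-of-the-envelope sizing for such a reservation, as a worked
    example rather than code from the patch (names are illustrative):

    u64 npfns   = namespace_size >> PAGE_SHIFT; /* 4KiB pages */
    u64 reserve = npfns * sizeof(struct page);  /* ~64 bytes each */
    /* e.g. 16GiB of pmem => 4M pfns * 64B = 256MiB, about 1.6% */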
     

28 Aug, 2015

1 commit

  • This should result in a pretty sizeable performance gain for reads. For
    a rough comparison I did some simple read testing using PMEM, comparing
    reads through a write-combining (WC) mapping vs. a write-back (WB)
    mapping. This was done on a random lab machine.

    PMEM reads from a write combining mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
    100000+0 records in
    100000+0 records out
    409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

    PMEM reads from a write-back mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

    To be able to safely support a write-back aperture I needed to add
    support for the "read flush" _DSM flag, as outlined in the DSM spec:

    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    This flag tells the ND BLK driver that it needs to flush the cache lines
    associated with the aperture after the aperture is moved but before any
    new data is read. This ensures that any stale cache lines from the
    previous contents of the aperture will be discarded from the processor
    cache, and the new data will be read properly from the DIMM. We know
    that the cache lines are clean and will be discarded without any
    writeback because either a) the previous aperture operation was a read,
    and we never modified the contents of the aperture, or b) the previous
    aperture operation was a write and we must have written back the dirtied
    contents of the aperture to the DIMM before the I/O was completed.

    In order to add support for the "read flush" flag I needed to add a
    generic routine to invalidate cache lines, mmio_flush_range(). This is
    protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
    only supported on x86.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Ross Zwisler
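
    A hedged sketch of where the flush lands in the BLK read path
    (set_aperture() is a hypothetical stand-in for the driver's
    control-register programming; the flag name follows the driver's
    internal convention):

    set_aperture(nfit_blk, dpa);            /* move the aperture */

    /* "read flush": discard clean-but-stale cache lines before
     * reading the new contents */
    if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
            mmio_flush_range(aperture, len);

    memcpy_fromio(buf, aperture, len);      /* fresh data from the DIMM */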
     

19 Aug, 2015

1 commit

  • We currently register a platform device for e820 type-12 memory and
    register a nvdimm bus beneath it. Registering the platform device
    triggers the device-core machinery to probe for a driver, but that
    search currently comes up empty. Building the nvdimm-bus registration
    into the e820_pmem platform device registration in this way forces
    libnvdimm to be built-in. Instead, convert the built-in portion of
    CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
    rest of the logic to the driver for e820_pmem, for the following
    reasons:

    1/ Letting e820_pmem support be a module allows building and testing
    libnvdimm.ko changes without rebooting

    2/ All the normal policy around modules can be applied to e820_pmem
    (unbind to disable, and/or blacklist the module from loading by
    default)

    3/ Moving the driver to a generic location and converting it to scan
    "iomem_resource" rather than "e820.map" means any other architecture can
    take advantage of this simple nvdimm resource discovery mechanism by
    registering a resource named "Persistent Memory (legacy)"

    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
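
    A hedged sketch of the resulting driver side (the resource walk is
    simplified; the real driver iterates iomem_resource with the proper
    helpers and locking):

    static int e820_pmem_probe(struct platform_device *pdev)
    {
            struct resource *res;

            for (res = iomem_resource.child; res; res = res->sibling) {
                    if (strcmp(res->name, "Persistent Memory (legacy)"))
                            continue;
                    /* register an nvdimm bus / pmem region for 'res' */
            }
            return 0;
    }

    static struct platform_driver e820_pmem_driver = {
            .probe  = e820_pmem_probe,
            .driver = { .name = "e820_pmem" },
    };
    module_platform_driver(e820_pmem_driver);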
     

28 Jul, 2015

1 commit

  • Add support for the three ARS DSM commands:
    - Query ARS Capabilities - Queries the firmware to check if a given
      range supports scrub, and if so, which type (persistent vs. volatile)
    - Start ARS - Starts a scrub for a given range/type
    - Query ARS Status - Checks status of a previously started scrub, and
      provides the error logs if any.

    The commands are described by the example DSM spec at:
    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    Also add these commands to the nfit_test test framework, and return
    canned data.

    Signed-off-by: Vishal Verma
    Signed-off-by: Dan Williams

    Vishal Verma
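
    A hedged sketch of driving the three commands from userspace through
    the nvdimm bus character device (struct layouts abridged from the
    linux/ndctl.h of that era; error handling omitted):

    /* needs <sys/ioctl.h>, <linux/ndctl.h>, <stdlib.h> */
    struct nd_cmd_ars_cap cap = { .address = base, .length = len };
    struct nd_cmd_ars_start start = {
            .address = base, .length = len, .type = ND_ARS_PERSISTENT,
    };
    struct nd_cmd_ars_status *ars_status;

    ioctl(fd, ND_IOCTL_ARS_CAP, &cap);      /* does the range scrub? */
    ioctl(fd, ND_IOCTL_ARS_START, &start);  /* kick off the scrub */

    ars_status = malloc(cap.max_ars_out);   /* sized by the cap reply */
    ioctl(fd, ND_IOCTL_ARS_STATUS, ars_status); /* poll + error log */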
     

26 Jun, 2015

2 commits

  • Upon detection of an unarmed dimm in a region, arrange for descendant
    BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
    "unarmed" via flags passed by platform firmware (NFIT).

    The flags in the NFIT memory device sub-structure indicate the state of
    the data on the nvdimm relative to its energy source or last "flush to
    persistence". For the most part there is nothing the driver can do but
    advertise the state of these flags in sysfs and emit a message if
    firmware indicates that the contents of the device may be corrupted.
    However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
    the block devices incorporating that nvdimm to be marked read-only.
    This is a safe default as the data is still available and new writes are
    held off until the administrator either forces read-write mode, or the
    energy source becomes armed.

    A 'read_only' attribute is added to REGION devices to allow for
    overriding the default read-only policy of all descendant block devices.

    Signed-off-by: Dan Williams

    Dan Williams
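
    A minimal sketch of how the policy typically reaches the block layer
    (field and call names abridged): each descendant disk inherits the
    region's state at creation time.

    /* nd_region->ro was derived from the NFIT "armed" flag; the
     * region's 'read_only' sysfs attribute can override it later */
    set_disk_ro(disk, nd_region->ro);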
     
  • 'libnvdimm' is the first driver sub-system in the kernel to implement
    mocking for unit test coverage. The nfit_test module gets built as an
    external module and arranges for external module replacements of nfit,
    libnvdimm, nd_pmem, and nd_blk. These replacements use the linker
    --wrap option to redirect calls to ioremap() + request_mem_region() to
    custom defined unit test resources. The end result is a fully
    functional nvdimm_bus, as far as userspace is concerned, but with the
    capability to perform otherwise destructive tests on emulated resources.

    Q: Why not use QEMU for this emulation?
    QEMU is not suitable for unit testing. QEMU's role is to faithfully
    emulate the platform. A unit test's role is to unfaithfully implement
    the platform with the goal of triggering bugs in the corners of the
    sub-system implementation. As bugs are discovered in platforms, or the
    sub-system itself, the unit tests are extended to backstop a fix with a
    reproducer unit test.

    Another problem with QEMU is that it would require coordination of 3
    software projects instead of 2 (kernel + libndctl [1]) to maintain and
    execute the tests. The chances of bit rot and the difficulty of
    getting the tests running go up non-linearly with the number of
    components involved.

    Q: Why submit this to the kernel tree instead of as external modules in
    libndctl?
    Simple: to alleviate the same risk that out-of-tree external modules
    face. Updates to drivers/nvdimm/ can be immediately evaluated to see if
    they have any impact on tools/testing/nvdimm/.

    Q: What are the negative implications of merging this?
    It is a unique maintenance burden because mocking an interface for a
    unit test purposefully short-circuits the semantics of a routine. For
    example
    __wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
    resource buffer allocated by dma_alloc_coherent(). The future
    maintenance burden hits when someone changes the semantics of
    ioremap_cache() and wonders what the implications are for the unit test.

    [1]: https://github.com/pmem/ndctl

    Cc: Lv Zheng
    Cc: Robert Moore
    Cc: Rafael J. Wysocki
    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
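
    A hedged sketch of one such interposer (get_nfit_res() is a
    hypothetical lookup of the emulated resource; with
    "ld --wrap=ioremap_cache" the original remains reachable as
    __real_ioremap_cache()):

    void __iomem *__wrap_ioremap_cache(resource_size_t offset,
                    unsigned long size)
    {
            struct nfit_test_resource *res = get_nfit_res(offset);

            /* emulated: hand back the test buffer instead of mmio */
            if (res)
                    return (void __iomem *) res->buf
                            + (offset - res->res.start);
            return __real_ioremap_cache(offset, size);
    }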