19 Feb, 2020

1 commit

  • Use the new phys_to_target_node() and numa_map_to_online_node() helpers
    to retrieve the correct id for the 'numa_node' ("local" / online
    initiator node) and 'target_node' (offline target memory node) sysfs
    attributes.

    Below is an example from a 4 NUMA node system where all the memory on
    node2 is pmem / reserved. It should be noted that with the arrival of
    the ACPI HMAT table and EFI Specific Purpose Memory the kernel will
    start to see more platforms with reserved / performance differentiated
    memory in its own NUMA node. Hence all the stakeholders on the Cc for
    what is ostensibly a libnvdimm local patch.

    === Before ===

    /* Notice no online memory on node2 at start */

    # numactl --hardware
    available: 3 nodes (0-1,3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    node 0 size: 3958 MB
    node 0 free: 3708 MB
    node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
    node 1 size: 4027 MB
    node 1 free: 3871 MB
    node 3 cpus:
    node 3 size: 3994 MB
    node 3 free: 3971 MB
    node distances:
    node 0 1 3
    0: 10 21 21
    1: 21 10 21
    3: 21 21 10

    /*
    * Put the pmem namespace into devdax mode so it can be assigned to the
    * kmem driver
    */

    # ndctl create-namespace -e namespace0.0 -m devdax -f
    {
    "dev":"namespace0.0",
    "mode":"devdax",
    "map":"dev",
    "size":"3.94 GiB (4.23 GB)",
    "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3",
    [..]
    }

    /* Online Persistent Memory as System RAM */

    # daxctl reconfigure-device --mode=system-ram dax0.0
    libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
    libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
    libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
    libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
    [
    {
    "chardev":"dax0.0",
    "size":4225761280,
    "target_node":0,
    "mode":"system-ram"
    }
    ]
    reconfigured 1 device

    /* Note that the memory is onlined by default to the wrong node, node0 */

    # numactl --hardware
    available: 3 nodes (0-1,3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    node 0 size: 7926 MB
    node 0 free: 7655 MB
    node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
    node 1 size: 4027 MB
    node 1 free: 3871 MB
    node 3 cpus:
    node 3 size: 3994 MB
    node 3 free: 3971 MB
    node distances:
    node 0 1 3
    0: 10 21 21
    1: 21 10 21
    3: 21 21 10

    === After ===

    /* Notice that the "phys_index" error messages are gone */

    # daxctl reconfigure-device --mode=system-ram dax0.0
    [
    {
    "chardev":"dax0.0",
    "size":4225761280,
    "target_node":2,
    "mode":"system-ram"
    }
    ]
    reconfigured 1 device

    /* Notice that node2 is now correctly populated */

    # numactl --hardware
    available: 4 nodes (0-3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    node 0 size: 3958 MB
    node 0 free: 3793 MB
    node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
    node 1 size: 4027 MB
    node 1 free: 3851 MB
    node 2 cpus:
    node 2 size: 3968 MB
    node 2 free: 3968 MB
    node 3 cpus:
    node 3 size: 3994 MB
    node 3 free: 3908 MB
    node distances:
    node 0 1 2 3
    0: 10 21 21 21
    1: 21 10 21 21
    2: 21 21 10 21
    3: 21 21 21 10
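
    The "closest online node" idea behind the 'numa_node' attribute can be
    sketched in user space as follows. This is only a model under stated
    assumptions: map_to_online_node(), NO_NODE and the fixed 4-node table
    are hypothetical names for illustration, the distances are copied from
    the output above, and the lowest-node tie-break is an assumption of
    the sketch, not a statement about the kernel helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4
#define NO_NODE (-1)	/* hypothetical stand-in for "no node assigned" */

/* Distance table copied from the "After" numactl output. */
static const int node_distance[MAX_NODES][MAX_NODES] = {
	{ 10, 21, 21, 21 },
	{ 21, 10, 21, 21 },
	{ 21, 21, 10, 21 },
	{ 21, 21, 21, 10 },
};

/*
 * Return 'node' unchanged if it is online (or NO_NODE). Otherwise return
 * the online node with the smallest distance to it; on a tie the lowest
 * numbered node wins (assumption of this sketch).
 */
static int map_to_online_node(int node, const bool online[MAX_NODES])
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	if (node == NO_NODE)
		return NO_NODE;
	assert(node >= 0 && node < MAX_NODES);
	if (online[node])
		return node;

	for (int n = 0; n < MAX_NODES; n++) {
		if (!online[n])
			continue;
		if (node_distance[node][n] < best_dist) {
			best_dist = node_distance[node][n];
			best = n;
		}
	}
	return best;
}
```

    In this model, with node2 offline the target node 2 maps to node 0,
    and once node2 is online it maps to itself.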

    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Cc: Ira Weiny
    Cc: Vishal Verma
    Cc: Christoph Hellwig
    Reviewed-by: Ingo Molnar
    Signed-off-by: Dan Williams
    Link: https://lore.kernel.org/r/158188327614.894464.13122730362187722603.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

20 Nov, 2019

2 commits

  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nvdimm_bus_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309903815.1582359.6418211876315050283.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     
  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_region_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Signed-off-by: Dan Williams
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309902169.1582359.16828508538444551337.stgit@dwillia2-desk3.amr.corp.intel.com

    Dan Williams
     

18 Nov, 2019

1 commit

  • A 'struct device_type' instance can carry default attributes for the
    device. Use this facility to remove the export of
    nd_device_attribute_group and put the responsibility on the core rather
    than leaf implementations to define this attribute.

    For regions this creates a new nd_region_attribute_groups[] added to the
    per-region device-type instances.
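
    The pattern described above, where the device type rather than each
    leaf driver carries the default attribute groups, can be modelled in
    user space roughly as follows. The struct layouts are simplified
    stand-ins, not the kernel definitions, and the attribute names, the
    "region_type_model" type and device_has_attr() are illustrative
    assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct attribute_group {
	const char *const *attrs;	/* NULL-terminated attribute names */
};

struct device_type {
	const char *name;
	const struct attribute_group **groups;	/* NULL-terminated */
};

struct device {
	const char *name;
	const struct device_type *type;
};

/* Illustrative attribute names only. */
static const char *const nd_device_attrs[] = { "modalias", "devtype", NULL };
static const char *const nd_region_attrs[] = { "size", "mappings", NULL };

static const struct attribute_group nd_device_group = { nd_device_attrs };
static const struct attribute_group nd_region_group = { nd_region_attrs };

/* The type lists every default group once, in one place. */
static const struct attribute_group *nd_region_attribute_groups[] = {
	&nd_device_group,
	&nd_region_group,
	NULL,
};

static const struct device_type nd_region_type = {
	.name = "region_type_model",
	.groups = nd_region_attribute_groups,
};

/* A device gets its defaults purely by pointing at its type. */
static const struct device region0 = { "region0", &nd_region_type };
static const struct device untyped = { "leaf0", NULL };

/* Return 1 if the device's type supplies the named attribute, else 0. */
static int device_has_attr(const struct device *dev, const char *attr)
{
	assert(dev && attr);
	if (!dev->type || !dev->type->groups)
		return 0;
	for (const struct attribute_group **g = dev->type->groups; *g; g++)
		for (const char *const *a = (*g)->attrs; *a; a++)
			if (strcmp(*a, attr) == 0)
				return 1;
	return 0;
}
```

    Because the groups hang off the type, nothing needs to be exported to
    leaf implementations for them to pick up the default attributes.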

    Cc: Ira Weiny
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Vishal Verma
    Cc: Aneesh Kumar K.V
    Reviewed-by: Aneesh Kumar K.V
    Link: https://lore.kernel.org/r/157309901138.1582359.12909354140826530394.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have MODULE_LICENSE("GPL*") inside which was used in the initial
    scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

07 Jan, 2019

1 commit

  • Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
    Interface Table), is the first known instance of a memory range
    described by a unique "target" proximity domain. The split between
    "initiator" and "target" proximity domains is the approach that the
    ACPI HMAT (Heterogeneous Memory Attributes Table) uses to describe the
    unique performance properties of a memory range relative to a given
    initiator (e.g. CPU or DMA device).

    Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
    char-device follows the traditional notion of 'numa-node' where the
    attribute conveys the closest online numa-node. That numa-node attribute
    is useful for cpu-binding and memory-binding processes *near* the
    device. However, when the memory range backing a 'pmem', or 'dax' device
    is onlined (memory hot-add) the memory-only-numa-node representing that
    address needs to be differentiated from the set of online nodes. In
    other words, the numa-node association of the device depends on whether
    you can bind processes *near* the cpu-numa-node in the offline
    device-case, or bind processes *on* the memory-range directly after the
    backing address range is onlined.

    Allow for the case that platform firmware describes persistent memory
    with a unique proximity domain, i.e. when it is distinct from the
    proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
    numa-node translation of that proximity through the libnvdimm region
    device to namespaces that are in device-dax mode. With this in place the
    proposed kmem driver [1] can optionally discover a unique numa-node
    number for the address range as it transitions the memory from an
    offline state managed by a device-driver to an online memory range
    managed by the core-mm.

    [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com

    Reported-by: Fan Du
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Reviewed-by: Yang Shi
    Signed-off-by: Dan Williams

    Dan Williams
     

03 Jun, 2018

1 commit

  • There is currently a mismatch between the resources that will trigger
    the e820_pmem driver to register/load and the resources that will
    actually be surfaced as pmem ranges. register_e820_pmem() uses
    walk_iomem_res_desc() which includes children and siblings. In contrast,
    e820_pmem_probe() only considers top level resources. For example the
    following resource tree results in the driver being loaded, but no
    resources being registered:

    398000000000-39bfffffffff : PCI Bus 0000:ae
    39be00000000-39bf07ffffff : PCI Bus 0000:af
    39be00000000-39beffffffff : 0000:af:00.0
    39be10000000-39beffffffff : Persistent Memory (legacy)

    Fix this up to allow definitions of "legacy" pmem ranges anywhere in
    system-physical address space. Not that it is recommended or safe to
    define a pmem range in PCI space, but it is useful for debug /
    experimentation, and the restriction on being a top-level resource was
    arbitrary.
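
    The mismatch can be reproduced in a small user-space model. The
    sketch below is an assumption-laden illustration, not kernel code:
    'struct res', count_top_level() and count_tree() are hypothetical
    names, and the parent/child nesting of the example tree is inferred
    from the address ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a resource tree node (assumption). */
struct res {
	unsigned long long start, end;
	const char *name;
	const struct res *child;	/* first child */
	const struct res *sibling;	/* next resource at the same level */
};

#define PMEM_LEGACY "Persistent Memory (legacy)"

/* The example tree from the commit message, innermost entry first. */
static const struct res pmem = {
	0x39be10000000ULL, 0x39beffffffffULL, PMEM_LEGACY, NULL, NULL
};
static const struct res pci_dev = {
	0x39be00000000ULL, 0x39beffffffffULL, "0000:af:00.0", &pmem, NULL
};
static const struct res bus_af = {
	0x39be00000000ULL, 0x39bf07ffffffULL, "PCI Bus 0000:af", &pci_dev, NULL
};
static const struct res bus_ae = {
	0x398000000000ULL, 0x39bfffffffffULL, "PCI Bus 0000:ae", &bus_af, NULL
};

/* Count matches among the top-level resources only. */
static int count_top_level(const struct res *first, const char *name)
{
	int n = 0;

	assert(name);
	for (const struct res *r = first; r; r = r->sibling)
		if (strcmp(r->name, name) == 0)
			n++;
	return n;
}

/* Count matches anywhere in the tree: siblings and all descendants. */
static int count_tree(const struct res *first, const char *name)
{
	int n = 0;

	assert(name);
	for (const struct res *r = first; r; r = r->sibling) {
		if (strcmp(r->name, name) == 0)
			n++;
		n += count_tree(r->child, name);
	}
	return n;
}
```

    With this tree the full walk finds one legacy pmem range while the
    top-level scan finds none, which mirrors a driver that loads but
    registers nothing.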

    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

06 Dec, 2016

1 commit


22 Jul, 2016

1 commit


30 Jan, 2016

1 commit

  • Change the callers of walk_iomem_res() scanning for the
    following resources by name to use walk_iomem_res_desc()
    instead.

    "ACPI Tables"
    "ACPI Non-volatile Storage"
    "Persistent Memory (legacy)"
    "Crash kernel"

    Note, the caller of walk_iomem_res() with "GART" will be removed
    in a later patch.
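
    The difference between scanning by name and scanning by descriptor
    can be illustrated with a user-space sketch. Everything here is a
    hypothetical model: the 'enum res_desc' values, the table contents
    and both count functions are placeholders chosen for the example,
    not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder descriptor values (assumption, not the kernel enum). */
enum res_desc {
	DESC_NONE,
	DESC_CRASH_KERNEL,
	DESC_ACPI_TABLES,
	DESC_ACPI_NV_STORAGE,
	DESC_PMEM_LEGACY,
};

struct res_entry {
	const char *name;
	enum res_desc desc;
};

/* A flat table standing in for the resource tree. */
static const struct res_entry table[] = {
	{ "System RAM", DESC_NONE },
	{ "ACPI Tables", DESC_ACPI_TABLES },
	{ "ACPI Non-volatile Storage", DESC_ACPI_NV_STORAGE },
	{ "Persistent Memory (legacy)", DESC_PMEM_LEGACY },
	{ "Crash kernel", DESC_CRASH_KERNEL },
};

#define N_ENTRIES (sizeof(table) / sizeof(table[0]))

/* Old style: every entry costs a string comparison, and spelling matters. */
static int count_by_name(const char *name)
{
	int n = 0;

	assert(name);
	for (size_t i = 0; i < N_ENTRIES; i++)
		if (strcmp(table[i].name, name) == 0)
			n++;
	return n;
}

/* New style: an integer comparison on the descriptor. */
static int count_by_desc(enum res_desc desc)
{
	int n = 0;

	for (size_t i = 0; i < N_ENTRIES; i++)
		if (table[i].desc == desc)
			n++;
	return n;
}
```

    Both lookups find the legacy pmem entry here, but the name-based one
    silently finds nothing if the string differs even in case.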

    Signed-off-by: Toshi Kani
    Signed-off-by: Borislav Petkov
    Reviewed-by: Dave Young
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Chun-Yi
    Cc: Dan Williams
    Cc: Denys Vlasenko
    Cc: Don Zickus
    Cc: H. Peter Anvin
    Cc: Lee, Chun-Yi
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Minfei Huang
    Cc: Peter Zijlstra (Intel)
    Cc: Ross Zwisler
    Cc: Stephen Rothwell
    Cc: Takao Indoh
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: kexec@lists.infradead.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm
    Cc: linux-nvdimm@lists.01.org
    Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Toshi Kani
     

13 Nov, 2015

1 commit


29 Aug, 2015

1 commit

  • The expectation is that the legacy / non-standard pmem discovery method
    (e820 type-12) will only ever be used to describe small quantities of
    persistent memory. Larger capacities will be described via the ACPI
    NFIT. When "allocate struct page from pmem" support is added this default
    policy can be overridden by assigning a legacy pmem namespace to a pfn
    device; however, this would only be necessary if a platform used the
    legacy mechanism to define a very large range.

    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

19 Aug, 2015

1 commit

  • We currently register a platform device for e820 type-12 memory and
    register a nvdimm bus beneath it. Registering the platform device
    triggers the device-core machinery to probe for a driver, but that
    search currently comes up empty. Building the nvdimm-bus registration
    into the e820_pmem platform device registration in this way forces
    libnvdimm to be built-in. Instead, convert the built-in portion of
    CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
    rest of the logic to the driver for e820_pmem, for the following
    reasons:

    1/ Letting e820_pmem support be a module allows building and testing
    libnvdimm.ko changes without rebooting

    2/ All the normal policy around modules can be applied to e820_pmem
    (unbind to disable and/or blacklisting the module from loading by
    default)

    3/ Moving the driver to a generic location and converting it to scan
    "iomem_resource" rather than "e820.map" means any other architecture can
    take advantage of this simple nvdimm resource discovery mechanism by
    registering a resource named "Persistent Memory (legacy)"

    Cc: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams