14 Oct, 2020

6 commits

  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child' are all unused wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.
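
As a rough illustration of the change described above, this user-space C sketch models an inclusive start/end span in place of a full 'struct resource' and open codes the print of the range. The two-field layout follows the idea of 'struct range'; format_range() and its output format are assumptions made for this example, not the kernel's code.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for a span: just an inclusive start and end. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, so a one-byte range has start == end */
};

/* Length in bytes of an inclusive range. */
static uint64_t range_len(const struct range *range)
{
	return range->end - range->start + 1;
}

/* Hypothetical helper: open-coded print used where "%pR" was used before. */
static int format_range(char *buf, size_t len, const struct range *range)
{
	return snprintf(buf, len, "[mem %#llx-%#llx]",
			(unsigned long long)range->start,
			(unsigned long long)range->end);
}
```

Only the two fields that carry span information remain, which is the space saving the commit message refers to.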

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for introducing seed devices the dax-bus core needs to be
    able to intercept ->probe() and ->remove() operations. Towards that end
    arrange for the bus and drivers to switch from raw 'struct device' driver
    operations to 'struct dev_dax' typed operations.
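
The switch to typed operations can be sketched in user-space C as follows. All names here (the simplified 'struct device', 'struct dax_device_driver', dax_bus_probe(), example_probe()) are assumptions for illustration only; the point is that the bus-level callback receives the raw device, so the core gets a chance to intercept before forwarding to the driver's typed ->probe().

```c
#include <stddef.h>

/* Simplified stand-ins for the generic driver-core objects. */
struct device { const char *name; };
struct device_driver { const char *name; };

/* Hypothetical typed wrappers that embed the generic objects. */
struct dev_dax {
	struct device dev;
	int probed;
};

struct dax_device_driver {
	struct device_driver drv;
	int (*probe)(struct dev_dax *dev_dax);
	int (*remove)(struct dev_dax *dev_dax);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Bus-level probe: the core sees every probe before the driver does. */
static int dax_bus_probe(struct device *dev, struct device_driver *drv)
{
	struct dev_dax *dev_dax = container_of(dev, struct dev_dax, dev);
	struct dax_device_driver *dax_drv =
		container_of(drv, struct dax_device_driver, drv);

	/* A real bus could validate or adjust the device at this point. */
	return dax_drv->probe(dev_dax);
}

/* Example driver callback that works on the typed object directly. */
static int example_probe(struct dev_dax *dev_dax)
{
	dev_dax->probed = 1;
	return 0;
}
```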

    Reported-by: Hulk Robot
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Jason Yan
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for a facility that enables dax regions to be sub-divided,
    introduce infrastructure to track and allocate region capacity.

    The new dax_region/available_size attribute is only enabled for volatile
    hmem devices, not pmem devices that are defined by nvdimm namespace
    boundaries. This is per Jeff's feedback the last time dynamic device-dax
    capacity allocation support was discussed.
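
A minimal sketch of capacity tracking, assuming a single counter per region. This is a deliberate simplification for illustration: the struct and function names are hypothetical, and the commit message does not say how the actual infrastructure tracks allocations.

```c
#include <stdint.h>

/* Hypothetical region bookkeeping: total size and bytes handed out. */
struct dax_region_sketch {
	uint64_t size;		/* total capacity of the region in bytes */
	uint64_t allocated;	/* bytes already assigned to instances */
};

/* Value that an 'available_size'-style attribute could report. */
static uint64_t region_available_size(const struct dax_region_sketch *r)
{
	return r->size - r->allocated;
}

/* Reserve capacity; returns 0 on success, -1 if the region is too full. */
static int region_alloc(struct dax_region_sketch *r, uint64_t bytes)
{
	if (bytes > region_available_size(r))
		return -1;
	r->allocated += bytes;
	return 0;
}
```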

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
    Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The passed-in dev_pagemap is only required in the pmem case, as the
    libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
    place the memmap in pmem directly. In the hmem case there is no agent
    reserving an altmap, so it can all be handled by a core-internal default.

    Pass the resource range via a new @range property of 'struct
    dev_dax_data'.
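
The optional-pagemap idea can be sketched as below. The '_sketch' types and resolve_pgmap() are hypothetical names for this example; the only points taken from the message are that the caller may or may not supply a dev_pagemap and that the range travels in the data struct.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };

/* Hypothetical, reduced pagemap: a range plus an altmap marker. */
struct dev_pagemap_sketch {
	struct range range;
	int has_altmap;
};

/* Hypothetical creation parameters: optional pgmap, mandatory range. */
struct dev_dax_data_sketch {
	struct dev_pagemap_sketch *pgmap;	/* NULL when nothing reserved an altmap */
	struct range range;
};

/*
 * Use the caller's pagemap when one is passed in, otherwise build a
 * default from the range. Returns 1 if the default was used, 0 otherwise.
 */
static int resolve_pgmap(const struct dev_dax_data_sketch *data,
			 struct dev_pagemap_sketch *out)
{
	if (data->pgmap) {
		*out = *data->pgmap;
		return 0;
	}
	out->range = data->range;
	out->has_altmap = 0;
	return 1;
}
```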

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for adding more parameters to instance creation, move
    existing parameters to a new struct.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • All callers specify the same flags to alloc_dax_region(), so there is no
    need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
    ->pfn_flags around on the region. Device-dax instances are always page
    backed.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

07 Nov, 2019

1 commit

  • PFN flags are (unsigned long long); change the alloc_dax_region() calling
    convention to match, which fixes warnings of the form:

    >> include/linux/pfn_t.h:18:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]
    #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
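
To make the truncation concrete, here is a small user-space C sketch. The PFN_DEV definition is the one quoted in the warning; BITS_PER_LONG_LONG is assumed to be 64, and the two pass_flags_*() helpers are hypothetical. A uint32_t parameter stands in for 'unsigned long' on a 32-bit build.

```c
#include <stdint.h>

#define BITS_PER_LONG_LONG 64	/* assumed width for this sketch */
#define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))

/* Parameter narrower than the flag type: the high bits cannot survive. */
static uint64_t pass_flags_narrow(uint32_t flags)
{
	return flags;
}

/* Parameter as wide as the flag type: the value is preserved. */
static uint64_t pass_flags_wide(unsigned long long flags)
{
	return flags;
}
```

Because PFN_DEV sets bit 61, any parameter type narrower than 64 bits drops it, which is what the compiler warning reports.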

    Reported-by: kbuild test robot
    Signed-off-by: Dan Williams
    Acked-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Dan Williams
     

07 Jan, 2019

6 commits

  • Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
    Interface Table), is the first known instance of a memory range
    described by a unique "target" proximity domain. "Initiator" and
    "target" proximity domains are the approach that the ACPI HMAT
    (Heterogeneous Memory Attributes Table) uses to describe the unique
    performance properties of a memory range relative to a given initiator
    (e.g. CPU or DMA device).

    Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
    char-device follows the traditional notion of 'numa-node' where the
    attribute conveys the closest online numa-node. That numa-node attribute
    is useful for cpu-binding and memory-binding processes *near* the
    device. However, when the memory range backing a 'pmem', or 'dax' device
    is onlined (memory hot-add) the memory-only-numa-node representing that
    address needs to be differentiated from the set of online nodes. In
    other words, the numa-node association of the device depends on whether
    you can bind processes *near* the cpu-numa-node in the offline
    device-case, or bind processes *on* the memory-range directly after the
    backing address range is onlined.

    Allow for the case that platform firmware describes persistent memory
    with a unique proximity domain, i.e. when it is distinct from the
    proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
    numa-node translation of that proximity through the libnvdimm region
    device to namespaces that are in device-dax mode. With this in place the
    proposed kmem driver [1] can optionally discover a unique numa-node
    number for the address range as it transitions the memory from an
    offline state managed by a device-driver to an online memory range
    managed by the core-mm.

    [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com

    Reported-by: Fan Du
    Cc: Michael Ellerman
    Cc: "Oliver O'Halloran"
    Cc: Dave Hansen
    Cc: Jérôme Glisse
    Reviewed-by: Yang Shi
    Signed-off-by: Dan Williams

    Dan Williams
     
  • On the expectation that some environments may not upgrade libdaxctl
    (userspace component that depends on the /sys/class/dax hierarchy),
    provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat
    driver implements the original /sys/class/dax sysfs layout rather than
    /sys/bus/dax. When userspace is upgraded it can blacklist this module
    and switch to the dax_pmem driver going forward.

    CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according
    to the dax_pmem entry in Documentation/ABI/obsolete/.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Introduce the 'new_id' concept for enabling a custom device-driver attach
    policy for dax-bus drivers. The intended use is to have a mechanism for
    hot-plugging device-dax ranges into the page allocator on-demand. With
    this in place the default policy of using device-dax for performance
    differentiated memory can be overridden by user-space policy that can
    arrange for the memory range to be managed as 'System RAM' with
    user-defined NUMA and other performance attributes.
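
The attach policy can be pictured as a per-driver list of device names that is consulted at match time. The sketch below is a user-space approximation under that assumption; the struct, the fixed-size table, and both function names are hypothetical and not the dax-bus implementation.

```c
#include <string.h>

#define MAX_IDS 8
#define ID_LEN 32

/* Hypothetical per-driver table of device names it should attach to. */
struct dax_driver_sketch {
	char ids[MAX_IDS][ID_LEN];
	int nr_ids;
};

/* Models a write to a 'new_id'-style attribute; returns 0 on success. */
static int driver_add_id(struct dax_driver_sketch *drv, const char *dev_name)
{
	if (drv->nr_ids >= MAX_IDS || strlen(dev_name) >= ID_LEN)
		return -1;
	strcpy(drv->ids[drv->nr_ids++], dev_name);
	return 0;
}

/* Bus match: attach only if the device name was registered beforehand. */
static int driver_matches(const struct dax_driver_sketch *drv,
			  const char *dev_name)
{
	for (int i = 0; i < drv->nr_ids; i++)
		if (strcmp(drv->ids[i], dev_name) == 0)
			return 1;
	return 0;
}
```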

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Move the responsibility of calling devm_request_resource() and
    devm_memremap_pages() into the common device-dax driver. This is another
    preparatory step to allowing an alternate personality driver for a
    device-dax range.

    Signed-off-by: Dan Williams

    Dan Williams
     
  • In support of multiple device-dax instances per device-dax-region and
    allowing the 'kmem' driver to attach to dax-instances instead of the
    current device-node access, convert the dax sub-system from a class to a
    bus. Recall that the kmem driver takes reserved / special purpose
    memories and assigns them to be managed by the core-mm.

    Aside from the fact that the device-dax instances are registered and
    probed on a bus, two other lifetime-management changes are made:

    1/ Delay attaching a cdev until driver probe time

    2/ A new run_dax() helper is introduced to allow restoring dax-operation
    after a kill_dax() event. So, at driver ->probe() time we run_dax()
    and at ->remove() time we kill_dax() and invalidate all mappings.
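
The lifetime rule in item 2/ can be modeled with a simple 'alive' flag. This is a toy user-space sketch: the struct and the sketch_*() functions are hypothetical, and run_dax()/kill_dax() here only share their names with the helpers mentioned above.

```c
#include <stdbool.h>

/* Toy device state: whether dax operation is allowed, and live mappings. */
struct dax_device_sketch {
	bool alive;
	int mappings;
};

static void run_dax(struct dax_device_sketch *d)
{
	d->alive = true;	/* (re)enable operation, e.g. after a prior kill */
}

static void kill_dax(struct dax_device_sketch *d)
{
	d->alive = false;	/* refuse new operations from here on */
}

/* New mappings are only allowed while the device is alive. */
static int sketch_map(struct dax_device_sketch *d)
{
	if (!d->alive)
		return -1;
	d->mappings++;
	return 0;
}

/* ->probe() time: start dax operation. */
static int sketch_probe(struct dax_device_sketch *d)
{
	run_dax(d);
	return 0;
}

/* ->remove() time: stop operation and invalidate all mappings. */
static void sketch_remove(struct dax_device_sketch *d)
{
	kill_dax(d);
	d->mappings = 0;
}
```

Since run_dax() can follow kill_dax(), the same device can be unbound and bound again without being destroyed in between.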

    Signed-off-by: Dan Williams

    Dan Williams
     
  • Towards eliminating the dax_class, move the dax-device-attribute
    enabling to a new bus.c file in the core. This reduces the code
    thrash of subsequent patches, as no logic changes are made,
    just pure code movement.

    A temporary export of unregister_dev_dax() and dax_attribute_groups is
    needed to preserve compilation, but those symbols become static again in
    a follow-on patch.

    Signed-off-by: Dan Williams

    Dan Williams