14 Oct, 2020

40 commits

  • Avoid bumping the refcount on pages when we're only interested in the
    swap entries.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Alexey Dobriyan
    Cc: Chris Wilson
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Jani Nikula
    Cc: Matthew Auld
    Cc: William Kucharski
    Link: https://lkml.kernel.org/r/20200910183318.20139-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Instead of calling find_get_entry() for every page index, use an XArray
    iterator to skip over NULL entries, and avoid calling get_page(),
    because we only want the swap entries.

    [willy@infradead.org: fix LTP soft lockups]
    Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.org

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Alexey Dobriyan
    Cc: Chris Wilson
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Jani Nikula
    Cc: Matthew Auld
    Cc: William Kucharski
    Cc: Qian Cai
    Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The current code does not protect against swapoff of the underlying
    swap device, so this is a bug fix as well as a worthwhile reduction in
    code complexity.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Alexey Dobriyan
    Cc: Chris Wilson
    Cc: Huang Ying
    Cc: Hugh Dickins
    Cc: Jani Nikula
    Cc: Johannes Weiner
    Cc: Matthew Auld
    Cc: William Kucharski
    Link: https://lkml.kernel.org/r/20200910183318.20139-3-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Patch series "Return head pages from find_*_entry", v2.

    This patch series started out as part of the THP patch set, but it has
    some nice effects along the way and it seems worth splitting it out and
    submitting separately.

    Currently find_get_entry() and find_lock_entry() return the page
    corresponding to the requested index, but the first thing most callers do
    is find the head page, which we just threw away. As part of auditing all
    the callers, I found some misuses of the APIs and some plain
    inefficiencies that I've fixed.

    The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

    This patch (of 8):

    Provide this functionality from the swap cache. It's useful for
    more than just mincore().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Hugh Dickins
    Cc: William Kucharski
    Cc: Jani Nikula
    Cc: Alexey Dobriyan
    Cc: Johannes Weiner
    Cc: Chris Wilson
    Cc: Matthew Auld
    Cc: Huang Ying
    Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Rename head_pincount() --> head_compound_pincount(). These names are more
    accurate (or less misleading) than the original ones.

    Signed-off-by: John Hubbard
    Signed-off-by: Andrew Morton
    Cc: Qian Cai
    Cc: Matthew Wilcox
    Cc: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Mike Rapoport
    Cc: William Kucharski
    Link: https://lkml.kernel.org/r/20200807183358.105097-1-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds

    John Hubbard
     
    __dump_page() checks that i_dentry is fetchable and that i_ino is earlier
    in the struct than i_dentry, so it ought to work fine, but it's possible
    that struct randomisation has reordered i_ino after i_dentry and the
    pointer is just wild enough that i_dentry is fetchable and i_ino isn't.

    Also print the inode number if the dentry is invalid.

    Reported-by: Vlastimil Babka
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/20200819185710.28180-1-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
    Add a sysfs attribute which denotes a range from the dax region to be
    allocated. It's a write-only @mapping sysfs attribute in the format
    '<start>-<end>' to allocate a range. @start and @end use hexadecimal
    values and the @pgoff is implicitly ordered with respect to previous
    writes to the @mapping sysfs file, e.g. for a write of a range of length
    1G the pgoff is 0..1G(-4K), and a second write will use @pgoff for 1G+4K...
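
    A rough usage sketch (device name and addresses are hypothetical; start
    and end are the hexadecimal values described above):

    echo 100000000-13fffffff > /sys/bus/dax/devices/dax0.1/mapping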

    This range mapping interface is useful for:

    1) Applications which want to implement their own allocation logic, and
    thus pick the desired ranges from the dax_region.

    2) Use cases like VMM fast restart[0] where, after kexec, we want to keep
    the same gpa<->phys mappings (as originally created before kexec).

    [0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
  • Introduce a new module parameter for dax_hmem which initializes all region
    devices as free, rather than allocating a pagemap for the region by
    default.

    All hmem devices created with dax_hmem.region_idle=1 will have their full
    size available for creating dynamic dax devices.
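
    For example, the parameter can be set on the kernel command line, or via
    modprobe when dax_hmem is built as a module:

    # kernel command line
    dax_hmem.region_idle=1

    # or, when loading the module
    modprobe dax_hmem region_idle=1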

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
    Introduce a device align attribute. While doing so, rename the region
    align attribute so that it is more explicitly named, but keep it exposed
    as @align to retain the API for tools like daxctl.

    Changes to align may not always be valid, when, say, certain mappings
    were created with 2M and then we switch to 1G. So, validate all ranges
    against the new value being attempted, post resizing.
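
    A rough sketch of adjusting the per-device alignment (device name and
    value are hypothetical; this assumes @align takes a byte count and the
    device is unbound while it is changed):

    cat /sys/bus/dax/devices/dax0.1/align              # current alignment
    echo 2097152 > /sys/bus/dax/devices/dax0.1/align   # request 2M alignment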

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Introduce @align to struct dev_dax.

    When creating a new device, we still initialize to the default dax_region
    @align. Child devices belonging to a region may wish to keep a different
    alignment property instead of a global region-defined one.

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
  • In support of interrogating the physical address layout of a device with
    dis-contiguous ranges, introduce a sysfs directory with 'start', 'end',
    and 'page_offset' attributes. The alternative is trying to parse
    /proc/iomem, and that file will not reflect the extent layout until the
    device is enabled.
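
    A sketch of reading the first extent, assuming it is published as a
    mapping0/ sub-directory of the device:

    cat /sys/bus/dax/devices/dax0.1/mapping0/start
    cat /sys/bus/dax/devices/dax0.1/mapping0/end
    cat /sys/bus/dax/devices/dax0.1/mapping0/page_offset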

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Joao Martins
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Break the requirement that device-dax instances are physically contiguous.
    With this constraint removed it allows fragmented available capacity to
    be fully allocated.

    This capability is useful to mitigate the "noisy neighbor" problem with
    memory-side-cache management for virtual machines, or any other scenario
    where a platform address boundary also designates a performance boundary.
    For example a direct mapped memory side cache might rotate cache colors at
    1GB boundaries. With dis-contiguous allocations a device-dax instance
    could be configured to contain only 1 cache color.

    It also satisfies Joao's use case (see link) for partitioning memory for
    exclusive guest access. It allows for a future potential mode where the
    host kernel need not allocate 'struct page' capacity up-front.

    Reported-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
    Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of device-dax growing the ability to front physically
    dis-contiguous ranges of memory, update devm_memremap_pages() to track
    multiple ranges with a single reference counter and devm instance.

    Convert all [devm_]memremap_pages() users to specify the number of ranges
    they are mapping in their 'struct dev_pagemap' instance.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child' are all unused wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Make the device-dax 'size' attribute writable to allow capacity to be
    split between multiple instances in a region. The intended consumers of
    this capability are users that want to split a scarce memory resource
    between device-dax and System-RAM access, or users that want to have
    multiple security domains for a large region.

    By default the hmem instance provider allocates an entire region to the
    first instance. The process of creating a new instance (assuming a
    region-id of 0) is to find the region and trigger the 'create' attribute
    which yields an empty instance to configure. For example:

    cd /sys/bus/dax/devices
    echo dax0.0 > dax0.0/driver/unbind
    echo $new_size > dax0.0/size
    echo 1 > $(readlink -f dax0.0)../dax_region/create
    seed=$(cat $(readlink -f dax0.0)../dax_region/seed)
    echo $new_size > $seed/size
    echo dax0.0 > ../drivers/{device_dax,kmem}/bind
    echo dax0.1 > ../drivers/{device_dax,kmem}/bind

    Instances can be destroyed by:

    echo $device > $(readlink -f $device)../dax_region/delete

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115239.30709.9850106928133493138.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Use sysfs_streq() in device_find_child_by_name() to allow it to use a
    sysfs input string that might contain a trailing newline.

    The other "device by name" interfaces,
    {bus,driver,class}_find_device_by_name(), already account for sysfs
    strings.
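
    The practical effect is that a device name written through sysfs with a
    trailing newline, as a plain echo produces, still matches. For example,
    the dax_region 'delete' flow shown earlier in this digest (which
    presumably looks the child device up by name) works with:

    echo dax0.1 > $(readlink -f /sys/bus/dax/devices/dax0.1)/../dax_region/delete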

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Reviewed-by: Greg Kroah-Hartman
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643102106.4062302.12229802117645312104.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106114576.30709.2960091665444712180.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Add a seed device concept for dynamic dax regions to be able to split the
    region amongst multiple sub-instances. The seed device, similar to
    libnvdimm seed devices, is a device that starts with zero capacity
    allocated and unbound to a driver. In contrast to libnvdimm seed devices
    explicit 'create' and 'delete' interfaces are added to the region to
    trigger seeds to be created and unused devices to be reclaimed. The
    explicit create and delete replaces implicit create as a side effect of
    probe and implicit delete when writing 0 to the size that libnvdimm
    implements.

    Delete can be performed on any 0-sized and idle device. This avoids the
    gymnastics of needing to move device_unregister() to its own async
    context. Specifically, it avoids the deadlock of deleting a device via
    one of its own attributes. It is also less surprising to userspace which
    never sees an extra device it did not request.

    For now just add the device creation, teardown, and ->probe() prevention.
    A later patch will arrange for the 'dax/size' attribute to be writable to
    allocate capacity from the region.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for introducing seed devices the dax-bus core needs to be
    able to intercept ->probe() and ->remove() operations. Towards that end
    arrange for the bus and drivers to switch from raw 'struct device' driver
    operations to 'struct dev_dax' typed operations.

    Reported-by: Hulk Robot
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Jason Yan
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for a facility that enables dax regions to be sub-divided,
    introduce infrastructure to track and allocate region capacity.

    The new dax_region/available_size attribute is only enabled for volatile
    hmem devices, not pmem devices that are defined by nvdimm namespace
    boundaries. This is per Jeff's feedback the last time dynamic device-dax
    capacity allocation support was discussed.
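
    A quick sketch of checking the remaining capacity of region 0 (the path
    layout is assumed to mirror the example elsewhere in this digest):

    cat $(readlink -f /sys/bus/dax/devices/dax0.0)/../dax_region/available_size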

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
    Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, change
    the kmem driver to use the idiomatic release_mem_region() to pair with the
    initial request_mem_region(). This also eliminates the need to open code
    the release of the resource allocated by request_mem_region().

    As there are no more dax_kmem_res users, delete this struct member.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106112239.30709.15909567572288425294.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, move
    resource name tracking to driver data. The memory for the resource name
    needs to have its own lifetime separate from the device bind lifetime for
    cases where the driver is unbound, but the kmem range could not be
    unplugged from the page allocator.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106111639.30709.17624822766862009183.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, teach the
    driver to calculate the hotplug range from the device range. The hotplug
    range is the trivially calculated memory-block-size aligned version of the
    device range.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106111109.30709.3173462396758431559.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The passed in dev_pagemap is only required in the pmem case as the
    libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
    place the memmap in pmem directly. In the hmem case there is no agent
    reserving an altmap so it can all be handled by a core internal default.

    Pass the resource range via a new @range property of 'struct
    dev_dax_data'.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for adding more parameters to instance creation, move
    existing parameters to a new struct.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • All callers specify the same flags to alloc_dax_region(), so there is no
    need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
    ->pfn_flags around on the region. Device-dax instances are always page
    backed.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register
    "soft reserved" memory as an "hmem" device") only registered ranges to the
    hmem driver for each soft-reservation that also appeared in the HMAT.
    While this is meant to encourage platform firmware to "do the right thing"
    and publish an HMAT, the corollary is that platforms that fail to publish
    an accurate HMAT will strand memory from Linux usage. Additionally, memory
    reserved via the "efi_fake_mem" kernel command line option will be
    stranded by default without an HMAT.

    Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to
    be published as raw resource ranges for the hmem driver to consume.

    Include a module parameter to disable either this fallback behavior, or
    the hmat enabling from creating hmem devices. The module parameter
    requires the hmem device enabling to have a unique name in the module
    namespace: "device_hmem".

    The driver depends on the architecture providing phys_to_target_node(),
    which is currently only x86 (via numa_meminfo()) and arm64 (via a generic
    memblock implementation).

    [joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()]
    Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com

    Signed-off-by: Dan Williams
    Signed-off-by: Joao Martins
    Signed-off-by: Andrew Morton
    Reviewed-by: Joao Martins
    Cc: Jonathan Cameron
    Cc: Brice Goglin
    Cc: Jeff Moyer
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jia He
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation to set a fallback value for dev_dax->target_node, introduce
    generic fallback helpers for phys_to_target_node().

    A generic implementation based on node-data or memblock was proposed, but
    as noted by Mike:

    "Here again, I would prefer to add a weak default for
    phys_to_target_node() because the "generic" implementation is not really
    generic.

    The fallback to reserved ranges is x86 specific because on x86 most of
    the reserved areas is not in memblock.memory. AFAIK, no other
    architecture does this."

    The info message in the generic memory_add_physaddr_to_nid()
    implementation is fixed up to properly reflect that
    memory_add_physaddr_to_nid() communicates "online" node info and
    phys_to_target_node() indicates "target / to-be-onlined" node info.

    [akpm@linux-foundation.org: fix CONFIG_MEMORY_HOTPLUG=n build]
    Link: https://lkml.kernel.org/r/202008252130.7YrHIyMI%25lkp@intel.com

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Mike Rapoport
    Cc: Jia He
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643097768.4062302.3135192588966888630.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of detecting whether a resource might have been been claimed,
    report the parent to the walk_iomem_res_desc() callback. For example, the
    ACPI HMAT parser publishes "hmem" platform devices per target range.
    However, if the HMAT is disabled / missing a fallback driver can attach
    devices to the raw memory ranges as a fallback if it sees unclaimed /
    orphan "Soft Reserved" resources in the resource tree.

    Otherwise, find_next_iomem_res() returns a resource with garbage data from
    the stack allocation in __walk_iomem_res_desc() for the res->parent field.

    There are currently no users that expect ->child and ->sibling to be
    valid, and the resource_lock would be needed to traverse them. Use a
    compound literal to implicitly zero initialize the fields that are not
    being returned in addition to setting ->parent.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Jason Gunthorpe
    Cc: Dave Hansen
    Cc: Wei Yang
    Cc: Tom Lendacky
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Vishal Verma
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643097166.4062302.11875688887228572793.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for exposing "Soft Reserved" memory ranges without an HMAT,
    move the hmem device registration to its own compilation unit and make the
    implementation generic.

    The generic implementation drops the usage of acpi_map_pxm_to_online_node()
    that was translating ACPI proximity domain values and instead relies on
    numa_map_to_online_node() to determine the numa node for the device.

    [joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y]
    Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com

    Signed-off-by: Dan Williams
    Signed-off-by: Joao Martins
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
    In preparation for attaching a platform device per iomem resource, teach
    the efi_fake_mem code to create an e820 entry per instance. Similar to
    E820_TYPE_PRAM, bypass merging of resources when the e820 map is sanitized.
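
    As a sketch, two independent soft-reserved ranges (sizes and offsets are
    hypothetical; 0x40000 is the EFI "specific purpose" attribute) would now
    each keep their own e820 entry rather than being merged:

    efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000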

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Acked-by: Ard Biesheuvel
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643096068.4062302.11590041070221681669.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
    Disable parsing of the HMAT for debug, to work around broken platform
    instances, or for cases where it is otherwise not wanted.
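
    A sketch of the intended usage; the spelling below assumes the option is
    exposed as a numa= sub-option alongside the existing numa=noacpi:

    numa=nohmat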

    [rdunlap@infradead.org: fix build when CONFIG_ACPI is not set]
    Link: https://lkml.kernel.org/r/70e5ee34-9809-a997-7b49-499e4be61307@infradead.org

    Signed-off-by: Dan Williams
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: "Rafael J. Wysocki"
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643095540.4062302.732962081968036212.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.

    The device-dax facility allows an address range to be directly mapped
    through a chardev, or optionally hotplugged to the core kernel page
    allocator as System-RAM. It is the mechanism for converting persistent
    memory (pmem) to be used as another volatile memory pool i.e. the current
    Memory Tiering hot topic on linux-mm.

    In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
    it, but that labeling mechanism is not available / applicable to
    soft-reserved ("EFI specific purpose") memory [3]. This series provides a
    sysfs-mechanism for the daxctl utility to enable provisioning of
    volatile-soft-reserved memory ranges.

    The motivations for this facility are:

    1/ Allow performance differentiated memory ranges to be split between
    kernel-managed and directly-accessed use cases.

    2/ Allow physical memory to be provisioned along performance relevant
    address boundaries. For example, divide a memory-side cache [4] along
    cache-color boundaries.

    3/ Parcel out soft-reserved memory to VMs using device-dax as a security
    / permissions boundary [5]. Specifically I have seen people (ab)using
    memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
    device-dax interface on custom address ranges. A follow-on for the VM
    use case is to teach device-dax to dynamically allocate 'struct page' at
    runtime to reduce the duplication of 'struct page' space in both the
    guest and the host kernel for the same physical pages.

    [2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
    [3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
    [4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
    [5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com

    This patch (of 23):

    In preparation for adding a new numa= option, clean up the existing ones
    to avoid ifdefs in numa_setup(), and provide feedback when the numa=fake=
    option is invalid due to the kernel config. The same does not need to be
    done for numa=noacpi, since that capability is already hard disabled at
    compile-time.
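
    For instance, booting with:

    numa=fake=4

    requests four emulated nodes; with this cleanup, a kernel built without
    NUMA emulation support (CONFIG_NUMA_EMU on x86) now gets explicit
    feedback that the option cannot be honored instead of it being silently
    dropped.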

    Suggested-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
    kmemleak-test.c is just a kmemleak test module and cannot be built into
    the kernel. Thus it should not live in the mm directory; move
    kmemleak-test.c to samples/kmemleak/kmemleak-test.c. Also fix the
    spelling of "built-in" along the way.
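
    How the test is exercised is unchanged by the move; as a sketch, assuming
    it is built as a module:

    modprobe kmemleak-test
    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak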

    Signed-off-by: Hui Su
    Signed-off-by: Andrew Morton
    Cc: Catalin Marinas
    Cc: Jonathan Corbet
    Cc: Mauro Carvalho Chehab
    Cc: David S. Miller
    Cc: Rob Herring
    Cc: Masahiro Yamada
    Cc: Sam Ravnborg
    Cc: Josh Poimboeuf
    Cc: Steven Rostedt (VMware)
    Cc: Miguel Ojeda
    Cc: Divya Indi
    Cc: Tomas Winkler
    Cc: David Howells
    Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk
    Signed-off-by: Linus Torvalds

    Hui Su
     
  • kmemleak_scan() currently relies on the big tasklist_lock hammer to
    stabilize iterating through the tasklist. Instead, this patch proposes
    simply using rcu along with the rcu-safe for_each_process_thread flavor
    (without changing scan semantics), which doesn't make use of
    next_thread/p->thread_group and thus cannot race with exit. Furthermore,
    any races with fork() and not seeing the new child should be benign as
    it's not running yet and can also be detected by the next scan.

    Avoiding the tasklist_lock could prove beneficial for performance
    considering the scan operation is done periodically. I have seen
    improvements of 30%-ish when doing similar replacements on very
    pathological microbenchmarks (ie stressing get/setpriority(2)).

    However my main motivation is that it's one less user of the global
    lock, something that Linus has long wanted to see gone eventually (if
    ever), even if the traditional fairness issues have now been dealt with
    by qrwlocks. Of course this is a very long way off. This patch also
    kills another user of the deprecated tsk->thread_group.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Reviewed-by: Qian Cai
    Acked-by: Catalin Marinas
    Acked-by: Oleg Nesterov
    Link: https://lkml.kernel.org/r/20200820203902.11308-1-dave@stgolabs.net
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
    Commit a4d3f8916c65 ("slub: remove useless kmem_cache_debug() before
    remove_full()") is incomplete, as it didn't handle the add_full() part.

    This patch checks for SLAB_STORE_USER instead of kmem_cache_debug(), since
    that should be the only context in which we need the list_lock for
    add_full().

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Liu Xiang
    Link: https://lkml.kernel.org/r/20200811020240.1231-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     
    The ALLOC_SLOWPATH statistic is currently missing for bulk allocation.
    Fix it by doing the accounting in the allocation slow path.

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Hewenliang
    Cc: Hu Shiyuan
    Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     
    The two conditions are mutually exclusive and the gcc compiler will
    optimise this into an if-else-like pattern. Given that the majority of
    the free slowpath is free_frozen, provide a hint to the compiler.

    Tests (perf bench sched messaging -g 20 -l 400000, executed 10x after
    reboot) were run, with the following summarized results:

              un-patched    patched
    max.         192.316    189.851
    min.         187.267    186.252
    avg.         189.154    188.086
    stdev.          1.37       0.99

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Hewenliang
    Cc: Hu Shiyuan
    Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     
    Fix a typo in slab.h: "allocagtor" -> "allocator".

    Signed-off-by: tangjianqiang
    Signed-off-by: Andrew Morton
    Acked-by: Souptick Joarder
    Link: https://lkml.kernel.org/r/1600230053-24303-1-git-send-email-tangjianqiang@xiaomi.com
    Signed-off-by: Linus Torvalds

    tangjianqiang
     
    The removed code was unnecessary and changed nothing in the flow, since
    when 'kmem_cache_alloc_node' returns NULL, returning 'freelist' from the
    function in question is the same as returning NULL.

    Signed-off-by: Mateusz Nosek
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: https://lkml.kernel.org/r/20200915230329.13002-1-mateusznosek0@gmail.com
    Signed-off-by: Linus Torvalds

    Mateusz Nosek
     
    We found the following warning when building the kernel with W=1:

    fs/fs_parser.c:192:5: warning: no previous prototype for `fs_param_bad_value' [-Wmissing-prototypes]
    int fs_param_bad_value(struct p_log *log, struct fs_parameter *param)
    ^
    CC drivers/usb/gadget/udc/snps_udc_core.o

    No header file declares a prototype for this function, so mark it as
    static.
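
    The warning can be reproduced by building just that object with extra
    warnings enabled, e.g.:

    make W=1 fs/fs_parser.o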

    Signed-off-by: Luo Jiaxing
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1601293463-25763-1-git-send-email-luojiaxing@huawei.com
    Signed-off-by: Linus Torvalds

    Luo Jiaxing