14 Oct, 2020
40 commits
-
Avoid bumping the refcount on pages when we're only interested in the
swap entries.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Acked-by: Johannes Weiner
Cc: Alexey Dobriyan
Cc: Chris Wilson
Cc: Huang Ying
Cc: Hugh Dickins
Cc: Jani Nikula
Cc: Matthew Auld
Cc: William Kucharski
Link: https://lkml.kernel.org/r/20200910183318.20139-5-willy@infradead.org
Signed-off-by: Linus Torvalds -
Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

[willy@infradead.org: fix LTP soft lockups]
Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.org
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Acked-by: Johannes Weiner
Cc: Alexey Dobriyan
Cc: Chris Wilson
Cc: Huang Ying
Cc: Hugh Dickins
Cc: Jani Nikula
Cc: Matthew Auld
Cc: William Kucharski
Cc: Qian Cai
Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.org
Signed-off-by: Linus Torvalds -
The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Alexey Dobriyan
Cc: Chris Wilson
Cc: Huang Ying
Cc: Hugh Dickins
Cc: Jani Nikula
Cc: Johannes Weiner
Cc: Matthew Auld
Cc: William Kucharski
Link: https://lkml.kernel.org/r/20200910183318.20139-3-willy@infradead.org
Signed-off-by: Linus Torvalds -
Patch series "Return head pages from find_*_entry", v2.
This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away. As part of auditing all
the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

This patch (of 8):
Provide this functionality from the swap cache. It's useful for
more than just mincore().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Hugh Dickins
Cc: William Kucharski
Cc: Jani Nikula
Cc: Alexey Dobriyan
Cc: Johannes Weiner
Cc: Chris Wilson
Cc: Matthew Auld
Cc: Huang Ying
Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
Signed-off-by: Linus Torvalds -
Rename head_pincount() --> head_compound_pincount(). These names are more
accurate (or less misleading) than the original ones.
Signed-off-by: John Hubbard
Signed-off-by: Andrew Morton
Cc: Qian Cai
Cc: Matthew Wilcox
Cc: Vlastimil Babka
Cc: Kirill A. Shutemov
Cc: Mike Rapoport
Cc: William Kucharski
Link: https://lkml.kernel.org/r/20200807183358.105097-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds -
__dump_page() checks that i_dentry is fetchable and that i_ino is earlier in
the struct than i_dentry, so it ought to work fine, but it's possible that
struct randomisation has reordered i_ino after i_dentry and the pointer is
just wild enough that i_dentry is fetchable and i_ino isn't.

Also print the inode number if the dentry is invalid.
Reported-by: Vlastimil Babka
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: Mike Rapoport
Link: https://lkml.kernel.org/r/20200819185710.28180-1-willy@infradead.org
Signed-off-by: Linus Torvalds -
Add a sysfs attribute which denotes a range from the dax region to be
allocated. It's a write-only @mapping sysfs attribute in the format of
'<start>-<end>' to allocate a range. @start and @end use hexadecimal
values and the @pgoff is implicitly ordered wrt previous writes to
@mapping sysfs, e.g. after a write of a range of length 1G the pgoff
covers 0..1G(-4K), and a second write will use @pgoff for 1G+4K...

This range mapping interface is useful for:

1) Applications which want to implement their own allocation logic, and
   thus pick the desired ranges from the dax_region.

2) Use cases like VMM fast restart[0] where after kexec we want to keep
   the same gpa<->phys mappings (as originally created before kexec).

[0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf
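A shell sketch of the range format described above; the device path and
addresses here are hypothetical examples, not values from the patch:

```shell
# Build a '<start>-<end>' range string for a hypothetical 1G allocation at 4G.
start=$((4 << 30))                          # 0x100000000
size=$((1 << 30))
end=$((start + size - 1))                   # inclusive end of the range
range=$(printf '%#x-%#x' "$start" "$end")
echo "$range"
# The string would then be written to the write-only attribute, e.g.:
# echo "$range" > /sys/bus/dax/devices/dax0.1/mapping
```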
Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Introduce a new module parameter for dax_hmem which initializes all region
devices as free, rather than allocating a pagemap for the region by
default.

All hmem devices created with dax_hmem.region_idle=1 will have the full
available size for creating dynamic dax devices.
Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Introduce a device align attribute. While doing so, rename the region
align attribute to be more explicitly named as so, but keep it named as
@align to retain the API for tools like daxctl.

Changes on align may not always be valid, when say certain mappings were
created with 2M and then we switch to 1G. So, we validate all ranges
against the new value being attempted, post resizing.
Signed-off-by: Joao Martins
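The post-resize validation can be sketched as a divisibility check; the
numbers below are hypothetical and the real driver walks every allocated
range, but the rule is the same:

```shell
# Would a hypothetical existing mapping survive a switch from 2M to 1G @align?
new_align=$((1 << 30))          # proposed 1G alignment
map_start=$((4 << 30))          # existing mapping start (happens to be 1G-aligned)
map_len=$((2 << 20))            # existing mapping length: only 2M
if [ $((map_start % new_align)) -eq 0 ] && [ $((map_len % new_align)) -eq 0 ]; then
    verdict=ok
else
    verdict=rejected            # a 2M-long mapping cannot satisfy a 1G alignment
fi
echo "$verdict"
```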
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Introduce @align to struct dev_dax.
When creating a new device, we still initialize to the default dax_region
@align. Child devices belonging to a region may wish to keep a different
alignment property instead of a global region-defined one.
Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In support of interrogating the physical address layout of a device with
dis-contiguous ranges, introduce a sysfs directory with 'start', 'end',
and 'page_offset' attributes. The alternative is trying to parse
/proc/iomem, and that file will not reflect the extent layout until the
device is enabled.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Joao Martins
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Break the requirement that device-dax instances are physically contiguous.
With this constraint removed it allows fragmented available capacity to
be fully allocated.

This capability is useful to mitigate the "noisy neighbor" problem with
memory-side-cache management for virtual machines, or any other scenario
where a platform address boundary also designates a performance boundary.
For example a direct mapped memory side cache might rotate cache colors at
1GB boundaries. With dis-contiguous allocations a device-dax instance
could be configured to contain only 1 cache color.

It also satisfies Joao's use case (see link) for partitioning memory for
exclusive guest access. It allows for a future potential mode where the
host kernel need not allocate 'struct page' capacity up-front.
Reported-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In support of device-dax growing the ability to front physically
dis-contiguous ranges of memory, update devm_memremap_pages() to track
multiple ranges with a single reference counter and devm instance.

Convert all [devm_]memremap_pages() users to specify the number of ranges
they are mapping in their 'struct dev_pagemap' instance.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Dave Jiang
Cc: Ben Skeggs
Cc: David Airlie
Cc: Daniel Vetter
Cc: Ira Weiny
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: "Jérôme Glisse"
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: kernel test robot
Cc: Mike Rapoport
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information. The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR". That is replaced with an open coded print of the
range.

[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam
Signed-off-by: Dan Williams
Signed-off-by: Dan Carpenter
Signed-off-by: Andrew Morton
Reviewed-by: Boris Ostrovsky [xen]
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Dave Jiang
Cc: Ben Skeggs
Cc: David Airlie
Cc: Daniel Vetter
Cc: Ira Weiny
Cc: Bjorn Helgaas
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: "Jérôme Glisse"
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: kernel test robot
Cc: Mike Rapoport
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Make the device-dax 'size' attribute writable to allow capacity to be
split between multiple instances in a region. The intended consumers of
this capability are users that want to split a scarce memory resource
between device-dax and System-RAM access, or users that want to have
multiple security domains for a large region.

By default the hmem instance provider allocates an entire region to the
first instance. The process of creating a new instance (assuming a
region-id of 0) is to find the region and trigger the 'create' attribute
which yields an empty instance to configure. For example:

    cd /sys/bus/dax/devices
    echo dax0.0 > dax0.0/driver/unbind
    echo $new_size > dax0.0/size
    echo 1 > $(readlink -f dax0.0)/../dax_region/create
    seed=$(cat $(readlink -f dax0.0)/../dax_region/seed)
    echo $new_size > $seed/size
    echo dax0.0 > ../drivers/{device_dax,kmem}/bind
    echo dax0.1 > ../drivers/{device_dax,kmem}/bind

Instances can be destroyed by:

    echo $device > $(readlink -f $device)/../dax_region/delete
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115239.30709.9850106928133493138.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Use sysfs_streq() in device_find_child_by_name() to allow it to use a
sysfs input string that might contain a trailing newline.

The other "device by name" interfaces,
{bus,driver,class}_find_device_by_name(), already account for sysfs
strings.
Signed-off-by: Dan Williams
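A shell sketch (not the kernel code) of the trailing-newline problem that
sysfs_streq() tolerates: a write performed with 'echo' delivers a
newline-terminated buffer, so an exact compare fails where a
newline-tolerant compare succeeds:

```shell
buf=$'dax0.0\n'                 # the bytes the kernel sees from: echo dax0.0 > ...
name='dax0.0'                   # the child device name being searched for
[ "$buf" = "$name" ] && raw=match || raw=mismatch               # strcmp()-style
[ "${buf%$'\n'}" = "$name" ] && streq=match || streq=mismatch   # sysfs_streq()-style
echo "raw=$raw streq=$streq"
```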
Signed-off-by: Andrew Morton
Reviewed-by: Greg Kroah-Hartman
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643102106.4062302.12229802117645312104.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106114576.30709.2960091665444712180.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Add a seed device concept for dynamic dax regions to be able to split the
region amongst multiple sub-instances. The seed device, similar to
libnvdimm seed devices, is a device that starts with zero capacity
allocated and unbound to a driver. In contrast to libnvdimm seed devices
explicit 'create' and 'delete' interfaces are added to the region to
trigger seeds to be created and unused devices to be reclaimed. The
explicit create and delete replaces implicit create as a side effect of
probe and implicit delete when writing 0 to the size that libnvdimm
implements.

Delete can be performed on any 0-sized and idle device. This avoids the
gymnastics of needing to move device_unregister() to its own async
context. Specifically, it avoids the deadlock of deleting a device via
one of its own attributes. It is also less surprising to userspace which
never sees an extra device it did not request.

For now just add the device creation, teardown, and ->probe() prevention.
A later patch will arrange for the 'dax/size' attribute to be writable to
allocate capacity from the region.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for introducing seed devices the dax-bus core needs to be
able to intercept ->probe() and ->remove() operations. Towards that end
arrange for the bus and drivers to switch from raw 'struct device' driver
operations to 'struct dev_dax' typed operations.
Reported-by: Hulk Robot
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Jason Yan
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for a facility that enables dax regions to be sub-divided,
introduce infrastructure to track and allocate region capacity.

The new dax_region/available_size attribute is only enabled for volatile
hmem devices, not pmem devices that are defined by nvdimm namespace
boundaries. This is per Jeff's feedback the last time dynamic device-dax
capacity allocation support was discussed.
Signed-off-by: Dan Williams
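Conceptually, the new attribute reports region capacity minus what
instances have already claimed; a sketch with hypothetical sizes:

```shell
region_size=$((16 << 30))       # hypothetical 16G dax region
dax0_0=$((4 << 30))             # capacity claimed by a first instance
dax0_1=$((2 << 30))             # capacity claimed by a second instance
available=$(( region_size - dax0_0 - dax0_1 ))
echo "$available"               # what dax_region/available_size would report
```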
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, change
the kmem driver to use the idiomatic release_mem_region() to pair with the
initial request_mem_region(). This also eliminates the need to open code
the release of the resource allocated by request_mem_region().

As there are no more dax_kmem_res users, delete this struct member.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106112239.30709.15909567572288425294.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, move
resource name tracking to driver data. The memory for the resource name
needs to have its own lifetime separate from the device bind lifetime for
cases where the driver is unbound, but the kmem range could not be
unplugged from the page allocator.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106111639.30709.17624822766862009183.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, teach the
driver to calculate the hotplug range from the device range. The hotplug
range is the trivially calculated memory-block-size aligned version of the
device range.
Signed-off-by: Dan Williams
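A sketch of that calculation with hypothetical numbers; the range is
assumed to be trimmed inward to whole memory blocks (the direction is an
assumption here, consistent with memory hotplug only accepting complete
blocks):

```shell
block=$((128 << 20))                      # hypothetical memory block size (128M)
dev_start=$(( (4 << 30) + 4096 ))         # device range start, not block-aligned
dev_len=$(( (1 << 30) - 4096 ))
hp_start=$(( (dev_start + block - 1) / block * block ))   # round start up
hp_end=$(( (dev_start + dev_len) / block * block - 1 ))   # round end down, inclusive
printf 'hotplug range: %#x-%#x\n' "$hp_start" "$hp_end"
```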
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106111109.30709.3173462396758431559.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
The passed in dev_pagemap is only required in the pmem case as the
libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
place the memmap in pmem directly. In the hmem case there is no agent
reserving an altmap so it can all be handled by a core internal default.

Pass the resource range via a new @range property of 'struct
dev_dax_data'.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for adding more parameters to instance creation, move
existing parameters to a new struct.
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
All callers specify the same flags to alloc_dax_region(), so there is no
need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
->pfn_flags around on the region. Device-dax instances are always page
backed.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register
"soft reserved" memory as an "hmem" device") only registered ranges to the
hmem driver for each soft-reservation that also appeared in the HMAT.
While this is meant to encourage platform firmware to "do the right thing"
and publish an HMAT, the corollary is that platforms that fail to publish
an accurate HMAT will strand memory from Linux usage. Additionally, the
"efi_fake_mem" kernel command line option enabling will strand memory by
default without an HMAT.

Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to
be published as raw resource ranges for the hmem driver to consume.

Include a module parameter to disable either this fallback behavior, or
the hmat enabling from creating hmem devices. The module parameter
requires the hmem device enabling to have a unique name in the module
namespace: "device_hmem".

The driver depends on the architecture providing phys_to_target_node(),
which is currently only x86 via numa_meminfo() and arm64 via a generic
memblock implementation.

[joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()]
Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com

Signed-off-by: Dan Williams
Signed-off-by: Joao Martins
Signed-off-by: Andrew Morton
Reviewed-by: Joao Martins
Cc: Jonathan Cameron
Cc: Brice Goglin
Cc: Jeff Moyer
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jia He
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for setting a fallback value for dev_dax->target_node,
introduce generic fallback helpers for phys_to_target_node().

A generic implementation based on node-data or memblock was proposed, but
as noted by Mike:

"Here again, I would prefer to add a weak default for
phys_to_target_node() because the "generic" implementation is not really
generic.

The fallback to reserved ranges is x86 specific because on x86 most of
the reserved areas is not in memblock.memory. AFAIK, no other
architecture does this."

The info message in the generic memory_add_physaddr_to_nid()
implementation is fixed up to properly reflect that
memory_add_physaddr_to_nid() communicates "online" node info and
phys_to_target_node() indicates "target / to-be-onlined" node info.

[akpm@linux-foundation.org: fix CONFIG_MEMORY_HOTPLUG=n build]
Link: https://lkml.kernel.org/r/202008252130.7YrHIyMI%25lkp@intel.com

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Mike Rapoport
Cc: Jia He
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643097768.4062302.3135192588966888630.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In support of detecting whether a resource might have been claimed,
report the parent to the walk_iomem_res_desc() callback. For example, the
ACPI HMAT parser publishes "hmem" platform devices per target range.
However, if the HMAT is disabled / missing, a fallback driver can attach
devices to the raw memory ranges if it sees unclaimed / orphan
"Soft Reserved" resources in the resource tree.

Otherwise, find_next_iomem_res() returns a resource with garbage data from
the stack allocation in __walk_iomem_res_desc() for the res->parent field.

There are currently no users that expect ->child and ->sibling to be
valid, and the resource_lock would be needed to traverse them. Use a
compound literal to implicitly zero initialize the fields that are not
being returned in addition to setting ->parent.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Jason Gunthorpe
Cc: Dave Hansen
Cc: Wei Yang
Cc: Tom Lendacky
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Vishal Verma
Cc: Will Deacon
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643097166.4062302.11875688887228572793.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for exposing "Soft Reserved" memory ranges without an HMAT,
move the hmem device registration to its own compilation unit and make the
implementation generic.

The generic implementation drops the usage of acpi_map_pxm_to_online_node()
that translated ACPI proximity domain values and instead relies on
numa_map_to_online_node() to determine the numa node for the device.

[joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y]
Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com

Signed-off-by: Dan Williams
Signed-off-by: Joao Martins
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
In preparation for attaching a platform device per iomem resource, teach
the efi_fake_mem code to create an e820 entry per instance. Similar to
E820_TYPE_PRAM, bypass merging of resources when the e820 map is sanitized.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Acked-by: Ard Biesheuvel
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643096068.4062302.11590041070221681669.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Disable parsing of the HMAT for debug, to work around broken platform
instances, or cases where it is otherwise not wanted.

[rdunlap@infradead.org: fix build when CONFIG_ACPI is not set]
Link: https://lkml.kernel.org/r/70e5ee34-9809-a997-7b49-499e4be61307@infradead.org

Signed-off-by: Dan Williams
Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: "Rafael J. Wysocki"
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643095540.4062302.732962081968036212.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.
The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) to be used as another volatile memory pool i.e. the current
Memory Tiering hot topic on linux-mm.

In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides a
sysfs-mechanism for the daxctl utility to enable provisioning of
volatile-soft-reserved memory ranges.

The motivations for this facility are:

1/ Allow performance differentiated memory ranges to be split between
   kernel-managed and directly-accessed use cases.

2/ Allow physical memory to be provisioned along performance relevant
   address boundaries. For example, divide a memory-side cache [4] along
   cache-color boundaries.

3/ Parcel out soft-reserved memory to VMs using device-dax as a security
   / permissions boundary [5]. Specifically I have seen people (ab)using
   memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
   device-dax interface on custom address ranges. A follow-on for the VM
   use case is to teach device-dax to dynamically allocate 'struct page'
   at runtime to reduce the duplication of 'struct page' space in both
   the guest and the host kernel for the same physical pages.

[2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com

This patch (of 23):
In preparation for adding a new numa= option, clean up the existing ones
to avoid ifdefs in numa_setup(), and provide feedback when the numa=fake=
option is invalid due to the kernel config. The same does not need to be
done for numa=noacpi, since the capability is already hard disabled at
compile-time.

Suggested-by: Rafael J. Wysocki
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds -
kmemleak-test.c is just a kmemleak test module, which cannot be used as a
built-in kernel module. Thus it should not live in the mm directory; move
kmemleak-test.c to samples/kmemleak/kmemleak-test.c. Fix the spelling of
"built-in" along the way.

Signed-off-by: Hui Su
Signed-off-by: Andrew Morton
Cc: Catalin Marinas
Cc: Jonathan Corbet
Cc: Mauro Carvalho Chehab
Cc: David S. Miller
Cc: Rob Herring
Cc: Masahiro Yamada
Cc: Sam Ravnborg
Cc: Josh Poimboeuf
Cc: Steven Rostedt (VMware)
Cc: Miguel Ojeda
Cc: Divya Indi
Cc: Tomas Winkler
Cc: David Howells
Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk
Signed-off-by: Linus Torvalds -
kmemleak_scan() currently relies on the big tasklist_lock hammer to
stabilize iterating through the tasklist. Instead, this patch proposes
simply using rcu along with the rcu-safe for_each_process_thread flavor
(without changing scan semantics), which doesn't make use of
next_thread/p->thread_group and thus cannot race with exit. Furthermore,
any races with fork() and not seeing the new child should be benign as
it's not running yet and can also be detected by the next scan.

Avoiding the tasklist_lock could prove beneficial for performance,
considering the scan operation is done periodically. I have seen
improvements of 30%-ish when doing similar replacements on very
pathological microbenchmarks (i.e. stressing get/setpriority(2)).

However my main motivation is that it's one less user of the global
lock, something that Linus has long wanted to see gone eventually
(if ever) even if the traditional fairness issues have been dealt with
now with qrwlocks. Of course this is a very long way off. This patch
also kills another user of the deprecated tsk->thread_group.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Reviewed-by: Qian Cai
Acked-by: Catalin Marinas
Acked-by: Oleg Nesterov
Link: https://lkml.kernel.org/r/20200820203902.11308-1-dave@stgolabs.net
Signed-off-by: Linus Torvalds -
The commit below is incomplete, as it didn't handle the add_full() part.

commit a4d3f8916c65 ("slub: remove useless kmem_cache_debug() before
remove_full()")

This patch checks for SLAB_STORE_USER instead of kmem_cache_debug(), since
that should be the only context in which we need the list_lock for
add_full().

Signed-off-by: Abel Wu
Signed-off-by: Andrew Morton
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Liu Xiang
Link: https://lkml.kernel.org/r/20200811020240.1231-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds -
The ALLOC_SLOWPATH statistic is missing in bulk allocation now. Fix it
by doing the accounting in the allocation slow path.

Signed-off-by: Abel Wu
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Acked-by: David Rientjes
Cc: Christoph Lameter
Cc: Joonsoo Kim
Cc: Hewenliang
Cc: Hu Shiyuan
Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds -
The two conditions are mutually exclusive and the gcc compiler will
optimise this into an if-else-like pattern. Given that the majority of
free_slowpath is free_frozen, let's provide some hints to the compiler.

Tests (perf bench sched messaging -g 20 -l 400000, executed 10x after
reboot) are done and the summarized result:

              un-patched    patched
    max.      192.316       189.851
    min.      187.267       186.252
    avg.      189.154       188.086
    stdev.    1.37          0.99

Signed-off-by: Abel Wu
Signed-off-by: Andrew Morton
Acked-by: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Hewenliang
Cc: Hu Shiyuan
Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds -
Fix a typo in slab.h:
"allocagtor" -> "allocator"

Signed-off-by: tangjianqiang
Signed-off-by: Andrew Morton
Acked-by: Souptick Joarder
Link: https://lkml.kernel.org/r/1600230053-24303-1-git-send-email-tangjianqiang@xiaomi.com
Signed-off-by: Linus Torvalds -
The removed code was unnecessary and changed nothing in the flow: when
'kmem_cache_alloc_node' returns NULL, returning 'freelist' from the
function in question is the same as returning NULL.

Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Link: https://lkml.kernel.org/r/20200915230329.13002-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds -
We found the following warning when building the kernel with W=1:

fs/fs_parser.c:192:5: warning: no previous prototype for `fs_param_bad_value' [-Wmissing-prototypes]
 int fs_param_bad_value(struct p_log *log, struct fs_parameter *param)
     ^
  CC      drivers/usb/gadget/udc/snps_udc_core.o

No header file defines a prototype for this function, so we should mark
it as static.

Signed-off-by: Luo Jiaxing
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/1601293463-25763-1-git-send-email-luojiaxing@huawei.com
Signed-off-by: Linus Torvalds