Eric Lee / smarc-fsl-linux-kernel

06 Jan, 2021

1 commit

12d377b93 device-dax: Fix range release

[ Upstream commit 6268d7da4d192af339f4d688942b9ccb45a65e04 ]

There are multiple locations that open-code the release of the last
range in a device-dax instance. Consolidate this into a new
dev_dax_trim_range() helper.

This also addresses a kmemleak report:

# cat /sys/kernel/debug/kmemleak
[..]
unreferenced object 0xffff976bd46f6240 (size 64):
comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7...
ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8...........
backtrace:
[] __kmalloc_track_caller+0x136/0x379
[] krealloc+0x67/0x92
[] __alloc_dev_dax_range+0x73/0x25c
[] devm_create_dev_dax+0x27d/0x416
[] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
[] dax_pmem_probe+0x10/0x1f [dax_pmem]
[] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
[] really_probe+0x230/0x48d
[] driver_probe_device+0x122/0x13b
[] device_driver_attach+0x5b/0x60
[] bind_store+0xb7/0xc3
[] drv_attr_store+0x27/0x31
[] sysfs_kf_write+0x4a/0x57
[] kernfs_fop_write+0x150/0x1e5
[] __vfs_write+0x1b/0x34
[] vfs_write+0xd8/0x1d1
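
The consolidation can be pictured with a small userspace model. This is a sketch only, not the kernel's dev_dax_trim_range(): the names toy_dev_dax, toy_add_range and toy_trim_range are hypothetical, and realloc()/free() stand in for krealloc()/kfree(). It shows one helper owning the release of the last range, including freeing the array once it is empty, which is the kind of allocation the backtrace above points at.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_range { unsigned long long start, end; };

struct toy_dev_dax {
	struct toy_range *ranges;	/* grown with realloc(), like krealloc() */
	int nr_range;
};

/* Append one range; returns 0 on success, -1 on allocation failure. */
static int toy_add_range(struct toy_dev_dax *d, unsigned long long start,
			 unsigned long long end)
{
	struct toy_range *r = realloc(d->ranges, sizeof(*r) * (d->nr_range + 1));

	if (!r)
		return -1;
	d->ranges = r;
	d->ranges[d->nr_range++] = (struct toy_range){ start, end };
	return 0;
}

/* Single place that releases the last range and frees the array when empty. */
static void toy_trim_range(struct toy_dev_dax *d)
{
	assert(d->nr_range > 0);
	if (--d->nr_range == 0) {
		free(d->ranges);
		d->ranges = NULL;
	}
}
```

Every caller that previously open-coded the release would call the one helper instead, so no path can forget the final free.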

Reported-by: Jane Chu
Cc: Zhen Lei
Link: https://lore.kernel.org/r/160834570161.1791850.14911670304441510419.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams
Signed-off-by: Sasha Levin

Dan Williams
2021-01-06 21:56:56 +0800

30 Dec, 2020

1 commit

224adad2c device-dax/core: Fix memory leak when rmmod dax.ko

commit 1aa574312518ef1d60d2dc62d58f7021db3b163a upstream.

When I repeatedly modprobe and rmmod dax.ko, kmemleak reports a
memory leak as follows:

unreferenced object 0xffff9a5588c05088 (size 8):
comm "modprobe", pid 261, jiffies 4294693644 (age 42.063s)
...
backtrace:
[] kstrdup+0x35/0x70
[] kstrdup_const+0x3d/0x50
[] kvasprintf_const+0xbc/0xf0
[] kobject_set_name_vargs+0x3b/0xd0
[] kobject_set_name+0x62/0x90
[] bus_register+0x7f/0x2b0
[] 0xffffffffc02840f7
[] 0xffffffffc02840b4
[] do_one_initcall+0x58/0x240
[] do_init_module+0x56/0x1e2
[] load_module+0x2517/0x2840
[] __do_sys_finit_module+0x9c/0xe0
[] do_syscall_64+0x33/0x40
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

When rmmod dax is executed, dax_bus_exit() is never called, so the bus
registered at module init is not unregistered. Fix this by calling
dax_bus_exit() on the module exit path.
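
The leak comes down to an init path whose exit path does not undo it. A minimal userspace sketch of the pairing follows; all toy_* names are hypothetical and a counter stands in for the state that bus_register() allocates in the backtrace above.

```c
#include <assert.h>

static int toy_bus_registered;	/* stands in for what bus_register() allocates */

static int toy_bus_init(void)
{
	toy_bus_registered++;
	return 0;
}

static void toy_bus_exit(void)
{
	assert(toy_bus_registered > 0);
	toy_bus_registered--;
}

static int toy_core_init(void)
{
	return toy_bus_init();
}

/* The exit path must undo everything the init path set up. */
static void toy_core_exit(void)
{
	toy_bus_exit();
}
```

If toy_core_exit() omitted the toy_bus_exit() call, every load/unload cycle would leave the counter one higher, which is the userspace analogue of the repeated modprobe/rmmod leak.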

Fixes: 9567da0b408a ("device-dax: Introduce bus + driver model")
Cc:
Reported-by: Hulk Robot
Signed-off-by: Wang Hai
Link: https://lore.kernel.org/r/20201201135929.66530-1-wanghai38@huawei.com
Signed-off-by: Dan Williams
Signed-off-by: Greg Kroah-Hartman

Wang Hai
2020-12-30 18:54:26 +0800

23 Nov, 2020

1 commit

a927bd6ba mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y

...and:

CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that the
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.

Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
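
The override pattern can be sketched in a single file as follows. This is an illustration under assumptions, not the kernel headers: TOY_ARCH_HAS_NODE_LOOKUP is a hypothetical switch standing in for "this architecture provides its own helper", the arch mapping is made up, and the generic fallback is assumed to return node 0.

```c
#include <assert.h>

/*
 * Stand-in for an architecture header. An arch that wants its own
 * implementation defines the function and then #defines the symbol to
 * itself so that generic code can detect the override.
 */
#ifdef TOY_ARCH_HAS_NODE_LOOKUP
static inline int phys_to_target_node(unsigned long long start)
{
	return (int)(start >> 32);	/* made-up mapping, illustration only */
}
#define phys_to_target_node phys_to_target_node
#endif

/* Stand-in for the generic header: used only when no override exists. */
#ifndef phys_to_target_node
static inline int phys_to_target_node(unsigned long long start)
{
	(void)start;
	return 0;	/* assumed default: fall back to node 0 */
}
#endif

/* Hypothetical caller; it does not care which definition was selected. */
static int toy_node_for(unsigned long long start)
{
	int nid = phys_to_target_node(start);

	assert(nid >= 0);
	return nid;
}
```

Because the selection happens in the preprocessor, exactly one definition exists per build and no weak symbol needs to be exported.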

The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com

Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap
Reported-by: Thomas Gleixner
Reported-by: kernel test robot
Reported-by: Christoph Hellwig
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Tested-by: Randy Dunlap
Tested-by: Thomas Gleixner
Reviewed-by: Thomas Gleixner
Reviewed-by: Christoph Hellwig
Cc: Joao Martins
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Vishal Verma
Cc: Stephen Rothwell
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-11-23 02:48:22 +0800

20 Oct, 2020

1 commit

694565356 Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

- Support directly accessing host page cache from virtiofs. This can
improve I/O performance for various workloads, as well as reducing
the memory requirement by eliminating double caching. Thanks to Vivek
Goyal for doing most of the work on this.

- Allow automatic submounting inside virtiofs. This allows unique
st_dev/st_ino values to be assigned inside the guest to files
residing on different filesystems on the host. Thanks to Max Reitz
for the patches.

- Fix an old use after free bug found by Pradeep P V K.

* tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
virtiofs: calculate number of scatter-gather elements accurately
fuse: connection remove fix
fuse: implement crossmounts
fuse: Allow fuse_fill_super_common() for submounts
fuse: split fuse_mount off of fuse_conn
fuse: drop fuse_conn parameter where possible
fuse: store fuse_conn in fuse_req
fuse: add submount support to
fuse: fix page dereference after free
virtiofs: add logic to free up a memory range
virtiofs: maintain a list of busy elements
virtiofs: serialize truncate/punch_hole and dax fault path
virtiofs: define dax address space operations
virtiofs: add DAX mmap support
virtiofs: implement dax read/write operations
virtiofs: introduce setupmapping/removemapping commands
virtiofs: implement FUSE_INIT map_alignment field
virtiofs: keep a list of free dax memory ranges
virtiofs: add a mount option to enable dax
virtiofs: set up virtio_fs dax_device
...

Linus Torvalds
2020-10-20 05:28:30 +0800

17 Oct, 2020

2 commits

b61171997 mm/memory_hotplug: prepare passing flags to add_memory() and friends

We soon want to pass flags, e.g., to mark added System RAM resources
mergeable. Prepare for that.

This patch is based on a similar patch by Oscar Salvador:

https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Juergen Gross # Xen related part
Reviewed-by: Pankaj Gupta
Acked-by: Wei Liu
Cc: Michal Hocko
Cc: Dan Williams
Cc: Jason Gunthorpe
Cc: Baoquan He
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Greg Kroah-Hartman
Cc: Vishal Verma
Cc: Dave Jiang
Cc: "K. Y. Srinivasan"
Cc: Haiyang Zhang
Cc: Stephen Hemminger
Cc: Wei Liu
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Christian Borntraeger
Cc: David Hildenbrand
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Cc: Boris Ostrovsky
Cc: Stefano Stabellini
Cc: "Oliver O'Halloran"
Cc: Pingfan Liu
Cc: Nathan Lynch
Cc: Libor Pechacek
Cc: Anton Blanchard
Cc: Leonardo Bras
Cc: Ard Biesheuvel
Cc: Eric Biederman
Cc: Julien Grall
Cc: Kees Cook
Cc: Roger Pau Monné
Cc: Thomas Gleixner
Cc: Wei Yang
Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-10-17 02:11:18 +0800
a455aa72f device-dax/kmem: fix resource release

The conversion to request_mem_region() is broken because it assumes that
the range is marked busy prior to release. However, due to the way that
the kmem driver manipulates the IORESOURCE_BUSY flag (clears it to let
{add,remove}_memory() handle busy) it requires a manual release_resource()
to perform cleanup.

Given that the actual 'struct resource *' needs to be recalled, not just
the range, add that tracking to the kmem driver-data.

Fixes: 0513bd5bb114 ("device-dax/kmem: replace release_resource() with release_mem_region()")
Reported-by: David Hildenbrand
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Link: https://lkml.kernel.org/r/160272252925.3136502.17220638073995895400.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-17 02:11:14 +0800

14 Oct, 2020

20 commits

8490e2e25 device-dax: add a range mapping allocation attribute

Add a sysfs attribute which denotes a range from the dax region to be
allocated. It is a write-only @mapping sysfs attribute in the format of
'<start>-<end>' to allocate a range. @start and @end use hexadecimal
values and the @pgoff is implicitly ordered with respect to previous
writes to the @mapping attribute, e.g. for a write of a range of length 1G
the pgoff is 0..1G(-4K), and a second write will use @pgoff for 1G+4K...

This range mapping interface is useful for:

1) Applications which want to implement their own allocation logic, and
thus pick the desired ranges from dax_region.

2) Use cases like VMM fast restart[0] where after kexec we want
the same gpaphys mappings (as originally created before kexec).

[0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf
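
How such a write might be parsed and ordered can be sketched in userspace C. This is not the kernel's store routine: the toy_* names, the 4K page size, the fixed-size table and the chosen error codes are all assumptions for illustration. Each accepted "start-end" string (hexadecimal, inclusive end) is given a page offset that continues where the previous write left off.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size */

struct toy_mapping { unsigned long long start, end, pgoff; };

struct toy_dev {
	struct toy_mapping map[8];
	int nr;
	unsigned long long next_pgoff;	/* in pages, advanced by every write */
};

/* Parse "start-end" (hex, inclusive end) and append it to the device. */
static int toy_mapping_store(struct toy_dev *d, const char *buf)
{
	char *p;
	unsigned long long start, end;

	assert(d && buf);
	start = strtoull(buf, &p, 16);
	if (p == buf || *p != '-')
		return -EINVAL;
	buf = p + 1;
	end = strtoull(buf, &p, 16);
	if (p == buf || (*p != '\0' && *p != '\n') || end < start)
		return -EINVAL;
	if (d->nr == 8)
		return -ENOSPC;

	d->map[d->nr++] = (struct toy_mapping){ start, end, d->next_pgoff };
	d->next_pgoff += (end - start + 1) / TOY_PAGE_SIZE;
	return 0;
}
```

Writing two 1G ranges in sequence gives the first a page offset of 0 and the second a page offset of 262144 (1G divided by 4K), regardless of where the two ranges sit physically.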

Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Joao Martins
2020-10-14 09:38:29 +0800
5a505603a dax/hmem: introduce dax_hmem.region_idle parameter

Introduce a new module parameter for dax_hmem which initializes all region
devices as free, rather than allocating a pagemap for the region by
default.

All hmem devices created with dax_hmem.region_idle=1 will have full
available size for creating dynamic dax devices.

Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Joao Martins
2020-10-14 09:38:28 +0800
6d82120f4 device-dax: add an 'align' attribute

Introduce a device align attribute. While doing so, rename the region
align attribute so that it is explicitly identified as the region's, but
keep it exposed as @align to retain the API for tools like daxctl.

Changes to align may not always be valid, for example when certain mappings
were created with 2M alignment and we then switch to 1G. So, validate all
ranges against the new value being attempted, post resizing.
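
The validation step can be sketched as a loop over the existing ranges. The names below are hypothetical, and checking both the start and the length of each range is an assumption of this sketch rather than a statement of what the kernel checks.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_range { unsigned long long start, end; };	/* end is inclusive */

/* True when both the start and the length are multiples of align. */
static bool toy_range_aligned(const struct toy_range *r, unsigned long long align)
{
	unsigned long long len = r->end - r->start + 1;

	return (r->start % align) == 0 && (len % align) == 0;
}

/* A new alignment is accepted only if every existing range satisfies it. */
static bool toy_validate_align(const struct toy_range *ranges, int nr,
			       unsigned long long align)
{
	assert(align != 0);
	for (int i = 0; i < nr; i++)
		if (!toy_range_aligned(&ranges[i], align))
			return false;
	return true;
}
```

With one 2M-long range in the set, a switch to 2M alignment passes while a switch to 1G is rejected, matching the example in the text.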

Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
33cf94d71 device-dax: make align a per-device property

Introduce @align to struct dev_dax.

When creating a new device, we still initialize to the default dax_region
@align. Child devices belonging to a region may wish to keep a different
alignment property instead of a global region-defined one.

Signed-off-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Joao Martins
2020-10-14 09:38:28 +0800
0b07ce872 device-dax: introduce 'mapping' devices

In support of interrogating the physical address layout of a device with
dis-contiguous ranges, introduce a sysfs directory with 'start', 'end',
and 'page_offset' attributes. The alternative is trying to parse
/proc/iomem, and that file will not reflect the extent layout until the
device is enabled.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Joao Martins
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
60e93dc09 device-dax: add dis-contiguous resource support

Break the requirement that device-dax instances are physically contiguous.
With this constraint removed it allows fragmented available capacity to
be fully allocated.

This capability is useful to mitigate the "noisy neighbor" problem with
memory-side-cache management for virtual machines, or any other scenario
where a platform address boundary also designates a performance boundary.
For example a direct mapped memory side cache might rotate cache colors at
1GB boundaries. With dis-contiguous allocations a device-dax instance
could be configured to contain only 1 cache color.
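
To make the cache-color example concrete, here is a userspace sketch that selects the 1GB chunks of a region belonging to a single color. The color model (color changes every 1GB and wraps after nr_colors) and all toy_* names are assumptions made for illustration; real platforms may differ.

```c
#include <assert.h>

#define TOY_GB (1ULL << 30)

struct toy_range { unsigned long long start, end; };	/* end is inclusive */

/* Assumed model: the color changes every 1GB and wraps after nr_colors. */
static unsigned int toy_color_of(unsigned long long addr, unsigned int nr_colors)
{
	return (unsigned int)((addr / TOY_GB) % nr_colors);
}

/*
 * Collect the 1GB chunks of [start, start + size) that have the wanted
 * color. start and size are assumed to be 1GB aligned. Returns the number
 * of ranges written to out (at most max).
 */
static int toy_pick_color(unsigned long long start, unsigned long long size,
			  unsigned int nr_colors, unsigned int color,
			  struct toy_range *out, int max)
{
	int n = 0;

	assert(nr_colors > 0 && start % TOY_GB == 0 && size % TOY_GB == 0);
	for (unsigned long long a = start; a < start + size && n < max; a += TOY_GB)
		if (toy_color_of(a, nr_colors) == color)
			out[n++] = (struct toy_range){ a, a + TOY_GB - 1 };
	return n;
}
```

The resulting ranges are not adjacent, which is why an instance restricted to one color needs dis-contiguous range support.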

It also satisfies Joao's use case (see link) for partitioning memory for
exclusive guest access. It allows for a future potential mode where the
host kernel need not allocate 'struct page' capacity up-front.

Reported-by: Joao Martins
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Jonathan Cameron
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
b7b3c01b1 mm/memremap_pages: support multiple ranges per invocation

In support of device-dax growing the ability to front physically
dis-contiguous ranges of memory, update devm_memremap_pages() to track
multiple ranges with a single reference counter and devm instance.

Convert all [devm_]memremap_pages() users to specify the number of ranges
they are mapping in their 'struct dev_pagemap' instance.
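
The shape of a multi-range instance can be modelled as below. This is a simplified userspace sketch, not 'struct dev_pagemap' itself: the toy_* names, the fixed array of four ranges and the 4K page size are assumptions. It shows one reference counter covering several ranges, with the caller stating how many ranges it maps.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4K pages */

struct toy_range { unsigned long long start, end; };	/* end is inclusive */

/* One reference counter covers every range mapped by this instance. */
struct toy_pagemap {
	unsigned long refcount;
	int nr_range;			/* set by the caller */
	struct toy_range ranges[4];
};

/* Total number of pages spanned by all ranges of the pagemap. */
static unsigned long long toy_pagemap_pages(const struct toy_pagemap *pgmap)
{
	unsigned long long pages = 0;

	assert(pgmap->nr_range >= 0 && pgmap->nr_range <= 4);
	for (int i = 0; i < pgmap->nr_range; i++)
		pages += (pgmap->ranges[i].end - pgmap->ranges[i].start + 1)
			 >> TOY_PAGE_SHIFT;
	return pages;
}
```

A single-range user simply sets nr_range to 1, which mirrors the conversion of existing callers described above.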

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Dave Jiang
Cc: Ben Skeggs
Cc: David Airlie
Cc: Daniel Vetter
Cc: Ira Weiny
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: "Jérôme Glisse"
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: kernel test robot
Cc: Mike Rapoport
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
a4574f63e mm/memremap_pages: convert to 'struct range'

The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information. The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR". That is replaced with an open coded print of the
range.
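
A span-only type and an open-coded print can be sketched as follows. The toy_* names and the exact output format are illustrative assumptions; the sketch only shows that a start/end pair with an inclusive end is enough to carry the span and to report it.

```c
#include <assert.h>
#include <stdio.h>

/* Only the span is kept; both bounds are inclusive. */
struct toy_range { unsigned long long start, end; };

static unsigned long long toy_range_len(const struct toy_range *r)
{
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/* Open-coded replacement for a "%pR"-style print; the format is illustrative. */
static int toy_range_print(char *buf, size_t len, const struct toy_range *r)
{
	return snprintf(buf, len, "[mem %#llx-%#llx]", r->start, r->end);
}
```

Compared with a full resource descriptor, nothing but the two bounds has to be stored per span.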

[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

Signed-off-by: Dan Williams
Signed-off-by: Dan Carpenter
Signed-off-by: Andrew Morton
Reviewed-by: Boris Ostrovsky [xen]
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Vishal Verma
Cc: Vivek Goyal
Cc: Dave Jiang
Cc: Ben Skeggs
Cc: David Airlie
Cc: Daniel Vetter
Cc: Ira Weiny
Cc: Bjorn Helgaas
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: "Jérôme Glisse"
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: kernel test robot
Cc: Mike Rapoport
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
fcffb6a1d device-dax: add resize support

Make the device-dax 'size' attribute writable to allow capacity to be
split between multiple instances in a region. The intended consumers of
this capability are users that want to split a scarce memory resource
between device-dax and System-RAM access, or users that want to have
multiple security domains for a large region.

By default the hmem instance provider allocates an entire region to the
first instance. The process of creating a new instance (assuming a
region-id of 0) is to find the region and trigger the 'create' attribute,
which yields an empty instance to configure. For example:

cd /sys/bus/dax/devices
echo dax0.0 > dax0.0/driver/unbind
echo $new_size > dax0.0/size
echo 1 > $(readlink -f dax0.0)/../dax_region/create
seed=$(cat $(readlink -f dax0.0)/../dax_region/seed)
echo $new_size > $seed/size
# {device_dax,kmem} means: pick one of the two drivers for each bind
echo dax0.0 > ../drivers/{device_dax,kmem}/bind
echo dax0.1 > ../drivers/{device_dax,kmem}/bind

Instances can be destroyed by:

echo $device > $(readlink -f $device)/../dax_region/delete

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115239.30709.9850106928133493138.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
0f3da14a4 device-dax: introduce 'seed' devices

Add a seed device concept for dynamic dax regions to be able to split the
region amongst multiple sub-instances. The seed device, similar to
libnvdimm seed devices, is a device that starts with zero capacity
allocated and unbound to a driver. In contrast to libnvdimm seed devices,
explicit 'create' and 'delete' interfaces are added to the region to
trigger seeds to be created and unused devices to be reclaimed. The
explicit create and delete replace the implicit create as a side effect of
probe, and the implicit delete when writing 0 to the size, that libnvdimm
implements.

Delete can be performed on any 0-sized and idle device. This avoids the
gymnastics of needing to move device_unregister() to its own async
context. Specifically, it avoids the deadlock of deleting a device via
one of its own attributes. It is also less surprising to userspace which
never sees an extra device it did not request.

For now just add the device creation, teardown, and ->probe() prevention.
A later patch will arrange for the 'dax/size' attribute to be writable to
allocate capacity from the region.
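
The delete rule can be sketched as a small check performed at region level. Everything here is a hypothetical userspace model: the toy_* names, the 'present' flag and the specific error codes are assumptions. It shows that a delete request is only honored for a device that is both 0-sized and not bound to a driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev_dax {
	unsigned long long size;	/* capacity currently allocated */
	bool bound;			/* true while a driver is attached */
	bool present;			/* false once the device is deleted */
};

/* A region-level 'delete' only accepts devices that are 0-sized and idle. */
static int toy_region_delete(struct toy_dev_dax *victim)
{
	assert(victim != NULL);
	if (!victim->present)
		return -ENXIO;
	if (victim->bound || victim->size != 0)
		return -EBUSY;
	victim->present = false;
	return 0;
}
```

Because the request goes through the region rather than through an attribute of the device being removed, the device never has to unregister itself from within its own attribute handler.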

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
f11cf813d device-dax: introduce 'struct dev_dax' typed-driver operations

In preparation for introducing seed devices the dax-bus core needs to be
able to intercept ->probe() and ->remove() operations. Towards that end
arrange for the bus and drivers to switch from raw 'struct device' driver
operations to 'struct dev_dax' typed operations.
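
The switch from raw to typed operations can be modelled with the usual embedded-struct idiom. The sketch below is userspace C with hypothetical toy_* names and a local container_of-style macro; it is meant to show the idea of a bus-level probe that converts the generic device to the typed one before calling the driver, not the actual dax-bus code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device { const char *name; };

struct toy_dev_dax {
	int id;
	struct toy_device dev;	/* embedded generic device */
};

/* Typed driver: its probe receives a dev_dax, not a raw device. */
struct toy_dax_driver {
	int (*probe)(struct toy_dev_dax *dev_dax);
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Bus-level probe: the one place that converts and can intercept the call. */
static int toy_bus_probe(struct toy_device *dev, const struct toy_dax_driver *drv)
{
	struct toy_dev_dax *dev_dax = toy_container_of(dev, struct toy_dev_dax, dev);

	assert(drv && drv->probe);
	return drv->probe(dev_dax);
}

/* Example typed probe that just reports the instance id. */
static int toy_probe_report_id(struct toy_dev_dax *dev_dax)
{
	return dev_dax->id;
}
```

Since every probe passes through toy_bus_probe(), the bus has a natural point at which it could refuse to probe a device, for instance a seed with no capacity.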

Reported-by: Hulk Robot
Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Jason Yan
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
c2f3011ee device-dax: add an allocation interface for device-dax instances

In preparation for a facility that enables dax regions to be sub-divided,
introduce infrastructure to track and allocate region capacity.

The new dax_region/available_size attribute is only enabled for volatile
hmem devices, not pmem devices that are defined by nvdimm namespace
boundaries. This is per Jeff's feedback the last time dynamic device-dax
capacity allocation support was discussed.
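
Capacity tracking of this kind can be sketched with two counters. The model below is a userspace simplification with hypothetical toy_* names and an assumed -ENOSPC error; it ignores where in the region the capacity comes from and only shows how an available-size value follows from the region size and what has been handed out.

```c
#include <assert.h>
#include <errno.h>

struct toy_region {
	unsigned long long size;	/* total capacity of the region */
	unsigned long long allocated;	/* sum handed out to instances */
};

static unsigned long long toy_available_size(const struct toy_region *r)
{
	assert(r->allocated <= r->size);
	return r->size - r->allocated;
}

/* Grant capacity to an instance, refusing requests that exceed what is free. */
static int toy_region_alloc(struct toy_region *r, unsigned long long want)
{
	if (want > toy_available_size(r))
		return -ENOSPC;
	r->allocated += want;
	return 0;
}

/* Return capacity to the region when an instance shrinks or goes away. */
static void toy_region_free(struct toy_region *r, unsigned long long len)
{
	assert(len <= r->allocated);
	r->allocated -= len;
}
```

A region fully consumed by its first instance would report an available size of zero until that instance is shrunk.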

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Brice Goglin
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Hildenbrand
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
0513bd5bb device-dax/kmem: replace release_resource() with release_mem_region()

Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, change
the kmem driver to use the idiomatic release_mem_region() to pair with the
initial request_mem_region(). This also eliminates the need to open code
the release of the resource allocated by request_mem_region().

As there are no more dax_kmem_res users, delete this struct member.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106112239.30709.15909567572288425294.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
7e6b431aa device-dax/kmem: move resource name tracking to drvdata

Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, move
resource name tracking to driver data. The memory for the resource name
needs to have its own lifetime separate from the device bind lifetime for
cases where the driver is unbound, but the kmem range could not be
unplugged from the page allocator.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106111639.30709.17624822766862009183.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
59bc8d10d device-dax/kmem: introduce dax_kmem_range()

Towards removing the mode specific @dax_kmem_res attribute from the
generic 'struct dev_dax', and preparing for multi-range support, teach the
driver to calculate the hotplug range from the device range. The hotplug
range is the trivially calculated memory-block-size aligned version of the
device range.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/160106111109.30709.3173462396758431559.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
f5516ec5e device-dax: make pgmap optional for instance creation

The passed in dev_pagemap is only required in the pmem case as the
libnvdimm core may have reserved a vmem_altmap for devm_memremap_pages() to
place the memmap in pmem directly. In the hmem case there is no agent
reserving an altmap so it can all be handled by a core internal default.

Pass the resource range via a new @range property of 'struct
dev_dax_data'.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Cc: Brice Goglin
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Boris Ostrovsky
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Hulk Robot
Cc: Ingo Molnar
Cc: Jason Gunthorpe
Cc: Jason Yan
Cc: Jeff Moyer
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vivek Goyal
Cc: Wei Yang
Cc: Will Deacon
Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
174ebece3 device-dax: move instance creation parameters to 'struct dev_dax_data'

In preparation for adding more parameters to instance creation, move
existing parameters to a new struct.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
ec8269099 device-dax: drop the dax_region.pfn_flags attribute

All callers specify the same flags to alloc_dax_region(), so there is no
need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
->pfn_flags around on the region. Device-dax instances are always page
backed.

Signed-off-by: Dan Williams
Signed-off-by: Andrew Morton
Cc: Vishal Verma
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
5ccac54f3 ACPI: HMAT: attach a device for each soft-reserved range

The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register
"soft reserved" memory as an "hmem" device") only registered ranges to the
hmem driver for each soft-reservation that also appeared in the HMAT.
While this is meant to encourage platform firmware to "do the right thing"
and publish an HMAT, the corollary is that platforms that fail to publish
an accurate HMAT will strand memory from Linux usage. Additionally, the
"efi_fake_mem" kernel command line option enabling will strand memory by
default without an HMAT.

Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to
be published as raw resource ranges for the hmem driver to consume.

Include a module parameter to disable either this fallback behavior, or
the hmat enabling from creating hmem devices. The module parameter
requires the hmem device enabling to have a unique name in the module
namespace: "device_hmem".

The driver depends on the architecture providing phys_to_target_node()
which is only x86 via numa_meminfo() and arm64 via a generic memblock
implementation.

[joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()]
Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com

Signed-off-by: Dan Williams
Signed-off-by: Joao Martins
Signed-off-by: Andrew Morton
Reviewed-by: Joao Martins
Cc: Jonathan Cameron
Cc: Brice Goglin
Cc: Jeff Moyer
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Andy Lutomirski
Cc: Ard Biesheuvel
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jia He
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:28 +0800
c01044cc8 ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device

In preparation for exposing "Soft Reserved" memory ranges without an HMAT,
move the hmem device registration to its own compilation unit and make the
implementation generic.

The generic implementation drops usage of acpi_map_pxm_to_online_node() that
was translating ACPI proximity domain values and instead relies on
numa_map_to_online_node() to determine the numa node for the device.

[joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y]
Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com

Signed-off-by: Dan Williams
Signed-off-by: Joao Martins
Signed-off-by: Andrew Morton
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Ben Skeggs
Cc: Borislav Petkov
Cc: Brice Goglin
Cc: Catalin Marinas
Cc: Daniel Vetter
Cc: Dave Hansen
Cc: Dave Jiang
Cc: David Airlie
Cc: David Hildenbrand
Cc: Greg Kroah-Hartman
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Jeff Moyer
Cc: Jia He
Cc: Joao Martins
Cc: Jonathan Cameron
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Paul Mackerras
Cc: Pavel Tatashin
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Cc: Tom Lendacky
Cc: Vishal Verma
Cc: Wei Yang
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Ard Biesheuvel
Cc: Bjorn Helgaas
Cc: Boris Ostrovsky
Cc: Hulk Robot
Cc: Jason Yan
Cc: "Jérôme Glisse"
Cc: Juergen Gross
Cc: kernel test robot
Cc: Randy Dunlap
Cc: Stefano Stabellini
Cc: Vivek Goyal
Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds

Dan Williams
2020-10-14 09:38:27 +0800

20 Sep, 2020

2 commits

d4c5da504 dax: Fix stack overflow when mounting fsdax pmem device

When mounting an fsdax pmem device, commit 6180bb446ab6 ("dax: fix
detection of dax support for non-persistent memory block devices")
introduces a stack overflow [1][2]. Here is the call path for
mounting an ext4 file system:
  ext4_fill_super
    bdev_dax_supported
      __bdev_dax_supported
        dax_supported
          generic_fsdax_supported
            __generic_fsdax_supported
              bdev_dax_supported

The call path recurses without bound, so bdev_dax_supported() cannot be
called from __generic_fsdax_supported(). The sanity check of the variable
'dax_dev' is moved before the two bdev_dax_pgoff() checks [3][4].

[1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
[3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/

Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
Reported-by: Yi Zhang
Reported-by: Mikulas Patocka
Signed-off-by: Adrian Huang
Reviewed-by: Jan Kara
Tested-by: Ritesh Harjani
Cc: Coly Li
Cc: Ira Weiny
Cc: John Pittman
Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
Signed-off-by: Dan Williams

Adrian Huang
2020-09-20 23:57:36 +0800
e2ec51282 dm: Call proper helper to determine dax support

DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However, this is a helper for
"leaf" device drivers so that they don't have to duplicate common generic
checks. High level code should call the dax_supported() helper, which
calls into the appropriate helper for the particular device. This problem
manifested itself as kernel messages:

dm-3: error: dax access failed (-95)

when the lvm2-testsuite ran in cases where a DM device was stacked on top
of another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc:
Tested-by: Adrian Huang
Signed-off-by: Jan Kara
Acked-by: Mike Snitzer
Reported-by: kernel test robot
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams

Jan Kara
2020-09-20 23:55:09 +0800

13 Sep, 2020

1 commit

4f8b0a5b3 Merge tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fix from Vishal Verma:
"Fix detection of dax support for block devices.

Previous fixes in this area, which only affected printing of debug
messages, had an incorrect condition for detection of dax. This fix
should finally do the right thing"

* tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix detection of dax support for non-persistent memory block devices

Linus Torvalds
2020-09-13 03:43:58 +0800

10 Sep, 2020

1 commit

1a9d5d405 dax: Modify bdev_dax_pgoff() to handle NULL bdev

virtiofs does not have a block device but it has a dax device.
Modify bdev_dax_pgoff() to be able to handle that.

If there is no bdev, that means dax offset is 0. (It can't be a partition
block device starting at an offset in dax device).
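
The rule can be sketched as a small user-space model. This is an
illustration under stated assumptions, not the kernel function: the type
model_bdev, the function model_dax_pgoff() and the fixed 512-byte sector
and 4 KiB page sizes are all hypothetical. A NULL device contributes a
starting sector of 0; a partition contributes its start sector.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u   /* assumed sector size */
#define MODEL_PAGE_SIZE   4096u  /* assumed page size */

/* Hypothetical stand-in for a block device: only the partition start matters. */
struct model_bdev {
	uint64_t start_sect;
};

/*
 * Translate a sector within 'bdev' into a page offset within the dax
 * device. A NULL 'bdev' means there is no partition, so the offset into
 * the dax device starts at 0. Returns 0 on success, or -1 when the
 * byte offset or the size is not page aligned.
 */
static int model_dax_pgoff(const struct model_bdev *bdev, uint64_t sector,
			   size_t size, uint64_t *pgoff)
{
	uint64_t start_sect = bdev ? bdev->start_sect : 0;
	uint64_t phys_off = (start_sect + sector) * MODEL_SECTOR_SIZE;

	if (pgoff)
		*pgoff = phys_off / MODEL_PAGE_SIZE;
	if (phys_off % MODEL_PAGE_SIZE || size % MODEL_PAGE_SIZE)
		return -1;
	return 0;
}
```

In this model, sector 8 with no block device maps to page offset 1, while
the same sector on a partition starting at sector 2048 maps to page
offset 257.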

This is a little hackish. There have been discussions about getting rid
of dax not supporting partitions.

https://lore.kernel.org/linux-fsdevel/20200107125159.GA15745@infradead.org/

IMHO, this path can easily break existing users. For example
ioctl(BLKPG_ADD_PARTITION) will start breaking on block devices
supporting DAX. Also, I personally find it very useful to be able to
partition dax devices and still be able to use DAX.

Alternatively, I tried to store offset into dax device information in iomap
interface, but that got NACKed.

https://lore.kernel.org/linux-fsdevel/20200217133117.GB20444@infradead.org/

I can't think of a good path to solve this issue properly. So to make
progress, it seems this patch is the least bad option for now and I hope
we can take it.

Signed-off-by: Stefan Hajnoczi
Signed-off-by: Vivek Goyal
Reviewed-by: Jan Kara
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Jan Kara
Cc: Vishal L Verma
Cc: "Weiny, Ira"
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Miklos Szeredi

Vivek Goyal
2020-09-10 17:39:22 +0800

07 Sep, 2020

1 commit

68beef571 Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
"A small series for fixing a problem with Xen PVH guests when running
as backends (e.g. as dom0).

Mapping other guests' memory is now working via ZONE_DEVICE, thus not
requiring to abuse the memory hotplug functionality for that purpose"

* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: add helpers to allocate unpopulated memory
memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
xen/balloon: add header guard

Linus Torvalds
2020-09-07 00:59:27 +0800

04 Sep, 2020

2 commits

4533d3aed memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC

This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also
being used by non DAX devices.

No functional change intended.

Signed-off-by: Roger Pau Monné
Reviewed-by: Ira Weiny
Acked-by: Andrew Morton
Reviewed-by: Pankaj Gupta
Link: https://lore.kernel.org/r/20200901083326.21264-3-roger.pau@citrix.com
Signed-off-by: Juergen Gross

Roger Pau Monne
2020-09-04 15:59:59 +0800
6180bb446 dax: fix detection of dax support for non-persistent memory block devices

When calling __generic_fsdax_supported(), a device that does not support
dax may still have a non-NULL dax_dev, e.g. when the dax related code
block is not enabled by Kconfig.

Therefore in __generic_fsdax_supported(), to check whether a device
supports DAX or not, the following order of operations should be
performed:
- If the dax_dev pointer is NULL, the device driver explicitly
  announces that it doesn't support DAX. It is then OK to directly
  return false from __generic_fsdax_supported().
- If the dax_dev pointer is NOT NULL, the driver might still not support
  DAX and simply not have explicitly initialized the related data
  structure. bdev_dax_supported() should then be called for a further
  check.

If a device driver doesn't explicitly set its dax_dev pointer to NULL,
this is not a bug. Calling bdev_dax_supported() makes sure such devices
are eventually recognized as dax-unsupported.

Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
Cc: Jan Kara
Cc: Vishal Verma
Reviewed-and-tested-by: Adrian Huang
Reviewed-by: Ira Weiny
Reviewed-by: Mike Snitzer
Reviewed-by: Pankaj Gupta
Signed-off-by: Coly Li
Signed-off-by: Vishal Verma
Link: https://lore.kernel.org/r/20200903161625.19524-1-colyli@suse.de

Coly Li
2020-09-04 02:28:03 +0800

21 Aug, 2020

1 commit

c2affe920 dax: do not print error message for non-persistent memory block device

Commit 231609785cbf ("dax: print error message by pr_info()
in __generic_fsdax_supported()") prints the following error messages
during boot when non-persistent memory block devices are configured
by device mapper. The messages are caused by the variable 'dax_dev'
being NULL. Users might be confused by them since they do not use a
persistent memory device. Moreover, users might worry about "what's
wrong with my disks" because they see the 'error' and 'failed' keywords.

# dmesg | grep fail
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.1T 0 disk
├─sda1 8:1 0 156M 0 part
├─sda2 8:2 0 40G 0 part
└─sda3 8:3 0 1.1T 0 part
sdb 8:16 0 1.1T 0 disk
├─sdb1 8:17 0 600M 0 part
├─sdb2 8:18 0 1G 0 part
└─sdb3 8:19 0 1.1T 0 part
├─rhel00-swap 254:3 0 4G 0 lvm
├─rhel00-home 254:4 0 1T 0 lvm
└─rhel00-root 254:5 0 50G 0 lvm
sdc 8:32 0 1.1T 0 disk
sdd 8:48 0 1.1T 0 disk
sde 8:64 0 1.1T 0 disk
sdf 8:80 0 1.1T 0 disk
sdg 8:96 0 1.1T 0 disk
sdh 8:112 0 3.3T 0 disk
├─sdh1 8:113 0 500M 0 part /boot/efi
├─sdh2 8:114 0 40G 0 part /
├─sdh3 8:115 0 2.9T 0 part /home
└─sdh4 8:116 0 314.6G 0 part [SWAP]
sdi 8:128 0 1.1T 0 disk
sdj 8:144 0 3.3T 0 disk
├─sdj1 8:145 0 512M 0 part
└─sdj2 8:146 0 3.3T 0 part
sdk 8:160 0 119.2G 0 disk
├─sdk1 8:161 0 200M 0 part
├─sdk2 8:162 0 1G 0 part
└─sdk3 8:163 0 118G 0 part
├─rhel-swap 254:0 0 4G 0 lvm
├─rhel-home 254:1 0 64G 0 lvm
└─rhel-root 254:2 0 50G 0 lvm
sdl 8:176 0 119.2G 0 disk

The call path is shown as follows:
  dm_table_determine_type
    dm_table_supports_dax
      device_supports_dax
        generic_fsdax_supported
          __generic_fsdax_supported

With the disk configuration listing from the command 'lsblk',
the member 'dev->dax_dev' of the block devices 'sdb3' and 'sdk3'
(configured by device mapper) is NULL in function
generic_fsdax_supported() because the member is configured in
function open_table_device().

To prevent the confusing error messages in this scenario (this is
normal behavior), print them with pr_debug() when dax_dev is NULL
and the block device does not support DAX.

Link: https://lore.kernel.org/r/20200819154236.24191-1-adrianhuang0701@gmail.com
Fixes: 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()")
Cc: Coly Li
Cc: Dan Williams
Cc: Alasdair Kergon
Cc: Mike Snitzer
Acked-by: Coly Li
Signed-off-by: Adrian Huang
Signed-off-by: Vishal Verma

Adrian Huang
2020-08-21 01:43:18 +0800

12 Aug, 2020

1 commit

4bf5e3611 Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Vishal Verma:
"You'd normally receive this pull request from Dan Williams, but he's
busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
this cycle.

This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
and a few small cleanups and fixes in libnvdimm and DAX. I'd
originally intended to make separate topic-based pull requests - one
for libnvdimm, and one for DAX, but some of the DAX material fell out
since it wasn't quite ready.

Summary:

- add 'Runtime Firmware Activation' support for NVDIMMs that
advertise the relevant capability

- misc libnvdimm and DAX cleanups"

* tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
libnvdimm/security: the 'security' attr never show 'overwrite' state
libnvdimm/security: fix a typo
ACPI: NFIT: Fix ARS zero-sized allocation
dax: Fix incorrect argument passed to xas_set_err()
ACPI: NFIT: Add runtime firmware activate support
PM, libnvdimm: Add runtime firmware activation support
libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
drivers/dax: Expand lock scope to cover the use of addresses
fs/dax: Remove unused size parameter
dax: print error message by pr_info() in __generic_fsdax_supported()
driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
tools/testing/nvdimm: Emulate firmware activation commands
tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
tools/testing/nvdimm: Add command debug messages
tools/testing/nvdimm: Cleanup dimm index passing
ACPI: NFIT: Define runtime firmware activation commands
ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
libnvdimm: Validate command family indices

Linus Torvalds
2020-08-12 01:59:19 +0800

29 Jul, 2020

2 commits

eedfd73d4 drivers/dax: Expand lock scope to cover the use of addresses

The addition of PKS protection to dax read lock/unlock will require that
the address returned by dax_direct_access() be protected by this lock.

Correct the locking by ensuring that the use of kaddr and end_kaddr
are covered by the dax read lock/unlock.

Link: https://lore.kernel.org/r/20200717072056.73134-12-ira.weiny@intel.com
Reviewed-by: Dan Williams
Signed-off-by: Ira Weiny
Signed-off-by: Vishal Verma

Ira Weiny
2020-07-29 01:50:08 +0800
231609785 dax: print error message by pr_info() in __generic_fsdax_supported()

In struct dax_operations, the callback routine dax_supported() returns
a bool result. When it returns false, the caller has no idea whether
the device does not support dax at all or whether there is just some
misconfiguration issue.

An example is formatting an Ext4 file system on a pmem device on top of
an NVDIMM namespace with
  # mkfs.ext4 /dev/pmem0
If the fs block size does not match the kernel memory page size (which
is possible on non-x86 platforms), mounting this Ext4 file system fails:
  # mount -o dax /dev/pmem0 /mnt
  mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0,
  missing codepage or helper program, or other error.
The dmesg output contains only the following information:
  [ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device.

The above information is quite confusing, because the pmem0 device
definitely supports dax operation, and the super block is consistent
with how it was created by mkfs.ext4.

The failure actually comes from the following piece of code in
__generic_fsdax_supported():
	if (blocksize != PAGE_SIZE) {
		pr_debug("%s: error: unsupported blocksize for dax\n",
			 bdevname(bdev, buf));
		return false;
	}
This happens because the Ext4 block size is 4KB while the kernel page
size is 8KB or 16KB.
It is not simple to make dax_supported() from struct dax_operations
or __generic_fsdax_supported() return the exact failure type right now.
So the simplest fix is to use pr_info() to print all the error messages
inside __generic_fsdax_supported(). Then users can at least find an
informative clue in the kernel messages.

Messages printed by pr_debug() are easily overlooked by users. This
patch prints the error messages by pr_info() in __generic_fsdax_supported();
when the mount then fails, the following lines can be found in the dmesg
output:
  [ 2705.500885] pmem0: error: unsupported blocksize for dax
  [ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device.
Now users can tell that the mount failure comes from the pmem driver
rejecting an unsupported block size.

Link: https://lore.kernel.org/r/20200725162450.95999-1-colyli@suse.de
Cc: Dan Williams
Cc: Anthony Iliopoulos
Reported-by: Michal Suchanek
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Reviewed-by: Ira Weiny
Reviewed-by: Pankaj Gupta
Signed-off-by: Coly Li
Signed-off-by: Vishal Verma

Coly Li
2020-07-29 01:49:27 +0800

01 Jul, 2020

1 commit

e556f6ba1 block: remove the bd_queue field from struct block_device

Just use bd_disk->queue instead.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-07-01 22:08:20 +0800

05 Jun, 2020

1 commit

8a725e469 device-dax: add memory via add_memory_driver_managed()

Currently, when adding memory, we create entries in /sys/firmware/memmap/
as "System RAM". This leads kexec-tools to add that memory to the
fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The
memory will be considered initial System RAM by the kexec'd kernel and can
no longer be reconfigured. This is not what happens during a real reboot.

Let's add our memory via add_memory_driver_managed() now, so we won't
create entries in /sys/firmware/memmap/ and indicate the memory as "System
RAM (kmem)" in /proc/iomem. This allows everybody (especially
kexec-tools) to identify that this memory is special and has to be treated
differently than ordinary (hotplugged) System RAM.
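
As an illustration of how a tool could tell the two kinds of memory
apart, here is a minimal user-space sketch that classifies a single
/proc/iomem line by its resource name. The helper names are hypothetical
and this is not code from kexec-tools; it only assumes the
"start-end : name" line layout shown in the listings in this message, with
leading indentation and the trailing newline already stripped.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return the resource name of a /proc/iomem line ("start-end : name"),
 * or NULL when the line has no " : " separator.
 */
static const char *iomem_name(const char *line)
{
	const char *sep = strstr(line, " : ");

	return sep ? sep + 3 : NULL;
}

/* Ordinary System RAM: the name is exactly "System RAM". */
static bool is_plain_system_ram(const char *line)
{
	const char *name = iomem_name(line);

	return name && strcmp(name, "System RAM") == 0;
}

/* Driver-managed memory: the name has the form "System RAM (<driver>)". */
static bool is_driver_managed_ram(const char *line)
{
	static const char prefix[] = "System RAM (";
	const char *name = iomem_name(line);
	size_t len;

	if (!name)
		return false;
	len = strlen(name);
	/* Require the prefix, at least one driver character, and a closing ')'. */
	return strncmp(name, prefix, sizeof(prefix) - 1) == 0 &&
	       len > sizeof(prefix) && name[len - 1] == ')';
}
```

With these helpers, "150000000-33fffffff : System RAM (kmem)" is reported
as driver managed, while "150000000-33fffffff : System RAM" is reported as
plain System RAM.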

Before configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-33fffffff : namespace0.0
3280000000-32ffffffff : PCI Bus 0000:00

After configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00

After a proper reboot:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

/sys/firmware/memmap/ before this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)
0000000150000000-0000000340000000 (System RAM)

/sys/firmware/memmap/ after a proper reboot:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)

/sys/firmware/memmap/ after this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)

kexec-tools already seem to basically ignore any System RAM that's not on
top level when searching for areas to place kexec images - but also for
determining crash areas to dump via kdump. Changing the resource name
won't have an impact.

Properly handle unloading of the driver after memory hotremove failed, by
duplicating the string if necessary.

Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Acked-by: Pankaj Gupta
Cc: Michal Hocko
Cc: Pankaj Gupta
Cc: Wei Yang
Cc: Baoquan He
Cc: Dave Hansen
Cc: Eric Biederman
Cc: Pavel Tatashin
Cc: Dan Williams
Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com
Signed-off-by: Linus Torvalds

David Hildenbrand
2020-06-05 10:06:23 +0800

03 Jun, 2020

1 commit

735e4ae5b vfs: track per-sb writeback errors and report them to syncfs

Patch series "vfs: have syncfs() return error when there are writeback
errors", v6.

Currently, syncfs does not return errors when one of the inodes fails to
be written back. It will return errors based on the legacy AS_EIO and
AS_ENOSPC flags when syncing out the block device fails, but that's not
particularly helpful for filesystems that aren't backed by a blockdev.
It's also possible for a stray sync to lose those errors.

The basic idea in this set is to track writeback errors at the
superblock level, so that we can quickly and easily check whether
something bad happened without having to fsync each file individually.
syncfs is then changed to reliably report writeback errors after they
occur, much in the same fashion as fsync does now.

This patch (of 2):

Usually we suggest that applications call fsync when they want to ensure
that all data written to the file has made it to the backing store, but
that can be inefficient when there are a lot of open files.

Calling syncfs on the filesystem can be more efficient in some
situations, but the error reporting doesn't currently work the way most
people expect. If a single inode on a filesystem reports a writeback
error, syncfs won't necessarily return an error. syncfs only returns an
error if __sync_blockdev fails, and on some filesystems that's a no-op.

It would be better if syncfs reported an error if there were any
writeback failures. Then applications could call syncfs to see if there
are any errors on any open files, and could then call fsync on all of
the other descriptors to figure out which one failed.
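
The usage pattern described above could look roughly like the following userspace sketch. The helper name sync_and_find_failures() is hypothetical, and the code assumes that all descriptors live on the same filesystem as fds[0] and that the kernel reports writeback errors through syncfs() as this series intends.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem containing fds[0] with one
 * syncfs() call. Only if that reports an error, fall back to fsync() on
 * every descriptor to find out which ones failed.
 *
 * failed[i] is set to the errno from fsync(fds[i]), or 0 on success.
 * Returns the number of descriptors whose fsync() failed.
 */
size_t sync_and_find_failures(const int *fds, size_t nfds, int *failed)
{
    size_t i, nfailed = 0;

    assert(nfds == 0 || (fds != NULL && failed != NULL));

    for (i = 0; i < nfds; i++)
        failed[i] = 0;

    /* Fast path: a single call covers every file on the filesystem. */
    if (nfds == 0 || syncfs(fds[0]) == 0)
        return 0;

    /* Slow path: something went wrong, so ask each descriptor. */
    for (i = 0; i < nfds; i++) {
        if (fsync(fds[i]) == -1) {
            failed[i] = errno;
            nfailed++;
        }
    }
    return nfailed;
}
```

The point of the design is that the common, error-free case costs one system call instead of one fsync() per open file.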

This patch adds a new errseq_t to struct super_block, and has
mapping_set_error also record writeback errors there.

To report those errors, we also need to keep an errseq_t in struct file
to act as a cursor. This patch adds a dedicated field for that purpose,
which slots nicely into 4 bytes of padding at the end of struct file on
x86_64.
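
The cursor idea can be sketched with a simplified userspace model. This is not the kernel's errseq_t implementation: the real type packs the error and a counter into one 32-bit value and also tracks whether an error has been seen, which this model omits. All names below (errseq_model, sb_model, file_model, syncfs_model) are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for errseq_t: last error plus a change counter. */
struct errseq_model {
    int err;          /* most recent error as a negative errno, 0 if none */
    unsigned int seq; /* incremented every time an error is recorded */
};

/* Record a new error, analogous in spirit to errseq_set(). */
void errseq_model_set(struct errseq_model *e, int err)
{
    assert(err < 0);
    e->err = err;
    e->seq++;
}

/*
 * Report an error recorded since `since` was last updated, then move the
 * cursor forward so the same error is reported only once per cursor.
 */
int errseq_model_check_and_advance(const struct errseq_model *e,
                                   struct errseq_model *since)
{
    if (since->seq == e->seq)
        return 0;
    *since = *e;
    return since->err;
}

/* Models of the two structures that gain a field in this patch. */
struct sb_model {
    struct errseq_model wb_err;    /* per-superblock writeback errors */
};

struct file_model {
    struct sb_model *sb;
    struct errseq_model sb_cursor; /* per-file cursor into sb->wb_err */
};

/* Opening a file samples the current superblock error state. */
void file_model_open(struct file_model *f, struct sb_model *sb)
{
    f->sb = sb;
    f->sb_cursor = sb->wb_err;
}

/* A failed writeback on any inode is also recorded on the superblock. */
void sb_model_writeback_failed(struct sb_model *sb, int err)
{
    errseq_model_set(&sb->wb_err, err);
}

/* syncfs on a file reports errors recorded since that file's cursor. */
int syncfs_model(struct file_model *f)
{
    return errseq_model_check_and_advance(&f->sb->wb_err, &f->sb_cursor);
}
```

Because every open file carries its own cursor, two files opened before a failure each observe the error exactly once, independently of each other.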

An earlier version of this patch used an O_PATH file descriptor to cue
the kernel that the open file should track the superblock error and not
the inode's writeback error.

I think that API is just too weird though. This is simpler and should
make syncfs error reporting "just work" even if someone is multiplexing
fsync and syncfs on the same fds.

Signed-off-by: Jeff Layton
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Cc: Andres Freund
Cc: Matthew Wilcox
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: David Howells
Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org
Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.org
Signed-off-by: Linus Torvalds

Jeff Layton
2020-06-03 01:59:05 +0800