06 Jan, 2021

1 commit

  • [ Upstream commit 6268d7da4d192af339f4d688942b9ccb45a65e04 ]

    There are multiple locations that open-code the release of the last
    range in a device-dax instance. Consolidate this into a new
    dev_dax_trim_range() helper.
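
    The consolidation can be pictured with a small user-space model. This
    is only a sketch: the struct and function names below are hypothetical
    stand-ins, realloc()/free() stand in for krealloc()/kfree(), and freeing
    the array together with the final range is an assumption about how a
    single helper avoids the leak reported below.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct range { unsigned long long start, end; };

struct dev_dax_model {
	struct range *ranges;   /* grown with realloc(), like krealloc() in the trace */
	int nr_range;
};

/* Append one range; returns 0 on success, -1 on allocation failure. */
int model_add_range(struct dev_dax_model *d, struct range r)
{
	struct range *tmp = realloc(d->ranges, sizeof(*tmp) * (d->nr_range + 1));

	if (!tmp)
		return -1;
	d->ranges = tmp;
	d->ranges[d->nr_range++] = r;
	return 0;
}

/* The one helper every caller uses to drop the last range. */
void model_trim_range(struct dev_dax_model *d)
{
	assert(d->nr_range > 0);
	if (--d->nr_range == 0) {
		/* Releasing the array with the final range avoids the leak. */
		free(d->ranges);
		d->ranges = NULL;
	}
}
```

    Because every release path goes through the one helper, no caller can
    forget the final free.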

    This also addresses a kmemleak report:

    # cat /sys/kernel/debug/kmemleak
    [..]
    unreferenced object 0xffff976bd46f6240 (size 64):
    comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7...
    ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8...........
    backtrace:
    [] __kmalloc_track_caller+0x136/0x379
    [] krealloc+0x67/0x92
    [] __alloc_dev_dax_range+0x73/0x25c
    [] devm_create_dev_dax+0x27d/0x416
    [] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
    [] dax_pmem_probe+0x10/0x1f [dax_pmem]
    [] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
    [] really_probe+0x230/0x48d
    [] driver_probe_device+0x122/0x13b
    [] device_driver_attach+0x5b/0x60
    [] bind_store+0xb7/0xc3
    [] drv_attr_store+0x27/0x31
    [] sysfs_kf_write+0x4a/0x57
    [] kernfs_fop_write+0x150/0x1e5
    [] __vfs_write+0x1b/0x34
    [] vfs_write+0xd8/0x1d1

    Reported-by: Jane Chu
    Cc: Zhen Lei
    Link: https://lore.kernel.org/r/160834570161.1791850.14911670304441510419.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Signed-off-by: Sasha Levin

    Dan Williams
     

30 Dec, 2020

1 commit

  • commit 1aa574312518ef1d60d2dc62d58f7021db3b163a upstream.

    When I repeatedly modprobe and rmmod dax.ko, kmemleak reports a
    memory leak as follows:

    unreferenced object 0xffff9a5588c05088 (size 8):
    comm "modprobe", pid 261, jiffies 4294693644 (age 42.063s)
    ...
    backtrace:
    [] kstrdup+0x35/0x70
    [] kstrdup_const+0x3d/0x50
    [] kvasprintf_const+0xbc/0xf0
    [] kobject_set_name_vargs+0x3b/0xd0
    [] kobject_set_name+0x62/0x90
    [] bus_register+0x7f/0x2b0
    [] 0xffffffffc02840f7
    [] 0xffffffffc02840b4
    [] do_one_initcall+0x58/0x240
    [] do_init_module+0x56/0x1e2
    [] load_module+0x2517/0x2840
    [] __do_sys_finit_module+0x9c/0xe0
    [] do_syscall_64+0x33/0x40
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    When rmmod dax is executed, dax_bus_exit() is never called, so the bus
    registered at module init is not unregistered. Add the missing call.

    Fixes: 9567da0b408a ("device-dax: Introduce bus + driver model")
    Cc:
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Link: https://lore.kernel.org/r/20201201135929.66530-1-wanghai38@huawei.com
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Wang Hai
     

23 Nov, 2020

1 commit

  • The core-mm has a default __weak implementation of phys_to_target_node()
    to mirror the weak definition of memory_add_physaddr_to_nid(). That
    symbol is exported for modules. However, while the export in
    mm/memory_hotplug.c exported the symbol in the configuration cases of:

    CONFIG_NUMA_KEEP_MEMINFO=y
    CONFIG_MEMORY_HOTPLUG=y

    ...and:

    CONFIG_NUMA_KEEP_MEMINFO=n
    CONFIG_MEMORY_HOTPLUG=y

    ...it failed to export the symbol in the case of:

    CONFIG_NUMA_KEEP_MEMINFO=y
    CONFIG_MEMORY_HOTPLUG=n

    Not only is that broken, but Christoph points out that the kernel should
    not be exporting any __weak symbol, which means that the
    memory_add_physaddr_to_nid() example that phys_to_target_node() copied
    is broken too.

    Rework the definition of phys_to_target_node() and
    memory_add_physaddr_to_nid() to not require weak symbols. Move to the
    common arch override design-pattern of an asm header defining a symbol
    to replace the default implementation.
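
    The override pattern can be sketched in a single file as follows. This
    is an illustration, not the kernel source: DEMO_ARCH_HAS_TARGET_NODE,
    the one-node-per-4GiB mapping, and the node 0 fallback are all
    assumptions made for the demo.

```c
#include <assert.h>
#include <stdint.h>

/*
 * "Arch header" part. An architecture that knows its own topology
 * defines the function and a same-named macro to claim the symbol.
 */
#ifdef DEMO_ARCH_HAS_TARGET_NODE   /* hypothetical switch for this demo */
static inline int phys_to_target_node(uint64_t start)
{
	return (int)(start >> 32);     /* made-up mapping: one node per 4 GiB */
}
#define phys_to_target_node phys_to_target_node
#endif

/* "Generic header" part: fall back only if no arch claimed the name. */
#ifndef phys_to_target_node
static inline int phys_to_target_node(uint64_t start)
{
	(void)start;
	return 0;                      /* assumed default: node 0 */
}
#endif
```

    Built without the demo switch, phys_to_target_node(0x100000000ULL)
    resolves to the generic fallback and returns 0. No weak symbol is
    involved, so there is nothing weak to export.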

    The only common header that all memory_add_physaddr_to_nid() producing
    architectures implement is asm/sparsemem.h. In fact, powerpc already
    defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
    Double-down on that observation and define phys_to_target_node() where
    necessary in asm/sparsemem.h. An alternate consideration that was
    discarded was to put this override in asm/numa.h, but that entangles
    with the definition of MAX_NUMNODES relative to the inclusion of
    linux/nodemask.h, and requires powerpc to grow a new header.

    The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
    now that the symbol is properly exported / stubbed in all combinations
    of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

    [dan.j.williams@intel.com: v4]
    Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
    [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
    Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com

    Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
    Reported-by: Randy Dunlap
    Reported-by: Thomas Gleixner
    Reported-by: kernel test robot
    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Tested-by: Randy Dunlap
    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Christoph Hellwig
    Cc: Joao Martins
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Vishal Verma
    Cc: Stephen Rothwell
    Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

20 Oct, 2020

1 commit

  • Pull fuse updates from Miklos Szeredi:

    - Support directly accessing host page cache from virtiofs. This can
    improve I/O performance for various workloads, as well as reducing
    the memory requirement by eliminating double caching. Thanks to Vivek
    Goyal for doing most of the work on this.

    - Allow automatic submounting inside virtiofs. This allows unique
    st_dev/st_ino values to be assigned inside the guest to files
    residing on different filesystems on the host. Thanks to Max Reitz
    for the patches.

    - Fix an old use after free bug found by Pradeep P V K.

    * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
    virtiofs: calculate number of scatter-gather elements accurately
    fuse: connection remove fix
    fuse: implement crossmounts
    fuse: Allow fuse_fill_super_common() for submounts
    fuse: split fuse_mount off of fuse_conn
    fuse: drop fuse_conn parameter where possible
    fuse: store fuse_conn in fuse_req
    fuse: add submount support to
    fuse: fix page dereference after free
    virtiofs: add logic to free up a memory range
    virtiofs: maintain a list of busy elements
    virtiofs: serialize truncate/punch_hole and dax fault path
    virtiofs: define dax address space operations
    virtiofs: add DAX mmap support
    virtiofs: implement dax read/write operations
    virtiofs: introduce setupmapping/removemapping commands
    virtiofs: implement FUSE_INIT map_alignment field
    virtiofs: keep a list of free dax memory ranges
    virtiofs: add a mount option to enable dax
    virtiofs: set up virtio_fs dax_device
    ...

    Linus Torvalds
     

17 Oct, 2020

2 commits

  • We soon want to pass flags, e.g., to mark added System RAM resources
    mergeable. Prepare for that.

    This patch is based on a similar patch by Oscar Salvador:

    https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross # Xen related part
    Reviewed-by: Pankaj Gupta
    Acked-by: Wei Liu
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Greg Kroah-Hartman
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: "Oliver O'Halloran"
    Cc: Pingfan Liu
    Cc: Nathan Lynch
    Cc: Libor Pechacek
    Cc: Anton Blanchard
    Cc: Leonardo Bras
    Cc: Ard Biesheuvel
    Cc: Eric Biederman
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Roger Pau Monné
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • The conversion to request_mem_region() is broken because it assumes that
    the range is marked busy prior to release. However, due to the way that
    the kmem driver manipulates the IORESOURCE_BUSY flag (clears it to let
    {add,remove}_memory() handle busy) it requires a manual release_resource()
    to perform cleanup.

    Given that the actual 'struct resource *' needs to be recalled, not just
    the range, add that tracking to the kmem driver-data.

    Fixes: 0513bd5bb114 ("device-dax/kmem: replace release_resource() with release_mem_region()")
    Reported-by: David Hildenbrand
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Link: https://lkml.kernel.org/r/160272252925.3136502.17220638073995895400.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

14 Oct, 2020

20 commits

  • Add a sysfs attribute which denotes a range from the dax region to be
    allocated. It is a write-only @mapping sysfs attribute in the format of
    '<start>-<end>' to allocate a range. @start and @end use hexadecimal
    values and the @pgoff is implicitly ordered with respect to previous
    writes to @mapping, e.g. after a write of a range of length 1G the pgoff
    is 0..1G(-4K), and a second write will use @pgoff for 1G+4K...
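
    A user-space sketch of parsing such a write, assuming the input is two
    hexadecimal values separated by '-'. parse_mapping() is a hypothetical
    name, and strtoull() stands in for whatever the kernel side uses.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

struct range { unsigned long long start, end; };

/*
 * Parse "<start>-<end>" where both values are hexadecimal.
 * Returns 0 on success, -EINVAL on malformed input.
 */
int parse_mapping(const char *buf, struct range *out)
{
	char *sep, *tail;
	unsigned long long start, end;

	if (!isxdigit((unsigned char)buf[0]))
		return -EINVAL;
	errno = 0;
	start = strtoull(buf, &sep, 16);
	if (errno || *sep != '-' || !isxdigit((unsigned char)sep[1]))
		return -EINVAL;

	end = strtoull(sep + 1, &tail, 16);
	if (errno)
		return -EINVAL;
	if (*tail == '\n')             /* tolerate the newline that echo appends */
		tail++;
	if (*tail != '\0' || end < start)
		return -EINVAL;

	out->start = start;
	out->end = end;
	return 0;
}
```

    For instance, "100000000-13fffffff" parses to an inclusive span of
    exactly 1G, while "2000-1000" is rejected because the end precedes the
    start.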

    This range mapping interface is useful for:

    1) Applications which want to implement their own allocation logic, and
    thus pick the desired ranges from dax_region.

    2) Use cases like VMM fast restart[0] where after kexec we want to
    re-create the same gpa<->phys mappings (as originally created before kexec).

    [0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
  • Introduce a new module parameter for dax_hmem which initializes all region
    devices as free, rather than allocating a pagemap for the region by
    default.

    All hmem devices created with dax_hmem.region_idle=1 will have full
    available size for creating dynamic dax devices.

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
  • Introduce a device align attribute. While doing so, rename the region
    align attribute so that its name says explicitly that it belongs to the
    region, but keep it exposed as @align to retain the API for tools like
    daxctl.

    A change to align may not always be valid, for example when certain
    mappings were created with 2M and then we switch to 1G. So, we validate
    all ranges against the new value being attempted, post resizing.
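
    The validation step can be sketched in user space as below. The names
    are hypothetical, and checking both the start and the length of each
    range is an assumption about what "validate" covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range { unsigned long long start, end; };   /* end is inclusive */

static inline unsigned long long range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* True when every range starts on, and spans a multiple of, @align. */
bool ranges_fit_align(const struct range *ranges, size_t nr,
		      unsigned long long align)
{
	/* @align is assumed to be a power of two (e.g. 4K, 2M, 1G). */
	assert(align && (align & (align - 1)) == 0);

	for (size_t i = 0; i < nr; i++) {
		if (ranges[i].start & (align - 1))
			return false;
		if (range_len(&ranges[i]) & (align - 1))
			return false;
	}
	return true;
}
```

    With a 2M range and a 4M range allocated, a request for 2M alignment
    passes this check and a request for 1G alignment is refused.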

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Introduce @align to struct dev_dax.

    When creating a new device, we still initialize to the default dax_region
    @align. Child devices belonging to a region may wish to keep a different
    alignment property instead of a global region-defined one.

    Signed-off-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com
    Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Joao Martins
     
  • In support of interrogating the physical address layout of a device with
    dis-contiguous ranges, introduce a sysfs directory with 'start', 'end',
    and 'page_offset' attributes. The alternative is trying to parse
    /proc/iomem, and that file will not reflect the extent layout until the
    device is enabled.
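
    To show how 'start', 'end', and 'page_offset' fit together, here is a
    user-space sketch of translating a page offset into a physical address
    across dis-contiguous ranges. The struct, the function name, and the
    4K page size are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12   /* assumes 4K pages */

struct dax_range_model {
	unsigned long long pgoff;        /* first page offset served by this range */
	unsigned long long start, end;   /* inclusive physical span */
};

/* Returns the physical address for @pgoff, or all ones if unmapped. */
unsigned long long pgoff_to_phys(const struct dax_range_model *ranges,
				 size_t nr, unsigned long long pgoff)
{
	for (size_t i = 0; i < nr; i++) {
		const struct dax_range_model *r = &ranges[i];
		unsigned long long pages = (r->end - r->start + 1) >> DEMO_PAGE_SHIFT;

		if (pgoff < r->pgoff || pgoff >= r->pgoff + pages)
			continue;
		return r->start + ((pgoff - r->pgoff) << DEMO_PAGE_SHIFT);
	}
	return ~0ULL;
}
```

    Each range records where its pages begin in the device's linear offset
    space, so a lookup only needs the three values per range.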

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Joao Martins
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Break the requirement that device-dax instances are physically contiguous.
    Removing this constraint allows fragmented available capacity to be
    fully allocated.

    This capability is useful to mitigate the "noisy neighbor" problem with
    memory-side-cache management for virtual machines, or any other scenario
    where a platform address boundary also designates a performance boundary.
    For example a direct mapped memory side cache might rotate cache colors at
    1GB boundaries. With dis-contiguous allocations a device-dax instance
    could be configured to contain only 1 cache color.

    It also satisfies Joao's use case (see link) for partitioning memory for
    exclusive guest access. It allows for a future potential mode where the
    host kernel need not allocate 'struct page' capacity up-front.

    Reported-by: Joao Martins
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Jonathan Cameron
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
    Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In support of device-dax growing the ability to front physically
    dis-contiguous ranges of memory, update devm_memremap_pages() to track
    multiple ranges with a single reference counter and devm instance.

    Convert all [devm_]memremap_pages() users to specify the number of ranges
    they are mapping in their 'struct dev_pagemap' instance.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The 'struct resource' in 'struct dev_pagemap' is only used for holding
    resource span information. The other fields, 'name', 'flags', 'desc',
    'parent', 'sibling', and 'child' are all unused wasted space.

    This is in preparation for introducing a multi-range extension of
    devm_memremap_pages().

    The bulk of this change is unwinding all the places internal to libnvdimm
    that used 'struct resource' unnecessarily, and replacing instances of
    'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
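
    A sketch of the span-only type this change moves to. The 'struct range'
    layout and the inclusive-end length helper follow the usual kernel
    convention as far as I know; pagemap_model is a hypothetical, cut-down
    stand-in for 'struct dev_pagemap'.

```c
#include <assert.h>
#include <stdint.h>

/* Span-only type: just the two fields the pagemap needs. */
struct range {
	uint64_t start;
	uint64_t end;      /* inclusive */
};

static inline uint64_t range_len(const struct range *range)
{
	return range->end - range->start + 1;
}

/* Hypothetical pagemap that carries a range instead of a resource. */
struct pagemap_model {
	struct range range;
};
```

    A pagemap_model whose range is { 0x1000, 0x1fff } has a range_len() of
    0x1000, with none of the name, flags, or tree-linkage fields carried
    along.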

    P2PDMA had a minor usage of the resource flags field, but only to report
    failures with "%pR". That is replaced with an open coded print of the
    range.

    [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
    Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

    Signed-off-by: Dan Williams
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Reviewed-by: Boris Ostrovsky [xen]
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Vishal Verma
    Cc: Vivek Goyal
    Cc: Dave Jiang
    Cc: Ben Skeggs
    Cc: David Airlie
    Cc: Daniel Vetter
    Cc: Ira Weiny
    Cc: Bjorn Helgaas
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: "Jérôme Glisse"
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: kernel test robot
    Cc: Mike Rapoport
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Make the device-dax 'size' attribute writable to allow capacity to be
    split between multiple instances in a region. The intended consumers of
    this capability are users that want to split a scarce memory resource
    between device-dax and System-RAM access, or users that want to have
    multiple security domains for a large region.

    By default the hmem instance provider allocates an entire region to the
    first instance. The process of creating a new instance (assuming a
    region-id of 0) is to find the region and trigger the 'create' attribute
    which yields an empty instance to configure. For example:

    cd /sys/bus/dax/devices
    echo dax0.0 > dax0.0/driver/unbind
    echo $new_size > dax0.0/size
    echo 1 > $(readlink -f dax0.0)../dax_region/create
    seed=$(cat $(readlink -f dax0.0)../dax_region/seed)
    echo $new_size > $seed/size
    echo dax0.0 > ../drivers/{device_dax,kmem}/bind
    echo dax0.1 > ../drivers/{device_dax,kmem}/bind

    Instances can be destroyed by:

    echo $device > $(readlink -f $device)../dax_region/delete

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106115239.30709.9850106928133493138.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Add a seed device concept for dynamic dax regions to be able to split the
    region amongst multiple sub-instances. The seed device, similar to
    libnvdimm seed devices, is a device that starts with zero capacity
    allocated and unbound to a driver. In contrast to libnvdimm seed devices
    explicit 'create' and 'delete' interfaces are added to the region to
    trigger seeds to be created and unused devices to be reclaimed. The
    explicit create and delete replaces implicit create as a side effect of
    probe and implicit delete when writing 0 to the size that libnvdimm
    implements.

    Delete can be performed on any 0-sized and idle device. This avoids the
    gymnastics of needing to move device_unregister() to its own async
    context. Specifically, it avoids the deadlock of deleting a device via
    one of its own attributes. It is also less surprising to userspace which
    never sees an extra device it did not request.

    For now just add the device creation, teardown, and ->probe() prevention.
    A later patch will arrange for the 'dax/size' attribute to be writable to
    allocate capacity from the region.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for introducing seed devices the dax-bus core needs to be
    able to intercept ->probe() and ->remove() operations. Towards that end
    arrange for the bus and drivers to switch from raw 'struct device' driver
    operations to 'struct dev_dax' typed operations.

    Reported-by: Hulk Robot
    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Jason Yan
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for a facility that enables dax regions to be sub-divided,
    introduce infrastructure to track and allocate region capacity.

    The new dax_region/available_size attribute is only enabled for volatile
    hmem devices, not pmem devices that are defined by nvdimm namespace
    boundaries. This is per Jeff's feedback the last time dynamic device-dax
    capacity allocation support was discussed.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Brice Goglin
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Hildenbrand
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com
    Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, change
    the kmem driver to use the idiomatic release_mem_region() to pair with the
    initial request_mem_region(). This also eliminates the need to open code
    the release of the resource allocated by request_mem_region().

    As there are no more dax_kmem_res users, delete this struct member.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106112239.30709.15909567572288425294.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, move
    resource name tracking to driver data. The memory for the resource name
    needs to have its own lifetime separate from the device bind lifetime for
    cases where the driver is unbound, but the kmem range could not be
    unplugged from the page allocator.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106111639.30709.17624822766862009183.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Towards removing the mode specific @dax_kmem_res attribute from the
    generic 'struct dev_dax', and preparing for multi-range support, teach the
    driver to calculate the hotplug range from the device range. The hotplug
    range is the trivially calculated memory-block-size aligned version of the
    device range.
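
    The alignment described above can be sketched in userspace as follows.
    This is an illustration, not the driver's code: hotplug_range() is a
    hypothetical name, the block size is passed in explicitly (the kernel
    would query it), and it must be a power of two.

```c
#include <stdint.h>

/* Address range with an inclusive end. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Shrink 'dev' to the largest sub-range made of whole memory blocks:
 * round the start up and the (exclusive) end down to a block boundary.
 * 'block' must be a power of two. Returns 0 and fills '*out', or -1 when
 * not even one whole block fits. Ranges ending at UINT64_MAX are not
 * handled by this sketch.
 */
static int hotplug_range(const struct range *dev, uint64_t block,
			 struct range *out)
{
	uint64_t start = (dev->start + block - 1) & ~(block - 1);
	uint64_t end_excl = (dev->end + 1) & ~(block - 1);

	if (end_excl <= start)
		return -1;

	out->start = start;
	out->end = end_excl - 1;
	return 0;
}
```

    With an assumed 128 MiB block size (0x8000000), a device range of
    0x148200000-0x33fffffff yields a hotplug range of
    0x150000000-0x33fffffff: the unaligned head is dropped and the already
    aligned tail is kept.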

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/160106111109.30709.3173462396758431559.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The passed in dev_pagemap is only required in the pmem case as the
    libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
    place the memmap in pmem directly. In the hmem case there is no agent
    reserving an altmap so it can all be handled by a core internal default.

    Pass the resource range via a new @range property of 'struct
    dev_dax_data'.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: David Hildenbrand
    Cc: Vishal Verma
    Cc: Dave Hansen
    Cc: Pavel Tatashin
    Cc: Brice Goglin
    Cc: Dave Jiang
    Cc: Ira Weiny
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Boris Ostrovsky
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Hulk Robot
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Jason Yan
    Cc: Jeff Moyer
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vivek Goyal
    Cc: Wei Yang
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for adding more parameters to instance creation, move
    existing parameters to a new struct.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • All callers specify the same flags to alloc_dax_region(), so there is no
    need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
    ->pfn_flags around on the region. Device-dax instances are always page
    backed.

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Cc: Vishal Verma
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register
    "soft reserved" memory as an "hmem" device") only registered ranges to the
    hmem driver for each soft-reservation that also appeared in the HMAT.
    While this is meant to encourage platform firmware to "do the right thing"
    and publish an HMAT, the corollary is that platforms that fail to publish
    an accurate HMAT will strand memory from Linux usage. Additionally, the
    "efi_fake_mem" kernel command line option enabling will strand memory by
    default without an HMAT.

    Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to
    be published as raw resource ranges for the hmem driver to consume.

    Include a module parameter to disable either this fallback behavior, or
    the hmat enabling from creating hmem devices. The module parameter
    requires the hmem device enabling to have a unique name in the module
    namespace: "device_hmem".

    The driver depends on the architecture providing phys_to_target_node()
    which is only x86 via numa_meminfo() and arm64 via a generic memblock
    implementation.

    [joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()]
    Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com

    Signed-off-by: Dan Williams
    Signed-off-by: Joao Martins
    Signed-off-by: Andrew Morton
    Reviewed-by: Joao Martins
    Cc: Jonathan Cameron
    Cc: Brice Goglin
    Cc: Jeff Moyer
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Andy Lutomirski
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jia He
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • In preparation for exposing "Soft Reserved" memory ranges without an HMAT,
    move the hmem device registration to its own compilation unit and make the
    implementation generic.

    The generic implementation drops usage of acpi_map_pxm_to_online_node(),
    which was translating ACPI proximity domain values, and instead relies on
    numa_map_to_online_node() to determine the numa node for the device.

    [joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y]
    Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com

    Signed-off-by: Dan Williams
    Signed-off-by: Joao Martins
    Signed-off-by: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Ben Skeggs
    Cc: Borislav Petkov
    Cc: Brice Goglin
    Cc: Catalin Marinas
    Cc: Daniel Vetter
    Cc: Dave Hansen
    Cc: Dave Jiang
    Cc: David Airlie
    Cc: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Ira Weiny
    Cc: Jason Gunthorpe
    Cc: Jeff Moyer
    Cc: Jia He
    Cc: Joao Martins
    Cc: Jonathan Cameron
    Cc: Michael Ellerman
    Cc: Mike Rapoport
    Cc: Paul Mackerras
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Vishal Verma
    Cc: Wei Yang
    Cc: Will Deacon
    Cc: Ard Biesheuvel
    Cc: Bjorn Helgaas
    Cc: Boris Ostrovsky
    Cc: Hulk Robot
    Cc: Jason Yan
    Cc: "Jérôme Glisse"
    Cc: Juergen Gross
    Cc: kernel test robot
    Cc: Randy Dunlap
    Cc: Stefano Stabellini
    Cc: Vivek Goyal
    Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com
    Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Linus Torvalds

    Dan Williams
     

20 Sep, 2020

2 commits

  • When mounting an fsdax pmem device, commit 6180bb446ab6 ("dax: fix
    detection of dax support for non-persistent memory block devices")
    introduces a stack overflow [1][2]. Here is the call path for
    mounting an ext4 file system:
    ext4_fill_super
      bdev_dax_supported
        __bdev_dax_supported
          dax_supported
            generic_fsdax_supported
              __generic_fsdax_supported
                bdev_dax_supported

    The call path leads to an infinite call loop, so we cannot
    call bdev_dax_supported() in __generic_fsdax_supported(). The sanity
    check of the variable 'dax_dev' is moved ahead of the two
    bdev_dax_pgoff() checks [3][4].

    [1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
    [2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
    [3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
    [4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/

    Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
    Reported-by: Yi Zhang
    Reported-by: Mikulas Patocka
    Signed-off-by: Adrian Huang
    Reviewed-by: Jan Kara
    Tested-by: Ritesh Harjani
    Cc: Coly Li
    Cc: Ira Weiny
    Cc: John Pittman
    Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
    Signed-off-by: Dan Williams

    Adrian Huang
     
  • DM was calling generic_fsdax_supported() to determine whether a device
    referenced in the DM table supports DAX. However, this is a helper for
    "leaf" device drivers so that they don't have to duplicate common generic
    checks. High-level code should call the dax_supported() helper, which
    calls into the appropriate helper for the particular device. This problem
    manifested itself as kernel messages:

    dm-3: error: dax access failed (-95)

    when lvm2-testsuite ran in cases where a DM device was stacked on top of
    another DM device.
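
    The difference between the leaf helper and the top-level dispatch can be
    modelled in userspace as below. This is only a sketch: every type and
    function name here (dax_dev_model, model_dax_supported(), and so on) is
    hypothetical, and 'leaf_ok' stands in for the outcome of the generic
    leaf checks.

```c
#include <stdbool.h>
#include <stddef.h>

struct dax_dev_model;

/* Per-device operations, loosely in the spirit of struct dax_operations. */
struct dax_ops_model {
	bool (*dax_supported)(const struct dax_dev_model *dev);
};

struct dax_dev_model {
	const struct dax_ops_model *ops;
	const struct dax_dev_model *const *targets; /* underlying devices */
	size_t nr_targets;                          /* 0 for leaf devices */
	bool leaf_ok;                               /* leaf check outcome */
};

/* Top-level entry point: always dispatch through the device's own ops. */
static bool model_dax_supported(const struct dax_dev_model *dev)
{
	if (!dev || !dev->ops || !dev->ops->dax_supported)
		return false;
	return dev->ops->dax_supported(dev);
}

/* Leaf helper: only meaningful for a device that owns the memory. */
static bool leaf_dax_supported(const struct dax_dev_model *dev)
{
	return dev->leaf_ok;
}

/*
 * Stacked helper: ask each underlying device through the top-level entry
 * point, so a stacked device sitting on another stacked device still works.
 */
static bool stacked_dax_supported(const struct dax_dev_model *dev)
{
	if (dev->nr_targets == 0)
		return false;
	for (size_t i = 0; i < dev->nr_targets; i++)
		if (!model_dax_supported(dev->targets[i]))
			return false;
	return true;
}

static const struct dax_ops_model leaf_ops = {
	.dax_supported = leaf_dax_supported,
};

static const struct dax_ops_model stacked_ops = {
	.dax_supported = stacked_dax_supported,
};
```

    Had stacked_dax_supported() called the leaf helper on its targets
    directly, a stacked device layered on another stacked device would be
    judged by checks that were never meant for it, which is the failure mode
    described above.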

    Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
    Tested-by: Adrian Huang
    Signed-off-by: Jan Kara
    Acked-by: Mike Snitzer
    Reported-by: kernel test robot
    Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Jan Kara
     

13 Sep, 2020

1 commit


10 Sep, 2020

1 commit

  • virtiofs does not have a block device, but it has a dax device.
    Modify bdev_dax_pgoff() to be able to handle that.

    If there is no bdev, that means dax offset is 0. (It can't be a partition
    block device starting at an offset in dax device).
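
    A userspace model of the resulting offset calculation might look like
    the sketch below. It is not the kernel function: the model_ prefix, the
    block_device_model type, and the fixed 4 KiB page size are assumptions
    made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u  /* assumed page size for this example */
#define MODEL_SECTOR_SIZE 512u

/* Hypothetical stand-in: only the partition start sector matters here. */
struct block_device_model {
	uint64_t start_sect;
};

/*
 * Translate a sector into a page offset within the dax device. With no
 * block device (bdev == NULL, the virtiofs case) there is no partition,
 * so the start offset is taken to be 0.
 */
static int model_bdev_dax_pgoff(const struct block_device_model *bdev,
				uint64_t sector, size_t size, uint64_t *pgoff)
{
	uint64_t start_sect = bdev ? bdev->start_sect : 0;
	uint64_t phys_off = (start_sect + sector) * MODEL_SECTOR_SIZE;

	if (pgoff)
		*pgoff = phys_off / MODEL_PAGE_SIZE;
	/* Both the offset and the length must be page aligned. */
	if (phys_off % MODEL_PAGE_SIZE || size % MODEL_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

    Passing NULL for the block device simply removes the partition offset
    from the sum; the alignment checks are unchanged.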

    This is a little hackish. There have been discussions about getting rid
    of dax not supporting partitions.

    https://lore.kernel.org/linux-fsdevel/20200107125159.GA15745@infradead.org/

    IMHO, this path can easily break existing users. For example
    ioctl(BLKPG_ADD_PARTITION) will start breaking on block devices
    supporting DAX. Also, I personally find it very useful to be able to
    partition dax devices and still be able to use DAX.

    Alternatively, I tried to store offset into dax device information in iomap
    interface, but that got NACKed.

    https://lore.kernel.org/linux-fsdevel/20200217133117.GB20444@infradead.org/

    I can't think of a good path to solve this issue properly. So to make
    progress, it seems this patch is the least bad option for now and I hope
    we can take it.

    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Vivek Goyal
    Reviewed-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Vishal L Verma
    Cc: "Weiny, Ira"
    Cc: linux-nvdimm@lists.01.org
    Signed-off-by: Miklos Szeredi

    Vivek Goyal
     

07 Sep, 2020

1 commit

  • Pull xen updates from Juergen Gross:
    "A small series for fixing a problem with Xen PVH guests when running
    as backends (e.g. as dom0).

    Mapping other guests' memory is now working via ZONE_DEVICE, thus not
    requiring to abuse the memory hotplug functionality for that purpose"

    * tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen: add helpers to allocate unpopulated memory
    memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
    xen/balloon: add header guard

    Linus Torvalds
     

04 Sep, 2020

2 commits

  • This is in preparation for the logic behind MEMORY_DEVICE_DEVDAX also
    being used by non DAX devices.

    No functional change intended.

    Signed-off-by: Roger Pau Monné
    Reviewed-by: Ira Weiny
    Acked-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Link: https://lore.kernel.org/r/20200901083326.21264-3-roger.pau@citrix.com
    Signed-off-by: Juergen Gross

    Roger Pau Monne
     
  • When calling __generic_fsdax_supported(), a dax-unsupported device may
    not have dax_dev as NULL, e.g. when the dax related code block is not
    enabled by Kconfig.

    Therefore in __generic_fsdax_supported(), to check whether a device
    supports DAX or not, the following order of operations should be
    performed:
    - If the dax_dev pointer is NULL, it means the device driver explicitly
      announces that it doesn't support DAX. Then it is OK to directly return
      false from __generic_fsdax_supported().
    - If the dax_dev pointer is NOT NULL, it might be because the driver
      doesn't support DAX and did not explicitly initialize the related data
      structure. Then bdev_dax_supported() should be called for a further
      check.

    If a device driver doesn't explicitly set its dax_dev pointer to NULL,
    this is not a bug. Calling bdev_dax_supported() makes sure such devices
    are eventually recognized as dax-unsupported.

    Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
    Cc: Jan Kara
    Cc: Vishal Verma
    Reviewed-and-tested-by: Adrian Huang
    Reviewed-by: Ira Weiny
    Reviewed-by: Mike Snitzer
    Reviewed-by: Pankaj Gupta
    Signed-off-by: Coly Li
    Signed-off-by: Vishal Verma
    Link: https://lore.kernel.org/r/20200903161625.19524-1-colyli@suse.de

    Coly Li
     

21 Aug, 2020

1 commit

  • Commit 231609785cbf ("dax: print error message by pr_info()
    in __generic_fsdax_supported()") happens to print the following
    error messages during boot when non-persistent memory block
    devices are configured by device mapper. Those error messages are
    caused by the variable 'dax_dev' being NULL. Users might be confused
    by those error messages since they do not use a persistent
    memory device. Moreover, users might worry about "what's wrong
    with my disks" because they see the 'error' and 'failed' keywords.

    # dmesg | grep fail
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdk3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)
    sdb3: error: dax access failed (-95)

    # lsblk
    NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda               8:0    0   1.1T  0 disk
    ├─sda1            8:1    0   156M  0 part
    ├─sda2            8:2    0    40G  0 part
    └─sda3            8:3    0   1.1T  0 part
    sdb               8:16   0   1.1T  0 disk
    ├─sdb1            8:17   0   600M  0 part
    ├─sdb2            8:18   0     1G  0 part
    └─sdb3            8:19   0   1.1T  0 part
      ├─rhel00-swap 254:3    0     4G  0 lvm
      ├─rhel00-home 254:4    0     1T  0 lvm
      └─rhel00-root 254:5    0    50G  0 lvm
    sdc               8:32   0   1.1T  0 disk
    sdd               8:48   0   1.1T  0 disk
    sde               8:64   0   1.1T  0 disk
    sdf               8:80   0   1.1T  0 disk
    sdg               8:96   0   1.1T  0 disk
    sdh               8:112  0   3.3T  0 disk
    ├─sdh1            8:113  0   500M  0 part /boot/efi
    ├─sdh2            8:114  0    40G  0 part /
    ├─sdh3            8:115  0   2.9T  0 part /home
    └─sdh4            8:116  0 314.6G  0 part [SWAP]
    sdi               8:128  0   1.1T  0 disk
    sdj               8:144  0   3.3T  0 disk
    ├─sdj1            8:145  0   512M  0 part
    └─sdj2            8:146  0   3.3T  0 part
    sdk               8:160  0 119.2G  0 disk
    ├─sdk1            8:161  0   200M  0 part
    ├─sdk2            8:162  0     1G  0 part
    └─sdk3            8:163  0   118G  0 part
      ├─rhel-swap   254:0    0     4G  0 lvm
      ├─rhel-home   254:1    0    64G  0 lvm
      └─rhel-root   254:2    0    50G  0 lvm
    sdl               8:176  0 119.2G  0 disk

    The call path is shown as follows:
    dm_table_determine_type
      dm_table_supports_dax
        device_supports_dax
          generic_fsdax_supported
            __generic_fsdax_supported

    With the disk configuration listing from the command 'lsblk',
    the member 'dev->dax_dev' of the block devices 'sdb3' and 'sdk3'
    (configured by device mapper) is NULL in function
    generic_fsdax_supported() because the member is configured in
    function open_table_device().

    To prevent the confusing error messages in this scenario (this is
    normal behavior), print those error messages with pr_debug()
    when dax_dev is NULL and the block device does not support
    DAX.

    Link: https://lore.kernel.org/r/20200819154236.24191-1-adrianhuang0701@gmail.com
    Fixes: 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()")
    Cc: Coly Li
    Cc: Dan Williams
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Acked-by: Coly Li
    Signed-off-by: Adrian Huang
    Signed-off-by: Vishal Verma

    Adrian Huang
     

12 Aug, 2020

1 commit

  • Pull libnvdimm updates from Vishal Verma:
    "You'd normally receive this pull request from Dan Williams, but he's
    busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
    this cycle.

    This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
    and a few small cleanups and fixes in libnvdimm and DAX. I'd
    originally intended to make separate topic-based pull requests - one
    for libnvdimm, and one for DAX, but some of the DAX material fell out
    since it wasn't quite ready.

    Summary:

    - add 'Runtime Firmware Activation' support for NVDIMMs that
    advertise the relevant capability

    - misc libnvdimm and DAX cleanups"

    * tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
    libnvdimm/security: the 'security' attr never show 'overwrite' state
    libnvdimm/security: fix a typo
    ACPI: NFIT: Fix ARS zero-sized allocation
    dax: Fix incorrect argument passed to xas_set_err()
    ACPI: NFIT: Add runtime firmware activate support
    PM, libnvdimm: Add runtime firmware activation support
    libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
    drivers/dax: Expand lock scope to cover the use of addresses
    fs/dax: Remove unused size parameter
    dax: print error message by pr_info() in __generic_fsdax_supported()
    driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
    tools/testing/nvdimm: Emulate firmware activation commands
    tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
    tools/testing/nvdimm: Add command debug messages
    tools/testing/nvdimm: Cleanup dimm index passing
    ACPI: NFIT: Define runtime firmware activation commands
    ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
    libnvdimm: Validate command family indices

    Linus Torvalds
     

29 Jul, 2020

2 commits

  • The addition of PKS protection to dax read lock/unlock will require that
    the address returned by dax_direct_access() be protected by this lock.

    Correct the locking by ensuring that the use of kaddr and end_kaddr
    are covered by the dax read lock/unlock.

    Link: https://lore.kernel.org/r/20200717072056.73134-12-ira.weiny@intel.com
    Reviewed-by: Dan Williams
    Signed-off-by: Ira Weiny
    Signed-off-by: Vishal Verma

    Ira Weiny
     
  • In struct dax_operations, the callback routine dax_supported() returns
    a bool result. For a false return value, the caller has no idea
    whether the device does not support dax at all, or whether it is just
    a misconfiguration issue.

    An example is formatting an Ext4 file system on a pmem device on top of
    an NVDIMM namespace with
    # mkfs.ext4 /dev/pmem0
    If the fs block size does not match the kernel memory page size (which
    is possible on non-x86 platforms), mounting this Ext4 file system will fail,
    # mount -o dax /dev/pmem0 /mnt
    mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0,
    missing codepage or helper program, or other error.
    And from the dmesg output there is only the following information,
    [ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device.

    The above information is quite confusing, because the pmem0 device
    definitely supports dax operation, and the super block is consistent
    with how it was created by mkfs.ext4.

    Indeed, the failure comes from the following piece of code in
    __generic_fsdax_supported(),
        if (blocksize != PAGE_SIZE) {
                pr_debug("%s: error: unsupported blocksize for dax\n",
                         bdevname(bdev, buf));
                return false;
        }
    It is because the Ext4 block size is 4KB and kernel page size is 8KB or
    16KB.

    It is not simple to make dax_supported() from struct dax_operations
    or __generic_fsdax_supported() return the exact failure type right now.
    So the simplest fix is to use pr_info() to print all the error messages
    inside __generic_fsdax_supported(). Then users may at least find an
    informative clue in the kernel messages.

    Messages printed by pr_debug() are very easy for users to miss. This
    patch prints error messages with pr_info() in __generic_fsdax_supported(),
    so when the mount fails, the following lines can be found in the dmesg output,
    [ 2705.500885] pmem0: error: unsupported blocksize for dax
    [ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device.
    Now users can tell that the mount failure comes from the pmem driver
    rejecting an unsupported block size.

    Link: https://lore.kernel.org/r/20200725162450.95999-1-colyli@suse.de
    Cc: Dan Williams
    Cc: Anthony Iliopoulos
    Reported-by: Michal Suchanek
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Reviewed-by: Ira Weiny
    Reviewed-by: Pankaj Gupta
    Signed-off-by: Coly Li
    Signed-off-by: Vishal Verma

    Coly Li
     

01 Jul, 2020

1 commit


05 Jun, 2020

1 commit

  • Currently, when adding memory, we create entries in /sys/firmware/memmap/
    as "System RAM". This leads kexec-tools to add that memory to the
    fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The
    memory will be considered initial System RAM by the kexec'd kernel and can
    no longer be reconfigured. This is not what happens during a real reboot.

    Let's add our memory via add_memory_driver_managed() now, so we won't
    create entries in /sys/firmware/memmap/ and indicate the memory as "System
    RAM (kmem)" in /proc/iomem. This allows everybody (especially
    kexec-tools) to identify that this memory is special and has to be treated
    differently than ordinary (hotplugged) System RAM.

    Before configuring the namespace:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-33fffffff : namespace0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    After configuring the namespace:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    After loading kmem before this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM
    3280000000-32ffffffff : PCI Bus 0000:00

    After loading kmem after this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM (kmem)
    3280000000-32ffffffff : PCI Bus 0000:00

    After a proper reboot:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    Within the kexec kernel before this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : System RAM
    3280000000-32ffffffff : PCI Bus 0000:00

    Within the kexec kernel after this change:
    [root@localhost ~]# cat /proc/iomem
    ...
    140000000-33fffffff : Persistent Memory
      140000000-1481fffff : namespace0.0
      148200000-33fffffff : dax0.0
    3280000000-32ffffffff : PCI Bus 0000:00

    /sys/firmware/memmap/ before this change:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)
    0000000150000000-0000000340000000 (System RAM)

    /sys/firmware/memmap/ after a proper reboot:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)

    /sys/firmware/memmap/ after this change:
    0000000000000000-000000000009fc00 (System RAM)
    000000000009fc00-00000000000a0000 (Reserved)
    00000000000f0000-0000000000100000 (Reserved)
    0000000000100000-00000000bffdf000 (System RAM)
    00000000bffdf000-00000000c0000000 (Reserved)
    00000000feffc000-00000000ff000000 (Reserved)
    00000000fffc0000-0000000100000000 (Reserved)
    0000000100000000-0000000140000000 (System RAM)
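
    A listing like the ones above can be gathered from userspace. The
    following is a minimal sketch, assuming the usual sysfs layout of one
    numbered directory per entry with "start", "end" and "type" files. The
    function names are hypothetical and not taken from kexec-tools.

```python
from pathlib import Path


def read_firmware_memmap(root="/sys/firmware/memmap"):
    """Return sorted (start, end, type) tuples for every entry under root."""
    entries = []
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue
        # Each entry directory is assumed to hold hex "start"/"end" values
        # (for example "0x100000000") and a free-form "type" string.
        start = int((entry / "start").read_text().strip(), 16)
        end = int((entry / "end").read_text().strip(), 16)
        kind = (entry / "type").read_text().strip()
        entries.append((start, end, kind))
    return sorted(entries)


def system_ram_ranges(entries):
    """Keep only the ranges the firmware map reports as System RAM."""
    return [(start, end) for start, end, kind in entries if kind == "System RAM"]
```

    With this change applied, memory added by kmem should no longer show up
    in the result of system_ram_ranges(read_firmware_memmap()).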

    kexec-tools already seems to ignore any System RAM that is not at the
    top level, both when searching for areas to place kexec images and when
    determining crash areas to dump via kdump. Changing the resource name
    therefore won't have an impact.
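
    To illustrate the distinction, here is a rough sketch of how a tool
    could separate top-level System RAM from driver-managed memory in
    /proc/iomem output. It assumes nesting is expressed as two spaces of
    indentation per level and that driver-managed ranges are named
    "System RAM (<driver>)". The sample input and helper names are made up
    for the example and this is not how kexec-tools is implemented.

```python
import re

# "<indent><start>-<end> : <name>", addresses in hex without a 0x prefix.
_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

# Illustrative input only; the first line is not from a real machine.
SAMPLE = """\
100000000-13fffffff : System RAM
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM (kmem)
"""


def parse_iomem(text):
    """Yield (depth, start, end, name) for each resource line in text."""
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip "..." and anything else that is not a resource
        indent, start, end, name = match.groups()
        yield len(indent) // 2, int(start, 16), int(end, 16), name.strip()


def top_level_system_ram(text):
    """Ranges a tool that only looks at the top level would consider."""
    return [(s, e) for depth, s, e, name in parse_iomem(text)
            if depth == 0 and name == "System RAM"]


def driver_managed_ram(text):
    """Ranges whose name marks them as driver managed, at any depth."""
    return [(s, e, name) for _, s, e, name in parse_iomem(text)
            if name.startswith("System RAM (")]
```

    On the sample, top_level_system_ram() returns only the 100000000 range,
    while driver_managed_ram() picks out the nested kmem range.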

    Properly handle unloading of the driver after a failed memory hotremove
    by duplicating the string if necessary.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

03 Jun, 2020

1 commit

  • Patch series "vfs: have syncfs() return error when there are writeback
    errors", v6.

    Currently, syncfs does not return errors when one of the inodes fails to
    be written back. It will return errors based on the legacy AS_EIO and
    AS_ENOSPC flags when syncing out the block device fails, but that's not
    particularly helpful for filesystems that aren't backed by a blockdev.
    It's also possible for a stray sync to lose those errors.

    The basic idea in this set is to track writeback errors at the
    superblock level, so that we can quickly and easily check whether
    something bad happened without having to fsync each file individually.
    syncfs is then changed to reliably report writeback errors after they
    occur, much in the same fashion as fsync does now.

    This patch (of 2):

    Usually we suggest that applications call fsync when they want to ensure
    that all data written to the file has made it to the backing store, but
    that can be inefficient when there are a lot of open files.

    Calling syncfs on the filesystem can be more efficient in some
    situations, but the error reporting doesn't currently work the way most
    people expect. If a single inode on a filesystem reports a writeback
    error, syncfs won't necessarily return an error. syncfs only returns an
    error if __sync_blockdev fails, and on some filesystems that's a no-op.

    It would be better if syncfs reported an error if there were any
    writeback failures. Then applications could call syncfs to see if there
    are any errors on any open files, and could then call fsync on all of
    the other descriptors to figure out which one failed.
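
    The usage pattern described here might look roughly like the sketch
    below. It assumes a Linux libc that exports syncfs() and that all
    descriptors live on the same filesystem. The helper names and the
    injectable callables are hypothetical, added only to keep the example
    testable.

```python
import ctypes
import os


def syncfs(fd):
    """Call syncfs(2) through libc and raise OSError on failure (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.syncfs(fd) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def find_failed_files(fds, syncfs_fn=syncfs, fsync_fn=os.fsync):
    """Return the descriptors whose fsync fails, or [] if syncfs succeeds.

    Assumes every descriptor in fds is on the same filesystem as fds[0].
    """
    if not fds:
        return []
    try:
        # One cheap call to learn whether anything went wrong at all.
        syncfs_fn(fds[0])
        return []
    except OSError:
        pass
    failed = []
    for fd in fds:
        # Only after syncfs reported a problem, probe each file individually.
        try:
            fsync_fn(fd)
        except OSError:
            failed.append(fd)
    return failed
```

    In the common case where nothing failed, this costs a single syncfs
    call instead of one fsync per open file.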

    This patch adds a new errseq_t to struct super_block, and has
    mapping_set_error also record writeback errors there.

    To report those errors, we also need to keep an errseq_t in struct file
    to act as a cursor. This patch adds a dedicated field for that purpose,
    which slots nicely into 4 bytes of padding at the end of struct file on
    x86_64.
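
    The cursor idea can be modelled in a few lines of userspace code. This
    is a toy sketch of errseq_t-style semantics, not the kernel
    implementation: the bit layout, the class names and the field names
    "wb_err" and "sb_err_cursor" are assumptions made for illustration.

```python
# Toy userspace model of an errseq_t-style error cursor.
# The bit layout and all names here are illustrative assumptions.
ERR_MASK = 0xFFF   # low bits: most recent error code
SEEN = 0x1000      # set once some cursor has observed the current value
CTR_INC = 0x2000   # the counter occupies the bits above the flag


def errseq_set(value, err):
    """Return the new sequence value after recording error code err."""
    new = (value & ~(ERR_MASK | SEEN)) | err
    if value & SEEN:
        # Only bump the counter if somebody looked at the old value.
        new += CTR_INC
    return new


def errseq_sample(value):
    """Initial cursor for a newly opened file."""
    # An error nobody has seen yet must still reach the new opener,
    # so in that case the cursor starts from 0 instead of the current value.
    return value if value & SEEN else 0


def errseq_check_and_advance(value, cursor):
    """Return (value, cursor, err); err is 0 if nothing new happened."""
    if value == cursor:
        return value, cursor, 0
    value |= SEEN
    return value, value, value & ERR_MASK


class SuperBlock:
    def __init__(self):
        self.wb_err = 0  # hypothetical per-superblock error sequence

    def record_writeback_error(self, err):
        self.wb_err = errseq_set(self.wb_err, err)


class OpenFile:
    def __init__(self, sb):
        self.sb = sb
        # Hypothetical per-file cursor into the superblock's sequence.
        self.sb_err_cursor = errseq_sample(sb.wb_err)

    def syncfs(self):
        """Return a positive error code once per new error, else 0."""
        sb = self.sb
        sb.wb_err, self.sb_err_cursor, err = errseq_check_and_advance(
            sb.wb_err, self.sb_err_cursor)
        return err
```

    In this model every file that was open when the error was recorded
    sees it exactly once, while a file opened after the error has already
    been observed starts with a clean slate.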

    An earlier version of this patch used an O_PATH file descriptor to cue
    the kernel that the open file should track the superblock error and not
    the inode's writeback error.

    I think that API is just too weird though. This is simpler and should
    make syncfs error reporting "just work" even if someone is multiplexing
    fsync and syncfs on the same fds.

    Signed-off-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Cc: Andres Freund
    Cc: Matthew Wilcox
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: David Howells
    Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org
    Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.org
    Signed-off-by: Linus Torvalds

    Jeff Layton