11 Apr, 2020

26 commits

  • After request_module(), nothing is stopping the module from being
    unloaded until someone takes a reference to it via try_get_module().

    The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
    running 'rmmod' concurrently.

    Since WARN_ONCE() is for kernel bugs only, not for user-reachable
    situations, downgrade this warning to pr_warn_once().

    Keep it printed once only, since the intent of this warning is to detect
    a bug in modprobe at boot time. Printing the warning more than once
    wouldn't really provide any useful extra information.

    Fixes: 41124db869b7 ("fs: warn in case userspace lied about modprobe return")
    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Reviewed-by: Jessica Yu
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Jessica Yu
    Cc: Kees Cook
    Cc: Luis Chamberlain
    Cc: NeilBrown
    Cc: [4.13+]
    Link: http://lkml.kernel.org/r/20200312202552.241885-3-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
     
  • Patch series "module autoloading fixes and cleanups", v5.

    This series fixes a bug where request_module() was reporting success to
    kernel code when module autoloading had been completely disabled via
    'echo > /proc/sys/kernel/modprobe'.

    It also addresses the issues raised on the original thread
    (https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
    by documenting the modprobe sysctl, adding a self-test for the empty path
    case, and downgrading a user-reachable WARN_ONCE().

    This patch (of 4):

    It's long been possible to disable kernel module autoloading completely
    (while still allowing manual module insertion) by setting
    /proc/sys/kernel/modprobe to the empty string.

    This can be preferable to setting it to a nonexistent file since it
    avoids the overhead of an attempted execve(), avoids potential
    deadlocks, and avoids the call to security_kernel_module_request() and
    thus on SELinux-based systems eliminates the need to write SELinux rules
    to dontaudit module_request.

    However, when module autoloading is disabled in this way,
    request_module() returns 0. This is broken because callers expect 0 to
    mean that the module was successfully loaded.

    Apparently this was never noticed because this method of disabling
    module autoloading isn't used much, and also most callers don't use the
    return value of request_module() since it's always necessary to check
    whether the module registered its functionality or not anyway.

    But improperly returning 0 can indeed confuse a few callers, for example
    get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

        if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
            fs = __get_fs_type(name, len);
            WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
        }

    This is easily reproduced with:

    echo > /proc/sys/kernel/modprobe
    mount -t NONEXISTENT none /

    It causes:

    request_module fs-NONEXISTENT succeeded, but still no fs?
    WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
    [...]

    This should actually use pr_warn_once() rather than WARN_ONCE(), since
    it's also user-reachable if userspace immediately unloads the module.
    Regardless, request_module() should correctly return an error when it
    fails. So let's make it return -ENOENT, which matches the error when
    the modprobe binary doesn't exist.

    I've also sent patches to document and test this case.

    Signed-off-by: Eric Biggers
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Reviewed-by: Jessica Yu
    Acked-by: Luis Chamberlain
    Cc: Alexei Starovoitov
    Cc: Greg Kroah-Hartman
    Cc: Jeff Vander Stoep
    Cc: Ben Hutchings
    Cc: Josh Triplett
    Cc:
    Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
    Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Biggers
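
The return-value contract described in this series can be modelled outside the kernel. The following is a minimal userspace sketch, not the code in kernel/kmod.c or fs/filesystems.c: model_request_module(), model_get_fs_type(), fs_registered and the plain modprobe_path array are hypothetical stand-ins. It shows an empty modprobe path yielding -ENOENT (the same error as a missing modprobe binary), and a get_fs_type()-style caller that trusts only the registration check and prints a once-only message instead of using WARN_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the /proc/sys/kernel/modprobe value. */
static char modprobe_path[64] = "/sbin/modprobe";

/* Hypothetical stand-in for "has the filesystem type registered itself?". */
static bool fs_registered;

/*
 * Model of the fixed behaviour: an empty path means autoloading is disabled,
 * so report -ENOENT instead of 0 (which callers read as "module loaded").
 */
int model_request_module(const char *name)
{
	(void)name;
	if (modprobe_path[0] == '\0')
		return -ENOENT;
	fs_registered = true;	/* pretend modprobe ran and the module registered */
	return 0;
}

/* Model of the get_fs_type() pattern: only registration counts, warn once. */
bool model_get_fs_type(const char *name)
{
	static bool warned;
	bool fs = fs_registered;

	if (!fs && model_request_module(name) == 0) {
		fs = fs_registered;
		if (!fs && !warned) {
			/* pr_warn_once() analogue: a log message, not a WARN splat */
			fprintf(stderr, "request_module fs-%s succeeded, but still no fs?\n", name);
			warned = true;
		}
	}
	return fs;
}
```

The caller still re-checks registration after a successful load because, as the first commit above notes, userspace can unload the module concurrently.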
     
  • PCI BAR IO memory should never be mapped as WB; however, prior to this
    patch the PAT bits were set to WB and this was typically overridden by
    MTRR registers set by the firmware.

    Set PCI P2PDMA memory to be UC, as this is what it typically ends up
    being mapped as on x86 today, after the MTRR registers override the
    cache setting.

    Future use-cases may need to generalize this by adding flags to select
    the caching type, as some P2PDMA cases may not want UC. However, those
    use-cases are not upstream yet and this can be changed when they arrive.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Christoph Hellwig
    Cc: Jason Gunthorpe
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • devm_memremap_pages() is currently used by the PCI P2PDMA code to create
    struct page mappings for IO memory. At present, these mappings are
    created with PAGE_KERNEL which implies setting the PAT bits to be WB.
    However, on x86, an MTRR register will typically override this and force
    the cache type to be UC-. If the firmware doesn't set this register, the
    mapping is effectively WB and will typically result in a machine check
    exception when it's accessed.

    Other arches are not currently likely to function correctly, since they
    don't have any MTRR registers to fall back on.

    To solve this, provide a way to specify the pgprot value explicitly to
    arch_add_memory().

    Of the arches that support MEMORY_HOTPLUG, x86_64 and arm64 need a
    simple change to pass the pgprot_t down to their respective functions
    which set up the page tables. For x86_32, set the page tables
    explicitly using _set_memory_prot() (since they are already mapped).

    For ia64, s390 and sh, reject anything but PAGE_KERNEL settings. This
    should be fine for now, since these architectures don't support
    ZONE_DEVICE.

    A check in __add_pages() is also added to ensure the pgprot parameter
    was set for all arches.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Dan Williams
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
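
A rough userspace model of how the caching mode travels through the hotplug path after this change. All names here (model_mhp_params, model_add_pages(), model_arch_add_memory_flexible(), model_arch_add_memory_restricted()) and the numeric pgprot values are hypothetical placeholders. Only the behaviour follows the description: arches that can thread pgprot through accept any value, ia64/s390/sh-style arches reject anything but PAGE_KERNEL with -EINVAL, and an unset pgprot is refused at the __add_pages() stage (returning -EINVAL there is an assumption of this sketch).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a pgprot is just an opaque value here. */
typedef struct { unsigned long pgprot; } pgprot_t;
#define MODEL_PAGE_KERNEL    ((pgprot_t){ 1 })	/* placeholder, not a real PTE encoding */
#define MODEL_PAGE_KERNEL_UC ((pgprot_t){ 2 })	/* placeholder, not a real PTE encoding */

/* Extended hotplug parameters, now carrying the caching mode explicitly. */
struct model_mhp_params {
	pgprot_t pgprot;
};

static bool pgprot_equal(pgprot_t a, pgprot_t b)
{
	return a.pgprot == b.pgprot;
}

/* Model of the __add_pages() check: refuse callers that left pgprot unset. */
int model_add_pages(const struct model_mhp_params *params)
{
	if (params->pgprot.pgprot == 0)
		return -EINVAL;
	return 0;
}

/* Arch that threads pgprot into its page-table setup (x86_64/arm64-like). */
int model_arch_add_memory_flexible(const struct model_mhp_params *params)
{
	return model_add_pages(params);
}

/* Arch that only supports PAGE_KERNEL mappings (ia64/s390/sh-like). */
int model_arch_add_memory_restricted(const struct model_mhp_params *params)
{
	if (!pgprot_equal(params->pgprot, MODEL_PAGE_KERNEL))
		return -EINVAL;
	return model_add_pages(params);
}
```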
     
  • In preparation to support a pgprot_t argument for arch_add_memory().

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michal Hocko
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • For use in the 32-bit arch_add_memory() to set the pgprot type of the
    memory to add.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • In preparation to support a pgprot_t argument for arch_add_memory().

    The prototype of init_memory_mapping() has to be moved, since its
    original location came before the definition of pgprot_t.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: David Hildenbrand
    Cc: Eric Badger
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • The mhp_restrictions struct really doesn't specify anything resembling a
    restriction anymore so rename it to be mhp_params as it is a list of
    extended parameters.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christoph Hellwig
    Cc: Dave Hansen
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • Patch series "Allow setting caching mode in arch_add_memory() for
    P2PDMA", v4.

    Currently, the page tables created using memremap_pages() are always
    created with the PAGE_KERNEL caching mode. However, the P2PDMA code is
    creating pages for PCI BAR memory which should never be accessed through
    the cache and should instead use either WC or UC. This still works in
    most cases on x86, because the MTRR registers typically override the
    caching settings in the page tables for all of the IO memory to be UC-.
    However, this tends not to work so well on other arches or some rare x86
    machines that have firmware which does not set up the MTRR registers in
    this way.

    Instead of this, this series proposes a change to arch_add_memory() to
    take the pgprot required by the mapping which allows us to explicitly
    set pagetable entries for P2PDMA memory to UC.

    This change is pretty routine for most of the arches: x86_64, arm64 and
    powerpc simply need to thread the pgprot through to where the page
    tables are set up. x86_32 unfortunately sets up the page tables at boot
    so must use _set_memory_prot() to change their caching mode. ia64, s390
    and sh don't appear to have an easy way to change the page tables so,
    for now at least, we just return -EINVAL on such mappings and thus they
    will not support P2PDMA memory until the work for this is done. This
    should be fine as they don't yet support ZONE_DEVICE.

    This patch (of 7):

    This variable is not used anywhere and should therefore be removed from
    the structure.

    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Dan Williams
    Acked-by: Michal Hocko
    Cc: Christoph Hellwig
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Eric Badger
    Cc: "H. Peter Anvin"
    Cc: Jason Gunthorpe
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Link: http://lkml.kernel.org/r/20200306170846.9333-2-logang@deltatee.com
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • Currently there are many platforms that don't enable ARCH_HAS_PTE_SPECIAL
    but are required to define quite similar fallback stubs for special page
    table entry helpers such as pte_special() and pte_mkspecial(), as they
    get built in generic MM without a config check. This creates two
    generic fallback stub definitions for these helpers, eliminating much
    code duplication.

    The mips platform has a special case where pte_special() and
    pte_mkspecial() visibility is wider than what ARCH_HAS_PTE_SPECIAL
    enablement requires. This restricts the visibility of those symbols in
    order to avoid the redefinitions (and resulting build failure) that the
    new generic stubs would otherwise expose. The arm platform's
    set_pte_at() definition also needs to be moved into a C file just to
    prevent a build failure.

    [anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
    Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Acked-by: Guo Ren [csky]
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Stafford Horne [openrisc]
    Acked-by: Helge Deller [parisc]
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Russell King
    Cc: Brian Cain
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Sam Creasey
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Nick Hu
    Cc: Greentime Hu
    Cc: Vincent Chen
    Cc: Ley Foon Tan
    Cc: Jonas Bonn
    Cc: Stefan Kristiansson
    Cc: "James E.J. Bottomley"
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Anton Ivanov
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Thomas Bogendoerfer
    Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
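
The generic fallback stubs can be sketched as follows. This assumes a minimal pte_t wrapper; the guard uses the CONFIG_ARCH_HAS_PTE_SPECIAL symbol that Kconfig generates for ARCH_HAS_PTE_SPECIAL, but treat the exact form and location of the real stubs as unverified here.

```c
#include <assert.h>

/* Hypothetical minimal pte_t; real arches define their own layout. */
typedef struct { unsigned long pte; } pte_t;

/* Uncomment to model an arch that selects ARCH_HAS_PTE_SPECIAL. */
/* #define CONFIG_ARCH_HAS_PTE_SPECIAL 1 */

#ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
/* Generic fallback: without the feature, no PTE is ever "special". */
static inline int pte_special(pte_t pte)
{
	(void)pte;
	return 0;
}

/* Generic fallback: marking a PTE special is a no-op. */
static inline pte_t pte_mkspecial(pte_t pte)
{
	return pte;
}
#endif
```

An arch that selects ARCH_HAS_PTE_SPECIAL supplies its own helpers and the #ifndef block drops out, which is why a wider-than-necessary definition (as on mips) would otherwise collide with these stubs.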
     
  • There are many places where all basic VMA access flags (read, write,
    exec) are initialized or checked against as a group. One such example
    is during page fault handling. The existing vma_is_accessible() wrapper
    already creates the notion of VMA accessibility as a group of access
    permissions.

    Hence let's just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
    will not only reduce code duplication but also extend the VMA
    accessibility concept in general.

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Mark Salter
    Cc: Nick Hu
    Cc: Ley Foon Tan
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Guan Xuetao
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Rob Springer
    Cc: Greg Kroah-Hartman
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
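
A small sketch of the grouped mask. The flag values mirror the low VMA flag bits used in include/linux/mm.h; struct model_vma and model_vma_is_accessible() are hypothetical trimmed-down stand-ins for struct vm_area_struct and vma_is_accessible().

```c
#include <assert.h>
#include <stdbool.h>

/* Low VMA flag bits (same values as the kernel's VM_* flags). */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL
#define VM_SHARED 0x00000008UL

/* The grouped mask introduced by this commit. */
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

/* Hypothetical trimmed-down VMA carrying only its flags. */
struct model_vma {
	unsigned long vm_flags;
};

/* Any one of read/write/exec makes the VMA accessible. */
static inline bool model_vma_is_accessible(const struct model_vma *vma)
{
	return vma->vm_flags & VM_ACCESS_FLAGS;
}
```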
     
  • There are many platforms with the exact same value for VM_DATA_DEFAULT_FLAGS.
    This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
    existing VM_STACK_DEFAULT_FLAGS. While here, also define some more
    macros with standard VMA access flag combinations that are used
    frequently across many platforms. Apart from simplification, this
    reduces code duplication as well.

    Signed-off-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Acked-by: Geert Uytterhoeven
    Cc: Richard Henderson
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Mark Salter
    Cc: Guo Ren
    Cc: Yoshinori Sato
    Cc: Brian Cain
    Cc: Tony Luck
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: Nick Hu
    Cc: Ley Foon Tan
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Michael Ellerman
    Cc: Paul Walmsley
    Cc: Heiko Carstens
    Cc: Rich Felker
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: Thomas Gleixner
    Cc: Jeff Dike
    Cc: Chris Zankel
    Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Add the ability to insert multiple pages at once to a user VM with fewer
    PTE spinlock operations.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hit the same spinlock multiple times
    consecutively.

    [akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
    [arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
    Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
    [arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
    Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
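
To illustrate why batching helps, the sketch below counts PTE lock acquisitions for a per-page loop versus a batch that holds the lock once per page table touched. It assumes 4 KiB pages and 512 entries per page table (x86-64 values); the lock helpers, the counter and both insert functions are hypothetical, and the real vm_insert_pages() also handles allocation, refcounts and error paths that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT   12	/* 4 KiB pages, assumed */
#define MODEL_PTRS_PER_PTE 512	/* entries per page table, x86-64-like */

static unsigned long lock_acquisitions;

static void model_pte_lock(void)   { lock_acquisitions++; }
static void model_pte_unlock(void) { }

/* Fallback: one lock/unlock round trip per page (vm_insert_page() in a loop). */
void model_insert_pages_loop(unsigned long addr, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		model_pte_lock();
		/* ... set one PTE for addr + (i << MODEL_PAGE_SHIFT) ... */
		model_pte_unlock();
	}
	(void)addr;
}

/* Batched: take the lock once per page table that the range touches. */
void model_insert_pages_batched(unsigned long addr, size_t nr)
{
	size_t done = 0;

	while (done < nr) {
		unsigned long idx = (addr >> MODEL_PAGE_SHIFT) & (MODEL_PTRS_PER_PTE - 1);
		size_t room = MODEL_PTRS_PER_PTE - idx;	/* entries left in this table */
		size_t batch = (nr - done < room) ? nr - done : room;

		model_pte_lock();
		/* ... set `batch` consecutive PTEs under a single lock hold ... */
		model_pte_unlock();

		done += batch;
		addr += (unsigned long)batch << MODEL_PAGE_SHIFT;
	}
}
```

Sixteen consecutive pages inside one page table cost one lock round trip instead of sixteen; a range that straddles a page-table boundary costs one per table.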
     
  • pte_index() is either defined as a macro (e.g. sparc64) or as an
    inlined function (e.g. x86). vm_insert_pages() depends on pte_index
    but it is not defined on all platforms (e.g. m68k).

    To fix compilation of vm_insert_pages() on architectures not providing
    pte_index(), we perform the following fix:

    0. For platforms where it is meaningful, and defined as a macro, no
    change is needed.
    1. For platforms where it is meaningful and defined as an inlined
    function, and we want to use it with vm_insert_pages(), we define
    a degenerate macro of the form: #define pte_index pte_index
    2. vm_insert_pages() checks for the existence of a pte_index macro
    definition. If found, it implements a batched insert. If not found,
    it devolves to calling vm_insert_page() in a loop.

    This patch implements step 1 for x86.

    v3 of this patch fixes a compilation warning for an unused method.
    v2 of this patch moved a macro definition to a more readable location.

    Signed-off-by: Arjun Roy
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Eric Dumazet
    Cc: Jason Gunthorpe
    Cc: Matthew Wilcox
    Cc: Soheil Hassas Yeganeh
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200228054714.204424-1-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
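
The detection trick in steps 1 and 2 relies only on the C preprocessor. A minimal sketch, assuming x86-64-like constants (PAGE_SHIFT 12, 512 PTEs per table) and a hypothetical MODEL_INSERT_PAGES_BATCHED switch standing in for the real check in generic MM code:

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT   12
#define MODEL_PTRS_PER_PTE 512UL

/* Arch header: pte_index() as an inline function (x86-style)... */
static inline unsigned long pte_index(unsigned long address)
{
	return (address >> MODEL_PAGE_SHIFT) & (MODEL_PTRS_PER_PTE - 1);
}
/* ...plus the degenerate macro so generic code can detect it (step 1). */
#define pte_index pte_index

/* Generic code (step 2): pick the batched path only if pte_index exists. */
#ifdef pte_index
#define MODEL_INSERT_PAGES_BATCHED 1
#else
#define MODEL_INSERT_PAGES_BATCHED 0	/* fall back to vm_insert_page() in a loop */
#endif
```

Because a self-referential macro is not expanded recursively, `#define pte_index pte_index` leaves calls to the inline function intact while still making `#ifdef pte_index` true.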
     
  • pte_index() on platforms other than sparc returns a numerical index. On
    sparc, it returns a pte_t*. This presents an issue for
    vm_insert_pages(), which relies on pte_index() to find the offset for a
    pte within a pmd, for batched inserts.

    This patch:
    1. Modifies pte_index() for sparc to return a numerical index, like
    other platforms,
    2. Defines pte_entry() for sparc which returns a pte_t*
    (as pte_index() used to),
    3. Converts existing sparc callers for pte_index() to use pte_entry().

    [sfr@canb.auug.org.au: remove pte_entry and just directly modified pte_offset_kernel instead]
    Signed-off-by: Arjun Roy
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Cc: Eric Dumazet
    Cc: Soheil Hassas Yeganeh
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Arjun Roy
    Cc: Jason Gunthorpe
    Link: http://lkml.kernel.org/r/20200227105045.6b421d9f@canb.auug.org.au
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • Add helper methods for vm_insert_page()/insert_page() to prepare for
    vm_insert_pages(), which batch-inserts pages to reduce spinlock
    operations when inserting multiple consecutive pages into the user page
    table.

    The intention of this patch-set is to reduce atomic ops for tcp zerocopy
    receives, which normally hit the same spinlock multiple times
    consecutively.

    Signed-off-by: Arjun Roy
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Jason Gunthorpe
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • When passing their requirements to vm_unmapped_area(),
    arch_get_unmapped_area() and arch_get_unmapped_area_topdown() did not set
    align_offset. Internally, in both unmapped_area() and
    unmapped_area_topdown(), info->align_offset was meaningless if
    info->align_mask was 0.

    But commit df529cabb7a2 ("mm: mmap: add trace point of
    vm_unmapped_area") always prints info->align_offset even though it is
    uninitialized.

    Fix this uninitialized value issue by setting it to 0 explicitly.

    Before:
    vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022

    After:
    vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0

    Signed-off-by: Jaewon Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox (Oracle)
    Cc: Michel Lespinasse
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds

    Jaewon Kim
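
Why a zero align_mask makes align_offset irrelevant can be seen in the alignment arithmetic. This sketch is modelled on the bottom-up alignment step in mm/mmap.c, but struct model_unmapped_area_info and model_align_gap_start() are hypothetical trimmed-down versions. The computed address does not depend on align_offset when the mask is 0, while a trace point that prints the field still exposes whatever uninitialized value it holds.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trimmed-down copy of the alignment request fields. */
struct model_unmapped_area_info {
	unsigned long length;
	unsigned long align_mask;
	unsigned long align_offset;
};

/*
 * Round gap_start up so that (result & align_mask) == (align_offset & align_mask).
 * With align_mask == 0 the added term is always 0, so align_offset is ignored.
 */
unsigned long model_align_gap_start(unsigned long gap_start,
				    const struct model_unmapped_area_info *info)
{
	return gap_start + ((info->align_offset - gap_start) & info->align_mask);
}
```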
     
  • Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation
    at runtime") has added the run-time allocation of gigantic pages.

    However, in practice it works only at the early stages of system boot,
    when the majority of memory is free. After some time the memory gets
    fragmented by non-movable pages, so the chances of finding a contiguous
    1GB block get close to zero. Even dropping caches manually doesn't
    help a lot.

    At large scale, rebooting servers in order to allocate gigantic hugepages
    is quite expensive and complex. At the same time keeping some constant
    percentage of memory in reserved hugepages even if the workload isn't
    using it is a big waste: not all workloads can benefit from using 1 GB
    pages.

    The following solution can solve the problem:
    1) At boot time, a dedicated CMA area* is reserved. The size is passed
    as a kernel argument.
    2) Run-time allocations of gigantic hugepages are performed using the
    CMA allocator and the dedicated CMA area.

    In this case gigantic hugepages can be allocated successfully with a
    high probability; at the same time, the memory isn't completely wasted if nobody
    is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
    etc.

    * On a multi-node machine, a per-node CMA area is allocated on each node.
    Subsequent gigantic hugetlb allocations use the first available NUMA
    node if no node mask is specified by the user.

    Usage:
    1) configure the kernel to allocate a cma area for hugetlb allocations:
    pass hugetlb_cma=10G as a kernel argument

    2) allocate hugetlb pages as usual, e.g.
    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

    If the option isn't enabled or the allocation of the cma area failed,
    the current behavior of the system is preserved.

    x86 and arm64 are covered by this patch; other architectures can be
    trivially added later.

    The patch contains clean-ups and fixes proposed and implemented by Aslan
    Bakirov and Randy Dunlap. It also contains ideas and suggestions
    proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Tested-by: Andreas Schaufler
    Acked-by: Mike Kravetz
    Acked-by: Michal Hocko
    Cc: Aslan Bakirov
    Cc: Randy Dunlap
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • I've noticed that there is no interface exposed by CMA which would let
    me declare contiguous memory on a particular NUMA node.

    This patchset adds the ability to try to allocate contiguous memory on a
    specific node. It will fall back to other nodes if the specified one
    doesn't work.

    Implement a new method for declaring contiguous memory on a particular
    node and keep cma_declare_contiguous() as a wrapper.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Aslan Bakirov
    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Andreas Schaufler
    Cc: Mike Kravetz
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
    Signed-off-by: Linus Torvalds

    Aslan Bakirov
     
  • With Linux fallocate(2) in FALLOC_FL_PUNCH_HOLE mode, the offset can
    exceed the inode size. Ocfs2 currently doesn't allow an offset beyond
    the inode size. This restriction is not necessary and violates
    fallocate(2) semantics.

    If the fallocate(2) offset is beyond the inode size, just return success
    and do nothing further.

    Without this fix, ocfs2 crashes the kernel:

    kernel BUG at fs/ocfs2//alloc.c:7264!
    ocfs2_truncate_inline+0x20f/0x360 [ocfs2]
    ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2]
    __ocfs2_change_file_space+0x4a5/0x650 [ocfs2]
    ocfs2_fallocate+0x83/0xa0 [ocfs2]
    vfs_fallocate+0x148/0x230
    SyS_fallocate+0x48/0x80
    do_syscall_64+0x79/0x170

    Signed-off-by: Changwei Ge
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Cc:
    Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Changwei Ge
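
A hypothetical userspace model of the range check this fix describes: a punch-hole request whose offset lies beyond i_size succeeds as a no-op instead of reaching the extent code. The EINVAL cases follow the fallocate(2) man page (negative offset, or len less than or equal to 0); model_punch_hole_check() and enum punch_action are illustrative names, not ocfs2 functions.

```c
#include <assert.h>
#include <errno.h>

/* Outcome of the range check before any extent work is attempted. */
enum punch_action { PUNCH_NOOP, PUNCH_PROCEED };

int model_punch_hole_check(long long i_size, long long offset, long long len,
			   enum punch_action *action)
{
	if (offset < 0 || len <= 0)
		return -EINVAL;	/* malformed request, per fallocate(2) */
	/* The fix: a hole starting beyond EOF is valid and needs no work. */
	*action = (offset > i_size) ? PUNCH_NOOP : PUNCH_PROCEED;
	return 0;
}
```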
     
  • Fix the following sparse warning:

    mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
    mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static?

    Reported-by: Hulk Robot
    Signed-off-by: Jason Yan
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200407023925.46438-1-yanaijie@huawei.com
    Signed-off-by: Linus Torvalds

    Jason Yan
     
  • Add description of function parameter 'mt' to fix kernel-doc warning:

    mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/02998bd4-0b82-2f15-2570-f86130304d1e@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • There is a typo at the cross-reference link, causing this warning:

    include/linux/slab.h:11: WARNING: undefined label: memory-allocation (if the link has no caption the label must precede a section header)

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Cc: Jonathan Corbet
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/0aeac24235d356ebd935d11e147dcc6edbb6465c.1586359676.git.mchehab+huawei@kernel.org
    Signed-off-by: Linus Torvalds

    Mauro Carvalho Chehab
     
  • There is a typo in a comment; fix it.
    s/eariler/earlier/

    Signed-off-by: Qiujun Huang
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Acked-by: Christoph Lameter
    Link: http://lkml.kernel.org/r/20200405160544.1246-1-hqjagain@gmail.com
    Signed-off-by: Linus Torvalds

    Qiujun Huang
     
  • If a cgroup violates its memory.high constraints, we may end up unduly
    penalising it. For example, for the following hierarchy:

    A: max high, 20 usage
    A/B: 9 high, 10 usage
    A/C: max high, 10 usage

    We would end up doing the following calculation below when calculating
    high delay for A/B:

    A/B: 10 - 9 = 1...
    A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21.

    This gets worse with higher disparities in usage in the parent.

    I have no idea how this disappeared from the final version of the patch,
    but it is certainly Not Good(tm). This wasn't obvious in testing because,
    for a simple cgroup hierarchy with only one child, the result is usually
    roughly the same. It's only in more complex hierarchies that things go
    really awry (although the effects are still limited to at most 2
    seconds in schedule_timeout_killable()).

    [chris@chrisdown.name: changelog]
    Fixes: e26733e0d0ec ("mm, memcg: throttle allocators based on ancestral memory.high")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Chris Down
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: [5.4.x]
    Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name
    Signed-off-by: Linus Torvalds

    Jakub Kicinski
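
The wraparound in the example above is easiest to see with unsigned arithmetic. This sketch uses raw page-count differences as in the example, whereas the real code may also scale the overage relative to the high limit. MODEL_COUNTER_MAX (LONG_MAX here) is a hypothetical stand-in for PAGE_COUNTER_MAX, and skipping cgroups whose usage does not exceed high is the shape of the fix assumed here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_COUNTER_MAX ("max" high). */
#define MODEL_COUNTER_MAX ((unsigned long)LONG_MAX)

struct model_memcg {
	unsigned long usage;
	unsigned long high;
	const struct model_memcg *parent;
};

/* Buggy walk: subtracts even when usage < high, so the result wraps. */
unsigned long model_max_overage_buggy(const struct model_memcg *cg)
{
	unsigned long max = 0;

	for (; cg; cg = cg->parent) {
		unsigned long overage = cg->usage - cg->high;	/* may wrap around */
		if (overage > max)
			max = overage;
	}
	return max;
}

/* Fixed walk: ignore ancestors that are not above their high limit. */
unsigned long model_max_overage_fixed(const struct model_memcg *cg)
{
	unsigned long max = 0;

	for (; cg; cg = cg->parent) {
		if (cg->usage <= cg->high)
			continue;
		if (cg->usage - cg->high > max)
			max = cg->usage - cg->high;
	}
	return max;
}
```

For the A, A/B hierarchy from the changelog, the fixed walk reports an overage of 1 (from A/B alone), while the buggy walk lets A's wrapped subtraction dominate.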
     
  • When removing files containing extended attributes, the hfsplus driver may
    remove the wrong entries from the attributes b-tree, causing major
    filesystem damage and in some cases even kernel crashes.

    To remove a file, all its extended attributes have to be removed as well.
    The driver does this by looking up all keys in the attributes b-tree with
    the cnid of the file. Each of these entries then gets deleted using the
    key used for searching, which doesn't contain the attribute's name when it
    should. Since the key doesn't contain the name, the deletion routine will
    not find the correct entry and instead remove the one in front of it. If
    parent nodes have to be modified, these become corrupt as well. This
    causes invalid links and unsorted entries that not even macOS's fsck_hfs
    is able to fix.

    To fix this, modify the search key before an entry is deleted from the
    attributes b-tree by copying the found entry's key into the search key,
    therefore ensuring that the correct entry gets removed from the tree.

    Signed-off-by: Simon Gander
    Signed-off-by: Andrew Morton
    Reviewed-by: Anton Altaparmakov
    Cc:
    Link: http://lkml.kernel.org/r/20200327155541.1521-1-simon@tuxera.com
    Signed-off-by: Linus Torvalds

    Simon Gander
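
The effect of deleting with a name-less key can be reproduced with a flat, sorted array standing in for the attributes b-tree. Everything here is hypothetical: model_find() returns the last record whose key is less than or equal to the search key, which is an assumed simplification of the b-tree lookup, and model_delete() removes whatever that lookup finds. Copying the found record's full key into the search key before deleting, as the fix does, makes the lookup exact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened attributes "tree": records sorted by (cnid, name). */
struct model_attr_key {
	unsigned int cnid;
	char name[16];
};

static int key_cmp(const struct model_attr_key *a, const struct model_attr_key *b)
{
	if (a->cnid != b->cnid)
		return a->cnid < b->cnid ? -1 : 1;
	return strcmp(a->name, b->name);
}

/* Index of the last record whose key is <= the search key, or -1 if none. */
static int model_find(const struct model_attr_key *recs, int n,
		      const struct model_attr_key *key)
{
	int pos = -1;

	for (int i = 0; i < n && key_cmp(&recs[i], key) <= 0; i++)
		pos = i;
	return pos;
}

/* Delete whatever record the lookup lands on; returns the new record count. */
int model_delete(struct model_attr_key *recs, int n, const struct model_attr_key *key)
{
	int pos = model_find(recs, n, key);

	if (pos < 0)
		return n;
	memmove(&recs[pos], &recs[pos + 1], (size_t)(n - pos - 1) * sizeof(*recs));
	return n - 1;
}
```

With records (1, "a"), (2, "x"), (2, "y"), deleting with the key (2, "") removes (1, "a"), the entry in front of the intended one; deleting with the copied key (2, "x") removes the right record.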
     

10 Apr, 2020

5 commits

  • Pull module updates from Jessica Yu:
    "Only a small cleanup this time around: a trivial conversion of
    zero-length arrays to flexible arrays"

    * tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    kernel: module: Replace zero-length array with flexible-array member

    Linus Torvalds
     
  • Pull arm64 fixes from Catalin Marinas:

    - Ensure that the compiler and linker versions are aligned so that ld
    doesn't complain about not understanding a .note.gnu.property section
    (emitted when pointer authentication is enabled).

    - Force -mbranch-protection=none when the feature is not enabled, in
    case a compiler may choose a different default value.

    - Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
    rarely enabled.

    - Fix the 16-bit Thumb-2 instruction checking mask in the emulation of
    the SETEND instruction (it could match the bottom half of a 32-bit
    Thumb-2 instruction).

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
    arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
    arm64: Always force a branch protection mode when the compiler has one
    arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
    init/kconfig: Add LD_VERSION Kconfig

    Linus Torvalds
     
  • Pull more powerpc updates from Michael Ellerman:
    "The bulk of this is the series to make CONFIG_COMPAT user-selectable,
    it's been around for a long time but was blocked behind the
    syscall-in-C series.

    Plus there's also a few fixes and other minor things.

    Summary:

    - A fix for a crash in machine check handling on pseries (i.e. guests)

    - A small series to make it possible to disable CONFIG_COMPAT, and
    turn it off by default for ppc64le where it's not used.

    - A few other miscellaneous fixes and small improvements.

    Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
    Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
    Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
    Nicholas Piggin, Stephen Boyd, Wen Xiong"

    * tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    selftests/powerpc: Always build the tm-poison test 64-bit
    powerpc: Improve ppc_save_regs()
    Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
    powerpc/time: Replace by
    powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
    powerpc/perf: split callchain.c by bitness
    powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
    powerpc/64: make buildable without CONFIG_COMPAT
    powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
    powerpc/perf: consolidate read_user_stack_32
    powerpc: move common register copy functions from signal_32.c to signal.c
    powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
    powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
    powerpc/ps3: Remove an unneeded NULL check
    powerpc/ps3: Remove duplicate error message
    powerpc/powernv: Re-enable imc trace-mode in kernel
    powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
    powerpc/pseries: Fix MCE handling on pseries
    selftests/eeh: Skip ahci adapters
    powerpc/64s: Fix doorbell wakeup msgclr optimisation

    Linus Torvalds
     
  • Pull m68knommu update from Greg Ungerer:
    "Only a single commit, to remove all use of the obsolete setup_irq()
    calls within the m68knommu architecture code"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68k: Replace setup_irq() by request_irq()

    Linus Torvalds
     
  • Pull RISC-V updates from Palmer Dabbelt:
    "This contains a handful of new features:

    - Partial support for the Kendryte K210.

    There are still a few outstanding issues that I have patches for,
    but I don't actually have a board to test them so they're not
    included yet.

    - SBI v0.2 support.

    - Fixes to support for building with LLVM-based toolchains. The
    resulting images are known not to boot yet.

    I don't anticipate a part two, but I'll probably have something early
    in the RCs to finish up the K210 support"

    * tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits)
    riscv: create a loader.bin boot image for Kendryte SoC
    riscv: Kendryte K210 default config
    riscv: Add Kendryte K210 device tree
    riscv: Select required drivers for Kendryte SOC
    riscv: Add Kendryte K210 SoC support
    riscv: Add SOC early init support
    riscv: Unaligned load/store handling for M_MODE
    RISC-V: Support cpu hotplug
    RISC-V: Add supported for ordered booting method using HSM
    RISC-V: Add SBI HSM extension definitions
    RISC-V: Export SBI error to linux error mapping function
    RISC-V: Add cpu_ops and modify default booting method
    RISC-V: Move relocate and few other functions out of __init
    RISC-V: Implement new SBI v0.2 extensions
    RISC-V: Introduce a new config for SBI v0.1
    RISC-V: Add SBI v0.2 extension definitions
    RISC-V: Add basic support for SBI v0.2
    RISC-V: Mark existing SBI as 0.1 SBI.
    riscv: Use macro definition instead of magic number
    riscv: Add support to dump the kernel page tables
    ...

    Linus Torvalds
     

09 Apr, 2020

9 commits

  • Pull 9p documentation update from Dominique Martinet:
    "Document the new O_NONBLOCK short read behavior"

    * tag '9p-for-5.7-2' of git://github.com/martinetd/linux:
    9p: document short read behaviour with O_NONBLOCK

    Linus Torvalds
     
  • Pull ceph updates from Ilya Dryomov:
    "The main items are:

    - support for asynchronous create and unlink (Jeff Layton).

    Creates and unlinks are satisfied locally, without waiting for a
    reply from the MDS, provided the client has been granted
    appropriate caps (new in v15.y.z ("Octopus") release). This can be
    a big help for metadata-heavy workloads such as tar and rsync.
    Opt in with the new nowsync mount option.

    - multiple blk-mq queues for rbd (Hannes Reinecke and myself).

    When the driver was converted to blk-mq, we settled on a single
    blk-mq queue because of a global lock in libceph and some other
    technical debt. These have since been addressed, so allocate a
    queue per CPU to enhance parallelism.

    - don't hold onto caps that aren't actually needed (Zheng Yan).

    This has been our long-standing behavior, but it causes issues with
    some active/standby applications (synchronous I/O, stalls if the
    standby goes down, etc).

    - .snap directory timestamps consistent with ceph-fuse (Luis
    Henriques)"

    * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits)
    ceph: fix snapshot directory timestamps
    ceph: wait for async creating inode before requesting new max size
    ceph: don't skip updating wanted caps when cap is stale
    ceph: request new max size only when there is auth cap
    ceph: cleanup return error of try_get_cap_refs()
    ceph: return ceph_mdsc_do_request() errors from __get_parent()
    ceph: check all mds' caps after page writeback
    ceph: update i_requested_max_size only when sending cap msg to auth mds
    ceph: simplify calling of ceph_get_fmode()
    ceph: remove delay check logic from ceph_check_caps()
    ceph: consider inode's last read/write when calculating wanted caps
    ceph: always renew caps if mds_wanted is insufficient
    ceph: update dentry lease for async create
    ceph: attempt to do async create when possible
    ceph: cache layout in parent dir on first sync create
    ceph: add new MDS req field to hold delegated inode number
    ceph: decode interval_sets for delegated inos
    ceph: make ceph_fill_inode non-static
    ceph: perform asynchronous unlink if we have sufficient caps
    ceph: don't take refs to want mask unless we have all bits
    ...

    Linus Torvalds
     
  • Pull overlayfs update from Miklos Szeredi:

    - Fix failure to copy-up files from certain NFSv4 mounts

    - Sort out inconsistencies between st_ino and i_ino (used in /proc/locks)

    - Allow consistent (POSIX-y) inode numbering in more cases

    - Allow virtiofs to be used as upper layer

    - Miscellaneous cleanups and fixes

    * tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: document xino expected behavior
    ovl: enable xino automatically in more cases
    ovl: avoid possible inode number collisions with xino=on
    ovl: use a private non-persistent ino pool
    ovl: fix WARN_ON nlink drop to zero
    ovl: fix a typo in comment
    ovl: replace zero-length array with flexible-array member
    ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old
    ovl: strict upper fs requirements for remote upper fs
    ovl: check if upper fs supports RENAME_WHITEOUT
    ovl: allow remote upper
    ovl: decide if revalidate needed on a per-dentry basis
    ovl: separate detection of remote upper layer from stacked overlay
    ovl: restructure dentry revalidation
    ovl: ignore failure to copy up unknown xattrs
    ovl: document permission model
    ovl: simplify i_ino initialization
    ovl: factor out helper ovl_get_root()
    ovl: fix out of date comment and unreachable code
    ovl: fix value of i_ino for lower hardlink corner case

    Linus Torvalds
     
  • Pull iomap fix from Darrick Wong:
    "Fix a problem in readahead where we can crash if we can't allocate a
    full bio due to GFP_NORETRY"

    * tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    iomap: Handle memory allocation failure in readahead

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a Kconfig dependency for hisilicon as well as a double free
    in marvell/octeontx"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: marvell/octeontx - fix double free of ptr
    crypto: hisilicon - Fix build error

    Linus Torvalds
     
  • Pull watchdog updates from Wim Van Sebroeck:

    - add TI K3 RTI watchdog

    - add stop_on_reboot parameter to control reboot policy

    - wm831x_wdt: Remove GPIO handling

    - several small fixes, improvements and clean-ups

    * tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog:
    watchdog: Add K3 RTI watchdog support
    dt-bindings: watchdog: Add support for TI K3 RTI watchdog
    watchdog: ziirave_wdt: change name to be more specific
    watchdog: orion: use 0 for unset heartbeat
    watchdog: npcm: remove whitespaces
    watchdog: reset last_hw_keepalive time at start
    watchdog: imx2_wdt: Drop .remove callback
    watchdog: Add stop_on_reboot parameter to control reboot policy
    watchdog: wm831x_wdt: Remove GPIO handling
    watchdog: imx7ulp: Remove unused include of init.h
    watchdog: imx_sc_wdt: Remove unused includes
    watchdog: qcom: Use irq flags from firmware
    watchdog: pm8916_wdt: Add system sleep callbacks
    watchdog: qcom-wdt: disable pretimeout on timer platform

    Linus Torvalds
     
  • …ernel/git/chrome-platform/linux

    Pull chrome platform updates from Benson Leung:

    cros-usbpd-notify and cros_ec_typec:
    - Add a new notification driver that handles and dispatches USB PD
    related events to other drivers.
    - Add a Type C connector class driver for cros_ec

    CrOS EC:
    - Introduce a new cros_ec_cmd_xfer_status helper

    Sensors/iio:
    - A series from Gwendal that adds Cros EC sensor hub FIFO support

    Wilco EC:
    - Fix a build warning.
    - Platform data shouldn't include kernel.h

    Misc:
    - Complete the I2C API conversion, using i2c_new_client_device instead
    of i2c_new_device in chromeos_laptop.
    - Replace zero-length array with flexible-array member in
    cros_ec_chardev and wilco_ec
    - Update new structure for SPI transfer delays in cros_ec_spi

    * tag 'tag-chrome-platform-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (34 commits)
    platform/chrome: cros_ec_spi: Wait for USECS, not NSECS
    iio: cros_ec: Use Hertz as unit for sampling frequency
    iio: cros_ec: Report hwfifo_watermark_max
    iio: cros_ec: Expose hwfifo_timeout
    iio: cros_ec: Remove pm function
    iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO
    iio: expose iio_device_set_clock
    iio: cros_ec: Move function description to .c file
    platform/chrome: cros_ec_sensorhub: Add median filter
    platform/chrome: cros_ec_sensorhub: Add code to spread timestmap
    platform/chrome: cros_ec_sensorhub: Add FIFO support
    platform/chrome: cros_ec_sensorhub: Add the number of sensors in sensorhub
    platform/chrome: chromeos_laptop: make I2C API conversion complete
    platform/chrome: wilco_ec: event: Replace zero-length array with flexible-array member
    platform/chrome: cros_ec_chardev: Replace zero-length array with flexible-array member
    platform/chrome: cros_ec_typec: Update port info from EC
    platform/chrome: Add Type C connector class driver
    platform/chrome: cros_usbpd_notify: Pull PD_HOST_EVENT status
    platform/chrome: cros_usbpd_notify: Amend ACPI driver to plat
    platform/chrome: cros_usbpd_notify: Add driver data struct
    ...

    Linus Torvalds
     
  • Pull libnvdimm and dax updates from Dan Williams:
    "There were multiple touches outside of drivers/nvdimm/ this round to
    add cross arch compatibility to the devm_memremap_pages() interface,
    enhance numa information for persistent memory ranges, and add a
    zero_page_range() dax operation.

    This cycle I switched from the patchwork api to Konstantin's b4 script
    for collecting tags (from x86, PowerPC, filesystem, and device-mapper
    folks), and everything looks to have gone ok there. This has all
    appeared in -next with no reported issues.

    Summary:

    - Add support for region alignment configuration and enforcement to
    fix compatibility across architectures and PowerPC page size
    configurations.

    - Introduce 'zero_page_range' as a dax operation. This facilitates
    filesystem-dax operation without a block-device.

    - Introduce phys_to_target_node() to help drivers that want to know
    the resulting NUMA node if a given reserved address range were
    onlined.

    - Advertise a persistence-domain for of_pmem and papr_scm. The
    persistence domain indicates where cpu-store cycles need to reach
    in the platform-memory subsystem before the platform will consider
    them power-fail protected.

    - Promote numa_map_to_online_node() to a cross-kernel generic
    facility.

    - Save x86 numa information to allow for node-id lookups for reserved
    memory ranges, deploy that capability for the e820-pmem driver.

    - Pick up some miscellaneous minor fixes that missed v5.6-final,
    including some smatch reports in the ioctl path and some unit
    test compilation fixups.

    - Fixup some flexible-array declarations"

    * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
    dax: Move mandatory ->zero_page_range() check in alloc_dax()
    dax,iomap: Add helper dax_iomap_zero() to zero a range
    dax: Use new dax zero page method for zeroing a page
    dm,dax: Add dax zero_page_range operation
    s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
    dax, pmem: Add a dax operation zero_page_range
    pmem: Add functions for reading/writing page to/from pmem
    libnvdimm: Update persistence domain value for of_pmem and papr_scm device
    tools/test/nvdimm: Fix out of tree build
    libnvdimm/region: Fix build error
    libnvdimm/region: Replace zero-length array with flexible-array member
    libnvdimm/label: Replace zero-length array with flexible-array member
    ACPI: NFIT: Replace zero-length array with flexible-array member
    libnvdimm/region: Introduce an 'align' attribute
    libnvdimm/region: Introduce NDD_LABELING
    libnvdimm/namespace: Enforce memremap_compat_align()
    libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
    libnvdimm: Out of bounds read in __nd_ioctl()
    acpi/nfit: improve bounds checking for 'func'
    mm/memremap_pages: Introduce memremap_compat_align()
    ...

    Linus Torvalds
     
  • Pull iommu updates from Joerg Roedel:

    - ARM-SMMU support for the TLB range invalidation command in SMMUv3.2

    - ARM-SMMU introduction of command batching helpers to batch up CD and
    ATC invalidation

    - ARM-SMMU support for PCI PASID, along with necessary PCI symbol
    exports

    - Introduce a generic (actually rename an existing) IOMMU related
    pointer in struct device and reduce the IOMMU related pointers

    - Some fixes for the OMAP IOMMU driver to make it build on 64-bit
    architectures

    - Various smaller fixes and improvements

    * tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (39 commits)
    iommu: Move fwspec->iommu_priv to struct dev_iommu
    iommu/virtio: Use accessor functions for iommu private data
    iommu/qcom: Use accessor functions for iommu private data
    iommu/mediatek: Use accessor functions for iommu private data
    iommu/renesas: Use accessor functions for iommu private data
    iommu/arm-smmu: Use accessor functions for iommu private data
    iommu/arm-smmu: Refactor master_cfg/fwspec usage
    iommu/arm-smmu-v3: Use accessor functions for iommu private data
    iommu: Introduce accessors for iommu private data
    iommu/arm-smmu: Fix uninitilized variable warning
    iommu: Move iommu_fwspec to struct dev_iommu
    iommu: Rename struct iommu_param to dev_iommu
    iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
    drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
    ACPI/IORT: Remove direct access of dev->iommu_fwspec
    iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
    iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
    iommu/virtio: Fix freeing of incomplete domains
    iommu/virtio: Fix sparse warning
    iommu/vt-d: Add build dependency on IOASID
    ...

    Linus Torvalds