12 Jul, 2021

4 commits

  • I know nothing about zone_device pages and !device_private pages; but if
    try_to_migrate_one() will do nothing for them, then it's better that
    try_to_migrate() filter them first, than trawl through all their vmas.

    Signed-off-by: Hugh Dickins
    Reviewed-by: Shakeel Butt
    Reviewed-by: Alistair Popple
    Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
    Cc: Andrew Morton
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: Christoph Hellwig
    Cc: Yang Shi
    Cc: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
    cleared by the time it gets the page table lock, page_vma_mapped_walk_done()
    must be called before returning, either explicitly or by a final call
    to page_vma_mapped_walk() - otherwise the page table remains locked.
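
    A minimal sketch of the pattern (helper and field names per 5.14-era
    mm/rmap.c; details abbreviated, not the exact upstream diff):

    static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
                               unsigned long address, void *unused)
    {
        struct page_vma_mapped_walk pvmw = {
            .page = page,
            .vma = vma,
            .address = address,
        };

        while (page_vma_mapped_walk(&pvmw)) {
            /*
             * VM_LOCKED may have been cleared while we waited for the
             * page table lock: release it before returning.
             */
            if (!(vma->vm_flags & VM_LOCKED)) {
                page_vma_mapped_walk_done(&pvmw);
                return true;
            }
            mlock_vma_page(page);
            page_vma_mapped_walk_done(&pvmw);
            return false;   /* stop the rmap walk */
        }
        return true;
    }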

    Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
    Signed-off-by: Hugh Dickins
    Reviewed-by: Alistair Popple
    Reviewed-by: Shakeel Butt
    Reported-by: kernel test robot
    Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
    Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
    Cc: Andrew Morton
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: Christoph Hellwig
    Cc: Yang Shi
    Cc: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The kernel recovers in due course from missing Mlocked pages: but there
    was no point in calling page_mlock() (formerly known as
    try_to_munlock()) on a THP, because nothing got done even when it was
    found to be mapped in another VM_LOCKED vma.

    It's true that we need to be careful: Mlocked accounting of pte-mapped
    THPs is too difficult (so consistently avoided); but Mlocked accounting
    of only-pmd-mapped THPs is supposed to work, even when multiple mappings
    are mlocked and munlocked or munmapped. Refine the tests.

    There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
    page_mlock_one() does not even have to worry about that complication.

    (I said the kernel recovers: but would page reclaim be likely to split the
    THP before rediscovering that it's VM_LOCKED? I've not followed that up.)

    Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
    Signed-off-by: Hugh Dickins
    Reviewed-by: Shakeel Butt
    Acked-by: Kirill A. Shutemov
    Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
    Cc: Andrew Morton
    Cc: Alistair Popple
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: Christoph Hellwig
    Cc: Yang Shi
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Parallel developments in mm/rmap.c have left behind some out-of-date
    comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
    in try_to_migrate() itself), and try_to_migrate() returns nothing at
    all.

    TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it
    in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so
    delete the "recently referenced" comment from try_to_unmap_one() (once
    upon a time the comment was near the removed codeblock, but they drifted
    apart).

    Signed-off-by: Hugh Dickins
    Reviewed-by: Shakeel Butt
    Reviewed-by: Alistair Popple
    Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/
    Cc: Andrew Morton
    Cc: Jason Gunthorpe
    Cc: Ralph Campbell
    Cc: Christoph Hellwig
    Cc: Yang Shi
    Cc: Kirill A. Shutemov
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

11 Jul, 2021

2 commits

  • Commit dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to
    local_lock") folded in a workaround patch for pahole that was unable to
    deal with zero-sized percpu structures.

    A superior workaround is achieved with commit a0b8200d06ad ("kbuild:
    skip per-CPU BTF generation for pahole v1.18-v1.21").

    This patch reverts the dummy field and the pahole version check.

    Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull percpu fix from Dennis Zhou:
    "This is just a single change to fix percpu depopulation. The code
    relied on depopulation code written specifically for the free path and
    relied on vmalloc to do the tlb flush lazily. As we're modifying the
    backing pages during the lifetime of a chunk, we need to also flush
    the tlb accordingly.

    Guenter Roeck reported this issue in [1] on mips. I believe we just
    happen to be lucky given the much larger chunk sizes on x86 and
    consequently less churning of this memory"

    Link: https://lore.kernel.org/lkml/20210702191140.GA3166599@roeck-us.net/ [1]

    * 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
    percpu: flush tlb in pcpu_reclaim_populated()

    Linus Torvalds
     

09 Jul, 2021

10 commits

  • Patch series "Speedup mremap on ppc64", v8.

    This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
    the platform to support updating higher-level page tables without updating
    page table entries, and it also needs to invalidate the Page Walk Cache on
    architectures that support it.

    This patch (of 3):

    Architectures like ppc64 support faster mremap only with radix
    translation. Hence allow a runtime check for fast-mremap support.

    Link: https://lkml.kernel.org/r/20210616045735.374532-1-aneesh.kumar@linux.ibm.com
    Link: https://lkml.kernel.org/r/20210616045735.374532-2-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Michael Ellerman
    Cc: Kalesh Singh
    Cc: Nicholas Piggin
    Cc: Joel Fernandes
    Cc: Christophe Leroy
    Cc: Kirill A. Shutemov
    Cc: "Aneesh Kumar K . V"
    Cc: Hugh Dickins
    Cc: Kirill A. Shutemov
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • To avoid a race between the rmap walk and mremap, mremap does
    take_rmap_locks(). The lock is taken to ensure that the rmap walk doesn't
    miss a page table entry due to PTE moves via move_pagetables(). The kernel
    further optimizes this lock such that, if we are going to find the newly
    added vma after the old vma, the rmap lock is not taken. This is because
    the rmap walk would find the vmas in the same order and, if we don't find
    the page table attached to the older vma, we would find it with the new
    vma, which we iterate later.

    As explained in commit eb66ae030829 ("mremap: properly flush TLB before
    releasing the page") mremap is special in that it doesn't take ownership
    of the page. The optimized version for PUD/PMD aligned mremap also
    doesn't hold the ptl lock. This can result in stale TLB entries as shown
    below.

    This patch updates the rmap locking requirement in mremap to handle the
    race condition explained below with optimized mremap:

    Optimized PMD move

    CPU 1                       CPU 2                           CPU 3

    mremap(old_addr, new_addr)  page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                addr = old_addr
                                lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                *new_addr = 10; and fills
                                                                TLB with new addr
                                                                and old pfn

    unlock(pmd_ptl)
                                ptep_clear_flush()
                                old pfn is free.
                                                                Stale TLB entry

    The optimized PUD move suffers from a similar race. Both of the above race
    conditions can be fixed if we force the mremap path to take the rmap lock.
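
    For context, the rmap locks in question are the ones taken by mremap's
    existing take_rmap_locks() helper; a sketch of that helper as it appears
    in mm/mremap.c (shown for reference, not as part of this patch):

    static void take_rmap_locks(struct vm_area_struct *vma)
    {
        if (vma->vm_file)
            i_mmap_lock_write(vma->vm_file->f_mapping);
        if (vma->anon_vma)
            anon_vma_lock_write(vma->anon_vma);
    }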

    Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
    Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
    Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
    Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Christophe Leroy
    Cc: Joel Fernandes
    Cc: Kalesh Singh
    Cc: Kirill A. Shutemov
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • pmd/pud_populate is the right interface to be used to set the respective
    page table entries. Some architectures like ppc64 do assume that
    set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not
    setting up a hugepage PTE here, use the pmd/pud_populate interface.
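
    A hedged sketch of the kind of change this implies in move_normal_pmd()
    (context abbreviated; pmd_pgtable() extracts the page-table page that the
    old pmd pointed to):

        /* Previously: set_pmd_at(mm, new_addr, new_pmd, pmd); */
        pmd_populate(mm, new_pmd, pmd_pgtable(pmd));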

    Link: https://lkml.kernel.org/r/20210616045239.370802-6-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Christophe Leroy
    Cc: Hugh Dickins
    Cc: Joel Fernandes
    Cc: Kalesh Singh
    Cc: Kirill A. Shutemov
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • With a two-level page table, don't enable move_normal_pud.
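
    A hedged sketch of such a guard in mm/mremap.c: with only two page-table
    levels there is no real PUD level, so a stub that reports no PUD-level
    move can stand in for the implementation:

    #if CONFIG_PGTABLE_LEVELS <= 2
    /* No real PUD level here: never take the PUD fast path. */
    static inline bool move_normal_pud(struct vm_area_struct *vma,
            unsigned long old_addr, unsigned long new_addr,
            pud_t *old_pud, pud_t *new_pud)
    {
        return false;
    }
    #endif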

    Link: https://lkml.kernel.org/r/20210616045239.370802-5-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Christophe Leroy
    Cc: Hugh Dickins
    Cc: Joel Fernandes
    Cc: Kalesh Singh
    Cc: Kirill A. Shutemov
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD
    entries. Add a helper to move huge PUD entries on mremap().

    This will be used by a later patch to optimize mremap of PUD_SIZE-aligned,
    level-4 PTE-mapped addresses.

    This also makes sure we support mremap on huge PUD entries even with
    CONFIG_HAVE_MOVE_PUD disabled.

    [aneesh.kumar@linux.ibm.com: fix build failure with clang-10]
    Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161
    Link: https://lkml.kernel.org/r/20210619134310.89098-1-aneesh.kumar@linux.ibm.com

    Link: https://lkml.kernel.org/r/20210616045239.370802-4-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V
    Cc: Christophe Leroy
    Cc: Hugh Dickins
    Cc: Joel Fernandes
    Cc: Kalesh Singh
    Cc: Kirill A. Shutemov
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3.

    Add a setup_initial_init_mm() helper, then use it to clean up the text,
    data and brk setup code.

    This patch (of 15):

    Add a setup_initial_init_mm() helper to set up kernel text, data and brk.
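
    A sketch of the helper and of a typical per-arch call site replacing the
    four open-coded init_mm assignments (_text/_etext/_edata/_end are the
    usual linker-provided section markers):

    void __init setup_initial_init_mm(void *start_code, void *end_code,
                                      void *end_data, void *brk)
    {
        init_mm.start_code = (unsigned long)start_code;
        init_mm.end_code   = (unsigned long)end_code;
        init_mm.end_data   = (unsigned long)end_data;
        init_mm.brk        = (unsigned long)brk;
    }

        /* arch setup code, instead of assigning the four fields by hand: */
        setup_initial_init_mm(_text, _etext, _edata, _end);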

    Link: https://lkml.kernel.org/r/20210608083418.137226-1-wangkefeng.wang@huawei.com
    Link: https://lkml.kernel.org/r/20210608083418.137226-2-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang
    Cc: Souptick Joarder
    Cc: Christophe Leroy
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Jonas Bonn
    Cc: Ley Foon Tan
    Cc: Michael Ellerman
    Cc: Nick Hu
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Rich Felker
    Cc: Russell King (Oracle)
    Cc: Stafford Horne
    Cc: Stefan Kristiansson
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kefeng Wang
     
  • It is unsafe to allow saving of secretmem areas to the hibernation
    snapshot as they would be visible after resume, and this would essentially
    defeat the purpose of secret memory mappings.

    Prevent hibernation whenever there are active secret memory users.
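
    A hedged sketch of the mechanism: secretmem keeps a count of active users,
    and the hibernation core checks it (the other conditions tested in
    hibernation_available() are omitted here):

    /* mm/secretmem.c */
    static atomic_t secretmem_users;

    bool secretmem_active(void)
    {
        return !!atomic_read(&secretmem_users);
    }

    /* kernel/power/hibernate.c */
    bool hibernation_available(void)
    {
        return nohibernate == 0 && !secretmem_active();
    }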

    Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Acked-by: David Hildenbrand
    Acked-by: James Bottomley
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christopher Lameter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Elena Reshetova
    Cc: Hagen Paul Pfeifer
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Bottomley
    Cc: "Kirill A. Shutemov"
    Cc: Mark Rutland
    Cc: Matthew Wilcox
    Cc: Michael Kerrisk
    Cc: Palmer Dabbelt
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: Tycho Andersen
    Cc: Will Deacon
    Cc: kernel test robot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Introduce the "memfd_secret" system call with the ability to create memory
    areas visible only in the context of the owning process and not mapped
    either to other processes or into the kernel page tables.

    The secretmem feature is off by default and the user must explicitly
    enable it at boot time.

    Once secretmem is enabled, the user will be able to create a file
    descriptor using the memfd_secret() system call. The memory areas created
    by mmap() calls from this file descriptor will be unmapped from the kernel
    direct map and they will be only mapped in the page table of the processes
    that have access to the file descriptor.

    Secretmem is designed to provide the following protections:

    * Enhanced protection (in conjunction with all the other in-kernel
    attack prevention systems) against ROP attacks. Secretmem makes
    "simple" ROP insufficient to perform exfiltration, which increases the
    required complexity of the attack. Along with other protections like
    the kernel stack size limit and address space layout randomization, which
    make finding gadgets really hard, the absence of any in-kernel primitive
    for accessing secret memory means the one-gadget ROP attack can't work.
    Since the only way to access secret memory is to reconstruct the missing
    mapping entry, the attacker has to recover the physical page and insert
    a PTE pointing to it in the kernel and then retrieve the contents. That
    takes at least three gadgets, which is a level of difficulty beyond most
    standard attacks.

    * Prevent cross-process secret userspace memory exposures. Once the
    secret memory is allocated, the user can't accidentally pass it into the
    kernel to be transmitted somewhere. The secretmem pages cannot be
    accessed via the direct map and they are disallowed in GUP.

    * Harden against exploited kernel flaws. In order to access secretmem,
    a kernel-side attack would need to either walk the page tables and
    create new ones, or spawn a new privileged userspace process to perform
    secrets exfiltration using ptrace.

    The file descriptor based memory has several advantages over the
    "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). The
    file descriptor approach allows explicit and controlled sharing of the
    memory areas, and it allows the operations to be sealed. Besides, file
    descriptor based memory paves the way for VMMs to remove the secret memory
    range from the userspace hypervisor process, for instance QEMU. Andy
    Lutomirski says:

    "Getting fd-backed memory into a guest will take some possibly major
    work in the kernel, but getting vma-backed memory into a guest without
    mapping it in the host user address space seems much, much worse."

    memfd_secret() is made a dedicated system call rather than an extension to
    memfd_create() because its purpose is to allow the user to create more
    secure memory mappings rather than to simply allow file based access to
    the memory. Nowadays the cost of a new system call is negligible, and it
    is far simpler for userspace to deal with a clear-cut system call than
    with a multiplexer or an overloaded syscall. Moreover, the initial
    implementation of memfd_secret() is completely distinct from
    memfd_create(), so there is not much sense in overloading memfd_create()
    to begin with. If there is ever a need for code sharing between these
    implementations, it can easily be achieved without adjusting user-visible
    APIs.

    The secret memory remains accessible in the process context using uaccess
    primitives, but it is not exposed to the kernel otherwise; secret memory
    areas are removed from the direct map and functions in the
    follow_page()/get_user_page() family will refuse to return a page that
    belongs to the secret memory area.

    If a use case ever arises that requires exposing secretmem to the kernel,
    it will be an opt-in request in the system call flags, so that the user
    has to decide what data can be exposed to the kernel.

    Removing pages from the direct map may fragment it on architectures that
    use large pages to map the physical memory, which affects the system
    performance. However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice". Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.

    Pages in the secretmem regions are unevictable and unmovable to avoid
    accidental exposure of the sensitive data via swap or during page
    migration.

    Since the secretmem mappings are locked in memory they cannot exceed
    RLIMIT_MEMLOCK. Since these mappings are already locked independently
    from mlock(), an attempt to mlock()/munlock() secretmem range would fail
    and mlockall()/munlockall() will ignore secretmem mappings.

    However, unlike mlock()ed memory, secretmem currently behaves more like
    long-term GUP: secretmem mappings are unmovable mappings directly consumed
    by user space. With default limits, there is no excessive use of
    secretmem and it poses no real problem in combination with
    ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
    balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.

    A page that was a part of the secret memory area is cleared when it is
    freed to ensure the data is not exposed to the next user of that page.

    The following example demonstrates creation of a secret mapping (error
    handling is omitted):

    fd = memfd_secret(0);
    ftruncate(fd, MAP_SIZE);
    ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
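
    A slightly more complete, hedged userspace sketch: glibc provides no
    wrapper at the time of writing, so the call goes through syscall(2)
    (assuming kernel headers recent enough to define SYS_memfd_secret), and
    the kernel must have been booted with secretmem enabled; MAP_SIZE is
    illustrative:

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define MAP_SIZE (1UL << 20)

    int main(void)
    {
        int fd = syscall(SYS_memfd_secret, 0);          /* no flags */
        if (fd < 0)
            return 1;
        if (ftruncate(fd, MAP_SIZE) < 0)
            return 1;
        char *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (ptr == MAP_FAILED)
            return 1;
        /* data written here is not reachable via the kernel direct map */
        strcpy(ptr, "secret");
        munmap(ptr, MAP_SIZE);
        close(fd);
        return 0;
    }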

    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

    [akpm@linux-foundation.org: suppress Kconfig whine]

    Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Acked-by: Hagen Paul Pfeifer
    Acked-by: James Bottomley
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christopher Lameter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: Elena Reshetova
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Bottomley
    Cc: "Kirill A. Shutemov"
    Cc: Matthew Wilcox
    Cc: Mark Rutland
    Cc: Michael Kerrisk
    Cc: Palmer Dabbelt
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: Tycho Andersen
    Cc: Will Deacon
    Cc: David Hildenbrand
    Cc: kernel test robot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20.

    This is an implementation of "secret" mappings backed by a file
    descriptor.

    The file descriptor backing secret memory mappings is created using a
    dedicated memfd_secret system call. The desired protection mode for the
    memory is configured using the flags parameter of the system call. The
    mmap() of the file descriptor created with memfd_secret() will create a
    "secret" memory mapping. The pages in that mapping will be marked as not
    present in the direct map and will be present only in the page table of
    the owning mm.

    Although normally Linux userspace mappings are protected from other users,
    such secret mappings are useful for environments where a hostile tenant is
    trying to trick the kernel into giving them access to other tenants'
    mappings.

    It's designed to provide the following protections:

    * Enhanced protection (in conjunction with all the other in-kernel
    attack prevention systems) against ROP attacks. Secretmem makes
    "simple" ROP insufficient to perform exfiltration, which increases the
    required complexity of the attack. Along with other protections like
    the kernel stack size limit and address space layout randomization, which
    make finding gadgets really hard, the absence of any in-kernel primitive
    for accessing secret memory means the one-gadget ROP attack can't work.
    Since the only way to access secret memory is to reconstruct the missing
    mapping entry, the attacker has to recover the physical page and insert
    a PTE pointing to it in the kernel and then retrieve the contents. That
    takes at least three gadgets, which is a level of difficulty beyond most
    standard attacks.

    * Prevent cross-process secret userspace memory exposures. Once the
    secret memory is allocated, the user can't accidentally pass it into the
    kernel to be transmitted somewhere. The secretmem pages cannot be
    accessed via the direct map and they are disallowed in GUP.

    * Harden against exploited kernel flaws. In order to access secretmem,
    a kernel-side attack would need to either walk the page tables and
    create new ones, or spawn a new privileged userspace process to perform
    secrets exfiltration using ptrace.

    In the future the secret mappings may be used as a means to protect guest
    memory in a virtual machine host.

    For demonstration of secret memory usage we've created a userspace library

    https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

    that does two things: the first is to act as a preloader for openssl,
    redirecting all the OPENSSL_malloc calls to secret memory so that any
    secret keys get automatically protected this way; the other is to expose
    the API to the user who needs it. We anticipate that a lot of the use
    cases would be like the openssl one: many toolkits that deal with secret
    keys already have special handling for the memory to try to give them
    greater protection, so this would simply be pluggable into the toolkits
    without any need for user application modification.

    Hiding secret memory mappings behind an anonymous file allows usage of the
    page cache for tracking pages allocated for the "secret" mappings as well
    as using address_space_operations for e.g. page migration callbacks.

    The anonymous file may be also used implicitly, like hugetlb files, to
    implement mmap(MAP_SECRET) and use the secret memory areas with "native"
    mm ABIs in the future.

    Removing pages from the direct map may fragment it on architectures that
    use large pages to map the physical memory, which affects the system
    performance. However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice". Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.

    In addition, there is also a long term goal to improve management of the
    direct map.

    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

    This patch (of 7):

    It will be used by the upcoming secret memory implementation.

    Link: https://lkml.kernel.org/r/20210518072034.31572-1-rppt@kernel.org
    Link: https://lkml.kernel.org/r/20210518072034.31572-2-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Reviewed-by: David Hildenbrand
    Acked-by: James Bottomley
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christopher Lameter
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Elena Reshetova
    Cc: Hagen Paul Pfeifer
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Bottomley
    Cc: "Kirill A. Shutemov"
    Cc: Mark Rutland
    Cc: Matthew Wilcox
    Cc: Michael Kerrisk
    Cc: Palmer Dabbelt
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Roman Gushchin
    Cc: Shakeel Butt
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: Tycho Andersen
    Cc: Will Deacon
    Cc: kernel test robot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Many stack traces are similar so there are many similar arrays.
    Stackdepot saves each unique stack only once.

    Replace field addrs in struct track with depot_stack_handle_t handle. Use
    stackdepot to save stack trace.

    The benefits are smaller memory overhead and the possibility to aggregate
    per-cache statistics in the future using the stackdepot handle instead of
    matching stacks manually.
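
    A hedged sketch of the save path after the conversion; the helper name is
    illustrative, but stack_trace_save() and stack_depot_save() are the
    generic <linux/stacktrace.h> and <linux/stackdepot.h> APIs, and
    TRACK_ADDRS_COUNT is SLUB's existing bound on recorded frames:

    static depot_stack_handle_t save_track_stack(gfp_t flags)
    {
        unsigned long entries[TRACK_ADDRS_COUNT];
        unsigned int nr_entries;

        /* skip the innermost allocator frames */
        nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
        return stack_depot_save(entries, nr_entries, flags);
    }

        /* in set_track(): one handle replaces the addrs[] array */
        p->handle = save_track_stack(GFP_NOWAIT);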

    [rdunlap@infradead.org: rename save_stack_trace()]
    Link: https://lkml.kernel.org/r/20210513051920.29320-1-rdunlap@infradead.org
    [vbabka@suse.cz: fix lockdep splat]
    Link: https://lkml.kernel.org/r/20210516195150.26740-1-vbabka@suse.cz
    Link: https://lkml.kernel.org/r/20210414163434.4376-1-glittao@gmail.com

    Signed-off-by: Oliver Glitta
    Signed-off-by: Randy Dunlap
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oliver Glitta
     

05 Jul, 2021

3 commits

  • …git/paulmck/linux-rcu

    Pull RCU updates from Paul McKenney:

    - Bitmap parsing support for "all" as an alias for all bits

    - Documentation updates

    - Miscellaneous fixes, including some that overlap into mm and lockdep

    - kvfree_rcu() updates

    - mem_dump_obj() updates, with acks from one of the slab-allocator
    maintainers

    - RCU NOCB CPU updates, including limited deoffloading

    - SRCU updates

    - Tasks-RCU updates

    - Torture-test updates

    * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
    tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
    rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
    rcu: Add missing __releases() annotation
    rcu: Remove obsolete rcu_read_unlock() deadlock commentary
    rcu: Improve comments describing RCU read-side critical sections
    rcu: Create an unrcu_pointer() to remove __rcu from a pointer
    srcu: Early test SRCU polling start
    rcu: Fix various typos in comments
    rcu/nocb: Unify timers
    rcu/nocb: Prepare for fine-grained deferred wakeup
    rcu/nocb: Only cancel nocb timer if not polling
    rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
    rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
    rcu/nocb: Allow de-offloading rdp leader
    rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
    rcu: Don't penalize priority boosting when there is nothing to boost
    rcu: Point to documentation of ordering guarantees
    rcu: Make rcu_gp_cleanup() be noinline for tracing
    rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
    rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
    ...

    Linus Torvalds
     
  • Pull memblock updates from Mike Rapoport:
    "Fix arm crashes caused by holes in the memory map.

    The coordination between freeing of unused memory map, pfn_valid() and
    core mm assumptions about validity of the memory map in various ranges
    was not designed for complex layouts of the physical memory with a lot
    of holes all over the place.

    Kefeng Wang reported crashes in move_freepages() on a system with the
    following memory layout [1]:

    node 0: [mem 0x0000000080a00000-0x00000000855fffff]
    node 0: [mem 0x0000000086a00000-0x0000000087dfffff]
    node 0: [mem 0x000000008bd00000-0x000000008c4fffff]
    node 0: [mem 0x000000008e300000-0x000000008ecfffff]
    node 0: [mem 0x0000000090d00000-0x00000000bfffffff]
    node 0: [mem 0x00000000cc000000-0x00000000dc9fffff]
    node 0: [mem 0x00000000de700000-0x00000000de9fffff]
    node 0: [mem 0x00000000e0800000-0x00000000e0bfffff]
    node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
    node 0: [mem 0x00000000fda00000-0x00000000ffffefff]

    These crashes can be mitigated by enabling CONFIG_HOLES_IN_ZONE on ARM
    and essentially turning pfn_valid_within() to pfn_valid() instead of
    having it hardwired to 1 on that architecture, but this would require
    to keep CONFIG_HOLES_IN_ZONE solely for this purpose.

    A cleaner approach is to update ARM's implementation of pfn_valid() to
    take into account the rounding of the freed memory map to pageblock
    boundaries, and to make sure it returns true for PFNs that have memory map
    entries even if there is no physical memory backing those PFNs"

    Link: https://lore.kernel.org/lkml/2a1592ad-bc9d-4664-fd19-f7448a37edc0@huawei.com [1]

    * tag 'memblock-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
    arm: extend pfn_valid to take into account freed memory map alignment
    memblock: ensure there is no overflow in memblock_overlaps_region()
    memblock: align freed memory map on pageblock boundaries with SPARSEMEM
    memblock: free_unused_memmap: use pageblock units instead of MAX_ORDER

    Linus Torvalds
     
  • Prior to "percpu: implement partial chunk depopulation",
    pcpu_depopulate_chunk() was called only on the destruction path. This
    meant the virtual address range was on its way back to vmalloc, which
    would handle flushing the TLBs for us.

    However, with pcpu_reclaim_populated(), we are now calling
    pcpu_depopulate_chunk() during the active lifecycle of a chunk.
    Therefore, we need to flush the TLB as well; otherwise we can end up
    accessing the wrong page through an invalid TLB mapping, as reported in
    [1].

    [1] https://lore.kernel.org/lkml/20210702191140.GA3166599@roeck-us.net/
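
    A hedged sketch of the fix's shape (pcpu_post_unmap_tlb_flush() is the
    flush helper already used on percpu's free path; exact placement within
    pcpu_reclaim_populated() abbreviated):

        /* pages [rs, re) of this chunk have just been unmapped ... */
        pcpu_depopulate_chunk(chunk, rs, re);
        /*
         * ... but the chunk stays alive, so flush now instead of relying
         * on vmalloc's lazy flush, which only happens on the free path.
         */
        pcpu_post_unmap_tlb_flush(chunk, rs, re);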

    Fixes: f183324133ea ("percpu: implement partial chunk depopulation")
    Reported-and-tested-by: Guenter Roeck
    Signed-off-by: Dennis Zhou

    Dennis Zhou
     

04 Jul, 2021

1 commit

  • Pull iov_iter updates from Al Viro:
    "iov_iter cleanups and fixes.

    There are followups, but this is what had sat in -next this cycle. IMO
    the macro forest in there became much thinner and easier to follow..."

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller
    clean up copy_mc_pipe_to_iter()
    pipe_zero(): we don't need no stinkin' kmap_atomic()...
    iov_iter: clean csum_and_copy_...() primitives up a bit
    copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases
    copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases
    iterate_xarray(): only of the first iteration we might get offset != 0
    pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}
    iov_iter: make iterator callbacks use base and len instead of iovec
    iov_iter: make the amount already copied available to iterator callbacks
    iov_iter: get rid of separate bvec and xarray callbacks
    iov_iter: teach iterate_{bvec,xarray}() about possible short copies
    iterate_bvec(): expand bvec.h macro forest, massage a bit
    iov_iter: unify iterate_iovec and iterate_kvec
    iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec
    iterate_and_advance(): get rid of magic in case when n is 0
    csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()
    iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
    [xarray] iov_iter_npages(): just use DIV_ROUND_UP()
    iov_iter_npages(): don't bother with iterate_all_kinds()
    ...

    Linus Torvalds
     

03 Jul, 2021

1 commit

  • Merge more updates from Andrew Morton:
    "190 patches.

    Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
    vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
    migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
    zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
    core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
    signals, exec, kcov, selftests, compress/decompress, and ipc"

    * emailed patches from Andrew Morton : (190 commits)
    ipc/util.c: use binary search for max_idx
    ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
    ipc: use kmalloc for msg_queue and shmid_kernel
    ipc sem: use kvmalloc for sem_undo allocation
    lib/decompressors: remove set but not used variabled 'level'
    selftests/vm/pkeys: exercise x86 XSAVE init state
    selftests/vm/pkeys: refill shadow register after implicit kernel write
    selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
    selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
    kcov: add __no_sanitize_coverage to fix noinstr for all architectures
    exec: remove checks in __register_bimfmt()
    x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
    hfsplus: report create_date to kstat.btime
    hfsplus: remove unnecessary oom message
    nilfs2: remove redundant continue statement in a while-loop
    kprobes: remove duplicated strong free_insn_page in x86 and s390
    init: print out unknown kernel parameters
    checkpatch: do not complain about positive return values starting with EPOLL
    checkpatch: improve the indented label test
    checkpatch: scripts/spdxcheck.py now requires python3
    ...

    Linus Torvalds
     

02 Jul, 2021

19 commits

  • Pull percpu updates from Dennis Zhou:

    - percpu chunk depopulation - depopulate backing pages for chunks with
    empty pages when we exceed a global threshold without those pages.
    This lets us reclaim a portion of memory that would previously be
    lost until the full chunk would be freed (possibly never).

    - memcg accounting cleanup - previously separate chunks were managed
    for normal allocations and __GFP_ACCOUNT allocations. These are now
    consolidated which cleans up the code quite a bit.

    - a few misc clean ups for clang warnings

    * 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
    percpu: optimize locking in pcpu_balance_workfn()
    percpu: initialize best_upa variable
    percpu: rework memcg accounting
    mm, memcg: introduce mem_cgroup_kmem_disabled()
    mm, memcg: mark cgroup_memory_nosocket, nokmem and noswap as __ro_after_init
    percpu: make symbol 'pcpu_free_slot' static
    percpu: implement partial chunk depopulation
    percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1
    percpu: factor out pcpu_check_block_hint()
    percpu: split __pcpu_balance_workfn()
    percpu: fix a comment about the chunks ordering

    Linus Torvalds
     
  • Some devices require exclusive write access to shared virtual memory (SVM)
    ranges to perform atomic operations on that memory. This requires CPU
    page tables to be updated to deny access whilst atomic operations are
    occurring.

    In order to do this introduce a new swap entry type
    (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for exclusive
    access by a device all page table mappings for the particular range are
    replaced with device exclusive swap entries. This causes any CPU access
    to the page to result in a fault.

    Faults are resolved by replacing the faulting entry with the original
    mapping. This results in MMU notifiers being called, which a driver uses
    to update access permissions such as revoking atomic access. After
    notifiers have been called the device will no longer have exclusive access
    to the region.

    Walking of the page tables to find the target pages is handled by
    get_user_pages() rather than a direct page table walk. A direct page
    table walk similar to what migrate_vma_collect()/unmap() does could also
    have been used. However, that would have resulted in more code that
    duplicates functionality get_user_pages() already provides, as page
    faulting is required to make the PTEs present and to break COW.
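
    A hedged driver-side sketch of the new interface; make_device_exclusive_range()
    is what this patch adds, while mm, addr and my_owner_token stand in for
    the driver's own context:

        struct page *page;
        int npages;

        /*
         * Replace the CPU mappings of one page with device-exclusive
         * entries; the owner token lets our own MMU notifier callback
         * recognise and skip the invalidation this triggers.
         */
        npages = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
                                             &page, &my_owner_token);
        if (npages != 1)
            return -EBUSY;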

    [dan.carpenter@oracle.com: fix signedness bug in make_device_exclusive_range()]
    Link: https://lkml.kernel.org/r/YNIz5NVnZ5GiZ3u1@mwanda

    Link: https://lkml.kernel.org/r/20210616105937.23201-8-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Signed-off-by: Dan Carpenter
    Reviewed-by: Christoph Hellwig
    Cc: Ben Skeggs
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Currently if copy_nonpresent_pte() returns a non-zero value it is assumed
    to be a swap entry which requires further processing outside the loop in
    copy_pte_range() after dropping locks. This prevents other values being
    returned to signal conditions such as failure which a subsequent change
    requires.

    Instead make copy_nonpresent_pte() return an error code if further
    processing is required and read the value for the swap entry in the main
    loop under the ptl.

    Link: https://lkml.kernel.org/r/20210616105937.23201-7-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Peter Xu
    Cc: Ben Skeggs
    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • MMU notifier ranges have a migrate_pgmap_owner field which is used by
    drivers to store a pointer. This is subsequently used by the driver
    callback to filter MMU_NOTIFY_MIGRATE events. Other notifier event types
    can also benefit from this filtering, so rename the 'migrate_pgmap_owner'
    field to 'owner' and create a new notifier initialisation function to
    initialise this field.
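
    A hedged sketch of the filtering this enables in a driver's
    interval-notifier callback (the callback shape follows
    mmu_interval_notifier_ops; my_owner_token is illustrative):

    static bool my_invalidate(struct mmu_interval_notifier *mni,
                              const struct mmu_notifier_range *range,
                              unsigned long cur_seq)
    {
        /* Events we triggered ourselves carry our owner token: skip them. */
        if (range->owner == &my_owner_token)
            return true;

        mmu_interval_set_seq(mni, cur_seq);
        /* tear down the device mappings covering range->start..range->end */
        return true;
    }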

    Link: https://lkml.kernel.org/r/20210616105937.23201-6-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Suggested-by: Peter Xu
    Reviewed-by: Peter Xu
    Cc: Ben Skeggs
    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Migration is currently implemented as a mode of operation for
    try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag
    or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE.

    However it does not have much in common with the rest of the unmap
    functionality of try_to_unmap_one() and thus splitting it into a separate
    function reduces the complexity of try_to_unmap_one() making it more
    readable.

    Several simplifications can also be made in try_to_migrate_one() based on
    the following observations:

    - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK.
    - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON.
    - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH.

    TTU_SPLIT_FREEZE is a special case of migration used when splitting an
    anonymous page. This is most easily dealt with by calling the correct
    function from unmap_page() in mm/huge_memory.c - either try_to_migrate()
    for PageAnon or try_to_unmap().

    Link: https://lkml.kernel.org/r/20210616105937.23201-5-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ralph Campbell
    Cc: Ben Skeggs
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Peter Xu
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • The behaviour of try_to_unmap_one() is difficult to follow because it
    performs different operations based on a fairly large set of flags used in
    different combinations.

    TTU_MUNLOCK is one such flag. However, it is exclusively used by
    try_to_munlock(), which specifies no other flags. Therefore, rather than
    overload try_to_unmap_one() with unrelated behaviour, split this out into
    its own function and remove the flag.

    Link: https://lkml.kernel.org/r/20210616105937.23201-4-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Cc: Ben Skeggs
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Peter Xu
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Both migration and device private pages use special swap entries that are
    manipulated by a range of inline functions. The arguments to these are
    somewhat inconsistent, so rework them to remove flag type arguments and to
    make the arguments similar for both read and write entry creation.

    Link: https://lkml.kernel.org/r/20210616105937.23201-3-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Ralph Campbell
    Cc: Ben Skeggs
    Cc: Hugh Dickins
    Cc: John Hubbard
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Peter Xu
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Patch series "Add support for SVM atomics in Nouveau", v11.

    Introduction
    ============

    Some devices have features such as atomic PTE bits that can be used to
    implement atomic access to system memory. To support atomic operations to
    a shared virtual memory page such a device needs access to that page which
    is exclusive of the CPU. This series introduces a mechanism to
    temporarily unmap pages granting exclusive access to a device.

    These changes are required to support OpenCL atomic operations in Nouveau
    to shared virtual memory (SVM) regions allocated with the
    CL_MEM_SVM_ATOMICS clSVMAlloc flag. A more complete description of the
    OpenCL SVM feature is available at
    https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/
    OpenCL_API.html#_shared_virtual_memory .

    Implementation
    ==============

    Exclusive device access is implemented by adding a new swap entry type
    (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
    difference is that on fault the original entry is immediately restored by
    the fault handler instead of waiting.

    Restoring the entry triggers calls to MMU notifiers, which allows a device
    driver to revoke the atomic access permission from the GPU prior to the
    CPU finalising the entry.

    Patches
    =======

    Patches 1 & 2 refactor existing migration and device private entry
    functions.

    Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
    functionality into separate functions - try_to_migrate_one() and
    try_to_munlock_one().

    Patch 5 renames some existing code but does not introduce functionality.

    Patch 6 is a small clean-up to swap entry handling in copy_pte_range().

    Patch 7 contains the bulk of the implementation for device exclusive
    memory.

    Patch 8 contains some additions to the HMM selftests to ensure everything
    works as expected.

    Patch 9 is a cleanup for the Nouveau SVM implementation.

    Patch 10 contains the implementation of atomic access for the Nouveau
    driver.

    Testing
    =======

    This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
    which checks that GPU atomic accesses to system memory are atomic.
    Without this series the test fails as there is no way of write-protecting
    the page mapping which results in the device clobbering CPU writes. For
    reference the test is available at
    https://ozlabs.org/~apopple/opencl_svm_atomics/

    Further testing has been performed by adding support for testing exclusive
    access to the hmm-tests kselftests.

    This patch (of 10):

    Remove multiple similar inline functions for dealing with different types
    of special swap entries.

    Both migration and device private swap entries use the swap offset to
    store a pfn. Instead of multiple inline functions to obtain a struct page
    for each swap entry type use a common function pfn_swap_entry_to_page().
    Also open-code the various entry_to_pfn() functions, as this results in
    shorter code that is easier to understand.
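
    A hedged sketch of the consolidated helper in use; pfn_swap_entry_to_page()
    is the name this patch introduces, and the is_*_entry() predicates are
    pre-existing:

        swp_entry_t entry = pte_to_swp_entry(pte);
        struct page *page = NULL;

        /* one helper replaces migration_entry_to_page() and
         * device_private_entry_to_page() */
        if (is_migration_entry(entry) || is_device_private_entry(entry))
            page = pfn_swap_entry_to_page(entry);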

    Link: https://lkml.kernel.org/r/20210616105937.23201-1-apopple@nvidia.com
    Link: https://lkml.kernel.org/r/20210616105937.23201-2-apopple@nvidia.com
    Signed-off-by: Alistair Popple
    Reviewed-by: Ralph Campbell
    Reviewed-by: Christoph Hellwig
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Hugh Dickins
    Cc: Peter Xu
    Cc: Shakeel Butt
    Cc: Ben Skeggs
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair Popple
     
  • Unconditionally use unbound work queue, and not just if wq_power_efficient
    is true. Because if the system is idle, KFENCE may wait, and by being run
    on the unbound work queue, we permit the scheduler to make better
    scheduling decisions and not require pinning KFENCE to the same CPU upon
    waking up.
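
    A hedged sketch of the resulting queueing call (kfence_timer and
    kfence_sample_interval are the pre-existing delayed work and sampling
    interval in mm/kfence/core.c):

        /* Always use the unbound workqueue, not only when wq_power_efficient
         * makes system_power_efficient_wq unbound. */
        queue_delayed_work(system_unbound_wq, &kfence_timer,
                           msecs_to_jiffies(kfence_sample_interval));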

    Link: https://lkml.kernel.org/r/20210521111630.472579-1-elver@google.com
    Fixes: 36f0b35d0894 ("kfence: use power-efficient work queue to run delayed work")
    Signed-off-by: Marco Elver
    Reported-by: Hillf Danton
    Reviewed-by: Alexander Potapenko
    Cc: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco Elver
     
  • make W=1 generates the following warning in page_alloc.c for allnoconfig

    mm/page_alloc.c:2670:5: warning: no previous prototype for `find_suitable_fallback' [-Wmissing-prototypes]
    int find_suitable_fallback(struct free_area *area, unsigned int order,
    ^~~~~~~~~~~~~~~~~~~~~~

    find_suitable_fallback is only shared outside of page_alloc.c for
    CONFIG_COMPACTION but, to suppress the warning, move the prototype outside
    of CONFIG_COMPACTION. It is not worth the effort at this time to find a
    clever way of allowing compaction.c to share the code or avoid the use
    entirely as the function is called on relatively slow paths.

    Link: https://lkml.kernel.org/r/20210520084809.8576-14-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning in mmap_lock.c for allnoconfig

    mm/mmap_lock.c:213:6: warning: no previous prototype for `__mmap_lock_do_trace_start_locking' [-Wmissing-prototypes]
    void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    mm/mmap_lock.c:219:6: warning: no previous prototype for `__mmap_lock_do_trace_acquire_returned' [-Wmissing-prototypes]
    void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    mm/mmap_lock.c:226:6: warning: no previous prototype for `__mmap_lock_do_trace_released' [-Wmissing-prototypes]
    void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)

    On !CONFIG_TRACING configurations, the code is dead so put it behind an
    #ifdef.

    [cuibixuan@huawei.com: fix warning when CONFIG_TRACING is not defined]
    Link: https://lkml.kernel.org/r/20210531033426.74031-1-cuibixuan@huawei.com

    Link: https://lkml.kernel.org/r/20210520084809.8576-13-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Signed-off-by: Bixuan Cui
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for z3fold_pool

    mm/z3fold.c:171: warning: Function parameter or member 'zpool' not described in 'z3fold_pool'
    mm/z3fold.c:171: warning: Function parameter or member 'zpool_ops' not described in 'z3fold_pool'

    Commit 9a001fc19ccc ("z3fold: the 3-fold allocator for compressed pages")
    simply did not document the fields at the time. Add rudimentary
    documentation.

    Link: https://lkml.kernel.org/r/20210520084809.8576-11-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for zbud_pool

    mm/zbud.c:105: warning: Function parameter or member 'zpool' not described in 'zbud_pool'
    mm/zbud.c:105: warning: Function parameter or member 'zpool_ops' not described in 'zbud_pool'

    Commit 479305fd7172 ("zpool: remove zpool_evict()") removed the
    zpool_evict helper and added the associated zpool and operations structure
    in struct zbud_pool but did not add documentation for the fields. Add
    rudimentary documentation.

    Link: https://lkml.kernel.org/r/20210520084809.8576-10-mgorman@techsingularity.net
    Fixes: 479305fd7172 ("zpool: remove zpool_evict()")
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for __remove_memory

    mm/memory_hotplug.c:2044: warning: expecting prototype for remove_memory(). Prototype was for __remove_memory() instead

    Commit eca499ab3749 ("mm/hotplug: make remove_memory() interface usable")
    introduced the kerneldoc comment and function but the kerneldoc name and
    function name did not match.

    Link: https://lkml.kernel.org/r/20210520084809.8576-9-mgorman@techsingularity.net
    Fixes: eca499ab3749 ("mm/hotplug: make remove_memory() interface usable")
    Signed-off-by: Mel Gorman
    Reviewed-by: David Hildenbrand
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for try_online_node

    mm/memory_hotplug.c:1087: warning: expecting prototype for try_online_node(). Prototype was for __try_online_node() instead

    Commit b9ff036082cd ("mm/memory_hotplug.c: make add_memory_resource use
    __try_online_node") renamed the function but did not update the associated
    kerneldoc. The function is static and somewhat specialised in nature so
    it's not clear it warrants being a kerneldoc by moving the comment to
    try_online_node. Hence, leave the comment of the internal helper in place
    but leave it out of kerneldoc and correct the function name in the
    comment.

    Link: https://lkml.kernel.org/r/20210520084809.8576-8-mgorman@techsingularity.net
    Fixes: b9ff036082cd ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node")
    Signed-off-by: Mel Gorman
    Reviewed-by: David Hildenbrand
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for mem_cgroup_calculate_protection

    mm/memcontrol.c:6468: warning: expecting prototype for mem_cgroup_protected(). Prototype was for mem_cgroup_calculate_protection() instead

    Commit 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from
    protection checks") changed the function definition but not the associated
    kerneldoc comment.

    Link: https://lkml.kernel.org/r/20210520084809.8576-7-mgorman@techsingularity.net
    Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Chris Down
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for mm/mapping_dirty_helpers.c

    mm/mapping_dirty_helpers.c:325: warning: duplicate section name 'Note'

    The helper function is very specific to one driver -- vmwgfx. While the
    two notes are separate, all of it needs to be taken into account when
    using the helper so make it one note.

    Link: https://lkml.kernel.org/r/20210520084809.8576-5-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for mm/page_alloc.c

    mm/page_alloc.c:3651:15: warning: no previous prototype for `should_fail_alloc_page' [-Wmissing-prototypes]
    noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
    ^~~~~~~~~~~~~~~~~~~~~~

    This function is deliberately split out for BPF to allow errors to be
    injected. The function is not used anywhere else so it is local to the
    file. Make it static which should still allow error injection to be used
    similar to how block/blk-core.c:should_fail_bio() works.
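
    A hedged sketch of the resulting arrangement; ALLOW_ERROR_INJECTION()
    (from <asm-generic/error-injection.h>) is what keeps the now-static
    function usable as a BPF error-injection point:

    static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
    {
        return __should_fail_alloc_page(gfp_mask, order);
    }
    ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);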

    Link: https://lkml.kernel.org/r/20210520084809.8576-4-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • make W=1 generates the following warning for mm/vmalloc.c

    mm/vmalloc.c:1599:6: warning: no previous prototype for `set_iounmap_nonlazy' [-Wmissing-prototypes]
    void set_iounmap_nonlazy(void)
    ^~~~~~~~~~~~~~~~~~~

    This is an arch-generic function only used by x86. On other arches, it's
    dead code. Include the header with the definition and make it x86-64
    specific.

    Link: https://lkml.kernel.org/r/20210520084809.8576-3-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Dan Streetman
    Cc: David Hildenbrand
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman