12 Jul, 2022

1 commit

  • [ Upstream commit a8749a35c39903120ec421ef2525acc8e0daa55c ]

    Linux has dozens of occurrences of vmalloc(array_size()) and
    vzalloc(array_size()). Allow the code to be simplified by providing
    vmalloc_array() and vcalloc(), as well as the underscored variants
    that let the caller specify the GFP flags.
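
    The overflow check these helpers fold in can be sketched in userspace;
    vmalloc_array_sketch() and vcalloc_sketch() below are hypothetical
    stand-ins (malloc() playing the role of vmalloc()), not the kernel
    implementations:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace sketch of vmalloc_array()/vcalloc():
 * refuse n * size when the multiplication would wrap, which is the
 * mistake array_size() guards against at every call site today. */
static void *vmalloc_array_sketch(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;             /* overflow: fail the allocation */
    return malloc(n * size);     /* malloc() stands in for vmalloc() */
}

static void *vcalloc_sketch(size_t n, size_t size)
{
    void *p = vmalloc_array_sketch(n, size);

    if (p)
        memset(p, 0, n * size);  /* vcalloc() additionally zeroes */
    return p;
}
```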

    Acked-by: Michal Hocko
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin

    Paolo Bonzini
     

30 May, 2022

1 commit

  • commit 5ad7dd882e45d7fe432c32e896e2aaa0b21746ea upstream.

    randomize_page is an mm function. It is documented like one. It contains
    the history of one. It has the naming convention of one. It looks
    just like another very similar function in mm, randomize_stack_top().
    And it has always been maintained and updated by mm people. There is no
    need for it to be in random.c. In the "which shape does not look like
    the other ones" test, pointing to randomize_page() is correct.

    So move randomize_page() into mm/util.c, right next to the similar
    randomize_stack_top() function.

    This commit contains no actual code changes.

    Cc: Andrew Morton
    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Greg Kroah-Hartman

    Jason A. Donenfeld
     

09 Mar, 2022

1 commit

  • commit 0708a0afe291bdfe1386d74d5ec1f0c27e8b9168 upstream.

    syzkaller was recently triggering an oversized kvmalloc() warning via
    xdp_umem_create().

    The triggered warning was added back in 7661809d493b ("mm: don't allow
    oversized kvmalloc() calls"). The warning for huge kvmalloc sizes was
    added as a reaction to a security bug where the size was more than
    UINT_MAX but not everything was prepared to handle unsigned long sizes.

    Anyway, the AF_XDP related call trace from this syzkaller report was:

    kvmalloc include/linux/mm.h:806 [inline]
    kvmalloc_array include/linux/mm.h:824 [inline]
    kvcalloc include/linux/mm.h:829 [inline]
    xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
    xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
    xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
    xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
    __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
    __do_sys_setsockopt net/socket.c:2187 [inline]
    __se_sys_setsockopt net/socket.c:2184 [inline]
    __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

    Björn mentioned that requests for >2GB allocation can still be valid:

    The structure that is being allocated is the page-pinning accounting.
    AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
    still fewer than what memcg allows (PAGE_COUNTER_MAX is LONG_MAX/
    PAGE_SIZE on 64-bit systems). [...]

    I could just change from U32_MAX to INT_MAX, but as I stated earlier
    that has a hacky feeling to it. [...] From my perspective, the code
    isn't broken, with the memcg limits in consideration. [...]

    Linus says:

    [...] Pretty much every time this has come up, the kernel warning has
    shown that yes, the code was broken and there really wasn't a reason
    for doing allocations that big.

    Of course, some people would be perfectly fine with the allocation
    failing, they just don't want the warning. I didn't want __GFP_NOWARN
    to shut it up originally because I wanted people to see all those
    cases, but these days I think we can just say "yeah, people can shut
    it up explicitly by saying 'go ahead and fail this allocation, don't
    warn about it'".

    So enough time has passed that by now I'd certainly be ok with [it].

    Thus allow call-sites to silence such userspace triggered splats if the
    allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
    to kvcalloc() this is already the case, so nothing else needed there.
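
    The resulting check can be modeled in userspace; kvmalloc_size_ok()
    below is a hypothetical stand-in, not the kernel function, showing that
    an oversized request still fails either way and only warns when
    __GFP_NOWARN is absent:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define GFP_NOWARN 0x1u  /* illustrative stand-in for __GFP_NOWARN */

/* Hypothetical model of the size check in kvmalloc_node() after this
 * patch: requests above INT_MAX fail regardless; the splat fires only
 * when the caller did not pass __GFP_NOWARN. */
static bool kvmalloc_size_ok(size_t size, unsigned int flags, bool *warned)
{
    *warned = false;
    if (size > (size_t)INT_MAX) {
        if (!(flags & GFP_NOWARN))
            *warned = true;   /* WARN_ON_ONCE() in the real code */
        return false;
    }
    return true;
}
```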

    Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
    Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
    Suggested-by: Linus Torvalds
    Signed-off-by: Daniel Borkmann
    Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
    Cc: Björn Töpel
    Cc: Magnus Karlsson
    Cc: Willy Tarreau
    Cc: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Andrii Nakryiko
    Cc: Jakub Kicinski
    Cc: David S. Miller
    Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
    Reviewed-by: Leon Romanovsky
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

25 Sep, 2021

1 commit

  • We get an unexpected value of /proc/sys/vm/overcommit_memory after
    running the following program:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/proc/sys/vm/overcommit_memory", O_RDWR);

        write(fd, "1", 1);
        write(fd, "2", 1);
        close(fd);
        return 0;
    }

    write(fd, "2", 1) will pass *ppos = 1 to proc_dointvec_minmax.
    proc_dointvec_minmax will return 0 without setting new_policy.

    t.data = &new_policy;
    ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos)
      --> do_proc_dointvec
        --> __do_proc_dointvec
              if (write) {
                  if (proc_first_pos_non_zero_ignore(ppos, table))
                      goto out;

    sysctl_overcommit_memory = new_policy;

    so sysctl_overcommit_memory will be set to an uninitialized value.

    Check whether new_policy has been changed by proc_dointvec_minmax.
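
    The fix pattern can be sketched in userspace. Everything below is an
    illustrative model (dointvec_minmax_stub() stands in for
    proc_dointvec_minmax() returning success without writing the value):
    seed new_policy with a sentinel and commit only if the handler actually
    stored something:

```c
#include <assert.h>
#include <errno.h>

static int sysctl_overcommit_memory;

/* Stand-in for proc_dointvec_minmax(): when *ppos is already nonzero
 * it returns 0 ("success") without ever writing *val, mirroring the
 * proc_first_pos_non_zero_ignore() early-out shown above. */
static int dointvec_minmax_stub(int *val, long pos, int input)
{
    if (pos != 0)
        return 0;
    *val = input;
    return 0;
}

/* Sketch of the fix: initialize new_policy to a sentinel and commit
 * it only when the handler actually changed it. */
static int overcommit_handler_sketch(long pos, int input)
{
    int new_policy = -1;
    int ret = dointvec_minmax_stub(&new_policy, pos, input);

    if (ret || new_policy == -1)
        return ret ? ret : -EINVAL;
    sysctl_overcommit_memory = new_policy;
    return 0;
}
```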

    Link: https://lkml.kernel.org/r/20210923020524.13289-1-chenjun102@huawei.com
    Fixes: 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
    Signed-off-by: Chen Jun
    Acked-by: Michal Hocko
    Reviewed-by: Feng Tang
    Reviewed-by: Kefeng Wang
    Cc: Rui Xiang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Jun
     

03 Sep, 2021

1 commit

  • 'kvmalloc()' is a convenience function for people who want to do a
    kmalloc() but fall back on vmalloc() if there aren't enough physically
    contiguous pages, or if the allocation is larger than what kmalloc()
    supports.

    However, let's make sure it doesn't get _too_ easy to do crazy things
    with it. In particular, don't allow big allocations that could be due
    to integer overflow or underflow. So make sure the allocation size fits
    in an 'int', to protect against trivial integer conversion issues.

    Acked-by: Willy Tarreau
    Cc: Kees Cook
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Aug, 2021

1 commit

  • During log recovery of an XFS filesystem with 64kB directory
    buffers, rebuilding a buffer split across two log records results
    in a memory allocation warning from krealloc like this:

    xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
    XFS (dm-0): Unmounting Filesystem
    XFS (dm-0): Mounting V5 Filesystem
    XFS (dm-0): Starting recovery (logdev: internal)
    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40
    .....
    RIP: 0010:get_page_from_freelist+0xdee/0xe40
    Call Trace:
    ? complete+0x3f/0x50
    __alloc_pages+0x16f/0x300
    alloc_pages+0x87/0x110
    kmalloc_order+0x2c/0x90
    kmalloc_order_trace+0x1d/0x90
    __kmalloc_track_caller+0x215/0x270
    ? xlog_recover_add_to_cont_trans+0x63/0x1f0
    krealloc+0x54/0xb0
    xlog_recover_add_to_cont_trans+0x63/0x1f0
    xlog_recovery_process_trans+0xc1/0xd0
    xlog_recover_process_ophdr+0x86/0x130
    xlog_recover_process_data+0x9f/0x160
    xlog_recover_process+0xa2/0x120
    xlog_do_recovery_pass+0x40b/0x7d0
    ? __irq_work_queue_local+0x4f/0x60
    ? irq_work_queue+0x3a/0x50
    xlog_do_log_recovery+0x70/0x150
    xlog_do_recover+0x38/0x1d0
    xlog_recover+0xd8/0x170
    xfs_log_mount+0x181/0x300
    xfs_mountfs+0x4a1/0x9b0
    xfs_fs_fill_super+0x3c0/0x7b0
    get_tree_bdev+0x171/0x270
    ? suffix_kstrtoint.constprop.0+0xf0/0xf0
    xfs_fs_get_tree+0x15/0x20
    vfs_get_tree+0x24/0xc0
    path_mount+0x2f5/0xaf0
    __x64_sys_mount+0x108/0x140
    do_syscall_64+0x3a/0x70
    entry_SYSCALL_64_after_hwframe+0x44/0xae

    Essentially, we are taking a multi-order allocation from kmem_alloc()
    (which has an open-coded no-fail, no-warn loop) and then reallocating
    it out to 64kB using krealloc(__GFP_NOFAIL), and that is then
    triggering the above warning.

    This is a regression caused by converting this code from an open-coded
    no-fail/no-warn reallocation loop to using __GFP_NOFAIL.

    What we actually need here is kvrealloc(), so that if contiguous
    page allocation fails we fall back to vmalloc() and we don't
    get nasty warnings happening in XFS.
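
    The kvrealloc() semantics wanted here can be sketched in userspace,
    with realloc()/malloc() standing in for krealloc()/vmalloc(); the
    function name and structure are illustrative, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace sketch of kvrealloc(): try the contiguous path first and,
 * if that fails, fall back to a fresh non-contiguous buffer and copy
 * the surviving data over, so no high-order allocation is forced. */
static void *kvrealloc_sketch(void *old, size_t oldsize, size_t newsize)
{
    void *p = realloc(old, newsize);    /* krealloc() stand-in */

    if (p)
        return p;

    p = malloc(newsize);                /* vmalloc() stand-in */
    if (!p)
        return NULL;                    /* old buffer is still valid */
    memcpy(p, old, oldsize < newsize ? oldsize : newsize);
    free(old);                          /* kfree() the old buffer */
    return p;
}
```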

    Fixes: 771915c4f688 ("xfs: remove kmem_realloc()")
    Signed-off-by: Dave Chinner
    Acked-by: Mel Gorman
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

13 Jul, 2021

1 commit

  • Rewrite copy_huge_page() and move it into mm/util.c so it's always
    available. Fixes an exposure of uninitialised memory on configurations
    with HUGETLB and UFFD enabled and MIGRATION disabled.

    Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Mike Kravetz
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

05 Jul, 2021

1 commit

  • …git/paulmck/linux-rcu

    Pull RCU updates from Paul McKenney:

    - Bitmap parsing support for "all" as an alias for all bits

    - Documentation updates

    - Miscellaneous fixes, including some that overlap into mm and lockdep

    - kvfree_rcu() updates

    - mem_dump_obj() updates, with acks from one of the slab-allocator
    maintainers

    - RCU NOCB CPU updates, including limited deoffloading

    - SRCU updates

    - Tasks-RCU updates

    - Torture-test updates

    * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
    tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
    rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
    rcu: Add missing __releases() annotation
    rcu: Remove obsolete rcu_read_unlock() deadlock commentary
    rcu: Improve comments describing RCU read-side critical sections
    rcu: Create an unrcu_pointer() to remove __rcu from a pointer
    srcu: Early test SRCU polling start
    rcu: Fix various typos in comments
    rcu/nocb: Unify timers
    rcu/nocb: Prepare for fine-grained deferred wakeup
    rcu/nocb: Only cancel nocb timer if not polling
    rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
    rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
    rcu/nocb: Allow de-offloading rdp leader
    rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
    rcu: Don't penalize priority boosting when there is nothing to boost
    rcu: Point to documentation of ordering guarantees
    rcu: Make rcu_gp_cleanup() be noinline for tracing
    rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
    rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
    ...

    Linus Torvalds
     

01 Jul, 2021

1 commit

  • A driver might set a page logically offline -- PageOffline() -- and turn
    the page inaccessible in the hypervisor; after that, access to the page's
    content can be fatal. One example is virtio-mem; while unplugged memory
    -- marked as PageOffline() -- can currently be read in the hypervisor,
    this will no longer be the case in the future, for example when a
    virtio-mem device is backed by huge pages in the hypervisor.

    Some special PFN walkers -- i.e., /proc/kcore -- read content of random
    pages after checking PageOffline(); however, these PFN walkers can race
    with drivers that set PageOffline().

    Let's introduce page_offline_(begin|end|freeze|thaw) for synchronizing.

    page_offline_freeze()/page_offline_thaw() allows for a subsystem to
    synchronize with such drivers, achieving that a page cannot be set
    PageOffline() while frozen.

    page_offline_begin()/page_offline_end() is used by drivers that care about
    such races when setting a page PageOffline().

    For simplicity, use a rwsem for now; neither drivers nor users are
    performance sensitive.
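
    The locking discipline can be modeled with a toy single-threaded rwsem
    (one plausible mapping, not the kernel's rw_semaphore): PFN walkers
    freeze/thaw as readers, so several can run at once, while a driver
    flipping PageOffline() takes begin/end as the exclusive writer:

```c
#include <assert.h>

/* Toy model of the rwsem discipline described above; the asserts
 * encode the mutual-exclusion rules the real rw_semaphore enforces. */
struct toy_rwsem { int readers; int writers; };
static struct toy_rwsem page_offline_rwsem;

static void page_offline_freeze(void)   /* walker enters (shared) */
{
    assert(page_offline_rwsem.writers == 0);
    page_offline_rwsem.readers++;
}

static void page_offline_thaw(void)     /* walker leaves */
{
    page_offline_rwsem.readers--;
}

static void page_offline_begin(void)    /* driver marking PageOffline() */
{
    assert(page_offline_rwsem.readers == 0 &&
           page_offline_rwsem.writers == 0);
    page_offline_rwsem.writers = 1;
}

static void page_offline_end(void)      /* driver done */
{
    page_offline_rwsem.writers = 0;
}
```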

    Link: https://lkml.kernel.org/r/20210526093041.8800-5-david@redhat.com
    Signed-off-by: David Hildenbrand
    Acked-by: Michal Hocko
    Reviewed-by: Mike Rapoport
    Reviewed-by: Oscar Salvador
    Cc: Aili Yao
    Cc: Alexey Dobriyan
    Cc: Alex Shi
    Cc: Haiyang Zhang
    Cc: Jason Wang
    Cc: Jiri Bohac
    Cc: "K. Y. Srinivasan"
    Cc: "Matthew Wilcox (Oracle)"
    Cc: "Michael S. Tsirkin"
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: Roman Gushchin
    Cc: Stephen Hemminger
    Cc: Steven Price
    Cc: Wei Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

11 May, 2021

1 commit

  • This commit enables a stack dump for the last free of an object:

    slab kmalloc-64 start c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
    [ 20.192078] meminfo_proc_show+0x40/0x4fc
    [ 20.192263] seq_read_iter+0x18c/0x4c4
    [ 20.192430] proc_reg_read_iter+0x84/0xac
    [ 20.192617] generic_file_splice_read+0xe8/0x17c
    [ 20.192816] splice_direct_to_actor+0xb8/0x290
    [ 20.193008] do_splice_direct+0xa0/0xe0
    [ 20.193185] do_sendfile+0x2d0/0x438
    [ 20.193345] sys_sendfile64+0x12c/0x140
    [ 20.193523] ret_fast_syscall+0x0/0x58
    [ 20.193695] 0xbeeacde4
    [ 20.193822] Free path:
    [ 20.193935] meminfo_proc_show+0x5c/0x4fc
    [ 20.194115] seq_read_iter+0x18c/0x4c4
    [ 20.194285] proc_reg_read_iter+0x84/0xac
    [ 20.194475] generic_file_splice_read+0xe8/0x17c
    [ 20.194685] splice_direct_to_actor+0xb8/0x290
    [ 20.194870] do_splice_direct+0xa0/0xe0
    [ 20.195014] do_sendfile+0x2d0/0x438
    [ 20.195174] sys_sendfile64+0x12c/0x140
    [ 20.195336] ret_fast_syscall+0x0/0x58
    [ 20.195491] 0xbeeacde4

    Acked-by: Vlastimil Babka
    Co-developed-by: Vaneet Narang
    Signed-off-by: Vaneet Narang
    Signed-off-by: Maninder Singh
    Signed-off-by: Paul E. McKenney

    Maninder Singh
     

06 May, 2021

2 commits

  • s/condtion/condition/

    Link: https://lkml.kernel.org/r/20210317033439.3429411-1-unixbhaskar@gmail.com
    Signed-off-by: Bhaskar Chowdhury
    Acked-by: Randy Dunlap
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaskar Chowdhury
     
  • Simplify the code by using a temporary and reduce the object size by
    using a single call to pr_cont(). Reverse a test and unindent a block
    too.

    $ size mm/util.o* (defconfig x86-64)
       text    data   bss    dec    hex filename
       7419     372    40   7831   1e97 mm/util.o.new
       7477     372    40   7889   1ed1 mm/util.o.old

    Link: https://lkml.kernel.org/r/a6e105886338f68afd35f7a13d73bcf06b0cc732.camel@perches.com
    Signed-off-by: Joe Perches
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

01 May, 2021

1 commit

  • page_mapping_file() is only used by some architectures, and then it
    is usually only used in one place. Make it a static inline function
    so other architectures don't have to carry this dead code.

    Link: https://lkml.kernel.org/r/20210317123011.350118-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: David Hildenbrand
    Acked-by: Mike Rapoport
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

09 Mar, 2021

2 commits

  • This commit adds a few crude tests for mem_dump_obj() to rcutorture
    runs. Just to prevent bitrot, you understand!

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • The mem_dump_obj() functionality adds a few hundred bytes, which is a
    small price to pay, except on kernels built with CONFIG_PRINTK=n, where
    mem_dump_obj() messages would be suppressed anyway. This commit
    therefore makes mem_dump_obj() a static inline empty function on
    kernels built with CONFIG_PRINTK=n and excludes all of its support
    functions as well. This avoids kernel bloat on systems that cannot use
    mem_dump_obj().

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc:
    Suggested-by: Andrew Morton
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

23 Jan, 2021

3 commits

  • This commit adds vmalloc() support to mem_dump_obj(). Note that the
    vmalloc_dump_obj() function combines the checking and dumping, in
    contrast with the split between kmem_valid_obj() and kmem_dump_obj().
    The reason for the difference is that the checking in the vmalloc()
    case involves acquiring a global lock, and redundant acquisitions of
    global locks should be avoided, even on not-so-fast paths.

    Note that this change causes on-stack variables to be reported as
    vmalloc() storage from kernel_clone() or similar, depending on the degree
    of inlining that your compiler does. This is likely more helpful than
    the earlier "non-paged (local) memory".

    Cc: Andrew Morton
    Cc: Joonsoo Kim
    Cc:
    Reported-by: Andrii Nakryiko
    Acked-by: Vlastimil Babka
    Tested-by: Naresh Kamboju
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit makes mem_dump_obj() call out NULL and zero-sized pointers
    specially instead of classifying them as non-paged memory.

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Reported-by: Andrii Nakryiko
    Acked-by: Vlastimil Babka
    Tested-by: Naresh Kamboju
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • There are kernel facilities such as per-CPU reference counts that give
    error messages in generic handlers or callbacks, whose messages are
    unenlightening. In the case of per-CPU reference-count underflow, this
    is not a problem when creating a new use of this facility because in that
    case the bug is almost certainly in the code implementing that new use.
    However, trouble arises when deploying across many systems, which might
    exercise corner cases that were not seen during development and testing.
    Here, it would be really nice to get some kind of hint as to which of
    several uses the underflow was caused by.

    This commit therefore exposes a mem_dump_obj() function that takes
    a pointer to memory (which must still be allocated if it has been
    dynamically allocated) and prints available information on where that
    memory came from. This pointer can reference the middle of the block as
    well as the beginning of the block, as needed by things like RCU callback
    functions and timer handlers that might not know where the beginning of
    the memory block is. These functions and handlers can use mem_dump_obj()
    to print out better hints as to where the problem might lie.

    The information printed can depend on kernel configuration. For example,
    the allocation return address can be printed only for slab and slub,
    and even then only when the necessary debug has been enabled. For slab,
    build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
    to the next power of two or pass the SLAB_STORE_USER flag when creating
    the kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and
    boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
    if more focused use is desired. Also for slub, use CONFIG_STACKTRACE
    to enable printing of the allocation-time stack trace.

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Reported-by: Andrii Nakryiko
    [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
    [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
    [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
    [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
    [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
    [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
    Acked-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Tested-by: Naresh Kamboju
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

19 Nov, 2020

1 commit

  • Add the new vma_set_file() function to allow changing
    vma->vm_file with the necessary refcount dance.

    v2: add more users of this.
    v3: add missing EXPORT_SYMBOL, rebase on mmap cleanup,
    add comments why we drop the reference on two occasions.
    v4: make it clear that changing an anonymous vma is illegal.
    v5: move vma_set_file to mm/util.c
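
    The refcount dance can be sketched with a toy refcounted file; struct
    toy_file and its helpers are illustrative stand-ins, not the kernel's
    struct file / get_file() / fput():

```c
#include <assert.h>
#include <stddef.h>

struct toy_file { int refcount; };

static void toy_get(struct toy_file *f) { if (f) f->refcount++; }
static void toy_put(struct toy_file *f) { if (f) f->refcount--; }

struct toy_vma { struct toy_file *vm_file; };

/* Sketch of vma_set_file(): pin the new file before publishing it,
 * then drop the reference the vma held on the file it replaces. */
static void vma_set_file_sketch(struct toy_vma *vma, struct toy_file *file)
{
    struct toy_file *old = vma->vm_file;

    toy_get(file);        /* take a reference on the new file first */
    vma->vm_file = file;  /* publish */
    toy_put(old);         /* release the vma's reference on the old file */
}
```

    Taking the new reference before dropping the old one keeps both files
    alive across the swap even if they are the same object.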

    Signed-off-by: Christian König
    Reviewed-by: Daniel Vetter (v2)
    Reviewed-by: Jason Gunthorpe
    Acked-by: Andrew Morton
    Link: https://patchwork.freedesktop.org/patch/399360/

    Christian König
     

17 Oct, 2020

1 commit

  • Memory allocated with kstrdup_const() must not be passed to regular
    krealloc() as it is not aware of the possibility of the chunk residing in
    .rodata. Since there are no potential users of krealloc_const() at the
    moment, let's just update the doc to make it explicit.

    Signed-off-by: Bartosz Golaszewski
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
    Signed-off-by: Linus Torvalds

    Bartosz Golaszewski
     

04 Sep, 2020

1 commit

  • When the Memory Tagging Extension is enabled, two pages are identical
    only if both their data and tags are identical.

    Make the generic memcmp_pages() a __weak function and add an
    arm64-specific implementation which returns non-zero if any of the two
    pages contain valid MTE tags (PG_mte_tagged set). There isn't much
    benefit in comparing the tags of two pages since these are normally used
    for heap allocations and likely to differ anyway.

    Co-developed-by: Vincenzo Frascino
    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Catalin Marinas
    Cc: Will Deacon

    Catalin Marinas
     

08 Aug, 2020

3 commits

  • The current split between do_mmap() and do_mmap_pgoff() was introduced in
    commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
    do_mmap_pgoff()") to support MPX.

    The wrapper function do_mmap_pgoff() always passed 0 as the value of the
    vm_flags argument to do_mmap(). However, MPX support has subsequently
    been removed from the kernel and there were no more direct callers of
    do_mmap(); all calls were going via do_mmap_pgoff().

    Simplify the code by removing do_mmap_pgoff() and changing all callers to
    directly call do_mmap(), which now no longer takes a vm_flags argument.

    Signed-off-by: Peter Collingbourne
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
    Signed-off-by: Linus Torvalds

    Peter Collingbourne
     
  • When checking a performance change for the will-it-scale scalability
    mmap test [1], we found very high lock contention on the spinlock of
    the percpu counter 'vm_committed_as':

    94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

    Actually this heavy lock contention is not always necessary. The
    'vm_committed_as' needs to be very precise when the strict
    OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
    for the percpu counter.

    So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
    lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies. Also
    add a sysctl handler to adjust it when the policy is reconfigured.
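
    The batch policy can be sketched as follows; the divisors and floor are
    illustrative approximations of the patch, not exact kernel constants:

```c
#include <assert.h>

/* Sketch of the batch computation: keep the percpu batch small under
 * OVERCOMMIT_NEVER so vm_committed_as stays precise, and lift it 64X
 * (divisor 256 vs 4) for the loose policies to cut lock contention. */
static long compute_batch_sketch(long ram_pages, int nr_cpus, int never)
{
    long divisor = never ? 256 : 4;          /* 256/4 gives the 64X lift */
    long batch = ram_pages / nr_cpus / divisor;
    long floor = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;

    return batch > floor ? batch : floor;    /* never below a small floor */
}
```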

    Benchmark with the same testcase in [1] shows 53% improvement on an
    8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. We tested with
    test platforms in 0day (server, desktop and laptop), and 80%+ of the
    platforms show improvements with that test. Whether a platform shows
    improvement depends on whether its test mmap size is bigger than the
    computed batch number.

    If the lift were only 16X, 1/3 of the platforms would show
    improvements, though it should help mmap/unmap usage generally, as
    Michal Hocko mentioned:

    : I believe that there are non-synthetic workloads which would benefit
    : from a larger batch. E.g. large in-memory databases which do large
    : mmaps during startups from multiple threads.

    [1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

    Signed-off-by: Feng Tang
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Cc: Matthew Wilcox (Oracle)
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Qian Cai
    Cc: Kees Cook
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: Dave Hansen
    Cc: Huang Ying
    Cc: Christoph Lameter
    Cc: Dennis Zhou
    Cc: Haiyang Zhang
    Cc: kernel test robot
    Cc: "K. Y. Srinivasan"
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
    Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
    Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
    Signed-off-by: Linus Torvalds

    Feng Tang
     
  • percpu_counter_sum_positive() will provide more accurate info.

    As with percpu_counter_read_positive(), in worst case the deviation could
    be 'batch * nr_cpus', which is totalram_pages/256 for now, and will be
    more when the batch gets enlarged.

    Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
    microseconds on a 2S/36C/72T Skylake server in the normal case. In the
    worst case, where vm_committed_as's spinlock is under severe
    contention, it costs 30~40 microseconds on the 2S/36C/72T Skylake
    server, which should be fine for its only two users: /proc/meminfo and
    the HyperV balloon driver's once-per-second status trace.

    Signed-off-by: Feng Tang
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko # for /proc/meminfo
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Matthew Wilcox (Oracle)
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Qian Cai
    Cc: Andi Kleen
    Cc: Tim Chen
    Cc: Dave Hansen
    Cc: Huang Ying
    Cc: Christoph Lameter
    Cc: Dennis Zhou
    Cc: Kees Cook
    Cc: kernel test robot
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/1592725000-73486-3-git-send-email-feng.tang@intel.com
    Link: http://lkml.kernel.org/r/1594389708-60781-3-git-send-email-feng.tang@intel.com
    Signed-off-by: Linus Torvalds

    Feng Tang
     

10 Jun, 2020

3 commits

  • Convert comments that reference mmap_sem to reference mmap_lock instead.

    [akpm@linux-foundation.org: fix up linux-next leftovers]
    [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
    [akpm@linux-foundation.org: more linux-next fixups, per Michel]

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add new APIs to assert that mmap_sem is held.

    Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
    makes the assertions more tolerant of future changes to the lock type.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Daniel Jordan
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Laurent Dufour
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change converts the existing mmap_sem rwsem calls to use the new mmap
    locking API instead.

    The change is generated using coccinelle with the following rule:

    // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

    @@
    expression mm;
    @@
    (
    -init_rwsem
    +mmap_init_lock
    |
    -down_write
    +mmap_write_lock
    |
    -down_write_killable
    +mmap_write_lock_killable
    |
    -down_write_trylock
    +mmap_write_trylock
    |
    -up_write
    +mmap_write_unlock
    |
    -downgrade_write
    +mmap_write_downgrade
    |
    -down_read
    +mmap_read_lock
    |
    -down_read_killable
    +mmap_read_lock_killable
    |
    -down_read_trylock
    +mmap_read_trylock
    |
    -up_read
    +mmap_read_unlock
    )
    -(&mm->mmap_sem)
    +(mm)

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Reviewed-by: Daniel Jordan
    Reviewed-by: Laurent Dufour
    Reviewed-by: Vlastimil Babka
    Cc: Davidlohr Bueso
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Jason Gunthorpe
    Cc: Jerome Glisse
    Cc: John Hubbard
    Cc: Liam Howlett
    Cc: Matthew Wilcox
    Cc: Peter Zijlstra
    Cc: Ying Han
    Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

05 Jun, 2020

2 commits

  • For kvmalloc'ed data objects that contain sensitive information like
    cryptographic keys, we need to make sure that the buffer is always
    cleared before freeing it. Using memset() alone for buffer clearing may
    not provide certainty, as the compiler may optimize it away. To be
    sure, the special memzero_explicit() has to be used.

    This patch introduces a new kvfree_sensitive() for freeing those sensitive
    data objects allocated by kvmalloc(). The relevant places where
    kvfree_sensitive() can be used are modified to use it.
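
    A userspace sketch of the idea, with free() standing in for kvfree();
    memzero_explicit_sketch() is an illustrative volatile-based clear, not
    the kernel's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A plain memset() on a buffer about to be freed is a dead store the
 * optimizer may drop; volatile-qualified byte writes cannot be elided,
 * which is what memzero_explicit() boils down to. */
static void memzero_explicit_sketch(void *p, size_t len)
{
    volatile unsigned char *vp = p;

    while (len--)
        *vp++ = 0;
}

/* Sketch of kvfree_sensitive(): clear, then free. */
static void kvfree_sensitive_sketch(void *p, size_t len)
{
    if (p) {
        memzero_explicit_sketch(p, len);
        free(p);  /* kvfree() in the kernel */
    }
}
```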

    Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
    Suggested-by: Linus Torvalds
    Signed-off-by: Waiman Long
    Signed-off-by: Andrew Morton
    Reviewed-by: Eric Biggers
    Acked-by: David Howells
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Uladzislau Rezki
    Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • This check was added by commit 82f71ae4a2b8 ("mm: catch memory
    commitment underflow") in 2014 as a safety check for issues which have
    since been fixed, and there have been few reports caught by it, as
    described in its commit log:

    : This shouldn't happen any more - the previous two patches fixed
    : the committed_as underflow issues.

    But it was really found by Qian Cai when he used the LTP memory stress
    suite to test an RFC patchset which tries to improve scalability of the
    per-cpu counter 'vm_committed_as' by choosing a bigger 'batch' number
    for the loose overcommit policies (OVERCOMMIT_ALWAYS and
    OVERCOMMIT_GUESS), while keeping the current number for
    OVERCOMMIT_NEVER.

    With that patchset, when the system first uses a loose policy, the
    'vm_committed_as' count can be a large negative value, as its big
    'batch' number allows a big deviation; then when the policy is changed
    to OVERCOMMIT_NEVER, the 'batch' is decreased to a much smaller value,
    thus hitting this WARN check.

    To mitigate this, one proposed solution is to queue work on all online
    CPUs to do a local sync of 'vm_committed_as' when changing the policy to
    OVERCOMMIT_NEVER, plus some global syncing to guarantee the case won't
    be hit.

    But this solution is costly and slow; given that this check hasn't shown
    real trouble or benefit, simply drop it from one hot path of MM. perf
    stats do show some tiny saving from removing it.

    Reported-by: Qian Cai
    Signed-off-by: Feng Tang
    Signed-off-by: Andrew Morton
    Reviewed-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Kees Cook
    Link: http://lkml.kernel.org/r/20200603094804.GB89848@shbuild999.sh.intel.com
    Signed-off-by: Linus Torvalds

    Feng Tang
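
The deviation the commit describes can be illustrated with a toy batched counter (hypothetical names, not the kernel's percpu_counter API): each CPU accumulates deltas locally and folds them into the global count only when the local magnitude reaches 'batch', so the global count may lag the true sum by up to batch * nr_cpus:

```c
#include <assert.h>

#define NR_CPUS 4

/* Toy model of a batched per-CPU counter. The global value can deviate
 * from the true sum by up to batch * NR_CPUS -- the deviation that made
 * the WARN fire when 'batch' shrank on an overcommit policy switch.
 */
struct batched_counter {
    long global;
    long local[NR_CPUS];
    long batch;
};

static void counter_add(struct batched_counter *c, int cpu, long delta)
{
    c->local[cpu] += delta;
    /* Fold into the global count only once the local delta is large. */
    if (c->local[cpu] >= c->batch || c->local[cpu] <= -c->batch) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

static long counter_true_sum(const struct batched_counter *c)
{
    long sum = c->global;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

A reader consulting only the global field sees a stale value until the batch threshold is crossed, which is exactly why a large 'batch' under a loose policy can leave a big (even negative) global count behind.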
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow dumping cgroup id and filtering by it in inet_diag code,
    from Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fall back to the default qdisc if qdisc init fails, because
    otherwise a packet scheduler init failure will make a device
    inoperative. From Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. The Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

1 commit

  • Just use __vmalloc_node instead, which takes an extra argument. To be
    able to use __vmalloc_node in all callers, make it available outside of
    vmalloc.c and implement it in nommu.c.

    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Acked-by: Peter Zijlstra (Intel)
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Gao Xiang
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: "K. Y. Srinivasan"
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Michael Kelley
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Robin Murphy
    Cc: Sakari Ailus
    Cc: Stephen Hemminger
    Cc: Sumit Semwal
    Cc: Wei Liu
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Paul Mackerras
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200414131348.444715-25-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
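
The refactoring pattern can be sketched in userspace C (malloc stands in for the real page-based allocator; gfp/node arguments are carried but unused, and all names here are illustrative): every vmalloc() variant funnels into one workhorse, __vmalloc_node(), which also records who asked for the allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned gfp_t;
#define GFP_KERNEL 0u
#define NUMA_NO_NODE (-1)

static const void *last_caller; /* who asked for the allocation */

/* The single workhorse every wrapper delegates to. */
static void *__vmalloc_node_demo(size_t size, size_t align, gfp_t gfp,
                                 int node, const void *caller)
{
    (void)align; (void)gfp; (void)node;
    last_caller = caller;
    return malloc(size);
}

static void *vmalloc_demo(size_t size)
{
    return __vmalloc_node_demo(size, 1, GFP_KERNEL, NUMA_NO_NODE,
                               __func__);
}

/* Zeroing variant: same workhorse, plus a memset. */
static void *vzalloc_demo(size_t size)
{
    void *p = __vmalloc_node_demo(size, 1, GFP_KERNEL, NUMA_NO_NODE,
                                  __func__);
    if (p)
        memset(p, 0, size);
    return p;
}
```

Concentrating the logic in one function is what lets the commit drop the open-coded variants and makes a nommu implementation a matter of providing one symbol.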
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common
    handlers, a lot of the changes are mechanical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
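
The idea behind the change can be sketched as follows (simplified userspace names, not the kernel's sysctl API): a common layer copies the input into a bounded kernel buffer and guarantees NUL termination before any handler sees the string, so individual handlers never touch user pointers:

```c
#include <assert.h>
#include <string.h>

#define SYSCTL_BUF_LEN 64

/* Copy at most 'len' bytes of "user" input into a kernel-side buffer
 * and always NUL-terminate, truncating oversized input. Returns the
 * number of bytes kept (excluding the terminator).
 */
static size_t sysctl_copy_in_demo(char *kbuf, const char *user, size_t len)
{
    size_t n = len < SYSCTL_BUF_LEN - 1 ? len : SYSCTL_BUF_LEN - 1;

    memcpy(kbuf, user, n);
    kbuf[n] = '\0';
    return n;
}
```

With this in place, each handler receives a plain, always-terminated C string, which is the safety property the commit message highlights.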
     

01 Dec, 2019

2 commits

  • Now we use rb_parent to get next, but this is not necessary.

    When prev is NULL, vma should be the first element in the list, so next
    should be the current first one (mm->mmap), whether or not we have a
    parent.

    After removing it, the code shows the beauty of symmetry.

    Link: http://lkml.kernel.org/r/20190813032656.16625-1-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Acked-by: Andrew Morton
    Cc: Mel Gorman
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
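
The symmetry the commit is after can be shown with a toy VMA list (simplified stand-ins for the kernel's structures): 'next' is simply the current head when there is no prev, and prev->vm_next otherwise, with no rbtree-parent lookup needed:

```c
#include <assert.h>
#include <stddef.h>

struct vm_area_demo {
    struct vm_area_demo *vm_next;
};

struct mm_demo {
    struct vm_area_demo *mmap; /* head of the vma list */
};

/* Symmetric next-element computation: head when no prev,
 * prev->vm_next otherwise.
 */
static struct vm_area_demo *vma_next_demo(struct mm_demo *mm,
                                          struct vm_area_demo *prev)
{
    return prev ? prev->vm_next : mm->mmap;
}
```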
     
  • Just make the code a little easier to read.

    Link: http://lkml.kernel.org/r/20191006012636.31521-3-richardw.yang@linux.intel.com
    Signed-off-by: Wei Yang
    Cc: Christoph Hellwig
    Cc: Matthew Wilcox (Oracle)
    Cc: Mel Gorman
    Cc: Oscar Salvador
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     

25 Sep, 2019

5 commits

  • This commit selects ARCH_HAS_ELF_RANDOMIZE when an arch uses the generic
    topdown mmap layout functions, so that this security feature is on by
    default.

    Note that this commit also removes the possibility for arm64 to have ELF
    randomization without an MMU: without an MMU, the security added by
    randomization is worth nothing.

    Link: http://lkml.kernel.org/r/20190730055113.23635-6-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Catalin Marinas
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     
  • arm64 handles top-down mmap layout in a way that can easily be reused by
    other architectures, so make it available in mm. Then introduce a new
    config option, ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT, that can be set by
    other architectures to benefit from those functions. Note that this new
    config option depends on MMU being enabled; if it is selected without
    MMU support, a warning will be thrown.

    Link: http://lkml.kernel.org/r/20190730055113.23635-5-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Suggested-by: Christoph Hellwig
    Acked-by: Catalin Marinas
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
     
  • Patch series "Provide generic top-down mmap layout functions", v6.

    This series introduces generic functions to make top-down mmap layout
    easily accessible to architectures, in particular riscv which was the
    initial goal of this series. The generic implementation was taken from
    arm64 and used successively by arm, mips and finally riscv.

    Note that in addition the series fixes 2 issues:

    - stack randomization was taken into account even when not necessary.

    - [1] fixed an issue with the mmap base which did not take
    randomization into account, but the fix was not propagated to arm and
    mips; by moving the arm64 code into a generic library, this problem is
    now fixed for both architectures.

    This work is an effort to factorize architecture functions to avoid code
    duplication and oversights as in [1].

    [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html

    This patch (of 14):

    This preparatory commit moves this function so that further introduction
    of generic topdown mmap layout is contained only in mm/util.c.

    Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr
    Signed-off-by: Alexandre Ghiti
    Acked-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Luis Chamberlain
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Palmer Dabbelt
    Cc: Albert Ou
    Cc: Alexander Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Ghiti
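
The top-down layout being factored out can be modeled simply (the constants below are illustrative, not any architecture's real values): the mmap base sits below the stack top, lowered by a guard gap for stack growth and by a random offset when randomization is enabled:

```c
#include <assert.h>

#define STACK_TOP_DEMO 0xC0000000UL
#define STACK_GAP_DEMO 0x01000000UL /* gap reserved for stack growth */

/* Simplified model of a generic top-down mmap base computation:
 * randomization only ever pushes the base further down.
 */
static unsigned long mmap_base_demo(unsigned long rnd)
{
    return STACK_TOP_DEMO - STACK_GAP_DEMO - rnd;
}
```

Because every architecture in the series computes its base this same way modulo constants, the logic can live once in mm/util.c instead of being duplicated per arch.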
     
  • Patch series "THP aware uprobe", v13.

    This patchset makes uprobe aware of THPs.

    Currently, when uprobe is attached to text on THP, the page is split by
    FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of
    THP.

    This set makes uprobe THP-aware. Instead of FOLL_SPLIT, it introduces
    FOLL_SPLIT_PMD, which only splits the PMD for uprobe.

    After all uprobes within the THP are removed, the PTE-mapped pages are
    regrouped as a huge PMD.

    This set (plus a few THP patches) is also available at

    https://github.com/liu-song-6/linux/tree/uprobe-thp

    This patch (of 6):

    Move memcmp_pages() to mm/util.c and pages_identical() to mm.h, so that we
    can use them in other files.

    Link: http://lkml.kernel.org/r/20190815164525.1848545-2-songliubraving@fb.com
    Signed-off-by: Song Liu
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Oleg Nesterov
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: William Kucharski
    Cc: Srikar Dronamraju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Song Liu
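
The relationship between the two helpers being moved can be sketched like this (plain buffers stand in for struct page plus kmap; names are illustrative): pages_identical() is just a boolean wrapper over memcmp_pages():

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE_DEMO 4096

/* Byte-wise comparison of two page-sized regions. */
static int memcmp_pages_demo(const void *page1, const void *page2)
{
    return memcmp(page1, page2, PAGE_SIZE_DEMO);
}

/* Boolean convenience wrapper, as exposed via mm.h. */
static int pages_identical_demo(const void *page1, const void *page2)
{
    return !memcmp_pages_demo(page1, page2);
}
```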
     
  • Replace 1 << compound_order(page) with compound_nr(page). Minor
    improvements in readability.

    Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

17 Jul, 2019

1 commit

  • locked_vm accounting is done roughly the same way in five places, so
    unify them in a helper.

    Include the helper's caller in the debug print to distinguish between
    callsites.

    Error codes stay the same, so user-visible behavior does too. The one
    exception is that the -EPERM case in tce_account_locked_vm is removed
    because Alexey has never seen it triggered.

    [daniel.m.jordan@oracle.com: v3]
    Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com
    [sfr@canb.auug.org.au: fix mm/util.c]
    Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.com
    Signed-off-by: Daniel Jordan
    Signed-off-by: Stephen Rothwell
    Tested-by: Alexey Kardashevskiy
    Acked-by: Alex Williamson
    Cc: Alan Tull
    Cc: Alex Williamson
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Christophe Leroy
    Cc: Davidlohr Bueso
    Cc: Jason Gunthorpe
    Cc: Mark Rutland
    Cc: Michael Ellerman
    Cc: Moritz Fischer
    Cc: Paul Mackerras
    Cc: Steve Sistare
    Cc: Wu Hao
    Cc: Ira Weiny
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Jordan
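
The unified helper's logic can be sketched as follows (simplified types and userspace names, with -1 standing in for the kernel's error codes): charge npages against a per-mm locked_vm count, failing when the rlimit would be exceeded, and uncharge symmetrically:

```c
#include <assert.h>

struct mm_locked_demo {
    unsigned long locked_vm;
};

/* Charge (inc != 0) or uncharge (inc == 0) npages of locked memory
 * against one mm, enforcing 'limit' on the charge path only --
 * mirroring the shape of the shared accounting helper.
 */
static int account_locked_vm_demo(struct mm_locked_demo *mm,
                                  unsigned long npages, int inc,
                                  unsigned long limit)
{
    if (inc) {
        if (mm->locked_vm + npages > limit)
            return -1; /* stands in for -ENOMEM */
        mm->locked_vm += npages;
    } else {
        mm->locked_vm -= npages;
    }
    return 0;
}
```

Centralizing this check is what keeps the five former callsites' error behavior identical, as the commit message notes.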