30 Jun, 2020

1 commit

  • Rename the kvfree_rcu() function to kvfree_rcu_local(). The purpose is
    to prevent a conflict between two identical function declarations:
    kvfree_rcu() will become globally visible, which would lead to a build
    error. No functional change.

    Cc: linux-mm@kvack.org
    Cc: rcu@vger.kernel.org
    Cc: Andrew Morton
    Signed-off-by: Uladzislau Rezki (Sony)
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Paul E. McKenney

    Uladzislau Rezki (Sony)
     

26 Jun, 2020

17 commits

  • When working with very large nodes, poisoning the struct pages (for which
    there will be very many) can take a very long time. If the system is
    using voluntary preemption, the software watchdog will not be able to
    detect forward progress. This patch addresses the issue by periodically
    giving up the CPU, as __remove_pages() does. This behavior was introduced
    in v5.6 by commit d33695b16a9f ("mm/memory_hotplug: poison memmap in
    remove_pfn_range_from_zone()").

    Alternatively, init_page_poison() could do the cond_resched() itself, but
    it seems to me that the caller of init_page_poison() is what actually
    knows whether or not it should relax its own priority.

    Based on Dan's notes, I think this is perfectly safe: commit f931ab479dd2
    ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

    Aside from fixing the lockup, it is also a friendlier thing to do on lower
    core systems that might wipe out large chunks of hotplug memory (probably
    not a very common case).
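
    A minimal sketch of the idea (the chunked-loop shape is an assumption,
    not the literal patch; page_init_poison() is the mm/debug.c spelling of
    the helper referred to above as init_page_poison()):

        /* remove_pfn_range_from_zone() (sketch): poison the memmap in
         * section-sized chunks and offer to reschedule between them */
        for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
            cond_resched();
            cur_nr_pages = min(end_pfn - pfn,
                               SECTION_ALIGN_UP(pfn + 1) - pfn);
            page_init_poison(pfn_to_page(pfn),
                             sizeof(struct page) * cur_nr_pages);
        }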

    Fixes this kind of splat:

    watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
    irq event stamp: 138450
    hardirqs last enabled at (138449): [] trace_hardirqs_on_thunk+0x1a/0x1c
    hardirqs last disabled at (138450): [] trace_hardirqs_off_thunk+0x1a/0x1c
    softirqs last enabled at (138448): [] __do_softirq+0x347/0x456
    softirqs last disabled at (138443): [] irq_exit+0x7d/0xb0
    CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
    Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
    RIP: 0010:memset_erms+0x9/0x10
    Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
    Call Trace:
    remove_pfn_range_from_zone+0x3a/0x380
    memunmap_pages+0x17f/0x280
    release_nodes+0x22a/0x260
    __device_release_driver+0x172/0x220
    device_driver_detach+0x3e/0xa0
    unbind_store+0x113/0x130
    kernfs_fop_write+0xdc/0x1c0
    vfs_write+0xde/0x1d0
    ksys_write+0x58/0xd0
    do_syscall_64+0x5a/0x120
    entry_SYSCALL_64_after_hwframe+0x49/0xb3
    Built 2 zonelists, mobility grouping on. Total pages: 49050381
    Policy zone: Normal
    Built 3 zonelists, mobility grouping on. Total pages: 49312525
    Policy zone: Normal

    David said: "It really only is an issue for devmem. Ordinary
    hotplugged system memory is not affected (onlined/offlined in memory
    block granularity)."

    Link: http://lkml.kernel.org/r/20200619231213.1160351-1-ben.widawsky@intel.com
    Fixes: d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
    Signed-off-by: Ben Widawsky
    Reported-by: "Scargall, Steve"
    Reported-by: Ben Widawsky
    Acked-by: David Hildenbrand
    Cc: Dan Williams
    Cc: Vishal Verma
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Widawsky
     
  • Merge vmalloc_exec into its only caller. Note that for !CONFIG_MMU
    __vmalloc_node_range maps to __vmalloc, which directly clears the
    __GFP_HIGHMEM added by the vmalloc_exec stub anyway.
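
    For reference, the stub being folded away looked roughly like this in
    v5.7 (quoted from memory; the exact flag set is an assumption):

        void *vmalloc_exec(unsigned long size)
        {
            return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
                    GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
                    NUMA_NO_NODE, __builtin_return_address(0));
        }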

    Link: http://lkml.kernel.org/r/20200618064307.32739-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: David Hildenbrand
    Acked-by: Peter Zijlstra (Intel)
    Cc: Catalin Marinas
    Cc: Dexuan Cui
    Cc: Jessica Yu
    Cc: Vitaly Kuznetsov
    Cc: Wei Liu
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • With a synchronous IO swap device, swap-in is handled directly in the
    fault code. Since the IO cost notation isn't added there, LRU balancing
    could be wrongly biased with such a device. Fix it by counting the cost
    in the fault code.
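
    A sketch of where the accounting lands (lru_note_cost_page() is the
    cost-notation helper added by the series; the exact call site inside
    do_swap_page() is an assumption):

        /* do_swap_page(): swapin served directly by a synchronous device */
        page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
        if (page) {
            __SetPageLocked(page);
            /* ... read the page in from the device ... */
            lru_note_cost_page(page);  /* count the IO cost for balancing */
        }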

    Link: http://lkml.kernel.org/r/1592288204-27734-4-git-send-email-iamjoonsoo.kim@lge.com
    Fixes: 314b57fb0460 ("mm: balance LRU lists based on relative thrashing")
    Signed-off-by: Joonsoo Kim
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • A non-file-lru page could also be activated in mark_page_accessed(), and
    we need to count this activation for nonresident_age.

    Note that it's better for this patch to be squashed into the patch "mm:
    workingset: age nonresident information alongside anonymous pages".
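
    A sketch of the counting (workingset_age_nonresident() comes from the
    companion patch named above; its exact placement in the activation path
    is an assumption):

        /* mm/swap.c: activation path of mark_page_accessed() (sketch) */
        SetPageActive(page);
        /* activations age the nonresident info regardless of LRU type */
        workingset_age_nonresident(lruvec, hpage_nr_pages(page));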

    Link: http://lkml.kernel.org/r/1592288204-27734-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Patch series "fix for "mm: balance LRU lists based on relative
    thrashing" patchset"

    This patchset fixes some problems in the patchset "mm: balance LRU
    lists based on relative thrashing", which is now merged in the
    mainline.

    The patch "mm: workingset: let cache workingset challenge anon fix" is
    the result of a discussion with Johannes. See the following link.

    http://lkml.kernel.org/r/20200520232525.798933-6-hannes@cmpxchg.org

    The other two are minor things which were found when I tried to rebase
    my patchset.

    This patch (of 3):

    After ("mm: workingset: let cache workingset challenge anon fix"), we
    compare refault distances to active_file + anon. But age of the
    non-resident information is only driven by the file LRU. As a result,
    we may overestimate the recency of any incoming refaults and activate
    them too eagerly, causing unnecessary LRU churn in certain situations.

    Make anon aging drive nonresident age as well to address that.
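
    A sketch of the helper both the file and anon aging paths can now call
    (the ancestor walk is an assumption):

        void workingset_age_nonresident(struct lruvec *lruvec,
                                        unsigned long nr_pages)
        {
            /* age this lruvec's nonresident clock and all its ancestors */
            do {
                atomic_long_add(nr_pages, &lruvec->nonresident_age);
            } while ((lruvec = parent_lruvec(lruvec)));
        }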

    Link: http://lkml.kernel.org/r/1592288204-27734-1-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1592288204-27734-2-git-send-email-iamjoonsoo.kim@lge.com
    Fixes: 34e58cac6d8f2a ("mm: workingset: let cache workingset challenge anon")
    Reported-by: Joonsoo Kim
    Signed-off-by: Johannes Weiner
    Signed-off-by: Joonsoo Kim
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Looks like one of these got missed when massaging in f86b810c2610 ("mm,
    memcg: prevent memory.low load/store tearing") with other linux-mm
    changes.

    Link: http://lkml.kernel.org/r/20200612174437.GA391453@chrisdown.name
    Signed-off-by: Chris Down
    Reported-by: Michal Koutny
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Down
     
  • We should put the css reference when memory allocation failed.
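
    A sketch of the missing error-path put (assuming, per the Fixes tag,
    that the leak sits in memcg_schedule_kmem_cache_create()):

        if (!css_tryget_online(&memcg->css))
            return;

        cw = kzalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
        if (!cw) {
            css_put(&memcg->css);   /* drop the reference taken above */
            return;
        }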

    Link: http://lkml.kernel.org/r/20200614122653.98829-1-songmuchun@bytedance.com
    Fixes: f0a3a24b532d ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
    Signed-off-by: Muchun Song
    Acked-by: Roman Gushchin
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Qian Cai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Muchun Song
     
  • Tejun reports seeing rare div0 crashes in memory.low stress testing:

    RIP: 0010:mem_cgroup_calculate_protection+0xed/0x150
    Code: 0f 46 d1 4c 39 d8 72 57 f6 05 16 d6 42 01 40 74 1f 4c 39 d8 76 1a 4c 39 d1 76 15 4c 29 d1 4c 29 d8 4d 29 d9 31 d2 48 0f af c1 f7 f1 49 01 c2 4c 89 96 38 01 00 00 5d c3 48 0f af c7 31 d2 49
    RSP: 0018:ffffa14e01d6fcd0 EFLAGS: 00010246
    RAX: 000000000243e384 RBX: 0000000000000000 RCX: 0000000000008f4b
    RDX: 0000000000000000 RSI: ffff8b89bee84000 RDI: 0000000000000000
    RBP: ffffa14e01d6fcd0 R08: ffff8b89ca7d40f8 R09: 0000000000000000
    R10: 0000000000000000 R11: 00000000006422f7 R12: 0000000000000000
    R13: ffff8b89d9617000 R14: ffff8b89bee84000 R15: ffffa14e01d6fdb8
    FS: 0000000000000000(0000) GS:ffff8b8a1f1c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f93b1fc175b CR3: 000000016100a000 CR4: 0000000000340ea0
    Call Trace:
    shrink_node+0x1e5/0x6c0
    balance_pgdat+0x32d/0x5f0
    kswapd+0x1d7/0x3d0
    kthread+0x11c/0x160
    ret_from_fork+0x1f/0x30

    This happens when parent_usage == siblings_protected.

    We check that usage is bigger than protected, which should imply
    parent_usage being bigger than siblings_protected. However, we don't
    read (or even update) these values atomically, and they can be out of
    sync as the memory state changes under us. A bit of fluctuation around
    the target protection isn't a big deal, but we need to handle the div0
    case.

    Check the parent state explicitly to make sure we have a reasonable
    positive value for the divisor.
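
    A sketch of the guard in effective_protection() (variable names follow
    the description above; details may differ from the actual patch):

        if (parent_effective > siblings_protected &&
            parent_usage > siblings_protected &&
            usage > protected) {
            unsigned long unclaimed;

            unclaimed = parent_effective - siblings_protected;
            unclaimed *= usage - protected;
            /* the checks above guarantee a positive divisor here */
            unclaimed /= parent_usage - siblings_protected;

            ep += unclaimed;
        }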

    Link: http://lkml.kernel.org/r/20200615140658.601684-1-hannes@cmpxchg.org
    Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection")
    Signed-off-by: Johannes Weiner
    Reported-by: Tejun Heo
    Acked-by: Michal Hocko
    Acked-by: Chris Down
    Cc: Roman Gushchin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This patch fixes the following warning seen during "make xmldocs":

    mm/vmalloc.c:1877: warning: Excess function parameter 'prot' description in 'vm_map_ram'

    This warning has appeared since commit d4efd79a81ab ("mm: remove the prot
    argument from vm_map_ram").

    Link: http://lkml.kernel.org/r/20200622152850.140871-1-standby24x7@gmail.com
    Fixes: d4efd79a81ab ("mm: remove the prot argument from vm_map_ram")
    Signed-off-by: Masanari Iida
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masanari Iida
     
  • Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for
    {READ,WRITE}_ONCE() memory accesses"), READ_ONCE() cannot be used
    anymore to read complex page table entries.

    This leads to:

    CC mm/debug_vm_pgtable.o
    In file included from ./include/asm-generic/bug.h:5,
    from ./arch/powerpc/include/asm/bug.h:109,
    from ./include/linux/bug.h:5,
    from ./include/linux/mmdebug.h:5,
    from ./include/linux/gfp.h:5,
    from mm/debug_vm_pgtable.c:13:
    In function 'pte_clear_tests',
    inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:363:2:
    ./include/linux/compiler.h:392:38: error: Unsupported access size for {READ,WRITE}_ONCE().
    mm/debug_vm_pgtable.c:249:14: note: in expansion of macro 'READ_ONCE'
    249 | pte_t pte = READ_ONCE(*ptep);
    | ^~~~~~~~~
    make[2]: *** [mm/debug_vm_pgtable.o] Error 1

    Fix it by using the recently added ptep_get() helper.
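
    The fix is essentially a one-liner at the reported line:

        pte_t pte = ptep_get(ptep);    /* was: READ_ONCE(*ptep) */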

    Link: http://lkml.kernel.org/r/6ca8c972e6c920dc4ae0d4affbed9703afa4d010.1592490570.git.christophe.leroy@csgroup.eu
    Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
    Signed-off-by: Christophe Leroy
    Acked-by: Will Deacon
    Reviewed-by: Anshuman Khandual
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: "Peter Zijlstra (Intel)"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     
  • Calls to pte_offset_map() in vm_insert_pages() are erroneously not
    matched with a call to pte_unmap(). This would cause problems on
    architectures where that is not a no-op.

    This patch does away with the non-traditional locking in the existing
    code and instead uses pte_offset_map_lock()/pte_unmap_unlock() as usual,
    incrementing the PTE pointer as necessary. The PTE pointer is kept within
    bounds since we clamp it with PTRS_PER_PTE.
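
    A sketch of the reworked loop (insert_page_in_batch_locked() is an
    assumed helper name):

        start_pte = pte_offset_map_lock(mm, pmd, addr, &pte_lock);
        for (pte = start_pte; pte_idx < batch_size; ++pte, ++pte_idx) {
            int err = insert_page_in_batch_locked(mm, pte, addr,
                                pages[curr_page_idx], prot);
            if (unlikely(err)) {
                pte_unmap_unlock(start_pte, pte_lock);
                return err;
            }
            addr += PAGE_SIZE;
            ++curr_page_idx;
        }
        pte_unmap_unlock(start_pte, pte_lock);  /* matched map/lock pair */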

    Link: http://lkml.kernel.org/r/20200618220446.20284-1-arjunroy.kdev@gmail.com
    Fixes: 8cd3984d81d5 ("mm/memory.c: add vm_insert_pages()")
    Signed-off-by: Arjun Roy
    Acked-by: David Rientjes
    Cc: Eric Dumazet
    Cc: Hugh Dickins
    Cc: Soheil Hassas Yeganeh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjun Roy
     
  • Chris Murphy reports that a slightly overcommitted load, testing swap
    and zram along with i915, splats and keeps on splatting, when it had
    better fail less noisily:

    gnome-shell: page allocation failure: order:0,
    mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
    nodemask=(null),cpuset=/,mems_allowed=0
    CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
    Call Trace:
    dump_stack+0x64/0x88
    warn_alloc.cold+0x75/0xd9
    __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
    __alloc_pages_nodemask+0x2df/0x320
    alloc_slab_page+0x195/0x310
    allocate_slab+0x3c5/0x440
    ___slab_alloc+0x40c/0x5f0
    __slab_alloc+0x1c/0x30
    kmem_cache_alloc+0x20e/0x220
    xas_nomem+0x28/0x70
    add_to_swap_cache+0x321/0x400
    __read_swap_cache_async+0x105/0x240
    swap_cluster_readahead+0x22c/0x2e0
    shmem_swapin+0x8e/0xc0
    shmem_swapin_page+0x196/0x740
    shmem_getpage_gfp+0x3a2/0xa60
    shmem_read_mapping_page_gfp+0x32/0x60
    shmem_get_pages+0x155/0x5e0 [i915]
    __i915_gem_object_get_pages+0x68/0xa0 [i915]
    i915_vma_pin+0x3fe/0x6c0 [i915]
    eb_add_vma+0x10b/0x2c0 [i915]
    i915_gem_do_execbuffer+0x704/0x3430 [i915]
    i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
    drm_ioctl_kernel+0x86/0xd0 [drm]
    drm_ioctl+0x206/0x390 [drm]
    ksys_ioctl+0x82/0xc0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x5b/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported on 5.7, but it really goes back to 3.1, when
    shmem_read_mapping_page_gfp() was implemented for use by i915 and
    allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
    missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
    __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
    from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
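
    The shape of the fix in __read_swap_cache_async(), per the analysis
    above (a sketch, not the literal hunk):

        /* pass the caller's reclaim-related bits - __GFP_NORETRY and
         * __GFP_NOWARN included - to the xarray node allocation */
        err = add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK);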

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
    Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp")
    Signed-off-by: Hugh Dickins
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Matthew Wilcox (Oracle)
    Reported-by: Chris Murphy
    Analyzed-by: Vlastimil Babka
    Analyzed-by: Matthew Wilcox
    Tested-by: Chris Murphy
    Cc: [3.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • According to Christopher Lameter, two fixes have been merged for the
    same problem. As far as I can tell, the code does not acquire the list_lock
    and invoke kmalloc(). list_slab_objects() misses an unlock (the
    counterpart to get_map()) and the memory allocated in free_partial()
    isn't used.

    Revert the mentioned commit.

    Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de
    Fixes: aa456c7aebb14 ("slub: remove kmalloc under list_lock from list_slab_objects() V2")
    Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2006181501480.12014@www.lameter.com
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Thomas Gleixner
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • The kzfree() function is normally used to clear some sensitive
    information, like encryption keys, in the buffer before freeing it back to
    the pool. memset() is currently used for the buffer clearing. However
    unlikely, there is still a non-zero probability that the compiler may
    choose to optimize away the memory clearing, especially if LTO is used in
    the future.

    To make sure that this optimization will never happen,
    memzero_explicit(), which was introduced in v3.18, is now used in
    kzfree() to future-proof it.
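
    A sketch of the resulting kzfree() (close to the mm/slab_common.c
    shape, from memory):

        void kzfree(const void *p)
        {
            size_t ks;
            void *mem = (void *)p;

            if (unlikely(ZERO_OR_NULL_PTR(mem)))
                return;
            ks = ksize(mem);
            memzero_explicit(mem, ks);  /* cannot be optimized away */
            kfree(mem);
        }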

    Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
    Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()")
    Signed-off-by: Waiman Long
    Acked-by: Michal Hocko
    Cc: David Howells
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Dan Carpenter
    Cc: "Jason A . Donenfeld"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • It was found that running the LTP test on a PowerPC system could produce
    erroneous values in /proc/meminfo, like:

    MemTotal: 531915072 kB
    MemFree: 507962176 kB
    MemAvailable: 1100020596352 kB

    Using bisection, the problem is tracked down to commit 9c315e4d7d8c ("mm:
    memcg/slab: cache page number in memcg_(un)charge_slab()").

    In memcg_uncharge_slab() with an "int order" argument:

    unsigned int nr_pages = 1 << order;
    :
    mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);

    The mod_lruvec_state() function will eventually call
    __mod_zone_page_state(), which accepts a long argument. Depending on the
    compiler and how inlining is done, "-nr_pages" may be treated as a
    negative number or a very large positive number. Apparently, it was
    treated as a large positive number on that PowerPC system, leading to
    incorrect stat counts. This problem hasn't been seen on x86-64 yet;
    perhaps the gcc compiler there has some slight difference in behavior.

    It is fixed by making nr_pages a signed value. For consistency, a similar
    change is applied to memcg_charge_slab() as well.
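
    The essence of the fix (a sketch): with an unsigned int, "-nr_pages"
    wraps to a huge 32-bit value before the implicit conversion to long,
    while a plain int sign-extends as intended.

        int nr_pages = 1 << order;    /* was: unsigned int nr_pages */

        mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);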

    Link: http://lkml.kernel.org/r/20200620184719.10994-1-longman@redhat.com
    Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()")
    Signed-off-by: Waiman Long
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • Hugh reports:

    "While stressing compaction, one run oopsed on NULL capc->cc in
    __free_one_page()'s task_capc(zone): compact_zone_order() had been
    interrupted, and a page was being freed in the return from interrupt.

    Though you would not expect it from the source, both gccs I was using
    (4.8.1 and 7.5.0) had chosen to compile compact_zone_order() with the
    ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
    callq compact_zone - long after the "current->capture_control =
    &capc". An interrupt in between those finds capc->cc NULL (zeroed by
    an earlier rep stos).

    This could presumably be fixed by a barrier() before setting
    current->capture_control in compact_zone_order(); but would also need
    more care on return from compact_zone(), in order not to risk leaking
    a page captured by interrupt just before capture_control is reset.

    Maybe that is the preferable fix, but I felt safer for task_capc() to
    exclude the rather surprising possibility of capture at interrupt
    time"

    I have checked that gcc10 also behaves the same.

    The advantage of a fix in compact_zone_order() is that we don't add
    another test to the page freeing hot path, and that it might prevent
    future problems if we stop exposing pointers to uninitialized structures
    in the current task.

    So this patch implements the suggestion for compact_zone_order() with
    barrier() (and WRITE_ONCE() to prevent store tearing) for setting
    current->capture_control, and prevents page leaking with
    WRITE_ONCE/READ_ONCE in the proper order.
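
    A sketch of compact_zone_order() after the fix:

        /* make sure capc is fully initialized before IRQs can see it */
        barrier();
        WRITE_ONCE(current->capture_control, &capc);

        ret = compact_zone(&cc, &capc);

        /* stop capture before reading the result, so an interrupt
         * cannot slip a captured page in after we have looked */
        WRITE_ONCE(current->capture_control, NULL);
        *capture = READ_ONCE(capc.page);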

    Link: http://lkml.kernel.org/r/20200616082649.27173-1-vbabka@suse.cz
    Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
    Signed-off-by: Vlastimil Babka
    Reported-by: Hugh Dickins
    Suggested-by: Hugh Dickins
    Acked-by: Hugh Dickins
    Cc: Alex Shi
    Cc: Li Wang
    Cc: Mel Gorman
    Cc: [5.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • do_swap_page() returns error codes from the VM_FAULT* space. try_charge()
    might return -ENOMEM, though, and then do_swap_page() simply returns 0,
    which means success.

    We almost never return ENOMEM for a GFP_KERNEL single page charge, except
    for async OOM handling (oom_disabled v1). So this needs translation to
    VM_FAULT_OOM; otherwise the page fault path will not notify userspace and
    wait for an action.
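
    The shape of the fix in do_swap_page() (a sketch; mem_cgroup_charge()
    is the v5.8 entry point that ends up in try_charge()):

        if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL)) {
            ret = VM_FAULT_OOM;    /* was: silently returning 0 */
            goto out_page;
        }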

    Link: http://lkml.kernel.org/r/20200617090238.GL9499@dhcp22.suse.cz
    Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
    Signed-off-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Alex Shi
    Cc: Joonsoo Kim
    Cc: Shakeel Butt
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Roman Gushchin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Jun, 2020

1 commit

  • Pull powerpc fixes from Michael Ellerman:

    - One fix for the interrupt rework we did last release which broke
    KVM-PR

    - Three commits fixing some fallout from the READ_ONCE() changes
    interacting badly with our 8xx 16K pages support, which uses a pte_t
    that is a structure of 4 actual PTEs

    - A cleanup of the 8xx pte_update() to use the newly added pmd_off()

    - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
    enabled

    - A minor fix for the SPU syscall generation

    Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
    Rapoport, Nicholas Piggin.

    * tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/8xx: Provide ptep_get() with 16k pages
    mm: Allow arches to provide ptep_get()
    mm/gup: Use huge_ptep_get() in gup_hugepte()
    powerpc/syscalls: Use the number when building SPU syscall table
    powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
    powerpc/64s: Fix KVM interrupt using wrong save area
    powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL

    Linus Torvalds
     

20 Jun, 2020

2 commits

  • Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for
    {READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
    use READ_ONCE() to access complex page table entries like the one
    defined for powerpc 8xx with 16k size pages.

    Define a ptep_get() helper that architectures can override instead
    of performing a READ_ONCE() on the page table entry pointer.
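
    The generic fallback is essentially (guard macro name assumed):

        #ifndef __HAVE_ARCH_PTEP_GET
        static inline pte_t ptep_get(pte_t *ptep)
        {
            return READ_ONCE(*ptep);
        }
        #endif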

    Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
    Signed-off-by: Christophe Leroy
    Acked-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     
  • gup_hugepte() reads hugepage table entries; it can't read them directly,
    huge_ptep_get() must be used.
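
    The fix, roughly:

        pte = huge_ptep_get(ptep);    /* was: pte = READ_ONCE(*ptep); */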

    Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
    Signed-off-by: Christophe Leroy
    Acked-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/ffc3714334c3bfaca6f13788ad039e8759ae413f.1592225558.git.christophe.leroy@csgroup.eu

    Christophe Leroy
     

18 Jun, 2020

2 commits


14 Jun, 2020

2 commits

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix build rules in binderfs sample

    - fix build errors when Kbuild recurses to the top Makefile

    - convert '---help---' in Kconfig to 'help'

    * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    treewide: replace '---help---' in Kconfig files with 'help'
    kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
    samples: binderfs: really compile this sample and fix build issues

    Linus Torvalds
     
  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

12 Jun, 2020

5 commits

  • Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
    "The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
    which relies on compile-time instrumentation, and uses a
    watchpoint-based sampling approach to detect races.

    The feature was under development for quite some time and has already
    found legitimate bugs.

    Unfortunately it comes with a limitation, which was only understood
    late in the development cycle:

    It requires an up to date CLANG-11 compiler

    CLANG-11 is not yet released (scheduled for June), but it's the only
    compiler today which handles the kernel requirements and especially
    the annotations of functions to exclude them from KCSAN
    instrumentation correctly.

    These annotations really need to work so that low level entry code and
    especially int3 text poke handling can be completely isolated.

    A detailed discussion of the requirements and compiler issues can be
    found here:

    https://lore.kernel.org/lkml/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com/

    We came to the conclusion that trying to work around compiler
    limitations and bugs again would end up in a major trainwreck, so
    requiring a working compiler seemed to be the best choice.

    For Continuous Integration purposes the compiler restriction is
    manageable and that's where most xxSAN reports come from.

    For a change this limitation might make GCC people actually look at
    their bugs. Some issues with CSAN in GCC are 7 years old and one has
    been 'fixed' 3 years ago with a half-baked solution which 'solved' the
    reported issue but not the underlying problem.

    The KCSAN developers are also pondering using a GCC plugin to become
    independent, but that's not something which will show up in a few
    days.

    Blocking KCSAN until widespread compiler support is available is not
    a really good alternative because the continuous growth of lockless
    optimizations in the kernel demands proper tooling support"

    * tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
    compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
    compiler.h: Move function attributes to compiler_types.h
    compiler.h: Avoid nested statement expression in data_race()
    compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
    kcsan: Update Documentation to change supported compilers
    kcsan: Remove 'noinline' from __no_kcsan_or_inline
    kcsan: Pass option tsan-instrument-read-before-write to Clang
    kcsan: Support distinguishing volatile accesses
    kcsan: Restrict supported compilers
    kcsan: Avoid inserting __tsan_func_entry/exit if possible
    ubsan, kcsan: Don't combine sanitizer with kcov on clang
    objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
    kcsan: Add __kcsan_{enable,disable}_current() variants
    checkpatch: Warn about data_race() without comment
    kcsan: Use GFP_ATOMIC under spin lock
    Improve KCSAN documentation a bit
    kcsan: Make reporting aware of KCSAN tests
    kcsan: Fix function matching in report
    kcsan: Change data_race() to no longer require marking racing accesses
    kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
    ...

    Linus Torvalds
     
  • An Action Required memory error should happen only when a processor is
    about to access corrupted memory, so it's synchronous and only
    affects the current process/thread.

    Recently commit 872e9a205c84 ("mm, memory_failure: don't send
    BUS_MCEERR_AO for action required error") fixed the issue that Action
    Required memory could unnecessarily send SIGBUS to the processes which
    share the error memory. But we still have another issue that we could
    send SIGBUS to a wrong thread.

    This is because collect_procs() and task_early_kill() fail to add the
    current process to the "to-kill" list. This patch fixes that. With this
    fix, SIGBUS(BUS_MCEERR_AR) is never sent to a non-current
    process/thread.
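
    A sketch of the intended logic (shape inferred from the description
    above, not the literal patch):

        /* task_early_kill() (sketch): an Action Required error is
         * synchronous on the current process, so always collect it */
        if (force_early && tsk->mm == current->mm)
            return current;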

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Acked-by: Tony Luck
    Acked-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-3-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Patch series "hwpoison: fixes signaling on memory error"

    This is a small patchset to solve issues in memory error handler to send
    SIGBUS to proper process/thread as expected in configuration. Please
    see descriptions in individual patches for more details.

    This patch (of 2):

    Early-kill policy is controlled by two types of settings: the
    per-process setting prctl(PR_MCE_KILL) and the system-wide setting
    vm.memory_failure_early_kill. Users expect the per-process setting to
    override the system-wide setting, as many other settings do, but the
    early-kill setting doesn't work that way.

    For example, if a system configures vm.memory_failure_early_kill to 1
    (enabled), a process receives SIGBUS even if it has explicitly disabled
    early kill via prctl(). That's not desirable for applications with
    their own policies.

    This patch changes the priority of these two types of settings, by
    checking sysctl_memory_failure_early_kill only when a given process has
    the default kill policy.
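
    In code terms, the new precedence looks roughly like this (a sketch
    using the real PF_MCE_PROCESS/PF_MCE_EARLY prctl() flags; the helper
    itself is hypothetical):

        static bool want_early_kill(struct task_struct *tsk)
        {
            /* the per-process prctl(PR_MCE_KILL) setting wins ... */
            if (tsk->flags & PF_MCE_PROCESS)
                return !!(tsk->flags & PF_MCE_EARLY);
            /* ... the sysctl only applies to the default policy */
            return sysctl_memory_failure_early_kill;
        }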

    Note that this patch solves a thread choice issue too.

    Originally, collect_procs() always chooses the main thread when
    vm.memory_failure_early_kill is 1, even if the process has a dedicated
    thread for memory error handling. SIGBUS should be sent to the
    dedicated thread if early-kill is enabled via
    vm.memory_failure_early_kill, as we do for PR_MCE_KILL_EARLY
    processes.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Cc: Tony Luck
    Cc: Pankaj Gupta
    Link: http://lkml.kernel.org/r/1591321039-22141-1-git-send-email-naoya.horiguchi@nec.com
    Link: http://lkml.kernel.org/r/1591321039-22141-2-git-send-email-naoya.horiguchi@nec.com
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Merge some more updates from Andrew Morton:

    - various hotfixes and minor things

    - hch's use_mm/unuse_mm cleanups

    Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
    lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.

    * emailed patches from Andrew Morton :
    kernel: set USER_DS in kthread_use_mm
    kernel: better document the use_mm/unuse_mm API contract
    kernel: move use_mm/unuse_mm to kthread.c
    stacktrace: cleanup inconsistent variable type
    lib: test get_count_order/long in test_bitops.c
    mm: add comments on pglist_data zones
    ocfs2: fix spelling mistake and grammar
    mm/debug_vm_pgtable: fix kernel crash by checking for THP support
    lib: fix bitmap_parse() on 64-bit big endian archs
    checkpatch: correct check for kernel parameters doc
    nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
    lib/lz4/lz4_decompress.c: document deliberate use of `&'
    kcov: check kcov_softirq in kcov_remote_stop()
    scripts/spelling: add a few more typos
    khugepaged: selftests: fix timeout condition in wait_for_scan()

    Linus Torvalds
     
  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

11 Jun, 2020

4 commits

  • Switch the function documentation to kerneldoc comments, and add
    WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
    not have ->mm set (or has ->mm set in the case of unuse_mm).

    Also give the functions a kthread_ prefix to better document the use case.
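
    A sketch of the asserts (the mm switch itself is elided):

        void kthread_use_mm(struct mm_struct *mm)
        {
            WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
            WARN_ON_ONCE(current->mm);
            /* ... adopt @mm, as use_mm() did ... */
        }

        void kthread_unuse_mm(struct mm_struct *mm)
        {
            WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
            WARN_ON_ONCE(!current->mm);
            /* ... drop @mm, as unuse_mm() did ... */
        }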

    [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
    Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
    [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
    Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Acked-by: Greg Kroah-Hartman [usb]
    Acked-by: Haren Myneni
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "improve use_mm / unuse_mm", v2.

    This series improves the use_mm / unuse_mm interface by better documenting
    the assumptions, and by moving the set_fs manipulations spread over the
    callers into the core API.

    This patch (of 3):

    Use the proper API instead.

    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

    These helpers are only for use with kernel threads, and I will tie them
    more into the kthread infrastructure going forward. Also move the
    prototypes to kthread.h - mmu_context.h was a little weird to start with
    as it otherwise contains very low-level MM bits.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Architectures can have CONFIG_TRANSPARENT_HUGEPAGE enabled but lack THP
    support on particular platforms. For example, with 4K PAGE_SIZE, ppc64
    supports THP only with radix translation.

    This results in the below crash when running with hash translation and
    4K PAGE_SIZE:

    kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:140!
    cpu 0x61: Vector: 700 (Program Check) at [c000000ff948f860]
    pc: debug_vm_pgtable+0x480/0x8b0
    lr: debug_vm_pgtable+0x474/0x8b0
    ...
    debug_vm_pgtable+0x374/0x8b0 (unreliable)
    do_one_initcall+0x98/0x4f0
    kernel_init_freeable+0x330/0x3fc
    kernel_init+0x24/0x148

    Check for THP support correctly.
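
    A sketch of the guard (has_transparent_hugepage() is the existing
    runtime check; its exact placement is an assumption):

        static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
        {
            pmd_t pmd;

            /* CONFIG_TRANSPARENT_HUGEPAGE may be set while the platform
             * (e.g. ppc64 hash MMU with 4K pages) lacks THP support */
            if (!has_transparent_hugepage())
                return;

            pmd = pfn_pmd(pfn, prot);
            WARN_ON(!pmd_same(pmd, pmd));
        }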

    Link: http://lkml.kernel.org/r/20200608125252.407659-1-aneesh.kumar@linux.ibm.com
    Fixes: 399145f9eb6c ("mm/debug: add tests validating architecture page table helpers")
    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Pull virtio updates from Michael Tsirkin:

    - virtio-mem: paravirtualized memory hotplug

    - support doorbell mapping for vdpa

    - config interrupt support in ifc

    - fixes all over the place

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
    vhost/test: fix up after API change
    virtio_mem: convert device block size into 64bit
    virtio-mem: drop unnecessary initialization
    ifcvf: implement config interrupt in IFCVF
    vhost: replace -1 with VHOST_FILE_UNBIND in ioctls
    vhost_vdpa: Support config interrupt in vdpa
    ifcvf: ignore continuous setting same status value
    virtio-mem: Don't rely on implicit compiler padding for requests
    virtio-mem: Try to unplug the complete online memory block first
    virtio-mem: Use -ETXTBSY as error code if the device is busy
    virtio-mem: Unplug subblocks right-to-left
    virtio-mem: Drop manual check for already present memory
    virtio-mem: Add parent resource for all added "System RAM"
    virtio-mem: Better retry handling
    virtio-mem: Offline and remove completely unplugged memory blocks
    mm/memory_hotplug: Introduce offline_and_remove_memory()
    virtio-mem: Allow to offline partially unplugged memory blocks
    mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
    virtio-mem: Paravirtualized memory hotunplug part 2
    virtio-mem: Paravirtualized memory hotunplug part 1
    ...

    Linus Torvalds
     

10 Jun, 2020

6 commits

  • Allow the callers to distinguish a real unmapped address from a range
    that can't be probed.
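
    A sketch of the resulting convention (error values as described;
    probe_kernel_read_allowed() is the unified hook mentioned in a later
    entry below, and copy_from_kernel() here is a hypothetical stand-in
    for the exception-safe copy):

        long probe_kernel_read(void *dst, const void *src, size_t size)
        {
            if (!probe_kernel_read_allowed(src, size))
                return -ERANGE;    /* range may never be probed */

            /* a genuinely unmapped address still yields -EFAULT */
            return copy_from_kernel(dst, src, size);
        }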

    Suggested-by: Masami Hiramatsu
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Reviewed-by: Masami Hiramatsu
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-24-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Provide alternative versions of probe_kernel_read, probe_kernel_write
    and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
    use arch hooks that are modelled after unsafe_{get,put}_user to access
    kernel memory in an exception safe way.
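
    A sketch of the generic copy loop built on the new hook (macro name
    per the series; the loop details are assumptions):

        /* __get_kernel_nofault(dst, src, type, err_label) is provided by
         * the architecture and jumps to err_label on a faulting load */
        while (len >= sizeof(long)) {
            __get_kernel_nofault(dst, src, long, Efault);
            dst += sizeof(long);
            src += sizeof(long);
            len -= sizeof(long);
        }
        return 0;
    Efault:
        return -EFAULT;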

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-19-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Move kernel access vs user access routines together to ease upcoming
    ifdefs.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-18-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Except for historical confusion in the kprobes/uprobes and bpf tracers,
    which has been fixed now, there is no good reason to ever allow user
    memory accesses from probe_kernel_read. Switch probe_kernel_read to only
    read from kernel memory.

    [akpm@linux-foundation.org: update it for "mm, dump_page(): do not crash with invalid mapping pointer"]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-17-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • All users are gone now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-16-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Currently architectures have to override every routine that probes
    kernel memory, which includes a pure read and strcpy, both in strict
    and non-strict variants. Just provide a single arch hook instead to
    make sure all architectures cover all the cases.
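
    The single hook, roughly (the __weak default is an assumption):

        /* every kernel-probing entry point funnels through one check */
        bool __weak probe_kernel_read_allowed(const void *unsafe_src,
                                              size_t size)
        {
            return true;    /* arches override to reject unsafe ranges */
        }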

    [akpm@linux-foundation.org: fix !CONFIG_X86_64 build]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200521152301.2587579-11-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig