27 Apr, 2022

1 commit

  • commit 2dfe63e61cc31ee59ce951672b0850b5229cd5b0 upstream.

    Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
    producing garbage data due to the object not actually being maintained
    by SLAB or SLUB.

    Fix this by implementing __kfence_obj_info() that copies relevant
    information to struct kmem_obj_info when the object was allocated by
    KFENCE; this is called by a common kmem_obj_info(), which also calls the
    slab/slub/slob specific variant now called __kmem_obj_info().

    For completeness, kmem_dump_obj() now displays if the object was
    allocated by KFENCE.
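
    A minimal sketch of the resulting dispatch, assuming the helper names
    quoted above (details simplified):

        /* Common entry point used by kmem_dump_obj(). */
        static void kmem_obj_info(struct kmem_obj_info *kpp, void *object,
                                  struct slab *slab)
        {
                if (__kfence_obj_info(kpp, object, slab))
                        return;         /* object was allocated by KFENCE */
                __kmem_obj_info(kpp, object, slab);     /* SLAB/SLUB/SLOB variant */
        }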

    Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
    Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
    Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB")
    Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Marco Elver
    Reviewed-by: Hyeonggon Yoo
    Reported-by: kernel test robot
    Acked-by: Vlastimil Babka [slab]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Marco Elver
     

08 Apr, 2022

1 commit

  • commit ae085d7f9365de7da27ab5c0d16b12d51ea7fca9 upstream.

    The objcg is not cleared and put for kfence object when it is freed,
    which could lead to memory leak for struct obj_cgroup and wrong
    statistics of NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B.

    Since the last freed object's objcg is not cleared,
    mem_cgroup_from_obj() could return the wrong memcg when this kfence
    object, which is not charged to any objcgs, is reallocated to other
    users.

    A real-world issue [1] is caused by this bug.

    Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1]
    Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com
    Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
    Signed-off-by: Muchun Song
    Cc: Dmitry Vyukov
    Cc: Marco Elver
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Muchun Song
     

19 Oct, 2021

1 commit

  • Once upon a time, the node demotion updates were driven solely by memory
    hotplug events. But now, there are handlers for both CPU and memory
    hotplug.

    However, the #ifdef around the code checks only memory hotplug. A
    system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU
    hotplug events.

    Update the #ifdef around the common code. Add memory and CPU-specific
    #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused
    function warnings when their Kconfig option is off.
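
    A rough sketch of the resulting #ifdef layout; the handler names below
    are illustrative rather than verbatim from mm/migrate.c:

        #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
        /* Common demotion-order rebuild code, shared by both handlers. */
        static void set_migration_target_nodes(void)
        {
                /* ... */
        }
        #endif

        #ifdef CONFIG_MEMORY_HOTPLUG
        static int migrate_on_reclaim_callback(struct notifier_block *self,
                                               unsigned long action, void *arg)
        {
                /* ... calls set_migration_target_nodes() ... */
                return notifier_from_errno(0);
        }
        #endif

        #ifdef CONFIG_HOTPLUG_CPU
        static int migration_online_cpu(unsigned int cpu)
        {
                set_migration_target_nodes();
                return 0;
        }
        #endif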

    [arnd@arndb.de: rework hotplug_memory_notifier() stub]
    Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org

    Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com
    Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
    Signed-off-by: Dave Hansen
    Signed-off-by: Arnd Bergmann
    Cc: "Huang, Ying"
    Cc: Michal Hocko
    Cc: Wei Xu
    Cc: Oscar Salvador
    Cc: David Rientjes
    Cc: Dan Williams
    Cc: David Hildenbrand
    Cc: Greg Thelen
    Cc: Yang Shi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

07 May, 2021

2 commits

  • Fix ~94 single-word typos in locking code comments, plus a few
    very obvious grammar mistakes.

    Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
    Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
    Signed-off-by: Ingo Molnar
    Reviewed-by: Matthew Wilcox (Oracle)
    Reviewed-by: Randy Dunlap
    Cc: Bhaskar Chowdhury
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • There is a spelling mistake in a comment. Fix it.

    Link: https://lkml.kernel.org/r/20210317094158.5762-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     

01 May, 2021

2 commits

  • This change uses the previously added memory initialization feature of
    HW_TAGS KASAN routines for slab memory when init_on_free is enabled.

    With this change, memory initialization memset() is no longer called when
    both HW_TAGS KASAN and init_on_free are enabled. Instead, memory is
    initialized in KASAN runtime.

    For SLUB, the memory initialization memset() is moved into
    slab_free_hook() that currently directly follows the initialization loop.
    A new argument is added to slab_free_hook() that indicates whether to
    initialize the memory or not.

    To avoid discrepancies with which memory gets initialized that can be
    caused by future changes, both KASAN hook and initialization memset() are
    put together and a warning comment is added.

    Combining setting allocation tags with memory initialization improves
    HW_TAGS KASAN performance when init_on_free is enabled.
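
    A rough sketch of the free-path hook with the new argument (simplified;
    the real hook also handles redzones, quarantine and other bookkeeping):

        static __always_inline bool slab_free_hook(struct kmem_cache *s,
                                                   void *x, bool init)
        {
                /*
                 * Keep the memset() next to the KASAN hook: with HW_TAGS
                 * KASAN the runtime initializes memory while setting tags.
                 */
                if (init && !kasan_has_integrated_init())
                        memset(kasan_reset_tag(x), 0, s->object_size);
                return kasan_slab_free(s, x, init);
        }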

    Link: https://lkml.kernel.org/r/190fd15c1886654afdec0d19ebebd5ade665b601.1615296150.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Marco Elver
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Branislav Rankov
    Cc: Catalin Marinas
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Dmitry Vyukov
    Cc: Evgenii Stepanov
    Cc: Joonsoo Kim
    Cc: Kevin Brodsky
    Cc: Pekka Enberg
    Cc: Peter Collingbourne
    Cc: Vincenzo Frascino
    Cc: Vlastimil Babka
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • This change uses the previously added memory initialization feature of
    HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.

    With this change, memory initialization memset() is no longer called when
    both HW_TAGS KASAN and init_on_alloc are enabled. Instead, memory is
    initialized in KASAN runtime.

    The memory initialization memset() is moved into slab_post_alloc_hook()
    that currently directly follows the initialization loop. A new argument
    is added to slab_post_alloc_hook() that indicates whether to initialize
    the memory or not.

    To avoid discrepancies with which memory gets initialized that can be
    caused by future changes, both KASAN hook and initialization memset() are
    put together and a warning comment is added.

    Combining setting allocation tags with memory initialization improves
    HW_TAGS KASAN performance when init_on_alloc is enabled.
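
    A rough sketch of the allocation-side hook with the new argument
    (simplified; the objcg/memcg and kmemleak hooks are omitted):

        static inline void slab_post_alloc_hook(struct kmem_cache *s,
                                                gfp_t flags, size_t size,
                                                void **p, bool init)
        {
                size_t i;

                for (i = 0; i < size; i++) {
                        /* HW_TAGS KASAN can initialize memory while tagging it. */
                        p[i] = kasan_slab_alloc(s, p[i], flags, init);
                        if (p[i] && init && !kasan_has_integrated_init())
                                memset(p[i], 0, s->object_size);
                }
        }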

    Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Marco Elver
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Branislav Rankov
    Cc: Catalin Marinas
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Dmitry Vyukov
    Cc: Evgenii Stepanov
    Cc: Joonsoo Kim
    Cc: Kevin Brodsky
    Cc: Pekka Enberg
    Cc: Peter Collingbourne
    Cc: Vincenzo Frascino
    Cc: Vlastimil Babka
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     

11 Apr, 2021

1 commit

  • …ulmck/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - Bitmap support for "N" as alias for last bit

    - kvfree_rcu updates

    - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.)

    - RCU callback offloading update

    - Polling RCU grace-period interfaces

    - Realtime-related RCU updates

    - Tasks-RCU updates

    - Torture-test updates

    - Torture-test scripting updates

    - Miscellaneous fixes

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

14 Mar, 2021

1 commit

  • cache_alloc_debugcheck_after() performs checks on an object, including
    adjusting the returned pointer. None of this should apply to KFENCE
    objects. While for non-bulk allocations, the checks are skipped when we
    allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after()
    is called via cache_alloc_debugcheck_after_bulk().

    Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.
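
    A minimal sketch of the fix, assuming the usual is_kfence_address()
    helper (the actual debug checks and pointer adjustment are elided):

        static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
                                                  gfp_t flags, void *objp,
                                                  unsigned long caller)
        {
                if (is_kfence_address(objp))
                        return objp;    /* KFENCE object: no checks, no adjustment */
                /* ... poison/red-zone checks and object pointer adjustment ... */
                return objp;
        }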

    Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
    Signed-off-by: Marco Elver
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Andrey Konovalov
    Cc: Jann Horn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marco Elver
     

09 Mar, 2021

1 commit

  • The mem_dump_obj() functionality adds a few hundred bytes, which is a
    small price to pay. Except on kernels built with CONFIG_PRINTK=n, in
    which mem_dump_obj() messages will be suppressed. This commit therefore
    makes mem_dump_obj() be a static inline empty function on kernels built
    with CONFIG_PRINTK=n and excludes all of its support functions as well.
    This avoids kernel bloat on systems that cannot use mem_dump_obj().
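
    A sketch of the resulting header arrangement (simplified):

        #ifdef CONFIG_PRINTK
        void mem_dump_obj(void *object);
        #else
        static inline void mem_dump_obj(void *object) {}
        #endif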

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Suggested-by: Andrew Morton
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

27 Feb, 2021

1 commit

  • Inserts KFENCE hooks into the SLAB allocator.

    To pass the originally requested size to KFENCE, add an argument
    'orig_size' to slab_alloc*(). The additional argument is required to
    preserve the requested original size for kmalloc() allocations, which
    use size classes (e.g. an allocation of 272 bytes will return an object
    of size 512). Therefore, kmem_cache::size does not represent the
    kmalloc-caller's requested size, and we must introduce the argument
    'orig_size' to propagate the originally requested size to KFENCE.

    Without the originally requested size, we would not be able to detect
    out-of-bounds accesses for objects placed at the end of a KFENCE object
    page if that object's size is not equal to the kmalloc size class it was
    bucketed into.

    When KFENCE is disabled, there is no additional overhead, since
    slab_alloc*() functions are __always_inline.
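
    A rough sketch of the hook placement (simplified; the real fast path is
    more involved):

        static __always_inline void *slab_alloc(struct kmem_cache *cachep,
                                                gfp_t flags, size_t orig_size,
                                                unsigned long caller)
        {
                /* KFENCE sees the caller's size, e.g. 272, not kmalloc-512. */
                void *objp = kfence_alloc(cachep, orig_size, flags);

                if (unlikely(objp))
                        return objp;
                /* ... fall back to the regular SLAB allocation path ... */
                return objp;
        }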

    Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
    Signed-off-by: Marco Elver
    Signed-off-by: Alexander Potapenko
    Reviewed-by: Dmitry Vyukov
    Co-developed-by: Marco Elver

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Dave Hansen
    Cc: Eric Dumazet
    Cc: Greg Kroah-Hartman
    Cc: Hillf Danton
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jann Horn
    Cc: Joern Engel
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Mark Rutland
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: SeongJae Park
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

25 Feb, 2021

5 commits

  • Generic mm functions that call KASAN annotations that might report a bug
    pass _RET_IP_ to them as an argument. This allows KASAN to include the
    name of the function that called the mm function in its report's header.

    Now that KASAN has inline wrappers for all of its annotations, move
    _RET_IP_ to those wrappers to simplify annotation call sites.
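
    A sketch of the wrapper pattern, using kasan_slab_free() as an example
    (signature as of this series, before the later 'init' argument):

        static __always_inline bool kasan_slab_free(struct kmem_cache *s,
                                                    void *object)
        {
                if (kasan_enabled())
                        return __kasan_slab_free(s, object, _RET_IP_);
                return false;
        }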

    Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67
    Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Marco Elver
    Reviewed-by: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Branislav Rankov
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Evgenii Stepanov
    Cc: Kevin Brodsky
    Cc: Peter Collingbourne
    Cc: Vincenzo Frascino
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Konovalov
     
  • In general it's unknown in advance if a slab page will contain accounted
    objects or not. In order to avoid memory waste, an obj_cgroup vector is
    allocated dynamically when a need to account a new object arises. Such an
    approach is memory efficient, but requires an expensive cmpxchg() to set
    up the memcg/objcgs pointer, because an allocation can race with a
    different allocation on another cpu.

    But in some common cases it's known for sure that a slab page will contain
    accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT
    flag set. It includes such popular objects like vm_area_struct, anon_vma,
    task_struct, etc.

    In such cases we can pre-allocate the objcgs vector and simply assign it
    to the page without any atomic operations, because at this early stage the
    page is not visible to anyone else.

    A very simplistic benchmark (allocating 10000000 64-bytes objects in a
    row) shows a ~15% win. In real life it seems that most workloads are
    not very sensitive to the speed of (accounted) slab allocations.
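
    A rough sketch of the idea at the slab-page accounting site; the helper
    names follow the neighbouring commits and the exact signatures are
    simplified assumptions:

        static __always_inline void account_slab_page(struct page *page, int order,
                                                      struct kmem_cache *s, gfp_t gfp)
        {
                /* SLAB_ACCOUNT caches will hold accounted objects for sure, so
                 * the objcg vector can be attached now, with no cmpxchg(). */
                if (memcg_kmem_enabled() && (s->flags & SLAB_ACCOUNT))
                        memcg_alloc_page_obj_cgroups(page, s, gfp, true);

                mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
                                    PAGE_SIZE << order);
        }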

    [guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes]
    Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com
    [akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch]

    Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com
    Signed-off-by: Roman Gushchin
    Acked-by: Johannes Weiner
    Reviewed-by: Shakeel Butt
    Cc: Michal Hocko
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Fix some coding style issues and improve code readability. Add
    whitespace to clearly separate the parameters.

    Link: https://lkml.kernel.org/r/1612841499-32166-1-git-send-email-daizhiyuan@phytium.com.cn
    Signed-off-by: Zhiyuan Dai
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhiyuan Dai
     
  • This argument hasn't been used since e153362a50a3 ("slub: Remove objsize
    check in kmem_cache_flags()") so simply remove it.

    Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.com
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Miaohe Lin
    Reviewed-by: Vlastimil Babka
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikolay Borisov
     
  • Currently, a trace record generated by the RCU core is as below.

    ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66

    It doesn't tell us what the RCU core has freed.

    This patch adds the slab name to trace_kmem_cache_free().
    The new format is as follows.

    ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry
    ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache
    ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue
    ... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node

    We can use it to understand what the RCU core is going to free. For
    example, some users may be interested in when the RCU core starts
    freeing reclaimable slabs like dentry to reduce memory pressure.
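
    A sketch of the call-site change (simplified):

        void kmem_cache_free(struct kmem_cache *cachep, void *objp)
        {
                /* ... */
                trace_kmem_cache_free(_RET_IP_, objp, cachep->name); /* adds name= */
                /* ... */
        }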

    Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.com
    Signed-off-by: Jacob Wen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Steven Rostedt
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jacob Wen
     

23 Jan, 2021

1 commit

  • There are kernel facilities such as per-CPU reference counts that give
    error messages in generic handlers or callbacks, whose messages are
    unenlightening. In the case of per-CPU reference-count underflow, this
    is not a problem when creating a new use of this facility because in that
    case the bug is almost certainly in the code implementing that new use.
    However, trouble arises when deploying across many systems, which might
    exercise corner cases that were not seen during development and testing.
    Here, it would be really nice to get some kind of hint as to which of
    several uses the underflow was caused by.

    This commit therefore exposes a mem_dump_obj() function that takes
    a pointer to memory (which must still be allocated if it has been
    dynamically allocated) and prints available information on where that
    memory came from. This pointer can reference the middle of the block as
    well as the beginning of the block, as needed by things like RCU callback
    functions and timer handlers that might not know where the beginning of
    the memory block is. These functions and handlers can use mem_dump_obj()
    to print out better hints as to where the problem might lie.

    The information printed can depend on kernel configuration. For example,
    the allocation return address can be printed only for slab and slub,
    and even then only when the necessary debug has been enabled. For slab,
    build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
    to the next power of two or use the SLAB_STORE_USER flag when creating the
    kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and
    boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
    if more focused use is desired. Also for slub, use CONFIG_STACKTRACE
    to enable printing of the allocation-time stack trace.
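
    A hypothetical usage sketch; report_bad_ref() is illustrative, only
    mem_dump_obj() is the interface described above:

        static void report_bad_ref(void *obj)
        {
                pr_err("refcount underflow, object %px\n", obj);
                mem_dump_obj(obj);  /* accepts pointers into the middle of a block */
        }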

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Reported-by: Andrii Nakryiko
    [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
    [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
    [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
    [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
    [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
    [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
    Acked-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Tested-by: Naresh Kamboju
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

16 Dec, 2020

3 commits

  • Patch series "arch, mm: improve robustness of direct map manipulation", v7.

    During recent discussion about KVM protected memory, David raised a
    concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
    scope [1].

    Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
    possible that __kernel_map_pages() would fail, but since this function is
    void, the failure will go unnoticed.

    Moreover, there's a lack of consistency in __kernel_map_pages() semantics
    across architectures as some guard this function with #ifdef
    DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
    debugging is disabled at run time and some allow modifying the direct map
    regardless of DEBUG_PAGEALLOC settings.

    This set straightens this out by restoring the dependency of
    __kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
    accordingly.

    Since currently the only user of __kernel_map_pages() outside
    DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
    there more explicit.

    [1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com

    This patch (of 4):

    When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
    direct mapping after free_pages(). The pages then need to be mapped back
    before they can be used. These mapping operations use
    __kernel_map_pages() guarded with debug_pagealloc_enabled().

    The only place that calls __kernel_map_pages() without checking whether
    DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
    availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still,
    on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
    enabled but set_direct_map_invalid_noflush() may render some pages not
    present in the direct map and hibernation code won't be able to save such
    pages.

    To make page allocation debugging and hibernation interaction more robust,
    the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
    made more explicit.

    Start with combining the guard condition and the call to
    __kernel_map_pages() into debug_pagealloc_map_pages() and
    debug_pagealloc_unmap_pages() functions to emphasize that
    __kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
    these new functions to map/unmap pages when page allocation debugging is
    enabled.
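
    A sketch of the new wrappers described above (simplified):

        static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
        {
                if (debug_pagealloc_enabled_static())
                        __kernel_map_pages(page, numpages, 1);
        }

        static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
        {
                if (debug_pagealloc_enabled_static())
                        __kernel_map_pages(page, numpages, 0);
        }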

    Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org
    Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org
    Signed-off-by: Mike Rapoport
    Reviewed-by: David Hildenbrand
    Acked-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Cc: Albert Ou
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: Christoph Lameter
    Cc: "David S. Miller"
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: "Edgecombe, Rick P"
    Cc: "H. Peter Anvin"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Cc: Len Brown
    Cc: Michael Ellerman
    Cc: Palmer Dabbelt
    Cc: Paul Mackerras
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Cc: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
    go to the heap quarantine without being erased.

    Let's move the init_on_free clearing before the call to kasan_slab_free().
    In that case the heap quarantine will store erased objects, matching the
    CONFIG_SLUB=y behavior.

    Link: https://lkml.kernel.org/r/20201210183729.1261524-1-alex.popov@linux.com
    Signed-off-by: Alexander Popov
    Reviewed-by: Alexander Potapenko
    Acked-by: David Rientjes
    Acked-by: Joonsoo Kim
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Popov
     
  • The page allocator expects that page->mapping is NULL for a page being
    freed. SLAB and SLUB use the slab_cache field which is in union with
    mapping, but before freeing the page, the field is referenced with the
    "mapping" name when set to NULL.

    It's IMHO more correct (albeit functionally the same) to use the
    slab_cache name as that's the field we use in SL*B, and document why we
    clear it in a comment (we don't clear fields such as s_mem or freelist, as
    page allocator doesn't care about those). While using the 'mapping' name
    would automagically keep the code correct if the unions in struct page
    changed, such changes should be done consciously and needed changes
    evaluated - the comment should help with that.

    Link: https://lkml.kernel.org/r/20201210160020.21562-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: David Rientjes
    Acked-by: Joonsoo Kim
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

17 Oct, 2020

1 commit

  • Correct the function name "get_partials" to "get_partial". Update the
    old struct name list3 to kmem_cache_node.

    Signed-off-by: Chen Tao
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Linus Torvalds

    Chen Tao
     

14 Oct, 2020

2 commits

  • Object cgroup charging is done for all the objects during allocation, but
    during freeing, uncharging ends up happening for only one object in the
    case of bulk allocation/freeing.

    Fix this by having a separate call to uncharge all the objects from
    kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take
    care of bulk uncharging.
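
    A rough sketch of the intended flow; the hook's exact signature here is
    an assumption based on the description:

        void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
        {
                memcg_slab_free_hook(s, p, size);   /* uncharge every object */
                /* ... build detached freelists and free the objects ... */
        }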

    Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
    Signed-off-by: Bharata B Rao
    Signed-off-by: Andrew Morton
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Vlastimil Babka
    Cc: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: https://lkml.kernel.org/r/20201009060423.390479-1-bharata@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Bharata B Rao
     
  • The removed code was unnecessary and changed nothing in the flow: when
    'kmem_cache_alloc_node' returns NULL, returning 'freelist' from the
    function in question is the same as returning NULL.

    Signed-off-by: Mateusz Nosek
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: https://lkml.kernel.org/r/20200915230329.13002-1-mateusznosek0@gmail.com
    Signed-off-by: Linus Torvalds

    Mateusz Nosek
     

27 Sep, 2020

1 commit

  • With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
    kmem_caches for all allocations"), it becomes possible to call kfree()
    from slabs_destroy().

    The functions cache_flusharray() and do_drain() call slabs_destroy() on
    the array_cache of the local CPU without updating the size of the
    array_cache. This enables the kfree() call from slabs_destroy() to
    recursively call cache_flusharray(), which can potentially call
    free_block() on the same elements of the array_cache of the local CPU,
    causing a double free and memory corruption.

    To fix the issue, simply update the local CPU array_cache before
    calling slabs_destroy().
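
    A rough sketch of the reordering (simplified; locking and the free_block()
    call are elided):

        static void cache_flusharray(struct kmem_cache *cachep,
                                     struct array_cache *ac)
        {
                int batchcount = ac->batchcount;
                LIST_HEAD(list);

                /* ... free_block(cachep, ac->entry, batchcount, node, &list) ... */

                /* Update the local array_cache *before* slabs_destroy(): a
                 * recursive kfree() -> cache_flusharray() must see consistent
                 * state. */
                ac->avail -= batchcount;
                memmove(ac->entry, &ac->entry[batchcount],
                        sizeof(void *) * ac->avail);
                slabs_destroy(cachep, &list);
        }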

    Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
    Signed-off-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Tested-by: Ming Lei
    Reported-by: kernel test robot
    Cc: Andrew Morton
    Cc: Ted Ts'o
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     

08 Aug, 2020

12 commits

  • charge_slab_page() and uncharge_slab_page() are not related anymore to
    memcg charging and uncharging. In order to make their names less
    confusing, let's rename them to account_slab_page() and
    unaccount_slab_page() respectively.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • charge_slab_page() is not using the gfp argument anymore,
    remove it.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Instead of having two sets of kmem_caches: one for system-wide and
    non-accounted allocations and the second one shared by all accounted
    allocations, we can use just one.

    The idea is simple: space for obj_cgroup metadata can be allocated on
    demand and filled only for accounted allocations.

    This allows removing a bunch of code which is required to handle kmem_cache
    clones for accounted allocations. There is no more need to create them,
    accumulate statistics, propagate attributes, etc. It's quite a
    significant simplification.

    Also, because the total number of slab_caches is almost halved (not
    all kmem_caches have a memcg clone), some additional memory savings are
    expected. On my devvm it additionally saves about 3.5% of slab memory.

    [guro@fb.com: fix build on MIPS]
    Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com

    Suggested-by: Johannes Weiner
    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Naresh Kamboju
    Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Currently there are two lists of kmem_caches:
    1) slab_caches, which contains all kmem_caches,
    2) slab_root_caches, which contains only root kmem_caches.

    And there is some preprocessor magic to have a single list if
    CONFIG_MEMCG_KMEM isn't enabled.

    It was required earlier because the number of non-root kmem_caches was
    proportional to the number of memory cgroups and could reach really big
    values. Now, when it cannot exceed the number of root kmem_caches, there
    is really no reason to maintain two lists.

    We never iterate over the slab_root_caches list on any hot paths, so it's
    perfectly fine to iterate over slab_caches and filter out non-root
    kmem_caches.

    This allows removing a lot of config-dependent code and two pointers from
    the kmem_cache structure.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • This is a fairly big but mostly red patch, which makes all accounted slab
    allocations use a single set of kmem_caches instead of creating a separate
    set for each memory cgroup.

    Because the number of non-root kmem_caches is now capped by the number of
    root kmem_caches, there is no need to shrink or destroy them prematurely.
    They can be perfectly destroyed together with their root counterparts.
    This allows dramatically simplifying the management of non-root
    kmem_caches and delete a ton of code.

    This patch performs the following changes:
    1) introduces memcg_params.memcg_cache pointer to represent the
    kmem_cache which will be used for all non-root allocations
    2) reuses the existing memcg kmem_cache creation mechanism
    to create memcg kmem_cache on the first allocation attempt
    3) memcg kmem_caches are named <kmem_cache name>-memcg,
    e.g. dentry-memcg
    4) simplifies memcg_kmem_get_cache() to just return memcg kmem_cache
    or schedule its creation and return the root cache
    5) removes almost all non-root kmem_cache management code
    (separate refcounter, reparenting, shrinking, etc)
    6) makes slab debugfs display the root_mem_cgroup css id and never
    show :dead and :deact flags in the memcg_slabinfo attribute.

    Following patches in the series will simplify the kmem_cache creation.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Store the obj_cgroup pointer in the corresponding place of
    page->obj_cgroups for each allocated non-root slab object. Make sure that
    each allocated object holds a reference to obj_cgroup.

    The objcg pointer is obtained by dereferencing memcg->objcg in
    memcg_kmem_get_cache() and is passed from pre_alloc_hook to post_alloc_hook.
    Then, in case of successful allocation(s), it gets stored in the
    page->obj_cgroups vector.

    The objcg-obtaining part looks a bit bulky now, but it will be simplified
    by the next commits in the series.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Provide the necessary KCSAN checks to assist with debugging racy
    use-after-frees. While KASAN is more reliable at generally catching such
    use-after-frees (due to its use of a quarantine), it can be difficult to
    debug racy use-after-frees. If a reliable reproducer exists, KCSAN can
    assist in debugging such issues.

    Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is
    simply sizeof(var). Instead, here we just use __kcsan_check_access()
    explicitly to pass the correct size.
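
    A sketch of the annotation; the wrapper name is illustrative, only
    __kcsan_check_access() and its flags come from the description:

        static __always_inline void assert_no_concurrent_access(struct kmem_cache *s,
                                                                 void *x)
        {
                /* Like ASSERT_EXCLUSIVE_ACCESS(), but over the object size. */
                __kcsan_check_access(x, s->object_size,
                                     KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);
        }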

    Signed-off-by: Marco Elver
    Signed-off-by: Andrew Morton
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200623072653.114563-1-elver@google.com
    Signed-off-by: Linus Torvalds

    Marco Elver
     
  • cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get
    the cache from its page in kmem_cache_free()") to support kmemcg, where
    per-memcg cache can be different from the root one, so we can't use the
    kmem_cache pointer given to kmem_cache_free().

    Prior to that commit, SLUB already had debugging check+warning that could
    be enabled to compare the given kmem_cache pointer to one referenced by
    the slab page where the object-to-be-freed resides. This check was moved
    to cache_from_obj(). Later the check was also enabled for
    SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
    cache membership under freelist hardening").

    These checks and warnings can be useful especially for the debugging,
    which can be improved. Commit 598a0717a816 changed the pr_err() with
    WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported,
    others are silent. This patch changes it to WARN() so that all errors are
    reported.

    It's also useful to print SLUB allocation/free tracking info for the
    offending object, if tracking is enabled. Thus, export the SLUB
    print_tracking() function and provide an empty one for SLAB.

    For SLUB we can also benefit from the static key check in
    kmem_cache_debug_flags(), but we need to move this function to slab.h and
    declare the static key there.
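
    A rough sketch of the strengthened check (simplified):

        static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
        {
                struct kmem_cache *cachep;

                if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
                    !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
                        return s;

                cachep = virt_to_cache(x);
                if (WARN(cachep && cachep != s,
                         "%s: Wrong slab cache. %s but object is from %s\n",
                         __func__, s->name, cachep->name))
                        print_tracking(cachep, x);      /* no-op for SLAB */
                return cachep;
        }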

    [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com

    [vbabka@suse.cz: avoid bogus WARN()]
    Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian
    Link: http://lkml.kernel.org/r/b33e0fa7-cd28-4788-9e54-5927846329ef@suse.cz

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Acked-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Matthew Garrett
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: Vinayak Menon
    Link: http://lkml.kernel.org/r/afeda7ac-748b-33d8-a905-56b708148ad5@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b:
    always get the cache from its page in kmem_cache_free()") to support
    kmemcg, where per-memcg cache can be different from the root one, so we
    can't use the kmem_cache pointer given to kmem_cache_free().

    Prior to that commit, SLUB already had debugging check+warning that could
    be enabled to compare the given kmem_cache pointer to one referenced by
    the slab page where the object-to-be-freed resides. This check was moved
    to cache_from_obj(). Later the check was also enabled for
    SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
    cache membership under freelist hardening").

    These checks and warnings can be useful especially for the debugging,
    which can be improved. Commit 598a0717a816 changed the pr_err() with
    WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported,
    others are silent. This patch changes it to WARN() so that all errors are
    reported.

    It's also useful to print SLUB allocation/free tracking info for the
    offending object, if tracking is enabled. We could export the SLUB
    print_tracking() function and provide an empty one for SLAB, or realize
    that both the debugging and hardening cases in cache_from_obj() are only
    supported by SLUB anyway. So this patch moves cache_from_obj() from
    slab.h to separate instances in slab.c and slub.c, where the SLAB version
    only does the kmemcg lookup and even could be completely removed once the
    kmemcg rework [1] is merged. The SLUB version can thus easily use the
    print_tracking() function. It can also use the kmem_cache_debug_flags()
    static key check for improved performance in kernels without the hardening
    and with debugging not enabled on boot.

    [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Kees Cook
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-10-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • kmem_list3 was renamed to kmem_cache_node long ago, so update it.

    References:
    6744f087ba2a ("slab: Common name for the per node structures")
    ce8eb6c424c7 ("slab: Rename list3/l3 to node")

    Signed-off-by: Xiao Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200722033355.26908-1-yangx.jy@cn.fujitsu.com
    Signed-off-by: Linus Torvalds

    Xiao Yang
     
  • kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of
    memory currently bypasses the check and will simply leak the memory when
    page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK
    check out of slab & slub, and call it from kmalloc_order() as well. In
    order to make the code clear, the warning message is put in one place.
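
    A rough sketch of the factored-out helper (simplified; the exact name and
    message follow common practice and may differ):

        static inline gfp_t kmalloc_fix_flags(gfp_t flags)
        {
                gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;

                flags &= ~GFP_SLAB_BUG_MASK;
                pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
                        invalid_mask, &invalid_mask, flags, &flags);
                return flags;
        }

        /* Callers in slab, slub and kmalloc_order() then do:
         *   if (unlikely(flags & GFP_SLAB_BUG_MASK))
         *           flags = kmalloc_fix_flags(flags);
         */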

    Signed-off-by: Long Li
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Reviewed-by: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong
    Signed-off-by: Linus Torvalds

    Long Li
     
  • Similar to commit ce6fa91b9363 ("mm/slub.c: add a naive detection of
    double free or corruption"), add a very cheap double-free check for SLAB
    under CONFIG_SLAB_FREELIST_HARDENED. With this added, the
    "SLAB_FREE_DOUBLE" LKDTM test passes under SLAB:

    lkdtm: Performing direct entry SLAB_FREE_DOUBLE
    lkdtm: Attempting double slab free ...
    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 2193 at mm/slab.c:757 ___cache_free+0x325/0x390
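
    A rough sketch of the check (simplified from the description): compare the
    pointer being freed with the most recently freed entry in the array cache.

        static __always_inline void __free_one(struct array_cache *ac, void *objp)
        {
                if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
                    WARN_ON_ONCE(ac->avail > 0 &&
                                 ac->entry[ac->avail - 1] == objp))
                        return;         /* trivial double free, drop the object */
                ac->entry[ac->avail++] = objp;
        }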

    [keescook@chromium.org: fix misplaced __free_one()]
    Link: http://lkml.kernel.org/r/202006261306.0D82A2B@keescook
    Link: https://lore.kernel.org/lkml/7ff248c7-d447-340c-a8e2-8c02972aca70@infradead.org

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Acked-by: Randy Dunlap [build tested]
    Cc: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Alexander Popov
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Vinayak Menon
    Cc: Matthew Garrett
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Link: http://lkml.kernel.org/r/20200625215548.389774-3-keescook@chromium.org
    Signed-off-by: Linus Torvalds

    Kees Cook
     

04 Jun, 2020

1 commit

  • classzone_idx is just a different name for high_zoneidx now. So, integrate
    them and add a comment to struct alloc_context in order to reduce
    future confusion about the meaning of this variable.

    The accessor, ac_classzone_idx() is also removed since it isn't needed
    after integration.

    In addition to the integration, this patch also renames high_zoneidx to
    highest_zoneidx since the latter conveys the meaning more precisely.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Acked-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Ye Xiaolong
    Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

14 Jan, 2020

1 commit

  • Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
    debugging") has introduced a static key to reduce overhead when
    debug_pagealloc is compiled in but not enabled. It relied on the
    assumption that jump_label_init() is called before parse_early_param()
    as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
    it is safe to enable the static key.

    However, it turns out multiple architectures call parse_early_param()
    earlier from their setup_arch(). x86 also calls jump_label_init() even
    earlier, so no issue was found while testing the commit, but same is not
    true for e.g. ppc64 and s390 where the kernel would not boot with
    debug_pagealloc=on as found by our QA.

    To fix this without tricky changes to init code of multiple
    architectures, this patch partially reverts the static key conversion
    from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
    code) of debug_pagealloc_enabled() will again test a simple bool
    variable. Fastpath mm code is converted to a new
    debug_pagealloc_enabled_static() variant that relies on the static key,
    which is enabled in a well-defined point in mm_init() where it's
    guaranteed that jump_label_init() has been called, regardless of
    architecture.
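
    A sketch of the two variants described above (simplified):

        static inline bool debug_pagealloc_enabled(void)
        {
                /* Safe anywhere, including early init and arch code. */
                return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                       _debug_pagealloc_enabled_early;
        }

        static inline bool debug_pagealloc_enabled_static(void)
        {
                /* Fast-path variant; valid only once mm_init() has set the key. */
                if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                        return false;
                return static_branch_unlikely(&_debug_pagealloc_enabled);
        }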

    [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
    Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
    Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Stephen Rothwell
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

01 Dec, 2019

2 commits

  • The size of kmalloc can be obtained from kmalloc_info[], so remove
    kmalloc_size(), which is not used anymore.

    Link: http://lkml.kernel.org/r/1569241648-26908-3-git-send-email-lpf.vector@gmail.com
    Signed-off-by: Pengfei Li
    Acked-by: Vlastimil Babka
    Acked-by: Roman Gushchin
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pengfei Li
     
  • Patch series "mm, slab: Make kmalloc_info[] contain all types of names", v6.

    There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM
    and KMALLOC_DMA.

    The name of KMALLOC_NORMAL is contained in kmalloc_info[].name,
    but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically
    generated by kmalloc_cache_name().

    Patch1 predefines the names of all types of kmalloc to save
    the time spent dynamically generating names.

    These changes make sense, and the time spent by new_kmalloc_cache()
    has been reduced by approximately 36.3%.

    Time spent by new_kmalloc_cache() (CPU cycles):
      5.3-rc7        66264
      5.3-rc7+patch  42188

    This patch (of 3):

    There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and
    KMALLOC_DMA.

    The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the
    names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by
    kmalloc_cache_name().

    This patch predefines the names of all types of kmalloc to save the time
    spent dynamically generating names.

    Besides, remove the kmalloc_cache_name() that is no longer used.
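
    A rough sketch of a predefined-name table (abridged; the real table also
    handles CONFIG_ZONE_DMA conditionally, and the size list is longer):

        #define INIT_KMALLOC_INFO(__size, __short_size)                 \
        {                                                               \
                .name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,     \
                .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \
                .name[KMALLOC_DMA]     = "dma-kmalloc-" #__short_size, \
                .size = __size,                                         \
        }

        const struct kmalloc_info_struct kmalloc_info[] = {
                INIT_KMALLOC_INFO(0, 0),
                INIT_KMALLOC_INFO(96, 96),
                INIT_KMALLOC_INFO(192, 192),
                INIT_KMALLOC_INFO(8, 8),
                /* ... up to the largest supported kmalloc size ... */
        };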

    Link: http://lkml.kernel.org/r/1569241648-26908-2-git-send-email-lpf.vector@gmail.com
    Signed-off-by: Pengfei Li
    Acked-by: Vlastimil Babka
    Acked-by: Roman Gushchin
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pengfei Li