27 Apr, 2022
1 commit
-
commit 2dfe63e61cc31ee59ce951672b0850b5229cd5b0 upstream.
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
producing garbage data due to the object not actually being maintained
by SLAB or SLUB.

Fix this by implementing __kfence_obj_info() that copies relevant
information to struct kmem_obj_info when the object was allocated by
KFENCE; this is called by a common kmem_obj_info(), which also calls the
slab/slub/slob specific variant now called __kmem_obj_info().

For completeness, kmem_dump_obj() now displays if the object was
allocated by KFENCE.

Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB")
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Marco Elver
Reviewed-by: Hyeonggon Yoo
Reported-by: kernel test robot
Acked-by: Vlastimil Babka [slab]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
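For illustration, a minimal sketch of the dispatch described above (simplified; signatures assumed from the commit text, and newer kernels pass a struct slab rather than a struct page):

/* Give KFENCE first crack at the object; only non-KFENCE objects
 * fall through to the allocator-specific __kmem_obj_info(). */
static void kmem_obj_info(struct kmem_obj_info *kpp, void *object,
                          struct page *page)
{
        if (__kfence_obj_info(kpp, object, page))
                return;         /* object was allocated by KFENCE */
        __kmem_obj_info(kpp, object, page);     /* SLAB/SLUB/SLOB variant */
}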
08 Apr, 2022
1 commit
-
commit ae085d7f9365de7da27ab5c0d16b12d51ea7fca9 upstream.
The objcg is not cleared and put for a kfence object when it is freed,
which could lead to a memory leak of struct obj_cgroup and wrong
statistics of NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B.

Since the last freed object's objcg is not cleared,
mem_cgroup_from_obj() could return the wrong memcg when this kfence
object, which is not charged to any objcgs, is reallocated to other
users.

A real world issue [1] is caused by this bug.
Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1]
Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Muchun Song
Cc: Dmitry Vyukov
Cc: Marco Elver
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
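A rough sketch of the kind of fix described (hedged: the exact call site and helper placement are assumed from the description, not quoted from the patch):

static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
                                         unsigned long caller)
{
        if (is_kfence_address(objp)) {
                kmemleak_free_recursive(objp, cachep->flags);
                /* Uncharge and clear the objcg before KFENCE short-circuits
                 * the free, so the next owner of this slot cannot inherit it. */
                memcg_slab_free_hook(cachep, &objp, 1);
                __kfence_free(objp);
                return;
        }
        /* ... regular SLAB free path ... */
}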
19 Oct, 2021
1 commit
-
Once upon a time, the node demotion updates were driven solely by memory
hotplug events. But now, there are handlers for both CPU and memory
hotplug.

However, the #ifdef around the code checks only memory hotplug. A
system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU
hotplug events.

Update the #ifdef around the common code. Add memory and CPU-specific
#ifdefs for their handlers. These memory/CPU #ifdefs avoid unused
function warnings when their Kconfig option is off.

[arnd@arndb.de: rework hotplug_memory_notifier() stub]
Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen
Signed-off-by: Arnd Bergmann
Cc: "Huang, Ying"
Cc: Michal Hocko
Cc: Wei Xu
Cc: Oscar Salvador
Cc: David Rientjes
Cc: Dan Williams
Cc: David Hildenbrand
Cc: Greg Thelen
Cc: Yang Shi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
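The resulting #ifdef layout, sketched (handler names and bodies are illustrative, not quoted from the patch):

#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
/* ... common node demotion order code, built for either flavor ... */

#ifdef CONFIG_MEMORY_HOTPLUG
static int migrate_on_reclaim_callback(struct notifier_block *self,
                                       unsigned long action, void *arg)
{
        /* ... memory hotplug handler ... */
        return notifier_from_errno(0);
}
#endif /* CONFIG_MEMORY_HOTPLUG */

#ifdef CONFIG_HOTPLUG_CPU
static int migration_online_cpu(unsigned int cpu)
{
        /* ... CPU hotplug handler ... */
        return 0;
}
#endif /* CONFIG_HOTPLUG_CPU */
#endif /* CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG */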
07 May, 2021
2 commits
-
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.

Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Randy Dunlap
Cc: Bhaskar Chowdhury
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is a spelling mistake in a comment. Fix it.
Link: https://lkml.kernel.org/r/20210317094158.5762-1-colin.king@canonical.com
Signed-off-by: Colin Ian King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 May, 2021
2 commits
-
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_free is enabled.

With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_free are enabled. Instead, memory is
initialized in KASAN runtime.

For SLUB, the memory initialization memset() is moved into
slab_free_hook() that currently directly follows the initialization loop.
A new argument is added to slab_free_hook() that indicates whether to
initialize the memory or not.

To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.

Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_free is enabled.

Link: https://lkml.kernel.org/r/190fd15c1886654afdec0d19ebebd5ade665b601.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Reviewed-by: Marco Elver
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Branislav Rankov
Cc: Catalin Marinas
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Dmitry Vyukov
Cc: Evgenii Stepanov
Cc: Joonsoo Kim
Cc: Kevin Brodsky
Cc: Pekka Enberg
Cc: Peter Collingbourne
Cc: Vincenzo Frascino
Cc: Vlastimil Babka
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
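A simplified sketch of the reworked SLUB free hook (adapted from the description above; the real mm/slub.c code handles more cases):

static __always_inline bool slab_free_hook(struct kmem_cache *s,
                                           void *x, bool init)
{
        /* ... other free hooks ... */

        /*
         * As memory initialization might be integrated into KASAN,
         * kasan_slab_free() and the initialization memset() must be
         * kept together to avoid discrepancies in behavior.
         */
        if (init && !kasan_has_integrated_init())
                memset(x, 0, s->object_size);

        /* With HW_TAGS KASAN, init happens while the tags are set. */
        return kasan_slab_free(s, x, init);
}
-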
This change uses the previously added memory initialization feature of
HW_TAGS KASAN routines for slab memory when init_on_alloc is enabled.

With this change, memory initialization memset() is no longer called when
both HW_TAGS KASAN and init_on_alloc are enabled. Instead, memory is
initialized in KASAN runtime.

The memory initialization memset() is moved into slab_post_alloc_hook()
that currently directly follows the initialization loop. A new argument
is added to slab_post_alloc_hook() that indicates whether to initialize
the memory or not.

To avoid discrepancies with which memory gets initialized that can be
caused by future changes, both KASAN hook and initialization memset() are
put together and a warning comment is added.

Combining setting allocation tags with memory initialization improves
HW_TAGS KASAN performance when init_on_alloc is enabled.

Link: https://lkml.kernel.org/r/c1292aeb5d519da221ec74a0684a949b027d7720.1615296150.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Reviewed-by: Marco Elver
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Branislav Rankov
Cc: Catalin Marinas
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Dmitry Vyukov
Cc: Evgenii Stepanov
Cc: Joonsoo Kim
Cc: Kevin Brodsky
Cc: Pekka Enberg
Cc: Peter Collingbourne
Cc: Vincenzo Frascino
Cc: Vlastimil Babka
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
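A simplified sketch of the corresponding allocation-side hook (adapted from the description above; the real slab_post_alloc_hook() does more):

static inline void slab_post_alloc_hook(struct kmem_cache *s,
                                        struct obj_cgroup *objcg, gfp_t flags,
                                        size_t size, void **p, bool init)
{
        size_t i;

        /*
         * As memory initialization might be integrated into KASAN,
         * kasan_slab_alloc() and the initialization memset() must be
         * kept together to avoid discrepancies in behavior.
         */
        for (i = 0; i < size; i++) {
                p[i] = kasan_slab_alloc(s, p[i], flags, init);
                if (p[i] && init && !kasan_has_integrated_init())
                        memset(p[i], 0, s->object_size);
        }
        /* ... */
}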
11 Apr, 2021
1 commit
-
…ulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:
- Bitmap support for "N" as alias for last bit
- kvfree_rcu updates
- mem_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.)
- RCU callback offloading update
- Polling RCU grace-period interfaces
- Realtime-related RCU updates
- Tasks-RCU updates
- Torture-test updates
- Torture-test scripting updates
- Miscellaneous fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
14 Mar, 2021
1 commit
-
cache_alloc_debugcheck_after() performs checks on an object, including
adjusting the returned pointer. None of this should apply to KFENCE
objects. While for non-bulk allocations, the checks are skipped when we
allocate via KFENCE, for bulk allocations cache_alloc_debugcheck_after()
is called via cache_alloc_debugcheck_after_bulk().

Fix it by skipping cache_alloc_debugcheck_after() for KFENCE objects.
Link: https://lkml.kernel.org/r/20210304205256.2162309-1-elver@google.com
Signed-off-by: Marco Elver
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Andrey Konovalov
Cc: Jann Horn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
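The shape of the fix, sketched (the KFENCE test is the new part; the body of the checks is elided):

static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
                                          gfp_t flags, void *objp,
                                          unsigned long caller)
{
        if (!objp || is_kfence_address(objp))   /* new: skip KFENCE objects */
                return objp;
        /* ... poison/redzone checks and pointer adjustment ... */
        return objp;
}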
09 Mar, 2021
1 commit
-
The mem_dump_obj() functionality adds a few hundred bytes, which is a
small price to pay. Except on kernels built with CONFIG_PRINTK=n, in
which mem_dump_obj() messages will be suppressed. This commit therefore
makes mem_dump_obj() be a static inline empty function on kernels built
with CONFIG_PRINTK=n and excludes all of its support functions as well.
This avoids kernel bloat on systems that cannot use mem_dump_obj().

Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Suggested-by: Andrew Morton
Signed-off-by: Paul E. McKenney
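The header-side pattern, as described (sketch of include/linux/mm.h):

#ifdef CONFIG_PRINTK
void mem_dump_obj(void *object);
#else
static inline void mem_dump_obj(void *object) {}
#endif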
27 Feb, 2021
1 commit
-
Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
Signed-off-by: Marco Elver
Signed-off-by: Alexander Potapenko
Reviewed-by: Dmitry Vyukov
Co-developed-by: Marco Elver
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Dave Hansen
Cc: Eric Dumazet
Cc: Greg Kroah-Hartman
Cc: Hillf Danton
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jann Horn
Cc: Joern Engel
Cc: Jonathan Corbet
Cc: Kees Cook
Cc: Mark Rutland
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: SeongJae Park
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
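A sketch of the propagation described above (simplified; the KFENCE guard, not the elided regular fastpath, is the point):

static __always_inline void *
slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size,
           unsigned long caller)
{
        void *objp;

        /* KFENCE needs the kmalloc-caller's size; kmem_cache::size only
         * reflects the size class the allocation was bucketed into. */
        objp = kfence_alloc(cachep, orig_size, flags);
        if (unlikely(objp))
                return objp;

        /* ... regular SLAB fastpath ... */
}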
25 Feb, 2021
5 commits
-
Generic mm functions that call KASAN annotations that might report a bug
pass _RET_IP_ to them as an argument. This allows KASAN to include the
name of the function that called the mm function in its report's header.

Now that KASAN has inline wrappers for all of its annotations, move
_RET_IP_ to those wrappers to simplify annotation call sites.

Link: https://linux-review.googlesource.com/id/I8fb3c06d49671305ee184175a39591bc26647a67
Link: https://lkml.kernel.org/r/5c1490eddf20b436b8c4eeea83fce47687d5e4a4.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov
Reviewed-by: Marco Elver
Reviewed-by: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Branislav Rankov
Cc: Catalin Marinas
Cc: Dmitry Vyukov
Cc: Evgenii Stepanov
Cc: Kevin Brodsky
Cc: Peter Collingbourne
Cc: Vincenzo Frascino
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
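One such wrapper, sketched (modeled on include/linux/kasan.h; exact names vary by annotation):

static __always_inline bool kasan_slab_free(struct kmem_cache *s, void *object)
{
        if (kasan_enabled())
                /* _RET_IP_ is captured here, in the inlined wrapper, so the
                 * report header names the mm function that called us. */
                return __kasan_slab_free(s, object, _RET_IP_);
        return false;
}
-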
In general it's unknown in advance if a slab page will contain accounted
objects or not. In order to avoid memory waste, an obj_cgroup vector is
allocated dynamically when a need to account a new object arises. Such
an approach is memory efficient, but requires an expensive cmpxchg() to
set up the memcg/objcgs pointer, because an allocation can race with a
different allocation on another cpu.

But in some common cases it's known for sure that a slab page will contain
accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT
flag set. It includes such popular objects like vm_area_struct, anon_vma,
task_struct, etc.

In such cases we can pre-allocate the objcgs vector and simply assign it
to the page without any atomic operations, because at this early stage the
page is not visible to anyone else.

A very simplistic benchmark (allocating 10000000 64-byte objects in a
row) shows a ~15% win. In real life it seems that most workloads are
not very sensitive to the speed of (accounted) slab allocations.

[guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes]
Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com
[akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch]
Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.com
Signed-off-by: Roman Gushchin
Acked-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Cc: Michal Hocko
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
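A sketch of the pre-allocation described above (modeled on the commit; helper names assumed):

static inline void account_slab_page(struct page *page, int order,
                                     struct kmem_cache *s, gfp_t gfp)
{
        /* SLAB_ACCOUNT caches always need the objcgs vector, so allocate
         * it up front, before the page is visible to anyone else. */
        if (memcg_kmem_enabled() && (s->flags & SLAB_ACCOUNT))
                memcg_alloc_page_obj_cgroups(page, s, gfp, true);

        mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
                            PAGE_SIZE << order);
}
-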
Fix some coding style issues to improve code readability. Add whitespace
to clearly separate the parameters.

Link: https://lkml.kernel.org/r/1612841499-32166-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This argument hasn't been used since e153362a50a3 ("slub: Remove objsize
check in kmem_cache_flags()") so simply remove it.

Link: https://lkml.kernel.org/r/20210126095733.974665-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov
Reviewed-by: Miaohe Lin
Reviewed-by: Vlastimil Babka
Acked-by: Christoph Lameter
Acked-by: David Rientjes
Cc: Pekka Enberg
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently, a trace record generated by the RCU core is as below.

... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f3b49a66

It doesn't tell us what the RCU core has freed.

This patch adds the slab name to trace_kmem_cache_free(). The new format
is as follows.

... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000037f79c8d name=dentry
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=00000000f78cb7b5 name=sock_inode_cache
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=0000000018768985 name=pool_workqueue
... kmem_cache_free: call_site=rcu_core+0x1fd/0x610 ptr=000000006a6cb484 name=radix_tree_node

We can use it to understand what the RCU core is going to free. For
example, some users may be interested in when the RCU core starts
freeing reclaimable slabs like dentry to reduce memory pressure.

Link: https://lkml.kernel.org/r/20201216072804.8838-1-jian.w.wen@oracle.com
Signed-off-by: Jacob Wen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Steven Rostedt
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
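A sketch of the corresponding tracepoint definition (modeled on include/trace/events/kmem.h; details may differ by version):

TRACE_EVENT(kmem_cache_free,

        TP_PROTO(unsigned long call_site, const void *ptr, const char *name),

        TP_ARGS(call_site, ptr, name),

        TP_STRUCT__entry(
                __field(unsigned long, call_site)
                __field(const void *, ptr)
                __string(name, name)            /* new: the slab cache name */
        ),

        TP_fast_assign(
                __entry->call_site = call_site;
                __entry->ptr = ptr;
                __assign_str(name, name);
        ),

        TP_printk("call_site=%pS ptr=%p name=%s",
                  (void *)__entry->call_site, __entry->ptr, __get_str(name))
);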
23 Jan, 2021
1 commit
-
There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening. In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from. This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is. These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.

The information printed can depend on kernel configuration. For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled. For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or use the SLAB_STORE_USER when creating the
kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired. Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.

Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Andrew Morton
Reported-by: Andrii Nakryiko
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
[ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
Acked-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Tested-by: Naresh Kamboju
Signed-off-by: Paul E. McKenney
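A hypothetical use from an RCU callback (struct my_obj and my_rcu_cb() are invented for this example):

struct my_obj {                         /* hypothetical user structure */
        int refcnt;
        struct rcu_head rh;
};

static void my_rcu_cb(struct rcu_head *rhp)
{
        struct my_obj *p = container_of(rhp, struct my_obj, rh);

        if (WARN_ON_ONCE(p->refcnt != 0))
                mem_dump_obj(rhp);      /* interior pointers are fine; prints
                                         * the slab cache, offset and, with
                                         * debug enabled, allocation origin */
        kfree(p);
}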
16 Dec, 2020
3 commits
-
Patch series "arch, mm: improve robustness of direct map manipulation", v7.

During recent discussion about KVM protected memory, David raised a
concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
scope [1].

Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.

Moreover, there's lack of consistency of __kernel_map_pages() semantics
across architectures as some guard this function with #ifdef
DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
debugging is disabled at run time and some allow modifying the direct map
regardless of DEBUG_PAGEALLOC settings.

This set straightens this out by restoring dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.

Since currently the only user of __kernel_map_pages() outside
DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
there more explicit.

[1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com

This patch (of 4):

When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
direct mapping after free_pages(). The pages then need to be mapped back
before they can be used. These mapping operations use
__kernel_map_pages() guarded with debug_pagealloc_enabled().

The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still,
on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
enabled but set_direct_map_invalid_noflush() may render some pages not
present in the direct map and hibernation code won't be able to save such
pages.

To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
made more explicit.

Start with combining the guard condition and the call to
__kernel_map_pages() into debug_pagealloc_map_pages() and
debug_pagealloc_unmap_pages() functions to emphasize that
__kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
these new functions to map/unmap pages when page allocation debugging is
enabled.

Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org
Signed-off-by: Mike Rapoport
Reviewed-by: David Hildenbrand
Acked-by: Kirill A. Shutemov
Acked-by: Vlastimil Babka
Cc: Albert Ou
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christian Borntraeger
Cc: Christoph Lameter
Cc: "David S. Miller"
Cc: Dave Hansen
Cc: David Rientjes
Cc: "Edgecombe, Rick P"
Cc: "H. Peter Anvin"
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Joonsoo Kim
Cc: Len Brown
Cc: Michael Ellerman
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Pavel Machek
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Will Deacon
Cc: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
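The new helpers, sketched (this follows the shape described above; see include/linux/mm.h for the real definitions):

static inline void debug_pagealloc_map_pages(struct page *page, int numpages)
{
        if (debug_pagealloc_enabled_static())
                __kernel_map_pages(page, numpages, 1);
}

static inline void debug_pagealloc_unmap_pages(struct page *page, int numpages)
{
        if (debug_pagealloc_enabled_static())
                __kernel_map_pages(page, numpages, 0);
}
-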
Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine not being erased.

Let's move init_on_free clearing before calling kasan_slab_free(). In that
case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y
behavior.

Link: https://lkml.kernel.org/r/20201210183729.1261524-1-alex.popov@linux.com
Signed-off-by: Alexander Popov
Reviewed-by: Alexander Potapenko
Acked-by: David Rientjes
Acked-by: Joonsoo Kim
Cc: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
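The reordering, sketched against the SLAB free path of that era (simplified; the kasan_slab_free() signature with an ip argument is assumed):

void ___cache_free(struct kmem_cache *cachep, void *objp,
                   unsigned long caller)
{
        /* Erase first, so the KASAN quarantine holds zeroed objects ... */
        if (unlikely(slab_want_init_on_free(cachep)))
                memset(objp, 0, cachep->object_size);

        /* ... because kasan_slab_free() may divert objp into quarantine. */
        if (kasan_slab_free(cachep, objp, _RET_IP_))
                return;
        /* ... */
}
-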
The page allocator expects that page->mapping is NULL for a page being
freed. SLAB and SLUB use the slab_cache field which is in union with
mapping, but before freeing the page, the field is referenced with the
"mapping" name when set to NULL.

It's IMHO more correct (albeit functionally the same) to use the
slab_cache name as that's the field we use in SL*B, and document why we
clear it in a comment (we don't clear fields such as s_mem or freelist, as
page allocator doesn't care about those). While using the 'mapping' name
would automagically keep the code correct if the unions in struct page
changed, such changes should be done consciously and needed changes
evaluated - the comment should help with that.

Link: https://lkml.kernel.org/r/20201210160020.21562-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: David Rientjes
Acked-by: Joonsoo Kim
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
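The resulting free path, sketched (the comment follows the commit's wording):

static void __free_slab(struct kmem_cache *s, struct page *page)
{
        /* ... teardown ... */

        /* In union with page->mapping where page allocator expects NULL */
        page->slab_cache = NULL;
        __free_pages(page, compound_order(page));
}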
17 Oct, 2020
1 commit
-
Correct the function name "get_partials" to "get_partial". Update the
old struct name list3 to kmem_cache_node.

Signed-off-by: Chen Tao
Signed-off-by: Andrew Morton
Reviewed-by: Mike Rapoport
Link: https://lkml.kernel.org/r/Message-ID:
Signed-off-by: Linus Torvalds
14 Oct, 2020
2 commits
-
Object cgroup charging is done for all the objects during allocation, but
during freeing, uncharging ends up happening for only one object in the
case of bulk allocation/freeing.

Fix this by having a separate call to uncharge all the objects from
kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take
care of bulk uncharging.

Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
Signed-off-by: Bharata B Rao
Signed-off-by: Andrew Morton
Acked-by: Roman Gushchin
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: Shakeel Butt
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Link: https://lkml.kernel.org/r/20201009060423.390479-1-bharata@linux.ibm.com
Signed-off-by: Linus Torvalds
-
The removed code was unnecessary and changed nothing in the flow, since if
'kmem_cache_alloc_node' returns NULL, returning 'freelist' from the
function in question is the same as returning NULL.

Signed-off-by: Mateusz Nosek
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Link: https://lkml.kernel.org/r/20200915230329.13002-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds
27 Sep, 2020
1 commit
-
With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
kmem_caches for all allocations"), it becomes possible to call kfree()
from slabs_destroy().

The functions cache_flusharray() and do_drain() call slabs_destroy() on
the array_cache of the local CPU without updating the size of the
array_cache. This enables the kfree() call from slabs_destroy() to
recursively call cache_flusharray(), which can potentially call
free_block() on the same elements of the array_cache of the local CPU
and cause double free and memory corruption.

To fix the issue, simply update the local CPU array_cache before calling
slabs_destroy().

Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Tested-by: Ming Lei
Reported-by: kernel test robot
Cc: Andrew Morton
Cc: Ted Ts'o
Signed-off-by: Linus Torvalds
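The fix, sketched as a fragment of cache_flusharray() (simplified from mm/slab.c):

        free_block(cachep, ac->entry, batchcount, node, &list);

        /*
         * Shrink the local array_cache *before* slabs_destroy(): a
         * recursive kfree() -> cache_flusharray() must not operate on
         * the entries already handed to free_block().
         */
        ac->avail -= batchcount;
        memmove(ac->entry, &(ac->entry[batchcount]),
                sizeof(void *) * ac->avail);

        slabs_destroy(cachep, &list);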
08 Aug, 2020
12 commits
-
charge_slab_page() and uncharge_slab_page() are not related anymore to
memcg charging and uncharging. In order to make their names less
confusing, let's rename them to account_slab_page() and
unaccount_slab_page() respectively.

Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: Pekka Enberg
Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com
Signed-off-by: Linus Torvalds
-
charge_slab_page() is not using the gfp argument anymore, so remove it.

Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Johannes Weiner
Cc: Michal Hocko
Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com
Signed-off-by: Linus Torvalds
-
Instead of having two sets of kmem_caches: one for system-wide and
non-accounted allocations and the second one shared by all accounted
allocations, we can use just one.

The idea is simple: space for obj_cgroup metadata can be allocated on
demand and filled only for accounted allocations.

It allows removing a bunch of code which is required to handle kmem_cache
clones for accounted allocations. There is no more need to create them,
accumulate statistics, propagate attributes, etc. It's a quite
significant simplification.

Also, because the total number of slab_caches is reduced almost twice (not
all kmem_caches have a memcg clone), some additional memory savings are
expected. On my devvm it additionally saves about 3.5% of slab memory.

[guro@fb.com: fix build on MIPS]
Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com

Suggested-by: Johannes Weiner
Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
Cc: Christoph Lameter
Cc: Michal Hocko
Cc: Tejun Heo
Cc: Naresh Kamboju
Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com
Signed-off-by: Linus Torvalds
-
Currently there are two lists of kmem_caches:
1) slab_caches, which contains all kmem_caches,
2) slab_root_caches, which contains only root kmem_caches.

And there is some preprocessor magic to have a single list if
CONFIG_MEMCG_KMEM isn't enabled.

It was required earlier because the number of non-root kmem_caches was
proportional to the number of memory cgroups and could reach really big
values. Now, when it cannot exceed the number of root kmem_caches, there
is really no reason to maintain two lists.

We never iterate over the slab_root_caches list on any hot paths, so it's
perfectly fine to iterate over slab_caches and filter out non-root
kmem_caches.

It allows removing a lot of config-dependent code and two pointers from
the kmem_cache structure.

Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
Cc: Christoph Lameter
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com
Signed-off-by: Linus Torvalds
-
This is a fairly big but mostly red patch, which makes all accounted slab
allocations use a single set of kmem_caches instead of creating a separate
set for each memory cgroup.

Because the number of non-root kmem_caches is now capped by the number of
root kmem_caches, there is no need to shrink or destroy them prematurely.
They can be perfectly destroyed together with their root counterparts.
This allows to dramatically simplify the management of non-root
kmem_caches and delete a ton of code.

This patch performs the following changes:
1) introduces memcg_params.memcg_cache pointer to represent the
   kmem_cache which will be used for all non-root allocations
2) reuses the existing memcg kmem_cache creation mechanism
   to create memcg kmem_cache on the first allocation attempt
3) memcg kmem_caches are named <cache-name>-memcg,
   e.g. dentry-memcg
4) simplifies memcg_kmem_get_cache() to just return memcg kmem_cache
   or schedule its creation and return the root cache
5) removes almost all non-root kmem_cache management code
   (separate refcounter, reparenting, shrinking, etc)
6) makes slab debugfs to display root_mem_cgroup css id and never
   show :dead and :deact flags in the memcg_slabinfo attribute.

Following patches in the series will simplify the kmem_cache creation.

Signed-off-by: Roman Gushchin
Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
Cc: Christoph Lameter
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com
Signed-off-by: Linus Torvalds
-
Store the obj_cgroup pointer in the corresponding place of
page->obj_cgroups for each allocated non-root slab object. Make sure that
each allocated object holds a reference to obj_cgroup.

The objcg pointer is obtained from the memcg->objcg dereferencing in
memcg_kmem_get_cache() and passed from pre_alloc_hook to post_alloc_hook.
Then in case of successful allocation(s) it's getting stored in the
page->obj_cgroups vector.

The objcg obtaining part looks a bit bulky now, but it will be simplified
by next commits in the series.

Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
Cc: Christoph Lameter
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com
Signed-off-by: Linus Torvalds
-
Provide the necessary KCSAN checks to assist with debugging racy
use-after-frees. While KASAN is more reliable at generally catching such
use-after-frees (due to its use of a quarantine), it can be difficult to
debug racy use-after-frees. If a reliable reproducer exists, KCSAN can
assist in debugging such issues.

Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is
simply sizeof(var). Instead, here we just use __kcsan_check_access()
explicitly to pass the correct size.

Signed-off-by: Marco Elver
Signed-off-by: Andrew Morton
Cc: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200623072653.114563-1-elver@google.com
Signed-off-by: Linus Torvalds
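The idea, sketched (the helper name is invented; the real checks sit inline in mm/slub.c's free path):

static __always_inline void assert_no_concurrent_users(struct kmem_cache *s,
                                                       void *object)
{
        /* ASSERT_EXCLUSIVE_ACCESS(*object) would only cover the size of
         * the pointed-to type; pass the true object size instead. */
        __kcsan_check_access(object, s->object_size,
                             KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);
}
-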
cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get
the cache from its page in kmem_cache_free()") to support kmemcg, where
per-memcg cache can be different from the root one, so we can't use the
kmem_cache pointer given to kmem_cache_free().

Prior to that commit, SLUB already had debugging check+warning that could
be enabled to compare the given kmem_cache pointer to one referenced by
the slab page where the object-to-be-freed resides. This check was moved
to cache_from_obj(). Later the check was also enabled for
SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
cache membership under freelist hardening").

These checks and warnings can be useful especially for the debugging,
which can be improved. Commit 598a0717a816 changed the pr_err() with
WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported,
others are silent. This patch changes it to WARN() so that all errors are
reported.

It's also useful to print SLUB allocation/free tracking info for the
offending object, if tracking is enabled. Thus, export the SLUB
print_tracking() function and provide an empty one for SLAB.

For SLUB we can also benefit from the static key check in
kmem_cache_debug_flags(), but we need to move this function to slab.h and
declare the static key there.

[1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com

[vbabka@suse.cz: avoid bogus WARN()]
Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian
Link: http://lkml.kernel.org/r/b33e0fa7-cd28-4788-9e54-5927846329ef@suse.cz

Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Acked-by: Kees Cook
Acked-by: Roman Gushchin
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Matthew Garrett
Cc: Jann Horn
Cc: Vijayanand Jitta
Cc: Vinayak Menon
Link: http://lkml.kernel.org/r/afeda7ac-748b-33d8-a905-56b708148ad5@suse.cz
Signed-off-by: Linus Torvalds
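The resulting SLUB helper, sketched (close to the shape this series arrives at in mm/slub.c):

static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
        struct kmem_cache *cachep;

        if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
            !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
                return s;       /* static key keeps the common case cheap */

        cachep = virt_to_cache(x);
        if (WARN(cachep && cachep != s,
                 "%s: Wrong slab cache. %s but object is from %s\n",
                 __func__, s->name, cachep->name))
                print_tracking(cachep, x);      /* alloc/free provenance */
        return cachep;
}
-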
The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b:
always get the cache from its page in kmem_cache_free()") to support
kmemcg, where per-memcg cache can be different from the root one, so we
can't use the kmem_cache pointer given to kmem_cache_free().

Prior to that commit, SLUB already had debugging check+warning that could
be enabled to compare the given kmem_cache pointer to one referenced by
the slab page where the object-to-be-freed resides. This check was moved
to cache_from_obj(). Later the check was also enabled for
SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
cache membership under freelist hardening").

These checks and warnings can be useful especially for the debugging,
which can be improved. Commit 598a0717a816 changed the pr_err() with
WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported,
others are silent. This patch changes it to WARN() so that all errors are
reported.

It's also useful to print SLUB allocation/free tracking info for the
offending object, if tracking is enabled. We could export the SLUB
print_tracking() function and provide an empty one for SLAB, or realize
that both the debugging and hardening cases in cache_from_obj() are only
supported by SLUB anyway. So this patch moves cache_from_obj() from
slab.h to separate instances in slab.c and slub.c, where the SLAB version
only does the kmemcg lookup and even could be completely removed once the
kmemcg rework [1] is merged. The SLUB version can thus easily use the
print_tracking() function. It can also use the kmem_cache_debug_flags()
static key check for improved performance in kernels without the hardening
and with debugging not enabled on boot.

[1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com
Signed-off-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Cc: Christoph Lameter
Cc: Jann Horn
Cc: Kees Cook
Cc: Vijayanand Jitta
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Pekka Enberg
Link: http://lkml.kernel.org/r/20200610163135.17364-10-vbabka@suse.cz
Signed-off-by: Linus Torvalds
-
kmem_list3 has been renamed to kmem_cache_node long, long ago, so update it.

References:
6744f087ba2a ("slab: Common name for the per node structures")
ce8eb6c424c7 ("slab: Rename list3/l3 to node")

Signed-off-by: Xiao Yang
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200722033355.26908-1-yangx.jy@cn.fujitsu.com
Signed-off-by: Linus Torvalds
-
kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of
memory currently bypasses the check and will simply leak the memory when
page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK
check out of slab & slub, and call it from kmalloc_order() as well. In
order to make the code clear, the warning message is put in one place.

Signed-off-by: Long Li
Signed-off-by: Andrew Morton
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Pekka Enberg
Acked-by: David Rientjes
Cc: Christoph Lameter
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong
Signed-off-by: Linus Torvalds
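The factored-out check, sketched (modeled on mm/slab_common.c's kmalloc_fix_flags()):

gfp_t kmalloc_fix_flags(gfp_t flags)
{
        gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;

        flags &= ~GFP_SLAB_BUG_MASK;
        pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
                invalid_mask, &invalid_mask, flags, &flags);
        dump_stack();

        return flags;
}
-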
Similar to commit ce6fa91b9363 ("mm/slub.c: add a naive detection of
double free or corruption"), add a very cheap double-free check for SLAB
under CONFIG_SLAB_FREELIST_HARDENED. With this added, the
"SLAB_FREE_DOUBLE" LKDTM test passes under SLAB:

lkdtm: Performing direct entry SLAB_FREE_DOUBLE
lkdtm: Attempting double slab free ...
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2193 at mm/slab.c:757 ___cache_free+0x325/0x390

[keescook@chromium.org: fix misplaced __free_one()]
Link: http://lkml.kernel.org/r/202006261306.0D82A2B@keescook
Link: https://lore.kernel.org/lkml/7ff248c7-d447-340c-a8e2-8c02972aca70@infradead.org

Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Acked-by: Vlastimil Babka
Acked-by: Randy Dunlap [build tested]
Cc: Roman Gushchin
Cc: Christoph Lameter
Cc: Alexander Popov
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vinayak Menon
Cc: Matthew Garrett
Cc: Jann Horn
Cc: Vijayanand Jitta
Link: http://lkml.kernel.org/r/20200625215548.389774-3-keescook@chromium.org
Signed-off-by: Linus Torvalds
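The check, sketched (modeled on the __free_one() helper mentioned above):

static __always_inline void __free_one(struct array_cache *ac, void *objp)
{
        /* Catch the trivial case: the same pointer freed twice in a row. */
        if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
            WARN_ON_ONCE(ac->avail > 0 && ac->entry[ac->avail - 1] == objp))
                return;
        ac->entry[ac->avail++] = objp;
}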
04 Jun, 2020
1 commit
-
classzone_idx is just a different name for high_zoneidx now. So, integrate
them and add some comment to struct alloc_context in order to reduce
future confusion about the meaning of this variable.

The accessor, ac_classzone_idx(), is also removed since it isn't needed
after integration.

In addition to integration, this patch also renames high_zoneidx to
highest_zoneidx since it represents more precise meaning.

Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Acked-by: Vlastimil Babka
Acked-by: David Rientjes
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Ye Xiaolong
Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds
14 Jan, 2020
1 commit
-
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
debugging") has introduced a static key to reduce overhead when
debug_pagealloc is compiled in but not enabled. It relied on the
assumption that jump_label_init() is called before parse_early_param()
as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
it is safe to enable the static key.

However, it turns out multiple architectures call parse_early_param()
earlier from their setup_arch(). x86 also calls jump_label_init() even
earlier, so no issue was found while testing the commit, but the same is
not true for e.g. ppc64 and s390 where the kernel would not boot with
debug_pagealloc=on as found by our QA.

To fix this without tricky changes to init code of multiple
architectures, this patch partially reverts the static key conversion
from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
code) of debug_pagealloc_enabled() will again test a simple bool
variable. Fastpath mm code is converted to a new
debug_pagealloc_enabled_static() variant that relies on the static key,
which is enabled in a well-defined point in mm_init() where it's
guaranteed that jump_label_init() has been called, regardless of
architecture.

[sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
Signed-off-by: Vlastimil Babka
Signed-off-by: Stephen Rothwell
Cc: Joonsoo Kim
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Qian Cai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
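The two variants, sketched (this follows the commit's description; see include/linux/mm.h for the real code):

extern bool _debug_pagealloc_enabled_early;
DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);

/* Init-time / non-fastpath: a plain bool, safe before jump_label_init(). */
static inline bool debug_pagealloc_enabled(void)
{
        return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                _debug_pagealloc_enabled_early;
}

/* Fastpath: static key, valid once mm_init() has enabled it. */
static inline bool debug_pagealloc_enabled_static(void)
{
        if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                return false;

        return static_branch_unlikely(&_debug_pagealloc_enabled);
}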
01 Dec, 2019
2 commits
-
The size of kmalloc can be obtained from kmalloc_info[], so remove
kmalloc_size() that will not be used anymore.

Link: http://lkml.kernel.org/r/1569241648-26908-3-git-send-email-lpf.vector@gmail.com
Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: David Rientjes
Cc: Christoph Lameter
Cc: Joonsoo Kim
Cc: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "mm, slab: Make kmalloc_info[] contain all types of names", v6.

There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM
and KMALLOC_DMA.

The name of KMALLOC_NORMAL is contained in kmalloc_info[].name,
but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically
generated by kmalloc_cache_name().

Patch 1 predefines the names of all types of kmalloc to save
the time spent dynamically generating names.

These changes make sense, and the time spent by new_kmalloc_cache()
has been reduced by approximately 36.3%.

Time spent by new_kmalloc_cache() (CPU cycles):
5.3-rc7          66264
5.3-rc7+patch    42188

This patch (of 3):

There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and
KMALLOC_DMA.

The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the
names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by
kmalloc_cache_name().

This patch predefines the names of all types of kmalloc to save the time
spent dynamically generating names.

Besides, remove the kmalloc_cache_name() that is no longer used.
Link: http://lkml.kernel.org/r/1569241648-26908-2-git-send-email-lpf.vector@gmail.com
Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: David Rientjes
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
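The predefinition, sketched (modeled on the INIT_KMALLOC_INFO macro this series adds to mm/slab_common.c; the CONFIG_ZONE_DMA guard is omitted):

#define INIT_KMALLOC_INFO(__size, __short_size)                 \
{                                                               \
        .name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,     \
        .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \
        .name[KMALLOC_DMA]     = "dma-kmalloc-" #__short_size, \
        .size = __size,                                         \
}

const struct kmalloc_info_struct kmalloc_info[] = {
        INIT_KMALLOC_INFO(0, 0),
        INIT_KMALLOC_INFO(96, 96),
        INIT_KMALLOC_INFO(192, 192),
        INIT_KMALLOC_INFO(8, 8),
        INIT_KMALLOC_INFO(16, 16),
        INIT_KMALLOC_INFO(32, 32),
        /* ... up to the largest supported size class ... */
};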