15 Nov, 2020

1 commit

  • While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs
    with 11TB of ram, I hit the following panic:

    BUG: Kernel NULL pointer dereference on read at 0x00000007
    Faulting instruction address: 0xc000000000456048
    Oops: Kernel access of bad area, sig: 11 [#2]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
    Modules linked in: rpadlpar_io rpaphp
    CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1
    NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
    REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0)
    MSR: 8000000000009033 CR: 24004228 XER: 00000000
    CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
    GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
    GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
    GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
    GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
    GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
    GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
    GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
    GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
    NIP [c000000000456048] __kmalloc_node+0x108/0x790
    LR [c000000000455fd4] __kmalloc_node+0x94/0x790
    Call Trace:
    kvmalloc_node+0x58/0x110
    mem_cgroup_css_online+0x10c/0x270
    online_css+0x48/0xd0
    cgroup_apply_control_enable+0x2c4/0x470
    cgroup_mkdir+0x408/0x5f0
    kernfs_iop_mkdir+0x90/0x100
    vfs_mkdir+0x138/0x250
    do_mkdirat+0x154/0x1c0
    system_call_exception+0xf8/0x200
    system_call_common+0xf0/0x27c
    Instruction dump:
    e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
    2fbc0000 419e0018 41920230 e9270010 7f994800 419e0220 7ee6bb78

    This points to the following code:

    mm/slub.c:2851
    if (unlikely(!object || !node_match(page, node))) {
    c000000000456038: 00 00 bc 2f cmpdi cr7,r28,0
    c00000000045603c: 18 00 9e 41 beq cr7,c000000000456054
    node_match():
    mm/slub.c:2491
    if (node != NUMA_NO_NODE && page_to_nid(page) != node)
    c000000000456040: 30 02 92 41 beq cr4,c000000000456270
    page_to_nid():
    include/linux/mm.h:1294
    c000000000456044: 10 00 27 e9 ld r9,16(r7)
    c000000000456048: 07 00 29 89 lbz r9,7(r9) <<<< r9 = NULL
    node_match():
    mm/slub.c:2491
    c00000000045604c: 00 48 99 7f cmpw cr7,r25,r9
    c000000000456050: 20 02 9e 41 beq cr7,c000000000456270

    The panic occurred in slab_alloc_node() when checking for the page's node:

    object = c->freelist;
    page = c->page;
    if (unlikely(!object || !node_match(page, node))) {
            object = __slab_alloc(s, gfpflags, node, addr, c);
            stat(s, ALLOC_SLOWPATH);

    The issue is that object is not NULL while page is NULL, which is odd but
    may happen if the cache flush occurred after loading object but before
    loading page. Thus the page pointer needs to be checked as well.
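
    A minimal sketch of what the fixed fast-path check could look like, with
    the extra test on the page pointer (a hedged illustration, not necessarily
    the literal upstream hunk):

    object = c->freelist;
    page = c->page;
    /*
     * If the cpu slab was flushed between the two loads above, page can be
     * NULL even though object is not: take the slow path in that case too.
     */
    if (unlikely(!object || !page || !node_match(page, node))) {
            object = __slab_alloc(s, gfpflags, node, addr, c);
            stat(s, ALLOC_SLOWPATH);
    }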

    The cache flush is done through an inter-processor interrupt when a piece
    of memory is offlined. That interrupt is triggered when a memory
    hot-unplug operation is initiated and offline_pages() calls SLUB's
    MEM_GOING_OFFLINE callback slab_mem_going_offline_callback(), which calls
    flush_cpu_slab(). If that interrupt is caught between the reading of
    c->freelist and the reading of c->page, this could lead to such a
    situation. That situation is expected, and the later call to
    this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
    the whole operation.

    Commit 6159d0f5c03e ("mm/slub.c: page is always non-NULL in
    node_match()") removed the check on the page pointer, assuming that page
    is always valid when node_match() is called. That assumption does not
    hold in this particular case, so check page before calling node_match()
    here.

    Fixes: 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()")
    Signed-off-by: Laurent Dufour
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Acked-by: Christoph Lameter
    Cc: Wei Yang
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Nathan Lynch
    Cc: Scott Cheloha
    Cc: Michal Hocko
    Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Laurent Dufour
     

17 Oct, 2020

1 commit

    Correct the function name "get_partials" to "get_partial", and update the
    old struct name list3 to kmem_cache_node.

    Signed-off-by: Chen Tao
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Linus Torvalds

    Chen Tao
     

14 Oct, 2020

4 commits

  • Object cgroup charging is done for all the objects during allocation, but
    during freeing, uncharging ends up happening for only one object in the
    case of bulk allocation/freeing.

    Fix this by having a separate call to uncharge all the objects from
    kmem_cache_free_bulk() and by modifying memcg_slab_free_hook() to take
    care of bulk uncharging.
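
    A rough sketch of the bulk variant of the hook; memcg_slab_uncharge_object()
    below is a hypothetical helper standing in for the real obj_cgroup lookup
    and uncharge:

    static inline void memcg_slab_free_hook(struct kmem_cache *s,
                                            void **p, int objects)
    {
            int i;

            /* uncharge every object of the bulk free, not just the first one */
            for (i = 0; i < objects; i++)
                    memcg_slab_uncharge_object(s, p[i]);
    }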

    Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
    Signed-off-by: Bharata B Rao
    Signed-off-by: Andrew Morton
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Vlastimil Babka
    Cc: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: https://lkml.kernel.org/r/20201009060423.390479-1-bharata@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Bharata B Rao
     
    Commit a4d3f8916c65 ("slub: remove useless kmem_cache_debug() before
    remove_full()") is incomplete, as it didn't handle the add_full() part.

    This patch checks for SLAB_STORE_USER instead of kmem_cache_debug(), since
    that should be the only context in which we need the list_lock for
    add_full().
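
    A hedged sketch of the kind of change described, in the deactivation path
    (not necessarily the exact upstream hunk):

    /* take the list_lock for add_full() only when SLAB_STORE_USER is set */
    if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
            lock = 1;
            spin_lock(&n->list_lock);
    }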

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Liu Xiang
    Link: https://lkml.kernel.org/r/20200811020240.1231-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     
    The ALLOC_SLOWPATH statistic is currently missing for bulk allocation. Fix
    it by recording the statistic in the allocation slow path.

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Hewenliang
    Cc: Hu Shiyuan
    Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     
    The two conditions are mutually exclusive, and the gcc compiler will
    optimise this into an if-else-like pattern. Given that the majority of
    slow-path frees hit the free_frozen case, let's give the compiler a hint,
    as sketched below.
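
    A hedged sketch of the annotated branch in __slab_free() (illustrative,
    not the exact upstream hunk):

    if (likely(was_frozen)) {
            /* the common case: the object went back to a frozen cpu slab */
            stat(s, FREE_FROZEN);
    } else if (new.frozen) {
            /* the slab was just frozen, park it on the cpu partial list */
            put_cpu_partial(s, page, 1);
            stat(s, CPU_PARTIAL_FREE);
    }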

    Tests (perf bench sched messaging -g 20 -l 400000, executed 10x after
    reboot) were run; the summarized results:

                un-patched    patched
    max.        192.316       189.851
    min.        187.267       186.252
    avg.        189.154       188.086
    stdev.        1.37          0.99

    Signed-off-by: Abel Wu
    Signed-off-by: Andrew Morton
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Hewenliang
    Cc: Hu Shiyuan
    Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
    Signed-off-by: Linus Torvalds

    Abel Wu
     

04 Oct, 2020

1 commit

    The routine that applies debug flags to the kmem_cache slabs
    inadvertently prevents non-debug flags from being applied to those
    same objects. That is, if slub_debug=<flags>,<slab> is specified,
    non-debugged slabs will end up having flags of zero, and the slabs
    may be unusable.

    Fix this by including the input flags for non-matching slabs with the
    contents of slub_debug, so that the caches are created as expected
    alongside any debugging options that may be requested. With this, we
    can remove the check for a NULL slub_debug_string, since it's covered
    by the loop itself.

    Fixes: e17f1dfba37b ("mm, slub: extend slub_debug syntax for multiple blocks")
    Signed-off-by: Eric Farman
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Kees Cook
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: https://lkml.kernel.org/r/20200930161931.28575-1-farman@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Eric Farman
     

06 Sep, 2020

1 commit

  • Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
    deactivate_slab()") suffered an update when picked up from LKML [1].

    Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
    turned it into a no-op statement. Fix it by sticking to the behavior
    intended in the original patch [1]. In addition, make freelist_corrupted()
    immune to being passed NULL instead of &freelist.
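
    A sketch of roughly what the repaired helper looks like, assuming the
    &freelist-taking variant described above (hedged, not a verbatim copy):

    static inline bool freelist_corrupted(struct kmem_cache *s, struct page *page,
                                          void **freelist, void *nextfree)
    {
            if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
                !check_valid_pointer(s, page, nextfree) && freelist) {
                    object_err(s, page, *freelist, "Freechain corrupt");
                    *freelist = NULL;       /* actually isolates the chain now */
                    slab_fix(s, "Isolate corrupted freechain");
                    return true;
            }

            return false;
    }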

    The issue has been spotted via static analysis and code review.

    [1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/

    Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
    Signed-off-by: Eugeniu Rosca
    Signed-off-by: Andrew Morton
    Cc: Dongli Zhang
    Cc: Joe Jin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.com
    Signed-off-by: Linus Torvalds

    Eugeniu Rosca
     

08 Aug, 2020

21 commits

  • charge_slab_page() and uncharge_slab_page() are not related anymore to
    memcg charging and uncharging. In order to make their names less
    confusing, let's rename them to account_slab_page() and
    unaccount_slab_page() respectively.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
    charge_slab_page() no longer uses the gfp argument, so remove it.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Instead of having two sets of kmem_caches: one for system-wide and
    non-accounted allocations and the second one shared by all accounted
    allocations, we can use just one.

    The idea is simple: space for obj_cgroup metadata can be allocated on
    demand and filled only for accounted allocations.

    It allows removing a bunch of code that is required to handle kmem_cache
    clones for accounted allocations. There is no longer a need to create
    them, accumulate statistics, propagate attributes, etc. It's quite a
    significant simplification.

    Also, because the total number of slab_caches is reduced almost by half
    (not all kmem_caches have a memcg clone), some additional memory savings
    are expected. On my devvm it additionally saves about 3.5% of slab memory.

    [guro@fb.com: fix build on MIPS]
    Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com

    Suggested-by: Johannes Weiner
    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Naresh Kamboju
    Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Currently there are two lists of kmem_caches:
    1) slab_caches, which contains all kmem_caches,
    2) slab_root_caches, which contains only root kmem_caches.

    And there is some preprocessor magic to have a single list if
    CONFIG_MEMCG_KMEM isn't enabled.

    It was required earlier because the number of non-root kmem_caches was
    proportional to the number of memory cgroups and could reach really big
    values. Now that it cannot exceed the number of root kmem_caches, there is
    really no reason to maintain two lists.

    We never iterate over the slab_root_caches list on any hot paths, so it's
    perfectly fine to iterate over slab_caches and filter out non-root
    kmem_caches.

    This allows removing a lot of config-dependent code and two pointers from
    the kmem_cache structure.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
    This is a fairly big but mostly red patch, which makes all accounted slab
    allocations use a single set of kmem_caches instead of creating a separate
    set for each memory cgroup.

    Because the number of non-root kmem_caches is now capped by the number of
    root kmem_caches, there is no need to shrink or destroy them prematurely.
    They can be perfectly destroyed together with their root counterparts.
    This allows us to dramatically simplify the management of non-root
    kmem_caches and delete a ton of code.

    This patch performs the following changes:
    1) introduces a memcg_params.memcg_cache pointer to represent the
       kmem_cache which will be used for all non-root allocations
    2) reuses the existing memcg kmem_cache creation mechanism
       to create the memcg kmem_cache on the first allocation attempt
    3) memcg kmem_caches are named <kmem_cache name>-memcg,
       e.g. dentry-memcg
    4) simplifies memcg_kmem_get_cache() to just return the memcg kmem_cache
       or schedule its creation and return the root cache
    5) removes almost all non-root kmem_cache management code
       (separate refcounter, reparenting, shrinking, etc)
    6) makes slab debugfs display the root_mem_cgroup css id and never
       show the :dead and :deact flags in the memcg_slabinfo attribute.

    Following patches in the series will simplify the kmem_cache creation.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Store the obj_cgroup pointer in the corresponding place of
    page->obj_cgroups for each allocated non-root slab object. Make sure that
    each allocated object holds a reference to obj_cgroup.

    The objcg pointer is obtained by dereferencing memcg->objcg in
    memcg_kmem_get_cache() and passed from the pre_alloc_hook to the
    post_alloc_hook. Then, in case of successful allocation(s), it is stored
    in the page->obj_cgroups vector.

    The objcg-obtaining part looks a bit bulky now, but it will be simplified
    by later commits in the series.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • This commit implements SLUB version of the obj_to_index() function, which
    will be required to calculate the offset of obj_cgroup in the obj_cgroups
    vector to store/obtain the objcg ownership data.

    To make it faster, let's repeat the SLAB's trick introduced by commit
    6a2d7a955d8d ("SLAB: use a multiply instead of a divide in
    obj_to_index()") and avoid an expensive division.

    Vlastimil Babka noticed that SLUB already has a similar function called
    slab_index(), which is defined only if SLUB_DEBUG is enabled. The
    function does similar math, but with a division, and it also takes a
    page address instead of a page pointer.

    Let's remove slab_index() and replace it with the new helper
    __obj_to_index(), which takes a page address. obj_to_index() will be a
    simple wrapper taking a page pointer and passing page_address(page) into
    __obj_to_index().
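
    A hedged sketch of the resulting helpers, ignoring KASAN pointer-tag
    handling; reciprocal_size would be a precomputed struct reciprocal_value
    derived from the object size when the cache geometry is calculated:

    static inline unsigned int __obj_to_index(const struct kmem_cache *cache,
                                              void *addr, void *obj)
    {
            /* multiply by a precomputed reciprocal instead of dividing by size */
            return reciprocal_divide(obj - addr, cache->reciprocal_size);
    }

    static inline unsigned int obj_to_index(const struct kmem_cache *cache,
                                            const struct page *page, void *obj)
    {
            return __obj_to_index(cache, page_address(page), obj);
    }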

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-5-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • In order to prepare for per-object slab memory accounting, convert
    NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes.

    To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and
    NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB).

    Internally, global and per-node counters are stored in pages, while memcg
    and lruvec counters are stored in bytes. This scheme may look odd, but
    only for now. As soon as slab pages are shared between multiple cgroups,
    global and node counters will reflect the total number of slab pages.
    However, memcg and lruvec counters will be used for per-memcg slab memory
    tracking, which has to account individual kernel objects. Keeping global
    and node counters in pages helps to avoid additional overhead.

    The size of slab memory shouldn't exceed 4GB on 32-bit machines, so it
    will fit into the atomic_long_t we use for vmstats.

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Michal Hocko
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     
  • Provide the necessary KCSAN checks to assist with debugging racy
    use-after-frees. While KASAN is more reliable at generally catching such
    use-after-frees (due to its use of a quarantine), it can be difficult to
    debug racy use-after-frees. If a reliable reproducer exists, KCSAN can
    assist in debugging such issues.

    Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is
    simply sizeof(var). Instead, here we just use __kcsan_check_access()
    explicitly to pass the correct size.
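
    A hedged sketch of the kind of annotation added on the free path, using
    the cache's object size instead of sizeof(var):

    /* assert that nobody else is concurrently touching the object being freed */
    __kcsan_check_access(x, s->object_size,
                         KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);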

    Signed-off-by: Marco Elver
    Signed-off-by: Andrew Morton
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200623072653.114563-1-elver@google.com
    Signed-off-by: Linus Torvalds

    Marco Elver
     
    There is no point in using lockdep_assert_held() on a lock that is about
    to be unlocked. It works only with lockdep, and lockdep will complain if
    spin_unlock() is used on a lock that has not been locked.

    Remove superfluous lockdep_assert_held().

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Cc: Yu Zhao
    Cc: Christopher Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200618201234.795692-2-bigeasy@linutronix.de
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get
    the cache from its page in kmem_cache_free()") to support kmemcg, where
    per-memcg cache can be different from the root one, so we can't use the
    kmem_cache pointer given to kmem_cache_free().

    Prior to that commit, SLUB already had debugging check+warning that could
    be enabled to compare the given kmem_cache pointer to one referenced by
    the slab page where the object-to-be-freed resides. This check was moved
    to cache_from_obj(). Later the check was also enabled for
    SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
    cache membership under freelist hardening").

    These checks and warnings can be useful, especially for debugging, and
    they can be improved. Commit 598a0717a816 changed the pr_err() with
    WARN_ON_ONCE() to WARN_ONCE(), so only the first hit is reported and
    later ones are silent. This patch changes it to WARN() so that all errors
    are reported.

    It's also useful to print SLUB allocation/free tracking info for the
    offending object, if tracking is enabled. Thus, export the SLUB
    print_tracking() function and provide an empty one for SLAB.

    For SLUB we can also benefit from the static key check in
    kmem_cache_debug_flags(), but we need to move this function to slab.h and
    declare the static key there.
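
    A hedged sketch of the resulting helper (the kmemcg-related handling of
    that time is omitted for brevity):

    static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
    {
            struct kmem_cache *cachep;

            if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
                !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
                    return s;

            cachep = virt_to_cache(x);
            if (WARN(cachep && cachep != s,
                     "%s: Wrong slab cache. %s but object is from %s\n",
                     __func__, s->name, cachep->name))
                    print_tracking(cachep, x);      /* report alloc/free traces */
            return cachep;
    }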

    [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com

    [vbabka@suse.cz: avoid bogus WARN()]
    Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian
    Link: http://lkml.kernel.org/r/b33e0fa7-cd28-4788-9e54-5927846329ef@suse.cz

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Acked-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Matthew Garrett
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: Vinayak Menon
    Link: http://lkml.kernel.org/r/afeda7ac-748b-33d8-a905-56b708148ad5@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b:
    always get the cache from its page in kmem_cache_free()") to support
    kmemcg, where per-memcg cache can be different from the root one, so we
    can't use the kmem_cache pointer given to kmem_cache_free().

    Prior to that commit, SLUB already had debugging check+warning that could
    be enabled to compare the given kmem_cache pointer to one referenced by
    the slab page where the object-to-be-freed resides. This check was moved
    to cache_from_obj(). Later the check was also enabled for
    SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate
    cache membership under freelist hardening").

    These checks and warnings can be useful, especially for debugging, and
    they can be improved. Commit 598a0717a816 changed the pr_err() with
    WARN_ON_ONCE() to WARN_ONCE(), so only the first hit is reported and
    later ones are silent. This patch changes it to WARN() so that all errors
    are reported.

    It's also useful to print SLUB allocation/free tracking info for the
    offending object, if tracking is enabled. We could export the SLUB
    print_tracking() function and provide an empty one for SLAB, or realize
    that both the debugging and hardening cases in cache_from_obj() are only
    supported by SLUB anyway. So this patch moves cache_from_obj() from
    slab.h to separate instances in slab.c and slub.c, where the SLAB version
    only does the kmemcg lookup and could even be completely removed once the
    kmemcg rework [1] is merged. The SLUB version can thus easily use the
    print_tracking() function. It can also use the kmem_cache_debug_flags()
    static key check for improved performance in kernels without the hardening
    and with debugging not enabled on boot.

    [1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Kees Cook
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-10-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    There are a few more places in SLUB that could benefit from the reduced
    overhead of the static key introduced by a previous patch:

    - setup_object_debug() called on each object in newly allocated slab page
    - setup_page_debug() called on newly allocated slab page
    - __free_slab() called on freed slab page

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Acked-by: Roman Gushchin
    Acked-by: Christoph Lameter
    Cc: Jann Horn
    Cc: Kees Cook
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-9-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    There are a few places that call kmem_cache_debug(s) (which tests if any
    debug flags are enabled for a cache) immediately followed by a test for a
    specific flag. The compiler can probably eliminate the extra check, but
    we can make the code nicer by introducing kmem_cache_debug_flags(), which
    works like kmem_cache_debug() (including the static key check) but tests
    for specific flag(s). The next patches will add more users.
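
    A sketch of the helper as described (hedged; close to, but not guaranteed
    to be, the exact upstream version):

    static inline bool kmem_cache_debug_flags(struct kmem_cache *s,
                                              slab_flags_t flags)
    {
    #ifdef CONFIG_SLUB_DEBUG
            VM_WARN_ON_ONCE(!(flags & SLAB_DEBUG_FLAGS));
            /* the static key keeps this nearly free when slub_debug is off */
            if (static_branch_unlikely(&slub_debug_enabled))
                    return s->flags & flags;
    #endif
            return false;
    }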

    [vbabka@suse.cz: change return from int to bool, per Kees. Add VM_WARN_ON_ONCE() for invalid flags, per Roman]
    Link: http://lkml.kernel.org/r/949b90ed-e0f0-07d7-4d21-e30ec0958a7c@suse.cz

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Acked-by: Roman Gushchin
    Acked-by: Christoph Lameter
    Acked-by: Kees Cook
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-8-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be
    built with the option enabled, but it's inactive until simply enabled on
    boot, without rebuilding the kernel. With a static key, we can further
    eliminate the overhead of checking whether a cache has a particular debug
    flag enabled if we know that there are no such caches (slub_debug was not
    enabled during boot). We use the same mechanism also for e.g.
    page_owner, debug_pagealloc or kmemcg functionality.

    This patch introduces the static key and makes the general check for
    per-cache debug flags kmem_cache_debug() use it. This benefits several
    call sites, including (slow path but still rather frequent) __slab_free().
    The next patches will add more uses.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Acked-by: Christoph Lameter
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-7-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not
    clear why this attribute was writable in the first place, as it's tied to
    how the cache is used by its creator; it's not a user tunable.
    Furthermore:

    - it affects slab merging, but that's not being checked while toggled
    - it affects whether the __GFP_RECLAIMABLE flag is used to allocate a page,
      but the runtime toggle doesn't update allocflags
    - it affects cache_vmstat_idx(), so runtime toggling might lead to
      inconsistency of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE

    Thus make it read-only.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-6-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can
    be read to check if the respective debugging options are enabled for a
    given cache. Some options, namely sanity_checks, trace, and failslab, can
    also be enabled and disabled at runtime by writing into the files.

    The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when
    enabled, which means that in case of concurrent allocations, some can
    still use __CMPXCHG_DOUBLE and some not, leading to potential corruption.
    The s->flags field is also not updated or checked atomically. The
    simplest solution is to remove the runtime toggling. The extended
    slub_debug boot parameter syntax introduced by an earlier patch should
    allow fine-tuning the debugging configuration during boot with the same
    granularity.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    SLUB allows runtime changing of the page allocation order by writing into
    the /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this
    interface allows the order to be set too small, leading to crashes.

    While it's possible to fix the immediate issue, closer inspection reveals
    potential races. Storing the new order calls calculate_sizes() which
    non-atomically updates a lot of kmem_cache fields while the cache is still
    in use. Unexpected behavior might occur even if the fields are set to the
    same value as they were.

    This could be fixed by splitting out the part of calculate_sizes() that
    depends on forced_order, so that we only update kmem_cache.oo field. This
    could still race with init_cache_random_seq(), shuffle_freelist(),
    allocate_slab(). Perhaps it's possible to audit these and e.g. add some
    READ_ONCE/WRITE_ONCE accesses, but it might be easier just to remove the
    runtime order changes, which is what this patch does. If there are valid
    use cases for per-cache order setting, we could e.g. extend the boot
    parameters to do that.

    [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com

    Reported-by: Jann Horn
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Christoph Lameter
    Acked-by: Roman Gushchin
    Cc: Vijayanand Jitta
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-4-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can
    be read to check if the respective debugging options are enabled for a
    given cache. The options can also be toggled at runtime by writing into
    the files. Some of those, namely red_zone, poison, and store_user, can be
    toggled only when no objects yet exist in the cache.

    Vijayanand reports [1] that there is a problem with freelist randomization
    if changing the debugging option's state results in a different number of
    objects per page, and the random sequence cache thus needs to be
    recomputed.

    However, another problem is that the check for "no objects yet exist in
    the cache" is racy, as noted by Jann [2] and fixing that would add
    overhead or otherwise complicate the allocation/freeing paths. Thus it
    would be much simpler just to remove the runtime toggling support. The
    documentation describes it as "In case you forgot to enable debugging on
    the kernel command line", but the necessity of having no objects limits
    its usefulness for many caches anyway.

    Vijayanand describes a use case [3] where debugging is enabled for all
    but the zram caches for memory overhead reasons, and using the runtime
    toggles was the only way to achieve such a configuration. After the
    previous patch
    it's now possible to do that directly from the kernel boot option, so we
    can remove the dangerous runtime toggles by making the /sys attribute
    files read-only.

    While updating it, also improve the documentation of the debugging /sys files.

    [1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org
    [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com
    [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org

    Reported-by: Vijayanand Jitta
    Reported-by: Jann Horn
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "slub_debug fixes and improvements".

    The slub_debug kernel boot parameter can either apply a single set of
    options to all caches or a list of caches. There is a use case where
    debugging is applied for all caches and then disabled at runtime for
    specific caches, for performance and memory consumption reasons [1]. As
    runtime changes are dangerous, extend the boot parameter syntax so that
    multiple blocks of either global or slab-specific options can be
    specified, with blocks delimited by ';'. This will also support the use
    case of [1] without runtime changes.

    For details see the updated Documentation/vm/slub.rst
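
    For instance, a hypothetical command line using the extended syntax (see
    the documentation above for the authoritative description) could be:

    slub_debug=FZ;-,zs_handle,zspool

    i.e. one global block enabling sanity checks (F) and red zoning (Z) for
    all caches, followed by a ';'-delimited block disabling debugging for the
    listed caches.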

    [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org

    [weiyongjun1@huawei.com: make parse_slub_debug_flags() static]
    Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Reviewed-by: Kees Cook
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Jann Horn
    Cc: Roman Gushchin
    Cc: Vijayanand Jitta
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of
    memory currently bypasses the check and will simply leak the memory when
    page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK
    check out of slab & slub, and call it from kmalloc_order() as well. In
    order to make the code clear, the warning message is put in one place.
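
    A hedged sketch of the shared helper for the GFP_SLAB_BUG_MASK check
    (close to, but not guaranteed to be, the exact upstream code):

    gfp_t kmalloc_fix_flags(gfp_t flags)
    {
            gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK;

            flags &= ~GFP_SLAB_BUG_MASK;
            pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n",
                    invalid_mask, &invalid_mask, flags, &flags);
            dump_stack();

            return flags;
    }

    Callers such as kmalloc_order() would then check
    "if (unlikely(flags & GFP_SLAB_BUG_MASK)) flags = kmalloc_fix_flags(flags);"
    before allocating pages.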

    Signed-off-by: Long Li
    Signed-off-by: Andrew Morton
    Reviewed-by: Matthew Wilcox (Oracle)
    Reviewed-by: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong
    Signed-off-by: Linus Torvalds

    Long Li
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

26 Jun, 2020

1 commit

    According to Christopher Lameter, two fixes have been merged for the same
    problem. As far as I can tell, the code does not acquire the list_lock
    and invoke kmalloc(). list_slab_objects() misses an unlock (the
    counterpart to get_map()) and the memory allocated in free_partial()
    isn't used.

    Revert the mentioned commit.

    Link: http://lkml.kernel.org/r/20200618201234.795692-1-bigeasy@linutronix.de
    Fixes: aa456c7aebb14 ("slub: remove kmalloc under list_lock from list_slab_objects() V2")
    Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2006181501480.12014@www.lameter.com
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Thomas Gleixner
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     

18 Jun, 2020

1 commit


05 Jun, 2020

1 commit


04 Jun, 2020

2 commits

    classzone_idx is just a different name for high_zoneidx now. So, integrate
    them, and add a comment to struct alloc_context in order to reduce future
    confusion about the meaning of this variable.

    The accessor ac_classzone_idx() is also removed since it isn't needed
    after the integration.

    In addition to the integration, this patch also renames high_zoneidx to
    highest_zoneidx since that name conveys its meaning more precisely.

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Baoquan He
    Acked-by: Vlastimil Babka
    Acked-by: David Rientjes
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Ye Xiaolong
    Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
    syzkaller reports a memory leak when kobject_init_and_add() returns an
    error in sysfs_slab_add() [1].

    When this happens, kobject_put() is not called for the corresponding
    kobject, which potentially leads to a memory leak.

    This patch fixes the issue by calling kobject_put() even if
    kobject_init_and_add() fails.
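
    A hedged sketch of the fixed error path in sysfs_slab_add() (illustrative,
    not necessarily the literal upstream hunk):

    err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
    if (err) {
            /* drop the reference so the name allocated above is freed */
            kobject_put(&s->kobj);
            goto out;
    }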

    [1]
    BUG: memory leak
    unreferenced object 0xffff8880a6d4be88 (size 8):
    comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
    hex dump (first 8 bytes):
    70 69 64 5f 33 00 ff ff pid_3...
    backtrace:
    kstrdup+0x35/0x70 mm/util.c:60
    kstrdup_const+0x3d/0x50 mm/util.c:82
    kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
    kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
    kobject_add_varg lib/kobject.c:384 [inline]
    kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
    sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
    __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
    create_cache+0x113/0x1e0 mm/slab_common.c:407
    kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
    kmem_cache_create+0xd/0x10 mm/slab_common.c:564
    create_pid_cachep kernel/pid_namespace.c:54 [inline]
    create_pid_namespace kernel/pid_namespace.c:96 [inline]
    copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
    create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
    unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
    ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
    __do_sys_unshare kernel/fork.c:3037 [inline]
    __se_sys_unshare kernel/fork.c:3035 [inline]
    __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
    do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295

    Fixes: 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Signed-off-by: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.com
    Signed-off-by: Linus Torvalds

    Wang Hai
     

03 Jun, 2020

5 commits

  • Pull drm updates from Dave Airlie:
    "Highlights:

    - Core DRM had a lot of refactoring around managed drm resources to
    make drivers simpler.

    - Intel Tigerlake support is on by default

    - amdgpu now support p2p PCI buffer sharing and encrypted GPU memory

    Details:

    core:
    - uapi: error out EBUSY when existing master
    - uapi: rework SET/DROP MASTER permission handling
    - remove drm_pci.h
    - drm_pci* are now legacy
    - introduced managed DRM resources
    - subclassing support for drm_framebuffer
    - simple encoder helper
    - edid improvements
    - vblank + writeback documentation improved
    - drm/mm - optimise tree searches
    - port drivers to use devm_drm_dev_alloc

    dma-buf:
    - add flag for p2p buffer support

    mst:
    - ACT timeout improvements
    - remove drm_dp_mst_has_audio
    - don't use 2nd TX slot - spec recommends against it

    bridge:
    - dw-hdmi various improvements
    - chrontel ch7033 support
    - fix stack issues with old gcc

    hdmi:
    - add unpack function for drm infoframe

    fbdev:
    - misc fbdev driver fixes

    i915:
    - uapi: global sseu pinning
    - uapi: OA buffer polling
    - uapi: remove generated perf code
    - uapi: per-engine default property values in sysfs
    - Tigerlake GEN12 enabled.
    - Lots of gem refactoring
    - Tigerlake enablement patches
    - move to drm_device logging
    - Icelake gamma HW readout
    - push MST link retrain to hotplug work
    - bandwidth atomic helpers
    - ICL fixes
    - RPS/GT refactoring
    - Cherryview full-ppgtt support
    - i915 locking guidelines documented
    - require linear fb stride to be 512 multiple on gen9
    - Tigerlake SAGV support

    amdgpu:
    - uapi: encrypted GPU memory handling
    - uapi: add MEM_SYNC IB flag
    - p2p dma-buf support
    - export VRAM dma-bufs
    - FRU chip access support
    - RAS/SR-IOV updates
    - Powerplay locking fixes
    - VCN DPG (powergating) enablement
    - GFX10 clockgating fixes
    - DC fixes
    - GPU reset fixes
    - navi SDMA fix
    - expose FP16 for modesetting
    - DP 1.4 compliance fixes
    - gfx10 soft recovery
    - Improved Critical Thermal Faults handling
    - resizable BAR on gmc10

    amdkfd:
    - uapi: GWS resource management
    - track GPU memory per process
    - report PCI domain in topology

    radeon:
    - safe reg list generator fixes

    nouveau:
    - HD audio fixes on recent systems
    - vGPU detection (fail probe if we're on one, for now)
    - Interlaced mode fixes (mostly avoidance on Turing, which doesn't support it)
    - SVM improvements/fixes
    - NVIDIA format modifier support
    - Misc other fixes.

    adv7511:
    - HDMI SPDIF support

    ast:
    - allocate crtc state size
    - fix double assignment
    - fix suspend

    bochs:
    - drop connector register

    cirrus:
    - move to tiny drivers.

    exynos:
    - fix imported dma-buf mapping
    - enable runtime PM
    - fixes and cleanups

    mediatek:
    - DPI pin mode swap
    - config mipi_tx current/impedance

    lima:
    - devfreq + cooling device support
    - task handling improvements
    - runtime PM support

    pl111:
    - vexpress init improvements
    - fix module auto-load

    rcar-du:
    - DT bindings conversion to YAML
    - Planes zpos sanity check and fix
    - MAINTAINERS entry for LVDS panel driver

    mcde:
    - fix return value

    mgag200:
    - use managed config init

    stm:
    - read endpoints from DT

    vboxvideo:
    - use PCI managed functions
    - drop WC mtrr

    vkms:
    - enable cursor by default

    rockchip:
    - afbc support

    virtio:
    - various cleanups

    qxl:
    - fix cursor notify port

    hisilicon:
    - 128-byte stride alignment fix

    sun4i:
    - improved format handling"

    * tag 'drm-next-2020-06-02' of git://anongit.freedesktop.org/drm/drm: (1401 commits)
    drm/amd/display: Fix potential integer wraparound resulting in a hang
    drm/amd/display: drop cursor position check in atomic test
    drm/amdgpu: fix device attribute node create failed with multi gpu
    drm/nouveau: use correct conflicting framebuffer API
    drm/vblank: Fix -Wformat compile warnings on some arches
    drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode
    drm/amd/display: Handle GPU reset for DC block
    drm/amdgpu: add apu flags (v2)
    drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven
    drm/amdgpu: fix pm sysfs node handling (v2)
    drm/amdgpu: move gpu_info parsing after common early init
    drm/amdgpu: move discovery gfx config fetching
    drm/nouveau/dispnv50: fix runtime pm imbalance on error
    drm/nouveau: fix runtime pm imbalance on error
    drm/nouveau: fix runtime pm imbalance on error
    drm/nouveau/debugfs: fix runtime pm imbalance on error
    drm/nouveau/nouveau/hmm: fix migrate zero page to GPU
    drm/nouveau/nouveau/hmm: fix nouveau_dmem_chunk allocations
    drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST
    drm/nouveau/kms/nv50-: Move 8BPC limit for MST into nv50_mstc_get_modes()
    ...

    Linus Torvalds
     
  • There is no need to copy SLUB_STATS items from root memcg cache to new
    memcg cache copies. Doing so could result in stack overruns because the
    store function only accepts 0 to clear the stat and returns an error for
    everything else while the show method would print out the whole stat.

    Then, the mismatch of the lengths returns from show and store methods
    happens in memcg_propagate_slab_attrs():

    else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
            buf = mbuf;

    max_attr_size is only 2 from slab_attr_store(); then mbuf[64] is used in
    show_stat() later, where a bunch of sprintf() calls would overrun the
    stack variable. Fix it by always allocating a page-sized buffer to be
    used in show_stat() if SLUB_STATS=y, which should only be used for
    debugging purposes anyway.

    # echo 1 > /sys/kernel/slab/fs_cache/shrink
    BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
    Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251

    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
    Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
    Call Trace:
    number+0x421/0x6e0
    vsnprintf+0x451/0x8e0
    sprintf+0x9e/0xd0
    show_stat+0x124/0x1d0
    alloc_slowpath_show+0x13/0x20
    __kmem_cache_create+0x47a/0x6b0

    addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
    process_one_work+0x0/0xb90

    this frame has 1 object:
    [32, 72) 'lockdep_map'

    Memory state around the buggy address:
    ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
    ^
    ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
    ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================
    Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
    Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
    Call Trace:
    __kmem_cache_create+0x6ac/0x6b0

    Fixes: 107dab5c92d5 ("slub: slub-specific propagation changes")
    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Cc: Glauber Costa
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     
    list_slab_objects() is called when a slab is destroyed and there are
    objects still left, in order to list those objects in the syslog. This is
    a pretty rare event.

    And there it seems we take the list_lock and call kmalloc() while holding
    that lock.

    Perform the allocation in free_partial() before the list_lock is taken.

    Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
    Signed-off-by: Christopher Lameter
    Signed-off-by: Andrew Morton
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Tetsuo Handa
    Cc: Yu Zhao
    Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002031721250.1668@www.lameter.com
    Signed-off-by: Linus Torvalds

    Christopher Lameter
     
    I came across some unnecessary uevents once again, which reminded me of
    this. The patch seems to have been lost in the leaves of the original
    discussion [1], so I am resending it.

    [1] https://lore.kernel.org/r/alpine.DEB.2.21.2001281813130.745@www.lameter.com

    Kmem caches are internal kernel structures so it is strange that
    userspace notifiers would be needed. And I am not aware of any use of
    these notifiers. These notifiers may just exist because in the initial
    slub release the sysfs code was copied from another subsystem.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Acked-by: Michal Koutný
    Acked-by: David Rientjes
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200423115721.19821-1-mkoutny@suse.com
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    slub_debug is able to fix a corrupted slab freelist/page. However,
    alloc_debug_processing() only checks the validity of the current and next
    freepointers during the allocation path. As a result, once some objects
    have their freepointers corrupted, deactivate_slab() may lead to a page
    fault.

    Below is from a test kernel module booted with 'slub_debug=PUF,kmalloc-128
    slub_nomerge'. The test kernel corrupts the freepointer of one free
    object on purpose. Unfortunately, deactivate_slab() does not detect it
    when iterating the freechain.

    BUG: unable to handle page fault for address: 00000000123456f8
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    ... ...
    RIP: 0010:deactivate_slab.isra.92+0xed/0x490
    ... ...
    Call Trace:
    ___slab_alloc+0x536/0x570
    __slab_alloc+0x17/0x30
    __kmalloc+0x1d9/0x200
    ext4_htree_store_dirent+0x30/0xf0
    htree_dirblock_to_tree+0xcb/0x1c0
    ext4_htree_fill_tree+0x1bc/0x2d0
    ext4_readdir+0x54f/0x920
    iterate_dir+0x88/0x190
    __x64_sys_getdents+0xa6/0x140
    do_syscall_64+0x49/0x170
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Therefore, this patch adds an extra consistency check in deactivate_slab().
    Once an object's freepointer is corrupted, all following objects
    starting at this object are isolated.

    [akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n]
    Signed-off-by: Dongli Zhang
    Signed-off-by: Andrew Morton
    Cc: Joe Jin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com
    Signed-off-by: Linus Torvalds

    Dongli Zhang