14 Apr, 2014

1 commit

  • Pull slab changes from Pekka Enberg:
    "The biggest change is byte-sized freelist indices which reduces slab
    freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: slab/slub: use page->list consistently instead of page->lru
    mm/slab.c: cleanup outdated comments and unify variables naming
    slab: fix wrongly used macro
    slub: fix high order page allocation problem with __GFP_NOFAIL
    slab: Make allocations with GFP_ZERO slightly more efficient
    slab: make more slab management structure off the slab
    slab: introduce byte sized index for the freelist of a slab
    slab: restrict the number of objects in a slab
    slab: introduce helper functions to get/set free object
    slab: factor out calculate nr objects in cache_estimate

    Linus Torvalds
     

08 Apr, 2014

6 commits

  • Statistics are not critical to the operation of the allocator but
    should also not cause too much overhead.

    When __this_cpu_inc() is altered to check if preemption is disabled,
    the statistics updates trigger that check. Use raw_cpu_inc() to avoid
    the checks. Using the this_cpu operations may cause interrupt
    disable/enable sequences on various arches, which may significantly
    impact allocator performance.
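
    A minimal sketch of the resulting statistics helper, simplified from
    mm/slub.c (illustrative, not the verbatim patch):

        static inline void stat(const struct kmem_cache *s, enum stat_item si)
        {
        #ifdef CONFIG_SLUB_STATS
                /*
                 * raw_cpu_inc() skips the preemption check that
                 * __this_cpu_inc() may perform; a racy increment is
                 * acceptable for statistics.
                 */
                raw_cpu_inc(s->cpu_slab->stat[si]);
        #endif
        }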

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Christoph Lameter
    Cc: Fengguang Wu
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The failure paths of sysfs_slab_add() don't release the allocation of
    'name' made by create_unique_id() a few lines earlier. Create a common
    exit path to make it more obvious what needs freeing.
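
    A sketch of the resulting pattern (simplified; the label names are
    illustrative): every failure branch funnels through one exit that
    frees 'name' when a unique id was generated:

        err = sysfs_create_group(&s->kobj, &slab_attr_group);
        if (err)
                goto out_del_kobj;
        kobject_uevent(&s->kobj, KOBJ_ADD);
    out:
        if (!unmergeable)
                kfree(name);    /* the unique id from create_unique_id() */
        return err;
    out_del_kobj:
        kobject_del(&s->kobj);
        kobject_put(&s->kobj);
        goto out;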

    [vdavydov@parallels.com: free the name only if !unmergeable]
    Signed-off-by: Dave Jones
    Signed-off-by: Vladimir Davydov
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Currently, we try to arrange sysfs entries for memcg caches in the same
    manner as for global caches. Apart from turning /sys/kernel/slab into a
    mess when there are a lot of kmem-active memcgs created, it actually
    does not work properly - we won't create more than one link to a memcg
    cache in case its parent is merged with another cache. For instance, if
    A is a root cache merged with another root cache B, we will have the
    following sysfs setup:

    X
    A -> X
    B -> X

    where X is some unique id (see create_unique_id()). Now if memcgs M and
    N start to allocate from cache A (or B, which is the same), we will get:

    X
    X:M
    X:N
    A -> X
    B -> X
    A:M -> X:M
    A:N -> X:N

    Since B is an alias for A, we won't get entries B:M and B:N, which is
    confusing.

    It is more logical to have entries for memcg caches under the
    corresponding root cache's sysfs directory. This would allow us to keep
    the sysfs layout clean and avoid inconsistencies like the one described
    above.

    This patch does the trick. It creates a "cgroup" kset in each root
    cache kobject to keep its children caches there.
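
    A simplified sketch of the mechanism (the memcg_kset field and the
    cache_kset() helper follow the patch, but details may differ):

        /*
         * In sysfs_slab_add(): pick the parent kset for the new kobject;
         * for a memcg cache this is its root cache's "cgroup" kset.
         */
        s->kobj.kset = cache_kset(s);
        err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
        if (err)
                goto out;

        if (is_root_cache(s)) {
                /* hold this cache's per-memcg children in <cache>/cgroup/ */
                s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj);
                if (!s->memcg_kset) {
                        err = -ENOMEM;
                        goto out;
                }
        }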

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Otherwise, kzalloc() called from a memcg won't clear the whole object.

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • When a kmem cache is created (kmem_cache_create_memcg()), we first try to
    find a compatible cache that already exists and can handle requests from
    the new cache, i.e. has the same object size, alignment, ctor, etc. If
    there is such a cache, we do not create any new caches, instead we simply
    increment the refcount of the cache found and return it.

    Currently we do this procedure not only when creating root caches, but
    also for memcg caches. However, there is no point in that, because, as
    every memcg cache has exactly the same parameters as its parent and cache
    merging cannot be turned off at runtime (only at boot, by passing
    "slub_nomerge"), the root caches of any two potentially mergeable memcg
    caches must already be merged, i.e. it must be the same root cache, and
    therefore we couldn't even get to the memcg cache creation, because it
    already exists.

    The only exception is boot caches - they are explicitly forbidden to be
    merged by setting their refcount to -1. There are currently only two of
    them - kmem_cache and kmem_cache_node, which are used in slab internals (I
    do not count kmalloc caches, as their refcount is set to 1 immediately
    after creation). Since they are preemptively prevented from merging, I
    guess we should avoid merging their children too.

    So let's remove the useless code responsible for merging memcg caches.
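
    The resulting check in kmem_cache_create_memcg(), sketched
    (simplified, not the verbatim diff):

        /* only root caches are candidates for merging */
        if (!memcg) {
                s = __kmem_cache_alias(name, size, align, flags, ctor);
                if (s)
                        goto out_unlock;
        }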

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • slab_node() is actually a mempolicy function, so rename it to
    mempolicy_slab_node() to make it clearer that it is used for processes
    with mempolicies.

    At the same time, cleanup its code by saving numa_mem_id() in a local
    variable (since we require a node with memory, not just any node) and
    remove an obsolete comment that assumes the mempolicy is actually passed
    into the function.
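
    A simplified sketch of the renamed helper (the policy-mode handling
    is omitted here):

        static unsigned int mempolicy_slab_node(void)
        {
                struct mempolicy *policy;
                int node = numa_mem_id();       /* nearest node with memory */

                if (in_interrupt())
                        return node;

                policy = current->mempolicy;
                if (!policy || policy->flags & MPOL_F_LOCAL)
                        return node;

                /* MPOL_PREFERRED/MPOL_INTERLEAVE/MPOL_BIND cases elided */
                return node;
        }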

    Signed-off-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

04 Apr, 2014

2 commits

  • We release the slab_mutex while calling sysfs_slab_add from
    __kmem_cache_create since commit 66c4c35c6bc5 ("slub: Do not hold
    slub_lock when calling sysfs_slab_add()"), because kobject_uevent called
    by sysfs_slab_add might block waiting for the usermode helper to exec,
    which would result in a deadlock if we took the slab_mutex while
    executing it.

    However, apart from complicating synchronization rules, releasing the
    slab_mutex on kmem cache creation can result in a kmemcg-related race.
    The point is that we check if the memcg cache exists before going to
    __kmem_cache_create, but register the new cache in memcg subsys after
    it. Since we can drop the mutex there, several threads can see that the
    memcg cache does not exist and proceed to creating it, which is wrong.

    Fortunately, recently kobject_uevent was patched to call the usermode
    helper with the UMH_NO_WAIT flag, making the deadlock impossible.
    Therefore there is no point in releasing the slab_mutex while calling
    sysfs_slab_add, so let's simplify kmem_cache_create synchronization and
    fix the kmemcg-race mentioned above by holding the slab_mutex during the
    whole cache creation path.
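
    The simplified flow after the change, sketched (error handling
    omitted):

        mutex_lock(&slab_mutex);
        /*
         * The whole creation path, including sysfs_slab_add(), now runs
         * under slab_mutex. kobject_uevent() uses UMH_NO_WAIT, so it can
         * no longer block on the usermode helper and deadlock on the
         * mutex.
         */
        err = __kmem_cache_create(s, flags);
        mutex_unlock(&slab_mutex);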

    Signed-off-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Greg KH
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Since put_mems_allowed() is strictly optional - it's a seqcount retry -
    we don't need to evaluate the function if the allocation was in fact
    successful, saving an smp_rmb(), some loads, and comparisons on some
    relatively fast paths.

    Since the naming of get/put_mems_allowed() does suggest a mandatory
    pairing, rename the interface, as suggested by Mel, to resemble the
    seqcount interface.

    This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
    where it is important to note that the return value of the latter call
    is inverted from its previous incarnation.
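
    A usage sketch of the renamed interface (the alloc_pages() call
    stands in for the actual allocation path):

        unsigned int cookie;
        struct page *page;

        do {
                cookie = read_mems_allowed_begin();
                page = alloc_pages(gfp_mask, order);
                /*
                 * Retry only if the allocation failed and the cpuset's
                 * allowed mems changed underneath us. Note the inverted
                 * return value relative to the old put_mems_allowed().
                 */
        } while (!page && read_mems_allowed_retry(cookie));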

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Mar, 2014

1 commit

  • SLUB already tries to allocate high-order pages with __GFP_NOFAIL
    cleared. But when allocating a shadow page for kmemcheck, it missed
    clearing the flag. This triggered the WARN_ON_ONCE() reported by
    Christian Casteyde.

    https://bugzilla.kernel.org/show_bug.cgi?id=65991
    https://lkml.org/lkml/2013/12/3/764

    This patch fixes the situation by using the same allocation flags as
    the original allocation.
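
    A sketch of the fix in allocate_slab() (simplified): the kmemcheck
    shadow page now uses the same sanitized flags as the slab page
    itself:

        alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
        page = alloc_slab_page(alloc_gfp, node, oo);

        if (kmemcheck_enabled && page &&
            !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS)))
                /* pass alloc_gfp, not the original flags */
                kmemcheck_alloc_shadow(page, oo_order(oo), alloc_gfp, node);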

    Reported-by: Christian Casteyde
    Acked-by: David Rientjes
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

11 Feb, 2014

2 commits

  • Vladimir reported the following issue:

    Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires
    remove_partial() to be called with n->list_lock held, but free_partial()
    called from kmem_cache_close() on cache destruction does not follow this
    rule, leading to a warning:

    WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
    Modules linked in:
    CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1
    Hardware name:
    0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
    0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
    ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
    Call Trace:
    __kmem_cache_shutdown+0x1b2/0x1f0
    kmem_cache_destroy+0x43/0xf0
    xfs_destroy_zones+0x103/0x110 [xfs]
    exit_xfs_fs+0x38/0x4e4 [xfs]
    SyS_delete_module+0x19a/0x1f0
    system_call_fastpath+0x16/0x1b

    His solution was to add a spinlock in order to quiet lockdep. Although
    there would be no contention in taking the lock, that lock also requires
    disabling interrupts, which would have a larger impact on the system.

    Instead of adding a spinlock to a location where it is not needed for
    lockdep, make a __remove_partial() function that does not test if the
    list_lock is held, as no one should hold it while the cache is being
    destroyed.

    Also add a __add_partial() function that does not do the lock
    validation either, as it is not needed for the creation of the cache.
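
    The resulting split, sketched (simplified from mm/slub.c):

        static inline void
        __remove_partial(struct kmem_cache_node *n, struct page *page)
        {
                list_del(&page->lru);
                n->nr_partial--;
        }

        static inline void
        remove_partial(struct kmem_cache_node *n, struct page *page)
        {
                lockdep_assert_held(&n->list_lock);
                __remove_partial(n, page);
        }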

    Signed-off-by: Steven Rostedt
    Reported-by: Vladimir Davydov
    Suggested-by: David Rientjes
    Acked-by: David Rientjes
    Acked-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Commit c65c1877bd68 ("slub: use lockdep_assert_held") incorrectly
    required that add_full() and remove_full() hold n->list_lock. The lock
    is only taken when kmem_cache_debug(s), since that's the only time it
    actually does anything.

    Require that the lock only be taken under such a condition.
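
    The resulting helper, sketched (close to the actual fix):

        static void add_full(struct kmem_cache *s,
                             struct kmem_cache_node *n, struct page *page)
        {
                if (!kmem_cache_debug(s))
                        return;         /* the full list is debug-only */

                lockdep_assert_held(&n->list_lock);
                list_add(&page->lru, &n->full);
        }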

    Reported-by: Larry Finger
    Tested-by: Larry Finger
    Tested-by: Paul E. McKenney
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

03 Feb, 2014

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "Random bug fixes that have accumulated in my inbox over the past few
    months"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: Fix warning on make htmldocs caused by slab.c
    mm: slub: work around unneeded lockdep warning
    mm: sl[uo]b: fix misleading comments
    slub: Fix possible format string bug.
    slub: use lockdep_assert_held
    slub: Fix calculation of cpu slabs
    slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings

    Linus Torvalds
     

31 Jan, 2014

2 commits

  • The slub code does some setup during early boot in
    early_kmem_cache_node_alloc() with some local data. There is no
    possible way that another CPU can see this data, so the slub code
    doesn't unnecessarily lock it. However, some new lockdep asserts
    check to make sure that add_partial() _always_ has the list_lock
    held.

    Just add the locking, even though it is technically unnecessary.
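
    The change in early_kmem_cache_node_alloc(), sketched (essentially
    the patch itself):

        /*
         * The lock is for lockdep's sake, not for any actual race
         * protection: no other CPU can see this node data yet.
         */
        spin_lock(&n->list_lock);
        add_partial(n, page, DEACTIVATE_TO_HEAD);
        spin_unlock(&n->list_lock);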

    Cc: Peter Zijlstra
    Cc: Russell King
    Acked-by: David Rientjes
    Signed-off-by: Dave Hansen
    Signed-off-by: Pekka Enberg

    Dave Hansen
     
  • Commit abca7c496584 ("mm: fix slab->page _count corruption when using
    slub") notes that we cannot _set_ page->counters directly, except
    when using a real double-cmpxchg. Doing so can lose updates to
    ->_count.

    That is an absolute rule:

    You may not *set* page->counters except via a cmpxchg.

    Commit abca7c496584 fixed this for the folks who have the slub
    cmpxchg_double code turned off at compile time, but it left the bad case
    alone. It can still be reached, and the same bug can trigger in two
    cases:

    1. Turning on slub debugging at runtime, which is available on
       the distro kernels that I looked at.
    2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
       CPUs, evidently).

    There are at least 3 ways we could fix this:

    1. Take all of the existing calls to cmpxchg_double_slab() and
       __cmpxchg_double_slab() and convert them to take an old, new
       and target 'struct page'.
    2. Do (1), but with the newly-introduced 'slub_data'.
    3. Do some magic inside the two cmpxchg...slab() functions to
       pull the counters out of new_counters and only set those
       fields in page->{inuse,frozen,objects}.

    I've done (2) as well, but it's a bunch more code. This patch is an
    attempt at (3). This was the most straightforward and foolproof way
    that I could think of to do this.

    This would also technically allow us to get rid of the ugly

    #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)

    in 'struct page', but leaving it alone has the added benefit that
    'counters' stays 'unsigned' instead of 'unsigned long', so all the
    copies that the slub code does stay a bit smaller.
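
    A sketch of approach (3), close to the helper the patch introduces:

        static void set_page_slub_counters(struct page *page,
                                           unsigned long counters_new)
        {
                struct page tmp;

                tmp.counters = counters_new;
                /*
                 * page->counters can cover page->_count. Copy only the
                 * fields slub owns, so a racing update of _count is
                 * never overwritten.
                 */
                page->frozen  = tmp.frozen;
                page->inuse   = tmp.inuse;
                page->objects = tmp.objects;
        }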

    Signed-off-by: Dave Hansen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Pravin B Shelar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

30 Jan, 2014

1 commit

  • Commit 309381feaee5 ("mm: dump page when hitting a VM_BUG_ON using
    VM_BUG_ON_PAGE") added a bunch of VM_BUG_ON_PAGE() calls.

    But most of the ones in the slub code are for _temporary_ 'struct
    page's which are declared on the stack and likely have lots of gunk in
    them. Dumping their contents out will just confuse folks looking at
    bad_page() output. Plus, if we try to page_to_pfn() on them or
    something, we'll probably oops anyway.

    Turn them back into VM_BUG_ON()s.

    Signed-off-by: Dave Hansen
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

24 Jan, 2014

1 commit

  • Most of the VM_BUG_ON assertions are performed on a page. Usually, when
    one of these assertions fails we'll get a BUG_ON with a call stack and
    the registers.

    Based on recent requests to add a small piece of code that dumps the
    page at various VM_BUG_ON sites, I've noticed that the page dump is
    quite useful to people debugging issues in mm.

    This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
    VM_BUG_ON() does, also dumps the page before executing the actual
    BUG_ON.
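
    The new assertion, sketched (close to the definition added to
    linux/mmdebug.h):

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON_PAGE(cond, page)                      \
                do {                                            \
                        if (unlikely(cond)) {                   \
                                dump_page(page);                \
                                BUG();                          \
                        }                                       \
                } while (0)
        #else
        #define VM_BUG_ON_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
        #endif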

    [akpm@linux-foundation.org: fix up includes]
    Signed-off-by: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

14 Jan, 2014

2 commits


29 Dec, 2013

1 commit

  • /sys/kernel/slab/:t-0000048 # cat cpu_slabs
    231 N0=16 N1=215
    /sys/kernel/slab/:t-0000048 # cat slabs
    145 N0=36 N1=109

    See, the number of slabs is smaller than that of cpu slabs.

    The bug was introduced by commit 49e2258586b4 ("slub: per cpu cache
    for partial pages").

    We should use page->pages instead of page->pobjects when calculating
    the number of cpu partial slabs. This also fixes the mapping of slabs
    and nodes.

    As there's no variable storing the number of total/active objects in
    cpu partial slabs, and we don't have user interfaces requiring those
    statistics, I just added WARN_ON for those cases.
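
    A sketch of the fix in show_slab_objects() (simplified):

        page = ACCESS_ONCE(c->partial);
        if (page) {
                node = page_to_nid(page);
                if (flags & (SO_TOTAL | SO_OBJECTS))
                        WARN_ON_ONCE(1);        /* no object counts kept */
                else
                        x = page->pages;        /* not page->pobjects */
                total += x;
                nodes[node] += x;
        }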

    Cc: stable@vger.kernel.org # 3.2+
    Acked-by: Christoph Lameter
    Reviewed-by: Wanpeng Li
    Signed-off-by: Li Zefan
    Signed-off-by: Pekka Enberg

    Li Zefan
     

23 Nov, 2013

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
    slab internals similar to mm/slub.c. This reduces memory usage and
    improves performance:

    https://lkml.org/lkml/2013/10/16/155

    Rest of the changes are bug fixes from various people"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
    mm, slub: fix the typo in mm/slub.c
    mm, slub: fix the typo in include/linux/slub_def.h
    slub: Handle NULL parameter in kmem_cache_flags
    slab: replace non-existing 'struct freelist *' with 'void *'
    slab: fix to calm down kmemleak warning
    slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
    slab: rename slab_bufctl to slab_freelist
    slab: remove useless statement for checking pfmemalloc
    slab: use struct page for slab management
    slab: replace free and inuse in struct slab with newly introduced active
    slab: remove SLAB_LIMIT
    slab: remove kmem_bufctl_t
    slab: change the management method of free objects of the slab
    slab: use __GFP_COMP flag for allocating slab pages
    slab: use well-defined macro, virt_to_slab()
    slab: overloading the RCU head over the LRU for RCU free
    slab: remove cachep in struct slab_rcu
    slab: remove nodeid in struct slab
    slab: remove colouroff in struct slab
    slab: change return type of kmem_getpages() to struct page
    ...

    Linus Torvalds
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds
     

13 Nov, 2013

1 commit


12 Nov, 2013

2 commits

  • Acked-by: Christoph Lameter
    Signed-off-by: Zhi Yong Wu
    Signed-off-by: Pekka Enberg

    Zhi Yong Wu
     
  • Andreas Herrmann writes:

    When I've used the slub_debug kernel option (e.g.
    "slub_debug=,skbuff_fclone_cache" or similar) in a debug session I've
    seen a panic like:

    Highbank #setenv bootargs console=ttyAMA0 root=/dev/sda2 kgdboc.kgdboc=ttyAMA0,115200 slub_debug=,kmalloc-4096 earlyprintk=ttyAMA0
    ...
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = c0004000
    [00000000] *pgd=00000000
    Internal error: Oops: 5 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Tainted: G W 3.12.0-00048-gbe408cd #314
    task: c0898360 ti: c088a000 task.ti: c088a000
    PC is at strncmp+0x1c/0x84
    LR is at kmem_cache_flags.isra.46.part.47+0x44/0x60
    pc : [] lr : [] psr: 200001d3
    sp : c088bea8 ip : c088beb8 fp : c088beb4
    r10: 00000000 r9 : 413fc090 r8 : 00000001
    r7 : 00000000 r6 : c2984a08 r5 : c0966e78 r4 : 00000000
    r3 : 0000006b r2 : 0000000c r1 : 00000000 r0 : c2984a08
    Flags: nzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
    Control: 10c5387d Table: 0000404a DAC: 00000015
    Process swapper (pid: 0, stack limit = 0xc088a248)
    Stack: (0xc088bea8 to 0xc088c000)
    bea0: c088bed4 c088beb8 c0110a3c c02c6d90 c0966e78 00000040
    bec0: ef001f00 00000040 c088bf14 c088bed8 c0112070 c0110a04 00000005 c010fac8
    bee0: c088bf5c c088bef0 c010fac8 ef001f00 00000040 00000000 00000040 00000001
    bf00: 413fc090 00000000 c088bf34 c088bf18 c0839190 c0112040 00000000 ef001f00
    bf20: 00000000 00000000 c088bf54 c088bf38 c0839200 c083914c 00000006 c0961c4c
    bf40: c0961c28 00000000 c088bf7c c088bf58 c08392ac c08391c0 c08a2ed8 c0966e78
    bf60: c086b874 c08a3f50 c0961c28 00000001 c088bfb4 c088bf80 c083b258 c0839248
    bf80: 2f800000 0f000000 c08935b4 ffffffff c08cd400 ffffffff c08cd400 c0868408
    bfa0: c29849c0 00000000 c088bff4 c088bfb8 c0824974 c083b1e4 ffffffff ffffffff
    bfc0: c08245c0 00000000 00000000 c0868408 00000000 10c5387d c0892bcc c0868404
    bfe0: c0899440 0000406a 00000000 c088bff8 00008074 c0824824 00000000 00000000
    [] (strncmp+0x1c/0x84) from [] (kmem_cache_flags.isra.46.part.47+0x44/0x60)
    [] (kmem_cache_flags.isra.46.part.47+0x44/0x60) from [] (__kmem_cache_create+0x3c/0x410)
    [] (__kmem_cache_create+0x3c/0x410) from [] (create_boot_cache+0x50/0x74)
    [] (create_boot_cache+0x50/0x74) from [] (create_kmalloc_cache+0x4c/0x88)
    [] (create_kmalloc_cache+0x4c/0x88) from [] (create_kmalloc_caches+0x70/0x114)
    [] (create_kmalloc_caches+0x70/0x114) from [] (kmem_cache_init+0x80/0xe0)
    [] (kmem_cache_init+0x80/0xe0) from [] (start_kernel+0x15c/0x318)
    [] (start_kernel+0x15c/0x318) from [] (0x8074)
    Code: e3520000 01a00002 089da800 e5d03000 (e5d1c000)
    ---[ end trace 1b75b31a2719ed1d ]---
    Kernel panic - not syncing: Fatal exception

    The problem is that the slub_debug option is not parsed before
    create_boot_cache() is called. Solve this by changing slub_debug to an
    early_param.

    Kernels 3.11, 3.10 are also affected. I am not sure about older
    kernels.

    Christoph Lameter explains:

    kmem_cache_flags may be called with NULL parameter during early boot.
    Skip the test in that case.
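
    Both parts of the fix, sketched (simplified from mm/slub.c):

        static unsigned long kmem_cache_flags(unsigned long object_size,
                unsigned long flags, const char *name, void (*ctor)(void *))
        {
                /*
                 * May be called with a NULL name during early boot; skip
                 * the per-slab name match in that case.
                 */
                if (slub_debug && (!slub_debug_slabs || (name &&
                    !strncmp(slub_debug_slabs, name,
                             strlen(slub_debug_slabs)))))
                        flags |= slub_debug;

                return flags;
        }

        /* parse "slub_debug=..." before the first caches are created */
        early_param("slub_debug", setup_slub_debug);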

    Cc: stable@vger.kernel.org # 3.10 and 3.11
    Reported-by: Andreas Herrmann
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

25 Oct, 2013

1 commit

  • Move all kmemleak calls into hook functions, and make it so
    that all hooks (both inside and outside of #ifdef CONFIG_SLUB_DEBUG)
    call the appropriate kmemleak routines. This allows kmemleak
    to be configured independently of slub debug features.

    It also fixes a bug where kmemleak was only partially enabled in some
    configurations.
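
    For example, the !CONFIG_SLUB_DEBUG variant of the free hook now
    looks roughly like this (simplified):

        static inline void slab_free_hook(struct kmem_cache *s, void *x)
        {
                kmemleak_free_recursive(x, s->flags);
        }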

    Acked-by: Catalin Marinas
    Acked-by: Christoph Lameter
    Signed-off-by: Roman Bobniev
    Signed-off-by: Tim Bird
    Signed-off-by: Pekka Enberg

    Roman Bobniev
     

18 Oct, 2013

1 commit


15 Sep, 2013

1 commit

  • Pull SLAB update from Pekka Enberg:
    "Nothing terribly exciting here apart from Christoph's kmalloc
    unification patches that brings sl[aou]b implementations closer to
    each other"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slab: Use correct GFP_DMA constant
    slub: remove verify_mem_not_deleted()
    mm/sl[aou]b: Move kmallocXXX functions to common code
    mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
    mm/slub.c: beautify code for removing redundancy 'break' statement.
    slub: Remove unnecessary page NULL check
    slub: don't use cpu partial pages on UP
    mm/slub: beautify code for 80 column limitation and tab alignment
    mm/slub: remove 'per_cpu' which is useless variable

    Linus Torvalds
     

12 Sep, 2013

1 commit


05 Sep, 2013

2 commits

  • I do not see any user for this code in the tree.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The kmalloc* functions of all slab allocators are similar now, so
    let's move them into slab.h. This requires some function naming changes
    in slob.

    As a result of this patch there is a common set of functions for
    all allocators. It also means that kmalloc_large() is now available
    in general to perform large-order allocations that go directly
    via the page allocator. kmalloc_large() can be substituted if
    kmalloc() throws warnings because of too-large allocations.

    kmalloc_large() has exactly the same semantics as kmalloc() but
    can only be used for allocations > PAGE_SIZE.
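
    A usage sketch (the GFP flags are illustrative):

        void *buf;

        if (size > PAGE_SIZE)   /* goes straight to the page allocator */
                buf = kmalloc_large(size, GFP_KERNEL);
        else
                buf = kmalloc(size, GFP_KERNEL);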

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

13 Aug, 2013

2 commits


09 Aug, 2013

1 commit

  • This reverts commit 318df36e57c0ca9f2146660d41ff28e8650af423.

    This commit caused Steven Rostedt's hackbench runs to run out of memory
    due to a leak. As noted by Joonsoo Kim, it is buggy in the following
    scenario:

    "I guess, you may set 0 to all kmem caches's cpu_partial via sysfs,
    doesn't it?

    In this case, memory leak is possible in following case. Code flow of
    possible leak is follwing case.

    * in __slab_free()
    1. (!new.inuse || !prior) && !was_frozen
    2. !kmem_cache_debug && !prior
    3. new.frozen = 1
    4. after cmpxchg_double_slab, run the (!n) case with new.frozen=1
    5. with this patch, put_cpu_partial() doesn't do anything,
    because this cache's cpu_partial is 0
    6. return

    In step 5, leak occur"

    And Steven does indeed have cpu_partial set to 0 due to RT testing.

    Joonsoo is cooking up a patch, but everybody agrees that reverting this
    for now is the right thing to do.

    Reported-and-bisected-by: Steven Rostedt
    Acked-by: Joonsoo Kim
    Acked-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Jul, 2013

1 commit


15 Jul, 2013

4 commits

  • Remove 'per_cpu', since it is useless now after commit 205ab99
    ("slub: Update statistics handling for variable order slabs"). The
    partial list is now handled in the same way as the per-cpu slab.

    Acked-by: Christoph Lameter
    Signed-off-by: Chen Gang
    Signed-off-by: Pekka Enberg

    Chen Gang
     
  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     
  • In the -rt kernel (mrg), we hit the following dump:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] kmem_cache_alloc_node+0x51/0x180
    PGD a2d39067 PUD b1641067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    CPU 3
    Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
    RIP: 0010:[] [] kmem_cache_alloc_node+0x51/0x180
    RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213
    RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
    RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
    RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
    R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
    R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
    FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
    Stack:
    ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
    0000000001200011 0000000001200011 0000000000000000 0000000000000000
    00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
    Call Trace:
    [] ? current_has_perm+0x68/0x80
    [] copy_process+0xdd/0x15b0
    [] ? rt_up_read+0x25/0x30
    [] do_fork+0x5a/0x360
    [] ? migrate_enable+0xeb/0x220
    [] sys_clone+0x28/0x30
    [] stub_clone+0x13/0x20
    [] ? system_call_fastpath+0x16/0x1b
    Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
    RIP [] kmem_cache_alloc_node+0x51/0x180
    RSP
    CR2: 0000000000000000
    ---[ end trace 0000000000000002 ]---

    Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
    with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
    disable migration. But the SLUB code is relatively lockless, and the
    spin_locks there are raw_spin_locks (not converted to mutexes), thus I
    believe this bug can happen in mainline without -rt features. The -rt
    patch is just good at triggering mainline bugs ;-)

    Anyway, looking at where this crashed, it seems that the page variable
    can be NULL when passed to the node_match() function (which does not
    check if it is NULL). When this happens we get the above panic.

    As page is only used in slab_alloc() to check if the node matches, if
    it's NULL I'm assuming that we can say it doesn't and call the
    __slab_alloc() code. Is this a correct assumption?
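
    The fix in slab_alloc_node(), sketched (essentially the one-liner):

        object = c->freelist;
        page = c->page;
        /* a NULL per-cpu page counts as a mismatch: take the slow path */
        if (unlikely(!object || !page || !node_match(page, node)))
                object = __slab_alloc(s, gfpflags, node, addr, c);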

    Acked-by: Christoph Lameter
    Signed-off-by: Steven Rostedt
    Signed-off-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

08 Jul, 2013

1 commit

  • CPU partial support can introduce a level of indeterminism that is not
    wanted in certain contexts (like a realtime kernel). Make it
    configurable.
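
    The gate added by the patch, sketched (close to the actual helper):

        static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
        {
        #ifdef CONFIG_SLUB_CPU_PARTIAL
                return !kmem_cache_debug(s);
        #else
                return false;
        #endif
        }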

    This patch is based on Christoph Lameter's "slub: Make cpu partial slab
    support configurable V2".

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim