10 Oct, 2014

3 commits

  • Slab merge is a good feature for reducing fragmentation. Currently it is
    only applied to SLUB, but it would be good to apply it to SLAB as well.
    This patch is a preparation step for applying slab merge to SLAB by
    commonizing the slab merge logic.
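
    For illustration, the kind of compatibility check that slab merging relies
    on looks roughly like this (a simplified sketch, not the kernel's actual
    find_mergeable() code; the helper name is made up):

        /* Sketch: two caches may share backing slabs only if their
         * layout-relevant properties match. */
        static bool caches_mergeable(const struct kmem_cache *a,
                                     const struct kmem_cache *b)
        {
                if (a->ctor || b->ctor)                 /* ctors forbid merging */
                        return false;
                if (a->object_size != b->object_size)   /* same object size */
                        return false;
                return a->align == b->align;            /* and same alignment */
        }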

    Signed-off-by: Joonsoo Kim
    Cc: Randy Dunlap
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Update the SLUB code to search for partial slabs on the nearest node with
    memory in the presence of memoryless nodes. Additionally, do not consider
    it to be an ALLOC_NODE_MISMATCH (and deactivate the slab) when a
    memoryless-node specified allocation goes off-node.
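
    A minimal sketch of the node-selection idea described above (illustrative
    only; the helper name is made up, and the real patch touches SLUB's
    get_partial()/ALLOC_NODE_MISMATCH handling):

        /* Sketch: pick a node that actually has memory before searching
         * its partial list. */
        static int partial_search_node(int node)
        {
                if (node == NUMA_NO_NODE)
                        return numa_mem_id();           /* nearest node with memory */
                if (!node_present_pages(node))
                        return node_to_mem_node(node);  /* requested node is memoryless */
                return node;
        }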

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Nishanth Aravamudan
    Cc: David Rientjes
    Cc: Han Pingtian
    Cc: Pekka Enberg
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Michael Ellerman
    Cc: Anton Blanchard
    Cc: Christoph Lameter
    Cc: Wanpeng Li
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Tracing of mergeable slabs, as well as use of failslab on them, is
    confusing since the objects of multiple slab caches will be affected.
    Moreover, this creates a situation where a mergeable slab becomes
    unmergeable.

    If tracing or failslab testing is desired then it may be best to switch
    merging off for starters.
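
    One plausible shape of the resulting guard (a hedged sketch, not the
    literal patch; the sysfs store handler is simplified):

        /* Sketch: a merged cache (refcount > 1) backs several logical caches,
         * so flipping SLAB_TRACE on it would affect all of them. */
        static ssize_t trace_store(struct kmem_cache *s, const char *buf,
                                   size_t length)
        {
                if (s->refcount > 1)
                        return -EINVAL;

                if (buf[0] == '1')
                        s->flags |= SLAB_TRACE;
                else
                        s->flags &= ~SLAB_TRACE;
                return length;
        }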

    Signed-off-by: Christoph Lameter
    Tested-by: WANG Chao
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Aug, 2014

8 commits

  • This function is never called for memcg caches, because they are
    unmergeable, so remove the dead code.

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Christoph Lameter
    Reviewed-by: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We mark some slab caches (e.g. kmem_cache_node) as unmergeable by
    setting their refcount to -1; their alias count should therefore be 0,
    not refcount-1, so correct it here.
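
    The described correction, sketched against SLUB's sysfs alias attribute
    (treat the handler as approximate, not the exact diff):

        /* Sketch: an unmergeable cache (refcount < 0) has no aliases. */
        static ssize_t aliases_show(struct kmem_cache *s, char *buf)
        {
                return sprintf(buf, "%d\n",
                               s->refcount < 0 ? 0 : s->refcount - 1);
        }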

    Signed-off-by: Gu Zheng
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gu Zheng
     
  • The return statement goes with the cmpxchg_double() condition so it needs
    to be indented another tab.

    Also these days the fashion is to line function parameters up, and it
    looks nicer that way because then the "freelist_new" is not at the same
    indent level as the "return 1;".
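
    For illustration, the layout being described looks like this (a sketch in
    the style of the surrounding SLUB code, not the exact diff):

        if (cmpxchg_double(&page->freelist, &page->counters,
                           freelist_old, counters_old,
                           freelist_new, counters_new))
                return 1;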

    Signed-off-by: Dan Carpenter
    Signed-off-by: Pekka Enberg
    Signed-off-by: David Rientjes
    Cc: Joonsoo Kim
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • When a kmem_cache is created with a ctor, each object in the kmem_cache
    is initialized before it is ready to use. In the SLUB implementation,
    however, the first object is initialized twice.

    This patch removes the duplicated initialization of the first
    object.

    Fixes commit 7656c72b ("SLUB: add macros for scanning objects in a slab").

    Signed-off-by: Wei Yang
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • There are two versions of alloc/free hooks now - one for
    CONFIG_SLUB_DEBUG=y and another one for CONFIG_SLUB_DEBUG=n.

    I see no reason why calls to other debugging subsystems (LOCKDEP,
    DEBUG_ATOMIC_SLEEP, KMEMCHECK and FAILSLAB) are hidden under SLUB_DEBUG.
    All these features should work regardless of the SLUB_DEBUG config, as
    all of them already have their own Kconfig options.

    This also fixes failslab for the CONFIG_SLUB_DEBUG=n configuration. It
    simply did not work before, because the should_failslab() call was in a
    hook hidden under "#ifdef CONFIG_SLUB_DEBUG #else".

    Note: There is one concealed change in allocation path for SLUB_DEBUG=n
    and all other debugging features disabled. The might_sleep_if() call
    can generate some code even if DEBUG_ATOMIC_SLEEP=n. For
    PREEMPT_VOLUNTARY=y, might_sleep() inserts a _cond_resched() call, but I
    think it should be ok.
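
    A hedged sketch of the unified allocation hook this describes (helper
    names and the era's should_failslab() signature are assumed):

        static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
        {
                flags &= gfp_allowed_mask;
                lockdep_trace_alloc(flags);             /* LOCKDEP */
                might_sleep_if(flags & __GFP_WAIT);     /* DEBUG_ATOMIC_SLEEP */

                /* FAILSLAB works even with CONFIG_SLUB_DEBUG=n */
                return should_failslab(s->object_size, flags, s->flags);
        }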

    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • resiliency_test() is only called for bootstrap, so it may be moved to
    init.text and freed after boot.
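
    In other words (a trivial sketch): annotating the function with __init
    places it in init text, which the kernel discards once boot completes.

        static void __init resiliency_test(void)
        {
                /* self-test body unchanged */
        }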

    Signed-off-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Make use of the new node functions in mm/slab.h to reduce code size and
    simplify.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Lameter
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The patchset provides two new functions in mm/slab.h and modifies SLAB
    and SLUB to use these. The kmem_cache_node structure is shared between
    both allocators and the use of common accessors will allow us to move
    more code into slab_common.c in the future.

    This patch (of 3):

    These functions make it possible to eliminate repeatedly used code in
    both SLAB and SLUB and also allow for the insertion of debugging code
    that may be needed in the development process.
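
    The accessors in question look roughly like this (reconstructed from the
    description; close to, but not guaranteed to match, the mm/slab.h code):

        static inline struct kmem_cache_node *get_node(struct kmem_cache *s,
                                                       int node)
        {
                return s->node[node];
        }

        /* Iterate over every node that has a kmem_cache_node instance. */
        #define for_each_kmem_cache_node(__s, __node, __n)              \
                for (__node = 0; __node < nr_node_ids; __node++)        \
                        if ((__n = get_node(__s, __node)))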

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Acked-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Jul, 2014

1 commit

  • min_partial is the minimum number of slabs cached on a node's partial
    list. So, if nr_partial is less than it, we keep a newly empty slab on the
    node partial list rather than freeing it. But if nr_partial is equal to or
    greater than it, we have enough partial slabs and should free the newly
    empty slab. The current implementation missed the equal case, so if
    min_partial is set to 0, at least one slab can still remain cached. This
    is a critical problem for the kmemcg destruction logic, which does not
    work properly if any slabs are cached. This patch fixes the problem.

    Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
    immediately").

    Signed-off-by: Joonsoo Kim
    Acked-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

07 Jun, 2014

1 commit

  • Currently, if the allocation constraint is NUMA_NO_NODE, we search for a
    partial slab on the numa_node_id() node. This doesn't work properly on a
    system with memoryless nodes, since such a node has no memory and
    therefore cannot have any partial slabs.

    On such a node, page allocation always falls back to numa_mem_id() first.
    So searching for a partial slab on numa_mem_id() in that case is the
    proper solution for the memoryless node case.
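
    The change described amounts to a one-line adjustment in the partial-slab
    search; roughly (the exact surrounding context is assumed):

        /* before */ int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
        /* after  */ int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;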

    Signed-off-by: Joonsoo Kim
    Acked-by: Nishanth Aravamudan
    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Wanpeng Li
    Cc: Han Pingtian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

05 Jun, 2014

10 commits

  • Replace places where __get_cpu_var() is used for an address calculation
    with this_cpu_ptr().
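
    A generic example of the conversion (with a hypothetical per-cpu variable;
    not tied to any particular call site in the patch):

        struct my_stats { unsigned long events; };
        static DEFINE_PER_CPU(struct my_stats, my_stats);

        /* Caller is assumed to have preemption disabled. */
        static void bump_event_count(void)
        {
                struct my_stats *p;

                /* old style: p = &__get_cpu_var(my_stats); */
                p = this_cpu_ptr(&my_stats);    /* new style */
                p->events++;
        }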

    Signed-off-by: Christoph Lameter
    Cc: Tejun Heo
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently we have two pairs of kmemcg-related functions that are called on
    slab alloc/free. The first is memcg_{bind,release}_pages that count the
    total number of pages allocated on a kmem cache. The second is
    memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource
    counter. Let's just merge them to keep the code clean.

    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Glauber Costa
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • When we create a sl[au]b cache, we allocate kmem_cache_node structures
    for each online NUMA node. To handle nodes taken online/offline, we
    register memory hotplug notifier and allocate/free kmem_cache_node
    corresponding to the node that changes its state for each kmem cache.

    To synchronize between the two paths we hold the slab_mutex during both
    the cache creation/destruction path and while tuning per-node parts of
    kmem caches in the memory hotplug handler, but that's not quite right,
    because it does not guarantee that a newly created cache will have all
    kmem_cache_nodes initialized in case it races with memory hotplug. For
    instance, in the case of slub:

    CPU0                                  CPU1
    ----                                  ----
    kmem_cache_create:                    online_pages:
     __kmem_cache_create:                  slab_memory_callback:
                                            slab_mem_going_online_callback:
                                             lock slab_mutex
                                             for each slab_caches list entry
                                                 allocate kmem_cache node
                                             unlock slab_mutex
      lock slab_mutex
      init_kmem_cache_nodes:
       for_each_node_state(node, N_NORMAL_MEMORY)
        allocate kmem_cache node
      add kmem_cache to slab_caches list
      unlock slab_mutex
                                          online_pages (continued):
                                           node_states_set_node

    As a result we'll get a kmem cache with not all kmem_cache_nodes
    allocated.

    To avoid issues like that we should hold get/put_online_mems() during
    the whole kmem cache creation/destruction/shrink paths, just like we
    deal with cpu hotplug. This patch does the trick.

    Note that after this is applied, there is no need to take the slab_mutex
    for kmem_cache_shrink any more, so it is removed from there.
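
    A sketch of the resulting pattern for one of these paths, per the
    description above (close to, but not guaranteed to match, the final
    slab_common.c code):

        int kmem_cache_shrink(struct kmem_cache *cachep)
        {
                int ret;

                get_online_cpus();
                get_online_mems();      /* stable cpu/node state, no slab_mutex */
                ret = __kmem_cache_shrink(cachep);
                put_online_mems();
                put_online_cpus();
                return ret;
        }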

    Signed-off-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Toshi Kani
    Cc: Xishi Qiu
    Cc: Jiang Liu
    Cc: Rafael J. Wysocki
    Cc: David Rientjes
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • kmem_cache_{create,destroy,shrink} need to get a stable value of
    cpu/node online mask, because they init/destroy/access per-cpu/node
    kmem_cache parts, which can be allocated or destroyed on cpu/mem
    hotplug. To protect against cpu hotplug, these functions use
    {get,put}_online_cpus. However, they do nothing to synchronize with
    memory hotplug - taking the slab_mutex does not eliminate the
    possibility of race as described in patch 2.

    What we need there is something like get_online_cpus, but for memory.
    We already have lock_memory_hotplug, which serves for the purpose, but
    it's a bit of a hammer right now, because it's backed by a mutex. As a
    result, it imposes some limitations to locking order, which are not
    desirable, and can't be used just like get_online_cpus. That's why in
    patch 1 I substitute it with get/put_online_mems, which work exactly
    like get/put_online_cpus except they block not cpu, but memory hotplug.

    [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it
    myself, because it used an rw semaphore for get/put_online_mems,
    making them deadlock prone. ]

    This patch (of 2):

    {un}lock_memory_hotplug, which is used to synchronize against memory
    hotplug, is currently backed by a mutex, which makes it a bit of a
    hammer - threads that only want to get a stable value of online nodes
    mask won't be able to proceed concurrently. Also, it imposes some
    strong locking ordering rules on it, which narrows down the set of its
    usage scenarios.

    This patch introduces get/put_online_mems, which are the same as
    get/put_online_cpus, but for memory hotplug, i.e. executing a code
    inside a get/put_online_mems section will guarantee a stable value of
    online nodes, present pages, etc.

    lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.

    Signed-off-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Toshi Kani
    Cc: Xishi Qiu
    Cc: Jiang Liu
    Cc: Rafael J. Wysocki
    Cc: David Rientjes
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Currently to allocate a page that should be charged to kmemcg (e.g.
    threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page
    allocated is then to be freed by free_memcg_kmem_pages. Apart from
    looking asymmetrical, this also requires intrusion into the general
    allocation path. So let's introduce separate functions that will
    alloc/free pages charged to kmemcg.

    The new functions are called alloc_kmem_pages and free_kmem_pages. They
    should be used when the caller actually would like to use kmalloc, but
    has to fall back to the page allocator because the allocation is large.
    They only differ from alloc_pages and free_pages in that besides
    allocating or freeing pages they also charge them to the kmem resource
    counter of the current memory cgroup.
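
    A usage sketch based on the description (the signatures are assumed to
    mirror alloc_pages()/free_pages(); the wrapper names here are made up):

        /* Large kmalloc-style allocation charged to the current memcg. */
        static void *big_kmalloc_sketch(gfp_t flags, unsigned int order)
        {
                struct page *page = alloc_kmem_pages(flags, order);

                return page ? page_address(page) : NULL;
        }

        static void big_kfree_sketch(void *ptr, unsigned int order)
        {
                free_kmem_pages((unsigned long)ptr, order);
        }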

    [sfr@canb.auug.org.au: export kmalloc_order() to modules]
    Signed-off-by: Vladimir Davydov
    Acked-by: Greg Thelen
    Cc: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Glauber Costa
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We have only a few places where we actually want to charge kmem so
    instead of intruding into the general page allocation path with
    __GFP_KMEMCG it's better to explicitly charge kmem there. All kmem
    charges will be easier to follow that way.

    This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG
    from memcg caches' allocflags. Instead it makes the slab allocation path
    call memcg_charge_kmem directly, getting the memcg to charge from the
    cache's memcg params.

    This also eliminates any possibility of misaccounting an allocation
    going from one memcg's cache to another memcg, because now we always
    charge slabs against the memcg the cache belongs to. That's why this
    patch removes the big comment to memcg_kmem_get_cache.

    Signed-off-by: Vladimir Davydov
    Acked-by: Greg Thelen
    Cc: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Glauber Costa
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH
    got bumped in that exit path. Now there are two, and a bunch of gotos.
    ALLOC_SLOWPATH can now get set more than once during a single call to
    __slab_alloc() which is pretty bogus. Here's the sequence:

    1. Enter __slab_alloc(), fall through all the way to the
    stat(s, ALLOC_SLOWPATH);
    2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to
    new_slab (goto #1)
    3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo
    (goto #2)
    4. Fall through in the same path we did before all the way to
    stat(s, ALLOC_SLOWPATH)
    5. bump ALLOC_REFILL stat, then return

    Doing this is obviously bogus. It keeps us from being able to
    accurately compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means
    that the total number of allocs always exceeds the total number of
    frees.

    This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same
    place that __slab_alloc() is. This makes it much less likely that
    ALLOC_SLOWPATH will get botched again in the spaghetti-code inside
    __slab_alloc().
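
    Sketch of where the counter ends up, inside slab_alloc_node() at the
    single __slab_alloc() call site (an excerpt-style sketch; names are
    approximate):

        if (unlikely(!object || !node_match(page, node))) {
                object = __slab_alloc(s, gfpflags, node, addr, c);
                stat(s, ALLOC_SLOWPATH);        /* counted exactly once */
        } else {
                /* lockless per-cpu fastpath ... */
                stat(s, ALLOC_FASTPATH);
        }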

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When the slab or slub allocators cannot allocate additional slab pages,
    they emit diagnostic information to the kernel log such as current
    number of slabs, number of objects, active objects, etc. This is always
    coupled with a page allocation failure warning since it is controlled by
    !__GFP_NOWARN.

    Suppress this out of memory warning if the allocator is configured
    without debug support. The page allocation failure warning will
    indicate it is a failed slab allocation, the order, and the gfp mask, so
    this is only useful to diagnose allocator issues.

    Since CONFIG_SLUB_DEBUG is already enabled by default for the slub
    allocator, there is no functional change with this patch. If debug is
    disabled, however, the warnings are now suppressed.
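
    The effect is roughly this (a sketch; the real function prints
    considerably more detail):

        static void slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags,
                                       int nid)
        {
        #ifdef CONFIG_SLUB_DEBUG
                pr_warn("SLUB: Unable to allocate memory on node %d (gfp=0x%x)\n",
                        nid, gfpflags);
                pr_warn("  cache: %s, object size: %d, buffer size: %d\n",
                        s->name, s->object_size, s->size);
        #endif
        }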

    Signed-off-by: David Rientjes
    Cc: Pekka Enberg
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Inspired by Joe Perches suggestion in ntfs logging clean-up.

    Signed-off-by: Fabian Frederick
    Acked-by: Christoph Lameter
    Cc: Joe Perches
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • All printk(KERN_foo converted to pr_foo()

    Default printk converted to pr_warn()

    Coalesce format fragments
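
    For example (a generic illustration of the conversion, not a specific
    hunk from the patch):

        /* before */
        printk(KERN_ERR "SLUB %s: %s\n", s->name, msg);
        /* after */
        pr_err("SLUB %s: %s\n", s->name, msg);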

    Signed-off-by: Fabian Frederick
    Acked-by: Christoph Lameter
    Cc: Joe Perches
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

07 May, 2014

2 commits

  • debugobjects warning during netfilter exit:

    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984
    Workqueue: netns cleanup_net
    Call Trace:
    dump_stack+0x52/0x87
    warn_slowpath_common+0x8c/0xc0
    warn_slowpath_fmt+0x46/0x50
    debug_print_object+0x8d/0xb0
    __debug_check_no_obj_freed+0xa5/0x220
    debug_check_no_obj_freed+0x15/0x20
    kmem_cache_free+0x197/0x340
    kmem_cache_destroy+0x86/0xe0
    nf_conntrack_cleanup_net_list+0x131/0x170
    nf_conntrack_pernet_exit+0x5d/0x70
    ops_exit_list+0x5e/0x70
    cleanup_net+0xfb/0x1c0
    process_one_work+0x338/0x550
    worker_thread+0x215/0x350
    kthread+0xe7/0xf0
    ret_from_fork+0x7c/0xb0

    Also during dcookie cleanup:

    WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
    Call Trace:
    dump_stack (lib/dump_stack.c:52)
    warn_slowpath_common (kernel/panic.c:430)
    warn_slowpath_fmt (kernel/panic.c:445)
    debug_print_object (lib/debugobjects.c:262)
    __debug_check_no_obj_freed (lib/debugobjects.c:697)
    debug_check_no_obj_freed (lib/debugobjects.c:726)
    kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
    kmem_cache_destroy (mm/slab_common.c:363)
    dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
    event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
    __fput (fs/file_table.c:217)
    ____fput (fs/file_table.c:253)
    task_work_run (kernel/task_work.c:125 (discriminator 1))
    do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
    int_signal (arch/x86/kernel/entry_64.S:807)

    Sysfs has a release mechanism. Use that to release the kmem_cache
    structure if CONFIG_SYSFS is enabled.

    Only slub is changed - slab currently only supports /proc/slabinfo and
    not /sys/kernel/slab/*. We talked about adding that and someone was
    working on it.
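
    The shape of the sysfs-driven release, reconstructed from the description
    (names follow SLUB's sysfs code but are not guaranteed verbatim):

        static void kmem_cache_release(struct kobject *k)
        {
                /* Runs when the last sysfs/kobject reference is dropped. */
                slab_kmem_cache_release(to_slab(k));
        }

        static struct kobj_type slab_ktype = {
                .sysfs_ops = &slab_sysfs_ops,
                .release   = kmem_cache_release,
        };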

    [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
    [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
    Signed-off-by: Christoph Lameter
    Reported-by: Sasha Levin
    Tested-by: Sasha Levin
    Acked-by: Greg KH
    Cc: Thomas Gleixner
    Cc: Pekka Enberg
    Cc: Russell King
    Cc: Bart Van Assche
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • After creating a cache for a memcg we should initialize its sysfs attrs
    with the values from its parent. That's what memcg_propagate_slab_attrs
    is for. Currently it's broken - we clearly muddled root-vs-memcg caches
    there. Let's fix it up.

    Signed-off-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

14 Apr, 2014

1 commit

  • Pull slab changes from Pekka Enberg:
    "The biggest change is byte-sized freelist indices which reduces slab
    freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: slab/slub: use page->list consistently instead of page->lru
    mm/slab.c: cleanup outdated comments and unify variables naming
    slab: fix wrongly used macro
    slub: fix high order page allocation problem with __GFP_NOFAIL
    slab: Make allocations with GFP_ZERO slightly more efficient
    slab: make more slab management structure off the slab
    slab: introduce byte sized index for the freelist of a slab
    slab: restrict the number of objects in a slab
    slab: introduce helper functions to get/set free object
    slab: factor out calculate nr objects in cache_estimate

    Linus Torvalds
     

08 Apr, 2014

6 commits

  • Statistics are not critical to the operation of the allocator, but they
    should also not cause too much overhead.

    When __this_cpu_inc is altered to check if preemption is disabled this
    triggers. Use raw_cpu_inc to avoid the checks. Using this_cpu_ops may
    cause interrupt disable/enable sequences on various arches which may
    significantly impact allocator performance.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Christoph Lameter
    Cc: Fengguang Wu
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The failure paths of sysfs_slab_add don't release the allocation of
    'name' made by create_unique_id() a few lines above the context of the
    diff below. Create a common exit path to make it more obvious what
    needs freeing.

    [vdavydov@parallels.com: free the name only if !unmergeable]
    Signed-off-by: Dave Jones
    Signed-off-by: Vladimir Davydov
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Currently, we try to arrange sysfs entries for memcg caches in the same
    manner as for global caches. Apart from turning /sys/kernel/slab into a
    mess when there are a lot of kmem-active memcgs created, it actually
    does not work properly - we won't create more than one link to a memcg
    cache in case its parent is merged with another cache. For instance, if
    A is a root cache merged with another root cache B, we will have the
    following sysfs setup:

    X
    A -> X
    B -> X

    where X is some unique id (see create_unique_id()). Now if memcgs M and
    N start to allocate from cache A (or B, which is the same), we will get:

    X
    X:M
    X:N
    A -> X
    B -> X
    A:M -> X:M
    A:N -> X:N

    Since B is an alias for A, we won't get entries B:M and B:N, which is
    confusing.

    It is more logical to have entries for memcg caches under the
    corresponding root cache's sysfs directory. This would allow us to keep
    sysfs layout clean, and avoid such inconsistencies like one described
    above.

    This patch does the trick. It creates a "cgroup" kset in each root
    cache kobject to keep its children caches there.
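
    A sketch of the mechanism described (the memcg_kset field is an assumed
    name; kset_create_and_add() is the stock kobject API):

        /* In sysfs_slab_add() for a root cache: create a "cgroup" kset under
         * the cache's kobject to hold its memcg children. */
        s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj);

        /* When a memcg cache is later added, parent it to that kset. */
        child->kobj.kset = root_cache->memcg_kset;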

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Otherwise, kzalloc() called from a memcg won't clear the whole object.

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • When a kmem cache is created (kmem_cache_create_memcg()), we first try to
    find a compatible cache that already exists and can handle requests from
    the new cache, i.e. has the same object size, alignment, ctor, etc. If
    there is such a cache, we do not create any new caches, instead we simply
    increment the refcount of the cache found and return it.

    Currently we do this procedure not only when creating root caches, but
    also for memcg caches. However, there is no point in that, because, as
    every memcg cache has exactly the same parameters as its parent and cache
    merging cannot be turned off in runtime (only on boot by passing
    "slub_nomerge"), the root caches of any two potentially mergeable memcg
    caches should be merged already, i.e. it must be the same root cache, and
    therefore we couldn't even get to the memcg cache creation, because it
    already exists.

    The only exception is boot caches - they are explicitly forbidden to be
    merged by setting their refcount to -1. There are currently only two of
    them - kmem_cache and kmem_cache_node, which are used in slab internals (I
    do not count kmalloc caches as their refcount is set to 1 immediately
    after creation). Since they are prevented from merging in the first place,
    I guess we should avoid merging their children too.

    So let's remove the useless code responsible for merging memcg caches.

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • slab_node() is actually a mempolicy function, so rename it to
    mempolicy_slab_node() to make it clearer that it is used for processes
    with mempolicies.

    At the same time, cleanup its code by saving numa_mem_id() in a local
    variable (since we require a node with memory, not just any node) and
    remove an obsolete comment that assumes the mempolicy is actually passed
    into the function.

    Signed-off-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

04 Apr, 2014

2 commits

  • We release the slab_mutex while calling sysfs_slab_add from
    __kmem_cache_create since commit 66c4c35c6bc5 ("slub: Do not hold
    slub_lock when calling sysfs_slab_add()"), because kobject_uevent called
    by sysfs_slab_add might block waiting for the usermode helper to exec,
    which would result in a deadlock if we took the slab_mutex while
    executing it.

    However, apart from complicating synchronization rules, releasing the
    slab_mutex on kmem cache creation can result in a kmemcg-related race.
    The point is that we check if the memcg cache exists before going to
    __kmem_cache_create, but register the new cache in memcg subsys after
    it. Since we can drop the mutex there, several threads can see that the
    memcg cache does not exist and proceed to creating it, which is wrong.

    Fortunately, recently kobject_uevent was patched to call the usermode
    helper with the UMH_NO_WAIT flag, making the deadlock impossible.
    Therefore there is no point in releasing the slab_mutex while calling
    sysfs_slab_add, so let's simplify kmem_cache_create synchronization and
    fix the kmemcg-race mentioned above by holding the slab_mutex during the
    whole cache creation path.

    Signed-off-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Greg KH
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Since put_mems_allowed() is strictly optional (it's a seqcount retry), we
    don't need to evaluate the function if the allocation was in fact
    successful, saving an smp_rmb, some loads and comparisons on some
    relatively fast paths.

    Since the naming of get/put_mems_allowed() does suggest a mandatory
    pairing, rename the interface, as suggested by Mel, to resemble the
    seqcount interface.

    This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
    where it is important to note that the return value of the latter call
    is inverted from its previous incarnation.
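
    The resulting usage pattern looks like this (a sketch with a made-up
    wrapper name; the retry helper returns true when mems_allowed changed):

        static struct page *alloc_with_stable_mems(gfp_t gfp, unsigned int order)
        {
                struct page *page;
                unsigned int cookie;

                do {
                        cookie = read_mems_allowed_begin();
                        page = alloc_pages(gfp, order);
                } while (!page && read_mems_allowed_retry(cookie));

                return page;
        }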

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Mar, 2014

1 commit

  • SLUB already tries to allocate high order pages with __GFP_NOFAIL cleared.
    But when allocating a shadow page for kmemcheck, it missed clearing the
    flag. This triggers the WARN_ON_ONCE() reported by Christian Casteyde.

    https://bugzilla.kernel.org/show_bug.cgi?id=65991
    https://lkml.org/lkml/2013/12/3/764

    This patch fixes the situation by using the same allocation flags as the
    original allocation.

    Reported-by: Christian Casteyde
    Acked-by: David Rientjes
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

11 Feb, 2014

2 commits

  • Vladimir reported the following issue:

    Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires
    remove_partial() to be called with n->list_lock held, but free_partial()
    called from kmem_cache_close() on cache destruction does not follow this
    rule, leading to a warning:

    WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
    Modules linked in:
    CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1
    Hardware name:
    0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
    0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
    ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
    Call Trace:
    __kmem_cache_shutdown+0x1b2/0x1f0
    kmem_cache_destroy+0x43/0xf0
    xfs_destroy_zones+0x103/0x110 [xfs]
    exit_xfs_fs+0x38/0x4e4 [xfs]
    SyS_delete_module+0x19a/0x1f0
    system_call_fastpath+0x16/0x1b

    His solution was to add a spinlock in order to quiet lockdep. Although
    there would be no contention in adding the lock, that lock also requires
    disabling interrupts, which will have a larger impact on the system.

    Instead of adding a spinlock to a location where it is not needed for
    lockdep, make a __remove_partial() function that does not test if the
    list_lock is held, as no one should have it due to it being freed.

    Also added a __add_partial() function that does not do the lock
    validation either, as it is not needed for the creation of the cache.
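
    A sketch following that description: a lockless helper for paths where
    the cache is already dead, plus the lockdep-checked wrapper (close to,
    but not guaranteed to match, the actual patch):

        static inline void __remove_partial(struct kmem_cache_node *n,
                                            struct page *page)
        {
                list_del(&page->lru);
                n->nr_partial--;
        }

        static inline void remove_partial(struct kmem_cache_node *n,
                                          struct page *page)
        {
                lockdep_assert_held(&n->list_lock);
                __remove_partial(n, page);
        }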

    Signed-off-by: Steven Rostedt
    Reported-by: Vladimir Davydov
    Suggested-by: David Rientjes
    Acked-by: David Rientjes
    Acked-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Commit c65c1877bd68 ("slub: use lockdep_assert_held") incorrectly
    required that add_full() and remove_full() hold n->list_lock. The lock
    is only taken when kmem_cache_debug(s), since that's the only time it
    actually does anything.

    Require that the lock only be taken under such a condition.

    Reported-by: Larry Finger
    Tested-by: Larry Finger
    Tested-by: Paul E. McKenney
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

03 Feb, 2014

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "Random bug fixes that have accumulated in my inbox over the past few
    months"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: Fix warning on make htmldocs caused by slab.c
    mm: slub: work around unneeded lockdep warning
    mm: sl[uo]b: fix misleading comments
    slub: Fix possible format string bug.
    slub: use lockdep_assert_held
    slub: Fix calculation of cpu slabs
    slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings

    Linus Torvalds
     

31 Jan, 2014

2 commits

  • The slub code does some setup during early boot in
    early_kmem_cache_node_alloc() with some local data. There is no
    possible way that another CPU can see this data, so the slub code
    doesn't unnecessarily lock it. However, some new lockdep asserts
    check to make sure that add_partial() _always_ has the list_lock
    held.

    Just add the locking, even though it is technically unnecessary.

    Cc: Peter Zijlstra
    Cc: Russell King
    Acked-by: David Rientjes
    Signed-off-by: Dave Hansen
    Signed-off-by: Pekka Enberg

    Dave Hansen
     
  • Commit abca7c496584 ("mm: fix slab->page _count corruption when using
    slub") notes that we can not _set_ a page->counters directly, except
    when using a real double-cmpxchg. Doing so can lose updates to
    ->_count.

    That is an absolute rule:

    You may not *set* page->counters except via a cmpxchg.

    Commit abca7c496584 fixed this for the folks who have the slub
    cmpxchg_double code turned off at compile time, but it left the bad case
    alone. It can still be reached, and the same bug triggered in two
    cases:

    1. Turning on slub debugging at runtime, which is available on
    the distro kernels that I looked at.
    2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
    cpus, evidently)

    There are at least 3 ways we could fix this:

    1. Take all of the existing calls to cmpxchg_double_slab() and
    __cmpxchg_double_slab() and convert them to take an old, new
    and target 'struct page'.
    2. Do (1), but with the newly-introduced 'slub_data'.
    3. Do some magic inside the two cmpxchg...slab() functions to
    pull the counters out of new_counters and only set those
    fields in page->{inuse,frozen,objects}.

    I've done (2) as well, but it's a bunch more code. This patch is an
    attempt at (3). This was the most straightforward and foolproof way
    that I could think to do this.

    This would also technically allow us to get rid of the ugly

    #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)

    in 'struct page', but leaving it alone has the added benefit that
    'counters' stays 'unsigned' instead of 'unsigned long', so all the
    copies that the slub code does stay a bit smaller.
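
    One way option (3) can look (a sketch that assumes the era's 'struct
    page' union layout, where inuse/objects/frozen overlay 'counters'):

        static inline void set_page_slub_counters(struct page *page,
                                                  unsigned long counters_new)
        {
                struct page tmp;

                tmp.counters = counters_new;
                /* Copy only the slub fields that overlay 'counters'; never
                 * assign page->counters directly, which could clobber _count. */
                page->inuse   = tmp.inuse;
                page->objects = tmp.objects;
                page->frozen  = tmp.frozen;
        }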

    Signed-off-by: Dave Hansen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Pravin B Shelar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen