08 Apr, 2014

1 commit

  • When a kmem cache is created (kmem_cache_create_memcg()), we first try to
    find a compatible cache that already exists and can handle requests from
    the new cache, i.e. has the same object size, alignment, ctor, etc. If
    such a cache exists, we do not create a new one; instead, we simply
    increment the refcount of the cache found and return it.

    Currently we follow this procedure not only when creating root caches,
    but also for memcg caches. However, there is no point in doing so for
    memcg caches: every memcg cache has exactly the same parameters as its
    parent, and cache merging cannot be turned off at runtime (only at boot,
    by passing "slub_nomerge"), so the root caches of any two potentially
    mergeable memcg caches must already have been merged, i.e. they must be
    the same root cache. Therefore we could never even reach memcg cache
    creation, because the cache would already exist.

    The only exception is boot caches - they are explicitly forbidden from
    being merged by setting their refcount to -1. There are currently only
    two of them - kmem_cache and kmem_cache_node, which are used in slab
    internals (I do not count kmalloc caches, as their refcount is set to 1
    immediately after creation). Since they are prevented from merging from
    the start, we should avoid merging their children too.

    So let's remove the useless code responsible for merging memcg caches.
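    The merge lookup described above can be sketched as follows. This is a
    simplified, hypothetical model (find_mergeable, the cache struct, and
    its fields are illustrative, not the kernel's actual code): a creation
    request first scans the existing caches for one with identical
    parameters, skipping caches whose refcount is -1, and reuses a match by
    taking a reference instead of allocating a new cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the merge step described above. */
struct cache {
    const char *name;
    size_t object_size;
    size_t align;
    void (*ctor)(void *);
    int refcount;           /* -1 means "never merge" (boot caches) */
    struct cache *next;     /* global cache list */
};

static struct cache *find_mergeable(struct cache *list, size_t size,
                                    size_t align, void (*ctor)(void *))
{
    for (struct cache *c = list; c; c = c->next) {
        if (c->refcount < 0)
            continue;       /* explicitly forbidden to merge */
        if (c->object_size == size && c->align == align && c->ctor == ctor)
            return c;
    }
    return NULL;
}

static struct cache *cache_create(struct cache **list, const char *name,
                                  size_t size, size_t align,
                                  void (*ctor)(void *))
{
    struct cache *c = find_mergeable(*list, size, align, ctor);
    if (c) {
        c->refcount++;      /* merge: reuse the existing cache */
        return c;
    }
    c = calloc(1, sizeof(*c));  /* no match: create a genuinely new cache */
    c->name = name;
    c->object_size = size;
    c->align = align;
    c->ctor = ctor;
    c->refcount = 1;
    c->next = *list;
    *list = c;
    return c;
}
```

    With this model it is easy to see the commit's point: a memcg child
    copies its parameters verbatim from a root that already went through
    this lookup, so repeating it for the child can never find anything new.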

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

24 Jan, 2014

2 commits

  • We relocate root cache's memcg_params whenever we need to grow the
    memcg_caches array to accommodate all kmem-active memory cgroups.
    Currently on relocation we free the old version immediately, which can
    lead to use-after-free, because the memcg_caches array is accessed
    lock-free (see cache_from_memcg_idx()). This patch fixes this by making
    memcg_params RCU-protected for root caches.
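    The fix follows the classic RCU publication pattern: grow by copying
    into a fresh array, publish the new pointer, and defer freeing the old
    copy until readers are done. A minimal userspace sketch, with
    rcu_assign_pointer/rcu_dereference modeled by release/acquire atomics
    and the grace period only simulated (all names are illustrative):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model: readers dereference the params pointer lock-free,
 * so the writer publishes a fresh copy and must NOT free the old one
 * immediately (in the kernel: kfree_rcu()/call_rcu()). */
struct params {
    size_t nr;
    void *caches[];         /* flexible array, like memcg_caches */
};

static struct params *current_params;   /* read lock-free */

static struct params *grow(struct params *old, size_t new_nr)
{
    struct params *new = calloc(1, sizeof(*new) + new_nr * sizeof(void *));
    new->nr = new_nr;
    if (old)
        memcpy(new->caches, old->caches, old->nr * sizeof(void *));
    /* publish: release ordering makes the copy visible before the pointer */
    __atomic_store_n(&current_params, new, __ATOMIC_RELEASE);
    /* freeing old here would be the use-after-free the patch fixes; we
     * hand it back so the caller frees it after a simulated grace period */
    return old;
}

static void *read_cache(size_t idx)
{
    struct params *p = __atomic_load_n(&current_params, __ATOMIC_ACQUIRE);
    return (p && idx < p->nr) ? p->caches[idx] : NULL;
}
```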

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Each root kmem_cache has pointers to per-memcg caches stored in its
    memcg_params::memcg_caches array. Whenever we want to allocate a slab
    for a memcg, we access this array to get per-memcg cache to allocate
    from (see memcg_kmem_get_cache()). The access must be lock-free for
    performance reasons, so we should use barriers to ensure the kmem_cache
    we see is up-to-date.

    First, we should place a write barrier immediately before setting the
    pointer to it in the memcg_caches array in order to make sure nobody
    will see a partially initialized object. Second, we should issue a read
    barrier before dereferencing the pointer, to pair with the write
    barrier.

    However, currently the barrier usage looks rather strange. We have a
    write barrier *after* setting the pointer and a read barrier *before*
    reading the pointer, which is incorrect. This patch fixes this.
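    The corrected ordering is the standard publication pattern. A hedged
    sketch using C11 atomics in place of the kernel's smp_wmb()/smp_rmb()
    (struct and function names here are made up for illustration):

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative model of the publication pattern described above. */
struct kcache { int object_size; int initialized; };

static _Atomic(struct kcache *) memcg_caches[4];

static void publish_cache(int idx, struct kcache *c)
{
    c->initialized = 1;     /* finish initializing the object first ... */
    /* ... then publish: the release store acts as the write barrier
     * placed BEFORE setting the pointer, so no reader can observe a
     * partially initialized cache through the array. */
    atomic_store_explicit(&memcg_caches[idx], c, memory_order_release);
}

static struct kcache *get_cache(int idx)
{
    /* acquire pairs with the release above, standing in for the read
     * barrier issued before dereferencing the pointer. */
    return atomic_load_explicit(&memcg_caches[idx], memory_order_acquire);
}
```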

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

13 Nov, 2013

1 commit


29 Aug, 2013

1 commit

  • If the system had a few memory cgroups and all of them were destroyed,
    memcg_limited_groups_array_size has a non-zero value, but all new caches
    are created without memcg_params, because memcg_kmem_enabled() returns
    false.

    We try to enumerate child caches in a few places, and all of them are
    potentially dangerous.

    For example, my kernel is compiled with CONFIG_SLAB and it crashed when
    I tried to mount an NFS share after a few experiments with kmemcg.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] do_tune_cpucache+0x8a/0xd0
    PGD b942a067 PUD b999f067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: fscache(+) ip6table_filter ip6_tables iptable_filter ip_tables i2c_piix4 pcspkr virtio_net virtio_balloon i2c_core floppy
    CPU: 0 PID: 357 Comm: modprobe Not tainted 3.11.0-rc7+ #59
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800b9f98240 ti: ffff8800ba32e000 task.ti: ffff8800ba32e000
    RIP: 0010:[] [] do_tune_cpucache+0x8a/0xd0
    RSP: 0018:ffff8800ba32fb70 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: ffff8800b9f98910 RDI: 0000000000000246
    RBP: ffff8800ba32fba0 R08: 0000000000000002 R09: 0000000000000004
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010
    R13: 0000000000000008 R14: 00000000000000d0 R15: ffff8800375d0200
    FS: 00007f55f1378740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f24feba57a0 CR3: 0000000037b51000 CR4: 00000000000006f0
    Call Trace:
    enable_cpucache+0x49/0x100
    setup_cpu_cache+0x215/0x280
    __kmem_cache_create+0x2fa/0x450
    kmem_cache_create_memcg+0x214/0x350
    kmem_cache_create+0x2b/0x30
    fscache_init+0x19b/0x230 [fscache]
    do_one_initcall+0xfa/0x1b0
    load_module+0x1c41/0x26d0
    SyS_finit_module+0x86/0xb0
    system_call_fastpath+0x16/0x1b
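    The pattern behind the fix can be illustrated with a toy model: any
    walk over per-memcg children must first check that the root cache's
    memcg_params was ever allocated, even when the array size is non-zero.
    The structs below are hypothetical simplifications, not the kernel's
    definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a root cache created while memcg_kmem was disabled has a
 * NULL memcg_params even though the global array size is non-zero. */
struct memcg_params { void *memcg_caches[8]; };
struct kcache2 { struct memcg_params *memcg_params; };

static int count_children(struct kcache2 *root, int array_size)
{
    int n = 0;
    if (!root->memcg_params)    /* the missing check behind the oops */
        return 0;
    for (int i = 0; i < array_size; i++)
        if (root->memcg_params->memcg_caches[i])
            n++;
    return n;
}
```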

    Signed-off-by: Andrey Vagin
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Glauber Costa
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

08 Jul, 2013

1 commit


07 Jul, 2013

1 commit


01 Feb, 2013

4 commits


19 Dec, 2012

6 commits

  • SLAB allows us to tune a particular cache behavior with tunables. When
    creating a new memcg cache copy, we'd like to preserve any tunables the
    parent cache already had.

    This could be done by an explicit call to do_tune_cpucache() after the
    cache is created. But this is not very convenient now that the caches are
    created from common code, since this function is SLAB-specific.

    Another method of doing that is taking advantage of the fact that
    do_tune_cpucache() is always called from enable_cpucache(), which is
    called at cache initialization. We can just preset the values, and then
    things work as expected.

    It can also happen that a root cache has its tunables updated during
    normal system operation. In this case, we will propagate the change to
    all caches that are already active.

    This change requires us to move the assignment of root_cache in
    memcg_params a bit earlier. We need it to be already set - which
    memcg_kmem_register_cache will do - when we reach __kmem_cache_create().
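    The propagation scheme can be sketched roughly like this (struct names
    and tunable fields are illustrative; in the kernel this goes through
    do_tune_cpucache() and enable_cpucache()): a memcg copy inherits the
    root's tunables at creation, and later updates to the root are pushed
    to every active child.

```c
#include <assert.h>

/* Hypothetical model of tunable presetting and propagation. */
struct tunables { int limit; int batchcount; int shared; };
struct cache3 {
    struct tunables tune;
    struct cache3 *children[4];
    int nr_children;
};

static void cache_init_child(struct cache3 *root, struct cache3 *child)
{
    child->tune = root->tune;   /* preset before cache initialization */
    root->children[root->nr_children++] = child;
}

static void tune_root(struct cache3 *root, struct tunables t)
{
    root->tune = t;
    for (int i = 0; i < root->nr_children; i++)
        root->children[i]->tune = t;   /* propagate to active children */
}
```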

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • When we create caches in memcgs, we need to display their usage
    information somewhere. We'll adopt a scheme similar to /proc/meminfo,
    with aggregate totals shown in the global file, and per-group information
    stored in the group itself.

    For the time being, only reads are allowed in the per-group cache.

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • Implement destruction of memcg caches. Right now, a cache is deleted
    only when our reference counter is the last one remaining. If there are
    any other references around, we just leave the cache lying around until
    they go away.

    When that happens, a destruction function is called from the cache code.
    Caches are only destroyed in process context, so we queue them up for
    later processing in the general case.
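    A toy model of the deferred destruction: dropping the last reference
    queues the cache, and a worker running in process context performs the
    actual teardown. Names are illustrative; the kernel uses a workqueue
    rather than this hand-rolled list:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: destruction happens only when the last reference
 * drops, and only in process context, so it is queued for a worker. */
struct dcache {
    int refcount;
    int destroyed;
    struct dcache *next_queued;
};

static struct dcache *destroy_queue;

static void cache_put(struct dcache *c)
{
    if (--c->refcount > 0)
        return;                     /* other references keep it alive */
    c->next_queued = destroy_queue; /* defer to process context */
    destroy_queue = c;
}

static void run_destroy_worker(void)
{
    while (destroy_queue) {
        struct dcache *c = destroy_queue;
        destroy_queue = c->next_queued;
        c->destroyed = 1;           /* actual teardown would go here */
    }
}
```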

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • struct page already has this information. If we start chaining caches,
    this information will always be more trustworthy than whatever is passed
    into the function.

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • Allow a memcg parameter to be passed during cache creation. When the slub
    allocator is being used, it will only merge caches that belong to the
    same memcg. We do this by scanning the global list and then translating
    the cache to a memcg-specific cache.

    A default function is created as a wrapper, passing NULL to the memcg
    version. We only merge caches that belong to the same memcg.

    A helper, memcg_css_id, is provided because slub needs a unique cache
    name for sysfs. Since this is visible, but not the canonical location
    for slab data, the cache name is not used; the css_id should suffice.
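    The naming idea can be illustrated with a hypothetical helper (the
    exact format string is an assumption, not the kernel's): appending the
    cgroup's css id to the root cache's name yields a name that is unique
    per memcg, which is all sysfs needs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: a per-memcg copy derives a sysfs-unique name
 * from the root cache name plus the cgroup's css id. */
static void memcg_cache_name(char *buf, size_t len,
                             const char *root, int css_id)
{
    snprintf(buf, len, "%s(%d)", root, css_id);
}
```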

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • For the kmem slab controller, we need to record some extra information in
    the kmem_cache structure.

    Signed-off-by: Glauber Costa
    Signed-off-by: Suleiman Souhlal
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     

11 Dec, 2012

2 commits


31 Oct, 2012

1 commit

  • Some flags are used internally by the allocators for management
    purposes. One example is the CFLGS_OFF_SLAB flag, which slab uses to
    mark that the metadata for that cache is stored outside of the slab.

    No cache should ever pass those as creation flags. We can just ignore
    this bit if it happens to be passed (such as when duplicating a cache in
    the kmem memcg patches).

    Because such flags can vary from allocator to allocator, we allow each
    allocator to make its own decision, defining SLAB_AVAILABLE_FLAGS with
    all flags that are valid at creation time. Allocators that don't have
    any specific flag requirements should define it to mean all flags.

    Common code will mask out all flags not belonging to that set.
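    The masking step is plain bit arithmetic. A sketch with made-up flag
    values (only the pattern mirrors the patch: the allocator defines its
    valid-at-creation set, and common code masks everything else out):

```c
#include <assert.h>

/* Flag values below are illustrative, not the kernel's. */
#define SLAB_DEBUG      0x0001u  /* legitimate creation flag */
#define SLAB_RECLAIM    0x0002u  /* legitimate creation flag */
#define CFLGS_OFF_SLAB  0x8000u  /* internal: never user-settable */

/* Each allocator defines the set valid at creation time. */
#define SLAB_AVAILABLE_FLAGS (SLAB_DEBUG | SLAB_RECLAIM)

static unsigned int sanitize_flags(unsigned int flags)
{
    /* common code silently drops anything outside the allowed set */
    return flags & SLAB_AVAILABLE_FLAGS;
}
```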

    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Glauber Costa
    Signed-off-by: Pekka Enberg

    Glauber Costa
     

24 Oct, 2012

3 commits

  • With all the infrastructure in place, we can now have slabinfo_show
    done from slab_common.c. A cache-specific function is called to grab
    information about the cache itself, since that is still heavily
    dependent on the implementation. But with the values produced by it, all
    the printing and handling is done from common code.

    Signed-off-by: Glauber Costa
    CC: Christoph Lameter
    CC: David Rientjes
    Signed-off-by: Pekka Enberg

    Glauber Costa
     
  • The header format is highly similar between slab and slub. The main
    difference lies in the fact that slab may optionally have statistics
    added here in the case of CONFIG_DEBUG_SLAB, while slub keeps them
    somewhere else.

    By making sure that information conditionally lives inside a
    globally-visible CONFIG_DEBUG_SLAB switch, we can move the header
    printing to a common location.

    Signed-off-by: Glauber Costa
    Acked-by: Christoph Lameter
    CC: David Rientjes
    Signed-off-by: Pekka Enberg

    Glauber Costa
     
  • This patch moves all the common machinery for slabinfo processing
    to slab_common.c. We can do better by noticing that the output is
    heavily common and having the allocators just provide finished
    information about it. But after this first step, that can be done
    more easily.

    Signed-off-by: Glauber Costa
    Acked-by: Christoph Lameter
    CC: David Rientjes
    Signed-off-by: Pekka Enberg

    Glauber Costa
     

05 Sep, 2012

8 commits

  • This reverts commit 96d17b7be0a9849d381442030886211dbb2a7061 which
    caused the following errors at boot:

    [ 1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
    [ 1.114885] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
    [ 1.114885] Call Trace:
    [ 1.114885] [] kobject_init+0x87/0xa0
    [ 1.115555] [] kobject_init_and_add+0x2a/0x90
    [ 1.115555] [] ? sprintf+0x40/0x50
    [ 1.115555] [] sysfs_slab_add+0x80/0x210
    [ 1.115555] [] kmem_cache_create+0xa5/0x250
    [ 1.115555] [] ? md_init+0x144/0x144
    [ 1.115555] [] local_init+0xa4/0x11b
    [ 1.115555] [] dm_init+0x14/0x45
    [ 1.115836] [] do_one_initcall+0x3a/0x160
    [ 1.116834] [] kernel_init+0x133/0x1b7
    [ 1.117835] [] ? do_early_param+0x86/0x86
    [ 1.117835] [] kernel_thread_helper+0x4/0x10
    [ 1.118401] [] ? start_kernel+0x33f/0x33f
    [ 1.119832] [] ? gs_change+0xb/0xb
    [ 1.120325] ------------[ cut here ]------------
    [ 1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
    [ 1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
    [ 1.121831] Modules linked in:
    [ 1.122138] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
    [ 1.122831] Call Trace:
    [ 1.123074] [] ? sysfs_add_one+0xc1/0xf0
    [ 1.123833] [] warn_slowpath_common+0x7a/0xb0
    [ 1.124405] [] warn_slowpath_fmt+0x41/0x50
    [ 1.124832] [] sysfs_add_one+0xc1/0xf0
    [ 1.125337] [] create_dir+0x73/0xd0
    [ 1.125832] [] sysfs_create_dir+0x81/0xe0
    [ 1.126363] [] kobject_add_internal+0x9d/0x210
    [ 1.126832] [] kobject_init_and_add+0x63/0x90
    [ 1.127406] [] sysfs_slab_add+0x80/0x210
    [ 1.127832] [] kmem_cache_create+0xa5/0x250
    [ 1.128384] [] ? md_init+0x144/0x144
    [ 1.128833] [] local_init+0xa4/0x11b
    [ 1.129831] [] dm_init+0x14/0x45
    [ 1.130305] [] do_one_initcall+0x3a/0x160
    [ 1.130831] [] kernel_init+0x133/0x1b7
    [ 1.131351] [] ? do_early_param+0x86/0x86
    [ 1.131830] [] kernel_thread_helper+0x4/0x10
    [ 1.132392] [] ? start_kernel+0x33f/0x33f
    [ 1.132830] [] ? gs_change+0xb/0xb
    [ 1.133315] ---[ end trace 2703540871c8fab7 ]---
    [ 1.133830] ------------[ cut here ]------------
    [ 1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
    [ 1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
    [ 1.135829] Modules linked in:
    [ 1.136135] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
    [ 1.136828] Call Trace:
    [ 1.137071] [] ? kobject_add_internal+0x1f5/0x210
    [ 1.137830] [] warn_slowpath_common+0x7a/0xb0
    [ 1.138402] [] warn_slowpath_fmt+0x41/0x50
    [ 1.138830] [] ? release_sysfs_dirent+0x73/0xf0
    [ 1.139419] [] kobject_add_internal+0x1f5/0x210
    [ 1.139830] [] kobject_init_and_add+0x63/0x90
    [ 1.140429] [] sysfs_slab_add+0x80/0x210
    [ 1.140830] [] kmem_cache_create+0xa5/0x250
    [ 1.141829] [] ? md_init+0x144/0x144
    [ 1.142307] [] local_init+0xa4/0x11b
    [ 1.142829] [] dm_init+0x14/0x45
    [ 1.143307] [] do_one_initcall+0x3a/0x160
    [ 1.143829] [] kernel_init+0x133/0x1b7
    [ 1.144352] [] ? do_early_param+0x86/0x86
    [ 1.144829] [] kernel_thread_helper+0x4/0x10
    [ 1.145405] [] ? start_kernel+0x33f/0x33f
    [ 1.145828] [] ? gs_change+0xb/0xb
    [ 1.146313] ---[ end trace 2703540871c8fab8 ]---

    Conflicts:

    mm/slub.c

    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • Do the initial settings of the fields in common code. This will allow us
    to push more processing into common code later and improve readability.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Shift the allocations to common code. That way the allocation and
    freeing of the kmem_cache structures is handled by common code.

    Reviewed-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Simplify locking by moving the slab_add_sysfs after all locks have been
    dropped. Eases the upcoming move to provide sysfs support for all
    allocators.

    Reviewed-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The slab aliasing logic causes some strange contortions in slub. So add
    a call to deal with aliases to slab_common.c, but disable it for other
    slab allocators by providing stubs that fail to create aliases.

    Full general support for aliases will require additional cleanup passes
    and more standardization of fields in kmem_cache.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • What is done there can be done in __kmem_cache_shutdown.

    This affects RCU handling somewhat. On RCU free, none of the slab
    allocators refer to management structures other than the kmem_cache
    structure itself. Therefore these other structures can be freed before
    the RCU-deferred free to the page allocator occurs.

    Reviewed-by: Joonsoo Kim
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
    structure.

    Reviewed-by: Glauber Costa
    Reviewed-by: Joonsoo Kim
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • kmem_cache_destroy does basically the same in all allocators.

    Extract common code which is easy since we already have common mutex
    handling.

    Reviewed-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

09 Jul, 2012

2 commits

  • Use the mutex definition from SLAB and make it the common way to take
    a sleeping lock.

    This has the effect of using a mutex instead of a rw semaphore for SLUB.

    SLOB gains the use of a mutex for kmem_cache_create serialization.
    Not needed now, but SLOB may acquire more features later (like
    slabinfo/sysfs support) through the expansion of the common code,
    which will need this.

    Reviewed-by: Glauber Costa
    Reviewed-by: Joonsoo Kim
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • All allocators have some sort of support for the bootstrap status.

    Setup a common definition for the boot states and make all slab
    allocators use that definition.
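    A sketch of the kind of shared definition introduced here: one enum
    plus a global state variable that every allocator consults, instead of
    per-allocator bootstrap flags. State names and the helper are
    illustrative:

```c
#include <assert.h>

/* Illustrative common bootstrap states shared by all allocators. */
enum slab_state {
    DOWN,      /* no slab functionality yet */
    PARTIAL,   /* basic caches usable during bootstrap */
    UP,        /* slab is fully operational */
    FULL,      /* everything, including sysfs, is up */
};

static enum slab_state slab_state = DOWN;

static int slab_is_available(void)
{
    return slab_state >= UP;   /* common check instead of private flags */
}
```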

    Reviewed-by: Glauber Costa
    Reviewed-by: Joonsoo Kim
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter