26 Mar, 2016

1 commit

  • Add KASAN hooks to SLAB allocator.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

16 Mar, 2016

1 commit

  • SLUB already has a redzone debugging feature, but it is only placed at
    the end of the object (the right redzone), so it cannot catch left
    out-of-bounds (OOB) accesses. Although the current object's right
    redzone acts as the left redzone of the next object, the first object
    in a slab cannot take advantage of this effect. This patch explicitly
    adds a left redzone to each object to detect left OOB more precisely.

    Background:

    Someone complained to me that a left OOB is not caught even when KASAN
    is enabled, which does page allocation debugging. The page to the left
    is out of our control, so it may already be allocated when the left OOB
    happens, and in that case we cannot detect the OOB. Moreover, the SLUB
    debugging feature can be enabled without page allocator debugging, and
    in that case we would also miss the OOB.

    Before trying to implement this, I expected the changes to be too
    complex, but it does not look that complex to me now. Almost all of the
    changes apply to debug-specific functions, so I feel okay about it.
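
    A minimal sketch of the resulting layout and check (illustrative only,
    not the patch itself; the red zone width and poison value below are
    assumptions):

    /*
     * With an explicit left red zone, each object in a slab looks like:
     *
     *   | left red zone | payload (object_size) | right red zone | metadata |
     *
     * so a write just before the payload lands in poisoned bytes that can
     * be verified later.
     */
    #define LEFT_REDZONE_SIZE 8          /* hypothetical left red zone width */
    #define REDZONE_POISON    0xcc       /* hypothetical poison byte */

    static int left_redzone_intact(const unsigned char *object)
    {
            const unsigned char *rz = object - LEFT_REDZONE_SIZE;
            int i;

            for (i = 0; i < LEFT_REDZONE_SIZE; i++)
                    if (rz[i] != REDZONE_POISON)
                            return 0;    /* a left OOB write hit the red zone */
            return 1;
    }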

    Signed-off-by: Joonsoo Kim
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

14 Feb, 2015

2 commits

  • Remove static and add function declarations to linux/slub_def.h so they
    can be used by the kernel address sanitizer.

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • virt_to_obj() takes the kmem_cache address, the address of the slab
    page, and an address x pointing somewhere inside a slab object, and
    returns the address of the beginning of that object.
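
    A minimal sketch of the computation (names and signature are
    illustrative, not the exact kernel helper):

    /* round an interior pointer x down to the start of its object, given
     * the slab base address and the cache's per-object stride */
    static void *virt_to_obj_sketch(unsigned long slab_base,
                                    unsigned long stride, unsigned long x)
    {
            return (void *)(x - ((x - slab_base) % stride));
    }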

    Signed-off-by: Andrey Ryabinin
    Acked-by: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

13 Feb, 2015

1 commit

  • Currently, kmem_cache stores a pointer to struct memcg_cache_params
    instead of embedding it. The rationale is to save memory when kmem
    accounting is disabled. However, the memcg_cache_params has shrivelled
    drastically since it was first introduced:

    * Initially:

    struct memcg_cache_params {
            bool is_root_cache;
            union {
                    struct kmem_cache *memcg_caches[0];
                    struct {
                            struct mem_cgroup *memcg;
                            struct list_head list;
                            struct kmem_cache *root_cache;
                            bool dead;
                            atomic_t nr_pages;
                            struct work_struct destroy;
                    };
            };
    };

    * Now:

    struct memcg_cache_params {
            bool is_root_cache;
            union {
                    struct {
                            struct rcu_head rcu_head;
                            struct kmem_cache *memcg_caches[0];
                    };
                    struct {
                            struct mem_cgroup *memcg;
                            struct kmem_cache *root_cache;
                    };
            };
    };

    So the memory saving does not seem to be a clear win anymore.

    OTOH, keeping a pointer to the memcg_cache_params struct instead of
    embedding it means touching one more cache line on the kmem alloc/free
    hot paths. Besides, the extra level of indirection makes it really
    painful to link kmem caches in a list chained by a field of struct
    memcg_cache_params, which I want to do in the following patch. So let
    us embed it.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Cc: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

07 May, 2014

1 commit

  • debugobjects warning during netfilter exit:

    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984
    Workqueue: netns cleanup_net
    Call Trace:
    dump_stack+0x52/0x87
    warn_slowpath_common+0x8c/0xc0
    warn_slowpath_fmt+0x46/0x50
    debug_print_object+0x8d/0xb0
    __debug_check_no_obj_freed+0xa5/0x220
    debug_check_no_obj_freed+0x15/0x20
    kmem_cache_free+0x197/0x340
    kmem_cache_destroy+0x86/0xe0
    nf_conntrack_cleanup_net_list+0x131/0x170
    nf_conntrack_pernet_exit+0x5d/0x70
    ops_exit_list+0x5e/0x70
    cleanup_net+0xfb/0x1c0
    process_one_work+0x338/0x550
    worker_thread+0x215/0x350
    kthread+0xe7/0xf0
    ret_from_fork+0x7c/0xb0

    Also during dcookie cleanup:

    WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
    Call Trace:
    dump_stack (lib/dump_stack.c:52)
    warn_slowpath_common (kernel/panic.c:430)
    warn_slowpath_fmt (kernel/panic.c:445)
    debug_print_object (lib/debugobjects.c:262)
    __debug_check_no_obj_freed (lib/debugobjects.c:697)
    debug_check_no_obj_freed (lib/debugobjects.c:726)
    kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
    kmem_cache_destroy (mm/slab_common.c:363)
    dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
    event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
    __fput (fs/file_table.c:217)
    ____fput (fs/file_table.c:253)
    task_work_run (kernel/task_work.c:125 (discriminator 1))
    do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
    int_signal (arch/x86/kernel/entry_64.S:807)

    Sysfs has a release mechanism. Use that to release the kmem_cache
    structure if CONFIG_SYSFS is enabled.

    Only slub is changed - slab currently only supports /proc/slabinfo and
    not /sys/kernel/slab/*. We talked about adding that and someone was
    working on it.
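
    A hedged sketch of the approach (not the exact patch): the slab
    kobject's ktype gains a release callback, and the final free of the
    kmem_cache happens there rather than in kmem_cache_destroy(), so the
    structure stays alive until sysfs drops its last reference. The helper
    name below is hypothetical:

    static void kmem_cache_kobj_release(struct kobject *kobj)
    {
            struct kmem_cache *s = container_of(kobj, struct kmem_cache, kobj);

            free_the_cache(s);           /* hypothetical final-free helper */
    }

    static struct kobj_type slab_ktype_sketch = {
            .release = kmem_cache_kobj_release,
            /* .sysfs_ops and attributes as before */
    };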

    [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
    [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
    Signed-off-by: Christoph Lameter
    Reported-by: Sasha Levin
    Tested-by: Sasha Levin
    Acked-by: Greg KH
    Cc: Thomas Gleixner
    Cc: Pekka Enberg
    Cc: Russell King
    Cc: Bart Van Assche
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 Apr, 2014

1 commit

  • Currently, we try to arrange sysfs entries for memcg caches in the same
    manner as for global caches. Apart from turning /sys/kernel/slab into a
    mess when a lot of kmem-active memcgs are created, this actually does
    not work properly - we won't create more than one link to a memcg cache
    if its parent is merged with another cache. For instance, if A is a
    root cache merged with another root cache B, we will have the following
    sysfs setup:

    X
    A -> X
    B -> X

    where X is some unique id (see create_unique_id()). Now if memcgs M and
    N start to allocate from cache A (or B, which is the same), we will get:

    X
    X:M
    X:N
    A -> X
    B -> X
    A:M -> X:M
    A:N -> X:N

    Since B is an alias for A, we won't get entries B:M and B:N, which is
    confusing.

    It is more logical to have entries for memcg caches under the
    corresponding root cache's sysfs directory. This keeps the sysfs layout
    clean and avoids inconsistencies like the one described above.

    This patch does the trick. It creates a "cgroup" kset in each root
    cache kobject to keep its children caches there.
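
    A hedged sketch of the mechanism (the field name is an assumption):
    when a root cache's sysfs entry is created, add a child kset named
    "cgroup" under its kobject, and register memcg caches into that kset so
    they show up as /sys/kernel/slab/<root>/cgroup/<child>:

    s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj);
    if (!s->memcg_kset)
            return -ENOMEM;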

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

23 Nov, 2013

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
    slab internals similar to mm/slub.c. This reduces memory usage and
    improves performance:

    https://lkml.org/lkml/2013/10/16/155

    Rest of the changes are bug fixes from various people"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
    mm, slub: fix the typo in mm/slub.c
    mm, slub: fix the typo in include/linux/slub_def.h
    slub: Handle NULL parameter in kmem_cache_flags
    slab: replace non-existing 'struct freelist *' with 'void *'
    slab: fix to calm down kmemleak warning
    slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
    slab: rename slab_bufctl to slab_freelist
    slab: remove useless statement for checking pfmemalloc
    slab: use struct page for slab management
    slab: replace free and inuse in struct slab with newly introduced active
    slab: remove SLAB_LIMIT
    slab: remove kmem_bufctl_t
    slab: change the management method of free objects of the slab
    slab: use __GFP_COMP flag for allocating slab pages
    slab: use well-defined macro, virt_to_slab()
    slab: overloading the RCU head over the LRU for RCU free
    slab: remove cachep in struct slab_rcu
    slab: remove nodeid in struct slab
    slab: remove colouroff in struct slab
    slab: change return type of kmem_getpages() to struct page
    ...

    Linus Torvalds
     

05 Sep, 2013

2 commits

  • I do not see any user for this code in the tree.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The kmalloc* functions of all slab allocators are similar now, so let's
    move them into slab.h. This requires some function naming changes in
    slob.

    As a result of this patch there is a common set of functions for all
    allocators. It also means that kmalloc_large() is now available in
    general to perform large order allocations that go directly via the
    page allocator. kmalloc_large() can be substituted if kmalloc() throws
    warnings because of too large allocations.

    kmalloc_large() has exactly the same semantics as kmalloc() but can
    only be used for allocations > PAGE_SIZE.
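
    A hedged usage sketch of the substitution described above (the size is
    illustrative):

    /* an allocation larger than PAGE_SIZE, served directly by the page
     * allocator instead of a kmalloc slab cache */
    void *buf = kmalloc_large(8 * PAGE_SIZE, GFP_KERNEL);

    if (buf)
            kfree(buf);                  /* freed like any other kmalloc memory */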

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

01 Feb, 2013

5 commits

  • Put the definitions for the kmem_cache_node structures together so that
    we have one structure. That will allow us to create more common fields in
    the future which could yield more opportunities to share code.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Extract the optimized lookup functions from slub and put them into
    slab_common.c. Then make slab use these functions as well.

    Joonsoo notes that this fixes some issues with constant folding which
    also reduces the code size for slub.

    https://lkml.org/lkml/2012/10/20/82

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Have a common definition of the kmalloc cache arrays in SLAB and SLUB.

    Acked-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Standardize the constants that describe the smallest and largest
    object kept in the kmalloc arrays for SLAB and SLUB.

    Differentiate between the maximum size for which a slab cache is used
    (KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size
    (KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER).
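
    A minimal sketch of the distinction, using only the relationship stated
    above (the helper is hypothetical):

    /* sizes up to KMALLOC_MAX_CACHE_SIZE come from a kmalloc slab cache;
     * larger requests, up to KMALLOC_MAX_SIZE, go to the page allocator */
    static inline bool served_from_kmalloc_cache(size_t size)
    {
            return size <= KMALLOC_MAX_CACHE_SIZE;
    }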

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Extract the function to determine the index of the slab within the
    array of kmalloc caches, as well as a function to determine the maximum
    object size from the number of the kmalloc slab.

    This is used here only to simplify slub bootstrap, but it will later
    also be used for SLAB.
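
    A minimal sketch of the two mappings being factored out (power-of-two
    buckets only; the kernel version also special-cases the 96 and 192 byte
    caches):

    static unsigned int kmalloc_index_sketch(size_t size)
    {
            unsigned int i = 3;          /* smallest bucket: 8 bytes */

            while (((size_t)1 << i) < size)
                    i++;
            return i;
    }

    static size_t kmalloc_size_sketch(unsigned int index)
    {
            return (size_t)1 << index;   /* maximum object size for that slab */
    }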

    Acked-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

19 Dec, 2012

3 commits

  • SLUB allows us to tune a particular cache behavior with sysfs-based
    tunables. When creating a new memcg cache copy, we'd like to preserve any
    tunables the parent cache already had.

    This can be done by tapping into the store attribute function provided by
    the allocator. We of course don't need to mess with read-only fields.
    Since the attributes can have multiple types and are stored internally by
    sysfs, the best strategy is to issue a ->show() in the root cache, and
    then ->store() in the memcg cache.

    The drawback is that sysfs can allocate up to a page of buffering for
    show(), which we are likely not to need but also cannot rule out. To
    avoid always allocating a page for this, we can update the caches at
    store time with the maximum attribute size ever stored to the root
    cache. We will then get a buffer big enough to hold it. The corollary
    is that if no stores happened, nothing will be propagated.

    It can also happen that a root cache has its tunables updated during
    normal system operation. In this case, we will propagate the change to
    all caches that are already active.
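
    A hedged sketch of the propagation loop described above (attribute
    discovery, buffer sizing and locking are simplified; the names are not
    the exact kernel symbols):

    static void propagate_attrs_sketch(struct kmem_cache *root,
                                       struct kmem_cache *memcg_cache,
                                       struct slab_attribute **attrs,
                                       int nr_attrs, char *buf)
    {
            int i;

            for (i = 0; i < nr_attrs; i++) {
                    struct slab_attribute *attr = attrs[i];
                    ssize_t len;

                    if (!attr->store)            /* read-only field: skip */
                            continue;
                    len = attr->show(root, buf); /* ->show() on the root cache */
                    if (len > 0)                 /* ->store() into the memcg copy */
                            attr->store(memcg_cache, buf, len);
            }
    }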

    [akpm@linux-foundation.org: tweak code to avoid __maybe_unused]
    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • We are able to match a cache allocation to a particular memcg. If the
    task doesn't change groups during the allocation itself (a rare event),
    this gives us a good picture of which group is the first to touch a
    cache page.

    This patch uses the now available infrastructure by calling
    memcg_kmem_get_cache() before all the cache allocations.
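
    A hedged sketch of the hook point (heavily simplified; the helper at
    the end is hypothetical):

    static __always_inline void *slab_alloc_sketch(struct kmem_cache *cachep,
                                                   gfp_t flags)
    {
            /* possibly redirect to the per-memcg copy of this cache for
             * the current task before doing the actual allocation */
            cachep = memcg_kmem_get_cache(cachep, flags);

            return do_the_allocation(cachep, flags);
    }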

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • For the kmem slab controller, we need to record some extra information in
    the kmem_cache structure.

    Signed-off-by: Glauber Costa
    Signed-off-by: Suleiman Souhlal
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     

14 Jun, 2012

1 commit

  • Define a struct that describes common fields used in all slab allocators.
    A slab allocator either uses the common definition (like SLOB) or is
    required to provide members of kmem_cache with the definition given.

    After that it will be possible to share code that
    only operates on those fields of kmem_cache.

    The patch basically takes the slob definition of kmem_cache and uses
    its field names for the other allocators.

    It also standardizes the names used for basic object lengths in the
    allocators:

    object_size  The struct size specified at kmem_cache_create();
                 basically the payload expected to be used by the
                 subsystem.

    size         The size the memory allocator uses for each object. This
                 size is larger than object_size and includes padding,
                 alignment and extra per-object metadata (e.g. for
                 debugging and rcu).
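
    A hedged sketch of the naming convention (the struct name and field
    selection are illustrative; the real kmem_cache definitions carry many
    more allocator-specific members):

    struct kmem_cache_common_sketch {
            unsigned int object_size;    /* payload size given to kmem_cache_create() */
            unsigned int size;           /* object_size plus padding, alignment and
                                            per-object metadata (debugging, rcu, ...) */
            unsigned int align;
            const char *name;
    };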

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

01 Jun, 2012

1 commit

  • The node field is always page_to_nid(c->page), so it is rather easy to
    replace. Note that there may be slightly more overhead in various hot
    paths due to the need to shift the bits out of page->flags. However,
    that is mostly compensated for by the smaller footprint of the
    kmem_cache_cpu structure (this patch reduces it to 3 words per cache),
    which allows better caching.
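
    The replacement itself is a one-liner wherever the field was read (a
    sketch, with c standing for the kmem_cache_cpu pointer):

    /* derive the node from the page instead of caching it in c->node */
    int node = page_to_nid(c->page);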

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

29 Mar, 2012

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "There's the new kmalloc_array() API, minor fixes and performance
    improvements, but quite honestly, nothing terribly exciting."

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: SLAB Out-of-memory diagnostics
    slab: introduce kmalloc_array()
    slub: per cpu partial statistics change
    slub: include include for prefetch
    slub: Do not hold slub_lock when calling sysfs_slab_add()
    slub: prefetch next freelist pointer in slab_alloc()
    slab, cleanup: remove unneeded return

    Linus Torvalds
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define), then that
    header really should be including linux/bug.h and not just expecting it
    to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

18 Feb, 2012

1 commit

  • This patch splits cpu_partial_free into two parts: cpu_partial_node,
    counting PCP refills from the node partial list; and cpu_partial_free
    (same name as before), counting PCP refills in the slab_free slow path.
    A new statistic, cpu_partial_drain, is added to count PCP drains back
    to the node partial list. This information is useful when doing PCP
    tuning.

    The slabinfo.c code is unchanged, since cpu_partial_node is not on the
    slow path.

    Signed-off-by: Alex Shi
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Alex Shi
     

20 Aug, 2011

1 commit

  • Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
    partial pages. The partial page list is used in slab_free() to avoid
    per node lock taking.

    In __slab_alloc() we can then take multiple partial pages off the per
    node partial list in one go reducing node lock pressure.

    We can also use the per cpu partial list in slab_alloc() to avoid scanning
    partial lists for pages with free objects.

    The main effect of a per cpu partial list is that the per node list_lock
    is taken for batches of partial pages instead of individual ones.
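
    A hedged sketch of the batching effect (fields and helper names are
    simplified, not the exact kernel code):

    /* one list_lock acquisition refills the per-cpu partial list with
     * several pages, instead of taking the lock once per page */
    spin_lock(&n->list_lock);
    while (available < wanted) {
            struct page *page = first_node_partial(n);   /* hypothetical */

            if (!page)
                    break;
            remove_node_partial(n, page);                /* hypothetical */
            page->next = c->partial;                     /* push onto per-cpu list */
            c->partial = page;
            available += page->objects - page->inuse;
    }
    spin_unlock(&n->list_lock);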

    Potential future enhancements:

    1. The pickup from the partial list could perhaps be done without
    disabling interrupts with some work. The free path already puts the
    page into the per cpu partial list without disabling interrupts.

    2. __slab_free() may have some code paths that could use optimization.

    Performance:

                                      Time before   Time after
    ./hackbench 100 process 200000       1953.047     1564.614
    ./hackbench 100 process 20000         207.176      156.940
    ./hackbench 100 process 20000         204.468      156.940
    ./hackbench 100 process 20000         204.879      158.772
    ./hackbench 10 process 20000           20.153       15.853
    ./hackbench 10 process 20000           20.153       15.986
    ./hackbench 10 process 20000           19.363       16.111
    ./hackbench 1 process 20000             2.518        2.307
    ./hackbench 1 process 20000             2.258        2.339
    ./hackbench 1 process 20000             2.864        2.163

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

31 Jul, 2011

1 commit

  • * 'slub/lockless' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: (21 commits)
    slub: When allocating a new slab also prep the first object
    slub: disable interrupts in cmpxchg_double_slab when falling back to pagelock
    Avoid duplicate _count variables in page_struct
    Revert "SLUB: Fix build breakage in linux/mm_types.h"
    SLUB: Fix build breakage in linux/mm_types.h
    slub: slabinfo update for cmpxchg handling
    slub: Not necessary to check for empty slab on load_freelist
    slub: fast release on full slab
    slub: Add statistics for the case that the current slab does not match the node
    slub: Get rid of the another_slab label
    slub: Avoid disabling interrupts in free slowpath
    slub: Disable interrupts in free_debug processing
    slub: Invert locking and avoid slab lock
    slub: Rework allocator fastpaths
    slub: Pass kmem_cache struct to lock and freeze slab
    slub: explicit list_lock taking
    slub: Add cmpxchg_double_slab()
    mm: Rearrange struct page
    slub: Move page->frozen handling near where the page->freelist handling occurs
    slub: Do not use frozen page flag but a bit in the page counters
    ...

    Linus Torvalds
     

17 Jun, 2011

1 commit

  • Every slab allocator has its own alignment definition in
    include/linux/sl?b_def.h. Extract those and define a common set in
    include/linux/slab.h.

    SLOB: As noted, we sometimes need double-word alignment on 32 bit. This
    gives all structures allocated by SLOB an unsigned long long alignment
    like the other allocators.

    SLAB: If ARCH_SLAB_MINALIGN is not set, SLAB would set it to zero,
    meaning no alignment at all. Give it the default unsigned long long
    alignment instead.
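
    A hedged sketch of the standardized fallback described above:

    #ifndef ARCH_SLAB_MINALIGN
    #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
    #endif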

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

08 May, 2011

1 commit

  • Remove the #ifdefs. This means that irqsafe_cpu_cmpxchg_double() is
    used everywhere.

    There may be performance implications since:

    A. We now have to manage a transaction ID for all arches

    B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced
    to a very short irqoff section.

    There are no multiple irqoff/irqon sequences as a result of this change. Even in the fallback
    case we only have to do one disable and enable like before.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

12 Mar, 2011

1 commit

  • There is no "struct" for slub's slab; it shares struct page. But struct
    page is very small and is insufficient when we need to add some
    metadata for the slab.

    So we add a field "reserved" to struct kmem_cache: when a slab is
    allocated, kmem_cache->reserved bytes are automatically reserved at the
    end of the slab for the slab's metadata.
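
    The arithmetic this enables, as a minimal sketch (the helper name is
    illustrative):

    /* objects that fit once kmem_cache->reserved bytes are kept back at
     * the end of the slab for metadata */
    static inline unsigned long objects_per_slab(unsigned long slab_size,
                                                 unsigned long reserved,
                                                 unsigned long object_size)
    {
            return (slab_size - reserved) / object_size;
    }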

    Changed from v1:
    Export the reserved field via sysfs

    Acked-by: Christoph Lameter
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Pekka Enberg

    Lai Jiangshan
     

11 Mar, 2011

2 commits

  • Use the this_cpu_cmpxchg_double functionality to implement a lockless
    allocation algorithm on arches that support fast this_cpu_ops.

    Each of the per cpu pointers is paired with a transaction id that ensures
    that updates of the per cpu information can only occur in sequence on
    a certain cpu.

    A transaction id is a "long" integer composed of an event number and
    the cpu number. The event number is incremented for every change to the
    per cpu state. This means that the cmpxchg instruction can verify, for
    an update, that nothing interfered, that we are updating the per cpu
    structure for the processor where we picked up the information, and
    that we are still on that processor when we perform the update.

    This results in a significant decrease of the overhead in the fastpaths. It
    also makes it easy to adopt the fast path for realtime kernels since this
    is lockless and does not require the use of the current per cpu area
    over the critical section. It is only important that the per cpu area is
    current at the beginning of the critical section and at the end.

    So there is no need even to disable preemption.
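
    A hedged model of the transaction id described above (the width of the
    cpu field and the helper names are illustrative):

    #define TID_STEP (1UL << 8)          /* low bits hold the cpu number */

    static inline unsigned long init_tid(int cpu)
    {
            return cpu;                  /* event number starts at zero */
    }

    static inline unsigned long next_tid(unsigned long tid)
    {
            return tid + TID_STEP;       /* bump the event number */
    }

    static inline unsigned int tid_to_cpu(unsigned long tid)
    {
            return tid % TID_STEP;
    }

    /* the fastpath then succeeds only if both the per cpu freelist pointer
     * and the tid are unchanged (e.g. via this_cpu_cmpxchg_double()), and
     * publishes a new tid from next_tid() on success */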

    Test results show that the fastpath cycle count is reduced by up to ~ 40%
    (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
    adds a few cycles.

    Sadly this does nothing for the slowpath, which is where the main
    performance issues in slub are, but the best-case performance rises
    significantly. (For that, see the more complex slub patches that
    require cmpxchg_double.)

    Kmalloc: alloc/free test

    Before:

    10000 times kmalloc(8)/kfree -> 134 cycles
    10000 times kmalloc(16)/kfree -> 152 cycles
    10000 times kmalloc(32)/kfree -> 144 cycles
    10000 times kmalloc(64)/kfree -> 142 cycles
    10000 times kmalloc(128)/kfree -> 142 cycles
    10000 times kmalloc(256)/kfree -> 132 cycles
    10000 times kmalloc(512)/kfree -> 132 cycles
    10000 times kmalloc(1024)/kfree -> 135 cycles
    10000 times kmalloc(2048)/kfree -> 135 cycles
    10000 times kmalloc(4096)/kfree -> 135 cycles
    10000 times kmalloc(8192)/kfree -> 144 cycles
    10000 times kmalloc(16384)/kfree -> 754 cycles

    After:

    10000 times kmalloc(8)/kfree -> 78 cycles
    10000 times kmalloc(16)/kfree -> 78 cycles
    10000 times kmalloc(32)/kfree -> 82 cycles
    10000 times kmalloc(64)/kfree -> 88 cycles
    10000 times kmalloc(128)/kfree -> 79 cycles
    10000 times kmalloc(256)/kfree -> 79 cycles
    10000 times kmalloc(512)/kfree -> 85 cycles
    10000 times kmalloc(1024)/kfree -> 82 cycles
    10000 times kmalloc(2048)/kfree -> 82 cycles
    10000 times kmalloc(4096)/kfree -> 85 cycles
    10000 times kmalloc(8192)/kfree -> 82 cycles
    10000 times kmalloc(16384)/kfree -> 706 cycles

    Kmalloc: Repeatedly allocate then free test

    Before:

    10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
    10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
    10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
    10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
    10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
    10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
    10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
    10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
    10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
    10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
    10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
    10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles

    After:

    10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
    10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
    10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
    10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
    10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
    10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
    10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
    10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
    10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
    10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
    10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
    10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • It is used in unfreeze_slab(), which is a performance-critical
    function.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter