23 Nov, 2013

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
    slab internals similar to mm/slub.c. This reduces memory usage and
    improves performance:

    https://lkml.org/lkml/2013/10/16/155

    Rest of the changes are bug fixes from various people"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
    mm, slub: fix the typo in mm/slub.c
    mm, slub: fix the typo in include/linux/slub_def.h
    slub: Handle NULL parameter in kmem_cache_flags
    slab: replace non-existing 'struct freelist *' with 'void *'
    slab: fix to calm down kmemleak warning
    slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
    slab: rename slab_bufctl to slab_freelist
    slab: remove useless statement for checking pfmemalloc
    slab: use struct page for slab management
    slab: replace free and inuse in struct slab with newly introduced active
    slab: remove SLAB_LIMIT
    slab: remove kmem_bufctl_t
    slab: change the management method of free objects of the slab
    slab: use __GFP_COMP flag for allocating slab pages
    slab: use well-defined macro, virt_to_slab()
    slab: overloading the RCU head over the LRU for RCU free
    slab: remove cachep in struct slab_rcu
    slab: remove nodeid in struct slab
    slab: remove colouroff in struct slab
    slab: change return type of kmem_getpages() to struct page
    ...

    Linus Torvalds
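
    For illustration, here is a minimal userspace sketch of the central idea of the
    series pulled above: keeping the slab's freelist head and in-use count in the
    page descriptor itself instead of in a separately allocated management
    structure. All names (fake_page, init_slab_page, slab_alloc_obj) are invented
    for the example; this is not the kernel code.

        #include <stdio.h>
        #include <stdlib.h>

        /* Invented, simplified page descriptor: the slab-management fields live
         * directly in the descriptor rather than in a separate "struct slab". */
        struct fake_page {
                void *freelist;      /* first free object in this slab page */
                unsigned int active; /* objects currently handed out */
                void *mem;           /* backing memory for the objects */
        };

        /* Carve the backing memory into fixed-size objects and thread a freelist
         * through them, storing the list head in the page descriptor itself. */
        static void init_slab_page(struct fake_page *pg, size_t objsize, size_t nobj)
        {
                char *base = pg->mem;
                for (size_t i = 0; i + 1 < nobj; i++)
                        *(void **)(base + i * objsize) = base + (i + 1) * objsize;
                *(void **)(base + (nobj - 1) * objsize) = NULL;
                pg->freelist = base;
                pg->active = 0;
        }

        static void *slab_alloc_obj(struct fake_page *pg)
        {
                void *obj = pg->freelist;
                if (obj) {
                        pg->freelist = *(void **)obj; /* advance to next free object */
                        pg->active++;
                }
                return obj;
        }

        int main(void)
        {
                struct fake_page pg = { .mem = malloc(4096) };

                if (!pg.mem)
                        return 1;
                init_slab_page(&pg, 64, 4096 / 64);
                void *obj = slab_alloc_obj(&pg);
                printf("obj=%p, active=%u\n", obj, pg.active);
                free(pg.mem);
                return 0;
        }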
     

12 Nov, 2013

1 commit


05 Sep, 2013

2 commits

  • I do not see any user for this code in the tree.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The kmalloc* functions of all slab allocators are similar now, so
    let's move them into slab.h. This requires some function naming changes
    in slob.

    As a result of this patch there is a common set of functions for
    all allocators. It also means that kmalloc_large() is now available
    in general to perform large order allocations that go directly
    via the page allocator. kmalloc_large() can be substituted if
    kmalloc() throws warnings because of too large allocations.

    kmalloc_large() has exactly the same semantics as kmalloc() but
    can only be used for allocations > PAGE_SIZE.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
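
    As a rough illustration of that split, a userspace sketch of such a front end
    follows; my_kmalloc, alloc_from_slab_cache and alloc_whole_pages are invented
    stand-ins (malloc plays both back ends here), not the kernel implementation.

        #include <stdio.h>
        #include <stdlib.h>

        #define FAKE_PAGE_SIZE 4096UL

        /* Stand-ins for the two back ends; in the kernel these would be the
         * slab caches and the page allocator respectively. */
        static void *alloc_from_slab_cache(size_t size) { return malloc(size); }

        static void *alloc_whole_pages(size_t size)
        {
                /* round up to whole "pages" */
                size_t pages = (size + FAKE_PAGE_SIZE - 1) / FAKE_PAGE_SIZE;
                return malloc(pages * FAKE_PAGE_SIZE);
        }

        /* Hypothetical front end mirroring the split described above: requests
         * larger than a page bypass the slab caches entirely. */
        static void *my_kmalloc(size_t size)
        {
                if (size > FAKE_PAGE_SIZE)
                        return alloc_whole_pages(size);   /* "kmalloc_large" path */
                return alloc_from_slab_cache(size);       /* regular slab path */
        }

        int main(void)
        {
                void *small = my_kmalloc(128);
                void *big   = my_kmalloc(3 * FAKE_PAGE_SIZE);

                printf("small=%p big=%p\n", small, big);
                free(small);
                free(big);
                return 0;
        }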
     

01 Feb, 2013

5 commits

  • Put the definitions for the kmem_cache_node structures together so that
    we have one structure. That will allow us to create more common fields in
    the future which could yield more opportunities to share code.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Extract the optimized lookup functions from slub and put them into
    slab_common.c. Then make slab use these functions as well.

    Joonsoo notes that this fixes some issues with constant folding which
    also reduces the code size for slub.

    https://lkml.org/lkml/2012/10/20/82

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Have a common definition of the kmalloc cache arrays in
    SLAB and SLUB.

    Acked-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Standardize the constants that describe the smallest and largest
    object kept in the kmalloc arrays for SLAB and SLUB.

    Differentiate between the maximum size for which a slab cache is used
    (KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size
    (KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Extract the function to determine the index of the slab within
    the array of kmalloc caches as well as a function to determine
    maximum object size from the nr of the kmalloc slab.

    This is used here only to simplify slub bootstrap but will
    be used later also for SLAB.

    Acked-by: Glauber Costa
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
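
    A much simplified userspace sketch of the pair of helpers described in the last
    entry (size to cache index, index to largest object) is shown below.
    size_to_index and index_to_max_size are invented names, and the real kernel
    version additionally special-cases the 96- and 192-byte caches, which is
    omitted here.

        #include <stdio.h>

        #define MIN_SHIFT 3   /* assume the smallest cache holds 8-byte objects */

        /* Map a request size to an index into an array of power-of-two caches. */
        static unsigned int size_to_index(size_t size)
        {
                unsigned int shift = MIN_SHIFT;

                while (((size_t)1 << shift) < size)
                        shift++;
                return shift;
        }

        /* Map an index back to the largest object that cache can hold. */
        static size_t index_to_max_size(unsigned int index)
        {
                return (size_t)1 << index;
        }

        int main(void)
        {
                size_t sizes[] = { 1, 8, 9, 100, 1000, 4096 };

                for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
                        unsigned int idx = size_to_index(sizes[i]);
                        printf("size %4zu -> index %u (cache object size %zu)\n",
                               sizes[i], idx, index_to_max_size(idx));
                }
                return 0;
        }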
     

19 Dec, 2012

3 commits

  • SLUB allows us to tune a particular cache behavior with sysfs-based
    tunables. When creating a new memcg cache copy, we'd like to preserve any
    tunables the parent cache already had.

    This can be done by tapping into the store attribute function provided by
    the allocator. We of course don't need to mess with read-only fields.
    Since the attributes can have multiple types and are stored internally by
    sysfs, the best strategy is to issue a ->show() in the root cache, and
    then ->store() in the memcg cache.

    The drawback of that is that sysfs can allocate up to a page of buffering
    for show(), which we are likely not to need but also can't rule out. To
    avoid always allocating a page for that, we update the caches at store
    time with the maximum attribute size ever stored to the root cache. We
    will then get a buffer big enough to hold it. The corollary is that if no
    stores happened, nothing will be propagated. (A simplified sketch of this
    show()/store() propagation appears after this day's entries.)

    It can also happen that a root cache has its tunables updated during
    normal system operation. In this case, we will propagate the change to
    all caches that are already active.

    [akpm@linux-foundation.org: tweak code to avoid __maybe_unused]
    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • We are able to match a cache allocation to a particular memcg. If the
    task doesn't change groups during the allocation itself (a rare event),
    this will give us a good picture of which group is the first to touch a
    cache page.

    This patch uses the now available infrastructure by calling
    memcg_kmem_get_cache() before all the cache allocations.

    Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • For the kmem slab controller, we need to record some extra information in
    the kmem_cache structure.

    Signed-off-by: Glauber Costa
    Signed-off-by: Suleiman Souhlal
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
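
    The sketch referred to above: a plain userspace model of the show()/store()
    propagation scheme, not the sysfs machinery itself. Every name here (struct
    cache, show_cpu_partial, store_cpu_partial, propagate) is invented for
    illustration.

        #include <stdio.h>
        #include <string.h>

        struct cache {
                int cpu_partial;     /* one example tunable */
                size_t max_attr_len; /* largest attribute text ever stored */
        };

        static int show_cpu_partial(struct cache *c, char *buf, size_t len)
        {
                return snprintf(buf, len, "%d", c->cpu_partial);
        }

        static void store_cpu_partial(struct cache *c, const char *buf)
        {
                size_t n = strlen(buf);

                sscanf(buf, "%d", &c->cpu_partial);
                if (n > c->max_attr_len)
                        c->max_attr_len = n; /* remember how big a buffer must be */
        }

        /* Copy a parent cache's tunable into a child: show() from the parent
         * into a buffer, then store() the result into the child. */
        static void propagate(struct cache *parent, struct cache *child)
        {
                char buf[64]; /* in the sketch a fixed buffer is plenty */

                if (parent->max_attr_len == 0)
                        return; /* nothing was ever stored: nothing to copy */
                show_cpu_partial(parent, buf, sizeof(buf));
                store_cpu_partial(child, buf);
        }

        int main(void)
        {
                struct cache root = { 0 }, child = { 0 };

                store_cpu_partial(&root, "30"); /* admin tunes the root cache */
                propagate(&root, &child);       /* new memcg copy inherits it */
                printf("child cpu_partial = %d\n", child.cpu_partial);
                return 0;
        }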
     

14 Jun, 2012

1 commit

  • Define a struct that describes common fields used in all slab allocators.
    A slab allocator either uses the common definition (like SLOB) or is
    required to provide members of kmem_cache with the definition given.

    After that it will be possible to share code that
    only operates on those fields of kmem_cache.

    The patch basically takes the slob definition of kmem_cache and
    uses those field names for the other allocators.

    It also standardizes the names used for basic object lengths in
    allocators:

    object_size  The struct size specified at kmem_cache_create();
                 basically the payload expected to be used by the subsystem.

    size         The size of memory allocated for each object. This size
                 is larger than object_size and includes padding, alignment
                 and extra metadata for each object (e.g. for debugging
                 and rcu).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
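
    In other words, roughly the following (the alignment and debug-metadata
    numbers below are made up, and align_up is an invented helper):

        #include <stdio.h>

        #define FAKE_ALIGN     8UL
        #define FAKE_DEBUG_RED 16UL   /* e.g. red-zone / debug bytes per object */

        /* Round x up to the next multiple of a (a must be a power of two). */
        static unsigned long align_up(unsigned long x, unsigned long a)
        {
                return (x + a - 1) & ~(a - 1);
        }

        int main(void)
        {
                unsigned long object_size = 100; /* payload the subsystem uses */
                unsigned long size = align_up(object_size + FAKE_DEBUG_RED,
                                              FAKE_ALIGN);

                printf("object_size=%lu size=%lu\n", object_size, size);
                return 0;
        }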
     

01 Jun, 2012

1 commit

  • The node field is always page_to_nid(c->page), so it's rather easy to
    replace. Note that there may be slightly more overhead in various hot paths
    due to the need to shift the bits out of page->flags. However, that is mostly
    compensated for by a smaller footprint of the kmem_cache_cpu structure (this
    patch reduces it to 3 words per cache), which allows better caching.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
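
    A toy sketch of that recompute-instead-of-cache trade-off follows; the bit
    layout and the names FAKE_NODE_SHIFT and fake_page_to_nid are invented, while
    the real node id lives in page->flags with an architecture-dependent layout.

        #include <stdio.h>

        #define FAKE_NODE_SHIFT 56
        #define FAKE_NODE_MASK  0xffUL

        struct fake_page {
                unsigned long flags; /* node id packed into the top bits */
        };

        /* Recompute the node id by shifting it out of the flags word. */
        static int fake_page_to_nid(const struct fake_page *pg)
        {
                return (int)((pg->flags >> FAKE_NODE_SHIFT) & FAKE_NODE_MASK);
        }

        int main(void)
        {
                struct fake_page pg = { .flags = (2UL << FAKE_NODE_SHIFT) | 0x1 };

                printf("node = %d\n", fake_page_to_nid(&pg)); /* prints 2 */
                return 0;
        }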
     

29 Mar, 2012

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "There's the new kmalloc_array() API, minor fixes and performance
    improvements, but quite honestly, nothing terribly exciting."

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: SLAB Out-of-memory diagnostics
    slab: introduce kmalloc_array()
    slub: per cpu partial statistics change
    slub: include include for prefetch
    slub: Do not hold slub_lock when calling sysfs_slab_add()
    slub: prefetch next freelist pointer in slab_alloc()
    slab, cleanup: remove unneeded return

    Linus Torvalds
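
    The point of kmalloc_array() mentioned above is the overflow check on the
    element count; a minimal userspace equivalent looks like the sketch below
    (my_alloc_array is an invented name and malloc stands in for kmalloc).

        #include <stdio.h>
        #include <stdlib.h>
        #include <stdint.h>

        /* Refuse the request if n * size would wrap around, instead of
         * silently allocating a too-small buffer. */
        static void *my_alloc_array(size_t n, size_t size)
        {
                if (size != 0 && n > SIZE_MAX / size)
                        return NULL; /* multiplication would overflow */
                return malloc(n * size);
        }

        int main(void)
        {
                int *ok   = my_alloc_array(1000, sizeof(int));
                void *bad = my_alloc_array(SIZE_MAX / 2, 16); /* rejected */

                printf("ok=%p bad=%p\n", (void *)ok, bad);
                free(ok);
                return 0;
        }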
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define), then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
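
    The same rule in plain C, using assert() and <assert.h> as a stand-in for
    BUG_ON() and <linux/bug.h> (mydiv.h and safe_div are invented for the example):

        /* mydiv.h -- because the static inline below uses assert(), the header
         * includes <assert.h> itself instead of hoping every user already
         * pulled it in; the kernel rule above is the same idea with
         * BUG_ON() and <linux/bug.h>. */
        #ifndef MYDIV_H
        #define MYDIV_H

        #include <assert.h>   /* explicit: this header's inline code needs it */

        static inline int safe_div(int a, int b)
        {
                assert(b != 0);
                return a / b;
        }

        #endif /* MYDIV_H */

    A file that includes mydiv.h then compiles whether or not it happens to
    include <assert.h> on its own.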
     

18 Feb, 2012

1 commit

  • This patch splits cpu_partial_free into two parts: cpu_partial_node, counting
    PCP refills from the node partial list, and cpu_partial_free (keeping its name),
    counting PCP refills in the slab_free slow path. A new statistic, cpu_partial_drain,
    is added to count PCP drains to the node partial list. This information is useful
    when doing PCP tuning.

    The slabinfo.c code is unchanged, since cpu_partial_node is not on the slow path.

    Signed-off-by: Alex Shi
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Alex Shi
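
    A toy model of the split counters (the enum names mirror the statistics
    mentioned above; the surrounding code is invented for illustration):

        #include <stdio.h>

        enum stat_item {
                CPU_PARTIAL_NODE,  /* refilled from the node partial list */
                CPU_PARTIAL_FREE,  /* refilled from the slab_free slow path */
                CPU_PARTIAL_DRAIN, /* drained back to the node partial list */
                NR_STATS
        };

        static unsigned long stats[NR_STATS];

        static void stat_inc(enum stat_item item) { stats[item]++; }

        int main(void)
        {
                stat_inc(CPU_PARTIAL_NODE);
                stat_inc(CPU_PARTIAL_FREE);
                stat_inc(CPU_PARTIAL_FREE);
                stat_inc(CPU_PARTIAL_DRAIN);
                printf("node=%lu free=%lu drain=%lu\n",
                       stats[CPU_PARTIAL_NODE], stats[CPU_PARTIAL_FREE],
                       stats[CPU_PARTIAL_DRAIN]);
                return 0;
        }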
     

28 Sep, 2011

1 commit


20 Aug, 2011

1 commit

  • Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
    partial pages. The partial page list is used in slab_free() to avoid
    per node lock taking.

    In __slab_alloc() we can then take multiple partial pages off the per
    node partial list in one go reducing node lock pressure.

    We can also use the per cpu partial list in slab_alloc() to avoid scanning
    partial lists for pages with free objects.

    The main effect of a per cpu partial list is that the per node list_lock
    is taken for batches of partial pages instead of individual ones.

    Potential future enhancements:

    1. The pickup from the partial list could perhaps be done without disabling
    interrupts with some work. The free path already puts the page into the
    per cpu partial list without disabling interrupts.

    2. __slab_free() may have some code paths that could use optimization.

    Performance:

                                        Before        After
    ./hackbench 100 process 200000    1953.047     1564.614
    ./hackbench 100 process 20000      207.176      156.940
    ./hackbench 100 process 20000      204.468      156.940
    ./hackbench 100 process 20000      204.879      158.772
    ./hackbench 10 process 20000        20.153       15.853
    ./hackbench 10 process 20000        20.153       15.986
    ./hackbench 10 process 20000        19.363       16.111
    ./hackbench 1 process 20000          2.518        2.307
    ./hackbench 1 process 20000          2.258        2.339
    ./hackbench 1 process 20000          2.864        2.163

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
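
    A simplified userspace model of the batching idea — take the node lock once and
    move several partial pages to a per-cpu list — is sketched below. fake_page,
    fake_node and grab_partial_batch are invented, and a pthread mutex stands in
    for the node list_lock.

        #include <stdio.h>
        #include <pthread.h>

        struct fake_page {
                struct fake_page *next;
                int free_objects;
        };

        struct fake_node {
                pthread_mutex_t list_lock;
                struct fake_page *partial; /* node-wide partial list */
        };

        /* Move up to 'batch' pages from the node partial list to a per-cpu
         * list while taking the node lock only once. */
        static struct fake_page *grab_partial_batch(struct fake_node *n, int batch)
        {
                struct fake_page *head = NULL;

                pthread_mutex_lock(&n->list_lock);
                while (batch-- > 0 && n->partial) {
                        struct fake_page *pg = n->partial;

                        n->partial = pg->next;
                        pg->next = head; /* push onto the per-cpu partial list */
                        head = pg;
                }
                pthread_mutex_unlock(&n->list_lock);
                return head;
        }

        int main(void)
        {
                struct fake_page pages[4] = {
                        { &pages[1], 3 }, { &pages[2], 1 },
                        { &pages[3], 5 }, { NULL, 2 }
                };
                struct fake_node node = { PTHREAD_MUTEX_INITIALIZER, &pages[0] };
                struct fake_page *cpu_partial = grab_partial_batch(&node, 3);
                int got = 0;

                for (struct fake_page *p = cpu_partial; p; p = p->next)
                        got++;
                printf("took %d pages in one lock acquisition\n", got);
                return 0;
        }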
     

31 Jul, 2011

1 commit

  • * 'slub/lockless' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: (21 commits)
    slub: When allocating a new slab also prep the first object
    slub: disable interrupts in cmpxchg_double_slab when falling back to pagelock
    Avoid duplicate _count variables in page_struct
    Revert "SLUB: Fix build breakage in linux/mm_types.h"
    SLUB: Fix build breakage in linux/mm_types.h
    slub: slabinfo update for cmpxchg handling
    slub: Not necessary to check for empty slab on load_freelist
    slub: fast release on full slab
    slub: Add statistics for the case that the current slab does not match the node
    slub: Get rid of the another_slab label
    slub: Avoid disabling interrupts in free slowpath
    slub: Disable interrupts in free_debug processing
    slub: Invert locking and avoid slab lock
    slub: Rework allocator fastpaths
    slub: Pass kmem_cache struct to lock and freeze slab
    slub: explicit list_lock taking
    slub: Add cmpxchg_double_slab()
    mm: Rearrange struct page
    slub: Move page->frozen handling near where the page->freelist handling occurs
    slub: Do not use frozen page flag but a bit in the page counters
    ...

    Linus Torvalds
     

08 Jul, 2011

1 commit


02 Jul, 2011

3 commits


17 Jun, 2011

1 commit

  • Every slab allocator has its own alignment definition in include/linux/sl?b_def.h.
    Extract those and define a common set in include/linux/slab.h.

    SLOB: As noted, sometimes we need double word alignment on 32 bit. This gives all
    structures allocated by SLOB an unsigned long long alignment like the others do.

    SLAB: If ARCH_SLAB_MINALIGN is not set, SLAB would set ARCH_SLAB_MINALIGN to
    zero, meaning no alignment at all. Give it the default unsigned long long alignment.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
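
    Roughly, the common default amounts to the sketch below; MY_ARCH_SLAB_MINALIGN
    is an invented stand-in for ARCH_SLAB_MINALIGN, and __alignof__ is the
    gcc/clang extension the kernel itself relies on.

        #include <stdio.h>

        /* If the "architecture" does not set a minimum slab alignment, fall back
         * to the alignment of unsigned long long so 64-bit values and doubles
         * are always safe. */
        #ifndef MY_ARCH_SLAB_MINALIGN
        #define MY_ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
        #endif

        int main(void)
        {
                printf("minimum slab alignment: %zu bytes\n",
                       (size_t)MY_ARCH_SLAB_MINALIGN);
                return 0;
        }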
     

21 May, 2011

1 commit


08 May, 2011

1 commit

  • Remove the #ifdefs. This means that the irqsafe_cpu_cmpxchg_double() is used
    everywhere.

    There may be performance implications since:

    A. We now have to manage a transaction ID for all arches

    B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced
    to a very short irqoff section.

    There are no multiple irqoff/irqon sequences as a result of this change. Even in the fallback
    case we only have to do one disable and enable like before.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
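
    For illustration only, here is the shape of such a fallback in userspace: a
    double-word compare-and-exchange emulated inside one short critical section.
    A mutex stands in for the brief irq-off window, and struct pair and
    cmpxchg_double_fallback are invented names, not the kernel primitives.

        #include <stdio.h>
        #include <stdbool.h>
        #include <pthread.h>

        struct pair {
                void *ptr;         /* e.g. the freelist pointer */
                unsigned long tid; /* e.g. the transaction id */
        };

        static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Update (ptr, tid) atomically with respect to other users of the same
         * lock: one short critical section, one disable/enable, like the text
         * above describes for the real fallback. */
        static bool cmpxchg_double_fallback(struct pair *p,
                                            void *old_ptr, unsigned long old_tid,
                                            void *new_ptr, unsigned long new_tid)
        {
                bool ok = false;

                pthread_mutex_lock(&fallback_lock);
                if (p->ptr == old_ptr && p->tid == old_tid) {
                        p->ptr = new_ptr;
                        p->tid = new_tid;
                        ok = true;
                }
                pthread_mutex_unlock(&fallback_lock);
                return ok;
        }

        int main(void)
        {
                int obj;
                struct pair p = { NULL, 0 };
                bool ok = cmpxchg_double_fallback(&p, NULL, 0, &obj, 1);

                printf("update %s, tid now %lu\n", ok ? "succeeded" : "failed", p.tid);
                return 0;
        }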
     

23 Mar, 2011

1 commit


21 Mar, 2011

1 commit


12 Mar, 2011

1 commit

  • There is no "struct" for slub's slab; it shares struct page. But struct
    page is very small, and it is insufficient when we need to add some
    metadata for the slab.

    So we add a field "reserved" to struct kmem_cache: when a slab is
    allocated, kmem_cache->reserved bytes are automatically reserved at the
    end of the slab for the slab's metadata.

    Changed from v1:
    Export the reserved field via sysfs

    Acked-by: Christoph Lameter
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Pekka Enberg

    Lai Jiangshan
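
    The effect on slab layout is simply the following (objects_per_slab is an
    invented helper and the numbers are illustrative):

        #include <stdio.h>

        /* With 'reserved' bytes kept at the end of each slab for metadata, the
         * number of objects that fit is computed from the remaining space. */
        static unsigned int objects_per_slab(unsigned long slab_size,
                                             unsigned long object_size,
                                             unsigned long reserved)
        {
                return (unsigned int)((slab_size - reserved) / object_size);
        }

        int main(void)
        {
                printf("no reserve : %u objects\n", objects_per_slab(4096, 256, 0));
                printf("64 reserved: %u objects\n", objects_per_slab(4096, 256, 64));
                return 0;
        }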
     

11 Mar, 2011

2 commits

  • Use the this_cpu_cmpxchg_double functionality to implement a lockless
    allocation algorithm on arches that support fast this_cpu_ops.

    Each of the per cpu pointers is paired with a transaction id that ensures
    that updates of the per cpu information can only occur in sequence on
    a certain cpu.

    A transaction id is a "long" integer composed of an event number and the
    cpu number. The event number is incremented for every change to the per
    cpu state. This lets the cmpxchg instruction verify, for an update, that
    nothing interfered, that we are updating the percpu structure for the
    processor where we picked up the information, and that we are still on
    that processor when we update the information. (A toy sketch of this tid
    scheme appears after this day's entries.)

    This results in a significant decrease of the overhead in the fastpaths. It
    also makes it easy to adopt the fast path for realtime kernels since this
    is lockless and does not require the use of the current per cpu area
    over the critical section. It is only important that the per cpu area is
    current at the beginning of the critical section and at the end.

    So there is no need even to disable preemption.

    Test results show that the fastpath cycle count is reduced by up to ~ 40%
    (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
    adds a few cycles.

    Sadly this does nothing for the slowpath, which is where the main
    performance issues in slub are, but the best-case performance rises
    significantly. (For that, see the more complex slub patches that require
    cmpxchg_double.)

    Kmalloc: alloc/free test

    Before:

    10000 times kmalloc(8)/kfree -> 134 cycles
    10000 times kmalloc(16)/kfree -> 152 cycles
    10000 times kmalloc(32)/kfree -> 144 cycles
    10000 times kmalloc(64)/kfree -> 142 cycles
    10000 times kmalloc(128)/kfree -> 142 cycles
    10000 times kmalloc(256)/kfree -> 132 cycles
    10000 times kmalloc(512)/kfree -> 132 cycles
    10000 times kmalloc(1024)/kfree -> 135 cycles
    10000 times kmalloc(2048)/kfree -> 135 cycles
    10000 times kmalloc(4096)/kfree -> 135 cycles
    10000 times kmalloc(8192)/kfree -> 144 cycles
    10000 times kmalloc(16384)/kfree -> 754 cycles

    After:

    10000 times kmalloc(8)/kfree -> 78 cycles
    10000 times kmalloc(16)/kfree -> 78 cycles
    10000 times kmalloc(32)/kfree -> 82 cycles
    10000 times kmalloc(64)/kfree -> 88 cycles
    10000 times kmalloc(128)/kfree -> 79 cycles
    10000 times kmalloc(256)/kfree -> 79 cycles
    10000 times kmalloc(512)/kfree -> 85 cycles
    10000 times kmalloc(1024)/kfree -> 82 cycles
    10000 times kmalloc(2048)/kfree -> 82 cycles
    10000 times kmalloc(4096)/kfree -> 85 cycles
    10000 times kmalloc(8192)/kfree -> 82 cycles
    10000 times kmalloc(16384)/kfree -> 706 cycles

    Kmalloc: Repeatedly allocate then free test

    Before:

    10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
    10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
    10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
    10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
    10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
    10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
    10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
    10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
    10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
    10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
    10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
    10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles

    After:

    10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
    10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
    10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
    10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
    10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
    10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
    10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
    10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
    10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
    10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
    10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
    10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • It is used in unfreeze_slab() which is a performance critical
    function.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
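
    The sketch referred to above: a toy userspace model of the tid encoding and
    the commit step. Everything here (CPU_BITS, make_tid, try_commit) is invented
    for illustration; the kernel pairs the tid with the freelist pointer and uses
    this_cpu_cmpxchg_double, while this sketch does a single __atomic
    compare-exchange on the tid only, just to show the shape of the check.

        #include <stdio.h>

        #define CPU_BITS 8UL

        /* Pack an event counter and a cpu number into one long. */
        static unsigned long make_tid(unsigned long event, unsigned long cpu)
        {
                return (event << CPU_BITS) | cpu;
        }

        static unsigned long tid_cpu(unsigned long t)   { return t & ((1UL << CPU_BITS) - 1); }
        static unsigned long tid_event(unsigned long t) { return t >> CPU_BITS; }

        static unsigned long cur_tid; /* per-cpu in the kernel; one copy here */

        /* Try to commit one "allocation": succeeds only if the tid is unchanged,
         * i.e. nobody touched the state and we did not move to another cpu. */
        static int try_commit(unsigned long seen_tid, unsigned long cpu)
        {
                unsigned long next = make_tid(tid_event(seen_tid) + 1, cpu);

                return __atomic_compare_exchange_n(&cur_tid, &seen_tid, next, 0,
                                                   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        }

        int main(void)
        {
                unsigned long seen = __atomic_load_n(&cur_tid, __ATOMIC_SEQ_CST);

                if (try_commit(seen, 0))
                        printf("fastpath committed: event=%lu cpu=%lu\n",
                               tid_event(cur_tid), tid_cpu(cur_tid));
                else
                        printf("state changed underneath us: retry the fastpath\n");
                return 0;
        }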
     

06 Nov, 2010

1 commit

  • Having the trace calls defined in the always inlined kmalloc functions
    in include/linux/slub_def.h causes a lot of code duplication as the
    trace functions get instantiated for each kmalloc call site. This can
    simply be removed by pushing the trace calls down into the functions in
    slub.c.

    On my x86_64 build this patch shrinks the code size of the kernel by
    approx 36K and also shrinks the code size of many modules -- too many to
    list here ;)

    size vmlinux (2.6.36) reports:
       text     data     bss      dec      hex     filename
    5410611   743172   828928  6982711   6a8c37   vmlinux
    5373738   744244   828928  6946910   6a005e   vmlinux + patch

    The resulting kernel has had some testing & kmalloc trace still seems to
    work.

    This patch
    - moves trace_kmalloc out of the inlined kmalloc() and pushes it down
      into kmem_cache_alloc_trace() so that it only gets instantiated once.

    - renames kmem_cache_alloc_notrace() to kmem_cache_alloc_trace() to
      indicate that it now does have tracing. (Maybe this would be better
      called something like kmalloc_kmem_cache?)

    - adds a new function kmalloc_order() to handle allocation and tracing
      of large allocations of page order.

    - removes tracing from the inlined kmalloc_large(), replacing it with a
      call to kmalloc_order().

    - moves tracing out of the inlined kmalloc_node() and pushes it down into
      kmem_cache_alloc_node_trace().

    - renames kmem_cache_alloc_node_notrace() to
      kmem_cache_alloc_node_trace().

    - removes the include of trace/events/kmem.h from slub_def.h.

    v2
    - keep kmalloc_order_trace inline when !CONFIG_TRACE

    Signed-off-by: Richard Kennedy
    Signed-off-by: Pekka Enberg

    Richard Kennedy
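
    The shape of that refactoring, in a deliberately tiny userspace form
    (my_alloc and alloc_traced are invented; an fprintf stands in for the
    tracepoint):

        #include <stdio.h>
        #include <stdlib.h>

        /* Out of line: one copy of the "trace hook" for the whole program. */
        static void *alloc_traced(size_t size)
        {
                void *p = malloc(size);

                fprintf(stderr, "trace: alloc %zu bytes -> %p\n", size, p);
                return p;
        }

        /* Inlined at every call site, but contains no tracing of its own, so
         * the hook is not duplicated per caller. */
        static inline void *my_alloc(size_t size)
        {
                return alloc_traced(size);
        }

        int main(void)
        {
                void *a = my_alloc(32);
                void *b = my_alloc(64);

                free(a);
                free(b);
                return 0;
        }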
     

06 Oct, 2010

1 commit


02 Oct, 2010

2 commits

  • Reduce the #ifdefs and simplify bootstrap by making SMP and NUMA as much alike
    as possible. This means that there will be an additional indirection to get to
    the kmem_cache_node field under SMP.

    Acked-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • kmalloc caches are statically defined and may take up a lot of space just
    because the size of the node array has to be dimensioned for the largest
    node count supported.

    This patch makes the size of the kmem_cache structure dynamic throughout by
    creating a kmem_cache slab cache for the kmem_cache objects. The bootstrap
    occurs by allocating the initial one or two kmem_cache objects from the
    page allocator.

    C2->C3
    - Fix various issues indicated by David
    - Make create kmalloc_cache return a kmem_cache * pointer.

    Acked-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
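
    The space argument can be seen with a toy comparison; the node counts and
    struct names below are invented, and a flexible array member models the
    dynamically sized descriptor.

        #include <stdio.h>

        #define MAX_SUPPORTED_NODES 1024

        /* Statically sized descriptor: always reserves slots for the largest
         * supported node count. */
        struct static_cache {
                void *node[MAX_SUPPORTED_NODES];
        };

        /* Dynamically sized descriptor: only pays for the nodes present. */
        struct dynamic_cache {
                size_t object_size;
                void *node[]; /* sized at runtime */
        };

        int main(void)
        {
                int nodes_online = 2;
                size_t dyn = sizeof(struct dynamic_cache) +
                             nodes_online * sizeof(void *);

                printf("static descriptor: %zu bytes\n", sizeof(struct static_cache));
                printf("dynamic descriptor for %d nodes: %zu bytes\n",
                       nodes_online, dyn);
                return 0;
        }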
     

23 Aug, 2010

1 commit


11 Aug, 2010

1 commit

  • Now each architecture has its own dma_get_cache_alignment implementation.

    dma_get_cache_alignment returns the minimum DMA alignment. Architectures
    define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that a malloc'ed
    buffer is DMA-safe; the buffer doesn't share a cache line with others). So
    we can unify the dma_get_cache_alignment implementations.

    This patch:

    dma_get_cache_alignment() needs to know whether an architecture defines
    ARCH_KMALLOC_MINALIGN or not (i.e. whether the architecture has a DMA
    alignment restriction). However, slab.h defines ARCH_KMALLOC_MINALIGN if
    the architecture doesn't define it.

    Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
    ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
    (except for crypto).
    (except for crypto).

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
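
    The unified helper then boils down to something like the following sketch;
    MY_ARCH_DMA_MINALIGN and my_dma_get_cache_alignment are renamed stand-ins so
    the example is self-contained, and the value 64 is made up.

        #include <stdio.h>

        /* Pretend this "architecture" needs 64-byte DMA alignment. */
        #define MY_ARCH_DMA_MINALIGN 64

        /* One generic implementation keyed off whether the architecture
         * defines a minimum DMA alignment at all. */
        static inline int my_dma_get_cache_alignment(void)
        {
        #ifdef MY_ARCH_DMA_MINALIGN
                return MY_ARCH_DMA_MINALIGN;
        #else
                return 1; /* no architecture-imposed minimum */
        #endif
        }

        int main(void)
        {
                printf("minimum DMA alignment: %d bytes\n",
                       my_dma_get_cache_alignment());
                return 0;
        }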
     

09 Aug, 2010

1 commit


10 Jun, 2010

1 commit