14 Apr, 2014

1 commit

  • Pull slab changes from Pekka Enberg:
    "The biggest change is byte-sized freelist indices which reduces slab
    freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: slab/slub: use page->list consistently instead of page->lru
    mm/slab.c: cleanup outdated comments and unify variables naming
    slab: fix wrongly used macro
    slub: fix high order page allocation problem with __GFP_NOFAIL
    slab: Make allocations with GFP_ZERO slightly more efficient
    slab: make more slab management structure off the slab
    slab: introduce byte sized index for the freelist of a slab
    slab: restrict the number of objects in a slab
    slab: introduce helper functions to get/set free object
    slab: factor out calculate nr objects in cache_estimate

    Linus Torvalds
     

11 Apr, 2014

1 commit

  • 'struct page' has two list_head fields: 'lru' and 'list'. Conveniently,
    they are unioned together. This means that code can use them
    interchangeably, which gets horribly confusing, as with this nugget from
    slab.c:

    > list_del(&page->lru);
    > if (page->active == cachep->num)
    > list_add(&page->list, &n->slabs_full);

    This patch makes the slab and slub code use page->lru universally instead
    of mixing ->list and ->lru.

    So, the new rule is: page->lru is what you use if you want to keep
    your page on a list. Don't like the fact that it's not called ->list?
    Too bad.
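
    For readers who have not seen the layout, a minimal, self-contained
    illustration of why the two names alias each other (a simplified userspace
    mock-up, not the real struct page definition):

        #include <assert.h>
        #include <stdio.h>

        struct list_head { struct list_head *next, *prev; };

        /* Simplified stand-in for struct page: 'lru' and 'list' occupy the
         * same storage, so writing through one name writes the other too. */
        struct fake_page {
            union {
                struct list_head lru;
                struct list_head list;
            };
        };

        int main(void)
        {
            struct fake_page p;

            p.lru.next = (struct list_head *)0x1234;
            assert(p.list.next == p.lru.next);  /* same bytes, two names */
            printf("lru and list alias the same storage\n");
            return 0;
        }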

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Pekka Enberg

    Dave Hansen
     

08 Apr, 2014

2 commits

  • PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
    There's no significant performance degradation to checking
    current->mempolicy rather than current->flags & PF_MEMPOLICY in the
    allocation path, especially since this is considered unlikely().
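
    Schematically, the change in the allocation path looks like the following
    fragment (the surrounding function and exact condition are illustrative,
    not the patch verbatim):

        /* before (roughly): a dedicated per-process flag bit */
        if (unlikely(current->flags & PF_MEMPOLICY))
            objp = alternate_node_alloc(cachep, flags);

        /* after: test the mempolicy pointer itself; NULL means "no policy" */
        if (unlikely(current->mempolicy))
            objp = alternate_node_alloc(cachep, flags);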

    Running TCP_RR with netperf-2.4.5 through localhost on 16 cpu machine with
    64GB of memory and without a mempolicy:

    threads      before       after
         16     1249409     1244487
         32     1281786     1246783
         48     1239175     1239138
         64     1244642     1241841
         80     1244346     1248918
         96     1266436     1254316
        112     1307398     1312135
        128     1327607     1326502

    Per-process flags are a scarce resource, so we should free them up
    whenever possible and make them available. We'll be using one shortly for
    memcg oom reserves.

    Signed-off-by: David Rientjes
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • slab_node() is actually a mempolicy function, so rename it to
    mempolicy_slab_node() to make it clearer that it is used for processes
    with mempolicies.

    At the same time, cleanup its code by saving numa_mem_id() in a local
    variable (since we require a node with memory, not just any node) and
    remove an obsolete comment that assumes the mempolicy is actually passed
    into the function.

    Signed-off-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

04 Apr, 2014

1 commit

  • Since put_mems_allowed() is strictly optional (it is a seqcount retry), we
    don't need to evaluate the function if the allocation was in fact
    successful, saving an smp_rmb(), some loads, and comparisons on some
    relatively fast paths.

    Since the get/put_mems_allowed() naming does suggest a mandatory pairing,
    rename the interface, as suggested by Mel, to resemble the seqcount
    interface.

    This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
    where it is important to note that the return value of the latter call
    is inverted from its previous incarnation.
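
    The resulting usage pattern looks roughly like this (sketch; the
    allocation call is a placeholder, the read_mems_allowed_*() names are the
    ones introduced here):

        unsigned int cpuset_mems_cookie;
        struct page *page;

        do {
            cpuset_mems_cookie = read_mems_allowed_begin();
            page = attempt_allocation(gfp, order);  /* placeholder */
            /* retry only if we failed AND mems_allowed changed meanwhile */
        } while (!page && read_mems_allowed_retry(cpuset_mems_cookie));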

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

08 Feb, 2014

6 commits

  • Use the likely() mechanism already present around the valid pointer tests
    to better choose when to memset allocations with __GFP_ZERO to zero.
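
    The shape of the change is roughly the following (illustrative fragment,
    not the exact hunk):

        /* before: a separate branch that re-tests the pointer */
        if (unlikely((flags & __GFP_ZERO) && objp))
            memset(objp, 0, cachep->object_size);

        /* after: hang the __GFP_ZERO test off the likely(objp) branch that
         * already wraps the valid-pointer checks */
        if (likely(objp)) {
            /* existing post-allocation work */
            if (unlikely(flags & __GFP_ZERO))
                memset(objp, 0, cachep->object_size);
        }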

    Acked-by: Christoph Lameter
    Signed-off-by: Joe Perches
    Signed-off-by: Pekka Enberg

    Joe Perches
     
  • Now that the size of the freelist used for slab management has shrunk,
    an on-slab management structure can waste a lot of space when the slab's
    objects are large.

    Consider a slab of 128-byte objects. If on-slab management is used, 31
    objects fit in the slab. The freelist for this case takes 31 bytes, so 97
    bytes, that is, more than 75% of one object's size, are wasted.

    In the 64-byte object case, no space is wasted with on-slab management.
    So set the off-slab decision threshold to 128 bytes.

    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Currently, the freelist of a slab consists of unsigned int sized indexes.
    Since most slabs have fewer than 256 objects, such large indexes are
    needless. For example, consider the minimum kmalloc slab: its object size
    is 32 bytes and it consists of one page, so the 256 values of a byte-sized
    index are enough to cover every possible index.

    There can be slabs whose object size is 8 bytes. We cannot handle this
    case with a byte-sized index, so we need to restrict the minimum object
    size. Since these slabs are not common, the memory wasted in them should
    be negligible.

    Some architectures have a page size larger than 4096 bytes (one example is
    the 64KB page size on PPC or IA64), so a byte-sized index does not fit
    them. In that case, we use a two-byte index instead.
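
    A sketch of how the index type can be chosen (the names below are
    illustrative; the actual macros in the patch may differ):

        /* one byte per freelist entry when every object of a one-page slab is
         * addressable with 8 bits, otherwise fall back to two bytes */
        #if (PAGE_SIZE / SLAB_OBJ_MIN_SIZE) <= 256
        typedef unsigned char freelist_idx_t;
        #else
        typedef unsigned short freelist_idx_t;
        #endif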

    Below are some numbers for this patch.

    * Before *
    # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
    kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0
    kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0
    kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0
    kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0
    kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0
    kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
    kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0
    kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0

    * After *
    kmalloc-512 521 648 512 8 1 : tunables 54 27 0 : slabdata 81 81 0
    kmalloc-256 208 208 256 16 1 : tunables 120 60 0 : slabdata 13 13 0
    kmalloc-192 1029 1029 192 21 1 : tunables 120 60 0 : slabdata 49 49 0
    kmalloc-96 529 589 128 31 1 : tunables 120 60 0 : slabdata 19 19 0
    kmalloc-64 2142 2142 64 63 1 : tunables 120 60 0 : slabdata 34 34 0
    kmalloc-128 660 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0
    kmalloc-32 11716 11780 32 124 1 : tunables 120 60 0 : slabdata 95 95 0
    kmem_cache 197 210 192 21 1 : tunables 120 60 0 : slabdata 10 10 0

    kmem_caches consisting of objects less than or equal to 256 bytes now
    hold one or more extra objects per slab. In the case of kmalloc-32, we get
    11 more objects, so 352 bytes (11 * 32) are saved, roughly a 9% saving of
    memory. Of course, this percentage decreases as the number of objects in a
    slab decreases.

    Here are the performance results on my 4-CPU machine.

    * Before *

    Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

    229,945,138 cache-misses ( +- 0.23% )

    11.627897174 seconds time elapsed ( +- 0.14% )

    * After *

    Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

    218,640,472 cache-misses ( +- 0.42% )

    11.504999837 seconds time elapsed ( +- 0.21% )

    Cache misses are reduced by this patchset by roughly 5%, and elapsed time
    improves by about 1%.

    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • To prepare for implementing a byte-sized index for managing the freelist
    of a slab, we should restrict the number of objects in a slab to 256 or
    fewer, since a byte can only represent 256 different values. Setting the
    object size to a value equal to or greater than the newly introduced
    SLAB_OBJ_MIN_SIZE ensures that the number of objects in a one-page slab is
    no more than 256.

    If the page size is larger than 4096 bytes, the above assumption no longer
    holds. In that case, we fall back to a two-byte index.

    If the minimum kmalloc size is less than 16 bytes, we use it as the
    minimum object size and give up this optimization.
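
    The arithmetic behind the constraint, as a sketch (the exact macro
    definition in the patch may differ):

        /* a one-byte index can address at most 256 objects, so a one-page
         * (4096-byte) slab needs objects of at least 4096 / 256 = 16 bytes */
        #define SLAB_OBJ_MIN_SIZE   (PAGE_SIZE / 256)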

    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • In the following patches, the way free objects are fetched from and
    stored to the freelist changes, so that a simple cast no longer works for
    it. Therefore, introduce helper functions.
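
    A sketch of what such accessors can look like, assuming the freelist_idx_t
    entry type from the related patches (the bodies here are illustrative):

        static inline freelist_idx_t get_free_obj(struct page *page,
                                                  unsigned int idx)
        {
            return ((freelist_idx_t *)page->freelist)[idx];
        }

        static inline void set_free_obj(struct page *page,
                                        unsigned int idx, freelist_idx_t val)
        {
            ((freelist_idx_t *)page->freelist)[idx] = val;
        }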

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • This logic is not simple to understand, so factor it out into a separate
    function to help readability. Additionally, the following patch, which
    lets the freelist use a differently sized index depending on the number of
    objects, can build on this change.
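
    Schematically, the factored-out helper answers one question: how many
    objects fit when each object also needs one freelist index entry? (The
    signature below is illustrative and ignores the freelist's own alignment,
    which the real code must also handle.)

        static unsigned int calculate_nr_objs(unsigned long slab_size,
                                              size_t obj_size, size_t idx_size)
        {
            /* each on-slab object costs its size plus one index entry */
            return slab_size / (obj_size + idx_size);
        }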

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

31 Jan, 2014

1 commit

  • This patch fixes the following errors seen while running 'make htmldocs':

    Warning(/mm/slab.c:1956): No description found for parameter 'page'
    Warning(/mm/slab.c:1956): Excess function parameter 'slabp' description in 'slab_destroy'

    The incorrect function parameter "slabp" was documented instead of "page".

    Acked-by: Christoph Lameter
    Signed-off-by: Masanari Iida
    Signed-off-by: Pekka Enberg

    Masanari Iida
     

23 Nov, 2013

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
    slab internals similar to mm/slub.c. This reduces memory usage and
    improves performance:

    https://lkml.org/lkml/2013/10/16/155

    Rest of the changes are bug fixes from various people"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
    mm, slub: fix the typo in mm/slub.c
    mm, slub: fix the typo in include/linux/slub_def.h
    slub: Handle NULL parameter in kmem_cache_flags
    slab: replace non-existing 'struct freelist *' with 'void *'
    slab: fix to calm down kmemleak warning
    slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
    slab: rename slab_bufctl to slab_freelist
    slab: remove useless statement for checking pfmemalloc
    slab: use struct page for slab management
    slab: replace free and inuse in struct slab with newly introduced active
    slab: remove SLAB_LIMIT
    slab: remove kmem_bufctl_t
    slab: change the management method of free objects of the slab
    slab: use __GFP_COMP flag for allocating slab pages
    slab: use well-defined macro, virt_to_slab()
    slab: overloading the RCU head over the LRU for RCU free
    slab: remove cachep in struct slab_rcu
    slab: remove nodeid in struct slab
    slab: remove colouroff in struct slab
    slab: change return type of kmem_getpages() to struct page
    ...

    Linus Torvalds
     

25 Oct, 2013

15 commits

  • Now, bufctl is no longer a proper name for this array, so change it.

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Now, virt_to_page(page->s_mem) is the same as the page itself, because
    slab uses this structure for management. So remove the useless statement.

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Now that only a few fields are left in struct slab, we can overload them
    onto struct page. This saves some memory and reduces the cache footprint.

    After this change, slabp_cache and slab_size are no longer related to
    a struct slab, so rename them to freelist_cache and freelist_size.

    These changes are purely mechanical and there is no functional change.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Now, 'free' in struct slab carries the same meaning as 'inuse'.
    So remove both and replace them with 'active'.

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • It's useless now, so remove it.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Now that we have changed how the slab's free objects are managed, there
    is no need for the special values BUFCTL_END, BUFCTL_FREE and
    BUFCTL_ACTIVE, so remove them.

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • The current method of managing a slab's free objects is weird, because
    it touches random positions in the kmem_bufctl_t array when we try to
    get a free object. See the following example.

    struct slab's free = 6
    kmem_bufctl_t array: 1 END 5 7 0 4 3 2

    To get free objects, we access this array in the following pattern:
    6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END

    If we have many objects, this array becomes large and no longer fits in
    one cache line, which is bad for performance.

    We can do the same thing in a simpler way, treating the array like a
    stack. The only thing we have to do is maintain a stack top that points to
    the next free object; the 'free' field of struct slab is used for this
    purpose. After that, if we need an object, we take the one at the stack
    top and adjust the top pointer. That's all. This method is already used
    for array_cache management. The following is the access pattern when we
    use this method.

    struct slab's free = 0
    kmem_bufctl_t array: 6 3 7 2 5 4 0 1

    To get free objects, we access this array in the following pattern:
    0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

    This may help the cache-line footprint when a slab has many objects and,
    in addition, it makes the code much simpler.
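
    A minimal, self-contained illustration of the stack idea (userspace demo,
    not kernel code): 'active' below stands in for the stack-top counter kept
    in struct slab's 'free' field, and the index array is only ever read in
    order.

        #include <stdio.h>

        #define NR_OBJS 8

        static unsigned char freelist[NR_OBJS] = { 6, 3, 7, 2, 5, 4, 0, 1 };
        static unsigned int active;          /* objects handed out so far */

        static int alloc_obj(void)
        {
            if (active == NR_OBJS)
                return -1;                   /* slab is full */
            return freelist[active++];       /* pop the entry at the top */
        }

        static void free_obj(int obj)
        {
            freelist[--active] = obj;        /* push the freed object back */
        }

        int main(void)
        {
            for (int i = 0; i < NR_OBJS; i++)
                printf("allocated object %d\n", alloc_obj());
            free_obj(2);
            printf("reallocated object %d\n", alloc_obj());
            return 0;
        }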

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • If we use the 'struct page' of the first page as the 'struct slab', there
    is no advantage to not using __GFP_COMP. So use the __GFP_COMP flag for
    all cases.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • This is a trivial change: just use the well-defined macro.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • With build-time size checking, we can overload the RCU head over the LRU
    of struct page to free a slab's pages in RCU context. This really helps
    the effort to overload struct slab onto struct page, which eventually
    reduces the memory usage and cache footprint of SLAB.
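
    The idea as a hedged kernel-style fragment (the function and callback
    names are illustrative, not the exact patch):

        static void kmem_freepages_rcu(struct page *page)
        {
            struct rcu_head *head;

            /* prove at build time that an rcu_head fits where page->lru lives */
            BUILD_BUG_ON(sizeof(struct rcu_head) > sizeof(struct list_head));

            head = (struct rcu_head *)&page->lru;
            call_rcu(head, kmem_rcu_free);   /* callback name illustrative */
        }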

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • We can get cachep from the page used by struct slab_rcu, so remove the
    cachep field.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • We can get nodeid using address translation, so this field is not useful.
    Therefore, remove it.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • Now there is no user of colouroff, so remove it.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • It is easier to understand if kmem_getpages() returns a struct page. With
    this, we can drop one translation from virtual address to page and produce
    better code than before. Below is the size change from this patch.

    * Before
    text data bss dec hex filename
    22123 23434 4 45561 b1f9 mm/slab.o

    * After
    text data bss dec hex filename
    22074 23434 4 45512 b1c8 mm/slab.o

    This also helps the following patch remove struct slab's colouroff.

    Acked-by: Andi Kleen
    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     
  • We check pfmemalloc per slab, not per page; you can see this in
    is_slab_pfmemalloc(). So the other pages of a slab don't need pfmemalloc
    set or cleared.

    Therefore, we should check pfmemalloc in the page flags of the first page,
    but the current implementation doesn't do that. virt_to_head_page(obj)
    just returns the 'struct page' of that object, not that of the first page,
    since SLAB doesn't use __GFP_COMP with CONFIG_MMU. To get the 'struct
    page' of the first page, we first get the slab and then go through
    virt_to_head_page(slab->s_mem).
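
    A sketch of the lookup described above (kernel-style fragment; helper
    names as used elsewhere in this series):

        /* obj's own page is not the head page (no __GFP_COMP here), so go
         * through the slab's s_mem, which always sits in the first page */
        struct slab *slabp = virt_to_slab(objp);
        struct page *head = virt_to_head_page(slabp->s_mem);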

    Acked-by: Andi Kleen
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

15 Jul, 2013

2 commits

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     

07 Jul, 2013

3 commits

  • Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
    CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.

    In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
    init_lock_keys() dereference beyond kmalloc_caches[26].
    This leads to an unbootable system (kernel panic at initializing SLAB)
    if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.

    Fix this by making sure that init_lock_keys() does not dereference beyond
    kmalloc_caches[26].

    Signed-off-by: Christoph Lameter
    Reported-by: Tetsuo Handa
    Cc: Pekka Enberg
    Cc: [3.10.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • This patch shares s_next and s_stop between slab and slub.

    Acked-by: Christoph Lameter
    Signed-off-by: Wanpeng Li
    Signed-off-by: Pekka Enberg

    Wanpeng Li
     
  • drain_freelist() is called to drain the slabs_free lists for cache reap,
    cache shrink, the memory hotplug callback, etc. The tofree parameter
    should be the number of slabs to free, not the number of slab objects to
    free.

    This patch fixes the callers that pass a number of objects, making sure
    they pass a number of slabs.
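
    Schematically, the fix at a call site looks like this (the slab-count
    helper and field names are illustrative, not the exact patch):

        /* before (roughly): an object count was passed as 'tofree' */
        drain_freelist(cachep, n, n->free_objects);

        /* after: convert the object count into a slab count first */
        drain_freelist(cachep, n, slabs_tofree(cachep, n));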

    Acked-by: Christoph Lameter
    Signed-off-by: Wanpeng Li
    Signed-off-by: Pekka Enberg

    Wanpeng Li
     

07 May, 2013

1 commit

  • Pull slab changes from Pekka Enberg:
    "The bulk of the changes are more slab unification from Christoph.

    There's also few fixes from Aaron, Glauber, and Joonsoo thrown into
    the mix."

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits)
    mm, slab_common: Fix bootstrap creation of kmalloc caches
    slab: Return NULL for oversized allocations
    mm: slab: Verify the nodeid passed to ____cache_alloc_node
    slub: tid must be retrieved from the percpu area of the current processor
    slub: Do not dereference NULL pointer in node_match
    slub: add 'likely' macro to inc_slabs_node()
    slub: correct to calculate num of acquired objects in get_partial_node()
    slub: correctly bootstrap boot caches
    mm/sl[au]b: correct allocation type check in kmalloc_slab()
    slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections
    slab: Handle ARCH_DMA_MINALIGN correctly
    slab: Common definition for kmem_cache_node
    slab: Rename list3/l3 to node
    slab: Common Kmalloc cache determination
    stat: Use size_t for sizes instead of unsigned
    slab: Common function to create the kmalloc array
    slab: Common definition for the array of kmalloc caches
    slab: Common constants for kmalloc boundaries
    slab: Rename nodelists to node
    slab: Common name for the per node structures
    ...

    Linus Torvalds