14 Apr, 2014

1 commit

  • Pull slab changes from Pekka Enberg:
    "The biggest change is byte-sized freelist indices which reduces slab
    freelist memory usage:

    https://lkml.org/lkml/2013/12/2/64"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: slab/slub: use page->list consistently instead of page->lru
    mm/slab.c: cleanup outdated comments and unify variables naming
    slab: fix wrongly used macro
    slub: fix high order page allocation problem with __GFP_NOFAIL
    slab: Make allocations with GFP_ZERO slightly more efficient
    slab: make more slab management structure off the slab
    slab: introduce byte sized index for the freelist of a slab
    slab: restrict the number of objects in a slab
    slab: introduce helper functions to get/set free object
    slab: factor out calculate nr objects in cache_estimate

    Linus Torvalds
     

08 Apr, 2014

6 commits

  • Statistics are not critical to the operation of the allocator but
    should also not cause too much overhead.

    When __this_cpu_inc() is altered to check if preemption is disabled,
    the statistics updates trigger that check. Use raw_cpu_inc() to avoid
    the checks. Using the this_cpu operations may cause interrupt
    disable/enable sequences on various arches, which may significantly
    impact allocator performance.
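
    A minimal sketch of the resulting statistics helper, simplified from
    mm/slub.c (illustrative, not the verbatim patch):

        static inline void stat(const struct kmem_cache *s, enum stat_item si)
        {
        #ifdef CONFIG_SLUB_STATS
                /*
                 * raw_cpu_inc() skips the preemption check that
                 * __this_cpu_inc() may perform; a racy increment is
                 * acceptable for statistics.
                 */
                raw_cpu_inc(s->cpu_slab->stat[si]);
        #endif
        }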

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Christoph Lameter
    Cc: Fengguang Wu
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The failure paths of sysfs_slab_add() don't release the allocation of
    'name' made by create_unique_id() a few lines earlier. Create a common
    exit path to make it more obvious what needs freeing.
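
    A sketch of the resulting pattern (simplified; the label names are
    illustrative): every failure branch funnels through one exit that
    frees 'name' when a unique id was generated:

        err = sysfs_create_group(&s->kobj, &slab_attr_group);
        if (err)
                goto out_del_kobj;
        kobject_uevent(&s->kobj, KOBJ_ADD);
    out:
        if (!unmergeable)
                kfree(name);    /* the unique id from create_unique_id() */
        return err;
    out_del_kobj:
        kobject_del(&s->kobj);
        kobject_put(&s->kobj);
        goto out;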

    [vdavydov@parallels.com: free the name only if !unmergeable]
    Signed-off-by: Dave Jones
    Signed-off-by: Vladimir Davydov
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Currently, we try to arrange sysfs entries for memcg caches in the same
    manner as for global caches. Apart from turning /sys/kernel/slab into a
    mess when there are a lot of kmem-active memcgs created, it actually
    does not work properly - we won't create more than one link to a memcg
    cache in case its parent is merged with another cache. For instance, if
    A is a root cache merged with another root cache B, we will have the
    following sysfs setup:

    X
    A -> X
    B -> X

    where X is some unique id (see create_unique_id()). Now if memcgs M and
    N start to allocate from cache A (or B, which is the same), we will get:

    X
    X:M
    X:N
    A -> X
    B -> X
    A:M -> X:M
    A:N -> X:N

    Since B is an alias for A, we won't get entries B:M and B:N, which is
    confusing.

    It is more logical to have entries for memcg caches under the
    corresponding root cache's sysfs directory. This would allow us to keep
    the sysfs layout clean and avoid inconsistencies like the one described
    above.

    This patch does the trick. It creates a "cgroup" kset in each root
    cache kobject to keep its children caches there.
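
    A simplified sketch of the mechanism (the memcg_kset field and the
    cache_kset() helper follow the patch, but details may differ):

        /*
         * In sysfs_slab_add(): pick the parent kset for the new kobject;
         * for a memcg cache this is its root cache's "cgroup" kset.
         */
        s->kobj.kset = cache_kset(s);
        err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
        if (err)
                goto out;

        if (is_root_cache(s)) {
                /* hold this cache's per-memcg children in <cache>/cgroup/ */
                s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj);
                if (!s->memcg_kset) {
                        err = -ENOMEM;
                        goto out;
                }
        }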

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Otherwise, kzalloc() called from a memcg won't clear the whole object.

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • When a kmem cache is created (kmem_cache_create_memcg()), we first try to
    find a compatible cache that already exists and can handle requests from
    the new cache, i.e. has the same object size, alignment, ctor, etc. If
    there is such a cache, we do not create any new caches, instead we simply
    increment the refcount of the cache found and return it.

    Currently we do this procedure not only when creating root caches, but
    also for memcg caches. However, there is no point in that, because, as
    every memcg cache has exactly the same parameters as its parent and cache
    merging cannot be turned off at runtime (only at boot, by passing
    "slub_nomerge"), the root caches of any two potentially mergeable memcg
    caches must already be merged, i.e. it must be the same root cache, and
    therefore we couldn't even get to the memcg cache creation, because it
    already exists.

    The only exception is boot caches - they are explicitly forbidden to be
    merged by setting their refcount to -1. There are currently only two of
    them - kmem_cache and kmem_cache_node, which are used in slab internals (I
    do not count kmalloc caches, as their refcount is set to 1 immediately
    after creation). Since they are preemptively prevented from merging, I
    guess we should avoid merging their children too.

    So let's remove the useless code responsible for merging memcg caches.
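
    The resulting check in kmem_cache_create_memcg(), sketched
    (simplified, not the verbatim diff):

        /* only root caches are candidates for merging */
        if (!memcg) {
                s = __kmem_cache_alias(name, size, align, flags, ctor);
                if (s)
                        goto out_unlock;
        }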

    Signed-off-by: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • slab_node() is actually a mempolicy function, so rename it to
    mempolicy_slab_node() to make it clearer that it is used for processes
    with mempolicies.

    At the same time, cleanup its code by saving numa_mem_id() in a local
    variable (since we require a node with memory, not just any node) and
    remove an obsolete comment that assumes the mempolicy is actually passed
    into the function.
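
    A simplified sketch of the renamed helper (the policy-mode handling
    is omitted here):

        static unsigned int mempolicy_slab_node(void)
        {
                struct mempolicy *policy;
                int node = numa_mem_id();       /* nearest node with memory */

                if (in_interrupt())
                        return node;

                policy = current->mempolicy;
                if (!policy || policy->flags & MPOL_F_LOCAL)
                        return node;

                /* MPOL_PREFERRED/MPOL_INTERLEAVE/MPOL_BIND cases elided */
                return node;
        }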

    Signed-off-by: David Rientjes
    Acked-by: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

04 Apr, 2014

2 commits

  • We release the slab_mutex while calling sysfs_slab_add from
    __kmem_cache_create since commit 66c4c35c6bc5 ("slub: Do not hold
    slub_lock when calling sysfs_slab_add()"), because kobject_uevent called
    by sysfs_slab_add might block waiting for the usermode helper to exec,
    which would result in a deadlock if we took the slab_mutex while
    executing it.

    However, apart from complicating synchronization rules, releasing the
    slab_mutex on kmem cache creation can result in a kmemcg-related race.
    The point is that we check if the memcg cache exists before going to
    __kmem_cache_create, but register the new cache in memcg subsys after
    it. Since we can drop the mutex there, several threads can see that the
    memcg cache does not exist and proceed to creating it, which is wrong.

    Fortunately, recently kobject_uevent was patched to call the usermode
    helper with the UMH_NO_WAIT flag, making the deadlock impossible.
    Therefore there is no point in releasing the slab_mutex while calling
    sysfs_slab_add, so let's simplify kmem_cache_create synchronization and
    fix the kmemcg-race mentioned above by holding the slab_mutex during the
    whole cache creation path.
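
    The simplified flow after the change, sketched (error handling
    omitted):

        mutex_lock(&slab_mutex);
        /*
         * The whole creation path, including sysfs_slab_add(), now runs
         * under slab_mutex. kobject_uevent() uses UMH_NO_WAIT, so it can
         * no longer block on the usermode helper and deadlock on the
         * mutex.
         */
        err = __kmem_cache_create(s, flags);
        mutex_unlock(&slab_mutex);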

    Signed-off-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Greg KH
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Since put_mems_allowed() is strictly optional - it's a seqcount retry -
    we don't need to evaluate the function if the allocation was in fact
    successful, saving an smp_rmb(), some loads, and comparisons on some
    relatively fast paths.

    Since the naming of get/put_mems_allowed() does suggest a mandatory
    pairing, rename the interface, as suggested by Mel, to resemble the
    seqcount interface.

    This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
    where it is important to note that the return value of the latter call
    is inverted from its previous incarnation.
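
    A usage sketch of the renamed interface (the alloc_pages() call
    stands in for the actual allocation path):

        unsigned int cookie;
        struct page *page;

        do {
                cookie = read_mems_allowed_begin();
                page = alloc_pages(gfp_mask, order);
                /*
                 * Retry only if the allocation failed and the cpuset's
                 * allowed mems changed underneath us. Note the inverted
                 * return value relative to the old put_mems_allowed().
                 */
        } while (!page && read_mems_allowed_retry(cookie));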

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Mar, 2014

1 commit

  • SLUB already tries to allocate high-order pages with __GFP_NOFAIL
    cleared. But when allocating a shadow page for kmemcheck, it missed
    clearing the flag. This triggered the WARN_ON_ONCE() reported by
    Christian Casteyde.

    https://bugzilla.kernel.org/show_bug.cgi?id=65991
    https://lkml.org/lkml/2013/12/3/764

    This patch fixes the situation by using the same allocation flags as
    the original allocation.
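
    A sketch of the fix in allocate_slab() (simplified): the kmemcheck
    shadow page now uses the same sanitized flags as the slab page
    itself:

        alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
        page = alloc_slab_page(alloc_gfp, node, oo);

        if (kmemcheck_enabled && page &&
            !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS)))
                /* pass alloc_gfp, not the original flags */
                kmemcheck_alloc_shadow(page, oo_order(oo), alloc_gfp, node);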

    Reported-by: Christian Casteyde
    Acked-by: David Rientjes
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim
     

11 Feb, 2014

2 commits

  • Vladimir reported the following issue:

    Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires
    remove_partial() to be called with n->list_lock held, but free_partial()
    called from kmem_cache_close() on cache destruction does not follow this
    rule, leading to a warning:

    WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
    Modules linked in:
    CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1
    Hardware name:
    0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
    0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
    ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
    Call Trace:
    __kmem_cache_shutdown+0x1b2/0x1f0
    kmem_cache_destroy+0x43/0xf0
    xfs_destroy_zones+0x103/0x110 [xfs]
    exit_xfs_fs+0x38/0x4e4 [xfs]
    SyS_delete_module+0x19a/0x1f0
    system_call_fastpath+0x16/0x1b

    His solution was to add a spinlock in order to quiet lockdep. Although
    there would be no contention in taking the lock, that lock also requires
    disabling interrupts, which would have a larger impact on the system.

    Instead of adding a spinlock to a location where it is not needed for
    lockdep, make a __remove_partial() function that does not test if the
    list_lock is held, as no one should hold it while the cache is being
    destroyed.

    Also add a __add_partial() function that does not do the lock
    validation either, as it is not needed for the creation of the cache.
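
    The resulting split, sketched (simplified from mm/slub.c):

        static inline void
        __remove_partial(struct kmem_cache_node *n, struct page *page)
        {
                list_del(&page->lru);
                n->nr_partial--;
        }

        static inline void
        remove_partial(struct kmem_cache_node *n, struct page *page)
        {
                lockdep_assert_held(&n->list_lock);
                __remove_partial(n, page);
        }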

    Signed-off-by: Steven Rostedt
    Reported-by: Vladimir Davydov
    Suggested-by: David Rientjes
    Acked-by: David Rientjes
    Acked-by: Vladimir Davydov
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • Commit c65c1877bd68 ("slub: use lockdep_assert_held") incorrectly
    required that add_full() and remove_full() hold n->list_lock. The lock
    is only taken when kmem_cache_debug(s), since that's the only time it
    actually does anything.

    Require that the lock only be taken under such a condition.
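
    The resulting helper, sketched (close to the actual fix):

        static void add_full(struct kmem_cache *s,
                             struct kmem_cache_node *n, struct page *page)
        {
                if (!kmem_cache_debug(s))
                        return;         /* the full list is debug-only */

                lockdep_assert_held(&n->list_lock);
                list_add(&page->lru, &n->full);
        }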

    Reported-by: Larry Finger
    Tested-by: Larry Finger
    Tested-by: Paul E. McKenney
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

03 Feb, 2014

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "Random bug fixes that have accumulated in my inbox over the past few
    months"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm: Fix warning on make htmldocs caused by slab.c
    mm: slub: work around unneeded lockdep warning
    mm: sl[uo]b: fix misleading comments
    slub: Fix possible format string bug.
    slub: use lockdep_assert_held
    slub: Fix calculation of cpu slabs
    slab.h: remove duplicate kmalloc declaration and fix kernel-doc warnings

    Linus Torvalds
     

31 Jan, 2014

2 commits

  • The slub code does some setup during early boot in
    early_kmem_cache_node_alloc() with some local data. There is no
    possible way that another CPU can see this data, so the slub code
    doesn't unnecessarily lock it. However, some new lockdep asserts
    check to make sure that add_partial() _always_ has the list_lock
    held.

    Just add the locking, even though it is technically unnecessary.
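
    The change in early_kmem_cache_node_alloc(), sketched (essentially
    the patch itself):

        /*
         * The lock is for lockdep's sake, not for any actual race
         * protection: no other CPU can see this node data yet.
         */
        spin_lock(&n->list_lock);
        add_partial(n, page, DEACTIVATE_TO_HEAD);
        spin_unlock(&n->list_lock);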

    Cc: Peter Zijlstra
    Cc: Russell King
    Acked-by: David Rientjes
    Signed-off-by: Dave Hansen
    Signed-off-by: Pekka Enberg

    Dave Hansen
     
  • Commit abca7c496584 ("mm: fix slab->page _count corruption when using
    slub") notes that we cannot _set_ page->counters directly, except
    when using a real double-cmpxchg. Doing so can lose updates to
    ->_count.

    That is an absolute rule:

    You may not *set* page->counters except via a cmpxchg.

    Commit abca7c496584 fixed this for the folks who have the slub
    cmpxchg_double code turned off at compile time, but it left the bad case
    alone. It can still be reached, and the same bug can trigger in two
    cases:

    1. Turning on slub debugging at runtime, which is available on
       the distro kernels that I looked at.
    2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
       CPUs, evidently).

    There are at least 3 ways we could fix this:

    1. Take all of the existing calls to cmpxchg_double_slab() and
       __cmpxchg_double_slab() and convert them to take an old, new
       and target 'struct page'.
    2. Do (1), but with the newly-introduced 'slub_data'.
    3. Do some magic inside the two cmpxchg...slab() functions to
       pull the counters out of new_counters and only set those
       fields in page->{inuse,frozen,objects}.

    I've done (2) as well, but it's a bunch more code. This patch is an
    attempt at (3). This was the most straightforward and foolproof way
    that I could think of to do this.

    This would also technically allow us to get rid of the ugly

    #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)

    in 'struct page', but leaving it alone has the added benefit that
    'counters' stays 'unsigned' instead of 'unsigned long', so all the
    copies that the slub code does stay a bit smaller.
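
    A sketch of approach (3), close to the helper the patch introduces:

        static void set_page_slub_counters(struct page *page,
                                           unsigned long counters_new)
        {
                struct page tmp;

                tmp.counters = counters_new;
                /*
                 * page->counters can cover page->_count. Copy only the
                 * fields slub owns, so a racing update of _count is
                 * never overwritten.
                 */
                page->frozen  = tmp.frozen;
                page->inuse   = tmp.inuse;
                page->objects = tmp.objects;
        }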

    Signed-off-by: Dave Hansen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: Pravin B Shelar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

30 Jan, 2014

1 commit

  • Commit 309381feaee5 ("mm: dump page when hitting a VM_BUG_ON using
    VM_BUG_ON_PAGE") added a bunch of VM_BUG_ON_PAGE() calls.

    But most of the ones in the slub code are for _temporary_ 'struct
    page's which are declared on the stack and likely have lots of gunk in
    them. Dumping their contents out will just confuse folks looking at
    bad_page() output. Plus, if we try to page_to_pfn() on them or
    something, we'll probably oops anyway.

    Turn them back into VM_BUG_ON()s.

    Signed-off-by: Dave Hansen
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

24 Jan, 2014

1 commit

  • Most of the VM_BUG_ON assertions are performed on a page. Usually, when
    one of these assertions fails we'll get a BUG_ON with a call stack and
    the registers.

    Based on recent requests to add a small piece of code that dumps the
    page at various VM_BUG_ON sites, I've noticed that the page dump is
    quite useful to people debugging issues in mm.

    This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
    VM_BUG_ON() does, also dumps the page before executing the actual
    BUG_ON.
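
    The new assertion, sketched (close to the definition added to
    linux/mmdebug.h):

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON_PAGE(cond, page)                      \
                do {                                            \
                        if (unlikely(cond)) {                   \
                                dump_page(page);                \
                                BUG();                          \
                        }                                       \
                } while (0)
        #else
        #define VM_BUG_ON_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
        #endif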

    [akpm@linux-foundation.org: fix up includes]
    Signed-off-by: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

14 Jan, 2014

2 commits


29 Dec, 2013

1 commit

  • /sys/kernel/slab/:t-0000048 # cat cpu_slabs
    231 N0=16 N1=215
    /sys/kernel/slab/:t-0000048 # cat slabs
    145 N0=36 N1=109

    See, the number of slabs is smaller than that of cpu slabs.

    The bug was introduced by commit 49e2258586b4 ("slub: per cpu cache
    for partial pages").

    We should use page->pages instead of page->pobjects when calculating
    the number of cpu partial slabs. This also fixes the mapping of slabs
    and nodes.

    As there's no variable storing the number of total/active objects in
    cpu partial slabs, and we don't have user interfaces requiring those
    statistics, I just added WARN_ON for those cases.
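
    A sketch of the fix in show_slab_objects() (simplified):

        page = ACCESS_ONCE(c->partial);
        if (page) {
                node = page_to_nid(page);
                if (flags & (SO_TOTAL | SO_OBJECTS))
                        WARN_ON_ONCE(1);        /* no object counts kept */
                else
                        x = page->pages;        /* not page->pobjects */
                total += x;
                nodes[node] += x;
        }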

    Cc: stable@vger.kernel.org # 3.2+
    Acked-by: Christoph Lameter
    Reviewed-by: Wanpeng Li
    Signed-off-by: Li Zefan
    Signed-off-by: Pekka Enberg

    Li Zefan
     

23 Nov, 2013

1 commit

  • Pull SLAB changes from Pekka Enberg:
    "The patches from Joonsoo Kim switch mm/slab.c to use 'struct page' for
    slab internals similar to mm/slub.c. This reduces memory usage and
    improves performance:

    https://lkml.org/lkml/2013/10/16/155

    Rest of the changes are bug fixes from various people"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (21 commits)
    mm, slub: fix the typo in mm/slub.c
    mm, slub: fix the typo in include/linux/slub_def.h
    slub: Handle NULL parameter in kmem_cache_flags
    slab: replace non-existing 'struct freelist *' with 'void *'
    slab: fix to calm down kmemleak warning
    slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled
    slab: rename slab_bufctl to slab_freelist
    slab: remove useless statement for checking pfmemalloc
    slab: use struct page for slab management
    slab: replace free and inuse in struct slab with newly introduced active
    slab: remove SLAB_LIMIT
    slab: remove kmem_bufctl_t
    slab: change the management method of free objects of the slab
    slab: use __GFP_COMP flag for allocating slab pages
    slab: use well-defined macro, virt_to_slab()
    slab: overloading the RCU head over the LRU for RCU free
    slab: remove cachep in struct slab_rcu
    slab: remove nodeid in struct slab
    slab: remove colouroff in struct slab
    slab: change return type of kmem_getpages() to struct page
    ...

    Linus Torvalds
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds
     

13 Nov, 2013

1 commit


12 Nov, 2013

2 commits

  • Acked-by: Christoph Lameter
    Signed-off-by: Zhi Yong Wu
    Signed-off-by: Pekka Enberg

    Zhi Yong Wu
     
  • Andreas Herrmann writes:

    When I've used the slub_debug kernel option (e.g.
    "slub_debug=,skbuff_fclone_cache" or similar) in a debug session I've
    seen a panic like:

    Highbank #setenv bootargs console=ttyAMA0 root=/dev/sda2 kgdboc.kgdboc=ttyAMA0,115200 slub_debug=,kmalloc-4096 earlyprintk=ttyAMA0
    ...
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = c0004000
    [00000000] *pgd=00000000
    Internal error: Oops: 5 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Tainted: G W 3.12.0-00048-gbe408cd #314
    task: c0898360 ti: c088a000 task.ti: c088a000
    PC is at strncmp+0x1c/0x84
    LR is at kmem_cache_flags.isra.46.part.47+0x44/0x60
    pc : [] lr : [] psr: 200001d3
    sp : c088bea8 ip : c088beb8 fp : c088beb4
    r10: 00000000 r9 : 413fc090 r8 : 00000001
    r7 : 00000000 r6 : c2984a08 r5 : c0966e78 r4 : 00000000
    r3 : 0000006b r2 : 0000000c r1 : 00000000 r0 : c2984a08
    Flags: nzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
    Control: 10c5387d Table: 0000404a DAC: 00000015
    Process swapper (pid: 0, stack limit = 0xc088a248)
    Stack: (0xc088bea8 to 0xc088c000)
    bea0: c088bed4 c088beb8 c0110a3c c02c6d90 c0966e78 00000040
    bec0: ef001f00 00000040 c088bf14 c088bed8 c0112070 c0110a04 00000005 c010fac8
    bee0: c088bf5c c088bef0 c010fac8 ef001f00 00000040 00000000 00000040 00000001
    bf00: 413fc090 00000000 c088bf34 c088bf18 c0839190 c0112040 00000000 ef001f00
    bf20: 00000000 00000000 c088bf54 c088bf38 c0839200 c083914c 00000006 c0961c4c
    bf40: c0961c28 00000000 c088bf7c c088bf58 c08392ac c08391c0 c08a2ed8 c0966e78
    bf60: c086b874 c08a3f50 c0961c28 00000001 c088bfb4 c088bf80 c083b258 c0839248
    bf80: 2f800000 0f000000 c08935b4 ffffffff c08cd400 ffffffff c08cd400 c0868408
    bfa0: c29849c0 00000000 c088bff4 c088bfb8 c0824974 c083b1e4 ffffffff ffffffff
    bfc0: c08245c0 00000000 00000000 c0868408 00000000 10c5387d c0892bcc c0868404
    bfe0: c0899440 0000406a 00000000 c088bff8 00008074 c0824824 00000000 00000000
    [] (strncmp+0x1c/0x84) from [] (kmem_cache_flags.isra.46.part.47+0x44/0x60)
    [] (kmem_cache_flags.isra.46.part.47+0x44/0x60) from [] (__kmem_cache_create+0x3c/0x410)
    [] (__kmem_cache_create+0x3c/0x410) from [] (create_boot_cache+0x50/0x74)
    [] (create_boot_cache+0x50/0x74) from [] (create_kmalloc_cache+0x4c/0x88)
    [] (create_kmalloc_cache+0x4c/0x88) from [] (create_kmalloc_caches+0x70/0x114)
    [] (create_kmalloc_caches+0x70/0x114) from [] (kmem_cache_init+0x80/0xe0)
    [] (kmem_cache_init+0x80/0xe0) from [] (start_kernel+0x15c/0x318)
    [] (start_kernel+0x15c/0x318) from [] (0x8074)
    Code: e3520000 01a00002 089da800 e5d03000 (e5d1c000)
    ---[ end trace 1b75b31a2719ed1d ]---
    Kernel panic - not syncing: Fatal exception

    The problem is that the slub_debug option is not parsed before
    create_boot_cache() is called. Solve this by changing slub_debug to an
    early_param.

    Kernels 3.11, 3.10 are also affected. I am not sure about older
    kernels.

    Christoph Lameter explains:

    kmem_cache_flags may be called with NULL parameter during early boot.
    Skip the test in that case.
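
    Both parts of the fix, sketched (simplified from mm/slub.c):

        static unsigned long kmem_cache_flags(unsigned long object_size,
                unsigned long flags, const char *name, void (*ctor)(void *))
        {
                /*
                 * May be called with a NULL name during early boot; skip
                 * the per-slab name match in that case.
                 */
                if (slub_debug && (!slub_debug_slabs || (name &&
                    !strncmp(slub_debug_slabs, name,
                             strlen(slub_debug_slabs)))))
                        flags |= slub_debug;

                return flags;
        }

        /* parse "slub_debug=..." before the first caches are created */
        early_param("slub_debug", setup_slub_debug);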

    Cc: stable@vger.kernel.org # 3.10 and 3.11
    Reported-by: Andreas Herrmann
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

25 Oct, 2013

1 commit

  • Move all kmemleak calls into hook functions, and make it so
    that all hooks (both inside and outside of #ifdef CONFIG_SLUB_DEBUG)
    call the appropriate kmemleak routines. This allows kmemleak
    to be configured independently of slub debug features.

    It also fixes a bug where kmemleak was only partially enabled in some
    configurations.
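
    For example, the !CONFIG_SLUB_DEBUG variant of the free hook now
    looks roughly like this (simplified):

        static inline void slab_free_hook(struct kmem_cache *s, void *x)
        {
                kmemleak_free_recursive(x, s->flags);
        }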

    Acked-by: Catalin Marinas
    Acked-by: Christoph Lameter
    Signed-off-by: Roman Bobniev
    Signed-off-by: Tim Bird
    Signed-off-by: Pekka Enberg

    Roman Bobniev
     

18 Oct, 2013

1 commit


15 Sep, 2013

1 commit

  • Pull SLAB update from Pekka Enberg:
    "Nothing terribly exciting here apart from Christoph's kmalloc
    unification patches that brings sl[aou]b implementations closer to
    each other"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slab: Use correct GFP_DMA constant
    slub: remove verify_mem_not_deleted()
    mm/sl[aou]b: Move kmallocXXX functions to common code
    mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
    mm/slub.c: beautify code for removing redundancy 'break' statement.
    slub: Remove unnecessary page NULL check
    slub: don't use cpu partial pages on UP
    mm/slub: beautify code for 80 column limitation and tab alignment
    mm/slub: remove 'per_cpu' which is useless variable

    Linus Torvalds
     

12 Sep, 2013

1 commit


05 Sep, 2013

2 commits

  • I do not see any user for this code in the tree.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The kmalloc* functions of all slab allocators are similar now, so
    let's move them into slab.h. This requires some function naming changes
    in slob.

    As a result of this patch there is a common set of functions for
    all allocators. It also means that kmalloc_large() is now available
    in general to perform large-order allocations that go directly
    via the page allocator. kmalloc_large() can be substituted if
    kmalloc() throws warnings because of too-large allocations.

    kmalloc_large() has exactly the same semantics as kmalloc() but
    can only be used for allocations > PAGE_SIZE.
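
    A usage sketch (the GFP flags are illustrative):

        void *buf;

        if (size > PAGE_SIZE)   /* goes straight to the page allocator */
                buf = kmalloc_large(size, GFP_KERNEL);
        else
                buf = kmalloc(size, GFP_KERNEL);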

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

13 Aug, 2013

2 commits


09 Aug, 2013

1 commit

  • This reverts commit 318df36e57c0ca9f2146660d41ff28e8650af423.

    This commit caused Steven Rostedt's hackbench runs to run out of memory
    due to a leak. As noted by Joonsoo Kim, it is buggy in the following
    scenario:

    "I guess, you may set 0 to all kmem caches's cpu_partial via sysfs,
    doesn't it?

    In this case, memory leak is possible in following case. Code flow of
    possible leak is follwing case.

    * in __slab_free()
    1. (!new.inuse || !prior) && !was_frozen
    2. !kmem_cache_debug && !prior
    3. new.frozen = 1
    4. after cmpxchg_double_slab, run the (!n) case with new.frozen=1
    5. with this patch, put_cpu_partial() doesn't do anything,
    because this cache's cpu_partial is 0
    6. return

    In step 5, leak occur"

    And Steven does indeed have cpu_partial set to 0 due to RT testing.

    Joonsoo is cooking up a patch, but everybody agrees that reverting this
    for now is the right thing to do.

    Reported-and-bisected-by: Steven Rostedt
    Acked-by: Joonsoo Kim
    Acked-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Jul, 2013

1 commit


15 Jul, 2013

4 commits

  • Remove 'per_cpu', since it is useless now after commit 205ab99
    ("slub: Update statistics handling for variable order slabs"). The
    partial list is now handled in the same way as the per-cpu slab.

    Acked-by: Christoph Lameter
    Signed-off-by: Chen Gang
    Signed-off-by: Pekka Enberg

    Chen Gang
     
  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull slab update from Pekka Enberg:
    "Highlights:

    - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

    - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

    - Fix for excessive slab freelist draining (Wanpeng Li)

    - SLUB and SLOB cleanups and fixes (various people)"

    I ended up editing the branch, and this avoids two commits at the end
    that were immediately reverted, and I instead just applied the oneliner
    fix in between myself.

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
    slub: Check for page NULL before doing the node_match check
    mm/slab: Give s_next and s_stop slab-specific names
    slob: Check for NULL pointer before calling ctor()
    slub: Make cpu partial slab support configurable
    slab: add kmalloc() to kernel API documentation
    slab: fix init_lock_keys
    slob: use DIV_ROUND_UP where possible
    slub: do not put a slab to cpu partial list when cpu_partial is 0
    mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
    mm/slub: Drop unnecessary nr_partials
    mm/slab: Fix /proc/slabinfo unwriteable for slab
    mm/slab: Sharing s_next and s_stop between slab and slub
    mm/slab: Fix drain freelist excessively
    slob: Rework #ifdeffery in slab.h
    mm, slab: moved kmem_cache_alloc_node comment to correct place

    Linus Torvalds
     
  • In the -rt kernel (mrg), we hit the following dump:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] kmem_cache_alloc_node+0x51/0x180
    PGD a2d39067 PUD b1641067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
    CPU 3
    Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
    RIP: 0010:[] [] kmem_cache_alloc_node+0x51/0x180
    RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213
    RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
    RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
    RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
    R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
    R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
    FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
    Stack:
    ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
    0000000001200011 0000000001200011 0000000000000000 0000000000000000
    00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
    Call Trace:
    [] ? current_has_perm+0x68/0x80
    [] copy_process+0xdd/0x15b0
    [] ? rt_up_read+0x25/0x30
    [] do_fork+0x5a/0x360
    [] ? migrate_enable+0xeb/0x220
    [] sys_clone+0x28/0x30
    [] stub_clone+0x13/0x20
    [] ? system_call_fastpath+0x16/0x1b
    Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
    RIP [] kmem_cache_alloc_node+0x51/0x180
    RSP
    CR2: 0000000000000000
    ---[ end trace 0000000000000002 ]---

    Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
    with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
    disable migration. But the SLUB code is relatively lockless, and the
    spin_locks there are raw_spin_locks (not converted to mutexes), thus I
    believe this bug can happen in mainline without -rt features. The -rt
    patch is just good at triggering mainline bugs ;-)

    Anyway, looking at where this crashed, it seems that the page variable
    can be NULL when passed to the node_match() function (which does not
    check if it is NULL). When this happens we get the above panic.

    As page is only used in slab_alloc() to check if the node matches, if
    it's NULL I'm assuming that we can say it doesn't and call the
    __slab_alloc() code. Is this a correct assumption?
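
    The fix in slab_alloc_node(), sketched (essentially the one-liner):

        object = c->freelist;
        page = c->page;
        /* a NULL per-cpu page counts as a mismatch: take the slow path */
        if (unlikely(!object || !page || !node_match(page, node)))
                object = __slab_alloc(s, gfpflags, node, addr, c);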

    Acked-by: Christoph Lameter
    Signed-off-by: Steven Rostedt
    Signed-off-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

08 Jul, 2013

1 commit

  • CPU partial support can introduce a level of indeterminism that is not
    wanted in certain contexts (like a realtime kernel). Make it
    configurable.
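
    The gate added by the patch, sketched (close to the actual helper):

        static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
        {
        #ifdef CONFIG_SLUB_CPU_PARTIAL
                return !kmem_cache_debug(s);
        #else
                return false;
        #endif
        }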

    This patch is based on Christoph Lameter's "slub: Make cpu partial slab
    support configurable V2".

    Acked-by: Christoph Lameter
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Pekka Enberg

    Joonsoo Kim