04 Jun, 2011

1 commit

  • On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled,
    the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early
    panic during boot unless we always align cpu_slab properly.

    In principle we could remove the alignment-testing VM_BUG_ON() for
    architectures that don't have CMPXCHG_LOCAL, but leaving it in means
    that new code will tend not to break x86 even if it is introduced
    on another platform, and it's low cost to require alignment.

    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Signed-off-by: Chris Metcalf
    Signed-off-by: Pekka Enberg

    Chris Metcalf
     

26 May, 2011

1 commit

  • Commit a71ae47a2cbf ("slub: Fix double bit unlock in debug mode")
    removed the only goto to this label, resulting in

    mm/slub.c: In function '__slab_alloc':
    mm/slub.c:1834: warning: label 'unlock_out' defined but not used

    This is fixed trivially by removing the label itself too.

    Reported-by: Stephen Rothwell
    Cc: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 May, 2011

1 commit

  • Commit 442b06bcea23 ("slub: Remove node check in slab_free") added a
    call to deactivate_slab() in the debug case in __slab_alloc(), which
    unlocks the current slab used for allocation. Going to the label
    'unlock_out' then does it again.

    Also, in the debug case we do not need all the other processing that the
    'unlock_out' path does. We always fall back to the slow path in the
    debug case. So the tid update is useless.

    Similarly, the ALLOC_SLOWPATH statistic would just be incremented for
    every allocation, which is equally useless.

    So simply restore irq flags and return the object.
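
    For reference, a sketch of the resulting debug path in __slab_alloc()
    (reconstructed; names as in mm/slub.c of that era, details may differ):

    debug:
            if (!alloc_debug_processing(s, c->page, object, addr))
                    goto another_slab;

            c->freelist = get_freepointer(s, object);
            deactivate_slab(s, c);          /* unlocks the slab */
            c->page = NULL;
            c->node = NUMA_NO_NODE;
            local_irq_restore(flags);       /* instead of 'goto unlock_out' */
            return object;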

    Signed-off-by: Christoph Lameter
    Reported-and-bisected-by: James Morris
    Reported-by: Ingo Molnar
    Reported-by: Jens Axboe
    Cc: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

21 May, 2011

1 commit

  • We can set the page pointer in the percpu structure to
    NULL to have the same effect as setting c->node to NUMA_NO_NODE.

    Gets rid of one check in slab_free() that was only used for
    forcing the slab_free to the slowpath for debugging.

    We still need to set c->node to NUMA_NO_NODE to force the
    slab_alloc() fastpath to the slowpath in case of debugging.
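
    With c->page set to NULL on deactivation, the slab_free() fastpath
    check can drop the node test; an illustrative diff:

    -       if (likely(page == c->page && c->node != NUMA_NO_NODE)) {
    +       if (likely(page == c->page)) {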

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

18 May, 2011

3 commits

  • Jumping to a label inside a conditional is considered poor style,
    especially considering the current organization of __slab_alloc().

    This removes the 'load_from_page' label and just duplicates the three
    lines of code that it uses:

    c->node = page_to_nid(page);
    c->page = page;
    goto load_freelist;

    since it's probably not worth making this a separate helper function.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     
  • The fastpath can do a speculative access to a page that
    CONFIG_DEBUG_PAGEALLOC may have marked as invalid in order to retrieve
    the pointer to the next free object.

    Use probe_kernel_read() in that case so that the access does not cause
    a page fault.
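
    A sketch of the guarded read, mirroring the get_freepointer_safe()
    helper this change introduces (details may differ):

    static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
    {
            void *p;

    #ifdef CONFIG_DEBUG_PAGEALLOC
            /* The page may be unmapped; read without risking a fault. */
            probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p));
    #else
            p = get_freepointer(s, object);
    #endif
            return p;
    }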

    Cc: # 38.x
    Reported-by: Eric Dumazet
    Signed-off-by: Christoph Lameter
    Signed-off-by: Eric Dumazet
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Move the #ifdef so that get_map() is only defined when CONFIG_SLUB_DEBUG
    is enabled.
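
    A sketch of the resulting placement (function body as in mm/slub.c of
    the time; treat as illustrative):

    #ifdef CONFIG_SLUB_DEBUG
    /*
     * Determine a map of objects in use in a slab page.  get_map() is
     * only needed by the debug code, so it now lives inside the
     * existing CONFIG_SLUB_DEBUG block.
     */
    static void get_map(struct kmem_cache *s, struct page *page,
                        unsigned long *map)
    {
            void *p;
            void *addr = page_address(page);

            for (p = page->freelist; p; p = get_freepointer(s, p))
                    set_bit(slab_index(p, s, addr), map);
    }
    #endif /* CONFIG_SLUB_DEBUG */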

    Reported-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

08 May, 2011

1 commit

  • Remove the #ifdefs. This means that irqsafe_cpu_cmpxchg_double() is used
    everywhere.

    There may be performance implications since:

    A. We now have to manage a transaction ID for all arches

    B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced
    to a very short irqoff section.

    As a result of this change there are no multiple irqoff/irqon sequences;
    even in the fallback case we only have to do one disable/enable pair,
    as before.
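
    A condensed sketch of the allocation fastpath after the change
    (illustrative; based on the mm/slub.c code of that era):

    redo:
            tid = c->tid;
            barrier();
            object = c->freelist;
            if (unlikely(!object || !node_match(c, node)))
                    object = __slab_alloc(s, gfpflags, node, addr, c);
            else {
                    /* Replace freelist and tid in one atomic operation */
                    if (unlikely(!irqsafe_cpu_cmpxchg_double(
                                    s->cpu_slab->freelist, s->cpu_slab->tid,
                                    object, tid,
                                    get_freepointer(s, object), next_tid(tid)))) {
                            note_cmpxchg_failure("slab_alloc", s, tid);
                            goto redo;
                    }
                    stat(s, ALLOC_FASTPATH);
            }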

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

05 May, 2011

1 commit

  • The SLUB allocator's use of the cmpxchg_double logic was wrong: it
    actually needs the irq-safe one.

    That happens automatically when we use the native unlocked 'cmpxchg8b'
    instruction, but when compiling the kernel for older x86 CPUs that do
    not support that instruction, we fall back to the generic emulation
    code.

    And if you don't specify that you want the irq-safe version, the generic
    code ends up just open-coding the cmpxchg8b equivalent without any
    protection against interrupts or preemption. Which definitely doesn't
    work for SLUB.

    This was reported by Werner Landgraf, who saw
    instability with his distro-kernel that was compiled to support pretty
    much everything under the sun. Most big Linux distributions tend to
    compile for PPro and later, and would never have noticed this problem.

    This also fixes the prototypes for the irqsafe cmpxchg_double functions
    to use 'bool' like they should.

    [ Btw, that whole "generic code defaults to no protection" design just
    sounds stupid - if the code needs no protection, there is no reason to
    use "cmpxchg_double" to begin with. So we should probably just remove
    the unprotected version entirely as pointless. - Linus ]
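
    To illustrate why the irq-safe variant matters, here is a hypothetical
    rendering of the generic fallback (not the actual percpu macro):

    /*
     * Hypothetical shape of the generic (non-cmpxchg8b) fallback.  The
     * local_irq_save()/restore() pair is what the irq-safe variant adds:
     * without it, an interrupt between the compare and the store can
     * corrupt the freelist/tid pair.
     */
    static bool generic_cmpxchg_double(unsigned long *p1, unsigned long *p2,
                                       unsigned long o1, unsigned long o2,
                                       unsigned long n1, unsigned long n2)
    {
            bool ret = false;
            unsigned long flags;

            local_irq_save(flags);
            if (*p1 == o1 && *p2 == o2) {
                    *p1 = n1;
                    *p2 = n2;
                    ret = true;
            }
            local_irq_restore(flags);
            return ret;
    }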

    Signed-off-by: Thomas Gleixner
    Reported-and-tested-by: werner
    Acked-and-tested-by: Ingo Molnar
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Jens Axboe
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

25 Mar, 2011

2 commits

  • It turns out that the cmpxchg16b emulation has to access vmalloced
    percpu memory with interrupts disabled. If the memory has never
    been touched before, then the fault necessary to establish the
    mapping will not occur and the kernel will fail on boot.

    Fix that by reusing the CONFIG_PREEMPT code that writes the
    cpu number into a field on every cpu. Writing to the per cpu
    area beforehand causes the mapping to be established before we
    reach the cmpxchg16b emulation.
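
    A sketch of the initialization being reused (illustrative; per-cpu tid
    seeding as in mm/slub.c):

    static void init_kmem_cache_cpus(struct kmem_cache *s)
    {
            int cpu;

            /*
             * Writing the initial tid touches every cpu's area once,
             * which both seeds the transaction id and faults in the
             * vmalloc'ed percpu mapping before the first irq-disabled
             * cmpxchg16b emulation can run.
             */
            for_each_possible_cpu(cpu)
                    per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
    }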

    Tested-by: Ingo Molnar
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • On Thu, 24 Mar 2011, Ingo Molnar wrote:
    > RIP: 0010:[] [] get_next_timer_interrupt+0x119/0x260

    That's a typical timer crash, but you were unable to debug it with
    debugobjects because commit d3f661d6 broke those.

    Cc: Christoph Lameter
    Tested-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Pekka Enberg

    Thomas Gleixner
     

11 Mar, 2011

2 commits

  • Use the this_cpu_cmpxchg_double functionality to implement a lockless
    allocation algorithm on arches that support fast this_cpu_ops.

    Each of the per cpu pointers is paired with a transaction id that ensures
    that updates of the per cpu information can only occur in sequence on
    a certain cpu.

    A transaction id is a "long" integer comprised of an event number and
    the cpu number. The event number is incremented on every change to the
    per cpu state. The cmpxchg instruction can thereby verify that nothing
    interfered with an update, that we are updating the percpu structure of
    the processor where we picked up the information, and that we are still
    on that processor when we perform the update.

    This results in a significant decrease of the overhead in the fastpaths. It
    also makes it easy to adopt the fast path for realtime kernels since this
    is lockless and does not require the use of the current per cpu area
    over the critical section. It is only important that the per cpu area is
    current at the beginning of the critical section and at the end.

    So there is no need even to disable preemption.
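
    For reference, a sketch of the tid encoding (close to the mm/slub.c
    helpers; treat details as illustrative):

    #ifdef CONFIG_PREEMPT
    /*
     * Preemption may move us to another cpu, so the tid must encode the
     * cpu number in its low bits; the event counter advances in steps
     * of TID_STEP.
     */
    #define TID_STEP  roundup_pow_of_two(CONFIG_NR_CPUS)
    #else
    /* No preemption: a plain event counter is sufficient */
    #define TID_STEP 1
    #endif

    static inline unsigned long next_tid(unsigned long tid)
    {
            return tid + TID_STEP;
    }

    static inline unsigned int tid_to_cpu(unsigned long tid)
    {
            return tid % TID_STEP;
    }

    static inline unsigned long tid_to_event(unsigned long tid)
    {
            return tid / TID_STEP;
    }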

    Test results show that the fastpath cycle count is reduced by up to ~ 40%
    (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
    adds a few cycles.

    Sadly this does nothing for the slowpath, which is where the main
    performance issues in slub are, but the best case performance rises
    significantly. (For that, see the more complex slub patches that
    require cmpxchg_double.)

    Kmalloc: alloc/free test

    Before:

    10000 times kmalloc(8)/kfree -> 134 cycles
    10000 times kmalloc(16)/kfree -> 152 cycles
    10000 times kmalloc(32)/kfree -> 144 cycles
    10000 times kmalloc(64)/kfree -> 142 cycles
    10000 times kmalloc(128)/kfree -> 142 cycles
    10000 times kmalloc(256)/kfree -> 132 cycles
    10000 times kmalloc(512)/kfree -> 132 cycles
    10000 times kmalloc(1024)/kfree -> 135 cycles
    10000 times kmalloc(2048)/kfree -> 135 cycles
    10000 times kmalloc(4096)/kfree -> 135 cycles
    10000 times kmalloc(8192)/kfree -> 144 cycles
    10000 times kmalloc(16384)/kfree -> 754 cycles

    After:

    10000 times kmalloc(8)/kfree -> 78 cycles
    10000 times kmalloc(16)/kfree -> 78 cycles
    10000 times kmalloc(32)/kfree -> 82 cycles
    10000 times kmalloc(64)/kfree -> 88 cycles
    10000 times kmalloc(128)/kfree -> 79 cycles
    10000 times kmalloc(256)/kfree -> 79 cycles
    10000 times kmalloc(512)/kfree -> 85 cycles
    10000 times kmalloc(1024)/kfree -> 82 cycles
    10000 times kmalloc(2048)/kfree -> 82 cycles
    10000 times kmalloc(4096)/kfree -> 85 cycles
    10000 times kmalloc(8192)/kfree -> 82 cycles
    10000 times kmalloc(16384)/kfree -> 706 cycles

    Kmalloc: Repeatedly allocate then free test

    Before:

    10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
    10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
    10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
    10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
    10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
    10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
    10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
    10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
    10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
    10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
    10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
    10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles

    After:

    10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
    10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
    10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
    10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
    10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
    10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
    10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
    10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
    10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
    10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
    10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
    10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • The following patch will make the fastpaths lockless and will no longer
    require interrupts to be disabled, so calling the free hook with irqs
    disabled will no longer be possible.

    Move the slab_free_hook_irq() logic into slab_free_hook(). Only disable
    interrupts if features are selected that require the callbacks to run
    with interrupts off, and re-enable them after the calls have been made.
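
    A sketch of the merged hook (close to the code this change introduces;
    comments are mine):

    static inline void slab_free_hook(struct kmem_cache *s, void *x)
    {
            kmemleak_free_recursive(x, s->flags);

            /*
             * The lockless fastpath no longer disables interrupts, but
             * kmemcheck and lockdep expect to be called with irqs off,
             * so disable them just around those calls.
             */
    #if defined(CONFIG_KMEMCHECK) || defined(CONFIG_LOCKDEP)
            {
                    unsigned long flags;

                    local_irq_save(flags);
                    kmemcheck_slab_free(s, x, s->objsize);
                    debug_check_no_locks_freed(x, s->objsize);
                    if (s->flags & SLAB_DEBUG_OBJECTS)
                            debug_check_no_obj_freed(x, s->objsize);
                    local_irq_restore(flags);
            }
    #endif
    }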

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

27 Feb, 2011

1 commit

  • mm/slub.c: In function 'ksize':
    mm/slub.c:2728: error: implicit declaration of function 'slab_ksize'

    slab_ksize() needs to move out of the CONFIG_SLUB_DEBUG section.

    Acked-by: Randy Dunlap
    Acked-by: David Rientjes
    Signed-off-by: Mariusz Kozlowski
    Signed-off-by: Pekka Enberg

    Mariusz Kozlowski
     

23 Feb, 2011

1 commit

  • Recent use of ksize() in the network stack (commit ca44ac38: "net: don't
    reallocate skb->head unless the current one hasn't the needed extra size
    or is shared") triggers kmemcheck warnings, because ksize() can return
    more space than kmemcheck is aware of.

    Pekka Enberg noticed that SLAB+kmemcheck does the right thing, while
    SLUB+kmemcheck doesn't.

    Bugzilla reference #27212
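
    The fix makes SLUB report to kmemcheck the same size that ksize() later
    reports to callers; a sketch, assuming the slab_post_alloc_hook()
    helper of that era:

    static inline void slab_post_alloc_hook(struct kmem_cache *s,
                                            gfp_t flags, void *object)
    {
            flags &= gfp_allowed_mask;
            /* was: kmemcheck_slab_alloc(s, flags, object, s->objsize); */
            kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
            kmemleak_alloc_recursive(object, s->objsize, 1, s->flags, flags);
    }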

    Reported-by: Christian Casteyde
    Suggested-by: Pekka Enberg
    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    CC: Changli Gao
    CC: Andrew Morton
    Signed-off-by: Pekka Enberg

    Eric Dumazet
     

11 Jan, 2011

2 commits

  • The purpose of the locking is to prevent removals and additions
    of nodes while statistics are gathered for a slab cache. So we
    need to avoid racing with memory hotplug functionality.

    It is enough to take the memory hotplug locks there instead
    of the slub_lock.

    online_pages() currently does not acquire the memory_hotplug
    lock. Another patch will be submitted by the memory hotplug
    authors to take the memory hotplug lock and to describe its use
    in protecting non-hotplug data structures against the addition
    and removal of nodes.
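
    Illustratively, the gathering path in show_slab_objects() switches
    locks roughly as follows (surrounding lines are a sketch):

            lock_memory_hotplug();          /* instead of slub_lock */
            for_each_node_state(node, N_NORMAL_MEMORY) {
                    struct kmem_cache_node *n = get_node(s, node);

                    /* accumulate n->nr_slabs / n->total_objects here */
            }
            unlock_memory_hotplug();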

    Cc: # 2.6.37
    Reported-and-tested-by: Bart Van Assche
    Acked-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Fix a crash during slabinfo -v
    tracing/slab: Move kmalloc tracepoint out of inline code
    slub: Fix slub_lock down/up imbalance
    slub: Fix build breakage in Documentation/vm
    slub tracing: move trace calls out of always inlined functions to reduce kernel code size
    slub: move slabinfo.c to tools/slub/slabinfo.c

    Linus Torvalds
     

04 Dec, 2010

1 commit

  • Commit f7cb1933621bce66a77f690776a16fe3ebbc4d58 ("SLUB: Pass active
    and inactive redzone flags instead of boolean to debug functions")
    missed two instances of check_object(). This caused a lot of warnings
    during 'slabinfo -v', eventually leading to a crash:

    BUG ext4_xattr: Freepointer corrupt
    ...
    BUG buffer_head: Freepointer corrupt
    ...
    BUG ext4_alloc_context: Freepointer corrupt
    ...
    ...
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [] file_sb_list_del+0x1c/0x35
    PGD 79d78067 PUD 79e67067 PMD 0
    Oops: 0002 [#1] SMP
    last sysfs file: /sys/kernel/slab/:t-0000192/validate

    This patch fixes the problem by converting the two missed instances.
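
    The fix converts the two remaining integer-flag calls in validate_slab()
    to the new redzone flags; a sketch of the diff (call sites may differ
    slightly):

    -           if (!check_object(s, page, p, 0))
    +           if (!check_object(s, page, p, SLUB_RED_INACTIVE))

    -           if (!check_object(s, page, p, 1))
    +           if (!check_object(s, page, p, SLUB_RED_ACTIVE))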

    Acked-by: Christoph Lameter
    Signed-off-by: Tero Roponen
    Signed-off-by: Pekka Enberg

    Tero Roponen
     

14 Nov, 2010

1 commit

  • There are two places that do not release the slub_lock.

    The respective bugs were introduced by the sysfs changes ab4d5ed5
    ("slub: Enable sysfs support for !CONFIG_SLUB_DEBUG") and 2bce6485
    ("slub: Allow removal of slab caches during boot").
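
    The general shape of the bug and fix, as a purely hypothetical example
    (example_register_cache() and do_registration() are made-up names, not
    the actual call sites):

    static int example_register_cache(struct kmem_cache *s)
    {
            int err;

            down_write(&slub_lock);
            err = do_registration(s);      /* hypothetical helper */
            if (err) {
                    up_write(&slub_lock);   /* this release was missing */
                    return err;
            }
            list_add(&s->list, &slab_caches);
            up_write(&slub_lock);
            return 0;
    }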

    Acked-by: Christoph Lameter
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Pekka Enberg

    Pavel Emelyanov
     
