27 Aug, 2011

1 commit

  • When a slab has just one free object, adding it to the head of the partial
    list doesn't make sense, and it can cause lock contention. For example:
    1. CPU takes the slab from partial list
    2. fetch an object
    3. switch to another slab
    4. free an object, then the slab is added to partial list again
    In this way n->list_lock will be heavily contended.
    In fact, Alex saw a hackbench regression: 3.1-rc1 performance dropped about
    70% against 3.0. This patch fixes it.

    Acked-by: Christoph Lameter
    Reported-by: Alex Shi
    Signed-off-by: Shaohua Li
    Signed-off-by: Pekka Enberg

    Shaohua Li
     

10 Aug, 2011

1 commit

  • deactivate_slab() gets the comparison wrong when checking whether more than
    the minimum number of partial pages are on the partial list. One effect of
    this may be that empty pages are not freed from deactivate_slab(). The
    result could be an OOM due to growth of the partial slabs per node. Frees
    mostly occur from __slab_free(), which is okay, so this would only affect
    use cases where a lot of switching around of per-cpu slabs occurs.

    Switching per cpu slabs occurs with high frequency if debugging options are
    enabled.

    Reported-and-tested-by: Xiaotian Feng
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

09 Aug, 2011

2 commits

  • The check_bytes() function is used by slub debugging. It returns a pointer
    to the first byte in the given memory area that does not match the given
    character.

    If the byte value to match is greater than 0x80, check_bytes() doesn't
    work, because the 64-bit pattern is generated as below:

    value64 = value | value << 8 | value << 16 | value << 24;
    value64 = value64 | value64 << 32;

    The integer promotions are performed and the result is sign-extended, as
    the type of value is u8. The upper 32 bits of value64 are 0xffffffff after
    the first line, so the second line has no effect.

    This fixes the 64-bit pattern generation.

    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Cc: Matt Mackall
    Reviewed-by: Marcin Slusarz
    Acked-by: Eric Dumazet
    Signed-off-by: Pekka Enberg

    Akinobu Mita
     
  • When a slab is freed by __slab_free() and the slab can only ever contain a
    single object, then it was full (and therefore not on the partial lists,
    but on the full list in the debug case) before we reached slab_empty.

    This caused the following full list corruption when SLUB debugging was enabled:

    [ 5913.233035] ------------[ cut here ]------------
    [ 5913.233097] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
    [ 5913.233101] Hardware name: Adamo 13
    [ 5913.233105] list_del corruption. prev->next should be ffffea000434fd20, but was ffffea0004199520
    [ 5913.233108] Modules linked in: nfs fscache fuse ebtable_nat ebtables ppdev parport_pc lp parport ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss xt_CHECKSUM sunrpc iptable_mangle bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables rfcomm bnep arc4 iwlagn snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel btusb mac80211 snd_hda_codec bluetooth snd_hwdep snd_seq snd_seq_device snd_pcm usb_debug dell_wmi sparse_keymap cdc_ether usbnet cdc_acm uvcvideo cdc_wdm mii cfg80211 snd_timer dell_laptop videodev dcdbas snd microcode v4l2_compat_ioctl32 soundcore joydev tg3 pcspkr snd_page_alloc iTCO_wdt i2c_i801 rfkill iTCO_vendor_support wmi virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
    [ 5913.233213] Pid: 0, comm: swapper Not tainted 3.0.0+ #127
    [ 5913.233213] Call Trace:
    [ 5913.233213] [] warn_slowpath_common+0x83/0x9b
    [ 5913.233213] [] warn_slowpath_fmt+0x46/0x48
    [ 5913.233213] [] __list_del_entry+0x8d/0x98
    [ 5913.233213] [] list_del+0xe/0x2d
    [ 5913.233213] [] __slab_free+0x1db/0x235
    [ 5913.233213] [] ? bvec_free_bs+0x35/0x37
    [ 5913.233213] [] ? bvec_free_bs+0x35/0x37
    [ 5913.233213] [] ? bvec_free_bs+0x35/0x37
    [ 5913.233213] [] kmem_cache_free+0x88/0x102
    [ 5913.233213] [] bvec_free_bs+0x35/0x37
    [ 5913.233213] [] bio_free+0x34/0x64
    [ 5913.233213] [] dm_bio_destructor+0x12/0x14
    [ 5913.233213] [] bio_put+0x2b/0x2d
    [ 5913.233213] [] clone_endio+0x9e/0xb4
    [ 5913.233213] [] bio_endio+0x2d/0x2f
    [ 5913.233213] [] crypt_dec_pending+0x5c/0x8b [dm_crypt]
    [ 5913.233213] [] crypt_endio+0x78/0x81 [dm_crypt]

    [ Full discussion here: https://lkml.org/lkml/2011/8/4/375 ]

    Make sure that we remove such a slab also from the full lists.

    Reported-and-tested-by: Dave Jones
    Reported-and-tested-by: Xiaotian Feng
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

31 Jul, 2011

1 commit

  • * 'slub/lockless' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: (21 commits)
    slub: When allocating a new slab also prep the first object
    slub: disable interrupts in cmpxchg_double_slab when falling back to pagelock
    Avoid duplicate _count variables in page_struct
    Revert "SLUB: Fix build breakage in linux/mm_types.h"
    SLUB: Fix build breakage in linux/mm_types.h
    slub: slabinfo update for cmpxchg handling
    slub: Not necessary to check for empty slab on load_freelist
    slub: fast release on full slab
    slub: Add statistics for the case that the current slab does not match the node
    slub: Get rid of the another_slab label
    slub: Avoid disabling interrupts in free slowpath
    slub: Disable interrupts in free_debug processing
    slub: Invert locking and avoid slab lock
    slub: Rework allocator fastpaths
    slub: Pass kmem_cache struct to lock and freeze slab
    slub: explicit list_lock taking
    slub: Add cmpxchg_double_slab()
    mm: Rearrange struct page
    slub: Move page->frozen handling near where the page->freelist handling occurs
    slub: Do not use frozen page flag but a bit in the page counters
    ...

    Linus Torvalds
     

26 Jul, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    fs: Merge split strings
    treewide: fix potentially dangerous trailing ';' in #defined values/expressions
    uwb: Fix misspelling of neighbourhood in comment
    net, netfilter: Remove redundant goto in ebt_ulog_packet
    trivial: don't touch files that are removed in the staging tree
    lib/vsprintf: replace link to Draft by final RFC number
    doc: Kconfig: `to be' -> `be'
    doc: Kconfig: Typo: square -> squared
    doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
    drivers/net: static should be at beginning of declaration
    drivers/media: static should be at beginning of declaration
    drivers/i2c: static should be at beginning of declaration
    XTENSA: static should be at beginning of declaration
    SH: static should be at beginning of declaration
    MIPS: static should be at beginning of declaration
    ARM: static should be at beginning of declaration
    rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
    Update my e-mail address
    PCIe ASPM: forcedly -> forcibly
    gma500: push through device driver tree
    ...

    Fix up trivial conflicts:
    - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
    - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
    - drivers/net/r8169.c (just context changes)

    Linus Torvalds
     
  • We need to branch to the debug code for the first object if we allocate a
    new slab, otherwise the first object will be wrongly marked as inactive.

    Tested-by: Rabin Vincent
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

23 Jul, 2011

1 commit

  • * 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slab: fix DEBUG_SLAB warning
    slab: shrink sizeof(struct kmem_cache)
    slab: fix DEBUG_SLAB build
    SLUB: Fix missing include
    slub: reduce overhead of slub_debug
    slub: Add method to verify memory is not freed
    slub: Enable backtrace for create/delete points
    slab allocators: Provide generic description of alignment defines
    slab, slub, slob: Unify alignment definition
    slob/lockdep: Fix gfp flags passed to lockdep

    Linus Torvalds
     

21 Jul, 2011

1 commit

  • All these are instances of
    #define NAME value;
    or
    #define NAME(params_opt) value;

    These of course fail to build when used in contexts like
    if(foo $OP NAME)
    while(bar $OP NAME)
    and may silently generate the wrong code in contexts such as
    foo = NAME + 1; /* foo = value; + 1; */
    bar = NAME - 1; /* bar = value; - 1; */
    baz = NAME & quux; /* baz = value; & quux; */

    Reported on comp.lang.c,
    Message-ID:
    Initial analysis of the dangers provided by Keith Thompson in that thread.

    There are many more instances of more complicated macros having unnecessary
    trailing semicolons, but this pile seems to be all of the cases of simple
    values suffering from the problem (i.e. things that are likely to be used
    in one of the contexts above; more complicated ones aren't).

    Signed-off-by: Phil Carmody
    Signed-off-by: Jiri Kosina

    Phil Carmody
     

18 Jul, 2011

1 commit

  • Split cmpxchg_double_slab() into two functions: one for the case where we
    know that interrupts are disabled (and therefore the fallback does not need
    to disable interrupts), and one for the other cases, where the fallback
    will also disable interrupts.

    This fixes the issue that __slab_free called cmpxchg_double_slab in some scenarios
    without disabling interrupts.

    Tested-by: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

08 Jul, 2011

4 commits

  • This fixes the build breakage introduced by commit d6543e3 ("slub: Enable
    backtrace for create/delete points"):

    CC mm/slub.o
    mm/slub.c: In function ‘set_track’:
    mm/slub.c:428: error: storage size of ‘trace’ isn’t known
    mm/slub.c:435: error: implicit declaration of function ‘save_stack_trace’
    mm/slub.c:428: warning: unused variable ‘trace’
    make[1]: *** [mm/slub.o] Error 1
    make: *** [mm/slub.o] Error 2

    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • slub checks for poison byte by byte, which is highly inefficient and
    frequently shows up at the top of perf top.

    Joining reads gives a nice speedup (compiling some project with different
    options):

                                          make -j12    make clean
    slub_debug disabled:                  1m 27s       1.2 s
    slub_debug enabled:                   1m 46s       7.6 s
    slub_debug enabled + this patch:      1m 33s       3.2 s

    check_bytes still shows up high, but not always at the top.

    Signed-off-by: Marcin Slusarz
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Matt Mackall
    Cc: linux-mm@kvack.org
    Signed-off-by: Pekka Enberg

    Marcin Slusarz
     
  • This is for tracking down suspect memory usage.

    Acked-by: Christoph Lameter
    Signed-off-by: Ben Greear
    Signed-off-by: Pekka Enberg

    Ben Greear
     
  • This patch attempts to grab a backtrace for the creation
    and deletion points of the slub object. When a fault is
    detected, we can then get a better idea of where the item
    was deleted.

    Example output from debugging some funky nfs/rpc behaviour:

    =============================================================================
    BUG kmalloc-64: Object is on free-list
    -----------------------------------------------------------------------------

    INFO: Allocated in rpcb_getport_async+0x39c/0x5a5 [sunrpc] age=381 cpu=3 pid=3750
    __slab_alloc+0x348/0x3ba
    kmem_cache_alloc_trace+0x67/0xe7
    rpcb_getport_async+0x39c/0x5a5 [sunrpc]
    call_bind+0x70/0x75 [sunrpc]
    __rpc_execute+0x78/0x24b [sunrpc]
    rpc_execute+0x3d/0x42 [sunrpc]
    rpc_run_task+0x79/0x81 [sunrpc]
    rpc_call_sync+0x3f/0x60 [sunrpc]
    rpc_ping+0x42/0x58 [sunrpc]
    rpc_create+0x4aa/0x527 [sunrpc]
    nfs_create_rpc_client+0xb1/0xf6 [nfs]
    nfs_init_client+0x3b/0x7d [nfs]
    nfs_get_client+0x453/0x5ab [nfs]
    nfs_create_server+0x10b/0x437 [nfs]
    nfs_fs_mount+0x4ca/0x708 [nfs]
    mount_fs+0x6b/0x152
    INFO: Freed in rpcb_map_release+0x3f/0x44 [sunrpc] age=30 cpu=2 pid=29049
    __slab_free+0x57/0x150
    kfree+0x107/0x13a
    rpcb_map_release+0x3f/0x44 [sunrpc]
    rpc_release_calldata+0x12/0x14 [sunrpc]
    rpc_free_task+0x59/0x61 [sunrpc]
    rpc_final_put_task+0x82/0x8a [sunrpc]
    __rpc_execute+0x23c/0x24b [sunrpc]
    rpc_async_schedule+0x10/0x12 [sunrpc]
    process_one_work+0x230/0x41d
    worker_thread+0x133/0x217
    kthread+0x7d/0x85
    kernel_thread_helper+0x4/0x10
    INFO: Slab 0xffffea00029aa470 objects=20 used=9 fp=0xffff8800be7830d8 flags=0x20000000004081
    INFO: Object 0xffff8800be7830d8 @offset=4312 fp=0xffff8800be7827a8

    Bytes b4 0xffff8800be7830c8: 87 a8 96 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
    Object 0xffff8800be7830d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xffff8800be7830e8: 6b 6b 6b 6b 01 08 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkk..kkkkkkkkkk
    Object 0xffff8800be7830f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
    Object 0xffff8800be783108: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
    Redzone 0xffff8800be783118: bb bb bb bb bb bb bb bb ........
    Padding 0xffff8800be783258: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
    Pid: 29049, comm: kworker/2:2 Not tainted 3.0.0-rc4+ #8
    Call Trace:
    [] print_trailer+0x131/0x13a
    [] object_err+0x35/0x3e
    [] verify_mem_not_deleted+0x7a/0xb7
    [] rpcb_getport_done+0x23/0x126 [sunrpc]
    [] rpc_exit_task+0x3f/0x6d [sunrpc]
    [] __rpc_execute+0x78/0x24b [sunrpc]
    [] ? rpc_execute+0x42/0x42 [sunrpc]
    [] rpc_async_schedule+0x10/0x12 [sunrpc]
    [] process_one_work+0x230/0x41d
    [] ? process_one_work+0x17b/0x41d
    [] worker_thread+0x133/0x217
    [] ? manage_workers+0x191/0x191
    [] kthread+0x7d/0x85
    [] kernel_thread_helper+0x4/0x10
    [] ? retint_restore_args+0x13/0x13
    [] ? __init_kthread_worker+0x56/0x56
    [] ? gs_change+0x13/0x13

    Acked-by: Christoph Lameter
    Signed-off-by: Ben Greear
    Signed-off-by: Pekka Enberg

    Ben Greear
     

02 Jul, 2011

14 commits


04 Jun, 2011

1 commit

  • On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled,
    the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early
    panic during boot unless we always align cpu_slab properly.

    In principle we could remove the alignment-testing VM_BUG_ON() for
    architectures that don't have CMPXCHG_LOCAL, but leaving it in means
    that new code will tend not to break x86 even if it is introduced
    on another platform, and it's low cost to require alignment.

    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Signed-off-by: Chris Metcalf
    Signed-off-by: Pekka Enberg

    Chris Metcalf
     

26 May, 2011

1 commit

  • Commit a71ae47a2cbf ("slub: Fix double bit unlock in debug mode")
    removed the only goto to this label, resulting in

    mm/slub.c: In function '__slab_alloc':
    mm/slub.c:1834: warning: label 'unlock_out' defined but not used

    fixed trivially by the removal of the label itself too.

    Reported-by: Stephen Rothwell
    Cc: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 May, 2011

1 commit

  • Commit 442b06bcea23 ("slub: Remove node check in slab_free") added a
    call to deactivate_slab() in the debug case in __slab_alloc(), which
    unlocks the current slab used for allocation. Going to the label
    'unlock_out' then does it again.

    Also, in the debug case we do not need all the other processing that the
    'unlock_out' path does. We always fall back to the slow path in the
    debug case. So the tid update is useless.

    Similarly, ALLOC_SLOWPATH would just be incremented for all allocations.
    Also a pretty useless thing.

    So simply restore irq flags and return the object.

    Signed-off-by: Christoph Lameter
    Reported-and-bisected-by: James Morris
    Reported-by: Ingo Molnar
    Reported-by: Jens Axboe
    Cc: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

24 May, 2011

1 commit


21 May, 2011

1 commit

  • We can set the page pointer in the percpu structure to NULL to have the
    same effect as setting c->node to NUMA_NO_NODE.

    Gets rid of one check in slab_free() that was only used for
    forcing the slab_free to the slowpath for debugging.

    We still need to set c->node to NUMA_NO_NODE to force the
    slab_alloc() fastpath to the slowpath in case of debugging.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

18 May, 2011

3 commits

  • Jumping to a label inside a conditional is considered poor style,
    especially considering the current organization of __slab_alloc().

    This removes the 'load_from_page' label and just duplicates the three
    lines of code that it uses:

    c->node = page_to_nid(page);
    c->page = page;
    goto load_freelist;

    since it's probably not worth making this a separate helper function.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     
  • The fastpath can do a speculative access to a page that
    CONFIG_DEBUG_PAGEALLOC may have marked as invalid, to retrieve the pointer
    to the next free object.

    Use probe_kernel_read in that case in order not to cause a page fault.

    Cc: # 38.x
    Reported-by: Eric Dumazet
    Signed-off-by: Christoph Lameter
    Signed-off-by: Eric Dumazet
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Move the #ifdef so that get_map is only defined if CONFIG_SLUB_DEBUG is defined.

    Reported-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

08 May, 2011

1 commit

  • Remove the #ifdefs. This means that the irqsafe_cpu_cmpxchg_double() is used
    everywhere.

    There may be performance implications since:

    A. We now have to manage a transaction ID for all arches

    B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced
    to a very short irqoff section.

    This change does not introduce multiple irqoff/irqon sequences: even in the
    fallback case we only have to do one disable and enable, like before.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

05 May, 2011

1 commit

  • The SLUB allocator's use of the cmpxchg_double logic was wrong: it
    actually needs the irq-safe one.

    That happens automatically when we use the native unlocked 'cmpxchg8b'
    instruction, but when compiling the kernel for older x86 CPUs that do
    not support that instruction, we fall back to the generic emulation
    code.

    And if you don't specify that you want the irq-safe version, the generic
    code ends up just open-coding the cmpxchg8b equivalent without any
    protection against interrupts or preemption. Which definitely doesn't
    work for SLUB.

    This was reported by Werner Landgraf, who saw instability with his
    distro kernel that was compiled to support pretty much everything under
    the sun. Most big Linux distributions tend to compile for PPro and later,
    and would never have noticed this problem.

    This also fixes the prototypes for the irqsafe cmpxchg_double functions
    to use 'bool' like they should.

    [ Btw, that whole "generic code defaults to no protection" design just
    sounds stupid - if the code needs no protection, there is no reason to
    use "cmpxchg_double" to begin with. So we should probably just remove
    the unprotected version entirely as pointless. - Linus ]

    Signed-off-by: Thomas Gleixner
    Reported-and-tested-by: werner
    Acked-and-tested-by: Ingo Molnar
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Jens Axboe
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

17 Apr, 2011

2 commits