03 Apr, 2009

2 commits

  • Impact: also output kfree(NULL) entries

    This patch moves the trace_kfree() calls before the ZERO_OR_NULL_PTR
    check so that we can trace call-sites that call kfree() with NULL many
    times which might be an indication of a bug.
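
    The reordering described above can be sketched in userspace C. This is
    only an illustration, not the slab allocator code: every sketch_* name
    is hypothetical, the trace hook is a plain counter standing in for
    trace_kfree(), and free() stands in for the real release path. The
    macro mirrors the kernel's ZERO_OR_NULL_PTR, using uintptr_t where the
    kernel uses unsigned long.

```c
#include <stdint.h>
#include <stdlib.h>

/* Modeled on the kernel's ZERO_SIZE_PTR / ZERO_OR_NULL_PTR definitions. */
#define SKETCH_ZERO_SIZE_PTR ((void *)16)
#define SKETCH_ZERO_OR_NULL_PTR(x) \
	((uintptr_t)(x) <= (uintptr_t)SKETCH_ZERO_SIZE_PTR)

/* Hypothetical stand-in for trace_kfree(): it only counts events. */
static unsigned long sketch_traced_frees;

static void sketch_trace_kfree(const void *ptr)
{
	(void)ptr;
	sketch_traced_frees++;
}

/*
 * Trace first, then return early for NULL and zero-size pointers.
 * With the trace call placed after the check, frees of NULL would
 * never be recorded.
 */
static void sketch_kfree(void *ptr)
{
	sketch_trace_kfree(ptr);

	if (SKETCH_ZERO_OR_NULL_PTR(ptr))
		return;

	free(ptr);
}
```

    Calling sketch_kfree(NULL) increments the counter even though nothing
    is released, which is what makes call-sites that repeatedly free NULL
    visible in a trace.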

    Signed-off-by: Pekka Enberg
    Cc: Eduard - Gabriel Munteanu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • kmemtrace now uses tracepoints instead of markers. We no longer need to
    use format specifiers to pass arguments.

    Signed-off-by: Eduard - Gabriel Munteanu
    [ folded: Use the new TP_PROTO and TP_ARGS to fix the build. ]
    [ folded: fix build when CONFIG_KMEMTRACE is disabled. ]
    [ folded: define tracepoints when CONFIG_TRACEPOINTS is enabled. ]
    Signed-off-by: Pekka Enberg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eduard - Gabriel Munteanu
     

02 Apr, 2009

1 commit


31 Mar, 2009

1 commit

  • * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits)
    lockdep: fix deadlock in lockdep_trace_alloc
    lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB
    lockdep: annotate reclaim context (__GFP_NOFS), fix
    lockdep: build fix for !PROVE_LOCKING
    lockstat: warn about disabled lock debugging
    lockdep: use stringify.h
    lockdep: simplify check_prev_add_irq()
    lockdep: get_user_chars() redo
    lockdep: simplify get_user_chars()
    lockdep: add comments to mark_lock_irq()
    lockdep: remove macro usage from mark_held_locks()
    lockdep: fully reduce mark_lock_irq()
    lockdep: merge the !_READ mark_lock_irq() helpers
    lockdep: merge the _READ mark_lock_irq() helpers
    lockdep: simplify mark_lock_irq() helpers #3
    lockdep: further simplify mark_lock_irq() helpers
    lockdep: simplify the mark_lock_irq() helpers
    lockdep: split up mark_lock_irq()
    lockdep: generate usage strings
    lockdep: generate the state bit definitions
    ...

    Linus Torvalds
     

24 Mar, 2009

1 commit


23 Mar, 2009

1 commit


05 Mar, 2009

1 commit


25 Feb, 2009

1 commit


23 Feb, 2009

2 commits

  • Now that a cache's min_partial has been moved to struct kmem_cache, it's
    possible to easily tune it from userspace by adding a sysfs attribute.

    It may not be desirable to keep a large number of partial slabs around
    if a cache is used infrequently and memory, especially when constrained
    by a cgroup, is scarce. It's better to allow userspace to set the
    minimum policy per cache instead of relying explicitly on
    kmem_cache_shrink().

    The memory savings from simply moving min_partial from struct
    kmem_cache_node to struct kmem_cache are obviously not significant
    (unless maybe you're from SGI or something); at most they amount to

    # allocated caches * (MAX_NUMNODES - 1) * sizeof(unsigned long)

    The true savings occur when userspace reduces the number of partial
    slabs that would otherwise be wasted, especially on machines with a
    large number of nodes (ia64 with CONFIG_NODES_SHIFT at 10 for default?).
    However well the kernel estimates ideal values for n->min_partial and
    ensures they are within a sane range, userspace has no input other
    than writing to /sys/kernel/slab/cache/shrink.

    There simply isn't any better heuristic to add when calculating the
    partial values for a better estimate that works for all possible caches.
    And since it's currently a static value, the user really has no way of
    reclaiming that wasted space, which can be significant when constrained
    by a cgroup (either cpusets or, later, memory controller slab limits)
    without shrinking it entirely.

    This also allows the user to specify that increased fragmentation and
    more partial slabs are actually desired to avoid the cost of allocating
    new slabs at runtime for specific caches.

    There's also no reason why this should be a per-struct kmem_cache_node
    value in the first place. You could argue that a machine would have
    such node size asymmetries that it should be specified on a per-node
    basis, but we know nobody is doing that right now since it's a purely
    static value at the moment and there's no convenient way to tune that
    via slub's sysfs interface.
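
    The upper bound quoted above, # allocated caches * (MAX_NUMNODES - 1) *
    sizeof(unsigned long), can be worked through with a small helper. This
    is a sketch for the arithmetic only; the function name and the cache
    count used in the note below are assumptions, not values from the patch.

```c
#include <stddef.h>

/*
 * Upper bound on the bytes saved by keeping one min_partial per cache
 * instead of one copy per node:
 *   caches * (max_numnodes - 1) * sizeof(unsigned long)
 */
static size_t min_partial_bytes_saved(size_t caches, size_t max_numnodes)
{
	return caches * (max_numnodes - 1) * sizeof(unsigned long);
}
```

    Assuming 100 caches, MAX_NUMNODES of 1024 (CONFIG_NODES_SHIFT at 10) and
    an 8-byte unsigned long, this gives 100 * 1023 * 8 = 818400 bytes, which
    is under 1 MiB and supports the point that the structural saving itself
    is small.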

    Cc: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     
  • Although it allows for better cacheline use, it is unnecessary to save a
    copy of the cache's min_partial value in each kmem_cache_node.

    Cc: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     

20 Feb, 2009

4 commits


15 Feb, 2009

1 commit

  • Here is another version, with the incremental patch rolled up, and
    added reclaim context annotation to kswapd, and allocation tracing
    to slab allocators (which may only ever reach the page allocator
    in rare cases, so it is good to put annotations here too).

    Haven't tested this version as such, but it should be getting closer
    to merge worthy ;)

    --
    After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
    allocation when it should not have been, I thought it might be a good idea to
    try to catch this kind of thing with lockdep.

    I coded up a little idea that seems to work. Unfortunately the system has to
    actually be in __GFP_FS page reclaim, then take the lock, before it will mark
    it. But at least that might still be some orders of magnitude more common
    (and more debuggable) than an actual deadlock condition, so we have some
    improvement I hope (the concept is no less complete than discovery of a lock's
    interrupt contexts).

    I guess we could even do the same thing with __GFP_IO (normal reclaim), and
    even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
    code paths, so let's start there and see how it goes.

    It *seems* to work. I did a quick test.

    =================================
    [ INFO: inconsistent lock state ]
    2.6.28-rc6-00007-ged31348-dirty #26
    ---------------------------------
    inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
    modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]
    {in-reclaim-W} state was registered at:
    [] __lock_acquire+0x75b/0x1a60
    [] lock_acquire+0x91/0xc0
    [] mutex_lock_nested+0xb1/0x310
    [] brd_init+0x2b/0x216 [brd]
    [] _stext+0x3b/0x170
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff
    irq event stamp: 3929
    hardirqs last enabled at (3929): [] mutex_lock_nested+0x285/0x310
    hardirqs last disabled at (3928): [] mutex_lock_nested+0x59/0x310
    softirqs last enabled at (3732): [] sk_filter+0x83/0xe0
    softirqs last disabled at (3730): [] sk_filter+0x16/0xe0

    other info that might help us debug this:
    1 lock held by modprobe/8526:
    #0: (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]

    stack backtrace:
    Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
    Call Trace:
    [] print_usage_bug+0x193/0x1d0
    [] mark_lock+0xaf0/0xca0
    [] mark_held_locks+0x55/0xc0
    [] ? brd_init+0x0/0x216 [brd]
    [] trace_reclaim_fs+0x2a/0x60
    [] __alloc_pages_internal+0x475/0x580
    [] ? mutex_lock_nested+0x26e/0x310
    [] ? brd_init+0x0/0x216 [brd]
    [] brd_init+0x6a/0x216 [brd]
    [] ? brd_init+0x0/0x216 [brd]
    [] _stext+0x3b/0x170
    [] ? mutex_unlock+0x9/0x10
    [] ? __mutex_unlock_slowpath+0x10d/0x180
    [] ? trace_hardirqs_on_caller+0x12c/0x190
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b
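
    The idea behind the report above can be sketched as two usage bits per
    lock class. This is a simplified model, not lockdep's implementation:
    all sketch_* names are hypothetical, and reading "ov" in the report as
    "held over a __GFP_FS allocation" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical per-lock-class usage bits, loosely modeled on the report. */
enum {
	SKETCH_USED_IN_RECLAIM   = 1 << 0, /* taken while in __GFP_FS reclaim */
	SKETCH_HELD_OVER_RECLAIM = 1 << 1, /* held across a __GFP_FS allocation */
};

struct sketch_lock_class {
	const char *name;
	unsigned int usage;
};

/*
 * Record a new usage for the class. Returns false once both bits are
 * set, i.e. the lock is taken inside reclaim and is also held while an
 * allocation may enter reclaim, which is the potential deadlock.
 */
static bool sketch_mark_lock(struct sketch_lock_class *class, unsigned int bit)
{
	const unsigned int both =
		SKETCH_USED_IN_RECLAIM | SKETCH_HELD_OVER_RECLAIM;

	class->usage |= bit;
	return (class->usage & both) != both;
}
```

    As in the report, neither usage is a problem alone; the inconsistency
    is flagged only when the second, conflicting usage is recorded for the
    same class.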

    Signed-off-by: Nick Piggin
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Nick Piggin
     

13 Feb, 2009

1 commit


12 Feb, 2009

1 commit

  • Commit 7b2cd92adc5430b0c1adeb120971852b4ea1ab08 ("crypto: api - Fix
    zeroing on free") added a modular user of ksize(). Export ksize() to
    fix crypto.ko compilation.

    Cc: Herbert Xu
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Pekka Enberg

    Kirill A. Shutemov
     

03 Feb, 2009

1 commit


28 Jan, 2009

1 commit

  • The per-cpu array of kmem_cache_cpu structures accommodates
    NR_KMEM_CACHE_CPU such structs.

    When this array overflows and a struct is allocated by kmalloc(), it may
    have an address at the upper bound of this array. If this happens, it
    does not get freed and the per cpu kmem_cache_cpu_free pointer will be out
    of bounds after kmem_cache_destroy() or cpu offlining.
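
    The boundary condition described above can be illustrated in userspace
    C. This is not the slub.c code: the sketch_* names, the slot count and
    the exact comparison operators are assumptions based on the description.
    The address one past the last element is not part of the array, so an
    inclusive upper bound misclassifies an object that happens to sit there.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_SLOTS 100 /* hypothetical stand-in for NR_KMEM_CACHE_CPU */

struct sketch_slot {
	void *freelist;
};

static struct sketch_slot sketch_array[SKETCH_NR_SLOTS];

/* Buggy check: treats the one-past-the-end address as inside the array. */
static bool sketch_in_array_buggy(const struct sketch_slot *c)
{
	uintptr_t p = (uintptr_t)c;
	uintptr_t lo = (uintptr_t)sketch_array;
	uintptr_t hi = (uintptr_t)(sketch_array + SKETCH_NR_SLOTS);

	return p >= lo && p <= hi;
}

/* Corrected check: the upper bound is exclusive. */
static bool sketch_in_array(const struct sketch_slot *c)
{
	uintptr_t p = (uintptr_t)c;
	uintptr_t lo = (uintptr_t)sketch_array;
	uintptr_t hi = (uintptr_t)(sketch_array + SKETCH_NR_SLOTS);

	return p >= lo && p < hi;
}
```

    An object for which the buggy check returns true is handled as an array
    slot rather than freed, which matches the leak and the out-of-bounds
    free pointer described in the message.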

    Cc: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     

14 Jan, 2009

1 commit


11 Jan, 2009

1 commit


06 Jan, 2009

2 commits


01 Jan, 2009

1 commit

  • Impact: Use new API

    Convert kernel mm functions to use struct cpumask.

    We skip include/linux/percpu.h and mm/allocpercpu.c, which are in flux.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Reviewed-by: Christoph Lameter

    Rusty Russell
     

31 Dec, 2008

3 commits


30 Dec, 2008

2 commits

  • Impact: new tracer plugin

    This patch adapts kmemtrace raw events tracing to the unified tracing API.

    To enable and use this tracer, just do the following:

    echo kmemtrace > /debugfs/tracing/current_tracer
    cat /debugfs/tracing/trace

    You will have the following output:

    # tracer: kmemtrace
    #
    #
    # ALLOC  TYPE  REQ   GIVEN  FLAGS     POINTER             NODE  CALLER
    # FREE   |     |     |      |         |   |               |     |
    # |

    type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
    type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
    type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
    type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
    type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
    type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
    type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
    type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
    type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
    type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
    type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
    type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
    type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
    type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
    type_id 1 call_site 18446744071565585534 ptr 18446612134405955584

    That was to stay backward compatible with the format output produced in
    linux/tracepoint.h.

    This is the default output, but note that I tried something else.

    If you change an option:

    echo kmem_minimalistic > /debugfs/tracing/trace_options

    and then cat /debugfs/tracing/trace, you will have the following output:

    # tracer: kmemtrace
    #
    #
    # ALLOC  TYPE  REQ   GIVEN  FLAGS     POINTER             NODE  CALLER
    # FREE   |     |     |      |         |   |               |     |
    # |

      -      C                            0xffff88007c088780        file_free_rcu
      +      K     4096  4096   000000d0  0xffff88007cad6000  -1    getname
      -      C                            0xffff88007cad6000        putname
      +      K     4096  4096   000000d0  0xffff88007cad6000  -1    getname
      +      K     240   240    000000d0  0xffff8800790dc780  -1    d_alloc
      -      C                            0xffff88007cad6000        putname
      +      K     4096  4096   000000d0  0xffff88007cad6000  -1    getname
      +      K     240   240    000000d0  0xffff8800790dc870  -1    d_alloc
      -      C                            0xffff88007cad6000        putname
      +      K     4096  4096   000000d0  0xffff88007cad6000  -1    getname
      +      K     240   240    000000d0  0xffff8800790dc960  -1    d_alloc
      +      K     1304  1312   000000d0  0xffff8800791d7340  -1    reiserfs_alloc_inode
      -      C                            0xffff88007cad6000        putname
      +      K     4096  4096   000000d0  0xffff88007cad6000  -1    getname
      -      C                            0xffff88007cad6000        putname
      +      K     992   1000   000000d0  0xffff880079045b58  -1    alloc_inode
      +      K     768   1024   000080d0  0xffff88007c096400  -1    alloc_pipe_info
      +      K     240   240    000000d0  0xffff8800790dca50  -1    d_alloc
      +      K     272   320    000080d0  0xffff88007c088780  -1    get_empty_filp
      +      K     272   320    000080d0  0xffff88007c088000  -1    get_empty_filp

    Yeah I shall confess kmem_minimalistic should be: kmem_alternative.

    Whatever, I find it more readable but this is a personal opinion of course.
    We can drop it if you want.

    On the ALLOC/FREE column, + means an allocation and - a free.

    On the type column, you have K = kmalloc, C = cache, P = page
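
    The legend above, and the relation between the two output formats, can
    be sketched with a few helpers. The sketch_* names are hypothetical and
    this is not the tracer's code; it only encodes the symbols listed here
    and shows that the decimal ptr values of the default output are the
    same numbers the alternative output prints in hexadecimal.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ALLOC/FREE column: '+' is an allocation, '-' is a free. */
static const char *sketch_event_name(char c)
{
	switch (c) {
	case '+': return "alloc";
	case '-': return "free";
	default:  return "unknown";
	}
}

/* TYPE column: K = kmalloc, C = cache, P = page. */
static const char *sketch_type_name(char c)
{
	switch (c) {
	case 'K': return "kmalloc";
	case 'C': return "cache";
	case 'P': return "page";
	default:  return "unknown";
	}
}

/* Print a decimal value from the default output as a hex pointer. */
static int sketch_format_ptr(uint64_t value, char *buf, size_t len)
{
	return snprintf(buf, len, "0x%016" PRIx64, value);
}
```

    For example, passing 18446612134405955584 (the ptr value of the getname
    and putname events in the default output) to sketch_format_ptr() yields
    0xffff88007cad6000, the pointer shown for the same events in the
    alternative output. Likewise, gfp_flags 208 is 0xd0, printed there as
    000000d0.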

    I would like the flags to be GFP_* strings, but it would not be easy to
    print strings without breaking the column alignment.

    About the node...it seems to always be -1. I don't know why but that shouldn't
    be difficult to find.

    I moved linux/tracepoint.h to trace/tracepoint.h as well. I think it
    would be easier to find the tracer headers if they are all in a common
    directory.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: avoid conflicts with kmemcheck

    kmemcheck modifies the same area of slab.c and slub.c - move the
    include lines up a bit.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

29 Dec, 2008

10 commits


13 Dec, 2008

1 commit

  • …t_scnprintf to take pointers.

    Impact: change calling convention of existing cpumask APIs

    Most cpumask functions started with cpus_: these have been replaced by
    cpumask_ ones which take struct cpumask pointers as expected.

    These four functions don't have good replacement names; fortunately
    they're rarely used, so we just change them over.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: paulus@samba.org
    Cc: mingo@redhat.com
    Cc: tony.luck@intel.com
    Cc: ralf@linux-mips.org
    Cc: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: cl@linux-foundation.org
    Cc: srostedt@redhat.com

    Rusty Russell