01 Jan, 2009

1 commit

  • Impact: Use new API

    Convert kernel mm functions to use struct cpumask.

    We skip include/linux/percpu.h and mm/allocpercpu.c, which are in flux.
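
    A hedged before/after sketch of the conversion pattern (the cpumask
    accessors are the real API; the wrapper functions are invented for
    illustration, not taken from the patch):

    /* old style: masks manipulated by value */
    static int cpu_in_set_old(int cpu)
    {
            cpumask_t mask;

            cpus_clear(mask);
            cpu_set(cpu, mask);
            return cpu_isset(cpu, mask);
    }

    /* new style: everything takes struct cpumask pointers, avoiding
     * copies of a potentially large mask on the stack */
    static int cpu_in_set_new(int cpu, struct cpumask *mask)
    {
            cpumask_clear(mask);
            cpumask_set_cpu(cpu, mask);
            return cpumask_test_cpu(cpu, mask);
    }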

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Reviewed-by: Christoph Lameter

    Rusty Russell
     

13 Dec, 2008

1 commit

  • …t_scnprintf to take pointers.

    Impact: change calling convention of existing cpumask APIs

    Most cpumask functions started with cpus_: these have been replaced by
    cpumask_ ones which take struct cpumask pointers as expected.

    These four functions don't have good replacement names; fortunately
    they're rarely used, so we just change them over.
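
    A hedged sketch of the calling-convention change, using
    cpumask_scnprintf as the example (illustrative call sites, not lines
    from the patch):

    char buf[128];
    cpumask_t mask = CPU_MASK_NONE;

    /* old: the mask was passed by value */
    cpumask_scnprintf(buf, sizeof(buf), mask);

    /* new: the mask is passed as a struct cpumask pointer */
    cpumask_scnprintf(buf, sizeof(buf), &mask);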

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: paulus@samba.org
    Cc: mingo@redhat.com
    Cc: tony.luck@intel.com
    Cc: ralf@linux-mips.org
    Cc: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: cl@linux-foundation.org
    Cc: srostedt@redhat.com

    Rusty Russell
     

11 Dec, 2008

1 commit

  • Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
    to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 "sprint_symbol(): use
    less stack", exposing a bug in slub's list_locations():
    kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
    beyond the end of the page provided.

    The 100 slop which list_locations() allows at end of page looks roughly
    enough for all the other stuff it might print after the symbol before
    it checks again: break out KSYM_SYMBOL_LEN earlier than before.

    Latencytop and ftrace are using KSYM_NAME_LEN buffers where they
    need KSYM_SYMBOL_LEN buffers, and vmallocinfo uses a 2*KSYM_NAME_LEN
    buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before
    anyone copies them.
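
    As a hedged illustration of the sizing rule (a sketch, not lines from
    the patch; show_caller() is an invented example):

    #include <linux/kallsyms.h>

    static void show_caller(unsigned long addr)
    {
            char sym[KSYM_SYMBOL_LEN];      /* not KSYM_NAME_LEN */

            /* sprint_symbol() formats "name+offset/size [module]",
             * which can need up to KSYM_SYMBOL_LEN bytes, strictly
             * more than the bare name's KSYM_NAME_LEN */
            sprint_symbol(sym, addr);
            printk(KERN_DEBUG "caller: %s\n", sym);
    }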

    [akpm@linux-foundation.org: ftrace.h needs module.h]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: Miles Lane
    Acked-by: Pekka Enberg
    Acked-by: Steven Rostedt
    Acked-by: Frederic Weisbecker
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

02 Dec, 2008

1 commit

  • Fixes for memcg/memory hotplug.

    While memory hotplug allocates/frees memmap, page_cgroup doesn't free
    page_cgroup at OFFLINE when page_cgroup was allocated via bootmem
    (because freeing bootmem requires special care).

    Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
    by memory hotplug, page_cgroup->page == page is no longer true.

    But the current MEM_ONLINE handler doesn't check this case, so update
    page_cgroup->page even when it's not necessary to allocate page_cgroup.
    (This was not found earlier because memmap is not freed if
    SPARSEMEM_VMEMMAP is y.)

    I also noticed that MEM_ONLINE can be called against "part of section",
    so freeing page_cgroup at CANCEL_ONLINE would cause trouble (freeing
    page_cgroup that is still in use). Don't roll back at CANCEL.

    One more: the current memory hotplug notifier chain is stopped by slub,
    because slub sets NOTIFY_STOP_MASK in its return value, so page_cgroup's
    callback is never called (it now has lower priority than slub's).

    I think this slub behavior is unintentional (a BUG), and this patch
    fixes it.
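
    A hedged sketch of the notifier-chain rule involved (illustrative, not
    the patch text; example_memory_callback() and do_prepare() are invented
    names):

    #include <linux/memory.h>
    #include <linux/notifier.h>

    static int do_prepare(void);    /* hypothetical helper, 0 or -errno */

    static int example_memory_callback(struct notifier_block *self,
                                       unsigned long action, void *arg)
    {
            int ret = 0;

            switch (action) {
            case MEM_GOING_OFFLINE:
                    ret = do_prepare();
                    break;
            default:
                    /* nothing to do: must NOT return a value with
                     * NOTIFY_STOP_MASK set, or lower-priority callbacks
                     * (like page_cgroup's) are silently skipped */
                    break;
            }
            return ret ? notifier_from_errno(ret) : NOTIFY_OK;
    }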

    Another approach to page_cgroup allocation could be considered:
    - free page_cgroup at OFFLINE even if it came from bootmem,
    and remove the special handler. But that requires more changes.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

21 Aug, 2008

1 commit

  • Switch remote node defragmentation off by default. The current settings can
    cause excessive node local allocations with hackbench:

    SLAB:

    % cat /proc/meminfo
    MemTotal: 7701760 kB
    MemFree: 5940096 kB
    Slab: 123840 kB

    SLUB:

    % cat /proc/meminfo
    MemTotal: 7701376 kB
    MemFree: 4740928 kB
    Slab: 1591680 kB

    [Note: this feature is not related to slab defragmentation.]

    You can find the original discussion here:

    http://lkml.org/lkml/2008/8/4/308
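
    The per-cache knob should remain available via
    /sys/kernel/slab/<cache>/remote_node_defrag_ratio. Roughly how the
    ratio gates remote defragmentation in get_any_partial() (a sketch from
    memory, not the patch text):

    if (!s->remote_node_defrag_ratio ||
        get_cycles() % 1024 > s->remote_node_defrag_ratio)
            return NULL;    /* don't scavenge remote nodes; grow locally */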

    Reported-by: KOSAKI Motohiro
    Tested-by: KOSAKI Motohiro
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

05 Aug, 2008

1 commit

  • This patch changes the static MIN_PARTIAL to a dynamic per-cache ->min_partial
    value that is calculated from object size. The bigger the object size, the more
    pages we keep on the partial list.
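
    A sketch of the shape of the change, with assumed names (the clamp
    bounds and set_min_partial() here are illustrative, not quoted from
    the patch):

    #define MIN_PARTIAL 5   /* floor, as before */
    #define MAX_PARTIAL 10  /* cap so huge objects don't pin memory */

    static void set_min_partial(struct kmem_cache *s, unsigned long min)
    {
            if (min < MIN_PARTIAL)
                    min = MIN_PARTIAL;
            else if (min > MAX_PARTIAL)
                    min = MAX_PARTIAL;
            s->min_partial = min;
    }

    /* at cache-creation time: bigger objects keep more partial pages */
    set_min_partial(s, ilog2(s->size));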

    I tested SLAB, SLUB, and SLUB with this patch on Jens Axboe's 'netio' example
    script of the fio benchmarking tool. The script stresses the networking
    subsystem which should also give a fairly good beating of kmalloc() et al.

    To run the test yourself, first clone the fio repository:

    git clone git://git.kernel.dk/fio.git

    and then run the following command n times on your machine:

    time ./fio examples/netio

    The results on my 2-way 64-bit x86 machine are as follows:

    [ the minimum, maximum, and average are captured from 50 individual runs ]

    real time (seconds)
                       min     max     avg     sd
    SLAB             22.76   23.38   22.98   0.17
    SLUB             22.80   25.78   23.46   0.72
    SLUB (dynamic)   22.74   23.54   23.00   0.20

    sys time (seconds)
                       min     max     avg     sd
    SLAB              6.90    8.28    7.70   0.28
    SLUB              7.42   16.95    8.89   2.28
    SLUB (dynamic)    7.17    8.64    7.73   0.29

    user time (seconds)
                       min     max     avg     sd
    SLAB             36.89   38.11   37.50   0.29
    SLUB             30.85   37.99   37.06   1.67
    SLUB (dynamic)   36.75   38.07   37.59   0.32

    As you can see from the above numbers, this patch brings SLUB to the same level
    as SLAB for this particular workload fixing a ~2% regression. I'd expect this
    change to help similar workloads that allocate a lot of objects that are close
    to the size of a page.

    Cc: Matthew Wilcox
    Cc: Andrew Morton
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

27 Jul, 2008

1 commit

  • The kmem cache passed to the constructor is only needed for constructors
    that are themselves multiplexers. Nobody uses this "feature", nor does
    anybody use the passed kmem cache in a non-trivial way, so pass only a
    pointer to the object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.
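
    As a rough before/after sketch of the flag-day conversion (foo_ctor()
    is an invented example, and the old signature shown is approximate):

    /* before: every constructor also received the cache */
    static void foo_ctor(struct kmem_cache *cachep, void *obj)
    {
            struct foo *f = obj;

            spin_lock_init(&f->lock);
    }

    /* after: constructors see only the object */
    static void foo_ctor(void *obj)
    {
            struct foo *f = obj;

            spin_lock_init(&f->lock);
    }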

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

25 Jul, 2008

1 commit

  • SLUB reuses two page bits for internal purposes: it overlays PG_active
    and PG_error. This is hidden away in slub.c. Document these overlays
    explicitly in the main page-flags enum along with all the others.
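
    The documented overlay presumably takes a shape like this in
    include/linux/page-flags.h (a sketch; the alias names are assumptions,
    not quoted from the patch):

    enum pageflags {
            /* ... the ordinary flags ... */

            /* SLUB's overlays: aliases of existing bits, not new bits */
            PG_slub_frozen = PG_active,
            PG_slub_debug  = PG_error,
    };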

    Signed-off-by: Andy Whitcroft
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Matt Mackall
    Cc: Nick Piggin
    Tested-by: KOSAKI Motohiro
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

19 Jul, 2008

1 commit

  • The limit of 128 bytes is too small when debugging slab corruption of the skb
    cache, for example. So increase the limit to PAGE_SIZE to make debugging
    corruptions easier.

    Acked-by: Ingo Molnar
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

11 Jul, 2008

1 commit

  • Vegard Nossum reported a crash in kmem_cache_alloc():

    BUG: unable to handle kernel paging request at da87d000
    IP: [] kmem_cache_alloc+0xc7/0xe0
    *pde = 28180163 *pte = 1a87d160
    Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
    EIP: 0060:[] EFLAGS: 00210203 CPU: 0
    EIP is at kmem_cache_alloc+0xc7/0xe0
    EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
    ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

    and analyzed it:

    "The register %ecx looks innocent but is very important here. The disassembly:

    mov %edx,%ecx
    shr $0x2,%ecx
    rep stos %eax,%es:(%edi) > 2 == 0x1adadada.)

    %ecx is the counter for the memset, from here:

    memset(object, 0, c->objsize);

    i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
    Where did "c" come from? Uh-oh...

    c = get_cpu_slab(s, smp_processor_id());

    This looks like it has very much to do with CPU hotplug/unplug. Is
    there a race between SLUB/hotplug since the CPU slab is used after it
    has been freed?"

    Good analysis.

    Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
    can be migrated to another CPU right after local_irq_restore() and
    before memset(). The initial CPU can become offline in the meantime (or
    a migration is a consequence of the CPU going offline), so its
    'kmem_cache_cpu' structure gets freed (slab_cpuup_callback).

    At some point the caller continues on another CPU, holding an
    obsolete pointer...
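
    A hedged sketch of the resulting fix in slab_alloc() (reconstructed
    from the analysis above, not quoted from the patch): snapshot
    c->objsize while interrupts are still disabled, so the later memset()
    no longer dereferences the possibly-freed kmem_cache_cpu:

    local_irq_save(flags);
    c = get_cpu_slab(s, smp_processor_id());
    objsize = c->objsize;           /* read while migration is impossible */
    /* ... fast-path allocation from c->freelist ... */
    local_irq_restore(flags);

    if (unlikely((gfpflags & __GFP_ZERO) && object))
            memset(object, 0, objsize);     /* 'c' may be stale here */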

    Signed-off-by: Dmitry Adamushko
    Reported-by: Vegard Nossum
    Acked-by: Ingo Molnar
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Dmitry Adamushko
     

05 Jul, 2008

1 commit

  • Remove all clameter@sgi.com addresses from the kernel tree since they will
    become invalid on June 27th. Change my maintainer email address for the
    slab allocators to cl@linux-foundation.org (which will be the new email
    address for the future).

    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Pekka Enberg
    Cc: Stephen Rothwell
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Jul, 2008

1 commit

  • The 192 byte cache is not necessary if we have a basic alignment of 128
    bytes. If it were used, its objects would be aligned to the next 128 byte
    boundary, which would result in another 256 byte cache. Two 256 byte
    kmalloc caches cause sysfs to complain about a duplicate entry.

    MIPS needs 128 byte aligned kmalloc caches and spits out warnings on boot without
    this patch.
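
    A self-contained illustration of the arithmetic (userspace C, not
    kernel code):

    #include <stdio.h>

    #define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
            /* with a 128-byte minimum alignment, a 192-byte object
             * rounds up to 256 -- colliding with kmalloc-256 */
            printf("192 aligned to 128 -> %d\n", ALIGN_UP(192, 128));
            return 0;
    }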

    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     

23 May, 2008

1 commit

  • Add a WARN_ON for pages that have neither PageSlab nor PageCompound set,
    to catch the worst abusers of ksize() in the kernel.
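
    A sketch of where the check presumably lands in ksize() (reconstructed
    from the description, not quoted from the patch):

    page = virt_to_head_page(object);
    if (unlikely(!PageSlab(page))) {
            /* large kmalloc()s are compound pages; anything else means
             * the caller handed ksize() a bogus pointer */
            WARN_ON(!PageCompound(page));
            return PAGE_SIZE << compound_order(page);
    }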

    Acked-by: Christoph Lameter
    Cc: Matt Mackall
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

01 May, 2008

1 commit

  • x86 is the only arch right now which provides an optimized version of
    div_long_long_rem, and it has the downside that one has to be very
    careful that the divide doesn't overflow.

    The API is a little awkward, as the arguments for the unsigned divide
    are signed. The signed version also doesn't handle a negative divisor
    and produces worse code on 64-bit archs.

    There is little incentive to keep this API alive, so this converts the few
    users to the new API.
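
    A hedged sketch of what such a conversion looks like at a call site
    (split_ns() is an invented example, using the linux/math64.h helpers):

    #include <linux/math64.h>
    #include <linux/time.h>

    static void split_ns(s64 ns, s64 *sec, s32 *nsec)
    {
            /* old: *sec = div_long_long_rem(ns, NSEC_PER_SEC, nsec); */
            *sec = div_s64_rem(ns, NSEC_PER_SEC, nsec);
    }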

    Signed-off-by: Roman Zippel
    Cc: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

30 Apr, 2008

1 commit

  • We can see an ever repeating problem pattern with objects of any kind in the
    kernel:

    1) freeing of active objects
    2) reinitialization of active objects

    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore. One problem spot is
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.

    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause. This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.

    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.

    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.

    The problems described above are not restricted to timers, but timers
    tend to expose them, usually as a full system crash. Other objects are
    less explosive, but the symptoms caused by such mistakes can be even
    harder to debug.

    Instead of creating specialized debugging code for the timer subsystem a
    generic infrastructure is created which allows developers to verify their code
    and provides an easy to enable debug facility for users in case of trouble.

    The debugobjects core code keeps track of operations on static and
    dynamic objects by inserting them into a hashed list and sanity-checking
    them on object operations; it also provides additional checks whenever
    kernel memory is freed.

    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list

    Each operation is sanity-checked before it is executed, and the
    subsystem-specific code can provide a fixup function which makes it
    possible to prevent damage from the operation. When a sanity check
    triggers, a warning message and a stack trace are printed.
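
    A hedged sketch of how a subsystem might hook in (modelled on the timer
    usage described above; my_obj and its callbacks are invented names):

    #include <linux/debugobjects.h>

    struct my_obj {
            int state;
    };

    /* called when, e.g., init is attempted on an active object; return
     * nonzero if the situation was repaired, 0 if not */
    static int my_obj_fixup_init(void *addr, enum debug_obj_state state)
    {
            return 0;       /* this sketch only reports, never fixes */
    }

    static struct debug_obj_descr my_obj_descr = {
            .name       = "my_obj",
            .fixup_init = my_obj_fixup_init,
    };

    static void my_obj_init(struct my_obj *obj)
    {
            debug_object_init(obj, &my_obj_descr);
            obj->state = 0;
    }

    static void my_obj_add(struct my_obj *obj)
    {
            debug_object_activate(obj, &my_obj_descr);
            /* ... insert into the subsystem list ... */
    }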

    The list of operations can be extended if the need arises. For now it's
    limited to the requirements of the first user (timers).

    The core code enqueues the objects into hash buckets. The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
    global lock.

    The debug code can be compiled in without being active. The runtime overhead
    is minimal and could be optimized by asm alternatives. A kernel command line
    option enables the debugging code.

    Thanks to Ingo Molnar for review, suggestions and cleanup patches.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

29 Apr, 2008

1 commit