16 Jul, 2008

2 commits

  • Conflicts:

    arch/powerpc/Kconfig
    arch/s390/kernel/time.c
    arch/x86/kernel/apic_32.c
    arch/x86/kernel/cpu/perfctr-watchdog.c
    arch/x86/kernel/i8259_64.c
    arch/x86/kernel/ldt.c
    arch/x86/kernel/nmi_64.c
    arch/x86/kernel/smpboot.c
    arch/x86/xen/smp.c
    include/asm-x86/hw_irq_32.h
    include/asm-x86/hw_irq_64.h
    include/asm-x86/mach-default/irq_vectors.h
    include/asm-x86/mach-voyager/irq_vectors.h
    include/asm-x86/smp.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • With the removal of destructors, slab_destroy_objs no longer actually
    destroys any objects, making the kernel doc incorrect and the function
    name misleading.

    In keeping with the other debug functions, rename it to
    slab_destroy_debugcheck and drop the kernel doc.

    Signed-off-by: Rabin Vincent
    Signed-off-by: Pekka Enberg

    Rabin Vincent
     

22 Jun, 2008

1 commit

  • The zonelist patches caused the loop that checks for available
    objects in permitted zones to not terminate immediately. One object
    per zone per allocation may be allocated and then abandoned.

    Break the loop when we have successfully allocated one object.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
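
    The shape of the fix, as a hedged sketch (simplified from the actual
    fallback path in mm/slab.c, which also checks cpuset permissions and
    per-node free object counts; variable names are illustrative):

        /*
         * Scan the permitted zones, but stop as soon as one object
         * has been allocated instead of continuing through the rest.
         */
        for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
                int nid = zone_to_nid(zone);

                obj = ____cache_alloc_node(cache, flags, nid);
                if (obj)
                        break;  /* the fix: one object is enough */
        }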
     

30 Apr, 2008

2 commits

  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
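
    The conversion is mechanical; for example:

        /* Before: gcc-specific */
        printk(KERN_ERR "%s: invalid argument\n", __FUNCTION__);

        /* After: standard C99, accepted by gcc and other compilers */
        printk(KERN_ERR "%s: invalid argument\n", __func__);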
     
  • We can see an ever-repeating problem pattern with objects of any kind in the
    kernel:

    1) freeing of active objects
    2) reinitialization of active objects

    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore. One problem spot is
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.

    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause. This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.

    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.

    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.

    The problems described above are not restricted to timers, but timers tend
    to expose them, usually as a full system crash. Other objects are less
    explosive, but the symptoms caused by such mistakes can be even harder to
    debug.

    Instead of creating specialized debugging code for the timer subsystem, a
    generic infrastructure is created which allows developers to verify their code
    and provides an easy to enable debug facility for users in case of trouble.

    The debugobjects core code keeps track of operations on static and dynamic
    objects by inserting them into a hashed list and sanity checking them on
    object operations. It also provides additional checks whenever kernel
    memory is freed.

    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list

    Each operation is sanity checked before it is executed, and the
    subsystem-specific code can provide a fixup function to prevent damage
    from the operation. When the sanity check triggers, a warning message and
    a stack trace are printed.

    The list of operations can be extended if the need arises. For now it's
    limited to the requirements of the first user (timers).

    The core code enqueues the objects into hash buckets. The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
    global lock.

    The debug code can be compiled in without being active. The runtime overhead
    is minimal and could be optimized by asm alternatives. A kernel command line
    option enables the debugging code.

    Thanks to Ingo Molnar for review, suggestions and cleanup patches.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
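
    A hedged sketch of the bucket scheme described above (the real
    lib/debugobjects.c differs in detail; the hash width here is
    illustrative):

        #define ODEBUG_HASH_BITS        14
        #define ODEBUG_HASH_SIZE        (1 << ODEBUG_HASH_BITS)

        struct debug_bucket {
                struct hlist_head       list;
                spinlock_t              lock;   /* per-bucket, no global lock */
        };

        static struct debug_bucket obj_hash[ODEBUG_HASH_SIZE];

        /* The object address picks the bucket, which keeps the
         * kfree/vfree lookup cheap. */
        static struct debug_bucket *get_bucket(unsigned long addr)
        {
                return &obj_hash[hash_long(addr, ODEBUG_HASH_BITS)];
        }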
     

28 Apr, 2008

4 commits

    Not all architectures define cache_line_size(), so, as suggested by Andrew,
    move the private implementations in mm/slab.c and mm/slob.c to
    <linux/cache.h>.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Reviewed-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
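
    The shared definition plausibly reduces to a fallback macro (a sketch;
    architectures that provide their own cache_line_size() keep it):

        /* In <linux/cache.h>: default to the L1 cache line size unless
         * the architecture supplies its own implementation. */
        #ifndef cache_line_size
        #define cache_line_size()       L1_CACHE_BYTES
        #endif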
     
  • Filtering zonelists requires very frequent use of zone_idx(). This is costly
    as it involves a lookup of another structure and a subtraction operation. As
    the zone_idx is often required, it should be quickly accessible. The node idx
    could also be stored here if it was found that accessing zone->node is
    significant, which may be the case on workloads where nodemasks are heavily
    used.

    This patch introduces a struct zoneref to store a zone pointer and a zone
    index. The zonelist then consists of an array of these struct zonerefs which
    are looked up as necessary. Helpers are given for accessing the zone index as
    well as the node index.

    [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
    [hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
    [hugh@veritas.com: just return do_try_to_free_pages]
    [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
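
    A sketch of the structure and its helpers, following the description
    above (field and helper names as described in the patch; details hedged):

        struct zoneref {
                struct zone *zone;      /* pointer to the actual zone */
                int zone_idx;           /* cached zone_idx(zone) */
        };

        static inline struct zone *zonelist_zone(struct zoneref *zoneref)
        {
                return zoneref->zone;
        }

        static inline int zonelist_zone_idx(struct zoneref *zoneref)
        {
                return zoneref->zone_idx;       /* no pointer chase */
        }

        static inline int zonelist_node_idx(struct zoneref *zoneref)
        {
                return zone_to_nid(zoneref->zone);
        }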
     
  • Currently a node has two sets of zonelists, one for each zone type in the
    system and a second set for GFP_THISNODE allocations. Based on the zones
    allowed by a gfp mask, one of these zonelists is selected. All of these
    zonelists consume memory and occupy cache lines.

    This patch replaces the multiple zonelists per-node with two zonelists. The
    first contains all populated zones in the system, ordered by distance, for
    fallback allocations when the target/preferred node has no free pages. The
    second contains all populated zones in the node suitable for GFP_THISNODE
    allocations.

    An iterator macro called for_each_zone_zonelist() is introduced; it
    iterates through each zone allowed by the GFP flags in the selected
    zonelist.

    Signed-off-by: Mel Gorman
    Acked-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
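
    Typical use of the iterator, sketched (gfp_zone() derives the highest
    permitted zone index from the GFP mask):

        struct zoneref *z;
        struct zone *zone;
        enum zone_type high_zoneidx = gfp_zone(gfp_mask);

        /* Walk every zone in the zonelist allowed by the GFP flags. */
        for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
                /* consider 'zone' for the allocation here */
        }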
     
  • Introduce a node_zonelist() helper function. It is used to lookup the
    appropriate zonelist given a node and a GFP mask. The patch on its own is a
    cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If
    necessary, it can be merged with the next patch in this set without problems.

    Reviewed-by: Christoph Lameter
    Signed-off-by: Mel Gorman
    Signed-off-by: Lee Schermerhorn
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
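
    With the two-zonelist design above, the helper plausibly reduces to an
    index into the node's pair of zonelists (a hedged sketch):

        static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
        {
                /* gfp_zonelist() selects the GFP_THISNODE list when the
                 * mask demands node-local memory, else the fallback list. */
                return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
        }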
     

20 Apr, 2008

1 commit

  • * Use new node_to_cpumask_ptr. This creates a pointer to the
    cpumask for a given node. This definition is in mm patch:

    asm-generic-add-node_to_cpumask_ptr-macro.patch

    * Use new set_cpus_allowed_ptr function.

    Depends on:
    [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
    [sched-devel]: sched: add new set_cpus_allowed_ptr function
    [x86/latest]: x86: add cpus_scnprintf function

    Cc: Greg Kroah-Hartman
    Cc: Greg Banks
    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
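
    The converted pattern, sketched (the macro declares a pointer rather
    than copying a possibly huge cpumask onto the stack):

        /* Declares 'mask' as a pointer to the node's cpumask. */
        node_to_cpumask_ptr(mask, node);

        /* Pass the pointer; no cpumask_t is copied by value. */
        set_cpus_allowed_ptr(current, mask);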
     

27 Mar, 2008

1 commit

  • Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on
    memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
    but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
    patch fixes list_add() corruption in mm/slab.c seen on the ES7000.

    Cc: Mel Gorman
    Cc: Olaf Hering
    Cc: Christoph Lameter
    Signed-off-by: Dan Yeisley
    Signed-off-by: Pekka Enberg
    Signed-off-by: Christoph Lameter

    Daniel Yeisley
     

20 Mar, 2008

1 commit

  • Fix various kernel-doc notation in mm/:

    filemap.c: add function short description; convert 2 to kernel-doc
    fremap.c: change parameter 'prot' to @prot
    pagewalk.c: change "-" in function parameters to ":"
    slab.c: fix short description of kmem_ptr_validate()
    swap.c: fix description & parameters of put_pages_list()
    swap_state.c: fix function parameters
    vmalloc.c: change "@returns" to "Returns:" since that is not a parameter

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
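
    For reference, the notation being fixed looks like this (an
    illustrative kernel-doc comment, not a quote from the patch):

        /**
         * kmem_ptr_validate - check an untrusted pointer
         * @cachep: the cache the pointer should belong to
         * @ptr: the pointer to validate
         *
         * Returns: nonzero if the pointer looks valid. ("Returns:" is used
         * because the return value is not a parameter, so "@returns" would
         * be wrong.)
         */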
     

07 Mar, 2008

3 commits

  • NUMA slab allocator cpu migration bugfix

    The NUMA slab allocator (specifically, cache_alloc_refill)
    is not refreshing its local copies of what cpu and what
    numa node it is on, when it drops and reacquires the irq
    block that it inherited from its caller. As a result
    those values become invalid if an attempt to migrate the
    process to another numa node occurred while the irq block
    had been dropped.

    The solution is to make cache_alloc_refill reload these
    variables whenever it drops and reacquires the irq block.

    The error is very difficult to hit. When it does occur,
    one gets the following oops + stack traceback bits in
    check_spinlock_acquired:

    kernel BUG at mm/slab.c:2417
    cache_alloc_refill+0xe6
    kmem_cache_alloc+0xd0
    ...

    This patch was developed against 2.6.23, ported to and
    compile-tested only against 2.6.25-rc4.

    Signed-off-by: Joe Korty
    Signed-off-by: Christoph Lameter

    Joe Korty
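
    The fix's shape, sketched (simplified from cache_alloc_refill(); the
    point is the reload after interrupts are re-disabled):

        local_irq_enable();     /* drop the irq block inherited from caller */
        /* slow path that may sleep, during which the task can be
         * migrated to another cpu or numa node */
        local_irq_disable();

        /* Reload: the values cached before the window are now suspect. */
        ac = cpu_cache_get(cachep);
        node = numa_node_id();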
     
  • Make them all use angle brackets and the directory name.

    Acked-by: Pekka Enberg
    Signed-off-by: Joe Perches
    Signed-off-by: Christoph Lameter

    Joe Perches
     
  • The NUMA fallback logic should be passing local_flags to kmem_get_pages() and not simply the
    flags passed in.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

26 Jan, 2008

2 commits

    This patch converts the known per-subsystem mutexes to
    get_online_cpus()/put_online_cpus(). It also eliminates the
    CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Ingo Molnar

    Gautham R Shenoy
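
    The converted pattern, sketched (the per-subsystem mutex name is
    hypothetical):

        /* Previously: mutex_lock(&subsystem_cpu_mutex), or relying on the
         * CPU_LOCK_ACQUIRE notification. Now a reference on the online-CPU
         * state blocks hotplug for the duration. */
        get_online_cpus();
        /* traverse or update per-CPU state safely here */
        put_online_cpus();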
     
  • If the node we're booting on doesn't have memory, bootstrapping kmalloc()
    caches resorts to fallback_alloc() which requires ->nodelists set for all
    nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in
    kmem_cache_init().

    As kmem_getpages() is called with GFP_THISNODE set, this used to work
    because of breakage in 2.6.22 and earlier, where GFP_THISNODE returned
    pages from the wrong node if a node had no memory. So it may have worked
    accidentally, and in an unsafe manner, because the pages would have been
    associated with the wrong node, which could trigger BUG_ONs and locking
    trouble.

    Tested-by: Mel Gorman
    Tested-by: Olaf Hering
    Reviewed-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    [ With additional one-liner by Olaf Hering - Linus ]
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

25 Jan, 2008

1 commit

    Partially revert the changes made by 04231b3002ac53f8a64a7bd142fde3fa4b6808c6
    to the kmem_list3 management. On a machine with a memoryless node, this
    BUG_ON was triggering:

    static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
    {
            struct list_head *entry;
            struct slab *slabp;
            struct kmem_list3 *l3;
            void *obj;
            int x;

            l3 = cachep->nodelists[nodeid];
            BUG_ON(!l3);

    Signed-off-by: Mel Gorman
    Cc: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: "Aneesh Kumar K.V"
    Cc: Nishanth Aravamudan
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

03 Jan, 2008

1 commit

  • Both SLUB and SLAB really did almost exactly the same thing for
    /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.

    This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
    and SLAB, and shares all the setup code. Maybe SLOB will want this some
    day too.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Dec, 2007

1 commit

  • mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.

    It's used by binfmt_flat, which can be built as a module.

    Signed-off-by: Tetsuo Handa
    Cc: Christoph Lameter
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
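
    The fix itself is the one-liner SLUB already had (sketch):

        EXPORT_SYMBOL(ksize);   /* needed by modular users such as binfmt_flat */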
     

01 Dec, 2007

1 commit

  • The database performance group have found that half the cycles spent
    in kmem_cache_free are spent in this one call to BUG_ON. Moving it
    into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
    performance win of almost 0.5% on their particular benchmark.

    The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff
    with the comment that "overhead should be minimal". It may have been
    minimal at the time, but it isn't now.

    [ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the
    performance regression but rather the virt_to_head_page() changes to
    virt_to_cache() that were added later." ]

    Signed-off-by: Matthew Wilcox
    Acked-by: Pekka J Enberg
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
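
    Sketched destination of the check (hedged; cache_free_debugcheck() is
    only compiled under CONFIG_SLAB_DEBUG, so production builds skip it):

        static void *cache_free_debugcheck(struct kmem_cache *cachep,
                                           void *objp, void *caller)
        {
                /* Moved out of kmem_cache_free()'s hot path. */
                BUG_ON(virt_to_cache(objp) != cachep);
                /* existing debug checks follow */
                return objp;
        }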
     

19 Oct, 2007

2 commits

    This patch fixes a memory leak in the error path.

    In reality, we don't need to call cpuup_canceled(cpu) for now, but an
    upcoming cpu hotplug error handling change needs this.

    Cc: Christoph Lameter
    Cc: Gautham R Shenoy
    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • cpuup_callback() is too long. This patch factors out CPU_UP_CANCELLED and
    CPU_UP_PREPARE handling from cpuup_callback().

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

17 Oct, 2007

6 commits

  • Since nothing earlier than gcc-3.2 is supported for kernel
    compilation, that 2.95 hack can be removed.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
    Slab constructors currently have a flags parameter that is never used, and
    the order of the arguments is the opposite of other slab functions: the
    object pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel.

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
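
    An illustrative registration under the new signature (cache, struct,
    and function names are hypothetical):

        static void my_ctor(struct kmem_cache *s, void *object)
        {
                /* initialize *object; the never-used flags argument and
                 * the reversed parameter order are gone */
        }

        cache = kmem_cache_create("my_cache", sizeof(struct my_obj), 0,
                                  SLAB_HWCACHE_ALIGN, my_ctor);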
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space, which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved. i.e. they can be migrated by deleting
    them and re-reading the information from elsewhere.

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
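
    One hedged example of the marking: slab caches created with
    SLAB_RECLAIM_ACCOUNT can tag their page allocations so the page
    allocator groups them into MIGRATE_RECLAIMABLE blocks:

        /* In the slab page-allocation path (sketch): */
        if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
                flags |= __GFP_RECLAIMABLE;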
     
  • The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up
    the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP
    flags:

    GFP_RECLAIM_MASK Flags used to control page allocator reclaim behavior.

    GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur.

    GFP_SLAB_BUG_MASK Flags that the slab allocator BUG()s on.

    These replace the uses of GFP_LEVEL_MASK in the slab allocators and in
    vmalloc.c.

    The use of the flags not included in these sets may occur as a result of a
    slab allocation standing in for a page allocation when constructing scatter
    gather lists. Extraneous flags are cleared and not passed through to the
    page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will
    now be ignored if passed to a slab allocator.

    Change the allocation of allocator metadata in SLAB and vmalloc to not
    pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the
    __GFP_THISNODE flag for such allocations. Generalize that to also cover
    vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.

    The impact of allocator metadata placement on access latency to the
    cachelines of the object itself is minimal since metadata is only
    referenced on alloc and free. The attempt is still made to place the
    metadata optimally, but we consistently allow fallback both in SLAB and vmalloc
    (SLUB does not need to allocate metadata like that).

    Allocator metadata may serve multiple in kernel users and thus should not
    be subject to the limitations arising from a single allocation context.

    [akpm@linux-foundation.org: fix fallback_alloc()]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
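
    How the masks are applied on entry to the slab allocator, as a sketch
    following the description above:

        /* Trap flags the slab allocator must never see. */
        if (flags & GFP_SLAB_BUG_MASK)
                BUG();

        /* Keep only placement and reclaim flags; extraneous bits such as
         * __GFP_COLD or __GFP_COMP are silently dropped. */
        local_flags = flags & (GFP_CONSTRAINT_MASK | GFP_RECLAIM_MASK);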
     
  • Slab should not allocate control structures for nodes without memory. This
    may seem to work right now, but it's unreliable since not all allocations can
    fall back due to the use of GFP_THISNODE.

    Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us to
    only allocate for nodes that have regular memory.

    Signed-off-by: Christoph Lameter
    Acked-by: Nishanth Aravamudan
    Acked-by: Lee Schermerhorn
    Acked-by: Bob Picco
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
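
    The switch, sketched (the per-node setup helper is hypothetical):

        /* Before: also visits memoryless nodes */
        for_each_online_node(node)
                setup_node_lists(node);

        /* After: only nodes with regular memory get control structures */
        for_each_node_state(node, N_NORMAL_MEMORY)
                setup_node_lists(node);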
     
  • A NULL pointer means that the object was not allocated. One cannot
    determine the size of an object that has not been allocated. Currently we
    return 0 but we really should BUG() on attempts to determine the size of
    something nonexistent.

    krealloc() interprets NULL to mean a zero sized object. Handle that
    separately in krealloc().

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
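
    A hedged sketch of the resulting split:

        size_t ksize(const void *objp)
        {
                BUG_ON(!objp);  /* cannot size an object never allocated */
                return obj_size(virt_to_cache(objp));
        }

        /* krealloc() keeps its own meaning for NULL: treat it as a
         * zero-sized object and fall through to a plain allocation. */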
     

23 Aug, 2007

1 commit

  • Skip calling cache_free_alien() when the platform is not numa capable.
    This will avoid the cache misses that happen while accessing slabp (a
    per-page memory reference) to get the nodeid. Instead, use a global
    variable to skip the call; that variable is most likely to be present in
    the cache.

    This gives a 0.8% performance boost with the database oltp workload on a
    quad-core SMP platform and by any means the number is not small :)

    Signed-off-by: Suresh Siddha
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
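
    A hedged sketch of the described short-circuit (the variable name
    follows the patch's intent; details may differ):

        static int numa_platform __read_mostly = 1;     /* cleared at boot
                                                           when not NUMA */

        /* In the free path: the global is hot in cache, unlike slabp. */
        if (numa_platform && cache_free_alien(cachep, objp))
                return;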
     

25 Jul, 2007

1 commit

  • Use the correct local variable when calling into the page allocator. Local
    `flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the
    page allocator, possibly from illegal contexts. The page allocator will later
    do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will
    then go BUG.

    Cc: Mike Galbraith
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
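
    A hedged reconstruction of the fix's shape: the page-allocator call
    must use the already-filtered local_flags, which has __GFP_ZERO masked
    out, rather than the raw flags:

        /* In cache_grow() (sketch): */
        objp = kmem_getpages(cachep, local_flags, nodeid);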
     

20 Jul, 2007

3 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
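
    The interface change, sketched:

        /* old: */ cache = kmem_cache_create(name, size, align, flags, ctor, dtor);
        /* new: */ cache = kmem_cache_create(name, size, align, flags, ctor);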
     
    I suspect Christoph tested his code only in the NUMA configuration; for
    the combination of SLAB+non-NUMA, the zero-sized kmallocs would not work.

    Of course, this would only trigger in configurations where those zero-
    sized allocations happen (not very common), so that may explain why it
    wasn't more widely noticed.

    Seen by Andi Kleen under qemu, and there seems to be a report by
    Michael Tsirkin on it too.

    Cc: Andi Kleen
    Cc: Roland Dreier
    Cc: Michael S. Tsirkin
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
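
    A hedged sketch of the likely fix site: the !NUMA kmalloc path must
    recognize the zero-size marker returned by the cache lookup, as the
    NUMA path already did:

        cachep = __find_general_cachep(size, flags);
        if (unlikely(ZERO_OR_NULL_PTR(cachep)))
                return cachep;  /* ZERO_SIZE_PTR or NULL, not a real cache */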
     
  • Work around a possible bug in the FRV compiler.

    What appears to be happening is that gcc resolves the
    __builtin_constant_p() in kmalloc() to true, but then fails to reduce the
    therefore-constant conditions in the if-statements it guards to constant
    results.

    When compiling with -O2 or -Os, one single spurious error crops up in
    cpuup_callback() in mm/slab.c. This can be avoided by making the memsize
    variable const.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
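
    The workaround amounts to a one-word change (sketch):

        const int memsize = sizeof(struct kmem_list3);  /* was plain 'int' */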
     

18 Jul, 2007

2 commits

    KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating a
    buffer. This is nonsense and error-prone. Moreover, when the caller
    forgets, it's very likely to subtly bite back by corrupting the stack,
    because the last position of the buffer is always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code accordingly.

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
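
    Effect on callers, sketched:

        /* Before: the "+ 1" every caller had to remember */
        char name[KSYM_NAME_LEN + 1];

        /* After: KSYM_NAME_LEN includes the trailing '\0' */
        char name[KSYM_NAME_LEN];
        lookup_symbol_name(addr, name);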
     
    It now becomes easy to support the zeroing allocs with generic inline
    functions in slab.h. Provide inline definitions to allow the continued use of
    kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing
    functions from the slab allocators and util.c.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
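
    The resulting wrappers are trivial inlines (a sketch close to the
    final slab.h):

        static inline void *kzalloc(size_t size, gfp_t flags)
        {
                return kmalloc(size, flags | __GFP_ZERO);
        }

        static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
        {
                return kmem_cache_alloc(k, flags | __GFP_ZERO);
        }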