17 Jul, 2007

2 commits

  • start_cpu_timer() should be __cpuinit (which also matches what its
    callers are).

    __devinit didn't cause problems, it simply wasted a few bytes of memory
    for the common CONFIG_HOTPLUG_CPU=n case.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This entry prints a header in the .start callback. This is OK, but the more
    elegant solution would be to move this into the .show callback and use
    seq_list_start_head() in the .start one.

    I have left it as is in order to make the patch just switch to the new API
    and nothing more.

    [adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer]
    Signed-off-by: Pavel Emelianov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

06 Jul, 2007

1 commit

  • Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
    by switching the redzone to 64 bits. Unfortunately, it neglected to
    ensure that the _second_ redzone, after the slab object, is aligned
    correctly. This caused illegal instruction faults on sparc32, which for
    some reason not entirely clear to me are not trapped and fixed up.

    Two things need to be done to fix this:
    - increase the object size, rounding up to alignof(long long) so
    that the second redzone can be aligned correctly.
    - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
    full 64 bits of space for the user word at the end of the buffer,
    even though we may not _use_ the whole 64 bits.

    This patch should be a no-op on any 64-bit architecture or any 32-bit
    architecture where alignof(long long) == 4. Of the others, it's tested
    on ppc32 by myself and a very similar patch was tested on sparc32 by
    Mark Fortescue, who reported the new problem.

    Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
    the new sizes. Again noticed by Mark.

    Signed-off-by: David Woodhouse
    Signed-off-by: Linus Torvalds

    David Woodhouse
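
The two steps in this message can be sketched in plain userspace C. This is only an illustration under assumed names: `layout_object`, `struct debug_layout` and the parameters `ll_align` (the target's alignof(long long)) and `word_size` (the target's machine word size) are hypothetical and do not come from mm/slab.c.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical debug layout: [redzone 1][object + padding][redzone 2][user word] */
struct debug_layout {
    size_t obj_offset;      /* start of the object */
    size_t redzone2_offset; /* start of the second redzone */
    size_t total_size;      /* bytes consumed by the whole buffer */
};

static struct debug_layout layout_object(size_t obj_size, size_t ll_align,
                                         size_t word_size, int store_user)
{
    const size_t redzone = 8; /* 64-bit redzones on both sides */
    struct debug_layout l;

    l.obj_offset = redzone;
    /* Step 1: pad the object so the second redzone is aligned. */
    l.redzone2_offset = l.obj_offset + align_up(obj_size, ll_align);
    l.total_size = l.redzone2_offset + redzone;
    /* Step 2: reserve a full 64 bits for the user word when ll_align == 8. */
    if (store_user)
        l.total_size += (ll_align == 8) ? 8 : word_size;
    return l;
}
```

With `ll_align == 8` and a 13-byte object, the second redzone lands at offset 24 instead of the misaligned offset 21.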
     

02 Jul, 2007

1 commit


09 Jun, 2007

1 commit

  • cache_free_alien() must be called regardless of whether we use alien
    caches or not. cache_free_alien() will do the right thing if there are no
    alien caches available.

    Signed-off-by: Christoph Lameter
    Cc: Paul Mundt
    Acked-by: Pekka J Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

19 May, 2007

1 commit


17 May, 2007

4 commits

  • Currently we have a maze of configuration variables that determine the
    maximum slab size. Worst of all it seems to vary between SLAB and SLUB.

    So define a common maximum size for kmalloc. For convenience's sake we use
    the maximum size ever supported, which is 32 MB. We limit the maximum size
    to a lower limit if MAX_ORDER does not allow such large allocations.

    For many architectures this patch will have the effect of adding large
    kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of
    memory will be needed for these caches (contemporary SLAB has dynamically
    sizeable node and cpu structures so the waste is less than in the past).

    Most architectures will then be able to allocate objects with sizes up to
    MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the
    number of supported processors) on IA64 because one or the other struct
    grew beyond what the slab allocators supported. This will avoid future
    issues and f.e. avoid fixes for 2k and 4k cpu support.

    CONFIG_LARGE_ALLOCS is no longer necessary so drop it.

    It fixes sparc64 with SLAB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
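
The size rule described above (cap at 32 MB, or lower if MAX_ORDER does not allow it) can be sketched as follows. The function names are hypothetical, and the sketch assumes the convention that the largest page-allocator order is MAX_ORDER - 1, so the largest contiguous allocation is 2^(MAX_ORDER + PAGE_SHIFT - 1) bytes.

```c
#include <assert.h>

#define KMALLOC_SHIFT_CAP 25 /* 2^25 bytes = 32 MB, the cap named in the message */

/* log2 of the largest kmalloc size for a given MAX_ORDER and PAGE_SHIFT. */
static unsigned int kmalloc_shift_high(unsigned int max_order,
                                       unsigned int page_shift)
{
    assert(max_order >= 1);
    /* Assumption: the largest order is max_order - 1. */
    unsigned int largest = max_order + page_shift - 1;

    return largest < KMALLOC_SHIFT_CAP ? largest : KMALLOC_SHIFT_CAP;
}

/* Largest kmalloc size in bytes under the same assumptions. */
static unsigned long kmalloc_max_size(unsigned int max_order,
                                      unsigned int page_shift)
{
    return 1UL << kmalloc_shift_high(max_order, page_shift);
}
```

For example, with 4 KB pages (PAGE_SHIFT 12) and MAX_ORDER 11, the page allocator limits the result to 4 MB; a configuration whose page allocator can exceed 32 MB is clamped to 32 MB.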
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • slub warns on this, and we're working on making kmalloc(0) return NULL.
    Let's make slab warn as well so our testers detect such callers more
    rapidly.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • There is no user of destructors left. There is no reason why we should keep
    checking for destructor calls in the slab allocators.

    The RFC for this patch was discussed at
    http://marc.info/?l=linux-kernel&m=117882364330705&w=2

    Destructors were mainly used for list management which required them to take a
    spinlock. Taking a spinlock in a destructor is a bit risky since the slab
    allocators may run the destructors anytime they decide a slab is no longer
    needed.

    This patch drops destructor support. Any attempt to use a destructor will BUG().

    Acked-by: Pekka Enberg
    Acked-by: Paul Mundt
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

6 commits

  • Currently the slab allocators contain callbacks into the page allocator to
    perform the draining of pagesets on remote nodes. This requires SLUB to have
    a whole subsystem in order to be compatible with SLAB. Moving node draining
    out of the slab allocators avoids a section of code in SLUB.

    Move the node draining so that it is done when the vm statistics are updated.
    At that point we are already touching all the cachelines with the pagesets of
    a processor.

    Add an expire counter there. If we have to update per zone or global vm
    statistics then assume that the pageset will require subsequent draining.

    The expire counter will be decremented on each vm stats update pass until it
    reaches zero. Then we will drain one batch from the pageset. The draining
    will cause vm counter updates which will then cause another expiration until
    the pcp is empty. So we will drain a batch every 3 seconds.

    Note that remote node draining is a somewhat esoteric feature that is required
    on large NUMA systems because otherwise significant portions of system memory
    can become trapped in pcp queues. The number of pcp is determined by the
    number of processors and nodes in a system. A system with 4 processors and 2
    nodes has 8 pcps which is okay. But a system with 1024 processors and 512
    nodes has 512k pcps with a high potential for large amount of memory being
    caught in them.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
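
The expire-counter scheme described above can be modelled in a few lines of userspace C. Everything here is a sketch: `struct pageset_sketch`, `stats_pass` and `passes_until_empty` are hypothetical names, and the reload value of 3 is an assumption chosen to match the "batch every 3 seconds" remark with one pass per second.

```c
#include <assert.h>

#define EXPIRE_PASSES 3 /* assumed reload value */

struct pageset_sketch {
    int pages;  /* pages currently queued in the pageset */
    int batch;  /* pages drained in one go */
    int expire; /* passes left before the next drain */
};

/*
 * One statistics update pass. had_updates is nonzero when counters had to
 * be folded into the zone or global statistics. Returns the pages drained.
 */
static int stats_pass(struct pageset_sketch *p, int had_updates)
{
    assert(p->batch > 0);
    if (had_updates)
        p->expire = EXPIRE_PASSES; /* expect that draining will be needed */
    if (p->expire == 0 || p->pages == 0)
        return 0;
    if (--p->expire > 0)
        return 0;                  /* not expired yet */

    int n = p->pages < p->batch ? p->pages : p->batch;
    p->pages -= n;                 /* drain one batch */
    return n;
}

/* Run passes until the pageset is empty; a drain counts as a counter update. */
static int passes_until_empty(struct pageset_sketch *p)
{
    int passes = 0;
    int dirty = 1;

    while (p->pages > 0) {
        dirty = stats_pass(p, dirty) > 0;
        passes++;
    }
    return passes;
}
```

In this model each drain re-arms the counter, so a pageset holding five pages with a batch of two empties after three drains spaced three passes apart.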
     
  • vmstat is currently using the cache reaper to periodically bring the
    statistics up to date. The cache reaper exists in SLUB only as a way to
    provide compatibility with SLAB. This patch removes the vmstat calls from the
    slab allocators and provides its own handling.

    Another advantage is that we can use a different frequency for the updates.
    Refreshing vm stats is a pretty fast job so we can run this every second and
    stagger this by only one tick. This will lead to some overlap in large
    systems. F.e. a system running at 250 HZ with 1024 processors will have 4 vm
    updates occurring at once.

    However, the vm stats update only accesses per node information. It is only
    necessary to stagger the vm statistics updates per processor in each node. Vm
    counter updates occurring on distant nodes will not cause cacheline
    contention.

    We could implement an alternate approach that runs the first processor on each
    node at the second and then each of the other processors on a node on a
    subsequent tick. That may be useful to keep a large amount of the second free
    of timer activity. Maybe the timer folks will have some feedback on this one?

    [jirislaby@gmail.com: add missing break]
    Cc: Arjan van de Ven
    Signed-off-by: Christoph Lameter
    Signed-off-by: Jiri Slaby
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
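
The overlap figure above follows from simple arithmetic, sketched here. The helper names are hypothetical, and the sketch assumes each CPU's first update is delayed by one second plus one tick per CPU index and then repeats every second.

```c
#include <assert.h>

/* Assumed first-update delay in ticks: one second plus one tick per CPU. */
static unsigned long first_update_delay(unsigned int cpu, unsigned int hz)
{
    return (unsigned long)hz + cpu;
}

/* Number of CPUs whose once-per-second updates fall on the same tick phase. */
static unsigned int cpus_on_phase(unsigned int ncpus, unsigned int hz,
                                  unsigned int phase)
{
    assert(hz > 0 && phase < hz);
    unsigned int n = 0;

    for (unsigned int cpu = 0; cpu < ncpus; cpu++)
        if (first_update_delay(cpu, hz) % hz == phase)
            n++;
    return n;
}
```

With 1024 CPUs at 250 HZ, every tick carries four or five updates in this model (1024 / 250 is slightly above 4), which agrees with the rough figure of 4 quoted in the message.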
     
  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Shut down the cache_reaper if the cpu is brought down and set the
    cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good.

    Signed-off-by: Christoph Lameter
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
    introduced.

    Cc: Pekka Enberg
    Cc: Srivatsa Vaddagiri
    Cc: Gautham Shenoy
    Signed-off-by: Heiko Carstens
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • No "blank" (or "*") line is allowed between the function name and the lines
    for its parameter(s).

    Cc: Randy Dunlap
    Signed-off-by: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     

09 May, 2007

2 commits

  • Same story as with the cat /proc/*/wchan vs rmmod race, only
    /proc/slab_allocators wants more info than just the symbol name.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There are two problems with the existing redzone implementation.

    Firstly, it's causing misalignment of structures which contain a 64-bit
    integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
    modules to fail to load because of the misalignment. (In particular, the
    first check in
    net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

    On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

    With slab debugging, we use 32-bit redzones. And allocated slab objects
    aren't sufficiently aligned to hold a structure containing a uint64_t.

    By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
    redzone checks on those architectures. By using 64-bit redzones we avoid that
    loss of debugging, and also fix the other problem while we're at it.

    When investigating this, I noticed that on 64-bit platforms we're using a
    32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
    aside for the redzone. This means that the four bytes immediately before
    or after the allocated object are 0x00,0x00,0x00,0x00 for LE and BE
    machines, respectively, which is probably not the most useful choice of
    poison value.

    One way to fix both of those at once is just to switch to 64-bit
    redzones in all cases.

    Signed-off-by: David Woodhouse
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
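
The second problem (a 32-bit pattern stored in a 64-bit redzone) is easy to reproduce in userspace. The constants below are illustrative values assumed for this sketch, not necessarily the kernel's definitions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative patterns (assumptions for this sketch). */
#define RED_ACTIVE_32_SKETCH UINT32_C(0x170FC2A5)
#define RED_ACTIVE_64_SKETCH UINT64_C(0x0123456789ABCDEF)

/* Count the zero bytes of a 64-bit redzone word as it is laid out in memory. */
static unsigned int zero_bytes_in_redzone(uint64_t word)
{
    unsigned char bytes[sizeof word];
    unsigned int zeros = 0;

    assert(sizeof word == 8);
    memcpy(bytes, &word, sizeof word); /* inspect the in-memory layout */
    for (size_t i = 0; i < sizeof bytes; i++)
        if (bytes[i] == 0)
            zeros++;
    return zeros;
}
```

Widening the 32-bit pattern to 64 bits leaves four zero bytes on either endianness; which end of the word they occupy depends on byte order. A true 64-bit pattern with no zero byte leaves none.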
     

08 May, 2007

14 commits

  • There is no user remaining and I have never seen any use of that flag.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
    that one would want to do something serious in a constructor or destructor,
    in particular given that the slab allocators run with interrupts disabled.
    Actions in constructors and destructors are by their nature very limited
    and usually do not go beyond initializing variables and list operations.

    (The i386 pgd ctor and dtors do take a spinlock in constructor and
    destructor..... I think that is the furthest we go at this point.)

    There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
    establishes a certain symmetry.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there were code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG, otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of and difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently failslab injects failures into ____cache_alloc(). But with
    CONFIG_NUMA enabled, that is not enough to make the actual slab allocator
    functions (kmalloc, kmem_cache_alloc, ...) return NULL.

    This patch moves the fault injection hook into __cache_alloc() and
    __cache_alloc_node(). These sit closer to the callers than ____cache_alloc()
    and make it possible to inject failures into the slab allocators with
    CONFIG_NUMA.

    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining use is in sparc64 and ppc64 and their use there
    reflects some earlier role that the slab flag once may have had. If
    it's specified then SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    matze
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page and the
    tail pages then we can avoid using page->private in the first page.
    page->private == page for the first page, so there is no real information in
    there.

    Freeing up page->private makes the use of compound pages more transparent.
    They become more usable like real pages. Right now we have to be careful f.e.
    if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
    can then no longer use the private field. This is one of the issues that
    cause us not to support debugging for page size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
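
The flag aliasing described above, and the reason the compound check must come first, can be sketched with a toy flags word. All names here (`struct page_sketch`, `SK_COMPOUND`, `SK_RECLAIM` and the helpers) are hypothetical; the real struct page and its flag macros differ.

```c
#include <assert.h>

#define SK_COMPOUND (1UL << 0) /* page belongs to a compound page */
#define SK_RECLAIM  (1UL << 1) /* reclaim bit; means "tail" on compound pages */

struct page_sketch {
    unsigned long flags;
};

static int page_compound(const struct page_sketch *p)
{
    return (p->flags & SK_COMPOUND) != 0;
}

/* Tail page: both bits set. The aliased bit alone does not mean "tail". */
static int page_tail(const struct page_sketch *p)
{
    const unsigned long mask = SK_COMPOUND | SK_RECLAIM;

    return (p->flags & mask) == mask;
}

/* Head page: compound, but without the aliased bit. */
static int page_head(const struct page_sketch *p)
{
    return page_compound(p) && !(p->flags & SK_RECLAIM);
}

/* The aliased bit keeps its reclaim meaning only on non-compound pages. */
static int page_reclaim(const struct page_sketch *p)
{
    return !page_compound(p) && (p->flags & SK_RECLAIM) != 0;
}

/* Mark a page of a compound page as a tail page. */
static void mark_tail(struct page_sketch *p)
{
    assert(page_compound(p)); /* only valid on compound pages */
    p->flags |= SK_RECLAIM;
}
```

Because one bit serves two purposes, every predicate that reads it consults the compound bit first. The scheme works only because compound pages are not reclaimed.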
     
  • It is only ever used prior to free_initmem().

    (It will cause a warning when we run the section checking, but that's a
    false-positive and it simply changes the source of an existing warning, which
    is also a false-positive)

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Some NUMA machines have a big MAX_NUMNODES (possibly 1024), but fewer
    possible nodes. This patch dynamically sizes the 'struct kmem_cache' to
    allocate only needed space.

    I moved nodelists[] field at the end of struct kmem_cache, and use the
    following computation in kmem_cache_init()

    cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
    nr_node_ids * sizeof(struct kmem_list3 *);

    On my two nodes x86_64 machine, kmem_cache.obj_size is now 192 instead of 704
    (This is because on x86_64, MAX_NUMNODES is 64)

    On bigger NUMA setups, this might reduce the gfporder of "cache_cache"

    Signed-off-by: Eric Dumazet
    Cc: Pekka Enberg
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
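
The offsetof() computation quoted in this message can be tried on a stand-in structure. `struct cache_sketch` and its fields are hypothetical and far smaller than the real struct kmem_cache, so the byte counts will not match the 192 and 704 reported above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES_SKETCH 64 /* compile-time upper bound on nodes */

struct list3_sketch; /* opaque per-node list structure */

struct cache_sketch {
    unsigned int buffer_size;
    unsigned int flags;
    const char *name;
    /* Must stay the last member so that the structure can be truncated. */
    struct list3_sketch *nodelists[MAX_NUMNODES_SKETCH];
};

/* Bytes needed when only nr_node_ids entries of nodelists[] are used. */
static size_t cache_struct_size(unsigned int nr_node_ids)
{
    assert(nr_node_ids >= 1 && nr_node_ids <= MAX_NUMNODES_SKETCH);
    return offsetof(struct cache_sketch, nodelists) +
           nr_node_ids * sizeof(struct list3_sketch *);
}
```

On a two-node machine this saves 62 pointer slots per structure in the sketch. The trick relies on nodelists[] being the final member and on no code indexing past nr_node_ids.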
     
  • We can avoid allocating empty shared caches and avoid an unnecessary check
    of cache->limit. We save some memory. We avoid bringing unnecessary cache
    lines into the CPU cache.

    All accesses to l3->shared are already checking NULL pointers so this patch is
    safe.

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The existing comment in mm/slab.c is *perfect*, so I reproduce it :

    /*
    * CPU bound tasks (e.g. network routing) can exhibit cpu bound
    * allocation behaviour: Most allocs on one cpu, most free operations
    * on another cpu. For these cases, an efficient object passing between
    * cpus is necessary. This is provided by a shared array. The array
    * replaces Bonwick's magazine layer.
    * On uniprocessor, it's functionally equivalent (but less efficient)
    * to a larger limit. Thus disabled by default.
    */

    As most shipped Linux kernels are now compiled with CONFIG_SMP, there is no way
    a preprocessor #if can detect if the machine is UP or SMP. Better to use
    num_possible_cpus().

    This means on UP we allocate a 'size=0 shared array', to be more efficient.

    Another patch can later avoid the allocations of 'empty shared arrays', to
    save some memory.

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • If slab->inuse is corrupted, cache_alloc_refill can enter an infinite
    loop, as detailed by Michael Richardson. This adds a BUG_ON to catch
    those cases.

    Cc: Michael Richardson
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • This introduces krealloc(), which reallocates memory while keeping the
    contents unchanged. The allocator avoids reallocation if the new size fits
    the currently used cache. I also added a simple non-optimized version for
    mm/slob.c for compatibility.

    [akpm@linux-foundation.org: fix warnings]
    Acked-by: Josef Sipek
    Acked-by: Matt Mackall
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
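
The behaviour described above (keep the buffer when the new size still fits, otherwise allocate, copy and free) can be sketched on top of malloc(). This is a userspace toy, not the kernel implementation: `toy_alloc`, `toy_ksize`, `toy_free` and `toy_krealloc` are hypothetical names, and the power-of-two size classes stand in for the slab caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header placed before each buffer; the union keeps the payload aligned. */
union toy_hdr {
    size_t capacity; /* usable bytes, i.e. the size class */
    max_align_t align;
};

/* Smallest power-of-two size class (at least 8) that holds n bytes. */
static size_t size_class(size_t n)
{
    size_t c = 8;

    while (c < n)
        c <<= 1;
    return c;
}

static void *toy_alloc(size_t n)
{
    size_t cap = size_class(n);
    union toy_hdr *h = malloc(sizeof *h + cap);

    if (!h)
        return NULL;
    h->capacity = cap;
    return h + 1;
}

/* Usable size of a buffer returned by toy_alloc(). */
static size_t toy_ksize(const void *p)
{
    assert(p != NULL);
    return ((const union toy_hdr *)p - 1)->capacity;
}

static void toy_free(void *p)
{
    if (p)
        free((union toy_hdr *)p - 1);
}

static void *toy_krealloc(void *p, size_t new_size)
{
    if (!p)
        return toy_alloc(new_size);
    if (new_size == 0) {
        toy_free(p);
        return NULL;
    }

    size_t old = toy_ksize(p);
    if (new_size <= old)
        return p;    /* still fits the current size class */

    void *q = toy_alloc(new_size);
    if (!q)
        return NULL; /* the original buffer is left intact */
    memcpy(q, p, old);
    toy_free(p);
    return q;
}
```

Growing a 10-byte request to 16 bytes returns the same pointer because both fit the 16-byte class, while growing to 100 bytes moves the data into a 128-byte buffer.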
     

03 May, 2007

1 commit

  • Set use_alien_caches to 0 on non-NUMA platforms, and avoid calling
    cache_free_alien() when use_alien_caches is not set. This will avoid the
    cache miss that happens while dereferencing slabp to get the nodeid.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton

    Siddha, Suresh B
     

04 Apr, 2007

1 commit


02 Mar, 2007

1 commit


21 Feb, 2007

1 commit

  • The alien cache is a per cpu per node array allocated for every slab on the
    system. Currently we size this array for all nodes that the kernel does
    support. For IA64 this is 1024 nodes. So we allocate an array with 1024
    objects even if we only boot a system with 4 nodes.

    This patch uses "nr_node_ids" to determine the number of possible nodes
    supported by a hardware configuration and only allocates an alien cache
    sized for possible nodes.

    The initialization of nr_node_ids occurred too late relative to the bootstrap
    of the slab allocator and so I moved the setup_nr_node_ids() into
    free_area_init_nodes().

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Feb, 2007

4 commits

  • A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
    source files, including:

    * make multi-line initial descriptions single line
    * denote some function names, constants and structs as such
    * change erroneous opening '/*' to '/**' in a few places
    * reword some text for clarity

    Signed-off-by: Robert P. J. Day
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • kmem_cache_free() was missing the check for freeing held locks.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Make ZONE_DMA optional in core code.

    - ifdef all code for ZONE_DMA and related definitions following the example
    for ZONE_DMA32 and ZONE_HIGHMEM.

    - Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
    0.

    - Modify the VM statistics to work correctly without a DMA zone.

    - Modify slab to not create DMA slabs if there is no ZONE_DMA.

    [akpm@osdl.org: cleanup]
    [jdike@addtoit.com: build fix]
    [apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: James Bottomley
    Cc: Paul Mundt
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Use the pointer passed to cache_reap to determine the work pointer and
    consolidate exit paths.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter