25 Jul, 2007

1 commit

  • Use the correct local variable when calling into the page allocator. Local
    `flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the
    page allocator, possibly from illegal contexts. The page allocator will later
    do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will
    then go BUG.

    Cc: Mike Galbraith
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

20 Jul, 2007

3 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • I suspect Christoph tested his code only in the NUMA configuration; for
    the combination of SLAB + non-NUMA, the zero-sized kmallocs would not work.

    Of course, this would only trigger in configurations where those zero-
    sized allocations happen (not very common), so that may explain why it
    wasn't more widely noticed.

    Seen by Andi Kleen under qemu, and there seems to be a report by
    Michael Tsirkin on it too.

    Cc: Andi Kleen
    Cc: Roland Dreier
    Cc: Michael S. Tsirkin
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Work around a possible bug in the FRV compiler.

    What appears to be happening is that gcc resolves the
    __builtin_constant_p() in kmalloc() to true, but then fails to reduce the
    therefore constant conditions in the if-statements it guards to constant
    results.

    When compiling with -O2 or -Os, one single spurious error crops up in
    cpuup_callback() in mm/slab.c. This can be avoided by making the memsize
    variable const.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

18 Jul, 2007

5 commits

  • KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
    a buffer. This is nonsense and error-prone. Moreover, when a caller
    forgets the extra byte, it is very likely to bite back subtly by corrupting
    the stack, because the last position of the buffer is always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code accordingly.

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • It now becomes easy to support the zeroing allocs with generic inline
    functions in slab.h. Provide inline definitions to allow the continued use of
    kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing
    functions from the slab allocators and util.c.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • A kernel convention for many allocators is that if __GFP_ZERO is passed to an
    allocator then the allocated memory should be zeroed.

    This is currently not supported by the slab allocators. The inconsistency
    makes it difficult to implement in derived allocators such as in the uncached
    allocator and the pool allocators.

    In addition, the support for zeroed allocations in the slab allocators does
    not have a consistent API. There are no zeroing allocator functions for NUMA
    node placement (kmalloc_node, kmem_cache_alloc_node). The zeroing
    allocations are only provided for default allocs (kzalloc,
    kmem_cache_zalloc_node). __GFP_ZERO will make zeroing universally available
    and does not require any additional functions.

    So add the necessary logic to all slab allocators to support __GFP_ZERO.

    The code is added to the hot path. The gfp flags are on the stack and so the
    cacheline is readily available for checking if we want a zeroed object.

    Zeroing while allocating is now a frequent operation and we seem to be
    gradually approaching a 1-1 parity between zeroing and not zeroing allocs.
    The current tree has 3476 uses of kmalloc vs 2731 uses of kzalloc.

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the
    allocators. Move ZERO_SIZE_PTR related stuff into slab.h.

    Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
    WARN_ON_ONCE(size == 0) that is still remaining in SLAB.

    Make slub return NULL like the other allocators if a too large memory segment
    is requested via __kmalloc.

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The size of a kmalloc object is readily available via ksize(). ksize is
    provided by all allocators and thus we can implement krealloc in a generic
    way.

    Implement krealloc in mm/util.c and drop slab specific implementations of
    krealloc.

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

2 commits

  • start_cpu_timer() should be __cpuinit (which also matches what its
    callers are).

    __devinit didn't cause problems, it simply wasted a few bytes of memory
    for the common CONFIG_HOTPLUG_CPU=n case.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This entry prints a header in the .start callback. This is OK, but the more
    elegant solution would be to move this into the .show callback and use
    seq_list_start_head() in the .start one.

    I have left it as is in order to make the patch just switch to the new API
    and nothing more.

    [adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer]
    Signed-off-by: Pavel Emelianov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

06 Jul, 2007

1 commit

  • Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
    by switching the redzone to 64 bits. Unfortunately, it neglected to
    ensure that the _second_ redzone, after the slab object, is aligned
    correctly. This caused illegal instruction faults on sparc32, which for
    some reason not entirely clear to me are not trapped and fixed up.

    Two things need to be done to fix this:
    - increase the object size, rounding up to alignof(long long) so
    that the second redzone can be aligned correctly.
    - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
    full 64 bits of space for the user word at the end of the buffer,
    even though we may not _use_ the whole 64 bits.

    This patch should be a no-op on any 64-bit architecture or any 32-bit
    architecture where alignof(long long) == 4. Of the others, it's tested
    on ppc32 by myself and a very similar patch was tested on sparc32 by
    Mark Fortescue, who reported the new problem.

    Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
    the new sizes. Again noticed by Mark.

    Signed-off-by: David Woodhouse
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

09 Jun, 2007

1 commit

  • cache_free_alien must be called regardless if we use alien caches or not.
    cache_free_alien() will do the right thing if there are no alien caches
    available.

    Signed-off-by: Christoph Lameter
    Cc: Paul Mundt
    Acked-by: Pekka J Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 May, 2007

4 commits

  • Currently we have a maze of configuration variables that determine the
    maximum slab size. Worst of all it seems to vary between SLAB and SLUB.

    So define a common maximum size for kmalloc. For convenience's sake we use
    the maximum size ever supported which is 32 MB. We limit the maximum size
    to a lower limit if MAX_ORDER does not allow such large allocations.

    For many architectures this patch will have the effect of adding large
    kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of
    memory will be needed for these caches (contemporary SLAB has dynamically
    sizeable node and cpu structures, so the waste is less than in the past).

    Most architectures will then be able to allocate objects with sizes up to
    MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the
    number of supported processors) on IA64 because one or the other struct
    grew beyond what the slab allocators supported. This will avoid future
    issues and f.e. avoid fixes for 2k and 4k cpu support.

    CONFIG_LARGE_ALLOCS is no longer necessary so drop it.

    It fixes sparc64 with SLAB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • slub warns on this, and we're working on making kmalloc(0) return NULL.
    Let's make slab warn as well so our testers detect such callers more
    rapidly.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • There is no user of destructors left. There is no reason why we should keep
    checking for destructor calls in the slab allocators.

    The RFC for this patch was discussed at
    http://marc.info/?l=linux-kernel&m=117882364330705&w=2

    Destructors were mainly used for list management which required them to take a
    spinlock. Taking a spinlock in a destructor is a bit risky since the slab
    allocators may run the destructors anytime they decide a slab is no longer
    needed.

    Patch drops destructor support. Any attempt to use a destructor will BUG().

    Acked-by: Pekka Enberg
    Acked-by: Paul Mundt
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

6 commits

  • Currently the slab allocators contain callbacks into the page allocator to
    perform the draining of pagesets on remote nodes. This requires SLUB to have
    a whole subsystem in order to be compatible with SLAB. Moving node draining
    out of the slab allocators avoids a section of code in SLUB.

    Move the node draining so that it is done when the vm statistics are updated.
    At that point we are already touching all the cachelines with the pagesets of
    a processor.

    Add an expire counter there. If we have to update per zone or global vm
    statistics then assume that the pageset will require subsequent draining.

    The expire counter will be decremented on each vm stats update pass until it
    reaches zero. Then we will drain one batch from the pageset. The draining
    will cause vm counter updates which will then cause another expiration until
    the pcp is empty. So we will drain a batch every 3 seconds.

    Note that remote node draining is a somewhat esoteric feature that is required
    on large NUMA systems because otherwise significant portions of system memory
    can become trapped in pcp queues. The number of pcp is determined by the
    number of processors and nodes in a system. A system with 4 processors and 2
    nodes has 8 pcps which is okay. But a system with 1024 processors and 512
    nodes has 512k pcps with a high potential for large amount of memory being
    caught in them.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • vmstat is currently using the cache reaper to periodically bring the
    statistics up to date. The cache reaper only exists in SLUB as a way to
    provide compatibility with SLAB. This patch removes the vmstat calls from
    the slab allocators and provides its own handling.

    The advantage is also that we can use a different frequency for the updates.
    Refreshing vm stats is a pretty fast job so we can run this every second and
    stagger this by only one tick. This will lead to some overlap in large
    systems. F.e. a system running at 250 HZ with 1024 processors will have 4 vm
    updates occurring at once.

    However, the vm stats update only accesses per node information. It is only
    necessary to stagger the vm statistics updates per processor in each node. Vm
    counter updates occurring on distant nodes will not cause cacheline
    contention.

    We could implement an alternate approach that runs the first processor on
    each node at the start of the second and then each of the other processors
    on a node on a subsequent tick. That may be useful to keep a large portion
    of the second free of timer activity. Maybe the timer folks will have some
    feedback on this one?

    [jirislaby@gmail.com: add missing break]
    Cc: Arjan van de Ven
    Signed-off-by: Christoph Lameter
    Signed-off-by: Jiri Slaby
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Shutdown the cache_reaper if the cpu is brought down and set the
    cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good.

    Signed-off-by: Christoph Lameter
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
    introduced.

    Cc: Pekka Enberg
    Cc: Srivatsa Vaddagiri
    Cc: Gautham Shenoy
    Signed-off-by: Heiko Carstens
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • No "blank" (or "*") line is allowed between the function name and the lines
    for its parameter(s).

    Cc: Randy Dunlap
    Signed-off-by: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     

09 May, 2007

2 commits

  • Same story as with the cat /proc/*/wchan vs rmmod race, only
    /proc/slab_allocators wants more info than just the symbol name.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There are two problems with the existing redzone implementation.

    Firstly, it's causing misalignment of structures which contain a 64-bit
    integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
    modules to fail to load because of the misalignment. (In particular, the
    first check in
    net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

    On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

    With slab debugging, we use 32-bit redzones. And allocated slab objects
    aren't sufficiently aligned to hold a structure containing a uint64_t.

    By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
    redzone checks on those architectures. By using 64-bit redzones we avoid that
    loss of debugging, and also fix the other problem while we're at it.

    When investigating this, I noticed that on 64-bit platforms we're using a
    32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
    aside for the redzone. This means that the four bytes immediately before
    or after the allocated object are 0x00,0x00,0x00,0x00 for LE and BE
    machines, respectively. This is probably not the most useful choice of
    poison value.

    One way to fix both of those at once is just to switch to 64-bit
    redzones in all cases.

    Signed-off-by: David Woodhouse
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

08 May, 2007

13 commits

  • There is no user remaining and I have never seen any use of that flag.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
    that one would want to do something serious in a constructor or destructor.
    In particular given that the slab allocators run with interrupts disabled.
    Actions in constructors and destructors are by their nature very limited
    and usually do not go beyond initializing variables and list operations.

    (The i386 pgd ctor and dtors do take a spinlock in constructor and
    destructor..... I think that is the furthest we go at this point.)

    There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
    establishes a certain symmetry.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there were code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG, otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without the removal of SLAB_DEBUG_INITIAL) from the fs
    constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently failslab injects failures into ____cache_alloc(). But with
    CONFIG_NUMA enabled, that is not enough to make the actual slab allocator
    functions (kmalloc, kmem_cache_alloc, ...) return NULL.

    This patch moves the fault injection hook inside __cache_alloc() and
    __cache_alloc_node(). These are lower in the call path than
    ____cache_alloc() and make it possible to inject failures into the slab
    allocators with CONFIG_NUMA.

    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining use is in sparc64 and ppc64 and their use there
    reflects some earlier role that the slab flag once may have had. If
    it is specified then SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    matze
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page and the
    tail pages, then we can avoid using page->private in the first page.
    page->private == page for the first page, so there is no real information in
    there.

    Freeing up page->private makes the use of compound pages more transparent.
    They become more usable like real pages. Right now we have to be careful f.e.
    if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
    can then no longer use the private field. This is one of the issues that
    cause us not to support debugging for page size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is only ever used prior to free_initmem().

    (It will cause a warning when we run the section checking, but that's a
    false-positive and it simply changes the source of an existing warning, which
    is also a false-positive)

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Some NUMA machines have a big MAX_NUMNODES (possibly 1024), but fewer
    possible nodes. This patch dynamically sizes the 'struct kmem_cache' to
    allocate only needed space.

    I moved nodelists[] field at the end of struct kmem_cache, and use the
    following computation in kmem_cache_init()

    cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
    nr_node_ids * sizeof(struct kmem_list3 *);

    On my two-node x86_64 machine, kmem_cache.obj_size is now 192 instead of 704
    (this is because on x86_64, MAX_NUMNODES is 64).

    On bigger NUMA setups, this might reduce the gfporder of "cache_cache"

    Signed-off-by: Eric Dumazet
    Cc: Pekka Enberg
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • We can avoid allocating empty shared caches and avoid unnecessary checks of
    cache->limit. We save some memory, and we avoid bringing unnecessary cache
    lines into the CPU cache.

    All accesses to l3->shared are already checking NULL pointers so this patch is
    safe.

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The existing comment in mm/slab.c is *perfect*, so I reproduce it:

    /*
    * CPU bound tasks (e.g. network routing) can exhibit cpu bound
    * allocation behaviour: Most allocs on one cpu, most free operations
    * on another cpu. For these cases, an efficient object passing between
    * cpus is necessary. This is provided by a shared array. The array
    * replaces Bonwick's magazine layer.
    * On uniprocessor, it's functionally equivalent (but less efficient)
    * to a larger limit. Thus disabled by default.
    */

    As most shipped Linux kernels are now compiled with CONFIG_SMP, there is no
    way a preprocessor #if can detect if the machine is UP or SMP. Better to
    use num_possible_cpus().

    This means on UP we allocate a 'size=0 shared array', to be more efficient.

    Another patch can later avoid the allocations of 'empty shared arrays', to
    save some memory.

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • If slab->inuse is corrupted, cache_alloc_refill can enter an infinite
    loop, as detailed by Michael Richardson in the following post: .
    This adds a BUG_ON to catch those cases.

    Cc: Michael Richardson
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg