17 Oct, 2007

2 commits

  • Slab constructors currently have a flags parameter that is never used, and
    the order of the arguments is the opposite of other slab functions: the
    object pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel.
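
    As a hedged illustration (the struct, cache and function names are
    hypothetical, not from the patch), a constructor under the new calling
    convention looks like this:

    struct foo {
            spinlock_t lock;
            struct list_head list;
    };

    static void foo_ctor(struct kmem_cache *s, void *object)
    {
            struct foo *f = object;

            spin_lock_init(&f->lock);
            INIT_LIST_HEAD(&f->list);
    }

    foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                  SLAB_HWCACHE_ALIGN, foo_ctor);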

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch marks a number of allocations that are either short-lived such as
    network buffers or are reclaimable such as inode allocations. When something
    like updatedb is called, long-lived and unmovable kernel allocations tend to
    be spread throughout the address space which increases fragmentation.

    This patch groups these allocations together as much as possible by adding a
    new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
    reclaimed on demand, but not moved; i.e., they can be migrated by deleting
    them and re-reading the information from elsewhere.
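
    As a hedged sketch (the cache and struct names are made up, and
    SLAB_RECLAIM_ACCOUNT itself predates this patch), a typical reclaimable
    cache such as a filesystem inode cache is created like this; caches of
    this kind are what end up grouped under the new reclaimable migrate type:

    foo_inode_cachep = kmem_cache_create("foo_inode_cache",
                                         sizeof(struct foo_inode_info),
                                         0, SLAB_RECLAIM_ACCOUNT, NULL);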

    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).
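
    A sketch of a typical callsite fixup (the cache and struct names are made
    up for illustration):

    /* before: a destructor argument that in practice always had to be NULL */
    foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0,
                                   SLAB_HWCACHE_ALIGN, foo_ctor, NULL);

    /* after: the dtor parameter is gone */
    foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0,
                                   SLAB_HWCACHE_ALIGN, foo_ctor);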

    Signed-off-by: Paul Mundt

    Paul Mundt
     

18 Jul, 2007

2 commits

  • It now becomes easy to support zeroing allocations with generic inline
    functions in slab.h. Provide inline definitions to allow the continued use
    of kzalloc, kmem_cache_zalloc, etc., but remove the other definitions of
    zeroing functions from the slab allocators and util.c.
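
    The generic wrappers are roughly of this shape (a sketch, not necessarily
    the exact code added to slab.h):

    static inline void *kzalloc(size_t size, gfp_t flags)
    {
            return kmalloc(size, flags | __GFP_ZERO);
    }

    static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
    {
            return kmem_cache_alloc(k, flags | __GFP_ZERO);
    }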

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the
    allocators. Move ZERO_SIZE_PTR related stuff into slab.h.

    Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
    WARN_ON_ONCE(size == 0) that is still remaining in SLAB.

    Make SLUB return NULL, like the other allocators, if too large a memory
    segment is requested via __kmalloc.
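
    For reference, the definitions are roughly the following (a sketch;
    ZERO_SIZE_PTR is a non-NULL, non-dereferenceable token returned for
    zero-byte allocations, and ZERO_OR_NULL_PTR() catches both it and NULL):

    #define ZERO_SIZE_PTR ((void *)16)

    #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
                                    (unsigned long)ZERO_SIZE_PTR)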

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

2 commits

  • This adds preliminary NUMA support to SLOB, primarily aimed at systems with
    small nodes (tested all the way down to a 128kB SRAM block), whether
    asymmetric or otherwise.

    We follow the same conventions as SLAB/SLUB, preferring current-node
    placement for new pages, or explicit placement if a node has been
    specified. Presently on UP NUMA this has the side-effect of preferring
    node#0 allocations (since numa_node_id() == 0, though this could be
    reworked if we could hand off a pfn to determine node placement), so
    single-CPU NUMA systems will want to place smaller nodes further out in
    terms of node id. Once a page has been bound to a node (via explicit node
    id typing), we only do block allocations from partial free pages that have
    a matching node id in the page flags.
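
    In terms of the allocation API this is the usual distinction (a sketch;
    the variable and cache names are illustrative only):

    /* no node specified: prefer the node the current CPU is running on */
    buf = kmalloc(size, GFP_KERNEL);

    /* explicit placement: bind the backing page to node `nid` */
    buf = kmalloc_node(size, GFP_KERNEL, nid);
    obj = kmem_cache_alloc_node(foo_cachep, GFP_KERNEL, nid);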

    The current implementation does have some scalability problems, in that all
    partial free pages are tracked in the global freelist (with contention due
    to the single spinlock). However, these are things that are being reworked
    for SMP scalability first, while things like per-node freelists can easily
    be built on top of this sort of functionality once it's been added.

    More background can be found in:

    http://marc.info/?l=linux-mm&m=118117916022379&w=2
    http://marc.info/?l=linux-mm&m=118170446306199&w=2
    http://marc.info/?l=linux-mm&m=118187859420048&w=2

    and subsequent threads.

    Acked-by: Christoph Lameter
    Acked-by: Matt Mackall
    Signed-off-by: Paul Mundt
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • Given that there is no remaining usage of the deprecated kmem_cache_t
    typedef anywhere in the tree, remove that typedef.
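
    Any leftover out-of-tree users would be converted along these lines (a
    trivial sketch):

    /* old, now removed:
     *      kmem_cache_t *foo_cachep;
     */
    struct kmem_cache *foo_cachep;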

    Signed-off-by: Robert P. J. Day
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

17 May, 2007

3 commits

  • Currently we have a maze of configuration variables that determine the
    maximum slab size. Worst of all, it seems to vary between SLAB and SLUB.

    So define a common maximum size for kmalloc. For convenience's sake we use
    the maximum size ever supported, which is 32 MB. We limit the maximum size
    further if MAX_ORDER does not allow such large allocations.
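
    A sketch of how such a common cap can be expressed (illustrative, not
    necessarily the exact macro text): 32 MB is 2^25 bytes, so the kmalloc
    shift is capped at 25 or at the largest order the page allocator can
    provide (MAX_ORDER - 1 pages), whichever yields the smaller size.

    #define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
                                    (MAX_ORDER + PAGE_SHIFT - 1) : 25)

    #define KMALLOC_MAX_SIZE   (1UL << KMALLOC_SHIFT_HIGH)
    #define KMALLOC_MAX_ORDER  (KMALLOC_SHIFT_HIGH - PAGE_SHIFT)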

    For many architectures this patch will have the effect of adding large
    kmalloc sizes. x86_64 adds 5 new kmalloc sizes, so a small amount of
    memory will be needed for these caches (contemporary SLAB has dynamically
    sizeable node and cpu structures, so the waste is less than in the past).

    Most architectures will then be able to allocate objects with sizes up to
    MAX_ORDER. We have had repeated breakage on IA64 (in fact whenever we
    doubled the number of supported processors) because one or another struct
    grew beyond what the slab allocators supported. This will avoid future
    issues and, for example, avoid fixes for 2k and 4k cpu support.

    CONFIG_LARGE_ALLOCS is no longer necessary so drop it.

    It fixes sparc64 with SLAB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
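
    A hedged sketch of the resulting cleanup in a constructor (the names are
    hypothetical; the ctor signature shown is the one in use at this point):

    /* before */
    static void foo_ctor(void *obj, struct kmem_cache *s, unsigned long flags)
    {
            struct foo *f = obj;

            if (flags & SLAB_CTOR_CONSTRUCTOR)      /* always true */
                    INIT_LIST_HEAD(&f->list);
    }

    /* after: the check is simply dropped */
    static void foo_ctor(void *obj, struct kmem_cache *s, unsigned long flags)
    {
            struct foo *f = obj;

            INIT_LIST_HEAD(&f->list);
    }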

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Two definitions remained in slab.h that are particular to the SLAB
    allocator. Move them to slab_def.h.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 May, 2007

7 commits

  • SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
    that one would want to do something serious in a constructor or destructor,
    in particular given that the slab allocators run with interrupts disabled.
    Actions in constructors and destructors are by their nature very limited
    and usually do not go beyond initializing variables and list operations.

    (The i386 pgd ctor and dtors do take a spinlock in constructor and
    destructor..... I think that is the furthest we go at this point.)

    There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
    establishes a certain symmetry.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that manipulates
    the object.

    Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there were code in a constructor
    handling SLAB_DEBUG_INITIAL, it would have to be conditional on
    SLAB_DEBUG; otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
            int a, b, c;
            struct list_head list;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

    will create a new slab named "test_slab" of size sizeof(struct test_slab),
    aligned to the alignment of struct test_slab. If creation fails then we
    panic.
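
    The macro itself boils down to roughly the following (a sketch; at this
    point kmem_cache_create() still takes both a ctor and a dtor argument):

    #define KMEM_CACHE(__struct, __flags)                                   \
            kmem_cache_create(#__struct, sizeof(struct __struct),           \
                              __alignof__(struct __struct), (__flags),      \
                              NULL, NULL)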

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining uses are in sparc64 and ppc64, and they reflect some
    earlier role that the slab flag may once have had. Wherever it is
    specified, SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB object queues exist per node, per CPU. The alien cache queue even
    has a queue array that contains a queue for each processor on each
    node. For very large systems the number of queues and the number of
    objects that may be caught in those queues grows exponentially. On our
    systems with 1k nodes / processors we have several gigabytes just tied up
    for storing references to objects in those queues. This does not include
    the objects that could be on those queues. One fears that the whole
    memory of the machine could one day be consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all meta data in the corresponding page struct. Objects can be naturally
    aligned in the slab. For example, a 128-byte object will be aligned on
    128-byte boundaries and can fit tightly into a 4k page with no bytes left
    over. SLAB cannot do this.

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per-CPU slab may be pushed back onto the partial list, but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per-CPU, shared and alien object queues
    during cache reaping, which may cause strange hold-offs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB does interleave on a page level) but that
    situation was also present before 2.6.13. SLAB's application of
    policies to individual slab objects is certainly a performance
    concern due to the frequent references to memory policies, which may
    lead a sequence of objects to come from one node after another. SLUB
    will get a slab full of objects from one node and then will switch to
    the next.

    F. Reduction of the size of partial slab lists

    SLAB has per-node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocations occur on specific nodes. SLUB has a global
    pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the use of the spin lock to check out slabs. SLUB has a global
    parameter (min_slab_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order, the
    fewer movements of pages between per-CPU and partial lists occur and
    the better SLUB will scale.

    H. Slab merging

    We often have slab caches with similar parameters. SLUB detects those
    on boot up and merges them into the corresponding general caches. This
    leads to more effective memory use. About 50% of all caches can
    be eliminated through slab merging. This will also decrease
    slab fragmentation because partially allocated slabs can be filled
    up again. Slab merging can be switched off by specifying
    slub_nomerge on boot up.

    Note that merging can expose heretofore unknown bugs in the kernel
    because corrupted objects may now be placed differently and corrupt
    differing neighboring objects. Enable sanity checks to find those.

    I. Diagnostics

    The current slab diagnostics are difficult to use and require a
    recompilation of the kernel. SLUB contains debugging code that
    is always available (but is kept out of the hot code paths).
    SLUB diagnostics can be enabled via the "slub_debug" option.
    Parameters can be specified to select a single or a group of
    slab caches for diagnostics. This means that the system is running
    with the usual performance and it is much more likely that
    race conditions can be reproduced.

    J. Resiliency

    If basic sanity checks are on then SLUB is capable of detecting
    common error conditions and recovering as well as possible to allow the
    system to continue.

    K. Tracing

    Tracing can be enabled via the slub_debug=T,<slabcache> option
    during boot. SLUB will then log all actions on that slab cache
    and dump the object contents on free.

    L. On demand DMA cache creation.

    Generally DMA caches are not needed. If a kmalloc is performed with
    __GFP_DMA then just the single slab cache that is needed is created.
    For systems that have no ZONE_DMA requirement the support is
    eliminated completely.

    M. Performance increase

    Some benchmarks have shown speed improvements on kernbench in the
    range of 5-10%. The locking overhead of slub is based on the
    underlying base allocation size. If we can reliably allocate
    larger order pages then it is possible to increase slub
    performance much further. The anti-fragmentation patches may
    enable further performance increases.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge                  Disable merging of slabs
    slub_min_order=x              Require a minimum order for slab caches.
                                  This increases the managed chunk size and
                                  therefore reduces meta data and locking
                                  overhead.
    slub_min_objects=x            Minimum objects per slab. Default is 8.
    slub_max_order=x              Avoid generating slabs larger than the
                                  order specified.
    slub_debug                    Enable all diagnostics for all caches
    slub_debug=<options>          Enable selective options for all caches
    slub_debug=<options>,<cache>  Enable selective options for a certain set
                                  of caches

    Available Debug options
    F Double Free checking, sanity and resiliency
    R Red zoning
    P Object / padding poisoning
    U Track last free / alloc
    T Trace all allocs / frees (only use for individual slabs).
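
    For example (hypothetical command lines built from the option letters
    above; the per-cache name is made up):

    slub_debug=FU                 free checking and alloc/free tracking for
                                  every cache
    slub_debug=FRUT,foo_cache     full checks plus tracing, restricted to the
                                  foo_cache slab only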

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Add proper prototypes in include/linux/slab.h.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This introduces krealloc(), which reallocates memory while keeping the
    contents unchanged. The allocator avoids reallocation if the new size fits the
    currently used cache. I also added a simple non-optimized version for
    mm/slob.c for compatibility.
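
    A minimal usage sketch (error handling follows the usual realloc()
    convention; the buffer names are illustrative):

    void *buf = kmalloc(64, GFP_KERNEL);
    void *tmp;

    tmp = krealloc(buf, 128, GFP_KERNEL);   /* contents of buf are preserved */
    if (!tmp)
            kfree(buf);             /* old buffer still valid on failure */
    else
            buf = tmp;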

    [akpm@linux-foundation.org: fix warnings]
    Acked-by: Josef Sipek
    Acked-by: Matt Mackall
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

14 Dec, 2006

2 commits

  • More cleanups for slab.h

    1. Remove tabs from weird locations as suggested by Pekka

    2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
    as suggested by Pekka.

    3. Use static inline for the fallback defs, as also suggested by Pekka.

    4. Make kmem_ptr_valid take a const * argument.

    5. Separate the NUMA fallback definitions from the kmalloc_track fallback
    definitions.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This is a response to an earlier discussion on linux-mm about splitting
    slab.h components per allocator. Patch is against 2.6.19-git11. See
    http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2

    This patch cleans up the slab header definitions. We define the common
    functions of slob and slab in slab.h and put the extra definitions needed
    for slab's kmalloc implementations in <linux/slab_def.h>. In order to get
    a greater set of common functions we add several empty functions to slob.c
    and also rename slob's kmalloc to __kmalloc.

    Slob does not need any special definitions since we introduce a fallback
    case. If there is no need for a slab implementation to provide its own
    kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
    That is sufficient for SLOB.
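
    The fallback is roughly of this shape (a sketch, not the exact header
    text): if an allocator defines no kmalloc() of its own, the generic
    wrappers in slab.h simply route everything through __kmalloc():

    static inline void *kmalloc(size_t size, gfp_t flags)
    {
            return __kmalloc(size, flags);
    }

    static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
    {
            return __kmalloc_node(size, flags, node);
    }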

    Sort the functions in slab.h according to their functionality. First the
    functions operating on struct kmem_cache *, then the kmalloc-related
    functions, followed by special debug and fallback definitions.

    Also redo a lot of comments.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Oct, 2006

1 commit

  • - rename ____kmalloc to kmalloc_track_caller so that people have a chance
      to guess what it does just from its name. Add a comment describing it
      for those who don't. Also move it after kmalloc in slab.h so people get
      less confused when they are just looking for kmalloc.

    - move things around in slab.c a little to reduce the ifdef mess.
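
    The point of the helper, sketched with a hypothetical wrapper (kstrdup()
    is the kind of in-tree code that uses this pattern):

    /*
     * kmalloc_track_caller() records the return address of its caller, so
     * allocations made here are attributed to whoever called foo_dup(),
     * not to foo_dup() itself.
     */
    static void *foo_dup(const void *src, size_t len, gfp_t gfp)
    {
            void *p = kmalloc_track_caller(len, gfp);

            if (p)
                    memcpy(p, src, len);
            return p;
    }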

    [penberg@cs.helsinki.fi: Fix up reversed #ifdef]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig