12 Sep, 2007

1 commit

  • This was posted on Aug 28 and fixes an issue that could cause trouble
    when slab caches >= 128k are created.

    http://marc.info/?l=linux-mm&m=118798149918424&w=2

    Currently we simply add the debug flags unconditionally when checking for
    a matching slab. This creates issues for sysfs processing when slabs
    exist that are exempt from debugging due to their huge size or because
    only a subset of slabs was selected for debugging.

    We need to only add the flags if kmem_cache_open() would also add them.

    Create a function to calculate the flags that would be set if the cache
    were opened, and use that function to determine the flags before looking
    for a compatible slab.
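
    A minimal sketch of the idea, assuming a hypothetical helper name and a
    simplified notion of "huge" and "selected for debugging" (the real SLUB
    internals differ in detail):

        /*
         * Illustrative only: return the debug flags that kmem_cache_open()
         * would actually apply, so that the slab-merge code compares like
         * with like.  The name and the 128k cutoff are assumptions.
         */
        static unsigned long debug_flags_for_cache(size_t size,
                                                   unsigned long flags,
                                                   unsigned long requested_debug)
        {
                if (size >= 128 * 1024)         /* huge caches: no debugging */
                        return flags;
                return flags | requested_debug; /* otherwise add selected flags */
        }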

    [akpm@linux-foundation.org: fixlets]
    Signed-off-by: Christoph Lameter
    Cc: Chuck Ebbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

31 Aug, 2007

1 commit

  • Do not BUG() if we cannot register a slab with sysfs. Just print an error.
    The only consequence of not registering is that the slab cache is not
    visible via /sys/slab. A BUG() may not be visible that early during boot
    and we have had multiple issues here already.
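
    Roughly the pattern being described, as a hedged sketch (the message
    wording is assumed, not copied from the patch; sysfs_slab_add() is the
    SLUB sysfs registration helper):

        err = sysfs_slab_add(s);
        if (err)
                printk(KERN_ERR "SLUB: unable to add %s to sysfs (%d), "
                        "cache will not appear under /sys/slab\n",
                        s->name, err);
        /* previously this was a BUG() */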

    Signed-off-by: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Aug, 2007

2 commits


10 Aug, 2007

2 commits

  • The dynamic dma kmalloc creation can run into trouble if a
    GFP_ATOMIC allocation is the first one performed for a certain size
    of dma kmalloc slab.

    - Move the adding of the slab to sysfs into a workqueue
    (sysfs does GFP_KERNEL allocations)
    - Do not call kmem_cache_destroy() (uses slub_lock)
    - Only acquire the slub_lock once and--if we cannot wait--do a trylock.

    This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
    for a range of sizes failing due to another process holding the slub_lock.
    However, we only need to acquire the spinlock once in order to establish
    each power of two DMA kmalloc cache. The possible conflict is with the
    slub_lock taken during slab management actions (create / remove slab cache).

    It is rather typical that a driver will first fill its buffers using
    GFP_KERNEL allocations, which will wait until the slub_lock can be
    acquired. Drivers will also create their slab caches first, outside of an
    atomic context, before starting to use atomic kmalloc from an interrupt
    context.

    If there are any failures then they will occur early after boot or when
    loading multiple drivers concurrently. Drivers can already accommodate
    failures of GFP_ATOMIC for other reasons. Retries will then create the slab.
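
    A sketch of the locking rule described above; slub_lock is assumed to be
    an rwsem here, and the cache creation and deferred sysfs registration
    are elided:

        struct kmem_cache *s = NULL;

        if (flags & __GFP_WAIT)                 /* we may sleep: wait for it */
                down_write(&slub_lock);
        else if (!down_write_trylock(&slub_lock))
                return NULL;                    /* atomic: fail, caller retries */

        /* ... allocate and open the DMA cache ... */
        /* ... schedule the sysfs add from a workqueue ... */

        up_write(&slub_lock);
        return s;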

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     
  • The MAX_PARTIAL checks were supposed to be an optimization. However, slab
    shrinking is a manually triggered process either through running slabinfo
    or by the kernel calling kmem_cache_shrink.

    If one really wants to shrink a slab then all operations should be done
    regardless of the size of the partial list. This also fixes an issue that
    could surface if the number of partial slabs was initially above MAX_PARTIAL
    in kmem_cache_shrink and later dropped below MAX_PARTIAL through the
    elimination of empty slabs on the partial list (rare). In that case a few
    slabs may be left off the partial list (and only be put back when they
    are empty).

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

31 Jul, 2007

2 commits


20 Jul, 2007

2 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).
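
    Under the old six-argument prototype a typical callsite update looked
    roughly like this (the cache name, struct and constructor are made up
    for illustration):

        /* before: a constructor plus a destructor slot that was usually NULL */
        cachep = kmem_cache_create("my_cache", sizeof(struct my_obj), 0,
                                   SLAB_HWCACHE_ALIGN, my_ctor, NULL);

        /* after: the dtor argument is gone */
        cachep = kmem_cache_create("my_cache", sizeof(struct my_obj), 0,
                                   SLAB_HWCACHE_ALIGN, my_ctor);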

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • The slab and slob allocators already did this right, but slub would call
    "get_object_page()" on the magic ZERO_SIZE_PTR, with all kinds of nasty
    end results.

    Noted by Ingo Molnar.

    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jul, 2007

20 commits

  • We currently cannot disable CONFIG_SLUB_DEBUG for CONFIG_NUMA. Now that
    embedded systems start to use NUMA we may need this.

    Put an #ifdef around places where NUMA-only code uses fields that are
    only valid with CONFIG_SLUB_DEBUG.
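
    The shape of the change is plain conditional compilation, for example
    (the helper below is illustrative; nr_slabs is one of the debug-only
    counters in struct kmem_cache_node):

        static inline void inc_slabs_node(struct kmem_cache_node *n)
        {
        #ifdef CONFIG_SLUB_DEBUG
                /* nr_slabs only exists when SLUB debugging is compiled in */
                atomic_long_inc(&n->nr_slabs);
        #endif
        }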

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Sysfs can do a gazillion things when called. Make sure that we do not call
    any sysfs functions while holding the slub_lock.

    Just protect the essentials:

    1. The list of all slab caches
    2. The kmalloc_dma array
    3. The ref counters of the slabs.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    The objects per slab increase with the current patches in mm since we
    allow up to order 3 allocs by default. More patches in mm allow the use
    of 2M or larger slabs. For slab validation we need per-object bitmaps in
    order to check a slab. We end up with up to 64k objects per slab,
    resulting in a potential requirement of 8K of stack space. That does not
    look good.

    Allocate the bit arrays via kmalloc.
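
    In outline, the per-slab object bitmap moves from the stack to a
    kmalloc() allocation sized from the object count (a sketch; the
    surrounding validation logic is omitted and s->objects follows the SLUB
    naming of that era):

        unsigned long *map;

        /* up to ~64k objects is far too large for an on-stack bitmap */
        map = kmalloc(BITS_TO_LONGS(s->objects) * sizeof(unsigned long),
                      GFP_KERNEL);
        if (!map)
                return;         /* skip validation rather than blow the stack */

        /* ... mark free objects in map, then check each allocated object ... */

        kfree(map);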

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    It now becomes easy to support the zeroing allocs with generic inline
    functions in slab.h. Provide inline definitions to allow the continued
    use of kzalloc, kmem_cache_zalloc etc., but remove other definitions of
    zeroing functions from the slab allocators and util.c.
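
    With __GFP_ZERO handled inside the allocators, the zeroing variants
    reduce to thin inlines in slab.h, roughly:

        static inline void *kzalloc(size_t size, gfp_t flags)
        {
                return kmalloc(size, flags | __GFP_ZERO);
        }

        static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
        {
                return kmem_cache_alloc(k, flags | __GFP_ZERO);
        }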

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    We can get to the length of the object through the kmem_cache structure.
    The additional parameter does no good and causes the compiler to
    generate bad code.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Do proper spacing and we only need to do this in steps of 8.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Adrian Bunk
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
    There is no need to calculate the dma slab size ourselves. We can simply
    look up the size of the corresponding non-dma slab.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • kmalloc_index is a long series of comparisons. The attempt to replace
    kmalloc_index with something more efficient like ilog2 failed due to compiler
    issues with constant folding on gcc 3.3 / powerpc.

    kmalloc_index()'s long list of comparisons works fine for constant folding
    since all the comparisons are optimized away. However, SLUB also uses
    kmalloc_index to determine the slab to use for the __kmalloc_xxx functions.
    This leads to a large set of comparisons in get_slab().

    The patch here allows us to get rid of that list of comparisons in get_slab():

    1. If the requested size is larger than 192 then we can simply use
    fls to determine the slab index since all larger slabs are
    of the power of two type.

    2. If the requested size is smaller then we cannot use fls since there
    are non power of two caches to be considered. However, the sizes are
    in a manageable range. So we divide the size by 8. Then we have only
    24 possibilities left and we simply look up the kmalloc index
    in a table.

    Code size of slub.o decreases by more than 200 bytes through this patch.
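
    The resulting lookup is roughly the following; the names and the table
    follow the SLUB kmalloc layout of that era (8, 16, 32, 64, 96, 128 and
    192 byte caches) but should be read as a sketch, with DMA and zero-size
    handling omitted:

        /* Maps (size - 1) / 8 to a kmalloc cache index for size <= 192. */
        static s8 size_index[24] = {
                3, 4, 5, 5, 6, 6, 6, 6,   /*   1 -  64: 8/16/32/64 byte caches */
                1, 1, 1, 1, 7, 7, 7, 7,   /*  65 - 128: 96 and 128 byte caches */
                2, 2, 2, 2, 2, 2, 2, 2    /* 129 - 192: the 192 byte cache */
        };

        static struct kmem_cache *get_slab(size_t size, gfp_t flags)
        {
                int index;

                if (size <= 192)
                        index = size_index[(size - 1) / 8]; /* table lookup */
                else
                        index = fls(size - 1);  /* power-of-two sized caches */

                return &kmalloc_caches[index];
        }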

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We modify the kmalloc_cache_dma[] array without proper locking. Do the proper
    locking and undo the dma cache creation if another processor has already
    created it.
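
    The race handling is the usual create-then-recheck pattern under the
    lock; a sketch with assumed helper names (create_dma_cache() and
    undo_dma_cache() are stand-ins):

        s = create_dma_cache(index);            /* may sleep, no lock held */

        down_write(&slub_lock);
        if (kmalloc_caches_dma[index])
                undo_dma_cache(s);              /* lost the race: discard ours */
        else
                kmalloc_caches_dma[index] = s;  /* won the race: install ours */
        up_write(&slub_lock);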

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The rarely used dma functionality in get_slab() makes the function too
    complex. The compiler begins to spill variables from the working set onto the
    stack. The created function is only used in extremely rare cases so make sure
    that the compiler does not decide on its own to merge it back into get_slab().
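
    Keeping the rarely used path out of line is just an attribute on the
    extracted helper; the body is elided in this sketch:

        /* noinline keeps gcc from merging this back into get_slab() */
        static noinline struct kmem_cache *dma_kmalloc_cache(int index,
                                                             gfp_t flags)
        {
                /* ... create the missing GFP_DMA kmalloc cache ... */
                return NULL;    /* placeholder body for the sketch */
        }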

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Add #ifdefs around data structures only needed if debugging is compiled into
    SLUB.

    Add inlines to small functions to reduce code size.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • A kernel convention for many allocators is that if __GFP_ZERO is passed to an
    allocator then the allocated memory should be zeroed.

    This is currently not supported by the slab allocators. The inconsistency
    makes it difficult to implement in derived allocators such as in the uncached
    allocator and the pool allocators.

    In addition, the support for zeroed allocations in the slab allocators
    does not have a consistent API. There are no zeroing allocator functions
    for NUMA node placement (kmalloc_node, kmem_cache_alloc_node). The
    zeroing allocations are only provided for default allocs (kzalloc,
    kmem_cache_zalloc). __GFP_ZERO will make zeroing universally available
    and does not require any additional functions.

    So add the necessary logic to all slab allocators to support __GFP_ZERO.

    The code is added to the hot path. The gfp flags are on the stack and so the
    cacheline is readily available for checking if we want a zeroed object.
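
    In the hot path this boils down to a single flag test on the object just
    handed out, approximately (objsize being the usable object size in the
    SLUB of that era):

        if (unlikely((flags & __GFP_ZERO) && object))
                memset(object, 0, s->objsize);

        return object;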

    Zeroing while allocating is now a frequent operation and we seem to be
    gradually approaching a 1-1 parity between zeroing and not zeroing allocs.
    The current tree has 3476 uses of kmalloc vs 2731 uses of kzalloc.

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the
    allocators. Move ZERO_SIZE_PTR related stuff into slab.h.

    Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
    WARN_ON_ONCE(size == 0) that is still remaining in SLAB.

    Make slub return NULL like the other allocators if too large a memory
    segment is requested via __kmalloc.
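
    The macros are small; roughly as described (the constant 16 keeps the
    pointer inside the faulting first page):

        /*
         * ZERO_SIZE_PTR is returned for zero sized allocations.
         * Dereferencing it faults, but it can be passed to kfree()
         * just like a normal pointer.
         */
        #define ZERO_SIZE_PTR ((void *)16)

        #define ZERO_OR_NULL_PTR(x) \
                ((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)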

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The size of a kmalloc object is readily available via ksize(). ksize is
    provided by all allocators and thus we can implement krealloc in a generic
    way.

    Implement krealloc in mm/util.c and drop slab specific implementations of
    krealloc.
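
    A generic krealloc() built only on ksize(), kmalloc() and kfree() then
    looks roughly like this (error handling kept minimal; treat it as a
    sketch rather than the exact mm/util.c code):

        void *krealloc(const void *p, size_t new_size, gfp_t flags)
        {
                void *ret;
                size_t ks;

                if (unlikely(!new_size)) {
                        kfree(p);
                        return ZERO_SIZE_PTR;
                }

                ks = ZERO_OR_NULL_PTR(p) ? 0 : ksize(p);
                if (ks >= new_size)             /* old object is big enough */
                        return (void *)p;

                ret = kmalloc(new_size, flags);
                if (ret && p) {
                        memcpy(ret, p, ks);
                        kfree(p);
                }
                return ret;
        }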

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The function we are calling to initialize object debug state during early NUMA
    bootstrap sets up an inactive object giving it the wrong redzone signature.
    The bootstrap nodes are active objects and should have active redzone
    signatures.

    Currently slab validation complains and reverts the object to active state.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently SLUB has no provision to deal with too high page orders that may
    be specified on the kernel boot line. If an order higher than 6 (on a 4k
    platform) is generated then we will BUG() because slabs get more than 65535
    objects.

    Add some logic that decreases the order for slabs that have too many
    objects. This allows booting with slab sizes up to MAX_ORDER.

    For example

    slub_min_order=10

    will boot with a default slab size of 4M and reduce slab sizes for small
    object sizes to lower orders if the number of objects becomes too big.
    Large slab sizes like that allow a concentration of objects of the same
    slab cache under as few TLB entries as possible and thus potentially
    reduce TLB pressure.
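
    The guard amounts to lowering the order until the object count fits the
    16-bit limit, along these lines (a sketch, not the exact slab_order()
    logic):

        static int fit_order(unsigned long size, int order, int min_order)
        {
                /* the per-slab object count is kept in 16 bits: cap at 65535 */
                while (order > min_order &&
                       (PAGE_SIZE << order) / size > 65535)
                        order--;
                return order;
        }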

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    We currently have to do a GFP_ATOMIC allocation because the list_lock is
    already taken when we first allocate memory for tracking allocation
    information. It would be better if we could avoid atomic allocations.

    Allocate a tracking table of a size that is usually sufficient (one page)
    before we take the list lock. We will then only do the atomic allocation
    if we need to resize the table to become larger than a page (mostly only
    needed under large NUMA because of the tracking of cpus and nodes;
    otherwise the table stays small).
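
    The pattern is to allocate the common case with GFP_KERNEL before taking
    list_lock and fall back to GFP_ATOMIC only when the table must grow; the
    names below follow the SLUB tracking code but the exact signatures
    should be treated as assumptions:

        struct loc_track t;

        /* one page of entries is usually enough; do this while we may sleep */
        alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location), GFP_KERNEL);

        spin_lock_irqsave(&n->list_lock, flags);
        /* ... walk the slabs; if t fills up, grow it with GFP_ATOMIC ... */
        spin_unlock_irqrestore(&n->list_lock, flags);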

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Use list_for_each_entry() instead of list_for_each().

    Get rid of for_all_slabs(). It had only one user. So fold it into the
    callback. This also gets rid of cpu_slab_flush.
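
    The conversion is mechanical; for the slab cache list it has this shape
    (assuming the list head field in struct kmem_cache is named list):

        struct kmem_cache *s;
        struct list_head *h;

        /* before */
        list_for_each(h, &slab_caches) {
                s = list_entry(h, struct kmem_cache, list);
                /* ... */
        }

        /* after: the iterator does the container_of() for us */
        list_for_each_entry(s, &slab_caches, list) {
                /* ... */
        }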

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Changes the error reporting format to loosely follow lockdep.

    If data corruption is detected then we generate the following lines:

    ============================================
    BUG <slab cache>: <what went wrong>
    --------------------------------------------

    INFO: <details of the corruption> [possibly multiple times]

    FIX <slab cache>: <corrective action taken>

    This also adds some more intelligence to the data corruption detection.
    It is now capable of figuring out the start and end of the corrupted area.

    Add a comment on how to configure SLUB so that a production system may
    continue to operate even though occasional slab corruption occurs through
    a misbehaving kernel component. See "Emergency operations" in
    Documentation/vm/slub.txt.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jul, 2007

1 commit

  • Add a new configuration variable

    CONFIG_SLUB_DEBUG_ON

    If set then the kernel will be booted by default with slab debugging
    switched on. Similar to CONFIG_SLAB_DEBUG. By default slab debugging
    is available but must be enabled by specifying "slub_debug" as a
    kernel parameter.

    Also add support to switch off slab debugging for a kernel that was
    built with CONFIG_SLUB_DEBUG_ON. This works by specifying

    slub_debug=-

    as a kernel parameter.

    Dave Jones wanted this feature.
    http://marc.info/?l=linux-kernel&m=118072189913045&w=2

    [akpm@linux-foundation.org: clean up switch statement]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Jul, 2007

1 commit

  • kmem_cache_open is static. The EXPORT_SYMBOL was left over from an
    earlier time when kmem_cache_open was usable outside of slub.

    (Fixes powerpc build error)

    Signed-off-by: Christoph Lameter
    Cc: Johannes Berg
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Jul, 2007

1 commit


24 Jun, 2007

1 commit


17 Jun, 2007

2 commits

  • If ARCH_KMALLOC_MINALIGN is set to a value greater than 8 (SLUB's
    smallest kmalloc cache) then SLUB may generate duplicate slabs in sysfs
    (yes, again) because the object size is padded to reach
    ARCH_KMALLOC_MINALIGN. Thus the sizes of the small slabs are all the
    same.

    No arch sets ARCH_KMALLOC_MINALIGN larger than 8, though, except mips,
    which for some reason wants a 128 byte alignment.

    This patch increases the size of the smallest cache if
    ARCH_KMALLOC_MINALIGN is greater than 8. In that case more and more of the
    smallest caches are disabled.

    If we do that then the count of the active general caches that is displayed
    on boot is not correct anymore since we may skip elements of the kmalloc
    array. So count them separately.

    This approach was tested by Havard yesterday.

    Signed-off-by: Christoph Lameter
    Cc: Haavard Skinnemoen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The data structure to manage the information gathered about functions
    allocating and freeing objects is allocated when the list_lock has already
    been taken. We need to allocate with GFP_ATOMIC instead of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Jun, 2007

1 commit

  • Instead of returning the smallest available object, return ZERO_SIZE_PTR.

    A ZERO_SIZE_PTR can be legitimately used as an object pointer as long as
    it is not dereferenced. Dereferencing ZERO_SIZE_PTR causes a distinctive
    fault. kfree can handle a ZERO_SIZE_PTR in the same way as NULL.

    This enables functions to handle zero-sized allocations, e.g. where n is
    the number of objects:

    objects = kmalloc(n * sizeof(*objects), GFP_KERNEL);

    for (i = 0; i < n; i++)
            objects[i].x = y;

    kfree(objects);

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Jun, 2007

1 commit


31 May, 2007

1 commit

  • We need this patch in ASAP. The patch fixes the mysterious hang that
    remained on some particular configurations with lockdep on, after the
    first fix that moved the #ifdef CONFIG_SLUB_DEBUG to the right location.
    See http://marc.info/?t=117963072300001&r=1&w=2

    The kmem_cache_node cache is very special because it is needed for NUMA
    bootstrap. Under certain conditions (for example, if lockdep is enabled
    and significantly increases the size of spinlock_t) the structure may
    become exactly the same size as one of the larger caches in the kmalloc
    array.

    That early during bootstrap we cannot perform merging properly. The
    unique id for the kmem_cache_node cache would then match that of one of
    the kmalloc caches. Sysfs will complain about a duplicate directory
    entry. All of this occurs while the console is not yet fully operational.
    Thus boot may appear to be failing silently.

    The kmem_cache_node cache is very special. During early bootstrap the
    main allocation function is not operational yet and so we have to run
    our own small special alloc function during early boot. It is also
    special in that it is never freed.

    We really do not want any merging on that cache. Set the refcount to -1
    and forbid merging of slabs that have a negative refcount.
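
    The merge veto is then a one-line test in the check that decides whether
    a cache may be merged, roughly:

        /*
         * A negative refcount marks caches, such as kmem_cache_node,
         * that must never be merged with anything else.
         */
        if (s->refcount < 0)
                return 1;       /* unmergeable */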

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

24 May, 2007

1 commit

  • The check for super-sized slabs, where we can no longer move the free
    pointer behind the object for debugging purposes etc., is accessing a
    field that is not set up yet. We must use objsize here since the size of
    the slab has not been determined yet.

    The effect of this is that a global slab shrink via "slabinfo -s" will
    show errors about offsets being wrong if booted with slub_debug.
    Potentially there are other troubles with huge slabs under slub_debug
    because the calculated free pointer offset is truncated.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter