09 May, 2007

10 commits

  • Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This implements deferred IO support in fbdev. Deferred IO is a way to delay
    and repurpose IO. This implementation is done using mm's page_mkwrite and
    page_mkclean hooks in order to detect, delay and then rewrite IO. This
    functionality is used by hecubafb.

    [adaplas]
    This is useful for graphics hardware with no directly addressable/mappable
    framebuffer. Implementing this will allow the "framebuffer" to be accessible
    from user space via mmap().
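
    A minimal sketch of how a driver hooks into this (the structure and init
    call follow fb_defio as merged here; the device flush routine is the
    driver's own and only illustrative):

        /* called once per .delay with the list of pages touched via mmap() */
        static void hecubafb_dpy_deferred_io(struct fb_info *info,
                                             struct list_head *pagelist)
        {
                hecubafb_dpy_update(info->par);  /* push dirty data to the panel */
        }

        static struct fb_deferred_io hecubafb_defio = {
                .delay          = HZ,                      /* coalesce writes for ~1s */
                .deferred_io    = hecubafb_dpy_deferred_io,
        };

        /* in probe(): page_mkwrite collects pages, deferred_io() runs later */
        info->fbdefio = &hecubafb_defio;
        fb_deferred_io_init(info);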

    Signed-off-by: Jaya Kumar
    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaya Kumar
     
  • Same story as with the cat /proc/*/wchan vs rmmod race, only
    /proc/slab_allocators wants more info than just the symbol name.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove do_sync_file_range() and convert callers to just use
    do_sync_mapping_range().
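
    The conversion on the caller side is mechanical; a hedged before/after
    sketch (offset, endbyte and flags stand for whatever the caller already had):

        /* before */
        ret = do_sync_file_range(file, offset, endbyte, flags);

        /* after: operate on the file's mapping directly */
        ret = do_sync_mapping_range(file->f_mapping, offset, endbyte, flags);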

    Signed-off-by: Mark Fasheh
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     
  • This patch moves the die notifier handling to common code. Previously,
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally; this should be understood as an appeal to
    the other architecture maintainers to implement support for it as well (aka
    sprinkling a notify_die or two in the proper places).

    arm had a notify_die that did something totally different, so I renamed it
    to arm_notify_die as part of the patch and made it static to the file where
    it is declared and used. avr32 used to pass slightly less information
    through this interface and I brought it into line with the other
    architectures.
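
    "Sprinkling a notify_die" in an architecture's trap handler looks roughly
    like this (the die_val and signal depend on the trap; regs, error_code and
    trapnr come from the handler, so treat the snippet as a sketch):

        #include <linux/kdebug.h>

        /* give kprobes, kgdb, etc. a chance to claim the event first */
        if (notify_die(DIE_OOPS, "oops", regs, error_code, trapnr, SIGSEGV)
                        == NOTIFY_STOP)
                return;
        /* ...default handling of the trap... */

    Interested subsystems attach themselves with register_die_notifier().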

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Cleanup: setting an outstanding error on a mapping was open coded too many
    times. Factor it out in mapping_set_error().
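
    The factored-out helper is small; it looks roughly like this:

        static inline void mapping_set_error(struct address_space *mapping, int error)
        {
                if (unlikely(error)) {
                        if (error == -ENOSPC)
                                set_bit(AS_ENOSPC, &mapping->flags);
                        else
                                set_bit(AS_EIO, &mapping->flags);
                }
        }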

    Signed-off-by: Guillaume Chazarain
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Chazarain
     
  • This patch adds a white list to modpost.c for some functions and an entry
    for ia64's .machvec section to fix section mismatches.

    sparse_index_alloc() and zone_wait_table_init() call the bootmem allocator
    at boot time and kmalloc/vmalloc at hotplug time. If memory hotplug is
    configured in, there are references to the bootmem allocator (init text)
    from them (normal text), which causes a section mismatch.

    Bootmem is called by many functions and must be used only at boot time, so
    the __init annotation on those functions should be kept for the section
    mismatch check. Instead, sparse_index_alloc() and zone_wait_table_init()
    are registered in the white list.

    In addition, ia64's .machvec section is a function table for platform
    dependent code. It is a mixture of .init.text and normal text, and its
    references to __init functions are valid too.
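
    The whitelisted pattern, roughly (simplified from sparse_index_alloc();
    error handling and the memset of the result are omitted):

        /* normal text, but may run at boot time as well as at hotplug time */
        static struct mem_section *sparse_index_alloc(int nid)
        {
                unsigned long size = SECTIONS_PER_ROOT * sizeof(struct mem_section);

                if (slab_is_available())
                        return kmalloc_node(size, GFP_KERNEL, nid);   /* hotplug */
                /* boot time: alloc_bootmem_node() lives in init text, which is
                 * what modpost flags as a section mismatch */
                return alloc_bootmem_node(NODE_DATA(nid), size);
        }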

    Signed-off-by: Yasunori Goto
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This fixes many section mismatches in code related to memory hotplug.
    I checked that it compiles with memory hotplug on and off on ia64 and
    x86-64 boxes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Monakhov Dmitriy
    Cc: Christoph Hellwig
    Acked-by: Anton Altaparmakov
    Acked-by: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitriy Monakhov
     
  • There are two problems with the existing redzone implementation.

    Firstly, it's causing misalignment of structures which contain a 64-bit
    integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
    modules to fail to load because of the misalignment. (In particular, the
    first check in
    net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

    On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

    With slab debugging, we use 32-bit redzones. And allocated slab objects
    aren't sufficiently aligned to hold a structure containing a uint64_t.

    By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
    redzone checks on those architectures. By using 64-bit redzones we avoid that
    loss of debugging, and also fix the other problem while we're at it.

    When investigating this, I noticed that on 64-bit platforms we're using a
    32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
    aside for the redzone. Which means that the four bytes immediately before
    or after the allocated object are 0x00,0x00,0x00,0x00 for LE and BE
    machines, respectively. Which is probably not the most useful choice of
    poison value.

    One way to fix both of those at once is just to switch to 64-bit
    redzones in all cases.
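
    A sketch of the alignment problem being fixed (the structure is
    illustrative, standing in for netfilter's struct ipt_entry):

        struct ipt_entry_like {
                u64     counter;        /* needs 8-byte alignment on ppc32/sparc32 */
                u32     flags;
        };

        /* With a 32-bit redzone word placed in front of each object, the object
         * itself ends up only 4-byte aligned, so the u64 above is misaligned.
         * Making the redzone a u64 restores 8-byte alignment and, on 64-bit
         * machines, fills the whole poison slot instead of leaving four zero
         * bytes next to the object. */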

    Signed-off-by: David Woodhouse
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

08 May, 2007

30 commits

  • The newly merged SLUB allocator patches had been generated before the
    removal of "struct subsystem", and ended up applying fine, but wouldn't
    build based on the current tree as a result.

    Fix up that merge error - not that SLUB is likely really ready for
    showtime yet, but at least I can fix the trivial stuff.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
    [SERIAL] sunsu: Fix section mismatch warnings.
    [SPARC64]: pgtable_cache_init() should be __init.
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/prom.c
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/pci.c
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/console.c
    [MM]: sparse_init() should be __init.
    [SPARC64]: Update defconfig.
    [VIDEO]: Add Sun XVR-2500 framebuffer driver.
    [VIDEO]: Add Sun XVR-500 framebuffer driver.
    [SPARC64]: SUN4U PCI-E controller support.
    [SPARC]: Fix comment typo in smp4m_blackbox_current().
    [SCSI] SUNESP: sun_esp.c needs linux/delay.h

    Fix up conflict in arch/sparc64/mm/init.c manually due to removal of
    pgtable_cache_init() through the -mm patches (even though that patch was
    also by David ;)

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it
    happens between try_to_freeze() and prepare_to_wait(). To prevent this
    from happening we should check freezing(current) before calling schedule().
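
    A sketch of the fixed wait loop in kswapd(), as described above:

        DEFINE_WAIT(wait);

        prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
        if (!freezing(current))         /* don't sleep across a freeze request */
                schedule();
        finish_wait(&pgdat->kswapd_wait, &wait);

        try_to_freeze();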

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with
    calls to inline functions that can be changed in subsequent patches without
    modifying the code calling them.
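
    A minimal sketch of the kind of wrapper introduced (the wrapper names are
    assumptions; the point is that callers stop touching the page flag bits
    directly):

        static inline void swsusp_set_page_free(struct page *page)
        {
                SetPageNosaveFree(page);
        }

        static inline int swsusp_page_is_free(struct page *page)
        {
                return PageNosaveFree(page);
        }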

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • SLOB doesn't calculate the correct page order when the page size is not 4KB.
    This patch fixes it by using get_order() instead of find_order(), which is
    SLOB's own version of get_order().
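
    get_order() derives the order from the configured PAGE_SIZE, so the result
    stays correct on non-4KB configurations; a small illustration:

        #include <asm/page.h>   /* get_order() */

        int order = get_order(16 * 1024);  /* 2 with 4KB pages, 0 with 64KB pages */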

    Signed-off-by: Akinobu Mita
    Acked-by: Matt Mackall
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • There is no user remaining and I have never seen any use of that flag.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
    that one would want to do something serious in a constructor or destructor,
    in particular given that the slab allocators run with interrupts disabled.
    Actions in constructors and destructors are by their nature very limited
    and usually do not go beyond initializing variables and list operations.

    (The i386 pgd ctor and dtors do take a spinlock in constructor and
    destructor..... I think that is the furthest we go at this point.)

    There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
    establishes a certain symmetry.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code manipulating
    the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without the removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the hugetlbfs specific hacks in toplevel get_unmapped_area() now that
    all archs and hugetlbfs itself do the right thing for both cases.

    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: William Irwin
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: David Howells
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Grant Grundler
    Cc: Matthew Wilcox
    Cc: "David S. Miller"
    Cc: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • generic arch_get_unmapped_area() now handles MAP_FIXED. Now that all
    implementations have been fixed, change the toplevel get_unmapped_area() to
    call into arch or drivers for the MAP_FIXED case.
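
    The arch-side change amounts to an early return; a hedged sketch:

        /* at the top of arch_get_unmapped_area() and the driver hooks: */
        if (flags & MAP_FIXED)
                return addr;    /* caller demands exactly this address */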

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: David Howells
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Grant Grundler
    Cc: Matthew Wilcox
    Cc: "David S. Miller"
    Cc: William Irwin
    Cc: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Fixes a deadlock in the OOM killer for allocations that are not
    __GFP_HARDWALL.

    Before the OOM killer checks for the allocation constraint, it takes
    callback_mutex.

    constrained_alloc() iterates through each zone in the allocation zonelist
    and calls cpuset_zone_allowed_softwall() to determine whether an allocation
    for gfp_mask is possible. If a zone's node is not in the OOM-triggering
    task's mems_allowed, it is not exiting, and we did not fail on a
    __GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
    callback_mutex to check the nearest exclusive ancestor of current's cpuset.
    This results in deadlock.

    We now take callback_mutex after iterating through the zonelist since we
    don't need it yet.
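
    A sketch of the reordering in out_of_memory() (function names as in the
    OOM killer at the time; the kill logic itself is unchanged):

        enum oom_constraint constraint;

        /* Determine the constraint *before* taking callback_mutex:
         * constrained_alloc() -> cpuset_zone_allowed_softwall() may itself
         * need callback_mutex, which previously deadlocked. */
        constraint = constrained_alloc(zonelist, gfp_mask);

        cpuset_lock();                  /* callback_mutex, only for the kill */
        read_lock(&tasklist_lock);
        /* ...pick and kill a task according to the constraint... */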

    Cc: Andi Kleen
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Martin J. Bligh
    Signed-off-by: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The current panic_on_oom may not work if there is a process using
    cpusets/mempolicy, because memory may remain on other nodes. But some people
    want failover by panicking ASAP even if cpusets/mempolicy are in use. This
    patch adds a new setting for that request.

    This is tested on my ia64 box which has 3 nodes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Benjamin LaHaise
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: Ethan Solomita
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Currently failslab injects failures into ____cache_alloc(). But with
    CONFIG_NUMA enabled that is not enough to make the actual slab allocator
    functions (kmalloc, kmem_cache_alloc, ...) return NULL.

    This patch moves the fault injection hook into __cache_alloc() and
    __cache_alloc_node(). These are earlier in the call path than
    ____cache_alloc() and make it possible to inject failures into the slab
    allocators with CONFIG_NUMA.
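
    A sketch of the new hook placement (should_failslab() is the existing
    failslab predicate):

        /* at the top of __cache_alloc() and __cache_alloc_node(): */
        if (should_failslab(cachep, flags))
                return NULL;            /* injected failure */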

    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining use is in sparc64 and ppc64 and their use there
    reflects some earlier role that the slab flag once may have had. If it is
    specified then SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Avoid down_write of the mmap_sem in madvise when we can help it.

    Acked-by: Hugh Dickins
    Signed-off-by: Nick Piggin
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    matze
     
  • kmem_cache_create() for slob doesn't handle SLAB_PANIC.
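
    A sketch of the fix in SLOB's kmem_cache_create() (error path only; the
    rest of the function is unchanged):

        c = slob_alloc(sizeof(struct kmem_cache), GFP_KERNEL, 0);
        if (!c && (flags & SLAB_PANIC))
                panic("Cannot create slab cache %s\n", name);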

    Signed-off-by: Matt Mackall
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • On x86_64 this cuts allocation overhead for page table pages down to a
    fraction (kernel compile / editing load; TSC-based measurement of time spent
    in each function):

    no quicklist

    pte_alloc 1569048 4.3s(401ns/2.7us/179.7us)
    pmd_alloc 780988 2.1s(337ns/2.7us/86.1us)
    pud_alloc 780072 2.2s(424ns/2.8us/300.6us)
    pgd_alloc 260022 1s(920ns/4us/263.1us)

    quicklist:

    pte_alloc 452436 573.4ms(8ns/1.3us/121.1us)
    pmd_alloc 196204 174.5ms(7ns/889ns/46.1us)
    pud_alloc 195688 172.4ms(7ns/881ns/151.3us)
    pgd_alloc 65228 9.8ms(8ns/150ns/6.1us)

    pgd allocations are the most complex and there we see the most dramatic
    improvement (maybe we can cut down the amount of pgds cached somewhat?). But
    even the pte allocations still see a doubling of performance.

    1. Proven code from the IA64 arch.

    The method used here has been fine tuned for years and
    is NUMA aware. It is based on the knowledge that accesses
    to page table pages are sparse in nature. Taking a page
    off the freelists instead of allocating zeroed pages
    allows a reduction in the number of cachelines touched
    in addition to getting rid of the slab overhead. So
    performance improves. This is particularly useful if pgds
    contain standard mappings. We can save on the teardown
    and setup of such a page if we have some on the quicklists.
    This includes avoiding lists operations that are otherwise
    necessary on alloc and free to track pgds.

    2. Lightweight alternative to using slab to manage page-size pages

    Slab overhead is significant and even page allocator use
    is pretty heavy weight. The use of a per cpu quicklist
    means that we touch only two cachelines for an allocation.
    There is no need to access the page_struct (unless arch code
    needs to fiddle around with it). So the fast path just
    means bringing in one cacheline at the beginning of the
    page. That same cacheline may then be used to store the
    page table entry. Or a second cacheline may be used
    if the page table entry is not in the first cacheline of
    the page. The current code will zero the page which means
    touching 32 cachelines (assuming 128-byte cachelines). We get down
    from 32 to 2 cachelines in the fast path.

    3. x86_64 gets lightweight page table page management.

    This will allow x86_64 arch code to repopulate pgds and other page table
    entries faster. The list operations for pgds
    are reduced in the same way as for i386 to the point where
    a pgd is allocated from the page allocator and when it is
    freed back to the page allocator. A pgd can pass through
    the quicklists without having to be reinitialized.

    4. Consolidation of code from multiple arches

    So far arches have their own implementation of quicklist
    management. This patch moves that feature into the core allowing
    an easier maintenance and consistent management of quicklists.

    Page table pages have the characteristics that they are typically zero or in a
    known state when they are freed. This is usually exactly the same state as
    needed after allocation. So it makes sense to build a list of freed page
    table pages and then consume those pages first. Those pages have
    already been initialized correctly (thus no need to zero them) and are likely
    already cached in such a way that the MMU can use them most effectively. Page
    table pages are used in a sparse way so zeroing them on allocation is not too
    useful.

    Such an implementation already exists for ia64. However, that implementation
    did not support constructors and destructors as needed by i386 / x86_64. It
    also only supported a single quicklist. The implementation here has
    constructor and destructor support as well as the ability for an arch to
    specify how many quicklists are needed.

    Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one
    quicklist is necessary then we can define NR_QUICK for additional lists. F.e.
    i386 needs two and thus has

    config NR_QUICK
    int
    default 2

    If an arch has requested quicklist support then pages can be allocated
    from the quicklist (or from the page allocator if the quicklist is
    empty) via:

    quicklist_alloc(, , )

    Page table pages can be freed using:

    quicklist_free(, , )

    Pages must have a definite state after allocation and before
    they are freed. If no constructor is specified then pages
    will be zeroed on allocation and must be zeroed before they are
    freed.

    If a constructor is used then the constructor will establish
    a definite page state. F.e. the i386 and x86_64 pgd constructors
    establish certain mappings.

    Constructors and destructors can also be used to track the pages.
    i386 and x86_64 use a list of pgds in order to be able to dynamically
    update standard mappings.
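
    A hedged sketch of arch usage (argument order per the new API: quicklist
    number, then gfp flags or destructor, then the page; a NULL ctor/dtor means
    the page is kept zeroed, and using quicklist 0 here is illustrative):

        #include <linux/quicklist.h>

        static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
                                                  unsigned long address)
        {
                return (pte_t *)quicklist_alloc(0, GFP_KERNEL, NULL);
        }

        static inline void pte_free_kernel(pte_t *pte)
        {
                quicklist_free(0, NULL, pte);   /* back onto quicklist 0 */
        }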

    Signed-off-by: Christoph Lameter
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Make sure that the check functions really only check things and do not
    perform other activities. Extract the tracing and object seeding out of the
    two check functions and place them into slab_alloc and slab_free.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • At kmem_cache_shrink, check if we have any empty slabs on the partial
    lists; if so, remove them.

    Also--as an anti-fragmentation measure--sort the partial slabs so that
    the most fully allocated ones come first and the least allocated last.

    The next allocations may fill up the nearly full slabs. Having the
    least allocated slabs last gives them the maximum chance that their
    remaining objects may be freed. Thus we can hopefully minimize the
    partial slabs.

    I think this is the best one can do in terms of anti-fragmentation
    measures. Real defragmentation (meaning moving objects out of slabs with
    the least free objects to those that are almost full) can be implemented
    by reverse scanning through the list produced here but that would mean
    that we need to provide a callback at slab cache creation that allows
    the deletion or moving of an object. This will involve slab API
    changes, so defer for now.
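
    A conceptual sketch of the shrink pass (field and helper names follow SLUB
    at the time; allocation of the bucket array, locking and the per-node
    counters are omitted):

        /* one bucket per possible "objects in use" count */
        for (i = 0; i < s->objects; i++)
                INIT_LIST_HEAD(slabs_by_inuse + i);

        list_for_each_entry_safe(page, t, &n->partial, lru) {
                if (!page->inuse) {
                        list_del(&page->lru);
                        discard_slab(s, page);          /* empty: hand it back */
                } else
                        list_move(&page->lru, slabs_by_inuse + page->inuse);
        }

        /* rebuild fullest-first; the emptiest slabs end up last and get the
         * best chance to drain completely */
        for (i = s->objects - 1; i > 0; i--)
                list_splice(slabs_by_inuse + i, n->partial.prev);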

    Cc: Mel Gorman
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch enables listing the callers who allocated or freed objects in a
    cache.

    For example to list the allocators for kmalloc-128 do

    cat /sys/slab/kmalloc-128/alloc_calls
    7 sn_io_slot_fixup+0x40/0x700
    7 sn_io_slot_fixup+0x80/0x700
    9 sn_bus_fixup+0xe0/0x380
    6 param_sysfs_setup+0xf0/0x280
    276 percpu_populate+0xf0/0x1a0
    19 __register_chrdev_region+0x30/0x360
    8 expand_files+0x2e0/0x6e0
    1 sys_epoll_create+0x60/0x200
    1 __mounts_open+0x140/0x2c0
    65 kmem_alloc+0x110/0x280
    3 alloc_disk_node+0xe0/0x200
    33 as_get_io_context+0x90/0x280
    74 kobject_kset_add_dir+0x40/0x140
    12 pci_create_bus+0x2a0/0x5c0
    1 acpi_ev_create_gpe_block+0x120/0x9e0
    41 con_insert_unipair+0x100/0x1c0
    1 uart_open+0x1c0/0xba0
    1 dma_pool_create+0xe0/0x340
    2 neigh_table_init_no_netlink+0x260/0x4c0
    6 neigh_parms_alloc+0x30/0x200
    1 netlink_kernel_create+0x130/0x320
    5 fz_hash_alloc+0x50/0xe0
    2 sn_common_hubdev_init+0xd0/0x6e0
    28 kernel_param_sysfs_setup+0x30/0x180
    72 process_zones+0x70/0x2e0

    cat /sys/slab/kmalloc-128/free_calls
    558
    3 sn_io_slot_fixup+0x600/0x700
    84 free_fdtable_rcu+0x120/0x260
    2 seq_release+0x40/0x60
    6 kmem_free+0x70/0xc0
    24 free_as_io_context+0x20/0x200
    1 acpi_get_object_info+0x3a0/0x3e0
    1 acpi_add_single_object+0xcf0/0x1e40
    2 con_release_unimap+0x80/0x140
    1 free+0x20/0x40

    SLAB_STORE_USER must be enabled for a slab cache by either booting with
    "slub_debug" or enabling user tracking specifically for the slab of interest.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We leave a minimum of partial slabs on nodes when we search for
    partial slabs on other nodes. Define a constant for that value.

    Then modify slub to keep MIN_PARTIAL slabs around.

    This avoids bad situations where a function frees the last object
    in a slab (which results in the page being returned to the page
    allocator) only to then allocate one again (which requires getting
    a page back from the page allocator if the partial list was empty).
    Keeping a couple of slabs on the partial list reduces overhead.

    Empty slabs are added to the end of the partial list to ensure that
    partially allocated slabs are consumed first (defragmentation).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This enables validation of slabs. Validation means that all objects are
    checked to see if there are redzone violations, if padding has been
    overwritten or any pointers have been corrupted. Also checks the consistency
    of slab counters.

    Validation enables the detection of metadata corruption without the kernel
    having to execute code that actually uses (allocs/frees) an object. It
    allows one to make sure that the slab metainformation and the guard values
    around an object have not been compromised.

    A single slabcache can be checked by writing a 1 to the "validate" file.

    i.e.

    echo 1 >/sys/slab/kmalloc-128/validate

    or use the slabinfo tool to check all slabs

    slabinfo -v

    Error messages will show up in the syslog.

    Note that validation can only reach slabs that are on a list. This means that
    we are usually restricted to partial slabs and active slabs unless
    SLAB_STORE_USER is active, which will build a full slab list and allow
    validation of slabs that are fully in use. Booting with "slub_debug" set will
    enable SLAB_STORE_USER and then full diagnostics are available.

    Note that we attempt to push cpu slabs back to the lists when we start the
    check. If the cpu slab is reactivated before we get to it (another processor
    grabs it before we get to it) then it cannot be checked.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If slab tracking is on then build a list of full slabs so that we can verify
    the integrity of all slabs and are also able to build lists of alloc/free
    callers.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Object tracking did not work the right way for several call chains. Fix this up
    by adding a new parameter to slub_alloc and slub_free that specifies the
    caller address explicitly.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The patch adds PageTail(page) and PageHead(page) to check if a page is the
    head or the tail of a compound page. This is done by masking the two bits
    describing the state of a compound page and then comparing them. So it is
    one comparison and a branch instead of two bit checks and two branches.
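
    A sketch of the resulting test (the mask construction is an assumption;
    the flag pair comes from the related compound-page patch, where PageTail
    aliases PageReclaim):

        #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))

        #define PageTail(page)  (((page)->flags & PG_head_tail_mask) == \
                                        PG_head_tail_mask)
        #define PageHead(page)  (((page)->flags & PG_head_tail_mask) == \
                                        (1L << PG_compound))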

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page and
    the tail pages then we can avoid using page->private in the first page.
    page->private == page for the first page, so there is no real information
    in there.

    Freeing up page->private makes the use of compound pages more transparent.
    They become more usable like real pages. Right now we have to be careful f.e.
    if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
    can then no longer use the private field. This is one of the issues that
    cause us not to support debugging for page size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Makes SLUB behave like SLAB in this area to avoid issues....

    Throw a stack dump to alert people.

    At some point the behavior should be switched back. NULL is no memory as
    far as I can tell, and if the user asked for 0 bytes then they should get
    no memory.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Structures may contain u64 items on 32 bit platforms that are only able to
    address 64 bit items on 64 bit boundaries. Change the minimum alignment of
    slabs to conform to those expectations.

    ARCH_KMALLOC_MINALIGN must be changed for good since a variety of structures
    are mixed in the general slabs.

    ARCH_SLAB_MINALIGN is changed because currently there is no consistent
    specification of object alignment. We may have that in the future when the
    KMEM_CACHE and related macros are used to generate slabs. These pass the
    alignment of the structure generated by the compiler to the slab.

    With KMEM_CACHE etc we could align structures that do not contain 64
    bit values to 32 bit boundaries potentially saving some memory.
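
    A sketch of the new defaults (the exact header placement is an assumption):

        #ifndef ARCH_KMALLOC_MINALIGN
        #define ARCH_KMALLOC_MINALIGN   __alignof__(unsigned long long)
        #endif

        #ifndef ARCH_SLAB_MINALIGN
        #define ARCH_SLAB_MINALIGN      __alignof__(unsigned long long)
        #endif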

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter