09 May, 2007

10 commits

  • Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • This implements deferred IO support in fbdev. Deferred IO is a way to delay
    and repurpose IO. This implementation is done using mm's page_mkwrite and
    page_mkclean hooks in order to detect, delay and then rewrite IO. This
    functionality is used by hecubafb.

    [adaplas]
    This is useful for graphics hardware with no directly addressable/mappable
    framebuffer. Implementing this will allow the "framebuffer" to be accessible
    from user space via mmap().
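
    A minimal sketch of how a driver hooks into this (the structure and init
    call follow fb_defio as merged here; the device flush routine is the
    driver's own and only illustrative):

        /* called once per .delay with the list of pages touched via mmap() */
        static void hecubafb_dpy_deferred_io(struct fb_info *info,
                                             struct list_head *pagelist)
        {
                hecubafb_dpy_update(info->par);  /* push dirty data to the panel */
        }

        static struct fb_deferred_io hecubafb_defio = {
                .delay          = HZ,                      /* coalesce writes for ~1s */
                .deferred_io    = hecubafb_dpy_deferred_io,
        };

        /* in probe(): page_mkwrite collects pages, deferred_io() runs later */
        info->fbdefio = &hecubafb_defio;
        fb_deferred_io_init(info);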

    Signed-off-by: Jaya Kumar
    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaya Kumar
     
  • Same story as with the cat /proc/*/wchan vs rmmod race, only
    /proc/slab_allocators wants more info than just the symbol name.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove do_sync_file_range() and convert callers to just use
    do_sync_mapping_range().
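
    The conversion on the caller side is mechanical; a hedged before/after
    sketch (offset, endbyte and flags stand for whatever the caller already had):

        /* before */
        ret = do_sync_file_range(file, offset, endbyte, flags);

        /* after: operate on the file's mapping directly */
        ret = do_sync_mapping_range(file->f_mapping, offset, endbyte, flags);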

    Signed-off-by: Mark Fasheh
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     
  • This patch moves the die notifier handling to common code. Previously,
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally; this should be understood as an appeal to
    the other architecture maintainers to implement support for it as well (aka
    sprinkling a notify_die or two in the proper places).

    arm had a notify_die that did something totally different, so I renamed it
    to arm_notify_die as part of the patch and made it static to the file where
    it is declared and used. avr32 used to pass slightly less information
    through this interface and I brought it into line with the other
    architectures.
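
    "Sprinkling a notify_die" in an architecture's trap handler looks roughly
    like this (the die_val and signal depend on the trap; regs, error_code and
    trapnr come from the handler, so treat the snippet as a sketch):

        #include <linux/kdebug.h>

        /* give kprobes, kgdb, etc. a chance to claim the event first */
        if (notify_die(DIE_OOPS, "oops", regs, error_code, trapnr, SIGSEGV)
                        == NOTIFY_STOP)
                return;
        /* ...default handling of the trap... */

    Interested subsystems attach themselves with register_die_notifier().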

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Cleanup: setting an outstanding error on a mapping was open coded too many
    times. Factor it out in mapping_set_error().
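
    The factored-out helper is small; it looks roughly like this:

        static inline void mapping_set_error(struct address_space *mapping, int error)
        {
                if (unlikely(error)) {
                        if (error == -ENOSPC)
                                set_bit(AS_ENOSPC, &mapping->flags);
                        else
                                set_bit(AS_EIO, &mapping->flags);
                }
        }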

    Signed-off-by: Guillaume Chazarain
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillaume Chazarain
     
  • This patch adds a white list to modpost.c for some functions and an entry
    for ia64's .machvec section to fix section mismatches.

    sparse_index_alloc() and zone_wait_table_init() call the bootmem allocator
    at boot time and kmalloc/vmalloc at hotplug time. If memory hotplug is
    configured in, there are references to the bootmem allocator (init text)
    from them (normal text), which causes a section mismatch.

    Bootmem is called by many functions and must be used only at boot time, so
    the __init annotation on those functions should be kept for the section
    mismatch check. Instead, sparse_index_alloc() and zone_wait_table_init()
    are registered in the white list.

    In addition, ia64's .machvec section is a function table for platform
    dependent code. It is a mixture of .init.text and normal text, and its
    references to __init functions are valid too.
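
    The whitelisted pattern, roughly (simplified from sparse_index_alloc();
    error handling and the memset of the result are omitted):

        /* normal text, but may run at boot time as well as at hotplug time */
        static struct mem_section *sparse_index_alloc(int nid)
        {
                unsigned long size = SECTIONS_PER_ROOT * sizeof(struct mem_section);

                if (slab_is_available())
                        return kmalloc_node(size, GFP_KERNEL, nid);   /* hotplug */
                /* boot time: alloc_bootmem_node() lives in init text, which is
                 * what modpost flags as a section mismatch */
                return alloc_bootmem_node(NODE_DATA(nid), size);
        }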

    Signed-off-by: Yasunori Goto
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This fixes many section mismatches in code related to memory hotplug.
    I checked that it compiles with memory hotplug on and off on ia64 and
    x86-64 boxes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Monakhov Dmitriy
    Cc: Christoph Hellwig
    Acked-by: Anton Altaparmakov
    Acked-by: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitriy Monakhov
     
  • There are two problems with the existing redzone implementation.

    Firstly, it's causing misalignment of structures which contain a 64-bit
    integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
    modules to fail to load because of the misalignment. (In particular, the
    first check in
    net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

    On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

    With slab debugging, we use 32-bit redzones. And allocated slab objects
    aren't sufficiently aligned to hold a structure containing a uint64_t.

    By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
    redzone checks on those architectures. By using 64-bit redzones we avoid that
    loss of debugging, and also fix the other problem while we're at it.

    When investigating this, I noticed that on 64-bit platforms we're using a
    32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
    aside for the redzone. Which means that the four bytes immediately before
    or after the allocated object are 0x00,0x00,0x00,0x00 for LE and BE
    machines, respectively. Which is probably not the most useful choice of
    poison value.

    One way to fix both of those at once is just to switch to 64-bit
    redzones in all cases.
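
    A sketch of the alignment problem being fixed (the structure is
    illustrative, standing in for netfilter's struct ipt_entry):

        struct ipt_entry_like {
                u64     counter;        /* needs 8-byte alignment on ppc32/sparc32 */
                u32     flags;
        };

        /* With a 32-bit redzone word placed in front of each object, the object
         * itself ends up only 4-byte aligned, so the u64 above is misaligned.
         * Making the redzone a u64 restores 8-byte alignment and, on 64-bit
         * machines, fills the whole poison slot instead of leaving four zero
         * bytes next to the object. */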

    Signed-off-by: David Woodhouse
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     

08 May, 2007

30 commits

  • The newly merged SLUB allocator patches had been generated before the
    removal of "struct subsystem", and ended up applying fine, but wouldn't
    build based on the current tree as a result.

    Fix up that merge error - not that SLUB is likely really ready for
    showtime yet, but at least I can fix the trivial stuff.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
    [SERIAL] sunsu: Fix section mismatch warnings.
    [SPARC64]: pgtable_cache_init() should be __init.
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/prom.c
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/pci.c
    [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/console.c
    [MM]: sparse_init() should be __init.
    [SPARC64]: Update defconfig.
    [VIDEO]: Add Sun XVR-2500 framebuffer driver.
    [VIDEO]: Add Sun XVR-500 framebuffer driver.
    [SPARC64]: SUN4U PCI-E controller support.
    [SPARC]: Fix comment typo in smp4m_blackbox_current().
    [SCSI] SUNESP: sun_esp.c needs linux/delay.h

    Fix up conflict in arch/sparc64/mm/init.c manually due to removal of
    pgtable_cache_init() through the -mm patches (even though that patch was
    also by David ;)

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it
    happens between try_to_freeze() and prepare_to_wait(). To prevent this
    from happening we should check freezing(current) before calling schedule().
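
    A sketch of the fixed wait loop in kswapd(), as described above:

        DEFINE_WAIT(wait);

        prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
        if (!freezing(current))         /* don't sleep across a freeze request */
                schedule();
        finish_wait(&pgdat->kswapd_wait, &wait);

        try_to_freeze();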

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with
    calls to inline functions that can be changed in subsequent patches without
    modifying the code calling them.
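
    A minimal sketch of the kind of wrapper introduced (the wrapper names are
    assumptions; the point is that callers stop touching the page flag bits
    directly):

        static inline void swsusp_set_page_free(struct page *page)
        {
                SetPageNosaveFree(page);
        }

        static inline int swsusp_page_is_free(struct page *page)
        {
                return PageNosaveFree(page);
        }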

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • SLOB doesn't calculate the correct page order when the page size is not 4KB.
    This patch fixes it by using get_order() instead of find_order(), which is
    SLOB's own version of get_order().
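
    get_order() derives the order from the configured PAGE_SIZE, so the result
    stays correct on non-4KB configurations; a small illustration:

        #include <asm/page.h>   /* get_order() */

        int order = get_order(16 * 1024);  /* 2 with 4KB pages, 0 with 64KB pages */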

    Signed-off-by: Akinobu Mita
    Acked-by: Matt Mackall
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • There is no user remaining and I have never seen any use of that flag.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
    that one would want to do something serious in a constructor or destructor,
    in particular given that the slab allocators run with interrupts disabled.
    Actions in constructors and destructors are by their nature very limited
    and usually do not go beyond initializing variables and list operations.

    (The i386 pgd ctor and dtors do take a spinlock in constructor and
    destructor..... I think that is the furthest we go at this point.)

    There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
    establishes a certain symmetry.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code manipulating
    the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without the removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the hugetlbfs specific hacks in toplevel get_unmapped_area() now that
    all archs and hugetlbfs itself do the right thing for both cases.

    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: William Irwin
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: David Howells
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Grant Grundler
    Cc: Matthew Wilcox
    Cc: "David S. Miller"
    Cc: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • generic arch_get_unmapped_area() now handles MAP_FIXED. Now that all
    implementations have been fixed, change the toplevel get_unmapped_area() to
    call into arch or drivers for the MAP_FIXED case.
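
    The arch-side change amounts to an early return; a hedged sketch:

        /* at the top of arch_get_unmapped_area() and the driver hooks: */
        if (flags & MAP_FIXED)
                return addr;    /* caller demands exactly this address */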

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: David Howells
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Grant Grundler
    Cc: Matthew Wilcox
    Cc: "David S. Miller"
    Cc: William Irwin
    Cc: Adam Litke
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Fixes a deadlock in the OOM killer for allocations that are not
    __GFP_HARDWALL.

    Before the OOM killer checks for the allocation constraint, it takes
    callback_mutex.

    constrained_alloc() iterates through each zone in the allocation zonelist
    and calls cpuset_zone_allowed_softwall() to determine whether an allocation
    for gfp_mask is possible. If a zone's node is not in the OOM-triggering
    task's mems_allowed, it is not exiting, and we did not fail on a
    __GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
    callback_mutex to check the nearest exclusive ancestor of current's cpuset.
    This results in deadlock.

    We now take callback_mutex after iterating through the zonelist since we
    don't need it yet.
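
    A sketch of the reordering in out_of_memory() (function names as in the
    OOM killer at the time; the kill logic itself is unchanged):

        enum oom_constraint constraint;

        /* Determine the constraint *before* taking callback_mutex:
         * constrained_alloc() -> cpuset_zone_allowed_softwall() may itself
         * need callback_mutex, which previously deadlocked. */
        constraint = constrained_alloc(zonelist, gfp_mask);

        cpuset_lock();                  /* callback_mutex, only for the kill */
        read_lock(&tasklist_lock);
        /* ...pick and kill a task according to the constraint... */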

    Cc: Andi Kleen
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Martin J. Bligh
    Signed-off-by: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The current panic_on_oom may not work if there is a process using
    cpusets/mempolicy, because memory may remain on other nodes. But some people
    want failover by panicking ASAP even if cpusets/mempolicy are in use. This
    patch adds a new setting for that request.

    This is tested on my ia64 box which has 3 nodes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Benjamin LaHaise
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: Ethan Solomita
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Currently failslab injects failures into ____cache_alloc(). But with
    CONFIG_NUMA enabled that is not enough to make the actual slab allocator
    functions (kmalloc, kmem_cache_alloc, ...) return NULL.

    This patch moves the fault injection hook into __cache_alloc() and
    __cache_alloc_node(). These are earlier in the call path than
    ____cache_alloc() and make it possible to inject failures into the slab
    allocators with CONFIG_NUMA.
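
    A sketch of the new hook placement (should_failslab() is the existing
    failslab predicate):

        /* at the top of __cache_alloc() and __cache_alloc_node(): */
        if (should_failslab(cachep, flags))
                return NULL;            /* injected failure */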

    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining use is in sparc64 and ppc64 and their use there
    reflects some earlier role that the slab flag once may have had. If it is
    specified then SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Avoid down_write of the mmap_sem in madvise when we can help it.

    Acked-by: Hugh Dickins
    Signed-off-by: Nick Piggin
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    matze
     
  • kmem_cache_create() for slob doesn't handle SLAB_PANIC.
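
    A sketch of the fix in SLOB's kmem_cache_create() (error path only; the
    rest of the function is unchanged):

        c = slob_alloc(sizeof(struct kmem_cache), GFP_KERNEL, 0);
        if (!c && (flags & SLAB_PANIC))
                panic("Cannot create slab cache %s\n", name);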

    Signed-off-by: Matt Mackall
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • On x86_64 this cuts allocation overhead for page table pages down to a
    fraction (kernel compile / editing load; TSC-based measurement of time spent
    in each function):

    no quicklist

    pte_alloc 1569048 4.3s(401ns/2.7us/179.7us)
    pmd_alloc 780988 2.1s(337ns/2.7us/86.1us)
    pud_alloc 780072 2.2s(424ns/2.8us/300.6us)
    pgd_alloc 260022 1s(920ns/4us/263.1us)

    quicklist:

    pte_alloc 452436 573.4ms(8ns/1.3us/121.1us)
    pmd_alloc 196204 174.5ms(7ns/889ns/46.1us)
    pud_alloc 195688 172.4ms(7ns/881ns/151.3us)
    pgd_alloc 65228 9.8ms(8ns/150ns/6.1us)

    pgd allocations are the most complex and there we see the most dramatic
    improvement (maybe we can cut down the amount of pgds cached somewhat?). But
    even the pte allocations still see a doubling of performance.

    1. Proven code from the IA64 arch.

    The method used here has been fine tuned for years and
    is NUMA aware. It is based on the knowledge that accesses
    to page table pages are sparse in nature. Taking a page
    off the freelists instead of allocating zeroed pages
    allows a reduction in the number of cachelines touched
    in addition to getting rid of the slab overhead. So
    performance improves. This is particularly useful if pgds
    contain standard mappings. We can save on the teardown
    and setup of such a page if we have some on the quicklists.
    This includes avoiding lists operations that are otherwise
    necessary on alloc and free to track pgds.

    2. Lightweight alternative to using slab to manage page-size pages

    Slab overhead is significant and even page allocator use
    is pretty heavy weight. The use of a per cpu quicklist
    means that we touch only two cachelines for an allocation.
    There is no need to access the page_struct (unless arch code
    needs to fiddle around with it). So the fast path just
    means bringing in one cacheline at the beginning of the
    page. That same cacheline may then be used to store the
    page table entry. Or a second cacheline may be used
    if the page table entry is not in the first cacheline of
    the page. The current code will zero the page which means
    touching 32 cachelines (assuming 128-byte cachelines). We get down
    from 32 to 2 cachelines in the fast path.

    3. x86_64 gets lightweight page table page management.

    This will allow x86_64 arch code to repopulate pgds and other page table
    entries faster. The list operations for pgds
    are reduced in the same way as for i386 to the point where
    a pgd is allocated from the page allocator and when it is
    freed back to the page allocator. A pgd can pass through
    the quicklists without having to be reinitialized.

    4. Consolidation of code from multiple arches

    So far arches have their own implementation of quicklist
    management. This patch moves that feature into the core allowing
    an easier maintenance and consistent management of quicklists.

    Page table pages have the characteristics that they are typically zero or in a
    known state when they are freed. This is usually exactly the same state as
    needed after allocation. So it makes sense to build a list of freed page
    table pages and then consume those pages first. Those pages have
    already been initialized correctly (thus no need to zero them) and are likely
    already cached in such a way that the MMU can use them most effectively. Page
    table pages are used in a sparse way so zeroing them on allocation is not too
    useful.

    Such an implementation already exists for ia64. However, that implementation
    did not support constructors and destructors as needed by i386 / x86_64. It
    also only supported a single quicklist. The implementation here has
    constructor and destructor support as well as the ability for an arch to
    specify how many quicklists are needed.

    Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one
    quicklist is necessary then we can define NR_QUICK for additional lists. F.e.
    i386 needs two and thus has

    config NR_QUICK
    int
    default 2

    If an arch has requested quicklist support then pages can be allocated
    from the quicklist (or from the page allocator if the quicklist is
    empty) via:

    quicklist_alloc(, , )

    Page table pages can be freed using:

    quicklist_free(, , )

    Pages must have a definite state after allocation and before
    they are freed. If no constructor is specified then pages
    will be zeroed on allocation and must be zeroed before they are
    freed.

    If a constructor is used then the constructor will establish
    a definite page state. F.e. the i386 and x86_64 pgd constructors
    establish certain mappings.

    Constructors and destructors can also be used to track the pages.
    i386 and x86_64 use a list of pgds in order to be able to dynamically
    update standard mappings.
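
    A hedged sketch of arch usage (argument order per the new API: quicklist
    number, then gfp flags or destructor, then the page; a NULL ctor/dtor means
    the page is kept zeroed, and using quicklist 0 here is illustrative):

        #include <linux/quicklist.h>

        static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
                                                  unsigned long address)
        {
                return (pte_t *)quicklist_alloc(0, GFP_KERNEL, NULL);
        }

        static inline void pte_free_kernel(pte_t *pte)
        {
                quicklist_free(0, NULL, pte);   /* back onto quicklist 0 */
        }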

    Signed-off-by: Christoph Lameter
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Make sure that the check functions really only check things and do not
    perform other activities. Extract the tracing and object seeding out of the
    two check functions and place them into slab_alloc and slab_free.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • At kmem_cache_shrink, check if we have any empty slabs on the partial
    lists; if so, remove them.

    Also--as an anti-fragmentation measure--sort the partial slabs so that
    the most fully allocated ones come first and the least allocated last.

    The next allocations may fill up the nearly full slabs. Having the
    least allocated slabs last gives them the maximum chance that their
    remaining objects may be freed. Thus we can hopefully minimize the
    partial slabs.

    I think this is the best one can do in terms of anti-fragmentation
    measures. Real defragmentation (meaning moving objects out of slabs with
    the least free objects to those that are almost full) can be implemented
    by reverse scanning through the list produced here but that would mean
    that we need to provide a callback at slab cache creation that allows
    the deletion or moving of an object. This will involve slab API
    changes, so defer for now.
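
    A conceptual sketch of the shrink pass (field and helper names follow SLUB
    at the time; allocation of the bucket array, locking and the per-node
    counters are omitted):

        /* one bucket per possible "objects in use" count */
        for (i = 0; i < s->objects; i++)
                INIT_LIST_HEAD(slabs_by_inuse + i);

        list_for_each_entry_safe(page, t, &n->partial, lru) {
                if (!page->inuse) {
                        list_del(&page->lru);
                        discard_slab(s, page);          /* empty: hand it back */
                } else
                        list_move(&page->lru, slabs_by_inuse + page->inuse);
        }

        /* rebuild fullest-first; the emptiest slabs end up last and get the
         * best chance to drain completely */
        for (i = s->objects - 1; i > 0; i--)
                list_splice(slabs_by_inuse + i, n->partial.prev);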

    Cc: Mel Gorman
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch enables listing the callers who allocated or freed objects in a
    cache.

    For example to list the allocators for kmalloc-128 do

    cat /sys/slab/kmalloc-128/alloc_calls
    7 sn_io_slot_fixup+0x40/0x700
    7 sn_io_slot_fixup+0x80/0x700
    9 sn_bus_fixup+0xe0/0x380
    6 param_sysfs_setup+0xf0/0x280
    276 percpu_populate+0xf0/0x1a0
    19 __register_chrdev_region+0x30/0x360
    8 expand_files+0x2e0/0x6e0
    1 sys_epoll_create+0x60/0x200
    1 __mounts_open+0x140/0x2c0
    65 kmem_alloc+0x110/0x280
    3 alloc_disk_node+0xe0/0x200
    33 as_get_io_context+0x90/0x280
    74 kobject_kset_add_dir+0x40/0x140
    12 pci_create_bus+0x2a0/0x5c0
    1 acpi_ev_create_gpe_block+0x120/0x9e0
    41 con_insert_unipair+0x100/0x1c0
    1 uart_open+0x1c0/0xba0
    1 dma_pool_create+0xe0/0x340
    2 neigh_table_init_no_netlink+0x260/0x4c0
    6 neigh_parms_alloc+0x30/0x200
    1 netlink_kernel_create+0x130/0x320
    5 fz_hash_alloc+0x50/0xe0
    2 sn_common_hubdev_init+0xd0/0x6e0
    28 kernel_param_sysfs_setup+0x30/0x180
    72 process_zones+0x70/0x2e0

    cat /sys/slab/kmalloc-128/free_calls
    558
    3 sn_io_slot_fixup+0x600/0x700
    84 free_fdtable_rcu+0x120/0x260
    2 seq_release+0x40/0x60
    6 kmem_free+0x70/0xc0
    24 free_as_io_context+0x20/0x200
    1 acpi_get_object_info+0x3a0/0x3e0
    1 acpi_add_single_object+0xcf0/0x1e40
    2 con_release_unimap+0x80/0x140
    1 free+0x20/0x40

    SLAB_STORE_USER must be enabled for a slab cache by either booting with
    "slub_debug" or enabling user tracking specifically for the slab of interest.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We leave a minimum of partial slabs on nodes when we search for
    partial slabs on other nodes. Define a constant for that value.

    Then modify slub to keep MIN_PARTIAL slabs around.

    This avoids bad situations where a function frees the last object
    in a slab (which results in the page being returned to the page
    allocator) only to then allocate one again (which requires getting
    a page back from the page allocator if the partial list was empty).
    Keeping a couple of slabs on the partial list reduces overhead.

    Empty slabs are added to the end of the partial list to ensure that
    partially allocated slabs are consumed first (defragmentation).

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This enables validation of slabs. Validation means that all objects are
    checked to see if there are redzone violations, if padding has been
    overwritten or any pointers have been corrupted. Also checks the consistency
    of slab counters.

    Validation enables the detection of metadata corruption without the kernel
    having to execute code that actually uses (allocs/frees) an object. It
    allows one to make sure that the slab metainformation and the guard values
    around an object have not been compromised.

    A single slabcache can be checked by writing a 1 to the "validate" file.

    i.e.

    echo 1 >/sys/slab/kmalloc-128/validate

    or use the slabinfo tool to check all slabs

    slabinfo -v

    Error messages will show up in the syslog.

    Note that validation can only reach slabs that are on a list. This means that
    we are usually restricted to partial slabs and active slabs unless
    SLAB_STORE_USER is active, which will build a full slab list and allow
    validation of slabs that are fully in use. Booting with "slub_debug" set will
    enable SLAB_STORE_USER and then full diagnostics are available.

    Note that we attempt to push cpu slabs back to the lists when we start the
    check. If the cpu slab is reactivated before we get to it (another processor
    grabs it before we get to it) then it cannot be checked.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If slab tracking is on then build a list of full slabs so that we can verify
    the integrity of all slabs and are also able to build lists of alloc/free
    callers.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Object tracking did not work the right way for several call chains. Fix this up
    by adding a new parameter to slub_alloc and slub_free that specifies the
    caller address explicitly.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The patch adds PageTail(page) and PageHead(page) to check if a page is the
    head or the tail of a compound page. This is done by masking the two bits
    describing the state of a compound page and then comparing them. So it is
    one comparison and a branch instead of two bit checks and two branches.
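
    A sketch of the resulting test (the mask construction is an assumption;
    the flag pair comes from the related compound-page patch, where PageTail
    aliases PageReclaim):

        #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))

        #define PageTail(page)  (((page)->flags & PG_head_tail_mask) == \
                                        PG_head_tail_mask)
        #define PageHead(page)  (((page)->flags & PG_head_tail_mask) == \
                                        (1L << PG_compound))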

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page and
    the tail pages then we can avoid using page->private in the first page.
    page->private == page for the first page, so there is no real information
    in there.

    Freeing up page->private makes the use of compound pages more transparent.
    They become more usable like real pages. Right now we have to be careful f.e.
    if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
    can then no longer use the private field. This is one of the issues that
    cause us not to support debugging for page size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Makes SLUB behave like SLAB in this area to avoid issues....

    Throw a stack dump to alert people.

    At some point the behavior should be switched back. NULL is no memory as
    far as I can tell, and if the user asked for 0 bytes then they should get
    no memory.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Structures may contain u64 items on 32 bit platforms that are only able to
    address 64 bit items on 64 bit boundaries. Change the minimum alignment of
    slabs to conform to those expectations.

    ARCH_KMALLOC_MINALIGN must be changed for good since a variety of structures
    are mixed in the general slabs.

    ARCH_SLAB_MINALIGN is changed because currently there is no consistent
    specification of object alignment. We may have that in the future when the
    KMEM_CACHE and related macros are used to generate slabs. These pass the
    alignment of the structure generated by the compiler to the slab.

    With KMEM_CACHE etc we could align structures that do not contain 64
    bit values to 32 bit boundaries potentially saving some memory.
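
    A sketch of the new defaults (the exact header placement is an assumption):

        #ifndef ARCH_KMALLOC_MINALIGN
        #define ARCH_KMALLOC_MINALIGN   __alignof__(unsigned long long)
        #endif

        #ifndef ARCH_SLAB_MINALIGN
        #define ARCH_SLAB_MINALIGN      __alignof__(unsigned long long)
        #endif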

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter