08 May, 2007

12 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that manipulates
    the object.

    Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there were code in a constructor
    handling SLAB_DEBUG_INITIAL, it would have to be conditional on SLAB
    debugging; otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand, and there are easier ways to accomplish
    the same effect (i.e. add debug code before kfree).
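
    For illustration, a minimal sketch of that manual alternative; the object
    type, its state field and the FOO_IDLE value are made up, not taken from
    any kernel code:

    #include <linux/slab.h>
    #include <linux/bug.h>

    enum foo_state { FOO_IDLE, FOO_ACTIVE };

    struct foo {
            enum foo_state state;   /* FOO_IDLE is the constructor state */
    };

    static void foo_free(struct foo *f)
    {
            /* Verify the constructor state right before the free instead
             * of relying on a SLAB_DEBUG_INITIAL callback. */
            BUG_ON(f->state != FOO_IDLE);
            kfree(f);
    }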

    There is a related flag, SLAB_CTOR_VERIFY, that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without the removal of SLAB_DEBUG_INITIAL) from the fs
    constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently failslab injects failures into ____cache_alloc(), but with
    CONFIG_NUMA enabled that is not enough to make the actual slab allocator
    entry points (kmalloc, kmem_cache_alloc, ...) return NULL.

    This patch moves the fault injection hook into __cache_alloc() and
    __cache_alloc_node(). These sit closer to the allocator entry points than
    ____cache_alloc(), so failures can be injected into the slab allocators
    even with CONFIG_NUMA.
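
    A sketch of where the hook now sits, written as if it lived inside
    mm/slab.c; my_cache_alloc() stands in for the real entry points and the
    helper names are only assumed to match the patch:

    static void *my_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
    {
            /* Inject the failure at the level every kmalloc() and
             * kmem_cache_alloc() call passes through, so the NULL is
             * visible to callers on NUMA builds as well. */
            if (should_failslab(cachep, flags))
                    return NULL;

            return ____cache_alloc(cachep, flags);
    }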

    Acked-by: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch was recently posted to lkml and acked by Pekka.

    The flag SLAB_MUST_HWCACHE_ALIGN is

    1. Never checked by SLAB at all.

    2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

    3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

    The only remaining uses are in sparc64 and ppc64, and they reflect some
    earlier role that the slab flag may once have had. Wherever it is
    specified, SLAB_HWCACHE_ALIGN is also specified.

    The flag is confusing, inconsistent and has no purpose.

    Remove it.

    Acked-by: Pekka Enberg
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    matze
     
  • Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page and
    the tail pages, then we can avoid using page->private in the first page.
    For the first page, page->private == page, so there is no real information
    in there.

    Freeing up page->private makes the use of compound pages more transparent:
    they behave more like ordinary pages. Right now we have to be careful, e.g.,
    if we are going beyond PAGE_SIZE allocations in the slab on i386, because we
    can then no longer use the private field. This is one of the issues that
    keeps us from supporting debugging for page-size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2
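
    A minimal sketch of the check order this implies; my_page_is_tail() is a
    hypothetical helper:

    #include <linux/mm.h>

    static inline int my_page_is_tail(struct page *page)
    {
            /* PageTail aliases PageReclaim, so the bit only means "tail"
             * once we know the page is part of a compound page. */
            return PageCompound(page) && PageTail(page);
    }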

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is only ever used prior to free_initmem().

    (It will cause a warning when we run the section checking, but that is a
    false positive, and it simply changes the source of an existing warning,
    which is also a false positive.)

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Some NUMA machines have a big MAX_NUMNODES (possibly 1024), but fewer
    possible nodes. This patch dynamically sizes the 'struct kmem_cache' to
    allocate only needed space.

    I moved the nodelists[] field to the end of struct kmem_cache and use the
    following computation in kmem_cache_init():

    cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
    nr_node_ids * sizeof(struct kmem_list3 *);

    On my two-node x86_64 machine, kmem_cache.obj_size is now 192 instead of 704
    (this is because on x86_64, MAX_NUMNODES is 64).

    On bigger NUMA setups, this might reduce the gfporder of "cache_cache"
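
    The same trick in generic, standalone C; the struct and node count are
    illustrative, not the kernel's kmem_cache:

    #include <stddef.h>
    #include <stdio.h>

    #define MAX_NODES 64                    /* compile-time worst case */

    struct cache {
            unsigned int obj_size;
            /* must stay last: only nr_node_ids entries are allocated */
            void *nodelists[MAX_NODES];
    };

    int main(void)
    {
            int nr_node_ids = 2;            /* nodes present at boot */
            size_t sz = offsetof(struct cache, nodelists) +
                        nr_node_ids * sizeof(void *);

            printf("full: %zu bytes, trimmed: %zu bytes\n",
                   sizeof(struct cache), sz);
            return 0;
    }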

    Signed-off-by: Eric Dumazet
    Cc: Pekka Enberg
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • We can avoid allocating empty shared caches and avoid an unnecessary check
    of cache->limit. We save some memory and avoid bringing unnecessary cache
    lines into the CPU cache.

    All accesses to l3->shared are already checking NULL pointers so this patch is
    safe.

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The existing comment in mm/slab.c is *perfect*, so I reproduce it:

    /*
    * CPU bound tasks (e.g. network routing) can exhibit cpu bound
    * allocation behaviour: Most allocs on one cpu, most free operations
    * on another cpu. For these cases, an efficient object passing between
    * cpus is necessary. This is provided by a shared array. The array
    * replaces Bonwick's magazine layer.
    * On uniprocessor, it's functionally equivalent (but less efficient)
    * to a larger limit. Thus disabled by default.
    */

    As most shipped Linux kernels are now compiled with CONFIG_SMP, a
    preprocessor #if cannot detect whether the machine is UP or SMP. Better to
    use num_possible_cpus().

    This means that on UP we allocate a 'size=0' shared array, to be more
    efficient.

    Another patch can later avoid the allocations of 'empty shared arrays', to
    save some memory.
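
    A sketch of the runtime decision that replaces the #ifdef; the sizing
    number is illustrative, not the exact slab heuristic:

    #include <linux/cpumask.h>

    static int shared_array_size(void)
    {
            /* An SMP-built kernel booted on a UP machine now gets a
             * size-0 shared array instead of a compile-time guess. */
            if (num_possible_cpus() == 1)
                    return 0;
            return 8;                       /* illustrative SMP default */
    }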

    Signed-off-by: Eric Dumazet
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • If slab->inuse is corrupted, cache_alloc_refill() can enter an infinite
    loop, as detailed by Michael Richardson in a mailing list post. This adds
    a BUG_ON to catch those cases.

    Cc: Michael Richardson
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • This introduces krealloc(), which reallocates memory while keeping the
    contents unchanged. The allocator avoids reallocation if the new size fits
    within the currently used cache. I also added a simple non-optimized version
    to mm/slob.c for compatibility.
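
    A usage sketch; struct buf and its fields are hypothetical:

    #include <linux/slab.h>

    struct buf {
            char *data;
            size_t len;
    };

    static int buf_resize(struct buf *b, size_t new_len)
    {
            /* Contents are preserved; if new_len still fits the cache
             * currently backing b->data, no copy is performed. */
            char *p = krealloc(b->data, new_len, GFP_KERNEL);

            if (!p)
                    return -ENOMEM;         /* b->data is left untouched */
            b->data = p;
            b->len = new_len;
            return 0;
    }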

    [akpm@linux-foundation.org: fix warnings]
    Acked-by: Josef Sipek
    Acked-by: Matt Mackall
    Acked-by: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

03 May, 2007

1 commit

  • Set use_alien_caches to 0 on non-NUMA platforms and avoid calling
    cache_free_alien() when use_alien_caches is not set. This avoids the
    cache miss that happens while dereferencing slabp to get the nodeid.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Eric Dumazet
    Cc: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton

    Siddha, Suresh B
     

04 Apr, 2007

1 commit


02 Mar, 2007

1 commit


21 Feb, 2007

1 commit

  • The alien cache is a per-cpu, per-node array allocated for every slab on
    the system. Currently we size this array for all nodes that the kernel
    could support. For IA64 this is 1024 nodes, so we allocate an array with
    1024 objects even if we only boot a system with 4 nodes.

    This patch uses "nr_node_ids" to determine the number of possible nodes
    supported by a hardware configuration and only allocates an alien cache
    sized for possible nodes.

    The initialization of nr_node_ids occurred too late relative to the
    bootstrap of the slab allocator, so I moved setup_nr_node_ids() into
    free_area_init_nodes().

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Feb, 2007

6 commits

  • A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
    source files, including:

    * make multi-line initial descriptions single line
    * denote some function names, constants and structs as such
    * change erroneous opening '/*' to '/**' in a few places
    * reword some text for clarity

    Signed-off-by: Robert P. J. Day
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • kmem_cache_free() was missing the check for freeing held locks.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Make ZONE_DMA optional in core code.

    - ifdef all code for ZONE_DMA and related definitions following the example
    for ZONE_DMA32 and ZONE_HIGHMEM.

    - Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
    0.

    - Modify the VM statistics to work correctly without a DMA zone.

    - Modify slab to not create DMA slabs if there is no ZONE_DMA.

    [akpm@osdl.org: cleanup]
    [jdike@addtoit.com: build fix]
    [apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: James Bottomley
    Cc: Paul Mundt
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Use the pointer passed to cache_reap to determine the work pointer and
    consolidate exit paths.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Clean up __cache_alloc and __cache_alloc_node functions a bit. We no
    longer need to do NUMA_BUILD tricks and the UMA allocation path is much
    simpler. No functional changes in this patch.

    Note: saves a few kernel text bytes on an x86 NUMA build due to using gotos
    in __cache_alloc_node() and moving the __GFP_THISNODE check into
    fallback_alloc().

    Cc: Andy Whitcroft
    Cc: Christoph Hellwig
    Cc: Manfred Spraul
    Acked-by: Christoph Lameter
    Cc: Paul Jackson
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • The PageSlab debug check in kfree_debugcheck() is broken for compound
    pages. It is also redundant as we already do BUG_ON for non-slab pages in
    page_get_cache() and page_get_slab() which are always called before we free
    any actual objects.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

06 Jan, 2007

1 commit

  • pdflush hit the BUG_ON(!PageSlab(page)) in kmem_freepages called from
    fallback_alloc: cache_grow already freed those pages when alloc_slabmgmt
    failed. But it wouldn't have freed them if __GFP_NO_GROW, so make sure
    fallback_alloc doesn't waste its time on that case.

    Signed-off-by: Hugh Dickins
    Acked-by: Christoph Lameter
    Acked-by: Pekka J Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

23 Dec, 2006

2 commits


14 Dec, 2006

4 commits

  • When some objects are allocated by one CPU but freed by another CPU, we can
    consume a lot of cycles doing divides in obj_to_index().

    (Typical load on a dual processor machine where network interrupts are
    handled by one particular CPU (allocating skbufs), and the other CPU is
    running the application (consuming and freeing skbufs))

    Here, on one production server (dual-core AMD Opteron 285), I noticed this
    divide took 1.20% of CPU_CLK_UNHALTED events in the kernel. But Opterons are
    quite modern CPUs and the divide is much more expensive on older
    architectures:

    On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
    cycle for a multiply.

    Doing some math, we can use a reciprocal multiplication instead of a divide.

    If we want to compute V = (A / B) (A and B being u32 quantities)
    we can instead use :

    V = ((u64)A * RECIPROCAL(B)) >> 32 ;

    where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B

    Note:

    I wrote pure C code for clarity. The gcc output for i386 is not optimal but
    acceptable:

    mull 0x14(%ebx)
    mov %edx,%eax // part of the >> 32
    xor %edx,%edx // useless
    mov %eax,(%esp) // could be avoided
    mov %edx,0x4(%esp) // useless
    mov (%esp),%ebx
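
    For reference, a standalone userspace rendering of the same formula (not
    the kernel helpers themselves):

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t reciprocal_value(uint32_t b)
    {
            /* RECIPROCAL(B) = ((1LL << 32) + (B - 1)) / B */
            return (uint32_t)((((uint64_t)1 << 32) + b - 1) / b);
    }

    static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
    {
            /* V = ((u64)A * RECIPROCAL(B)) >> 32 */
            return (uint32_t)(((uint64_t)a * r) >> 32);
    }

    int main(void)
    {
            uint32_t r = reciprocal_value(192);     /* e.g. object size */

            /* 4096 / 192 == 21, computed with a multiply and a shift */
            printf("%u\n", reciprocal_divide(4096, r));
            return 0;
    }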

    [akpm@osdl.org: small cleanups]
    Signed-off-by: Eric Dumazet
    Cc: Christoph Lameter
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Elaborate the API for calling cpuset_zone_allowed(), so that users have to
    explicitly choose between the two variants:

    cpuset_zone_allowed_hardwall()
    cpuset_zone_allowed_softwall()

    Until now, whether or not you got the hardwall flavor depended solely on
    whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
    argument.

    If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
    version.

    Unfortunately, this meant that users would end up with the softwall version
    without thinking about it. Since only the softwall version might sleep, this
    led to bugs with possible sleeping in interrupt context on more than one
    occasion.

    The hardwall version requires that the current task's mems_allowed allows
    the node of the specified zone (or that you're in interrupt, that
    __GFP_THISNODE is set, or that you're on a one-cpuset system).

    The softwall version, depending on the gfp_mask, might allow a node if it
    was allowed in the nearest enclosing cpuset marked mem_exclusive (which
    requires taking the cpuset lock 'callback_mutex' to evaluate).

    This patch removes the cpuset_zone_allowed() call, and forces the caller to
    explicitly choose between the hardwall and the softwall case.

    If the caller wants the gfp_mask to determine this choice, they should (1)
    be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
    cpuset_zone_allowed_softwall() routine.

    This adds another 100 or 200 bytes to the kernel text space, due to the few
    lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
    routines. It should save a few instructions executed for the calls that
    turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
    set (before the call) then check (within the call) the __GFP_HARDWALL flag.

    For the most critical call, from get_page_from_freelist(), the same
    instructions are executed as before -- the old cpuset_zone_allowed()
    routine it used to call is the same code as the
    cpuset_zone_allowed_softwall() routine that it calls now.

    Not a perfect win, but it seems worth it, to reduce the chance of hitting a
    'sleeping with irqs off' complaint again.
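
    A sketch of how a caller now states its intent explicitly; the wrapper
    names are illustrative:

    #include <linux/cpuset.h>
    #include <linux/mmzone.h>
    #include <linux/gfp.h>

    static int zone_ok_atomic(struct zone *z, gfp_t gfp)
    {
            /* Caller cannot sleep: only the hardwall check, which looks
             * at current->mems_allowed, is safe here. */
            return cpuset_zone_allowed_hardwall(z, gfp);
    }

    static int zone_ok_may_sleep(struct zone *z, gfp_t gfp)
    {
            /* Caller may sleep (or passes __GFP_HARDWALL itself): the
             * softwall check may take callback_mutex. */
            return cpuset_zone_allowed_softwall(z, gfp);
    }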

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • More cleanups for slab.h

    1. Remove tabs from weird locations as suggested by Pekka

    2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
    as suggested by Pekka.

    3. Use static inline for the fallback defs, as also suggested by Pekka.

    4. Make kmem_ptr_valid take a const * argument.

    5. Separate the NUMA fallback definitions from the kmalloc_track fallback
    definitions.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • fallback_alloc() does not do the check for __GFP_WAIT that cache_grow()
    does. Thus interrupts remain disabled when we call kmem_getpages(), which
    results in the failure.

    Duplicate the handling of GFP_WAIT in cache_grow().
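
    A simplified sketch of the duplicated handling, written as if it lived
    inside mm/slab.c next to fallback_alloc(); grow_any_node() is a made-up
    wrapper, not the literal diff:

    static void *grow_any_node(struct kmem_cache *cachep, gfp_t flags)
    {
            void *obj;

            /* fallback_alloc() runs with interrupts disabled; re-enable
             * them while the allocation may sleep, exactly as
             * cache_grow() does around its kmem_getpages() call. */
            if (flags & __GFP_WAIT)
                    local_irq_enable();
            obj = kmem_getpages(cachep, flags, -1);
            if (flags & __GFP_WAIT)
                    local_irq_disable();

            return obj;
    }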

    Signed-off-by: Christoph Lameter
    Cc: Jay Cliburn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 Dec, 2006

1 commit

  • This patch introduces users of the round_jiffies() function in the slab code.

    The slab code has a few "run every second" timers for background work; these
    are obviously not timing critical as long as they happen roughly at the right
    frequency.
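
    A sketch of the pattern; the work item and the 2*HZ period are
    illustrative:

    #include <linux/workqueue.h>
    #include <linux/timer.h>

    static void arm_reap_timer(struct delayed_work *reap_work)
    {
            /* Background reaping only needs to run roughly every couple
             * of seconds, so let the timer coalesce with other
             * second-aligned timers instead of firing at a random offset. */
            schedule_delayed_work(reap_work, round_jiffies_relative(2 * HZ));
    }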

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

09 Dec, 2006

3 commits

  • Assign defaults most likely to please a new user:

    1) generate some logging output (verbose=2)

    2) avoid injecting failures likely to lock up the UI (ignore_gfp_wait=1,
    ignore_gfp_highmem=1)

    Signed-off-by: Don Mullis
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • This patch provides fault-injection capability for kmalloc.

    Boot option:

    failslab=<interval>,<probability>,<space>,<times>

    <interval> -- specifies the interval of failures.

    <probability> -- specifies how often it should fail in percent.

    <space> -- specifies the size of free space where memory can be
    allocated safely in bytes.

    <times> -- specifies how many times failures may happen at most.

    Debugfs:

    /debug/failslab/interval
    /debug/failslab/probability
    /debug/failslab/space
    /debug/failslab/times
    /debug/failslab/ignore-gfp-highmem
    /debug/failslab/ignore-gfp-wait

    Example:

    failslab=10,100,0,-1

    slab allocation (kmalloc(), kmem_cache_alloc(),..) fails once per 10 times.

    Cc: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • fallback_alloc() could end up calling cpuset_zone_allowed() with interrupts
    disabled (by code in kmem_cache_alloc_node()), but without __GFP_HARDWALL
    set, leading to a possible call of a sleeping function with interrupts
    disabled.

    This results in the BUG report:

    BUG: sleeping function called from invalid context at kernel/cpuset.c:1520
    in_atomic():0, irqs_disabled():1

    Thanks to Paul Menage for catching this one.

    Signed-off-by: Paul Jackson
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

08 Dec, 2006

7 commits

  • - move some file_operations structs into the .rodata section

    - move static strings from policy_types[] array into the .rodata section

    - fix generic seq_operations usages, so that those structs may be defined
    as "const" as well

    [akpm@osdl.org: couple of fixes]
    Signed-off-by: Helge Deller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Helge Deller
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Currently we simply attempt to allocate from all allowed nodes using
    GFP_THISNODE. However, GFP_THISNODE does not do reclaim (it won't do any at
    all if the recent GFP_THISNODE patch is accepted). If we truly run out of
    memory in the whole system, then fallback_alloc may return NULL although
    memory may still be available if we performed more thorough reclaim.

    This patch changes fallback_alloc() so that we first only inspect all the
    per node queues for available slabs. If we find any then we allocate from
    those. This avoids slab fragmentation by first getting rid of all partially
    allocated slabs on every node before allocating new memory.

    If we cannot satisfy the allocation from any per node queue then we extend
    a slab. We now call into the page allocator without specifying
    GFP_THISNODE. The page allocator will then implement its own fallback (in
    the given cpuset context), perform necessary reclaim (again considering not
    a single node but the whole set of allowed nodes) and then return pages for
    a new slab.

    We identify from which node the pages were allocated and then insert the
    pages into the corresponding per node structure. In order to do so we need
    to modify cache_grow() to take a parameter that specifies the new slab.
    kmem_getpages() can no longer set the GFP_THISNODE flag since we need to be
    able to use kmem_getpages() to allocate from an arbitrary node. GFP_THISNODE
    needs to be specified when calling cache_grow().

    One key advantage is that the decision from which node to allocate new
    memory is removed from slab fallback processing. The patch allows us to go
    back to using the page allocator's fallback/reclaim logic.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This addresses two issues:

    1. kmalloc_node() may intermittently return NULL if we are allocating
    from the current node and are unable to obtain memory for the current
    node from the page allocator. This is because we call ___cache_alloc()
    if nodeid == numa_node_id() and ____cache_alloc is not able to fall back
    to other nodes.

    This was introduced in the 2.6.19 development cycle.
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_DMA is an alias of GFP_DMA. This is the last such alias, so we
    remove the leftover comment too.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_LEVEL_MASK is only used internally to the slab allocator and is an
    alias of GFP_LEVEL_MASK.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter