07 Mar, 2008
3 commits
-
NUMA slab allocator cpu migration bugfix
The NUMA slab allocator (specifically, cache_alloc_refill) is not
refreshing its local copies of what cpu and what NUMA node it is on
when it drops and reacquires the irq block that it inherited from its
caller. As a result, those values become invalid if an attempt to
migrate the process to another NUMA node occurred while the irq block
had been dropped.

The solution is to make cache_alloc_refill reload these variables
whenever it drops and reacquires the irq block.

The error is very difficult to hit. When it does occur, one gets the
following oops + stack traceback bits in check_spinlock_acquired:

    kernel BUG at mm/slab.c:2417
    cache_alloc_refill+0xe6
    kmem_cache_alloc+0xd0
    ...

This patch was developed against 2.6.23, and ported to and
compile-tested only against 2.6.25-rc4.

Signed-off-by: Joe Korty
Signed-off-by: Christoph Lameter
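For illustration, a minimal sketch of the hazard and the fix pattern.
Only cache_alloc_refill(), the irq block, and the migration hazard come
from the commit text; the function below is an assumption, not the
actual patch:

    /*
     * Sketch: any value derived from the current cpu or node must be
     * re-read after the irq block is dropped and reacquired, because
     * the task may have been migrated to another node in between.
     */
    static void refill_sketch(struct kmem_cache *cachep)
    {
            int node = numa_node_id();          /* valid: irqs are off */
            struct array_cache *ac = cpu_cache_get(cachep);

            local_irq_enable();
            /* ... work that may sleep and migrate the task ... */
            local_irq_disable();

            node = numa_node_id();              /* reload after reacquire */
            ac = cpu_cache_get(cachep);         /* per-cpu data may be stale */
    }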
-
Make them all use angle brackets and the directory name.
Acked-by: Pekka Enberg
Signed-off-by: Joe Perches
Signed-off-by: Christoph Lameter
-
The NUMA fallback logic should be passing local_flags to
kmem_getpages() and not simply the flags passed in.

Reviewed-by: Pekka Enberg
Signed-off-by: Christoph Lameter
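As a sketch of the intent (fallback_alloc() and kmem_getpages() are
named in this history; the mask and body below are assumptions):

    /*
     * Sketch: the fallback path must not hand the caller's node
     * constraints back to the page allocator, so only the stripped
     * local_flags is passed down.
     */
    static void *fallback_sketch(struct kmem_cache *cache, gfp_t flags)
    {
            gfp_t local_flags = flags & (GFP_CONSTRAINT_MASK | GFP_RECLAIM_MASK);

            /* the buggy version passed 'flags' here */
            return kmem_getpages(cache, local_flags, -1);
    }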
15 Feb, 2008
1 commit
-
- alloc_slabmgmt: initialize all slab fields in 1 place
- slab->nodeid was initialized twice: in alloc_slabmgmt
and immediately after it in cache_grow

Signed-off-by: Marcin Slusarz
Cc: Christoph Lameter
Reviewed-by: Pekka Enberg
Signed-off-by: Christoph Lameter
26 Jan, 2008
2 commits
-
This patch converts the known per-subsystem mutexes to
get_online_cpus/put_online_cpus. It also eliminates the
CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events.

Signed-off-by: Gautham R Shenoy
Signed-off-by: Ingo Molnar
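The conversion pattern, roughly (get_online_cpus()/put_online_cpus()
are the real API; the surrounding function is illustrative):

    /*
     * Instead of a private mutex taken on CPU_LOCK_ACQUIRE and
     * released on CPU_LOCK_RELEASE, the hotplug-sensitive path
     * pins the cpu map itself.
     */
    static void hotplug_sensitive_path(void)
    {
            get_online_cpus();              /* block cpu hotplug */
            /* ... walk online cpus safely ... */
            put_online_cpus();
    }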
-
If the node we're booting on doesn't have memory, bootstrapping
kmalloc() caches resorts to fallback_alloc(), which requires
->nodelists set for all nodes. Fix that by calling set_up_list3s()
for CACHE_CACHE in kmem_cache_init().

As kmem_getpages() is called with GFP_THISNODE set, this used to work
before because of breakage in 2.6.22 and before, with GFP_THISNODE
returning pages from the wrong node if a node had no memory. So it may
have worked accidentally and in an unsafe manner, because the pages
would have been associated with the wrong node, which could trigger
BUG_ONs and locking troubles.

Tested-by: Mel Gorman
Tested-by: Olaf Hering
Reviewed-by: Christoph Lameter
Signed-off-by: Pekka Enberg
[ With additional one-liner by Olaf Hering - Linus ]
Signed-off-by: Linus Torvalds
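A sketch of where the fix lands (set_up_list3s(), CACHE_CACHE, and
kmem_cache_init() are from the commit text; the exact placement is an
assumption):

    void __init kmem_cache_init(void)       /* sketch, heavily elided */
    {
            /*
             * Ensure cache_cache has nodelists for every node before
             * fallback_alloc() can possibly run during bootstrap.
             */
            set_up_list3s(&cache_cache, CACHE_CACHE);
            /* ... remainder of bootstrap ... */
    }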
25 Jan, 2008
1 commit
-
Partially revert the changes made by 04231b3002ac53f8a64a7bd142fde3fa4b6808c6
to the kmem_list3 management. On a machine with a memoryless node, this
BUG_ON was triggering:

    static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
    {
            struct list_head *entry;
            struct slab *slabp;
            struct kmem_list3 *l3;
            void *obj;
            int x;

            l3 = cachep->nodelists[nodeid];
            BUG_ON(!l3);

Signed-off-by: Mel Gorman
Cc: Pekka Enberg
Acked-by: Christoph Lameter
Cc: "Aneesh Kumar K.V"
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Jan, 2008
1 commit
-
Both SLUB and SLAB really did almost exactly the same thing for
/proc/slabinfo setup, using duplicate code and per-allocator #ifdefs.

This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
and SLAB, and shares all the setup code. Maybe SLOB will want this some
day too.

Reviewed-by: Pekka Enberg
Signed-off-by: Linus Torvalds
06 Dec, 2007
1 commit
-
mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.
It's used by binfmt_flat, which can be built as a module.
Signed-off-by: Tetsuo Handa
Cc: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
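The likely shape of the fix (assuming the usual export mechanism; the
definition of ksize() itself is untouched):

    /* in mm/slab.c and mm/slob.c, mirroring mm/slub.c: */
    EXPORT_SYMBOL(ksize);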
01 Dec, 2007
1 commit
-
The database performance group has found that half the cycles spent
in kmem_cache_free are spent in this one call to BUG_ON. Moving it
into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
performance win of almost 0.5% on their particular benchmark.

The call was added as part of commit
ddc2e812d592457747c4367fb73edcaa8e1e49ff with the comment that
"overhead should be minimal". It may have been minimal at the time,
but it isn't now.

[ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the
performance regression but rather the virt_to_head_page() changes to
virt_to_cache() that were added later." ]

Signed-off-by: Matthew Wilcox
Acked-by: Pekka J Enberg
Signed-off-by: Linus Torvalds
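A sketch of the move (cache_free_debugcheck() is named above; the body
shown is an assumption):

    /*
     * The check leaves the release fast path and runs only in debug
     * builds, so production kernels skip the per-free
     * virt_to_cache() lookup entirely.
     */
    static void *debugcheck_sketch(struct kmem_cache *cachep, void *objp)
    {
            BUG_ON(virt_to_cache(objp) != cachep);  /* debug builds only */
            return objp;
    }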
15 Nov, 2007
1 commit
-
This patch fixes a wrong array index in allocation failure handling.
Cc: Pekka Enberg
Acked-by: Christoph Lameter
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
1 commit
-
Spelling fixes in mm/.
Signed-off-by: Simon Arlott
Signed-off-by: Adrian Bunk
19 Oct, 2007
2 commits
-
This patch fixes a memory leak in the error path.

In reality, we don't need to call cpuup_canceled(cpu) for now, but an
upcoming cpu hotplug error handling change needs this.

Cc: Christoph Lameter
Cc: Gautham R Shenoy
Acked-by: Pekka Enberg
Signed-off-by: Akinobu Mita
Cc: Gautham R Shenoy
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
cpuup_callback() is too long. This patch factors out the
CPU_UP_CANCELLED and CPU_UP_PREPARE handling from cpuup_callback().

Cc: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Akinobu Mita
Cc: Gautham R Shenoy
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
6 commits
-
Since nothing earlier than gcc-3.2 is supported for kernel
compilation, that 2.95 hack can be removed.

Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Slab constructors currently have a flags parameter that is never used,
and the order of the arguments is the opposite of other slab functions:
the object pointer is placed before the kmem_cache pointer.

Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

    ctor(struct kmem_cache *s, void *object)

throughout the kernel.

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch marks a number of allocations that are either short-lived,
such as network buffers, or reclaimable, such as inode allocations.
When something like updatedb is called, long-lived and unmovable kernel
allocations tend to be spread throughout the address space, which
increases fragmentation.

This patch groups these allocations together as much as possible by
adding a new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for
allocations that can be reclaimed on demand, but not moved; i.e., they
can be migrated by deleting them and re-reading the information from
elsewhere.

Signed-off-by: Mel Gorman
Cc: Andy Whitcroft
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The function of GFP_LEVEL_MASK seems to be unclear. In order to clear
up the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets
of GFP flags:

GFP_RECLAIM_MASK    Flags used to control page allocator reclaim behavior.
GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur.
GFP_SLAB_BUG_MASK   Flags that the slab allocator BUG()s on.

These replace the uses of GFP_LEVEL_MASK in the slab allocators and in
vmalloc.c.

The use of the flags not included in these sets may occur as a result
of a slab allocation standing in for a page allocation when
constructing scatter-gather lists. Extraneous flags are cleared and not
passed through to the page allocator. __GFP_MOVABLE/RECLAIMABLE,
__GFP_COLD and __GFP_COMP will now be ignored if passed to a slab
allocator.

Change the allocation of allocator metadata in SLAB and vmalloc to not
pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes
the __GFP_THISNODE flag for such allocations. Generalize that to also
cover vmalloc. The use of GFP_CONSTRAINT_MASK also includes
__GFP_HARDWALL.

The impact of allocator metadata placement on access latency to the
cachelines of the object itself is minimal since metadata is only
referenced on alloc and free. The attempt is still made to place the
metadata optimally, but we consistently allow fallback in both SLAB and
vmalloc (SLUB does not need to allocate metadata like that).

Allocator metadata may serve multiple in-kernel users and thus should
not be subject to the limitations arising from a single allocation
context.

[akpm@linux-foundation.org: fix fallback_alloc()]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
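The three masks, roughly as described (the exact flag compositions
below are an assumption based on the text above):

    #define GFP_RECLAIM_MASK    (__GFP_WAIT | __GFP_HIGH | __GFP_IO | __GFP_FS | \
                                 __GFP_NOWARN | __GFP_REPEAT | __GFP_NOFAIL | \
                                 __GFP_NORETRY | __GFP_NOMEMALLOC)
    #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL | __GFP_THISNODE)
    #define GFP_SLAB_BUG_MASK   (__GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK)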
-
Slab should not allocate control structures for nodes without memory.
This may seem to work right now, but it's unreliable since not all
allocations can fall back due to the use of GFP_THISNODE.

Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us
to only allocate for nodes that have regular memory.

Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
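The essence of the loop change (for_each_node_state() and
N_NORMAL_MEMORY are real kernel names; the loop body is illustrative):

    int node;

    /*
     * was: for_each_online_node(node), which also visits memoryless
     * nodes; now only nodes with regular memory are visited
     */
    for_each_node_state(node, N_NORMAL_MEMORY) {
            /* allocate the per-node control structures here */
    }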
-
A NULL pointer means that the object was not allocated. One cannot
determine the size of an object that has not been allocated. Currently
we return 0, but we really should BUG() on attempts to determine the
size of something nonexistent.

krealloc() interprets NULL to mean a zero-sized object. Handle that
separately in krealloc().

Signed-off-by: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
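A sketch of the described split (the BUG_ON placement follows the text;
the lookup line and helper names are assumptions):

    /* ksize() now traps NULL ... */
    static size_t ksize_sketch(const void *objp)
    {
            BUG_ON(!objp);  /* sizing an unallocated object is a bug */
            return virt_to_cache(objp)->buffer_size;
    }

    /* ... while krealloc() keeps its NULL == "nothing yet" meaning */
    static void *krealloc_null_sketch(const void *p, size_t size, gfp_t flags)
    {
            if (!p)
                    return kmalloc(size, flags);
            return krealloc(p, size, flags);
    }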
23 Aug, 2007
1 commit
-
Skip calling cache_free_alien() when the platform is not NUMA-capable.
This will avoid cache misses that happen while accessing slabp (which
is a per-page memory reference) to get the nodeid. Instead, use a
global variable to skip the call, which is most likely to be present in
the cache.

This gives a 0.8% performance boost with the database oltp workload on
a quad-core SMP platform, and by any means the number is not small :)

Signed-off-by: Suresh Siddha
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
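A sketch of the approach (the variable name is an assumption;
cache_free_alien() is from the commit text):

    static int numa_platform __read_mostly = 1;    /* cleared on 1-node boxes */

    static inline void cache_free_sketch(struct kmem_cache *cachep, void *objp)
    {
            /*
             * One cache-hot global test replaces a per-object page
             * lookup (slabp->nodeid) on non-NUMA platforms.
             */
            if (numa_platform && cache_free_alien(cachep, objp))
                    return;                        /* freed to an alien cache */
            /* ... fast local free ... */
    }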
25 Jul, 2007
1 commit
-
Use the correct local variable when calling into the page allocator.
Local `flags' can have __GFP_ZERO set, which causes us to pass
__GFP_ZERO into the page allocator, possibly from illegal contexts. The
page allocator will later do prep_zero_page()->kmap_atomic(...,
KM_USER0) from irq contexts and will then go BUG.

Cc: Mike Galbraith
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2007
3 commits
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for
both slab and slub, and slob never supported them either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves, or
the documentation references).

Signed-off-by: Paul Mundt
-
I suspect Christoph tested his code only in the NUMA configuration; for
the combination of SLAB+non-NUMA, the zero-sized kmallocs would not
work.

Of course, this would only trigger in configurations where those
zero-sized allocations happen (not very common), so that may explain
why it wasn't more widely noticed.

Seen by Andi Kleen under qemu, and there seems to be a report by
Michael Tsirkin on it too.

Cc: Andi Kleen
Cc: Roland Dreier
Cc: Michael S. Tsirkin
Cc: Pekka Enberg
Cc: Christoph Lameter
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
-
Work around a possible bug in the FRV compiler.

What appears to be happening is that gcc resolves the
__builtin_constant_p() in kmalloc() to true, but then fails to reduce
the resulting constant conditions in the if-statements it guards to
constant results.

When compiling with -O2 or -Os, one single spurious error crops up in
cpuup_callback() in mm/slab.c. This can be avoided by making the
memsize variable const.

Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
5 commits
-
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when
allocating a buffer. This is nonsense and error-prone. Moreover, when
the caller forgets that, it's very likely to subtly bite back by
corrupting the stack because the last position of the buffer is always
cleared to zero.

This patch increments KSYM_NAME_LEN by one and updates code
accordingly.

* The off-by-one bug in the asm-powerpc/kprobes.h::kprobe_lookup_name()
macro is fixed.

* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.

Signed-off-by: Tejun Heo
Acked-by: Paulo Marques
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
It now becomes easy to support the zeroing allocs with generic inline
functions in slab.h. Provide inline definitions to allow the continued
use of kzalloc, kmem_cache_zalloc etc., but remove other definitions of
zeroing functions from the slab allocators and util.c.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
A kernel convention for many allocators is that if __GFP_ZERO is passed
to an allocator, then the allocated memory should be zeroed.

This is currently not supported by the slab allocators. The
inconsistency makes it difficult to implement in derived allocators
such as in the uncached allocator and the pool allocators.

In addition, the support for zeroed allocations in the slab allocators
does not have a consistent API. There are no zeroing allocator
functions for NUMA node placement (kmalloc_node,
kmem_cache_alloc_node). The zeroing allocations are only provided for
default allocs (kzalloc, kmem_cache_zalloc_node). __GFP_ZERO will make
zeroing universally available and does not require any additional
functions.

So add the necessary logic to all slab allocators to support
__GFP_ZERO. The code is added to the hot path. The gfp flags are on the
stack, and so the cacheline is readily available for checking whether
we want a zeroed object.

Zeroing while allocating is now a frequent operation, and we seem to be
gradually approaching a 1-1 parity between zeroing and not zeroing
allocs. The current tree has 3476 uses of kmalloc vs 2731 uses of
kzalloc.

Signed-off-by: Christoph Lameter
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
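A sketch of the hot-path check (obj_size() existed in SLAB of this era;
the helper itself is an assumption):

    static inline void *zero_finish_sketch(struct kmem_cache *cachep,
                                           gfp_t flags, void *objp)
    {
            /* flags is already on the stack, so this test is nearly free */
            if (unlikely((flags & __GFP_ZERO) && objp))
                    memset(objp, 0, obj_size(cachep));
            return objp;
    }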
-
Define a ZERO_OR_NULL_PTR macro to be able to remove the checks from
the allocators. Move the ZERO_SIZE_PTR related stuff into slab.h.

Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
WARN_ON_ONCE(size == 0) that is still remaining in SLAB.

Make slub return NULL like the other allocators if a too-large memory
segment is requested via __kmalloc.

Signed-off-by: Christoph Lameter
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
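The macros, roughly as introduced in slab.h (values shown are
illustrative):

    /* a non-NULL, non-dereferenceable cookie for zero-sized allocs */
    #define ZERO_SIZE_PTR ((void *)16)

    #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
                                 (unsigned long)ZERO_SIZE_PTR)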
-
The size of a kmalloc object is readily available via ksize(). ksize is
provided by all allocators, and thus we can implement krealloc in a
generic way.

Implement krealloc in mm/util.c and drop the slab-specific
implementations of krealloc.

Signed-off-by: Christoph Lameter
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
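A sketch of a generic krealloc() built only on ksize(), kmalloc() and
kfree(), in the spirit of mm/util.c (details may differ from the actual
patch):

    void *krealloc_sketch(const void *p, size_t new_size, gfp_t flags)
    {
            void *ret;
            size_t ks;

            if (unlikely(!new_size)) {
                    kfree(p);
                    return ZERO_SIZE_PTR;           /* zero-sized object */
            }

            ks = p ? ksize(p) : 0;
            if (ks >= new_size)
                    return (void *)p;               /* still fits in place */

            ret = kmalloc(new_size, flags);
            if (ret && p) {
                    memcpy(ret, p, ks);             /* preserve old payload */
                    kfree(p);
            }
            return ret;
    }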
17 Jul, 2007
2 commits
-
start_cpu_timer() should be __cpuinit (which also matches what its
callers are).

__devinit didn't cause problems; it simply wasted a few bytes of memory
for the common CONFIG_HOTPLUG_CPU=n case.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This entry prints a header in the .start callback. This is OK, but the
more elegant solution would be to move this into the .show callback and
use seq_list_start_head() in the .start one.

I have left it as is in order to make the patch just switch to the new
API and nothing more.

[adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer]
Signed-off-by: Pavel Emelianov
Cc: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jul, 2007
1 commit
-
Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
by switching the redzone to 64 bits. Unfortunately, it neglected to
ensure that the _second_ redzone, after the slab object, is aligned
correctly. This caused illegal instruction faults on sparc32, which for
some reason not entirely clear to me are not trapped and fixed up.

Two things need to be done to fix this:

- Increase the object size, rounding up to alignof(long long), so that
the second redzone can be aligned correctly.

- If SLAB_STORE_USER is set but alignof(long long) == 8, allow a full
64 bits of space for the user word at the end of the buffer, even
though we may not _use_ the whole 64 bits.

This patch should be a no-op on any 64-bit architecture or any 32-bit
architecture where alignof(long long) == 4. Of the others, it's tested
on ppc32 by myself, and a very similar patch was tested on sparc32 by
Mark Fortescue, who reported the new problem.

Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted
to the new sizes. Again noticed by Mark.

Signed-off-by: David Woodhouse
Signed-off-by: Linus Torvalds
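A sketch of the sizing rule (REDZONE_ALIGN is plausible naming; the
exact expression in the patch may differ):

    #define REDZONE_ALIGN max(BYTES_PER_WORD, __alignof__(unsigned long long))

    if (flags & SLAB_RED_ZONE) {
            /*
             * Round the object size up so the second redzone, which
             * sits right after the object, lands on a boundary where
             * a 64-bit access is legal even on strict-alignment CPUs.
             */
            size = ALIGN(size, REDZONE_ALIGN);
    }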
02 Jul, 2007
1 commit
-
We agreed to remove the WARN_ON_ONCE before 2.6.22 is released.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Jun, 2007
1 commit
-
cache_free_alien must be called regardless of whether we use alien
caches or not. cache_free_alien() will do the right thing if there are
no alien caches available.

Signed-off-by: Christoph Lameter
Cc: Paul Mundt
Acked-by: Pekka J Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 May, 2007
1 commit
-
Use the new __init_refok marker to avoid the section mismatch warning
from slab.c.

Signed-off-by: Sam Ravnborg
17 May, 2007
4 commits
-
Currently we have a maze of configuration variables that determine the
maximum slab size. Worst of all, it seems to vary between SLAB and
SLUB.

So define a common maximum size for kmalloc. For convenience's sake we
use the maximum size ever supported, which is 32 MB. We limit the
maximum size to a lower limit if MAX_ORDER does not allow such large
allocations.

For many architectures this patch will have the effect of adding large
kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of
memory will be needed for these caches (contemporary SLAB has
dynamically sizeable node and cpu structures, so the waste is less than
in the past).

Most architectures will then be able to allocate objects with sizes up
to MAX_ORDER. We have had repeated breakage (in fact whenever we
doubled the number of supported processors) on IA64 because one or the
other struct grew beyond what the slab allocators supported. This will
avoid future issues and, f.e., avoid fixes for 2k and 4k cpu support.

CONFIG_LARGE_ALLOCS is no longer necessary, so drop it.
It fixes sparc64 with SLAB.
Signed-off-by: Christoph Lameter
Signed-off-by: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
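The cap, roughly as it ended up in slab.h (names follow the mainline
tree; treat the exact form as illustrative):

    /* 32 MB at most, clamped to what MAX_ORDER allows */
    #define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
                                (MAX_ORDER + PAGE_SHIFT - 1) : 25)
    #define KMALLOC_MAX_SIZE   (1UL << KMALLOC_SHIFT_HIGH)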
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
slub warns on this, and we're working on making kmalloc(0) return NULL.
Let's make slab warn as well so our testers detect such callers more
rapidly.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is no user of destructors left. There is no reason why we should
keep checking for destructor calls in the slab allocators.

The RFC for this patch was discussed at
http://marc.info/?l=linux-kernel&m=117882364330705&w=2

Destructors were mainly used for list management, which required them
to take a spinlock. Taking a spinlock in a destructor is a bit risky
since the slab allocators may run the destructors anytime they decide a
slab is no longer needed.

This patch drops destructor support. Any attempt to use a destructor
will BUG().
Acked-by: Pekka Enberg
Acked-by: Paul Mundt
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds