16 Jul, 2008
2 commits
-
Conflicts:
arch/powerpc/Kconfig
arch/s390/kernel/time.c
arch/x86/kernel/apic_32.c
arch/x86/kernel/cpu/perfctr-watchdog.c
arch/x86/kernel/i8259_64.c
arch/x86/kernel/ldt.c
arch/x86/kernel/nmi_64.c
arch/x86/kernel/smpboot.c
arch/x86/xen/smp.c
include/asm-x86/hw_irq_32.h
include/asm-x86/hw_irq_64.h
include/asm-x86/mach-default/irq_vectors.h
include/asm-x86/mach-voyager/irq_vectors.h
include/asm-x86/smp.h
kernel/MakefileSigned-off-by: Ingo Molnar
-
With the removal of destructors, slab_destroy_objs no longer actually
destroys any objects, making the kernel doc incorrect and the function
name misleading.In keeping with the other debug functions, rename it to
slab_destroy_debugcheck and drop the kernel doc.Signed-off-by: Rabin Vincent
Signed-off-by: Pekka Enberg
26 Jun, 2008
1 commit
-
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.Acked-by: Jeremy Fitzhardinge
Reviewed-by: Paul E. McKenney
Signed-off-by: Jens Axboe
22 Jun, 2008
1 commit
-
The zonelist patches caused the loop that checks for available
objects in permitted zones to not terminate immediately. One object
per zone per allocation may be allocated and then abandoned.Break the loop when we have successfully allocated one object.
Signed-off-by: Christoph Lameter
Signed-off-by: Linus Torvalds
30 Apr, 2008
2 commits
-
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We can see an ever repeating problem pattern with objects of any kind in the
kernel:1) freeing of active objects
2) reinitialization of active objectsBoth problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem listEach operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation. When the sanity check triggers a warning message
and a stack trace is printed.The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a
global lock.The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Cc: Greg KH
Cc: Randy Dunlap
Cc: Kay Sievers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
4 commits
-
Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to .Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Reviewed-by: Christoph Lameter
Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Filtering zonelists requires very frequent use of zone_idx(). This is costly
as it involves a lookup of another structure and a substraction operation. As
the zone_idx is often required, it should be quickly accessible. The node idx
could also be stored here if it was found that accessing zone->node is
significant which may be the case on workloads where nodemasks are heavily
used.This patch introduces a struct zoneref to store a zone pointer and a zone
index. The zonelist then consists of an array of these struct zonerefs which
are looked up as necessary. Helpers are given for accessing the zone index as
well as the node index.[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
[hugh@veritas.com: just return do_try_to_free_pages]
[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
Signed-off-by: Mel Gorman
Acked-by: Christoph Lameter
Acked-by: David Rientjes
Signed-off-by: Lee Schermerhorn
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Nick Piggin
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently a node has two sets of zonelists, one for each zone type in the
system and a second set for GFP_THISNODE allocations. Based on the zones
allowed by a gfp mask, one of these zonelists is selected. All of these
zonelists consume memory and occupy cache lines.This patch replaces the multiple zonelists per-node with two zonelists. The
first contains all populated zones in the system, ordered by distance, for
fallback allocations when the target/preferred node has no free pages. The
second contains all populated zones in the node suitable for GFP_THISNODE
allocations.An iterator macro is introduced called for_each_zone_zonelist() that interates
through each zone allowed by the GFP flags in the selected zonelist.Signed-off-by: Mel Gorman
Acked-by: Christoph Lameter
Signed-off-by: Lee Schermerhorn
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce a node_zonelist() helper function. It is used to lookup the
appropriate zonelist given a node and a GFP mask. The patch on its own is a
cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If
necessary, it can be merged with the next patch in this set without problems.Reviewed-by: Christoph Lameter
Signed-off-by: Mel Gorman
Signed-off-by: Lee Schermerhorn
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Apr, 2008
1 commit
-
* Use new node_to_cpumask_ptr. This creates a pointer to the
cpumask for a given node. This definition is in mm patch:asm-generic-add-node_to_cpumask_ptr-macro.patch
* Use new set_cpus_allowed_ptr function.
Depends on:
[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
[sched-devel]: sched: add new set_cpus_allowed_ptr function
[x86/latest]: x86: add cpus_scnprintf functionCc: Greg Kroah-Hartman
Cc: Greg Banks
Cc: H. Peter Anvin
Signed-off-by: Mike Travis
Signed-off-by: Ingo Molnar
27 Mar, 2008
1 commit
-
Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on
memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
patch fixes list_add() corruption in mm/slab.c seen on the ES7000.Cc: Mel Gorman
Cc: Olaf Hering
Cc: Christoph Lameter
Signed-off-by: Dan Yeisley
Signed-off-by: Pekka Enberg
Signed-off-by: Christoph Lameter
20 Mar, 2008
1 commit
-
Fix various kernel-doc notation in mm/:
filemap.c: add function short description; convert 2 to kernel-doc
fremap.c: change parameter 'prot' to @prot
pagewalk.c: change "-" in function parameters to ":"
slab.c: fix short description of kmem_ptr_validate()
swap.c: fix description & parameters of put_pages_list()
swap_state.c: fix function parameters
vmalloc.c: change "@returns" to "Returns:" since that is not a parameterSigned-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Mar, 2008
3 commits
-
NUMA slab allocator cpu migration bugfix
The NUMA slab allocator (specifically, cache_alloc_refill)
is not refreshing its local copies of what cpu and what
numa node it is on, when it drops and reacquires the irq
block that it inherited from its caller. As a result
those values become invalid if an attempt to migrate the
process to another numa node occured while the irq block
had been dropped.The solution is to make cache_alloc_refill reload these
variables whenever it drops and reacquires the irq block.The error is very difficult to hit. When it does occur,
one gets the following oops + stack traceback bits in
check_spinlock_acquired:kernel BUG at mm/slab.c:2417
cache_alloc_refill+0xe6
kmem_cache_alloc+0xd0
...This patch was developed against 2.6.23, ported to and
compiled-tested only against 2.6.25-rc4.Signed-off-by: Joe Korty
Signed-off-by: Christoph Lameter -
Make them all use angle brackets and the directory name.
Acked-by: Pekka Enberg
Signed-off-by: Joe Perches
Signed-off-by: Christoph Lameter -
The NUMA fallback logic should be passing local_flags to kmem_get_pages() and not simply the
flags passed in.Reviewed-by: Pekka Enberg
Signed-off-by: Christoph Lameter
15 Feb, 2008
1 commit
-
- alloc_slabmgmt: initialize all slab fields in 1 place
- slab->nodeid was initialized twice: in alloc_slabmgmt
and immediately after it in cache_growSigned-off-by: Marcin Slusarz
CC: Christoph Lameter
Reviewed-by: Pekka Enberg
Signed-off-by: Christoph Lameter
26 Jan, 2008
2 commits
-
This patch converts the known per-subsystem mutexes to get_online_cpus
put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and
CPU_LOCK_RELEASE hotplug notification events.Signed-off-by: Gautham R Shenoy
Signed-off-by: Ingo Molnar -
If the node we're booting on doesn't have memory, bootstrapping kmalloc()
caches resorts to fallback_alloc() which requires ->nodelists set for all
nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in
kmem_cache_init().As kmem_getpages() is called with GFP_THISNODE set, this used to work before
because of breakage in 2.6.22 and before with GFP_THISNODE returning pages from
the wrong node if a node had no memory. So it may have worked accidentally and
in an unsafe manner because the pages would have been associated with the wrong
node which could trigger bug ons and locking troubles.Tested-by: Mel Gorman
Tested-by: Olaf Hering
Reviewed-by: Christoph Lameter
Signed-off-by: Pekka Enberg
[ With additional one-liner by Olaf Hering - Linus ]
Signed-off-by: Linus Torvalds
25 Jan, 2008
1 commit
-
Partial revert the changes made by 04231b3002ac53f8a64a7bd142fde3fa4b6808c6
to the kmem_list3 management. On a machine with a memoryless node, this
BUG_ON was triggeringstatic void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
struct list_head *entry;
struct slab *slabp;
struct kmem_list3 *l3;
void *obj;
int x;l3 = cachep->nodelists[nodeid];
BUG_ON(!l3);Signed-off-by: Mel Gorman
Cc: Pekka Enberg
Acked-by: Christoph Lameter
Cc: "Aneesh Kumar K.V"
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Jan, 2008
1 commit
-
Both SLUB and SLAB really did almost exactly the same thing for
/proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
and SLAB, and shares all the setup code. Maybe SLOB will want this some
day too.Reviewed-by: Pekka Enberg
Signed-off-by: Linus Torvalds
06 Dec, 2007
1 commit
-
mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.
It's used by binfmt_flat, which can be built as a module.
Signed-off-by: Tetsuo Handa
Cc: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Dec, 2007
1 commit
-
The database performance group have found that half the cycles spent
in kmem_cache_free are spent in this one call to BUG_ON. Moving it
into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
performance win of almost 0.5% on their particular benchmark.The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff
with the comment that "overhead should be minimal". It may have been
minimal at the time, but it isn't now.[ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the
performance regression but rather the virt_to_head_page() changes to
virt_to_cache() that were added later." ]Signed-off-by: Matthew Wilcox
Acked-by: Pekka J Enberg
Signed-off-by: Linus Torvalds
15 Nov, 2007
1 commit
-
This patch fixes wrong array index in allocation failure handling.
Cc: Pekka Enberg
Acked-by: Christoph Lameter
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
1 commit
-
Spelling fixes in mm/.
Signed-off-by: Simon Arlott
Signed-off-by: Adrian Bunk
19 Oct, 2007
2 commits
-
This patch fixes memory leak in error path.
In reality, we don't need to call cpuup_canceled(cpu) for now. But upcoming
cpu hotplug error handling change needs this.Cc: Christoph Lameter
Cc: Gautham R Shenoy
Acked-by: Pekka Enberg
Signed-off-by: Akinobu Mita
Cc: Gautham R Shenoy
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
cpuup_callback() is too long. This patch factors out CPU_UP_CANCELLED and
CPU_UP_PREPARE handlings from cpuup_callback().Cc: Christoph Lameter
Cc: Pekka Enberg
Signed-off-by: Akinobu Mita
Cc: Gautham R Shenoy
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2007
6 commits
-
Since nothing earlier than gcc-3.2 is supported for kernel
compilation, that 2.95 hack can be removed.Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved. i.e. they can be migrated by deleting
them and re-reading the information from elsewhere.Signed-off-by: Mel Gorman
Cc: Andy Whitcroft
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up
the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP
flags:GFP_RECLAIM_MASK Flags used to control page allocator reclaim behavior.
GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur.
GFP_SLAB_BUG_MASK Flags that the slab allocator BUG()s on.
These replace the uses of GFP_LEVEL mask in the slab allocators and in
vmalloc.c.The use of the flags not included in these sets may occur as a result of a
slab allocation standing in for a page allocation when constructing scatter
gather lists. Extraneous flags are cleared and not passed through to the
page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will
now be ignored if passed to a slab allocator.Change the allocation of allocator meta data in SLAB and vmalloc to not
pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the
__GFP_THISNODE flag for such allocations. Generalize that to also cover
vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.The impact of allocator metadata placement on access latency to the
cachelines of the object itself is minimal since metadata is only
referenced on alloc and free. The attempt is still made to place the meta
data optimally but we consistently allow fallback both in SLAB and vmalloc
(SLUB does not need to allocate metadata like that).Allocator metadata may serve multiple in kernel users and thus should not
be subject to the limitations arising from a single allocation context.[akpm@linux-foundation.org: fix fallback_alloc()]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Slab should not allocate control structures for nodes without memory. This
may seem to work right now but its unreliable since not all allocations can
fall back due to the use of GFP_THISNODE.Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us to
only allocate for nodes that have regular memory.Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A NULL pointer means that the object was not allocated. One cannot
determine the size of an object that has not been allocated. Currently we
return 0 but we really should BUG() on attempts to determine the size of
something nonexistent.krealloc() interprets NULL to mean a zero sized object. Handle that
separately in krealloc().Signed-off-by: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Aug, 2007
1 commit
-
Skip calling cache_free_alien() when the platform is not numa capable.
This will avoid cache misses that happen while accessing slabp (which is
per page memory reference) to get nodeid. Instead use a global variable to
skip the call, which is mostly likely to be present in the cache.This gives a 0.8% performance boost with the database oltp workload on a
quad-core SMP platform and by any means the number is not small :)Signed-off-by: Suresh Siddha
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Jul, 2007
1 commit
-
Use the correct local variable when calling into the page allocator. Local
`flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the
page allocator, possibly from illegal contexts. The page allocator will later
do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will
then go BUG.Cc: Mike Galbraith
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2007
3 commits
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
-
I suspect Christoph tested his code only in the NUMA configuration, for
the combination of SLAB+non-NUMA the zero-sized kmalloc's would not work.Of course, this would only trigger in configurations where those zero-
sized allocations happen (not very common), so that may explain why it
wasn't more widely noticed.Seen by by Andi Kleen under qemu, and there seems to be a report by
Michael Tsirkin on it too.Cc: Andi Kleen
Cc: Roland Dreier
Cc: Michael S. Tsirkin
Cc: Pekka Enberg
Cc: Christoph Lameter
Cc: Andrew Morton
Signed-off-by: Linus Torvalds -
Work around a possible bug in the FRV compiler.
What appears to be happening is that gcc resolves the
__builtin_constant_p() in kmalloc() to true, but then fails to reduce the
therefore constant conditions in the if-statements it guards to constant
results.When compiling with -O2 or -Os, one single spurious error crops up in
cpuup_callback() in mm/slab.c. This can be avoided by making the memsize
variable const.Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2007
2 commits
-
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.Signed-off-by: Tejun Heo
Acked-by: Paulo Marques
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It becomes now easy to support the zeroing allocs with generic inline
functions in slab.h. Provide inline definitions to allow the continued use of
kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing
functions from the slab allocators and util.c.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds