08 Apr, 2010
1 commit
-
…git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
x86: Increase CONFIG_NODES_SHIFT max to 10
ibft, x86: Change reserve_ibft_region() to find_ibft_region()
x86, hpet: Fix bug in RTC emulation
x86, hpet: Erratum workaround for read after write of HPET comparator
bootmem, x86: Fix 32bit numa system without RAM on node 0
nobootmem, x86: Fix 32bit numa system without RAM on node 0
x86: Handle overlapping mptables
x86: Make e820_remove_range to handle all covered case
x86-32, resume: do a global tlb flush in S4 resume
02 Apr, 2010
2 commits
-
When 32bit numa is used, free_all_bootmem() will still only go over with
node id 0.If node 0 doesn't have RAM installed, the lowest populated node
becomes low RAM.This one fixes BOOTMEM path by iterating over the bdata_list.
-v3: add more comments, and fix bootmem path too.
-v4: seperate from one big patchSigned-off-by: Yinghai Lu
LKML-Reference:
Signed-off-by: H. Peter Anvin -
On one system without RAM on node0, got following boot dump with a 32
bit NUMA kernel:early_node_map[4] active PFN ranges
1: 0x00000010 -> 0x00000099
1: 0x00000100 -> 0x0007da00
1: 0x0007e800 -> 0x0007ffa0
1: 0x0007ffae -> 0x0007ffb0
...
Subtract (29 early reservations)
#000 [0000001000 - 0000002000]
#001 [0000089000 - 000008f000]
#002 [0000091000 - 0000093500]
...
#027 [007cbfef40 - 007e800000]
#028 [007e9ca000 - 007ff95000]
(0 free memory ranges)
Initializing HighMem for node 0 (00000000:00000000)
Initializing HighMem for node 1 (00000000:00000000)
Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
...
Checking if this processor honours the WP bit even in supervisor mode...Ok.
swapper: page allocation failure. order:0, mode:0x0
Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
Call Trace:
[] ? printk+0xf/0x11
[] __alloc_pages_nodemask+0x417/0x487
[] new_slab+0xe2/0x1fe
[] kmem_cache_open+0x185/0x358
[] T.954+0x1c/0x60
[] kmem_cache_init+0x24/0x113
[] start_kernel+0x166/0x2e4
[] ? unknown_bootoption+0x0/0x18e
[] i386_start_kernel+0xce/0xd5
Mem-Info:
Node 1 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 1 Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:0 slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0 bounce:0When 32bit NUMA is used, free_all_bootmem() will still only go over with
node id 0.If node 0 doesn't have RAM installed, We need to go with node1
because early_node_map still use 1 for all ranges, and ram from node1
become low ram.Use MAX_NUMNODES like 64-bit NUMA does.
Note: BOOTMEM path has the same problem.
this bug exist before We have NO_BOOTMEM support.-v3: add more comments, and fix bootmem path too.
-v4: seperate bootmem path fixSigned-off-by: Yinghai Lu
LKML-Reference:
Signed-off-by: H. Peter Anvin
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
25 Mar, 2010
1 commit
-
Commit 08677214e318297 ("x86: Make 64 bit use early_res instead
of bootmem before slab") introduced early_res replacement for
bootmem, but left code in __free_pages_memory() which dumps all
the ranges that are beeing freed, without any additional
information, causing some noise in dmesg during bootup.Just remove printing of the ranges, that doesn't provide
anything useful anyway.While at it, remove other commented-out KERN_DEBUG messages in
the NO_BOOTMEM code as well.Signed-off-by: Jiri Kosina
Found-OK-by: Andrew Morton
Cc: Johannes Weiner
Cc: Yinghai Lu
LKML-Reference:
Signed-off-by: Ingo Molnar
13 Feb, 2010
1 commit
-
Finally we can use early_res to replace bootmem for x86_64 now.
Still can use CONFIG_NO_BOOTMEM to enable it or not.
-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message belowSigned-off-by: Yinghai Lu
LKML-Reference:
Signed-off-by: H. Peter Anvin
16 Dec, 2009
1 commit
-
Signed-off-by: Jan Beulich
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Nov, 2009
1 commit
-
Add a new function for freeing bootmem after the bootmem
allocator has been released and the unreserved pages given to
the page allocator.This allows us to reserve bootmem and then release it if we
later discover it was not needed.( This new API will be used by the swiotlb code to recover
a significant amount of RAM (64MB). )Signed-off-by: FUJITA Tomonori
Acked-by: Pekka Enberg
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: hannes@cmpxchg.org
Cc: tj@kernel.org
Cc: akpm@linux-foundation.org
Cc: Linus Torvalds
LKML-Reference:
Signed-off-by: Ingo Molnar
27 Aug, 2009
1 commit
-
This patch sets the min_count for alloc_bootmem objects to 0 so that
they are never reported as leaks. This is because many of these blocks
are only referred via the physical address which is not looked up by
kmemleak.Signed-off-by: Catalin Marinas
Cc: Pekka Enberg
08 Jul, 2009
1 commit
-
This patch adds kmemleak_alloc/free callbacks to the bootmem allocator.
This would allow scanning of such blocks and help avoiding a whole class
of false positives and more kmemleak annotations.Signed-off-by: Catalin Marinas
Cc: Ingo Molnar
Acked-by: Pekka Enberg
Reviewed-by: Johannes Weiner
20 Jun, 2009
1 commit
-
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Joe Perches
Cc: Pekka Enberg
Cc: Benjamin Herrenschmidt
Cc: Tejun Heo
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jun, 2009
2 commits
-
If the user requested bootmem allocation on a specific node, we should use
kzalloc_node() for the fallback allocation.Cc: Ingo Molnar
Cc: Johannes Weiner
Cc: Linus Torvalds
Cc: Yinghai Lu
Signed-off-by: Pekka Enberg -
As a preparation for initializing the slab allocator early, make sure the
bootmem allocator does not crash and burn if someone calls it after slab is up;
otherwise we'd need a flag day for switching to early slab.Acked-by: Johannes Weiner
Acked-by: Linus Torvalds
Cc: Christoph Lameter
Cc: Ingo Molnar
Cc: Matt Mackall
Cc: Nick Piggin
Cc: Yinghai Lu
Signed-off-by: Pekka Enberg
01 Mar, 2009
1 commit
-
Impact: fix new breakages introduced by previous fix
Commit c132937556f56ee4b831ef4b23f1846e05fde102 tried to clean up
bootmem arch wrapper but it wasn't quite correct. Before the commit,
the followings were broken.* Low level interface functions prefixed with __ ignored arch
preference.* reserve_bootmem(...) can't be mapped into
reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
not preference here. The region specified MUST fall into the
specified region; otherwise, it will panic.After the commit,
* If allocation fails for the arch preferred node, it should fallback
to whatever is available. Instead, it simply failed allocation.There are too many internal details to allow generic wrapping and
still keep things simple for archs. Plus, all that arch wants is a
way to prefer certain node over another.This patch drops the generic wrapping around alloc_bootmem_core() and
add alloc_bootmem_core() instead. If necessary, arch can define
bootmem_arch_referred_node() macro or function which takes all
allocation information and returns the preferred node. bootmem
generic code will always try the preferred node first and then
fallback to other nodes as usual.Breakages noted and changes reviewed by Johannes Weiner.
Signed-off-by: Tejun Heo
Acked-by: Johannes Weiner
24 Feb, 2009
1 commit
-
Impact: cleaner and consistent bootmem wrapping
By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation. However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped. This not only is messy but also leads to strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't although they should be
able to be used interchangeably.This patch updates bootmem such that archs can override / wrap the
backend function - alloc_bootmem_core() instead of the highlevel
interface functions to allow simpler and consistent wrapping. Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.Signed-off-by: Tejun Heo
Cc: Johannes Weiner
07 Jan, 2009
1 commit
-
Moving the request details print-out before the sanity checks that
might panic() enables us to analyse invalid requests without having
access to the line information of the stack dump.Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Oct, 2008
1 commit
-
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Aug, 2008
1 commit
-
Absolute alignment requirements may never be applied to node-relative
offsets. Andreas Herrmann spotted this flaw when a bootmem allocation on
an unaligned node was itself not aligned because the combination of an
unaligned node with an aligned offset into that node is not garuanteed to
be aligned itself.This patch introduces two helper functions that align a node-relative
index or offset with respect to the node's starting address so that the
absolute PFN or virtual address that results from combining the two
satisfies the requested alignment.Then all the broken ALIGN()s in alloc_bootmem_core() are replaced by these
helpers.Signed-off-by: Johannes Weiner
Reported-by: Andreas Herrmann
Debugged-by: Andreas Herrmann
Reviewed-by: Andreas Herrmann
Tested-by: Andreas Herrmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Aug, 2008
1 commit
-
This is the minimal sequence that jams the allocator:
void *p, *q, *r;
p = alloc_bootmem(PAGE_SIZE);
q = alloc_bootmem(64);
free_bootmem(p, PAGE_SIZE);
p = alloc_bootmem(PAGE_SIZE);
r = alloc_bootmem(64);after this sequence (assuming that the allocator was empty or page-aligned
before), pointer "q" will be equal to pointer "r".What's hapenning inside the allocator:
p = alloc_bootmem(PAGE_SIZE);
in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
q = alloc_bootmem(64);
in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
free_bootmem(p, PAGE_SIZE);
in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
p = alloc_bootmem(PAGE_SIZE);
in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
r = alloc_bootmem(64);and now:
it finds bit "2", as a place where to allocate (sidx)
it hits the condition
if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx))
start_off = ALIGN(bdata->last_end_off, align);-you can see that the condition is true, so it assigns start_off =
ALIGN(bdata->last_end_off, align); (that is PAGE_SIZE) and allocates
over already allocated block.With the patch it tries to continue at the end of previous allocation only
if the previous allocation ended in the middle of the page.Signed-off-by: Mikulas Patocka
Acked-by: Johannes Weiner
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Jul, 2008
20 commits
-
Almost all users of this field need a PFN instead of a physical address,
so replace node_boot_start with node_min_pfn.[Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
Signed-off-by: Johannes Weiner
Cc:
Signed-off-by: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since alloc_bootmem_core does no goal-fallback anymore and just returns
NULL if the allocation fails, we might now use it in alloc_bootmem_section
without all the fixup code for a misplaced allocation.Also, the limit can be the first PFN of the next section as the semantics
is that the limit is _above_ the allocated region, not within.Signed-off-by: Johannes Weiner
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
__alloc_bootmem_node already does this, make the interface consistent.
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The old node-agnostic code tried allocating on all nodes starting from the
one with the lowest range. alloc_bootmem_core retried without the goal if
it could not satisfy it and so the goal was only respected at all when it
happened to be on the first (lowest page numbers) node (or theoretically
if allocations failed on all nodes before to the one holding the goal).Introduce a non-panicking helper that starts allocating from the node
holding the goal and falls back only after all thes tries failed, thus
moving the goal fallback code out of alloc_bootmem_core.Make all other allocation functions benefit from this new helper.
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Andi Kleen
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce new helpers that mark a range that resides completely on a node
or node-agnostic ranges that might also span node boundaries.The free/reserve API functions will then directly use these helpers.
Note that the free/reserve semantics become more strict: while the prior
code took basically arbitrary range arguments and marked the PFNs that
happen to fall into that range, the new code requires node-specific ranges
to be completely on the node. The node-agnostic requests might span node
boundaries as long as the nodes are contiguous.Passing ranges that do not satisfy these criteria is a bug.
[akpm@linux-foundation.org: fix printk warnings]
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Factor out the common operation of marking a range on the bitmap.
[akpm@linux-foundation.org: fix various warnings]
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
alloc_bootmem_core has become quite nasty to read over time. This is a
clean rewrite that keeps the semantics.bdata->last_pos has been dropped.
bdata->last_success has been renamed to hint_idx and it is now an index
relative to the node's range. Since further block searching might start
at this index, it is now set to the end of a succeeded allocation rather
than its beginning.bdata->last_offset has been renamed to last_end_off to be more clear that
it represents the ending address of the last allocation relative to the
node.[y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()]
Signed-off-by: Johannes Weiner
Signed-off-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Rewrite the code in a more concise way using less variables.
[akpm@linux-foundation.org: fix printk warnings]
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
link_bootmem handles an insertion of a new descriptor into the sorted list
in more or less three explicit branches; empty list, insert in between and
append. These cases can be expressed implicite.Also mark the sorted list as initdata as it can be thrown away after boot
as well.Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Reincarnate get_mapsize as bootmap_bytes and implement
bootmem_bootmap_pages on top of it.Adjust users of these helpers and make free_all_bootmem_core use
bootmem_bootmap_pages instead of open-coding it.Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce the bootmem_debug kernel parameter that enables very verbose
diagnostics regarding all range operations of bootmem as well as the
initialization and release of nodes.[akpm@linux-foundation.org: fix printk warnings]
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change the description, move a misplaced comment about the allocator
itself and add me to the list of copyright holders.Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This only reorders functions so that further patches will be easier to
read. No code changed.Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Straight forward variant of the existing __alloc_bootmem_node, only
subsequent patch when allocating giant hugepages at boot -- don't want to
panic if we can't allocate as many as the user asked for.Signed-off-by: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This function has no external callers, so unexport it. Also fix its naming
inconsistency.Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Andy Whitcroft
Cc: Mel Gorman
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
All _core functions only need the bootmem data, not the whole node descriptor.
Adjust the two functions that take the node descriptor unneededly.Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The check for node_boot_start is bogus because we start freeing at the
corresponding pfn. So check if the pfn is properly aligned instead in a more
readable way and adjust the documentation.Also remove an unneeded accounting variable.
Signed-off-by: Johannes Weiner
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are a lot of places that define either a single bootmem descriptor or an
array of them. Use only one central array with MAX_NUMNODES items instead.Signed-off-by: Johannes Weiner
Acked-by: Ralf Baechle
Cc: Ingo Molnar
Cc: Richard Henderson
Cc: Russell King
Cc: Tony Luck
Cc: Hirokazu Takata
Cc: Geert Uytterhoeven
Cc: Kyle McMartin
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: David S. Miller
Cc: Yinghai Lu
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are a number of different views to how much memory is currently active.
There is the arch-independent zone-sizing view, the bootmem allocator and
memory models view.Architectures register this information at different times and is not
necessarily in sync particularly with respect to some SPARSEMEM limitations.This patch introduces mminit_validate_memmodel_limits() which is able to
validate and correct PFN ranges with respect to the memory model. It is only
SPARSEMEM that currently validates itself.Signed-off-by: Mel Gorman
Cc: Christoph Lameter
Cc: Andy Whitcroft
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Jun, 2008
1 commit
-
This patch changes the function reserve_bootmem_node() from void to int,
returning -ENOMEM if the allocation fails.This fixes a build problem on x86 with CONFIG_KEXEC=y and
CONFIG_NEED_MULTIPLE_NODES=ySigned-off-by: Bernhard Walle
Reported-by: Adrian Bunk
Signed-off-by: Linus Torvalds