30 Apr, 2013
1 commit
-
Squishes a warning which my change to hotplug_memory_notifier() added.
I want to keep that warning, because it is punishment for failing to check
the hotplug_memory_notifier() return value.
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Dec, 2012
2 commits
-
We need a node which only contains movable memory. This feature is very
important for node hotplug. If a node has normal/highmem, the memory may
be used by the kernel and can't be offlined. If the node only contains
movable memory, we can offline the memory and the node.

Everything is prepared, so we can actually introduce N_MEMORY. Add
CONFIG_MOVABLE_NODE so that it can be used for movable-dedicated nodes.

[akpm@linux-foundation.org: fix Kconfig text]
Signed-off-by: Lai Jiangshan
Tested-by: Yasuaki Ishimatsu
Signed-off-by: Wen Congyang
Cc: Jiang Liu
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Cc: Mel Gorman
Cc: David Rientjes
Cc: Yinghai Lu
Cc: Rusty Russell
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan
Acked-by: Hillf Danton
Signed-off-by: Wen Congyang
Cc: Christoph Lameter
Cc: Hillf Danton
Cc: Lin Feng
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Dec, 2012
4 commits
-
Use [index] = init_value.
Use N_xxxxx instead of hardcoded values.

This makes the code more readable and makes it easier to add a new state.
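The two points above can be sketched in a few lines of C. This is only an illustration: the enum, the table, and the helper are hypothetical names, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical state list; the names only imitate the N_xxxxx style. */
enum demo_node_state {
	DEMO_N_POSSIBLE,
	DEMO_N_ONLINE,
	DEMO_N_NORMAL_MEMORY,
	DEMO_N_MEMORY,
	DEMO_NR_STATES
};

/*
 * Each slot is tied to its enum constant with [index] = value, so
 * adding or reordering states cannot silently shift the table.
 */
static const char *const demo_state_name[DEMO_NR_STATES] = {
	[DEMO_N_POSSIBLE]      = "possible",
	[DEMO_N_ONLINE]        = "online",
	[DEMO_N_NORMAL_MEMORY] = "has_normal_memory",
	[DEMO_N_MEMORY]        = "has_memory",
};

/* Look up a name by state; returns NULL for an out-of-range value. */
static const char *demo_state_to_name(int state)
{
	if (state < 0 || state >= DEMO_NR_STATES)
		return NULL;
	return demo_state_name[state];
}
```

With positional initializers, inserting a new state in the middle of the enum would require renumbering every later entry by hand.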
Signed-off-by: Lai Jiangshan
Signed-off-by: Wen Congyang
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
register_node() is defined as extern in include/linux/node.h. But the
function is only called from register_one_node() in drivers/base/node.c.

So the patch defines register_node() as static.
Signed-off-by: Yasuaki Ishimatsu
Acked-by: David Rientjes
Acked-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When calling unregister_node(), the function shows the following message at
device_release():

"Device 'node2' does not have a release() function, it is broken and must
be fixed."

The reason is that the node's device struct does not have a release() function.
So the patch registers node_device_release() as the device's release()
function to suppress the warning message. Additionally, the patch
adds a memset() to register_node() to initialize the node struct, because
the node struct is part of the node_devices[] array and cannot be freed by
node_device_release(). So if the system reuses the node struct, it contains
garbage.
Signed-off-by: Yasuaki Ishimatsu
Signed-off-by: Wen Congyang
Cc: David Rientjes
Cc: Jiang Liu
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We use a static array to store struct node. In many cases, we don't have
too many nodes, and some memory will be unused. Convert it to per-device
dynamically allocated memory.
Signed-off-by: Wen Congyang
Cc: David Rientjes
Cc: Jiang Liu
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
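The conversion described in the last commit above (static array of structs replaced by per-device allocation) can be sketched in userspace C. All names and sizes here are hypothetical; the kernel code uses its own allocator and device model.

```c
#include <stdlib.h>

#define DEMO_MAX_NODES 1024	/* hypothetical upper bound on node ids */

struct demo_node {
	int id;
	char payload[256];	/* stands in for the embedded device state */
};

/* Only the pointers are static; each node is allocated on registration. */
static struct demo_node *demo_nodes[DEMO_MAX_NODES];

/* Allocate and record a node; returns 0 on success, -1 on failure. */
static int demo_register_node(int nid)
{
	if (nid < 0 || nid >= DEMO_MAX_NODES || demo_nodes[nid])
		return -1;
	demo_nodes[nid] = calloc(1, sizeof(*demo_nodes[nid]));
	if (!demo_nodes[nid])
		return -1;
	demo_nodes[nid]->id = nid;
	return 0;
}

/* Free the node and clear its slot so the id can be registered again. */
static void demo_unregister_node(int nid)
{
	if (nid < 0 || nid >= DEMO_MAX_NODES)
		return;
	free(demo_nodes[nid]);
	demo_nodes[nid] = NULL;
}
```

In this sketch a system with two nodes pays for two structs plus the pointer table, rather than for DEMO_MAX_NODES full structs.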
30 May, 2012
1 commit
-
/sys/devices/system/node/{online,possible} outputs a garbage byte
because print_nodes_state() returns content size + 1. To fix the bug,
the patch changes the use of cpuset_sprintf_cpulist to follow the use at
other places, which is clearer and safer.

This bug was introduced in v2.6.24 (commit bde631a51876: "mm: add node
states sysfs class attributeS").
Signed-off-by: Ryota Ozaki
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
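The off-by-one described in this commit (returning content size + 1 from a show routine) can be illustrated with a userspace stand-in. The function below is hypothetical and does not use cpuset_sprintf_cpulist; it only shows which length a reader should be told about.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a sysfs show() callback: format a range
 * such as "0-3\n" into buf and return the number of bytes a reader
 * should see. Returning one more than that would expose the
 * terminating NUL as a stray byte after the newline.
 */
static int demo_show_range(char *buf, size_t size, int first, int last)
{
	int n;

	if (size == 0)
		return 0;
	n = snprintf(buf, size, "%d-%d\n", first, last);
	if (n < 0)
		return n;		/* encoding error */
	if ((size_t)n >= size)
		n = (int)size - 1;	/* output was truncated */
	return n;			/* content length, NUL not counted */
}
```

The returned value always equals strlen(buf), which is the invariant the fix restores.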
03 Feb, 2012
1 commit
-
One system with 2048g ram, reported soft lockup on recent kernel.
[ 34.426749] cpu_dev_init done
[ 61.166399] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 61.166733] Modules linked in:
[ 61.166904] irq event stamp: 1935610
[ 61.178431] hardirqs last enabled at (1935609): [] mutex_lock_nested+0x299/0x2b4
[ 61.178923] hardirqs last disabled at (1935610): [] apic_timer_interrupt+0x6b/0x80
[ 61.198767] softirqs last enabled at (1935476): [] __do_softirq+0x195/0x1ab
[ 61.218604] softirqs last disabled at (1935471): [] call_softirq+0x1c/0x30
[ 61.238408] CPU 0
[ 61.238549] Modules linked in:
[ 61.238744]
[ 61.238825] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc1-tip-yh-02076-g962f689-dirty #171
[ 61.278212] RIP: 0010:[] [] lock_release+0x90/0x9c
[ 61.278627] RSP: 0018:ffff883f64dbfd70 EFLAGS: 00000246
[ 61.298287] RAX: ffff883f64dc0000 RBX: 0000000000000000 RCX: 000000000000008b
[ 61.298690] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 61.318383] RBP: ffff883f64dbfda0 R08: 0000000000000001 R09: 000000000000008b
[ 61.338215] R10: 0000000000000000 R11: 0000000000000000 R12: ffff883f64dbfd10
[ 61.338610] R13: ffff883f64dc0708 R14: ffff883f64dc0708 R15: ffffffff81095657
[ 61.358299] FS: 0000000000000000(0000) GS:ffff883f7d600000(0000) knlGS:0000000000000000
[ 61.378118] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 61.378450] CR2: 0000000000000000 CR3: 00000000024af000 CR4: 00000000000007f0
[ 61.398144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 61.417918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 61.418260] Process swapper/0 (pid: 1, threadinfo ffff883f64dbe000, task ffff883f64dc0000)
[ 61.445358] Stack:
[ 61.445511] 0000000000000002 ffff897f649ba168 ffff883f64dbfe10 ffff88ff64bb57a8
[ 61.458040] 0000000000000000 0000000000000000 ffff883f64dbfdc0 ffffffff81ceb1b4
[ 61.458491] 000000000011608c ffff88ff64bb58a8 ffff883f64dbfdf0 ffffffff81c57638
[ 61.478215] Call Trace:
[ 61.478367] [] _raw_spin_unlock+0x21/0x2e
[ 61.497994] [] klist_next+0x9e/0xbc
[ 61.498264] [] next_device+0xe/0x1e
[ 61.517867] [] subsys_find_device_by_id+0xb7/0xd6
[ 61.518197] [] find_memory_block_hinted+0x3d/0x66
[ 61.537927] [] find_memory_block+0x10/0x12
[ 61.538193] [] add_memory_section+0x35/0x9e
[ 61.557932] [] memory_dev_init+0x68/0xda
[ 61.558227] [] driver_init+0x97/0xa7
[ 61.577853] [] kernel_init+0xf6/0x1c0
[ 61.578140] [] kernel_thread_helper+0x4/0x10
[ 61.597850] [] ? retint_restore_args+0xe/0xe
[ 61.598144] [] ? start_kernel+0x3ab/0x3ab
[ 61.617826] [] ? gs_change+0xb/0xb
[ 61.618060] Code: 10 48 83 3b 00 eb e8 4c 89 f2 44 89 fe 4c 89 ef e8 e1 fe ff ff 65 48 8b 04 25 40 bc 00 00 c7 80 cc 06 00 00 00 00 00 00 41 54 9d 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 41 89 cf
[ 89.285380] memory_dev_init done

Finally it takes about 55s to create 16400 memory entries.

Root cause: on x86_64, 2048g of RAM (with a 2g hole at [2g,4g), so TOP2 will be 2050g) yields 16400 memory blocks.
find_memory_block/subsys_find_device_by_id will be expensive with that many entries.
Actually, we don't need to find that memory block on the boot path.
Skipping that lookup brings boot time back to normal.
[ 34.466696] cpu_dev_init done
[ 35.290080] memory_dev_init done

Also solved the delay with topology_init when sections_per_block is not 1.
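The figure of 16400 entries can be checked with a short calculation. The sketch below assumes a 128 MiB memory block (the changelog does not state the block size) and a simple cost model in which each lookup scans the entries created so far; both are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed block size: 128 MiB, i.e. 2^27 bytes. */
#define DEMO_BLOCK_SHIFT 27

/* Number of memory blocks needed to cover [0, top_gib GiB). */
static uint64_t demo_nr_blocks(uint64_t top_gib)
{
	uint64_t top_bytes = top_gib << 30;

	return top_bytes >> DEMO_BLOCK_SHIFT;
}

/*
 * Entries visited if creating block i first scans the i blocks that
 * already exist: 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
 */
static uint64_t demo_scan_steps(uint64_t n)
{
	return n * (n - 1) / 2;
}
```

Under these assumptions a 2050g top of memory gives 16400 blocks and on the order of 10^8 list visits during boot, which is consistent with the lookup dominating memory_dev_init.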
Signed-off-by: Yinghai Lu
Cc: Kay Sievers
Cc: Nathan Fontenot
Cc: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
07 Jan, 2012
1 commit
-
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman
22 Dec, 2011
2 commits
-
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
-
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.
Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: Chris Metcalf
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: Tigran Aivazian
Cc: Len Brown
Cc: Zhang Rui
Cc: Dave Jones
Cc: Peter Zijlstra
Cc: Russell King
Cc: Andrew Morton
Cc: Arjan van de Ven
Cc: "Rafael J. Wysocki"
Cc: "Srivatsa S. Bhat"
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
19 Nov, 2011
1 commit
-
Patch to fix the error message "directives may not be used inside a macro
argument" which appears when the kernel is compiled for the cris architecture.
Signed-off-by: Claudio Scordino
Acked-by: David Rientjes
Cc: stable
Signed-off-by: Greg Kroah-Hartman
25 May, 2011
1 commit
-
commit 2ac390370a ("writeback: add
/sys/devices/system/node//vmstat") added a vmstat entry. But
strangely it only shows nr_written and nr_dirtied.

# cat /sys/devices/system/node/node20/vmstat
nr_written 0
nr_dirtied 0

Of course, that is not adequate. With this patch, vmstat shows all VM
statistics, as /proc/vmstat does.

# cat /sys/devices/system/node/node0/vmstat
nr_free_pages 899224
nr_inactive_anon 201
nr_active_anon 17380
nr_inactive_file 31572
nr_active_file 28277
nr_unevictable 0
nr_mlock 0
nr_anon_pages 17321
nr_mapped 8640
nr_file_pages 60107
nr_dirty 33
nr_writeback 0
nr_slab_reclaimable 6850
nr_slab_unreclaimable 7604
nr_page_table_pages 3105
nr_kernel_stack 175
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 260
nr_dirtied 1050
nr_written 938
numa_hit 962872
numa_miss 0
numa_foreign 0
numa_interleave 8617
numa_local 962872
numa_other 0
nr_anon_transparent_hugepages 0

[akpm@linux-foundation.org: no externs in .c files]
Signed-off-by: KOSAKI Motohiro
Cc: Michael Rubin
Cc: Wu Fengguang
Acked-by: David Rientjes
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Feb, 2011
1 commit
-
Update the 'phys_index' property of the memory_block struct to be
called start_section_nr, and add an end_section_nr property. The
data tracked here is the same but the updated naming is more in line
with what is stored here, namely the first and last section number
that the memory block spans.

The names presented to userspace remain the same, phys_index for
start_section_nr and end_phys_index for end_section_nr, to avoid breaking
anything in userspace.

This also updates the node sysfs code to be aware of the new capability for
a memory block to contain multiple memory sections and be aware of the memory
block structure name changes (start_section_nr). This requires an additional
parameter to unregister_mem_sect_under_nodes so that we know which memory
section of the memory block to unregister.
Signed-off-by: Nathan Fontenot
Reviewed-by: Robin Holt
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Greg Kroah-Hartman
14 Jan, 2011
1 commit
-
Add hugepage statistics to per-node sysfs meminfo
Reviewed-by: Rik van Riel
Signed-off-by: David Rientjes
Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Oct, 2010
1 commit
-
For NUMA node systems it is important to have visibility into memory
characteristics. Two of the /proc/vmstat values "nr_written" and
"nr_dirtied" are added here.

# cat /sys/devices/system/node/node20/vmstat
nr_written 0
nr_dirtied 0
Signed-off-by: Michael Rubin
Reviewed-by: Wu Fengguang
Cc: Dave Chinner
Cc: Jens Axboe
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Oct, 2010
1 commit
-
Modify link_mem_sections() to pass in the previous mem_block as a hint to
locating the next mem_block. Since they are typically added in order this
results in a massive saving in time during boot of a very large system.
For example, on a 16TB x86_64 machine, it reduced the total time spent
linking all nodes' memory sections from 1 hour, 27 minutes to 46 seconds.
Signed-off-by: Robin Holt
To: Gary Hade
To: Badari Pulavarty
To: Ingo Molnar
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Greg Kroah-Hartman
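The idea of passing the previous block as a hint can be sketched with a plain linked list. This is an illustration of hinted search in general, not the kernel's implementation; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical singly linked list of memory blocks, ordered by id. */
struct demo_block {
	unsigned long id;
	struct demo_block *next;
};

/*
 * Find the block with the given id. When a hint is supplied, the scan
 * starts at the hint; if the target is not found from there, the part
 * of the list before the hint is scanned as a fallback. *steps counts
 * visited entries so the saving can be observed.
 */
static struct demo_block *demo_find_hinted(struct demo_block *head,
					   struct demo_block *hint,
					   unsigned long id,
					   unsigned long *steps)
{
	struct demo_block *b;

	for (b = hint ? hint : head; b; b = b->next) {
		(*steps)++;
		if (b->id == id)
			return b;
	}
	if (hint) {
		for (b = head; b && b != hint; b = b->next) {
			(*steps)++;
			if (b->id == id)
				return b;
		}
	}
	return NULL;
}
```

When blocks are requested in the order they were added, each hinted lookup visits only a couple of entries instead of walking the list from its head.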
10 Aug, 2010
1 commit
-
drivers/base/node.c: In function 'node_read_meminfo':
drivers/base/node.c:139: warning: the frame size of 848 bytes is
larger than 512 bytes

Fix it by splitting the sprintf() into three parts. It has no functional
change.
Signed-off-by: KOSAKI Motohiro
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
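Splitting one large formatting call into several smaller ones, as this commit does, follows a common append pattern. The sketch below is a hypothetical userspace version with two fields; the field names and layout are illustrative and not the real meminfo format.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical two-field report built with several small formatting
 * calls instead of one call with a long argument list. n tracks how
 * many bytes have been written so far. Returns the total length, or
 * -1 if the buffer is too small.
 */
static int demo_read_meminfo(char *buf, size_t size, int nid,
			     unsigned long total_kb, unsigned long free_kb)
{
	int n, ret;

	ret = snprintf(buf, size, "Node %d MemTotal: %lu kB\n", nid, total_kb);
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	n = ret;

	ret = snprintf(buf + n, size - (size_t)n,
		       "Node %d MemFree: %lu kB\n", nid, free_kb);
	if (ret < 0 || (size_t)ret >= size - (size_t)n)
		return -1;
	n += ret;

	return n;
}
```

The output is byte-for-byte what a single call would produce, which matches the commit's claim of no functional change.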
25 May, 2010
1 commit
-
Add a per-node sysfs file called compact. When the file is written to,
each zone in that node is compacted. The intention is that this would be
used by something like a job scheduler in a batch system before a job
starts so that the job can allocate the maximum number of hugepages
without significant start-up cost.
Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Reviewed-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Reviewed-by: Minchan Kim
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Apr, 2010
1 commit
-
NODEMASK_ALLOC/FREE are mapped to kmalloc/free if NODES_SHIFT > 8.
Among its several users, drivers/base/node.c wasn't including slab.h
leading to build failure if NODES_SHIFT > 8. Include slab.h from
drivers/base/node.c.

This isn't an ideal solution but including slab.h directly from
nodemask.h is not an option because nodemask.h gets included
everywhere. For now, make it work by including slab.h from its users.
Signed-off-by: Tejun Heo
Reported-by: Ingo Molnar
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

The percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there. ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.

2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
19 Mar, 2010
1 commit
-
node_read_distance() has a BUILD_BUG_ON() to prevent buffer overruns when
the number of nodes printed will exceed the buffer length.

Each node only needs four chars: three for distance (maximum distance is
255) and one for a separating space or a trailing newline.
Signed-off-by: David Rientjes
Cc: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
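The sizing rule in this commit (four characters per node) lends itself to a compile-time check. The sketch below uses C11 _Static_assert as a userspace analogue of BUILD_BUG_ON; the node count, buffer size, and function name are hypothetical.

```c
#include <stdio.h>

#define DEMO_MAX_NODES 64	/* hypothetical node count */
#define DEMO_BUF_SIZE  4096	/* hypothetical output buffer size */

/*
 * Each node needs at most 3 digits plus 1 separator or newline. The
 * extra byte accounts for the NUL that sprintf() appends.
 */
_Static_assert(DEMO_MAX_NODES * 4 + 1 <= DEMO_BUF_SIZE,
	       "distance line would overflow the buffer");

/*
 * Print nr distances separated by spaces and ended by a newline.
 * buf must hold DEMO_BUF_SIZE bytes. Returns the length written, or
 * -1 if nr is out of range.
 */
static int demo_read_distance(char *buf, const unsigned char *dist, int nr)
{
	int i, len = 0;

	if (nr < 0 || nr > DEMO_MAX_NODES)
		return -1;
	for (i = 0; i < nr; i++)
		len += sprintf(buf + len, "%s%u", i ? " " : "",
			       (unsigned int)dist[i]);
	len += sprintf(buf + len, "\n");
	return len;
}
```

Because the distances are stored as unsigned char here, no value can exceed 255, so the four-character bound holds for every entry.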
08 Mar, 2010
3 commits
-
Convert the node driver to sysdev_class attribute arrays. This
greatly cleans up the code and removes a lot of code.

Saves ~150 bytes of code on x86-64.
Signed-off-by: Andi Kleen
Signed-off-by: Greg Kroah-Hartman
-
Using the new attribute argument convert the node driver class
attributes to carry the node state. Then use a shared function to do
what a lot of individual functions did before.
Signed-off-by: Andi Kleen
Signed-off-by: Greg Kroah-Hartman
-
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
a separate function for every piece of data.

Also drivers can extend the attributes with their own data fields
and use that in the low level function.

Similar to sysdev_attributes and normal attributes.

This is a tree-wide sweep, converting everything in one go.

No functional changes in this patch other than passing the new
argument everywhere.

Tested on x86, the non x86 parts are uncompiled.
Signed-off-by: Andi Kleen
Signed-off-by: Greg Kroah-Hartman
16 Dec, 2009
8 commits
-
Nodemasks should not be allocated on the stack for large systems (when it
is larger than 256 bytes) since there is a threat of overflow.

This patch causes the unregister_mem_sect_under_nodes() nodemask to be
allocated on the stack for smaller systems and be allocated by slab for
larger systems.

GFP_KERNEL is used since remove_memory_block() can block.
Cc: Gary Hade
Cc: Badari Pulavarty
Cc: Alex Chiang
Signed-off-by: David Rientjes
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
You can discover which CPUs belong to a NUMA node by examining
/sys/devices/system/node/node#/

However, it's not convenient to go in the other direction, when looking at
/sys/devices/system/cpu/cpu#/

Yes, you can muck about in sysfs, but adding these symlinks makes life a
lot more convenient.
Signed-off-by: Alex Chiang
Acked-by: David Rientjes
Cc: Gary Hade
Cc: Badari Pulavarty
Cc: Ingo Molnar
Cc: David Rientjes
Cc: Greg KH
Cc: Randy Dunlap
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
By returning early if the node is not online, we can unindent the
interesting code by two levels.

No functional change.
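The early-return refactoring described here can be shown side by side. Both functions below are hypothetical and only mimic the shape of the change; the conditions and return values are illustrative.

```c
/* Nested form: the interesting work sits two levels deep. */
static int demo_link_nested(int node_online, int cpu_present)
{
	int ret = -1;

	if (node_online) {
		if (cpu_present) {
			ret = 0;	/* the real work would go here */
		}
	}
	return ret;
}

/* Early-return form: same behavior, no extra indentation. */
static int demo_link_flat(int node_online, int cpu_present)
{
	if (!node_online)
		return -1;
	if (!cpu_present)
		return -1;

	return 0;	/* the real work would go here */
}
```

The two forms return the same value for every combination of inputs, which is what "no functional change" means for this kind of cleanup.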
Signed-off-by: Alex Chiang
Cc: Gary Hade
Cc: Badari Pulavarty
Cc: Ingo Molnar
Cc: David Rientjes
Cc: Greg KH
Cc: Randy Dunlap
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
By returning early if the node is not online, we can unindent the
interesting code by one level.

No functional change.
Signed-off-by: Alex Chiang
Cc: Gary Hade
Cc: Badari Pulavarty
Cc: Ingo Molnar
Cc: David Rientjes
Cc: Greg KH
Cc: Randy Dunlap
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Commit c04fc586c (mm: show node to memory section relationship with
symlinks in sysfs) created symlinks from nodes to memory sections, e.g.

/sys/devices/system/node/node1/memory135 -> ../../memory/memory135

If you're examining the memory section though and are wondering what node
it might belong to, you can find it by grovelling around in sysfs, but
it's a little cumbersome.

Add a reverse symlink for each memory section that points back to the
node to which it belongs.
Signed-off-by: Alex Chiang
Cc: Gary Hade
Cc: Badari Pulavarty
Cc: Ingo Molnar
Acked-by: David Rientjes
Cc: Greg KH
Cc: Randy Dunlap
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Offload the registration and unregistration of per node hstate sysfs
attributes to a worker thread rather than attempt the
allocation/attachment or detachment/freeing of the attributes in the
context of the memory hotplug handler.

I don't know that this is absolutely required, but the registration can
sleep in allocations and other mem hot plug handlers do it this way. If
it turns out this is NOT required, we can drop this patch.

N.B., Only tested build, boot, libhugetlbfs regression.
i.e., no memory hotplug testing.
Signed-off-by: Lee Schermerhorn
Reviewed-by: Andi Kleen
Cc: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Mel Gorman
Cc: Randy Dunlap
Cc: Nishanth Aravamudan
Cc: David Rientjes
Cc: Adam Litke
Cc: Andy Whitcroft
Cc: Eric Whitney
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Register per node hstate attributes only for nodes with memory. As
suggested by David Rientjes.

With Memory Hotplug, memory can be added to a memoryless node and a node
with memory can become memoryless. Therefore, add a memory on/off-line
notifier callback to [un]register a node's attributes on transition
to/from memoryless state.

N.B., Only tested build, boot, libhugetlbfs regression.
i.e., no memory hotplug testing.
Signed-off-by: Lee Schermerhorn
Reviewed-by: Andi Kleen
Acked-by: David Rientjes
Cc: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Mel Gorman
Cc: Randy Dunlap
Cc: Nishanth Aravamudan
Cc: Adam Litke
Cc: Andy Whitcroft
Cc: Eric Whitney
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Add the per huge page size control/query attributes to the per node
sysdevs:

/sys/devices/system/node/node/hugepages/hugepages-/
nr_hugepages - r/w
free_hugepages - r/o
surplus_hugepages - r/o

The patch attempts to re-use/share as much of the existing global hstate
attribute initialization and handling, and the "nodes_allowed" constraint
processing as possible.

Calling set_max_huge_pages() with no node indicates a change to global
hstate parameters. In this case, any non-default task mempolicy will be
used to generate the nodes_allowed mask. A valid node id indicates an
update to that node's hstate parameters, and the count argument specifies
the target count for the specified node. From this info, we compute the
target global count for the hstate and construct a nodes_allowed node mask
containing only the specified node.

Setting the node specific nr_hugepages via the per node attribute
effectively ignores any task mempolicy or cpuset constraints.

With this patch:

(me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
./ ../ free_hugepages nr_hugepages surplus_hugepages

Starting from:
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 0
Node 2 HugePages_Free: 0
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
vm.nr_hugepages = 0

Allocate 16 persistent huge pages on node 2:
(me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages

[Note that this is equivalent to:
numactl -m 2 hugeadmin --pool-pages-min 2M:+16
]

Yields:
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 16
Node 2 HugePages_Free: 16
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
vm.nr_hugepages = 16

Global controls work as expected--reduce pool to 8 persistent huge pages:
(me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 8
Node 2 HugePages_Free: 8
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
Signed-off-by: Lee Schermerhorn
Acked-by: Mel Gorman
Reviewed-by: Andi Kleen
Cc: KAMEZAWA Hiroyuki
Cc: Randy Dunlap
Cc: Nishanth Aravamudan
Cc: David Rientjes
Cc: Adam Litke
Cc: Andy Whitcroft
Cc: Eric Whitney
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
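The per-node attribute paths used in the last commit above follow a regular pattern, so a tool that sets per-node pools could build them as sketched below. The helper name and its parameters are hypothetical; only the path layout is taken from the examples in the changelog.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build the path of a per-node hugepage attribute, for example
 * node 2, 2048 kB pages, attribute "nr_hugepages". Returns 0 on
 * success or -1 if the path does not fit in buf.
 */
static int demo_node_hugepage_path(char *buf, size_t size, int nid,
				   unsigned long page_kb, const char *attr)
{
	int n = snprintf(buf, size,
			 "/sys/devices/system/node/node%d/hugepages/"
			 "hugepages-%lukB/%s", nid, page_kb, attr);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

Writing a count to the resulting nr_hugepages path is what the `echo 16 >...` step in the changelog does for node 2.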
22 Sep, 2009
2 commits
-
Recently we encountered OOM problems due to memory use of the GEM cache.
Generally a large amount of Shmem/Tmpfs pages tends to create a memory
shortage problem.

We often use the following calculation to determine the amount of shmem
pages:

shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES

However, the expression does not consider isolated and mlocked pages.
This patch adds explicit accounting for pages used by shmem and tmpfs.
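As a worked example of the estimate, the node0 counters quoted in the 25 May 2011 entry (nr_active_anon 17380, nr_inactive_anon 201, nr_anon_pages 17321) can be plugged into the expression. The helper below is a hypothetical illustration, not kernel code.

```c
/*
 * Estimate shmem pages from the three LRU/anon counters. This ignores
 * isolated and mlocked pages, which is the weakness the changelog
 * points out.
 */
static long demo_estimate_shmem(long active_anon, long inactive_anon,
				long anon_pages)
{
	return active_anon + inactive_anon - anon_pages;
}
```

For that sample the estimate is 17380 + 201 - 17321 = 260, which equals the nr_shmem value in the same listing; the isolated and mlock counters there are all zero, so nothing distorts the estimate.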
Signed-off-by: KOSAKI Motohiro
Acked-by: Rik van Riel
Reviewed-by: Christoph Lameter
Acked-by: Wu Fengguang
Cc: David Rientjes
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The amount of memory allocated to kernel stacks can become significant and
cause OOM conditions. However, we do not display the amount of memory
consumed by stacks.

Add code to display the amount of memory used for stacks in /proc/meminfo.
Signed-off-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Reviewed-by: Minchan Kim
Reviewed-by: Rik van Riel
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jun, 2009
1 commit
-
Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this
configurability is unnecessary.
Signed-off-by: KOSAKI Motohiro
Cc: Johannes Weiner
Cc: Andi Kleen
Acked-by: Minchan Kim
Cc: David Woodhouse
Cc: Matt Mackall
Cc: Rik van Riel
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Mar, 2009
1 commit
-
Impact: cleanup
node_to_cpumask (and the blecherous node_to_cpumask_ptr which
contained a declaration) are replaced now that everyone implements
cpumask_of_node.
Signed-off-by: Rusty Russell
11 Mar, 2009
1 commit
-
get_nid_for_pfn() returns int
Presumably the (nid < 0) case has never happened.
We do know that it is happening on one system while creating a symlink for
a memory section so it should also happen on the same system if
unregister_mem_sect_under_nodes() were called to remove the same symlink.

The test was actually added in response to a problem with an earlier
version reported by Yasunori Goto where one or more of the leading pages
of a memory section on the 2nd node of one of his systems was
uninitialized because I believe they coincided with a memory hole.

That earlier version did not ignore uninitialized pages and determined
the nid by considering only the 1st page of each memory section. This
caused the symlink to the 1st memory section on the 2nd node to be
incorrectly created in /sys/devices/system/node/node0 instead of
/sys/devices/system/node/node1. The problem was fixed by adding the
test to skip over uninitialized pages.

I suspect we have not seen any reports of the non-removal
of a symlink due to the incorrect declaration of the nid
variable in unregister_mem_sect_under_nodes() because
- systems where a memory section could have an uninitialized
  range of leading pages are probably rare.
- memory remove is probably not done very frequently on the
  systems that are capable of demonstrating the problem.
- lingering symlink(s) that should have been removed may
  have simply gone unnoticed.

[garyhade@us.ibm.com: wrote changelog]
Signed-off-by: Roel Kluin
Cc: Gary Hade
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
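The pitfall behind this fix can be reproduced in userspace. The sketch assumes the incorrect declaration was an unsigned type, which the changelog implies but does not spell out; all function names are hypothetical.

```c
#include <limits.h>

/* Hypothetical lookup: returns a node id, or -1 when the page has none. */
static int demo_get_nid(unsigned long pfn)
{
	return (pfn == 0) ? -1 : 1;
}

/* Correct: an int keeps the -1, so the error check works. */
static int demo_should_skip(unsigned long pfn)
{
	int nid = demo_get_nid(pfn);

	return nid < 0;
}

/*
 * What an unsigned variable would hold for the same return value.
 * A "< 0" test on it can never be true, so the page would be treated
 * as belonging to a huge, nonexistent node id instead of being skipped.
 */
static unsigned int demo_nid_as_unsigned(unsigned long pfn)
{
	return (unsigned int)demo_get_nid(pfn);
}
```

Declaring the variable with the same signed type that the lookup returns keeps the negative sentinel intact and lets the skip test fire.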
07 Jan, 2009
1 commit
-
Show node to memory section relationship with symlinks in sysfs
Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add was tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade
Signed-off-by: Badari Pulavarty
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds