Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

15 May, 2009

1 commit

cd17cbfda Revert "mm: add /proc controls for pdflush threads" ... Browse Code »

This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af.

Work is progressing to switch away from pdflush as the process backing
for flushing out dirty data. So it seems pointless to add more knobs
to control pdflush threads. The original author of the patch did not
have any specific use cases for adding the knobs, so we can easily
revert this before 2.6.30 to avoid having to maintain this API
forever.

Signed-off-by: Jens Axboe

Jens Axboe
2009-05-15 17:32:24 +0800

03 May, 2009

1 commit

9e4a5bda8 mm: prevent divide error for small values of vm_dirty_bytes ... Browse Code »

Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
avoid potential division by 0 (like the following) in get_dirty_limits().

[ 49.951610] divide error: 0000 [#1] PREEMPT SMP
[ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
[ 49.952195] CPU 1
[ 49.952195] Modules linked in: pcspkr
[ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
[ 49.952195] RIP: 0010:[] [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202
[ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
[ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
[ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
[ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
[ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
[ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
[ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
[ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
[ 49.952195] Stack:
[ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
[ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
[ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
[ 49.952195] Call Trace:
[ 49.952195] [] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
[ 49.952195] [] ? ext3_writeback_write_end+0x9e/0x120
[ 49.952195] [] generic_file_buffered_write+0x12f/0x330
[ 49.952195] [] __generic_file_aio_write_nolock+0x26d/0x460
[ 49.952195] [] ? generic_file_aio_write+0x52/0xd0
[ 49.952195] [] generic_file_aio_write+0x69/0xd0
[ 49.952195] [] ext3_file_write+0x26/0xc0
[ 49.952195] [] do_sync_write+0xf1/0x140
[ 49.952195] [] ? get_lock_stats+0x2a/0x60
[ 49.952195] [] ? autoremove_wake_function+0x0/0x40
[ 49.952195] [] vfs_write+0xcb/0x190
[ 49.952195] [] sys_write+0x50/0x90
[ 49.952195] [] system_call_fastpath+0x16/0x1b
[ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
[ 49.952195] RIP [] get_dirty_limits+0xe9/0x2c0
[ 49.952195] RSP
[ 50.096523] ---[ end trace 008d7aa02f244d7b ]---

Signed-off-by: Andrea Righi
Cc: Peter Zijlstra
Cc: David Rientjes
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Righi
2009-05-03 06:36:10 +0800

14 Apr, 2009

1 commit

ca8b99502 Documentation/sysctl/net.txt: fix a typo ... Browse Code »

s/spicified/specified

Signed-off-by: Li Zefan
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2009-04-14 06:04:28 +0800

07 Apr, 2009

1 commit

fafd688e4 mm: add /proc controls for pdflush threads ... Browse Code »

Add /proc entries to give the admin the ability to control the minimum and
maximum number of pdflush threads. This allows finer control of pdflush
on both large and small machines.

The rationale is simply one size does not fit all. Admins on large and/or
small systems may want to tune the min/max pdflush thread count to best
suit their needs. Right now the min/max is hardcoded to 2/8. While
probably a fair estimate for smaller machines, large machines with large
numbers of CPUs and large numbers of filesystems/block devices may benefit
from larger numbers of threads working on different block devices.

Even if the background flushing algorithm is radically changed, it is
still likely that multiple threads will be involved and admins would still
desire finer control on the min/max other than to have to recompile the
kernel.

The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
'/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
the current value of nr_pdflush_threads_max. This minimum is required
since additional thread creation is performed in a pdflush thread itself.

The minimum value for nr_pdflush_threads_max is the current value of
nr_pdflush_threads_min and the maximum value can be 1000.

Documentation/sysctl/vm.txt is also updated.

[akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
Signed-off-by: Peter W Morreale
Reviewed-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter W Morreale
2009-04-07 23:31:03 +0800

03 Apr, 2009

2 commits

45dad7bd9 documentation: fix unix_dgram_qlen description ... Browse Code »

Previous description about system parameter in /proc/sys/net/unix/ is
wrong (or missed). Simply add a new description about unix_dgram_qlen
according to latest kernel.

Signed-off-by: Li Xiaodong
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Xiaodong
2009-04-03 10:04:53 +0800
760df93ec documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls ... Browse Code »

Now /proc/sys is described in many places and much information is
redundant. This patch updates the proc.txt and move the /proc/sys
desciption out to the files in Documentation/sysctls.

Details are:

merge
- 2.1 /proc/sys/fs - File system data
- 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
- 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctls/fs.txt.

remove
- 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's not better then the Documentation/binfmt_misc.txt.

merge
- 2.3 /proc/sys/kernel - general kernel parameters
with Documentation/sysctls/kernel.txt

remove
- 2.5 /proc/sys/dev - Device specific parameters
since it's obsolete the sysfs is used now.

remove
- 2.6 /proc/sys/sunrpc - Remote procedure calls
since it's not better then the Documentation/sysctls/sunrpc.txt

move
- 2.7 /proc/sys/net - Networking stuff
- 2.9 Appletalk
- 2.10 IPX
to newly created Documentation/sysctls/net.txt.

remove
- 2.8 /proc/sys/net/ipv4 - IPV4 settings
since it's not better then the Documentation/networking/ip-sysctl.txt.

add
- Chapter 3 Per-Process Parameters
to descibe /proc//xxx parameters.

Signed-off-by: Shen Feng
Cc: Randy Dunlap
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shen Feng
2009-04-03 10:04:53 +0800

16 Jan, 2009

1 commit

db0fb1848 Update of Documentation: vm.txt and proc.txt ... Browse Code »

Update Documentation/sysctl/vm.txt and Documentation/filesystems/proc.txt.
More specifically, the section on /proc/sys/vm in
Documentation/filesystems/proc.txt was removed and a link to
Documentation/sysctl/vm.txt added.

Most of the verbiage from proc.txt was simply moved in vm.txt, with new
addtional text for "swappiness" and "stat_interval".

Signed-off-by: Peter W Morreale
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter W Morreale
2009-01-16 08:39:35 +0800

08 Jan, 2009

1 commit

dd8632a12 NOMMU: Make mmap allocation page trimming behaviour configurable. ... Browse Code »

NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
the nearest power-of-2 number of pages. Currently it then discards the excess
pages back to the page allocator, making that memory available for use by other
things. This can, however, cause greater amount of fragmentation.

To counter this, a sysctl is added in order to fine-tune the trimming
behaviour. The default behaviour remains to trim pages aggressively, while
this can either be disabled completely or set to a higher page-granular
watermark in order to have finer-grained control.

vm region vm_top bits taken from an earlier patch by David Howells.

Signed-off-by: Paul Mundt
Signed-off-by: David Howells
Tested-by: Mike Frysinger

Paul Mundt
2009-01-08 20:04:47 +0800

07 Jan, 2009

1 commit

2da02997e mm: add dirty_background_bytes and dirty_bytes sysctls ... Browse Code »

This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.

dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.

With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system.

dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system. If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.

When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value. For example, if dirty_bytes is written to
be 8096, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory:

dirtyable_memory = free pages + mapped pages + file cache

dirty_background_bytes = dirty_background_ratio * dirtyable_memory
-or-
dirty_background_ratio = dirty_background_bytes / dirtyable_memory

AND

dirty_bytes = dirty_ratio * dirtyable_memory
-or-
dirty_ratio = dirty_bytes / dirtyable_memory

Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified. When one sysctl is written, the other appears as 0 when read.

The `bytes' files operate on a page size granularity since dirty limits
are compared with ZVC values, which are in page units.

Prior to this change, the minimum dirty_ratio was 5 as implemented by
get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
written value between 0 and 100. This restriction is maintained, but
dirty_bytes has a lower limit of only one page.

Also prior to this change, the dirty_background_ratio could not equal or
exceed dirty_ratio. This restriction is maintained in addition to
restricting dirty_background_bytes. If either background threshold equals
or exceeds that of the dirty threshold, it is implicitly set to half the
dirty threshold.

Acked-by: Peter Zijlstra
Cc: Dave Chinner
Cc: Christoph Lameter
Signed-off-by: David Rientjes
Cc: Andrea Righi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2009-01-07 07:59:03 +0800

30 Oct, 2008

1 commit

bb20698d4 Document kernel taint flags properly ... Browse Code »

This fills in the documentation for all of the current kernel taint
flags, and fixes the number for TAINT_CRAP, which was incorrectly
described.

Cc: Michael Kerrisk
Cc: Randy Dunlap
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-10-30 06:03:49 +0800

11 Oct, 2008

1 commit

061b1bd39 Staging: add TAINT_CRAP for all drivers/staging code ... Browse Code »

We need to add a flag for all code that is in the drivers/staging/
directory to prevent all other kernel developers from worrying about
issues here, and to notify users that the drivers might not be as good
as they are normally used to.

Based on code from Andreas Gruenbacher and Jeff Mahoney to provide a
TAINT flag for the support level of a kernel module in the Novell
enterprise kernel release.

This is the kernel portion of this feature, the ability for the flag to
be set needs to be done in the build process and will happen in a
follow-up patch.

Cc: Andreas Gruenbacher
Cc: Jeff Mahoney
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-10-11 06:31:05 +0800

23 Sep, 2008

1 commit

b4d19cc84 Documentation/sysctl/kernel.txt: fix softlockup_thresh description ... Browse Code »

- s/s/seconds/

- s/10 seconds/60 seconds/

- Mention the zero-disables-it feature.

Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2008-09-23 23:09:14 +0800

27 Jul, 2008

1 commit

d91958815 Documentation cleanup: trivial misspelling, punctuation, and grammar corrections. ... Browse Code »

Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt LaPlante
2008-07-27 03:00:06 +0800

14 Feb, 2008

1 commit

ac76cff2e Documentation: sysctl/kernel.txt: fix documentation reference ... Browse Code »

This patch fixes a reference to Documentation/kmod.txt
which was apparently renamed to Documentation/debugging-modules.txt

Signed-off-by: Michael Opdenacker
Cc: "Randy.Dunlap"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Opdenacker
2008-02-14 08:21:20 +0800

10 Feb, 2008

1 commit

1ec7fd50b brk: document randomize_va_space and CONFIG_COMPAT_BRK (was Re: ... Browse Code »

Document randomize_va_space and CONFIG_COMPAT_BRK.

Signed-off-by: Jiri Kosina
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner

Jiri Kosina
2008-02-10 06:24:08 +0800

08 Feb, 2008

1 commit

fef1bdd68 oom: add sysctl to enable task memory dump ... Browse Code »

Adds a new sysctl, 'oom_dump_tasks', that enables the kernel to produce a
dump of all system tasks (excluding kernel threads) when performing an
OOM-killing. Information includes pid, uid, tgid, vm size, rss, cpu,
oom_adj score, and name.

This is helpful for determining why there was an OOM condition and which
rogue task caused it.

It is configurable so that large systems, such as those with several
thousand tasks, do not incur a performance penalty associated with dumping
data they may not desire.

If an OOM was triggered as a result of a memory controller, the tasklist
shall be filtered to exclude tasks that are not a member of the same
cgroup.

Cc: Andrea Arcangeli
Cc: Christoph Lameter
Cc: Balbir Singh
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2008-02-08 00:42:19 +0800

07 Feb, 2008

1 commit

9cfe015aa get rid of NR_OPEN and introduce a sysctl_nr_open ... Browse Code »

NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet
Cc: Alan Cox
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: "David S. Miller"
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2008-02-07 02:41:06 +0800

06 Feb, 2008

1 commit

195cf453d mm/page-writeback: highmem_is_dirtyable option ... Browse Code »

Add vm.highmem_is_dirtyable toggle

A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
approximately 2Gb size which contains a hash format that is written
randomly by the dbclean process. On 2.6.16 this process took a few
minutes. With lowmem only accounting of dirty ratios, this takes about 12
hours of 100% disk IO, all random writes.

Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
add the highmem back to the total available memory count.

[akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
Signed-off-by: Bron Gondwana
Cc: Ethan Solomita
Cc: Peter Zijlstra
Cc: WU Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bron Gondwana
2008-02-06 01:44:18 +0800

18 Dec, 2007

1 commit

d5dbac87b Documentation: update hugetlb information ... Browse Code »

The hugetlb documentation has gotten a bit out of sync with the current code.
Updated the sysctl file to refer to Documentation/vm/hugetlbpage.txt. Update
that file to contain the current state of affairs (with the newer named sysctl
in place).

Signed-off-by: Nishanth Aravamudan
Acked-by: Adam Litke
Cc: William Lee Irwin III
Cc: Dave Hansen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nishanth Aravamudan
2007-12-18 11:28:17 +0800

17 Oct, 2007

4 commits

24950898f vm.txt: document min_free_pages as critical for correctness ... Browse Code »

min_free_pages is critical for correctness, document it as such.

Signed-off-by: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Machek
2007-10-17 23:43:06 +0800
ea069f46f Add a 00-INDEX file to Documentation/sysctl/ ... Browse Code »

Add a 00-INDEX file to Documentation/sysctl/

Signed-off-by: Jesper Juhl
Cc: Rob Landley
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2007-10-17 23:43:05 +0800
c4f3b63fe softlockup: add a /proc tuning parameter ... Browse Code »

Control the trigger limit for softlockup warnings. This is useful for
debugging softlockups, by lowering the softlockup_thresh to identify
possible softlockups earlier.

This patch:
1. Adds a sysctl softlockup_thresh with valid values of 1-60s
(Higher value to disable false positives)
2. Changes the softlockup printk to print the cpu softlockup time

[akpm@linux-foundation.org: Fix various warnings and add definition of "two"]
Signed-off-by: Ravikiran Thirumalai
Signed-off-by: Shai Fultheim
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ravikiran G Thirumalai
2007-10-17 23:42:47 +0800
fe071d7e8 oom: add oom_kill_allocating_task sysctl ... Browse Code »

Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill
the OOM-triggering task instead of scanning through the tasklist to find a
memory-hogging target. This is helpful for systems with an insanely large
number of tasks where scanning the tasklist significantly degrades
performance.

Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800

18 Jul, 2007

1 commit

ed7ed3651 handle kernelcore=: generic ... Browse Code »

This patch adds the kernelcore= parameter for x86.

Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.

From: Yasunori Goto

When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.

This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.

[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman
Signed-off-by: Yasunori Goto
Acked-by: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2007-07-18 01:22:59 +0800

17 Jul, 2007

2 commits

97d8f83cb Add Documentation/sysctl/ctl_unnumbered.txt ... Browse Code »

Poeple keep on adding new numbered sysctls, when they're supposed not to.

Add a documentation file which explain why new sysctls should use
CTL_UNNUMBERED. The next patch will sprinkle pointers to this throughout
sysctl.c.

Eric provided the text (thanks)

Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2007-07-17 00:05:49 +0800
f0c0b2b80 change zonelist order: zonelist order selection logic ... Browse Code »

Make zonelist creation policy selectable from sysctl/boot option v6.

This patch makes NUMA's zonelist (of pgdat) order selectable.
Available order are Default(automatic)/ Node-based / Zone-based.

[Default Order]
The kernel selects Node-based or Zone-based order automatically.

[Node-based Order]
This policy treats the locality of memory as the most important parameter.
Zonelist order is created by each zone's locality. This means lower zones
(ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.
IOW. ZONE_DMA will be in the middle of zonelist.
current 2.6.21 kernel uses this.

Pros.
* A user can expect local memory as much as possible.
Cons.
* lower zone will be exhansted before higher zone. This may cause OOM_KILL.

Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL
because of ZONE_DMA exhaution and you need the best locality.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.

*node(1)'s memory allocation order:

node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.

[Zone-based order]
This policy treats the zone type as the most important parameter.
Zonelist order is created by zone-type order. This means lower zone
never be used bofere higher zone exhaustion.
IOW. ZONE_DMA will be always at the tail of zonelist.

Pros.
* OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.
Cons.
* memory locality may not be best.

(example)
assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.

*node(0)'s memory allocation order:

node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.

*node(1)'s memory allocation order:

node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.

bootoption "numa_zonelist_order=" and proc/sysctl is supporetd.

command:
%echo N > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Node-based order.

command:
%echo Z > /proc/sys/vm/numa_zonelist_order

Will rebuild zonelist in Zone-based order.

Thanks to Lee Schermerhorn, he gives me much help and codes.

[Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
[akpm@linux-foundation.org: build fix]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Christoph Lameter
Cc: Andi Kleen
Cc: "jesse.barnes@intel.com"
Signed-off-by: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-07-17 00:05:35 +0800

12 Jul, 2007

1 commit

ed0321895 security: Protection for exploiting null dereference using mmap ... Browse Code »

Add a new security check on mmap operations to see if the user is attempting
to mmap to low area of the address space. The amount of space protected is
indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
0, preserving existing behavior.

This patch uses a new SELinux security class "memprotect." Policy already
contains a number of allow rules like a_t self:process * (unconfined_t being
one of them) which mean that putting this check in the process class (its
best current fit) would make it useless as all user processes, which we also
want to protect against, would be allowed. By taking the memprotect name of
the new class it will also make it possible for us to move some of the other
memory protect permissions out of 'process' and into the new class next time
we bump the policy version number (which I also think is a good future idea)

Acked-by: Stephen Smalley
Acked-by: Chris Wright
Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2007-07-12 10:52:29 +0800

09 May, 2007

2 commits

a982ac06b misc doc and kconfig typos ... Browse Code »

Fix various typos in kernel docs and Kconfigs, 2.6.21-rc4.

Signed-off-by: Matt LaPlante
Signed-off-by: Adrian Bunk

Matt LaPlante
2007-05-09 14:58:15 +0800
beb7dd86a Fix misspellings collected by members of KJ list. ... Browse Code »

Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.

Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk

Robert P. J. Day
2007-05-09 13:14:03 +0800

08 May, 2007

1 commit

2b744c01a mm: fix handling of panic_on_oom when cpusets are in use ... Browse Code »

The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because other nodes' memory may remain. But some people
want failover by panic ASAP even if they are used. This patch makes new
setting for its request.

This is tested on my ia64 box which has 3 nodes.

Signed-off-by: Yasunori Goto
Signed-off-by: Benjamin LaHaise
Cc: Christoph Lameter
Cc: Paul Jackson
Cc: Ethan Solomita
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2007-05-08 03:12:55 +0800

07 Dec, 2006

1 commit

0741f4d20 [PATCH] x86: add sysctl for kstack_depth_to_print ... Browse Code »

Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.

Signed-off-by: Chuck Ebbert
Signed-off-by: Andi Kleen

Chuck Ebbert
2006-12-07 09:14:11 +0800

30 Nov, 2006

1 commit

5d3f083d8 Fix typos in /Documentation : Misc ... Browse Code »

This patch fixes typos in various Documentation txts. The patch addresses some
misc words.

Signed-off-by: Matt LaPlante
Acked-by: Randy Dunlap
Signed-off-by: Adrian Bunk

Matt LaPlante
2006-11-30 12:21:10 +0800

12 Oct, 2006

1 commit

cd0810410 [PATCH] document the core-dump-to-a-pipe patch ... Browse Code »

The pipe-a-coredump-to-a-program feature was undocumented.
*Grumble*.

NB: a good enhancement to that patch would be: save all the stuff that a
core file can get from the %x expansions in the environment.

Signed-off-by: Matthias Urlichs
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthias Urlichs
2006-10-12 02:14:22 +0800

26 Sep, 2006

1 commit

0ff38490c [PATCH] zone_reclaim: dynamic slab reclaim ... Browse Code »

Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
option if the freeing of unmapped file backed pages is not enough to free
enough pages to allow a local allocation.

However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs. We have had a case where a machine with
46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
dealing with pagecache pages. However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).

This patch implements slab reclaim during zone reclaim. Zone reclaim
occurs if there is a danger of an off node allocation. At that point we

1. Shrink the per node page cache if the number of pagecache
pages is more than min_unmapped_ratio percent of pages in a zone.

2. Shrink the slab cache if the number of the nodes reclaimable slab pages
(patch depends on earlier one that implements that counter)
are more than min_slab_ratio (a new /proc/sys/vm tunable).

The shrinking of the slab cache is a bit problematic since it is not node
specific. So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that neeed to be
allocated) and then repeately run the global reclaim until that is
unsuccessful or we have reached the limit. I hope we will have zone based
slab reclaim at some point which will make that easier.

The default for the min_slab_ratio is 5%

Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.

[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-09-26 23:48:51 +0800

28 Aug, 2006

1 commit

a2e0b5631 [PATCH] Fix docs for fs.suid_dumpable ... Browse Code »

Sergey Vlasov noticed that there is not kernel.suid_dumpable, but
fs.suid_dumpable.

How KERN_SETUID_DUMPABLE ended up in fs_table[]? Hell knows...

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-08-28 02:01:28 +0800

06 Aug, 2006

1 commit

8b23d04dd [PATCH] doc: update panic_on_oops documentation ... Browse Code »

Signed-off-by: Maxime Bizon
Acked-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Maxime Bizon
2006-08-06 23:57:47 +0800

04 Jul, 2006

1 commit

9614634fe [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O ... Browse Code »

It turns out that it is advantageous to leave a small portion of unmapped file
backed pages if all of a zone's pages (or almost all pages) are allocated and
so the page allocator has to go off-node.

This allows recently used file I/O buffers to stay on the node and
reduces the times that zone reclaim is invoked if file I/O occurs
when we run out of memory in a zone.

The problem is that zone reclaim runs too frequently when the page cache is
used for file I/O (read write and therefore unmapped pages!) alone and we have
almost all pages of the zone allocated. Zone reclaim may remove 32 unmapped
pages. File I/O will use these pages for the next read/write requests and the
unmapped pages increase. After the zone has filled up again zone reclaim will
remove it again after only 32 pages. This cycle is too inefficient and there
are potentially too many zone reclaim cycles.

With the 1% boundary we may still remove all unmapped pages for file I/O in
zone reclaim pass. However. it will take a large number of read and writes
to get back to 1% again where we trigger zone reclaim again.

The zone reclaim 2.6.16/17 does not show this behavior because we have a 30
second timeout.

[akpm@osdl.org: rename the /proc file and the variable]
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-07-04 06:26:59 +0800

01 Jul, 2006

1 commit

34aa1330f [PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval ... Browse Code »

The zone_reclaim_interval was necessary because we were not able to determine
how many unmapped pages exist in a zone. Therefore we had to scan in
intervals to figure out if any pages were unmapped.

With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
pages and the number of mapped pages in a zone. So we can simply skip the
reclaim if there is an insufficient number of unmapped pages. We use
SWAP_CLUSTER_MAX as the boundary.

Drop all support for /proc/sys/vm/zone_reclaim_interval.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-07-01 02:25:35 +0800

23 Jun, 2006

1 commit

fadd8fbd1 [PATCH] support for panic at OOM ... Browse Code »

This patch adds panic_on_oom sysctl under sys.vm.

When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue
processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in
the same way as it does today. Of course, the default value is 0 and only
root can modifies it.

In general, oom_killer works well and kill rogue processes. So the whole
system can survive. But there are environments where panic is preferable
rather than kill some processes.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2006-06-23 22:42:47 +0800

21 Feb, 2006

1 commit

c255d844d [PATCH] suspend-to-ram: allow video options to be set at runtime ... Browse Code »

Currently, acpi video options can only be set on kernel command line. That's
little inflexible; I'd like userland s2ram application that just works, and
modifying kernel command line according to whitelist is not fun. It is better
to just allow s2ram application to set video options just before suspend
(according to the whitelist).

This implements sysctl to allow setting suspend video options without reboot.

(akpm: Documentation updates for this new sysctl are pending..)

Signed-off-by: Pavel Machek
Cc: "Brown, Len"
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Machek
2006-02-21 12:00:10 +0800