Eric Lee / linux-smarc-t335x-v3.2

14 Nov, 2008

3 commits

2b8289256 Merge branch 'master' into next

Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris

James Morris
2008-11-14 08:29:12 +0800
c69e8d9c0 CRED: Use RCU to access another task's creds and to release a task's own creds

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:19 +0800
b6dff3ec5 CRED: Separate task security context from task_struct

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:16 +0800

11 Nov, 2008

1 commit

a2f2945a9 The oomkiller calculations make decisions based on capabilities.

Since these are not security decisions and LSMs should not record if they fail
the request, they should use the new has_capability_noaudit() interface so
the denials will not be recorded.

Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Signed-off-by: James Morris

Eric Paris
2008-11-11 19:02:54 +0800

07 Nov, 2008

2 commits

fbdd12676 mm/oom_kill.c: fix badness() kerneldoc

Parameter @mem has been removed since v2.6.26; now delete its comment.

Signed-off-by: Qinghuang Feng
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Qinghuang Feng
2008-11-07 07:41:19 +0800
b4416d2be oom: do not dump task state for non thread group leaders

When /proc/sys/vm/oom_dump_tasks is enabled, it's only necessary to dump
task state information for thread group leaders. The kernel log gets
quickly overwhelmed on machines with a massive number of threads by
dumping non-thread group leaders.

Reviewed-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2008-11-07 07:41:18 +0800

14 Aug, 2008

1 commit

5cd9c58fb security: Fix setting of PF_SUPERPRIV by __capable()

Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
of the target process if that is not the current process and it is trying to
change its own flags in a different way at the same time.

__capable() is using neither atomic ops nor locking to protect t->flags. This
patch removes __capable() and introduces has_capability() that doesn't set
PF_SUPERPRIV on the process being queried.

This patch further splits security_ptrace() in two:

(1) security_ptrace_may_access(). This passes judgement on whether one
process may access another only (PTRACE_MODE_ATTACH for ptrace() and
PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
current is the parent.

(2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only,
and takes only a pointer to the parent process. current is the child.

In Smack and commoncap, this uses has_capability() to determine whether
the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
This does not set PF_SUPERPRIV.

Two of the instances of __capable() actually only act on current, and so have
been changed to calls to capable().

Of the places that were using __capable():

(1) The OOM killer calls __capable() thrice when weighing the killability of a
process. All of these now use has_capability().

(2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
whether the parent was allowed to trace any process. As mentioned above,
these have been split. For PTRACE_ATTACH and /proc, capable() is now
used, and for PTRACE_TRACEME, has_capability() is used.

(3) cap_safe_nice() only ever saw current, so now uses capable().

(4) smack_setprocattr() rejected accesses to tasks other than current just
after calling __capable(), so the order of these two tests has been
switched and capable() is used instead.

(5) In smack_file_send_sigiotask(), we need to allow privileged processes to
receive SIGIO on files they're manipulating.

(6) In smack_task_wait(), we let a process wait for a privileged process,
whether or not the process doing the waiting is privileged.
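
A minimal sketch of the distinction described above, using a simplified stand-in for task_struct (the struct layout and the bit-test implementation are assumptions for illustration, not the kernel's code). has_capability() only reads, so it is safe to call on another task; capable() also sets PF_SUPERPRIV and so is only appropriate for the calling task.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_SUPERPRIV  0x00000100u  /* "used super-user privileges" flag */
#define CAP_SYS_ADMIN 21

/* Simplified stand-in for task_struct; not the kernel's layout. */
struct task {
    uint32_t flags;
    uint64_t cap_effective;
};

/* Pure query: never writes t->flags, so it may be used on any task. */
static bool has_capability(const struct task *t, int cap)
{
    return ((t->cap_effective >> cap) & 1u) != 0;
}

/* Only for the calling task: records that privilege was exercised. */
static bool capable(struct task *current_task, int cap)
{
    if (!has_capability(current_task, cap))
        return false;
    current_task->flags |= PF_SUPERPRIV;
    return true;
}
```

In this model a caller weighing another process, such as the OOM killer, would use has_capability() and leave the target's flags untouched.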

I've tested this with the LTP SELinux and syscalls testscripts.

Signed-off-by: David Howells
Acked-by: Serge Hallyn
Acked-by: Casey Schaufler
Acked-by: Andrew G. Morgan
Acked-by: Al Viro
Signed-off-by: James Morris

David Howells
2008-08-14 20:59:43 +0800

28 Apr, 2008

3 commits

97d87c971 oom_kill: remove unused parameter in badness()

In commit 4c4a22148909e4c003562ea7ffe0a06e26919e3c, we moved the
memcontroller-related code from badness() to select_bad_process(), so the
parameter 'mem' in badness() is unused now.

Signed-off-by: Li Zefan
Acked-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-04-28 23:58:26 +0800
dd1a239f6 mm: have zonelist contains structs with both a zone pointer and zone_idx

Filtering zonelists requires very frequent use of zone_idx(). This is costly
as it involves a lookup of another structure and a subtraction operation. As
the zone_idx is often required, it should be quickly accessible. The node idx
could also be stored here if it was found that accessing zone->node is
significant which may be the case on workloads where nodemasks are heavily
used.

This patch introduces a struct zoneref to store a zone pointer and a zone
index. The zonelist then consists of an array of these struct zonerefs which
are looked up as necessary. Helpers are given for accessing the zone index as
well as the node index.
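
A rough sketch of the data structure this describes. The struct zone here is a one-field placeholder and MAX_ZONEREFS is a made-up bound; only the idea of caching the index next to the pointer is taken from the message.

```c
#include <stddef.h>

/* Placeholder zone: the real structure is far larger. */
struct zone {
    int node;
};

/* One zonelist entry: the zone pointer plus its cached zone index. */
struct zoneref {
    struct zone *zone;
    int zone_idx;
};

#define MAX_ZONEREFS 8  /* hypothetical bound for this sketch */

struct zonelist {
    struct zoneref _zonerefs[MAX_ZONEREFS + 1];  /* ends with a NULL zone */
};

/* Helpers: the index comes from the entry, the node from the zone. */
static inline int zonelist_zone_idx(const struct zoneref *z)
{
    return z->zone_idx;
}

static inline int zonelist_node_idx(const struct zoneref *z)
{
    return z->zone->node;
}
```

Filtering code can then compare zonelist_zone_idx() against a limit without dereferencing the zone at all.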

[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
[hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
[hugh@veritas.com: just return do_try_to_free_pages]
[hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
Signed-off-by: Mel Gorman
Acked-by: Christoph Lameter
Acked-by: David Rientjes
Signed-off-by: Lee Schermerhorn
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Nick Piggin
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2008-04-28 23:58:18 +0800
54a6eb5c4 mm: use two zonelist that are filtered by GFP mask

Currently a node has two sets of zonelists, one for each zone type in the
system and a second set for GFP_THISNODE allocations. Based on the zones
allowed by a gfp mask, one of these zonelists is selected. All of these
zonelists consume memory and occupy cache lines.

This patch replaces the multiple zonelists per-node with two zonelists. The
first contains all populated zones in the system, ordered by distance, for
fallback allocations when the target/preferred node has no free pages. The
second contains all populated zones in the node suitable for GFP_THISNODE
allocations.

An iterator macro is introduced called for_each_zone_zonelist() that iterates
through each zone allowed by the GFP flags in the selected zonelist.
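
To illustrate the idea of a filtering iterator, here is a simplified sketch. It assumes a NULL-terminated array of zone pointers and an idx field inside struct zone; the helper next_allowed() and the function count_allowed() are hypothetical, and the macro below is a reduced imitation rather than the kernel's definition.

```c
#include <stddef.h>

struct zone {
    int idx;  /* assumed: zone type index, higher means "higher" zone */
};

/* Skip entries whose zone index exceeds what the GFP flags allow. */
static struct zone **next_allowed(struct zone **z, int highidx)
{
    while (*z && (*z)->idx > highidx)
        z++;
    return z;
}

/* Visit every zone in a NULL-terminated list with idx <= highidx. */
#define for_each_zone_zonelist(zone, z, zones, highidx)             \
    for ((z) = next_allowed((zones), (highidx)), (zone) = *(z);     \
         (zone);                                                    \
         (z) = next_allowed((z) + 1, (highidx)), (zone) = *(z))

/* Example user: count the zones an allocation may fall back to. */
static int count_allowed(struct zone **zones, int highidx)
{
    struct zone **z;
    struct zone *zone;
    int n = 0;

    for_each_zone_zonelist(zone, z, zones, highidx)
        n++;
    return n;
}
```

One shared list serves every allocation type this way; the limit passed to the iterator does the work that separate per-type zonelists did before.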

Signed-off-by: Mel Gorman
Acked-by: Christoph Lameter
Signed-off-by: Lee Schermerhorn
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Hugh Dickins
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2008-04-28 23:58:18 +0800

16 Apr, 2008

1 commit

e115f2d89 memcg: fix oops in oom handling

When I used a test program to fork mass processes and immediately move them to
a cgroup where the memory limit is low enough to trigger oom kill, I got oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000808
IP: [] _spin_lock_irqsave+0x8/0x18
PGD 4c95f067 PUD 4406c067 PMD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in:

Pid: 11973, comm: a.out Not tainted 2.6.25-rc7 #5
RIP: 0010:[] [] _spin_lock_irqsave+0x8/0x18
RSP: 0018:ffff8100448c7c30 EFLAGS: 00010002
RAX: 0000000000000202 RBX: 0000000000000009 RCX: 000000000001c9f3
RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000808
RBP: ffff81007e444080 R08: 0000000000000000 R09: ffff8100448c7900
R10: ffff81000105f480 R11: 00000100ffffffff R12: ffff810067c84140
R13: 0000000000000001 R14: ffff8100441d0018 R15: ffff81007da56200
FS: 00007f70eb1856f0(0000) GS:ffff81007fbad3c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000808 CR3: 000000004498a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process a.out (pid: 11973, threadinfo ffff8100448c6000, task ffff81007da533e0)
Stack: ffffffff8023ef5a 00000000000000d0 ffffffff80548dc0 00000000000000d0
ffff810067c84140 ffff81007e444080 ffffffff8026cef9 00000000000000d0
ffff8100441d0000 00000000000000d0 ffff8100441d0000 ffff8100505445c0
Call Trace:
[] ? force_sig_info+0x25/0xb9
[] ? oom_kill_task+0x77/0xe2
[] ? mem_cgroup_out_of_memory+0x55/0x67
[] ? mem_cgroup_charge_common+0xec/0x202
[] ? handle_mm_fault+0x24e/0x77f
[] ? default_wake_function+0x0/0xe
[] ? get_user_pages+0x2ce/0x3af
[] ? mem_cgroup_charge_common+0x2d/0x202
[] ? make_pages_present+0x8e/0xa4
[] ? mmap_region+0x373/0x429
[] ? do_mmap_pgoff+0x2ff/0x364
[] ? sys_mmap+0xe5/0x111
[] ? tracesys+0xdc/0xe1

Code: 00 00 01 48 8b 3c 24 e9 46 d4 dd ff f0 ff 07 48 8b 3c 24 e9 3a d4 dd ff fe 07 48 8b 3c 24 e9 2f d4 dd ff 9c 58 fa ba 00 01 00 00 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00
RIP [] _spin_lock_irqsave+0x8/0x18
RSP
CR2: 0000000000000808
---[ end trace c3702fa668021ea4 ]---

It's reproducible on an x86_64 box, but doesn't happen on x86_32.

This is because tsk->sighand is not guarded by RCU, so we have to
hold tasklist_lock, just as out_of_memory() does.

Signed-off-by: Li Zefan
Cc: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Cc: Pavel Emelianov
Cc: Paul Menage
Cc: Oleg Nesterov
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-04-16 10:35:40 +0800

20 Mar, 2008

1 commit

1b578df02 mm/oom_kill: fix kernel-doc

Fix kernel-doc notation in oom_kill.c.

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-20 09:53:35 +0800

05 Mar, 2008

1 commit

00f0b8259 Memory controller: rename to Memory Resource Controller

Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.

Signed-off-by: Balbir Singh
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-03-05 08:35:12 +0800

08 Feb, 2008

3 commits

fef1bdd68 oom: add sysctl to enable task memory dump

Adds a new sysctl, 'oom_dump_tasks', that enables the kernel to produce a
dump of all system tasks (excluding kernel threads) when performing an
OOM-killing. Information includes pid, uid, tgid, vm size, rss, cpu,
oom_adj score, and name.

This is helpful for determining why there was an OOM condition and which
rogue task caused it.

It is configurable so that large systems, such as those with several
thousand tasks, do not incur a performance penalty associated with dumping
data they may not desire.

If an OOM was triggered as a result of a memory controller, the tasklist
shall be filtered to exclude tasks that are not a member of the same
cgroup.

Cc: Andrea Arcangeli
Cc: Christoph Lameter
Cc: Balbir Singh
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2008-02-08 00:42:19 +0800
4c4a22148 memcontrol: move oom task exclusion to tasklist scan

Creates a helper function to return non-zero if a task is a member of a
memory controller:

int task_in_mem_cgroup(const struct task_struct *task,
const struct mem_cgroup *mem);

When the OOM killer is constrained by the memory controller, the exclusion
of tasks that are not a member of that controller was previously misplaced
and appeared in the badness scoring function. It should be excluded
during the tasklist scan in select_bad_process() instead.
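
A sketch of the change in responsibility, under simplifying assumptions: the memcg and points fields are invented for the example, the task list is a plain array, and the helper body is a guess at the simplest possible membership test. Only the prototype of task_in_mem_cgroup() comes from the message.

```c
#include <stddef.h>

struct mem_cgroup {
    int id;
};

struct task_struct {
    const struct mem_cgroup *memcg;  /* hypothetical field */
    unsigned long points;            /* stand-in for the badness score */
};

/* Non-zero if the task belongs to the given memory controller. */
int task_in_mem_cgroup(const struct task_struct *task,
                       const struct mem_cgroup *mem)
{
    return task->memcg == mem;
}

/* The scan, not the scoring function, skips tasks outside the cgroup. */
static const struct task_struct *
select_bad_process(const struct task_struct *tasks, size_t n,
                   const struct mem_cgroup *mem)
{
    const struct task_struct *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        if (mem && !task_in_mem_cgroup(&tasks[i], mem))
            continue;  /* excluded before any score is compared */
        if (!chosen || tasks[i].points > chosen->points)
            chosen = &tasks[i];
    }
    return chosen;
}
```

Passing NULL for mem models a global OOM, where no task is filtered out.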

[akpm@linux-foundation.org: build fix]
Cc: Christoph Lameter
Cc: Balbir Singh
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2008-02-08 00:42:19 +0800
c7ba5c9e8 Memory controller: OOM handling

Out of memory handling for cgroups over their limit. A task from the
cgroup over limit is chosen using the existing OOM logic and killed.

TODO:
1. As discussed in the OLS BOF session, consider implementing a user
space policy for OOM handling.

[akpm@linux-foundation.org: fix build due to oom-killer changes]
Signed-off-by: Pavel Emelianov
Signed-off-by: Balbir Singh
Cc: Paul Menage
Cc: Peter Zijlstra
Cc: "Eric W. Biederman"
Cc: Nick Piggin
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: David Rientjes
Cc: Vaidyanathan Srinivasan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelianov
2008-02-08 00:42:19 +0800

06 Feb, 2008

2 commits

97829955a oom_kill: remove uid==0 checks

Root processes are considered more important when out of memory and killing
processes. The check for CAP_SYS_ADMIN was augmented with a check for
uid==0 or euid==0.

There are several possible ways to look at this:

1. uid comparisons are unnecessary, trust CAP_SYS_ADMIN
alone. However CAP_SYS_RESOURCE is the one that really
means "give me extra resources" so allow for that as
well.
2. Any privileged code should be protected, but uid is not
an indication of privilege. So we should check whether
any capabilities are raised.
3. uid==0 makes processes on the host as well as in containers
more important, so we should keep the existing checks.
4. uid==0 makes processes only on the host more important,
even without any capabilities. So we should be keeping
the (uid==0||euid==0) check but only when
userns==&init_user_ns.

I'm following number 1 here.
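
Option 1 could look roughly like the following. This is a sketch only: the helper names are invented, and the divide-by-four discount is an assumed factor chosen for illustration, not a value stated in the message.

```c
#include <stdint.h>

#define CAP_SYS_ADMIN    21
#define CAP_SYS_RESOURCE 24

/* Test one bit of an effective capability set. */
static int has_cap(uint64_t effective, int cap)
{
    return (int)((effective >> cap) & 1u);
}

/* Privilege is judged by capabilities alone; uid and euid are not consulted. */
static unsigned long adjust_for_privilege(unsigned long points,
                                          uint64_t cap_effective)
{
    if (has_cap(cap_effective, CAP_SYS_ADMIN) ||
        has_cap(cap_effective, CAP_SYS_RESOURCE))
        points /= 4;  /* assumed discount for privileged tasks */
    return points;
}
```

A uid 0 process that has dropped both capabilities gets no discount in this model, which is the behavioral consequence of trusting capabilities alone.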

Signed-off-by: Serge Hallyn
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-02-06 01:44:20 +0800
e338d263a Add 64-bit capability support to the kernel

The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities. If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.
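
The narrowing rule can be sketched as follows. The function names are hypothetical and a capability set is modeled as a single 64-bit integer, which is an assumption about representation rather than the kernel's actual layout.

```c
#include <errno.h>
#include <stdint.h>

/* Narrow a 64-bit capability set for a legacy 32-bit caller. */
static int cap_to_legacy32(uint64_t caps, uint32_t *out)
{
    if (caps >> 32)
        return -ERANGE;  /* a high bit is set; it cannot be represented */
    *out = (uint32_t)caps;
    return 0;
}

/* Widening is always lossless: the upper 32 bits are simply zero. */
static uint64_t cap_from_legacy32(uint32_t legacy)
{
    return legacy;
}
```

Failing the call is the conservative choice here: silently dropping the upper bits would let legacy userspace believe it had seen the full set.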

FWIW libcap-2.00 supports this change (and earlier capability formats)

http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan
Cc: Stephen Smalley
Acked-by: Serge Hallyn
Cc: Chris Wright
Cc: James Morris
Cc: Casey Schaufler
Signed-off-by: Erez Zadok
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morgan
2008-02-06 01:44:20 +0800

26 Jan, 2008

1 commit

fa717060f sched: sched_rt_entity

Move the task_struct members specific to rt scheduling together.
A future optimization could be to put sched_entity and sched_rt_entity
into a union.

Signed-off-by: Peter Zijlstra
CC: Srivatsa Vaddagiri
Signed-off-by: Ingo Molnar

Peter Zijlstra
2008-01-26 04:08:27 +0800

21 Oct, 2007

1 commit

e91a810e8 oom_kill bug

Wrong order of arguments

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2007-10-21 06:04:06 +0800

20 Oct, 2007

4 commits

ba25f9dcc Use helpers to obtain task pid in printks

The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid printed to dmesg; in
this case we may safely use task_pid_nr(). Besides, printks account for
more (much more) than half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:43 +0800
bac0abd61 Isolate some explicit usage of task->tgid

With pid namespaces this field is now dangerous to use explicitly, so hide
it behind the helpers.

Also the pid and pgrp fields of task_struct and signal_struct are to be
deprecated. Unfortunately this patch cannot be sent right now as this
leads to tons of warnings, so start isolating them, and deprecate later.

Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
but Oleg pointed out that in the case of posix cpu timers this is the same, and
thread_group_leader() is preferable.

Signed-off-by: Pavel Emelyanov
Acked-by: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800
7b1915a98 mm/oom_kill.c: Use list_for_each_entry instead of list_for_each

mm/oom_kill.c: Convert list_for_each to list_for_each_entry in
oom_kill_process()

Signed-off-by: Matthias Kaehlcke
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthias Kaehlcke
2007-10-20 02:53:38 +0800
b460cbc58 pid namespaces: define is_global_init() and is_container_init()

is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().

A cgroup init has its tsk->pid == 1.

A global init also has its tsk->pid == 1 and its active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
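
The two checks described above might be modeled like this. The structures are stripped down to the fields the message mentions, and the bodies follow the prose literally; treat them as an illustration rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct task_struct {
    int pid;
};

struct pid_namespace {
    struct task_struct *child_reaper;  /* set once, at boot, to /sbin/init */
};

static struct pid_namespace init_pid_ns;

/* Any init, global or not, has pid 1 in this simplified model. */
static bool is_container_init(const struct task_struct *tsk)
{
    return tsk->pid == 1;
}

/* The global init is identified by pointer identity, not by pid. */
static bool is_global_init(const struct task_struct *tsk)
{
    return tsk == init_pid_ns.child_reaper;
}
```

The pointer comparison avoids any pid namespace lookup, which matches the performance note in the changelog entry below.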

Changelog:

2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().

2.6.21-mm2-pidns2:

- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelianov
Cc: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Herbert Poetzel
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-10-20 02:53:37 +0800

17 Oct, 2007

8 commits

ae74138da oom: convert zone_scan_lock from mutex to spinlock

There's no reason to sleep in try_set_zone_oom() or clear_zonelist_oom() if
the lock can't be acquired; it will be available soon enough once the zonelist
scanning is done. All other threads waiting for the OOM killer are also
contingent on the exiting task being able to acquire the lock in
clear_zonelist_oom() so it doesn't make sense to put it to sleep.

Cc: Andrea Arcangeli
Cc: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800
3ff566963 oom: do not take callback_mutex

Since no task descriptor's 'cpuset' field is dereferenced in the execution of
the OOM killer anymore, it is no longer necessary to take callback_mutex.

[akpm@linux-foundation.org: restore cpuset_lock for other patches]
Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800
bbe373f2c oom: compare cpuset mems_allowed instead of exclusive ancestors

Instead of testing for overlap in the memory nodes of the nearest
exclusive ancestor of both current and the candidate task, it is better to
simply test for intersection between the task's mems_allowed in their task
descriptors. This does not require taking callback_mutex since it is only
used as a hint in the badness scoring.

Tasks that do not have an intersection in their mems_allowed with the current
task are not explicitly restricted from being OOM killed because it is quite
possible that the candidate task has allocated memory there before and has
since changed its mems_allowed.
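
As a sketch of "intersection as a hint": the nodemask is modeled as a 64-bit bitmap, the function names are invented, and the divide-by-eight penalty is an assumed factor for illustration. The point is that a non-overlapping task is made less likely to be chosen, not excluded.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t nodemask_t;  /* assumption: one bit per memory node */

/* True if the two tasks may allocate from at least one common node. */
static bool nodes_intersect(nodemask_t a, nodemask_t b)
{
    return (a & b) != 0;
}

/* Lower the score of a candidate sharing no nodes with current. */
static unsigned long cpuset_hint(unsigned long points,
                                 nodemask_t current_mems,
                                 nodemask_t candidate_mems)
{
    if (!nodes_intersect(current_mems, candidate_mems))
        points /= 8;  /* assumed penalty; the candidate stays killable */
    return points;
}
```

Because the masks are read straight from the two task descriptors, no cpuset hierarchy walk and therefore no callback_mutex is needed in this model.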

Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800
7213f5066 oom: suppress extraneous stack and memory dump

Suppresses the extraneous stack and memory dump when a parallel OOM killing
has been found. There's no need to fill the ring buffer with this information
if it's already been printed and the condition that triggered the previous OOM
killer has not yet been alleviated.

Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800
fe071d7e8 oom: add oom_kill_allocating_task sysctl

Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill
the OOM-triggering task instead of scanning through the tasklist to find a
memory-hogging target. This is helpful for systems with an insanely large
number of tasks where scanning the tasklist significantly degrades
performance.

Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:46 +0800
098d7f128 oom: add per-zone locking

OOM killer synchronization should be done with zone granularity so that memory
policy and cpuset allocations may have their corresponding zones locked and
allow parallel kills for other OOM conditions that may exist elsewhere in the
system. DMA allocations can be targeted at the zone level, which would not be
possible if locking was done in nodes or globally.

Synchronization shall be done with a variation of "trylocks." The goal is to
put the current task to sleep and restart the failed allocation attempt later
if the trylock fails. Otherwise, the OOM killer is invoked.

Each zone in the zonelist that __alloc_pages() was called with is checked for
the newly-introduced ZONE_OOM_LOCKED flag. If any zone has this flag present,
the "trylock" to serialize the OOM killer fails and returns zero. Otherwise,
all the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function
returns non-zero.
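
The all-or-nothing "trylock" can be sketched as below. This is single-threaded illustration code: the real functions would need an outer lock around the scan, the zonelist is modeled as a NULL-terminated array of zone pointers, and the flag value is arbitrary.

```c
#include <stddef.h>

#define ZONE_OOM_LOCKED 0x1u  /* arbitrary bit for this sketch */

struct zone {
    unsigned int flags;
};

/*
 * Returns non-zero after marking every zone, or zero without touching
 * anything if some zone is already claimed by another OOM kill.
 */
static int try_set_zone_oom(struct zone **zones)
{
    for (struct zone **z = zones; *z; z++)
        if ((*z)->flags & ZONE_OOM_LOCKED)
            return 0;

    for (struct zone **z = zones; *z; z++)
        (*z)->flags |= ZONE_OOM_LOCKED;
    return 1;
}

/* Release every zone in the list once the kill has been handled. */
static void clear_zonelist_oom(struct zone **zones)
{
    for (struct zone **z = zones; *z; z++)
        (*z)->flags &= ~ZONE_OOM_LOCKED;
}
```

Two allocations whose zonelists share no zone can both succeed in try_set_zone_oom(), which is what permits the parallel kills mentioned above.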

Cc: Andrea Arcangeli
Cc: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:45 +0800
70e24bdf6 oom: move constraints to enum

The OOM killer's CONSTRAINT definitions are really more appropriate in an
enum, so define them in include/linux/oom.h.

Cc: Andrea Arcangeli
Acked-by: Christoph Lameter
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-10-17 23:42:45 +0800
ee31af5d6 Memoryless nodes: OOM: use N_HIGH_MEMORY map instead of constructing one on the fly

constrained_alloc() builds its own memory map for nodes with memory. We have
that available in N_HIGH_MEMORY now. So simplify the code.

Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-10-17 00:42:58 +0800

01 Aug, 2007

1 commit

a5e58a614 oom: print points as unsigned long

In badness(), the automatic variable 'points' is unsigned long. Print it
as such.

Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-08-01 06:39:36 +0800

30 Jul, 2007

1 commit

4e950f6f0 Remove fs.h from mm.h

Remove fs.h from mm.h. For this,
1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
2) Add back fs.h or less bloated headers (err.h) to files that need it.

As a result, on x86_64 allyesconfig, fs.h dependencies are cut from 3929
rebuilt files down to 3444 (-12.3%).

Cross-compile tested without regressions on my two usual configs and (sigh):

alpha arm-mx1ads mips-bigsur powerpc-ebony
alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
alpha-up arm-netx mips-db1000 powerpc-iseries
arm arm-ns9xxx mips-db1100 powerpc-linkstation
arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
arm-footbridge ia64 mips-pb1500 powerpc-pmac32
arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
arm-integrator ia64-sn2 mips-rbhma4500 s390
arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
arm-iop33x ia64-zx1 mips-sead s390-up
arm-ixp2000 m68k mips-tb0219 sparc
arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
arm-jornada720 m68k-atari mips-workpad sparc-up
arm-kafa m68k-bvme6000 mips-wrppmc sparc64
arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
arm-ks8695 m68k-mac parisc sparc64-defconfig
arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
arm-lpd7a400 m68k-q40 parisc-up x86_64
arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
arm-lusl7200 mips powerpc-celleb x86_64-up
arm-mainstone mips-atlas powerpc-chrp32

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-07-30 08:09:29 +0800

08 May, 2007

3 commits

2b45ab339 oom: fix constraint deadlock

Fixes a deadlock in the OOM killer for allocations that are not
__GFP_HARDWALL.

Before the OOM killer checks for the allocation constraint, it takes
callback_mutex.

constrained_alloc() iterates through each zone in the allocation zonelist
and calls cpuset_zone_allowed_softwall() to determine whether an allocation
for gfp_mask is possible. If a zone's node is not in the OOM-triggering
task's mems_allowed, it is not exiting, and we did not fail on a
__GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
callback_mutex to check the nearest exclusive ancestor of current's cpuset.
This results in deadlock.

We now take callback_mutex after iterating through the zonelist since we
don't need it yet.

Cc: Andi Kleen
Cc: Nick Piggin
Cc: Christoph Lameter
Cc: Martin J. Bligh
Signed-off-by: David Rientjes
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2007-05-08 03:12:55 +0800
2b744c01a mm: fix handling of panic_on_oom when cpusets are in use

The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because other nodes' memory may remain. But some people
want failover by panic ASAP even if they are used. This patch adds a new
setting for that request.

This is tested on my ia64 box which has 3 nodes.

Signed-off-by: Yasunori Goto
Signed-off-by: Benjamin LaHaise
Cc: Christoph Lameter
Cc: Paul Jackson
Cc: Ethan Solomita
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2007-05-08 03:12:55 +0800
9a82782f8 allow oom_adj of saintly processes

If the badness of a process is zero then oom_adj>0 has no effect. This
patch makes sure that the oom_adj shift actually increases badness points
appropriately.
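
The arithmetic problem is that shifting zero left still yields zero. A sketch of the fix, with a hypothetical function name and assuming oom_adj acts as a shift count as the message says:

```c
/* Apply an oom_adj shift to a badness score. */
static unsigned long apply_oom_adj(unsigned long points, int oom_adj)
{
    if (oom_adj > 0) {
        if (!points)
            points = 1;      /* give the shift something to act on */
        points <<= oom_adj;
    } else if (oom_adj < 0) {
        points >>= -oom_adj;
    }
    return points;
}
```

With this, a process whose score is 0 and whose oom_adj is 3 ends up with 8 points instead of remaining at 0.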

Signed-off-by: Joshua N. Pritikin
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joshua N Pritikin
2007-05-08 03:12:51 +0800

24 Apr, 2007

2 commits

3d124cbba fix OOM killing processes wrongly thought MPOL_BIND

I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
constrained_alloc().

Signed-off-by: Hugh Dickins
Acked-by: Christoph Lameter
Acked-by: William Irwin
Acked-by: KAMEZAWA Hiroyuki
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2007-04-24 23:23:07 +0800
650a7c974 oom: kill all threads that share mm with killed task

oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.

(Bug introduced by f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584)
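
The kind of mistake described can be illustrated as follows. This is a simplified model: tasks live in an array, kill_one() stands in for __oom_kill_task(), and the killed field is invented so the effect is observable. The message implies the loop was acting on the originally selected task each time; the sketch shows the intended behavior.

```c
#include <stddef.h>

struct mm_struct {
    int id;
};

struct task_struct {
    struct mm_struct *mm;
    int killed;  /* invented field: records that the task was signalled */
};

/* Stand-in for __oom_kill_task(). */
static void kill_one(struct task_struct *t)
{
    t->killed = 1;
}

/* Kill p, then every other task that shares p's address space. */
static void oom_kill_task(struct task_struct *p,
                          struct task_struct *tasks, size_t n)
{
    struct mm_struct *mm = p->mm;

    kill_one(p);
    for (size_t i = 0; i < n; i++) {
        struct task_struct *q = &tasks[i];

        if (q != p && q->mm == mm)
            kill_one(q);  /* must be q; passing p would re-kill the same task */
    }
}
```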

Acked-by: William Irwin
Acked-by: Christoph Lameter
Cc: Nick Piggin
Cc: Andrew Morton
Cc: Andi Kleen
Signed-off-by: David Rientjes
Signed-off-by: Linus Torvalds

David Rientjes
2007-04-24 23:11:49 +0800

17 Mar, 2007

1 commit

35ae834fa [PATCH] oom fix: prevent oom from killing a process with children/sibling unkillable

Looking at oom_kill.c, found that the intention to not kill the selected
process if any of its children/siblings has OOM_DISABLE set, is not being
met.

Signed-off-by: Ankita Garg
Acked-by: Nick Piggin
Acked-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ankita Garg
2007-03-17 10:25:06 +0800