01 Aug, 2007

1 commit


30 Jul, 2007

1 commit

  • Remove fs.h from mm.h. For this,
    1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
    2) Add back fs.h or less bloated headers (err.h) to files that need it.

    As a result, on x86_64 allyesconfig, the number of files rebuilt when
    fs.h changes is cut from 3929 down to 3444 (-12.3%).

    Cross-compile tested without regressions on my two usual configs and (sigh):

    alpha arm-mx1ads mips-bigsur powerpc-ebony
    alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
    alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
    alpha-up arm-netx mips-db1000 powerpc-iseries
    arm arm-ns9xxx mips-db1100 powerpc-linkstation
    arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
    arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
    arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
    arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
    arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
    arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
    arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
    arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
    arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
    arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
    arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
    arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
    arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
    arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
    arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
    arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
    arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
    arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
    arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
    arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
    arm-footbridge ia64 mips-pb1500 powerpc-pmac32
    arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
    arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
    arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
    arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
    arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
    arm-integrator ia64-sn2 mips-rbhma4500 s390
    arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
    arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
    arm-iop33x ia64-zx1 mips-sead s390-up
    arm-ixp2000 m68k mips-tb0219 sparc
    arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
    arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
    arm-jornada720 m68k-atari mips-workpad sparc-up
    arm-kafa m68k-bvme6000 mips-wrppmc sparc64
    arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
    arm-ks8695 m68k-mac parisc sparc64-defconfig
    arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
    arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
    arm-lpd7a400 m68k-q40 parisc-up x86_64
    arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
    arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
    arm-lusl7200 mips powerpc-celleb x86_64-up
    arm-mainstone mips-atlas powerpc-chrp32

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

08 May, 2007

3 commits

  • Fixes a deadlock in the OOM killer for allocations that are not
    __GFP_HARDWALL.

    The OOM killer previously took callback_mutex before checking for the
    allocation constraint.

    constrained_alloc() iterates through each zone in the allocation zonelist
    and calls cpuset_zone_allowed_softwall() to determine whether an allocation
    for gfp_mask is possible. If a zone's node is not in the OOM-triggering
    task's mems_allowed, it is not exiting, and we did not fail on a
    __GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
    callback_mutex to check the nearest exclusive ancestor of current's cpuset.
    This results in deadlock.

    We now take callback_mutex only after iterating through the zonelist,
    since it isn't needed until then.

    Cc: Andi Kleen
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Martin J. Bligh
    Signed-off-by: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The current panic_on_oom may not trigger if a process uses
    cpusets/mempolicy, because memory may remain on other nodes. But some
    people want immediate failover by panic even in that case. This patch
    adds a new panic_on_oom setting for that purpose.
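
    As merged, the new behavior is selected with value 2 of the existing
    sysctl (a fragment for /etc/sysctl.conf; value per the final patch):

```
# Panic on OOM even when the OOM is constrained by cpusets/mempolicy
vm.panic_on_oom = 2
```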

    This is tested on my ia64 box which has 3 nodes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Benjamin LaHaise
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Cc: Ethan Solomita
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • If the badness of a process is zero then oom_adj>0 has no effect. This
    patch makes sure that the oom_adj shift actually increases badness points
    appropriately.

    Signed-off-by: Joshua N. Pritikin
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joshua N Pritikin
     

24 Apr, 2007

2 commits

  • I only have CONFIG_NUMA=y for build testing, so I was surprised when
    running a memhog to see lots of other processes killed with "No available
    memory (MPOL_BIND)". memhog is killed correctly once we initialize
    nodemask in constrained_alloc().

    Signed-off-by: Hugh Dickins
    Acked-by: Christoph Lameter
    Acked-by: William Irwin
    Acked-by: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
    When it finds other threads that share an mm with that task, it must kill
    each of those threads individually, not the originally selected task again.

    (Bug introduced by f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584)

    Acked-by: William Irwin
    Acked-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Andrew Morton
    Cc: Andi Kleen
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     

17 Mar, 2007

1 commit


06 Jan, 2007

1 commit

  • These days, if you swapoff when there isn't enough memory, the OOM killer
    gives "BUG: scheduling while atomic" and the machine hangs: badness()
    needs to do its PF_SWAPOFF return after the task_unlock (tasklist_lock is
    also held here, so p isn't going to be freed; PF_SWAPOFF might get turned
    off at any moment, but that doesn't really matter).

    Signed-off-by: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

31 Dec, 2006

1 commit

  • constrained_alloc(), which is called to detect what the OOM came from,
    checks the passed zonelist. If the zonelist doesn't include all nodes, it
    assumes the OOM came from a mempolicy.

    But memory-less nodes exist, and a memory-less node's zones are never
    included in the zonelist.

    constrained_alloc() should take memory-less nodes into account; otherwise
    it always concludes 'OOM is from mempolicy', which means the current
    process can be killed at any time. This patch fixes it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Jackson
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

14 Dec, 2006

1 commit

  • Elaborate the API for calling cpuset_zone_allowed(), so that users have to
    explicitly choose between the two variants:

    cpuset_zone_allowed_hardwall()
    cpuset_zone_allowed_softwall()

    Until now, whether or not you got the hardwall flavor depended solely on
    whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
    argument.

    If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
    version.

    Unfortunately, this meant that users would end up with the softwall
    version without thinking about it. Since only the softwall version might
    sleep, this led to bugs with possible sleeping in interrupt context on
    more than one occasion.

    The hardwall version requires that the current task's mems_allowed allows
    the node of the specified zone (or that you're in interrupt, or that
    __GFP_THISNODE is set, or that you're on a one-cpuset system).

    The softwall version, depending on the gfp_mask, might allow a node if it
    was allowed in the nearest enclosing cpuset marked mem_exclusive (which
    requires taking the cpuset lock 'callback_mutex' to evaluate).

    This patch removes the cpuset_zone_allowed() call, and forces the caller to
    explicitly choose between the hardwall and the softwall case.

    If the caller wants the gfp_mask to determine this choice, they should (1)
    be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
    cpuset_zone_allowed_softwall() routine.

    This adds another 100 or 200 bytes to the kernel text space, due to the few
    lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
    routines. It should save a few instructions executed for the calls that
    turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
    set (before the call) then check (within the call) the __GFP_HARDWALL flag.

    For the most critical call, from get_page_from_freelist(), the same
    instructions are executed as before -- the old cpuset_zone_allowed()
    routine it used to call is the same code as the
    cpuset_zone_allowed_softwall() routine that it calls now.

    Not a perfect win, but it seems worth it to reduce the chance of hitting
    a 'sleeping with irqs off' complaint again.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

08 Dec, 2006

3 commits

  • Don't cause all threads in all other thread groups to gain TIF_MEMDIE
    otherwise we'll get a thundering herd eating our memory reserve. This may not
    be the optimal scheme, but it fits our policy of allowing just one TIF_MEMDIE
    in the system at once.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Clean up the OOM killer messages to be more consistent.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Abort the kill if any of our threads have OOM_DISABLE set. Having this
    test here also prevents any OOM_DISABLE child of the "selected" process
    from being killed.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Oct, 2006

1 commit

  • Although mm.h is not an exported header, it does contain one thing
    which is part of the userspace ABI -- the value disabling the OOM killer
    for a given process. So,
    a) create and export include/linux/oom.h,
    b) move the OOM_DISABLE define there,
    c) turn the bounding values of /proc/$PID/oom_adj into defines and export
    them too.

    Note: mass __KERNEL__ removal will be done later.

    Signed-off-by: Alexey Dobriyan
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

30 Sep, 2006

7 commits

  • A previous patch to allow an exiting task to OOM kill itself (and thereby
    avoid a little deadlock) introduced a problem. We don't want the
    PF_EXITING task, even if it is 'current', to access mem reserves if there
    is already a TIF_MEMDIE process in the system sucking up reserves.

    Also make the commenting a little bit clearer, and note that our current
    scheme of effectively single threading the OOM killer is not itself
    perfect.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • - It is not possible to have task->mm == &init_mm.

    - task_lock() buys nothing for the 'if (!p->mm)' check.

    Signed-off-by: Oleg Nesterov
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No logic changes, but imho easier to read.

    Signed-off-by: Oleg Nesterov
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The only one usage of TASK_DEAD outside of last schedule path,

    select_bad_process:

    for_each_task(p) {
            if (!p->mm)
                    continue;
            ...
            if (p->state == TASK_DEAD)
                    continue;
            ...

    The TASK_DEAD state is set at the end of do_exit(); this means that
    p->mm has already been set to NULL by exit_mm(), so this task was already
    rejected by the 'if (!p->mm)' check above.

    Note also that the caller holds tasklist_lock, this means that p can't
    pass exit_notify() and then set TASK_DEAD when p->mm != NULL.

    Also, remove open-coded is_init().

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • I am not sure about this patch, I am asking Ingo to take a decision.

    task_struct->state == EXIT_DEAD is a very special case; to avoid
    confusion it makes sense to introduce a new state, TASK_DEAD, while
    EXIT_DEAD should live only in ->exit_state as documented in sched.h.

    Note that this state is not visible to user-space, get_task_state() masks off
    unsuitable states.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • After the previous change, (->flags & PF_DEAD) <=> (->state == EXIT_DEAD),
    so we don't need PF_DEAD any longer.

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This is an updated version of Eric Biederman's is_init() patch.
    (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
    replaces a few more instances of ->pid == 1 with is_init().

    Further, is_init() checks pid and thus removes dependency on Eric's other
    patches for now.

    Eric's original description:

    There are a lot of places in the kernel where we test for init
    because we give it special properties. Most significantly, init
    must not die. This results in code all over the kernel testing
    ->pid == 1.

    Introduce is_init() to capture this case.

    With multiple pid spaces, in all of the affected cases we are
    looking for only the first process on the system, not some other
    process that has pid == 1.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sukadev Bhattiprolu
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Cc:
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     

26 Sep, 2006

9 commits

  • There are many places where we need to determine the node of a zone.
    Currently we use a difficult-to-read sequence of pointer dereferences.
    Put that into an inline function and use it throughout the VM. Maybe we
    can find a way to optimize the lookup in the future.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Update the comments for __oom_kill_task() to reflect the code changes.

    Signed-off-by: Ram Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ram Gupta
     
  • Print the name of the task invoking the OOM killer. Could make debugging
    easier.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Skip kernel threads, rather than having them return 0 from badness().
    Theoretically, badness() might truncate all results to 0, so a kernel
    thread might be picked first, causing an infinite loop.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • PF_SWAPOFF processes currently cause select_bad_process() to return
    straight away. Instead, give them a high priority, so we will kill them
    first; however, we also first ensure no parallel OOM kills are happening
    at the same time.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Having the oomkilladj == OOM_DISABLE check before the releasing check
    means that exiting tasks with oomkilladj == OOM_DISABLE will not stop the
    OOM killer.

    Moving the test down will give the desired behaviour. Also: it will allow
    them to "OOM-kill" themselves if they are exiting. As per the previous patch,
    this is required to prevent OOM killer deadlocks (and they don't actually get
    killed, because they're already exiting -- they're simply allowed access to
    memory reserves).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If current *is* exiting, it should actually be allowed to access reserved
    memory rather than OOM kill something else. Can't do this via a straight
    check in page_alloc.c because that would allow multiple tasks to use up
    reserves. Instead cause current to OOM-kill itself which will mark it as
    TIF_MEMDIE.

    The current procedure of simply aborting the OOM-kill if a task is exiting can
    lead to OOM deadlocks.

    In the case of killing a PF_EXITING task, don't make a lot of noise about it.
    This becomes more important in future patches, where we can "kill" OOM_DISABLE
    tasks.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • cpuset_excl_nodes_overlap() does not always indicate that killing a task
    will not free any memory for us. For example, we may be asking for an
    allocation from _anywhere_ in the machine, or the task in question may be
    pinning memory that is outside its cpuset. Fix this by just having
    cpuset_excl_nodes_overlap reduce the badness rather than disallow the kill.

    Signed-off-by: Nick Piggin
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add a notifier chain to the out-of-memory killer. If one of the
    registered callbacks can release some memory, do not kill the process;
    instead return and retry the allocation that forced the OOM killer to run.

    The purpose of the notifier is to add a safety net in the presence of
    memory ballooners. If the resource manager inflated the balloon to a size
    where memory allocations can not be satisfied anymore, it is better to
    deflate the balloon a bit instead of killing processes.

    The implementation for the s390 ballooner is included.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

04 Jul, 2006

1 commit


23 Jun, 2006

2 commits

  • This fixes a few typos in the comments in mm/oom_kill.c.

    Signed-off-by: David S. Peterson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Peterson
     
  • This patch adds a panic_on_oom sysctl under sys.vm.

    When sysctl vm.panic_on_oom = 1, the kernel panics instead of killing
    rogue processes. And if vm.panic_on_oom is 0, the kernel will do
    oom_kill() in the same way as it does today. Of course, the default value
    is 0 and only root can modify it.

    In general, the oom_killer works well and kills rogue processes, so the
    whole system can survive. But there are environments where a panic is
    preferable to killing some processes.
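
    The knob can be set at runtime with sysctl or persisted in
    /etc/sysctl.conf:

```
# 1 = panic on OOM; 0 (default) = run the OOM killer as usual
vm.panic_on_oom = 1
```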

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

20 Apr, 2006

2 commits


03 Mar, 2006

1 commit

  • I seem to have lost this read_unlock().

    While we're there, let's turn that interruptible sleep into an
    uninterruptible one, so we don't get a busywait if signal_pending().
    (Again. We seem to have a habit of doing this.)

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Mar, 2006

1 commit


21 Feb, 2006

2 commits

  • Some allocations are restricted to a limited set of nodes (due to memory
    policies or cpuset constraints). If the page allocator is not able to find
    enough memory then that does not mean that overall system memory is low.

    In particular, going postal and more or less randomly shooting at
    processes is not likely to help the situation but may just lead to
    suicide (the whole system coming down).

    It is better to signal to the process that no memory exists given the
    constraints that the process (or the configuration of the process) has
    placed on the allocation behavior. The process may be killed but then the
    sysadmin or developer can investigate the situation. The solution is
    similar to what we do when running out of hugepages.

    This patch adds a check before we kill processes. At that point
    performance considerations do not matter much so we just scan the zonelist
    and reconstruct a list of nodes. If the list of nodes does not contain all
    online nodes then this is a constrained allocation and we should kill the
    current process.

    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • In the badness() calculation, there's currently this piece of code:

    /*
     * Processes which fork a lot of child processes are likely
     * a good choice. We add the vmsize of the children if they
     * have an own mm. This prevents forking servers to flood the
     * machine with an endless amount of children
     */
    list_for_each(tsk, &p->children) {
            struct task_struct *chld;
            chld = list_entry(tsk, struct task_struct, sibling);
            if (chld->mm != p->mm && chld->mm)
                    points += chld->mm->total_vm;
    }

    The intention is clear: If some server (apache) keeps spawning new children
    and we run OOM, we want to kill the father rather than picking a child.

    This -- to some degree -- also helps a bit with getting fork bombs under
    control, though I'd consider this a desirable side-effect rather than a
    feature.

    There's one problem with this: No matter how many or few children there are,
    if just one of them misbehaves, and all others (including the father) do
    everything right, we still always kill the whole family. This hits in real
    life; whether it's javascript in konqueror resulting in kdeinit (and thus the
    whole KDE session) being hit or just a classical server that spawns children.

    Sidenote: The killer does kill all direct children as well, not only the
    selected father, see oom_kill_process().

    The idea in attached patch is that we do want to account the memory
    consumption of the (direct) children to the father -- however not fully.
    This maintains the property that fathers with too many children will still
    very likely be picked, whereas a single misbehaving child has the chance to
    be picked by the OOM killer.

    In the patch I account only half (rounded up) of the children's vm_size to
    the parent. This means that if one child eats more mem than the rest of
    the family, it will be picked, otherwise it's still the father and thus the
    whole family that gets selected.

    This is heuristics -- we could debate whether accounting for a fourth
    would be better than for half of it. Or -- if people would consider it
    worth the trouble -- make it a sysctl. For now I stuck to accounting for
    half, which should IMHO be a significant improvement.

    The patch does one more thing: as users tend to be irritated by the
    choice of killed processes (mainly because the children are killed first,
    despite some of them having a very low OOM score), I added some more
    output: the selected (father) process is reported first and its oom_score
    is printed to syslog.

    Description:

    Only account for half of children's vm size in oom score calculation

    This should still give the parent enough points in case of fork bombs.
    If any child, however, has more than 50% of the vm size of all children
    together, it'll get a higher score and be selected.

    This patch also makes the kernel display the oom_score.

    Signed-off-by: Kurt Garloff
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kurt Garloff