29 Oct, 2006

2 commits

  • If try_to_free_pages / balance_pgdat are called with a gfp_mask specifying
    GFP_IO and/or GFP_FS, they will reclaim the requisite number of pages and
    then reset prev_priority to DEF_PRIORITY (or to some other high (ie:
    unurgent) value).

    However, another reclaimer without those gfp_mask flags set (say, GFP_NOIO)
    may still be struggling to reclaim pages. The concurrent overwrite of
    zone->prev_priority will cause this GFP_NOIO thread to unexpectedly cease
    deactivating mapped pages, thus causing reclaim difficulties.

    The fix is to key the distress calculation not off zone->prev_priority
    alone, but to also take the local caller's priority into account, by using
    min(zone->prev_priority, sc->priority).
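
    A minimal sketch of the change to the distress calculation, assuming the
    caller's priority is reachable as sc->priority (illustrative, not the
    verbatim patch):

        /*
         * distress is 0..100, higher meaning more urgent.  Key it off the
         * more urgent (numerically smaller) of the zone-wide record and the
         * local caller's priority, so a concurrent reset of
         * zone->prev_priority cannot mask this caller's memory pressure.
         */
        distress = 100 >> min(zone->prev_priority, sc->priority);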

    Signed-off-by: Martin J. Bligh
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Bligh
     
  • The temp_priority field in zone is racy, as we can walk through a reclaim
    path, and just before we copy it into prev_priority, it can be overwritten
    (say with DEF_PRIORITY) by another reclaimer.

    The same bug is contained in both try_to_free_pages and balance_pgdat, but
    it is fixed slightly differently. In balance_pgdat, we keep a separate
    priority record per zone in a local array. In try_to_free_pages there is
    no need to do this, as the priority level is the same for all zones that we
    reclaim from.

    The impact of this bug is that the temp_priority copied into
    prev_priority can be artificially high, which causes reclaimers to set
    distress artificially low. They then fail to reclaim mapped pages when
    they are, in fact, under severe memory pressure (their priority may be as
    low as 0). This causes the OOM killer to fire incorrectly.

    From: Andrew Morton

    __zone_reclaim() isn't modifying zone->prev_priority. But zone->prev_priority
    is used in the decision whether or not to bring mapped pages onto the inactive
    list. Hence there's a risk here that __zone_reclaim() will fail because
    zone->prev_priority is large (ie: low urgency) and lots of mapped pages end up
    stuck on the active list.

    Fix that up by decreasing (ie making more urgent) zone->prev_priority as
    __zone_reclaim() scans the zone's pages.

    This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created. It should be
    possible to remove that now, and to just start out at DEF_PRIORITY?
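
    A sketch of a helper that records the scanning priority without ever
    making it less urgent, in the spirit of the fix (illustrative):

        /*
         * Only lower (make more urgent) the recorded priority here; raising
         * it back up is left to the end of a successful reclaim pass.
         */
        static void note_zone_scanning_priority(struct zone *zone, int priority)
        {
                if (priority < zone->prev_priority)
                        zone->prev_priority = priority;
        }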

    Cc: Nick Piggin
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Bligh
     

21 Oct, 2006

1 commit

  • Separate out the concept of "queue congestion" from "backing-dev congestion".
    Congestion is a backing-dev concept, not a queue concept.

    The blk_* congestion functions are retained, as wrappers around the core
    backing-dev congestion functions.

    This proper layering is needed so that NFS can cleanly use the congestion
    functions, and so that CONFIG_BLOCK=n actually links.
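
    A sketch of the layering, assuming 2.6.19-era names (the retained blk_*
    entry points become thin wrappers; illustrative):

        /* congestion state lives in the backing_dev_info... */
        void set_bdi_congested(struct backing_dev_info *bdi, int rw);
        void clear_bdi_congested(struct backing_dev_info *bdi, int rw);

        /* ...and the block layer merely forwards to it */
        static inline void blk_set_queue_congested(struct request_queue *q, int rw)
        {
                set_bdi_congested(&q->backing_dev_info, rw);
        }

        static inline void blk_clear_queue_congested(struct request_queue *q, int rw)
        {
                clear_bdi_congested(&q->backing_dev_info, rw);
        }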

    Cc: "Thomas Maier"
    Cc: "Jens Axboe"
    Cc: Trond Myklebust
    Cc: David Howells
    Cc: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Oct, 2006

1 commit

  • If remove_mapping() failed to remove the page from its mapping, don't go and
    mark it not uptodate! Makes kernel go dead.

    (Actually, I don't think the ClearPageUptodate is needed there at all).

    Says Nick Piggin:

    "Right, it isn't needed because at this point the page is guaranteed
    by remove_mapping to have no references (except us) and cannot pick
    up any new ones because it is removed from pagecache.

    We can delete it."

    Signed-off-by: Andrew Morton
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

27 Sep, 2006

2 commits

  • Clean up the invalidate code, and use a common function to safely remove
    the page from pagecache.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The VM is supposed to minimise the number of pages which get written off the
    LRU (for IO scheduling efficiency, and for high reclaim-success rates). But
    we don't actually have a clear way of showing how true this is.

    So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of
    pages which have been written by the vm scanner in this zone and globally.
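
    The accounting reduces to a per-zone counter bump at the point where the
    scanner issues writeback; a sketch (illustrative):

        /* in pageout(), once the scanner has handed the page to ->writepage */
        inc_zone_page_state(page, NR_VMSCAN_WRITE);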

    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

26 Sep, 2006

9 commits

  • There are many places where we need to determine the node of a zone.
    Currently we use a difficult-to-read sequence of pointer dereferences.
    Put that into an inline function and use it throughout the VM. Maybe we
    can find a way to optimize the lookup in the future.
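
    The helper simply wraps the dereference chain; a sketch following the
    commit (illustrative):

        static inline int zone_to_nid(struct zone *zone)
        {
                return zone->zone_pgdat->node_id;
        }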

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Minor performance fix.

    If we reclaimed enough slab pages from a zone then we can avoid going off
    node with the current allocation. Take care of updating nr_reclaimed when
    reclaiming from the slab.
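
    A sketch of the accounting, assuming 2.6.19-era counters (illustrative):

        unsigned long slab_reclaimable;

        slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
        shrink_slab(sc.nr_scanned, gfp_mask, order);
        /* credit the zone-local slab pages freed toward this allocation */
        nr_reclaimed += slab_reclaimable -
                        zone_page_state(zone, NR_SLAB_RECLAIMABLE);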

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently one can enable slab reclaim by setting an explicit option in
    /proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
    option if the freeing of unmapped file backed pages is not enough to
    allow a local allocation.

    However, that means that the slab can grow excessively and that most memory
    of a node may be used by slabs. We have had a case where a machine with
    46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
    dealing with pagecache pages. However, slab reclaim was only done during
    global reclaim (which is a bit rare on NUMA systems).

    This patch implements slab reclaim during zone reclaim. Zone reclaim
    occurs if there is a danger of an off node allocation. At that point we

    1. Shrink the per node page cache if the number of pagecache
    pages is more than min_unmapped_ratio percent of pages in a zone.

    2. Shrink the slab cache if the number of the node's reclaimable slab
    pages (this patch depends on an earlier one that implements that counter)
    is more than min_slab_ratio (a new /proc/sys/vm tunable).

    The shrinking of the slab cache is a bit problematic since it is not node
    specific. So we simply calculate what point in the slab we want to reach
    (current per node slab use minus the number of pages that need to be
    allocated) and then repeatedly run the global reclaim until that is
    unsuccessful or we have reached the limit. I hope we will have zone based
    slab reclaim at some point which will make that easier.

    The default for min_slab_ratio is 5%.

    Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
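
    A sketch of the repeated global shrink described above, assuming
    2.6.19-era names (illustrative):

        if (zone_page_state(zone, NR_SLAB_RECLAIMABLE) > zone->min_slab_pages) {
                /* target: current per-node slab use minus the pages we need */
                unsigned long limit = zone_page_state(zone,
                                NR_SLAB_RECLAIMABLE) - nr_pages;

                /* shake the (global) slab until it fails or we hit the target */
                while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
                       zone_page_state(zone, NR_SLAB_RECLAIMABLE) > limit)
                        ;
        }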

    [akpm@osdl.org: cleanups]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the atomic counter for slab_reclaim_pages and replace the counter
    and NR_SLAB with two ZVC counters that account for unreclaimable and
    reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

    Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
    intent seems to be to check for slab pages that could be freed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • *_pages is a better description of the role of the variable.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Potentially it takes several scans of the lru lists before we can even start
    reclaiming pages.

    Mapped pages with young ptes can take two passes on the active list plus
    one on the inactive list, but reclaim_mapped may not always kick in
    instantly, so it could take even more than that.

    Raise the threshold for marking a zone as all_unreclaimable from a factor
    of 4 times the pages in the zone to a factor of 6. Introduce a mechanism
    to force reclaim_mapped if we've reached a factor of 3 and still haven't
    made progress.
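
    A sketch of the two thresholds, with the factors as described above
    (illustrative):

        /* a zone is written off only after 6x its pages have been scanned */
        if (zone->pages_scanned >= (zone->nr_active + zone->nr_inactive) * 6)
                zone->all_unreclaimable = 1;

        /* ...but reclaim_mapped is forced once we pass a factor of 3 */
        static inline int zone_is_near_oom(struct zone *zone)
        {
                return zone->pages_scanned >=
                        (zone->nr_active + zone->nr_inactive) * 3;
        }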

    Previously, a customer doing stress testing was able to easily OOM the box
    after using only a small fraction of its swap (~100MB). After the patches, it
    would only OOM after having used up all swap (~800MB).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • __alloc_pages currently starts the OOM killer shooting if page reclaim
    has failed to free up swap_cluster_max pages in one run through the
    priorities. This is not always
    a good indicator on its own, so make use of the all_unreclaimable logic as
    well: don't consider going OOM until all zones we're interested in are
    unreclaimable.
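
    A sketch of the resulting test in try_to_free_pages(), assuming
    sc.all_unreclaimable is cleared whenever a zone of interest is still
    reclaimable (illustrative):

        out:
                /* the final pass still had somewhere left to scan, so do
                   not report total failure (and so do not trigger OOM) */
                if (!sc.all_unreclaimable)
                        ret = 1;
                return ret;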

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Some users of remove_mapping had been unsafe.

    Modify the remove_mapping precondition to ensure the caller has locked the
    page and obtained the correct mapping. Modify callers to ensure the
    mapping is the correct one.
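
    A sketch of the tightened precondition at the top of remove_mapping()
    (illustrative):

        /* callers must hold the page lock and pass the page's own mapping */
        BUG_ON(!PageLocked(page));
        BUG_ON(mapping != page_mapping(page));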

    [hugh@veritas.com: swapper_space fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this
    in the lightweight, inline refcounting functions; PageLRU and PageActive
    checks in vmscan, because they're pretty well confined to vmscan. And in
    page allocate/free fastpaths which can be the hottest parts of the kernel
    for kbuilds.

    Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
    side-effects, and should not be used outside core mm code.
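
    The definition is a thin conditional wrapper, which is why statements
    with side-effects must not go inside it:

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON(cond) BUG_ON(cond)
        #else
        #define VM_BUG_ON(cond) do { } while (0)
        #endif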

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

04 Jul, 2006

1 commit

  • It turns out that it is advantageous to leave a small portion of unmapped file
    backed pages if all of a zone's pages (or almost all pages) are allocated and
    so the page allocator has to go off-node.

    This allows recently used file I/O buffers to stay on the node and
    reduces the times that zone reclaim is invoked if file I/O occurs
    when we run out of memory in a zone.

    The problem is that zone reclaim runs too frequently when the page cache
    is used for file I/O (read/write, and therefore unmapped, pages) alone
    and almost all pages of the zone are allocated. Zone reclaim may remove
    32 unmapped pages; file I/O then uses those pages for the next read/write
    requests and the number of unmapped pages grows again. After the zone has
    filled up, zone reclaim fires again and removes only another 32 pages.
    This cycle is too inefficient and there are potentially too many zone
    reclaim cycles.

    With the 1% boundary we may still remove all unmapped pages for file I/O
    in a zone reclaim pass. However, it will take a large number of reads and
    writes to get back up to 1%, where zone reclaim triggers again.

    Zone reclaim in 2.6.16/17 does not show this behavior because it has a 30
    second timeout.
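
    A sketch of the resulting check in zone_reclaim(), with the boundary kept
    in zone->min_unmapped_ratio (names assumed; illustrative):

        /* skip zone reclaim while unmapped pagecache stays under the floor */
        if (zone_page_state(zone, NR_FILE_PAGES) -
            zone_page_state(zone, NR_FILE_MAPPED) <= zone->min_unmapped_ratio)
                return 0;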

    [akpm@osdl.org: rename the /proc file and the variable]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Jul, 2006

6 commits

  • The remaining counters in page_state after the zoned VM counter patches
    have been applied are all just for show in /proc/vmstat. They have no
    essential function for the VM.

    We use a simple increment of per cpu variables. In order to avoid the most
    severe races we disable preempt. Preempt does not prevent the race between
    an increment and an interrupt handler incrementing the same statistics
    counter. However, that race is exceedingly rare: we may lose only an
    increment or so, and there is no requirement (at least not in the kernel)
    that the vm event counters be accurate.

    In the non preempt case this results in a simple increment for each
    counter. For many architectures this will be reduced by the compiler to a
    single instruction. This single instruction is atomic for i386 and x86_64.
    And therefore even the rare race condition in an interrupt is avoided for
    both architectures in most cases.
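
    A sketch of the increment path, following the patchset's vmstat.h
    (illustrative):

        /* preempt-safe variant: disable preemption around the increment */
        static inline void count_vm_event(enum vm_event_item item)
        {
                get_cpu_var(vm_event_states).event[item]++;
                put_cpu();
        }

        /* for callers that are already non-preemptible */
        static inline void __count_vm_event(enum vm_event_item item)
        {
                __get_cpu_var(vm_event_states).event[item]++;
        }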

    The patchset also adds an off switch for embedded systems that allows
    building Linux kernels without these counters.

    The implementation of these counters is through inline code that hopefully
    results in only a single increment instruction being emitted (i386,
    x86_64), or in the increment being hidden through instruction-level
    concurrency (EPIC architectures such as ia64 can get that done).

    Benefits:
    - VM event counter operations usually reduce to a single inline instruction
    on i386 and x86_64.
    - No interrupt disable, only preempt disable for the preempt case.
    Preempt disable can also be avoided by moving the counter into a spinlock.
    - Handling is similar to zoned VM counters.
    - Simple and easily extendable.
    - Can be omitted on embedded systems to reduce memory use.

    References:

    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
    local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
    V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • - Allows reclaim to access the counter without looping over the
    per-processor counts.

    - Allows accurate statistics on how many pages are used in a zone by
    the slab. This may become useful to balance slab allocations over
    various zones.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The zone_reclaim_interval was necessary because we were not able to determine
    how many unmapped pages exist in a zone. Therefore we had to scan in
    intervals to figure out if any pages were unmapped.

    With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
    pages and the number of mapped pages in a zone. So we can simply skip the
    reclaim if there is an insufficient number of unmapped pages. We use
    SWAP_CLUSTER_MAX as the boundary.

    Drop all support for /proc/sys/vm/zone_reclaim_interval.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
    calculation as the number of mapped pagecache pages. However, that is not
    true. NR_FILE_MAPPED includes the mapped anonymous pages. This patch
    separates those and therefore allows an accurate tracking of the anonymous
    pages per zone.

    It then becomes possible to determine the number of unmapped pages per zone
    and we can avoid scanning for unmapped pages if there are none.

    Also it may now be possible to determine the mapped/unmapped ratio in
    get_dirty_limit. Isn't the number of anonymous pages irrelevant in that
    calculation?

    Note that this will change the meaning of the number of mapped pages
    reported in /proc/vmstat, /proc/meminfo and in the per node statistics.
    This may affect user space tools that monitor these counters!
    NR_FILE_MAPPED works like NR_FILE_DIRTY: it is only valid for pagecache
    pages.

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We can now access the number of pages in a mapped state in an inexpensive way
    in shrink_active_list. So drop the nr_mapped field from scan_control.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • nr_mapped is important because it lets us determine how many pages of a
    zone are not mapped, which in turn allows a more efficient decision about
    when we need to reclaim memory in a zone.

    We take the nr_mapped field out of the page state structure and define a new
    per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
    from NR_MAPPED in the next patch).

    We replace the use of nr_mapped in various kernel locations. This avoids
    looping over all processors in try_to_free_pages(), writeback and reclaim
    (swap + zone reclaim).

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

28 Jun, 2006

2 commits

  • In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
    band-aid solution to solve that problem. In the process, I undid all the
    changes you both were making to ensure that these notifiers were available
    only at init time (unless CONFIG_HOTPLUG_CPU is defined).

    We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
    XFS problem cleanly and makes the cpu notifiers available only at init time
    (unless CONFIG_HOTPLUG_CPU is defined).

    If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
    time.

    This patch reverts the notifier_call changes made in 2.6.17.

    Signed-off-by: Chandra Seetharaman
    Cc: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     
  • When a node is hot-added, kswapd for that node should start. This exports
    the kswapd start function as kswapd_run(), for use from add_memory().
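
    A sketch of the exported start function, using the kthread API as the
    cleanup note below suggests (illustrative):

        void kswapd_run(int nid)
        {
                pg_data_t *pgdat = NODE_DATA(nid);

                if (pgdat->kswapd)
                        return;
                pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
                if (IS_ERR(pgdat->kswapd)) {
                        /* failure to start kswapd at boot time is fatal */
                        BUG_ON(system_state == SYSTEM_BOOTING);
                        printk("Failed to start kswapd on node %d\n", nid);
                        pgdat->kswapd = NULL;
                }
        }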

    [akpm@osdl.org: daemonize() isn't needed when using the kthread API]
    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

23 Jun, 2006

4 commits

  • Initialise total_memory earlier in boot, because if for some reason we
    run page reclaim early in boot, we don't want total_memory to be zero
    when we use it as a divisor.

    And rename total_memory to vm_total_pages to avoid naming clashes with
    architectures.

    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This implements the use of migration entries to preserve ptes of file
    backed pages during migration. Processes can therefore be migrated back
    and forth without losing their connection to pagecache pages.

    Note that we implement the migration entries only for linear mappings.
    Nonlinear mappings still require the unmapping of the ptes for migration.

    And another writepage() ugliness shows up. writepage() can drop the page
    lock. Therefore we have to remove migration ptes before calling writepages()
    in order to avoid having migration entries point to unlocked pages.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • When a writeback_control's `start' and `end' fields are used to
    indicate a one-byte-range starting at file offset zero, the required
    values of .start=0,.end=0 mean that the ->writepages() implementation
    has no way of telling that it is being asked to perform a range
    request. Because we're currently overloading (start == 0 && end == 0)
    to mean "this is not a write-a-range request".

    To make all this sane, the patch changes the range interface of
    writeback_control: a caller of ->writepages() now always sets a range,
    either range_start/range_end or range_cyclic.

    If range_cyclic is true, ->writepages() treats the range as cyclic;
    otherwise it just uses range_start and range_end.

    This patch does,

    - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h.
      -1 is usually OK for range_end (its type is long long). But if someone
      did

          range_end += val;              /* range_end becomes "val - 1" */
          u64val = range_end >> bits;    /* u64val becomes "~(0ULL)"    */

      or something similar, they would be wrong. So this adds LLONG_MAX to
      avoid such nastiness, and uses LLONG_MAX for range_end.

    - All callers of ->writepages() set range_start/end or range_cyclic.

    - Fix updates of ->writeback_index. It was already a bit strange: if
      writeback starts at 0 and is stopped by the nr_to_write check, storing
      that last index may reduce the chance of ever scanning the end of the
      file. So this updates ->writeback_index only if range_cyclic is true or
      the whole file has been scanned.
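
    A sketch of a whole-file call under the new convention (illustrative):

        struct writeback_control wbc = {
                .sync_mode    = WB_SYNC_ALL,
                .range_start  = 0,
                .range_end    = LLONG_MAX,   /* not -1: see above */
                .range_cyclic = 0,
        };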

    Signed-off-by: OGAWA Hirofumi
    Cc: Nathan Scott
    Cc: Anton Altaparmakov
    Cc: Steven French
    Cc: "Vladimir V. Saveliev"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Rework the swsusp's memory shrinker in the following way:

    - Simplify balance_pgdat() by removing all of the swsusp-related code
    from it.

    - Make shrink_all_memory() use shrink_slab() and a new function
    shrink_all_zones() which calls shrink_active_list() and
    shrink_inactive_list() directly for each zone in a way that's optimized
    for suspend.

    In shrink_all_memory() we try to free exactly as many pages as the caller
    asks for, preferably in one shot, starting from easier targets. If slab
    caches are huge, they are most likely to have enough pages to reclaim.
    The inactive lists are next (the zones with more inactive pages go
    first), etc.

    Each time shrink_all_memory() attempts to shrink the active and inactive
    lists for each zone in 5 passes.  In the first pass, only the inactive
    lists are taken into consideration.  In the next two passes the active
    lists are also shrunk, but mapped pages are not reclaimed.  In the last
    two passes the active and inactive lists are shrunk and mapped pages are
    reclaimed as well. The aim of this is to alter the reclaim logic to choose
    the best pages to keep on resume and improve the responsiveness of the
    resumed system.
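
    A condensed sketch of the pass structure, per the description above
    (illustrative):

        for (pass = 0; pass < 5; pass++) {
                /* only the last two passes may reclaim mapped pages */
                if (pass > 2)
                        sc.may_swap = 1;

                for (prio = DEF_PRIORITY; prio >= 0; prio--) {
                        ret += shrink_all_zones(nr_pages - ret, prio,
                                                pass, &sc);
                        if (ret >= nr_pages)
                                goto out;
                }
        }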

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Con Kolivas
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

12 Jun, 2006

1 commit


26 Apr, 2006

1 commit


28 Mar, 2006

1 commit


26 Mar, 2006

1 commit

  • A couple of places are forgetting to take it.

    The kswapd case is probably unimportant. keventd_create_kthread() was racy.

    The whole thing is a bit flakey: you start a kernel thread, get its pid from
    kernel_thread() then look up its task_struct.

    a) It assumes that pid recycling takes a "long" time.

    b) We get a task_struct but no reference was taken on it. The owner of the
    kswapd and kthread task_struct*'s must assume that the new thread won't
    exit unexpectedly. Because if it does, they're left holding dead memory
    and any attempt to control or stop that task will crash.

    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

22 Mar, 2006

8 commits

  • Centralize the page migration functions in anticipation of additional
    tinkering. Creates a new file mm/migrate.c

    1. Extract buffer_migrate_page() from fs/buffer.c

    2. Extract central migration code from vmscan.c

    3. Extract some components from mempolicy.c

    4. Export pageout() and remove_from_swap() from vmscan.c

    5. Make it possible to configure NUMA systems without page migration
    and non-NUMA systems with page migration.

    I had to do some #ifdeffing in mempolicy.c that may need a cleanup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Make shrink_all_memory() repeat the attempts to free more memory if there
    seem to be no pages to free.

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • As suggested by Marcelo:

    1. The optimization introduced recently for not calling
    page_referenced() during zone reclaim makes two additional checks in
    shrink_list unnecessary.

    2. The if (unlikely(sc->may_swap)) in refill_inactive_zone is optimized
    for the zone_reclaim case. However, most people's systems only do swap.
    Undo that.

    Signed-off-by: Christoph Lameter
    Cc: Marcelo Tosatti
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove __put_page from outside the core mm/. It is dangerous because it does
    not handle compound pages nicely, and misses 1->0 transitions. If a user
    later appears that really needs the extra speed we can reevaluate.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • In shrink_inactive_list(), nr_scan is not accounted when nr_taken is 0.
    But 0 pages taken does not mean 0 pages scanned.

    Move the goto statement below the accounting code to fix it.
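
    A sketch of the reordering (illustrative):

        /* account the scan even when the batch came back empty */
        zone->pages_scanned += nr_scan;
        if (nr_taken == 0)
                goto done;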

    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • In isolate_lru_pages(), *scanned reports one more scan because the scan
    counter is increased one more time on exit of the while-loop.

    Change the while-loop to for-loop to fix it.
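
    A sketch of the loop shape after the change (illustrative):

        /* scan is no longer bumped one extra time on loop exit */
        for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
                /* isolate one page from src */
        }
        *scanned = scan;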

    Signed-off-by: Nick Piggin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Add some comments to explain how zone reclaim works, and fix the
    following issues:

    - PF_SWAPWRITE needs to be set for RECLAIM_SWAP to be able to write
    out pages to swap. Currently RECLAIM_SWAP may not do that.

    - Remove the setting of nr_reclaimed pages after slab reclaim, since the
    slab shrinking code does not use it and the nr_reclaimed count is already
    right for the intended follow-up action.
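
    A sketch of the PF_SWAPWRITE part of the fix (illustrative):

        struct task_struct *p = current;

        /* RECLAIM_SWAP can only write pages to swap with PF_SWAPWRITE set */
        p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
        /* ... run the actual reclaim ... */
        p->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);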

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We have:

    try_to_free_pages
      ->shrink_caches(struct zone **zones, ..)
        ->shrink_zone(struct zone *, ...)
          ->shrink_cache(struct zone *, ...)
            ->shrink_list(struct list_head *, ...)
          ->refill_inactive_zone(struct zone *, ...)

    which is fairly irrational.

    Rename things so that we have

    try_to_free_pages
      ->shrink_zones(struct zone **zones, ..)
        ->shrink_zone(struct zone *, ...)
          ->shrink_inactive_list(struct zone *, ...)
            ->shrink_page_list(struct list_head *, ...)
          ->shrink_active_list(struct zone *, ...)

    Cc: Nick Piggin
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton