19 Jan, 2006

1 commit

  • Some bits for zone reclaim exist in 2.6.15, but they are not usable. This
    patch fixes them up, removes unused code and makes zone reclaim usable.

    Zone reclaim allows reclaiming pages from a zone if the number of free
    pages falls below the watermarks, even if other zones still have enough
    pages available. Zone reclaim is of particular importance for NUMA
    machines: it can be more beneficial to reclaim a page than to take the
    performance penalty of allocating a page on a remote zone.

    Zone reclaim is enabled if the maximum distance to another node is higher
    than RECLAIM_DISTANCE, which may be defined by an arch. By default
    RECLAIM_DISTANCE is 20, which on IA64 is the distance to another node in
    the same component (enclosure or motherboard). The meaning of the NUMA
    distance information seems to vary by arch; a simplified sketch of the
    check follows this entry.

    If zone reclaim is not successful then no further reclaim attempts will
    occur for a certain time period (ZONE_RECLAIM_INTERVAL).

    This patch was discussed before. See

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
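
    The enabling check described above can be pictured with a small,
    self-contained C sketch (the node count, distance table and helper names
    are made up for the example; this is not the kernel's implementation):

        #include <stdio.h>

        #define RECLAIM_DISTANCE 20     /* default; an arch may define its own */

        /* Hypothetical 4-node distance table: nodes 0/1 and 2/3 each share an
         * enclosure (distance 20); the two pairs are further apart (40). */
        static const int node_distance[4][4] = {
            { 10, 20, 40, 40 },
            { 20, 10, 40, 40 },
            { 40, 40, 10, 20 },
            { 40, 40, 20, 10 },
        };

        /* Zone reclaim is worthwhile for a node whose most distant peer lies
         * beyond RECLAIM_DISTANCE. */
        static int zone_reclaim_enabled(int node)
        {
            int n, max = 0;

            for (n = 0; n < 4; n++)
                if (node_distance[node][n] > max)
                    max = node_distance[node][n];
            return max > RECLAIM_DISTANCE;
        }

        int main(void)
        {
            printf("node 0: zone reclaim %s\n",
                   zone_reclaim_enabled(0) ? "enabled" : "disabled");
            return 0;
        }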
     

17 Jan, 2006

1 commit

  • Add __meminit to the __init lineup to ensure functions default
    to __init when memory hotplug is not enabled. Replace __devinit
    with __meminit on functions that were changed when the memory
    hotplug code was introduced.

    Signed-off-by: Matt Tolentino
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Matt Tolentino
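
    A rough sketch of what the annotation amounts to (simplified stand-ins;
    the real definitions live in the kernel's init headers and are more
    involved): without CONFIG_MEMORY_HOTPLUG the function is boot-only and
    can be discarded after init, with it the function has to stay resident.

        /* Illustrative only -- placeholders for the kernel's annotations. */
        #define __init          /* kernel: put the code in a discardable section */

        #ifdef CONFIG_MEMORY_HOTPLUG
        #define __meminit               /* may run again after boot: keep it */
        #else
        #define __meminit __init        /* boot-only: defaults to __init     */
        #endif

        /* A memory-setup helper annotated this way is thrown away after boot
         * unless memory hotplug might need to call it again later. */
        void __meminit init_new_memory_range(void) { }

        int main(void)
        {
            init_new_memory_range();
            return 0;
        }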
     

13 Jan, 2006

1 commit


12 Jan, 2006

3 commits


10 Jan, 2006

1 commit


09 Jan, 2006

6 commits

  • Provide a simple per-cpuset metric of memory pressure, tracking the rate
    at which the tasks in a cpuset call try_to_free_pages(), the synchronous
    (direct) memory reclaim code.

    This enables batch managers monitoring jobs running in dedicated cpusets to
    efficiently detect what level of memory pressure that job is causing.

    This is useful both on tightly managed systems running a wide mix of
    submitted jobs, which may choose to terminate or reprioritize jobs that are
    trying to use more memory than allowed on the nodes assigned them, and with
    tightly coupled, long running, massively parallel scientific computing jobs
    that will dramatically fail to meet required performance goals if they
    start to use more memory than allowed to them.

    This patch just provides a very economical way for the batch manager to
    monitor a cpuset for signs of memory pressure. It's up to the batch
    manager or other user code to decide what to do about it and take action.

    ==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero. So only
    systems that enable this feature will compute the metric.

    Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over all the (dynamically changing) set of tasks in the cpuset.

    A per-cpuset simple digital filter (requires a spinlock and 3 words of data
    per-cpuset) is kept, and updated by any task attached to that cpuset, if it
    enters the synchronous (direct) page reclaim code.

    A per-cpuset file provides an integer number representing the recent
    (half-life of 10 seconds) rate of direct page reclaims caused by the tasks
    in the cpuset, in units of reclaims attempted per second, times 1000.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
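
    A toy user-space model of the decaying filter described above (the
    structure, names and floating-point math are illustrative; the kernel
    keeps a fixed-point integer filter protected by a spinlock):

        #include <math.h>
        #include <stdio.h>

        #define HALF_LIFE 10.0          /* seconds, per the description above */

        struct fmeter {
            double val;                 /* decaying event count, scaled by 1000 */
            double last;                /* time of the last update, in seconds  */
        };

        static void fmeter_decay(struct fmeter *f, double now)
        {
            f->val *= pow(0.5, (now - f->last) / HALF_LIFE);
            f->last = now;
        }

        /* Conceptually called whenever a task in the cpuset enters the
         * synchronous (direct) page reclaim path. */
        static void fmeter_mark(struct fmeter *f, double now)
        {
            fmeter_decay(f, now);
            f->val += 1000.0;
        }

        /* Roughly what a read of the per-cpuset file reports: a decaying
         * measure of recent reclaims (the kernel's exact scaling differs). */
        static long fmeter_read(struct fmeter *f, double now)
        {
            fmeter_decay(f, now);
            return (long)f->val;
        }

        int main(void)
        {
            struct fmeter f = { 0.0, 0.0 };
            double t;

            for (t = 0.0; t < 5.0; t += 0.1)    /* ~10 reclaims/s for 5 seconds */
                fmeter_mark(&f, t);
            printf("pressure now:        %ld\n", fmeter_read(&f, 5.0));
            printf("pressure 20 s later: %ld\n", fmeter_read(&f, 25.0));
            return 0;
        }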
     
  • Try to streamline free_pages_bulk by ensuring callers don't pass in a
    'count' that exceeds the list size.

    Some cleanups:
    Rename __free_pages_bulk to __free_one_page.
    Put the page list manipulation from __free_pages_ok into free_one_page.
    Make __free_pages_ok static.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Use zone_pcp everywhere, even though the NUMA code "knows" the internal
    details of the zone. This stops other code from copying the open-coded
    form, and it looks nicer.

    Also, only print the pagesets of online cpus in zoneinfo.

    Signed-off-by: Nick Piggin
    Cc: "Seth, Rohit"
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There has recently been a lot of traffic about the right values for the
    batch and high watermarks of the per_cpu_pagelists. This patch makes
    these two variables configurable through a /proc interface.

    A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry
    controls the maximum fraction of pages in each zone that may be allocated
    to any single per-cpu page list. The minimum value is 8, meaning that no
    more than 1/8th of the pages in a zone may sit in any single
    per_cpu_pagelist.

    The batch value of each per-cpu pagelist is updated as a result. It is
    set to pcp->high/4, with an upper limit of (PAGE_SHIFT * 8); a worked
    example of this arithmetic follows this entry.

    Signed-off-by: Rohit Seth
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rohit Seth
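
    A worked example of the arithmetic described above (PAGE_SHIFT, the zone
    size and the function name are assumptions made for the example; this is
    not the kernel code):

        #include <stdio.h>

        #define PAGE_SHIFT   12         /* assume 4 KiB pages for the example */
        #define MIN_FRACTION 8          /* smallest accepted tunable value    */

        static void recompute_pcp_limits(unsigned long zone_pages,
                                         unsigned long fraction,
                                         unsigned long *high,
                                         unsigned long *batch)
        {
            if (fraction < MIN_FRACTION)
                fraction = MIN_FRACTION;

            *high = zone_pages / fraction;  /* at most 1/fraction of the zone */
            *batch = *high / 4;             /* refill/drain granularity       */
            if (*batch < 1)
                *batch = 1;
            if (*batch > PAGE_SHIFT * 8)    /* upper limit quoted above: 96   */
                *batch = PAGE_SHIFT * 8;
        }

        int main(void)
        {
            unsigned long high, batch;

            /* A 1 GiB zone of 4 KiB pages with the minimum fraction of 8. */
            recompute_pcp_limits(262144, 8, &high, &batch);
            printf("pcp->high = %lu, pcp->batch = %lu\n", high, batch);
            return 0;
        }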
     
  • For some reason there is an #ifdef CONFIG_NUMA nested within another
    #ifdef CONFIG_NUMA in the page allocator. Remove the innermost
    #ifdef CONFIG_NUMA.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Hugh says:

    page_alloc_cpu_notify() specifically contains code to

    /* Add dead cpu's page_states to our own. */

    which handles this more efficiently.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

07 Jan, 2006

18 commits

  • Optimise page_state manipulations by introducing interrupt unsafe accessors
    to page_state fields. Callers must provide their own locking (either
    disable interrupts or not update from interrupt context).

    Switch over the hot callsites that can easily be moved under interrupts off
    sections.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
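
    A user-space analogue of the pattern (a pthread mutex stands in for
    disabling interrupts, and the names are made up): the double-underscore
    accessor relies on the caller for exclusion, the plain one supplies it
    itself.

        #include <pthread.h>
        #include <stdio.h>

        static unsigned long counter;
        static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

        static void __mod_counter(long delta)   /* caller must hold counter_lock */
        {
            counter += delta;
        }

        static void mod_counter(long delta)     /* safe to call from anywhere */
        {
            pthread_mutex_lock(&counter_lock);
            __mod_counter(delta);
            pthread_mutex_unlock(&counter_lock);
        }

        int main(void)
        {
            /* Hot path: one lock round-trip covers a batch of updates. */
            pthread_mutex_lock(&counter_lock);
            __mod_counter(1);
            __mod_counter(1);
            pthread_mutex_unlock(&counter_lock);

            mod_counter(1);             /* occasional, self-contained update */
            printf("counter = %lu\n", counter);
            return 0;
        }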
     
  • Give j and r meaningful names.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The use of k in the inner loop means that the highest zone nr is always
    used if any zone of a node is populated. This means that the policy zone
    is not correctly determined on arches that do not use HIGHMEM, such as
    ia64.

    Change the loop to decrement k which also simplifies the BUG_ON.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently the function that builds a zonelist for a BIND policy has the
    side effect of setting policy_zone. This seems a bit strange: policy_zone
    does not appear to be initialized elsewhere and is therefore 0. Do we
    then apply policies to ZONE_DMA if no BIND policy has been used yet?

    This patch moves the determination of the zone to apply policies to into
    the page allocator. We determine the zone while building the zonelist for
    nodes.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
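
    An illustrative model of the idea in this and the previous entry (the
    data and function are made up; only the zone names are real): while
    building a node's zonelist, walk the zones from highest to lowest and
    remember the highest populated zone seen on any node as the policy zone.

        #include <stdio.h>

        enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

        /* Toy data: present pages per zone for two nodes (0 = not present). */
        static unsigned long present[2][MAX_NR_ZONES] = {
            { 4096, 262144, 0 },        /* node 0: DMA and NORMAL */
            {    0, 262144, 0 },        /* node 1: NORMAL only    */
        };

        static int policy_zone = ZONE_DMA;

        static void build_zonelist_for_node(int node)
        {
            int k;

            for (k = MAX_NR_ZONES - 1; k >= 0; k--) {   /* decrementing k */
                if (!present[node][k])
                    continue;
                if (k > policy_zone)
                    policy_zone = k;    /* highest populated zone so far */
                /* ...append zone k of this node to the zonelist here... */
            }
        }

        int main(void)
        {
            build_zonelist_for_node(0);
            build_zonelist_for_node(1);
            printf("policy_zone = %d (ZONE_NORMAL = %d)\n",
                   policy_zone, ZONE_NORMAL);
            return 0;
        }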
     
  • Simplify build_zonelists_node by removing the case statement.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • There are numerous places we check whether a zone is populated or not.

    Provide a helper function to check for populated zones and convert all
    checks for zone->present_pages.

    Signed-off-by: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
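
    The helper boils down to a one-line predicate over zone->present_pages.
    A sketch with a stub type (not the kernel's struct zone):

        #include <stdio.h>

        struct zone {                   /* stub with only the field we need */
            unsigned long present_pages;
        };

        static inline int populated_zone(const struct zone *zone)
        {
            return zone->present_pages != 0;
        }

        int main(void)
        {
            struct zone empty = { 0 }, normal = { 262144 };

            printf("%d %d\n", populated_zone(&empty), populated_zone(&normal));
            return 0;
        }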
     
  • Cut down size slightly by not passing bad_page the function name (it can
    be determined from dump_stack()), and cut down the number of printks in
    bad_page.

    Also, cut down some branching in the destroy_compound_page path.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add dma32 to zone statistics. Also attempt to arrange struct page_state a
    bit better (visually).

    Signed-off-by: Nick Piggin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The attached patch cleans up the way the bootmem allocator frees pages.

    A new function, __free_pages_bootmem(), is provided in mm/page_alloc.c that is
    called from mm/bootmem.c to turn pages over to the main allocator. All the
    bits of code to initialise pages (clearing PG_reserved and setting the page
    count) are moved to here. The checks on page validity are removed, on the
    assumption that the struct page arrays will have been prepared correctly.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Small cleanups that do not change the generated code with the gccs I've
    tested.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • read_page_state and __get_page_state only traverse online CPUs, which will
    cause results to fluctuate when CPUs are plugged in or out.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
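
    The effect can be seen in a small standalone sketch (the per-CPU array
    and the fix of summing every possible CPU are assumptions; the entry
    above only states the problem):

        #include <stdio.h>

        #define NR_CPUS 4

        static unsigned long pgalloc[NR_CPUS] = { 100, 200, 300, 400 };
        static int cpu_online[NR_CPUS]        = { 1, 1, 1, 1 };

        /* Summing only online CPUs: the total drops when a CPU goes offline,
         * even though its contribution still sits in its counter. */
        static unsigned long sum_online(void)
        {
            unsigned long sum = 0;
            int cpu;

            for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (cpu_online[cpu])
                    sum += pgalloc[cpu];
            return sum;
        }

        /* Summing every possible CPU keeps the result stable across hotplug. */
        static unsigned long sum_possible(void)
        {
            unsigned long sum = 0;
            int cpu;

            for (cpu = 0; cpu < NR_CPUS; cpu++)
                sum += pgalloc[cpu];
            return sum;
        }

        int main(void)
        {
            printf("online: %lu  possible: %lu\n", sum_online(), sum_possible());
            cpu_online[3] = 0;                      /* CPU 3 goes offline */
            printf("online: %lu  possible: %lu\n", sum_online(), sum_possible());
            return 0;
        }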
     
  • struct per_cpu_pages.low is useless. Remove it.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • bad_range is supposed to be a temporary check. It would be a pity to throw it
    out. Make it depend on CONFIG_DEBUG_VM instead.

    CONFIG_HOLES_IN_ZONE systems were relying on this to check pfn_valid in the
    page allocator. Add that to page_is_buddy instead.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
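
    A sketch of the resulting shape (stub types and an elided check body;
    not the actual kernel source):

        struct zone;            /* stubs: the real types live in the kernel */
        struct page;

        #ifdef CONFIG_DEBUG_VM
        static int bad_range(struct zone *zone, struct page *page)
        {
            /* ...pfn-within-zone and zone-span sanity checks go here... */
            return 0;
        }
        #else
        static inline int bad_range(struct zone *zone, struct page *page)
        {
            (void)zone;
            (void)page;
            return 0;           /* checks compiled out; callers unchanged */
        }
        #endif

        int main(void)
        {
            return bad_range((struct zone *)0, (struct page *)0);
        }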
     
  • Micro optimise some conditionals where we don't need lazy evaluation.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
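
    One common shape for this kind of change (an assumption about the actual
    hunks, shown with a made-up struct): when every operand is a cheap,
    side-effect-free test, a bitwise '|' lets the compiler evaluate them all
    without the short-circuit branches that '||' implies.

        #include <stdio.h>

        struct pstate { unsigned long mapcount, count; void *mapping; };

        static int looks_bad_lazy(const struct pstate *p)
        {
            return p->mapcount != 0 || p->count != 0 || p->mapping != NULL;
        }

        static int looks_bad_eager(const struct pstate *p)
        {
            return (p->mapcount != 0) | (p->count != 0) | (p->mapping != NULL);
        }

        int main(void)
        {
            struct pstate ok = { 0, 0, 0 };

            printf("%d %d\n", looks_bad_lazy(&ok), looks_bad_eager(&ok));
            return 0;
        }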
     
  • Inline set_page_refs.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Slightly optimise some page allocation and freeing functions by taking
    advantage of knowing whether or not interrupts are disabled.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The NODES_SPAN_OTHER_NODES config option was created so that DISCONTIGMEM
    could handle pSeries numa layouts. However, support for DISCONTIGMEM has
    been replaced by SPARSEMEM on powerpc. As a result, this config option and
    supporting code is no longer needed.

    I have already sent a patch to Paul that removes the option from powerpc
    specific code. This removes the arch independent piece. Doesn't really
    matter which is applied first.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Two changes to the setting of the ALLOC_CPUSET flag in
    mm/page_alloc.c:__alloc_pages()

    - A bug fix: the "ignoring mins" case should not be honoring ALLOC_CPUSET.
    This case, of all cases, is handling a request that will free up more
    memory than it asks for (exiting tasks, for example), so it should be
    allowed to escape cpuset constraints when memory is tight.

    - A logic change to make it simpler. Honor cpusets even on GFP_ATOMIC
    (!wait) requests. With this, cpuset confinement applies to all requests
    except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch, I can
    remove the ALLOC_CPUSET flag entirely. Since I don't know any real reason
    this logic has to be either way, I am choosing the path of the simplest
    code.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
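
    A sketch of the resulting decision (the flag names come from the entry
    above; the values and the helper are made up): cpuset confinement now
    applies to every request except the one that ignores watermarks.

        #include <stdio.h>

        #define ALLOC_CPUSET        0x01    /* illustrative values */
        #define ALLOC_NO_WATERMARKS 0x02

        static int alloc_flags(int ignoring_mins)
        {
            if (ignoring_mins)                 /* frees more than it asks for: */
                return ALLOC_NO_WATERMARKS;    /* escapes cpuset constraints   */
            return ALLOC_CPUSET;               /* everyone else, even GFP_ATOMIC
                                                * (!wait), stays confined      */
        }

        int main(void)
        {
            printf("normal or atomic: %#x\n", alloc_flags(0));
            printf("ignoring mins:    %#x\n", alloc_flags(1));
            return 0;
        }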
     

16 Dec, 2005

1 commit


04 Dec, 2005

1 commit


29 Nov, 2005

1 commit

  • I believe this patch is required to fix breakage in the asynch reclaim
    watermark logic introduced by this patch:

    http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7fb1d9fca5c6e3b06773b69165a73f3fb786b8ee

    Just some background of the watermark logic in case it isn't clear...
    Basically what we have is this:

    --- pages_high
    |
    | (a)
    |
    --- pages_low
    |
    | (b)
    |
    --- pages_min
    |
    | (c)
    |
    --- 0

    Now when pages_low is reached, we want to kick asynch reclaim, which gives us
    an interval of "b" before we must start synch reclaim, and gives kswapd an
    interval of "a" before it need go back to sleep.

    When pages_min is reached, normal allocators must enter synch reclaim, but
    PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie. atomic allocations, recursive
    allocations, etc.) get access to varying amounts of the reserve "c".

    Signed-off-by: Nick Piggin
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
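
    A toy walk through the bands in the diagram (the thresholds and the
    discount for privileged callers are illustrative; the real
    zone_watermark_ok() logic has more cases):

        #include <stdio.h>

        struct zone_marks { long min, low, high; };

        static const char *alloc_path(const struct zone_marks *z, long free,
                                      int atomic_or_harder)
        {
            long min = z->min;

            if (atomic_or_harder)
                min -= min / 2;     /* such callers may dip into reserve "c" */

            if (free <= min)
                return "synchronous (direct) reclaim";
            if (free <= z->low)
                return "wake kswapd (async reclaim), allocation proceeds";
            return "above pages_low: allocation proceeds, no reclaim kicked";
        }

        int main(void)
        {
            /* Once woken, kswapd keeps reclaiming until free climbs back
             * above z.high, i.e. through interval "a". */
            struct zone_marks z = { .min = 1000, .low = 1250, .high = 1500 };

            printf("free=1400:          %s\n", alloc_path(&z, 1400, 0));
            printf("free=1100:          %s\n", alloc_path(&z, 1100, 0));
            printf("free= 900:          %s\n", alloc_path(&z,  900, 0));
            printf("free= 900 (atomic): %s\n", alloc_path(&z,  900, 1));
            return 0;
        }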
     

23 Nov, 2005

2 commits

  • It used to be the case that PG_reserved pages were silently never freed, but
    in 2.6.15-rc1 they may be freed with a "Bad page state" message. We should
    work through such cases as they appear, fixing the code; but for now it's
    safer to issue the message without freeing the page, leaving PG_reserved set.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out-of-line
    put_page), doesn't slow down what most needs to be fast (already using
    hugetlb), and unifies the way we handle high-order pages.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

18 Nov, 2005

1 commit


15 Nov, 2005

3 commits