20 Jul, 2007

3 commits

  • page-writeback accounting is presently performed in the page-flags
    macros. This is inconsistent, a bit ugly, and makes it awkward to
    implement per-backing_dev under-writeback page accounting.

    So move this accounting down to the callsite(s).
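
    A minimal sketch of the shape of this change (the macro and callsite
    names are assumptions, following the writeback code of the era):

        /* Before: the page-flags macro did the bookkeeping itself. */
        #define TestSetPageWriteback(page)                               \
                ({                                                       \
                        int ret = test_and_set_bit(PG_writeback,         \
                                                   &(page)->flags);      \
                        if (!ret)                                        \
                                inc_zone_page_state(page, NR_WRITEBACK); \
                        ret;                                             \
                })

        /* After: the macro only touches the flag bit... */
        #define TestSetPageWriteback(page)                               \
                test_and_set_bit(PG_writeback, &(page)->flags)

        /* ...and the callsite does the accounting, where per-backing_dev
         * counters can later be added next to the zone counter. */
        int test_set_page_writeback(struct page *page)
        {
                int ret = TestSetPageWriteback(page);

                if (!ret)
                        inc_zone_page_state(page, NR_WRITEBACK);
                return ret;
        }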

    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Share the same page flag bit for PG_readahead and PG_reclaim.

    One is used only on file reads, the other only for emergency writes;
    one is used mostly for fresh/young pages, the other for old pages.

    Combinations of possible interactions are:

    a) clear PG_reclaim => implicit clear of PG_readahead
       it will turn an asynchronous readahead into a synchronous one;
       this actually does _good_ for readahead: the pages are about to be
       reclaimed, which is readahead thrashing, and in that case
       synchronous readahead makes more sense

    b) clear PG_readahead => implicit clear of PG_reclaim
       one (and only one) page will not be reclaimed in time;
       it can be avoided by checking PageWriteback(page) in readahead first

    c) set PG_reclaim => implicit set of PG_readahead
       will confuse readahead and make it restart the size ramp-up
       process; it's a trivial problem, and can mostly be avoided by
       checking PageWriteback(page) first in readahead

    d) set PG_readahead => implicit set of PG_reclaim
       PG_readahead will never be set on already cached pages, and
       PG_reclaim is always cleared when a page is dirtied, so this is
       not a problem

    In summary:
    a)   we get better behavior
    b,d) the possible interactions can be avoided
    c)   a racy condition exists that might affect readahead, but the
         chance is _really_ low, and the hurt to readahead is trivial.

    Compound pages also use PG_reclaim, but for now they do not interact with
    reclaim/readahead code.
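
    A sketch of the shared bit and of the PageWriteback() check mentioned
    in b) and c) (illustrative, in the style of page-flags.h):

        #define PG_readahead    PG_reclaim      /* one bit, two meanings */

        /* In the readahead path: a page under writeback carries the bit
         * as PG_reclaim, not PG_readahead, so look at PageWriteback()
         * before trusting the mark. */
        if (!PageWriteback(page) && TestClearPageReadahead(page)) {
                /* ... kick off the next asynchronous readahead ... */
        }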

    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Introduce a new page flag: PG_readahead.

    It acts as a look-ahead mark, which tells the page reader: Hey, it's
    time to invoke the read-ahead logic. For the sake of I/O pipelining,
    don't wait until the reader runs out of cached pages!
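
    A sketch of how a reader consumes the mark (the readahead entry point
    shown is an assumption):

        /* On reaching the marked page, start pipelining the next batch
         * of I/O instead of waiting for the cached pages to run out. */
        if (PageReadahead(page)) {
                ClearPageReadahead(page);
                page_cache_readahead(mapping, ra, filp, page->index, 1);
        }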

    Signed-off-by: Fengguang Wu
    Cc: Steven Pratt
    Cc: Ram Pai
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

08 May, 2007

3 commits

  • Remove the two page flags that were previously used by swsusp and are no
    longer needed.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The patch adds PageTail(page) and PageHead(page) to check whether a
    page is the head or the tail of a compound page. This is done by
    masking the two bits describing the state of a compound page and then
    comparing the result: one comparison and a branch instead of two bit
    checks and two branches.
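
    A sketch of the masking trick (bit names as in page-flags.h; PG_tail
    is the PG_reclaim alias introduced in the commit below):

        #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim))

        /* Both bits set: a tail page. */
        #define PageTail(page)  (((page)->flags & PG_head_tail_mask) == \
                                 PG_head_tail_mask)

        /* Only PG_compound set: the head page. */
        #define PageHead(page)  (((page)->flags & PG_head_tail_mask) == \
                                 (1L << PG_compound))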

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If we add a new flag so that we can distinguish between the first page
    and the tail pages, then we can avoid using page->private in the first
    page: page->private == page for the first page, so there is no real
    information in there.

    Freeing up page->private makes the use of compound pages more
    transparent. They become more usable, like real pages. Right now we
    have to be careful, for example, if we go beyond PAGE_SIZE allocations
    in the slab on i386, because we can then no longer use the private
    field. This is one of the issues that keep us from supporting
    debugging for page-size slabs in SLAB.

    Having page->private available for SLUB would allow more meta information in
    the page struct. I can probably avoid the 16 bit ints that I have in there
    right now.

    Also if page->private is available then a compound page may be equipped with
    buffer heads. This may free up the way for filesystems to support larger
    blocks than page size.

    We add PageTail as an alias of PageReclaim. Compound pages cannot currently
    be reclaimed. Because of the alias one needs to check PageCompound first.
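
    A sketch of how the distinction gets used (the head-lookup code is an
    assumption; tail pages keep the head pointer in page->private, while
    the head page's own page->private becomes free):

        /* PG_tail aliases PG_reclaim, so only look at it once the page
         * is known to be compound. */
        if (unlikely(PageCompound(page)) && PageTail(page))
                page = (struct page *)page->private;
        /* 'page' is now the head page; its page->private is available
         * for other uses, e.g. buffer heads. */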

    The RFC for this approach was discussed at
    http://marc.info/?t=117574302800001&r=1&w=2

    [nacc@us.ibm.com: fix hugetlbfs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

27 Apr, 2007

1 commit

  • The page_test_and_clear_dirty primitive really consists of two
    operations, page_test_dirty and page_clear_dirty. The combination of
    the two is not atomic, so it makes more sense to have two separate
    operations.

    Besides improving the readability of the s390 version of
    SetPageUptodate, this lets it avoid the page_test_dirty operation, an
    insert-storage-key-extended (iske) instruction, which is expensive.
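
    A sketch of the split (s390 primitives; the callers shown are
    assumptions based on the description above):

        /* Before: test and clear fused, though not atomic anyway. */
        if (page_test_and_clear_dirty(page))
                set_page_dirty(page);

        /* After: separate operations, so a caller that already knows
         * the state can skip the expensive iske-based test: */
        if (page_test_dirty(page)) {
                page_clear_dirty(page);
                set_page_dirty(page);
        }

        /* ...and s390's SetPageUptodate can clear the storage key's
         * dirty bit unconditionally, with no page_test_dirty at all. */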

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

02 Mar, 2007

1 commit

  • Rename PG_checked to PG_owner_priv_1 to reflect its availability as a
    private flag for use by the owner/allocator of the page. In the case of
    pagecache pages (which might be considered to be owned by the mm),
    filesystems may use the flag.
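
    A sketch of the rename (bit number illustrative, following the file's
    existing aliasing style):

        #define PG_owner_priv_1  8      /* Owner use. If pagecache, fs may use */

        /* The old name survives as an alias for existing users: */
        #define PG_checked       PG_owner_priv_1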

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

22 Dec, 2006

1 commit

  • The old clear_page_dirty() and test_clear_page_dirty() functions were
    horribly easy to mis-use because of their tempting naming, and they
    also did way more than any users of them generally wanted them to do.

    A dirty page can become clean under two circumstances:

    (a) when we write it out. We have "clear_page_dirty_for_io()" for
        this, and that function remains unchanged.

        In the "for IO" case it is not sufficient to just clear the dirty
        bit: you also have to mark the page as being under writeback etc.

    (b) when we actually remove a page due to it becoming inaccessible to
        users, notably because it was truncate()'d away or the file (or
        metadata) no longer exists, and we thus want to cancel any
        outstanding dirty state.

    For the (b) case, we now introduce "cancel_dirty_page()", which only
    touches the page state itself, and verifies that the page is not mapped
    (since cancelling writes on a mapped page would be actively wrong as it
    is still accessible to users).
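
    A sketch of the new helper's shape (the dirty accounting shown is an
    assumption; the point is that it only touches page state and insists
    the page is unmapped):

        void cancel_dirty_page(struct page *page)
        {
                /* Cancelling writes on a mapped page would be actively
                 * wrong: the data is still accessible to users. */
                WARN_ON(page_mapped(page));

                if (TestClearPageDirty(page)) {
                        struct address_space *mapping = page->mapping;

                        if (mapping && mapping_cap_account_dirty(mapping))
                                dec_zone_page_state(page, NR_FILE_DIRTY);
                }
        }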

    Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
    ReiserFS, XFS all use the old confusing functions, and will be fixed
    separately in subsequent commits (with some of them just removing the
    offending logic, and others using clear_page_dirty_for_io()).

    This was confirmed by Martin Michlmayr to fix the apt database
    corruption on ARM.

    Cc: Martin Michlmayr
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: Arjan van de Ven
    Cc: Andrei Popa
    Cc: Andrew Morton
    Cc: Dave Kleikamp
    Cc: Gordon Farquharson
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Sep, 2006

1 commit

  • Convert s390 page handling macros to functions. In particular this
    fixes a problem with s390's SetPageUptodate macro, which evaluates its
    input parameter twice and can therefore cause subtle bugs.
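
    The hazard, sketched with a hypothetical macro body (not the actual
    s390 code):

        /* As a macro, 'page' is expanded twice, so a call such as
         * SetPageUptodate(pages[i++]) increments i twice: */
        #define SetPageUptodate(page)                                   \
                do {                                                    \
                        if (!test_and_set_bit(PG_uptodate,              \
                                              &(page)->flags))          \
                                page_test_and_clear_dirty(page);        \
                } while (0)

        /* As a function, the argument is evaluated exactly once: */
        static inline void SetPageUptodate(struct page *page)
        {
                if (!test_and_set_bit(PG_uptodate, &page->flags))
                        page_test_and_clear_dirty(page);
        }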

    [akpm@osdl.org: build fix]
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

01 Jul, 2006

2 commits

  • Conversion of nr_writeback to a per-zone counter.

    This removes the last page_state counter from arch/i386/mm/pgtable.c,
    so we drop the use of page_state there.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • NOTE: ZVC are *not* the lightweight event counters. ZVCs are reliable whereas
    event counters do not need to be.

    Zone based VM statistics are necessary to be able to determine what the state
    of memory in one zone is. In a NUMA system this can be helpful for local
    reclaim and other memory optimizations that may be able to shift VM load in
    order to get more balanced memory use.

    It is also useful to know how the computing load affects the memory
    allocations on various zones. This patchset allows the retrieval of that data
    from userspace.

    The patchset introduces a framework for counters that is a cross between the
    existing page_stats --which are simply global counters split per cpu-- and the
    approach of deferred incremental updates implemented for nr_pagecache.

    Small per cpu 8 bit counters are added to struct zone. If a counter
    exceeds a certain threshold, it is accumulated into an array of
    atomic_long in the zone and into a global array that sums up all zone
    values. The small 8 bit counters sit next to the per cpu page
    pointers, so they will be hot in the cpu cache when pages are
    allocated and freed.
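
    A sketch of the update path described above (names follow the
    patchset's style, but the details are illustrative):

        #define STAT_THRESHOLD  32      /* illustrative */

        void __mod_zone_page_state(struct zone *zone,
                                   enum zone_stat_item item, int delta)
        {
                struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
                s8 *p = pcp->vm_stat_diff + item;   /* small and cache-hot */
                long x = delta + *p;

                if (unlikely(x > STAT_THRESHOLD || x < -STAT_THRESHOLD)) {
                        /* Fold the local delta into the zone array and
                         * the global array, then reset the local diff. */
                        atomic_long_add(x, &zone->vm_stat[item]);
                        atomic_long_add(x, &vm_stat[item]);
                        x = 0;
                }
                *p = x;
        }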

    Access to VM counter information for a zone and for the whole machine
    is then possible by simply indexing an array (thanks to Nick Piggin
    for pointing out that approach). Access to the total number of pages
    of various types no longer requires summing up all per cpu counters.

    Benefits of this patchset right now:

    - Ability for UP and SMP configuration to determine how memory
    is balanced between the DMA, NORMAL and HIGHMEM zones.

    - loops over all processors are avoided in writeback and
    reclaim paths. We can avoid caching the writeback information
    because the needed information is directly accessible.

    - Special handling for nr_pagecache removed.

    - zone_reclaim_interval vanishes since VM stats can now determine
    when local reclaim is worthwhile.

    - Fast inline per node page state determination.

    - Accurate counters in /sys/devices/system/node/node*/meminfo. The
    current counters simply record which processor allocated a page
    somewhere and guesstimate from that, so they were not useful for
    showing the actual distribution of page use on a specific zone.

    - The swap_prefetch patch requires per node statistics in order to
    figure out when processors of a node can prefetch. This patch provides
    some of the needed numbers.

    - Detailed VM counters available in more /proc and /sys status files.

    References to earlier discussions:
    V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
    V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

    Performance tests with AIM7 did not show any regressions. Seems to be a tad
    faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes
    fixes for s390/arm/uml arch code.

    This patch:

    Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

    Create vmstat.c/vmstat.h by separating the counter code and the proc
    functions.

    Move the vm_stat_text array before zoneinfo_show.

    [akpm@osdl.org: s390 build fix]
    [akpm@osdl.org: HOTPLUG_CPU build fix]
    Signed-off-by: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Jun, 2006

1 commit

  • As Nick points out, only ia64 uses PG_uncached. So we can push it up into the
    higher bits of the lower half of page->flags and make room for another flag on
    32-bit machines.

    Cc: "Luck, Tony"
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Nick Piggin
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

11 Apr, 2006

2 commits

  • Add some documentation regarding the utilisation of the flags field in
    struct page. This field is overloaded for per page bits and to hold node,
    zone and SPARSEMEM information. Make it clear which areas are used for
    what and how many bits are in each area.
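
    The documented layout, sketched (field widths depend on the
    configuration):

        /*
         * page->flags layout:
         *
         *   classic:         | NODE | ZONE | ... free ... | FLAGS |
         *   with SPARSEMEM:  | SECTION | NODE | ZONE | ... | FLAGS |
         *
         * The per-page flag bits occupy the low end; node, zone and
         * (if present) section information is packed into the high bits.
         */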

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Rohit found an obscure bug causing buddy list corruption.

    page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
    to determine whether or not a free page's buddy is itself free and in the
    buddy lists.

    Each of the conjuncts may be true at different times due to unrelated
    conditions, so the non-atomic page_is_buddy test may find each conjunct to
    be true even if they were not both true at the same time (ie. the page was
    not on the buddy lists).
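
    The racy test, sketched (illustrative of the problem rather than of
    the fix):

        static inline int page_is_buddy(struct page *page, int order)
        {
                /* Each conjunct can be true at different moments for
                 * unrelated reasons; nothing makes the pair atomic. */
                return PagePrivate(page) &&
                       page_order(page) == order &&
                       page_count(page) == 0;
        }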

    Signed-off-by: Martin Bligh
    Signed-off-by: Rohit Seth
    Signed-off-by: Nick Piggin
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Jan, 2006

3 commits

  • Comment the new locking rules for page_state statistics.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Optimise page_state manipulations by introducing interrupt unsafe accessors
    to page_state fields. Callers must provide their own locking (either
    disable interrupts or not update from interrupt context).

    Switch over the hot callsites that can easily be moved under interrupts off
    sections.
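
    A sketch of the unsafe variant next to the safe one (illustrative
    macro bodies, not the exact page_state implementation):

        /* Interrupt safe: disables interrupts around the update. */
        #define inc_page_state(member)                          \
                do {                                            \
                        unsigned long flags;                    \
                        local_irq_save(flags);                  \
                        __get_cpu_var(page_states).member++;    \
                        local_irq_restore(flags);               \
                } while (0)

        /* Unsafe: the caller guarantees interrupts are off, or that the
         * counter is never updated from interrupt context. */
        #define __inc_page_state(member)                        \
                do { __get_cpu_var(page_states).member++; } while (0)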

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add dma32 to zone statistics. Also attempt to arrange struct page_state a
    bit better (visually).

    Signed-off-by: Nick Piggin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Nov, 2005

1 commit

  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space
    (out-of-line put_page), doesn't slow down what most needs to be fast
    (already using hugetlb), and unifies the way we handle high-order pages.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

05 Sep, 2005

2 commits

  • Add page_state info to the per-node meminfo file in sysfs. This is mostly
    just for informational purposes.

    The lack of this information was brought up recently during a discussion
    regarding pagecache clearing, and I put this patch together to test out one
    of the suggestions.

    It seems like interesting info to have, so I'm submitting the patch.

    Signed-off-by: Martin Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Hicks
     
  • This bitop does not need to be atomic because it is performed when there will
    be no references to the page (ie. the page is being freed).
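
    Sketched with the page-flags double-underscore convention for
    non-atomic variants (the specific flag is illustrative):

        /* Atomic: a locked bit operation, safe against concurrent
         * updates of other bits in page->flags. */
        #define ClearPageLRU(page)    clear_bit(PG_lru, &(page)->flags)

        /* Non-atomic: fine when the page is being freed, because nobody
         * else can hold a reference or touch the flags. */
        #define __ClearPageLRU(page)  __clear_bit(PG_lru, &(page)->flags)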

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 May, 2005

1 commit

  • This is a patch for counting the number of pages used for bounce
    buffers. The count is shown in /proc/vmstat.

    Currently, the number of bounce pages is not counted anywhere, so when
    there are many bounce pages it looks as though pages have leaked, and
    it's difficult for a user to gauge how many bounce pages are in use.
    So it's meaningful to show the number of bounce pages.
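
    A sketch of the accounting (counter name as described; the exact hook
    points in the bounce code are assumptions):

        /* When a page is allocated for a bounce buffer: */
        inc_page_state(nr_bounce);

        /* ...and when the bounce page is freed again: */
        dec_page_state(nr_bounce);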

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds