17 Oct, 2007

40 commits

  • Signed-off-by: Nick Piggin
    Acked-by: Anders Larsen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Cc: Tigran Aivazian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Cc: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Rework the generic block "cont" routines to handle the new aops. Supporting
    cont_prepare_write would take quite a lot of code, so remove it instead
    (the filesystems that used it are converted to the new routines later in
    the series).

    write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
    generic_cont_expand, so filesystems can avoid the old hacks they used.

    Signed-off-by: Nick Piggin
    Cc: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Cc: Nick Piggin
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
     
  • Signed-off-by: Nick Piggin
    Cc: David Chinner
    Cc: Timothy Shimmin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Convert ext4 to use write_begin()/write_end() methods.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Nick Piggin
    Cc: Dmitriy Monakhov
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Various fixes and improvements

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Implement new aops for some of the simpler filesystems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Restore the KERNEL_DS optimisation, especially helpful to the 2copy write
    path.

    This may be a pretty questionable gain in most cases, especially after the
    legacy 2copy write path is removed, but it doesn't cost much.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Partial writes can easily be supported in LO_CRYPT_NONE mode, but not in the
    LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know who still
    uses cryptoapi, but theoretically it is possible, so let's leave things as
    they are. The loop device didn't support partial writes before Nick's
    "write_begin/write_end" patch set, and it behaves the same way after.

    Signed-off-by: Dmitriy Monakhov
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     
  • These are intended to replace prepare_write and commit_write with more
    flexible alternatives that are also able to avoid the buffered write
    deadlock problems efficiently (which prepare_write is unable to do).
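
    For reference, the replacement hooks in struct address_space_operations take
    roughly the following shape (a sketch of the merged interface; exact
    parameter names and types in this particular patch may differ):

        int (*write_begin)(struct file *file, struct address_space *mapping,
                           loff_t pos, unsigned len, unsigned flags,
                           struct page **pagep, void **fsdata);
        int (*write_end)(struct file *file, struct address_space *mapping,
                         loff_t pos, unsigned len, unsigned copied,
                         struct page *page, void *fsdata);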

    [mark.fasheh@oracle.com: API design contributions, code review and fixes]
    [akpm@linux-foundation.org: various fixes]
    [dmonakhov@sw.ru: new aop block_write_begin fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Mark Fasheh
    Signed-off-by: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • New buffers against uptodate pages are simply marked uptodate, while the
    buffer_new bit remains set. This causes error-case code to zero out parts of
    those buffers because it thinks they contain stale data: wrong, they are
    actually uptodate so this is a data loss situation.

    Fix this by actually clearing buffer_new and marking the buffer dirty. It
    makes sense to always clear buffer_new before setting a buffer uptodate.
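
    In the uptodate-page case the new buffer then ends up being handled roughly
    like this in __block_prepare_write (a sketch, not the verbatim hunk):

        if (PageUptodate(page)) {
                clear_buffer_new(bh);       /* not "new" in any useful sense */
                set_buffer_uptodate(bh);
                mark_buffer_dirty(bh);      /* make sure the block reaches disk */
                continue;
        }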

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add an iterator data structure to operate over an iovec. Add usercopy
    operators needed by generic_file_buffered_write, and convert that function
    over.
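
    Conceptually the iterator just wraps the iovec array with a running
    position, roughly (a sketch; field names may differ slightly from the
    patch):

        struct iov_iter {
                const struct iovec *iov;    /* current segment */
                unsigned long nr_segs;      /* segments remaining */
                size_t iov_offset;          /* offset within the current segment */
                size_t count;               /* bytes remaining in total */
        };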

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Modify the core write() code so that it won't take a pagefault while holding a
    lock on the pagecache page. There are a number of different deadlocks possible
    if we try to do such a thing:

    1. generic_buffered_write
    2. lock_page
    3. prepare_write
    4. unlock_page+vmtruncate
    5. copy_from_user
    6. mmap_sem(r)
    7. handle_mm_fault
    8. lock_page (filemap_nopage)
    9. commit_write
    10. unlock_page

    a. sys_munmap / sys_mlock / others
    b. mmap_sem(w)
    c. make_pages_present
    d. get_user_pages
    e. handle_mm_fault
    f. lock_page (filemap_nopage)

    2,8 - recursive deadlock if page is same
    2,8;2,8 - ABBA deadlock if page is different
    2,6;b,f - ABBA deadlock if page is same

    The solution is as follows:
    1. If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

    1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish non-present pages
    in a readable mapping from the lack of a readable mapping.

    2. If we find the destination page is non uptodate, unlock it (this could be
    made slightly more optimal), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.

    (also, rename maxlen to seglen, because it was confusing)

    This increases the CPU/memory copy cost by almost 50% on the affected
    workloads. That will be solved by introducing a new set of pagecache write
    aops in a subsequent patch.
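
    A condensed sketch of the resulting copy-loop body (illustrative only: the
    copy helpers filemap_copy_from_user_atomic() and filemap_copy_from_page()
    are placeholder names, and all error handling is elided):

        /* 'page' is the locked pagecache page for (mapping, index). */
        if (PageUptodate(page)) {
                /* Case 1: safe to copy in place.  The atomic usercopy cannot
                 * fault, and a short copy leaves already-uptodate data behind,
                 * so the full length can still be committed. */
                copied = filemap_copy_from_user_atomic(page, offset, buf, bytes);
        } else {
                /* Case 2: stage the user data in a temporary page while the
                 * destination is unlocked, then relock and copy through the
                 * kernel mapping so no pagefault is taken under the lock. */
                unlock_page(page);
                copy_from_user(page_address(src_page), buf, bytes);
                lock_page(page);
                filemap_copy_from_page(page, offset, src_page, bytes);
        }
        status = a_ops->commit_write(file, page, offset, offset + bytes);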

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Hide some of the open-coded nr_segs tests into the iovec helpers. This is all
    to simplify generic_file_buffered_write, because that gets more complex in the
    next patch.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Quite a bit of code is used in maintaining these "cached pages" that are
    probably pretty unlikely to get used. It would require a narrow race where
    the page is inserted concurrently while this process is allocating a page
    in order to create the spare page, followed by a multi-page write into an
    uncached part of the file to make use of it.

    Next, the buffered write path (and others) uses its own LRU pagevec when it
    should be just using the per-CPU LRU pagevec (which will cut down on both data
    and code size cacheline footprint). Also, these private LRU pagevecs are
    emptied after just a very short time, in contrast with the per-CPU pagevecs
    that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
    to add the pages to pagecache for a bulk write (in 4K chunks).

    [this gets rid of some cond_resched() calls in readahead.c and mpage.c due
    to clashes in -mm. What put them there, and why? ]

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then
    we may have failed the write operation despite prepare_write having
    instantiated blocks past i_size. Fix this, and consolidate the trimming into
    one place.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Allow CONFIG_DEBUG_VM to switch off the prefaulting logic; this makes the
    race much easier to hit.

    This is useful for demonstration and testing purposes, but is removed in a
    subsequent patch.
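
    The change amounts to compiling the prefault out, roughly (a sketch of the
    relevant spot in generic_file_buffered_write):

        #ifndef CONFIG_DEBUG_VM
                /* Prefault the user page so the later in-lock atomic copy is
                 * unlikely to fail; skipping it makes the race easy to hit. */
                fault_in_pages_readable(buf, bytes);
        #endif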

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Rename some variables and fix some types.

    Signed-off-by: Andrew Morton
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This reverts commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which
    fixed the following bug:

    When prefaulting in the pages in generic_file_buffered_write(), we only
    faulted in the pages for the first segment of the iovec. If the second or a
    successive segment described an mmapping of the page into which we're
    write()ing, and that page is not up-to-date, the fault handler tries to lock
    the already-locked page (to bring it up to date) and deadlocks.

    An exploit for this bug is in writev-deadlock-demo.c, in
    http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

    (These demos assume blocksize < PAGE_CACHE_SIZE).

    The problem with this fix is that it takes the kernel back to doing a single
    prepare_write()/commit_write() per iovec segment. So in the worst case we'll
    run prepare_write+commit_write 1024 times where we previously would have run
    it once. The other problem with the fix is that it didn't fix all the
    locking problems.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This reverts commit 81b0c8713385ce1b1b9058e916edcf9561ad76d6, which was
    a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 ("[PATCH]
    generic_file_buffered_write(): deadlock on vectored write"), which we
    also revert.

    Signed-off-by: Andrew Morton
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert the patch from Neil Brown to optimise NFSD writev handling.

    Cc: Neil Brown
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • While running some memory intensive load, system response deteriorated just
    after swap-out started.

    The cause of this problem is that when a PG_reclaim page is moved to the tail
    of the inactive LRU list in rotate_reclaimable_page(), the lru_lock spinlock
    is acquired for every page writeback. This deteriorates system performance
    and lengthens the interrupt hold-off time when swap-out starts.

    The following patch solves this problem by using a pagevec when rotating
    reclaimable pages, to mitigate lru_lock contention and reduce the interrupt
    hold-off time.

    I ran a test that allocates and touches pages in multiple processes while
    pinging the test machine in flood mode, to measure response under memory
    intensive load.

    The test result is:

    -2.6.23-rc5
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 53222ms
    rtt min/avg/max/mdev = 0.074/0.652/172.228/7.176 ms, pipe 11, ipg/ewma
    17.746/0.092 ms

    -2.6.23-rc5-patched
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
    rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma
    17.314/0.091 ms

    Max round-trip-time was improved.

    The test machine spec: 4 CPUs (3.16GHz, Hyper-Threading enabled), 8GB memory,
    8GB swap.

    I ran the ping test again to observe the performance deterioration caused by
    taking a page reference.

    -2.6.23-rc6-with-modifiedpatch
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 53386ms
    rtt min/avg/max/mdev = 0.074/0.110/4.716/0.147 ms, pipe 2, ipg/ewma 17.801/0.129 ms

    The result for my original patch is as follows.

    -2.6.23-rc5-with-originalpatch
    --- testmachine ping statistics ---
    3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
    rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

    The influence on response time was small.

    [akpm@linux-foundation.org: fix uninitalised var warning]
    [hugh@veritas.com: fix locking]
    [randy.dunlap@oracle.com: fix function declaration]
    [hugh@veritas.com: fix BUG at include/linux/mm.h:220!]
    [hugh@veritas.com: kill redundancy in rotate_reclaimable_page]
    [hugh@veritas.com: move_tail_pages into lru_add_drain]
    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     
  • Allow an application to query the memories allowed by its context.

    Updated numa_memory_policy.txt to mention that applications can use this to
    obtain allowed memories for constructing valid policies.

    TODO: update out-of-tree libnuma wrapper[s], or maybe add a new
    wrapper--e.g., numa_get_mems_allowed() ?

    Also, update numa syscall man pages.

    Tested with memtoy V>=0.13.
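
    Until a libnuma wrapper exists, an application can reach the new query
    directly through the syscall. A minimal user-space sketch (assuming the flag
    name MPOL_F_MEMS_ALLOWED with value (1 << 2); error handling is reduced to
    perror()):

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/syscall.h>

        #ifndef MPOL_F_MEMS_ALLOWED
        #define MPOL_F_MEMS_ALLOWED (1 << 2)
        #endif

        int main(void)
        {
                unsigned long mask[16] = { 0 };         /* generously sized nodemask */

                if (syscall(SYS_get_mempolicy, NULL, mask, sizeof(mask) * 8,
                            NULL, MPOL_F_MEMS_ALLOWED) != 0) {
                        perror("get_mempolicy");
                        return 1;
                }
                printf("mems allowed (first word): 0x%lx\n", mask[0]);
                return 0;
        }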

    Signed-off-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • The current VM can get itself into trouble fairly easily on systems with a
    small ZONE_HIGHMEM, which is common on i686 computers with 1GB of memory.

    On one side, page_alloc() will allocate down to zone->pages_low, while on
    the other side, kswapd() and balance_pgdat() will try to free memory from
    every zone, until every zone has more free pages than zone->pages_high.

    Highmem can be filled up to zone->pages_low with page tables, ramfs,
    vmalloc allocations and other unswappable things quite easily and without
    many bad side effects, since we still have a huge ZONE_NORMAL to do future
    allocations from.

    However, as long as the number of free pages in the highmem zone is below
    zone->pages_high, kswapd will continue swapping things out from
    ZONE_NORMAL, too!

    Sami Farin managed to get his system into a state where kswapd had freed
    about 700MB of low memory and was still "going strong".

    The attached patch will make kswapd stop paging out data from zones when
    there is more than enough memory free. We do go above zone->pages_high in
    order to keep pressure between zones equal in normal circumstances, but the
    patch should prevent the kind of excesses that made Sami's computer totally
    unusable.

    Signed-off-by: Rik van Riel
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • vmalloc() returns a void pointer, so there's no need to cast its
    return value in mm/page_alloc.c::zone_wait_table_init().

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • A while back, Nick Piggin introduced a patch to reduce the node memory
    usage for small files (commit cfd9b7df4abd3257c9e381b0e445817b26a51c0c):

    -#define RADIX_TREE_MAP_SHIFT 6
    +#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)

    Unfortunately, he didn't take into account the fact that the
    calculation of the maximum path was based on an assumption of having
    to round up:

    #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)

    So, if CONFIG_BASE_SMALL is set, you will end up with a
    RADIX_TREE_MAX_PATH that is one greater than necessary. The practical
    upshot of this is just a bit of wasted memory (one long in the
    height_to_maxindex array, an extra pre-allocated radix tree node per
    cpu, and extra stack usage in a couple of functions), but it seems
    worth getting right.

    It's also worth noting that I never build with CONFIG_BASE_SMALL.
    What I did to test this was duplicate the code in a small user-space
    program and check the results of the calculations for max path and the
    contents of the height_to_maxindex array.
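
    A user-space sketch of that arithmetic (assuming the intended path length is
    ceil(RADIX_TREE_INDEX_BITS / RADIX_TREE_MAP_SHIFT) + 1, which is what the
    rounding argument above implies):

        #include <stdio.h>

        #define RADIX_TREE_INDEX_BITS (8 * sizeof(unsigned long))

        static void show(unsigned int shift)
        {
                unsigned long bits = RADIX_TREE_INDEX_BITS;
                unsigned long macro = bits / shift + 2;                 /* current macro */
                unsigned long need = (bits + shift - 1) / shift + 1;    /* assumed intent */

                printf("shift %u: macro gives %lu, %lu needed\n", shift, macro, need);
        }

        int main(void)
        {
                show(6);        /* CONFIG_BASE_SMALL=n: the two agree */
                show(4);        /* CONFIG_BASE_SMALL=y: macro is one too large */
                return 0;
        }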

    Signed-off-by: Jeff Moyer
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     
  • nobh mode error handling is not just pretty slack, it's wrong.

    One cannot zero out the whole page to ensure new blocks are zeroed, because
    it just brings the whole page "uptodate" with zeroes even if that may not
    be the correct uptodate data. Also, other parts of the page may already
    contain dirty data which would get lost by zeroing it out. Thirdly, the
    writeback of zeroes to the new blocks will also erase existing blocks. All
    these conditions are pagecache and/or filesystem corruption.

    The problem comes about because we didn't keep track of which buffers
    actually are new or old. However it is not enough to keep only this state,
    because at the point we start dirtying parts of the page (new blocks, with
    zeroes), the handling of IO errors becomes impossible without buffers: the
    page may be only partially uptodate, in which case the page flags alone
    cannot capture the state of the parts of the page.

    So allocate all buffers for the page upfront, but leave them unattached so
    that they don't pick up any other references and can be freed when we're
    done. If the error path is hit, then zero the new buffers as the regular
    buffer path does, then attach the buffers to the page so that it can
    actually be written out correctly and be subject to the normal IO error
    handling paths.

    As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
    systems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Move duplicated code from the end_buffer_read_XXX methods into a separate
    helper function.

    Signed-off-by: Dmitry Monakhov
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Monakhov
     
  • A NULL pointer means that the object was not allocated. One cannot
    determine the size of an object that has not been allocated. Currently we
    return 0 but we really should BUG() on attempts to determine the size of
    something nonexistent.

    krealloc() interprets NULL to mean a zero sized object. Handle that
    separately in krealloc().

    Signed-off-by: Christoph Lameter
    Acked-by: Pekka Enberg
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The calculation of pgoff in do_linear_fault() should use PAGE_SHIFT and not
    PAGE_CACHE_SHIFT since vma->vm_pgoff is in units of PAGE_SIZE and not
    PAGE_CACHE_SIZE. At the moment linux/pagemap.h has PAGE_CACHE_SHIFT
    defined as PAGE_SHIFT, but should that ever change this calculation would
    break.
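
    The calculation in question is essentially the following (a sketch, not the
    verbatim kernel source):

        /* vma->vm_pgoff is in PAGE_SIZE units, so the offset derived from the
         * faulting address must be shifted by PAGE_SHIFT, not PAGE_CACHE_SHIFT. */
        pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
                        + vma->vm_pgoff;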

    Signed-off-by: Dean Nelson
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • kfree(NULL) would normally occur only in error paths, and
    kfree(ZERO_SIZE_PTR) is uncommon as well, so let's use unlikely() for the
    condition check in SLUB's and SLOB's kfree() to optimize for the common
    case. SLAB has this already.
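
    The check in question then looks roughly like this at the top of kfree()
    (a sketch; ZERO_OR_NULL_PTR covers both NULL and the zero-size dummy pointer):

        void kfree(const void *x)
        {
                if (unlikely(ZERO_OR_NULL_PTR(x)))
                        return;
                /* ... normal slab freeing path ... */
        }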

    Signed-off-by: Satyam Sharma
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     
  • Move the definitions of struct mm_struct and struct vma_area_struct to
    include/mm_types.h. This allows more functions in asm/pgtable.h and friends
    to be defined with inline assemblies instead of macros. Compile tested on
    i386, powerpc, powerpc64, s390-32, s390-64 and x86_64.

    [aurelien@aurel32.net: build fix]
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Aurelien Jarno
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     
  • Rather than sign direct radix-tree pointers with a special bit, sign the
    indirect one that hangs off the root. This means that, given a lookup_slot
    operation, the invalid result will be differentiated from the valid
    (previously, valid results could have the bit either set or clear).

    This does not affect slot lookups which occur under lock -- they can never
    return an invalid result. It is needed in future for the lockless pagecache.
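
    The trick itself is ordinary low-bit pointer tagging. A self-contained
    user-space sketch of the idea (not the kernel's radix-tree code):

        #include <assert.h>
        #include <stdint.h>
        #include <stdio.h>

        #define INDIRECT_BIT 1UL        /* nodes are word-aligned, so bit 0 is free */

        /* Tag a pointer that refers to an internal node rather than a data item. */
        static void *tag_indirect(void *node)
        {
                return (void *)((uintptr_t)node | INDIRECT_BIT);
        }

        static int is_indirect(const void *p)
        {
                return ((uintptr_t)p & INDIRECT_BIT) != 0;
        }

        static void *untag(void *p)
        {
                return (void *)((uintptr_t)p & ~INDIRECT_BIT);
        }

        int main(void)
        {
                long item = 42;
                void *slot = tag_indirect(&item);

                assert(is_indirect(slot));
                printf("*untagged = %ld\n", *(long *)untag(slot));
                return 0;
        }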

    Signed-off-by: Nick Piggin
    Acked-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin