30 Apr, 2013
1 commit
-
There is an ifdef in page_cache_get_speculative() that checks for !SMP
and TREE_RCU, which has been an impossible combination since the advent
of TINY_RCU. The ifdef enables a fastpath that is valid when preemption
is disabled by rcu_read_lock() on UP systems, which is the case when
TINY_RCU is enabled. This commit therefore adjusts the ifdef to
generate the fastpath when TINY_RCU is enabled.
Signed-off-by: Paul E. McKenney
Reported-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
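A rough sketch of the adjusted guard in include/linux/pagemap.h; the shape is approximated from the commit description, not copied verbatim:

    static inline int page_cache_get_speculative(struct page *page)
    {
        VM_BUG_ON(in_interrupt());
    #ifdef CONFIG_TINY_RCU
        /* rcu_read_lock() disables preemption on UP, so the page cannot
         * be freed under us and a plain increment suffices */
        VM_BUG_ON(page_count(page) == 0);
        atomic_inc(&page->_count);
    #else
        if (unlikely(!get_page_unless_zero(page)))
            return 0;    /* page is being freed; caller must retry */
    #endif
        return 1;
    }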
22 Feb, 2013
1 commit
-
Create a helper function to check if a backing device requires stable
page writes and, if so, perform the necessary wait. Then, make it so
that all points in the memory manager that handle making pages writable
use the helper function. This should provide stable page write support
to most filesystems, while eliminating unnecessary waiting for devices
that don't require the feature.
Before this patchset, all filesystems would block, regardless of whether
or not it was necessary. ext3 would wait, but still generate occasional
checksum errors. The network filesystems were left to do their own
thing, so they'd wait too.
After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it. ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait. Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all.
The blocking behavior is back to what it was before 3.0 if you don't
have a disk requiring stable page writes.
Here's the result of using dbench to test latency on ext2:
3.8.0-rc3:
Operation      Count    AvgLat    MaxLat
----------------------------------------
WriteX        109347     0.028    59.817
ReadX         347180     0.004     3.391
Flush          15514    29.828   287.283
Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms
3.8.0-rc3 + patches:
WriteX        105556     0.029     4.273
ReadX         335004     0.005     4.112
Flush          14982    30.540   298.634
Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms
As you can see, the maximum write latency drops considerably with this
patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave
similarly, but see the cover letter for those results.
Signed-off-by: Darrick J. Wong
Acked-by: Steven Whitehouse
Reviewed-by: Jan Kara
Cc: Adrian Hunter
Cc: Andy Lutomirski
Cc: Artem Bityutskiy
Cc: Joel Becker
Cc: Mark Fasheh
Cc: Jens Axboe
Cc: Eric Van Hensbergen
Cc: Ron Minnich
Cc: Latchesar Ionkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
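A minimal sketch of the intended call pattern, assuming the helper is named wait_for_stable_page() per this patchset; a filesystem's page_mkwrite handler (myfs_page_mkwrite is hypothetical) calls it before returning the page writable:

    static int myfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
    {
        struct page *page = vmf->page;

        lock_page(page);
        /* ... verify page->mapping, extend the file, dirty the page ... */

        /* blocks only if the backing device actually requires stable
         * pages (e.g. for DIF/DIX checksums); otherwise returns at once */
        wait_for_stable_page(page);
        return VM_FAULT_LOCKED;
    }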
12 Dec, 2012
1 commit
-
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.
This patch introduces a common interface to help a balloon driver make
its page set movable by compaction, thus allowing the system to better
leverage the compaction efforts on memory defragmentation.
[akpm@linux-foundation.org: use PAGE_FLAGS_CHECK_AT_PREP, s/__balloon_page_flags/page_flags_cleared/, small cleanups]
[rientjes@google.com: allow balloon compaction for any system with memory compaction enabled, which is the defconfig]
Signed-off-by: Rafael Aquini
Acked-by: Mel Gorman
Cc: Rusty Russell
Cc: "Michael S. Tsirkin"
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Konrad Rzeszutek Wilk
Cc: Minchan Kim
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
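A rough sketch of how a balloon driver might sit on top of such an interface; balloon_page_enqueue()/balloon_page_dequeue() follow the balloon-compaction API as described, but the surrounding driver glue is hypothetical:

    #include <linux/balloon_compaction.h>

    static int my_balloon_inflate(struct balloon_dev_info *b_dev_info)
    {
        /* allocate a page and put it on the movable balloon page list */
        struct page *page = balloon_page_enqueue(b_dev_info);

        if (!page)
            return -ENOMEM;
        /* ... tell the host about page_to_pfn(page) ... */
        return 0;
    }

    static int my_balloon_deflate(struct balloon_dev_info *b_dev_info)
    {
        /* take a page back off the list so it can be returned to the guest */
        struct page *page = balloon_page_dequeue(b_dev_info);

        if (!page)
            return -EAGAIN;
        /* ... tell the host, then free the page ... */
        return 0;
    }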
01 Aug, 2012
1 commit
-
In order to teach filesystems to handle swap cache pages, three new page
functions are introduced:
pgoff_t page_file_index(struct page *);
loff_t page_file_offset(struct page *);
struct address_space *page_file_mapping(struct page *);
page_file_index() - gives the offset of this page in the file in
PAGE_CACHE_SIZE blocks. Like page->index is for mapped pages, this
function also gives the correct index for PG_swapcache pages.
page_file_offset() - uses page_file_index(), so that it will give the
expected result, even for PG_swapcache pages.
page_file_mapping() - gives the mapping backing the actual page; that is,
for swap cache pages it will give swap_file->f_mapping.
Signed-off-by: Peter Zijlstra
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Christoph Hellwig
Cc: David S. Miller
Cc: Eric B Munson
Cc: Eric Paris
Cc: James Morris
Cc: Mel Gorman
Cc: Mike Christie
Cc: Neil Brown
Cc: Sebastian Andrzej Siewior
Cc: Trond Myklebust
Cc: Xiaotian Feng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
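A short sketch of why these matter: a writepage path that must also work for swapcache pages (e.g. swap-over-NFS) uses the page_file_* variants instead of touching page->mapping and page->index directly; myfs_swap_writepage and write_block_at are hypothetical names:

    static int myfs_swap_writepage(struct page *page, struct writeback_control *wbc)
    {
        /* for PG_swapcache pages this is swap_file->f_mapping, not the
         * swapper address space found in page->mapping */
        struct address_space *mapping = page_file_mapping(page);
        loff_t pos = page_file_offset(page);   /* file offset, even for swapcache */

        return write_block_at(mapping, pos, page);   /* hypothetical helper */
    }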
30 May, 2012
1 commit
-
Commit f56f821feb7b ("mm: extend prefault helpers to fault in more than
PAGE_SIZE") added in the new functions: fault_in_multipages_writeable()
and fault_in_multipages_readable().
However, we currently see:
include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function
include/linux/pagemap.h:492: note: 'ret' was declared here
Unlike a lot of gcc nags, this one appears somewhat legit. i.e. passing
in an invalid negative value of "size" does make it look like all the
conditionals in there would be bypassed and the uninitialized value would
be returned.
Signed-off-by: Paul Gortmaker
Cc: Daniel Vetter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
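The fix amounts to giving 'ret' a defined value up front; a close approximation of the helper after the change:

    static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
    {
        int ret = 0;    /* initialized: a degenerate 'size' now returns 0 */
        char __user *end = uaddr + size - 1;

        if (unlikely(size == 0))
            return ret;

        /* touch one byte per page; __put_user() faults the page in */
        while (uaddr <= end) {
            ret = __put_user(0, uaddr);
            if (ret != 0)
                return ret;
            uaddr += PAGE_SIZE;
        }

        /* the stride may step past 'end' without touching its page */
        if (((unsigned long)uaddr & PAGE_MASK) ==
            ((unsigned long)end & PAGE_MASK))
            ret = __put_user(0, end);

        return ret;
    }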
24 Apr, 2012
1 commit
-
This regression has been introduced in
commit f56f821feb7b36223f309e0ec05986bb137ce418
Author: Daniel Vetter
Date: Sun Mar 25 19:47:41 2012 +0200
    mm: extend prefault helpers to fault in more than PAGE_SIZE
I have failed to notice this because x86 asm seems to happily compile
things as-is.
Reported-by: Geert Uytterhoeven
Signed-off-by: Dave Airlie
27 Mar, 2012
1 commit
-
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.
Add new functions in pagemap.h to make that possible.
Also kill a copy&pasted spurious space in both functions while at it.
v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the filemap.c hotpath.
My gcc 4.6 here seems to do the right thing and indeed reap these branches
where not needed.
v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.
v4: Adjust comment to CodingStyle and fix spelling.
Acked-by: Andrew Morton
Signed-off-by: Daniel Vetter
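A hedged sketch of the intended caller pattern: prefault the whole user buffer before taking locks under which page faults are forbidden (the ioctl-argument names are illustrative):

    char __user *user_data = (char __user *)(uintptr_t)args->data_ptr;

    /* fault in every page of the buffer up front; bail out cleanly
     * before any locks are held */
    if (fault_in_multipages_readable(user_data, args->size))
        return -EFAULT;

    /* ... take locks, then copy with pagefaults disabled ... */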
26 Jul, 2011
1 commit
-
The often-NULL data arg to read_cache_page() and read_mapping_page()
functions is misdescribed as "destination for read data": no, it's the
first arg to the filler function, often struct file * to ->readpage().
Satisfy checkpatch.pl on those filler prototypes, and tidy up the
declarations in linux/pagemap.h.
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
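The declarations read roughly as follows (sketch; parameter names approximate), with 'data' forwarded to the filler rather than used as an output buffer:

    /* typedef int filler_t(void *, struct page *); -- from linux/pagemap.h */
    struct page *read_cache_page(struct address_space *mapping, pgoff_t index,
                                 filler_t *filler, void *data);

    /* e.g. read_mapping_page() passes the aops' readpage as the filler,
     * and 'data' ends up as its first (struct file *) argument */
    page = read_cache_page(mapping, index,
                           (filler_t *)mapping->a_ops->readpage, file);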
10 Jun, 2011
1 commit
-
Create a new CONFIG_PREEMPT_COUNT that handles the inc/dec
of the preempt count offset independently, so that the offset
can be updated by preempt_disable() and preempt_enable()
even without CONFIG_PREEMPT being set.
This prepares for making CONFIG_DEBUG_SPINLOCK_SLEEP work
with !CONFIG_PREEMPT, where it currently doesn't detect
code that sleeps inside explicit preemption-disabled
sections.
Signed-off-by: Frederic Weisbecker
Acked-by: Paul E. McKenney
Cc: Ingo Molnar
Cc: Peter Zijlstra
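Conceptually the split gives preempt.h the following shape (hedged sketch, not the exact patch):

    /* counting is keyed off CONFIG_PREEMPT_COUNT, which CONFIG_PREEMPT
     * and the sleep-inside-atomic debug code can both select */
    #ifdef CONFIG_PREEMPT_COUNT
    #define preempt_disable() \
    do { \
        inc_preempt_count(); \
        barrier(); \
    } while (0)
    #else
    /* no counting at all: disabling preemption is a no-op */
    #define preempt_disable()    do { } while (0)
    #endif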
25 May, 2011
2 commits
-
Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
readahead page allocations are completely optional. They are OK to fail
and in particular shall not trigger OOM on themselves.
Reported-by: Dave Young
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Wu Fengguang
Reviewed-by: Minchan Kim
Reviewed-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
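A sketch of what the allocation helper plausibly looks like with these flags folded in (the name page_cache_alloc_readahead() is an assumption based on this change):

    static inline struct page *page_cache_alloc_readahead(struct address_space *x)
    {
        /* readahead is opportunistic: fail quietly rather than retry,
         * warn, or push the system into OOM */
        return __page_cache_alloc(mapping_gfp_mask(x) |
                                  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN);
    }

-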
commit 2687a356 ("Add lock_page_killable") introduced a killable
lock_page(). Similarly, this patch introduces a killable
wait_on_page_locked().
Signed-off-by: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Cc: Matthew Wilcox
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
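A minimal usage sketch, assuming the new helper mirrors lock_page_killable() and returns -EINTR when a fatal signal arrives:

    /* wait for someone else's I/O to unlock the page, but let a fatal
     * signal (e.g. an OOM kill) break the wait */
    if (!PageUptodate(page)) {
        int err = wait_on_page_locked_killable(page);

        if (err)
            return err;    /* -EINTR: unwind instead of hanging */
    }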
25 Mar, 2011
1 commit
-
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
23 Mar, 2011
4 commits
-
Now we have renamed remove_from_page_cache() to delete_from_page_cache().
For consistency with __remove_from_swap_cache and remove_from_swap_cache,
change the internal page cache handling function's name, too.
Signed-off-by: Minchan Kim
Cc: Christoph Hellwig
Acked-by: Hugh Dickins
Acked-by: Mel Gorman
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Johannes Weiner
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Now delete_from_page_cache() replaces remove_from_page_cache(), so we
remove remove_from_page_cache() itself; filesystems and anything else out
of mainline will notice at compile time and can fix themselves up.
Signed-off-by: Minchan Kim
Cc: Christoph Hellwig
Acked-by: Hugh Dickins
Acked-by: Mel Gorman
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Johannes Weiner
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Presently we increase the page refcount in add_to_page_cache() but don't
decrease it in remove_from_page_cache(). Such asymmetry adds confusion,
requiring that callers notice it and that a comment explain why they
release a page reference. It's not a good API.
A long time ago, Hugh tried it (http://lkml.org/lkml/2004/10/24/140) but
gave up because reiser4's drop_page() had to unlock the page between
removing it from page cache and doing the page_cache_release(). But now
the situation is changed. I think at least things in current mainline
don't have any obstacles. The problem is for out-of-mainline filesystems
- if they have done such things as reiser4, this patch could be a problem
but they will discover this at compile time since we remove
remove_from_page_cache().
This patch:
This function works as just a wrapper around remove_from_page_cache(). The
difference is that it drops the page reference itself, so the caller has
to make sure it holds a page reference before calling.
This patch is ready for removing remove_from_page_cache().
Signed-off-by: Minchan Kim
Cc: Christoph Hellwig
Acked-by: Hugh Dickins
Acked-by: Mel Gorman
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Johannes Weiner
Reviewed-by: KOSAKI Motohiro
Cc: Edward Shishkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
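The calling-convention change in a nutshell (sketch):

    /* before: asymmetric; callers had to remember the extra release */
    remove_from_page_cache(page);
    page_cache_release(page);

    /* after: delete_from_page_cache() drops the pagecache reference
     * itself; the caller must still hold its own reference across the call */
    delete_from_page_cache(page);

-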
This function basically does:
remove_from_page_cache(old);
page_cache_release(old);
add_to_page_cache_locked(new);
Except it does this atomically, so there's no possibility for the "add" to
fail because of a race.
If memory cgroups are enabled, then the memory cgroup charge is also moved
from the old page to the new.
This function is currently used by fuse to move pages into the page cache
on read, instead of copying the page contents.
[minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()]
Signed-off-by: Miklos Szeredi
Acked-by: Rik van Riel
Acked-by: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
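A hedged usage sketch in the fuse style, assuming the signature int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask):

    /* swap a freshly filled page into the slot 'oldpage' occupies,
     * instead of memcpy()ing its contents */
    err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL);
    if (err) {
        /* the atomic radix-tree replacement could not be set up */
        unlock_page(oldpage);
        return err;
    }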
10 Mar, 2011
1 commit
-
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe
14 Jan, 2011
1 commit
-
The mapping_unevictable() has a likely() around the mapping parameter.
This mapping parameter comes from page_mapping() which has an unlikely()
that the page will be set as PAGE_MAPPING_ANON, and if so, it will return
NULL. One would think that this unlikely() means that the mapping
returned by page_mapping() would not be NULL, but where page_mapping() is
used just above mapping_unevictable(), that unlikely() is incorrect most
of the time. This means that the "likely(mapping)" in
mapping_unevictable() is incorrect most of the time.
Running the annotated branch profiler on my main box, which runs firefox,
evolution, xchat and is part of my distcc farm, I had this:
correct incorrect % Function File Line
------- --------- - -------- ---- ----
12872836 1269443893 98 mapping_unevictable pagemap.h 51
35935762 1270265395 97 page_mapping mm.h 659
1306198001 143659 0 page_mapping mm.h 657
203131478 121586 0 page_mapping mm.h 657
5415491 1116 0 page_mapping mm.h 657
74899487 1116 0 page_mapping mm.h 657
203132845 224 0 page_mapping mm.h 659
5415464 27 0 page_mapping mm.h 659
13552 0 0 page_mapping mm.h 657
13552 0 0 page_mapping mm.h 659
242630 0 0 page_mapping mm.h 657
242630 0 0 page_mapping mm.h 659
74899487 0 0 page_mapping mm.h 659
The page_mapping() is a static inline, which is why it shows up multiple
times. The mapping_unevictable() is also a static inline but seems to be
used only once in my setup.
The unlikely in page_mapping() was correct a total of 1909540379 times and
incorrect 1270533123 times, with 39% being incorrect. Perhaps this is
enough to remove the unlikely from page_mapping() as well.
Signed-off-by: Steven Rostedt
Reviewed-by: KOSAKI Motohiro
Acked-by: Nick Piggin
Acked-by: Rik van Riel
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
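The change itself is tiny; a sketch of mapping_unevictable() with the misleading hint dropped:

    static inline int mapping_unevictable(struct address_space *mapping)
    {
        if (mapping)    /* was: if (likely(mapping)) */
            return test_bit(AS_UNEVICTABLE, &mapping->flags);
        return !!mapping;
    }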
27 Oct, 2010
1 commit
-
This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs.
It introduces the FAULT_FLAG_ALLOW_RETRY flag, which indicates that the call
site wants mmap_sem to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
do_page_fault() will then re-acquire mmap_sem and retry the page fault.
It is expected that the retry will hit the same page, which will now be
cached, and thus it will complete with a low mmap_sem hold time.
Tests:
- microbenchmark: thread A mmaps a large file and does random read accesses
to the mmaped area - achieves about 55 iterations/s. Thread B does
mmap/munmap in a loop at a separate location - achieves 55 iterations/s
before, 15000 iterations/s after.
- We are seeing related effects in some applications in house, which show
significant performance regressions when running without this change.
[akpm@linux-foundation.org: fix warning & crash]
Signed-off-by: Michel Lespinasse
Acked-by: Rik van Riel
Acked-by: Linus Torvalds
Cc: Nick Piggin
Reviewed-by: Wu Fengguang
Cc: Ying Han
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Thomas Gleixner
Acked-by: "H. Peter Anvin"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
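A hedged sketch of the arch fault-handler side of the protocol, modelled on the x86 do_page_fault() shape (local names illustrative):

    unsigned int flags = FAULT_FLAG_ALLOW_RETRY;    /* opt in to retry */
    int fault;

    retry:
        down_read(&mm->mmap_sem);
        /* ... look up and validate the vma ... */
        fault = handle_mm_fault(mm, vma, address, flags);

        if (fault & VM_FAULT_RETRY) {
            /* filemap_fault() already dropped mmap_sem and is waiting
             * for the page; retry once, this time synchronously */
            flags &= ~FAULT_FLAG_ALLOW_RETRY;
            goto retry;
        }
        up_read(&mm->mmap_sem);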
13 Aug, 2010
1 commit
-
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
hwpoison: rename CONFIG
HWPOISON, hugetlb: support hwpoison injection for hugepage
HWPOISON, hugetlb: detect hwpoison in hugetlb code
HWPOISON, hugetlb: isolate corrupted hugepage
HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
HWPOISON, hugetlb: enable error handling path for hugepage
hugetlb, rmap: add reverse mapping for hugepage
hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h
Fix up trivial conflicts in mm/memory-failure.c
11 Aug, 2010
2 commits
-
This patch adds reverse mapping feature for hugepage by introducing
mapcount for shared/private-mapped hugepage and anon_vma for
private-mapped hugepage.
While hugepage is not currently swappable, reverse mapping can be useful
for the memory error handler.
Without this patch, the memory error handler cannot identify processes
using the bad hugepage nor unmap it from them. That is:
- for shared hugepage:
we can collect processes using a hugepage through pagecache,
but can not unmap the hugepage because of the lack of mapcount.
- for privately mapped hugepage:
we can neither collect processes nor unmap the hugepage.
This patch solves these problems.
This patch includes the bug fix given by commit 23be7468e8, so reverts it.
Dependency:
"hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"
ChangeLog since May 24.
- create hugetlb_inline.h and move is_vm_hugetlb_page() in it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.
ChangeLog since May 13.
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 23be7468 completely
Signed-off-by: Naoya Horiguchi
Cc: Andi Kleen
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Larry Woodman
Cc: Lee Schermerhorn
Acked-by: Fengguang Wu
Acked-by: Mel Gorman
Signed-off-by: Andi Kleen
-
is_vm_hugetlb_page() is a widely used inline function to insert hooks
into hugetlb code.
But we can't use it in pagemap.h because of a circular dependency of
the header files. This patch removes this limitation.
Acked-by: Mel Gorman
Acked-by: Fengguang Wu
Signed-off-by: Naoya Horiguchi
Signed-off-by: Andi Kleen
10 Aug, 2010
1 commit
-
Avoid quite a lot of warnings in header files in gcc 4.6 -Wall builds.
Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Jan, 2010
1 commit
-
It's a simplified 'read_cache_page()' which takes a page allocation
flag, so that different paths can control how aggressive the memory
allocations are that populate an address space.
In particular, the intel GPU object mapping code wants to be able to do
a certain amount of its own internal memory management by automatically
shrinking the address space when memory starts getting tight. This
allows it to dynamically use different memory allocation policies on a
per-allocation basis, rather than depend on the (static) address space
gfp policy.
The actual new function is a one-liner, but re-organizing the helper
functions to the point where you can do this with a single line of code
is what most of the patch is all about.
Tested-by: Chris Wilson
Signed-off-by: Linus Torvalds
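A short usage sketch in the i915 spirit, with the caller relaxing the gfp mask per lookup (the exact mask shown is illustrative):

    /* populate the mapping, but fail fast instead of triggering
     * reclaim/OOM, so the driver can shrink its own caches and retry */
    page = read_cache_page_gfp(mapping, index,
                               mapping_gfp_mask(mapping) |
                               __GFP_NORETRY | __GFP_NOWARN);
    if (IS_ERR(page))
        return PTR_ERR(page);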
22 Aug, 2009
1 commit
-
A couple of references to CONFIG_CLASSIC_RCU have survived.
Although these are harmless, it is past time for them to go.
The one in hardirq.h is strictly a readability problem.
The two in pagemap.h appear to disable a !SMP performance
optimization (which this patch re-enables).
This does raise the issue as to whether pagemap.h should really
be referring to the CPU implementation. Long term, I intend to
make the RCU implementation driven by CONFIG_PREEMPT, at which
point these should change from defined(CONFIG_TREE_RCU) to
!defined(CONFIG_PREEMPT). In the meantime, is there something
else that could be done in pagemap.h?
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Jun, 2009
1 commit
-
Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this
configurability is unnecessary.
Signed-off-by: KOSAKI Motohiro
Cc: Johannes Weiner
Cc: Andi Kleen
Acked-by: Minchan Kim
Cc: David Woodhouse
Cc: Matt Mackall
Cc: Rik van Riel
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2009
2 commits
-
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.
This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.
Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne
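A hedged sketch of installing such a monitor, assuming the new function is add_page_wait_queue(struct page *, wait_queue_t *) and using a custom wake function in the CacheFiles style (my_page_waiter and the monitor struct are hypothetical):

    /* runs from the unlock_page() wakeup path */
    static int my_page_waiter(wait_queue_t *wait, unsigned mode, int sync, void *key)
    {
        /* the backing page was unlocked: defer the real copy work,
         * since we are called in wakeup context */
        return 0;
    }

    init_waitqueue_func_entry(&monitor->wait, my_page_waiter);
    add_page_wait_queue(backing_page, &monitor->wait);

-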
A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next
available AS flag while the Unevictable LRU was under development. The
Unevictable LRU was using the same flag and "no one" noticed. Current
mainline, since 2.6.28, has the same value for two symbolic flag names.
So, define a unique flag value for AS_UNEVICTABLE--up close to the other
flags, [at the cost of an additional #ifdef] so we'll notice next time.
Note that the #ifdef is not actually required, if we don't mind having the
unused flag value defined.
Replace #defines with an enum.
Signed-off-by: Lee Schermerhorn
Cc: [2.6.28.x, 2.6.29.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
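The resulting pagemap.h shape, roughly (sketch; comments abbreviated):

    enum mapping_flags {
        AS_EIO          = __GFP_BITS_SHIFT + 0,   /* IO error on async write */
        AS_ENOSPC       = __GFP_BITS_SHIFT + 1,   /* ENOSPC on async write */
        AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2,   /* under mm_take_all_locks() */
    #ifdef CONFIG_UNEVICTABLE_LRU
        AS_UNEVICTABLE  = __GFP_BITS_SHIFT + 3,   /* now a distinct value */
    #endif
    };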
05 Jan, 2009
1 commit
-
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in its write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin
Reviewed-by: KOSAKI Motohiro
Cc: [2.6.28.x]
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds
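A minimal sketch of a filesystem write_begin using the renamed helper and honouring the new flag (the myfs_* names are hypothetical):

    static int myfs_write_begin(struct file *file, struct address_space *mapping,
                                loff_t pos, unsigned len, unsigned flags,
                                struct page **pagep, void **fsdata)
    {
        pgoff_t index = pos >> PAGE_CACHE_SHIFT;
        struct page *page;

        /* AOP_FLAG_NOFS (e.g. set by page_symlink) flows through, so the
         * pagecache allocation is done without __GFP_FS */
        page = grab_cache_page_write_begin(mapping, index, flags);
        if (!page)
            return -ENOMEM;
        *pagep = page;

        /* private allocations should honour the flag too:
         * gfp_t gfp = (flags & AOP_FLAG_NOFS) ? GFP_NOFS : GFP_KERNEL; */
        return 0;
    }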
20 Oct, 2008
4 commits
-
trylock_page, unlock_page open and close a critical section. Hence,
we can use the lock bitops to get the desired memory ordering.
Also, mark trylock as likely to succeed (and remove the annotation from
callers).
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
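The result, roughly (sketch of the pagemap.h definition):

    /* the _lock bitop provides acquire semantics, so no extra barrier
     * is needed when the trylock succeeds */
    static inline int trylock_page(struct page *page)
    {
        return likely(!test_and_set_bit_lock(PG_locked, &page->flags));
    }

-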
Setting and clearing the page locked when inserting it into swapcache /
pagecache when it has no other references can use non-atomic page flags
operations because no other CPU may be operating on it at this time.
This saves one atomic operation when inserting a page into pagecache.
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
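Sketch of the non-atomic variants this enables for pages not yet visible to anyone else:

    /* plain __set_bit/__clear_bit are safe only while the page has no
     * other references (e.g. just before pagecache insertion) */
    static inline void __set_page_locked(struct page *page)
    {
        __set_bit(PG_locked, &page->flags);
    }

    static inline void __clear_page_locked(struct page *page)
    {
        __clear_bit(PG_locked, &page->flags);
    }

-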
Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be
kept on the normal LRU, since scanning them is a waste of time and might
throw off kswapd's balancing algorithms. Place them on the unevictable
LRU list instead.
Use the AS_UNEVICTABLE flag to mark the address_space of SHM_LOCKed shared
memory regions as unevictable. Then these pages will be culled off the
normal LRU lists during vmscan.
Add a new wrapper function to clear the mapping's unevictable state when/if
the shared memory segment is munlocked.
Add 'scan_mapping_unevictable_pages()' to mm/vmscan.c to scan all pages in
the shmem segment's mapping [struct address_space] for evictability now
that they're no longer locked. If so, move them to the appropriate zone
lru list.
Changes depend on [CONFIG_]UNEVICTABLE_LRU.
[kosaki.motohiro@jp.fujitsu.com: revert shm change]
Signed-off-by: Lee Schermerhorn
Signed-off-by: Rik van Riel
Signed-off-by: Kosaki Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Christoph Lameter pointed out that ram disk pages also clutter the LRU
lists. When vmscan finds them dirty and tries to clean them, the ram disk
writeback function just redirties the page so that it goes back onto the
active list. Round and round she goes...
With the ram disk driver [rd.c] replaced by the newer 'brd.c', this is no
longer the case, as ram disk pages are no longer maintained on the lru.
[This makes them unmigratable for defrag or memory hot remove, but that
can be addressed by a separate patch series.] However, the ramfs pages
behave like ram disk pages used to, so:
Define a new address_space flag [shares the address_space flags member with
the mapping's gfp mask] to indicate that the address space contains all
unevictable pages. This will provide for efficient testing of ramfs pages
in page_evictable().
Also provide wrapper functions to set/test the unevictable state, to
minimize #ifdefs in the ramfs driver and any other users of this facility.
Set the unevictable state on address_space structures for new ramfs
inodes. Test the unevictable state in page_evictable() to cull
unevictable pages.
These changes depend on [CONFIG_]UNEVICTABLE_LRU.
[riel@redhat.com: undo the brd.c part]
Signed-off-by: Lee Schermerhorn
Signed-off-by: Rik van Riel
Debugged-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
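A sketch of the wrapper usage for a ramfs-style inode (the surrounding function is illustrative):

    /* mark every page of this mapping unevictable at inode setup time */
    static void my_ramfs_init_inode(struct inode *inode)
    {
        mapping_set_unevictable(inode->i_mapping);
    }

    /* and, conceptually, in vmscan's page_evictable() test: */
    if (mapping_unevictable(page_mapping(page)))
        return 0;    /* never reclaim: the page itself is the backing store */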
05 Aug, 2008
1 commit
-
Converting the page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdepping of the page lock.
Signed-off-by: Nick Piggin
Acked-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Andrew Morton
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds
30 Jul, 2008
1 commit
-
Implement lockless get_user_pages_fast for 64-bit powerpc.
Page table existence is guaranteed with RCU, and speculative page references
are used to take a reference to the pages without having a prior existence
guarantee on them.
Signed-off-by: Nick Piggin
Signed-off-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Benjamin Herrenschmidt
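For reference, the call shape the lockless fast path exposes (sketch; the fallback is elided to a comment):

    /* try the lockless walk first */
    nr = get_user_pages_fast(start, nr_pages, 1 /* write */, pages);
    if (nr < nr_pages) {
        /* partial success: take mmap_sem and finish the remainder
         * with the ordinary get_user_pages() slow path */
    }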
29 Jul, 2008
1 commit
-
mm_take_all_locks holds off reclaim from an entire mm_struct. This allows
mmu notifiers to register into the mm at any time with the guarantee that
no mmu operation is in progress on the mm.
This operation locks against the VM for all pte/vma/mm related operations
that could ever happen on a certain mm. This includes vmtruncate,
try_to_unmap, and all page faults.
The caller must take the mmap_sem in write mode before calling
mm_take_all_locks(). The caller isn't allowed to release the mmap_sem
until mm_drop_all_locks() returns.
mmap_sem in write mode is required in order to block all operations that
could modify pagetables and free pages without need of altering the vma
layout (for example populate_range() with nonlinear vmas). It's also
needed in write mode to avoid new anon_vmas being associated with existing
vmas.
A single task can't take more than one mm_take_all_locks() in a row or it
would deadlock.
mm_take_all_locks() and mm_drop_all_locks() are expensive operations that
may have to take thousands of locks.
mm_take_all_locks() can fail if it's interrupted by signals.
When mmu_notifier_register returns, we must be sure that the driver is
notified if some task is in the middle of a vmtruncate for the 'mm' where
the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
is run around the vmtruncation but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end). Same problem for rmap paths. And
we have to remove page pinning to avoid replicating the tlb_gather logic
inside KVM (and GRU doesn't work well with page pinning regardless of
needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
the page, kvm would have no way to notice that it mapped into sptes a page
that is going into the freelist without a chance of any further
mmu_notifier notification.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli
Acked-by: Linus Torvalds
Cc: Christoph Lameter
Cc: Jack Steiner
Cc: Robin Holt
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Kanoj Sarcar
Cc: Roland Dreier
Cc: Steve Wise
Cc: Avi Kivity
Cc: Hugh Dickins
Cc: Rusty Russell
Cc: Anthony Liguori
Cc: Chris Wright
Cc: Marcelo Tosatti
Cc: Eric Dumazet
Cc: "Paul E. McKenney"
Cc: Izik Eidus
Cc: Anthony Liguori
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
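The registration-time pattern this enables, sketched (assuming mm_take_all_locks() returns 0 on success and an -errno such as -EINTR when interrupted by a signal):

    /* quiesce the whole mm while wiring up an mmu notifier */
    down_write(&mm->mmap_sem);
    ret = mm_take_all_locks(mm);
    if (ret == 0) {
        /* no pte/vma operation can be in flight: safe to register */
        hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);  /* illustrative */
        mm_drop_all_locks(mm);
    }
    up_write(&mm->mmap_sem);
    return ret;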
27 Jul, 2008
1 commit
-
If we can be sure that elevating the page_count on a pagecache page will
pin it, we can speculatively run this operation, and subsequently check to
see if we hit the right page rather than relying on holding a lock or
otherwise pinning a reference to the page.
This can be done if get_page/put_page behaves consistently throughout the
whole tree (ie. if we "get" the page after it has been used for something
else, we must be able to free it with a put_page).
Actually, there is a period where the count behaves differently: when the
page is free or if it is a constituent page of a compound page. We need
an atomic_inc_not_zero operation to ensure we don't try to grab the page
in either case.
This patch introduces the core locking protocol to the pagecache (ie.
adds page_cache_get_speculative, and tweaks some update-side code to make
it work).
Thanks to Hugh for pointing out an improvement to the algorithm setting
page_count to zero when we have control of all references, in order to
hold off speculative getters.
[kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
[hugh@veritas.com: fix add_to_page_cache]
[akpm@linux-foundation.org: repair a comment]
Signed-off-by: Nick Piggin
Cc: Jeff Garzik
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Hugh Dickins
Cc: "Paul E. McKenney"
Reviewed-by: Peter Zijlstra
Signed-off-by: Daisuke Nishimura
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Hugh Dickins
Acked-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
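The lookup side of the protocol, sketched in the shape of a lockless find_get_page() (simplified):

    rcu_read_lock();
    repeat:
        page = radix_tree_lookup(&mapping->page_tree, offset);
        if (page) {
            if (!page_cache_get_speculative(page))
                goto repeat;    /* count was zero: page being freed */
            /* the page may have been freed and reused while we took
             * the reference; verify we pinned the right one */
            if (unlikely(page != radix_tree_lookup(&mapping->page_tree,
                                                   offset))) {
                page_cache_release(page);
                goto repeat;
            }
        }
        rcu_read_unlock();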
25 Jul, 2008
1 commit
-
This is called on a per-page basis and in the vast majority of cases
`error' is zero.
Cc: Guillaume Chazarain
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
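The helper in question is mapping_set_error(); with the hint added it reads roughly:

    static inline void mapping_set_error(struct address_space *mapping, int error)
    {
        if (unlikely(error)) {    /* almost always zero on the per-page path */
            if (error == -ENOSPC)
                set_bit(AS_ENOSPC, &mapping->flags);
            else
                set_bit(AS_EIO, &mapping->flags);
        }
    }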
14 Feb, 2008
1 commit
-
FASTCALL() is always expanded to empty, remove it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Dec, 2007
1 commit
-
This routine is like lock_page, but can be interrupted by a fatal signal.
Signed-off-by: Matthew Wilcox
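A closing usage sketch, assuming the conventional return contract (0 on success, -EINTR if a fatal signal interrupted the wait):

    /* a read path that may sleep on the page lock, but stays killable
     * so an OOM-killed task doesn't hang here */
    int err = lock_page_killable(page);

    if (err)
        return err;    /* fatal signal: abandon the read */
    /* ... page is locked; unlock_page(page) when done ... */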