14 Jan, 2011

1 commit

  • mapping_unevictable() has a likely() around its mapping parameter. That
    mapping comes from page_mapping(), which has an unlikely() that the page
    is flagged PAGE_MAPPING_ANON and, if so, returns NULL. One would think
    this unlikely() means the mapping returned by page_mapping() is rarely
    NULL, but at the call site just above mapping_unevictable() that hint is
    wrong most of the time, which in turn makes the "likely(mapping)" in
    mapping_unevictable() wrong most of the time.

    Running the annotated branch profiler on my main box which runs firefox,
    evolution, xchat and is part of my distcc farm, I had this:

       correct   incorrect   %  Function             File       Line
       -------   ---------   -  --------             ----       ----
      12872836  1269443893  98  mapping_unevictable  pagemap.h    51
      35935762  1270265395  97  page_mapping         mm.h        659
    1306198001      143659   0  page_mapping         mm.h        657
     203131478      121586   0  page_mapping         mm.h        657
       5415491        1116   0  page_mapping         mm.h        657
      74899487        1116   0  page_mapping         mm.h        657
     203132845         224   0  page_mapping         mm.h        659
       5415464          27   0  page_mapping         mm.h        659
         13552           0   0  page_mapping         mm.h        657
         13552           0   0  page_mapping         mm.h        659
        242630           0   0  page_mapping         mm.h        657
        242630           0   0  page_mapping         mm.h        659
      74899487           0   0  page_mapping         mm.h        659

    The page_mapping() is a static inline, which is why it shows up multiple
    times. The mapping_unevictable() is also a static inline but seems to be
    used only once in my setup.

    The unlikely() in page_mapping() was correct a total of 1909540379 times
    and incorrect 1270533123 times, i.e. it was wrong 39% of the time.
    Perhaps this is enough to remove the unlikely() from page_mapping() as
    well.

    Signed-off-by: Steven Rostedt
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Nick Piggin
    Acked-by: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
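
    A minimal userspace sketch of the branch hints discussed above; the
    kernel's likely()/unlikely() macros are built on __builtin_expect() in
    the same way, but check_mapping() below is a made-up stand-in, not the
    real mapping_unevictable():

        /* Build with: gcc -O2 hint.c */
        #include <stdio.h>

        #define likely(x)   __builtin_expect(!!(x), 1)
        #define unlikely(x) __builtin_expect(!!(x), 0)

        static int check_mapping(const void *mapping)
        {
            if (likely(mapping))    /* the commit drops this hint: mapping */
                return 1;           /* is NULL most of the time here       */
            return 0;
        }

        int main(void)
        {
            int dummy = 0;
            printf("%d %d\n", check_mapping(NULL), check_mapping(&dummy));
            return 0;
        }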
     

27 Oct, 2010

1 commit

  • This change reduces mmap_sem hold times that are caused by waiting for
    disk transfers when accessing file mapped VMAs.

    It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
    site wants mmap_sem to be released if blocking on a pending disk transfer.
    In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
    do_page_fault() will then re-acquire mmap_sem and retry the page fault.

    It is expected that the retry will hit the same page which will now be
    cached, and thus it will complete with a low mmap_sem hold time.

    Tests:

    - microbenchmark: thread A mmaps a large file and does random read accesses
    to the mmapped area - achieves about 55 iterations/s. Thread B does
    mmap/munmap in a loop at a separate location - achieves 55 iterations/s
    before, 15000 iterations/s after.

    - We are seeing related effects in some applications in house, which show
    significant performance regressions when running without this change.

    [akpm@linux-foundation.org: fix warning & crash]
    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
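
    A userspace analogue of the retry protocol described above: drop the
    shared lock (the kernel's mmap_sem) before blocking on slow I/O, then
    re-take it and retry the lookup. The lock, cache_lookup() and
    slow_fill() names are invented for this sketch and are not kernel APIs:

        /* Build with: gcc retry.c -lpthread */
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <unistd.h>

        static pthread_rwlock_t map_lock = PTHREAD_RWLOCK_INITIALIZER;
        static bool cached;                 /* "is the page in the cache?" */

        static bool cache_lookup(void) { return cached; }
        static void slow_fill(void)    { usleep(1000); cached = true; }

        static void fault_with_retry(void)
        {
            for (;;) {
                pthread_rwlock_rdlock(&map_lock);
                if (cache_lookup()) {              /* fast path: short hold */
                    pthread_rwlock_unlock(&map_lock);
                    return;
                }
                pthread_rwlock_unlock(&map_lock);  /* like VM_FAULT_RETRY */
                slow_fill();                       /* block without the lock */
            }
        }

        int main(void)
        {
            fault_with_retry();
            printf("fault completed after retry\n");
            return 0;
        }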
     

13 Aug, 2010

1 commit

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
    hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
    hwpoison: rename CONFIG
    HWPOISON, hugetlb: support hwpoison injection for hugepage
    HWPOISON, hugetlb: detect hwpoison in hugetlb code
    HWPOISON, hugetlb: isolate corrupted hugepage
    HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
    HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
    HWPOISON, hugetlb: enable error handling path for hugepage
    hugetlb, rmap: add reverse mapping for hugepage
    hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h

    Fix up trivial conflicts in mm/memory-failure.c

    Linus Torvalds
     

11 Aug, 2010

2 commits

  • This patch adds reverse mapping feature for hugepage by introducing
    mapcount for shared/private-mapped hugepage and anon_vma for
    private-mapped hugepage.

    While hugepage is not currently swappable, reverse mapping can be useful
    for memory error handler.

    Without this patch, memory error handler cannot identify processes
    using the bad hugepage nor unmap it from them. That is:
    - for shared hugepage:
    we can collect processes using a hugepage through pagecache,
    but can not unmap the hugepage because of the lack of mapcount.
    - for privately mapped hugepage:
    we can neither collect processes nor unmap the hugepage.
    This patch solves these problems.

    This patch includes the bug fix given by commit 23be7468e8, so it
    reverts that commit.

    Dependency:
    "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

    ChangeLog since May 24.
    - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
    - move functions setting up anon_vma for hugepage into mm/rmap.c.

    ChangeLog since May 13.
    - rebased to 2.6.34
    - fix logic error (in case that private mapping and shared mapping coexist)
    - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
    from linear_page_index()
    - define and use linear_hugepage_index() instead of compound_order()
    - use page_move_anon_rmap() in hugetlb_cow()
    - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
    - revert commit 23be7468e8 completely

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Acked-by: Fengguang Wu
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
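
    A toy illustration of why the reverse mapping matters for error
    handling: given a bad page, the handler must reach every process
    mapping it directly. The table below is invented for the example and
    is not the kernel's mapcount/anon_vma machinery:

        #include <stdio.h>

        #define NPAGE  4
        #define MAXMAP 3

        /* reverse map: for each page, the ids of the processes mapping it,
         * terminated by -1 (illustrative layout only) */
        static int rmap[NPAGE][MAXMAP + 1] = {
            { 0, 2, -1 },       /* page 0 is mapped by processes 0 and 2 */
            { 2, -1 },
            { 0, 1, -1 },
            { 1, -1 },
        };

        static void handle_bad_page(int bad_pg)
        {
            /* walk only the actual mappers, no scan of every process */
            for (int i = 0; rmap[bad_pg][i] != -1; i++)
                printf("unmap page %d from process %d\n",
                       bad_pg, rmap[bad_pg][i]);
        }

        int main(void)
        {
            handle_bad_page(2);
            return 0;
        }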
     
  • is_vm_hugetlb_page() is a widely used inline function to insert hooks
    into hugetlb code.
    But we can't use it in pagemap.h because of a circular dependency
    between the header files. This patch removes that limitation.

    Acked-by: Mel Gorman
    Acked-by: Fengguang Wu
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

10 Aug, 2010

1 commit


28 Jan, 2010

1 commit

    It's a simplified 'read_cache_page()' which takes a page allocation
    flag, so that different paths can control how aggressive the memory
    allocations that populate an address space are allowed to be.

    In particular, the intel GPU object mapping code wants to be able to do
    a certain amount of its own internal memory management by automatically
    shrinking the address space when memory starts getting tight. This
    allows it to dynamically use different memory allocation policies on a
    per-allocation basis, rather than depend on the (static) address space
    gfp policy.

    The actual new function is a one-liner, but re-organizing the helper
    functions to the point where you can do this with a single line of code
    is what most of the patch is all about.

    Tested-by: Chris Wilson
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Aug, 2009

1 commit

  • A couple of references to CONFIG_CLASSIC_RCU have survived.
    Although these are harmless, it is past time for them to go.
    The one in hardirq.h is strictly a readability problem.

    The two in pagemap.h appear to disable a !SMP performance
    optimization (which this patch re-enables).

    This does raise the issue as to whether pagemap.h should really
    be referring to the RCU implementation. Long term, I intend to
    make the RCU implementation driven by CONFIG_PREEMPT, at which
    point these should change from defined(CONFIG_TREE_RCU) to
    !defined(CONFIG_PREEMPT). In the meantime, is there something
    else that could be done in pagemap.h?

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

17 Jun, 2009

1 commit

  • Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this
    configurability is unnecessary.

    Signed-off-by: KOSAKI Motohiro
    Cc: Johannes Weiner
    Cc: Andi Kleen
    Acked-by: Minchan Kim
    Cc: David Woodhouse
    Cc: Matt Mackall
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

03 Apr, 2009

2 commits

  • Add a function to install a monitor on the page lock waitqueue for a particular
    page, thus allowing the page being unlocked to be detected.

    This is used by CacheFiles to detect read completion on a page in the backing
    filesystem so that it can then copy the data to the waiting netfs page.

    Signed-off-by: David Howells
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Rik van Riel
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     
  • A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next
    available AS flag while the Unevictable LRU was under development. The
    Unevictable LRU was using the same flag and "no one" noticed. Current
    mainline, since 2.6.28, has the same value for two symbolic flag names.

    So, define a unique flag value for AS_UNEVICTABLE--up close to the other
    flags, [at the cost of an additional #ifdef] so we'll notice next time.
    Note that #ifdef is not actually required, if we don't mind having the
    unused flag value defined.

    Replace #defines with an enum.

    Signed-off-by: Lee Schermerhorn
    Cc: [2.6.28.x, 2.6.29.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
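
    A small userspace sketch of the idea behind the fix: scattered #defines
    can silently hand two flags the same value, while a single enum assigns
    unique values by construction. The flag names mirror the kernel's
    address_space flags, but the values are simplified (the real ones sit
    above the gfp bits):

        #include <stdio.h>

        enum mapping_flags {
            AS_EIO,             /* IO error on async write */
            AS_ENOSPC,          /* ENOSPC on async write */
            AS_MM_ALL_LOCKS,    /* under mm_take_all_locks() */
            AS_UNEVICTABLE,     /* cannot collide with the entries above */
        };

        int main(void)
        {
            printf("AS_MM_ALL_LOCKS=%d AS_UNEVICTABLE=%d\n",
                   AS_MM_ALL_LOCKS, AS_UNEVICTABLE);
            return 0;
        }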
     

05 Jan, 2009

1 commit

  • With the write_begin/write_end aops, page_symlink was broken because it
    could no longer pass a GFP_NOFS type mask into the point where the
    allocations happened. They are done in write_begin, which would always
    assume that the filesystem can be entered from reclaim. This bug could
    cause filesystem deadlocks.

    The funny thing with having a gfp_t mask there is that it doesn't really
    allow the caller to arbitrarily tinker with the context in which it can be
    called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
    take the page lock. The only thing any callers care about is __GFP_FS
    anyway, so turn that into a single flag.

    Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
    this flag in their write_begin function. Change __grab_cache_page to
    accept a nofs argument as well, to honour that flag (while we're there,
    change the name to grab_cache_page_write_begin which is more instructive
    and does away with random leading underscores).

    This is really a more flexible way to go in the end anyway -- if a
    filesystem happens to want any extra allocations aside from the pagecache
    ones in its write_begin function, it may now use GFP_KERNEL (rather than
    GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
    random example).

    [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
    [kosaki.motohiro@jp.fujitsu.com: fix fuse]
    Signed-off-by: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: [2.6.28.x]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    [ Cleaned up the calling convention: just pass in the AOP flags
    untouched to the grab_cache_page_write_begin() function. That
    just simplifies everybody, and may even allow future expansion of the
    logic. - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
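
    A userspace sketch of the AOP_FLAG_NOFS idea, using illustrative
    constants rather than the kernel's real gfp values: the page-grabbing
    helper strips the "may enter the filesystem" bit from the mapping's
    allocation mask whenever the write_begin caller passes the flag:

        #include <stdio.h>

        #define ILL_GFP_FS        0x80u     /* stands in for __GFP_FS */
        #define ILL_GFP_KERNEL    (0x50u | ILL_GFP_FS)
        #define ILL_AOP_FLAG_NOFS 0x02u     /* stands in for AOP_FLAG_NOFS */

        static unsigned int grab_gfp(unsigned int mapping_gfp,
                                     unsigned int aop_flags)
        {
            unsigned int notmask =
                (aop_flags & ILL_AOP_FLAG_NOFS) ? ILL_GFP_FS : 0;
            return mapping_gfp & ~notmask;  /* reclaim can't re-enter the fs */
        }

        int main(void)
        {
            printf("default: %#x\n", grab_gfp(ILL_GFP_KERNEL, 0));
            printf("nofs:    %#x\n", grab_gfp(ILL_GFP_KERNEL,
                                              ILL_AOP_FLAG_NOFS));
            return 0;
        }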
     

20 Oct, 2008

4 commits

  • trylock_page, unlock_page open and close a critical section. Hence,
    we can use the lock bitops to get the desired memory ordering.

    Also, mark trylock as likely to succeed (and remove the annotation from
    callers).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
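
    A C11 sketch of the memory-ordering point above: entering the critical
    section only needs an acquire-ordered test-and-set of the lock bit, and
    leaving it a release-ordered clear, which is what the lock bitops
    provide. This is an illustration, not the kernel's implementation:

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        #define PG_LOCKED_BIT 0x1u      /* bit 0 plays the role of PG_locked */

        static atomic_uint page_flags;

        static bool trylock_bit(atomic_uint *flags)
        {
            unsigned int old = atomic_fetch_or_explicit(flags, PG_LOCKED_BIT,
                                                        memory_order_acquire);
            return !(old & PG_LOCKED_BIT);  /* true if we took the lock */
        }

        static void unlock_bit(atomic_uint *flags)
        {
            atomic_fetch_and_explicit(flags, ~PG_LOCKED_BIT,
                                      memory_order_release);
        }

        int main(void)
        {
            printf("first try:    %d\n", trylock_bit(&page_flags));  /* 1 */
            printf("second try:   %d\n", trylock_bit(&page_flags));  /* 0 */
            unlock_bit(&page_flags);
            printf("after unlock: %d\n", trylock_bit(&page_flags));  /* 1 */
            return 0;
        }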
     
  • Setting and clearing the page locked when inserting it into swapcache /
    pagecache when it has no other references can use non-atomic page flags
    operations because no other CPU may be operating on it at this time.

    This saves one atomic operation when inserting a page into pagecache.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be
    kept on the normal LRU, since scanning them is a waste of time and might
    throw off kswapd's balancing algorithms. Place them on the unevictable
    LRU list instead.

    Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared
    memory regions as unevictable. Then these pages will be culled off the
    normal LRU lists during vmscan.

    Add new wrapper function to clear the mapping's unevictable state when/if
    shared memory segment is munlocked.

    Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in
    the shmem segment's mapping [struct address_space] for evictability now
    that they're no longer locked, and to move any that are evictable to the
    appropriate zone lru list.

    Changes depend on [CONFIG_]UNEVICTABLE_LRU.

    [kosaki.motohiro@jp.fujitsu.com: revert shm change]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: Kosaki Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Christoph Lameter pointed out that ram disk pages also clutter the LRU
    lists. When vmscan finds them dirty and tries to clean them, the ram disk
    writeback function just redirties the page so that it goes back onto the
    active list. Round and round she goes...

    With the ram disk driver [rd.c] replaced by the newer 'brd.c', this is no
    longer the case, as ram disk pages are no longer maintained on the lru.
    [This makes them unmigratable for defrag or memory hot remove, but that
    can be addressed by a separate patch series.] However, the ramfs pages
    behave like ram disk pages used to, so:

    Define new address_space flag [shares address_space flags member with
    mapping's gfp mask] to indicate that the address space contains all
    unevictable pages. This will provide for efficient testing of ramfs pages
    in page_evictable().

    Also provide wrapper functions to set/test the unevictable state to
    minimize #ifdefs in ramfs driver and any other users of this facility.

    Set the unevictable state on address_space structures for new ramfs
    inodes. Test the unevictable state in page_evictable() to cull
    unevictable pages.

    These changes depend on [CONFIG_]UNEVICTABLE_LRU.

    [riel@redhat.com: undo the brd.c part]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Debugged-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
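
    A userspace sketch of the packing mentioned above, with made-up bit
    positions: the mapping's single flags word carries the default gfp mask
    in its low bits and the AS_* state bits above them, plus small wrappers
    to set and test the unevictable state:

        #include <stdio.h>

        #define ILL_GFP_BITS   8                   /* pretend gfp uses 8 bits */
        #define AS_UNEVICTABLE (ILL_GFP_BITS + 0)  /* first flag above them */

        struct mapping { unsigned long flags; };

        static void mapping_set_unevictable(struct mapping *m)
        {
            m->flags |= 1UL << AS_UNEVICTABLE;
        }

        static int mapping_unevictable(const struct mapping *m)
        {
            return !!(m->flags & (1UL << AS_UNEVICTABLE));
        }

        static unsigned long mapping_gfp_mask(const struct mapping *m)
        {
            return m->flags & ((1UL << ILL_GFP_BITS) - 1);
        }

        int main(void)
        {
            struct mapping m = { .flags = 0xd0 };  /* low bits: gfp mask */
            mapping_set_unevictable(&m);
            printf("gfp=%#lx unevictable=%d\n",
                   mapping_gfp_mask(&m), mapping_unevictable(&m));
            return 0;
        }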
     

05 Aug, 2008

1 commit

  • Converting page lock to new locking bitops requires a change of page flag
    operation naming, so we might as well convert it to something nicer
    (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

    This also facilitates lockdeping of page lock.

    Signed-off-by: Nick Piggin
    Acked-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Andrew Morton
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

30 Jul, 2008

1 commit

  • Implement lockless get_user_pages_fast for 64-bit powerpc.

    Page table existence is guaranteed with RCU, and speculative page references
    are used to take a reference to the pages without having a prior existence
    guarantee on them.

    Signed-off-by: Nick Piggin
    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Benjamin Herrenschmidt

    Nick Piggin
     

29 Jul, 2008

1 commit

  • mm_take_all_locks holds off reclaim from an entire mm_struct. This allows
    mmu notifiers to register into the mm at any time with the guarantee that
    no mmu operation is in progress on the mm.

    This operation locks against the VM for all pte/vma/mm related operations
    that could ever happen on a certain mm. This includes vmtruncate,
    try_to_unmap, and all page faults.

    The caller must take the mmap_sem in write mode before calling
    mm_take_all_locks(). The caller isn't allowed to release the mmap_sem
    until mm_drop_all_locks() returns.

    mmap_sem in write mode is required in order to block all operations that
    could modify pagetables and free pages without needing to alter the vma
    layout (for example populate_range() with nonlinear vmas). It's also
    needed in write mode to prevent new anon_vmas from being associated with
    existing vmas.

    A single task can't take more than one mm_take_all_locks() in a row or it
    would deadlock.

    mm_take_all_locks() and mm_drop_all_locks() are expensive operations that
    may have to take thousands of locks.

    mm_take_all_locks() can fail if it's interrupted by signals.

    When mmu_notifier_register returns, we must be sure that the driver is
    notified if some task is in the middle of a vmtruncate for the 'mm' where
    the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
    is run around the vmtruncation but mmu_notifier_register can run after
    mmu_notifier_invalidate_range_start and before
    mmu_notifier_invalidate_range_end). The same problem exists for the rmap
    paths. And we have to remove page pinning to avoid replicating the
    tlb_gather logic inside KVM (and GRU doesn't work well with page pinning
    regardless of needing tlb_gather), so without mm_take_all_locks(), when
    vmtruncate frees the page, KVM would have no way to notice that a page it
    had mapped into sptes is going onto the freelist, with no chance of any
    further mmu_notifier notification.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andrea Arcangeli
    Acked-by: Linus Torvalds
    Cc: Christoph Lameter
    Cc: Jack Steiner
    Cc: Robin Holt
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Kanoj Sarcar
    Cc: Roland Dreier
    Cc: Steve Wise
    Cc: Avi Kivity
    Cc: Hugh Dickins
    Cc: Rusty Russell
    Cc: Anthony Liguori
    Cc: Chris Wright
    Cc: Marcelo Tosatti
    Cc: Eric Dumazet
    Cc: "Paul E. McKenney"
    Cc: Izik Eidus
    Cc: Anthony Liguori
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

27 Jul, 2008

1 commit

  • If we can be sure that elevating the page_count on a pagecache page will
    pin it, we can speculatively run this operation, and subsequently check to
    see if we hit the right page rather than relying on holding a lock or
    otherwise pinning a reference to the page.

    This can be done if get_page/put_page behaves consistently throughout the
    whole tree (ie. if we "get" the page after it has been used for something
    else, we must be able to free it with a put_page).

    Actually, there is a period where the count behaves differently: when the
    page is free or if it is a constituent page of a compound page. We need
    an atomic_inc_not_zero operation to ensure we don't try to grab the page
    in either case.

    This patch introduces the core locking protocol to the pagecache (ie.
    adds page_cache_get_speculative, and tweaks some update-side code to make
    it work).

    Thanks to Hugh for pointing out an improvement to the algorithm setting
    page_count to zero when we have control of all references, in order to
    hold off speculative getters.

    [kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
    [hugh@veritas.com: fix add_to_page_cache]
    [akpm@linux-foundation.org: repair a comment]
    Signed-off-by: Nick Piggin
    Cc: Jeff Garzik
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
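
    A C11 sketch of the atomic_inc_not_zero() step described above: the
    speculative getter takes a reference only if the count is not already
    zero (i.e. the page is not being freed). Userspace illustration, not
    kernel code:

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        static bool inc_not_zero(atomic_int *count)
        {
            int old = atomic_load(count);
            while (old != 0) {
                /* on failure, 'old' is reloaded with the current value */
                if (atomic_compare_exchange_weak(count, &old, old + 1))
                    return true;        /* speculative reference taken */
            }
            return false;               /* page was free: caller backs off */
        }

        int main(void)
        {
            atomic_int in_use = 2, being_freed = 0;
            printf("in use:      %d\n", inc_not_zero(&in_use));      /* 1 */
            printf("being freed: %d\n", inc_not_zero(&being_freed)); /* 0 */
            return 0;
        }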
     

25 Jul, 2008

1 commit


14 Feb, 2008

1 commit


07 Dec, 2007

1 commit


17 Oct, 2007

3 commits

  • These are intended to replace prepare_write and commit_write with more
    flexible alternatives that are also able to avoid the buffered write
    deadlock problems efficiently (which prepare_write is unable to do).

    [mark.fasheh@oracle.com: API design contributions, code review and fixes]
    [akpm@linux-foundation.org: various fixes]
    [dmonakhov@sw.ru: new aop block_write_begin fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Mark Fasheh
    Signed-off-by: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Modify the core write() code so that it won't take a pagefault while holding a
    lock on the pagecache page. There are a number of different deadlocks possible
    if we try to do such a thing:

    1. generic_buffered_write
    2. lock_page
    3. prepare_write
    4. unlock_page+vmtruncate
    5. copy_from_user
    6. mmap_sem(r)
    7. handle_mm_fault
    8. lock_page (filemap_nopage)
    9. commit_write
    10. unlock_page

    a. sys_munmap / sys_mlock / others
    b. mmap_sem(w)
    c. make_pages_present
    d. get_user_pages
    e. handle_mm_fault
    f. lock_page (filemap_nopage)

    2,8 - recursive deadlock if page is same
    2,8;2,8 - ABBA deadlock if page is different
    2,6;b,f - ABBA deadlock if page is same

    The solution is as follows:
    1. If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

    1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish a non-present
    page in a readable mapping from the lack of a readable mapping.

    2. If we find the destination page is non uptodate, unlock it (this could be
    made slightly more optimal), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.

    (also, rename maxlen to seglen, because it was confusing)

    This increases the CPU/memory copy cost by almost 50% on the affected
    workloads. That will be solved by introducing a new set of pagecache write
    aops in a subsequent patch.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Convert some 'unsigned long' to pgoff_t.

    Signed-off-by: Fengguang Wu
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

09 May, 2007

1 commit


08 May, 2007

1 commit

  • Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fix 7
    possible use-before-uptodate bugs in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible case of cleared data being overwritten
    with readpage in block2mtd. All depend on whether the filler is async
    and/or can return with a !uptodate page.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

10 Feb, 2007

1 commit


29 Oct, 2006

1 commit

  • - Consolidate page_cache_alloc

    - Fix splice: only the pagecache pages and filesystem data need to use
    mapping_gfp_mask.

    - Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.

    Signed-off-by: Nick Piggin
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

26 Sep, 2006

1 commit

  • lock_page needs the caller to have a reference on the page->mapping inode
    due to sync_page, ergo set_page_dirty_lock is obviously buggy according to
    its comments.

    Solve it by introducing a new lock_page_nosync which does not do a sync_page.

    akpm: unpleasant solution to an unpleasant problem. If it goes wrong it could
    cause great slowdowns while the lock_page() caller waits for kblockd to
    perform the unplug. And if a filesystem has special sync_page() requirements
    (none presently do), permanent hangs are possible.

    otoh, set_page_dirty_lock() is usually (always?) called against userspace
    pages. They are always up-to-date, so there shouldn't be any pending read I/O
    against these pages.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Jul, 2006

1 commit

  • Currently a single atomic variable is used to establish the size of the page
    cache in the whole machine. The zoned VM counters have the same method of
    implementation as the nr_pagecache code but also allow the determination of
    the pagecache size per zone.

    Remove the special implementation for nr_pagecache and make it a zoned counter
    named NR_FILE_PAGES.

    Updates of the page cache counters are always performed with interrupts off.
    We can therefore use the __ variant here.

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Jun, 2006

1 commit

  • Add read_mapping_page() which is used for callers that pass
    mapping->a_ops->readpage as the filler for read_cache_page. This removes
    some duplication from filesystem code.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

27 Apr, 2006

1 commit


01 Apr, 2006

1 commit

  • find_trylock_page() is an odd interface in that it doesn't take a reference
    like the others. Now that XFS no longer uses it, and its last remaining
    caller actually wants an elevated refcount, opencode that callsite and
    schedule find_trylock_page() for removal.

    Signed-off-by: Nick Piggin
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Mar, 2006

1 commit

  • Change the page cache allocation calls to support cpuset memory spreading.

    See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
    spreading.

    On systems without cpusets configured in the kernel, this is no change.

    On systems with cpusets configured in the kernel, but the "memory_spread"
    cpuset option not enabled for the current task's cpuset, this adds a call
    to a cpuset routine and a failed bit test of the processor state flag
    PF_SPREAD_PAGE.

    On tasks in cpusets with "memory_spread" enabled, this adds a call to a
    cpuset routine that computes which of the task's mems_allowed nodes should
    be preferred for this allocation.

    If memory spreading applies to a particular allocation, then any other NUMA
    mempolicy does not apply.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

14 Nov, 2005

1 commit

  • Remove last remnant of the defunct early reclaim page logic, the no longer
    used __GFP_NORECLAIM flag bit.

    Signed-off-by: Paul Jackson
    Acked-by: Martin Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

28 Oct, 2005

2 commits


09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change the
    generated code (from gcc's point of view we replaced unsigned int with a
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro