07 Jan, 2009

18 commits

  • page_queue_congested() was introduced in 2002, but it was never used

    Signed-off-by: KOSAKI Motohiro
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Complete zap_pte_range()'s coverage of bad pagetable entries by calling
    print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
    That needs free_swap_and_cache() to tell it, which will also have shown
    one of those "swap_free" errors (but with much less information).

    Similar checks in fork's copy_one_pte()? No, that would be more noisy
    than helpful: we'll see them when parent and child exec or exit.

    Where do_nonlinear_fault() calls print_bad_pte(): omit !VM_CAN_NONLINEAR
    case, that could only be a bug in sys_remap_file_pages(), not a bad pte.
    VM_FAULT_OOM rather than VM_FAULT_SIGBUS? Well, okay, that is consistent
    with what happens if do_swap_page() operates on a bad swap entry; but don't
    we have patches to be more careful about killing when VM_FAULT_OOM?

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove the srandom32((u32)get_seconds()) from non-rotational swapon:
    there's been a coincidental discussion of earlier randomization, assume
    that goes ahead, let swapon be a client rather than stirring for itself.

    Signed-off-by: Hugh Dickins
    Cc: David Woodhouse
    Cc: Donjun Shin
    Cc: James Bottomley
    Cc: Jens Axboe
    Cc: Joern Engel
    Cc: KAMEZAWA Hiroyuki
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Change pgoff_t nr_blocks in discard_swap() and discard_swap_cluster() to
    sector_t: given the constraints on swap offsets (in particular, the 5 bits
    of swap type accommodated in the same unsigned long), pgoff_t was actually
    safe as is, but it certainly looked worrying when shifted left.
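
    To see why the shift looked worrying, here is a hedged userspace sketch
    (not kernel code; the offset is exaggerated beyond what real swap
    offsets reach, purely to show the truncation):

        #include <stdio.h>
        #include <stdint.h>

        #define PAGE_SHIFT 12                    /* 4kB pages */

        int main(void)
        {
            uint32_t start_page = 0x20000000u;   /* hypothetical page offset */

            /* Shift performed in 32 bits: the high bits are silently lost. */
            uint32_t narrow = start_page << (PAGE_SHIFT - 9);

            /* Widen first, as storing into a 64-bit sector_t does. */
            uint64_t wide = (uint64_t)start_page << (PAGE_SHIFT - 9);

            printf("narrow=%u wide=%llu\n", narrow, (unsigned long long)wide);
            return 0;
        }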

    [akpm@linux-foundation.org: fix shift overflow]
    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Though attempting to find free clusters (Andrea), swap allocation has
    always restarted its searches from the beginning of the swap area (sct),
    to reduce seek times between swap pages, by not scattering them all over
    the partition.

    But on a solidstate swap device, seeks are cheap, and block remapping to
    level the wear may be limited by zones: in that case it's better to cycle
    around the whole partition.
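
    A toy model of the difference (standalone C; the fixed-size map and the
    names are stand-ins, not the real scan_swap_map()): the solid-state
    policy is a next-fit scan that continues from wherever the last
    allocation ended, wrapping around, instead of restarting from the front.

        #include <stdio.h>

        #define NSLOTS 16
        static int used[NSLOTS];
        static int cluster_next;        /* stand-in for si->cluster_next */

        static int alloc_cycling(void)
        {
            for (int i = 0; i < NSLOTS; i++) {
                int slot = (cluster_next + i) % NSLOTS;
                if (!used[slot]) {
                    used[slot] = 1;
                    cluster_next = slot + 1;  /* continue from here next time */
                    return slot;
                }
            }
            return -1;                  /* no space */
        }

        int main(void)
        {
            for (int i = 0; i < 5; i++)
                printf("allocated slot %d\n", alloc_cycling());
            return 0;
        }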

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Swap allocation has always started from the beginning of the swap area;
    but if we're dealing with a solidstate swap device which can only remap
    blocks within limited zones, that would sooner wear out the first zone.

    Therefore sys_swapon() tests whether blk_queue is non-rotational, and if
    so randomizes the cluster_next starting position for allocation.
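
    In outline, the randomization amounts to something like this userspace
    sketch (rand() and the bound here stand in for the kernel's random32()
    and the area's highest usable page):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main(void)
        {
            unsigned int highest_bit = 1u << 20; /* stand-in: last usable page */
            unsigned int cluster_next;

            srand((unsigned)time(NULL));
            /* Start allocating at a random offset in [1, highest_bit], so
             * the first zone of the device is not always worn down first. */
            cluster_next = 1 + ((unsigned)rand() % highest_bit);
            printf("cluster_next starts at %u\n", cluster_next);
            return 0;
        }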

    If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
    with an "SS" at the right end of the kernel's "Adding ... swap" message
    (so that if it's both nonrot and discardable, "SSD" will be shown there).
    Perhaps something should be shown in /proc/swaps (swapon -s), but we have
    to be more cautious before making any addition to that format.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When scan_swap_map() finds a free cluster of swap pages to allocate,
    discard the old contents of the cluster if the device supports discard.
    But don't bother when swap is so fragmented that we allocate single pages.
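
    The shape of the decision, as a hedged standalone model (the helpers
    here are stubs, not the real scan_swap_map() internals):

        #include <stdio.h>

        static int swp_discardable = 1;     /* stand-in for SWP_DISCARDABLE */

        static int find_free_cluster(unsigned int *start)
        {
            *start = 256;                   /* stub: pretend one was found */
            return 1;
        }

        static void discard_cluster(unsigned int start)
        {
            printf("discard old contents of cluster at %u\n", start);
        }

        int main(void)
        {
            unsigned int start;

            if (find_free_cluster(&start)) {
                /* A whole free cluster: one discard is worthwhile. */
                if (swp_discardable)
                    discard_cluster(start);
            }
            /* else: fragmented, single-page allocations - don't bother. */
            return 0;
        }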

    Be careful about racing allocations made while we're scanning for a
    cluster; and hold up allocations made while we're discarding.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
    When adding swap, all the old data on swap can be forgotten: sys_swapon()
    discards all but the header page of the swap partition (or every extent but
    the header of the swap file), to give a solidstate swap device the
    opportunity to optimize its wear-levelling.

    If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
    "D" at the right end of the kernel's "Adding ... swap" message. Perhaps
    something should be shown in /proc/swaps (swapon -s), but we have to be
    more cautious before making any addition to that format.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Before making functional changes, rearrange scan_swap_map() to simplify
    subsequent diffs. Actually, there is one functional change in there:
    leave cluster_nr negative while scanning for a new cluster - resetting it
    early increased the likelihood that when we have difficulty finding a free
    cluster, another task may come in and try doing exactly the same - just a
    waste of cpu.

    Before making functional changes, rearrange struct swap_info_struct
    slightly: flags will be needed as an unsigned long (for wait_on_bit), next
    is a good int to pair with prio, old_block_size is uninteresting so shift
    it to the end.
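
    Sketched, the reshuffle looks something like this (field list abridged,
    not the complete kernel structure):

        struct swap_info_struct {
            unsigned long flags;    /* unsigned long, ready for wait_on_bit */
            int prio;               /* swap priority */
            int next;               /* an int, pairing naturally with prio */
            /* ... the interesting fields ... */
            int old_block_size;     /* uninteresting: shifted to the end */
        };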

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The kernel has not supported v0 SWAP-SPACE since 2.5.22: I think we can
    now safely drop its "version 0 swap is no longer supported" message - just
    say "Unable to find swap-space signature" as usual. This removes one
    level of indentation from a stretch of sys_swapon().

    I'd have liked to be specific, saying "Unable to find SWAPSPACE2
    signature", but it's just too confusing that the version 1 signature shows
    the number 2.
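
    For reference, the signature lives in the last ten bytes of the header
    page; a userspace sketch of the check (layout simplified from the
    kernel's union swap_header):

        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE 4096

        int main(void)
        {
            static char header[PAGE_SIZE];  /* stand-in for the header page */

            memcpy(header + PAGE_SIZE - 10, "SWAPSPACE2", 10);

            /* The version 1 signature, confusingly, reads "SWAPSPACE2". */
            if (!memcmp(header + PAGE_SIZE - 10, "SWAPSPACE2", 10))
                printf("found version 1 swap signature\n");
            else
                printf("Unable to find swap-space signature\n");
            return 0;
        }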

    Irrelevant nearby cleanup: kmap(page) already gives page_address(page).

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove trailing whitespace from swapfile.c, and odd swap_show() alignment.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.
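
    Before and after, roughly (a sketch; flag values as I recall them from
    swap.h of the time):

        /* Before: SWP_ACTIVE merely bundled the two bits together. */
        #define SWP_USED    0x01
        #define SWP_WRITEOK 0x02
        #define SWP_ACTIVE  (SWP_USED | SWP_WRITEOK)

        /* After: test SWP_USED or SWP_WRITEOK directly where it matters. */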

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • sys_swapon()'s swapfilesize (better renamed swapfilepages) is declared as
    an int, but should be an unsigned long like the maxpages it's compared
    against: on 64-bit (with 4kB pages) a swapfile of 2^44 bytes was rejected
    with "Swap area shorter than signature indicates".

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Rik suggests a simplified get_scan_ratio() for !CONFIG_SWAP. Yes, the gcc
    optimizer gives us that, when nr_swap_pages is #defined as 0L. Move the usual
    declaration to swapfile.c: it never belonged in page_alloc.c.
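
    The arrangement, in outline (a sketch matching the description above,
    not the full header):

        #ifdef CONFIG_SWAP
        extern long nr_swap_pages;   /* declared alongside swapfile.c now */
        #else
        #define nr_swap_pages 0L     /* constant zero: gcc folds away the
                                        swap-only branches in get_scan_ratio() */
        #endif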

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • There's a possible race in try_to_unuse() which Nick Piggin led me to two
    years ago. Where it does lock_page() after read_swap_cache_async(), what
    if another task removed that page from swapcache just before we locked it?

    It would sail through the (*swap_map > 1) tests doing nothing (because it
    could not have been removed from swapcache before its swap references were
    gone), until it reaches the delete_from_swap_cache(page) near the bottom.

    Now imagine that this page has been allocated to swap on a different swap
    area while we dropped page lock (perhaps at the top, perhaps in unuse_mm):
    we could wrongly remove from swap cache before the page has been written
    to swap, so a subsequent do_swap_page() would read in stale data from
    swap.

    I think this case could not happen before: remove_exclusive_swap_page()
    refused while page count was raised. But now with reuse_swap_page() and
    try_to_free_swap() removing from swap cache without minding page count, I
    think it could happen - the previous patch argued that it was safe because
    try_to_unuse() already ignored page count, but overlooked that it might be
    breaking the assumptions in try_to_unuse() itself.

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • remove_exclusive_swap_page(): its problem is in living up to its name.

    It doesn't matter if someone else has a reference to the page (raised
    page_count); it doesn't matter if the page is mapped into userspace
    (raised page_mapcount - though that hints it may be worth keeping the
    swap): all that matters is that there be no more references to the swap
    (and no writeback in progress).

    swapoff (try_to_unuse) has been removing pages from swapcache for years,
    with no concern for page count or page mapcount, and we used to have a
    comment in lookup_swap_cache() recognizing that: if you go for a page of
    swapcache, you'll get the right page, but it could have been removed from
    swapcache by the time you get page lock.

    So, give up asking for exclusivity: get rid of
    remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
    remove_exclusive_swap_page_count() which were spawned for the recent LRU
    work: replace them by the simpler try_to_free_swap() which just checks
    page_swapcount().
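
    The replacement test, modelled standalone (the helpers are stubs
    standing in for the kernel predicates):

        #include <stdio.h>

        static int page_swapcount(void) { return 0; }  /* no swap references */
        static int page_writeback(void) { return 0; }  /* no writeback */

        /* Model of try_to_free_swap(): page count and mapcount no longer
         * matter; only remaining swap references (and writeback) do. */
        static int try_to_free_swap_model(void)
        {
            if (page_swapcount())
                return 0;
            if (page_writeback())
                return 0;
            /* delete_from_swap_cache(page) would happen here */
            return 1;
        }

        int main(void)
        {
            printf("swap freed: %d\n", try_to_free_swap_model());
            return 0;
        }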

    Similarly, remove the page_count limitation from free_swap_and_cache(),
    but assume that it's worth holding on to the swap if page is mapped and
    swap nowhere near full. Add a vm_swap_full() test in free_swap_cache()?
    It would be consistent, but I think we probably have enough for now.

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • A good place to free up old swap is where do_wp_page(), or do_swap_page(),
    is about to redirty the page: the data on disk is then stale and won't be
    read again; and if we do decide to write the page out later, using the
    previous swap location makes an unnecessary disk seek very likely.

    So give can_share_swap_page() the side-effect of delete_from_swap_cache()
    when it safely can. And can_share_swap_page() was always a misleading
    name, the more so if it has a side-effect: rename it reuse_swap_page().

    Irrelevant cleanup nearby: remove swap_token_default_timeout definition
    from swap.h: it's used nowhere.

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The swap code is over-provisioned with BUG_ONs on assorted page flags,
    mostly dating back to 2.3. They're good documentation, and guard against
    developer error, but a waste of space on most systems: change them to
    VM_BUG_ONs, conditional on CONFIG_DEBUG_VM. Just delete the PagePrivate
    ones: they're later, from 2.5.69, but even less interesting now.
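
    For context, VM_BUG_ON is essentially this (from linux/mmdebug.h of the
    era):

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON(cond) BUG_ON(cond)
        #else
        #define VM_BUG_ON(cond) do { } while (0)
        #endif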

    Signed-off-by: Hugh Dickins
    Reviewed-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Dec, 2008

1 commit

  • Impact: cleanup, code robustization

    The __swp_...() macros silently relied upon which bits are used for
    _PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
    our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
    were reported that could have been avoided if these macros properly
    used the symbolic constants. Since, as pointed out earlier, for Xen
    Dom0 support mainline likewise will need to eliminate the conflict
    between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
    adjustments, plus it introduces a mechanism to check consistency
    between MAX_SWAPFILES_SHIFT and the actual encoding macros.

    This also fixes a latent bug in that x86-64 used a 6-bit mask in
    __swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
    seemingly unrelated) linux/swap.h, this would have resulted in a
    collision with _PAGE_FILE.

    Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
    pgoff_to_pte() calculations.
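
    The consistency mechanism, in outline (a hedged sketch with simplified
    names; the patch ties the generic constant to the per-arch encoding
    width at build time):

        #define SWP_TYPE_BITS 5         /* bits x86 reserves for swap type */
        #define MAX_SWAPFILES_SHIFT 5   /* from linux/swap.h */

        /* Negative-array-size trick, as BUILD_BUG_ON uses: compilation
         * fails if the two constants ever diverge. */
        typedef char swp_type_bits_check[
                (MAX_SWAPFILES_SHIFT == SWP_TYPE_BITS) ? 1 : -1];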

    Signed-off-by: Jan Beulich
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

20 Oct, 2008

2 commits

  • trylock_page, unlock_page open and close a critical section. Hence,
    we can use the lock bitops to get the desired memory ordering.

    Also, mark trylock as likely to succeed (and remove the annotation from
    callers).
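
    The ordering argument, modelled with C11 atomics (a standalone sketch:
    atomic_flag stands in for PG_locked and the lock bitops):

        #include <stdatomic.h>
        #include <stdbool.h>

        static atomic_flag page_locked = ATOMIC_FLAG_INIT;

        /* trylock: acquire on success, like test_and_set_bit_lock(). */
        static bool trylock_page_model(void)
        {
            return !atomic_flag_test_and_set_explicit(&page_locked,
                                                      memory_order_acquire);
        }

        /* unlock: release, like clear_bit_unlock(). */
        static void unlock_page_model(void)
        {
            atomic_flag_clear_explicit(&page_locked, memory_order_release);
        }

        int main(void)
        {
            if (trylock_page_model())   /* expected to succeed: hence likely */
                unlock_page_model();
            return 0;
        }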

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If vm_swap_full() (swap space more than 50% full), the system will free
    swap space at swapin time. With this patch, the system will also free the
    swap space in the pageout code, when we decide that the page is not a
    candidate for swapout (and just wasting swap space).
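
    A standalone model of the test and the new pageout-path hook (numbers
    are stand-ins; vm_swap_full() itself is just the 50% comparison):

        #include <stdio.h>

        static long nr_swap_pages = 40;      /* free swap pages (stand-in) */
        static long total_swap_pages = 100;  /* total swap pages (stand-in) */

        /* More than half of swap space is in use. */
        static int vm_swap_full(void)
        {
            return nr_swap_pages * 2 < total_swap_pages;
        }

        int main(void)
        {
            /* Pageout path: the page is staying in memory anyway, so its
             * swap slot is just wasted space - free it when swap is tight. */
            if (vm_swap_full())
                printf("would try_to_free_swap() the page\n");
            return 0;
        }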

    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: MinChan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

05 Aug, 2008

1 commit

  • Converting page lock to new locking bitops requires a change of page flag
    operation naming, so we might as well convert it to something nicer
    (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

    This also facilitates lockdeping of page lock.

    Signed-off-by: Nick Piggin
    Acked-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Andrew Morton
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

31 Jul, 2008

1 commit


27 Jul, 2008

2 commits

  • This patch makes the following needlessly global code static:
    - swap_lock
    - nr_swapfiles
    - struct swap_list

    Signed-off-by: Adrian Bunk
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • mapping->tree_lock has no read lockers. Convert the lock from an rwlock
    to a spinlock.

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

25 Jul, 2008

1 commit

  • Vegard Nossum has noticed the ever-decreasing negative priority in a
    swapon/swapoff loop, which eventually would misprioritize when int wraps
    positive. Not worth spending much code on, but probably better fixed.

    It's easy to handle the swapping on and off of just one area, but there's
    not much point if a pair or more still misbehave. To handle the general
    case, swapoff should compact negative priorities, keeping them always from
    -1 to -MAX_SWAPFILES. That's a change, but should cause no regression,
    since these negative (unspecified) priorities are disjoint from the
    positive specified priorities 0 to 32767.

    One small functional difference, which seems appropriate: when swapoff
    fails to free all swap from a negative priority area, that area is now
    reinserted at lowest priority, rather than at its original priority.

    In moving down swapon's setting of priority, I notice that an area is
    visible to /proc/swaps when it has swap_map set, yet that was being set
    before all the visible fields were properly filled in: corrected.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Reported-by: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

29 Apr, 2008

1 commit


28 Apr, 2008

1 commit

  • When checking for the swap header, try byteswapping the
    endianness-dependent fields to allow the swap partition to be shared
    between big & little endian systems.
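
    The idea, sketched in userspace (fields simplified: if the raw version
    looks wrong but its byteswap looks sane, swap the integer fields):

        #include <stdio.h>
        #include <stdint.h>

        static uint32_t swab32(uint32_t x)
        {
            return (x >> 24) | ((x >> 8) & 0xff00u) |
                   ((x << 8) & 0xff0000u) | (x << 24);
        }

        int main(void)
        {
            uint32_t version = 0x01000000u;  /* "1" from an opposite-endian host */

            if (version != 1 && swab32(version) == 1)
                version = swab32(version);

            printf("swap header version = %u\n", version);
            return 0;
        }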

    Signed-off-by: Chris Dearman
    Signed-off-by: Ralf Baechle
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Dearman
     

15 Feb, 2008

1 commit

  • seq_path() is always called with a dentry and a vfsmount from a struct path.
    Make seq_path() take it directly as an argument.

    Signed-off-by: Jan Blunck
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

08 Feb, 2008

4 commits

  • This patch reinstates the "swapoff: scan ptes preemptibly" mod we started
    with: in due course it should be rendered down into the earlier patches,
    leaving us with a more straightforward mem_cgroup_charge mod to unuse_pte,
    allocating with GFP_KERNEL while holding no spinlock and no atomic kmap.

    Signed-off-by: Hugh Dickins
    Cc: Pavel Emelianov
    Acked-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Nick Piggin pointed out that swap cache and page cache addition routines
    could be called from non-GFP_KERNEL contexts. This patch makes the
    charging routine aware of the gfp context. Charging might fail if the
    cgroup is over its limit, in which case a suitable error is returned.

    This patch was tested on a Powerpc box. I am still looking at being able
    to test the path through which allocations happen in non-GFP_KERNEL
    contexts.

    [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Add the accounting hooks. The accounting is carried out for RSS and Page
    Cache (unmapped) pages. There is now a common limit and accounting for both.
    RSS is accounted at page_add_*_rmap() and page_remove_rmap() time. Page
    cache is accounted at add_to_page_cache() and __delete_from_page_cache().
    Swap cache is also accounted for.

    Each page's page_cgroup is protected with the last bit of the
    page_cgroup pointer; this makes handling of race conditions involving
    simultaneous mappings of a page easier. A reference count is kept in the
    page_cgroup to deal with cases where a page might be unmapped from the RSS
    of all tasks, but still lives in the page cache.

    Credits go to Vaidyanathan Srinivasan for helping with reference counting work
    of the page cgroup. Almost all of the page cache accounting code has help
    from Vaidyanathan Srinivasan.

    [hugh@veritas.com: fix swapoff breakage]
    [akpm@linux-foundation.org: fix locking]
    Signed-off-by: Vaidyanathan Srinivasan
    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc:
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • This patch precisely reverts the "swapoff: scan ptes preemptibly" patch
    just presented. It's a temporary measure to allow existing memory
    controller patches to apply without rejects: in due course they should be
    rendered down into one sensible patch, and this reversion disappear.

    Signed-off-by: Hugh Dickins
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Feb, 2008

4 commits

  • There are a couple of reasons (patches follow) why it would be good to open a
    window for sleep in shmem_unuse_inode, between its search for a matching swap
    entry, and its handling of the entry found.

    shmem_unuse_inode must then use igrab to hold the inode against deletion in
    that window, and its corresponding iput might result in deletion: so it had
    better unlock_page before the iput, and might as well release the page too.

    Nor is there any need to hold on to shmem_swaplist_mutex once we know we'll
    leave the loop. So this unwinding moves from try_to_unuse and shmem_unuse
    into shmem_unuse_inode, in the case when it finds a match.

    Let try_to_unuse break on error in the shmem_unuse case, as it does in the
    unuse_mm case: though at this point in the series, no error to break on.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Provided that CONFIG_HIGHPTE is not set, unuse_pte_range can reduce latency
    in swapoff by scanning the page table preemptibly: so long as unuse_pte is
    careful to recheck that entry under pte lock.

    (To tell the truth, this patch was not inspired by any cries for lower
    latency here: rather, this restructuring permits a future memory controller
    patch to allocate with GFP_KERNEL in unuse_pte, where before it could not.
    But it would be wrong to tuck this change away inside a memcgroup patch.)

    Signed-off-by: Hugh Dickins
    Acked-by: Balbir Singh
    Tested-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • valid_swaphandles is supposed to do a quick pass over the swap map entries
    neighbouring the entry which swapin_readahead is targeting, to determine for
    it a range worth reading all together. But since it always starts its search
    from the beginning of the swap "cluster", a reject (free entry) there
    immediately curtails the readaround, and every swapin_readahead from that
    cluster is for just a single page. Instead scan forwards and backwards around
    the target entry.

    Use better names for some variables: a swap_info pointer is usually called
    "si" not "swapdev". And at the end, if only the target page should be read,
    return count of 0 to disable readaround, to avoid the unnecessarily repeated
    call to read_swap_cache_async.
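
    The scanning change, modelled standalone (toy map, where 1 marks an
    entry in use; the readaround range extends in both directions from the
    target until it hits a free entry):

        #include <stdio.h>

        #define CLUSTER 8

        int main(void)
        {
            int map[CLUSTER] = { 0, 1, 1, 1, 0, 1, 1, 0 };
            int target = 3, lo = target, hi = target;

            while (hi + 1 < CLUSTER && map[hi + 1])
                hi++;                        /* scan forwards */
            while (lo > 0 && map[lo - 1])
                lo--;                        /* scan backwards */

            printf("readaround range: [%d, %d]\n", lo, hi);
            return 0;
        }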

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Building in a filesystem on a loop device on a tmpfs file can hang when
    swapping, the loop thread caught in that infamous throttle_vm_writeout.

    In theory this is a long standing problem, which I've either never seen in
    practice, or long ago suppressed the recollection, after discounting my load
    and my tmpfs size as unrealistically high. But now, with the new aops, it has
    become easy to hang on one machine.

    Loop used to grab_cache_page before the old prepare_write to tmpfs, which
    seems to have been enough to free up some memory for any swapin needed; but
    the new write_begin lets tmpfs find or allocate the page (much nicer, since
    grab_cache_page missed tmpfs pages in swapcache).

    When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
    has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
    break out when __GFP_IO or __GFP_FS is unset; but when tmpfs swaps in,
    read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
    mapping_gfp_mask - hence the hang.

    So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
    swapin_readahead to read_swap_cache_async to add_to_swap_cache.

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Jul, 2007

1 commit


17 Jul, 2007

1 commit


08 May, 2007

1 commit

  • Ensure pages are uptodate after returning from read_cache_page, which allows
    us to cut out most of the filesystem-internal PageUptodate calls.

    I didn't have a great look down the call chains, but this appears to fix 7
    possible use-before-uptodate bugs in hfs, 2 in hfsplus, 1 in jfs, a few in
    ecryptfs, 1 in jffs2, and a possible case of cleared data overwritten with
    readpage in block2mtd. All depend on whether the filler is async and/or
    can return with a !uptodate page.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin