20 Jul, 2007

1 commit

  • Fix page index to offset conversion overflows in buffer layer, ecryptfs,
    and ocfs2.

    It would be nice to convert the whole tree to page_offset, but for now
    just fix the bugs.
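
    As a reminder of the bug class being fixed (a sketch only, not the exact
    hunks from the patch): on 32-bit, shifting page->index left by
    PAGE_CACHE_SHIFT overflows before the result is widened to loff_t.

    /* Buggy pattern: the shift happens in narrow pgoff_t arithmetic. */
    loff_t start = page->index << PAGE_CACHE_SHIFT;

    /* Fixed: widen first, or use the page_offset() helper which does so. */
    loff_t start_fixed  = (loff_t)page->index << PAGE_CACHE_SHIFT;
    loff_t start_helper = page_offset(page);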

    Signed-off-by: Nick Piggin
    Cc: Michael Halcrow
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Jul, 2007

1 commit

  • Many filesystems need a ->page_mkwrite callout to correctly
    set up pages that have been written to by mmap. This is especially
    important when mmap is writing into holes as it allows filesystems
    to correctly account for and allocate space before the mmap
    write is allowed to proceed.

    Protection against truncate races is provided by locking the page
    and checking to see whether the page mapping is correct and whether
    it is beyond EOF so we don't end up allowing allocations beyond
    the current EOF or changing EOF as a result of an mmap write.
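
    For illustration, a ->page_mkwrite handler following this description would
    look roughly like the sketch below. This is not the XFS implementation
    itself, and myfs_reserve_space() is a hypothetical helper:

    static int myfs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
    {
            struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
            int ret = 0;

            lock_page(page);
            /* Truncate-race protection: mapping gone, or page wholly beyond EOF? */
            if (page->mapping != inode->i_mapping ||
                page_offset(page) >= i_size_read(inode)) {
                    ret = -EINVAL;
                    goto out_unlock;
            }
            /* Account for and allocate space before the mmap write may proceed. */
            ret = myfs_reserve_space(inode, page);
    out_unlock:
            unlock_page(page);
            return ret;
    }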

    SGI-PV: 940392
    SGI-Modid: 2.6.x-xfs-melb:linux:29146a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     

18 Jul, 2007

3 commits

  • It is a bug to set a page dirty if it is not uptodate unless it has
    buffers. If the page has buffers, then the page may be dirty (some buffers
    dirty) but not uptodate (some buffers not uptodate). The exception to this
    rule is if the set_page_dirty caller is racing with truncate or invalidate.

    A buffer can not be set dirty if it is not uptodate.

    If either of these situations occurs, it indicates there could be a data-loss
    problem. Some of these warnings may be harmless cases where the page or
    buffer is set uptodate immediately after it is dirtied; however, we should
    fix those up and enforce this ordering.

    Bring the order of operations for truncate into line with those of
    invalidate. This will prevent a page from being able to go !uptodate while
    we're holding the tree_lock, which is probably a good thing anyway.
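
    The ordering being enforced reduces to the following (a minimal sketch):

    /* A page (or buffer) must be made uptodate before it may be dirtied. */
    SetPageUptodate(page);
    set_page_dirty(page);

    /* Likewise for buffers: */
    set_buffer_uptodate(bh);
    mark_buffer_dirty(bh);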

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • When we are out of memory of a suitable size we enter reclaim. The current
    reclaim algorithm targets pages in LRU order, which is great for fairness at
    order-0 but highly unsuitable if you desire pages at higher orders. To get
    pages of higher order we must shoot down a very high proportion of memory;
    >95% in a lot of cases.

    This patch set adds a lumpy reclaim algorithm to the allocator. It targets
    groups of pages at the specified order anchored at the end of the active and
    inactive lists. This encourages groups of pages at the requested orders to
    move from active to inactive, and active to free lists. This behaviour is
    only triggered out of direct reclaim when higher order pages have been
    requested.

    This patch set is particularly effective when utilised with an
    anti-fragmentation scheme which groups pages of similar reclaimability
    together.
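
    In outline, the lumpy approach works on the naturally aligned block of pages
    around a page taken from the tail of the LRU. The following is a simplified
    sketch of the idea, not the patch itself:

    unsigned long pfn, start_pfn, end_pfn;

    /* From one LRU-tail page, derive the order-sized aligned block around it. */
    start_pfn = page_to_pfn(page) & ~((1UL << order) - 1);
    end_pfn   = start_pfn + (1UL << order);

    /* Try to isolate and reclaim every page in that block, not just the
     * single page the LRU handed us. */
    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
            struct page *cursor = pfn_to_page(pfn);
            /* skip if not on the LRU or not reclaimable, else isolate it */
    }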

    This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
    the foundation. Credit to Mel Gorman for sanity checking.

    Mel said:

    The patches have an application with hugepage pool resizing.

    When lumpy-reclaim is used with ZONE_MOVABLE, the hugepage pool can be
    resized with greater reliability. Testing on a desktop machine with 2GB
    of RAM showed that growing the hugepage pool with ZONE_MOVABLE on its own
    was very slow as the success rate was quite low. Without lumpy-reclaim,
    each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
    With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.

    [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
    [bunk@stusta.de: static declarations for internal functions]
    [a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
    Signed-off-by: Andy Whitcroft
    Acked-by: Peter Zijlstra
    Acked-by: Mel Gorman
    Cc: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • It is often known at allocation time whether a page may be migrated or not.
    This patch adds a flag called __GFP_MOVABLE and a new mask called
    GFP_HIGH_MOVABLE. Allocations using __GFP_MOVABLE can either be migrated
    using the page migration mechanism or reclaimed by syncing with backing
    storage and discarding.

    An API function very similar to alloc_zeroed_user_highpage() is added for
    __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
    flags used by alloc_zeroed_user_highpage() are not changed because it would
    change the semantics of an existing API. After this patch is applied there
    are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
    be marked deprecated if this patch is merged.
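
    Usage is straightforward; a caller that knows its page can later be migrated
    or reclaimed simply adds the flag (a sketch):

    struct page *page;

    /* An ordinary highmem user allocation that may be migrated later. */
    page = alloc_page(GFP_HIGHUSER | __GFP_MOVABLE);

    /* A zeroed anonymous user page via the new helper. */
    page = alloc_zeroed_user_highpage_movable(vma, address);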

    Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
    shmem.c to keep all flag modifications to inode->mapping in the
    shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
    Hugh Dickins.

    Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
    concept. Credit to Hugh Dickins for catching issues with the shmem swap vector
    and ramfs allocations.

    [akpm@linux-foundation.org: build fix]
    [hugh@veritas.com: __GFP_ZERO cleanup]
    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

17 Jul, 2007

1 commit


22 May, 2007

1 commit


17 May, 2007

2 commits

  • grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means
    the allocation of radix tree nodes is done with GFP_NOFS and the allocation
    of a new page is done using GFP_NOFS.

    The mapping has a flags field that contains the necessary allocation flags
    for the page cache allocation. These need to be consulted in order to get
    DMA and HIGHMEM allocations etc. right. And yes, a blockdev could be
    allowing highmem allocations if it's a ramdisk.
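
    One plausible shape of the fix (a sketch; the committed change may differ in
    detail) is to derive the mask from the mapping while still masking out
    __GFP_FS:

    gfp_t gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;

    page = find_or_create_page(inode->i_mapping, index, gfp_mask);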

    Cc: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
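
    For illustration, the pattern being removed looks like this in a typical
    inode-cache constructor of this era (a before/after sketch, not any
    particular filesystem's code):

    /* Before: the constructor guarded its work on the flag. */
    static void init_once(void *foo, struct kmem_cache *cachep, unsigned long flags)
    {
            if (flags & SLAB_CTOR_CONSTRUCTOR)
                    inode_init_once(foo);
    }

    /* After: just do the initialisation unconditionally, since the flag is
     * always passed. */
    static void init_once(void *foo, struct kmem_cache *cachep, unsigned long flags)
    {
            inode_init_once(foo);
    }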

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

2 commits

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).
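
    A CPU-hotplug-aware notifier then handles the new *_FROZEN events alongside
    the normal ones, e.g. (a sketch):

    static int my_cpu_callback(struct notifier_block *nfb,
                               unsigned long action, void *hcpu)
    {
            switch (action) {
            case CPU_UP_PREPARE:
            case CPU_UP_PREPARE_FROZEN:     /* suspend/resume variant */
                    /* same handling as the normal event, for now */
                    break;
            case CPU_DEAD:
            case CPU_DEAD_FROZEN:           /* suspend/resume variant */
                    break;
            }
            return NOTIFY_OK;
    }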

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • It's very common for file systems to need to zero part or all of a page;
    the simplest way is just to use kmap_atomic() and memset(). There's
    actually a library function in include/linux/highmem.h that does exactly
    that, but it's confusingly named memclear_highpage_flush(), which is
    descriptive of *how* it does the work rather than what the *purpose* is.
    So this patchset renames the function to zero_user_page(), and calls it
    from the various places that currently open code it.
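
    The open-coded pattern and its replacement look like this (a sketch; in this
    kernel the helper still takes a kmap slot argument):

    void *kaddr;

    /* Open-coded: */
    kaddr = kmap_atomic(page, KM_USER0);
    memset(kaddr + offset, 0, length);
    flush_dcache_page(page);
    kunmap_atomic(kaddr, KM_USER0);

    /* With the renamed helper: */
    zero_user_page(page, offset, length, KM_USER0);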

    This first patch introduces the new function call, and converts all the
    core kernel callsites, both the open-coded ones and the old
    memclear_highpage_flush() ones. Following this patch is a series of
    conversions for each file system individually, per AKPM, and finally a
    patch deprecating the old call. The diffstat below shows the entire
    patchset.

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Nate Diller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

09 May, 2007

2 commits


08 May, 2007

4 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that manipulates
    the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove duplicate work in kill_bdev().

    It currently invalidates and then truncates the bdev's mapping.
    invalidate_mapping_pages() will opportunistically remove pages from the
    mapping. And truncate_inode_pages() will forcefully remove all pages.

    The only thing truncate doesn't do is flush the bh lrus. So do that
    explicitly. This avoids (very unlikely, but possible) invalid lookup
    results if the same bdev is quickly re-issued.

    It also will prevent extreme kernel latencies which are observed when
    blockdevs which have a large amount of pagecache are unmounted, by avoiding
    invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
    cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
    has one.
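
    A sketch of the resulting shape, assuming the restored nrpages==0
    optimisation noted in the tags below:

    static void kill_bdev(struct block_device *bdev)
    {
            if (bdev->bd_inode->i_mapping->nrpages == 0)
                    return;
            invalidate_bh_lrus();
            truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
    }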

    [akpm@linux-foundation.org: restore nrpages==0 optimisation]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
            quilt add $file;
            sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • __block_write_full_page is calling SetPageUptodate without the page locked.
    This is unusual, but not incorrect, as PG_writeback is still set.

    However the next patch will require that SetPageUptodate always be called with
    the page locked. Simply don't bother setting the page uptodate in this case
    (it is unusual that the write path does such a thing anyway). Instead just
    leave it to the read side to bring the page uptodate when it notices that all
    buffers are uptodate.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Mar, 2007

1 commit

  • This fixes a regression caused by 22c8ca78f20724676b6006232bf06cc3e9299539.

    nobh_prepare_write() no longer marks the page uptodate, so
    nobh_truncate_page() needs to do it.

    Signed-off-by: Dave Kleikamp
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     

21 Feb, 2007

2 commits

  • nobh_prepare_write leaks data similarly to how simple_prepare_write did. Fix
    by not marking the page uptodate until nobh_commit_write time. Again, this
    could break weird use-cases, but none appear to exist in the tree.

    We can safely remove the set_page_dirty, because as the comment says,
    nobh_commit_write does set_page_dirty. If a filesystem wants to allocate
    backing store for a page dirtied via mmap, page_mkwrite is the suggested
    approach.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Andrew noticed that unlocking the page before submitting all buffers for
    writeout could cause problems if the IO completes before we've finished
    messing around with the page buffers, and they subsequently get freed.

    Even if there were no bug, it is a good idea to bring the error case
    into line with the common case here.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Feb, 2007

2 commits

  • While compiling my code with -Wconversion using gcc-trunk, I always get a
    bunch of warnings from headers; here is a fix for them:

    __getblk is always called with an unsigned argument,
    but it takes a signed one; the same story with __bread, __breadahead and so on.

    Signed-off-by: Tomasz Kvarsin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Kvarsin
     
  • Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
    bufferhead. Recently, I found the long standing mmap/unwritten extent
    conversion bug, and it was to do with partial page invalidation not clearing
    the unwritten flag from bufferheads attached to the page but beyond EOF. See
    here for a full explanation:

    http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

    The solution I have checked into the XFS dev tree involves duplicating code
    from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
    and then calling block_invalidatepage() to do the rest.

    Christoph suggested that this would be better solved by pushing the unwritten
    flag into the common buffer head flags and just adding the call to
    discard_buffer():

    http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

    The following patch makes BH_Unwritten a first class citizen.
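
    "First class citizen" here means a generic buffer state bit plus the usual
    accessors, roughly as follows (a sketch):

    enum bh_state_bits {
            /* ... existing bits ... */
            BH_Unwritten,    /* buffer is over an unwritten (preallocated) extent */
            BH_PrivateStart  /* not a state bit: first bit free for private use */
    };

    BUFFER_FNS(Unwritten, unwritten)   /* set_buffer_unwritten(), etc. */

    /* discard_buffer() then clears the flag together with the other state
     * bits when a page is (partially) invalidated. */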

    Signed-off-by: Dave Chinner
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

12 Feb, 2007

2 commits

  • unlock_buffer(), like unlock_page(), must not clear the lock without
    ensuring that the critical section is closed.

    Mingming later sent the same patch, saying:

    We are running the SDET benchmark and saw a double-free issue for an ext3
    extended attributes block, which complains that the same xattr block is
    already being freed (in ext3_xattr_release_block()). The problem can also be
    triggered by multiple threads looping untar/rm of a kernel tree.

    The race is caused by a missing memory barrier in unlock_buffer() before the
    lock bit is cleared, resulting in possible concurrent h_refcount updates.
    That causes a reference counter leak, then later leads to the double free that
    we have seen.

    Inside unlock_buffer(), a memory barrier is placed *after* the lock bit is
    cleared; however, there is no memory barrier *before* the bit is cleared.
    On some architectures the h_refcount update instruction and the clear-bit
    instruction can be reordered, thus leaving the critical section open to
    re-entry.

    The race is like this: For example, if the h_refcount is initialized as 1,

    cpu 0:                                   cpu 1:
    --------------------------------------   -----------------------------------
    lock_buffer() /* test_and_set_bit */
    clear_buffer_locked(bh);
                                             lock_buffer() /* test_and_set_bit */
    h_refcount = h_refcount + 1; /* = 2 */   h_refcount = h_refcount + 1; /* = 2 */
                                             clear_buffer_locked(bh);
    ....                                     ......

    We lost an h_refcount update here. We need a memory barrier before the buffer
    head lock bit is cleared to force the order of the two writes. Please apply.
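
    The fix amounts to adding a barrier before the clear, roughly as below (a
    sketch; the committed patch may use different primitives):

    void unlock_buffer(struct buffer_head *bh)
    {
            smp_mb__before_clear_bit();     /* order critical-section stores */
            clear_buffer_locked(bh);
            smp_mb__after_clear_bit();
            wake_up_bit(&bh->b_state, BH_Lock);
    }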

    Signed-off-by: Nick Piggin
    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Convert all calls to invalidate_inode_pages() into open-coded calls to
    invalidate_mapping_pages().

    Leave the invalidate_inode_pages() wrapper in place for now, marked as
    deprecated.
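
    The conversion itself is mechanical (a sketch):

    invalidate_inode_pages(mapping);
    /* becomes */
    invalidate_mapping_pages(mapping, 0, -1);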

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

30 Jan, 2007

1 commit

  • Fix commit ecdfc9787fe527491baefc22dce8b2dbd5b2908d

    Not to put too fine a point on it, but in a nutshell...

    __set_page_dirty_buffers()   |   try_to_free_buffers()
    -----------------------------+-----------------------------
                                 |   spin_lock(private_lock);
                                 |   drop_buffers()
                                 |   spin_unlock(private_lock);
    spin_lock(private_lock)      |
    !page_has_buffers()          |
    spin_unlock(private_lock)    |
    SetPageDirty()               |
                                 |   cancel_dirty_page()

    oops!

    Signed-off-by: Nick Piggin
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

27 Jan, 2007

1 commit

  • It's not pretty, but it appears that ext3 with data=journal will clean
    pages without ever actually telling the VM that they are clean. This,
    in turn, results in the VM (and balance_dirty_pages() in particular)
    never realizing that the pages got cleaned, and waiting forever for an
    event that already happened.

    Technically, this seems to be a problem with ext3 itself, but it used to
    be hidden by 'try_to_free_buffers()' noticing this situation on its own,
    and just working around the filesystem problem.

    This commit re-instates that hack, in order to avoid a regression for
    the 2.6.20 release. This fixes bugzilla 7844:

    http://bugzilla.kernel.org/show_bug.cgi?id=7844

    Peter Zijlstra points out that we should probably retain the debugging
    code that this removes from cancel_dirty_page(), and I agree, but for
    the imminent release we might as well just silence the warning too
    (since it's not a new bug: anything that triggers that warning has been
    around forever).

    Acked-by: Randy Dunlap
    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Jan, 2007

1 commit

  • Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
    xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

    (XFS unlocks the semaphore from a different task, by design. The mutex
    code warns about this)

    Signed-off-by: Dave Chinner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

22 Dec, 2006

1 commit


11 Dec, 2006

3 commits

  • Account for the number of byte writes which this process caused to not happen
    after all.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Accounting writes is fairly simple: whenever a process flips a page from clean
    to dirty, we accuse it of having caused a write to underlying storage of
    PAGE_CACHE_SIZE bytes.

    This may overestimate the amount of writing: the page-dirtying may cause only
    one buffer_head's worth of writeout. Fixing that is possible, but probably a
    bit messy and isn't obviously important.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
    and a few other places. No functional changes.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Dec, 2006

2 commits

  • There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
    prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
    generating compiler warnings of unused symbols, hence forcing people to add
    #ifdefs.

    The compiler can skip truly unused functions just fine:

       text    data     bss     dec    hex filename
    1624412  728710 3674856 6027978 5bfaca vmlinux.before
    1624412  728710 3674856 6027978 5bfaca vmlinux.after

    [akpm@osdl.org: topology.c fix]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
            mv /tmp/$$ $file
            quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Oct, 2006

1 commit

  • When an IO error happens on a metadata buffer, the buffer is freed from
    memory, and a later fsync() call on filesystems like ext2 then fails to
    report EIO. We solve the problem by introducing a pointer to the associated
    address space into the buffer_head. When a buffer is removed from a list of
    metadata buffers associated with an address space, the IO error is
    transferred from the buffer to the address space, so that fsync can later
    report it.
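
    In outline (a sketch of the mechanism, not the exact hunks): the buffer_head
    records which address space it is associated with, and a write IO error is
    transferred to that mapping when the buffer is dropped from the list.

    struct buffer_head {
            /* ... */
            struct address_space *b_assoc_map;  /* mapping this buffer is
                                                   associated with */
    };

    /* When removing the buffer from the mapping's metadata buffer list: */
    if (buffer_write_io_error(bh) && bh->b_assoc_map)
            set_bit(AS_EIO, &bh->b_assoc_map->flags);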

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

12 Oct, 2006

2 commits

  • A couple of flush_dcache_page()s are missing on the I/O-error paths.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monakhov Dmitriy
     
  • If grow_buffers() is for some reason passed a block number which wants to lie
    outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
    will accidentally truncate `index' and will then instantiate a page at the
    wrong pagecache offset. This causes __getblk_slow() to go into an infinite
    loop.

    This can happen with corrupted disks, or with software errors elsewhere.

    Detect that, and handle it.
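
    The detection is a cheap comparison done back in the wider sector_t type (a
    sketch of the check):

    pgoff_t index = block >> sizebits;      /* may truncate on 32-bit */

    /* If converting back does not reproduce the block, it lies outside the
     * addressable pagecache range; fail instead of looping forever. */
    if (unlikely(index != block >> sizebits)) {
            printk(KERN_ERR "grow_buffers: out-of-range block %llu\n",
                   (unsigned long long)block);
            return -EIO;
    }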

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Oct, 2006

1 commit

  • This was triggered by, but is not the fault of, the dirty page accounting
    patches. Suitable for -stable as well, after it goes upstream.

    Unable to handle kernel NULL pointer dereference at virtual address 0000004c
    EIP is at _spin_lock+0x12/0x66
    Call Trace:
    [] __set_page_dirty_buffers+0x15/0xc0
    [] set_page_dirty+0x2c/0x51
    [] set_page_dirty_balance+0xb/0x3b
    [] __do_fault+0x1d8/0x279
    [] __handle_mm_fault+0x125/0x951
    [] do_page_fault+0x440/0x59f
    [] error_code+0x39/0x40
    [] 0x8048a33

    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Oct, 2006

1 commit

  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

    (*) The file sync and general sync stuff moved to fs/sync.c.

    (*) The superblock sync stuff moved to fs/super.c.

    (*) do_invalidatepage() moved to mm/truncate.c.

    (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

    (*) declarations for do_invalidatepage() and try_to_release_page() moved
    to linux/mm.h.

    (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

26 Sep, 2006

1 commit

  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows; a VMA is eligible when (see the sketch
    below):
    - it is shared writeable
        (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping
        (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty
        mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection
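
    Expressed as code, the eligibility test reads roughly as follows (a sketch;
    the helper name is hypothetical):

    static int wants_dirty_tracking(struct vm_area_struct *vma)
    {
            if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) != (VM_WRITE|VM_SHARED))
                    return 0;               /* not shared writeable */
            if (vma->vm_flags & (VM_PFNMAP|VM_INSERTPAGE))
                    return 0;               /* 'special' mapping */
            if (!vma->vm_file || !vma->vm_file->f_mapping ||
                !mapping_cap_account_dirty(vma->vm_file->f_mapping))
                    return 0;               /* bdi cannot account dirty pages */
            /* plus: f_op->mmap() must not have changed vm_page_prot */
            return 1;
    }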

    Pages from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However it overrides
    the last condition.

    Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
    It can be called on any page, but is currently only implemented for mapped
    pages; if the page is found to be part of a VMA that accounts dirty pages, it
    will also write-protect the PTE.

    Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
    under ->private_lock. This seems to be safe, since ->private_lock is used to
    serialize access to the buffers, not the page itself. This is needed because
    clear_page_dirty() will call into page_mkclean() and would thereby violate
    locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

01 Aug, 2006

1 commit

  • We can immediately bail from invalidate_bdev() if the blockdev has no
    pagecache.

    This solves the huge IPI storms which hald is causing on the big ia64
    machines when it polls CDROM drives.
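
    The change is a simple early exit at the top of invalidate_bdev() (a sketch):

    struct address_space *mapping = bdev->bd_inode->i_mapping;

    if (mapping->nrpages == 0)
            return;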

    Acked-by: Jes Sorensen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton