28 Apr, 2008

1 commit

  • DIO invalidates page cache through invalidate_inode_pages2_range().
    invalidate_inode_pages2_range() sets ret=-EIO when
    invalidate_complete_page2() fails, but this ret is cleared if
    do_launder_page() succeeds on a page at the next index.

    In this case, dio is carried out even if invalidate_complete_page2() fails
    on some pages.

    This can cause inconsistency between memory and blocks on HDD because the
    page cache still exists.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hisashi Hifumi
    Cc: Badari Pulavarty
    Cc: Ken Chen
    Cc: Zach Brown
    Cc: Nick Piggin
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Chuck Lever
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
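
    The failure mode described in this commit can be illustrated with a small
    user-space sketch. It is not the kernel code: struct fake_page, both
    function names and the ret2 variable are assumptions made for
    illustration. The point is only that a per-page result must not overwrite
    an error recorded for an earlier page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page-cache page (not the kernel's struct page). */
struct fake_page {
    bool launder_ok;    /* would do_launder_page() succeed on this page? */
    bool invalidate_ok; /* would invalidate_complete_page2() succeed? */
};

/* Buggy shape: the launder result overwrites ret on every iteration. */
static int invalidate_range_buggy(const struct fake_page *pages, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        ret = pages[i].launder_ok ? 0 : -EIO; /* clears an earlier -EIO */
        if (ret == 0 && !pages[i].invalidate_ok)
            ret = -EIO;
    }
    return ret;
}

/* Fixed shape: per-page results live in ret2; ret only ever records failure. */
static int invalidate_range_fixed(const struct fake_page *pages, size_t n)
{
    int ret = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret2 = pages[i].launder_ok ? 0 : -EIO;

        if (ret2 == 0 && !pages[i].invalidate_ok)
            ret2 = -EIO;
        if (ret2 < 0)
            ret = ret2;
    }
    return ret;
}
```

    With two pages, where invalidation fails on the first and everything
    succeeds on the second, the buggy variant returns 0 and the fixed variant
    returns -EIO, so a caller such as direct I/O can see the failure.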
     

04 Mar, 2008

1 commit


06 Feb, 2008

3 commits

  • An orphaned page (one that has been truncated) might still have fs-private
    metadata. Because the page has no mapping, page migration refuses to
    migrate it. It appears such a page is only freed in page reclaim, and if
    the zone watermark is low the page is never freed, so migration always
    fails. I thought we could free the metadata so that such a page can be
    freed in migration, making migration more reliable.

    [akpm@linux-foundation.org: go direct to try_to_free_buffers()]
    Signed-off-by: Shaohua Li
    Acked-by: Nick Piggin
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 ("Clean up and make
    try_to_free_buffers() not race with dirty pages"), try_to_free_buffers
    was changed to bail out if the page was dirty.

    That in turn caused truncate_complete_page to leak massive amounts of
    memory, because the dirty bit was only cleared after the call to
    try_to_free_buffers.

    So the call to cancel_dirty_page was moved up to have the dirty bit
    cleared early in 3e67c0987d7567ad666641164a153dca9a43b11d ("truncate:
    clear page dirtiness before running try_to_free_buffers()").

    The problem with that fix is, that the page can be redirtied after
    cancel_dirty_page was called, eg. like this:

    truncate_complete_page()
      cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
      do_invalidatepage()
        ext3_invalidatepage()
          journal_invalidatepage()
            journal_unmap_buffer()
              __dispose_buffer()
                __journal_unfile_buffer()
                  __journal_temp_unlink_buffer()
                    mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

    And then we end up with dirty pages being wrongly accounted.

    As a result, in ecdfc9787fe527491baefc22dce8b2dbd5b2908d ("Resurrect
    'try_to_free_buffers()' VM hackery") the changes to try_to_free_buffers
    were reverted, so the original reason for the massive memory leak is
    gone, and we can also revert the move of the call to cancel_dirty_page
    from truncate_complete_page and get the accounting right again.

    I'm not sure if it matters, but as opposed to the final check in
    __remove_from_page_cache, this one also cares about the task io
    accounting, so maybe we want to use this instead, although it's not
    quite the clean fix either.

    Signed-off-by: Björn Steinbrink
    Tested-by: Krzysztof Piotr Oledzki
    Cc: Jan Kara
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Thomas Osterried
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Steinbrink
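
    The accounting problem in this commit comes down to ordering, which a
    user-space sketch can show. Everything here is hypothetical (struct
    sketch_page, the counter, and all function names); the invalidatepage
    stand-in simply re-dirties the page, modelling the journal path in the
    trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; only the dirty bit is modelled. */
struct sketch_page {
    bool dirty;
};

/* Set the dirty bit and bump the (hypothetical) dirty-page counter once. */
static void sketch_set_dirty(struct sketch_page *page, long *nr_dirty)
{
    if (!page->dirty) {
        page->dirty = true;
        (*nr_dirty)++;
    }
}

/* Clear the dirty bit and decrement the counter, if the page was dirty. */
static void sketch_cancel_dirty(struct sketch_page *page, long *nr_dirty)
{
    if (page->dirty) {
        page->dirty = false;
        (*nr_dirty)--;
    }
}

/* Models the journal path, which ends in mark_buffer_dirty(). */
static void sketch_invalidatepage(struct sketch_page *page, long *nr_dirty)
{
    sketch_set_dirty(page, nr_dirty);
}

/* Early cancel: the page is re-dirtied afterwards and then goes away. */
static void sketch_truncate_cancel_first(struct sketch_page *page, long *nr_dirty)
{
    sketch_cancel_dirty(page, nr_dirty);
    sketch_invalidatepage(page, nr_dirty);
}

/* Cancel after invalidation: nothing re-dirties the page afterwards. */
static void sketch_truncate_cancel_last(struct sketch_page *page, long *nr_dirty)
{
    sketch_invalidatepage(page, nr_dirty);
    sketch_cancel_dirty(page, nr_dirty);
}
```

    Starting from one dirty page and a counter of 1, the early-cancel
    ordering leaves the counter at 1 for a page that no longer exists, while
    the late-cancel ordering brings it back to 0.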
     
  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means a lot of code no longer has to deal
    with the special casing for HIGHMEM. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    could not be an inline function and had to be a macro. The zero_user_*
    functions introduced here can be inline because that constant is not
    used when these functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc:
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: Michael Halcrow
    Cc: Steven French
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
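
    The three helpers named in this commit can be pictured with a user-space
    sketch that operates on a plain byte buffer. This is an illustration
    under assumptions, not the highmem.h code: the sketch_ prefix and
    SKETCH_PAGE_SIZE are made up, and the kmap handling and cache flushing
    the commit mentions are left out entirely.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Zero the byte ranges [start1, end1) and [start2, end2) of a page buffer. */
static inline void sketch_zero_user_segments(unsigned char *page,
                                             unsigned start1, unsigned end1,
                                             unsigned start2, unsigned end2)
{
    assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);

    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: start and end positions, no length arithmetic. */
static inline void sketch_zero_user_segment(unsigned char *page,
                                            unsigned start, unsigned end)
{
    sketch_zero_user_segments(page, start, end, 0, 0);
}

/* Length variant for callers that already know the length. */
static inline void sketch_zero_user(unsigned char *page,
                                    unsigned start, unsigned size)
{
    sketch_zero_user_segments(page, start, start + size, 0, 0);
}
```

    Passing start and end positions means a caller that wants to clear the
    head and tail of a page around some valid data can do so in one call,
    without computing two lengths.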
     

04 Feb, 2008

1 commit


17 Oct, 2007

2 commits


20 Jul, 2007

2 commits

  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation because the filesystem/pagecache code
    should not need to know anything about the virtual memory mapping. The hitch here
    is that the ->nopage handler didn't pass down enough information (ie. pgoff).
    But it is more logical to pass pgoff rather than have the ->nopage function
    calculate it itself anyway (because that's a similar layering violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in place
    so there is a lot of duplication that can be removed, and nice cleanups
    that can be made, if everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. Seems like a fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix the race between invalidate_inode_pages and do_no_page.

    Andrea Arcangeli identified a subtle race between invalidation of pages from
    pagecache with userspace mappings, and do_no_page.

    The issue is that invalidation has to shoot down all mappings to the page,
    before it can be discarded from the pagecache. Between shooting down ptes to
    a particular page, and actually dropping the struct page from the pagecache,
    do_no_page from any process might fault on that page and establish a new
    mapping to the page just before it gets discarded from the pagecache.

    The most common case where such invalidation is used is in file truncation.
    This case was catered for by doing a sort of open-coded seqlock between the
    file's i_size, and its truncate_count.

    Truncation will decrease i_size, then increment truncate_count before
    unmapping userspace pages; do_no_page will read truncate_count, then find the
    page if it is within i_size, and then check truncate_count under the page
    table lock and back out and retry if it had subsequently been changed (ptl
    will serialise against unmapping, and ensure a potentially updated
    truncate_count is actually visible).

    Complexity and documentation issues aside, the locking protocol fails in the
    case where we would like to invalidate pagecache inside i_size. do_no_page
    can come in anytime and filemap_nopage is not aware of the invalidation in
    progress (as it is when it is outside i_size). The end result is that
    dangling (->mapping == NULL) pages that appear to be from a particular file
    may be mapped into userspace with nonsense data. Valid mappings to the same
    place will see a different page.

    Andrea implemented two working fixes, one using a real seqlock, another using
    a page->flags bit. He also proposed using the page lock in do_no_page, but
    that was initially considered too heavyweight. However, it is not a global or
    per-file lock, and the page cacheline is modified in do_no_page to increment
    _count and _mapcount anyway, so a further modification should not be a large
    performance hit. Scalability is not an issue.

    This patch implements this latter approach. ->nopage implementations return
    with the page locked if it is possible for their underlying file to be
    invalidated (in that case, they must set a special vm_flags bit to indicate
    so). do_no_page only unlocks the page after setting up the mapping
    completely. invalidation is excluded because it holds the page lock during
    invalidation of each page (and ensures that the page is not mapped while
    holding the lock).

    This also allows significant simplifications in do_no_page, because we have
    the page locked in the right place in the pagecache from the start.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

18 Jul, 2007

1 commit

  • It is a bug to set a page dirty if it is not uptodate unless it has
    buffers. If the page has buffers, then the page may be dirty (some buffers
    dirty) but not uptodate (some buffers not uptodate). The exception to this
    rule is if the set_page_dirty caller is racing with truncate or invalidate.

    A buffer can not be set dirty if it is not uptodate.

    If either of these situations occurs, it indicates there could be some data
    loss problem. Some of these warnings could be harmless, where the
    page or buffer is set uptodate immediately after it is dirtied; however, we
    should fix those up, and enforce this ordering.

    Bring the order of operations for truncate into line with those of
    invalidate. This will prevent a page from being able to go !uptodate while
    we're holding the tree_lock, which is probably a good thing anyway.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Jul, 2007

2 commits

  • invalidate_mapping_pages() can sometimes take a long time (millions of pages
    to free). Long enough for the softlockup detector to trigger.

    We used to have a cond_resched() in there but I took it out because the
    drop_caches code calls invalidate_mapping_pages() under inode_lock.

    The patch adds a nasty flag and puts the cond_resched() back.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Fix the shrink_list name on some files under mm/ directory.

    Signed-off-by: Anderson Briglia
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anderson Briglia
     

10 May, 2007

1 commit

  • It's very common for file systems to need to zero part or all of a page;
    the simplest way is just to use kmap_atomic() and memset(). There's
    actually a library function in include/linux/highmem.h that does exactly
    that, but it's confusingly named memclear_highpage_flush(), which is
    descriptive of *how* it does the work rather than what the *purpose* is.
    So this patchset renames the function to zero_user_page(), and calls it
    from the various places that currently open code it.

    This first patch introduces the new function call, and converts all the
    core kernel callsites, both the open-coded ones and the old
    memclear_highpage_flush() ones. Following this patch is a series of
    conversions for each file system individually, per AKPM, and finally a
    patch deprecating the old call. The diffstat below shows the entire
    patchset.

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Nate Diller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

02 Mar, 2007

1 commit


12 Feb, 2007

2 commits

  • Convert all calls to invalidate_inode_pages() into open-coded calls to
    invalidate_mapping_pages().

    Leave the invalidate_inode_pages() wrapper in place for now, marked as
    deprecated.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • It makes no sense to me to export invalidate_inode_pages() and not
    invalidate_mapping_pages() and I actually need invalidate_mapping_pages()
    because of its range specification ability...

    akpm: also remove the export of invalidate_inode_pages() by making it an
    inlined wrapper.

    Signed-off-by: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     

27 Jan, 2007

2 commits

  • NFS can handle the case where invalidate_inode_pages2_range() fails, so the
    premise behind commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140 is now gone.

    Remove the WARN_ON_ONCE() which is causing users grief as we can see from
    http://bugzilla.kernel.org/show_bug.cgi?id=7826

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • It's not pretty, but it appears that ext3 with data=journal will clean
    pages without ever actually telling the VM that they are clean. This,
    in turn, will result in the VM (and balance_dirty_pages() in particular)
    never realizing that the pages got cleaned, and waiting forever for an
    event that already happened.

    Technically, this seems to be a problem with ext3 itself, but it used to
    be hidden by 'try_to_free_buffers()' noticing this situation on its own,
    and just working around the filesystem problem.

    This commit re-instates that hack, in order to avoid a regression for
    the 2.6.20 release. This fixes bugzilla 7844:

    http://bugzilla.kernel.org/show_bug.cgi?id=7844

    Peter Zijlstra points out that we should probably retain the debugging
    code that this removes from cancel_dirty_page(), and I agree, but for
    the imminent release we might as well just silence the warning too
    (since it's not a new bug: anything that triggers that warning has been
    around forever).

    Acked-by: Randy Dunlap
    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Jan, 2007

1 commit

  • NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    Fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

    Signed-off-by: Trond Myklebust

    Also, the bare SetPageDirty() can skew all sorts of accounting, leading to
    other nasties.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Peter Zijlstra
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

24 Dec, 2006

1 commit

  • Make cancel_dirty_page() act more like all the other dirty and writeback
    accounting functions: test for "mapping" being NULL, and do the
    NR_FILE_DIRTY accounting purely based on mapping_cap_account_dirty().

    Also, add it to the exports, so that modular filesystems can use it.

    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Dec, 2006

1 commit


22 Dec, 2006

2 commits

  • truncate presently invalidates the dirty page's buffer_heads then shoots down
    the page. But try_to_free_buffers() will now bale out because the page is
    dirty.

    Net effect: the LRU gets filled with dirty pages which have invalidated
    buffer_heads attached. They have no ->mapping and hence cannot be cleaned.
    The machine leaks memory at an enormous rate.

    Fix this by cleaning the page before running try_to_free_buffers(), so
    try_to_free_buffers() can do its work.

    Also, remember to do dirty-page-accounting in cancel_dirty_page() so the
    machine won't wedge up trying to write non-existent dirty pages.

    Probably still wrong, but now less so.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • They were horribly easy to mis-use because of their tempting naming, and
    they also did way more than any users of them generally wanted them to
    do.

    A dirty page can become clean under two circumstances:

    (a) when we write it out. We have "clear_page_dirty_for_io()" for
    this, and that function remains unchanged.

    In the "for IO" case it is not sufficient to just clear the dirty
    bit, you also have to mark the page as being under writeback etc.

    (b) when we actually remove a page due to it becoming inaccessible to
    users, notably because it was truncate()'d away or the file (or
    metadata) no longer exists, and we thus want to cancel any
    outstanding dirty state.

    For the (b) case, we now introduce "cancel_dirty_page()", which only
    touches the page state itself, and verifies that the page is not mapped
    (since cancelling writes on a mapped page would be actively wrong as it
    is still accessible to users).

    Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
    ReiserFS, XFS all use the old confusing functions, and will be fixed
    separately in subsequent commits (with some of them just removing the
    offending logic, and others using clear_page_dirty_for_io()).

    This was confirmed by Martin Michlmayr to fix the apt database
    corruption on ARM.

    Cc: Martin Michlmayr
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: Arjan van de Ven
    Cc: Andrei Popa
    Cc: Andrew Morton
    Cc: Dave Kleikamp
    Cc: Gordon Farquharson
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Dec, 2006

1 commit

  • Account for the number of byte writes which this process caused to not happen
    after all.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Oct, 2006

1 commit

  • If remove_mapping() failed to remove the page from its mapping, don't go and
    mark it not uptodate! Makes kernel go dead.

    (Actually, I don't think the ClearPageUptodate is needed there at all).

    Says Nick Piggin:

    "Right, it isn't needed because at this point the page is guaranteed
    by remove_mapping to have no references (except us) and cannot pick
    up any new ones because it is removed from pagecache.

    We can delete it."

    Signed-off-by: Andrew Morton
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Oct, 2006

2 commits

  • If try_to_release_page() is called with a zero gfp mask, then the
    filesystem is effectively denied the possibility of sleeping while
    attempting to release the page. There doesn't appear to be any valid
    reason why this should be banned, given that we're not calling this from a
    memory allocation context.

    For this reason, change the gfp_mask argument of the call to GFP_KERNEL.

    Signed-off-by: Trond Myklebust
    Cc: Steve Dickson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • A failure in invalidate_inode_pages2_range() can result in unpleasant things
    happening in NFS (at least). Stick a WARN_ON_ONCE() in there so we can find
    out if it happens, and maybe why.

    (akpm: might be a -mm-only patch, we'll see..)

    Cc: Chuck Lever
    Cc: Trond Myklebust
    Cc: Steve Dickson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Oct, 2006

3 commits

  • The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
    unfix invalidate_inode_pages2().

    The problem is that various bits of code in the kernel can take transient refs
    on pages: the page scanner will do this when inspecting a batch of pages, and
    the lru_cache_add() batching pagevecs also hold a ref.

    Net result is transient failures in invalidate_inode_pages2(). This affects
    NFS directory invalidation (observed) and presumably also block-backed
    direct-io (not yet reported).

    Fix it by reverting invalidate_inode_pages2() back to the old version which
    ignores the page refcounts.

    We may come up with something more clever later, but for now we need a 2.6.18
    fix for NFS.

    Cc: Chuck Lever
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

         (*) Block I/O tracing.

         (*) Disk partition code.

         (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

         (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
         block layer to do scheduling. Some drivers that use SCSI facilities -
         such as USB storage - end up disabled indirectly from this.

         (*) Various block-based device drivers, such as IDE and the old CDROM
         drivers.

         (*) MTD blockdev handling and FTL.

         (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
         taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

         (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

         (*) /proc/devices does not list any blockdevs.

         (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

         (*) The file sync and general sync stuff moved to fs/sync.c.

         (*) The superblock sync stuff moved to fs/super.c.

         (*) do_invalidatepage() moved to mm/truncate.c.

         (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

         (*) declarations for do_invalidatepage() and try_to_release_page() moved
         to linux/mm.h.

         (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

27 Sep, 2006

1 commit


09 Sep, 2006

1 commit

  • If a CPU faults this page into pagetables after invalidate_mapping_pages()
    checked page_mapped(), invalidate_complete_page() will still proceed to remove
    the page from pagecache. This leaves the page-faulting process with a
    detached page. If it was MAP_SHARED then file data loss will ensue.

    Fix that up by checking the page's refcount after taking tree_lock.

    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Jun, 2006

1 commit

  • If invalidate_mapping_pages is called to invalidate a very large mapping
    (e.g. a very large block device) and if the only active page in that
    device is near the end (or at least, at a very large index), such as, say,
    the superblock of an md array, and if that page happens to be locked when
    invalidate_mapping_pages is called, then

        pagevec_lookup will return this page and,
        as it is locked, 'next' will be incremented and pagevec_lookup
        will be called again. And again. And again,
        while we count from 0 up to a very large number.

    We should really always set 'next' to 'page->index+1' before going around
    the loop again, not just if the page isn't locked.

    Cc: "Steinar H. Gunderson"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
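
    The cost of the old loop can be made concrete with a user-space sketch.
    It simplifies heavily and is not the kernel loop: the lookup returns one
    page at a time rather than a pagevec batch, and struct sketch_page and
    all function names are hypothetical. Each scan function returns the
    number of lookups it performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached page: its index in the mapping and its lock state. */
struct sketch_page {
    unsigned long index;
    bool locked;
};

/* Return the first page with index >= start (pages sorted by index), or NULL. */
static const struct sketch_page *sketch_lookup(const struct sketch_page *pages,
                                               size_t n, unsigned long start)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].index >= start)
            return &pages[i];
    return NULL;
}

/* Old behaviour: a locked page only advances 'next' by one. */
static unsigned long sketch_scan_old(const struct sketch_page *pages, size_t n)
{
    unsigned long next = 0, lookups = 0;

    for (;;) {
        const struct sketch_page *p = sketch_lookup(pages, n, next);

        lookups++;
        if (!p)
            break;
        if (p->locked) {
            next++;
            continue;
        }
        next = p->index + 1;
    }
    return lookups;
}

/* Fixed behaviour: always continue from page->index + 1. */
static unsigned long sketch_scan_fixed(const struct sketch_page *pages, size_t n)
{
    unsigned long next = 0, lookups = 0;

    for (;;) {
        const struct sketch_page *p = sketch_lookup(pages, n, next);

        lookups++;
        if (!p)
            break;
        next = p->index + 1; /* locked or not, skip past this page */
    }
    return lookups;
}
```

    For a mapping whose only page is locked and sits at index 1000, the old
    scan performs 1002 lookups and the fixed scan performs 2. With an index
    in the billions the old loop would effectively never finish.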
     

10 Jan, 2006

1 commit


09 Jan, 2006

1 commit

  • Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
    discard as much pagecache and/or reclaimable slab objects as it can. This
    operation requires root permissions.

    It won't drop dirty data, so the user should run `sync' first.

    Caveats:

    a) Holds inode_lock for exorbitant amounts of time.

    b) Needs to be taught about NUMA nodes: propagate these all the way through
    so the discarding can be controlled on a per-node basis.

    This is a debugging feature: useful for getting consistent results between
    filesystem benchmarks. We could possibly put it under a config option, but
    it's less than 300 bytes.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
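
    Writing to the new file from a program might look like the sketch below.
    The helper name is hypothetical, and the meaning of the values (1 for
    pagecache, 2 for dentries and inodes, 3 for both) is taken from the
    kernel's sysctl documentation rather than from this commit message. The
    path is a parameter so the helper can be pointed at an ordinary file when
    trying it out.

```c
#include <stdio.h>

/*
 * Write a drop_caches mode (1, 2 or 3) to the given path, normally
 * "/proc/sys/vm/drop_caches". Returns 0 on success and -1 on any error,
 * including an out-of-range mode or a permission failure on open.
 */
static int sketch_write_drop_caches(const char *path, int mode)
{
    FILE *f;
    int ok;

    if (mode < 1 || mode > 3)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

    Since dirty data is not dropped, a caller would typically invoke sync()
    before sketch_write_drop_caches("/proc/sys/vm/drop_caches", 3), and the
    call is expected to fail without root permissions.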
     

07 Jan, 2006

1 commit

  • This patch derives truncate_inode_pages_range from truncate_inode_pages.
    truncate_inode_pages becomes a one-liner call to truncate_inode_pages_range.

    Reiser4 needs truncate_inode_pages_range because it tries to keep a
    correspondence between the existence of metadata pointing to data pages and
    the pages to which that metadata points. So, when the metadata for a certain
    part of a file is removed from the filesystem tree, only the pages of the
    corresponding range are to be truncated.

    (Needed by the madvise(MADV_REMOVE) patch)

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Reiser
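
    The relationship between the two functions can be sketched in user space
    as follows. This is a toy model under assumptions: struct sketch_mapping,
    the eight-page cache and the sketch_ names are all hypothetical, and the
    zeroing of a partial page at the start of the range is omitted. The end
    offset is inclusive, and -1 is used to mean "to the end of the file".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1LL << SKETCH_PAGE_SHIFT)
#define SKETCH_NPAGES     8                         /* hypothetical tiny cache */

struct sketch_mapping {
    bool present[SKETCH_NPAGES];
};

/*
 * Drop every cached page lying wholly inside the byte range [lstart, lend].
 * lend is the offset of the last byte and must fall on the last byte of a
 * page; -1 means "to the end of the file".
 */
static void sketch_truncate_inode_pages_range(struct sketch_mapping *m,
                                              long long lstart, long long lend)
{
    size_t first, last = SKETCH_NPAGES - 1;

    assert(lstart >= 0 && lend >= -1);
    assert((lend + 1) % SKETCH_PAGE_SIZE == 0);

    /* Round the start up: a partially covered first page is kept. */
    first = (size_t)((lstart + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
    if (lend >= 0 && (size_t)(lend >> SKETCH_PAGE_SHIFT) < last)
        last = (size_t)(lend >> SKETCH_PAGE_SHIFT);

    for (size_t i = first; i <= last; i++)
        m->present[i] = false;
}

/* The whole-file variant reduces to a one-line call into the range variant. */
static void sketch_truncate_inode_pages(struct sketch_mapping *m, long long lstart)
{
    sketch_truncate_inode_pages_range(m, lstart, -1);
}
```

    A caller that removes the metadata for just one page-sized part of a file
    can then drop only the matching page, leaving its neighbours cached.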
     

24 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • Fix the problem (BUG 4964) with unmapped buffers in transaction's
    t_sync_data list. The problem is we need to call filesystem's own
    invalidatepage() from block_write_full_page().

    block_write_full_page() must call the filesystem's invalidatepage().
    Otherwise the following nasty race can happen:

    proc 1                                      proc 2
    ------                                      ------
    - write some new data to 'offset'
      => bh gets to the transactions data list
                                                - starts truncate
                                                  => i_size set to new size
    - mpage_writepages()
      - ext3_ordered_writepage() to 'offset'
        - block_write_full_page()
          - page->index > end_index+1
            - block_invalidatepage()
              - discard_buffer()
                - clear_buffer_mapped()

    - commit triggers and finds unmapped buffer - BOOM!

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

01 May, 2005

1 commit