10 May, 2007

1 commit

  • It's very common for file systems to need to zero part or all of a page,
    the simplist way is just to use kmap_atomic() and memset(). There's
    actually a library function in include/linux/highmem.h that does exactly
    that, but it's confusingly named memclear_highpage_flush(), which is
    descriptive of *how* it does the work rather than what the *purpose* is.
    So this patchset renames the function to zero_user_page(), and calls it
    from the various places that currently open code it.

    This first patch introduces the new function call, and converts all the
    core kernel callsites, both the open-coded ones and the old
    memclear_highpage_flush() ones. Following this patch is a series of
    conversions for each file system individually, per AKPM, and finally a
    patch deprecating the old call. The diffstat below shows the entire
    patchset.

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Nate Diller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

02 Mar, 2007

1 commit


12 Feb, 2007

2 commits

  • Convert all calls to invalidate_inode_pages() into open-coded calls to
    invalidate_mapping_pages().

    Leave the invalidate_inode_pages() wrapper in place for now, marked as
    deprecated.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • It makes no sense to me to export invalidate_inode_pages() and not
    invalidate_mapping_pages() and I actually need invalidate_mapping_pages()
    because of its range specification ability...

    akpm: also remove the export of invalidate_inode_pages() by making it an
    inlined wrapper.

    Signed-off-by: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     

27 Jan, 2007

2 commits

  • NFS can handle the case where invalidate_inode_pages2_range() fails, so the
    premise behind commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140 is now gone.

    Remove the WARN_ON_ONCE() which is causing users grief as we can see from
    http://bugzilla.kernel.org/show_bug.cgi?id=7826

    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • It's not pretty, but it appears that ext3 with data=journal will clean
    pages without ever actually telling the VM that they are clean. This,
    in turn, will result in the VM (and balance_dirty_pages() in particular)
    to never realize that the pages got cleaned, and wait forever for an
    event that already happened.

    Technically, this seems to be a problem with ext3 itself, but it used to
    be hidden by 'try_to_free_buffers()' noticing this situation on its own,
    and just working around the filesystem problem.

    This commit re-instates that hack, in order to avoid a regression for
    the 2.6.20 release. This fixes bugzilla 7844:

    http://bugzilla.kernel.org/show_bug.cgi?id=7844

    Peter Zijlstra points out that we should probably retain the debugging
    code that this removes from cancel_dirty_page(), and I agree, but for
    the imminent release we might as well just silence the warning too
    (since it's not a new bug: anything that triggers that warning has been
    around forever).

    Acked-by: Randy Dunlap
    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Jan, 2007

1 commit

  • NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    Fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

    Signed-off-by: Trond Myklebust

    Also, the bare SetPageDirty() can skew all sort of accounting leading to
    other nasties.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Peter Zijlstra
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

24 Dec, 2006

1 commit

  • Make cancel_dirty_page() act more like all the other dirty and writeback
    accounting functions: test for "mapping" being NULL, and do the
    NR_FILE_DIRY accounting purely based on mapping_cap_account_dirty()).

    Also, add it to the exports, so that modular filesystems can use it.

    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Dec, 2006

1 commit


22 Dec, 2006

2 commits

  • truncate presently invalidates the dirty page's buffer_heads then shoots down
    the page. But try_to_free_buffers() will now bale out because the page is
    dirty.

    Net effect: the LRU gets filled with dirty pages which have invalidated
    buffer_heads attached. They have no ->mapping and hence cannot be cleaned.
    The machine leaks memory at an enormous rate.

    Fix this by cleaning the page before running try_to_free_buffers(), so
    try_to_free_buffers() can do its work.

    Also, remember to do dirty-page-acoounting in cancel_dirty_page() so the
    machine won't wedge up trying to write non-existent dirty pages.

    Probably still wrong, but now less so.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • They were horribly easy to mis-use because of their tempting naming, and
    they also did way more than any users of them generally wanted them to
    do.

    A dirty page can become clean under two circumstances:

    (a) when we write it out. We have "clear_page_dirty_for_io()" for
    this, and that function remains unchanged.

    In the "for IO" case it is not sufficient to just clear the dirty
    bit, you also have to mark the page as being under writeback etc.

    (b) when we actually remove a page due to it becoming inaccessible to
    users, notably because it was truncate()'d away or the file (or
    metadata) no longer exists, and we thus want to cancel any
    outstanding dirty state.

    For the (b) case, we now introduce "cancel_dirty_page()", which only
    touches the page state itself, and verifies that the page is not mapped
    (since cancelling writes on a mapped page would be actively wrong as it
    is still accessible to users).

    Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
    ReiserFS, XFS all use the old confusing functions, and will be fixed
    separately in subsequent commits (with some of them just removing the
    offending logic, and others using clear_page_dirty_for_io()).

    This was confirmed by Martin Michlmayr to fix the apt database
    corruption on ARM.

    Cc: Martin Michlmayr
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: Arjan van de Ven
    Cc: Andrei Popa
    Cc: Andrew Morton
    Cc: Dave Kleikamp
    Cc: Gordon Farquharson
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Dec, 2006

1 commit

  • Account for the number of byte writes which this process caused to not happen
    after all.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Oct, 2006

1 commit

  • If remove_mapping() failed to remove the page from its mapping, don't go and
    mark it not uptodate! Makes kernel go dead.

    (Actually, I don't think the ClearPageUptodate is needed there at all).

    Says Nick Piggin:

    "Right, it isn't needed because at this point the page is guaranteed
    by remove_mapping to have no references (except us) and cannot pick
    up any new ones because it is removed from pagecache.

    We can delete it."

    Signed-off-by: Andrew Morton
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

12 Oct, 2006

2 commits

  • If try_to_release_page() is called with a zero gfp mask, then the
    filesystem is effectively denied the possibility of sleeping while
    attempting to release the page. There doesn't appear to be any valid
    reason why this should be banned, given that we're not calling this from a
    memory allocation context.

    For this reason, change the gfp_mask argument of the call to GFP_KERNEL.

    Signed-off-by: Trond Myklebust
    Cc: Steve Dickson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • A failure in invalidate_inode_pages2_range() can result in unpleasant things
    happening in NFS (at least). Stick a WARN_ON_ONCE() in there so we can find
    out if it happens, and maybe why.

    (akpm: might be a -mm-only patch, we'll see..)

    Cc: Chuck Lever
    Cc: Trond Myklebust
    Cc: Steve Dickson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Oct, 2006

3 commits

  • The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
    unfix invalidate_inode_pages2().

    The problem is that various bits of code in the kernel can take transient refs
    on pages: the page scanner will do this when inspecting a batch of pages, and
    the lru_cache_add() batching pagevecs also hold a ref.

    Net result is transient failures in invalidate_inode_pages2(). This affects
    NFS directory invalidation (observed) and presumably also block-backed
    direct-io (not yet reported).

    Fix it by reverting invalidate_inode_pages2() back to the old version which
    ignores the page refcounts.

    We may come up with something more clever later, but for now we need a 2.6.18
    fix for NFS.

    Cc: Chuck Lever
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

    (*) The file sync and general sync stuff moved to fs/sync.c.

    (*) The superblock sync stuff moved to fs/super.c.

    (*) do_invalidatepage() moved to mm/truncate.c.

    (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

    (*) declarations for do_invalidatepage() and try_to_release_page() moved
    to linux/mm.h.

    (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

27 Sep, 2006

1 commit


09 Sep, 2006

1 commit

  • If a CPU faults this page into pagetables after invalidate_mapping_pages()
    checked page_mapped(), invalidate_complete_page() will still proceed to remove
    the page from pagecache. This leaves the page-faulting process with a
    detached page. If it was MAP_SHARED then file data loss will ensue.

    Fix that up by checking the page's refcount after taking tree_lock.

    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

23 Jun, 2006

1 commit

  • If invalidate_mapping_pages is called to invalidate a very large mapping
    (e.g. a very large block device) and if the only active page in that
    device is near the end (or at least, at a very large index), such as, say,
    the superblock of an md array, and if that page happens to be locked when
    invalidate_mapping_pages is called, then

    pagevec_lookup will return this page and
    as it is locked, 'next' will be incremented and pagevec_lookup
    will be called again. and again. and again.
    while we count from 0 upto a very large number.

    We should really always set 'next' to 'page->index+1' before going around
    the loop again, not just if the page isn't locked.

    Cc: "Steinar H. Gunderson"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

10 Jan, 2006

1 commit


09 Jan, 2006

1 commit

  • Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
    discard as much pagecache and/or reclaimable slab objects as it can. THis
    operation requires root permissions.

    It won't drop dirty data, so the user should run `sync' first.

    Caveats:

    a) Holds inode_lock for exorbitant amounts of time.

    b) Needs to be taught about NUMA nodes: propagate these all the way through
    so the discarding can be controlled on a per-node basis.

    This is a debugging feature: useful for getting consistent results between
    filesystem benchmarks. We could possibly put it under a config option, but
    it's less than 300 bytes.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

07 Jan, 2006

1 commit

  • This patch makes truncate_inode_pages_range from truncate_inode_pages.
    truncate_inode_pages became a one-liner call to truncate_inode_pages_range.

    Reiser4 needs truncate_inode_pages_ranges because it tries to keep
    correspondence between existences of metadata pointing to data pages and pages
    to which those metadata point to. So, when metadata of certain part of file
    is removed from filesystem tree, only pages of corresponding range are to be
    truncated.

    (Needed by the madvise(MADV_REMOVE) patch)

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Reiser
     

24 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • Fix the problem (BUG 4964) with unmapped buffers in transaction's
    t_sync_data list. The problem is we need to call filesystem's own
    invalidatepage() from block_write_full_page().

    block_write_full_page() must call filesystem's invalidatepage(). Otherwise
    following nasty race can happen:

    proc 1 proc 2
    ------ ------
    - write some new data to 'offset'
    => bh gets to the transactions data list
    - starts truncate
    => i_size set to new size
    - mpage_writepages()
    - ext3_ordered_writepage() to 'offset'
    - block_write_full_page()
    - page->index > end_index+1
    - block_invalidatepage()
    - discard_buffer()
    - clear_buffer_mapped()

    - commit triggers and finds unmapped buffer - BOOM!

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

01 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds