08 Oct, 2008

1 commit


16 Jul, 2008

1 commit

  • The main problem is dealing with inode->i_size: we need to hold
    inode->i_lock across all attribute updates, so vmtruncate() won't cut it.
    Make an NFS-private version of vmtruncate() with the necessary locking
    semantics.

    The result should be that the following inode attribute updates are
    protected by inode->i_lock:
    nfsi->cache_validity
    nfsi->read_cache_jiffies
    nfsi->attrtimeo
    nfsi->attrtimeo_timestamp
    nfsi->change_attr
    nfsi->last_updated
    nfsi->cache_change_attribute
    nfsi->access_cache
    nfsi->access_cache_entry_lru
    nfsi->access_cache_inode_lru
    nfsi->acl_access
    nfsi->acl_default
    nfsi->nfs_page_tree
    nfsi->ncommit
    nfsi->npages
    nfsi->open_files
    nfsi->silly_list
    nfsi->acl
    nfsi->open_states
    inode->i_size
    inode->i_atime
    inode->i_mtime
    inode->i_ctime
    inode->i_nlink
    inode->i_uid
    inode->i_gid

    The following is protected by dir->i_mutex:
    nfsi->cookieverf
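
    A hedged sketch of the idea (not the literal patch; the exact call
    sequence is illustrative, assuming <linux/fs.h> and <linux/mm.h>):

        /* Update i_size under inode->i_lock, then unmap and truncate the
         * page cache; plain vmtruncate() cannot provide this locking. */
        static int nfs_vmtruncate(struct inode *inode, loff_t offset)
        {
                spin_lock(&inode->i_lock);
                i_size_write(inode, offset);
                spin_unlock(&inode->i_lock);

                unmap_mapping_range(inode->i_mapping,
                                    offset + PAGE_SIZE - 1, 0, 1);
                truncate_inode_pages(inode->i_mapping, offset);
                return 0;
        }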

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Jul, 2008

6 commits

  • Currently, once an unstable write completes, we cannot redirty the page
    to reflect a new change to its data until after we've sent a COMMIT
    request.

    This patch allows a page rewrite to proceed without the unnecessary COMMIT
    step: the page goes immediately back onto the dirty page list, the VM's
    unstable-write accounting is undone, and the NFS_PAGE_TAG_COMMIT tag is
    removed from the NFS radix tree.
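
    A hedged sketch of the redirty step (the helper name is hypothetical;
    assumes the usual fs/nfs/write.c context):

        /* Put an unstable-write page straight back on the dirty list:
         * drop the commit tag, undo the unstable accounting, redirty. */
        static void nfs_redirty_unstable_page(struct nfs_page *req)
        {
                struct page *page = req->wb_page;

                radix_tree_tag_clear(&NFS_I(page->mapping->host)->nfs_page_tree,
                                     req->wb_index, NFS_PAGE_TAG_COMMIT);
                dec_zone_page_state(page, NR_UNSTABLE_NFS);
                __set_page_dirty_nobuffers(page);
        }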

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Simplify the loop in nfs_update_request() by moving the code that
    attempts to update an existing cached NFS write into a separate function.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Clean up: fix a few dprintk messages that still need to show the RPC task ID
    correctly, and be sure we use the preferred %lld or %llu instead of %Ld or
    %Lu.
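
    For illustration (a hedged sketch; the message text and variables are
    hypothetical):

        dprintk("NFS: %5u nfs_commit_done (status %d)\n",
                task->tk_pid, task->tk_status);
        dprintk("NFS: write at offset %lld\n", (long long)offset);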

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it
    is on disk". While it is true that the write may fail, that is true of any
    write. There is no reason to treat data on pages that are not already
    marked as PG_uptodate as special. The only thing we gain is a noticeable
    slowdown when re-reading these pages.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If a file is being extended, and we're creating a hole, we might as well
    declare the entire page to be up to date.

    This patch significantly improves the write performance for sparse files
    in the case where lseek(SEEK_END) is used to append several non-contiguous
    writes at intervals of < PAGE_SIZE.
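
    A sketch of the idea (simplified, not the literal patch; the helper name
    is hypothetical, and 'offset'/'count' describe the write within the page):

        static void demo_mark_hole_uptodate(struct inode *inode,
                                            struct page *page,
                                            unsigned int offset,
                                            unsigned int count)
        {
                loff_t base = (loff_t)page->index << PAGE_CACHE_SHIFT;

                /* Page lies entirely beyond the old EOF: every byte we are
                 * not writing belongs to the new hole, i.e. reads as zero,
                 * so the whole page can be declared up to date. */
                if (base >= i_size_read(inode)) {
                        zero_user_segments(page, 0, offset,
                                           offset + count, PAGE_CACHE_SIZE);
                        SetPageUptodate(page);
                }
        }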

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Commit 2785259631697ebb0749a3782cca206e2e542939 (nfs: use GFP_NOFS
    preloads for radix-tree insertion) appears to have introduced a bug:
    we only want to call radix_tree_preload() once, after creating a request.
    Calling it on every pass through the loop after the request has been
    created leaks the preemption count, because each successful
    radix_tree_preload() disables preemption and only a single
    radix_tree_preload_end() re-enables it.
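
    A sketch of the corrected pattern, as it might look in the
    nfs_update_request() loop (simplified; the labels are illustrative):

        /* Preload once, right after creating the request; a successful
         * radix_tree_preload() disables preemption. */
        error = radix_tree_preload(GFP_NOFS);
        if (error < 0)
                goto out_release_request;
        for (;;) {
                spin_lock(&inode->i_lock);
                error = radix_tree_insert(&NFS_I(inode)->nfs_page_tree,
                                          req->wb_index, req);
                spin_unlock(&inode->i_lock);
                if (error != -EEXIST)
                        break;
                /* resolve the conflicting request and retry, WITHOUT
                 * calling radix_tree_preload() again */
        }
        /* exactly one matching call re-enables preemption */
        radix_tree_preload_end();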

    Signed-off-by: Trond Myklebust
    Cc: Nick Piggin

    Trond Myklebust
     

24 Jun, 2008

1 commit


17 May, 2008

1 commit

  • When called from nfs_flush_incompatible(), the request is not locked, so
    req->wb_page might be set to NULL before PageWriteback() uses it.

    Signed-off-by: Fred Isaman
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Fred Isaman
     

20 Apr, 2008

3 commits


20 Mar, 2008

3 commits

  • Both flush functions have the same error handling routine. Pull
    it out as a function.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred
     
  • Trond Myklebust
     
  • Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

    In the read path:
    1. readpage_async_filler calls nfs_pageio_add_request.
    2. Assume desc already holds requests that cannot be merged with the
       current request, so nfs_pageio_doio is fired up to clear out desc.
    3. Assume something goes wrong in setting up the I/O, so desc->pg_error
       is set.
    4. nfs_pageio_add_request then returns 0 *WITHOUT* adding the original
       request.
    5. Since the return code is ignored, readpage_async_filler assumes the
       request has been added and does nothing further, leaving the page
       locked.
    6. do_generic_mapping_read eventually calls lock_page, resulting in
       deadlock.

    In the write path:
    1. The page is marked dirty by generic_perform_write.
    2. nfs_writepages is called, which calls nfs_pageio_add_request from
       nfs_page_async_flush.
    3. Assume desc already holds requests that cannot be merged with the
       current request, so nfs_pageio_doio is fired up to clear out desc.
    4. Assume something goes wrong in setting up the I/O, so desc->pg_error
       is set.
    5. nfs_page_async_flush then returns 0 *WITHOUT* adding the original
       request, yet it has marked the request as locked (PG_BUSY) and in
       writeback, and has cleared the dirty marks.
    6. The next time a write is done to the page, a deadlock results when
       nfs_write_end calls nfs_update_request.
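
    A hedged sketch of the corresponding check in the read path (simplified;
    the error label is illustrative):

        if (!nfs_pageio_add_request(desc, new)) {
                error = desc->pg_error;   /* the request was NOT added */
                goto out_unlock;          /* unlock the page and report it */
        }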

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     

08 Mar, 2008

1 commit


29 Feb, 2008

2 commits


26 Feb, 2008

3 commits

  • We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod
    rather than on rpciod, so that they don't deadlock when the resulting
    umount calls rpc_shutdown_client(). Hence we specify that read, write and
    commit calls must complete on nfsiod.
    Ditto for NFSv4 open, lock, locku and close asynchronous calls.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare()
    and task->tk_ops->rpc_call_done() to call mntput() in any way, since
    that will cause a deadlock when the call to rpc_shutdown_client() attempts
    to wait on 'task' to complete.

    We can avoid the above deadlock by moving calls to mntput to
    task->tk_ops->rpc_release() callback, since at that time the task will be
    marked as completed, and so rpc_shutdown_client won't attempt to wait on
    it.
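
    A sketch of the resulting shape (names and the calldata layout are
    hypothetical; assumes the usual sunrpc and slab headers):

        struct demo_calldata {
                struct vfsmount *mnt;
        };

        static void demo_rpc_call_done(struct rpc_task *task, void *calldata)
        {
                /* must NOT call mntput() here: rpc_shutdown_client() may
                 * still be waiting for this task to complete */
        }

        static void demo_rpc_release(void *calldata)
        {
                struct demo_calldata *data = calldata;

                mntput(data->mnt);      /* task already marked complete: safe */
                kfree(data);
        }

        static const struct rpc_call_ops demo_call_ops = {
                .rpc_call_done  = demo_rpc_call_done,
                .rpc_release    = demo_rpc_release,
        };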

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • O_SYNC is stored in filp->f_flags.
    Thanks to Al Viro for pointing out the bug.
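
    In other words (a trivial sketch):

        /* The flag must be tested on the struct file, not on the inode. */
        if (filp->f_flags & O_SYNC) {
                /* ... handle the synchronous write case ... */
        }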

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

14 Feb, 2008

1 commit

  • NFS should use GFP_NOFS-mode radix tree preloads rather than GFP_ATOMIC
    allocations at radix-tree insertion time. This is important to reduce the
    atomic memory requirement.

    Signed-off-by: Nick Piggin
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Trond Myklebust

    Nick Piggin
     

08 Feb, 2008

1 commit

  • If the inode is flagged as having an invalid mapping, then we can't rely on
    the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation"
    write optimisation in nfs_updatepage(), since that will cause NFS to write
    out areas of the page that are no longer guaranteed to be up to date.

    A potential corruption could occur in the following scenario:

    client 1                              client 2
    ===============                       ===============
                                          fd=open("f",O_CREAT|O_WRONLY,0644);
                                          write(fd,"fubar\n",6); // cache last page
                                          close(fd);
    fd=open("f",O_WRONLY|O_APPEND);
    write(fd,"foo\n",4);
    close(fd);
                                          fd=open("f",O_WRONLY|O_APPEND);
                                          write(fd,"bar\n",4);
                                          close(fd);
    -----
    The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because
    client 2 does not update the cached page after re-opening the file for
    write. Instead it keeps it marked as PageUptodate() until someone calls
    invalidate_inode_pages2() (typically by calling read()).

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through three functions:

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means a lot of code no longer has to deal with the
    special casing for HIGHMEM. Dealing with kmap is only necessary for
    HIGHMEM configurations; in those configurations we use KM_USER0 as we
    do for a series of other functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page could not
    be an inline function and had to remain a macro. The zero_user_* functions
    introduced here can be inline because that constant is not used when these
    functions are called.

    Also extract the flushing of the caches to be outside of the kmap.
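
    For illustration, the three variants side by side (a sketch; the helper
    and its 'offset'/'count' parameters are hypothetical, assuming
    <linux/highmem.h>):

        static void demo_zero(struct page *page, unsigned int offset,
                              unsigned int count)
        {
                /* zero [0, offset) and [offset + count, PAGE_SIZE) */
                zero_user_segments(page, 0, offset,
                                   offset + count, PAGE_SIZE);
                /* zero the single segment [offset + count, PAGE_SIZE) */
                zero_user_segment(page, offset + count, PAGE_SIZE);
                /* zero 'count' bytes starting at 'offset' */
                zero_user(page, offset, count);
        }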

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Feb, 2008

1 commit

  • * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
    Remove commented-out code copied from NFS
    NFS: Switch from intr mount option to TASK_KILLABLE
    Add wait_for_completion_killable
    Add wait_event_killable
    Add schedule_timeout_killable
    Use mutex_lock_killable in vfs_readdir
    Add mutex_lock_killable
    Use lock_page_killable
    Add lock_page_killable
    Add fatal_signal_pending
    Add TASK_WAKEKILL
    exit: Use task_is_*
    signal: Use task_is_*
    sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
    ptrace: Use task_is_*
    power: Use task_is_*
    wait: Use TASK_NORMAL
    proc/base.c: Use task_is_*
    proc/array.c: Use TASK_REPORT
    perfmon: Use task_is_*
    ...

    Fixed up conflicts in NFS/sunrpc manually.
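
    As a sketch of the new style (the helper is illustrative): a killable
    wait is interrupted only by fatal signals such as SIGKILL, unlike the
    TASK_INTERRUPTIBLE waits used under the old "intr" mount option:

        static int demo_lock_page(struct page *page)
        {
                if (lock_page_killable(page))
                        return -EIO;    /* fatal_signal_pending() was true */
                /* ... critical section ... */
                unlock_page(page);
                return 0;
        }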

    Linus Torvalds
     

30 Jan, 2008

6 commits


07 Dec, 2007

1 commit


27 Nov, 2007

1 commit


20 Oct, 2007

1 commit

  • This patch fixes a regression that was introduced by commit
    44dd151d5c21234cc534c47d7382f5c28c3143cd

    We cannot zero the user page in nfs_mark_uptodate() any more, since

    a) We'd be modifying the page without holding the page lock
    b) We can race with other updates of the page, most notably
    because of the call to nfs_wb_page() in nfs_writepage_setup().

    Instead, we do the zeroing in nfs_update_request() if we see that we're
    creating a request that might potentially be marked as up to date.

    Thanks to Olivier Paquet for reporting the bug and providing a test-case.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

17 Oct, 2007

2 commits

  • Count per-BDI reclaimable pages: nr_reclaimable = nr_dirty + nr_unstable.
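
    A sketch of the accounting (the helper name is hypothetical;
    inc_bdi_stat() and BDI_RECLAIMABLE come from this patch series):

        /* Count an NFS unstable page against the per-BDI reclaimable
         * counter as well as the zone counter. */
        static void demo_account_unstable(struct page *page)
        {
                inc_zone_page_state(page, NR_UNSTABLE_NFS);
                inc_bdi_stat(page->mapping->backing_dev_info,
                             BDI_RECLAIMABLE);
        }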

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • These patches aim to improve balance_dirty_pages() and directly address
    three issues:
    1) inter-device starvation
    2) stacked-device deadlocks
    3) inter-process starvation

    Issues 1 and 2 are a direct result of removing the global dirty limit and
    using per-device dirty limits. By giving each device its own dirty limit,
    one device will no longer starve another, and the cyclic dependency on the
    dirty limit is broken.

    In order to distribute the dirty limit efficiently across the independent
    devices, a floating proportion is used; this allocates each device a share
    of the total limit proportional to its recent activity.

    Issue 3 is addressed by also scaling the dirty limit in proportion to the
    current task's recent dirty rate.

    This patch:

    nfs: remove congestion_end(). It's redundant; clear_bdi_congested()
    already wakes the waiters.
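
    A sketch of what remains after the removal (close to the fs/nfs/write.c
    congestion logic, but simplified):

        static void demo_end_page_writeback(struct inode *inode)
        {
                struct nfs_server *nfss = NFS_SERVER(inode);

                /* clear_bdi_congested() wakes the waiters itself, so no
                 * separate congestion_end() call is needed */
                if (atomic_long_dec_return(&nfss->writeback) <
                                NFS_CONGESTION_OFF_THRESH)
                        clear_bdi_congested(&nfss->backing_dev_info, WRITE);
        }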

    Signed-off-by: Peter Zijlstra
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

10 Oct, 2007

3 commits