16 Sep, 2009

1 commit


20 Aug, 2009

1 commit


12 Aug, 2009

1 commit

  • We can't call nfs_readdata_release()/nfs_writedata_release() without
    first initialising and referencing args.context. Doing so inside
    nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
    causes an Oops.

    We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
    those cases.

    Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
    referencing the nfs_open_context for us. Since the readdata and writedata
    structures carry a reference to that, we can simplify things by getting rid
    of the extra nfs_open_context references, so that we can replace all
    instances of nfs_readdata_release()/nfs_writedata_release().

    Reported-by: Catalin Marinas
    Signed-off-by: Trond Myklebust
    Tested-by: Catalin Marinas
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

10 Aug, 2009

1 commit


12 Jul, 2009

1 commit


11 Jul, 2009

2 commits

  • When building v2.6.31-rc2-344-g69ca06c, the following build errors are
    found due to missing includes:

    CC [M] fs/fuse/dev.o
    fs/fuse/dev.c: In function ‘request_end’:
    fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function)
    ...
    fs/nfs/write.c: In function ‘nfs_set_page_writeback’:
    fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function)

    Signed-off-by: Larry Finger
    Signed-off-by: Linus Torvalds

    Larry Finger
     
  • Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
    the bdi congestion wait queue logic, causing us to wait on congestion
    for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
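
The mix-up described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct congestion_queues` and `wait_on_congestion()` are hypothetical stand-ins, and `BLK_RW_SYNC == 1` is an assumption (the commit message only states `WRITE == 1` and `BLK_RW_ASYNC == 0`).

```c
#include <assert.h>

/* Values from the commit message: BLK_RW_ASYNC == 0, WRITE == 1.
 * BLK_RW_SYNC == 1 is an assumption made for this sketch. */
enum { BLK_RW_ASYNC = 0, BLK_RW_SYNC = 1 };
enum { READ = 0, WRITE = 1 };

/* Hypothetical stand-in for the two congestion wait queues. */
struct congestion_queues {
    int waiters[2];   /* indexed by BLK_RW_ASYNC / BLK_RW_SYNC */
};

/* Record one waiter on the queue selected by rw; return the index used. */
static int wait_on_congestion(struct congestion_queues *q, int rw)
{
    assert(rw == BLK_RW_ASYNC || rw == BLK_RW_SYNC);
    q->waiters[rw]++;
    return rw;
}
```

Passing `WRITE` selects index 1, which in this sketch is the sync queue, whereas the caller meant index 0 (`BLK_RW_ASYNC`). Both values are valid indices, so nothing fails loudly; the waiter simply ends up on the wrong queue.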
     

18 Jun, 2009

4 commits

  • [nfs41: change nfs4_restart_rpc argument]
    [nfs41: check for session not minorversion]
    [nfs41: trigger the state manager for session reset]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [always define nfs4_restart_rpc]
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Separate commit calls from nfs41: sequence setup/done support

    Implement the commit rpc_call_prepare method for
    asynchronous nfs rpcs, call nfs41_setup_sequence from
    respective rpc_call_validate_args methods.

    Call nfs4_sequence_done from respective rpc_call_done methods.

    Note that we need to pass a pointer to the nfs_server in calls data
    for passing on to nfs4_sequence_done.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [pnfs: client data server write validate and release]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: Support sessions with O_DIRECT.]
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Benny Halevy
    [nfs41: separate free slot from sequence done]
    [nfs41: nfs4_sequence_free_slot use nfs_client for data server]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Separate write calls from nfs41: sequence setup/done support

    Implement the write rpc_call_prepare method for
    asynchronous nfs rpcs, call nfs41_setup_sequence from
    respective rpc_call_validate_args methods.

    Call nfs4_sequence_done from respective rpc_call_done methods.

    Note that we need to pass a pointer to the nfs_server in calls data
    for passing on to nfs4_sequence_done.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [pnfs: client data server write validate and release]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [move the nfs4_sequence_free_slot call in nfs_readpage_retry from]
    [nfs41: separate free slot from sequence done]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfs41: Support sessions with O_DIRECT.]
    Signed-off-by: Dean Hildebrand
    Signed-off-by: Benny Halevy
    [nfs41: nfs4_sequence_free_slot use nfs_client for data server]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Initialize nfs4_sequence_res sr_slotid to NFS4_MAX_SLOT_TABLE.

    [was nfs41: sequence res use slotid]
    Signed-off-by: Andy Adamson
    [pulled definition of struct nfs4_sequence_res.sr_slotid to here]
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Andy Adamson
     

20 Mar, 2009

1 commit


12 Mar, 2009

2 commits

  • The following patch is a combination of a patch by myself and Peter
    Staubach.

    Trond: If we allow other processes to dirty pages while a process is doing
    a consistency sync to disk, we can end up never making progress.

    Peter: Attached is a patch which addresses a continuing problem with
    the NFS client generating out of order WRITE requests. While
    this is compliant with all of the current protocol
    specifications, there are servers in the market which can not
    handle out of order WRITE requests very well. Also, this may
    lead to sub-optimal block allocations in the underlying file
    system on the server. This may cause the read throughputs to
    be reduced when reading the file from the server.

    Peter: There has been a lot of work recently done to address out of
    order issues on a systemic level. However, the NFS client is
    still susceptible to the problem. Out of order WRITE
    requests can occur when pdflush is in the middle of writing
    out pages while the process dirtying the pages calls
    generic_file_buffered_write which calls
    generic_perform_write which calls
    balance_dirty_pages_rate_limited which ends up calling
    writeback_inodes which ends up calling back into the NFS
    client to write out dirty pages for the same file that
    pdflush happens to be working with.

    Signed-off-by: Peter Staubach
    [modification by Trond to merge the two similar patches]
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     

08 Oct, 2008

1 commit


16 Jul, 2008

1 commit

  • The main problem is dealing with inode->i_size: we need to set the
    inode->i_lock on all attribute updates, and so vmtruncate won't cut it.
    Make an NFS-private version of vmtruncate that has the necessary locking
    semantics.

    The result should be that the following inode attribute updates are
    protected by inode->i_lock
    nfsi->cache_validity
    nfsi->read_cache_jiffies
    nfsi->attrtimeo
    nfsi->attrtimeo_timestamp
    nfsi->change_attr
    nfsi->last_updated
    nfsi->cache_change_attribute
    nfsi->access_cache
    nfsi->access_cache_entry_lru
    nfsi->access_cache_inode_lru
    nfsi->acl_access
    nfsi->acl_default
    nfsi->nfs_page_tree
    nfsi->ncommit
    nfsi->npages
    nfsi->open_files
    nfsi->silly_list
    nfsi->acl
    nfsi->open_states
    inode->i_size
    inode->i_atime
    inode->i_mtime
    inode->i_ctime
    inode->i_nlink
    inode->i_uid
    inode->i_gid

    The following is protected by dir->i_mutex
    nfsi->cookieverf

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

10 Jul, 2008

6 commits

  • Currently, if an unstable write completes, we cannot redirty the page in
    order to reflect a new change in the page data until after we've sent a
    COMMIT request.

    This patch allows a page rewrite to proceed without the unnecessary COMMIT
    step, putting it immediately back onto the dirty page list, undoing the
    VM unstable write accounting, and removing the NFS_PAGE_TAG_COMMIT tag from
    the NFS radix tree.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Simplify the loop in nfs_update_request by moving into a separate function
    the code that attempts to update an existing cached NFS write.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Clean up: fix a few dprintk messages that still need to show the RPC task ID
    correctly, and be sure we use the preferred %lld or %llu instead of %Ld or
    %Lu.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
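
As a small userspace illustration of the format-specifier cleanup: standard C defines the `ll` length modifier for `long long` conversions, while `L` is only specified for floating-point conversions. The helper name `format_offsets()` below is hypothetical and the snippet uses `snprintf()` rather than the kernel's dprintk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a signed 64-bit position and an unsigned 64-bit count using
 * the standard ll length modifier (%lld / %llu). */
static int format_offsets(char *buf, size_t len,
                          long long pos, unsigned long long count)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "pos=%lld count=%llu", pos, count);
}
```

Casting the arguments to `long long` or `unsigned long long` at the call site keeps the format string valid even when the underlying type differs between architectures.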
     
  • Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it
    is on disk". While it is true that the write may fail, that is always the
    case. There is no reason why we should treat data on pages that are not
    already marked as PG_uptodate as being special. The only thing we gain is a
    noticeable slowdown when re-reading these pages.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If a file is being extended, and we're creating a hole, we might as well
    declare the entire page to be up to date.

    This patch significantly improves the write performance for sparse files
    in the case where lseek(SEEK_END) is used to append several non-contiguous
    writes at intervals of < PAGE_SIZE.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
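
The access pattern this commit speeds up can be sketched from the application side. This is an illustrative POSIX snippet only; `append_after_gap()` is a hypothetical helper, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Skip gap bytes past the current end of file, leaving a hole, then
 * write len bytes there. Returns the new file size, or -1 on error. */
static off_t append_after_gap(int fd, off_t gap, const void *buf, size_t len)
{
    assert(gap >= 0);
    if (lseek(fd, gap, SEEK_END) == (off_t)-1)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    return lseek(fd, 0, SEEK_CUR);
}
```

Calling this repeatedly with a gap smaller than the page size produces several non-contiguous writes inside one page, each extending the file, which is the case described above.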
     
  • The commit 2785259631697ebb0749a3782cca206e2e542939 (nfs: use GFP_NOFS
    preloads for radix-tree insertion) appears to have introduced a bug:
    we only want to call radix_tree_preload() once after creating a request.
    Calling it on every loop iteration after the request has been created
    causes preemption count leaks.

    Signed-off-by: Trond Myklebust
    Cc: Nick Piggin

    Trond Myklebust
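
The leak can be modelled with a plain counter. A successful radix_tree_preload() returns with preemption disabled, and radix_tree_preload_end() re-enables it, so the two calls must balance. The sketch below is a userspace model under that assumption; `preload()`, `preload_end()`, `leaky_loop()` and `balanced_loop()` are hypothetical names, not the NFS code.

```c
#include <assert.h>

static int preempt_count;   /* stand-in for the preemption counter */

static void preload(void)     { preempt_count++; } /* models radix_tree_preload() */
static void preload_end(void) { preempt_count--; } /* models radix_tree_preload_end() */

/* Buggy shape: preload on every pass through the loop, one matching end. */
static int leaky_loop(int retries)
{
    assert(retries >= 0);
    preempt_count = 0;
    for (int i = 0; i <= retries; i++)
        preload();
    preload_end();
    return preempt_count;   /* non-zero means the count leaked */
}

/* Fixed shape: preload once, however many times the loop retries. */
static int balanced_loop(int retries)
{
    int preloaded = 0;

    assert(retries >= 0);
    preempt_count = 0;
    for (int i = 0; i <= retries; i++) {
        if (!preloaded) {
            preload();
            preloaded = 1;
        }
    }
    preload_end();
    return preempt_count;
}
```

With three retries the leaky variant ends with a count of 3, while the balanced variant returns to 0.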
     

24 Jun, 2008

1 commit


17 May, 2008

1 commit

  • When called from nfs_flush_incompatible, the req is not locked, so
    req->wb_page might be set to NULL before it is used by PageWriteback.

    Signed-off-by: Fred Isaman
    Signed-off-by: Benny Halevy
    Signed-off-by: Trond Myklebust

    Fred Isaman
     

20 Apr, 2008

3 commits


20 Mar, 2008

3 commits

  • Both flush functions have the same error handling routine. Pull
    it out as a function.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Trond Myklebust
     
  • Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

    In read path:
    call nfs_pageio_add_request from readpage_async_filler
    assume at this point that there are requests already in desc, that
    can't be merged with the current request.
    so nfs_pageio_doio is fired up to clear out desc.
    assume something goes wrong in setting up the io, so desc->pg_error is set.
    This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
    request.
    BUT, since return code is ignored, readpage_async_filler assumes it has
    been added, and does nothing further, leaving page locked.
    do_generic_mapping_read will eventually call lock_page, resulting in deadlock

    In write path:
    page is marked dirty by generic_perform_write
    nfs_writepages is called
    call nfs_pageio_add_request from nfs_page_async_flush
    assume at this point that there are requests already in desc, that
    can't be merged with the current request.
    so nfs_pageio_doio is fired up to clear out desc.
    assume something goes wrong in setting up the io, so desc->pg_error is set.
    This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
    request, yet marking the request as locked (PG_BUSY) and in writeback,
    clearing dirty marks.
    The next time a write is done to the page, deadlock will result as
    nfs_write_end calls nfs_update_request

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     

08 Mar, 2008

1 commit


29 Feb, 2008

2 commits


26 Feb, 2008

3 commits

  • We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod
    rather than on rpciod, so that they don't deadlock when the resulting
    umount calls rpc_shutdown_client(). Hence we specify that read, write and
    commit calls must complete on nfsiod.
    Ditto for NFSv4 open, lock, locku and close asynchronous calls.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare()
    and task->tk_ops->rpc_call_done() to call mntput() in any way, since
    that will cause a deadlock when the call to rpc_shutdown_client() attempts
    to wait on 'task' to complete.

    We can avoid the above deadlock by moving calls to mntput() to the
    task->tk_ops->rpc_release() callback, since at that time the task will be
    marked as completed, and so rpc_shutdown_client() won't attempt to wait on
    it.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • O_SYNC is stored in filp->f_flags.
    Thanks to Al Viro for pointing out the bug.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

14 Feb, 2008

1 commit

  • NFS should use GFP_NOFS mode radix tree preloads rather than GFP_ATOMIC
    allocations at radix-tree insertion-time. This is important to reduce the
    atomic memory requirement.

    Signed-off-by: Nick Piggin
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Trond Myklebust

    Nick Piggin
     

08 Feb, 2008

1 commit

  • If the inode is flagged as having an invalid mapping, then we can't rely on
    the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation"
    write optimisation in nfs_updatepage(), since that will cause NFS to write
    out areas of the page that are no longer guaranteed to be up to date.

    A potential corruption could occur in the following scenario:

    client 1                          client 2
    ===============                   ===============
                                      fd=open("f",O_CREAT|O_WRONLY,0644);
                                      write(fd,"fubar\n",6); // cache last page
                                      close(fd);
    fd=open("f",O_WRONLY|O_APPEND);
    write(fd,"foo\n",4);
    close(fd);

                                      fd=open("f",O_WRONLY|O_APPEND);
                                      write(fd,"bar\n",4);
                                      close(fd);
    -----
    The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because
    client 2 does not update the cached page after re-opening the file for
    write. Instead it keeps it marked as PageUptodate() until someone calls
    invalidate_inode_pages2() (typically by calling read()).

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means a lot of code no longer has to deal
    with the special casing for HIGHMEM. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    could not be an inline function and had to be a macro. The zero_user_*
    functions introduced here can be inline because that constant is not
    used when these functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
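
The relationship between the three helpers can be sketched over a plain byte buffer. This is a userspace model only: the `_sim` names and `SIM_PAGE_SIZE` are hypothetical, and the sketch deliberately leaves out the kmap and cache-flushing work the real functions do on a `struct page`.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Zero [s1, e1) and [s2, e2): start/end positions, no length arithmetic. */
static void zero_segments_sim(unsigned char *page,
                              unsigned s1, unsigned e1,
                              unsigned s2, unsigned e2)
{
    assert(e1 <= SIM_PAGE_SIZE && e2 <= SIM_PAGE_SIZE);
    if (e1 > s1)
        memset(page + s1, 0, e1 - s1);
    if (e2 > s2)
        memset(page + s2, 0, e2 - s2);
}

/* Single-segment variant: the second segment is empty. */
static void zero_segment_sim(unsigned char *page, unsigned start, unsigned end)
{
    zero_segments_sim(page, start, end, 0, 0);
}

/* Length variant, for callers that already know the length. */
static void zero_sim(unsigned char *page, unsigned start, unsigned length)
{
    zero_segments_sim(page, start, start + length, 0, 0);
}
```

Expressing the single-segment and length variants in terms of the two-segment function keeps the bounds handling in one place, and plain functions of this kind can be inlined by the compiler, which is the property the commit message contrasts with the old macro.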
     

01 Feb, 2008

1 commit

  • * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
    Remove commented-out code copied from NFS
    NFS: Switch from intr mount option to TASK_KILLABLE
    Add wait_for_completion_killable
    Add wait_event_killable
    Add schedule_timeout_killable
    Use mutex_lock_killable in vfs_readdir
    Add mutex_lock_killable
    Use lock_page_killable
    Add lock_page_killable
    Add fatal_signal_pending
    Add TASK_WAKEKILL
    exit: Use task_is_*
    signal: Use task_is_*
    sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
    ptrace: Use task_is_*
    power: Use task_is_*
    wait: Use TASK_NORMAL
    proc/base.c: Use task_is_*
    proc/array.c: Use TASK_REPORT
    perfmon: Use task_is_*
    ...

    Fixed up conflicts in NFS/sunrpc manually..

    Linus Torvalds