03 Nov, 2012

1 commit


02 Nov, 2012

1 commit


01 Nov, 2012

8 commits

  • Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
    context. This avoids a deadlock where rpc_shutdown_client loops forever
    in a workqueue kworker context, trying to kill all RPC tasks associated with
    the client, while one or more of these tasks have already been assigned to the
    same kworker (and will never run rpc_exit_task).

    This approach is needed because RPC tasks that have already been assigned
    to a kworker by queue_work cannot be canceled, as explained in the comment
    for workqueue.c:insert_wq_barrier.

    Signed-off-by: Weston Andros Adamson
    [Trond: add module_get/put.]
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • Since commit c7f404b ('vfs: new superblock methods to override
    /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
    device name reported back to userland.

    nfs_path() always generates a trailing slash when the given dentry is
    the root of an NFS mount, but userland may expect the original device
    name to be returned verbatim (as it used to be). Make this
    canonicalisation optional and change the callers accordingly.

    [jrnieder@gmail.com: use flag instead of bool argument]
    Reported-and-tested-by: Chris Hiestand
    Reference: http://bugs.debian.org/669314
    Signed-off-by: Ben Hutchings
    Cc: # v2.6.39+
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Trond Myklebust

    Ben Hutchings
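
    The idea of making the trailing-slash canonicalisation optional through a
    flag can be sketched in userspace C. This is only an illustration under
    stated assumptions, not the kernel change itself: format_devname() and
    PATH_CANONICAL are hypothetical names introduced here, and the device
    name is treated as a plain string rather than a dentry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, named here for illustration only. */
#define PATH_CANONICAL 0x1u

/*
 * Copy 'devname' into 'buf'. With PATH_CANONICAL set, append a trailing
 * slash if one is missing; without it, return the name verbatim.
 * Returns 0 on success, -1 if 'buf' is too small.
 */
static int format_devname(char *buf, size_t buflen,
                          const char *devname, unsigned int flags)
{
    size_t len;
    int need_slash;

    assert(buf != NULL && devname != NULL);

    len = strlen(devname);
    need_slash = (flags & PATH_CANONICAL) &&
                 (len == 0 || devname[len - 1] != '/');

    /* Room for the name, the optional slash, and the terminating NUL. */
    if (len + (need_slash ? 1 : 0) + 1 > buflen)
        return -1;

    memcpy(buf, devname, len);
    if (need_slash)
        buf[len++] = '/';
    buf[len] = '\0';
    return 0;
}
```

    A caller that reports the device name back to userland would pass 0 to
    get the original string, while a caller that needs the canonical form
    would pass PATH_CANONICAL.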
     
  • In a very busy v3 environment, rpc.mountd can respond to the NULL
    procedure but not to the MNT procedure in a timely manner, causing
    the MNT procedure to time out. The problem is that the mount system
    call then returns EIO, which causes the mount to fail, instead of
    ETIMEDOUT, which would cause the mount to be retried.

    This patch passes the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
    the rpc_call_sync() call in nfs_mount(), which causes
    ETIMEDOUT to be returned on timed-out connections.

    Signed-off-by: Steve Dickson
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Scott Mayhew
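
    Why the error code matters to a retrying caller can be sketched in
    userspace C. This is a minimal sketch, assuming a caller that retries
    only on ETIMEDOUT and gives up on any other error; mount_with_retry(),
    flaky_attempt() and eio_attempt() are hypothetical names, not part of
    any mount utility or of the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*mount_attempt_fn)(void *ctx);

/*
 * Retry only while the attempt reports -ETIMEDOUT. Any other result,
 * including -EIO, ends the loop at once and is returned to the caller.
 */
static int mount_with_retry(mount_attempt_fn attempt, void *ctx, int max_tries)
{
    int err = -EINVAL;

    assert(attempt != NULL);

    for (int i = 0; i < max_tries; i++) {
        err = attempt(ctx);
        if (err != -ETIMEDOUT)
            break;
    }
    return err;
}

/* Example attempt: times out until the int counter in 'ctx' reaches zero. */
static int flaky_attempt(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -ETIMEDOUT;
    }
    return 0;
}

/* Example attempt: always fails with -EIO and counts how often it ran. */
static int eio_attempt(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return -EIO;
}
```

    With flaky_attempt() the loop keeps going until the server answers; with
    eio_attempt() it stops after the first call, which mirrors the failure
    mode described above.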
     
  • The new layout pointer in pnfs_find_alloc_layout() may be NULL if the
    allocation fails for lack of memory. Check for this case, because
    pnfs_free_layout_hdr() cannot handle a NULL pointer.

    Signed-off-by: Yanchuan Nian
    Signed-off-by: Trond Myklebust

    Yanchuan Nian
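
    The pattern of guarding a free helper that cannot accept NULL can be
    sketched in userspace C. The names below (struct layout_hdr,
    alloc_layout_hdr(), free_layout_hdr(), release_new_layout()) are
    hypothetical stand-ins, not the pNFS functions, and the out-of-memory
    case is simulated with a parameter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a layout header. */
struct layout_hdr {
    int refcount;
};

/* Allocation that may fail; 'simulate_oom' forces the failure path. */
static struct layout_hdr *alloc_layout_hdr(int simulate_oom)
{
    struct layout_hdr *lo;

    if (simulate_oom)
        return NULL;
    lo = malloc(sizeof(*lo));
    if (lo != NULL)
        lo->refcount = 1;
    return lo;
}

/* Free helper that, by contract, must never be handed NULL. */
static void free_layout_hdr(struct layout_hdr *lo)
{
    assert(lo != NULL);
    free(lo);
}

/*
 * Release a freshly allocated header that turned out not to be needed.
 * Returns 0 if it was freed, -1 if there was nothing to free.
 */
static int release_new_layout(struct layout_hdr *new_lo)
{
    if (new_lo == NULL)
        return -1;
    free_layout_hdr(new_lo);
    return 0;
}
```

    The check lives in the caller so that the free helper can keep its
    stricter contract.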
     
  • The DNS resolver's use of the sunrpc cache involves a 'ttl' number
    (relative) rather than a timeout (absolute). This confused me when
    I wrote
    commit c5b29f885afe890f953f7f23424045cdad31d3e4
    "sunrpc: use seconds since boot in expiry cache"

    and I managed to break it. The effect is that any TTL is interpreted
    as 0, and nothing useful gets into the cache.

    This patch removes the use of get_expiry() - which really expects an
    expiry time - and uses get_uint() instead, treating the int correctly
    as a ttl.

    This fixes a regression that has been present since 2.6.37, causing
    certain NFS accesses in certain environments to incorrectly fail.

    Reported-by: Chuck Lever
    Tested-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
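
    The difference between a relative TTL and an absolute expiry time can be
    sketched in userspace C. This is a hypothetical illustration, not the
    sunrpc cache code: expiry_from_ttl() and is_expired() are names invented
    here, and times are plain seconds on an arbitrary clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a relative TTL (seconds from now) into an
 * absolute expiry time on the same clock as 'now'.
 */
static uint64_t expiry_from_ttl(uint64_t now, uint32_t ttl)
{
    assert(now <= UINT64_MAX - ttl);  /* guard against overflow */
    return now + ttl;
}

/* An entry is expired once the clock has reached its absolute expiry. */
static bool is_expired(uint64_t now, uint64_t expiry)
{
    return now >= expiry;
}
```

    Passing a raw TTL such as 300 where an absolute expiry is expected makes
    the entry look expired on any clock that has already passed 300, which
    matches the symptom described above of nothing useful getting into the
    cache.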
     
  • If the state recovery machinery is triggered by the call to
    nfs4_async_handle_error() then we can deadlock.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • If we do not release the sequence id in cases where we fail to get a
    session slot, then we can deadlock if we hit a recovery scenario.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • Currently, we will schedule session recovery and then return to the
    caller of nfs4_handle_exception. This works for most cases, but causes
    a hang on the following test case:

    Client                          Server
    ------                          ------
    Open file over NFS v4.1
    Write to file
                                    Expire client
    Try to lock file

    The server will return NFS4ERR_BADSESSION, prompting the client to
    schedule recovery. However, the client will continue placing lock
    attempts and the open recovery never seems to be scheduled. The
    simplest solution is to wait for session recovery to run before retrying
    the lock.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Bryan Schumaker
     

17 Oct, 2012

3 commits


15 Oct, 2012

3 commits


13 Oct, 2012

1 commit

  • Pull nfsd update from J Bruce Fields:
    "Another relatively quiet cycle. There was some progress on my
    remaining 4.1 todo's, but a couple of them were just of the form
    "check that we do X correctly", so didn't have much effect on the
    code.

    Other than that, a bunch of cleanup and some bugfixes (including an
    annoying NFSv4.0 state leak and a busy-loop in the server that could
    cause it to peg the CPU without making progress)."

    * 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
    UAPI: (Scripted) Disintegrate include/linux/sunrpc
    UAPI: (Scripted) Disintegrate include/linux/nfsd
    nfsd4: don't allow reclaims of expired clients
    nfsd4: remove redundant callback probe
    nfsd4: expire old client earlier
    nfsd4: separate session allocation and initialization
    nfsd4: clean up session allocation
    nfsd4: minor free_session cleanup
    nfsd4: new_conn_from_crses should only allocate
    nfsd4: separate connection allocation and initialization
    nfsd4: reject bad forechannel attrs earlier
    nfsd4: enforce per-client sessions/no-sessions distinction
    nfsd4: set cl_minorversion at create time
    nfsd4: don't pin clientids to pseudoflavors
    nfsd4: fix bind_conn_to_session xdr comment
    nfsd4: cast readlink() bug argument
    NFSD: pass null terminated buf to kstrtouint()
    nfsd: remove duplicate init in nfsd4_cb_recall
    nfsd4: eliminate redundant nfs4_free_stateid
    fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
    ...

    Linus Torvalds
     

12 Oct, 2012

1 commit

  • Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
    for-3.7-incoming. Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
    backchannel tcp ops", without which the 4.1 server oopses.

    J. Bruce Fields
     

10 Oct, 2012

2 commits

  • Pull NFS client updates from Trond Myklebust:
    "Features include:

    - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
    Aside from the issues discussed at the LKS, distros are shipping
    NFSv4.1 with all the trimmings.
    - Fix fdatasync()/fsync() for the corner case of a server reboot.
    - NFSv4 OPEN access fix: finally distinguish correctly between
    open-for-read and open-for-execute permissions in all situations.
    - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
    - More idmapper bugfixes
    - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
    make the code easier to read.
    - In cases where a pNFS read or write fails, allow the client to
    resume trying layoutgets after two minutes of read/write-
    through-mds.
    - More net namespace fixes to the NFSv4 callback code.
    - More net namespace fixes to the NFSv3 locking code.
    - More NFSv4 migration preparatory patches.
    Including patches to detect network trunking in both NFSv4 and
    NFSv4.1
    - pNFS block updates to optimise LAYOUTGET calls."

    * tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
    pnfsblock: cleanup nfs4_blkdev_get
    NFS41: send real read size in layoutget
    NFS41: send real write size in layoutget
    NFS: track direct IO left bytes
    NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
    NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
    NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
    NFSv4 set open access operation call flag in nfs4_init_opendata_res
    NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
    NFSv4 reduce attribute requests for open reclaim
    NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
    NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
    NFSv4.1: Deal with wraparound issues when updating the layout stateid
    NFSv4.1: Always set the layout stateid if this is the first layoutget
    NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
    NFSv4: don't put ACCESS in OPEN compound if O_EXCL
    NFSv4: don't check MAY_WRITE access bit in OPEN
    NFS: Set key construction data for the legacy upcall
    NFSv4.1: don't do two EXCHANGE_IDs on mount
    NFS: nfs41_walk_client_list(): re-lock before iterating
    ...

    Linus Torvalds
     
  • This is to complete part of the Userspace API (UAPI) disintegration for which
    the preparatory patches were pulled recently. After these patches, userspace
    headers will be segregated into:

    include/uapi/linux/.../foo.h

    for the userspace interface stuff, and:

    include/linux/.../foo.h

    for the strictly kernel internal stuff.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

09 Oct, 2012

5 commits

  • Move actual pte filling for non-linear file mappings into the new special
    vma operation: ->remap_pages().

    Filesystems must implement this method to get non-linear mapping support;
    a filesystem that uses filemap_fault() can use generic_file_remap_pages().

    Now device drivers can implement this method and obtain nonlinear vma support.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf #arch/tile
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • It is not needed at all and it is messing with return values...

    Reported-by: Wei Yongjun
    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • For buffer read, use offset-to-isize.

    For direct read, use dreq->bytes_left.

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
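
    The choice of layoutget length for reads can be sketched in userspace C.
    This is a minimal sketch under stated assumptions: struct read_req and
    layoutget_read_len() are hypothetical names, and clamping to zero when
    the offset lies at or beyond the file size is an assumption made here,
    not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a read request. */
struct read_req {
    bool     direct_io;   /* true for direct read, false for buffered read */
    uint64_t offset;      /* starting file offset of the read */
    uint64_t isize;       /* current file size */
    uint64_t bytes_left;  /* bytes still outstanding in a direct request */
};

/*
 * Buffered read: ask for everything from 'offset' to the end of the file.
 * Direct read: ask for the bytes the direct request still has left.
 */
static uint64_t layoutget_read_len(const struct read_req *req)
{
    assert(req != NULL);

    if (req->direct_io)
        return req->bytes_left;
    /* Assumption: a read at or past EOF requests a zero-length range. */
    return req->offset < req->isize ? req->isize - req->offset : 0;
}
```

    For example, a buffered read at offset 4096 of a 10000-byte file would
    request 5904 bytes, while a direct read with 512 bytes outstanding would
    request 512 regardless of the file size.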
     
  • For buffer write, the block layout client scans the inode mapping to find
    the next hole and uses offset-to-hole as the layoutget length. The object
    layout client uses offset-to-isize as the layoutget length.

    For direct write, both block layout and object layout use dreq->bytes_left.

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     

06 Oct, 2012

1 commit


05 Oct, 2012

2 commits


04 Oct, 2012

2 commits


03 Oct, 2012

10 commits

  • Pull vfs update from Al Viro:

    "- big one - consolidation of descriptor-related logic; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of the last process in an IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     
  • We currently make no distinction in attribute requests between normal OPENs
    and OPEN with CLAIM_PREVIOUS. This offers more possibility of failures in
    the GETATTR response which foils OPEN reclaim attempts.

    Reduce the requested attributes to the bare minimum needed to update the
    reclaim open stateid and split nfs4_opendata_to_nfs4_state processing
    accordingly.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • ...before it can check the validity of that file type.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • ...and fix a bug in pnfs_set_layout_stateid.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • ...and add a helper function.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If the list of layout segments is empty, we must unconditionally set
    the layout stateid.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS
    will return permission denied for all bits until close.

    Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to
    OPEN compound)

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • Don't check MAY_WRITE as a newly created file may not have write mode bits,
    but POSIX allows the creating process to write regardless.
    This is ok because NFSv4 OPEN ops handle write permissions correctly -
    the ACCESS in the OPEN compound is to differentiate READ v EXEC permissions.

    Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to
    OPEN compound)

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson