13 Sep, 2013

1 commit

  • Pull vfs pile 4 from Al Viro:
    "list_lru pile, mostly"

    This came out of Andrew's pile; Al ended up doing the merge work so that
    Andrew didn't have to.

    Additionally, a few fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
    super: fix for destroy lrus
    list_lru: dynamically adjust node arrays
    shrinker: Kill old ->shrink API.
    shrinker: convert remaining shrinkers to count/scan API
    staging/lustre/libcfs: cleanup linux-mem.h
    staging/lustre/ptlrpc: convert to new shrinker API
    staging/lustre/obdclass: convert lu_object shrinker to count/scan API
    staging/lustre/ldlm: convert to shrinkers to count/scan API
    hugepage: convert huge zero page shrinker to new shrinker API
    i915: bail out earlier when shrinker cannot acquire mutex
    drivers: convert shrinkers to new count/scan API
    fs: convert fs shrinkers to new scan/count API
    xfs: fix dquot isolation hang
    xfs-convert-dquot-cache-lru-to-list_lru-fix
    xfs: convert dquot cache lru to list_lru
    xfs: rework buffer dispose list tracking
    xfs-convert-buftarg-lru-to-generic-code-fix
    xfs: convert buftarg LRU to generic code
    fs: convert inode and dentry shrinking to be node aware
    vmscan: per-node deferred work
    ...

    Linus Torvalds
     

11 Sep, 2013

2 commits

  • Pull nfsd updates from Bruce Fields:
    "This was a very quiet cycle! Just a few bugfixes and some cleanup"

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
    rpc: let xdr layer allocate gssproxy receieve pages
    rpc: fix huge kmalloc's in gss-proxy
    rpc: comment on linux_cred encoding, treat all as unsigned
    rpc: clean up decoding of gssproxy linux creds
    svcrpc: remove unused rq_resused
    nfsd4: nfsd4_create_clid_dir prints uninitialized data
    nfsd4: fix leak of inode reference on delegation failure
    Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
    sunrpc: prepare NFS for 2038
    nfsd4: fix setlease error return
    nfsd: nfs4_file_get_access: need to be more careful with O_RDWR

    Linus Torvalds
     
  • Convert the filesystem shrinkers to use the new API, and standardise some
    of the behaviours of the shrinkers at the same time. For example,
    nr_to_scan means the number of objects to scan, not the number of objects
    to free.

    I refactored the CIFS idmap shrinker a little - it really needs to be
    broken up into a shrinker per tree and keep an item count with the tree
    root so that we don't need to walk the tree every time the shrinker needs
    to count the number of objects in the tree (i.e. all the time under
    memory pressure).

    [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
    [assorted fixes folded in]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Acked-by: Artem Bityutskiy
    Acked-by: Jan Kara
    Acked-by: Steven Whitehouse
    Cc: Adrian Hunter
    Cc: "Theodore Ts'o"
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     

04 Sep, 2013

1 commit


31 Aug, 2013

4 commits


08 Aug, 2013

2 commits


27 Jul, 2013

1 commit

  • This actually makes a difference in the 4.1 case, since we use the
    status to decide what reason to give the client for the delegation
    refusal (see nfsd4_open_deleg_none_ext), and in theory a client might
    choose suboptimal behavior if we give the wrong answer.

    Reported-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

24 Jul, 2013

2 commits

  • If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY
    the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR]))
    doesn't trigger when it should.

    Signed-off-by: Harshula Jayasuriya
    Signed-off-by: J. Bruce Fields

    Harshula Jayasuriya
     
  • The following call chain:
    ------------------------------------------------------------
    nfs4_get_vfs_file
    - nfsd_open
    - dentry_open
    - do_dentry_open
    - __get_file_write_access
    - get_write_access
    - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
    ------------------------------------------------------------

    can result in the following state:
    ------------------------------------------------------------
    struct nfs4_file {
    ...
    fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
    fi_access = {{
    counter = 0x1
    }, {
    counter = 0x0
    }},
    ...
    ------------------------------------------------------------

    1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
    NULL, hence nfsd_open() is called where we get status set to an error
    and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
    nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

    2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
    NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
    nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
    Thus we leave a landmine in the form of the nfs4_file data structure in
    an incorrect state.

    3) Eventually, when __nfs4_file_put_access() is called, it finds
    fi_access[O_WRONLY] non-zero, decrements it and calls
    nfs4_file_put_fd(), which tries to fput -ETXTBSY.
    ------------------------------------------------------------
    ...
    [exception RIP: fput+0x9]
    RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282
    RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002
    RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6
    RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58
    R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001
    R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
    #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
    #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
    #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
    #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
    #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
    #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
    #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
    #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
    #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
    #19 [ffff88062e365ee8] kthread at ffffffff81090886
    #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
    ------------------------------------------------------------

    Cc: stable@vger.kernel.org
    Signed-off-by: Harshula Jayasuriya
    Signed-off-by: J. Bruce Fields

    Harshula Jayasuriya
     

18 Jul, 2013

1 commit


13 Jul, 2013

1 commit

  • You can turn on or off support for minorversions using e.g.

    echo "-4.2" >/proc/fs/nfsd/versions

    However, the current implementation is a little wonky. For example, the
    above will turn off 4.2 support, but it will also turn *on* 4.1 support.

    This didn't matter as long as we only had 2 minorversions, which was
    true till very recently.

    And do a little cleanup here.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

12 Jul, 2013

1 commit

  • Pull nfsd changes from Bruce Fields:
    "Changes this time include:

    - 4.1 enabled on the server by default: the last 4.1-specific issues
    I know of are fixed, so we're not going to find the rest of the
    bugs without more exposure.
    - Experimental support for NFSv4.2 MAC Labeling (to allow running
    selinux over NFS), from Dave Quigley.
    - Fixes for some delicate cache/upcall races that could cause rare
    server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
    debugging persistence.
    - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
    and v4.1-specific, but also a generic bug handling fragmented rpc
    calls"

    * 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
    nfsd4: support minorversion 1 by default
    nfsd4: allow destroy_session over destroyed session
    svcrpc: fix failures to handle -1 uid's
    sunrpc: Don't schedule an upcall on a replaced cache entry.
    net/sunrpc: xpt_auth_cache should be ignored when expired.
    sunrpc/cache: ensure items removed from cache do not have pending upcalls.
    sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
    sunrpc/cache: remove races with queuing an upcall.
    nfsd4: return delegation immediately if lease fails
    nfsd4: do not throw away 4.1 lock state on last unlock
    nfsd4: delegation-based open reclaims should bypass permissions
    svcrpc: don't error out on small tcp fragment
    svcrpc: fix handling of too-short rpc's
    nfsd4: minor read_buf cleanup
    nfsd4: fix decoding of compounds across page boundaries
    nfsd4: clean up nfs4_open_delegation
    NFSD: Don't give out read delegations on creates
    nfsd4: allow client to send no cb_sec flavors
    nfsd4: fail attempts to request gss on the backchannel
    nfsd4: implement minimal SP4_MACH_CRED
    ...

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Feature highlights include:
    - Add basic client support for NFSv4.2
    - Add basic client support for Labeled NFS (selinux for NFSv4.2)
    - Fix the use of credentials in NFSv4.1 stateful operations, and add
    support for NFSv4.1 state protection.

    Bugfix highlights:
    - Fix another NFSv4 open state recovery race
    - Fix an NFSv4.1 back channel session regression
    - Various rpc_pipefs races
    - Fix another issue with NFSv3 auth negotiation

    Please note that Labeled NFS does require some additional support from
    the security subsystem. The relevant changesets have all been
    reviewed and acked by James Morris."

    * tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
    NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
    NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
    nfs: have NFSv3 try server-specified auth flavors in turn
    nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
    nfs: move server_authlist into nfs_try_mount_request
    nfs: refactor "need_mount" code out of nfs_try_mount
    SUNRPC: PipeFS MOUNT notification optimization for dying clients
    SUNRPC: split client creation routine into setup and registration
    SUNRPC: fix races on PipeFS UMOUNT notifications
    SUNRPC: fix races on PipeFS MOUNT notifications
    NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
    NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
    NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
    NFS: Improve legacy idmapping fallback
    NFSv4.1 end back channel session draining
    NFS: Apply v4.1 capabilities to v4.2
    NFSv4.1: Clean up layout segment comparison helper names
    NFSv4.1: layout segment comparison helpers should take 'const' parameters
    NFSv4: Move the DNS resolver into the NFSv4 module
    rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
    ...

    Linus Torvalds
     

09 Jul, 2013

2 commits

  • We now have minimal minorversion 1 support; turn it on by default.

    This can still be turned off with "echo -4.1 >/proc/fs/nfsd/versions".

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • RFC 5661 allows a client to destroy a session using a compound
    associated with the destroyed session, as long as the DESTROY_SESSION op
    is the last op of the compound.

    We attempt to allow this, but testing against a Solaris client (which
    does destroy sessions in this way) showed that we were failing the
    DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
    count on the session (held by us) represented another rpc in progress
    over this session.

    Fix this by noting that in this case the expected reference count is 1,
    not 0.

    Also, note that as long as the session holds a reference to the compound
    we're destroying, we can't free it here--instead, delay the free till
    the final put in nfs4svc_encode_compoundres.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

02 Jul, 2013

11 commits

  • This case shouldn't happen--the administrator shouldn't really allow
    other applications access to the export until clients have had the
    chance to reclaim their state--but if it does then we should set the
    "return this lease immediately" bit on the reply. That still leaves
    some small races, but it's the best the protocol allows us to do in the
    case a lease is ripped out from under us....

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • This reverts commit eb2099f31b0f090684a64ef8df44a30ff7c45fc2 "nfsd4:
    release lockowners on last unlock in 4.1 case". Trond identified
    language in rfc 5661 section 8.2.4 which forbids this behavior:

    Stateids associated with byte-range locks are an exception.
    They remain valid even if a LOCKU frees all remaining locks, so
    long as the open file with which they are associated remains
    open, unless the client frees the stateids via the FREE_STATEID
    operation.

    And bakeathon 2013 testing found a 4.1 freebsd client was getting an
    incorrect BAD_STATEID return from a FREE_STATEID in the above situation
    and then failing.

    The spec language honestly was probably a mistake, but at this point, with
    implementations already following it, we're probably stuck with that.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • We saw a v4.0 client's create fail as follows:

    - open create succeeds and gets a read delegation
    - client attempts to set mode on new file, gets DELAY while
    server recalls delegation.
    - client attempts a CLAIM_DELEGATE_CUR open using the
    delegation, gets error because of new file mode.

    This probably can't happen on a recent kernel since we're no longer
    giving out delegations on create opens. Nevertheless, it's a
    bug--reclaim opens should bypass permission checks.

    Reported-by: Steve Dickson
    Reported-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The code to step to the next page seems reasonably self-contained.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
    A network trace showed the server returning BAD_XDR on the final getattr
    of a getattr+write+getattr compound. The final getattr started on a
    page boundary.

    I believe the Linux client ignores errors on the post-write getattr, and
    that that's why we haven't seen this before.

    Cc: stable@vger.kernel.org
    Reported-by: Rick Macklem
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
    The nfs4_open_delegation logic is unnecessarily baroque.

    Also stop pretending we support write delegations in several places.

    Some day we will support write delegations, but when that happens adding
    back in these flag parameters will be the easy part. For now they're
    just confusing.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
    When an exclusive create is done with the mode bits
    set (aka open(testfile, O_CREAT | O_EXCL, 0777)), this
    causes an OPEN op followed by a SETATTR op. When a
    read delegation is given in the OPEN, it causes
    the SETATTR to delay with EAGAIN until the
    delegation is recalled.

    This patch causes exclusive creates to give out
    a write delegation (which turns into no delegation),
    which allows the SETATTR to succeed seamlessly.

    Signed-off-by: Steve Dickson
    [bfields: do this for any CREATE, not just exclusive; comment]
    Signed-off-by: J. Bruce Fields

    Steve Dickson
     
    In testing I noticed that some of the pynfs tests forget to send any
    cb_sec flavors, and that we haven't necessarily errored out in that case
    before.

    I'll fix pynfs, but am also inclined to default to trying AUTH_NONE in
    that case in case this is something clients actually do.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • We don't support gss on the backchannel. We should state that fact up
    front rather than just letting things continue and later making the
    client try to figure out why the backchannel isn't working.

    Trond suggested instead returning NFS4ERR_NOENT. I think it would be
    tricky for the client to distinguish between the case "I don't support
    gss on the backchannel" and "I can't find that in my cache, please
    create another context and try that instead", and I'd prefer something
    that currently doesn't have any other meaning for this operation, hence
    the (somewhat arbitrary) NFS4ERR_ENCR_ALG_UNSUPP.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
    the client-provided spo_must_* arrays and just enforcing credential
    checks for the minimum required operations.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
    This will make it easier to enforce SP4_MACH_CRED, which needs to
    compare the mechanism used on the exchange_id with that used on
    protected operations.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

29 Jun, 2013

4 commits

  • Having a global lock that protects all of this code is a clear
    scalability problem. Instead of doing that, move most of the code to be
    protected by the i_lock instead. The exceptions are the global lists
    that the ->fl_link sits on, and the ->fl_block list.

    ->fl_link is what connects these structures to the
    global lists, so we must ensure that we hold those locks when iterating
    over or updating these lists.

    Furthermore, sound deadlock detection requires that we hold the
    blocked_list state steady while checking for loops. We also must ensure
    that the search and update to the list are atomic.

    For the checking and insertion side of the blocked_list, push the
    acquisition of the global lock into __posix_lock_file and ensure that
    checking and update of the blocked_list is done without dropping the
    lock in between.

    On the removal side, when waking up blocked lock waiters, take the
    global lock before walking the blocked list and dequeue the waiters from
    the global list prior to removal from the fl_block list.

    With this, deadlock detection should be race free while we minimize
    excessive file_lock_lock thrashing.

    Finally, in order to avoid a lock inversion problem when handling
    /proc/locks output we must ensure that manipulations of the fl_block
    list are also protected by the file_lock_lock.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • New method - ->iterate(file, ctx). That's the replacement for ->readdir();
    it takes the callback from ctx->actor, uses ctx->pos instead of file->f_pos and
    calls dir_emit(ctx, ...) instead of filldir(data, ...). It does *not*
    update file->f_pos (or look at it, for that matter); iterate_dir() does the
    update.

    Note that dir_emit() takes the offset from ctx->pos (and eventually
    filldir_t will lose that argument).

    Signed-off-by: Al Viro

    Al Viro
     
  • iterate_dir(): new helper, replacing vfs_readdir().

    struct dir_context: contains the readdir callback (and will get more stuff
    in it), embedded into whatever data that callback wants to deal with;
    eventually, we'll be passing it to ->readdir() replacement instead of
    (data,filldir) pair.

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2013

1 commit

  • In C, signed integer overflow results in undefined behavior, but unsigned
    overflow wraps around. So do the subtraction first, then cast to signed.

    Reported-by: Joakim Tjernlund
    Signed-off-by: Jim Rees
    Signed-off-by: J. Bruce Fields

    Jim Rees
     

15 May, 2013

2 commits


13 May, 2013

3 commits

    This enables NFSv4.2 support for the server. To enable this
    code, do the following:
    echo "+4.2" >/proc/fs/nfsd/versions

    after the nfsd kernel module is loaded.

    On its own this does nothing except allow the server to respond to
    compounds with minorversion set to 2. All the new NFSv4.2 features are
    optional, so this is perfectly legal.

    Signed-off-by: Steve Dickson
    Signed-off-by: J. Bruce Fields

    Steve Dickson
     
    This code assumes that any client using exchange_id is using NFSv4.1,
    but with the introduction of 4.2 that will no longer be true.

    The main effect of this is that client callbacks will use the same
    minorversion as that used on the exchange_id.

    Note that clients are forbidden from mixing 4.1 and 4.2 compounds. (See
    rfc 5661, section 2.7, #13: "A client MUST NOT attempt to use a stateid,
    filehandle, or similar returned object from the COMPOUND procedure with
    minor version X for another COMPOUND procedure with minor version Y,
    where X != Y.") However, we do not currently attempt to enforce this
    except in the case of mixing zero minor version with non-zero minor
    versions.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
    The fh_lock_parent(), nfsd_truncate(), nfsd_notify_change() and
    nfsd_sync_dir() functions are neither implemented nor used, just remove
    them.

    Signed-off-by: Zhao Hongjiang
    Signed-off-by: J. Bruce Fields

    Zhao Hongjiang