10 May, 2011

1 commit


20 Apr, 2011

2 commits

  • An open on a NFS4 share using the O_CREAT flag on an existing file for
    which we have permissions to open but contained in a directory with no
    write permissions will fail with EACCES.

    A tcpdump shows that the client had set the open mode to UNCHECKED which
    indicates that the file should be created if it doesn't exist and
    encountering an existing file is not an error. Since in this case the
    file exists and can be opened by the user, the NFS server is wrong in
    attempting to check create permissions on the parent directory.

    The patch adds a conditional statement to check for create permissions
    only if the file doesn't exist.

    Signed-off-by: Sachin S. Prabhu
    Signed-off-by: J. Bruce Fields

    Sachin Prabhu
     
  • 23fcf2ec93fb8573a653408316af599939ff9a8e (nfsd4: fix oops on lock failure)

    The above patch breaks the free path for stp->st_file. If stp was
    inserted into sop->so_stateids, we have to drop the stp->st_file
    refcount, because that refcount is taken whether or not any refcounts
    are taken on stp->st_file->fi_fds[].

    Signed-off-by: OGAWA Hirofumi
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    OGAWA Hirofumi
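
For the O_CREAT fix in this group, the decision can be modelled in userspace C as below. This is only a sketch: `struct model_dir`, `struct model_file`, and `model_open_unchecked()` are hypothetical names, not nfsd code. The only facts taken from the commit message are that an UNCHECKED create of an existing file is not an error, and that create permission on the parent should be checked only if the file doesn't exist.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parent directory and the target file. */
struct model_dir  { bool may_create; };
struct model_file { bool may_open; };

/*
 * Model of an UNCHECKED open-with-create. 'file' is NULL when the name
 * does not exist yet. Returns 0 on success or a negative errno.
 */
int model_open_unchecked(const struct model_dir *parent,
                         const struct model_file *file)
{
    if (file == NULL) {
        /* Only a real create needs create permission on the parent. */
        return parent->may_create ? 0 : -EACCES;
    }
    /* Existing file: only the file's own permissions matter. */
    return file->may_open ? 0 : -EACCES;
}
```

With a read-only parent, opening an existing, readable file succeeds in this model, while creating a new name in the same directory still fails with EACCES.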
     

19 Apr, 2011

1 commit


12 Apr, 2011

1 commit


11 Apr, 2011

1 commit

  • Lock stateid's can have access_bmap 0 if they were only partially
    initialized (due to a failed lock request); handle that case in
    free_generic_stateid.

    ------------[ cut here ]------------
    kernel BUG at fs/nfsd/nfs4state.c:380!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]

    Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    EIP: 0060:[] EFLAGS: 00010297 CPU: 0
    EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
    EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
    ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
    Stack:
    dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
    ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
    dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
    Call Trace:
    [] free_generic_stateid+0x1c/0x3e [nfsd]
    [] release_lockowner+0x71/0x8a [nfsd]
    [] nfsd4_lock+0x617/0x66c [nfsd]
    [] ? nfsd_setuser+0x199/0x1bb [nfsd]
    [] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
    [] ? _cond_resched+0x8/0x1c
    [] ? slab_pre_alloc_hook.clone.33+0x23/0x27
    [] ? kmem_cache_alloc+0x1a/0xd2
    [] ? __call_rcu+0xd7/0xdd
    [] ? fh_verify+0x401/0x452 [nfsd]
    [] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
    [] ? nfsd4_putfh+0x33/0x3b [nfsd]
    [] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
    [] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
    [] nfsd_dispatch+0xd1/0x1a5 [nfsd]
    [] svc_process_common+0x282/0x46f [sunrpc]
    [] svc_process+0xdc/0xfa [sunrpc]
    [] nfsd+0xd6/0x115 [nfsd]
    [] ? nfsd_shutdown+0x24/0x24 [nfsd]
    [] kthread+0x62/0x67
    [] ? kthread_worker_fn+0x114/0x114
    [] kernel_thread_helper+0x6/0x10
    Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
    EIP: [] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
    ---[ end trace 2b0bf6c6557cb284 ]---

    The trace route is:

    -> nfsd4_lock()
    -> if (lock->lk_is_new) {
    -> alloc_init_lock_stateid()

    3739: stp->st_access_bmap = 0;

    ->if (status && lock->lk_is_new && lock_sop)
    -> release_lockowner()
    -> free_generic_stateid()
    -> nfs4_access_bmap_to_omode()
    -> nfs4_access_to_omode()

    380: BUG(); *****

    This problem was introduced by 0997b173609b9229ece28941c118a2a9b278796e.

    Reported-by: Mi Jinlong
    Tested-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
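
The failure path above can be illustrated with a small userspace model. It is a sketch under assumptions: the bitmap encoding (bit i set for access value i) and all `model_*` names are hypothetical, and the real free_generic_stateid() may handle the empty bitmap differently. The point shown is only that an access bitmap of 0 has no corresponding open mode, so the caller has to test for it before converting.

```c
#include <stdbool.h>

enum model_omode { MODEL_O_RDONLY, MODEL_O_WRONLY, MODEL_O_RDWR };

/*
 * Convert an access bitmap to an open mode. Returns false for a bitmap
 * that names no access at all (the partially initialized case), instead
 * of hitting a BUG() as in the trace above.
 */
bool model_access_bmap_to_omode(unsigned int access_bmap,
                                enum model_omode *omode)
{
    unsigned int access = 0;

    /* Assumed encoding: bit 1 = read, bit 2 = write, bit 3 = both. */
    for (unsigned int i = 1; i <= 3; i++)
        if (access_bmap & (1u << i))
            access |= i;

    switch (access) {
    case 1: *omode = MODEL_O_RDONLY; return true;
    case 2: *omode = MODEL_O_WRONLY; return true;
    case 3: *omode = MODEL_O_RDWR;   return true;
    default: return false;
    }
}

/* Returns how many file-access references the model would release. */
int model_free_stateid(unsigned int access_bmap)
{
    enum model_omode omode;

    if (!model_access_bmap_to_omode(access_bmap, &omode))
        return 0;   /* nothing was ever taken, so nothing to put */
    return 1;
}
```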
     

31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • This was noticed by users who performed more than 2^32 lock operations
    and hence made this counter overflow (eventually leading to
    use-after-free's). Setting rq_client to NULL here means that it won't
    later get auth_domain_put() when it should be.

    Appears to have been introduced in 2.5.42 by "[PATCH] kNFSd: Move auth
    domain lookup into svcauth" which moved most of the rq_client handling
    to common svcauth code, but left behind this one line.

    Cc: Neil Brown
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
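
The arithmetic behind the overflow can be shown in a few lines of C. This is a sketch that assumes a 32-bit reference counter (suggested by the 2^32 figure in the message, but not stated there); `model_refcount_after_leaks()` is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/*
 * Value of a 32-bit reference counter after 'leaked_gets' references
 * were taken and never dropped. Unsigned arithmetic wraps modulo 2^32.
 */
uint32_t model_refcount_after_leaks(uint32_t start, uint64_t leaked_gets)
{
    return (uint32_t)(start + leaked_gets);
}
```

Starting from 1, leaking 2^32 - 1 references brings the model counter to 0, which is the point where an object would be treated as unreferenced while still in use.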
     

24 Mar, 2011

1 commit

  • * 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
    SUNRPC: Remove resource leak in svc_rdma_send_error()
    nfsd: wrong index used in inner loop
    nfsd4: fix comment and remove unused nfsd4_file fields
    nfs41: make sure nfs server return right ca_maxresponsesize_cached
    nfsd: fix compile error
    svcrpc: fix bad argument in unix_domain_find
    nfsd4: fix struct file leak
    nfsd4: minor nfs4state.c reshuffling
    svcrpc: fix rare race on unix_domain creation
    nfsd41: modify the members value of nfsd4_op_flags
    nfsd: add proc file listing kernel's gss_krb5 enctypes
    gss:krb5 only include enctype numbers in gm_upcall_enctypes
    NFSD, VFS: Remove dead code in nfsd_rename()
    nfsd: kill unused macro definition
    locks: use assign_type()

    Linus Torvalds
     

18 Mar, 2011

4 commits


16 Mar, 2011

1 commit

  • According to rfc5661,

    ca_maxresponsesize_cached:

    Like ca_maxresponsesize, but the maximum size of a reply that
    will be stored in the reply cache (Section 2.10.6.1). For each
    channel, the server MAY decrease this value, but MUST NOT
    increase it.

    The latest kernel (2.6.38-rc8) may increase the value because it
    ignores the request's ca_maxresponsesize_cached value. We should not
    ignore it.

    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
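
The rule quoted from the RFC amounts to clamping the client's requested value. A minimal sketch, assuming a hypothetical server-side limit `server_limit` and a hypothetical helper name:

```c
#include <stdint.h>

/*
 * The server may lower the client's requested ca_maxresponsesize_cached
 * but must never raise it, so the reply carries the smaller of the two.
 */
uint32_t model_negotiate_maxresp_cached(uint32_t requested,
                                        uint32_t server_limit)
{
    return requested < server_limit ? requested : server_limit;
}
```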
     

15 Mar, 2011

1 commit

  • "fs/built-in.o: In function `supported_enctypes_show':
    nfsctl.c:(.text+0x7beb0): undefined reference to `gss_mech_get_by_name'
    nfsctl.c:(.text+0x7bebc): undefined reference to `gss_mech_put'
    "

    Reported-by: Guennadi Liakhovetski
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

09 Mar, 2011

3 commits


08 Mar, 2011

5 commits

    The values of the nfsd4_op_flags members are such that
    (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS) equals ALLOWED_AS_FIRST_OP,
    which is probably not what we want.

    OP_PUTROOTFH, with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS,
    must not appear as the first operation without a SEQUENCE op.

    This patch corrects the wrong values of ALLOWED_WITHOUT_FH etc., which
    were introduced by f9bb94c4.

    Cc: stable@kernel.org
    Reviewed-by: Benny Halevy
    Signed-off-by: Mi Jinlong
    Signed-off-by: J. Bruce Fields

    Mi Jinlong
     
  • Add a new proc file which lists the encryption types supported
    by the kernel's gss_krb5 code.

    Newer MIT Kerberos libraries support the assertion of acceptor
    subkeys. This enctype information allows user-land (svcgssd)
    to request that the Kerberos libraries limit the encryption
    types that it uses when generating the subkeys.

    Signed-off-by: Kevin Coffman
    Signed-off-by: J. Bruce Fields

    Kevin Coffman
     
  • Currently we have the following code in fs/nfsd/vfs.c::nfsd_rename() :

    ...
    host_err = nfsd_break_lease(odentry->d_inode);
    if (host_err)
            goto out_drop_write;
    if (ndentry->d_inode) {
            host_err = nfsd_break_lease(ndentry->d_inode);
            if (host_err)
                    goto out_drop_write;
    }
    if (host_err)
            goto out_drop_write;
    ...

    'host_err' is guaranteed to be 0 by the time we test 'ndentry->d_inode'.
    If 'host_err' becomes != 0 inside the 'if' statement, then we goto
    'out_drop_write'. So, after the 'if' statement there is no way that
    'host_err' can be anything but 0, so the test afterwards is just dead
    code.
    This patch removes the dead code.

    Signed-off-by: Jesper Juhl
    Signed-off-by: J. Bruce Fields

    Jesper Juhl
     
    These macros have not been used for several years, so remove them.

    Signed-off-by: Shan Wei
    Signed-off-by: J. Bruce Fields

    Shan Wei
     
  • In case of a nonempty list, the return on error here is obviously bogus;
    it ends up being a pointer to the list head instead of to any valid
    delegation on the list.

    In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
    then renew_client may oops, and it may take an embarrassingly long time to
    figure out why. Facepalm.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
    IP: [] nfsd4_delegreturn+0x125/0x200
    ...

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
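
The list-head problem in the last commit of this group can be reproduced with a small circular list. The sketch below is a userspace model with hypothetical names (`model_list`, `model_deleg`, `model_find_deleg()`); it is not the nfsd lookup function. It shows the safe pattern: when the loop reaches the head again, return NULL instead of the entry computed from the head pointer.

```c
#include <stddef.h>

struct model_list { struct model_list *next; };

struct model_deleg {
    int id;
    struct model_list link;
};

/* Recover the containing delegation from its embedded list node. */
#define model_entry(ptr) \
    ((struct model_deleg *)((char *)(ptr) - offsetof(struct model_deleg, link)))

struct model_deleg *model_find_deleg(struct model_list *head, int id)
{
    for (struct model_list *pos = head->next; pos != head; pos = pos->next) {
        struct model_deleg *dp = model_entry(pos);

        if (dp->id == id)
            return dp;
    }
    /*
     * Not found. Applying model_entry() to 'head' here would yield a
     * pointer that looks like a delegation but is not one.
     */
    return NULL;
}
```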
     

23 Feb, 2011

1 commit

  • Fix bug introduced in patch
    85a56480 NFSD: Update XDR decoders in NFSv4 callback client

    Although decode_cb_sequence4resok ignores highest slotid and target
    highest slotid, it must account for their space in the xdr stream when
    calling xdr_inline_decode.

    Cc: Chuck Lever
    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Benny Halevy
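
Why ignored fields still need to be counted can be seen in a toy decoder. This is a sketch, not the kernel's XDR code: `model_stream` and `model_inline_decode()` are hypothetical, and the field sizes assume the CB_SEQUENCE result body of a 16-byte session ID followed by four 4-byte words (sequence ID, slot ID, highest slot ID, target highest slot ID).

```c
#include <stddef.h>
#include <stdint.h>

struct model_stream {
    const uint8_t *p;    /* next undecoded byte */
    const uint8_t *end;  /* one past the last byte */
};

/* Reserve 'nbytes' from the stream, or return NULL if too short. */
const uint8_t *model_inline_decode(struct model_stream *s, size_t nbytes)
{
    if ((size_t)(s->end - s->p) < nbytes)
        return NULL;
    const uint8_t *start = s->p;
    s->p += nbytes;
    return start;
}

enum { MODEL_SESSIONID_LEN = 16, MODEL_WORD = 4 };

/* All five fields, including the two slot-ID words that are ignored. */
#define MODEL_RESOK_LEN (MODEL_SESSIONID_LEN + 4 * MODEL_WORD)

int model_decode_resok(struct model_stream *s)
{
    return model_inline_decode(s, MODEL_RESOK_LEN) ? 0 : -1;
}
```

If the two ignored words were left out of the length, the stream position would stop 8 bytes short and whatever is decoded next would start in the wrong place.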
     

17 Feb, 2011

1 commit

  • These functions return an nfs status, not a host_err. So don't
    try to convert before returning.

    This is a regression introduced by
    3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
    but missed these two.

    Cc: stable@kernel.org
    Reported-by: Herbert Poetzl
    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

14 Feb, 2011

11 commits


17 Jan, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
    sanitize vfsmount refcounting changes
    fix old umount_tree() breakage
    autofs4: Merge the remaining dentry ops tables
    Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
    Allow d_manage() to be used in RCU-walk mode
    Remove a further kludge from __do_follow_link()
    autofs4: Bump version
    autofs4: Add v4 pseudo direct mount support
    autofs4: Fix wait validation
    autofs4: Clean up autofs4_free_ino()
    autofs4: Clean up dentry operations
    autofs4: Clean up inode operations
    autofs4: Remove unused code
    autofs4: Add d_manage() dentry operation
    autofs4: Add d_automount() dentry operation
    Remove the automount through follow_link() kludge code from pathwalk
    CIFS: Use d_automount() rather than abusing follow_link()
    NFS: Use d_automount() rather than abusing follow_link()
    AFS: Use d_automount() rather than abusing follow_link()
    Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
    ...

    Linus Torvalds
     

16 Jan, 2011

1 commit

  • Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
    sleep when it tries to transit away from one of that filesystem's directories
    during a pathwalk. The operation is keyed off a new dentry flag
    (DCACHE_MANAGE_TRANSIT).

    The filesystem is allowed to be selective about which processes it holds and
    which it permits to continue on or prohibits from transiting from each flagged
    directory. This will allow autofs to hold up client processes whilst letting
    its userspace daemon through to maintain the directory or the stuff behind it
    or mounted upon it.

    The ->d_manage() dentry operation:

    int (*d_manage)(struct path *path, bool mounting_here);

    takes a pointer to the directory about to be transited away from and a flag
    indicating whether the transit is undertaken by do_add_mount() or
    do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

    It should return 0 if successful, to let the process continue on its way;
    -EISDIR to prohibit the caller from skipping to overmounted filesystems or
    automounting, and to use this directory; or some other error code to return to
    the user.

    ->d_manage() is called with namespace_sem writelocked if mounting_here is true
    and no other locks held, so it may sleep. However, if mounting_here is true,
    it may not initiate or wait for a mount or unmount upon the parameter
    directory, even if the act is actually performed by userspace.

    Within fs/namei.c, follow_managed() is extended to check with d_manage() first
    on each managed directory, before transiting away from it or attempting to
    automount upon it.

    follow_down() is renamed follow_down_one() and should only be used where the
    filesystem deliberately intends to avoid management steps (e.g. autofs).

    A new follow_down() is added that incorporates the loop done by all other
    callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
    and CIFS do use it, their use is removed by converting them to use
    d_automount()). The new follow_down() calls d_manage() as appropriate. It
    also takes an extra parameter to indicate if it is being called from mount code
    (with namespace_sem writelocked) which it passes to d_manage(). follow_down()
    ignores automount points so that it can be used to mount on them.

    __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
    DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
    sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
    that determine whether to abort or not itself. That would allow the autofs
    daemon to continue on in rcu-walk mode.

    Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
    required as every transit from that directory will cause d_manage() to be
    invoked. It can always be set again when necessary.

    ==========================
    WHAT THIS MEANS FOR AUTOFS
    ==========================

    Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
    trigger the automounting of indirect mounts, and both of these can be called
    with i_mutex held.

    autofs knows that the i_mutex will be held by the caller in lookup(), and so
    can drop it before invoking the daemon - but this isn't so for d_revalidate(),
    since the lock is only held on _some_ of the code paths that call it. This
    means that autofs can't risk dropping i_mutex from its d_revalidate() function
    before it calls the daemon.

    The bug could manifest itself as, for example, a process that's trying to
    validate an automount dentry that gets made to wait because that dentry is
    expired and needs cleaning up:

    mkdir S ffffffff8014e05a 0 32580 24956
    Call Trace:
    [] :autofs4:autofs4_wait+0x674/0x897
    [] avc_has_perm+0x46/0x58
    [] autoremove_wake_function+0x0/0x2e
    [] :autofs4:autofs4_expire_wait+0x41/0x6b
    [] :autofs4:autofs4_revalidate+0x91/0x149
    [] __lookup_hash+0xa0/0x12f
    [] lookup_create+0x46/0x80
    [] sys_mkdirat+0x56/0xe4

    versus the automount daemon which wants to remove that dentry, but can't
    because the normal process is holding the i_mutex lock:

    automount D ffffffff8014e05a 0 32581 1 32561
    Call Trace:
    [] __mutex_lock_slowpath+0x60/0x9b
    [] do_path_lookup+0x2ca/0x2f1
    [] .text.lock.mutex+0xf/0x14
    [] do_rmdir+0x77/0xde
    [] tracesys+0x71/0xe0
    [] tracesys+0xd5/0xe0

    which means that the system is deadlocked.

    This patch allows autofs to hold up normal processes whilst the daemon goes
    ahead and does things to the dentry tree behind the automounter point without
    risking a deadlock as almost no locks are held in d_manage() and none in
    d_automount().

    Signed-off-by: David Howells
    Was-Acked-by: Ian Kent
    Signed-off-by: Al Viro

    David Howells
     

15 Jan, 2011

2 commits

  • * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
    nfsd4: fix callback restarting
    nfsd: break lease on unlink, link, and rename
    nfsd4: break lease on nfsd setattr
    nfsd: don't support msnfs export option
    nfsd4: initialize cb_per_client
    nfsd4: allow restarting callbacks
    nfsd4: simplify nfsd4_cb_prepare
    nfsd4: give out delegations more quickly in 4.1 case
    nfsd4: add helper function to run callbacks
    nfsd4: make sure sequence flags are set after destroy_session
    nfsd4: re-probe callback on connection loss
    nfsd4: set sequence flag when backchannel is down
    nfsd4: keep finer-grained callback status
    rpc: allow xprt_class->setup to return a preexisting xprt
    rpc: keep backchannel xprt as long as server connection
    rpc: move sk_bc_xprt to svc_xprt
    nfsd4: allow backchannel recovery
    nfsd4: support BIND_CONN_TO_SESSION
    nfsd4: modify session list under cl_lock
    Documentation: fl_mylease no longer exists
    ...

    Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The
    vfs-scale work touched some msnfs cases, and this merge removes support
    for that entirely, so the conflict was trivial to resolve.

    Linus Torvalds
     
  • Ensure a new callback is added to the client's list of callbacks at most
    once.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
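
One common way to guarantee "at most once" is a per-callback flag checked before linking. The sketch below assumes that approach; the commit message does not say how the kernel does it, and `model_cb`, `model_client`, and `model_add_callback()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_cb {
    bool queued;              /* already on a client's list? */
    struct model_cb *next;
};

struct model_client {
    struct model_cb *callbacks;  /* singly linked list of callbacks */
    unsigned int count;
};

/* Add 'cb' to the client's list unless it is already there. */
bool model_add_callback(struct model_client *clp, struct model_cb *cb)
{
    if (cb->queued)
        return false;         /* second add is a no-op */

    cb->queued = true;
    cb->next = clp->callbacks;
    clp->callbacks = cb;
    clp->count++;
    return true;
}
```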