04 Sep, 2013

1 commit


08 Aug, 2013

1 commit


09 Jul, 2013

1 commit

  • RFC 5661 allows a client to destroy a session using a compound
    associated with the destroyed session, as long as the DESTROY_SESSION op
    is the last op of the compound.

    We attempt to allow this, but testing against a Solaris client (which
    does destroy sessions in this way) showed that we were failing the
    DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
    count on the session (held by us) represented another rpc in progress
    over this session.

    Fix this by noting that in this case the expected reference count is 1,
    not 0.

    Also, note that as long as the session holds a reference to the compound
    we're destroying, we can't free it here--instead, delay the free till
    the final put in nfs4svc_encode_compoundres.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
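
The reference-count reasoning in the commit above can be sketched as follows. This is a minimal illustration, not the nfsd code: `struct session`, `session_busy`, and the `destroying_own_session` parameter are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a session; not the nfsd structure. */
struct session {
    int refcount;   /* references held by compounds in progress */
};

/*
 * Return true if some other RPC still appears to be using the session.
 * When DESTROY_SESSION arrives in a compound that itself runs over the
 * session being destroyed, that compound holds one reference, so the
 * expected idle count is 1 rather than 0.
 */
static bool session_busy(const struct session *ses, bool destroying_own_session)
{
    int expected = destroying_own_session ? 1 : 0;

    assert(ses != NULL);
    return ses->refcount > expected;
}
```

In this sketch a `true` result corresponds to the case where the server would answer NFS4ERR_DELAY.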
     

02 Jul, 2013

4 commits


15 May, 2013

2 commits


13 May, 2013

1 commit

  • This enables NFSv4.2 support for the server. To enable this
    code do the following:
    echo "+4.2" >/proc/fs/nfsd/versions

    after the nfsd kernel module is loaded.

    On its own this does nothing except allow the server to respond to
    compounds with minorversion set to 2. All the new NFSv4.2 features are
    optional, so this is perfectly legal.

    Signed-off-by: Steve Dickson
    Signed-off-by: J. Bruce Fields

    Steve Dickson
     

01 May, 2013

2 commits

  • If nfsd4_do_encode_secinfo() can't find GSS info that matches an
    export security flavor, it assumes the flavor is not a GSS
    pseudoflavor, and simply puts it on the wire.

    However, if this XDR encoding logic is given a legitimate GSS
    pseudoflavor but the RPC layer says it does not support that
    pseudoflavor for some reason, then the server leaks GSS pseudoflavor
    numbers onto the wire.

    I confirmed this happens by blacklisting rpcsec_gss_krb5, then
    attempted a client transition from the pseudo-fs to a Kerberos-only
    share. The client received a flavor list containing the Kerberos
    pseudoflavor numbers, rather than GSS tuples.

    The encoder logic can check that each pseudoflavor in flavs[] is
    less than MAXFLAVOR before writing it into the buffer, to prevent
    this. But after "nflavs" is written into the XDR buffer, the
    encoder can't skip writing flavor information into the buffer when
    it discovers the RPC layer doesn't support that flavor.

    So count the number of valid flavors as they are written into the
    XDR buffer, then write that count into a placeholder in the XDR
    buffer when all recognized flavors have been encoded.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
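
The "count as you go, then backfill" approach described in the commit above can be sketched like this. It is a simplified illustration under stated assumptions: `encode_flavors`, `put_be32`, and the cutoff `MAX_PLAIN_FLAVOR` (standing in for MAXFLAVOR) are hypothetical, and the sketch covers only the fallback path that writes bare flavor numbers, not the GSS tuple encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PLAIN_FLAVOR 8u  /* hypothetical cutoff; stands in for MAXFLAVOR */

/* Store one 32-bit value in XDR (big-endian) order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Encode a flavor list as <count><flavor>..., omitting flavors that are
 * not below the cutoff. Returns the number of bytes written, or 0 if buf
 * is too small for the worst case.
 */
static size_t encode_flavors(unsigned char *buf, size_t buflen,
                             const uint32_t *flavs, size_t nflavs)
{
    unsigned char *count_pos = buf;   /* placeholder for the count */
    unsigned char *p = buf + 4;
    uint32_t written = 0;
    size_t i;

    assert(buf != NULL && flavs != NULL);
    if (buflen < 4 + 4 * nflavs)
        return 0;

    for (i = 0; i < nflavs; i++) {
        if (flavs[i] >= MAX_PLAIN_FLAVOR)
            continue;                 /* unrecognized pseudoflavor: omit */
        put_be32(p, flavs[i]);
        p += 4;
        written++;
    }
    put_be32(count_pos, written);     /* backfill the real count */
    return (size_t)(p - buf);
}
```

Writing the count last avoids the problem the commit message describes: once a count has been committed to the buffer, an entry can no longer be skipped without making the count wrong.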
     
  • Clean up.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

30 Apr, 2013

1 commit


24 Apr, 2013

1 commit

  • The seconds field of an nfstime4 structure is 64-bit, but we are assuming
    that the first 32 bits are zero-filled. So if the client tries to set
    atime to a value before the epoch (touch -t 196001010101), then the
    server will save the wrong value on disk.

    Signed-off-by: Bryan Schumaker
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    Bryan Schumaker
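
A minimal sketch of decoding the full 64-bit seconds value, so that pre-epoch times survive, might look like the following. The helper names `get_be32` and `decode_nfstime_seconds` are hypothetical and this is not the nfsd decoder; it only shows why both XDR words have to be combined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the signed 64-bit seconds field from its two XDR words.
 * Combining both words keeps negative (pre-epoch) values intact;
 * treating the first word as zero would turn them into large
 * positive values. The final cast assumes a two's-complement target.
 */
static int64_t decode_nfstime_seconds(const unsigned char *p)
{
    uint64_t hi, lo;

    assert(p != NULL);
    hi = get_be32(p);
    lo = get_be32(p + 4);
    return (int64_t)((hi << 32) | lo);
}
```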
     

17 Apr, 2013

1 commit


08 Apr, 2013

1 commit

  • Closed stateids are kept around a little while to handle close replays
    in the 4.0 case. So we stash them in the last-used stateid in the
    oo_last_closed_stateid field of the open owner. We can free that in
    encode_seqid_op_tail once the seqid on the open owner is next
    incremented. But we don't want to do that on the close itself; so we
    set the NFS4_OO_PURGE_CLOSE flag on the open owner, skip freeing it the
    first time through encode_seqid_op_tail, then when we see that flag set
    next time we free it.

    This is unnecessarily baroque.

    Instead, just move the logic that increments the seqid out of the xdr
    code and into the operation code itself.

    The justification given for the current placement is that we need to
    wait till the last minute to be sure we know whether the status is a
    sequence-id-mutating error or not, but examination of the code shows
    that can't actually happen.

    Reported-by: Yanchuan Nian
    Tested-by: Yanchuan Nian
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

03 Apr, 2013

3 commits

  • When a setclientid_confirm or create_session confirms a client after a
    client reboot, it also destroys any previous state held by that client.

    The shutdown of that previous state must be careful not to free the
    client out from under threads processing other requests that refer to
    the client.

    This is a particular problem in the NFSv4.1 case when we hold a
    reference to a session (hence a client) throughout compound processing.

    The server attempts to handle this by unhashing the client at the time
    it's destroyed, then delaying the final free to the end. But this still
    leaves some races in the current code.

    I believe it's simpler just to fail the attempt to destroy the client by
    returning NFS4ERR_DELAY. This is a case that should never happen
    anyway.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Dropping the session's reference count after the client's means we leave
    a window where the session's se_client pointer is NULL. An xpt_user
    callback that encounters such a session may then crash:

    [ 303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
    [ 303.959061] IP: [] _raw_spin_lock+0x1e/0x40
    [ 303.959061] PGD 37811067 PUD 3d498067 PMD 0
    [ 303.959061] Oops: 0002 [#8] PREEMPT SMP
    [ 303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
    [ 303.959061] CPU 0
    [ 303.959061] Pid: 264, comm: nfsd Tainted: G D 3.8.0-ARCH+ #156 Bochs Bochs
    [ 303.959061] RIP: 0010:[] [] _raw_spin_lock+0x1e/0x40
    [ 303.959061] RSP: 0018:ffff880037877dd8 EFLAGS: 00010202
    [ 303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
    [ 303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
    [ 303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
    [ 303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
    [ 303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
    [ 303.959061] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
    [ 303.959061] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
    [ 303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
    [ 303.959061] Stack:
    [ 303.959061] ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
    [ 303.959061] ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
    [ 303.959061] 0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
    [ 303.959061] Call Trace:
    [ 303.959061] [] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
    [ 303.959061] [] svc_delete_xprt+0x10f/0x180 [sunrpc]
    [ 303.959061] [] svc_recv+0xe6/0x580 [sunrpc]
    [ 303.959061] [] nfsd+0xb5/0x140 [nfsd]
    [ 303.959061] [] ? nfsd_destroy+0x90/0x90 [nfsd]
    [ 303.959061] [] kthread+0xc0/0xd0
    [ 303.959061] [] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
    [ 303.959061] [] ? kthread_freezable_should_stop+0x70/0x70
    [ 303.959061] [] ret_from_fork+0x7c/0xb0
    [ 303.959061] [] ? kthread_freezable_should_stop+0x70/0x70
    [ 303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
    [ 303.959061] RIP [] _raw_spin_lock+0x1e/0x40
    [ 303.959061] RSP
    [ 303.959061] CR2: 0000000000000318
    [ 304.001218] ---[ end trace 2d809cd4a7931f5a ]---
    [ 304.001903] note: nfsd[264] exited with preempt_count 2

    Reported-by: Bryan Schumaker
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • If a client sets an owner (or group_owner or acl) attribute on open for
    create, and the mapping of that owner to an id fails, then we return
    BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't
    shortcut the open processing in that case: we have to at least look up the
    owner so we can find the seqid to bump.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

30 Mar, 2013

1 commit


27 Mar, 2013

1 commit

  • Since we only enforce an upper bound, not a lower bound, a "negative"
    length can get through here.

    The symptom seen was a warning when we attempt a kmalloc with an
    excessive size.

    Reported-by: Toralf Förster
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
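
The missing lower bound described in the commit above can be illustrated with a small sketch. The limit `MAX_LEN` and the three helper functions are hypothetical; the point is only that a signed length passes an upper-bound-only check and then becomes a very large size once it is treated as unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEN 1024   /* hypothetical limit for illustration */

/* Problem pattern: a signed length is only compared against the maximum. */
static bool len_ok_upper_only(int32_t len)
{
    return len <= MAX_LEN;
}

/* Safer pattern: reject values below zero as well. */
static bool len_ok_both_bounds(int32_t len)
{
    return len >= 0 && len <= MAX_LEN;
}

/* What an allocator would see if the signed length were used as a size. */
static uint32_t as_alloc_size(int32_t len)
{
    return (uint32_t)len;
}
```

With these definitions, a length of -1 passes `len_ok_upper_only`, fails `len_ok_both_bounds`, and converts to 4294967295 in `as_alloc_size`, which is the kind of excessive allocation request the warning reported.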
     

01 Mar, 2013

1 commit

  • Pull nfsd changes from J Bruce Fields:
    "Miscellaneous bugfixes, plus:

    - An overhaul of the DRC cache by Jeff Layton. The main effect is
    just to make it larger. This decreases the chances of intermittent
    errors especially in the UDP case. But we'll need to watch for any
    reports of performance regressions.

    - Containerized nfsd: with some limitations, we now support
    per-container nfs-service, thanks to extensive work from Stanislav
    Kinsbursky over the last year."

    Some notes about conflicts, since there were *two* non-data semantic
    conflicts here:

    - idr_remove_all() had been added by a memory leak fix, but has since
    become deprecated since idr_destroy() does it for us now.

    - xs_local_connect() had been added by this branch to make AF_LOCAL
    connections be synchronous, but in the meantime Trond had changed the
    calling convention in order to avoid an RCU dereference.

    There were a couple of more obvious actual source-level conflicts due to
    the hlist traversal changes and one just due to code changes next to
    each other, but those were trivial.

    * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
    SUNRPC: make AF_LOCAL connect synchronous
    nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
    svcrpc: fix rpc server shutdown races
    svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    lockd: nlmclnt_reclaim(): avoid stack overflow
    nfsd: enable NFSv4 state in containers
    nfsd: disable usermode helper client tracker in container
    nfsd: use proper net while reading "exports" file
    nfsd: containerize NFSd filesystem
    nfsd: fix comments on nfsd_cache_lookup
    SUNRPC: move cache_detail->cache_request callback call to cache_read()
    SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
    SUNRPC: rework cache upcall logic
    SUNRPC: introduce cache_detail->cache_request callback
    NFS: simplify and clean cache library
    NFS: use SUNRPC cache creation and destruction helper for DNS cache
    nfsd4: free_stid can be static
    nfsd: keep a checksum of the first 256 bytes of request
    sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
    sunrpc: fix comment in struct xdr_buf definition
    ...

    Linus Torvalds
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit


13 Feb, 2013

2 commits

  • Change uid and gid in struct nfsd4_cb_sec to be of type kuid_t and
    kgid_t.

    In nfsd4_decode_cb_sec, when reading uids and gids off the wire, convert
    them to kuids and kgids, and if they don't convert to valid kuids or
    valid kgids, ignore RPC_AUTH_UNIX and don't fill in any of the fields.

    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • In struct nfs4_ace remove the member who and replace it with an
    anonymous union holding who_uid and who_gid, allowing typesafe
    storage of uids and gids.

    Add a helper pace_gt for sorting posix_acl_entries.

    In struct posix_user_ace_state, replace uid with a union
    of kuid_t uid and kgid_t gid.

    Remove all initializations of the deprecated posix_acl_entry
    e_id field, which is not present when user namespaces are enabled.

    Split find_uid into two functions find_uid and find_gid that work
    in a typesafe manner.

    In nfs4xdr update nfsd4_encode_fattr to deal with the changes
    in struct nfs4_ace.

    Rewrite nfsd4_encode_name to take a kuid_t and a kgid_t instead
    of a generic id and flag if it is a group or a uid. Replace
    the group flag with a test for a valid gid.

    Modify nfsd4_encode_user to take a kuid_t and call the modified
    nfsd4_encode_name.

    Modify nfsd4_encode_group to take a kgid_t and call the modified
    nfsd4_encode_name.

    Modify nfsd4_encode_aclname to take an ace instead of taking the
    fields of an ace broken out. This allows it to detect if the ace is
    for a user or a group and to pass the appropriate value while still
    being typesafe.

    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

24 Jan, 2013

1 commit

  • It seems slightly simpler to make nfsd4_encode_fattr rather than its
    callers responsible for advancing the write pointer on success.

    (Also: the count == 0 check in the verify case looks superfluous.
    Running out of buffer space is really the only reason fattr encoding
    should fail with eresource.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

18 Dec, 2012

2 commits


03 Dec, 2012

1 commit


28 Nov, 2012

2 commits


26 Nov, 2012

5 commits

  • Our server rejects compounds containing more than one write operation.
    It's unclear whether this is really permitted by the spec; with 4.0,
    it's possibly OK, with 4.1 (which has clearer limits on compound
    parameters), it's probably not OK. No client that we're aware of has
    ever done this, but in theory it could be useful.

    The source of the limitation: we need an array of iovecs to pass to the
    write operation. In the worst case that array of iovecs could have
    hundreds of elements (the maximum rwsize divided by the page size), so
    it's too big to put on the stack, or in each compound op. So we instead
    keep a single such array in the compound argument.

    We fill in that array at the time we decode the xdr operation.

    But we decode every op in the compound before executing any of them. So
    once we've used that array we can't decode another write.

    If we instead delay filling in that array till the time we actually
    perform the write, we can reuse it.

    Another option might be to switch to decoding compound ops one at a
    time. I considered doing that, but it has a number of other side
    effects, and I'd rather fix just this one problem for now.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • In preparation for moving some of this elsewhere.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • In preparation for moving some of it elsewhere.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The comment here is totally bogus:
    - OP_WRITE + 1 is RELEASE_LOCKOWNER. Maybe there was some older
    version of the spec in which that served as a sort of
    OP_ILLEGAL? No idea, but it's clearly wrong now.
    - In any case, I can't see that the spec says anything about
    what to do if the client sends us fewer ops than promised.
    It's clearly nutty client behavior, and we should do
    whatever's easiest: returning an xdr error (even though it
    won't be consistent with the error on the last op returned)
    seems fine to me.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

08 Nov, 2012

3 commits