17 Dec, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A comparatively quieter cycle for nfsd this time, but still with two
    larger changes:

    - RPC server scalability improvements from Jeff Layton (using RCU
    instead of a spinlock to find idle threads).

    - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
    Schumaker, enabling fallocate on new clients"

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd4: fix xdr4 count of server in fs_location4
    nfsd4: fix xdr4 inclusion of escaped char
    sunrpc/cache: convert to use string_escape_str()
    sunrpc: only call test_bit once in svc_xprt_received
    fs: nfsd: Fix signedness bug in compare_blob
    sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
    sunrpc: convert to lockless lookup of queued server threads
    sunrpc: fix potential races in pool_stats collection
    sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
    sunrpc: require svc_create callers to pass in meaningful shutdown routine
    sunrpc: have svc_wake_up only deal with pool 0
    sunrpc: convert sp_task_pending flag to use atomic bitops
    sunrpc: move rq_cachetype field to better optimize space
    sunrpc: move rq_splice_ok flag into rq_flags
    sunrpc: move rq_dropme flag into rq_flags
    sunrpc: move rq_usedeferral flag to rq_flags
    sunrpc: move rq_local field to rq_flags
    sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
    nfsd: minor off by one checks in __write_versions()
    sunrpc: release svc_pool_map reference when serv allocation fails
    ...

    Linus Torvalds
     

11 Dec, 2014

1 commit

  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. That allows us to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     

10 Dec, 2014

2 commits


25 Nov, 2014

1 commit


20 Nov, 2014

1 commit


07 Nov, 2014

1 commit

  • When lockd can't talk to a remote statd, it'll spew a warning message
    to the ring buffer. If the application is really hammering on locks
    however, it's possible for that message to spam the logs. Ratelimit it
    to minimize the potential for harm.

    Reported-by: Ian Collier
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
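
The rate limiting described above can be sketched in userspace C. This is only an illustration of the windowed "burst per interval" idea, not the kernel's implementation: the struct, the function names, the message text, and the explicit `now` parameter are all assumptions made so the example is self-contained.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a ratelimit state (not the kernel API). */
struct ratelimit {
    long interval;  /* window length, in seconds */
    int  burst;     /* messages allowed per window */
    long begin;     /* start of the current window */
    int  printed;   /* messages emitted in the current window */
    int  missed;    /* messages suppressed in the current window */
};

/* Returns true if a message may be emitted at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
    if (now - rl->begin >= rl->interval) {
        /* A new window starts: reset the counters. */
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/* Emit the warning only when the limiter allows it; returns 1 if printed. */
static int warn_statd_unreachable(struct ratelimit *rl, long now, const char *host)
{
    if (!ratelimit_ok(rl, now))
        return 0;
    fprintf(stderr, "lockd: cannot monitor %s\n", host);  /* illustrative text */
    return 1;
}
```

With `interval = 5` and `burst = 2`, an application hammering on locks produces at most two warnings per five-second window, however many failures occur.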
     

12 Oct, 2014

1 commit

  • Pull file locking related changes from Jeff Layton:
    "This release is a little more busy for file locking changes than the
    last:

    - a set of patches from Kinglong Mee to fix the lockowner handling in
    knfsd
    - a pile of cleanups to the internal file lease API. This should get
    us a bit closer to allowing for setlease methods that can block.

    There are some dependencies between mine and Bruce's trees this cycle,
    and I based my tree on top of the requisite patches in Bruce's tree"

    * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
    locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
    locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
    locks: set fl_owner for leases to filp instead of current->files
    locks: give lm_break a return value
    locks: __break_lease cleanup in preparation of allowing direct removal of leases
    locks: remove i_have_this_lease check from __break_lease
    locks: move freeing of leases outside of i_lock
    locks: move i_lock acquisition into generic_*_lease handlers
    locks: define a lm_setup handler for leases
    locks: plumb a "priv" pointer into the setlease routines
    nfsd: don't keep a pointer to the lease in nfs4_file
    locks: clean up vfs_setlease kerneldoc comments
    locks: generic_delete_lease doesn't need a file_lock at all
    nfsd: fix potential lease memory leak in nfs4_setlease
    locks: close potential race in lease_get_mtime
    security: make security_file_set_fowner, f_setown and __f_setown void return
    locks: consolidate "nolease" routines
    locks: remove lock_may_read and lock_may_write
    lockd: rip out deferred lock handling from testlock codepath
    NFSD: Get reference of lockowner when coping file_lock
    ...

    Linus Torvalds
     

09 Oct, 2014

2 commits

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - support the NFSv4.2 SEEK operation (allowing clients to support
    SEEK_HOLE/SEEK_DATA), thanks to Anna.
    - end the grace period early in a number of cases, mitigating a
    long-standing annoyance, thanks to Jeff
    - improve SMP scalability, thanks to Trond"

    * 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
    nfsd: eliminate "to_delegation" define
    NFSD: Implement SEEK
    NFSD: Add generic v4.2 infrastructure
    svcrdma: advertise the correct max payload
    nfsd: introduce nfsd4_callback_ops
    nfsd: split nfsd4_callback initialization and use
    nfsd: introduce a generic nfsd4_cb
    nfsd: remove nfsd4_callback.cb_op
    nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
    nfsd: fix nfsd4_cb_recall_done error handling
    nfsd4: clarify how grace period ends
    nfsd4: stop grace_time update at end of grace period
    nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
    nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
    nfsd: serialize nfsdcltrack upcalls for a particular client
    nfsd: pass extra info in env vars to upcalls to allow for early grace period end
    nfsd: add a v4_end_grace file to /proc/fs/nfsd
    lockd: add a /proc/fs/lockd/nlm_end_grace file
    nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
    nfsd: remove redundant boot_time parm from grace_done client tracking op
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable fixes:
    - fix an NFSv4.1 state renewal regression
    - fix open/lock state recovery error handling
    - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
    - fix statd when reconnection fails
    - don't wake tasks during connection abort
    - don't start reboot recovery if lease check fails
    - fix duplicate proc entries

    Features:
    - pNFS block driver fixes and clean ups from Christoph
    - More code cleanups from Anna
    - Improve mmap() writeback performance
    - Replace use of PF_TRANS with a more generic mechanism for avoiding
    deadlocks in nfs_release_page"

    * tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits)
    NFSv4.1: Fix an NFSv4.1 state renewal regression
    NFSv4: fix open/lock state recovery error handling
    NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
    NFS: Fabricate fscache server index key correctly
    SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
    NFSv3: Fix missing includes of nfs3_fs.h
    NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
    NFS: avoid waiting at all in nfs_release_page when congested.
    NFS: avoid deadlocks with loop-back mounted NFS filesystems.
    MM: export page_wakeup functions
    SCHED: add some "wait..on_bit...timeout()" interfaces.
    NFS: don't use STABLE writes during writeback.
    NFSv4: use exponential retry on NFS4ERR_DELAY for async requests.
    rpc: Add -EPERM processing for xs_udp_send_request()
    rpc: return sent and err from xs_sendpages()
    lockd: Try to reconnect if statd has moved
    SUNRPC: Don't wake tasks during connection abort
    Fixing lease renewal
    nfs: fix duplicate proc entries
    pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe
    ...

    Linus Torvalds
     

25 Sep, 2014

1 commit


18 Sep, 2014

2 commits

  • Add a new procfile that will allow a (privileged) userland process to
    end the NLM grace period early. The basic idea here will be to have
    sm-notify write to this file, if it sent out no NOTIFY requests when
    it runs. In that situation, we can generally expect that there will be
    no reclaim requests so the grace period can be lifted early.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Currently, all of the grace period handling is part of lockd. Eventually
    though we'd like to be able to build v4-only servers, at which point
    we'll need to put all of this elsewhere.

    Move the code itself into fs/nfs_common and have it build a grace.ko
    module. Then, rejigger the Kconfig options so that both nfsd and lockd
    enable it automatically.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

10 Sep, 2014

3 commits

  • As Kinglong points out, the nlm_block->b_fl field is no longer used at
    all. Also, vfs_test_lock in the generic locking code will only return
    FILE_LOCK_DEFERRED if FL_SLEEP is set, and it isn't here.

    The only other place that returns that value is the DLM lock code, but
    it only does that in dlm_posix_lock, never in dlm_posix_get.

    Remove all of the deferred locking code from the testlock codepath
    since it doesn't appear to ever be used anyway.

    I do have a small concern that this might cause a behavior change in the
    case where you have a block already sitting on the list when the
    testlock request comes in, but that looks like it doesn't really work
    properly anyway. I think it's best to just pass that down to
    vfs_test_lock and let the filesystem report that instead of trying to
    infer what's going on with the lock by looking at an existing block.

    Cc: cluster-devel@redhat.com
    Signed-off-by: Jeff Layton
    Reviewed-by: Kinglong Mee

    Jeff Layton
     
  • Commit d5b9026a67 ([PATCH] knfsd: locks: flag NFSv4-owned locks) uses the
    fl_lmops field in file_lock to check for an nfsd4 lockowner.

    But commit 1a747ee0cc (locks: don't call ->copy_lock methods on return
    of conflicting locks) causes the fl_lmops of a conflock to always be NULL.

    Also, commit 0996905f93 (lockd: posix_test_lock() should not call
    locks_copy_lock()) causes the fl_lmops of a conflock to always be NULL too.

    Make sure the private information is copied by fl_copy_lock() in struct
    file_lock_operations, and merge __locks_copy_lock() into fl_copy_lock().

    Jeff's advice: "Set fl_lmops on conflocks, but don't set fl_ops.
    fl_ops are superfluous, since they are callbacks into the filesystem.
    There should be no need to bother the filesystem at all with info
    in a conflock. But, lock _ownership_ matters for conflocks and that's
    indicated by the fl_lmops. So you really do want to copy the fl_lmops
    for conflocks I think."

    v5: add missing calling of locks_release_private() in nlmsvc_testlock()
    v4: only copy fl_lmops for conflock, don't copy fl_ops

    Signed-off-by: Kinglong Mee
    Signed-off-by: Jeff Layton

    Kinglong Mee
     
  • This argument is always NULL so don't pass it around.

    [jlayton: remove dependencies on previous patches in series]

    Signed-off-by: Joe Perches
    Signed-off-by: Jeff Layton

    Joe Perches
     

09 Sep, 2014

1 commit

  • Nikita Yushchenko reported that booting a kernel with init=/bin/sh and
    then mounting NFS with a busybox mount, without portmap or rpcbind
    running, resulted in:

    # mount -t nfs 10.30.130.21:/opt /mnt
    svc: failed to register lockdv1 RPC service (errno 111).
    lockd_up: makesock failed, error=-111
    Unable to handle kernel paging request for data at address 0x00000030
    Faulting instruction address: 0xc055e65c
    Oops: Kernel access of bad area, sig: 11 [#1]
    MPC85xx CDS
    Modules linked in:
    CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
    task: cf29cea0 ti: cf35c000 task.ti: cf35c000
    NIP: c055e65c LR: c0566490 CTR: c055e648
    REGS: cf35dad0 TRAP: 0300 Not tainted (3.10.44.cge)
    MSR: 00029000 CR: 22442488 XER: 20000000
    DEAR: 00000030, ESR: 00000000

    GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
    00000000
    GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
    10090ae8
    GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
    bfa46ef0
    GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
    cf0ded80
    NIP [c055e65c] call_start+0x14/0x34
    LR [c0566490] __rpc_execute+0x70/0x250
    Call Trace:
    [cf35db80] [00000080] 0x80 (unreliable)
    [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
    [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
    [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
    [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
    [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
    [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
    [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
    [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
    [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
    [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
    [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
    [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
    [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
    [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
    [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
    [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
    [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
    [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c

    The addition of svc_shutdown_net() resulted in two calls to
    svc_rpcb_cleanup(); the second is no longer necessary and crashes when
    it calls rpcb_register_call with clnt=NULL.

    Reported-by: Nikita Yushchenko
    Fixes: 679b033df484 "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
    Cc: stable@vger.kernel.org
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

03 Sep, 2014

1 commit


18 Aug, 2014

1 commit


24 Jul, 2014

1 commit


11 Jun, 2014

1 commit

  • Pull nfsd updates from Bruce Fields:
    "The largest piece is a long-overdue rewrite of the xdr code to remove
    some annoying limitations: for example, there was no way to return
    ACLs larger than 4K, and readdir results were returned only in 4k
    chunks, limiting performance on large directories.

    Also:
    - part of Neil Brown's work to make NFS work reliably over the
    loopback interface (so client and server can run on the same
    machine without deadlocks). The rest of it is coming through
    other trees.
    - cleanup and bugfixes for some of the server RDMA code, from
    Steve Wise.
    - Various cleanup of NFSv4 state code in preparation for an
    overhaul of the locking, from Jeff, Trond, and Benny.
    - smaller bugfixes and cleanup from Christoph Hellwig and
    Kinglong Mee.

    Thanks to everyone!

    This summer looks likely to be busier than usual for knfsd. Hopefully
    we won't break it too badly; testing definitely welcomed"

    * 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits)
    nfsd4: fix FREE_STATEID lockowner leak
    svcrdma: Fence LOCAL_INV work requests
    svcrdma: refactor marshalling logic
    nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry
    nfs4: remove unused CHANGE_SECURITY_LABEL
    nfsd4: kill READ64
    nfsd4: kill READ32
    nfsd4: simplify server xdr->next_page use
    nfsd4: hash deleg stateid only on successful nfs4_set_delegation
    nfsd4: rename recall_lock to state_lock
    nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound
    nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open
    nfsd4: use recall_lock for delegation hashing
    nfsd: fix laundromat next-run-time calculation
    nfsd: make nfsd4_encode_fattr static
    SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
    nfsd: remove unused function nfsd_read_file
    nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer
    NFSD: Error out when getting more than one fsloc/secinfo/uuid
    NFSD: Using type of uint32_t for ex_nflavors instead of int
    ...

    Linus Torvalds
     

07 Jun, 2014

1 commit


07 May, 2014

3 commits


28 Mar, 2014

1 commit

  • We had a Fedora ABRT report with a stack trace like this:

    kernel BUG at net/sunrpc/svc.c:550!
    invalid opcode: 0000 [#1] SMP
    [...]
    CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
    Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
    task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
    RIP: 0010:[] [] svc_destroy+0x128/0x130 [sunrpc]
    RSP: 0018:ffff88003f9b9de0 EFLAGS: 00010206
    RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
    RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
    RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
    R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
    R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
    FS: 00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
    Stack:
    ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
    ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
    ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
    Call Trace:
    [] lockd_up+0xaa/0x330 [lockd]
    [] nfsd_svc+0x1b5/0x2f0 [nfsd]
    [] ? simple_strtoull+0x2c/0x50
    [] ? write_pool_threads+0x280/0x280 [nfsd]
    [] write_threads+0x8b/0xf0 [nfsd]
    [] ? __get_free_pages+0x14/0x50
    [] ? get_zeroed_page+0x16/0x20
    [] ? simple_transaction_get+0xb1/0xd0
    [] nfsctl_transaction_write+0x48/0x80 [nfsd]
    [] vfs_write+0xb4/0x1f0
    [] ? putname+0x29/0x40
    [] SyS_write+0x49/0xa0
    [] ? __audit_syscall_exit+0x1f6/0x2a0
    [] system_call_fastpath+0x16/0x1b
    Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
    RIP [] svc_destroy+0x128/0x130 [sunrpc]
    RSP

    Evidently, we created some lockd sockets and then failed to create
    others. make_socks then returned an error and we tried to tear down the
    svc, but svc->sv_permsocks was not empty so we ended up tripping over
    the BUG() in svc_destroy().

    Fix this by ensuring that we tear down any live sockets we created when
    socket creation is going to return an error.

    Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...)
    Reported-by: Raphos
    Signed-off-by: Jeff Layton
    Reviewed-by: Stanislav Kinsbursky
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

14 Feb, 2014

1 commit

  • If an NFS client attempts to get a lock (using NLM) and the lock is
    not available, the server will remember the request and when the lock
    becomes available it will send a GRANT request to the client to
    provide the lock.

    If the client already held an adjacent lock, the GRANT callback will
    report the union of the existing and new locks, which can confuse the
    client.

    This happens because __posix_lock_file (called by vfs_lock_file)
    updates the passed-in file_lock structure when adjacent or
    overlapping locks are found.

    To avoid this problem we take a copy of the two fields that can
    be changed (fl_start and fl_end) before the call and restore them
    afterwards.
    An alternative would be to allocate a 'struct file_lock', initialise it,
    use locks_copy_lock() to take a copy, then locks_release_private()
    after the vfs_lock_file() call. But that is a lot more work.

    Reported-by: Olaf Kirch
    Signed-off-by: NeilBrown
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    --
    v1 had a couple of issues (large on-stack struct and didn't really work properly).
    This version is much better tested.
    Signed-off-by: J. Bruce Fields

    NeilBrown
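
The save-and-restore approach described above can be sketched as follows. This is a userspace mock, not the lockd code: `struct file_lock_sketch`, `mock_lock_file()`, and `lock_preserving_range()` are hypothetical names, and the mock merely imitates how a lock call may widen the passed-in range when an adjacent or overlapping lock is already held.

```c
/* Hypothetical, reduced stand-in for the two fields the commit mentions. */
struct file_lock_sketch {
    long long fl_start;
    long long fl_end;
};

/*
 * Mock lock call: if the request touches or overlaps the already held range,
 * the request structure is rewritten to the union of both ranges.
 * Assumes finite ranges, so fl_end + 1 cannot overflow.
 */
static int mock_lock_file(struct file_lock_sketch *fl,
                          const struct file_lock_sketch *held)
{
    if (fl->fl_start <= held->fl_end + 1 && held->fl_start <= fl->fl_end + 1) {
        if (held->fl_start < fl->fl_start)
            fl->fl_start = held->fl_start;
        if (held->fl_end > fl->fl_end)
            fl->fl_end = held->fl_end;
    }
    return 0;
}

/* Copy the two mutable fields before the call and restore them afterwards. */
static int lock_preserving_range(struct file_lock_sketch *fl,
                                 const struct file_lock_sketch *held)
{
    long long start = fl->fl_start;
    long long end = fl->fl_end;
    int err = mock_lock_file(fl, held);

    fl->fl_start = start;
    fl->fl_end = end;
    return err;
}
```

A later GRANT built from `fl` then reports only the range the client asked for, rather than the merged one.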
     

06 Aug, 2013

1 commit

  • Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
    which case we're in entirely the wrong namespace.

    Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
    exit_task_namespaces() outside of exit_notify()) now means that
    exit_task_work() is called after exit_task_namespaces(), which
    triggers an Oops when we're freeing up the locks.

    Fix this by ensuring that we initialise the nlm_host's rpc_client at mount
    time, so that the cl_nodename field is initialised to the value of
    utsname()->nodename that the net namespace uses. Then replace the
    lockd callers of utsname()->nodename.

    Signed-off-by: Trond Myklebust
    Cc: Toralf Förster
    Cc: Oleg Nesterov
    Cc: Nix
    Cc: Jeff Layton
    Cc: stable@vger.kernel.org # 3.10.x

    Trond Myklebust
     

18 Jul, 2013

1 commit


12 Jul, 2013

1 commit

  • In nlmsvc_retry_blocked, the check that the list is non-empty and acquiring
    the pointer of the first entry is unprotected by any lock. This allows a rare
    race condition when there is only one entry on the list. A function such as
    nlmsvc_grant_callback() can be called, which will temporarily remove the entry
    from the list. Between the list_empty() and list_entry(), the list may become
    empty, causing an invalid pointer to be used as an nlm_block, leading to a
    possible crash.

    This patch adds the nlm_block_lock around these calls to prevent concurrent
    use of the nlm_blocked list.

    This was a regression introduced by
    f904be9cc77f361d37d71468b13ff3d1a1823dea "lockd: Mostly remove BKL from
    the server".

    Cc: Bryan Schumaker
    Cc: stable@vger.kernel.org
    Signed-off-by: David Jeffery
    Signed-off-by: J. Bruce Fields

    David Jeffery
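
The idea of doing the emptiness check and the first-entry fetch under one lock can be sketched in userspace C. The names below (`blocked_entry`, `blocked_lock`, `blocked_take_first()`) are hypothetical, a pthread mutex stands in for the kernel lock, and the sketch removes the entry while holding the lock to stay simple, which is an assumption rather than what the patch itself does.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singly linked list of blocked requests. */
struct blocked_entry {
    struct blocked_entry *next;
    int id;
};

static struct blocked_entry *blocked_head;
static pthread_mutex_t blocked_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an entry at the front of the list. */
static void blocked_add(struct blocked_entry *e)
{
    pthread_mutex_lock(&blocked_lock);
    e->next = blocked_head;
    blocked_head = e;
    pthread_mutex_unlock(&blocked_lock);
}

/*
 * Check for emptiness and fetch the first entry in one critical section,
 * so no other thread can empty the list between the two steps.
 * Returns NULL when the list is empty.
 */
static struct blocked_entry *blocked_take_first(void)
{
    struct blocked_entry *e;

    pthread_mutex_lock(&blocked_lock);
    e = blocked_head;
    if (e) {
        blocked_head = e->next;
        e->next = NULL;
    }
    pthread_mutex_unlock(&blocked_lock);
    return e;
}
```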
     

04 Jul, 2013

1 commit


29 Jun, 2013

3 commits

  • Currently, the hashing that the locking code uses to add these values
    to the blocked_hash is simply calculated using fl_owner field. That's
    valid in most cases except for server-side lockd, which validates the
    owner of a lock based on fl_owner and fl_pid.

    In the case where you have a small number of NFS clients doing a lot
    of locking between different processes, you could end up with all
    the blocked requests sitting in a very small number of hash buckets.

    Add a new lm_owner_key operation to the lock_manager_operations that
    will generate an unsigned long to use as the key in the hashtable.
    That function is only implemented for server-side lockd, and simply
    XORs the fl_owner and fl_pid.

    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
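
A minimal sketch of the key calculation described above, assuming a reduced lock structure: `struct lock_sketch`, `owner_key()`, and `bucket_of()` are hypothetical names, and only the XOR of fl_owner and fl_pid comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in holding only the two fields being hashed. */
struct lock_sketch {
    const void  *fl_owner;
    unsigned int fl_pid;
};

/* Key used for hashing: XOR of the owner pointer and the pid. */
static unsigned long owner_key(const struct lock_sketch *fl)
{
    return (unsigned long)(uintptr_t)fl->fl_owner ^ (unsigned long)fl->fl_pid;
}

/* Illustrative bucket selection; the real hashtable uses its own hash function. */
static unsigned int bucket_of(unsigned long key, unsigned int nbuckets)
{
    return (unsigned int)(key % nbuckets);
}
```

Two requests from the same owner but different pids now get different keys, so they no longer pile up in a single bucket the way a key derived from fl_owner alone would make them.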
     
  • Having a global lock that protects all of this code is a clear
    scalability problem. Instead of doing that, move most of the code to be
    protected by the i_lock instead. The exceptions are the global lists
    that the ->fl_link sits on, and the ->fl_block list.

    ->fl_link is what connects these structures to the
    global lists, so we must ensure that we hold those locks when iterating
    over or updating these lists.

    Furthermore, sound deadlock detection requires that we hold the
    blocked_list state steady while checking for loops. We also must ensure
    that the search and update to the list are atomic.

    For the checking and insertion side of the blocked_list, push the
    acquisition of the global lock into __posix_lock_file and ensure that
    checking and update of the blocked_list is done without dropping the
    lock in between.

    On the removal side, when waking up blocked lock waiters, take the
    global lock before walking the blocked list and dequeue the waiters from
    the global list prior to removal from the fl_block list.

    With this, deadlock detection should be race free while we minimize
    excessive file_lock_lock thrashing.

    Finally, in order to avoid a lock inversion problem when handling
    /proc/locks output we must ensure that manipulations of the fl_block
    list are also protected by the file_lock_lock.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

22 Apr, 2013

1 commit

  • After a server reboot, the reclaimer thread will recover all the existing
    locks. For locks that are blocked, however, it will change the value
    of block->b_status to nlm_lck_denied_grace_period in order to signal that
    they need to wake up and resend the original blocking lock request.

    Due to a bug, however, the block->b_status never gets reset after the
    blocked locks have been woken up, and so the process goes into an
    infinite loop of resends until the blocked lock is satisfied.

    Reported-by: Marc Eshel
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     

01 Mar, 2013

1 commit

  • Pull nfsd changes from J Bruce Fields:
    "Miscellaneous bugfixes, plus:

    - An overhaul of the DRC cache by Jeff Layton. The main effect is
    just to make it larger. This decreases the chances of intermittent
    errors especially in the UDP case. But we'll need to watch for any
    reports of performance regressions.

    - Containerized nfsd: with some limitations, we now support
    per-container nfs-service, thanks to extensive work from Stanislav
    Kinsbursky over the last year."

    Some notes about conflicts, since there were *two* non-data semantic
    conflicts here:

    - idr_remove_all() had been added by a memory leak fix, but has since
    become deprecated since idr_destroy() does it for us now.

    - xs_local_connect() had been added by this branch to make AF_LOCAL
    connections be synchronous, but in the meantime Trond had changed the
    calling convention in order to avoid a RCU dereference.

    There were a couple of more obvious actual source-level conflicts due to
    the hlist traversal changes and one just due to code changes next to
    each other, but those were trivial.

    * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
    SUNRPC: make AF_LOCAL connect synchronous
    nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
    svcrpc: fix rpc server shutdown races
    svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    lockd: nlmclnt_reclaim(): avoid stack overflow
    nfsd: enable NFSv4 state in containers
    nfsd: disable usermode helper client tracker in container
    nfsd: use proper net while reading "exports" file
    nfsd: containerize NFSd filesystem
    nfsd: fix comments on nfsd_cache_lookup
    SUNRPC: move cache_detail->cache_request callback call to cache_read()
    SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
    SUNRPC: rework cache upcall logic
    SUNRPC: introduce cache_detail->cache_request callback
    NFS: simplify and clean cache library
    NFS: use SUNRPC cache creation and destruction helper for DNS cache
    nfsd4: free_stid can be static
    nfsd: keep a checksum of the first 256 bytes of request
    sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
    sunrpc: fix comment in struct xdr_buf definition
    ...

    Linus Torvalds
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were conceived
    differently from the list ones, which take:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
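
To illustrate the three-argument iterator shape this commit moves to, here is a userspace sketch. It is not the code from linux/list.h: `struct hnode`, `struct hhead`, `entry_of()`, and `for_each_entry()` are simplified, hypothetical names, and `__typeof__` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Simplified stand-ins for a hash-list node and head. */
struct hnode { struct hnode *next; };
struct hhead { struct hnode *first; };

/* Recover the containing structure from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define entry_or_null(ptr, type, member) \
    ((ptr) ? entry_of(ptr, type, member) : NULL)

/* Three-argument form: the cursor is the entry itself, with no node cursor. */
#define for_each_entry(pos, head, member)                                    \
    for (pos = entry_or_null((head)->first, __typeof__(*(pos)), member);     \
         pos;                                                                \
         pos = entry_or_null((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hnode link;
};

/* Walk the list using only the entry pointer as the loop variable. */
static int sum_items(struct hhead *head)
{
    struct item *it;
    int total = 0;

    for_each_entry(it, head, link)
        total += it->value;
    return total;
}
```

The caller declares a single `struct item *` cursor, which is what lets the hash-list iterator look the same as the plain list iterator.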
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


20 Feb, 2013

1 commit

  • Currently, nlmclnt_lock will break out of the for(;;) loop when
    the reclaimer wakes up the blocking lock thread by setting
    nlm_lck_denied_grace_period. This causes the lock request to fail
    with an ENOLCK error.
    The intention was always to ensure that we resend the lock request
    after the grace period has expired.

    Reported-by: Wangyuan Zhang
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
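
The intended behaviour, resending the request instead of failing when the reply says the server is in its grace period, can be sketched like this. Everything here is hypothetical (the status enum, the mock server, `lock_with_resend()`); the real client also waits between attempts, which the sketch leaves out.

```c
/* Hypothetical reply codes; only the grace-period case matters here. */
enum nlm_status_sketch { ST_GRANTED, ST_DENIED, ST_DENIED_GRACE_PERIOD };

/* Mock server that answers "grace period" a fixed number of times. */
struct mock_server {
    int grace_replies_left;
    int calls;
};

static enum nlm_status_sketch mock_send_lock(struct mock_server *srv)
{
    srv->calls++;
    if (srv->grace_replies_left > 0) {
        srv->grace_replies_left--;
        return ST_DENIED_GRACE_PERIOD;
    }
    return ST_GRANTED;
}

/* Returns 0 once the lock is granted, -1 on denial or when attempts run out. */
static int lock_with_resend(struct mock_server *srv, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        enum nlm_status_sketch st = mock_send_lock(srv);

        if (st == ST_DENIED_GRACE_PERIOD)
            continue;   /* resend rather than breaking out with an error */
        return st == ST_GRANTED ? 0 : -1;
    }
    return -1;
}
```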