13 Jul, 2017

1 commit

  • svcrdma needs 259 pages allocated to receive 1MB NFSv4.0 WRITE requests:

    - 1 page for the transport header and head iovec
    - 256 pages for the data payload
    - 1 page for the trailing GETATTR request (since NFSD XDR decoding
    does not look for a tail iovec, the GETATTR is stuck at the end
    of the rqstp->rq_arg.pages list)
    - 1 page for building the reply xdr_buf

    But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
    svc_alloc_arg never allocates that many pages. To address this:

    1. The final element of rq_pages always points to NULL. To
    accommodate up to 259 pages in rq_pages, add an extra element
    to rq_pages for the array termination sentinel.

    2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
    is calculated, so it can go up to 259. Bruce noted that the
    calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
    which might not always be true. I didn't change this assumption.

    3. Change the loop boundaries to allow 259 pages to be allocated.

    Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
    which is basically never taken. And there's no need to dump the
    stack here because svc_alloc_arg has only one caller.

    The NULL "array termination sentinel" is kept: no code appears to rely
    on it as a terminator, but nfsd_splice_actor() does need the 259th
    element to be initialized to *something*. So it may be possible to
    keep the array at 259 elements and drop that final NULL; we're being
    conservative for now.
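
    As a rough sketch (not the literal patch; the exact expressions and
    helper names here are assumptions), the sizing and the adjusted
    allocation loop look something like this:

    /* struct svc_rqst: one extra slot so the NULL sentinel still fits
     * after RPCSVC_MAXPAGES (259) pages */
    struct page *rq_pages[RPCSVC_MAXPAGES + 1];

    /* sketch of the adjusted svc_alloc_arg() sizing and loop */
    static int sketch_alloc_arg(struct svc_serv *serv, struct svc_rqst *rqstp)
    {
            unsigned int i, pages;

            /* mirrors how RPCSVC_MAXPAGES is derived; assumes sv_max_mesg
             * is a multiple of PAGE_SIZE, as noted above */
            pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
            if (pages > RPCSVC_MAXPAGES)
                    pages = RPCSVC_MAXPAGES;  /* clamp; no WARN/dump_stack */

            for (i = 0; i < pages; i++)
                    while (rqstp->rq_pages[i] == NULL) {
                            struct page *p = alloc_page(GFP_KERNEL);

                            if (!p) {       /* wait for memory, then retry */
                                    schedule_timeout_uninterruptible(
                                            msecs_to_jiffies(500));
                                    continue;
                            }
                            rqstp->rq_pages[i] = p;
                    }
            return pages;
    }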

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

21 Feb, 2017

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Implement wraparound-safe refcount_t and kref_t types based on
    generic atomic primitives (Peter Zijlstra)

    - Improve and fix the ww_mutex code (Nicolai Hähnle)

    - Add self-tests to the ww_mutex code (Chris Wilson)

    - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
    Bueso)

    - Micro-optimize the current-task logic all around the core kernel
    (Davidlohr Bueso)

    - Tidy up after recent optimizations: remove stale code and APIs,
    clean up the code (Waiman Long)

    - ... plus misc fixes, updates and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    fork: Fix task_struct alignment
    locking/spinlock/debug: Remove spinlock lockup detection code
    lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
    lkdtm: Convert to refcount_t testing
    kref: Implement 'struct kref' using refcount_t
    refcount_t: Introduce a special purpose refcount type
    sched/wake_q: Clarify queue reinit comment
    sched/wait, rcuwait: Fix typo in comment
    locking/mutex: Fix lockdep_assert_held() fail
    locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
    locking/rwsem: Reinit wake_q after use
    locking/rwsem: Remove unnecessary atomic_long_t casts
    jump_labels: Move header guard #endif down where it belongs
    locking/atomic, kref: Implement kref_put_lock()
    locking/ww_mutex: Turn off __must_check for now
    locking/atomic, kref: Avoid more abuse
    locking/atomic, kref: Use kref_get_unless_zero() more
    locking/atomic, kref: Kill kref_sub()
    locking/atomic, kref: Add kref_read()
    locking/atomic, kref: Add KREF_INIT()
    ...

    Linus Torvalds
     

14 Jan, 2017

1 commit

  • Since we need to change the implementation, stop exposing internals.

    Provide kref_read() to read the current reference count; typically
    used for debug messages.

    Kills two anti-patterns:

    atomic_read(&kref->refcount)
    kref->refcount.counter
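
    For illustration only (struct foo and foo_dump() are hypothetical):

    #include <linux/kref.h>
    #include <linux/printk.h>

    struct foo {
            struct kref ref;
    };

    static void foo_dump(struct foo *f)
    {
            /* was: atomic_read(&f->ref.refcount) or f->ref.refcount.counter */
            pr_debug("foo refcount: %u\n", kref_read(&f->ref));
    }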

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Jan, 2017

1 commit

  • The inet6addr_chain is an atomic notifier chain, so we can't call
    anything that might sleep (like lock_sock). So instead of closing the
    socket from svc_age_temp_xprts_now (which is called by the notifier
    function), just have the rpc service threads do it.
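
    A minimal sketch of that deferral pattern (illustrative, not the
    literal patch): from atomic context only mark the transport and wake a
    thread; the thread, running in process context, can safely take
    lock_sock() while closing the socket.

    #include <linux/sunrpc/svc_xprt.h>

    static void mark_temp_xprt_for_close(struct svc_xprt *xprt)
    {
            set_bit(XPT_CLOSE, &xprt->xpt_flags);   /* ok in atomic context */
            svc_xprt_enqueue(xprt);         /* a service thread reaps it later */
    }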

    Cc: stable@vger.kernel.org
    Fixes: c3d4879e01be "sunrpc: Add a function to close..."
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

14 Nov, 2016

1 commit

  • This fixes the following panic that can occur with NFSoRDMA.

    general protection fault: 0000 [#1] SMP
    Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
    scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
    scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
    mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
    ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
    irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
    lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
    ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
    auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
    crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
    syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
    tg3 crct10dif_pclmul drm crct10dif_common
    ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
    dm_log dm_mod
    CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
    Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
    Workqueue: events check_lifetime
    task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
    RIP: 0010:[] []
    _raw_spin_lock_bh+0x17/0x50
    RSP: 0018:ffff88031f587ba8 EFLAGS: 00010206
    RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
    RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
    R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
    R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff880322a40000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
    ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
    0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
    Call Trace:
    [] lock_sock_nested+0x20/0x50
    [] sock_setsockopt+0x78/0x940
    [] ? lock_timer_base.isra.33+0x2b/0x50
    [] kernel_setsockopt+0x4d/0x50
    [] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
    [] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
    [] notifier_call_chain+0x4c/0x70
    [] __blocking_notifier_call_chain+0x4d/0x70
    [] blocking_notifier_call_chain+0x16/0x20
    [] __inet_del_ifa+0x168/0x2d0
    [] check_lifetime+0x25f/0x270
    [] process_one_work+0x17b/0x470
    [] worker_thread+0x126/0x410
    [] ? rescuer_thread+0x460/0x460
    [] kthread+0xcf/0xe0
    [] ? kthread_create_on_node+0x140/0x140
    [] ret_from_fork+0x58/0x90
    [] ? kthread_create_on_node+0x140/0x140
    Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
    44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 0f
    c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
    RIP [] _raw_spin_lock_bh+0x17/0x50
    RSP

    Signed-off-by: Scott Mayhew
    Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
    Reviewed-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

14 Jul, 2016

4 commits


15 Jun, 2016

1 commit

  • The spec allows backchannels for multiple clients to share the same tcp
    connection. When that happens, we need to use the same xprt for all of
    them. Similarly, we need the same xps.

    This fixes list corruption introduced by the multipath code.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields
    Acked-by: Trond Myklebust

    J. Bruce Fields
     

23 May, 2016

1 commit


23 Dec, 2015

1 commit

  • Add a function svc_age_temp_xprts_now() to immediately close temporary
    transports whose xpt_local matches the address passed in server_addr,
    instead of waiting for them to be closed by the timer function.

    The function is intended to be used by notifier_blocks (to be added to
    nfsd and lockd) that will run when an IP address is deleted.

    This will eliminate the ACK storms and client hangs that occur in
    HA-NFS configurations where nfsd & lockd are left running on the cluster
    nodes all the time and the NFS 'service' is migrated back and forth
    within a short timeframe.
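
    A hedged sketch of how such a notifier_block might use it (my_serv and
    the NETDEV_DOWN filtering are placeholders, not the actual nfsd/lockd
    code):

    #include <linux/inetdevice.h>
    #include <linux/sunrpc/svc_xprt.h>

    static struct svc_serv *my_serv;        /* placeholder for the rpc service */

    static int my_inetaddr_event(struct notifier_block *nb,
                                 unsigned long event, void *ptr)
    {
            struct in_ifaddr *ifa = ptr;
            struct sockaddr_in sin;

            if (event != NETDEV_DOWN)
                    return NOTIFY_DONE;

            /* close temporary transports bound to the address going away */
            sin.sin_family = AF_INET;
            sin.sin_addr.s_addr = ifa->ifa_local;
            svc_age_temp_xprts_now(my_serv, (struct sockaddr *)&sin);
            return NOTIFY_DONE;
    }

    static struct notifier_block my_inetaddr_nb = {
            .notifier_call = my_inetaddr_event,
    };
    /* registered with register_inetaddr_notifier(&my_inetaddr_nb) */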

    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

11 Aug, 2015

1 commit


23 Jan, 2015

1 commit


10 Dec, 2014

10 commits

  • ...move the WARN_ON_ONCE inside the following if block since they use
    the same condition.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • These were useful when I was tracking down a race condition between
    svc_xprt_do_enqueue and svc_get_next_xprt.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Testing has shown that the pool->sp_lock can be a bottleneck on a busy
    server. Every time data is received on a socket, the server must take
    that lock in order to dequeue a thread from the sp_threads list.

    Address this problem by eliminating the sp_threads list (which contains
    threads that are currently idle) and replacing it with a RQ_BUSY flag in
    svc_rqst. This allows us to walk the sp_all_threads list under the
    rcu_read_lock and find a suitable thread for the xprt by doing a
    test_and_set_bit.

    Note, however, that we still have a potential atomicity problem with
    this approach. We don't want svc_xprt_do_enqueue to set the
    rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned
    zero (which indicates that the thread was idle). But, by the time we
    check that, the bit could be flipped by a waking thread.

    To address this, we acquire a new per-rqst spinlock (rq_lock) and take
    that before doing the test_and_set_bit. If that returns false, then we
    can set rq_xprt and drop the spinlock. Then, when the thread wakes up,
    it must set the bit under the same spinlock and can trust that if it was
    already set then the rq_xprt is also properly set.

    With this scheme, the case where we have an idle thread no longer needs
    to take the highly contended pool->sp_lock at all, and that removes the
    bottleneck.

    That still leaves one issue: What of the case where we walk the whole
    sp_all_threads list and don't find an idle thread? Because the search is
    lockless, it's possible for the queueing to race with a thread that is
    going to sleep. To address that, we queue the xprt and then search again.

    If we find an idle thread at that point, we can't attach the xprt to it
    directly since that might race with a different thread waking up and
    finding it. All we can do is wake the idle thread back up and let it
    attempt to find the now-queued xprt.
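
    A hedged sketch of the enqueue side described above (field and flag
    names follow the text; the real svc_xprt_do_enqueue differs in its
    details):

    static bool try_wake_idle_thread(struct svc_pool *pool,
                                     struct svc_xprt *xprt)
    {
            struct svc_rqst *rqstp;
            bool found = false;

            rcu_read_lock();
            list_for_each_entry_rcu(rqstp, &pool->sp_all_threads, rq_all) {
                    /* rq_lock closes the race with a thread that is waking
                     * up and about to mark itself busy */
                    spin_lock_bh(&rqstp->rq_lock);
                    if (test_and_set_bit(RQ_BUSY, &rqstp->rq_flags)) {
                            spin_unlock_bh(&rqstp->rq_lock);
                            continue;               /* already busy */
                    }
                    rqstp->rq_xprt = xprt;          /* thread was idle */
                    spin_unlock_bh(&rqstp->rq_lock);
                    wake_up_process(rqstp->rq_task);
                    found = true;
                    break;
            }
            rcu_read_unlock();
            return found;   /* false: queue the xprt and search once more */
    }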

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we'll be removing some spinlocking around the socket
    and thread queueing code in order to fix some contention problems. At
    that point, the stats counters will no longer be protected by the
    sp_lock.

    Change the counters to atomic_long_t fields, except for the
    "sockets_queued" counter which will still be manipulated under a
    spinlock.
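
    For example (assuming the threads_woken counter in svc_pool_stats),
    the conversion amounts to:

    /* before: bumped under pool->sp_lock */
    pool->sp_stats.threads_woken++;

    /* after: atomic_long_t, no lock required */
    atomic_long_inc(&pool->sp_stats.threads_woken);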

    Signed-off-by: Jeff Layton
    Tested-by: Chris Worley
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • The way that svc_wake_up works is a bit inefficient. It walks all of the
    available pools for a service and either wakes up a task in each one or
    sets the SP_TASK_PENDING flag in each one.

    When svc_wake_up is called, there is no need to wake up more than one
    thread to do this work. In practice, only lockd currently uses this
    function and it's single threaded anyway. Thus, this just boils down to
    doing a wake up of a thread in pool 0 or setting a single flag.

    Eliminate the for loop in this function and change it to just operate on
    pool 0. Also update the comments that sit above it and get rid of some
    code that has been commented out for years now.
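
    A minimal sketch of the single-pool shape described above (the real
    function differs in its details):

    void svc_wake_up(struct svc_serv *serv)
    {
            struct svc_pool *pool = &serv->sv_pools[0];     /* pool 0 only */
            struct svc_rqst *rqstp;

            spin_lock_bh(&pool->sp_lock);
            if (!list_empty(&pool->sp_threads)) {
                    rqstp = list_first_entry(&pool->sp_threads,
                                             struct svc_rqst, rq_list);
                    wake_up_process(rqstp->rq_task);        /* one wakeup */
            } else {
                    /* nobody idle: leave a note for the next free thread */
                    set_bit(SP_TASK_PENDING, &pool->sp_flags);
            }
            spin_unlock_bh(&pool->sp_lock);
    }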

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we'll want to be able to handle this flag without
    holding the sp_lock. Change this field to an unsigned long flags
    field, and declare a new flag in it that can be managed with atomic
    bitops.
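
    With sp_flags as an unsigned long, the flag becomes a standard atomic
    bit (a sketch of the two sides):

    /* producer (e.g. svc_wake_up when no thread is idle): */
    set_bit(SP_TASK_PENDING, &pool->sp_flags);

    /* consumer (a thread about to wait): check and clear without sp_lock */
    pending = test_and_clear_bit(SP_TASK_PENDING, &pool->sp_flags);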

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • In a later patch, we're going to need some atomic bit flags. Since that
    field will need to be an unsigned long, we mitigate that space
    consumption by migrating some other bitflags to the new field. Start
    with the rq_secure flag.
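
    Illustratively, call sites change from a dedicated field to a bit test:

    /* before: a dedicated field in struct svc_rqst */
    secure = rqstp->rq_secure;

    /* after: RQ_SECURE is a bit in the new unsigned long rq_flags word */
    secure = test_bit(RQ_SECURE, &rqstp->rq_flags);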

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in
    svc_rqst handling functions", which subsequent server rpc patches from
    jlayton depend on. I'm merging this later tag on the assumption that's
    more likely to be a tested and stable point.

    J. Bruce Fields
     

02 Dec, 2014

1 commit

  • All it does is indicate whether an xprt has already been deleted from
    a list or not, which is unnecessary since we use list_del_init and it's
    always set and checked under the sv_lock anyway.
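
    That works because list_del_init() leaves the entry re-initialized, so
    "already removed?" falls out of list_empty() under the same lock (a
    sketch):

    spin_lock_bh(&serv->sv_lock);
    if (!list_empty(&xprt->xpt_list))       /* still on a server-wide list? */
            list_del_init(&xprt->xpt_list); /* empty again once removed */
    spin_unlock_bh(&serv->sv_lock);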

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

25 Nov, 2014

1 commit


29 Aug, 2014

1 commit

  • current_task appears to be x86-only, oops.

    Let's just delete this check entirely:

    Any developer that adds a new user without setting rq_task will get a
    crash the first time they test it. I also don't think there are
    normally any important locks held here, and I can't see any other reason
    why killing a server thread would bring the whole box down.

    So the effort to fail gracefully here looks like overkill.

    Reported-by: Stephen Rothwell
    Fixes: 983c684466e0 "SUNRPC: get rid of the request wait queue"
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

18 Aug, 2014

5 commits


30 Jul, 2014

2 commits


31 May, 2014

1 commit


23 May, 2014

2 commits

  • If the accept() call fails, we need to put the module reference.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • An NFS/RDMA client's source port is meaningless for RDMA transports.
    The transport layer typically sets the source port value on the
    connection to a random ephemeral port.

    Currently, NFS server administrators must specify the "insecure"
    export option to enable clients to access exports via RDMA.

    But this means NFS clients can access such an export via IP using an
    ephemeral port, which may not be desirable.

    This patch eliminates the need to specify the "insecure" export
    option to allow NFS/RDMA clients access to an export.

    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

10 Feb, 2014

1 commit

  • Mark functions as static in net/sunrpc/svc_xprt.c because they are not
    used outside this file.

    This eliminates the following warning in net/sunrpc/svc_xprt.c:
    net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes]
    net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes]
    net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David S. Miller

    Rashika Kheria
     

17 Feb, 2013

2 commits

  • Rewrite server shutdown to remove the assumption that there are no
    longer any threads running (no longer true, for example, when shutting
    down the service in one network namespace while it's still running in
    others).

    Do that by doing what we'd do in normal circumstances: just CLOSE each
    socket, then enqueue it.

    Since there may not be threads to handle the resulting queued xprts,
    also run a simplified version of the svc_recv() loop run by a server to
    clean up any closed xprts afterwards.
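
    A hedged sketch of that shape (locking and error handling omitted; not
    the literal patch):

    static void close_and_enqueue_all(struct list_head *xprt_list)
    {
            struct svc_xprt *xprt;

            list_for_each_entry(xprt, xprt_list, xpt_list) {
                    set_bit(XPT_CLOSE, &xprt->xpt_flags);
                    svc_xprt_enqueue(xprt); /* normally a thread reaps it */
            }
    }

    /* simplified svc_recv() loop, run in case no threads are left */
    static void reap_queued_xprts(struct svc_serv *serv)
    {
            int i;

            for (i = 0; i < serv->sv_nrpools; i++) {
                    struct svc_pool *pool = &serv->sv_pools[i];

                    while (!list_empty(&pool->sp_sockets)) {
                            struct svc_xprt *xprt =
                                    list_first_entry(&pool->sp_sockets,
                                                     struct svc_xprt,
                                                     xpt_ready);
                            list_del_init(&xprt->xpt_ready);
                            svc_delete_xprt(xprt);  /* XPT_CLOSE is set */
                    }
            }
    }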

    Cc: stable@kernel.org
    Tested-by: Jason Tibbitts
    Tested-by: Paweł Sikora
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • svc_age_temp_xprts expires xprts in a two-step process: first it takes
    the sv_lock and moves the xprts to expire off their server-wide list
    (sv_tempsocks or sv_permsocks) to a local list. Then it drops the
    sv_lock and enqueues and puts each one.

    I see no reason for this: svc_xprt_enqueue() will take sp_lock, but the
    sv_lock and sp_lock are not otherwise nested anywhere (and documentation
    at the top of this file claims it's correct to nest these with sp_lock
    inside.)

    Cc: stable@kernel.org
    Tested-by: Jason Tibbitts
    Tested-by: Paweł Sikora
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields