12 Jun, 2020

2 commits

  • Pull NFS client updates from Anna Schumaker:
    "New features and improvements:
    - Sunrpc receive buffer sizes only change when establishing a GSS credential
    - Add more sunrpc tracepoints
    - Improve on tracepoints to capture internal NFS I/O errors

    Other bugfixes and cleanups:
    - Move a dprintk() to after a call to nfs_alloc_fattr()
    - Fix off-by-one issues in rpc_ntop6
    - Fix a few coccicheck warnings
    - Use the correct SPDX license identifiers
    - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
    - Replace zero-length array with flexible array
    - Remove duplicate headers
    - Set invalid blocks after NFSv4 writes to update space_used attribute
    - Fix direct WRITE throughput regression"

    * tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
    NFS: Fix direct WRITE throughput regression
    SUNRPC: rpc_xprt lifetime events should record xprt->state
    xprtrdma: Make xprt_rdma_slot_table_entries static
    nfs: set invalid blocks after NFSv4 writes
    NFS: remove redundant initialization of variable result
    sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
    NFS: Add a tracepoint in nfs_set_pgio_error()
    NFS: Trace short NFS READs
    NFS: nfs_xdr_status should record the procedure name
    SUNRPC: Set SOFTCONN when destroying GSS contexts
    SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
    SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
    SUNRPC: trace RPC client lifetime events
    SUNRPC: Trace transport lifetime events
    SUNRPC: Split the xdr_buf event class
    SUNRPC: Add tracepoint to rpc_call_rpcerror()
    SUNRPC: Update the RPC_SHOW_SOCKET() macro
    SUNRPC: Update the rpc_show_task_flags() macro
    SUNRPC: Trace GSS context lifetimes
    SUNRPC: receive buffer size estimation values almost never change
    ...

    Linus Torvalds
     
  • To help tie the recorded xdr_buf to a particular RPC transaction,
    the client side version of this class should display task ID
    information and the server side one should show the request's XID.

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

    Chuck Lever
     

21 May, 2020

1 commit

  • - Rename these so they are easy to enable and search for as a set
    - Move the tracepoints to get a more accurate sense of control flow
    - Tracepoints should not fire on xprt shutdown
    - Display memory address in case data structure had been corrupted
    - Abandon dprintk in these paths

    I haven't ever gotten one of these tracepoints to trigger. I wonder
    if we should simply remove them.

    Signed-off-by: Chuck Lever

    Chuck Lever
     

18 May, 2020

3 commits

  • In lieu of dprintks or tracepoints in each individual transport
    implementation, introduce tracepoints in the generic part of the RPC
    layer. These typically fire for connection lifetime events, so
    shouldn't contribute a lot of noise.

    Signed-off-by: Chuck Lever

    Chuck Lever
     
  • Capture transport creation failures.

    Signed-off-by: Chuck Lever

    Chuck Lever
     
  • It appears that the RPC/RDMA transport does not need serialization
    of calls to its xpo_sendto method. Move the mutex into the socket
    methods that still need that serialization.

    Tail latencies are unambiguously better with this patch applied.
    fio randrw 8KB 70/30 on NFSv3, smaller numbers are better:

    clat percentiles (usec):

    With xpt_mutex:
    r | 99.99th=[ 8848]
    w | 99.99th=[ 9634]

    Without xpt_mutex:
    r | 99.99th=[ 8586]
    w | 99.99th=[ 8979]

    Serializing the construction of RPC/RDMA transport headers is not
    really necessary at this point, because the Linux NFS server
    implementation never changes its credit grant on a connection. If
    that should change, then svc_rdma_sendto will need to serialize
    access to the transport's credit grant fields.

    Reported-by: kbuild test robot
    [ cel: fix uninitialized variable warning ]
    Signed-off-by: Chuck Lever

    Chuck Lever
     

18 Apr, 2020

2 commits

  • Utilize the xpo_release_rqst transport method to ensure that each
    rqstp's svc_rdma_recv_ctxt object is released even when the server
    cannot return a Reply for that rqstp.

    Without this fix, each RPC whose Reply cannot be sent leaks one
    svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
    Receive buffer, and any pages that might be part of the Reply
    message.

    The leak is infrequent unless the network fabric is unreliable or
    Kerberos is in use, as GSS sequence window overruns, which result
    in connection loss, are more common on fast transports.

    Fixes: 3a88092ee319 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
    Signed-off-by: Chuck Lever

    Chuck Lever
     
  • Currently, after the forward channel connection goes away,
    backchannel operations are causing soft lockups on the server
    because call_transmit_status's SOFTCONN logic ignores ENOTCONN.
    Such backchannel Calls are aggressively retried until the client
    reconnects.

    Backchannel Calls should use RPC_TASK_NOCONNECT rather than
    RPC_TASK_SOFTCONN. If there is no forward connection, the server is
    not capable of establishing a connection back to the client, thus
    that backchannel request should fail before the server attempts to
    send it. Commit 58255a4e3ce5 ("NFSD: NFSv4 callback client should
    use RPC_TASK_SOFTCONN") was merged several years before
    RPC_TASK_NOCONNECT was available.

    Because setup_callback_client() explicitly sets NOPING, the NFSv4.0
    callback connection depends on the first callback RPC to initiate
    a connection to the client. Thus NFSv4.0 needs to continue to use
    RPC_TASK_SOFTCONN.

    Suggested-by: Trond Myklebust
    Signed-off-by: Chuck Lever
    Cc: # v4.20+

    Chuck Lever
     

28 Mar, 2020

1 commit

  • 'maxlen' is the total size of the destination buffer. There is only one
    caller and this value is 256.

    When we compute the size already used and what we would like to add in
    the buffer, the trailing NUL character is not taken into account.
    However, this trailing character will be added by the 'strcat' once we
    have checked that we have enough room.

    So, there is an off-by-one issue and 1 byte of the stack could be
    erroneously overwritten.

    Take into account the trailing NUL when checking if there is enough
    room in the destination buffer.

    While at it, also replace a 'sprintf' by a safer 'snprintf', check for
    output truncation and avoid a superfluous 'strlen'.
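
    The length check described above can be sketched in userspace C. This
    is only an illustration, not the kernel function: the helper name
    append_entry_sketch(), its 80-byte scratch buffer, and the "name value"
    line format are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "name value\n" to buf. maxlen is the total size of buf and used
 * is the length of the string already stored there. Returns the new
 * length, or -1 if the entry does not fit. (Hypothetical helper.)
 */
int append_entry_sketch(char *buf, size_t maxlen, size_t used,
			const char *name, int value)
{
	char tmp[80];
	int n;

	assert(buf != NULL && used < maxlen);

	n = snprintf(tmp, sizeof(tmp), "%s %d\n", name, value);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;	/* output error or truncation */

	/* ">=" rather than ">" reserves one byte for the trailing NUL. */
	if (used + (size_t)n >= maxlen)
		return -1;

	memcpy(buf + used, tmp, (size_t)n + 1);	/* +1 copies the NUL */
	return (int)(used + (size_t)n);
}
```

    A six-character entry fits in a 7-byte buffer but is rejected for a
    6-byte one, because the NUL would land one byte past the end.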

    Fixes: dc9a16e49dbba ("svc: Add /proc/sys/sunrpc/transport files")
    Signed-off-by: Christophe JAILLET
    [ cel: very minor fix to documenting comment ]
    Signed-off-by: Chuck Lever

    Christophe JAILLET
     

17 Mar, 2020

1 commit


04 Jul, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

24 Apr, 2019

2 commits


07 Feb, 2019

3 commits

  • In the rpc server, when something happens that might be reason to wake
    up a thread to do something, what we do is

    - modify xpt_flags, sk_sock->flags, xpt_reserved, or
    xpt_nr_rqsts to indicate the new situation
    - call svc_xprt_enqueue() to decide whether to wake up a thread.

    svc_xprt_enqueue may require multiple conditions to be true before
    queueing up a thread to handle the xprt. In the SMP case, one of the
    other CPUs may have set another required condition, and in that case,
    although both CPUs run svc_xprt_enqueue(), it's possible that neither
    call sees the writes done by the other CPU in time, and neither one
    recognizes that all the required conditions have been set. A socket
    could therefore be ignored indefinitely.

    Add memory barriers to ensure that any svc_xprt_enqueue() call will
    always see the conditions changed by other CPUs before deciding to
    ignore a socket.

    I've never seen this race reported. In the unlikely event it happens,
    another event will usually come along and the problem will fix itself.
    So I don't think this is worth backporting to stable.

    Chuck tried this patch and said "I don't see any performance
    regressions, but my server has only a single last-level CPU cache."

    Tested-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The long name seemed cute till I wanted to refer to it somewhere else.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Use READ_ONCE() to tell the compiler to not optimise away the read of
    xprt->xpt_flags in svc_xprt_release_slot().

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

28 Dec, 2018

2 commits

  • _svc_create_xprt() returns positive port number
    so its non-zero return value is not an error

    Reviewed-by: Jeff Layton
    Signed-off-by: Vasily Averin
    Signed-off-by: J. Bruce Fields

    Vasily Averin
     
  • If a node has NFSv4.1+ mounts inside several net namespaces,
    it can lead to a use-after-free in svc_process_common()

    svc_process_common()
    /* Setup reply header */
    rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE

    svc_process_common() can use an incorrect rqstp->rq_xprt;
    its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
    The problem is that serv is a global structure but sv_bc_xprt
    is assigned per net namespace.

    According to Trond, the whole "let's set up rqstp->rq_xprt
    for the back channel" is nothing but a giant hack in order
    to work around the fact that svc_process_common() uses it
    to find the xpt_ops, and perform a couple of (meaningless
    for the back channel) tests of xpt_flags.

    All we really need in svc_process_common() is to be able to run
    rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()

    J. Bruce Fields points out that this xpo_prep_reply_hdr() call
    is an awfully roundabout way just to do "svc_putnl(resv, 0);"
    in the tcp case.

    This patch does not initialize rqstp->rq_xprt in bc_svc_process();
    it now calls svc_process_common() with rqstp->rq_xprt = NULL.

    To adjust the reply header, svc_process_common() just checks
    rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for the tcp case.

    To handle the rqstp->rq_xprt = NULL case in functions called from
    svc_process_common(), the patch introduces a net namespace pointer,
    svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
    Some other functions were also adapted to properly handle the described case.

    Signed-off-by: Vasily Averin
    Cc: stable@vger.kernel.org
    Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup")
    Signed-off-by: J. Bruce Fields

    Vasily Averin
     

31 Oct, 2018

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Olga added support for the NFSv4.2 asynchronous copy protocol. We
    already supported COPY, by copying a limited amount of data and then
    returning a short result, letting the client resend. The asynchronous
    protocol should offer better performance at the expense of some
    complexity.

    The other highlight is Trond's work to convert the duplicate reply
    cache to a red-black tree, and to move it and some other server caches
    to RCU. (Previously these have meant taking global spinlocks on every
    RPC)

    Otherwise, some RDMA work and miscellaneous bugfixes"

    * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits)
    lockd: fix access beyond unterminated strings in prints
    nfsd: Fix an Oops in free_session()
    nfsd: correctly decrement odstate refcount in error path
    svcrdma: Increase the default connection credit limit
    svcrdma: Remove try_module_get from backchannel
    svcrdma: Remove ->release_rqst call in bc reply handler
    svcrdma: Reduce max_send_sges
    nfsd: fix fall-through annotations
    knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
    knfsd: Further simplify the cache lookup
    knfsd: Simplify NFS duplicate replay cache
    knfsd: Remove dead code from nfsd_cache_lookup
    SUNRPC: Simplify TCP receive code
    SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
    SUNRPC: Remove non-RCU protected lookup
    NFS: Fix up a typo in nfs_dns_ent_put
    NFS: Lockless DNS lookups
    knfsd: Lockless lookup of NFSv4 identities.
    SUNRPC: Lockless server RPCSEC_GSS context lookup
    knfsd: Allow lockless lookups of the exports
    ...

    Linus Torvalds
     

30 Oct, 2018

1 commit

  • In call_xpt_users(), we delete the entry from the list, but we
    do not reinitialise it. This triggers the list poisoning when
    we later call unregister_xpt_user() in nfsd4_del_conns().
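
    Why reinitialising matters can be shown with a small userspace list.
    This is a simplified sketch, not the kernel's list.h: the type and
    function names are placeholders. A node that is unlinked and then
    pointed back at itself can safely be unlinked a second time, which is
    what a later unregister call effectively does.

```c
#include <assert.h>
#include <stddef.h>

struct list_node_sketch {
	struct list_node_sketch *next, *prev;
};

/* An initialised node (or empty list head) points at itself. */
void list_init_sketch(struct list_node_sketch *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n right after head. */
void list_add_sketch(struct list_node_sketch *n, struct list_node_sketch *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and reinitialise it, so unlinking it again is a no-op. */
void list_del_init_sketch(struct list_node_sketch *n)
{
	assert(n->next != NULL && n->prev != NULL);

	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init_sketch(n);
}

int list_empty_sketch(const struct list_node_sketch *head)
{
	return head->next == head;
}
```

    If the unlink instead left stale or poisoned pointers in the node, the
    second unlink would dereference them.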

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     

01 Oct, 2018

1 commit


04 Apr, 2018

8 commits

  • Record the time between when a rqstp is enqueued on a transport
    and when it is dequeued. This includes how long the rqstp waits on
    the queue and how long it takes the kernel scheduler to wake a
    nfsd thread to service it.

    The svc_xprt_dequeue trace point is altered to include the number
    of microseconds between xprt_enqueue and xprt_dequeue.
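
    The measurement amounts to subtracting two timestamps and reporting
    the difference in microseconds. A minimal userspace sketch follows;
    it assumes struct timespec timestamps and a hypothetical helper name,
    whereas the kernel uses its own time types.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Microseconds elapsed between the enqueue and dequeue timestamps.
 * The difference is computed in nanoseconds first so that a tv_nsec
 * that wraps past a second boundary is handled correctly.
 */
int64_t queue_wait_usecs_sketch(const struct timespec *enq,
				const struct timespec *deq)
{
	int64_t ns;

	assert(enq != NULL && deq != NULL);

	ns = ((int64_t)deq->tv_sec - (int64_t)enq->tv_sec) * 1000000000LL
	     + ((int64_t)deq->tv_nsec - (int64_t)enq->tv_nsec);

	return ns / 1000;
}
```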

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce a mechanism to report the server-side execution latency of
    each RPC. The goal is to enable user space to filter the trace
    record for latency outliers, build histograms, etc.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • TP_printk defines a format string that is passed to user space for
    converting raw trace event records to something human-readable.

    My user space's printf (Oracle Linux 7), however, does not have a
    %pI format specifier. The result is that what is supposed to be an
    IP address in the output of "trace-cmd report" is just a string that
    says the field couldn't be displayed.

    To fix this, adopt the same approach as the client: maintain a
    pre-formatted presentation address for occasions when %pI is not
    available.

    The location of the trace_svc_send trace point is adjusted so that
    rqst->rq_xprt is not NULL when the trace event is recorded.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • There doesn't seem to be a lot of value in calling trace_svc_recv
    in the failing case.

    1. There are two very common cases: one is the transport is not
    ready, and the other is shutdown. Neither is terribly interesting.

    2. The trace record for the failing case contains nothing but
    the status code.

    Therefore the trace point call site in the error exit is removed.
    Since the trace point is now recording a length instead of a
    status, rename the status field and remove the case that records a
    zero XID.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • There are three cases where svc_xprt_do_enqueue() returns without
    waking an nfsd thread:

    1. There is no work to do

    2. The transport is already busy

    3. There are no available nfsd threads

    Only 3. is truly interesting. Move the trace point so it records
    that there was work to do and either an nfsd thread was awoken, or
    a free one could not be found.

    As an additional clean up, remove a redundant comment and a couple
    of dprintk call sites.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Reduce the amount of noise generated by trace_svc_xprt_dequeue by
    moving it to the end of svc_get_next_xprt. This generates exactly
    one trace event when a ready xprt is found, rather than spurious
    events when there is no work to do. The empty events contain no
    information that can't be obtained simply by tracing function calls
    to svc_xprt_dequeue.

    A small additional benefit is simplification of the svc_xprt_event
    trace class, which no longer has to handle the case when the @xprt
    parameter is NULL.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Instead of returning a value that is used to set or clear
    a bit, just make ->xpo_secure_port mangle that bit, and return void.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Noticed during code inspection that there is already a
    local automatic variable "xprt" so dereferencing rqst->rq_xprt
    again is unnecessary.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

22 Nov, 2017

1 commit

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     

19 Nov, 2017

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Lots of good bugfixes, including:

    - fix a number of races in the NFSv4+ state code

    - fix some shutdown crashes in multiple-network-namespace cases

    - relax our 4.1 session limits; if you've an artificially low limit
    to the number of 4.1 clients that can mount simultaneously, try
    upgrading"

    * tag 'nfsd-4.15' of git://linux-nfs.org/~bfields/linux: (22 commits)
    SUNRPC: Improve ordering of transport processing
    nfsd: deal with revoked delegations appropriately
    svcrdma: Enqueue after setting XPT_CLOSE in completion handlers
    nfsd: use nfs->ns.inum as net ID
    rpc: remove some BUG()s
    svcrdma: Preserve CB send buffer across retransmits
    nfds: avoid gettimeofday for nfssvc_boot time
    fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t
    fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t
    fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t
    lockd: double unregister of inetaddr notifiers
    nfsd4: catch some false session retries
    nfsd4: fix cached replies to solo SEQUENCE compounds
    sunrcp: make function _svc_create_xprt static
    SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
    nfsd: use ARRAY_SIZE
    nfsd: give out fewer session slots as limit approaches
    nfsd: increase DRC cache limit
    nfsd: remove unnecessary nofilehandle checks
    nfs_common: convert int to bool
    ...

    Linus Torvalds
     

08 Nov, 2017

2 commits

  • Since it can take a while before a specific thread gets scheduled, it
    is better to just implement a first come first served queue mechanism.
    That way, if a thread is already scheduled and is idle, it can pick up
    the work to do from the queue.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • The function _svc_create_xprt is local to the source and
    does not need to be in global scope, so make it static.

    Cleans up sparse warning:
    symbol '_svc_create_xprt' was not declared. Should it be static?

    Signed-off-by: Colin Ian King
    Reviewed-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Colin Ian King
     

18 Oct, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.
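
    The shape of the conversion can be sketched in userspace: the callback
    receives a pointer to the timer itself and recovers the enclosing
    structure from it. All names below are stand-ins with a _sketch
    suffix; this mirrors the idea behind timer_setup() and from_timer()
    but is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

struct timer_sketch {
	void (*function)(struct timer_sketch *);
};

/* Hypothetical owner structure with an embedded timer. */
struct server_sketch {
	int expired;
	struct timer_sketch temptimer;
};

/* Recover the containing structure from a pointer to its timer member. */
#define from_timer_sketch(type, timer_ptr, member) \
	((type *)((char *)(timer_ptr) - offsetof(type, member)))

/* Register a callback that takes the timer pointer, not an opaque cookie. */
void timer_setup_sketch(struct timer_sketch *t,
			void (*fn)(struct timer_sketch *))
{
	assert(fn != NULL);
	t->function = fn;
}

/* Callback: find the owning server from the timer that fired. */
void age_callback_sketch(struct timer_sketch *t)
{
	struct server_sketch *serv =
		from_timer_sketch(struct server_sketch, t, temptimer);

	serv->expired++;
}
```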

    Cc: Trond Myklebust
    Cc: Anna Schumaker
    Cc: "J. Bruce Fields"
    Cc: Jeff Layton
    Cc: "David S. Miller"
    Cc: linux-nfs@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
     

13 Jul, 2017

1 commit

  • svcrdma needs 259 pages allocated to receive 1MB NFSv4.0 WRITE requests:

    - 1 page for the transport header and head iovec
    - 256 pages for the data payload
    - 1 page for the trailing GETATTR request (since NFSD XDR decoding
    does not look for a tail iovec, the GETATTR is stuck at the end
    of the rqstp->rq_arg.pages list)
    - 1 page for building the reply xdr_buf

    But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
    svc_alloc_arg never allocates that many pages. To address this:

    1. The final element of rq_pages always points to NULL. To
    accommodate up to 259 pages in rq_pages, add an extra element
    to rq_pages for the array termination sentinel.

    2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
    is calculated, so it can go up to 259. Bruce noted that the
    calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
    which might not always be true. I didn't change this assumption.

    3. Change the loop boundaries to allow 259 pages to be allocated.
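
    The page count above can be checked with a few lines of arithmetic.
    This sketch simply adds up the items listed in this message, assuming
    4KB pages; the function names are illustrative and are not the
    kernel's macros.

```c
#include <assert.h>

/*
 * Pages needed to receive a WRITE of the given payload size:
 * 1 (transport header and head iovec) + data pages
 * + 1 (trailing GETATTR) + 1 (reply xdr_buf).
 */
unsigned long write_request_pages_sketch(unsigned long payload,
					 unsigned long page_size)
{
	unsigned long data_pages;

	assert(page_size > 0);

	data_pages = (payload + page_size - 1) / page_size;	/* round up */
	return 1 + data_pages + 1 + 1;
}

/* Array slots needed once the NULL termination sentinel is added. */
unsigned long rq_pages_slots_sketch(unsigned long pages)
{
	return pages + 1;
}
```

    For a 1MB payload and 4KB pages this gives 259 pages, so the array
    needs 260 slots to keep the sentinel.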

    Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
    which is basically never taken. And there's no need to dump the
    stack here because svc_alloc_arg has only one caller.

    Keeping that NULL "array termination sentinel"; there doesn't appear to
    be any code that depends on it, only code in nfsd_splice_actor() which
    needs the 259th element to be initialized to *something*. So it's
    possible we could just keep the array at 259 elements and drop that
    final NULL, but we're being conservative for now.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

21 Feb, 2017

1 commit

  • Pull locking updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Implement wraparound-safe refcount_t and kref_t types based on
    generic atomic primitives (Peter Zijlstra)

    - Improve and fix the ww_mutex code (Nicolai Hähnle)

    - Add self-tests to the ww_mutex code (Chris Wilson)

    - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
    Bueso)

    - Micro-optimize the current-task logic all around the core kernel
    (Davidlohr Bueso)

    - Tidy up after recent optimizations: remove stale code and APIs,
    clean up the code (Waiman Long)

    - ... plus misc fixes, updates and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    fork: Fix task_struct alignment
    locking/spinlock/debug: Remove spinlock lockup detection code
    lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
    lkdtm: Convert to refcount_t testing
    kref: Implement 'struct kref' using refcount_t
    refcount_t: Introduce a special purpose refcount type
    sched/wake_q: Clarify queue reinit comment
    sched/wait, rcuwait: Fix typo in comment
    locking/mutex: Fix lockdep_assert_held() fail
    locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
    locking/rwsem: Reinit wake_q after use
    locking/rwsem: Remove unnecessary atomic_long_t casts
    jump_labels: Move header guard #endif down where it belongs
    locking/atomic, kref: Implement kref_put_lock()
    locking/ww_mutex: Turn off __must_check for now
    locking/atomic, kref: Avoid more abuse
    locking/atomic, kref: Use kref_get_unless_zero() more
    locking/atomic, kref: Kill kref_sub()
    locking/atomic, kref: Add kref_read()
    locking/atomic, kref: Add KREF_INIT()
    ...

    Linus Torvalds
     

14 Jan, 2017

1 commit

  • Since we need to change the implementation, stop exposing internals.

    Provide kref_read() to read the current reference count; typically
    used for debug messages.

    Kills two anti-patterns:

    atomic_read(&kref->refcount)
    kref->refcount.counter

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

13 Jan, 2017

1 commit

  • The inet6addr_chain is an atomic notifier chain, so we can't call
    anything that might sleep (like lock_sock)... instead of closing the
    socket from svc_age_temp_xprts_now (which is called by the notifier
    function), just have the rpc service threads do it instead.

    Cc: stable@vger.kernel.org
    Fixes: c3d4879e01be "sunrpc: Add a function to close..."
    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     

14 Nov, 2016

1 commit

  • This fixes the following panic that can occur with NFSoRDMA.

    general protection fault: 0000 [#1] SMP
    Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
    scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
    scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
    mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
    ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
    irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
    lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
    ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
    auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
    crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
    syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
    tg3 crct10dif_pclmul drm crct10dif_common
    ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
    dm_log dm_mod
    CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
    Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
    Workqueue: events check_lifetime
    task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
    RIP: 0010:[] []
    _raw_spin_lock_bh+0x17/0x50
    RSP: 0018:ffff88031f587ba8 EFLAGS: 00010206
    RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
    RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
    R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
    R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff880322a40000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
    ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
    0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
    Call Trace:
    [] lock_sock_nested+0x20/0x50
    [] sock_setsockopt+0x78/0x940
    [] ? lock_timer_base.isra.33+0x2b/0x50
    [] kernel_setsockopt+0x4d/0x50
    [] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
    [] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
    [] notifier_call_chain+0x4c/0x70
    [] __blocking_notifier_call_chain+0x4d/0x70
    [] blocking_notifier_call_chain+0x16/0x20
    [] __inet_del_ifa+0x168/0x2d0
    [] check_lifetime+0x25f/0x270
    [] process_one_work+0x17b/0x470
    [] worker_thread+0x126/0x410
    [] ? rescuer_thread+0x460/0x460
    [] kthread+0xcf/0xe0
    [] ? kthread_create_on_node+0x140/0x140
    [] ret_from_fork+0x58/0x90
    [] ? kthread_create_on_node+0x140/0x140
    Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
    44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 0f
    c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
    RIP [] _raw_spin_lock_bh+0x17/0x50
    RSP

    Signed-off-by: Scott Mayhew
    Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
    Reviewed-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Scott Mayhew