20 Dec, 2005

1 commit

  • If we get something like the following,
    [ 125.300636] schedule_timeout+0x54/0xa5
    [ 125.305931] io_schedule_timeout+0x29/0x33
    [ 125.311495] blk_congestion_wait+0x70/0x85
    [ 125.317058] throttle_vm_writeout+0x69/0x7d
    [ 125.322720] shrink_zone+0xe0/0xfa
    [ 125.327560] shrink_caches+0x6d/0x6f
    [ 125.332581] try_to_free_pages+0xd0/0x1b5
    [ 125.338056] __alloc_pages+0x135/0x2e8
    [ 125.343258] tcp_sendmsg+0xaa0/0xb78
    [ 125.348281] inet_sendmsg+0x48/0x53
    [ 125.353212] sock_sendmsg+0xb8/0xd3
    [ 125.358147] kernel_sendmsg+0x42/0x4f
    [ 125.363259] sock_no_sendpage+0x5e/0x77
    [ 125.368556] xs_tcp_send_request+0x2af/0x375
    then the socket is blocked until memory is reclaimed, and no
    progress can ever be made.

    Try to access the emergency pools by using GFP_ATOMIC.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

05 Nov, 2005

1 commit


24 Sep, 2005

24 commits

  • In fact, ->set_buffer_size should be completely functionless for non-UDP.

    Test-plan:
    Check socket buffer size on UDP sockets over time.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Each transport implementation can now set unique bind, connect,
    reestablishment, and idle timeout values. These are variables,
    allowing the values to be modified dynamically. This permits
    exponential backoff of any of these values, for instance.

    As an example, we implement exponential backoff for the connection
    reestablishment timeout.

    Test-plan:
    Destructive testing (unplugging the network temporarily). Connectathon
    with UDP and TCP.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
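
The exponential backoff mentioned in the commit above can be sketched as follows. This is a minimal illustration, not the code from the patch: the struct, the function names, and the 3 s / 300 s bounds are all assumptions chosen for the example.

```c
/* Hypothetical bounds in seconds; the commit does not state the real values. */
#define DEMO_REEST_INIT_TO 3UL
#define DEMO_REEST_MAX_TO  300UL

/* Hypothetical stand-in for the per-transport timeout variable. */
struct demo_backoff_xprt {
    unsigned long reestablish_timeout; /* delay before the next reconnect */
};

/* Reset the delay, for example after a connection attempt succeeds. */
static void demo_reset_reconnect_delay(struct demo_backoff_xprt *xprt)
{
    xprt->reestablish_timeout = DEMO_REEST_INIT_TO;
}

/* Return the delay for this attempt, then double it (capped) for the next. */
static unsigned long demo_next_reconnect_delay(struct demo_backoff_xprt *xprt)
{
    unsigned long delay = xprt->reestablish_timeout;

    xprt->reestablish_timeout <<= 1;
    if (xprt->reestablish_timeout > DEMO_REEST_MAX_TO)
        xprt->reestablish_timeout = DEMO_REEST_MAX_TO;
    return delay;
}
```

Because the timeout is a variable rather than a compile-time constant, the same pattern could be applied to the bind, connect, or idle timeouts.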
     
  • Implement a best practice: if the remote end drops our connection, try to
    reconnect using the same port number. This is important because the NFS
    server's Duplicate Reply Cache often hashes on the source port number.
    If the client reuses the port number when it reconnects, the server's DRC
    will be more effective.

    Based on suggestions by Mike Eisler, Olaf Kirch, and Alexey Kuznetsky.

    Test-plan:
    Destructive testing.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
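
To see why reusing the source port matters, consider a toy duplicate reply cache keyed on client address, source port, and XID. This is only a sketch under assumed names and an assumed hash; real server DRC implementations differ in layout and policy.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DRC_SLOTS 64

/* Hypothetical cache entry: one remembered request per slot. */
struct demo_drc_entry {
    bool     in_use;
    uint32_t client_addr;
    uint16_t client_port;
    uint32_t xid;
};

static struct demo_drc_entry demo_drc[DEMO_DRC_SLOTS];

/* Assumed hash: the source port is part of the key. */
static unsigned int demo_drc_hash(uint32_t addr, uint16_t port, uint32_t xid)
{
    return (addr ^ port ^ xid) % DEMO_DRC_SLOTS;
}

/* Remember a request so that a retransmission can be recognized later. */
static void demo_drc_insert(uint32_t addr, uint16_t port, uint32_t xid)
{
    struct demo_drc_entry *e = &demo_drc[demo_drc_hash(addr, port, xid)];

    e->in_use = true;
    e->client_addr = addr;
    e->client_port = port;
    e->xid = xid;
}

/* True if this exact (addr, port, xid) request was seen before. */
static bool demo_drc_lookup(uint32_t addr, uint16_t port, uint32_t xid)
{
    const struct demo_drc_entry *e = &demo_drc[demo_drc_hash(addr, port, xid)];

    return e->in_use && e->client_addr == addr &&
           e->client_port == port && e->xid == xid;
}
```

In this model a retransmission sent from a different source port after a reconnect misses the cache, so the server would execute the request again instead of replaying the cached reply.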
     
  • Select an RPC client source port between 650 and 1023 instead of between
    1 and 800. The old range conflicts with a number of network services.
    Provide sysctls to allow admins to select a different port range.

    Note that this doesn't affect user-level RPC library behavior, which
    still uses 1 to 800.

    Based on a suggestion by Olaf Kirch.

    Test-plan:
    Repeated mount and unmount. Destructive testing. Idle timeouts.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
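
One plausible way to walk a configurable port range is sketched below. The function name and the downward-with-wraparound order are assumptions made for illustration; only the 650 to 1023 default range comes from the commit message.

```c
/* Default range from the commit message; the sysctls would override these. */
#define DEMO_MIN_RESVPORT 650
#define DEMO_MAX_RESVPORT 1023

/*
 * Return the next candidate source port after a bind attempt on `port`
 * failed. Walks downward and wraps to the top of the range; any value
 * outside [min, max] also restarts at the top.
 */
static unsigned short demo_next_port(unsigned short port,
                                     unsigned short min,
                                     unsigned short max)
{
    if (port <= min || port > max)
        return max;
    return (unsigned short)(port - 1);
}
```

A caller would loop over `demo_next_port()` until a bind succeeds or it arrives back at its starting port.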
     
  • Clean-up: Move some macros that are specific to the Van Jacobson
    implementation into xprt.c. Get rid of the cong_wait field in
    rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog.

    Test-plan:
    Compile with CONFIG_NFS enabled.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Get rid of the "xprt->nocong" variable.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
    Look for significant regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The final place where congestion control state is adjusted is in
    xprt_release, where each request is finally released. Add a callout
    there to allow transports to perform additional processing when a
    request is about to be released.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for significant
    regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Provide a new interface that allows transports to adjust their congestion
    window using the Van Jacobson implementation in xprt.c.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for
    significant regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
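
A Van Jacobson style window adjustment follows the additive-increase, multiplicative-decrease pattern. The sketch below illustrates that pattern with fixed-point arithmetic; the struct, names, and scale factor are assumptions for the example and are not copied from xprt.c.

```c
/* Assumed fixed-point scale: one request slot equals 256 window units. */
#define DEMO_CWNDSHIFT 8U
#define DEMO_CWNDSCALE (1UL << DEMO_CWNDSHIFT)

/* Hypothetical congestion state kept per transport. */
struct demo_cong_state {
    unsigned long cwnd;     /* congestion window, scaled */
    unsigned long cong;     /* congestion currently in flight, scaled */
    unsigned int  max_reqs; /* upper bound on outstanding requests */
};

/* Adjust the window after a request completes (timed_out == 0) or times out. */
static void demo_adjust_cwnd(struct demo_cong_state *c, int timed_out)
{
    unsigned long cwnd = c->cwnd;
    unsigned long max = (unsigned long)c->max_reqs << DEMO_CWNDSHIFT;

    if (timed_out) {
        /* Multiplicative decrease, never below one request slot. */
        cwnd >>= 1;
        if (cwnd < DEMO_CWNDSCALE)
            cwnd = DEMO_CWNDSCALE;
    } else if (cwnd <= c->cong) {
        /* Additive increase, only while the window is fully used.
         * The (cwnd >> 1) term rounds the division to nearest. */
        cwnd += (DEMO_CWNDSCALE * DEMO_CWNDSCALE + (cwnd >> 1)) / cwnd;
        if (cwnd > max)
            cwnd = max;
    }
    c->cwnd = cwnd;
}
```

A transport that wants this behavior would call such a helper from its reply and timeout paths; a transport with its own congestion control would simply not call it.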
     
  • Allow transports to hook the retransmit timer interrupt. Some transports
    calculate their congestion window here so that a retransmit timeout has
    immediate effect on the congestion window.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for significant
    regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The next method we abstract is the one that releases a transport,
    allowing another task to have access to the transport.

    Again, one generic version of this is provided for transports that
    don't need the RPC client to perform congestion control, and one
    version is for transports that can use the original Van Jacobson
    implementation in xprt.c.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for
    significant regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The next several patches introduce an API that allows transports to
    choose whether the RPC client provides congestion control or whether
    the transport itself provides it.

    The first method we abstract is the one that serializes access to the
    RPC transport to prevent the bytes from different requests from mingling
    together. This method provides proper request serialization and the
    opportunity to prevent new requests from being started because the
    transport is congested.

    The normal situation is for the transport to handle congestion control
    itself. Although NFS over UDP was first, it has been recognized after
    years of experience that having the transport provide congestion control
    is much better than doing it in the RPC client. Thus TCP, and probably
    every future transport implementation, will use the default method,
    xprt_lock_write, provided in xprt.c, which does not provide any kind
    of congestion control. UDP can continue using the xprt.c-provided
    Van Jacobson congestion avoidance implementation.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for significant
    regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
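
The choice described above can be modeled as a per-transport table of function pointers. The following sketch uses hypothetical names throughout: it is not the rpc_xprt_ops definition, only an illustration of one serializing method without congestion control and one with it.

```c
struct demo_sw_xprt;

/* Hypothetical transport switch: each transport picks its reserve method. */
struct demo_sw_ops {
    int (*reserve_xprt)(struct demo_sw_xprt *xprt);
};

struct demo_sw_xprt {
    const struct demo_sw_ops *ops;
    int           locked; /* nonzero while a request owns the transport */
    unsigned long cong;   /* requests currently in flight */
    unsigned long cwnd;   /* congestion window, in requests */
};

/* Plain serialization: succeed whenever no other request holds the lock. */
static int demo_reserve(struct demo_sw_xprt *xprt)
{
    if (xprt->locked)
        return 0;
    xprt->locked = 1;
    return 1;
}

/* Serialization plus congestion control: also refuse when the window is full. */
static int demo_reserve_cong(struct demo_sw_xprt *xprt)
{
    if (xprt->locked || xprt->cong >= xprt->cwnd)
        return 0;
    xprt->locked = 1;
    xprt->cong++;
    return 1;
}

/* A stream transport would rely on its own congestion control ... */
static const struct demo_sw_ops demo_stream_ops = { .reserve_xprt = demo_reserve };
/* ... while a datagram transport asks the RPC client to provide it. */
static const struct demo_sw_ops demo_dgram_ops = { .reserve_xprt = demo_reserve_cong };
```

Generic code then calls `xprt->ops->reserve_xprt(xprt)` without knowing which variant the transport selected.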
     
  • Prepare the way to remove the "xprt->nocong" variable by adding a callout
    to the RPC client transport switch API to handle setting RPC retransmit
    timeouts.

    Add a pair of generic helper functions that provide the ability to set a
    simple fixed timeout, or to set a timeout based on the state of a round-
    trip estimator.

    Test-plan:
    Use WAN simulation to cause sporadic bursty packet loss. Look for significant
    regression in performance or client stability.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
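
The two kinds of helper described above might look like the sketch below. All names are hypothetical, and the estimator formula (smoothed RTT plus four times the RTT variation, as in classic TCP retransmit timers) is an assumption; the commit message does not say which formula the kernel's estimator uses.

```c
/* Hypothetical round-trip estimator state, in timer ticks. */
struct demo_rtt {
    unsigned long srtt;   /* smoothed round-trip time */
    unsigned long rttvar; /* smoothed round-trip variation */
};

/* Hypothetical per-request state. */
struct demo_req {
    unsigned long timeout; /* retransmit timeout, in timer ticks */
};

/* Helper 1: a simple fixed timeout chosen by the transport. */
static void demo_set_timeout_fixed(struct demo_req *req, unsigned long fixed)
{
    req->timeout = fixed;
}

/* Helper 2: a timeout derived from the estimator, clamped to max_timeout. */
static void demo_set_timeout_rtt(struct demo_req *req,
                                 const struct demo_rtt *rtt,
                                 unsigned long max_timeout)
{
    unsigned long t = rtt->srtt + 4 * rtt->rttvar;

    if (t > max_timeout)
        t = max_timeout;
    req->timeout = t;
}
```

A transport without client-side congestion control would typically pick the fixed helper, while one that tracks round-trip times would pick the estimator-based one.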
     
  • Now we can fix up the last few places that use the "xprt->stream"
    variable, and get rid of it from the rpc_xprt structure.

    Test-plan:
    Destructive testing (unplugging the network temporarily). Connectathon
    with UDP and TCP.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Add a generic mechanism for skipping over transport-specific headers
    when constructing an RPC request. This removes another "xprt->stream"
    dependency.

    Test-plan:
    Write-intensive workload on a single mount point (try both UDP and
    TCP).

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
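
As an illustration of skipping a transport-specific header, the sketch below reserves space at the front of the send buffer and lets the transport fill it in. ONC RPC over a stream uses a 4-byte record marker (top bit set for the last fragment, low 31 bits for the fragment length); datagram transports need no such header. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-transport description of its private header. */
struct demo_hdr_transport {
    size_t header_bytes;                               /* space to skip */
    void (*fill_header)(uint8_t *buf, size_t payload); /* may be NULL */
};

/* Stream transports: write a last-fragment record marker, big-endian. */
static void demo_fill_record_marker(uint8_t *buf, size_t payload)
{
    uint32_t marker = 0x80000000u | ((uint32_t)payload & 0x7fffffffu);

    buf[0] = (uint8_t)(marker >> 24);
    buf[1] = (uint8_t)(marker >> 16);
    buf[2] = (uint8_t)(marker >> 8);
    buf[3] = (uint8_t)marker;
}

/* Generic code: skip the header, copy the request, let the transport fill in. */
static size_t demo_build_request(const struct demo_hdr_transport *t,
                                 uint8_t *buf,
                                 const uint8_t *payload, size_t len)
{
    memcpy(buf + t->header_bytes, payload, len);
    if (t->fill_header)
        t->fill_header(buf, len);
    return t->header_bytes + len;
}

static const struct demo_hdr_transport demo_stream = { 4, demo_fill_record_marker };
static const struct demo_hdr_transport demo_dgram  = { 0, NULL };
```

The generic builder never checks whether the transport is a stream; it only consults the header size and the optional callback.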
     
  • Split the RPC client's main socket write path into a TCP version and a UDP
    version to eliminate another dependency on the "xprt->stream" variable.

    Compiler optimization removes unneeded code from xs_sendpages, as this
    function is now called with some constant arguments.

    We can now cleanly perform transport protocol-specific return code testing
    and error recovery in each path.

    Test-plan:
    Millions of fsx operations. Performance characterization such as
    "sio" or "iozone". Examine oprofile results for any changes before and
    after this patch is applied.

    Version: Thu, 11 Aug 2005 16:08:46 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Create separate connection worker functions for managing UDP and TCP
    transport sockets. This eliminates several dependencies on "xprt->stream".

    Test-plan:
    Destructive testing (unplugging the network temporarily). Connectathon with
    v2, v3, and v4.

    Version: Thu, 11 Aug 2005 16:08:18 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Split the socket write space callback function into a TCP version and UDP
    version, eliminating one dependence on the "xprt->stream" variable.

    Keep the common pieces of this path in xprt.c so other transports can use
    it too.

    Test-plan:
    Write-intensive workload on a single mount point.

    Version: Thu, 11 Aug 2005 16:07:51 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean-up: change some comments to reflect the realities of the new RPC
    transport switch mechanism. Get rid of unused xprt_receive() prototype.

    Also, organize function prototypes in xprt.h by usage and scope.

    Test-plan:
    Compile kernel with CONFIG_NFS enabled.

    Version: Thu, 11 Aug 2005 16:07:21 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean-up: remove the only reference to xprt->pending from the socket
    transport implementation. This makes a cleaner interface for other
    transport implementations as well.

    Test-plan:
    Compile kernel with CONFIG_NFS enabled.

    Version: Thu, 11 Aug 2005 16:06:52 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean-up: get rid of a name reference to sockets in the generic parts of the
    RPC client by renaming the sockstate field in the rpc_xprt structure.

    Test-plan:
    Compile kernel with CONFIG_NFS enabled.

    Version: Thu, 11 Aug 2005 16:05:53 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean-up: replace a name reference to sockets in the generic parts of the RPC
    client by renaming sock_lock in the rpc_xprt structure.

    Test-plan:
    Compile kernel with CONFIG_NFS enabled.

    Version: Thu, 11 Aug 2005 16:05:00 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Reduce stack utilization of the RPC socket transport's send path.

    A couple of unlikely()s are added to ensure the compiler places the
    tail processing at the end of the csect.

    Test-plan:
    Millions of fsx operations. Performance characterization such as "sio" or
    "iozone".

    Version: Thu, 11 Aug 2005 16:04:30 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
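
For readers unfamiliar with the idiom, an unlikely() style hint can be sketched in user space as shown below. The macro is built on the GCC/Clang `__builtin_expect` builtin; the `demo_` names and the send function are hypothetical and only show where such hints usually go.

```c
#include <stddef.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define demo_unlikely(x) (x)
#endif

/* Hypothetical send routine: error handling is marked as the cold path. */
static long demo_send(const char *buf, size_t len)
{
    if (demo_unlikely(buf == NULL))
        return -1;       /* rare: invalid argument */
    if (demo_unlikely(len == 0))
        return 0;        /* rare: nothing to send */
    return (long)len;    /* common path stays on the straight-line code */
}
```

The hint does not change the result of the function; it only influences how the compiler lays out the branches.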
     
  • Introduce block header comments and a function naming convention to the
    socket transport implementation. Provide a debug setting for transports
    that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout().

    Provide block comments for exposed interfaces in xprt.c, and eliminate
    the useless obvious comments.

    Convert printk's to dprintk's.

    Test-plan:
    Compile kernel with CONFIG_NFS enabled.

    Version: Thu, 11 Aug 2005 16:04:04 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Move the bulk of client-side socket-specific code into a separate source
    file, net/sunrpc/xprtsock.c.

    Test-plan:
    Millions of fsx operations. Performance characterization such as "sio" or
    "iozone". Destructive testing (unplugging the network temporarily, server
    reboots). Connectathon with v2, v3, and v4.

    Version: Thu, 11 Aug 2005 16:03:38 -0400

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever