23 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
    SUNRPC: Fix the TCP server's send buffer accounting
    nfsd41: Backchannel: minorversion support for the back channel
    nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
    nfsd41: Remove ip address collision detection case
    nfsd: optimise the starting of zero threads when none are running.
    nfsd: don't take nfsd_mutex twice when setting number of threads.
    nfsd41: sanity check client drc maxreqs
    nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
    NFS: kill off complicated macro 'PROC'
    sunrpc: potential memory leak in function rdma_read_xdr
    nfsd: minor nfsd_vfs_write cleanup
    nfsd: Pull write-gathering code out of nfsd_vfs_write
    nfsd: track last inode only in use_wgather case
    sunrpc: align cache_clean work's timer
    nfsd: Use write gathering only with NFSv2
    NFSv4: kill off complicated macro 'PROC'
    NFSv4: do exact check about attribute specified
    knfsd: remove unreported filehandle stats counters
    knfsd: fix reply cache memory corruption
    knfsd: reply cache cleanups
    ...

    Linus Torvalds
     

19 Jun, 2009

1 commit

  • Currently, the sunrpc server is refusing to allow us to process new RPC
    calls if the TCP send buffer is 2/3 full, even if we do actually have
    enough free space to guarantee that we can send another request.
    The following patch fixes svc_tcp_has_wspace() so that we only stop
    processing requests if we know that the socket buffer cannot possibly fit
    another reply.

    It also fixes the tcp write_space() callback so that we only clear the
    SOCK_NOSPACE flag when the TCP send buffer is less than 2/3 full.
    This should ensure that the send window will grow as per the standard TCP
    socket code.

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
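The two conditions described above can be modelled in user space. This is a minimal sketch, not the kernel code. `struct model_sock`, its fields, and the `model_*` functions are hypothetical. `reserved` stands in for reply space already promised to requests in progress, and `max_reply` for the largest reply the service can send. The 2/3 threshold follows from requiring the free space to be at least half of what is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a TCP socket's send-buffer accounting. */
struct model_sock {
	int sndbuf;		/* total send buffer size, in bytes */
	int wmem_queued;	/* bytes currently queued for transmission */
};

/* Free space remaining in the send buffer (may be negative). */
static int model_wspace(const struct model_sock *sk)
{
	return sk->sndbuf - sk->wmem_queued;
}

/* Accept another request only if a worst-case reply is guaranteed
 * to fit after the space already reserved for pending replies. */
static bool model_has_wspace(const struct model_sock *sk,
			     int reserved, int max_reply)
{
	return model_wspace(sk) >= reserved + max_reply;
}

/* Clear the "no space" condition only when the buffer is at most
 * 2/3 full:  free >= queued / 2  <=>  queued <= 2/3 * sndbuf. */
static bool model_may_clear_nospace(const struct model_sock *sk)
{
	return model_wspace(sk) >= sk->wmem_queued / 2;
}
```

With a 300-byte buffer holding 200 queued bytes, a 100-byte reply still fits, so the model keeps accepting requests even though the buffer is exactly 2/3 full.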
     

18 Jun, 2009

1 commit


16 Jun, 2009

1 commit


28 May, 2009

1 commit

  • This reverts commit 47a14ef1af48c696b214ac168f056ddc79793d0e "svcrpc:
    take advantage of tcp autotuning", which uncovered some further problems
    in the server rpc code, causing significant performance regressions in
    common cases.

    We will likely reinstate this patch after releasing 2.6.30 and applying
    some work on the underlying fixes to the problem (developed by Trond).

    Reported-by: Jeff Moyer
    Cc: Olga Kornievskaia
    Cc: Jim Rees
    Cc: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

29 Apr, 2009

6 commits

  • Clean up svc_one_sock_name() by setting up automatic variables for
    frequently used expressions.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Add an arm to the switch statement in svc_one_sock_name() so it can
    construct the name of PF_INET6 sockets properly.

    Signed-off-by: Chuck Lever
    Cc: Aime Le Rouzic
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Use snprintf() in one_sock_name() to prevent overflowing the output
    buffer. If the name doesn't fit in the buffer, the buffer is filled
    in with an empty string, and -ENAMETOOLONG is returned.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Adjust the synopsis of svc_sock_names() to pass in the size of the
    output buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Adjust the synopsis of svc_addsock() to pass in the size of the output
    buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't
    recognize the address family of the passed-in socket address. However,
    the return type of this function is size_t, which means -EAFNOSUPPORT
    is turned into a very large positive value in this case.

    The check in svc_udp_recvfrom() to see if the return value is less
    than zero therefore won't work at all.

    Additionally, handle_connect_req() passes this value directly to
    memset(). This could cause memset() to clobber a large chunk of memory
    if svc_addr_len() has returned an error. Currently, however, the address
    family of these addresses is known to be supported long before
    handle_connect_req() is called, so this isn't a real risk.

    Change the error return value of svc_addr_len() to zero, which fits in
    the range of size_t, and is safer to pass to memset() directly.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
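The PF_INET6 arm and the snprintf()-based overflow check described above can be sketched in user space. This is an illustration under assumptions, not the kernel function: `sketch_sock_name()` is hypothetical, it takes a `struct sockaddr` and a protocol name instead of a socket, and the `"ipv4 udp 0.0.0.0 2049"` line format and the `*unknown-N*` fallback are assumed for the example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical analogue of one_sock_name(): writes one line such as
 * "ipv4 udp 0.0.0.0 2049\n" into buf. Returns the number of characters
 * written, or -ENAMETOOLONG (leaving buf empty) if the line won't fit. */
static int sketch_sock_name(char *buf, size_t remaining,
			    const struct sockaddr *sa, const char *proto)
{
	char addr[INET6_ADDRSTRLEN];
	int len;

	assert(remaining > 0);

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr));
		len = snprintf(buf, remaining, "ipv4 %s %s %d\n",
			       proto, addr, ntohs(sin->sin_port));
		break;
	}
	case AF_INET6: {	/* the arm added for IPv6 listeners */
		const struct sockaddr_in6 *sin6 =
			(const struct sockaddr_in6 *)sa;

		inet_ntop(AF_INET6, &sin6->sin6_addr, addr, sizeof(addr));
		len = snprintf(buf, remaining, "ipv6 %s %s %d\n",
			       proto, addr, ntohs(sin6->sin6_port));
		break;
	}
	default:
		len = snprintf(buf, remaining, "*unknown-%d*\n", sa->sa_family);
		break;
	}

	/* snprintf() returns the length the full line would need; if that
	 * does not fit (with its NUL), report failure with an empty buffer. */
	if (len < 0 || (size_t)len >= remaining) {
		buf[0] = '\0';
		return -ENAMETOOLONG;
	}
	return len;
}
```

Because snprintf() never writes past `remaining` bytes, the worst case for an undersized buffer is a truncated line, which the final check then replaces with an empty string.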
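The signedness problem in the last commit is easy to reproduce outside the kernel. In this hypothetical sketch, `addr_len_old()` mimics returning `-EAFNOSUPPORT` through a `size_t`, and `addr_len_new()` returns 0 for an unrecognized family, as the fix describes; neither is the kernel's actual svc_addr_len().

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Before: an error code squeezed into an unsigned return type.
 * The negative value wraps to a huge positive size_t. */
static size_t addr_len_old(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	}
	return (size_t)-EAFNOSUPPORT;
}

/* After: 0 signals "unknown family"; it is representable in size_t
 * and is harmless if passed straight to memset(). */
static size_t addr_len_new(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	}
	return 0;
}
```

With the old version, a caller's `if (len < 0)` test can never be true because `len` is unsigned, and compilers can warn that the comparison is always false.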
     

07 Apr, 2009

1 commit

  • * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
    nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
    nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
    nfsd41: Documentation/filesystems/nfs41-server.txt
    nfsd41: CREATE_EXCLUSIVE4_1
    nfsd41: SUPPATTR_EXCLCREAT attribute
    nfsd41: support for 3-word long attribute bitmask
    nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
    nfsd41: pass writable attrs mask to nfsd4_decode_fattr
    nfsd41: provide support for minor version 1 at rpc level
    nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
    nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
    nfsd41: access_valid
    nfsd41: clientid handling
    nfsd41: check encode size for sessions maxresponse cached
    nfsd41: stateid handling
    nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
    nfsd41: destroy_session operation
    nfsd41: non-page DRC for solo sequence responses
    nfsd41: Add a create session replay cache
    nfsd41: create_session operation
    ...

    Linus Torvalds
     

02 Apr, 2009

1 commit


29 Mar, 2009

3 commits

  • We are about to convert to using separate RPC listener sockets for
    PF_INET and PF_INET6. This echoes the way IPv6 is handled in user
    space by TI-RPC, and eliminates the need for ULPs to worry about
    mapped IPv4 AF_INET6 addresses when doing address comparisons.

    Start by setting the IPV6ONLY flag on PF_INET6 RPC listener sockets.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Since the sv_family field is going away, modify svc_setup_socket() to
    extract the protocol family from the passed-in socket instead of from
    the passed-in svc_serv struct.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The sv_family field is going away. Instead of using sv_family, have
    the svc_register() function take a protocol family argument.

    Since this argument represents a protocol family, and not an address
    family, this argument takes an int, as this is what is passed to
    sock_create_kern(). Also make sure svc_register's helpers are
    checking for PF_FOO instead of AF_FOO. The values of [AP]F_FOO are
    equivalent; this is simply a symbolic change to reflect the semantics
    of the value stored in that variable.

    sock_create_kern() should return EPFNOSUPPORT if the passed-in
    protocol family isn't supported, but it uses EAFNOSUPPORT for this
    case. We will stick with that tradition here, as svc_register()
    is called by the RPC server in the same path as sock_create_kern().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
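In user space, the IPV6ONLY behaviour from the first commit above is controlled with `setsockopt()` at the `IPPROTO_IPV6` level. The sketch below creates a PF_INET6 socket restricted to IPv6 traffic; the helper names are hypothetical, and the kernel code uses its own in-kernel socket interfaces rather than these calls.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a PF_INET6 stream socket that accepts IPv6 traffic only,
 * so IPv4 clients must be served by a separate PF_INET listener.
 * Returns the descriptor, or -1 on error. */
static int make_v6only_socket(void)
{
	int on = 1;
	int fd = socket(PF_INET6, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Report the IPV6_V6ONLY setting: 1 = IPv6 only, 0 = IPv4-mapped
 * traffic also accepted, -1 = error. */
static int get_v6only(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
		return -1;
	assert(len == sizeof(val));
	return val;
}
```

Setting the option explicitly, rather than relying on the system default, makes the listener's behaviour independent of local IPv6 configuration.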
     

19 Mar, 2009

1 commit

  • Allow the NFSv4 server to make use of TCP autotuning behaviour, which
    was previously disabled by setting the sk_userlocks variable.

    Set the receive buffers to be big enough to receive the whole RPC
    request, and set this for the listening socket, not the accept socket.

    Remove the code that readjusts the receive/send buffer sizes for the
    accepted socket. Previously this code was used to influence the TCP
    window management behaviour, which is no longer needed when autotuning
    is enabled.

    This can improve IO bandwidth on networks with high bandwidth-delay
    products, where a large TCP window is required. It also simplifies
    performance tuning, since getting adequate TCP buffers previously
    required increasing the number of nfsd threads.

    Signed-off-by: Olga Kornievskaia
    Cc: Jim Rees
    Signed-off-by: J. Bruce Fields

    Olga Kornievskaia
     

08 Jan, 2009

2 commits


07 Jan, 2009

1 commit


16 Dec, 2008

1 commit


25 Nov, 2008

1 commit

  • The svc_addsock function adds transport instances without taking a
    reference on the sunrpc.ko module; however, the generic transport
    destruction code drops a reference when a transport instance
    is destroyed.

    Add a try_module_get call to the svc_addsock function for transport
    instances added by this function.

    Signed-off-by: Tom Tucker
    Signed-off-by: J. Bruce Fields
    Tested-by: Jeff Moyer

    Tom Tucker
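The imbalance is easiest to see with a plain counter. This hypothetical model stands in for the module reference count; `model_try_module_get()`, `model_module_put()`, and the other names are illustrative, not kernel APIs. Because the destroy path always drops one reference, every creation path, including the svc_addsock() path, has to take one.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the sunrpc module's reference count. */
static int module_refs;

/* Always succeeds in this model; the real call can fail. */
static bool model_try_module_get(void)
{
	module_refs++;
	return true;
}

static void model_module_put(void)
{
	assert(module_refs > 0);	/* an unbalanced put would underflow */
	module_refs--;
}

/* Generic transport destruction: always drops a module reference. */
static void model_xprt_destroy(void)
{
	model_module_put();
}

/* Fixed svc_addsock()-style path: take the reference that
 * model_xprt_destroy() will later drop. */
static int model_addsock(void)
{
	if (!model_try_module_get())
		return -1;
	return 0;
}
```

Without the get in `model_addsock()`, creating and then destroying one transport would trip the assertion in `model_module_put()`.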
     

31 Oct, 2008

1 commit


05 Oct, 2008

1 commit


30 Sep, 2008

1 commit

  • My plan is to use an AF_INET listener on systems that support only IPv4,
    and an AF_INET6 listener on systems that can support IPv6. Incoming
    IPv4 packets will be posted to an AF_INET6 listener with a mapped IPv4
    address.

    Max Matveev says:
    Creating a single listener can be dangerous - if net.ipv6.bindv6only
    is enabled then it's possible to create another listener in v4
    namespace on the same port and steal the traffic from the "unified"
    listener. You need to disable V6ONLY explicitly via a sockopt to stop
    that.

    Set appropriate socket option on RPC server listener sockets to prevent
    this.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
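A single AF_INET6 listener that also accepts IPv4 traffic sees each IPv4 peer as a v4-mapped IPv6 address (`::ffff:a.b.c.d`), so any code comparing addresses must recognize that form. A minimal sketch using the standard `IN6_IS_ADDR_V4MAPPED()` macro; the helper name `v6_matches_v4()` is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Does an AF_INET6 address carry the given IPv4 address in
 * v4-mapped form (::ffff:a.b.c.d)? */
static bool v6_matches_v4(const struct in6_addr *a6, const struct in_addr *a4)
{
	assert(a6 != NULL && a4 != NULL);

	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return false;
	/* The IPv4 address occupies the last 4 bytes, in network order. */
	return memcmp(&a6->s6_addr[12], &a4->s_addr, 4) == 0;
}
```

This extra comparison step is the kind of burden on upper-layer code that the later switch to separate PF_INET and PF_INET6 listeners removes.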
     

24 Apr, 2008

3 commits


22 Feb, 2008

1 commit

  • Sorry for the noise, but here's the v3 of this compilation fix :)

    There are some places which declare a char buf[...] on the stack
    to pass it later to dprintk(). Since dprintk() sometimes (if
    CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
    cause gcc to produce unused-variable warnings.

    Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
    compile them out when not needed.

    Signed-off-by: Pavel Emelyanov
    Acked-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Pavel Emelyanov
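The pattern looks roughly like this. `MY_DEBUG`, `MY_IFDEBUG`, and this `dprintk()` are hypothetical user-space stand-ins for the kernel's RPC debugging macros, assumed here to behave the same way: with debugging off, the buffer declaration and its only use compile away together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build with -DMY_DEBUG to enable debug output. */
#ifdef MY_DEBUG
# define dprintk(...)	fprintf(stderr, __VA_ARGS__)
# define MY_IFDEBUG(x)	x
#else
# define dprintk(...)	do { } while (0)
# define MY_IFDEBUG(x)
#endif

/* Format a peer description into the caller's buffer. */
const char *format_peer(char *buf, size_t len, int port)
{
	assert(len > 0);
	snprintf(buf, len, "port %d", port);
	return buf;
}

int handle_request(int port)
{
	/* The buffer exists only when dprintk() can use it; otherwise
	 * gcc would warn that it is unused. */
	MY_IFDEBUG(char buf[32]);

	dprintk("handling request from %s\n",
		format_peer(buf, sizeof(buf), port));
	return port;
}
```

When `MY_DEBUG` is undefined, `dprintk()` discards its arguments, so the reference to `buf` disappears in the same build where its declaration does.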
     

02 Feb, 2008

11 commits

  • Some transports have a header in front of the RPC header. The current
    defer/revisit processing considers only the iov_len and arg_len to
    determine how much to back up when saving the original request
    to revisit. Add a field to the rqstp structure to save the size
    of the transport header so svc_defer can correctly compute
    the start of a request.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This functionally trivial patch moves all of the transport independent
    functions from the svcsock.c file to the transport independent svc_xprt.c
    file.

    In addition the following formatting changes were made:
    - White space cleanup
    - Function signatures on single line
    - The inline directive was removed
    - Lines over 80 columns were reformatted
    - The term 'socket' was changed to 'transport' in comments
    - The SMP comment was moved and updated.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The svc_check_conn_limits function only manipulates xprt fields. Change references
    to svc_sock->sk_xprt to svc_xprt directly.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
    This functionally empty patch removes rq_sock and the unnamed union
    from the rqstp structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move the svc transport list logic into common transport creation code.
    Refactor this code path to make the flow of control easier to read.

    Move the setting and clearing of the BUSY_BIT during transport creation
    to common code.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This function is transport independent. Change it to use svc_xprt directly
    and change its name to reflect this.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • All of the transport fields and functions used by svc_recv are now
    transport independent. Change the svc_recv function to use the svc_xprt
    structure directly instead of the transport specific svc_sock structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The svc_sock_release function only touches transport independent fields.
    Change the function to manipulate svc_xprt directly instead of the transport
    dependent svc_sock structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This patch moves the transport sockaddr to the svc_xprt
    structure. Convenience functions are added to set and
    get the local and remote addresses of a transport from
    the transport provider as well as determine the length
    of a sockaddr.

    A transport is responsible for setting the xpt_local
    and xpt_remote addresses in the svc_xprt structure as
    part of transport creation and xpo_accept processing. This
    cannot be done in a generic way and in fact varies
    between TCP, UDP and RDMA. A set of xpo_ functions
    (e.g. getlocalname, getremotename) could have been
    added but this would have resulted in additional
    caching and copying of the addresses around. Note that
    the xpt_local address should also be set on listening
    endpoints; for TCP/RDMA this is done as part of
    endpoint creation.

    For connected transports like TCP and RDMA, the addresses
    never change and can be set once and copied into the
    rqstp structure for each request. For UDP, however, the
    local and remote addresses may change for each request. In
    this case, the address information is obtained from the
    UDP recvmsg info and copied into the rqstp structure from
    there.

    A svc_xprt_local_port function was also added that returns
    the local port given a transport. This is used by
    svc_create_xprt when returning the port associated with
    a newly created transport, and later when creating a
    generic find transport service to check if a service is
    already listening on a given port.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This patch moves the transport independent sk_deferred list to the svc_xprt
    structure and updates the svc_deferred_req structure to keep pointers to
    svc_xprt's directly. The deferral processing code is also moved out of the
    transport dependent recvfrom functions and into the generic svc_recv path.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
    transports to share this logic. A flag bit is used to determine if
    auth information is to be cached or not. Previously, this code looked
    at the transport protocol.

    I've also changed the spin_lock/unlock logic so that a lock is not taken for
    transports that are not caching auth info.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
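The header-length bookkeeping from the first commit in this list can be modelled as follows. This is a hypothetical sketch, not svc_defer(): `struct model_deferred` and both functions are invented for illustration, and the request is treated as one contiguous buffer. The saved copy includes the transport header, and the recorded header length tells the revisit path where the RPC header begins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical saved copy of a deferred request. */
struct model_deferred {
	size_t xprt_hlen;	/* transport header bytes before the RPC header */
	size_t len;		/* total bytes saved, header included */
	unsigned char data[];	/* the saved bytes */
};

/* Save a request whose RPC header starts at 'rpc' and is preceded by
 * 'hlen' bytes of transport header in the same buffer. */
static struct model_deferred *model_defer(const unsigned char *rpc,
					  size_t rpc_len, size_t hlen)
{
	struct model_deferred *dr = malloc(sizeof(*dr) + hlen + rpc_len);

	if (!dr)
		return NULL;
	dr->xprt_hlen = hlen;
	dr->len = hlen + rpc_len;
	/* Back up over the transport header so the copy starts at the
	 * true start of the request. */
	memcpy(dr->data, rpc - hlen, dr->len);
	return dr;
}

/* On revisit, skip the transport header to find the RPC header again. */
static const unsigned char *model_revisit(const struct model_deferred *dr,
					  size_t *rpc_len)
{
	assert(dr->xprt_hlen <= dr->len);
	*rpc_len = dr->len - dr->xprt_hlen;
	return dr->data + dr->xprt_hlen;
}
```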