12 Sep, 2009

1 commit

  • When the call direction is a reply, copy the xid and call direction into
    req->rq_private_buf.head[0].iov_base; otherwise rpc_verify_header returns
    rpc_garbage.

    Signed-off-by: Rahul Iyer
    Signed-off-by: Mike Sager
    Signed-off-by: Marc Eshel
    Signed-off-by: Benny Halevy
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [get rid of CONFIG_NFSD_V4_1]
    [sunrpc: refactoring of svc_tcp_recvfrom]
    [nfsd41: sunrpc: create common send routine for the fore and the back channels]
    [nfsd41: sunrpc: Use free_page() to free server backchannel pages]
    [nfsd41: sunrpc: Document server backchannel locking]
    [nfsd41: sunrpc: remove bc_connect_worker()]
    [nfsd41: sunrpc: Define xprt_server_backchannel()]
    [nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
    [nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
    [nfsd41: sunrpc: Don't auto close the server backchannel connection]
    [nfsd41: sunrpc: Remove unused functions]
    Signed-off-by: Alexandros Batsakis
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy
    [nfsd41: change bc_sock to bc_xprt]
    [nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
    [nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
    [removed cosmetic changes]
    Signed-off-by: Benny Halevy
    [sunrpc: add new xprt class for nfsv4.1 backchannel]
    [sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
    Signed-off-by: Alexandros Batsakis
    [reverted more cosmetic leftovers]
    [got rid of xprt_server_backchannel]
    [separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
    Signed-off-by: Benny Halevy
    Cc: Trond Myklebust
    [sunrpc: change idle timeout value for the backchannel]
    Signed-off-by: Alexandros Batsakis
    Signed-off-by: Benny Halevy
    Acked-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Rahul Iyer
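
The fix described above amounts to writing two 32-bit words at the start of the receive buffer so that a later header check finds a well-formed reply prefix. The following is a minimal userspace sketch of that idea, not the kernel code: put_rpc_prefix() and get_calldir() are hypothetical names, and the sketch converts from host byte order, whereas the kernel may already hold the xid in network order.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* msg_type values from the ONC RPC message format (RFC 5531). */
enum { RPC_CALL = 0, RPC_REPLY = 1 };

/*
 * Hypothetical helper: store the XID and the call direction as the first
 * two big-endian 32-bit words of a receive buffer.
 * Returns 0 on success, -1 if the buffer cannot hold both words.
 */
static int put_rpc_prefix(void *base, size_t len, uint32_t xid, uint32_t calldir)
{
    uint32_t words[2];

    if (base == NULL || len < sizeof(words))
        return -1;
    words[0] = htonl(xid);
    words[1] = htonl(calldir);
    memcpy(base, words, sizeof(words));
    return 0;
}

/* Read back the direction word, as a header check would. */
static uint32_t get_calldir(const void *base)
{
    uint32_t word;

    assert(base != NULL);
    memcpy(&word, (const unsigned char *)base + sizeof(word), sizeof(word));
    return ntohl(word);
}
```

A header check that expects RPC_REPLY in the second word would reject the buffer if this prefix were missing, which matches the rpc_garbage symptom the commit message describes.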
     

23 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
    SUNRPC: Fix the TCP server's send buffer accounting
    nfsd41: Backchannel: minorversion support for the back channel
    nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
    nfsd41: Remove ip address collision detection case
    nfsd: optimise the starting of zero threads when none are running.
    nfsd: don't take nfsd_mutex twice when setting number of threads.
    nfsd41: sanity check client drc maxreqs
    nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
    NFS: kill off complicated macro 'PROC'
    sunrpc: potential memory leak in function rdma_read_xdr
    nfsd: minor nfsd_vfs_write cleanup
    nfsd: Pull write-gathering code out of nfsd_vfs_write
    nfsd: track last inode only in use_wgather case
    sunrpc: align cache_clean work's timer
    nfsd: Use write gathering only with NFSv2
    NFSv4: kill off complicated macro 'PROC'
    NFSv4: do exact check about attribute specified
    knfsd: remove unreported filehandle stats counters
    knfsd: fix reply cache memory corruption
    knfsd: reply cache cleanups
    ...

    Linus Torvalds
     

18 Jun, 2009

1 commit


29 Apr, 2009

2 commits

  • Adjust the synopsis of svc_sock_names() to pass in the size of the
    output buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Adjust the synopsis of svc_addsock() to pass in the size of the output
    buffer. Add a documenting comment.

    This is a cosmetic change for now. A subsequent patch will make sure
    the buffer length is passed to one_sock_name(), where the length will
    actually be useful.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
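
Both commits in this group pass the output buffer's size alongside the buffer so that the callee can bound its writes. A minimal sketch of that calling convention follows. format_sock_name() is a hypothetical stand-in, not the kernel's one_sock_name(), and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes "<family> <proto> <addr> <port>\n" into buf
 * and never touches more than buflen bytes.
 * Returns the number of characters written (excluding the NUL terminator),
 * or -1 if the name does not fit.
 */
static int format_sock_name(char *buf, size_t buflen, const char *family,
                            const char *proto, const char *addr, unsigned port)
{
    int n;

    assert(family != NULL && proto != NULL && addr != NULL);
    if (buf == NULL || buflen == 0)
        return -1;

    n = snprintf(buf, buflen, "%s %s %s %u\n", family, proto, addr, port);
    if (n < 0 || (size_t)n >= buflen) {
        buf[0] = '\0';  /* do not hand back a truncated name */
        return -1;
    }
    return n;
}
```

Without the length argument the callee would have to assume a buffer size, which is the gap these two patches prepare to close.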
     

05 Oct, 2008

1 commit


24 Apr, 2008

1 commit


02 Feb, 2008

14 commits

  • This patch moves the transport sockaddr to the svc_xprt
    structure. Convenience functions are added to set and
    get the local and remote addresses of a transport from
    the transport provider as well as determine the length
    of a sockaddr.

    A transport is responsible for setting the xpt_local
    and xpt_remote addresses in the svc_xprt structure as
    part of transport creation and xpo_accept processing. This
    cannot be done in a generic way and in fact varies
    between TCP, UDP and RDMA. A set of xpo_ functions
    (e.g. getlocalname, getremotename) could have been
    added but this would have resulted in additional
    caching and copying of the addresses around. Note that
    the xpt_local address should also be set on listening
    endpoints; for TCP/RDMA this is done as part of
    endpoint creation.

    For connected transports like TCP and RDMA, the addresses
    never change and can be set once and copied into the
    rqstp structure for each request. For UDP, however, the
    local and remote addresses may change for each request. In
    this case, the address information is obtained from the
    UDP recvmsg info and copied into the rqstp structure from
    there.

    A svc_xprt_local_port function was also added that returns
    the local port given a transport. This is used by
    svc_create_xprt when returning the port associated with
    a newly created transport, and later when creating a
    generic find transport service to check if a service is
    already listening on a given port.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This patch moves the transport independent sk_deferred list to the svc_xprt
    structure and updates the svc_deferred_req structure to keep pointers to
    svc_xprt's directly. The deferral processing code is also moved out of the
    transport dependent recvfrom functions and into the generic svc_recv path.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
    transports to share this logic. A flag bit is used to determine if
    auth information is to be cached or not. Previously, this code looked
    at the transport protocol.

    I've also changed the spin_lock/unlock logic so that a lock is not taken for
    transports that are not caching auth info.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • With the implementation of the new mark and sweep algorithm for shutting
    down old connections, the sk_lastrecv field is no longer needed.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move the sk_mutex field to the transport independent svc_xprt structure.
    Now all the fields that svc_send touches are transport neutral. Change the
    svc_send function to use the transport independent svc_xprt directly instead
    of the transport dependent svc_sock structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This functionally trivial patch moves the sk_reserved field to the
    transport independent svc_xprt structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Move sk_list and sk_ready to svc_xprt. This involves close because these
    lists are walked by svcs when closing all their transports. So I combined
    the moving of these lists to svc_xprt with making close transport independent.

    The svc_force_sock_close has been changed to svc_close_all and takes a list
    as an argument. This removes some svc internals knowledge from the svcs.

    This code races with module removal and transport addition.

    Thanks to Simon Holm Thøgersen for a compile fix.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields
    Cc: Simon Holm Thøgersen

    Tom Tucker
     
  • This is another incremental change that moves transport independent
    fields from svc_sock to the svc_xprt structure. The changes
    should be functionally null.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This functionally trivial change moves the transport independent sk_flags
    field to the transport independent svc_xprt structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Change the atomic_t reference count to a kref and move it to the
    transport independent svc_xprt structure. Change the reference count
    wrapper names to be generic.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Modify the various kernel RPC svcs to use the svc_create_xprt service.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Previously, the accept logic looked into the socket state to determine
    whether to call accept or recv when data-ready was indicated on an endpoint.
    Since some transports don't use sockets, this logic now uses a flag
    bit (SK_LISTENER) to identify listening endpoints. A transport function
    (xpo_accept) allows each transport to define its own accept processing.
    A transport's initialization logic is responsible for setting the
    SK_LISTENER bit. I didn't see any way to do this in transport independent
    logic since the passive side of a UDP connection doesn't listen and
    always recv's.

    In the svc_recv function, if the SK_LISTENER bit is set, the transport
    xpo_accept function is called to handle accept processing.

    Note that all functions are defined even if they don't make sense
    for a given transport. For example, accept doesn't mean anything for
    UDP. The function is defined anyway and bug checks if called. The
    UDP transport should never set the SK_LISTENER bit.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
    to be used for both UDP and TCP. Move these function pointers to the
    svc_xprt_ops structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • Make TCP and UDP svc_sock transports, and register them
    with the svc transport core.

    A transport type (svc_sock) has an svc_xprt as its first member,
    and calls svc_xprt_init to initialize this field.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
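
Three commits in this group fit together: a generic transport embedded as the first member of a transport-specific structure, an ops table holding the per-transport function pointers, and a listener flag that selects accept over receive. The sketch below shows that shape in plain C. All names (struct xprt, XPT_LISTENER, xprt_handle_ready and so on) are hypothetical simplifications rather than the kernel's definitions, and where the commit message says the UDP accept function bug-checks, the sketch returns an error instead.

```c
#include <assert.h>
#include <stddef.h>

struct xprt;

/* Per-transport operations, loosely modelled on the xpo_ functions. */
struct xprt_ops {
    int (*recvfrom)(struct xprt *);
    int (*accept)(struct xprt *);
};

enum { XPT_LISTENER = 1u << 0 };  /* hypothetical flag bit */

/* Transport-independent part. */
struct xprt {
    const struct xprt_ops *ops;
    unsigned flags;
};

/* Socket-based transport: the generic part must be the first member. */
struct sock_xprt {
    struct xprt base;
    int accepted;
    int received;
};

static void xprt_init(struct xprt *x, const struct xprt_ops *ops, unsigned flags)
{
    x->ops = ops;
    x->flags = flags;
}

static int sock_recvfrom(struct xprt *x)
{
    /* Valid because base is the first member of struct sock_xprt. */
    struct sock_xprt *s = (struct sock_xprt *)x;
    return ++s->received;
}

static int sock_accept(struct xprt *x)
{
    struct sock_xprt *s = (struct sock_xprt *)x;
    return ++s->accepted;
}

/* Defined even though accept makes no sense for a datagram transport. */
static int udp_accept(struct xprt *x)
{
    (void)x;
    return -1;
}

static const struct xprt_ops tcp_ops = { .recvfrom = sock_recvfrom, .accept = sock_accept };
static const struct xprt_ops udp_ops = { .recvfrom = sock_recvfrom, .accept = udp_accept };

/* Generic receive path: listeners accept, everything else receives. */
static int xprt_handle_ready(struct xprt *x)
{
    assert(x != NULL && x->ops != NULL);
    if (x->flags & XPT_LISTENER)
        return x->ops->accept(x);
    return x->ops->recvfrom(x);
}
```

The generic path never inspects socket state, so a transport that is not socket-based only needs to supply its own ops table and set the listener flag on its listening endpoints.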
     

11 Jul, 2007

1 commit


10 May, 2007

1 commit

  • Now that sk_defer_lock protects two different things, make the name more
    generic.

    Also don't bother with disabling _bh as the lock is only ever taken from
    process context.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

07 Mar, 2007

1 commit

  • When the last thread of nfsd exits, it shuts down all related sockets. It
    currently uses svc_close_socket to do this, but that only is immediately
    effective if the socket is not SK_BUSY.

    If the socket is busy - i.e. if a request has arrived that has not yet been
    processed - svc_close_socket is not effective and the shutdown process spins.

    So create a new svc_force_close_socket which removes the SK_BUSY flag if it
    is set and then calls svc_close_socket.

    Also change some open-coded loops in svc_destroy to use
    list_for_each_entry_safe.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

13 Feb, 2007

3 commits


10 Feb, 2007

1 commit

  • If you lose this race, it can iput a socket inode twice and you get a BUG
    in fs/inode.c

    When I added the option for user-space to close a socket, I added some
    cruft to svc_delete_socket so that I could call that function when closing
    a socket per user-space request.

    This was the wrong thing to do. I should have just set SK_CLOSE and let
    normal mechanisms do the work.

    Not only wrong, but buggy. The locking is all wrong and it opened up a
    race whereby a socket could be closed twice.

    So this patch:
    Introduces svc_close_socket which sets SK_CLOSE then either leaves
    the close up to a thread, or calls svc_delete_socket if it can
    get SK_BUSY.

    Adds a bias to sk_busy which is removed when SK_DEAD is set.
    This avoids races around shutting down the socket.

    Changes several 'spin_lock' to 'spin_lock_bh' where the _bh
    was missing.

    Bugzilla-url: http://bugzilla.kernel.org/show_bug.cgi?id=7916

    Signed-off-by: Neil Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
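
The close protocol in this commit (set a close flag, then either claim the busy flag and delete, or leave the deletion to whoever holds it) can be sketched with C11 atomics. This is a simplified userspace illustration under stated assumptions, not the kernel implementation: the flag names, struct conn, and every function here are hypothetical, and the reference-count bias the commit adds is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { F_BUSY = 1u << 0, F_CLOSE = 1u << 1, F_DEAD = 1u << 2 };

struct conn {
    atomic_uint flags;
    int deleted;  /* how many times teardown actually ran */
};

static void conn_init(struct conn *c)
{
    atomic_init(&c->flags, 0u);
    c->deleted = 0;
}

/* Try to become the owner; true if this caller set F_BUSY. */
static bool conn_claim(struct conn *c)
{
    return !(atomic_fetch_or(&c->flags, F_BUSY) & F_BUSY);
}

/* Teardown runs once: only the first caller to set F_DEAD proceeds. */
static void conn_delete(struct conn *c)
{
    assert(c != NULL);
    if (atomic_fetch_or(&c->flags, F_DEAD) & F_DEAD)
        return;
    c->deleted++;
}

/*
 * Request a close. Returns true if this caller tore the connection down,
 * false if the current owner is expected to notice F_CLOSE and do it.
 */
static bool conn_close(struct conn *c)
{
    atomic_fetch_or(&c->flags, F_CLOSE);
    if (!conn_claim(c))
        return false;
    conn_delete(c);
    return true;
}

/*
 * Called by the owner when its work is finished: release F_BUSY first,
 * then re-check F_CLOSE so a close requested in the meantime is not lost.
 */
static void conn_done(struct conn *c)
{
    atomic_fetch_and(&c->flags, ~(unsigned)F_BUSY);
    if ((atomic_load(&c->flags) & F_CLOSE) && conn_claim(c))
        conn_delete(c);
}
```

Because teardown is reached only through the busy claim and is additionally guarded by the dead flag, two concurrent close requests cannot both delete the connection, which is the double-close the commit message describes.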
     

04 Oct, 2006

1 commit

  • Speed up high call-rate workloads by caching the struct ip_map for the peer on
    the connected struct svc_sock instead of looking it up in the ip_map cache
    hashtable on every call. This helps workloads using AUTH_SYS authentication
    over TCP.

    Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
    synthetic client threads simulating an rsync (i.e. recursive directory
    listing) workload reading from an i386 RH9 install image (161480 regular files
    in 10841 directories) on the server. That tree is small enough to fit in the
    server's RAM so no disk traffic was involved. This setup gives a sustained
    call rate in excess of 60000 calls/sec before being CPU-bound on the server.

    Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each
    CPU, and ip_map_lookup() was taking 2.9%. This patch drops both contributions
    into the profile noise.

    Note that the above result overstates the value of this patch for most
    workloads. The synthetic clients are all using separate IP addresses, so
    there are 64 entries in the ip_map cache hash. Because the kernel measured
    contained the bug fixed in commit

    commit 1f1e030bf75774b6a283518e1534d598e14147d4

    and was running on a 64-bit little-endian machine, probably all of those 64
    entries were on a single chain, thus increasing the cost of ip_map_lookup().

    With a modern kernel you would need more clients to see the same amount of
    performance improvement. This patch has helped to scale knfsd to handle a
    deployment with 2000 NFS clients.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
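
The optimisation above is a per-connection cache in front of a shared lookup table: the peer of a connected socket does not change, so the lookup result can be kept on the connection. A minimal sketch of the idea follows. Every name in it is hypothetical, and the locking, reference counting, and invalidation that a real cache needs are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct auth_entry {
    const char *addr;
    int client_id;
};

/* Hypothetical shared table standing in for a global lookup cache. */
static struct auth_entry table[] = {
    { "10.0.0.1", 1 },
    { "10.0.0.2", 2 },
};

static unsigned long slow_lookups;  /* instrumentation for the sketch only */

/* The expensive path: one string comparison per entry on every call. */
static struct auth_entry *slow_lookup(const char *addr)
{
    size_t i;

    slow_lookups++;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].addr, addr) == 0)
            return &table[i];
    return NULL;
}

struct conn {
    const char *peer_addr;      /* fixed for the life of the connection */
    struct auth_entry *cached;  /* NULL until the first successful lookup */
};

/* Per-request path: consult the shared table only on a cache miss. */
static struct auth_entry *conn_auth(struct conn *c)
{
    assert(c != NULL);
    if (c->cached == NULL)
        c->cached = slow_lookup(c->peer_addr);
    return c->cached;
}
```

After the first request on a connection, later requests skip the string comparisons entirely, which is where the strcmp() and ip_map_lookup() profile entries in the commit message came from.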
     

02 Oct, 2006

8 commits

  • Split out the list of idle threads and pending sockets from svc_serv into a
    new svc_pool structure, and allocate a fixed number (in this patch, 1) of
    pools per svc_serv. The new structure contains a lock which takes over
    several of the duties of svc_serv->sv_lock, which is now relegated to
    protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

    The point is to move the hottest fields out of svc_serv and into svc_pool,
    allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
    This is a major step towards making the NFS server NUMA-friendly.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Convert the svc_sock->sk_reserved variable from an int protected by
    svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we
    need to take the (effectively global) svc_serv->sv_lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
    instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the
    number of places we need to take the svc_serv lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Convert the svc_sock->sk_inuse counter from an int protected by
    svc_serv->sv_lock, to an atomic. This reduces the number of places we need to
    take the (effectively global) svc_serv->sv_lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Following are 11 patches from Greg Banks which combine to make knfsd more
    NUMA-aware. They reduce hitting on 'global' data structures, and create some
    data-structures that can be node-local.

    knfsd threads are bound to a particular node, and the thread to handle a new
    request is chosen from the threads that are attached to the node that received
    the interrupt.

    The distribution of threads across nodes can be controlled by a new file in
    the 'nfsd' filesystem, though the default approach of an even spread is
    probably fine for most sites.

    Some (old) numbers that show the efficacy of these patches: N == number of
    NICs == number of CPUs == number of clients. Number of NUMA nodes == N/2

          Throughput, MiB/s    CPU usage, % (max=N*100)
      N   Before    After      Before    After
    ---   ------    -----      ------    -----
      4      312      435         350      228
      6      500      656         501      418
      8      562      804         690      589

    This patch:

    Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
    a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
    the amount of work that needs to be done in the main RPC loop and the length
    of time we need to hold the (effectively global) svc_serv->sv_lock.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • It isn't needed as it is available in rqstp->rq_server, and dropping it allows
    some local vars to be dropped.

    [akpm@osdl.org: build fix]
    Cc: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Userspace should create and bind (but not connect) a socket and write the
    'fd' to portlist. This will cause the nfs server to listen on that socket.

    To close a socket, the name of the socket - as read from 'portlist' can be
    written to 'portlist' with a preceding '-'.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This file will list all ports that nfsd has open.
    Default when TCP enabled will be
    ipv4 udp 0.0.0.0 2049
    ipv4 tcp 0.0.0.0 2049

    Later, the list of ports will be settable.

    'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
    non-mainline patches which created 'ports' with different semantics.

    [akpm@osdl.org: cleanups, build fix]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
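
The mark-and-sweep aging mentioned in this group of patches can be sketched as one function run from a periodic timer: a connection still marked from the previous pass is closed, and every other connection is marked. The names below are hypothetical, and the sketch leaves out the locking and the timer itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
    bool old;     /* set by the sweep, cleared by any activity */
    bool closed;
};

/* Any traffic on the connection clears the mark. */
static void conn_touch(struct conn *c)
{
    c->old = false;
}

/*
 * One timer pass. Connections that kept their mark since the previous
 * pass have been idle for a whole interval and are closed; the rest are
 * marked for the next pass. Returns the number of connections closed.
 */
static size_t age_connections(struct conn *conns, size_t n)
{
    size_t i, closed = 0;

    assert(conns != NULL || n == 0);
    for (i = 0; i < n; i++) {
        struct conn *c = &conns[i];

        if (c->closed)
            continue;
        if (c->old) {
            c->closed = true;
            closed++;
        } else {
            c->old = true;
        }
    }
    return closed;
}
```

With a fixed timer interval, an idle connection is closed after at least one and at most two intervals, and the per-request path only has to clear a flag instead of scanning for stale connections.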
     

21 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Ingo Molnar
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds