16 Dec, 2009

1 commit


24 Nov, 2009

1 commit


15 Jul, 2009

1 commit

  • The version 4.1 DRC memory limit and tracking variables are server wide and
    session specific. Replace struct svc_serv fields with globals.
    Stop using the svc_serv sv_lock.

    Add a spinlock to serialize access to the DRC limit management variables which
    change on session creation and deletion (usage counter) or (future)
    administrative action to adjust the total DRC memory limit.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy

    Andy Adamson
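A minimal userspace sketch of the locking pattern described above, assuming a fixed total budget: one global limit, one global usage counter, and a lock that serializes updates on session creation and deletion. A C11 `atomic_flag` stands in for the kernel spinlock. `drc_lock`, `drc_max_mem`, `drc_mem_used`, `drc_reserve()` and `drc_release()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical globals; an atomic_flag stands in for a kernel spinlock. */
static atomic_flag drc_lock = ATOMIC_FLAG_INIT;
static size_t drc_max_mem = 1 << 20; /* total DRC budget in bytes (illustrative) */
static size_t drc_mem_used;          /* sum of all sessions' reservations */

static void drc_spin_lock(void)
{
    /* Busy-wait until the flag was previously clear, as a spinlock does. */
    while (atomic_flag_test_and_set_explicit(&drc_lock, memory_order_acquire))
        ;
}

static void drc_spin_unlock(void)
{
    atomic_flag_clear_explicit(&drc_lock, memory_order_release);
}

/* Session creation: grant up to 'want' bytes, less if the budget is short. */
static size_t drc_reserve(size_t want)
{
    size_t avail, granted;

    drc_spin_lock();
    avail = drc_max_mem - drc_mem_used;
    granted = want < avail ? want : avail;
    drc_mem_used += granted;
    drc_spin_unlock();
    return granted;
}

/* Session deletion: give back exactly what drc_reserve() granted. */
static void drc_release(size_t granted)
{
    drc_spin_lock();
    assert(drc_mem_used >= granted);
    drc_mem_used -= granted;
    drc_spin_unlock();
}
```

Granting less than requested when the budget is short, rather than failing, is a choice made for this sketch; the commit message does not say how a shortfall is handled.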
     

18 Jun, 2009

3 commits

  • This svc_xprt is passed on to the callback service thread, to be used later
    to process incoming svc_rqsts.

    Signed-off-by: Benny Halevy

    Andy Adamson
     
  • Implement the NFSv4.1 backchannel service. It invokes the common callback
    processing logic svc_process_common() to authenticate the call and
    dispatch the appropriate NFSv4.1 XDR decoder and operation procedure.
    It then invokes bc_send() to send the reply over the same connection.
    bc_send() is implemented in a separate patch.

    At this time there is no slot validation or reply cache handling.

    [nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy
    [Move bc_svc_process() declaration to correct patch]
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy

    Ricardo Labiaga
     
  • Adds a new list of rpc_xprt structures and a readers/writers lock to
    protect the list. The list is used to preallocate resources for
    the backchannel during backchannel requests. Callbacks are not
    expected to cause significant latency, so only one callback will
    be allowed at this time.

    It also adds a pointer to the NFS callback service so that
    requests can be directed to it for processing.

    New callback members added to svc_serv. The NFSv4.1 callback service will
    sleep on the svc_serv->svc_cb_waitq until new callback requests arrive.
    The request will be queued in svc_serv->svc_cb_list. This patch adds this
    list, the sleep queue and spinlock to svc_serv.

    [nfs41: NFSv4.1 callback support]
    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Benny Halevy

    Ricardo Labiaga
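The callback list added to svc_serv can be pictured as a lock-protected FIFO: the transport appends incoming callback requests and the callback service thread removes them in arrival order. This sketch assumes that shape. `struct cb_req`, `struct cb_queue`, `cb_enqueue()` and `cb_dequeue()` are hypothetical, an `atomic_flag` stands in for the spinlock, and the wait queue is left out: the consumer returns NULL instead of sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued callback request. */
struct cb_req {
    int xid;
    struct cb_req *next;
};

/* Hypothetical per-service callback list protected by a spinlock. */
struct cb_queue {
    atomic_flag lock;
    struct cb_req *head, *tail;
};

static void cb_lock(struct cb_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void cb_unlock(struct cb_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer: append a request at the tail. A real service would also
 * wake the callback thread sleeping on its wait queue here. */
static void cb_enqueue(struct cb_queue *q, struct cb_req *req)
{
    req->next = NULL;
    cb_lock(q);
    if (q->tail)
        q->tail->next = req;
    else
        q->head = req;
    q->tail = req;
    cb_unlock(q);
}

/* Consumer: remove the oldest request, or return NULL if the list is
 * empty (where the real thread would sleep until a request arrives). */
static struct cb_req *cb_dequeue(struct cb_queue *q)
{
    struct cb_req *req;

    cb_lock(q);
    req = q->head;
    if (req) {
        q->head = req->next;
        if (!q->head)
            q->tail = NULL;
    }
    cb_unlock(q);
    return req;
}
```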
     

07 Apr, 2009

1 commit

  • * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
    nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
    nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
    nfsd41: Documentation/filesystems/nfs41-server.txt
    nfsd41: CREATE_EXCLUSIVE4_1
    nfsd41: SUPPATTR_EXCLCREAT attribute
    nfsd41: support for 3-word long attribute bitmask
    nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
    nfsd41: pass writable attrs mask to nfsd4_decode_fattr
    nfsd41: provide support for minor version 1 at rpc level
    nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
    nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
    nfsd41: access_valid
    nfsd41: clientid handling
    nfsd41: check encode size for sessions maxresponse cached
    nfsd41: stateid handling
    nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
    nfsd41: destroy_session operation
    nfsd41: non-page DRC for solo sequence responses
    nfsd41: Add a create session replay cache
    nfsd41: create_session operation
    ...

    Linus Torvalds
     

04 Apr, 2009

2 commits

  • Use no more than 1/128th of the number of free pages at nfsd startup for the
    v4.1 DRC.

    This is an arbitrary default which should probably end up under the control
    of an administrator.

    Signed-off-by: Andy Adamson
    [moved added fields in struct svc_serv under CONFIG_NFSD_V4_1]
    Signed-off-by: Benny Halevy
    [fix set_max_drc calculation of sv_drc_max_pages]
    [moved NFSD_DRC_SIZE_SHIFT's declaration up in header file]
    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Andy Adamson
     
  • On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
    returned. It is up to the NFSv4.1 client to resend only the operations that
    have not been processed.

    Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in
    nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

    Note: this isn't an adequate solution on its own. It's acceptable as a way
    to get some minimal 4.1 up and working, but we're going to have to find a
    way to avoid returning DELAY in all common cases before 4.1 can really be
    considered ready.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfsd41: reverse rq_nodeferral negative logic]
    Signed-off-by: Benny Halevy
    [sunrpc: initialize rq_usedeferral]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Andy Adamson
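As a worked example of the 1/128th default: since 128 = 2^7, the limit is the free-page count shifted right by 7. With 4 KiB pages and 1 GiB free (262144 pages), the DRC would get 2048 pages, or 8 MiB. `DRC_SHIFT` and `drc_max_pages()` are illustrative names; the log mentions NFSD_DRC_SIZE_SHIFT, but this sketch does not assume its definition.

```c
#include <assert.h>

/* 128 == 1 << 7, so dividing by 128 is a right shift by 7.
 * DRC_SHIFT and drc_max_pages() are illustrative, not the kernel's names. */
#define DRC_SHIFT 7

static unsigned long drc_max_pages(unsigned long free_pages)
{
    return free_pages >> DRC_SHIFT;
}
```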
     

29 Mar, 2009

2 commits

  • Since an RPC service listener's protocol family is specified now via
    svc_create_xprt(), it no longer needs to be passed to svc_create() or
    svc_create_pooled(). Remove that argument from the synopsis of those
    functions, and remove the sv_family field from the svc_serv struct.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The sv_family field is going away. Instead of using sv_family, have
    the svc_register() function take a protocol family argument.

    Since this argument represents a protocol family, and not an address
    family, this argument takes an int, as this is what is passed to
    sock_create_kern(). Also make sure svc_register's helpers are
    checking for PF_FOO instead of AF_FOO. The values of [AP]F_FOO are
    equivalent; this is simply a symbolic change to reflect the semantics
    of the value stored in that variable.

    sock_create_kern() should return EPFNOSUPPORT if the passed-in
    protocol family isn't supported, but it uses EAFNOSUPPORT for this
    case. We will stick with that tradition here, as svc_register()
    is called by the RPC server in the same path as sock_create_kern().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
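A small sketch of the error convention described above: a registration helper that accepts the protocol families it supports and reports anything else as EAFNOSUPPORT, as sock_create_kern() does. `check_register_family()` is hypothetical, and limiting support to PF_INET and PF_INET6 is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: takes a protocol family (an int, as passed to
 * socket creation) and rejects unsupported ones with -EAFNOSUPPORT. */
static int check_register_family(int family)
{
    switch (family) {
    case PF_INET:
    case PF_INET6:
        return 0;
    default:
        return -EAFNOSUPPORT;
    }
}
```

Because PF_INET and AF_INET have the same value, switching the checks from AF_FOO to PF_FOO changes only what the code says, not what it does.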
     

19 Mar, 2009

2 commits

  • Add /proc/fs/nfsd/pool_stats to export to userspace various
    statistics about the operation of rpc server thread pools.

    This patch is based on a forward-ported version of
    knfsd-add-pool-thread-stats which has been shipping in the SGI
    "Enhanced NFS" product since 2006 and which was previously
    posted:

    http://article.gmane.org/gmane.linux.nfs/10375

    It has also been updated thus:

    * moved EXPORT_SYMBOL() to near the function it exports
    * made the new struct seq_operations const
    * used SEQ_START_TOKEN instead of ((void *)1)
    * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
    printk in svc_pool_stats_*()" by Harshula Jayasuriya.
    * merged fix from SGI PV 964001 "Crash reading pool_stats before
    nfsds are started".

    Signed-off-by: Greg Banks
    Signed-off-by: Harshula Jayasuriya
    Signed-off-by: J. Bruce Fields

    Greg Banks
     
  • Avoid overloading the CPU scheduler with enormous load averages
    when handling high call-rate NFS loads. When the knfsd bottom half
    is made aware of an incoming call by the socket layer, it tries to
    choose an nfsd thread and wake it up. As long as there are idle
    threads, one will be woken up.

    If there are a lot of nfsd threads (a sensible configuration when
    the server is disk-bound or is running an HSM), there will be many
    more nfsd threads than CPUs to run them. Under a high call-rate
    low service-time workload, the result is that almost every nfsd is
    runnable, but only a handful are actually able to run. This situation
    causes two significant problems:

    1. The CPU scheduler takes over 10% of each CPU, which is robbing
    the nfsd threads of valuable CPU time.

    2. At a high enough load, the nfsd threads starve userspace threads
    of CPU time, to the point where daemons like portmap and rpc.mountd
    do not schedule for tens of seconds at a time. Clients attempting
    to mount an NFS filesystem time out at the very first step (opening
    a TCP connection to portmap) because portmap cannot wake up from
    select() and call accept() in time.

    Disclaimer: these effects were observed on a SLES9 kernel; modern
    kernels' schedulers may behave more gracefully.

    The solution is simple: keep in each svc_pool a counter of the number
    of threads which have been woken but have not yet run, and do not wake
    any more if that count reaches an arbitrary small threshold.

    Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
    synthetic client threads simulating an rsync (i.e. recursive directory
    listing) workload reading from an i386 RH9 install image (161480
    regular files in 10841 directories) on the server. That tree is small
    enough to fit in the server's RAM, so no disk traffic was involved.
    This setup gives a sustained call rate in excess of 60000 calls/sec
    before being CPU-bound on the server. The server was running 128 nfsds.

    Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
    taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
    Load average was over 120 before the patch, and 20.9 after.

    This patch is a forward-ported version of knfsd-avoid-nfsd-overload
    which has been shipping in the SGI "Enhanced NFS" product since 2006.
    It has been posted before:

    http://article.gmane.org/gmane.linux.nfs/10374

    Signed-off-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Greg Banks
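The wake-up throttle can be sketched as a per-pool counter of threads that have been woken but have not yet run. `struct pool`, `pool_try_wake()`, `pool_thread_running()` and the cap `MAX_WAKING` are hypothetical; the commit only says the threshold is arbitrary and small. The sketch is single-threaded; a real pool would update these counters under its lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative cap for "an arbitrary small threshold". */
#define MAX_WAKING 5

/* Hypothetical, simplified pool state. */
struct pool {
    int idle;   /* threads sleeping, waiting for work */
    int waking; /* threads woken but not yet running */
};

/* Called when a new call arrives: wake an idle thread unless too many
 * woken threads are still waiting for a CPU. Returns true if one was woken. */
static bool pool_try_wake(struct pool *p)
{
    if (p->idle == 0 || p->waking >= MAX_WAKING)
        return false;
    p->idle--;
    p->waking++;
    return true;
}

/* Called by a woken thread once it actually gets to run. */
static void pool_thread_running(struct pool *p)
{
    assert(p->waking > 0);
    p->waking--;
}
```

Even with 128 idle threads, a burst of calls wakes at most MAX_WAKING of them until some actually run. The sketch does not model what happens to a call when no thread is woken; presumably it waits for a thread that becomes free.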
     

07 Jan, 2009

1 commit

  • svc_check_conn_limits() attempts to prevent denial of service attacks
    by having the service close old connections once it reaches a
    threshold. This threshold is based on the number of threads in the
    service:

    (serv->sv_nrthreads + 3) * 20

    Once we reach this, we drop the oldest connections and a printk pops up
    to warn the admin that they should increase the number of threads.

    Increasing the number of threads isn't an option however for services
    like lockd. We don't want to eliminate this check entirely for such
    services but we need some way to increase this limit.

    This patch adds a sv_maxconn field to the svc_serv struct. When it's
    set to 0, we use the current method to calculate the max number of
    connections. RPC services can then set this on an as-needed basis.

    Signed-off-by: Jeff Layton
    Acked-by: Neil Brown
    Signed-off-by: J. Bruce Fields

    Jeff Layton
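The limit logic reduces to a single expression. `struct serv_limits` and `max_connections()` are hypothetical stand-ins for the sv_nrthreads and sv_maxconn fields.

```c
#include <assert.h>

/* Hypothetical mirror of the two svc_serv fields involved. */
struct serv_limits {
    unsigned int nrthreads; /* like sv_nrthreads */
    unsigned int maxconn;   /* like sv_maxconn; 0 means "use the default" */
};

/* Explicit limit if set, otherwise the thread-based default. */
static unsigned int max_connections(const struct serv_limits *s)
{
    if (s->maxconn)
        return s->maxconn;
    return (s->nrthreads + 3) * 20;
}
```

For a single-threaded service such as lockd, the default works out to (1 + 3) * 20 = 80 connections; setting the max-connections field raises that without adding threads.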
     

30 Sep, 2008

3 commits

  • Clean up: Add extra type safety and squelch a few compiler complaints
    in upcoming patches.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • In order to advertise NFS-related services on IPv6 interfaces via
    rpcbind, the kernel RPC server implementation must use
    rpcb_v4_register() instead of rpcb_register().

    A new kernel build option allows distributions to use the legacy
    v2 call until they integrate an appropriate user-space rpcbind
    daemon that can support IPv6 RPC services.

    I tried adding some automatic logic to fall back if registering
    with a v4 protocol request failed, but there are too many corner
    cases. So I just made it a compile-time switch that distributions
    can throw when they've replaced portmapper with rpcbind.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce and initialize an address family field in the svc_serv structure.

    This field will determine what family to use for the service's listener
    sockets and what families are advertised via the local rpcbind daemon.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

24 Jun, 2008

2 commits

  • Since we no longer distinguish between shutdown signals in nfsd, it
    becomes easier to just standardize on a particular signal to use to
    bring it down (SIGINT, in this case).

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • This patch is rather large, but I couldn't figure out a way to break it
    up that would remain bisectable. It does several things:

    - change svc_thread_fn typedef to better match what kthread_create expects
    - change svc_pool_map_set_cpumask to be more kthread friendly. Make it
    take a task arg and get rid of the "oldmask"
    - have svc_set_num_threads call kthread_create directly
    - eliminate __svc_create_thread

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

24 Apr, 2008

1 commit


11 Feb, 2008

1 commit

  • This is a void function attempting to return the return value from
    another void function, which seems harmless but extremely weird, and
    apparently makes some compilers complain.

    While we're there, clean up a little (e.g. the switch statement had a
    minor style problem and seemed overkill as long as there's only one
    case).

    Thanks to Trond for noticing this.

    Signed-off-by: J. Bruce Fields
    Cc: Trond Myklebust

    J. Bruce Fields
     

02 Feb, 2008

6 commits

  • Move the initialization in __svc_create_thread that happens prior to
    thread creation to a new function. Export the function to allow
    services to have better control over the svc_rqst structs.

    Also rearrange the rqstp initialization to prevent NULL pointer
    dereferences in svc_exit_thread in case allocations fail.

    Signed-off-by: Jeff Layton
    Reviewed-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Some transports have a header in front of the RPC header. The current
    defer/revisit processing considers only the iov_len and arg_len to
    determine how much to back up when saving the original request
    to revisit. Add a field to the rqstp structure to save the size
    of the transport header so svc_defer can correctly compute
    the start of a request.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This functionally empty patch removes rq_sock and the unnamed union
    from rqstp structure.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • This patch moves the transport independent sk_deferred list to the svc_xprt
    structure and updates the svc_deferred_req structure to keep pointers to
    svc_xprt's directly. The deferral processing code is also moved out of the
    transport dependent recvfrom functions and into the generic svc_recv path.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The svc_sock_release function releases pages allocated to a thread. For
    UDP this frees the receive skb. For RDMA it will post a receive WR
    and bump the client credit count.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The rqstp structure contains a pointer to the transport for the
    RPC request. This functionally trivial patch adds an unnamed union
    with pointers to both svc_sock and svc_xprt. Ultimately the
    union will be removed and only the rq_xprt field will remain. This
    allows incrementally extracting transport independent interfaces without
    one gigundo patch.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
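To illustrate why the transport header length matters, this sketch saves a deferred request by backing up from the start of the RPC header over the transport header before copying. `save_for_revisit()` and its parameters are hypothetical; `xprt_hlen` only plays the role of the new rqstp field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the transport header plus the RPC message into 'out'.
 * 'rpc' points at the RPC header inside the receive buffer; the
 * transport header occupies the 'xprt_hlen' bytes just before it.
 * Returns the number of bytes saved, or 0 if 'out' is too small. */
static size_t save_for_revisit(const unsigned char *rpc, size_t rpc_len,
                               size_t xprt_hlen, unsigned char *out,
                               size_t out_sz)
{
    size_t total = xprt_hlen + rpc_len;
    const unsigned char *start = rpc - xprt_hlen; /* back up over the header */

    if (total > out_sz)
        return 0;
    memcpy(out, start, total);
    return total;
}
```

Without the header length, the copy would start at the RPC header and the revisited request would be missing its transport header.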
     

18 Jul, 2007

2 commits

  • We want it to be possible for users to restrict exports both by IP address and
    by pseudoflavor. The pseudoflavor information has previously been passed
    using special auth_domains stored in the rq_client field. After the preceding
    patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
    now we use rq_client for the ip information, as auth_null and auth_unix do.

    However, we keep around the special auth_domain in the rq_gssclient field for
    backwards compatibility purposes, so we can still do upcalls using the old
    "gss/pseudoflavor" auth_domain if upcalls using the unix domain fail to give
    us an appropriate export. This allows us to continue supporting old mountd.

    In fact, for this first patch, we always use the "gss/pseudoflavor"
    auth_domain (and only it) if it is available; thus rq_client is ignored in the
    auth_gss case, and this patch on its own makes no change in behavior; that
    will be left to later patches.

    Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
    upcall by a dummy value--no version of idmapd has ever used it, and it's
    unlikely anyone really wants to perform idmapping differently depending on
    where the client is (they may want to perform *credential* mapping
    differently, but that's a different matter--the idmapper just handles id's
    used in getattr and setattr). But I'm updating the idmapd code anyway, just
    out of general backwards-compatibility paranoia.

    Signed-off-by: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • Add a new field to the svc_rqst structure to record the pseudoflavor that the
    request was made with. For now we record the pseudoflavor but don't use it
    for anything.

    Signed-off-by: Andy Adamson
    Signed-off-by: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Adamson
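The selection rule for the first of these patches, as described above, can be written as a one-line choice: use the "gss/pseudoflavor" domain when there is one, otherwise the IP-based domain. `struct req_domains` and `export_domain()` are hypothetical, and the domain strings in the test are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request carrying only the two domain names. */
struct req_domains {
    const char *client;    /* like rq_client: IP-based domain */
    const char *gssclient; /* like rq_gssclient: "gss/<pseudoflavor>" domain */
};

/* Prefer the gss domain when present, so rq_client is ignored for
 * auth_gss requests and behavior is unchanged. */
static const char *export_domain(const struct req_domains *r)
{
    return r->gssclient ? r->gssclient : r->client;
}
```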
     

10 Jul, 2007

1 commit


10 May, 2007

1 commit

  • When the kernel calls svc_reserve to downsize the expected size of an RPC
    reply, it fails to account for the possibility of a checksum at the end of
    the packet. If a client mounts an NFSv2/3 export with sec=krb5i/p and does
    I/O, then you'll generally see messages similar to this in the server's ring
    buffer:

    RPC request reserved 164 but used 208

    While I was never able to verify it, I suspect that this problem is also
    the root cause of some oopses I've seen under these conditions:

    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

    This is probably also a problem for other sec= types and for NFSv4. The
    large reserved size for NFSv4 compound packets seems to generally paper
    over the problem, however.

    This patch adds a wrapper for svc_reserve that accounts for the possibility
    of a checksum. It also fixes up the appropriate callers of svc_reserve to
    call the wrapper. For now, it just uses a hardcoded value that I
    determined via testing. That value may need to be revised upward as things
    change, or we may want to eventually add a new auth_op that attempts to
    calculate this somehow.

    Unfortunately, there doesn't seem to be a good way to reliably determine
    the expected checksum length prior to actually calculating it, particularly
    with schemes like spkm3.

    Signed-off-by: Jeff Layton
    Acked-by: Neil Brown
    Cc: Trond Myklebust
    Acked-by: J. Bruce Fields
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
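A sketch of the wrapper idea: pad the reservation only when a trailing checksum may follow the reply. `struct rqst`, `reserve()`, `reserve_with_checksum()` and `CHECKSUM_SLACK` are hypothetical; the commit does not state its hardcoded value, so 400 is only a placeholder.

```c
#include <assert.h>

/* Hypothetical, simplified request. */
struct rqst {
    int integrity_or_privacy; /* nonzero for sec=krb5i / krb5p */
    int reserved;             /* bytes reserved for the reply */
};

/* Placeholder slack; the real value was determined by testing. */
#define CHECKSUM_SLACK 400

/* Stand-in for svc_reserve(): record the expected reply size. */
static void reserve(struct rqst *rq, int space)
{
    rq->reserved = space;
}

/* Wrapper: add room for a trailing checksum when one may be present. */
static void reserve_with_checksum(struct rqst *rq, int space)
{
    int added = 0;

    if (rq->integrity_or_privacy)
        added = CHECKSUM_SLACK;
    reserve(rq, space + added);
}
```

With the figures from the log, a 164-byte reservation that ends up using 208 bytes overruns by 44 bytes, so any slack of at least 44 bytes covers that case.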
     

07 Mar, 2007

1 commit


13 Feb, 2007

5 commits


27 Jan, 2007

1 commit

  • NFSd assumes that the largest number of pages that will be needed for a
    request+response is 2+N where N pages is the size of the largest permitted
    read/write request. The '2' are 1 for the non-data part of the request, and 1
    for the non-data part of the reply.

    However, when a read request is not page-aligned, and we choose to use
    ->sendfile to send it directly from the page cache, we may need N+1 pages to
    hold the whole reply. This can overflow an array and cause an Oops.

    This patch increases the size of the array for holding pages by one and makes sure
    that entry is NULL when it is not in use.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
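A worked example of the N+1 case, assuming 4 KiB pages: an 8-page read that starts 100 bytes into a page touches 9 pages, so the array needs 2 + 8 + 1 = 11 entries. `PAGE_SZ`, `pages_spanned()` and `max_pages()` are illustrative names.

```c
#include <assert.h>

#define PAGE_SZ 4096UL /* assumed page size for the example */

/* Number of pages touched by 'len' bytes starting at byte 'offset'. */
static unsigned long pages_spanned(unsigned long offset, unsigned long len)
{
    unsigned long start = offset % PAGE_SZ; /* offset within the first page */

    return (start + len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Page-array size for an N-page maximum payload: one page for the
 * non-data part of the request, one for the reply, N for data, and
 * one extra for an unaligned read sent straight from the page cache. */
static unsigned long max_pages(unsigned long n)
{
    return 2 + n + 1;
}
```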
     

21 Oct, 2006

1 commit


06 Oct, 2006

1 commit

  • There is some confusion about the meaning of 'bufsz' for a sunrpc server.
    In some cases it is the largest message that can be sent or received. In
    other cases it is the largest 'payload' that can be included in a NFS
    message.

    In either case, it is not possible for both the request and the reply to be
    this large. One of the request or reply may only be one page long, which
    fits nicely with NFS.

    So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
    'max_mesg'. max_payload is the size that the server requests. It is used
    by the server to check the max size allowed on a particular connection:
    depending on the protocol a lower limit might be used.

    max_mesg is the largest single message that can be sent or received. It is
    calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
    with PAGE_SIZE added for overhead. Only one of the request and reply may be
    this size. The other must be at most one page.

    Cc: Greg Banks
    Cc: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
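The max_mesg calculation, assuming a 4 KiB page: a 32 KiB max_payload is already page-aligned, so max_mesg is 32768 + 4096 = 36864 bytes, while a 1000-byte payload rounds up to one page, giving 8192. `PAGE_SZ` and `max_mesg()` are illustrative names.

```c
#include <assert.h>

#define PAGE_SZ 4096UL /* assumed page size for the example */

/* Round max_payload up to a whole number of pages, then add one page
 * for overhead, following the description of max_mesg above. */
static unsigned long max_mesg(unsigned long max_payload)
{
    unsigned long rounded = (max_payload + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;

    return rounded + PAGE_SZ;
}
```

Only one of the request and reply can reach this size; the other is limited to a single page.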