30 Mar, 2009

1 commit


08 Jan, 2009

1 commit


30 Sep, 2008

6 commits

  • With the new rpcbind code, a PMAP_UNSET will not have any effect on
    services registered via rpcbind v3 or v4.

    Implement a version of svc_unregister() that uses an RPCB_UNSET with
    an empty netid string to make sure we have cleared *all* entries for
    a kernel RPC service when shutting down, or before starting a fresh
    instance of the service.

    Use the new version only when CONFIG_SUNRPC_REGISTER_V4 is enabled;
    otherwise, the legacy PMAP version is used to ensure complete
    backwards-compatibility with the Linux portmapper daemon.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
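
    The effect of an unset request with an empty netid can be pictured
    with a toy registration table. This is only a sketch of the matching
    rule described in the message above: the struct and function names
    are hypothetical, and it is not the rpcbind or kernel code.

```c
#include <string.h>

/* Toy model of one registration entry; all names here are hypothetical. */
struct reg_entry {
    unsigned long prog;
    unsigned long vers;
    const char *netid;   /* "udp", "tcp6", ... */
    int in_use;
};

/*
 * Model of an unset request: remove entries matching (prog, vers).
 * An empty netid acts as a wildcard and clears every matching entry;
 * a non-empty netid removes only the entry for that one transport.
 * Returns the number of entries removed.
 */
int model_unset(struct reg_entry *tab, int n,
                unsigned long prog, unsigned long vers, const char *netid)
{
    int removed = 0;

    for (int i = 0; i < n; i++) {
        if (!tab[i].in_use || tab[i].prog != prog || tab[i].vers != vers)
            continue;
        if (netid[0] != '\0' && strcmp(tab[i].netid, netid) != 0)
            continue;
        tab[i].in_use = 0;
        removed++;
    }
    return removed;
}
```

    Calling model_unset() with "" for a service registered under both
    "udp" and "tcp6" removes both entries in one pass, which is the
    clean-slate behaviour wanted at shutdown and before a restart.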
     
  • TI-RPC is a user-space library of RPC functions that replaces ONC RPC
    and allows RPC to operate in the new world of IPv6.

    TI-RPC combines the concept of a transport protocol (UDP and TCP)
    and a protocol family (PF_INET and PF_INET6) into a single identifier
    called a "netid." For example, "udp" means UDP over IPv4, and "udp6"
    means UDP over IPv6.

    For rpcbind, then, the RPC service tuple that is registered and
    advertised is:

    [RPC program, RPC version, service address and port, netid]

    instead of

    [RPC program, RPC version, port, protocol]

    Service address is typically ANYADDR, but can be a specific address
    of one of the interfaces on a multi-homed host. The third item in
    the new tuple is expressed as a universal address.

    The current Linux rpcbind implementation registers a netid for both
    protocol families when RPCB_SET is done for just the PF_INET6 version
    of the netid (i.e. udp6 or tcp6). So registering "udp6" causes a
    registration for "udp" to appear automatically as well.

    We've recently determined that this is incorrect behavior. In the
    TI-RPC world, "udp6" is not meant to imply that the registered RPC
    service handles requests from AF_INET as well, even if the listener
    socket does address mapping. "udp" and "udp6" are entirely separate
    capabilities, and must be registered separately.

    The Linux kernel, unlike TI-RPC, leverages address mapping to allow a
    single listener socket to handle requests for both AF_INET and AF_INET6.
    This is still OK, but the kernel currently assumes registering "udp6"
    will cover "udp" as well. It registers only "udp6" for its AF_INET6
    services, even though they handle both AF_INET and AF_INET6 on the same
    port.

    So svc_register() actually needs to register both "udp" and "udp6"
    explicitly (and likewise for TCP). Until rpcbind is fixed, the
    kernel can ignore the return code for the second RPCB_SET call.

    Please merge this with commit 15231312:

    SUNRPC: Support IPv6 when registering kernel RPC services

    Signed-off-by: Chuck Lever
    Cc: Olaf Kirch
    Signed-off-by: J. Bruce Fields

    Chuck Lever
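
    A minimal user-space sketch of the netid idea and of registering
    both netids for one dual-stack listener. The helper names and the
    callback type are hypothetical. The order of the two calls (IPv6
    netid first, IPv4 netid second with its result ignored) is an
    assumption drawn from the message above, not the kernel code.

```c
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */
#include <netinet/in.h>   /* IPPROTO_UDP, IPPROTO_TCP */

/* Hypothetical helper: map (family, protocol) to a netid string. */
const char *netid_for(int family, int protocol)
{
    if (family == AF_INET)
        return protocol == IPPROTO_UDP ? "udp" :
               protocol == IPPROTO_TCP ? "tcp" : NULL;
    if (family == AF_INET6)
        return protocol == IPPROTO_UDP ? "udp6" :
               protocol == IPPROTO_TCP ? "tcp6" : NULL;
    return NULL;
}

/* Hypothetical stand-in for one RPCB_SET call; returns 0 on success. */
typedef int (*rpcb_set_fn)(const char *netid, unsigned short port);

/*
 * Register a listener that serves both families on one port.
 * The first registration must succeed. The second result is ignored,
 * since an rpcbind that already added the IPv4 netid automatically
 * may reject the duplicate.
 */
int register_dual(int protocol, unsigned short port, rpcb_set_fn set)
{
    const char *v4 = netid_for(AF_INET, protocol);
    const char *v6 = netid_for(AF_INET6, protocol);
    int err;

    if (v4 == NULL || v6 == NULL)
        return -1;          /* unsupported protocol */
    err = set(v6, port);
    if (err)
        return err;
    (void)set(v4, port);    /* return code deliberately ignored */
    return 0;
}
```

    With this shape, "udp" and "udp6" stay separate capabilities in the
    registry even though a single socket handles both.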
     
  • In order to advertise NFS-related services on IPv6 interfaces via
    rpcbind, the kernel RPC server implementation must use
    rpcb_v4_register() instead of rpcb_register().

    A new kernel build option allows distributions to use the legacy
    v2 call until they integrate an appropriate user-space rpcbind
    daemon that can support IPv6 RPC services.

    I tried adding some automatic logic to fall back if registering
    with a v4 protocol request failed, but there are too many corner
    cases. So I just made it a compile-time switch that distributions
    can throw when they've replaced portmapper with rpcbind.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Create a separate server-level interface for unregistering RPC services.

    The mechanics of, and the API for, registering and unregistering RPC
    services will diverge further as support for IPv6 is added.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Bruce suggested there's no need to expose the difference between an error
    sending the PMAP_SET request and an error reply from the portmapper to
    rpcb_register's callers. The user space equivalent of rpcb_register() is
    pmap_set(3), which returns a bool_t : either the PMAP set worked, or it
    didn't. Simple.

    So let's remove the "*okay" argument from rpcb_register() and
    rpcb_v4_register(), and simply return an error if any part of the call
    didn't work.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce and initialize an address family field in the svc_serv structure.

    This field will determine what family to use for the service's listener
    sockets and what families are advertised via the local rpcbind daemon.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     

26 Jul, 2008

1 commit


21 Jul, 2008

1 commit


19 Jul, 2008

1 commit

  • * This patch replaces the dangerous lvalue version of cpumask_of_cpu
    with new cpumask_of_cpu_ptr macros. These are patterned after the
    node_to_cpumask_ptr macros.

    In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
    the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map
    is provided when there is a large NR_CPUS count, reducing
    greatly the amount of code generated and stack space used for
    cpumask_of_cpu(). The pointer to the cpumask_t value is needed for
    calling set_cpus_allowed_ptr() to reduce the amount of stack space
    needed to pass the cpumask_t value.

    If there isn't a cpumask_of_cpu_map[], then a temporary variable is
    declared and filled in with value from cpumask_of_cpu(cpu) as well as
    a pointer variable pointing to this temporary variable. Afterwards,
    the pointer is used to reference the cpumask value. The compiler
    will optimize out the extra dereference through the pointer as well
    as the stack space used for the pointer, resulting in identical code.

    A good example of the orthogonal usages is in net/sunrpc/svc.c:

    case SVC_POOL_PERCPU:
    {
            unsigned int cpu = m->pool_to[pidx];
            cpumask_of_cpu_ptr(cpumask, cpu);

            *oldmask = current->cpus_allowed;
            set_cpus_allowed_ptr(current, cpumask);
            return 1;
    }
    case SVC_POOL_PERNODE:
    {
            unsigned int node = m->pool_to[pidx];
            node_to_cpumask_ptr(nodecpumask, node);

            *oldmask = current->cpus_allowed;
            set_cpus_allowed_ptr(current, nodecpumask);
            return 1;
    }

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
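
    The two variants described above can be modelled in user space
    roughly as follows. This is a sketch only: the one-word cpumask_t,
    the HAVE_CPUMASK_OF_CPU_MAP switch and mask_bits_for() are
    assumptions made for illustration, and the real kernel macros may
    differ in detail.

```c
#define NR_CPUS 8   /* hypothetical small value for the sketch */

typedef struct { unsigned long bits; } cpumask_t;

/* Build a mask with only `cpu` set, returned by value. */
cpumask_t cpumask_of_cpu(unsigned int cpu)
{
    cpumask_t m = { 1UL << cpu };
    return m;
}

#ifdef HAVE_CPUMASK_OF_CPU_MAP
/* Variant 1: a precomputed table exists, so point straight into it. */
extern const cpumask_t cpumask_of_cpu_map[NR_CPUS];
#define cpumask_of_cpu_ptr(v, cpu) \
    const cpumask_t *v = &cpumask_of_cpu_map[cpu]
#else
/* Variant 2: no table, so declare a temporary plus a pointer to it. */
#define cpumask_of_cpu_ptr(v, cpu) \
    cpumask_t _##v = cpumask_of_cpu(cpu); \
    const cpumask_t *v = &_##v
#endif

/* Caller code is written once and works with either variant. */
unsigned long mask_bits_for(unsigned int cpu)
{
    cpumask_of_cpu_ptr(mask, cpu);
    return mask->bits;
}
```

    Because the macro declares variables, it has to appear where a
    declaration is allowed, as in the svc.c excerpt above.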
     

24 Jun, 2008

3 commits

  • Since nfsd no longer makes any distinction between shutdown signals,
    it is easier to standardize on a single signal to bring it down
    (SIGINT, in this case).

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • This patch is rather large, but I couldn't figure out a way to break it
    up that would remain bisectable. It does several things:

    - change svc_thread_fn typedef to better match what kthread_create expects
    - change svc_pool_map_set_cpumask to be more kthread friendly. Make it
    take a task arg and get rid of the "oldmask"
    - have svc_set_num_threads call kthread_create directly
    - eliminate __svc_create_thread

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • This removes the BKL from the RPC service creation codepath. The BKL
    really isn't adequate for this job since some of this info needs
    protection across sleeps.

    Also, add some comments to try and clarify how the locking should work
    and to make it clear that the BKL isn't necessary as long as there is
    adequate locking between tasks when touching the svc_serv fields.

    Signed-off-by: Neil Brown
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Neil Brown
     

24 May, 2008

1 commit

  • * Pass reference to cpumask variable instead of using stack.

    For inclusion into sched-devel/latest tree.

    Based on:
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    + sched-devel/latest .../mingo/linux-2.6-sched-devel.git

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Mike Travis
     

09 May, 2008

1 commit


24 Apr, 2008

5 commits


20 Apr, 2008

1 commit

  • * Use new node_to_cpumask_ptr. This creates a pointer to the
    cpumask for a given node. This definition is in mm patch:

    asm-generic-add-node_to_cpumask_ptr-macro.patch

    * Use new set_cpus_allowed_ptr function.

    Depends on:
    [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
    [sched-devel]: sched: add new set_cpus_allowed_ptr function
    [x86/latest]: x86: add cpus_scnprintf function

    Cc: Greg Kroah-Hartman
    Cc: Greg Banks
    Cc: H. Peter Anvin
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     

02 Feb, 2008

6 commits

  • Clean up: When looping over RPC version and procedure numbers, use
    unsigned index variables.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Do it for the server code...

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields

    Trond Myklebust
     
  • Move the initialization in __svc_create_thread that happens prior to
    thread creation to a new function. Export the function to allow
    services to have better control over the svc_rqst structs.

    Also rearrange the rqstp initialization to prevent NULL pointer
    dereferences in svc_exit_thread in case allocations fail.

    Signed-off-by: Jeff Layton
    Reviewed-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     
  • Move sk_list and sk_ready to svc_xprt. This involves close because these
    lists are walked by svcs when closing all their transports. So I combined
    the moving of these lists to svc_xprt with making close transport independent.

    The svc_force_sock_close has been changed to svc_close_all and takes a list
    as an argument. This removes some svc internals knowledge from the svcs.

    This code races with module removal and transport addition.

    Thanks to Simon Holm Thøgersen for a compile fix.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields
    Cc: Simon Holm Thøgersen

    Tom Tucker
     
  • Some transports add fields to the RPC header for replies, e.g. the TCP
    record length. This function is called when preparing the reply header
    to allow each transport to add whatever fields it requires.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     
  • The svc_max_payload function currently looks at the socket type
    to determine the max payload. Add a max payload value to svc_xprt_class
    so it can be returned directly.

    Signed-off-by: Tom Tucker
    Acked-by: Neil Brown
    Reviewed-by: Chuck Lever
    Reviewed-by: Greg Banks
    Signed-off-by: J. Bruce Fields

    Tom Tucker
     

30 Jan, 2008

1 commit


10 Oct, 2007

1 commit

  • This patch adds the address of the client that caused an error in
    sunrpc/svc.c so that you get errors that look like:

    svc: 192.168.66.28, port=709: unknown version (3 for prog 100003, nfsd)

    I've seen machines which get bunches of unknown version or similar
    errors from time to time, and while the recent patch to add the service
    helps to find which service has the wrong version it doesn't help find
    the potentially bad client.

    The patch is against a checkout of Linus's git tree made on 2007-08-24.

    One observation is that the svc_print_addr function prints to a buffer
    which in this case makes life a little more complex; it just feels as if
    there must be lots of places that print a connection address - is there
    a better function to use anywhere?

    I think actually there are a few places with semi duplicated code; e.g.
    one_sock_name switches on the address family but only currently has
    IPV4; I wonder how many other places are similar.

    Signed-off-by: Dave Gilbert
    Cc: Randy Dunlap
    Signed-off-by: J. Bruce Fields
    Acked-by: Neil Brown

    Dr. David Alan Gilbert
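
    Formatting a peer address into a buffer, in the "address, port=N"
    shape shown in the example message above, might look like this in
    user space. print_peer() is a hypothetical helper, not the kernel's
    svc_print_addr, and it handles IPv4 only.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Write "a.b.c.d, port=N" for an IPv4 peer into buf.
 * Returns buf on success, or NULL if the address cannot be converted.
 */
char *print_peer(const struct sockaddr_in *sin, char *buf, size_t len)
{
    char ip[INET_ADDRSTRLEN];

    if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)) == NULL)
        return NULL;
    snprintf(buf, len, "%s, port=%u", ip, (unsigned)ntohs(sin->sin_port));
    return buf;
}
```

    Printing to a caller-supplied buffer keeps the helper reusable, at
    the cost of the caller having to size and pass the buffer, which is
    the extra complexity the message mentions.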
     

10 Jul, 2007

1 commit


10 May, 2007

1 commit

  • When the kernel calls svc_reserve to downsize the expected size of an RPC
    reply, it fails to account for the possibility of a checksum at the end of
    the packet. If a client mounts NFSv2/3 with sec=krb5i/p and does I/O,
    then you'll generally see messages similar to this in the server's ring
    buffer:

    RPC request reserved 164 but used 208

    While I was never able to verify it, I suspect that this problem is also
    the root cause of some oopses I've seen under these conditions:

    https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

    This is probably also a problem for other sec= types and for NFSv4. The
    large reserved size for NFSv4 compound packets seems to generally paper
    over the problem, however.

    This patch adds a wrapper for svc_reserve that accounts for the possibility
    of a checksum. It also fixes up the appropriate callers of svc_reserve to
    call the wrapper. For now, it just uses a hardcoded value that I
    determined via testing. That value may need to be revised upward as things
    change, or we may want to eventually add a new auth_op that attempts to
    calculate this somehow.

    Unfortunately, there doesn't seem to be a good way to reliably determine
    the expected checksum length prior to actually calculating it, particularly
    with schemes like spkm3.

    Signed-off-by: Jeff Layton
    Acked-by: Neil Brown
    Cc: Trond Myklebust
    Acked-by: J. Bruce Fields
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
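
    The wrapper idea can be sketched as below. Everything here is a
    simplified model: the struct, the function names and the slack value
    of 64 are hypothetical, and the hardcoded value used by the patch is
    not reproduced. Making the slack conditional on the security flavour
    is also an assumption.

```c
/* Hypothetical per-request state for the sketch. */
struct request {
    int reserved;         /* reply bytes currently reserved */
    int uses_integrity;   /* non-zero if a checksum may be appended */
};

/* Hypothetical slack; not the value chosen in the patch. */
#define CHECKSUM_SLACK 64

/* Downsize the reservation; never grow it. */
void reserve(struct request *rq, int space)
{
    if (space < rq->reserved)
        rq->reserved = space;
}

/* Wrapper: keep room for a trailing checksum when one may be added. */
void reserve_auth(struct request *rq, int space)
{
    int slack = rq->uses_integrity ? CHECKSUM_SLACK : 0;

    reserve(rq, space + slack);
}
```

    Callers that used to call the plain function switch to the wrapper,
    so the reply body plus checksum still fits within what was reserved.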
     

01 May, 2007

1 commit


07 Mar, 2007

2 commits

  • Provide a module param "pool_mode" for sunrpc.ko which allows a sysadmin to
    choose the mode for mapping NFS thread service pools to CPUs. Values are:

    auto     choose a mapping mode heuristically
    global   (default, same as the pre-2.6.19 code) a single global pool
    percpu   one pool per CPU
    pernode  one pool per NUMA node

    Note that since 2.6.19 the hardcoded behaviour has been "auto"; this patch
    makes the default "global".

    The pool mode can be changed after boot/modprobe using /sys, if the NFS and
    lockd services have been shut down. A useful side effect of this change is to
    fix a small memory leak when unloading the module.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
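
    Parsing the four mode names, and refusing a change while services
    are still running, could be sketched like this. The enum, the
    function names and the active_users counter are hypothetical and are
    not taken from the sunrpc source.

```c
#include <stddef.h>
#include <string.h>

enum pool_mode { POOL_AUTO, POOL_GLOBAL, POOL_PERCPU, POOL_PERNODE };

/* Translate a mode name; returns 0 on success, -1 if unknown. */
int parse_pool_mode(const char *s, enum pool_mode *out)
{
    static const struct { const char *name; enum pool_mode mode; } tab[] = {
        { "auto",    POOL_AUTO },
        { "global",  POOL_GLOBAL },
        { "percpu",  POOL_PERCPU },
        { "pernode", POOL_PERNODE },
    };

    for (size_t i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
        if (strcmp(s, tab[i].name) == 0) {
            *out = tab[i].mode;
            return 0;
        }
    }
    return -1;
}

/*
 * Change the mode only when no service is using the pool map.
 * Returns 0 on success, -1 if busy or if the name is unknown.
 */
int set_pool_mode(enum pool_mode *cur, int active_users, const char *s)
{
    enum pool_mode m;

    if (active_users > 0)
        return -1;   /* services must be shut down first */
    if (parse_pool_mode(s, &m) != 0)
        return -1;
    *cur = m;
    return 0;
}
```

    The busy check mirrors the rule that the mode can only be changed
    once the NFS and lockd services have been shut down.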
     
  • When the last thread of nfsd exits, it shuts down all related sockets. It
    currently uses svc_close_socket to do this, but that only is immediately
    effective if the socket is not SK_BUSY.

    If the socket is busy - i.e. if a request has arrived that has not yet been
    processed - svc_close_socket is not effective and the shutdown process spins.

    So create a new svc_force_close_socket which removes the SK_BUSY flag if set
    and then calls svc_close_socket.

    Also change some open-coded loops in svc_destroy to use
    list_for_each_entry_safe.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

21 Feb, 2007

2 commits

  • We frequently need the maximum number of possible processors in order to
    allocate arrays for all processors. So far this was done using
    highest_possible_processor_id(). However, we do need the number of
    processors not the highest id. Moreover the number was so far dynamically
    calculated on each invocation. The number of possible processors does not
    change when the system is running. We can therefore calculate that number
    once.

    Signed-off-by: Christoph Lameter
    Cc: Frederik Deweerdt
    Cc: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
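
    The difference between a highest id and a count, and the idea of
    computing the count once, can be shown with a one-word bitmap. The
    names are hypothetical. The sketch assumes the number wanted is an
    array size, that is, the highest possible id plus one.

```c
/*
 * Array size needed to index every possible id in the map:
 * the position of the highest set bit plus one, or 0 for an empty map.
 */
unsigned int ids_needed(unsigned long possible_map)
{
    unsigned int n = 0;

    while (possible_map) {
        n++;
        possible_map >>= 1;
    }
    return n;
}

/* Computed once at startup; the map does not change afterwards. */
unsigned int nr_ids;

void setup_nr_ids(unsigned long possible_map)
{
    nr_ids = ids_needed(possible_map);
}
```

    Code that sizes per-processor arrays then reads nr_ids directly
    instead of recomputing the value, and needs no "+ 1" adjustment.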
     
  • highest_possible_node_id() is currently used to calculate the last possible
    node id so that the network subsystem can figure out how to size per node
    arrays.

    I think having the ability to determine the maximum number of nodes in a
    system at runtime is useful but then we should name this entry
    correspondingly, it should return the number of node_ids, and the value
    needs to be set up only once on bootup. The node_possible_map does not
    change after bootup.

    This patch introduces nr_node_ids and replaces the use of
    highest_possible_node_id(). nr_node_ids is calculated on bootup when the
    page allocator's pagesets are initialized.

    [deweerdt@free.fr: fix oops]
    Signed-off-by: Christoph Lameter
    Cc: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

13 Feb, 2007

1 commit


11 Feb, 2007

1 commit


10 Feb, 2007

1 commit

  • If you lose this race, it can iput a socket inode twice and you get a BUG
    in fs/inode.c

    When I added the option for user-space to close a socket, I added some
    cruft to svc_delete_socket so that I could call that function when closing
    a socket per user-space request.

    This was the wrong thing to do. I should have just set SK_CLOSE and let
    normal mechanisms do the work.

    Not only wrong, but buggy. The locking is all wrong and it opened up a
    race whereby a socket could be closed twice.

    So this patch:
    Introduces svc_close_socket which sets SK_CLOSE then either leaves
    the close up to a thread, or calls svc_delete_socket if it can
    get SK_BUSY.

    Adds a bias to sk_busy which is removed when SK_DEAD is set.
    This avoids races around shutting down the socket.

    Changes several 'spin_lock' to 'spin_lock_bh' where the _bh
    was missing.

    Bugzilla-url: http://bugzilla.kernel.org/show_bug.cgi?id=7916

    Signed-off-by: Neil Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
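
    The "set SK_CLOSE, then close only if SK_BUSY can be taken" pattern
    can be sketched with C11 atomics. This is a user-space model under
    stated assumptions: the struct, the flag values and the function
    names are hypothetical, and the kernel uses its own bit operations
    and locking rather than <stdatomic.h>.

```c
#include <stdatomic.h>

/* Flag values are illustrative, not the kernel's bit numbers. */
#define SK_BUSY  1u
#define SK_CLOSE 2u

struct sock_model {
    atomic_uint flags;
    int deleted;   /* how many times teardown ran; must never exceed 1 */
};

void delete_socket(struct sock_model *sk)
{
    sk->deleted++;
}

/*
 * Request a close. Returns 1 if the socket was torn down here, or 0 if
 * another owner holds SK_BUSY and is left to act on SK_CLOSE later.
 */
int close_socket(struct sock_model *sk)
{
    unsigned int old;

    atomic_fetch_or(&sk->flags, SK_CLOSE);
    old = atomic_fetch_or(&sk->flags, SK_BUSY);   /* test-and-set */
    if (old & SK_BUSY)
        return 0;
    delete_socket(sk);
    return 1;
}
```

    Only the caller that wins the test-and-set on SK_BUSY runs the
    teardown, so two close requests cannot both delete the socket.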