02 Oct, 2006

14 commits

  • Replace references to system_utsname with the per-process uts namespace
    where appropriate. This includes things like uname.

    Changes: Per Eric Biederman's comments, use the per-process uts namespace
    for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

    [jdike@addtoit.com: UML fix]
    [clg@fr.ibm.com: cleanup]
    [akpm@osdl.org: build fix]
    Signed-off-by: Serge E. Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Cedric Le Goater
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Actually implement multiple pools. On NUMA machines, allocate a svc_pool per
    NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue
    sockets on the svc_pool corresponding to the CPU on which the socket bh is run
    (i.e. the NIC interrupt CPU). Threads have their CPU mask set to limit them
    to the CPUs in the svc_pool that owns them.

    This is the patch that allows an Altix to scale NFS traffic linearly
    beyond 4 CPUs and 4 NICs.

    Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
    Christoph Hellwig.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
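
A minimal user-space sketch of the CPU-to-pool mapping described in this commit, not the kernel code itself. The names `pool_mode`, `pool_map` and `pool_for_cpu` are hypothetical, and the sketch assumes every NUMA node has the same number of consecutively numbered CPUs, which real hardware does not guarantee.

```c
#include <assert.h>

/* Hypothetical pool layouts, mirroring the three cases in the commit text. */
enum pool_mode { POOL_GLOBAL, POOL_PERCPU, POOL_PERNODE };

struct pool_map {
    enum pool_mode mode;
    unsigned int npools;        /* number of pools in the service */
    unsigned int cpus_per_node; /* assumption: uniform node size */
};

/* Pick the pool for the CPU on which the socket's bottom half ran. */
unsigned int pool_for_cpu(const struct pool_map *m, unsigned int cpu)
{
    assert(m->npools > 0);
    switch (m->mode) {
    case POOL_PERCPU:
        return cpu % m->npools;
    case POOL_PERNODE:
        assert(m->cpus_per_node > 0);
        return (cpu / m->cpus_per_node) % m->npools;
    case POOL_GLOBAL:
    default:
        return 0; /* single global pool */
    }
}
```

Under these assumptions, a socket whose interrupt lands on CPU 5 of a machine with two CPUs per node would be queued on pool 2 in per-node mode.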
     
  • Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
    way of managing the list of all threads in a svc_serv. Add
    svc_create_pooled() to allow creation of a svc_serv whose threads are managed
    by the sunrpc code. Add svc_set_num_threads() to manage the number of threads
    in a service, either per-pool or globally across the service.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Split out the list of idle threads and pending sockets from svc_serv into a
    new svc_pool structure, and allocate a fixed number (in this patch, 1) of
    pools per svc_serv. The new structure contains a lock which takes over
    several of the duties of svc_serv->sv_lock, which is now relegated to
    protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

    The point is to move the hottest fields out of svc_serv and into svc_pool,
    allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
    This is a major step towards making the NFS server NUMA-friendly.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • The SK_BUSY bit in svc_sock->sk_flags ensures that we do not attempt to
    enqueue a socket twice. Currently, setting and clearing the bit is protected
    by svc_serv->sv_lock. Because I intend to shrink the set of data that the
    lock protects, it will no longer be held when svc_sock_enqueue() tests and
    sets SK_BUSY, so that test-and-set needs to be atomic.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
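
The idea of an atomic test-and-set on a busy bit can be sketched in user space with C11 atomics. This is an illustration only: the kernel has its own bit primitives such as test_and_set_bit(), and `sock_sketch`, `SK_BUSY_BIT`, `try_mark_busy` and `clear_busy` are hypothetical names, with the bit position chosen arbitrarily.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_BUSY_BIT 0u /* hypothetical bit position */

struct sock_sketch {
    atomic_ulong flags;
};

/*
 * Atomically set the busy bit and report whether this caller was the one
 * that set it. Only the caller that sees 'true' may enqueue the socket.
 */
bool try_mark_busy(struct sock_sketch *sk)
{
    assert(sk != NULL);
    unsigned long mask = 1ul << SK_BUSY_BIT;
    unsigned long old = atomic_fetch_or(&sk->flags, mask);
    return (old & mask) == 0;
}

/* Clear the busy bit once the socket has been handled. */
void clear_busy(struct sock_sketch *sk)
{
    assert(sk != NULL);
    atomic_fetch_and(&sk->flags, ~(1ul << SK_BUSY_BIT));
}
```

Because the read and the write happen in one atomic operation, two racing callers cannot both see the bit clear, so no outer lock is needed for this check.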
     
  • Convert the svc_sock->sk_reserved variable from an int protected by
    svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we
    need to take the (effectively global) svc_serv->sv_lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
    instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the
    number of places we need to take the svc_serv lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     
  • Convert the svc_sock->sk_inuse counter from an int protected by
    svc_serv->sv_lock, to an atomic. This reduces the number of places we need to
    take the (effectively global) svc_serv->sv_lock.

    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
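
Replacing a lock-protected int with an atomic counter can be sketched as follows, using C11 atomics in user space rather than the kernel's atomic_t. The names `sock_ref`, `sock_get` and `sock_put` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sock_ref {
    atomic_int inuse; /* reference count, no outer lock required */
};

/* Take a reference. */
void sock_get(struct sock_ref *s)
{
    atomic_fetch_add(&s->inuse, 1);
}

/* Drop a reference; returns true when the last reference went away. */
bool sock_put(struct sock_ref *s)
{
    int old = atomic_fetch_sub(&s->inuse, 1);
    assert(old > 0); /* dropping more references than were taken is a bug */
    return old == 1;
}
```

The caller that gets `true` from `sock_put()` is the only one that may free the object, which is what previously had to be decided under the service-wide lock.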
     
  • Following are 11 patches from Greg Banks which combine to make knfsd more
    NUMA-aware. They reduce accesses to 'global' data structures, and create some
    data structures that can be node-local.

    knfsd threads are bound to a particular node, and the thread to handle a new
    request is chosen from the threads that are attached to the node that received
    the interrupt.

    The distribution of threads across nodes can be controlled by a new file in
    the 'nfsd' filesystem, though the default approach of an even spread is
    probably fine for most sites.

    Some (old) numbers that show the efficacy of these patches: N == number of
    NICs == number of CPUs == number of clients. Number of NUMA nodes == N/2.

    N     Throughput, MiB/s     CPU usage, % (max=N*100)
          Before    After       Before    After
    ---   ------    -----       ------    -----
    4     312       435         350       228
    6     500       656         501       418
    8     562       804         690       589

    This patch:

    Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
    a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
    the amount of work that needs to be done in the main RPC loop and the length
    of time we need to hold the (effectively global) svc_serv->sv_lock.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
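
A mark-and-sweep aging pass of the kind described above can be sketched like this. It is a user-space illustration under simplifying assumptions (a flat array instead of the kernel's socket lists, no locking, no timer), and `conn`, `conn_touch` and `age_connections` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
    bool open; /* connection is still alive */
    bool old;  /* set by the previous pass, cleared by traffic */
};

/* Called from the request path whenever the connection sees traffic. */
void conn_touch(struct conn *c)
{
    c->old = false;
}

/*
 * One periodic pass: close connections still marked old from the previous
 * pass (sweep), and mark every other open connection (mark).
 * Returns the number of connections closed.
 */
size_t age_connections(struct conn *conns, size_t n)
{
    assert(conns != NULL || n == 0);
    size_t closed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!conns[i].open)
            continue;
        if (conns[i].old) {
            conns[i].open = false;
            closed++;
        } else {
            conns[i].old = true;
        }
    }
    return closed;
}
```

With this scheme the request path only clears a flag, and an idle connection is closed on the second pass after its last traffic, so it survives between one and two timer periods.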
     
  • It isn't needed as it is available in rqstp->rq_server, and dropping it allows
    some local vars to be dropped.

    [akpm@osdl.org: build fix]
    Cc: "J. Bruce Fields"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Userspace should create and bind a socket (but not connect it) and write the
    'fd' to portlist. This will cause the nfs server to listen on that socket.

    To close a socket, the name of the socket, as read from 'portlist', can be
    written to 'portlist' with a preceding '-'.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
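
The userspace side of this interface might look roughly like the sketch below. It makes several assumptions: the helper names are hypothetical, the fd is written as decimal text followed by a newline, and the caller supplies the path to the 'portlist' file in the mounted nfsd filesystem (conventionally under /proc/fs/nfsd). The exact text format the kernel accepts is not stated in the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create and bind (but do not connect) an IPv4 UDP socket; returns fd or -1. */
int make_bound_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write the fd number to the portlist file; returns 0 on success. */
int hand_to_nfsd(const char *portlist_path, int fd)
{
    FILE *f = fopen(portlist_path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", fd) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Build a close request: the socket name with a preceding '-'. */
int format_close_request(char *buf, size_t len, const char *name)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, len, "-%s", name);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

A caller would bind a socket with `make_bound_socket(2049)`, pass the result to `hand_to_nfsd()`, and later write the string produced by `format_close_request()` to the same file to stop listening.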
     
  • This file will list all ports that nfsd has open.
    The default when TCP is enabled will be:

      ipv4 udp 0.0.0.0 2049
      ipv4 tcp 0.0.0.0 2049

    Later, the list of ports will be settable.

    'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
    non-mainline patches which created 'ports' with different semantics.

    [akpm@osdl.org: cleanups, build fix]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • nfsd has some cleanup that it wants to do when the last thread exits, and
    there will shortly be some more. So collect this all into one place and
    define a callback for an rpc service to call when the service is about to be
    destroyed.

    [akpm@osdl.org: cleanups, build fix]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Greg Banks
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Banks
     

01 Oct, 2006

1 commit


29 Sep, 2006

4 commits


27 Sep, 2006

2 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • * Roughly half of callers already do it by not checking the return value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those who check it printk something; however, slab_error already printed
      the name of the failed cache.
    * XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
      low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

24 Sep, 2006

1 commit

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits)
    NFS: unmark NFS direct I/O as experimental
    NFS: add comments clarifying the use of nfs_post_op_update()
    NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers
    NFS: Use SEEK_END instead of hardcoded value
    NFSv4: When mounting with a port=0 argument, substitute port=2049
    NFSv4: Poll more aggressively when handling NFS4ERR_DELAY
    NFSv4: Handle the condition NFS4ERR_FILE_OPEN
    NFSv4: Retry lease recovery if it failed during a synchronous operation.
    NFS: Don't invalidate the symlink we just stuffed into the cache
    NFS: Make read() return an ESTALE if the file has been deleted
    NFSv4: It's perfectly legal for clp to be NULL here....
    NFS: nfs_lookup - don't hash dentry when optimising away the lookup
    SUNRPC: Fix Oops in pmap_getport_done
    SUNRPC: Add refcounting to the struct rpc_xprt
    SUNRPC: Clean up soft task error handling
    SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
    SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
    Fix a referral error Oops
    NFS: NFS_ROOT should use the new rpc_create API
    NFS: Fix up compiler warnings on 64-bit platforms in client.c
    ...

    Manually resolved conflict in net/sunrpc/xprtsock.c

    Linus Torvalds
     

23 Sep, 2006

18 commits