Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

02 Oct, 2006

40 commits

fe74290d5 [PATCH] provide kernel_execve on all architectures ... Browse Code »

This adds the new kernel_execve function on all architectures that were using
_syscall3() to implement execve.

The implementation uses code from the _syscall3 macros provided in the
unistd.h header file. I don't have cross-compilers for any of these
architectures, so the patch is untested with the exception of i386.

Most architectures can probably implement this in a nicer way in assembly or
by combining it with the sys_execve implementation itself, but this should do
it for now.

[bunk@stusta.de: m68knommu build fix]
[markh@osdl.org: build fix]
[bero@arklinux.org: build fix]
[ralf@linux-mips.org: mips fix]
[schwidefsky@de.ibm.com: s390 fix]
Signed-off-by: Arnd Bergmann
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Signed-off-by: Ralf Baechle
Signed-off-by: Bernhard Rosenkraenzer
Signed-off-by: Mark Haverkamp
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2006-10-02 22:57:23 +0800
3db03b4af [PATCH] rename the provided execve functions to kernel_execve ... Browse Code »

Some architectures provide an execve function that does not set errno, but
instead returns the result code directly. Rename these to kernel_execve to
get the right semantics there. Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann
Cc: Andi Kleen
Acked-by: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Signed-off-by: Adrian Bunk
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2006-10-02 22:57:23 +0800
676085679 [PATCH] introduce kernel_execve ... Browse Code »

The use of execve() in the kernel is dubious, since it relies on the
__KERNEL_SYSCALLS__ mechanism that stores the result in a global errno
variable. As a first step of getting rid of this, change all users to a
global kernel_execve function that returns a proper error code.

This function is a terrible hack, and a later patch removes it again after the
kernel syscalls are gone.

Signed-off-by: Arnd Bergmann
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Russell King
Cc: Ian Molton
Cc: Mikael Starvik
Cc: David Howells
Cc: Yoshinori Sato
Cc: Hirokazu Takata
Cc: Ralf Baechle
Cc: Kyle McMartin
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Paul Mundt
Cc: Kazumoto Kojima
Cc: Richard Curnow
Cc: William Lee Irwin III
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Paolo 'Blaisorblade' Giarrusso
Cc: Miles Bader
Cc: Chris Zankel
Cc: "Luck, Tony"
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2006-10-02 22:57:23 +0800
2453a3062 [PATCH] ipc: replace kmalloc and memset in get_undo_list with kzalloc ... Browse Code »

Simplify get_undo_list() by dropping the unnecessary cast, removing the
size variable, and switching to kzalloc() instead of a kmalloc() followed
by a memset().

This cleanup was split then modified from Jes Sorenson's Task Notifiers
patches.

Signed-off-by: Matt Helsley
Cc: Jes Sorensen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Helsley
2006-10-02 22:57:22 +0800
5d124e99c [PATCH] nsproxy cloning error path fix ... Browse Code »

This patch fixes copy_namespaces()'s error path.

when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are
copied from the old nsproxy. Later in copy_utsname, copy_ipcs, etc.
according namespaces are get-ed. On error path needed namespaces are
put-ed, so there's no need to put new nsproxy itelf as it woud cause
putting namespaces for the second time.

Found when incorporating namespaces into OpenVZ kernel.

Signed-off-by: Pavel Emelianov
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel
2006-10-02 22:57:22 +0800
fcfbd547b [PATCH] IPC namespace - sysctls ... Browse Code »

Sysctl tweaks for IPC namespace

Signed-off-by: Pavel Emelianiov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
4e9823111 [PATCH] IPC namespace - shm ... Browse Code »

IPC namespace support for IPC shm code.

Signed-off-by: Pavel Emelianiov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
e38935341 [PATCH] IPC namespace - sem ... Browse Code »

IPC namespace support for IPC sem code.

Signed-off-by: Pavel Emelianiov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
1e7869373 [PATCH] IPC namespace - msg ... Browse Code »

IPC namespace support for IPC msg code.

Signed-off-by: Pavel Emelianiov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
73ea41302 [PATCH] IPC namespace - utils ... Browse Code »

This patch adds basic IPC namespace functionality to
IPC utils:
- init_ipc_ns
- copy/clone/unshare/free IPC ns
- /proc preparations

Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
25b21cb2f [PATCH] IPC namespace core ... Browse Code »

This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace. Basically, it is another building block of
containers functionality.

This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support

[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Cedric Le Goater
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800
c0b2fc316 [PATCH] uts: copy nsproxy only when needed ... Browse Code »

The nsproxy was being copied in unshare() when anything was being unshared,
even if it was something not referenced from nsproxy. This should end up
in some cases with far more memory usage than necessary.

Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge Hallyn
2006-10-02 22:57:22 +0800
071df104f [PATCH] namespaces: utsname: implement CLONE_NEWUTS flag ... Browse Code »

Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare.

[clg@fr.ibm.com: IPC unshare fix]
[bunk@stusta.de: cleanup]
Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Adrian Bunk
Signed-off-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:22 +0800
bf47fdcda [PATCH] namespaces: utsname: remove system_utsname ... Browse Code »

The system_utsname isn't needed now that kernel/sysctl.c is fixed.
Nuke it.

Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
8218c74c0 [PATCH] namespaces: utsname: sysctl ... Browse Code »

Sysctl uts patch. This will need to be done another way, but since sysctl
itself needs to be container aware, 'the right thing' is a separate patchset.

[akpm@osdl.org: ia64 build fix]
[sam.vilain@catalyst.net.nz: cleanup]
[sam.vilain@catalyst.net.nz: add proc_do_utsns_string]
Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
4865ecf13 [PATCH] namespaces: utsname: implement utsname namespaces ... Browse Code »

This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.

It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.

[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
96b644bde [PATCH] namespaces: utsname: use init_utsname when appropriate ... Browse Code »

In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use. This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname(). Hope I picked all the
right ones in net/ipv4/ipconfig.c. These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Cc: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
e9ff3990f [PATCH] namespaces: utsname: switch to using uts namespaces ... Browse Code »

Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Cedric Le Goater
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
0bdd7aab7 [PATCH] namespaces: utsname: introduce temporary helpers ... Browse Code »

Define utsname() and init_utsname() which return &system_utsname. Users of
system_utsname will be changed to use these helpers, after which
system_utsname will disappear.

Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:21 +0800
fab413a33 [PATCH] namespaces: exit_task_namespaces() invalidates nsproxy ... Browse Code »

exit_task_namespaces() has replaced the former exit_namespace(). It
invalidates task->nsproxy and associated namespaces. This is an issue for
the (futur) pid namespace which is required to be valid in exit_notify().

This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy
valid.

Signed-off-by: Cedric Le Goater
Cc: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-10-02 22:57:21 +0800
1651e14e2 [PATCH] namespaces: incorporate fs namespace into nsproxy ... Browse Code »

This moves the mount namespace into the nsproxy. The mount namespace count
now refers to the number of nsproxies point to it, rather than the number of
tasks. As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.

Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:20 +0800
0437eb594 [PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c ... Browse Code »

Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.

Signed-off-by: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:20 +0800
ab516013a [PATCH] namespaces: add nsproxy ... Browse Code »

This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2006-10-02 22:57:20 +0800
b1ba4ddde [PATCH] make kernel/sysctl.c:_proc_do_string() static ... Browse Code »

This patch makes the needlessly global _proc_do_string() static.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-10-02 22:57:20 +0800
f5dd3d6fa [PATCH] proc: sysctl: add _proc_do_string helper ... Browse Code »

The logic in proc_do_string is worth re-using without passing in a
ctl_table structure (say, we want to calculate a pointer and pass that in
instead); pass in the two fields it uses from that structure as explicit
arguments.

Signed-off-by: Sam Vilain
Cc: Serge E. Hallyn
Cc: Kirill Korotaev
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Andrey Savochkin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sam Vilain
2006-10-02 22:57:20 +0800
12fd35203 [PATCH] nfsd: lockdep annotation ... Browse Code »

while doing a kernel make modules_install install over an NFS mount.

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/9550 is trying to acquire lock:
(&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

but task is already holding lock:
(&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

other info that might help us debug this:
2 locks held by nfsd/9550:
#0: (hash_sem){..--}, at: [] exp_readlock+0xd/0xf [nfsd]
#1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

stack backtrace:
[] show_trace_log_lvl+0x58/0x152
[] show_trace+0xd/0x10
[] dump_stack+0x19/0x1b
[] __lock_acquire+0x77a/0x9a3
[] lock_acquire+0x60/0x80
[] __mutex_lock_slowpath+0xa7/0x20e
[] mutex_lock+0x1c/0x1f
[] vfs_unlink+0x34/0x8a
[] nfsd_unlink+0x18f/0x1e2 [nfsd]
[] nfsd3_proc_remove+0x95/0xa2 [nfsd]
[] nfsd_dispatch+0xc0/0x178 [nfsd]
[] svc_process+0x3a5/0x5ed
[] nfsd+0x1a7/0x305 [nfsd]
[] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
[] show_trace+0xd/0x10
[] dump_stack+0x19/0x1b
[] __lock_acquire+0x77a/0x9a3
[] lock_acquire+0x60/0x80
[] __mutex_lock_slowpath+0xa7/0x20e
[] mutex_lock+0x1c/0x1f
[] vfs_unlink+0x34/0x8a
[] nfsd_unlink+0x18f/0x1e2 [nfsd]
[] nfsd3_proc_remove+0x95/0xa2 [nfsd]
[] nfsd_dispatch+0xc0/0x178 [nfsd]
[] svc_process+0x3a5/0x5ed
[] nfsd+0x1a7/0x305 [nfsd]
[] kernel_thread_helper+0x5/0xb

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/9580 is trying to acquire lock:
(&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

but task is already holding lock:
(&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

other info that might help us debug this:
2 locks held by nfsd/9580:
#0: (hash_sem){..--}, at: [] exp_readlock+0xd/0xf [nfsd]
#1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x1c/0x1f

stack backtrace:
[] show_trace_log_lvl+0x58/0x152
[] show_trace+0xd/0x10
[] dump_stack+0x19/0x1b
[] __lock_acquire+0x77a/0x9a3
[] lock_acquire+0x60/0x80
[] __mutex_lock_slowpath+0xa7/0x20e
[] mutex_lock+0x1c/0x1f
[] nfsd_setattr+0x2c8/0x499 [nfsd]
[] nfsd_create_v3+0x31b/0x4ac [nfsd]
[] nfsd3_proc_create+0x128/0x138 [nfsd]
[] nfsd_dispatch+0xc0/0x178 [nfsd]
[] svc_process+0x3a5/0x5ed
[] nfsd+0x1a7/0x305 [nfsd]
[] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
[] show_trace+0xd/0x10
[] dump_stack+0x19/0x1b
[] __lock_acquire+0x77a/0x9a3
[] lock_acquire+0x60/0x80
[] __mutex_lock_slowpath+0xa7/0x20e
[] mutex_lock+0x1c/0x1f
[] nfsd_setattr+0x2c8/0x499 [nfsd]
[] nfsd_create_v3+0x31b/0x4ac [nfsd]
[] nfsd3_proc_create+0x128/0x138 [nfsd]
[] nfsd_dispatch+0xc0/0x178 [nfsd]
[] svc_process+0x3a5/0x5ed
[] nfsd+0x1a7/0x305 [nfsd]
[] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra
Cc: Neil Brown
Cc: Ingo Molnar
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2006-10-02 22:57:20 +0800
eed2965af [PATCH] knfsd: allow admin to set nthreads per node ... Browse Code »

Add /proc/fs/nfsd/pool_threads which allows the sysadmin (or a userspace
daemon) to read and change the number of nfsd threads in each pool. The
format is a list of space-separated integers, one per pool.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:20 +0800
bfd241600 [PATCH] knfsd: make rpc threads pools numa aware ... Browse Code »

Actually implement multiple pools. On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:20 +0800
eec09661d [PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd ... Browse Code »

Replace the existing list of all nfsd threads with new code using
svc_create_pooled().

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:20 +0800
a74554429 [PATCH] knfsd: add svc_set_num_threads ... Browse Code »

Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv. Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code. Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
9a24ab574 [PATCH] knfsd: add svc_get ... Browse Code »

add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
3262c816a [PATCH] knfsd: split svc_serv into pools ... Browse Code »

Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv. The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
This is a major step towards making the NFS server NUMA-friendly.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
c081a0c7c [PATCH] knfsd: test and set SK_BUSY atomically ... Browse Code »

The SK_BUSY bit in svc_sock->sk_flags ensures that we do not attempt to
enqueue a socket twice. Currently, setting and clearing the bit is protected
by svc_serv->sv_lock. As I intend to reduce the data that the lock protects
so it's not held when svc_sock_enqueue() tests and sets SK_BUSY, that test and
set needs to be atomic.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
5685f0fa1 [PATCH] knfsd: convert sk_reserved to atomic_t ... Browse Code »

Convert the svc_sock->sk_reserved variable from an int protected by
svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we
need to take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
1a68d952a [PATCH] knfsd: use new lock for svc_sock deferred list ... Browse Code »

Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the
number of places we need to take the svc_serv lock.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
c45c357d7 [PATCH] knfsd: convert sk_inuse to atomic_t ... Browse Code »

Convert the svc_sock->sk_inuse counter from an int protected by
svc_serv->sv_lock, to an atomic. This reduces the number of places we need to
take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
36bdfc8ba [PATCH] knfsd: move tempsock aging to a timer ... Browse Code »

Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware. They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2

N Throughput, MiB/s CPU usage, % (max=N*100)
Before After Before After
--- ------ ---- ----- -----
4 312 435 350 228
6 500 656 501 418
8 562 804 690 589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Banks
2006-10-02 22:57:19 +0800
4a3ae42dc [PATCH] knfsd: Correctly handle error condition from lockd_up ... Browse Code »

If lockd_up fails - what should we expect? Do we have to later call
lockd_down?

Well the nfs client thinks "no", the nfs server thinks "yes". lockd thinks
"yes".

The only answer that really makes sense is "no" !!

So:
Make lockd_up only increment nlmsvc_users on success.
Make nfsd handle errors from lockd_up properly.
Make sure lockd_up(0) never fails when lockd is running
so that the 'reclaimer' call to lockd_up doesn't need to
be error checked.

Cc: "J. Bruce Fields"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-02 22:57:18 +0800
7dcf91ec6 [PATCH] knfsd: Move makesock failed warning into make_socks. ... Browse Code »

Thus it is printed for any path that leads to failure (make_socks is called
from two places).

Cc: "J. Bruce Fields"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-02 22:57:18 +0800
3dfb42105 [PATCH] knfsd: Check return value of lockd_up in write_ports ... Browse Code »

We should be checking the return value of lockd_up when adding a new socket to
nfsd. So move the lockd_up before the svc_addsock and check the return value.

The move is because lockd_down is easy, but there is no easy way to remove a
recently added socket.

Cc: "J. Bruce Fields"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-02 22:57:18 +0800