Eric Lee / smarc-fsl-linux-kernel

31 Oct, 2011

1 commit

9984de1a5 kernel: Map most files to use export.h instead of module.h ... Browse Code »

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include
+#include

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

20 Jul, 2011

1 commit

665771939 make sure that nsproxy_cache is initialized early enough ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:44:07 +0800

27 May, 2011

1 commit

a77aea920 cgroup: remove the ns_cgroup ... Browse Code »

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
namespaces without falling in a exponential creation time
* we may want to create a namespace without creating a cgroup

The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano
Signed-off-by: Serge E. Hallyn
Cc: Eric W. Biederman
Cc: Jamal Hadi Salim
Reviewed-by: Li Zefan
Acked-by: Paul Menage
Acked-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Lezcano
2011-05-27 08:12:34 +0800

11 May, 2011

1 commit

0663c6f8f ns: Introduce the setns syscall ... Browse Code »

With the networking stack today there is demand to handle
multiple network stacks at a time. Not in the context
of containers but in the context of people doing interesting
things with routing.

There is also demand in the context of containers to have
an efficient way to execute some code in the container itself.
If nothing else it is very useful ad a debugging technique.

Both problems can be solved by starting some form of login
daemon in the namespaces people want access to, or you
can play games by ptracing a process and getting the
traced process to do things you want it to do. However
it turns out that a login daemon or a ptrace puppet
controller are more code, they are more prone to
failure, and generally they are less efficient than
simply changing the namespace of a process to a
specified one.

Pieces of this puzzle can also be solved by instead of
coming up with a general purpose system call coming up
with targed system calls perhaps socketat that solve
a subset of the larger problem. Overall that appears
to be more work for less reward.

int setns(int fd, int nstype);

The fd argument is a file descriptor referring to a proc
file of the namespace you want to switch the process to.

In the setns system call the nstype is 0 or specifies
an clone flag of the namespace you intend to change
to prevent changing a namespace unintentionally.

v2: Most of the architecture support added by Daniel Lezcano
v3: ported to v2.6.36-rc4 by: Eric W. Biederman
v4: Moved wiring up of the system call to another patch
v5: Cleaned up the system call arguments
- Changed the order.
- Modified nstype to take the standard clone flags.
v6: Added missing error handling as pointed out by Matt Helsley

Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2011-05-11 05:32:56 +0800

24 Mar, 2011

4 commits

b0e77598f userns: user namespaces: convert several capable() calls ... Browse Code »

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:08 +0800
b515498f5 userns: add a user namespace owner of ipc ns ... Browse Code »

Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.
Feb 23: Move extern declaration to ipc_namespace.h, and group
fwd declarations at top.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:07 +0800
bb96a6f50 userns: allow sethostname in a container ... Browse Code »

Changelog:
Feb 23: let clone_uts_ns() handle setting uts->user_ns
To do so we need to pass in the task_struct who'll
get the utsname, so we can get its user_ns.
Feb 23: As per Oleg's coment, just pass in tsk, instead of two
of its members.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:03 +0800
59607db36 userns: add a user_namespace as creator/owner of uts_namespace ... Browse Code »

The expected course of development for user namespaces targeted
capabilities is laid out at https://wiki.ubuntu.com/UserNamespace.

Goals:

- Make it safe for an unprivileged user to unshare namespaces. They
will be privileged with respect to the new namespace, but this should
only include resources which the unprivileged user already owns.

- Provide separate limits and accounting for userids in different
namespaces.

Status:

Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to
get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and
CAP_SETGID capabilities. What this gets you is a whole new set of
userids, meaning that user 500 will have a different 'struct user' in
your namespace than in other namespaces. So any accounting information
stored in struct user will be unique to your namespace.

However, throughout the kernel there are checks which

- simply check for a capability. Since root in a child namespace
has all capabilities, this means that a child namespace is not
constrained.

- simply compare uid1 == uid2. Since these are the integer uids,
uid 500 in namespace 1 will be said to be equal to uid 500 in
namespace 2.

As a result, the lxc implementation at lxc.sf.net does not use user
namespaces. This is actually helpful because it leaves us free to
develop user namespaces in such a way that, for some time, user
namespaces may be unuseful.

Bugs aside, this patchset is supposed to not at all affect systems which
are not actively using user namespaces, and only restrict what tasks in
child user namespace can do. They begin to limit privilege to a user
namespace, so that root in a container cannot kill or ptrace tasks in the
parent user namespace, and can only get world access rights to files.
Since all files currently belong to the initila user namespace, that means
that child user namespaces can only get world access rights to *all*
files. While this temporarily makes user namespaces bad for system
containers, it starts to get useful for some sandboxing.

I've run the 'runltplite.sh' with and without this patchset and found no
difference.

This patch:

copy_process() handles CLONE_NEWUSER before the rest of the namespaces.
So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace
will have the new user namespace as its owner. That is what we want,
since we want root in that new userns to be able to have privilege over
it.

Changelog:
Feb 15: don't set uts_ns->user_ns if we didn't create
a new uts_ns.
Feb 23: Move extern init_user_ns declaration from
init/version.c to utsname.h.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:46:59 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

13 Mar, 2010

1 commit

8467005da nsproxy: remove INIT_NSPROXY() ... Browse Code »

Remove INIT_NSPROXY(), use C99 initializer.
Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.

Note: headers trim will be done later, now it's quite pointless because
results will be invalidated by merge window.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2010-03-13 07:52:40 +0800

19 Jun, 2009

1 commit

90af90d7d nsproxy: extract create_nsproxy() ... Browse Code »

clone_nsproxy() does useless copying of old nsproxy -- every pointer will
be rewritten to new ns or to old ns. Remove copying, rename
clone_nsproxy(), create_nsproxy() will be used by C/R code to create fresh
nsproxy on restart.

Signed-off-by: Alexey Dobriyan
Acked-by: Serge Hallyn
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-06-19 04:03:56 +0800

25 Nov, 2008

1 commit

18b6e0414 User namespaces: set of cleanups (v2) ... Browse Code »

The user_ns is moved from nsproxy to user_struct, so that a struct
cred by itself is sufficient to determine access (which it otherwise
would not be). Corresponding ecryptfs fixes (by David Howells) are
here as well.

Fix refcounting. The following rules now apply:
1. The task pins the user struct.
2. The user struct pins its user namespace.
3. The user namespace pins the struct user which created it.

User namespaces are cloned during copy_creds(). Unsharing a new user_ns
is no longer possible. (We could re-add that, but it'll cause code
duplication and doesn't seem useful if PAM doesn't need to clone user
namespaces).

When a user namespace is created, its first user (uid 0) gets empty
keyrings and a clean group_info.

This incorporates a previous patch by David Howells. Here
is his original patch description:

>I suggest adding the attached incremental patch. It makes the following
>changes:
>
> (1) Provides a current_user_ns() macro to wrap accesses to current's user
> namespace.
>
> (2) Fixes eCryptFS.
>
> (3) Renames create_new_userns() to create_user_ns() to be more consistent
> with the other associated functions and because the 'new' in the name is
> superfluous.
>
> (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
> beginning of do_fork() so that they're done prior to making any attempts
> at allocation.
>
> (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
> to fill in rather than have it return the new root user. I don't imagine
> the new root user being used for anything other than filling in a cred
> struct.
>
> This also permits me to get rid of a get_uid() and a free_uid(), as the
> reference the creds were holding on the old user_struct can just be
> transferred to the new namespace's creator pointer.
>
> (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
> preparation rather than doing it in copy_creds().
>
>David

>Signed-off-by: David Howells

Changelog:
Oct 20: integrate dhowells comments
1. leave thread_keyring alone
2. use current_user_ns() in set_user()

Signed-off-by: Serge Hallyn

Serge Hallyn
2008-11-25 07:57:41 +0800

24 Aug, 2008

1 commit

7a8fc9b24 removed unused #include <linux/version.h>'s ... Browse Code »

This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-08-24 03:14:12 +0800

26 Jul, 2008

1 commit

e885dcde7 cgroup_clone: use pid of newly created task for new cgroup ... Browse Code »

cgroup_clone creates a new cgroup with the pid of the task. This works
correctly for unshare, but for clone cgroup_clone is called from
copy_namespaces inside copy_process, which happens before the new pid is
created. As a result, the new cgroup was created with current's pid.
This patch:

1. Moves the call inside copy_process to after the new pid
is created
2. Passes the struct pid into ns_cgroup_clone (as it is not
yet attached to the task)
3. Passes a name from ns_cgroup_clone() into cgroup_clone()
so as to keep cgroup_clone() itself simpler
4. Uses pid_vnr() to get the process id value, so that the
pid used to name the new cgroup is always the pid as it
would be known to the task which did the cloning or
unsharing. I think that is the most intuitive thing to
do. This way, task t1 does clone(CLONE_NEWPID) to get
t2, which does clone(CLONE_NEWPID) to get t3, then the
cgroup for t3 will be named for the pid by which t2 knows
t3.

(Thanks to Dan Smith for finding the main bug)

Changelog:
June 11: Incorporate Paul Menage's feedback: don't pass
NULL to ns_cgroup_clone from unshare, and reduce
patch size by using 'nodename' in cgroup_clone.
June 10: Original version

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge Hallyn
Acked-by: Paul Menage
Tested-by: Dan Smith
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-07-26 01:53:37 +0800

29 Apr, 2008

1 commit

02fdb36ae ipc: sysvsem: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC) ... Browse Code »

CLONE_NEWIPC|CLONE_SYSVSEM interaction isn't handled properly. This can cause
a kernel memory corruption. CLONE_NEWIPC must detach from the existing undo
lists.

Fix, part 3: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC).

With unshare, specifying CLONE_SYSVSEM means unshare the sysvsem. So it seems
reasonable that CLONE_NEWIPC without CLONE_SYSVSEM would just imply
CLONE_SYSVSEM.

However with clone, specifying CLONE_SYSVSEM means *share* the sysvsem. So
calling clone(CLONE_SYSVSEM|CLONE_NEWIPC) is explicitly asking for something
we can't allow. So return -EINVAL in that case.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Serge E. Hallyn
Cc: Manfred Spraul
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Michael Kerrisk
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-04-29 23:06:14 +0800

09 Feb, 2008

1 commit

ae5e1b22f namespaces: move the IPC namespace under IPC_NS option ... Browse Code »

Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.

The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for NAMESPACES=n case. This is done
so, because the stub for copy_ipc_namespace requires the knowledge of the
CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
included into many many .c files via the sys.h->sem.h sequence so adding the
sched.h into it will make all these .c depend on sched.h which is not that
good. On the other hand the knowledge about the namespaces stuff is required
in 4 .c files only.

Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
msg.c and shm.c files. It turned out that moving these functions into
namespaces.c is not that easy because they use many other calls and macros
from the original file. Moving them would make this patch complicated. On
the other hand all these functions can be consolidated, so I will send a
separate patch doing this a bit later.

Signed-off-by: Pavel Emelyanov
Acked-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:23 +0800

20 Oct, 2007

4 commits

30e49c263 pid namespaces: allow cloning of new namespace ... Browse Code »

When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
create a new struct pid for the new process. Allocate pid_t's for the new
process in the new pid namespace and all ancestor pid namespaces. Make the
newly cloned process the session and process group leader.

Since the active pid namespace is special and expected to be the first entry
in pid->upid_list, preserve the order of pid namespaces.

The size of 'struct pid' is dependent on the the number of pid namespaces the
process exists in, so we use multiple pid-caches'. Only one pid cache is
created during system startup and this used by processes that exist only in
init_pid_ns.

When a process clones its pid namespace, we create additional pid caches as
necessary and use the pid cache to allocate 'struct pids' for that depth.

Note, that with this patch the newly created namespace won't work, since the
rest of the kernel still uses global pids, but this is to be fixed soon. Init
pid namespace still works.

[oleg@tv-sign.ru: merge fix]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:39 +0800
cf7b708c8 Make access to task's nsproxy lighter ... Browse Code »

When someone wants to deal with some other taks's namespaces it has to lock
the task and then to get the desired namespace if the one exists. This is
slow on read-only paths and may be impossible in some cases.

E.g. Oleg recently noticed a race between unshare() and the (sent for
review in cgroups) pid namespaces - when the task notifies the parent it
has to know the parent's namespace, but taking the task_lock() is
impossible there - the code is under write locked tasklist lock.

On the other hand switching the namespace on task (daemonize) and releasing
the namespace (after the last task exit) is rather rare operation and we
can sacrifice its speed to solve the issues above.

The access to other task namespaces is proposed to be performed
like this:

rcu_read_lock();
nsproxy = task_nsproxy(tsk);
if (nsproxy != NULL) {
/ *
* work with the namespaces here
* e.g. get the reference on one of them
* /
} / *
* NULL task_nsproxy() means that this task is
* almost dead (zombie)
* /
rcu_read_unlock();

This patch has passed the review by Eric and Oleg :) and,
of course, tested.

[clg@fr.ibm.com: fix unshare()]
[ebiederm@xmission.com: Update get_net_ns_by_pid]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Cc: Paul E. McKenney
Cc: Serge Hallyn
Signed-off-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:37 +0800
2894d650c pid namespaces: define and use task_active_pid_ns() wrapper ... Browse Code »

With multiple pid namespaces, a process is known by some pid_t in every
ancestor pid namespace. Every time the process forks, the child process also
gets a pid_t in every ancestor pid namespace.

While a process is visible in >=1 pid namespaces, it can see pid_t's in only
one pid namespace. We call this pid namespace it's "active pid namespace",
and it is always the youngest pid namespace in which the process is known.

This patch defines and uses a wrapper to find the active pid namespace of a
process. The implementation of the wrapper will be changed in when support
for multiple pid namespaces are added.

Changelog:
2.6.22-rc4-mm2-pidns1:
- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
task_active_pid_ns() in child_reaper() since task->nsproxy
can be NULL during task exit (so child_reaper() continues to
use init_pid_ns).

to implement child_reaper() since init_pid_ns.child_reaper to
implement child_reaper() since tsk->nsproxy can be NULL during exit.

2.6.21-rc6-mm1:
- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
process can have multiple pid namespaces.

Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelianov
Cc: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Herbert Poetzel
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2007-10-20 02:53:37 +0800
858d72ead cgroups: implement namespace tracking subsystem ... Browse Code »

When a task enters a new namespace via a clone() or unshare(), a new cgroup
is created and the task moves into it.

This version names cgroups which are automatically created using
cgroup_clone() as "node_" where pid is the pid of the unsharing or
cloned process. (Thanks Pavel for the idea) This is safe because if the
process unshares again, it will create

/cgroups/(...)/node_/node_

The only possibilities (AFAICT) for a -EEXIST on unshare are

1. pid wraparound
2. a process fails an unshare, then tries again.

Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the
node_ will be empty and can be rmdir'ed to make the subsequent unshare()
succeed.

Changelog:
Name cloned cgroups as "node_".

[clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
Signed-off-by: Serge E. Hallyn
Cc: Paul Menage
Signed-off-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-10-20 02:53:37 +0800

17 Oct, 2007

1 commit

db8906da5 Use KMEM_CACHE macro to create the nsproxy cache ... Browse Code »

The blessed way for standard caches is to use it. Besides, this may give
this cache a better alignment.

Signed-off-by: Pavel Emelyanov
Acked-by: Cedric Le Goater
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-17 23:42:59 +0800

11 Oct, 2007

1 commit

9dd776b6d [NET]: Add network namespace clone & unshare support. ... Browse Code »

This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.

As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.

As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2007-10-11 07:52:46 +0800

20 Jul, 2007

1 commit

20c2df83d mm: Remove slab destructors from kmem_cache_create(). ... Browse Code »

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800

17 Jul, 2007

6 commits

213dd266d namespace: ensure clone_flags are always stored in an unsigned long ... Browse Code »

While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.

So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.

Signed-off-by: Eric W. Biederman
Acked-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2007-07-17 00:05:48 +0800
98c0d07cb add a kmem_cache for nsproxy objects ... Browse Code »

It should improve performance in some scenarii where a lot of
these nsproxy objects are created by unsharing namespaces. This is
a typical use of virtual servers that are being created or entered.

This is also a good tool to find leaks and gather statistics on
namespace usage.

Signed-off-by: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
467e9f4b5 fix create_new_namespaces() return value ... Browse Code »

dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong,
create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
copy_utsname().

Modify create_new_namespaces() to also use the errors returned by the
copy_*_ns routines and not to systematically return ENOMEM.

[oleg@tv-sign.ru: better changelog]
Signed-off-by: Cedric Le Goater
Cc: Serge E. Hallyn
Cc: Badari Pulavarty
Cc: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
77ec739d8 user namespace: add unshare ... Browse Code »

This patch enables the unshare of user namespaces.

It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
resets the current user_struct and adds a new root user (uid == 0)

For now, unsharing the user namespace allows a process to reset its
user_struct accounting and uid 0 in the new user namespace should be contained
using appropriate means, for instance selinux

The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.

Signed-off-by: Serge E. Hallyn
Signed-off-by: Cedric Le Goater
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-07-17 00:05:47 +0800
acce292c8 user namespace: add the framework ... Browse Code »

Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.

A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that theses privileges
should be controlled through some means (process capabilities ?)

The unshare is not included in this patch.

Changes since [try #4]:
- Updated get_user_ns and put_user_ns to accept NULL, and
get_user_ns to return the namespace.

Changes since [try #3]:
- moved struct user_namespace to files user_namespace.{c,h}

Changes since [try #2]:
- removed struct user_namespace* argument from find_user()

Changes since [try #1]:
- removed struct user_namespace* argument from find_user()
- added a root_user per user namespace

Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800
7d69a1f4a remove CONFIG_UTS_NS and CONFIG_IPC_NS ... Browse Code »

CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.

Signed-off-by: Cedric Le Goater
Acked-by: "Serge E. Hallyn"
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-07-17 00:05:47 +0800

24 Jun, 2007

1 commit

4e71e474c fix refcounting of nsproxy object when unshared ... Browse Code »

When a namespace is unshared, a refcount on the previous nsproxy is
abusively taken, leading to a memory leak of nsproxy objects.

Signed-off-by: Cedric Le Goater
Cc: Badari Pulavarty
Cc: Herbert Poetzl
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2007-06-24 23:59:10 +0800

09 May, 2007

1 commit

e3222c4ec Merge sys_clone()/sys_unshare() nsproxy and namespace handling ... Browse Code »

sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
namespaces. But they have different code paths.

This patch merges all the nsproxy and its associated namespace copy/clone
handling (as much as possible). Posted on container list earlier for
feedback.

- Create a new nsproxy and its associated namespaces and pass it back to
caller to attach it to right process.

- Changed all copy_*_ns() routines to return a new copy of namespace
instead of attaching it to task->nsproxy.

- Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

- Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
just incase.

- Get rid of all individual unshare_*_ns() routines and make use of
copy_*_ns() instead.

[akpm@osdl.org: cleanups, warning fix]
[clg@fr.ibm.com: remove dup_namespaces() declaration]
[serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
[akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
Signed-off-by: Badari Pulavarty
Signed-off-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc:
Signed-off-by: Cedric Le Goater
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2007-05-09 02:15:00 +0800

31 Jan, 2007

2 commits

444f378b2 Revert "[PATCH] namespaces: fix exit race by splitting exit" ... Browse Code »

This reverts commit 7a238fcba0629b6f2edbcd37458bae56fcf36be5 in
preparation for a better and simpler fix proposed by Eric Biederman
(and fixed up by Serge Hallyn)

Acked-by: Serge E. Hallyn
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-01-31 05:35:18 +0800
7a238fcba [PATCH] namespaces: fix exit race by splitting exit ... Browse Code »

Fix exit race by splitting the nsproxy putting into two pieces. First
piece reduces the nsproxy refcount. If we dropped the last reference, then
it puts the mnt_ns, and returns the nsproxy as a hint to the caller. Else
it returns NULL. The second piece of exiting task namespaces sets
tsk->nsproxy to NULL, and drops the references to other namespaces and
frees the nsproxy only if an nsproxy was passed in.

A little awkward and should probably be reworked, but hopefully it fixes
the NFS oops.

Signed-off-by: Serge E. Hallyn
Cc: Herbert Poetzl
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Cedric Le Goater
Cc: Daniel Hokka Zakrisson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-01-31 00:26:44 +0800

14 Dec, 2006

1 commit

5f8442edf [PATCH] Revert "[PATCH] identifier to nsproxy" ... Browse Code »

This reverts commit 373beb35cd6b625e0ba4ad98baace12310a26aa8.

No one is using this identifier yet. The purpose of this identifier is to
export nsproxy to user space which is wrong. nsproxy is an internal
implementation optimization, which should keep our fork times from getting
slower as we increase the number of global namespaces you don't have to
share.

Adding a global identifier like this is inappropriate because it makes
namespaces inherently non-recursive, greatly limiting what we can do with
them in the future.

Signed-off-by: Eric W. Biederman
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-12-14 01:05:47 +0800

09 Dec, 2006

3 commits

9a575a92d [PATCH] to nsproxy ... Browse Code »

Add the pid namespace framework to the nsproxy object. The copy of the pid
namespace only increases the refcount on the global pid namespace,
init_pid_ns, and unshare is not implemented.

There is no configuration option to activate or deactivate this feature
because this not relevant for the moment.

Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-12-09 00:28:52 +0800
373beb35c [PATCH] identifier to nsproxy ... Browse Code »

Add an identifier to nsproxy. The default init_ns_proxy has identifier 0 and
allocated nsproxies are given -1.

This identifier will be used by a new syscall sys_bind_ns.

Signed-off-by: Cedric Le Goater
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cedric Le Goater
2006-12-09 00:28:52 +0800
6b3286ed1 [PATCH] rename struct namespace to struct mnt_namespace ... Browse Code »

Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developped for the containers : pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'

Signed-off-by: Kirill Korotaev
Signed-off-by: Cedric Le Goater
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-12-09 00:28:51 +0800

21 Oct, 2006

1 commit

e05d722e4 [PATCH] kernel/nsproxy.c: use kmemdup() ... Browse Code »

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2006-10-21 01:26:44 +0800

02 Oct, 2006

2 commits

5d124e99c [PATCH] nsproxy cloning error path fix ... Browse Code »

This patch fixes copy_namespaces()'s error path.

when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are
copied from the old nsproxy. Later in copy_utsname, copy_ipcs, etc.
according namespaces are get-ed. On error path needed namespaces are
put-ed, so there's no need to put new nsproxy itelf as it woud cause
putting namespaces for the second time.

Found when incorporating namespaces into OpenVZ kernel.

Signed-off-by: Pavel Emelianov
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel
2006-10-02 22:57:22 +0800
25b21cb2f [PATCH] IPC namespace core ... Browse Code »

This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace. Basically, it is another building block of
containers functionality.

This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support

[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Signed-off-by: Cedric Le Goater
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-10-02 22:57:22 +0800