09 Aug, 2016

1 commit


26 Jan, 2015

1 commit


05 Dec, 2014

1 commit


05 Aug, 2014

1 commit

  • /proc/thread-self is derived from /proc/self. /proc/thread-self
    points to the directory in proc containing information about the
    current thread.

    This funtionality has been missing for a long time, and is tricky to
    implement in userspace as gettid() is not exported by glibc. More
    importantly this allows fixing defects in /proc/mounts and /proc/net
    where in a threaded application today they wind up being empty files
    when only the initial pthread has exited, causing problems for other
    threads.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

25 Oct, 2013

1 commit


08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

1 commit

  • Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can
    simplify the define PID_MAP_ENTRIES by using the BITS_PER_PAGE.

    [akpm@linux-foundation.org: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined]
    Signed-off-by: Raphael S.Carvalho
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raphael S.Carvalho
     

10 Apr, 2013

1 commit


26 Dec, 2012

1 commit

  • Oleg pointed out that in a pid namespace the sequence.
    - pid 1 becomes a zombie
    - setns(thepidns), fork,...
    - reaping pid 1.
    - The injected processes exiting.

    Can lead to processes attempting access their child reaper and
    instead following a stale pointer.

    That waitpid for init can return before all of the processes in
    the pid namespace have exited is also unfortunate.

    Avoid these problems by disabling the allocation of new pids in a pid
    namespace when init dies, instead of when the last process in a pid
    namespace is reaped.

    Pointed-out-by: Oleg Nesterov
    Reviewed-by: Oleg Nesterov
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowd by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

19 Nov, 2012

2 commits

  • Track the number of pids in the proc hash table. When the number of
    pids goes to 0 schedule work to unmount the kernel mount of proc.

    Move the mount of proc into alloc_pid when we allocate the pid for
    init.

    Remove the surprising calls of pid_ns_release proc in fork and
    proc_flush_task. Those code paths really shouldn't know about proc
    namespace implementation details and people have demonstrated several
    times that finding and understanding those code paths is difficult and
    non-obvious.

    Because of the call path detach pid is alwasy called with the
    rtnl_lock held free_pid is not allowed to sleep, so the work to
    unmounting proc is moved to a work queue. This has the side benefit
    of not blocking the entire world waiting for the unnecessary
    rcu_barrier in deactivate_locked_super.

    In the process of making the code clear and obvious this fixes a bug
    reported by Gao feng where we would leak a
    mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
    succeeded and copy_net_ns failed.

    Acked-by: "Serge E. Hallyn"
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • - Capture the the user namespace that creates the pid namespace
    - Use that user namespace to test if it is ok to write to
    /proc/sys/kernel/ns_last_pid.

    Zhao Hongjiang noticed I was missing a put_user_ns
    in when destroying a pid_ns. I have foloded his patch into this one
    so that bisects will work properly.

    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Oct, 2012

1 commit

  • free_pid_ns() operates in a recursive fashion:

    free_pid_ns(parent)
    put_pid_ns(parent)
    kref_put(&ns->kref, free_pid_ns);
    free_pid_ns

    thus if there was a huge nesting of namespaces the userspace may trigger
    avalanche calling of free_pid_ns leading to kernel stack exhausting and a
    panic eventually.

    This patch turns the recursion into an iterative loop.

    Based on a patch by Andrew Vagin.

    [akpm@linux-foundation.org: export put_pid_ns() to modules]
    Signed-off-by: Cyrill Gorcunov
    Cc: Andrew Vagin
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

16 May, 2012

1 commit


29 Mar, 2012

1 commit

  • In the case of a child pid namespace, rebooting the system does not really
    makes sense. When the pid namespace is used in conjunction with the other
    namespaces in order to create a linux container, the reboot syscall leads
    to some problems.

    A container can reboot the host. That can be fixed by dropping the
    sys_reboot capability but we are unable to correctly to poweroff/
    halt/reboot a container and the container stays stuck at the shutdown time
    with the container's init process waiting indefinitively.

    After several attempts, no solution from userspace was found to reliabily
    handle the shutdown from a container.

    This patch propose to make the init process of the child pid namespace to
    exit with a signal status set to : SIGINT if the child pid namespace
    called "halt/poweroff" and SIGHUP if the child pid namespace called
    "reboot". When the reboot syscall is called and we are not in the initial
    pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
    "RESTART", and "RESTART2". Otherwise we return EINVAL.

    Returning EINVAL is also an easy way to check if this feature is supported
    by the kernel when invoking another 'reboot' option like CAD.

    By this way the parent process of the child pid namespace knows if it
    rebooted or not and can take the right decision.

    Test case:
    ==========

    #include
    #include
    #include
    #include
    #include
    #include
    #include
    #include

    #include

    static int do_reboot(void *arg)
    {
    int *cmd = arg;

    if (reboot(*cmd))
    printf("failed to reboot(%d): %m\n", *cmd);
    }

    int test_reboot(int cmd, int sig)
    {
    long stack_size = 4096;
    void *stack = alloca(stack_size) + stack_size;
    int status;
    pid_t ret;

    ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
    if (ret < 0) {
    printf("failed to clone: %m\n");
    return -1;
    }

    if (wait(&status) < 0) {
    printf("unexpected wait error: %m\n");
    return -1;
    }

    if (!WIFSIGNALED(status)) {
    printf("child process exited but was not signaled\n");
    return -1;
    }

    if (WTERMSIG(status) != sig) {
    printf("signal termination is not the one expected\n");
    return -1;
    }

    return 0;
    }

    int main(int argc, char *argv[])
    {
    int status;

    status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
    if (status < 0)
    return 1;
    printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");

    status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
    if (status < 0)
    return 1;
    printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");

    status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
    if (status < 0)
    return 1;
    printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");

    status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
    if (status < 0)
    return 1;
    printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n");

    status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
    if (status >= 0) {
    printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
    return 1;
    }
    printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");

    return 0;
    }

    [akpm@linux-foundation.org: tweak and add comments]
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Daniel Lezcano
    Acked-by: Serge Hallyn
    Tested-by: Serge Hallyn
    Reviewed-by: Oleg Nesterov
    Cc: Michael Kerrisk
    Cc: "Eric W. Biederman"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

11 Jan, 2012

1 commit

  • Add support for mount options to restrict access to /proc/PID/
    directories. The default backward-compatible "relaxed" behaviour is left
    untouched.

    The first mount option is called "hidepid" and its value defines how much
    info about processes we want to be available for non-owners:

    hidepid=0 (default) means the old behavior - anybody may read all
    world-readable /proc/PID/* files.

    hidepid=1 means users may not access any /proc// directories, but
    their own. Sensitive files like cmdline, sched*, status are now protected
    against other users. As permission checking done in proc_pid_permission()
    and files' permissions are left untouched, programs expecting specific
    files' modes are not confused.

    hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
    users. It doesn't mean that it hides whether a process exists (it can be
    learned by other means, e.g. by kill -0 $PID), but it hides process' euid
    and egid. It compicates intruder's task of gathering info about running
    processes, whether some daemon runs with elevated privileges, whether
    another user runs some sensitive program, whether other users run any
    program at all, etc.

    gid=XXX defines a group that will be able to gather all processes' info
    (as in hidepid=0 mode). This group should be used instead of putting
    nonroot user in sudoers file or something. However, untrusted users (like
    daemons, etc.) which are not supposed to monitor the tasks in the whole
    system should not be added to the group.

    hidepid=1 or higher is designed to restrict access to procfs files, which
    might reveal some sensitive private information like precise keystrokes
    timings:

    http://www.openwall.com/lists/oss-security/2011/11/05/3

    hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and
    conky gracefully handle EPERM/ENOENT and behave as if the current user is
    the only user running processes. pstree shows the process subtree which
    contains "pstree" process.

    Note: the patch doesn't deal with setuid/setgid issues of keeping
    preopened descriptors of procfs files (like
    https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked
    information like the scheduling counters of setuid apps doesn't threaten
    anybody's privacy - only the user started the setuid program may read the
    counters.

    Signed-off-by: Vasiliy Kulikov
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Randy Dunlap
    Cc: "H. Peter Anvin"
    Cc: Greg KH
    Cc: Theodore Tso
    Cc: Alan Cox
    Cc: James Morris
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

09 Jan, 2009

1 commit

  • Currently task_active_pid_ns is not safe to call after a task becomes a
    zombie and exit_task_namespaces is called, as nsproxy becomes NULL. By
    reading the pid namespace from the pid of the task we can trivially solve
    this problem at the cost of one extra memory read in what should be the
    same cacheline as we read the namespace from.

    When moving things around I have made task_active_pid_ns out of line
    because keeping it in pid_namespace.h would require adding includes of
    pid.h and sched.h that I don't think we want.

    This change does make task_active_pid_ns unsafe to call during
    copy_process until we attach a pid on the task_struct which seems to be a
    reasonable trade off.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Bastian Blank
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

17 Oct, 2008

1 commit


26 Jul, 2008

2 commits


30 Apr, 2008

1 commit

  • These values represent the nesting level of a namespace and pids living in it,
    and it's always non-negative.

    Turning this from int to unsigned int saves some space in pid.c (11 bytes on
    x86 and 64 on ia64) by letting the compiler optimize the pid_nr_ns a bit.
    E.g. on ia64 this removes the sign extension calls, which compiler adds to
    optimize access to pid->nubers[ns->level].

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

09 Feb, 2008

1 commit

  • Just like with the user namespaces, move the namespace management code into
    the separate .c file and mark the (already existing) PID_NS option as "depend
    on NAMESPACES"

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

15 Nov, 2007

1 commit

  • This is my trivial patch to swat innumerable little bugs with a single
    blow.

    After some intensive review (my apologies for not having gotten to this
    sooner) what we have looks like a good base to build on with the current
    pid namespace code but it is not complete, and it is still much to simple
    to find issues where the kernel does the wrong thing outside of the initial
    pid namespace.

    Until the dust settles and we are certain we have the ABI and the
    implementation is as correct as humanly possible let's keep process ID
    namespaces behind CONFIG_EXPERIMENTAL.

    Allowing us the option of fixing any ABI or other bugs we find as long as
    they are minor.

    Allowing users of the kernel to avoid those bugs simply by ensuring their
    kernel does not have support for multiple pid namespaces.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Adrian Bunk
    Cc: Jeremy Fitzhardinge
    Cc: Kir Kolyshkin
    Cc: Kirill Korotaev
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

20 Oct, 2007

7 commits

  • * remove pid.h from pid_namespaces.h;
    * rework is_(cgroup|global)_init;
    * optimize (get|put)_pid_ns for init_pid_ns;
    * declare task_child_reaper to return actual reaper.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Each pid namespace have to be visible through its own proc mount. Thus we
    need to have per-namespace proc trees with their own superblocks.

    We cannot easily show different pid namespace via one global proc tree, since
    each pid refers to different tasks in different namespaces. E.g. pid 1
    refers to the init task in the initial namespace and to some other task when
    seeing from another namespace. Moreover - pid, exisintg in one namespace may
    not exist in the other.

    This approach has one move advantage is that the tasks from the init namespace
    can see what tasks live in another namespace by reading entries from another
    proc tree.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Each namespace has a parent and is characterized by its "level". Level is the
    number of the namespace generation. E.g. init namespace has level 0, after
    cloning new one it will have level 1, the next one - 2 and so on and so forth.
    This level is not explicitly limited.

    True hierarchy must have some way to find each namespace's children, but it is
    not used in the patches, so this ability is not added (yet).

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Rename the child_reaper() function to task_child_reaper() to be similar to
    other task_* functions and to distinguish the function from 'struct
    pid_namspace.child_reaper'.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • With multiple pid namespaces, a process is known by some pid_t in every
    ancestor pid namespace. Every time the process forks, the child process also
    gets a pid_t in every ancestor pid namespace.

    While a process is visible in >=1 pid namespaces, it can see pid_t's in only
    one pid namespace. We call this pid namespace it's "active pid namespace",
    and it is always the youngest pid namespace in which the process is known.

    This patch defines and uses a wrapper to find the active pid namespace of a
    process. The implementation of the wrapper will be changed in when support
    for multiple pid namespaces are added.

    Changelog:
    2.6.22-rc4-mm2-pidns1:
    - [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
    task_active_pid_ns() in child_reaper() since task->nsproxy
    can be NULL during task exit (so child_reaper() continues to
    use init_pid_ns).

    to implement child_reaper() since init_pid_ns.child_reaper to
    implement child_reaper() since tsk->nsproxy can be NULL during exit.

    2.6.21-rc6-mm1:
    - Rename task_pid_ns() to task_active_pid_ns() to reflect that a
    process can have multiple pid namespaces.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Add kmem_cache to pid_namespace to allocate pids from.

    Since both implementations expand the struct pid to carry more numerical
    values each namespace should have separate cache to store pids of different
    sizes.

    Each kmem cache is name "pid_", where is the number of numerical ids
    on the pid. Different namespaces with same level of nesting will have same
    caches.

    This patch has two FIXMEs that are to be fixed after we reach the consensus
    about the struct pid itself.

    The first one is that the namespace to free the pid from in free_pid() must be
    taken from pid. Now the init_pid_ns is used.

    The second FIXME is about the cache allocation. When we do know how long the
    object will be then we'll have to calculate this size in create_pid_cachep.
    Right now the sizeof(struct pid) value is used.

    [akpm@linux-foundation.org: coding-style repair]
    Signed-off-by: Pavel Emelianov
    Acked-by: Cedric Le Goater
    Acked-by: Sukadev Bhattiprolu
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Make get_pid_ns() return the namespace itself to look like the other getters
    and make the code using it look nicer.

    Signed-off-by: Pavel Emelianov
    Acked-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

17 Jul, 2007

1 commit

  • While working on unshare support for the network namespace I noticed we
    were putting clone flags in an int. Which is weird because the syscall
    uses unsigned long and we at least need an unsigned to properly hold all of
    the unshare flags.

    So to make the code consistent, this patch updates the code to use
    unsigned long instead of int for the clone flags in those places
    where we get it wrong today.

    Signed-off-by: Eric W. Biederman
    Acked-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

09 May, 2007

1 commit

  • sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
    namespaces. But they have different code paths.

    This patch merges all the nsproxy and its associated namespace copy/clone
    handling (as much as possible). Posted on container list earlier for
    feedback.

    - Create a new nsproxy and its associated namespaces and pass it back to
    caller to attach it to right process.

    - Changed all copy_*_ns() routines to return a new copy of namespace
    instead of attaching it to task->nsproxy.

    - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

    - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
    just incase.

    - Get rid of all individual unshare_*_ns() routines and make use of
    copy_*_ns() instead.

    [akpm@osdl.org: cleanups, warning fix]
    [clg@fr.ibm.com: remove dup_namespaces() declaration]
    [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
    [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc:
    Signed-off-by: Cedric Le Goater
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

31 Jan, 2007

1 commit

  • This is based on a patch by Eric W. Biederman, who pointed out that pid
    namespaces are still fake, and we only have one ever active.

    So for the time being, we can modify any code which could access
    tsk->nsproxy->pid_ns during task exit to just use &init_pid_ns instead,
    and move the exit_task_namespaces call in do_exit() back above
    exit_notify(), so that an exiting nfs server has a valid tsk->sighand to
    work with.

    Long term, pulling pid_ns out of nsproxy might be the cleanest solution.

    Signed-off-by: Eric W. Biederman

    [ Eric's patch fixed to take care of free_pid() too ]

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

09 Dec, 2006

3 commits

  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. Its
    also needed so containers preserve existing semantic that pid == 1 would reap
    orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Add the pid namespace framework to the nsproxy object. The copy of the pid
    namespace only increases the refcount on the global pid namespace,
    init_pid_ns, and unshare is not implemented.

    There is no configuration option to activate or deactivate this feature
    because this not relevant for the moment.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Rename struct pspace to struct pid_namespace for consistency with other
    namespaces (uts_namespace and ipc_namespace). Also rename
    include/linux/pspace.h to include/linux/pid_namespace.h and variables from
    pspace to pid_ns.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu