20 Oct, 2007

40 commits

  • define global BIT macro

    move all local BIT defines to the new globally define macro.

    Signed-off-by: Jiri Slaby
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Cc: Dmitry Torokhov
    Cc: Jeff Garzik
    Cc: James Bottomley
    Cc: "Antonino A. Daplas"
    Cc: Russell King
    Acked-by: Ralf Baechle
    Cc: "John W. Linville"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • get rid of input BIT* duplicate defines

    use newly global defined macros for input layer. Also remove includes of
    input.h from non-input sources only for BIT macro definiton. Define the
    macro temporarily in local manner, all those local definitons will be
    removed further in this patchset (to not break bisecting).
    BIT macro will be globally defined (1<
    Cc:
    Acked-by: Jiri Kosina
    Cc:
    Acked-by: Marcel Holtmann
    Cc:
    Acked-by: Mauro Carvalho Chehab
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • define first set of BIT* macros

    - move BITOP_MASK and BITOP_WORD from asm-generic/bitops/atomic.h to
    include/linux/bitops.h and rename it to BIT_MASK and BIT_WORD
    - move BITS_TO_LONGS and BITS_PER_BYTE to bitops.h too and allow easily
    define another BITS_TO_something (e.g. in event.c) by BITS_TO_TYPE macro
    Remaining (and common) BIT macro will be defined after all occurences and
    conflicts will be sorted out in the patches.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • amba-pl011, rename BIT macro

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • s2io, rename BIT macro

    BIT macro will be global definiton of (1<
    Cc: Ramkrishna Vepa
    Cc: Rastapur Santosh
    Cc: Sivakumar Subramani
    Cc: Sreenivasa Honnur
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • i2c-pxa, rename BIT macro to PXA_BIT

    BIT macro will be global definiton of (1 << x)

    Signed-off-by: Jiri Slaby
    Cc: Nicolas Pitre
    Cc: Jean Delvare
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • This patch fixes errors and warnings pointed out by the checkpatch.pl
    script.

    Antonino Daplas replaced BIT with ENCODE_BIT.

    Signed-off-by: Krzysztof Helt
    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Helt
     
  • cyber2000fb, rename BIT macro

    BIT will be global macro for (1 << x)

    Signed-off-by: Jiri Slaby
    Cc: Russell King
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • forbid asm/bitops.h direct inclusion

    Because of compile errors that may occur after bit changes if asm/bitops.h is
    included directly without e.g. linux/kernel.h which includes linux/bitops.h,
    forbid direct inclusion of asm/bitops.h. Thanks to Adrian Bunk.

    Signed-off-by: Jiri Slaby
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • remove asm/bitops.h includes

    including asm/bitops directly may cause compile errors. don't include it
    and include linux/bitops instead. next patch will deny including asm header
    directly.

    Cc: Adrian Bunk
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • fs/select, remove unused macros

    this is due to preparation for global BIT macro

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • This new version guarantees amb_bit switch in small enough intervals, so that
    the device won't stop working in the middle of a movement anymore. However it
    preserves old (openhaptics) functionality.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Wait after disabling device's interrupt until the handler finishes its work if
    still in progress.

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks:

    - collect batches of tasks under tasklist_lock and then call
    set_cpus_allowed() on them outside the lock (since this can sleep).

    - add a simple generic priority heap type to allow efficient collection
    of batches of tasks to be processed without duplicating or missing any
    tasks in subsequent batches.

    - make "cpus" file update a no-op if the mask hasn't changed

    - fix race between update_cpumask() and sched_setaffinity() by making
    sched_setaffinity() post-check that it's not running on any cpus outside
    cpuset_cpus_allowed().

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Balbir Singh
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code.

    Other than subtle improvements in the consistency of identifying
    white space at the beginning and end of passed in masks, this
    doesn't make any visible difference in behaviour. But it's
    one or two hundred kernel text bytes smaller, and easier to
    understand.

    [akpm@linux-foundation.org: coding-style fix]
    Signed-off-by: Paul Jackson
    Reviewed-by: Paul Menage
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Add a new per-cpuset flag called 'sched_load_balance'.

    When enabled in a cpuset (the default value) it tells the kernel scheduler
    that the scheduler should provide the normal load balancing on the CPUs in
    that cpuset, sometimes moving tasks from one CPU to a second CPU if the
    second CPU is less loaded and if that task is allowed to run there.

    When disabled (write "0" to the file) then it tells the kernel scheduler
    that load balancing is not required for the CPUs in that cpuset.

    Now even if this flag is disabled for some cpuset, the kernel may still
    have to load balance some or all the CPUs in that cpuset, if some
    overlapping cpuset has its sched_load_balance flag enabled.

    If there are some CPUs that are not in any cpuset whose sched_load_balance
    flag is enabled, the kernel scheduler will not load balance tasks to those
    CPUs.

    Moreover the kernel will partition the 'sched domains' (non-overlapping
    sets of CPUs over which load balancing is attempted) into the finest
    granularity partition that it can find, while still keeping any two CPUs
    that are in the same shed_load_balance enabled cpuset in the same element
    of the partition.

    This serves two purposes:
    1) It provides a mechanism for real time isolation of some CPUs, and
    2) it can be used to improve performance on systems with many CPUs
    by supporting configurations in which load balancing is not done
    across all CPUs at once, but rather only done in several smaller
    disjoint sets of CPUs.

    This mechanism replaces the earlier overloading of the per-cpuset
    flag 'cpu_exclusive', which overloading was removed in an earlier
    patch: cpuset-remove-sched-domain-hooks-from-cpusets

    See further the Documentation and comments in the code itself.

    [akpm@linux-foundation.org: don't be weird]
    Signed-off-by: Paul Jackson
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Since these are expanded into call to pid_nr_ns() anyway, it's OK to move
    the whole routine out-of-line. This is a cheap way to save ~100 bytes from
    vmlinux. Together with the previous two patches, it saves half-a-kilo from
    the vmlinux.

    Un-inline other (currently inlined) functions must be done with additional
    performance testing.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its
    id, depending on whic id - global or virtual - is used.

    The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the
    stack to call another function - find_pid_ns(). It turned out, that this
    dereference together with the push itself cause the kernel text size to
    grow too much.

    Move all these out-of-line. Together with the previous patch this saves a
    bit less that 400 bytes from .text section.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • With pid namespaces this field is now dangerous to use explicitly, so hide
    it behind the helpers.

    Also the pid and pgrp fields o task_struct and signal_struct are to be
    deprecated. Unfortunately this patch cannot be sent right now as this
    leads to tons of warnings, so start isolating them, and deprecate later.

    Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
    but Oleg pointed out that in case of posix cpu timers this is the same, and
    thread_group_leader() is more preferable.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Since we've switched from using pid->nr to pid->upids->nr some
    fields on struct pid are no longer needed

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The find_task_by_something is a set of macros are used to find task by pid
    depending on what kind of pid is proposed - global or virtual one. All of
    them are wrappers above the most generic one - find_task_by_pid_type_ns() -
    and just substitute some args for it.

    It turned out, that dereferencing the current->nsproxy->pid_ns construction
    and pushing one more argument on the stack inline cause kernel text size to
    grow.

    This patch moves all this stuff out-of-line into kernel/pid.c. Together
    with the next patch it saves a bit less than 400 bytes from the .text
    section.

    Signed-off-by: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or get from user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with pid_nr() call;
    - when seeking the task from kernel code with the stored id one
    should use find_task_by_pid() call that works with global pids;
    - when showing pid's numerical value to the user the virtual one
    should be used, but however when one shows task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one need to consider this as
    the virtual one and use appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Terminate all processes in a namespace when the reaper of the namespace is
    exiting. We do this by walking the pidmap of the namespace and sending
    SIGKILL to all processes.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Only the global-init process must be special - any other cgroup-init
    process must be killable to prevent run-away processes in the system.

    TODO: Ideally we should allow killing the cgroup-init only from parent
    cgroup and prevent it being killed from within the cgroup.
    But that is a more complex change and will be addressed by a follow-on
    patch. For now allow the cgroup-init to be terminated by any process
    with sufficient privileges.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • This will help fixing memory leaks due to bad reference counting.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • The namespace's proc_mnt must be kern_mount-ed to make this pointer always
    valid, independently of whether the user space mounted the proc or not. This
    solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL
    to not-NULL.

    The initialization is done after the init's pid is created and hashed to make
    proc_get_sb() finr it and get for root inode.

    Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
    superblock holds the namespace we must explicitly break this circle to destroy
    all the stuff. This is done after the init of the namespace dies. Running a
    few steps forward - when init exits it will kill all its children, so no
    proc_mnt will be needed after its death.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This means that proc_flush_task_mnt() is to be called for many proc mounts and
    with different ids, depending on the namespace this pid is to be flushed from.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
    create a new struct pid for the new process. Allocate pid_t's for the new
    process in the new pid namespace and all ancestor pid namespaces. Make the
    newly cloned process the session and process group leader.

    Since the active pid namespace is special and expected to be the first entry
    in pid->upid_list, preserve the order of pid namespaces.

    The size of 'struct pid' is dependent on the the number of pid namespaces the
    process exists in, so we use multiple pid-caches'. Only one pid cache is
    created during system startup and this used by processes that exist only in
    init_pid_ns.

    When a process clones its pid namespace, we create additional pid caches as
    necessary and use the pid cache to allocate 'struct pids' for that depth.

    Note, that with this patch the newly created namespace won't work, since the
    rest of the kernel still uses global pids, but this is to be fixed soon. Init
    pid namespace still works.

    [oleg@tv-sign.ru: merge fix]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • * remove pid.h from pid_namespaces.h;
    * rework is_(cgroup|global)_init;
    * optimize (get|put)_pid_ns for init_pid_ns;
    * declare task_child_reaper to return actual reaper.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Each pid namespace have to be visible through its own proc mount. Thus we
    need to have per-namespace proc trees with their own superblocks.

    We cannot easily show different pid namespace via one global proc tree, since
    each pid refers to different tasks in different namespaces. E.g. pid 1
    refers to the init task in the initial namespace and to some other task when
    seeing from another namespace. Moreover - pid, exisintg in one namespace may
    not exist in the other.

    This approach has one move advantage is that the tasks from the init namespace
    can see what tasks live in another namespace by reading entries from another
    proc tree.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When we create new namespace we will need to allocate the struct pid, that
    will have one extra struct upid in array, comparing to the parent.

    Thus we need to know the new namespace (if any) in alloc_pid() to init this
    struct upid properly, so move the alloc_pid() call lower in copy_process().

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When searching the task by numerical id on may need to find it using global
    pid (as it is done now in kernel) or by its virtual id, e.g. when sending a
    signal to a task from one namespace the sender will specify the task's virtual
    id and we should find the task by this value.

    [akpm@linux-foundation.org: fix gfs2 linkage]
    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When showing pid to user or getting the pid numerical id for in-kernel use the
    value of this id may differ depending on the namespace.

    This set of helpers is used to get the global pid nr, the virtual (i.e. seen
    by task in its namespace) nr and the nr as it is seen from the specified
    namespace.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Each struct upid element of struct pid has to be initialized properly, i.e.
    its nr mst be allocated from appropriate pidmap and ns set to appropriate
    namespace.

    When allocating a new pid, we need to know the namespace this pid will live
    in, so the additional argument is added to alloc_pid().

    On the other hand, the rest of the kernel still uses the pid->nr and
    pid->pid_chain fields, so these ones are still initialized, but this will be
    removed soon.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Each namespace has a parent and is characterized by its "level". Level is the
    number of the namespace generation. E.g. init namespace has level 0, after
    cloning new one it will have level 1, the next one - 2 and so on and so forth.
    This level is not explicitly limited.

    True hierarchy must have some way to find each namespace's children, but it is
    not used in the patches, so this ability is not added (yet).

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Since task will be visible from different pid namespaces each of them have to
    be addressed by multiple pids. struct upid is to store the information about
    which id refers to which namespace.

    The constuciton looks like this. Each struct pid carried the reference
    counter and the list of tasks attached to this pid. At its end it has a
    variable length array of struct upid-s. Each struct upid has a numerical id
    (pid itself), pointer to the namespace, this ID is valid in and is hashed into
    a pid_hash for searching the pids.

    The nr and pid_chain fields are kept in struct pid for a while to make kernel
    still work (no patch initialize the upids yet), but it will be removed at the
    end of this series when we switch to upids completely.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • The first part is trivial - we just make the proc_flush_task() to operate on
    arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to
    it.

    The other change is more tricky: I moved the proc_flush_task() call in
    release_task() higher to address the following problem.

    When flushing task from many proc trees we need to know the set of ids (not
    just one pid) to find the dentries' names to flush. Thus we need to pass the
    task's pid to proc_flush_task() as struct pid is the only object that can
    provide all the pid numbers. But after __exit_signal() task has detached all
    his pids and this information is lost.

    This creates a tiny gap for proc_pid_lookup() to bring some dentries back to
    tree and keep them in hash (since pids are still alive before __exit_signal())
    till the next shrink, but since proc_flush_task() does not provide a 100%
    guarantee that the dentries will be flushed, this is OK to do so.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This flag tells the .get_sb callback that this is a kern_mount() call so that
    it can trust *data pointer to be valid in-kernel one. If this flag is passed
    from the user process, it is cleared since the *data pointer is not a valid
    kernel object.

    Running a few steps forward - this will be needed for proc to create the
    superblock and store a valid pid namespace on it during the namespace
    creation. The reason, why the namespace cannot live without proc mount is
    described in the appropriate patch.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Make task release its namespaces after it has reparented all his children to
    child_reaper, but before it notifies its parent about its death.

    The reason to release namespaces after reparenting is that when task exits it
    may send a signal to its parent (SIGCHLD), but if the parent has already
    exited its namespaces there will be no way to decide what pid to dever to him
    - parent can be from different namespace.

    The reason to release namespace before notifying the parent it that when task
    sends a SIGCHLD to parent it can call wait() on this taks and release it. But
    releasing the mnt namespace implies dropping of all the mounts in the mnt
    namespace and NFS expects the task to have valid sighand pointer.

    Thanks to Oleg for pointing out some races that can apear and helping with
    patches and fixes.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov