13 Jan, 2012

1 commit

  • The sysctl works on the current task's pid namespace, getting and setting
    its last_pid field.

    Writing is allowed for CAP_SYS_ADMIN-capable tasks, which makes it possible
    to create a task with a desired pid value. Checkpoint/restore in userspace
    badly needs this ability.

    This approach suits all the parties for now.
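
    A minimal userspace sketch of how a restorer might use the sysctl. The
    path /proc/sys/kernel/ns_last_pid is not given in the message above and
    is stated here from general knowledge of the interface; the helper
    names set_last_pid() and fork_with_pid() are hypothetical. The idea is
    to write "wanted pid minus one" so that the next fork() in the same pid
    namespace is likely to receive the wanted value.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed sysctl path; writing to it requires CAP_SYS_ADMIN. */
#define NS_LAST_PID_PATH "/proc/sys/kernel/ns_last_pid"

/*
 * Hypothetical helper: write `last_pid` to the file at `path`.
 * Returns 0 on success, -1 on failure.
 */
int set_last_pid(const char *path, pid_t last_pid)
{
	FILE *f;
	int ok;

	assert(last_pid >= 0);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d", (int)last_pid) > 0;
	/* fclose() flushes the buffer; a rejected write is reported here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: try to fork a child whose pid is `want`.
 * Returns what fork() returns (0 in the child, the child's pid in the
 * parent) or -1 on error. The parent must compare the result with `want`.
 */
pid_t fork_with_pid(pid_t want)
{
	assert(want > 0);
	if (set_last_pid(NS_LAST_PID_PATH, want - 1) != 0)
		return -1;
	return fork();
}
```

    The result is only a hint: if another task in the namespace forks
    between the write and the fork(), or if the wanted pid is already in
    use, the child gets a different pid, so the caller still has to check.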

    Signed-off-by: Pavel Emelyanov
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Cyrill Gorcunov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

24 Mar, 2011

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
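
    The first rule above can be illustrated with a small sketch. This is
    not the script itself (which is written in Python and is not
    reproduced here); the token lists and the function name
    needed_header() are assumptions chosen for illustration. Because
    slab.h includes gfp.h, slab usage takes precedence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative token lists only; the real script's patterns may differ. */
static const char *const slab_tokens[] = {
	"kmalloc(", "kzalloc(", "kfree(", "kmem_cache_",
};
static const char *const gfp_tokens[] = {
	"GFP_", "alloc_pages(", "__get_free_page",
};

/* Return 1 if any of the `n` tokens occurs in `src`. */
static int contains_any(const char *src, const char *const *tokens, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, tokens[i]))
			return 1;
	return 0;
}

/*
 * Decide which header a source text should include directly:
 * "linux/slab.h" if slab is used, "linux/gfp.h" if only gfp is used,
 * NULL if neither is used.
 */
const char *needed_header(const char *src)
{
	assert(src != NULL);
	if (contains_any(src, slab_tokens,
			 sizeof(slab_tokens) / sizeof(slab_tokens[0])))
		return "linux/slab.h";
	if (contains_any(src, gfp_tokens,
			 sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
		return "linux/gfp.h";
	return NULL;
}
```

    A plain substring search like this also matches comments and strings,
    which is one reason manual review steps such as those described below
    remain necessary.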

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion and
    some needed manual addition, while for others adding it to the
    implementation .h or the embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

13 Mar, 2010

1 commit

  • zap_pid_ns_processes() uses force_sig(SIGKILL) to ensure SIGKILL will be
    delivered to sub-namespace inits as well. This is correct, but we are
    going to change force_sig_info() semantics. See
    http://bugzilla.kernel.org/show_bug.cgi?id=15395#c31

    We can use send_sig_info(SEND_SIG_NOINFO) instead. Since commit
    614c517d7c00af1b26ded20646b329397d6f51a1 ("signals: SEND_SIG_NOINFO should
    be considered as SI_FROMUSER()"), SEND_SIG_NOINFO means "from user", and
    therefore send_signal() will get the correct from_ancestor_ns = T flag.
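
    The decision described above can be modelled in a few lines. This is
    a simplified userspace model, not kernel code: the enum, the function
    name and the integer parameter are stand-ins invented for
    illustration. It assumes that a sender which has no pid in the
    target's namespace (modelled as 0) lives in an ancestor namespace.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's special siginfo values (model only). */
enum sig_origin {
	ORIGIN_NOINFO,	/* models SEND_SIG_NOINFO: treated as "from user" */
	ORIGIN_PRIV,	/* models SEND_SIG_PRIV: treated as kernel-internal */
};

/*
 * Model of how from_ancestor_ns is derived.
 * `sender_pid_in_target_ns` is the sender's pid as seen from the
 * target's pid namespace; 0 means the sender is not visible there.
 */
bool from_ancestor_ns(enum sig_origin origin, int sender_pid_in_target_ns)
{
	assert(sender_pid_in_target_ns >= 0);
	/* Only "from user" signals are checked against the namespace. */
	return origin == ORIGIN_NOINFO && sender_pid_in_target_ns == 0;
}
```

    In this model a SIGKILL sent with ORIGIN_NOINFO from the parent
    namespace is flagged as coming from an ancestor, which is what lets it
    reach a sub-namespace init, while ORIGIN_PRIV never gets the flag.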

    Signed-off-by: Oleg Nesterov
    Acked-by: Serge Hallyn
    Acked-by: Linus Torvalds
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

24 Sep, 2009

1 commit

  • CLONE_PARENT was used to implement an older threading model. For
    consistency with the CLONE_THREAD check in copy_pid_ns(), disable
    CLONE_PARENT with CLONE_NEWPID, at least until the required semantics of
    pid namespaces are clear.
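
    A sketch of the resulting flag check, written as a standalone
    userspace function. The name check_pidns_clone_flags() is
    hypothetical; the CLONE_* constants come from the exported kernel
    header <linux/sched.h>. It assumes the check simply rejects the
    combination with -EINVAL, as the existing CLONE_THREAD check does.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>

/*
 * Return 0 if `flags` is an acceptable combination with respect to
 * CLONE_NEWPID, or -EINVAL if it asks for a new pid namespace together
 * with CLONE_THREAD or CLONE_PARENT.
 */
int check_pidns_clone_flags(unsigned long flags)
{
	/* Without CLONE_NEWPID there is nothing to restrict here. */
	if (!(flags & CLONE_NEWPID))
		return 0;
	if (flags & (CLONE_THREAD | CLONE_PARENT))
		return -EINVAL;
	return 0;
}
```

    For example, check_pidns_clone_flags(CLONE_NEWPID | CLONE_PARENT)
    yields -EINVAL, while CLONE_PARENT on its own is still accepted.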

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Roland McGrath
    Acked-by: Serge Hallyn
    Cc: Oren Laadan
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     

19 Jun, 2009

2 commits

  • copy_pid_ns() is a perfect example of a case where unwinding leads to more
    code and makes it less clear. Watch the diffstat.

    Signed-off-by: Alexey Dobriyan
    Cc: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Reviewed-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • create_pid_namespace() creates everything, but the caller has to assign
    the parent pidns by hand, which is unnatural. At the time of the call the
    new ->level has to be taken from somewhere, and the parent pidns is
    already available.
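
    A reduced userspace model of the interface this implies: the parent
    is passed in, and both ->level and ->parent are set inside the
    constructor. The struct below keeps only the two fields discussed
    here, calloc() stands in for the kernel allocator, and reference
    counting of the parent is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced model of the kernel structure; only the fields discussed. */
struct pid_namespace {
	unsigned int level;		/* nesting depth, 0 for the initial ns */
	struct pid_namespace *parent;	/* NULL for the initial ns */
};

/*
 * Create a child of `parent`. Level and parent link are both derived
 * from the argument, so the caller has nothing left to assign by hand.
 * Returns NULL on allocation failure.
 */
struct pid_namespace *create_pid_namespace(struct pid_namespace *parent)
{
	struct pid_namespace *ns;

	assert(parent != NULL);
	/* calloc() zero-fills, so fields not set below start out as 0. */
	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return NULL;
	ns->level = parent->level + 1;
	ns->parent = parent;
	return ns;
}
```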

    Signed-off-by: Alexey Dobriyan
    Cc: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Acked-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

03 Apr, 2009

1 commit

  • send_signal() assumes that signals with SEND_SIG_PRIV are generated from
    within the same namespace. So any nested container-init processes become
    immune to the SIGKILL generated by kill_proc_info() in
    zap_pid_ns_processes().

    Use force_sig() in zap_pid_ns_processes() instead - force_sig() clears the
    SIGNAL_UNKILLABLE flag ensuring the signal is processed by
    container-inits.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: "Eric W. Biederman"
    Cc: Daniel Lezcano
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     

03 Sep, 2008

2 commits

  • We don't change pid_ns->child_reaper when the main thread of the
    subnamespace init exits. As Robert Rex pointed out, this is wrong.

    Yes, the re-parenting itself works correctly, but if the reparented task
    exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
    main thread is zombie its ->nsproxy was already cleared by
    exit_task_namespaces().

    Introduce the new function, find_new_reaper(), which finds the new
    ->parent for the re-parenting and changes ->child_reaper if needed. Kill
    the now unneeded exit_child_reaper().

    Also move the changing of ->child_reaper from zap_pid_ns_processes() to
    find_new_reaper(), this consolidates the games with ->child_reaper and
    makes it stable under tasklist_lock.
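
    The selection logic can be sketched as a userspace model. The structs
    are reduced stand-ins, locking and the killing of the remaining
    namespace tasks are omitted, and the `fallback` parameter is a
    hypothetical substitute for whatever reaper is used once the whole
    thread group is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task;

struct pid_namespace {
	struct task *child_reaper;
};

/* Reduced task model: the threads of one group form a circular list. */
struct task {
	bool exiting;			/* models the "task is exiting" state */
	struct task *next_thread;	/* circular; points to itself if alone */
	struct pid_namespace *ns;
};

/*
 * Pick the new parent for the children of the exiting task `father`
 * and update the namespace's ->child_reaper if `father` held that role.
 */
struct task *find_new_reaper(struct task *father, struct task *fallback)
{
	struct pid_namespace *ns = father->ns;
	struct task *t;

	assert(father->exiting);
	for (t = father->next_thread; t != father; t = t->next_thread) {
		if (t->exiting)
			continue;
		/* A live sibling thread takes over as namespace reaper. */
		if (ns->child_reaper == father)
			ns->child_reaper = t;
		return t;
	}
	/* No live thread left: never leave ->child_reaper stale or NULL. */
	if (ns->child_reaper == father)
		ns->child_reaper = fallback;
	return ns->child_reaper;
}
```

    With a two-thread init whose main thread exits first, the surviving
    thread becomes both the new parent and the namespace's child_reaper,
    which is the case the report was about.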

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391

    Reported-by: Robert Rex
    Signed-off-by: Oleg Nesterov
    Acked-by: Serge Hallyn
    Acked-by: Pavel Emelyanov
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • zap_pid_ns_processes() sets pid_ns->child_reaper = NULL; this is wrong.

    Yes, we have already killed all tasks in this namespace, and sys_wait4()
    doesn't see any child. But this doesn't mean ->children list is empty, we
    may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
    the subsequent forget_original_parent() will crash the kernel because it
    will try to re-parent these tasks to the NULL reaper.

    Even if there are no children, it is not good that forget_original_parent()
    uses reaper == NULL.

    Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
    We could use pid_ns->parent->child_reaper as well, I think this does not
    really matter. These EXIT_DEAD tasks are not visible to the new ->parent
    after re-parenting, they will silently do release_task() eventually.

    Note that we must change ->child_reaper, otherwise
    forget_original_parent() will use reaper == father, and in that case we
    will hit the (correct) BUG_ON(!list_empty(&father->children)).

    Signed-off-by: Oleg Nesterov
    Acked-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

26 Jul, 2008

2 commits

  • Allocate the structure on the first call to sys_acct(). After this, each
    namespace that has requested accounting will live with this structure
    until its own death.

    Two notes:
    - routines that close the accounting at fs umount time use
    the init_pid_ns's acct for now;
    - the accounting routine accounts to the dying task's namespace
    (also for now).

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • It makes the initialization of many fields implicit, which helps in
    auto-setting #ifdef-ed fields (the bsd-acct related pointer will be one).

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

30 Apr, 2008

1 commit

  • These values represent the nesting level of a namespace and of the pids
    living in it, and they are always non-negative.

    Turning this from int to unsigned int saves some space in pid.c (11 bytes on
    x86 and 64 on ia64) by letting the compiler optimize pid_nr_ns() a bit.
    E.g. on ia64 this removes the sign extension calls, which the compiler adds
    to optimize access to pid->numbers[ns->level].
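
    The access pattern in question looks roughly like the following
    userspace model of pid_nr_ns(). The structs are reduced to the fields
    involved, and MAX_LEVEL is an arbitrary bound introduced only so the
    sketch can use a fixed-size array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4	/* arbitrary bound for this sketch */

struct pid_namespace {
	unsigned int level;		/* nesting depth of the namespace */
};

struct upid {
	int nr;				/* pid value as seen in ->ns */
	struct pid_namespace *ns;
};

struct pid {
	unsigned int level;		/* deepest namespace the pid lives in */
	struct upid numbers[MAX_LEVEL];	/* one entry per level 0..level */
};

/* Return the pid value as seen from `ns`, or 0 if it is not visible. */
int pid_nr_ns(const struct pid *pid, const struct pid_namespace *ns)
{
	int nr = 0;

	if (pid && ns->level <= pid->level) {
		assert(pid->level < MAX_LEVEL);
		/* Unsigned index: no sign extension is needed here. */
		const struct upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;
}
```

    Since the level is used directly as an array index, declaring it
    unsigned spares the compiler from widening a signed value first.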

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

29 Apr, 2008

1 commit


09 Feb, 2008

1 commit

  • Just like with the user namespaces, move the namespace management code into
    a separate .c file and mark the (already existing) PID_NS option as "depend
    on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov