19 Apr, 2011
1 commit
-
next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
Reported-by: Tavis Ormandy
Analyzed-by: Robert Święcki
Cc: Eric W. Biederman
Cc: Pavel Emelyanov
Signed-off-by: Linus Torvalds
18 Mar, 2011
1 commit
-
Export the symbols required for a race-free kvm_vcpu_on_spin.
Signed-off-by: Rik van Riel
Signed-off-by: Avi Kivity
20 Aug, 2010
2 commits
-
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/pid.c:386 invoked rcu_dereference_check() without protection!other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by rc.sysinit/1102:
#0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x40/0x160stack backtrace:
Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
Call Trace:
[] lockdep_rcu_dereference+0x94/0xb0
[] find_task_by_pid_ns+0x6d/0x70
[] find_task_by_vpid+0x18/0x20
[] sys_setpgid+0x47/0x160
[] sysenter_do_call+0x12/0x36Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().Signed-off-by: Tetsuo Handa
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett -
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Reviewed-by: Josh Triplett
11 Aug, 2010
2 commits
-
alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map->page twice. This is correct, we want to find the
unused bits < offset in this bitmap block. Add the comment.But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map->page for the second time. We have already
already checked the bits >= offset during the first attempt, it is fine to
do this again, no matter if we succeed this time or not.Remove this hard-to-understand code. It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.Signed-off-by: Oleg Nesterov
Cc: Salman Qazi
Cc: Ingo Molnar
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program. This is really bad for bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.Race Description:
A B
// pid == offset == n // pid == offset == n + 1
test_and_set_bit(offset, map->page)
test_and_set_bit(offset, map->page);
pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
// pid == n + 1 is freed (wait())// Next fork()...
last = pid_ns->last_pid; // == n
pid = last + 1;Code to reproduce it (Running multiple instances is more effective):
#include
#include
#include
#include
#include
#include// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
return (second + 32768 - first) % 32768;
}int main(int argc, char* argv[]) {
int failed = 0;
pid_t last_pid = 0;
int i;
printf("%d\n", sizeof(pid_t));
for (i = 0; i < 10000000; ++i) {
if (i % 32786 == 0)
printf("Iter: %d\n", i/32768);
int child_exit_code = i % 256;
pid_t pid = fork();
if (pid == -1) {
fprintf(stderr, "fork failed, iteration %d, errno=%d", i, errno);
exit(1);
}
if (pid == 0) {
// Child
exit(child_exit_code);
} else {
// Parent
if (i > 0) {
int distance = PidDistance(last_pid, pid);
if (distance == 0 || distance > 30000) {
fprintf(stderr,
"Unexpected pid sequence: previous fork: pid=%d, "
"current fork: pid=%d for iteration=%d.\n",
last_pid, pid, i);
failed = 1;
}
}
last_pid = pid;
int status;
int reaped = wait(&status);
if (reaped != pid) {
fprintf(stderr,
"Wait return value: expected pid=%d, "
"got %d, iteration %d\n",
pid, reaped, i);
failed = 1;
} else if (WEXITSTATUS(status) != child_exit_code) {
fprintf(stderr,
"Unexpected exit status %x, iteration %d\n",
WEXITSTATUS(status), i);
failed = 1;
}
}
}
exit(failed);
}Thanks to Ted Tso for the key ideas of this implementation.
Signed-off-by: Salman Qazi
Cc: Ingo Molnar
Cc: Theodore Ts'o
Cc: Peter Zijlstra
Cc: Sukadev Bhattiprolu
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 May, 2010
1 commit
-
On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough. A system with 1664 CPU's, there are
25163 processes started before the login prompt. It's estimated that with
2048 CPU's we will pass the 32k limit. With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche
Signed-off-by: Mike Travis
Signed-off-by: Robin Holt
Acked-by: Linus Torvalds
Cc: Ingo Molnar
Cc: Pavel Machek
Cc: Alan Cox
Cc: Greg KH
Cc: Rik van Riel
Cc: John Stoffel
Cc: Jack Steiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Mar, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: Make sparse work with inline spinlocks and rwlocks
x86/mce: Fix RCU lockdep splats
rcu: Increase RCU CPU stall timeouts if PROVE_RCU
ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
rcu: Suppress RCU lockdep warnings during early boot
rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
rcu: Suppress __mpol_dup() false positive from RCU lockdep
rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
rcu: Add control variables to lockdep_rcu_dereference() diagnostics
rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
rcu: Use wrapper function instead of exporting tasklist_lock
sched, rcu: Fix rcu_dereference() for RCU-lockdep
rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
x86/gart: Unexport gart_iommu_apertureFix trivial conflicts in kernel/trace/ftrace.c
07 Mar, 2010
1 commit
-
tasklist_lock does protect the task and its pid, it can't go away. The
problem is that find_pid_ns() itself is unsafe without rcu lock, it can
race with copy_process()->free_pid(any_pid).Protecting copy_process()->free_pid(any_pid) with tasklist_lock would make
it possible to call find_task_by_pid_ns() under tasklist safely, but we
don't do so because we are trying to get rid of the read_lock sites of
tasklist_lock.Signed-off-by: Tetsuo Handa
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Mar, 2010
1 commit
-
Lockdep-RCU commit d11c563d exported tasklist_lock, which is not
a good thing. This patch instead exports a function that uses
lockdep to check whether tasklist_lock is held.Suggested-by: Christoph Hellwig
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Christoph Hellwig
LKML-Reference:
Signed-off-by: Ingo Molnar
25 Feb, 2010
1 commit
-
Update the rcu_dereference() usages to take advantage of the new
lockdep-based checking.Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
[ -v2: fix allmodconfig missing symbol export build failure on x86 ]
Signed-off-by: Ingo Molnar
16 Dec, 2009
2 commits
-
It decreases code size by 16 bytes on my gcc 4.4.1 on Core 2:
text data bss dec hex filename
4314 2216 8 6538 198a kernel/pid.o-BEFORE
4298 2216 8 6522 197a kernel/pid.o-AFTERSigned-off-by: André Goddard Rosa
Cc: Pekka Enberg
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Avoid calling kfree() under pidmap spinlock, calling it afterwards.
Normally kfree() is fast, but sometimes it can be slow, so avoid
calling it under the spinlock if we can do it.Signed-off-by: André Goddard Rosa
Cc: Pekka Enberg
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Sep, 2009
1 commit
-
This is being done by allowing boot time allocations to specify that they
may want a sub-page sized amount of memory.Overall this seems more consistent with the other hash table allocations,
and allows making two supposedly mm-only variables really mm-only
(nr_{kernel,all}_pages).Signed-off-by: Jan Beulich
Cc: Ingo Molnar
Cc: "Eric W. Biederman"
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Jul, 2009
1 commit
-
kmemleak_alloc() calls were added in some places where alloc_bootmem was
called. Since now kmemleak tracks bootmem allocations, these explicit
calls should be run.Signed-off-by: Catalin Marinas
Cc: Ingo Molnar
Acked-by: Pekka Enberg
30 Jun, 2009
1 commit
-
Kmemleak does not track alloc_bootmem calls but the pid_hash allocated
in pidhash_init() would need to be scanned as it contains pointers to
struct pid objects.Signed-off-by: Catalin Marinas
19 Jun, 2009
1 commit
-
find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
find_task_by_pid_ns to implement find_task_by_vpid.While we're at it also remove the exports for find_task_by_pid_ns and
find_task_by_vpid - we don't have any modular callers left as the only
modular caller of he old pre pid namespace find_task_by_pid (gfs2) was
switched to pid_task which operates on a struct pid pointer instead of a
pid_t. Given the confusion about pid_t values vs namespace that's
generally the better option anyway and I think we're better of restricting
modules to do it that way.Signed-off-by: Christoph Hellwig
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2009
2 commits
-
Inho, the safety rules for vnr/nr_ns helpers are horrible and buggy.
task_pid_nr_ns(task) needs rcu/tasklist depending on task == current.
As for "special" pids, vnr/nr_ns helpers always need rcu. However, if
task != current, they are unsafe even under rcu lock, we can't trust
task->group_leader without the special checks.And almost every helper has a callsite which needs a fix.
Also, it is a bit annoying that the implementations of, say,
task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical".This patch introduces the new helper, __task_pid_nr_ns(), which is always
safe to use, and turns all other helpers into the trivial wrappers.After this I'll send another patch which converts task_tgid_xxx() as well,
they're are a bit special.Signed-off-by: Oleg Nesterov
Cc: Louis Rilling
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sys_wait4() does get_pid(task_pgrp(current)), this is not safe. We can
add rcu lock/unlock around, but we already have get_task_pid() which can
be improved to handle the special pids in more reliable manner.Signed-off-by: Oleg Nesterov
Cc: Louis Rilling
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Jan, 2009
1 commit
-
Currently task_active_pid_ns is not safe to call after a task becomes a
zombie and exit_task_namespaces is called, as nsproxy becomes NULL. By
reading the pid namespace from the pid of the task we can trivially solve
this problem at the cost of one extra memory read in what should be the
same cacheline as we read the namespace from.When moving things around I have made task_active_pid_ns out of line
because keeping it in pid_namespace.h would require adding includes of
pid.h and sched.h that I don't think we want.This change does make task_active_pid_ns unsafe to call during
copy_process until we attach a pid on the task_struct which seems to be a
reasonable trade off.Signed-off-by: Eric W. Biederman
Signed-off-by: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Roland McGrath
Cc: Bastian Blank
Cc: Pavel Emelyanov
Cc: Nadia Derbey
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jan, 2009
1 commit
-
- (better, more, bigger ...) then -> (...) than
Signed-off-by: Frederik Schwarzer
Signed-off-by: Jiri Kosina
26 Jul, 2008
2 commits
-
This one had the only users so far - the kill_proc, which is removed, so
drop this (invalid in namespaced world) call too.And of course - erase all references on it from comments.
Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move EXPORT_SYMBOL right after the func
Signed-off-by: David Sterba
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 May, 2008
1 commit
-
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.Signed-off-by: Franck Bui-Huu
Acked-by: Paul E. McKenney
Acked-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
30 Apr, 2008
4 commits
-
Based on Eric W. Biederman's idea.
Without tasklist_lock held task_session()/task_pgrp() can return NULL if the
caller races with setprgp()/setsid() which does detach_pid() + attach_pid().
This can happen even if task == current.Intoduce the new helper, change_pid(), which should be used instead. This way
the caller always sees the special pid != NULL, either old or new.Also change the prototype of attach_pid(), it always returns 0 and nobody
check the returned value.Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Based on Eric W. Biederman's idea.
Unless task == current, without tasklist_lock held task_session()/task_pgrp()
can return NULL if the caller races with de_thread() which switches the group
leader.Change transfer_pid() to not clear old->pids[type].pid for the old leader.
This means that its .pid can point to "nowhere", but this is already true for
sub-threads, and the old leader is not group_leader() any longer. IOW, with
or without this change we can't trust task's special pids unless it is the
group leader.With this change the following code
rcu_read_lock();
task = find_task_by_xxx();
do_something(task_pgrp(task), task_session(task));
rcu_read_unlock();can't race with exec and hit the NULL pid.
Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are some places that are known to operate on tasks'
global pids only:* the rest_init() call (called on boot)
* the kgdb's getthread
* the create_kthread() (since the kthread is run in init ns)So use the find_task_by_pid_ns(..., &init_pid_ns) there
and schedule the find_task_by_pid for removal.[sukadev@us.ibm.com: Fix warning in kernel/pid.c]
Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The callers of free_pidmap() pass 2 members of "struct upid", we can just
pass "struct upid *" instead. Shaves off 10 bytes from pid.o.Also, simplify the alloc_pid's "out_free:" error path a little bit. This
way it looks more clear which subset of pid->numbers[] we are freeing.Signed-off-by: Oleg Nesterov
Cc: Pavel Emelyanov
Cc: "Eric W. Biederman"
Cc :Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Feb, 2008
3 commits
-
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
pid_vnr returns the user space pid with respect to the pid namespace the
struct pid was allocated in. What we want before we return a pid to user
space is the user space pid with respect to the pid namespace of current.pid_vnr is a very nice optimization but because it isn't quite what we want
it is easy to use pid_vnr at times when we aren't certain the struct pid
was allocated in our pid namespace.Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the
parent process reported in the core dumps and the parent process in
get_signal_to_deliver.So unless the performance impact is huge having an interface that does what
we want instead of always what we want should be much more reliable and
much less error prone.Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Just like with the user namespaces, move the namespace management code into
the separate .c file and mark the (already existing) PID_NS option as "depend
on NAMESPACES"[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov
Acked-by: Serge Hallyn
Cc: Cedric Le Goater
Cc: "Eric W. Biederman"
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Feb, 2008
1 commit
-
The gl_owner_pid field is used to get the lock owning task by its pid, so make
it in a proper manner, i.e. by using the struct pid pointer and pid_task()
function.The pid_task() becomes exported for the gfs2 module.
Signed-off-by: Pavel Emelyanov
Cc: "Eric W. Biederman"
Acked-by: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Nov, 2007
1 commit
-
This is my trivial patch to swat innumerable little bugs with a single
blow.After some intensive review (my apologies for not having gotten to this
sooner) what we have looks like a good base to build on with the current
pid namespace code but it is not complete, and it is still much to simple
to find issues where the kernel does the wrong thing outside of the initial
pid namespace.Until the dust settles and we are certain we have the ABI and the
implementation is as correct as humanly possible let's keep process ID
namespaces behind CONFIG_EXPERIMENTAL.Allowing us the option of fixing any ABI or other bugs we find as long as
they are minor.Allowing users of the kernel to avoid those bugs simply by ensuring their
kernel does not have support for multiple pid namespaces.[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Adrian Bunk
Cc: Jeremy Fitzhardinge
Cc: Kir Kolyshkin
Cc: Kirill Korotaev
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
7 commits
-
Since these are expanded into call to pid_nr_ns() anyway, it's OK to move
the whole routine out-of-line. This is a cheap way to save ~100 bytes from
vmlinux. Together with the previous two patches, it saves half-a-kilo from
the vmlinux.Un-inline other (currently inlined) functions must be done with additional
performance testing.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its
id, depending on whic id - global or virtual - is used.The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the
stack to call another function - find_pid_ns(). It turned out, that this
dereference together with the push itself cause the kernel text size to
grow too much.Move all these out-of-line. Together with the previous patch this saves a
bit less that 400 bytes from .text section.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since we've switched from using pid->nr to pid->upids->nr some
fields on struct pid are no longer neededSigned-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one. All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.It turned out, that dereferencing the current->nsproxy->pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.This patch moves all this stuff out-of-line into kernel/pid.c. Together
with the next patch it saves a bit less than 400 bytes from the .text
section.Signed-off-by: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Terminate all processes in a namespace when the reaper of the namespace is
exiting. We do this by walking the pidmap of the namespace and sending
SIGKILL to all processes.Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This will help fixing memory leaks due to bad reference counting.
Signed-off-by: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
create a new struct pid for the new process. Allocate pid_t's for the new
process in the new pid namespace and all ancestor pid namespaces. Make the
newly cloned process the session and process group leader.Since the active pid namespace is special and expected to be the first entry
in pid->upid_list, preserve the order of pid namespaces.The size of 'struct pid' is dependent on the the number of pid namespaces the
process exists in, so we use multiple pid-caches'. Only one pid cache is
created during system startup and this used by processes that exist only in
init_pid_ns.When a process clones its pid namespace, we create additional pid caches as
necessary and use the pid cache to allocate 'struct pids' for that depth.Note, that with this patch the newly created namespace won't work, since the
rest of the kernel still uses global pids, but this is to be fixed soon. Init
pid namespace still works.[oleg@tv-sign.ru: merge fix]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds