09 Feb, 2008
40 commits
-
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly. Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already. In that case the group stop never
finishes properly.Note: this is a minimal fix, we can add some optimizations later. Say we
can return quickly if thread_group_empty(). Also, we can move some signal
related code from exit_notify() to exit_signals().Signed-off-by: Oleg Nesterov
Acked-by: Davide Libenzi
Cc: Ingo Molnar
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.Signed-off-by: Oleg Nesterov
Acked-by: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sys_setsid() still deals with pid_t's from the global namespace. This means
that the "session > 1" check can't help for sub-namespace init, setsid() can't
succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.Remove the usage of task_struct->pid and convert the code to use "struct pid".
This also simplifies and speedups the code, saves one find_pid().Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
versa. Use "struct pid" more consistently. Saves one find_vpid() and
eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the
code.Also use the same_thread_group() helper.
Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
The exit_state was EXIT_ZOMBIE when the function was called, and another
thread can change it to EXIT_DEAD right after the check.The second condition is not possible, detached non-traced threads were already
filtered out by eligible_child(), we didn't drop tasklist since then.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Surprise, the other two wait_task_*() functions also abuse the
task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
in wait_task_continued(). wait_task_zombie() doesn't have this problem,
but it is still better to cache pid_t rather than call task_pid_nr_ns()
three times on the saved pid_namespace.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Imho, the current usage of security_task_wait() is not logical.
Suppose we have the single child p, and security_task_wait(p) return
-EANY. In that case waitpid(-1) returns this error. Why? Isn't it
better to return ECHLD? We don't really have reapable children.Now suppose that child was stolen by gdb. In that case we find this
child on ->ptrace_children and set flag = 1, but we don't check that the
child was denied. So, do_wait(..., WNOHANG) returns 0, this doesn't
match the behaviour above. Without WNOHANG do_wait() blocks only to
return the error later, when the child will be untraced. Inho, really
strange.I think eligible_child() should return the error only if the child's pid
was requested explicitly, otherwise we should silently ignore the tasks
which were nacked by security_task_wait().Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Chris Wright
Cc: Eric Paris
Cc: James Morris
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
eligible_child() == 2 means delay_group_leader(). With the previous patch
this only matters for EXIT_ZOMBIE task, we can move that special check to
the only place it is really needed.Also, with this patch we don't skip security_task_wait() for the group
leaders in a non-empty thread group. I don't really understand the exact
semantics of security_task_wait(), but imho this change is a bugfix.Also rearrange the code a bit to kill an ugly "check_continued" backdoor.
Signed-off-by: Oleg Nesterov
Cc: Eric Paris
Cc: James Morris
Cc: Roland McGrath
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
wait_task_stopped() doesn't need the "delay_group_leader" parameter. If
the child is not traced it must be a group leader. With or without
subthreads ->group_stop_count == 0 when the whole task is stopped.Signed-off-by: Oleg Nesterov
Cc: Mika Penttila
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the tracer is gone and we are not going to stop, ptrace_stop() sets
->exit_code = nostop_code. However, the tracer could actually clear the
exit code before detaching. In that case get_signal_to_deliver() "resends"
the signal which was cancelled by the debugger. For example, it is
possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in
STOPPED state.Change the behaviour of ptrace_stop(). If the caller is ptrace notify(),
we should always clear ->exit_code. If the caller is
get_signal_to_deliver(), we should not touch it at all. To do so, change
the nonstop_code parameter to "bool clear_code" and change the callers
accordingly.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Every branch if the main "if" statement does the same code at the end. Move
it down. Also, fix the indentation.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
wait_task_stopped() has multiple races with SIGCONT/SIGKILL. tasklist_lock
does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info
reported (including exit_code) may be wrong.In fact, the code under write_lock_irq(tasklist_lock) is not safe. The child
may be PTRACE_DETACH'ed at this time by another subthread, in that case it is
possible we are no longer its ->parent.Change wait_task_stopped() to take ->siglock before inspecting the task. This
guarantees that the child can't resume and (for example) clear its
->exit_code, so we don't need to use xchg(&p->exit_code) and re-check. The
only exception is ptrace_stop() which changes ->state and ->exit_code without
->siglock held during abort. But this can only happen if both the tracer and
the tracee are dying (coredump is in progress), we don't care.With this patch wait_task_stopped() doesn't move the child to the end of
the ->parent list on success. This optimization could be restored, but
in that case we have to take write_lock(tasklist) and do some nasty
checks.Also change the do_wait() since we don't return EAGAIN any longer.
[akpm@linux-foundation.org: fix up after Willy renamed everything]
Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the tracer went away (may_ptrace_stop() failed), ptrace_stop() drops
tasklist and then changes the ->state from TASK_TRACED to TASK_RUNNING.This can fool another tracer which attaches to us in between. Change the
->state under tasklist_lock to ensure that ptrace_check_attach() can't wrongly
succeed. Also, remove the unnecessary mb().Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is not possible to see the PT_PTRACED task without ->signal/sighand under
tasklist_lock, release_task() does ptrace_unlink() first. If the task was
already released before, ptrace_attach() can't succeed and set PT_PTRACED.
Remove this check.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED"
inline and simplify the corresponding logic in do_wait: we can't find the
child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace()
either sets TASK_STOPPED or wakes up the tracee.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since the patch
"Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
commit f5b40e363ad6041a96e3da32281d8faa191597b9we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.
This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
which were needed to catch the "ptrace attach is in progress" case. We can
also remove the flag itself since nobody else uses it.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type. But in fact, they
do the same thing: they loop around all ipcs to free them individually by
calling a specific routine.This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that do the job. The specific routine to call on each
individual ipcs is passed as parameter. For this, these ipc-specific
'free' routines are reworked to take a generic 'struct ipc_perm' as
parameter.Signed-off-by: Pierre Peiffer
Cc: Cedric Le Goater
Cc: Pavel Emelyanov
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
are dynamically allocated for each icp_namespace as the ipc_namespace
itself (for the init namespace, they are initialized with pointers to
static variables instead)It is so for historical reason: in fact, before the use of idr to store the
ipcs, the ipcs were stored in tables of variable length, depending of the
maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
size. As they are allocated in any cases for each new ipc_namespace, there
is no gain of memory in having them allocated separately of the struct
ipc_namespace.This patch proposes to make this table static in the struct ipc_namespace.
Thus, we can allocate all in once and get rid of all the code needed to
allocate and free these ipc_ids separately.Signed-off-by: Pierre Peiffer
Acked-by: Cedric Le Goater
Cc: Pavel Emelyanov
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
These commands (SEM_STAT and IPC_STAT) are rather doing the same things
(only the meaning of the id given as input and the return value differ).
However, for the semaphores, they are handled in two different places (two
different functions).This patch consolidates this for clarification by handling these both
commands in the same place in semctl_nolock(). It also removes one unused
parameter for this function.Signed-off-by: Pierre Peiffer
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
inline. Besides, they give no optimization being inline as they perform
calls inside in any case.Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC
internal API.$ ./scripts/bloat-o-meter vmlinux-orig vmlinux
add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
function old new delta
ipcget - 392 +392
ipc_lock_check_down - 49 +49
ipc_lock_check - 49 +49
sys_semget 119 105 -14
sys_shmget 108 86 -22
sys_msgget 100 78 -22
do_msgsnd 665 631 -34
do_msgrcv 680 644 -36
do_shmat 771 733 -38
sys_msgctl 1302 1229 -73
ipcget_new 80 - -80
sys_semtimedop 1534 1452 -82
sys_semctl 2034 1922 -112
sys_shmctl 1919 1765 -154
ipcget_public 322 - -322The ipcget() growth is the result of gcc inlining of currently static
ipcget_new/_public.Signed-off-by: Pavel Emelyanov
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Also removes a cflag comparison that caused some mode changes to get wrongly
ignoredSigned-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alan Cox
Cc: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Get the constness right, avoid nasty cast.
Cc: Ingo Molnar
Cc: Kyle McMartin
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
module.c should not define linker variables on its own. We have an include
file for that.Signed-off-by: Christoph Lameter
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The module subsystem cannot handle symbols that are zero. If symbols are
present that have a zero value then the module resolver prints out a
message that these symbols are unresolved.[akinobu.mita@gmail.com: fix __find_symbl() error checks]
Cc: Mathieu Desnoyers
Cc: Kay Sievers
Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Akinobu Mita
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds