Doug / smarc-fsl-linux-kernel | Embedian Git Server

29 Sep, 2008

1 commit

31a78f23b mm owner: fix race between swapoff and exit ... Browse Code »

There's a race between mm->owner assignment and swapoff, more easily
seen when task slab poisoning is turned on. The condition occurs when
try_to_unuse() runs in parallel with an exiting task. A similar race
can occur with callers of get_task_mm(), such as /proc//
or ptrace or page migration.

CPU0 CPU1
try_to_unuse
looks at mm = task0->mm
increments mm->mm_users
task 0 exits
mm->owner needs to be updated, but no
new owner is found (mm_users > 1, but
no other task has task->mm = task0->mm)
mm_update_next_owner() leaves
mmput(mm) decrements mm->mm_users
task0 freed
dereferencing mm->owner fails

The fix is to notify the subsystem via mm_owner_changed callback(),
if no new owner is found, by specifying the new task as NULL.

Jiri Slaby:
mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
must be set after that, so as not to pass NULL as old owner causing oops.

Daisuke Nishimura:
mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
and its callers need to take account of this situation to avoid oops.

Hugh Dickins:
Lockdep warning and hang below exec_mmap() when testing these patches.
exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
so exec_mmap() now needs to do the same. And with that repositioning,
there's now no point in mm_need_new_owner() allowing for NULL mm.

Reported-by: Hugh Dickins
Signed-off-by: Balbir Singh
Signed-off-by: Jiri Slaby
Signed-off-by: Daisuke Nishimura
Signed-off-by: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-09-29 23:41:47 +0800

06 Sep, 2008

1 commit

49048622e sched: fix process time monotonicity ... Browse Code »

Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
reason for the problem is that signal_struct maintains it's own utime and
stime (of exited tasks), these are not updated using the new task_utime()
routine, hence sig->utime can go backwards and cause the same problem
to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
fixes the problem

TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.

Reported-by: spencer@bluehost.com
Signed-off-by: Balbir Singh
Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Balbir Singh
2008-09-06 00:14:35 +0800

03 Sep, 2008

1 commit

950bbabb5 pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits ... Browse Code »

We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits. As Robert Rex pointed
out this is wrong.

Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().

Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed. Kill
the now unneeded exit_child_reaper().

Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391

Reported-by: Robert Rex
Signed-off-by: Oleg Nesterov
Acked-by: Serge Hallyn
Acked-by: Pavel Emelyanov
Acked-by: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-09-03 10:21:38 +0800

27 Aug, 2008

1 commit

2633f0e57 exit signals: use of uninitialized field notify_count ... Browse Code »

task->signal->notify_count is only initialized if
task->signal->group_exit_task is not NULL. Reorder a conditional so
that uninitialised memory is not used. Found by Valgrind.

Signed-off-by: Steve VanDeBogart
Signed-off-by: Ingo Molnar

Steve VanDeBogart
2008-08-27 15:10:09 +0800

02 Aug, 2008

1 commit

5c7edcd7e tracehook: fix exit_signal=0 case ... Browse Code »

My commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664 introduced a regression
(sorry about that) for the odd case of exit_signal=0 (e.g. clone_flags=0).
This is not a normal use, but it's used by a case in the glibc test suite.

Dying with exit_signal=0 sends no signal, but it's supposed to wake up a
parent's blocked wait*() calls (unlike the delayed_group_leader case).
This fixes tracehook_notify_death() and its caller to distinguish a
"signal 0" wakeup from the delayed_group_leader case (with no wakeup).

Signed-off-by: Roland McGrath
Tested-by: Serge Hallyn
Signed-off-by: Linus Torvalds

Roland McGrath
2008-08-02 03:01:11 +0800

28 Jul, 2008

1 commit

5995477ab task IO accounting: improve code readability ... Browse Code »

Put all i/o statistics in struct proc_io_accounting and use inline functions to
initialize and increment statistics, removing a lot of single variable
assignments.

This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and
CONFIG_TASK_IO_ACCOUNTING=y).

text data bss dec hex filename
11651 0 0 11651 2d83 kernel/exit.o.before
11619 0 0 11619 2d63 kernel/exit.o.after
10886 132 136 11154 2b92 kernel/fork.o.before
10758 132 136 11026 2b12 kernel/fork.o.after

3082029 807968 4818600 8708597 84e1f5 vmlinux.o.before
3081869 807968 4818600 8708437 84e155 vmlinux.o.after

Signed-off-by: Andrea Righi
Acked-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Andrea Righi
2008-07-28 00:58:20 +0800

27 Jul, 2008

4 commits

7f2da1e7d [PATCH] kill altroot ... Browse Code »

long overdue...

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:20 +0800
2b2a1ff64 tracehook: death ... Browse Code »

This moves the ptrace logic in task death (exit_notify) into tracehook.h
inlines. Some code is rearranged slightly to make things nicer. There is
no change, only cleanup.

There is one hook called with the tasklist_lock write-locked, as ptrace
needs. There is also a new hook called after exit_state changes and
without locks. This is a better place for tracing work to be in the
future, since it doesn't delay the whole system with locking.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Reviewed-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-07-27 03:00:09 +0800
dae33574d tracehook: release_task ... Browse Code »

This moves the ptrace-related logic from release_task into tracehook.h and
ptrace.h inlines. It provides clean hooks both before and after locking
tasklist_lock, for future tracing logic to do more cleanup without the
lock.

This also changes release_task() itself in the rare "zap_leader" case to
set the leader to EXIT_DEAD before iterating. This maintains the
invariant that release_task() only ever handles a task in EXIT_DEAD. This
is a common-sense invariant that is already always true except in this one
arcane case of zombie leader whose parent ignores SIGCHLD.

This change is harmless and only costs one store in this one rare case.
It keeps the expected state more consisently sane, which is nicer when
debugging weirdness in release_task(). It also lets some future code in
the tracehook entry points rely on this invariant for bookkeeping.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Reviewed-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-07-27 03:00:08 +0800
30199f5a4 tracehook: exit ... Browse Code »

This moves the PTRACE_EVENT_EXIT tracing into a tracehook.h inline,
tracehook_report_exec(). The change has no effect, just clean-up.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Reviewed-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-07-27 03:00:08 +0800

26 Jul, 2008

8 commits

297c5d926 task IO accounting: provide distinct tgid/tid I/O statistics ... Browse Code »

Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
parent I/O statistics in /proc/pid/io. This approach follows the same
model used to account per-process and per-thread CPU times.

As a practial application, this allows for example to quickly find the top
I/O consumer when a process spawns many child threads that perform the
actual I/O work, because the aggregated I/O statistics can always be found
in /proc/pid/io.

[ Oleg Nesterov points out that we should check that the task is still
alive before we iterate over the threads, but also says that we can do
that fixup on top of this later. - Linus ]

Acked-by: Balbir Singh
Signed-off-by: Andrea Righi
Cc: Matt Heaton
Cc: Shailabh Nagar
Acked-by-with-comments: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Righi
2008-07-26 01:53:47 +0800
a94e2d408 coredump: kill mm->core_done ... Browse Code »

Now that we have core_state->dumper list we can use it to wake up the
sub-threads waiting for the coredump completion.

This uglifies the code and .text grows by 47 bytes, but otoh mm_struct
lessens by sizeof(struct completion). Also, with this change we can
decouple exit_mm() from the coredumping code.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:40 +0800
b564daf80 coredump: construct the list of coredumping threads at startup time ... Browse Code »

binfmt->core_dump() has to iterate over the all threads in system in order
to find the coredumping threads and construct the list using the
GFP_ATOMIC allocations.

With this patch each thread allocates the list node on exit_mm()'s stack and
adds itself to the list.

This allows us to do further changes:

- simplify ->core_dump()

- change exit_mm() to clear ->mm first, then wait for ->core_done.
this makes the coredumping process visible to oom_kill

- kill mm->core_done

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:40 +0800
c5f1cc8c1 coredump: turn core_state->nr_threads into atomic_t ... Browse Code »

Turn core_state->nr_threads into atomic_t and kill now unneeded
down_write(&mm->mmap_sem) in exit_mm().

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:39 +0800
999d9fc16 coredump: move mm->core_waiters into struct core_state ... Browse Code »

Move mm->core_waiters into "struct core_state" allocated on stack. This
shrinks mm_struct a little bit and allows further changes.

This patch mostly does s/core_waiters/core_state. The only essential
change is that coredump_wait() must clear mm->core_state before return.

The coredump_wait()'s path is uglified and .text grows by 30 bytes, this
is fixed by the next patch.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:39 +0800
32ecb1f26 coredump: turn mm->core_startup_done into the pointer to struct core_state ... Browse Code »

mm->core_startup_done points to "struct completion startup_done" allocated
on the coredump_wait()'s stack. Introduce the new structure, core_state,
which holds this "struct completion". This way we can add more info
visible to the threads participating in coredump without enlarging
mm_struct.

No changes in affected .o files.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:39 +0800
7b34e4283 introduce PF_KTHREAD flag ... Browse Code »

Introduce the new PF_KTHREAD flag to mark the kernel threads. It is set
by INIT_TASK() and copied to the forked childs (we could set it in
kthreadd() along with PF_NOFREEZE instead).

daemonize() was changed as well. In that case testing of PF_KTHREAD is
racy, but daemonize() is hopeless anyway.

This flag is cleared in do_execve(), before search_binary_handler().
Probably not the best place, we can do this in exec_mmap() or in
start_thread(), or clear it along with PF_FORKNOEXEC. But I think this
doesn't matter in practice, and if do_execve() fails kthread should die
soon.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:39 +0800
3854a7718 __exit_signal: don't take rcu lock ... Browse Code »

There is no reason for rcu_read_lock() in __exit_signal(). tsk->sighand
can only be changed if tsk does exec, obviously this is not possible.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-07-26 01:53:38 +0800

17 Jul, 2008

4 commits

666f164f4 fix dangling zombie when new parent ignores children ... Browse Code »

This fixes an arcane bug that we think was a regression introduced
by commit b2b2cbc4b2a2f389442549399a993a8306420baf. When a parent
ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap
but they don't if it's using ptrace on them. When the parent thread
later exits and ceases to ptrace a child but leaves other live
threads in the parent's thread group, any zombie children are left
dangling. The fix makes them self-reap then, as they would have
done earlier if ptrace had not been in use.

Signed-off-by: Roland McGrath

Roland McGrath
2008-07-17 09:02:34 +0800
14dd0b814 do_wait: return security_task_wait() error code in place of -ECHILD ... Browse Code »

This reverts the effect of commit f2cc3eb133baa2e9dc8efd40f417106b2ee520f3
"do_wait: fix security checks". That change reverted the effect of commit
73243284463a761e04d69d22c7516b2be7de096c. The rationale for the original
commit still stands. The inconsistent treatment of children hidden by
ptrace was an unintended omission in the original change and in no way
invalidates its purpose.

This makes do_wait return the error returned by security_task_wait()
(usually -EACCES) in place of -ECHILD when there are some children the
caller would be able to wait for if not for the permission failure. A
permission error will give the user a clue to look for security policy
problems, rather than for mysterious wait bugs.

Signed-off-by: Roland McGrath

Roland McGrath
2008-07-17 09:02:34 +0800
f470021ad ptrace children revamp ... Browse Code »

ptrace no longer fiddles with the children/sibling links, and the
old ptrace_children list is gone. Now ptrace, whether of one's own
children or another's via PTRACE_ATTACH, just uses the new ptraced
list instead.

There should be no user-visible difference that matters. The only
change is the order in which do_wait() sees multiple stopped
children and stopped ptrace attachees. Since wait_task_stopped()
was changed earlier so it no longer reorders the children list, we
already know this won't cause any new problems.

Signed-off-by: Roland McGrath

Roland McGrath
2008-07-17 09:02:33 +0800
98abed020 do_wait reorganization ... Browse Code »

This breaks out the guts of do_wait into three subfunctions.
The control flow is less nonobvious without so much goto.
do_wait_thread and ptrace_do_wait contain the main work of the outer loop.
wait_consider_task contains the main work of the inner loop.

Signed-off-by: Roland McGrath

Roland McGrath
2008-07-17 09:02:33 +0800

03 Jul, 2008

1 commit

da9cbc873 block: blkdev.h cleanup, move iocontext stuff to iocontext.h ... Browse Code »

Signed-off-by: Jens Axboe

Jens Axboe
2008-07-03 19:21:14 +0800

25 May, 2008

1 commit

da7978b03 signals: fix sigqueue_free() vs __exit_signal() race ... Browse Code »

__exit_signal() does flush_sigqueue(tsk->pending) outside of ->siglock.
This can race with another thread doing sigqueue_free(), we can free the
same SIGQUEUE_PREALLOC sigqueue twice or corrupt the pending->list.

Note that even sys_exit_group() can trigger this race, not only
sys_timer_delete().

Move the callsite of flush_sigqueue(tsk->pending) under ->siglock.

This patch doesn't touch flush_sigqueue(->shared_pending) below, it is
called when there are no other threads which can play with signals, and
sigqueue_free() can't be used outside of our thread group.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-05-25 00:56:10 +0800

02 May, 2008

1 commit

9f3acc314 [PATCH] split linux/file.h ... Browse Code »

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:16 +0800

30 Apr, 2008

6 commits

7d8da0962 pids: __set_special_pids: use change_pid() helper ... Browse Code »

Use change_pid() instead of detach_pid() + attach_pid() in
__set_special_pids().

This way task_session() is not NULL in between.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:49 +0800
53b6f9fbd ptrace: introduce ptrace_reparented() helper ... Browse Code »

Add another trivial helper for the sake of grep. It also auto-documents the
fact that ->parent != real_parent implies ->ptrace.

No functional changes.

Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:38 +0800
2800d8d19 document de_thread() with exit_notify() connection ... Browse Code »

Add a couple of small comments, it is not easy to see what this code does.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:38 +0800
376e1d253 reparent_thread: use same_thread_group() ... Browse Code »

Trivial, use same_thread_group() in reparent_thread().

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:38 +0800
d839fd4d2 ptrace: introduce task_detached() helper ... Browse Code »

exit.c has numerous "->exit_signal == -1" comparisons, this check is subtle
and deserves a helper. Imho makes the code more parseable for humans. At
least it's surely more greppable.

Also, a couple of whitespace cleanups. No functional changes.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:38 +0800
bfc4b0890 signals: do_group_exit(): use signal_group_exit() more consistently ... Browse Code »

do_group_exit() checks SIGNAL_GROUP_EXIT to avoid taking sighand->siglock.
Since ed5d2cac114202fe2978a9cbcab8f5032796d538 exec() doesn't set this
flag, we should use signal_group_exit().

This is not needed for correctness, but can speedup the multithreaded exec
and makes the code more consistent.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-04-30 23:29:33 +0800

29 Apr, 2008

1 commit

cf475ad28 cgroups: add an owner to the mm_struct ... Browse Code »

Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined. It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh
Cc: Pavel Emelianov
Cc: Hugh Dickins
Cc: Sudhir Kumar
Cc: YAMAMOTO Takashi
Cc: Hirokazu Takahashi
Cc: David Rientjes ,
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Pekka Enberg
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-04-29 23:06:10 +0800

28 Apr, 2008

1 commit

f0be3d32b mempolicy: rename mpol_free to mpol_put ... Browse Code »

This is a change that was requested some time ago by Mel Gorman. Makes sense
to me, so here it is.

Note: I retain the name "mpol_free_shared_policy()" because it actually does
free the shared_policy, which is NOT a reference counted object. However, ...

The mempolicy object[s] referenced by the shared_policy are reference counted,
so mpol_put() is used to release the reference held by the shared_policy. The
mempolicy might not be freed at this time, because some task attached to the
shared object associated with the shared policy may be in the process of
allocating a page based on the mempolicy. In that case, the task performing
the allocation will hold a reference on the mempolicy, obtained via
mpol_shared_policy_lookup(). The mempolicy will be freed when all tasks
holding such a reference have called mpol_put() for the mempolicy.

Signed-off-by: Lee Schermerhorn
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Mel Gorman
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2008-04-28 23:58:23 +0800

25 Apr, 2008

2 commits

3b1253880 [PATCH] sanitize unshare_files/reset_files_struct ... Browse Code »

* let unshare_files() give caller the displaced files_struct
* don't bother with grabbing reference only to drop it in the
caller if it hadn't been shared in the first place
* in that form unshare_files() is trivially implemented via
unshare_fd(), so we eliminate the duplicate logics in fork.c
* reset_files_struct() is not just only called for current;
it will break the system if somebody ever calls it for anything
else (we can't modify ->files of somebody else). Lose the
task_struct * argument.

Signed-off-by: Al Viro

Al Viro
2008-04-25 21:23:59 +0800
fd8328be8 [PATCH] sanitize handling of shared descriptor tables in failing execve() ... Browse Code »

* unshare_files() can fail; doing it after irreversible actions is wrong
and de_thread() is certainly irreversible.
* since we do it unconditionally anyway, we might as well do it in do_execve()
and save ourselves the PITA in binfmt handlers, etc.
* while we are at it, binfmt_som actually leaked files_struct on failure.

As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
become unexported.

Signed-off-by: Al Viro

Al Viro
2008-04-25 21:23:53 +0800

23 Apr, 2008

1 commit

1ec7f1ddb [PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct() ... Browse Code »

The only reason to have separated __...() for those was to keep them inlined
for local users in exit.c. Since Alexey removed the inline on those, there's
no reason whatsoever to keep them around; just collapse with normal variants.

Signed-off-by: Al Viro

Al Viro
2008-04-23 07:55:09 +0800

11 Apr, 2008

1 commit

54a015104 asmlinkage_protect replaces prevent_tail_call ... Browse Code »

The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs. The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.

Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.

More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function. This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else. It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.

Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds

Roland McGrath
2008-04-11 08:28:26 +0800

09 Mar, 2008

1 commit

6efcae460 Fix waitid si_code regression ... Browse Code »

In commit ee7c82da830ea860b1f9274f1f0cdf99f206e7c2 ("wait_task_stopped:
simplify and fix races with SIGCONT/SIGKILL/untrace"), the magic (short)
cast when storing si_code was lost in wait_task_stopped. This leaks the
in-kernel CLD_* values that do not match what userland expects.

Signed-off-by: Roland McGrath
Cc: Oleg Nesterov
Signed-off-by: Linus Torvalds

Roland McGrath
2008-03-09 03:54:00 +0800

04 Mar, 2008

2 commits

821c7de71 exit_notify: fix kill_orphaned_pgrp() usage with mt exit ... Browse Code »

1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong, we
should do this only when the whole process exits.

2. exit_notify() uses "current" as "ignored_task", obviously wrong.
Use ->group_leader instead.

Test case:

void hup(int sig)
{
printf("HUP received\n");
}

void *tfunc(void *arg)
{
sleep(2);
printf("sub-thread exited\n");
return NULL;
}

int main(int argc, char *argv[])
{
if (!fork()) {
signal(SIGHUP, hup);
kill(getpid(), SIGSTOP);
exit(0);
}

pthread_t thr;
pthread_create(&thr, NULL, tfunc, NULL);

sleep(1);
printf("main thread exited\n");
syscall(__NR_exit, 0);

return 0;
}

output:

main thread exited
HUP received
Hangup

With this patch the output is:

main thread exited
sub-thread exited
HUP received

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-03-04 06:53:16 +0800
05e83df62 will_become_orphaned_pgrp: partially fix insufficient ->exit_state check ... Browse Code »

p->exit_state != 0 doesn't mean this process is dead, it may have
sub-threads. Change the code to use "p->exit_state && thread_group_empty(p)"
instead.

Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process
if the main thread has exited.

However, the new check is not perfect either. There is a window when
exit_notify() drops tasklist and before release_task(). Suppose that
the last (non-leader) thread exits. This means that entire group exits,
but thread_group_empty() is not true yet.

As Eric pointed out, is_global_init() is wrong as well, but I did not
dare to do other changes.

Just for the record, has_stopped_jobs() is absolutely wrong too. But we
can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues.

Even with this patch ^Z doesn't play well with the dead main thread.
The task is stopped correctly but do_wait(WSTOPPED) won't see it. This
is another unrelated issue, will be (hopefully) fixed separately.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-03-04 06:53:16 +0800