Eric Lee / smarc-fsl-linux-kernel

11 Jan, 2012

2 commits

6b550f949 user namespace: make signal.c respect user namespaces ... Browse Code »

ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)

__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)

do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace

ptrace_signal maps parent's uid into current's user namespace before
including in signal to current. IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.

Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
simplify callers and help make clear what we are translating
(which uid into which namespace). Passing the target task would
make callers even easier to read, but we pass in user_ns because
current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
ancestor or unrelated user namespace. Pass that on to __send_signal,
which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
SI_FROMKERNEL cases at callers, because we can't assume sender is
current in those cases.
Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer

Signed-off-by: Serge Hallyn
Cc: Oleg Nesterov
Cc: Matt Helsley
Cc: "Eric W. Biederman"
From: Serge Hallyn
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)

Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.

A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.

Changelog:
Nov 18: previous patch missed a leading '&'

Signed-off-by: Serge Hallyn
Cc: "Eric W. Biederman"
From: Dan Carpenter
Subject: ipc/mqueue: lock() => unlock() typo

There was a double lock typo introduced in b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"

Signed-off-by: Dan Carpenter
Cc: Oleg Nesterov
Cc: Matt Helsley
Cc: "Eric W. Biederman"
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2012-01-11 08:30:54 +0800
5e6292c0f signal: add block_sigmask() for adding sigmask to current->blocked ... Browse Code »
1118

Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.

Signed-off-by: Matt Fleming
Signed-off-by: Oleg Nesterov
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Tejun Heo
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Fleming
2012-01-11 08:30:54 +0800

10 Jan, 2012

1 commit

db0c2bf69 Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cgroup: fix to allow mounting a hierarchy by name
cgroup: move assignement out of condition in cgroup_attach_proc()
cgroup: Remove task_lock() from cgroup_post_fork()
cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
cgroup: only need to check oldcgrp==newgrp once
cgroup: remove redundant get/put of task struct
cgroup: remove redundant get/put of old css_set from migrate
cgroup: Remove unnecessary task_lock before fetching css_set on migration
cgroup: Drop task_lock(parent) on cgroup_fork()
cgroups: remove redundant get/put of css_set from css_set_check_fetched()
resource cgroups: remove bogus cast
cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
cgroup, cpuset: don't use ss->pre_attach()
cgroup: don't use subsys->can_attach_task() or ->attach_task()
cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
cgroup: improve old cgroup handling in cgroup_attach_proc()
cgroup: always lock threadgroup during migration
threadgroup: extend threadgroup_lock() to cover exit and exec
threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
...

Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
fix a css_set not found bug in cgroup_attach_proc" that already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset. This one, in other words.

Linus Torvalds
2012-01-10 04:59:24 +0800

07 Jan, 2012

1 commit

0db49b72b Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...

Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches

Linus Torvalds
2012-01-07 00:44:54 +0800

05 Jan, 2012

1 commit

8a88951b5 ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach ... Browse Code »

This is the temporary simple fix for 3.2, we need more changes in this
area.

1. do_signal_stop() assumes that the running untraced thread in the
stopped thread group is not possible. This was our goal but it is
not yet achieved: a stopped-but-resumed tracee can clone the running
thread which can initiate another group-stop.

Remove WARN_ON_ONCE(!current->ptrace).

2. A new thread always starts with ->jobctl = 0. If it is auto-attached
and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
in do_jobctl_trap() if another debugger attaches.

Change __ptrace_unlink() to set the artificial SIGSTOP for report.

Alternatively we could change ptrace_init_task() to copy signr from
current, but this means we can copy it for no reason and hide the
possible similar problems.

Acked-by: Tejun Heo
Cc: [3.1]
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-01-05 07:01:59 +0800

15 Dec, 2011

1 commit

648616343 [S390] cputime: add sparse checking and cleanup ... Browse Code »

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2011-12-15 21:56:19 +0800

13 Dec, 2011

1 commit

77e4ef99d threadgroup: extend threadgroup_lock() to cover exit and exec ... Browse Code »

threadgroup_lock() protected only protected against new addition to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup. On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.

This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec. For
exit, threadgroup_change_begin/end() calls are added to exit_signals
around assertion of PF_EXITING. For exec, threadgroup_[un]lock() are
updated to also grab and release cred_guard_mutex.

With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.

The next patch will update cgroup so that it can take full advantage
of this change.

-v2: beefed up comment as suggested by Frederic.

-v3: narrowed scope of protection in exit path as suggested by
Frederic.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: Li Zefan
Acked-by: Frederic Weisbecker
Cc: Oleg Nesterov
Cc: Andrew Morton
Cc: Paul Menage
Cc: Linus Torvalds

Tejun Heo
2011-12-13 10:12:21 +0800

31 Oct, 2011

1 commit

9984de1a5 kernel: Map most files to use export.h instead of module.h ... Browse Code »

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include
+#include

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

30 Sep, 2011

1 commit

d178bc3a7 user namespace: usb: make usb urbs user namespace aware (v2) ... Browse Code »

Add to the dev_state and alloc_async structures the user namespace
corresponding to the uid and euid. Pass these to kill_pid_info_as_uid(),
which can then implement a proper, user-namespace-aware uid check.

Changelog:
Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
uid, and euid each separately, pass a struct cred.
Sep 26: Address Alan Stern's comments: don't define a struct cred at
usbdev_open(), and take and put a cred at async_completed() to
ensure it lasts for the duration of kill_pid_info_as_cred().

Signed-off-by: Serge Hallyn
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Serge Hallyn
2011-09-30 04:13:08 +0800

28 Jul, 2011

1 commit

c1095c6da signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked() ... Browse Code »

sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
change ->blocked directly. This is not correct, see the changelog in
e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Another change is that now we are doing ->saved_sigmask = ->blocked
lockless, it doesn't make any sense to do this under ->siglock.

Signed-off-by: Oleg Nesterov
Reviewed-by: Matt Fleming
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2011-07-28 03:53:36 +0800

23 Jul, 2011

1 commit

8209f53d7 Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc ... Browse Code »

* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
connector: add an event for monitoring process tracers
ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
ptrace_init_task: initialize child->jobctl explicitly
has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
ptrace: ptrace_reparented() should check same_thread_group()
redefine thread_group_leader() as exit_signal >= 0
do not change dead_task->exit_signal
kill task_detached()
reparent_leader: check EXIT_DEAD instead of task_detached()
make do_notify_parent() __must_check, update the callers
__ptrace_detach: avoid task_detached(), check do_notify_parent()
kill tracehook_notify_death()
make do_notify_parent() return bool
ptrace: s/tracehook_tracer_task()/ptrace_parent()/
...

Linus Torvalds
2011-07-23 06:06:50 +0800

21 Jul, 2011

2 commits

8a3524180 ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction ... Browse Code »

Simple test-case,

int main(void)
{
int pid, status;

pid = fork();
if (!pid) {
pause();
assert(0);
return 0x23;
}

assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(wait(&status) == pid);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);

kill(pid, SIGCONT); // ptrace check from ptrace_signal() to its caller,
get_signal_to_deliver(), this looks more natural.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo

Oleg Nesterov
2011-07-21 23:06:53 +0800
a841796f1 signal: align __lock_task_sighand() irq disabling and RCU ... Browse Code »

The __lock_task_sighand() function calls rcu_read_lock() with interrupts
and preemption enabled, but later calls rcu_read_unlock() with interrupts
disabled. It is therefore possible that this RCU read-side critical
section will be preempted and later RCU priority boosted, which means that
rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but
with interrupts disabled. This results in lockdep splats, so this commit
nests the RCU read-side critical section within the interrupt-disabled
region of code. This prevents the RCU read-side critical section from
being preempted, and thus prevents the attempt to deboost with interrupts
disabled.

It is quite possible that a better long-term fix is to make rt_mutex_unlock()
disable irqs when acquiring the rt_mutex structure's ->wait_lock.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-07-21 02:04:54 +0800

28 Jun, 2011

3 commits

bb3696da8 ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented() ... Browse Code »

Kill real_parent_is_ptracer() and update the callers to use
ptrace_reparented(), after the previous patch they do the same.

Remove the unnecessary ->ptrace != 0 check in get_signal_to_deliver(),
if ptrace_reparented() == T then the task must be ptraced.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo

Oleg Nesterov
2011-06-28 02:30:10 +0800
d4f7c511c do not change dead_task->exit_signal ... Browse Code »

__ptrace_detach() and do_notify_parent() set task->exit_signal = -1
to mark the task dead. This is no longer needed, nobody checks
exit_signal to detect the EXIT_DEAD task.

Signed-off-by: Oleg Nesterov
Reviewed-by: Tejun Heo

Oleg Nesterov
2011-06-28 02:30:10 +0800
53c8f9f19 make do_notify_parent() return bool ... Browse Code »

- change do_notify_parent() to return a boolean, true if the task should
be reaped because its parent ignores SIGCHLD.

- update the only caller which checks the returned value, exit_notify().

This temporary uglifies exit_notify() even more, will be cleanuped by
the next change.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo

Oleg Nesterov
2011-06-28 02:30:08 +0800

23 Jun, 2011

2 commits

a288eecce ptrace: kill trivial tracehooks ... Browse Code »

At this point, tracehooks aren't useful to mainline kernel and mostly
just add an extra layer of obfuscation. Although they have comments,
without actual in-kernel users, it is difficult to tell what are their
assumptions and they're actually trying to achieve. To mainline
kernel, they just aren't worth keeping around.

This patch kills the following trivial tracehooks.

* Ones testing whether task is ptraced. Replace with ->ptrace test.

tracehook_expect_breakpoints()
tracehook_consider_ignored_signal()
tracehook_consider_fatal_signal()

* ptrace_event() wrappers. Call directly.

tracehook_report_exec()
tracehook_report_exit()
tracehook_report_vfork_done()

* ptrace_release_task() wrapper. Call directly.

tracehook_finish_release_task()

* noop

tracehook_prepare_release_task()
tracehook_report_death()

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Cc: Martin Schwidefsky
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-23 01:26:28 +0800
d21142ece ptrace: kill task_ptrace() ... Browse Code »

task_ptrace(task) simply dereferences task->ptrace and isn't even used
consistently only adding confusion. Kill it and directly access
->ptrace instead.

This doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-23 01:26:27 +0800

17 Jun, 2011

4 commits

544b2c91a ptrace: implement PTRACE_LISTEN ... Browse Code »

The previous patch implemented async notification for ptrace but it
only worked while trace is running. This patch introduces
PTRACE_LISTEN which is suggested by Oleg Nestrov.

It's allowed iff tracee is in STOP trap and puts tracee into
quasi-running state - tracee never really runs but wait(2) and
ptrace(2) consider it to be running. While ptracer is listening,
tracee is allowed to re-enter STOP to notify an async event.
Listening state is cleared on the first notification. Ptracer can
also clear it by issuing INTERRUPT - tracee will re-trap into STOP
with listening state cleared.

This allows ptracer to monitor group stop state without running tracee
- use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
wait(2) to wait for the next group stop event. When it happens,
PTRACE_GETSIGINFO provides information to determine the current state.

Test program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207
#define PTRACE_LISTEN 0x4208

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts1s = { .tv_sec = 1 };

int main(int argc, char **argv)
{
pid_t tracee, tracer;
int i;

tracee = fork();
if (!tracee)
while (1)
pause();

tracer = fork();
if (!tracer) {
siginfo_t si;

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
repeat:
waitid(P_PID, tracee, NULL, WSTOPPED);

ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
if (!si.si_code) {
printf("tracer: SIG %d\n", si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL,
(void *)(unsigned long)si.si_signo);
goto repeat;
}
printf("tracer: stopped=%d signo=%d\n",
si.si_signo != SIGTRAP, si.si_signo);
if (si.si_signo != SIGTRAP)
ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
else
ptrace(PTRACE_CONT, tracee, NULL, NULL);
goto repeat;
}

for (i = 0; i < 3; i++) {
nanosleep(&ts1s, NULL);
printf("mother: SIGSTOP\n");
kill(tracee, SIGSTOP);
nanosleep(&ts1s, NULL);
printf("mother: SIGCONT\n");
kill(tracee, SIGCONT);
}
nanosleep(&ts1s, NULL);

kill(tracer, SIGKILL);
kill(tracee, SIGKILL);
return 0;
}

This is identical to the program to test TRAP_NOTIFY except that
tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
This allows ptracer to monitor when group stop ends without running
tracee.

# ./test-listen
tracer: stopped=0 signo=5
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18

-v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
task_stopped_code() as suggested by Oleg.

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov

Tejun Heo
2011-06-17 03:41:54 +0800
fb1d910c1 ptrace: implement TRAP_NOTIFY and use it for group stop events ... Browse Code »

Currently there's no way for ptracer to find out whether group stop
finished other than polling with INTERRUPT - GETSIGINFO - CONT
sequence. This patch implements group stop notification for ptracer
using STOP traps.

When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
is set, which schedules a STOP trap which is sticky - it isn't cleared
by other traps and at least one STOP trap will happen eventually.
STOP trap is synchronization point for event notification and the
tracer can determine the current group stop state by looking at the
signal number portion of exit code (si_status from waitid(2) or
si_code from PTRACE_GETSIGINFO).

Notifications are generated both on start and end of group stops but,
because group stop participation always happens before STOP trap, this
doesn't cause an extra trap while tracee is participating in group
stop. The symmetry will be useful later.

Note that this notification works iff tracee is not trapped.
Currently there is no way to be notified of group stop state changes
while tracee is trapped. This will be addressed by a later patch.

An example program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts1s = { .tv_sec = 1 };

int main(int argc, char **argv)
{
pid_t tracee, tracer;
int i;

tracee = fork();
if (!tracee)
while (1)
pause();

tracer = fork();
if (!tracer) {
siginfo_t si;

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
repeat:
waitid(P_PID, tracee, NULL, WSTOPPED);

ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
if (!si.si_code) {
printf("tracer: SIG %d\n", si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL,
(void *)(unsigned long)si.si_signo);
goto repeat;
}
printf("tracer: stopped=%d signo=%d\n",
si.si_signo != SIGTRAP, si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL, NULL);
goto repeat;
}

for (i = 0; i < 3; i++) {
nanosleep(&ts1s, NULL);
printf("mother: SIGSTOP\n");
kill(tracee, SIGSTOP);
nanosleep(&ts1s, NULL);
printf("mother: SIGCONT\n");
kill(tracee, SIGCONT);
}
nanosleep(&ts1s, NULL);

kill(tracer, SIGKILL);
kill(tracee, SIGKILL);
return 0;
}

In the above program, tracer keeps tracee running and gets
notification of each group stop state changes.

# ./test-notify
tracer: stopped=0 signo=5
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1 signo=19
mother: SIGCONT
tracer: stopped=0 signo=5
tracer: SIG 18

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov

Tejun Heo
2011-06-17 03:41:53 +0800
3544d72a0 ptrace: implement PTRACE_SEIZE ... Browse Code »

PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
effects on tracee signal and job control states. This patch
implements a new ptrace request PTRACE_SEIZE which attaches a tracee
without trapping it or affecting its signal and job control states.

The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
flags in @data. Currently, the only defined flag is
PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
The changes will be implemented gradually and the DEVEL flag is to
prevent programs which expect full SEIZE behavior from using it before
all the behavior modifications are complete while allowing unit
testing. The flag will be removed once SEIZE behaviors are completely
implemented.

* PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap. After
attaching tracee continues to run unless a trap condition occurs.

* PTRACE_SEIZE doesn't affect signal or group stop state.

* If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
the stopping signals if group stop is in effect or SIGTRAP
otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
instead of NULL.

Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
used to determine whether new SEIZE behaviors should be enabled.

Test program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts100ms = { .tv_nsec = 100000000 };
static const struct timespec ts1s = { .tv_sec = 1 };
static const struct timespec ts3s = { .tv_sec = 3 };

int main(int argc, char **argv)
{
pid_t tracee;

tracee = fork();
if (tracee == 0) {
nanosleep(&ts100ms, NULL);
while (1) {
printf("tracee: alive\n");
nanosleep(&ts1s, NULL);
}
}

if (argc > 1)
kill(tracee, SIGSTOP);

nanosleep(&ts100ms, NULL);

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
if (argc > 1) {
waitid(P_PID, tracee, NULL, WSTOPPED);
ptrace(PTRACE_CONT, tracee, NULL, NULL);
}
nanosleep(&ts3s, NULL);
printf("tracer: exiting\n");
return 0;
}

When the above program is called w/o argument, tracee is seized while
running and remains running. When tracer exits, tracee continues to
run and print out messages.

# ./test-seize-simple
tracee: alive
tracee: alive
tracee: alive
tracer: exiting
tracee: alive
tracee: alive

When called with an argument, tracee is seized from stopped state and
continued, and returns to stopped state when tracer exits.

# ./test-seize
tracee: alive
tracee: alive
tracee: alive
tracer: exiting
# ps -el|grep test-seize
1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

-v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
suggested.

-v3: PTRACE_EVENT_STOP traps now report group stop state by signr. If
group stop is in effect the stop signal number is returned as
part of exit_code; otherwise, SIGTRAP. This was suggested by
Denys and Oleg.

Signed-off-by: Tejun Heo
Cc: Jan Kratochvil
Cc: Denys Vlasenko
Cc: Oleg Nesterov

Tejun Heo
2011-06-17 03:41:53 +0800
73ddff2be job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap ... Browse Code »

do_signal_stop() implemented both normal group stop and trap for group
stop while ptraced. This approach has been enough but scheduled
changes require trap mechanism which can be used in more generic
manner and using group stop trap for generic trap site simplifies both
userland visible interface and implementation.

This patch adds a new jobctl flag - JOBCTL_TRAP_STOP. When set, it
triggers a trap site, which behaves like group stop trap, in
get_signal_to_deliver() after checking for pending signals. While
ptraced, do_signal_stop() doesn't stop itself. It initiates group
stop if requested and schedules JOBCTL_TRAP_STOP and returns. The
caller - get_signal_to_deliver() - is responsible for checking whether
TRAP_STOP is pending afterwards and handling it.

ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
after detach.

While at it, add proper function comment to do_signal_stop() and make
it return bool.

-v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
instead of JOBCTL_PENDING_MASK. This avoids accidentally
clearing JOBCTL_STOP_CONSUME. Spotted by Oleg.

-v3: do_signal_stop() updated to return %false without dropping
siglock while ptraced and TRAP_STOP check moved inside for(;;)
loop after group stop participation. This avoids unnecessary
relocking and also will help avoiding unnecessary traps by
consuming group stop before handling pending traps.

-v4: Jobctl trap handling moved into a separate function -
do_jobctl_trap().

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov

Tejun Heo
2011-06-17 03:41:52 +0800

15 Jun, 2011

1 commit

ada9c9331 signal.c: fix kernel-doc notation ... Browse Code »

Fix kernel-doc warnings in signal.c:

Warning(kernel/signal.c:2374): No description found for parameter 'nset'
Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'

Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Randy Dunlap
2011-06-15 10:12:17 +0800

05 Jun, 2011

7 commits

dd1d67726 signal: remove three noop tracehooks ... Browse Code »

Remove the following three noop tracehooks in signals.c.

* tracehook_force_sigpending()
* tracehook_get_signal()
* tracehook_finish_jctl()

The code area is about to be updated and these hooks don't do anything
other than obfuscating the logic.

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:11 +0800
62c124ff3 ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit ... Browse Code »

ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
->wait_chldexit was already complicated with waker-side filtering
without adding TRAPPING wait on top of it. Also, it unnecessarily
made TRAPPING clearing depend on the current ptrace relationship - if
the ptracee is detached, wakeup is lost.

There is no reason to use signal->wait_chldexit here. We're just
waiting for JOBCTL_TRAPPING bit to clear and given the relatively
infrequent use of ptrace, bit_waitqueue can serve it perfectly.

This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
signal->wait_chldexit.

-v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.

Signed-off-by: Tejun Heo
Cc: Linus Torvalds
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:11 +0800
7dd3db54e job control: introduce task_set_jobctl_pending() ... Browse Code »

task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
pending bits too. Setting pending conditions on a dying task may make
the task unkillable. Currently, each setting site is responsible for
checking for the condition but with to-be-added job control traps this
becomes too fragile.

This patch adds task_set_jobctl_pending() which should be used when
setting task->jobctl bits to schedule a stop or trap. The function
performs the followings to ease setting pending bits.

* Sanity checks.

* If fatal signal is pending or PF_EXITING is set, no bit is set.

* STOP_SIGMASK is automatically cleared if new value is being set.

do_signal_stop() and ptrace_attach() are updated to use
task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
The surrounding structures around setting are changed to fit
task_set_jobctl_pending() better but there should be no userland
visible behavior difference.

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:11 +0800
6dfca3298 job control: make task_clear_jobctl_pending() clear TRAPPING automatically ... Browse Code »

JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to
(re)transit into TRACED. task_clear_jobctl_pending() must be called
when either tracee enters TRACED or the transition is cancelled for
some reason. The former is achieved by explicitly calling
task_clear_jobctl_pending() in ptrace_stop() and the latter by calling
it at the end of do_signal_stop().

Calling task_clear_jobctl_trapping() at the end of do_signal_stop()
limits the scope TRAPPING can be used and is fragile in that seemingly
unrelated changes to tracee's control flow can lead to stuck TRAPPING.

We already have task_clear_jobctl_pending() calls on those cancelling
events to clear JOBCTL_STOP_PENDING. Cancellations can be handled by
making those call sites use JOBCTL_PENDING_MASK instead and updating
task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is
called automatically if no stop/trap is pending.

This patch makes the above changes and removes the fallback
task_clear_jobctl_trapping() call from do_signal_stop().

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:11 +0800
3759a0d94 job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() ... Browse Code »

This patch introduces JOBCTL_PENDING_MASK and replaces
task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
which takes an extra @mask argument.

JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
future patches will add more bits. recalc_sigpending_tsk() is updated
to use JOBCTL_PENDING_MASK instead.

task_clear_jobctl_pending() takes @mask which in subset of
JOBCTL_PENDING_MASK and clears the relevant jobctl bits. If
JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All
task_clear_jobctl_stop_pending() users are updated to call
task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is
functionally identical to task_clear_jobctl_stop_pending().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:10 +0800
81be24b8c ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop() ... Browse Code »

In ptrace_stop(), after arch hook is done, the task state and jobctl
bits are updated while holding siglock. The ordering requirement
there is that TASK_TRACED is set before JOBCTL_TRAPPING is cleared to
prevent ptracer waiting on TRAPPING doesn't end up waking up TRACED is
actually set and sees TASK_RUNNING in wait(2).

Move set_current_state(TASK_TRACED) to the top of the block and
reorganize comments. This makes the ordering more obvious
(TASK_TRACED before other updates) and helps future updates to group
stop participation.

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:10 +0800
a8f072c1d job control: rename signal->group_stop and flags to jobctl and update them ... Browse Code »

signal->group_stop currently hosts mostly group stop related flags;
however, it's gonna be used for wider purposes and the GROUP_STOP_
flag prefix becomes confusing. Rename signal->group_stop to
signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
defined in terms of them to allow using bitops later.

While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
future additions.

This doesn't cause any functional change.

-v2: JOBCTL_*_BIT macros added as suggested by Linus.

Signed-off-by: Tejun Heo
Cc: Linus Torvalds
Signed-off-by: Oleg Nesterov

Tejun Heo
2011-06-05 00:17:09 +0800

26 May, 2011

1 commit

d92fcf055 signal: sys_pause() should check signal_pending() ... Browse Code »

ERESTART* is always wrong without TIF_SIGPENDING. Teach sys_pause()
to handle the spurious wakeup correctly.

Signed-off-by: Oleg Nesterov

Oleg Nesterov
2011-05-26 01:22:27 +0800

21 May, 2011

1 commit

3ed4c0583 Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc ... Browse Code »

* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits)
signal: trivial, fix the "timespec declared inside parameter list" warning
job control: reorganize wait_task_stopped()
ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
signal: sys_sigprocmask() needs retarget_shared_pending()
signal: cleanup sys_sigprocmask()
signal: rename signandsets() to sigandnsets()
signal: do_sigtimedwait() needs retarget_shared_pending()
signal: introduce do_sigtimedwait() to factor out compat/native code
signal: sys_rt_sigtimedwait: simplify the timeout logic
signal: cleanup sys_rt_sigprocmask()
x86: signal: sys_rt_sigreturn() should use set_current_blocked()
x86: signal: handle_signal() should use set_current_blocked()
signal: sigprocmask() should do retarget_shared_pending()
signal: sigprocmask: narrow the scope of ->siglock
signal: retarget_shared_pending: optimize while_each_thread() loop
signal: retarget_shared_pending: consider shared/unblocked signals only
signal: introduce retarget_shared_pending()
ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/
signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED
signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending()
...

Linus Torvalds
2011-05-21 04:33:21 +0800

09 May, 2011

2 commits

40ae717d1 ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() ... Browse Code »

GROUP_STOP_TRAPPING waiting mechanism piggybacks on
signal->wait_chldexit which is primarily used to implement waiting for
wait(2) and friends. When do_wait() waits on signal->wait_chldexit,
it uses a custom wake up callback, child_wait_callback(), which
expects the child task which is waking up the parent to be passed in
as @key to filter out spurious wakeups.

task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
@key causing the following oops if the parent was doing do_wait().

BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
IP: [] child_wait_callback+0x29/0x80
PGD 1d899067 PUD 1e418067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
CPU 2
Modules linked in:

Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
RIP: 0010:[] [] child_wait_callback+0x29/0x80
RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
Stack:
ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
0000000000000002 0000000000000082 0000000000000001 0000000000000000
Call Trace:
[] __wake_up_common+0x59/0x90
[] __wake_up_sync_key+0x53/0x80
[] __wake_up_sync+0x10/0x20
[] task_clear_jobctl_trapping+0x44/0x50
[] ptrace_stop+0x7c/0x290
[] do_signal_stop+0x28a/0x2d0
[] get_signal_to_deliver+0x14f/0x5a0
[] do_signal+0x75/0x7b0
[] do_notify_resume+0x5d/0x70
[] retint_signal+0x46/0x8c
Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

Fix it by using __wake_up_sync_key() and passing in the child as @key.

I still think it's a mistake to piggyback on wait_chldexit for this.
Given the relative low frequency of ptrace use, we would be much
better off leaving already complex wait_chldexit alone and using bit
waitqueue.

Signed-off-by: Tejun Heo
Reviewed-by: Oleg Nesterov

Tejun Heo
2011-05-09 20:19:54 +0800
2e4f7c776 signal: sys_sigprocmask() needs retarget_shared_pending() ... Browse Code »

sys_sigprocmask() changes current->blocked by hand. Convert this code
to use set_current_blocked().

Signed-off-by: Oleg Nesterov

Oleg Nesterov
2011-05-09 19:48:56 +0800

28 Apr, 2011

6 commits

b013c3992 signal: cleanup sys_sigprocmask() ... Browse Code »

Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0]
unconditionally and then copy-to-user it if oset != NULL.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo
Reviewed-by: Matt Fleming

Oleg Nesterov
2011-04-28 19:01:40 +0800
702a5073f signal: rename signandsets() to sigandnsets() ... Browse Code »

As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y",
it should be "andn". Rename signandsets() as suggested.

Suggested-by: Tejun Heo
Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo

Oleg Nesterov
2011-04-28 19:01:39 +0800
b182801ab signal: do_sigtimedwait() needs retarget_shared_pending() ... Browse Code »

do_sigtimedwait() changes current->blocked and thus it needs
set_current_blocked()->retarget_shared_pending().

We could use set_current_blocked() directly. It is fine to change
->real_blocked from all-zeroes to ->blocked and vice versa lockless,
but this is not immediately clear, looks racy, and needs a huge
comment to explain why this is correct.

To keep the things simple this patch adds the new static helper,
__set_task_blocked() which should be called with ->siglock held. This
way we can change both ->real_blocked and ->blocked atomically under
->siglock as the current code does. This is more understandable.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo
Reviewed-by: Matt Fleming

Oleg Nesterov
2011-04-28 19:01:39 +0800
943df1485 signal: introduce do_sigtimedwait() to factor out compat/native code ... Browse Code »

Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait
to the new helper, do_sigtimedwait().

Add the comment to document the extra tick we add to timespec_to_jiffies(ts),
thanks to Linus who explained this to me.

Perhaps it would be better to move compat_sys_rt_sigtimedwait() into
signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo
Reviewed-by: Matt Fleming

Oleg Nesterov
2011-04-28 19:01:38 +0800
fe0faa005 signal: sys_rt_sigtimedwait: simplify the timeout logic ... Browse Code »

No functional changes, cleanup compat_sys_rt_sigtimedwait() and
sys_rt_sigtimedwait().

Calculate the timeout before we take ->siglock, this simplifies and
lessens the code. Use timespec_valid() to check the timespec.

Signed-off-by: Oleg Nesterov
Acked-by: Tejun Heo
Reviewed-by: Matt Fleming

Oleg Nesterov
2011-04-28 19:01:38 +0800
bb7efee2c signal: cleanup sys_rt_sigprocmask() ... Browse Code »

sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
We can just read current->blocked lockless unconditionally before
anything else and then copy-to-user it if needed. At worst we
copy 4 words on mips.

We could copy-to-user the old mask first and simplify the code even
more, but the patch tries to keep the current behaviour: we change
current->block even if copy_to_user(oset) fails.

Signed-off-by: Oleg Nesterov
Reviewed-by: Matt Fleming
Acked-by: Tejun Heo

Oleg Nesterov
2011-04-28 19:01:38 +0800