11 Jan, 2012

2 commits

  • ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
    user namespace. (new, thanks Oleg)

    __send_signal: convert current's uid to the recipient's user namespace
    for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
    again :)

    do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
    user namespace

    ptrace_signal maps parent's uid into current's user namespace before
    including in signal to current. IIUC Oleg has argued that this shouldn't
    matter as the debugger will play with it, but it seems like not converting
    the value currently being set is misleading.

    Changelog:
    Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
    simplify callers and help make clear what we are translating
    (which uid into which namespace). Passing the target task would
    make callers even easier to read, but we pass in user_ns because
    current_user_ns() != task_cred_xxx(current, user_ns).
    Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
    in ptrace_signal().
    Sep 23: In send_signal(), detect when (user) signal is coming from an
    ancestor or unrelated user namespace. Pass that on to __send_signal,
    which sets si_uid to 0 or overflowuid if needed.
    Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
    SI_FROMKERNEL cases at callers, because we can't assume sender is
    current in those cases.
    Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
    Nov 10: (akpm) make the !CONFIG_USER_NS case clearer

    Signed-off-by: Serge Hallyn
    Cc: Oleg Nesterov
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    From: Serge Hallyn
    Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)

    Eric Biederman pointed out that passing info is a bug and could lead to a
    NULL pointer deref to boot.

    A collection of signal, securebits, filecaps, cap_bounds, and a few other
    ltp tests passed with this kernel.

    Changelog:
    Nov 18: previous patch missed a leading '&'

    Signed-off-by: Serge Hallyn
    Cc: "Eric W. Biederman"
    From: Dan Carpenter
    Subject: ipc/mqueue: lock() => unlock() typo

    There was a double lock typo introduced in b085f4bd6b21 "user namespace:
    make signal.c respect user namespaces"

    Signed-off-by: Dan Carpenter
    Cc: Oleg Nesterov
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Abstract the code sequence for adding a signal handler's sa_mask to
    current->blocked because the sequence is identical for all architectures.
    Furthermore, in the past some architectures actually got this code wrong,
    so introduce a wrapper that all architectures can use.

    Signed-off-by: Matt Fleming
    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Fleming
     

10 Jan, 2012

1 commit

  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

07 Jan, 2012

1 commit

  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    sched/tracing: Add a new tracepoint for sleeptime
    sched: Disable scheduler warnings during oopses
    sched: Fix cgroup movement of waking process
    sched: Fix cgroup movement of newly created process
    sched: Fix cgroup movement of forking process
    sched: Remove cfs bandwidth period check in tg_set_cfs_period()
    sched: Fix load-balance lock-breaking
    sched: Replace all_pinned with a generic flags field
    sched: Only queue remote wakeups when crossing cache boundaries
    sched: Add missing rcu_dereference() around ->real_parent usage
    [S390] fix cputime overflow in uptime_proc_show
    [S390] cputime: add sparse checking and cleanup
    sched: Mark parent and real_parent as __rcu
    sched, nohz: Fix missing RCU read lock
    sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
    sched, nohz: Fix the idle cpu check in nohz_idle_balance
    sched: Use jump_labels for sched_feat
    sched/accounting: Fix parameter passing in task_group_account_field
    sched/accounting: Fix user/system tick double accounting
    sched/accounting: Re-use scheduler statistics for the root cgroup
    ...

    Fix up conflicts in
    - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
    usecs_to_cputime64() vs the sparse cleanups
    - kernel/sched/fair.c, kernel/time/tick-sched.c
    scheduler changes in multiple branches

    Linus Torvalds
     

05 Jan, 2012

1 commit

  • This is the temporary simple fix for 3.2, we need more changes in this
    area.

    1. do_signal_stop() assumes that the running untraced thread in the
    stopped thread group is not possible. This was our goal but it is
    not yet achieved: a stopped-but-resumed tracee can clone the running
    thread which can initiate another group-stop.

    Remove WARN_ON_ONCE(!current->ptrace).

    2. A new thread always starts with ->jobctl = 0. If it is auto-attached
    and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
    but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
    in do_jobctl_trap() if another debugger attaches.

    Change __ptrace_unlink() to set the artificial SIGSTOP for report.

    Alternatively we could change ptrace_init_task() to copy signr from
    current, but this means we can copy it for no reason and hide the
    possible similar problems.

    Acked-by: Tejun Heo
    Cc: [3.1]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

15 Dec, 2011

1 commit


13 Dec, 2011

1 commit

  • threadgroup_lock() protected only protected against new addition to
    the threadgroup, which was inherently somewhat incomplete and
    problematic for its only user cgroup. On-going migration could race
    against exec and exit leading to interesting problems - the symmetry
    between various attach methods, task exiting during method execution,
    ->exit() racing against attach methods, migrating task switching basic
    properties during exec and so on.

    This patch extends threadgroup_lock() such that it protects against
    all three threadgroup altering operations - fork, exit and exec. For
    exit, threadgroup_change_begin/end() calls are added to exit_signals
    around assertion of PF_EXITING. For exec, threadgroup_[un]lock() are
    updated to also grab and release cred_guard_mutex.

    With this change, threadgroup_lock() guarantees that the target
    threadgroup will remain stable - no new task will be added, no new
    PF_EXITING will be set and exec won't happen.

    The next patch will update cgroup so that it can take full advantage
    of this change.

    -v2: beefed up comment as suggested by Frederic.

    -v3: narrowed scope of protection in exit path as suggested by
    Frederic.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Li Zefan
    Acked-by: Frederic Weisbecker
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Menage
    Cc: Linus Torvalds

    Tejun Heo
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include
    +#include

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

30 Sep, 2011

1 commit

  • Add to the dev_state and alloc_async structures the user namespace
    corresponding to the uid and euid. Pass these to kill_pid_info_as_uid(),
    which can then implement a proper, user-namespace-aware uid check.

    Changelog:
    Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
    uid, and euid each separately, pass a struct cred.
    Sep 26: Address Alan Stern's comments: don't define a struct cred at
    usbdev_open(), and take and put a cred at async_completed() to
    ensure it lasts for the duration of kill_pid_info_as_cred().

    Signed-off-by: Serge Hallyn
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Serge Hallyn
     

28 Jul, 2011

1 commit

  • sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
    change ->blocked directly. This is not correct, see the changelog in
    e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"

    Change them to use set_current_blocked().

    Another change is that now we are doing ->saved_sigmask = ->blocked
    lockless, it doesn't make any sense to do this under ->siglock.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

23 Jul, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
    ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
    ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
    connector: add an event for monitoring process tracers
    ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
    ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
    ptrace_init_task: initialize child->jobctl explicitly
    has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
    ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
    ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
    ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
    ptrace: ptrace_reparented() should check same_thread_group()
    redefine thread_group_leader() as exit_signal >= 0
    do not change dead_task->exit_signal
    kill task_detached()
    reparent_leader: check EXIT_DEAD instead of task_detached()
    make do_notify_parent() __must_check, update the callers
    __ptrace_detach: avoid task_detached(), check do_notify_parent()
    kill tracehook_notify_death()
    make do_notify_parent() return bool
    ptrace: s/tracehook_tracer_task()/ptrace_parent()/
    ...

    Linus Torvalds
     

21 Jul, 2011

2 commits

  • Simple test-case,

    int main(void)
    {
    int pid, status;

    pid = fork();
    if (!pid) {
    pause();
    assert(0);
    return 0x23;
    }

    assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
    assert(wait(&status) == pid);
    assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);

    kill(pid, SIGCONT); // ptrace check from ptrace_signal() to its caller,
    get_signal_to_deliver(), this looks more natural.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • The __lock_task_sighand() function calls rcu_read_lock() with interrupts
    and preemption enabled, but later calls rcu_read_unlock() with interrupts
    disabled. It is therefore possible that this RCU read-side critical
    section will be preempted and later RCU priority boosted, which means that
    rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but
    with interrupts disabled. This results in lockdep splats, so this commit
    nests the RCU read-side critical section within the interrupt-disabled
    region of code. This prevents the RCU read-side critical section from
    being preempted, and thus prevents the attempt to deboost with interrupts
    disabled.

    It is quite possible that a better long-term fix is to make rt_mutex_unlock()
    disable irqs when acquiring the rt_mutex structure's ->wait_lock.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 Jun, 2011

3 commits

  • Kill real_parent_is_ptracer() and update the callers to use
    ptrace_reparented(), after the previous patch they do the same.

    Remove the unnecessary ->ptrace != 0 check in get_signal_to_deliver(),
    if ptrace_reparented() == T then the task must be ptraced.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • __ptrace_detach() and do_notify_parent() set task->exit_signal = -1
    to mark the task dead. This is no longer needed, nobody checks
    exit_signal to detect the EXIT_DEAD task.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo

    Oleg Nesterov
     
  • - change do_notify_parent() to return a boolean, true if the task should
    be reaped because its parent ignores SIGCHLD.

    - update the only caller which checks the returned value, exit_notify().

    This temporary uglifies exit_notify() even more, will be cleanuped by
    the next change.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     

23 Jun, 2011

2 commits

  • At this point, tracehooks aren't useful to mainline kernel and mostly
    just add an extra layer of obfuscation. Although they have comments,
    without actual in-kernel users, it is difficult to tell what are their
    assumptions and they're actually trying to achieve. To mainline
    kernel, they just aren't worth keeping around.

    This patch kills the following trivial tracehooks.

    * Ones testing whether task is ptraced. Replace with ->ptrace test.

    tracehook_expect_breakpoints()
    tracehook_consider_ignored_signal()
    tracehook_consider_fatal_signal()

    * ptrace_event() wrappers. Call directly.

    tracehook_report_exec()
    tracehook_report_exit()
    tracehook_report_vfork_done()

    * ptrace_release_task() wrapper. Call directly.

    tracehook_finish_release_task()

    * noop

    tracehook_prepare_release_task()
    tracehook_report_death()

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Martin Schwidefsky
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task_ptrace(task) simply dereferences task->ptrace and isn't even used
    consistently only adding confusion. Kill it and directly access
    ->ptrace instead.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

17 Jun, 2011

4 commits

  • The previous patch implemented async notification for ptrace but it
    only worked while trace is running. This patch introduces
    PTRACE_LISTEN which is suggested by Oleg Nestrov.

    It's allowed iff tracee is in STOP trap and puts tracee into
    quasi-running state - tracee never really runs but wait(2) and
    ptrace(2) consider it to be running. While ptracer is listening,
    tracee is allowed to re-enter STOP to notify an async event.
    Listening state is cleared on the first notification. Ptracer can
    also clear it by issuing INTERRUPT - tracee will re-trap into STOP
    with listening state cleared.

    This allows ptracer to monitor group stop state without running tracee
    - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
    wait(2) to wait for the next group stop event. When it happens,
    PTRACE_GETSIGINFO provides information to determine the current state.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207
    #define PTRACE_LISTEN 0x4208

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
    pid_t tracee, tracer;
    int i;

    tracee = fork();
    if (!tracee)
    while (1)
    pause();

    tracer = fork();
    if (!tracer) {
    siginfo_t si;

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
    repeat:
    waitid(P_PID, tracee, NULL, WSTOPPED);

    ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
    if (!si.si_code) {
    printf("tracer: SIG %d\n", si.si_signo);
    ptrace(PTRACE_CONT, tracee, NULL,
    (void *)(unsigned long)si.si_signo);
    goto repeat;
    }
    printf("tracer: stopped=%d signo=%d\n",
    si.si_signo != SIGTRAP, si.si_signo);
    if (si.si_signo != SIGTRAP)
    ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
    else
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    goto repeat;
    }

    for (i = 0; i < 3; i++) {
    nanosleep(&ts1s, NULL);
    printf("mother: SIGSTOP\n");
    kill(tracee, SIGSTOP);
    nanosleep(&ts1s, NULL);
    printf("mother: SIGCONT\n");
    kill(tracee, SIGCONT);
    }
    nanosleep(&ts1s, NULL);

    kill(tracer, SIGKILL);
    kill(tracee, SIGKILL);
    return 0;
    }

    This is identical to the program to test TRAP_NOTIFY except that
    tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
    This allows ptracer to monitor when group stop ends without running
    tracee.

    # ./test-listen
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
    task_stopped_code() as suggested by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently there's no way for ptracer to find out whether group stop
    finished other than polling with INTERRUPT - GETSIGINFO - CONT
    sequence. This patch implements group stop notification for ptracer
    using STOP traps.

    When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
    is set, which schedules a STOP trap which is sticky - it isn't cleared
    by other traps and at least one STOP trap will happen eventually.
    STOP trap is synchronization point for event notification and the
    tracer can determine the current group stop state by looking at the
    signal number portion of exit code (si_status from waitid(2) or
    si_code from PTRACE_GETSIGINFO).

    Notifications are generated both on start and end of group stops but,
    because group stop participation always happens before STOP trap, this
    doesn't cause an extra trap while tracee is participating in group
    stop. The symmetry will be useful later.

    Note that this notification works iff tracee is not trapped.
    Currently there is no way to be notified of group stop state changes
    while tracee is trapped. This will be addressed by a later patch.

    An example program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
    pid_t tracee, tracer;
    int i;

    tracee = fork();
    if (!tracee)
    while (1)
    pause();

    tracer = fork();
    if (!tracer) {
    siginfo_t si;

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
    repeat:
    waitid(P_PID, tracee, NULL, WSTOPPED);

    ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
    if (!si.si_code) {
    printf("tracer: SIG %d\n", si.si_signo);
    ptrace(PTRACE_CONT, tracee, NULL,
    (void *)(unsigned long)si.si_signo);
    goto repeat;
    }
    printf("tracer: stopped=%d signo=%d\n",
    si.si_signo != SIGTRAP, si.si_signo);
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    goto repeat;
    }

    for (i = 0; i < 3; i++) {
    nanosleep(&ts1s, NULL);
    printf("mother: SIGSTOP\n");
    kill(tracee, SIGSTOP);
    nanosleep(&ts1s, NULL);
    printf("mother: SIGCONT\n");
    kill(tracee, SIGCONT);
    }
    nanosleep(&ts1s, NULL);

    kill(tracer, SIGKILL);
    kill(tracee, SIGKILL);
    return 0;
    }

    In the above program, tracer keeps tracee running and gets
    notification of each group stop state changes.

    # ./test-notify
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
    effects on tracee signal and job control states. This patch
    implements a new ptrace request PTRACE_SEIZE which attaches a tracee
    without trapping it or affecting its signal and job control states.

    The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
    flags in @data. Currently, the only defined flag is
    PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
    PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
    The changes will be implemented gradually and the DEVEL flag is to
    prevent programs which expect full SEIZE behavior from using it before
    all the behavior modifications are complete while allowing unit
    testing. The flag will be removed once SEIZE behaviors are completely
    implemented.

    * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap. After
    attaching tracee continues to run unless a trap condition occurs.

    * PTRACE_SEIZE doesn't affect signal or group stop state.

    * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
    exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
    the stopping signals if group stop is in effect or SIGTRAP
    otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
    instead of NULL.

    Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
    used to determine whether new SEIZE behaviors should be enabled.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
    pid_t tracee;

    tracee = fork();
    if (tracee == 0) {
    nanosleep(&ts100ms, NULL);
    while (1) {
    printf("tracee: alive\n");
    nanosleep(&ts1s, NULL);
    }
    }

    if (argc > 1)
    kill(tracee, SIGSTOP);

    nanosleep(&ts100ms, NULL);

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    if (argc > 1) {
    waitid(P_PID, tracee, NULL, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    }
    nanosleep(&ts3s, NULL);
    printf("tracer: exiting\n");
    return 0;
    }

    When the above program is called w/o argument, tracee is seized while
    running and remains running. When tracer exits, tracee continues to
    run and print out messages.

    # ./test-seize-simple
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    tracee: alive
    tracee: alive

    When called with an argument, tracee is seized from stopped state and
    continued, and returns to stopped state when tracer exits.

    # ./test-seize
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    # ps -el|grep test-seize
    1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

    -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
    suggested.

    -v3: PTRACE_EVENT_STOP traps now report group stop state by signr. If
    group stop is in effect the stop signal number is returned as
    part of exit_code; otherwise, SIGTRAP. This was suggested by
    Denys and Oleg.

    Signed-off-by: Tejun Heo
    Cc: Jan Kratochvil
    Cc: Denys Vlasenko
    Cc: Oleg Nesterov

    Tejun Heo
     
  • do_signal_stop() implemented both normal group stop and trap for group
    stop while ptraced. This approach has been enough but scheduled
    changes require trap mechanism which can be used in more generic
    manner and using group stop trap for generic trap site simplifies both
    userland visible interface and implementation.

    This patch adds a new jobctl flag - JOBCTL_TRAP_STOP. When set, it
    triggers a trap site, which behaves like group stop trap, in
    get_signal_to_deliver() after checking for pending signals. While
    ptraced, do_signal_stop() doesn't stop itself. It initiates group
    stop if requested and schedules JOBCTL_TRAP_STOP and returns. The
    caller - get_signal_to_deliver() - is responsible for checking whether
    TRAP_STOP is pending afterwards and handling it.

    ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
    JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
    bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
    after detach.

    While at it, add proper function comment to do_signal_stop() and make
    it return bool.

    -v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
    instead of JOBCTL_PENDING_MASK. This avoids accidentally
    clearing JOBCTL_STOP_CONSUME. Spotted by Oleg.

    -v3: do_signal_stop() updated to return %false without dropping
    siglock while ptraced and TRAP_STOP check moved inside for(;;)
    loop after group stop participation. This avoids unnecessary
    relocking and also will help avoiding unnecessary traps by
    consuming group stop before handling pending traps.

    -v4: Jobctl trap handling moved into a separate function -
    do_jobctl_trap().

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     

15 Jun, 2011

1 commit

  • Fix kernel-doc warnings in signal.c:

    Warning(kernel/signal.c:2374): No description found for parameter 'nset'
    Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

05 Jun, 2011

7 commits

  • Remove the following three noop tracehooks in signals.c.

    * tracehook_force_sigpending()
    * tracehook_get_signal()
    * tracehook_finish_jctl()

    The code area is about to be updated and these hooks don't do anything
    other than obfuscating the logic.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
    ->wait_chldexit was already complicated with waker-side filtering
    without adding TRAPPING wait on top of it. Also, it unnecessarily
    made TRAPPING clearing depend on the current ptrace relationship - if
    the ptracee is detached, wakeup is lost.

    There is no reason to use signal->wait_chldexit here. We're just
    waiting for JOBCTL_TRAPPING bit to clear and given the relatively
    infrequent use of ptrace, bit_waitqueue can serve it perfectly.

    This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
    signal->wait_chldexit.

    -v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
    pending bits too. Setting pending conditions on a dying task may make
    the task unkillable. Currently, each setting site is responsible for
    checking for the condition but with to-be-added job control traps this
    becomes too fragile.

    This patch adds task_set_jobctl_pending() which should be used when
    setting task->jobctl bits to schedule a stop or trap. The function
    performs the followings to ease setting pending bits.

    * Sanity checks.

    * If fatal signal is pending or PF_EXITING is set, no bit is set.

    * STOP_SIGMASK is automatically cleared if new value is being set.

    do_signal_stop() and ptrace_attach() are updated to use
    task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
    The surrounding structures around setting are changed to fit
    task_set_jobctl_pending() better but there should be no userland
    visible behavior difference.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to
    (re)transit into TRACED. task_clear_jobctl_pending() must be called
    when either tracee enters TRACED or the transition is cancelled for
    some reason. The former is achieved by explicitly calling
    task_clear_jobctl_pending() in ptrace_stop() and the latter by calling
    it at the end of do_signal_stop().

    Calling task_clear_jobctl_trapping() at the end of do_signal_stop()
    limits the scope TRAPPING can be used and is fragile in that seemingly
    unrelated changes to tracee's control flow can lead to stuck TRAPPING.

    We already have task_clear_jobctl_pending() calls on those cancelling
    events to clear JOBCTL_STOP_PENDING. Cancellations can be handled by
    making those call sites use JOBCTL_PENDING_MASK instead and updating
    task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is
    called automatically if no stop/trap is pending.

    This patch makes the above changes and removes the fallback
    task_clear_jobctl_trapping() call from do_signal_stop().

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • This patch introduces JOBCTL_PENDING_MASK and replaces
    task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
    which takes an extra @mask argument.

    JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
    future patches will add more bits. recalc_sigpending_tsk() is updated
    to use JOBCTL_PENDING_MASK instead.

    task_clear_jobctl_pending() takes @mask which in subset of
    JOBCTL_PENDING_MASK and clears the relevant jobctl bits. If
    JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All
    task_clear_jobctl_stop_pending() users are updated to call
    task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is
    functionally identical to task_clear_jobctl_stop_pending().

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • In ptrace_stop(), after arch hook is done, the task state and jobctl
    bits are updated while holding siglock. The ordering requirement
    there is that TASK_TRACED is set before JOBCTL_TRAPPING is cleared to
    prevent ptracer waiting on TRAPPING doesn't end up waking up TRACED is
    actually set and sees TASK_RUNNING in wait(2).

    Move set_current_state(TASK_TRACED) to the top of the block and
    reorganize comments. This makes the ordering more obvious
    (TASK_TRACED before other updates) and helps future updates to group
    stop participation.

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • signal->group_stop currently hosts mostly group stop related flags;
    however, it's gonna be used for wider purposes and the GROUP_STOP_
    flag prefix becomes confusing. Rename signal->group_stop to
    signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

    Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
    defined in terms of them to allow using bitops later.

    While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
    future additions.

    This doesn't cause any functional change.

    -v2: JOBCTL_*_BIT macros added as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

26 May, 2011

1 commit


21 May, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits)
    signal: trivial, fix the "timespec declared inside parameter list" warning
    job control: reorganize wait_task_stopped()
    ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
    signal: sys_sigprocmask() needs retarget_shared_pending()
    signal: cleanup sys_sigprocmask()
    signal: rename signandsets() to sigandnsets()
    signal: do_sigtimedwait() needs retarget_shared_pending()
    signal: introduce do_sigtimedwait() to factor out compat/native code
    signal: sys_rt_sigtimedwait: simplify the timeout logic
    signal: cleanup sys_rt_sigprocmask()
    x86: signal: sys_rt_sigreturn() should use set_current_blocked()
    x86: signal: handle_signal() should use set_current_blocked()
    signal: sigprocmask() should do retarget_shared_pending()
    signal: sigprocmask: narrow the scope of ->siglock
    signal: retarget_shared_pending: optimize while_each_thread() loop
    signal: retarget_shared_pending: consider shared/unblocked signals only
    signal: introduce retarget_shared_pending()
    ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/
    signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED
    signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending()
    ...

    Linus Torvalds
     

09 May, 2011

2 commits

  • GROUP_STOP_TRAPPING waiting mechanism piggybacks on
    signal->wait_chldexit which is primarily used to implement waiting for
    wait(2) and friends. When do_wait() waits on signal->wait_chldexit,
    it uses a custom wake up callback, child_wait_callback(), which
    expects the child task which is waking up the parent to be passed in
    as @key to filter out spurious wakeups.

    task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
    @key causing the following oops if the parent was doing do_wait().

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
    IP: [] child_wait_callback+0x29/0x80
    PGD 1d899067 PUD 1e418067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
    CPU 2
    Modules linked in:

    Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
    RIP: 0010:[] [] child_wait_callback+0x29/0x80
    RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
    RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
    FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
    Stack:
    ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
    0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
    0000000000000002 0000000000000082 0000000000000001 0000000000000000
    Call Trace:
    [] __wake_up_common+0x59/0x90
    [] __wake_up_sync_key+0x53/0x80
    [] __wake_up_sync+0x10/0x20
    [] task_clear_jobctl_trapping+0x44/0x50
    [] ptrace_stop+0x7c/0x290
    [] do_signal_stop+0x28a/0x2d0
    [] get_signal_to_deliver+0x14f/0x5a0
    [] do_signal+0x75/0x7b0
    [] do_notify_resume+0x5d/0x70
    [] retint_signal+0x46/0x8c
    Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

    Fix it by using __wake_up_sync_key() and passing in the child as @key.

    I still think it's a mistake to piggyback on wait_chldexit for this.
    Given the relative low frequency of ptrace use, we would be much
    better off leaving already complex wait_chldexit alone and using bit
    waitqueue.

    Signed-off-by: Tejun Heo
    Reviewed-by: Oleg Nesterov

    Tejun Heo
     
  • sys_sigprocmask() changes current->blocked by hand. Convert this code
    to use set_current_blocked().

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     

28 Apr, 2011

6 commits

  • Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0]
    unconditionally and then copy-to-user it if oset != NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y",
    it should be "andn". Rename signandsets() as suggested.

    Suggested-by: Tejun Heo
    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • do_sigtimedwait() changes current->blocked and thus it needs
    set_current_blocked()->retarget_shared_pending().

    We could use set_current_blocked() directly. It is fine to change
    ->real_blocked from all-zeroes to ->blocked and vice versa lockless,
    but this is not immediately clear, looks racy, and needs a huge
    comment to explain why this is correct.

    To keep the things simple this patch adds the new static helper,
    __set_task_blocked() which should be called with ->siglock held. This
    way we can change both ->real_blocked and ->blocked atomically under
    ->siglock as the current code does. This is more understandable.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait
    to the new helper, do_sigtimedwait().

    Add the comment to document the extra tick we add to timespec_to_jiffies(ts),
    thanks to Linus who explained this to me.

    Perhaps it would be better to move compat_sys_rt_sigtimedwait() into
    signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • No functional changes, cleanup compat_sys_rt_sigtimedwait() and
    sys_rt_sigtimedwait().

    Calculate the timeout before we take ->siglock, this simplifies and
    lessens the code. Use timespec_valid() to check the timespec.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
    We can just read current->blocked lockless unconditionally before
    anything else and then copy-to-user it if needed. At worst we
    copy 4 words on mips.

    We could copy-to-user the old mask first and simplify the code even
    more, but the patch tries to keep the current behaviour: we change
    current->block even if copy_to_user(oset) fails.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov