15 Jun, 2011

1 commit

  • Fix kernel-doc warnings in signal.c:

    Warning(kernel/signal.c:2374): No description found for parameter 'nset'
    Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

26 May, 2011

1 commit


21 May, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits)
    signal: trivial, fix the "timespec declared inside parameter list" warning
    job control: reorganize wait_task_stopped()
    ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
    signal: sys_sigprocmask() needs retarget_shared_pending()
    signal: cleanup sys_sigprocmask()
    signal: rename signandsets() to sigandnsets()
    signal: do_sigtimedwait() needs retarget_shared_pending()
    signal: introduce do_sigtimedwait() to factor out compat/native code
    signal: sys_rt_sigtimedwait: simplify the timeout logic
    signal: cleanup sys_rt_sigprocmask()
    x86: signal: sys_rt_sigreturn() should use set_current_blocked()
    x86: signal: handle_signal() should use set_current_blocked()
    signal: sigprocmask() should do retarget_shared_pending()
    signal: sigprocmask: narrow the scope of ->siglock
    signal: retarget_shared_pending: optimize while_each_thread() loop
    signal: retarget_shared_pending: consider shared/unblocked signals only
    signal: introduce retarget_shared_pending()
    ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/
    signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED
    signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending()
    ...

    Linus Torvalds
     

09 May, 2011

2 commits

  • GROUP_STOP_TRAPPING waiting mechanism piggybacks on
    signal->wait_chldexit which is primarily used to implement waiting for
    wait(2) and friends. When do_wait() waits on signal->wait_chldexit,
    it uses a custom wake up callback, child_wait_callback(), which
    expects the child task which is waking up the parent to be passed in
    as @key to filter out spurious wakeups.

    task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
    @key causing the following oops if the parent was doing do_wait().

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
    IP: [] child_wait_callback+0x29/0x80
    PGD 1d899067 PUD 1e418067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
    CPU 2
    Modules linked in:

    Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
    RIP: 0010:[] [] child_wait_callback+0x29/0x80
    RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
    RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
    FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
    Stack:
    ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
    0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
    0000000000000002 0000000000000082 0000000000000001 0000000000000000
    Call Trace:
    [] __wake_up_common+0x59/0x90
    [] __wake_up_sync_key+0x53/0x80
    [] __wake_up_sync+0x10/0x20
    [] task_clear_jobctl_trapping+0x44/0x50
    [] ptrace_stop+0x7c/0x290
    [] do_signal_stop+0x28a/0x2d0
    [] get_signal_to_deliver+0x14f/0x5a0
    [] do_signal+0x75/0x7b0
    [] do_notify_resume+0x5d/0x70
    [] retint_signal+0x46/0x8c
    Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

    Fix it by using __wake_up_sync_key() and passing in the child as @key.

    I still think it's a mistake to piggyback on wait_chldexit for this.
    Given the relative low frequency of ptrace use, we would be much
    better off leaving already complex wait_chldexit alone and using bit
    waitqueue.

    Signed-off-by: Tejun Heo
    Reviewed-by: Oleg Nesterov

    Tejun Heo
     
  • sys_sigprocmask() changes current->blocked by hand. Convert this code
    to use set_current_blocked().

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     

28 Apr, 2011

11 commits

  • Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0]
    unconditionally and then copy-to-user it if oset != NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y",
    it should be "andn". Rename signandsets() as suggested.

    Suggested-by: Tejun Heo
    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • do_sigtimedwait() changes current->blocked and thus it needs
    set_current_blocked()->retarget_shared_pending().

    We could use set_current_blocked() directly. It is fine to change
    ->real_blocked from all-zeroes to ->blocked and vice versa lockless,
    but this is not immediately clear, looks racy, and needs a huge
    comment to explain why this is correct.

    To keep the things simple this patch adds the new static helper,
    __set_task_blocked() which should be called with ->siglock held. This
    way we can change both ->real_blocked and ->blocked atomically under
    ->siglock as the current code does. This is more understandable.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait
    to the new helper, do_sigtimedwait().

    Add the comment to document the extra tick we add to timespec_to_jiffies(ts),
    thanks to Linus who explained this to me.

    Perhaps it would be better to move compat_sys_rt_sigtimedwait() into
    signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • No functional changes, cleanup compat_sys_rt_sigtimedwait() and
    sys_rt_sigtimedwait().

    Calculate the timeout before we take ->siglock, this simplifies and
    lessens the code. Use timespec_valid() to check the timespec.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
    We can just read current->blocked lockless unconditionally before
    anything else and then copy-to-user it if needed. At worst we
    copy 4 words on mips.

    We could copy-to-user the old mask first and simplify the code even
    more, but the patch tries to keep the current behaviour: we change
    current->block even if copy_to_user(oset) fails.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • In short, almost every changing of current->blocked is wrong, or at least
    can lead to the unexpected results.

    For example. Two threads T1 and T2, T1 sleeps in sigtimedwait/pause/etc.
    kill(tgid, SIG) can pick T2 for TIF_SIGPENDING. If T2 calls sigprocmask()
    and blocks SIG before it notices the pending signal, nobody else can handle
    this pending shared signal.

    I am not sure this is bug, but at least this looks strange imho. T1 should
    not sleep forever, there is a signal which should wake it up.

    This patch moves the code which actually changes ->blocked into the new
    helper, set_current_blocked() and changes this code to call
    retarget_shared_pending() as exit_signals() does. We should only care about
    the signals we just blocked, we use "newset & ~current->blocked" as a mask.

    We do not check !sigisemptyset(newblocked), retarget_shared_pending() is
    cheap unless mask & shared_pending.

    Note: for this particular case we could simply change sigprocmask() to
    return -EINTR if signal_pending(), but then we should change other callers
    and, more importantly, if we need this fix then set_current_blocked() will
    have more callers and some of them can't restart. See the next patch as a
    random example.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • No functional changes, preparation to simplify the review of the next change.

    1. We can read current->block lockless, nobody else can ever change this mask.

    2. Calculate the resulting sigset_t outside of ->siglock into the temporary
    variable, then take ->siglock and change ->blocked.

    Also, kill the stale comment about BKL.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • retarget_shared_pending() blindly does recalc_sigpending_and_wake() for
    every sub-thread, this is suboptimal. We can check t->blocked and stop
    looping once every bit in shared_pending has the new target.

    Note: we do not take task_is_stopped_or_traced(t) into account, we are
    not trying to speed up the signal delivery or to avoid the unnecessary
    (but harmless) signal_wake_up(0) in this unlikely case.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • exit_signals() checks signal_pending() before retarget_shared_pending() but
    this is suboptimal. We can avoid the while_each_thread() loop in case when
    there are no shared signals visible to us.

    Add the "shared_pending.signal & ~blocked" check. We don't use tsk->blocked
    directly but pass ~blocked as an argument, this is needed for the next patch.

    Note: we can optimize this more. while_each_thread(t) can check t->blocked
    into account and stop after every pending signal has the new target, see the
    next patch.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • No functional changes. Move the notify-other-threads code from exit_signals()
    to the new helper, retarget_shared_pending().

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     

09 Apr, 2011

1 commit


08 Apr, 2011

2 commits


05 Apr, 2011

2 commits


04 Apr, 2011

3 commits

  • This patch moves SIGNAL_STOP_DEQUEUED from signal_struct->flags to
    task_struct->group_stop, and thus makes it per-thread.

    Like SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can be false-positive
    after return from get_signal_to_deliver(), this is fine. The only
    purpose of this bit is: we can drop ->siglock after __dequeue_signal()
    returns the sig_kernel_stop() signal and before we call
    do_signal_stop(), in this case we must not miss SIGCONT if it comes in
    between.

    But, unlike SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can not be
    false-positive in do_signal_stop() if multiple threads dequeue the
    sig_kernel_stop() signal at the same time.

    Consider two threads T1 and T2, SIGTTIN has a hanlder.

    - T1 dequeues SIGTSTP and sets SIGNAL_STOP_DEQUEUED, then
    it drops ->siglock

    - SIGCONT comes and clears SIGNAL_STOP_DEQUEUED, SIGTSTP
    should be cancelled.

    - T2 dequeues SIGTTIN and sets SIGNAL_STOP_DEQUEUED again.
    Since we have a handler we should not stop, T2 returns
    to usermode to run the handler.

    - T1 continues, calls do_signal_stop() and wrongly starts
    the group stop because SIGNAL_STOP_DEQUEUED was restored
    in between.

    With or without this change:

    - we need to do something with ptrace_signal() which can
    return SIGSTOP, but this needs another discussion

    - SIGSTOP can be lost if it races with the mt exec, will
    be fixed later.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     
  • PF_EXITING or TASK_STOPPED has already called task_participate_group_stop()
    and cleared its ->group_stop. No need to do task_clear_group_stop_pending()
    when we start the new group stop.

    Add a small comment to explain the !task_is_stopped() check. Note that this
    check is not exactly right and it can lead to unnecessary stop later if the
    thread is TASK_PTRACED. What we need is task_participated_in_group_stop(),
    this will be solved later.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     
  • prepare_signal(SIGCONT) should never set TIF_SIGPENDING or wake up
    the TASK_INTERRUPTIBLE threads. We are going to call complete_signal()
    which should pick the right thread correctly. All we need is to wake
    up the TASK_STOPPED threads.

    If the task was stopped, it can't return to usermode without taking
    ->siglock. Otherwise we don't care, and the spurious TIF_SIGPENDING
    can't be useful.

    The comment says:

    * If there is a handler for SIGCONT, we must make
    * sure that no thread returns to user mode before
    * we post the signal

    It is not clear what this means. Probably, "when there's only a single
    thread" and this continues to be true. Otherwise, even if this SIGCONT
    is not private, with or without this change only one thread can dequeue
    SIGCONT, other threads can happily return to user mode before before
    that thread handles this signal.

    Note also that wake_up_state(t, __TASK_STOPPED) can't race with the task
    which changes its state, TASK_STOPPED state is protected by ->siglock as
    well.

    In short: when it comes to signal delivery, SIGCONT is the normal signal
    and does not need any special support.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     

31 Mar, 2011

1 commit


29 Mar, 2011

1 commit

  • Commit da48524eb206 ("Prevent rt_sigqueueinfo and rt_tgsigqueueinfo
    from spoofing the signal code") made the check on si_code too strict.
    There are several legitimate places where glibc wants to queue a
    negative si_code different from SI_QUEUE:

    - This was first noticed with glibc's aio implementation, which wants
    to queue a signal with si_code SI_ASYNCIO; the current kernel
    causes glibc's tst-aio4 test to fail because rt_sigqueueinfo()
    fails with EPERM.

    - Further examination of the glibc source shows that getaddrinfo_a()
    wants to use SI_ASYNCNL (which the kernel does not even define).
    The timer_create() fallback code wants to queue signals with SI_TIMER.

    As suggested by Oleg Nesterov , loosen the check to
    forbid only the problematic SI_TKILL case.

    Reported-by: Klaus Dittrich
    Acked-by: Julien Tinnes
    Cc:
    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Roland Dreier
     

24 Mar, 2011

1 commit

  • Changelog:
    Dec 8: Fixed bug in my check_kill_permission pointed out by
    Eric Biederman.
    Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred()
    for clarity
    Dec 31: address comment by Eric Biederman:
    don't need cred/tcred in check_kill_permission.
    Jan 1: use const cred struct.
    Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
    Feb 16: kill_ok_by_cred: fix bad parentheses
    Feb 23: per akpm, let compiler inline kill_ok_by_cred

    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

23 Mar, 2011

13 commits

  • Just as group_exit_code shouldn't be generated when a PTRACE_CONT'd
    task re-enters job control stop, notifiction for the event should be
    suppressed too. The logic is the same as the group_exit_code
    generation suppression in do_signal_stop(), if SIGNAL_STOP_STOPPED is
    already set, the task is re-entering job control stop without
    intervening SIGCONT and the notifications should be suppressed.

    Test case follows.

    #include
    #include
    #include
    #include
    #include
    #include

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static pid_t tracee, tracer;

    static const char *pid_who(pid_t pid)
    {
    return pid == tracee ? "tracee" : (pid == tracer ? "tracer" : "mommy ");
    }

    static void sigchld_sigaction(int signo, siginfo_t *si, void *ucxt)
    {
    printf("%s: SIG status=%02d code=%02d (%s)\n",
    pid_who(getpid()), si->si_status, si->si_code,
    pid_who(si->si_pid));
    }

    int main(void)
    {
    const struct sigaction chld_sa = { .sa_sigaction = sigchld_sigaction,
    .sa_flags = SA_SIGINFO|SA_RESTART };
    siginfo_t si;

    sigaction(SIGCHLD, &chld_sa, NULL);

    tracee = fork();
    if (!tracee) {
    tracee = getpid();
    while (1)
    pause();
    }

    kill(tracee, SIGSTOP);
    waitid(P_PID, tracee, &si, WSTOPPED);

    tracer = fork();
    if (!tracer) {
    tracer = getpid();
    ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    waitid(P_PID, tracee, &si, WSTOPPED);
    printf("tracer: detaching\n");
    ptrace(PTRACE_DETACH, tracee, NULL, NULL);
    return 0;
    }

    while (1)
    pause();
    return 0;
    }

    Before the patch, the parent gets the second notification for the
    tracee after the tracer detaches. si_status is zero because
    group_exit_code is not set by the group stop completion which
    triggered this notification.

    mommy : SIG status=19 code=05 (tracee)
    tracer: SIG status=00 code=05 (tracee)
    tracer: SIG status=19 code=04 (tracee)
    tracer: SIG status=00 code=05 (tracee)
    tracer: detaching
    mommy : SIG status=00 code=05 (tracee)
    mommy : SIG status=00 code=01 (tracer)
    ^C

    After the patch, the duplicate notification is gone.

    mommy : SIG status=19 code=05 (tracee)
    tracer: SIG status=00 code=05 (tracee)
    tracer: SIG status=19 code=04 (tracee)
    tracer: SIG status=00 code=05 (tracee)
    tracer: detaching
    mommy : SIG status=00 code=01 (tracer)
    ^C

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • With recent changes, job control and ptrace stopped states are
    properly separated and accessible to the real parent and the ptracer
    respectively; however, notifications of job control stopped/continued
    events to the real parent while ptraced are still missing.

    A ptracee participates in group stop in ptrace_stop() but the
    completion isn't notified. If participation results in completion of
    group stop, notify the real parent of the event. The ptrace and group
    stops are separate and can be handled as such.

    However, when the real parent and the ptracer are in the same thread
    group, only the ptrace stop event is visible through wait(2) and the
    duplicate notifications are different from the current behavior and
    are confusing. Suppress group stop notification in such cases.

    The continued state is shared between the real parent and the ptracer
    but is only meaningful to the real parent. Always notify the real
    parent and notify the ptracer too for backward compatibility. Similar
    to stop notification, if the real parent is the ptracer, suppress a
    duplicate notification.

    Test case follows.

    #include
    #include
    #include
    #include
    #include
    #include
    #include

    int main(void)
    {
    const struct timespec ts100ms = { .tv_nsec = 100000000 };
    pid_t tracee, tracer;
    siginfo_t si;
    int i;

    tracee = fork();
    if (tracee == 0) {
    while (1) {
    printf("tracee: SIGSTOP\n");
    raise(SIGSTOP);
    nanosleep(&ts100ms, NULL);
    printf("tracee: SIGCONT\n");
    raise(SIGCONT);
    nanosleep(&ts100ms, NULL);
    }
    }

    waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);

    tracer = fork();
    if (tracer == 0) {
    nanosleep(&ts100ms, NULL);
    ptrace(PTRACE_ATTACH, tracee, NULL, NULL);

    for (i = 0; i < 11; i++) {
    si.si_pid = 0;
    waitid(P_PID, tracee, &si, WSTOPPED);
    if (si.si_pid && si.si_code == CLD_TRAPPED)
    ptrace(PTRACE_CONT, tracee, NULL,
    (void *)(long)si.si_status);
    }
    printf("tracer: EXITING\n");
    return 0;
    }

    while (1) {
    si.si_pid = 0;
    waitid(P_PID, tracee, &si, WSTOPPED | WCONTINUED | WEXITED);
    if (si.si_pid)
    printf("mommy : WAIT status=%02d code=%02d\n",
    si.si_status, si.si_code);
    }
    return 0;
    }

    Before this patch, while ptraced, the real parent doesn't get
    notifications for job control events, so although it can access those
    events, the later waitid(2) call never wakes up.

    tracee: SIGSTOP
    mommy : WAIT status=19 code=05
    tracee: SIGCONT
    tracee: SIGSTOP
    tracee: SIGCONT
    tracee: SIGSTOP
    tracee: SIGCONT
    tracee: SIGSTOP
    tracer: EXITING
    mommy : WAIT status=19 code=05
    ^C

    After this patch, it works as expected.

    tracee: SIGSTOP
    mommy : WAIT status=19 code=05
    tracee: SIGCONT
    mommy : WAIT status=18 code=06
    tracee: SIGSTOP
    mommy : WAIT status=19 code=05
    tracee: SIGCONT
    mommy : WAIT status=18 code=06
    tracee: SIGSTOP
    mommy : WAIT status=19 code=05
    tracee: SIGCONT
    mommy : WAIT status=18 code=06
    tracee: SIGSTOP
    tracer: EXITING
    mommy : WAIT status=19 code=05
    ^C

    -v2: Oleg pointed out that

    * Group stop notification to the real parent should also happen
    when ptracer detach races with ptrace_stop().

    * real_parent_is_ptracer() should be testing thread group
    equality not the task itself as wait(2) and stop/cont
    notifications are normally thread-group wide.

    Both issues are fixed accordingly.

    -v3: real_parent_is_ptracer() updated to test child->real_parent
    instead of child->group_leader->real_parent per Oleg's
    suggestion.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • The stopped notifications in do_signal_stop() and exit_signals() are
    always for the completion of job control. The one in do_signal_stop()
    may be delivered to the ptracer if PTRACE_ATTACH races with
    notification and the one in exit_signals() if task exits while
    ptraced.

    In both cases, the notifications are meaningless and confusing to the
    ptracer as it never accesses the group stop state while the real
    parent would miss notifications for the events it is watching.

    Make sure these notifications always go to the real parent by calling
    do_notify_parent_cld_stop() with %false @for_ptrace.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Currently, do_notify_parent_cldstop() determines whether the
    notification is for the real parent or ptracer. Move the decision to
    the caller by adding @for_ptrace parameter to
    do_notify_parent_cldstop(). All the callers are updated to pass
    task_ptrace(target_task), so this patch doesn't cause any behavior
    difference.

    While at it, add function comment to do_notify_parent_cldstop().

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • While ptraced, a task may be resumed while the containing process is
    still job control stopped. If the task receives another stop signal
    in this state, it will still initiate group stop, which generates
    group_exit_code, which the real parent would be able to see once the
    ptracer detaches.

    In this scenario, the real parent may see two consecutive CLD_STOPPED
    events from two stop signals without intervening SIGCONT, which
    normally is impossible.

    Test case follows.

    #include
    #include
    #include
    #include

    int main(void)
    {
    pid_t tracee;
    siginfo_t si;

    tracee = fork();
    if (!tracee)
    while (1)
    pause();

    kill(tracee, SIGSTOP);
    waitid(P_PID, tracee, &si, WSTOPPED);

    if (!fork()) {
    ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_DETACH, tracee, NULL, NULL);
    return 0;
    }

    while (1) {
    si.si_pid = 0;
    waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG);
    if (si.si_pid)
    printf("st=%02d c=%02d\n", si.si_status, si.si_code);
    }
    return 0;
    }

    Before the patch, the latter waitid() in polling mode reports the
    second stopped event generated by the implied SIGSTOP of
    PTRACE_ATTACH.

    st=19 c=05
    ^C

    After the patch, the second event is not reported.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Currently, if the task is STOPPED on ptrace attach, it's left alone
    and the state is silently changed to TRACED on the next ptrace call.
    The behavior breaks the assumption that arch_ptrace_stop() is called
    before any task is poked by ptrace and is ugly in that a task
    manipulates the state of another task directly.

    With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
    TRACED can be made clean. The tracer can use the flag to tell the
    tracee to retry stop on attach and detach. On retry, the tracee will
    enter the desired state in the correct way. The lower 16bits of
    task->group_stop is used to remember the signal number which caused
    the last group stop. This is used while retrying for ptrace attach as
    the original group_exit_code could have been consumed with wait(2) by
    then.

    As the real parent may wait(2) and consume the group_exit_code
    anytime, the group_exit_code needs to be saved separately so that it
    can be used when switching from regular sleep to ptrace_stop(). This
    is recorded in the lower 16bits of task->group_stop.

    If a task is already stopped and there's no intervening SIGCONT, a
    ptrace request immediately following a successful PTRACE_ATTACH should
    always succeed even if the tracer doesn't wait(2) for attach
    completion; however, with this change, the tracee might still be
    TASK_RUNNING trying to enter TASK_TRACED which would cause the
    following request to fail with -ESRCH.

    This intermediate state is hidden from the ptracer by setting
    GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
    for it to clear on its signal->wait_chldexit. Completing the
    transition or getting killed clears TRAPPING and wakes up the tracer.

    Note that the STOPPED -> RUNNING -> TRACED transition is still visible
    to other threads which are in the same group as the ptracer and the
    reverse transition is visible to all. Please read the comments for
    details.

    Oleg:

    * Spotted a race condition where a task may retry group stop without
    proper bookkeeping. Fixed by redoing bookkeeping on retry.

    * Spotted that the transition is visible to userland in several
    different ways. Most are fixed with GROUP_STOP_TRAPPING. Unhandled
    corner case is documented.

    * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
    task would result in more consistent behavior.

    * Pointed out that calling ptrace_stop() from do_signal_stop() in
    TASK_STOPPED can race with group stop start logic and then confuse
    the TRAPPING wait in ptrace_check_attach(). ptrace_stop() is now
    called with TASK_RUNNING.

    * Suggested using signal->wait_chldexit instead of bit wait.

    * Spotted a race condition between TRACED transition and clearing of
    TRAPPING.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jan Kratochvil

    Tejun Heo
     
  • A ptraced task would still stop at do_signal_stop() when it's stopping
    for stop signals and do_signal_stop() behaves the same whether the
    task is ptraced or not. However, in addition to stopping,
    ptrace_stop() also does ptrace specific stuff like calling
    architecture specific callbacks, so this behavior makes the code more
    fragile and difficult to understand.

    This patch makes do_signal_stop() test whether the task is ptraced and
    use ptrace_stop() if so. This renders tracehook_notify_jctl() rather
    pointless as the ptrace notification is now handled by ptrace_stop()
    regardless of the return value from the tracehook. It probably is a
    good idea to update it.

    This doesn't solve the whole problem as tasks already in stopped state
    would stay in the regular stop when ptrace attached. That part will
    be handled by the next patch.

    Oleg pointed out that this makes a userland-visible change. Before,
    SIGCONT would be able to wake up a task in group stop even if the task
    is ptraced if the tracer hasn't issued another ptrace command
    afterwards (as the next ptrace commands transitions the state into
    TASK_TRACED which ignores SIGCONT wakeups). With this and the next
    patch, SIGCONT may race with the transition into TASK_TRACED and is
    ignored if the tracee already entered TASK_TRACED.

    Another userland visible change of this and the next patch is that the
    ptracee's state would now be TASK_TRACED where it used to be
    TASK_STOPPED, which is visible via fs/proc.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jan Kratochvil

    Tejun Heo
     
  • Currently, ptrace_stop() unconditionally participates in group stop
    bookkeeping. This is unnecessary and inaccurate. Make it only
    participate if the task is trapping for group stop - ie. if @why is
    CLD_STOPPED. As ptrace_stop() currently is not used when trapping for
    group stop, this equals to disabling group stop participation from
    ptrace_stop().

    A visible behavior change is increased likelihood of delayed group
    stop completion if the thread group contains one or more ptraced
    tasks.

    This is to preapre for further cleanup of the interaction between
    group stop and ptrace.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath

    Tejun Heo
     
  • Currently task->signal->group_stop_count is used to decide whether to
    stop for group stop. However, if there is a task in the group which
    is taking a long time to stop, other tasks which are continued by
    ptrace would repeatedly stop for the same group stop until the group
    stop is complete.

    Conversely, if a ptraced task is in TASK_TRACED state, the debugger
    won't get notified of group stops which is inconsistent compared to
    the ptraced task in any other state.

    This patch introduces GROUP_STOP_PENDING which tracks whether a task
    is yet to stop for the group stop in progress. The flag is set when a
    group stop starts and cleared when the task stops the first time for
    the group stop, and consulted whenever whether the task should
    participate in a group stop needs to be determined. Note that now
    tasks in TASK_TRACED also participate in group stop.

    This results in the following behavior changes.

    * For a single group stop, a ptracer would see at most one stop
    reported.

    * A ptracee in TASK_TRACED now also participates in group stop and the
    tracer would get the notification. However, as a ptraced task could
    be in TASK_STOPPED state or any ptrace trap could consume group
    stop, the notification may still be missing. These will be
    addressed with further patches.

    * A ptracee may start a group stop while one is still in progress if
    the tracer let it continue with stop signal delivery. Group stop
    code handles this correctly.

    Oleg:

    * Spotted that a task might skip signal check even when its
    GROUP_STOP_PENDING is set. Fixed by updating
    recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
    group_stop_count.

    * Pointed out that task->group_stop should be cleared whenever
    task->signal->group_stop_count is cleared. Fixed accordingly.

    * Pointed out the behavior inconsistency between TASK_TRACED and
    RUNNING and the last behavior change.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath

    Tejun Heo
     
  • task->signal->group_stop_count is used to track the progress of group
    stop. It's initialized to the number of tasks which need to stop for
    group stop to finish and each stopping or trapping task decrements.
    However, each task doesn't keep track of whether it decremented the
    counter or not and if woken up before the group stop is complete and
    stops again, it can decrement the counter multiple times.

    Please consider the following example code.

    static void *worker(void *arg)
    {
    while (1) ;
    return NULL;
    }

    int main(void)
    {
    pthread_t thread;
    pid_t pid;
    int i;

    pid = fork();
    if (!pid) {
    for (i = 0; i < 5; i++)
    pthread_create(&thread, NULL, worker, NULL);
    while (1) ;
    return 0;
    }

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);
    while (1) {
    waitid(P_PID, pid, NULL, WSTOPPED);
    ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
    }
    return 0;
    }

    The child creates five threads and the parent continuously traps the
    first thread and whenever the child gets a signal, SIGSTOP is
    delivered. If an external process sends SIGSTOP to the child, all
    other threads in the process should reliably stop. However, due to
    the above bug, the first thread will often end up consuming
    group_stop_count multiple times and SIGSTOP often ends up stopping
    none or part of the other four threads.

    This patch adds a new field task->group_stop which is protected by
    siglock and uses GROUP_STOP_CONSUME flag to track which task is still
    to consume group_stop_count to fix this bug.

    task_clear_group_stop_pending() and task_participate_group_stop() are
    added to help manipulating group stop states. As ptrace_stop() now
    also uses task_participate_group_stop(), it will set
    SIGNAL_STOP_STOPPED if it completes a group stop.

    There still are many issues regarding the interaction between group
    stop and ptrace. Patches to address them will follow.

    - Oleg spotted duplicate GROUP_STOP_CONSUME. Dropped.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath

    Tejun Heo
     
  • To prepare for cleanup of the interaction between group stop and
    ptrace, add @why to ptrace_stop(). Existing users are updated such
    that there is no behavior change.

    Signed-off-by: Tejun Heo
    Acked-by: Roland McGrath

    Tejun Heo
     
  • tracehook_notify_jctl() aids in determining whether and what to report
    to the parent when a task is stopped or continued. The function also
    adds an extra requirement that siglock may be released across it,
    which is currently unused and quite difficult to satisfy in
    well-defined manner.

    As job control and the notifications are about to receive major
    overhaul, remove the tracehook and open code it. If ever necessary,
    let's factor it out after the overhaul.

    * Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
    Fixed.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Roland McGrath

    Tejun Heo
     
  • do_signal_stop() is used only by get_signal_to_deliver() and after a
    successful signal stop, it always calls try_to_freeze(), so the
    try_to_freeze() loop around schedule() in do_signal_stop() is
    superflous and confusing. Remove it.

    Signed-off-by: Tejun Heo
    Acked-by: Rafael J. Wysocki
    Acked-by: Oleg Nesterov
    Acked-by: Roland McGrath

    Tejun Heo