26 Sep, 2011

1 commit


19 Jul, 2011

1 commit

  • This change adds a procfs connector event, which is emitted on every
    successful process tracer attach or detach.

    If some process connects to other one, kernelspace connector reports
    process id and thread group id of both these involved processes. On
    disconnection null process id is returned.

    Such an event allows to create a simple automated userspace mechanism
    to be aware about processes connecting to others, therefore predefined
    process policies can be applied to them if needed.

    Note, a detach signal is emitted only in case, if a tracer process
    explicitly executes PTRACE_DETACH request. In other cases like tracee
    or tracer exit detach event from proc connector is not reported.

    Signed-off-by: Vladimir Zapolskiy
    Acked-by: Evgeniy Polyakov
    Cc: David S. Miller
    Signed-off-by: Oleg Nesterov

    Vladimir Zapolskiy
     

28 Jun, 2011

2 commits

  • __ptrace_detach() and do_notify_parent() set task->exit_signal = -1
    to mark the task dead. This is no longer needed, nobody checks
    exit_signal to detect the EXIT_DEAD task.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo

    Oleg Nesterov
     
  • __ptrace_detach() relies on the current obscure behaviour of
    do_notify_parent(tsk) which changes tsk->exit_signal if this child
    should be silently reaped. That is why we check task_detached(), it
    is true if the task is sub-thread, or it is the group_leader but
    its exit_signal was changed by do_notify_parent().

    This is confusing, change the code to rely on !thread_group_leader()
    or the value returned by do_notify_parent().

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     

17 Jun, 2011

4 commits

  • The previous patch implemented async notification for ptrace but it
    only worked while trace is running. This patch introduces
    PTRACE_LISTEN which is suggested by Oleg Nestrov.

    It's allowed iff tracee is in STOP trap and puts tracee into
    quasi-running state - tracee never really runs but wait(2) and
    ptrace(2) consider it to be running. While ptracer is listening,
    tracee is allowed to re-enter STOP to notify an async event.
    Listening state is cleared on the first notification. Ptracer can
    also clear it by issuing INTERRUPT - tracee will re-trap into STOP
    with listening state cleared.

    This allows ptracer to monitor group stop state without running tracee
    - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
    wait(2) to wait for the next group stop event. When it happens,
    PTRACE_GETSIGINFO provides information to determine the current state.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207
    #define PTRACE_LISTEN 0x4208

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
    pid_t tracee, tracer;
    int i;

    tracee = fork();
    if (!tracee)
    while (1)
    pause();

    tracer = fork();
    if (!tracer) {
    siginfo_t si;

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
    repeat:
    waitid(P_PID, tracee, NULL, WSTOPPED);

    ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
    if (!si.si_code) {
    printf("tracer: SIG %d\n", si.si_signo);
    ptrace(PTRACE_CONT, tracee, NULL,
    (void *)(unsigned long)si.si_signo);
    goto repeat;
    }
    printf("tracer: stopped=%d signo=%d\n",
    si.si_signo != SIGTRAP, si.si_signo);
    if (si.si_signo != SIGTRAP)
    ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
    else
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    goto repeat;
    }

    for (i = 0; i < 3; i++) {
    nanosleep(&ts1s, NULL);
    printf("mother: SIGSTOP\n");
    kill(tracee, SIGSTOP);
    nanosleep(&ts1s, NULL);
    printf("mother: SIGCONT\n");
    kill(tracee, SIGCONT);
    }
    nanosleep(&ts1s, NULL);

    kill(tracer, SIGKILL);
    kill(tracee, SIGKILL);
    return 0;
    }

    This is identical to the program to test TRAP_NOTIFY except that
    tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
    This allows ptracer to monitor when group stop ends without running
    tracee.

    # ./test-listen
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
    task_stopped_code() as suggested by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently, there's no way to trap a running ptracee short of sending a
    signal which has various side effects. This patch implements
    PTRACE_INTERRUPT which traps ptracee without any signal or job control
    related side effect.

    The implementation is almost trivial. It uses the group stop trap -
    SIGTRAP | PTRACE_EVENT_STOP << 8. A new trap flag
    JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
    cleared when any trap happens. As INTERRUPT should be useable
    regardless of the current state of tracee, task_is_traced() test in
    ptrace_check_attach() is skipped for INTERRUPT.

    PTRACE_INTERRUPT is available iff tracee is attached with
    PTRACE_SEIZE.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
    pid_t tracee;

    tracee = fork();
    if (tracee == 0) {
    nanosleep(&ts100ms, NULL);
    while (1) {
    printf("tracee: alive pid=%d\n", getpid());
    nanosleep(&ts1s, NULL);
    }
    }

    if (argc > 1)
    kill(tracee, SIGSTOP);

    nanosleep(&ts100ms, NULL);

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    if (argc > 1) {
    waitid(P_PID, tracee, NULL, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    }
    nanosleep(&ts3s, NULL);

    printf("tracer: INTERRUPT and DETACH\n");
    ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
    waitid(P_PID, tracee, NULL, WSTOPPED);
    ptrace(PTRACE_DETACH, tracee, NULL, NULL);
    nanosleep(&ts3s, NULL);

    printf("tracer: exiting\n");
    kill(tracee, SIGKILL);
    return 0;
    }

    When called without argument, tracee is seized from running state,
    interrupted and then detached back to running state.

    # ./test-interrupt
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: INTERRUPT and DETACH
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: exiting

    When called with argument, tracee is seized from stopped state,
    continued, interrupted and then detached back to stopped state.

    # ./test-interrupt 1
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracer: INTERRUPT and DETACH
    tracer: exiting

    Before PTRACE_INTERRUPT, once the tracee was running, there was no way
    to trap tracee and do PTRACE_DETACH without causing side effect.

    -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
    up scheduling TRAP_STOP if child is dying which may make the
    child unkillable. Spotted by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
    effects on tracee signal and job control states. This patch
    implements a new ptrace request PTRACE_SEIZE which attaches a tracee
    without trapping it or affecting its signal and job control states.

    The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
    flags in @data. Currently, the only defined flag is
    PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
    PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
    The changes will be implemented gradually and the DEVEL flag is to
    prevent programs which expect full SEIZE behavior from using it before
    all the behavior modifications are complete while allowing unit
    testing. The flag will be removed once SEIZE behaviors are completely
    implemented.

    * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap. After
    attaching tracee continues to run unless a trap condition occurs.

    * PTRACE_SEIZE doesn't affect signal or group stop state.

    * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
    exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
    the stopping signals if group stop is in effect or SIGTRAP
    otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
    instead of NULL.

    Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
    used to determine whether new SEIZE behaviors should be enabled.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
    pid_t tracee;

    tracee = fork();
    if (tracee == 0) {
    nanosleep(&ts100ms, NULL);
    while (1) {
    printf("tracee: alive\n");
    nanosleep(&ts1s, NULL);
    }
    }

    if (argc > 1)
    kill(tracee, SIGSTOP);

    nanosleep(&ts100ms, NULL);

    ptrace(PTRACE_SEIZE, tracee, NULL,
    (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
    if (argc > 1) {
    waitid(P_PID, tracee, NULL, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, NULL);
    }
    nanosleep(&ts3s, NULL);
    printf("tracer: exiting\n");
    return 0;
    }

    When the above program is called w/o argument, tracee is seized while
    running and remains running. When tracer exits, tracee continues to
    run and print out messages.

    # ./test-seize-simple
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    tracee: alive
    tracee: alive

    When called with an argument, tracee is seized from stopped state and
    continued, and returns to stopped state when tracer exits.

    # ./test-seize
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    # ps -el|grep test-seize
    1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

    -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
    suggested.

    -v3: PTRACE_EVENT_STOP traps now report group stop state by signr. If
    group stop is in effect the stop signal number is returned as
    part of exit_code; otherwise, SIGTRAP. This was suggested by
    Denys and Oleg.

    Signed-off-by: Tejun Heo
    Cc: Jan Kratochvil
    Cc: Denys Vlasenko
    Cc: Oleg Nesterov

    Tejun Heo
     
  • do_signal_stop() implemented both normal group stop and trap for group
    stop while ptraced. This approach has been enough but scheduled
    changes require trap mechanism which can be used in more generic
    manner and using group stop trap for generic trap site simplifies both
    userland visible interface and implementation.

    This patch adds a new jobctl flag - JOBCTL_TRAP_STOP. When set, it
    triggers a trap site, which behaves like group stop trap, in
    get_signal_to_deliver() after checking for pending signals. While
    ptraced, do_signal_stop() doesn't stop itself. It initiates group
    stop if requested and schedules JOBCTL_TRAP_STOP and returns. The
    caller - get_signal_to_deliver() - is responsible for checking whether
    TRAP_STOP is pending afterwards and handling it.

    ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
    JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
    bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
    after detach.

    While at it, add proper function comment to do_signal_stop() and make
    it return bool.

    -v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
    instead of JOBCTL_PENDING_MASK. This avoids accidentally
    clearing JOBCTL_STOP_CONSUME. Spotted by Oleg.

    -v3: do_signal_stop() updated to return %false without dropping
    siglock while ptraced and TRAP_STOP check moved inside for(;;)
    loop after group stop participation. This avoids unnecessary
    relocking and also will help avoiding unnecessary traps by
    consuming group stop before handling pending traps.

    -v4: Jobctl trap handling moved into a separate function -
    do_jobctl_trap().

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     

05 Jun, 2011

5 commits

  • ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
    ->wait_chldexit was already complicated with waker-side filtering
    without adding TRAPPING wait on top of it. Also, it unnecessarily
    made TRAPPING clearing depend on the current ptrace relationship - if
    the ptracee is detached, wakeup is lost.

    There is no reason to use signal->wait_chldexit here. We're just
    waiting for JOBCTL_TRAPPING bit to clear and given the relatively
    infrequent use of ptrace, bit_waitqueue can serve it perfectly.

    This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
    signal->wait_chldexit.

    -v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
    pending bits too. Setting pending conditions on a dying task may make
    the task unkillable. Currently, each setting site is responsible for
    checking for the condition but with to-be-added job control traps this
    becomes too fragile.

    This patch adds task_set_jobctl_pending() which should be used when
    setting task->jobctl bits to schedule a stop or trap. The function
    performs the followings to ease setting pending bits.

    * Sanity checks.

    * If fatal signal is pending or PF_EXITING is set, no bit is set.

    * STOP_SIGMASK is automatically cleared if new value is being set.

    do_signal_stop() and ptrace_attach() are updated to use
    task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
    The surrounding structures around setting are changed to fit
    task_set_jobctl_pending() better but there should be no userland
    visible behavior difference.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_INTERRUPT is going to be added which should also skip
    task_is_traced() check in ptrace_check_attach(). Rename @kill to
    @ignore_state and make it bool. Add function comment while at it.

    This patch doesn't introduce any behavior difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • signal->group_stop currently hosts mostly group stop related flags;
    however, it's gonna be used for wider purposes and the GROUP_STOP_
    flag prefix becomes confusing. Rename signal->group_stop to
    signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

    Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
    defined in terms of them to allow using bitops later.

    While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
    future additions.

    This doesn't cause any functional change.

    -v2: JOBCTL_*_BIT macros added as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • Remove local variable wait_trap which determines whether to wait for
    !TRAPPING or not and simply wait for it if attach was successful.

    -v2: Oleg pointed out wait should happen iff attach was successful.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

26 May, 2011

1 commit

  • It is not clear why ptrace_resume() does wake_up_process(). Unless the
    caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
    wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
    not need the extra and potentionally spurious wakeup.

    If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
    The tracee can sleep in any state in any place, and if we have a buggy
    code which doesn't handle a spurious wakeup correctly PTRACE_KILL can
    be used to exploit it. For example:

    int main(void)
    {
    int child, status;

    child = fork();
    if (!child) {
    int ret;

    assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);

    ret = pause();
    printf("pause: %d %m\n", ret);

    return 0x23;
    }

    sleep(1);
    assert(ptrace(PTRACE_KILL, child, 0,0) == 0);

    assert(child == wait(&status));
    printf("wait: %x\n", status);

    return 0;
    }

    prints "pause: -1 Unknown error 514", -ERESTARTNOHAND leaks to the
    userland. In this case sys_pause() is buggy as well and should be
    fixed.

    I do not know what was the original rationality behind PTRACE_KILL.
    The man page is simply wrong and afaics it was always wrong. Imho
    it should be deprecated, or may be it should do send_sig(SIGKILL)
    as Denys suggests, but in any case I do not think that the current
    behaviour was intentional.

    Note: there is another problem, ptrace_resume() changes ->exit_code
    and this can race with SIGKILL too. Eventually we should change ptrace
    to not use ->exit_code.

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     

21 May, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (41 commits)
    signal: trivial, fix the "timespec declared inside parameter list" warning
    job control: reorganize wait_task_stopped()
    ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
    signal: sys_sigprocmask() needs retarget_shared_pending()
    signal: cleanup sys_sigprocmask()
    signal: rename signandsets() to sigandnsets()
    signal: do_sigtimedwait() needs retarget_shared_pending()
    signal: introduce do_sigtimedwait() to factor out compat/native code
    signal: sys_rt_sigtimedwait: simplify the timeout logic
    signal: cleanup sys_rt_sigprocmask()
    x86: signal: sys_rt_sigreturn() should use set_current_blocked()
    x86: signal: handle_signal() should use set_current_blocked()
    signal: sigprocmask() should do retarget_shared_pending()
    signal: sigprocmask: narrow the scope of ->siglock
    signal: retarget_shared_pending: optimize while_each_thread() loop
    signal: retarget_shared_pending: consider shared/unblocked signals only
    signal: introduce retarget_shared_pending()
    ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/
    signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED
    signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending()
    ...

    Linus Torvalds
     

25 Apr, 2011

1 commit

  • When a task is traced and is in a stopped state, the tracer
    may execute a ptrace request to examine the tracee state and
    get its task struct. Right after, the tracee can be killed
    and thus its breakpoints released.
    This can happen concurrently when the tracer is in the middle
    of reading or modifying these breakpoints, leading to dereferencing
    a freed pointer.

    Hence, to prepare the fix, create a generic breakpoint reference
    holding API. When a reference on the breakpoints of a task is
    held, the breakpoints won't be released until the last reference
    is dropped. After that, no more ptrace request on the task's
    breakpoints can be serviced for the tracer.

    Reported-by: Oleg Nesterov
    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: Prasad
    Cc: Paul Mundt
    Cc: v2.6.33..
    Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com

    Frederic Weisbecker
     

08 Apr, 2011

1 commit


04 Apr, 2011

1 commit

  • After "ptrace: Clean transitions between TASK_STOPPED and TRACED"
    d79fdd6d96f46fabb779d86332e3677c6f5c2a4f, ptrace_check_attach()
    should never see a TASK_STOPPED tracee and s/STOPPED/TRACED/ is
    no longer legal. Add the warning.

    Note: ptrace_check_attach() can be greatly simplified, in particular
    it doesn't need tasklist. But I'd prefer another patch for that.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     

24 Mar, 2011

1 commit

  • ptrace is allowed to tasks in the same user namespace according to the
    usual rules (i.e. the same rules as for two tasks in the init user
    namespace). ptrace is also allowed to a user namespace to which the
    current task the has CAP_SYS_PTRACE capability.

    Changelog:
    Dec 31: Address feedback by Eric:
    . Correct ptrace uid check
    . Rename may_ptrace_ns to ptrace_capable
    . Also fix the cap_ptrace checks.
    Jan 1: Use const cred struct
    Jan 11: use task_ns_capable() in place of ptrace_capable().
    Feb 23: same_or_ancestore_user_ns() was not an appropriate
    check to constrain cap_issubset. Rather, cap_issubset()
    only is meaningful when both capsets are in the same
    user_ns.

    Signed-off-by: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

23 Mar, 2011

4 commits

  • Currently, __ptrace_unlink() wakes up the tracee iff it's in
    TASK_TRACED. For unlinking from PTRACE_DETACH, this is correct as the
    tracee is guaranteed to be in TASK_TRACED or dead; however, unlinking
    also happens when the ptracer exits and in this case the ptracee can
    be in any state and ptrace might be left running even if the group it
    belongs to is stopped.

    This patch updates __ptrace_unlink() such that GROUP_STOP_PENDING is
    reinstated regardless of the ptracee's current state as long as it's
    alive and makes sure that signal_wake_up() is called if execution
    state transition is necessary.

    Test case follows.

    #include
    #include
    #include
    #include
    #include

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(void)
    {
    pid_t tracee;
    siginfo_t si;

    tracee = fork();
    if (tracee == 0) {
    while (1) {
    nanosleep(&ts1s, NULL);
    write(1, ".", 1);
    }
    }

    ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    waitid(P_PID, tracee, &si, WSTOPPED);
    ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
    write(1, "exiting", 7);
    return 0;
    }

    Before the patch, after the parent process exits, the child is left
    running and prints out "." every second.

    exiting..... (continues)

    After the patch, the group stop initiated by the implied SIGSTOP from
    PTRACE_ATTACH is re-established when the parent exits.

    exiting

    Signed-off-by: Tejun Heo
    Reported-by: Oleg Nesterov
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Remove the extra task_is_traced() check in __ptrace_unlink() and
    collapse ptrace_untrace() into __ptrace_unlink(). This is to prepare
    for further changes.

    While at it, drop the comment on top of ptrace_untrace() and convert
    __ptrace_unlink() comment to docbook format. Detailed comment will be
    added by the next patch.

    This patch doesn't cause any visible behavior changes.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov

    Tejun Heo
     
  • Currently, if the task is STOPPED on ptrace attach, it's left alone
    and the state is silently changed to TRACED on the next ptrace call.
    The behavior breaks the assumption that arch_ptrace_stop() is called
    before any task is poked by ptrace and is ugly in that a task
    manipulates the state of another task directly.

    With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
    TRACED can be made clean. The tracer can use the flag to tell the
    tracee to retry stop on attach and detach. On retry, the tracee will
    enter the desired state in the correct way. The lower 16bits of
    task->group_stop is used to remember the signal number which caused
    the last group stop. This is used while retrying for ptrace attach as
    the original group_exit_code could have been consumed with wait(2) by
    then.

    As the real parent may wait(2) and consume the group_exit_code
    anytime, the group_exit_code needs to be saved separately so that it
    can be used when switching from regular sleep to ptrace_stop(). This
    is recorded in the lower 16bits of task->group_stop.

    If a task is already stopped and there's no intervening SIGCONT, a
    ptrace request immediately following a successful PTRACE_ATTACH should
    always succeed even if the tracer doesn't wait(2) for attach
    completion; however, with this change, the tracee might still be
    TASK_RUNNING trying to enter TASK_TRACED which would cause the
    following request to fail with -ESRCH.

    This intermediate state is hidden from the ptracer by setting
    GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
    for it to clear on its signal->wait_chldexit. Completing the
    transition or getting killed clears TRAPPING and wakes up the tracer.

    Note that the STOPPED -> RUNNING -> TRACED transition is still visible
    to other threads which are in the same group as the ptracer and the
    reverse transition is visible to all. Please read the comments for
    details.

    Oleg:

    * Spotted a race condition where a task may retry group stop without
    proper bookkeeping. Fixed by redoing bookkeeping on retry.

    * Spotted that the transition is visible to userland in several
    different ways. Most are fixed with GROUP_STOP_TRAPPING. Unhandled
    corner case is documented.

    * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
    task would result in more consistent behavior.

    * Pointed out that calling ptrace_stop() from do_signal_stop() in
    TASK_STOPPED can race with group stop start logic and then confuse
    the TRAPPING wait in ptrace_check_attach(). ptrace_stop() is now
    called with TASK_RUNNING.

    * Suggested using signal->wait_chldexit instead of bit wait.

    * Spotted a race condition between TRACED transition and clearing of
    TRAPPING.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Jan Kratochvil

    Tejun Heo
     
  • This wake_up_state() has a turbulent history. This is a remnant from
    ancient ptrace implementation and patently wrong. Commit 95a3540d
    (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
    it but the change was reverted later by commit edaba2c5 (ptrace:
    revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
    citing compatibility breakage and general brokeness of the whole group
    stop / ptrace interaction. Then, recently, it got converted from
    wake_up_process() to wake_up_state() to make it less dangerous.

    Digging through the mailing archives, the compatibility breakage
    doesn't seem to be critical in the sense that the behavior isn't well
    defined or reliable to begin with and it seems to have been agreed to
    remove the wakeup with proper cleanup of the whole thing.

    Now that the group stop and its interaction with ptrace are being
    cleaned up, it's high time to finally kill this silliness.

    Signed-off-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath

    Tejun Heo
     

05 Mar, 2011

1 commit


12 Feb, 2011

1 commit

  • The wake_up_process() call in ptrace_detach() is spurious and not
    interlocked with the tracee state. IOW, the tracee could be running or
    sleeping in any place in the kernel by the time wake_up_process() is
    called. This can lead to the tracee waking up unexpectedly which can be
    dangerous.

    The wake_up is spurious and should be removed but for now reduce its
    toxicity by only waking up if the tracee is in TRACED or STOPPED state.

    This bug can possibly be used as an attack vector. I don't think it
    will take too much effort to come up with an attack which triggers oops
    somewhere. Most sleeps are wrapped in condition test loops and should
    be safe but we have quite a number of places where sleep and wakeup
    conditions are expected to be interlocked. Although the window of
    opportunity is tiny, ptrace can be used by non-privileged users and with
    some loading the window can definitely be extended and exploited.

    Signed-off-by: Tejun Heo
    Acked-by: Roland McGrath
    Acked-by: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

28 Oct, 2010

4 commits

  • Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
    itself and we can reuse ->cred_guard_mutex for it. Yes, concurrent
    execve() has no worth.

    Let's move ->cred_guard_mutex from task_struct to signal_struct. It
    naturally prevent multiple-threads-inside-exec.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Use new 'datavp' and 'datalp' variables to remove unnecesary castings.

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Since userspace API of ptrace syscall defines @addr and @data as void
    pointers, it would be more appropriate to define them as unsigned long in
    kernel. Therefore related functions are changed also.

    'unsigned long' is typically used in other places in kernel as an opaque
    data type and that using this helps cleaning up a lot of warnings from
    sparse.

    Suggested-by: Arnd Bergmann
    Signed-off-by: Namhyung Kim
    Acked-by: Arnd Bergmann
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • exit_ptrace() releases and regrabs tasklist_lock but was missing proper
    annotation. Add it.

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     

11 Aug, 2010

1 commit

  • exit_ptrace() takes tasklist_lock unconditionally. We need this lock to
    avoid the race with ptrace_traceme(), it acts as a barrier.

    Change its caller, forget_original_parent(), to call exit_ptrace() under
    tasklist_lock. Change exit_ptrace() to drop and reacquire this lock if
    needed.

    This allows us to add the fastpath list_empty(ptraced) check. In the
    likely no-tracees case exit_ptrace() just returns and we avoid the lock()
    + unlock() sequence.

    "Zhang, Yanmin" suggested to add this
    check, and he reports that this change adds about 11% improvement in some
    tests.

    Suggested-and-tested-by: "Zhang, Yanmin"
    Signed-off-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

28 May, 2010

2 commits

  • Now that Mike Frysinger unified the FDPIC ptrace code, we can fix the
    unsafe usage of child->mm in ptrace_request(PTRACE_GETFDPIC).

    We have the reference to task_struct, and ptrace_check_attach() verified
    the tracee is stopped. But nothing can protect from SIGKILL after that,
    we must not assume child->mm != NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Mike Frysinger
    Acked-by: David Howells
    Cc: Paul Mundt
    Cc: Greg Ungerer
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The Blackfin/FRV/SuperH guys all have the same exact FDPIC ptrace code in
    their arch handlers (since they were probably copied & pasted). Since
    these ptrace interfaces are an arch independent aspect of the FDPIC code,
    unify them in the common ptrace code so new FDPIC ports don't need to copy
    and paste this fundamental stuff yet again.

    Signed-off-by: Mike Frysinger
    Acked-by: Roland McGrath
    Acked-by: David Howells
    Acked-by: Paul Mundt
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

18 May, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
    perf tools: Add mode to build without newt support
    perf symbols: symbol inconsistency message should be done only at verbose=1
    perf tui: Add explicit -lslang option
    perf options: Type check all the remaining OPT_ variants
    perf options: Type check OPT_BOOLEAN and fix the offenders
    perf options: Check v type in OPT_U?INTEGER
    perf options: Introduce OPT_UINTEGER
    perf tui: Add workaround for slang < 2.1.4
    perf record: Fix bug mismatch with -c option definition
    perf options: Introduce OPT_U64
    perf tui: Add help window to show key associations
    perf tui: Make <- exit menus too
    perf newt: Add single key shortcuts for zoom into DSO and threads
    perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
    perf newt: Fix the 'A'/'a' shortcut for annotate
    perf newt: Make <- exit the ui_browser
    x86, perf: P4 PMU - fix counters management logic
    perf newt: Make <- zoom out filters
    perf report: Report number of events, not samples
    perf hist: Clarify events_stats fields usage
    ...

    Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c

    Linus Torvalds
     

27 Apr, 2010

1 commit

  • BKL isn't present anymore into this file thus we can safely remove
    smp_lock.h inclusion.

    Signed-off-by: Alessio Igor Bogani
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: James Morris
    Cc: Ingo Molnar
    Signed-off-by: Frederic Weisbecker

    Alessio Igor Bogani
     

10 Apr, 2010

1 commit

  • The comment suggests that this usage is stale. There is no bkl in the
    exec path so if there is a race lurking there, the bkl in ptrace is
    not going to help in this regard.

    Overview of the possibility of "accidental" races this bkl might
    protect:

    - ptrace_traceme() is protected against task removal and concurrent
    read/write on current->ptrace as it locks write tasklist_lock.

    - arch_ptrace_attach() is serialized by ptrace_traceme() against
    concurrent PTRACE_TRACEME or PTRACE_ATTACH

    - ptrace_attach() is protected the same way ptrace_traceme() and
    in turn serializes arch_ptrace_attach()

    - ptrace_check_attach() does its own well described serializing too.

    There is no obvious race here.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Frederic Weisbecker
    Acked-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Cc: Andrew Morton
    Cc: Roland McGrath

    Arnd Bergmann
     

26 Mar, 2010

1 commit

  • Support for the PMU's BTS features has been upstreamed in
    v2.6.32, but we still have the old and disabled ptrace-BTS,
    as Linus noticed it not so long ago.

    It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
    regard for other uses (perf) and doesn't provide the flexibility
    needed for perf either.

    Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
    was never used and ptrace-block-step can be implemented using a
    much simpler approach.

    So axe all 3000 lines of it. That includes the *locked_memory*()
    APIs in mm/mlock.c as well.

    Reported-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Markus Metzger
    Cc: Steven Rostedt
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Feb, 2010

1 commit


12 Feb, 2010

1 commit

  • Generic support for PTRACE_GETREGSET/PTRACE_SETREGSET commands which
    export the regsets supported by each architecture using the correponding
    NT_* types. These NT_* types are already part of the userland ABI, used
    in representing the architecture specific register sets as different NOTES
    in an ELF core file.

    'addr' parameter for the ptrace system call encode the REGSET type (using
    the corresppnding NT_* type) and the 'data' parameter points to the
    struct iovec having the user buffer and the length of that buffer.

    struct iovec iov = { buf, len};
    ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);

    On successful completion, iov.len will be updated by the kernel specifying
    how much the kernel has written/read to/from the user's iov.buf.

    x86 extended state registers are primarily exported using this interface.

    Signed-off-by: Suresh Siddha
    LKML-Reference:
    Acked-by: Hongjiu Lu
    Cc: Roland McGrath
    Signed-off-by: H. Peter Anvin

    Suresh Siddha
     

24 Sep, 2009

1 commit

  • The bug is old, it wasn't cause by recent changes.

    Test case:

    static void *tfunc(void *arg)
    {
    int pid = (long)arg;

    assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0);
    kill(pid, SIGKILL);

    sleep(1);
    return NULL;
    }

    int main(void)
    {
    pthread_t th;
    long pid = fork();

    if (!pid)
    pause();

    signal(SIGCHLD, SIG_IGN);
    assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0);

    int r = waitpid(-1, NULL, __WNOTHREAD);
    printf("waitpid: %d %m\n", r);

    return 0;
    }

    Before the patch this program hangs, after this patch waitpid() correctly
    fails with errno == -ECHILD.

    The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its
    ->real_parent is our sub-thread and we ignore SIGCHLD. But in this case
    we should wake up other threads which can sleep in do_wait().

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Vitaly Mayatskikh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

13 Jul, 2009

1 commit