17 Apr, 2015

2 commits

  • ptrace_detach() re-checks ->ptrace under tasklist lock and calls
    release_task() if __ptrace_detach() returns true. This was needed because
    the __TASK_TRACED tracee could be killed/untraced, and it could even pass
    exit_notify() before we take tasklist_lock.

    But this is no longer possible after 9899d11f6544 "ptrace: ensure
    arch_ptrace/ptrace_request can never race with SIGKILL". We can turn
    these checks into WARN_ON() and remove release_task().

    While at it, document the setting of child->exit_code.

    Signed-off-by: Oleg Nesterov
    Cc: Pavel Labath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
    tracee->exit_code and then wake_up_state() changes tracee->state. If the
    tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
    wrongly looks like another report from tracee.

    This confuses debugger, and since wait_task_stopped() clears ->exit_code
    the tracee can miss a signal.

    Test-case:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <pthread.h>
    #include <assert.h>

    int pid;

    void *waiter(void *arg)
    {
            int stat;

            for (;;) {
                    assert(pid == wait(&stat));
                    assert(WIFSTOPPED(stat));
                    if (WSTOPSIG(stat) == SIGHUP)
                            continue;

                    assert(WSTOPSIG(stat) == SIGCONT);
                    printf("ERR! extra/wrong report:%x\n", stat);
            }
    }

    int main(void)
    {
            pthread_t thread;

            pid = fork();
            if (!pid) {
                    assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                    for (;;)
                            kill(getpid(), SIGHUP);
            }

            assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

            for (;;)
                    ptrace(PTRACE_CONT, pid, 0, SIGCONT);

            return 0;
    }

    Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
    ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
    should use lock_task_sighand(child).

    Signed-off-by: Oleg Nesterov
    Reported-by: Pavel Labath
    Tested-by: Pavel Labath
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

18 Feb, 2015

1 commit


11 Dec, 2014

1 commit

  • Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks,
    we can simply pass "dead_children" list to exit_ptrace() and remove
    another release_task() loop. Plus this way we do not need to drop and
    reacquire tasklist_lock.

    Also shift the list_empty(ptraced) check, if we want this optimization it
    makes sense to eliminate the function call altogether.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{,_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    substitute their own error code as appropriate.
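
    The caller-side pattern described above can be sketched in plain C as
    follows. This is only a userspace model, not the kernel code:
    demo_wait_on_bit(), demo_caller(), the demo_mode enum and the
    signal_pending flag are hypothetical stand-ins, and -EINTR is just one
    error code a caller might choose.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_mode { DEMO_UNINTERRUPTIBLE, DEMO_INTERRUPTIBLE };

/*
 * Hypothetical stand-in for the new wait_on_bit(): returns 0 once the bit
 * is clear, or a non-zero value (not a specific errno) if the wait was
 * abandoned because a signal is pending and the mode allows interruption.
 */
int demo_wait_on_bit(unsigned long *word, int bit, enum demo_mode mode,
                     bool signal_pending)
{
    while (*word & (1UL << bit)) {
        if (mode == DEMO_INTERRUPTIBLE && signal_pending)
            return 1;
        *word &= ~(1UL << bit); /* pretend the bit owner released it */
    }
    return 0;
}

/* Caller pattern: test for non-zero and supply its own error code. */
int demo_caller(unsigned long *word, enum demo_mode mode, bool signal_pending)
{
    if (demo_wait_on_bit(word, 0, mode, signal_pending))
        return -EINTR; /* the caller's choice, not dictated by the helper */
    return 0;
}
```

    In this model the mode argument alone decides whether a pending signal
    aborts the wait, which mirrors the move away from encoding that decision
    in an action function.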

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible".

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps'. (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack. So
    the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15) two new action
    functions appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

06 Mar, 2014

1 commit

  • Convert all compat system call functions where all parameter types
    have a size of four or less than four bytes, or are pointer types
    to COMPAT_SYSCALL_DEFINE.
    The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
    zero and sign extension to 64 bit of all parameters if needed.
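
    To illustrate what "proper zero and sign extension" means here, the
    sketch below widens a 32-bit argument held in a 64-bit register whose
    upper half may contain garbage. It only models the effect of the casts;
    the actual COMPAT_SYSCALL_DEFINE expansion is not reproduced, and the
    demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A signed 32-bit argument widened to 64 bits keeps its sign. The cast to
 * int32_t of a value above INT32_MAX is implementation-defined in older C
 * standards; mainstream compilers wrap as expected.
 */
int64_t demo_sign_extend(uint64_t raw_reg)
{
    return (int64_t)(int32_t)(uint32_t)raw_reg;
}

/* An unsigned 32-bit argument widened to 64 bits gets zeroed upper bits. */
uint64_t demo_zero_extend(uint64_t raw_reg)
{
    return (uint64_t)(uint32_t)raw_reg;
}
```

    In both helpers the upper 32 bits of the incoming register value are
    ignored, so stale data there cannot leak into the parameter.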

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

13 Nov, 2013

1 commit

  • The get_dumpable() return value is not boolean. Most users of the
    function actually want to be testing for non-SUID_DUMP_USER(1) rather than
    SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
    protected state. Almost all places did this correctly, excepting the two
    places fixed in this patch.

    Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
    or
    if (dumpable == 0) { /* be protective */ }
    or
    if (!dumpable) { /* be protective */ }

    Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
    or
    if (dumpable != 1) { /* be protective */ }

    Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
    user was able to ptrace attach to processes that had dropped privileges to
    that user. (This may have been partially mitigated if Yama was enabled.)
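
    The difference between the wrong and the correct test can be shown with
    a small standalone sketch. The three values are the ones named in the
    text above; the DEMO_ prefix and the helper names are hypothetical and
    exist only to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as named in the commit message above. */
enum {
    DEMO_SUID_DUMP_DISABLE = 0, /* protected */
    DEMO_SUID_DUMP_USER    = 1, /* dumpable by the owning user */
    DEMO_SUID_DUMP_ROOT    = 2, /* dumpable by root only: still protected */
};

/* Buggy form: treats only 0 as protected, so 2 slips through. */
bool demo_is_protected_buggy(int dumpable)
{
    return dumpable == DEMO_SUID_DUMP_DISABLE;
}

/* Fixed form: anything other than SUID_DUMP_USER is protected. */
bool demo_is_protected(int dumpable)
{
    return dumpable != DEMO_SUID_DUMP_USER;
}
```

    With the buggy form a task in the SUID_DUMP_ROOT state is reported as
    unprotected, which corresponds to the suid_dumpable=2 case described
    above.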

    The macros have been moved into the file that declares get/set_dumpable(),
    which means things like the ia64 code can see them too.

    CVE-2013-2929

    Reported-by: Vasily Kulikov
    Signed-off-by: Kees Cook
    Cc: "Luck, Tony"
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

12 Sep, 2013

1 commit

  • __ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
    current, this can lead to surprising results.

    For example, a sub-thread can't readlink("/proc/self/exe") if the
    executable is not readable. setup_new_exec()->would_dump() notices that
    inode_permission(MAY_READ) fails and then it does
    set_dumpable(suid_dumpable). After that get_dumpable() fails.

    (It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
    could add PTRACE_MODE_NODUMPABLE)

    Change __ptrace_may_access() to use same_thread_group() instead of "task
    == current". Any security check is pointless when the tasks share the
    same ->mm.
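
    A toy model of the two checks, assuming that threads of one process can
    be recognised by a shared per-process pointer. The demo_task and
    demo_signal types are hypothetical and far simpler than the kernel's
    task_struct.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_signal { int unused; }; /* stands in for shared per-process state */

struct demo_task {
    int pid;
    struct demo_signal *signal; /* shared by all threads of one process */
};

/* Old check: only the calling thread itself skips the security checks. */
bool demo_skip_checks_old(const struct demo_task *task,
                          const struct demo_task *current)
{
    return task == current;
}

/* New check: any thread of the caller's own process skips them. */
bool demo_skip_checks_new(const struct demo_task *task,
                          const struct demo_task *current)
{
    return task->signal == current->signal;
}
```

    Under the old check a sub-thread inspecting its own group leader is
    treated like a foreign task, which is the readlink("/proc/self/exe")
    surprise described above.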

    Signed-off-by: Mark Grondona
    Signed-off-by: Ben Woodard
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Grondona
     

07 Aug, 2013

1 commit

  • This reverts commit fab840fc2d542fabcab903db8e03589a6702ba5f.

    This commit even has the test-case to prove that the tracee
    can be killed by SIGTRAP if the debugger does not remove the
    breakpoints before PTRACE_DETACH.

    However, this is exactly what wineserver deliberately does,
    set_thread_context() calls PTRACE_ATTACH + PTRACE_DETACH just
    for PTRACE_POKEUSER(DR*) in between.

    So we should revert this fix and document that PTRACE_DETACH
    should keep the breakpoints.

    Reported-by: Felipe Contreras
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

10 Jul, 2013

2 commits

  • Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child). This
    frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
    ensures that the tracee won't be killed by SIGTRAP triggered by the
    active breakpoints.

    Test-case:

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
            unsigned long dr7;

            dr7 = ((len | type) & 0xf)
                    << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
            if (enable)
                    dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

            return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
            return ptrace(PTRACE_POKEUSER, pid,
                            offsetof (struct user, u_debugreg[dr]),
                            val);
    }

    void func(void)
    {
    }

    int main(void)
    {
            int pid, stat;
            unsigned long dr7;

            pid = fork();
            if (!pid) {
                    assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                    kill(getpid(), SIGHUP);

                    func();
                    return 0x13;
            }

            assert(pid == waitpid(-1, &stat, 0));
            assert(WSTOPSIG(stat) == SIGHUP);

            assert(write_dr(pid, 0, (long)func) == 0);
            dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
            assert(write_dr(pid, 7, dr7) == 0);

            assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
            assert(pid == waitpid(-1, &stat, 0));
            assert(stat == 0x1300);

            return 0;
    }

    Before this patch the child is killed after PTRACE_DETACH.

    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Ingo Molnar
    Cc: Jan Kratochvil
    Cc: Michael Neuling
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Prasad
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This reverts commit bf26c018490c ("Prepare to fix racy accesses on task
    breakpoints").

    The patch was fine but we can no longer race with SIGKILL after commit
    9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request can never race
    with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
    ->ptrace_bps[] can't go away.

    Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
    we can kill them and remove task->ptrace_bp_refcnt.

    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Acked-by: Michael Neuling
    Cc: Benjamin Herrenschmidt
    Cc: Ingo Molnar
    Cc: Jan Kratochvil
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Prasad
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jul, 2013

1 commit

  • crtools uses parasite code for dumping processes. The parasite code is
    injected into a process with the help of PTRACE_SEIZE.

    Currently crtools blocks signals from the parasite code. If a process has
    pending signals, crtools waits while the process handles these signals.

    This method is not suitable for stopped tasks. A stopped task can have a
    few pending signals. When we try to execute the parasite code, we need to
    drop SIGSTOP, but all other signals must remain pending, because the
    state of processes must not be changed during checkpointing.

    This patch adds two ptrace commands to set/get the signal-blocked mask.

    I think gdb can use these commands too.

    [akpm@linux-foundation.org: be consistent with brace layout]
    Signed-off-by: Andrey Vagin
    Reviewed-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Michael Kerrisk
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

30 Jun, 2013

1 commit

  • This __put_user() could be used by unprivileged processes to write into
    kernel memory. The issue here is that even if copy_siginfo_to_user()
    fails, the error code is not checked before __put_user() is executed.

    Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
    so it has not hit a stable release yet.

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Roland McGrath
    Cc: Paul McKenney
    Cc: David Howells
    Cc: Dave Jones
    Cc: Pavel Emelyanov
    Cc: Pedro Alves
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

01 May, 2013

1 commit

  • This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

    This request is used to retrieve information about pending signals
    starting with the specified sequence number. Siginfo_t structures are
    copied from the child into the buffer starting at "data".

    The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
    struct ptrace_peeksiginfo_args {
    u64 off; /* from which siginfo to start */
    u32 flags;
    s32 nr; /* how many siginfos to take */
    };

    "nr" has type "s32", because ptrace() returns "long", which has 32 bits on
    i386 and negative values are used for errors.

    Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
    signals from the process-wide queue. If this flag is not set, signals are
    read from the per-thread queue.

    The request PTRACE_PEEKSIGINFO returns the number of dumped signals. If a
    signal with the specified sequence number doesn't exist, ptrace returns
    zero. The request returns an error if no signal has been dumped.

    Errors:
    EINVAL - one or more specified flags are not supported or nr is negative
    EFAULT - buf or addr is outside your accessible address space.
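
    The offset/count semantics described above can be modelled without
    calling ptrace() at all. In the sketch below the struct mirrors the
    field layout quoted in the text, while demo_peek(), the integer "queue"
    of signal numbers and the flag value are assumptions made purely for
    illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same field layout as the struct quoted in the commit message. */
struct demo_peeksiginfo_args {
    uint64_t off;   /* from which siginfo to start */
    uint32_t flags;
    int32_t nr;     /* how many siginfos to take */
};

#define DEMO_PEEKSIGINFO_SHARED 0x1u /* assumed value, illustration only */

/*
 * Toy model: 'queue' holds the signal numbers pending on one queue.
 * Copies up to args->nr entries starting at args->off into 'out' and
 * returns how many were copied, 0 if 'off' is past the end of the queue,
 * or -EINVAL for unknown flags or a negative nr.
 */
long demo_peek(const int *queue, int queue_len,
               const struct demo_peeksiginfo_args *args, int *out)
{
    long copied = 0;

    if ((args->flags & ~DEMO_PEEKSIGINFO_SHARED) || args->nr < 0)
        return -EINVAL;

    for (uint64_t i = args->off;
         i < (uint64_t)queue_len && copied < args->nr; i++)
        out[copied++] = queue[i];

    return copied;
}
```

    A caller that wants every pending signal would, in this model, keep
    advancing off by the returned count until the result is zero.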

    The resulting siginfo contains the kernel part of si_code, which is
    usually stripped, but it's required for queuing the same siginfo back
    during restore of pending signals.

    This functionality is required for checkpointing pending signals. Pedro
    Alves suggested using it in "gdb" to peek at pending signals. gdb already
    uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
    dequeued. This functionality allows gdb to look at the pending signals
    which were not reported yet.

    The prototype of this code was developed by Oleg Nesterov.

    Signed-off-by: Andrew Vagin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: David Howells
    Cc: Dave Jones
    Cc: "Michael Kerrisk (man-pages)"
    Cc: Pavel Emelyanov
    Cc: Linus Torvalds
    Cc: Pedro Alves
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

09 Feb, 2013

1 commit

  • The original pull message for uprobes (commit 654443e2) noted:

    This tree includes uprobes support in 'perf probe' - but SystemTap
    (and other tools) can take advantage of user probe points as well.

    In order to actually be usable in module-based tools like SystemTap, the
    interface needs to be exported. This patch first adds the obvious
    exports for uprobe_register and uprobe_unregister. Then it also adds
    one for task_user_regset_view, which is necessary to get the correct
    state of userspace registers.

    Signed-off-by: Josh Stone
    Signed-off-by: Oleg Nesterov

    Josh Stone
     

23 Jan, 2013

2 commits

  • putreg() assumes that the tracee is not running and pt_regs_access() can
    safely play with its stack. However a killed tracee can return from
    ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
    that debugger can actually read/modify the kernel stack until the tracee
    does SAVE_REST again.

    set_task_blockstep() can race with SIGKILL too and in some sense this
    race is even worse, the very fact the tracee can be woken up breaks the
    logic.

    As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
    call, this ensures that nobody can ever wake up the tracee while the
    debugger looks at it. Not only does this fix the mentioned problems, we
    can also do some cleanups/simplifications in arch_ptrace() paths.

    Probably ptrace_unfreeze_traced() needs more callers, for example it
    makes sense to make the tracee killable for oom-killer before
    access_process_vm().

    While at it, add the comment into may_ptrace_stop() to explain why
    ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

    Reported-by: Salman Qazi
    Reported-by: Suleiman Souhlal
    Suggested-by: Linus Torvalds
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cleanup and preparation for the next change.

    signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
    actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
    necessary mask.

    Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
    signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
    which adds __TASK_TRACED.

    This way ptrace_signal_wake_up() can work "inside" ptrace_request()
    even if the tracee doesn't have the TASK_WAKEKILL bit set.
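
    The layering of the three helpers can be sketched as a small userspace
    model. The bit values, the demo_task type and the wake-up logic below
    are illustrative assumptions, not the kernel's definitions; only the
    relationship between the helpers follows the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real constants may differ. */
#define DEMO_TASK_INTERRUPTIBLE 0x01u
#define DEMO_TASK_TRACED_BIT    0x08u /* stands in for __TASK_TRACED */
#define DEMO_TASK_WAKEKILL      0x80u

struct demo_task { unsigned int state; bool running; };

/* Wake the task only if its sleep state intersects the given mask. */
bool demo_wake_up_state(struct demo_task *t, unsigned int mask)
{
    if (!(t->state & mask))
        return false;
    t->state = 0;
    t->running = true;
    return true;
}

/* General form: the caller names the extra states it may wake. */
bool demo_signal_wake_up_state(struct demo_task *t, unsigned int state)
{
    return demo_wake_up_state(t, state | DEMO_TASK_INTERRUPTIBLE);
}

/* Trivial helper keeping the old boolean interface. */
bool demo_signal_wake_up(struct demo_task *t, bool resume)
{
    return demo_signal_wake_up_state(t, resume ? DEMO_TASK_WAKEKILL : 0);
}

/* ptrace variant: wakes a traced task even without the WAKEKILL bit. */
bool demo_ptrace_signal_wake_up(struct demo_task *t, bool resume)
{
    return demo_signal_wake_up_state(t, resume ? DEMO_TASK_TRACED_BIT : 0);
}
```

    In this model a tracee whose state carries only the traced bit is left
    alone by demo_signal_wake_up() but is woken by
    demo_ptrace_signal_wake_up().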

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Jan, 2013

1 commit

  • The ia64 function "thread_matches()" has no users since commit
    e868a55c2a8c ("[IA64] remove find_thread_for_addr()"). Remove it.

    This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
    which is good since we'll need to change the semantics of it and fix up
    all the callers.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

18 Dec, 2012

2 commits

  • Merge misc patches from Andrew Morton:
    "Incoming:

    - lots of misc stuff

    - backlight tree updates

    - lib/ updates

    - Oleg's percpu-rwsem changes

    - checkpatch

    - rtc

    - aoe

    - more checkpoint/restart support

    I still have a pile of MM stuff pending - Pekka should be merging
    later today after which that is good to go. A number of other things
    are twiddling thumbs awaiting maintainer merges."

    * emailed patches from Andrew Morton: (180 commits)
    scatterlist: don't BUG when we can trivially return a proper error.
    docs: update documentation about /proc//fdinfo/ fanotify output
    fs, fanotify: add @mflags field to fanotify output
    docs: add documentation about /proc//fdinfo/ output
    fs, notify: add procfs fdinfo helper
    fs, exportfs: add exportfs_encode_inode_fh() helper
    fs, exportfs: escape nil dereference if no s_export_op present
    fs, epoll: add procfs fdinfo helper
    fs, eventfd: add procfs fdinfo helper
    procfs: add ability to plug in auxiliary fdinfo providers
    tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
    breakpoint selftests: print failure status instead of cause make error
    kcmp selftests: print fail status instead of cause make error
    kcmp selftests: make run_tests fix
    mem-hotplug selftests: print failure status instead of cause make error
    cpu-hotplug selftests: print failure status instead of cause make error
    mqueue selftests: print failure status instead of cause make error
    vm selftests: print failure status instead of cause make error
    ubifs: use prandom_bytes
    mtd: nandsim: use prandom_bytes
    ...

    Linus Torvalds
     
  • Ptrace jailers want to be sure that the tracee can never escape
    from the control. However if the tracer dies unexpectedly the
    tracee continues to run in potentially unsafe mode.

    Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
    it sends SIGKILL to every tracee which has this bit set.

    Note that the new option is not equal to the last-option << 1, because
    currently all options have an event and the new one starts the eventless
    group. It uses bit 20, chosen arbitrarily, so we have room for 12 more
    events, but we can also add new eventless options below this one.
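
    The bit arithmetic behind "room for 12 more events" can be checked with
    a short sketch. It assumes that the highest event number in use at the
    time was 7, which is inferred from the figure of 12 rather than stated
    in the text; all DEMO_ names are hypothetical.

```c
#include <assert.h>

/* Assumed for illustration: the highest event number in use at the time. */
#define DEMO_LAST_EVENT        7
/* Each event option is 1 << event number, so the last one is bit 7. */
#define DEMO_LAST_EVENT_OPTION (1u << DEMO_LAST_EVENT)
/* The eventless option sits at bit 20 rather than at last-option << 1. */
#define DEMO_O_EXITKILL        (1u << 20)

/* Number of event bits still unused between the last event and bit 20. */
int demo_free_event_bits(void)
{
    return 20 - (DEMO_LAST_EVENT + 1);
}
```

    Bits 8 through 19 stay free for future events under this assumption.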

    Suggested by Amnon Shiloh.

    Signed-off-by: Oleg Nesterov
    Tested-by: Amnon Shiloh
    Cc: Denys Vlasenko
    Cc: Michael Kerrisk
    Cc: Serge Hallyn
    Cc: Chris Evans
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

20 Nov, 2012

1 commit

  • The task_user_ns function hides the fact that it is getting the user
    namespace from struct cred on the task. struct cred may go away as
    soon as the rcu lock is released. This leads to a race where we
    can dereference a stale user namespace pointer.

    To make it obvious a struct cred is involved kill task_user_ns.

    To kill the race modify the users of task_user_ns to only
    reference the user namespace while the rcu lock is held.

    Cc: Kees Cook
    Cc: James Morris
    Acked-by: Kees Cook
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

03 Aug, 2012

1 commit


03 May, 2012

1 commit


08 Apr, 2012

1 commit


24 Mar, 2012

4 commits

  • PTRACE_SEIZE code is tested and ready for production use, remove the
    code which requires special bit in data argument to make PTRACE_SEIZE
    work.

    The strace team is preparing a new release of strace, and we would like
    to ship the code which uses PTRACE_SEIZE, preferably after this change
    goes into a released kernel.

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • This can be used to close a few corner cases in strace where we get
    unwanted racy behavior after attach, but before we have a chance to set
    options (the notorious post-execve SIGTRAP comes to mind), and removes
    the need to track "did we set opts for this task" state in strace
    internals.

    While we are at it:

    Make it possible to extend SEIZE in the future with more functionality
    by passing non-zero 'addr' parameter. To that end, error out if 'addr'
    is non-zero. PTRACE_ATTACH did not (and still does not) have such
    check, and users (strace) do pass garbage there... let's avoid
    repeating this mistake with SEIZE.

    Set all task->ptrace bits in one operation - before this change, we were
    adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
    was probably ok (not a bug), but let's be on a safer side.

    Changes since v2: use (unsigned long) casts instead of (long) ones, move
    PTRACE_SEIZE_DEVEL-related code to separate lines of code.

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Cc: Pedro Alves
    Reviewed-by: Oleg Nesterov
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
    PT_option bits contiguous and therefore makes code in
    ptrace_setoptions() much simpler.

    Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
    instead of using explicit numeric constants, to ensure we don't mess up
    relationship between bit positions and event ids.

    PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
    value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

    PT_TRACE_MASK constant is nuked, its only use is replaced by
    (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
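
    A minimal sketch of the scheme: option bits are derived from event ids,
    and a single shift maps them into the per-task flags word. The event
    numbering and the shift value of 3 are assumptions for illustration, and
    the DEMO_ names are not the kernel's.

```c
#include <assert.h>

/* Event ids; the numbering here is illustrative. */
enum {
    DEMO_EVENT_FORK  = 1,
    DEMO_EVENT_VFORK = 2,
    DEMO_EVENT_CLONE = 3,
};

/* Each option is derived from its event id instead of a literal constant. */
#define DEMO_O_TRACEFORK  (1u << DEMO_EVENT_FORK)
#define DEMO_O_TRACEVFORK (1u << DEMO_EVENT_VFORK)
#define DEMO_O_TRACECLONE (1u << DEMO_EVENT_CLONE)

/* Assumed position of the option bits inside the task's flags word. */
#define DEMO_OPT_FLAG_SHIFT 3

/* Flag bit in the task's flags word that corresponds to an event id. */
unsigned int demo_pt_event_flag(int event)
{
    return 1u << (DEMO_OPT_FLAG_SHIFT + event);
}
```

    Because the option for an event is always 1 << event, shifting the
    whole option word by one constant yields the matching flag bits, so no
    per-option translation is needed.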

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • On a PTRACE_SETOPTIONS request, we used to set those option bits which
    are known, and then fail with -EINVAL if there are some unknown bits in
    the data argument.

    This is inconsistent with typical error handling, which does not change
    any state if input is invalid.

    This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
    return -EINVAL and don't change any bits in task->ptrace.

    It's very unlikely that there is userspace code in the wild which will
    be affected by this change: it should have the form

    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)

    where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel
    headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
    way userspace can use one is if it defines one itself. I can't see why
    anyone would do such a thing deliberately.
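
    The behavioural change can be modelled as two small functions operating
    on a flags word. The mask of known options and the shift are assumed
    values, and the function names are hypothetical; the point is only the
    order of validation and state change.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_O_MASK         0xffu /* assumed set of known option bits */
#define DEMO_OPT_FLAG_SHIFT 3     /* assumed position in the flags word */

/* Old behaviour: known bits are applied even when the call fails. */
int demo_setoptions_old(unsigned int *flags, unsigned long data)
{
    *flags &= ~(DEMO_O_MASK << DEMO_OPT_FLAG_SHIFT);
    *flags |= ((unsigned int)data & DEMO_O_MASK) << DEMO_OPT_FLAG_SHIFT;
    return (data & ~(unsigned long)DEMO_O_MASK) ? -EINVAL : 0;
}

/* New behaviour: validate first, touch state only on success. */
int demo_setoptions_new(unsigned int *flags, unsigned long data)
{
    if (data & ~(unsigned long)DEMO_O_MASK)
        return -EINVAL;
    *flags &= ~(DEMO_O_MASK << DEMO_OPT_FLAG_SHIFT);
    *flags |= (unsigned int)data << DEMO_OPT_FLAG_SHIFT;
    return 0;
}
```

    With an input that mixes a known and an unknown bit, both variants
    return -EINVAL, but only the old one leaves the known bit set
    afterwards.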

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     

15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

06 Jan, 2012

2 commits

  • Reading /proc/pid/stat of another process checks if one has ptrace permissions
    on that process. If one does have permissions it outputs some data about the
    process which might have security and attack implications. If the current
    task does not have ptrace permissions the read still works, but those fields
    are filled with innocuous (0) values. Since this check and a subsequent denial
    is not a violation of the security policy we should not audit such denials.

    This can be quite useful for removing ptrace broadly across a system without
    flooding the logs when ps is run or something which harmlessly walks proc.

    Signed-off-by: Eric Paris
    Acked-by: Serge E. Hallyn

    Eric Paris
     
  • task_ in the front of a function, in the security subsystem anyway, means
    to me at least, that we are operating with that task as the subject of the
    security decision. In this case what it means is that we are using current as
    the subject but we use the task to get the right namespace. Who in the world
    would ever realize that's what task_ns_capability means just by the name? This
    patch eliminates the task_ns functions entirely and uses the has_ns_capability
    function instead. This means we explicitly open code the ns in question in
    the caller. I think it makes the caller a LOT more clear what is going on.

    Signed-off-by: Eric Paris
    Acked-by: Serge E. Hallyn

    Eric Paris
     

05 Jan, 2012

1 commit

  • This is the temporary simple fix for 3.2, we need more changes in this
    area.

    1. do_signal_stop() assumes that the running untraced thread in the
    stopped thread group is not possible. This was our goal but it is
    not yet achieved: a stopped-but-resumed tracee can clone the running
    thread which can initiate another group-stop.

    Remove WARN_ON_ONCE(!current->ptrace).

    2. A new thread always starts with ->jobctl = 0. If it is auto-attached
    and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
    but JOBCTL_STOP_SIGMASK part is zero, this triggers WARN_ON(!signr)
    in do_jobctl_trap() if another debugger attaches.

    Change __ptrace_unlink() to set the artificial SIGSTOP for report.

    Alternatively we could change ptrace_init_task() to copy signr from
    current, but this means we can copy it for no reason and hide the
    possible similar problems.
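
    Point 2 can be pictured with a small model of the jobctl word. The bit
    layout below (signal number in the low 16 bits, a separate pending bit)
    and the demo_* helpers are assumptions for illustration only.

```c
#include <assert.h>
#include <signal.h>

/* Illustrative layout: the low 16 bits hold the stop signal number. */
#define DEMO_JOBCTL_STOP_SIGMASK 0xffffu
#define DEMO_JOBCTL_STOP_PENDING (1u << 17)

/* Before the fix: the pending bit is set but no signal number is recorded. */
unsigned int demo_unlink_old(unsigned int jobctl)
{
    return jobctl | DEMO_JOBCTL_STOP_PENDING;
}

/* After the fix: fall back to an artificial SIGSTOP when none is recorded. */
unsigned int demo_unlink_new(unsigned int jobctl)
{
    jobctl |= DEMO_JOBCTL_STOP_PENDING;
    if (!(jobctl & DEMO_JOBCTL_STOP_SIGMASK))
        jobctl |= SIGSTOP;
    return jobctl;
}

/* The signal number a later trap report would see. */
int demo_stop_signr(unsigned int jobctl)
{
    return (int)(jobctl & DEMO_JOBCTL_STOP_SIGMASK);
}
```

    A thread that started with jobctl equal to zero ends up with a zero
    signal number in the old variant, which is the condition the warning
    complains about; the new variant reports SIGSTOP instead and keeps any
    signal number that was already recorded.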

    Acked-by: Tejun Heo
    Cc: [3.1]
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

26 Sep, 2011

1 commit


19 Jul, 2011

1 commit

  • This change adds a procfs connector event, which is emitted on every
    successful process tracer attach or detach.

    If one process connects to another, the kernelspace connector reports
    the process id and thread group id of both processes involved. On
    disconnection a null process id is returned.

    Such an event makes it possible to create a simple automated userspace
    mechanism that is aware of processes connecting to others, so predefined
    process policies can be applied to them if needed.

    Note that a detach event is emitted only if a tracer process explicitly
    executes a PTRACE_DETACH request. In other cases, such as tracee or
    tracer exit, a detach event is not reported by the proc connector.

    Signed-off-by: Vladimir Zapolskiy
    Acked-by: Evgeniy Polyakov
    Cc: David S. Miller
    Signed-off-by: Oleg Nesterov

    Vladimir Zapolskiy
     

28 Jun, 2011

2 commits

  • __ptrace_detach() and do_notify_parent() set task->exit_signal = -1
    to mark the task dead. This is no longer needed, nobody checks
    exit_signal to detect the EXIT_DEAD task.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo

    Oleg Nesterov
     
  • __ptrace_detach() relies on the current obscure behaviour of
    do_notify_parent(tsk), which changes tsk->exit_signal if this child
    should be silently reaped. That is why we check task_detached(): it
    is true if the task is a sub-thread, or if it is the group_leader
    but its exit_signal was changed by do_notify_parent().

    This is confusing. Change the code to rely on !thread_group_leader()
    or the value returned by do_notify_parent().

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
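
The decision this entry describes can be modelled in plain C. This is a toy userspace model, not kernel code: `struct task_model`, `notify_parent_model()` and `detach_should_reap()` are hypothetical stand-ins, the boolean `parent_autoreaps` stands in for whatever do_notify_parent() would return, and every other condition the real function checks is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two inputs named in the commit message. */
struct task_model {
    bool group_leader;      /* models thread_group_leader(tsk) */
    bool parent_autoreaps;  /* models do_notify_parent() returning true */
};

/* Stand-in for do_notify_parent(): true means "reap silently". */
static bool notify_parent_model(const struct task_model *t)
{
    return t->parent_autoreaps;
}

/* Reap if the task is not the group leader, or if the notify call says so. */
static bool detach_should_reap(const struct task_model *t)
{
    assert(t != NULL);

    if (!t->group_leader)
        return true;
    return notify_parent_model(t);
}
```

The point of the model is that the result no longer depends on inspecting exit_signal after the fact; it follows directly from the two explicit inputs.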
     

17 Jun, 2011

2 commits

  • The previous patch implemented async notification for ptrace, but it
    only worked while the tracee was running. This patch introduces
    PTRACE_LISTEN, as suggested by Oleg Nesterov.

    It's allowed iff tracee is in STOP trap and puts tracee into
    quasi-running state - tracee never really runs but wait(2) and
    ptrace(2) consider it to be running. While ptracer is listening,
    tracee is allowed to re-enter STOP to notify an async event.
    Listening state is cleared on the first notification. Ptracer can
    also clear it by issuing INTERRUPT - tracee will re-trap into STOP
    with listening state cleared.

    This allows ptracer to monitor group stop state without running tracee
    - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
    wait(2) to wait for the next group stop event. When it happens,
    PTRACE_GETSIGINFO provides information to determine the current state.

    Test program follows.

    #include <signal.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* Fall back to the raw request numbers when the headers lack them. */
    #ifndef PTRACE_SEIZE
    #define PTRACE_SEIZE 0x4206
    #endif
    #ifndef PTRACE_INTERRUPT
    #define PTRACE_INTERRUPT 0x4207
    #endif
    #ifndef PTRACE_LISTEN
    #define PTRACE_LISTEN 0x4208
    #endif

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
        pid_t tracee, tracer;
        int i;

        /* The tracee does nothing but sleep in pause(). */
        tracee = fork();
        if (!tracee)
            while (1)
                pause();

        /* The tracer is a second child; the mother only sends signals. */
        tracer = fork();
        if (!tracer) {
            siginfo_t si;

            ptrace(PTRACE_SEIZE, tracee, NULL,
                   (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
            ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        repeat:
            waitid(P_PID, tracee, NULL, WSTOPPED);

            ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
            if (!si.si_code) {
                /* Signal-delivery stop: pass the signal on. */
                printf("tracer: SIG %d\n", si.si_signo);
                ptrace(PTRACE_CONT, tracee, NULL,
                       (void *)(unsigned long)si.si_signo);
                goto repeat;
            }
            printf("tracer: stopped=%d signo=%d\n",
                   si.si_signo != SIGTRAP, si.si_signo);
            if (si.si_signo != SIGTRAP)
                /* Group stop: listen instead of resuming the tracee. */
                ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
            else
                ptrace(PTRACE_CONT, tracee, NULL, NULL);
            goto repeat;
        }

        for (i = 0; i < 3; i++) {
            nanosleep(&ts1s, NULL);
            printf("mother: SIGSTOP\n");
            kill(tracee, SIGSTOP);
            nanosleep(&ts1s, NULL);
            printf("mother: SIGCONT\n");
            kill(tracee, SIGCONT);
        }
        nanosleep(&ts1s, NULL);

        kill(tracer, SIGKILL);
        kill(tracee, SIGKILL);
        return 0;
    }

    This is identical to the program to test TRAP_NOTIFY except that
    tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
    This allows ptracer to monitor when group stop ends without running
    tracee.

    # ./test-listen
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
    task_stopped_code() as suggested by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
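
The branching that the test program in this entry performs on the PTRACE_GETSIGINFO result can be isolated into a pure function. This is a minimal sketch of that same decision, not part of the original patch: `enum tracer_action` and `next_action()` are hypothetical names, and the three cases simply restate the checks the test program already makes on `si_code` and `si_signo`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical labels for what the tracer does next. */
enum tracer_action {
    ACT_CONT_WITH_SIGNAL,  /* si_code == 0: PTRACE_CONT, re-injecting si_signo */
    ACT_LISTEN,            /* trap for a group stop: PTRACE_LISTEN */
    ACT_CONT               /* plain SIGTRAP trap: PTRACE_CONT with no signal */
};

/* Mirrors the test program: check si_code first, then si_signo. */
static enum tracer_action next_action(const siginfo_t *si)
{
    assert(si != NULL);

    if (si->si_code == 0)
        return ACT_CONT_WITH_SIGNAL;
    if (si->si_signo != SIGTRAP)
        return ACT_LISTEN;
    return ACT_CONT;
}
```

Keeping the decision separate from the ptrace() calls makes it easy to check each case without a live tracee.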
     
  • Currently, there's no way to trap a running ptracee short of sending a
    signal, which has various side effects. This patch implements
    PTRACE_INTERRUPT, which traps ptracee without any signal or job
    control related side effects.

    The implementation is almost trivial. It uses the group stop trap -
    SIGTRAP | PTRACE_EVENT_STOP << 8. A new trap flag
    JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
    cleared when any trap happens. As INTERRUPT should be useable
    regardless of the current state of tracee, task_is_traced() test in
    ptrace_check_attach() is skipped for INTERRUPT.

    PTRACE_INTERRUPT is available iff tracee is attached with
    PTRACE_SEIZE.

    Test program follows.

    #include <signal.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* Fall back to the raw request numbers when the headers lack them. */
    #ifndef PTRACE_SEIZE
    #define PTRACE_SEIZE 0x4206
    #endif
    #ifndef PTRACE_INTERRUPT
    #define PTRACE_INTERRUPT 0x4207
    #endif

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
        pid_t tracee;

        /* The tracee prints a heartbeat once per second. */
        tracee = fork();
        if (tracee == 0) {
            nanosleep(&ts100ms, NULL);
            while (1) {
                printf("tracee: alive pid=%d\n", getpid());
                nanosleep(&ts1s, NULL);
            }
        }

        /* With an argument, seize the tracee from stopped state. */
        if (argc > 1)
            kill(tracee, SIGSTOP);

        nanosleep(&ts100ms, NULL);

        ptrace(PTRACE_SEIZE, tracee, NULL,
               (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
        if (argc > 1) {
            waitid(P_PID, tracee, NULL, WSTOPPED);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
        }
        nanosleep(&ts3s, NULL);

        /* Trap the tracee without a signal, then detach. */
        printf("tracer: INTERRUPT and DETACH\n");
        ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        waitid(P_PID, tracee, NULL, WSTOPPED);
        ptrace(PTRACE_DETACH, tracee, NULL, NULL);
        nanosleep(&ts3s, NULL);

        printf("tracer: exiting\n");
        kill(tracee, SIGKILL);
        return 0;
    }

    When called without argument, tracee is seized from running state,
    interrupted and then detached back to running state.

    # ./test-interrupt
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: INTERRUPT and DETACH
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: exiting

    When called with argument, tracee is seized from stopped state,
    continued, interrupted and then detached back to stopped state.

    # ./test-interrupt 1
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracer: INTERRUPT and DETACH
    tracer: exiting

    Before PTRACE_INTERRUPT, once the tracee was running, there was no way
    to trap tracee and do PTRACE_DETACH without causing side effects.

    -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
    up scheduling TRAP_STOP if child is dying which may make the
    child unkillable. Spotted by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
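
The trap encoding named in this entry, SIGTRAP | PTRACE_EVENT_STOP << 8, is what a tracer finds in the upper bits of a wait status. Below is a minimal sketch of checking for it, assuming the usual Linux wait-status layout. `is_event_stop()` is a hypothetical helper, and the event number is passed in as a parameter rather than hard-coded, so take the value from your own headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Returns true if a wait status reports a stop whose code is
 * SIGTRAP | (event_stop << 8). event_stop is supplied by the caller.
 */
static bool is_event_stop(int status, int event_stop)
{
    assert(event_stop > 0 && event_stop < 256);

    if (!WIFSTOPPED(status))
        return false;

    /* For a stopped task, the bits above the low byte carry the stop code. */
    return (status >> 8) == (SIGTRAP | (event_stop << 8));
}
```

A tracer would typically call this on the status returned by waitpid() after issuing PTRACE_INTERRUPT, and fall back to ordinary signal handling when it returns false.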