06 Jan, 2017

2 commits

  • commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3 upstream.

    It is the reasonable expectation that if an executable file is not
    readable there will be no way for a user without special privileges to
    read the file. This is enforced in ptrace_attach but if ptrace
    is already attached before exec there is no enforcement for read-only
    executables.

    As the only way to read such an mm is through access_process_vm,
    spin a variant called ptrace_access_vm that will fail if the
    target process is not being ptraced by the current process, or if
    the current process did not have sufficient privileges to read the
    target process's mm when ptracing began.

    In the ptrace implementations replace access_process_vm by
    ptrace_access_vm. There remain several ptrace sites that still use
    access_process_vm as they are reading the target executable's
    instructions (for kernel consumption) or register stacks. As such it
    does not appear necessary to add a permission check to those calls.
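
    As a rough illustration of the rule described above, here is a toy
    userspace model, not the kernel code. The struct, its fields and the
    function name toy_ptrace_access_vm are all hypothetical; only the two
    refusal conditions come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task model; none of these fields are kernel names. */
struct toy_task {
    const struct toy_task *tracer;  /* NULL when the task is not traced */
    bool mm_dumpable;               /* false for an unreadable executable */
    bool tracer_was_capable;        /* privilege recorded when tracing began */
};

/*
 * Returns the number of bytes that may be accessed: len when access is
 * allowed, 0 when it is refused.
 */
static size_t toy_ptrace_access_vm(const struct toy_task *current_task,
                                   const struct toy_task *target, size_t len)
{
    /* Refuse unless the caller is the task's current tracer. */
    if (target->tracer == NULL || target->tracer != current_task)
        return 0;
    /* Refuse an unreadable mm unless the tracer was privileged at attach. */
    if (!target->mm_dumpable && !target->tracer_was_capable)
        return 0;
    return len;
}
```

    In this model an unprivileged tracer that attached before exec gets 0
    bytes back once the target's mm is no longer readable.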

    This bug has always existed in Linux.

    Fixes: v1.0
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

    When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
    overlooked. This can result in incorrect behavior when an application
    like strace traces an exec of a setuid executable.

    Further PT_PTRACE_CAP does not have enough information for making good
    security decisions as it does not report which user namespace the
    capability is in. This has already allowed one mistake through
    insufficient granularity.

    I found this issue when I was testing another corner case of exec and
    discovered that I could not get strace to set PT_PTRACE_CAP even when
    running strace as root with a full set of caps.

    This change fixes the above issue with strace allowing stracing as
    root a setuid executable without disabling setuid. More fundamentally
    this change allows what is allowable at all times, by using the correct
    information in its decision.

    Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

21 Jan, 2016

1 commit

  • By checking the effective credentials instead of the real UID / permitted
    capabilities, ensure that the calling process actually intended to use its
    credentials.

    To ensure that all ptrace checks use the correct caller credentials (e.g.
    in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
    flag), use two new flags and require one of them to be set.

    The problem was that when a privileged task had temporarily dropped its
    privileges, e.g. by calling setreuid(0, user_uid), with the intent to
    perform following syscalls with the credentials of a user, it still passed
    ptrace access checks that the user would not be able to pass.

    While an attacker should not be able to convince the privileged task to
    perform a ptrace() syscall, this is a problem because the ptrace access
    check is reused for things in procfs.

    In particular, the following somewhat interesting procfs entries only rely
    on ptrace access checks:

    /proc/$pid/stat - uses the check for determining whether pointers
    should be visible, useful for bypassing ASLR
    /proc/$pid/maps - also useful for bypassing ASLR
    /proc/$pid/cwd - useful for gaining access to restricted
    directories that contain files with lax permissions, e.g. in
    this scenario:
    lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
    drwx------ root root /root
    drwxr-xr-x root root /root/foobar
    -rw-r--r-- root root /root/foobar/secret

    Therefore, on a system where a root-owned mode 6755 binary changes its
    effective credentials as described and then dumps a user-specified file,
    this could be used by an attacker to reveal the memory layout of root's
    processes or reveal the contents of files he is not allowed to access
    (through /proc/$pid/cwd).
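
    A minimal sketch of the idea, as a toy userspace model rather than
    the kernel implementation. The PTRACE_MODE_* values are the ones from
    include/linux/ptrace.h; struct toy_cred, toy_ptrace_may_access() and
    the "uid 0 is privileged" shortcut are simplifying assumptions.

```c
#include <stdbool.h>

#define PTRACE_MODE_READ      0x01
#define PTRACE_MODE_ATTACH    0x02
#define PTRACE_MODE_NOAUDIT   0x04
#define PTRACE_MODE_FSCREDS   0x08
#define PTRACE_MODE_REALCREDS 0x10

/* Hypothetical, simplified stand-in for the caller's credentials. */
struct toy_cred {
    unsigned int uid;    /* real UID */
    unsigned int fsuid;  /* filesystem UID, follows the effective UID */
};

/*
 * Toy model: may the caller access a target owned by target_uid?
 * A mode naming neither or both credential sets is rejected outright.
 */
static bool toy_ptrace_may_access(const struct toy_cred *caller,
                                  unsigned int target_uid, unsigned int mode)
{
    bool fs = mode & PTRACE_MODE_FSCREDS;
    bool real = mode & PTRACE_MODE_REALCREDS;
    unsigned int caller_uid;

    if (fs == real)
        return false;
    caller_uid = fs ? caller->fsuid : caller->uid;
    /* Assumption: uid 0 stands in for "has the needed capability". */
    return caller_uid == 0 || caller_uid == target_uid;
}
```

    With a caller of { .uid = 0, .fsuid = 1000 }, which is the state
    after setreuid(0, 1000), a procfs-style check using
    PTRACE_MODE_FSCREDS against a root-owned target is refused, while a
    check using PTRACE_MODE_REALCREDS still succeeds.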

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Jann Horn
    Acked-by: Kees Cook
    Cc: Casey Schaufler
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Andy Shevchenko
    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Willy Tarreau
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jann Horn
     

16 Jul, 2015

1 commit

  • This patch is the first step in enabling checkpoint/restore of processes
    with seccomp enabled.

    One of the things CRIU does while dumping tasks is inject code into them
    via ptrace to collect information that is only available to the process
    itself. However, if we are in a seccomp mode where these processes are
    prohibited from making these syscalls, then what CRIU does kills the task.

    This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
    a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
    filters to disable (and re-enable) seccomp filters for another task so that
    they can be successfully dumped (and restored). We restrict the set of
    processes that can disable seccomp through ptrace because although today
    ptrace can be used to bypass seccomp, there is some discussion of closing
    this loophole in the future and we would like this patch to not depend on
    that behavior and be future proofed for when it is removed.

    Note that seccomp can be suspended before any filters are actually
    installed; this behavior is useful on criu restore, so that we can suspend
    seccomp, restore the filters, unmap our restore code from the restored
    process' address space, and then resume the task by detaching and have the
    filters resumed as well.
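
    The restrictions on who may set the option can be sketched as a toy
    check. This is an illustrative model only: struct toy_tracer and
    toy_check_options() are hypothetical, and the real code also depends
    on kernel configuration that is not modelled here. The option value
    is the one from the ptrace UAPI header.

```c
#include <errno.h>
#include <stdbool.h>

#define PTRACE_O_SUSPEND_SECCOMP (1u << 21)

/* Hypothetical summary of the tracer's state; not a kernel structure. */
struct toy_tracer {
    bool has_cap_sys_admin;   /* CAP_SYS_ADMIN in the init user namespace */
    bool seccomp_enabled;     /* the tracer itself runs under seccomp */
    bool seccomp_suspended;   /* the tracer's own seccomp is suspended */
};

/* Returns 0 if the requested options are acceptable, -EPERM otherwise. */
static int toy_check_options(const struct toy_tracer *tracer,
                             unsigned int options)
{
    if (!(options & PTRACE_O_SUSPEND_SECCOMP))
        return 0;               /* nothing special was requested */
    if (!tracer->has_cap_sys_admin)
        return -EPERM;
    if (tracer->seccomp_enabled || tracer->seccomp_suspended)
        return -EPERM;          /* the v2 and v5 requirements */
    return 0;
}
```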

    v2 changes:

    * require that the tracer have no seccomp filters installed
    * drop TIF_NOTSC manipulation from the patch
    * change from ptrace command to a ptrace option and use this ptrace option
    as the flag to check. This means that as soon as the tracer
    detaches/dies, seccomp is re-enabled and as a corollary that one can not
    disable seccomp across PTRACE_ATTACHs.

    v3 changes:

    * get rid of various #ifdefs everywhere
    * report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
    used

    v4 changes:

    * get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
    directly

    v5 changes:

    * check that seccomp is not enabled (or suspended) on the tracer

    Signed-off-by: Tycho Andersen
    CC: Will Drewry
    CC: Roland McGrath
    CC: Pavel Emelyanov
    CC: Serge E. Hallyn
    Acked-by: Oleg Nesterov
    Acked-by: Andy Lutomirski
    [kees: access seccomp.mode through seccomp_mode() instead]
    Signed-off-by: Kees Cook

    Tycho Andersen
     

11 Dec, 2014

1 commit

  • Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks,
    we can simply pass "dead_children" list to exit_ptrace() and remove
    another release_task() loop. Plus this way we do not need to drop and
    reacquire tasklist_lock.

    Also shift the list_empty(ptraced) check; if we want this optimization, it
    makes sense to eliminate the function call altogether.

    Signed-off-by: Oleg Nesterov
    Cc: Aaron Tomlin
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Sterling Alexander
    Cc: Peter Zijlstra
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

04 Jul, 2014

1 commit

  • The 'sysret' fastpath does not correctly restore even all regular
    registers, much less any segment registers or rflags values. That is
    very much part of why it's faster than 'iret'.

    Normally that isn't a problem, because the normal ptrace() interface
    catches the process using the signal handler infrastructure, which
    always returns with an iret.

    However, some paths can get caught using ptrace_event() instead of the
    signal path, and for those we need to make sure that we aren't going to
    return to user space using 'sysret'. Otherwise the modifications that
    may have been done to the register set by the tracer wouldn't
    necessarily take effect.

    Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
    arch_ptrace_stop_needed() which is invoked from ptrace_stop().

    Signed-off-by: Tejun Heo
    Reported-by: Andy Lutomirski
    Acked-by: Oleg Nesterov
    Suggested-by: Linus Torvalds
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

07 Jun, 2014

1 commit

  • When tracing a process in another pid namespace, it's important for fork
    event messages to contain the child's pid as seen from the tracer's pid
    namespace, not the parent's. Otherwise, the tracer won't be able to
    correlate the fork event with later SIGTRAP signals it receives from the
    child.

    We still risk a race condition if a ptracer from a different pid
    namespace attaches after we compute the pid_t value. However, sending a
    bogus fork event message in this unlikely scenario is still a vast
    improvement over the status quo where we always send bogus fork event
    messages to debuggers in a different pid namespace than the forking
    process.

    Signed-off-by: Matthew Dempsky
    Acked-by: Oleg Nesterov
    Cc: Kees Cook
    Cc: Julien Tinnes
    Cc: Roland McGrath
    Cc: Jan Kratochvil
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dempsky
     

10 Jul, 2013

1 commit

  • This reverts commit bf26c018490c ("Prepare to fix racy accesses on task
    breakpoints").

    The patch was fine but we can no longer race with SIGKILL after commit
    9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request can never race
    with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
    ->ptrace_bps[] can't go away.

    Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
    we can kill them and remove task->ptrace_bp_refcnt.

    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Acked-by: Michael Neuling
    Cc: Benjamin Herrenschmidt
    Cc: Ingo Molnar
    Cc: Jan Kratochvil
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Prasad
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Jan, 2013

1 commit

  • The ia64 function "thread_matches()" has no users since commit
    e868a55c2a8c ("[IA64] remove find_thread_for_addr()"). Remove it.

    This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
    which is good since we'll need to change the semantics of it and fix up
    all the callers.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

21 Dec, 2012

1 commit

  • Pull signal handling cleanups from Al Viro:
    "sigaltstack infrastructure + conversion for x86, alpha and um,
    COMPAT_SYSCALL_DEFINE infrastructure.

    Note that there are several conflicts between "unify
    SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
    resolution is trivial - just remove definitions of SS_ONSTACK and
    SS_DISABLE from arch/*/uapi/asm/signal.h; they are all identical and
    include/uapi/linux/signal.h contains the unified variant."

    Fixed up conflicts as per Al.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to generic sigaltstack
    new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
    generic compat_sys_sigaltstack()
    introduce generic sys_sigaltstack(), switch x86 and um to it
    new helper: compat_user_stack_pointer()
    new helper: restore_altstack()
    unify SS_ONSTACK/SS_DISABLE definitions
    new helper: current_user_stack_pointer()
    missing user_stack_pointer() instances
    Bury the conditionals from kernel_thread/kernel_execve series
    COMPAT_SYSCALL_DEFINE: infrastructure

    Linus Torvalds
     

20 Dec, 2012

1 commit

  • Cross-architecture equivalent of rdusp(); default is
    user_stack_pointer(current_pt_regs()) - that works for almost all
    platforms that have usp saved in pt_regs. The only exception from
    that is ia64 - we want memory stack, not the backing store for
    register one.

    Signed-off-by: Al Viro

    Al Viro
     

18 Dec, 2012

1 commit

  • Ptrace jailers want to be sure that the tracee can never escape
    from the control. However if the tracer dies unexpectedly the
    tracee continues to run in potentially unsafe mode.

    Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
    it sends SIGKILL to every tracee which has this bit set.

    Note that the new option is not equal to the last-option << 1, because
    currently all options have an event, and the new one starts the eventless
    group. It uses the arbitrarily chosen bit 20, so we have room for 12 more
    events,
    but we can also add the new eventless options below this one.
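
    The "12 more events" figure can be checked with a little arithmetic.
    The sketch below assumes the event numbering of the ptrace UAPI
    header at the time (PTRACE_EVENT_SECCOMP = 7 as the highest event
    and bit 0 taken by PTRACE_O_TRACESYSGOOD); the helper name is
    hypothetical.

```c
/* Highest event number in use; its option bit is 1 << 7. */
#define PTRACE_EVENT_SECCOMP   7
/* The new option starts the eventless group at bit 20. */
#define PTRACE_O_EXITKILL_BIT  20
#define PTRACE_O_EXITKILL      (1u << PTRACE_O_EXITKILL_BIT)

/* Bits 0..7 are taken, so the free event bits are 8..19. */
static unsigned int free_event_bits(void)
{
    return PTRACE_O_EXITKILL_BIT - (PTRACE_EVENT_SECCOMP + 1);
}
```

    Under these assumptions free_event_bits() evaluates to 12, matching
    the count given above.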

    Suggested by Amnon Shiloh.

    Signed-off-by: Oleg Nesterov
    Tested-by: Amnon Shiloh
    Cc: Denys Vlasenko
    Cc: Michael Kerrisk
    Cc: Serge Hallyn
    Cc: Chris Evans
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

29 Nov, 2012

3 commits

  • the first one is equal to signal_pt_regs(), the second is never used
    (and always NULL, while we are at it).

    Signed-off-by: Al Viro

    Al Viro
     
  • Always equal to task_pt_regs(current); defined only when we are in
    signal delivery. It may be different from current_pt_regs() - e.g.
    architectures like m68k may have pt_regs location on exception
    different from that on a syscall and signals (just as ptrace handling)
    may happen on exceptions as well as on syscalls.

    When they are equal, it's often better to have signal_pt_regs
    defined (in asm/ptrace.h) as current_pt_regs - that tends to be
    optimized better than default would be. However, optimisation is
    the only reason why we might want an arch-specific definition;
    if current_pt_regs() and task_pt_regs(current) have different
    values, the latter one is right.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     

13 Oct, 2012

1 commit


10 Oct, 2012

1 commit

  • Pull generic execve() changes from Al Viro:
    "This introduces the generic kernel_thread() and kernel_execve()
    functions, and switches x86, arm, alpha, um and s390 over to them."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
    s390: convert to generic kernel_execve()
    s390: switch to generic kernel_thread()
    s390: fold kernel_thread_helper() into ret_from_fork()
    s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
    um: switch to generic kernel_thread()
    x86, um/x86: switch to generic sys_execve and kernel_execve
    x86: split ret_from_fork
    alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    alpha: switch to generic kernel_thread()
    alpha: switch to generic sys_execve()
    arm: get rid of execve wrapper, switch to generic execve() implementation
    arm: optimized current_pt_regs()
    arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
    arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
    generic sys_execve()
    generic kernel_execve()
    new helper: current_pt_regs()
    preparation for generic kernel_thread()
    um: kill thread->forking
    um: let signal_delivered() do SIGTRAP on singlestepping into handler
    ...

    Linus Torvalds
     

01 Oct, 2012

1 commit

  • Normally (and that's the default) it's just task_pt_regs(current).
    However, if an architecture can optimize that, it can do so by
    making a macro of its own available from asm/ptrace.h. More
    importantly, some architectures have task_pt_regs() working only
    for traced tasks blocked on signal delivery. current_pt_regs()
    needs to work for *all* processes, so before those architectures
    start using stuff relying on current_pt_regs() they'll need a
    properly working variant.

    Signed-off-by: Al Viro

    Al Viro
     

03 Aug, 2012

1 commit


14 Apr, 2012

1 commit

  • This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
    and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

    When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
    tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
    results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit
    SECCOMP_RET_DATA mask of the BPF program return value will be passed as
    the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

    If the subordinate process is not using seccomp filter, then no
    system call notifications will occur even if the option is specified.

    If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
    is returned, the system call will not be executed and an -ENOSYS errno
    will be returned to userspace.

    This change adds a dependency on the system call slow path. Any future
    efforts to use the system call fast path for seccomp filter will need to
    address this restriction.
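
    As a small illustration of how the return value is split, the sketch
    below packs and unpacks a SECCOMP_RET_TRACE value. The three masks
    are taken from the seccomp UAPI header; the helper function names are
    made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECCOMP_RET_TRACE  0x7ff00000U  /* pass to a tracer or disallow */
#define SECCOMP_RET_ACTION 0x7fff0000U  /* mask for the action part */
#define SECCOMP_RET_DATA   0x0000ffffU  /* mask for the 16-bit data part */

/* What a filter would return to request a trace stop carrying `data`. */
static uint32_t ret_trace_with_data(uint16_t data)
{
    return SECCOMP_RET_TRACE | data;
}

/* True if the filter result asks for a tracer notification. */
static bool is_ret_trace(uint32_t filter_ret)
{
    return (filter_ret & SECCOMP_RET_ACTION) == SECCOMP_RET_TRACE;
}

/* The value a tracer would read back with PTRACE_GETEVENTMSG. */
static uint32_t event_message(uint32_t filter_ret)
{
    return filter_ret & SECCOMP_RET_DATA;
}
```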

    Signed-off-by: Will Drewry
    Acked-by: Eric Paris

    v18: - rebase
    - comment fatal_signal check
    - acked-by
    - drop secure_computing_int comment
    v17: - ...
    v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
    [note PT_TRACE_MASK disappears in linux-next]
    v15: - add audit support for non-zero return codes
    - clean up style (indan@nul.nu)
    v14: - rebase/nochanges
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    (Brings back a change to ptrace.c and the masks.)
    v12: - rebase to linux-next
    - use ptrace_event and update arch/Kconfig to mention slow-path dependency
    - drop all tracehook changes and inclusion (oleg@redhat.com)
    v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
    (indan@nul.nu)
    v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
    v9: - n/a
    v8: - guarded PTRACE_SECCOMP use with an ifdef
    v7: - introduced
    Signed-off-by: James Morris

    Will Drewry
     

25 Mar, 2012

1 commit

  • Pull cleanup from Paul Gortmaker:
    "The changes shown here are to unify linux's BUG support under the one
    <linux/bug.h> file. Due to historical reasons, we have some BUG code
    in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in
    linux/kernel.h predates the addition of linux/bug.h, but old code in
    kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h
    was including <asm/bug.h> to pseudo link them.

    This has caused confusion[1] and general yuck/WTF[2] reactions. Here
    is an example that violates the principle of least surprise:

    CC lib/string.o
    lib/string.c: In function 'strlcat':
    lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
    make[2]: *** [lib/string.o] Error 1
    $
    $ grep linux/bug.h lib/string.c
    #include <linux/bug.h>
    $

    We've included <linux/bug.h> for the BUG infrastructure and yet we
    still get a compile fail! [We've not included kernel.h for
    BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel
    development.

    With the above in mind, the goals of this changeset are:

    1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
    2) find and fix any C files that were consuming kernel.h and hence
    relying on implicitly getting some/all BUG code.
    3) Move the BUG related code living in kernel.h to <linux/bug.h>
    4) remove the asm/bug.h from kernel.h to finally break the chain.

    During development, the order was more like 3-4, build-test, 1-2. But
    to ensure that git history for bisect doesn't get needless build
    failures introduced, the commits have been reordered to fix the problem
    areas in advance.

    [1] https://lkml.org/lkml/2012/1/3/90
    [2] https://lkml.org/lkml/2012/1/17/414"

    Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
    and linux-next.

    * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    kernel.h: doesn't explicitly use bug.h, so don't include it.
    bug: consolidate BUILD_BUG_ON with other bug code
    BUG: headers with BUG/BUG_ON etc. need linux/bug.h
    bug.h: add include of it to various implicit C users
    lib: fix implicit users of kernel.h for TAINT_WARN
    spinlock: macroize assert_spin_locked to avoid bug.h dependency
    x86: relocate get/set debugreg fcns to include/asm/debugreg.

    Linus Torvalds
     

24 Mar, 2012

4 commits

  • PTRACE_SEIZE code is tested and ready for production use, remove the
    code which requires special bit in data argument to make PTRACE_SEIZE
    work.

    The strace team is preparing a new release of strace, and we would like
    to ship the code which uses PTRACE_SEIZE, preferably after this change
    goes into a released kernel.

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.

    The new PTRACE_EVENT_STOP is the first event which has no corresponding
    PTRACE_O_TRACE option. If we ever want to add another such option,
    its PTRACE_EVENT's value will collide with PTRACE_EVENT_STOP's value.

    This patch changes PTRACE_EVENT_STOP value to prevent this.

    While at it, added a comment - the one atop PTRACE_EVENT block, saying
    "Wait extended result codes for the above trace options", is not true
    for PTRACE_EVENT_STOP.
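
    For illustration, this is roughly how an event number travels in a
    wait status and how a tracer recovers it. The message above does not
    state the new value; 128 is the value found in later UAPI headers and
    is used here as an assumption. The helper names are hypothetical.

```c
#include <signal.h>

/* Assumed value; well clear of the small numbers used as option bits. */
#define TOY_PTRACE_EVENT_STOP 128

/* Build the status a tracer would see for a SIGTRAP stop with `event`. */
static int stop_status(int event)
{
    return (((event << 8) | SIGTRAP) << 8) | 0x7f;
}

/* Extract the event number (bits 16..23). */
static int status_event(int status)
{
    return (status >> 16) & 0xff;
}

/* Extract the stop signal (bits 8..15), as WSTOPSIG() does on Linux. */
static int status_stopsig(int status)
{
    return (status >> 8) & 0xff;
}
```

    Since each PTRACE_O_TRACEfoo option is a bit indexed by its event
    number, an event numbered 128 cannot clash with any option that
    fits in an int.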

    Signed-off-by: Denys Vlasenko
    Cc: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
    PT_option bits contiguous and therefore makes code in
    ptrace_setoptions() much simpler.

    Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
    instead of using explicit numeric constants, to ensure we don't mess up
    relationship between bit positions and event ids.

    PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
    value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

    The PT_TRACE_MASK constant is nuked; its only use is replaced by
    (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
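
    A sketch of what the contiguous layout buys. The constants below are
    what I understand the headers to have contained after this change
    (PT_OPT_FLAG_SHIFT of 3, PTRACE_O_MASK of 0x7f); treat them as
    assumptions. toy_setoptions() is a hypothetical stand-in for the
    flag update in ptrace_setoptions().

```c
#define PTRACE_EVENT_FORK  1
#define PTRACE_EVENT_EXEC  4

/* Each option bit is derived from its event id. */
#define PTRACE_O_TRACESYSGOOD 1
#define PTRACE_O_TRACEFORK    (1 << PTRACE_EVENT_FORK)
#define PTRACE_O_TRACEEXEC    (1 << PTRACE_EVENT_EXEC)
#define PTRACE_O_MASK         0x0000007f

#define PT_PTRACED            0x00000001
#define PT_OPT_FLAG_SHIFT     3
#define PT_EVENT_FLAG(event)  (1 << (PT_OPT_FLAG_SHIFT + (event)))

/* Replace the option bits in `flags` with the user-supplied `data`. */
static unsigned int toy_setoptions(unsigned int flags, unsigned int data)
{
    flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);  /* drop old options */
    flags |= data << PT_OPT_FLAG_SHIFT;              /* one shift, no table */
    return flags;
}
```

    Because the PT_ option bits are contiguous, a single shift maps the
    user-visible option word onto the internal flags.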

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
    set. This is because this SIGTRAP predates the PTRACE_O_TRACEEXEC
    option; we do not need/want this with PT_SEIZED, which can set the
    options during attach.

    Suggested-by: Pedro Alves
    Signed-off-by: Oleg Nesterov
    Cc: Chris Evans
    Cc: Indan Zupancic
    Cc: Denys Vlasenko
    Cc: Tejun Heo
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

18 Jan, 2012

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
    audit: no leading space in audit_log_d_path prefix
    audit: treat s_id as an untrusted string
    audit: fix signedness bug in audit_log_execve_info()
    audit: comparison on interprocess fields
    audit: implement all object interfield comparisons
    audit: allow interfield comparison between gid and ogid
    audit: complex interfield comparison helper
    audit: allow interfield comparison in audit rules
    Kernel: Audit Support For The ARM Platform
    audit: do not call audit_getname on error
    audit: only allow tasks to set their loginuid if it is -1
    audit: remove task argument to audit_set_loginuid
    audit: allow audit matching on inode gid
    audit: allow matching on obj_uid
    audit: remove audit_finish_fork as it can't be called
    audit: reject entry,always rules
    audit: inline audit_free to simplify the look of generic code
    audit: drop audit_set_macxattr as it doesn't do anything
    audit: inline checks for not needing to collect aux records
    audit: drop some potentially inadvisable likely notations
    ...

    Use evil merge to fix up grammar mistakes in Kconfig file.

    Bad speling and horrible grammar (and copious swearing) is to be
    expected, but let's keep it to commit messages and comments, rather than
    expose it to users in config help texts or printouts.

    Linus Torvalds
     
  • The audit system previously expected arches calling to audit_syscall_exit to
    supply as arguments if the syscall was a success and what the return code was.
    Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
    by converting from negative retcodes to an audit internal magic value stating
    success or failure. This helper was wrong and could indicate that a valid
    pointer returned to userspace was a failed syscall. The fix is to fix the
    layering foolishness. We now pass audit_syscall_exit a struct pt_regs and it
    in turn calls back into arch code to collect the return value and to
    determine if the syscall was a success or failure. We also define a generic
    is_syscall_success() macro which determines success/failure based on if the
    value is < -MAX_ERRNO. This works for arches like x86 which do not use a
    separate mechanism to indicate syscall failure.
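
    A userspace sketch of that comparison, assuming the usual kernel
    value MAX_ERRNO = 4095. The comparison is done on unsigned values,
    which is what lets a high pointer count as success. The function
    name is hypothetical.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/*
 * A raw return value is a failure only if it falls in the last
 * MAX_ERRNO values of the unsigned range, i.e. -4095..-1 as signed.
 */
static bool toy_is_syscall_success(long ret)
{
    return (unsigned long)ret < (unsigned long)-MAX_ERRNO;
}
```

    For example, -1 and -4095 are treated as failures, while 0, 42 and a
    value such as -4096 (which could be a valid high address returned to
    userspace) are treated as successes.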

    We make both the is_syscall_success() and regs_return_value() static inlines
    instead of macros. The reason is because the audit function must take a void*
    for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
    pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
    function takes a void* we need to use static inlines to cast it back to the
    arch correct structure to dereference it.

    The other major change is that on some arches, like ia64, MIPS and ppc, we
    change regs_return_value() to give us the negative value on syscall failure.
    The only other user of this macro, kretprobe_example.c, won't notice and it
    makes the value signed consistently for the audit functions across all archs.

    In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
    audit code as the return value. But the ptrace_64.h code defined the macro
    regs_return_value() as regs[3]. I have no idea which one is correct, but this
    patch now uses the regs_return_value() function, so it now uses regs[3].

    For powerpc we previously used regs->result but now use the
    regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
    always positive so the regs_return_value(), much like ia64 makes it negative
    before calling the audit code when appropriate.

    Signed-off-by: Eric Paris
    Acked-by: H. Peter Anvin [for x86 portion]
    Acked-by: Tony Luck [for ia64]
    Acked-by: Richard Weinberger [for uml]
    Acked-by: David S. Miller [for sparc]
    Acked-by: Ralf Baechle [for mips]
    Acked-by: Benjamin Herrenschmidt [for ppc]

    Eric Paris
     

06 Jan, 2012

1 commit

  • Reading /proc/pid/stat of another process checks if one has ptrace permissions
    on that process. If one does have permissions it outputs some data about the
    process which might have security and attack implications. If the current
    task does not have ptrace permissions the read still works, but those fields
    are filled with innocuous (0) values. Since this check and a subsequent denial
    is not a violation of the security policy we should not audit such denials.

    This can be quite useful for removing ptrace broadly across a system without
    flooding the logs when ps is run or something which harmlessly walks proc.

    Signed-off-by: Eric Paris
    Acked-by: Serge E. Hallyn

    Eric Paris
     

18 Jul, 2011

3 commits

  • The fake SIGSTOP during attach has numerous problems. PTRACE_SEIZE
    is already fine, but we have basically the same problems if SIGSTOP
    is sent on auto-attach: the tracer can't know if this signal
    should be cancelled or not.

    Change ptrace_event() to set JOBCTL_TRAP_STOP if the new child is
    PT_SEIZED, this triggers the PTRACE_EVENT_STOP report.

    Thereafter a PT_SEIZED task can never report the bogus SIGSTOP.

    Test-case:

    #include <assert.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>

    /* Values of the development-time interface this test was written for. */
    #define PTRACE_SEIZE        0x4206
    #define PTRACE_SEIZE_DEVEL  0x80000000
    #define PTRACE_EVENT_STOP   7
    #define WEVENT(s)           (((s) & 0xFF0000) >> 16)

    int main(void)
    {
        int child, grand_child, status;
        long message;

        child = fork();
        if (!child) {
            /* Stop ourselves so the parent can seize us, then fork. */
            kill(getpid(), SIGSTOP);
            fork();
            assert(0);
            return 0x23;
        }

        assert(ptrace(PTRACE_SEIZE, child, 0, PTRACE_SEIZE_DEVEL) == 0);
        assert(wait(&status) == child);
        assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);

        assert(ptrace(PTRACE_SETOPTIONS, child, 0, PTRACE_O_TRACEFORK) == 0);

        assert(ptrace(PTRACE_CONT, child, 0, 0) == 0);
        assert(waitpid(child, &status, 0) == child);
        assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
        assert(WEVENT(status) == PTRACE_EVENT_FORK);

        assert(ptrace(PTRACE_GETEVENTMSG, child, 0, &message) == 0);
        grand_child = message;

        /* The auto-attached grandchild must report STOP, not SIGSTOP. */
        assert(waitpid(grand_child, &status, 0) == grand_child);
        assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
        assert(WEVENT(status) == PTRACE_EVENT_STOP);

        kill(child, SIGKILL);
        kill(grand_child, SIGKILL);
        return 0;
    }

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • If the new child is traced, do_fork() adds the pending SIGSTOP.
    It assumes that either it is traced because of auto-attach or the
    tracer attached later, in both cases sigaddset/set_thread_flag is
    correct even if SIGSTOP is already pending.

    Now that we have PTRACE_SEIZE this is no longer right in the latter
    case. If the tracer does PTRACE_SEIZE after copy_process() makes the
    child visible the queued SIGSTOP is wrong.

    We could check PT_SEIZED bit and change ptrace_attach() to set both
    PT_PTRACED and PT_SEIZED bits simultaneously but see the next patch,
    we need to know whether this child was auto-attached or not anyway.

    So this patch simply moves this code to ptrace_init_task(), this
    way we can never race with ptrace_attach().

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • new_child->jobctl is not initialized during the fork; it is copied
    from parent->jobctl. Currently this is harmless: the forking task
    is running and copy_process() can't succeed if signal_pending() is
    true, so only JOBCTL_STOP_DEQUEUED can be copied. Still, this is a
    bit fragile; it would be cleaner to set ->jobctl = 0 explicitly.

    Also, check ->ptrace != 0 instead of PT_PTRACED, move the
    CONFIG_HAVE_HW_BREAKPOINT code up.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     

28 Jun, 2011

1 commit

  • ptrace_reparented() naively checks parent != real_parent, which means
    it returns true even if the tracer _is_ the real parent. This is a
    per-process thing, not per-thread. The only reason ->real_parent can
    point to the non-leader thread is that we have __WNOTHREAD.

    Change it to check !same_thread_group(parent, real_parent).

    It has two callers, and in both cases the current check does not
    look right.

    exit_notify: we should respect ->exit_signal if the exiting leader
    is traced by any thread from the parent thread group. It is the
    child of the whole group, and we are going to send the signal to
    the whole group.

    wait_task_zombie: without __WNOTHREAD do_wait() should do the same
    for any thread, only sys_ptrace() is "bound" to the single thread.
    However do_wait(WEXITED) succeeds but does not release a traced
    natural child unless the caller is the tracer.
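
    The difference between the two checks can be sketched in userspace.
    This is only an illustrative model, not the kernel code: struct task,
    its tgid field and the _old/_new function names are assumptions made
    for the example, and the thread-group comparison is simplified to a
    tgid comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's task_struct. */
struct task {
    int tgid;                  /* thread group (process) id */
    struct task *parent;       /* current parent; the tracer while traced */
    struct task *real_parent;  /* natural parent thread */
};

/* Simplified: two tasks share a thread group when their tgids match. */
static bool same_thread_group(const struct task *a, const struct task *b)
{
    return a->tgid == b->tgid;
}

/* Old check: a plain pointer comparison, so a tracer that is merely a
 * sibling thread of the real parent counts as "reparented". */
static bool ptrace_reparented_old(const struct task *child)
{
    return child->parent != child->real_parent;
}

/* New check: only a tracer outside the real parent's thread group
 * counts as "reparented". */
static bool ptrace_reparented_new(const struct task *child)
{
    return !same_thread_group(child->parent, child->real_parent);
}
```

    With a tracer thread that belongs to the natural parent's process, the
    old check reports the child as reparented while the new one does not.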

    Test-case:

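    #include <assert.h>
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>

    /* Sub-thread of the natural parent attaches to the child. */
    void *tfunc(void *arg)
    {
        assert(ptrace(PTRACE_ATTACH, (long)arg, 0,0) == 0);
        pause();
        return NULL;
    }

    int main(void)
    {
        pthread_t thr;
        pid_t pid, stat, ret;

        pid = fork();
        if (!pid) {
            pause();
            assert(0);
        }

        assert(pthread_create(&thr, NULL, tfunc, (void*)(long)pid) == 0);

        /* The attach stop is visible to the main thread as well. */
        assert(waitpid(-1, &stat, 0) == pid);
        assert(WIFSTOPPED(stat));

        kill(pid, SIGKILL);

        assert(waitpid(-1, &stat, 0) == pid);
        assert(WIFSIGNALED(stat) && WTERMSIG(stat) == SIGKILL);

        /* The child was already reaped, so this must fail. */
        ret = waitpid(pid, &stat, 0);
        if (ret < 0)
            return 0;

        printf("WTF? %d is dead, but: wait=%d stat=%x\n",
               pid, ret, stat);

        return 1;
    }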

    Note that the main thread simply does

    pid = fork();
    kill(pid, SIGKILL);

    and then without the patch wait4(WEXITED) succeeds twice and reports
    WTERMSIG(stat) == SIGKILL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     

23 Jun, 2011

4 commits

  • tracehook.h is on the way out. Rename tracehook_tracer_task() to
    ptrace_parent() and move it from tracehook.h to ptrace.h.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: John Johansen
    Cc: Stephen Smalley
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • Move SIGTRAP on exec(2) logic from tracehook_report_exec() to
    ptrace_event(). This is part of changes to make ptrace_event()
    smarter and handle ptrace event related details in one place.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • This patch implements ptrace_event_enabled(), which tests whether a
    given PTRACE_EVENT_* is enabled, and uses it to simplify ptrace_event()
    and tracehook_prepare_clone().

    PT_EVENT_FLAG() macro is added which calculates PT_TRACE_* flag from
    PTRACE_EVENT_*. This is used to define PT_TRACE_* flags and by
    ptrace_event_enabled() to find the matching flag.

    This is used to make ptrace_event() and tracehook_prepare_clone()
    simpler.

    * ptrace_event() callers were responsible for providing a mask to test
    whether the event was enabled. This patch implements
    ptrace_event_enabled() and makes ptrace_event() drop @mask and
    determine whether the event is enabled from @event. Note that
    @event is constant and this conversion doesn't add runtime overhead.

    All conversions except tracehook_report_clone_complete() are
    trivial. tracehook_report_clone_complete() used to use 0 for @mask
    (always enabled) but now tests whether the specified event is
    enabled. This doesn't cause any behavior difference as it's
    guaranteed that the event specified by @trace is enabled.

    * tracehook_prepare_clone() now only determines which event is
    applicable and uses ptrace_event_enabled() for the enable test.

    This doesn't introduce any behavior change.
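
    The flag derivation described above can be sketched as follows. This
    is a minimal userspace illustration, not the patch itself: the shift
    value, the EVENT_* enum names and the flags argument are assumptions
    made for the example; only the idea of computing the PT_TRACE_* bit
    from the PTRACE_EVENT_* number is taken from the description.

```c
#include <stdbool.h>

/* Event numbers, matching PTRACE_EVENT_* in the ptrace(2) ABI. */
enum { EVENT_FORK = 1, EVENT_VFORK = 2, EVENT_CLONE = 3, EVENT_EXEC = 4 };

/* Assumed shift: the first event flag lands on bit 4. */
#define PT_EVENT_FLAG_SHIFT 4
#define PT_EVENT_FLAG(event) (1u << (PT_EVENT_FLAG_SHIFT + (event) - 1))

/* Each enable flag is derived from its event number, not hand-numbered. */
#define PT_TRACE_FORK  PT_EVENT_FLAG(EVENT_FORK)
#define PT_TRACE_VFORK PT_EVENT_FLAG(EVENT_VFORK)
#define PT_TRACE_CLONE PT_EVENT_FLAG(EVENT_CLONE)
#define PT_TRACE_EXEC  PT_EVENT_FLAG(EVENT_EXEC)

/* Callers pass only the event; the matching flag is computed here.
 * ptrace_flags stands in for the tracee's ->ptrace word. */
static bool ptrace_event_enabled(unsigned int ptrace_flags, int event)
{
    return (ptrace_flags & PT_EVENT_FLAG(event)) != 0;
}
```

    Because the event is a compile-time constant at every call site, the
    flag computation folds to a constant mask.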

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task_ptrace(task) simply dereferences task->ptrace and isn't even used
    consistently, which only adds confusion. Kill it and directly access
    ->ptrace instead.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

17 Jun, 2011

3 commits

  • The previous patch implemented async notification for ptrace but it
    only worked while the tracee is running. This patch introduces
    PTRACE_LISTEN, as suggested by Oleg Nesterov.

    It's allowed iff tracee is in STOP trap and puts tracee into
    quasi-running state - tracee never really runs but wait(2) and
    ptrace(2) consider it to be running. While ptracer is listening,
    tracee is allowed to re-enter STOP to notify an async event.
    Listening state is cleared on the first notification. Ptracer can
    also clear it by issuing INTERRUPT - tracee will re-trap into STOP
    with listening state cleared.

    This allows ptracer to monitor group stop state without running tracee
    - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
    wait(2) to wait for the next group stop event. When it happens,
    PTRACE_GETSIGINFO provides information to determine the current state.
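
    The decision a tracer makes after each stop can be written as a small
    pure function. This is a sketch that mirrors the logic of the test
    program in this message; the enum and function names are hypothetical,
    and it assumes, as the test does, that a zero si_code identifies a
    signal sent with kill(2).

```c
#include <signal.h>

/* Hypothetical names for the three things the tracer can do next. */
enum tracer_action {
    ACTION_INJECT_SIGNAL, /* PTRACE_CONT with the signal number in data */
    ACTION_LISTEN,        /* PTRACE_LISTEN: keep tracee stopped, wait for events */
    ACTION_CONT           /* PTRACE_CONT with no signal */
};

/* si_code and si_signo come from PTRACE_GETSIGINFO after a stop. */
static enum tracer_action choose_action(int si_code, int si_signo)
{
    /* Signal-delivery stop: pass the signal on to the tracee. */
    if (si_code == 0)
        return ACTION_INJECT_SIGNAL;

    /* STOP trap reporting a stop signal: group stop is in effect, so
     * listen instead of resuming the tracee. */
    if (si_signo != SIGTRAP)
        return ACTION_LISTEN;

    /* STOP trap reporting SIGTRAP: no group stop, let the tracee run. */
    return ACTION_CONT;
}
```

    The only difference from a tracer without LISTEN is the middle branch,
    which would otherwise have to resume the tracee to see further events.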

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207
    #define PTRACE_LISTEN 0x4208

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
        pid_t tracee, tracer;
        int i;

        tracee = fork();
        if (!tracee)
            while (1)
                pause();

        tracer = fork();
        if (!tracer) {
            siginfo_t si;

            ptrace(PTRACE_SEIZE, tracee, NULL,
                   (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
            ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        repeat:
            waitid(P_PID, tracee, NULL, WSTOPPED);

            ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
            if (!si.si_code) {
                printf("tracer: SIG %d\n", si.si_signo);
                ptrace(PTRACE_CONT, tracee, NULL,
                       (void *)(unsigned long)si.si_signo);
                goto repeat;
            }
            printf("tracer: stopped=%d signo=%d\n",
                   si.si_signo != SIGTRAP, si.si_signo);
            if (si.si_signo != SIGTRAP)
                ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
            else
                ptrace(PTRACE_CONT, tracee, NULL, NULL);
            goto repeat;
        }

        for (i = 0; i < 3; i++) {
            nanosleep(&ts1s, NULL);
            printf("mother: SIGSTOP\n");
            kill(tracee, SIGSTOP);
            nanosleep(&ts1s, NULL);
            printf("mother: SIGCONT\n");
            kill(tracee, SIGCONT);
        }
        nanosleep(&ts1s, NULL);

        kill(tracer, SIGKILL);
        kill(tracee, SIGKILL);
        return 0;
    }

    This is identical to the program to test TRAP_NOTIFY except that
    tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
    This allows ptracer to monitor when group stop ends without running
    tracee.

    # ./test-listen
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
    task_stopped_code() as suggested by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently, there's no way to trap a running ptracee short of sending a
    signal which has various side effects. This patch implements
    PTRACE_INTERRUPT which traps ptracee without any signal or job control
    related side effect.

    The implementation is almost trivial. It uses the group stop trap -
    SIGTRAP | PTRACE_EVENT_STOP << 8. A new trap flag
    JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
    cleared when any trap happens. As INTERRUPT should be usable
    regardless of the current state of tracee, task_is_traced() test in
    ptrace_check_attach() is skipped for INTERRUPT.

    PTRACE_INTERRUPT is available iff tracee is attached with
    PTRACE_SEIZE.
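
    The life cycle of the new trap flag can be modelled in a few lines.
    This is an illustrative userspace model only: struct tracee_model,
    TRAP_INTERRUPT_PENDING and both function names are hypothetical
    stand-ins, and the real implementation also has to wake the tracee
    and take the appropriate locks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for JOBCTL_TRAP_INTERRUPT. */
#define TRAP_INTERRUPT_PENDING 0x1u

/* Hypothetical model of the per-tracee state involved. */
struct tracee_model {
    bool seized;          /* attached with PTRACE_SEIZE */
    unsigned int jobctl;  /* pending trap flags */
};

/* Models PTRACE_INTERRUPT: refused unless the tracee was seized,
 * otherwise marks an interrupt trap as pending. Returns 0 or -1. */
static int model_interrupt(struct tracee_model *t)
{
    if (!t->seized)
        return -1;
    t->jobctl |= TRAP_INTERRUPT_PENDING;
    return 0;
}

/* Models the tracee taking any trap: the pending flag is cleared.
 * Returns true if an interrupt request was consumed by this trap. */
static bool model_trap(struct tracee_model *t)
{
    bool was_pending = (t->jobctl & TRAP_INTERRUPT_PENDING) != 0;

    t->jobctl &= ~TRAP_INTERRUPT_PENDING;
    return was_pending;
}
```

    Since any trap clears the flag, an INTERRUPT issued while the tracee
    is already heading into a trap results in a single stop, not two.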

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
        pid_t tracee;

        tracee = fork();
        if (tracee == 0) {
            nanosleep(&ts100ms, NULL);
            while (1) {
                printf("tracee: alive pid=%d\n", getpid());
                nanosleep(&ts1s, NULL);
            }
        }

        if (argc > 1)
            kill(tracee, SIGSTOP);

        nanosleep(&ts100ms, NULL);

        ptrace(PTRACE_SEIZE, tracee, NULL,
               (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
        if (argc > 1) {
            waitid(P_PID, tracee, NULL, WSTOPPED);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
        }
        nanosleep(&ts3s, NULL);

        printf("tracer: INTERRUPT and DETACH\n");
        ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        waitid(P_PID, tracee, NULL, WSTOPPED);
        ptrace(PTRACE_DETACH, tracee, NULL, NULL);
        nanosleep(&ts3s, NULL);

        printf("tracer: exiting\n");
        kill(tracee, SIGKILL);
        return 0;
    }

    When called without an argument, tracee is seized from running state,
    interrupted and then detached back to running state.

    # ./test-interrupt
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: INTERRUPT and DETACH
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: exiting

    When called with an argument, tracee is seized from stopped state,
    continued, interrupted and then detached back to stopped state.

    # ./test-interrupt 1
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracer: INTERRUPT and DETACH
    tracer: exiting

    Before PTRACE_INTERRUPT, once the tracee was running, there was no way
    to trap tracee and do PTRACE_DETACH without causing side effects.

    -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
    up scheduling TRAP_STOP if child is dying which may make the
    child unkillable. Spotted by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
    effects on tracee signal and job control states. This patch
    implements a new ptrace request PTRACE_SEIZE which attaches a tracee
    without trapping it or affecting its signal and job control states.

    The usage is the same as PTRACE_ATTACH but it takes PTRACE_SEIZE_*
    flags in @data. Currently, the only defined flag is
    PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
    PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
    The changes will be implemented gradually and the DEVEL flag is to
    prevent programs which expect full SEIZE behavior from using it before
    all the behavior modifications are complete while allowing unit
    testing. The flag will be removed once SEIZE behaviors are completely
    implemented.

    * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap. After
    attaching tracee continues to run unless a trap condition occurs.

    * PTRACE_SEIZE doesn't affect signal or group stop state.

    * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
    exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
    the stopping signals if group stop is in effect or SIGTRAP
    otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
    instead of NULL.

    Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
    used to determine whether new SEIZE behaviors should be enabled.
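
    How the exit_code described above ends up in the status seen by
    wait(2) can be sketched as below. This is a minimal illustration
    under stated assumptions: the function names are hypothetical, the
    event number is passed in as a parameter and should be taken from the
    ptrace headers in real code, and WEVENT() is an assumed helper that
    extracts the event byte from a wait status.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Assumed helper: the ptrace event sits above the stop signal. */
#define WEVENT(s) (((s) & 0xFF0000) >> 16)

/* exit_code of a PTRACE_EVENT_STOP trap: the stop signal if a group
 * stop is in effect, SIGTRAP otherwise, with the event in bits 8-15. */
static int event_stop_exit_code(int event_stop, bool group_stop,
                                int stop_signr)
{
    int signr = group_stop ? stop_signr : SIGTRAP;

    return signr | (event_stop << 8);
}

/* A stopped task is reported to wait(2) as exit_code shifted up by one
 * byte, with 0x7f in the low byte marking "stopped". */
static int stopped_wait_status(int exit_code)
{
    return (exit_code << 8) | 0x7f;
}
```

    A tracer can therefore tell a group stop from a plain STOP trap by
    comparing WSTOPSIG(status) against SIGTRAP once WEVENT(status)
    matches the stop event.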

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
        pid_t tracee;

        tracee = fork();
        if (tracee == 0) {
            nanosleep(&ts100ms, NULL);
            while (1) {
                printf("tracee: alive\n");
                nanosleep(&ts1s, NULL);
            }
        }

        if (argc > 1)
            kill(tracee, SIGSTOP);

        nanosleep(&ts100ms, NULL);

        ptrace(PTRACE_SEIZE, tracee, NULL,
               (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
        if (argc > 1) {
            waitid(P_PID, tracee, NULL, WSTOPPED);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
        }
        nanosleep(&ts3s, NULL);
        printf("tracer: exiting\n");
        return 0;
    }

    When the above program is called without an argument, tracee is seized
    while running and remains running. When tracer exits, tracee continues
    to run and print out messages.

    # ./test-seize-simple
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    tracee: alive
    tracee: alive

    When called with an argument, tracee is seized from stopped state and
    continued, and returns to stopped state when tracer exits.

    # ./test-seize
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    # ps -el|grep test-seize
    1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

    -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
    suggested.

    -v3: PTRACE_EVENT_STOP traps now report group stop state by signr. If
    group stop is in effect the stop signal number is returned as
    part of exit_code; otherwise, SIGTRAP. This was suggested by
    Denys and Oleg.

    Signed-off-by: Tejun Heo
    Cc: Jan Kratochvil
    Cc: Denys Vlasenko
    Cc: Oleg Nesterov

    Tejun Heo