Eric Lee / smarc-fsl-linux-kernel

06 Jan, 2017

3 commits

e71b4e061 ptrace: Don't allow accessing an undumpable mm ... Browse Code »

commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3 upstream.

It is the reasonable expectation that if an executable file is not
readable there will be no way for a user without special privileges to
read the file. This is enforced in ptrace_attach but if ptrace
is already attached before exec there is no enforcement for read-only
executables.

As the only way to read such an mm is through access_process_vm
spin a variant called ptrace_access_vm that will fail if the
target process is not being ptraced by the current process, or
the current process did not have sufficient privileges when ptracing
began to read the target processes mm.

In the ptrace implementations replace access_process_vm by
ptrace_access_vm. There remain several ptrace sites that still use
access_process_vm as they are reading the target executables
instructions (for kernel consumption) or register stacks. As such it
does not appear necessary to add a permission check to those calls.

This bug has always existed in Linux.

Fixes: v1.0
Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2017-01-06 17:40:13 +0800
e747b4ae3 ptrace: Capture the ptracer's creds not PT_PTRACE_CAP ... Browse Code »

commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked. This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in. This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid. More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2017-01-06 17:40:13 +0800
694a95fa6 mm: Add a user_ns owner to mm_struct and fix ptrace permission checks ... Browse Code »

commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream.

During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file. A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue. task->cred->user_ns is always the same
as or descendent of mm->user_ns. Which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
credentials.

To prevent regressions mm->dumpable and mm->user_ns are not considered
when a task has no mm. As simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc//stat

Acked-by: Kees Cook
Tested-by: Cyrill Gorcunov
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2017-01-06 17:40:13 +0800

19 Oct, 2016

1 commit

f307ab6dc mm: replace access_process_vm() write parameter with gup_flags ... Browse Code »

This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes
Acked-by: Jesper Nilsson
Acked-by: Michal Hocko
Acked-by: Michael Ellerman
Signed-off-by: Linus Torvalds

Lorenzo Stoakes
2016-10-19 23:31:25 +0800

12 Oct, 2016

1 commit

0a5bf409d ptrace: clear TIF_SYSCALL_TRACE on ptrace detach ... Browse Code »

On __ptrace_detach(), called from do_exit()->exit_notify()->
forget_original_parent()->exit_ptrace(), the TIF_SYSCALL_TRACE in
thread->flags of the tracee is not cleared up. This results in the
tracehook_report_syscall_* being called (though there's no longer a tracer
listening to that) upon its further syscalls.

Example scenario - attach "strace" to a running process and kill it (the
strace) with SIGKILL. You'll see that the syscall trace hooks are still
being called.

The clearing of this flag should be moved from ptrace_detach() to
__ptrace_detach().

Link: http://lkml.kernel.org/r/1472759493-20554-1-git-send-email-alnovak@suse.cz
Signed-off-by: Ales Novak
Acked-by: Oleg Nesterov
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ales Novak
2016-10-12 06:06:32 +0800

04 Aug, 2016

1 commit

97f2645f3 tree-wide: replace config_enabled() with IS_ENABLED() ... Browse Code »

The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]

- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Acked-by: Kees Cook
Cc: Stas Sergeev
Cc: Matt Redfearn
Cc: Joshua Kinard
Cc: Jiri Slaby
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Markos Chandras
Cc: "Dmitry V. Levin"
Cc: yu-cheng yu
Cc: James Hogan
Cc: Brian Gerst
Cc: Johannes Berg
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Will Drewry
Cc: Nikolay Martynov
Cc: Huacai Chen
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Daniel Borkmann
Cc: Leonid Yegoshin
Cc: Rafal Milecki
Cc: James Cowgill
Cc: Greg Kroah-Hartman
Cc: Ralf Baechle
Cc: Alex Smith
Cc: Adam Buchbinder
Cc: Qais Yousef
Cc: Jiang Liu
Cc: Mikko Rapeli
Cc: Paul Gortmaker
Cc: Denys Vlasenko
Cc: Brian Norris
Cc: Hidehiro Kawai
Cc: "Luis R. Rodriguez"
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Roland McGrath
Cc: Paul Burton
Cc: Kalle Valo
Cc: Viresh Kumar
Cc: Tony Wu
Cc: Huaitong Han
Cc: Sumit Semwal
Cc: Alexei Starovoitov
Cc: Juergen Gross
Cc: Jason Cooper
Cc: "David S. Miller"
Cc: Oleg Nesterov
Cc: Andrea Gelmini
Cc: David Woodhouse
Cc: Marc Zyngier
Cc: Rabin Vincent
Cc: "Maciej W. Rozycki"
Cc: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2016-08-04 20:50:07 +0800

23 Mar, 2016

2 commits

1333ab031 ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock ... Browse Code »

This test-case (simplified version of generated by syzkaller)

#include
#include
#include

void test(void)
{
for (;;) {
if (fork()) {
wait(NULL);
continue;
}

ptrace(PTRACE_SEIZE, getppid(), 0, 0);
ptrace(PTRACE_INTERRUPT, getppid(), 0, 0);
_exit(0);
}
}

int main(void)
{
int np;

for (np = 0; np < 8; ++np)
if (!fork())
test();

while (wait(NULL) > 0)
;
return 0;
}

triggers the 2nd WARN_ON_ONCE(!signr) warning in do_jobctl_trap(). The
problem is that __ptrace_unlink() clears task->jobctl under siglock but
task->ptrace is cleared without this lock held; this fools the "else"
branch which assumes that !PT_SEIZED means PT_PTRACED.

Note also that most of other PTRACE_SEIZE checks can race with detach
from the exiting tracer too. Say, the callers of ptrace_trap_notify()
assume that SEIZED can't go away after it was checked.

Signed-off-by: Oleg Nesterov
Reported-by: Dmitry Vyukov
Cc: Tejun Heo
Cc: syzkaller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2016-03-23 06:36:02 +0800
5c465217a ptrace: in PEEK_SIGINFO, check syscall bitness, not task bitness ... Browse Code »

Users of the 32-bit ptrace() ABI expect the full 32-bit ABI. siginfo
translation should check ptrace() ABI, not caller task ABI.

This is an ABI change on SPARC. Let's hope that no one relied on the
old buggy ABI.

Signed-off-by: Andy Lutomirski
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-03-23 06:36:02 +0800

21 Jan, 2016

2 commits

caaee6234 ptrace: use fsuid, fsgid, effective creds for fs access checks ... Browse Code »

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Casey Schaufler
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: James Morris
Cc: "Serge E. Hallyn"
Cc: Andy Shevchenko
Cc: Andy Lutomirski
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Willy Tarreau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2016-01-21 09:09:18 +0800
7c3b00e06 ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable ... Browse Code »

ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

This doesn't really solve the problem(s) and we probably need to fix the
freezer. In particular, note that this means that pm freezer will fail if
it races attach-to-stopped-task.

And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
not clear if we really need to hide this transition from debugger, WNOHANG
after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

Signed-off-by: Oleg Nesterov
Reported-by: Andrey Ryabinin
Cc: Roland McGrath
Acked-by: Tejun Heo
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2016-01-21 09:09:18 +0800

28 Oct, 2015

1 commit

f8e529ed9 seccomp, ptrace: add support for dumping seccomp filters ... Browse Code »

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command specific errors are ENOENT: which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the heirarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
* check that the orig_prog exists (not necessary right now, but when
grows eBPF support it will be)
* s/n/filter_off and make it an unsigned long to match ptrace
* count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
* use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

v5: * add another condition to WARN_ON

v6: * rebase on net-next

Signed-off-by: Tycho Andersen
Acked-by: Kees Cook
CC: Will Drewry
Reviewed-by: Oleg Nesterov
CC: Andy Lutomirski
CC: Pavel Emelyanov
CC: Serge E. Hallyn
CC: Alexei Starovoitov
CC: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Tycho Andersen
2015-10-28 10:55:13 +0800

16 Jul, 2015

1 commit

13c4a9011 seccomp: add ptrace options for suspend/resume ... Browse Code »

This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.

One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if we are in a seccomp mode where these processes are
prohibited from making these syscalls, then what CRIU does kills the task.

This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
they can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because although today
ptrace can be used to bypass seccomp, there is some discussion of closing
this loophole in the future and we would like this patch to not depend on
that behavior and be future proofed for when it is removed.

Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching and have the
filters resumed as well.

v2 changes:

* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
as the flag to check. This means that as soon as the tracer
detaches/dies, seccomp is re-enabled and as a corrollary that one can not
disable seccomp across PTRACE_ATTACHs.

v3 changes:

* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
used

v4 changes:

* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
directly

v5 changes:

* check that seccomp is not enabled (or suspended) on the tracer

Signed-off-by: Tycho Andersen
CC: Will Drewry
CC: Roland McGrath
CC: Pavel Emelyanov
CC: Serge E. Hallyn
Acked-by: Oleg Nesterov
Acked-by: Andy Lutomirski
[kees: access seccomp.mode through seccomp_mode() instead]
Signed-off-by: Kees Cook

Tycho Andersen
2015-07-16 02:52:52 +0800

17 Apr, 2015

2 commits

64a4096c5 ptrace: ptrace_detach() can no longer race with SIGKILL ... Browse Code »

ptrace_detach() re-checks ->ptrace under tasklist lock and calls
release_task() if __ptrace_detach() returns true. This was needed because
the __TASK_TRACED tracee could be killed/untraced, and it could even pass
exit_notify() before we take tasklist_lock.

But this is no longer possible after 9899d11f6544 "ptrace: ensure
arch_ptrace/ptrace_request can never race with SIGKILL". We can turn
these checks into WARN_ON() and remove release_task().

While at it, document the setting of child->exit_code.

Signed-off-by: Oleg Nesterov
Cc: Pavel Labath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-04-17 21:04:06 +0800
b72c18699 ptrace: fix race between ptrace_resume() and wait_task_stopped() ... Browse Code »

ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

#include
#include
#include
#include
#include
#include

int pid;

void *waiter(void *arg)
{
int stat;

for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;

assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}

int main(void)
{
pthread_t thread;

pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}

assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);

return 0;
}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov
Reported-by: Pavel Labath
Tested-by: Pavel Labath
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-04-17 21:04:06 +0800

18 Feb, 2015

1 commit

1cca3385e ptrace: remove linux/compat.h inclusion under CONFIG_COMPAT ... Browse Code »

Commit 84c751bd4aeb ("ptrace: add ability to retrieve signals without
removing from a queue (v4)") includes globally in
ptrace.c

This patch removes inclusion under if defined CONFIG_COMPAT.

Signed-off-by: Fabian Frederick
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2015-02-18 06:34:51 +0800

11 Dec, 2014

1 commit

7c8bd2322 exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent() ... Browse Code »

Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks,
we can simply pass "dead_children" list to exit_ptrace() and remove
another release_task() loop. Plus this way we do not need to drop and
reacquire tasklist_lock.

Also shift the list_empty(ptraced) check, if we want this optimization it
makes sense to eliminate the function call altogether.

Signed-off-by: Oleg Nesterov
Cc: Aaron Tomlin
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman" ,
Cc: Sterling Alexander
Cc: Peter Zijlstra
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-12-11 09:41:10 +0800

16 Jul, 2014

1 commit

743162013 sched: Remove proliferation of wait_on_bit() action functions ... Browse Code »

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown
Acked-by: David Howells (fscache, keys)
Acked-by: Steven Whitehouse (gfs2)
Acked-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Steve French
Cc: Linus Torvalds
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar

NeilBrown
2014-07-16 21:10:39 +0800

06 Mar, 2014

1 commit

62a6fa976 kernel/compat: convert to COMPAT_SYSCALL_DEFINE ... Browse Code »

Convert all compat system call functions where all parameter types
have a size of four or less than four bytes, or are pointer types
to COMPAT_SYSCALL_DEFINE.
The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
zero and sign extension to 64 bit of all parameters if needed.

Signed-off-by: Heiko Carstens

Heiko Carstens
2014-03-06 22:35:10 +0800

13 Nov, 2013

1 commit

d049f74f2 exec/ptrace: fix get_dumpable() incorrect tests ... Browse Code »

The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }

Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.

CVE-2013-2929

Reported-by: Vasily Kulikov
Signed-off-by: Kees Cook
Cc: "Luck, Tony"
Cc: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-11-13 11:09:33 +0800

12 Sep, 2013

1 commit

73af963f9 __ptrace_may_access() should not deny sub-threads ... Browse Code »

__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current, this can can lead to surprising results.

For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable. setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable). After that get_dumpable() fails.

(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)

Change __ptrace_may_access() to use same_thread_group() instead of "task
== current". Any security check is pointless when the tasks share the
same ->mm.

Signed-off-by: Mark Grondona
Signed-off-by: Ben Woodard
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Grondona
2013-09-12 06:59:01 +0800

07 Aug, 2013

1 commit

35114fcbe Revert "ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)" ... Browse Code »

This reverts commit fab840fc2d542fabcab903db8e03589a6702ba5f.

This commit even has the test-case to prove that the tracee
can be killed by SIGTRAP if the debugger does not remove the
breakpoints before PTRACE_DETACH.

However, this is exactly what wineserver deliberately does,
set_thread_context() calls PTRACE_ATTACH + PTRACE_DETACH just
for PTRACE_POKEUSER(DR*) in between.

So we should revert this fix and document that PTRACE_DETACH
should keep the breakpoints.

Reported-by: Felipe Contreras
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-08-07 04:16:32 +0800

10 Jul, 2013

2 commits

fab840fc2 ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child) ... Browse Code »

Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child). This
frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
ensures that the tracee won't be killed by SIGTRAP triggered by the
active breakpoints.

Test-case:

unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
{
unsigned long dr7;

dr7 = ((len | type) & 0xf)
<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
if (enable)
dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

return dr7;
}

int write_dr(int pid, int dr, unsigned long val)
{
return ptrace(PTRACE_POKEUSER, pid,
offsetof (struct user, u_debugreg[dr]),
val);
}

void func(void)
{
}

int main(void)
{
int pid, stat;
unsigned long dr7;

pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
kill(getpid(), SIGHUP);

func();
return 0x13;
}

assert(pid == waitpid(-1, &stat, 0));
assert(WSTOPSIG(stat) == SIGHUP);

assert(write_dr(pid, 0, (long)func) == 0);
dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
assert(write_dr(pid, 7, dr7) == 0);

assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
assert(pid == waitpid(-1, &stat, 0));
assert(stat == 0x1300);

return 0;
}

Before this patch the child is killed after PTRACE_DETACH.

Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-10 01:33:26 +0800
7c8df2863 ptrace: revert "Prepare to fix racy accesses on task breakpoints" ... Browse Code »

This reverts commit bf26c018490c ("Prepare to fix racy accesses on task
breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
we can kill them and remove task->ptrace_bp_refcnt.

Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Acked-by: Michael Neuling
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-10 01:33:26 +0800

04 Jul, 2013

1 commit

29000caec ptrace: add ability to get/set signal-blocked mask ... Browse Code »

crtools uses a parasite code for dumping processes. The parasite code is
injected into a process with help PTRACE_SEIZE.

Currently crtools blocks signals from a parasite code. If a process has
pending signals, crtools wait while a process handles these signals.

This method is not suitable for stopped tasks. A stopped task can have a
few pending signals, when we will try to execute a parasite code, we will
need to drop SIGSTOP, but all other signals must remain pending, because a
state of processes must not be changed during checkpointing.

This patch adds two ptrace commands to set/get signal-blocked mask.

I think gdb can use this commands too.

[akpm@linux-foundation.org: be consistent with brace layout]
Signed-off-by: Andrey Vagin
Reviewed-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Michael Kerrisk
Cc: Pavel Emelyanov
Cc: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2013-07-04 07:08:01 +0800

30 Jun, 2013

1 commit

706b23bde Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation ... Browse Code »

This __put_user() could be used by unprivileged processes to write into
kernel memory. The issue here is that even if copy_siginfo_to_user()
fails, the error code is not checked before __put_user() is executed.

Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
so it has not hit a stable release yet.

Signed-off-by: Mathieu Desnoyers
Acked-by: Oleg Nesterov
Cc: Andrey Vagin
Cc: Roland McGrath
Cc: Paul McKenney
Cc: David Howells
Cc: Dave Jones
Cc: Pavel Emelyanov
Cc: Pedro Alves
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2013-06-30 02:29:08 +0800

08 May, 2013

1 commit

a27bb332c aio: don't include aio.h in sched.h ... Browse Code »

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 11:16:25 +0800

01 May, 2013

1 commit

84c751bd4 ptrace: add ability to retrieve signals without removing from a queue (v4) ... Browse Code »

This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

This request is used to retrieve information about pending signals
starting with the specified sequence number. Siginfo_t structures are
copied from the child into the buffer starting at "data".

The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
struct ptrace_peeksiginfo_args {
u64 off; /* from which siginfo to start */
u32 flags;
s32 nr; /* how may siginfos to take */
};

"nr" has type "s32", because ptrace() returns "long", which has 32 bits on
i386 and a negative values is used for errors.

Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping
signals from process-wide queue. If this flag is not set, signals are
read from a per-thread queue.

The request PTRACE_PEEKSIGINFO returns a number of dumped signals. If a
signal with the specified sequence number doesn't exist, ptrace returns
zero. The request returns an error, if no signal has been dumped.

Errors:
EINVAL - one or more specified flags are not supported or nr is negative
EFAULT - buf or addr is outside your accessible address space.

A result siginfo contains a kernel part of si_code which usually striped,
but it's required for queuing the same siginfo back during restore of
pending signals.

This functionality is required for checkpointing pending signals. Pedro
Alves suggested using it in "gdb" to peek at pending signals. gdb already
uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
dequeued. This functionality allows gdb to look at the pending signals
which were not reported yet.

The prototype of this code was developed by Oleg Nesterov.

Signed-off-by: Andrew Vagin
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Cc: David Howells
Cc: Dave Jones
Cc: "Michael Kerrisk (man-pages)"
Cc: Pavel Emelyanov
Cc: Linus Torvalds
Cc: Pedro Alves
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrey Vagin
2013-05-01 08:04:05 +0800

09 Feb, 2013

1 commit

e8440c145 uprobes: Add exports for module use ... Browse Code »

The original pull message for uprobes (commit 654443e2) noted:

This tree includes uprobes support in 'perf probe' - but SystemTap
(and other tools) can take advantage of user probe points as well.

In order to actually be usable in module-based tools like SystemTap, the
interface needs to be exported. This patch first adds the obvious
exports for uprobe_register and uprobe_unregister. Then it also adds
one for task_user_regset_view, which is necessary to get the correct
state of userspace registers.

Signed-off-by: Josh Stone
Signed-off-by: Oleg Nesterov

Josh Stone
2013-02-09 00:47:13 +0800

23 Jan, 2013

2 commits

9899d11f6 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL ... Browse Code »

putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi
Reported-by: Suleiman Souhlal
Suggested-by: Linus Torvalds
Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-01-23 02:08:00 +0800
910ffdb18 ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() ... Browse Code »

Cleanup and preparation for the next change.

signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.

Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.

This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-01-23 00:50:08 +0800

21 Jan, 2013

1 commit

edea0d03e ia64: kill thread_matches(), unexport ptrace_check_attach() ... Browse Code »

The ia64 function "thread_matches()" has no users since commit
e868a55c2a8c ("[IA64] remove find_thread_for_addr()"). Remove it.

This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-01-21 04:26:05 +0800

18 Dec, 2012

2 commits

848b81415 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge misc patches from Andrew Morton:
"Incoming:

- lots of misc stuff

- backlight tree updates

- lib/ updates

- Oleg's percpu-rwsem changes

- checkpatch

- rtc

- aoe

- more checkpoint/restart support

I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton : (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc//fdinfo/ fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc//fdinfo/ output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...

Linus Torvalds
2012-12-18 12:58:12 +0800
992fb6e17 ptrace: introduce PTRACE_O_EXITKILL ... Browse Code »

Ptrace jailers want to be sure that the tracee can never escape
from the control. However if the tracer dies unexpectedly the
tracee continues to run in potentially unsafe mode.

Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.

Note that the new option is not equal to the last-option << 1. Because
currently all options have an event, and the new one starts the eventless
group. It uses the random 20 bit, so we have the room for 12 more events,
but we can also add the new eventless options below this one.

Suggested by Amnon Shiloh.

Signed-off-by: Oleg Nesterov
Tested-by: Amnon Shiloh
Cc: Denys Vlasenko
Cc: Michael Kerrisk
Cc: Serge Hallyn
Cc: Chris Evans
Cc: David Howells
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-12-18 09:15:22 +0800

20 Nov, 2012

1 commit

4c44aaafa userns: Kill task_user_ns ... Browse Code »

The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task. struct cred may go away as
soon as the rcu lock is released. This leads to a race where we
can dereference a stale user namespace pointer.

To make it obvious a struct cred is involved kill task_user_ns.

To kill the race modify the users of task_user_ns to only
reference the user namespace while the rcu lock is held.

Cc: Kees Cook
Cc: James Morris
Acked-by: Kees Cook
Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-11-20 20:17:44 +0800

03 Aug, 2012

1 commit

9f99798ff ptrace: mark __ptrace_may_access() static ... Browse Code »

__ptrace_may_access() is used within only kernel/ptrace.c.

Signed-off-by: Tetsuo Handa
Signed-off-by: James Morris

Tetsuo Handa
2012-08-03 12:47:17 +0800

03 May, 2012

1 commit

5af662030 userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids ... Browse Code »

Update the permission checks to use the new uid_eq and gid_eq helpers
and remove the now unnecessary user_ns equality comparison.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-03 18:28:51 +0800

08 Apr, 2012

1 commit

c4a4d6037 userns: Use cred->user_ns instead of cred->user->user_ns ... Browse Code »

Optimize performance and prepare for the removal of the user_ns reference
from user_struct. Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-04-08 07:55:51 +0800

24 Mar, 2012

3 commits

ee00560c7 ptrace: remove PTRACE_SEIZE_DEVEL bit ... Browse Code »

PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.

Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Acked-by: Oleg Nesterov
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:41 +0800
aa9147c98 ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter ... Browse Code »

This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.

While we are at it:

Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter. To that end, error out if 'addr'
is non-zero. PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there... let's avoid
repeating this mistake with SEIZE.

Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
was probably ok (not a bug), but let's be on a safer side.

Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Cc: Pedro Alves
Reviewed-by: Oleg Nesterov
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:40 +0800
86b6c1f30 ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code ... Browse Code »

Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.

Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko
Acked-by: Tejun Heo
Reviewed-by: Oleg Nesterov
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denys Vlasenko
2012-03-24 07:58:40 +0800