Eric Lee / smarc-fsl-linux-kernel

18 Nov, 2020

1 commit

cf2370524 ptrace: Set PF_SUPERPRIV when checking capability ... Browse Code »

Commit 69f594a38967 ("ptrace: do not audit capability check when outputing
/proc/pid/stat") replaced the use of ns_capable() with
has_ns_capability{,_noaudit}() which doesn't set PF_SUPERPRIV.

Commit 6b3ad6649a4c ("ptrace: reintroduce usage of subjective credentials in
ptrace_has_cap()") replaced has_ns_capability{,_noaudit}() with
security_capable(), which doesn't set PF_SUPERPRIV neither.

Since commit 98f368e9e263 ("kernel: Add noaudit variant of ns_capable()"), a
new ns_capable_noaudit() helper is available. Let's use it!

As a result, the signature of ptrace_has_cap() is restored to its original one.

Cc: Christian Brauner
Cc: Eric Paris
Cc: Jann Horn
Cc: Kees Cook
Cc: Oleg Nesterov
Cc: Serge E. Hallyn
Cc: Tyler Hicks
Cc: stable@vger.kernel.org
Fixes: 6b3ad6649a4c ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()")
Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Mickaël Salaün
Reviewed-by: Jann Horn
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/20201030123849.770769-2-mic@digikod.net

Mickaël Salaün
2020-11-18 04:53:22 +0800

18 Jan, 2020

1 commit

6b3ad6649 ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() ... Browse Code »

Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to various
proc files since they are not violations of policy. While doing so it
somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. This
is wrong since. ptrace_has_cap() is currently only used in
ptrace_may_access() And is used to check whether the calling task (subject)
has the CAP_SYS_PTRACE capability in the provided user namespace to operate
on the target task (object). According to the cred.h comments this would
mean the subjective credentials of the calling task need to be used.
This switches ptrace_has_cap() to use security_capable(). Because we only
call ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling task's creds under rcu_read_lock() there's
no need to go through another series of dereferences and rcu locking done
in ns_capable{_noaudit}().

As one example where this might be particularly problematic, Jann pointed
out that in combination with the upcoming IORING_OP_OPENAT feature, this
bug might allow unprivileged users to bypass the capability checks while
asynchronously opening files like /proc/*/mem, because the capability
checks for this would be performed against kernel credentials.

To illustrate on the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials of the
caller. Later on, when it starts to do work it creates a kernel thread and
registers a callback. The callback runs with kernel creds for
ktask->real_cred and ktask->cred. To prevent this from becoming a
full-blown 0-day io_uring will call override_cred() and override
ktask->cred with the subjective credentials of the creator of the io_uring
instance. With ptrace_has_cap() currently looking at ktask->real_cred this
override will be ineffective and the caller will be able to open arbitray
proc files as mentioned above.
Luckily, this is currently not exploitable but will turn into a 0-day once
IORING_OP_OPENAT{2} land in v5.6. Fix it now!

Cc: Oleg Nesterov
Cc: Eric Paris
Cc: stable@vger.kernel.org
Reviewed-by: Kees Cook
Reviewed-by: Serge Hallyn
Reviewed-by: Jann Horn
Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner

Christian Brauner
2020-01-18 20:51:39 +0800

17 Jul, 2019

1 commit

201766a20 ptrace: add PTRACE_GET_SYSCALL_INFO request ... Browse Code »

PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.

There are two reasons for a special syscall-related ptrace request.

Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:

* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.

* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.

* Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.

Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.

ptrace(2) man page:

long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed by "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of data to be written by the kernel exceeds the size
specified by "addr" argument, the output is truncated.

[ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
Signed-off-by: Elvira Khabirova
Co-developed-by: Dmitry V. Levin
Signed-off-by: Dmitry V. Levin
Reviewed-by: Oleg Nesterov
Reviewed-by: Kees Cook
Reviewed-by: Andy Lutomirski
Cc: Eugene Syromyatnikov
Cc: Benjamin Herrenschmidt
Cc: Greentime Hu
Cc: Helge Deller [parisc]
Cc: James E.J. Bottomley
Cc: James Hogan
Cc: kbuild test robot
Cc: Michael Ellerman
Cc: Paul Burton
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: Richard Kuo
Cc: Shuah Khan
Cc: Vincent Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Elvira Khabirova
2019-07-17 10:23:24 +0800

09 Jul, 2019

1 commit

dfd437a25 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux ... Browse Code »

Pull arm64 updates from Catalin Marinas:

- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}

- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly

- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)

- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)

- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)

- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic

- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms

- perf: DDR performance monitor support for iMX8QXP

- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers

- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent

- arm64 do_page_fault() and hugetlb cleanups

- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)

- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)

- CONFIG_RANDOMIZE_BASE now enabled in defconfig

- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area

- Make ZONE_DMA32 configurable

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...

Linus Torvalds
2019-07-09 00:54:55 +0800

05 Jul, 2019

1 commit

6994eefb0 ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME ... Browse Code »

Fix two issues:

When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
reference to the parent's objective credentials, then give that pointer
to get_cred(). However, the object lifetime rules for things like
struct cred do not permit unconditionally turning an RCU reference into
a stable reference.

PTRACE_TRACEME records the parent's credentials as if the parent was
acting as the subject, but that's not the case. If a malicious
unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
at a later point, the parent process becomes attacker-controlled
(because it drops privileges and calls execve()), the attacker ends up
with control over two processes with a privileged ptrace relationship,
which can be abused to ptrace a suid binary and obtain root privileges.

Fix both of these by always recording the credentials of the process
that is requesting the creation of the ptrace relationship:
current_cred() can't change under us, and current is the proper subject
for access control.

This change is theoretically userspace-visible, but I am not aware of
any code that it will actually break.

Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: Jann Horn
Acked-by: Oleg Nesterov
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Jann Horn
2019-07-05 01:00:41 +0800

12 Jun, 2019

2 commits

aa7235483 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull ptrace fixes from Eric Biederman:
"This is just two very minor fixes:

- prevent ptrace from reading unitialized kernel memory found twice
by syzkaller

- restore a missing smp_rmb in ptrace_may_access and add comment tp
it so it is not removed by accident again.

Apologies for being a little slow about getting this to you, I am
still figuring out how to develop with a little baby in the house"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: restore smp_rmb() in __ptrace_may_access()
signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO

Linus Torvalds
2019-06-12 09:44:45 +0800
f6581f5b5 ptrace: restore smp_rmb() in __ptrace_may_access() ... Browse Code »

Restore the read memory barrier in __ptrace_may_access() that was deleted
a couple years ago. Also add comments on this barrier and the one it pairs
with to explain why they're there (as far as I understand).

Fixes: bfedb589252c ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks")
Cc: stable@vger.kernel.org
Acked-by: Kees Cook
Acked-by: Oleg Nesterov
Signed-off-by: Jann Horn
Signed-off-by: Eric W. Biederman

Jann Horn
2019-06-12 04:08:28 +0800

06 Jun, 2019

1 commit

15532fd6f ptrace: move clearing of TIF_SYSCALL_EMU flag to core ... Browse Code »

While the TIF_SYSCALL_EMU is set in ptrace_resume independent of any
architecture, currently only powerpc and x86 unset the TIF_SYSCALL_EMU
flag in ptrace_disable which gets called from ptrace_detach.

Let's move the clearing of TIF_SYSCALL_EMU flag to __ptrace_unlink
which gets executed from ptrace_detach and also keep it along with
or close to clearing of TIF_SYSCALL_TRACE.

Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Thomas Gleixner
Cc: Ingo Molnar
Acked-by: Oleg Nesterov
Signed-off-by: Sudeep Holla
Signed-off-by: Catalin Marinas

Sudeep Holla
2019-06-06 00:51:17 +0800

30 May, 2019

1 commit

f6e2aa91a signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO ... Browse Code »

Recently syzbot in conjunction with KMSAN reported that
ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
Inspecting ptrace_peek_siginfo confirms this.

The problem is that off when initialized from args.off can be
initialized to a negaive value. At which point the "if (off >= 0)"
test to see if off became negative fails because off started off
negative.

Prevent the core problem by adding a variable found that is only true
if a siginfo is found and copied to a temporary in preparation for
being copied to userspace.

Prevent args.off from being truncated when being assigned to off by
testing that off is
Cc: stable@vger.kernel.org
Reported-by: syzbot+0d602a1b0d8c95bdf299@syzkaller.appspotmail.com
Fixes: 84c751bd4aeb ("ptrace: add ability to retrieve signals without removing from a queue (v4)")
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-30 19:38:29 +0800

21 May, 2019

1 commit

457c89965 treewide: Add SPDX license identifier for missed files ... Browse Code »

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:45 +0800

30 Mar, 2019

1 commit

fcfc2aa01 ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK ... Browse Code »

There are a few system calls (pselect, ppoll, etc) which replace a task
sigmask while they are running in a kernel-space

When a task calls one of these syscalls, the kernel saves a current
sigmask in task->saved_sigmask and sets a syscall sigmask.

On syscall-exit-stop, ptrace traps a task before restoring the
saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and
PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by
saved_sigmask, when the task returns to user-space.

This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask
if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag.

Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com
Fixes: 29000caecbe8 ("ptrace: add ability to get/set signal-blocked mask")
Signed-off-by: Andrei Vagin
Acked-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrei Vagin
2019-03-30 01:01:37 +0800

04 Jan, 2019

1 commit

96d4f267e Remove 'type' argument from access_ok() function ... Browse Code »

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

- csky still had the old "verify_area()" name as an alias.

- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)

- microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds

Linus Torvalds
2019-01-04 10:57:57 +0800

28 Nov, 2018

1 commit

46f7ecb1e ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS ... Browse Code »

The IBPB control code in x86 removed the usage. Remove the functionality
which was introduced for this.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Jiri Kosina
Cc: Tom Lendacky
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: David Woodhouse
Cc: Tim Chen
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Casey Schaufler
Cc: Asit Mallick
Cc: Arjan van de Ven
Cc: Jon Masters
Cc: Waiman Long
Cc: Greg KH
Cc: Dave Stewart
Cc: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de

Thomas Gleixner
2018-11-28 18:57:11 +0800

24 Oct, 2018

1 commit

ba9f6f895 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/eb… ... Browse Code »

…iederm/user-namespace

Pull siginfo updates from Eric Biederman:
"I have been slowly sorting out siginfo and this is the culmination of
that work.

The primary result is in several ways the signal infrastructure has
been made less error prone. The code has been updated so that manually
specifying SEND_SIG_FORCED is never necessary. The conversion to the
new siginfo sending functions is now complete, which makes it
difficult to send a signal without filling in the proper siginfo
fields.

At the tail end of the patchset comes the optimization of decreasing
the size of struct siginfo in the kernel from 128 bytes to about 48
bytes on 64bit. The fundamental observation that enables this is by
definition none of the known ways to use struct siginfo uses the extra
bytes.

This comes at the cost of a small user space observable difference.
For the rare case of siginfo being injected into the kernel only what
can be copied into kernel_siginfo is delivered to the destination, the
rest of the bytes are set to 0. For cases where the signal and the
si_code are known this is safe, because we know those bytes are not
used. For cases where the signal and si_code combination is unknown
the bits that won't fit into struct kernel_siginfo are tested to
verify they are zero, and the send fails if they are not.

I made an extensive search through userspace code and I could not find
anything that would break because of the above change. If it turns out
I did break something it will take just the revert of a single change
to restore kernel_siginfo to the same size as userspace siginfo.

Testing did reveal dependencies on preferring the signo passed to
sigqueueinfo over si->signo, so bit the bullet and added the
complexity necessary to handle that case.

Testing also revealed bad things can happen if a negative signal
number is passed into the system calls. Something no sane application
will do but something a malicious program or a fuzzer might do. So I
have fixed the code that performs the bounds checks to ensure negative
signal numbers are handled"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
signal: Guard against negative signal numbers in copy_siginfo_from_user32
signal: Guard against negative signal numbers in copy_siginfo_from_user
signal: In sigqueueinfo prefer sig not si_signo
signal: Use a smaller struct siginfo in the kernel
signal: Distinguish between kernel_siginfo and siginfo
signal: Introduce copy_siginfo_from_user and use it's return value
signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
signal: Fail sigqueueinfo if si_signo != sig
signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
signal/unicore32: Use force_sig_fault where appropriate
signal/unicore32: Generate siginfo in ucs32_notify_die
signal/unicore32: Use send_sig_fault where appropriate
signal/arc: Use force_sig_fault where appropriate
signal/arc: Push siginfo generation into unhandled_exception
signal/ia64: Use force_sig_fault where appropriate
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/arm/kvm: Use send_sig_mceerr
signal/arm: Use send_sig_fault where appropriate
signal/arm: Use force_sig_fault where appropriate
...

Linus Torvalds
2018-10-24 18:22:39 +0800

03 Oct, 2018

2 commits

ae7795bc6 signal: Distinguish between kernel_siginfo and siginfo ... Browse Code »

Linus recently observed that if we did not worry about the padding
member in struct siginfo it is only about 48 bytes, and 48 bytes is
much nicer than 128 bytes for allocating on the stack and copying
around in the kernel.

The obvious thing of only adding the padding when userspace is
including siginfo.h won't work as there are sigframe definitions in
the kernel that embed struct siginfo.

So split siginfo in two; kernel_siginfo and siginfo. Keeping the
traditional name for the userspace definition. While the version that
is used internally to the kernel and ultimately will not be padded to
128 bytes is called kernel_siginfo.

The definition of struct kernel_siginfo I have put in include/signal_types.h

A set of buildtime checks has been added to verify the two structures have
the same field offsets.

To make it easy to verify the change kernel_siginfo retains the same
size as siginfo. The reduction in size comes in a following change.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-10-03 22:47:43 +0800
4cd2e0e70 signal: Introduce copy_siginfo_from_user and use it's return value ... Browse Code »

In preparation for using a smaller version of siginfo in the kernel
introduce copy_siginfo_from_user and use it when siginfo is copied from
userspace.

Make the pattern for using copy_siginfo_from_user and
copy_siginfo_from_user32 to capture the return value and return that
value on error.

This is a necessary prerequisite for using a smaller siginfo
in the kernel than the kernel exports to userspace.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-10-03 22:47:15 +0800

26 Sep, 2018

1 commit

dbfe2953f x86/speculation: Apply IBPB more strictly to avoid cross-process data leak ... Browse Code »

Currently, IBPB is only issued in cases when switching into a non-dumpable
process, the rationale being to protect such 'important and security
sensitive' processess (such as GPG) from data leaking into a different
userspace process via spectre v2.

This is however completely insufficient to provide proper userspace-to-userpace
spectrev2 protection, as any process can poison branch buffers before being
scheduled out, and the newly scheduled process immediately becomes spectrev2
victim.

In order to minimize the performance impact (for usecases that do require
spectrev2 protection), issue the barrier only in cases when switching between
processess where the victim can't be ptraced by the potential attacker (as in
such cases, the attacker doesn't have to bother with branch buffers at all).

[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
fine-grained ]

Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen
Signed-off-by: Jiri Kosina
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Josh Poimboeuf
Cc: Andrea Arcangeli
Cc: "WoodhouseDavid"
Cc: Andi Kleen
Cc: "SchauflerCasey"
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm

Jiri Kosina
2018-09-26 20:26:51 +0800

12 Sep, 2018

1 commit

079b22dc9 signal: Use SEND_SIG_PRIV not SEND_SIG_FORCED with SIGKILL and SIGSTOP ... Browse Code »

Now that siginfo is never allocated for SIGKILL and SIGSTOP there is
no difference between SEND_SIG_PRIV and SEND_SIG_FORCED for SIGKILL
and SIGSTOP. This makes SEND_SIG_FORCED unnecessary and redundant in
the presence of SIGKILL and SIGSTOP. Therefore change users of
SEND_SIG_FORCED that are sending SIGKILL or SIGSTOP to use
SEND_SIG_PRIV instead.

This removes the last users of SEND_SIG_FORCED.

Reviewed-by: Thomas Gleixner
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-09-12 03:19:35 +0800

07 Feb, 2018

1 commit

2ee082608 pids: introduce find_get_task_by_vpid() helper ... Browse Code »

There are several functions that do find_task_by_vpid() followed by
get_task_struct(). We can use a helper function instead.

Link: http://lkml.kernel.org/r/1509602027-11337-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Rapoport
2018-02-07 10:32:46 +0800

01 Feb, 2018

1 commit

3dbc4f548 Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull seccomp updates from James Morris:
"Add support for retrieving seccomp metadata"

* 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ptrace, seccomp: add support for retrieving seccomp metadata
seccomp: hoist out filter resolving logic

Linus Torvalds
2018-02-01 05:44:45 +0800

17 Jan, 2018

1 commit

0752d7bf6 ptrace: Use copy_siginfo in setsiginfo and getsiginfo ... Browse Code »

Now that copy_siginfo copies all of the fields this is safe, safer (as
all of the bits are guaranteed to be copied), clearer, and less error
prone than using a structure copy.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-01-17 02:48:30 +0800

16 Jan, 2018

1 commit

eb5346c37 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 ... Browse Code »

The new unified copy_siginfo_from_user32 takes care of this.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-01-16 08:01:19 +0800

29 Nov, 2017

1 commit

26500475a ptrace, seccomp: add support for retrieving seccomp metadata ... Browse Code »

With the new SECCOMP_FILTER_FLAG_LOG, we need to be able to extract these
flags for checkpoint restore, since they describe the state of a filter.

So, let's add PTRACE_SECCOMP_GET_METADATA, similar to ..._GET_FILTER, which
returns the metadata of the nth filter (right now, just the flags).
Hopefully this will be future proof, and new per-filter metadata can be
added to this struct.

Signed-off-by: Tycho Andersen
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
Signed-off-by: Kees Cook

Tycho Andersen
2017-11-29 07:41:01 +0800

25 Jul, 2017

1 commit

cc731525f signal: Remove kernel interal si_code magic ... Browse Code »

struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS

While this looks plausible on the surface, in practice this situation has
not worked well.

- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.

- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.

- These kernel internal values leaked to userspace via ptrace_peek_siginfo

- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.

- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.

- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.

- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.

So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.

To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.

A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.

Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.

Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.

Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.

The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.

The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.

Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2017-07-25 03:30:28 +0800

23 May, 2017

1 commit

c70d9d809 ptrace: Properly initialize ptracer_cred on fork ... Browse Code »

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default. This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork. Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai
Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2017-05-23 20:40:44 +0800

08 Apr, 2017

1 commit

5402e97af ptrace: fix PTRACE_LISTEN race corrupting task->state ... Browse Code »

In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.

Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall
Acked-by: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

bsegall@google.com
2017-04-08 15:47:48 +0800

02 Mar, 2017

3 commits

299300258 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:35 +0800
f7ccbae45 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:28 +0800
6e84f3152 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:28 +0800

23 Nov, 2016

3 commits

84d77d3f0 ptrace: Don't allow accessing an undumpable mm ... Browse Code »

It is the reasonable expectation that if an executable file is not
readable there will be no way for a user without special privileges to
read the file. This is enforced in ptrace_attach but if ptrace
is already attached before exec there is no enforcement for read-only
executables.

As the only way to read such an mm is through access_process_vm
spin a variant called ptrace_access_vm that will fail if the
target process is not being ptraced by the current process, or
the current process did not have sufficient privileges when ptracing
began to read the target processes mm.

In the ptrace implementations replace access_process_vm by
ptrace_access_vm. There remain several ptrace sites that still use
access_process_vm as they are reading the target executables
instructions (for kernel consumption) or register stacks. As such it
does not appear necessary to add a permission check to those calls.

This bug has always existed in Linux.

Fixes: v1.0
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2016-11-23 02:57:38 +0800
64b875f7a ptrace: Capture the ptracer's creds not PT_PTRACE_CAP ... Browse Code »

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked. This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in. This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid. More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Cc: stable@vger.kernel.org
Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2016-11-23 01:49:49 +0800
bfedb5892 mm: Add a user_ns owner to mm_struct and fix ptrace permission checks ... Browse Code »

During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file. A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue. task->cred->user_ns is always the same
as or descendent of mm->user_ns. Which guarantees that having
CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
credentials.

To prevent regressions mm->dumpable and mm->user_ns are not considered
when a task has no mm. As simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc//stat

Cc: stable@vger.kernel.org
Acked-by: Kees Cook
Tested-by: Cyrill Gorcunov
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2016-11-23 01:49:48 +0800

19 Oct, 2016

1 commit

f307ab6dc mm: replace access_process_vm() write parameter with gup_flags ... Browse Code »

This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes
Acked-by: Jesper Nilsson
Acked-by: Michal Hocko
Acked-by: Michael Ellerman
Signed-off-by: Linus Torvalds

Lorenzo Stoakes
2016-10-19 23:31:25 +0800

12 Oct, 2016

1 commit

0a5bf409d ptrace: clear TIF_SYSCALL_TRACE on ptrace detach ... Browse Code »

On __ptrace_detach(), called from do_exit()->exit_notify()->
forget_original_parent()->exit_ptrace(), the TIF_SYSCALL_TRACE in
thread->flags of the tracee is not cleared up. This results in the
tracehook_report_syscall_* being called (though there's no longer a tracer
listening to that) upon its further syscalls.

Example scenario - attach "strace" to a running process and kill it (the
strace) with SIGKILL. You'll see that the syscall trace hooks are still
being called.

The clearing of this flag should be moved from ptrace_detach() to
__ptrace_detach().

Link: http://lkml.kernel.org/r/1472759493-20554-1-git-send-email-alnovak@suse.cz
Signed-off-by: Ales Novak
Acked-by: Oleg Nesterov
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ales Novak
2016-10-12 06:06:32 +0800

04 Aug, 2016

1 commit

97f2645f3 tree-wide: replace config_enabled() with IS_ENABLED() ... Browse Code »

The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]

- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Acked-by: Kees Cook
Cc: Stas Sergeev
Cc: Matt Redfearn
Cc: Joshua Kinard
Cc: Jiri Slaby
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Markos Chandras
Cc: "Dmitry V. Levin"
Cc: yu-cheng yu
Cc: James Hogan
Cc: Brian Gerst
Cc: Johannes Berg
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Will Drewry
Cc: Nikolay Martynov
Cc: Huacai Chen
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Daniel Borkmann
Cc: Leonid Yegoshin
Cc: Rafal Milecki
Cc: James Cowgill
Cc: Greg Kroah-Hartman
Cc: Ralf Baechle
Cc: Alex Smith
Cc: Adam Buchbinder
Cc: Qais Yousef
Cc: Jiang Liu
Cc: Mikko Rapeli
Cc: Paul Gortmaker
Cc: Denys Vlasenko
Cc: Brian Norris
Cc: Hidehiro Kawai
Cc: "Luis R. Rodriguez"
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Roland McGrath
Cc: Paul Burton
Cc: Kalle Valo
Cc: Viresh Kumar
Cc: Tony Wu
Cc: Huaitong Han
Cc: Sumit Semwal
Cc: Alexei Starovoitov
Cc: Juergen Gross
Cc: Jason Cooper
Cc: "David S. Miller"
Cc: Oleg Nesterov
Cc: Andrea Gelmini
Cc: David Woodhouse
Cc: Marc Zyngier
Cc: Rabin Vincent
Cc: "Maciej W. Rozycki"
Cc: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2016-08-04 20:50:07 +0800

23 Mar, 2016

2 commits

1333ab031 ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock ... Browse Code »

This test-case (simplified version of generated by syzkaller)

#include
#include
#include

void test(void)
{
for (;;) {
if (fork()) {
wait(NULL);
continue;
}

ptrace(PTRACE_SEIZE, getppid(), 0, 0);
ptrace(PTRACE_INTERRUPT, getppid(), 0, 0);
_exit(0);
}
}

int main(void)
{
int np;

for (np = 0; np < 8; ++np)
if (!fork())
test();

while (wait(NULL) > 0)
;
return 0;
}

triggers the 2nd WARN_ON_ONCE(!signr) warning in do_jobctl_trap(). The
problem is that __ptrace_unlink() clears task->jobctl under siglock but
task->ptrace is cleared without this lock held; this fools the "else"
branch which assumes that !PT_SEIZED means PT_PTRACED.

Note also that most of other PTRACE_SEIZE checks can race with detach
from the exiting tracer too. Say, the callers of ptrace_trap_notify()
assume that SEIZED can't go away after it was checked.

Signed-off-by: Oleg Nesterov
Reported-by: Dmitry Vyukov
Cc: Tejun Heo
Cc: syzkaller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2016-03-23 06:36:02 +0800
5c465217a ptrace: in PEEK_SIGINFO, check syscall bitness, not task bitness ... Browse Code »

Users of the 32-bit ptrace() ABI expect the full 32-bit ABI. siginfo
translation should check ptrace() ABI, not caller task ABI.

This is an ABI change on SPARC. Let's hope that no one relied on the
old buggy ABI.

Signed-off-by: Andy Lutomirski
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-03-23 06:36:02 +0800

21 Jan, 2016

2 commits

caaee6234 ptrace: use fsuid, fsgid, effective creds for fs access checks ... Browse Code »

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Casey Schaufler
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: James Morris
Cc: "Serge E. Hallyn"
Cc: Andy Shevchenko
Cc: Andy Lutomirski
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Willy Tarreau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2016-01-21 09:09:18 +0800
7c3b00e06 ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable ... Browse Code »

ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

This doesn't really solve the problem(s) and we probably need to fix the
freezer. In particular, note that this means that pm freezer will fail if
it races attach-to-stopped-task.

And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
not clear if we really need to hide this transition from debugger, WNOHANG
after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

Signed-off-by: Oleg Nesterov
Reported-by: Andrey Ryabinin
Cc: Roland McGrath
Acked-by: Tejun Heo
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2016-01-21 09:09:18 +0800

28 Oct, 2015

1 commit

f8e529ed9 seccomp, ptrace: add support for dumping seccomp filters ... Browse Code »

This patch adds support for dumping a process' (classic BPF) seccomp
filters via ptrace.

PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
seccomp filters. addr should be an integer which represents the ith seccomp
filter (0 is the most recently installed filter). data should be a struct
sock_filter * with enough room for the ith filter, or NULL, in which case
the filter is not saved. The return value for this command is the number of
BPF instructions the program represents, or negative in the case of errors.
Command specific errors are ENOENT: which indicates that there is no ith
filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
filter was not installed as a classic BPF filter.

A caveat with this approach is that there is no way to get explicitly at
the heirarchy of seccomp filters, and users need to memcmp() filters to
decide which are inherited. This means that a task which installs two of
the same filter can potentially confuse users of this interface.

v2: * make save_orig const
* check that the orig_prog exists (not necessary right now, but when
grows eBPF support it will be)
* s/n/filter_off and make it an unsigned long to match ptrace
* count "down" the tree instead of "up" when passing a filter offset

v3: * don't take the current task's lock for inspecting its seccomp mode
* use a 0x42** constant for the ptrace command value

v4: * don't copy to userspace while holding spinlocks

v5: * add another condition to WARN_ON

v6: * rebase on net-next

Signed-off-by: Tycho Andersen
Acked-by: Kees Cook
CC: Will Drewry
Reviewed-by: Oleg Nesterov
CC: Andy Lutomirski
CC: Pavel Emelyanov
CC: Serge E. Hallyn
CC: Alexei Starovoitov
CC: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Tycho Andersen
2015-10-28 10:55:13 +0800