Eric Lee / smarc-fsl-linux-kernel

22 Dec, 2020

3 commits

6212e56b3 UPSTREAM: seccomp: Remove bogus __user annotations ... Browse Code »

Buffers that are passed to read_actions_logged() and write_actions_logged()
are in kernel memory; the sysctl core takes care of copying from/to
userspace.

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Reviewed-by: Tyler Hicks
Signed-off-by: Jann Horn
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/20201120170545.1419332-1-jannh@google.com
(cherry picked from commit fab686eb0307121e7a2890b6d6c57edd2457863d)
Signed-off-by: Jeff Vander Stoep
Change-Id: I378d1eb73ff01928d3255191ed4443cd7b87c720
Bug: 176068146

Jann Horn
2020-12-22 02:48:41 +0800
2d660e977 UPSTREAM: seccomp/cache: Add "emulator" to check if filter is constant allow ... Browse Code »

SECCOMP_CACHE will only operate on syscalls that do not access
any syscall arguments or instruction pointer. To facilitate
this we need a static analyser to know whether a filter will
return allow regardless of syscall arguments for a given
architecture number / syscall number pair. This is implemented
here with a pseudo-emulator, and stored in a per-filter bitmap.

In order to build this bitmap at filter attach time, each filter is
emulated for every syscall (under each possible architecture), and
checked for any accesses of struct seccomp_data that are not the "arch"
nor "nr" (syscall) members. If only "arch" and "nr" are examined, and
the program returns allow, then we can be sure that the filter must
return allow independent from syscall arguments.

Nearly all seccomp filters are built from these cBPF instructions:

BPF_LD | BPF_W | BPF_ABS
BPF_JMP | BPF_JEQ | BPF_K
BPF_JMP | BPF_JGE | BPF_K
BPF_JMP | BPF_JGT | BPF_K
BPF_JMP | BPF_JSET | BPF_K
BPF_JMP | BPF_JA
BPF_RET | BPF_K
BPF_ALU | BPF_AND | BPF_K

Each of these instructions are emulated. Any weirdness or loading
from a syscall argument will cause the emulator to bail.

The emulation is also halted if it reaches a return. In that case,
if it returns an SECCOMP_RET_ALLOW, the syscall is marked as good.

Emulator structure and comments are from Kees [1] and Jann [2].

Emulation is done at attach time. If a filter depends on more
filters, and if the dependee does not guarantee to allow the
syscall, then we skip the emulation of this syscall.

[1] https://lore.kernel.org/lkml/20200923232923.3142503-5-keescook@chromium.org/
[2] https://lore.kernel.org/lkml/CAG48ez1p=dR_2ikKq=xVxkoGg0fYpTBpkhJSv1w-6BG=76PAvw@mail.gmail.com/

Suggested-by: Jann Horn
Signed-off-by: YiFei Zhu
Reviewed-by: Jann Horn
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/71c7be2db5ee08905f41c3be5c1ad6e2601ce88f.1602431034.git.yifeifz2@illinois.edu
(cherry picked from commit 8e01b51a31a1e08e2c3e8fcc0ef6790441be2f61)
Signed-off-by: Jeff Vander Stoep
Change-Id: I5047f7f0d6502e5de6c047743f1053fda3025a6e
Bug: 176068146

YiFei Zhu
2020-12-22 02:46:50 +0800
f89fef0ee UPSTREAM: seccomp/cache: Lookup syscall allowlist bitmap for fast path ... Browse Code »

The overhead of running Seccomp filters has been part of some past
discussions [1][2][3]. Oftentimes, the filters have a large number
of instructions that check syscall numbers one by one and jump based
on that. Some users chain BPF filters which further enlarge the
overhead. A recent work [6] comprehensively measures the Seccomp
overhead and shows that the overhead is non-negligible and has a
non-trivial impact on application performance.

We observed some common filters, such as docker's [4] or
systemd's [5], will make most decisions based only on the syscall
numbers, and as past discussions considered, a bitmap where each bit
represents a syscall makes most sense for these filters.

The fast (common) path for seccomp should be that the filter permits
the syscall to pass through, and failing seccomp is expected to be
an exceptional case; it is not expected for userspace to call a
denylisted syscall over and over.

When it can be concluded that an allow must occur for the given
architecture and syscall pair (this determination is introduced in
the next commit), seccomp will immediately allow the syscall,
bypassing further BPF execution.

Each architecture number has its own bitmap. The architecture
number in seccomp_data is checked against the defined architecture
number constant before proceeding to test the bit against the
bitmap with the syscall number as the index of the bit in the
bitmap, and if the bit is set, seccomp returns allow. The bitmaps
are all clear in this patch and will be initialized in the next
commit.

When only one architecture exists, the check against architecture
number is skipped, suggested by Kees Cook [7].

[1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/
[2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/
[3] https://github.com/seccomp/libseccomp/issues/116
[4] https://github.com/moby/moby/blob/ae0ef82b90356ac613f329a8ef5ee42ca923417d/profiles/seccomp/default.json
[5] https://github.com/systemd/systemd/blob/6743a1caf4037f03dc51a1277855018e4ab61957/src/shared/seccomp-util.c#L270
[6] Draco: Architectural and Operating System Support for System Call Security
https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020
[7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/

Co-developed-by: Dimitrios Skarlatos
Signed-off-by: Dimitrios Skarlatos
Signed-off-by: YiFei Zhu
Reviewed-by: Jann Horn
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu
(cherry picked from commit f9d480b6ffbeb336bf7f6ce44825c00f61b3abae)A
Signed-off-by: Jeff Vander Stoep
Change-Id: I50b6682e17dc6e91b5e92017361200d722282825
Bug: 176068146

YiFei Zhu
2020-12-22 02:46:39 +0800

18 Nov, 2020

1 commit

fb14528e4 seccomp: Set PF_SUPERPRIV when checking capability ... Browse Code »

Replace the use of security_capable(current_cred(), ...) with
ns_capable_noaudit() which set PF_SUPERPRIV.

Since commit 98f368e9e263 ("kernel: Add noaudit variant of
ns_capable()"), a new ns_capable_noaudit() helper is available. Let's
use it!

Cc: Jann Horn
Cc: Kees Cook
Cc: Tyler Hicks
Cc: Will Drewry
Cc: stable@vger.kernel.org
Fixes: e2cfabdfd075 ("seccomp: add system call filtering using BPF")
Signed-off-by: Mickaël Salaün
Reviewed-by: Jann Horn
Signed-off-by: Kees Cook
Link: https://lore.kernel.org/r/20201030123849.770769-3-mic@digikod.net

Mickaël Salaün
2020-11-18 04:53:22 +0800

09 Oct, 2020

1 commit

dfe719fef seccomp: Make duplicate listener detection non-racy ... Browse Code »

Currently, init_listener() tries to prevent adding a filter with
SECCOMP_FILTER_FLAG_NEW_LISTENER if one of the existing filters already
has a listener. However, this check happens without holding any lock that
would prevent another thread from concurrently installing a new filter
(potentially with a listener) on top of the ones we already have.

Theoretically, this is also a data race: The plain load from
current->seccomp.filter can race with concurrent writes to the same
location.

Fix it by moving the check into the region that holds the siglock to guard
against concurrent TSYNC.

(The "Fixes" tag points to the commit that introduced the theoretical
data race; concurrent installation of another filter with TSYNC only
became possible later, in commit 51891498f2da ("seccomp: allow TSYNC and
USER_NOTIF together").)

Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Reviewed-by: Tycho Andersen
Signed-off-by: Jann Horn
Signed-off-by: Kees Cook
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201005014401.490175-1-jannh@google.com

Jann Horn
2020-10-09 04:17:47 +0800

09 Sep, 2020

4 commits

2d9ca267a seccomp: Use current_pt_regs() instead of task_pt_regs(current) ... Browse Code »

As described in commit a3460a59747c ("new helper: current_pt_regs()"):
- arch versions are "optimized versions".
- some architectures have task_pt_regs() working only for traced tasks
blocked on signal delivery. current_pt_regs() needs to work for *all*
processes.

In preparation for adding a coccinelle rule for using current_*(), instead
of raw accesses to current members, modify seccomp_do_user_notification(),
__seccomp_filter(), __secure_computing() to use current_pt_regs().

Signed-off-by: Denis Efremov
Link: https://lore.kernel.org/r/20200824125921.488311-1-efremov@linux.com
[kees: Reworded commit log, add comment to populate_seccomp_data()]
Signed-off-by: Kees Cook

Denis Efremov
2020-09-09 07:26:45 +0800
4d671d922 seccomp: kill process instead of thread for unknown actions ... Browse Code »

Asynchronous termination of a thread outside of the userspace thread
library's knowledge is an unsafe operation that leaves the process in
an inconsistent, corrupt, and possibly unrecoverable state. In order
to make new actions that may be added in the future safe on kernels
not aware of them, change the default action from
SECCOMP_RET_KILL_THREAD to SECCOMP_RET_KILL_PROCESS.

Signed-off-by: Rich Felker
Link: https://lore.kernel.org/r/20200829015609.GA32566@brightrain.aerifal.cx
[kees: Fixed up coredump selection logic to match]
Signed-off-by: Kees Cook

Rich Felker
2020-09-09 03:00:49 +0800
e83931790 seccomp: don't leave dangling ->notif if file allocation fails ... Browse Code »

Christian and Kees both pointed out that this is a bit sloppy to open-code
both places, and Christian points out that we leave a dangling pointer to
->notif if file allocation fails. Since we check ->notif for null in order
to determine if it's ok to install a filter, this means people won't be
able to install a filter if the file allocation fails for some reason, even
if they subsequently should be able to.

To fix this, let's hoist this free+null into its own little helper and use
it.

Reported-by: Kees Cook
Reported-by: Christian Brauner
Signed-off-by: Tycho Andersen
Acked-by: Christian Brauner
Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
Signed-off-by: Kees Cook

Tycho Andersen
2020-09-09 02:30:16 +0800
a566a9012 seccomp: don't leak memory when filter install races ... Browse Code »

In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
the listener fd, then check to see if we can actually use it later in
seccomp_may_assign_mode(), which can fail if anyone else in our thread
group has installed a filter and caused some divergence. If we can't, we
partially clean up the newly allocated file: we put the fd, put the file,
but don't actually clean up the *memory* that was allocated at
filter->notif. Let's clean that up too.

To accomplish this, let's hoist the actual "detach a notifier from a
filter" code to its own helper out of seccomp_notify_release(), so that in
case anyone adds stuff to init_listener(), they only have to add the
cleanup code in one spot. This does a bit of extra locking and such on the
failure path when the filter is not attached, but it's a slow failure path
anyway.

Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together")
Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
Signed-off-by: Tycho Andersen
Acked-by: Christian Brauner
Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
Signed-off-by: Kees Cook

Tycho Andersen
2020-09-09 02:19:50 +0800

15 Jul, 2020

1 commit

7cf97b125 seccomp: Introduce addfd ioctl to seccomp user notifier ... Browse Code »

The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
an fd. It is often used in settings where a supervising task emulates
syscalls on behalf of a supervised task in userspace, either to further
restrict the supervisee's syscall abilities or to circumvent kernel
enforced restrictions the supervisor deems safe to lift (e.g. actually
performing a mount(2) for an unprivileged container).

While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
only a certain subset of syscalls could be correctly emulated. Over the
last few development cycles, the set of syscalls which can't be emulated
has been reduced due to the addition of pidfd_getfd(2). With this we are
now able to, for example, intercept syscalls that require the supervisor
to operate on file descriptors of the supervisee such as connect(2).

However, syscalls that cause new file descriptors to be installed can not
currently be correctly emulated since there is no way for the supervisor
to inject file descriptors into the supervisee. This patch adds a
new addfd ioctl to remove this restriction by allowing the supervisor to
install file descriptors into the intercepted task. By implementing this
feature via seccomp the supervisor effectively instructs the supervisee
to install a set of file descriptors into its own file descriptor table
during the intercepted syscall. This way it is possible to intercept
syscalls such as open() or accept(), and install (or replace, like
dup2(2)) the supervisor's resulting fd into the supervisee. One
replacement use-case would be to redirect the stdout and stderr of a
supervisee into log file descriptors opened by the supervisor.

The ioctl handling is based on the discussions[1] of how Extensible
Arguments should interact with ioctls. Instead of building size into
the addfd structure, make it a function of the ioctl command (which
is how sizes are normally passed to ioctls). To support forward and
backward compatibility, just mask out the direction and size, and match
everything. The size (and any future direction) checks are done along
with copy_struct_from_user() logic.

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues
with uapi highlighted before[2][3]. Although we could overload the
newfd field and use -1 to indicate that it is not to be used, doing
so requires changing the size of the fd field, and introduces struct
packing complexity.

[1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
[3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal

Cc: Christoph Hellwig
Cc: Christian Brauner
Cc: Tycho Andersen
Cc: Jann Horn
Cc: Robert Sesek
Cc: Chris Palmer
Cc: Al Viro
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Suggested-by: Matt Denton
Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
Signed-off-by: Sargun Dhillon
Reviewed-by: Will Drewry
Co-developed-by: Kees Cook
Signed-off-by: Kees Cook

Sargun Dhillon
2020-07-15 07:29:42 +0800

11 Jul, 2020

9 commits

fe4bfff86 seccomp: Use -1 marker for end of mode 1 syscall list ... Browse Code »

The terminator for the mode 1 syscalls list was a 0, but that could be
a valid syscall number (e.g. x86_64 __NR_read). By luck, __NR_read was
listed first and the loop construct would not test it, so there was no
bug. However, this is fragile. Replace the terminator with -1 instead,
and make the variable name for mode 1 syscall lists more descriptive.

Cc: Andy Lutomirski
Cc: Will Drewry
Signed-off-by: Kees Cook

Kees Cook
2020-07-11 07:01:52 +0800
47e33c05f seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID ... Browse Code »

When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.

Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook

Kees Cook
2020-07-11 07:01:52 +0800
e68f9d49d seccomp: Use pr_fmt ... Browse Code »

Avoid open-coding "seccomp: " prefixes for pr_*() calls.

Signed-off-by: Kees Cook

Kees Cook
2020-07-11 07:01:52 +0800
99cdb8b9a seccomp: notify about unused filter ... Browse Code »

We've been making heavy use of the seccomp notifier to intercept and
handle certain syscalls for containers. This patch allows a syscall
supervisor listening on a given notifier to be notified when a seccomp
filter has become unused.

A container is often managed by a singleton supervisor process the
so-called "monitor". This monitor process has an event loop which has
various event handlers registered. If the user specified a seccomp
profile that included a notifier for various syscalls then we also
register a seccomp notify even handler. For any container using a
separate pid namespace the lifecycle of the seccomp notifier is bound to
the init process of the pid namespace, i.e. when the init process exits
the filter must be unused.

If a new process attaches to a container we force it to assume a seccomp
profile. This can either be the same seccomp profile as the container
was started with or a modified one. If the attaching process makes use
of the seccomp notifier we will register a new seccomp notifier handler
in the monitor's event loop. However, when the attaching process exits
we can't simply delete the handler since other child processes could've
been created (daemons spawned etc.) that have inherited the seccomp
filter and so we need to keep the seccomp notifier fd alive in the event
loop. But this is problematic since we don't get a notification when the
seccomp filter has become unused and so we currently never remove the
seccomp notifier fd from the event loop and just keep accumulating fds
in the event loop. We've had this issue for a while but it has recently
become more pressing as more and larger users make use of this.

To fix this, we introduce a new "users" reference counter that tracks any
tasks and dependent filters making use of a filter. When a notifier is
registered waiting tasks will be notified that the filter is now empty
by receiving a (E)POLLHUP event.

The concept in this patch introduces is the same as for signal_struct,
i.e. reference counting for life-cycle management is decoupled from
reference counting taks using the object. There's probably some trickery
possible but the second counter is just the correct way of doing this
IMHO and has precedence.

Cc: Tycho Andersen
Cc: Kees Cook
Cc: Matt Denton
Cc: Sargun Dhillon
Cc: Jann Horn
Cc: Chris Palmer
Cc: Aleksa Sarai
Cc: Robert Sesek
Cc: Jeffrey Vander Stoep
Cc: Linux Containers
Signed-off-by: Christian Brauner
Link: https://lore.kernel.org/r/20200531115031.391515-3-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook

Christian Brauner
2020-07-11 07:01:51 +0800
76194c4e8 seccomp: Lift wait_queue into struct seccomp_filter ... Browse Code »

Lift the wait_queue from struct notification into struct seccomp_filter.
This is cleaner overall and lets us avoid having to take the notifier
mutex in the future for EPOLLHUP notifications since we need to neither
read nor modify the notifier specific aspects of the seccomp filter. In
the exit path I'd very much like to avoid having to take the notifier mutex
for each filter in the task's filter hierarchy.

Cc: Tycho Andersen
Cc: Kees Cook
Cc: Matt Denton
Cc: Sargun Dhillon
Cc: Jann Horn
Cc: Chris Palmer
Cc: Aleksa Sarai
Cc: Robert Sesek
Cc: Jeffrey Vander Stoep
Cc: Linux Containers
Signed-off-by: Christian Brauner
Signed-off-by: Kees Cook

Christian Brauner
2020-07-11 07:01:51 +0800
3a15fb6ed seccomp: release filter after task is fully dead ... Browse Code »

The seccomp filter used to be released in free_task() which is called
asynchronously via call_rcu() and assorted mechanisms. Since we need
to inform tasks waiting on the seccomp notifier when a filter goes empty
we will notify them as soon as a task has been marked fully dead in
release_task(). To not split seccomp cleanup into two parts, move
filter release out of free_task() and into release_task() after we've
unhashed struct task from struct pid, exited signals, and unlinked it
from the threadgroups' thread list. We'll put the empty filter
notification infrastructure into it in a follow up patch.

This also renames put_seccomp_filter() to seccomp_filter_release() which
is a more descriptive name of what we're doing here especially once
we've added the empty filter notification mechanism in there.

We're also NULL-ing the task's filter tree entrypoint which seems
cleaner than leaving a dangling pointer in there. Note that this shouldn't
need any memory barriers since we're calling this when the task is in
release_task() which means it's EXIT_DEAD. So it can't modify its seccomp
filters anymore. You can also see this from the point where we're calling
seccomp_filter_release(). It's after __exit_signal() and at this point,
tsk->sighand will already have been NULLed which is required for
thread-sync and filter installation alike.

Cc: Tycho Andersen
Cc: Kees Cook
Cc: Matt Denton
Cc: Sargun Dhillon
Cc: Jann Horn
Cc: Chris Palmer
Cc: Aleksa Sarai
Cc: Robert Sesek
Cc: Jeffrey Vander Stoep
Cc: Linux Containers
Signed-off-by: Christian Brauner
Link: https://lore.kernel.org/r/20200531115031.391515-2-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook

Christian Brauner
2020-07-11 07:01:51 +0800
b707ddee1 seccomp: rename "usage" to "refs" and document ... Browse Code »

Naming the lifetime counter of a seccomp filter "usage" suggests a
little too strongly that its about tasks that are using this filter
while it also tracks other references such as the user notifier or
ptrace. This also updates the documentation to note this fact.

We'll be introducing an actual usage counter in a follow-up patch.

Cc: Tycho Andersen
Cc: Kees Cook
Cc: Matt Denton
Cc: Sargun Dhillon
Cc: Jann Horn
Cc: Chris Palmer
Cc: Aleksa Sarai
Cc: Robert Sesek
Cc: Jeffrey Vander Stoep
Cc: Linux Containers
Signed-off-by: Christian Brauner
Link: https://lore.kernel.org/r/20200531115031.391515-1-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook

Christian Brauner
2020-07-11 07:01:51 +0800
9f87dcf14 seccomp: Add find_notification helper ... Browse Code »

This adds a helper which can iterate through a seccomp_filter to
find a notification matching an ID. It removes several replicated
chunks of code.

Signed-off-by: Sargun Dhillon
Acked-by: Christian Brauner
Reviewed-by: Tycho Andersen
Cc: Matt Denton
Cc: Kees Cook ,
Cc: Jann Horn ,
Cc: Robert Sesek ,
Cc: Chris Palmer
Cc: Christian Brauner
Cc: Tycho Andersen
Link: https://lore.kernel.org/r/20200601112532.150158-1-sargun@sargun.me
Signed-off-by: Kees Cook

Sargun Dhillon
2020-07-11 07:01:51 +0800
c818c03b6 seccomp: Report number of loaded filters in /proc/$pid/status ... Browse Code »

A common question asked when debugging seccomp filters is "how many
filters are attached to your process?" Provide a way to easily answer
this question through /proc/$pid/status with a "Seccomp_filters" line.

Signed-off-by: Kees Cook

Kees Cook
2020-07-11 07:01:51 +0800

27 Apr, 2020

1 commit

32927393d sysctl: pass kernel pointers to ->proc_handler ... Browse Code »

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig
Acked-by: Andrey Ignatov
Signed-off-by: Al Viro

Christoph Hellwig
2020-04-27 14:07:40 +0800

01 Apr, 2020

1 commit

29d9f30d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next ... Browse Code »

Pull networking updates from David Miller:
"Highlights:

1) Fix the iwlwifi regression, from Johannes Berg.

2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.

3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.

4) Add TTL decrement action to openvswitch, from Matteo Croce.

5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.

6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.

7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.

8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.

9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.

11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.

12) Support bcmgenet on ACPI, from Jeremy Linton.

13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.

14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.

16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.

17) Offload FIFO qdisc in mlxsw, from Petr Machata.

18) Support UDP sockets in sockmap, from Lorenz Bauer.

19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.

20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.

21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.

22) Add support for hw offload of MACSEC, from Antoine Tenart.

23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.

24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.

25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...

Linus Torvalds
2020-04-01 08:29:33 +0800

30 Mar, 2020

1 commit

3db81afd9 seccomp: Add missing compat_ioctl for notify ... Browse Code »

Executing the seccomp_bpf testsuite under a 64-bit kernel with 32-bit
userland (both s390 and x86) doesn't work because there's no compat_ioctl
handler defined. Add the handler.

Signed-off-by: Sven Schnelle
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200310123332.42255-1-svens@linux.ibm.com
Signed-off-by: Kees Cook

Sven Schnelle
2020-03-30 12:10:51 +0800

05 Mar, 2020

1 commit

51891498f seccomp: allow TSYNC and USER_NOTIF together ... Browse Code »

The restriction introduced in 7a0df7fbc145 ("seccomp: Make NEW_LISTENER and
TSYNC flags exclusive") is mostly artificial: there is enough information
in a seccomp user notification to tell which thread triggered a
notification. The reason it was introduced is because TSYNC makes the
syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and
there's no way to distinguish between these two cases (well, I suppose the
caller could check all fds it has, then do the syscall, and if the return
value was an fd that already existed, then it must be a thread id, but
bleh).

Matthew would like to use these two flags together in the Chrome sandbox
which wants to use TSYNC for video drivers and NEW_LISTENER to proxy
syscalls.

So, let's fix this ugliness by adding another flag, TSYNC_ESRCH, which
tells the kernel to just return -ESRCH on a TSYNC error. This way,
NEW_LISTENER (and any subsequent seccomp() commands that want to return
positive values) don't conflict with each other.

Suggested-by: Matthew Denton
Signed-off-by: Tycho Andersen
Link: https://lore.kernel.org/r/20200304180517.23867-1-tycho@tycho.ws
Signed-off-by: Kees Cook

Tycho Andersen
2020-03-05 06:48:54 +0800

25 Feb, 2020

1 commit

3d9f773cf bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites. ... Browse Code »

All of these cases are strictly of the form:

preempt_disable();
BPF_PROG_RUN(...);
preempt_enable();

Replace this with bpf_prog_run_pin_on_cpu() which wraps BPF_PROG_RUN()
with:

migrate_disable();
BPF_PROG_RUN(...);
migrate_enable();

On non RT enabled kernels this maps to preempt_disable/enable() and on RT
enabled kernels this solely prevents migration, which is sufficient as
there is no requirement to prevent reentrancy to any BPF program from a
preempting task. The only requirement is that the program stays on the same
CPU.

Therefore, this is a trivially correct transformation.

The seccomp loop does not need protection over the loop. It only needs
protection per BPF filter program

[ tglx: Converted to bpf_prog_run_pin_on_cpu() ]

Signed-off-by: David S. Miller
Signed-off-by: Thomas Gleixner
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200224145643.691493094@linutronix.de

David Miller
2020-02-25 08:20:09 +0800

03 Jan, 2020

1 commit

2882d53c9 seccomp: Check that seccomp_notif is zeroed out by the user ... Browse Code »

This patch is a small change in enforcement of the uapi for
SECCOMP_IOCTL_NOTIF_RECV ioctl. Specifically, the datastructure which
is passed (seccomp_notif) must be zeroed out. Previously any of its
members could be set to nonsense values, and we would ignore it.

This ensures all fields are set to their zero value.

Signed-off-by: Sargun Dhillon
Reviewed-by: Christian Brauner
Reviewed-by: Aleksa Sarai
Acked-by: Tycho Andersen
Link: https://lore.kernel.org/r/20191229062451.9467-2-sargun@sargun.me
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook

Sargun Dhillon
2020-01-03 05:03:45 +0800

11 Oct, 2019

1 commit

fb3c5386b seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE ... Browse Code »

This allows the seccomp notifier to continue a syscall. A positive
discussion about this feature was triggered by a post to the
ksummit-discuss mailing list (cf. [3]) and took place during KSummit
(cf. [1]) and again at the containers/checkpoint-restore
micro-conference at Linux Plumbers.

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4])
which enables a process (watchee) to retrieve an fd for its seccomp
filter. This fd can then be handed to another (usually more privileged)
process (watcher). The watcher will then be able to receive seccomp
messages about the syscalls having been performed by the watchee.

This feature is heavily used in some userspace workloads. For example,
it is currently used to intercept mknod() syscalls in user namespaces
aka in containers.
The mknod() syscall can be easily filtered based on dev_t. This allows
us to only intercept a very specific subset of mknod() syscalls.
Furthermore, mknod() is not possible in user namespaces toto coelo and
so intercepting and denying syscalls that are not in the whitelist on
accident is not a big deal. The watchee won't notice a difference.

In contrast to mknod(), a lot of other syscall we intercept (e.g.
setxattr()) cannot be easily filtered like mknod() because they have
pointer arguments. Additionally, some of them might actually succeed in
user namespaces (e.g. setxattr() for all "user.*" xattrs). Since we
currently cannot tell seccomp to continue from a user notifier we are
stuck with performing all of the syscalls in lieu of the container. This
is a huge security liability since it is extremely difficult to
correctly assume all of the necessary privileges of the calling task
such that the syscall can be successfully emulated without escaping
other additional security restrictions (think missing CAP_MKNOD for
mknod(), or MS_NODEV on a filesystem etc.). This can be solved by
telling seccomp to resume the syscall.

One thing that came up in the discussion was the problem that another
thread could change the memory after userspace has decided to let the
syscall continue which is a well known TOCTOU with seccomp which is
present in other ways already.
The discussion showed that this feature is already very useful for any
syscall without pointer arguments. For any accidentally intercepted
non-pointer syscall it is safe to continue.
For syscalls with pointer arguments there is a race but for any cautious
userspace and the main usec cases the race doesn't matter. The notifier
is intended to be used in a scenario where a more privileged watcher
supervises the syscalls of lesser privileged watchee to allow it to get
around kernel-enforced limitations by performing the syscall for it
whenever deemed save by the watcher. Hence, if a user tricks the watcher
into allowing a syscall they will either get a deny based on
kernel-enforced restrictions later or they will have changed the
arguments in such a way that they manage to perform a syscall with
arguments that they would've been allowed to do anyway.
In general, it is good to point out again, that the notifier fd was not
intended to allow userspace to implement a security policy but rather to
work around kernel security mechanisms in cases where the watcher knows
that a given action is safe to perform.

/* References */
[1]: https://linuxplumbersconf.org/event/4/contributions/560
[2]: https://linuxplumbersconf.org/event/4/contributions/477
[3]: https://lore.kernel.org/r/20190719093538.dhyopljyr5ns33qx@brauner.io
[4]: commit 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")

Co-developed-by: Kees Cook
Signed-off-by: Christian Brauner
Reviewed-by: Tycho Andersen
Cc: Andy Lutomirski
Cc: Will Drewry
CC: Tyler Hicks
Link: https://lore.kernel.org/r/20190920083007.11475-2-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook

Christian Brauner
2019-10-11 05:45:51 +0800

29 May, 2019

1 commit

a89e9b8ab signal: Remove the signal number and task parameters from force_sig_info ... Browse Code »

force_sig_info always delivers to the current task and the signal
parameter always matches info.si_signo. So remove those parameters to
make it a simpler less error prone interface, and to make it clear
that none of the callers are doing anything clever.

This guarantees that force_sig_info will not grow any new buggy
callers that attempt to call force_sig on a non-current task, or that
pass an signal number that does not match info.si_signo.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2019-05-29 22:31:44 +0800

08 May, 2019

1 commit

02aff8db6 Merge tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit ... Browse Code »

Pull audit updates from Paul Moore:
"We've got a reasonably broad set of audit patches for the v5.2 merge
window, the highlights are below:

- The biggest change, and the source of all the arch/* changes, is
the patchset from Dmitry to help enable some of the work he is
doing around PTRACE_GET_SYSCALL_INFO.

To be honest, including this in the audit tree is a bit of a
stretch, but it does help move audit a little further along towards
proper syscall auditing for all arches, and everyone else seemed to
agree that audit was a "good" spot for this to land (or maybe they
just didn't want to merge it? dunno.).

- We can now audit time/NTP adjustments.

- We continue the work to connect associated audit records into a
single event"

* tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (21 commits)
audit: fix a memory leak bug
ntp: Audit NTP parameters adjustment
timekeeping: Audit clock adjustments
audit: purge unnecessary list_empty calls
audit: link integrity evm_write_xattrs record to syscall event
syscall_get_arch: add "struct task_struct *" argument
unicore32: define syscall_get_arch()
Move EM_UNICORE to uapi/linux/elf-em.h
nios2: define syscall_get_arch()
nds32: define syscall_get_arch()
Move EM_NDS32 to uapi/linux/elf-em.h
m68k: define syscall_get_arch()
hexagon: define syscall_get_arch()
Move EM_HEXAGON to uapi/linux/elf-em.h
h8300: define syscall_get_arch()
c6x: define syscall_get_arch()
arc: define syscall_get_arch()
Move EM_ARCOMPACT and EM_ARCV2 to uapi/linux/elf-em.h
audit: Make audit_log_cap and audit_copy_inode static
audit: connect LOGIN record to its syscall record
...

Linus Torvalds
2019-05-08 10:06:04 +0800

07 May, 2019

1 commit

78ee8b1b9 Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates from James Morris:
"Just a few bugfixes and documentation updates"

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp: fix up grammar in comment
Revert "security: inode: fix a missing check for securityfs_create_file"
Yama: mark function as static
security: inode: fix a missing check for securityfs_create_file
keys: safe concurrent user->{session,uid}_keyring access
security: don't use RCU accessors for cred->session_keyring
Yama: mark local symbols as static
LSM: lsm_hooks.h: fix documentation format
LSM: fix documentation for the shm_* hooks
LSM: fix documentation for the sem_* hooks
LSM: fix documentation for the msg_queue_* hooks
LSM: fix documentation for the audit_* hooks
LSM: fix documentation for the path_chmod hook
LSM: fix documentation for the socket_getpeersec_dgram hook
LSM: fix documentation for the task_setscheduler hook
LSM: fix documentation for the socket_post_create hook
LSM: fix documentation for the syslog hook
LSM: fix documentation for sb_copy_data hook

Linus Torvalds
2019-05-07 23:39:54 +0800

30 Apr, 2019

1 commit

83a50840e Merge tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull seccomp fixes from Kees Cook:
"Syzbot found a use-after-free bug in seccomp due to flags that should
not be allowed to be used together.

Tycho fixed this, I updated the self-tests, and the syzkaller PoC has
been running for several days without triggering KASan (before this
fix, it would reproduce). These patches have also been in -next for
almost a week, just to be sure.

- Add logic for making some seccomp flags exclusive (Tycho)

- Update selftests for exclusivity testing (Kees)"

* tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Make NEW_LISTENER and TSYNC flags exclusive
selftests/seccomp: Prepare for exclusive seccomp flags

Linus Torvalds
2019-04-30 04:24:34 +0800

26 Apr, 2019

1 commit

7a0df7fbc seccomp: Make NEW_LISTENER and TSYNC flags exclusive ... Browse Code »

As the comment notes, the return codes for TSYNC and NEW_LISTENER
conflict, because they both return positive values, one in the case of
success and one in the case of error. So, let's disallow both of these
flags together.

While this is technically a userspace break, all the users I know
of are still waiting on me to land this feature in libseccomp, so I
think it'll be safe. Also, at present my use case doesn't require
TSYNC at all, so this isn't a big deal to disallow. If someone
wanted to support this, a path forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN,
but the use cases are so different I don't see it really happening.

Finally, it's worth noting that this does actually fix a UAF issue: at the
end of seccomp_set_mode_filter(), we have:

if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);

But if ret > 0 because TSYNC raced, we'll install the listener fd and then
free the filter out from underneath it, causing a UAF when the task closes
it or dies. This patch also switches the condition to be simply if (ret),
so that if someone does add the flag mentioned above, they won't have to
remember to fix this too.

Reported-by: syzbot+b562969adb2e04af3442@syzkaller.appspotmail.com
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
CC: stable@vger.kernel.org # v5.0+
Signed-off-by: Tycho Andersen
Signed-off-by: Kees Cook
Acked-by: James Morris

Tycho Andersen
2019-04-26 06:55:58 +0800

24 Apr, 2019

1 commit

6beff00b7 seccomp: fix up grammar in comment ... Browse Code »

This sentence is kind of a train wreck anyway, but at least dropping the
extra pronoun helps somewhat.

Signed-off-by: Tycho Andersen
Acked-by: Kees Cook
Signed-off-by: James Morris

Tycho Andersen
2019-04-24 07:21:12 +0800

05 Apr, 2019

1 commit

b35f549df syscalls: Remove start and number from syscall_get_arguments() args ... Browse Code »

At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
function call syscall_get_arguments() implemented in x86 was horribly
written and not optimized for the standard case of passing in 0 and 6 for
the starting index and the number of system calls to get. When looking at
all the users of this function, I discovered that all instances pass in only
0 and 6 for these arguments. Instead of having this function handle
different cases that are never used, simply rewrite it to return the first 6
arguments of a system call.

This should help out the performance of tracing system calls by ptrace,
ftrace and perf.

Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org

Cc: Oleg Nesterov
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Dominik Brodowski
Cc: Dave Martin
Cc: "Dmitry V. Levin"
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Acked-by: Paul Burton # MIPS parts
Acked-by: Max Filippov # For xtensa changes
Acked-by: Will Deacon # For the arm64 bits
Reviewed-by: Thomas Gleixner # for x86
Reviewed-by: Dmitry V. Levin
Reported-by: Andy Lutomirski
Signed-off-by: Steven Rostedt (VMware)

Steven Rostedt (Red Hat)
2019-04-05 21:26:43 +0800

21 Mar, 2019

1 commit

16add4116 syscall_get_arch: add "struct task_struct *" argument ... Browse Code »

This argument is required to extend the generic ptrace API with
PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going
to be called from ptrace_request() along with syscall_get_nr(),
syscall_get_arguments(), syscall_get_error(), and
syscall_get_return_value() functions with a tracee as their argument.

The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6)
should describe what system call is being called and what its arguments
are.

Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments")
Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()")
Reviewed-by: Andy Lutomirski # for x86
Reviewed-by: Palmer Dabbelt
Acked-by: Paul Moore
Acked-by: Paul Burton # MIPS parts
Acked-by: Michael Ellerman (powerpc)
Acked-by: Kees Cook # seccomp parts
Acked-by: Mark Salter # for the c6x bit
Cc: Elvira Khabirova
Cc: Eugene Syromyatnikov
Cc: Oleg Nesterov
Cc: x86@kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: linux-audit@redhat.com
Signed-off-by: Dmitry V. Levin
Signed-off-by: Paul Moore

Dmitry V. Levin
2019-03-21 09:12:36 +0800

08 Mar, 2019

1 commit

ae5906cee Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates from James Morris:

- Extend LSM stacking to allow sharing of cred, file, ipc, inode, and
task blobs. This paves the way for more full-featured LSMs to be
merged, and is specifically aimed at LandLock and SARA LSMs. This
work is from Casey and Kees.

- There's a new LSM from Micah Morton: "SafeSetID gates the setid
family of syscalls to restrict UID/GID transitions from a given
UID/GID to only those approved by a system-wide whitelist." This
feature is currently shipping in ChromeOS.

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (62 commits)
keys: fix missing __user in KEYCTL_PKEY_QUERY
LSM: Update list of SECURITYFS users in Kconfig
LSM: Ignore "security=" when "lsm=" is specified
LSM: Update function documentation for cap_capable
security: mark expected switch fall-throughs and add a missing break
tomoyo: Bump version.
LSM: fix return value check in safesetid_init_securityfs()
LSM: SafeSetID: add selftest
LSM: SafeSetID: remove unused include
LSM: SafeSetID: 'depend' on CONFIG_SECURITY
LSM: Add 'name' field for SafeSetID in DEFINE_LSM
LSM: add SafeSetID module that gates setid calls
LSM: add SafeSetID module that gates setid calls
tomoyo: Allow multiple use_group lines.
tomoyo: Coding style fix.
tomoyo: Swicth from cred->security to task_struct->security.
security: keys: annotate implicit fall throughs
security: keys: annotate implicit fall throughs
security: keys: annotate implicit fall through
capabilities:: annotate implicit fall through
...

Linus Torvalds
2019-03-08 03:44:01 +0800

22 Feb, 2019

1 commit

e80d02dd7 seccomp, bpf: disable preemption before calling into bpf prog ... Browse Code »

All BPF programs must be called with preemption disabled.

Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann

Alexei Starovoitov
2019-02-22 07:14:19 +0800

23 Jan, 2019

1 commit

9624d5c9c Merge tag 'v5.0-rc3' into next-general ... Browse Code »

Sync to Linux 5.0-rc3 to pull in the VFS changes which impacted a lot
of the LSM code.

James Morris
2019-01-23 06:33:10 +0800

16 Jan, 2019

1 commit

a811dc615 seccomp: fix UAF in user-trap code ... Browse Code »

On the failure path, we do an fput() of the listener fd if the filter fails
to install (e.g. because of a TSYNC race that's lost, or if the thread is
killed, etc.). fput() doesn't actually release the fd, it just ads it to a
work queue. Then the thread proceeds to free the filter, even though the
listener struct file has a reference to it.

To fix this, on the failure path let's set the private data to null, so we
know in ->release() to ignore the filter.

Reported-by: syzbot+981c26489b2d1c6316ba@syzkaller.appspotmail.com
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Tycho Andersen
Acked-by: Kees Cook
Signed-off-by: James Morris

Tycho Andersen
2019-01-16 01:43:12 +0800

11 Jan, 2019

1 commit

c1a85a00e LSM: generalize flag passing to security_capable ... Browse Code »

This patch provides a general mechanism for passing flags to the
security_capable LSM hook. It replaces the specific 'audit' flag that is
used to tell security_capable whether it should log an audit message for
the given capability check. The reason for generalizing this flag
passing is so we can add an additional flag that signifies whether
security_capable is being called by a setid syscall (which is needed by
the proposed SafeSetID LSM).

Signed-off-by: Micah Morton
Reviewed-by: Kees Cook
Signed-off-by: James Morris

Micah Morton
2019-01-11 06:16:06 +0800

14 Dec, 2018

1 commit

319deec7d seccomp: fix poor type promotion ... Browse Code »

sparse complains,

kernel/seccomp.c:1172:13: warning: incorrect type in assignment (different base types)
kernel/seccomp.c:1172:13: expected restricted __poll_t [usertype] ret
kernel/seccomp.c:1172:13: got int
kernel/seccomp.c:1173:13: warning: restricted __poll_t degrades to integer

Instead of assigning this to ret, since we don't use this anywhere, let's
just test it against 0 directly.

Signed-off-by: Tycho Andersen
Reported-by: 0day robot
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook

Tycho Andersen
2018-12-14 08:49:01 +0800