Eric Lee / smarc-fsl-linux-kernel

18 Feb, 2015

1 commit

580c57f10 seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO ... Browse Code »

The value resulting from the SECCOMP_RET_DATA mask could exceed MAX_ERRNO
when setting errno during a SECCOMP_RET_ERRNO filter action. This makes
sure we have a reliable value being set, so that an invalid errno will not
be ignored by userspace.

Signed-off-by: Kees Cook
Reported-by: Dmitry V. Levin
Cc: Andy Lutomirski
Cc: Will Drewry
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2015-02-18 06:34:55 +0800

14 Oct, 2014

1 commit

ba1a96fc7 Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.

The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.

There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"

* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing

Linus Torvalds
2014-10-14 08:27:06 +0800

06 Sep, 2014

1 commit

60a3b2253 net: bpf: make eBPF interpreter images read-only ... Browse Code »

With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate. This patch moves the to be interpreted bytecode to
read-only pages.

In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").

This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).

Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.

Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables. Brad Spengler proposed the same idea
via Twitter during development of this patch.

Joint work with Hannes Frederic Sowa.

Suggested-by: Brad Spengler
Signed-off-by: Daniel Borkmann
Signed-off-by: Hannes Frederic Sowa
Cc: Alexei Starovoitov
Cc: Kees Cook
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2014-09-06 03:02:48 +0800

04 Sep, 2014

3 commits

d39bd00de seccomp: Allow arch code to provide seccomp_data ... Browse Code »

populate_seccomp_data is expensive: it works by inspecting
task_pt_regs and various other bits to piece together all the
information, and it's does so in multiple partially redundant steps.

Arch-specific code in the syscall entry path can do much better.

Admittedly this adds a bit of additional room for error, but the
speedup should be worth it.

Signed-off-by: Andy Lutomirski
Signed-off-by: Kees Cook

Andy Lutomirski
2014-09-04 05:58:17 +0800
13aa72f0f seccomp: Refactor the filter callback and the API ... Browse Code »

The reason I did this is to add a seccomp API that will be usable
for an x86 fast path. The x86 entry code needs to use a rather
expensive slow path for a syscall that might be visible to things
like ptrace. By splitting seccomp into two phases, we can check
whether we need the slow path and then use the fast path in if the
filter allows the syscall or just returns some errno.

As a side effect, I think the new code is much easier to understand
than the old code.

This has one user-visible effect: the audit record written for
SECCOMP_RET_TRACE is now a simple indication that SECCOMP_RET_TRACE
happened. It used to depend in a complicated way on what the tracer
did. I couldn't make much sense of it.

Signed-off-by: Andy Lutomirski
Signed-off-by: Kees Cook

Andy Lutomirski
2014-09-04 05:58:17 +0800
a4412fc94 seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing ... Browse Code »

The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled. Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.

To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.

For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters. Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.

This will be a slight slowdown on some arches. The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.

This is a prerequisite for making two-phase seccomp work cleanly.

Cc: Russell King
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook
Signed-off-by: Andy Lutomirski
Signed-off-by: Kees Cook

Andy Lutomirski
2014-09-04 05:58:17 +0800

12 Aug, 2014

1 commit

69f6a34bd seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock ... Browse Code »

Current upstream kernel hangs with mips and powerpc targets in
uniprocessor mode if SECCOMP is configured.

Bisect points to commit dbd952127d11 ("seccomp: introduce writer locking").
Turns out that code such as
BUG_ON(!spin_is_locked(&list_lock));
can not be used in uniprocessor mode because spin_is_locked() always
returns false in this configuration, and that assert_spin_locked()
exists for that very purpose and must be used instead.

Fixes: dbd952127d11 ("seccomp: introduce writer locking")
Cc: Kees Cook
Signed-off-by: Guenter Roeck
Signed-off-by: Kees Cook

Guenter Roeck
2014-08-12 04:29:12 +0800

07 Aug, 2014

1 commit

ae045e245 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Highlights:

1) Steady transitioning of the BPF instructure to a generic spot so
all kernel subsystems can make use of it, from Alexei Starovoitov.

2) SFC driver supports busy polling, from Alexandre Rames.

3) Take advantage of hash table in UDP multicast delivery, from David
Held.

4) Lighten locking, in particular by getting rid of the LRU lists, in
inet frag handling. From Florian Westphal.

5) Add support for various RFC6458 control messages in SCTP, from
Geir Ola Vaagland.

6) Allow to filter bridge forwarding database dumps by device, from
Jamal Hadi Salim.

7) virtio-net also now supports busy polling, from Jason Wang.

8) Some low level optimization tweaks in pktgen from Jesper Dangaard
Brouer.

9) Add support for ipv6 address generation modes, so that userland
can have some input into the process. From Jiri Pirko.

10) Consolidate common TCP connection request code in ipv4 and ipv6,
from Octavian Purdila.

11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.

12) Generic resizable RCU hash table, with intial users in netlink and
nftables. From Thomas Graf.

13) Maintain a name assignment type so that userspace can see where a
network device name came from (enumerated by kernel, assigned
explicitly by userspace, etc.) From Tom Gundersen.

14) Automatic flow label generation on transmit in ipv6, from Tom
Herbert.

15) New packet timestamping facilities from Willem de Bruijn, meant to
assist in measuring latencies going into/out-of the packet
scheduler, latency from TCP data transmission to ACK, etc"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
cxgb4 : Disable recursive mailbox commands when enabling vi
net: reduce USB network driver config options.
tg3: Modify tg3_tso_bug() to handle multiple TX rings
amd-xgbe: Perform phy connect/disconnect at dev open/stop
amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
net: sun4i-emac: fix memory leak on bad packet
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
Revert "net: phy: Set the driver when registering an MDIO bus device"
cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
team: Simplify return path of team_newlink
bridge: Update outdated comment on promiscuous mode
net-timestamp: ACK timestamp for bytestreams
net-timestamp: TCP timestamping
net-timestamp: SCHED timestamp on entering packet scheduler
net-timestamp: add key to disambiguate concurrent datagrams
net-timestamp: move timestamp flags out of sk_flags
net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
cxgb4i : Move stray CPL definitions to cxgb4 driver
tcp: reduce spurious retransmits due to transient SACK reneging
qlcnic: Initialize dcbnl_ops before register_netdev
...

Linus Torvalds
2014-08-07 00:38:14 +0800

03 Aug, 2014

3 commits

7ae457c1e net: filter: split 'struct sk_filter' into socket and bpf parts ... Browse Code »

clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix

split 'struct sk_filter' into
struct sk_filter {
atomic_t refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
u32 jited:1,
len:31;
struct sock_fprog_kern *orig_prog;
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
union {
struct sock_filter insns[0];
struct bpf_insn insnsi[0];
struct work_struct work;
};
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases

split SK_RUN_FILTER macro into:
SK_RUN_FILTER to be used with 'struct sk_filter *' and
BPF_PROG_RUN to be used with 'struct bpf_prog *'

__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function

also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:

sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter

API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet

API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-08-03 06:03:58 +0800
8fb575ca3 net: filter: rename sk_convert_filter() -> bpf_convert_filter() ... Browse Code »

to indicate that this function is converting classic BPF into eBPF
and not related to sockets

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-08-03 06:02:38 +0800
4df95ff48 net: filter: rename sk_chk_filter() -> bpf_check_classic() ... Browse Code »

trivial rename to indicate that this functions performs classic BPF checking

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-08-03 06:02:38 +0800

25 Jul, 2014

1 commit

2695fb552 net: filter: rename 'struct sock_filter_int' into 'struct bpf_insn' ... Browse Code »

eBPF is used by socket filtering, seccomp and soon by tracing and
exposed to userspace, therefore 'sock_filter_int' name is not accurate.
Rename it to 'bpf_insn'

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-07-25 14:27:17 +0800

19 Jul, 2014

9 commits

c2e1f2e30 seccomp: implement SECCOMP_FILTER_FLAG_TSYNC ... Browse Code »

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.

Changes across threads to the filter pointer includes a barrier.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes
Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:40 +0800
3ba2530cc seccomp: allow mode setting across threads ... Browse Code »

This changes the mode setting helper to allow threads to change the
seccomp mode from another thread. We must maintain barriers to keep
TIF_SECCOMP synchronized with the rest of the seccomp state.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:40 +0800
dbd952127 seccomp: introduce writer locking ... Browse Code »

Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic. However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:39 +0800
c8bee430d seccomp: split filter prep from check and apply ... Browse Code »

In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.

For extreme defensiveness, I've added a BUG_ON check for the calculated
size of the buffer allocation in case BPF_MAXINSN ever changes, which
shouldn't ever happen. The compiler should actually optimize out this
check since the test above it makes it impossible.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:39 +0800
1d4457f99 sched: move no_new_privs into new atomic flags ... Browse Code »

Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:38 +0800
48dc92b9f seccomp: add "seccomp" syscall ... Browse Code »

This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:37 +0800
3b23dd128 seccomp: split mode setting routines ... Browse Code »

Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:37 +0800
1f41b4504 seccomp: extract check/assign mode helpers ... Browse Code »

To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:36 +0800
d78ab02c2 seccomp: create internal mode-setting function ... Browse Code »

In preparation for having other callers of the seccomp mode setting
logic, split the prctl entry point away from the core logic that performs
seccomp mode setting.

Signed-off-by: Kees Cook
Reviewed-by: Oleg Nesterov
Reviewed-by: Andy Lutomirski

Kees Cook
2014-07-19 03:13:36 +0800

13 Jun, 2014

1 commit

f9da455b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:

1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.

3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.

4) BPF now has a "random" opcode, from Chema Gonzalez.

5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.

6) Support TCP fastopen over ipv6, from Daniel Lee.

7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.

8) Support software TSO in fec driver too, from Nimrod Andy.

9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...

Linus Torvalds
2014-06-13 05:27:40 +0800

07 Jun, 2014

1 commit

119ce5c8b kernel/seccomp.c: kernel-doc warning fix ... Browse Code »

+ fix small typo

Signed-off-by: Fabian Frederick
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2014-06-07 07:08:15 +0800

02 Jun, 2014

1 commit

348059313 net: filter: get rid of BPF_S_* enum ... Browse Code »

This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.

Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.

Before JIT conversion is being done, each compiler checks if A
is being loaded at startup to obtain information if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimalize
code changes in the classic JITs, we have introduced bpf_anc_helper().

Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Mircea Gherzan
Cc: Kees Cook
Acked-by: Chema Gonzalez
Signed-off-by: David S. Miller

Daniel Borkmann
2014-06-02 13:16:58 +0800

22 May, 2014

1 commit

5fe821a9d net: filter: cleanup invocation of internal BPF ... Browse Code »

Kernel API for classic BPF socket filters is:

sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter

Cleanup internal BPF kernel API as following:

sk_filter_select_runtime() - final step of internal BPF creation.
Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program

Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.

Example of internal BPF create, run, destroy:

struct sk_filter *fp;

fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
fp->len = prog_len;

sk_filter_select_runtime(fp);

SK_RUN_FILTER(fp, ctx);

sk_filter_free(fp);

Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.

Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-05-22 05:07:17 +0800

16 May, 2014

1 commit

8f577cadf seccomp: JIT compile seccomp filter ... Browse Code »

Take advantage of internal BPF JIT

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

seccomp_rule_add_exact(ctx,...
seccomp_rule_add_exact(ctx,...

rc = seccomp_load(ctx);

for (i = 0; i < 10000000; i++)
syscall(...);

$ sudo sysctl net.core.bpf_jit_enable=1
$ time ./bench
real 0m2.769s
user 0m1.136s
sys 0m1.624s

$ sudo sysctl net.core.bpf_jit_enable=0
$ time ./bench
real 0m5.825s
user 0m1.268s
sys 0m4.548s

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-05-16 04:31:30 +0800

17 Apr, 2014

1 commit

0acf07d24 seccomp: fix memory leak on filter attach ... Browse Code »

This sets the correct error code when final filter memory is unavailable,
and frees the raw filter no matter what.

unreferenced object 0xffff8800d6ea4000 (size 512):
comm "sshd", pid 278, jiffies 4294898315 (age 46.653s)
hex dump (first 32 bytes):
21 00 00 00 04 00 00 00 15 00 01 00 3e 00 00 c0 !...........>...
06 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] __kmalloc+0x280/0x320
[] prctl_set_seccomp+0x11e/0x3b0
[] SyS_prctl+0x3bb/0x4a0
[] system_call_fastpath+0x1a/0x1f
[] 0xffffffffffffffff

Reported-by: Masami Ichikawa
Signed-off-by: Kees Cook
Tested-by: Masami Ichikawa
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Kees Cook
2014-04-17 03:25:53 +0800

15 Apr, 2014

1 commit

2eac76483 seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF ... Browse Code »

Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:

audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000

audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000

These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.

The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:

struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};

Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:

syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.

Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Cc: Linus Torvalds
Cc: Eric Paris
Cc: James Morris
Cc: Kees Cook
Signed-off-by: David S. Miller

Daniel Borkmann
2014-04-15 04:26:47 +0800

13 Apr, 2014

1 commit

0b747172d Merge git://git.infradead.org/users/eparis/audit ... Browse Code »

Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...

Linus Torvalds
2014-04-13 03:38:53 +0800

04 Apr, 2014

1 commit

bea803183 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates from James Morris:
"Apart from reordering the SELinux mmap code to ensure DAC is called
before MAC, these are minor maintenance updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: correctly label /proc inodes in use before the policy is loaded
selinux: put the mmap() DAC controls before the MAC controls
selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
evm: enable key retention service automatically
ima: skip memory allocation for empty files
evm: EVM does not use MD5
ima: return d_name.name if d_path fails
integrity: fix checkpatch errors
ima: fix erroneous removal of security.ima xattr
security: integrity: Use a more current logging style
MAINTAINERS: email updates and other misc. changes
ima: reduce memory usage when a template containing the n field is used
ima: restore the original behavior for sending data with ima template
Integrity: Pass commname via get_task_comm()
fs: move i_readcount
ima: use static const char array definitions
security: have cap_dentry_init_security return error
ima: new helper: file_inode(file)
kernel: Mark function as static in kernel/seccomp.c
capability: Use current logging styles
...

Linus Torvalds
2014-04-04 00:26:18 +0800

31 Mar, 2014

1 commit

bd4cf0ed3 net: filter: rework/optimize internal BPF interpreter's instruction set ... Browse Code »

This patch replaces/reworks the kernel-internal BPF interpreter with
an optimized BPF instruction set format that is modelled closer to
mimic native instruction sets and is designed to be JITed with one to
one mapping. Thus, the new interpreter is noticeably faster than the
current implementation of sk_run_filter(); mainly for two reasons:

1. Fall-through jumps:

BPF jump instructions are forced to go either 'true' or 'false'
branch which causes branch-miss penalty. The new BPF jump
instructions have only one branch and fall-through otherwise,
which fits the CPU branch predictor logic better. `perf stat`
shows drastic difference for branch-misses between the old and
new code.

2. Jump-threaded implementation of interpreter vs switch
statement:

Instead of single table-jump at the top of 'switch' statement,
gcc will now generate multiple table-jump instructions, which
helps CPU branch predictor logic.

Note that the verification of filters is still being done through
sk_chk_filter() in classical BPF format, so filters from user- or
kernel space are verified in the same way as we do now, and same
restrictions/constraints hold as well.

We reuse current BPF JIT compilers in a way that this upgrade would
even be fine as is, but nevertheless allows for a successive upgrade
of BPF JIT compilers to the new format.

The internal instruction set migration is being done after the
probing for JIT compilation, so in case JIT compilers are able to
create a native opcode image, we're going to use that, and in all
other cases we're doing a follow-up migration of the BPF program's
instruction set, so that it can be transparently run in the new
interpreter.

In short, the *internal* format extends BPF in the following way (more
details can be taken from the appended documentation):

- Number of registers increase from 2 to 10
- Register width increases from 32-bit to 64-bit
- Conditional jt/jf targets replaced with jt/fall-through
- Adds signed > and >= insns
- 16 4-byte stack slots for register spill-fill replaced
with up to 512 bytes of multi-use stack space
- Introduction of bpf_call insn and register passing convention
for zero overhead calls from/to other kernel functions
- Adds arithmetic right shift and endianness conversion insns
- Adds atomic_add insn
- Old tax/txa insns are replaced with 'mov dst,src' insn

Performance of two BPF filters generated by libpcap resp. bpf_asm
was measured on x86_64, i386 and arm32 (other libpcap programs
have similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<>2)) != 0)' -dd

Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call,
smaller is better:

--x86_64--
fprog #1 fprog #1 fprog #2 fprog #2
cache-hit cache-miss cache-hit cache-miss
old BPF 90 101 192 202
new BPF 31 71 47 97
old BPF jit 12 34 17 44
new BPF jit TBD

--i386--
fprog #1 fprog #1 fprog #2 fprog #2
cache-hit cache-miss cache-hit cache-miss
old BPF 107 136 227 252
new BPF 40 119 69 172

--arm32--
fprog #1 fprog #1 fprog #2 fprog #2
cache-hit cache-miss cache-hit cache-miss
old BPF 202 300 475 540
new BPF 180 270 330 470
old BPF jit 26 182 37 202
new BPF jit TBD

Thus, without changing any userland BPF filters, applications on
top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
classifier, netfilter's xt_bpf, team driver's load-balancing mode,
and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need
to convert seccomp BPF in the same step to make use of the new
internal structure since it makes use of lower-level API details
without being further decoupled through higher-level calls like
sk_unattached_filter_{create,destroy}(), for example.

Just as for normal socket filtering, also seccomp BPF experiences
a time-to-verdict speedup:

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

seccomp_rule_add_exact(ctx,...
seccomp_rule_add_exact(ctx,...

rc = seccomp_load(ctx);

for (i = 0; i < 10000000; i++)
syscall(199, 100);

'short filter' has 2 rules
'large filter' has 200 rules

'short filter' performance is slightly better on x86_64/i386/arm32
'large filter' is much faster on x86_64 and i386 and shows no
difference on arm32

--x86_64-- short filter
old BPF: 2.7 sec
39.12% bench libc-2.15.so [.] syscall
8.10% bench [kernel.kallsyms] [k] sk_run_filter
6.31% bench [kernel.kallsyms] [k] system_call
5.59% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller
4.37% bench [kernel.kallsyms] [k] trace_hardirqs_off_caller
3.70% bench [kernel.kallsyms] [k] __secure_computing
3.67% bench [kernel.kallsyms] [k] lock_is_held
3.03% bench [kernel.kallsyms] [k] seccomp_bpf_load
new BPF: 2.58 sec
42.05% bench libc-2.15.so [.] syscall
6.91% bench [kernel.kallsyms] [k] system_call
6.25% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller
6.07% bench [kernel.kallsyms] [k] __secure_computing
5.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp

--arm32-- short filter
old BPF: 4.0 sec
39.92% bench [kernel.kallsyms] [k] vector_swi
16.60% bench [kernel.kallsyms] [k] sk_run_filter
14.66% bench libc-2.17.so [.] syscall
5.42% bench [kernel.kallsyms] [k] seccomp_bpf_load
5.10% bench [kernel.kallsyms] [k] __secure_computing
new BPF: 3.7 sec
35.93% bench [kernel.kallsyms] [k] vector_swi
21.89% bench libc-2.17.so [.] syscall
13.45% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
6.25% bench [kernel.kallsyms] [k] __secure_computing
3.96% bench [kernel.kallsyms] [k] syscall_trace_exit

--x86_64-- large filter
old BPF: 8.6 seconds
73.38% bench [kernel.kallsyms] [k] sk_run_filter
10.70% bench libc-2.15.so [.] syscall
5.09% bench [kernel.kallsyms] [k] seccomp_bpf_load
1.97% bench [kernel.kallsyms] [k] system_call
new BPF: 5.7 seconds
66.20% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
16.75% bench libc-2.15.so [.] syscall
3.31% bench [kernel.kallsyms] [k] system_call
2.88% bench [kernel.kallsyms] [k] __secure_computing

--i386-- large filter
old BPF: 5.4 sec
new BPF: 3.8 sec

--arm32-- large filter
old BPF: 13.5 sec
73.88% bench [kernel.kallsyms] [k] sk_run_filter
10.29% bench [kernel.kallsyms] [k] vector_swi
6.46% bench libc-2.17.so [.] syscall
2.94% bench [kernel.kallsyms] [k] seccomp_bpf_load
1.19% bench [kernel.kallsyms] [k] __secure_computing
0.87% bench [kernel.kallsyms] [k] sys_getuid
new BPF: 13.5 sec
76.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp
10.98% bench [kernel.kallsyms] [k] vector_swi
5.87% bench libc-2.17.so [.] syscall
1.77% bench [kernel.kallsyms] [k] __secure_computing
0.93% bench [kernel.kallsyms] [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new
internal BPF performance is better than the old one. Performance
gains will be even higher when BPF JIT is committed for the
new structure, which is planned in future work (as successive
JIT migrations).

BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Cc: Hagen Paul Pfeifer
Cc: Kees Cook
Cc: Paul Moore
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: linux-kernel@vger.kernel.org
Acked-by: Kees Cook
Signed-off-by: David S. Miller

Alexei Starovoitov
2014-03-31 12:45:09 +0800

20 Mar, 2014

1 commit

5e937a9ae syscall_get_arch: remove useless function arguments ... Browse Code »

Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args. So just get rid of both of
those things. Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org

Eric Paris
2014-03-20 22:11:59 +0800

28 Feb, 2014

1 commit

864f32a52 kernel: Mark function as static in kernel/seccomp.c ... Browse Code »

Mark function as static in kernel/seccomp.c because it is not used
outside this file.

This eliminates the following warning in kernel/seccomp.c:
kernel/seccomp.c:296:6: warning: no previous prototype for ?seccomp_attach_user_filter? [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria
Reviewed-by: Josh Triplett
Acked-by: Kees Cook
Acked-by: Will Drewry
Signed-off-by: James Morris

Rashika Kheria
2014-02-28 10:54:27 +0800

26 Mar, 2013

1 commit

d13274794 seccomp: allow BPF_XOR based ALU instructions. ... Browse Code »

Allow BPF_XOR based ALU instructions.

Signed-off-by: Nicolas Schichan
Acked-by: Kees Cook
Acked-by: Will Drewry
Signed-off-by: James Morris

Nicolas Schichan
2013-03-26 08:07:19 +0800

02 Oct, 2012

1 commit

87b526d34 seccomp: Make syscall skipping and nr changes more consistent ... Browse Code »

This fixes two issues that could cause incompatibility between
kernel versions:

- If a tracer uses SECCOMP_RET_TRACE to select a syscall number
higher than the largest known syscall, emulate the unknown
vsyscall by returning -ENOSYS. (This is unlikely to make a
noticeable difference on x86-64 due to the way the system call
entry works.)

- On x86-64 with vsyscall=emulate, skipped vsyscalls were buggy.

This updates the documentation accordingly.

Signed-off-by: Andy Lutomirski
Acked-by: Will Drewry
Signed-off-by: James Morris

Andy Lutomirski
2012-10-02 19:14:29 +0800

18 Apr, 2012

1 commit

8156b451f seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER ... Browse Code »

If both audit and seccomp filter support are disabled, 'ret' is marked
as unused.

If just seccomp filter support is disabled, data and skip are considered
unused.

This change fixes those build warnings.

Reported-by: Stephen Rothwell
Signed-off-by: Will Drewry
Acked-by: Kees Cook
Signed-off-by: James Morris

Will Drewry
2012-04-18 10:24:52 +0800

14 Apr, 2012

4 commits

fb0fadf9b ptrace,seccomp: Add PTRACE_SECCOMP support ... Browse Code »

This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit
SECCOMP_RET_DATA mask of the BPF program return value will be passed as
the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

If the subordinate process is not using seccomp filter, then no
system call notifications will occur even if the option is specified.

If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
is returned, the system call will not be executed and an -ENOSYS errno
will be returned to userspace.

This change adds a dependency on the system call slow path. Any future
efforts to use the system call fast path for seccomp filter will need to
address this restriction.

Signed-off-by: Will Drewry
Acked-by: Eric Paris

v18: - rebase
- comment fatal_signal check
- acked-by
- drop secure_computing_int comment
v17: - ...
v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
[note PT_TRACE_MASK disappears in linux-next]
v15: - add audit support for non-zero return codes
- clean up style (indan@nul.nu)
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
(Brings back a change to ptrace.c and the masks.)
v12: - rebase to linux-next
- use ptrace_event and update arch/Kconfig to mention slow-path dependency
- drop all tracehook changes and inclusion (oleg@redhat.com)
v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
(indan@nul.nu)
v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
v9: - n/a
v8: - guarded PTRACE_SECCOMP use with an ifdef
v7: - introduced
Signed-off-by: James Morris

Will Drewry
2012-04-14 09:13:21 +0800
bb6ea4301 seccomp: Add SECCOMP_RET_TRAP ... Browse Code »

Adds a new return value to seccomp filters that triggers a SIGSYS to be
delivered with the new SYS_SECCOMP si_code.

This allows in-process system call emulation, including just specifying
an errno or cleanly dumping core, rather than just dying.

Suggested-by: Markus Gutschke
Suggested-by: Julien Tinnes
Signed-off-by: Will Drewry
Acked-by: Eric Paris

v18: - acked-by, rebase
- don't mention secure_computing_int() anymore
v15: - use audit_seccomp/skip
- pad out error spacing; clean up switch (indan@nul.nu)
v14: - n/a
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - rebase on to linux-next
v11: - clarify the comment (indan@nul.nu)
- s/sigtrap/sigsys
v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
note suggested-by (though original suggestion had other behaviors)
v9: - changes to SIGILL
v8: - clean up based on changes to dependent patches
v7: - introduction
Signed-off-by: James Morris

Will Drewry
2012-04-14 09:13:21 +0800
acf3b2c71 seccomp: add SECCOMP_RET_ERRNO ... Browse Code »

This change adds the SECCOMP_RET_ERRNO as a valid return value from a
seccomp filter. Additionally, it makes the first use of the lower
16-bits for storing a filter-supplied errno. 16-bits is more than
enough for the errno-base.h calls.

Returning errors instead of immediately terminating processes that
violate seccomp policy allow for broader use of this functionality
for kernel attack surface reduction. For example, a linux container
could maintain a whitelist of pre-existing system calls but drop
all new ones with errnos. This would keep a logically static attack
surface while providing errnos that may allow for graceful failure
without the downside of do_exit() on a bad call.

This change also changes the signature of __secure_computing. It
appears the only direct caller is the arm entry code and it clobbers
any possible return value (register) immediately.

Signed-off-by: Will Drewry
Acked-by: Serge Hallyn
Reviewed-by: Kees Cook
Acked-by: Eric Paris

v18: - fix up comments and rebase
- fix bad var name which was fixed in later revs
- remove _int() and just change the __secure_computing signature
v16-v17: ...
v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
- clean up and pad out return codes (indan@nul.nu)
v14: - no change/rebase
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - move to WARN_ON if filter is NULL
(oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
- return immediately for filter==NULL (keescook@chromium.org)
- change evaluation to only compare the ACTION so that layered
errnos don't result in the lowest one being returned.
(keeschook@chromium.org)
v11: - check for NULL filter (keescook@chromium.org)
v10: - change loaders to fn
v9: - n/a
v8: - update Kconfig to note new need for syscall_set_return_value.
- reordered such that TRAP behavior follows on later.
- made the for loop a little less indent-y
v7: - introduced
Signed-off-by: James Morris

Will Drewry
2012-04-14 09:13:21 +0800
3dc1c1b2d seccomp: remove duplicated failure logging ... Browse Code »

This consolidates the seccomp filter error logging path and adds more
details to the audit log.

Signed-off-by: Will Drewry
Signed-off-by: Kees Cook
Acked-by: Eric Paris

v18: make compat= permanent in the record
v15: added a return code to the audit_seccomp path by wad@chromium.org
(suggested by eparis@redhat.com)
v*: original by keescook@chromium.org
Signed-off-by: James Morris

Kees Cook
2012-04-14 09:13:20 +0800