08 Oct, 2020

2 commits

  • When CONFIG_NET is not defined, I hit the following build error:
    kernel/trace/bpf_trace.o:(.rodata+0x110): undefined reference to `bpf_prog_test_run_raw_tp'

    Commit 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
    added test_run support for raw_tracepoint in kernel/trace/bpf_trace.c.
    But the test_run function bpf_prog_test_run_raw_tp is defined in
    net/bpf/test_run.c and is only available with CONFIG_NET=y.

    Adding a CONFIG_NET guard around the
    .test_run = bpf_prog_test_run_raw_tp,
    assignment fixed the above build issue, as sketched below.
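
    A minimal sketch of the guarded assignment in kernel/trace/bpf_trace.c
    (surrounding members omitted):

    const struct bpf_prog_ops raw_tracepoint_prog_ops = {
    #ifdef CONFIG_NET
            .test_run = bpf_prog_test_run_raw_tp,
    #endif
    };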

    Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201007062933.3425899-1-yhs@fb.com

    Yonghong Song
     
  • Fix build errors in kernel/bpf/verifier.c when CONFIG_NET is
    not enabled.

    ../kernel/bpf/verifier.c:3995:13: error: ‘btf_sock_ids’ undeclared here (not in a function); did you mean ‘bpf_sock_ops’?
    .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],

    ../kernel/bpf/verifier.c:3995:26: error: ‘BTF_SOCK_TYPE_SOCK_COMMON’ undeclared here (not in a function); did you mean ‘PTR_TO_SOCK_COMMON’?
    .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],

    Fixes: 1df8f55a37bd ("bpf: Enable bpf_skc_to_* sock casting helper to networking prog type")
    Signed-off-by: Randy Dunlap
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20201007021613.13646-1-rdunlap@infradead.org

    Randy Dunlap
     

06 Oct, 2020

1 commit

  • Recent improvements in lockdep highlighted a potential A-A deadlock with
    pcpu_freelist in NMI:

    ./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi

    [ 18.984807] ================================
    [ 18.984807] WARNING: inconsistent lock state
    [ 18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
    [ 18.984809] --------------------------------
    [ 18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
    [ 18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
    [ 18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180
    [ 18.984813] {INITIAL USE} state was registered at:
    [ 18.984814] lock_acquire+0x175/0x7c0
    [ 18.984814] _raw_spin_lock+0x2c/0x40
    [ 18.984815] __pcpu_freelist_pop+0xe3/0x180
    [ 18.984815] pcpu_freelist_pop+0x31/0x40
    [ 18.984816] htab_map_alloc+0xbbf/0xf40
    [ 18.984816] __do_sys_bpf+0x5aa/0x3ed0
    [ 18.984817] do_syscall_64+0x2d/0x40
    [ 18.984818] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 18.984818] irq event stamp: 12
    [...]
    [ 18.984822] other info that might help us debug this:
    [ 18.984823] Possible unsafe locking scenario:
    [ 18.984823]
    [ 18.984824] CPU0
    [ 18.984824] ----
    [ 18.984824] lock(&head->lock);
    [ 18.984826] <Interrupt>
    [ 18.984826] lock(&head->lock);
    [ 18.984827]
    [ 18.984828] *** DEADLOCK ***
    [ 18.984828]
    [ 18.984829] 2 locks held by test_progs/1990:
    [...]
    [ 18.984838] <NMI>
    [ 18.984838] dump_stack+0x9a/0xd0
    [ 18.984839] lock_acquire+0x5c9/0x7c0
    [ 18.984839] ? lock_release+0x6f0/0x6f0
    [ 18.984840] ? __pcpu_freelist_pop+0xe3/0x180
    [ 18.984840] _raw_spin_lock+0x2c/0x40
    [ 18.984841] ? __pcpu_freelist_pop+0xe3/0x180
    [ 18.984841] __pcpu_freelist_pop+0xe3/0x180
    [ 18.984842] pcpu_freelist_pop+0x17/0x40
    [ 18.984842] ? lock_release+0x6f0/0x6f0
    [ 18.984843] __bpf_get_stackid+0x534/0xaf0
    [ 18.984843] bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
    [ 18.984844] bpf_overflow_handler+0x12f/0x3f0

    This is because pcpu_freelist_head.lock is accessed in both NMI and
    non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.

    Since an NMI interrupts non-NMI context, when the NMI context tries to
    lock the raw_spinlock, the non-NMI context of the same CPU may already
    hold the lock and be blocked from unlocking it. For a system with N CPUs,
    there could be N NMIs at the same time, and they may block N non-NMI
    raw_spinlocks. This is tricky for pcpu_freelist_push(), where unlike
    _pop(), a failing _push() means leaking memory. This issue is more likely
    to trigger on a non-SMP system.

    Fix this issue with an extra list, pcpu_freelist.extralist. The extralist
    is primarily used to absorb _push() operations when raw_spin_trylock()
    fails on all the per-CPU lists. It should be empty most of the time. The
    following summarizes the behavior of pcpu_freelist in NMI and non-NMI
    context:

    non-NMI pop():  use _lock(); check per-CPU lists first;
                    if all per-CPU lists are empty, check extralist;
                    if extralist is empty, return NULL.

    non-NMI push(): use _lock(); only push to per-CPU lists.

    NMI pop():      use _trylock(); check per-CPU lists first;
                    if all per-CPU lists are locked or empty, check extralist;
                    if extralist is locked or empty, return NULL.

    NMI push():     use _trylock(); check per-CPU lists first;
                    if all per-CPU lists are locked, try push to extralist;
                    if extralist is also locked, keep trying on per-CPU lists.
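
    A condensed sketch of the NMI-side push logic described above (derived
    from the behavior summary, not a verbatim copy of
    kernel/bpf/percpu_freelist.c):

    static void pcpu_freelist_push_nmi(struct pcpu_freelist *s,
                                       struct pcpu_freelist_node *node)
    {
            int cpu, orig_cpu;

            orig_cpu = cpu = raw_smp_processor_id();
            while (1) {
                    struct pcpu_freelist_head *head;

                    head = per_cpu_ptr(s->freelist, cpu);
                    /* _trylock() only: a plain _lock() could deadlock
                     * against the non-NMI context this NMI interrupted. */
                    if (raw_spin_trylock(&head->lock)) {
                            node->next = head->first;
                            head->first = node;
                            raw_spin_unlock(&head->lock);
                            return;
                    }
                    cpu = cpumask_next(cpu, cpu_possible_mask);
                    if (cpu >= nr_cpu_ids)
                            cpu = 0;
                    /* One full round over the per-CPU lists failed: try the
                     * shared extralist before spinning over the CPUs again. */
                    if (cpu == orig_cpu &&
                        raw_spin_trylock(&s->extralist.lock)) {
                            node->next = s->extralist.first;
                            s->extralist.first = node;
                            raw_spin_unlock(&s->extralist.lock);
                            return;
                    }
            }
    }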

    Reported-by: Alexei Starovoitov
    Signed-off-by: Song Liu
    Signed-off-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com

    Song Liu
     

05 Oct, 2020

1 commit

  • Replace /* fallthrough */ comments with the new pseudo-keyword
    macro fallthrough [1].

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
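
    For illustration (a made-up switch, not a hunk from this commit), the
    conversion replaces the comment with the macro, which expands to
    __attribute__((fallthrough)) on compilers that support it:

    switch (opcode) {
    case BPF_ADD:
            res = a + b;
            fallthrough;    /* was: a bare comment the compiler can't verify */
    case BPF_SUB:
            res -= b;
            break;
    }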

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20201002234217.GA12280@embeddedor

    Gustavo A. R. Silva
     

03 Oct, 2020

4 commits

  • We are missing a deref for the case when we are doing BPF_PROG_BIND_MAP
    on a map that's being already held by the program.
    There is 'if (ret) bpf_map_put(map)' below which doesn't trigger
    because we don't consider this an error.
    Let's add missing bpf_map_put() for this specific condition.
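
    A sketch of the fix in bpf_prog_bind_map() (kernel/bpf/syscall.c), with
    the surrounding context abbreviated:

    /* The program may already hold a reference to this map. */
    for (i = 0; i < prog->aux->used_map_cnt; i++) {
            if (used_maps_old[i] == map) {
                    bpf_map_put(map);  /* drop the ref taken just above */
                    goto out_unlock;
            }
    }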

    Fixes: ef15314aa5de ("bpf: Add BPF_PROG_BIND_MAP syscall")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201003002544.3601440-1-sdf@google.com

    Stanislav Fomichev
     
  • Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
    helper always returns a valid pointer, therefore no need to check
    returned value for NULL. Also note that all programs run with
    preemption disabled, which means that the returned pointer is stable
    during all the execution of the program.
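
    A hypothetical usage, modeled on the selftests for this helper
    (runqueues is a real percpu kernel variable; the program body is
    illustrative):

    extern const struct rq runqueues __ksym;

    SEC("raw_tp/sys_enter")
    int on_sys_enter(const void *ctx)
    {
            /* Always non-NULL; no check required. */
            struct rq *rq = bpf_this_cpu_ptr(&runqueues);

            bpf_printk("nr_running=%u", rq->nr_running);
            return 0;
    }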

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com

    Hao Luo
     
  • Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
    bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
    except that it may return NULL. This happens when the cpu parameter is
    out of range. So the caller must check the returned value.
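
    A hypothetical counterpart using bpf_per_cpu_ptr(), showing the required
    NULL check (again borrowing the real runqueues percpu variable):

    extern const struct rq runqueues __ksym;

    SEC("raw_tp/sys_enter")
    int read_cpu1(const void *ctx)
    {
            struct rq *rq = bpf_per_cpu_ptr(&runqueues, 1);

            if (!rq)        /* cpu 1 may be out of range */
                    return 0;
            bpf_printk("nr_running=%u", rq->nr_running);
            return 0;
    }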

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com

    Hao Luo
     
  • Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
    ksym so that further dereferences on the ksym can use the BTF info
    to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
    the verifier reads the btf_id stored in the insn[0]'s imm field and
    marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
    which is encoded in btf_vmlinux by pahole. If the VAR is not of a struct
    type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
    and the mem_size is resolved to the size of the VAR's type.

    From the VAR btf_id, the verifier can also read the address of the
    ksym's corresponding kernel var from kallsyms and use that to fill
    dst_reg.

    Therefore, the proper functionality of pseudo_btf_id depends on (1)
    kallsyms and (2) the encoding of kernel global VARs in pahole, which
    should be available since pahole v1.18.
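
    From the BPF C side (a sketch modeled on the selftests for this feature),
    a ksym variable is declared with the __ksym attribute and each reference
    compiles to a pseudo_btf_id ld_imm64:

    extern const struct rq runqueues __ksym;   /* struct type -> PTR_TO_BTF_ID */
    extern const int bpf_prog_active __ksym;   /* non-struct  -> PTR_TO_MEM */

    SEC("raw_tp/sys_enter")
    int handler(const void *ctx)
    {
            /* The verifier fills in the kallsyms addresses. */
            __u64 rq_addr = (__u64)&runqueues;
            __u64 active_addr = (__u64)&bpf_prog_active;

            bpf_printk("rq=%llx active=%llx", rq_addr, active_addr);
            return 0;
    }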

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com

    Hao Luo
     

01 Oct, 2020

2 commits

  • Currently, a perf event in a perf event array is removed from the array
    when the map fd used to add the event is closed. This behavior makes it
    difficult to share perf events with a perf event array.

    Introduce a new map flag, BPF_F_PRESERVE_ELEMS, that keeps the perf
    events open. With this flag set, perf events in the array are not
    removed when the original map fd is closed. Instead, the perf event will
    stay in the map until 1) it is explicitly removed from the array; or 2)
    the array is freed.
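
    A hypothetical BTF-style map definition using the new flag:

    struct {
            __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
            __uint(map_flags, BPF_F_PRESERVE_ELEMS);
            __uint(key_size, sizeof(int));
            __uint(value_size, sizeof(int));
            __uint(max_entries, 64);
    } events SEC(".maps");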

    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com

    Song Liu
     
  • With its use in BPF, the cookie generator can be called very frequently,
    in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
    and attached to the root cgroup, for example, when used in v1/v2 mixed
    environments. When there's a high churn on sockets in the system, there
    can be many parallel requests to the bpf_get_socket_cookie() and
    bpf_get_netns_cookie() helpers, which then cause contention on the
    atomic counter.

    As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino
    allocator"), add a small helper library that both can use for the 64 bit
    counters. Given this can be called from different contexts, we also need
    to deal with potential nested calls even though in practice they are
    considered extremely rare. One idea as suggested by Eric Dumazet was
    to use a reverse counter for this situation since we don't expect 64 bit
    overflows anyways; that way, we can avoid bigger gaps in the 64 bit
    counter space compared to just batch-wise increase. Even on machines
    with a small number of cores (e.g. 4), cookie generation shrinks from
    min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
    in parallel from multiple CPUs.
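
    A condensed sketch of the scheme (per-CPU batches allocated forward from
    a shared counter, with the reverse counter absorbing rare nested calls);
    names follow the new include/linux/cookie.h but are abbreviated here:

    #define COOKIE_LOCAL_BATCH      4096

    struct pcpu_gen_cookie {
            local_t nesting;
            u64 last;
    } __aligned(16);

    struct gen_cookie {
            struct pcpu_gen_cookie __percpu *local;
            atomic64_t forward_last ____cacheline_aligned_in_smp;
            atomic64_t reverse_last;
    };

    static __always_inline u64 gen_cookie_next(struct gen_cookie *gc)
    {
            struct pcpu_gen_cookie *local = this_cpu_ptr(gc->local);
            u64 val;

            if (likely(local_inc_return(&local->nesting) == 1)) {
                    val = local->last;
                    if (unlikely((val & (COOKIE_LOCAL_BATCH - 1)) == 0)) {
                            /* Refill this CPU's batch from the shared counter. */
                            s64 next = atomic64_add_return(COOKIE_LOCAL_BATCH,
                                                           &gc->forward_last);
                            val = next - COOKIE_LOCAL_BATCH;
                    }
                    local->last = ++val;
            } else {
                    /* Nested call: allocate from the reverse counter to
                     * avoid gaps in the forward counter space. */
                    val = atomic64_dec_return(&gc->reverse_last);
            }
            local_dec(&local->nesting);
            return val;
    }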

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Cc: Eric Dumazet
    Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net

    Daniel Borkmann
     

30 Sep, 2020

4 commits

  • Eelco reported we can't properly access arguments if the tracing
    program is attached to an extension program.

    Given the following program:

    SEC("classifier/test_pkt_md_access")
    int test_pkt_md_access(struct __sk_buff *skb)

    with its extension:

    SEC("freplace/test_pkt_md_access")
    int test_pkt_md_access_new(struct __sk_buff *skb)

    and tracing that extension with:

    SEC("fentry/test_pkt_md_access_new")
    int BPF_PROG(fentry, struct sk_buff *skb)

    It's not possible to access the skb argument in the fentry program;
    the verifier fails with the following error:

    ; int BPF_PROG(fentry, struct sk_buff *skb)
    0: (79) r1 = *(u64 *)(r1 +0)
    invalid bpf_context access off=0 size=8

    The problem is that btf_ctx_access gets the context type for the
    traced program, which in this case is the extension.

    But when we trace an extension program, we want to get the context
    type of the program that the extension is attached to, so we can
    access the argument properly in the trace program.

    This version of the patch is tweaked slightly from Jiri's original one,
    since the refactoring in the previous patches means we have to get the
    target prog type from the new variable in prog->aux instead of directly
    from the target prog.

    Reported-by: Eelco Chaudron
    Suggested-by: Jiri Olsa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/160138355278.48470.17057040257274725638.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • This enables support for attaching freplace programs to multiple attach
    points. It does this by amending the UAPI for bpf_link_create with a target
    btf ID that can be used to supply the new attachment point along with the
    target program fd. The target must be compatible with the target that was
    supplied at program load time.

    The implementation reuses the checks that were factored out of
    check_attach_btf_id() to ensure compatibility between the BTF types of the
    old and new attachment. If these match, a new bpf_tracing_link will be
    created for the new attach target, allowing multiple attachments to
    co-exist simultaneously.

    The code could theoretically support multiple-attach of other types of
    tracing programs as well, but since I don't have a use case for any of
    those, there is no API support for doing so.
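
    A minimal sketch of re-attaching an freplace program via the extended
    UAPI (raw bpf(2) syscall; fds and btf_id are placeholders, error
    handling omitted):

    union bpf_attr attr = {};

    attr.link_create.prog_fd = freplace_prog_fd;      /* loaded freplace prog */
    attr.link_create.target_fd = new_target_prog_fd;  /* program to attach to */
    attr.link_create.target_btf_id = btf_id;          /* function within it */
    link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));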

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • In preparation for allowing multiple attachments of freplace programs, move
    the references to the target program and trampoline into the
    bpf_tracing_link structure when that is created. To do this atomically,
    introduce a new mutex in prog->aux to protect writing to the two pointers
    to target prog and trampoline, and rename the members to make it clear that
    they are related.

    With this change, it is no longer possible to attach the same tracing
    program multiple times (detaching in-between), since the reference from the
    tracing program to the target disappears on the first attach. However,
    since the next patch will let the caller supply an attach target, that will
    also make it possible to attach to the same place multiple times.

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • The Makefile in bpf/preload builds a local copy of libbpf, but does not
    properly clean up after itself. This can lead to subsequent compilation
    failures, since the feature detection cache is kept around, which can
    cause subsequent detection runs to fail.

    Fix this by properly setting clean-files, and while we're at it, also add a
    .gitignore for the directory to ignore the build artifacts.

    Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200927193005.8459-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

29 Sep, 2020

11 commits

  • A helper is added to allow seq file writing of kernel data
    structures using vmlinux BTF. Its signature is

    long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                            u32 btf_ptr_size, u64 flags);

    Flags and struct btf_ptr definitions/use are identical to the
    bpf_snprintf_btf helper, and the helper returns 0 on success
    or a negative error value.
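
    A hypothetical task iterator using the helper:

    SEC("iter/task")
    int dump_task_struct(struct bpf_iter__task *ctx)
    {
            struct seq_file *seq = ctx->meta->seq;
            struct task_struct *task = ctx->task;
            struct btf_ptr ptr = {};

            if (!task)
                    return 0;
            ptr.ptr = task;
            ptr.type_id = __builtin_btf_type_id(struct task_struct, 1);
            return bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
    }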

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based
    representations of larger kernel data structures such as task_struct,
    this will be insufficient.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-6-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • A helper is added to support tracing kernel type information in BPF
    using the BPF Type Format (BTF). Its signature is

    long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
                          u32 btf_ptr_size, u64 flags);

    struct btf_ptr * specifies:

    - a pointer to the data to be traced;
    - the BTF id of the type of data pointed to; and
    - a flags field for future use. These flags are not to be confused
      with the BTF_F_* flags below that control how the btf_ptr is
      displayed; the flags member of struct btf_ptr may be used to
      disambiguate types in kernel versus module BTF, etc. The main
      distinction is that these flags relate to the type and the
      information needed to identify it, not to how it is displayed.

    For example, a BPF program with a struct sk_buff *skb
    could do the following:

    static struct btf_ptr b = { };

    b.ptr = skb;
    b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
    bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0);

    Default output looks like this:

    (struct sk_buff){
     .transport_header = (__u16)65535,
     .mac_header = (__u16)65535,
     .end = (sk_buff_data_t)192,
     .head = (unsigned char *)0x000000007524fd8b,
     .data = (unsigned char *)0x000000007524fd8b,
     .truesize = (unsigned int)768,
     .users = (refcount_t){
      .refs = (atomic_t){
       .counter = (int)1,
      },
     },
    }

    Flags modifying display are as follows:

    - BTF_F_COMPACT: no formatting around type information
    - BTF_F_NONAME: no struct/union member names/types
    - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
      equivalent to %px
    - BTF_F_ZERO: show zero-valued struct/union members;
      they are not displayed by default

    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • Generalize the "seq_show" seq file support in btf.c to support
    a generic show callback, of which we support two instances: the
    current seq file show, and a show with snprintf() behaviour which
    instead writes the type data to a supplied string.

    Both classes of show function call btf_type_show() with different
    targets: the seq file or the string to be written. In the string
    case we need to track additional data - the length left in the string
    to write, and the length we would have written (a la snprintf).

    By default, show will display type information, field members and
    their types and values, etc., and the information is indented
    based upon structure depth. Zeroed fields are omitted.

    Show, however, supports flags which modify its behaviour:

    BTF_SHOW_COMPACT - suppress newline/indent.
    BTF_SHOW_NONAME - suppress show of type and member names.
    BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
    BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
    BTF_SHOW_ZERO - show zeroed values (by default they are not shown).

    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • It will be used later for BPF structure display support.

    Signed-off-by: Alan Maguire
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com

    Alan Maguire
     
  • The check_attach_btf_id() function really does three things:

    1. It performs a bunch of checks on the program to ensure that the
       attachment is valid.

    2. It stores a bunch of state about the attachment being requested in
       the verifier environment and struct bpf_prog objects.

    3. It allocates a trampoline for the attachment.

    This patch splits out (1.) and (3.) into separate functions which will
    perform the checks, but return the computed values instead of directly
    modifying the environment. This is done in preparation for reusing the
    checks when the actual attachment is happening, which will allow tracing
    programs to have multiple (compatible) attachments.

    This also fixes a bug where a bunch of checks were skipped if a trampoline
    already existed for the tracing target.

    Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
    Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs")
    Acked-by: Andrii Nakryiko
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov

    Toke Høiland-Jørgensen
     
  • In preparation for moving code around, change a bunch of references to
    env->log (and the verbose() logging helper) to use bpf_log() and a direct
    pointer to struct bpf_verifier_log. While we're touching the function
    signature, mark the 'prog' argument to bpf_check_type_match() as const.

    Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
    for the log struct so we can re-use the code with logging disabled.

    Acked-by: Andrii Nakryiko
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov

    Toke Høiland-Jørgensen
     
  • From the checks and commit messages for modify_return, it seems it was
    never the intention that it should be possible to attach a tracing program
    with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
    However, check_attach_modify_return() will only look at the function name,
    so if the target function starts with "security_", the attach will be
    allowed even for bpf2bpf attachment.

    Fix this oversight by also blocking the modification if a target program is
    supplied.

    Fixes: 18644cec714a ("bpf: Fix use-after-free in fmod_ret check")
    Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
    Acked-by: Andrii Nakryiko
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov

    Toke Høiland-Jørgensen
     
  • Allow passing a pointer to a BTF struct sock_common* when updating
    a sockmap or sockhash. Since BTF pointers can fault and therefore be
    NULL at runtime we need to add an additional !sk check to
    sock_map_update_elem. Since we may be passed a request or timewait
    socket we also need to check sk_fullsock. Doing this allows calling
    map_update_elem on sockmap from bpf_iter context, which uses
    BTF pointers.

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com

    Lorenz Bauer
     
  • Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine
    signature since it is no longer needed.

    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org

    Lorenzo Bianconi
     
  • Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
    the target program on a specific CPU. This is achieved by a new flag in
    bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
    is triggered on the cpu with id bpf_attr.test.cpu. This feature is needed
    for BPF programs that handle perf_event and other percpu resources, as the
    program can access these resources locally.
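
    A hypothetical invocation via the raw syscall; the field names follow
    the bpf_attr.test extension described above:

    union bpf_attr attr = {};

    attr.test.prog_fd = prog_fd;
    attr.test.flags = BPF_F_TEST_RUN_ON_CPU;
    attr.test.cpu = 2;      /* trigger the program on CPU 2 */
    err = syscall(__NR_bpf, BPF_PROG_TEST_RUN, &attr, sizeof(attr));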

    Signed-off-by: Song Liu
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com

    Song Liu
     

26 Sep, 2020

4 commits

  • In the BPF_AND and BPF_OR alu cases we have this pattern when the src
    and dst tnums are constants:

    1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
    2 scalar32_min_max_[op]
    3    if (known) return
    4 scalar_min_max_[op]
    5    if (known)
    6       __mark_reg_known(dst_reg,
                             dst_reg->var_off.value [op] src_reg.var_off.value)

    The result is that in 1 we calculate the var_off value and store it in
    the dst_reg. Then in 6 we duplicate this logic, doing the op again on
    the value.

    The duplication comes from the tnum_[op] handlers because they have
    already done the value calculation. For example, this is tnum_and():

    struct tnum tnum_and(struct tnum a, struct tnum b)
    {
            u64 alpha, beta, v;

            alpha = a.value | a.mask;
            beta = b.value | b.mask;
            v = a.value & b.value;
            return TNUM(v, alpha & beta & ~v);
    }

    So let's remove the redundant op calculation. It's confusing for readers
    and unnecessary. It's also not harmful because those ops have the
    property r1 & r1 = r1 and r1 | r1 = r1.

    Signed-off-by: John Fastabend
    Signed-off-by: Alexei Starovoitov

    John Fastabend
     
  • This patch changes the bpf_sk_storage_*() helpers to take
    ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will also work with the
    pointers returned by the bpf_skc_to_*() helpers.

    A micro benchmark has been done on a "cgroup_skb/egress" bpf program
    which does a bpf_sk_storage_get(). It was driven by netperf doing
    a 4096 connected UDP_STREAM test with 64-byte packets.
    The stats from "kernel.bpf_stats_enabled" show no meaningful difference.

    The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
    btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
    no longer needed, so they are removed.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Lorenz Bauer
    Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com

    Martin KaFai Lau
     
  • There is a constant need to add more fields to bpf_tcp_sock
    for the bpf programs running at tc, sock_ops, etc.

    A current workaround could be to use bpf_probe_read_kernel(). However,
    other than requiring another helper call to read each field and missing
    CO-RE, it is also not as intuitive as directly reading
    "tp->lsndtime", for example. Since the program already needs the perfmon
    cap to do bpf_probe_read_kernel(), it will be much easier if the bpf
    prog can directly read from the tcp_sock.

    This patch tries to do that by using the existing casting-helpers
    bpf_skc_to_*() whose func_proto returns a btf_id. For example, the
    func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
    kernel "struct tcp_sock".

    These helpers are also added to is_ptr_cast_function(), which ensures
    the returned reg (BPF_REG_0) also carries the ref_obj_id. That keeps
    the ref-tracking working properly.

    The bpf_skc_to_* helpers are made available to most of the bpf prog
    types in filter.c. The bpf_skc_to_* helpers will be limited by
    perfmon cap.

    This patch adds an ARG_PTR_TO_BTF_ID_SOCK_COMMON. A helper accepting
    this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
    or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*()
    helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
    they will accept pointer obtained from skb->sk.

    Instead of specifying both arg_type and arg_btf_id in the same func_proto
    which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
    the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
    compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is
    always the same. Discussion in this thread:
    https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/

    The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
    expecting a PTR_TO_BTF_ID which could be NULL. This is the same
    behavior as the existing helper taking ARG_PTR_TO_BTF_ID.

    The _SOCK_COMMON part means the helper is also expecting the legacy
    SOCK_COMMON pointer.

    By excluding the _OR_NULL part, the bpf prog cannot call helper
    with a literal NULL which doesn't make sense in most cases.
    e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO_*_OR_NULL
    reg has to do a NULL check first before passing into the helper or else
    the bpf prog will be rejected. This behavior is nothing new and
    consistent with the current expectation during bpf-prog-load.

    [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
    ARG_PTR_TO_SOCK* of other existing helpers later such that
    those existing helpers can take the PTR_TO_BTF_ID returned by
    the bpf_skc_to_*() helpers.

    The only special case is bpf_sk_lookup_assign() which can accept a
    literal NULL ptr. It has to be handled specially in another follow
    up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
    to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]

    [ When converting the older helpers that take ARG_PTR_TO_SOCK* in
    the later patch, if the kernel does not support BTF,
    ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
    because no reg->type could have PTR_TO_BTF_ID in this case.

    It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
    here though because these helpers must require BTF vmlinux to begin
    with. ]

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com

    Martin KaFai Lau
     
  • check_reg_type() checks whether a reg can be used as an arg of a
    func_proto. For PTR_TO_BTF_ID, the check is not actually complete
    until reg->btf_id points to a kernel struct that is acceptable to
    the func_proto.

    Thus, this patch moves the btf_id check into check_reg_type().
    "arg_type" and "arg_btf_id" are passed to check_reg_type() instead of
    "compatible". The compatible_reg_types[] usage is localized in
    check_reg_type() now.

    The "if (!btf_id) verbose(...); " is also removed since it won't happen.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Lorenz Bauer
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com

    Martin KaFai Lau
     

24 Sep, 2020

4 commits

  • …nel/git/paulmck/linux-rcu into bpf-next

    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

    Alexei Starovoitov
     
  • This reverts commit 31f23a6a181c81543b10a1a9056b0e6c7ef1c747.

    This change made many selftests/bpf tests flaky: flow_dissector,
    sk_lookup, sk_assign and others. There was no issue in the code.

    Signed-off-by: Alexei Starovoitov

    Alexei Starovoitov
     
  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-09-23

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 95 non-merge commits during the last 22 day(s) which contain
    a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

    The main changes are:

    1) Full multi function support in libbpf, from Andrii.

    2) Refactoring of function argument checks, from Lorenz.

    3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.

    4) Program metadata support, from YiFei.

    5) bpf iterator optimizations, from Yonghong.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Arrays with designated initializers have an implicit length of the highest
    initialized index plus one. I used this to ensure that newly added entries
    in enum bpf_reg_type get a NULL entry in compatible_reg_types.

    This is difficult to understand since it requires knowledge of the
    peculiarities of designated initializers. Use __BPF_ARG_TYPE_MAX to size
    the array instead.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200923160156.80814-1-lmb@cloudflare.com

    Lorenz Bauer
     

23 Sep, 2020

3 commits

  • Two minor conflicts:

    1) net/ipv4/route.c, adding a new local variable while
    moving another local variable and removing its
    initial assignment.

    2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
    One pretty prints the port mode differently, whilst another
    changes the driver to try and obtain the port mode from
    the port node rather than the switch node.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from Jakub Kicinski:

    - fix failure to add bond interfaces to a bridge, the offload-handling
    code was too defensive there and recent refactoring unearthed that.
    Users complained (Ido)

    - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
    in TCP ACK and reset packets (Wei)

    - fix a deadlock with bpf iterator. Hopefully we're in the clear on
    this front now... (Yonghong)

    - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

    - fix AQL on mt76 devices with FW rate control and add a couple of AQL
    issues in mac80211 code (Felix)

    - fix authentication issue with mwifiex (Maximilian)

    - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

    - fix exception handling for multipath routes via same device (David
    Ahern)

    - revert back to a BH spin lock flavor for nsid_lock: there are paths
    which do require the BH context protection (Taehee)

    - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

    - fix ife module load deadlock (Cong)

    - make an adjustment to netlink reply message type for code added in
    this release (the sole change touching uAPI here) (Michal)

    - a number of fixes for small NXP and Microchip switches (Vladimir)

    [ Pull request acked by David: "you can expect more of this in the
    future as I try to delegate more things to Jakub" ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
    net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
    net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
    net: Update MAINTAINERS for MediaTek switch driver
    net/mlx5e: mlx5e_fec_in_caps() returns a boolean
    net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
    net/mlx5e: kTLS, Fix leak on resync error flow
    net/mlx5e: kTLS, Add missing dma_unmap in RX resync
    net/mlx5e: kTLS, Fix napi sync and possible use-after-free
    net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
    net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
    net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
    net/mlx5e: Fix endianness when calculating pedit mask first bit
    net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
    net/mlx5e: CT: Fix freeing ct_label mapping
    net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
    net/mlx5e: Use synchronize_rcu to sync with NAPI
    net/mlx5e: Use RCU to protect rq->xdp_prog
    ...

    Linus Torvalds
     
  • Pull tracing fixes from Steven Rostedt:

    - Check kprobe is enabled before unregistering from ftrace as it isn't
    registered when disabled.

    - Remove kprobes enabled via command-line that is on init text when
    freed.

    - Add missing RCU synchronization for ftrace trampoline symbols removed
    from kallsyms.

    - Free trampoline on error path if ftrace_startup() fails.

    - Give more space for the longer PID numbers in trace output.

    - Fix a possible double free in the histogram code.

    - A couple of fixes that were discovered by sparse.

    * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: init: make xbc_namebuf static
    kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
    tracing: fix double free
    ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
    tracing: Make the space reserved for the pid wider
    ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
    ftrace: Free the trampoline when ftrace_startup() fails
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

    Linus Torvalds
     

22 Sep, 2020

4 commits

  • Some kernel builds might inline the vfs_getattr call within the fstat
    syscall code path, so the fentry/vfs_getattr trampoline is not called.

    Add security_inode_getattr to the allowlist and switch the d_path test
    stat trampoline to security_inode_getattr.

    Keep dentry_open and filp_close, because they are in their own files
    and thus unlikely to be inlined; but in case they are, also add
    security_file_open.

    Add flags that indicate the trampolines were called, and fail the
    test if any of them got missed, so it's easier to identify the
    issue next time.

    Fixes: e4d1af4b16f8 ("selftests/bpf: Add test for d_path helper")
    Suggested-by: Alexei Starovoitov
    Signed-off-by: Jiri Olsa
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200918112338.2618444-1-jolsa@kernel.org

    Jiri Olsa
     
  • The mapping between bpf_arg_type and bpf_reg_type is encoded in a big
    hairy if statement that is hard to follow. The debug output also leaves
    something to be desired: if a reg_type doesn't match, we only print one
    of the options instead of printing all the valid ones.

    Convert the if statement into a table which is then used to drive type
    checking. If none of the reg_types match we print all options, e.g.:

    R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value
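
    An abbreviated sketch of the table (the real one in kernel/bpf/verifier.c
    covers every argument type):

    struct bpf_reg_types {
            const enum bpf_reg_type types[10];
    };

    static const struct bpf_reg_types map_key_value_types = {
            .types = {
                    PTR_TO_STACK,
                    PTR_TO_PACKET,
                    PTR_TO_PACKET_META,
                    PTR_TO_MAP_VALUE,
            },
    };

    static const struct bpf_reg_types *compatible_reg_types[] = {
            [ARG_PTR_TO_MAP_KEY]    = &map_key_value_types,
            [ARG_PTR_TO_MAP_VALUE]  = &map_key_value_types,
            /* ... one entry per bpf_arg_type ... */
    };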

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com

    Lorenz Bauer
     
  • check_func_arg has a plethora of weird if statements with empty branches.
    They work around the fact that *_OR_NULL argument types should accept a
    SCALAR_VALUE register, as long as its value is 0. These statements make
    it difficult to reason about the type checking logic.

    Instead, skip the more detailed type checking logic iff the register is 0
    and the function expects a nullable type. This allows simplifying the type
    checking itself.

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200921121227.255763-11-lmb@cloudflare.com

    Lorenz Bauer
     
  • Move the check for PTR_TO_MAP_VALUE to check_func_arg, where all other
    checking is done as well. Move the invocation of process_spin_lock away
    from the register type checking, to allow a future refactoring.

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200921121227.255763-10-lmb@cloudflare.com

    Lorenz Bauer