08 Oct, 2020
2 commits
-
When CONFIG_NET is not defined, I hit the following build error:
kernel/trace/bpf_trace.o:(.rodata+0x110): undefined reference to `bpf_prog_test_run_raw_tp'Commit 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
added test_run support for raw_tracepoint in /kernel/trace/bpf_trace.c.
But the test_run function bpf_prog_test_run_raw_tp is defined in
net/bpf/test_run.c, only available with CONFIG_NET=y.Adding a CONFIG_NET guard for
.test_run = bpf_prog_test_run_raw_tp;
fixed the above build issue.Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20201007062933.3425899-1-yhs@fb.com -
Fix build errors in kernel/bpf/verifier.c when CONFIG_NET is
not enabled.../kernel/bpf/verifier.c:3995:13: error: ‘btf_sock_ids’ undeclared here (not in a function); did you mean ‘bpf_sock_ops’?
.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],../kernel/bpf/verifier.c:3995:26: error: ‘BTF_SOCK_TYPE_SOCK_COMMON’ undeclared here (not in a function); did you mean ‘PTR_TO_SOCK_COMMON’?
.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],Fixes: 1df8f55a37bd ("bpf: Enable bpf_skc_to_* sock casting helper to networking prog type")
Signed-off-by: Randy Dunlap
Signed-off-by: Alexei Starovoitov
Acked-by: Yonghong Song
Link: https://lore.kernel.org/bpf/20201007021613.13646-1-rdunlap@infradead.org
06 Oct, 2020
1 commit
-
Recent improvements in LOCKDEP highlighted a potential A-A deadlock with
pcpu_freelist in NMI:./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi
[ 18.984807] ================================
[ 18.984807] WARNING: inconsistent lock state
[ 18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
[ 18.984809] --------------------------------
[ 18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[ 18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
[ 18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180
[ 18.984813] {INITIAL USE} state was registered at:
[ 18.984814] lock_acquire+0x175/0x7c0
[ 18.984814] _raw_spin_lock+0x2c/0x40
[ 18.984815] __pcpu_freelist_pop+0xe3/0x180
[ 18.984815] pcpu_freelist_pop+0x31/0x40
[ 18.984816] htab_map_alloc+0xbbf/0xf40
[ 18.984816] __do_sys_bpf+0x5aa/0x3ed0
[ 18.984817] do_syscall_64+0x2d/0x40
[ 18.984818] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 18.984818] irq event stamp: 12
[...]
[ 18.984822] other info that might help us debug this:
[ 18.984823] Possible unsafe locking scenario:
[ 18.984823]
[ 18.984824] CPU0
[ 18.984824] ----
[ 18.984824] lock(&head->lock);
[ 18.984826]
[ 18.984826] lock(&head->lock);
[ 18.984827]
[ 18.984828] *** DEADLOCK ***
[ 18.984828]
[ 18.984829] 2 locks held by test_progs/1990:
[...]
[ 18.984838]
[ 18.984838] dump_stack+0x9a/0xd0
[ 18.984839] lock_acquire+0x5c9/0x7c0
[ 18.984839] ? lock_release+0x6f0/0x6f0
[ 18.984840] ? __pcpu_freelist_pop+0xe3/0x180
[ 18.984840] _raw_spin_lock+0x2c/0x40
[ 18.984841] ? __pcpu_freelist_pop+0xe3/0x180
[ 18.984841] __pcpu_freelist_pop+0xe3/0x180
[ 18.984842] pcpu_freelist_pop+0x17/0x40
[ 18.984842] ? lock_release+0x6f0/0x6f0
[ 18.984843] __bpf_get_stackid+0x534/0xaf0
[ 18.984843] bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
[ 18.984844] bpf_overflow_handler+0x12f/0x3f0This is because pcpu_freelist_head.lock is accessed in both NMI and
non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.Since NMI interrupts non-NMI context, when NMI context tries to lock the
raw_spinlock, non-NMI context of the same CPU may already have locked a
lock and is blocked from unlocking the lock. For a system with N CPUs,
there could be N NMIs at the same time, and they may block N non-NMI
raw_spinlocks. This is tricky for pcpu_freelist_push(), where unlike
_pop(), failing _push() means leaking memory. This issue is more likely to
trigger in non-SMP system.Fix this issue with an extra list, pcpu_freelist.extralist. The extralist
is primarily used to take _push() when raw_spin_trylock() failed on all
the per CPU lists. It should be empty most of the time. The following
table summarizes the behavior of pcpu_freelist in NMI and non-NMI:non-NMI pop(): use _lock(); check per CPU lists first;
if all per CPU lists are empty, check extralist;
if extralist is empty, return NULL.non-NMI push(): use _lock(); only push to per CPU lists.
NMI pop(): use _trylock(); check per CPU lists first;
if all per CPU lists are locked or empty, check extralist;
if extralist is locked or empty, return NULL.NMI push(): use _trylock(); check per CPU lists first;
if all per CPU lists are locked; try push to extralist;
if extralist is also locked, keep trying on per CPU lists.Reported-by: Alexei Starovoitov
Signed-off-by: Song Liu
Signed-off-by: Daniel Borkmann
Acked-by: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com
05 Oct, 2020
1 commit
-
Replace /* fallthrough */ comments with the new pseudo-keyword
macro fallthrough [1].[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva
Signed-off-by: Daniel Borkmann
Acked-by: Yonghong Song
Link: https://lore.kernel.org/bpf/20201002234217.GA12280@embeddedor
03 Oct, 2020
4 commits
-
We are missing a deref for the case when we are doing BPF_PROG_BIND_MAP
on a map that's being already held by the program.
There is 'if (ret) bpf_map_put(map)' below which doesn't trigger
because we don't consider this an error.
Let's add missing bpf_map_put() for this specific condition.Fixes: ef15314aa5de ("bpf: Add BPF_PROG_BIND_MAP syscall")
Reported-by: Alexei Starovoitov
Signed-off-by: Stanislav Fomichev
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20201003002544.3601440-1-sdf@google.com -
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
helper always returns a valid pointer, therefore no need to check
returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during all the execution of the program.Signed-off-by: Hao Luo
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com -
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range. So the caller must check the returned value.Signed-off-by: Hao Luo
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com -
Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
ksym so that further dereferences on the ksym can use the BTF info
to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
the verifier reads the btf_id stored in the insn[0]'s imm field and
marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
which is encoded in btf_vminux by pahole. If the VAR is not of a struct
type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
and the mem_size is resolved to the size of the VAR's type.>From the VAR btf_id, the verifier can also read the address of the
ksym's corresponding kernel var from kallsyms and use that to fill
dst_reg.Therefore, the proper functionality of pseudo_btf_id depends on (1)
kallsyms and (2) the encoding of kernel global VARs in pahole, which
should be available since pahole v1.18.Signed-off-by: Hao Luo
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
01 Oct, 2020
2 commits
-
Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.Signed-off-by: Song Liu
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com -
With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Reviewed-by: Eric Dumazet
Acked-by: Martin KaFai Lau
Cc: Eric Dumazet
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
30 Sep, 2020
4 commits
-
Eelco reported we can't properly access arguments if the tracing
program is attached to extension program.Having following program:
SEC("classifier/test_pkt_md_access")
int test_pkt_md_access(struct __sk_buff *skb)with its extension:
SEC("freplace/test_pkt_md_access")
int test_pkt_md_access_new(struct __sk_buff *skb)and tracing that extension with:
SEC("fentry/test_pkt_md_access_new")
int BPF_PROG(fentry, struct sk_buff *skb)It's not possible to access skb argument in the fentry program,
with following error from verifier:; int BPF_PROG(fentry, struct sk_buff *skb)
0: (79) r1 = *(u64 *)(r1 +0)
invalid bpf_context access off=0 size=8The problem is that btf_ctx_access gets the context type for the
traced program, which is in this case the extension.But when we trace extension program, we want to get the context
type of the program that the extension is attached to, so we can
access the argument properly in the trace program.This version of the patch is tweaked slightly from Jiri's original one,
since the refactoring in the previous patches means we have to get the
target prog type from the new variable in prog->aux instead of directly
from the target prog.Reported-by: Eelco Chaudron
Suggested-by: Jiri Olsa
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/160138355278.48470.17057040257274725638.stgit@toke.dk -
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_Create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk -
In preparation for allowing multiple attachments of freplace programs, move
the references to the target program and trampoline into the
bpf_tracing_link structure when that is created. To do this atomically,
introduce a new mutex in prog->aux to protect writing to the two pointers
to target prog and trampoline, and rename the members to make it clear that
they are related.With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk -
The Makefile in bpf/preload builds a local copy of libbpf, but does not
properly clean up after itself. This can lead to subsequent compilation
failures, since the feature detection cache is kept around which can lead
subsequent detection to fail.Fix this by properly setting clean-files, and while we're at it, also add a
.gitignore for the directory to ignore the build artifacts.Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.")
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200927193005.8459-1-toke@redhat.com
29 Sep, 2020
11 commits
-
A helper is added to allow seq file writing of kernel data
structures using vmlinux BTF. Its signature islong bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
u32 btf_ptr_size, u64 flags);Flags and struct btf_ptr definitions/use are identical to the
bpf_snprintf_btf helper, and the helper returns 0 on success
or a negative error value.Suggested-by: Alexei Starovoitov
Signed-off-by: Alan Maguire
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com -
BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based
representations of larger kernel data structures such as task_struct,
this will be insufficient.Suggested-by: Alexei Starovoitov
Signed-off-by: Alan Maguire
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1601292670-1616-6-git-send-email-alan.maguire@oracle.com -
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF). Its signature islong bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
u32 btf_ptr_size, u64 flags);struct btf_ptr * specifies
- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
are not to be confused with the BTF_F_* flags
below that control how the btf_ptr is displayed; the
flags member of the struct btf_ptr may be used to
disambiguate types in kernel versus module BTF, etc;
the main distinction is the flags relate to the type
and information needed in identifying it; not how it
is displayed.For example a BPF program with a struct sk_buff *skb
could do the following:static struct btf_ptr b = { };
b.ptr = skb;
b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0);Default output looks like this:
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}Flags modifying display are as follows:
- BTF_F_COMPACT: no formatting around type information
- BTF_F_NONAME: no struct/union member names/types
- BTF_F_PTR_RAW: show raw (unobfuscated) pointer values;
equivalent to %px.
- BTF_F_ZERO: show zero-valued struct/union members;
they are not displayed by defaultSigned-off-by: Alan Maguire
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com -
generalize the "seq_show" seq file support in btf.c to support
a generic show callback of which we support two instances; the
current seq file show, and a show with snprintf() behaviour which
instead writes the type data to a supplied string.Both classes of show function call btf_type_show() with different
targets; the seq file or the string to be written. In the string
case we need to track additional data - length left in string to write
and length to return that we would have written (a la snprintf).By default show will display type information, field members and
their types and values etc, and the information is indented
based upon structure depth. Zeroed fields are omitted.Show however supports flags which modify its behaviour:
BTF_SHOW_COMPACT - suppress newline/indent.
BTF_SHOW_NONAME - suppress show of type and member names.
BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
BTF_SHOW_ZERO - show zeroed values (by default they are not shown).Signed-off-by: Alan Maguire
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com -
It will be used later for BPF structure display support
Signed-off-by: Alan Maguire
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com -
The check_attach_btf_id() function really does three things:
1. It performs a bunch of checks on the program to ensure that the
attachment is valid.2. It stores a bunch of state about the attachment being requested in
the verifier environment and struct bpf_prog objects.3. It allocates a trampoline for the attachment.
This patch splits out (1.) and (3.) into separate functions which will
perform the checks, but return the computed values instead of directly
modifying the environment. This is done in preparation for reusing the
checks when the actual attachment is happening, which will allow tracing
programs to have multiple (compatible) attachments.This also fixes a bug where a bunch of checks were skipped if a trampoline
already existed for the tracing target.Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs")
Acked-by: Andrii Nakryiko
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov -
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.Acked-by: Andrii Nakryiko
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov -
From the checks and commit messages for modify_return, it seems it was
never the intention that it should be possible to attach a tracing program
with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
However, check_attach_modify_return() will only look at the function name,
so if the target function starts with "security_", the attach will be
allowed even for bpf2bpf attachment.Fix this oversight by also blocking the modification if a target program is
supplied.Fixes: 18644cec714a ("bpf: Fix use-after-free in fmod_ret check")
Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Acked-by: Andrii Nakryiko
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: Alexei Starovoitov -
Allow passing a pointer to a BTF struct sock_common* when updating
a sockmap or sockhash. Since BTF pointers can fault and therefore be
NULL at runtime we need to add an additional !sk check to
sock_map_update_elem. Since we may be passed a request or timewait
socket we also need to check sk_fullsock. Doing this allows calling
map_update_elem on sockmap from bpf_iter context, which uses
BTF pointers.Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com -
Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine
signature since it is no longer needed.Signed-off-by: Lorenzo Bianconi
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org -
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.Signed-off-by: Song Liu
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
26 Sep, 2020
4 commits
-
In BPF_AND and BPF_OR alu cases we have this pattern when the src and dst
tnum is a constant.1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
2 scalar32_min_max_[op]
3 if (known) return
4 scalar_min_max_[op]
5 if (known)
6 __mark_reg_known(dst_reg,
dst_reg->var_off.value [op] src_reg.var_off.value)The result is in 1 we calculate the var_off value and store it in the
dst_reg. Then in 6 we duplicate this logic doing the op again on the
value.The duplication comes from the the tnum_[op] handlers because they have
already done the value calcuation. For example this is tnum_and().struct tnum tnum_and(struct tnum a, struct tnum b)
{
u64 alpha, beta, v;alpha = a.value | a.mask;
beta = b.value | b.mask;
v = a.value & b.value;
return TNUM(v, alpha & beta & ~v);
}So lets remove the redundant op calculation. Its confusing for readers
and unnecessary. Its also not harmful because those ops have the
property, r1 & r1 = r1 and r1 | r1 = r1.Signed-off-by: John Fastabend
Signed-off-by: Alexei Starovoitov -
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get(). It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.Signed-off-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Acked-by: Lorenz Bauer
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com -
There is a constant need to add more fields into the bpf_tcp_sock
for the bpf programs running at tc, sock_ops...etc.A current workaround could be to use bpf_probe_read_kernel(). However,
other than making another helper call for reading each field and missing
CO-RE, it is also not as intuitive to use as directly reading
"tp->lsndtime" for example. While already having perfmon cap to do
bpf_probe_read_kernel(), it will be much easier if the bpf prog can
directly read from the tcp_sock.This patch tries to do that by using the existing casting-helpers
bpf_skc_to_*() whose func_proto returns a btf_id. For example, the
func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
kernel "struct tcp_sock".These helpers are also added to is_ptr_cast_function().
It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id.
That will keep the ref-tracking works properly.The bpf_skc_to_* helpers are made available to most of the bpf prog
types in filter.c. The bpf_skc_to_* helpers will be limited by
perfmon cap.This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting
this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*()
helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
they will accept pointer obtained from skb->sk.Instead of specifying both arg_type and arg_btf_id in the same func_proto
which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is
always the same. Discussion in this thread:
https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
expecting a PTR_TO_BTF_ID which could be NULL. This is the same
behavior as the existing helper taking ARG_PTR_TO_BTF_ID.The _SOCK_COMMON part means the helper is also expecting the legacy
SOCK_COMMON pointer.By excluding the _OR_NULL part, the bpf prog cannot call helper
with a literal NULL which doesn't make sense in most cases.
e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO_*_OR_NULL
reg has to do a NULL check first before passing into the helper or else
the bpf prog will be rejected. This behavior is nothing new and
consistent with the current expectation during bpf-prog-load.[ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
ARG_PTR_TO_SOCK* of other existing helpers later such that
those existing helpers can take the PTR_TO_BTF_ID returned by
the bpf_skc_to_*() helpers.The only special case is bpf_sk_lookup_assign() which can accept a
literal NULL ptr. It has to be handled specially in another follow
up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ][ When converting the older helpers that take ARG_PTR_TO_SOCK* in
the later patch, if the kernel does not support BTF,
ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
because no reg->type could have PTR_TO_BTF_ID in this case.It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
here though because these helpers must require BTF vmlinux to begin
with. ]Signed-off-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Acked-by: John Fastabend
Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com -
check_reg_type() checks whether a reg can be used as an arg of a
func_proto. For PTR_TO_BTF_ID, the check is actually not
completely done until the reg->btf_id is pointing to a
kernel struct that is acceptable by the func_proto.Thus, this patch moves the btf_id check into check_reg_type().
"arg_type" and "arg_btf_id" are passed to check_reg_type() instead of
"compatible". The compatible_reg_types[] usage is localized in
check_reg_type() now.The "if (!btf_id) verbose(...); " is also removed since it won't happen.
Signed-off-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Acked-by: Lorenz Bauer
Acked-by: John Fastabend
Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com
24 Sep, 2020
4 commits
-
…nel/git/paulmck/linux-rcu into bpf-next
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
This reverts commit 31f23a6a181c81543b10a1a9056b0e6c7ef1c747.
This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others.
There was no issue in the code.Signed-off-by: Alexei Starovoitov
-
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-09-23The following pull-request contains BPF updates for your *net-next* tree.
We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).The main changes are:
1) Full multi function support in libbpf, from Andrii.
2) Refactoring of function argument checks, from Lorenz.
3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.
4) Program metadata support, from YiFei.
5) bpf iterator optimizations, from Yonghong.
====================Signed-off-by: David S. Miller
-
Arrays with designated initializers have an implicit length of the highest
initialized value plus one. I used this to ensure that newly added entries
in enum bpf_reg_type get a NULL entry in compatible_reg_types.This is difficult to understand since it requires knowledge of the
peculiarities of designated initializers. Use __BPF_ARG_TYPE_MAX to size
the array instead.Suggested-by: Alexei Starovoitov
Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200923160156.80814-1-lmb@cloudflare.com
23 Sep, 2020
3 commits
-
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.Signed-off-by: David S. Miller
-
Pull networking fixes from Jakub Kicinski:
- fix failure to add bond interfaces to a bridge, the offload-handling
code was too defensive there and recent refactoring unearthed that.
Users complained (Ido)- fix unnecessarily reflecting ECN bits within TOS values / QoS marking
in TCP ACK and reset packets (Wei)- fix a deadlock with bpf iterator. Hopefully we're in the clear on
this front now... (Yonghong)- BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)
- fix AQL on mt76 devices with FW rate control and add a couple of AQL
issues in mac80211 code (Felix)- fix authentication issue with mwifiex (Maximilian)
- WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)
- fix exception handling for multipath routes via same device (David
Ahern)- revert back to a BH spin lock flavor for nsid_lock: there are paths
which do require the BH context protection (Taehee)- fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)
- fix ife module load deadlock (Cong)
- make an adjustment to netlink reply message type for code added in
this release (the sole change touching uAPI here) (Michal)- a number of fixes for small NXP and Microchip switches (Vladimir)
[ Pull request acked by David: "you can expect more of this in the
future as I try to delegate more things to Jakub" ]* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
net: Update MAINTAINERS for MediaTek switch driver
net/mlx5e: mlx5e_fec_in_caps() returns a boolean
net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
net/mlx5e: kTLS, Fix leak on resync error flow
net/mlx5e: kTLS, Add missing dma_unmap in RX resync
net/mlx5e: kTLS, Fix napi sync and possible use-after-free
net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
net/mlx5e: Fix endianness when calculating pedit mask first bit
net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
net/mlx5e: CT: Fix freeing ct_label mapping
net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
net/mlx5e: Use synchronize_rcu to sync with NAPI
net/mlx5e: Use RCU to protect rq->xdp_prog
... -
Pull tracing fixes from Steven Rostedt:
- Check kprobe is enabled before unregistering from ftrace as it isn't
registered when disabled.- Remove kprobes enabled via command-line that is on init text when
freed.- Add missing RCU synchronization for ftrace trampoline symbols removed
from kallsyms.- Free trampoline on error path if ftrace_startup() fails.
- Give more space for the longer PID numbers in trace output.
- Fix a possible double free in the histogram code.
- A couple of fixes that were discovered by sparse.
* tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: init: make xbc_namebuf static
kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
tracing: fix double free
ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
tracing: Make the space reserved for the pid wider
ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
ftrace: Free the trampoline when ftrace_startup() fails
kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
22 Sep, 2020
4 commits
-
Some kernels builds might inline vfs_getattr call within fstat
syscall code path, so fentry/vfs_getattr trampoline is not called.Add security_inode_getattr to allowlist and switch the d_path test stat
trampoline to security_inode_getattr.Keeping dentry_open and filp_close, because they are in their own
files, so unlikely to be inlined, but in case they are, adding
security_file_open.Adding flags that indicate trampolines were called and failing
the test if any of them got missed, so it's easier to identify
the issue next time.Fixes: e4d1af4b16f8 ("selftests/bpf: Add test for d_path helper")
Suggested-by: Alexei Starovoitov
Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200918112338.2618444-1-jolsa@kernel.org -
The mapping between bpf_arg_type and bpf_reg_type is encoded in a big
hairy if statement that is hard to follow. The debug output also leaves
to be desired: if a reg_type doesn't match we only print one of the
options, instead printing all the valid ones.Convert the if statement into a table which is then used to drive type
checking. If none of the reg_types match we print all options, e.g.:R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value
Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com -
check_func_arg has a plethora of weird if statements with empty branches.
They work around the fact that *_OR_NULL argument types should accept a
SCALAR_VALUE register, as long as it's value is 0. These statements make
it difficult to reason about the type checking logic.Instead, skip more detailed type checking logic iff the register is 0,
and the function expects a nullable type. This allows simplifying the type
checking itself.Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Acked-by: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20200921121227.255763-11-lmb@cloudflare.com -
Move the check for PTR_TO_MAP_VALUE to check_func_arg, where all other
checking is done as well. Move the invocation of process_spin_lock away
from the register type checking, to allow a future refactoring.Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20200921121227.255763-10-lmb@cloudflare.com