Eric Lee / smarc-fsl-linux-kernel

15 Sep, 2020

4 commits

74e00676d samples/bpf: Add quiet option to xdpsock ... Browse Code »

Add a quiet option (-Q) that disables the statistics print outs of
xdpsock. This is good to have when measuring 0% loss rate performance
as it will be quite terrible if the application uses printfs.

Signed-off-by: Magnus Karlsson
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1599726666-8431-4-git-send-email-magnus.karlsson@gmail.com

Magnus Karlsson
2020-09-15 09:38:11 +0800
5a2a0dd88 samples/bpf: Fix possible deadlock in xdpsock ... Browse Code »

Fix a possible deadlock in the l2fwd application in xdpsock that can
occur when there is no space in the Tx ring. There are two ways to get
the kernel to consume entries in the Tx ring: calling sendto() to make
it send packets and freeing entries from the completion ring, as the
kernel will not send a packet if there is no space for it to add a
completion entry in the completion ring. The Tx loop in l2fwd only
used to call sendto(). This patches adds cleaning the completion ring
in that loop.

Signed-off-by: Magnus Karlsson
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1599726666-8431-3-git-send-email-magnus.karlsson@gmail.com

Magnus Karlsson
2020-09-15 09:38:11 +0800
3131cf66d samples/bpf: Fix one packet sending in xdpsock ... Browse Code »

Fix the sending of a single packet (or small burst) in xdpsock when
executing in copy mode. Currently, the l2fwd application in xdpsock
only transmits the packets after a batch of them has been received,
which might be confusing if you only send one packet and expect that
it is returned pronto. Fix this by calling sendto() more often and add
a comment in the code that states that this can be optimized if
needed.

Reported-by: Tirthendu Sarkar
Signed-off-by: Magnus Karlsson
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/1599726666-8431-2-git-send-email-magnus.karlsson@gmail.com

Magnus Karlsson
2020-09-15 09:38:11 +0800
d72714c1d s390/bpf: Fix multiple tail calls ... Browse Code »

In order to branch around tail calls (due to out-of-bounds index,
exceeding tail call count or missing tail call target), JIT uses
label[0] field, which contains the address of the instruction following
the tail call. When there are multiple tail calls, label[0] value comes
from handling of a previous tail call, which is incorrect.

Fix by getting rid of label array and resolving the label address
locally: for all 3 branches that jump to it, emit 0 offsets at the
beginning, and then backpatch them with the correct value.

Also, do not use the long jump infrastructure: the tail call sequence
is known to be short, so make all 3 jumps short.

Fixes: 6651ee070b31 ("s390/bpf: implement bpf_tail_call() helper")
Signed-off-by: Ilya Leoshkevich
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200909232141.3099367-1-iii@linux.ibm.com

Ilya Leoshkevich
2020-09-15 09:21:31 +0800

11 Sep, 2020

22 commits

2bab48c5b Merge branch 'improve-bpf-tcp-cc-init' ... Browse Code »

Neal Cardwell says:

====================
This patch series reorganizes TCP congestion control initialization so that if
EBPF code called by tcp_init_transfer() sets the congestion control algorithm
by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
congestion control module immediately, instead of having tcp_init_transfer()
later initialize the congestion control module.

This increases flexibility for the EBPF code that runs at connection
establishment time, and simplifies the code.

This has the following benefits:

(1) This allows CC module customizations made by the EBPF called in
tcp_init_transfer() to persist, and not be wiped out by a later
call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
EBPF called in tcp_init_transfer() wants to set the CC to a new CC
algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
and net/ipv4/tcp_cong.c, which currently both have some complexity
to special-case CC initialization to avoid double CC
initialization if EBPF sets the CC.

changes in v2:

o rebase onto bpf-next

o add another follow-on simplification suggested by Martin KaFai Lau:
"tcp: simplify tcp_set_congestion_control() load=false case"

changes in v3:

o no change in commits

o resent patch series from @gmail.com, since mail from ncardwell@google.com
stopped being accepted at netdev@vger.kernel.org mid-way through processing
the v2 patch series (between patches 2 and 3), confusing patchwork about
which patches belonged to the v2 patch series
====================

Acked-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov

Alexei Starovoitov
2020-09-11 11:53:15 +0800
5050bef87 tcp: Simplify tcp_set_congestion_control() load=false case ... Browse Code »

Simplify tcp_set_congestion_control() by removing the initialization
code path for the !load case.

There are only two call sites for tcp_set_congestion_control(). The
EBPF call site is the only one that passes load=false; it also passes
cap_net_admin=true. Because of that, the exact same behavior can be
achieved by removing the special if (!load) branch of the logic. Both
before and after this commit, the EBPF case will call
bpf_try_module_get(), and if that succeeds then call
tcp_reinit_congestion_control() or if that fails then return EBUSY.

Note that this returns the logic to a structure very similar to the
structure before:
commit 91b5b21c7c16 ("bpf: Add support for changing congestion control")
except that the CAP_NET_ADMIN status is passed in as a function
argument.

This clean-up was suggested by Martin KaFai Lau.

Suggested-by: Martin KaFai Lau
Signed-off-by: Neal Cardwell
Signed-off-by: Alexei Starovoitov
Cc: Lawrence Brakmo
Cc: Eric Dumazet
Cc: Yuchung Cheng
Cc: Kevin Yang

Neal Cardwell
2020-09-11 11:53:01 +0800
5cdc744ca tcp: simplify _bpf_setsockopt(): Remove flags argument ... Browse Code »

Now that the previous patches have removed the code that uses the
flags argument to _bpf_setsockopt(), we can remove that argument.

Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Cc: Lawrence Brakmo

Neal Cardwell
2020-09-11 11:53:01 +0800
29a949325 tcp: simplify tcp_set_congestion_control(): Always reinitialize ... Browse Code »

Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.

Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Cc: Lawrence Brakmo

Neal Cardwell
2020-09-11 11:53:01 +0800
e7b10a4dd tcp: Simplify EBPF TCP_CONGESTION to always init CC ... Browse Code »

Now that the previous patch ensures we don't initialize the congestion
control twice, when EBPF sets the congestion control algorithm at
connection establishment we can simplify the code by simply
initializing the congestion control module at that time.

Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Cc: Lawrence Brakmo

Neal Cardwell
2020-09-11 11:53:01 +0800
8919a9b31 tcp: Only init congestion control if not initialized already ... Browse Code »

Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This is an approach that has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
tcp_init_transfer() to persist, and not be wiped out by a later
call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
EBPF called in tcp_init_transfer() wants to set the CC to a new CC
algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
and net/ipv4/tcp_cong.c, which currently both have some complexity
to special-case CC initialization to avoid double CC
initialization if EBPF sets the CC.

Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: Alexei Starovoitov
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Cc: Lawrence Brakmo

Neal Cardwell
2020-09-11 11:53:01 +0800
18841da98 tools: bpftool: Automate generation for "SEE ALSO" sections in man pages ... Browse Code »

The "SEE ALSO" sections of bpftool's manual pages refer to bpf(2),
bpf-helpers(7), then all existing bpftool man pages (save the current
one).

This leads to nearly-identical lists being duplicated in all manual
pages. Ideally, when a new page is created, all lists should be updated
accordingly, but this has led to omissions and inconsistencies multiple
times in the past.

Let's take it out of the RST files and generate the "SEE ALSO" sections
automatically in the Makefile when generating the man pages. The lists
are not really useful in the RST anyway because all other pages are
available in the same directory.

v3:
- Fix conflict with a previous patchset that introduced RST2MAN_OPTS
variable passed to rst2man.

v2:
- Use "echo -n" instead of "printf" in Makefile, to avoid any risk of
passing a format string directly to the command.

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910203935.25304-1-quentin@isovalent.com

Quentin Monnet
2020-09-11 11:10:45 +0800
1aef5b439 bpf: Fix comment for helper bpf_current_task_under_cgroup() ... Browse Code »

This should be "current" not "skb".

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Song Liu
Signed-off-by: Alexei Starovoitov
Cc:
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com

Song Liu
2020-09-11 11:04:51 +0800
6e057fc15 selftests/bpf: Define string const as global for test_sysctl_prog.c ... Browse Code »

When tweaking llvm optimizations, I found that selftest build failed
with the following error:
libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name'
in section '.rodata.str1.1'
Error: failed to open BPF object file: Relocation failed
make: *** [/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h] Error 255
make: *** Deleting file `/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h'

The local string constant "tcp_mem_name" is put into '.rodata.str1.1' section
which libbpf cannot handle. Using untweaked upstream llvm, "tcp_mem_name"
is completely inlined after loop unrolling.

Commit 7fb5eefd7639 ("selftests/bpf: Fix test_sysctl_loop{1, 2}
failure due to clang change") solved a similar problem by defining
the string const as a global. Let us do the same here
for test_sysctl_prog.c so it can weather future potential llvm changes.

Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910202718.956042-1-yhs@fb.com

Yonghong Song
2020-09-11 11:01:53 +0800
90a1deda7 selftests/bpf: Fix test_ksyms on non-SMP kernels ... Browse Code »

On non-SMP kernels __per_cpu_start is not 0, so look it up in kallsyms.

Signed-off-by: Ilya Leoshkevich
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910171336.3161995-1-iii@linux.ibm.com

Ilya Leoshkevich
2020-09-11 10:53:58 +0800
d66423fbe bpf: Plug hole in struct bpf_sk_lookup_kern ... Browse Code »

As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes.
This leads to suboptimal instructions being generated (IPv4, x86):

1372 struct bpf_sk_lookup_kern ctx = {
0xffffffff81b87f30 : xor %eax,%eax
0xffffffff81b87f32 : mov $0x6,%ecx
0xffffffff81b87f37 : lea 0x90(%rsp),%rdi
0xffffffff81b87f3f : movl $0x110002,0x88(%rsp)
0xffffffff81b87f4a : rep stos %rax,%es:(%rdi)
0xffffffff81b87f4d : mov 0x8(%rsp),%eax
0xffffffff81b87f51 : mov %r13d,0x90(%rsp)
0xffffffff81b87f59 : incl %gs:0x7e4970a0(%rip)
0xffffffff81b87f60 : mov %eax,0x8c(%rsp)
0xffffffff81b87f67 : movzwl 0x10(%rsp),%eax
0xffffffff81b87f6c : mov %ax,0xa8(%rsp)
0xffffffff81b87f74 : movzwl 0x38(%rsp),%eax
0xffffffff81b87f79 : mov %ax,0xaa(%rsp)

Fix this by moving around sport and dport. pahole confirms there
are no more holes:

struct bpf_sk_lookup_kern {
u16 family; /* 0 2 */
u16 protocol; /* 2 2 */
__be16 sport; /* 4 2 */
u16 dport; /* 6 2 */
struct {
__be32 saddr; /* 8 4 */
__be32 daddr; /* 12 4 */
} v4; /* 8 8 */
struct {
const struct in6_addr * saddr; /* 16 8 */
const struct in6_addr * daddr; /* 24 8 */
} v6; /* 16 16 */
struct sock * selected_sk; /* 32 8 */
bool no_reuseport; /* 40 1 */

/* size: 48, cachelines: 1, members: 8 */
/* padding: 7 */
/* last cacheline: 48 bytes */
};

The assembly also doesn't contain the pesky rep stos anymore:

1372 struct bpf_sk_lookup_kern ctx = {
0xffffffff81b87f60 : movzwl 0x10(%rsp),%eax
0xffffffff81b87f65 : movq $0x0,0xa8(%rsp)
0xffffffff81b87f71 : movq $0x0,0xb0(%rsp)
0xffffffff81b87f7d : mov %ax,0x9c(%rsp)
0xffffffff81b87f85 : movzwl 0x38(%rsp),%eax
0xffffffff81b87f8a : movq $0x0,0xb8(%rsp)
0xffffffff81b87f96 : mov %ax,0x9e(%rsp)
0xffffffff81b87f9e : mov 0x8(%rsp),%eax
0xffffffff81b87fa2 : movq $0x0,0xc0(%rsp)
0xffffffff81b87fae : movl $0x110002,0x98(%rsp)
0xffffffff81b87fb9 : mov %eax,0xa0(%rsp)
0xffffffff81b87fc0 : mov %r13d,0xa4(%rsp)

1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/

Fixes: e9ddbb7707ff ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point")
Suggested-by: Alexei Starovoitov
Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Jesper Dangaard Brouer
Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com

Lorenz Bauer
2020-09-11 08:47:50 +0800
e3b9626f0 tools: bpftool: Add "inner_map" to "bpftool map create" outer maps ... Browse Code »

There is no support for creating maps of types array-of-map or
hash-of-map in bpftool. This is because the kernel needs an inner_map_fd
to collect metadata on the inner maps to be supported by the new map,
but bpftool does not provide a way to pass this file descriptor.

Add a new optional "inner_map" keyword that can be used to pass a
reference to a map, retrieve a fd to that map, and pass it as the
inner_map_fd.

Add related documentation and bash completion. Note that we can
reference the inner map by its name, meaning we can have several times
the keyword "name" with different meanings (mandatory outer map name,
and possibly a name to use to find the inner_map_fd). The bash
completion will offer it just once, and will not suggest "name" on the
following command:

# bpftool map create /sys/fs/bpf/my_outer_map type hash_of_maps \
inner_map name my_inner_map [TAB]

Fixing that specific case seems too convoluted. Completion will work as
expected, however, if the outer map name comes first and the "inner_map
name ..." is passed second.

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910102652.10509-4-quentin@isovalent.com

Quentin Monnet
2020-09-11 08:29:21 +0800
86233ce35 tools: bpftool: Keep errors for map-of-map dumps if distinct from ENOENT ... Browse Code »

When dumping outer maps or prog_array maps, and on lookup failure,
bpftool simply skips the entry with no error message. This is because
the kernel returns non-zero when no value is found for the provided key,
which frequently happen for those maps if they have not been filled.

When such a case occurs, errno is set to ENOENT. It seems unlikely we
could receive other error codes at this stage (we successfully retrieved
map info just before), but to be on the safe side, let's skip the entry
only if errno was ENOENT, and not for the other errors.

v3: New patch

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910102652.10509-3-quentin@isovalent.com

Quentin Monnet
2020-09-11 08:29:20 +0800
a20693b6e tools: bpftool: Clean up function to dump map entry ... Browse Code »

The function used to dump a map entry in bpftool is a bit difficult to
follow, as a consequence to earlier refactorings. There is a variable
("num_elems") which does not appear to be necessary, and the error
handling would look cleaner if moved to its own function. Let's clean it
up. No functional change.

v2:
- v1 was erroneously removing the check on fd maps in an attempt to get
support for outer map dumps. This is already working. Instead, v2
focuses on cleaning up the dump_map_elem() function, to avoid
similar confusion in the future.

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200910102652.10509-2-quentin@isovalent.com

Quentin Monnet
2020-09-11 08:29:20 +0800
2f7de9865 selftests: bpf: Test iterating a sockmap ... Browse Code »

Add a test that exercises a basic sockmap / sockhash iteration. For
now we simply count the number of elements seen. Once sockmap update
from iterators works we can extend this to perform a full copy.

Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200909162712.221874-4-lmb@cloudflare.com

Lorenz Bauer
2020-09-11 03:35:26 +0800
036535152 net: Allow iterating sockmap and sockhash ... Browse Code »

Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
hashtable implementation. sockmap and sockhash share the same iteration
context: a pointer to an arbitrary key and a pointer to a socket. Both
pointers may be NULL, and so BPF has to perform a NULL check before accessing
them. Technically it's not possible for sockhash iteration to yield a NULL
socket, but we ignore this to be able to use a single iteration point.

Iteration will visit all keys that remain unmodified during the lifetime of
the iterator. It may or may not visit newly added ones.

Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain
another guard rail if CONFIG_PROVE_RCU is enabled.

Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Acked-by: Yonghong Song
Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com

Lorenz Bauer
2020-09-11 03:31:55 +0800
654785a1a net: sockmap: Remove unnecessary sk_fullsock checks ... Browse Code »

The lookup paths for sockmap and sockhash currently include a check
that returns NULL if the socket we just found is not a full socket.
However, this check is not necessary. On insertion we ensure that
we have a full socket (caveat around sock_ops), so request sockets
are not a problem. Time-wait sockets are allocated separate from
the original socket and then fed into the hashdance. They don't
affect the sockets already stored in the sockmap.

Suggested-by: Jakub Sitnicki
Signed-off-by: Lorenz Bauer
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com

Lorenz Bauer
2020-09-11 03:31:55 +0800
f28ef96d7 tools: bpftool: Include common options from separate file ... Browse Code »

Nearly all man pages for bpftool have the same common set of option
flags (--help, --version, --json, --pretty, --debug). The description is
duplicated across all the pages, which is more difficult to maintain if
the description of an option changes. It may also be confusing to sort
out what options are not "common" and should not be copied when creating
new manual pages.

Let's move the description for those common options to a separate file,
which is included with a RST directive when generating the man pages.

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200909162500.17010-3-quentin@isovalent.com

Quentin Monnet
2020-09-11 02:16:46 +0800
82b8cf0ac tools: bpftool: Print optional built-in features along with version ... Browse Code »

Bpftool has a number of features that can be included or left aside
during compilation. This includes:

- Support for libbfd, providing the disassembler for JIT-compiled
programs.
- Support for BPF skeletons, used for profiling programs or iterating on
the PIDs of processes associated with BPF objects.

In order to make it easy for users to understand what features were
compiled for a given bpftool binary, print the status of the two
features above when showing the version number for bpftool ("bpftool -V"
or "bpftool version"). Document this in the main manual page. Example
invocations:

$ bpftool version
./bpftool v5.9.0-rc1
features: libbfd, skeletons

$ bpftool -p version
{
"version": "5.9.0-rc1",
"features": {
"libbfd": true,
"skeletons": true
}
}

Some other parameters are optional at compilation
("DISASM_FOUR_ARGS_SIGNATURE", LIBCAP support) but they do not impact
significantly bpftool's behaviour from a user's point of view, so their
status is not reported.

Available commands and supported program types depend on the version
number, and are therefore not reported either. Note that they are
already available, albeit without JSON, via bpftool's help messages.

v3:
- Use a simple list instead of boolean values for plain output.

v2:
- Fix JSON (object instead or array for the features).

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200909162500.17010-2-quentin@isovalent.com

Quentin Monnet
2020-09-11 02:16:46 +0800
41d5c37b7 selftests, bpftool: Add bpftool (and eBPF helpers) documentation build ... Browse Code »

eBPF selftests include a script to check that bpftool builds correctly
with different command lines. Let's add one build for bpftool's
documentation so as to detect errors or warning reported by rst2man when
compiling the man pages. Also add a build to the selftests Makefile to
make sure we build bpftool documentation along with bpftool when
building the selftests.

This also builds and checks warnings for the man page for eBPF helpers,
which is built along bpftool's documentation.

This change adds rst2man as a dependency for selftests (it comes with
Python's "docutils").

v2:
- Use "--exit-status=1" option for rst2man instead of counting lines
from stderr.
- Also build bpftool as part as the selftests build (and not only when
the tests are actually run).

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200909162251.15498-3-quentin@isovalent.com

Quentin Monnet
2020-09-11 02:02:45 +0800
16f3ddfba tools: bpftool: Log info-level messages when building bpftool man pages ... Browse Code »

To build man pages for bpftool (and for eBPF helper functions), rst2man
can log different levels of information. Let's make it log all levels
to keep the RST files clean.

Doing so, rst2man complains about double colons, used for literal
blocks, that look like underlines for section titles. Let's add the
necessary blank lines.

v2:
- Use "--verbose" instead of "-r 1" (same behaviour but more readable).
- Pass it through a RST2MAN_OPTS variable so we can easily pass other
options too.

Signed-off-by: Quentin Monnet
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200909162251.15498-2-quentin@isovalent.com

Quentin Monnet
2020-09-11 02:02:45 +0800
e9091bb77 bpf: Remove duplicate headers ... Browse Code »

Remove duplicate headers which are included twice.

Signed-off-by: Chen Zhou
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com

Chen Zhou
2020-09-11 01:53:14 +0800

10 Sep, 2020

2 commits

8081ede1f perf: Stop using deprecated bpf_program__title() ... Browse Code »

Switch from deprecated bpf_program__title() API to
bpf_program__section_name(). Also drop unnecessary error checks because
neither bpf_program__title() nor bpf_program__section_name() can fail or
return NULL.

Fixes: 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
Reviewed-by: Tobias Klauser
Acked-by: Jiri Olsa
Link: https://lore.kernel.org/bpf/20200908180127.1249-1-andriin@fb.com

Andrii Nakryiko
2020-09-10 02:28:28 +0800
7fb5eefd7 selftests/bpf: Fix test_sysctl_loop{1, 2} failure due to clang change ... Browse Code »

Andrii reported that with latest clang, when building selftests, we have
error likes:
error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*):
Looks like the BPF stack limit of 512 bytes is exceeded.
Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:
static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
{
volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
...
}
Without the above LLVM patch, the compiler did optimization to load the string
(59 bytes long) with 7 64bit loads, 1 8bit load and 1 16bit load,
occupying 64 byte stack size.

With the above LLVM patch, the compiler only uses 8bit loads, but subregister is 32bit.
So stack requirements become 4 * 59 = 236 bytes. Together with other stuff on
the stack, total stack size exceeds 512 bytes, hence compiler complains and quits.

To fix the issue, removing "volatile" key word or changing "volatile" to
"const"/"static const" does not work, the string is put in .rodata.str1.1 section,
which libbpf did not process it and errors out with
libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name'
in section '.rodata.str1.1'

Defining the string const as global variable can fix the issue as it puts the string constant
in '.rodata' section which is recognized by libbpf. In the future, when libbpf can process
'.rodata.str*.*' properly, the global definition can be changed back to local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.
./test_progs -n 7/21
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
invalid stack off=0 size=1
verification time 6975 usec
stack depth 160+64
processed 889 insns (limit 1000000) max_states_per_insn 4 total_states
14 peak_states 14 mark_read 10

libbpf: -- END LOG --
libbpf: failed to load program 'sysctl_tcp_mem'
libbpf: failed to load object 'test_sysctl_loop2.o'
test_bpf_verif_scale:FAIL:114
#7/21 test_sysctl_loop2.o:FAIL
This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code
like
const char tcp_mem_name[] = "";
...
char name[64];
...
for (i = 0; i < sizeof(tcp_mem_name); ++i)
if (name[i] != tcp_mem_name[i])
return 0;
In the above code, if sizeof(tcp_mem_name) > 64, name[i] access may be
out of bound. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and
79 for test_sysctl_loop2.c.

Without promotion-to-global change, old compiler generates code where
the overflowed stack access is actually filled with valid value, so hiding
the bpf program bug. With promotion-to-global change, the code is different,
more specifically, the previous loading constants to stack is gone, and
"name" occupies stack[-64:0] and overflow access triggers a verifier error.
To fix the issue, adjust "name" buffer size properly.

Reported-by: Andrii Nakryiko
Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200909171542.3673449-1-yhs@fb.com

Yonghong Song
2020-09-10 02:21:34 +0800

09 Sep, 2020

2 commits

e6054fc1f selftests/bpf: Add test for map_ptr arithmetic ... Browse Code »

Change selftest map_ptr_kern.c with disabling inlining for
one of subtests, which will fail the test without previous
verifier change. Also added to verifier test for both
"map_ptr += scalar" and "scalar += map_ptr" arithmetic.

Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200908175703.2463721-1-yhs@fb.com

Yonghong Song
2020-09-09 09:04:07 +0800
7c6967326 bpf: Permit map_ptr arithmetic with opcode add and offset 0 ... Browse Code »

Commit 41c48f3a98231 ("bpf: Support access
to bpf map fields") added support to access map fields
with CORE support. For example,

struct bpf_map {
__u32 max_entries;
} __attribute__((preserve_access_index));

struct bpf_array {
struct bpf_map map;
__u32 elem_size;
} __attribute__((preserve_access_index));

struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 4);
__type(key, __u32);
__type(value, __u32);
} m_array SEC(".maps");

SEC("cgroup_skb/egress")
int cg_skb(void *ctx)
{
struct bpf_array *array = (struct bpf_array *)&m_array;

/* .. array->map.max_entries .. */
}

In kernel, bpf_htab has similar structure,

struct bpf_htab {
struct bpf_map map;
...
}

In the above cg_skb(), to access array->map.max_entries, with CORE, the clang will
generate two builtin's.
base = &m_array;
/* access array.map */
map_addr = __builtin_preserve_struct_access_info(base, 0, 0);
/* access array.map.max_entries */
max_entries_addr = __builtin_preserve_struct_access_info(map_addr, 0, 0);
max_entries = *max_entries_addr;

In the current llvm, if two builtin's are in the same function or
in the same function after inlining, the compiler is smart enough to chain
them together and generates like below:
base = &m_array;
max_entries = *(base + reloc_offset); /* reloc_offset = 0 in this case */
and we are fine.

But if we force no inlining for one of functions in test_map_ptr() selftest, e.g.,
check_default(), the above two __builtin_preserve_* will be in two different
functions. In this case, we will have code like:
func check_hash():
reloc_offset_map = 0;
base = &m_array;
map_base = base + reloc_offset_map;
check_default(map_base, ...)
func check_default(map_base, ...):
max_entries = *(map_base + reloc_offset_max_entries);

In kernel, map_ptr (CONST_PTR_TO_MAP) does not allow any arithmetic.
The above "map_base = base + reloc_offset_map" will trigger a verifier failure.
; VERIFY(check_default(&hash->map, map));
0: (18) r7 = 0xffffb4fe8018a004
2: (b4) w1 = 110
3: (63) *(u32 *)(r7 +0) = r1
R1_w=invP110 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
; VERIFY_TYPE(BPF_MAP_TYPE_HASH, check_hash);
4: (18) r1 = 0xffffb4fe8018a000
6: (b4) w2 = 1
7: (63) *(u32 *)(r1 +0) = r2
R1_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R2_w=invP1 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
8: (b7) r2 = 0
9: (18) r8 = 0xffff90bcb500c000
11: (18) r1 = 0xffff90bcb500c000
13: (0f) r1 += r2
R1 pointer arithmetic on map_ptr prohibited

To fix the issue, let us permit map_ptr + 0 arithmetic which will
result in exactly the same map_ptr.

Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200908175702.2463625-1-yhs@fb.com

Yonghong Song
2020-09-09 09:02:18 +0800

07 Sep, 2020

3 commits

bc0b5a030 tools, bpf: Synchronise BPF UAPI header with tools ... Browse Code »

Synchronise the bpf.h header under tools, to report the fixes recently
brought to the documentation for the BPF helpers.

Signed-off-by: Quentin Monnet
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904161454.31135-4-quentin@isovalent.com

Quentin Monnet
2020-09-07 22:31:18 +0800
938c3efd9 bpf: Fix formatting in documentation for BPF helpers ... Browse Code »

Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man
complains about a wrong indentation, but what is missing is actually a
blank line before the bullet list).

Fix and harmonise the formatting for other helpers.

Signed-off-by: Quentin Monnet
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com

Quentin Monnet
2020-09-07 22:31:18 +0800
1a7581b17 tools: bpftool: Fix formatting in bpftool-link documentation ... Browse Code »

Fix a formatting error in the documentation for bpftool-link, so that
the man page can build correctly.

Signed-off-by: Quentin Monnet
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904161454.31135-2-quentin@isovalent.com

Quentin Monnet
2020-09-07 22:31:18 +0800

04 Sep, 2020

7 commits

f9bec5d75 samples, bpf: Add xsk_fwd test file to .gitignore ... Browse Code »

This commit adds xsk_fwd test file to .gitignore which is newly added
to samples/bpf.

Signed-off-by: Daniel T. Lee
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904063434.24963-2-danieltimlee@gmail.com

Daniel T. Lee
2020-09-04 22:37:53 +0800
698584dff samples, bpf: Replace bpf_program__title() with bpf_program__section_name() ... Browse Code »

From commit 521095842027 ("libbpf: Deprecate notion of BPF program
"title" in favor of "section name""), the term title has been replaced
with section name in libbpf.

Since the bpf_program__title() has been deprecated, this commit
switches this function to bpf_program__section_name(). Due to
this commit, the compilation warning issue has also been resolved.

Fixes: 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: Daniel T. Lee
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904063434.24963-1-danieltimlee@gmail.com

Daniel T. Lee
2020-09-04 22:37:48 +0800
8eb629585 libbpf: Fix potential multiplication overflow ... Browse Code »

Detected by LGTM static analyze in Github repo, fix potential multiplication
overflow before result is casted to size_t.

Fixes: 8505e8709b5e ("libbpf: Implement generalized .BTF.ext func/line info adjustment")
Signed-off-by: Andrii Nakryiko
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904041611.1695163-2-andriin@fb.com

Andrii Nakryiko
2020-09-04 20:35:12 +0800
17e54b096 libbpf: Fix another __u64 cast in printf ... Browse Code »

Another issue of __u64 needing either %lu or %llu, depending on the
architecture. Fix with cast to `unsigned long long`.

Fixes: 7e06aad52929 ("libbpf: Add multi-prog section support for struct_ops")
Signed-off-by: Andrii Nakryiko
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200904041611.1695163-1-andriin@fb.com

Andrii Nakryiko
2020-09-04 20:35:12 +0800
95cec14b0 selftests/bpf: Fix check in global_data_init. ... Browse Code »

The returned value of bpf_object__open_file() should be checked with
libbpf_get_error() rather than NULL. This fix prevents test_progs from
crash when test_global_data.o is not present.

Signed-off-by: Hao Luo
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20200903200528.747884-1-haoluo@google.com

Hao Luo
2020-09-04 08:33:31 +0800
b4ff7ad77 Merge branch 'libbpf-support-bpf-to-bpf-calls' ... Browse Code »

Andrii Nakryiko says:

====================
Currently, libbpf supports a limited form of BPF-to-BPF subprogram calls. The
restriction is that entry-point BPF program should use *all* of defined
sub-programs in BPF .o file. If any of the subprograms is not used, such
entry-point BPF program will be rejected by verifier as containing unreachable
dead code. This is not a big limitation for cases with single entry-point BPF
programs, but is quite a heavy restriction for multi-programs that use only
partially overlapping set of subprograms.

This patch set removes all such restrictions and adds complete support for
using BPF sub-program calls on BPF side. This is achieved through libbpf
tracking subprograms individually and detecting which subprograms are used by
any given entry-point BPF program, and subsequently only appending and
relocating code for just those used subprograms.

In addition, libbpf now also supports multiple entry-point BPF programs within
the same ELF section. This allows to structure code so that there are few
variants of BPF programs of the same type and attaching to the same target
(e.g., for tracepoints and kprobes) without the need to worry about ELF
section name clashes.

This patch set opens way for more wider adoption of BPF subprogram calls,
especially for real-world production use-cases with complicated net of
subprograms. This will allow to further scale BPF verification process through
good use of global functions, which can be verified independently. This is
also important prerequisite for static linking which allows static BPF
libraries to not worry about naming clashes for section names, as well as use
static non-inlined functions (subprograms) without worries of verifier
rejecting program due to dead code.

Patch set is structured as follows:
- patched 1-6 contain all the libbpf changes necessary to support multi-prog
sections and bpf2bpf subcalls;
- patch 7 adds dedicated selftests validating all combinations of possible
sub-calls (within and across sections, static vs global functions);
- patch 8 deprecated bpf_program__title() in favor of
bpf_program__section_name(). The intent was to also deprecate
bpf_object__find_program_by_title() as it's now non-sensical with multiple
programs per section. But there were too many selftests uses of this and
I didn't want to delay this patches further and make it even bigger, so left
it for a follow up cleanup;
- patches 9-10 remove uses for title-related APIs from bpftool and
bpf_program__title() use from selftests;
- patch 11 is converting fexit_bpf2bpf to have explicit subtest (it does
contain 4 subtests, which are not handled as sub-tests);
- patches 12-14 convert few complicated BPF selftests to use __noinline
functions to further validate correctness of libbpf's bpf2bpf processing
logic.

v2->v3:
- explained subprog relocation algorithm in more details (Alexei);
- pyperf, strobelight and cls_redirect got new subprog variants, leaving
other modes intact (Alexei);
v1->v2:
- rename DEPRECATED to LIBBPF_DEPRECATED to avoid name clashes;
- fix test_subprogs build;
- convert a bunch of complicated selftests to __noinline (Alexei).
====================

Signed-off-by: Alexei Starovoitov

Alexei Starovoitov
2020-09-04 08:18:23 +0800
ee333df50 selftests/bpf: Add __noinline variant of cls_redirect selftest ... Browse Code »

As one of the most complicated and close-to-real-world programs, cls_redirect
is a good candidate to exercise libbpf's logic of handling bpf2bpf calls. So
add variant with using explicit __noinline for majority of functions except
few most basic ones. If those few functions are inlined, verifier starts to
complain about program instruction limit of 1mln instructions being exceeded,
most probably due to instruction overhead of doing a sub-program call.
Convert user-space part of selftest to have to sub-tests: with and without
inlining.

Signed-off-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
Cc: Lorenz Bauer
Link: https://lore.kernel.org/bpf/20200903203542.15944-15-andriin@fb.com

Andrii Nakryiko
2020-09-04 08:14:40 +0800