30 Dec, 2020

1 commit

  • [ Upstream commit 9cf309c56f7910a81fbe053b6f11c3b1f0987b12 ]

    When we added sanitising of map names before loading programs to libbpf, we
    still allowed periods in the name. While the kernel will accept these for
    the map names themselves, they are not allowed in file names when pinning
    maps. This means that bpf_object__pin_maps() will fail if called on an
    object that contains internal maps (such as the .rodata section).

    Fix this by replacing periods with underscores when constructing map pin
    paths. This only affects the paths generated by libbpf when
    bpf_object__pin_maps() is called with a path argument. Any pin paths set
    by bpf_map__set_pin_path() are unaffected, and it will still be up to the
    caller to avoid invalid characters in those.
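
    As a minimal sketch of the affected flow (the object name and pin directory
    here are made up for illustration, not taken from the patch): an object with
    a .rodata section gets an internal map named something like "myprog.rodata",
    and with this fix libbpf pins it as ".../myprog_rodata" instead of failing
    on the period.

    #include <bpf/libbpf.h>

    int pin_example(void)
    {
        struct bpf_object *obj;
        int err;

        obj = bpf_object__open_file("myprog.bpf.o", NULL);
        if (libbpf_get_error(obj))
            return -1;

        err = bpf_object__load(obj);
        if (!err)
            /* Internal maps (e.g. "myprog.rodata") now get pin paths with
             * periods replaced by underscores, so this no longer fails. */
            err = bpf_object__pin_maps(obj, "/sys/fs/bpf/myprog");

        bpf_object__close(obj);
        return err;
    }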

    Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     

02 Dec, 2020

1 commit

  • Fix ring_buffer__poll() to return the number of non-discarded records
    consumed, just like its documentation states. This is also consistent with
    the return value of ring_buffer__consume(). Fix up selftests that had wrong
    expected results.
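
    A short sketch of how a consumer checks the corrected return value (the ring
    buffer itself is assumed to have been created earlier with ring_buffer__new()):

    #include <stdio.h>
    #include <bpf/libbpf.h>

    void drain(struct ring_buffer *rb)
    {
        /* Returns the number of non-discarded records consumed, or a
         * negative error code; 100 is the poll timeout in milliseconds. */
        int n = ring_buffer__poll(rb, 100);

        if (n < 0)
            fprintf(stderr, "ring_buffer__poll failed: %d\n", n);
        else
            printf("consumed %d records\n", n);
    }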

    Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
    Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org

    Andrii Nakryiko
     

20 Nov, 2020

1 commit

  • We remove "other info" from "readelf -s --wide" output when
    parsing GLOBAL_SYM_COUNT variable, which was added in [1].
    But we don't do that for VERSIONED_SYM_COUNT and it's failing
    the check_abi target on powerpc Fedora 33.

    The extra "other info" wasn't problem for VERSIONED_SYM_COUNT
    parsing until commit [2] added awk in the pipe, which assumes
    that the last column is symbol, but it can be "other info".

    Adding "other info" removal for VERSIONED_SYM_COUNT the same
    way as we did for GLOBAL_SYM_COUNT parsing.

    [1] aa915931ac3e ("libbpf: Fix readelf output parsing for Fedora")
    [2] 746f534a4809 ("tools/libbpf: Avoid counting local symbols in ABI check")

    Fixes: 746f534a4809 ("tools/libbpf: Avoid counting local symbols in ABI check")
    Signed-off-by: Jiri Olsa
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201118211350.1493421-1-jolsa@kernel.org

    Jiri Olsa
     

10 Nov, 2020

1 commit

  • If BPF code contains an unused BPF subprogram and there are no other
    subprogram calls (which can realistically happen in real-world applications
    given sufficiently smart Clang code optimizations), libbpf will erroneously
    assume that the subprograms are entry-point programs and will attempt to
    load them with the UNSPEC program type.

    Fix this by not relying on subprogram call instructions and instead
    detecting entry-point programs based on the structure of the BPF object's
    sections.

    Fixes: 9a94f277c4fb ("tools: libbpf: restore the ability to load programs from .text section")
    Reported-by: Dmitrii Banshchikov
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org

    Andrii Nakryiko
     

05 Nov, 2020

2 commits

  • Fix a possible use after free in xsk_socket__delete that will happen
    if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken
    from the context and just use that instead.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1604396490-12129-3-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     
  • Fix a possible null pointer dereference in xsk_socket__delete that
    will occur if a null pointer is fed into the function.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Reported-by: Andrii Nakryiko
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1604396490-12129-2-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     

03 Nov, 2020

1 commit

  • If bits is 0 (the case when the map is empty), the shift amount equals the
    size of the register, which is undefined behavior - on x86 it is the same
    as a shift by 0.

    Fix this by handling the 0 case explicitly and guarding calls to hash_bits
    for empty maps in hashmap__for_each_key_entry and
    hashmap__for_each_entry_safe.
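
    For reference, a simplified sketch of the guard (not the exact library code;
    the hashing constant and word width are illustrative):

    #include <stddef.h>

    static inline size_t hash_bits(size_t h, int bits)
    {
        /* A right shift by 64 (what 64 - bits becomes when bits == 0, i.e.
         * the map is empty) is undefined behavior, so handle 0 explicitly. */
        if (bits == 0)
            return 0;
        return (h * 11400714819323198485llu) >> (64 - bits);
    }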

    Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
    Suggested-by: Andrii Nakryiko
    Signed-off-by: Ian Rogers
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com

    Ian Rogers
     

24 Oct, 2020

1 commit

  • Pull networking fixes from Jakub Kicinski:
    "Cross-tree/merge window issues:

    - rtl8150: don't incorrectly assign random MAC addresses; a fix late in
    the 5.9 cycle started depending on a return code from a function which
    changed with the 5.10 PR from the usb subsystem

    Current release regressions:

    - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available

    Previous release regressions:

    - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly

    - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu() to
    synchronize_rcu_expedited()

    - netsec: ignore 'phy-mode' device property on ACPI systems; the
    property is not populated correctly by the firmware, but firmware
    configures the PHY so just keep boot settings

    Previous releases - always broken:

    - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"

    - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal

    - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is light
    and only wants to schedule napi (and do so through a _irqoff()
    variant, preferably)

    - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked

    - tipc: re-configure queue limit for broadcast link

    - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels

    - fix various issues in chelsio inline tls driver

    Misc:

    - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one

    - remove unnecessary break statements

    - make MPTCP not select IPV6, but rather depend on it"

    * tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
    tcp: fix to update snd_wl1 in bulk receiver fast path
    net: Properly typecast int values to set sk_max_pacing_rate
    netfilter: nf_fwd_netdev: clear timestamp in forwarding path
    ibmvnic: save changed mac address to adapter->mac_addr
    selftests: mptcp: depends on built-in IPv6
    Revert "virtio-net: ethtool configurable RXCSUM"
    rtnetlink: fix data overflow in rtnl_calcit()
    net: ethernet: mtk-star-emac: select REGMAP_MMIO
    net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
    net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
    bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
    bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
    bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
    mptcp: depends on IPV6 but not as a module
    sfc: move initialisation of efx->filter_sem to efx_init_struct()
    mpls: load mpls_gso after mpls_iptunnel
    net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
    net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
    net: dsa: bcm_sf2: make const array static, makes object smaller
    mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
    ...

    Linus Torvalds
     

22 Oct, 2020

1 commit

  • Yaniv reported a compilation error after pulling latest libbpf:

    [...]
    ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
    unknown register name 'r0' in asm
    : "r0", "r1", "r2", "r3", "r4", "r5");
    [...]

    The issue got triggered given Yaniv was compiling tracing programs with native
    target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor
    CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR
    to BPF object file. Given that clang was expecting x86 inline asm and not a
    BPF one, the error complained that these regs don't exist on the former.

    Guard bpf_tail_call_static() with defined(__bpf__), where BPF inline asm is
    valid to use. BPF tracing programs on more modern kernels use the BPF target
    anyway, and thus the bpf_tail_call_static() function will be available for
    them. BPF inline asm is supported since clang 7.
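
    A simplified sketch of the guard pattern (the full bpf_helpers.h definition
    also carries the actual inline asm body, omitted here):

    /* Only provide bpf_tail_call_static() when compiling for the BPF target,
     * where the r0-r5 register names in the inline asm are valid. */
    #if defined(__bpf__)
    static __always_inline void
    bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
    {
        /* ... BPF inline asm clobbering r0-r5 lives here ... */
    }
    #endif
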
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Acked-by: Yonghong Song
    Tested-by: Yaniv Agman
    Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net

    Daniel Borkmann
     

18 Oct, 2020

1 commit

  • …x/kernel/git/acme/linux

    Pull perf tools updates from Arnaldo Carvalho de Melo:

    - cgroup improvements for 'perf stat', allowing for compact
    specification of events and cgroups in the command line.

    - Support per thread topdown metrics in 'perf stat'.

    - Support sample-read topdown metric group in 'perf record'

    - Show start of latency in addition to its end in 'perf sched
    latency'.

    - Add min, max to 'perf script' futex-contention output, in addition to
    avg.

    - Allow usage of 'perf_event_attr->exclusive' attribute via the new
    ':e' event modifier.

    - Add 'snapshot' command to 'perf record --control', using it with
    Intel PT.

    - Support FIFO file names as alternative options to 'perf record
    --control'.

    - Introduce branch history "streams", to compare 'perf record' runs
    with 'perf diff' based on branch records and report hot streams.

    - Support PE executable symbol tables using libbfd, to profile, for
    instance, wine binaries.

    - Add filter support for option 'perf ftrace -F/--funcs'.

    - Allow configuring the 'disassembler_style' 'perf annotate' knob via
    'perf config'

    - Update CascadelakeX and SkylakeX JSON vendor events files.

    - Add support for parsing perchip/percore JSON vendor events.

    - Add power9 hv_24x7 core level metric events.

    - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
    zen1.

    - Enable Family 19h users by matching Zen2 AMD vendor events.

    - Use debuginfod in 'perf probe' when required debug files not found
    locally.

    - Display negative tid in non-sample events in 'perf script'.

    - Make GTK2 support opt-in

    - Add build test with GTK+

    - Add missing -lzstd to the fast path feature detection

    - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
    use in 'perf trace'.

    - Show python test script in verbose mode.

    - Fix uncore metric expressions

    - Msan uninitialized use fixes.

    - Use condition variables in 'perf bench numa'

    - Autodetect python3 binary in systems without python2.

    - Support md5 build ids in addition to sha1.

    - Add build id 'perf test' regression test.

    - Fix printable strings in python3 scripts.

    - Fix off by ones in 'perf trace' in arches using libaudit.

    - Fix JSON event code for events referencing std arch events.

    - Introduce 'perf test' shell script for Arm CoreSight testing.

    - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
    event and in 'perf test tsc'.

    - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
    fixes and documentation update.

    - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
    debuginfo files.

    - Do not print 'Metric Groups:' unnecessarily in 'perf list'

    - Refcounting fixes in the event parsing code.

    - Add expand cgroup event 'perf test' entry.

    - Fix out of bounds CPU map access when handling armv8_pmu events in
    'perf stat'.

    - Add build-id injection 'perf bench' benchmark.

    - Enter namespace when reading build-id in 'perf inject'.

    - Do not load map/dso when injecting build-id speeding up the 'perf
    inject' process.

    - Add --buildid-all option to avoid processing all samples, just the
    mmap metadata events.

    - Add feature test to check if libbfd has buildid support

    - Add 'perf test' entry for PE binary format support.

    - Fix typos in power8 PMU vendor events JSON files.

    - Hide libtraceevent non API functions.

    * tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
    perf c2c: Update documentation for metrics reorganization
    perf c2c: Add metrics "RMT Load Hit"
    perf c2c: Correct LLC load hit metrics
    perf c2c: Change header for LLC local hit
    perf c2c: Use more explicit headers for HITM
    perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
    perf c2c: Organize metrics based on memory hierarchy
    perf c2c: Display "Total Stores" as a standalone metrics
    perf c2c: Display the total numbers continuously
    perf bench: Use condition variables in numa.
    perf jevents: Fix event code for events referencing std arch events
    perf diff: Support hot streams comparison
    perf streams: Report hot streams
    perf streams: Calculate the sum of total streams hits
    perf streams: Link stream pair
    perf streams: Compare two streams
    perf streams: Get the evsel_streams by evsel_idx
    perf streams: Introduce branch history "streams"
    perf intel-pt: Improve PT documentation slightly
    perf tools: Add support for exclusive groups/events
    ...

    Linus Torvalds
     

16 Oct, 2020

1 commit

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multipath TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • …/kernel/git/shuah/linux-kselftest

    Pull kselftest updates from Shuah Khan:

    - a selftests harness fix to flush stdout before forking to avoid
    parent and child printing duplicate messages. This is evident when
    test output is redirected to a file.

    - a tools/ wide change to avoid comma separated statements from Joe
    Perches. This fix spans tools/lib, tools/power/cpupower, and
    selftests.

    * tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    tools: Avoid comma separated statements
    selftests/harness: Flush stdout before forking

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • We do not store size with build ids in perf data, but there's enough
    space to do it. Add a misc bit, PERF_RECORD_MISC_BUILD_ID_SIZE, to mark
    build id events that carry a size.

    With this fix the dso with md5 build id will have correct build id data
    and will be usable for debuginfod processing if needed (coming in
    following patches).

    Committer notes:

    Use %zu with size_t to fix this error on 32-bit arches:

    util/header.c: In function '__event_process_build_id':
    util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
    pr_debug("build id event received for %s: %s [%lu]\n",
    ^

    Signed-off-by: Jiri Olsa
    Acked-by: Ian Rogers
    Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • To pick fixes that missed v5.9.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Oct, 2020

2 commits

  • There are internal library functions which are not declared as static;
    they are used inside the library from different files. Hide them from
    library users, as they are not part of the API.
    These functions are made hidden and renamed without the "tep_" prefix:
    tep_free_plugin_paths
    tep_peek_char
    tep_buffer_init
    tep_get_input_buf_ptr
    tep_get_input_buf
    tep_read_token
    tep_free_token
    tep_free_event
    tep_free_format_field
    __tep_parse_format

    Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/
    Reported-by: Ben Hutchings
    Signed-off-by: Tzvetomir Stoyanov (VMware)
    Reviewed-by: Steven Rostedt (VMware)
    Cc: linux-trace-devel@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Tzvetomir Stoyanov (VMware)
     
  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-10-12

    The main changes are:

    1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

    2) libbpf relocation support for different size load/store, from Andrii.

    3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

    4) BPF support for per-cpu variables, from Hao.

    5) sockmap improvements, from John.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

08 Oct, 2020

4 commits

  • Use generalized BTF parsing logic, making it possible to parse BTF both from
    ELF file, as well as a raw BTF dump. This makes it easier to write custom
    tests with manually generated BTFs.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-4-andrii@kernel.org

    Andrii Nakryiko
     
  • Add support for patching instructions of the following form:
    - rX = *(T *)(rY + <off>);
    - *(T *)(rX + <off>) = rY;
    - *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.

    For such instructions, if the actual kernel field recorded in the CO-RE
    relocation has a different size than the one recorded locally (e.g., from
    vmlinux.h), then libbpf will adjust T to an appropriate 1-, 2-, 4-, or
    8-byte load.

    In general, such transformation is not always correct and could lead to
    invalid final value being loaded or stored. But two classes of cases are
    always safe:
    - if both local and target (kernel) types are unsigned integers, but of
    different sizes, then it's OK to adjust load/store instruction according to
    the necessary memory size. Zero-extending nature of such instructions and
    unsignedness make sure that the final value is always correct;
    - pointer size mismatch between BPF target architecture (which is always
    64-bit) and 32-bit host kernel architecture can be similarly resolved
    automatically, because pointer is essentially an unsigned integer. Loading
    32-bit pointer into 64-bit BPF register with zero extension will leave
    correct pointer in the register.

    Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned
    long` in vmlinux.h generated from 32-bit kernel is 32-bit, but when compiled
    with BPF program for BPF target it will be treated by compiler as 64-bit
    integer. Similarly, pointers in vmlinux.h are 32-bit for kernel, but treated
    as 64-bit values by compiler for BPF target. Both problems are now resolved by
    libbpf for direct memory reads.

    But similar transformations are useful in general when kernel fields are
    "resized" from, e.g., unsigned int to unsigned long (or vice versa).

    Now, similar transformations for signed integers are not safe to perform as
    they will result in incorrect sign extension of the value. If such a
    situation is detected, libbpf will emit a helpful message and poison the
    instruction.
    Not failing immediately means that it's possible to guard the instruction
    based on kernel version (or other conditions) and make sure it's not
    reachable.

    If there is a need to read signed integers that change sizes between different
    kernels, it's possible to use BPF_CORE_READ_BITFIELD() macro, which works both
    with bitfields and non-bitfield integers of any signedness and handles
    sign-extension properly. Also, bpf_core_read() with a proper size and/or
    use of the bpf_core_field_size() relocation could allow dealing with such
    complicated situations explicitly, if not as conveniently as direct memory
    reads.

    Selftests added in a separate patch in progs/test_core_autosize.c demonstrate
    both direct memory and probed use cases.

    BPF_CORE_READ() is not changed and it won't deal with such situations as
    automatically as direct memory reads due to the signedness integer
    limitations, which are much harder to detect and control with compiler macro
    magic. So it's encouraged to utilize direct memory reads as much as possible.
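
    A hedged BPF-side sketch of the explicit approach mentioned above (the kprobe
    target and field are only for illustration):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/do_exit")
    int handle_exit(void *ctx)
    {
        struct task_struct *task = (void *)bpf_get_current_task();
        long value = 0;

        /* Read exactly as many bytes as the running kernel's field has;
         * any sign extension of a narrower value is left to the program. */
        bpf_core_read(&value, bpf_core_field_size(task->exit_code),
                      &task->exit_code);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";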

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org

    Andrii Nakryiko
     
  • Bypass the CO-RE relocation step for BPF programs that are not going to be
    loaded. This allows BPF programs to be compiled in and disabled dynamically
    when the kernel is not expected to provide enough relocation information.
    In that case, there won't be unnecessary warnings about failed relocations.
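
    A short user-space sketch of the pattern this enables, using the auto-load
    API from the Fixes commit (the program name and the feature check are
    hypothetical):

    #include <stdbool.h>
    #include <bpf/libbpf.h>

    /* After bpf_object__open(): if the running kernel can't provide the
     * needed relocation info, disable the program so that loading it - and,
     * with this fix, relocating it - is skipped without warnings. */
    static void maybe_disable(struct bpf_object *obj, bool kernel_has_support)
    {
        struct bpf_program *prog;

        prog = bpf_object__find_program_by_name(obj, "optional_probe");
        if (prog && !kernel_has_support)
            bpf_program__set_autoload(prog, false);
    }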

    Fixes: d929758101fc ("libbpf: Support disabling auto-loading BPF programs")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org

    Andrii Nakryiko
     
  • Fix a compatibility problem when the old XDP_SHARED_UMEM mode is used
    together with the xsk_socket__create() call. In the old XDP_SHARED_UMEM
    mode, only sharing of the same device and queue id was allowed, and
    in this mode, the fill ring and completion ring were shared between
    the AF_XDP sockets.

    Therefore, it was perfectly fine to call the xsk_socket__create() API
    for each socket and not use the new xsk_socket__create_shared() API.
    This behavior was ruined by the commit introducing XDP_SHARED_UMEM
    support between different devices and/or queue ids. This patch restores
    the ability to use xsk_socket__create in these circumstances so that
    backward compatibility is not broken.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/1602070946-11154-1-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     

07 Oct, 2020

4 commits

  • It was reported that 'perf stat' crashed when used with armv8_pmu (CPU)
    events in task mode. As 'perf stat' uses an empty cpu map for task mode
    but armv8_pmu has its own cpu mask, it gets confused about which map it
    should use when accessing file descriptors, and this causes segfaults:

    (gdb) bt
    #0 0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=,
    cpu=) at evsel.c:122
    #1 perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
    #2 0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
    #3 0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
    argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
    at builtin-stat.c:929
    #4 0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
    argc=1) at builtin-stat.c:947
    #5 cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
    #6 0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 ,
    argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
    #7 0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
    argv=argv@entry=0xfffffaea2f90) at perf.c:364
    #8 0x0000000000435378 in run_argv (argcp=,
    argv=) at perf.c:408
    #9 main (argc=4, argv=0xfffffaea2f90) at perf.c:538

    To fix this, I simply used the given cpu map unless the evsel actually
    is not a system-wide event (like uncore events).

    Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
    Reported-by: Wei Li
    Signed-off-by: Namhyung Kim
    Tested-by: Barry Song
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20201007081311.1831003-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • bpf_program__set_attach_target(prog, fd, ...) will always fail when
    fd = 0 (attach to a kernel symbol) because obj->btf_vmlinux is NULL
    and there is no way to set it (at the moment btf_vmlinux is meant
    to be temporary storage for use in bpf_object__load_xattr()).

    Fix this by using libbpf_find_vmlinux_btf_id().

    At some point we may want to opportunistically cache btf_vmlinux
    so it can be reused with multiple programs.
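
    A hedged usage sketch of the case being fixed (the kernel function name is
    just an example):

    #include <bpf/libbpf.h>

    /* Retarget a fentry/fexit-style program at runtime. A target fd of 0
     * means "attach to a kernel symbol", which previously always failed
     * because obj->btf_vmlinux was NULL at this point. */
    static int retarget(struct bpf_program *prog)
    {
        return bpf_program__set_attach_target(prog, 0, "tcp_v4_connect");
    }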

    Signed-off-by: Luigi Rizzo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Petar Penkov
    Link: https://lore.kernel.org/bpf/20201005224528.389097-1-lrizzo@google.com

    Luigi Rizzo
     
  • Say a user reuses a map fd after creating a map manually and sets the
    pin_path, then loads the object via libbpf.

    In libbpf's bpf_object__create_maps(), bpf_object__reuse_map() will
    return 0 if there is no pinned map at map->pin_path. Then, after
    checking whether the map fd exists, we should also check whether
    pin_path was set and do bpf_map__pin() instead of continuing the loop.

    Fix it by creating the map if the fd does not exist and continuing to
    check pin_path after that.
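
    A sketch of the user flow described above (map name, pin path, and the
    externally created fd are illustrative):

    #include <bpf/libbpf.h>

    static int reuse_and_pin(struct bpf_object *obj, int existing_fd)
    {
        struct bpf_map *map;
        int err;

        map = bpf_object__find_map_by_name(obj, "my_map");
        if (!map)
            return -1;

        /* Reuse a map that was created manually elsewhere... */
        err = bpf_map__reuse_fd(map, existing_fd);
        if (err)
            return err;

        /* ...and still ask libbpf to pin it; with this fix the pin actually
         * happens during bpf_object__load() even though the fd is reused. */
        return bpf_map__set_pin_path(map, "/sys/fs/bpf/my_map");
    }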

    Suggested-by: Andrii Nakryiko
    Signed-off-by: Hangbin Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201006021345.3817033-3-liuhangbin@gmail.com

    Hangbin Liu
     
  • Previously we forgot to close the map fd if bpf_map_update_elem() failed
    during map slot init, which will leak the map fd.

    Let's move map slot initialization to a new function, init_map_slots(), to
    simplify the code, and close the map fd if slot initialization fails.

    Reported-by: Andrii Nakryiko
    Signed-off-by: Hangbin Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201006021345.3817033-2-liuhangbin@gmail.com

    Hangbin Liu
     

06 Oct, 2020

1 commit


03 Oct, 2020

2 commits

  • If a ksym is defined with a type, libbpf will try to find the ksym's btf
    information from kernel btf. If a valid btf entry for the ksym is found,
    libbpf can pass in the found btf id to the verifier, which validates the
    ksym's type and value.

    Typeless ksyms (i.e. those defined as 'void') will not have such a btf_id,
    but they have the symbol's address (read from kallsyms) and their value is
    treated as a raw pointer.
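
    A BPF-side sketch of a typeless ksym (the symbol name is illustrative;
    __ksym comes from bpf_helpers.h):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Typeless: declared as void, so no BTF id is involved; the value is the
     * symbol's address as found in kallsyms. */
    extern const void bpf_link_fops __ksym;

    SEC("raw_tp/sys_enter")
    int probe(void *ctx)
    {
        __u64 addr = (__u64)&bpf_link_fops;

        return addr != 0;
    }

    char LICENSE[] SEC("license") = "GPL";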

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com

    Hao Luo
     
  • Use semicolons and braces.

    Signed-off-by: Joe Perches
    Signed-off-by: Shuah Khan

    Joe Perches
     

01 Oct, 2020

2 commits

  • Ensure that btf_dump can accommodate new BTF types being appended to a BTF
    instance after the struct btf_dump was created. This came up during an
    attempt to use btf_dump for raw type dumping in selftests, but given the
    changes are not excessive, it's good to not have any gotchas in API usage,
    so I decided to support such a use case in general.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com

    Andrii Nakryiko
     
  • Port of tail_call_static() helper function from Cilium's BPF code base [0]
    to libbpf, so others can easily consume it as well. We've been using this
    in production code for some time now. The main idea is that we guarantee
    that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
    JITed BPF insns with direct jumps instead of having to fall back to using
    expensive retpolines. By using inline asm, we guarantee that the compiler
    won't merge the call from different paths with potentially different
    content of r2/r3.

    We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
    in different places as a neat trick to trigger compilation errors when
    compiler does not remove code at compilation time. This works for the BPF
    back end as it does not implement the __builtin_trap().

    [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035
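
    A hedged usage sketch (the map layout and SEC name follow common libbpf
    conventions of the time; the slot index is arbitrary):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 4);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
    } jmp_table SEC(".maps");

    SEC("classifier")
    int entry(struct __sk_buff *skb)
    {
        /* The slot must be a compile-time constant so the JIT can patch in
         * a direct jump instead of falling back to a retpoline. */
        bpf_tail_call_static(skb, &jmp_table, 0);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";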

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net

    Daniel Borkmann
     

30 Sep, 2020

5 commits

  • Libbpf compiles .o's for static and shared library modes separately, so no
    need to specify -fPIC for both. Keep it only for shared library mode.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-3-andriin@fb.com

    Andrii Nakryiko
     
  • For some reason the compiler doesn't complain about the uninitialized
    variable, fixed in the previous patch, if libbpf is compiled without the
    -O2 optimization level. So do compile it with -O2 and never let a similar
    issue slip by again. -Wall is added unconditionally, so no need to specify
    it again.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-2-andriin@fb.com

    Andrii Nakryiko
     
  • Fix an obvious uninitialized variable use that wasn't reported by the
    compiler. libbpf Makefile changes to catch such errors are added
    separately.

    Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com

    Andrii Nakryiko
     
  • This adds support for supplying a target btf ID for the bpf_link_create()
    operation, and adds a new bpf_program__attach_freplace() high-level API for
    attaching freplace functions with a target.
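
    A hedged usage sketch (the target function name is only an example;
    target_fd is the fd of an already-loaded program to be replaced into):

    #include <bpf/libbpf.h>

    /* Attach an freplace program so that it replaces the function
     * "xdp_dummy_prog" inside the program referred to by target_fd. */
    static struct bpf_link *attach(struct bpf_program *freplace_prog,
                                   int target_fd)
    {
        return bpf_program__attach_freplace(freplace_prog, target_fd,
                                            "xdp_dummy_prog");
    }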

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • Teach BTF to recognize wrong endianness and transparently convert it
    internally to host endianness. The original endianness of BTF will be
    preserved and used during btf__get_raw_data() to convert the resulting raw
    data to the same endianness as the source raw_data. This means that a
    little-endian host can parse
    big-endian BTF with no issues, all the type data will be presented to the
    client application in native endianness, but when it's time for emitting BTF
    to persist it in a file (e.g., after BTF deduplication), original non-native
    endianness will be preserved and stored.

    It's possible to query original endianness of BTF data with new
    btf__endianness() API. It's also possible to override desired output
    endianness with btf__set_endianness(), so that if application needs to load,
    say, big-endian BTF and store it as little-endian BTF, it's possible to
    manually override this. If btf__set_endianness() was used to change
    endianness, btf__endianness() will reflect overridden endianness.

    Given there are no known use cases for supporting cross-endianness for
    .BTF.ext, loading .BTF.ext in non-native endianness is not supported.
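
    A hedged sketch of the new APIs (error handling is trimmed):

    #include <bpf/btf.h>

    /* Parse BTF of any endianness, then force little-endian raw output,
     * e.g. to store a little-endian copy of big-endian input. */
    static const void *get_le_raw_data(struct btf *btf, __u32 *size)
    {
        if (btf__endianness(btf) == BTF_BIG_ENDIAN)
            btf__set_endianness(btf, BTF_LITTLE_ENDIAN);

        return btf__get_raw_data(btf, size);
    }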

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com

    Andrii Nakryiko
     

29 Sep, 2020

6 commits

  • Add selftests for BTF writer APIs.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-4-andriin@fb.com

    Andrii Nakryiko
     
  • BTF strings are used not just for names, they can be arbitrary strings used
    for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
    is too specific and might be misleading. Instead, introduce
    btf__str_by_offset() API which uses generic string terminology.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com

    Andrii Nakryiko
     
  • Add APIs for appending new BTF types at the end of BTF object.

    Each BTF kind has an API of the form btf__add_<kind>(). For types that
    have a variable number of additional items (struct/union, enum,
    func_proto, datasec), an additional API is provided to emit each such
    item. E.g., for emitting a struct, one would use the following sequence
    of API calls:

    btf__add_struct(...);
    btf__add_field(...);
    ...
    btf__add_field(...);

    Each btf__add_field() will ensure that the last BTF type is of STRUCT or
    UNION kind and will automatically increment that type's vlen field.

    All the strings are provided as C strings (const char *), not a string offset.
    This significantly improves usability of BTF writer APIs. All such strings
    will be automatically appended to string section or existing string will be
    re-used, if such string was already added previously.

    Each API attempts to do all the reasonable validations, like enforcing
    non-empty names for entities with required names, proper value bounds, various
    bit offset restrictions, etc.

    Type ID validation is minimal because it's possible to emit a type that refers
    to type that will be emitted later, so libbpf has no way to enforce such
    cases. User must be careful to properly emit all the necessary types and
    specify type IDs that will be valid in the finally generated BTF.

    Each btf__add_<kind>() API returns the new type ID on success or a negative
    value on error. APIs like btf__add_field() that emit additional items
    return zero on success and a negative value on error.
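
    A hedged end-to-end sketch of such a sequence (type and field names are
    made up):

    #include <bpf/btf.h>

    static int build_pair_struct(struct btf *btf)
    {
        int int_id, struct_id;

        /* 4-byte signed integer type named "int" */
        int_id = btf__add_int(btf, "int", 4, BTF_INT_SIGNED);
        if (int_id < 0)
            return int_id;

        /* struct pair { int x; int y; }; 8 bytes total */
        struct_id = btf__add_struct(btf, "pair", 8);
        if (struct_id < 0)
            return struct_id;

        /* item-emitting APIs return 0 on success, not a type ID */
        if (btf__add_field(btf, "x", int_id, 0, 0) ||
            btf__add_field(btf, "y", int_id, 32, 0))
            return -1;

        return struct_id;
    }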

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com

    Andrii Nakryiko
     
  • Add an ability to create an empty BTF object from scratch. This is going to
    be used by pahole for BTF encoding, and also by selftests for convenient
    creation of BTF objects.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com

    Andrii Nakryiko
     
  • Allow internal BTF representation to switch from default read-only mode, in
    which raw BTF data is a single non-modifiable block of memory with BTF header,
    types, and strings laid out sequentially and contiguously in memory, into
    a writable representation with types and strings data split out into separate
    memory regions, that can be dynamically expanded.

    Such writable internal representation is transparent to users of libbpf APIs,
    but allows appending new types and strings at the end of BTF, which is
    a typical use case when generating BTF programmatically. All the basic
    guarantees of BTF types and strings layout are preserved, i.e., user can get
    `struct btf_type *` pointer and read it directly. Such btf_type pointers might
    be invalidated if BTF is modified, so some care is required in such mixed
    read/write scenarios.

    Switch from read-only to writable configuration happens automatically the
    first time when user attempts to modify BTF by either adding a new type or new
    string. It is still possible to get raw BTF data, which is a single piece of
    memory that can be persisted in ELF section or into a file as raw BTF. Such
    raw data memory is also still owned by BTF and will be freed either when BTF
    object is freed or if another modification to BTF happens, as any modification
    invalidates BTF raw representation.

    This patch adds the first two BTF manipulation APIs: btf__add_str(), which
    allows to add arbitrary strings to BTF string section, and btf__find_str()
    which allows to find existing string offset, but not add it if it's missing.
    All the added strings are automatically deduplicated. This is achieved by
    maintaining an additional string lookup index for all unique strings. Such
    index is built when BTF is switched to modifiable mode. If at that time BTF
    strings section contained duplicate strings, they are not de-duplicated. This
    is done specifically to not modify the existing content of BTF (types, their
    string offsets, etc), which can cause confusion and is especially important
    property if there is struct btf_ext associated with struct btf. By following
    this "imperfect deduplication" process, btf_ext is kept consitent and correct.
    If deduplication of strings is necessary, it can be forced by doing BTF
    deduplication, at which point all the strings will be eagerly deduplicated and
    all string offsets both in struct btf and struct btf_ext will be updated.
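
    A hedged sketch of the two string APIs (the string itself is arbitrary):

    #include <bpf/btf.h>

    static void string_demo(struct btf *btf)
    {
        /* Adds the string (deduplicated against previously added ones) and
         * returns its offset in the string section, or a negative error. */
        int off = btf__add_str(btf, "my_type_name");

        /* Looks up an existing string without adding it; returns the same
         * offset as above, or a negative error if it is not present. */
        int found = btf__find_str(btf, "my_type_name");

        (void)off;
        (void)found;
    }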

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com

    Andrii Nakryiko
     
  • Calculating a hash of zero-terminated string is a common need when using
    hashmap, so extract it for reuse.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com

    Andrii Nakryiko