30 Dec, 2020

1 commit

  • [ Upstream commit 9cf309c56f7910a81fbe053b6f11c3b1f0987b12 ]

    When we added sanitising of map names before loading programs to libbpf, we
    still allowed periods in the name. While the kernel will accept these for
    the map names themselves, they are not allowed in file names when pinning
    maps. This means that bpf_object__pin_maps() will fail if called on an
    object that contains internal maps (such as the .rodata section).

    Fix this by replacing periods with underscores when constructing map pin
    paths. This only affects the paths generated by libbpf when
    bpf_object__pin_maps() is called with a path argument. Any pin paths set
    by bpf_map__set_pin_path() are unaffected, and it will still be up to the
    caller to avoid invalid characters in those.
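
    As a minimal sketch of the affected flow (the object name and pin directory
    here are made up for illustration, not taken from the patch): an object with
    a .rodata section gets an internal map named something like "myprog.rodata",
    and with this fix libbpf pins it as ".../myprog_rodata" instead of failing
    on the period.

    #include <bpf/libbpf.h>

    int pin_example(void)
    {
        struct bpf_object *obj;
        int err;

        obj = bpf_object__open_file("myprog.bpf.o", NULL);
        if (libbpf_get_error(obj))
            return -1;

        err = bpf_object__load(obj);
        if (!err)
            /* Internal maps (e.g. "myprog.rodata") now get pin paths with
             * periods replaced by underscores, so this no longer fails. */
            err = bpf_object__pin_maps(obj, "/sys/fs/bpf/myprog");

        bpf_object__close(obj);
        return err;
    }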

    Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     

02 Dec, 2020

1 commit

  • Fix ring_buffer__poll() to return the number of non-discarded records
    consumed, just like its documentation states. This is also consistent with
    the return value of ring_buffer__consume(). Fix up selftests that had wrong
    expected results.
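
    A short sketch of how a consumer checks the corrected return value (the ring
    buffer itself is assumed to have been created earlier with ring_buffer__new()):

    #include <stdio.h>
    #include <bpf/libbpf.h>

    void drain(struct ring_buffer *rb)
    {
        /* Returns the number of non-discarded records consumed, or a
         * negative error code; 100 is the poll timeout in milliseconds. */
        int n = ring_buffer__poll(rb, 100);

        if (n < 0)
            fprintf(stderr, "ring_buffer__poll failed: %d\n", n);
        else
            printf("consumed %d records\n", n);
    }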

    Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
    Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org

    Andrii Nakryiko
     

20 Nov, 2020

1 commit

  • We remove "other info" from "readelf -s --wide" output when
    parsing GLOBAL_SYM_COUNT variable, which was added in [1].
    But we don't do that for VERSIONED_SYM_COUNT and it's failing
    the check_abi target on powerpc Fedora 33.

    The extra "other info" wasn't problem for VERSIONED_SYM_COUNT
    parsing until commit [2] added awk in the pipe, which assumes
    that the last column is symbol, but it can be "other info".

    Adding "other info" removal for VERSIONED_SYM_COUNT the same
    way as we did for GLOBAL_SYM_COUNT parsing.

    [1] aa915931ac3e ("libbpf: Fix readelf output parsing for Fedora")
    [2] 746f534a4809 ("tools/libbpf: Avoid counting local symbols in ABI check")

    Fixes: 746f534a4809 ("tools/libbpf: Avoid counting local symbols in ABI check")
    Signed-off-by: Jiri Olsa
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201118211350.1493421-1-jolsa@kernel.org

    Jiri Olsa
     

10 Nov, 2020

1 commit

  • If BPF code contains an unused BPF subprogram and there are no other
    subprogram calls (which can realistically happen in real-world applications
    given sufficiently smart Clang code optimizations), libbpf will erroneously
    assume that the subprograms are entry-point programs and will attempt to
    load them with the UNSPEC program type.

    Fix this by not relying on subprogram call instructions and instead
    detecting entry-point programs based on the structure of the BPF object's
    sections.

    Fixes: 9a94f277c4fb ("tools: libbpf: restore the ability to load programs from .text section")
    Reported-by: Dmitrii Banshchikov
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org

    Andrii Nakryiko
     

05 Nov, 2020

2 commits

  • Fix a possible use after free in xsk_socket__delete that will happen
    if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken
    from the context and just use that instead.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1604396490-12129-3-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     
  • Fix a possible null pointer dereference in xsk_socket__delete that
    will occur if a null pointer is fed into the function.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Reported-by: Andrii Nakryiko
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1604396490-12129-2-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     

03 Nov, 2020

1 commit

  • If bits is 0 (the case when the map is empty), the shift amount equals the
    size of the register, which is undefined behavior - on x86 it is the same
    as a shift by 0.

    Fix this by handling the 0 case explicitly and guarding calls to hash_bits
    for empty maps in hashmap__for_each_key_entry and
    hashmap__for_each_entry_safe.
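
    For reference, a simplified sketch of the guard (not the exact library code;
    the hashing constant and word width are illustrative):

    #include <stddef.h>

    static inline size_t hash_bits(size_t h, int bits)
    {
        /* A right shift by 64 (what 64 - bits becomes when bits == 0, i.e.
         * the map is empty) is undefined behavior, so handle 0 explicitly. */
        if (bits == 0)
            return 0;
        return (h * 11400714819323198485llu) >> (64 - bits);
    }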

    Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
    Suggested-by: Andrii Nakryiko
    Signed-off-by: Ian Rogers
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com

    Ian Rogers
     

24 Oct, 2020

1 commit

  • Pull networking fixes from Jakub Kicinski:
    "Cross-tree/merge window issues:

    - rtl8150: don't incorrectly assign random MAC addresses; a fix late in
    the 5.9 cycle started depending on a return code from a function which
    changed with the 5.10 PR from the usb subsystem

    Current release regressions:

    - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available

    Previous release regressions:

    - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly

    - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu() to
    synchronize_rcu_expedited()

    - netsec: ignore 'phy-mode' device property on ACPI systems; the
    property is not populated correctly by the firmware, but firmware
    configures the PHY so just keep boot settings

    Previous releases - always broken:

    - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"

    - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal

    - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is light
    and only wants to schedule napi (and do so through a _irqoff()
    variant, preferably)

    - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked

    - tipc: re-configure queue limit for broadcast link

    - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels

    - fix various issues in chelsio inline tls driver

    Misc:

    - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one

    - remove unnecessary break statements

    - make MPTCP not select IPV6, but rather depend on it"

    * tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
    tcp: fix to update snd_wl1 in bulk receiver fast path
    net: Properly typecast int values to set sk_max_pacing_rate
    netfilter: nf_fwd_netdev: clear timestamp in forwarding path
    ibmvnic: save changed mac address to adapter->mac_addr
    selftests: mptcp: depends on built-in IPv6
    Revert "virtio-net: ethtool configurable RXCSUM"
    rtnetlink: fix data overflow in rtnl_calcit()
    net: ethernet: mtk-star-emac: select REGMAP_MMIO
    net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
    net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
    bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
    bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
    bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
    mptcp: depends on IPV6 but not as a module
    sfc: move initialisation of efx->filter_sem to efx_init_struct()
    mpls: load mpls_gso after mpls_iptunnel
    net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
    net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
    net: dsa: bcm_sf2: make const array static, makes object smaller
    mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
    ...

    Linus Torvalds
     

22 Oct, 2020

1 commit

  • Yaniv reported a compilation error after pulling latest libbpf:

    [...]
    ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
    unknown register name 'r0' in asm
    : "r0", "r1", "r2", "r3", "r4", "r5");
    [...]

    The issue got triggered given Yaniv was compiling tracing programs with native
    target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor
    CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR
    to BPF object file. Given that clang was expecting x86 inline asm and not a
    BPF one, the error complained that these regs don't exist on the former.

    Guard bpf_tail_call_static() with defined(__bpf__), where BPF inline asm is
    valid to use. BPF tracing programs on more modern kernels use the BPF target
    anyway, and thus the bpf_tail_call_static() function will be available for
    them. BPF inline asm is supported since clang 7.
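
    A simplified sketch of the guard pattern (the full bpf_helpers.h definition
    also carries the actual inline asm body, omitted here):

    /* Only provide bpf_tail_call_static() when compiling for the BPF target,
     * where the r0-r5 register names in the inline asm are valid. */
    #if defined(__bpf__)
    static __always_inline void
    bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
    {
        /* ... BPF inline asm clobbering r0-r5 lives here ... */
    }
    #endif
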
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Acked-by: Yonghong Song
    Tested-by: Yaniv Agman
    Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net

    Daniel Borkmann
     

18 Oct, 2020

1 commit

  • …x/kernel/git/acme/linux

    Pull perf tools updates from Arnaldo Carvalho de Melo:

    - cgroup improvements for 'perf stat', allowing for compact
    specification of events and cgroups in the command line.

    - Support per thread topdown metrics in 'perf stat'.

    - Support sample-read topdown metric group in 'perf record'

    - Show start of latency in addition to its end in 'perf sched
    latency'.

    - Add min, max to 'perf script' futex-contention output, in addition to
    avg.

    - Allow usage of 'perf_event_attr->exclusive' attribute via the new
    ':e' event modifier.

    - Add 'snapshot' command to 'perf record --control', using it with
    Intel PT.

    - Support FIFO file names as alternative options to 'perf record
    --control'.

    - Introduce branch history "streams", to compare 'perf record' runs
    with 'perf diff' based on branch records and report hot streams.

    - Support PE executable symbol tables using libbfd, to profile, for
    instance, wine binaries.

    - Add filter support for option 'perf ftrace -F/--funcs'.

    - Allow configuring the 'disassembler_style' 'perf annotate' knob via
    'perf config'

    - Update CascadelakeX and SkylakeX JSON vendor events files.

    - Add support for parsing perchip/percore JSON vendor events.

    - Add power9 hv_24x7 core level metric events.

    - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
    zen1.

    - Enable Family 19h users by matching Zen2 AMD vendor events.

    - Use debuginfod in 'perf probe' when required debug files not found
    locally.

    - Display negative tid in non-sample events in 'perf script'.

    - Make GTK2 support opt-in

    - Add build test with GTK+

    - Add missing -lzstd to the fast path feature detection

    - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
    use in 'perf trace'.

    - Show python test script in verbose mode.

    - Fix uncore metric expressions

    - Msan uninitialized use fixes.

    - Use condition variables in 'perf bench numa'

    - Autodetect python3 binary in systems without python2.

    - Support md5 build ids in addition to sha1.

    - Add build id 'perf test' regression test.

    - Fix printable strings in python3 scripts.

    - Fix off by ones in 'perf trace' in arches using libaudit.

    - Fix JSON event code for events referencing std arch events.

    - Introduce 'perf test' shell script for Arm CoreSight testing.

    - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
    event and in 'perf test tsc'.

    - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
    fixes and documentation update.

    - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
    debuginfo files.

    - Do not print 'Metric Groups:' unnecessarily in 'perf list'

    - Refcounting fixes in the event parsing code.

    - Add expand cgroup event 'perf test' entry.

    - Fix out of bounds CPU map access when handling armv8_pmu events in
    'perf stat'.

    - Add build-id injection 'perf bench' benchmark.

    - Enter namespace when reading build-id in 'perf inject'.

    - Do not load map/dso when injecting build-id speeding up the 'perf
    inject' process.

    - Add --buildid-all option to avoid processing all samples, just the
    mmap metadata events.

    - Add feature test to check if libbfd has buildid support

    - Add 'perf test' entry for PE binary format support.

    - Fix typos in power8 PMU vendor events JSON files.

    - Hide libtraceevent non API functions.

    * tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
    perf c2c: Update documentation for metrics reorganization
    perf c2c: Add metrics "RMT Load Hit"
    perf c2c: Correct LLC load hit metrics
    perf c2c: Change header for LLC local hit
    perf c2c: Use more explicit headers for HITM
    perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
    perf c2c: Organize metrics based on memory hierarchy
    perf c2c: Display "Total Stores" as a standalone metrics
    perf c2c: Display the total numbers continuously
    perf bench: Use condition variables in numa.
    perf jevents: Fix event code for events referencing std arch events
    perf diff: Support hot streams comparison
    perf streams: Report hot streams
    perf streams: Calculate the sum of total streams hits
    perf streams: Link stream pair
    perf streams: Compare two streams
    perf streams: Get the evsel_streams by evsel_idx
    perf streams: Introduce branch history "streams"
    perf intel-pt: Improve PT documentation slightly
    perf tools: Add support for exclusive groups/events
    ...

    Linus Torvalds
     

16 Oct, 2020

1 commit

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multipath TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • …/kernel/git/shuah/linux-kselftest

    Pull kselftest updates from Shuah Khan:

    - a selftests harness fix to flush stdout before forking to avoid
    parent and child printing duplicate messages. This is evident when
    test output is redirected to a file.

    - a tools/ wide change to avoid comma separated statements from Joe
    Perches. This fix spans tools/lib, tools/power/cpupower, and
    selftests.

    * tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    tools: Avoid comma separated statements
    selftests/harness: Flush stdout before forking

    Linus Torvalds
     

14 Oct, 2020

2 commits

  • We do not store size with build ids in perf data, but there's enough
    space to do it. Add a misc bit, PERF_RECORD_MISC_BUILD_ID_SIZE, to mark
    build id events that carry a size.

    With this fix the dso with md5 build id will have correct build id data
    and will be usable for debuginfod processing if needed (coming in
    following patches).

    Committer notes:

    Use %zu with size_t to fix this error on 32-bit arches:

    util/header.c: In function '__event_process_build_id':
    util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
    pr_debug("build id event received for %s: %s [%lu]\n",
    ^

    Signed-off-by: Jiri Olsa
    Acked-by: Ian Rogers
    Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • To pick fixes that missed v5.9.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

13 Oct, 2020

2 commits

  • There are internal library functions which are not declared as static;
    they are used inside the library from different files. Hide them from
    library users, as they are not part of the API.
    These functions are made hidden and renamed without the "tep_" prefix:
    tep_free_plugin_paths
    tep_peek_char
    tep_buffer_init
    tep_get_input_buf_ptr
    tep_get_input_buf
    tep_read_token
    tep_free_token
    tep_free_event
    tep_free_format_field
    __tep_parse_format

    Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/
    Reported-by: Ben Hutchings
    Signed-off-by: Tzvetomir Stoyanov (VMware)
    Reviewed-by: Steven Rostedt (VMware)
    Cc: linux-trace-devel@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Tzvetomir Stoyanov (VMware)
     
  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-10-12

    The main changes are:

    1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

    2) libbpf relocation support for different size load/store, from Andrii.

    3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

    4) BPF support for per-cpu variables, from Hao.

    5) sockmap improvements, from John.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

08 Oct, 2020

4 commits

  • Use generalized BTF parsing logic, making it possible to parse BTF both from
    ELF file, as well as a raw BTF dump. This makes it easier to write custom
    tests with manually generated BTFs.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-4-andrii@kernel.org

    Andrii Nakryiko
     
  • Add support for patching instructions of the following form:
    - rX = *(T *)(rY + <off>);
    - *(T *)(rX + <off>) = rY;
    - *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.

    For such instructions, if the actual kernel field recorded in the CO-RE
    relocation has a different size than the one recorded locally (e.g., from
    vmlinux.h), then libbpf will adjust T to an appropriate 1-, 2-, 4-, or
    8-byte load.

    In general, such transformation is not always correct and could lead to
    invalid final value being loaded or stored. But two classes of cases are
    always safe:
    - if both local and target (kernel) types are unsigned integers, but of
    different sizes, then it's OK to adjust load/store instruction according to
    the necessary memory size. Zero-extending nature of such instructions and
    unsignedness make sure that the final value is always correct;
    - pointer size mismatch between BPF target architecture (which is always
    64-bit) and 32-bit host kernel architecture can be similarly resolved
    automatically, because pointer is essentially an unsigned integer. Loading
    32-bit pointer into 64-bit BPF register with zero extension will leave
    correct pointer in the register.

    Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned
    long` in vmlinux.h generated from 32-bit kernel is 32-bit, but when compiled
    with BPF program for BPF target it will be treated by compiler as 64-bit
    integer. Similarly, pointers in vmlinux.h are 32-bit for kernel, but treated
    as 64-bit values by compiler for BPF target. Both problems are now resolved by
    libbpf for direct memory reads.

    But similar transformations are useful in general when kernel fields are
    "resized" from, e.g., unsigned int to unsigned long (or vice versa).

    Now, similar transformations for signed integers are not safe to perform as
    they will result in incorrect sign extension of the value. If such a
    situation is detected, libbpf will emit a helpful message and poison the
    instruction.
    Not failing immediately means that it's possible to guard the instruction
    based on kernel version (or other conditions) and make sure it's not
    reachable.

    If there is a need to read signed integers that change sizes between different
    kernels, it's possible to use BPF_CORE_READ_BITFIELD() macro, which works both
    with bitfields and non-bitfield integers of any signedness and handles
    sign-extension properly. Also, bpf_core_read() with a proper size and/or
    use of the bpf_core_field_size() relocation could allow dealing with such
    complicated situations explicitly, if not as conveniently as direct memory
    reads.

    Selftests added in a separate patch in progs/test_core_autosize.c demonstrate
    both direct memory and probed use cases.

    BPF_CORE_READ() is not changed and it won't deal with such situations as
    automatically as direct memory reads due to the signedness integer
    limitations, which are much harder to detect and control with compiler macro
    magic. So it's encouraged to utilize direct memory reads as much as possible.
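
    A hedged BPF-side sketch of the explicit approach mentioned above (the kprobe
    target and field are only for illustration):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/do_exit")
    int handle_exit(void *ctx)
    {
        struct task_struct *task = (void *)bpf_get_current_task();
        long value = 0;

        /* Read exactly as many bytes as the running kernel's field has;
         * any sign extension of a narrower value is left to the program. */
        bpf_core_read(&value, bpf_core_field_size(task->exit_code),
                      &task->exit_code);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";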

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org

    Andrii Nakryiko
     
  • Bypass the CO-RE relocation step for BPF programs that are not going to be
    loaded. This allows BPF programs to be compiled in and disabled dynamically
    when the kernel is not expected to provide enough relocation information.
    In that case, there won't be unnecessary warnings about failed relocations.
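
    A short user-space sketch of the pattern this enables, using the auto-load
    API from the Fixes commit (the program name and the feature check are
    hypothetical):

    #include <stdbool.h>
    #include <bpf/libbpf.h>

    /* After bpf_object__open(): if the running kernel can't provide the
     * needed relocation info, disable the program so that loading it - and,
     * with this fix, relocating it - is skipped without warnings. */
    static void maybe_disable(struct bpf_object *obj, bool kernel_has_support)
    {
        struct bpf_program *prog;

        prog = bpf_object__find_program_by_name(obj, "optional_probe");
        if (prog && !kernel_has_support)
            bpf_program__set_autoload(prog, false);
    }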

    Fixes: d929758101fc ("libbpf: Support disabling auto-loading BPF programs")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org

    Andrii Nakryiko
     
  • Fix a compatibility problem when the old XDP_SHARED_UMEM mode is used
    together with the xsk_socket__create() call. In the old XDP_SHARED_UMEM
    mode, only sharing of the same device and queue id was allowed, and
    in this mode, the fill ring and completion ring were shared between
    the AF_XDP sockets.

    Therefore, it was perfectly fine to call the xsk_socket__create() API
    for each socket and not use the new xsk_socket__create_shared() API.
    This behavior was ruined by the commit introducing XDP_SHARED_UMEM
    support between different devices and/or queue ids. This patch restores
    the ability to use xsk_socket__create in these circumstances so that
    backward compatibility is not broken.

    Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
    Signed-off-by: Magnus Karlsson
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/1602070946-11154-1-git-send-email-magnus.karlsson@gmail.com

    Magnus Karlsson
     

07 Oct, 2020

4 commits

  • It was reported that 'perf stat' crashed when used with armv8_pmu (CPU)
    events in task mode. As 'perf stat' uses an empty cpu map for task mode
    but armv8_pmu has its own cpu mask, it gets confused about which map it
    should use when accessing file descriptors, and this causes segfaults:

    (gdb) bt
    #0 0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=,
    cpu=) at evsel.c:122
    #1 perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
    #2 0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
    #3 0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
    argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
    at builtin-stat.c:929
    #4 0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
    argc=1) at builtin-stat.c:947
    #5 cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
    #6 0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 ,
    argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
    #7 0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
    argv=argv@entry=0xfffffaea2f90) at perf.c:364
    #8 0x0000000000435378 in run_argv (argcp=,
    argv=) at perf.c:408
    #9 main (argc=4, argv=0xfffffaea2f90) at perf.c:538

    To fix this, I simply used the given cpu map unless the evsel actually
    is not a system-wide event (like uncore events).

    Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
    Reported-by: Wei Li
    Signed-off-by: Namhyung Kim
    Tested-by: Barry Song
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lore.kernel.org/lkml/20201007081311.1831003-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • bpf_program__set_attach_target(prog, fd, ...) will always fail when
    fd = 0 (attach to a kernel symbol) because obj->btf_vmlinux is NULL
    and there is no way to set it (at the moment btf_vmlinux is meant
    to be temporary storage for use in bpf_object__load_xattr()).

    Fix this by using libbpf_find_vmlinux_btf_id().

    At some point we may want to opportunistically cache btf_vmlinux
    so it can be reused with multiple programs.
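
    A hedged usage sketch of the case being fixed (the kernel function name is
    just an example):

    #include <bpf/libbpf.h>

    /* Retarget a fentry/fexit-style program at runtime. A target fd of 0
     * means "attach to a kernel symbol", which previously always failed
     * because obj->btf_vmlinux was NULL at this point. */
    static int retarget(struct bpf_program *prog)
    {
        return bpf_program__set_attach_target(prog, 0, "tcp_v4_connect");
    }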

    Signed-off-by: Luigi Rizzo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Petar Penkov
    Link: https://lore.kernel.org/bpf/20201005224528.389097-1-lrizzo@google.com

    Luigi Rizzo
     
  • Say a user reuses a map fd after creating a map manually and sets the
    pin_path, then loads the object via libbpf.

    In libbpf's bpf_object__create_maps(), bpf_object__reuse_map() will
    return 0 if there is no pinned map at map->pin_path. Then, after
    checking whether the map fd exists, we should also check whether
    pin_path was set and do bpf_map__pin() instead of continuing the loop.

    Fix it by creating the map if the fd does not exist and continuing to
    check pin_path after that.
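
    A sketch of the user flow described above (map name, pin path, and the
    externally created fd are illustrative):

    #include <bpf/libbpf.h>

    static int reuse_and_pin(struct bpf_object *obj, int existing_fd)
    {
        struct bpf_map *map;
        int err;

        map = bpf_object__find_map_by_name(obj, "my_map");
        if (!map)
            return -1;

        /* Reuse a map that was created manually elsewhere... */
        err = bpf_map__reuse_fd(map, existing_fd);
        if (err)
            return err;

        /* ...and still ask libbpf to pin it; with this fix the pin actually
         * happens during bpf_object__load() even though the fd is reused. */
        return bpf_map__set_pin_path(map, "/sys/fs/bpf/my_map");
    }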

    Suggested-by: Andrii Nakryiko
    Signed-off-by: Hangbin Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201006021345.3817033-3-liuhangbin@gmail.com

    Hangbin Liu
     
  • Previously we forgot to close the map fd if bpf_map_update_elem() failed
    during map slot init, which will leak the map fd.

    Let's move map slot initialization to a new function, init_map_slots(), to
    simplify the code, and close the map fd if slot initialization fails.

    Reported-by: Andrii Nakryiko
    Signed-off-by: Hangbin Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20201006021345.3817033-2-liuhangbin@gmail.com

    Hangbin Liu
     

06 Oct, 2020

1 commit


03 Oct, 2020

2 commits

  • If a ksym is defined with a type, libbpf will try to find the ksym's btf
    information from kernel btf. If a valid btf entry for the ksym is found,
    libbpf can pass in the found btf id to the verifier, which validates the
    ksym's type and value.

    Typeless ksyms (i.e. those defined as 'void') will not have such a btf_id,
    but they have the symbol's address (read from kallsyms) and their value is
    treated as a raw pointer.
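
    A BPF-side sketch of a typeless ksym (the symbol name is illustrative;
    __ksym comes from bpf_helpers.h):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Typeless: declared as void, so no BTF id is involved; the value is the
     * symbol's address as found in kallsyms. */
    extern const void bpf_link_fops __ksym;

    SEC("raw_tp/sys_enter")
    int probe(void *ctx)
    {
        __u64 addr = (__u64)&bpf_link_fops;

        return addr != 0;
    }

    char LICENSE[] SEC("license") = "GPL";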

    Signed-off-by: Hao Luo
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com

    Hao Luo
     
  • Use semicolons and braces.

    Signed-off-by: Joe Perches
    Signed-off-by: Shuah Khan

    Joe Perches
     

01 Oct, 2020

2 commits

  • Ensure that btf_dump can accommodate new BTF types being appended to a BTF
    instance after the struct btf_dump was created. This came up during an
    attempt to use btf_dump for raw type dumping in selftests, but given the
    changes are not excessive, it's good to not have any gotchas in API usage,
    so I decided to support such a use case in general.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com

    Andrii Nakryiko
     
  • Port of tail_call_static() helper function from Cilium's BPF code base [0]
    to libbpf, so others can easily consume it as well. We've been using this
    in production code for some time now. The main idea is that we guarantee
    that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
    JITed BPF insns with direct jumps instead of having to fall back to using
    expensive retpolines. By using inline asm, we guarantee that the compiler
    won't merge the call from different paths with potentially different
    content of r2/r3.

    We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
    in different places as a neat trick to trigger compilation errors when
    compiler does not remove code at compilation time. This works for the BPF
    back end as it does not implement the __builtin_trap().

    [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035
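
    A hedged usage sketch (the map layout and SEC name follow common libbpf
    conventions of the time; the slot index is arbitrary):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 4);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
    } jmp_table SEC(".maps");

    SEC("classifier")
    int entry(struct __sk_buff *skb)
    {
        /* The slot must be a compile-time constant so the JIT can patch in
         * a direct jump instead of falling back to a retpoline. */
        bpf_tail_call_static(skb, &jmp_table, 0);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";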

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net

    Daniel Borkmann
     

30 Sep, 2020

5 commits

  • Libbpf compiles .o's for static and shared library modes separately, so no
    need to specify -fPIC for both. Keep it only for shared library mode.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-3-andriin@fb.com

    Andrii Nakryiko
     
  • For some reason the compiler doesn't complain about the uninitialized
    variable, fixed in the previous patch, if libbpf is compiled without the
    -O2 optimization level. So do compile it with -O2 and never let a similar
    issue slip by again. -Wall is added unconditionally, so no need to specify
    it again.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-2-andriin@fb.com

    Andrii Nakryiko
     
  • Fix an obvious uninitialized variable use that wasn't reported by the
    compiler. libbpf Makefile changes to catch such errors are added
    separately.

    Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com

    Andrii Nakryiko
     
  • This adds support for supplying a target btf ID for the bpf_link_create()
    operation, and adds a new bpf_program__attach_freplace() high-level API for
    attaching freplace functions with a target.
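
    A hedged usage sketch (the target function name is only an example;
    target_fd is the fd of an already-loaded program to be replaced into):

    #include <bpf/libbpf.h>

    /* Attach an freplace program so that it replaces the function
     * "xdp_dummy_prog" inside the program referred to by target_fd. */
    static struct bpf_link *attach(struct bpf_program *freplace_prog,
                                   int target_fd)
    {
        return bpf_program__attach_freplace(freplace_prog, target_fd,
                                            "xdp_dummy_prog");
    }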

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • Teach BTF to recognize wrong endianness and transparently convert it
    internally to host endianness. The original endianness of BTF will be
    preserved and used during btf__get_raw_data() to convert the resulting raw
    data to the same endianness as the source raw_data. This means that a
    little-endian host can parse
    big-endian BTF with no issues, all the type data will be presented to the
    client application in native endianness, but when it's time for emitting BTF
    to persist it in a file (e.g., after BTF deduplication), original non-native
    endianness will be preserved and stored.

    It's possible to query original endianness of BTF data with new
    btf__endianness() API. It's also possible to override desired output
    endianness with btf__set_endianness(), so that if application needs to load,
    say, big-endian BTF and store it as little-endian BTF, it's possible to
    manually override this. If btf__set_endianness() was used to change
    endianness, btf__endianness() will reflect overridden endianness.

    Given there are no known use cases for supporting cross-endianness for
    .BTF.ext, loading .BTF.ext in non-native endianness is not supported.
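
    A hedged sketch of the new APIs (error handling is trimmed):

    #include <bpf/btf.h>

    /* Parse BTF of any endianness, then force little-endian raw output,
     * e.g. to store a little-endian copy of big-endian input. */
    static const void *get_le_raw_data(struct btf *btf, __u32 *size)
    {
        if (btf__endianness(btf) == BTF_BIG_ENDIAN)
            btf__set_endianness(btf, BTF_LITTLE_ENDIAN);

        return btf__get_raw_data(btf, size);
    }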

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com

    Andrii Nakryiko
     

29 Sep, 2020

6 commits

  • Add selftests for BTF writer APIs.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-4-andriin@fb.com

    Andrii Nakryiko
     
  • BTF strings are used not just for names, they can be arbitrary strings used
    for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
    is too specific and might be misleading. Instead, introduce
    btf__str_by_offset() API which uses generic string terminology.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com

    Andrii Nakryiko
     
  • Add APIs for appending new BTF types at the end of BTF object.

    Each BTF kind has an API of the form btf__add_<kind>(). For types that
    have a variable number of additional items (struct/union, enum,
    func_proto, datasec), an additional API is provided to emit each such
    item. E.g., for emitting a struct, one would use the following sequence
    of API calls:

    btf__add_struct(...);
    btf__add_field(...);
    ...
    btf__add_field(...);

    Each btf__add_field() will ensure that the last BTF type is of STRUCT or
    UNION kind and will automatically increment that type's vlen field.

    All the strings are provided as C strings (const char *), not a string offset.
    This significantly improves usability of BTF writer APIs. All such strings
    will be automatically appended to string section or existing string will be
    re-used, if such string was already added previously.

    Each API attempts to do all the reasonable validations, like enforcing
    non-empty names for entities with required names, proper value bounds, various
    bit offset restrictions, etc.

    Type ID validation is minimal because it's possible to emit a type that refers
    to type that will be emitted later, so libbpf has no way to enforce such
    cases. User must be careful to properly emit all the necessary types and
    specify type IDs that will be valid in the finally generated BTF.

    Each btf__add_<kind>() API returns the new type ID on success or a negative
    value on error. APIs like btf__add_field() that emit additional items
    return zero on success and a negative value on error.
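
    A hedged end-to-end sketch of such a sequence (type and field names are
    made up):

    #include <bpf/btf.h>

    static int build_pair_struct(struct btf *btf)
    {
        int int_id, struct_id;

        /* 4-byte signed integer type named "int" */
        int_id = btf__add_int(btf, "int", 4, BTF_INT_SIGNED);
        if (int_id < 0)
            return int_id;

        /* struct pair { int x; int y; }; 8 bytes total */
        struct_id = btf__add_struct(btf, "pair", 8);
        if (struct_id < 0)
            return struct_id;

        /* item-emitting APIs return 0 on success, not a type ID */
        if (btf__add_field(btf, "x", int_id, 0, 0) ||
            btf__add_field(btf, "y", int_id, 32, 0))
            return -1;

        return struct_id;
    }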

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com

    Andrii Nakryiko
     
  • Add an ability to create an empty BTF object from scratch. This is going to
    be used by pahole for BTF encoding, and also by selftests for convenient
    creation of BTF objects.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com

    Andrii Nakryiko
     
  • Allow internal BTF representation to switch from default read-only mode, in
    which raw BTF data is a single non-modifiable block of memory with BTF header,
    types, and strings laid out sequentially and contiguously in memory, into
    a writable representation with types and strings data split out into separate
    memory regions, that can be dynamically expanded.

    Such writable internal representation is transparent to users of libbpf APIs,
    but allows appending new types and strings at the end of BTF, which is
    a typical use case when generating BTF programmatically. All the basic
    guarantees of BTF types and strings layout are preserved, i.e., user can get
    `struct btf_type *` pointer and read it directly. Such btf_type pointers might
    be invalidated if BTF is modified, so some care is required in such mixed
    read/write scenarios.

    Switch from read-only to writable configuration happens automatically the
    first time when user attempts to modify BTF by either adding a new type or new
    string. It is still possible to get raw BTF data, which is a single piece of
    memory that can be persisted in ELF section or into a file as raw BTF. Such
    raw data memory is also still owned by BTF and will be freed either when BTF
    object is freed or if another modification to BTF happens, as any modification
    invalidates BTF raw representation.

    This patch adds the first two BTF manipulation APIs: btf__add_str(), which
    allows to add arbitrary strings to BTF string section, and btf__find_str()
    which allows to find existing string offset, but not add it if it's missing.
    All the added strings are automatically deduplicated. This is achieved by
    maintaining an additional string lookup index for all unique strings. Such
    index is built when BTF is switched to modifiable mode. If at that time BTF
    strings section contained duplicate strings, they are not de-duplicated. This
    is done specifically to not modify the existing content of BTF (types, their
    string offsets, etc), which can cause confusion and is especially important
    property if there is struct btf_ext associated with struct btf. By following
    this "imperfect deduplication" process, btf_ext is kept consitent and correct.
    If deduplication of strings is necessary, it can be forced by doing BTF
    deduplication, at which point all the strings will be eagerly deduplicated and
    all string offsets both in struct btf and struct btf_ext will be updated.
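
    A hedged sketch of the two string APIs (the string itself is arbitrary):

    #include <bpf/btf.h>

    static void string_demo(struct btf *btf)
    {
        /* Adds the string (deduplicated against previously added ones) and
         * returns its offset in the string section, or a negative error. */
        int off = btf__add_str(btf, "my_type_name");

        /* Looks up an existing string without adding it; returns the same
         * offset as above, or a negative error if it is not present. */
        int found = btf__find_str(btf, "my_type_name");

        (void)off;
        (void)found;
    }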

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com

    Andrii Nakryiko
     
  • Calculating a hash of zero-terminated string is a common need when using
    hashmap, so extract it for reuse.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com

    Andrii Nakryiko