05 Sep, 2022

1 commit

  • commit fd1894224407c484f652ad456e1ce423e89bb3eb upstream.

    Syzbot found an issue [1]: fq_codel_drop() tries to drop a flow without
    any skbs, that is, the flow->head is null. The root cause, as [2]
    explains, is that bpf_prog_test_run_skb() runs a bpf prog which redirects
    empty skbs. So we should check that the length of the packet, as modified
    by a bpf prog or callers like bpf_prog_test, is valid before forwarding
    it directly.

    LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
    LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

    Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
    Signed-off-by: Zhengchao Shao
    Reviewed-by: Stanislav Fomichev
    Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Zhengchao Shao
     

14 Apr, 2022

1 commit

  • commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream.

    remote_port is another case of a BPF context field documented as a 32-bit
    value in network byte order for which the BPF context access converter
    generates a load of a zero-padded 16-bit integer in network byte order.

    First such case was dst_port in bpf_sock which got addressed in commit
    4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").

    Loading 4-bytes from the remote_port offset and converting the value with
    bpf_ntohl() leads to surprising results, as the expected value is shifted
    by 16 bits.

    Reduce the confusion by splitting the field in two - a 16-bit field holding
    a big-endian integer, and a 16-bit zero-padding anonymous field that
    follows it.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Jakub Sitnicki
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman

    Jakub Sitnicki
     

28 Sep, 2021

1 commit

  • BPF test infra has some hacks in place which kzalloc() a socket and perform
    minimum init via sock_net_set() and sock_init_data(). As a result, the sk's
    skcd->cgroup is NULL since it didn't go through proper initialization as it
    would have been the case from sk_alloc(). Rather than re-adding a NULL
    test in sock_cgroup_ptr() just for this, use the sk_{alloc,free}() pair
    for the test socket. The latter also allows us to get rid of the
    bpf_sk_storage_free() special case.

    Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
    Fixes: b7a1848e8398 ("bpf: add BPF_PROG_TEST_RUN support for flow dissector")
    Fixes: 2cb494a36c98 ("bpf: add tests for direct packet access from CGROUP_SKB")
    Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
    Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
    Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net

    Daniel Borkmann
     

17 Aug, 2021

1 commit

  • Turn BPF_PROG_RUN into a proper always-inlined function. No functional or
    performance changes are intended, but it makes it much easier to
    understand how BPF programs actually get executed. It's more obvious what
    types and callbacks are expected. Also, the extra () around input
    parameters can be dropped, as well as the `__` variable prefixes used to
    avoid naming collisions, which makes the code simpler to read and write.

    This refactoring also highlighted one extra issue: BPF_PROG_RUN is both
    a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
    BPF_PROG_RUN into a function causes a naming-conflict compilation error,
    so rename BPF_PROG_RUN to the lower-case bpf_prog_run(), similar to
    bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers
    of BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org

    Andrii Nakryiko
     

13 Aug, 2021

1 commit

  • Conflicts:

    drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
    9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
    9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
    099fdeda659d ("bnxt_en: Event handler for PPS events")

    kernel/bpf/helpers.c
    include/linux/bpf-cgroup.h
    a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
    c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

    drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
    5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
    2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

    MAINTAINERS
    7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
    7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

10 Aug, 2021

1 commit

  • Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
    added support for syscall programs, which are sleepable.

    But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(),
    which are needed to ensure proper rcu callback invocations. This patch
    adds bpf_read_[un]lock_trace() properly.

    Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com

    Yonghong Song
     

05 Aug, 2021

1 commit

  • During a recent net into net-next merge ([0]), a piece of old logic ([1])
    was accidentally reintroduced while resolving a merge conflict between
    bpf's [2] and bpf-next's [3]. This check was removed in the bpf-next tree
    to allow an extra ctx_in parameter to be passed for XDP test runs.
    Reinstating the check breaks the bpf_prog_test_run_xdp logic and causes a
    corresponding xdp_context_test_run selftest failure. Fix by removing the
    check and allowing ctx_in for XDP test runs.

    [0] 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
    [1] 947e8b595b82 ("bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN")
    [2] 5e21bb4e8125 ("bpf, test: fix NULL pointer dereference on invalid expected_attach_type")
    [3] 47316f4a3053 ("bpf: Support input xdp_md context in BPF_PROG_TEST_RUN")

    Fixes: 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Daniel Borkmann

    Andrii Nakryiko
     

01 Aug, 2021

1 commit

  • Andrii Nakryiko says:

    ====================
    bpf-next 2021-07-30

    We've added 64 non-merge commits during the last 15 day(s) which contain
    a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).

    The main changes are:

    1) BTF-guided binary data dumping libbpf API, from Alan.

    2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.

    3) Ambient BPF run context and cgroup storage cleanup, from Andrii.

    4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.

    5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.

    6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.

    7) BPF map pinning improvements in libbpf, from Martynas.

    8) Improved module BTF support in libbpf and bpftool, from Quentin.

    9) Bpftool cleanups and documentation improvements, from Quentin.

    10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.

    11) Increased maximum cgroup storage size, from Stanislav.

    12) Small fixes and improvements to BPF tests and samples, from various folks.

    * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
    tools: bpftool: Complete metrics list in "bpftool prog profile" doc
    tools: bpftool: Document and add bash completion for -L, -B options
    selftests/bpf: Update bpftool's consistency script for checking options
    tools: bpftool: Update and synchronise option list in doc and help msg
    tools: bpftool: Complete and synchronise attach or map types
    selftests/bpf: Check consistency between bpftool source, doc, completion
    tools: bpftool: Slightly ease bash completion updates
    unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
    libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
    tools: bpftool: Support dumping split BTF by id
    libbpf: Add split BTF support for btf__load_from_kernel_by_id()
    tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
    tools: Free BTF objects at various locations
    libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
    libbpf: Rename btf__load() as btf__load_into_kernel()
    libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
    bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
    tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
    bpf: Increase supported cgroup storage value size
    libbpf: Fix race when pinning maps in parallel
    ...
    ====================

    Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

23 Jul, 2021

1 commit


17 Jul, 2021

1 commit

  • b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
    helper") fixed the problem with cgroup-local storage use in BPF by
    pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
    possible BPF program preemptions and nested executions.

    While this seems to work well in practice, it introduces a new and
    unnecessary failure mode in which not all BPF programs might be executed
    if we fail to find an unused slot for cgroup storage, however unlikely
    that is. It might also become less unlikely when/if we allow sleepable
    cgroup BPF programs in the future.

    Further, implementing cgroup storage as an ambiently-available property
    during the entire BPF program execution is a convenient way to pass extra
    information to BPF programs and helpers without requiring user code to
    pass around extra arguments explicitly. So it would be good to have a
    generic solution that allows implementing this without arbitrary
    restrictions. Ideally, such a solution would work for both preemptable
    and sleepable BPF programs in exactly the same way.

    This patch introduces such solution, bpf_run_ctx. It adds one pointer field
    (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
    macros in such a way that it always stays valid throughout BPF program
    execution. BPF program preemption is handled by remembering previous
    current->bpf_ctx value locally while executing nested BPF program and
    restoring old value after nested BPF program finishes. This is handled by two
    helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
    supposed to be used before and after BPF program runs, respectively.

    Restoring the old value of the pointer handles preemption, while the
    bpf_run_ctx pointer being a property of the current task_struct naturally
    solves the problem for sleepable BPF programs by "following" BPF program
    execution as it is scheduled in and out of a CPU. It would even allow CPU
    migration of BPF programs, even though that's not currently allowed by
    the BPF infra.

    This patch cleans up cgroup local storage handling as a first application. The
    design itself is generic, though, with bpf_run_ctx being an empty struct that
    is supposed to be embedded into a specific struct for a given BPF program type
    (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
    this mechanism for other uses within tracing BPF programs.

    To verify that this change doesn't revert the fix to the original cgroup
    storage issue, I ran the same repro as in the original report ([0]) and didn't
    get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
    bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).

    [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/

    Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org

    Andrii Nakryiko
     

12 Jul, 2021

1 commit

  • These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) are not
    executed directly in the driver, so we should not run them directly from
    here either. Running them in these two modes requires further
    preparation; otherwise they may cause a kernel panic.

    For more details, see also dev_xdp_attach().

    [ 46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [ 46.984295] #PF: supervisor read access in kernel mode
    [ 46.985777] #PF: error_code(0x0000) - not-present page
    [ 46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0
    [ 46.989201] Oops: 0000 [#1] SMP PTI
    [ 46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44
    [ 46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24
    [ 46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710
    [ 46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
    [ 47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
    [ 47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
    [ 47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
    [ 47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
    [ 47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
    [ 47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
    [ 47.013244] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
    [ 47.015705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
    [ 47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 47.023574] PKRU: 55555554
    [ 47.024571] Call Trace:
    [ 47.025424] __bpf_prog_run32+0x32/0x50
    [ 47.026296] ? printk+0x53/0x6a
    [ 47.027066] ? ktime_get+0x39/0x90
    [ 47.027895] bpf_test_run.cold.28+0x23/0x123
    [ 47.028866] ? printk+0x53/0x6a
    [ 47.029630] bpf_prog_test_run_xdp+0x149/0x1d0
    [ 47.030649] __sys_bpf+0x1305/0x23d0
    [ 47.031482] __x64_sys_bpf+0x17/0x20
    [ 47.032316] do_syscall_64+0x3a/0x80
    [ 47.033165] entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 47.034254] RIP: 0033:0x7f04a51364dd
    [ 47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48
    [ 47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141
    [ 47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd
    [ 47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a
    [ 47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100
    [ 47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070
    [ 47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    [ 47.047579] Modules linked in:
    [ 47.048318] CR2: 0000000000000000
    [ 47.049120] ---[ end trace 7ad34443d5be719a ]---
    [ 47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710
    [ 47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
    [ 47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
    [ 47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
    [ 47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
    [ 47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
    [ 47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
    [ 47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
    [ 47.063249] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
    [ 47.065070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
    [ 47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 47.070652] PKRU: 55555554
    [ 47.071318] Kernel panic - not syncing: Fatal exception
    [ 47.072854] Kernel Offset: disabled
    [ 47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]---

    Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
    Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
    Reported-by: Abaci
    Signed-off-by: Xuan Zhuo
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Dust Li
    Acked-by: Jesper Dangaard Brouer
    Acked-by: David Ahern
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.com

    Xuan Zhuo
     

08 Jul, 2021

2 commits

  • Support specifying the ingress_ifindex and rx_queue_index of xdp_md
    contexts for BPF_PROG_TEST_RUN.

    The intended use case is to allow testing XDP programs that make decisions
    based on the ingress interface or RX queue.

    If ingress_ifindex is specified, look up the device by the provided index
    in the current namespace and use its xdp_rxq for the xdp_buff. If the
    rx_queue_index is out of range, or is non-zero when the ingress_ifindex is
    0, return -EINVAL.

    Co-developed-by: Cody Haas
    Co-developed-by: Lisa Watanabe
    Signed-off-by: Cody Haas
    Signed-off-by: Lisa Watanabe
    Signed-off-by: Zvi Effron
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-4-zeffron@riotgames.com

    Zvi Effron
     
  • Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for
    BPF_PROG_TEST_RUN.

    The intended use case is to pass some XDP meta data to the test runs of
    XDP programs that are used as tail calls.

    For programs that use bpf_prog_test_run_xdp, support xdp_md input and
    output. Unlike with an actual xdp_md during a non-test run, data_meta must
    be 0 because it must point to the start of the provided user data. From
    the initial xdp_md, use data and data_end to adjust the pointers in the
    generated xdp_buff. All other non-zero fields are prohibited (with
    EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
    different) xdp_md back to userspace.

    We require all fields of input xdp_md except the ones we explicitly
    support to be set to zero. The expectation is that in the future we might
    add support for more fields and we want to fail explicitly if the user
    runs the program on the kernel where we don't yet support them.

    Co-developed-by: Cody Haas
    Co-developed-by: Lisa Watanabe
    Signed-off-by: Cody Haas
    Signed-off-by: Lisa Watanabe
    Signed-off-by: Zvi Effron
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com

    Zvi Effron
     

19 May, 2021

2 commits

  • With the help of bpfptr_t, prepare the relevant bpf syscall commands
    to be used from both kernel and user space.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com

    Alexei Starovoitov
     
  • Add placeholders for bpf_sys_bpf() helper and new program type.
    Make sure to check that expected_attach_type is zero for future extensibility.
    Allow tracing helper functions to be used in this program type, since they will
    only execute from user context via bpf_prog_test_run.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com

    Alexei Starovoitov
     

27 Mar, 2021

1 commit

  • This patch adds a few kernel functions bpf_kfunc_call_test*() for the
    selftest's test_run purpose. They will be allowed for tc_cls progs.

    The selftest calling the kernel functions bpf_kfunc_call_test*()
    is also added in this patch.

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20210325015252.1551395-1-kafai@fb.com

    Martin KaFai Lau
     

26 Mar, 2021

1 commit

  • Jiri Olsa reported a bug ([1]) in kernel where cgroup local
    storage pointer may be NULL in bpf_get_local_storage() helper.
    There are two issues uncovered by this bug:
    (1) a kprobe or tracepoint prog incorrectly sets cgroup local storage
    before the prog run, and
    (2) due to the change from preempt_disable to migrate_disable,
    preemption is possible and percpu storage might be overwritten by
    other tasks.

    Issue (1) is fixed in [2]. This patch tries to address issue (2).
    The following shows how things can go wrong:
    task 1: bpf_cgroup_storage_set() for percpu local storage
    preemption happens
    task 2: bpf_cgroup_storage_set() for percpu local storage
    preemption happens
    task 1: run bpf program

    task 1 will effectively use the percpu local storage set by task 2,
    which will be either NULL or incorrect.

    Instead of just one common local storage per cpu, this patch fixes
    the issue by permitting 8 local storages per cpu, each identified
    by a task_struct pointer. This way, we allow at most 8 nested
    preemptions between bpf_cgroup_storage_set() and
    bpf_cgroup_storage_unset(). The percpu local storage slot is
    released (by calling bpf_cgroup_storage_unset()) by the same task
    after the bpf program finishes running. bpf_test_run() is also
    fixed to use the new bpf_cgroup_storage_set() interface.

    The patch is tested on top of [2] with the reproducer in [1].
    Without this patch, the kernel emits an error within 2-3 minutes.
    With this patch, there is still no error after one hour.

    [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
    [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com

    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Roman Gushchin
    Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com

    Yonghong Song
     

05 Mar, 2021

2 commits

  • Allow passing sk_lookup programs to PROG_TEST_RUN. User space
    provides the full bpf_sk_lookup struct as context. Since the
    context includes a socket pointer that can't be exposed
    to user space we define that PROG_TEST_RUN returns the cookie
    of the selected socket or zero in place of the socket pointer.

    We don't support testing programs that select a reuseport socket,
    since this would mean running another (unrelated) BPF program
    from the sk_lookup test handler.

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com

    Lorenz Bauer
     
  • Share the timing / signal interruption logic between different
    implementations of PROG_TEST_RUN. There is a change in behaviour
    as well. We check the loop exit condition before checking for
    pending signals. This resolves an edge case where a signal
    arrives during the last iteration. Instead of aborting with
    EINTR we return the successful result to user space.

    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20210303101816.36774-2-lmb@cloudflare.com

    Lorenz Bauer
     

21 Jan, 2021

1 commit

  • Conflicts:

    drivers/net/can/dev.c
    commit 03f16c5075b2 ("can: dev: can_restart: fix use after free bug")
    commit 3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir")

    Code move.

    drivers/net/dsa/b53/b53_common.c
    commit 8e4052c32d6b ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
    commit b7a9e0da2d1c ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")

    Field rename.

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

14 Jan, 2021

1 commit

  • syzbot reported a WARNING for allocating too big memory:

    WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011
    Modules linked in:
    CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976
    Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8
    RSP: 0018:ffffc900012efb10 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0
    RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000
    R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014
    R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
    FS: 000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
    alloc_pages include/linux/gfp.h:547 [inline]
    kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
    kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
    kmalloc include/linux/slab.h:557 [inline]
    kzalloc include/linux/slab.h:682 [inline]
    bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282
    bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline]
    __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x440499
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499
    RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0
    R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000

    This is because we didn't filter out a too-big ctx_size_in. Fix it by
    rejecting any ctx_size_in bigger than MAX_BPF_FUNC_ARGS (12) u64
    numbers.

    Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
    Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com
    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com

    Song Liu
     

09 Jan, 2021

2 commits

  • Introduce xdp_prepare_buff utility routine to initialize per-descriptor
    xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
    all XDP capable drivers.

    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Alexander Duyck
    Acked-by: Jesper Dangaard Brouer
    Acked-by: John Fastabend
    Acked-by: Shay Agroskin
    Acked-by: Martin Habets
    Acked-by: Camelia Groza
    Acked-by: Marcin Wojtas
    Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov

    Lorenzo Bianconi
     
  • Introduce the xdp_init_buff utility routine to initialize xdp_buff
    fields that are constant over NAPI iterations (e.g. frame_sz or the rxq
    pointer). Rely on xdp_init_buff in all XDP capable drivers.

    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Alexander Duyck
    Acked-by: Jesper Dangaard Brouer
    Acked-by: John Fastabend
    Acked-by: Shay Agroskin
    Acked-by: Martin Habets
    Acked-by: Camelia Groza
    Acked-by: Marcin Wojtas
    Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
    Signed-off-by: Alexei Starovoitov

    Lorenzo Bianconi
     

30 Sep, 2020

1 commit

  • On a preemptible kernel, BPF_PROG_TEST_RUN on raw_tp triggers:

    [ 35.874974] BUG: using smp_processor_id() in preemptible [00000000]
    code: new_name/87
    [ 35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
    [ 35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
    [ 35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
    BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 35.916941] Call Trace:
    [ 35.919660] dump_stack+0x77/0x9b
    [ 35.923273] check_preemption_disabled+0xb4/0xc0
    [ 35.928376] bpf_prog_test_run_raw_tp+0xd4/0x1b0
    [ 35.933872] ? selinux_bpf+0xd/0x70
    [ 35.937532] __do_sys_bpf+0x6bb/0x21e0
    [ 35.941570] ? find_held_lock+0x2d/0x90
    [ 35.945687] ? vfs_write+0x150/0x220
    [ 35.949586] do_syscall_64+0x2d/0x40
    [ 35.953443] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fix this by calling migrate_disable() before smp_processor_id().

    Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Song Liu
     

29 Sep, 2020

1 commit

  • Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
    the target program on a specific CPU. This is achieved by a new flag in
    bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
    is triggered on the cpu with id bpf_attr.test.cpu. This feature is needed
    for BPF programs that handle perf_event and other percpu resources, as
    the program can then access these resources locally.

    Signed-off-by: Song Liu
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com

    Song Liu
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough [1]. Also, remove fall-through
    markings that are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

04 Aug, 2020

2 commits

  • Currently, skb->dev is unconditionally set to the loopback device in the
    current net namespace. But to test a bpf program that contains a code
    branch based on an ifindex condition (e.g. one that filters out localhost
    packets), it is useful to allow the ifindex to be specified from
    userspace. This patch adds such an option through the ctx_in (__sk_buff)
    parameter.

    Signed-off-by: Dmitry Yakunin
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru

    Dmitry Yakunin
     
  • Currently it is impossible to test all branches of a cgroup_skb bpf
    program that accesses skb->family and skb->{local,remote}_ip{4,6},
    because these fields are zeroed during socket allocation. This commit
    fills the socket family and addresses from the related fields in the
    constructed skb.

    Signed-off-by: Dmitry Yakunin
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200803090545.82046-2-zeil@yandex-team.ru

    Dmitry Yakunin
     

01 Jul, 2020

1 commit

  • Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
    one for PTR_TO_BTF_ID in the ctx structure and the
    other for PTR_TO_BTF_ID after one level pointer chasing.
    In both cases, the test ensures the condition is not
    removed.

    For example, for this test:

      struct bpf_fentry_test_t {
              struct bpf_fentry_test_t *a;
      };

      int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
      {
              if (arg == 0)
                      test7_result = 1;
              return 0;
      }
    Before the previous verifier change, we have the xlated code:

      int test7(long long unsigned int * ctx):
      ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
         0: (79) r1 = *(u64 *)(r1 +0)
      ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
         1: (b4) w0 = 0
         2: (95) exit

    After the previous verifier change, we have:

      int test7(long long unsigned int * ctx):
      ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
         0: (79) r1 = *(u64 *)(r1 +0)
      ; if (arg == 0)
         1: (55) if r1 != 0x0 goto pc+4
      ; test7_result = 1;
         2: (18) r1 = map[id:6][0]+48
         4: (b7) r2 = 1
         5: (7b) *(u64 *)(r1 +0) = r2
      ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
         6: (b4) w0 = 0
         7: (95) exit

    Signed-off-by: Yonghong Song
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com

    Yonghong Song
     

19 May, 2020

1 commit

  • Commit bc56c919fce7 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().")
    recently changed bpf_prog_test_run_xdp() to use larger frames for XDP in
    order to test tail growing frames (via bpf_xdp_adjust_tail) and to have
    memory backing frame better resemble drivers.

    The commit contains a bug: it tries to copy the maximum data size from
    userspace instead of the size provided by userspace. This causes XDP
    unit tests to fail sporadically with EFAULT, an unfortunate behavior.
    The fix is to copy only the size specified by userspace.

    Fixes: bc56c919fce7 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().")
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/158980712729.256597.6115007718472928659.stgit@firesoul

    Jesper Dangaard Brouer
     

15 May, 2020

1 commit

  • Update the memory requirements when adding xdp.frame_sz in the BPF
    test_run function bpf_prog_test_run_xdp(), which e.g. is used by XDP
    selftests.

    Specifically, add the expected reserved tailroom, and also allocate a
    larger memory area to reflect that XDP frames usually come in this
    format. Limit the provided packet data size to 4096 minus headroom +
    tailroom, as this also reflects a common 3520-byte MTU limit with XDP.

    Note that bpf_test_init already uses a memory allocation method that
    clears memory, so this already guards against leaking uninitialized
    kernel memory.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/158945349549.97035.15316291762482444006.stgit@firesoul

    Jesper Dangaard Brouer
     

29 Mar, 2020

1 commit

  • Fix build warnings when building net/bpf/test_run.o with W=1 due
    to missing prototypes for bpf_fentry_test{1..6}.

    Instead of declaring prototypes, turn off warnings with
    __diag_{push,ignore,pop} as pointed out by Alexei.

    Signed-off-by: Jean-Philippe Menil
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200327204713.28050-1-jpmenil@gmail.com

    Jean-Philippe Menil
     

05 Mar, 2020

2 commits

  • Test for two scenarios:

    * When the fmod_ret program returns 0, the original function should
    be called along with fentry and fexit programs.
    * When the fmod_ret program returns a non-zero value, the original
    function should not be called, no side effect should be observed and
    fentry and fexit programs should be called.

    The result of the kernel function call and whether a side effect was
    observed are returned via the retval attr of the BPF_PROG_TEST_RUN (bpf)
    syscall.

    Signed-off-by: KP Singh
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200304191853.1529-8-kpsingh@chromium.org

    KP Singh
     
  • The current fexit and fentry tests rely on a different program to
    exercise the functions they attach to. Instead of doing this, implement
    the test operations for tracing which will also be used for
    BPF_MODIFY_RETURN in a subsequent patch.

    Also, clean up the fexit test to use the generated skeleton.

    Signed-off-by: KP Singh
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Acked-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org

    KP Singh
     

04 Mar, 2020

1 commit

  • BPF programs may want to know whether an skb is GSO. The canonical
    answer is skb_is_gso(skb), which tests that gso_size != 0.

    Expose this field in the same manner as gso_segs. That field itself
    is not a sufficient signal, as the comment in skb_shared_info makes
    clear: gso_segs may be zero, e.g., from dodgy sources.

    Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests
    of the feature.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200303200503.226217-2-willemdebruijn.kernel@gmail.com

    Willem de Bruijn
     

25 Feb, 2020

1 commit

  • Replace the preemption disable/enable with migrate_disable/enable() to
    reflect the actual requirement and to allow PREEMPT_RT to substitute it
    with an actual migration disable mechanism which does not disable
    preemption.

    [ tglx: Switched it over to migrate disable ]

    Signed-off-by: David S. Miller
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200224145643.785306549@linutronix.de

    David Miller
     

19 Dec, 2019

1 commit


14 Dec, 2019

2 commits


11 Dec, 2019

1 commit

  • Switch the existing pattern of "offsetof(..., member) + FIELD_SIZEOF(...,
    member)" to "offsetofend(..., member)", which does exactly what
    we need without all the copy-paste.

    Suggested-by: Andrii Nakryiko
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20191210191933.105321-1-sdf@google.com

    Stanislav Fomichev