23 Oct, 2019

1 commit

  • syzkaller managed to trigger the following crash:

    [...]
    BUG: unable to handle page fault for address: ffffc90001923030
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
    RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
    RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
    RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
    RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
    RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
    RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
    [...]
    Call Trace:
    kernel_text_address kernel/extable.c:147 [inline]
    __kernel_text_address+0x9a/0x110 kernel/extable.c:102
    unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
    arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
    stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
    save_stack mm/kasan/common.c:69 [inline]
    set_track mm/kasan/common.c:77 [inline]
    __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
    kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
    slab_post_alloc_hook mm/slab.h:584 [inline]
    slab_alloc mm/slab.c:3319 [inline]
    kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
    getname_flags+0xba/0x640 fs/namei.c:138
    getname+0x19/0x20 fs/namei.c:209
    do_sys_open+0x261/0x560 fs/open.c:1091
    __do_sys_open fs/open.c:1115 [inline]
    __se_sys_open fs/open.c:1110 [inline]
    __x64_sys_open+0x87/0x90 fs/open.c:1110
    do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [...]

    After further debugging it turns out that we walk kallsyms while, in parallel,
    we tear down a BPF program which contains subprograms that have been JITed,
    though the program itself has not been fully exposed and eventually bails
    out with an error.

    The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
    the symbols; however, bpf_prog_free() tears down the JIT memory too early via
    scheduled work. Instead, it needs to properly respect the RCU grace period, as
    the kallsyms walk for BPF is under RCU.

    Fix it by refactoring __bpf_prog_put()'s teardown and reusing it in our error
    path, where we defer final destruction when we have subprogs in the program.
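
    A minimal sketch of the idea, hedged: it assumes bpf_prog_aux carries an
    rcu_head it can reuse for deferred destruction (alongside its existing
    deferred-free work) and that func_cnt indicates JITed subprogs; names are
    illustrative, not the literal patch.

    /* Error-path sketch: remove the subprog ksyms first, then let a full
     * RCU grace period pass before tearing down the JIT images, so a
     * concurrent kallsyms walk (is_bpf_text_address()) cannot dereference
     * freed memory.
     */
    static void bpf_prog_put_deferred_sketch(struct rcu_head *rcu)
    {
            struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

            bpf_prog_free(aux->prog);       /* final free after the grace period */
    }

    /* ... in bpf_prog_load()'s error path ... */
    bpf_prog_kallsyms_del_subprogs(prog);
    if (prog->aux->func_cnt)                /* subprogs were already JITed */
            call_rcu(&prog->aux->rcu, bpf_prog_put_deferred_sketch);
    else
            bpf_prog_free(prog);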

    Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup")
    Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
    Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net

    Daniel Borkmann
     

27 Aug, 2019

1 commit

  • Since BPF constant blinding is performed after the verifier pass, the
    ALU32 instructions inserted for doubleword immediate loads don't have a
    corresponding zext instruction. This is causing a kernel oops on powerpc
    and can be reproduced by running 'test_cgroup_storage' with
    bpf_jit_harden=2.

    Fix this by emitting BPF_ZEXT during constant blinding if
    prog->aux->verifier_zext is set.
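
    A hedged sketch of the shape of such a fix inside bpf_jit_blind_insn(), for
    the second half of a blinded ldimm64 (variable names like emit_zext and
    dst_reg are illustrative):

    case 0: /* second part of a blinded BPF_LD | BPF_IMM | BPF_DW */
            *to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ insn->imm);
            *to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
            if (emit_zext)                  /* i.e. prog->aux->verifier_zext */
                    *to++ = BPF_ZEXT_REG(BPF_REG_AX);
            *to++ = BPF_ALU64_REG(BPF_OR, dst_reg, BPF_REG_AX);
            break;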

    Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
    Reported-by: Michael Ellerman
    Signed-off-by: Naveen N. Rao
    Reviewed-by: Jiong Wang
    Signed-off-by: Daniel Borkmann

    Naveen N. Rao
     

19 Jul, 2019

2 commits

  • On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
    elimination" optimization results in ___bpf_prog_run()'s jumptable code
    changing from this:

    select_insn:
    jmp *jumptable(, %rax, 8)
    ...
    ALU64_ADD_X:
    ...
    jmp *jumptable(, %rax, 8)
    ALU_ADD_X:
    ...
    jmp *jumptable(, %rax, 8)

    to this:

    select_insn:
    mov jumptable, %r12
    jmp *(%r12, %rax, 8)
    ...
    ALU64_ADD_X:
    ...
    jmp *(%r12, %rax, 8)
    ALU_ADD_X:
    ...
    jmp *(%r12, %rax, 8)

    The jumptable address is placed in a register once, at the beginning of
    the function. The function execution can then go through multiple
    indirect jumps which rely on that same register value. This has a few
    issues:

    1) Objtool isn't smart enough to be able to track such a register value
    across multiple recursive indirect jumps through the jump table.

    2) With CONFIG_RETPOLINE enabled, this optimization actually results in
    a small slowdown. I measured a ~4.7% slowdown in the test_bpf
    "tcpdump port 22" selftest.

    This slowdown is actually predicted by the GCC manual:

    Note: When compiling a program using computed gotos, a GCC
    extension, you may get better run-time performance if you
    disable the global common subexpression elimination pass by
    adding -fno-gcse to the command line.

    So just disable the optimization for this function.
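
    Sketched, the mechanism boils down to a per-function attribute (the
    __no_fgcse name matches what landed upstream; treat the exact wiring as an
    assumption):

    /* compiler-gcc.h: opt a single function out of GCC's GCSE pass */
    #define __no_fgcse __attribute__((optimize("-fno-gcse")))

    /* kernel/bpf/core.c: interpreter body unchanged */
    static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
                                          u64 *stack);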

    Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
    Reported-by: Randy Dunlap
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Acked-by: Alexei Starovoitov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com

    Josh Poimboeuf
     
  • Pick up the two pending objtool patches as the next round of objtool fixes
    depend on them.

    Thomas Gleixner
     

09 Jul, 2019

2 commits

  • Objtool previously ignored ___bpf_prog_run() because it didn't understand
    the jump table. This resulted in the ORC unwinder not being able to unwind
    through non-JIT BPF code.

    Now that objtool knows how to read jump tables, remove the whitelist and
    annotate the jump table so objtool can recognize it.

    Also add an additional "const" to the jump table definition to clarify that
    the text pointers are constant. Otherwise GCC sets the section writable
    flag and the assembler spits out warnings.
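
    Roughly what the declaration looks like after the change (the
    __annotate_jump_table marker is the objtool-side annotation; sketch):

    /* The second const keeps the table of label addresses in .rodata; the
     * annotation lets objtool locate and follow the jump table.
     */
    static const void * const jumptable[256] __annotate_jump_table = {
            [0 ... 255] = &&default_label,
            /* ... per-opcode goto targets ... */
    };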

    Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
    Reported-by: Song Liu
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Acked-by: Alexei Starovoitov
    Cc: Peter Zijlstra
    Cc: Kairui Song
    Cc: Steven Rostedt
    Cc: Borislav Petkov
    Cc: Daniel Borkmann
    Link: https://lkml.kernel.org/r/881939122b88f32be4c374d248c09d7527a87e35.1561685471.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Two cases of overlapping changes, nothing fancy.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Jun, 2019

1 commit

  • Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
    BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.

    BPF_CGROUP_SETSOCKOPT can modify user setsockopt arguments before
    passing them down to the kernel, or bypass the kernel completely.
    BPF_CGROUP_GETSOCKOPT can inspect/modify the getsockopt arguments that
    the kernel returns.
    Both hooks reuse existing PTR_TO_PACKET{,_END} infrastructure.

    The buffer memory is pre-allocated (because I don't think there is
    a precedent for working with __user memory from bpf). This might be
    slow to do for each {s,g}etsockopt call, so I've added
    __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
    attached to a cgroup. Note, however, that there is a race between
    __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where the cgroup
    program layout might have changed; this should not be a problem
    because in general there is a race between multiple calls to
    {s,g}etsockopt and user adding/removing bpf progs from a cgroup.

    The return code of the BPF program is handled as follows:
    * 0: EPERM
    * 1: success, continue with next BPF program in the cgroup chain
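
    A hedged example of a setsockopt hook using these return codes (section
    name and the struct bpf_sockopt layout follow the patch's UAPI;
    bpf_helpers.h and the SOL_SOCKET/SO_MARK values are assumptions added to
    keep the snippet self-contained):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define SOL_SOCKET 1
    #define SO_MARK    36

    SEC("cgroup/setsockopt")
    int deny_so_mark(struct bpf_sockopt *ctx)
    {
            if (ctx->level == SOL_SOCKET && ctx->optname == SO_MARK)
                    return 0;       /* reject: the syscall fails with EPERM */
            return 1;               /* continue with the next prog / the kernel */
    }

    char _license[] SEC("license") = "GPL";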

    v9:
    * allow overwriting setsockopt arguments (Alexei Starovoitov):
    * use set_fs (same as kernel_setsockopt)
    * buffer is always kzalloc'd (no small on-stack buffer)

    v8:
    * use s32 for optlen (Andrii Nakryiko)

    v7:
    * return only 0 or 1 (Alexei Starovoitov)
    * always run all progs (Alexei Starovoitov)
    * use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
    (decided to use optval=-1 instead, optval=0 might be a valid input)
    * call getsockopt hook after kernel handlers (Alexei Starovoitov)

    v6:
    * rework cgroup chaining; stop as soon as bpf program returns
    0 or 2; see patch with the documentation for the details
    * drop Andrii's and Martin's Acked-by (not sure they are comfortable
    with the new state of things)

    v5:
    * skip copy_to_user() and put_user() when ret == 0 (Martin Lau)

    v4:
    * don't export bpf_sk_fullsock helper (Martin Lau)
    * size != sizeof(__u64) for uapi pointers (Martin Lau)
    * offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)

    v3:
    * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
    * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
    Nakryiko)
    * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
    * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
    * new CG_SOCKOPT_ACCESS macro to wrap repeated parts

    v2:
    * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
    * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
    * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
    * added [0,2] return code check to verifier (Martin Lau)
    * dropped unused buf[64] from the stack (Martin Lau)
    * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
    * dropped bpf_target_off from ctx rewrites (Martin Lau)
    * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)

    Cc: Andrii Nakryiko
    Cc: Martin Lau
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov

    Stanislav Fomichev
     

26 Jun, 2019

1 commit

    Yauheni reported that the following code does not work correctly on BE arches:

    ALU_ARSH_X:
    DST = (u64) (u32) ((*(s32 *) &DST) >> SRC);
    CONT;
    ALU_ARSH_K:
    DST = (u64) (u32) ((*(s32 *) &DST) >> IMM);
    CONT;

    and is causing the test_verifier test 'arsh32 on imm 2' to fail on BE
    arches.

    The code takes an address and interprets the memory directly, so it is not
    endianness neutral. We should instead perform a standard C type cast on
    the variable. A u64 to s32 conversion will drop the high 32 bits and preserve
    the low 32 bits as a signed integer, which is all we want.
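
    The endian-neutral form of the two cases, applying the cast described above
    (sketch of the interpreter after the fix):

    ALU_ARSH_X:
            DST = (u64) (u32) (((s32) DST) >> SRC);
            CONT;
    ALU_ARSH_K:
            DST = (u64) (u32) (((s32) DST) >> IMM);
            CONT;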

    Fixes: 2dc6b100f928 ("bpf: interpreter support BPF_ALU | BPF_ARSH")
    Reported-by: Yauheni Kaliuta
    Reviewed-by: Jakub Kicinski
    Reviewed-by: Quentin Monnet
    Signed-off-by: Jiong Wang
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Jiong Wang
     

25 Jun, 2019

1 commit

    This is introduced for admins to check what is happening on XDP_TX when
    bulk XDP_TX is in use, which will first be introduced in veth in the next
    commit.

    v3:
    - Add act field to be in line with other XDP tracepoints.

    Signed-off-by: Toshiaki Makita
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Toshiaki Makita
     

18 Jun, 2019

1 commit


15 Jun, 2019

1 commit

  • Convert proc_dointvec_minmax_bpf_stats() into a more generic
    helper, since we are going to use jump labels more often.

    Note that sysctl_bpf_stats_enabled is removed, since
    it is no longer needed/used.

    Signed-off-by: Eric Dumazet
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 May, 2019

1 commit

  • Drop __rcu annotations and rcu read sections from bpf_prog_array
    helper functions. They are not needed since all existing callers
    call those helpers from the rcu update side while holding a mutex.
    This guarantees that a use-after-free cannot happen.

    In the next patches I'll fix the callers with missing
    rcu_dereference_protected to make sparse/lockdep happy; the proper
    way to use these helpers is:

    struct bpf_prog_array __rcu *progs = ...;
    struct bpf_prog_array *p;

    mutex_lock(&mtx);
    p = rcu_dereference_protected(progs, lockdep_is_held(&mtx));
    bpf_prog_array_length(p);
    bpf_prog_array_copy_to_user(p, ...);
    bpf_prog_array_delete_safe(p, ...);
    bpf_prog_array_copy_info(p, ...);
    bpf_prog_array_copy(p, ...);
    bpf_prog_array_free(p);
    mutex_unlock(&mtx);

    No functional changes! rcu_dereference_protected with lockdep_is_held
    should catch any cases where we update prog array without a mutex
    (I've looked at existing call sites and I think we hold a mutex
    everywhere).

    Motivation is to fix sparse warnings:
    kernel/bpf/core.c:1803:9: warning: incorrect type in argument 1 (different address spaces)
    kernel/bpf/core.c:1803:9: expected struct callback_head *head
    kernel/bpf/core.c:1803:9: got struct callback_head [noderef] *
    kernel/bpf/core.c:1877:44: warning: incorrect type in initializer (different address spaces)
    kernel/bpf/core.c:1877:44: expected struct bpf_prog_array_item *item
    kernel/bpf/core.c:1877:44: got struct bpf_prog_array_item [noderef] *
    kernel/bpf/core.c:1901:26: warning: incorrect type in assignment (different address spaces)
    kernel/bpf/core.c:1901:26: expected struct bpf_prog_array_item *existing
    kernel/bpf/core.c:1901:26: got struct bpf_prog_array_item [noderef] *
    kernel/bpf/core.c:1935:26: warning: incorrect type in assignment (different address spaces)
    kernel/bpf/core.c:1935:26: expected struct bpf_prog_array_item *[assigned] existing
    kernel/bpf/core.c:1935:26: got struct bpf_prog_array_item [noderef] *

    v2:
    * remove comment about potential race; that can't happen
    because all callers are in rcu-update section

    Cc: Roman Gushchin
    Acked-by: Roman Gushchin
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Daniel Borkmann

    Stanislav Fomichev
     

25 May, 2019

1 commit

    After the previous patches, the verifier will mark an insn if it really needs
    zero extension on the dst_reg.

    It is then for back-ends to decide how to use such information to eliminate
    unnecessary zero extension code-gen during JIT compilation.

    One approach is for the verifier to insert explicit zero extension for those
    insns that need it in a generic way; JIT back-ends then do not
    generate zero extension for sub-register writes by default.

    However, only those back-ends which do not have hardware zero extension
    want this optimization. Back-ends like x86_64 and AArch64 have hardware
    zero extension support, so the insertion should be disabled for them.

    This patch introduces a new target hook "bpf_jit_needs_zext" which returns
    false by default, meaning verifier zero extension insertion is disabled by
    default. A back-end could override this hook to return true if it doesn't
    have hardware support and wants the verifier to insert zero extension
    explicitly.

    Offload targets do not use this native target hook; instead, they can
    get the optimization results using bpf_prog_offload_ops.finalize.
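
    Sketched, the weak default and an override by a back-end that lacks
    hardware zero extension:

    /* kernel/bpf/core.c: default, the JIT zero-extends sub-register writes itself */
    bool __weak bpf_jit_needs_zext(void)
    {
            return false;
    }

    /* in such a back-end's JIT code: ask the verifier for explicit zext */
    bool bpf_jit_needs_zext(void)
    {
            return true;
    }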

    NOTE: arches can have diverse features; it is possible for one arch to have
    hardware zero extension support for some sub-register write insns but not
    for all. For example, PowerPC and SPARC have zero-extended loads, but not
    for alu32. So when verifier zero extension insertion is enabled, these JIT
    back-ends need to peephole insns to remove those zero extensions inserted
    for insns that actually have hardware zero extension support. The peephole
    could be as simple as looking at the next insn: if it is a special zero
    extension insn, then it is safe to eliminate it if the current insn has
    hardware zero extension support.

    Reviewed-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Alexei Starovoitov

    Jiong Wang
     

11 May, 2019

1 commit

  • systemtap folks reported the following splat recently:

    [ 7790.862212] WARNING: CPU: 3 PID: 26759 at arch/x86/kernel/kprobes/core.c:1022 kprobe_fault_handler+0xec/0xf0
    [...]
    [ 7790.864113] CPU: 3 PID: 26759 Comm: sshd Not tainted 5.1.0-0.rc7.git1.1.fc31.x86_64 #1
    [ 7790.864198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS[...]
    [ 7790.864314] RIP: 0010:kprobe_fault_handler+0xec/0xf0
    [ 7790.864375] Code: 48 8b 50 [...]
    [ 7790.864714] RSP: 0018:ffffc06800bdbb48 EFLAGS: 00010082
    [ 7790.864812] RAX: ffff9e2b75a16320 RBX: 0000000000000000 RCX: 0000000000000000
    [ 7790.865306] RDX: ffffffffffffffff RSI: 000000000000000e RDI: ffffc06800bdbbf8
    [ 7790.865514] RBP: ffffc06800bdbbf8 R08: 0000000000000000 R09: 0000000000000000
    [ 7790.865960] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc06800bdbbf8
    [ 7790.866037] R13: ffff9e2ab56a0418 R14: ffff9e2b6d0bb400 R15: ffff9e2b6d268000
    [ 7790.866114] FS: 00007fde49937d80(0000) GS:ffff9e2b75a00000(0000) knlGS:0000000000000000
    [ 7790.866193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 7790.866318] CR2: 0000000000000000 CR3: 000000012f312000 CR4: 00000000000006e0
    [ 7790.866419] Call Trace:
    [ 7790.866677] do_user_addr_fault+0x64/0x480
    [ 7790.867513] do_page_fault+0x33/0x210
    [ 7790.868002] async_page_fault+0x1e/0x30
    [ 7790.868071] RIP: 0010: (null)
    [ 7790.868144] Code: Bad RIP value.
    [ 7790.868229] RSP: 0018:ffffc06800bdbca8 EFLAGS: 00010282
    [ 7790.868362] RAX: ffff9e2b598b60f8 RBX: ffffc06800bdbe48 RCX: 0000000000000004
    [ 7790.868629] RDX: 0000000000000004 RSI: ffffc06800bdbc6c RDI: ffff9e2b598b60f0
    [ 7790.868834] RBP: ffffc06800bdbcf8 R08: 0000000000000000 R09: 0000000000000004
    [ 7790.870432] R10: 00000000ff6f7a03 R11: 0000000000000000 R12: 0000000000000001
    [ 7790.871859] R13: ffffc06800bdbcb8 R14: 0000000000000000 R15: ffff9e2acd0a5310
    [ 7790.873455] ? vfs_read+0x5/0x170
    [ 7790.874639] ? vfs_read+0x1/0x170
    [ 7790.875834] ? trace_call_bpf+0xf6/0x260
    [ 7790.877044] ? vfs_read+0x1/0x170
    [ 7790.878208] ? vfs_read+0x5/0x170
    [ 7790.879345] ? kprobe_perf_func+0x233/0x260
    [ 7790.880503] ? vfs_read+0x1/0x170
    [ 7790.881632] ? vfs_read+0x5/0x170
    [ 7790.882751] ? kprobe_ftrace_handler+0x92/0xf0
    [ 7790.883926] ? __vfs_read+0x30/0x30
    [ 7790.885050] ? ftrace_ops_assist_func+0x94/0x100
    [ 7790.886183] ? vfs_read+0x1/0x170
    [ 7790.887283] ? vfs_read+0x5/0x170
    [ 7790.888348] ? ksys_read+0x5a/0xe0
    [ 7790.889389] ? do_syscall_64+0x5c/0xa0
    [ 7790.890401] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

    After some debugging, it turns out that the logic in 2cbd95a5c4fb
    ("bpf: change parameters of call/branch offset adjustment") has
    a bug that is exposed after 52875a04f4b2 ("bpf: verifier: remove
    dead code") in that we miss some of the jump offset adjustments
    after code patching when we remove dead code, more concretely,
    upon backward jump spanning over the area that is being removed.

    BPF insns of a case that was hit pre 52875a04f4b2:

    [...]
    676: (85) call bpf_perf_event_output#-47616
    677: (05) goto pc-636
    678: (62) *(u32 *)(r10 -64) = 0
    679: (bf) r7 = r10
    680: (07) r7 += -64
    681: (05) goto pc-44
    682: (05) goto pc-1
    683: (05) goto pc-1

    BPF insns afterwards:

    [...]
    618: (85) call bpf_perf_event_output#-47616
    619: (05) goto pc-638
    620: (62) *(u32 *)(r10 -64) = 0
    621: (bf) r7 = r10
    622: (07) r7 += -64
    623: (05) goto pc-44

    To illustrate the bug, the situation looks as follows:

    [instruction layout diagram from the original commit message omitted]

    The condition curr >= end_new && curr + off + 1 < end_new in the
    branch delta adjustments is never hit because curr + off + 1 <
    end_new is compared as unsigned and therefore curr + off + 1 >
    end_new in the unsigned realm, as curr + off + 1 becomes negative
    since the insns are memmove()'d before the offset adjustments.
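
    A hedged stand-alone illustration of the unsigned wrap (numbers loosely
    modeled on the dump above, not kernel code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t curr = 619, end_new = 620;     /* insn index, new end */
            int32_t off = -638;                     /* backward jump offset */

            /* u32 + s32 arithmetic is performed unsigned: 619 - 638 + 1 wraps
             * to a huge value instead of going negative, so the
             * "curr + off + 1 < end_new" adjustment never triggers.
             */
            printf("%u\n", curr + off + 1);             /* 4294967278 */
            printf("%d\n", curr + off + 1 < end_new);   /* 0 */
            return 0;
    }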

    Correct BPF insns after this fix:

    [...]
    618: (85) call bpf_perf_event_output#-47216
    619: (05) goto pc-578
    620: (62) *(u32 *)(r10 -64) = 0
    621: (bf) r7 = r10
    622: (07) r7 += -64
    623: (05) goto pc-44

    Note that the unprivileged case is not affected by this.

    Fixes: 52875a04f4b2 ("bpf: verifier: remove dead code")
    Fixes: 2cbd95a5c4fb ("bpf: change parameters of call/branch offset adjustment")
    Reported-by: Frank Ch. Eigler
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Kicinski
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

08 May, 2019

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

    2) Add fib_sync_mem to control the amount of dirty memory we allow to
    queue up between synchronize RCU calls, from David Ahern.

    3) Make flow classifier more lockless, from Vlad Buslov.

    4) Add PHY downshift support to aquantia driver, from Heiner
    Kallweit.

    5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
    contention on SLAB spinlocks in heavy RPC workloads.

    6) Partial GSO offload support in XFRM, from Boris Pismenny.

    7) Add fast link down support to ethtool, from Heiner Kallweit.

    8) Use siphash for IP ID generator, from Eric Dumazet.

    9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
    entries, from David Ahern.

    10) Move skb->xmit_more into a per-cpu variable, from Florian
    Westphal.

    11) Improve eBPF verifier speed and increase maximum program size,
    from Alexei Starovoitov.

    12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
    spinlocks. From Neil Brown.

    13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

    14) Improve link partner cap detection in generic PHY code, from
    Heiner Kallweit.

    15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
    Maguire.

    16) Remove SKB list implementation assumptions in SCTP, yours truly.

    17) Various cleanups, optimizations, and simplifications in r8169
    driver. From Heiner Kallweit.

    18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

    19) Switch PHY drivers over to use dynamic feature detection, from
    Heiner Kallweit.

    20) Support flow steering without masking in dpaa2-eth, from Ioana
    Ciocoi.

    21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
    Pirko.

    22) Increase the strict parsing of current and future netlink
    attributes, also export such policies to userspace. From Johannes
    Berg.

    23) Allow DSA tag drivers to be modular, from Andrew Lunn.

    24) Remove legacy DSA probing support, also from Andrew Lunn.

    25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
    Haabendal.

    26) Add a generic tracepoint for TX queue timeouts to ease debugging,
    from Cong Wang.

    27) More indirect call optimizations, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
    cxgb4: Fix error path in cxgb4_init_module
    net: phy: improve pause mode reporting in phy_print_status
    dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
    net: macb: Change interrupt and napi enable order in open
    net: ll_temac: Improve error message on error IRQ
    net/sched: remove block pointer from common offload structure
    net: ethernet: support of_get_mac_address new ERR_PTR error
    net: usb: smsc: fix warning reported by kbuild test robot
    staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
    net: dsa: support of_get_mac_address new ERR_PTR error
    net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
    vrf: sit mtu should not be updated when vrf netdev is the link
    net: dsa: Fix error cleanup path in dsa_init_module
    l2tp: Fix possible NULL pointer dereference
    taprio: add null check on sched_nest to avoid potential null pointer dereference
    net: mvpp2: cls: fix less than zero check on a u32 variable
    net_sched: sch_fq: handle non connected flows
    net_sched: sch_fq: do not assume EDT packets are ordered
    net: hns3: use devm_kcalloc when allocating desc_cb
    net: hns3: some cleanup for struct hns3_enet_ring
    ...

    Linus Torvalds
     

30 Apr, 2019

1 commit

    Use the new flag VM_FLUSH_RESET_PERMS for handling the freeing of specially
    permissioned memory in vmalloc, and remove the places where memory was set RW
    before freeing, which is no longer needed. Don't track whether the memory is RO
    anymore, because it is now tracked in vmalloc.
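
    A hedged sketch of the resulting pattern in the BPF JIT image allocation
    path (placement illustrative):

    hdr = bpf_jit_alloc_exec(size);
    if (!hdr)
            return NULL;

    /* Have vfree() flush the TLB and reset direct-map permissions for this
     * area on free, so no manual set-RW before freeing is needed anymore.
     */
    set_vm_flush_reset_perms(hdr);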

    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Daniel Borkmann
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Rick Edgecombe
     

10 Apr, 2019

1 commit

  • This generic extension to BPF maps allows for directly loading
    an address residing inside a BPF map value as a single BPF
    ldimm64 instruction!

    The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
    is a special src_reg flag for the ldimm64 instruction that indicates
    that inside the first part of the double insn's imm field is a
    file descriptor which the verifier then replaces as a full 64bit
    address of the map into both imm parts. For the newly added
    BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
    the first part of the double insn's imm field is again a file
    descriptor corresponding to the map, and the second part of the
    imm field is an offset into the value. The verifier will then
    replace both imm parts with an address that points into the BPF
    map value at the given value offset for maps that support this
    operation. Currently supported is array map with single entry.
    It is possible to support more than just single map element by
    reusing both 16bit off fields of the insns as a map index, so
    full array map lookup could be expressed that way. It hasn't
    been implemented here due to lack of concrete use case, but
    could easily be done so in future in a compatible way, since
    both off fields right now have to be 0 and would correctly
    denote a map index 0.
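
    A hedged sketch of the raw encoding (map_fd and value_off are placeholders
    for a real map file descriptor and a byte offset into its value):

    int map_fd = /* fd of the (single-entry array) map */ -1;
    __u32 value_off = 4;                    /* byte offset into the value */

    struct bpf_insn ld_map_val[2] = {
            {
                    .code    = BPF_LD | BPF_DW | BPF_IMM,
                    .dst_reg = BPF_REG_1,
                    .src_reg = BPF_PSEUDO_MAP_VALUE,
                    .imm     = map_fd,      /* insn[0].imm: map fd */
            },
            {
                    .code    = 0,           /* second half of the ldimm64 */
                    .imm     = value_off,   /* insn[1].imm: value offset */
            },
    };
    /* both 16-bit off fields stay 0, as required */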

    BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise, with
    BPF_PSEUDO_MAP_FD, we could not differentiate offset 0 between a load of
    the map pointer versus a load of the map's value at offset 0, and changing
    BPF_PSEUDO_MAP_FD's encoding into off by one to differentiate between
    a regular map pointer and a map value pointer would add unnecessary
    complexity and raise the barrier for debuggability, and is thus less
    suitable. Using the second part of the imm field as an offset
    into the value does /not/ come with limitations, since the maximum
    possible value size is in the u32 universe anyway.

    This optimization allows for efficiently retrieving an address
    to a map value memory area without having to issue a helper call
    which needs to prepare registers according to calling convention,
    etc, without needing the extra NULL test, and without having to
    add the offset in an additional instruction to the value base
    pointer. The verifier then treats the destination register as
    PTR_TO_MAP_VALUE with constant reg->off from the user passed
    offset from the second imm field, and guarantees that this is
    within bounds of the map value. Any subsequent operations are
    normally treated as typical map value handling without anything
    extra needed from verification side.

    The two map operations for direct value access have been added to
    array map for now. In future other types could be supported as
    well depending on the use case. The main use case for this commit
    is to allow for BPF loader support for global variables that
    reside in .data/.rodata/.bss sections such that we can directly
    load the address of them with minimal additional infrastructure
    required. Loader support has been added in subsequent commits for
    libbpf library.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

04 Apr, 2019

1 commit


06 Mar, 2019

1 commit

  • Pull perf updates from Ingo Molnar:
    "Lots of tooling updates - too many to list, here's a few highlights:

    - Various subcommand updates to 'perf trace', 'perf report', 'perf
    record', 'perf annotate', 'perf script', 'perf test', etc.

    - CPU and NUMA topology and affinity handling improvements,

    - HW tracing and HW support updates:
    - Intel PT updates
    - ARM CoreSight updates
    - vendor HW event updates

    - BPF updates

    - Tons of infrastructure updates, both on the build system and the
    library support side

    - Documentation updates.

    - ... and lots of other changes, see the changelog for details.

    Kernel side updates:

    - Tighten up kprobes blacklist handling, reduce the number of places
    where developers can install a kprobe and hang/crash the system.

    - Fix/enhance vma address filter handling.

    - Various PMU driver updates, small fixes and additions.

    - refcount_t conversions

    - BPF updates

    - error code propagation enhancements

    - misc other changes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
    perf script python: Add Python3 support to syscall-counts-by-pid.py
    perf script python: Add Python3 support to syscall-counts.py
    perf script python: Add Python3 support to stat-cpi.py
    perf script python: Add Python3 support to stackcollapse.py
    perf script python: Add Python3 support to sctop.py
    perf script python: Add Python3 support to powerpc-hcalls.py
    perf script python: Add Python3 support to net_dropmonitor.py
    perf script python: Add Python3 support to mem-phys-addr.py
    perf script python: Add Python3 support to failed-syscalls-by-pid.py
    perf script python: Add Python3 support to netdev-times.py
    perf tools: Add perf_exe() helper to find perf binary
    perf script: Handle missing fields with -F +..
    perf data: Add perf_data__open_dir_data function
    perf data: Add perf_data__(create_dir|close_dir) functions
    perf data: Fail check_backup in case of error
    perf data: Make check_backup work over directories
    perf tools: Add rm_rf_perf_data function
    perf tools: Add pattern name checking to rm_rf
    perf tools: Add depth checking to rm_rf
    perf data: Add global path holder
    ...

    Linus Torvalds
     

02 Mar, 2019

1 commit


28 Feb, 2019

1 commit

    JITed BPF programs are indistinguishable from kernel functions, but unlike
    kernel code, BPF code can be changed often.
    The typical approach of "perf record" + "perf report" profiling and tuning of
    kernel code works just as well for BPF programs, but kernel code doesn't
    need to be monitored whereas BPF programs do.
    Users load and run a large number of BPF programs.
    These BPF stats allow tools to monitor the usage of BPF on the server.
    The monitoring tools will turn the sysctl kernel.bpf_stats_enabled
    on and off for a few seconds to sample the average cost of the programs.
    Aggregated data over hours and days will provide insight into the cost of BPF,
    and alarms can trigger in case a given program suddenly gets more expensive.

    The cost of two sched_clock() calls per program invocation adds ~20 nsec.
    Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
    from ~10 nsec to ~30 nsec.
    A static_key minimizes the cost of the stats collection.
    There is no measurable difference before/after this patch
    with kernel.bpf_stats_enabled=0.
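
    A hedged sketch of the run-time pattern guarded by the static key (stats
    field names follow my reading of the patch; the macro plumbing is
    simplified):

    /* Fragment: prog, ctx and ret come from the surrounding call site. */
    if (static_branch_unlikely(&bpf_stats_enabled_key)) {
            struct bpf_prog_stats *stats;
            u64 start = sched_clock();

            ret = BPF_PROG_RUN(prog, ctx);
            stats = this_cpu_ptr(prog->aux->stats);
            u64_stats_update_begin(&stats->syncp);
            stats->cnt++;
            stats->nsecs += sched_clock() - start;
            u64_stats_update_end(&stats->syncp);
    } else {
            ret = BPF_PROG_RUN(prog, ctx);
    }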

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

02 Feb, 2019

1 commit

  • Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
    bpf program serialize access to other variables.

    Example:
    struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
    };
    struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
    if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
    }

    Restrictions and safety checks:
    - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
    - BTF description of the map is mandatory for safety analysis.
    - a bpf program can take one bpf_spin_lock at a time, since two or more can
    cause deadlocks.
    - only one 'struct bpf_spin_lock' is allowed per map element.
    It drastically simplifies the implementation yet allows a bpf program to use
    any number of bpf_spin_locks.
    - when bpf_spin_lock is taken, calls (either bpf2bpf or helpers) are not allowed.
    - bpf program must bpf_spin_unlock() before return.
    - bpf program can access 'struct bpf_spin_lock' only via
    bpf_spin_lock()/bpf_spin_unlock() helpers.
    - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
    - to use bpf_spin_lock() helper the BTF description of map value must be
    a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
    Nested lock inside another struct is not allowed.
    - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
    - syscall map_update and program map_update do not update bpf_spin_lock field.
    - bpf_spin_lock cannot be on the stack or inside networking packet.
    bpf_spin_lock can only be inside HASH or ARRAY map value.
    - bpf_spin_lock is available to root only and to all program types.
    - bpf_spin_lock is not allowed in inner maps of map-in-map.
    - ld_abs is not allowed inside spin_lock-ed region.
    - tracing progs and socket filter progs cannot use bpf_spin_lock due to
    insufficient preemption checks

    Implementation details:
    - cgroup-bpf class of programs can nest with xdp/tc programs.
    Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
    Other solutions to avoid nested bpf_spin_lock are possible.
    Like making sure that all networking progs run with softirq disabled.
    spin_lock_irqsave is the simplest and doesn't add overhead to the
    programs that don't use it.
    - arch_spinlock_t is used when it's implemented as queued_spin_lock
    - archs can force their own arch_spinlock_t
    - on architectures where queued_spin_lock is not available and
    sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
    - presence of bpf_spin_lock inside map value could have been indicated via
    extra flag during map_create, but specifying it via BTF is cleaner.
    It provides introspection for map key/value and reduces user mistakes.

    Next steps:
    - allow bpf_spin_lock in other map types (like cgroup local storage)
    - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
    to request kernel to grab bpf_spin_lock before rewriting the value.
    That will serialize access to map elements.

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

31 Jan, 2019

1 commit


27 Jan, 2019

3 commits

    This patch adds JIT blinding support for JMP32.

    Like BPF_JMP_REG/IMM, JMP32 versions are needed for building raw bpf insns.
    They are added to both include/linux/filter.h and
    tools/include/linux/filter.h.
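
    For illustration, a raw-insn snippet using one of the new macros (they
    mirror the existing 64-bit BPF_JMP_* macros):

    struct bpf_insn prog[] = {
            /* if ((u32) r1 > 7) goto +2 -- only the low 32 bits are compared */
            BPF_JMP32_IMM(BPF_JGT, BPF_REG_1, 7, 2),
            BPF_MOV64_IMM(BPF_REG_0, 0),
            BPF_EXIT_INSN(),
            BPF_MOV64_IMM(BPF_REG_0, 1),
            BPF_EXIT_INSN(),
    };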

    Reviewed-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Alexei Starovoitov

    Jiong Wang
     
  • This patch implements interpreting new JMP32 instructions.

    Reviewed-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Alexei Starovoitov

    Jiong Wang
     
    This patch teaches the verifier about the new BPF_JMP32 instruction class.
    The verifier needs to treat it similarly to the existing BPF_JMP class.
    A BPF_JMP32 insn needs to go through all checks that have been done on
    BPF_JMP.

    Also, the verifier performs optimizations based on the extra info a
    conditional jump instruction can offer, especially when the comparison is
    between a constant and a register, in which case the value range of the
    register can be improved based on the comparison result. This code is
    updated accordingly.

    Acked-by: Jakub Kicinski
    Signed-off-by: Jiong Wang
    Signed-off-by: Alexei Starovoitov

    Jiong Wang
     

24 Jan, 2019

2 commits

    Instead of overwriting dead code with jmp -1 instructions,
    remove it completely for root. Adjust verifier state and
    line info appropriately.

    v2:
    - adjust func_info (Alexei);
    - make sure first instruction retains line info (Alexei).
    v4: (Yonghong)
    - remove unnecessary if (!insn to remove) checks;
    - always keep last line info if first live instruction lacks one.
    v5: (Martin Lau)
    - improve and clarify comments.

    Signed-off-by: Jakub Kicinski
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Jakub Kicinski
     
    In preparation for code removal, change the parameters of the branch
    and call adjustment functions to be more universal. The
    current parameters assume we are patching a single instruction
    with a longer set.

    A diagram may help in reading the change; this is for the patch-single
    case, patching instruction 1 with a replacement of 4:

    [instruction layout diagram from the original commit message omitted]

    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Jakub Kicinski
     

22 Jan, 2019

1 commit

  • For better performance analysis of BPF programs, this patch introduces
    PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
    load/unload information to user space.

    Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
    The following example shows kernel symbols for a BPF program with 7 sub
    programs:

    ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
    ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
    ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
    ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
    ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
    ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
    ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
    ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi

    When a bpf program is loaded, PERF_RECORD_KSYMBOL is generated for each
    of these sub programs. Therefore, PERF_RECORD_BPF_EVENT is not needed
    for simple profiling.

    For annotation, user space needs to listen to PERF_RECORD_BPF_EVENT and
    gather more information about these (sub)programs via sys_bpf.

    Signed-off-by: Song Liu
    Reviewed-by: Arnaldo Carvalho de Melo
    Acked-by: Alexei Starovoitov
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: Peter Zijlstra
    Cc: kernel-team@fb.com
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190117161521.1341602-4-songliubraving@fb.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Song Liu
     

03 Jan, 2019

2 commits

    Right now we are using the BPF ax register in the JIT for constant blinding as
    well as in the interpreter as a temporary variable. The verifier will not be
    able to use it simply because its use would get overridden by the former in
    bpf_jit_blind_insn(). However, it can be made to work in that blinding
    will be skipped if there is prior use of ax in either the source or destination
    register of the instruction. Taking the constraints of ax into account, the
    verifier is then open to using it in rewrites under some constraints. Note,
    the ax register already has mappings in every eBPF JIT.
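
    The skip condition, roughly, at the top of bpf_jit_blind_insn() (sketch):

    /* AX is used explicitly by a verifier rewrite: do not blind this insn,
     * otherwise the rewrite's scratch value would be clobbered.
     */
    if (from->dst_reg == BPF_REG_AX || from->src_reg == BPF_REG_AX)
            goto out;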

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
    into the hidden ax register. The latter is currently only used in JITs
    for constant blinding as a temporary scratch register, meaning the BPF
    interpreter will never see the use of ax. Therefore it is safe to use
    it for the cases where tmp has been used earlier. This is needed to later
    on allow restricted hidden use of ax in both interpreter and JITs.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

21 Dec, 2018

1 commit


12 Dec, 2018

1 commit

  • Michael and Sandipan report:

    Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
    JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
    and is adjusted again at init time if MODULES_VADDR is defined.

    For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
    the compile-time default at boot-time, which is 0x9c400000 when
    using 64K page size. This overflows the signed 32-bit bpf_jit_limit
    value:

    root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
    -1673527296

    and can cause various unexpected failures throughout the network
    stack. In one case `strace dhclient eth0` reported:

    setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
    16) = -1 ENOTSUPP (Unknown error 524)

    and similar failures can be seen with tools like tcpdump. This doesn't
    always reproduce, however, and I'm not sure why. The more consistent
    failure I've seen is that an Ubuntu 18.04 KVM guest booted on a POWER9
    host would time out on systemd/netplan configuring a virtio-net NIC,
    with no noticeable errors in the logs.

    Given this, and also given that in the near future some architectures like
    arm64 will have a custom area for BPF JIT image allocations, we should
    get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
    4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec(),
    so add another overridable bpf_jit_alloc_exec_limit() helper
    function which returns the possible size of the memory area for deriving
    the default heuristic in bpf_jit_charge_init().

    Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
    bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
    JIT memory provider, and therefore in case archs implement their custom
    module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
    vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
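
    A hedged sketch of the new helper and of deriving the default limit from it
    (the >> 2 heuristic and the LONG_MAX clamp follow my reading of the patch):

    u64 __weak bpf_jit_alloc_exec_limit(void)
    {
    #if defined(MODULES_VADDR)
            return MODULES_END - MODULES_VADDR;
    #else
            return VMALLOC_END - VMALLOC_START;
    #endif
    }

    static int __init bpf_jit_charge_init(void)
    {
            /* Only used as a heuristic to derive the default limit. */
            bpf_jit_limit = min_t(u64, round_up(bpf_jit_alloc_exec_limit() >> 2,
                                                PAGE_SIZE), LONG_MAX);
            return 0;
    }
    pure_initcall(bpf_jit_charge_init);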

    Additionally, for archs supporting large page sizes, we should change
    the sysctl to be handled as a long so as not to run into sysctl restrictions
    in the future.

    Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
    Reported-by: Sandipan Das
    Reported-by: Michael Roth
    Signed-off-by: Daniel Borkmann
    Tested-by: Michael Roth
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

10 Dec, 2018

1 commit

  • This patch adds bpf_line_info support.

    It accepts an array of bpf_line_info objects during BPF_PROG_LOAD.
    The "line_info", "line_info_cnt" and "line_info_rec_size" are added
    to the "union bpf_attr". The "line_info_rec_size" makes
    bpf_line_info extensible in the future.

    The new "check_btf_line()" ensures the userspace line_info is valid
    for the kernel to use.

    When the verifier is translating/patching the bpf_prog (through
    "bpf_patch_insn_single()"), the line_infos' insn_off is also
    adjusted by the newly added "bpf_adj_linfo()".

    If the bpf_prog is jited, this patch also provides the jited addrs (in
    aux->jited_linfo) for the corresponding line_info.insn_off.
    "bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo.
    It is currently called by the x86 jit. Other jits can also use
    "bpf_prog_fill_jited_linfo()"; this will be done in follow-up patches.
    In the future, if deemed necessary, a particular jit could also provide
    its own "bpf_prog_fill_jited_linfo()" implementation.

    A few "*line_info*" fields are added to the bpf_prog_info such
    that the user can get the xlated line_info back (i.e. the line_info
    with its insn_off reflecting the translated prog). The jited_line_info
    is available if the prog is jited. It is an array of __u64.
    If the prog is not jited, jited_line_info_cnt is 0.

    The verifier's verbose log with line_info will be done in
    a follow up patch.

    Signed-off-by: Martin KaFai Lau
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Martin KaFai Lau
     

08 Dec, 2018

1 commit


06 Dec, 2018

1 commit

  • 1) When bpf_dump_raw_ok() == false and the kernel can provide >=1
    func_info to the userspace, the current behavior is setting
    the info.func_info_cnt to 0 instead of setting info.func_info
    to 0.

    It is different from the behavior in jited_func_lens/nr_jited_func_lens,
    jited_ksyms/nr_jited_ksyms...etc.

    This patch fixes it. (i.e. set func_info to 0 instead of
    func_info_cnt to 0 when bpf_dump_raw_ok() == false).

    2) When the userspace passed in info.func_info_cnt == 0, the kernel
    will set the expected func_info size back to the
    info.func_info_rec_size. It is a way for the userspace to learn
    the kernel expected func_info_rec_size introduced in
    commit 838e96904ff3 ("bpf: Introduce bpf_func_info").

    An exception is that the kernel expected size is not set when
    func_info is not available for a bpf_prog. This makes the
    returned info.func_info_rec_size have different values
    depending on the returned value of info.func_info_cnt.

    This patch sets the kernel expected size to info.func_info_rec_size
    independent of the info.func_info_cnt.

    3) The current logic only rejects an invalid func_info_rec_size if
    func_info_cnt is nonzero. This patch also rejects a nonzero
    info.func_info_rec_size that is not equal to the kernel
    expected size.

    4) Set info.btf_id as long as prog->aux->btf != NULL. That makes
    the later copy_to_user() code look the same as the others,
    which is then easier to understand and maintain.

    prog->aux->btf is not NULL only if prog->aux->func_info_cnt > 0.

    Breaking up info.btf_id from prog->aux->func_info_cnt is needed
    for the later line info patch anyway.

    A similar change is made to bpf_get_prog_name().

    Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
    Signed-off-by: Martin KaFai Lau
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Martin KaFai Lau
     

05 Dec, 2018

1 commit

  • By default, BPF uses module_alloc() to allocate executable memory,
    but this is not necessary on all arches and potentially undesirable
    on some of them.

    So break out the module_alloc() and module_memfree() calls into __weak
    functions to allow them to be overridden in arch code.
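
    Roughly, the resulting __weak hooks (sketch):

    void *__weak bpf_jit_alloc_exec(unsigned long size)
    {
            return module_alloc(size);
    }

    void __weak bpf_jit_free_exec(void *addr)
    {
            module_memfree(addr);
    }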

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Daniel Borkmann

    Ard Biesheuvel
     

30 Nov, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    bpf-next 2018-11-30

    The following pull-request contains BPF updates for your *net-next* tree.

    (Getting out bit earlier this time to pull in a dependency from bpf.)

    The main changes are:

    1) Add libbpf ABI versioning and document API naming conventions
    as well as ABI versioning process, from Andrey.

    2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
    programs that is used in conjunction with sk_msg_push_data()
    for adding / removing meta data to the msg data, from John.

    3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
    lib and testsuite build failures on 32 bit, from David.

    4) Make BPF prog dump for !JIT identical to how we dump subprogs
    when JIT is in use, from Yonghong.

    5) Rename btf_get_from_id() to make it more conform with libbpf
    API naming conventions, from Martin.

    6) Add a missing BPF kselftest config item, from Naresh.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller