01 Oct, 2020

2 commits

  • [ Upstream commit ce880cb825fcc22d4e39046a6c3a3a7f6603883d ]

    Running the selftest
    ./test_btf -p
    the kernel produced the following warning:
    [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
    [ 51.529217] Modules linked in:
    [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
    [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
    [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
    ...
    [ 51.542826] Call Trace:
    [ 51.543119] map_seq_next+0x53/0x80
    [ 51.543528] seq_read+0x263/0x400
    [ 51.543932] vfs_read+0xad/0x1c0
    [ 51.544311] ksys_read+0x5f/0xe0
    [ 51.544689] do_syscall_64+0x33/0x40
    [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The related source code in kernel/bpf/hashtab.c:
    709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
    710 {
    711         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
    712         struct hlist_nulls_head *head;
    713         struct htab_elem *l, *next_l;
    714         u32 hash, key_size;
    715         int i = 0;
    716
    717         WARN_ON_ONCE(!rcu_read_lock_held());

    In kernel/bpf/inode.c, the bpffs map pretty print calls map->ops->map_get_next_key()
    without holding rcu_read_lock(), hence causing the above warning.
    To fix the issue, surround the map->ops->map_get_next_key() call with an RCU read lock.
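
    A minimal sketch of the shape of the fix in the bpffs pretty-print path
    (surrounding context abbreviated; local names are illustrative):

        rcu_read_lock();
        err = map->ops->map_get_next_key(map, key, next_key);
        rcu_read_unlock();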

    Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrii Nakryiko
    Cc: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
    Signed-off-by: Sasha Levin

    Yonghong Song
     
  • [ Upstream commit 8a37963c7ac9ecb7f86f8ebda020e3f8d6d7b8a0 ]

    If an element is freed via RCU then recursion into BPF instrumentation
    functions is not a concern. The element is already detached from the map
    and the RCU callback does not hold any locks on which a kprobe, perf event
    or tracepoint attached BPF program could deadlock.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200224145643.259118710@linutronix.de
    Signed-off-by: Sasha Levin

    Thomas Gleixner
     

07 Aug, 2020

1 commit

  • commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream.

    The sockmap code currently ignores the value of attach_bpf_fd when
    detaching a program. This is contrary to the usual behaviour of
    checking that attach_bpf_fd represents the currently attached
    program.

    Ensure that attach_bpf_fd is indeed the currently attached
    program. It turns out that all sockmap selftests already do this,
    which indicates that this is unlikely to cause breakage.
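
    From user space this means the detach call now has to name the attached
    program. A hedged libbpf-based illustration (map fd and attach type depend
    on the actual setup):

        /* Detaching with the fd of the currently attached program succeeds;
         * after this change, passing a different program's fd is rejected.
         */
        err = bpf_prog_detach2(prog_fd, sock_map_fd, BPF_SK_MSG_VERDICT);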

    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Lorenz Bauer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman

    Lorenz Bauer
     

05 Aug, 2020

1 commit

  • [ Upstream commit 1d4e1eab456e1ee92a94987499b211db05f900ea ]

    Fix a HASH_OF_MAPS bug of not putting the inner map pointer on the bpf_map_update_elem()
    operation. This is due to per-cpu extra_elems optimization, which bypassed
    free_htab_elem() logic doing proper clean ups. Make sure that inner map is put
    properly in optimized case as well.

    Fixes: 8c290e60fa2a ("bpf: fix hashmap extra_elems logic")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20200729040913.2815687-1-andriin@fb.com
    Signed-off-by: Sasha Levin

    Andrii Nakryiko
     

16 Jul, 2020

1 commit

  • commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

    When evaluating access control over kallsyms visibility, credentials at
    open() time need to be used, not the "current" creds (though in BPF's
    case, this has likely always been the same). Plumb access to associated
    file->f_cred down through bpf_dump_raw_ok() and its callers now that
    kallsyms_show_value() has been refactored to take struct cred.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: bpf@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

01 Jul, 2020

2 commits

  • [ Upstream commit d8fe449a9c51a37d844ab607e14e2f5c657d3cf2 ]

    Attaching to these hooks can break iptables because its optval is
    usually quite big, or at least bigger than the current PAGE_SIZE limit.
    David also mentioned some SCTP options can be big (around 256k).

    For such optvals we expose only the first PAGE_SIZE bytes to
    the BPF program. The BPF program then has two options (a sketch of
    option 1 follows this list):
    1. Set ctx->optlen to 0 to indicate that the BPF optval should be
    ignored and the kernel should use the original userspace value.
    2. Set ctx->optlen to something smaller than PAGE_SIZE.
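
    A sketch of option 1 from the BPF program side (section name and program
    body are illustrative, not taken from the patch):

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        SEC("cgroup/setsockopt")
        int setsockopt_passthrough(struct bpf_sockopt *ctx)
        {
                /* Do not touch the option at all: setting optlen to 0 tells
                 * the kernel to use the original userspace optval, which is
                 * the safe choice when the option may exceed PAGE_SIZE.
                 */
                ctx->optlen = 0;
                return 1;   /* allow the syscall to proceed */
        }

        char _license[] SEC("license") = "GPL";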

    v5:
    * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
    * update the docs accordingly

    v4:
    * use temporary buffer to avoid optval == optval_end == NULL;
    this removes the corner case in the verifier that might assume
    non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

    v3:
    * don't increase the limit, bypass the argument

    v2:
    * proper comments formatting (Jakub Kicinski)

    Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
    Signed-off-by: Stanislav Fomichev
    Signed-off-by: Alexei Starovoitov
    Cc: David Laight
    Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
    Signed-off-by: Sasha Levin

    Stanislav Fomichev
     
  • [ Upstream commit 99c51064fb06146b3d494b745c947e438a10aaa7 ]

    Syzkaller discovered that creating a hash of type devmap_hash with a large
    number of entries can hit the memory allocator limit for allocating
    contiguous memory regions. There's really no reason to use kmalloc_array()
    directly in the devmap code, so just switch it to the existing
    bpf_map_area_alloc() function that is used elsewhere.
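
    The allocation after the change, roughly (member names follow the devmap
    code; treat this as a sketch rather than the exact hunk):

        dtab->dev_index_head = bpf_map_area_alloc(dtab->n_buckets *
                                                  sizeof(struct hlist_head),
                                                  dtab->map.numa_node);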

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Xiumei Mu
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     

22 Jun, 2020

1 commit

  • [ Upstream commit 1ea0f9120c8ce105ca181b070561df5cbd6bc049 ]

    The map_lookup_and_delete_elem() function should check for both FMODE_CAN_WRITE
    and FMODE_CAN_READ permissions because it returns a map element to user space.
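
    A sketch of the intended permission check in map_lookup_and_delete_elem()
    (local names and the exact error value are assumptions):

        if (!(f.file->f_mode & FMODE_CAN_READ) ||
            !(f.file->f_mode & FMODE_CAN_WRITE)) {
                err = -EPERM;
                goto err_put;
        }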

    Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall")
    Signed-off-by: Anton Protopopov
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200527185700.14658-5-a.s.protopopov@gmail.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Sasha Levin

    Anton Protopopov
     

17 Jun, 2020

1 commit

  • commit 90ceddcb495008ac8ba7a3dce297841efcd7d584 upstream.

    Simplify gen_btf logic to make it work with llvm-objcopy. The existing
    'file format' and 'architecture' parsing logic is brittle and does not
    work with llvm-objcopy/llvm-objdump.

    'file format' output of llvm-objdump>=11 will match GNU objdump, but
    'architecture' (bfdarch) may not.

    .BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag
    because it is part of vmlinux image used for introspection. C code
    can reference the section via linker script defined __start_BTF and
    __stop_BTF. This fixes a small problem that previous .BTF had the
    SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data).

    Additionally, `objcopy -I binary` synthesized symbols
    _binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not
    used elsewhere) are replaced with more commonplace __start_BTF and
    __stop_BTF.

    Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns
    "empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?"

    We use a dd command to change the e_type field in the ELF header from
    ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting
    ET_EXEC as an input file is an extremely rare GNU ld feature that lld
    does not intend to support, because this is error-prone.

    The output section description .BTF in include/asm-generic/vmlinux.lds.h
    avoids potential subtle orphan section placement issues and suppresses
    --orphan-handling=warn warnings.

    Fixes: df786c9b9476 ("bpf: Force .BTF section start to zero when dumping from vmlinux")
    Fixes: cb0cc635c7a9 ("powerpc: Include .BTF section")
    Reported-by: Nathan Chancellor
    Signed-off-by: Fangrui Song
    Signed-off-by: Daniel Borkmann
    Tested-by: Stanislav Fomichev
    Tested-by: Andrii Nakryiko
    Reviewed-by: Stanislav Fomichev
    Reviewed-by: Kees Cook
    Acked-by: Andrii Nakryiko
    Acked-by: Michael Ellerman (powerpc)
    Link: https://github.com/ClangBuiltLinux/linux/issues/871
    Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com
    Signed-off-by: Maria Teguiani
    Tested-by: Matthias Maennich
    Signed-off-by: Greg Kroah-Hartman

    Fangrui Song
     

20 May, 2020

1 commit

  • [ Upstream commit 7f645462ca01d01abb94d75e6768c8b3ed3a188b ]

    Return the negative error code -EFAULT from the copy_to_user() error
    handling path instead of 0, as is done elsewhere in this function.
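
    The usual error-handling pattern being restored, sketched (names are
    illustrative, not the exact hunk):

        if (copy_to_user(uvalue, value, value_size) != 0) {
                err = -EFAULT;
                goto free_value;
        }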

    Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall")
    Signed-off-by: Wei Yongjun
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200430081851.166996-1-weiyongjun1@huawei.com
    Signed-off-by: Sasha Levin

    Wei Yongjun
     

02 May, 2020

2 commits

  • commit bc23d0e3f717ced21fbfacab3ab887d55e5ba367 upstream.

    When the kernel is built with CONFIG_DEBUG_PER_CPU_MAPS, the cpumap code
    can trigger a spurious warning if CONFIG_CPUMASK_OFFSTACK is also set. This
    happens because in this configuration, NR_CPUS can be larger than
    nr_cpumask_bits, so the initial check in cpu_map_alloc() is not sufficient
    to guard against hitting the warning in cpumask_check().

    Fix this by explicitly checking the supplied key against the
    nr_cpumask_bits variable before calling cpu_possible().
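
    The added guard, roughly (variable name and error value are illustrative):

        if (key_cpu >= nr_cpumask_bits || !cpu_possible(key_cpu))
                return -ENODEV;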

    Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
    Reported-by: Xiumei Mu
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Tested-by: Xiumei Mu
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20200416083120.453718-1-toke@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     
  • commit 6e7e63cbb023976d828cdb22422606bf77baa8a9 upstream.

    When check_xadd() verifies an XADD operation on a pointer to a stack slot
    containing a spilled pointer, check_stack_read() verifies that the read,
    which is part of XADD, is valid. However, since the placeholder value -1 is
    passed as `value_regno`, check_stack_read() can only return a binary
    decision and can't return the type of the value that was read. The intent
    here is to verify whether the value read from the stack slot may be used as
    a SCALAR_VALUE; but since check_stack_read() doesn't check the type, and
    the type information is lost when check_stack_read() returns, this is not
    enforced, and a malicious user can abuse XADD to leak spilled kernel
    pointers.

    Fix it by letting check_stack_read() verify that the value is usable as a
    SCALAR_VALUE if no type information is passed to the caller.

    To be able to use __is_pointer_value() in check_stack_read(), move it up.

    Fix up the expected unprivileged error message for a BPF selftest that,
    until now, assumed that unprivileged users can use XADD on stack-spilled
    pointers. This also gives us a test for the behavior introduced in this
    patch for free.

    In theory, this could also be fixed by forbidding XADD on stack spills
    entirely, since XADD is a locked operation (for operations on memory with
    concurrency) and there can't be any concurrency on the BPF stack; but
    Alexei has said that he wants to keep XADD on stack slots working to avoid
    changes to the test suite [1].

    The following BPF program demonstrates how to leak a BPF map pointer as an
    unprivileged user using this bug:

    // r7 = map_pointer
    BPF_LD_MAP_FD(BPF_REG_7, small_map),
    // r8 = launder(map_pointer)
    BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -8),
    BPF_MOV64_IMM(BPF_REG_1, 0),
    ((struct bpf_insn) {
            .code = BPF_STX | BPF_DW | BPF_XADD,
            .dst_reg = BPF_REG_FP,
            .src_reg = BPF_REG_1,
            .off = -8
    }),
    BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_FP, -8),

    // store r8 into map
    BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_7),
    BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP),
    BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -4),
    BPF_ST_MEM(BPF_W, BPF_REG_ARG2, 0, 0),
    BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
    BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
    BPF_EXIT_INSN(),
    BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_8, 0),

    BPF_MOV64_IMM(BPF_REG_0, 0),
    BPF_EXIT_INSN()

    [1] https://lore.kernel.org/bpf/20200416211116.qxqcza5vo2ddnkdq@ast-mbp.dhcp.thefacebook.com/

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200417000007.10734-1-jannh@google.com
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

23 Apr, 2020

1 commit

  • [ no upstream commit ]

    See the gory details in 100605035e15 ("bpf: Verifier, do_refine_retval_range
    may clamp umin to 0 incorrectly") for why 849fa50662fb ("bpf/verifier: refine
    retval R0 state for bpf_get_stack helper") is buggy. The whole series, however,
    is not suitable for stable since it adds a significant amount [0] of verifier
    complexity in order to add 32-bit subreg tracking. Something simpler is needed.

    Unfortunately, reverting 849fa50662fb ("bpf/verifier: refine retval R0 state
    for bpf_get_stack helper") or just cherry-picking 100605035e15 ("bpf: Verifier,
    do_refine_retval_range may clamp umin to 0 incorrectly") is not an option since
    it will break existing tracing programs badly (at least those that are using
    bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is
    also not an option since on 4.19 kernels an error will cause a soft-lockup due
    to hitting a dead-code sanitized branch, since we don't hard-wire such branches
    in old kernels yet. But even then, for 5.x, 849fa50662fb ("bpf/verifier: refine
    retval R0 state for bpf_get_stack helper") would cause wrong bounds in the
    verifier simulation when an error is hit.

    In one of the earlier iterations of mentioned patch series for upstream there
    was the concern that just using smax_value in do_refine_retval_range() would
    nuke bounds by subsequent <<32 and >>32 shifts before the comparison against 0 [1]
    which eventually led to the 32bit subreg tracking in the first place. While I
    initially went for implementing the idea [1] to pattern match the two shift
    operations, it turned out to be more complex than actually needed, meaning, we
    could simply treat do_refine_retval_range() similarly to how we branch off
    verification for conditionals or under speculation, that is, pushing a new
    reg state to the stack for later verification. This means, instead of verifying
    the current path with the ret_reg in [S32MIN, msize_max_value] interval where
    later bounds would get nuked, we split this into two: i) for the success case
    where ret_reg can be in [0, msize_max_value], and ii) for the error case with
    ret_reg known to be in interval [S32MIN, -1]. Latter will preserve the bounds
    during these shift patterns and can match reg < 0 test. test_progs also succeed
    with this approach.

    [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/
    [1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d

    Fixes: 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
    Reported-by: Lorenzo Fontana
    Reported-by: Leonardo Di Donato
    Reported-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Tested-by: John Fastabend
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

17 Apr, 2020

1 commit

  • [ Upstream commit 604dca5e3af1db98bd123b7bfc02b017af99e3a0 ]

    The BPF verifier tried to track values based on 32-bit comparisons by
    (ab)using the tnum state via 581738a681b6 ("bpf: Provide better register
    bounds after jmp32 instructions"). The idea is that after a check like
    this:

    if ((u32)r0 > 3)
            exit

    We can't meaningfully constrain the arithmetic-range-based tracking, but
    we can update the tnum state to (value=0,mask=0xffff'ffff'0000'0003).
    However, the implementation from 581738a681b6 didn't compute the tnum
    constraint based on the fixed operand, but instead derives it from the
    arithmetic-range-based tracking. This means that after the following
    sequence of operations:

    if (r0 >= 0x1'0000'0001)
            exit
    if ((u32)r0 > 7)
            exit

    The verifier assumed that the lower half of r0 is in the range (0, 0)
    and applied the tnum constraint (value=0,mask=0xffff'ffff'0000'0000), thus
    causing the overall tnum to be (value=0,mask=0x1'0000'0000), which was
    incorrect. Provide a fixed implementation.

    Fixes: 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions")
    Signed-off-by: Jann Horn
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200330160324.15259-3-daniel@iogearbox.net
    Signed-off-by: Sasha Levin

    Jann Horn
     

02 Apr, 2020

2 commits

  • commit 5c6f25887963f15492b604dd25cb149c501bbabf upstream.

    Trying to initialize a structure with "= {};" will not always clean out
    all padding locations in a structure. So be explicit and call memset to
    initialize everything for a number of bpf information structures that
    are then copied from userspace, sometimes from smaller memory locations
    than the size of the structure.
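
    The pattern the patch applies, sketched for one of the affected structures
    (exact call sites vary):

        struct bpf_prog_info info;
        u32 info_len = attr->info.info_len;

        memset(&info, 0, sizeof(info));   /* clear padding bytes explicitly */
        if (copy_from_user(&info, uinfo, min_t(u32, info_len, sizeof(info))))
                return -EFAULT;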

    Reported-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • commit 8096f229421f7b22433775e928d506f0342e5907 upstream.

    For the bpf syscall, we are relying on the compiler to properly zero out
    the bpf_attr union that we copy userspace data into. Unfortunately that
    doesn't always work properly, padding and other oddities might not be
    correctly zeroed, and in some tests odd things have been found when the
    stack is pre-initialized to other values.

    Fix this by explicitly memsetting the structure to 0 before using it.

    Reported-by: Maciej Żenczykowski
    Reported-by: John Stultz
    Reported-by: Alexander Potapenko
    Reported-by: Alistair Delva
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
    Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

01 Apr, 2020

4 commits

  • commit f2d67fec0b43edce8c416101cdc52e71145b5fef upstream.

    Anatoly has been fuzzing with kBdysch harness and reported a hang in
    one of the outcomes:

    0: (b7) r0 = 808464432
    1: (7f) r0 >>= r0
    2: (14) w0 -= 808464432
    3: (07) r0 += 808464432
    4: (b7) r1 = 808464432
    5: (de) if w1 s<= w0 goto pc+0
    6: (07) r0 += -2144337872
    7: (14) w0 -= -1607454672
    8: (25) if r0 > 0x30303030 goto pc+0
    R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
    9: (76) if w0 s>= 0x303030 goto pc+2
    12: (95) exit

    from 8 to 9: safe

    from 5 to 6: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0
    6: (07) r0 += -2144337872
    7: (14) w0 -= -1607454672
    8: (25) if r0 > 0x30303030 goto pc+0
    R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
    9: safe

    from 8 to 9: safe
    verification time 589 usec
    stack depth 0
    processed 17 insns (limit 1000000) [...]

    The underlying program was xlated as follows:

    # bpftool p d x i 9
    0: (b7) r0 = 808464432
    1: (7f) r0 >>= r0
    2: (14) w0 -= 808464432
    3: (07) r0 += 808464432
    4: (b7) r1 = 808464432
    5: (de) if w1 s<= w0 goto pc+0
    6: (07) r0 += -2144337872
    7: (14) w0 -= -1607454672
    8: (25) if r0 > 0x30303030 goto pc+0
    9: (76) if w0 s>= 0x303030 goto pc+2
    10: (05) goto pc-1
    11: (05) goto pc-1
    12: (95) exit

    The verifier rewrote original instructions it recognized as dead code with
    'goto pc-1', but reality differs from verifier simulation in that we're
    actually able to trigger a hang due to hitting the 'goto pc-1' instructions.

    Taking different examples to make the issue more obvious: in this example
    we're probing bounds on a completely unknown scalar variable in r1:

    [...]
    5: R0_w=inv1 R1_w=inv(id=0) R10=fp0
    5: (18) r2 = 0x4000000000
    7: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R10=fp0
    7: (18) r3 = 0x2000000000
    9: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R10=fp0
    9: (18) r4 = 0x400
    11: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R10=fp0
    11: (18) r5 = 0x200
    13: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
    13: (2d) if r1 > r2 goto pc+4
    R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
    14: R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
    14: (ad) if r1 < r3 goto pc+3
    R0_w=inv1 R1_w=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
    15: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
    15: (2e) if w1 > w4 goto pc+2
    R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
    16: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
    16: (ae) if w1 < w5 goto pc+1
    R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
    [...]

    We're first probing lower/upper bounds via jmp64, later we do a similar
    check via jmp32 and examine the resulting var_off there. After fall-through
    in insn 14, we get the following bounded r1 with 0x7fffffffff unknown marked
    bits in the variable section.

    Thus, after knowing r1 = 0x2000000000:

    max: 0b100000000000000000000000000000000000000 / 0x4000000000
    var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
    min: 0b010000000000000000000000000000000000000 / 0x2000000000

    Now, in insn 15 and 16, we perform a similar probe with lower/upper bounds
    in jmp32.

    Thus, after knowing r1 = 0x2000000000 and
    w1 = 0x200:

    max: 0b100000000000000000000000000000000000000 / 0x4000000000
    var: 0b111111100000000000000000000000000000000 / 0x7f00000000
    min: 0b010000000000000000000000000000000000000 / 0x2000000000

    The lower/upper bounds haven't changed since they have high bits set in
    u64 space and the jmp32 tests can only refine bounds in the low bits.

    However, for the var part the expectation would have been 0x7f000007ff
    or something less precise up to 0x7fffffffff. An outcome of 0x7f00000000
    is not correct since it would contradict the earlier probed bounds
    where we know that the result should have been in [0x200,0x400] in u32
    space. Therefore, tests with such info will lead to wrong verifier
    assumptions later on like falsely predicting conditional jumps to be
    always taken, etc.

    The issue here is that __reg_bound_offset32()'s implementation from
    commit 581738a681b6 ("bpf: Provide better register bounds after jmp32
    instructions") makes an incorrect range assumption:

    static void __reg_bound_offset32(struct bpf_reg_state *reg)
    {
            u64 mask = 0xffffFFFF;
            struct tnum range = tnum_range(reg->umin_value & mask,
                                           reg->umax_value & mask);
            struct tnum lo32 = tnum_cast(reg->var_off, 4);
            struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);

            reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
    }

    In the above walk-through example, __reg_bound_offset32() as-is chose
    a range after masking with 0xffffffff of [0x0,0x0] since umin:0x2000000000
    and umax:0x4000000000 and therefore the lo32 part was clamped to 0x0 as
    well. However, in the umin:0x2000000000 and umax:0x4000000000 range above
    we'd end up with an actual possible interval of [0x0,0xffffffff] for u32
    space instead.

    In case of the original reproducer, the situation looked as follows at
    insn 5 for r0:

    [...]
    5: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0
    0x30303030 0x13030302f
    5: (de) if w1 s<= w0 goto pc+0
    [...]

            lo32 = tnum_cast(reg->var_off, 4);
            hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);
            reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
    }

    In the case of the concrete example, this gives us a more conservative unknown
    section. Thus, after knowing r1 = 0x2000000000 and
    w1 = 0x200:

    max: 0b100000000000000000000000000000000000000 / 0x4000000000
    var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
    min: 0b010000000000000000000000000000000000000 / 0x2000000000

    However, above new __reg_bound_offset32() has no effect on refining the
    knowledge of the register contents. Meaning, if the bounds in hi32 range
    mismatch we'll get the identity function given the range reg spans
    [0x0,0xffffffff] and we cast var_off into lo32 only to later on binary
    or it again with the hi32.

    Likewise, if the bounds in hi32 range match, then we mask both bounds
    with 0xffffffff, use the resulting umin/umax for the range to later
    intersect the lo32 with it. However, _prior_ called __reg_bound_offset()
    did already such intersection on the full reg and we therefore would only
    repeat the same operation on the lo32 part twice.

    Given this has no effect and the original commit had false assumptions,
    this patch reverts the code entirely, which is also more straightforward
    for stable trees: apparently 581738a681b6 got auto-selected by Sasha's
    ML system and misclassified as a fix, so it got sucked into v5.4 where
    it should never have landed. A revert is low-risk also from a user PoV
    since it requires a recent kernel and llc to opt-into -mcpu=v3 BPF CPU
    to generate jmp32 instructions. A proper bounds refinement would need a
    significantly more complex approach which is currently being worked, but
    no stable material [0]. Hence revert is best option for stable. After the
    revert, the original reported program gets rejected as follows:

    1: (7f) r0 >>= r0
    2: (14) w0 -= 808464432
    3: (07) r0 += 808464432
    4: (b7) r1 = 808464432
    5: (de) if w1 s<= w0 goto pc+0
    6: (07) r0 += -2144337872
    7: (14) w0 -= -1607454672
    8: (25) if r0 > 0x30303030 goto pc+0
    R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x3fffffff)) R1_w=invP808464432 R10=fp0
    9: (76) if w0 s>= 0x303030 goto pc+2
    R0=invP(id=0,umax_value=3158063,var_off=(0x0; 0x3fffff)) R1=invP808464432 R10=fp0
    10: (30) r0 = *(u8 *)skb[808464432]
    BPF_LD_[ABS|IND] uses reserved fields
    processed 11 insns (limit 1000000) [...]

    [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/T/

    Fixes: 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions")
    Reported-by: Anatoly Trosinenko
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200330160324.15259-2-daniel@iogearbox.net
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • commit da6c7faeb103c493e505e87643272f70be586635 upstream.

    btf_enum_check_member() would unconditionally treat the size of "enum"
    type members in a struct/union as the size of "int", even if the member's
    size was packed.

    This patch fixes BTF enum verification to use the correct size
    of member in BPF programs.

    Fixes: 179cde8cef7e ("bpf: btf: Check members of struct/union")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/1583825550-18606-2-git-send-email-komachi.yoshiki@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Yoshiki Komachi
     
  • commit 62039c30c19dcab96621e074aeeb90da7100def7 upstream.

    Local storage array isn't initialized, so if cgroup storage allocation fails
    for BPF_CGROUP_STORAGE_SHARED, error handling code will attempt to free
    uninitialized pointer for BPF_CGROUP_STORAGE_PERCPU storage type. Avoid this
    by always initializing storage pointers to NULLs.
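
    The defensive initialization, sketched (the array follows the cgroup
    storage definitions; the exact location of the hunk is assumed):

        struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { NULL };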

    Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200309222756.1018737-1-andriin@fb.com
    Signed-off-by: Greg Kroah-Hartman

    Andrii Nakryiko
     
  • commit 1d8006abaab4cb90f81add86e8d1bf9411add05a upstream.

    There is no compensating cgroup_bpf_put() for each ancestor cgroup in
    cgroup_bpf_inherit(). If compute_effective_progs returns error, those cgroups
    won't be freed ever. Fix it by putting them in cleanup code path.

    Fixes: e10360f815ca ("bpf: cgroup: prevent out-of-order release of cgroup bpf")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Roman Gushchin
    Link: https://lore.kernel.org/bpf/20200309224017.1063297-1-andriin@fb.com
    Signed-off-by: Greg Kroah-Hartman

    Andrii Nakryiko
     

29 Feb, 2020

1 commit

  • commit e20d3a055a457a10a4c748ce5b7c2ed3173a1324 upstream.

    This if guards whether user space wants a copy of the offload-jited
    bytecode and whether this bytecode exists. Erroneously doing a bitwise
    AND instead of a logical AND on the user- and kernel-space buffer sizes
    can lead to no data being copied to user space, especially when the
    user-space size is a power of two and bigger than the kernel-space buffer.
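
    The gist of the bug, shown in isolation (field names per struct
    bpf_prog_info; the surrounding copy_to_user() logic is omitted):

        /* buggy: bitwise AND of the two buffer lengths */
        if (info->jited_prog_len & ulen) {
                /* copy jited bytecode to user space */
        }

        /* fixed: logical AND, copy whenever both lengths are non-zero */
        if (info->jited_prog_len && ulen) {
                /* copy jited bytecode to user space */
        }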

    Fixes: fcfb126defda ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
    Signed-off-by: Johannes Krude
    Signed-off-by: Daniel Borkmann
    Acked-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
    Signed-off-by: Greg Kroah-Hartman

    Johannes Krude
     

24 Feb, 2020

1 commit

  • [ Upstream commit 90435a7891a2259b0f74c5a1bc5600d0d64cba8f ]

    If a seq_file .next function does not change the position index,
    a read after some lseek can generate unexpected output.
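
    The expected shape of a well-behaved seq_file .next callback, sketched
    (function and helper names are illustrative only):

        static void *map_seq_next(struct seq_file *m, void *v, loff_t *pos)
        {
                (*pos)++;                        /* always advance the index */
                return map_seq_find_next(m, v);  /* hypothetical lookup step */
        }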

    See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283

    v1 -> v2: removed missed increment in end of function

    Signed-off-by: Vasily Averin
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com
    Signed-off-by: Sasha Levin

    Vasily Averin
     

11 Feb, 2020

1 commit

  • commit 485ec2ea9cf556e9c120e07961b7b459d776a115 upstream.

    head is traversed using hlist_for_each_entry_rcu outside an RCU
    read-side critical section but under the protection of dtab->index_lock.

    Hence, add corresponding lockdep expression to silence false-positive
    lockdep warnings, and harden RCU lists.
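
    What such a lockdep annotation looks like for this traversal (the extra
    condition argument of hlist_for_each_entry_rcu() is the mechanism; member
    names are assumed from the devmap code):

        hlist_for_each_entry_rcu(dev, head, index_hlist,
                                 lockdep_is_held(&dtab->index_lock)) {
                /* lookup by hashed ifindex */
        }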

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Signed-off-by: Amol Grover
    Signed-off-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200123120437.26506-1-frextrite@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Amol Grover
     

26 Jan, 2020

1 commit

  • [ Upstream commit 071cdecec57fb5d5df78e6a12114ad7bccea5b0e ]

    Tetsuo pointed out that it was not only the device unregister hook that was
    broken for devmap_hash types, it was also cleanup on map free. So better
    fix this as well.

    While we're at it, there's no reason to allocate the netdev_map array for
    DEVMAP_HASH, so skip that and adjust the cost accordingly.

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20191121133612.430414-1-toke@redhat.com
    Signed-off-by: Sasha Levin

    Toke Høiland-Jørgensen
     

23 Jan, 2020

1 commit

  • commit 0af2ffc93a4b50948f9dad2786b7f1bd253bf0b9 upstream.

    Anatoly has been fuzzing with kBdysch harness and reported a hang in one
    of the outcomes:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (85) call bpf_get_socket_cookie#46
    1: R0_w=invP(id=0) R10=fp0
    1: (57) r0 &= 808464432
    2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    2: (14) w0 -= 810299440
    3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
    3: (c4) w0 s>>= 1
    4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
    4: (76) if w0 s>= 0x30303030 goto pc+216
    221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
    221: (95) exit
    processed 6 insns (limit 1000000) [...]

    Taking a closer look, the program was xlated as follows:

    # ./bpftool p d x i 12
    0: (85) call bpf_get_socket_cookie#7800896
    1: (bf) r6 = r0
    2: (57) r6 &= 808464432
    3: (14) w6 -= 810299440
    4: (c4) w6 s>>= 1
    5: (76) if w6 s>= 0x30303030 goto pc+216
    6: (05) goto pc-1
    7: (05) goto pc-1
    8: (05) goto pc-1
    [...]
    220: (05) goto pc-1
    221: (05) goto pc-1
    222: (95) exit

    Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix
    precision tracking for unbounded scalars"), that is, the fall-through
    branch in the instruction 5 is considered to be never taken given the
    conclusion from the min/max bounds tracking in w6, and therefore the
    dead-code sanitation rewrites it as goto pc-1. However, real-life input
    disagrees with verification analysis since a soft-lockup was observed.

    The bug sits in the analysis of the ARSH. The definition is that we shift
    the target register value right by K bits through shifting in copies of
    its sign bit. In adjust_scalar_min_max_vals(), we first coerce the
    register into 32 bit mode, and the same happens after simulating the operation.
    However, for the case of simulating the actual ARSH, we don't take the
    mode into account and act as if it's always 64 bit, but the location of the
    sign bit is different:

    dst_reg->smin_value >>= umin_val;
    dst_reg->smax_value >>= umin_val;
    dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

    Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
    for example return 0xffff. With the above ARSH simulation, we'd see the
    following results:

    [...]
    1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
    1: (85) call bpf_get_socket_cookie#46
    2: R0_w=invP(id=0) R10=fp0
    2: (57) r0 &= 808464432
    -> R0_runtime = 0x3030
    3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    3: (14) w0 -= 810299440
    -> R0_runtime = 0xcfb40000
    4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
    (0xffffffff)
    4: (c4) w0 s>>= 1
    -> R0_runtime = 0xe7da0000
    5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
    (0x67c00000) (0x7ffbfff8)
    [...]

    In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
    0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that
    is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
    retained in 32 bit mode. In insn 4, the umax was 0xffffffff, and changed into
    0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000'
    and means here that the simulation didn't retain the sign bit. With above
    logic, the updates happen on the 64 bit min/max bounds and given we coerced
    the register, the sign bits of the bounds are cleared as well, meaning, we
    need to force the simulation into s32 space for 32 bit alu mode.
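
    A small standalone illustration of the difference (plain C with
    kernel-style types; this is not the verifier code itself):

        u32 w0 = 0xcfb40000;               /* runtime value before the shift  */
        u64 r0 = w0;                       /* zero-extended 64-bit view       */
        u32 arsh32 = (u32)((s32)w0 >> 1);  /* 0xe7da0000: bit 31 acts as sign */
        u32 arsh64 = (u32)((s64)r0 >> 1);  /* 0x67da0000: bit 31 is not sign  */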

    Verification after the fix below. We're first analyzing the fall-through branch
    on 32 bit signed >= test eventually leading to rejection of the program in this
    specific case:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (b7) r2 = 808464432
    1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
    1: (85) call bpf_get_socket_cookie#46
    2: R0_w=invP(id=0) R10=fp0
    2: (bf) r6 = r0
    3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
    3: (57) r6 &= 808464432
    4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
    4: (14) w6 -= 810299440
    5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
    5: (c4) w6 s>>= 1
    6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
    (0x67c00000) (0xfffbfff8)
    6: (76) if w6 s>= 0x30303030 goto pc+216
    7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
    7: (30) r0 = *(u8 *)skb[808464432]
    BPF_LD_[ABS|IND] uses reserved fields
    processed 8 insns (limit 1000000) [...]

    Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
    Reported-by: Anatoly Trosinenko
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

18 Jan, 2020

1 commit

  • commit e10360f815ca6367357b2c2cfef17fc663e50f7b upstream.

    Before commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
    cgroup bpf structures were released with
    corresponding cgroup structures. It guaranteed the hierarchical order
    of destruction: children were always first. It preserved attached
    programs from being released before their propagated copies.

    But with cgroup auto-detachment there are no such guarantees anymore:
    cgroup bpf is released as soon as the cgroup is offline and there are
    no live associated sockets. It means that an attached program can be
    detached and released, while its propagated copy is still living
    in the cgroup subtree. This will obviously lead to a use-after-free
    bug.

    To reproduce the issue the following script can be used:

    #!/bin/bash

    CGROOT=/sys/fs/cgroup

    mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
    sleep 1

    ./test_cgrp2_attach ${CGROOT}/A egress &
    A_PID=$!
    ./test_cgrp2_attach ${CGROOT}/B egress &
    B_PID=$!

    echo $$ > ${CGROOT}/A/C/cgroup.procs
    iperf -s &
    S_PID=$!
    iperf -c localhost -t 100 &
    C_PID=$!

    sleep 1

    echo $$ > ${CGROOT}/B/cgroup.procs
    echo ${S_PID} > ${CGROOT}/B/cgroup.procs
    echo ${C_PID} > ${CGROOT}/B/cgroup.procs

    sleep 1

    rmdir ${CGROOT}/A/C
    rmdir ${CGROOT}/A

    sleep 1

    kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}

    On the unpatched kernel the following stacktrace can be obtained:

    [ 33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
    [ 33.620677] #PF: supervisor read access in kernel mode
    [ 33.621293] #PF: error_code(0x0000) - not-present page
    [ 33.622754] Oops: 0000 [#1] SMP NOPTI
    [ 33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
    [ 33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
    [ 33.635809] Call Trace:
    [ 33.636118] ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
    [ 33.636728] ? __switch_to_asm+0x40/0x70
    [ 33.637196] ip_finish_output+0x68/0xa0
    [ 33.637654] ip_output+0x76/0xf0
    [ 33.638046] ? __ip_finish_output+0x1c0/0x1c0
    [ 33.638576] __ip_queue_xmit+0x157/0x410
    [ 33.639049] __tcp_transmit_skb+0x535/0xaf0
    [ 33.639557] tcp_write_xmit+0x378/0x1190
    [ 33.640049] ? _copy_from_iter_full+0x8d/0x260
    [ 33.640592] tcp_sendmsg_locked+0x2a2/0xdc0
    [ 33.641098] ? sock_has_perm+0x10/0xa0
    [ 33.641574] tcp_sendmsg+0x28/0x40
    [ 33.641985] sock_sendmsg+0x57/0x60
    [ 33.642411] sock_write_iter+0x97/0x100
    [ 33.642876] new_sync_write+0x1b6/0x1d0
    [ 33.643339] vfs_write+0xb6/0x1a0
    [ 33.643752] ksys_write+0xa7/0xe0
    [ 33.644156] do_syscall_64+0x5b/0x1b0
    [ 33.644605] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fix this by grabbing a reference to the bpf structure of each ancestor
    on the initialization of the cgroup bpf structure, and dropping the
    reference at the end of releasing the cgroup bpf structure.

    This will restore the hierarchical order of cgroup bpf releasing,
    without adding any operations on hot paths.

    Thanks to Josef Bacik for the debugging and the initial analysis of
    the problem.

    Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
    Reported-by: Josef Bacik
    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Roman Gushchin
     

12 Jan, 2020

1 commit

  • commit 6d4f151acf9a4f6fab09b615f246c717ddedcf0c upstream.

    Anatoly has been fuzzing with kBdysch harness and reported a KASAN
    slab oob in one of the outcomes:

    [...]
    [ 77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
    [ 77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
    [ 77.361119]
    [ 77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
    [ 77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    [ 77.362984] Call Trace:
    [ 77.363249] dump_stack+0x97/0xe0
    [ 77.363603] print_address_description.constprop.0+0x1d/0x220
    [ 77.364251] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
    [ 77.365030] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
    [ 77.365860] __kasan_report.cold+0x37/0x7b
    [ 77.366365] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
    [ 77.366940] kasan_report+0xe/0x20
    [ 77.367295] bpf_skb_load_helper_8_no_cache+0x71/0x130
    [ 77.367821] ? bpf_skb_load_helper_8+0xf0/0xf0
    [ 77.368278] ? mark_lock+0xa3/0x9b0
    [ 77.368641] ? kvm_sched_clock_read+0x14/0x30
    [ 77.369096] ? sched_clock+0x5/0x10
    [ 77.369460] ? sched_clock_cpu+0x18/0x110
    [ 77.369876] ? bpf_skb_load_helper_8+0xf0/0xf0
    [ 77.370330] ___bpf_prog_run+0x16c0/0x28f0
    [ 77.370755] __bpf_prog_run32+0x83/0xc0
    [ 77.371153] ? __bpf_prog_run64+0xc0/0xc0
    [ 77.371568] ? match_held_lock+0x1b/0x230
    [ 77.371984] ? rcu_read_lock_held+0xa1/0xb0
    [ 77.372416] ? rcu_is_watching+0x34/0x50
    [ 77.372826] sk_filter_trim_cap+0x17c/0x4d0
    [ 77.373259] ? sock_kzfree_s+0x40/0x40
    [ 77.373648] ? __get_filter+0x150/0x150
    [ 77.374059] ? skb_copy_datagram_from_iter+0x80/0x280
    [ 77.374581] ? do_raw_spin_unlock+0xa5/0x140
    [ 77.375025] unix_dgram_sendmsg+0x33a/0xa70
    [ 77.375459] ? do_raw_spin_lock+0x1d0/0x1d0
    [ 77.375893] ? unix_peer_get+0xa0/0xa0
    [ 77.376287] ? __fget_light+0xa4/0xf0
    [ 77.376670] __sys_sendto+0x265/0x280
    [ 77.377056] ? __ia32_sys_getpeername+0x50/0x50
    [ 77.377523] ? lock_downgrade+0x350/0x350
    [ 77.377940] ? __sys_setsockopt+0x2a6/0x2c0
    [ 77.378374] ? sock_read_iter+0x240/0x240
    [ 77.378789] ? __sys_socketpair+0x22a/0x300
    [ 77.379221] ? __ia32_sys_socket+0x50/0x50
    [ 77.379649] ? mark_held_locks+0x1d/0x90
    [ 77.380059] ? trace_hardirqs_on_thunk+0x1a/0x1c
    [ 77.380536] __x64_sys_sendto+0x74/0x90
    [ 77.380938] do_syscall_64+0x68/0x2a0
    [ 77.381324] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 77.381878] RIP: 0033:0x44c070
    [...]

    After further debugging, it turns out that while we disallow passing a modified
    ctx in the case of other helper functions, the special case of the ld/abs/ind
    instruction, which has similar semantics (except r6 being the ctx argument), is
    missing such a check. A modified ctx is impossible here as
    bpf_skb_load_helper_8_no_cache() and others are expecting skb fields in their
    original position; hence, add check_ctx_reg() to reject any modified ctx. The
    issue was first introduced back in f1174f77b50c ("bpf/verifier: rework value
    tracking").
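
    The added check, roughly (the placement inside the ld/abs/ind verification
    path and the local names are assumptions):

        err = check_ctx_reg(env, &regs[ctx_reg], ctx_reg);
        if (err < 0)
                return err;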

    Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
    Reported-by: Anatoly Trosinenko
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

09 Jan, 2020

1 commit

  • commit f54c7898ed1c3c9331376c0337a5049c38f66497 upstream.

    Anatoly has been fuzzing with kBdysch harness and reported a hang in one
    of the outcomes. Upon closer analysis, it turns out that precise scalar
    value tracking is missing a few precision markings for unknown scalars:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (b7) r0 = 0
    1: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    1: (35) if r0 >= 0xf72e goto pc+0
    --> only follow fallthrough
    2: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    2: (35) if r0 >= 0x80fe0000 goto pc+0
    --> only follow fallthrough
    3: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    3: (14) w0 -= -536870912
    4: R0_w=invP536870912 R1=ctx(id=0,off=0,imm=0) R10=fp0
    4: (0f) r1 += r0
    5: R0_w=invP536870912 R1_w=inv(id=0) R10=fp0
    5: (55) if r1 != 0x104c1500 goto pc+0
    --> push other branch for later analysis
    R0_w=invP536870912 R1_w=inv273421568 R10=fp0
    6: R0_w=invP536870912 R1_w=inv273421568 R10=fp0
    6: (b7) r0 = 0
    7: R0=invP0 R1=inv273421568 R10=fp0
    7: (76) if w1 s>= 0xffffff00 goto pc+3
    --> only follow goto
    11: R0=invP0 R1=inv273421568 R10=fp0
    11: (95) exit
    6: R0_w=invP536870912 R1_w=inv(id=0) R10=fp0
    6: (b7) r0 = 0
    propagating r0
    7: safe
    processed 11 insns [...]

    In the analysis of the second path coming after the successful exit above,
    the path is being pruned at line 7. Pruning analysis found that both r0 are
    precise P0 and both R1 are non-precise scalars and given prior path with
    R1 as non-precise scalar succeeded, this one is therefore safe as well.

    However, problem is that given condition at insn 7 in the first run, we only
    followed goto and didn't push the other branch for later analysis, we've
    never walked the few insns in there and therefore dead-code sanitation
    rewrites it as goto pc-1, causing the hang depending on the skb address
    hitting these conditions. The issue is that R1 should have been marked as
    precise as well, such that pruning enforces the range check and concludes that
    the new R1 is not in the range of the old R1. In insn 4, we mark R1 (skb) as an
    unknown scalar via __mark_reg_unbounded() but not mark_reg_unbounded(), and
    therefore regs->precise remains false.

    Back in b5dc0163d8fd ("bpf: precise scalar_value tracking"), this was not
    the case since marking out of __mark_reg_unbounded() had this covered as well.
    Once both are set as precise in insn 4 as they should have been, we conclude
    that given R1 was 0x104c1500 in the prior fall-through path and is now completely
    unknown, the check at insn 7 concludes that we need to continue walking.
    Analysis after the fix:

    0: R1=ctx(id=0,off=0,imm=0) R10=fp0
    0: (b7) r0 = 0
    1: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    1: (35) if r0 >= 0xf72e goto pc+0
    2: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    2: (35) if r0 >= 0x80fe0000 goto pc+0
    3: R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R10=fp0
    3: (14) w0 -= -536870912
    4: R0_w=invP536870912 R1=ctx(id=0,off=0,imm=0) R10=fp0
    4: (0f) r1 += r0
    5: R0_w=invP536870912 R1_w=invP(id=0) R10=fp0
    5: (55) if r1 != 0x104c1500 goto pc+0
    R0_w=invP536870912 R1_w=invP273421568 R10=fp0
    6: R0_w=invP536870912 R1_w=invP273421568 R10=fp0
    6: (b7) r0 = 0
    7: R0=invP0 R1=invP273421568 R10=fp0
    7: (76) if w1 s>= 0xffffff00 goto pc+3
    11: R0=invP0 R1=invP273421568 R10=fp0
    11: (95) exit
    6: R0_w=invP536870912 R1_w=invP(id=0) R10=fp0
    6: (b7) r0 = 0
    7: R0_w=invP0 R1_w=invP(id=0) R10=fp0
    7: (76) if w1 s>= 0xffffff00 goto pc+3
    R0_w=invP0 R1_w=invP(id=0) R10=fp0
    8: R0_w=invP0 R1_w=invP(id=0) R10=fp0
    8: (a5) if r0 < 0x2007002a goto pc+0
    9: R0_w=invP0 R1_w=invP(id=0) R10=fp0
    9: (57) r0 &= -16316416
    10: R0_w=invP0 R1_w=invP(id=0) R10=fp0
    10: (a6) if w0 < 0x1201 goto pc+0
    11: R0_w=invP0 R1_w=invP(id=0) R10=fp0
    11: (95) exit
    11: R0=invP0 R1=invP(id=0) R10=fp0
    11: (95) exit
    processed 16 insns [...]

    Fixes: 6754172c208d ("bpf: fix precision tracking in presence of bpf2bpf calls")
    Reported-by: Anatoly Trosinenko
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20191222223740.25297-1-daniel@iogearbox.net
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

31 Dec, 2019

2 commits

  • [ Upstream commit 581738a681b6faae5725c2555439189ca81c0f1f ]

    With latest llvm (trunk https://github.com/llvm/llvm-project),
    test_progs, which has +alu32 enabled, failed for strobemeta.o.
    The verifier output looks like below with edit to replace large
    decimal numbers with hex ones.
    193: (85) call bpf_probe_read_user_str#114
    R0=inv(id=0)
    194: (26) if w0 > 0x1 goto pc+4
    R0_w=inv(id=0,umax_value=0xffffffff00000001)
    195: (6b) *(u16 *)(r7 +80) = r0
    196: (bc) w6 = w0
    R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
    197: (67) r6 <<= 32
    198: (77) r6 >>= 32
    R6=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
    ...
    201: (79) r8 = *(u64 *)(r10 -416)
    R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0)
    202: (0f) r8 += r6
    R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
    203: (07) r8 += 9696
    R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
    ...
    255: (bf) r1 = r8
    R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
    ...
    257: (85) call bpf_probe_read_user_str#114
    R1 unbounded memory access, make sure to bounds check any array access into a map

    The value range for register r6 at insn 198 should be really just 0/1.
    The umax_value=0xffffffff caused later verification failure.

    After jmp instructions, the current verifier already tries to use the just
    obtained information to get a better register range. The current mechanism is
    for 64-bit registers only. This patch implements tightening of the range
    for 32-bit sub-registers after jmp32 instructions.
    With the patch, we have the below ranges for the
    above code sequence:
    193: (85) call bpf_probe_read_user_str#114
    R0=inv(id=0)
    194: (26) if w0 > 0x1 goto pc+4
    R0_w=inv(id=0,smax_value=0x7fffffff00000001,umax_value=0xffffffff00000001,
    var_off=(0x0; 0xffffffff00000001))
    195: (6b) *(u16 *)(r7 +80) = r0
    196: (bc) w6 = w0
    R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0x1))
    197: (67) r6 <<= 32
    198: (77) r6 >>= 32
    R6=inv(id=0,umax_value=1,var_off=(0x0; 0x1))
    ...
    201: (79) r8 = *(u64 *)(r10 -416)
    R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0)
    202: (0f) r8 += r6
    R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
    203: (07) r8 += 9696
    R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
    ...
    255: (bf) r1 = r8
    R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
    ...
    257: (85) call bpf_probe_read_user_str#114
    ...

    At insn 194, the register R0 has a better var_off.mask and smax_value.
    In particular, the var_off.mask ensures the later lshift and rshift
    maintain a proper value range.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20191121170650.449030-1-yhs@fb.com
    Signed-off-by: Sasha Levin

    Yonghong Song
     
  • [ Upstream commit eac9153f2b584c702cea02c1f1a57d85aa9aea42 ]

    bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
    deadlock on rq_lock():

    rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    [...]
    Call Trace:
    try_to_wake_up+0x1ad/0x590
    wake_up_q+0x54/0x80
    rwsem_wake+0x8a/0xb0
    bpf_get_stack+0x13c/0x150
    bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
    bpf_overflow_handler+0x60/0x100
    __perf_event_overflow+0x4f/0xf0
    perf_swevent_overflow+0x99/0xc0
    ___perf_sw_event+0xe7/0x120
    __schedule+0x47d/0x620
    schedule+0x29/0x90
    futex_wait_queue_me+0xb9/0x110
    futex_wait+0x139/0x230
    do_futex+0x2ac/0xa50
    __x64_sys_futex+0x13c/0x180
    do_syscall_64+0x42/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This can be reproduced by:
    1. Start a multi-thread program that does parallel mmap() and malloc();
    2. taskset the program to 2 CPUs;
    3. Attach bpf program to trace_sched_switch and gather stackmap with
    build-id, e.g. with trace.py from bcc tools:
    trace.py -U -p -s t:sched:sched_switch

    A sample reproducer is attached at the end.

    This could also trigger deadlock with other locks that are nested with
    rq_lock.

    Fix this by checking whether irqs are disabled. Since rq_lock and all
    other nested locks are irq safe, it is safe to do up_read() when irqs are
    not disabled. If irqs are disabled, postpone the up_read() to irq_work.

    Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
    Signed-off-by: Song Liu
    Signed-off-by: Alexei Starovoitov
    Cc: Peter Zijlstra
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

    Reproducer:
    ============================ 8< ============================

    /* The includes and THREAD_COUNT below are assumptions added so the
     * reproducer builds standalone; the thread count value is arbitrary.
     */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    #define THREAD_COUNT 40

    char *filename;

    void *worker(void *p)
    {
            void *ptr;
            int fd;
            char *pptr;

            fd = open(filename, O_RDONLY);
            if (fd < 0)
                    return NULL;
            while (1) {
                    struct timespec ts = {0, 1000 + rand() % 2000};

                    ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                    usleep(1);
                    if (ptr == MAP_FAILED) {
                            printf("failed to mmap\n");
                            break;
                    }
                    munmap(ptr, 4096 * 64);
                    usleep(1);
                    pptr = malloc(1);
                    usleep(1);
                    pptr[0] = 1;
                    usleep(1);
                    free(pptr);
                    usleep(1);
                    nanosleep(&ts, NULL);
            }
            close(fd);
            return NULL;
    }

    int main(int argc, char *argv[])
    {
            void *ptr;
            int i;
            pthread_t threads[THREAD_COUNT];

            if (argc < 2)
                    return 0;

            filename = argv[1];

            for (i = 0; i < THREAD_COUNT; i++) {
                    if (pthread_create(threads + i, NULL, worker, NULL)) {
                            fprintf(stderr, "Error creating thread\n");
                            return 0;
                    }
            }

            for (i = 0; i < THREAD_COUNT; i++)
                    pthread_join(threads[i], NULL);
            return 0;
    }
    ============================ 8< ============================

    Signed-off-by: Sasha Levin

    Song Liu
     

07 Nov, 2019

1 commit


01 Nov, 2019

1 commit

  • Prior to this commit, the functions bpf_map_area_alloc() and
    bpf_map_charge_init() took the size parameter as size_t. This commit
    changes it to u64.

    All users of these functions avoid size_t overflows on 32-bit systems,
    by explicitly using u64 when calculating the allocation size and
    memory charge cost. However, since the result was narrowed to size_t
    when size and cost were passed to the functions, the overflow handling
    was in vain.

    Instead of changing all call sites and handling overflow at each call
    site, the parameter is changed to u64 and the check is done inside the
    functions above.
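
    A simplified sketch of the resulting check (the surrounding
    kmalloc/vmalloc fallback logic is elided):

        void *bpf_map_area_alloc(u64 size, int numa_node)
        {
                /* size is u64 even on 32-bit; anything that a size_t
                 * parameter would have silently truncated is rejected
                 * here instead */
                if (size >= SIZE_MAX)
                        return NULL;

                /* ... existing kmalloc/vmalloc path, now guaranteed a
                 * size that fits in size_t ... */
        }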

    Fixes: d407bd25a204 ("bpf: don't trigger OOM killer under pressure with map alloc")
    Fixes: c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()")
    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/20191029154307.23053-1-bjorn.topel@gmail.com

    Björn Töpel
     

31 Oct, 2019

1 commit

  • "ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it
    reads the wrong byte, which happens to have the expected value of 0.
    Improve the test by seeking to the 4th byte and expecting 4 instead of
    0.

    This makes the latent problem apparent: the test attempts to read the
    first byte of bpf_sysctl.file_pos, assuming this is the least-significant
    byte, which is not the case on big-endian machines, where a non-zero
    offset is needed.

    The point of the test is to verify narrow loads, so we cannot cheat our
    way out by simply using BPF_W. The existence of the test means that such
    loads have to be supported, most likely because llvm can generate them.
    Fix the test by adding a big-endian variant, which uses an offset to
    access the least-significant byte of bpf_sysctl.file_pos.

    This reveals the final problem: verifier rejects accesses to bpf_sysctl
    fields with offset > 0. Such accesses are already allowed for a wide
    range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few.
    Extend this support to bpf_sysctl by using bpf_ctx_range instead of
    offsetof when matching field offsets.
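
    A rough sketch of the two pieces, verifier side and test side (context
    lines elided, exact hunks may differ):

        /* verifier side, sysctl_is_valid_access(): match the whole field
         * rather than only its first byte, so a narrow load at offset > 0
         * is accepted */
        case bpf_ctx_range(struct bpf_sysctl, file_pos):
                /* previously: case offsetof(struct bpf_sysctl, file_pos): */

        /* test side (eBPF insn macros): load the least-significant byte of
         * the 32-bit file_pos field; its offset depends on endianness */
        #if __BYTE_ORDER == __LITTLE_ENDIAN
                BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
                            offsetof(struct bpf_sysctl, file_pos)),
        #else
                BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1,
                            offsetof(struct bpf_sysctl, file_pos) + 3),
        #endif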

    Fixes: 7b146cebe30c ("bpf: Sysctl hook")
    Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx")
    Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx")
    Signed-off-by: Ilya Leoshkevich
    Signed-off-by: Alexei Starovoitov
    Acked-by: Andrey Ignatov
    Acked-by: Andrii Nakryiko
    Link: https://lore.kernel.org/bpf/20191028122902.9763-1-iii@linux.ibm.com

    Ilya Leoshkevich
     

23 Oct, 2019

2 commits

  • There is one more problematic case I noticed while recently fixing BPF kallsyms
    handling in cd7455f1013e ("bpf: Fix use after free in subprog's jited symbol
    removal") and that is bpf_get_prog_name().

    If BTF has been attached to the prog, then we may be able to fetch the function
    signature type id in kallsyms through prog->aux->func_info[prog->aux->func_idx].type_id.
    However, while the BTF object itself is torn down via RCU callback, the prog's
    aux->func_info is immediately freed via kvfree(prog->aux->func_info) once
    the prog's refcount hits zero, or when subprograms were already exposed via
    kallsyms and we hit the error path added in 5482e9a93c83 ("bpf: Fix memleak
    in aux->func_info and aux->btf").

    This also violates RCU, since kallsyms could be walked in parallel and
    access aux->func_info. Hence, defer the kvfree() until after the RCU grace
    period. Looking at ba64e7d85252 ("bpf: btf: support proper non-jit func
    info"), there is no reason or dependency that prevents deferring the
    kvfree(aux->func_info) into the RCU callback.
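
    A minimal sketch of the deferral (assuming, as in the put path, that
    bpf_prog_aux carries the rcu_head used by __bpf_prog_put_rcu(); other
    tear-down steps are elided):

        static void __bpf_prog_put_rcu(struct rcu_head *rcu)
        {
                struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

                /* only now is it safe: kallsyms walkers run under
                 * rcu_read_lock() and may still look at aux->func_info
                 * until the grace period has elapsed */
                kvfree(aux->func_info);
                ...
                bpf_prog_free(aux->prog);
        }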

    Fixes: 5482e9a93c83 ("bpf: Fix memleak in aux->func_info and aux->btf")
    Fixes: ba64e7d85252 ("bpf: btf: support proper non-jit func info")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Cc: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/875f2906a7c1a0691f2d567b4d8e4ea2739b1e88.1571779205.git.daniel@iogearbox.net

    Daniel Borkmann
     
  • syzkaller managed to trigger the following crash:

    [...]
    BUG: unable to handle page fault for address: ffffc90001923030
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
    RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
    RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
    RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
    RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
    RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
    RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
    [...]
    Call Trace:
    kernel_text_address kernel/extable.c:147 [inline]
    __kernel_text_address+0x9a/0x110 kernel/extable.c:102
    unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
    arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
    stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
    save_stack mm/kasan/common.c:69 [inline]
    set_track mm/kasan/common.c:77 [inline]
    __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
    kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
    slab_post_alloc_hook mm/slab.h:584 [inline]
    slab_alloc mm/slab.c:3319 [inline]
    kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
    getname_flags+0xba/0x640 fs/namei.c:138
    getname+0x19/0x20 fs/namei.c:209
    do_sys_open+0x261/0x560 fs/open.c:1091
    __do_sys_open fs/open.c:1115 [inline]
    __se_sys_open fs/open.c:1110 [inline]
    __x64_sys_open+0x87/0x90 fs/open.c:1110
    do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [...]

    After further debugging, it turns out that kallsyms is being walked while,
    in parallel, we tear down a BPF program whose subprograms have already been
    JITed even though the program itself has not been fully exposed and
    eventually bails out with an error.

    The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
    the symbols; however, bpf_prog_free() tears down the JIT memory too early via
    scheduled work. Instead, it needs to properly respect the RCU grace period,
    as the kallsyms walk for BPF is done under RCU.

    Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
    path where we defer final destruction when we have subprogs in the program.
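
    The shared tear-down helper is roughly of the following shape (a sketch,
    not the exact upstream diff; the bpf_prog_load() error path then calls it
    with deferral enabled whenever subprogs exist):

        static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
        {
                bpf_prog_kallsyms_del_all(prog);
                btf_put(prog->aux->btf);
                bpf_prog_free_linfo(prog);

                if (deferred)
                        /* kallsyms walkers run under RCU; free the prog
                         * (and its JITed subprog images) only after the
                         * grace period */
                        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
                else
                        bpf_prog_free(prog);
        }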

    Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup")
    Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
    Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net

    Daniel Borkmann
     

22 Oct, 2019

1 commit

  • It seems I forgot to add handling of devmap_hash type maps to the device
    unregister hook for devmaps. This omission causes devices to not be
    properly released, which causes hangs.

    Fix this by adding the missing handler.
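
    Sketched, the unregister notifier then also walks devmap_hash maps (the
    names follow the devmap code loosely and the existing per-slot scan for
    plain devmaps is elided):

        case NETDEV_UNREGISTER:
                rcu_read_lock();
                list_for_each_entry_rcu(dtab, &dev_map_list, list) {
                        if (dtab->map.map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
                                dev_map_hash_remove_netdev(dtab, netdev);
                                continue;
                        }
                        /* ... existing handling for BPF_MAP_TYPE_DEVMAP ... */
                }
                rcu_read_unlock();
                break;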

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20191019111931.2981954-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

19 Oct, 2019

1 commit

  • Tetsuo pointed out that without an explicit cast, the cost calculation for
    devmap_hash type maps could overflow on 32-bit builds. This adds the
    missing cast.
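
    The change is a one-liner of this shape (sketch; the surrounding charge
    calculation is elided):

        /* n_buckets is u32: without the cast the multiplication is done in
         * 32-bit arithmetic on 32-bit builds and can wrap before it is
         * added to the u64 cost */
        cost += (u64) dtab->n_buckets * sizeof(struct hlist_head);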

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20191017105702.2807093-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

29 Sep, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Sanity check URB networking device parameters to avoid divide by
    zero, from Oliver Neukum.

    2) Disable the global multicast filter in NCSI, otherwise LLDP and IPv6
    don't work properly. Longer term this needs a better fix, though. From
    Vijay Khemka.

    3) Small fixes to selftests (use ping when ping6 is not present, etc.)
    from David Ahern.

    4) Bring back the rt_uses_gateway member of struct rtable; its semantics
    were not well understood and trying to remove it broke things. From
    David Ahern.

    5) Move usbnet sanity checking, ignore endpoints with invalid
    wMaxPacketSize. From Bjørn Mork.

    6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.

    7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
    Alex Vesker, and Yevgeny Kliteynik

    8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.

    9) Fix crash when removing sch_cbs entry while offloading is enabled,
    from Vinicius Costa Gomes.

    10) Signedness bug fixes, generally in looking at the result given by
    of_get_phy_mode() and friends. From Dan Carpenter.

    11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.

    12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.

    13) Fix quantization code in tcp_bbr, from Kevin Yang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
    net: tap: clean up an indentation issue
    nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
    tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
    sk_buff: drop all skb extensions on free and skb scrubbing
    tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
    mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
    Documentation: Clarify trap's description
    mlxsw: spectrum: Clear VLAN filters during port initialization
    net: ena: clean up indentation issue
    NFC: st95hf: clean up indentation issue
    net: phy: micrel: add Asym Pause workaround for KSZ9021
    net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
    ptp: correctly disable flags on old ioctls
    lib: dimlib: fix help text typos
    net: dsa: microchip: Always set regmap stride to 1
    nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
    nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
    net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
    vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
    net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
    ...

    Linus Torvalds
     

26 Sep, 2019

2 commits

  • There is a statement that is indented one level too deeply; remove
    the extraneous tab.

    Signed-off-by: Colin Ian King
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20190925093835.19515-1-colin.king@canonical.com

    Colin Ian King
     
  • When kzalloc() failed, NULL was returned to the caller, which
    tested the pointer with IS_ERR(). Since that check does not catch
    NULL, the pointer was used later, resulting in a NULL pointer
    dereference.

    Return ERR_PTR(-ENOMEM) instead of NULL.
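
    A minimal sketch of the pattern (illustrative, not a verbatim quote of
    the xsk map node allocation):

        node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
        if (!node)
                return ERR_PTR(-ENOMEM);   /* caller checks IS_ERR(), not NULL */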

    Reported-by: syzbot+491c1b7565ba9069ecae@syzkaller.appspotmail.com
    Fixes: 0402acd683c6 ("xsk: remove AF_XDP socket from map when the socket is released")
    Signed-off-by: Jonathan Lemon
    Acked-by: Björn Töpel
    Signed-off-by: Daniel Borkmann

    Jonathan Lemon