11 Mar, 2018

3 commits

  • [ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]

    syzbot managed to trigger RCU-detected stalls in
    bpf_array_free_percpu().

    It takes time to allocate a huge percpu map, but even more time to free
    it.

    Since we run in process context, use cond_resched() to yield cpu if
    needed.
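
    A minimal kernel-style sketch of the idea (function and parameter names are
    illustrative, not the exact upstream code): yield between iterations while
    freeing a large per-cpu array.

    #include <linux/percpu.h>
    #include <linux/sched.h>
    #include <linux/types.h>

    /* Freeing runs in process context, so cond_resched() may sleep safely
     * and keeps RCU from flagging a stall during a long free loop. */
    static void array_free_percpu_sketch(void __percpu **pptrs, u32 n)
    {
            u32 i;

            for (i = 0; i < n; i++) {
                    free_percpu(pptrs[i]);
                    cond_resched();
            }
    }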

    Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]

    syzkaller recently triggered OOM during percpu map allocation;
    while there is work in progress by Dennis Zhou to add __GFP_NORETRY
    semantics for percpu allocator under pressure, there seems also a
    missing bpf_map_precharge_memlock() check in array map allocation.

    Given today the actual bpf_map_charge_memlock() happens after the
    find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
    is there to bail out early before we go and do the map setup work
    when we find that we hit the limits anyway. Therefore add this for
    array map as well.
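
    A minimal sketch of that early check, assuming an already computed
    array_size in bytes (the wrapper name here is illustrative;
    bpf_map_precharge_memlock() is the existing kernel helper):

    #include <linux/bpf.h>
    #include <linux/mm.h>

    static int array_map_precheck_sketch(u64 array_size)
    {
            u32 pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;

            /* verify against RLIMIT_MEMLOCK before any allocation work;
             * nothing is actually charged yet */
            return bpf_map_precharge_memlock(pages);
    }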

    Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
    Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
    Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Cc: Dennis Zhou
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ upstream commit a316338cb71a3260201490e615f2f6d5c0d8fb2c ]

    trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
    attr->map_flags, since it does not support preallocation yet. We
    check the flag, but we never copy the flag into trie->map.map_flags,
    which is later on exposed into fdinfo and used by loaders such as
    iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test
    whether a pinned map has the same spec as the one from the BPF obj
    file and, if not, bails out, which is currently the case for lpm
    since it always exposes 0 as flags.

    Also copy over flags in array_map_alloc() and stack_map_alloc().
    They always have to be 0 right now, but we should make sure not to
    miss copying them over at a later point in time when we add actual
    flags for them to use.

    Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
    Reported-by: Jarno Rajahalme
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

31 Jan, 2018

6 commits

  • [ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]

    Alexei found that the verifier does not reject stores into context
    via BPF_ST instead of BPF_STX. And while looking at it, we
    should also not allow the XADD variant of BPF_STX.

    The context rewriter is only assuming either BPF_LDX_MEM- or
    BPF_STX_MEM-type operations, thus reject anything other than
    that so that assumptions in the rewriter properly hold. Add
    test cases as well for BPF selftests.

    Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
    Reported-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]

    Due to some JITs doing an if (src_reg == 0) check in 64-bit mode
    for div/mod operations, mask the upper 32 bits of the src register
    before doing the check.
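
    A small user-space demonstration of why the masking matters (illustrative
    only, not JIT code): a divisor whose low 32 bits are zero still looks
    non-zero to a 64-bit test.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t src = 0x100000000ULL;  /* low 32 bits are zero */

            printf("64-bit check: divisor %s zero\n", src == 0 ? "is" : "is not");
            printf("32-bit check: divisor %s zero\n",
                   (uint32_t)src == 0 ? "is" : "is not");
            /* hence: mask first, then test, i.e. if ((u32)src == 0) skip the div */
            return 0;
    }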

    Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
    Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
    Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • [ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]

    Divides by zero are not nice, let's avoid them if possible.

    Also do_div() seems not needed when dealing with 32bit operands,
    but this seems a minor detail.

    Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ upstream commit 7891a87efc7116590eaba57acc3c422487802c6f ]

    The following snippet was throwing an 'unknown opcode cc' warning
    in BPF interpreter:

    0: (18) r0 = 0x0
    2: (7b) *(u64 *)(r10 -16) = r0
    3: (cc) (u32) r0 s>>= (u32) r0
    4: (95) exit

    Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
    generation, not all of them do, and the interpreter doesn't either. We
    can leave the existing ones and implement it later in bpf-next for the
    remaining ones, but reject this properly in the verifier for the time
    being.

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

    The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

    A quote from the Google Project Zero blog:
    "At this point, it would normally be necessary to locate gadgets in
    the host kernel code that can be used to actually leak data by reading
    from an attacker-controlled location, shifting and masking the result
    appropriately and then using the result of that as offset to an
    attacker-controlled address for a load. But piecing gadgets together
    and figuring out which ones work in a speculation context seems annoying.
    So instead, we decided to use the eBPF interpreter, which is built into
    the host kernel - while there is no legitimate way to invoke it from inside
    a VM, the presence of the code in the host kernel's text section is sufficient
    to make it usable for the attack, just like with ordinary ROP gadgets."

    To make the attacker's job harder, introduce a BPF_JIT_ALWAYS_ON config
    option that removes the interpreter from the kernel in favor of JIT-only mode.
    So far eBPF JIT is supported by:
    x64, arm64, arm32, sparc64, s390, powerpc64, mips64

    The start of the JITed program is randomized and the code page is marked read-only.
    In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.

    v2->v3:
    - move __bpf_prog_ret0 under ifdef (Daniel)

    v1->v2:
    - fix init order, test_bpf and cBPF (Daniel's feedback)
    - fix offloaded bpf (Jakub's feedback)
    - add 'return 0' dummy in case something can invoke prog->bpf_func
    - retarget bpf tree. For bpf-next the patch would need one extra hunk.
    It will be sent when the trees are merged back to net-next

    Considered doing:
    int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
    but it seems better to land the patch as-is and in bpf-next remove
    bpf_jit_enable global variable from all JITs, consolidate in one place
    and remove this jit_init() function.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • [ upstream commit 90caccdd8cc0215705f18b92771b449b01e2474a ]

    - bpf prog_array just like all other types of bpf array accepts 32-bit index.
    Clarify that in the comment.
    - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
    - tighten corresponding check in the interpreter to stay consistent

    The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
    in commit 96eabe7a40aa in 4.14. Before that, map_flags would stay zero, and
    even though the JIT code was wrong it would still check bounds correctly.
    Hence two fixes tags. All other JITs don't have this problem.
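
    A user-space sketch of the JIT bug (struct layout illustrative only): an
    8-byte load at the offset of the 32-bit max_entries field also picks up
    the adjacent map_flags word, so once that word is non-zero the bound
    becomes huge and the 'index >= max_entries' check can no longer trigger.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct fake_map {
            uint32_t max_entries;
            uint32_t map_flags;
    };

    int main(void)
    {
            struct fake_map m = { .max_entries = 4, .map_flags = 1 };
            uint64_t wide;

            memcpy(&wide, &m.max_entries, sizeof(wide));   /* bogus 8-byte load */
            printf("4-byte bound: %u\n", m.max_entries);
            printf("8-byte bound: %llu\n", (unsigned long long)wide);
            /* on little-endian the 8-byte value is max_entries | (map_flags << 32) */
            return 0;
    }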

    Signed-off-by: Alexei Starovoitov
    Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation")
    Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper")
    Acked-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     

17 Jan, 2018

4 commits

  • commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.

    syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
    and thus unprivileged. With the recently added logic in b2157399cc98
    ("bpf: prevent out-of-bounds speculation") we round this up to the next
    power of two value for max_entries for unprivileged such that we can
    apply proper masking into potentially zeroed out map slots.

    However, this will generate an index_mask of 0xffffffff, and therefore
    a + 1 will let this overflow into new max_entries of 0. This will pass
    allocation, etc, and later on map access we still enforce on the original
    attr->max_entries value which was 0xfffffffd, therefore triggering GPF
    all over the place. Thus bail out on overflow in such case.

    Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
    since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
    space is undefined. Therefore, do this by hand in a 64 bit variable.
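
    A user-space sketch of the overflow (the rounding is done by hand here,
    mirroring the "do it in a 64-bit variable" approach described above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t max_entries = 0xfffffffd;
            uint64_t mask64 = 1;

            while (mask64 < max_entries)     /* round up to a power of two */
                    mask64 <<= 1;
            mask64 -= 1;                     /* index mask, 0xffffffff here */

            uint32_t index_mask = (uint32_t)mask64;
            uint32_t rounded = (uint32_t)(mask64 + 1);   /* wraps to 0 */

            printf("index_mask = 0x%x, rounded max_entries = %u\n",
                   index_mask, rounded);
            if (rounded < max_entries)
                    printf("reject: rounded-up max_entries overflowed\n");
            return 0;
    }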

    This fixes all the issues triggered by syzkaller's reproducers.

    Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
    Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
    Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
    Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
    Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
    Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

    Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
    memory accesses under a bounds check may be speculated even if the
    bounds check fails, providing a primitive for building a side channel.

    To avoid leaking kernel data round up array-based maps and mask the index
    after bounds check, so speculated load with out of bounds index will load
    either valid value from the array or zero from the padded area.

    Unconditionally mask index for all array types even when max_entries
    are not rounded to power of 2 for root user.
    When map is created by unpriv user generate a sequence of bpf insns
    that includes AND operation to make sure that JITed code includes
    the same 'index & index_mask' operation.

    If prog_array map is created by unpriv user replace
    bpf_tail_call(ctx, map, index);
    with
    if (index >= max_entries) {
        index &= map->index_mask;
        bpf_tail_call(ctx, map, index);
    }
    (along with roundup to power 2) to prevent out-of-bounds speculation.
    There is secondary redundant 'if (index >= max_entries)' in the interpreter
    and in all JITs, but they can be optimized later if necessary.
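
    A user-space sketch of the masking pattern (sizes and names illustrative):
    even if the bounds check is speculatively bypassed, the masked index stays
    within the rounded-up array.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_ENTRIES 16                  /* already a power of two here */

    static int array[MAX_ENTRIES];

    static int lookup(uint32_t index, uint32_t index_mask)
    {
            if (index >= MAX_ENTRIES)
                    return -1;
            index &= index_mask;            /* architecturally a no-op */
            return array[index];
    }

    int main(void)
    {
            printf("%d\n", lookup(3, MAX_ENTRIES - 1));
            printf("%d\n", lookup(100, MAX_ENTRIES - 1));
            return 0;
    }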

    Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
    cannot be used by unpriv, so no changes there.

    That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
    all architectures with and without JIT.

    v2->v3:
    Daniel noticed that the attack potentially can be crafted via syscall commands
    without loading the program, so add masking to those paths as well.

    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Cc: Jiri Slaby
    [ Backported to 4.9 - gregkh ]
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • commit 79741b3bdec01a8628368fbcfccc7d189ed606cb upstream.

    Reduce indent and make it iterate over instructions similarly to
    convert_ctx_accesses(). Also convert the hard BUG_ON into a soft verifier error.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Cc: Jiri Slaby
    [Backported to 4.9.y - gregkh]
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • commit e245c5c6a5656e4d61aa7bb08e9694fd6e5b2b9d upstream.

    No functional change.
    Move fixup_bpf_calls() to verifier.c;
    it's being refactored in the next patch.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Cc: Jiri Slaby
    [backported to 4.9 - gregkh]
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     

30 Dec, 2017

1 commit

  • An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
    pointer leaks are allowed. Therefore, states_equal() must not treat
    a state with a pointer in a register as "equal" to a state with an
    UNKNOWN_VALUE in that register.

    This was fixed differently upstream, but the code around here was
    largely rewritten in 4.14 by commit f1174f77b50c "bpf/verifier: rework
    value tracking". The bug can be detected by the bpf/verifier sub-test
    "pointer/scalar confusion in state equality check (way 1)".

    Signed-off-by: Ben Hutchings
    Cc: Edward Cree
    Cc: Jann Horn
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann

    Ben Hutchings
     

25 Dec, 2017

4 commits

  • From: Jann Horn

    [ Upstream commit 95a762e2c8c942780948091f8f2a4f32fce1ac6f ]

    Distinguish between
    BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
    and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
    only perform sign extension in the first case.
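
    A small user-space illustration of the two extensions (plain C, not BPF):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int32_t imm = -1;                              /* encoded as 0xffffffff */

            uint64_t alu64_mov = (uint64_t)(int64_t)imm;   /* sign-extended  */
            uint64_t alu32_mov = (uint32_t)imm;            /* zero-extended  */

            printf("BPF_ALU64 MOV: 0x%016llx\n", (unsigned long long)alu64_mov);
            printf("BPF_ALU   MOV: 0x%016llx\n", (unsigned long long)alu32_mov);
            return 0;
    }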

    Starting with v4.14, this is exploitable by unprivileged users as long as
    the unprivileged_bpf_disabled sysctl isn't set.

    Debian assigned CVE-2017-16995 for this issue.

    v3:
    - add CVE number (Ben Hutchings)

    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Signed-off-by: Jann Horn
    Acked-by: Edward Cree
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Jann Horn

    Reject programs that compute wildly out-of-bounds stack pointers.
    Otherwise, pointers can be computed with an offset that doesn't fit into an
    `int`, causing security issues in the stack memory access check (as well as
    signed integer overflow during offset addition).

    This is a fix specifically for the v4.9 stable tree because the mainline
    code looks very different at this point.

    Fixes: 7bca0a9702edf ("bpf: enhance verifier to understand stack pointer arithmetic")
    Signed-off-by: Jann Horn
    Acked-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Alexei Starovoitov

    [ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ]

    When the verifier detects that a register contains a runtime constant
    and it's compared with another constant, it will prune exploration
    of the branch that is guaranteed not to be taken at runtime.
    This is all correct, but a malicious program may be constructed
    in such a way that it always has a constant comparison and
    the other branch is never taken under any conditions.
    In this case such path through the program will not be explored
    by the verifier. It won't be taken at run-time either, but since
    all instructions are JITed the malicious program may cause JITs
    to complain about using reserved fields, etc.
    To fix the issue we have to track the instructions explored by
    the verifier and sanitize instructions that are dead at run time
    with NOPs. We cannot reject such dead code, since llvm generates
    it for valid C code; llvm simply doesn't do as much data flow
    analysis as the verifier does.
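
    A user-space sketch of the sanitizing idea (the instruction layout here is
    illustrative, not the kernel's struct bpf_insn): instructions the verifier
    never reached are overwritten with a "jump by 0" no-op instead of being
    rejected.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct insn { uint8_t op; int16_t off; };

    #define OP_JA 0x05      /* unconditional jump; offset 0 is a no-op */

    static void sanitize_dead_code(struct insn *prog, const bool *seen, int len)
    {
            for (int i = 0; i < len; i++) {
                    if (seen[i])
                            continue;
                    prog[i].op = OP_JA;
                    prog[i].off = 0;
            }
    }

    int main(void)
    {
            struct insn prog[4] = { {0x07, 0}, {0x15, 2}, {0xb7, 0}, {0x95, 0} };
            bool seen[4] = { true, true, false, true };

            sanitize_dead_code(prog, seen, 4);
            printf("insn 2: op=0x%02x off=%d\n", prog[2].op, prog[2].off);
            return 0;
    }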

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • From: Alexei Starovoitov

    [ Upstream commit 8041902dae5299c1f194ba42d14383f734631009 ]

    convert_ctx_accesses() replaces single bpf instruction with a set of
    instructions. Adjust corresponding insn_aux_data while patching.
    It's needed to make sure subsequent 'for(all insn)' loops
    have matching insn and insn_aux_data.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

14 Dec, 2017

1 commit

  • [ Upstream commit 89ad2fa3f043a1e8daae193bcb5fe34d5f8caf28 ]

    pcpu_freelist_pop() needs the same lockdep awareness as
    pcpu_freelist_populate() to avoid a false positive.

    [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]

    switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
    (&htab->buckets[i].lock){......}, at: [] __htab_percpu_map_update_elem+0x1cb/0x300

    and this task is already holding:
    (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [] __dev_queue_xmit+0
    x868/0x1240
    which would create a new lock dependency:
    (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}

    but this new dependency connects a SOFTIRQ-irq-safe lock:
    (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
    ... which became SOFTIRQ-irq-safe at:
    [] __lock_acquire+0x42b/0x1f10
    [] lock_acquire+0xbc/0x1b0
    [] _raw_spin_lock+0x38/0x50
    [] __dev_queue_xmit+0x868/0x1240
    [] dev_queue_xmit+0x10/0x20
    [] ip_finish_output2+0x439/0x590
    [] ip_finish_output+0x150/0x2f0
    [] ip_output+0x7d/0x260
    [] ip_local_out+0x5e/0xe0
    [] ip_queue_xmit+0x205/0x620
    [] tcp_transmit_skb+0x5a8/0xcb0
    [] tcp_write_xmit+0x242/0x1070
    [] __tcp_push_pending_frames+0x3c/0xf0
    [] tcp_rcv_established+0x312/0x700
    [] tcp_v4_do_rcv+0x11c/0x200
    [] tcp_v4_rcv+0xaa2/0xc30
    [] ip_local_deliver_finish+0xa7/0x240
    [] ip_local_deliver+0x66/0x200
    [] ip_rcv_finish+0xdd/0x560
    [] ip_rcv+0x295/0x510
    [] __netif_receive_skb_core+0x988/0x1020
    [] __netif_receive_skb+0x21/0x70
    [] process_backlog+0x6f/0x230
    [] net_rx_action+0x229/0x420
    [] __do_softirq+0xd8/0x43d
    [] do_softirq_own_stack+0x1c/0x30
    [] do_softirq+0x55/0x60
    [] __local_bh_enable_ip+0xa8/0xb0
    [] cpu_startup_entry+0x1c7/0x500
    [] start_secondary+0x113/0x140

    to a SOFTIRQ-irq-unsafe lock:
    (&head->lock){+.+...}
    ... which became SOFTIRQ-irq-unsafe at:
    ... [] __lock_acquire+0x82f/0x1f10
    [] lock_acquire+0xbc/0x1b0
    [] _raw_spin_lock+0x38/0x50
    [] pcpu_freelist_pop+0x7a/0xb0
    [] htab_map_alloc+0x50c/0x5f0
    [] SyS_bpf+0x265/0x1200
    [] entry_SYSCALL_64_fastpath+0x12/0x17

    other info that might help us debug this:

    Chain exists of:
    dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock

    Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&head->lock);
                                 local_irq_disable();
                                 lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                                 lock(&htab->buckets[i].lock);
    <Interrupt>
      lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);

    *** DEADLOCK ***

    Fixes: e19494edab82 ("bpf: introduce percpu_freelist")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

12 Oct, 2017

1 commit

  • [ Upstream commit e67b8a685c7c984e834e3181ef4619cd7025a136 ]

    Neither ___bpf_prog_run nor the JITs accept it.
    Also adds a new test case.

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Edward Cree
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Edward Cree
     

30 Aug, 2017

5 commits

  • [ Upstream commit 9305706c2e808ae59f1eb201867f82f1ddf6d7a6 ]

    We have to subtract the src max from the dst min, and vice-versa, since
    (e.g.) the smallest result comes from the largest subtrahend.
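
    A tiny illustration of the corrected interval subtraction:

    #include <stdio.h>

    int main(void)
    {
            long dst_min = 0, dst_max = 10;
            long src_min = 2, src_max = 8;

            /* dst - src ranges over [dst_min - src_max, dst_max - src_min] */
            printf("result range: [%ld, %ld]\n",
                   dst_min - src_max,          /* -8 */
                   dst_max - src_min);         /*  8 */
            return 0;
    }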

    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Signed-off-by: Edward Cree
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Edward Cree
     
  • [ Upstream commit 4cabc5b186b5427b9ee5a7495172542af105f02b ]

    Edward reported that there's an issue in min/max value bounds
    tracking when signed and unsigned compares both provide hints
    on limits when having unknown variables. E.g. a program such
    as the following should have been rejected:

    0: (7a) *(u64 *)(r10 -8) = 0
    1: (bf) r2 = r10
    2: (07) r2 += -8
    3: (18) r1 = 0xffff8a94cda93400
    5: (85) call bpf_map_lookup_elem#1
    6: (15) if r0 == 0x0 goto pc+7
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
    7: (7a) *(u64 *)(r10 -16) = -8
    8: (79) r1 = *(u64 *)(r10 -16)
    9: (b7) r2 = -1
    10: (2d) if r1 > r2 goto pc+3
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
    R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
    11: (65) if r1 s> 0x1 goto pc+2
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
    R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
    12: (0f) r0 += r1
    13: (72) *(u8 *)(r0 +0) = 0
    R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
    R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
    14: (b7) r0 = 0
    15: (95) exit

    What happens is that in the first part ...

    8: (79) r1 = *(u64 *)(r10 -16)
    9: (b7) r2 = -1
    10: (2d) if r1 > r2 goto pc+3

    ... r1 carries an unsigned value, and is compared as unsigned
    against a register carrying an immediate. Verifier deduces in
    reg_set_min_max() that since the compare is unsigned and operation
    is greater than (>), that in the fall-through/false case, r1's
    minimum bound must be 0 and maximum bound must be r2. Latter is
    larger than the bound and thus max value is reset back to being
    'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
    'R1=inv,min_value=0'. The subsequent test ...

    11: (65) if r1 s> 0x1 goto pc+2

    ... is a signed compare of r1 with immediate value 1. Here,
    verifier deduces in reg_set_min_max() that since the compare
    is signed this time and operation is greater than (>), that
    in the fall-through/false case, we can deduce that r1's maximum
    bound must be 1, meaning with prior test, we result in r1 having
    the following state: R1=inv,min_value=0,max_value=1. Given that
    the actual value this holds is -8, the bounds are wrongly deduced.
    When this is being added to r0 which holds the map_value(_adj)
    type, then subsequent store access in above case will go through
    check_mem_access() which invokes check_map_access_adj(), that
    will then probe whether the map memory is in bounds based
    on the min_value and max_value as well as access size since
    the actual unknown value is min_value <= x <= max_value.

    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
    R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
    11: (b7) r7 = 1
    12: (65) if r7 s> 0x0 goto pc+2
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
    R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
    13: (b7) r0 = 0
    14: (95) exit

    from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
    R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
    15: (0f) r7 += r1
    16: (65) if r7 s> 0x4 goto pc+2
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
    R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
    17: (0f) r0 += r7
    18: (72) *(u8 *)(r0 +0) = 0
    R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
    R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
    19: (b7) r0 = 0
    20: (95) exit

    Meaning, in adjust_reg_min_max_vals() we must also reset range
    values on the dst when src/dst registers have mixed signed/
    unsigned derived min/max value bounds with one unbounded value
    as otherwise they can be added together deducing false boundaries.
    Once both boundaries are established from either ALU ops or
    compare operations w/o mixing signed/unsigned insns, then they
    can safely be added to other regs also having both boundaries
    established. Adding regs with one unbounded side to a map value
    where the bounded side has been learned w/o mixing ops is
    possible, but the resulting map value won't recover from that,
    meaning such op is considered invalid on the time of actual
    access. Invalid bounds are set on the dst reg in case i) src reg,
    or ii) in case dst reg already had them. The only way to recover
    would be to perform i) ALU ops but only 'add' is allowed on map
    value types or ii) comparisons, but these are disallowed on
    pointers in case they span a range. This is fine as only BPF_JEQ
    and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
    which potentially turn them into PTR_TO_MAP_VALUE type depending
    on the branch, so only here min/max value cannot be invalidated
    for them.

    In terms of state pruning, value_from_signed is considered
    as well in states_equal() when dealing with adjusted map values.
    With regards to breaking existing programs, there is a small
    risk, but use-cases are rather quite narrow where this could
    occur and mixing compares probably unlikely.
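
    A small user-space demonstration of the flawed deduction in the listing
    above: the value -8 falls through both the unsigned and the signed test,
    yet lies far outside the deduced [0, 1] range.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t r1 = (uint64_t)-8;
            uint64_t r2 = (uint64_t)-1;

            if (!(r1 > r2))                  /* unsigned: fall-through, "min is 0" */
                    printf("unsigned test fell through\n");
            if (!((int64_t)r1 > 1))          /* signed: fall-through, "max is 1" */
                    printf("signed test fell through\n");
            printf("actual value: %lld\n", (long long)(int64_t)r1);
            return 0;
    }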

    Joint work with Josef and Edward.

    [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html

    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Reported-by: Edward Cree
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Edward Cree
    Signed-off-by: Josef Bacik
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit fce366a9dd0ddc47e7ce05611c266e8574a45116 ]

    While looking into map_value_adj, I noticed that alu operations
    directly on the map_value() resp. map_value_adj() register (any
    alu operation on a map_value() register will turn it into a
    map_value_adj() typed register) are not sufficiently protected
    against some of the operations. Two non-exhaustive examples are
    provided that the verifier needs to reject:

    i) BPF_AND on r0 (map_value_adj):

    0: (bf) r2 = r10
    1: (07) r2 += -8
    2: (7a) *(u64 *)(r2 +0) = 0
    3: (18) r1 = 0xbf842a00
    5: (85) call bpf_map_lookup_elem#1
    6: (15) if r0 == 0x0 goto pc+2
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
    7: (57) r0 &= 8
    8: (7a) *(u64 *)(r0 +0) = 22
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
    9: (95) exit

    from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
    9: (95) exit
    processed 10 insns

    ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):

    0: (bf) r2 = r10
    1: (07) r2 += -8
    2: (7a) *(u64 *)(r2 +0) = 0
    3: (18) r1 = 0xc24eee00
    5: (85) call bpf_map_lookup_elem#1
    6: (15) if r0 == 0x0 goto pc+2
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
    7: (04) (u32) r0 += (u32) 0
    8: (7a) *(u64 *)(r0 +0) = 22
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
    9: (95) exit

    from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
    9: (95) exit
    processed 10 insns

    Issue is, while min_value / max_value boundaries for the access
    are adjusted appropriately, we change the pointer value in a way
    that cannot be sufficiently tracked anymore from its origin.
    Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
    that is PTR_TO_MAP_VALUE{,_ADJ} were probably unintended; in fact,
    all the test cases coming with 484611357c19 ("bpf: allow access
    into map value arrays") perform BPF_ADD only on the destination
    register that is PTR_TO_MAP_VALUE_ADJ.

    Only for UNKNOWN_VALUE register types such operations make sense,
    f.e. with unknown memory content fetched initially from a constant
    offset from the map value memory into a register. That register is
    then later tested against lower / upper bounds, so that the verifier
    can then do the tracking of min_value / max_value, and properly
    check once that UNKNOWN_VALUE register is added to the destination
    register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
    original use-case is solving. Note, tracking on what is being
    added is done through adjust_reg_min_max_vals() and later access
    to the map value enforced with these boundaries and the given offset
    from the insn through check_map_access_adj().

    Tests will fail for non-root environment due to prohibited pointer
    arithmetic, in particular in check_alu_op(), we bail out on the
    is_pointer_value() check on the dst_reg (which is false in root
    case as we allow for pointer arithmetic via env->allow_ptr_leaks).

    Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
    allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
    mode BPF_ADD. The test_verifier suite runs fine after the patch
    and it also rejects mentioned test cases.

    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit 3c2ce60bdd3d57051bf85615deec04a694473840 ]

    Current limits with regards to processing program paths do not
    really reflect today's needs anymore due to programs becoming
    more complex and verifier smarter, keeping track of more data
    such as const ALU operations, alignment tracking, spilling of
    PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
    smarter matching of what LLVM generates.

    This also comes with the side-effect that we result in fewer
    opportunities to prune search states and thus often need to do
    more work to prove safety than in the past due to different
    register states and stack layout where we mismatch. Generally,
    it's quite hard to determine what caused a sudden increase in
    complexity, it could be caused by something as trivial as a
    single branch somewhere at the beginning of the program where
    LLVM assigned a stack slot that is marked differently throughout
    other branches and thus causing a mismatch, where verifier
    then needs to prove safety for the whole rest of the program.
    Subsequently, programs with even less than half the insn size
    limit can get rejected. We noticed that while some programs
    load fine under pre 4.11, they get rejected due to hitting
    limits on more recent kernels. We saw that in the vast majority
    of cases (90+%) pruning failed due to register mismatches. In
    case of stack mismatches, majority of cases failed due to
    different stack slot types (invalid, spill, misc) rather than
    differences in spilled registers.

    This patch makes pruning more aggressive by also adding markers
    that sit at conditional jumps as well. Currently, we only mark
    jump targets for pruning. For example in direct packet access,
    these are usually error paths where we bail out. We found that
    adding these markers, it can reduce number of processed insns
    by up to 30%. Another option is to ignore reg->id in probing
    PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
    slightly as well by up to 7% observed complexity reduction as
    stand-alone. Meaning, if a previous path with register type
    PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
    in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
    the same map X must be safe as well. Last but not least the
    patch also adds a scheduling point and bumps the current limit
    for instructions to be processed to a more adequate value.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit 43188702b3d98d2792969a3377a30957f05695e6 ]

    Currently the verifier does not track imm across alu operations when
    the source register is of unknown type. This adds additional pattern
    matching to catch this and track imm. We've seen LLVM generating this
    pattern while working on cilium.

    Signed-off-by: John Fastabend
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     

21 Jul, 2017

1 commit

  • commit 6bdf6abc56b53103324dfd270a86580306e1a232 upstream.

    Leaking kernel addresses from unprivileged programs is generally
    disallowed; for example, the verifier rejects the following:

    0: (b7) r0 = 0
    1: (18) r2 = 0xffff897e82304400
    3: (7b) *(u64 *)(r1 +48) = r2
    R2 leaks addr into ctx

    Doing pointer arithmetic on them is also forbidden, so that they
    don't turn into unknown value and then get leaked out. However,
    there's xadd as a special case, where we don't check the src reg
    for being a pointer register, e.g. the following will pass:

    0: (b7) r0 = 0
    1: (7b) *(u64 *)(r1 +48) = r0
    2: (18) r2 = 0xffff897e82304400 ; map
    4: (db) lock *(u64 *)(r1 +48) += r2
    5: (95) exit

    We could store the pointer into skb->cb, lose the type context,
    and then read it out from there again to leak it eventually out
    of a map value. Or more easily in a different variant, too:

    0: (bf) r6 = r1
    1: (7a) *(u64 *)(r10 -8) = 0
    2: (bf) r2 = r10
    3: (07) r2 += -8
    4: (18) r1 = 0x0
    6: (85) call bpf_map_lookup_elem#1
    7: (15) if r0 == 0x0 goto pc+3
    R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
    8: (b7) r3 = 0
    9: (7b) *(u64 *)(r0 +0) = r3
    10: (db) lock *(u64 *)(r0 +0) += r6
    11: (b7) r0 = 0
    12: (95) exit

    from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
    11: (b7) r0 = 0
    12: (95) exit

    Prevent this by checking xadd src reg for pointer types. Also
    add a couple of test cases related to this.

    Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Acked-by: Edward Cree
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

05 Jul, 2017

1 commit

  • [ Upstream commit d407bd25a204bd66b7346dde24bd3d37ef0e0b05 ]

    This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
    that are to be used for map allocations. Using kmalloc() for very large
    allocations can cause excessive work within the page allocator, so i) fall
    back earlier to vmalloc() when the attempt is considered costly anyway,
    and even more importantly ii) don't trigger OOM killer with any of the
    allocators.

    Since this is based on a user space request, for example, when creating
    maps with element pre-allocation, we really want such requests to fail
    instead of killing other user space processes.

    Also, don't spam the kernel log with warnings should any of the allocations
    fail under pressure. Given that, we can make backend selection in
    bpf_map_area_alloc() generic, and convert all maps over to use this API
    for spots with potentially large allocation requests.

    Note, replacing the one kmalloc_array() is fine as overflow checks happen
    earlier in htab_map_alloc(), since it must also protect the multiplication
    for vmalloc() should kmalloc_array() fail.
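
    A minimal kernel-style sketch of the idea (simplified; not the exact
    upstream helpers): attempt kmalloc() only for cheap sizes, without hard
    retries or warnings, and fall back to vmalloc() otherwise.

    #include <linux/mm.h>
    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    static void *map_area_alloc_sketch(size_t size)
    {
            void *area;

            if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
                    area = kmalloc(size, GFP_USER | __GFP_NORETRY | __GFP_NOWARN);
                    if (area)
                            return area;
            }
            return vmalloc(size);
    }

    static void map_area_free_sketch(void *area)
    {
            kvfree(area);   /* handles both kmalloc()ed and vmalloc()ed memory */
    }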

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

14 May, 2017

2 commits

  • [ Upstream commit 0d0e57697f162da4aa218b5feafe614fb666db07 ]

    The patch fixes two things at once:

    1) It checks the env->allow_ptr_leaks and only prints the map address to
    the log if we have the privileges to do so, otherwise it just dumps 0
    as we would when kptr_restrict is enabled on %pK. Given the latter is
    off by default and not every distro sets it, I don't want to rely on
    this, hence the 0 by default for unprivileged.

    2) Printing of ldimm64 in the verifier log is currently broken in that
    we don't print the full immediate, but only the 32 bit part of the
    first insn part for ldimm64. Thus, fix this up as well; it's okay to
    access, since we verified all ldimm64 earlier already (including just
    constants) through replace_map_fd_with_map_ptr().
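
    A small illustration of point 2: an ldimm64 spreads its 64-bit constant
    across the imm fields of two instruction slots, so both halves have to be
    combined when printing.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int32_t imm_lo = 0x11223344;     /* imm of the first insn slot  */
            int32_t imm_hi = 0x55667788;     /* imm of the second insn slot */

            uint64_t imm64 = (uint32_t)imm_lo |
                             ((uint64_t)(uint32_t)imm_hi << 32);
            printf("full ldimm64 immediate: 0x%016llx\n",
                   (unsigned long long)imm64);
            return 0;
    }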

    Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
    Fixes: cbd357008604 ("bpf: verifier (add ability to receive verification log)")
    Reported-by: Jann Horn
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529 ]

    llvm 4.0 and above generates code like the following:
    ....
    440: (b7) r1 = 15
    441: (05) goto pc+73
    515: (79) r6 = *(u64 *)(r10 -152)
    516: (bf) r7 = r10
    517: (07) r7 += -112
    518: (bf) r2 = r7
    519: (0f) r2 += r1
    520: (71) r1 = *(u8 *)(r8 +0)
    521: (73) *(u8 *)(r2 +45) = r1
    ....
    and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
    This is because the verifier marks register r2 as an unknown value after
    #519, where r2 is a stack pointer and r1 holds a constant value.

    Teach verifier to recognize "stack_ptr + imm" and
    "stack_ptr + reg with const val" as valid stack_ptr with new offset.

    Signed-off-by: Yonghong Song
    Acked-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yonghong Song
     

03 May, 2017

1 commit

  • [ Upstream commit b1977682a3858b5584ffea7cfb7bd863f68db18d ]

    llvm can optimize the 'if (ptr > data_end)' checks to be in an order
    slightly different from the original C code, which confuses the verifier.
    For example:

    if (ptr + 16 > data_end)
        return TC_ACT_SHOT;
    // may be followed by
    if (ptr + 14 > data_end)
        return TC_ACT_SHOT;

    While llvm can see that 'ptr' is valid for all 16 bytes,
    the verifier could not.
    Fix the verifier logic to account for such a case and add a test.

    Reported-by: Huapeng Zhou
    Fixes: 969bf05eb3ce ("bpf: direct packet access")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     

22 Mar, 2017

4 commits

  • [ Upstream commit 6760bf2ddde8ad64f8205a651223a93de3a35494 ]

    Martin reported a verifier issue that hit the BUG_ON() for his
    test case in the mark_reg_unknown_value() function:

    [ 202.861380] kernel BUG at kernel/bpf/verifier.c:467!
    [...]
    [ 203.291109] Call Trace:
    [ 203.296501] [] mark_map_reg+0x45/0x50
    [ 203.308225] [] mark_map_regs+0x78/0x90
    [ 203.320140] [] do_check+0x226d/0x2c90
    [ 203.331865] [] bpf_check+0x48b/0x780
    [ 203.343403] [] bpf_prog_load+0x27e/0x440
    [ 203.355705] [] ? handle_mm_fault+0x11af/0x1230
    [ 203.369158] [] ? security_capable+0x48/0x60
    [ 203.382035] [] SyS_bpf+0x124/0x960
    [ 203.393185] [] ? __do_page_fault+0x276/0x490
    [ 203.406258] [] entry_SYSCALL_64_fastpath+0x13/0x94

    This issue got uncovered after the fix in a08dd0da5307 ("bpf: fix
    regression on verifier pruning wrt map lookups"). The reason why it
    wasn't noticed before was, because as mentioned in a08dd0da5307,
    mark_map_regs() was doing the id matching incorrectly based on the
    uncached regs[regno].id. So, in the first loop, we walked all regs
    and as soon as we found regno == i, then this reg's id was cleared
    when calling mark_reg_unknown_value() thus that every subsequent
    register was probed against id of 0 (which, in combination with the
    PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
    register state can hold), and therefore wasn't type transitioned such
    as in the spilled register case for the second loop.

    Now since that got fixed, it turned out that 57a09bf0a416 ("bpf:
    Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
    mark_reg_unknown_value() incorrectly for the spilled regs, and thus
    hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG.

    Although spilled regs have the same type as the non-spilled regs
    for the verifier state, that is, struct bpf_reg_state, they are
    semantically different from the non-spilled regs. In other words,
    there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
    in the stack, for example, register R could have been spilled by
    the program to stack location X, Y, Z, and in mark_map_regs() we
    need to scan these stack slots of type STACK_SPILL for potential
    registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
    Therefore, depending on the location, the spilled_regs regno can
    be a lot higher than just MAX_BPF_REG's value since we operate on
    stack instead. The reset in mark_reg_unknown_value() itself is
    just fine, only that the BUG_ON() was inappropriate for this. Fix
    it by making a __mark_reg_unknown_value() version that can be
    called from mark_map_reg() generically; we know for the non-spilled
    case that the regno is always < MAX_BPF_REG anyway.

    Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    Reported-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit a08dd0da5307ba01295c8383923e51e7997c3576 ]

    Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
    registers") introduced a regression where existing programs stopped
    loading due to reaching the verifier's maximum complexity limit,
    whereas prior to this commit they were loading just fine; the affected
    program has roughly 2k instructions.

    What was found is that state pruning couldn't be performed effectively
    anymore due to mismatches of the verifier's register state, in particular
    in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per
    se, but rather that verifier needs to perform a lot more work for the
    same program with regards to involved map lookups.

    Since commit 57a09bf0a416 is only about tracking registers with type
    PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
    until they are promoted through pattern matching with a NULL check to
    either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
    id becomes irrelevant for the transitioned types.

    For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
    but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
    transferred further into other types that don't make use of it. Among
    others, one example is where UNKNOWN_VALUE is set on function call
    return with RET_INTEGER return type.

    states_equal() will then fall through the memcmp() on register state;
    note that the second memcmp() uses offsetofend(), so the id is part of
    that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect
    pointed already to 57a09bf0a416, where we really reach beyond complexity
    limit. What I found was that states_equal() often failed in this
    case due to id mismatches in spilled regs with registers in type
    PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
    a memcmp() on their reg state and don't have any other optimizations
    in place, therefore also id was relevant in this case for making a
    pruning decision.

    We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
    For the affected program, it resulted in a ~17 fold reduction of
    complexity and let the program load fine again. Selftest suite also
    runs fine. The only other place where env->id_gen is used currently is
    through direct packet access, but for these cases id is long living, thus
    a different scenario.

    Also, the current logic in mark_map_regs() is not fully correct when
    marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
    reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
    its id is reset and any subsequent registers that hold the original id
    and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
    anymore, since mark_map_reg() reuses the uncached regs[regno].id that
    was just overridden. Note, we don't need to cache it outside of
    mark_map_regs(), since it's called once on this_branch and the other
    time on other_branch, which are both two independent verifier states.
    A test case for this is added here, too.

    Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    Signed-off-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit d2a4dd37f6b41fbcad76efbf63124eb3126c66fe ]

    Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    and 484611357c19 ("bpf: allow access into map value arrays") by themselves
    are correct, but in combination they make state equivalence ignore 'id' field
    of the register state which can lead to accepting invalid program.

    Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • [ Upstream commit 57a09bf0a416700676e77102c28f9cfcb48267e0 ]

    A BPF program is required to check the return register of a
    map_elem_lookup() call before accessing memory. The verifier keeps
    track of this by converting the type of the result register from
    PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
    jump ensures safety. This check is currently exclusively performed
    for the result register 0.

    In the event the compiler reorders instructions, BPF_MOV64_REG
    instructions may be moved before the conditional jump which causes
    them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
    verifier objects when the register is accessed:

    0: (b7) r1 = 10
    1: (7b) *(u64 *)(r10 -8) = r1
    2: (bf) r2 = r10
    3: (07) r2 += -8
    4: (18) r1 = 0x59c00000
    6: (85) call 1
    7: (bf) r4 = r0
    8: (15) if r0 == 0x0 goto pc+1
    R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
    9: (7a) *(u64 *)(r4 +0) = 0
    R4 invalid mem access 'map_value_or_null'

    This commit extends the verifier to keep track of all identical
    PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
    assigning them an ID and then marking them all when the conditional
    jump is observed.

    Signed-off-by: Thomas Graf
    Reviewed-by: Josef Bacik
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Thomas Graf
     

01 Dec, 2016

1 commit

  • If we have a branch that looks something like this

    int foo = map->value;
    if (condition) {
        foo += blah;
    } else {
        foo = bar;
    }
    map->array[foo] = baz;

    We will incorrectly assume that the !condition branch is equal to the condition
    branch as the register for foo will be UNKNOWN_VALUE in both cases. We need to
    adjust this logic to only do this if we didn't do a varlen access after we
    processed the !condition branch, otherwise we have different ranges and need to
    check the other branch as well.

    Fixes: 484611357c19 ("bpf: allow access into map value arrays")
    Reported-by: Jann Horn
    Signed-off-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Josef Bacik
     

17 Nov, 2016

1 commit

  • I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
    invalid accesses to bpf map entries. Fix this up by doing a few things:

    1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real
    life and just adds extra complexity.

    2) Fix the logic for BPF_AND: don't allow AND of negative numbers, and set
    the minimum value to 0 for positive ANDs (see the small demo further below).

    3) Don't do operations on the ranges if they are set to the limits, as they are
    by definition undefined, and allowing arithmetic operations on those values
    could make them appear valid when they really aren't.

    This fixes the testcase provided by Jann as well as a few other theoretical
    problems.
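
    A small user-space demo of the property behind item 2: with a non-negative
    mask, x & mask always lands in [0, mask], regardless of the sign of x.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int64_t mask = 0xff;

            for (int64_t x = -3; x <= 3; x++)
                    printf("%lld & %lld = %lld\n", (long long)x,
                           (long long)mask, (long long)(x & mask));
            return 0;
    }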

    Reported-by: Jann Horn
    Signed-off-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Josef Bacik
     

08 Nov, 2016

2 commits

  • In map_create(), we first find and create the map, then once that
    succeeded, we charge it to the user's RLIMIT_MEMLOCK, and then fetch
    a new anon fd through anon_inode_getfd(). The problem is, once the
    latter fails f.e. due to RLIMIT_NOFILE limit, then we only destruct
    the map via map->ops->map_free(), but without uncharging the previously
    locked memory first. That means that the user_struct allocation is
    leaked as well as the accounted RLIMIT_MEMLOCK memory not released.
    Make the label names in the fix consistent with bpf_prog_load().

    Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Commit a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
    added an extra per-cpu reserve to the hash table map to restore old
    behaviour from pre-prealloc times. When non-prealloc is in use for a
    map, the problem is that once a hash table extra element has been
    linked into the hash-table, and the hash table is destroyed due to
    refcount dropping to zero, then htab_map_free() -> delete_all_elements()
    will walk the whole hash table and drop all elements via htab_elem_free().
    The problem is that the element from the extra reserve is first fed
    to the wrong backend allocator and eventually freed twice.

    Fixes: a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
    Reported-by: Dmitry Vyukov
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    ">rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

29 Sep, 2016

1 commit

  • Suppose you have a map array value that is something like this

    struct foo {
        unsigned iter;
        int array[SOME_CONSTANT];
    };

    You can easily insert this into an array, but you cannot modify the contents of
    foo->array[] after the fact. This is because we have no way to verify we won't
    go off the end of the array at verification time. This patch provides a start
    for this work. We accomplish this by keeping track of a minimum and maximum
    value a register could be while we're checking the code. Then at the time we
    try to do an access into a MAP_VALUE we verify that the maximum offset into that
    region is a valid access into that memory region. So in practice, code such as
    this

    unsigned index = 0;

    if (foo->iter >= SOME_CONSTANT)
        foo->iter = index;
    else
        index = foo->iter++;
    foo->array[index] = bar;

    would be allowed, as we can verify that index will always be between 0 and
    SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
    check to make sure the index isn't less than 0, or do something like index %=
    SOME_CONSTANT.

    Signed-off-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Josef Bacik