15 May, 2018

3 commits

  • We can trivially save 4 bytes in the prologue for cBPF since tail
    calls can never be used from there. The register push/pop is
    pairwise here, x25 (fp) and x26 (tcc), so there is no point in
    changing that; only the reset to zero is not needed.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Improve the JIT to emit 64 and 32 bit immediates: the current
    algorithm is not optimal and we often emit more instructions
    than actually needed. arm64 has movz, movn and movk variants,
    but for 64 bit immediates we currently only use movz with a
    series of movk when needed.

    For example loading ffffffffffffabab emits the following 4
    instructions in the JIT today:

    * movz: abab, shift: 0, result: 000000000000abab
    * movk: ffff, shift: 16, result: 00000000ffffabab
    * movk: ffff, shift: 32, result: 0000ffffffffabab
    * movk: ffff, shift: 48, result: ffffffffffffabab

    Whereas after the patch the same load only needs a single
    instruction:

    * movn: 5454, shift: 0, result: ffffffffffffabab

    Another example where two extra instructions can be saved:

    * movz: abab, shift: 0, result: 000000000000abab
    * movk: 1f2f, shift: 16, result: 000000001f2fabab
    * movk: ffff, shift: 32, result: 0000ffff1f2fabab
    * movk: ffff, shift: 48, result: ffffffff1f2fabab

    After the patch:

    * movn: e0d0, shift: 16, result: ffffffff1f2fffff
    * movk: abab, shift: 0, result: ffffffff1f2fabab

    Another example with movz, before:

    * movz: 0000, shift: 0, result: 0000000000000000
    * movk: fea0, shift: 32, result: 0000fea000000000

    After:

    * movz: fea0, shift: 32, result: 0000fea000000000

    Moreover, reuse emit_a64_mov_i() for 32 bit immediates that are
    loaded via emit_a64_mov_i64(), which is a similar optimization to
    the one done in 6fe8b9c1f41d ("bpf, x64: save several bytes by
    using mov over movabsq when possible"). On arm64, the latter
    allows using a single movn instruction thanks to zero extension
    where otherwise two instructions would be needed. Last but not
    least, add a missing optimization in emit_a64_mov_i() where movn
    is used but the subsequent movk is not needed. With some of the
    Cilium programs in use, this shrinks the needed instructions by
    about three percent. Tested on Cavium ThunderX CN8890.
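
    As a rough standalone sketch of the selection idea (this is not
    the kernel's emit_a64_mov_i64(); the helper below is hypothetical),
    one can count how many 16-bit chunks of the value differ from an
    all-zeros versus an all-ones background and pick movz or movn
    accordingly:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: estimate how many mov instructions a 64 bit immediate
     * needs when starting from movz (untouched chunks stay 0x0000)
     * versus movn (untouched chunks stay 0xffff). The JIT then emits
     * movz/movn for the first differing chunk and movk for every
     * remaining differing chunk.
     */
    static int insns_needed(uint64_t val, uint16_t background)
    {
            int i, n = 0;

            for (i = 0; i < 4; i++)
                    if (((val >> (i * 16)) & 0xffff) != background)
                            n++;
            return n ? n : 1; /* all-background still needs one insn */
    }

    int main(void)
    {
            uint64_t val = 0xffffffffffffababULL;

            printf("movz path: %d insns, movn path: %d insns\n",
                   insns_needed(val, 0x0000), insns_needed(val, 0xffff));
            /* prints "movz path: 4 insns, movn path: 1 insns" for the
             * first example above
             */
            return 0;
    }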

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Follow-up to 816d9ef32a8b ("bpf, arm64: remove ld_abs/ld_ind"):
    the extra 4 byte JIT scratchpad is not needed anymore, since it
    was only used by ld_abs/ld_ind as a stack buffer for
    bpf_load_pointer().

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

04 May, 2018

1 commit

  • Since the LD_ABS/LD_IND instructions are now removed from the core
    and reimplemented through a combination of inlined BPF instructions
    and a slow-path helper, we can get rid of the corresponding
    complexity in the arm64 JIT.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

23 Feb, 2018

1 commit

  • I recently noticed a crash on arm64 when feeding a bogus index
    into the BPF tail call helper. The crash would not occur when the
    interpreter is used, but only in the case of JIT. Output looks as
    follows:

    [ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
    [...]
    [ 347.043065] [fffb850e96492510] address between user and kernel address ranges
    [ 347.050205] Internal error: Oops: 96000004 [#1] SMP
    [...]
    [ 347.190829] x13: 0000000000000000 x12: 0000000000000000
    [ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
    [ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000
    [ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000
    [ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
    [ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
    [ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
    [ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
    [ 347.235221] Call trace:
    [ 347.237656] 0xffff000002f3a4fc
    [ 347.240784] bpf_test_run+0x78/0xf8
    [ 347.244260] bpf_prog_test_run_skb+0x148/0x230
    [ 347.248694] SyS_bpf+0x77c/0x1110
    [ 347.251999] el0_svc_naked+0x30/0x34
    [ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
    [...]

    In this case the index used in BPF r3 is the same as in r1
    at the time of the call, meaning we fed a pointer as index;
    here, it had the value 0xffff808fd7cf0500 which sits in x2.

    While I found tail calls to be working in general (also for
    hitting the error cases), I noticed the following in the code
    emission:

    # bpftool p d j i 988
    [...]
    38: ldr w10, [x1,x10]
    3c: cmp w2, w10
    40: b.ge 0x000000000000007c
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

21 Jan, 2018

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2018-01-19

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) bpf array map HW offload, from Jakub.

    2) support for bpf_get_next_key() for LPM map, from Yonghong.

    3) test_verifier now runs loaded programs, from Alexei.

    4) xdp cpumap monitoring, from Jesper.

    5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.

    6) user space can now retrieve HW JITed program, from Jiong.

    Note there is a minor conflict between Russell's arm32 JIT fixes
    and removal of bpf_jit_enable variable by Daniel which should
    be resolved by keeping Russell's comment and removing that variable.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Jan, 2018

2 commits

  • The BPF verifier conflict was a minor contextual issue.

    The TUN conflict was less trivial. Cong Wang fixed a memory leak of
    tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
    net-next, tun changed tfile->tx_array into tfile->tx_ring, which is
    a ptr_ring.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Having a pure_initcall() callback just to permanently enable BPF
    JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
    a small race window in the future where the JIT is still disabled
    on boot. Since we know about the setting at compilation time
    anyway, just initialize it properly there. Also consolidate all
    the individual bpf_jit_enable variables into a single one and move
    it into one location. Moreover, don't allow setting unspecified
    garbage values on it.
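
    A minimal sketch of the idea (the exact kernel initializer may
    differ; this is an assumption, not the literal diff): give the
    consolidated variable its final value at its definition instead of
    flipping it from a pure_initcall():

    /* Sketch only: enable the JIT at compile time when
     * CONFIG_BPF_JIT_ALWAYS_ON is set, rather than from an initcall.
     */
    int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);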

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

17 Jan, 2018

1 commit

  • Using dynamic stack_depth tracking in the arm64 JIT is currently
    broken in combination with tail calls. In the prologue, we cache
    ctx->stack_size and adjust the SP register to set up the function
    call stack, and we tear it down again in the epilogue. The problem
    is that when doing a tail call, the cached ctx->stack_size might
    not be the same for the program we jump into.

    One way to fix the problem with minimal overhead is to re-adjust SP in
    emit_bpf_tail_call() and properly adjust it to the current program's
    ctx->stack_size. Tested on Cavium ThunderX ARMv8.

    Fixes: f1c9eed7f437 ("bpf, arm64: take advantage of stack_depth tracking")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

19 Dec, 2017

1 commit

  • Fix the following issue:
    arch/arm64/net/bpf_jit_comp.c: In function 'bpf_int_jit_compile':
    arch/arm64/net/bpf_jit_comp.c:982:18: error: 'image_size' may be used
    uninitialized in this function [-Werror=maybe-uninitialized]

    Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs")
    Reported-by: Arnd Bergmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

18 Dec, 2017

2 commits

  • Similar to x64, add support for bpf-to-bpf calls.
    When a program has calls to in-kernel helpers, the target call
    offset is known at JIT time and the arm64 architecture needs 2
    passes. With bpf-to-bpf calls the dynamically allocated function
    start is unknown until all functions of the program are JITed.
    Therefore (just like x64) the arm64 JIT needs one extra pass over
    the program to emit correct call offsets.

    Implementation detail:
    Avoid being too clever in 64-bit immediate moves and always use 4
    instructions (instead of 3-4 depending on the address) to make
    sure only one extra pass is needed. If some future optimization
    made it worthwhile to optimize 'call 64-bit imm' further, the JIT
    would need to do 4 passes over the program instead of 3 as in this
    patch. For a typical bpf program address the mov needs 3 or 4
    insns, so using an unconditional 4 insns to save the extra pass is
    a worthwhile trade-off at this stage of the JIT.
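
    As a hedged standalone sketch of the "always 4 instructions"
    choice (the register and helper name are placeholders, not the
    kernel's own emitter), a call address can be emitted
    unconditionally as one movz plus three movk:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: emit a 64 bit call address as a fixed movz + 3x movk
     * sequence, so the emitted length never depends on the address
     * value and image offsets stay stable across the extra pass.
     */
    static void emit_addr_mov_sketch(uint64_t addr)
    {
            int shift;

            printf("movz x10, #0x%04x\n", (unsigned int)(addr & 0xffff));
            for (shift = 16; shift < 64; shift += 16)
                    printf("movk x10, #0x%04x, lsl #%d\n",
                           (unsigned int)((addr >> shift) & 0xffff), shift);
    }

    int main(void)
    {
            emit_addr_mov_sketch(0xffff000008f0a4fcULL); /* arbitrary example */
            return 0;
    }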

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     
  • The global bpf_jit_enable variable is tested multiple times in the
    JITs, blinding and verifier core. A malicious root can try to
    toggle it while loading programs. This race condition was
    accounted for and there should be no issues, but it's safer to
    avoid it altogether.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

06 Jul, 2017

1 commit

  • Pull arm64 updates from Will Deacon:

    - RAS reporting via GHES/APEI (ACPI)

    - Indirect ftrace trampolines for modules

    - Improvements to kernel fault reporting

    - Page poisoning

    - Sigframe cleanups and preparation for SVE context

    - Core dump fixes

    - Sparse fixes (mainly relating to endianness)

    - xgene SoC PMU v3 driver

    - Misc cleanups and non-critical fixes

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
    arm64: fix endianness annotation for 'struct jit_ctx' and friends
    arm64: cpuinfo: constify attribute_group structures.
    arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set()
    arm64: ptrace: Remove redundant overrun check from compat_vfp_set()
    arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails
    arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn()
    arm64: fix endianness annotation in get_kaslr_seed()
    arm64: add missing conversion to __wsum in ip_fast_csum()
    arm64: fix endianness annotation in acpi_parking_protocol.c
    arm64: use readq() instead of readl() to read 64bit entry_point
    arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm()
    arm64: fix endianness annotation for aarch64_insn_write()
    arm64: fix endianness annotation in aarch64_insn_read()
    arm64: fix endianness annotation in call_undef_hook()
    arm64: fix endianness annotation for debug-monitors.c
    ras: mark stub functions as 'inline'
    arm64: pass endianness info to sparse
    arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels
    arm64: signal: Allow expansion of the signal frame
    acpi: apei: check for pending errors when probing GHES entries
    ...

    Linus Torvalds
     

01 Jul, 2017

1 commit

  • struct jit_ctx::image is used to store a pointer to the jitted
    instructions, which are always little-endian. These instructions
    are thus correctly converted from native order to little-endian
    before being stored, but the pointer 'image' is declared as
    holding native-order values.

    Fix this by declaring the field as __le32* instead of u32*.
    Do the same for the pointer used in jit_fill_hole() to initialize
    the image.
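
    A minimal sketch of the endianness-clean store pattern this
    enforces (simplified and illustrative, not the exact arm64 JIT
    code; the struct and function names here are assumptions):

    #include <linux/types.h>
    #include <asm/byteorder.h>

    struct jit_ctx_sketch {
            int idx;
            __le32 *image;  /* jited insns are always little-endian */
    };

    static inline void emit_insn(u32 insn, struct jit_ctx_sketch *ctx)
    {
            if (ctx->image)
                    /* explicit conversion keeps sparse happy */
                    ctx->image[ctx->idx] = cpu_to_le32(insn);
            ctx->idx++;
    }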

    Signed-off-by: Luc Van Oostenryck
    Signed-off-by: Will Deacon

    Luc Van Oostenryck
     

08 Jun, 2017

1 commit

  • Will reported that in BPF_XADD we must use a different register in
    the stxr instruction for the status flag, since otherwise we get
    CONSTRAINED UNPREDICTABLE behavior per the architecture. The
    reference manual says [1]:

    If s == t, then one of the following behaviors must occur:

    * The instruction is UNDEFINED.
    * The instruction executes as a NOP.
    * The instruction performs the store to the specified address, but
    the value stored is UNKNOWN.

    Thus, use a different temporary register for the status flag to fix it.

    Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:

    [...]
    0000003c: c85f7d4b ldxr x11, [x10]
    00000040: 8b07016b add x11, x11, x7
    00000044: c80c7d4b stxr w12, x11, [x10]
    00000048: 35ffffac cbnz w12, 0x0000003c
    [...]

    [1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132

    Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
    Reported-by: Will Deacon
    Signed-off-by: Daniel Borkmann
    Acked-by: Will Deacon
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

12 May, 2017

1 commit

  • Shubham was recently asking on netdev why in the arm64 JIT we don't
    multiply the index for accessing the tail call map by 8. That led
    me to test the arm64 JIT wrt tail calls, and it turned out I got a
    NULL pointer dereference on the tail call.

    The buggy access is at:

    prog = array->ptrs[index];
    if (prog == NULL)
    goto out;

    [...]
    00000060: d2800e0a mov x10, #0x70 // #112
    00000064: f86a682a ldr x10, [x1,x10]
    00000068: f862694b ldr x11, [x10,x2]
    0000006c: b40000ab cbz x11, 0x00000080
    [...]

    The code triggering the crash is f862694b. x1 at the time contains
    the address of the bpf array, x10 holds offsetof(struct bpf_array,
    ptrs). Meaning, above we load the pointer to the program at map
    slot 0 into x10. x10 can then be NULL if the slot is not occupied,
    which we later try to access with a user-given offset in x2 that is
    the map index.

    Fix this by emitting the following instead:

    [...]
    00000060: d2800e0a mov x10, #0x70 // #112
    00000064: 8b0a002a add x10, x1, x10
    00000068: d37df04b lsl x11, x2, #3
    0000006c: f86b694b ldr x11, [x10,x11]
    00000070: b40000ab cbz x11, 0x00000084
    [...]

    This basically adds the offset of ptrs to the base address of the
    bpf array we got, and we then access the map with an index * 8
    offset relative to that. The tail call map itself is basically one
    large area with meta data at the head followed by the array of
    prog pointers. This makes tail calls work again; tested on Cavium
    ThunderX ARMv8.

    Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
    Reported-by: Shubham Bansal
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

09 May, 2017

1 commit

  • The set_memory_* functions have moved to set_memory.h. Use that header
    explicitly.

    Link: http://lkml.kernel.org/r/1488920133-27229-4-git-send-email-labbott@redhat.com
    Signed-off-by: Laura Abbott
    Acked-by: Catalin Marinas
    Acked-by: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

03 May, 2017

2 commits

  • When the instruction right before the branch destination is
    a 64 bit load immediate, we currently calculate the wrong
    jump offset in the ctx->offset[] array, as we only account for
    one instruction slot for the 64 bit load immediate although
    it uses two BPF instruction slots. Fix it up by setting the offset
    into the right slot after we have incremented the index.

    Before (ldimm64 test 1):

    [...]
    00000020: 52800007 mov w7, #0x0 // #0
    00000024: d2800060 mov x0, #0x3 // #3
    00000028: d2800041 mov x1, #0x2 // #2
    0000002c: eb01001f cmp x0, x1
    00000030: 54ffff82 b.cs 0x00000020
    00000034: d29fffe7 mov x7, #0xffff // #65535
    00000038: f2bfffe7 movk x7, #0xffff, lsl #16
    0000003c: f2dfffe7 movk x7, #0xffff, lsl #32
    00000040: f2ffffe7 movk x7, #0xffff, lsl #48
    00000044: d29dddc7 mov x7, #0xeeee // #61166
    00000048: f2bdddc7 movk x7, #0xeeee, lsl #16
    0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32
    00000050: f2fdddc7 movk x7, #0xeeee, lsl #48
    [...]

    After (ldimm64 test 1):

    [...]
    00000020: 52800007 mov w7, #0x0 // #0
    00000024: d2800060 mov x0, #0x3 // #3
    00000028: d2800041 mov x1, #0x2 // #2
    0000002c: eb01001f cmp x0, x1
    00000030: 540000a2 b.cs 0x00000044
    00000034: d29fffe7 mov x7, #0xffff // #65535
    00000038: f2bfffe7 movk x7, #0xffff, lsl #16
    0000003c: f2dfffe7 movk x7, #0xffff, lsl #32
    00000040: f2ffffe7 movk x7, #0xffff, lsl #48
    00000044: d29dddc7 mov x7, #0xeeee // #61166
    00000048: f2bdddc7 movk x7, #0xeeee, lsl #16
    0000004c: f2ddddc7 movk x7, #0xeeee, lsl #32
    00000050: f2fdddc7 movk x7, #0xeeee, lsl #48
    [...]

    Also, add a couple of test cases to make sure JITs pass
    this test. Tested on Cavium ThunderX ARMv8. The added
    test cases all pass after the fix.
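
    A hedged sketch of the bookkeeping behind the fix (illustrative
    only, not the literal kernel diff; the loop shape merely
    approximates the JIT's build_body() and its helpers):

    static int build_body_sketch(struct jit_ctx *ctx,
                                 const struct bpf_prog *prog)
    {
            int i;

            for (i = 0; i < prog->len; i++) {
                    const struct bpf_insn *insn = &prog->insnsi[i];
                    int ret = build_insn(insn, ctx);

                    if (ret > 0) {
                            /* ldimm64 consumed a second BPF slot: fill
                             * that slot's offset as well, so branches
                             * landing right after it resolve correctly.
                             */
                            i++;
                            ctx->offset[i] = ctx->idx;
                            continue;
                    }
                    ctx->offset[i] = ctx->idx;
                    if (ret)
                            return ret;
            }
            return 0;
    }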

    Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
    Reported-by: David S. Miller
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: Xi Wang
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and
    therefore completes JITing of all BPF instructions, meaning we can
    remove the 'notyet' label and no longer need to fall back to the
    interpreter when BPF_XADD is used in a program!

    This also brings the arm64 JIT in line with x86_64, s390x, ppc64
    and sparc64, where all current eBPF features are supported.

    BPF_W example from test_bpf:

    .u.insns_int = {
    BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
    BPF_ST_MEM(BPF_W, R10, -40, 0x10),
    BPF_STX_XADD(BPF_W, R10, R0, -40),
    BPF_LDX_MEM(BPF_W, R0, R10, -40),
    BPF_EXIT_INSN(),
    },

    [...]
    00000020: 52800247 mov w7, #0x12 // #18
    00000024: 928004eb mov x11, #0xffffffffffffffd8 // #-40
    00000028: d280020a mov x10, #0x10 // #16
    0000002c: b82b6b2a str w10, [x25,x11]
    // start of xadd mapping:
    00000030: 928004ea mov x10, #0xffffffffffffffd8 // #-40
    00000034: 8b19014a add x10, x10, x25
    00000038: f9800151 prfm pstl1strm, [x10]
    0000003c: 885f7d4b ldxr w11, [x10]
    00000040: 0b07016b add w11, w11, w7
    00000044: 880b7d4b stxr w11, w11, [x10]
    00000048: 35ffffab cbnz w11, 0x0000003c
    // end of xadd mapping:
    [...]

    BPF_DW example from test_bpf:

    .u.insns_int = {
    BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
    BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
    BPF_STX_XADD(BPF_DW, R10, R0, -40),
    BPF_LDX_MEM(BPF_DW, R0, R10, -40),
    BPF_EXIT_INSN(),
    },

    [...]
    00000020: 52800247 mov w7, #0x12 // #18
    00000024: 928004eb mov x11, #0xffffffffffffffd8 // #-40
    00000028: d280020a mov x10, #0x10 // #16
    0000002c: f82b6b2a str x10, [x25,x11]
    // start of xadd mapping:
    00000030: 928004ea mov x10, #0xffffffffffffffd8 // #-40
    00000034: 8b19014a add x10, x10, x25
    00000038: f9800151 prfm pstl1strm, [x10]
    0000003c: c85f7d4b ldxr x11, [x10]
    00000040: 8b07016b add x11, x11, x7
    00000044: c80b7d4b stxr w11, x11, [x10]
    00000048: 35ffffab cbnz w11, 0x0000003c
    // end of xadd mapping:
    [...]

    Tested on Cavium ThunderX ARMv8, test suite results after the patch:

    No JIT: [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
    With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

22 Feb, 2017

1 commit

  • Eric and Willem reported that they recently saw random crashes when
    the JIT was in use and bisected this to 74451e66d516 ("bpf: make
    jited programs visible in traces"). The issue was that the
    consolidation part added bpf_jit_binary_unlock_ro(), which would
    unlock previously read-only memory back to read-write. However,
    DEBUG_SET_MODULE_RONX cannot be used to test for the presence of
    the set_memory_*() functions. We need to use ARCH_HAS_SET_MEMORY
    instead to fix this; also add the corresponding
    bpf_jit_binary_lock_ro() to filter.h.
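
    A hedged sketch of the resulting guard in filter.h (simplified;
    the exact helper bodies may differ from mainline):

    #ifdef CONFIG_ARCH_HAS_SET_MEMORY
    static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
    {
            set_memory_ro((unsigned long)hdr, hdr->pages);
    }

    static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
    {
            set_memory_rw((unsigned long)hdr, hdr->pages);
    }
    #else
    static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
    {
    }

    static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
    {
    }
    #endif /* CONFIG_ARCH_HAS_SET_MEMORY */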

    Fixes: 74451e66d516 ("bpf: make jited programs visible in traces")
    Reported-by: Eric Dumazet
    Reported-by: Willem de Bruijn
    Bisected-by: Eric Dumazet
    Signed-off-by: Daniel Borkmann
    Tested-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

18 Feb, 2017

2 commits

  • A long-standing issue with JITed programs is that stack traces from
    function tracing check whether a given address is kernel code
    through {__,}kernel_text_address(), which checks for code in core
    kernel, modules and dynamically allocated ftrace trampolines. But
    what is still missing is BPF JITed programs (interpreted programs
    are not an issue as __bpf_prog_run() will be attributed to them),
    thus when a stack trace is triggered, the code walking the stack
    won't see any of the JITed ones. The same goes for address
    correlation done from user space by reading /proc/kallsyms. This is
    read by
    tools like perf, but the latter is also useful for permanent live
    tracing with eBPF itself in combination with stack maps when other
    eBPF types are part of the callchain. See offwaketime example on
    dumping stack from a map.

    This work tries to tackle that issue by making the addresses and
    symbols known to the kernel. The lookup from *kernel_text_address()
    is implemented through a latched RB tree that can be read under
    RCU in fast-path that is also shared for symbol/size/offset lookup
    for a specific given address in kallsyms. The slow-path iteration
    through all symbols in the seq file is done via an RCU list, which
    holds a tiny fraction of all exported ksyms, usually below 0.1
    percent. Function symbols are exported as bpf_prog_, in order to
    aid debugging and attribution. This facility is currently enabled
    for
    root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
    is active in any mode. The rationale behind this is that still a lot
    of systems ship with world read permissions on kallsyms thus addresses
    should not get suddenly exposed for them. If that situation gets
    much better in future, we always have the option to change the
    default on this. Likewise, unprivileged programs are not allowed
    to add entries there either, but that is less of a concern as most
    such program types relevant in this context are for root-only anyway.
    If enabled, call graphs and stack traces will then show a correct
    attribution; one example is illustrated below, where the trace is
    now visible in tooling such as perf script --kallsyms=/proc/kallsyms
    and friends.

    Before:

    7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
    f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)

    After:

    7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
    [...]
    7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
    f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
    that a single __weak function in the core that can be overridden
    similarly to the eBPF one. Also remove stale pr_err() mentions
    of bpf_jit_compile.
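
    The pattern described above, roughly (a sketch of the single weak
    core definition; architectures with a cBPF JIT keep overriding it):

    #include <linux/filter.h>

    /* Weak default in the core: does nothing, so eBPF-only
     * architectures no longer need their own empty stub.
     */
    void __weak bpf_jit_compile(struct bpf_prog *prog)
    {
    }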

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Jun, 2016

3 commits

  • Remove superfluous stack frame, saving us 3 instructions for every
    LD_ABS or LD_IND.

    Signed-off-by: Zi Shen Lim
    Signed-off-by: David S. Miller

    Zi Shen Lim
     
  • Remove superfluous stack frame, saving us 3 instructions for
    every JMP_CALL.

    Signed-off-by: Zi Shen Lim
    Signed-off-by: David S. Miller

    Zi Shen Lim
     
  • Add support for JMP_CALL_X (tail call) introduced by commit 04fd61ab36ec
    ("bpf: allow bpf programs to tail-call other bpf programs").

    bpf_tail_call() arguments:
    ctx - context pointer passed to next program
    array - pointer to a map of type BPF_MAP_TYPE_PROG_ARRAY
    index - index inside the array that selects the specific program to run

    In this implementation the arm64 JIT jumps into the callee program
    after the prologue, so the callee program reuses the same stack.
    For tail_call_cnt, we use the callee-saved R26 (which was already
    saved/restored but previously unused by the JIT).

    With this patch a tail call generates the following code on arm64:

    if (index >= array->map.max_entries)
    goto out;

    34: mov x10, #0x10 // #16
    38: ldr w10, [x1,x10]
    3c: cmp w2, w10
    40: b.ge 0x0000000000000074

    if (tail_call_cnt > MAX_TAIL_CALL_CNT)
    goto out;
    tail_call_cnt++;

    44: mov x10, #0x20 // #32
    48: cmp x26, x10
    4c: b.gt 0x0000000000000074
    50: add x26, x26, #0x1

    prog = array->ptrs[index];
    if (prog == NULL)
    goto out;

    54: mov x10, #0x68 // #104
    58: ldr x10, [x1,x10]
    5c: ldr x11, [x10,x2]
    60: cbz x11, 0x0000000000000074

    goto *(prog->bpf_func + prologue_size);

    64: mov x10, #0x20 // #32
    68: ldr x10, [x11,x10]
    6c: add x10, x10, #0x20
    70: br x10
    74:

    Signed-off-by: Zi Shen Lim
    Signed-off-by: David S. Miller

    Zi Shen Lim
     

18 May, 2016

1 commit

  • In the current implementation of the ARM64 eBPF JIT, R23 and R24
    are used as tmp registers, which are callee-saved registers. This
    leads to a variable-sized JIT prologue and epilogue. The recent
    constant blinding change prefers a constant-sized prologue and
    epilogue. The AAPCS reserves R9 ~ R15 as temp registers which do
    not need to be saved/restored across function calls. So, replace
    R23 and R24 with R10 and R11, and remove the tmp_used flag, saving
    2 instructions for some jited BPF programs.

    CC: Daniel Borkmann
    Acked-by: Zi Shen Lim
    Signed-off-by: Yang Shi
    Acked-by: Catalin Marinas
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Yang Shi
     

17 May, 2016

3 commits

  • This patch adds the recently added constant blinding helpers into
    the arm64 eBPF JIT. In the bpf_int_jit_compile() path, the
    requirements are to use the bpf_jit_blind_constants()/
    bpf_jit_prog_release_other() pair for rewriting the program into a
    blinded one, and to map the BPF_REG_AX register to a CPU register.
    The mapping is to x9.
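
    A hedged sketch of the blinding hook-up pattern in a JIT's
    bpf_int_jit_compile() (error handling and the actual code
    generation trimmed; shape follows the description above):

    #include <linux/filter.h>

    struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
    {
            struct bpf_prog *orig_prog = prog;
            bool tmp_blinded = false;
            struct bpf_prog *tmp;

            tmp = bpf_jit_blind_constants(prog);
            if (IS_ERR(tmp))
                    return orig_prog; /* blinding failed, use interpreter */
            if (tmp != prog) {
                    tmp_blinded = true; /* JIT the rewritten (blinded) prog */
                    prog = tmp;
            }

            /* ... actual JIT passes over 'prog' go here ... */

            if (tmp_blinded)
                    bpf_jit_prog_release_other(prog, prog == orig_prog ?
                                               tmp : orig_prog);
            return prog;
    }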

    Signed-off-by: Daniel Borkmann
    Acked-by: Zi Shen Lim
    Acked-by: Yang Shi
    Tested-by: Yang Shi
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Since the blinding is strictly only called from inside eBPF JITs,
    we need to change the signatures of bpf_int_jit_compile() and
    bpf_prog_select_runtime() first, in order to prepare for the fact
    that the eBPF program we're dealing with can change underneath us.
    Hence, for call sites, we need to return the latest prog. No
    functional change in this patch.
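
    For reference, the resulting signatures as described (both now
    hand back the latest, possibly replaced, prog):

    struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
    struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);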

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • There is never a situation where bpf_int_jit_compile() is called
    with either prog as NULL or len as 0, so the checks are unnecessary
    and confusing, as people would just copy them. s390 doesn't have
    them, so no change is needed there.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 May, 2016

1 commit

  • The original implementation commit e54bcde3d69d ("arm64: eBPF JIT
    compiler") had the relevant code paths, but due to an oversight
    would always fail jiting.

    As a result, we had been falling back to the BPF interpreter
    whenever a BPF program had JMP_JSET_{X,K} instructions.

    With this fix, we confirm that the corresponding tests in
    lib/test_bpf continue to pass, and are now also jited.

    ...
    [ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
    [ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
    [ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
    ...
    [ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
    [ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
    [ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
    [ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
    ...

    Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
    Signed-off-by: Zi Shen Lim
    Acked-by: Will Deacon
    Acked-by: Yang Shi
    Signed-off-by: David S. Miller

    Zi Shen Lim
     

18 Jan, 2016

1 commit

  • Code generation functions in arch/arm64/kernel/insn.c previously
    hit a BUG_ON() for invalid parameters. Following the change of that
    behavior, we now need to handle the error case where
    AARCH64_BREAK_FAULT is returned.

    Instead of error-handling on every emit() in the JIT, we add a new
    validation pass at the end of JIT compilation. There's no point in
    running JITed code at run-time only to trap due to
    AARCH64_BREAK_FAULT. Instead, we drop this failed JIT compilation
    and allow the system to gracefully fall back to the BPF
    interpreter.
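
    A sketch of such a validation pass (close in spirit to the arm64
    JIT's final check, though the names here are illustrative):

    /* Scan the finished image; any AARCH64_BREAK_FAULT means some
     * emit() was handed invalid parameters, so reject the JIT result
     * and let the interpreter run the program instead.
     */
    static int validate_code_sketch(struct jit_ctx *ctx)
    {
            int i;

            for (i = 0; i < ctx->idx; i++) {
                    u32 a64_insn = le32_to_cpu(ctx->image[i]);

                    if (a64_insn == AARCH64_BREAK_FAULT)
                            return -1;
            }
            return 0;
    }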

    Signed-off-by: Zi Shen Lim
    Suggested-by: Alexei Starovoitov
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Zi Shen Lim
     

19 Dec, 2015

1 commit

  • Back in the days where eBPF (or back then "internal BPF" ;->) was not
    exposed to user space, and only the classic BPF programs internally
    translated into eBPF programs, we missed the fact that for classic BPF
    A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
    ("net: filter: initialize A and X registers"), and thus classic BPF
    specifics were added to the eBPF interpreter core to work around it.

    This added some confusion for JIT developers later on that take the
    eBPF interpreter code as an example for deriving their JIT. F.e. in
    f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
    least X could leak stack memory. Furthermore, since this is only needed
    for classic BPF translations and not for eBPF (verifier takes care
    that read access to regs cannot be done uninitialized), more complexity
    is added to JITs as they need to determine whether they deal with
    migrations or native eBPF where they can just omit clearing A/X in
    their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
    ("s390/bpf: Only clear A and X for converted BPF programs"). In other
    cases (x86, arm64), A and X is being cleared in the prologue also for
    eBPF case, which is unnecessary.

    Let's move this into the BPF migration in bpf_convert_filter()
    where it actually belongs, while the number of eBPF JITs is still
    small. It can thus be done generically, allowing us to remove the
    quirk from __bpf_prog_run() and to slightly reduce JIT image size
    in the eBPF case, while reducing code duplication on this matter
    in current (and future) eBPF JITs.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Reviewed-by: Michael Holzheu
    Tested-by: Michael Holzheu
    Cc: Zi Shen Lim
    Cc: Yang Shi
    Acked-by: Yang Shi
    Acked-by: Zi Shen Lim
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

04 Dec, 2015

1 commit

  • aarch64 doesn't have a native store-immediate instruction; such an
    operation has to be implemented with the instruction sequence
    below:

    Load immediate into a register
    Store the register
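
    For instance, a word-sized BPF store of immediate 0x10 to fp-40
    lowers to a pair like the one below (taken from the BPF_XADD
    example earlier in this log; register choice is up to the JIT's
    temporaries):

    d280020a mov x10, #0x10 // #16, immediate into temp register
    b82b6b2a str w10, [x25,x11] // store temp register to [fp + off]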

    Signed-off-by: Yang Shi
    CC: Zi Shen Lim
    CC: Xi Wang
    Reviewed-by: Zi Shen Lim
    Signed-off-by: David S. Miller

    Yang Shi