13 Aug, 2016

1 commit

  • While hashing out BPF's current_task_under_cgroup helper bits, it came
    to discussion that the skb_in_cgroup helper name was suboptimally chosen.

    Tejun says:

    So, I think in_cgroup should mean that the object is in that
    particular cgroup while under_cgroup in the subhierarchy of that
    cgroup. Let's rename the other subhierarchy test to under too. I
    think that'd be a lot less confusing going forward.

    [...]

    It's more intuitive and gives us the room to implement the real
    "in" test if ever necessary in the future.

    Since this touches uapi bits, we need to change this as long as v4.8
    is not yet officially released. Thus, change the helper enum and rename
    related bits.

    Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
    Reference: http://patchwork.ozlabs.org/patch/658500/
    Suggested-by: Sargun Dhillon
    Suggested-by: Tejun Heo
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov

    Daniel Borkmann
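
    The "in" versus "under" distinction Tejun describes can be sketched in
    plain userspace C. This is only an illustrative model, not the kernel
    implementation: struct cg_node, cg_in() and cg_under() are hypothetical
    names, and treating a cgroup as being "under" itself is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup node; not the kernel's struct cgroup. */
struct cg_node {
    const struct cg_node *parent; /* NULL for the root of the hierarchy */
};

/* "in": the object sits in exactly this cgroup. */
static bool cg_in(const struct cg_node *obj_cg, const struct cg_node *cg)
{
    return obj_cg == cg;
}

/* "under": the object sits in this cgroup or anywhere in its subhierarchy
 * (assumption: the cgroup itself counts as part of its own subhierarchy). */
static bool cg_under(const struct cg_node *obj_cg, const struct cg_node *cg)
{
    for (const struct cg_node *c = obj_cg; c != NULL; c = c->parent)
        if (c == cg)
            return true;
    return false;
}
```

    With a chain root <- a <- b, an object in b is "in" b only, but "under"
    b, a and root, which is the subhierarchy test the renamed helper performs.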
     

07 Aug, 2016

1 commit


26 Jul, 2016

2 commits

  • This example shows using a kprobe to act as a dnat mechanism to divert
    traffic for arbitrary endpoints. It rewrites the arguments to a syscall
    while they're still in userspace, and before the syscall has a chance
    to copy the argument into kernel space.

    Although this is an example, it also acts as a test because the mapped
    address is 255.255.255.255:555 -> real address, and that's not a legal
    address to connect to. If the helper is broken, the example will fail
    on the intermediate steps, as well as the final step to verify the
    rewrite of userspace memory succeeded.

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     
  • This allows user memory to be written to during the course of a kprobe.
    It shouldn't be used to implement any kind of security mechanism
    because of TOC-TOU attacks, but rather to debug, divert, and
    manipulate execution of semi-cooperative processes.

    Although it uses probe_kernel_write, we limit the address space
    the probe can write into by checking the space with access_ok.
    We do this as opposed to calling copy_to_user directly, in order
    to avoid sleeping. In addition we ensure the thread's current fs
    / segment is USER_DS and the thread isn't exiting nor a kernel thread.

    Given that this feature is meant for experiments, and that it risks
    crashing the system and running programs, we print a warning, along
    with the pid and process name, when a proglet that attempts to use
    this helper is installed.

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

21 Jul, 2016

1 commit


20 Jul, 2016

2 commits

  • Add a sample that rewrites and forwards packets out on the same
    interface. Observed single core forwarding performance of ~10Mpps.

    Since the mlx4 driver under test recycles every single packet page, the
    perf output shows almost exclusively just the ring management and bpf
    program work. Slowdowns are likely occurring due to cache misses.

    Signed-off-by: Brenden Blanco
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     
  • Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
    hook of a link. With the drop-only program, observed single core rate is
    ~20Mpps.

    Other tests were run, for instance without the dropcnt increment or
    without reading from the packet header; the packet rate was mostly
    unchanged.

    $ perf record -a samples/bpf/xdp1 $(
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     

02 Jul, 2016

1 commit

  • test_cgrp2_array_pin.c:
    A userland program that creates a bpf_map (BPF_MAP_TYPE_CGROUP_ARRAY),
    populates/updates it with a cgroup2-backed fd and pins it to a
    file in bpf-fs. The pinned file can be loaded by tc and then used
    by the bpf prog later. This program can also update an existing pinned
    array, which could be useful for debugging/testing purposes.

    test_cgrp2_tc_kern.c:
    A bpf prog which should be loaded by tc. It demonstrates
    the usage of bpf_skb_in_cgroup.

    test_cgrp2_tc.sh:
    A script that glues test_cgrp2_array_pin.c and
    test_cgrp2_tc_kern.c together. The idea is:
    1. Load test_cgrp2_tc_kern.o with tc
    2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
       with a cgroup fd
    3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
       dropped because of a match on the cgroup

    Most of the lines in test_cgrp2_tc.sh are boilerplate
    to set up the cgroup/bpf-fs/net-devices/netns, etc. It is
    not bulletproof on errors but should work well enough and
    give enough debug info if things do not go well.

    Signed-off-by: Martin KaFai Lau
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Tejun Heo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

26 Jun, 2016

1 commit


07 May, 2016

2 commits

  • add a few tests for the "pointer to packet" logic of the verifier

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • parse_simple.c - packet parser example with a single length check that
    filters out udp packets for port 9

    parse_varlen.c - variable length parser that understands multiple vlan headers,
    ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
    The packet is parsed layer by layer with multiple length checks.

    parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
    Same functionality as parse_simple.

    simple = 24.1Mpps per core
    varlen = 22.7Mpps
    ldabs = 21.4Mpps

    The parser using LD_ABS instructions is slower than the full direct
    access parser, even though the latter does more packet accesses and checks.

    These examples demonstrate the choice bpf program authors can make between
    flexibility of the parser vs speed.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
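
    The "single length check" idea behind parse_simple.c can be sketched as
    a userspace model over a raw byte buffer. This is not the sample's
    source: is_udp_port9() and the header-size constants are hypothetical
    names, and the sketch assumes an untagged Ethernet frame carrying IPv4
    without options, so that all header offsets are fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed header sizes assumed by this sketch. */
enum { SK_ETH_HLEN = 14, SK_IPV4_HLEN = 20, SK_UDP_HLEN = 8 };

/* Returns 1 if the frame is IPv4/UDP with destination port 9, else 0. */
static int is_udp_port9(const uint8_t *data, size_t len)
{
    /* One length check that covers all three headers at once. */
    if (len < SK_ETH_HLEN + SK_IPV4_HLEN + SK_UDP_HLEN)
        return 0;

    /* EtherType lives at bytes 12..13 in network order; 0x0800 is IPv4. */
    if (data[12] != 0x08 || data[13] != 0x00)
        return 0;

    const uint8_t *ip = data + SK_ETH_HLEN;
    /* Version 4 with IHL 5 (no options), so the UDP offset is fixed. */
    if (ip[0] != 0x45)
        return 0;
    /* IPv4 protocol field is at offset 9; 17 is UDP. */
    if (ip[9] != 17)
        return 0;

    const uint8_t *udp = ip + SK_IPV4_HLEN;
    /* UDP destination port is at offset 2, big-endian. */
    uint16_t dport = (uint16_t)((udp[2] << 8) | udp[3]);
    return dport == 9;
}
```

    A variable-length parser in the style of parse_varlen.c would instead
    re-check the remaining length after each layer, which is where the
    extra checks mentioned above come from.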
     

04 May, 2016

1 commit


30 Apr, 2016

5 commits

  • Users are likely to manually compile both the LLVM 'llc' and 'clang'
    tools. Thus, also allow redefining CLANG and verify that the command exists.

    Makefile implementation wise, the target that verifies the command has
    been generalized.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • It is not intuitive that 'make' must be run from the top level
    directory with argument "samples/bpf/" to compile these eBPF samples.

    Introduce a kbuild makefile trick that allows make to be run from the
    "samples/bpf/" directory itself. It basically changes to the top level
    directory and calls "make samples/bpf/" with the "/" slash after the
    directory name.

    Also add a clean target that only cleans this directory, by taking
    advantage of the kbuild external module setting M=$PWD.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Getting started with using examples in samples/bpf/ is not
    straightforward. There are several dependencies, and specific
    versions of these dependencies.

    Just compiling the example tools is also slightly obscure, e.g. one
    needs to call make like:

    make samples/bpf/

    Do notice the "/" slash after the directory name.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Naveen N. Rao
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Make compiling samples/bpf more user friendly, by detecting if LLVM
    compiler tool 'llc' is available, and also detect if the 'bpf' target
    is available in this version of LLVM.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • It is practical to be able to redefine the location of the LLVM
    command 'llc', because not all distros have an LLVM version with bpf
    target support. Thus, it is sometimes required to compile LLVM from
    source, and sometimes it is not desired to overwrite the distro's
    default LLVM version.

    This feature was removed with 128d1514be35 ("samples/bpf: Use llc in
    PATH, rather than a hardcoded value").

    Add this feature back. Note that it is possible to redefine LLC
    on the make command line like:

    make samples/bpf/ LLC=~/git/llvm/build/bin/llc

    Fixes: 128d1514be35 ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Acked-by: Naveen N. Rao
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

29 Apr, 2016

1 commit

  • llvm cannot always recognize memset as builtin function and optimize
    it away, so just delete it. It was a leftover from testing
    of bpf_perf_event_output() with large data structures.

    Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example")
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

15 Apr, 2016

2 commits

  • This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
    verifier behaviour.

    [...]
    #84 raw_stack: no skb_load_bytes OK
    #85 raw_stack: skb_load_bytes, no init OK
    #86 raw_stack: skb_load_bytes, init OK
    #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
    #88 raw_stack: skb_load_bytes, spilled regs corruption OK
    #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
    #90 raw_stack: skb_load_bytes, spilled regs + data OK
    #91 raw_stack: skb_load_bytes, invalid access 1 OK
    #92 raw_stack: skb_load_bytes, invalid access 2 OK
    #93 raw_stack: skb_load_bytes, invalid access 3 OK
    #94 raw_stack: skb_load_bytes, invalid access 4 OK
    #95 raw_stack: skb_load_bytes, invalid access 5 OK
    #96 raw_stack: skb_load_bytes, invalid access 6 OK
    #97 raw_stack: skb_load_bytes, large access OK
    Summary: 98 PASSED, 0 FAILED

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Remove the zero initialization in the sample programs where appropriate.
    Note that this is an optimization which is now possible, old programs
    still doing the zero initialization are just fine as well. Also, make
    sure we don't have padding issues when we don't memset() the entire
    struct anymore.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Apr, 2016

1 commit


08 Apr, 2016

3 commits

  • The first microbenchmark does:

        fd = open("/proc/self/comm");
        for (;;) {
            write(fd, "test");
        }

    and on 4 cpus in parallel:

                                              writes per sec
    base (no tracepoints, no kprobes)         930k
    with kprobe at __set_task_comm()          420k
    with tracepoint at task:task_rename       730k

    For the kprobe case, a full bpf program manually fetches oldcomm and
    newcomm via bpf_probe_read.
    For the tracepoint case, the bpf program does nothing, since the
    arguments are copied by the tracepoint.

    The 2nd microbenchmark does:

        fd = open("/dev/urandom");
        for (;;) {
            read(fd, buf);
        }

    and on 4 cpus in parallel:

                                              reads per sec
    base (no tracepoints, no kprobes)         300k
    with kprobe at urandom_read()             279k
    with tracepoint at random:urandom_read    290k

    The bpf progs attached to the kprobe and tracepoint are no-ops.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
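
    The write loop of the first microbenchmark can be sketched as a small
    userspace function. This is not the benchmark's actual source:
    bench_write() is a hypothetical helper, the bounded iteration count is
    an assumption added so the loop terminates, and timing and the 4-cpu
    parallelism are left out.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open `path` for writing and issue `iters` write() calls of `buf`.
 * Returns the number of complete writes, or -1 if open() fails. */
static long bench_write(const char *path, const char *buf, long iters)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(buf);
    long ok = 0;
    for (long i = 0; i < iters; i++)
        if (write(fd, buf, len) == (ssize_t)len)
            ok++;

    close(fd);
    return ok;
}
```

    Calling bench_write("/proc/self/comm", "test", n) exercises the task
    rename path on each iteration, which is where the kprobe at
    __set_task_comm() or the task:task_rename tracepoint would fire.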
     
  • modify offwaketime to work with sched/sched_switch tracepoint
    instead of kprobe into finish_task_switch

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Recognize "tracepoint/" section name prefix and attach the program
    to that tracepoint.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

07 Apr, 2016

3 commits

  • Add the necessary definitions for building bpf samples on ppc.

    Since ppc doesn't store function return address on the stack, modify how
    PT_REGS_RET() and PT_REGS_FP() work.

    Also, introduce PT_REGS_IP() to access the instruction pointer.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Ananth N Mavinakayanahalli
    Cc: Michael Ellerman
    Signed-off-by: Naveen N. Rao
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Naveen N. Rao
     
  • While at it, remove the generation of .s files and fix some typos in the
    related comment.

    Cc: Alexei Starovoitov
    Cc: David S. Miller
    Cc: Daniel Borkmann
    Cc: Ananth N Mavinakayanahalli
    Cc: Michael Ellerman
    Signed-off-by: Naveen N. Rao
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Naveen N. Rao
     
  • Building BPF samples is failing with the below error:

    samples/bpf/map_perf_test_user.c: In function ‘main’:
    samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
    initializer but incomplete type
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    ^
    samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
    undeclared (first use in this function)
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    ^
    samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
    identifier is reported only once for each function it appears in
    samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
    struct initializer [enabled by default]
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    ^
    samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
    for ‘r’) [enabled by default]
    samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
    struct initializer [enabled by default]
    samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
    for ‘r’) [enabled by default]
    samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
    isn’t known
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    ^
    samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
    function ‘setrlimit’ [-Wimplicit-function-declaration]
    setrlimit(RLIMIT_MEMLOCK, &r);
    ^
    samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
    undeclared (first use in this function)
    setrlimit(RLIMIT_MEMLOCK, &r);
    ^
    samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
    [-Wunused-variable]
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    ^
    make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1

    Fix this by including the necessary header file.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Ananth N Mavinakayanahalli
    Cc: Michael Ellerman
    Acked-by: Alexei Starovoitov
    Signed-off-by: Naveen N. Rao
    Signed-off-by: David S. Miller

    Naveen N. Rao
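
    The errors above are what a compiler reports when struct rlimit,
    RLIM_INFINITY, RLIMIT_MEMLOCK and setrlimit() are used without their
    declarations. On Linux these come from <sys/resource.h>, which is
    presumably the header the fix adds (the commit message does not name
    it). A minimal sketch, with set_memlock() as a hypothetical wrapper:

```c
#include <sys/resource.h> /* struct rlimit, setrlimit(), RLIMIT_MEMLOCK */

/* Set the locked-memory limit; returns 0 on success, -1 on error. */
static int set_memlock(rlim_t soft, rlim_t hard)
{
    struct rlimit r = { .rlim_cur = soft, .rlim_max = hard };

    return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

    The sample's own call corresponds to
    set_memlock(RLIM_INFINITY, RLIM_INFINITY), which generally needs
    sufficient privileges to raise the hard limit.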
     

09 Mar, 2016

7 commits


20 Feb, 2016

1 commit

  • This is a simplified version of Brendan Gregg's offwaketime:
    This program shows kernel stack traces and task names that were blocked and
    "off-CPU", along with the stack traces and task names for the threads that woke
    them, and the total elapsed time from when they blocked to when they were woken
    up. The combined stacks, task names, and total time is summarized in kernel
    context for efficiency.

    Example:
    $ sudo ./offwaketime | flamegraph.pl > demo.svg
    Open demo.svg in the browser as FlameGraph visualization.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

06 Feb, 2016

3 commits


17 Nov, 2015

1 commit

  • commit 338d4f49d6f7114a017d294ccf7374df4f998edc
    ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h
    into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is
    incompatible with llvm so it will cause BPF samples build failure for ARM64.
    Since sysreg.h is useless for BPF samples, just exclude it from Makefile via
    defining __ASM_SYSREG_H.

    Signed-off-by: Yang Shi
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Yang Shi
     

03 Nov, 2015

1 commit

  • This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
    and BPF_OBJ_GET commands can be used.

    Example with maps:

    # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
    bpf: map fd:3 (Success)
    bpf: pin ret:(0,Success)
    bpf: fd:3 u->(1:42) ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):42 ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
    bpf: get fd:3 (Success)
    bpf: fd:3 u->(1:24) ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):24 ret:(0,Success)

    # ./fds_example -F /sys/fs/bpf/m2 -P -m
    bpf: map fd:3 (Success)
    bpf: pin ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):0 ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m2 -G -m
    bpf: get fd:3 (Success)

    Example with progs:

    # ./fds_example -F /sys/fs/bpf/p -P -p
    bpf: prog fd:3 (Success)
    bpf: pin ret:(0,Success)
    bpf sock:4
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann