29 Nov, 2016

1 commit

  • The files "sampleip_kern.c" and "trace_event_kern.c" directly access
    "ctx->regs.ip" which is not available on s390x. Fix this and use the
    PT_REGS_IP() macro instead.

    Also fix the macro for s390x and use "psw.addr" from "pt_regs".
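
    The portability issue can be illustrated with a self-contained sketch;
    the structs below are simplified stand-ins for the real <asm/ptrace.h>
    layouts, and the macro mirrors the role PT_REGS_IP() plays in the
    samples:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the per-arch register layouts: x86_64 keeps
 * the instruction pointer in regs->ip, while s390x has no 'ip' field at
 * all -- the address lives in psw.addr instead. */
struct mock_psw           { uint64_t mask, addr; };
struct mock_pt_regs_s390x { struct mock_psw psw; };
struct mock_pt_regs_x86   { uint64_t ip; };

/* A PT_REGS_IP()-style macro hides the per-arch field name, which is
 * why the samples were switched from ctx->regs.ip to the macro. */
#ifdef __s390x__
typedef struct mock_pt_regs_s390x mock_pt_regs;
#define PT_REGS_IP(x) ((x)->psw.addr)
#else
typedef struct mock_pt_regs_x86 mock_pt_regs;
#define PT_REGS_IP(x) ((x)->ip)
#endif

uint64_t sample_ip(mock_pt_regs *regs)
{
    return PT_REGS_IP(regs);   /* portable across both layouts */
}
```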

    Reported-by: Zvonko Kosic
    Signed-off-by: Michael Holzheu
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Michael Holzheu
     

13 Nov, 2016

1 commit

  • The test creates two netns, ns1 and ns2. The host (the default netns)
    has an ipip or ip6tnl dev configured for tunneling traffic to ns2.

    ns1 --(ping VIPs)--> host --(ipip/ip6tnl)--> ns2 (VIPs at loopback)

    The test is to have ns1 pinging VIPs configured at the loopback
    interface in ns2.

    The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
    at lo@ns2). [Note: 0x66 => 102].

    At ns1, the VIPs are routed _via_ the host.

    At the host, bpf programs are installed at the veth to redirect packets
    from a veth to the ipip/ip6tnl. The test is configured in a way so
    that both ingress and egress can be tested.

    At ns2, the ipip/ip6tnl dev is configured with the local and remote address
    specified. The return path is routed to the dev ipip/ip6tnl.

    During egress test, the host also locally tests pinging the VIPs to ensure
    that bpf_redirect at egress also works for the direct egress (i.e. not
    forwarding from dev ve1 to ve2).

    Acked-by: Alexei Starovoitov
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

30 Oct, 2016

1 commit

  • Some of the sample files are causing issues when they are loaded with tc
    and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
    file as program/map/etc sections are not present, which can be easily
    spotted with readelf(1).

    Currently, the BPF samples include some of the kernel headers, and
    mid-term we should change them to refrain from this. When dynamic
    debugging is enabled, the build bails out due to an undeclared
    KBUILD_MODNAME, which is easily overlooked since clang emits this along
    with other noisy warnings from the various header includes, and llc
    still generates an ELF file with the mentioned characteristics. For
    just playing around with the BPF examples, this can be a bit of a
    hurdle.

    Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
    done in xdp*_kern samples already.

    Fixes: 65d472fb007d ("samples/bpf: add 'pointer to packet' tests")
    Fixes: 6afb1e28b859 ("samples/bpf: Add tunnel set/get tests.")
    Fixes: a3f74617340b ("cgroup: bpf: Add an example to do cgroup checking in BPF")
    Reported-by: Chandrasekar Kannan
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

29 Sep, 2016

1 commit

  • Suppose you have a map array value that is something like this

    struct foo {
        unsigned iter;
        int array[SOME_CONSTANT];
    };

    You can easily insert this into an array, but you cannot modify the contents of
    foo->array[] after the fact. This is because we have no way to verify we won't
    go off the end of the array at verification time. This patch provides a start
    for this work. We accomplish this by keeping track of a minimum and maximum
    value a register could be while we're checking the code. Then at the time we
    try to do an access into a MAP_VALUE we verify that the maximum offset into that
    region is a valid access into that memory region. So in practice, code such as
    this

    unsigned index = 0;

    if (foo->iter >= SOME_CONSTANT)
        foo->iter = index;
    else
        index = foo->iter++;
    foo->array[index] = bar;

    would be allowed, as we can verify that index will always be between 0 and
    SOME_CONSTANT-1. If you wish to use signed values you'll have to have an extra
    check to make sure the index isn't less than 0, or do something like index %=
    SOME_CONSTANT.
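
    The now-verifiable pattern can be exercised as ordinary userspace C (a
    sketch; SOME_CONSTANT is an arbitrary size here, and the branches map
    onto the bounds the verifier tracks):

```c
#include <assert.h>

#define SOME_CONSTANT 16

struct foo {
    unsigned iter;
    int array[SOME_CONSTANT];
};

/* Mirrors the access pattern from the commit message: after the branch,
 * 'index' is provably in [0, SOME_CONSTANT-1], so the verifier can
 * accept the array store. */
void record(struct foo *f, int bar)
{
    unsigned index = 0;

    if (f->iter >= SOME_CONSTANT)
        f->iter = index;            /* wrap around */
    else
        index = f->iter++;          /* index < SOME_CONSTANT here */
    f->array[index] = bar;
}
```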

    Signed-off-by: Josef Bacik
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Josef Bacik
     

27 Sep, 2016

2 commits


21 Sep, 2016

1 commit

  • Add a couple of test cases for the direct write and the negative size
    issue, and also adjust the direct packet access test4: it asserts that
    writes are not possible, but since we've just added support for writes,
    we need to invert the verdict to ACCEPT, of course. Summary: 133
    PASSED, 0 FAILED.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

17 Sep, 2016

2 commits

  • The test creates 3 namespaces with veth connected via bridge.
    The first two namespaces simulate two different hosts with the same
    IPv4 and IPv6 addresses configured on the tunnel interface, and they
    communicate with the outside world via standard tunnels.
    The third namespace creates a collect_md tunnel that is driven by a
    BPF program which selects a different remote host (either the first
    or second namespace) based on the tcp dest port number while the tcp
    dst ip is the same.
    This scenario is a rough approximation of a load balancer use case.
    The tests check both traditional tunnel configuration and collect_md mode.
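
    The selection step of the collect_md program can be sketched in plain
    C; the ports and addresses below are illustrative, not the ones used
    by the test script, and in the real program the chosen address would
    be fed to bpf_skb_set_tunnel_key():

```c
#include <assert.h>
#include <stdint.h>

/* Remote endpoint selection as the collect_md BPF program would do it:
 * same destination IP for the flows, but the TCP destination port
 * steers the packet to one of two remote hosts.  Both addresses and
 * the port number are hypothetical. */
uint32_t select_remote(uint16_t tcp_dport)
{
    const uint32_t remote1 = 0x0a010164;   /* 10.1.1.100 (hypothetical) */
    const uint32_t remote2 = 0x0a010264;   /* 10.1.2.100 (hypothetical) */

    return tcp_dport == 5555 ? remote1 : remote2;
}
```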

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Extend the existing tests for vxlan, geneve and gre to include the
    IPIP tunnel. It tests both traditional tunnel configuration and
    dynamic configuration via bpf helpers.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

09 Sep, 2016

1 commit

  • LLVM can generate code that tests for direct packet access via
    skb->data/data_end in a way that currently gets rejected by the
    verifier, example:

    [...]
    7: (61) r3 = *(u32 *)(r6 +80)
    8: (61) r9 = *(u32 *)(r6 +76)
    9: (bf) r2 = r9
    10: (07) r2 += 54
    11: (3d) if r3 >= r2 goto pc+12
    R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
    R9=pkt(id=0,off=0,r=0) R10=fp
    12: (18) r4 = 0xffffff7a
    14: (05) goto pc+430
    [...]

    from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
    R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
    24: (7b) *(u64 *)(r10 -40) = r1
    25: (b7) r1 = 0
    26: (63) *(u32 *)(r6 +56) = r1
    27: (b7) r2 = 40
    28: (71) r8 = *(u8 *)(r9 +20)
    invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

    The reason why this gets rejected despite a proper test is that we
    currently call find_good_pkt_pointers() only in case where we detect
    tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
    derived, for example, from a register of type pkt(id=Y,off=0,r=0)
    pointing to skb->data. find_good_pkt_pointers() then fills the range
    in the current branch to pkt(id=Y,off=0,r=Z) on success.

    For above case, we need to extend that to recognize pkt_end >= rX
    pattern and mark the other branch that is taken on success with the
    appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
    Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
    only two practical options to test for from what LLVM could have
    generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=).

    After the patch:

    [...]
    7: (61) r3 = *(u32 *)(r6 +80)
    8: (61) r9 = *(u32 *)(r6 +76)
    9: (bf) r2 = r9
    10: (07) r2 += 54
    11: (3d) if r3 >= r2 goto pc+12
    R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
    R9=pkt(id=0,off=0,r=0) R10=fp
    12: (18) r4 = 0xffffff7a
    14: (05) goto pc+430
    [...]

    from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
    R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
    24: (7b) *(u64 *)(r10 -40) = r1
    25: (b7) r1 = 0
    26: (63) *(u32 *)(r6 +56) = r1
    27: (b7) r2 = 40
    28: (71) r8 = *(u8 *)(r9 +20)
    29: (bf) r1 = r8
    30: (25) if r8 > 0x3c goto pc+47
    R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
    R9=pkt(id=0,off=0,r=54) R10=fp
    31: (b7) r1 = 1
    [...]

    Verifier test cases are also added in this work, one that demonstrates
    the mentioned example here and one that tries a bad packet access for
    the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
    pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
    with both test variants (>, >=).
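
    In C terms, the two comparison styles the verifier must handle look
    roughly like this (a userspace sketch: plain pointers stand in for
    skb->data/data_end, the function names are ours, and the 54-byte
    bound matches the dump above):

```c
#include <assert.h>

#define HDR_LEN 54  /* eth + ip + tcp, as in the verifier dump */

/* Form already handled: rX > pkt_end on the "too short" branch. */
int load_byte_jgt(const unsigned char *data, const unsigned char *data_end)
{
    if (data + HDR_LEN > data_end)
        return -1;             /* packet too short, bail out */
    return data[20];           /* off=20 size=1, as in the dump */
}

/* Form LLVM also emits: pkt_end >= rX on the "enough room" branch.
 * Before the fix, find_good_pkt_pointers() did not mark the range
 * here, so the access below was rejected. */
int load_byte_jge(const unsigned char *data, const unsigned char *data_end)
{
    if (data_end >= data + HDR_LEN)
        return data[20];
    return -1;
}
```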

    Fixes: 969bf05eb3ce ("bpf: direct packet access")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Sep, 2016

2 commits

  • sample instruction pointer and frequency count in a BPF map

    Signed-off-by: Brendan Gregg
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brendan Gregg
     
  • The bpf program is called 50 times a second and does
    hashmap[kern&user_stackid]++. Its primary purpose is to check that key
    bpf helpers like map lookup, update, get_stackid, trace_printk and ctx
    access are all working.
    It checks:
    - PERF_COUNT_HW_CPU_CYCLES on all cpus
    - PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
    - PERF_COUNT_SW_CPU_CLOCK on all cpus
    - PERF_COUNT_SW_CPU_CLOCK for current process
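
    The counting scheme can be sketched in userspace C; here a flat table
    stands in for the BPF hash map, and the two ids stand in for what
    bpf_get_stackid() would return:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 1024

static uint64_t counts[MAX_ENTRIES];  /* stand-in for the BPF hash map */

/* Pack the kernel and user stack ids into one key, as the sample's
 * hashmap[kern&user_stackid]++ does, then bump the counter for that
 * combined stack trace. */
uint64_t count_stack(uint32_t kern_stackid, uint32_t user_stackid)
{
    uint64_t key = ((uint64_t)kern_stackid << 32) | user_stackid;
    return ++counts[key % MAX_ENTRIES];
}
```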

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

20 Aug, 2016

1 commit

  • The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
    and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE. A native
    tunnel device is created in a namespace to interact with a lwtunnel
    device out of the namespace, with metadata enabled. The bpf_skb_set_*
    program is attached to tc egress and bpf_skb_get_* is attached to egress
    qdisc. A ping between two tunnels is used to verify correctness, and
    the result of bpf_skb_get_* is printed by bpf_trace_printk.

    Signed-off-by: William Tu
    Signed-off-by: David S. Miller

    William Tu
     

18 Aug, 2016

1 commit


13 Aug, 2016

3 commits

  • Test various corner cases of helper function access to the packet
    via crafted XDP programs.

    Signed-off-by: Aaron Yue
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Aaron Yue
     
  • While hashing out BPF's current_task_under_cgroup helper bits, it came
    to discussion that the skb_in_cgroup helper name was suboptimally chosen.

    Tejun says:

    So, I think in_cgroup should mean that the object is in that
    particular cgroup while under_cgroup in the subhierarchy of that
    cgroup. Let's rename the other subhierarchy test to under too. I
    think that'd be a lot less confusing going forward.

    [...]

    It's more intuitive and gives us the room to implement the real
    "in" test if ever necessary in the future.

    Since this touches uapi bits, we need to change this as long as v4.8
    is not yet officially released. Thus, change the helper enum and rename
    related bits.

    Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
    Reference: http://patchwork.ozlabs.org/patch/658500/
    Suggested-by: Sargun Dhillon
    Suggested-by: Tejun Heo
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov

    Daniel Borkmann
     
  • This test has a BPF program which records, in a map, the pid of the
    last process to call the sync syscall within a given cgroup.

    The user mode program creates its own mount namespace, and mounts the
    cgroupsv2 hierarchy in there, as on all current test systems
    (Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
    Once it does this, it proceeds to test.

    The test checks for positive and negative conditions. It ensures that
    when it's part of a given cgroup, its pid is captured in the map,
    and that when it leaves the cgroup, this doesn't happen.

    It populates a cgroups arraymap prior to execution in userspace. This means
    that the program must be run in the same cgroups namespace as the programs
    that are being traced.

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Tejun Heo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

11 Aug, 2016

1 commit

  • Commit 555c8a8623a3 ("bpf: avoid stack copy and use skb ctx for event output")
    started using 20 of the initially reserved upper 32 bits of the 'flags'
    argument in bpf_perf_event_output(). Adjust the corresponding prototype
    in samples/bpf/bpf_helpers.h.

    Signed-off-by: Adam Barth
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Adam Barth
     

07 Aug, 2016

1 commit


26 Jul, 2016

2 commits

  • This example shows using a kprobe to act as a dnat mechanism to divert
    traffic for arbitrary endpoints. It rewrites the arguments to a syscall
    while they're still in userspace, before the syscall has a chance
    to copy the argument into kernel space.

    Although this is an example, it also acts as a test because the mapped
    address is 255.255.255.255:555 -> real address, and that's not a legal
    address to connect to. If the helper is broken, the example will fail
    on the intermediate steps, as well as the final step to verify the
    rewrite of userspace memory succeeded.
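
    The rewrite itself can be modeled in userspace C; the replacement
    endpoint 127.0.0.1:8080 below is an illustrative choice, while the
    real sample performs the equivalent store on the live syscall
    arguments via bpf_probe_write_user:

```c
#include <assert.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* If the connect() argument carries the impossible mapped address
 * 255.255.255.255:555, overwrite it with a real endpoint (hypothetical
 * 127.0.0.1:8080 here).  Returns 1 if the rewrite fired. */
int dnat_rewrite(struct sockaddr_in *sa)
{
    if (sa->sin_addr.s_addr != htonl(0xffffffffu) ||
        sa->sin_port != htons(555))
        return 0;                          /* not the magic mapping */
    sa->sin_addr.s_addr = htonl(0x7f000001);   /* 127.0.0.1 */
    sa->sin_port = htons(8080);
    return 1;
}
```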

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     
  • This allows user memory to be written to during the course of a kprobe.
    It shouldn't be used to implement any kind of security mechanism
    because of TOC-TOU attacks, but rather to debug, divert, and
    manipulate execution of semi-cooperative processes.

    Although it uses probe_kernel_write, we limit the address space
    the probe can write into by checking the space with access_ok.
    We do this as opposed to calling copy_to_user directly, in order
    to avoid sleeping. In addition we ensure the thread's current fs /
    segment is USER_DS and the thread isn't exiting nor a kernel thread.

    Given this feature is meant for experiments, and it has a risk of
    crashing the system and running programs, we print a warning when a
    program that attempts to use this helper is installed, along with the
    pid and process name.

    Signed-off-by: Sargun Dhillon
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

21 Jul, 2016

1 commit


20 Jul, 2016

2 commits

  • Add a sample that rewrites and forwards packets out on the same
    interface. Observed single core forwarding performance of ~10Mpps.

    Since the mlx4 driver under test recycles every single packet page, the
    perf output shows almost exclusively just the ring management and bpf
    program work. Slowdowns are likely occurring due to cache misses.
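
    The core of such a program can be sketched as plain C over a raw
    buffer (XDP_TX and XDP_DROP use their uapi values; the real sample
    runs this as a BPF program at the XDP hook):

```c
#include <assert.h>
#include <string.h>

enum { XDP_DROP = 1, XDP_TX = 3 };  /* values from <linux/bpf.h> */

#define ETH_ALEN 6

/* Swap destination and source MAC so the frame is bounced back out of
 * the interface it arrived on, then signal XDP_TX. */
int xdp_tx_rewrite(unsigned char *data, unsigned char *data_end)
{
    unsigned char tmp[ETH_ALEN];

    if (data + 2 * ETH_ALEN > data_end)       /* need both MAC fields */
        return XDP_DROP;
    memcpy(tmp, data, ETH_ALEN);              /* save dst MAC */
    memcpy(data, data + ETH_ALEN, ETH_ALEN);  /* src -> dst */
    memcpy(data + ETH_ALEN, tmp, ETH_ALEN);   /* old dst -> src */
    return XDP_TX;
}
```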

    Signed-off-by: Brenden Blanco
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     
  • Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
    hook of a link. With the drop-only program, observed single core rate is
    ~20Mpps.

    Other tests were run, for instance without the dropcnt increment or
    without reading from the packet header, the packet rate was mostly
    unchanged.
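
    The drop-only logic amounts to little more than this (a userspace
    sketch; dropcnt stands in for the sample's per-protocol BPF array
    map):

```c
#include <assert.h>
#include <stdint.h>

enum { XDP_DROP = 1, XDP_PASS = 2 };  /* values from <linux/bpf.h> */

static uint64_t dropcnt[256];  /* stand-in for the per-protocol map */

/* Read the IPv4 protocol byte (offset 14 + 9 in an Ethernet frame),
 * count it, and drop the packet -- the essence of the xdp1 sample. */
int xdp_drop_count(const unsigned char *data, const unsigned char *data_end)
{
    if (data + 14 + 20 > data_end)   /* Ethernet + minimal IPv4 header */
        return XDP_PASS;
    dropcnt[data[14 + 9]]++;
    return XDP_DROP;
}
```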

    $ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
    [...]

    Signed-off-by: Brenden Blanco
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Brenden Blanco
     

02 Jul, 2016

1 commit

  • test_cgrp2_array_pin.c:
    A userland program that creates a bpf_map (BPF_MAP_TYPE_CGROUP_ARRAY),
    populates/updates it with a cgroup2's backed fd and pins it to a
    bpf-fs file. The pinned file can be loaded by tc and then used
    by the bpf prog later. This program can also update an existing pinned
    array and it could be useful for debugging/testing purposes.

    test_cgrp2_tc_kern.c:
    A bpf prog which should be loaded by tc. It is to demonstrate
    the usage of bpf_skb_in_cgroup.

    test_cgrp2_tc.sh:
    A script that glues the test_cgrp2_array_pin.c and
    test_cgrp2_tc_kern.c together. The idea is like:
    1. Load the test_cgrp2_tc_kern.o by tc
    2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
    with a cgroup fd
    3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
    dropped because of a match on the cgroup

    Most of the lines in test_cgrp2_tc.sh are boilerplate
    to set up the cgroup/bpf-fs/net-devices/netns...etc. It is
    not bulletproof on errors but should work well enough and
    give enough debug info if things do not go well.

    Signed-off-by: Martin KaFai Lau
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Tejun Heo
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

26 Jun, 2016

1 commit


07 May, 2016

2 commits

  • Add a few tests for the "pointer to packet" logic of the verifier.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • parse_simple.c - packet parser example with a single length check that
    filters out udp packets for port 9

    parse_varlen.c - variable length parser that understands multiple vlan
    headers, ipip, ipip6 and ip options to filter out udp or tcp packets
    on port 9. The packet is parsed layer by layer with multiple length
    checks.

    parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
    Same functionality as parse_simple.

    simple = 24.1Mpps per core
    varlen = 22.7Mpps
    ldabs = 21.4Mpps

    The parser with LD_ABS instructions is slower than the full direct
    access parser, even though the latter does more packet accesses and
    checks.

    These examples demonstrate the choice bpf program authors can make between
    flexibility of the parser vs speed.
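
    The parse_simple approach, one length check followed by fixed-offset
    loads, can be sketched in plain C (offsets assume untagged Ethernet +
    option-less IPv4 + UDP; TC_ACT_* use their uapi values):

```c
#include <assert.h>
#include <stdint.h>

#define TC_ACT_OK   0   /* uapi value: pass the packet */
#define TC_ACT_SHOT 2   /* uapi value: drop */

/* Single length check up front, then fixed-offset field loads:
 * drop UDP packets destined to port 9, pass everything else. */
int parse_simple(const uint8_t *data, const uint8_t *data_end)
{
    if (data + 14 + 20 + 8 > data_end)
        return TC_ACT_OK;                 /* too short: let it pass */
    if (data[12] != 0x08 || data[13] != 0x00)
        return TC_ACT_OK;                 /* not ETH_P_IP */
    if (data[14 + 9] != 17)
        return TC_ACT_OK;                 /* not IPPROTO_UDP */
    uint16_t dport = (uint16_t)((data[14 + 20 + 2] << 8) | data[14 + 20 + 3]);
    return dport == 9 ? TC_ACT_SHOT : TC_ACT_OK;
}
```

    parse_varlen classifies the same way, but re-checks the remaining
    length at every layer (vlan, ipip, ip options) instead of once.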

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

04 May, 2016

1 commit


30 Apr, 2016

5 commits

  • Users are likely to manually compile both the LLVM 'llc' and 'clang'
    tools. Thus, also allow redefining CLANG and verify that the command
    exists.

    Makefile-wise, the target that verifies a command exists has been
    generalized.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • It is not intuitive that 'make' must be run from the top level
    directory with argument "samples/bpf/" to compile these eBPF samples.

    Introduce a kbuild makefile trick that allows make to be run from the
    "samples/bpf/" directory itself. It basically changes to the top level
    directory and calls "make samples/bpf/" with the "/" slash after the
    directory name.

    Also add a clean target that only cleans this directory, by taking
    advantage of the kbuild external module setting M=$PWD.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Getting started with using examples in samples/bpf/ is not
    straightforward. There are several dependencies, and specific
    versions of these dependencies.

    Just compiling the example tools is also slightly obscure, e.g. one
    needs to call make like:

    make samples/bpf/

    Do notice the "/" slash after the directory name.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Naveen N. Rao
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Make compiling samples/bpf more user friendly, by detecting if LLVM
    compiler tool 'llc' is available, and also detect if the 'bpf' target
    is available in this version of LLVM.

    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • It is practical to be able to redefine the location of the LLVM
    command 'llc', because not all distros have an LLVM version with bpf
    target support. Thus, it is sometimes required to compile LLVM from
    source, and sometimes it is not desired to overwrite the distro's
    default LLVM version.

    This feature was removed with 128d1514be35 ("samples/bpf: Use llc in
    PATH, rather than a hardcoded value").

    Add this feature back. Note that it is possible to redefine the LLC
    command on the make command line like:

    make samples/bpf/ LLC=~/git/llvm/build/bin/llc

    Fixes: 128d1514be35 ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Alexei Starovoitov
    Acked-by: Naveen N. Rao
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

29 Apr, 2016

1 commit

  • LLVM cannot always recognize memset as a builtin function and optimize
    it away, so just delete it. It was a leftover from testing of
    bpf_perf_event_output() with large data structures.

    Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example")
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

15 Apr, 2016

2 commits

  • This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
    verifier behaviour.

    [...]
    #84 raw_stack: no skb_load_bytes OK
    #85 raw_stack: skb_load_bytes, no init OK
    #86 raw_stack: skb_load_bytes, init OK
    #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
    #88 raw_stack: skb_load_bytes, spilled regs corruption OK
    #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
    #90 raw_stack: skb_load_bytes, spilled regs + data OK
    #91 raw_stack: skb_load_bytes, invalid access 1 OK
    #92 raw_stack: skb_load_bytes, invalid access 2 OK
    #93 raw_stack: skb_load_bytes, invalid access 3 OK
    #94 raw_stack: skb_load_bytes, invalid access 4 OK
    #95 raw_stack: skb_load_bytes, invalid access 5 OK
    #96 raw_stack: skb_load_bytes, invalid access 6 OK
    #97 raw_stack: skb_load_bytes, large access OK
    Summary: 98 PASSED, 0 FAILED

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Remove the zero initialization in the sample programs where appropriate.
    Note that this is an optimization which is now possible, old programs
    still doing the zero initialization are just fine as well. Also, make
    sure we don't have padding issues when we don't memset() the entire
    struct anymore.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Apr, 2016

1 commit


08 Apr, 2016

2 commits

  • The first microbenchmark does:

    fd = open("/proc/self/comm");
    for (;;)
        write(fd, "test");

    and on 4 cpus in parallel:

                                            writes per sec
    base (no tracepoints, no kprobes)       930k
    with kprobe at __set_task_comm()        420k
    with tracepoint at task:task_rename     730k

    For kprobe, the bpf program manually fetches oldcomm and newcomm via
    bpf_probe_read. For tracepoint, the bpf program does nothing, since
    the arguments are copied by the tracepoint.

    The 2nd microbenchmark does:

    fd = open("/dev/urandom");
    for (;;)
        read(fd, buf);

    and on 4 cpus in parallel:

                                            reads per sec
    base (no tracepoints, no kprobes)       300k
    with kprobe at urandom_read()           279k
    with tracepoint at random:urandom_read  290k

    bpf progs attached to kprobe and tracepoint are noop.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Modify offwaketime to work with the sched/sched_switch tracepoint
    instead of a kprobe on finish_task_switch.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov