13 Feb, 2017

1 commit

  • If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
    to the given cgroup the descendent cgroup will be able to override
    effective bpf program that was inherited from this cgroup.
    By default it's not passed, therefore override is disallowed.

    Examples:
    1.
    prog X attached to /A with default
    prog Y fails to attach to /A/B and /A/B/C
    Everything under /A runs prog X

    2.
    prog X attached to /A with allow_override.
    prog Y fails to attach to /A/B with default (non-override)
    prog M attached to /A/B with allow_override.
    Everything under /A/B runs prog M only.

    3.
    prog X attached to /A with allow_override.
    prog Y fails to attach to /A with default.
    The user has to detach first to switch the mode.

    In the future this behavior may be extended with a chain of
    non-overridable programs.

    Also fix the bug where detach from cgroup where nothing is attached
    was not throwing error. Return ENOENT in such case.

    Add several testcases and adjust libbpf.

    Fixes: 3007098494be ("cgroup: add support for eBPF programs")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: Tejun Heo
    Acked-by: Daniel Mack
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

20 Dec, 2016

1 commit

  • Now that libbpf under tools/lib/bpf/* is synced with the version from
    samples/bpf, we can get rid most of the libbpf library here.

    Committer notes:

    Built it in a docker fedora rawhide container and ran it in the f25 host, seems
    to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
    doesn't seem to have introduced problems and Joe said he tested it with
    all the entries in samples/bpf/ and other code he found:

    [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install

    [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
    [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
    make[1]: Entering directory '/tmp/build/linux'
    CHK include/config/kernel.release
    HOSTCC scripts/basic/fixdep
    GEN ./Makefile
    CHK include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK include/generated/utsrelease.h
    HOSTCC scripts/basic/bin2c
    HOSTCC arch/x86/tools/relocs_32.o
    HOSTCC arch/x86/tools/relocs_64.o
    LD samples/bpf/built-in.o

    HOSTCC samples/bpf/fds_example.o
    HOSTCC samples/bpf/sockex1_user.o
    /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
    /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    insns, insns_cnt, "GPL", 0,
    ^~~~~
    In file included from /git/linux/samples/bpf/libbpf.h:5:0,
    from /git/linux/samples/bpf/bpf_load.h:4,
    from /git/linux/samples/bpf/fds_example.c:15:
    /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
    int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
    ^~~~~~~~~~~~~~~~
    HOSTCC samples/bpf/sockex2_user.o

    HOSTCC samples/bpf/xdp_tx_iptunnel_user.o
    clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD samples/bpf/tc_l2_redirect

    HOSTLD samples/bpf/lwt_len_hist
    HOSTLD samples/bpf/xdp_tx_iptunnel
    make[1]: Leaving directory '/tmp/build/linux'
    [root@f5065a7d6272 linux]#

    And then, in the host:

    [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
    /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
    [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
    [root@jouet bpf]# file offwaketime
    offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
    [root@jouet bpf]# readelf -SW offwaketime
    offwaketime offwaketime_kern.o offwaketime_user.o
    [root@jouet bpf]# readelf -SW offwaketime_kern.o
    There are 11 section headers, starting at offset 0x700:

    Section Headers:
    [Nr] Name Type Address Off Size ES Flg Lk Inf Al
    [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
    [ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1
    [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
    [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
    [ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8
    [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
    [ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8
    [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
    [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
    [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
    [10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8
    Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
    qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
    firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
    swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
    [root@jouet bpf]#

    Signed-off-by: Joe Stringer
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Cc: netdev@vger.kernel.org
    Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
    Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
    [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     

16 Dec, 2016

1 commit

  • Switch all of the sample code to use the function names from
    tools/lib/bpf so that they're consistent with that, and to declare their
    own log buffers. This allow the next commit to be purely devoted to
    getting rid of the duplicate library in samples/bpf.

    Committer notes:

    Testing it:

    On a fedora rawhide container, with clang/llvm 3.9, sharing the host
    linux kernel git tree:

    # make O=/tmp/build/linux/ headers_install
    # make O=/tmp/build/linux -C samples/bpf/

    Since I forgot to make it privileged, just tested it outside the
    container, using what it generated:

    # uname -a
    Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
    # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
    # ls -la offwaketime
    -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
    # file offwaketime
    offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
    # readelf -SW offwaketime_kern.o | grep PROGBITS
    [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
    [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
    [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
    [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
    [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
    [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
    # ./offwaketime | head -5
    swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
    CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
    Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
    firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
    JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
    #

    Signed-off-by: Joe Stringer
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     

30 Nov, 2016

1 commit

  • This patch modifies test_cgrp2_attach to use getopt so we can use standard
    command line parsing.

    It also adds an option to run the program in detach only mode. This does
    not attach a new filter at the cgroup, but only runs the detach command.

    Lastly, it changes the attach code to not detach and then attach. It relies
    on the 'hotswap' behaviour of CGroup BPF programs to be able to change
    in-place. If detach-then-attach behaviour needs to be tested, the example
    can be run in detach only mode prior to attachment.

    Signed-off-by: Sargun Dhillon
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Sargun Dhillon
     

26 Nov, 2016

1 commit

  • Add a simple userpace program to demonstrate the new API to attach eBPF
    programs to cgroups. This is what it does:

    * Create arraymap in kernel with 4 byte keys and 8 byte values

    * Load eBPF program

    The eBPF program accesses the map passed in to store two pieces of
    information. The number of invocations of the program, which maps
    to the number of packets received, is stored to key 0. Key 1 is
    incremented on each iteration by the number of bytes stored in
    the skb.

    * Detach any eBPF program previously attached to the cgroup

    * Attach the new program to the cgroup using BPF_PROG_ATTACH

    * Once a second, read map[0] and map[1] to see how many bytes and
    packets were seen on any socket of tasks in the given cgroup.

    The program takes a cgroup path as 1st argument, and either "ingress"
    or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
    which will make the generated eBPF program return 0 instead of 1, so
    the kernel will drop the packet.

    libbpf gained two new wrappers for the new syscall commands.

    Signed-off-by: Daniel Mack
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Mack