09 Mar, 2016

7 commits


20 Feb, 2016

1 commit

  • This is a simplified version of Brendan Gregg's offwaketime:
    This program shows kernel stack traces and task names that were blocked and
    "off-CPU", along with the stack traces and task names for the threads that woke
    them, and the total elapsed time from when they blocked to when they were woken
    up. The combined stacks, task names, and total time are summarized in kernel
    context for efficiency.

    Example:
    $ sudo ./offwaketime | flamegraph.pl > demo.svg
    Open demo.svg in a browser to view it as a FlameGraph visualization.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
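The offwaketime description above boils down to "sum the blocked time per (blocked task, waker) pair where the events happen, and only export the totals". The sketch below models that aggregation in plain user-space C. It is an assumption-laden simplification: the real tool keys on kernel stack traces as well as task names and keeps its table in BPF maps, while here only the two comm strings are used and `wake_key`, `account_wakeup()` and `total_for()` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define TASK_COMM_LEN 16   /* same width as the kernel's task->comm */
#define MAX_ENTRIES 64     /* hypothetical table size for this sketch */

/* Aggregation key: the task that blocked and the task that woke it.
 * (Stack ids, which the real tool also uses, are omitted here.) */
struct wake_key {
    char target[TASK_COMM_LEN];
    char waker[TASK_COMM_LEN];
};

struct wake_entry {
    struct wake_key key;
    uint64_t total_ns;
    int used;
};

/* Entries are only appended, never removed, so used slots are contiguous. */
static struct wake_entry table[MAX_ENTRIES];

/* Add one blocked interval; returns 0 on success, -1 if the table is full. */
int account_wakeup(const char *target, const char *waker,
                   uint64_t blocked_at_ns, uint64_t woken_at_ns)
{
    struct wake_key k;

    memset(&k, 0, sizeof(k));              /* zero padding so memcmp works */
    strncpy(k.target, target, TASK_COMM_LEN - 1);
    strncpy(k.waker, waker, TASK_COMM_LEN - 1);

    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!table[i].used) {              /* first free slot: new key */
            table[i].key = k;
            table[i].total_ns = woken_at_ns - blocked_at_ns;
            table[i].used = 1;
            return 0;
        }
        if (memcmp(&table[i].key, &k, sizeof(k)) == 0) {
            table[i].total_ns += woken_at_ns - blocked_at_ns;
            return 0;
        }
    }
    return -1;
}

/* Total blocked time recorded for a (target, waker) pair, or 0 if unseen. */
uint64_t total_for(const char *target, const char *waker)
{
    for (int i = 0; i < MAX_ENTRIES && table[i].used; i++)
        if (strcmp(table[i].key.target, target) == 0 &&
            strcmp(table[i].key.waker, waker) == 0)
            return table[i].total_ns;
    return 0;
}
```

Only the summed totals ever leave the table, which is the efficiency point the commit message makes about summarizing in kernel context.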
     

06 Feb, 2016

3 commits


18 Nov, 2015

1 commit

  • Pull networking fixes from David Miller:

    1) Fix list tests in netfilter ingress support, from Florian Westphal.

    2) Fix reversal of input and output interfaces in ingress hook
    invocation, from Pablo Neira Ayuso.

    3) We have a use after free in r8169, caught by Dave Jones, fixed by
    Francois Romieu.

    4) Splice use-after-free fix in AF_UNIX from Hannes Frederic Sowa.

    5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
    a) Don't create clone routes not managed by the fib6 tree
    b) Don't forget to check expiration of DST_NOCACHE routes.
    c) Handle rt->dst.from == NULL properly.

    6) Several AF_PACKET fixes wrt transport header setting and SKB
    protocol setting, from Daniel Borkmann.

    7) Fix thunder driver crash on shutdown, from Pavel Fedin.

    8) Several Mellanox driver fixes (max MTU calculations, use of correct
    DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
    Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.

    9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
    certain registers, etc.) from Neil Armstrong.

    10) Make sure to disable preemption while updating per-cpu stats of ip
    tunnels, from Jason A. Donenfeld.

    11) Various ARM64 bpf JIT fixes, from Yang Shi.

    12) Flush icache properly in ARM JITs, from Daniel Borkmann.

    13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
    Nagai.

    14) Fix netdev feature propagation for devices not implementing
    ->ndo_set_features(). From Nikolay Aleksandrov.

    15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.

    16) RAW socket code increments incorrect SNMP counters, fix from Ben
    Cartwright-Cox.

    17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.

    18) Fix handling of VLAN headers on stacked devices when REORDER is
    disabled. From Vlad Yasevich.

    19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
    Sabrina Dubroca.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
    MAINTAINERS: Update Mellanox's Eth NIC driver entries
    net/core: revert "net: fix __netdev_update_features return.." and add comment
    af_unix: take receive queue lock while appending new skb
    rtnetlink: fix frame size warning in rtnl_fill_ifinfo
    net: use skb_clone to avoid alloc_pages failure.
    packet: Use PAGE_ALIGNED macro
    packet: Don't check frames_per_block against negative values
    net: phy: Use interrupts when available in NOLINK state
    phy: marvell: Add support for 88E1540 PHY
    arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
    macvlan: fix leak in macvlan_handle_frame
    ipvlan: fix use after free of skb
    ipvlan: fix leak in ipvlan_rcv_frame
    vlan: Do not put vlan headers back on bridge and macvlan ports
    vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
    via-velocity: unconditionally drop frames with bad l2 length
    ipg: Remove ipg driver
    dl2k: Add support for IP1000A-based cards
    snmp: Remove duplicate OUTMCAST stat increment
    net: thunder: Check for driver data in nicvf_remove()
    ...

    Linus Torvalds
     

17 Nov, 2015

1 commit

  • commit 338d4f49d6f7114a017d294ccf7374df4f998edc
    ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h
    into futex.h and uaccess.h. However, the inline assembly used by
    asm/sysreg.h is incompatible with LLVM, so it causes the BPF samples
    build to fail on ARM64. Since sysreg.h is not needed by the BPF samples,
    just exclude it in the Makefile by defining __ASM_SYSREG_H.

    Signed-off-by: Yang Shi
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Yang Shi
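The trick described above works because asm/sysreg.h wraps its body in an include guard named __ASM_SYSREG_H: if that macro is already defined, the preprocessor skips the whole header. The single-file sketch below simulates this. The first `#define` stands in for a `-D__ASM_SYSREG_H` flag on the compiler command line (the exact Makefile line is not quoted in the commit message), the guarded block stands in for the header, and `SYSREG_HEADER_WAS_PARSED` is a hypothetical marker.

```c
/* Stand-in for passing -D__ASM_SYSREG_H when compiling the BPF samples. */
#define __ASM_SYSREG_H 1

/* --- simulated asm/sysreg.h (hypothetical stand-in for the real header) --- */
#ifndef __ASM_SYSREG_H
#define __ASM_SYSREG_H
/* The LLVM-incompatible inline assembly would live here. */
#define SYSREG_HEADER_WAS_PARSED 1
#endif
/* --- end of simulated header --- */

#ifndef SYSREG_HEADER_WAS_PARSED
#define SYSREG_HEADER_WAS_PARSED 0   /* guard was pre-defined: body skipped */
#endif

/* Reports whether the guarded header body was compiled (0 = skipped). */
int sysreg_header_was_parsed(void)
{
    return SYSREG_HEADER_WAS_PARSED;
}
```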
     

14 Nov, 2015

1 commit

  • Pull SCSI target updates from Nicholas Bellinger:
    "This series contains HCH's changes to absorb configfs attribute
    ->show() + ->store() function pointer usage from its original
    tree-wide consumers, into common configfs code.

    It includes usb-gadget, target w/ drivers, netconsole and ocfs2
    changes to realize the improved simplicity, that now renders the
    original include/target/configfs_macros.h CPP magic for fabric drivers
    and others, unnecessary and obsolete.

    And with common code in place, new configfs attributes can be added
    more easily than ever before.

    Note, there are further improvements in-flight from other folks for
    v4.5 code in configfs land, plus a number of target fixes for post -rc1
    code"

    In the meantime, a new user of the now-removed old configfs API came in
    through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce
    an abstraction for System Trace Module devices").

    This merge resolution comes from Alexander Shishkin, who updated his stm
    class tracing abstraction to account for the removal of the old
    show_attribute and store_attribute methods in commit 517982229f78
    ("configfs: remove old API") from this pull. As Alexander says about
    that patch:

    "There's no need to keep an extra wrapper structure per item and the
    awkward show_attribute/store_attribute item ops are no longer needed.

    This patch converts policy code to the new api, all the while making
    the code quite a bit smaller and easier on the eyes.

    Signed-off-by: Alexander Shishkin "

    That patch was folded into the merge so that the tree should be fully
    bisectable.

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
    configfs: remove old API
    ocfs2/cluster: use per-attribute show and store methods
    ocfs2/cluster: move locking into attribute store methods
    netconsole: use per-attribute show and store methods
    target: use per-attribute show and store methods
    spear13xx_pcie_gadget: use per-attribute show and store methods
    dlm: use per-attribute show and store methods
    usb-gadget/f_serial: use per-attribute show and store methods
    usb-gadget/f_phonet: use per-attribute show and store methods
    usb-gadget/f_obex: use per-attribute show and store methods
    usb-gadget/f_uac2: use per-attribute show and store methods
    usb-gadget/f_uac1: use per-attribute show and store methods
    usb-gadget/f_mass_storage: use per-attribute show and store methods
    usb-gadget/f_sourcesink: use per-attribute show and store methods
    usb-gadget/f_printer: use per-attribute show and store methods
    usb-gadget/f_midi: use per-attribute show and store methods
    usb-gadget/f_loopback: use per-attribute show and store methods
    usb-gadget/ether: use per-attribute show and store methods
    usb-gadget/f_acm: use per-attribute show and store methods
    usb-gadget/f_hid: use per-attribute show and store methods
    ...

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Most of the changes are clean ups and small fixes. Some of them have
    stable tags to them. I searched through my INBOX just as the merge
    window opened and found lots of patches to pull. I ran them through
    all my tests and they were in linux-next for a few days.

    Features added this release:
    ----------------------------

    - Module globbing. You can now filter function tracing to several
    modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)

    - Tracer specific options are now visible even when the tracer is not
    active. It was rather annoying that you could only see and modify
    tracer options after enabling the tracer. Now they are in the
    options/ directory even when the tracer is not active. However,
    in the trace_options file they are still only visible when the
    tracer is active.

    - Trace options are now per instance (although some of the tracer
    specific options are global)

    - New tracefs file: set_event_pid. If any pid is added to this file,
    then all events in the instance that do not belong to a listed pid
    are filtered out. The sched_switch and sched_wakeup events also
    check the next and wakee pids, respectively"

    * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
    tracefs: Fix refcount imbalance in start_creating()
    tracing: Put back comma for empty fields in boot string parsing
    tracing: Apply tracer specific options from kernel command line.
    tracing: Add some documentation about set_event_pid
    ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
    tracing: Allow dumping traces without tracking trace started cpus
    ring_buffer: Fix more races when terminating the producer in the benchmark
    ring_buffer: Do no not complete benchmark reader too early
    tracing: Remove redundant TP_ARGS redefining
    tracing: Rename max_stack_lock to stack_trace_max_lock
    tracing: Allow arch-specific stack tracer
    recordmcount: arm64: Replace the ignored mcount call into nop
    recordmcount: Fix endianness handling bug for nop_mcount
    tracepoints: Fix documentation of RCU lockdep checks
    tracing: ftrace_event_is_function() can return boolean
    tracing: is_legal_op() can return boolean
    ring-buffer: rb_event_is_commit() can return boolean
    ring-buffer: rb_per_cpu_empty() can return boolean
    ring_buffer: ring_buffer_empty{cpu}() can return boolean
    ring-buffer: rb_is_reader_page() can return boolean
    ...

    Linus Torvalds
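The set_event_pid rule described in the pull message can be modelled as a small predicate. The sketch below is a user-space illustration under stated assumptions: an empty pid list means "no filtering", an event passes if the task that fired it is listed, and sched_switch/sched_wakeup additionally pass when the next task or the wakee is listed. `enum ev_type`, `struct event` and `event_passes()` are hypothetical names, not ftrace internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical event kinds for this sketch. */
enum ev_type { EV_OTHER, EV_SCHED_SWITCH, EV_SCHED_WAKEUP };

struct event {
    enum ev_type type;
    pid_t current_pid;  /* task running when the event fired */
    pid_t target_pid;   /* next task (sched_switch) or wakee (sched_wakeup) */
};

static bool pid_in_set(const pid_t *pids, size_t n, pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (pids[i] == pid)
            return true;
    return false;
}

/* Keep an event if no pids are set, if it came from a listed pid, or if it
 * is a sched_switch/sched_wakeup whose next task or wakee is listed. */
bool event_passes(const pid_t *pids, size_t n, const struct event *ev)
{
    if (n == 0)
        return true;
    if (pid_in_set(pids, n, ev->current_pid))
        return true;
    if (ev->type == EV_SCHED_SWITCH || ev->type == EV_SCHED_WAKEUP)
        return pid_in_set(pids, n, ev->target_pid);
    return false;
}
```

The extra check on the target pid is what lets a filtered trace still show when a listed task is switched in or woken up by an unlisted one.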
     

03 Nov, 2015

2 commits

  • This patch adds a couple of stand-alone examples of how the BPF_OBJ_PIN
    and BPF_OBJ_GET commands can be used.

    Example with maps:

    # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
    bpf: map fd:3 (Success)
    bpf: pin ret:(0,Success)
    bpf: fd:3 u->(1:42) ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):42 ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
    bpf: get fd:3 (Success)
    bpf: fd:3 u->(1:24) ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):24 ret:(0,Success)

    # ./fds_example -F /sys/fs/bpf/m2 -P -m
    bpf: map fd:3 (Success)
    bpf: pin ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
    bpf: get fd:3 (Success)
    bpf: fd:3 l->(1):0 ret:(0,Success)
    # ./fds_example -F /sys/fs/bpf/m2 -G -m
    bpf: get fd:3 (Success)

    Example with progs:

    # ./fds_example -F /sys/fs/bpf/p -P -p
    bpf: prog fd:3 (Success)
    bpf: pin ret:(0,Success)
    bpf sock:4
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
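At the system-call level, the two commands exercised by fds_example take a pathname inside a mounted bpf filesystem (such as /sys/fs/bpf/m above) and, for pinning, the fd of an existing map or program. The sketch below shows one way to issue them through the raw bpf(2) syscall using the uapi `union bpf_attr` from <linux/bpf.h>. The wrapper names `obj_pin()` and `obj_get()` are hypothetical; the sample's own helper functions may be named and structured differently.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* bpf_attr carries user pointers as 64-bit integers. */
static __u64 ptr_to_u64(const void *ptr)
{
    return (__u64)(unsigned long)ptr;
}

/* Pin an existing map or program fd at `pathname` (inside bpffs).
 * Returns 0 on success, -1 with errno set on failure. */
int obj_pin(int fd, const char *pathname)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));   /* unused fields must be zero */
    attr.pathname = ptr_to_u64(pathname);
    attr.bpf_fd = fd;
    return syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Re-open a pinned object; returns a new fd, or -1 with errno set. */
int obj_get(const char *pathname)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = ptr_to_u64(pathname);
    return syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A process that pins a map and a later process that calls obj_get() on the same path end up with fds referring to the same map, which is what the "-P" then "-G" runs above demonstrate.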
     
  • The commit 889204278ccf ("tracing: Update trace-event-sample with
    TRACE_SYSTEM_VAR documentation") changed TRACE_SYSTEM to 'sample-trace',
    but didn't make the corresponding change to its name in the comments.

    Link: http://lkml.kernel.org/r/1443599650-23680-1-git-send-email-zhang.chunyan@linaro.org

    Signed-off-by: Chunyan Zhang
    Signed-off-by: Steven Rostedt

    Chunyan Zhang
     

01 Nov, 2015

1 commit


28 Oct, 2015

1 commit


22 Oct, 2015

1 commit

  • Performance test and example of bpf_perf_event_output().
    A kprobe is attached to sys_write(), and a trivial BPF program streams
    pid+cookie into user space via the PERF_COUNT_SW_BPF_OUTPUT event.

    Usage:
    $ sudo ./bld_x64/samples/bpf/trace_output
    recv 2968913 events per sec

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
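On the user-space side, a consumer of the pid+cookie stream described above needs to check each record and turn a count over a time window into the "events per sec" figure. The sketch below assumes a record layout of two 64-bit fields and an arbitrary fixed cookie value; `struct sample`, `COOKIE`, `count_valid()` and `events_per_sec()` are hypothetical and not taken from the sample's source.

```c
#include <stddef.h>
#include <stdint.h>

#define COOKIE 0x12345678ULL  /* hypothetical marker written by the program */

/* Assumed layout of one record emitted by the BPF program. */
struct sample {
    uint64_t pid;
    uint64_t cookie;
};

/* Count records carrying the expected cookie; mismatches go to *bad. */
size_t count_valid(const struct sample *s, size_t n, size_t *bad)
{
    size_t ok = 0;

    *bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].cookie == COOKIE)
            ok++;
        else
            (*bad)++;
    }
    return ok;
}

/* Convert an event count over an interval in nanoseconds to events/sec. */
double events_per_sec(uint64_t events, uint64_t elapsed_ns)
{
    return elapsed_ns ? (double)events * 1e9 / (double)elapsed_ns : 0.0;
}
```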
     

14 Oct, 2015

1 commit

  • Remove the old show_attribute and store_attribute methods and update
    the documentation. Also replace the two C samples with a single new
    one in the proper samples directory where people expect to find it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Nicholas Bellinger

    Christoph Hellwig
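The change above replaces item-level show_attribute/store_attribute dispatch with callbacks carried by each attribute (in the kernel, `struct configfs_attribute` gains `show` and `store` members). The user-space sketch below mimics that shape: every attribute owns its callbacks, and a missing `store` makes the attribute read-only. All names here (`demo_item`, `demo_attr`, `attr_show()`, `attr_store()`) are hypothetical, and the lookup loop only imitates what the configfs file layer does for a given attribute file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical item with two values. */
struct demo_item {
    int storeme;
    int showme;
};

/* Each attribute carries its own callbacks (NULL store = read-only). */
struct demo_attr {
    const char *name;
    ssize_t (*show)(struct demo_item *item, char *page);
    ssize_t (*store)(struct demo_item *item, const char *page, size_t len);
};

static ssize_t storeme_show(struct demo_item *item, char *page)
{
    return sprintf(page, "%d\n", item->storeme);
}

static ssize_t storeme_store(struct demo_item *item, const char *page, size_t len)
{
    item->storeme = atoi(page);   /* sketch: no input validation */
    return (ssize_t)len;
}

static ssize_t showme_show(struct demo_item *item, char *page)
{
    return sprintf(page, "%d\n", item->showme++);
}

static const struct demo_attr demo_attrs[] = {
    { "storeme", storeme_show, storeme_store },
    { "showme",  showme_show,  NULL },
};
#define NATTRS (sizeof(demo_attrs) / sizeof(demo_attrs[0]))

/* Dispatch a read to the named attribute; -1 if it does not exist. */
ssize_t attr_show(struct demo_item *item, const char *name, char *page)
{
    for (size_t i = 0; i < NATTRS; i++)
        if (strcmp(demo_attrs[i].name, name) == 0)
            return demo_attrs[i].show ? demo_attrs[i].show(item, page) : -1;
    return -1;
}

/* Dispatch a write; -1 if the attribute is missing or read-only. */
ssize_t attr_store(struct demo_item *item, const char *name,
                   const char *page, size_t len)
{
    for (size_t i = 0; i < NATTRS; i++)
        if (strcmp(demo_attrs[i].name, name) == 0)
            return demo_attrs[i].store ? demo_attrs[i].store(item, page, len) : -1;
    return -1;
}
```

No per-item wrapper structure is needed: the attribute table alone says what can be read or written, which is the simplification the commit message describes.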
     

13 Oct, 2015

1 commit

  • Add new tests samples/bpf/test_verifier:

    unpriv: return pointer
    checks that a pointer cannot be returned from the eBPF program

    unpriv: add const to pointer
    unpriv: add pointer to pointer
    unpriv: neg pointer
    checks that pointer arithmetic is disallowed

    unpriv: cmp pointer with const
    unpriv: cmp pointer with pointer
    checks that comparison of pointers is disallowed.
    The only allowed case is 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'

    unpriv: check that printk is disallowed
    since bpf_trace_printk is not available to unprivileged users

    unpriv: pass pointer to helper function
    checks that pointers cannot be passed to functions that expect integers.
    If a function expects a pointer, the verifier allows only that type of
    pointer; for example, the first argument of bpf_map_lookup_elem() must be
    a pointer to a map (applies to non-root as well).

    unpriv: indirectly pass pointer on stack to helper function
    checks that a pointer stored on the stack cannot be used as part of the
    key passed into bpf_map_lookup_elem()

    unpriv: mangle pointer on stack 1
    unpriv: mangle pointer on stack 2
    checks that writing into a stack slot that already contains a pointer
    is disallowed

    unpriv: read pointer from stack in small chunks
    checks that a read of fewer than 8 bytes from a stack slot that contains
    a pointer is disallowed

    unpriv: write pointer into ctx
    checks that storing pointers into skb fields is disallowed

    unpriv: write pointer into map elem value
    checks that storing pointers into element values is disallowed
    For example:
    int bpf_prog(struct __sk_buff *skb)
    {
        u32 key = 0;
        u64 *value = bpf_map_lookup_elem(&map, &key);

        if (value)
            *value = (u64) skb;   /* stores a pointer into a map value */
        return 0;
    }
    will be rejected.

    unpriv: partial copy of pointer
    checks that a 32-bit register mov from a register containing
    a pointer is disallowed

    unpriv: pass pointer to tail_call
    checks that passing a pointer as an index into bpf_tail_call
    is disallowed

    unpriv: cmp map pointer with zero
    checks that comparing a map pointer with a constant is disallowed

    unpriv: write into frame pointer
    checks that frame pointer is read-only (applies to root too)

    unpriv: cmp of frame pointer
    checks that R10 cannot be used in comparisons

    unpriv: cmp of stack pointer
    checks that Rx = R10 - imm is ok, but comparing Rx is not

    unpriv: obfuscate stack pointer
    checks that Rx = R10 - imm is ok, but Rx -= imm is not

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
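To make the first test above ("unpriv: return pointer") concrete, the sketch below builds the two-instruction program `r0 = r10; exit` with the uapi `struct bpf_insn` and runs it through a toy check that tracks which registers hold a stack pointer. This is a deliberately tiny model, not the kernel verifier: it understands only 64-bit MOV (register and immediate) and EXIT, and `toy_reject()` is a hypothetical name.

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one unprivileged-mode rule: a program must not exit with a
 * pointer in R0. Returns true if the program should be rejected. */
bool toy_reject(const struct bpf_insn *insns, size_t n)
{
    bool is_ptr[MAX_BPF_REG] = { false };

    is_ptr[BPF_REG_10] = true;   /* R10 is the read-only frame pointer */
    for (size_t i = 0; i < n; i++) {
        const struct bpf_insn *in = &insns[i];

        if (in->dst_reg >= MAX_BPF_REG || in->src_reg >= MAX_BPF_REG)
            return true;                                 /* bad register */
        if (in->code == (BPF_ALU64 | BPF_MOV | BPF_X))
            is_ptr[in->dst_reg] = is_ptr[in->src_reg];   /* dst = src */
        else if (in->code == (BPF_ALU64 | BPF_MOV | BPF_K))
            is_ptr[in->dst_reg] = false;                 /* dst = imm */
        else if (in->code == (BPF_JMP | BPF_EXIT))
            return is_ptr[BPF_REG_0];   /* reject if R0 leaks a pointer */
    }
    return true;   /* no exit instruction reached: reject */
}
```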
     

02 Oct, 2015

2 commits

  • Conflicts:
    net/dsa/slave.c

    net/dsa/slave.c simply had overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit 3033f14ab78c ("clone: support passing tls argument via C rather
    than pt_regs magic") introduced _do_fork(), which allows passing the @tls
    parameter.

    The old do_fork() is defined only for architectures that are not yet
    ready for this approach and do not define HAVE_COPY_THREAD_TLS.

    Let's use _do_fork() in the kprobe examples to make them work again on
    all architectures.

    Signed-off-by: Petr Mladek
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Thiago Macieira
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

18 Sep, 2015

1 commit

  • Existing bpf_clone_redirect() helper clones skb before redirecting
    it to RX or TX of destination netdev.
    Introduce bpf_redirect() helper that does that without cloning.

    Benchmarked with two hosts using 10G ixgbe NICs.
    One host is doing line rate pktgen.
    Another host is configured as:
    $ tc qdisc add dev $dev ingress
    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
    so it receives the packet on $dev and immediately xmits it on $dev + 1
    The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
    that does bpf_clone_redirect() and performance is 2.0 Mpps

    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
    which is using bpf_redirect() - 2.4 Mpps

    and using cls_bpf with integrated actions as:
    $ tc filter add dev $dev root pref 10 \
    bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
    performance is 2.5 Mpps

    To summarize:
    u32+act_bpf using clone_redirect - 2.0 Mpps
    u32+act_bpf using redirect - 2.4 Mpps
    cls_bpf using redirect - 2.5 Mpps

    For comparison, the Linux bridge in this setup does 2.1 Mpps
    and ixgbe rx + drop in ip_rcv does 7.8 Mpps.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

13 Aug, 2015

1 commit

  • There are two improvements in this patch:
    1. Fix the build warnings;
    2. Add function read_trace_pipe() to print the result on
    the screen;

    Before this patch, the result could only be read through
    /sys/kernel/debug/tracing/trace_pipe, and nothing was printed on the
    screen. With this patch applied, the result is printed on the screen.
    $ ./tracex6
    ...
    tracex6-705 [003] d..1 131.428593: : CPU-3 19981414
    sshd-683 [000] d..1 131.428727: : CPU-0 221682321
    sshd-683 [000] d..1 131.428821: : CPU-0 221808766
    sshd-683 [000] d..1 131.428950: : CPU-0 221982984
    sshd-683 [000] d..1 131.429045: : CPU-0 222111851
    tracex6-705 [003] d..1 131.429168: : CPU-3 20757551
    sshd-683 [000] d..1 131.429170: : CPU-0 222281240
    sshd-683 [000] d..1 131.429261: : CPU-0 222403340
    sshd-683 [000] d..1 131.429378: : CPU-0 222561024
    ...

    Signed-off-by: Kaixu Xia
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Kaixu Xia
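A read_trace_pipe()-style helper, as described above, is essentially a read/write loop from the trace_pipe file to the terminal. The sketch below is an assumption about its shape rather than a copy of the sample: `copy_trace_pipe()` is a hypothetical name, and the path is a parameter so the loop can also be exercised on ordinary files. On the real /sys/kernel/debug/tracing/trace_pipe, read() blocks waiting for new data instead of returning 0, so the loop runs until the program is interrupted.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy everything readable from `path` to `out`.
 * Returns the number of bytes copied, or -1 if `path` cannot be opened. */
long copy_trace_pipe(const char *path, FILE *out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        fflush(out);   /* show trace lines as soon as they arrive */
        total += n;
    }
    close(fd);
    return total;
}
```

A sample would call it as `copy_trace_pipe("/sys/kernel/debug/tracing/trace_pipe", stdout)` after loading its BPF program, which produces output like the tracex6 lines above.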
     

10 Aug, 2015

1 commit


27 Jul, 2015

1 commit


23 Jul, 2015

1 commit


18 Jul, 2015

1 commit

  • He Kuang noticed that the trace event sample for arrays was broken:

    "The output result of trace_foo_bar event in traceevent samples is
    wrong. This problem can be reproduced as follows:

    (Build kernel with SAMPLE_TRACE_EVENTS=m)

    $ insmod trace-events-sample.ko

    $ echo 1 > /sys/kernel/debug/tracing/events/sample-trace/foo_bar/enable

    $ cat /sys/kernel/debug/tracing/trace

    event-sample-980 [000] .... 43.649559: foo_bar: foo hello 21 0x15
    BIT1|BIT3|0x10 {0x1,0x6f6f6e53,0xff007970,0xffffffff} Snoopy
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    The array length is not right, should be {0x1}.
    (ffffffff,ffffffff)

    event-sample-980 [000] .... 44.653827: foo_bar: foo hello 22 0x16
    BIT2|BIT3|0x10
    {0x1,0x2,0x646e6147,0x666c61,0xffffffff,0xffffffff,0x750aeffe,0x7}
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    The array length is not right, should be {0x1,0x2}.
    Gandalf (ffffffff,ffffffff)"

    This was caused by an update to have __print_array()'s second parameter
    be the count of items in the array and not the size of the array.

    As there are already users of __print_array(), it cannot change. But
    the sample code can and we can also improve on the documentation about
    __print_array() and __get_dynamic_array_len().

    Link: http://lkml.kernel.org/r/1436839171-31527-2-git-send-email-hekuang@huawei.com

    Fixes: ac01ce1410fc2 ("tracing: Make ftrace_print_array_seq compute buf_len")
    Reported-by: He Kuang
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
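The bug above is a units mismatch: the array length is available in bytes, but __print_array() now wants an element count, so the sample presumably has to divide the byte length by the element size. The user-space sketch below reproduces the arithmetic and the "{0x1,0x2}" formatting; `dyn_array_count()` and `format_int_array()` are hypothetical stand-ins, not the tracing macros themselves. Passing the byte length (8) instead of the count (2) for a two-int array would make the formatter read past the end of the array, producing garbage like the output He Kuang reported.

```c
#include <stddef.h>
#include <stdio.h>

/* Convert a dynamic array's byte length into an element count. */
size_t dyn_array_count(size_t byte_len, size_t el_size)
{
    return el_size ? byte_len / el_size : 0;
}

/* Format `count` ints as "{0x1,0x2}" into out; returns the length written
 * (or that would have been written if out is too small). */
int format_int_array(char *out, size_t outsz, const int *arr, size_t count)
{
    int pos = snprintf(out, outsz, "{");

    for (size_t i = 0; i < count && pos >= 0 && (size_t)pos < outsz; i++)
        pos += snprintf(out + pos, outsz - (size_t)pos, "%s0x%x",
                        i ? "," : "", (unsigned)arr[i]);
    if (pos >= 0 && (size_t)pos < outsz)
        pos += snprintf(out + pos, outsz - (size_t)pos, "}");
    return pos;
}
```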
     

09 Jul, 2015

1 commit


23 Jun, 2015

1 commit

  • BPF offers another way to generate latency histograms. We attach
    kprobes at trace_preempt_off and trace_preempt_on and calculate the
    time from seeing the off transition to seeing the on transition.

    The first array is used to store the start time stamp. The key is the
    CPU id. The second array stores the log2(time diff). We need to use
    static allocation here (arrays, not hash tables): the kprobes
    hooking into trace_preempt_on|off must not call into any dynamic
    memory allocation or free path, because we need to avoid being
    called recursively. Besides that, it reduces jitter in the measurement.

    CPU 0
    latency : count distribution
    1 -> 1 : 0 | |
    2 -> 3 : 0 | |
    4 -> 7 : 0 | |
    8 -> 15 : 0 | |
    16 -> 31 : 0 | |
    32 -> 63 : 0 | |
    64 -> 127 : 0 | |
    128 -> 255 : 0 | |
    256 -> 511 : 0 | |
    512 -> 1023 : 0 | |
    1024 -> 2047 : 0 | |
    2048 -> 4095 : 166723 |*************************************** |
    4096 -> 8191 : 19870 |*** |
    8192 -> 16383 : 6324 | |
    16384 -> 32767 : 1098 | |
    32768 -> 65535 : 190 | |
    65536 -> 131071 : 179 | |
    131072 -> 262143 : 18 | |
    262144 -> 524287 : 4 | |
    524288 -> 1048575 : 1363 | |
    CPU 1
    latency : count distribution
    1 -> 1 : 0 | |
    2 -> 3 : 0 | |
    4 -> 7 : 0 | |
    8 -> 15 : 0 | |
    16 -> 31 : 0 | |
    32 -> 63 : 0 | |
    64 -> 127 : 0 | |
    128 -> 255 : 0 | |
    256 -> 511 : 0 | |
    512 -> 1023 : 0 | |
    1024 -> 2047 : 0 | |
    2048 -> 4095 : 114042 |*************************************** |
    4096 -> 8191 : 9587 |** |
    8192 -> 16383 : 4140 | |
    16384 -> 32767 : 673 | |
    32768 -> 65535 : 179 | |
    65536 -> 131071 : 29 | |
    131072 -> 262143 : 4 | |
    262144 -> 524287 : 1 | |
    524288 -> 1048575 : 364 | |
    CPU 2
    latency : count distribution
    1 -> 1 : 0 | |
    2 -> 3 : 0 | |
    4 -> 7 : 0 | |
    8 -> 15 : 0 | |
    16 -> 31 : 0 | |
    32 -> 63 : 0 | |
    64 -> 127 : 0 | |
    128 -> 255 : 0 | |
    256 -> 511 : 0 | |
    512 -> 1023 : 0 | |
    1024 -> 2047 : 0 | |
    2048 -> 4095 : 40147 |*************************************** |
    4096 -> 8191 : 2300 |* |
    8192 -> 16383 : 828 | |
    16384 -> 32767 : 178 | |
    32768 -> 65535 : 59 | |
    65536 -> 131071 : 2 | |
    131072 -> 262143 : 0 | |
    262144 -> 524287 : 1 | |
    524288 -> 1048575 : 174 | |
    CPU 3
    latency : count distribution
    1 -> 1 : 0 | |
    2 -> 3 : 0 | |
    4 -> 7 : 0 | |
    8 -> 15 : 0 | |
    16 -> 31 : 0 | |
    32 -> 63 : 0 | |
    64 -> 127 : 0 | |
    128 -> 255 : 0 | |
    256 -> 511 : 0 | |
    512 -> 1023 : 0 | |
    1024 -> 2047 : 0 | |
    2048 -> 4095 : 29626 |*************************************** |
    4096 -> 8191 : 2704 |** |
    8192 -> 16383 : 1090 | |
    16384 -> 32767 : 160 | |
    32768 -> 65535 : 72 | |
    65536 -> 131071 : 32 | |
    131072 -> 262143 : 26 | |
    262144 -> 524287 : 12 | |
    524288 -> 1048575 : 298 | |

    All this is based on the trace3 examples written by
    Alexei Starovoitov.

    Signed-off-by: Daniel Wagner
    Cc: Alexei Starovoitov
    Cc: Alexei Starovoitov
    Cc: "David S. Miller"
    Cc: Daniel Borkmann
    Cc: Ingo Molnar
    Cc: linux-kernel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Acked-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Wagner
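The two statically allocated arrays described above (per-CPU start timestamp, per-CPU log2 histogram) can be modelled in a few lines of user-space C. In this sketch the CPU count, the 64-slot width and the use of 0 as "no start timestamp recorded" are assumptions, and `on_preempt_off()`, `on_preempt_on()` and `slot_count()` are hypothetical names. Slot i covers deltas from 2^i to 2^(i+1) - 1, so a delta of 3000 lands in slot 11, the "2048 -> 4095" row of the output above.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define NR_SLOTS 64        /* enough slots for any 64-bit delta */

/* Static storage only: no allocation on the measurement path. */
static uint64_t start_ns[NR_CPUS_SKETCH];
static uint64_t hist[NR_CPUS_SKETCH][NR_SLOTS];

/* Index of the highest set bit (0 for v <= 1). */
static unsigned log2_u64(uint64_t v)
{
    unsigned r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Called at the preempt-off transition: remember when it started. */
void on_preempt_off(int cpu, uint64_t now_ns)
{
    start_ns[cpu] = now_ns;
}

/* Called at the preempt-on transition: bucket the elapsed time. */
void on_preempt_on(int cpu, uint64_t now_ns)
{
    if (start_ns[cpu] == 0)
        return;                      /* missed the matching off event */
    hist[cpu][log2_u64(now_ns - start_ns[cpu])]++;
    start_ns[cpu] = 0;
}

uint64_t slot_count(int cpu, unsigned slot)
{
    return hist[cpu][slot];
}
```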
     

16 Jun, 2015

1 commit

  • eBPF programs attached to kprobes need to filter based on
    current->pid, uid and other fields, so introduce helper functions:

    u64 bpf_get_current_pid_tgid(void)
    Return: current->tgid << 32 | current->pid

    u64 bpf_get_current_uid_gid(void)
    Return: current_gid << 32 | current_uid

    bpf_get_current_comm(char *buf, int size_of_buf)
    stores current->comm into buf

    They can be used from the programs attached to TC as well to classify packets
    based on current task fields.

    Update the tracex2 example to print a histogram of write syscalls for each
    process instead of one aggregated over all processes.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
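The return values documented above pack two 32-bit ids into one u64, so a consumer splits them with a shift and a truncation. The sketch below shows the packing and unpacking with hypothetical helper names. Note that in kernel terms `pid` is the thread id and `tgid` is what user space calls the process id.

```c
#include <stdint.h>

/* Layout of bpf_get_current_pid_tgid(): tgid in the upper 32 bits. */
uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return (uint64_t)tgid << 32 | pid;
}

uint32_t tgid_of(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t pid_of(uint64_t v)  { return (uint32_t)v; }

/* Same layout for bpf_get_current_uid_gid(): gid upper, uid lower. */
uint32_t gid_of(uint64_t v)  { return (uint32_t)(v >> 32); }
uint32_t uid_of(uint64_t v)  { return (uint32_t)v; }
```

A kprobe program that wants per-process rather than per-thread statistics would key its map on `tgid_of()` of the helper's return value.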
     

07 Jun, 2015

2 commits

  • Allow programs to read/write the skb->mark and tc_index fields and
    ((struct qdisc_skb_cb *)cb)->data.

    mark and tc_index are generically useful in TC.
    cb[0]-cb[4] are primarily used to pass arguments from one
    program to another called via bpf_tail_call() which can
    be seen in sockex3_kern.c example.

    All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
    mark, tc_index are writeable from tc_cls_act only.
    cb[0]-cb[4] are writeable by both sockets and tc_cls_act.

    Add verifier tests and improve sample code.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
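The access rules listed above form a small permission table: every field is readable by both program types, mark and tc_index are writable only from tc_cls_act, and cb[0]-cb[4] are writable by both. The sketch below encodes that table; the enums and `field_readable()`/`field_writable()` are hypothetical names, and treating every other field (here represented by `len`) as read-only is an assumption consistent with the text.

```c
#include <stdbool.h>

/* Hypothetical enums for this sketch. */
enum prog_kind { PROG_SOCKET_FILTER, PROG_TC_CLS_ACT };
enum skb_field { F_LEN, F_MARK, F_TC_INDEX, F_CB0, F_CB1, F_CB2, F_CB3, F_CB4 };

/* All fields of struct __sk_buff are readable by both program types. */
bool field_readable(enum prog_kind kind, enum skb_field f)
{
    (void)kind;
    (void)f;
    return true;
}

bool field_writable(enum prog_kind kind, enum skb_field f)
{
    switch (f) {
    case F_MARK:
    case F_TC_INDEX:
        return kind == PROG_TC_CLS_ACT;   /* tc_cls_act only */
    case F_CB0: case F_CB1: case F_CB2: case F_CB3: case F_CB4:
        return true;                      /* sockets and tc_cls_act */
    default:
        return false;                     /* everything else is read-only */
    }
}
```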
     
  • eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
    On ingress the L2 header has already been pulled, whereas on egress it is
    present. This is known to program writers, who are currently forced to use
    the BPF_LL_OFF workaround.
    Since programs don't change skb internal pointers it is safe to do
    pull/push right around invocation of the program and earlier taps and
    later pt->func() will not be affected.
    Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
    around run_filter/BPF_PROG_RUN even if skb_shared.

    This fix finally allows programs to use optimized LD_ABS/IND instructions
    without BPF_LL_OFF for higher performance.
    tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
                w/o JIT   w/JIT
    before      20.5      23.6 Mpps
    after       21.8      26.6 Mpps

    Old programs with BPF_LL_OFF will still work as-is.

    We can now undo most of the earlier workaround commit:
    a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative offsets")

    Signed-off-by: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Alexei Starovoitov
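The fix above amounts to "push the L2 header back before running the program, pull it again afterwards", so the program always sees data starting at the MAC header and nothing outside the program notices. The sketch below models that with offsets into a buffer; `struct pkt`, `run_with_l2()` and `seen_offset()` are hypothetical, and the push/pull lines are only rough analogues of the kernel's skb push/pull helpers.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified packet: offsets into a frame buffer. */
struct pkt {
    size_t mac;    /* offset of the L2 (MAC) header */
    size_t data;   /* current start of data */
};

typedef size_t (*prog_fn)(const struct pkt *p);

/* Example "program": reports where data starts relative to the MAC header. */
static size_t seen_offset(const struct pkt *p)
{
    return p->data - p->mac;
}

/* On ingress the L2 header was already pulled (data > mac); on egress
 * data == mac. Re-expose the header for the duration of the program,
 * then restore the caller's view. */
size_t run_with_l2(struct pkt *p, prog_fn prog)
{
    size_t hlen = p->data - p->mac;
    size_t ret;

    p->data -= hlen;   /* roughly like pushing the L2 header back */
    ret = prog(p);
    p->data += hlen;   /* roughly like pulling it again */
    return ret;
}
```

Because the program itself never moves the data pointer, restoring it unconditionally after the call is safe, which mirrors the reasoning in the commit message.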
     

23 May, 2015

5 commits

  • This script pktgen_bench_xmit_mode_netif_receive.sh is a benchmark
    script, which can be used for benchmarking part of the network stack.
    This can be used for performance improving or catching regression in
    that area.

    The script was developed for benchmarking the ingress qdisc path; the
    original idea is by Alexei Starovoitov. The script does not really need
    any hardware. This is achieved via the recently introduced stack inject
    feature "xmit_mode netif_receive". See commit 62f64aed622b6 ("pktgen:
    introduce xmit_mode ''").

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Add the pktgen samples script pktgen_sample03_burst_single_flow.sh
    that demonstrates how to achieve maximum performance.

    If correctly tuned [1], single-CPU 10Gbit/s wirespeed with small packets
    is possible [2], which is 14.88 Mpps. The trick is to take advantage of the
    "burst" feature introduced in commit 38b2cf2982dc73 ("net: pktgen:
    packet bursting via skb->xmit_more").

    [1] http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
    [2] http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
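The 14.88 Mpps figure quoted above follows from Ethernet framing: a minimum 64-byte frame also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire, i.e. 84 bytes or 672 bits per packet, and 10e9 / 672 is about 14.88 million packets per second. The sketch below does that arithmetic and converts the rate into a per-packet time budget (about 67.2 ns); the function names are hypothetical.

```c
/* Preamble + SFD (8 bytes) plus inter-frame gap (12 bytes). */
#define ETH_WIRE_OVERHEAD 20

/* Maximum packets per second for a given link speed and frame size
 * (frame size includes the 4-byte FCS, so 64 is the Ethernet minimum). */
double wirespeed_pps(double link_bps, unsigned frame_bytes)
{
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}

/* Time budget per packet, in nanoseconds, at a given packet rate. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}
```

The roughly 67 ns budget is why a single CPU can only reach wirespeed with tricks such as the "burst" feature that amortize per-packet costs.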
     
  • Add the pktgen samples script pktgen_sample02_multiqueue.sh that
    demonstrates generating packets on multiqueue NICs.

    Specifically, notice the option "-t", which specifies how many
    kernel threads to activate. Also notice the flag QUEUE_MAP_CPU,
    which causes the SKB TX queue to be mapped to the CPU running the
    kernel thread. For best scalability, people are also encouraged to
    map NIC IRQs (/proc/irq/*/smp_affinity) to CPU numbers.

    Usage example with "-t" 4 threads and help:
    ./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4

    Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
    -i : ($DEV) output interface/device (required)
    -s : ($PKT_SIZE) packet size
    -d : ($DEST_IP) destination IP
    -m : ($DST_MAC) destination MAC-addr
    -t : ($THREADS) threads to start
    -c : ($SKB_CLONE) SKB clones send before alloc new SKB
    -b : ($BURST) HW level bursting of SKBs
    -v : ($VERBOSE) verbose
    -x : ($DEBUG) debug

    Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
    should be covered now.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Add the first basic pktgen samples script pktgen_sample01_simple.sh,
    which demonstrates a simple use of the helper functions.
    Removing pktgen.conf-1-1 as that example should be covered now.

    The naming scheme pktgen_sampleNN, where NN is a number, should encourage
    reading the samples in a specific order.

    The script makes pktgen send with a single thread and a single
    interface, and introduces flow variation via a random UDP source port.

    Usage example and help:
    ./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2

    Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
    -i : ($DEV) output interface/device (required)
    -s : ($PKT_SIZE) packet size
    -d : ($DEST_IP) destination IP
    -m : ($DST_MAC) destination MAC-addr
    -c : ($SKB_CLONE) SKB clones send before alloc new SKB
    -v : ($VERBOSE) verbose
    -x : ($DEBUG) debug

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • Preparing for removing existing samples/pktgen/ scripts, and
    replacing these with easier to use samples.

    This commit provides two helper shell files that can be
    "included" by sourcing them from another shell script, namely
    "functions.sh" and "parameters.sh".

    The parameters.sh file supports easy and consistent parameter
    parsing across the sample scripts. A usage example is printed on
    errors.

    The functions.sh file provides three new shell functions for
    configuring the different components of pktgen: pg_ctrl(),
    pg_thread() and pg_set(). A slightly improved version of the old
    pgset() function is also provided for backwards compat.

    The new functions correspond to pktgen's different components.
    * pg_ctrl() controls "pgctrl" (/proc/net/pktgen/pgctrl)
    * pg_thread() controls the kernel threads and binding to devices
    * pg_set() controls setup of individual devices

    These changes are borrowed from:
    https://github.com/netoptimizer/network-testing/tree/master/pktgen

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
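All three helper functions described above end up doing the same thing: writing one text command into a file under /proc/net/pktgen (pgctrl for pg_ctrl(), the per-thread kpktgend_N files for pg_thread(), the per-device files for pg_set()). The C sketch below shows that common operation. `pg_write()` is a hypothetical name, and the base directory is a parameter only so the sketch can be tried against an ordinary directory; the real helpers are shell functions and, like the old pgset(), may additionally read the file back to check the result.

```c
#include <stdio.h>

/* Write one pktgen command (e.g. "count 1000") to <base>/<file>, where base
 * would be /proc/net/pktgen on a real system. Returns 0 on success. */
int pg_write(const char *base, const char *file, const char *cmd)
{
    char path[256];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/%s", base, file);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                 /* path too long */
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%s\n", cmd) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;   /* procfs may report errors on close */
}
```

For instance, `pg_write("/proc/net/pktgen", "eth4", "count 1000")` would correspond to a pg_set() call on device eth4, and `pg_write("/proc/net/pktgen", "pgctrl", "start")` to starting the run.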