16 Sep, 2016

2 commits

  • We have a small skb_at_tc_ingress() helper for testing for ingress, so
    make use of it. cls_bpf already uses it and so should act_bpf.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • The skb_mac_header_was_set() test in cls_bpf's and act_bpf's fast-path is
    actually unnecessary and can be removed altogether. This was added by
    commit a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative
    offsets"), which was later on improved by 3431205e0397 ("bpf: make programs
    see skb->data == L2 for ingress and egress"). We're always guaranteed to
    have valid mac header at the time we invoke cls_bpf_classify() or tcf_bpf().

    Reason is that since 6d1ccff62780 ("net: reset mac header in dev_start_xmit()")
    we do skb_reset_mac_header() in __dev_queue_xmit() before we could call
    into sch_handle_egress() or any subsequent enqueue. sch_handle_ingress()
    always sees a valid mac header as well (things like skb_reset_mac_len()
    would badly fail otherwise). Thus, drop the unnecessary test in classifier
    and action case.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

26 Jul, 2016

1 commit

  • struct tc_action is confusing, currently we use it for two purposes:
    1) Pass in arguments and carry out results from helper functions
    2) A generic representation for tc actions

    The first one is error-prone, since we need to make sure we don't
    miss anything. This patch aims to get rid of this use, by moving
    tc_action into tcf_common, so that they are allocated together
    in hashtable and can be cast'ed easily.

    And together with the following patch, we could really make
    tc_action a generic representation for all tc actions and each
    type of action can inherit from it.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

02 Jul, 2016

1 commit

  • Since bpf_prog_get() and program type check is used in a couple of places,
    refactor this into a small helper function that we can make use of. Since
    the non RO prog->aux part is not used in performance critical paths and a
    program destruction via RCU is rather very unlikley when doing the put, we
    shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
    check, but actually not taking the ref at all (due to being in fdget() /
    fdput() section of the bpf fd) is even cleaner and makes the diff smaller
    as well, so just go for that. Callsites are changed to make use of the new
    helper where possible.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

08 Jun, 2016

3 commits


07 May, 2016

1 commit

  • allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
    The bpf helpers that change skb->data need to update data_end pointer as well.
    The verifier checks that programs always reload data, data_end pointers
    after calls to such bpf helpers.
    We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
    since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

27 Apr, 2016

1 commit


26 Feb, 2016

1 commit

  • Currently tc actions are stored in a per-module hashtable,
    therefore are visible to all network namespaces. This is
    probably the last part of the tc subsystem which is not
    aware of netns now. This patch makes them per-netns,
    several tc action API's need to be adjusted for this.

    The tc action API code is ugly due to historical reasons,
    we need to refactor that code in the future.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

18 Sep, 2015

1 commit

  • Existing bpf_clone_redirect() helper clones skb before redirecting
    it to RX or TX of destination netdev.
    Introduce bpf_redirect() helper that does that without cloning.

    Benchmarked with two hosts using 10G ixgbe NICs.
    One host is doing line rate pktgen.
    Another host is configured as:
    $ tc qdisc add dev $dev ingress
    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
    so it receives the packet on $dev and immediately xmits it on $dev + 1
    The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
    that does bpf_clone_redirect() and performance is 2.0 Mpps

    $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
    which is using bpf_redirect() - 2.4 Mpps

    and using cls_bpf with integrated actions as:
    $ tc filter add dev $dev root pref 10 \
    bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
    performance is 2.5 Mpps

    To summarize:
    u32+act_bpf using clone_redirect - 2.0 Mpps
    u32+act_bpf using redirect - 2.4 Mpps
    cls_bpf using redirect - 2.5 Mpps

    For comparison linux bridge in this setup is doing 2.1 Mpps
    and ixgbe rx + drop in ip_rcv - 7.8 Mpps

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

27 Aug, 2015

2 commits

  • Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
    with extra care taken to free bpf programs after rcu grace period.
    Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
    and final destruction is done from tc_action_ops->cleanup() callback that is
    called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
    bind and refcnt reach zero which is only possible when classifier is destroyed.
    Previous two patches fixed the last two classifiers (tcindex and rsvp) to
    call tcf_exts_destroy() from rcu callback.

    Similar to gact/mirred there is a race between prog->filter and
    prog->tcf_action. Meaning that the program being replaced may use
    previous default action if it happened to return TC_ACT_UNSPEC.
    act_mirred race betwen tcf_action and tcfm_dev is similar.
    In all cases the race is harmless.
    Long term we may want to improve the situation by replacing the whole
    tc_action->priv as single pointer instead of updating inner fields one by one.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Fix harmless typo and avoid unnecessary copy of empty 'prog' into
    unused 'strcut tcf_bpf_cfg old'.

    Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

04 Aug, 2015

1 commit

  • Since the introduction of the BPF action in d23b8ad8ab23 ("tc: add BPF
    based action"), late binding was not working as expected. I.e. setting
    the action part for a classifier only via 'bpf index ', where
    is the index of an existing action, is being rejected by the kernel due
    to other missing parameters.

    It doesn't make sense to require these parameters such as BPF opcodes
    etc, as they are not going to be used anyway: in this case, they're just
    allocated/parsed and then freed again w/o doing anything meaningful.

    Instead, parse and verify the remaining parameters *after* the test on
    tcf_hash_check(), when we really know that we're dealing with creation
    of a new action or replacement of an existing one and where late binding
    is thus irrelevant.

    After patch, test case is now working:

    FOO="1,6 0 0 4294967295,"
    tc actions add action bpf bytecode "$FOO"
    tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1
    tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1
    tc filter show dev foo
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295'
    action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1

    Late binding of a BPF action can be useful for preloading maps (e.g. before
    they hit traffic) in case of eBPF programs, or to share a single eBPF action
    with multiple classifiers.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 Aug, 2015

1 commit


30 Jul, 2015

1 commit

  • We currently trigger multiple memory leaks when replacing bpf
    actions, besides others:

    comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
    hex dump (first 32 bytes):
    01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
    18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............
    backtrace:
    [] kmemleak_alloc+0x4e/0xb0
    [] __vmalloc_node_range+0x1bd/0x2c0
    [] __vmalloc+0x4a/0x50
    [] bpf_prog_alloc+0x3a/0xa0
    [] bpf_prog_create+0x44/0xa0
    [] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
    [] tcf_action_init_1+0x191/0x1b0
    [] tcf_action_init+0x82/0xf0
    [] tcf_exts_validate+0xb2/0xc0
    [] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
    [] cls_bpf_change+0x1a6/0x274 [cls_bpf]
    [] tc_ctl_tfilter+0x335/0x910
    [] rtnetlink_rcv_msg+0x95/0x240
    [] netlink_rcv_skb+0xaf/0xc0
    [] rtnetlink_rcv+0x2e/0x40
    [] netlink_unicast+0xef/0x1b0

    Issue is that the old content from tcf_bpf is allocated and needs
    to be released when we replace it. We seem to do that since the
    beginning of act_bpf on the filter and insns, later on the name as
    well.

    Example test case, after patch:

    # FOO="1,6 0 0 4294967295,"
    # BAR="1,6 0 0 4294967294,"
    # tc actions add action bpf bytecode "$FOO" index 2
    # tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 2 ref 1 bind 0
    # tc actions replace action bpf bytecode "$BAR" index 2
    # tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
    index 2 ref 1 bind 0
    # tc actions replace action bpf bytecode "$FOO" index 2
    # tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 2 ref 1 bind 0
    # tc actions del action bpf index 2
    [...]
    # echo "scan" > /sys/kernel/debug/kmemleak
    # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
    0

    Fixes: d23b8ad8ab23 ("tc: add BPF based action")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

23 Jul, 2015

1 commit


16 Jul, 2015

1 commit

  • prog->bpf_ops is populated when act_bpf is used with classic BPF and
    prog->bpf_name is optionally used with extended BPF.
    Fix memory leak when act_bpf is released.

    Fixes: d23b8ad8ab23 ("tc: add BPF based action")
    Fixes: a8cb5f556b56 ("act_bpf: add initial eBPF support for actions")
    Acked-by: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

09 Jul, 2015

1 commit

  • Reuse existing percpu infrastructure John Fastabend added for qdisc.

    This patch adds a new cpustats parameter to tcf_hash_create() and all
    actions pass false, meaning this patch should have no effect yet.

    Signed-off-by: Eric Dumazet
    Cc: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Jun, 2015

1 commit

  • eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
    For ingress L2 header is already pulled, whereas for egress it's present.
    This is known to program writers which are currently forced to use
    BPF_LL_OFF workaround.
    Since programs don't change skb internal pointers it is safe to do
    pull/push right around invocation of the program and earlier taps and
    later pt->func() will not be affected.
    Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
    around run_filter/BPF_PROG_RUN even if skb_shared.

    This fix finally allows programs to use optimized LD_ABS/IND instructions
    without BPF_LL_OFF for higher performance.
    tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
    w/o JIT w/JIT
    before 20.5 23.6 Mpps
    after 21.8 26.6 Mpps

    Old programs with BPF_LL_OFF will still work as-is.

    We can now undo most of the earlier workaround commit:
    a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative offsets")

    Signed-off-by: Alexei Starovoitov
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

17 Apr, 2015

1 commit

  • For the short-term solution, lets fix bpf helper functions to use
    skb->mac_header relative offsets instead of skb->data in order to
    get the same eBPF programs with cls_bpf and act_bpf work on ingress
    and egress qdisc path. We need to ensure that mac_header is set
    before calling into programs. This is effectively the first option
    from below referenced discussion.

    More long term solution for LD_ABS|LD_IND instructions will be more
    intrusive but also more beneficial than this, and implemented later
    as it's too risky at this point in time.

    I.e., we plan to look into the option of moving skb_pull() out of
    eth_type_trans() and into netif_receive_skb() as has been suggested
    as second option. Meanwhile, this solution ensures ingress can be
    used with eBPF, too, and that we won't run into ABI troubles later.
    For dealing with negative offsets inside eBPF helper functions,
    we've implemented bpf_skb_clone_unwritable() to test for unwriteable
    headers.

    Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
    Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
    Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

21 Mar, 2015

1 commit

  • This work extends the "classic" BPF programmable tc action by extending
    its scope also to native eBPF code!

    Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support
    for programmable classifiers") this adds the facility to implement fully
    flexible classifier and actions for tc that can be implemented in a C
    subset in user space, "safely" loaded into the kernel, and being run in
    native speed when JITed.

    Also, since eBPF maps can be shared between eBPF programs, it offers the
    possibility that cls_bpf and act_bpf can share data 1) between themselves
    and 2) between user space applications. That means that, f.e. customized
    runtime statistics can be collected in user space, but also more importantly
    classifier and action behaviour could be altered based on map input from
    the user space application.

    For the remaining details on the workflow and integration, see the cls_bpf
    commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1].

    [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act

    Signed-off-by: Daniel Borkmann
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Acked-by: Jiri Pirko
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

18 Mar, 2015

1 commit

  • Revisiting commit d23b8ad8ab23 ("tc: add BPF based action") with regards
    to eBPF support, I was thinking that it might be better to improve
    return semantics from a BPF program invoked through BPF_PROG_RUN().

    Currently, in case filter_res is 0, we overwrite the default action
    opcode with TC_ACT_SHOT. A default action opcode configured through tc's
    m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC,
    TC_ACT_OK.

    In cls_bpf, we have the possibility to overwrite the default class
    associated with the classifier in case filter_res is _not_ 0xffffffff
    (-1).

    That allows us to fold multiple [e]BPF programs into a single one, where
    they would otherwise need to be defined as a separate classifier with
    its own classid, needlessly redoing parsing work, etc.

    Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes
    are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode
    mapping, where we would allow above possibilities. Thus, like in cls_bpf,
    a filter_res of 0xffffffff (-1) means that the configured _default_ action
    is used. Any unkown return code from the BPF program would fail in
    tcf_bpf() with TC_ACT_UNSPEC.

    Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED,
    which both have the same semantics, we have the option to either use
    that as a default action (filter_res of 0xffffffff) or non-default BPF
    return code.

    All that will allow us to transparently use tcf_bpf() for both BPF
    flavours.

    Signed-off-by: Daniel Borkmann
    Cc: Jiri Pirko
    Cc: Alexei Starovoitov
    Cc: Jamal Hadi Salim
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

27 Jan, 2015

1 commit


18 Jan, 2015

1 commit