Eric Lee / smarc-fsl-linux-kernel

16 Sep, 2016

2 commits

f53d8c7b1 bpf: use skb_at_tc_ingress helper in tcf_bpf ... Browse Code »

We have a small skb_at_tc_ingress() helper for testing for ingress, so
make use of it. cls_bpf already uses it and so should act_bpf.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
9 years ago
04b3f8de4 bpf: drop unnecessary test in cls_bpf_classify and tcf_bpf ... Browse Code »

The skb_mac_header_was_set() test in cls_bpf's and act_bpf's fast-path is
actually unnecessary and can be removed altogether. This was added by
commit a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative
offsets"), which was later on improved by 3431205e0397 ("bpf: make programs
see skb->data == L2 for ingress and egress"). We're always guaranteed to
have valid mac header at the time we invoke cls_bpf_classify() or tcf_bpf().

Reason is that since 6d1ccff62780 ("net: reset mac header in dev_start_xmit()")
we do skb_reset_mac_header() in __dev_queue_xmit() before we could call
into sch_handle_egress() or any subsequent enqueue. sch_handle_ingress()
always sees a valid mac header as well (things like skb_reset_mac_len()
would badly fail otherwise). Thus, drop the unnecessary test in classifier
and action case.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
9 years ago

26 Jul, 2016

1 commit

a85a970af net_sched: move tc_action into tcf_common ... Browse Code »

struct tc_action is confusing, currently we use it for two purposes:
1) Pass in arguments and carry out results from helper functions
2) A generic representation for tc actions

The first one is error-prone, since we need to make sure we don't
miss anything. This patch aims to get rid of this use, by moving
tc_action into tcf_common, so that they are allocated together
in hashtable and can be cast'ed easily.

And together with the following patch, we could really make
tc_action a generic representation for all tc actions and each
type of action can inherit from it.

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
9 years ago

02 Jul, 2016

1 commit

113214be7 bpf: refactor bpf_prog_get and type check into helper ... Browse Code »

Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
9 years ago

08 Jun, 2016

3 commits

0b0f43fe2 net sched: indentation and other OCD stylistic fixes ... Browse Code »

Signed-off-by: Jamal Hadi Salim
Acked-by: Cong Wang

Jamal Hadi Salim
9 years ago
48d8ee169 net sched actions: aggregate dumping of actions timeinfo ... Browse Code »

Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jamal Hadi Salim
9 years ago
53eb440f4 net sched actions: introduce timestamp for firsttime use ... Browse Code »

Useful to know when the action was first used for accounting
(and debugging)

Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jamal Hadi Salim
9 years ago

07 May, 2016

1 commit

db58ba459 bpf: wire in data and data_end for cls_act_bpf ... Browse Code »

allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
The bpf helpers that change skb->data need to update data_end pointer as well.
The verifier checks that programs always reload data, data_end pointers
after calls to such bpf helpers.
We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.

Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
9 years ago

27 Apr, 2016

1 commit

9854518ea sched: align nlattr properly when needed ... Browse Code »

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
9 years ago

26 Feb, 2016

1 commit

ddf97ccdd net_sched: add network namespace support for tc actions ... Browse Code »

Currently tc actions are stored in a per-module hashtable,
therefore are visible to all network namespaces. This is
probably the last part of the tc subsystem which is not
aware of netns now. This patch makes them per-netns,
several tc action API's need to be adjusted for this.

The tc action API code is ugly due to historical reasons,
we need to refactor that code in the future.

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
9 years ago

18 Sep, 2015

1 commit

27b29f630 bpf: add bpf_redirect() helper ... Browse Code »

Existing bpf_clone_redirect() helper clones skb before redirecting
it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.

Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
that does bpf_clone_redirect() and performance is 2.0 Mpps

$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps

and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps

To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps

For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps

Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: John Fastabend
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago

27 Aug, 2015

2 commits

cff82457c net_sched: act_bpf: remove spinlock in fast path ... Browse Code »

Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
with extra care taken to free bpf programs after rcu grace period.
Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
and final destruction is done from tc_action_ops->cleanup() callback that is
called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
bind and refcnt reach zero which is only possible when classifier is destroyed.
Previous two patches fixed the last two classifiers (tcindex and rsvp) to
call tcf_exts_destroy() from rcu callback.

Similar to gact/mirred there is a race between prog->filter and
prog->tcf_action. Meaning that the program being replaced may use
previous default action if it happened to return TC_ACT_UNSPEC.
act_mirred race betwen tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
tc_action->priv as single pointer instead of updating inner fields one by one.

Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago
faa54be4c net_sched: act_bpf: remove unnecessary copy ... Browse Code »

Fix harmless typo and avoid unnecessary copy of empty 'prog' into
unused 'strcut tcf_bpf_cfg old'.

Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago

04 Aug, 2015

1 commit

a5c90b29e act_bpf: properly support late binding of bpf action to a classifier ... Browse Code »

Since the introduction of the BPF action in d23b8ad8ab23 ("tc: add BPF
based action"), late binding was not working as expected. I.e. setting
the action part for a classifier only via 'bpf index ', where
is the index of an existing action, is being rejected by the kernel due
to other missing parameters.

It doesn't make sense to require these parameters such as BPF opcodes
etc, as they are not going to be used anyway: in this case, they're just
allocated/parsed and then freed again w/o doing anything meaningful.

Instead, parse and verify the remaining parameters *after* the test on
tcf_hash_check(), when we really know that we're dealing with creation
of a new action or replacement of an existing one and where late binding
is thus irrelevant.

After patch, test case is now working:

FOO="1,6 0 0 4294967295,"
tc actions add action bpf bytecode "$FOO"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 1 ref 2 bind 1
tc filter show dev foo
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295'
action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 1 ref 2 bind 1

Late binding of a BPF action can be useful for preloading maps (e.g. before
they hit traffic) in case of eBPF programs, or to share a single eBPF action
with multiple classifiers.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
10 years ago

01 Aug, 2015

1 commit

5510b3c2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.c

All four conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller

David S. Miller
10 years ago

30 Jul, 2015

1 commit

f4eaed28c act_bpf: fix memory leaks when replacing bpf programs ... Browse Code »

We currently trigger multiple memory leaks when replacing bpf
actions, besides others:

comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s)
hex dump (first 32 bytes):
01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............
backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] __vmalloc_node_range+0x1bd/0x2c0
[] __vmalloc+0x4a/0x50
[] bpf_prog_alloc+0x3a/0xa0
[] bpf_prog_create+0x44/0xa0
[] tcf_bpf_init+0x28b/0x3c0 [act_bpf]
[] tcf_action_init_1+0x191/0x1b0
[] tcf_action_init+0x82/0xf0
[] tcf_exts_validate+0xb2/0xc0
[] cls_bpf_modify_existing+0x98/0x340 [cls_bpf]
[] cls_bpf_change+0x1a6/0x274 [cls_bpf]
[] tc_ctl_tfilter+0x335/0x910
[] rtnetlink_rcv_msg+0x95/0x240
[] netlink_rcv_skb+0xaf/0xc0
[] rtnetlink_rcv+0x2e/0x40
[] netlink_unicast+0xef/0x1b0

Issue is that the old content from tcf_bpf is allocated and needs
to be released when we replace it. We seem to do that since the
beginning of act_bpf on the filter and insns, later on the name as
well.

Example test case, after patch:

# FOO="1,6 0 0 4294967295,"
# BAR="1,6 0 0 4294967294,"
# tc actions add action bpf bytecode "$FOO" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 2 ref 1 bind 0
# tc actions replace action bpf bytecode "$BAR" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
index 2 ref 1 bind 0
# tc actions replace action bpf bytecode "$FOO" index 2
# tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 2 ref 1 bind 0
# tc actions del action bpf index 2
[...]
# echo "scan" > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l
0

Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
10 years ago

23 Jul, 2015

1 commit

c5e40ee28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
10 years ago

16 Jul, 2015

1 commit

ddf06c1e5 tc: act_bpf: fix memory leak ... Browse Code »

prog->bpf_ops is populated when act_bpf is used with classic BPF and
prog->bpf_name is optionally used with extended BPF.
Fix memory leak when act_bpf is released.

Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Fixes: a8cb5f556b56 ("act_bpf: add initial eBPF support for actions")
Acked-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago

09 Jul, 2015

1 commit

519c818e8 net: sched: add percpu stats to actions ... Browse Code »

Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet
Cc: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Acked-by: John Fastabend
Signed-off-by: David S. Miller

Eric Dumazet
10 years ago

07 Jun, 2015

1 commit

3431205e0 bpf: make programs see skb->data == L2 for ingress and egress ... Browse Code »

eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
For ingress L2 header is already pulled, whereas for egress it's present.
This is known to program writers which are currently forced to use
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers it is safe to do
pull/push right around invocation of the program and earlier taps and
later pt->func() will not be affected.
Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
around run_filter/BPF_PROG_RUN even if skb_shared.

This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
w/o JIT w/JIT
before 20.5 23.6 Mpps
after 21.8 26.6 Mpps

Old programs with BPF_LL_OFF will still work as-is.

We can now undo most of the earlier workaround commit:
a166151cbe33 ("bpf: fix bpf helpers to use skb->mac_header relative offsets")

Signed-off-by: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago

17 Apr, 2015

1 commit

a166151cb bpf: fix bpf helpers to use skb->mac_header relative offsets ... Browse Code »

For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
10 years ago

21 Mar, 2015

1 commit

a8cb5f556 act_bpf: add initial eBPF support for actions ... Browse Code »

This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!

Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.

Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.

For the remaining details on the workflow and integration, see the cls_bpf
commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1].

[1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act

Signed-off-by: Daniel Borkmann
Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Acked-by: Jiri Pirko
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
10 years ago

18 Mar, 2015

1 commit

ced585c83 act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome ... Browse Code »

Revisiting commit d23b8ad8ab23 ("tc: add BPF based action") with regards
to eBPF support, I was thinking that it might be better to improve
return semantics from a BPF program invoked through BPF_PROG_RUN().

Currently, in case filter_res is 0, we overwrite the default action
opcode with TC_ACT_SHOT. A default action opcode configured through tc's
m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC,
TC_ACT_OK.

In cls_bpf, we have the possibility to overwrite the default class
associated with the classifier in case filter_res is _not_ 0xffffffff
(-1).

That allows us to fold multiple [e]BPF programs into a single one, where
they would otherwise need to be defined as a separate classifier with
its own classid, needlessly redoing parsing work, etc.

Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes
are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode
mapping, where we would allow above possibilities. Thus, like in cls_bpf,
a filter_res of 0xffffffff (-1) means that the configured _default_ action
is used. Any unkown return code from the BPF program would fail in
tcf_bpf() with TC_ACT_UNSPEC.

Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED,
which both have the same semantics, we have the option to either use
that as a default action (filter_res of 0xffffffff) or non-default BPF
return code.

All that will allow us to transparently use tcf_bpf() for both BPF
flavours.

Signed-off-by: Daniel Borkmann
Cc: Jiri Pirko
Cc: Alexei Starovoitov
Cc: Jamal Hadi Salim
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Daniel Borkmann
10 years ago

27 Jan, 2015

1 commit

fd3e646c8 net: act_bpf: fix size mismatch on filter preparation ... Browse Code »

Similarly as in cls_bpf, also this code needs to reject mismatches.

Reference: http://article.gmane.org/gmane.linux.network/347406
Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
10 years ago

18 Jan, 2015

1 commit

d23b8ad8a tc: add BPF based action ... Browse Code »

This action provides a possibility to exec custom BPF code.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Jiri Pirko
10 years ago