10 Dec, 2020

1 commit

  • TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (the MPLS label
    itself is 20 bits long).

    This fixes the following bug:

    $ tc filter add dev ethX ingress protocol mpls_uc \
    flower mpls lse depth 2 label 256 \
    action drop

    $ tc filter show dev ethX ingress
    filter protocol mpls_uc pref 49152 flower chain 0
    filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
      eth_type 8847
      mpls
        lse depth 2 label 0
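
    A minimal sketch of the attribute handling involved (illustrative only,
    not necessarily the exact patch): the label has to travel through a
    32-bit netlink attribute, and putting or getting it as an 8-bit value
    would truncate label 256 to 0, which matches the symptom above.

    /* Illustrative sketch: dump a 20-bit MPLS label via the u32 attribute.
     * nla_put_u8() here would mangle any label that needs more than 8 bits. */
    static int dump_lse_label(struct sk_buff *skb, u32 label)
    {
            return nla_put_u32(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL, label);
    }
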
    Signed-off-by: David S. Miller

    Guillaume Nault
     

15 Sep, 2020

2 commits

  • In fl_set_erspan_opt(), all bits of the erspan md were set to 1, as this
    function is also used to set the opt MASK. However, when setting
    md->u.index for the opt VALUE, the remaining bits of the union md->u
    were left at 1. This caused the match on the whole md to fail when
    the version is 1 and only the index is set.

    This patch fixes it by initializing md with 0 before setting the
    erspan md->u.
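
    A rough sketch of the idea (not the exact patch), assuming the usual
    struct erspan_metadata layout from net/erspan.h:

    /* Sketch: when building the option VALUE, wipe the whole union first so
     * that setting only u.index doesn't leave the other bits at the all-ones
     * value used for the MASK. */
    static void set_erspan_index(struct erspan_metadata *md, __be32 index)
    {
            memset(&md->u, 0, sizeof(md->u));
            md->version = 1;
            md->u.index = index;
    }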

    Reported-by: Shuang Li
    Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
    on the vxlan rx/tx path, only the dont_learn/policy_applied/policy_id
    fields can be set in, or parsed from, the packet for the vxlan gbp option.

    So we'd better apply the same mask when setting it in act_tunnel_key and
    cls_flower. Otherwise, users who don't know about these bits may configure
    a value which can never be matched.
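
    A sketch of the kind of masking meant here (macro names taken from
    net/vxlan.h; treat the exact expression as an assumption, not the
    literal patch):

    /* Sketch: keep only the GBP bits vxlan actually builds/parses
     * (dont_learn, policy_applied, policy id); any other bit could never
     * be matched on the wire. */
    static u32 usable_gbp(u32 gbp)
    {
            return gbp & (VXLAN_GBP_DONT_LEARN |
                          VXLAN_GBP_POLICY_APPLIED |
                          VXLAN_GBP_ID_MASK);
    }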

    Reported-by: Shuang Li
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

25 Jul, 2020

1 commit


11 Jul, 2020

1 commit


04 Jul, 2020

1 commit

  • There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can be inconsistent based on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stop
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.
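
    A sketch of how such a helper might look (simplified, not necessarily
    the exact code that was merged):

    static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
    {
            unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
            __be16 proto = skb->protocol;

            if (!skip_vlan)
                    /* VLAN acceleration strips the VLAN header from the skb
                     * and moves the ethertype into skb->vlan_proto. */
                    return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;

            /* Walk past every VLAN tag, so QinQ behaves the same way. */
            while (eth_type_vlan(proto)) {
                    struct vlan_hdr vhdr, *vh;

                    vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
                    if (!vh)
                            break;

                    proto = vh->h_vlan_encapsulated_proto;
                    offset += sizeof(vhdr);
            }

            return proto;
    }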

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Toke Høiland-Jørgensen
     

20 Jun, 2020

1 commit

  • This patch adds a dropped-frames counter to tc flower offloading.
    Reporting h/w dropped frames is necessary for some actions.
    Actions like the police action and the upcoming stream gate action
    produce dropped frames, which the user needs to know about. The stats
    update shows how many packets the filter has matched and how many of
    those were dropped.
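
    A sketch of how a driver could report the new counter (hypothetical
    callback; the flow_stats_update() signature with a 'drops' argument is
    assumed from this change):

    /* Hypothetical driver stats handler feeding hw counters back to flower,
     * now including frames the hardware dropped. */
    static void foo_flower_stats(struct flow_cls_offload *cls, u64 bytes,
                                 u64 pkts, u64 drops, u64 lastused)
    {
            flow_stats_update(&cls->stats, bytes, pkts, drops, lastused,
                              FLOW_ACTION_HW_STATS_DELAYED);
    }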

    v2: Changes
    - Update commit comments as suggested by Jiri Pirko.

    Signed-off-by: Po Liu
    Reviewed-by: Simon Horman
    Reviewed-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Po Liu
     

02 Jun, 2020

2 commits

  • Compiling with W=1 gives the following warning:
    net/sched/cls_flower.c:731:1: warning: ‘mpls_opts_policy’ defined but not used [-Wunused-const-variable=]

    The TCA_FLOWER_KEY_MPLS_OPTS attribute contains a list of
    TCA_FLOWER_KEY_MPLS_OPTS_LSE attributes. Therefore, the attributes
    all have the same type and we can't parse the list with nla_parse*()
    and have the attributes validated automatically using an nla_policy.

    fl_set_key_mpls_opts() properly verifies that all attributes in the
    list are TCA_FLOWER_KEY_MPLS_OPTS_LSE. Then fl_set_key_mpls_lse()
    uses nla_parse_nested() on all these attributes, thus verifying that
    they have the NLA_F_NESTED flag. So we can safely drop the
    mpls_opts_policy.
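
    A simplified sketch of the validation flow described above (error
    handling mostly elided):

    /* Walk the outer OPTS list by hand: every member must be an OPTS_LSE
     * nest; its contents are then validated separately by
     * nla_parse_nested(), which also checks NLA_F_NESTED. */
    static int sketch_check_mpls_opts(struct nlattr *opts)
    {
            struct nlattr *lse;
            int rem;

            nla_for_each_nested(lse, opts, rem) {
                    if (nla_type(lse) != TCA_FLOWER_KEY_MPLS_OPTS_LSE)
                            return -EINVAL;
            }
            return 0;
    }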

    Reported-by: kbuild test robot
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • The fl_flow_key structure is around 500 bytes, so having two of them
    on the stack in one function now exceeds the warning limit after an
    otherwise correct change:

    net/sched/cls_flower.c:298:12: error: stack frame size of 1056 bytes in function 'fl_classify' [-Werror,-Wframe-larger-than=]

    I suspect the fl_classify function could be reworked to only have one
    of them on the stack and modify it in place, but I could not work out
    how to do that.

    As a somewhat hacky workaround, move one of them into an out-of-line
    function to reduce its scope. This does not necessarily reduce the stack
    usage of the outer function, but at least the second copy is removed
    from the stack during most of it and does not add to the stack of
    whatever is called from there.

    I now see 552 bytes of stack usage for fl_classify(), plus 528 bytes
    for fl_mask_lookup().
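
    A sketch of the workaround (helper names follow the commit message,
    internals assumed):

    /* Push the second ~500-byte key into a noinline helper so it only
     * occupies that helper's stack frame, not fl_classify()'s. */
    static noinline_for_stack struct cls_fl_filter *
    fl_mask_lookup(struct fl_flow_mask *mask, struct fl_flow_key *key)
    {
            struct fl_flow_key mkey;                /* the out-of-line copy */

            fl_set_masked_key(&mkey, key, mask);    /* assumed helper */
            return __fl_lookup(mask, &mkey);        /* assumed lookup  */
    }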

    Fixes: 58cff782cc55 ("flow_dissector: Parse multiple MPLS Label Stack Entries")
    Signed-off-by: Arnd Bergmann
    Acked-by: Cong Wang
    Acked-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

27 May, 2020

2 commits

  • With struct flow_dissector_key_mpls now recording the first
    FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
    these LSEs independently.

    In order to avoid creating new netlink attributes for every possible
    depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
    that contains the list of LSEs to match. Each LSE is represented by
    another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
    the attributes representing the depth and the MPLS fields to match at
    this depth (label, TTL, etc.).
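
    A simplified sketch of that nesting on the dump side (error handling
    elided; attribute names from the patch, with only depth and label shown):

    static void sketch_dump_one_lse(struct sk_buff *skb, u8 depth, u32 label)
    {
            struct nlattr *opts, *lse;

            opts = nla_nest_start(skb, TCA_FLOWER_KEY_MPLS_OPTS);
            lse = nla_nest_start(skb, TCA_FLOWER_KEY_MPLS_OPTS_LSE);
            nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH, depth);
            nla_put_u32(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL, label);
            nla_nest_end(skb, lse);
            nla_nest_end(skb, opts);
    }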

    For each MPLS field, the mask is always set to all-ones, as this is
    what the original API did. We could allow user configurable masks in
    the future if there is demand for more flexibility.

    The new API also allows specifying only an LSE depth. In that case,
    Flower only verifies that the MPLS label stack depth is greater than
    or equal to the provided depth (that is, an LSE exists at this depth).

    Filters that only match on one (or more) fields of the first LSE are
    dumped using the old netlink attributes, to avoid confusing user space
    programs that don't understand the new API.

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • The current MPLS dissector only parses the first MPLS Label Stack
    Entry (second LSE can be parsed too, but only to set a key_id).

    This patch adds the possibility to parse several LSEs by making
    __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
    as the Bottom Of Stack bit hasn't been seen, up to a maximum of
    FLOW_DIS_MPLS_MAX entries.

    FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
    many practical purposes, without wasting too much space.

    To record the parsed values, flow_dissector_key_mpls is modified to
    store an array of stack entries, instead of just the values of the
    first one. A bit field, "used_lses", is also added to keep track of
    the LSEs that have been set. The objective is to avoid defining a
    new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
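
    Roughly, the new layout looks like this (sketch, field names as
    described above):

    struct flow_dissector_mpls_lse {
            u32     mpls_ttl:8,
                    mpls_bos:1,
                    mpls_tc:3,
                    mpls_label:20;
    };

    #define FLOW_DIS_MPLS_MAX 7

    struct flow_dissector_key_mpls {
            struct flow_dissector_mpls_lse ls[FLOW_DIS_MPLS_MAX];
            u8 used_lses;   /* one bit per populated LSE */
    };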

    TC flower is adapted for the new struct flow_dissector_key_mpls layout.
    Matching on several MPLS Label Stack Entries will be added in the next
    patch.

    The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
    mlx5's parse_tunnel() now verify that the rule only uses the first LSE
    and fail if it doesn't.

    Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
    slightly modified. Instead of recording the first Entropy Label, it
    now records the last one. This shouldn't have any consequences since
    there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
    in the tree. We'd probably be better off hashing all parsed MPLS labels
    instead (excluding reserved labels) anyway. That'd give better entropy
    and would probably also simplify the code. But that's not the purpose
    of this patch, so I'm keeping that as a possible future improvement.

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     

16 May, 2020

1 commit


31 Mar, 2020

1 commit

  • It may be up to the driver (in case ANY HW stats is passed) to select
    which type of HW stats it is going to use. Add infrastructure to
    expose this information to the user.

    $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
    $ tc -s filter show dev enp3s0np1 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x1
      eth_type ipv4
      dst_ip 192.168.1.1
      in_hw in_hw_count 2
        action order 1: gact action drop
          random type none pass val 0
          index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
          used_hw_stats immediate   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

27 Mar, 2020

3 commits


22 Feb, 2020

1 commit


18 Feb, 2020

2 commits

  • tc flower rules that are based on src or dst port blocking are sometimes
    ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
    ports from the skb for tc flower to match against. However, the port
    dissection is not done when the FLOW_DIS_IS_FRAGMENT bit is set in
    key_control->flags. All callers of __skb_flow_dissect() zero out the
    key_control field, except for fl_classify() as used by the flower
    classifier. Thus, FLOW_DIS_IS_FRAGMENT may be set on entry to
    __skb_flow_dissect(), since key_control is allocated on the stack
    and may not be initialized.

    Since key_basic and key_control are present for all flow keys, let's
    make sure they are initialized.
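
    A sketch of the kind of initialization meant here (fragment inside the
    dissector, simplified):

    struct flow_dissector_key_control *key_control;
    struct flow_dissector_key_basic *key_basic;

    /* key_control and key_basic are present for every flow key, so zero
     * them up front instead of trusting the caller's stack. */
    key_control = skb_flow_dissector_target(flow_dissector,
                                            FLOW_DISSECTOR_KEY_CONTROL,
                                            target_container);
    memset(key_control, 0, sizeof(*key_control));

    key_basic = skb_flow_dissector_target(flow_dissector,
                                          FLOW_DISSECTOR_KEY_BASIC,
                                          target_container);
    memset(key_basic, 0, sizeof(*key_basic));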

    Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
    Co-developed-by: Eric Dumazet
    Signed-off-by: Eric Dumazet
    Acked-by: Cong Wang
    Signed-off-by: Jason Baron
    Signed-off-by: David S. Miller

    Jason Baron
     
  • Refactor tc_setup_flow_action() function not to use rtnl lock and remove
    'rtnl_held' argument that is no longer needed.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

14 Feb, 2020

1 commit

  • Unlike other classifiers that can be offloaded (i.e. users can set flags
    like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
    the netlink attribute 'TCA_FLOWER_FLAGS' provided by the user: add a
    proper entry to fl_policy.
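
    The missing entry is essentially one line in the flower netlink policy
    (sketch):

    static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
            /* ... existing entries ... */
            [TCA_FLOWER_FLAGS] = { .type = NLA_U32 },
    };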

    Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Davide Caratti
     

27 Jan, 2020

1 commit

  • The current implementations of ops->bind_class() are merely
    searching for classid and updating class in the struct tcf_result,
    without invoking either of cl_ops->bind_tcf() or
    cl_ops->unbind_tcf(). This breaks their design, as qdiscs
    like cbq use them to count filters too. This is why syzbot triggered
    the warning in cbq_destroy_class().

    In order to fix this, we have to call cl_ops->bind_tcf() and
    cl_ops->unbind_tcf() like the filter binding path. This patch does
    so by refactoring out two helper functions __tcf_bind_filter()
    and __tcf_unbind_filter(), which are lockless and accept a Qdisc
    pointer, then teaching each implementation to call them correctly.

    Note, we merely pass the Qdisc pointer as an opaque pointer to
    each filter, they only need to pass it down to the helper
    functions without understanding it at all.
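
    A sketch of what such a helper could look like (name from the commit,
    body assumed):

    static inline void __tcf_bind_filter(struct Qdisc *q,
                                         struct tcf_result *r,
                                         unsigned long base)
    {
            unsigned long cl;

            /* Let the qdisc's class ops account for the binding, then swap
             * the new class into the tcf_result and release the old one. */
            cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
            cl = xchg(&r->class, cl);
            if (cl)
                    q->ops->cl_ops->unbind_tcf(q, cl);
    }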

    Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
    Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

31 Dec, 2019

1 commit

  • Revert "net/sched: cls_u32: fix refcount leak in the error path of
    u32_change()", and fix the u32 refcount leak in a more generic way that
    preserves the semantic of rule dumping.
    On tc filters that don't support lockless insertion/removal, there is no
    need to guard against concurrent insertion when a removal is in progress.
    Therefore, for most of them we can avoid a full walk() when deleting, and
    just decrease the refcount, like it was done on older Linux kernels.
    This fixes situations where walk() was wrongly detecting a non-empty
    filter, as happened with cls_u32 in the error path of change(), thus
    leading to failures in the following tdc selftests:

    6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
    6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
    74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id

    On cls_flower, and on (future) lockless filters, this check is necessary:
    move all the check_empty() logic into a callback so that each filter
    can have its own implementation. For cls_flower, it's sufficient to check
    whether any IDR entries have been allocated.
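
    A sketch of what the flower callback could look like (assuming the check
    boils down to the handle IDR being empty):

    static bool fl_delete_empty(struct tcf_proto *tp)
    {
            struct cls_fl_head *head = fl_head_dereference(tp);

            spin_lock(&tp->lock);
            tp->deleting = idr_is_empty(&head->handle_idr);
            spin_unlock(&tp->lock);

            return tp->deleting;    /* true iff no filters are left */
    }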

    This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.

    Changes since v1:
    - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
    is used, thanks to Vlad Buslov
    - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
    - squash revert and new fix in a single patch, to be nice to bisect
    tests that run tdc on the u32 filter, thanks to Dave Miller

    Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
    Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
    Suggested-by: Jamal Hadi Salim
    Suggested-by: Vlad Buslov
    Signed-off-by: Davide Caratti
    Reviewed-by: Vlad Buslov
    Tested-by: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Davide Caratti
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done
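
    For reference, sizeof_field() expands along these lines (sketch of the
    definition in include/linux/stddef.h):

    #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))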

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
     

04 Dec, 2019

1 commit

  • The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
    packets using port ranges") added filtering based on port ranges
    to tc flower. However, the commit missed necessary changes in the
    hw-offload code, so the feature generated incorrect offloaded flow
    keys in the NIC.

    A more detailed example is below:

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 100-200 action drop

    With the setup above, an exact match filter with dst_port == 0 will be
    installed in the NIC by hw-offload. IOW, the NIC will have a rule which
    is equivalent to the following one.

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 0 action drop

    The behavior was caused by the flow dissector, which extracts packet
    data into the flow key in tc flower. More specifically, regardless of
    whether an exact match or a port range was specified, fl_init_dissector()
    set the FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract
    port numbers from the skb in skb_flow_dissect(), called by fl_classify().
    Note that device drivers receive the same struct flow_dissector object as
    used in skb_flow_dissect(). Thus, offload drivers could not identify
    which of the two cases they were handling, because the
    FLOW_DISSECTOR_KEY_PORTS flag was set in struct flow_dissector either way.

    This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
    tp_range field in struct fl_flow_key so that drivers can recognize which
    kind of filter they are given. At this point, when filters based on port
    ranges are passed to drivers, the drivers return the EOPNOTSUPP error
    because they do not support the feature (the newly created
    FLOW_DISSECTOR_KEY_PORTS_RANGE flag).
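
    A sketch of the driver-side effect described above (hypothetical driver
    parse routine):

    /* A driver that hasn't learned about port ranges yet simply refuses to
     * offload rules carrying the new key. */
    static int foo_parse_ports(struct flow_rule *rule,
                               struct flow_match_ports *match)
    {
            if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS_RANGE))
                    return -EOPNOTSUPP;

            if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS))
                    flow_rule_match_ports(rule, match);

            return 0;
    }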

    Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: David S. Miller

    Yoshiki Komachi
     

22 Nov, 2019

2 commits

  • This patch allows matching on erspan options.

    The options can be described in the form:
    VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
    When ver is set to 1, index will be applied while dir
    and hwid will be ignored, and when ver is set to 2,
    dir and hwid will be used while index will be ignored.

    Unlike geneve, only one option can be set. Also,
    geneve options, vxlan options and erspan options
    can't be set at the same time.

    # ip link add name erspan1 type erspan external
    # tc qdisc add dev erspan1 ingress
    # tc filter add dev erspan1 protocol ip parent ffff: \
    flower \
    enc_src_ip 10.0.99.192 \
    enc_dst_ip 10.0.99.193 \
    enc_key_id 11 \
    erspan_opts 1:12:0:0/1:ffff:0:0 \
    ip_proto udp \
    action mirred egress redirect dev eth0

    v1->v2:
    - improve some err msgs of extack.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch allows matching on the gbp option in vxlan.

    The options can be described in the form GBP/GBP_MASK,
    where GBP is represented as a 32bit hexadecimal value.
    Unlike geneve, only one option can be set. Also,
    geneve options and vxlan options can't be set at
    the same time.

    # ip link add name vxlan0 type vxlan dstport 0 external
    # tc qdisc add dev vxlan0 ingress
    # tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
    enc_src_ip 10.0.99.192 \
    enc_dst_ip 10.0.99.193 \
    enc_key_id 11 \
    vxlan_opts 01020304/ffffffff \
    ip_proto udp \
    action mirred egress redirect dev eth0

    v1->v2:
    - add .strict_start_type for enc_opts_policy as Jakub noticed.
    - use Duplicate instead of Wrong in err msg for extack as Jakub
    suggested.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

27 Aug, 2019

5 commits

  • Don't manually take rtnl lock in flower classifier before calling cls
    hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
    parameter.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • In order to remove dependency on rtnl lock when calling hardware offload
    API, take reference to action mirred dev when initializing flow_action
    structure in tc_setup_flow_action(). Implement function
    tc_cleanup_flow_action(), use it to release the device after hardware
    offload API is done using it.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • In order to allow using new flow_action infrastructure from unlocked
    classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
    argument. Take rtnl lock before accessing tc_action data. This is necessary
    to protect from concurrent action replace.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • To remove the dependency on rtnl lock, extend classifier ops with new
    ops->hw_add() and ops->hw_del() callbacks. Call them from the cls API
    while holding cb_lock every time a filter is successfully added to or
    deleted from hardware.

    Implement the new API in flower classifier. Use it to manage hw_filters
    list under cb_lock protection, instead of relying on rtnl lock to
    synchronize with concurrent fl_reoffload() call.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • Without rtnl lock protection, filters can no longer safely manage the
    block offloads counter themselves. Refactor the cls API to protect block
    offloadcnt with tcf_block->cb_lock, which is already used to protect the
    driver callback list and the nooffloaddevcnt counter. The counter can be
    modified concurrently by the new functions that execute block callbacks
    (which is safe with the previous patch that changed its type to atomic_t);
    however, the block bind/unbind code that checks the counter value takes
    cb_lock in write mode to exclude any concurrent modifications. This
    approach prevents race conditions between bind/unbind and callback
    execution code but allows concurrency on the tc rule update path.

    Move the block offload counter, filter-in-hardware counter and filter
    flags management from classifiers into the cls hardware offloads API.
    Make tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update()
    private to the cls API. Implement the following new cls API to be used
    instead:

    tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't
    already in hardware is successfully offloaded, increment the block
    offloads counter and set the filter's in-hardware counter and flag. On
    failure, the previously offloaded filter is considered to be intact and
    the offloads counter is not decremented.

    tc_setup_cb_replace() - destructive filter replace. Release the existing
    filter's block offload counter and reset its in-hardware counter and
    flag. Set the new filter's in-hardware counter and flag. On failure, the
    previously offloaded filter is considered to be destroyed and the offload
    counter is decremented.

    tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
    offloads counter.

    tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
    call tc_cls_offload_cnt_update() if cb() didn't return an error.

    Refactor all offload-capable classifiers to atomically offload filters to
    hardware, change block offload counter, and set filter in hardware counter
    and flag by means of the new cls API functions.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

20 Jul, 2019

1 commit


10 Jul, 2019

2 commits


09 Jul, 2019

1 commit


02 Jul, 2019

1 commit

  • Similarly, other callers of idr_get_next_ul() suffer the same
    overflow bug as they don't handle it properly either.

    Introduce idr_for_each_entry_continue_ul() to help these callers
    iterate from a given ID.

    cls_flower needs more care here because it still overflows when
    doing arg->cookie++; we have to fold its nested loops into one
    and remove the arg->cookie++.

    Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr")
    Fixes: 12d6066c3b29 ("net/mlx5: Add flow counters idr")
    Reported-by: Li Shuang
    Cc: Davide Caratti
    Cc: Vlad Buslov
    Cc: Chris Mi
    Cc: Matthew Wilcox
    Signed-off-by: Cong Wang
    Tested-by: Davide Caratti
    Signed-off-by: David S. Miller

    Cong Wang
     

19 Jun, 2019

1 commit


18 Jun, 2019

1 commit


16 Jun, 2019

1 commit

  • This config option makes only a couple of lines optional:
    two small helpers and an int in a couple of cls structs.

    Remove the config option and always compile this in.
    This saves the user from unexpected surprises when adding
    a filter with an ingress device match that is silently ignored
    when the config option is not set.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

15 Jun, 2019

1 commit

  • The current flower mask creation code assumes that the temporary mask
    used when inserting a new filter is stack allocated. To prevent a race
    condition with the data path, synchronize_rcu() is called every time
    fl_create_new_mask() replaces the temporary stack-allocated mask. As
    reported by Jiri, this increases the runtime of creating 20000 flower
    classifiers from 4 seconds to 163 seconds. However, this design is no
    longer necessary since the temporary mask was converted to be dynamically
    allocated by commit 2cddd2014782 ("net/sched: cls_flower: allocate mask
    dynamically in fl_change()").

    Remove the synchronize_rcu() calls from the mask creation code. Instead,
    refactor fl_change() to always deallocate the temporary mask with an rcu
    grace period.
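
    A generic sketch of the pattern (hypothetical structure, not the actual
    flower code): instead of blocking in synchronize_rcu() before freeing,
    give the object an rcu_head and free it asynchronously after the grace
    period.

    struct tmp_mask {
            struct rcu_head rcu;
            /* ... mask contents ... */
    };

    static void tmp_mask_put(struct tmp_mask *m)
    {
            kfree_rcu(m, rcu);      /* no synchronous wait on readers */
    }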

    Fixes: 195c234d15c9 ("net: sched: flower: handle concurrent mask insertion")
    Reported-by: Jiri Pirko
    Signed-off-by: Vlad Buslov
    Tested-by: Jiri Pirko
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov