31 Mar, 2020

1 commit

  • It may be up to the driver (in case the ANY HW stats type is passed)
    to select which type of HW stats it is going to use. Add an
    infrastructure to expose this information to the user; a driver-side
    sketch follows this entry.

    $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
    $ tc -s filter show dev enp3s0np1 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x1
    eth_type ipv4
    dst_ip 192.168.1.1
    in_hw in_hw_count 2
    action order 1: gact action drop
    random type none pass val 0
    index 1 ref 1 bind 1 installed 10 sec used 10 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
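
    A driver-side sketch of reporting the chosen type: the names foo_priv
    and foo_hw_read_counters() are hypothetical, and the used_hw_stats
    argument of flow_stats_update() as well as the exact
    FLOW_ACTION_HW_STATS_IMMEDIATE enum name are assumed from this
    infrastructure.

    /* Hypothetical driver stats callback: report which HW stats type
     * was actually used when the filter was installed with ANY. */
    static int foo_flower_stats(struct foo_priv *priv,
                                struct flow_cls_offload *f)
    {
            u64 bytes, pkts, lastused;

            foo_hw_read_counters(priv, f->cookie, &bytes, &pkts, &lastused);

            /* Counters were read straight from HW: "immediate" stats. */
            flow_stats_update(&f->stats, bytes, pkts, lastused,
                              FLOW_ACTION_HW_STATS_IMMEDIATE);
            return 0;
    }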
     

18 Feb, 2020

2 commits

  • tc flower rules that block based on src or dst port are sometimes
    ineffective due to uninitialized stack data. __skb_flow_dissect()
    extracts ports from the skb for tc flower to match against. However,
    port dissection is skipped when the FLOW_DIS_IS_FRAGMENT bit is set in
    key_control->flags. All callers of __skb_flow_dissect() zero out the
    key_control field, except for fl_classify() as used by the flower
    classifier. Thus, FLOW_DIS_IS_FRAGMENT may already be set on entry to
    __skb_flow_dissect(), since key_control is allocated on the stack
    and may not be initialized.

    Since key_basic and key_control are present for all flow keys, let's
    make sure they are initialized; a sketch of the fix follows this entry.

    Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
    Co-developed-by: Eric Dumazet
    Signed-off-by: Eric Dumazet
    Acked-by: Cong Wang
    Signed-off-by: Jason Baron
    Signed-off-by: David S. Miller

    Jason Baron
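
    The shape of the fix, as a sketch (not the exact upstream diff): clear
    the two always-present keys inside __skb_flow_dissect() instead of
    trusting the caller's storage.

    /* key_basic and key_control exist for every flow key, so zero them
     * up front; a stack-allocated key from fl_classify() may otherwise
     * carry a stale FLOW_DIS_IS_FRAGMENT bit. */
    key_control = skb_flow_dissector_target(flow_dissector,
                                            FLOW_DISSECTOR_KEY_CONTROL,
                                            target_container);
    memset(key_control, 0, sizeof(*key_control));

    key_basic = skb_flow_dissector_target(flow_dissector,
                                          FLOW_DISSECTOR_KEY_BASIC,
                                          target_container);
    memset(key_basic, 0, sizeof(*key_basic));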
     
  • Refactor the tc_setup_flow_action() function to not use the rtnl lock,
    and remove the 'rtnl_held' argument that is no longer needed.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     

14 Feb, 2020

1 commit

  • Unlike other classifiers that can be offloaded (i.e. users can set flags
    like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
    the netlink attribute 'TCA_FLOWER_FLAGS' provided by the user: add a
    proper entry to fl_policy, as sketched after this entry.

    Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Davide Caratti
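
    The added policy entry is essentially one line; a sketch of its shape:

    /* In fl_policy: give TCA_FLOWER_FLAGS a type, so netlink parsing
     * rejects attributes shorter than a u32 instead of accepting any
     * length. */
    [TCA_FLOWER_FLAGS]              = { .type = NLA_U32 },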
     

27 Jan, 2020

1 commit

  • The current implementations of ops->bind_class() merely search for
    the classid and update the class in struct tcf_result, without
    invoking either cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This
    breaks their design, as qdiscs like cbq use these callbacks to count
    filters too. This is why syzbot triggered the warning in
    cbq_destroy_class().

    In order to fix this, we have to call cl_ops->bind_tcf() and
    cl_ops->unbind_tcf() like the filter binding path. This patch does
    so by refactoring out two helper functions __tcf_bind_filter()
    and __tcf_unbind_filter(), which are lockless and accept a Qdisc
    pointer, then teaching each implementation to call them correctly
    (a sketch of the helper follows this entry).

    Note, we merely pass the Qdisc pointer as an opaque pointer to
    each filter, they only need to pass it down to the helper
    functions without understanding it at all.

    Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
    Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
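
    A sketch of the refactored helper (the exact body is assumed from the
    description above):

    static inline void __tcf_bind_filter(struct Qdisc *q,
                                         struct tcf_result *r,
                                         unsigned long base)
    {
            unsigned long cl;

            /* Bind the new class, then unbind whatever was previously
             * recorded in r, keeping per-class filter counts (as used
             * by cbq) balanced. */
            cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
            cl = __cls_set_class(&r->class, cl);
            if (cl)
                    q->ops->cl_ops->unbind_tcf(q, cl);
    }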
     

31 Dec, 2019

1 commit

  • Revert "net/sched: cls_u32: fix refcount leak in the error path of
    u32_change()", and fix the u32 refcount leak in a more generic way that
    preserves the semantic of rule dumping.
    On tc filters that don't support lockless insertion/removal, there is no
    need to guard against concurrent insertion when a removal is in progress.
    Therefore, for most of them we can avoid a full walk() when deleting, and
    just decrease the refcount, like it was done on older Linux kernels.
    This fixes situations where walk() wrongly detected a non-empty
    filter, as happened with cls_u32 in the error path of change(), thus
    leading to failures in the following tdc selftests:

    6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
    6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
    74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id

    On cls_flower, and on (future) lockless filters, this check is necessary:
    move all the check_empty() logic into a callback so that each filter
    can have its own implementation. For cls_flower, it's sufficient to check
    that no IDRs have been allocated; a sketch follows this entry.

    This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.

    Changes since v1:
    - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
    is used, thanks to Vlad Buslov
    - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
    - squash revert and new fix in a single patch, to be nice with bisect
    tests that run tdc on u32 filter, thanks to Dave Miller

    Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
    Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
    Suggested-by: Jamal Hadi Salim
    Suggested-by: Vlad Buslov
    Signed-off-by: Davide Caratti
    Reviewed-by: Vlad Buslov
    Tested-by: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Davide Caratti
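
    For flower, the callback can reduce to an IDR emptiness check; a
    sketch, assuming filter handles live in head->handle_idr:

    static bool fl_delete_empty(struct tcf_proto *tp)
    {
            struct cls_fl_head *head = fl_head_dereference(tp);

            spin_lock(&tp->lock);
            /* No allocated IDRs means no filters: safe to delete tp. */
            tp->deleting = idr_is_empty(&head->handle_idr);
            spin_unlock(&tp->lock);

            return tp->deleting;
    }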
     

10 Dec, 2019

1 commit

  • Replace all occurrences of FIELD_SIZEOF() with sizeof_field(), except
    at the places where these are defined; the replacement macro is shown
    after this entry. Later patches will remove the unused definition of
    FIELD_SIZEOF().

    This patch was generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
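
    For reference, the replacement macro is functionally identical to
    FIELD_SIZEOF(); its definition is essentially:

    /* Size of a struct member, without needing an instance of the
     * struct: dereference a NULL-typed pointer inside sizeof(), which
     * is never evaluated. */
    #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))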
     

04 Dec, 2019

1 commit

  • The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
    packets using port ranges") added filtering based on port ranges
    to tc flower. However, the commit missed the necessary changes in the
    hw-offload code, so the feature resulted in incorrect flow keys being
    offloaded to the NIC.

    A more detailed example is below:

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 100-200 action drop

    With the setup above, an exact match filter with dst_port == 0 will be
    installed in the NIC by hw-offload. IOW, the NIC will have a rule which
    is equivalent to the following one.

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 0 action drop

    The behavior was caused by the flow dissector, which extracts packet
    data into the flow key in tc flower. More specifically, regardless
    of exact match or specified port ranges, fl_init_dissector() set the
    FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
    numbers from the skb in skb_flow_dissect() called by fl_classify(). Note
    that device drivers received the same struct flow_dissector object as
    used in skb_flow_dissect(). Thus, offload drivers could not identify
    whether an exact match or a port range was requested, because the
    FLOW_DISSECTOR_KEY_PORTS flag was set in struct flow_dissector in
    either case.

    This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
    tp_range field in struct fl_flow_key so that offload drivers can
    recognize which kind of filter is applied. At this point, when filters
    based on port ranges are passed to drivers, the drivers return the
    EOPNOTSUPP error because they do not support the feature (the newly
    created FLOW_DISSECTOR_KEY_PORTS_RANGE flag); a minimal driver-side
    sketch follows this entry.

    Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: David S. Miller

    Yoshiki Komachi
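
    On the driver side, the new key makes the situation detectable; a
    minimal sketch of the rejection:

    /* In a driver's flower offload parser: port ranges cannot be
     * expressed in this hardware, so refuse them rather than silently
     * installing an exact match on port 0. */
    if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS_RANGE))
            return -EOPNOTSUPP;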
     

22 Nov, 2019

2 commits

  • This patch is to allow matching options in erspan.

    The options can be described in the form:
    VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
    When ver is set to 1, index will be applied while dir
    and hwid will be ignored, and when ver is set to 2,
    dir and hwid will be used while index will be ignored.

    Unlike geneve, only one option can be set. Also, geneve options,
    vxlan options and erspan options can't be set at the same time.

    # ip link add name erspan1 type erspan external
    # tc qdisc add dev erspan1 ingress
    # tc filter add dev erspan1 protocol ip parent ffff: \
    flower \
    enc_src_ip 10.0.99.192 \
    enc_dst_ip 10.0.99.193 \
    enc_key_id 11 \
    erspan_opts 1:12:0:0/1:ffff:0:0 \
    ip_proto udp \
    action mirred egress redirect dev eth0

    v1->v2:
    - improve some err msgs of extack.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • This patch is to allow matching gbp option in vxlan.

    The options can be described in the form GBP/GBP_MASK,
    where GBP is represented as a 32bit hexadecimal value.
    Unlike geneve, only one option can be set. Also, geneve options
    and vxlan options can't be set at the same time.

    # ip link add name vxlan0 type vxlan dstport 0 external
    # tc qdisc add dev vxlan0 ingress
    # tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
    enc_src_ip 10.0.99.192 \
    enc_dst_ip 10.0.99.193 \
    enc_key_id 11 \
    vxlan_opts 01020304/ffffffff \
    ip_proto udp \
    action mirred egress redirect dev eth0

    v1->v2:
    - add .strict_start_type for enc_opts_policy as Jakub noticed.
    - use Duplicate instead of Wrong in err msg for extack as Jakub
    suggested.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

27 Aug, 2019

5 commits

  • Don't manually take rtnl lock in flower classifier before calling cls
    hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
    parameter.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • In order to remove dependency on rtnl lock when calling hardware offload
    API, take reference to action mirred dev when initializing flow_action
    structure in tc_setup_flow_action(). Implement the function
    tc_cleanup_flow_action() and use it to release the device after the
    hardware offload API is done using it; a sketch follows this entry.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
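
    Conceptually, the cleanup walks the flow_action entries and drops the
    device reference taken at setup time; a sketch (the exact upstream
    body may differ):

    static void tc_cleanup_flow_action(struct flow_action *flow_action)
    {
            struct flow_action_entry *entry;
            int i;

            flow_action_for_each(i, entry, flow_action) {
                    switch (entry->id) {
                    case FLOW_ACTION_REDIRECT:
                    case FLOW_ACTION_MIRRED:
                            /* Paired with the reference taken in
                             * tc_setup_flow_action(). */
                            if (entry->dev)
                                    dev_put(entry->dev);
                            break;
                    default:
                            break;
                    }
            }
    }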
     
  • In order to allow using new flow_action infrastructure from unlocked
    classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
    argument. Take rtnl lock before accessing tc_action data. This is necessary
    to protect from concurrent action replace.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • To remove the dependency on the rtnl lock, extend classifier ops with
    new ops->hw_add() and ops->hw_del() callbacks. Call them from the cls
    API, while holding cb_lock, every time a filter is successfully added
    to or deleted from hardware; the signatures are sketched after this
    entry.

    Implement the new API in flower classifier. Use it to manage hw_filters
    list under cb_lock protection, instead of relying on rtnl lock to
    synchronize with concurrent fl_reoffload() call.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
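
    The new hooks on struct tcf_proto_ops are simple notification
    callbacks; their signatures, roughly:

    /* Called from the cls API under cb_lock whenever a filter has been
     * added to or deleted from hardware; flower uses them to maintain
     * its hw_filters list. */
    void    (*hw_add)(struct tcf_proto *tp, void *type_data);
    void    (*hw_del)(struct tcf_proto *tp, void *type_data);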
     
  • Without rtnl lock protection, filters can no longer safely manage the
    block offloads counter themselves. Refactor the cls API to protect the
    block offloadcnt with tcf_block->cb_lock, which is already used to
    protect the driver callback list and the nooffloaddevcnt counter. The
    counter can be modified by concurrent tasks through the new functions
    that execute block callbacks (which is safe with the previous patch
    that changed its type to atomic_t); however, block bind/unbind code
    that checks the counter value takes cb_lock in write mode to exclude
    any concurrent modifications. This approach prevents race conditions
    between bind/unbind and callback execution code, but allows
    concurrency on the tc rule update path.

    Move block offload counter, filter in hardware counter and filter flags
    management from classifiers into cls hardware offloads API. Make functions
    tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
    private. Implement following new cls API to be used instead:

    tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
    already in hardware is successfully offloaded, increment block offloads
    counter, set filter in hardware counter and flag. On failure, previously
    offloaded filter is considered to be intact and offloads counter is not
    decremented.

    tc_setup_cb_replace() - destructive filter replace. Release existing
    filter block offload counter and reset its in hardware counter and flag.
    Set new filter in hardware counter and flag. On failure, previously
    offloaded filter is considered to be destroyed and offload counter is
    decremented.

    tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
    offloads counter.

    tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
    call tc_cls_offload_cnt_update() if cb() didn't return an error.

    Refactor all offload-capable classifiers to atomically offload filters
    to hardware, change the block offload counter, and set the filter's
    in-hardware counter and flag by means of the new cls API functions;
    a usage sketch follows this entry.

    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
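
    From a classifier's point of view, the add path then looks roughly
    like this (the argument list is assumed from the description above):

    /* Non-destructive add: on failure, previously offloaded state is
     * left intact and the block offloads counter is untouched. */
    err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
                          skip_sw, &f->flags, &f->in_hw_count, rtnl_held);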
     

02 Jul, 2019

1 commit

  • Similarly, other callers of idr_get_next_ul() suffer from the same
    overflow bug, as they don't handle it properly either.

    Introduce idr_for_each_entry_continue_ul() to help these callers
    iterate from a given ID; its rough shape is sketched after this entry.

    cls_flower needs more care here because it still overflows when it
    does arg->cookie++; we have to fold its nested loops into one
    and remove the arg->cookie++.

    Fixes: 01683a146999 ("net: sched: refactor flower walk to iterate over idr")
    Fixes: 12d6066c3b29 ("net/mlx5: Add flow counters idr")
    Reported-by: Li Shuang
    Cc: Davide Caratti
    Cc: Vlad Buslov
    Cc: Chris Mi
    Cc: Matthew Wilcox
    Signed-off-by: Cong Wang
    Tested-by: Davide Caratti
    Signed-off-by: David S. Miller

    Cong Wang
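
    The helper continues an IDR walk from a caller-supplied ID; its rough
    shape (the upstream macro may differ in detail):

    /* Iterate from id onward, letting idr_get_next_ul() skip holes;
     * callers no longer open-code the overflow-prone cookie++ loop. */
    #define idr_for_each_entry_continue_ul(idr, entry, id)          \
            for (entry = idr_get_next_ul(idr, &(id));               \
                 entry;                                             \
                 ++id, entry = idr_get_next_ul(idr, &(id)))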
     

16 Jun, 2019

1 commit

  • This config option makes only a couple of lines optional: two small
    helpers and an int in a couple of cls structs.

    Remove the config option and always compile this in. This saves the
    user from unexpected surprises when they add a filter with an ingress
    device match that is silently ignored because the config option is
    not set.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

15 Jun, 2019

1 commit

  • The current flower mask creation code assumes that the temporary mask
    used when inserting a new filter is stack-allocated. To prevent a race
    condition with the data path, synchronize_rcu() is called every time
    fl_create_new_mask() replaces the temporary stack-allocated mask. As
    reported by Jiri, this increases the runtime of creating 20000 flower
    classifiers from 4 seconds to 163 seconds. However, this design is no
    longer necessary since the temporary mask was converted to be
    dynamically allocated by commit 2cddd2014782
    ("net/sched: cls_flower: allocate mask dynamically in fl_change()").

    Remove the synchronize_rcu() calls from the mask creation code.
    Instead, refactor fl_change() to always deallocate the temporary mask
    after an rcu grace period; the general pattern is sketched after this
    entry.

    Fixes: 195c234d15c9 ("net: sched: flower: handle concurrent mask insertion")
    Reported-by: Jiri Pirko
    Signed-off-by: Vlad Buslov
    Tested-by: Jiri Pirko
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
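
    The general pattern, as a generic sketch (tmp_mask, its rcu member and
    tmp_mask_free_rcu() are hypothetical names, not the flower ones):

    /* Before: each temporary-mask replacement stalls the inserting task
     * for a full RCU grace period. */
    synchronize_rcu();
    kfree(tmp_mask);

    /* After: defer the free until after the grace period instead, so
     * back-to-back filter inserts no longer serialize on it. */
    call_rcu(&tmp_mask->rcu, tmp_mask_free_rcu);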
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 May, 2019

1 commit

  • Based on feedback from Jiri, avoid carrying a pointer to the tcf_block
    structure in the tc_cls_common_offload structure. Instead store
    a flag in driver private data which indicates if offloads apply
    to a shared block at block binding time.

    Suggested-by: Jiri Pirko
    Signed-off-by: Pieter Jansen van Vuuren
    Reviewed-by: Jakub Kicinski
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Pieter Jansen van Vuuren
     

06 May, 2019

1 commit

  • Some actions like the police action are stateful and could share state
    between devices. This is incompatible with offloading to multiple devices
    and drivers might want to test for shared blocks when offloading.
    Store a pointer to the tcf_block structure in the tc_cls_common_offload
    structure to allow drivers to determine when offloads apply to a shared
    block.

    Signed-off-by: Pieter Jansen van Vuuren
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Pieter Jansen van Vuuren
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally, it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even though the NLA_F_NESTED flag was introduced more than 11 years
    ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
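
    After the rename, the flag-adding wrapper is trivial; essentially:

    /* New nla_nest_start(): identical to the old call, but always sets
     * NLA_F_NESTED so generic parsers can recognize the nesting. */
    static inline struct nlattr *nla_nest_start(struct sk_buff *skb,
                                                int attrtype)
    {
            return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
    }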
     

25 Apr, 2019

1 commit

  • Recent changes that introduced unlocked flower did not properly account
    for the case when reoffload is initiated concurrently with filter
    updates. To fix the issue, extend flower with a 'hw_filters' list that
    is used to store filters that don't have the 'skip_hw' flag set. A
    filter is added to the list when it is inserted to hardware and only
    removed from it after being unoffloaded from all drivers that the
    parent block is attached to. This ensures that concurrent reoffload
    can still access a filter that is being deleted, and prevents a race
    condition where a driver callback could be removed while the filter is
    no longer accessible through idr but is still present in hardware.

    Refactor fl_change() to respect new filter reference counter and to release
    filter reference with __fl_put() in case of error, instead of directly
    deallocating filter memory. This allows for concurrent access to filter
    from fl_reoffload() and protects it with reference counting. Refactor
    fl_reoffload() to iterate over hw_filters list instead of idr. Implement
    fl_get_next_hw_filter() helper function that is used to iterate over
    hw_filters list with reference counting and skips filters that are being
    concurrently deleted.

    Fixes: 92149190067d ("net: sched: flower: set unlocked flag for flower proto ops")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Vlad Buslov
     

12 Apr, 2019

2 commits

  • Fix net reference counting in fl_change() and remove redundant call to
    tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
    before releasing exts and deallocating a filter, so this code caused
    the flower classifier to obtain net twice per filter being deleted.

    The implementation of __fl_delete() called tcf_exts_get_net() to pass
    its result as the 'async' flag to fl_mask_put(). However, the 'async'
    flag is redundant and only complicates the fl_mask_put()
    implementation. This functionality seems to have been copied from the
    filter cleanup code, where it was added by Cong with the following
    explanation:

    This patchset tries to fix the race between call_rcu() and
    cleanup_net() again. Without holding the netns refcnt the
    tc_action_net_exit() in netns workqueue could be called before
    filter destroy works in tc filter workqueue. This patchset
    moves the netns refcnt from tc actions to tcf_exts, without
    breaking per-netns tc actions.

    This doesn't apply to flower mask, which doesn't call any tc action code
    during cleanup. Simplify fl_mask_put() by removing the flag parameter and
    always use tcf_queue_work() to free mask objects.

    Fixes: 061775583e35 ("net: sched: flower: introduce reference counting for filters")
    Fixes: 1f17f7742eeb ("net: sched: flower: insert filter to ht before offloading it to hw")
    Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
    Reported-by: Ido Schimmel
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     
  • The implementation of rhashtable_insert_fast() checks whether its
    internal helper __rhashtable_insert_fast() returns a non-NULL pointer
    and seemingly returns -EEXIST in such a case. However, since
    __rhashtable_insert_fast() is called with a NULL key pointer, it never
    actually checks for duplicates, which means that -EEXIST is never
    returned to the user. Use the rhashtable_lookup_insert_fast() hash
    table API instead. In order to verify that it works as expected, and
    to prevent the problem from happening in the future, extend tc-tests
    with a new test that verifies that no new filters with an existing key
    can be inserted to the flower classifier.

    Fixes: 1f17f7742eeb ("net: sched: flower: insert filter to ht before offloading it to hw")
    Signed-off-by: Vlad Buslov
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Vlad Buslov
     

08 Apr, 2019

1 commit

  • John reports:

    Recent refactoring of fl_change aims to use the classifier spinlock to
    avoid the need for the rtnl lock. In doing so, the fl_hw_replace_filter()
    function was moved to before the lock is taken. This can create problems
    for drivers if duplicate filters are created (common in ovs tc offload
    due to filters being triggered by user-space matches).

    Drivers registered for such filters will now receive multiple copies of
    the same rule, each with a different cookie value. This means that the
    drivers would need to do a full match field lookup to determine
    duplicates, repeating work that will happen in flower __fl_lookup().
    Currently, drivers do not expect to receive duplicate filters.

    To fix this, verify that a filter with the same key is not present in
    the flower classifier hash table and insert the new filter to the
    flower hash table before offloading it to hardware. Implement the
    helper function fl_ht_insert_unique() to atomically verify/insert a
    filter; a sketch follows this entry.

    This change makes filter visible to fast path at the beginning of
    fl_change() function, which means it can no longer be freed directly in
    case of error. Refactor fl_change() error handling code to deallocate the
    filter with rcu timeout.

    Fixes: 620da4860827 ("net: sched: flower: refactor fl_change")
    Reported-by: John Hurley
    Signed-off-by: Vlad Buslov
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
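
    A sketch of the helper's shape; note that, as the 12 Apr entry above
    describes, the duplicate check only became effective once
    rhashtable_lookup_insert_fast() was used, which is the form shown here
    (the error-handling details are assumed):

    static int fl_ht_insert_unique(struct cls_fl_filter *fnew,
                                   struct cls_fl_filter *fold, bool *in_ht)
    {
            struct rhashtable_params params =
                    fnew->mask->filter_ht_params;
            int err;

            /* Atomically verify that no filter with the same key exists
             * and insert the new one, before any hardware offload. */
            err = rhashtable_lookup_insert_fast(&fnew->mask->ht,
                                                &fnew->ht_node, params);
            if (err) {
                    *in_ht = false;
                    /* On replace, colliding with the filter being
                     * replaced is expected. */
                    if (fold && err == -EEXIST)
                            err = 0;
                    return err;
            }
            *in_ht = true;
            return 0;
    }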
     

05 Apr, 2019

1 commit

  • Recent changes to TC flower remove the requirement for rtnl lock when
    accessing and modifying filters. Refcounts now ensure access and deletion
    do not happen concurrently. However, the reoffload function which cycles
    through all filters and replays them to registered hw drivers is not
    protected.

    Use the fl_get_next_filter() function to cycle the filters for reoffload
    and ensure the ref taken by this function is put when done with each
    filter.

    Signed-off-by: John Hurley
    Reviewed-by: Jakub Kicinski
    Reviewed-by: Vlad Buslov
    Signed-off-by: David S. Miller

    John Hurley
     

22 Mar, 2019

2 commits

  • Set TCF_PROTO_OPS_DOIT_UNLOCKED for the flower classifier to indicate
    that its ops callbacks don't require the caller to hold the rtnl lock
    (see the snippet after this entry). Don't take the rtnl lock in
    fl_destroy_filter_work(), which is executed on a workqueue instead of
    being called by the cls API, and is therefore not affected by setting
    TCF_PROTO_OPS_DOIT_UNLOCKED. The rtnl mutex is still manually taken by
    the flower classifier before calling the hardware offloads API, which
    has not been updated for unlocked execution.

    Signed-off-by: Vlad Buslov
    Reviewed-by: Stefano Brivio
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov
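
    Flower advertises this through the ops flags field; schematically:

    /* In cls_fl_ops: tell the cls API that flower's callbacks may be
     * invoked without the caller holding rtnl. */
    .flags          = TCF_PROTO_OPS_DOIT_UNLOCKED,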
     
  • Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag
    to internal functions that need to know rtnl lock state. Take rtnl lock
    before calling tcf APIs that require it (hw offload, bind filter, etc.).

    Signed-off-by: Vlad Buslov
    Reviewed-by: Stefano Brivio
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vlad Buslov