03 Jan, 2018
1 commit
-
[ Upstream commit b59e6979a86384e68b0ab6ffeab11f0034fba82d ]
Move static key increments to the beginning of the init function
so they pair 1:1 with decrements in ingress/clsact_destroy,
which is called in case ingress/clsact_init fails.Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko
Acked-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
26 Aug, 2017
1 commit
-
For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:1) For class modification and dumping paths, we already hold RTNL lock,
so all of these ->get(),->change(),->put() are atomic.2) For filter bindiing/unbinding, we use other reference counter than
this one, and they should have RTNL lock too.3) For ->qlen_notify(), it is special because it is called on ->enqueue()
path, but we already hold qdisc tree lock there, and we hold this
tree lock when graft or delete the class too, so it should not be gone
or changed until we release the tree lock.Therefore, this patch removes ->get() and ->put(), but:
1) Adds a new ->find() to find the pointer to a class by classid, no
refcnt.2) Move the original class destroy upon the last refcnt into ->delete(),
right after releasing tree lock. This is fine because the class is
already removed from hash when holding the lock.For those who also use ->put() as ->unbind(), just rename them to reflect
this change.Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
12 Aug, 2017
1 commit
-
cops->tcf_cl_offload is no longer needed, as the drivers check what they
can and cannot offload using the classid identify helpers. So remove this.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
18 May, 2017
1 commit
-
Currently, the filter chains are direcly put into the private structures
of qdiscs. In order to be able to have multiple chains per qdisc and to
allow filter chains sharing among qdiscs, there is a need for common
object that would hold the chains. This introduces such object and calls
it "tcf_block".Helpers to get and put the blocks are provided to be called from
individual qdisc code. Also, the original filter_list pointers are left
in qdisc privs to allow the entry into tcf_block processing without any
added overhead of possible multiple pointer dereference on fast path.Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
11 Feb, 2017
1 commit
-
Creation is done in this file, move destruction to be at the same place.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
08 Jun, 2016
1 commit
-
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Signed-off-by: David S. Miller
11 Jan, 2016
1 commit
-
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.Example, adding qdisc:
# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1Adding filters (deleting, etc works analogous by specifying ingress/egress):
# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-actionA 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
[1] http://patchwork.ozlabs.org/patch/512949/
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Signed-off-by: David S. Miller
11 May, 2015
1 commit
-
Ingress qdisc has no other purpose than calling into tc_classify()
that executes attached classifier(s) and action(s).It has a 1:1 relationship to dev->ingress_queue. After having commit
087c1a601ad7 ("net: sched: run ingress qdisc without locks") removed
the central ingress lock, one major contention point is gone.The extra indirection layers however, are not necessary for calling
into ingress qdisc. pktgen calling locally into netif_receive_skb()
with a dummy u32, single CPU result on a Supermicro X10SLM-F, Xeon
E3-1240: before ~21,1 Mpps, after patch ~22,9 Mpps.We can redirect the private classifier list to the netdev directly,
without changing any classifier API bits (!) and execute on that from
handle_ing() side. The __QDISC_STATE_DEACTIVATE test can be removed,
ingress qdisc doesn't have a queue and thus dev_deactivate_queue()
is also not applicable, ingress_cl_list provides similar behaviour.
In other words, ingress qdisc acts like TCQ_F_BUILTIN qdisc.One next possible step is the removal of the dev's ingress (dummy)
netdev_queue, and to only have the list member in the netdevice
itself.Note, the filter chain is RCU protected and individual filter elements
are being kfree'd by sched subsystem after RCU grace period. RCU read
lock is being held by __netif_receive_skb_core().Joint work with Alexei Starovoitov.
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller
04 May, 2015
1 commit
-
TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.
In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 MppsSigned-off-by: John Fastabend
Signed-off-by: Alexei Starovoitov
Signed-off-by: Jamal Hadi Salim
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
14 Apr, 2015
1 commit
-
Even if we make use of classifier and actions from the egress
path, we're going into handle_ing() executing additional code
on a per-packet cost for ingress qdisc, just to realize that
nothing is attached on ingress.Instead, this can just be blinded out as a no-op entirely with
the use of a static key. On input fast-path, we already make
use of static keys in various places, e.g. skb time stamping,
in RPS, etc. It makes sense to not waste time when we're assured
that no ingress qdisc is attached anywhere.Enabling/disabling of that code path is being done via two
helpers, namely net_{inc,dec}_ingress_queue(), that are being
invoked under RTNL mutex when a ingress qdisc is being either
initialized or destructed.Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
30 Sep, 2014
1 commit
-
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
14 Sep, 2014
1 commit
-
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.Signed-off-by: John Fastabend
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Mar, 2014
1 commit
-
nla_nest_end() already has return skb->len, so replace
return skb->len with return nla_nest_end instead().Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
11 Jan, 2011
1 commit
-
HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packetsAdd bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 May, 2010
1 commit
-
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
06 Sep, 2009
1 commit
-
Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optionally and consistently return
-EOPNOTSUPP for unsupported operations, instead of currently either
-EOPNOTSUPP, -ENOSYS or no error.In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only orginate from ->get() or in case of ->change is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
05 Sep, 2009
1 commit
-
If the parent qdisc doesn't support classes, use EOPNOTSUPP.
If the parent class doesn't exist, use ENOENT. Currently EINVAL
is returned in both cases.Additionally check whether grafting is supported and remove a now
unnecessary graft function from sch_ingress.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
20 Jul, 2008
1 commit
-
Signed-off-by: Jussi Kivilinna
Signed-off-by: David S. Miller
02 Jul, 2008
1 commit
-
Pass double tcf_proto pointers to tcf_destroy_chain() to make it
clear the start of the filter list for more consistency.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
01 Feb, 2008
1 commit
-
Since the old policer code is gone, TC actions are needed for policing.
The ingress qdisc can get packets directly from netif_receive_skb()
in case TC actions are enabled or through netfilter otherwise, but
since without TC actions there is no policer the only thing it actually
does is count packets.Remove the netfilter support and always require TC actions.
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
29 Jan, 2008
15 commits
-
Use nla_nest_start/nla_nest_end for dumping nested attributes.
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
The printk about ingress qdisc registration error can't be triggered
under normal circumstances. Since register_qdisc only fails for two
identical registrations, the only way to trigger it is by loading the
sch_ingress modules multiple times under different names, in which
case we already return -EEXIST to userspace.Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Move the repeating "ifndef CONFIG_NET_CLS_ACT/ifdef CONFIG_NETFILTER"
ifdefs into a single condition.Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Instead of complaining at scheduler initialization time, check the
dependencies in Kconfig.Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
- ->reset is optional
- sch_api provides identical defaults for ->dequeue/->requeue
- ->drop can't happen since ingress never has a parent qdiscSigned-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Remove excessive debugging statements and some "future use" stuff.
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Qdisc_class_ops are const, and Qdisc_ops are mostly read.
Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.Signed-off-by: Patrick McHardy
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
16 Oct, 2007
1 commit
-
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
31 Jul, 2007
1 commit
-
Fix handling of empty or completely non-matching filter chains. In
that case -1 is returned and tcf_result is uninitialized, the
qdisc should fall back to default classification in that case.Noticed by PJ Waskiewicz .
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
15 Jul, 2007
1 commit
-
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
11 Jul, 2007
1 commit
-
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
26 Apr, 2007
1 commit
-
Spring cleaning time...
There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller