05 Jul, 2017
1 commit
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
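To illustrate why a dedicated reference-counter type helps, here is a minimal userspace sketch of a counter that saturates instead of wrapping. It is not the kernel's refcount_t implementation: the sketch_ names are hypothetical, and the operations are deliberately non-atomic to keep the idea visible.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; NOT the kernel's refcount_t. */
struct sketch_refcount {
	unsigned int refs;
};

/* Increment, but stick at UINT_MAX instead of wrapping around to 0. */
static void sketch_refcount_inc(struct sketch_refcount *r)
{
	if (r->refs != UINT_MAX)
		r->refs++;
}

/* Take a reference only if the object is still live (count != 0). */
static bool sketch_refcount_inc_not_zero(struct sketch_refcount *r)
{
	if (r->refs == 0)
		return false;
	sketch_refcount_inc(r);
	return true;
}

/*
 * Drop a reference and report whether it was the last one.
 * A saturated counter is never decremented, so the object is leaked
 * rather than freed while references may still exist.
 */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
	if (r->refs == UINT_MAX || r->refs == 0)
		return false;
	return --r->refs == 0;
}
```

With a plain wrapping counter, enough increments would bring the value back to a small number, and a later decrement could free an object that is still in use. Saturation trades that use-after-free for a memory leak.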
01 Jul, 2017
3 commits
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to the absence of a _hint()
version in the refcount API. If the hint() version must
be used, we might need to revisit the API.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
-
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.
Signed-off-by: David S. Miller
30 Jun, 2017
1 commit
-
When a qdisc fails to init, qdisc_create() invokes the destroy callback
to clean up. But there is no check that the callback actually exists, so
this causes a panic for qdiscs that have no destroy callback, such as
codel, fq, and so on.
Take codel as an example:
When a malicious user constructs an invalid netlink msg, it causes
codel_init->codel_change->nla_parse_nested to fail.
The kernel then invokes the destroy callback directly, but qdisc codel
doesn't define one, which results in a panic.
Now add a check for destroy to avoid the possible panic.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Gao Feng
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
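The fix described above comes down to guarding an optional callback before calling it on the error path. A minimal userspace sketch, assuming simplified stand-in structures (all sketch_ names are hypothetical and do not mirror the real struct Qdisc_ops layout):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's qdisc structures. */
struct sketch_qdisc;

struct sketch_qdisc_ops {
	int  (*init)(struct sketch_qdisc *q);
	void (*destroy)(struct sketch_qdisc *q);	/* optional, may be NULL */
};

struct sketch_qdisc {
	const struct sketch_qdisc_ops *ops;
	int destroyed;
};

/* Returns 0 on success, or the negative error reported by init. */
static int sketch_qdisc_create(struct sketch_qdisc *q)
{
	int err = q->ops->init ? q->ops->init(q) : 0;

	if (err) {
		/* Only call destroy when the qdisc actually defines one. */
		if (q->ops->destroy)
			q->ops->destroy(q);
		return err;
	}
	return 0;
}

/* An init that always fails, standing in for a rejected netlink message. */
static int sketch_failing_init(struct sketch_qdisc *q)
{
	(void)q;
	return -EINVAL;
}

/* A destroy callback that just records that it ran. */
static void sketch_destroy(struct sketch_qdisc *q)
{
	q->destroyed = 1;
}
```

Without the NULL check, the error path would call through a NULL function pointer for any qdisc whose ops leave destroy unset.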
22 Jun, 2017
1 commit
-
In order to be able to retrieve the attached programs from cls_bpf
and act_bpf, we need to expose the prog ids via netlink so that
an application can later on get an fd based on the id through the
BPF_PROG_GET_FD_BY_ID command, and dump related prog info via
BPF_OBJ_GET_INFO_BY_FD command for bpf(2).
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
16 Jun, 2017
2 commits
-
Allow requesting a zero UDP checksum for encapsulated packets. The attribute
is named "NO_CSUM" so that a missing attribute and an attribute set to 0
have the same meaning.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
-
There's currently no way to request an (outer) UDP checksum with
act_tunnel_key. This is a problem, especially for IPv6. Right now, the
tunnel_key action with IPv6 does not work without hassle: both sides
have to have udp6zerocsumrx configured on the tunnel interface. This is
obviously not a good universal solution.
It makes more sense to compute the UDP checksum by default, even for IPv4.
Just set the default to request the checksum when using act_tunnel_key.
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
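The two tunnel_key commits in this group combine into one rule: the checksum is requested by default, and only a present, non-zero NO_CSUM attribute turns it off. A minimal sketch of that rule, assuming hypothetical flag values and a hypothetical helper (the real netlink attribute parsing is not shown):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define SKETCH_TUNNEL_CSUM 0x01u
#define SKETCH_TUNNEL_KEY  0x04u

/*
 * attr_present: whether the NO_CSUM attribute was in the request.
 * attr_value:   its value, ignored when the attribute is absent.
 */
static uint16_t sketch_tunnel_flags(bool attr_present, uint8_t attr_value)
{
	/* Checksum is requested by default. */
	uint16_t flags = SKETCH_TUNNEL_KEY | SKETCH_TUNNEL_CSUM;

	/* Only an explicit, non-zero NO_CSUM clears the request. */
	if (attr_present && attr_value)
		flags &= (uint16_t)~SKETCH_TUNNEL_CSUM;

	return flags;
}
```

Naming the attribute in the negative is what makes "attribute missing" and "attribute set to 0" produce the same flags.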
15 Jun, 2017
3 commits
-
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.
Signed-off-by: David S. Miller
-
I'm reviewing static checker warnings where we do ERR_PTR(0), which is
the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL)
here. Sometimes these bugs lead to a NULL dereference but I don't
immediately see that problem here.
Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Dan Carpenter
Acked-by: Amir Vadai
Signed-off-by: David S. Miller
-
Laura reported a sleep-in-atomic kernel warning inside
tcf_act_police_init(), which calls gen_replace_estimator() with
spinlock protection.
The spinlock is not necessary in this case: we already hold the RTNL lock
here, which is enough to protect against concurrent writers. The reader,
i.e. tcf_act_police(), makes its decision based on this
rate estimator; in the worst case we drop more or fewer packets than
necessary while the rate is changed in parallel, which is still acceptable.
Reported-by: Laura Abbott
Reported-by: Nick Huber
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
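The ERR_PTR(0) issue from the act_pedit fix in this group is easy to reproduce in userspace. The sketch below re-creates the pointer-encoded-error idea with hypothetical sketch_ helpers. It imitates the kernel macros in spirit only and assumes error values occupy the top 4095 addresses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the error range at the top of the address space. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value back out of an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers that fall inside the encoded error range. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

Encoding 0 yields a NULL pointer, which sketch_is_err() does not flag. A caller that only checks for an error pointer would then treat the result as valid, which is why returning an encoded -EINVAL is the safer choice.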
08 Jun, 2017
1 commit
-
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
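On the driver side, "only chain 0 is supported" amounts to rejecting any offload request for another chain. A minimal sketch, assuming a hypothetical request structure and entry point (real drivers receive this information through their own offload hooks, which are not reproduced here):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified offload request carrying the chain index. */
struct sketch_offload_request {
	uint32_t chain_index;
};

/* Hypothetical driver entry point: accept rules on chain 0 only. */
static int sketch_driver_setup_tc(const struct sketch_offload_request *req)
{
	/* No multichain offload yet, so refuse anything but chain 0. */
	if (req->chain_index != 0)
		return -EOPNOTSUPP;

	/* ... program the rule into hardware here ... */
	return 0;
}
```

Refusing non-zero chains keeps rules from different chains from being squashed into one hardware table.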
07 Jun, 2017
1 commit
-
There is a need to instruct the HW offloaded path to push certain matched
packets to the cpu/kernel for further analysis. So this patch introduces a
new TRAP control action to TC.
For the kernel datapath, this action does not make much sense. So with the
same logic as in HW, the new TRAP behaves similarly to STOLEN. The skb is just
dropped in the datapath (and virtually ejected to an upper level, which
does not exist in the case of the kernel).
Signed-off-by: Jiri Pirko
Reviewed-by: Yotam Gigi
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
05 Jun, 2017
2 commits
-
It really makes no sense to have cls_act enabled without cls. In that
case, the cls_act code is dead. So select it.
This also fixes an issue recently reported by kbuild robot:
[linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get'
Reported-by: kbuild test robot
Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Benefit from the support of ip header fields dissection and
allow users to set rules matching on ipv4 tos and ttl or
ipv6 traffic-class and hoplimit.
Signed-off-by: Or Gerlitz
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
26 May, 2017
2 commits
-
tcf_chain_get() always creates a new filter chain if it is not found
among the existing ones. This is totally unnecessary when we get or
delete filters; a new chain should only be created for new filters
(or new actions).
Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Cc: Jamal Hadi Salim
Cc: Jiri Pirko
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
-
With the introduction of the chain goto action, reclassification would
re-iterate the current chain. It makes more sense to restart
the whole thing and re-iterate starting from the original tp, the start
of chain 0.
Signed-off-by: Jiri Pirko
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
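The tcf_chain_get() fix in this group separates "look up a chain" from "look up or create a chain". A minimal userspace sketch of that split, assuming a hypothetical linked list of chains and a hypothetical create flag (the real function's signature and reference counting are not reproduced):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified chain and block objects. */
struct sketch_chain {
	uint32_t index;
	struct sketch_chain *next;
};

struct sketch_block {
	struct sketch_chain *chains;
};

/* Find a chain by index; allocate it only when `create` is true. */
static struct sketch_chain *sketch_chain_get(struct sketch_block *block,
					     uint32_t index, bool create)
{
	struct sketch_chain *c;

	for (c = block->chains; c; c = c->next)
		if (c->index == index)
			return c;

	/* Get and delete requests must not create chains as a side effect. */
	if (!create)
		return NULL;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->index = index;
	c->next = block->chains;
	block->chains = c;
	return c;
}
```

A get or delete request would pass create as false and report "not found", while adding a new filter or action would pass true.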
25 May, 2017
1 commit
-
Benefit from the support of tcp flags dissection and allow user to
insert rules matching on tcp flags.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
23 May, 2017
4 commits
-
When the user asks to remove all filters from a chain, we cannot destroy
the chain, as other actions may hold a reference. Also, the put in errout
would try to destroy it again. So instead, just walk the chain and remove
all existing filters.
Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Signed-off-by: Jiri Pirko
Acked-by: Cong Wang
Signed-off-by: David S. Miller
-
*p_filter_chain is rcu-dereferenced on the reader path. So here in the writer,
properly assign the pointer.
Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Since the head is guaranteed by the check above to be null, the call_rcu
would explode. Remove the previously logically dead code that was made
logically very much alive and kicking.
Fixes: 985538eee06f ("net/sched: remove redundant null check on head")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
20 May, 2017
1 commit
-
skb->csum_not_inet carries the indication on which algorithm is needed to
compute checksum on skb in the transmit path, when skb->ip_summed is equal
to CHECKSUM_PARTIAL. If skb carries an SCTP packet and crc32c hasn't yet
been written in the L4 header, skb->csum_not_inet is assigned to 1; otherwise,
assume Internet Checksum is needed and thus set skb->csum_not_inet to 0.
Suggested-by: Tom Herbert
Signed-off-by: Davide Caratti
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
18 May, 2017
11 commits
-
We still need to initialize err to -EINVAL for
the case where 'opt' is NULL in dsmark_init().
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: David S. Miller
-
Introduce a new type of termination action called "goto_chain". This allows
the user to specify a chain to be processed. This action type is
then processed as a return value in the tcf_classify loop in a similar
way to "reclassify", except that it does not reset to the first filter
in the chain but rather to the first filter of the desired chain.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Tp pointer will be needed by the next patch in order to get the chain.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Instead of having only one filter per block, introduce a list of chains
for every block. Create chain 0 by default. UAPI is extended so the user
can specify which chain to change. If the new attribute is not
specified, chain 0 is used. That maintains backward
compatibility. If the chain does not exist and the user wants to manipulate
it, a new chain is created with the specified index. Also, when the last
filter is removed from the chain, the chain is destroyed.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Since there will be multiple chains to dump, push chain dumping code to
a separate function.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Introduce a struct tcf_chain object and a set of helpers around it. They wrap
insertion, deletion and search in the filter chain.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Call the helper from the function rather than always adjusting the
return value of the function.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
The use of the "nprio" variable in tc_ctl_tfilter is a bit cryptic and leaves
a reader wondering for a while what is going on. So help readers understand
this priority allocation dance a little bit better.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Make the name consistent with the rest of the helpers around.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Currently, the filter chains are directly put into the private structures
of qdiscs. In order to be able to have multiple chains per qdisc and to
allow sharing of filter chains among qdiscs, there is a need for a common
object that would hold the chains. This introduces such an object and calls
it "tcf_block".
Helpers to get and put the blocks are provided to be called from
individual qdisc code. Also, the original filter_list pointers are left
in qdisc privs to allow the entry into tcf_block processing without any
added overhead of possible multiple pointer dereferences on the fast path.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Move tc_classify function to cls_api.c where it belongs, rename it to
fit the namespace.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
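The "goto_chain" commit in this group describes a classify loop that restarts at the first filter of another chain. A heavily simplified sketch of that control flow, assuming hypothetical verdicts, a toy exact-match filter and an arbitrary restart limit (none of these names or values come from the kernel):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; the values are illustrative only. */
enum sketch_verdict {
	SKETCH_UNMATCHED = -1,
	SKETCH_OK = 0,
	SKETCH_DROP = 1,
	SKETCH_GOTO_CHAIN = 2,
};

struct sketch_result {
	enum sketch_verdict verdict;
	uint32_t goto_index;		/* used only with SKETCH_GOTO_CHAIN */
};

/* Toy filter: matches when the packet value equals match_value. */
struct sketch_filter {
	uint32_t match_value;
	struct sketch_result result;
};

struct sketch_chain {
	const struct sketch_filter *filters;
	size_t nfilters;
};

#define SKETCH_MAX_RESTARTS 4		/* assumed guard against goto loops */

static enum sketch_verdict sketch_classify(const struct sketch_chain *chains,
					   size_t nchains, uint32_t pkt)
{
	uint32_t cur = 0;		/* classification starts at chain 0 */
	int restarts = 0;

	while (cur < nchains) {
		const struct sketch_chain *chain = &chains[cur];
		const struct sketch_filter *hit = NULL;
		size_t i;

		for (i = 0; i < chain->nfilters; i++) {
			if (chain->filters[i].match_value == pkt) {
				hit = &chain->filters[i];
				break;
			}
		}
		if (!hit)
			return SKETCH_UNMATCHED;
		if (hit->result.verdict != SKETCH_GOTO_CHAIN)
			return hit->result.verdict;
		if (++restarts > SKETCH_MAX_RESTARTS)
			return SKETCH_DROP;
		/* Restart at the first filter of the requested chain. */
		cur = hit->result.goto_index;
	}
	return SKETCH_UNMATCHED;
}
```

The restart limit is an assumption of this sketch. It is there so that a chain which jumps to itself cannot spin forever.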
17 May, 2017
1 commit
-
BBR congestion control depends on pacing, and pacing is
currently handled by the sch_fq packet scheduler for performance reasons,
and also because implementing pacing with FQ was convenient to truly
avoid bursts.
However, there are many cases where this packet scheduler constraint
is not practical.
- Many linux hosts are not focusing on handling thousands of TCP
flows in the most efficient way.
- Some routers use fq_codel or other AQM, but still would like
to use BBR for the few TCP flows they initiate/terminate.
This patch implements an automatic fallback to internal pacing.
Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option.
If sch_fq happens to be in the egress path, pacing is delegated to
the qdisc, otherwise pacing is done by TCP itself.
One advantage of pacing from TCP stack is to get more precise rtt
estimations, and less work done from TX completion, since TCP Small
queue limits are not generally hit. Setups with single TX queue but
many cpus might even benefit from this.
Note that unlike sch_fq, we do not take into account header sizes.
Taking care of these headers would add additional complexity for
no practical differences in behavior.
Some performance numbers using 800 TCP_STREAM flows rate limited to
~48 Mbit per second on 40Gbit NIC.
If MQ+pfifo_fast is used on the NIC :
$ sar -n DEV 1 5 | grep eth
14:48:44 eth0 725743.00 2932134.00 46776.76 4335184.68 0.00 0.00 1.00
14:48:45 eth0 725349.00 2932112.00 46751.86 4335158.90 0.00 0.00 0.00
14:48:46 eth0 725101.00 2931153.00 46735.07 4333748.63 0.00 0.00 0.00
14:48:47 eth0 725099.00 2931161.00 46735.11 4333760.44 0.00 0.00 1.00
14:48:48 eth0 725160.00 2931731.00 46738.88 4334606.07 0.00 0.00 0.00
Average: eth0 725290.40 2931658.20 46747.54 4334491.74 0.00 0.00 0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 0 259825920 45644 2708324 0 0 21 2 247 98 0 0 100 0 0
4 0 0 259823744 45644 2708356 0 0 0 0 2400825 159843 0 19 81 0 0
0 0 0 259824208 45644 2708072 0 0 0 0 2407351 159929 0 19 81 0 0
1 0 0 259824592 45644 2708128 0 0 0 0 2405183 160386 0 19 80 0 0
1 0 0 259824272 45644 2707868 0 0 0 32 2396361 158037 0 19 81 0 0
Now use MQ+FQ :
lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc
lpaa23:~# tc qdisc replace dev eth0 root mq
$ sar -n DEV 1 5 | grep eth
14:49:57 eth0 678614.00 2727930.00 43739.13 4033279.14 0.00 0.00 0.00
14:49:58 eth0 677620.00 2723971.00 43674.69 4027429.62 0.00 0.00 1.00
14:49:59 eth0 676396.00 2719050.00 43596.83 4020125.02 0.00 0.00 0.00
14:50:00 eth0 675197.00 2714173.00 43518.62 4012938.90 0.00 0.00 1.00
14:50:01 eth0 676388.00 2719063.00 43595.47 4020171.64 0.00 0.00 0.00
Average: eth0 676843.00 2720837.40 43624.95 4022788.86 0.00 0.00 0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 259832240 46008 2710912 0 0 21 2 223 192 0 1 99 0 0
1 0 0 259832896 46008 2710744 0 0 0 0 1702206 198078 0 17 82 0 0
0 0 0 259830272 46008 2710596 0 0 0 0 1696340 197756 1 17 83 0 0
4 0 0 259829168 46024 2710584 0 0 16 0 1688472 197158 1 17 82 0 0
3 0 0 259830224 46024 2710408 0 0 0 0 1692450 197212 0 18 82 0 0
As expected, number of interrupts per second is very different.
Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Cc: Neal Cardwell
Cc: Yuchung Cheng
Cc: Van Jacobson
Cc: Jerry Chu
Signed-off-by: David S. Miller
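The core arithmetic behind pacing is the gap to leave after each transmission so that the average rate matches the requested one. A minimal sketch, assuming the rate is given in bytes per second and, as the commit notes for the TCP-internal case, that only the payload length is counted (the function name and constant are hypothetical):

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Time to wait, in nanoseconds, after sending `len` payload bytes
 * at a pacing rate of `rate_bytes_per_sec`. Header sizes are ignored.
 */
static uint64_t sketch_pacing_delay_ns(uint32_t len,
				       uint64_t rate_bytes_per_sec)
{
	if (rate_bytes_per_sec == 0)
		return 0;	/* no pacing requested */

	return (uint64_t)len * SKETCH_NSEC_PER_SEC / rate_bytes_per_sec;
}
```

For a flow limited to 48 Mbit per second (6,000,000 bytes per second), sending 1500 bytes gives a gap of 250,000 ns, so roughly 4000 such packets go out per second.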
12 May, 2017
1 commit
-
In commit 59cc1f61f09c ("net: sched: convert qdisc linked list to
hashtable") we missed the opportunity to considerably speed up
tc_dump_tclass_root() if a qdisc handle is provided by user.
Instead of iterating all the qdiscs, use qdisc_match_from_root()
to directly get the one we look for.
Signed-off-by: Eric Dumazet
Cc: Jiri Kosina
Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Signed-off-by: David S. Miller
09 May, 2017
2 commits
-
fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc with
vmalloc fallback. Use the kvmalloc variant instead. Keep the
__GFP_REPEAT flag based on explanation from Eric:
"At the time, tests on the hardware I had in my labs showed that
vmalloc() could deliver pages spread all over the memory and that was
a small penalty (once memory is fragmented enough, not at boot time)"
The way how the code is constructed means, however, that we prefer to go
and hit the OOM killer before we fall back to the vmalloc for requests
Acked-by: Vlastimil Babka
Cc: Eric Dumazet
Cc: David Miller
Cc: Shakeel Butt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There are many code paths opencoding kvmalloc. Let's use the helper
instead. The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator. E.g.
allocation requests
Reviewed-by: Boris Ostrovsky # Xen bits
Acked-by: Kees Cook
Acked-by: Vlastimil Babka
Acked-by: Andreas Dilger # Lustre
Acked-by: Christian Borntraeger # KVM/s390
Acked-by: Dan Williams # nvdim
Acked-by: David Sterba # btrfs
Acked-by: Ilya Dryomov # Ceph
Acked-by: Tariq Toukan # mlx4
Acked-by: Leon Romanovsky # mlx5
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Herbert Xu
Cc: Anton Vorontsov
Cc: Colin Cross
Cc: Tony Luck
Cc: "Rafael J. Wysocki"
Cc: Ben Skeggs
Cc: Kent Overstreet
Cc: Santosh Raspatur
Cc: Hariprasad S
Cc: Yishai Hadas
Cc: Oleg Drokin
Cc: "Yan, Zheng"
Cc: Alexander Viro
Cc: Alexei Starovoitov
Cc: Eric Dumazet
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2017
1 commit
-
head is previously null checked and so the 2nd null check on head
is redundant and therefore can be removed.
Detected by CoverityScan, CID#1399505 ("Logically dead code")
Signed-off-by: Colin Ian King
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
03 May, 2017
1 commit
-
Jump is now the only one using value action opcode. This is going to
change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.
This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
bit check, not a value check.
Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
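The difference between a bit check and a value check matters as soon as more than one opcode shares the upper bits of the action word. A minimal sketch, assuming a hypothetical layout with the opcode in the top 4 bits and a hypothetical second opcode (the sketch_ macros are illustrative and are not the UAPI definitions):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: opcode in the top 4 bits, value in the low 28 bits. */
#define SKETCH_EXT_SHIFT	28
#define SKETCH_EXT(op)		((uint32_t)(op) << SKETCH_EXT_SHIFT)
#define SKETCH_EXT_VAL_MASK	((1u << SKETCH_EXT_SHIFT) - 1)

#define SKETCH_ACT_JUMP		SKETCH_EXT(1)
#define SKETCH_ACT_OTHER	SKETCH_EXT(3)	/* hypothetical second opcode */

/* Buggy form: true for any opcode that happens to have the JUMP bit set. */
static bool sketch_is_jump_bitcheck(uint32_t action)
{
	return (action & SKETCH_ACT_JUMP) != 0;
}

/* Value check: mask off the value bits and compare the whole opcode. */
static bool sketch_is_jump_valuecheck(uint32_t action)
{
	return (action & ~SKETCH_EXT_VAL_MASK) == SKETCH_ACT_JUMP;
}
```

With these definitions, an action built from SKETCH_ACT_OTHER passes the bit check because opcode 3 shares a bit with opcode 1, while the value check rejects it.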