Eric Lee / smarc-fsl-linux-kernel

07 Dec, 2019

1 commit

9f104c773 mqprio: Fix out-of-bounds access in mqprio_dump ... Browse Code »

When user runs a command like
tc qdisc add dev eth1 root mqprio
KASAN stack-out-of-bounds warning is emitted.
Currently, NLA_ALIGN macro used in mqprio_dump provides too large
buffer size as argument for nla_put and memcpy down the call stack.
The flow looks like this:
1. nla_put expects exact object size as an argument;
2. Later it provides this size to memcpy;
3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
macro itself.

Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
Otherwise it will lead to out-of-bounds memory access in memcpy.

Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Signed-off-by: Vladyslav Tarasiuk
Signed-off-by: David S. Miller

Vladyslav Tarasiuk
2019-12-07 03:58:45 +0800

04 Dec, 2019

1 commit

2f23cd42e net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues ... Browse Code »

sch->q.len hasn't been set if the subqueue is a NOLOCK qdisc
in mq_dump() and mqprio_dump().

Fixes: ce679e8df7ed ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
Signed-off-by: Dust Li
Signed-off-by: Tony Lu
Signed-off-by: David S. Miller

Dust Li
2019-12-04 03:53:55 +0800

01 Dec, 2019

1 commit

14e54ab91 net: sched: fix `tc -s class show` no bstats on class with nolock subqueues ... Browse Code »

When a classful qdisc's child qdisc has set the flag
TCQ_F_CPUSTATS (pfifo_fast for example), the child qdisc's
cpu_bstats should be passed to gnet_stats_copy_basic(),
but many classful qdisc didn't do that. As a result,
`tc -s class show dev DEV` always return 0 for bytes and
packets in this case.

Pass the child qdisc's cpu_bstats to gnet_stats_copy_basic()
to fix this issue.

The qstats also has this problem, but it has been fixed
in 5dd431b6b9 ("net: sched: introduce and use qstats read...")
and bstats still remains buggy.

Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
Signed-off-by: Dust Li
Signed-off-by: Tony Lu
Acked-by: Cong Wang
Signed-off-by: David S. Miller

Dust Li
2019-12-01 02:38:40 +0800

19 Jun, 2019

1 commit

d2912cb15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 ... Browse Code »

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Enrico Weigelt
Reviewed-by: Kate Stewart
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-06-19 23:09:55 +0800

28 Apr, 2019

2 commits

8cb081746 netlink: make validation more configurable for future strictness ... Browse Code »

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2019-04-28 05:07:21 +0800
ae0be8de9 netlink: make nla_nest_start() add NLA_F_NESTED flag ... Browse Code »

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek
Acked-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller

Michal Kubecek
2019-04-28 05:03:44 +0800

02 Apr, 2019

1 commit

5dd431b6b net: sched: introduce and use qstats read helpers ... Browse Code »

Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.

Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.

This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.

DRR needs also to access the child qdisc queue length, so it
needs custom handling.

Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2019-04-02 05:50:13 +0800

26 Sep, 2018

1 commit

86bd446b5 net: sched: rename qdisc_destroy() to qdisc_put() ... Browse Code »

Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Vlad Buslov
2018-09-26 11:17:35 +0800

22 Dec, 2017

3 commits

a38a98821 net: sch: api: add extack support in qdisc_create_dflt ... Browse Code »

This patch adds extack support for the function qdisc_create_dflt which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why qdisc_create_dflt failed. The function qdisc_create_dflt
will also call an init callback which can fail by any per-qdisc specific
handling.

Cc: David Ahern
Acked-by: Jamal Hadi Salim
Signed-off-by: Alexander Aring
Signed-off-by: David S. Miller

Alexander Aring
2017-12-22 01:32:51 +0800
653d6fd68 net: sched: sch: add extack for graft callback ... Browse Code »

This patch adds extack support for graft callback to prepare per-qdisc
specific changes for extack.

Cc: David Ahern
Acked-by: Jamal Hadi Salim
Signed-off-by: Alexander Aring
Signed-off-by: David S. Miller

Alexander Aring
2017-12-22 01:32:50 +0800
e63d7dfd2 net: sched: sch: add extack for init callback ... Browse Code »

This patch adds extack support for init callback to prepare per-qdisc
specific changes for extack.

Cc: David Ahern
Acked-by: Jamal Hadi Salim
Signed-off-by: Alexander Aring
Signed-off-by: David S. Miller

Alexander Aring
2017-12-22 01:32:50 +0800

09 Dec, 2017

1 commit

ce679e8df net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio ... Browse Code »

The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2017-12-09 02:32:26 +0800

08 Nov, 2017

1 commit

575ed7d39 net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO ... Browse Code »

Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.

Signed-off-by: Nogah Frankel
Signed-off-by: Jiri Pirko
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller

Nogah Frankel
2017-11-08 11:23:38 +0800

28 Oct, 2017

1 commit

0f7787b41 net/sched: Add select_queue() class_ops for mqprio ... Browse Code »

When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.

Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.

Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoid an inconsistent state when child qdiscs are replaced.

Signed-off-by: Jesus Sanchez-Palencia
Tested-by: Henrik Austad
Signed-off-by: Jeff Kirsher

Jesus Sanchez-Palencia
2017-10-28 00:41:50 +0800

19 Oct, 2017

1 commit

22ce97fe4 mqprio: fix potential null pointer dereference on opt ... Browse Code »

The pointer opt has a null check however before for this check opt is
dereferenced when len is initialized, hence we potentially have a null
pointer deference on opt. Avoid this by checking for a null opt before
dereferencing it.

Detected by CoverityScan, CID#1458234 ("Dereference before null check")

Fixes: 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller

Colin Ian King
2017-10-19 20:13:14 +0800

17 Oct, 2017

1 commit

32302902f mqprio: Reserve last 32 classid values for HW traffic classes and misc IDs ... Browse Code »

This patch makes a slight tweak to mqprio in order to bring the
classid values used back in line with what is used for mq. The general idea
is to reserve values :ffe0 - :ffef to identify hardware traffic classes
normally reported via dev->num_tc. By doing this we can maintain a
consistent behavior with mq for classid where :1 - :ffdf will represent a
physical qdisc mapped onto a Tx queue represented by classid - 1, and the
traffic classes will be mapped onto a known subset of classid values
reserved for our virtual qdiscs.

Note I reserved the range from :fff0 - :ffff since this way we might be
able to reuse these classid values with clsact and ingress which would mean
that for mq, mqprio, ingress, and clsact we should be able to maintain a
similar classid layout.

Signed-off-by: Alexander Duyck
Tested-by: Jesus Sanchez-Palencia
Signed-off-by: David S. Miller

Alexander Duyck
2017-10-17 03:53:23 +0800

14 Oct, 2017

1 commit

4e8b86c06 mqprio: Introduce new hardware offload mode and shaper in mqprio ... Browse Code »

The offload types currently supported in mqprio are 0 (no offload) and
1 (offload only TCs) by setting these values for the 'hw' option. If
offloads are supported by setting the 'hw' option to 1, the default
offload mode is 'dcb' where only the TC values are offloaded to the
device. This patch introduces a new hardware offload mode called
'channel' with 'hw' set to 1 in mqprio which makes full use of the
mqprio options, the TCs, the queue configurations and the QoS parameters
for the TCs. This is achieved through a new netlink attribute for the
'mode' option which takes values such as 'dcb' (default) and 'channel'.
The 'channel' mode also supports QoS attributes for traffic class such as
minimum and maximum values for bandwidth rate limits.

This patch enables configuring additional HW shaper attributes associated
with a traffic class. Currently the shaper for bandwidth rate limiting is
supported which takes options such as minimum and maximum bandwidth rates
and are offloaded to the hardware in the 'channel' mode. The min and max
limits for bandwidth rates are provided by the user along with the TCs
and the queue configurations when creating the mqprio qdisc. The interface
can be extended to support new HW shapers in future through the 'shaper'
attribute.

Introduces a new data structure 'tc_mqprio_qopt_offload' for offloading
mqprio queue options and use this to be shared between the kernel and
device driver. This contains a copy of the existing data structure
for mqprio queue options. This new data structure can be extended when
adding new attributes for traffic class such as mode, shaper, shaper
parameters (bandwidth rate limits). The existing data structure for mqprio
queue options will be shared between the kernel and userspace.

Example:
queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit

To dump the bandwidth rates:

qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
queues:(0:3) (4:7)
mode:channel
shaper:bw_rlimit min_rate:1Gbit 2Gbit max_rate:4Gbit 5Gbit

Signed-off-by: Amritha Nambiar
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher

Amritha Nambiar
2017-10-14 04:23:35 +0800

26 Aug, 2017

1 commit

143976ce9 net_sched: remove tc class reference counting ... Browse Code »

For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:

1) For class modification and dumping paths, we already hold RTNL lock,
so all of these ->get(),->change(),->put() are atomic.

2) For filter bindiing/unbinding, we use other reference counter than
this one, and they should have RTNL lock too.

3) For ->qlen_notify(), it is special because it is called on ->enqueue()
path, but we already hold qdisc tree lock there, and we hold this
tree lock when graft or delete the class too, so it should not be gone
or changed until we release the tree lock.

Therefore, this patch removes ->get() and ->put(), but:

1) Adds a new ->find() to find the pointer to a class by classid, no
refcnt.

2) Move the original class destroy upon the last refcnt into ->delete(),
right after releasing tree lock. This is fine because the class is
already removed from hash when holding the lock.

For those who also use ->put() as ->unbind(), just rename them to reflect
this change.

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2017-08-26 08:19:10 +0800

08 Aug, 2017

3 commits

de4784ca0 net: sched: get rid of struct tc_to_netdev ... Browse Code »

Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.

Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jiri Pirko
2017-08-08 00:42:37 +0800
5fd9fc4e2 net: sched: push cls related args into cls_common structure ... Browse Code »

As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.

Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jiri Pirko
2017-08-08 00:42:37 +0800
2572ac53c net: sched: make type an argument for ndo_setup_tc ... Browse Code »

Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.

Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jiri Pirko
2017-08-08 00:42:35 +0800

08 Jun, 2017

1 commit

a5fcf8a6c net: propagate tc filter chain index down the ndo_setup_tc call ... Browse Code »

We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Jiri Pirko
2017-06-08 21:55:53 +0800

16 Mar, 2017

2 commits

56f36acd2 mqprio: Modify mqprio to pass user parameters via ndo_setup_tc. ... Browse Code »

The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.

This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.

Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.

Signed-off-by: Amritha Nambiar
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Amritha Nambiar
2017-03-16 06:20:27 +0800
2026fecf5 mqprio: Change handling of hw u8 to allow for multiple hardware offload modes ... Browse Code »

This patch is meant to allow for support of multiple hardware offload type
for a single device. There is currently no bounds checking for the hw
member of the mqprio_qopt structure. This results in us being able to pass
values from 1 to 255 with all being treated the same. On retreiving the
value it is returned as 1 for anything 1 or greater being set.

With this change we are currently adding limited bounds checking by
defining an enum and using those values to limit the reported hardware
offloads.

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2017-03-16 06:20:27 +0800

13 Mar, 2017

1 commit

49b499718 net: sched: make default fifo qdiscs appear in the dump ... Browse Code »

The original reason [1] for having hidden qdiscs (potential scalability
issues in qdisc_match_from_root() with single linked list in case of large
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too
intrusive / unnecessary change of default behavior towards userspace.
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
applications to request complete qdisc hierarchy dump, including the
ones that have always been implicit/invisible.

Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
about singletons would require quite some surgery with very little gain
(seeing no qdisc or seeing noop qdisc in the dump is probably setting
the same user expectation).

[1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com
[2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.net

Signed-off-by: Jiri Kosina
Signed-off-by: David S. Miller

Jiri Kosina
2017-03-13 13:53:02 +0800

12 Feb, 2017

1 commit

87b60cfac net_sched: fix error recovery at qdisc creation ... Browse Code »

Dmitry reported uses after free in qdisc code [1]

The problem here is that ops->init() can return an error.

qdisc_create_dflt() then call ops->destroy(),
while qdisc_create() does _not_ call it.

Four qdisc chose to call their own ops->destroy(), assuming their caller
would not.

This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdisc to avoid double free.

[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
Call Trace:
[] __dump_stack lib/dump_stack.c:15 [inline]
[] dump_stack+0x6c/0x98 lib/dump_stack.c:51
[] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[] print_address_description mm/kasan/report.c:196 [inline]
[] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
[] kasan_report mm/kasan/report.c:305 [inline]
[] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
[] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
[] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
[] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
[] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
[] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
[] __dev_open+0x221/0x320 net/core/dev.c:1403
[] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
[] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
[] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
[] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
[] sock_do_ioctl+0x99/0xb0 net/socket.c:879
[] sock_ioctl+0x2a0/0x390 net/socket.c:958
[] vfs_ioctl fs/ioctl.c:44 [inline]
[] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
[] SYSC_ioctl fs/ioctl.c:626 [inline]
[] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
[] entry_SYSCALL_64_fastpath+0x12/0x17

Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Signed-off-by: David S. Miller

Eric Dumazet
2017-02-12 10:38:58 +0800

11 Aug, 2016

1 commit

59cc1f61f net: sched: convert qdisc linked list to hashtable ... Browse Code »

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

Reviewed-by: Cong Wang
Signed-off-by: Jiri Kosina
Signed-off-by: David S. Miller

Jiri Kosina
2016-08-11 08:19:02 +0800

08 Jun, 2016

1 commit

edb09eb17 net: sched: do not acquire qdisc spinlock in qdisc/class stats dump ... Browse Code »

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Cc: John Fastabend
Cc: Kevin Athey
Cc: Xiaotian Pei
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-08 07:37:14 +0800

04 Mar, 2016

1 commit

1f27cde31 net: sched: use pfifo_fast for non real queues ... Browse Code »

Some devices declare a high number of TX queues, then set a much
lower real_num_tx_queues

This cause setups using fq_codel, sfq or fq as the default qdisc to consume
more memory than really needed.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-03-04 06:38:46 +0800

02 Mar, 2016

1 commit

241deec94 sch_mqprio: Fix build with older gcc. ... Browse Code »

CC [M] net/sched/sch_mqprio.o
net/sched/sch_mqprio.c: In function ?mqprio_init?:
net/sched/sch_mqprio.c:145: error: unknown field ?tc? specified in initializer
net/sched/sch_mqprio.c:145: warning: missing braces around initializer
net/sched/sch_mqprio.c:145: warning: (near initialization for ?tc.?)
make[2]: *** [net/sched/sch_mqprio.o] Error 1
make[1]: *** [net/sched] Error 2
make: *** [net] Error 2

Several people reported this, surround the unnamed union
member initialization with braces to fix.

Signed-off-by: David S. Miller

David S. Miller
2016-03-02 06:44:59 +0800

17 Feb, 2016

2 commits

16e5cc647 net: rework setup_tc ndo op to consume general tc operand ... Browse Code »

This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

John Fastabend
2016-02-17 22:47:35 +0800
e4c6734ea net: rework ndo tc op to consume additional qdisc handle parameter ... Browse Code »

The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri
CC: Shradha Shah
CC: Or Gerlitz
CC: Ariel Elior
CC: Jeff Kirsher
CC: Bruce Allan
CC: Jesse Brandeburg
CC: Don Skidmore
Signed-off-by: John Fastabend
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

John Fastabend
2016-02-17 22:47:35 +0800

04 Dec, 2015

1 commit

4eaf3b84f net_sched: fix qdisc_tree_decrease_qlen() races ... Browse Code »

qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[ 681.774821] PAX: sch->q.qlen: 0 n: 1
[ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1
[ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[ 681.774962] Call Trace:
[ 681.774967] [] dump_stack+0x4c/0x7f
[ 681.774970] [] report_size_overflow+0x34/0x50
[ 681.774972] [] qdisc_tree_decrease_qlen+0x152/0x160
[ 681.774976] [] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[ 681.774978] [] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[ 681.774980] [] __qdisc_run+0x4d/0x1d0
[ 681.774983] [] net_tx_action+0xc2/0x160
[ 681.774985] [] __do_softirq+0xf1/0x200
[ 681.774987] [] run_ksoftirqd+0x1e/0x30
[ 681.774989] [] smpboot_thread_fn+0x150/0x260
[ 681.774991] [] ? sort_range+0x40/0x40
[ 681.774992] [] kthread+0xe4/0x100
[ 681.774994] [] ? kthread_worker_fn+0x170/0x170
[ 681.774995] [] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.

Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.

Reported-by: Daniele Fucini
Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Signed-off-by: David S. Miller

Eric Dumazet
2015-12-04 03:59:05 +0800

30 Sep, 2014

3 commits

b0ab6f927 net: sched: enable per cpu qstats ... Browse Code »

After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2014-09-30 13:02:26 +0800
640158536 net: sched: restrict use of qstats qlen ... Browse Code »

This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2014-09-30 13:02:26 +0800
22e0f8b93 net: sched: make bstats per cpu and estimator RCU safe ... Browse Code »

In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2014-09-30 13:02:26 +0800

14 Sep, 2014

1 commit

46e5da40a net: qdisc: use rcu prefix and silence sparse warnings ... Browse Code »

Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

John Fastabend
2014-09-14 00:30:25 +0800

10 Dec, 2013

1 commit

95dc19299 pkt_sched: give visibility to mq slave qdiscs ... Browse Code »

Commit 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
added the ability to change default qdisc from pfifo_fast to say fq

But as most modern ethernet devices are multiqueue, we cant really
see all the statistics from "tc -s qdisc show", as the default root
qdisc is mq.

This patch adds the calls to qdisc_list_add() to mq and mqprio

Signed-off-by: Eric Dumazet
Cc: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2013-12-10 08:54:47 +0800

31 Aug, 2013

1 commit

6da7c8fcb qdisc: allow setting default queuing discipline ... Browse Code »

By default, the pfifo_fast queue discipline has been used by default
for all devices. But we have better choices now.

This patch allow setting the default queueing discipline with sysctl.
This allows easy use of better queueing disciplines on all devices
without having to use tc qdisc scripts. It is intended to allow
an easy path for distributions to make fq_codel or sfq the default
qdisc.

This patch also makes pfifo_fast more of a first class qdisc, since
it is now possible to manually override the default and explicitly
use pfifo_fast. The behavior for systems who do not use the sysctl
is unchanged, they still get pfifo_fast

Also removes leftover random # in sysctl net core.

Signed-off-by: Stephen Hemminger
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

stephen hemminger
2013-08-31 12:32:32 +0800

12 Dec, 2012

1 commit

1abbe1394 pkt_sched: avoid requeues if possible ... Browse Code »

With BQL being deployed, we can more likely have following behavior :

We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.

This shows in stats (tc -s qdisc dev eth0) as requeues.

Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.

At 1Gbps speed, a full size TSO packet is 500 us of extra latency.

In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :

- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs

This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.

This indeed reduce latencies for high prio packets (or improve fairness
with sfq/fq_codel), and almost remove qdisc 'requeues'.

Signed-off-by: Eric Dumazet
Cc: Jamal Hadi Salim
Cc: John Fastabend
Signed-off-by: David S. Miller

Eric Dumazet
2012-12-12 13:16:47 +0800