03 Mar, 2019
1 commit
-
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.

But pfifo_fast can be used as a leaf in classful qdiscs, and existing
logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated
  when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non-starter.

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdiscs but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.

For 'lockless' qdiscs (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might not be held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue()).

This patch adds atomic_qlen (in the same location as qlen)
and renames the following helpers, since we want to express
that they can be used without the qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these
qlen uses and replacing them with a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would
be used when pfifo_fast is a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet
Cc: John Fastabend
Cc: Jamal Hadi Salim
Cc: Cong Wang
Cc: Jiri Pirko
Signed-off-by: David S. Miller
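
The split between lock-protected and lockless queue-length updates described in this commit can be sketched in userspace C11. This is a minimal model under stated assumptions, not the kernel code: struct toy_qdisc, the lockless flag, and the toy_qlen_*() helpers are hypothetical names, and C11 atomic_int stands in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a qdisc queue length.
 * Both variants share one storage location, as the commit describes. */
struct toy_qdisc {
	bool lockless;                  /* true for a pfifo_fast-like qdisc */
	union {
		int qlen;               /* updated with the qdisc lock held */
		atomic_int atomic_qlen; /* updated without the qdisc lock */
	};
};

static void toy_qdisc_init(struct toy_qdisc *q, bool lockless)
{
	q->lockless = lockless;
	if (lockless)
		atomic_init(&q->atomic_qlen, 0);
	else
		q->qlen = 0;
}

/* Enqueue side: atomic increment when no lock protects the counter. */
static void toy_qlen_inc(struct toy_qdisc *q)
{
	if (q->lockless)
		atomic_fetch_add(&q->atomic_qlen, 1);
	else
		q->qlen++; /* assumption: the caller holds the qdisc lock */
}

/* Dequeue side: mirror of toy_qlen_inc(). */
static void toy_qlen_dec(struct toy_qdisc *q)
{
	if (q->lockless)
		atomic_fetch_sub(&q->atomic_qlen, 1);
	else
		q->qlen--; /* assumption: the caller holds the qdisc lock */
}

/* A parent qdisc reads one central value, with no walk over all cpus. */
static int toy_qlen(struct toy_qdisc *q)
{
	return q->lockless ? atomic_load(&q->atomic_qlen) : q->qlen;
}
```

The point of the sketch is the read side: a parent such as HTB gets the child's length in constant time, whichever update path the child uses.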
29 Sep, 2018
1 commit
-
Fixes the following sparse warning:
net/core/gen_stats.c:166:1: warning:
symbol '___gnet_stats_copy_basic' was not declared. Should it be static?

Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
Signed-off-by: Wei Yongjun
Acked-by: Eelco Chaudron
Signed-off-by: David S. Miller
25 Sep, 2018
1 commit
-
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can
be used to count packets/bytes processed by hardware offload.

Signed-off-by: Eelco Chaudron
Signed-off-by: David S. Miller
04 Jul, 2018
1 commit
-
The gen_stats facility will add a header for the toplevel nlattr of type
TCA_STATS2 that contains all stats added by qdisc callbacks. A reference
to this header is stored in the gnet_dump struct, and when all the
per-qdisc callbacks have finished adding their stats, the length of the
containing header will be adjusted to the right value.

However, on architectures that need padding (i.e., that don't set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added
before the stats, which means that the stored pointer will point to the
padding, and so when the header is fixed up, the result is just a very
big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS
struct, this problem has been mostly invisible, but we exposed it with
the netlink attribute-based statistics in CAKE.

Fix the issue by fixing up the stored pointer if it points to a padding
nlattr.

Tested-by: Pete Heist
Tested-by: Kevin Darbyshire-Bryant
Signed-off-by: Toke Høiland-Jørgensen
Signed-off-by: David S. Miller
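
The pointer fix-up described in this commit can be sketched with a toy attribute buffer. This is a simplified userspace model, not the gen_stats code: the toy_* names, the attribute type values, and the 4-byte slot layout are assumptions made for illustration. The apply_fix argument exists only so both behaviours can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_ATTR_PAD = 1, TOY_ATTR_STATS2 = 2, TOY_ATTR_DATA = 3 };
#define TOY_SLOTS 32

/* Attribute header: total length in bytes (header + payload) and type. */
struct toy_attr {
	uint16_t len;
	uint16_t type;
};

/* Message buffer made of 4-byte slots; headers and payload share them. */
struct toy_msg {
	struct toy_attr slot[TOY_SLOTS];
	size_t used; /* number of slots in use */
};

struct toy_dump {
	struct toy_msg *msg;
	struct toy_attr *tail; /* container header whose length is fixed up last */
};

static size_t toy_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Append one attribute, rounding its size up to whole slots. */
static void toy_put(struct toy_msg *m, uint16_t type, const void *data, size_t dlen)
{
	size_t nslots = toy_align(sizeof(struct toy_attr) + dlen) / sizeof(struct toy_attr);
	struct toy_attr *a;

	assert(m->used + nslots <= TOY_SLOTS);
	a = &m->slot[m->used];
	a->len = (uint16_t)(sizeof(*a) + dlen);
	a->type = type;
	if (dlen)
		memcpy(a + 1, data, dlen);
	m->used += nslots;
}

/* Open the container. With need_pad set, a padding attribute lands first,
 * so the remembered pointer initially refers to the padding. */
static void toy_start_copy(struct toy_dump *d, struct toy_msg *m, int need_pad, int apply_fix)
{
	d->msg = m;
	d->tail = &m->slot[m->used];
	if (need_pad)
		toy_put(m, TOY_ATTR_PAD, NULL, 0);
	toy_put(m, TOY_ATTR_STATS2, NULL, 0);

	/* The fix: step over a leading padding attribute. */
	if (apply_fix && d->tail->type == TOY_ATTR_PAD)
		d->tail += toy_align(d->tail->len) / sizeof(*d->tail);
}

/* Close the container by writing its final length into the remembered header. */
static void toy_finish_copy(struct toy_dump *d)
{
	struct toy_attr *end = &d->msg->slot[d->msg->used];

	d->tail->len = (uint16_t)((size_t)(end - d->tail) * sizeof(*d->tail));
}
```

Without the fix-up, the final length is written into the padding attribute, which reproduces the "very big padding nlattr" from the commit message.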
09 Dec, 2017
1 commit
-
The sch_mq qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case, add a check when calculating
stats and aggregate the per cpu stats if needed.

Also exports __gnet_stats_copy_queue() to use as a helper function.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
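
The aggregation check described in this commit can be sketched as follows. This is a toy userspace model: struct toy_child, TOY_NR_CPUS, and the field names are hypothetical, and a fixed array stands in for real per cpu storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4 /* assumption: a small fixed cpu count for the sketch */

struct toy_qstats {
	uint32_t qlen;
	uint32_t backlog;
	uint32_t drops;
};

/* A sub-qdisc keeps either one shared copy of its stats or one copy per cpu. */
struct toy_child {
	int percpu;                                 /* nonzero: use cpu_qstats[] */
	struct toy_qstats qstats;                   /* single copy */
	struct toy_qstats cpu_qstats[TOY_NR_CPUS];  /* one copy per cpu */
};

static void toy_add(struct toy_qstats *sum, const struct toy_qstats *s)
{
	sum->qlen += s->qlen;
	sum->backlog += s->backlog;
	sum->drops += s->drops;
}

/* Sum the stats of all children into the "master" view, checking each
 * child for per cpu stats and folding those in cpu by cpu. */
static struct toy_qstats toy_aggregate(const struct toy_child *kids, size_t n)
{
	struct toy_qstats sum = { 0, 0, 0 };
	size_t i;
	int cpu;

	for (i = 0; i < n; i++) {
		if (kids[i].percpu) {
			for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
				toy_add(&sum, &kids[i].cpu_qstats[cpu]);
		} else {
			toy_add(&sum, &kids[i].qstats);
		}
	}
	return sum;
}
```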
06 Dec, 2016
1 commit
-
1) Old code was hard to maintain, due to complex lock chains.
   (We probably will be able to remove some kfree_rcu() in callers)

2) Using a single timer to update all estimators does not scale.

3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
   is not supposed to work well)

In this rewrite:

- I removed the RB tree that had to be scanned in
  gen_estimator_active(). qdisc dumps should be much faster.

- Each estimator has its own timer.

- Estimations are maintained in net_rate_estimator structure,
  instead of dirtying the qdisc. Minor, but part of the simplification.

- Reading the estimator uses RCU and a seqcount to provide proper
  support for 32bit kernels.

- We reduce memory need when estimators are not used, since
  we store a pointer, instead of the bytes/packets counters.

- xt_rateest_mt() no longer has to grab a spinlock.
  (In the future, xt_rateest_tg() could be switched to per cpu counters)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Jun, 2016
1 commit
-
Conflicts:
net/sched/act_police.c
net/sched/sch_drr.c
net/sched/sch_hfsc.c
net/sched/sch_prio.c
net/sched/sch_red.c
net/sched/sch_tbf.c

In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.

A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.

Signed-off-by: David S. Miller
09 Jun, 2016
2 commits
-
"make htmldocs" complains otherwise:
.//net/core/gen_stats.c:168: warning: No description found for parameter 'running'
.//include/linux/netdevice.h:1867: warning: No description found for parameter 'qdisc_running_key'

Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet
Reported-by: kbuild test robot
Signed-off-by: David S. Miller
-
"make htmldocs" complains otherwise:
.//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr'
.//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr'

Fixes: 9854518ea04d ("sched: align nlattr properly when needed")
Signed-off-by: Eric Dumazet
Reported-by: kbuild test robot
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller
08 Jun, 2016
1 commit
-
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale:

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only do the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet
Cc: Cong Wang
Cc: Jamal Hadi Salim
Cc: John Fastabend
Cc: Kevin Athey
Cc: Xiaotian Pei
Signed-off-by: David S. Miller
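
The seqcount read pattern mentioned in this commit can be sketched in userspace C11. This is an illustrative model, not the kernel seqcount API: the toy_* names are hypothetical, a single writer is assumed (standing in for the owner of the qdisc "running" state), and default sequentially consistent atomics are used for simplicity rather than speed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Counters guarded by a sequence number: odd means "update in progress". */
struct toy_bstats {
	atomic_uint seq;
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

struct toy_snapshot {
	uint64_t bytes;
	uint64_t packets;
};

/* Writer side. Assumption: only one writer runs at a time. */
static void toy_bstats_update(struct toy_bstats *b, uint64_t len)
{
	atomic_fetch_add(&b->seq, 1); /* becomes odd */
	atomic_store(&b->bytes, atomic_load(&b->bytes) + len);
	atomic_store(&b->packets, atomic_load(&b->packets) + 1);
	atomic_fetch_add(&b->seq, 1); /* becomes even again */
}

/* Reader side: takes no lock, retries if a writer was active or intervened,
 * so bytes and packets always come from the same update. */
static struct toy_snapshot toy_bstats_read(struct toy_bstats *b)
{
	struct toy_snapshot snap;
	unsigned int start;

	do {
		start = atomic_load(&b->seq);
		snap.bytes = atomic_load(&b->bytes);
		snap.packets = atomic_load(&b->packets);
	} while ((start & 1) || start != atomic_load(&b->seq));

	return snap;
}
```

The reader never blocks the writer, which is the property the dump path needs when the fast path is under pressure.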
27 Apr, 2016
1 commit
-
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
21 Mar, 2016
1 commit
-
Function gnet_stats_copy_basic is missing the description of the cpu
argument in the documentation. Adding it.

Signed-off-by: Luis de Bethencourt
Signed-off-by: David S. Miller
20 Feb, 2015
1 commit
-
The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.

[xiyou.wangcong@gmail.com: remove a useless kfree()]
Signed-off-by: Ignacy Gawędzki
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
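
The lifetime problem fixed by this commit can be sketched in userspace C. This is a hedged model, not the gen_stats code: struct toy_dump, struct toy_app_stats, and the toy_* functions are hypothetical, malloc()/free() stand in for kernel allocation, and the dump structure is assumed to start zero-initialized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_app_stats {
	uint32_t drops;
	uint32_t marks;
};

struct toy_dump {
	void *xstats;          /* heap copy owned by the dump until finish */
	size_t xstats_len;
	unsigned char out[64]; /* stands in for the outgoing message */
	size_t out_len;
};

/* Duplicate the caller's data instead of remembering the caller's pointer. */
static int toy_copy_app(struct toy_dump *d, const void *st, size_t len)
{
	void *copy;

	if (len == 0 || len > sizeof(d->out))
		return -1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, st, len);
	free(d->xstats); /* drop any earlier copy */
	d->xstats = copy;
	d->xstats_len = len;
	return 0;
}

/* Runs later, after the caller's stack frame is gone. */
static int toy_finish_copy(struct toy_dump *d)
{
	if (!d->xstats)
		return -1;
	memcpy(d->out, d->xstats, d->xstats_len);
	d->out_len = d->xstats_len;
	free(d->xstats); /* freed after use */
	d->xstats = NULL;
	d->xstats_len = 0;
	return 0;
}

/* Typical caller: the stats live in an automatic variable. */
static int toy_fill(struct toy_dump *d)
{
	struct toy_app_stats st = { .drops = 7, .marks = 3 };

	return toy_copy_app(d, &st, sizeof(st));
} /* st is out of scope here; only the heap copy is still valid */
```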
07 Oct, 2014
1 commit
-
Probably not a big deal, but we'd better just use the
one we get in retry loop.

Fixes: commit 22e0f8b9322cb1a48b1357e8 ("net: sched: make bstats per cpu and estimator RCU safe")
Reported-by: Joe Perches
Cc: John Fastabend
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
30 Sep, 2014
3 commits
-
After previous patches to simplify qstats, the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passing to user space. By
handling it explicitly we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks, which is not
the case for most qdiscs in the near future, only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next, because estimators use the bstats to calculate packets per
second and bytes per second, the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
06 Sep, 2014
1 commit
-
This patch fixes spelling typos found in DocBook/networking.xml.
Because networking.xml is generated from comments
in the source, I have to fix the typos in comments within the source.

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: David S. Miller
11 Jun, 2013
1 commit
-
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.

Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.

This structure is dumped to user space as a new attribute:
TCA_STATS_RATE_EST64

Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.

Old tc command output, after patch:
eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
rate 34360Mbit 189696pps backlog 0b 0p requeues 0

This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.Signed-off-by: Eric Dumazet
Cc: Ben Hutchings
Signed-off-by: David S. Miller
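
The 34360Mbit figure follows from the width of the legacy field: a u32 holds at most 4294967295 bytes per second, which is 34359738360 bits per second. A small sketch of capping versus wrapping, using hypothetical helper names (toy_legacy_bps() and toy_bps_to_mbit() are not kernel functions):

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit bytes-per-second rate into the legacy 32-bit field
 * instead of letting the value wrap. */
static uint32_t toy_legacy_bps(uint64_t bps64)
{
	return bps64 > UINT32_MAX ? UINT32_MAX : (uint32_t)bps64;
}

/* Convert bytes per second to Mbit per second (1 Mbit = 1000000 bits),
 * rounded to the nearest integer. */
static uint64_t toy_bps_to_mbit(uint64_t bps)
{
	return (bps * 8 + 500000) / 1000000;
}
```

For a true rate of 5000000000 bytes per second (40000 Mbit), a plain truncation to u32 would report 705032704 bytes per second, about 5640 Mbit, while the capped value reports the 34360 Mbit ceiling.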
02 Apr, 2012
1 commit
-
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller
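
The hazard of a macro with a hidden goto can be sketched as follows. TOY_PUT() and the toy_* functions are hypothetical stand-ins written for this illustration, not the macros the commit removes; the sketch only contrasts a jump hidden inside a macro with the same control flow spelled out at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_buf {
	char data[8];
	size_t used;
};

/* Append len bytes, or fail without touching the buffer. */
static int toy_put(struct toy_buf *b, const void *src, size_t len)
{
	if (len > sizeof(b->data) - b->used)
		return -1;
	memcpy(b->data + b->used, src, len);
	b->used += len;
	return 0;
}

/* Hidden control flow: the jump to put_failure is invisible at the call site,
 * and the macro only compiles where such a label happens to exist. */
#define TOY_PUT(b, src, len) \
	do { if (toy_put((b), (src), (len)) < 0) goto put_failure; } while (0)

static int toy_fill_macro(struct toy_buf *b)
{
	TOY_PUT(b, "abcd", 4);
	TOY_PUT(b, "efghij", 6); /* does not fit: jumps without saying so */
	return 0;

put_failure:
	return -1;
}

/* Explicit version: every exit path is visible where it happens. */
static int toy_fill_explicit(struct toy_buf *b)
{
	if (toy_put(b, "abcd", 4) ||
	    toy_put(b, "efghij", 6))
		goto put_failure;
	return 0;

put_failure:
	return -1;
}
```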
13 Jul, 2010
1 commit
-
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
07 Oct, 2009
1 commit
-
Jarek Poplawski wrote:
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night!
Thanks again
[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active).

After this patch, tc command output is:
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if the check is not necessary (htb for
example has a mandatory rate estimator).

Signed-off-by: Eric Dumazet
Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
18 Aug, 2009
1 commit
-
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.

Restoring old behavior is not welcome, for performance reasons.
The fix is to use a private structure for the kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using the legacy structure (struct gnet_stats_basic).

Based on a report and initial patch from Michael Spang.
Reported-by: Michael Spang
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
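
The conversion between a private kernel layout and the legacy user layout can be sketched as follows. The struct and function names are hypothetical, the packed layout for the kernel side is an assumption made for illustration, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Legacy layout as seen by user space: natural alignment, so the
 * structure may carry tail padding after the 32-bit field. */
struct toy_basic_user {
	uint64_t bytes;
	uint32_t packets;
};

/* Private in-kernel layout (assumption: packed to save space). */
struct toy_basic_kernel {
	uint64_t bytes;
	uint32_t packets;
} __attribute__((packed));

/* Convert field by field so the user-visible layout never changes,
 * whatever the kernel does with its private structure. */
static struct toy_basic_user toy_to_user(const struct toy_basic_kernel *k)
{
	struct toy_basic_user u;

	memset(&u, 0, sizeof(u)); /* do not leak stale padding bytes */
	u.bytes = k->bytes;
	u.packets = k->packets;
	return u;
}
```

Copying the kernel structure to user space byte for byte would tie the ABI to the private layout, which is the incompatibility the commit describes.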
29 Jan, 2008
2 commits
-
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow separate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
-
Add __acquires() and __releases() annotations to suppress some sparse
warnings.

Example of warnings:
net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
26 Apr, 2007
1 commit
-
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller
11 Feb, 2007
1 commit
-
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
17 Apr, 2005
1 commit
-
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!