27 Jan, 2015
2 commits
-
When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually,
altering of specific filters is addressed via the major id, but in
case of collisions we end up with a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for the head->hgen
handle in cls_bpf_get(), and in case we find a free handle, we're
supposed to use exactly head->hgen. In case of an insufficient number
of handles, we bail out later as handle id 0 is not allowed (see the
sketch below).

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
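A minimal sketch of the corrected allocation described above (loop bound and helpers follow the existing cls_bpf code; exact details may differ):

    static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
                                       struct cls_bpf_head *head)
    {
        unsigned int i = 0x80000000;
        u32 handle;

        do {
            if (++head->hgen == 0x7FFFFFFF)
                head->hgen = 1;
        } while (--i > 0 && cls_bpf_get(tp, head->hgen));

        if (unlikely(i == 0)) {
            pr_err("Insufficient number of handles\n");
            handle = 0;          /* handle id 0 is rejected later */
        } else {
            handle = head->hgen; /* the probed free handle, not a stale value */
        }

        return handle;
    }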
-
In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block of that size, copy over the
BPF instruction blob from user space, and then pass everything through
the classic BPF checker prior to installation of the classifier.

We should reject mismatches here; there are two scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected by the verifier or a valid BPF program will be run;
in the other case, we end up copying more than we're supposed to, and
most likely the trailing garbage will be rejected by the verifier as
well (i.e. it needs to fit the instruction pattern, ret {A,K} needs to
be the last instruction, loads/stores must be correct, etc); if not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len(), as Dave noted, to avoid this from the beginning.
Anyway, let's fix it by rejecting such load attempts (see the sketch
below).

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
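A sketch of the rejection in cls_bpf_modify_existing(); the attribute names (TCA_BPF_OPS_LEN, TCA_BPF_OPS) follow the existing cls_bpf netlink interface, other details are simplified:

    bpf_num_ops = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
    if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0)
        return -EINVAL;

    bpf_size = bpf_num_ops * sizeof(*bpf_ops);
    if (bpf_size != nla_len(tb[TCA_BPF_OPS]))
        return -EINVAL;   /* announced count must match the blob length */

    bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
    if (bpf_ops == NULL)
        return -ENOMEM;

    memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);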
11 Dec, 2014
1 commit
-
Conflicts:
drivers/net/ethernet/amd/xgbe/xgbe-desc.c
drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.
Signed-off-by: David S. Miller
10 Dec, 2014
5 commits
-
To cancel nesting, this function is more convenient.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
commit 46e5da40ae ("net: qdisc: use rcu prefix and silence
sparse warnings") triggers a spurious warning:

net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage!

The code should be using the _bh variant of rcu_dereference (see the
sketch below).
Signed-off-by: Valdis Kletnieks
Acked-by: Eric Dumazet
Acked-by: John Fastabend
Signed-off-by: David S. Miller
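For context, the kind of change this implies in the fq_codel classify path (abridged sketch):

    /* before: trips lockdep, no rcu_read_lock() is held here */
    filter = rcu_dereference(q->filter_list);

    /* after: enqueue/classify runs in softirq (BH) context */
    filter = rcu_dereference_bh(q->filter_list);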
-
Signed-off-by: Jiri Pirko
Reviewed-by: John Fastabend
Signed-off-by: David S. Miller
-
It is never called and implementations are void. So just remove it.
Signed-off-by: Jiri Pirko
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Signed-off-by: Andrew Shewmaker
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
09 Dec, 2014
6 commits
-
Since head->handle == handle (checked before), just assign handle.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The rcu variant is not correct here. The code is called by the updater
(rtnl lock is held), not by a reader (no rcu_read_lock is held); see
the sketch below.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
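The pattern being fixed, sketched; the pointer being dereferenced is illustrative, the point is the accessor choice:

    /* updater path: protection is the RTNL lock, not rcu_read_lock() */
    ASSERT_RTNL();

    /* before: rcu_dereference() warns without rcu_read_lock() */
    head = rcu_dereference(tp->root);

    /* after: document that RTNL is what protects the access */
    head = rtnl_dereference(tp->root);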
-
The rcu variant is not correct here. The code is called by the updater
(rtnl lock is held), not by a reader (no rcu_read_lock is held).

Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
27 Nov, 2014
1 commit
-
FQ/pacing has a clamp on delay of 125 ms, to avoid some possible harm.
It turns out this delay is too small to allow pacing low rates: some
ISPs set up very aggressive policers, as low as 16kbit (at that rate, a
single 1500-byte packet alone takes roughly 750 ms to serialize, well
above the 125 ms clamp).

Now that the TCP stack has spurious rtx prevention, it seems safe to
increase this fixed parameter, without adding a qdisc attribute.

Signed-off-by: Eric Dumazet
Cc: Yang Yingliang
Signed-off-by: David S. Miller
22 Nov, 2014
1 commit
-
This tc action allows working with vlan tagged skbs. The two supported
sub-actions are header pop and header push (see the sketch below).

Signed-off-by: Jiri Pirko
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
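A sketch of the core of the action; the struct fields are illustrative, skb_vlan_push()/skb_vlan_pop() are the common helpers such an action builds on:

    switch (v->tcfv_action) {
    case TCA_VLAN_ACT_POP:
        err = skb_vlan_pop(skb);
        break;
    case TCA_VLAN_ACT_PUSH:
        err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid);
        break;
    }
    if (err)
        return TC_ACT_SHOT;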
07 Nov, 2014
1 commit
-
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
04 Nov, 2014
1 commit
-
Yaogong replaces the TCP out-of-order receive queue with an RB tree.
As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', let's do this in a cleaner way (see the
sketch below).

Signed-off-by: Eric Dumazet
Cc: Yaogong Wang
Signed-off-by: David S. Miller
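Roughly, the resulting layout in struct sk_buff (members in this area differ between kernel versions; sketch only):

    union {
        struct {
            /* These two members must be first. */
            struct sk_buff    *next;
            struct sk_buff    *prev;
            union {
                ktime_t           tstamp;
                struct skb_mstamp skb_mstamp;
            };
        };
        struct rb_node rbnode; /* used by netem, usable for the TCP ooo queue */
    };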
30 Oct, 2014
1 commit
-
Cc: Vijay Subramanian
Cc: David S. Miller
Signed-off-by: Cong Wang
Acked-by: Eric Dumazet
22 Oct, 2014
1 commit
-
Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize the
syncp (see the sketch below).

Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
Signed-off-by: Sabrina Dubroca
Acked-by: Cong Wang
Signed-off-by: David S. Miller
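A sketch of the shape of the fix: netdev_alloc_pcpu_stats() both allocates the per-cpu structs and runs u64_stats_init() on each CPU's syncp, which a bare alloc_percpu() does not:

    /* before: per-cpu syncp left uninitialized */
    sch->cpu_bstats = alloc_percpu(struct gnet_stats_basic_cpu);

    /* after: allocate and initialize each per-cpu syncp */
    sch->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
    if (!sch->cpu_bstats)
        return -ENOMEM;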
10 Oct, 2014
1 commit
-
Restore the quota fairness between qdiscs that we broke with commit
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with
TCQ_F_ONETXQUEUE").

Before that commit, the quota in __qdisc_run() was in packets, as
dequeue_skb() would only dequeue a single packet; that assumption
broke with bulk dequeue.

We choose not to account for the number of packets inside the TSO/GSO
packets (accessible via "skb_gso_segs"), as the previous fairness
also had this "defect". Thus, GSO/TSO packets count as a single
packet (see the sketch below).

Furthermore, we choose to slack on accuracy, by allowing a bulk
dequeue in try_bulk_dequeue_skb() to exceed the "packets" limit,
limited only by the BQL byte limit. This is done because BQL prefers
to get its full budget for appropriate feedback from TX completion.

In the future, we might consider reworking this further and, if it
allows, switch to a time-based model, as suggested by Eric. Right now,
we only restore the old semantics.

Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the
first patch in cooperation with Daniel and Jesper. Eric rewrote the
patch.

Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
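A simplified sketch of the restored accounting: qdisc_restart() reports how many packets the (possibly bulk) dequeue sent, and __qdisc_run() charges them all against the quota:

    void __qdisc_run(struct Qdisc *q)
    {
        int quota = weight_p;
        int packets;

        while (qdisc_restart(q, &packets)) {
            quota -= packets;   /* bulk dequeues are counted per packet again */
            if (quota <= 0 || need_resched()) {
                __netif_schedule(q);
                break;
            }
        }

        qdisc_run_end(q);
    }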
09 Oct, 2014
2 commits
-
We need to copy exts->type when committing the change, otherwise
it would always be 0. This is a quick fix for -net and -stable;
for net-next, tcf_exts will be removed.

Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
08 Oct, 2014
1 commit
-
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.
Dropping the dst in validate_xmit_skb() is certainly too late in case
the packet was queued by cpu X but dequeued by cpu Y.

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking the qdisc lock.

As Julian Anastasov pointed out, the need for skb_dst() might come from
some packet schedulers or classifiers.

This patch adds a new helper to cleanly express the needs of various
drivers or qdiscs/classifiers. Drivers that need skb_dst() in their
ndo_start_xmit() should call the following helper in their setup
instead of the prior:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
    netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers. The other one is permanent
and blocks IFF_XMIT_DST_RELEASE from being rebuilt in bonding/team.
Eventually, we could add something smarter later (see the sketch of
the helper below).

Signed-off-by: Eric Dumazet
Cc: Julian Anastasov
Signed-off-by: David S. Miller
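The helper then boils down to clearing both bits (sketch; the permanent bit is assumed to be named IFF_XMIT_DST_RELEASE_PERM):

    static inline void netif_keep_dst(struct net_device *dev)
    {
        /* clear the rebuildable bit and the permanent one, so that
         * bonding/team cannot re-enable dst release on this device
         */
        dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
    }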
07 Oct, 2014
3 commits
-
Using the tcf_proto pointer 'tp' from inside the classifiers callback
is not valid because it may have been cleaned up by another call_rcu
occurring on another CPU.

'tp' is currently being used by tcf_unbind_filter(); in this patch we
move instances of tcf_unbind_filter outside of the call_rcu() context
(see the sketch below). This is safe to do because any running
schedulers will either read the valid class field or it will be zeroed.

And all schedulers today, when the class is 0, do a lookup using the
same call used by tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer, it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the scheduler call sites.

Reported-by: Cong Wang
Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
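The shape of the change for an individual classifier, sketched (the rcu callback name is illustrative):

    /* unbind synchronously, while tp is known to be valid ... */
    tcf_unbind_filter(tp, &f->res);

    /* ... and let the rcu callback only free memory */
    call_rcu(&f->rcu, cls_filter_free_rcu);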
-
It is not RCU safe to destroy the action chain while there
is a possibility of readers accessing it. Move this code
into the rcu callback, using the same rcu callback used in the
code path that makes a change to head.

Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
-
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths, especially when we need to tear down the ematch
from an RCU callback. In this case we cannot guarantee that the
tcf_proto structure is still valid.

Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
06 Oct, 2014
1 commit
-
The standard qdisc API to set up a timer implies an atomic operation on
every packet dequeue: qdisc_unthrottled().

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of which can be throttled while others are not.

The fix is straightforward: add a 'bool throttle' parameter to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq (see the sketch below).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
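The resulting interface, per the description above; the sch_fq call site is illustrative:

    void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires,
                                    bool throttle);

    /* sch_fq: arm the timer for the earliest delayed flow without
     * marking the whole qdisc as throttled
     */
    qdisc_watchdog_schedule_ns(&q->watchdog, q->time_next_delayed_flow, false);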
05 Oct, 2014
2 commits
-
The result of a negated container has to be inverted before checking
for early ending (see the sketch below).

This fixes my previous attempt (17c9c8232663a47f074b7452b9b034efda868ca7)
to make inverted containers work correctly.

Signed-off-by: Ignacy Gawędzki
Signed-off-by: David S. Miller
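The intended order of operations inside the ematch tree-walking loop, sketched; apart from tcf_em_is_inverted(), the helper names are illustrative:

    res = match->ops->match(skb, match, info);
    if (tcf_em_is_inverted(match))
        res = !res;                     /* invert the container result first */
    if (tcf_em_early_end(match, res))   /* only then decide on early ending */
        break;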
-
Suspicious RCU usage in qdisc_watchdog: the call needs to be done
inside rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations
need to ensure the timer is cancelled before removing the qdisc
structure.

[ 3992.191339] ===============================
[ 3992.191340] [ INFO: suspicious RCU usage. ]
[ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
[ 3992.191345] -------------------------------
[ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
[ 3992.191348]
[ 3992.191348] other info that might help us debug this:
[ 3992.191348]
[ 3992.191351]
[ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
[ 3992.191353] no locks held by swapper/1/0.
[ 3992.191355]
[ 3992.191355] stack backtrace:
[ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
[ 3992.191360] Hardware name: /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
[ 3992.191362] 0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
[ 3992.191366] ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
[ 3992.191370] ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
[ 3992.191374] Call Trace:
[ 3992.191376] [] dump_stack+0x4e/0x68
[ 3992.191387] [] lockdep_rcu_suspicious+0xe6/0x130
[ 3992.191392] [] qdisc_watchdog+0x8a/0xb0
[ 3992.191396] [] __run_hrtimer+0x72/0x420
[ 3992.191399] [] ? hrtimer_interrupt+0x7d/0x240
[ 3992.191403] [] ? tc_classify+0xc0/0xc0
[ 3992.191406] [] hrtimer_interrupt+0xff/0x240
[ 3992.191410] [] ? __atomic_notifier_call_chain+0x5/0x140
[ 3992.191415] [] local_apic_timer_interrupt+0x3b/0x60
[ 3992.191419] [] smp_apic_timer_interrupt+0x45/0x60
[ 3992.191422] [] apic_timer_interrupt+0x6f/0x80
[ 3992.191424] [] ? cpuidle_enter_state+0x73/0x2e0
[ 3992.191432] [] ? cpuidle_enter_state+0x6e/0x2e0
[ 3992.191437] [] cpuidle_enter+0x17/0x20
[ 3992.191441] [] cpu_startup_entry+0x3d1/0x4a0
[ 3992.191445] [] ? clockevents_config_and_register+0x26/0x30
[ 3992.191448] [] start_secondary+0x1b6/0x260

Fixes: b26b0d1e8b1 ("net: qdisc: use rcu prefix and silence sparse warnings")
Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
04 Oct, 2014
3 commits
-
Validation of an skb can be pretty expensive: GSO segmentation and/or
checksum computations.

We can do this without holding the qdisc lock, so that other cpus
can queue additional packets.

The trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one (see the sketch below).

Tested on a 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only a few more cpu
cycles. Lock contention on the qdisc lock disappeared.

Same if disabling TX checksum offload.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
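A sketch of the carried flag, simplified from the dequeue/transmit path:

    bool validate;

    /* a fresh dequeue is not validated yet; a requeued list comes
     * back with validate == false since it was validated already
     */
    skb = dequeue_skb(q, &validate);
    if (skb)
        sch_direct_xmit(skb, q, dev, txq, root_lock, validate);

    /* inside sch_direct_xmit(), outside the qdisc lock */
    if (likely(validate))
        skb = validate_xmit_skb_list(skb, dev);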
-
The TSO and GSO segmented packets already benefit from bulking
on their own.

The TSO packets have always taken advantage of only updating the
tailptr once for a large packet.

The GSO segmented packets have recently taken advantage of the
bulking xmit_more API, via merge commit 53fda7f7f9e8 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a4 ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of the remaining list, and via commit
ce93718fb7cd ("net: Don't keep around original SKB when we
software segment GSO frames.").

This patch allows further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.

Testing:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSOs shows no performance regressions
(requeues were in the area of 32 requeues/sec).

Bulking several GSOs does show a small regression or a very small
improvement (requeues were in the area of 8000 requeues/sec).

Using ixgbe at 10Gbit/s with GSO bulking, we can measure some additional
latency. The base case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms and 0.47ms. Bulking several GSOs
together results in a stable high-prio queue delay of 0.50ms.

Using igb at 100Mbit/s with GSO bulking shows an improvement. The base
case sees varying high-prio queue delay between 2.23ms and 2.35ms.

Signed-off-by: David S. Miller
-
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb() (see the sketch below).

The optimization principle for this is twofold: (1) to amortize
locking cost and (2) to avoid expensive tailptr updates for notifying HW.
(1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packets. The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
(2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr updates, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ. This patch takes the easy way out, by restricting bulk
dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
the qdisc only has a single TXQ attached.

Some detail about the flow: dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()). In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit(). In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the driver's BQL
limits. In effect, bulking will only happen for BQL enabled drivers.

Small amounts of extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicates that something, like
sched latency, might be causing this effect. More comparisons
show that this oscillation goes away occasionally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets. They already benefit from bulking on their own.
A followup patch adds this, to allow easier bisect-ability for finding
regressions.

Joint work with Hannes, Daniel and Florian.
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
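A sketch of the bulk dequeue loop bounded by BQL, close to the try_bulk_dequeue_skb() shape described above (details simplified):

    static void try_bulk_dequeue_skb(struct Qdisc *q, struct sk_buff *skb,
                                     const struct netdev_queue *txq)
    {
        /* remaining BQL budget for this TXQ, minus the first skb */
        int bytelimit = qdisc_avail_bulklimit(txq) - skb->len;

        while (bytelimit > 0) {
            struct sk_buff *nskb = q->dequeue(q);

            if (!nskb)
                break;

            bytelimit -= nskb->len;   /* may overshoot, BQL is the backstop */
            skb->next = nskb;
            skb = nskb;
        }
        skb->next = NULL;
    }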
03 Oct, 2014
1 commit
-
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller
02 Oct, 2014
2 commits
-
This fixes the following crash:
[ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[ 63.980094] RIP: 0010:[] [] u32_destroy_key+0x27/0x6d
[ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
[ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[ 63.980094] Stack:
[ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[ 63.980094] Call Trace:
[ 63.980094] [] u32_delete_key_freepf_rcu+0x1b/0x1d
[ 63.980094] [] rcu_process_callbacks+0x3bb/0x691
[ 63.980094] [] ? rcu_process_callbacks+0x2c1/0x691
[ 63.980094] [] ? u32_destroy_key+0x6d/0x6d
[ 63.980094] [] __do_softirq+0x142/0x323
[ 63.980094] [] run_ksoftirqd+0x23/0x53
[ 63.980094] [] smpboot_thread_fn+0x203/0x221
[ 63.980094] [] ? smpboot_unpark_thread+0x33/0x33
[ 63.980094] [] kthread+0xc9/0xd1
[ 63.980094] [] ? do_wait_for_common+0xf8/0x125
[ 63.980094] [] ? __kthread_parkme+0x61/0x61
[ 63.980094] [] ret_from_fork+0x7c/0xb0
[ 63.980094] [] ? __kthread_parkme+0x61/0x61

tp could be freed in the call_rcu callback too; the order is not guaranteed.
John Fastabend says:
====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: John Fastabend
Signed-off-by: David S. Miller
-
This patch fixes the following crash:
[ 166.670795] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 166.674230] IP: [] __list_del_entry+0x5c/0x98
[ 166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0
[ 166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642
[ 166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000
[ 166.674230] RIP: 0010:[] [] __list_del_entry+0x5c/0x98
[ 166.674230] RSP: 0018:ffff8800cae7f7d0 EFLAGS: 00010207
[ 166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700
[ 166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700
[ 166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001
[ 166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8
[ 166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700
[ 166.674230] FS: 00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 166.674230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0
[ 166.674230] Stack:
[ 166.674230] ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828
[ 166.674230] ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8
[ 166.674230] 0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848
[ 166.674230] Call Trace:
[ 166.674230] [] list_del+0xd/0x2b
[ 166.674230] [] tcf_action_destroy+0x4c/0x71
[ 166.674230] [] tcf_exts_destroy+0x20/0x2d
[ 166.674230] [] tcindex_delete+0x196/0x1b7

struct list_head cannot simply be copied; we should always init it
(see the sketch below).
Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: John Fastabend
Signed-off-by: David S. Miller
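The underlying rule, sketched; the structs are illustrative, the point is that a byte-copied list_head still points at the old node's neighbours:

    /* a struct copy duplicates the prev/next pointers of the source */
    new->exts = old->exts;

    /* so the copied head must be re-initialized before it is used */
    INIT_LIST_HEAD(&new->exts.actions);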
30 Sep, 2014
4 commits
-
After previous patches to simplify qstats, the qstats can be
made per cpu with a packed union in the Qdisc struct.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
This removes the use of the qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passing it to user space. By
handling it explicitly we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely,
but qstats is a user space ABI and can't be broken. A future
patch could make an internal-only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128 bits instead
of 128+32.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
This adds helpers to manipulate the qstats logic and replaces locations
that touch the counters directly (see the sketch below). This
simplifies future patches that push qstats onto per cpu counters.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
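A couple of the helpers this refers to, sketched (the per-cpu variants come with the follow-up patches):

    static inline void qdisc_qstats_drop(struct Qdisc *sch)
    {
        sch->qstats.drops++;
    }

    static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
                                                const struct sk_buff *skb)
    {
        sch->qstats.backlog -= qdisc_pkt_len(skb);
    }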
-
In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks, which is not
the case for most qdiscs in the near future, only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag (see the sketch below).

Next, because estimators use the bstats to calculate packets per
second and bytes per second, the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
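A sketch of how transmit accounting then looks for a lockless qdisc, assuming the TCQ_F_CPUSTATS flag and per-cpu fields described above:

    if (sch->flags & TCQ_F_CPUSTATS) {
        struct gnet_stats_basic_cpu *b = this_cpu_ptr(sch->cpu_bstats);

        u64_stats_update_begin(&b->syncp);
        bstats_update(&b->bstats, skb);
        u64_stats_update_end(&b->syncp);
    } else {
        bstats_update(&sch->bstats, skb);
    }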