27 Jan, 2015
2 commits
-
When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually,
altering of specific filters is addressed via the major id, but in
case of collisions we end up with a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for the head->hgen
handle in cls_bpf_get(), and in case we find a free handle, we're
supposed to use exactly head->hgen. In case of an insufficient number
of handles, we bail out later as handle id 0 is not allowed (see the
sketch below).

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
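A minimal sketch of the corrected allocation described above (loop bound and helpers follow the existing cls_bpf code; exact details may differ):

    static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
                                       struct cls_bpf_head *head)
    {
        unsigned int i = 0x80000000;
        u32 handle;

        do {
            if (++head->hgen == 0x7FFFFFFF)
                head->hgen = 1;
        } while (--i > 0 && cls_bpf_get(tp, head->hgen));

        if (unlikely(i == 0)) {
            pr_err("Insufficient number of handles\n");
            handle = 0;          /* handle id 0 is rejected later */
        } else {
            handle = head->hgen; /* the probed free handle, not a stale value */
        }

        return handle;
    }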
-
In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block of that size, copy over the
BPF instruction blob from user space, and then pass everything through
the classic BPF checker prior to installation of the classifier.

We should reject mismatches here; there are two scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected by the verifier or a valid BPF program will be run;
in the other case, we end up copying more than we're supposed to, and
most likely the trailing garbage will be rejected by the verifier as
well (i.e. it needs to fit the instruction pattern, ret {A,K} needs to
be the last instruction, loads/stores must be correct, etc); if not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len(), as Dave noted, to avoid this from the beginning.
Anyway, let's fix it by rejecting such load attempts (see the sketch
below).

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
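A sketch of the rejection in cls_bpf_modify_existing(); the attribute names (TCA_BPF_OPS_LEN, TCA_BPF_OPS) follow the existing cls_bpf netlink interface, other details are simplified:

    bpf_num_ops = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
    if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0)
        return -EINVAL;

    bpf_size = bpf_num_ops * sizeof(*bpf_ops);
    if (bpf_size != nla_len(tb[TCA_BPF_OPS]))
        return -EINVAL;   /* announced count must match the blob length */

    bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
    if (bpf_ops == NULL)
        return -ENOMEM;

    memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);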
11 Dec, 2014
1 commit
-
Conflicts:
drivers/net/ethernet/amd/xgbe/xgbe-desc.c
drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.
Signed-off-by: David S. Miller
10 Dec, 2014
5 commits
-
To cancel nesting, this function is more convenient.
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
commit 46e5da40ae ("net: qdisc: use rcu prefix and silence
sparse warnings") triggers a spurious warning:

net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage!

The code should be using the _bh variant of rcu_dereference (see the
sketch below).
Signed-off-by: Valdis Kletnieks
Acked-by: Eric Dumazet
Acked-by: John Fastabend
Signed-off-by: David S. Miller
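For context, the kind of change this implies in the fq_codel classify path (abridged sketch):

    /* before: trips lockdep, no rcu_read_lock() is held here */
    filter = rcu_dereference(q->filter_list);

    /* after: enqueue/classify runs in softirq (BH) context */
    filter = rcu_dereference_bh(q->filter_list);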
-
Signed-off-by: Jiri Pirko
Reviewed-by: John Fastabend
Signed-off-by: David S. Miller
-
It is never called and implementations are void. So just remove it.
Signed-off-by: Jiri Pirko
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Signed-off-by: Andrew Shewmaker
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
09 Dec, 2014
6 commits
-
Since head->handle == handle (checked before), just assign handle.
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The rcu variant is not correct here. The code is called by the updater
(rtnl lock is held), not by a reader (no rcu_read_lock is held); see
the sketch below.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
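The pattern being fixed, sketched; the pointer being dereferenced is illustrative, the point is the accessor choice:

    /* updater path: protection is the RTNL lock, not rcu_read_lock() */
    ASSERT_RTNL();

    /* before: rcu_dereference() warns without rcu_read_lock() */
    head = rcu_dereference(tp->root);

    /* after: document that RTNL is what protects the access */
    head = rtnl_dereference(tp->root);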
-
The rcu variant is not correct here. The code is called by the updater
(rtnl lock is held), not by a reader (no rcu_read_lock is held).

Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
Signed-off-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
27 Nov, 2014
1 commit
-
FQ/pacing has a clamp on delay of 125 ms, to avoid some possible harm.
It turns out this delay is too small to allow pacing low rates: some
ISPs set up very aggressive policers, as low as 16kbit (at that rate, a
single 1500-byte packet alone takes roughly 750 ms to serialize, well
above the 125 ms clamp).

Now that the TCP stack has spurious rtx prevention, it seems safe to
increase this fixed parameter, without adding a qdisc attribute.

Signed-off-by: Eric Dumazet
Cc: Yang Yingliang
Signed-off-by: David S. Miller
22 Nov, 2014
1 commit
-
This tc action allows working with vlan tagged skbs. The two supported
sub-actions are header pop and header push (see the sketch below).

Signed-off-by: Jiri Pirko
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
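A sketch of the core of the action; the struct fields are illustrative, skb_vlan_push()/skb_vlan_pop() are the common helpers such an action builds on:

    switch (v->tcfv_action) {
    case TCA_VLAN_ACT_POP:
        err = skb_vlan_pop(skb);
        break;
    case TCA_VLAN_ACT_PUSH:
        err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid);
        break;
    }
    if (err)
        return TC_ACT_SHOT;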
07 Nov, 2014
1 commit
-
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
04 Nov, 2014
1 commit
-
Yaogong replaces the TCP out-of-order receive queue with an RB tree.
As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', let's do this in a cleaner way (see the
sketch below).

Signed-off-by: Eric Dumazet
Cc: Yaogong Wang
Signed-off-by: David S. Miller
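Roughly, the resulting layout in struct sk_buff (members in this area differ between kernel versions; sketch only):

    union {
        struct {
            /* These two members must be first. */
            struct sk_buff    *next;
            struct sk_buff    *prev;
            union {
                ktime_t           tstamp;
                struct skb_mstamp skb_mstamp;
            };
        };
        struct rb_node rbnode; /* used by netem, usable for the TCP ooo queue */
    };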
30 Oct, 2014
1 commit
-
Cc: Vijay Subramanian
Cc: David S. Miller
Signed-off-by: Cong Wang
Acked-by: Eric Dumazet
22 Oct, 2014
1 commit
-
Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize the
syncp (see the sketch below).

Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
Signed-off-by: Sabrina Dubroca
Acked-by: Cong Wang
Signed-off-by: David S. Miller
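A sketch of the shape of the fix: netdev_alloc_pcpu_stats() both allocates the per-cpu structs and runs u64_stats_init() on each CPU's syncp, which a bare alloc_percpu() does not:

    /* before: per-cpu syncp left uninitialized */
    sch->cpu_bstats = alloc_percpu(struct gnet_stats_basic_cpu);

    /* after: allocate and initialize each per-cpu syncp */
    sch->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
    if (!sch->cpu_bstats)
        return -ENOMEM;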
10 Oct, 2014
1 commit
-
Restore the quota fairness between qdiscs that we broke with commit
5772e9a346 ("qdisc: bulk dequeue support for qdiscs with
TCQ_F_ONETXQUEUE").

Before that commit, the quota in __qdisc_run() was in packets, as
dequeue_skb() would only dequeue a single packet; that assumption
broke with bulk dequeue.

We choose not to account for the number of packets inside the TSO/GSO
packets (accessible via "skb_gso_segs"), as the previous fairness
also had this "defect". Thus, GSO/TSO packets count as a single
packet (see the sketch below).

Furthermore, we choose to slack on accuracy, by allowing a bulk
dequeue in try_bulk_dequeue_skb() to exceed the "packets" limit,
limited only by the BQL byte limit. This is done because BQL prefers
to get its full budget for appropriate feedback from TX completion.

In the future, we might consider reworking this further and, if it
allows, switch to a time-based model, as suggested by Eric. Right now,
we only restore the old semantics.

Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the
first patch in cooperation with Daniel and Jesper. Eric rewrote the
patch.

Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
Signed-off-by: Eric Dumazet
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
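A simplified sketch of the restored accounting: qdisc_restart() reports how many packets the (possibly bulk) dequeue sent, and __qdisc_run() charges them all against the quota:

    void __qdisc_run(struct Qdisc *q)
    {
        int quota = weight_p;
        int packets;

        while (qdisc_restart(q, &packets)) {
            quota -= packets;   /* bulk dequeues are counted per packet again */
            if (quota <= 0 || need_resched()) {
                __netif_schedule(q);
                break;
            }
        }

        qdisc_run_end(q);
    }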
09 Oct, 2014
2 commits
-
We need to copy exts->type when committing the change, otherwise
it would always be 0. This is a quick fix for -net and -stable;
for net-next, tcf_exts will be removed.

Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim
Cc: Jamal Hadi Salim
Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
08 Oct, 2014
1 commit
-
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.
Dropping the dst in validate_xmit_skb() is certainly too late in case
the packet was queued by cpu X but dequeued by cpu Y.

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking the qdisc lock.

As Julian Anastasov pointed out, the need for skb_dst() might come from
some packet schedulers or classifiers.

This patch adds a new helper to cleanly express the needs of various
drivers or qdiscs/classifiers. Drivers that need skb_dst() in their
ndo_start_xmit() should call the following helper in their setup
instead of the prior:

    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
    netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers. The other one is permanent
and blocks IFF_XMIT_DST_RELEASE from being rebuilt in bonding/team.
Eventually, we could add something smarter later (see the sketch of
the helper below).

Signed-off-by: Eric Dumazet
Cc: Julian Anastasov
Signed-off-by: David S. Miller
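The helper then boils down to clearing both bits (sketch; the permanent bit is assumed to be named IFF_XMIT_DST_RELEASE_PERM):

    static inline void netif_keep_dst(struct net_device *dev)
    {
        /* clear the rebuildable bit and the permanent one, so that
         * bonding/team cannot re-enable dst release on this device
         */
        dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
    }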
07 Oct, 2014
3 commits
-
Using the tcf_proto pointer 'tp' from inside the classifiers callback
is not valid because it may have been cleaned up by another call_rcu
occurring on another CPU.

'tp' is currently being used by tcf_unbind_filter(); in this patch we
move instances of tcf_unbind_filter outside of the call_rcu() context
(see the sketch below). This is safe to do because any running
schedulers will either read the valid class field or it will be zeroed.

And all schedulers today, when the class is 0, do a lookup using the
same call used by tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer, it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the scheduler call sites.

Reported-by: Cong Wang
Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
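The shape of the change for an individual classifier, sketched (the rcu callback name is illustrative):

    /* unbind synchronously, while tp is known to be valid ... */
    tcf_unbind_filter(tp, &f->res);

    /* ... and let the rcu callback only free memory */
    call_rcu(&f->rcu, cls_filter_free_rcu);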
-
It is not RCU safe to destroy the action chain while there
is a possibility of readers accessing it. Move this code
into the rcu callback, using the same rcu callback used in the
code path that makes a change to head.

Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
-
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths, especially when we need to tear down the ematch
from an RCU callback. In this case we cannot guarantee that the
tcf_proto structure is still valid.

Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
06 Oct, 2014
1 commit
-
The standard qdisc API to set up a timer implies an atomic operation on
every packet dequeue: qdisc_unthrottled().

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of which can be throttled while others are not.

The fix is straightforward: add a 'bool throttle' parameter to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq (see the sketch below).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
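The resulting interface, per the description above; the sch_fq call site is illustrative:

    void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires,
                                    bool throttle);

    /* sch_fq: arm the timer for the earliest delayed flow without
     * marking the whole qdisc as throttled
     */
    qdisc_watchdog_schedule_ns(&q->watchdog, q->time_next_delayed_flow, false);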
05 Oct, 2014
2 commits
-
The result of a negated container has to be inverted before checking
for early ending (see the sketch below).

This fixes my previous attempt (17c9c8232663a47f074b7452b9b034efda868ca7)
to make inverted containers work correctly.

Signed-off-by: Ignacy Gawędzki
Signed-off-by: David S. Miller
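The intended order of operations inside the ematch tree-walking loop, sketched; apart from tcf_em_is_inverted(), the helper names are illustrative:

    res = match->ops->match(skb, match, info);
    if (tcf_em_is_inverted(match))
        res = !res;                     /* invert the container result first */
    if (tcf_em_early_end(match, res))   /* only then decide on early ending */
        break;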
-
Suspicious RCU usage in qdisc_watchdog: the call needs to be done
inside rcu_read_lock/rcu_read_unlock. And then Qdisc destroy operations
need to ensure the timer is cancelled before removing the qdisc
structure.

[ 3992.191339] ===============================
[ 3992.191340] [ INFO: suspicious RCU usage. ]
[ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
[ 3992.191345] -------------------------------
[ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
[ 3992.191348]
[ 3992.191348] other info that might help us debug this:
[ 3992.191348]
[ 3992.191351]
[ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
[ 3992.191353] no locks held by swapper/1/0.
[ 3992.191355]
[ 3992.191355] stack backtrace:
[ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
[ 3992.191360] Hardware name: /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
[ 3992.191362] 0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
[ 3992.191366] ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
[ 3992.191370] ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
[ 3992.191374] Call Trace:
[ 3992.191376] [] dump_stack+0x4e/0x68
[ 3992.191387] [] lockdep_rcu_suspicious+0xe6/0x130
[ 3992.191392] [] qdisc_watchdog+0x8a/0xb0
[ 3992.191396] [] __run_hrtimer+0x72/0x420
[ 3992.191399] [] ? hrtimer_interrupt+0x7d/0x240
[ 3992.191403] [] ? tc_classify+0xc0/0xc0
[ 3992.191406] [] hrtimer_interrupt+0xff/0x240
[ 3992.191410] [] ? __atomic_notifier_call_chain+0x5/0x140
[ 3992.191415] [] local_apic_timer_interrupt+0x3b/0x60
[ 3992.191419] [] smp_apic_timer_interrupt+0x45/0x60
[ 3992.191422] [] apic_timer_interrupt+0x6f/0x80
[ 3992.191424] [] ? cpuidle_enter_state+0x73/0x2e0
[ 3992.191432] [] ? cpuidle_enter_state+0x6e/0x2e0
[ 3992.191437] [] cpuidle_enter+0x17/0x20
[ 3992.191441] [] cpu_startup_entry+0x3d1/0x4a0
[ 3992.191445] [] ? clockevents_config_and_register+0x26/0x30
[ 3992.191448] [] start_secondary+0x1b6/0x260

Fixes: b26b0d1e8b1 ("net: qdisc: use rcu prefix and silence sparse warnings")
Signed-off-by: John Fastabend
Acked-by: Cong Wang
Signed-off-by: David S. Miller
04 Oct, 2014
3 commits
-
Validation of an skb can be pretty expensive: GSO segmentation and/or
checksum computations.

We can do this without holding the qdisc lock, so that other cpus
can queue additional packets.

The trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one (see the sketch below).

Tested on a 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only a few more cpu
cycles. Lock contention on the qdisc lock disappeared.

Same if disabling TX checksum offload.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
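A sketch of the carried flag, simplified from the dequeue/transmit path:

    bool validate;

    /* a fresh dequeue is not validated yet; a requeued list comes
     * back with validate == false since it was validated already
     */
    skb = dequeue_skb(q, &validate);
    if (skb)
        sch_direct_xmit(skb, q, dev, txq, root_lock, validate);

    /* inside sch_direct_xmit(), outside the qdisc lock */
    if (likely(validate))
        skb = validate_xmit_skb_list(skb, dev);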
-
The TSO and GSO segmented packets already benefit from bulking
on their own.

The TSO packets have always taken advantage of only updating the
tailptr once for a large packet.

The GSO segmented packets have recently taken advantage of the
bulking xmit_more API, via merge commit 53fda7f7f9e8 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a4 ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of the remaining list, and via commit
ce93718fb7cd ("net: Don't keep around original SKB when we
software segment GSO frames.").

This patch allows further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.

Testing:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSOs shows no performance regressions
(requeues were in the area of 32 requeues/sec).

Bulking several GSOs does show a small regression or a very small
improvement (requeues were in the area of 8000 requeues/sec).

Using ixgbe at 10Gbit/s with GSO bulking, we can measure some additional
latency. The base case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms and 0.47ms. Bulking several GSOs
together results in a stable high-prio queue delay of 0.50ms.

Using igb at 100Mbit/s with GSO bulking shows an improvement. The base
case sees varying high-prio queue delay between 2.23ms and 2.35ms.

Signed-off-by: David S. Miller
-
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb() (see the sketch below).

The optimization principle for this is twofold: (1) to amortize
locking cost and (2) to avoid expensive tailptr updates for notifying HW.
(1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packets. The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
(2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr updates, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ. This patch takes the easy way out, by restricting bulk
dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
the qdisc only has a single TXQ attached.

Some detail about the flow: dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()). In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit(). In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the driver's BQL
limits. In effect, bulking will only happen for BQL enabled drivers.

Small amounts of extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicates that something, like
sched latency, might be causing this effect. More comparisons
show that this oscillation goes away occasionally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets. They already benefit from bulking on their own.
A followup patch adds this, to allow easier bisect-ability for finding
regressions.

Joint work with Hannes, Daniel and Florian.
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
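A sketch of the bulk dequeue loop bounded by BQL, close to the try_bulk_dequeue_skb() shape described above (details simplified):

    static void try_bulk_dequeue_skb(struct Qdisc *q, struct sk_buff *skb,
                                     const struct netdev_queue *txq)
    {
        /* remaining BQL budget for this TXQ, minus the first skb */
        int bytelimit = qdisc_avail_bulklimit(txq) - skb->len;

        while (bytelimit > 0) {
            struct sk_buff *nskb = q->dequeue(q);

            if (!nskb)
                break;

            bytelimit -= nskb->len;   /* may overshoot, BQL is the backstop */
            skb->next = nskb;
            skb = nskb;
        }
        skb->next = NULL;
    }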
03 Oct, 2014
1 commit
-
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller
02 Oct, 2014
2 commits
-
This fixes the following crash:
[ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[ 63.980094] RIP: 0010:[] [] u32_destroy_key+0x27/0x6d
[ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
[ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[ 63.980094] Stack:
[ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[ 63.980094] Call Trace:
[ 63.980094] [] u32_delete_key_freepf_rcu+0x1b/0x1d
[ 63.980094] [] rcu_process_callbacks+0x3bb/0x691
[ 63.980094] [] ? rcu_process_callbacks+0x2c1/0x691
[ 63.980094] [] ? u32_destroy_key+0x6d/0x6d
[ 63.980094] [] __do_softirq+0x142/0x323
[ 63.980094] [] run_ksoftirqd+0x23/0x53
[ 63.980094] [] smpboot_thread_fn+0x203/0x221
[ 63.980094] [] ? smpboot_unpark_thread+0x33/0x33
[ 63.980094] [] kthread+0xc9/0xd1
[ 63.980094] [] ? do_wait_for_common+0xf8/0x125
[ 63.980094] [] ? __kthread_parkme+0x61/0x61
[ 63.980094] [] ret_from_fork+0x7c/0xb0
[ 63.980094] [] ? __kthread_parkme+0x61/0x61

tp could be freed in the call_rcu callback too; the order is not guaranteed.
John Fastabend says:
====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: John Fastabend
Signed-off-by: David S. Miller
-
This patch fixes the following crash:
[ 166.670795] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 166.674230] IP: [] __list_del_entry+0x5c/0x98
[ 166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0
[ 166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642
[ 166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000
[ 166.674230] RIP: 0010:[] [] __list_del_entry+0x5c/0x98
[ 166.674230] RSP: 0018:ffff8800cae7f7d0 EFLAGS: 00010207
[ 166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700
[ 166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700
[ 166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001
[ 166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8
[ 166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700
[ 166.674230] FS: 00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 166.674230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0
[ 166.674230] Stack:
[ 166.674230] ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828
[ 166.674230] ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8
[ 166.674230] 0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848
[ 166.674230] Call Trace:
[ 166.674230] [] list_del+0xd/0x2b
[ 166.674230] [] tcf_action_destroy+0x4c/0x71
[ 166.674230] [] tcf_exts_destroy+0x20/0x2d
[ 166.674230] [] tcindex_delete+0x196/0x1b7

struct list_head cannot simply be copied; we should always init it
(see the sketch below).
Cc: John Fastabend
Signed-off-by: Cong Wang
Acked-by: John Fastabend
Signed-off-by: David S. Miller
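The underlying rule, sketched; the structs are illustrative, the point is that a byte-copied list_head still points at the old node's neighbours:

    /* a struct copy duplicates the prev/next pointers of the source */
    new->exts = old->exts;

    /* so the copied head must be re-initialized before it is used */
    INIT_LIST_HEAD(&new->exts.actions);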
30 Sep, 2014
4 commits
-
After previous patches to simplify qstats, the qstats can be
made per cpu with a packed union in the Qdisc struct.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
This removes the use of the qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passing it to user space. By
handling it explicitly we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely,
but qstats is a user space ABI and can't be broken. A future
patch could make an internal-only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128 bits instead
of 128+32.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
-
This adds helpers to manipulate the qstats logic and replaces locations
that touch the counters directly (see the sketch below). This
simplifies future patches that push qstats onto per cpu counters.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
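A couple of the helpers this refers to, sketched (the per-cpu variants come with the follow-up patches):

    static inline void qdisc_qstats_drop(struct Qdisc *sch)
    {
        sch->qstats.drops++;
    }

    static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
                                                const struct sk_buff *skb)
    {
        sch->qstats.backlog -= qdisc_pkt_len(skb);
    }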
-
In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks, which is not
the case for most qdiscs in the near future, only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag (see the sketch below).

Next, because estimators use the bstats to calculate packets per
second and bytes per second, the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
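A sketch of how transmit accounting then looks for a lockless qdisc, assuming the TCQ_F_CPUSTATS flag and per-cpu fields described above:

    if (sch->flags & TCQ_F_CPUSTATS) {
        struct gnet_stats_basic_cpu *b = this_cpu_ptr(sch->cpu_bstats);

        u64_stats_update_begin(&b->syncp);
        bstats_update(&b->bstats, skb);
        u64_stats_update_end(&b->syncp);
    } else {
        bstats_update(&sch->bstats, skb);
    }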