22 May, 2014
1 commit
-
Kelly reported the following crash:
IP: [] tcf_action_exec+0x46/0x90
PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
RIP: 0010:[] [] tcf_action_exec+0x46/0x90
RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283
RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
Stack:
ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
Call Trace:
[] tcindex_classify+0x88/0x9b
[] tc_classify_compat+0x3e/0x7b
[] tc_classify+0x25/0x9f
[] htb_enqueue+0x55/0x27a
[] dsmark_enqueue+0x165/0x1a4
[] __dev_queue_xmit+0x35e/0x536
[] dev_queue_xmit+0x10/0x12
[] packet_sendmsg+0xb26/0xb9a
[] ? __lock_acquire+0x3ae/0xdf3
[] __sock_sendmsg_nosec+0x25/0x27
[] sock_aio_write+0xd0/0xe7
[] do_sync_write+0x59/0x78
[] vfs_write+0xb5/0x10a
[] SyS_write+0x49/0x7f
[] system_call_fastpath+0x16/0x1b

This is because we memcpy struct tcindex_filter_result, which contains
struct tcf_exts; a struct list_head obviously cannot simply be copied.
This is a regression introduced by commit 33be627159913b094bb578
(net_sched: act: use standard struct list_head).

It's not very easy to fix, as the code is a mess:
    if (old_r)
        memcpy(&cr, r, sizeof(cr));
    else {
        memset(&cr, 0, sizeof(cr));
        tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
    }
    ...
    tcf_exts_change(tp, &cr.exts, &e);
    ...
    memcpy(r, &cr, sizeof(cr));

The above code should be equivalent to:
    tcindex_filter_result_init(&cr);
    if (old_r)
        cr.res = r->res;
    ...
    if (old_r)
        tcf_exts_change(tp, &r->exts, &e);
    else
        tcf_exts_change(tp, &cr.exts, &e);
    ...
    r->res = cr.res;

After this change, there is no need to copy struct tcf_exts.
It also fixes other places that zero structs containing struct tcf_exts.
Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
Reported-by: Kelly Anderson
Tested-by: Kelly Anderson
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
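The failure mode described in the commit above (copying a struct that embeds a struct list_head) can be illustrated with a minimal userspace sketch. The list_head stand-in and the exts_demo struct are hypothetical simplifications for illustration, not the kernel definitions:

```c
#include <string.h>

/* Minimal stand-in for a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points back at itself. */
static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical struct embedding a list head, as tcf_exts does. */
struct exts_demo {
	struct list_head actions;
};

/* Returns 1 if a memcpy'd copy still looks like an empty list, else 0. */
static int copy_is_valid_empty_list(void)
{
	struct exts_demo orig, copy;

	list_init(&orig.actions);
	memcpy(&copy, &orig, sizeof(copy));

	/* copy.actions.next still points at orig.actions, not at itself. */
	return list_is_empty(&copy.actions);
}
```

The copied head keeps pointing into the original object, so walking it from the copy touches memory that may already have been freed. This is consistent with the oops above.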
05 May, 2014
1 commit
-
hhf_change() takes the sch_tree_lock and releases it, but fails to
release it in the error cases. Fix the missed case here.

To reproduce, try a command like this:

    # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000
Signed-off-by: John Fastabend
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
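As a rough sketch of the class of bug fixed above, the following userspace example validates its input before taking a lock, so no error return can leave the lock held. The lock helpers, the state variables and the overflow check are assumptions made for illustration. They are not the actual hhf_change() code:

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for sch_tree_lock()/sch_tree_unlock(). */
static int lock_depth;
static void demo_tree_lock(void)   { lock_depth++; }
static void demo_tree_unlock(void) { lock_depth--; }

/* Hypothetical qdisc state protected by the lock. */
static uint32_t cur_quantum = 1514;
static uint32_t cur_weight = 2;

/* Returns 0 on success or -EINVAL; the lock is never held on return. */
static int demo_change(uint32_t quantum, uint32_t weight)
{
	uint64_t product = (uint64_t)quantum * weight;

	/* Validate first, so the error path does not hold the lock. */
	if (product > INT_MAX)
		return -EINVAL;

	demo_tree_lock();
	cur_quantum = quantum;
	cur_weight = weight;
	demo_tree_unlock();
	return 0;
}
```

An equally valid design keeps the check under the lock and jumps to a common unlock label on error. Either way, every exit path must balance the lock.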
25 Apr, 2014
1 commit
-
It is possible, by passing a netlink socket to a more privileged
executable and then fooling that executable into writing data to the
socket that happens to be a valid netlink message, to make the
privileged executable do something it did not intend to do.

To keep this from happening, replace bare capable and ns_capable calls
with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
which act the same as the previous calls except that they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
01 Apr, 2014
1 commit
-
This allows monitoring carrier on/off transitions and detecting link
flapping issues:
- new /sys/class/net/X/carrier_changes
- new rtnetlink IFLA_CARRIER_CHANGES (getlink)

Tested:
- grep . /sys/class/net/*/carrier_changes
+ ip link set dev X down/up
+ plug/unplug cable
- updated iproute2: prints IFLA_CARRIER_CHANGES
- iproute2 20121211-2 (debian): unchanged behavior

Signed-off-by: David Decotigny
Signed-off-by: David S. Miller
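A userspace monitor could poll the new sysfs counter from the commit above roughly as follows. This is a minimal sketch: the helper names are hypothetical, and it assumes the file holds a single unsigned decimal number:

```c
#include <stdio.h>

/* Build /sys/class/net/<ifname>/carrier_changes; -1 if buf is too small. */
static int carrier_changes_path(const char *ifname, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/carrier_changes", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one unsigned decimal counter from path; 0 on success, -1 on error. */
static int read_counter(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Sampling the counter twice and comparing the values gives the number of carrier transitions in between, which is one way to spot a flapping link.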
19 Mar, 2014
1 commit
-
In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
classifier") Patrick added a u32 field in fw_head, making it slightly
bigger than one page.

Let's use 256 slots to make fw_hash() more straightforward, and move
@mask to the beginning of the structure, as we often use a small number
of skb->mark values. @mask and the first hash buckets then share the same
cache line.

This brings the memory usage back to less than 4000 bytes, and permits
John to add an rcu_head at the end of the structure later without any
worry.

Signed-off-by: Eric Dumazet
Cc: Thomas Graf
Cc: John Fastabend
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
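The size argument in the commit above can be checked with a small layout sketch. The 4096-byte page size, the struct names and the plain modulo hash are assumptions for illustration. The real fw_hash() may mix the handle bits differently:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size */
#define DEMO_HTSIZE    256	/* fixed number of buckets */

struct fw_filter;		/* opaque in this sketch */

/* Old shape: a full page of bucket pointers, then the u32 mask. */
struct fw_head_old {
	struct fw_filter *ht[DEMO_PAGE_SIZE / sizeof(struct fw_filter *)];
	uint32_t mask;
};

/* New shape: mask first, then 256 bucket pointers. */
struct fw_head_new {
	uint32_t mask;
	struct fw_filter *ht[DEMO_HTSIZE];
};

/* With a fixed table size, picking a bucket is a simple modulo. */
static unsigned int demo_fw_hash(uint32_t handle)
{
	return handle % DEMO_HTSIZE;
}
```

With 8-byte pointers the old layout needs 4104 bytes, just over one page, while the new one needs 2056 bytes and keeps the mask next to the first buckets.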
15 Mar, 2014
1 commit
-
Conflicts:
    drivers/net/usb/r8152.c
    drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller
14 Mar, 2014
1 commit
-
nla_nest_end() already returns skb->len, so replace
"return skb->len" with "return nla_nest_end()" instead.

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
12 Mar, 2014
2 commits
-
We have seen delays of more than 50ms in class or qdisc dumps when the
device is under high TX stress, even with the prior 4KB per skb limit.

Add cond_resched() to give higher priority tasks a chance to get the CPU.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(),
so we do not need RCU protection for the list of netdevices.

This allows preemption to occur, thus reducing latencies.
A following patch adds explicit cond_resched() calls.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Mar, 2014
2 commits
-
Resizing the fq hash table allocates memory while holding the qdisc
spinlock, with BH disabled.

This is definitely not good, as the allocation might sleep.
We can drop the lock and take it again when needed; we hold RTNL, so no
other changes can happen at the same time.

Signed-off-by: Eric Dumazet
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller
-
The WARN_ON(root == &noop_qdisc) added in qdisc_list_add()
can trigger in normal conditions when devices are not up.
It should be done only right before the list_add_tail() call.

Fixes: e57a784d8cae4 ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
Reported-by: Valdis Kletnieks
Tested-by: Mirco Tischler
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Mar, 2014
1 commit
-
Resizing the fq hash table allocates memory while holding the qdisc
spinlock, with BH disabled.

This is definitely not good, as the allocation might sleep.
We can drop the lock and take it again when needed; we hold RTNL, so no
other changes can happen at the same time.

Signed-off-by: Eric Dumazet
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller
07 Mar, 2014
1 commit
-
htb_dump() and htb_dump_class() do not strictly need to acquire
the qdisc lock to fetch qdisc and/or class parameters.

We hold RTNL and no changes can occur.
This reduces qdisc lock pressure by 50% while doing tc qdisc|class dump
operations.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Mar, 2014
1 commit
-
Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.

The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller
04 Mar, 2014
1 commit
-
On x86_64 we have 3 holes in struct tbf_sched_data.
The member peak_present can be replaced with peak.rate_bytes_ps,
because peak.rate_bytes_ps is set only when peak is specified in
tbf_change(). tbf_peak_present() is introduced to test
peak.rate_bytes_ps.

The member max_size is moved to fill a 32-bit hole.
Signed-off-by: Hiroaki SHIMODA
Signed-off-by: David S. Miller
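The effect of the two changes in the commit above (dropping a flag that duplicates information and moving a 32-bit member into a hole) can be sketched with hypothetical structs. These are not the real tbf_sched_data fields, only a layout illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 32-bit members between 64-bit ones. */
struct demo_before {
	uint32_t limit;
	uint64_t peak_rate_bytes_ps;
	uint32_t max_size;
	uint64_t tokens;
	bool     peak_present;	/* duplicates "peak rate is nonzero" */
};

/* Hypothetical "after" layout: max_size fills the gap after limit. */
struct demo_after {
	uint32_t limit;
	uint32_t max_size;
	uint64_t peak_rate_bytes_ps;	/* nonzero only if a peak is set */
	uint64_t tokens;
};

/* Replaces the separate flag by testing the rate itself. */
static bool demo_peak_present(const struct demo_after *q)
{
	return q->peak_rate_bytes_ps != 0;
}
```

On an LP64 target the first struct occupies 40 bytes including padding and the second 24. A tool such as pahole reports holes like these directly.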
28 Feb, 2014
1 commit
-
The allocated child qdisc is not freed in error conditions.
Defer the allocation until after the user configuration turns out to be
valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA
Cc: Yang Yingliang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Feb, 2014
1 commit
-
Conflicts:
    drivers/net/bonding/bond_3ad.h
    drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller
18 Feb, 2014
1 commit
-
Replace two magic numbers which initialize clgstate::state.
Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
14 Feb, 2014
4 commits
-
Replace some magic numbers which describe states of the GE model
loss generator with an enum.

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
-
In netem_change(), we already have "struct netem_sched_data *q".
Replace the params of get_correlation() and other similar functions with
"struct netem_sched_data *q".

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
-
get_dist_table() and get_loss_clg() may fail. These
two functions should be called before setting the members
of qdisc_priv(sch), or a failure in either of them
will break the old settings.

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
-
Fix an incorrect comment reported by Norbert Kiesel. Edit another comment to
add more details. Also add references to the algorithm (IETF draft and paper)
to the top of the file.

Signed-off-by: Vijay Subramanian
CC: Mythili Prabhu
CC: Norbert Kiesel
Signed-off-by: David S. Miller
13 Feb, 2014
5 commits
-
We could allocate tc_action on the stack in tca_action_flush(),
since it is not large.

Also, we could use create_a() in tcf_action_get_1().

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
When an action is bound to a filter, there is no point in
removing it from outside. Currently we just silently decrease the refcnt;
we should reject this explicitly with EPERM.

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Operations on bindcnt, refcnt, etc. are common to all actions, so
there is no need for each action to repeat them; they can be unified
now. Each action just needs to do its specific cleanup, if any.

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Now we can totally hide it from modules. The tcf_hash_*() APIs
will operate on struct tc_action, so modules don't need to care about
the details.

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
27 Jan, 2014
1 commit
-
This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by an upcoming ip forwarding path patch.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
23 Jan, 2014
1 commit
-
If the class in skb->priority is not a leaf, apply filters from the
selected class, not the qdisc. This lets netfilter or user space
partially classify the packet.

Signed-off-by: Harry Mason
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
22 Jan, 2014
4 commits
-
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4809fa5
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using the Granlund and Montgomery method, so that
future use is also safe and without any non-obvious side effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
were very large. This seemed not to be problematic with its
current users, as far as we can tell. Eric Dumazet checked
the slab usage; we cannot say so with certainty in the case of flex_array.
Still, in order to fix that, we propose an extension of the
original implementation from commit 6a2d7a955d8d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d are in
the u32 universe:

1) Initialization:
int l = ceil(log_2 d)
uword m' = floor((1<<
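For illustration, the Granlund and Montgomery scheme for unsigned 32-bit operands can be sketched in userspace C as below. This is a sketch under the usual formulation of the method (multiplier m plus two shift counts). The type and function names carry a demo prefix to make clear they are not the kernel's own definitions:

```c
#include <stdint.h>

/* Precomputed multiplier and shift counts for one divisor. */
struct reciprocal_value_demo {
	uint32_t m;
	uint8_t sh1, sh2;
};

/* 1-based index of the highest set bit, 0 for x == 0. */
static int demo_fls(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Initialization step for a divisor d >= 1. */
static struct reciprocal_value_demo demo_reciprocal_value(uint32_t d)
{
	struct reciprocal_value_demo R;
	int l = demo_fls(d - 1);	/* l = ceil(log2(d)) */
	uint64_t m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;

	R.m = (uint32_t)m;
	R.sh1 = (uint8_t)(l < 1 ? l : 1);	/* min(l, 1) */
	R.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);	/* max(l - 1, 0) */
	return R;
}

/* Computes n / d using one multiply and two shifts. */
static uint32_t demo_reciprocal_divide(uint32_t n,
				       struct reciprocal_value_demo R)
{
	uint32_t t = (uint32_t)(((uint64_t)n * R.m) >> 32);

	return (t + ((n - t) >> R.sh1)) >> R.sh2;
}
```

For d = 1 this yields m = 1 with both shifts zero, so the result is n itself, which covers the division-by-1 problem mentioned in the commit message.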
Cc: Eric Dumazet
Cc: Austin S Hemmelgarn
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross
Cc: Jamal Hadi Salim
Cc: Stephen Hemminger
Cc: Matt Mackall
Cc: Pekka Enberg
Cc: Christoph Lameter
Cc: Andy Gospodarek
Cc: Veaceslav Falico
Cc: Jay Vosburgh
Cc: Jakub Zawadzki
Signed-off-by: Daniel Borkmann
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
-
Many functions have open coded a function that returns a random
number in range [0,N-1]. Under the assumption that we have a PRNG
such as taus113 that is well distributed in [0, ~0U] space,
we can implement such a function as uword t = (n*m')>>32, where
m' is a random number obtained from the PRNG, n the right open interval
border and t our resulting random number, with n,m',t in u32 universe.

Let's go with Joe and simply call it prandom_u32_max(), although
technically we have a right open interval endpoint, but that we
have documented. Other users can further be migrated to the new
prandom_u32_max() function later on; for now, we need to make sure
to migrate reciprocal_divide() users for the reciprocal_divide()
follow-up fixup since their function signatures are going to change.

Joint work with Hannes Frederic Sowa.
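The scaling formula t = (n*m')>>32 from this message can be written as a small C helper. The helper name is hypothetical, and the random input is passed in explicitly so that the sketch does not depend on any particular PRNG:

```c
#include <stdint.h>

/*
 * Scale a uniformly distributed 32-bit value rnd into the right-open
 * interval [0, n) using a 64-bit multiply and a shift, no division.
 */
static uint32_t demo_u32_max(uint32_t rnd, uint32_t n)
{
	return (uint32_t)(((uint64_t)rnd * n) >> 32);
}
```

Since rnd is at most 2^32 - 1, the product is always below n * 2^32, so the result never reaches n. For n = 0 the result is 0.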
Cc: Jakub Zawadzki
Cc: Eric Dumazet
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller
-
So that we will not expose struct tcf_common to modules.
Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
20 Jan, 2014
2 commits
-
It is not actually implemented.
Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
-
Replace some magic numbers which describe states of the 4-state model
loss generator with an enum.

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
17 Jan, 2014
3 commits
-
The error code was not set if changing indev fails, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.

Fixes: 2519a602c273 ('net_sched: optimize tcf_match_indev()')
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller
-
In tcf_register_action() we check either ->type or ->kind to see if
there is an existing action registered, but the ipt action registers two
actions with the same type but different kinds. They should have different
types too.

Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
Cc: Jamal Hadi Salim
Cc: David S. Miller
Signed-off-by: Cong Wang
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
15 Jan, 2014
2 commits
-
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32, there is no reason to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller