Eric Lee / smarc-fsl-linux-kernel

25 May, 2018

1 commit

8ffa5f978 net: sched: red: avoid hashing NULL child

[ Upstream commit 44a63b137f7b6e4c7bd6c9cc21615941cb36509d ]

Hangbin reported an Oops triggered by the syzkaller qdisc rules:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
Modules linked in: sch_red
CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:qdisc_hash_add+0x26/0xa0
RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
FS: 00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
red_change+0x2d2/0xed0 [sch_red]
qdisc_create+0x57e/0xef0
tc_modify_qdisc+0x47f/0x14e0
rtnetlink_rcv_msg+0x6a8/0x920
netlink_rcv_skb+0x2a2/0x3c0
netlink_unicast+0x511/0x740
netlink_sendmsg+0x825/0xc30
sock_sendmsg+0xc5/0x100
___sys_sendmsg+0x778/0x8e0
__sys_sendmsg+0xf5/0x1b0
do_syscall_64+0xbd/0x3b0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x450869
RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470

When a red qdisc is updated with a 0 limit, the child qdisc is left
unmodified and no additional scheduler is created in red_change().
The 'child' local variable is then rightfully NULL, and it must not be
added to the hash table.

This change addresses the above issue by moving qdisc_hash_add() right
after the child qdisc creation. It additionally removes unneeded checks
for noop_qdisc.
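
The pattern behind the fix can be sketched in user space. This is only an illustration under stated assumptions: `toy_qdisc`, `toy_hash_add()` and `toy_change()` are hypothetical names that mirror the shape of red_change(), not the kernel's code. The hash insertion sits inside the branch that creates the child, so a 0 limit never reaches it with a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's types or helpers. */
struct toy_qdisc {
	unsigned int limit;
	int hashed;
};

static int toy_hash_count;

/* Dereferences its argument, so passing NULL would be fatal. */
static void toy_hash_add(struct toy_qdisc *q)
{
	assert(q != NULL);
	q->hashed = 1;
	toy_hash_count++;
}

/*
 * Returns the newly created child, or NULL when limit == 0 and the
 * existing child is left unmodified (allocation failure also yields
 * NULL in this toy).
 */
static struct toy_qdisc *toy_change(unsigned int limit)
{
	struct toy_qdisc *child = NULL;

	if (limit > 0) {
		child = calloc(1, sizeof(*child));
		if (!child)
			return NULL;
		child->limit = limit;
		/* Hash right after creation: child is known to be valid here. */
		toy_hash_add(child);
	}
	return child;
}
```

Calling `toy_change(0)` returns NULL without touching the hash, while `toy_change(10)` creates and hashes a child in one place.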

Reported-by: Hangbin Liu
Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the dump")
Signed-off-by: Paolo Abeni
Acked-by: Jiri Kosina
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Paolo Abeni
2018-05-25 22:17:23 +0800

02 Sep, 2017

1 commit

6026e043d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Three cases of simple overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2017-09-02 08:42:05 +0800

31 Aug, 2017

1 commit

c2d6511e6 sch_tbf: fix two null pointer dereferences on init failure

sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
callbacks, but init may fail before the timer is initialized due to missing
options (either not supplied by user-space or set as a default qdisc).
q->qdisc is also used by ->reset and ->destroy, so it needs to be
initialized as well.
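
A minimal user-space sketch of the ordering this implies, assuming hypothetical names (`toy_sched`, `toy_init()`, `toy_reset()`, `toy_noop_child`) that are not the kernel's: everything the teardown callbacks touch is set up before the first error return, so a failed init can still be reset safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder child, playing the role of a no-op qdisc. */
struct toy_child {
	int resets;
};

static struct toy_child toy_noop_child;

struct toy_sched {
	int timer_ready;
	struct toy_child *child;
};

static int toy_init(struct toy_sched *q, const void *opt)
{
	/* Initialise first, so teardown is safe on every error path. */
	q->timer_ready = 1;
	q->child = &toy_noop_child;

	if (opt == NULL)
		return -1;	/* missing options: fail with valid state */
	return 0;
}

/* Runs even after a failed init, so it must only see valid state. */
static void toy_reset(struct toy_sched *q)
{
	assert(q->timer_ready);
	q->child->resets++;	/* NULL dereference without the early init */
}
```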

Reproduce:
$ sysctl net.core.default_qdisc=tbf
$ ip l set ethX up

Crash log:
[ 959.160172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 959.160323] IP: qdisc_reset+0xa/0x5c
[ 959.160400] PGD 59cdb067
[ 959.160401] P4D 59cdb067
[ 959.160466] PUD 59ccb067
[ 959.160532] PMD 0
[ 959.160597]
[ 959.160706] Oops: 0000 [#1] SMP
[ 959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem
[ 959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62
[ 959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 959.161157] task: ffff880059c9a700 task.stack: ffff8800376d0000
[ 959.161263] RIP: 0010:qdisc_reset+0xa/0x5c
[ 959.161347] RSP: 0018:ffff8800376d3610 EFLAGS: 00010286
[ 959.161531] RAX: ffffffffa001b1dd RBX: ffff8800373a2800 RCX: 0000000000000000
[ 959.161733] RDX: ffffffff8215f160 RSI: ffffffff8215f160 RDI: 0000000000000000
[ 959.161939] RBP: ffff8800376d3618 R08: 00000000014080c0 R09: 00000000ffffffff
[ 959.162141] R10: ffff8800376d3578 R11: 0000000000000020 R12: ffffffffa001d2c0
[ 959.162343] R13: ffff880037538000 R14: 00000000ffffffff R15: 0000000000000001
[ 959.162546] FS: 00007fcc5126b740(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
[ 959.162844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 959.163030] CR2: 0000000000000018 CR3: 000000005abc4000 CR4: 00000000000406e0
[ 959.163233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 959.163436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 959.163638] Call Trace:
[ 959.163788] tbf_reset+0x19/0x64 [sch_tbf]
[ 959.163957] qdisc_destroy+0x8b/0xe5
[ 959.164119] qdisc_create_dflt+0x86/0x94
[ 959.164284] ? dev_activate+0x129/0x129
[ 959.164449] attach_one_default_qdisc+0x36/0x63
[ 959.164623] netdev_for_each_tx_queue+0x3d/0x48
[ 959.164795] dev_activate+0x4b/0x129
[ 959.164957] __dev_open+0xe7/0x104
[ 959.165118] __dev_change_flags+0xc6/0x15c
[ 959.165287] dev_change_flags+0x25/0x59
[ 959.165451] do_setlink+0x30c/0xb3f
[ 959.165613] ? check_chain_key+0xb0/0xfd
[ 959.165782] rtnl_newlink+0x3a4/0x729
[ 959.165947] ? rtnl_newlink+0x117/0x729
[ 959.166121] ? ns_capable_common+0xd/0xb1
[ 959.166288] ? ns_capable+0x13/0x15
[ 959.166450] rtnetlink_rcv_msg+0x188/0x197
[ 959.166617] ? rcu_read_unlock+0x3e/0x5f
[ 959.166783] ? rtnl_newlink+0x729/0x729
[ 959.166948] netlink_rcv_skb+0x6c/0xce
[ 959.167113] rtnetlink_rcv+0x23/0x2a
[ 959.167273] netlink_unicast+0x103/0x181
[ 959.167439] netlink_sendmsg+0x326/0x337
[ 959.167607] sock_sendmsg_nosec+0x14/0x3f
[ 959.167772] sock_sendmsg+0x29/0x2e
[ 959.167932] ___sys_sendmsg+0x209/0x28b
[ 959.168098] ? do_raw_spin_unlock+0xcd/0xf8
[ 959.168267] ? _raw_spin_unlock+0x27/0x31
[ 959.168432] ? __handle_mm_fault+0x651/0xdb1
[ 959.168602] ? check_chain_key+0xb0/0xfd
[ 959.168773] __sys_sendmsg+0x45/0x63
[ 959.168934] ? __sys_sendmsg+0x45/0x63
[ 959.169100] SyS_sendmsg+0x19/0x1b
[ 959.169260] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 959.169432] RIP: 0033:0x7fcc5097e690
[ 959.169592] RSP: 002b:00007ffd0d5c7b48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 959.169887] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007fcc5097e690
[ 959.170089] RDX: 0000000000000000 RSI: 00007ffd0d5c7b90 RDI: 0000000000000003
[ 959.170292] RBP: ffff8800376d3f98 R08: 0000000000000001 R09: 0000000000000003
[ 959.170494] R10: 00007ffd0d5c7910 R11: 0000000000000246 R12: 0000000000000006
[ 959.170697] R13: 000000000066f1a0 R14: 00007ffd0d5cfc40 R15: 0000000000000000
[ 959.170900] ? trace_hardirqs_off_caller+0xa7/0xcf
[ 959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24
98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89
e5 53 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb
[ 959.171637] RIP: qdisc_reset+0xa/0x5c RSP: ffff8800376d3610
[ 959.171821] CR2: 0000000000000018

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2017-08-31 06:26:12 +0800

26 Aug, 2017

1 commit

143976ce9 net_sched: remove tc class reference counting

For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:

1) For class modification and dumping paths, we already hold RTNL lock,
so all of these ->get(),->change(),->put() are atomic.

2) For filter binding/unbinding, we use a different reference counter than
this one, and they should hold the RTNL lock too.

3) For ->qlen_notify(), it is special because it is called on the ->enqueue()
path, but we already hold the qdisc tree lock there, and we hold this
tree lock when grafting or deleting the class too, so it should not be gone
or changed until we release the tree lock.

Therefore, this patch removes ->get() and ->put(), but:

1) Adds a new ->find() to find the pointer to a class by classid, no
refcnt.

2) Move the original class destroy upon the last refcnt into ->delete(),
right after releasing tree lock. This is fine because the class is
already removed from hash when holding the lock.

For those who also use ->put() as ->unbind(), just rename them to reflect
this change.

Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Acked-by: Jiri Pirko
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

WANG Cong
2017-08-26 08:19:10 +0800

14 Apr, 2017

1 commit

fceb6435e netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-04-14 01:58:22 +0800

13 Mar, 2017

2 commits

e33cc3163 sch_tbf: Remove bogus semicolon in if() conditional.

Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the dump")
Reported-by: kbuild test robot
Signed-off-by: David S. Miller

David S. Miller
2017-03-13 15:00:03 +0800

49b499718 net: sched: make default fifo qdiscs appear in the dump

The original reason [1] for having hidden qdiscs (potential scalability
issues in qdisc_match_from_root() with single linked list in case of large
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert
qdisc linked list to hashtable").

This allows us to bring more clarity and determinism into the dump by
making default pfifo qdiscs visible.

We're not turning this on by default though, as it was deemed [2] too
intrusive / unnecessary a change of default behavior towards userspace.
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
applications to request complete qdisc hierarchy dump, including the
ones that have always been implicit/invisible.

Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
about singletons would require quite some surgery with very little gain
(seeing no qdisc or seeing noop qdisc in the dump is probably setting
the same user expectation).

[1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com
[2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.net

Signed-off-by: Jiri Kosina
Signed-off-by: David S. Miller

Jiri Kosina
2017-03-13 13:53:02 +0800

26 Jun, 2016

1 commit

520ac30f4 net_sched: drop packets after root qdisc lock is released

Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, although one of FQ's goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue() methods, and to the qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.
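
The deferred-drop idea can be sketched in user space as follows. The names (`toy_pkt`, `toy_queue`, `toy_enqueue()`, `toy_xmit()`) and the integer "lock" are assumptions made for illustration only: the enqueue path chains rejected packets onto a `to_free` list while the lock is held, and the caller frees them only after the lock is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_pkt {
	struct toy_pkt *next;
};

struct toy_queue {
	unsigned int len, limit;
	int locked;		/* stand-in for the root qdisc lock */
};

static int toy_freed_after_unlock;

/* Over-limit packets are chained onto *to_free instead of being freed. */
static int toy_enqueue(struct toy_queue *q, struct toy_pkt *p,
		       struct toy_pkt **to_free)
{
	assert(q->locked);
	if (q->len >= q->limit) {
		p->next = *to_free;
		*to_free = p;
		return -1;
	}
	q->len++;		/* accepted; the toy leaves ownership with the caller */
	return 0;
}

static int toy_xmit(struct toy_queue *q, struct toy_pkt *p)
{
	struct toy_pkt *to_free = NULL;
	int rc;

	q->locked = 1;
	rc = toy_enqueue(q, p, &to_free);
	q->locked = 0;

	/* The costly frees happen here, outside the critical section. */
	while (to_free) {
		struct toy_pkt *next = to_free->next;

		free(to_free);
		toy_freed_after_unlock++;
		to_free = next;
	}
	return rc;
}
```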

Signed-off-by: Eric Dumazet
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-26 00:19:35 +0800

11 Jun, 2016

2 commits

45f50bed1 net_sched: remove generic throttled management

__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.

I already removed it for sch_fq in commit f2600cf02b5b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.

When one or more packets are stuck in one or more throttled
HTB classes, an htb dequeue() performs two atomic operations
to clear/set the __QDISC_STATE_THROTTLED bit, while the root qdisc
lock is held.

Removing this pair of atomic operations brings me an 8 % performance
increase on 200 TCP_RR tests, in the presence of throttled classes.

This patch has no side effect, since nothing actually uses
qdisc_is_throttled() anymore.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-06-11 14:58:21 +0800

1578b0a5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/sched/act_police.c
net/sched/sch_drr.c
net/sched/sch_hfsc.c
net/sched/sch_prio.c
net/sched/sch_red.c
net/sched/sch_tbf.c

In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.

A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.

Signed-off-by: David S. Miller

David S. Miller
2016-06-11 02:52:24 +0800

09 Jun, 2016

2 commits

a09ceb0e0 sched: remove qdisc->drop

After the removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler, there are
no more callers of ->drop() outside of other ->drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-06-09 14:58:52 +0800

c3a173d7d sched: remove qdisc_rehape_fail

After the removal of TCA_CBQ_POLICE in the cbq scheduler, qdisc->reshape_fail
is always NULL, i.e. qdisc_reshape_fail is now the same as qdisc_drop.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2016-06-09 14:58:51 +0800

04 Jun, 2016

1 commit

8d5958f42 sch_tbf: update backlog as well

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-06-04 07:24:04 +0800

26 Apr, 2016

1 commit

2a51c1e8e sched: use nla_put_u64_64bit()

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2016-04-26 03:09:09 +0800

01 Mar, 2016

2 commits

2ccccf5fb net_sched: update hierarchical backlog too

When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors. We need to update the backlog too, to
keep the stats on the root qdisc accurate.

Cc: Jamal Hadi Salim
Acked-by: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-03-01 06:02:33 +0800

86a7996cc net_sched: introduce qdisc_replace() helper

Remove nearly duplicated code and prepare for the following patch.

Cc: Jamal Hadi Salim
Acked-by: Jamal Hadi Salim
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-03-01 06:02:33 +0800

06 Oct, 2014

1 commit

f2600cf02 net: sched: avoid costly atomic operation in fq_dequeue()

The standard qdisc API to set up a timer implies an atomic operation on every
packet dequeue: qdisc_unthrottled()

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling: it is a qdisc handling many different flows,
some of which can be throttled while others are not.

The fix is straightforward: add a 'bool throttle' to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-10-06 12:55:10 +0800

30 Sep, 2014

1 commit

25331d6ce net: sched: implement qstat helper routines

This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2014-09-30 13:02:26 +0800

23 Aug, 2014

1 commit

d2de875c6 net: use ktime_get_ns() and ktime_get_real_ns() helpers

ktime_get_ns() replaces ktime_to_ns(ktime_get())

ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2014-08-23 10:57:23 +0800

14 Mar, 2014

1 commit

d59b7d805 net_sched: return nla_nest_end() instead of skb->len

nla_nest_end() already returns skb->len, so replace
return skb->len with return nla_nest_end() instead.

Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller

Yang Yingliang
2014-03-14 03:39:20 +0800

06 Mar, 2014

1 commit

67ddc87f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c

The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.

The two wireless conflicts were overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2014-03-06 09:32:02 +0800

04 Mar, 2014

1 commit

a135e598c sch_tbf: Remove holes in struct tbf_sched_data.

On x86_64 we have 3 holes in struct tbf_sched_data.

The member peak_present can be replaced with peak.rate_bytes_ps,
because peak.rate_bytes_ps is set only when peak is specified in
tbf_change(). tbf_peak_present() is introduced to test
peak.rate_bytes_ps.

The member max_size is moved to fill 32bit hole.
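
How reordering members and dropping a flag removes padding can be shown with two hypothetical structs. `toy_before` and `toy_after` are illustrative layouts, not the real struct tbf_sched_data: a 32-bit member followed by a 64-bit one leaves a hole on x86_64, and a separate boolean can be replaced by testing a field that is non-zero only when the feature is configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes (40 bytes on x86_64). */
struct toy_before {
	uint32_t limit;		/* followed by a 4-byte hole */
	uint64_t peak_rate;
	bool peak_present;	/* followed by a 7-byte hole */
	uint64_t tokens;
	uint32_t max_size;	/* followed by 4 bytes of tail padding */
};

/* Same information without holes (24 bytes on x86_64). */
struct toy_after {
	uint32_t limit;
	uint32_t max_size;	/* moved up to fill the 32-bit hole */
	uint64_t peak_rate;	/* non-zero doubles as "peak present" */
	uint64_t tokens;
};

/* Replaces the removed flag by testing the rate itself. */
static bool toy_peak_present(const struct toy_after *q)
{
	return q->peak_rate != 0;
}
```

Comparing `sizeof(struct toy_before)` with `sizeof(struct toy_after)`, or running a tool such as pahole on the real object file, makes the saving visible.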

Signed-off-by: Hiroaki SHIMODA
Signed-off-by: David S. Miller

Hiroaki SHIMODA
2014-03-04 04:43:47 +0800

28 Feb, 2014

1 commit

724b9e1d7 sch_tbf: Fix potential memory leak in tbf_change().

The allocated child qdisc is not freed in error conditions.
Defer the allocation after user configuration turns out to be
valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA
Cc: Yang Yingliang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hiroaki SHIMODA
2014-02-28 01:53:50 +0800

27 Jan, 2014

1 commit

de960aa9a net: add and use skb_gso_transport_seglen()

This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by an upcoming ip forwarding path patch.

Signed-off-by: Florian Westphal
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Florian Westphal
2014-01-27 14:38:23 +0800

27 Dec, 2013

1 commit

2e04ad424 sch_tbf: add TBF_BURST/TBF_PBURST attribute

When we set burst to 1514 with a low rate in userspace,
the kernel gets a burst value that is less than 1514,
which doesn't work.

This happens because some precision is lost when userspace
transforms burst into buffer, so burst loses some bytes
when the kernel transforms the buffer back into burst.

This patch adds two new attributes to support sending
burst/mtu to the kernel directly to avoid the loss.

Signed-off-by: Yang Yingliang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yang Yingliang
2013-12-27 02:54:22 +0800

12 Dec, 2013

2 commits

d55d282e6 sch_tbf: use do_div() for 64-bit divide

psched_ns_t2l() does a 64-bit divide, which is not supported
on 32-bit architectures. The correct way to do this is to use
do_div().
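
For readers unfamiliar with the helper: do_div(n, base) divides the 64-bit value n in place by a 32-bit base and evaluates to the remainder. The function below is only a user-space stand-in for those semantics, under the assumption that a pointer replaces the macro's lvalue argument; `toy_do_div()` is a hypothetical name, not the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-in for the semantics of do_div(n, base):
 * *n becomes the quotient, the return value is the remainder.
 * The kernel macro takes the lvalue itself rather than a pointer.
 */
static uint32_t toy_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}
```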

The problem was introduced by commit cc106e441a63
("net: sched: tbf: fix the calculation of max_size")

Reported-by: kbuild test robot
Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller

Yang Yingliang
2013-12-12 11:53:26 +0800

cc106e441 net: sched: tbf: fix the calculation of max_size

Currently max_size is calculated from the rate table. Now that the rate
table has been replaced, it is wrong to calculate max_size based on it;
doing so can lead to a wrong max_size.

The burst in the kernel may be lower than what the user asked for, because
burst may lose some precision when it is transformed to buffer (e.g.
"burst 40kb rate 30mbit/s"), and it seems we cannot avoid this loss. The
burst value (max_size) based on the rate table may be equal to what the user
asked for. If a packet's length is max_size, this packet will be stalled in
tbf_dequeue() because its length is above the burst in the kernel, so it
cannot get enough tokens. The max_size guards against enqueuing packet sizes
above q->buffer "time" in tbf_enqueue().

To be consistent with the calculation of tokens, this patch adds a helper,
psched_ns_t2l(), to calculate burst (max_size) directly and fix this problem.

After this fix, we can also support using 64bit rates to calculate burst.
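
The core of a time-to-length conversion of this kind can be sketched as below. This is a simplified illustration, not the kernel helper: `toy_ns_to_len()` and `TOY_NSEC_PER_SEC` are hypothetical names, and any link-layer or overhead adjustment is left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/*
 * Bytes that can be sent in time_in_ns at rate_bytes_ps bytes per second.
 * Simplified sketch: no overhead handling, and the 64-bit product can
 * overflow for very large inputs.
 */
static uint64_t toy_ns_to_len(uint64_t rate_bytes_ps, uint64_t time_in_ns)
{
	return time_in_ns * rate_bytes_ps / TOY_NSEC_PER_SEC;
}
```

For example, a buffer "time" of 1 ms at 125000000 bytes/s (1 Gbit/s) corresponds to a burst of 125000 bytes.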

Signed-off-by: Yang Yingliang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yang Yingliang
2013-12-12 04:08:41 +0800

24 Nov, 2013

1 commit

4d0820cf6 sch_tbf: handle too small burst

If a too small burst is inadvertently set on TBF, we might trigger
a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a
qdisc_reshape_fail() call.

tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate 50mbit

Fix the bug, and add a warning, as such configuration is not
going to work anyway for non GSO packets.

(For some reason, one has to use a burst >= 1520 to get a working
configuration, even with old kernels. This is a probable iproute2/tc
bug)

Based on a report and initial patch from Yang Yingliang

Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets")
Signed-off-by: Eric Dumazet
Reported-by: Yang Yingliang
Signed-off-by: David S. Miller

Eric Dumazet
2013-11-24 06:46:25 +0800

10 Nov, 2013

1 commit

a33c4a266 net_sched: tbf: support of 64bit rates

With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
Add two new attributes so that tc can use them to break the 32bit
limit.

Signed-off-by: Yang Yingliang
Suggested-by: Sergei Shtylyov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yang Yingliang
2013-11-10 03:53:37 +0800

21 Sep, 2013

1 commit

3e1e3aae1 net_sched: add u64 rate to psched_ratecfg_precompute()

Add an extra u64 rate parameter to psched_ratecfg_precompute()
so that some qdisc can opt-in for 64bit rates in the future,
to overcome the ~34 Gbits limit.

psched_ratecfg_getrate() reports a legacy structure to
tc utility, so if actual rate is above the 32bit rate field,
cap it to the 34Gbit limit.
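
The ~34 Gbit figure follows from a 32-bit rate field counted in bytes per second: (2^32 - 1) * 8 is about 34.36 Gbit/s. A minimal sketch of the capping, where `toy_legacy_rate()` and `toy_max_legacy_bits_ps()` are hypothetical names rather than the kernel's functions:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit bytes-per-second rate into a legacy 32-bit field. */
static uint32_t toy_legacy_rate(uint64_t rate_bytes_ps)
{
	return rate_bytes_ps > UINT32_MAX ? UINT32_MAX
					  : (uint32_t)rate_bytes_ps;
}

/* Largest rate the 32-bit field can express, in bits per second. */
static uint64_t toy_max_legacy_bits_ps(void)
{
	return (uint64_t)UINT32_MAX * 8;
}
```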

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2013-09-21 02:41:02 +0800

06 Jun, 2013

1 commit

6bc19fb82 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge. Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller

David S. Miller
2013-06-06 07:37:30 +0800

03 Jun, 2013

1 commit

01cb71d2d net_sched: restore "overhead xxx" handling

commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "overhead xxx" handling, as well as the "linklayer atm"
attribute.

tc class add ... htb rate X ceil Y linklayer atm overhead 10

This patch restores the "overhead xxx" handling, for htb, tbf
and act_police

The "linklayer atm" thing needs a separate fix.

Reported-by: Jesper Dangaard Brouer
Signed-off-by: Eric Dumazet
Cc: Vimalkumar
Cc: Jiri Pirko
Signed-off-by: David S. Miller

Eric Dumazet
2013-06-03 13:22:35 +0800

23 May, 2013

1 commit

e43ac79a4 sch_tbf: segment too big GSO packets

If a GSO packet has a length above tbf burst limit, the packet
is currently silently dropped.

The current way to handle this is to set the device in non GSO/TSO mode, or
to set high bursts, and it's suboptimal.

We can actually segment too big GSO packets, and send individual
segments as tbf parameters allow, allowing for better interoperability.

Signed-off-by: Eric Dumazet
Cc: Ben Hutchings
Cc: Jiri Pirko
Cc: Jamal Hadi Salim
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Eric Dumazet
2013-05-23 15:06:40 +0800

13 Feb, 2013

1 commit

b757c9336 tbf: improved accuracy at high rates

Current TBF uses rate table computed by the "tc" userspace program,
which has the following issue:

The rate table has 256 entries to map packet lengths to
token (time units). With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.

Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.

This is a followup to 56b765b79e9a78dc7d3f8850ba5e5567205a3ecd
("htb: improved accuracy at high rates").
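
The explicit computation can be illustrated with plain integer arithmetic. This is a sketch under assumptions: `toy_len_to_ns()` and `TOY_NSEC_PER_SEC` are hypothetical names, and a straightforward division stands in for whatever precomputed form the kernel actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/* Transmission time in nanoseconds for a packet of len bytes. */
static uint64_t toy_len_to_ns(uint64_t rate_bytes_ps, uint32_t len)
{
	assert(rate_bytes_ps != 0);
	return (uint64_t)len * TOY_NSEC_PER_SEC / rate_bytes_ps;
}
```

At 125000000 bytes/s (1 Gbit/s), a 1500-byte packet costs 12000 ns and a 65536-byte TSO packet costs 524288 ns; both are computed exactly, rather than being rounded to one of 256 table slots.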

Signed-off-by: Jiri Pirko
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Jiri Pirko
2013-02-13 07:59:45 +0800

02 Apr, 2012

1 commit

1b34ec43c pkt_sched: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
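
The difference is easy to see in a toy example. Everything here is hypothetical (`toy_buf`, `toy_put_u32()`, the `TOY_PUT_U32` macro); it only imitates the style of a macro with a hidden goto and is not the netlink API. In the macro version nothing at the call site shows that control may jump to `put_failure`; in the explicit version the error path is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tiny fixed-size buffer: room for exactly two 32-bit values. */
struct toy_buf {
	uint8_t data[8];
	size_t used;
};

static int toy_put_u32(struct toy_buf *b, uint32_t v)
{
	if (sizeof(b->data) - b->used < sizeof(v))
		return -1;
	memcpy(b->data + b->used, &v, sizeof(v));
	b->used += sizeof(v);
	return 0;
}

/* Old style: the macro hides a jump to a label the caller must define. */
#define TOY_PUT_U32(b, v) \
	do { if (toy_put_u32((b), (v)) < 0) goto put_failure; } while (0)

static int toy_dump_macro(struct toy_buf *b)
{
	TOY_PUT_U32(b, 1);
	TOY_PUT_U32(b, 2);
	TOY_PUT_U32(b, 3);	/* buffer is full: silently jumps away */
	return 0;

put_failure:
	return -1;
}

/* New style: the same logic with the control flow spelled out. */
static int toy_dump_explicit(struct toy_buf *b)
{
	if (toy_put_u32(b, 1) ||
	    toy_put_u32(b, 2) ||
	    toy_put_u32(b, 3))
		goto put_failure;
	return 0;

put_failure:
	return -1;
}
```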

Signed-off-by: David S. Miller

David S. Miller
2012-04-02 06:11:37 +0800

30 Dec, 2011

1 commit

b0460e448 sch_tbf: report backlog information

Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 948517 bytes 898 pkt (dropped 0, overlimits 0 requeues 1)
rate 175056bit 16pps backlog 114b 1p requeues 1
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 948517 bytes 898 pkt (dropped 15, overlimits 611 requeues 0)
backlog 18168b 12p requeues 0

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-30 04:07:21 +0800

25 Jan, 2011

1 commit

5bdc22a56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
net/sched/sch_hfsc.c
net/sched/sch_htb.c
net/sched/sch_tbf.c

David S. Miller
2011-01-25 06:09:35 +0800

21 Jan, 2011

2 commits

9190b3b32 net_sched: accurate bytes/packets stats/rates

In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts don't have
an effect on the dequeue() rate.
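
A toy FIFO shows why accounting at dequeue time stays accurate when a previously enqueued packet is dropped. The names (`toy_fifo`, `toy_fifo_enqueue()`, `toy_fifo_drop_head()`, `toy_fifo_dequeue()`) are hypothetical and the structure is deliberately minimal; it is a sketch of the idea, not the qdisc code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SLOTS 4u

struct toy_fifo {
	uint32_t lens[TOY_SLOTS];
	unsigned int head, count;
	uint64_t bytes, packets, drops;
};

static int toy_fifo_enqueue(struct toy_fifo *f, uint32_t len)
{
	if (f->count == TOY_SLOTS)
		return -1;
	f->lens[(f->head + f->count) % TOY_SLOTS] = len;
	f->count++;		/* no byte/packet accounting at enqueue */
	return 0;
}

/* Removes and returns the oldest entry's length. */
static uint32_t toy_fifo_pop(struct toy_fifo *f)
{
	uint32_t len;

	assert(f->count > 0);
	len = f->lens[f->head];
	f->head = (f->head + 1) % TOY_SLOTS;
	f->count--;
	return len;
}

/* Dropping a queued packet needs no correction of bytes/packets. */
static void toy_fifo_drop_head(struct toy_fifo *f)
{
	(void)toy_fifo_pop(f);
	f->drops++;
}

/* Only packets that actually leave the queue are counted. */
static uint32_t toy_fifo_dequeue(struct toy_fifo *f)
{
	uint32_t len = toy_fifo_pop(f);

	f->bytes += len;
	f->packets++;
	return len;
}
```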

Signed-off-by: Eric Dumazet
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-21 15:31:33 +0800

fd245a4ad net_sched: move TCQ_F_THROTTLED flag

In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit() allows using an
"unsigned int" for the __state container, reducing Qdisc size by 8 bytes.

Introduce helpers to hide implementation details.

Signed-off-by: Eric Dumazet
CC: Patrick McHardy
CC: Jesper Dangaard Brouer
CC: Jarek Poplawski
CC: Jamal Hadi Salim
CC: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-21 08:59:32 +0800

20 Jan, 2011

1 commit

cc7ec456f net_sched: cleanups

Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-20 15:31:12 +0800