12 Jul, 2013

2 commits

  • This patch removes the forward declaration of qfq_update_agg_ts, by moving
    the definition of the function above its first call. This patch also
    removes a useless forward declaration of qfq_schedule_agg.

    Reported-by: David S. Miller
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • In make_eligible, a mask is used to decide which groups must become eligible:
    the i-th group becomes eligible only if the i-th bit of the mask (from the
    right) is set. The mask is computed by left-shifting a 1 by a given number of
    places and decrementing the result. The shift is performed on a ULL to avoid
    problems in case the number of places to shift is higher than 31. On a 32-bit
    machine, this is more costly than working on a UL. This patch replaces that
    costly operation with two cheaper branches.

    The trick is based on the following fact: in case of a shift of at least 32
    places, the resulting mask has at least its 32 least significant bits set,
    whereas the total number of groups is lower than 32. As a consequence, in this
    case it is enough to just set the 32 least significant bits of the mask with a
    cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

    Reported-by: David S. Miller
    Reported-by: David Laight
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
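    The two-branch replacement described above can be sketched in plain C as
    follows (hypothetical helper name; the real code lives in sch_qfq.c and
    works on the group bitmaps directly):

```c
/* Hypothetical userspace sketch of the trick: instead of computing
 * (1ULL << shift) - 1 with a 64-bit shift, branch on the shift amount.
 * A shift of 32 or more would set at least the 32 least significant
 * bits, and there are fewer than 32 groups, so ~0UL is enough; for a
 * smaller shift, the operation is safe on an unsigned long even on
 * 32-bit machines. */
static unsigned long eligibility_mask(unsigned int shift)
{
	if (shift >= 32)
		return ~0UL;		/* covers every possible group */
	return (1UL << shift) - 1;	/* safe: shift < 32 */
}
```

    The branch predictor makes the two branches cheap on the hot path,
    while the 64-bit shift would be a multi-instruction sequence on a
    32-bit machine.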
     

04 Jul, 2013

1 commit

  • commit aec0a40a6f7884 ("netem: use rb tree to implement the time queue")
    introduced a regression: if a child qdisc is attached to netem, we perform
    a NULL dereference.

    Fix this by adding a temporary variable to cache
    netem_skb_cb(skb)->time_to_send.

    Reported-by: Dan Carpenter
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
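    The fix follows a common kernel pattern: read the field into a local
    variable before an operation that may invalidate the object holding it.
    A minimal userspace sketch (names hypothetical, not the netem code):

```c
struct pkt {
	long time_to_send;
};

/* Cache the field first: the child enqueue may modify or recycle the
 * packet, after which reading the field again would be unsafe. */
static long enqueue_and_report(struct pkt *p,
			       void (*child_enqueue)(struct pkt *))
{
	long tts = p->time_to_send;	/* cached before the call */

	child_enqueue(p);		/* may invalidate p's contents */
	return tts;
}

/* Stand-in child that clobbers the field, to show why caching matters. */
static void clobbering_child(struct pkt *p)
{
	p->time_to_send = -1;
}
```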
     

02 Jul, 2013

1 commit

  • The following typical setup, implementing a ~100 ms RTT and a large
    amount of reordering, has very poor performance because netem
    implements the time queue using a linked list.
    -----------------------------------------------------------
    ETH=eth0
    IFB=ifb0
    modprobe ifb
    ip link set dev $IFB up
    tc qdisc add dev $ETH ingress 2>/dev/null
    tc filter add dev $ETH parent ffff: \
    protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
    redirect dev $IFB
    ethtool -K $ETH gro off tso off gso off
    tc qdisc add dev $IFB root netem delay 50ms 10ms limit 100000
    tc qd add dev $ETH root netem delay 50ms limit 100000
    ---------------------------------------------------------

    Switch the netem time queue to an rb tree, so this kind of setup can work
    at high speed.

    Signed-off-by: Eric Dumazet
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
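    To see why the linked list collapses under such setups, here is a small
    userspace sketch (not netem code) that counts comparisons when time-ordered
    arrivals are inserted into a sorted singly linked list; each insert walks
    the whole list, giving O(N²) total work, whereas an rb tree does each
    insert in O(log N):

```c
#include <stddef.h>

struct ent {
	long tstamp;
	struct ent *next;
};

static long comparisons;

/* Insert e into the list sorted by ascending tstamp, counting how many
 * existing entries must be examined to find the insertion point. */
static struct ent *sorted_insert(struct ent *head, struct ent *e)
{
	struct ent **pp = &head;

	while (*pp) {
		comparisons++;
		if (e->tstamp < (*pp)->tstamp)
			break;
		pp = &(*pp)->next;
	}
	e->next = *pp;
	*pp = e;
	return head;
}
```

    With limit 100000, as in the setup above, the list variant performs on
    the order of 10^10 comparisons in the worst case.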
     

20 Jun, 2013

2 commits

  • htb_sched structures are big, and a source of false sharing on SMP.

    Every time a packet is queued or dequeued, many cache lines must be
    touched because the structures are not laid out properly.

    By carefully splitting htb_sched in two parts, and defining sub-structures
    to increase data locality, we can improve performance dramatically on
    SMP.

    New htb_prio structure can also be used in htb_class to increase data
    locality.

    I got a 26% performance increase on a 24-thread machine, with 200
    concurrent netperf in TCP_RR mode, using an HTB hierarchy of 4 classes.

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
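    The layout idea can be sketched in userspace C11 (field names are
    hypothetical, not the real htb_sched): keep read-mostly configuration
    apart from hot state, and start each group of hot fields on its own
    64-byte cache line so the enqueue and dequeue paths stop false-sharing
    lines.

```c
#include <stdalign.h>
#include <stddef.h>

struct toy_sched {
	/* read-mostly: written only at configuration time */
	int default_class;
	int rate2quantum;

	/* hot dequeue-side state, starting on its own cache line */
	alignas(64) long now;
	long near_ev_cache;

	/* hot enqueue-side state, on a different cache line */
	alignas(64) long direct_pkts;
};
```

    In the kernel the same effect is obtained with ____cacheline_aligned
    annotations; the point is that a writer of direct_pkts no longer
    invalidates the line holding now on other CPUs.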
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/Kconfig
    drivers/net/xen-netback/netback.c
    net/batman-adv/bat_iv_ogm.c
    net/wireless/nl80211.c

    The ath9k Kconfig conflict was a change of a Kconfig option name right
    next to the deletion of another option.

    The xen-netback conflict was overlapping changes involving the
    handling of the notify list in xen_netbk_rx_action().

    Batman conflict resolution provided by Antonio Quartulli, basically
    keep everything in both conflict hunks.

    The nl80211 conflict is a little more involved. In 'net' we added a
    dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
    Linus reported. Meanwhile in 'net-next' the handlers were converted
    to use pre and post doit handlers which use a flag to determine
    whether to hold the RTNL mutex around the operation.

    However, the dump handlers do not use this logic. Instead they have
    to do the locking explicitly. There were apparent bugs in the
    conversion of nl80211_dump_wiphy() in that we were not dropping the
    RTNL mutex in all the return paths, and it seems we very much should
    be doing so. So I fixed that whilst handling the overlapping changes.

    To simplify the initial returns, I take the RTNL mutex after we try
    to allocate 'tb'.

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Jun, 2013

1 commit

  • htb_class structures are big, and a source of false sharing on SMP.

    By carefully splitting them in two parts, we can improve performance.

    I got a 9% performance increase on a 24-thread machine, with 200
    concurrent netperf in TCP_RR mode, using an HTB hierarchy of 4 classes.

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Jun, 2013

2 commits

  • With a thousand htb classes, est_timer() spends ~5 million cpu cycles
    and thrashes the cpu cache, because each htb class has a default
    rate estimator (est 4sec 16sec).

    Most users do not use the default rate estimators, so switch htb
    to not set them up by default.

    Add a module parameter (htb_rate_est) so that users relying
    on the default rate estimator can restore the old behavior.

    echo 1 >/sys/module/sch_htb/parameters/htb_rate_est

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Before allowing 64-bit byte rates, refactor
    psched_ratecfg_precompute() to get better comments
    and increased accuracy.

    The rate_bps field is renamed to rate_bytes_ps, as we only
    have to worry about bytes per second.

    Signed-off-by: Eric Dumazet
    Cc: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Jun, 2013

1 commit

  • struct gnet_stats_rate_est contains u32 fields, so the bytes per second
    field can wrap at 34360Mbit.

    Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
    and switch the kernel to use this structure natively.

    This structure is dumped to user space as a new attribute :

    TCA_STATS_RATE_EST64

    The old tc command will now display the capped bps (at 34360Mbit) instead
    of wrapped values, and an updated tc command will display correct
    information.

    Old tc command output, after the patch:

    eric:~# tc -s -d qd sh dev lo
    qdisc pfifo 8001: root refcnt 2 limit 1000p
    Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
    rate 34360Mbit 189696pps backlog 0b 0p requeues 0

    This patch carefully reorganizes "struct Qdisc" layout to get optimal
    performance on SMP.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
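    The 34360Mbit figure is just the u32 limit expressed as a bit rate: a
    u32 bytes-per-second field tops out at 2^32 bytes/s, i.e. about
    34.36 Gbit/s (tc rounds the Mbit value up to 34360). A one-line check:

```c
#include <stdint.h>

/* Capacity of a u32 bytes-per-second field, in Mbit/s (truncated):
 * 2^32 bytes/s * 8 bits/byte = 34359738368 bit/s, ~34360 Mbit/s. */
static uint64_t u32_bps_cap_mbit(void)
{
	return (((uint64_t)UINT32_MAX + 1) * 8) / 1000000;
}
```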
     

08 Jun, 2013

1 commit


06 Jun, 2013

1 commit

  • Merge 'net' bug fixes into 'net-next' as we have patches
    that will build on top of them.

    This merge commit includes a change from Emil Goode
    (emilgoode@gmail.com) that fixes a warning that would
    have been introduced by this merge. Specifically it
    fixes the pingv6_ops method ipv6_chk_addr() to add a
    "const" to the "struct net_device *dev" argument and
    likewise update the dummy_ipv6_chk_addr() declaration.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Jun, 2013

1 commit

  • commit 56b765b79 ("htb: improved accuracy at high rates") added another
    regression for low rates, because it mixes 1ns and 64ns time units.

    So the maximum delay (mbuffer) was not 60 seconds, but 937 ms.

    Let's convert all time fields to 1ns, as 64-bit arches are becoming the
    norm.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Eric Dumazet
    Tested-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Jun, 2013

1 commit

  • commit 56b765b79 ("htb: improved accuracy at high rates")
    broke the "overhead xxx" handling, as well as the "linklayer atm"
    attribute.

    tc class add ... htb rate X ceil Y linklayer atm overhead 10

    This patch restores the "overhead xxx" handling for htb, tbf
    and act_police.

    The "linklayer atm" handling needs a separate fix.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Eric Dumazet
    Cc: Vimalkumar
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 May, 2013

1 commit

  • So far, only a net_device * could be passed along with a netdevice
    notifier event. This patch provides the possibility to pass a custom
    structure able to provide the info that an event listener needs to know.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
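    The shape of the change can be sketched as follows (illustrative,
    simplified types, not the kernel's exact definitions): every event
    payload embeds a common info header carrying the device pointer, so
    listeners can always recover the device, while specific events append
    extra fields after the header.

```c
struct toy_dev {
	int ifindex;
};

/* Common header, first member of every event-specific info struct. */
struct toy_notifier_info {
	struct toy_dev *dev;
};

/* Example event payload extending the common header. */
struct toy_notifier_change_info {
	struct toy_notifier_info info;	/* must be first */
	unsigned int flags_changed;	/* event-specific extra data */
};

/* The "shortened dev getter" mentioned in the changelog, sketched: */
static struct toy_dev *info_to_dev(const struct toy_notifier_info *info)
{
	return info->dev;
}
```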
     

23 May, 2013

1 commit

  • If a GSO packet has a length above the tbf burst limit, the packet
    is currently silently dropped.

    The current way to handle this is to set the device in non-GSO/TSO mode,
    or to set high bursts, and it's suboptimal.

    We can actually segment too-big GSO packets, and send individual
    segments as the tbf parameters allow, allowing for better interoperability.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Jiri Pirko
    Cc: Jamal Hadi Salim
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Eric Dumazet
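    A userspace model of the idea (hypothetical function, not the tbf code):
    segmenting an oversized GSO packet into MSS-sized pieces lets every piece
    that fits within the burst pass, instead of dropping the whole packet.

```c
/* Count how many mss-sized segments of a gso_len-byte packet would be
 * admitted by a bucket that drops any unit larger than burst bytes.
 * Without segmentation, a packet with gso_len > burst passes 0 times. */
static int admit_segments(int gso_len, int mss, int burst)
{
	int passed = 0;

	for (int off = 0; off < gso_len; off += mss) {
		int seg = gso_len - off < mss ? gso_len - off : mss;

		if (seg <= burst)
			passed++;
	}
	return passed;
}
```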
     

02 May, 2013

2 commits

  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configuring various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in the bnx2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to module_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     
  • Deal with changes in newer xtables while maintaining backward
    compatibility. Thanks to Jan Engelhardt for suggestions.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     

30 Apr, 2013

2 commits


23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Apr, 2013

2 commits


13 Apr, 2013

1 commit


08 Apr, 2013

1 commit


03 Apr, 2013

2 commits

  • Pull net into net-next to get the synchronize_net() bug fix in
    bonding.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently CBQ works incorrectly for limits > 10% of real link bandwidth,
    and practically does not work for limits > 50% of real link bandwidth.
    Below are the results of experiments taken on a 1 Gbit link:

    In shaper | Actual result
    ----------+---------------
         100M |      108 Mbps
         200M |      244 Mbps
         300M |      412 Mbps
         500M |      893 Mbps

    This happens because q->now changes incorrectly in cbq_dequeue():
    when it is called before the real end of packet transmission,
    L2T is greater than the real time delay, so q->now gets an extra boost
    but never compensates for it.

    To fix this problem we prevent q->now from changing until it is
    synchronized with real time.

    Signed-off-by: Vasily Averin
    Reviewed-by: Alexey Kuznetsov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vasily Averin
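    The essence of the fix can be sketched as a clamp (hypothetical helper;
    the real change is in cbq_dequeue() in net/sched/sch_cbq.c): the
    scheduler clock may only advance up to real time, so the extra boost
    from an early dequeue cannot accumulate.

```c
/* Hypothetical sketch: advance the scheduler clock q_now by the computed
 * transmission time incr, but never past real time, so an early dequeue
 * cannot give q_now an uncompensated boost. */
static long advance_clock(long q_now, long incr, long real_now)
{
	long next = q_now + incr;

	return next > real_now ? real_now : next;
}
```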
     

02 Apr, 2013

1 commit


30 Mar, 2013

1 commit


29 Mar, 2013

1 commit


28 Mar, 2013

1 commit

  • It seems that commit

    commit 292f1c7ff6cc10516076ceeea45ed11833bb71c7
    Author: Jiri Pirko
    Date: Tue Feb 12 00:12:03 2013 +0000

    sch: make htb_rate_cfg and functions around that generic

    adds a small regression.

    Before:

    # tc qdisc add dev eth0 root handle 1: htb default ffff
    # tc class add dev eth0 classid 1:ffff htb rate 5Gbit
    # tc -s class show dev eth0
    class htb 1:ffff root prio 0 rate 5000Mbit ceil 5000Mbit burst 625b cburst
    625b
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0
    lended: 0 borrowed: 0 giants: 0
    tokens: 31 ctokens: 31

    After:

    # tc qdisc add dev eth0 root handle 1: htb default ffff
    # tc class add dev eth0 classid 1:ffff htb rate 5Gbit
    # tc -s class show dev eth0
    class htb 1:ffff root prio 0 rate 1544Mbit ceil 1544Mbit burst 625b cburst
    625b
    Sent 5073 bytes 41 pkt (dropped 0, overlimits 0 requeues 0)
    rate 1976bit 2pps backlog 0b 0p requeues 0
    lended: 41 borrowed: 0 giants: 0
    tokens: 1802 ctokens: 1802

    This is probably due to a lost u64 cast of the rate parameter in
    psched_ratecfg_precompute() (net/sched/sch_generic.c).

    Signed-off-by: Sergey Popovich
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sergey Popovich
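    The class of bug here is plain 32-bit truncation. This sketch shows how
    converting a byte rate to a bit rate without a u64 cast silently wraps
    (the exact figures in the dump above also involve the precomputed
    mult/shift pair, which this sketch does not model):

```c
#include <stdint.h>

/* 5 Gbit/s is 625,000,000 bytes/s.  Shifting left by 3 (x8, bytes to
 * bits) on a 32-bit type wraps past 2^32; casting to u64 first, as the
 * fix does for the rate parameter, preserves the value. */
static uint32_t to_bps_u32(uint32_t bytes_per_sec)
{
	return bytes_per_sec << 3;		/* truncates above 2^32 */
}

static uint64_t to_bps_u64(uint32_t bytes_per_sec)
{
	return (uint64_t)bytes_per_sec << 3;	/* correct */
}
```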
     

27 Mar, 2013

1 commit

  • When the legacy array rtm_min still existed, the length check within
    these functions was covered by rtm_min[RTM_NEWTFILTER],
    rtm_min[RTM_NEWQDISC] and rtm_min[RTM_NEWTCLASS].

    But after Thomas Graf removed rtm_min several days ago, these checks
    are missing. The other doit functions should be OK.

    Signed-off-by: Hong Zhiguo
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Hong zhi guo
     

22 Mar, 2013

1 commit


12 Mar, 2013

1 commit


07 Mar, 2013

1 commit

  • HTB uses an internal pfifo queue whose limit is not reported
    to userland tools (tc); its value is inherited from the device
    tx_queue_len at setup time.

    Introduce TCA_HTB_DIRECT_QLEN attribute to allow finer control.

    Remove two obsolete pr_err() calls as well.

    Signed-off-by: Eric Dumazet
    Cc: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Mar, 2013

5 commits

  • QFQ+ can select for service only 'eligible' aggregates, i.e.,
    aggregates that would have started to be served also in the emulated
    ideal system. As a consequence, for QFQ+ to be work conserving, at
    least one of the active aggregates must be eligible when it is time to
    choose the next aggregate to serve.

    The set of eligible aggregates is updated through the function
    qfq_update_eligible(), which does guarantee that, after its
    invocation, at least one of the active aggregates is eligible.
    Because of this property, this function is invoked in
    qfq_deactivate_agg() to guarantee that at least one of the active
    aggregates is still eligible after an aggregate has been deactivated.
    In particular, the critical case is when there are other active
    aggregates, but the aggregate being deactivated happens to be the only
    one eligible.

    However, this precaution is not needed for QFQ+ to be work conserving,
    because update_eligible() is always invoked also at the beginning of
    qfq_choose_next_agg(). This patch removes the additional invocation of
    update_eligible() in qfq_deactivate_agg().

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • By definition of (the algorithm of) QFQ+, the system virtual time must
    be pushed up only if there is no 'eligible' aggregate, i.e. no
    aggregate that would have started to be served also in the ideal
    system emulated by QFQ+. QFQ+ serves only eligible aggregates, hence
    the aggregate currently in service is eligible. As a consequence, to
    decide whether there is no eligible aggregate, QFQ+ must also check
    whether there is no aggregate in service.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
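    The condition the patch describes reduces to a two-part check (sketched
    here with hypothetical names; the real code inspects the ER group bitmap
    and the in-service aggregate pointer in net/sched/sch_qfq.c):

```c
/* Push up the system virtual time V only when no aggregate is eligible
 * AND no aggregate is in service: the in-service aggregate is by
 * definition eligible, so omitting the second check pushes V up too
 * eagerly. */
static int must_push_up_v(unsigned long eligible_bitmap, int agg_in_service)
{
	return eligible_bitmap == 0 && !agg_in_service;
}
```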
     
  • Aggregate budgets are computed so as to guarantee that, after an
    aggregate has been selected for service, that aggregate has enough
    budget to serve at least one maximum-size packet for the classes it
    contains. For this reason, after a new aggregate has been selected
    for service, its next packet is immediately dequeued, without any
    further control.

    The maximum packet size for a class, lmax, can be changed through
    qfq_change_class(). In case the user sets lmax to a lower value than
    the size of some of the still-to-arrive packets, QFQ+ will
    automatically push up lmax as it enqueues these packets. This
    automatic push-up is likely to happen with TSO/GSO.

    In any case, if lmax is assigned a lower value than the size of some
    of the packets already enqueued for the class, then the following
    problem may occur: the size of the next packet to dequeue for the
    class may happen to be larger than lmax, after the aggregate to which
    the class belongs has been just selected for service. In this case,
    even the budget of the aggregate, which is an unsigned value, may be
    lower than the size of the next packet to dequeue. After dequeueing
    this packet and subtracting its size from the budget, the latter would
    wrap around.

    This fix prevents the budget from wrapping around after any packet
    dequeue.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
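    The wrap-around hazard and its guard can be shown in two lines
    (hypothetical helper; the real fix checks the packet length against the
    aggregate budget in net/sched/sch_qfq.c before subtracting):

```c
#include <stdint.h>

/* Unsigned subtraction wraps: 1000 - 1500 on a u32 yields 4294966796.
 * Guarding the subtraction keeps the budget sane even when lmax was
 * lowered below the size of an already-enqueued packet. */
static uint32_t charge_budget(uint32_t budget, uint32_t pkt_len)
{
	return pkt_len >= budget ? 0 : budget - pkt_len;
}
```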
     
  • If no aggregate is in service, then the function qfq_dequeue() does
    not dequeue any packet. For this reason, to guarantee QFQ+ to be work
    conserving, a just-activated aggregate must be set as in service
    immediately if it happens to be the only active aggregate.
    This is done by the function qfq_enqueue().

    Unfortunately, the function qfq_add_to_agg(), used to add a class to
    an aggregate, does not perform this important additional operation.
    In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
    of a class from a source aggregate, which becomes inactive as a result,
    to a destination aggregate, which becomes active instead, and 2) the
    destination aggregate becomes the only active aggregate, then that
    aggregate is nevertheless not set as in service. QFQ+ then remains in a
    non-work-conserving state until a new invocation of qfq_enqueue()
    recovers the situation.

    This fix solves the problem by moving the logic for setting an
    aggregate as in service directly into the function qfq_activate_agg().
    Hence, from whatever point qfq_activate_agg() is invoked, QFQ+
    remains work conserving. Since the more complex logic of this new
    version of the function is not necessary in qfq_dequeue() to
    reschedule an aggregate that finishes its budget, that aggregate is
    now rescheduled by invoking the needed functions directly.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • Between two invocations of make_eligible, the system virtual time may
    happen to grow enough that, in its binary representation, a bit with
    higher order than 31 flips. This happens especially with TSO/GSO.
    Before this fix, the mask used in make_eligible was computed by
    left-shifting 1UL, which overflows on 32-bit machines when the number
    of places to shift is higher than 31.
    The fix just replaces 1UL with 1ULL.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente