Eric Lee / smarc-fsl-linux-kernel

21 Mar, 2012

2 commits

3b59bf081 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking merge from David Miller:
"1) Move ixgbe driver over to purely page based buffering on receive.
From Alexander Duyck.

2) Add receive packet steering support to e1000e, from Bruce Allan.

3) Convert TCP MD5 support over to RCU, from Eric Dumazet.

4) Reduce cpu usage in handling out-of-order TCP packets on modern
systems, also from Eric Dumazet.

5) Support the IP{,V6}_UNICAST_IF socket options, making the wine
folks happy, from Erich Hoover.

6) Support VLAN trunking from guests in hyperv driver, from Haiyang
Zhang.

7) Support byte-queue-limtis in r8169, from Igor Maravic.

8) Outline code intended for IP_RECVTOS in IP_PKTOPTIONS existed but
was never properly implemented, Jiri Benc fixed that.

9) 64-bit statistics support in r8169 and 8139too, from Junchang Wang.

10) Support kernel side dump filtering by ctmark in netfilter
ctnetlink, from Pablo Neira Ayuso.

11) Support byte-queue-limits in gianfar driver, from Paul Gortmaker.

12) Add new peek socket options to assist with socket migration, from
Pavel Emelyanov.

13) Add sch_plug packet scheduler whose queue is controlled by
userland daemons using explicit freeze and release commands. From
Shriram Rajagopalan.

14) Fix FCOE checksum offload handling on transmit, from Yi Zou."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1846 commits)
Fix pppol2tp getsockname()
Remove printk from rds_sendmsg
ipv6: fix incorrent ipv6 ipsec packet fragment
cpsw: Hook up default ndo_change_mtu.
net: qmi_wwan: fix build error due to cdc-wdm dependecy
netdev: driver: ethernet: Add TI CPSW driver
netdev: driver: ethernet: add cpsw address lookup engine support
phy: add am79c874 PHY support
mlx4_core: fix race on comm channel
bonding: send igmp report for its master
fs_enet: Add MPC5125 FEC support and PHY interface selection
net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
net: update the usage of CHECKSUM_UNNECESSARY
fcoe: use CHECKSUM_UNNECESSARY instead of CHECKSUM_PARTIAL on tx
net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso
ixgbe: Fix issues with SR-IOV loopback when flow control is disabled
net/hyperv: Fix the code handling tx busy
ixgbe: fix namespace issues when FCoE/DCB is not enabled
rtlwifi: Remove unused ETH_ADDR_LEN defines
igbvf: Use ETH_ALEN
...

Fix up fairly trivial conflicts in drivers/isdn/gigaset/interface.c and
drivers/net/usb/{Kconfig,qmi_wwan.c} as per David.

Linus Torvalds
2012-03-21 12:04:47 +0800
0d9cabdcc Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup changes from Tejun Heo:
"Out of the 8 commits, one fixes a long-standing locking issue around
tasklist walking and others are cleanups."

* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
cgroup: remove cgroup_subsys argument from callbacks
cgroup: remove extra calls to find_existing_css_set
cgroup: replace tasklist_lock with rcu_read_lock
cgroup: simplify double-check locking in cgroup_attach_proc
cgroup: move struct cgroup_pidlist out from the header file
cgroup: remove cgroup_attach_task_current_cg()

Linus Torvalds
2012-03-21 09:11:21 +0800

19 Mar, 2012

1 commit

4da0bd736 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2012-03-19 11:29:41 +0800

16 Mar, 2012

1 commit

cc34eb672 sch_sfq: revert dont put new flow at the end of flows ... Browse Code »

This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, patch sounded great but has bad side effects.

In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.

Reported-by: Jesper Dangaard Brouer
Cc: Dave Taht
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-03-16 16:55:25 +0800

27 Feb, 2012

1 commit

ff4783ce7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller

David S. Miller
2012-02-27 10:55:51 +0800

20 Feb, 2012

1 commit

cd961c2ca netem: fix dequeue ... Browse Code »

commit 50612537e9 (netem: fix classful handling) added two errors in
netem_dequeue()

1) After checking skb at the head of tfifo queue for time constraints,
it dequeues tail skb, thus adding unwanted reordering.

2) qdisc stats are updated twice per packet
(one when packet dequeued from tfifo, once when delivered)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-20 07:57:50 +0800

14 Feb, 2012

1 commit

2132cf643 net_sched: sch_plug: plug_qdisc_ops is static ... Browse Code »

net/sched/sch_plug.c:211:18: warning: symbol 'plug_qdisc_ops' was not
declared. Should it be static?

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-14 05:04:40 +0800

10 Feb, 2012

1 commit

16bda13d9 net: Make qdisc_skb_cb upper size bound explicit. ... Browse Code »
1

Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller

David S. Miller
2012-02-10 02:50:34 +0800

08 Feb, 2012

1 commit

c3059be16 net/sched: sch_plug - Queue traffic until an explicit release command ... Browse Code »

The qdisc supports two operations - plug and unplug. When the
qdisc receives a plug command via netlink request, packets arriving
henceforth are buffered until a corresponding unplug command is received.
Depending on the type of unplug command, the queue can be unplugged
indefinitely or selectively.

This qdisc can be used to implement output buffering, an essential
functionality required for consistent recovery in checkpoint based
fault-tolerance systems. Output buffering enables speculative execution
by allowing generated network traffic to be rolled back. It is used to
provide network protection for Xen Guests in the Remus high availability
project, available as part of Xen.

This module is generic enough to be used by any other system that wishes
to add speculative execution and output buffering to its applications.

This module was originally available in the linux 2.6.32 PV-OPS tree,
used as dom0 for Xen.

For more information, please refer to http://nss.cs.ubc.ca/remus/
and http://wiki.xensource.com/xenwiki/Remus

Changes in V3:
* Removed debug output (printk) on queue overflow
* Added TCQ_PLUG_RELEASE_INDEFINITE - that allows the user to
use this qdisc, for simple plug/unplug operations.
* Use of packet counts instead of pointers to keep track of
the buffers in the queue.

Signed-off-by: Shriram Rajagopalan
Signed-off-by: Brendan Cully
[author of the code in the linux 2.6.32 pvops tree]
Signed-off-by: David S. Miller

Shriram Rajagopalan
2012-02-08 01:54:56 +0800

07 Feb, 2012

1 commit

a0417fa3a net: Make qdisc_skb_cb upper size bound explicit. ... Browse Code »
93

Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller

David S. Miller
2012-02-07 04:14:37 +0800

03 Feb, 2012

1 commit

761b3ef50 cgroup: remove cgroup_subsys argument from callbacks ... Browse Code »
46

The argument is not used at all, and it's not necessary, because
a specific callback handler of course knows which subsys it
belongs to.

Now only ->pupulate() takes this argument, because the handlers of
this callback always call cgroup_add_file()/cgroup_add_files().

So we reduce a few lines of code, though the shrinking of object size
is minimal.

16 files changed, 113 insertions(+), 162 deletions(-)

text data bss dec hex filename
5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig
5486170 656987 7039960 13183117 c9288d vmlinux.o

Signed-off-by: Li Zefan
Signed-off-by: Tejun Heo

Li Zefan
2012-02-03 01:20:22 +0800

23 Jan, 2012

1 commit

a42b4799c netem: Fix off-by-one bug in reordering ... Browse Code »

With netem reordering, a gap of N is supposed to reorder every Nth packet with
given reorder probability. However, the code currently skips N packets and
reorders every (N+1)th packet.

Signed-off-by: Vijay Subramanian
Signed-off-by: David S. Miller

Vijay Subramanian
2012-01-23 04:08:44 +0800

13 Jan, 2012

1 commit

ddecf0f4d net_sched: sfq: add optional RED on top of SFQ ... Browse Code »

Adds an optional Random Early Detection on each SFQ flow queue.

Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.

1) We dont handle the idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.

2) if headdrop is selected, we try to ecn mark first packet instead of
currently enqueued packet. This gives faster feedback for tcp flows
compared to traditional RED [ marking the last packet in queue ]

Example of use :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
limit 3000 headdrop flows 512 divisor 16384 \
redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
ewma 6 min 8000b max 60000b probability 0.2 ecn
prob_mark 0 prob_mark_head 4876 prob_drop 6131
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
requeues 0)
rate 99483Kbit 8219pps backlog 689392b 456p requeues 0

In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
flows, we can see number of packets CE marked is smaller than number of
drops (for non ECN flows)

If same test is run, without RED, we can check backlog is much bigger.

qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0

Signed-off-by: Eric Dumazet
CC: Stephen Hemminger
CC: Dave Taht
Tested-by: Dave Taht
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-13 12:05:28 +0800

06 Jan, 2012

3 commits

eeca6688d net_sched: red: split red_parms into parms and vars ... Browse Code »

This patch splits the red_parms structure into two components.

One holding the RED 'constant' parameters, and one containing the
variables.

This permits a size reduction of GRED qdisc, and is a preliminary step
to add an optional RED unit to SFQ.

SFQRED will have a single red_parms structure shared by all flows, and a
private red_vars per flow.

Signed-off-by: Eric Dumazet
CC: Dave Taht
CC: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-06 03:01:21 +0800
18cb80985 net_sched: sfq: extend limits ... Browse Code »

SFQ as implemented in Linux is very limited, with at most 127 flows
and limit of 127 packets. [ So if 127 flows are active, we have one
packet per flow ]

This patch brings to SFQ following features to cope with modern needs.

- Ability to specify a smaller per flow limit of inflight packets.
(default value being at 127 packets)

- Ability to have up to 65408 active flows (instead of 127)

- Ability to have head drops instead of tail drops
(to drop old packets from a flow)

Example of use : No more than 20 packets per flow, max 8000 flows, max
20000 packets in SFQ qdisc, hash table of 65536 slots.

tc qdisc add ... sfq \
flows 8000 \
depth 20 \
headdrop \
limit 20000 \
divisor 65536

Ram usage :

2 bytes per hash table entry (instead of previous 1 byte/entry)
32 bytes per flow on 64bit arches, instead of 384 for QFQ, so much
better cache hit ratio.

Signed-off-by: Eric Dumazet
CC: Dave Taht
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-06 03:01:21 +0800
eb1019244 net_sched: Bug in netem reordering ... Browse Code »
1

Not now, but it looks you are correct. q->qdisc is NULL until another
additional qdisc is attached (beside tfifo). See 50612537e9ab2969312.
The following patch should work.

From: Hagen Paul Pfeifer

netem: catch NULL pointer by updating the real qdisc statistic

Reported-by: Vijay Subramanian
Signed-off-by: Hagen Paul Pfeifer
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hagen Paul Pfeifer
2012-01-06 02:27:39 +0800

05 Jan, 2012

3 commits

117ff42fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2012-01-05 10:35:43 +0800
02a9098ed net_sched: sfq: always randomize hash perturbation ... Browse Code »

SFQ q->perturbation is used in sfq_hash() as an input to Jenkins hash.

We currently randomize this 32bit value only if a perturbation timer is
setup.

Its much better to always initialize it to defeat attackers, or else
they can predict very well what kind of packets they have to forge to
hit a particular flow.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-05 03:12:48 +0800
bd16a6cce net_sched: sfq: fix mem alloc error recovery ... Browse Code »

Since commit 817fb15dfd98 (net_sched: sfq: allow divisor to be a
parameter), we can leave perturbation timer armed if a memory allocation
error aborts sfq_init().

Memory containing active struct timer_list is freed and kernel can
crash.

Call sfq_destroy() from sfq_init() to properly dismantle qdisc.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-05 03:12:48 +0800

04 Jan, 2012

4 commits

fa0f5aa74 net_sched: qdisc_alloc_handle() can be too slow ... Browse Code »

When trying to allocate ~32768 qdiscs using autohandle mechanism, we can
fill the space managed by kernel (handles in [8000-FFFF]:0000 range)

But O(N^2) qdisc_alloc_handle() loops 0x10000 times instead of 0x8000

time tc add qdisc add dev eth0 parent 10:7fff pfifo limit 10
RTNETLINK answers: Cannot allocate memory
real 1m54.826s
user 0m0.000s
sys 0m0.004s

INFO: rcu_sched_state detected stall on CPU 0 (t=60000 jiffies)

Half number of loops, and add a cond_resched() call.
We hold rtnl at this point.

Signed-off-by: Eric Dumazet
CC: Dave Taht
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-04 02:03:20 +0800
d32ae76f2 sch_qfq: accurate wsum handling ... Browse Code »

We can underestimate q->wsum in case of "tc class replace ... qfq"
and/or qdisc_create_dflt() error.

wsum is not really used in fast path, only at qfq qdisc/class setup,
to catch user error.

Signed-off-by: Eric Dumazet
CC: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-04 02:02:19 +0800
6bafcac32 sch_qfq: fix overflow in qfq_update_start() ... Browse Code »

grp->slot_shift is between 22 and 41, so using 32bit wide variables is
probably a typo.

This could explain QFQ hangs Dave reported to me, after 2^23 packets ?

(23 = 64 - 41)

Reported-by: Dave Taht
Signed-off-by: Eric Dumazet
CC: Stephen Hemminger
CC: Dave Taht
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-04 01:58:23 +0800
d47a0ac7b sch_sfq: dont put new flow at the end of flows ... Browse Code »
46

SFQ enqueue algo puts a new flow _behind_ all pre-existing flows in the
circular list. In fact this is probably an old SFQ implementation bug.

100 Mbits = ~8333 full frames per second, or ~8 frames per ms.

With 50 flows, it means your "new flow" will have to wait 50 packets
being sent before its own packet. Thats the ~6ms.

We certainly can change SFQ to give a priority advantage to new flows,
so that next dequeued packet is taken from a new flow, not an old one.

Reported-by: Dave Taht
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-04 01:52:09 +0800

31 Dec, 2011

2 commits

50612537e netem: fix classful handling ... Browse Code »
47

Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior :

netem uses an t(ime)fifo queue, and store timestamps in skb->cb[]

If qdisc is changed, time constraints are not respected and other qdisc
can destroy skb->cb[] and block netem at dequeue time.

Fix this by always using internal tfifo, and optionally attach a child
qdisc to netem (or a tree of qdiscs)

Example of use :

DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
burst 20480 limit 20480 mtu 1514 rate 32000bps

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
backlog 0b 5p requeues 0

Signed-off-by: Eric Dumazet
CC: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-31 06:12:23 +0800
7f8e3234c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2011-12-31 02:04:14 +0800

30 Dec, 2011

1 commit

b0460e448 sch_tbf: report backlog information ... Browse Code »

Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 948517 bytes 898 pkt (dropped 0, overlimits 0 requeues 1)
rate 175056bit 16pps backlog 114b 1p requeues 1
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 948517 bytes 898 pkt (dropped 15, overlimits 611 requeues 0)
backlog 18168b 12p requeues 0

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-30 04:07:21 +0800

25 Dec, 2011

1 commit

bb52c7acf netem: dont call vfree() under spinlock and BH disabled ... Browse Code »

commit 6373a9a286 (netem: use vmalloc for distribution table) added a
regression, since vfree() is called while holding a spinlock and BH
being disabled.

Fix this by doing the pointers swap in critical section, and freeing
after spinlock release.

Also add __GFP_NOWARN to the kmalloc() try, since we fallback to
vmalloc().

Signed-off-by: Eric Dumazet
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-25 05:08:50 +0800

24 Dec, 2011

3 commits

abb434cb0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller

David S. Miller
2011-12-24 06:13:56 +0800
2494654d4 netem: loss model API sizes ... Browse Code »

The new netem loss model is configured with nested netlink messages.
This code is being overly strict about sizes, and is easily confused
by padding (or possible future expansion). Also message
for gemodel is incorrect.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

stephen hemminger
2011-12-24 05:51:18 +0800
f5a59b733 sch_hfsc: report backlog information ... Browse Code »

Add backlog (byte count) information in hfsc classes and qdisc, so that
"tc -s" can report it to user, instead of 0 values :

qdisc hfsc 1: root refcnt 6 default 20
Sent 45141660 bytes 30545 pkt (dropped 0, overlimits 91751 requeues 0)
rate 1492Kbit 126pps backlog 103226b 74p requeues 0
...
class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 400000bit ls m1 0bit d 0us m2 200000bit
Sent 49534912 bytes 33519 pkt (dropped 0, overlimits 0 requeues 0)
backlog 81822b 56p requeues 0
period 23 work 49451576 bytes rtwork 13277552 bytes level 0
...

Signed-off-by: Eric Dumazet
CC: John A. Sullivan III
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-24 05:51:18 +0800

23 Dec, 2011

1 commit

7838f2ce3 mqprio: Avoid panic if no options are provided ... Browse Code »
1

Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2011-12-23 11:34:56 +0800

22 Dec, 2011

1 commit

225d9b89c sch_sfq: rehash queues in perturb timer ... Browse Code »

A known Out Of Order (OOO) problem hurts SFQ when timer changes
perturbation value, since all new packets delivered to SFQ enqueue might
end on different slots than previous in-flight packets.

With round robin delivery, we can thus deliver packets in a different
order.

Since SFQ is limited to small amount of in-flight packets, we can rehash
packets so that this OOO problem is fixed.

This rehashing is performed only if internal flow classifier is in use.

We now store in skb->cb[] the "struct flow_keys" so that we dont call
skb_flow_dissect() again while rehashing.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-22 04:44:34 +0800

17 Dec, 2011

1 commit

869aa4104 sch_gred: prefer GFP_KERNEL allocations ... Browse Code »

In control path, its better to use GFP_KERNEL allocations where
possible.

Before taking qdisc spinlock, we preallocate memory just in case we'll
need it in gred_change_vq()

This is a followup to commit 3f1e6d3fd37b (sch_gred: should not use
GFP_KERNEL while holding a spinlock)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-17 04:40:33 +0800

16 Dec, 2011

1 commit

b26e478f8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »
46

Conflicts:
drivers/net/ethernet/freescale/fsl_pq_mdio.c
net/batman-adv/translation-table.c
net/ipv6/route.c

David S. Miller
2011-12-16 15:11:14 +0800

15 Dec, 2011

1 commit

3a53943b5 cls_flow: remove one dynamic array ... Browse Code »

Its better to use a predefined size for this small automatic variable.

Removes a sparse error as well :

net/sched/cls_flow.c:288:13: error: bad constant expression

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-15 02:34:55 +0800

13 Dec, 2011

2 commits

90b41a1cd netem: add cell concept to simulate special MAC behavior ... Browse Code »

This extension can be used to simulate special link layer
characteristics. Simulate because packet data is not modified, only the
calculation base is changed to delay a packet based on the original
packet size and artificial cell information.

packet_overhead can be used to simulate a link layer header compression
scheme (e.g. set packet_overhead to -20) or with a positive
packet_overhead value an additional MAC header can be simulated. It is
also possible to "replace" the 14 byte Ethernet header with something
else.

cell_size and cell_overhead can be used to simulate link layer schemes,
based on cells, like some TDMA schemes. Another application area are MAC
schemes using a link layer fragmentation with a (small) header each.
Cell size is the maximum amount of data bytes within one cell. Cell
overhead is an additional variable to change the per-cell-overhead
(e.g. 5 byte header per fragment).

Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
cell overhead 5 byte):

tc qdisc add dev eth0 root netem rate 5kbit 20 100 5

Signed-off-by: Hagen Paul Pfeifer
Signed-off-by: Florian Westphal
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Hagen Paul Pfeifer
2011-12-13 08:44:48 +0800
3f1e6d3fd sch_gred: should not use GFP_KERNEL while holding a spinlock ... Browse Code »
1

gred_change_vq() is called under sch_tree_lock(sch).

This means a spinlock is held, and we are not allowed to sleep in this
context.

We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-13 08:08:54 +0800

10 Dec, 2011

1 commit

a73ed26bb sch_red: generalize accurate MAX_P support to RED/GRED/CHOKE ... Browse Code »

Now RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are non aware of new attributes, and still set/get Plog.

New tc binary set/get both Plog and max_p for backward compatibility,
they display "probability value" if they get max_p from new kernels.

# tc -d qdisc show dev ...
...
qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-10 02:46:15 +0800

09 Dec, 2011

1 commit

8af2a218d sch_red: Adaptative RED AQM ... Browse Code »

Adaptative RED AQM for linux, based on paper from Sally FLoyd,
Ramakrishna Gummadi, and Scott Shenker, August 2001 :

http://icir.org/floyd/papers/adaptiveRed.pdf

Goal of Adaptative RED is to make max_p a dynamic value between 1% and
50% to reach the target average queue : (max_th - min_th) / 2

Every 500 ms:
if (avg > target and max_p < target and max_p >= 0.01)
decrease max_p : max_p *= beta;

target :[min_th + 0.4*(min_th - max_th),
min_th + 0.6*(min_th - max_th)].
alpha : min(0.01, max_p / 4)
beta : 0.9
max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)

Changes against our RED implementation are :

max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
fixed point number, to allow full range described in Adatative paper.

To deliver a random number, we now use a reciprocal divide (thats really
a multiply), but this operation is done once per marked/droped packet
when in RED_BETWEEN_TRESH window, so added cost (compared to previous
AND operation) is near zero.

dump operation gives current max_p value in a new TCA_RED_MAX_P
attribute.

Example on a 10Mbit link :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
limit 400000 min 30000 max 90000 avpkt 1000 \
burst 55 ecn adaptative bandwidth 10Mbit

# tc -s -d qdisc show dev eth3
...
qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
adaptative ewma 5 max_p=0.113335 Scell_log 15
Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
rate 9749Kbit 831pps backlog 72056b 16p requeues 0
marked 1357 early 35 pdrop 0 other 0

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-09 08:52:43 +0800

06 Dec, 2011

1 commit

272174550 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. ... Browse Code »

To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller
Acked-by: Roland Dreier

David Miller
2011-12-06 04:20:19 +0800