Eric Lee / smarc-fsl-linux-kernel

05 Apr, 2011

1 commit

0545a3037 pkt_sched: QFQ - quick fair queue scheduler ... Browse Code »

This is an implementation of the Quick Fair Queue scheduler developed
by Fabio Checconi. The same algorithm is already implemented in ipfw
in FreeBSD. Fabio had an earlier version developed on Linux, I just
cleaned it up. Thanks to Eric Dumazet for testing this under load.

Signed-off-by: Stephen Hemminger
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

stephen hemminger
2011-04-05 02:10:24 +0800

24 Feb, 2011

1 commit

e13e02a3c net_sched: SFB flow scheduler ... Browse Code »

This is the Stochastic Fair Blue scheduler, based on work from :

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

General SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
if (B[i][h{i}].qlen > bin_size)
B[i][h{i}].p_mark += p_increment;
else if (B[i][h{i}].qlen == 0)
B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
ratelimit();
else
mark/drop with probabilty p_min;

I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $DEV parent 1:11 handle 11: \
est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet
CC: Juliusz Chroboczek
CC: Stephen Hemminger
CC: Patrick McHardy
CC: Andi Kleen
CC: John W. Linville
Signed-off-by: David S. Miller

Eric Dumazet
2011-02-24 06:05:11 +0800

03 Feb, 2011

1 commit

45e144339 sched: CHOKe flow scheduler ... Browse Code »

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Exponential Drop (RED) algorithm.

The core idea is:
For every packet arrival:
Calculate Qave
if (Qave < minth)
Queue the new packet
else
Select randomly a packet from the queue
if (both packets from same flow)
then Drop both the packets
else if (Qave > maxth)
Drop packet
else
Admit packet with proability p (same as RED)

See also:
Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
queue management scheme for approximating fair bandwidth allocation",
Proceeding of INFOCOM'2000, March 2000.

Help from:
Eric Dumazet
Patrick McHardy

Signed-off-by: Stephen Hemminger
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

stephen hemminger
2011-02-03 12:52:42 +0800

20 Jan, 2011

1 commit

b8970f0bf net_sched: implement a root container qdisc sch_mqprio ... Browse Code »

This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters,

struct tc_mqprio_qopt {
__u8 num_tc;
__u8 prio_tc_map[TC_BITMASK + 1];
__u8 hw;
__u16 count[TC_MAX_QUEUE];
__u16 offset[TC_MAX_QUEUE];
};

Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller

John Fastabend
2011-01-20 15:31:11 +0800

20 Aug, 2010

1 commit

eb4d40654 net/sched: add ACT_CSUM action to update packets checksums ... Browse Code »

net/sched: add ACT_CSUM action to update packets checksums

ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
altered checksums in IPv4 and IPv6 packets. The following checksums are
supported by this patch:
- IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
- IPv6: ICMPv6, TCP, UDP & UDPLite
It's possible to request in the same action to update different kind of
checksums, if the packets flow mix TCP, UDP and UDPLite, ...

An example of usage is done in the associated iproute2 patch.

Version 3 changes:
- remove useless goto instructions
- improve IPv6 hop options decoding

Version 2 changes:
- coding style correction
- remove useless arguments of some functions
- use stack in tcf_csum_dump()
- add tcf_csum_skb_nextlayer() to factor code

Signed-off-by: Gregoire Baron
Acked-by: jamal
Signed-off-by: David S. Miller

Grégoire Baron
2010-08-20 16:42:59 +0800

06 Sep, 2009

1 commit

6ec1c69a8 net_sched: add classful multiqueue dummy scheduler ... Browse Code »

This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.

This allows to address queues individually and graft them similar to regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.

Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

- cl_ops->select_queue selects the tx queue number for new child classes.

- qdisc_ops->attach() overrides root qdisc device grafting to attach
non-shared qdiscs to the queues.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

David S. Miller
2009-09-06 17:07:05 +0800

20 Nov, 2008

1 commit

13d2a1d2b pkt_sched: add DRR scheduler ... Browse Code »

Add classful DRR scheduler as a more flexible replacement for SFQ.

The main difference to the algorithm described in "Efficient Fair Queueing
using Deficit Round Robin" is that this implementation doesn't drop packets
from the longest queue on overrun because its classful and limits are
handled by each individual child qdisc.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-11-20 20:10:00 +0800

08 Nov, 2008

1 commit

f40092373 pkt_sched: Control group classifier ... Browse Code »

The classifier should cover the most common use case and will work
without any special configuration.

The principle of the classifier is to directly access the
task_struct via get_current(). In order for this to work,
classification requests from softirqs must be ignored. This is
not a problem because the vast majority of packets in softirq
context are not assigned to a task anyway. For this to work, a
mechanism is needed to trace softirq context.

This repost goes back to the method of relying on the number of
nested bh disable calls for the sake of not adding too much
complexity and the option to come up with something more reliable
if actually needed.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2008-11-08 14:56:00 +0800

13 Sep, 2008

2 commits

ca9b0e27e pkt_action: add new action skbedit ... Browse Code »

This new action will have the ability to change the priority and/or
queue_mapping fields on an sk_buff.

Signed-off-by: Alexander Duyck
Signed-off-by: Jeff Kirsher
Signed-off-by: David S. Miller

Alexander Duyck
2008-09-13 07:30:20 +0800
92651940a pkt_sched: Add multiqueue scheduler support ... Browse Code »

This patch is intended to add a qdisc to support the new tx multiqueue
architecture by providing a band for each hardware queue. By doing
this it is possible to support a different qdisc per physical hardware
queue.

This qdisc uses the skb->queue_mapping to select which band to place
the traffic onto. It then uses a round robin w/ a check to see if the
subqueue is stopped to determine which band to dequeue the packet from.

Signed-off-by: Alexander Duyck
Signed-off-by: Jeff Kirsher
Signed-off-by: David S. Miller

Alexander Duyck
2008-09-13 07:29:34 +0800

01 Feb, 2008

1 commit

e5dfb8151 [NET_SCHED]: Add flow classifier ... Browse Code »

Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...

Some examples:

- Classic SFQ hash:

tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024

- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:

tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024

- Map destination IPs of 192.168.0.0/24 to classids 1-257:

tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256

- alternatively:

tc filter add ... flow map \
key dst and 0xff

- similar, but reverse ordered:

tc filter add ... flow map \
key dst and 0xff xor 0xff

Perturbation is currently not supported because we can't reliable kill the
timer on destruction.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2008-02-01 11:28:36 +0800

11 Oct, 2007

1 commit

b42199523 [PKT_SCHED]: Add stateless NAT ... Browse Code »

Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.

In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.

Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem. This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small. The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.

If the number of rules is large then major renovations would have take
place in the routing subsystem to make this practical.

For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.

The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables which would allow large
number of stateless NAT rules that can be used/updated in constant time.

The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey. In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2007-10-11 07:53:11 +0800

15 Jul, 2007

1 commit

c3bc7cff8 [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE ... Browse Code »

The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2007-07-15 15:03:05 +0800

27 Mar, 2007

1 commit

9b2f7bcf0 [NET]: Remove dead net/sched/Makefile entry for sch_hpfq.o. ... Browse Code »

Remove the worthless net/sched/Makefile entry for the non-existent
source file sch_hpfq.c.

Signed-off-by: Robert P. J. Day
Signed-off-by: David S. Miller

Robert P. J. Day
2007-03-27 07:20:34 +0800

03 Dec, 2006

1 commit

3c62f75aa [PKT_SCHED]: Make sch_fifo.o available when CONFIG_NET_SCHED is not set. ... Browse Code »

Based on patch by Patrick McHardy.

Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
without requiring CONFIG_NET_SCHED.

The d80211 stack needs a generic fifo qdisc for WME. At present it
uses net/d80211/fifo_qdisc.c which is functionally equivalent to
sch_fifo.c. This patch will allow the d80211 stack to remove
net/d80211/fifo_qdisc.c and use sch_fifo.c instead.

Signed-off-by: David Kimdon
Signed-off-by: David S. Miller

David Kimdon
2006-12-03 13:21:43 +0800

10 Jan, 2006

1 commit

4bba39259 [PKT_SCHED]: Prefix tc actions with act_ ... Browse Code »

Clean up the net/sched directory a bit by prefix all actions with act_.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2006-01-10 06:16:14 +0800

06 Jul, 2005

1 commit

63d886c96 [PKT_SCHED]: Blackhole queueing discipline ... Browse Code »

Useful in combination with classful qdiscs to drop or
temporary disable certain flows, e.g. one could block
specific ds flows with dsmark.

Unlike the noop qdisc it can be controlled by the user and
statistic accounting is done.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2005-07-06 06:29:16 +0800

24 Jun, 2005

1 commit

d675c989e [PKT_SCHED]: Packet classification based on textsearch (ematch) ... Browse Code »

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2005-06-24 12:00:58 +0800

25 Apr, 2005

1 commit

db7530797 [PKT_SCHED]: Introduce simple actions. ... Browse Code »

And provide an example simply action in order to
demonstrate usage.

Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Jamal Hadi Salim
2005-04-25 11:10:16 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2 ... Browse Code »

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800