05 Apr, 2011

1 commit

  • This is an implementation of the Quick Fair Queue scheduler developed
    by Fabio Checconi. The same algorithm is already implemented in ipfw
    in FreeBSD. Fabio had an earlier version developed on Linux; I just
    cleaned it up. Thanks to Eric Dumazet for testing this under load.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     

24 Feb, 2011

1 commit

  • This is the Stochastic Fair Blue scheduler, based on work from:

    W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
    Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

    http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

    This implementation is based on work done by Juliusz Chroboczek

    The general SFB algorithm can be found in figure 14, page 15:

    B[l][n] : L x N array of bins (L levels, N bins per level)
    enqueue()
        Calculate hash function values h{0}, h{1}, .. h{L-1}
        Update bins at each level
        for i = 0 to L - 1
            if (B[i][h{i}].qlen > bin_size)
                B[i][h{i}].p_mark += p_increment;
            else if (B[i][h{i}].qlen == 0)
                B[i][h{i}].p_mark -= p_decrement;
        p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
        if (p_min == 1.0)
            ratelimit();
        else
            mark/drop with probability p_min;
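
    A minimal C sketch of the enqueue-time bin update above, assuming L = 2
    levels, N = 16 bins, probabilities stored as doubles clamped to [0, 1],
    and hash values supplied by the caller. All names (struct sfb_sketch,
    sfb_update, SFB_LEVELS, SFB_BINS) are hypothetical and not taken from the
    kernel's sch_sfb.c.

```c
#include <stddef.h>

#define SFB_LEVELS 2  /* L: number of levels (illustrative value) */
#define SFB_BINS   16 /* N: bins per level (illustrative value) */

struct sfb_bin {
	unsigned int qlen; /* packets currently accounted to this bin */
	double p_mark;     /* marking probability, kept in [0, 1] */
};

struct sfb_sketch {
	struct sfb_bin bins[SFB_LEVELS][SFB_BINS];
	unsigned int bin_size; /* per-bin queue length threshold */
	double p_increment;
	double p_decrement;
};

static double clamp01(double p)
{
	if (p < 0.0)
		return 0.0;
	if (p > 1.0)
		return 1.0;
	return p;
}

/*
 * Update the bins selected by h[0..L-1] and return p_min, the smallest
 * marking probability among them. 1.0 means "rate-limit this flow";
 * anything lower means "mark/drop with probability p_min".
 */
double sfb_update(struct sfb_sketch *q, const unsigned int h[SFB_LEVELS])
{
	double p_min = 1.0;
	size_t i;

	for (i = 0; i < SFB_LEVELS; i++) {
		struct sfb_bin *b = &q->bins[i][h[i] % SFB_BINS];

		if (b->qlen > q->bin_size)
			b->p_mark = clamp01(b->p_mark + q->p_increment);
		else if (b->qlen == 0)
			b->p_mark = clamp01(b->p_mark - q->p_decrement);

		if (b->p_mark < p_min)
			p_min = b->p_mark;
	}
	return p_min;
}
```

    The `% SFB_BINS` only guards against hash values wider than a bin index;
    clamping keeps repeated increments from pushing p_mark past 1.0.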

    I adapted Juliusz's code to meet current kernel standards and made
    various changes to address previous comments:

    http://thread.gmane.org/gmane.linux.network/90225
    http://thread.gmane.org/gmane.linux.network/90375

    The default flow classifier is the rxhash introduced by RPS in 2.6.35,
    but an external flow classifier can be used if desired.

    tc qdisc add dev $DEV parent 1:11 handle 11: \
        est 0.5sec 2sec sfb limit 128

    tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

    Notes:

    1) The default SFB child qdisc is pfifo_fast. It can be replaced by
    another qdisc, but a child qdisc MUST NOT drop a previously queued
    packet, because SFB needs to handle every dequeued packet in order to
    maintain its virtual queue state. pfifo_head_drop and CHOKe should
    therefore not be used.

    2) ECN is enabled by default, unlike RED/CHOKe/GRED.

    With help from Patrick McHardy & Andi Kleen

    Signed-off-by: Eric Dumazet
    CC: Juliusz Chroboczek
    CC: Stephen Hemminger
    CC: Patrick McHardy
    CC: Andi Kleen
    CC: John W. Linville
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Feb, 2011

1 commit

  • CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
    packet scheduler based on the Random Early Detection (RED) algorithm.

    The core idea is:
    For every packet arrival:
        Calculate Qave
        if (Qave < minth)
            Queue the new packet
        else
            Select randomly a packet from the queue
            if (both packets from same flow)
                then Drop both the packets
            else if (Qave > maxth)
                Drop packet
            else
                Admit packet with probability p (same as RED)
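
    The arrival logic above can be sketched in C as a pure decision
    function. This is a hypothetical illustration, not sch_choke.c: the
    average queue length, the RED probability p and both random numbers are
    passed in, flows are identified by plain integers, and "(same as RED)"
    is read as dropping the arrival with probability p and admitting it
    otherwise.

```c
#include <stddef.h>

enum choke_verdict {
	CHOKE_ENQUEUE,   /* admit the arriving packet */
	CHOKE_DROP,      /* drop the arriving packet only */
	CHOKE_DROP_BOTH, /* drop the arrival and the randomly chosen victim */
};

/*
 * One CHOKe arrival decision.
 *   qave    - average queue length (computed as in RED)
 *   flows   - flow id of every queued packet, qlen entries
 *   flow    - flow id of the arriving packet
 *   r_index - random value used to pick the victim slot
 *   u       - uniform random number in [0, 1)
 *   p       - RED drop probability for the current qave
 * On CHOKE_DROP_BOTH, *victim receives the index of the queued packet.
 */
enum choke_verdict choke_decide(double qave, double minth, double maxth,
				const unsigned int *flows, size_t qlen,
				unsigned int flow, unsigned long r_index,
				double u, double p, size_t *victim)
{
	size_t idx;

	/* below minth, or nothing queued to compare against: just enqueue */
	if (qave < minth || qlen == 0)
		return CHOKE_ENQUEUE;

	idx = (size_t)(r_index % qlen); /* randomly selected queued packet */
	if (flows[idx] == flow) {
		*victim = idx;
		return CHOKE_DROP_BOTH;
	}
	if (qave > maxth)
		return CHOKE_DROP;
	/* RED region: drop with probability p, admit otherwise */
	return u < p ? CHOKE_DROP : CHOKE_ENQUEUE;
}
```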

    See also:
    Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
    queue management scheme for approximating fair bandwidth allocation",
    Proceedings of INFOCOM 2000, March 2000.

    Help from:
    Eric Dumazet
    Patrick McHardy

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    stephen hemminger
     

20 Jan, 2011

1 commit

  • This implements a mqprio queueing discipline that by default creates
    a pfifo_fast qdisc per tx queue and provides the needed configuration
    interface.

    Using the mqprio qdisc, the number of traffic classes (tcs) in use,
    along with the range of queues allotted to each class, can be
    configured. By default, skbs are mapped to traffic classes using
    skb->priority. This mapping is configurable.

    Configurable parameters,

    struct tc_mqprio_qopt {
        __u8 num_tc;
        __u8 prio_tc_map[TC_BITMASK + 1];
        __u8 hw;
        __u16 count[TC_MAX_QUEUE];
        __u16 offset[TC_MAX_QUEUE];
    };

    Here the count/offset pairs give the queue alignment, and
    prio_tc_map gives the mapping from skb->priority to tc.
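
    To illustrate how prio_tc_map and the count/offset pairs combine, here
    is a C sketch that maps a priority to a tx queue. It uses a local copy
    of the structure with assumed values 15 and 16 for TC_BITMASK and
    TC_MAX_QUEUE, and spreads flows over a class's queue range with a
    caller-supplied flow hash; the names and the hashing step are
    assumptions for illustration only.

```c
#include <stdint.h>

/* Assumed values; the real constants come from kernel headers. */
#define SKETCH_TC_BITMASK   15
#define SKETCH_TC_MAX_QUEUE 16

/* Local copy of the layout shown above, for illustration only. */
struct mqprio_qopt_sketch {
	uint8_t  num_tc;
	uint8_t  prio_tc_map[SKETCH_TC_BITMASK + 1];
	uint8_t  hw;
	uint16_t count[SKETCH_TC_MAX_QUEUE];
	uint16_t offset[SKETCH_TC_MAX_QUEUE];
};

/*
 * priority -> tc via prio_tc_map, then pick a queue inside that tc's
 * [offset, offset + count) range using a flow hash.
 * Returns -1 if the tc is out of range or owns no queues.
 */
int mqprio_pick_queue(const struct mqprio_qopt_sketch *opt,
		      uint32_t priority, uint32_t flow_hash)
{
	uint8_t tc = opt->prio_tc_map[priority & SKETCH_TC_BITMASK];

	if (tc >= opt->num_tc || opt->count[tc] == 0)
		return -1;
	return opt->offset[tc] + (int)(flow_hash % opt->count[tc]);
}
```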

    The hw bit determines whether the hardware should configure the count
    and offset values. If the hw bit is set, the operation will fail if
    the hardware does not implement the ndo_setup_tc operation. This
    avoids indeterminate states where the hardware may or may not control
    the queue mapping. In addition, minimal bounds checking is done on
    count/offset to verify that no queue exceeds num_tx_queues and that
    queue ranges do not overlap. Beyond that, it is left to user policy
    or hardware configuration to create useful mappings.
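
    The bounds checking described above can be sketched as follows. The
    function name is hypothetical, ranges are treated as half-open
    [offset, offset + count), and rejecting empty ranges (count == 0) is an
    extra assumption of this sketch rather than something stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate count/offset pairs for num_tc classes: every range must end
 * at or before num_tx_queues, and no two ranges may overlap.
 */
bool mqprio_ranges_valid(const uint16_t *count, const uint16_t *offset,
			 unsigned int num_tc, unsigned int num_tx_queues)
{
	unsigned int i, j;

	for (i = 0; i < num_tc; i++) {
		unsigned int end = (unsigned int)offset[i] + count[i];

		/* assumption of this sketch: empty ranges are rejected */
		if (count[i] == 0 || end > num_tx_queues)
			return false;
		for (j = 0; j < i; j++) {
			unsigned int other_end = (unsigned int)offset[j] + count[j];

			/* half-open ranges overlap iff each starts before the other ends */
			if (offset[i] < other_end && offset[j] < end)
				return false;
		}
	}
	return true;
}
```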

    It is expected that hardware QoS schemes can be implemented by
    creating appropriate mappings of queues in ndo_setup_tc().

    One expected use case is for drivers to use ndo_setup_tc to map
    queue ranges onto 802.1Q traffic classes. This provides a generic
    mechanism to map network traffic onto these traffic classes and
    removes the need for lower-layer drivers to know specifics about
    traffic types.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     

20 Aug, 2010

1 commit

  • net/sched: add ACT_CSUM action to update packet checksums

    ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some
    altered checksums in IPv4 and IPv6 packets. The following checksums are
    supported by this patch:
    - IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
    - IPv6: ICMPv6, TCP, UDP & UDPLite
    A single action can update several kinds of checksums, e.g. when the
    packet flow mixes TCP, UDP and UDPLite.

    A usage example is given in the associated iproute2 patch.
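
    For reference, the IPv4 header checksum is the RFC 1071 Internet
    checksum. The sketch below is a standalone illustration over a raw byte
    buffer with hypothetical function names, not tcf_csum code: it does not
    handle skbs, and the transport-layer cases additionally cover the
    payload and, for TCP, UDP, UDPLite and ICMPv6, a pseudo-header, which
    is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, folded to 16 bits and complemented. An odd trailing byte is
 * padded with zero. Intended for header-sized buffers.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16) /* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Recompute the IPv4 header checksum in place (bytes 10 and 11). */
void ipv4_fix_header_checksum(uint8_t *hdr)
{
	size_t ihl = (size_t)(hdr[0] & 0x0f) * 4; /* header length in bytes */
	uint16_t csum;

	hdr[10] = 0;
	hdr[11] = 0;
	csum = inet_checksum(hdr, ihl);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

    Running inet_checksum() over a header whose checksum field is already
    correct yields 0, which is a convenient way to validate a packet.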

    Version 3 changes:
    - remove useless goto instructions
    - improve IPv6 hop options decoding

    Version 2 changes:
    - coding style correction
    - remove useless arguments of some functions
    - use stack in tcf_csum_dump()
    - add tcf_csum_skb_nextlayer() to factor code

    Signed-off-by: Gregoire Baron
    Acked-by: jamal
    Signed-off-by: David S. Miller

    Grégoire Baron
     

06 Sep, 2009

1 commit

  • This patch adds a classful dummy scheduler which can be used as root qdisc
    for multiqueue devices and exposes each device queue as a child class.

    This allows queues to be addressed individually and grafted like regular
    classes. Additionally, it presents an accumulated view of the statistics
    of all real root qdiscs in the dummy root.

    Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

    - cl_ops->select_queue selects the tx queue number for new child classes.

    - qdisc_ops->attach() overrides root qdisc device grafting to attach
    non-shared qdiscs to the queues.
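
    The accumulated view of statistics amounts to summing the counters of
    each per-queue root qdisc. A minimal C sketch, assuming a hypothetical
    struct q_stats with byte, packet, drop and queue-length counters (not
    the kernel's statistics structures):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-qdisc counters, for illustration only. */
struct q_stats {
	uint64_t bytes, packets;
	uint32_t drops, qlen;
};

/* Sum the statistics of every per-tx-queue root qdisc into one view. */
struct q_stats mq_sum_stats(const struct q_stats *per_queue, size_t nqueues)
{
	struct q_stats total = {0, 0, 0, 0};
	size_t i;

	for (i = 0; i < nqueues; i++) {
		total.bytes += per_queue[i].bytes;
		total.packets += per_queue[i].packets;
		total.drops += per_queue[i].drops;
		total.qlen += per_queue[i].qlen;
	}
	return total;
}
```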

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    David S. Miller
     

20 Nov, 2008

1 commit

  • Add classful DRR scheduler as a more flexible replacement for SFQ.

    The main difference from the algorithm described in "Efficient Fair
    Queueing using Deficit Round Robin" is that this implementation doesn't
    drop packets from the longest queue on overrun, because it is classful
    and limits are handled by each individual child qdisc.
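
    A minimal C sketch of deficit round robin dequeue over a fixed array of
    classes. It follows the classic scheme (serve the current class while
    its head packet fits in its deficit, otherwise credit it one quantum and
    move on), but the fixed-size per-class FIFO of packet lengths and all
    names are hypothetical, not sch_drr.c. Each quantum must be non-zero or
    the loop cannot make progress.

```c
#include <stddef.h>

#define DRR_MAX_PKTS 8 /* per-class FIFO capacity (illustrative) */

struct drr_class {
	unsigned int quantum;           /* bytes credited per round, > 0 */
	unsigned int deficit;           /* unused credit carried over */
	unsigned int len[DRR_MAX_PKTS]; /* queued packet lengths (ring) */
	size_t head, count;
};

/*
 * Dequeue one packet. *cur is the class currently being served.
 * Returns the class index and stores the packet length in *pkt_len,
 * or returns -1 if every class is empty.
 */
int drr_dequeue(struct drr_class *cl, size_t ncl, size_t *cur,
		unsigned int *pkt_len)
{
	size_t empty_in_a_row = 0;

	while (empty_in_a_row < ncl) {
		struct drr_class *c = &cl[*cur];

		if (c->count == 0) {
			c->deficit = 0; /* idle classes keep no credit */
			*cur = (*cur + 1) % ncl;
			empty_in_a_row++;
			continue;
		}
		empty_in_a_row = 0;
		if (c->len[c->head] <= c->deficit) {
			int idx = (int)*cur;

			*pkt_len = c->len[c->head];
			c->deficit -= *pkt_len;
			c->head = (c->head + 1) % DRR_MAX_PKTS;
			c->count--;
			return idx; /* stay on this class for the next call */
		}
		/* head packet does not fit: grant a quantum, move to next class */
		c->deficit += c->quantum;
		*cur = (*cur + 1) % ncl;
	}
	return -1;
}
```

    Over many rounds each backlogged class receives bandwidth roughly in
    proportion to its quantum, independent of packet sizes.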

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

08 Nov, 2008

1 commit

  • The classifier should cover the most common use case and will work
    without any special configuration.

    The principle of the classifier is to directly access the
    task_struct via get_current(). In order for this to work,
    classification requests from softirqs must be ignored. This is
    not a problem because the vast majority of packets in softirq
    context are not assigned to a task anyway. For this to work, a
    mechanism is needed to trace softirq context.

    This repost goes back to relying on the number of nested bh-disable
    calls, to avoid adding too much complexity, while keeping the option
    to come up with something more reliable if actually needed.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

13 Sep, 2008

2 commits

  • This new action will have the ability to change the priority and/or
    queue_mapping fields on an sk_buff.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch is intended to add a qdisc to support the new tx multiqueue
    architecture by providing a band for each hardware queue. By doing
    this it is possible to support a different qdisc per physical hardware
    queue.

    This qdisc uses skb->queue_mapping to select which band to place the
    traffic onto. It then uses round robin, with a check for whether the
    subqueue is stopped, to determine which band to dequeue the packet from.
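
    The band selection described above can be sketched in C as a round-robin
    scan that skips empty bands and bands whose hardware subqueue is
    stopped. struct multiq_band and multiq_next_band() are hypothetical; the
    caller is assumed to dequeue from the returned band and update its
    backlog.

```c
#include <stdbool.h>
#include <stddef.h>

struct multiq_band {
	unsigned int backlog; /* packets waiting in this band */
	bool stopped;         /* the matching hardware subqueue is stopped */
};

/*
 * Round robin starting after *curband: return the first band that has
 * packets and whose subqueue is running, advancing *curband to it.
 * Returns -1 if no band is ready.
 */
int multiq_next_band(const struct multiq_band *bands, size_t nbands,
		     size_t *curband)
{
	size_t i;

	for (i = 0; i < nbands; i++) {
		size_t b = (*curband + 1 + i) % nbands;

		if (bands[b].backlog > 0 && !bands[b].stopped) {
			*curband = b;
			return (int)b;
		}
	}
	return -1;
}
```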

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     

01 Feb, 2008

1 commit

  • Add new "flow" classifier, which is meant to extend the SFQ hashing
    capabilities without hard-coding new hash functions and also allows
    deterministic mappings of keys to classes, replacing some out of tree
    iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
    IPs to marks, with fw filters to classes), ...

    Some examples:

    - Classic SFQ hash:

    tc filter add ... flow hash \
        keys src,dst,proto,proto-src,proto-dst divisor 1024

    - Classic SFQ hash, but using information from conntrack to work properly in
      combination with NAT:

    tc filter add ... flow hash \
        keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024

    - Map destination IPs of 192.168.0.0/24 to classids 1-256:

    tc filter add ... flow map \
        key dst addend -192.168.0.0 divisor 256

    - alternatively:

    tc filter add ... flow map \
        key dst and 0xff

    - similar, but reverse ordered:

    tc filter add ... flow map \
        key dst and 0xff xor 0xff
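
    The map examples can be modelled with a small C function. The parameter
    struct mirrors the tc keywords used above, but the order in which and,
    xor, addend and divisor are applied is an assumption of this sketch, not
    a statement about cls_flow, and turning the resulting value into a class
    id is not shown.

```c
#include <stdint.h>

/* Hypothetical parameters mirroring the tc keywords used above. */
struct flow_map_params {
	uint32_t mask;    /* "and"; 0xffffffff if unused */
	uint32_t xor_val; /* "xor"; 0 if unused */
	uint32_t addend;  /* "addend"; negative values wrap modulo 2^32 */
	uint32_t divisor; /* "divisor"; 0 if unused */
};

/* Map a 32-bit key (e.g. an IPv4 address in host byte order) to a value. */
uint32_t flow_map(uint32_t key, const struct flow_map_params *p)
{
	uint32_t v = ((key & p->mask) ^ p->xor_val) + p->addend;

	if (p->divisor)
		v %= p->divisor;
	return v;
}
```

    For 192.168.0.5 (0xC0A80005), the addend/divisor and the "and 0xff"
    variants both yield 5, while "and 0xff xor 0xff" yields 250, i.e. the
    reverse ordering.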

    Perturbation is currently not supported because we can't reliably kill
    the timer on destruction.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

11 Oct, 2007

1 commit

  • Stateless NAT is useful in controlled environments where restrictions are
    placed on through traffic such that we don't need connection tracking to
    correctly NAT protocol-specific data.

    In particular, this is of interest when the number of flows or the number
    of addresses being NATed is large, or if connection tracking information
    has to be replicated and where it is not practical to do so.

    Previously we had stateless NAT functionality which was integrated into
    the IPv4 routing subsystem. This was a great solution as long as the NAT
    worked on a subnet to subnet basis such that the number of NAT rules was
    relatively small. The reason is that for SNAT the routing based system
    had to perform a linear scan through the rules.

    If the number of rules is large, then major renovations would have to
    take place in the routing subsystem to make this practical.

    For the time being, the least intrusive way of achieving this is to use
    the u32 classifier written by Alexey Kuznetsov along with the actions
    infrastructure implemented by Jamal Hadi Salim.

    The following patch attempts to solve this problem by creating a new
    nat action that can be invoked from u32 hash tables, allowing a large
    number of stateless NAT rules to be used/updated in constant time.

    The actual NAT code is mostly based on the previous stateless NAT code
    written by Alexey. In future we might be able to utilise the protocol
    NAT code from netfilter to improve support for other protocols.
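
    After rewriting an address, a stateless NAT action also has to fix the
    IPv4 header checksum (and TCP/UDP checksums, which cover the addresses
    through the pseudo-header). One standard technique is the RFC 1624
    incremental update, whose cost does not depend on the packet length.
    The sketch below shows it for a 32-bit address change; it is a
    standalone illustration with hypothetical names, not the nat action's
    code.

```c
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t ones_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/*
 * RFC 1624: HC' = ~(~HC + ~m + m'), applied to both 16-bit halves of a
 * 32-bit address changing from old_addr to new_addr (host byte order).
 */
uint16_t csum_replace_addr(uint16_t check, uint32_t old_addr, uint32_t new_addr)
{
	uint16_t sum = (uint16_t)~check;

	sum = ones_add(sum, (uint16_t)~(old_addr >> 16));
	sum = ones_add(sum, (uint16_t)~(old_addr & 0xffff));
	sum = ones_add(sum, (uint16_t)(new_addr >> 16));
	sum = ones_add(sum, (uint16_t)(new_addr & 0xffff));
	return (uint16_t)~sum;
}
```

    For example, changing the source of a header with checksum 0xb861 from
    192.168.0.1 to 10.0.0.1 gives 0x6f0a, the same value a full
    recomputation produces.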

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

15 Jul, 2007

1 commit

  • The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE;
    remove the old code. The config option will be kept around for a short
    time to select the equivalent NET_CLS_ACT options, to allow easier
    upgrades.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

27 Mar, 2007

1 commit


03 Dec, 2006

1 commit

  • Based on a patch by Patrick McHardy.

    Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
    without requiring CONFIG_NET_SCHED.

    The d80211 stack needs a generic fifo qdisc for WME. At present it
    uses net/d80211/fifo_qdisc.c which is functionally equivalent to
    sch_fifo.c. This patch will allow the d80211 stack to remove
    net/d80211/fifo_qdisc.c and use sch_fifo.c instead.

    Signed-off-by: David Kimdon
    Signed-off-by: David S. Miller

    David Kimdon
     

10 Jan, 2006

1 commit


06 Jul, 2005

1 commit

  • Useful in combination with classful qdiscs to drop or
    temporarily disable certain flows, e.g. one could block
    specific ds flows with dsmark.

    Unlike the noop qdisc, it can be controlled by the user and
    statistics accounting is done.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

24 Jun, 2005

1 commit


25 Apr, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds