06 Jan, 2012

1 commit

  • This patch splits the red_parms structure into two components.

    One holding the RED 'constant' parameters, and one containing the
    variables.

    This permits a size reduction of GRED qdisc, and is a preliminary step
    to add an optional RED unit to SFQ.

    SFQRED will have a single red_parms structure shared by all flows, and a
    private red_vars per flow.

    Signed-off-by: Eric Dumazet
    CC: Dave Taht
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
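
The split described in the commit above can be pictured with a minimal sketch. All struct and field names here are hypothetical and chosen for illustration; they are not the kernel's actual definitions. One set of 'constant' parameters is shared, while each flow keeps only its own small block of variables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 'constant' parameters, shared by every flow. */
struct params_sketch {
    uint32_t qth_min;
    uint32_t qth_max;
    uint32_t max_p;      /* Q0.32 probability */
};

/* Hypothetical per-flow variables. */
struct vars_sketch {
    uint32_t qavg;       /* average queue length */
    int      qcount;     /* packets since the last mark or drop */
};

#define NFLOWS 128

struct sfqred_sketch {
    struct params_sketch parms;         /* a single shared copy */
    struct vars_sketch   vars[NFLOWS];  /* one private copy per flow */
};

/* Every decision takes the shared parameters plus one flow's variables. */
static int above_max(const struct params_sketch *p, const struct vars_sketch *v)
{
    assert(p && v);
    return v->qavg >= p->qth_max;
}
```

With this layout, the memory cost per flow is only the size of the variables block, which is where the size reduction comes from.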
     

10 Dec, 2011

1 commit

  • Now that RED uses a Q0.32 number to store max_p (max probability), allow
    RED/GRED/CHOKE to use/report full resolution at config/dump time.

    Old tc binaries are not aware of the new attributes, and still set/get Plog.

    New tc binaries set/get both Plog and max_p for backward compatibility;
    they display the "probability" value if they get max_p from a new kernel.

    # tc -d qdisc show dev ...
    ...
    qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
    probability 0.09 Scell_log 15

    Make sure we avoid potential divides by 0 in reciprocal_value(), if
    (max_th - min_th) is big.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
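
To make the relationship between Plog and a Q0.32 max_p concrete, here is a small sketch. The helper names are hypothetical, and the conversion is only one plausible way to approximate 1/2^Plog in 32 bits; it is not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Q0.32: the 32-bit value v represents the probability v / 2^32. */

/* Hypothetical helper: approximate 1/2^plog as a Q0.32 value. */
static uint32_t plog_to_q032(unsigned plog)
{
    assert(plog < 32);
    return UINT32_MAX >> plog;   /* plog = 1 gives just under 0.5 */
}

/* Hypothetical helper: convert a Q0.32 value to a double for display. */
static double q032_to_double(uint32_t v)
{
    return (double)v / 4294967296.0;   /* divide by 2^32 */
}
```

A Plog-only interface can express 1/2, 1/4, 1/8 and so on, whereas the Q0.32 value can express a probability such as 0.09 directly.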
     

09 Dec, 2011

1 commit

  • Adaptive RED AQM for Linux, based on the paper by Sally Floyd,
    Ramakrishna Gummadi, and Scott Shenker, August 2001:

    http://icir.org/floyd/papers/adaptiveRed.pdf

    The goal of Adaptive RED is to make max_p a dynamic value between 1% and
    50% so that the average queue reaches the target, halfway between min_th
    and max_th : min_th + (max_th - min_th) / 2

    Every 500 ms:
    if (avg > target and max_p <= 0.5)
    increase max_p : max_p += alpha;
    else if (avg < target and max_p >= 0.01)
    decrease max_p : max_p *= beta;

    target :[min_th + 0.4*(max_th - min_th),
    min_th + 0.6*(max_th - min_th)].
    alpha : min(0.01, max_p / 4)
    beta : 0.9
    max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)

    Changes to our RED implementation:

    max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
    fixed point number, to allow the full range described in the Adaptive RED
    paper.

    To deliver a random number, we now use a reciprocal divide (that's really
    a multiply), but this operation is done once per marked/dropped packet
    when in the RED_BETWEEN_TRESH window, so the added cost (compared to the
    previous AND operation) is near zero.

    The dump operation reports the current max_p value in a new TCA_RED_MAX_P
    attribute.

    Example on a 10Mbit link :

    tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
    limit 400000 min 30000 max 90000 avpkt 1000 \
    burst 55 ecn adaptative bandwidth 10Mbit

    # tc -s -d qdisc show dev eth3
    ...
    qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
    adaptative ewma 5 max_p=0.113335 Scell_log 15
    Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
    rate 9749Kbit 831pps backlog 72056b 16p requeues 0
    marked 1357 early 35 pdrop 0 other 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
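
The periodic adjustment described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: the struct and function names are hypothetical, the rule follows the increase/decrease scheme of the Floyd, Gummadi and Shenker paper, and the target band is taken as 40% to 60% of the way from min_th to max_th.

```c
#include <assert.h>
#include <stdint.h>

#define Q032_ONE_PERCENT   42949673u    /* about 0.01 * 2^32 */
#define Q032_FIFTY_PERCENT 2147483648u  /* 0.5 * 2^32 */

/* Hypothetical state; thresholds are in bytes, max_p is Q0.32. */
struct ared_sketch {
    uint32_t min_th, max_th;
    uint32_t max_p;
};

/* One adaptation step, meant to run every 500 ms with the average queue. */
static void ared_adapt(struct ared_sketch *s, uint32_t avg)
{
    assert(s->max_th > s->min_th);
    uint32_t delta = (s->max_th - s->min_th) / 5;
    uint32_t target_min = s->min_th + 2 * delta;  /* min_th + 0.4 * range */
    uint32_t target_max = s->min_th + 3 * delta;  /* min_th + 0.6 * range */

    if (avg > target_max && s->max_p <= Q032_FIFTY_PERCENT) {
        /* alpha = min(0.01, max_p / 4) */
        uint32_t alpha = s->max_p / 4;
        if (alpha > Q032_ONE_PERCENT)
            alpha = Q032_ONE_PERCENT;
        s->max_p += alpha;
    } else if (avg < target_min && s->max_p >= Q032_ONE_PERCENT) {
        s->max_p = (s->max_p / 10) * 9;           /* beta = 0.9 */
    }
}
```

Integer arithmetic is used throughout, since floating point is not available in this kind of code; dividing by 10 before multiplying by 9 keeps the intermediate value within 32 bits.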
     

02 Dec, 2011

1 commit

  • On Wednesday 30 November 2011 at 14:36 -0800, Stephen Hemminger wrote:

    > (Almost) nobody uses RED because they can't figure it out.
    > According to Wikipedia, VJ says that:
    > "there are not one, but two bugs in classic RED."

    RED is useful for high-throughput routers; I doubt many Linux machines
    act as such devices.

    I was considering adding Adaptive RED (Sally Floyd, Ramakrishna
    Gummadi, Scott Shenker, August 2001).

    In this version, maxp is dynamic (from 1% to 50%), and the user only has to
    set min_th (target average queue size);
    max_th and wq (burst in Linux RED) are set up automatically.

    By the way it seems we have a small bug in red_change()

    if (skb_queue_empty(&sch->q))
    red_end_of_idle_period(&q->parms);

    First, if queue is empty, we should call
    red_start_of_idle_period(&q->parms);

    Second, since we no longer use sch->q but q->qdisc, the test is
    meaningless.

    Oh well...

    [PATCH] sch_red: fix red_change()

    Now that RED is classful, we must check q->qdisc->q.qlen, and if the queue
    is empty, we start an idle period, not end it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Jan, 2011

1 commit


21 Jan, 2011

1 commit

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
    previously enqueued packet, and bstats cannot be changed afterwards, so
    bstats/rates are not accurate (overestimated).

    This patch changes the qdisc_bstats updates to be done at dequeue() time
    instead of enqueue() time. bstats counters no longer account for dropped
    frames, and rates are more correct, since enqueue() bursts don't have
    an effect on the dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
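
The effect of moving the accounting to dequeue time can be seen in a toy head-drop queue. This is a minimal sketch with hypothetical names, not kernel code: packets dropped after being enqueued never reach the byte and packet counters.

```c
#include <assert.h>
#include <stdint.h>

#define QCAP 4

/* Hypothetical fixed-size ring of packet lengths with "sent" statistics. */
struct fifo_sketch {
    uint32_t len[QCAP];
    unsigned head, count;
    uint64_t bytes, packets;
    uint32_t drops;
};

static void enqueue(struct fifo_sketch *q, uint32_t pktlen)
{
    assert(pktlen > 0);
    if (q->count == QCAP) {             /* full: drop the oldest packet */
        q->head = (q->head + 1) % QCAP;
        q->count--;
        q->drops++;
    }
    q->len[(q->head + q->count) % QCAP] = pktlen;
    q->count++;
    /* no byte/packet accounting here */
}

/* Returns the packet length, or 0 if the queue is empty. */
static uint32_t dequeue(struct fifo_sketch *q)
{
    if (q->count == 0)
        return 0;
    uint32_t pktlen = q->len[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    q->bytes += pktlen;                 /* counted only when actually sent */
    q->packets++;
    return pktlen;
}
```

Had the counters been updated in enqueue(), the dropped packet would have inflated both totals with no way to correct them later.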
     

20 Jan, 2011

1 commit


11 Jan, 2011

1 commit

  • HTB takes into account that an skb can be segmented when it updates stats.
    Generalize this to all schedulers.

    They should use qdisc_bstats_update() helper instead of manipulating
    bstats.bytes and bstats.packets

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.

    Note: Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
    stab is set up on the qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
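
A stats helper that accounts for segmented buffers might look like the sketch below. The types and names are hypothetical stand-ins, not the kernel's: a buffer carrying several segments counts as that many packets, while its byte length is added once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet descriptor: segs is 0 for an unsegmented buffer. */
struct pkt_sketch {
    uint32_t len;
    uint16_t segs;
};

struct stats_sketch {
    uint64_t bytes;
    uint32_t packets;
};

/* Single place where every scheduler would update its counters. */
static void stats_update(struct stats_sketch *s, const struct pkt_sketch *p)
{
    assert(s && p);
    s->bytes += p->len;
    s->packets += p->segs ? p->segs : 1;
}
```

Centralising the update in one helper means no scheduler has to remember the segmentation rule on its own.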
     

04 Jan, 2011

1 commit

  • Provide child qdisc backlog (byte count) information so that "tc -s
    qdisc" can report it to user.

    packet count is already correctly provided.

    qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn
    Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0)
    rate 242385Kbit 13630pps backlog 13560b 8p requeues 0
    marked 7865 early 1 pdrop 7 other 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

06 Sep, 2009

3 commits

  • The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
    originate from either ->get() or ->walk() and are always valid.

    Remove unnecessary checks.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some schedulers don't support creating, changing or deleting classes.
    Make the respective callbacks optional and consistently return
    -EOPNOTSUPP for unsupported operations, instead of the current mix of
    -EOPNOTSUPP, -ENOSYS or no error.

    In the case of sch_prio and sch_multiq, the removed operations additionally
    checked for an invalid class. This is not necessary since the class
    argument can only originate from ->get() or, in the case of ->change(), is 0
    for creation of new classes, in which case ->change() incorrectly
    returned -ENOENT.

    As a side-effect, this patch fixes a possible (root-only) NULL pointer
    function call in sch_ingress, which didn't implement a so far mandatory
    ->delete() operation.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some qdiscs don't support attaching filters. Handle this centrally in
    cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

20 Nov, 2008

1 commit


14 Nov, 2008

1 commit

  • After implementing qdisc->ops->peek() and changing sch_netem into a
    classless qdisc, there are no more qdisc->ops->requeue() users. This
    patch removes this method with its wrappers (qdisc_requeue()), and
    also the unused qdisc->requeue structure. It also includes a few minor
    fixes of warnings (htb_enqueue()) and comments.

    The idea to kill ->requeue() and a similar patch were first developed
    by David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

31 Oct, 2008

1 commit


05 Aug, 2008

1 commit

  • Patrick McHardy noticed:
    "The other problem that affects all qdiscs supporting actions is
    TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
    even though the packet is not queued, corrupting upper qdiscs'
    qlen counters."

    and later explained:
    "The reason why it translates it at all seems to be to not increase
    the drops counter. Within a single qdisc this could be avoided by
    other means easily, upper qdiscs would still increase the counter
    when we return anything besides NET_XMIT_SUCCESS though.

    This means we need a new NET_XMIT return value to indicate this to
    the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
    return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
    in dev_queue_xmit, similar to NET_XMIT_BYPASS."

    David Miller noticed:
    "Maybe these NET_XMIT_* values being passed around should be a set of
    bits. They could be composed of base meanings, combined with specific
    attributes.

    So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"

    The attributes get masked out by the top-level ->enqueue() caller,
    such that the base meanings are the only thing that make their
    way up into the stack. If it's only about communication within the
    qdisc tree, let's simply code it that way."

    This patch tries to implement these ideas.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
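
The "set of bits" idea quoted above can be sketched as a base result code combined with an attribute flag that is masked out before the value leaves the qdisc tree. The names and numeric values below are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical base meanings (low bits) and one attribute (high bit). */
#define XMIT_SUCCESS     0x00000
#define XMIT_DROP        0x00001
#define XMIT_FLAG_STOLEN 0x10000   /* packet consumed; do not count a drop */
#define XMIT_BASE_MASK   0x0ffff

/* Upper qdisc: only a plain success means the packet was really queued. */
static int xmit_was_queued(int ret)
{
    return ret == XMIT_SUCCESS;
}

/* Upper qdisc: should this result increase the drop counter? */
static int xmit_counts_as_drop(int ret)
{
    assert((ret & ~(XMIT_BASE_MASK | XMIT_FLAG_STOLEN)) == 0);
    return (ret & XMIT_BASE_MASK) != XMIT_SUCCESS &&
           !(ret & XMIT_FLAG_STOLEN);
}

/* Top-level caller: strip attributes so only the base meaning escapes. */
static int xmit_to_stack(int ret)
{
    return ret & XMIT_BASE_MASK;
}
```

A child that returns XMIT_SUCCESS | XMIT_FLAG_STOLEN tells its parent neither to bump qlen nor to count a drop, while the rest of the stack still sees plain success.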
     

20 Jul, 2008

2 commits


06 Jul, 2008

1 commit


04 Jun, 2008

1 commit

  • Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
    nla_nest_cancel() void functions.

    Return -EMSGSIZE instead of -1 if the provided message buffer is not
    big enough.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

29 Jan, 2008

4 commits


11 Jul, 2007

1 commit


11 Feb, 2007

1 commit


03 Dec, 2006

2 commits


01 Jul, 2006

1 commit


21 Mar, 2006

1 commit

  • Convert sch_red to a classful qdisc. All qdiscs that maintain accurate
    backlog counters are eligible as child qdiscs. When a queue limit larger
    than zero is given, a bfifo qdisc is used for backwards compatibility.
    Current versions of tc enforce a limit larger than zero, other users
    can avoid creating the default qdisc by using zero.

    Signed-off-by: Patrick McHardy
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Patrick McHardy
     

06 Nov, 2005

5 commits

  • Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
    marking is enabled packets should still be dropped once the
    average queue length exceeds the maximum threshold.

    This _may_ help to avoid global synchronisation during small
    bursts of peers advertising but not caring about ECN. Use this
    option very carefully, it does more harm than good if
    (qth_max - qth_min) does not cover at least two average burst
    cycles.

    The difference from the current behaviour, in which we'd run into
    the hard queue limit, is that due to the low pass filter of RED,
    short bursts are less likely to cause a global synchronisation.

    Signed-off-by: Thomas Graf
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Graf
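
The interaction between ECN marking and the harddrop flag can be sketched as a small decision function. Flag values, names and the function signature are hypothetical; the sketch only illustrates the behaviour described in the commit message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values and names, for illustration only. */
#define SK_ECN      0x1
#define SK_HARDDROP 0x2

enum sk_verdict { SK_PASS, SK_MARK, SK_DROP };

/*
 * ect:      the packet advertises ECN capability
 * prob_hit: the random early-detection test selected this packet
 */
static enum sk_verdict sk_red_verdict(unsigned flags, uint32_t qavg,
                                      uint32_t qth_min, uint32_t qth_max,
                                      int ect, int prob_hit)
{
    assert(qth_min < qth_max);
    int can_mark = (flags & SK_ECN) && ect;

    if (qavg < qth_min)
        return SK_PASS;
    if (qavg < qth_max) {
        if (!prob_hit)
            return SK_PASS;
        return can_mark ? SK_MARK : SK_DROP;
    }
    /* Above the maximum threshold: harddrop overrides ECN marking. */
    if ((flags & SK_HARDDROP) || !can_mark)
        return SK_DROP;
    return SK_MARK;
}
```

Without the harddrop flag, an ECN-capable packet above the maximum threshold is still only marked, so the queue keeps growing until the hard limit is hit.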
     
  • Removes the skb trimming code which is not needed since we never
    touch the skb upon failure. Removes unnecessary includes,
    initializers, and simplifies the code a bit. Removes Jamal's
    obsolete email addresses upon his own request.

    Signed-off-by: Thomas Graf
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Graf
     
  • We should not interrupt and restart an idle period while idling already.

    Signed-off-by: Thomas Graf
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Graf
     
  • Signed-off-by: Thomas Graf
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Graf
     
  • Simplifies code a lot by separating the red algorithm and the
    queueing logic. We now differentiate between probability marks
    and forced marks but sum them together again to not break
    backwards compatibility.

    Signed-off-by: Thomas Graf
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Graf
     

09 Jul, 2005

1 commit

  • This is part of the grand scheme to eliminate the qlen
    member of skb_queue_head, and subsequently remove the
    'list' member of sk_buff.

    Most users of skb_queue_len() want to know if the queue is
    empty or not, and that's trivially done with skb_queue_empty(),
    which doesn't use the skb_queue_head->qlen member and instead
    uses the queue list emptiness as the test.

    Signed-off-by: David S. Miller

    David S. Miller
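
Testing list emptiness without a length counter can be sketched with a circular doubly linked list. The type and function names are hypothetical; this is not the kernel's skb queue code, only the same idea in miniature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular list head, loosely modelled on a packet queue. */
struct qnode {
    struct qnode *next, *prev;
};

static void q_init(struct qnode *head)
{
    head->next = head->prev = head;
}

static void q_add_tail(struct qnode *head, struct qnode *n)
{
    assert(n != NULL);
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* No counter needed: an empty list's head points back at itself. */
static int q_empty(const struct qnode *head)
{
    return head->next == head;
}
```

Because the test reads only the head's next pointer, callers that just need "empty or not" no longer depend on a separately maintained length field.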
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds