06 Jul, 2011

1 commit


31 Mar, 2011

1 commit


02 Jun, 2010

1 commit


02 Apr, 2010

1 commit

  • One of my test machines got a deadlock during "tc" sessions,
    adding/deleting classes & filters, using traffic estimators.

    After some analysis, I believe we have a potential use after free case
    in est_timer() :

    spin_lock(e->stats_lock); << HERE >>
    read_lock(&est_lock);
    if (e->bstats == NULL) << TEST >>
    goto skip;

    The test is done a bit late: after the estimator is killed, and before
    the RCU grace period has elapsed, we might already have freed/reused the
    memory that e->stats_lock points to (some qdisc->q.lock).

    A possible fix is to respect an RCU grace period at Qdisc dismantle time.

    On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to
    it (for struct rcu_head) is a problem because it might change
    performance, given QDISC_ALIGNTO is 32 bytes.

    This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most
    current alignment requirements.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
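
The size arithmetic in this message can be checked with a small standalone sketch. The helpers `align_up` and `padded_size` are hypothetical names for illustration, not the kernel's own macros; the 192-byte and 16-byte figures are the ones quoted above, not measured here.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
 * align must be a power of two. Hypothetical helper, not a kernel macro. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Size of a structure of `base` bytes after growing by `extra` bytes
 * and being padded to a multiple of `align`. */
static size_t padded_size(size_t base, size_t extra, size_t align)
{
    return align_up(base + extra, align);
}
```

With the quoted figures, `padded_size(192, 16, 32)` gives 224 and `padded_size(192, 16, 64)` gives 256, while the unextended 192 bytes is already a multiple of both 32 and 64.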
     

29 Jan, 2010

1 commit

  • This adds an additional queuing strategy, called pfifo_head_drop,
    to remove the oldest skb in the case of an overflow within the queue -
    the head element - instead of the last skb (tail). Removing the oldest
    skb in congested situations is useful for sensor network environments,
    where newer packets carry the superior information.

    Reviewed-by: Florian Westphal
    Acked-by: Patrick McHardy
    Signed-off-by: Hagen Paul Pfeifer
    Signed-off-by: David S. Miller

    Hagen Paul Pfeifer
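
The head-drop policy described above can be illustrated with a userspace ring buffer. This is only a sketch of the idea, not the pfifo_head_drop code: `struct ring`, `RING_CAP` and both functions are hypothetical, and plain integers stand in for skbs.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 4  /* illustrative queue limit, not a kernel default */

struct ring {
    int           pkt[RING_CAP];  /* stand-in for queued skbs */
    size_t        head;           /* index of the oldest element */
    size_t        len;            /* number of queued elements */
    unsigned long drops;          /* elements discarded on overflow */
};

/* Enqueue with head-drop: on overflow, discard the oldest element
 * so that the newest one is always accepted. */
static void ring_enqueue_head_drop(struct ring *r, int pkt)
{
    assert(r->len <= RING_CAP);
    if (r->len == RING_CAP) {
        r->head = (r->head + 1) % RING_CAP;
        r->len--;
        r->drops++;
    }
    r->pkt[(r->head + r->len) % RING_CAP] = pkt;
    r->len++;
}

/* Dequeue the oldest element; returns 0 on success, -1 if empty. */
static int ring_dequeue(struct ring *r, int *out)
{
    if (r->len == 0)
        return -1;
    *out = r->pkt[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->len--;
    return 0;
}
```

A tail-drop queue would instead reject the new element when full, so under sustained overload it keeps delivering the oldest data first.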
     

04 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Aug, 2009

1 commit

  • dev_queue_xmit enqueues an skb and calls qdisc_run, which
    dequeues the skb and xmits it. In most cases, the skb that
    is enqueued is the same one that is dequeued (unless the
    queue gets stopped or multiple CPUs write to the same queue
    and end in a race with qdisc_run). For default qdiscs, we
    can remove the redundant enqueue/dequeue and simply xmit the
    skb since the default qdisc is work-conserving.

    The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
    default fast queue. The controversial part of the patch is
    incrementing qlen when a skb is requeued - this is to avoid
    checks like the second line below:

    + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
    >> !q->gso_skb &&
    + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {

    Results of 2 hours of testing with multiple netperf sessions (1,
    2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are
    aggregate Mb/s across iterations tested with this version on
    System-X boxes with Chelsio 10gbps cards:

    ----------------------------------
    Size | ORG BW (Mb/s) | NEW BW (Mb/s)
    ----------------------------------
    128K | 156964        | 159381
    256K | 158650        | 162042
    ----------------------------------

    Changes from ver1:

    1. Move sch_direct_xmit declaration from sch_generic.h to
    pkt_sched.h
    2. Update qdisc basic statistics for direct xmit path.
    3. Set qlen to zero in qdisc_reset.
    4. Changed some function names to more meaningful ones.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
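
The bypass condition quoted in this message can be modelled in a few lines. The sketch below is single-threaded and hypothetical throughout: `MODEL_F_CAN_BYPASS` stands in for TCQ_F_CAN_BYPASS, a plain `bool` stands in for the atomic __QDISC_STATE_RUNNING bit, and counters replace real transmission. It assumes, as the message describes, that requeued packets are counted in qlen, so a single qlen test is enough.

```c
#include <stdbool.h>

#define MODEL_F_CAN_BYPASS 0x1  /* hypothetical stand-in for TCQ_F_CAN_BYPASS */

struct qmodel {
    unsigned int flags;
    unsigned int qlen;     /* packets queued, including requeued ones */
    bool         running;  /* stand-in for __QDISC_STATE_RUNNING */
    unsigned int direct;   /* packets sent without touching the queue */
    unsigned int queued;   /* packets that took the enqueue path */
};

/* Send one packet: bypass the queue only if the qdisc allows it,
 * nothing is waiting ahead of this packet, and nobody else is running. */
static void qmodel_send(struct qmodel *q)
{
    if ((q->flags & MODEL_F_CAN_BYPASS) && q->qlen == 0 && !q->running) {
        q->running = true;   /* the kernel uses an atomic test-and-set here */
        q->direct++;         /* transmit directly */
        q->running = false;
    } else {
        q->qlen++;           /* fall back to the normal enqueue path */
        q->queued++;
    }
}
```

Bypassing only when qlen is zero preserves ordering: a packet never overtakes one that is already waiting.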
     

15 Jun, 2009

1 commit


09 Jun, 2009

2 commits

  • Change PSCHED_SHIFT from 10 to 6 to increase the schedulers' time
    resolution. This increases the number of (internal) ticks per
    nanosecond 16x, and is needed to improve the accuracy of schedulers
    based on rate tables, like HTB, TBF or CBQ, with rates above 100Mbit.
    It is assumed this change is safe for 32bit accounting of time diffs
    up to 2 minutes, which should be enough for common use (extremely low
    rate values may overflow and so become inaccurate). To make full
    use of this change an updated iproute2 will be needed. (But using
    older iproute2 should be safe too.)

    This change breaks the equivalence of ticks and microseconds, so some
    minor code fixes might be needed. It is also planned to rename things
    accordingly, e.g. to PSCHED_TICKS2NS() etc., in the near future.

    Reported-by: Antonio Almeida
    Tested-by: Antonio Almeida
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
    PSCHED_NS2US() macros to enable changing this value later.

    Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
    definitions. This part of the patch is based on feedback from
    Patrick McHardy.

    Reported-by: Antonio Almeida
    Tested-by: Antonio Almeida
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
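
The effect of the shift on resolution and range can be worked out with a short sketch. It assumes ticks are obtained by shifting nanoseconds right by PSCHED_SHIFT, and that time diffs are held in a signed 32-bit value; the second assumption is not stated in the messages above. The function names are hypothetical.

```c
#include <stdint.h>

/* Convert nanoseconds to scheduler ticks for a given shift. */
static uint64_t ns_to_ticks(uint64_t ns, unsigned int shift)
{
    return ns >> shift;
}

/* Nanoseconds represented by one tick. */
static uint64_t tick_ns(unsigned int shift)
{
    return (uint64_t)1 << shift;
}

/* Largest time difference, in whole seconds, that fits in a
 * signed 32-bit tick counter (assumed width, see above). */
static uint64_t max_diff_seconds(unsigned int shift)
{
    return ((uint64_t)INT32_MAX << shift) / 1000000000ULL;
}
```

Under these assumptions a tick shrinks from 1024 ns to 64 ns, which is the 16x increase in resolution, and the largest representable diff drops from about 2199 s to about 137 s, in line with the "up to 2 minutes" figure.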
     

01 Feb, 2009

1 commit


23 Sep, 2008

1 commit

  • The current check wrongly uses the state of one (currently the first)
    tx queue for all tx queues in case of non-default qdiscs. This check
    mainly prevented a requeuing loop with __netif_schedule(), but that is
    now controlled inside __qdisc_run(), while dequeuing. The wrongness of
    this check was first noticed by Herbert Xu.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

22 Aug, 2008

1 commit

  • Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
    without rtnl_lock(), adding and deleting from a qdisc list needs
    additional locking. This patch adds a global spinlock, qdisc_list_lock,
    and wrapper functions for modifying the list. It is considered a
    temporary solution until hfsc_dequeue(), netem_dequeue() and
    tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.

    With feedback from Herbert Xu and David S. Miller.

    Signed-off-by: Jarek Poplawski
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

13 Aug, 2008

1 commit

  • Based upon a bug report by Andrew Gallatin on netdev
    with subject "CPU utilization increased in 2.6.27rc"

    In commit 37437bb2e1ae8af470dfcd5b4ff454110894ccaf
    ("pkt_sched: Schedule qdiscs instead of netdev_queue.")
    the test of the queue being stopped was erroneously
    removed from qdisc_run().

    When the TX queue of the device fills up, this omission
    causes lots of extraneous useless work to be queued up
    to softirq context, where we'll just return immediately
    because the device is still stuffed up.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Jul, 2008

1 commit


18 Jul, 2008

3 commits

  • When we have shared qdiscs, packets come out of the qdiscs
    for multiple transmit queues.

    Therefore it doesn't make any sense to schedule the transmit
    queue when logically we cannot know ahead of time the TX
    queue of the SKB that the qdisc->dequeue() will give us.

    Just for sanity I added a BUG check to make sure we never
    get into a state where the noop_qdisc is scheduled.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently it is associated with a netdev_queue, but when we have
    qdisc sharing that no longer makes any sense.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This effectively "flips the switch" by making the core networking
    and multiqueue-aware drivers use the new TX multiqueue structures.

    Non-multiqueue drivers need no changes. The interfaces they use such
    as netif_stop_queue() degenerate into an operation on TX queue zero.
    So everything "just works" for them.

    Code that really wants to do "X" to all TX queues now invokes a
    routine that does so, such as netif_tx_wake_all_queues(),
    netif_tx_stop_all_queues(), etc.

    pktgen and netpoll required a little bit more surgery than the others.

    In particular the pktgen changes, whilst functional, could be largely
    improved. The initial check in pktgen_xmit() will sometimes check the
    wrong queue, which is mostly harmless. The thing to do is probably to
    invoke fill_packet() earlier.

    The bulk of the netpoll changes is to make the code operate solely on
    the TX queue indicated by the SKB queue mapping.

    Setting of the SKB queue mapping is entirely confined inside of
    net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
    special semantics (drops, for example) it will be implemented here.

    Finally, we now have a "real_num_tx_queues" which is where the driver
    indicates how many TX queues are actually active.

    With IGB changes from Jeff Kirsher.

    Signed-off-by: David S. Miller

    David S. Miller
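
The way single-queue helpers "degenerate into an operation on TX queue zero" can be pictured with a toy model. All types and function names below are hypothetical stand-ins, loosely named after netif_stop_queue() and netif_tx_stop_all_queues(); this is a sketch of the layering, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TXQ 8  /* illustrative upper bound */

struct txq_model {
    bool stopped;
};

struct netdev_model {
    struct txq_model tx[MODEL_MAX_TXQ];
    size_t num_tx_queues;       /* queues allocated for the device */
    size_t real_num_tx_queues;  /* queues the driver reports as active */
};

/* Per-queue primitive that multiqueue-aware code calls directly. */
static void model_tx_stop_queue(struct txq_model *q)
{
    q->stopped = true;
}

/* Legacy single-queue helper: simply acts on TX queue zero. */
static void model_stop_queue(struct netdev_model *dev)
{
    model_tx_stop_queue(&dev->tx[0]);
}

/* "Do X to all TX queues" helper. */
static void model_tx_stop_all_queues(struct netdev_model *dev)
{
    for (size_t i = 0; i < dev->num_tx_queues; i++)
        model_tx_stop_queue(&dev->tx[i]);
}
```

A driver with a single queue keeps calling the legacy helper unchanged, which is why non-multiqueue drivers need no modification in this model.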
     

09 Jul, 2008

2 commits


06 Jul, 2008

1 commit


29 Jan, 2008

1 commit

  • Convert packet schedulers to use the netlink API. Unfortunately a gradual
    conversion is not possible without breaking compilation in the middle or
    adding lots of casts, so this patch converts them all in one step. The
    patch has been mostly generated automatically with some minor edits to
    at least allow separate conversion of classifiers and actions.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

11 Oct, 2007

1 commit


15 Jul, 2007

1 commit

  • The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
    it to the qdisc, which could handle it internally or ignore it. With
    NET_CLS_ACT however, tc_classify starts over at the first classifier
    and never returns it to the qdisc. This makes it impossible to support
    qdisc-internal reclassification, which in turn makes it impossible to
    remove the old NET_CLS_POLICE code without breaking compatibility since
    we have two qdiscs (CBQ and ATM) that support this.

    This patch adds a tc_classify_compat function that handles
    reclassification the old way and changes CBQ and ATM to use it.

    This again is of course not fully backwards compatible with the previous
    NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
    compatibility *and* support qdisc internal reclassification with
    NET_CLS_ACT, but this seems like the better choice over keeping the two
    incompatible options around forever.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

26 Apr, 2007

11 commits


25 Jul, 2006

1 commit

  • In PSCHED_TADD and PSCHED_TADD2, if delta is less than tv.tv_usec (so,
    less than USEC_PER_SEC too) then tv_res will be smaller than tv. The
    assignment "(tv_res).tv_usec = __delta;" is wrong. The fix is to
    revert to the original code before
    4ee303dfeac6451b402e3d8512723d3a0f861857 and change the 'if' to a
    'while'.

    [Shuya MAEDA: "while (__delta >= USEC_PER_SEC){ ... }" instead of
    "while (__delta > USEC_PER_SEC){ ... }"]

    Signed-off-by: Guillaume Chazarain
    Signed-off-by: David S. Miller

    Guillaume Chazarain
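
The normalization loop this fix describes can be written as a plain function. This is a sketch of the technique rather than the PSCHED_TADD macro itself: `struct tv` and `tv_add_usec` are hypothetical, and delta is assumed to be non-negative.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

struct tv {              /* stand-in for struct timeval */
    long tv_sec;
    long tv_usec;
};

/* Return tv advanced by delta microseconds, with tv_usec normalized
 * into the range [0, USEC_PER_SEC). */
static struct tv tv_add_usec(struct tv tv, long delta)
{
    struct tv res;
    long usec = tv.tv_usec + delta;

    assert(delta >= 0);
    res.tv_sec = tv.tv_sec;
    /* 'while' rather than 'if': delta may span several seconds.
     * '>=' rather than '>': exactly one second must carry too. */
    while (usec >= USEC_PER_SEC) {
        res.tv_sec++;
        usec -= USEC_PER_SEC;
    }
    res.tv_usec = usec;
    return res;
}
```

The key point is that the microsecond field of the result is computed from the sum, never from delta alone, so the result cannot be earlier than the input.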
     

30 Jun, 2006

1 commit


20 Jun, 2006

1 commit

  • Having two or more qdisc_run's contend against each other is bad because
    it can induce packet reordering if the packets have to be requeued. It
    appears that this is an unintended consequence of relinquishing the queue
    lock while transmitting. That in turn is needed for devices that spend a
    lot of time in their transmit routine.

    There are no advantages to be had as devices with queues are inherently
    single-threaded (the loopback device is not but then it doesn't have a
    queue).

    Even if you were to add a queue to a parallel virtual device (e.g., bolt
    a tbf filter in front of an ipip tunnel device), you would still want to
    process the queue in sequence to ensure that the packets are ordered
    correctly.

    The solution here is to steal a bit from net_device to prevent this.

    BTW, as qdisc_restart is no longer used by anyone as a module inside the
    kernel (IIRC it used to with netif_wake_queue), I have not exported the
    new __qdisc_run function.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
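
The idea of using one bit to keep concurrent queue runs from contending can be sketched with C11 atomics. The names below are hypothetical and the model is deliberately minimal: an `atomic_flag` stands in for the bit taken from net_device, and a counter stands in for the real queue-processing loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dev_model {
    atomic_flag running;  /* stand-in for the bit taken from net_device */
    int         runs;     /* how many times the queue was processed */
};

static void model_init(struct dev_model *d)
{
    atomic_flag_clear(&d->running);
    d->runs = 0;
}

/* Stand-in for the real queue-processing loop. */
static void model_process_queue(struct dev_model *d)
{
    d->runs++;
}

/* Try to become the single runner. Returns false without doing any
 * work if another context already holds the bit. */
static bool model_try_run(struct dev_model *d)
{
    if (atomic_flag_test_and_set(&d->running))
        return false;
    model_process_queue(d);
    atomic_flag_clear(&d->running);
    return true;
}
```

A caller that loses the race simply returns; the context that holds the bit is responsible for draining the queue, so packets are still processed in sequence.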
     

10 Jan, 2006

1 commit


06 Jul, 2005

1 commit