07 Sep, 2017

1 commit


26 Aug, 2017

1 commit

  • For TC classes, their ->get() and ->put() are always paired, and the
    reference counting is completely useless, because:

    1) For class modification and dumping paths, we already hold the RTNL lock,
    so all of these ->get(), ->change(), ->put() sequences are atomic.

    2) For filter binding/unbinding, we use a different reference counter than
    this one, and those paths hold the RTNL lock too.

    3) For ->qlen_notify(), it is special because it is called on the ->enqueue()
    path, but we already hold the qdisc tree lock there, and we hold this
    tree lock when grafting or deleting the class too, so it should not be gone
    or changed until we release the tree lock.

    Therefore, this patch removes ->get() and ->put(), but:

    1) Adds a new ->find() to look up the pointer to a class by classid,
    without taking a refcnt.

    2) Moves the original destroy-on-last-refcnt logic into ->delete(),
    right after releasing the tree lock. This is fine because the class has
    already been removed from the hash while holding the lock.

    For qdiscs that also used ->put() as ->unbind(), the callback is simply
    renamed to reflect this change.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
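
    A rough before/after sketch of the class-ops change described above. The
    fragments below are illustrative, not a verbatim copy of the patch:

    /* Before: class lookups returned a refcounted handle. */
    struct Qdisc_class_ops {
            unsigned long   (*get)(struct Qdisc *sch, u32 classid);
            void            (*put)(struct Qdisc *sch, unsigned long cl);
            int             (*delete)(struct Qdisc *sch, unsigned long cl);
            /* ... */
    };

    /* After: a plain lookup with no reference counting. The final
     * destruction happens inside ->delete(), right after the qdisc tree
     * lock is released, since the class was already unhashed while the
     * lock was held. */
    struct Qdisc_class_ops {
            unsigned long   (*find)(struct Qdisc *sch, u32 classid);
            int             (*delete)(struct Qdisc *sch, unsigned long cl);
            /* ... */
    };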
     

17 Aug, 2017

1 commit

  • This callback is used for deactivating a class in the parent qdisc.
    It is cheaper to test the queue length right here.

    It also allows us to catch a screwed-up backlog while draining and to
    prevent a second deactivation of an already inactive parent class, which
    would crash the kernel for sure. The kernel will now print a warning at
    destruction of a child qdisc that has no packets but a non-zero backlog.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
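
    A minimal sketch of the resulting check in qdisc_tree_reduce_backlog()
    (simplified and slightly abridged, not the literal patch): the parent
    class is notified only when the child qdisc has actually become empty,
    and a screwed-up backlog is caught with a warning.

    while ((parentid = sch->parent)) {
            /* Notify the parent class only if the child became empty.
             * If it was empty even before this update, the backlog
             * counter is screwed and the parent class is already
             * passive, so skip the (second) deactivation. */
            bool notify = !sch->q.qlen && !WARN_ON_ONCE(!n);

            sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
            if (!sch)
                    break;
            cops = sch->ops->cl_ops;
            if (notify && cops->qlen_notify) {
                    cl = cops->get(sch, parentid);  /* ->find() once the class
                                                     * refcount removal above lands */
                    cops->qlen_notify(sch, cl);
                    cops->put(sch, cl);
            }
            sch->q.qlen -= n;
            sch->qstats.backlog -= len;
    }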
     

07 Jun, 2017

1 commit

  • There is a need to instruct the HW offloaded path to push certain matched
    packets to the cpu/kernel for further analysis. So this patch introduces a
    new TRAP control action to TC.

    For the kernel datapath this action does not make much sense, so with the
    same logic as in HW, the new TRAP behaves similarly to STOLEN. The skb is
    just dropped in the datapath (and virtually ejected to an upper level,
    which does not exist in the kernel case).

    Signed-off-by: Jiri Pirko
    Reviewed-by: Yotam Gigi
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Jiri Pirko
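
    In the software datapath the new verdict ends up in the same bucket as the
    other "skb is gone" results. A simplified sketch of such a switch (the
    tcf_classify() call and its argument names are illustrative):

    switch (tcf_classify(skb, filter, &res, false)) {
    case TC_ACT_SHOT:
            kfree_skb(skb);         /* dropped and accounted as a drop */
            return NULL;
    case TC_ACT_STOLEN:
    case TC_ACT_QUEUED:
    case TC_ACT_TRAP:               /* HW: punt packet to cpu; SW: like STOLEN */
            consume_skb(skb);
            return NULL;
    default:
            break;
    }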
     

18 May, 2017

2 commits

  • Currently, the filter chains are directly put into the private structures
    of qdiscs. In order to be able to have multiple chains per qdisc and to
    allow filter chains to be shared among qdiscs, there is a need for a common
    object that holds the chains. This patch introduces such an object and
    calls it "tcf_block".

    Helpers to get and put the blocks are provided, to be called from
    individual qdisc code. Also, the original filter_list pointers are left
    in the qdisc privs to allow entry into tcf_block processing without the
    added overhead of possible multiple pointer dereferences on the fast path.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
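
    A rough sketch of the object and the helpers as introduced here
    (simplified; the real structure grows more fields in later patches):

    /* Common holder for a qdisc's filter chain(s). */
    struct tcf_block {
            struct tcf_proto __rcu **p_filter_chain;   /* still points into
                                                        * the qdisc privs */
    };

    /* Called from individual qdisc init/destroy code. */
    int tcf_block_get(struct tcf_block **p_block,
                      struct tcf_proto __rcu **p_filter_chain);
    void tcf_block_put(struct tcf_block *block);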
     
  • Move the tc_classify function to cls_api.c, where it belongs, and rename
    it to fit the namespace.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

14 Apr, 2017

1 commit


13 Mar, 2017

1 commit

  • The original reason [1] for having hidden qdiscs (potential scalability
    issues in qdisc_match_from_root() with a single linked list in case of a
    large number of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched:
    convert qdisc linked list to hashtable").

    This allows us to bring more clarity and determinism into the dump by
    making default pfifo qdiscs visible.

    We're not turning this on by default though, as it was deemed [2] too
    intrusive / an unnecessary change of default behavior towards userspace.
    Instead, a TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
    applications to request a complete qdisc hierarchy dump, including the
    ones that have always been implicit/invisible.

    Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
    about singletons would require quite some surgery with very little gain
    (seeing no qdisc or seeing noop qdisc in the dump is probably setting
    the same user expectation).

    [1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com
    [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.net

    Signed-off-by: Jiri Kosina
    Signed-off-by: David S. Miller

    Jiri Kosina
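
    On the kernel side the dump handler only has to check for the new
    attribute and then stop skipping the invisible qdiscs. A hedged sketch of
    that logic (not a verbatim excerpt; the helper name follows mainline):

    bool dump_invisible = false;

    /* The dump request may carry an (empty) TCA_DUMP_INVISIBLE attribute. */
    if (tca[TCA_DUMP_INVISIBLE])
            dump_invisible = true;

    /* Per qdisc, inside the dump loop: default/implicit qdiscs stay
     * hidden unless the attribute was given. */
    if (tc_qdisc_dump_ignore(q, dump_invisible))
            continue;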
     

06 Dec, 2016

1 commit

  • 1) Old code was hard to maintain, due to complex lock chains.
    (We probably will be able to remove some kfree_rcu() in callers)

    2) Using a single timer to update all estimators does not scale.

    3) Code was buggy on 32bit kernels (WRITE_ONCE() on a 64bit quantity
    is not supposed to work well)

    In this rewrite:

    - I removed the RB tree that had to be scanned in
    gen_estimator_active(). qdisc dumps should be much faster.

    - Each estimator has its own timer.

    - Estimations are maintained in net_rate_estimator structure,
    instead of dirtying the qdisc. Minor, but part of the simplification.

    - Reading the estimator uses RCU and a seqcount to provide proper
    support for 32bit kernels.

    - We reduce memory need when estimators are not used, since
    we store a pointer, instead of the bytes/packets counters.

    - xt_rateest_mt() no longer has to grab a spinlock.
    (In the future, xt_rateest_tg() could be switched to per cpu counters)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
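
    The lockless read side described above boils down to the usual RCU plus
    seqcount retry pattern. A minimal sketch, with illustrative field and
    function names (the real estimator also scales the averages):

    struct net_rate_estimator {
            u64             avbps;          /* estimated bytes per second   */
            u64             avpps;          /* estimated packets per second */
            seqcount_t      seq;            /* lets 32bit readers get a
                                             * consistent 64bit snapshot    */
            struct rcu_head rcu;
            /* per-estimator timer, EWMA parameters, ... */
    };

    static bool rate_est_read_sketch(struct net_rate_estimator __rcu **ptr,
                                     struct gnet_stats_rate_est64 *sample)
    {
            struct net_rate_estimator *est;
            unsigned int seq;

            rcu_read_lock();
            est = rcu_dereference(*ptr);    /* NULL when no estimator is set */
            if (!est) {
                    rcu_read_unlock();
                    return false;
            }
            do {
                    seq = read_seqcount_begin(&est->seq);
                    sample->bps = est->avbps;
                    sample->pps = est->avpps;
            } while (read_seqcount_retry(&est->seq, seq));
            rcu_read_unlock();
            return true;
    }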
     

23 Sep, 2016

1 commit


26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, while one of FQ's goals was
    to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue() implementations and to the qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
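
    A hedged sketch of the mechanism: a dropped skb is chained onto a
    caller-provided to_free list instead of being freed under the qdisc lock,
    and the caller frees the whole chain once the lock is released (modelled
    on the __qdisc_drop() helper and its caller; simplified):

    /* Inside ->enqueue(): defer the actual kfree_skb(). */
    static inline void __qdisc_drop(struct sk_buff *skb,
                                    struct sk_buff **to_free)
    {
            skb->next = *to_free;
            *to_free = skb;
    }

    /* Caller side: */
    struct sk_buff *to_free = NULL;

    spin_lock(qdisc_lock(q));
    rc = q->enqueue(skb, q, &to_free);      /* new third parameter */
    spin_unlock(qdisc_lock(q));

    if (unlikely(to_free))
            kfree_skb_list(to_free);        /* expensive frees happen
                                             * outside the qdisc lock */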
     

11 Jun, 2016

1 commit

  • Conflicts:
    net/sched/act_police.c
    net/sched/sch_drr.c
    net/sched/sch_hfsc.c
    net/sched/sch_prio.c
    net/sched/sch_red.c
    net/sched/sch_tbf.c

    In net-next the drop methods of the packet schedulers got removed, so
    the bug fixes to them in 'net' are irrelevant.

    A packet action unload crash fix conflicts with the addition of the
    new firstuse timestamp.

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Jun, 2016

2 commits


08 Jun, 2016

1 commit

  • Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
    agent [1] are problematic at scale:

    For each qdisc/class found in the dump, we currently lock the root qdisc
    spinlock in order to get stats. Sampling stats every 5 seconds from
    thousands of HTB classes is a challenge when the root qdisc spinlock is
    under high pressure. Not only do the dumps take time, they also slow
    down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

    An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
    that might need the qdisc lock in fq_codel_dump_stats() and
    fq_codel_dump_class_stats().

    In v2 of this patch, I now use the Qdisc running seqcount to provide
    consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

    I also changed rate estimators to use the same infrastructure
    so that they no longer need to lock root qdisc lock.

    [1]
    http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

    Signed-off-by: Eric Dumazet
    Cc: Cong Wang
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Cc: Kevin Athey
    Cc: Xiaotian Pei
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Mar, 2016

2 commits

  • When the bottom qdisc decides to, for example, drop some packet,
    it calls qdisc_tree_decrease_qlen() to update the queue length
    for all its ancestors; we need to update the backlog too to
    keep the stats on the root qdisc accurate.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
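
    The helper signatures before and after this change, plus a typical caller
    (a sketch; the names are the ones that end up in mainline after this
    series):

    /* Old helper: only fixed up qlen on the ancestors. */
    void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n);

    /* New helper: fixes up both qlen and backlog (bytes). */
    void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
                                   unsigned int len);

    /* Typical caller, after dropping one packet of 'len' bytes
     * somewhere below the root: */
    qdisc_tree_reduce_backlog(sch, 1, len);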
     
  • Remove nearly duplicated code and prepare for the following patch.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

28 Aug, 2015

1 commit

  • For classifiers getting invoked via tc_classify(), we always need an
    extra function call into tc_classify_compat(), as both are being
    exported as symbols and tc_classify() itself doesn't do much except
    handling of reclassifications when tp->classify() returned with
    TC_ACT_RECLASSIFY.

    CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
    all others use tc_classify(). When tc actions are being configured
    out in the kernel, tc_classify() effectively does nothing besides
    delegating.

    We could spare this layer and consolidate both functions. pktgen on a
    single CPU constantly pushing skbs directly into the netif_receive_skb()
    path, with a dummy classifier attached on the ingress qdisc, improves
    slightly from 22.3Mpps to 23.1Mpps.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

16 Jul, 2015

1 commit

  • The member (u32) "num_active_agg" of struct qfq_sched has been unused
    since its introduction in 462dbc9101acd38e92eda93c0726857517a24bbd
    "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
    there is no active plan to use it; this removes the member.

    Signed-off-by: Andrea Parri
    Acked-by: Paolo Valente
    Signed-off-by: David S. Miller

    Andrea Parri
     

22 Jun, 2015

1 commit


30 Sep, 2014

4 commits

  • After the previous patches to simplify qstats, the qstats can be
    made per-cpu with a packed union in the Qdisc struct.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This removes the use of qstats->qlen variable from the classifiers
    and makes it an explicit argument to gnet_stats_copy_queue().

    The qlen represents the qdisc queue length and is packed into
    the qstats at the last moment before passing to user space. By
    handling it explicitly we avoid, in the percpu stats case, having
    to figure out which per_cpu variable to put it in.

    It would probably be best to remove it from qstats completely
    but qstats is a user space ABI and can't be broken. A future
    patch could make an internal only qstats structure that would
    avoid having to allocate an additional u32 variable on the
    Qdisc struct. This would make the qstats struct 128bits instead
    of 128+32.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
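
    The qlen now travels as an explicit argument instead of being patched into
    qstats right before the dump. A hedged before/after sketch (the call shown
    is the shape it ends up with once the per-cpu qstats patch lands too):

    /* Before: qlen had to be stuffed into the struct first. */
    sch->qstats.qlen = sch->q.qlen;
    gnet_stats_copy_queue(d, &sch->qstats);

    /* After: the struct stays untouched and qlen is passed explicitly,
     * which also works when the counters live in per-cpu storage. */
    gnet_stats_copy_queue(d, sch->cpu_qstats, &sch->qstats, sch->q.qlen);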
     
  • This adds helpers to manipulate qstats logic and replaces locations
    that touch the counters directly. This simplifies future patches
    to push qstats onto per cpu counters.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • In order to run qdiscs without locking, statistics and estimators
    need to be handled correctly.

    To resolve this for bstats, make the statistics per-cpu. And because this
    is only needed for qdiscs that run without locks, which will not be the
    case for most qdiscs in the near future, only create percpu stats when a
    qdisc sets the TCQ_F_CPUSTATS flag.

    Next, because estimators use the bstats to calculate packets per
    second and bytes per second, the estimator code paths are updated
    to use the per-cpu statistics.

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
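
    A hedged sketch of how the flag steers the stats handling (helper names
    follow mainline; allocation and error handling simplified):

    /* Setup: only lockless qdiscs ask for per-cpu counters. */
    if (sch->flags & TCQ_F_CPUSTATS) {
            sch->cpu_bstats = alloc_percpu(struct gnet_stats_basic_cpu);
            if (!sch->cpu_bstats)
                    return -ENOMEM;
    }

    /* Datapath update: */
    if (qdisc_is_percpu_stats(sch))
            qdisc_bstats_cpu_update(sch, skb);  /* this_cpu counters, no lock */
    else
            qdisc_bstats_update(sch, skb);      /* classic, under qdisc lock */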
     

14 Sep, 2014

1 commit

  • RCU'ify tcf_proto; this allows calling tc_classify() without holding
    any locks. Updaters are protected by RTNL.

    This patch prepares the core net_sched infrastructure for running
    the classifier/action chains without holding the qdisc lock, however
    it does nothing to ensure cls_xxx and act_xxx types also work without
    locking. Additional patches are required to address the fallout.

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
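
    With tcf_proto RCU-ified, the classifier walk on the fast path becomes an
    RCU-protected traversal, while updates stay serialized by RTNL and publish
    new elements with rcu_assign_pointer(). A simplified sketch of the reader
    (the filter_list location is illustrative):

    const struct tcf_proto *tp;
    int err;

    for (tp = rcu_dereference_bh(q->filter_list); tp;
         tp = rcu_dereference_bh(tp->next)) {
            err = tp->classify(skb, tp, res);
            if (err >= 0)
                    return err;     /* filter matched, verdict in 'res' */
    }
    return -1;                      /* no match */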
     

19 Jul, 2013

1 commit

  • QFQ+ inherits from QFQ a design choice that may cause a high packet
    delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
    special quantity, the system virtual time, to track the service
    provided by the ideal system it approximates. When a packet is
    dequeued, this quantity must be incremented by the size of the packet,
    divided by the sum of the weights of the aggregates waiting to be
    served. Tracking this sum correctly is a non-trivial task, because, to
    preserve tight service guarantees, the decrement of this sum must be
    delayed in a special way [1]: this sum can be decremented only after
    its value would have decreased also in the ideal system approximated by
    QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
    weight sum, increased and decreased immediately as the weight of an
    aggregate changes, and as an aggregate is created or destroyed (which,
    in its turn, happens as a consequence of some class being
    created/destroyed/changed). However, to avoid the problems caused to
    service guarantees by these immediate decreases, QFQ+ increments the
    system virtual time using the maximum value allowed for the weight
    sum, 2^10, in place of the dynamic, instantaneous value. The
    instantaneous value of the weight sum is used only to check whether a
    request of weight increase or a class creation can be satisfied.

    Unfortunately, the problems caused by this choice are worse than the
    temporary degradation of the service guarantees that may occur, when a
    class is changed or destroyed, if the instantaneous value of the
    weight sum was used to update the system virtual time. In fact, the
    fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
    equal to the ratio between the weight of the aggregate and the sum of
    the weights of the competing aggregates. The packet delay guaranteed
    to the aggregate is instead inversely proportional to the guaranteed
    bandwidth. By using the maximum possible value, and not the actual
    value of the weight sum, QFQ+ provides each aggregate with the worst
    possible service guarantees, and not with service guarantees related
    to the actual set of competing aggregates. To see the consequences of
    this fact, consider the following simple example.

    Suppose that only the following aggregates are backlogged, i.e., that
    only the classes in the following aggregates have packets to transmit:
    one aggregate with weight 10, say A, and ten aggregates with weight 1,
    say B1, B2, ..., B10. In particular, suppose that these aggregates are
    always backlogged. Given the weight distribution, the smoothest and
    fairest service order would be:
    A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

    QFQ+ would provide exactly this optimal service if it used the actual
    value for the weight sum instead of the maximum possible value, i.e.,
    11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
    serves aggregates as follows (easy to prove and to reproduce
    experimentally):
    A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

    By replacing 10 with N in the above example, and by increasing N, one
    can increase at will the maximum packet delay and the jitter
    experienced by the classes in aggregate A.

    This patch addresses this issue by just using the above
    'instantaneous' value of the weight sum, instead of the maximum
    possible value, when updating the system virtual time. After the
    instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
    service for a time interval in the order of the time to serve one
    maximum-size packet for each backlogged class. The worst-case extent
    of the deviation exhibited by QFQ+ during this time interval [1] is
    basically the same as of the deviation described above (but, without
    this patch, QFQ+ suffers from such a deviation all the time). Finally,
    this patch modifies the comment to the function qfq_slot_insert, to
    make it coherent with the fact that the weight sum used by QFQ+ can
    now be lower than the maximum possible value.

    [1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
    Proceedings of AAA-IDEA'05, June 2005.

    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
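
    In code terms the change is about the denominator used when pushing the
    system virtual time forward on dequeue. A hedged sketch (the real
    implementation uses a precomputed fixed-point inverse of the weight sum
    rather than a division per packet):

    /* On dequeue of a packet of 'len' bytes: V += len / weight_sum. */

    /* Before: always divide by the maximum possible weight sum (2^10),
     * i.e. assume the worst case regardless of what is backlogged. */
    q->V += (u64)len * ONE_FP / QFQ_MAX_WSUM;

    /* After: divide by the instantaneous sum of the weights of the
     * aggregates with backlogged classes. */
    q->V += (u64)len * ONE_FP / q->wsum;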
     

12 Jul, 2013

2 commits

  • This patch removes the forward declaration of qfq_update_agg_ts, by moving
    the definition of the function above its first call. This patch also
    removes a useless forward declaration of qfq_schedule_agg.

    Reported-by: David S. Miller
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • In make_eligible, a mask is used to decide which groups must become eligible:
    the i-th group becomes eligible only if the i-th bit of the mask (from the
    right) is set. The mask is computed by left-shifting a 1 by a given number of
    places, and decrementing the result. The shift is performed on a ULL to avoid
    problems in case the number of places to shift is higher than 31. On a 32-bit
    machine, this is more costly than working on a UL. This patch replaces such a
    costly operation with two cheaper branches.

    The trick is based on the following fact: in case of a shift of at least 32
    places, the resulting mask has at least the 32 least significant bits set,
    whereas the total number of groups is lower than 32. As a consequence, in this
    case it is enough to just set the 32 least significant bits of the mask with a
    cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

    Reported-by: David S. Miller
    Reported-by: David Laight
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
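
    A self-contained sketch of the trick (standalone C, not a verbatim excerpt
    from sch_qfq.c):

    #include <stdio.h>

    /* Build the group mask without ever shifting a 32-bit unsigned long by
     * 32 or more places (which is undefined behaviour). Since there are
     * fewer than 32 groups, a shift of >= 32 just means "all bits that can
     * matter are set". */
    static unsigned long group_mask(unsigned int shift)
    {
            if (shift >= 32)
                    return ~0UL;               /* 32 LSBs (and more) set */
            return (1UL << shift) - 1;         /* safe shift on a UL */
    }

    int main(void)
    {
            printf("%lx\n", group_mask(5));    /* 1f */
            printf("%lx\n", group_mask(40));   /* at least ffffffff */
            return 0;
    }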
     

11 Jun, 2013

1 commit

  • struct gnet_stats_rate_est contains u32 fields, so the bytes per second
    field can wrap at 34360Mbit.

    Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
    and switch the kernel to use this structure natively.

    This structure is dumped to user space as a new attribute:

    TCA_STATS_RATE_EST64

    The old tc command will now display the bps capped at 34360Mbit instead
    of wrapped values, and an updated tc command will display the correct
    information.

    Old tc command output, after the patch:

    eric:~# tc -s -d qd sh dev lo
    qdisc pfifo 8001: root refcnt 2 limit 1000p
    Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
    rate 34360Mbit 189696pps backlog 0b 0p requeues 0

    This patch carefully reorganizes "struct Qdisc" layout to get optimal
    performance on SMP.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
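
    For reference, the two uapi structures side by side; the 64-bit one is the
    payload of the new TCA_STATS_RATE_EST64 attribute (sketch of the layout
    described above):

    /* Old, 32-bit: bps wraps at 2^32 bytes/sec, i.e. ~34360Mbit. */
    struct gnet_stats_rate_est {
            __u32   bps;    /* bytes per second   */
            __u32   pps;    /* packets per second */
    };

    /* New, 64-bit: */
    struct gnet_stats_rate_est64 {
            __u64   bps;
            __u64   pps;
    };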
     

06 Mar, 2013

6 commits

  • QFQ+ can select for service only 'eligible' aggregates, i.e.,
    aggregates that would have started to be served also in the emulated
    ideal system. As a consequence, for QFQ+ to be work conserving, at
    least one of the active aggregates must be eligible when it is time to
    choose the next aggregate to serve.

    The set of eligible aggregates is updated through the function
    qfq_update_eligible(), which does guarantee that, after its
    invocation, at least one of the active aggregates is eligible.
    Because of this property, this function is invoked in
    qfq_deactivate_agg() to guarantee that at least one of the active
    aggregates is still eligible after an aggregate has been deactivated.
    In particular, the critical case is when there are other active
    aggregates, but the aggregate being deactivated happens to be the only
    one eligible.

    However, this precaution is not needed for QFQ+ to be work conserving,
    because update_eligible() is always invoked also at the beginning of
    qfq_choose_next_agg(). This patch removes the additional invocation of
    update_eligible() in qfq_deactivate_agg().

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • By definition of (the algorithm of) QFQ+, the system virtual time must
    be pushed up only if there is no 'eligible' aggregate, i.e. no
    aggregate that would have started to be served also in the ideal
    system emulated by QFQ+. QFQ+ serves only eligible aggregates, hence
    the aggregate currently in service is eligible. As a consequence, to
    decide whether there is no eligible aggregate, QFQ+ must also check
    whether there is no aggregate in service.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • Aggregate budgets are computed so as to guarantee that, after an
    aggregate has been selected for service, that aggregate has enough
    budget to serve at least one maximum-size packet for the classes it
    contains. For this reason, after a new aggregate has been selected
    for service, its next packet is immediately dequeued, without any
    further control.

    The maximum packet size for a class, lmax, can be changed through
    qfq_change_class(). In case the user sets lmax to a lower value than
    the size of some of the still-to-arrive packets, QFQ+ will
    automatically push up lmax as it enqueues these packets. This
    automatic push up is likely to happen with TSO/GSO.

    In any case, if lmax is assigned a lower value than the size of some
    of the packets already enqueued for the class, then the following
    problem may occur: the size of the next packet to dequeue for the
    class may happen to be larger than lmax, after the aggregate to which
    the class belongs has just been selected for service. In this case,
    even the budget of the aggregate, which is an unsigned value, may be
    lower than the size of the next packet to dequeue. After dequeueing
    this packet and subtracting its size from the budget, the latter would
    wrap around.

    This fix prevents the budget from wrapping around after any packet
    dequeue.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
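
    A minimal sketch of the guard on the dequeue path (illustrative; the real
    fix operates on the in-service aggregate in qfq_dequeue()):

    unsigned int len = qdisc_pkt_len(skb);

    /* 'budget' is unsigned: subtracting a packet larger than the remaining
     * budget would wrap around to a huge value. Clamp instead. */
    if (unlikely(len > agg->budget))
            agg->budget = 0;
    else
            agg->budget -= len;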
     
  • If no aggregate is in service, then the function qfq_dequeue() does
    not dequeue any packet. For this reason, to guarantee QFQ+ to be work
    conserving, a just-activated aggregate must be set as in service
    immediately if it happens to be the only active aggregate.
    This is done by the function qfq_enqueue().

    Unfortunately, the function qfq_add_to_agg(), used to add a class to
    an aggregate, does not perform this important additional operation.
    In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
    of a class from a source aggregate, becoming, for this move, inactive,
    to a destination aggregate, becoming instead active, and 2) the
    destination aggregate becomes the only active aggregate, then this
    aggregate is not however set as in service. QFQ+ remains then in a
    non-work-conserving state until a new invocation of qfq_enqueue()
    recovers the situation.

    This fix solves the problem by moving the logic for setting an
    aggregate as in service directly into the function qfq_activate_agg().
    Hence, from whatever point qfq_activate_agg() is invoked, QFQ+
    remains work conserving. Since the more complex logic of this new
    version of qfq_activate_agg() is not needed in qfq_dequeue() to
    reschedule an aggregate that finishes its budget, that aggregate is
    now rescheduled there by invoking the needed functions directly.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • Between two invocations of make_eligible, the system virtual time may
    happen to grow enough that, in its binary representation, a bit with
    higher order than 31 flips. This happens especially with
    TSO/GSO. Before this fix, the mask used in make_eligible was computed by
    left-shifting 1UL and decrementing the result, which is undefined on a
    32-bit architecture when the shift amount is higher than 31.
    The fix just replaces 1UL with 1ULL.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • QFQ+ schedules the active aggregates in a group using a bucket list
    (one list per group). The bucket in which each aggregate is inserted
    depends on the aggregate's timestamps, and the number
    of buckets in a group is enough to accommodate the possible (range of)
    values of the timestamps of all the aggregates in the group. For this
    property to hold, timestamps must however be computed correctly. One
    necessary condition for computing timestamps correctly is that the
    number of bits dequeued for each aggregate, while the aggregate is in
    service, does not exceed the maximum budget budgetmax assigned to the
    aggregate.

    For each aggregate, budgetmax is proportional to the number of classes
    in the aggregate. If the number of classes of the aggregate is
    decreased through qfq_change_class(), then budgetmax is decreased
    automatically as well. Problems may occur if the aggregate is in
    service when budgetmax is decreased, because the current remaining
    budget of the aggregate and/or the service already received by the
    aggregate may happen to be larger than the new value of budgetmax. In
    this case, when the aggregate is eventually deselected and its
    timestamps are updated, the aggregate may happen to have received an
    amount of service larger than budgetmax. This may cause the aggregate
    to be assigned a higher virtual finish time than the maximum
    acceptable value for the last bucket in the bucket list of the group.

    This fix introduces a cap that addresses this issue.

    Signed-off-by: Paolo Valente
    Reviewed-by: Fabio Checconi
    Signed-off-by: David S. Miller

    Paolo Valente
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. While the list ones are like:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
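
    How the change looks in net/sched code that walks a class hash (sketch;
    do_something() is a placeholder):

    struct htb_class *cl;
    struct hlist_node *n;           /* extra cursor the old form required */
    unsigned int i;

    /* Before: */
    for (i = 0; i < q->clhash.hashsize; i++)
            hlist_for_each_entry(cl, n, &q->clhash.hash[i], common.hnode)
                    do_something(cl);

    /* After: same shape as list_for_each_entry(), the 'n' cursor is gone. */
    for (i = 0; i < q->clhash.hashsize; i++)
            hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode)
                    do_something(cl);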
     

29 Nov, 2012

1 commit

  • This patch turns QFQ into QFQ+, a variant of QFQ that provides the
    following two benefits: 1) QFQ+ is faster than QFQ, 2) differently
    from QFQ, QFQ+ also correctly schedules non-leaf classes in a
    hierarchical setting. A detailed description of QFQ+, plus a
    performance comparison with DRR and QFQ, can be found in [1].

    [1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
    http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf

    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
     

08 Nov, 2012

1 commit

  • If the max packet size for some class (configured through tc) is
    violated by the actual size of the packets of that class, then QFQ
    would not schedule classes correctly, and the data structures
    implementing the bucket lists may get corrupted. This problem occurs
    with TSO/GSO even if the max packet size is set to the MTU, and is,
    e.g., the cause of the failure reported in [1]. Two patches have been
    proposed to solve this problem in [2], one of them is a preliminary
    version of this patch.

    This patch addresses the above issues by: 1) setting QFQ parameters to
    proper values for supporting TSO/GSO (in particular, setting the
    maximum possible packet size to 64KB), 2) automatically increasing the
    max packet size for a class, lmax, when a packet with a larger size
    than the current value of lmax arrives.

    The drawback of the first point is that the maximum weight for a class
    is now limited to 4096, which is equal to 1/16 of the maximum weight
    sum.

    Finally, this patch also forcibly caps the timestamps of a class if
    they are too high to be stored in the bucket list. This capping, taken
    from QFQ+ [3], handles the infrequent case described in the comment to
    the function slot_insert.

    [1] http://marc.info/?l=linux-netdev&m=134968777902077&w=2
    [2] http://marc.info/?l=linux-netdev&m=135096573507936&w=2
    [3] http://marc.info/?l=linux-netdev&m=134902691421670&w=2

    Signed-off-by: Paolo Valente
    Tested-by: Cong Wang
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Paolo Valente
     

29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Sep, 2012

1 commit

  • GCC refuses to recognize that all error control flows do in fact
    set err to something.

    Add an explicit initialization to shut it up.

    net/sched/sch_drr.c: In function ‘drr_enqueue’:
    net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    net/sched/sch_qfq.c: In function ‘qfq_enqueue’:
    net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]

    Signed-off-by: David S. Miller

    David S. Miller