26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time, because the drops (kfree_skb()) are done while the qdisc lock
    is held, delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time when we would like dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, even though one of FQ's goals
    was to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue() methods and to the qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Jun, 2016

2 commits

  • __QDISC_STATE_THROTTLED bit manipulation is rather expensive
    for HTB and few others.

    I already removed it for sch_fq in commit f2600cf02b5b
    ("net: sched: avoid costly atomic operation in fq_dequeue()")
    and so far nobody complained.

    When one or more packets are stuck in one or more throttled
    HTB classes, an HTB dequeue() performs two atomic operations
    to clear/set the __QDISC_STATE_THROTTLED bit, while the root
    qdisc lock is held.

    Removing this pair of atomic operations brings an 8 % performance
    increase on 200 TCP_RR tests, in the presence of throttled classes.

    This patch has no side effect, since nothing actually uses
    qdisc_is_throttled() anymore.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Conflicts:
    net/sched/act_police.c
    net/sched/sch_drr.c
    net/sched/sch_hfsc.c
    net/sched/sch_prio.c
    net/sched/sch_red.c
    net/sched/sch_tbf.c

    In net-next the drop methods of the packet schedulers got removed, so
    the bug fixes to them in 'net' are irrelevant.

    A packet action unload crash fix conflicts with the addition of the
    new firstuse timestamp.

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Jun, 2016

2 commits

  • After the removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler,
    there are no more callers of ->drop() outside of other ->drop()
    functions, i.e. nothing calls them.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • After the removal of TCA_CBQ_POLICE in the cbq scheduler,
    qdisc->reshape_fail is always NULL, i.e. qdisc_reshape_fail() is now
    the same as qdisc_drop().

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     


01 Mar, 2016

2 commits

  • When the bottom qdisc decides to, for example, drop some packet,
    it calls qdisc_tree_decrease_qlen() to update the queue length
    for all its ancestors. We need to update the backlog too, to
    keep the stats on the root qdisc accurate.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Remove nearly duplicated code and prepare for the following patch.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

06 Oct, 2014

1 commit

  • The standard qdisc API to set up a timer implies an atomic operation
    on every packet dequeue: qdisc_unthrottled().

    It turns out this is not really needed for FQ, as FQ has no concept of
    global qdisc throttling: it is a qdisc handling many different flows,
    some of which can be throttled while others are not.

    The fix is straightforward: add a 'bool throttle' parameter to
    qdisc_watchdog_schedule_ns(), and remove the calls to
    qdisc_unthrottled() in sch_fq.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     


06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2014

1 commit

  • On x86_64 we have 3 holes in struct tbf_sched_data.

    The member peak_present can be replaced with peak.rate_bytes_ps,
    because peak.rate_bytes_ps is set only when a peak is specified in
    tbf_change(). tbf_peak_present() is introduced to test
    peak.rate_bytes_ps.

    The member max_size is moved to fill a 32bit hole.

    Signed-off-by: Hiroaki SHIMODA
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     

28 Feb, 2014

1 commit

  • The allocated child qdisc is not freed in error conditions.
    Defer the allocation until the user configuration turns out to be
    valid and acceptable.

    Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
    Signed-off-by: Hiroaki SHIMODA
    Cc: Yang Yingliang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     


27 Dec, 2013

1 commit

  • When we set burst to 1514 with a low rate in userspace,
    the kernel gets a burst value of less than 1514,
    which doesn't work.

    This is because userspace loses precision when transforming
    burst into buffer (time units), so burst loses some bytes
    when the kernel transforms the buffer back into burst.

    This patch adds two new attributes to support sending
    burst/mtu to the kernel directly, avoiding the loss.

    Signed-off-by: Yang Yingliang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yang Yingliang
     

12 Dec, 2013

2 commits

  • psched_ns_t2l() does a 64-bit divide, which is not supported
    on 32-bit architectures. The correct way to do this is to
    use do_div().

    The problem was introduced by commit cc106e441a63
    ("net: sched: tbf: fix the calculation of max_size")

    Reported-by: kbuild test robot
    Signed-off-by: Yang Yingliang
    Signed-off-by: David S. Miller

    Yang Yingliang
     
  • Currently max_size is calculated from the rate table. Now that the
    rate table has been replaced, it is wrong to calculate max_size based
    on it; doing so can lead to a wrong max_size.

    The burst in the kernel may be lower than the user asked for, because
    the burst may lose precision when transformed into buffer time
    (e.g. "burst 40kb rate 30mbit/s"), and it seems we cannot avoid this
    loss. But a burst value (max_size) based on the rate table may equal
    what the user asked for. If a packet's length is max_size, that packet
    will stall in tbf_dequeue(), because its length is above the burst in
    the kernel, so it can never get enough tokens. max_size guards against
    enqueuing packet sizes above q->buffer "time" in tbf_enqueue().

    To stay consistent with the token calculation, this patch adds a
    helper, psched_ns_t2l(), that calculates the burst (max_size)
    directly, fixing this problem.

    After this fix, we can also support 64bit rates when calculating the
    burst.

    Signed-off-by: Yang Yingliang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yang Yingliang
     

24 Nov, 2013

1 commit

  • If too small a burst is inadvertently set on TBF, we might trigger
    a bug in tbf_segment(), as 'skb' was used instead of 'segs' in a
    qdisc_reshape_fail() call:

    tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate 50mbit

    Fix the bug, and add a warning, as such a configuration is not
    going to work anyway for non-GSO packets.

    (For some reason, one has to use a burst >= 1520 to get a working
    configuration, even with old kernels. This is probably an iproute2/tc
    bug.)

    Based on a report and initial patch from Yang Yingliang

    Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets")
    Signed-off-by: Eric Dumazet
    Reported-by: Yang Yingliang
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Nov, 2013

1 commit

  • With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
    Add two new attributes so that tc can use them to break the 32bit
    limit.

    Signed-off-by: Yang Yingliang
    Suggested-by: Sergei Shtylyov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yang Yingliang
     

21 Sep, 2013

1 commit

  • Add an extra u64 rate parameter to psched_ratecfg_precompute()
    so that some qdiscs can opt in to 64bit rates in the future,
    to overcome the ~34 Gbit limit.

    psched_ratecfg_getrate() reports a legacy structure to the
    tc utility, so if the actual rate is above the 32bit rate field,
    cap it to the 34Gbit limit.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Jun, 2013

1 commit

  • Merge 'net' bug fixes into 'net-next' as we have patches
    that will build on top of them.

    This merge commit includes a change from Emil Goode
    (emilgoode@gmail.com) that fixes a warning that would
    have been introduced by this merge. Specifically it
    fixes the pingv6_ops method ipv6_chk_addr() to add a
    "const" to the "struct net_device *dev" argument and
    likewise update the dummy_ipv6_chk_addr() declaration.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jun, 2013

1 commit

  • commit 56b765b79 ("htb: improved accuracy at high rates")
    broke the "overhead xxx" handling, as well as the "linklayer atm"
    attribute.

    tc class add ... htb rate X ceil Y linklayer atm overhead 10

    This patch restores the "overhead xxx" handling, for htb, tbf
    and act_police

    The "linklayer atm" thing needs a separate fix.

    Reported-by: Jesper Dangaard Brouer
    Signed-off-by: Eric Dumazet
    Cc: Vimalkumar
    Cc: Jiri Pirko
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2013

1 commit

  • If a GSO packet has a length above the tbf burst limit, the packet
    is currently silently dropped.

    The current way to handle this is to put the device in non-GSO/TSO
    mode or to set high bursts, which is suboptimal.

    We can instead segment too-big GSO packets and send the individual
    segments as the tbf parameters allow, giving better interoperability.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Cc: Jiri Pirko
    Cc: Jamal Hadi Salim
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Feb, 2013

1 commit

  • Current TBF uses rate table computed by the "tc" userspace program,
    which has the following issue:

    The rate table has 256 entries to map packet lengths to
    token (time units). With TSO sized packets, the 256 entry granularity
    leads to loss/gain of rate, making the token bucket inaccurate.

    Thus, instead of relying on rate table, this patch explicitly computes
    the time and accounts for packet transmission times with nanosecond
    granularity.

    This is a followup to 56b765b79e9a78dc7d3f8850ba5e5567205a3ecd
    ("htb: improved accuracy at high rates").

    Signed-off-by: Jiri Pirko
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jiri Pirko
     


30 Dec, 2011

1 commit

  • Provide child qdisc backlog (byte count) information so that
    "tc -s qdisc" can report it to the user.

    qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
    Sent 948517 bytes 898 pkt (dropped 0, overlimits 0 requeues 1)
    rate 175056bit 16pps backlog 114b 1p requeues 1
    qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
    Sent 948517 bytes 898 pkt (dropped 15, overlimits 611 requeues 0)
    backlog 18168b 12p requeues 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     


21 Jan, 2011

2 commits

  • In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed
    a problem with pfifo_head drops that incorrectly decreased
    sch->bstats.bytes and sch->bstats.packets

    Several qdiscs (CHOKe, SFQ, pfifo_head_drop, ...) are able to drop a
    previously enqueued packet, and bstats cannot be adjusted, so
    bstats/rates are not accurate (overestimated).

    This patch changes the qdisc_bstats updates to be done at dequeue()
    time instead of enqueue() time. bstats counters no longer account for
    dropped frames, and rates are more correct, since enqueue() bursts
    don't affect the dequeue() rate.

    Signed-off-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit
    ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
    the cache line containing qdisc lock and often dirtied fields.

    I now move the TCQ_F_THROTTLED bit too, so that the first cache line
    stays read-mostly and shared by all cpus. This should speed up
    HTB/CBQ, for example.

    Not using test_bit()/__clear_bit()/__test_and_set_bit() allows using
    an "unsigned int" for the __state container, reducing Qdisc size by
    8 bytes.

    Introduce helpers to hide implementation details.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Jesper Dangaard Brouer
    CC: Jarek Poplawski
    CC: Jamal Hadi Salim
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     


11 Jan, 2011

1 commit

  • HTB takes into account that an skb may be segmented when updating
    stats. Generalize this to all schedulers.

    They should use the qdisc_bstats_update() helper instead of
    manipulating bstats.bytes and bstats.packets directly.

    Add bstats_update() helper too for classes that use
    gnet_stats_basic_packed fields.

    Note: Right now, the TCQ_F_CAN_BYPASS shortcut can only be taken if
    no stab is set up on the qdisc.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     


06 Sep, 2009

3 commits

  • The class argument to ->graft(), ->leaf(), ->dump() and
    ->dump_stats() always originates from either ->get() or ->walk()
    and is always valid.

    Remove unnecessary checks.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some schedulers don't support creating, changing or deleting classes.
    Make the respective callbacks optional and consistently return
    -EOPNOTSUPP for unsupported operations, instead of the current mix of
    -EOPNOTSUPP, -ENOSYS, or no error at all.

    In the case of sch_prio and sch_multiq, the removed operations
    additionally checked for an invalid class. This is not necessary,
    since the class argument can only originate from ->get(), or, in the
    case of ->change(), is 0 for creation of new classes, in which case
    ->change() incorrectly returned -ENOENT.

    As a side-effect, this patch fixes a possible (root-only) NULL pointer
    function call in sch_ingress, which didn't implement a so far mandatory
    ->delete() operation.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some qdiscs don't support attaching filters. Handle this centrally in
    cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

22 Mar, 2009

1 commit

  • tcp_sack_swap seems unnecessary, so I pushed the swap to the caller.
    Also removed a comment that then seemed pointless, and added the
    include where not already present. Compile tested.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen