26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, although one of FQ's goals was
    to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue() handlers and to the qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
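
The deferred-drop idea described in this commit can be sketched in user space. This is a minimal illustration, not the kernel code: `struct pkt`, `struct fifo`, `fifo_enqueue()` and `free_deferred()` are hypothetical names, and the lock is only implied by the comments. The enqueue path never frees a rejected packet; it chains it onto `*to_free`, and the caller frees the chain once the lock would have been released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only the list linkage matters. */
struct pkt {
    struct pkt *next;
    int len;
};

/* Hypothetical bounded FIFO standing in for a qdisc. */
struct fifo {
    struct pkt *head, *tail;
    int qlen, limit;
};

/* Meant to run with the (imaginary) queue lock held: it never frees,
 * it only chains a rejected packet onto *to_free for later. */
static int fifo_enqueue(struct fifo *q, struct pkt *p, struct pkt **to_free)
{
    assert(q && p && to_free);
    if (q->qlen >= q->limit) {
        p->next = *to_free;
        *to_free = p;
        return -1;              /* dropped, free deferred */
    }
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    return 0;                   /* queued */
}

/* Meant to run after the lock is released: frees a chain of packets
 * and returns how many were freed. */
static int free_deferred(struct pkt *list)
{
    int n = 0;
    while (list) {
        struct pkt *next = list->next;
        free(list);
        list = next;
        n++;
    }
    return n;
}
```

A caller would take the lock, call `fifo_enqueue()` with the address of a local `to_free` pointer initialised to NULL, drop the lock, and only then call `free_deferred(to_free)`, so the cost of freeing no longer delays a concurrent dequeue.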
     

09 Jun, 2016

1 commit

  • After the removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler, there
    are no more callers of ->drop() outside of other ->drop() functions,
    i.e. nothing calls them.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

19 Aug, 2015

1 commit

  • Those were all workarounds for the former double meaning of
    tx_queue_len, which broke scheduling algorithms if untreated.

    Now that all in-tree drivers have been converted away from setting
    tx_queue_len = 0, it should be safe to drop these workarounds for
    categorically broken setups.

    Signed-off-by: Phil Sutter
    Cc: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Phil Sutter
     

14 May, 2015

1 commit

  • Four minor merge conflicts:

    1) qca_spi.c renamed the local variable used for the SPI device
    from spi_device to spi, meanwhile the spi_set_drvdata() call
    got moved further up in the probe function.

    2) Two changes were both adding new members to codel params
    structure, and thus we had overlapping changes to the
    initializer function.

    3) 'net' was making a fix to sk_release_kernel() which is
    completely removed in 'net-next'.

    4) In net_namespace.c, the rtnl_net_fill() call for GET operations
    had the command value fixed, meanwhile 'net-next' adjusted the
    argument signature a bit.

    This also matches example merge resolutions posted by Stephen
    Rothwell over the past two days.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 May, 2015

1 commit

  • In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop
    parameters configured, then packets for the default VQ are not subjected
    to RED and are only dropped if the queue is larger than the net_device's
    tx_queue_len. This behavior is useful for WRED mode, since these packets
    will still influence the calculated average queue length and (therefore)
    the drop probability for all of the other VQs. However, for some drivers
    tx_queue_len is zero. In other cases the user may wish to make the limit
    the same for all VQs (including the default VQ with no drop parameters).

    This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit,
    in bytes, during qdisc setup. (This limit is in bytes to be consistent
    with the drop parameters.) The default limit is the same as for a bfifo
    queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are
    configured with a smaller limit than the GRED queue limit, that VQ will
    still observe the smaller limit instead.

    Signed-off-by: David Ward
    Signed-off-by: David S. Miller

    David Ward
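
The limit rules in this commit message can be written out as two small helpers. This is a sketch under stated assumptions, not the sch_gred.c implementation: the function names are hypothetical, a VQ limit of 0 is used here to mean "no drop parameters configured", and the case of a VQ limit larger than the queue limit is not covered by the text above, so capping it at the queue limit is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Default queue limit in bytes, following the bfifo rule quoted above:
 * tx_queue_len * psched_mtu. Overflow is not handled in this sketch. */
static uint32_t gred_default_limit(uint32_t tx_queue_len, uint32_t mtu)
{
    return tx_queue_len * mtu;
}

/* Limit a VQ observes. vq_limit == 0 stands for "no drop parameters"
 * (assumption of this sketch). A smaller VQ limit wins, as described
 * above; a larger one is capped at the queue limit (assumption). */
static uint32_t vq_effective_limit(uint32_t gred_limit, uint32_t vq_limit)
{
    assert(gred_limit > 0);
    if (vq_limit != 0 && vq_limit < gred_limit)
        return vq_limit;
    return gred_limit;
}
```

For example, with a tx_queue_len of 1000 and an MTU of 1500 bytes the default limit is 1500000 bytes, and a VQ configured with a 360000-byte limit keeps that smaller value.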
     

12 May, 2015

1 commit

  • In WRED mode, the backlog for a single virtual queue (VQ) should not be
    used to determine queue behavior; instead the backlog is summed across
    all VQs. This sum is currently used when calculating the average queue
    lengths. It also needs to be used when determining if the queue's hard
    limit has been reached, or when reporting each VQ's backlog via netlink.
    q->backlog will only be used if the queue switches out of WRED mode.

    Signed-off-by: David Ward
    Signed-off-by: David S. Miller

    David Ward
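
The backlog rule from this commit message, as a minimal sketch. `struct vq` and `effective_backlog()` are hypothetical names chosen for illustration; the real code keeps more state per virtual queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VQ state: only the byte backlog is modelled. */
struct vq {
    uint32_t backlog;
};

/* Backlog used for queue decisions and reporting: the sum over all VQs
 * in WRED mode, the VQ's own backlog otherwise. */
static uint32_t effective_backlog(const struct vq *vqs, size_t n,
                                  size_t idx, int wred_mode)
{
    uint32_t sum = 0;

    assert(vqs != NULL && idx < n);
    if (!wred_mode)
        return vqs[idx].backlog;
    for (size_t i = 0; i < n; i++)
        sum += vqs[i].backlog;
    return sum;
}
```

The same helper would serve the hard-limit check and the netlink dump, which is the point of the change: both should see the summed value while WRED mode is active.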
     

30 Sep, 2014

1 commit


01 Jan, 2014

1 commit


14 Sep, 2012

4 commits

  • gred_dequeue() and gred_drop() do not seem to get called when the
    queue is empty, meaning that we never start idling while in WRED
    mode. And since qidlestart is not stored by gred_store_wred_set(),
    we would never stop idling while in WRED mode if we ever started.
    This messes up the average queue size calculation that influences
    packet marking/dropping behavior.

    Now, we start WRED mode idling as we are removing the last packet
    from the queue. Also we now actually stop WRED mode idling when we
    are enqueuing a packet.

    Cc: Bruce Osler
    Signed-off-by: David Ward
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    David Ward
     
  • q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
    to pass q->vars.qavg as the backlog value, we need to un-scale it.
    Additionally, the qave value returned via netlink should not be Wlog
    scaled, so we need to un-scale the result of red_calc_qavg().

    This caused artificially high values for "Average Queue" to be shown
    by 'tc -s -d qdisc', but did not affect the actual operation of GRED.

    Signed-off-by: David Ward
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    David Ward
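
Un-scaling a Wlog-scaled average, as described in this commit message, amounts to a right shift. The sketch below assumes that "Wlog scaled" means the stored value is the true average shifted left by Wlog bits; `unscale_qavg()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a Wlog-scaled average back to plain bytes.
 * Assumes scaled_qavg == true_average << wlog. */
static uint32_t unscale_qavg(uint32_t scaled_qavg, unsigned int wlog)
{
    assert(wlog < 32);
    return scaled_qavg >> wlog;
}
```

Skipping this step would report a value too large by a factor of 2^Wlog, which matches the "artificially high" figures mentioned above.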
     
  • Each pair of DPs only needs to be compared once when searching for
    a non-unique prio value.

    Signed-off-by: David Ward
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    David Ward
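
Comparing each pair only once means starting the inner loop just past the outer index. A minimal sketch, with a plain array of prio values standing in for the DP table and a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if any two entries share a prio value, 0 otherwise.
 * Each unordered pair (i, j) with i < j is compared exactly once. */
static int has_duplicate_prio(const int *prio, size_t n)
{
    assert(prio != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (prio[i] == prio[j])
                return 1;
    return 0;
}
```

For n entries this does n(n-1)/2 comparisons instead of the n(n-1) needed when both loops run over the full range.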
     
  • Signed-off-by: David Ward
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    David Ward
     

16 May, 2012

1 commit


24 Apr, 2012

1 commit


17 Apr, 2012

1 commit

  • A parameter set exists for WRED mode, called wred_set, to hold the same
    values for qavg and qidlestart across all VQs. The WRED mode values had
    been previously held in the VQ for the default DP. After these values
    were moved to wred_set, the VQ for the default DP was no longer created
    automatically (so that it could be omitted on purpose, to have packets
    in the default DP enqueued directly to the device without using RED).

    However, gred_dump() was overlooked during that change; in WRED mode it
    still reads qavg/qidlestart from the VQ for the default DP, which might
    not even exist. As a result, this command sequence will cause an oops:

    tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
    DPs 3 default 2 grio
    tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
    tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

    This fixes gred_dump() in WRED mode to use the values held in wred_set.

    Signed-off-by: David Ward
    Signed-off-by: David S. Miller

    David Ward
     

02 Apr, 2012

1 commit


06 Jan, 2012

1 commit

  • This patch splits the red_parms structure into two components.

    One holding the RED 'constant' parameters, and one containing the
    variables.

    This permits a size reduction of GRED qdisc, and is a preliminary step
    to add an optional RED unit to SFQ.

    SFQRED will have a single red_parms structure shared by all flows, and a
    private red_vars per flow.

    Signed-off-by: Eric Dumazet
    CC: Dave Taht
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
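
The split described in this commit message can be pictured as one shared block of constants plus a small block of variables per flow. The structures below are illustrative only: the `_sketch` types, their reduced field lists, `NFLOWS` and `over_min_threshold()` are assumptions of this example and do not reproduce the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define NFLOWS 4   /* arbitrary flow count for the sketch */

/* 'Constant' RED parameters, set at configuration time. */
struct red_parms_sketch {
    uint32_t qth_min;   /* min threshold */
    uint32_t qth_max;   /* max threshold */
    uint32_t max_P;     /* max probability */
    uint8_t  Wlog;
    uint8_t  Plog;
};

/* Variables that change as packets flow through. */
struct red_vars_sketch {
    int           qcount;
    uint32_t      qR;
    unsigned long qavg;
};

/* One shared parameter block, private variables per flow. */
struct sfqred_sketch {
    struct red_parms_sketch parms;
    struct red_vars_sketch  vars[NFLOWS];
};

/* Example use: a decision reads the shared constants and the flow's own
 * variables. Both values are assumed to use the same scaling here. */
static int over_min_threshold(const struct red_parms_sketch *p,
                              const struct red_vars_sketch *v)
{
    assert(p && v);
    return v->qavg >= p->qth_min;
}
```

With the parameters stored once, the per-flow cost is only the size of the variable block, which is where the size reduction comes from.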
     

17 Dec, 2011

1 commit

  • In the control path, it's better to use GFP_KERNEL allocations where
    possible.

    Before taking the qdisc spinlock, we preallocate memory just in case
    we'll need it in gred_change_vq().

    This is a followup to commit 3f1e6d3fd37b (sch_gred: should not use
    GFP_KERNEL while holding a spinlock).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
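
The preallocation pattern from this commit message, sketched in single-threaded user-space C. Everything here is a stand-in: `struct table`, `sleeping_alloc()` and `table_set()` are hypothetical, the `locked` flag only imitates the qdisc spinlock, and `malloc()` plays the role of a GFP_KERNEL allocation that must not happen while the lock is held.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table guarded by a lock; 'locked' is a plain flag so the
 * sketch stays single-threaded. */
struct table {
    int locked;
    int *slot;      /* allocated lazily, at most once */
};

/* Stand-in for an allocation that may sleep: must not run under the lock. */
static void *sleeping_alloc(const struct table *t, size_t size)
{
    assert(!t->locked);
    return malloc(size);
}

/* Stores value in the slot, creating the slot if needed.
 * Returns 0 on success, -1 if memory was needed but unavailable. */
static int table_set(struct table *t, int value)
{
    /* Preallocate before locking, just in case the slot is missing. */
    int *prealloc = sleeping_alloc(t, sizeof(*prealloc));
    int err = 0;

    t->locked = 1;                  /* lock */
    if (!t->slot) {
        t->slot = prealloc;         /* hand over ownership (may be NULL) */
        prealloc = NULL;
    }
    if (t->slot)
        *t->slot = value;
    else
        err = -1;
    t->locked = 0;                  /* unlock */

    free(prealloc);                 /* unused preallocation; free(NULL) is fine */
    return err;
}
```

The trade-off is one possibly wasted allocation per call in exchange for never allocating inside the critical section.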
     

16 Dec, 2011

1 commit


13 Dec, 2011

1 commit


10 Dec, 2011

1 commit

  • Now that RED uses a Q0.32 number to store max_p (max probability),
    allow RED/GRED/CHOKE to use/report full resolution at config/dump time.

    Old tc binaries are not aware of the new attributes and still set/get
    Plog.

    New tc binaries set/get both Plog and max_p for backward compatibility;
    they display the "probability" value if they get max_p from new kernels.

    # tc -d qdisc show dev ...
    ...
    qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
    probability 0.09 Scell_log 15

    Make sure we avoid potential divides by 0 in reciprocal_value(), if
    (max_th - min_th) is big.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
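
A Q0.32 number stores a fraction in [0, 1) as a 32-bit integer equal to the fraction times 2^32. The conversion helpers below are a sketch with hypothetical names, not code taken from the kernel or from tc.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a probability in [0, 1) as Q0.32: value = p * 2^32, truncated. */
static uint32_t prob_to_q032(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return (uint32_t)(p * 4294967296.0);
}

/* Decode a Q0.32 value back to a probability. */
static double q032_to_prob(uint32_t q)
{
    return (double)q / 4294967296.0;
}
```

The resolution is 2^-32, so a value such as the 0.09 shown in the dump above survives a round trip with an error below 1e-9.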
     

20 Jan, 2011

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

14 Nov, 2008

1 commit

  • After implementing qdisc->ops->peek() and changing sch_netem into a
    classless qdisc, there are no more qdisc->ops->requeue() users. This
    patch removes this method with its wrappers (qdisc_requeue()), and
    also the unused qdisc->requeue structure. There are a few minor fixes
    of warnings (htb_enqueue()) and comments btw.

    The idea to kill ->requeue() and a similar patch were first developed
    by David S. Miller.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

31 Oct, 2008

1 commit


20 Jul, 2008

1 commit


09 Jul, 2008

1 commit


04 Jun, 2008

1 commit

  • Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
    nla_nest_cancel() void functions.

    Return -EMSGSIZE instead of -1 if the provided message buffer is not
    big enough.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

29 Jan, 2008

4 commits


11 Jul, 2007

1 commit


11 Feb, 2007

1 commit


22 Jul, 2006

1 commit


01 Jul, 2006

1 commit


06 Nov, 2005

4 commits