26 Jun, 2016
1 commit
-
Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, while one of FQ goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue(), and in qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.

Signed-off-by: Eric Dumazet
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
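The deferred-drop idea in the to_free commit can be modelled in user space. The following is only a sketch under stated assumptions, not the kernel code: `Packet`, `FifoQdisc` and `free_packet()` are hypothetical names, a Python list stands in for the `struct sk_buff **to_free` chain, and `threading.Lock` stands in for the qdisc lock. The point it illustrates is that a dropped packet is only recorded while the lock is held and is freed after the lock is released.

```python
import threading


class Packet:
    """Hypothetical stand-in for struct sk_buff."""

    def __init__(self, size):
        self.size = size
        self.freed = False


def free_packet(pkt):
    # Stand-in for kfree_skb(); assumed to be comparatively expensive.
    pkt.freed = True


class FifoQdisc:
    """Toy packet-count-limited FIFO, used only to show the locking pattern."""

    def __init__(self, limit):
        self.lock = threading.Lock()
        self.limit = limit
        self.queue = []

    def _enqueue_locked(self, pkt, to_free):
        # Caller holds self.lock. On overflow the packet is NOT freed here;
        # it is only appended to the caller's to_free list.
        if len(self.queue) >= self.limit:
            to_free.append(pkt)
            return False
        self.queue.append(pkt)
        return True

    def enqueue(self, pkt):
        to_free = []
        with self.lock:
            accepted = self._enqueue_locked(pkt, to_free)
        # The lock is released at this point, so a concurrent dequeue()
        # is not delayed by the cost of freeing dropped packets.
        for dropped in to_free:
            free_packet(dropped)
        return accepted
```

With a limit of 2, a third packet is rejected and freed only after the lock has been dropped, while the two queued packets stay untouched.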
09 Jun, 2016
1 commit
-
After removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
more callers of ->drop() outside of other ->drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
19 Aug, 2015
1 commit
-
Those were all workarounds for the formerly double meaning of
tx_queue_len, which broke scheduling algorithms if untreated.

Now that all in-tree drivers have been converted away from setting
tx_queue_len = 0, it should be safe to drop these workarounds for
categorically broken setups.

Signed-off-by: Phil Sutter
Cc: Jamal Hadi Salim
Signed-off-by: David S. Miller
14 May, 2015
1 commit
-
Four minor merge conflicts:

1) qca_spi.c renamed the local variable used for the SPI device
   from spi_device to spi, meanwhile the spi_set_drvdata() call
   got moved further up in the probe function.

2) Two changes were both adding new members to codel params
   structure, and thus we had overlapping changes to the
   initializer function.

3) 'net' was making a fix to sk_release_kernel() which is
   completely removed in 'net-next'.

4) In net_namespace.c, the rtnl_net_fill() call for GET operations
   had the command value fixed, meanwhile 'net-next' adjusted the
   argument signature a bit.

This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.

Signed-off-by: David S. Miller
13 May, 2015
1 commit
-
In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop
parameters configured, then packets for the default VQ are not subjected
to RED and are only dropped if the queue is larger than the net_device's
tx_queue_len. This behavior is useful for WRED mode, since these packets
will still influence the calculated average queue length and (therefore)
the drop probability for all of the other VQs. However, for some drivers
tx_queue_len is zero. In other cases the user may wish to make the limit
the same for all VQs (including the default VQ with no drop parameters).

This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit,
in bytes, during qdisc setup. (This limit is in bytes to be consistent
with the drop parameters.) The default limit is the same as for a bfifo
queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are
configured with a smaller limit than the GRED queue limit, that VQ will
still observe the smaller limit instead.

Signed-off-by: David Ward
Signed-off-by: David S. Miller
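The limit rules in the TCA_GRED_LIMIT commit reduce to two small computations. This is a minimal sketch of them, not kernel code: the function names are hypothetical, and `vq_limit=None` is an assumed way of modelling a VQ that has no drop parameters configured.

```python
def default_gred_limit(tx_queue_len, psched_mtu):
    # Default GRED queue limit in bytes, the same as for a bfifo queue.
    return tx_queue_len * psched_mtu


def effective_vq_limit(gred_limit, vq_limit=None):
    # A VQ without drop parameters (modelled as None) falls back to the
    # qdisc-wide GRED limit. A VQ configured with a smaller limit keeps it.
    if vq_limit is None:
        return gred_limit
    return min(vq_limit, gred_limit)
```

For example, with tx_queue_len 1000 and an MTU of 1514 bytes the default limit is 1514000 bytes; a VQ configured with a 300000 byte limit is still held to 300000 bytes.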
12 May, 2015
1 commit
-
In WRED mode, the backlog for a single virtual queue (VQ) should not be
used to determine queue behavior; instead the backlog is summed across
all VQs. This sum is currently used when calculating the average queue
lengths. It also needs to be used when determining if the queue's hard
limit has been reached, or when reporting each VQ's backlog via netlink.
q->backlog will only be used if the queue switches out of WRED mode.

Signed-off-by: David Ward
Signed-off-by: David S. Miller
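The backlog selection described in the WRED backlog commit can be sketched as follows. This is a simplified model, not the kernel implementation: the function names and the list-of-integers representation of per-VQ backlogs are assumptions made for illustration.

```python
def backlog_for_decisions(vq_backlogs, dp, wred_mode):
    # In WRED mode the backlog summed across all VQs drives queue behavior;
    # otherwise only the backlog of the VQ selected by dp is used.
    if wred_mode:
        return sum(vq_backlogs)
    return vq_backlogs[dp]


def over_hard_limit(vq_backlogs, dp, wred_mode, pkt_len, limit):
    # The hard limit is reached when adding this packet would push the
    # relevant backlog above the limit (all values in bytes).
    return backlog_for_decisions(vq_backlogs, dp, wred_mode) + pkt_len > limit
```

With per-VQ backlogs of 3000, 5000 and 1000 bytes and a 10000 byte limit, a 1500 byte packet for the third VQ exceeds the limit in WRED mode (9000 + 1500) but not outside it (1000 + 1500).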
30 Sep, 2014
1 commit
-
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
01 Jan, 2014
1 commit
-
Prefer pr_warn(... to pr_warning(...
Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
14 Sep, 2012
4 commits
-
gred_dequeue() and gred_drop() do not seem to get called when the
queue is empty, meaning that we never start idling while in WRED
mode. And since qidlestart is not stored by gred_store_wred_set(),
we would never stop idling while in WRED mode if we ever started.
This messes up the average queue size calculation that influences
packet marking/dropping behavior.

Now, we start WRED mode idling as we are removing the last packet
from the queue. Also we now actually stop WRED mode idling when we
are enqueuing a packet.

Cc: Bruce Osler
Signed-off-by: David Ward
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
to pass q->vars.qavg as the backlog value, we need to un-scale it.
Additionally, the qave value returned via netlink should not be Wlog
scaled, so we need to un-scale the result of red_calc_qavg().

This caused artificially high values for "Average Queue" to be shown
by 'tc -s -d qdisc', but did not affect the actual operation of GRED.

Signed-off-by: David Ward
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
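For the Wlog un-scaling fix (the q->vars.qavg commit), a small fixed-point model shows why a scaled average reads artificially high. It is a sketch under assumptions: the function names are hypothetical, and the update rule is the usual RED exponentially weighted moving average kept as a value scaled by 2^Wlog.

```python
def ewma_update(qavg_scaled, backlog, wlog):
    # qavg_scaled holds the average multiplied by 2**wlog.
    # backlog is a plain, unscaled byte count.
    return qavg_scaled + (backlog - (qavg_scaled >> wlog))


def unscale(qavg_scaled, wlog):
    # Convert the Wlog-scaled average back to bytes before comparing it
    # with a backlog or reporting it to user space.
    return qavg_scaled >> wlog
```

With wlog = 3 and a steady backlog of 8000 bytes, the stored average settles at 64000; only after un-scaling does it read 8000, matching the backlog. Reporting the stored value directly would overstate the average by a factor of 2^Wlog.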
Each pair of DPs only needs to be compared once when searching for
a non-unique prio value.

Signed-off-by: David Ward
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
-
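The "each pair of DPs only needs to be compared once" commit corresponds to starting the inner loop just past the outer index. A minimal sketch, assuming a hypothetical list in which `None` marks a DP that has not been configured:

```python
def has_duplicate_prio(prios):
    # prios[i] is the prio of DP i, or None if that DP is not configured.
    n = len(prios)
    for i in range(n):
        if prios[i] is None:
            continue
        # Start at i + 1: the pair (j, i) with j < i was already checked
        # when the outer loop was at j, so it is not compared again.
        for j in range(i + 1, n):
            if prios[j] is not None and prios[i] == prios[j]:
                return True
    return False
```

This halves the number of comparisons relative to scanning the full table for every DP, without changing the result.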
Signed-off-by: David Ward
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller
16 May, 2012
1 commit
-
Standardize the net core ratelimited logging functions.
Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
24 Apr, 2012
1 commit
-
Fix merge between commit 3adadc08cc1e ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d27 ("net ax25: Simplify and
cleanup the ax25 sysctl handling")

The former moved around the sysctl register/unregister calls, the
latter simply removed them.

With help from Stephen Rothwell.
Signed-off-by: David S. Miller
17 Apr, 2012
1 commit
-
A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).

However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:

  tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
      DPs 3 default 2 grio
  tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
  tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

This fixes gred_dump() in WRED mode to use the values held in wred_set.
Signed-off-by: David Ward
Signed-off-by: David S. Miller
02 Apr, 2012
1 commit
-
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller
06 Jan, 2012
1 commit
-
This patch splits the red_parms structure into two components.
One holding the RED 'constant' parameters, and one containing the
variables.

This permits a size reduction of GRED qdisc, and is a preliminary step
to add an optional RED unit to SFQ.

SFQRED will have a single red_parms structure shared by all flows, and a
private red_vars per flow.

Signed-off-by: Eric Dumazet
CC: Dave Taht
CC: Stephen Hemminger
Signed-off-by: David S. Miller
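The red_parms / red_vars split can be pictured with two small record types. This is an illustrative sketch only: the field names (`qth_min`, `qth_max`, `wlog`, `qavg`) and the `update_qavg()` helper are assumptions chosen for the example and are not meant to mirror the kernel structures field for field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedParms:
    """'Constant' RED configuration; one instance can serve many flows."""
    qth_min: int
    qth_max: int
    wlog: int


@dataclass
class RedVars:
    """Per-flow state that changes as packets arrive."""
    qavg: int = 0  # average queue length, scaled by 2**wlog


def update_qavg(parms, flow_vars, backlog):
    # Reads the shared parameters, writes only the flow's private variables.
    flow_vars.qavg += backlog - (flow_vars.qavg >> parms.wlog)
    return flow_vars.qavg
```

Because the parameters are immutable and shared, each additional flow costs only one `RedVars` instance, which is the size reduction the commit message refers to.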
17 Dec, 2011
1 commit
-
In control path, it's better to use GFP_KERNEL allocations where
possible.

Before taking qdisc spinlock, we preallocate memory just in case we'll
need it in gred_change_vq().

This is a followup to commit 3f1e6d3fd37b (sch_gred: should not use
GFP_KERNEL while holding a spinlock)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Dec, 2011
1 commit
-
Conflicts:
drivers/net/ethernet/freescale/fsl_pq_mdio.c
net/batman-adv/translation-table.c
net/ipv6/route.c
13 Dec, 2011
1 commit
-
gred_change_vq() is called under sch_tree_lock(sch).
This means a spinlock is held, and we are not allowed to sleep in this
context.

We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Dec, 2011
1 commit
-
Now RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are not aware of new attributes, and still set/get Plog.
New tc binaries set/get both Plog and max_p for backward compatibility,
they display "probability value" if they get max_p from new kernels.

  # tc -d qdisc show dev ...
  ...
  qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
        probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
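A Q0.32 number, as used for max_p in the commit above, is an unsigned 32-bit integer read as a fraction of 2^32. The conversion helpers below are a sketch with hypothetical names; clamping to the largest 32-bit value is an assumption made here so that a probability of 1.0 stays representable, not something the commit message states.

```python
Q32_ONE = 1 << 32  # 2**32, the implicit denominator of a Q0.32 value


def probability_to_max_p(prob):
    # Encode a probability in [0, 1] as a Q0.32 integer.
    # Clamp because exactly 1.0 would need 33 bits (assumed policy).
    return min(int(prob * Q32_ONE), Q32_ONE - 1)


def max_p_to_probability(max_p):
    # Decode a Q0.32 integer back to a floating point probability.
    return max_p / Q32_ONE
```

Encoding 0.5 gives 2^31, and a value such as the 0.09 in the tc output above survives the round trip to two decimal places.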
20 Jan, 2011
1 commit
-
Cleanup net/sched code to current CodingStyle and practices.
Reduce inline abuse
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability. As this
conversion needs to touch a large number of source files, the following
script is used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there. ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.

2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
14 Nov, 2008
1 commit
-
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.

The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.

Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
31 Oct, 2008
1 commit
-
Add qdisc->ops->peek() implementation for work-conserving qdiscs.
With feedback from Patrick McHardy.

Signed-off-by: Jarek Poplawski
Signed-off-by: David S. Miller
20 Jul, 2008
1 commit
-
Signed-off-by: Jussi Kivilinna
Signed-off-by: David S. Miller
09 Jul, 2008
1 commit
-
It can be obtained via the netdev_queue. So create a helper routine,
qdisc_dev(), to make the transformations nicer looking.

Now qdisc_alloc() no longer needs a net_device pointer argument.
Signed-off-by: David S. Miller
04 Jun, 2008
1 commit
-
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
29 Jan, 2008
4 commits
-
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
-
nla_parse() returns more detailed errno codes, propagate them back on
error.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
-
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow separate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
-
Qdisc_class_ops are const, and Qdisc_ops are mostly read.
Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Jul, 2007
1 commit
-
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller
11 Feb, 2007
1 commit
-
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
22 Jul, 2006
1 commit
-
Signed-off-by: Panagiotis Issaris
Signed-off-by: David S. Miller
01 Jul, 2006
1 commit
-
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
06 Nov, 2005
4 commits
-
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.

This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.

The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.

Signed-off-by: Thomas Graf
Signed-off-by: Arnaldo Carvalho de Melo
-
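The effect of TC_RED_HARDDROP, once the average queue length is above the maximum threshold, can be written as a small decision function. This is a sketch under assumptions rather than the kernel logic verbatim: the function name and string results are hypothetical, and the `packet_ecn_capable` input (whether this particular packet can carry an ECN mark) is an assumption added for completeness.

```python
def action_above_max_threshold(ecn_enabled, harddrop, packet_ecn_capable):
    # Applies only when the average queue length exceeds the maximum
    # threshold. With the hard-drop flag set, the packet is dropped even
    # though ECN marking is enabled.
    if harddrop or not ecn_enabled or not packet_ecn_capable:
        return "drop"
    return "mark"
```

Without the flag, an ECN-enabled queue keeps marking above the maximum threshold and relies on the hard queue limit as the last resort; with the flag, dropping starts at the maximum threshold.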
Adds a new u8 flags field in an unused padding area of the netlink
message. Adds ECN marking support to be used instead of dropping
packets immediately.

Signed-off-by: Thomas Graf
Signed-off-by: Arnaldo Carvalho de Melo
-
Signed-off-by: Thomas Graf
Signed-off-by: Arnaldo Carvalho de Melo
-
Removes unnecessary includes, initializers, and simplifies
the code a bit.

Signed-off-by: Thomas Graf
Signed-off-by: Arnaldo Carvalho de Melo