10 Mar, 2020

1 commit

  • Commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows")
    changes the scale of probability values in PIE from (2^64 - 1) to
    (2^56 - 1). This affects the precision of tc_pie_xstats->prob in
    user space.

    This patch ensures user space is unaffected.

    Suggested-by: Eric Dumazet
    Signed-off-by: Leslie Monis
    Signed-off-by: David S. Miller

    Leslie Monis
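    The rescaling idea can be sketched in a few lines. This assumes the
    internal value stays on the (2^56 - 1) scale and only the reported
    value is converted. Because 2^64 / 2^56 = 2^8, a left shift by 8 bits
    maps the new scale back onto the old one, with the 8 low bits always
    zero. The helper name is hypothetical, and this is not claimed to be
    the code of the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Internal PIE probability scale after the MAX_PROB change (2^56 - 1). */
#define PIE_MAX_PROB_56 ((UINT64_C(1) << 56) - 1)

/*
 * Hypothetical helper: convert a probability stored on the (2^56 - 1)
 * scale to the (2^64 - 1) scale that user space was built against.
 * 2^64 / 2^56 = 2^8, so shifting left by 8 bits rescales the value;
 * the shift cannot overflow because the input is at most 2^56 - 1.
 */
static uint64_t pie_prob_to_user(uint64_t prob56)
{
	assert(prob56 <= PIE_MAX_PROB_56);
	return prob56 << 8;
}
```

    For example, a probability of 50% (2^55 internally) would be reported
    as 2^63, which is half of the old full scale.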
     

05 Mar, 2020

3 commits

  • The variable pie_vars->accu_prob is used as an accumulator for
    probability values. Since probability values are scaled using the
    MAX_PROB macro denoting (2^64 - 1), pie_vars->accu_prob is
    likely to overflow as it is of type u64.

    The variable pie_vars->accu_prob_overflows counts the number of
    times the variable pie_vars->accu_prob overflows.

    The MAX_PROB macro needs to be equal to at least (2^39 - 1) in
    order to do precise calculations without any underflow. Thus
    MAX_PROB can be reduced to (2^56 - 1) without affecting the
    precision in calculations drastically. Doing so will eliminate
    the need for the variable pie_vars->accu_prob_overflows as the
    variable pie_vars->accu_prob will never overflow.

    Removing the variable pie_vars->accu_prob_overflows also reduces
    the size of the structure pie_vars to exactly 64 bytes.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: Leslie Monis
    Signed-off-by: David S. Miller

    Leslie Monis
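    The overflow argument can be checked with plain integer arithmetic.
    This sketch assumes, following the RFC 8033 derandomization logic,
    that accu_prob is reset after a drop and a drop is forced once it
    reaches 8.5 * MAX_PROB. The accumulator therefore never holds more
    than 8.5 * MAX_PROB plus one further increment of at most MAX_PROB.
    The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can the probability accumulator stay within a u64
 * for a given MAX_PROB?  Assumes the accumulator is reset once it
 * reaches 8.5 * max_prob, so it never exceeds 8.5 * max_prob plus one
 * more increment of at most max_prob, i.e. 9.5 * max_prob in total.
 */
static bool accu_prob_fits_u64(uint64_t max_prob)
{
	/* 19 * (max_prob / 2 + 1) is an upper bound on 9.5 * max_prob. */
	uint64_t half = max_prob / 2 + 1;

	return half <= UINT64_MAX / 19;
}
```

    With MAX_PROB = 2^56 - 1 the bound stays below 2^60, which leaves
    ample headroom. With MAX_PROB = 2^64 - 1 it cannot be represented at
    all, which is why the older code needed an overflow counter.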
     
  • In function pie_calculate_probability(), the variables alpha and
    beta are of type u64. The variables qdelay, qdelay_old and
    params->target are of type psched_time_t (which is also u64).
    The explicit type casting done when calculating the value for
    the variable delta is redundant and not required.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: Leslie Monis
    Signed-off-by: David S. Miller

    Leslie Monis
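    The point about the casts can be shown in isolation. In this sketch,
    the function names are hypothetical and the kernel types are replaced
    by their <stdint.h> equivalents. The products are computed in u64 and
    wrap modulo 2^64. Storing them in a signed 64-bit delta gives the same
    value whether or not an explicit cast is written. Note that converting
    an out-of-range unsigned value to a signed type is
    implementation-defined in ISO C; GCC and Clang define it as wrapping
    modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* Version with the explicit casts that the patch removes. */
static int64_t delta_with_casts(uint64_t alpha, uint64_t beta, uint64_t qdelay,
				uint64_t qdelay_old, uint64_t target)
{
	int64_t delta = (int64_t)(alpha * (qdelay - target));

	delta += (int64_t)(beta * (qdelay - qdelay_old));
	return delta;
}

/* Same computation relying on the implicit conversion to int64_t. */
static int64_t delta_without_casts(uint64_t alpha, uint64_t beta, uint64_t qdelay,
				   uint64_t qdelay_old, uint64_t target)
{
	int64_t delta = alpha * (qdelay - target);

	delta += beta * (qdelay - qdelay_old);
	return delta;
}
```

    With qdelay below target (10 vs 15) and falling (qdelay_old = 12),
    both versions give 2 * -5 + 3 * -2 = -16.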
     
  • Remove ambiguity by using the term backlog instead of qlen when
    representing the queue length in bytes.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: Leslie Monis
    Signed-off-by: David S. Miller

    Leslie Monis
     

23 Jan, 2020

5 commits

  • This patch makes the drop_early(), calculate_probability() and
    pie_process_dequeue() functions generic enough to be used by
    both PIE and FQ-PIE (to be added in a future commit). The major
    change here is in the way the functions take in arguments. This
    patch exports these functions and makes FQ-PIE dependent on
    sch_pie.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Leslie Monis
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • Make the alignment in the initialization of the struct instances
    consistent in the file.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Leslie Monis
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • Fix punctuation and logical mistakes in the comments. The
    logical mistake was that "dequeue_rate" is no longer the default
    way to calculate queuing delay and is not needed. The default
    way to calculate queue delay was changed in commit cec2975f2b70
    ("net: sched: pie: enable timestamp based delay calculation").

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Leslie Monis
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • Rearrange the members of the structure such that closely
    referenced members appear together and/or fit in the same
    cacheline. Also, change the order of their initializations to
    match the order in which they appear in the structure.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Leslie Monis
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • This patch moves macros, structures and small functions common
    to PIE and FQ-PIE (to be added in a future commit) from the file
    net/sched/sch_pie.c to the header file include/net/pie.h.
    All the moved functions are made inline.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Leslie Monis
    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     

21 Nov, 2019

1 commit

  • RFC 8033 suggests an alternative approach to calculate the queue
    delay in PIE by using a timestamp on every enqueued packet. This
    patch adds an implementation of that approach and sets it as the
    default method to calculate queue delay. The previous method (based
    on Little's law) to calculate queue delay is set as optional.

    Signed-off-by: Gautam Ramakrishnan
    Signed-off-by: Leslie Monis
    Signed-off-by: Mohit P. Tahiliani
    Acked-by: Dave Taht
    Signed-off-by: David S. Miller

    Gautam Ramakrishnan
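    For comparison, here is a sketch of the two ways to estimate queue
    delay. The names, struct layout and microsecond units are hypothetical
    and are not taken from sch_pie.c. The timestamp method subtracts the
    head packet's enqueue time from the current time. The Little's law
    method instead divides the backlog by an estimated dequeue rate, which
    must first be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-packet record for the timestamp method (microseconds). */
struct pkt_sketch {
	uint64_t enqueue_time_us;
};

/* Timestamp method: how long the packet now being dequeued has waited. */
static uint64_t qdelay_timestamp_us(const struct pkt_sketch *head, uint64_t now_us)
{
	assert(now_us >= head->enqueue_time_us);
	return now_us - head->enqueue_time_us;
}

/*
 * Little's law method: delay = backlog / average dequeue rate.
 * Returns 0 while no rate estimate is available yet.
 */
static uint64_t qdelay_littles_law_us(uint64_t backlog_bytes,
				      uint64_t avg_dq_rate_bytes_per_sec)
{
	if (avg_dq_rate_bytes_per_sec == 0)
		return 0;
	return backlog_bytes * 1000000 / avg_dq_rate_bytes_per_sec;
}
```

    For instance, 15000 bytes queued at an estimated 1,000,000 bytes/s
    gives 15000 us (15 ms) under Little's law. A packet enqueued at
    t = 100 us and dequeued at t = 400 us has waited 300 us.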
     

19 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license this
    program is distributed in the hope that it will be useful but
    without any warranty without even the implied warranty of
    merchantability or fitness for a particular purpose see the gnu
    general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 53 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190602204653.904365654@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
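    The four options can be made concrete with a toy validator. The NV_*
    flags, struct and function are hypothetical and do not correspond to
    the kernel's netlink API; attribute payloads and nesting are ignored.
    NV_DEPRECATED_STRICT mirrors the description of the old *_strict()
    behaviour (TRAILING plus MAXTYPE). NV_STRICT enables everything, as
    the message recommends for new code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names modelled on the four options in the message. */
enum nv_strictness {
	NV_TRAILING     = 1 << 0, /* reject bytes left after the attributes */
	NV_MAXTYPE      = 1 << 1, /* reject attribute types above max */
	NV_UNSPEC       = 1 << 2, /* reject types with an unspecified policy */
	NV_STRICT_ATTRS = 1 << 3, /* require exact attribute lengths */
};

#define NV_DEPRECATED_STRICT (NV_TRAILING | NV_MAXTYPE)
#define NV_STRICT (NV_TRAILING | NV_MAXTYPE | NV_UNSPEC | NV_STRICT_ATTRS)

/* Simplified attribute header: type and payload length only. */
struct attr_sketch {
	uint16_t type;
	uint16_t len;
};

/*
 * Validate n attributes against policy_len[0..max], where a policy
 * length of 0 means "unspecified". trailing is the number of leftover
 * bytes after the last attribute.
 */
static bool validate_sketch(const struct attr_sketch *attrs, size_t n,
			    size_t trailing, const uint16_t *policy_len,
			    uint16_t max, unsigned int flags)
{
	if ((flags & NV_TRAILING) && trailing != 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		uint16_t type = attrs[i].type;
		uint16_t want;

		if (type > max) {
			if (flags & NV_MAXTYPE)
				return false;
			continue; /* liberal mode: ignore unknown types */
		}
		want = policy_len[type];
		if (want == 0) {
			if (flags & NV_UNSPEC)
				return false;
			continue;
		}
		if (attrs[i].len < want)
			return false; /* too short is rejected in every mode */
		if ((flags & NV_STRICT_ATTRS) && attrs[i].len != want)
			return false;
	}
	return true;
}
```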
     
  • Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
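    The compatibility concern is easy to illustrate. Once the kernel sets
    NLA_F_NESTED, an application that compares nla_type directly no longer
    matches, while one that masks out the flag bits keeps working. The
    NLF_* macros and helper names are hypothetical local copies; only the
    bit positions are taken from the uapi header <linux/netlink.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of the flag bits. In <linux/netlink.h> these are
 * NLA_F_NESTED (1 << 15) and NLA_F_NET_BYTEORDER (1 << 14).
 */
#define NLF_NESTED        (1U << 15)
#define NLF_NET_BYTEORDER (1U << 14)
#define NLF_TYPE_MASK     (~(NLF_NESTED | NLF_NET_BYTEORDER) & 0xFFFFU)

/* Fragile: compares the raw 16-bit nla_type, flag bits included. */
static bool is_attr_raw(uint16_t nla_type, uint16_t wanted)
{
	return nla_type == wanted;
}

/* Robust: masks out the flag bits first, as libmnl's mnl_attr_get_type() does. */
static bool is_attr_masked(uint16_t nla_type, uint16_t wanted)
{
	return (nla_type & NLF_TYPE_MASK) == wanted;
}

/* What a generic parser can learn from the flag without knowing the policy. */
static bool is_nested(uint16_t nla_type)
{
	return (nla_type & NLF_NESTED) != 0;
}
```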
     

01 Mar, 2019

1 commit


27 Feb, 2019

2 commits


26 Feb, 2019

7 commits

  • RFC 8033 replaces the IETF draft for PIE.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • Random dropping of packets to achieve latency control may
    introduce outlier situations where packets are dropped too
    close to each other or too far from each other. This can
    cause the real drop percentage to temporarily deviate from
    the intended drop probability. In certain scenarios, such
    as a small number of simultaneous TCP flows, these
    deviations can significantly affect link utilization and
    queuing latency.

    RFC 8033 suggests using a derandomization mechanism to avoid
    these deviations.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
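    The RFC 8033 derandomization idea can be sketched as follows. This is
    not the sch_pie.c code. The struct, the DR_MAX_PROB scale (2^56 - 1,
    as adopted later in this log) and the caller-supplied random value are
    assumptions made to keep the sketch deterministic. Drops never happen
    while the accumulated probability is below 0.85 and are forced once it
    reaches 8.5. In between, a random draw decides. The sketch resets the
    accumulator after either kind of drop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point probability scale: 1.0 == DR_MAX_PROB. */
#define DR_MAX_PROB ((UINT64_C(1) << 56) - 1)

struct derand_sketch {
	uint64_t prob;      /* current drop probability, 0..DR_MAX_PROB */
	uint64_t accu_prob; /* probability accumulated since the last drop */
};

/*
 * Derandomized drop decision for one arriving packet. rnd must be
 * uniformly distributed in [0, DR_MAX_PROB]; the caller supplies it.
 */
static bool derand_drop(struct derand_sketch *v, uint64_t rnd)
{
	if (v->prob == 0)
		v->accu_prob = 0;
	v->accu_prob += v->prob;

	/* Below 0.85: too soon after the last drop, never drop. */
	if (v->accu_prob < (DR_MAX_PROB / 100) * 85)
		return false;
	/* At or above 8.5: force a drop; otherwise drop with probability prob. */
	if (v->accu_prob >= (DR_MAX_PROB / 2) * 17 || rnd < v->prob) {
		v->accu_prob = 0;
		return true;
	}
	return false;
}
```

    With a 1% drop probability this keeps consecutive drops at least 85
    packets apart, and no more than 851 apart (the 851 rather than 850
    comes from fixed-point rounding), instead of leaving the spacing
    entirely to chance.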
     
  • The current implementation scales the local alpha and beta
    variables in the calculate_probability function by the same
    amount for all values of drop probability below 1%.

    RFC 8033 suggests using additional cases for auto-tuning
    alpha and beta when the drop probability is less than 1%.

    In order to add more auto-tuning cases, MAX_PROB must be
    scaled to the u64 range instead of the u32 range to prevent
    underflow when scaling down the local alpha and beta
    variables in the calculate_probability function.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
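    A sketch of both halves of this change follows. autotune_shift()
    encodes the RFC 8033 auto-tuning table, which divides alpha and beta
    by 2048, 512, 128, 32, 8 or 2 depending on the current drop
    probability, expressed here as right shifts. scaled_alpha() shows the
    underflow. Its units are assumptions for illustration only, not the
    kernel's exact expression: alpha is stored in sixteenths, so 2 means
    0.125, the RFC 8033 default, and time is counted in 64 ns ticks, i.e.
    15625000 ticks per second.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RFC 8033 auto-tuning cases as a right-shift count:
 * 2048 = 2^11, 512 = 2^9, 128 = 2^7, 32 = 2^5, 8 = 2^3, 2 = 2^1.
 * prob is a fixed-point fraction of max_prob.
 */
static unsigned int autotune_shift(uint64_t prob, uint64_t max_prob)
{
	if (prob < max_prob / 1000000)
		return 11;
	if (prob < max_prob / 100000)
		return 9;
	if (prob < max_prob / 10000)
		return 7;
	if (prob < max_prob / 1000)
		return 5;
	if (prob < max_prob / 100)
		return 3;
	if (prob < max_prob / 10)
		return 1;
	return 0;
}

/* Illustrative "probability per tick" scaling of alpha (assumed units). */
static uint64_t scaled_alpha(uint64_t alpha_16ths, uint64_t max_prob)
{
	return (alpha_16ths * (max_prob / 15625000)) >> 4;
}
```

    On a u32 scale the scaled alpha is only 34, so dividing it by 2048
    leaves 0. On a u64 scale, roughly 7.2 * 10^7 survives the same shift.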
     
  • RFC 8033 suggests an initial value of 150 milliseconds for
    the maximum time allowed for a burst of packets.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • RFC 8033 suggests a default value of 15 milliseconds for the
    update interval.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • RFC 8033 suggests a default value of 15 milliseconds for the
    target queue delay.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
     
  • RFC 8033 recommends a value of 16384 bytes for the queue
    threshold.

    Signed-off-by: Mohit P. Tahiliani
    Signed-off-by: Dhaval Khandla
    Signed-off-by: Hrishikesh Hiraskar
    Signed-off-by: Manish Kumar B
    Signed-off-by: Sachin D. Patil
    Signed-off-by: Leslie Monis
    Acked-by: Dave Taht
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Mohit P. Tahiliani
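    For reference, the four RFC 8033 defaults above can be gathered into
    one struct. The struct, its field names and the microsecond units are
    hypothetical. The helper assumes that the queue threshold gates when a
    dequeue-rate measurement may start, as the "if (qlen <
    QUEUE_THRESHOLD)" check quoted in the checkpatch output of the
    08 Oct, 2018 entry suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the RFC 8033 defaults listed above. */
struct pie_defaults_sketch {
	uint32_t target_us;          /* target queue delay: 15 ms */
	uint32_t tupdate_us;         /* probability update interval: 15 ms */
	uint32_t max_burst_us;       /* maximum burst allowance: 150 ms */
	uint32_t dq_threshold_bytes; /* queue threshold: 16384 bytes (2^14) */
};

static const struct pie_defaults_sketch pie_defaults = {
	.target_us          = 15 * 1000,
	.tupdate_us         = 15 * 1000,
	.max_burst_us       = 150 * 1000,
	.dq_threshold_bytes = 16384,
};

/* Assumed use: only a backlog of at least 16 KiB starts a rate sample. */
static bool can_start_rate_sample(uint32_t backlog_bytes)
{
	return backlog_bytes >= pie_defaults.dq_threshold_bytes;
}
```

    With these values, the 150 ms burst allowance corresponds to ten
    15 ms update intervals.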
     

08 Oct, 2018

1 commit

  • Fix 5 warnings and 14 checks issued by checkpatch.pl:

    CHECK: Logical continuations should be on the previous line
    + if ((q->vars.qdelay < q->params.target / 2)
    + && (q->vars.prob < MAX_PROB / 5))

    WARNING: line over 80 characters
    + q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE]));

    CHECK: Blank lines aren't necessary after an open brace '{'
    +{
    +

    CHECK: braces {} should be used on all arms of this statement
    + if (qlen < QUEUE_THRESHOLD)
    [...]
    + else {
    [...]

    CHECK: Unbalanced braces around else statement
    + else {

    CHECK: No space is necessary after a cast
    + if (delta > (s32) (MAX_PROB / (100 / 2)) &&

    CHECK: Unnecessary parentheses around 'qdelay == 0'
    + if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

    CHECK: Unnecessary parentheses around 'qdelay_old == 0'
    + if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

    CHECK: Unnecessary parentheses around 'q->vars.prob == 0'
    + if ((q->vars.qdelay < q->params.target / 2) &&
    + (q->vars.qdelay_old < q->params.target / 2) &&
    + (q->vars.prob == 0) &&
    + (q->vars.avg_dq_rate > 0))

    CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0'
    + if ((q->vars.qdelay < q->params.target / 2) &&
    + (q->vars.qdelay_old < q->params.target / 2) &&
    + (q->vars.prob == 0) &&
    + (q->vars.avg_dq_rate > 0))

    CHECK: Blank lines aren't necessary before a close brace '}'
    +
    +}

    CHECK: Comparison to NULL could be written "!opts"
    + if (opts == NULL)

    CHECK: No space is necessary after a cast
    + ((u32) PSCHED_TICKS2NS(q->params.target)) /

    WARNING: line over 80 characters
    + nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) ||

    CHECK: Blank lines aren't necessary before a close brace '}'
    +
    +}

    CHECK: No space is necessary after a cast
    + .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) /

    WARNING: Missing a blank line after declarations
    + struct sk_buff *skb;
    + skb = qdisc_dequeue_head(sch);

    WARNING: Missing a blank line after declarations
    + struct pie_sched_data *q = qdisc_priv(sch);
    + qdisc_reset_queue(sch);

    WARNING: Missing a blank line after declarations
    + struct pie_sched_data *q = qdisc_priv(sch);
    + q->params.tupdate = 0;

    Signed-off-by: Leslie Monis
    Signed-off-by: David S. Miller

    Leslie Monis
     

22 Dec, 2017

2 commits


18 Oct, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly. Add pointer back to Qdisc.

    Cc: Jamal Hadi Salim
    Cc: Cong Wang
    Cc: Jiri Pirko
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
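    The callback pattern can be illustrated with a standalone sketch.
    from_timer() plays the role of the generic container_of() technique
    shown here. The struct types and helper names are hypothetical
    stand-ins for struct timer_list and a qdisc's private data, and the
    pointer back to the Qdisc mentioned in the message is omitted. The
    timer is embedded in the private data, so the callback can recover
    the enclosing struct from the timer pointer alone.

```c
#include <assert.h>
#include <stddef.h>

/* Generic container_of: recover the enclosing struct from a member pointer. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct timer_list. */
struct timer_sketch {
	void (*function)(struct timer_sketch *t);
};

/* Hypothetical private data with the timer embedded (not a pointer). */
struct sched_data_sketch {
	int updates;
	struct timer_sketch adapt_timer;
};

/* The callback receives only the timer and derives its private data from it. */
static void adapt_timer_cb(struct timer_sketch *t)
{
	struct sched_data_sketch *q =
		container_of_sketch(t, struct sched_data_sketch, adapt_timer);

	q->updates++;
}

static void timer_setup_sketch(struct timer_sketch *t,
			       void (*fn)(struct timer_sketch *))
{
	t->function = fn;
}
```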
     

14 Apr, 2017

1 commit


19 Sep, 2016

2 commits


26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, although one of FQ's goals
    was to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue() methods and to the qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
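    The pattern behind the to_free parameter can be reduced to a
    standalone sketch with hypothetical types and helper names. While the
    lock is held, enqueue-time drops are only chained onto a caller-owned
    list. The caller frees the whole list after releasing the lock, so the
    frees no longer delay dequeue(). Locking is only indicated in
    comments.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet with a singly linked "next" pointer. */
struct skb_sketch {
	struct skb_sketch *next;
};

/*
 * Called with the (notional) qdisc lock held: instead of freeing the
 * packet immediately, push it onto the caller-owned to_free list.
 */
static void drop_deferred(struct skb_sketch *skb, struct skb_sketch **to_free)
{
	skb->next = *to_free;
	*to_free = skb;
}

/* Called after the lock is released: free every deferred packet. */
static int free_list(struct skb_sketch *list)
{
	int freed = 0;

	while (list) {
		struct skb_sketch *next = list->next;

		free(list);
		list = next;
		freed++;
	}
	return freed;
}
```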
     

16 Jun, 2016

1 commit


01 Mar, 2016

1 commit

  • When the bottom qdisc decides to, for example, drop some packet,
    it calls qdisc_tree_decrease_qlen() to update the queue length
    for all its ancestors. We need to update the backlog too, to
    keep the stats on the root qdisc accurate.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
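    Here is a standalone sketch of why both counters must travel up the
    tree. The types and helper name are hypothetical. The dropping child
    has already adjusted its own counters, and the helper then walks its
    ancestors, reducing the packet count and the byte backlog together so
    the root's statistics stay consistent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical qdisc node tracking packets and bytes queued below it. */
struct qdisc_sketch {
	struct qdisc_sketch *parent;
	unsigned int qlen;    /* packets */
	unsigned int backlog; /* bytes */
};

/* After sch drops n packets totalling len bytes, update every ancestor. */
static void tree_reduce_backlog_sketch(struct qdisc_sketch *sch, unsigned int n,
				       unsigned int len)
{
	for (struct qdisc_sketch *q = sch->parent; q; q = q->parent) {
		assert(q->qlen >= n && q->backlog >= len);
		q->qlen -= n;
		q->backlog -= len;
	}
}
```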
     

30 Oct, 2014

1 commit


30 Sep, 2014

1 commit


14 Feb, 2014

1 commit

  • Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add
    more details. Also add references to algorithm (IETF draft and paper) to top of
    file.

    Signed-off-by: Vijay Subramanian
    CC: Mythili Prabhu
    CC: Norbert Kiesel
    Signed-off-by: David S. Miller

    Vijay Subramanian
     

15 Jan, 2014

1 commit


07 Jan, 2014

1 commit

  • Proportional Integral controller Enhanced (PIE) is a scheduler to address the
    bufferbloat problem.

    From the IETF draft below:
    " Bufferbloat is a phenomenon where excess buffers in the network cause high
    latency and jitter. As more and more interactive applications (e.g. voice over
    IP, real time video streaming and financial transactions) run in the Internet,
    high latency and jitter degrade application performance. There is a pressing
    need to design intelligent queue management schemes that can control latency and
    jitter; and hence provide desirable quality of service to users.

    We present here a lightweight design, PIE (Proportional Integral controller
    Enhanced) that can effectively control the average queueing latency to a target
    value. Simulation results, theoretical analysis and Linux testbed results have
    shown that PIE can ensure low latency and achieve high link utilization under
    various congestion situations. The design does not require per-packet
    timestamp, so it incurs very small overhead and is simple enough to implement
    in both hardware and software. "

    Many thanks to Dave Taht for extensive feedback, reviews, testing and
    suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
    suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
    support.

    For more information, please see the technical paper about PIE in the IEEE
    Conference on High Performance Switching and Routing 2013. A copy of the paper
    can be found at ftp://ftpeng.cisco.com/pie/.

    Please also refer to the IETF draft submission at
    http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

    All relevant code, documents and test scripts and results can be found at
    ftp://ftpeng.cisco.com/pie/.

    For problems with the iproute2/tc or Linux kernel code, please contact Vijay
    Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili
    Prabhu (mysuryan@cisco.com).

    Signed-off-by: Vijay Subramanian
    Signed-off-by: Mythili Prabhu
    CC: Dave Taht
    Signed-off-by: David S. Miller

    Vijay Subramanian