30 May, 2018

1 commit

  • [ Upstream commit ecc832758a654e375924ebf06a4ac971acb5ce60 ]

    The link to the pdf containing the algorithm description is now a
    dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been
    moved to https://sites.google.com/a/illinois.edu/srikant/ and none of
    the original papers can be found there...

    I have replaced it with the only working copy I was able to find.

    n.b. there is also a copy available at:

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf

    However, this seems to only be a *cached* version, so I am unsure
    exactly how reliable that link can be expected to remain over time
    and have decided against using that one.

    Signed-off-by: Joey Pabalinas

    net/ipv4/tcp_illinois.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Joey Pabalinas
     

07 Aug, 2017

1 commit

  • Most TCP congestion controls, BBR being the exception, use identical
    logic to undo cwnd. This patch consolidates these similar functions
    into the one currently used by Reno and others.

    Suggested-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

22 Nov, 2016

1 commit

  • Congestion control algorithms that do not halve cwnd in their .ssthresh
    should provide a .undo_cwnd rather than rely on the current fallback,
    which assumes Reno halving (and thus doubles the cwnd).

    All of these do 'something else' in their .ssthresh implementation, so
    they store the cwnd on loss and provide .undo_cwnd to restore it again.

    A follow-up patch will remove the fallback, and all algorithms will
    need to provide a .undo_cwnd function.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
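The store-on-loss, restore-on-undo pattern described in this commit can be sketched in userspace C. This is a model and not kernel code. The names `struct cc_sock`, `illinois_like_ssthresh()`, `saved_undo_cwnd()` and `loss_cwnd` are hypothetical. The 1/8 backoff is an arbitrary stand-in for an algorithm that does not halve cwnd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the per-connection state involved. */
struct cc_sock {
	uint32_t snd_cwnd;  /* current congestion window, in packets */
	uint32_t loss_cwnd; /* cwnd saved when the loss was detected */
};

/* .ssthresh-style hook that does NOT halve cwnd: it backs off by 1/8
 * (an arbitrary choice for this sketch) and remembers the pre-loss cwnd. */
static uint32_t illinois_like_ssthresh(struct cc_sock *s)
{
	uint32_t reduced = s->snd_cwnd - s->snd_cwnd / 8;

	s->loss_cwnd = s->snd_cwnd;
	return reduced > 2 ? reduced : 2;
}

/* .undo_cwnd-style hook: if the loss was spurious, restore the saved
 * window instead of assuming Reno halving and doubling ssthresh. */
static uint32_t saved_undo_cwnd(const struct cc_sock *s)
{
	return s->snd_cwnd > s->loss_cwnd ? s->snd_cwnd : s->loss_cwnd;
}
```

Take a window of 80 packets. The hook sets ssthresh to 70, and the undo restores 80. A fallback that assumed halving would double 70 and produce 140, which is the problem the commit message describes.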
     

12 May, 2016

1 commit

  • Replace 2 arguments (cnt and rtt) in the congestion control modules'
    pkts_acked() function with a struct. This will allow adding more
    information without having to modify existing congestion control
    modules (tcp_nv in particular needs bytes in flight when packet
    was sent).

    As proposed by Neal Cardwell in his comments to the tcp_nv patch.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Lawrence Brakmo
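Bundling the per-ACK values into one struct lets later fields be added without changing every handler's signature. The sketch below models this in userspace. The names `struct ack_sample_model`, its fields, `struct rtt_stats` and `rtt_stats_pkts_acked()` are hypothetical stand-ins, and the kernel's actual names may differ. The handler accumulates RTT statistics roughly the way a delay-based module such as Illinois would need them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-ACK information handed to a module.
 * New members can be appended without changing any handler signature. */
struct ack_sample_model {
	uint32_t pkts_acked; /* packets newly acknowledged by this ACK */
	int32_t rtt_us;      /* RTT sample in usec; negative = no valid sample */
	uint32_t in_flight;  /* illustrative later addition (bytes in flight) */
};

/* Hypothetical per-connection RTT statistics for a delay-based module.
 * The caller initializes base_rtt to UINT32_MAX. */
struct rtt_stats {
	uint32_t base_rtt; /* minimum RTT seen, usec */
	uint32_t max_rtt;  /* maximum RTT seen, usec */
	uint64_t sum_rtt;  /* sum of RTT samples in the current round */
	uint32_t cnt_rtt;  /* number of RTT samples in the current round */
	uint32_t acked;    /* packets acked in the current round */
};

/* pkts_acked()-style handler taking a single struct argument. */
static void rtt_stats_pkts_acked(struct rtt_stats *st,
				 const struct ack_sample_model *sample)
{
	uint32_t rtt;

	st->acked += sample->pkts_acked;
	if (sample->rtt_us < 0) /* ACK carried no usable RTT sample */
		return;

	rtt = (uint32_t)sample->rtt_us;
	if (rtt < st->base_rtt)
		st->base_rtt = rtt;
	if (rtt > st->max_rtt)
		st->max_rtt = rtt;
	st->sum_rtt += rtt;
	st->cnt_rtt++;
}
```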
     

10 Jul, 2015

1 commit

  • Add a helper to test the slow start condition in various congestion
    control modules and other places. This is to prepare a slight improvement
    in policy as to exactly when to slow start.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Yuchung Cheng
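A helper like this reduces to one comparison kept in one place. The userspace sketch below assumes the conventional condition `cwnd < ssthresh`. The names `struct cwnd_state` and `in_slow_start()` are illustrative, and the kernel helper may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cwnd_state {
	uint32_t snd_cwnd;     /* congestion window, packets */
	uint32_t snd_ssthresh; /* slow start threshold, packets */
};

/* Single definition of "are we in slow start?", so a later policy
 * change (for example < versus <=) only has to be made here. */
static inline bool in_slow_start(const struct cwnd_state *s)
{
	return s->snd_cwnd < s->snd_ssthresh;
}
```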
     

30 Apr, 2015

1 commit

  • We would like the optional info that congestion control modules
    provide via netlink to also be readable using getsockopt().

    This patch changes get_info() to put this information in a buffer,
    instead of skb, like tcp_get_info(), so that following patch
    can reuse this common infrastructure.

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Neal Cardwell
    Acked-by: Daniel Borkmann
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
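Under this change, each module writes its statistics into a caller-supplied buffer and reports the number of bytes written. Both the netlink path and getsockopt() can then share one implementation. The sketch below models that contract in userspace. The names `struct vegas_like_info`, `union cc_info_model`, `module_get_info()` and `getsockopt_copy()` are hypothetical, and the real kernel signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-module statistics record. */
struct vegas_like_info {
	uint32_t enabled;
	uint32_t rttcnt;
	uint32_t rtt;
	uint32_t minrtt;
};

/* Hypothetical union sized for any module's record. */
union cc_info_model {
	struct vegas_like_info vegas;
};

/* Module hook: fill the buffer and return the number of bytes written. */
static size_t module_get_info(uint32_t rttcnt, uint32_t avg_rtt,
			      uint32_t min_rtt, union cc_info_model *info)
{
	memset(info, 0, sizeof(*info));
	info->vegas.enabled = 1;
	info->vegas.rttcnt = rttcnt;
	info->vegas.rtt = avg_rtt;
	info->vegas.minrtt = min_rtt;
	return sizeof(info->vegas);
}

/* getsockopt()-style consumer: copy at most user_len bytes and report
 * how many were copied. The statistics values are placeholders. */
static size_t getsockopt_copy(void *user_buf, size_t user_len)
{
	union cc_info_model info;
	size_t sz = module_get_info(4, 250, 200, &info);
	size_t n = user_len < sz ? user_len : sz;

	memcpy(user_buf, &info, n);
	return n;
}
```

A netlink-style consumer would call the same `module_get_info()` and attach the returned number of bytes as an attribute. Only the copy-out step differs between the two paths.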
     

18 Apr, 2015

1 commit

  • Two different problems are fixed here:

    1) inet_sk_diag_fill() might be called without the socket lock held.
    icsk->icsk_ca_ops can change under us and the module can be unloaded.
    -> Access to freed memory.
    Fix this using rcu_read_lock() to prevent module unload.

    2) Some TCP congestion control modules provide information,
    but again this is not safe against icsk->icsk_ca_ops
    changes, and nla_put() errors were ignored. Some sockets
    could not get the additional info if the skb was almost full.

    Fix this by returning a status from get_info() handlers and
    using RCU protection as well.

    Signed-off-by: Eric Dumazet
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Sep, 2014

1 commit

  • Fix places where there is a space before a tab, long lines,
    awkward if(){, double spacing, etc. Add a blank line after
    declarations/initializations.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

04 May, 2014

1 commit


27 Feb, 2014

1 commit

  • Upcoming congestion controls for TCP require usec resolution for RTT
    estimations. Millisecond resolution is simply not enough these days.

    FQ/pacing in DC environments also requires this change for finer control
    and removal of the bimodal behavior caused by the current 'small rtt'
    hack in tcp_update_pacing_rate().

    TCP_CONG_RTT_STAMP is no longer needed.

    As Julian Anastasov pointed out, we need to keep user compatibility :
    tcp_metrics used to export RTT and RTTVAR in msec resolution,
    so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
    to use the new attributes if provided by the kernel.

    In this example, the ss command displays an srtt of 32 usecs (10Gbit link):

    lpk51:~# ./ss -i dst lpk52
    Netid  State  Recv-Q Send-Q   Local Address:Port    Peer Address:Port
    tcp    ESTAB  0      1         10.246.11.51:42959  10.246.11.52:64614
             cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

    Updated iproute2 ip command displays:

    lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

    Old binary displays:

    lpk51:~# ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51

    With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Cc: Stephen Hemminger
    Cc: Yuchung Cheng
    Cc: Larry Brakmo
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
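Integer arithmetic shows why millisecond resolution falls short: the 32 usec srtt from the ss example becomes zero in whole milliseconds. The sketch below keeps the usec value as the source of truth and derives a legacy msec field from it. The names `struct rtt_export` and `export_rtt()` are hypothetical. Truncation for the legacy field is chosen for illustration and is not necessarily what the kernel does.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000U

/* Hypothetical export record carrying both resolutions: old tools keep
 * reading the msec field, updated tools read the usec field. */
struct rtt_export {
	uint32_t rtt_ms; /* legacy attribute, whole milliseconds */
	uint32_t rtt_us; /* new attribute, microseconds */
};

static struct rtt_export export_rtt(uint32_t srtt_us)
{
	struct rtt_export e;

	e.rtt_us = srtt_us;
	e.rtt_ms = srtt_us / USEC_PER_MSEC; /* truncates: 32 us -> 0 ms */
	return e;
}
```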
     

14 Feb, 2014

1 commit


24 Jan, 2014

2 commits

  • Now that the definition is centralized in , the
    definitions of U32_MAX (and related) elsewhere in the kernel can be
    removed.

    Signed-off-by: Alex Elder
    Acked-by: Sage Weil
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • The symbol U32_MAX is defined in several spots. Change these
    definitions to be conditional. This is in preparation for the next
    patch, which centralizes the definition in .

    Signed-off-by: Alex Elder
    Cc: Sage Weil
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
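A conditional local definition can coexist with a central one during the transition. Whichever definition is seen first wins, and later guarded copies become no-ops. The minimal sketch below uses `uint32_t` from `<stdint.h>` in place of the kernel's `u32` type. `(u32)~0U` is the conventional all-ones definition.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32; /* stand-in for the kernel's u32 type */

/* Local fallback: only defined if no central header defined it first. */
#ifndef U32_MAX
#define U32_MAX ((u32)~0U)
#endif
```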
     

05 Nov, 2013

1 commit

  • Currently, slow start increases cwnd by 1 if an ACK acknowledges some
    packets, regardless of the number of packets. Consequently, slow start
    performance is highly dependent on the degree of the stretch ACKs caused
    by receiver or network ACK compression mechanisms (e.g., delayed-ACK,
    GRO, etc). But the slow start algorithm is meant to send twice as many
    packets as have left the network, so it should process a stretch ACK of
    degree N as if it were N ACKs of degree 1, then exit when cwnd exceeds
    ssthresh. A follow-up patch will use the remainder of the N (if greater
    than 1) to adjust cwnd in the congestion avoidance phase.

    In addition, this patch retires the experimental limited slow start
    (LSS) feature. LSS has multiple drawbacks and questionable benefit. The
    fractional cwnd increase in LSS requires a loop in slow start even
    though it's rarely used. Configuring such an increase step via a global
    sysctl for different BDPs seems hard. Finally and most importantly, the
    slow start overshoot concern is now better covered by Hybrid slow
    start (hystart), which is enabled by default.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
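Treating a stretch ACK of degree N as N ordinary ACKs means adding N to cwnd, capped at ssthresh. Any remainder is left for congestion avoidance. The sketch below shows that arithmetic. The names `struct ss_state` and `slow_start_acked()` are hypothetical. Returning the leftover count reflects the follow-up patch mentioned in the message rather than this commit's exact code.

```c
#include <assert.h>
#include <stdint.h>

struct ss_state {
	uint32_t snd_cwnd;     /* congestion window, packets */
	uint32_t snd_ssthresh; /* slow start threshold, packets */
};

/* Grow cwnd by one per newly ACKed packet, as if a stretch ACK of
 * degree N were N separate ACKs, but never past ssthresh. Returns the
 * part of N not consumed by slow start (for congestion avoidance). */
static uint32_t slow_start_acked(struct ss_state *s, uint32_t acked)
{
	uint32_t room, used;

	if (s->snd_cwnd >= s->snd_ssthresh)
		return acked; /* not in slow start: nothing consumed */
	room = s->snd_ssthresh - s->snd_cwnd;
	used = acked < room ? acked : room;
	s->snd_cwnd += used;
	return acked - used;
}
```

With the old behavior, a stretch ACK covering 4 packets grew cwnd by 1. Here it grows cwnd by 4, just as four separate ACKs would.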
     

01 Nov, 2012

1 commit

  • Reading TCP stats when using TCP Illinois congestion control algorithm
    can cause a divide by zero kernel oops.

    The division by zero occurs in tcp_illinois_info() at:
    do_div(t, ca->cnt_rtt);
    where ca->cnt_rtt can become zero (when rtt_reset() is called).

    Steps to Reproduce:
    1. Register tcp_illinois:
    # sysctl -w net.ipv4.tcp_congestion_control=illinois
    2. Monitor internal TCP information via command "ss -i"
    # watch -d ss -i
    3. Establish new TCP conn to machine

    It either fails at the initial connection, or else it needs to wait
    for a loss or a reset.

    This is only related to reading stats. The function avg_delay() also
    performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
    calling point in update_params(). Thus, simply fix tcp_illinois_info().

    Function tcp_illinois_info() / get_info() is called without the
    socket lock. Thus, eliminate any race condition on ca->cnt_rtt
    by using a local stack variable. Simply reuse info.tcpv_rttcnt,
    as it's already set to ca->cnt_rtt.
    Function avg_delay() is not affected by this race condition, as
    it's called with the socket lock held.

    Cc: Petr Matousek
    Signed-off-by: Jesper Dangaard Brouer
    Acked-by: Eric Dumazet
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
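The fix pattern is to read the shared counter once into a local, test that copy, and divide only by that copy. A concurrent reset then cannot zero the divisor between the check and the division. The userspace sketch below uses the hypothetical names `struct illinois_stats` and `avg_rtt_snapshot()`. It replaces the kernel's do_div() with ordinary 64-bit division. It also ignores the extra care real concurrent code needs to keep the compiler from re-reading the field.

```c
#include <assert.h>
#include <stdint.h>

struct illinois_stats {
	uint64_t sum_rtt; /* sum of RTT samples, usec */
	uint32_t cnt_rtt; /* number of samples; reset to 0 on an RTT reset */
};

/* Average RTT for reporting: cnt_rtt is read once into a local and the
 * division is only performed if that copy is non-zero. */
static uint32_t avg_rtt_snapshot(const struct illinois_stats *ca)
{
	uint32_t cnt = ca->cnt_rtt; /* single read; only this copy is used */
	uint64_t t;

	if (cnt == 0)
		return 0; /* no samples yet, or just reset */
	t = ca->sum_rtt;
	return (uint32_t)(t / cnt);
}
```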
     

10 Mar, 2011

1 commit


18 Oct, 2010

1 commit

  • The patch below updates broken web addresses in the kernel

    Signed-off-by: Justin P. Mattock
    Cc: Maciej W. Rozycki
    Cc: Geert Uytterhoeven
    Cc: Finn Thain
    Cc: Randy Dunlap
    Cc: Matt Turner
    Cc: Dimitry Torokhov
    Cc: Mike Frysinger
    Acked-by: Ben Pfaff
    Acked-by: Hans J. Koch
    Reviewed-by: Finn Thain
    Signed-off-by: Jiri Kosina

    Justin P. Mattock
     

29 Jan, 2008

1 commit


29 Nov, 2007

1 commit


31 Jul, 2007

1 commit

  • This patch changes the API for the callback that is done after an ACK is
    received. It solves a couple of issues:

    * Some congestion controls want a higher resolution RTT value
    (controlled by the TCP_CONG_RTT_SAMPLE flag). These don't really want a
    ktime, but all compute an RTT in microseconds.

    * Other congestion controls could use RTT at jiffies resolution.
    To keep API consistent the units should be the same for both cases, just the
    resolution should change.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
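Keeping a single unit and varying only the resolution means every RTT reaches the module in microseconds. A tick-based measurement is scaled up, and a high-resolution one is scaled down. The sketch below assumes a hypothetical tick rate `HZ_MODEL` of 250; the real HZ is a kernel configuration choice. The names `jiffies_to_usec_model()` and `ns_delta_to_usec()` are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define HZ_MODEL 250U /* assumed tick rate: one tick = 4000 usec */
#define USEC_PER_SEC 1000000U

/* Coarse, tick-based RTT converted to usec: same unit, 4 ms granularity. */
static uint32_t jiffies_to_usec_model(uint32_t ticks)
{
	return ticks * (USEC_PER_SEC / HZ_MODEL);
}

/* High-resolution RTT (a nanosecond delta) reduced to usec. */
static uint32_t ns_delta_to_usec(uint64_t ns)
{
	return (uint32_t)(ns / 1000U);
}
```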
     

18 Jul, 2007

1 commit


16 Jun, 2007

1 commit

  • Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT
    sampling of congestion control modules. Inaccurate timestamps
    could be fed to them without providing any way for them to
    identify such cases. Previously, the RTT sampler was called only if
    FLAG_RETRANS_DATA_ACKED was not set, which filtered out inaccurate
    timestamps nicely. In addition, the new behavior could give an
    invalid timestamp (zero) to RTT sampler if only skbs with
    TCPCB_RETRANS were ACKed. This solves both problems.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
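This filtering follows the usual rule, known as Karn's algorithm: an ACK that covers retransmitted data gives an ambiguous RTT and should not be sampled. The userspace sketch below rests on several assumptions made for illustration, none of them the kernel's definitions: the flag's bit value, the function `rtt_sample_for_ack()`, and the use of -1 to mean "no valid sample".

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_RETRANS_DATA_ACKED_MODEL 0x08U /* hypothetical bit value */

/* Return an RTT sample in usec, or -1 if none should be taken: either
 * retransmitted data was ACKed (ambiguous timing), or no valid send
 * timestamp exists (e.g. only retransmitted skbs were ACKed). */
static int64_t rtt_sample_for_ack(uint32_t flags, uint64_t now_us,
				  uint64_t sent_us)
{
	if (flags & FLAG_RETRANS_DATA_ACKED_MODEL)
		return -1;
	if (sent_us == 0 || sent_us > now_us)
		return -1; /* zero or bogus timestamp: do not feed it on */
	return (int64_t)(now_us - sent_us);
}
```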
     

26 Apr, 2007

4 commits

  • To avoid raw division, use ktime_to_timeval() to get usec.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Do some simple changes to make congestion control API faster/cleaner.
    * use ktime_t rather than timeval
    * merge rtt sampling into the existing ack callback;
    this means one indirect call instead of two per ack.
    * use flags bits to store options/settings

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This version more closely matches the paper and fixes several
    math errors. The biggest difference is that it updates alpha/beta
    once per RTT.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This is an implementation of TCP Illinois, invented by Shao Liu
    at the University of Illinois. It is another variant of Reno that adapts
    the alpha and beta parameters based on RTT. The basic idea is to increase
    the window less rapidly as delay approaches the maximum. See the papers
    and talks for a more complete description.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
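The core idea is to grow the window less aggressively, and back off harder, as queueing delay approaches its maximum. It can be sketched in floating point. This is a simplified illustration of the shape described above, not the module's fixed-point code. The following values are assumptions chosen for the example:

- bounds: alpha in [0.3, 10], beta in [1/8, 1/2]
- thresholds: d1 = dm/100, d2 = dm/10, d3 = 8*dm/10
- `da`: the average queueing delay (average RTT minus base RTT)
- `dm`: the maximum queueing delay

```c
#include <assert.h>

/* Assumed parameter choices for this sketch. */
#define ALPHA_MIN 0.3
#define ALPHA_MAX 10.0
#define BETA_MIN 0.125
#define BETA_MAX 0.5

/* Additive-increase factor: ALPHA_MAX while delay is small, then falling
 * as k1 / (k2 + da), with k1 and k2 chosen so that alpha(d1) = ALPHA_MAX
 * and alpha(dm) = ALPHA_MIN. */
static double illinois_alpha(double da, double dm)
{
	double d1 = dm / 100.0;
	double k1, k2;

	if (dm <= 0.0 || da <= d1)
		return ALPHA_MAX;
	k2 = (ALPHA_MIN * dm - ALPHA_MAX * d1) / (ALPHA_MAX - ALPHA_MIN);
	k1 = ALPHA_MAX * (k2 + d1);
	return k1 / (k2 + da);
}

/* Multiplicative-decrease factor: BETA_MIN at low delay, BETA_MAX at
 * high delay, linear in between. */
static double illinois_beta(double da, double dm)
{
	double d2 = dm / 10.0;
	double d3 = 8.0 * dm / 10.0;

	if (da <= d2)
		return BETA_MIN;
	if (da >= d3)
		return BETA_MAX;
	return BETA_MIN + (BETA_MAX - BETA_MIN) * (da - d2) / (d3 - d2);
}
```

In a Reno-style loop these would be applied roughly as cwnd += alpha / cwnd per ACK and cwnd -= beta * cwnd on loss. Alpha and beta would be recomputed once per RTT.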