18 May, 2017

2 commits

  • The TCP Timestamps option is defined in RFC 7323.

    Traditionally on Linux, it has been tied to the internal
    'jiffies' variable, because it had been a cheap and good enough
    generator.

    For TCP flows on the Internet, 1 ms resolution would be much better
    than 4 ms or 10 ms (HZ=250 or HZ=100, respectively).

    For TCP flows in the datacenter (DC), Google has used usec resolution
    for more than two years with great success [1].

    Receive size autotuning (DRS) is indeed more precise and converges
    faster to optimal window size.

    This patch converts tp->tcp_mstamp to a plain u64 value storing
    a 1 usec TCP clock.

    This choice will allow us to upstream the 1 usec TS option as
    discussed in IETF 97.

    [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf

    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • tcp_time_stamp will become slightly more expensive soon, so
    cache its value.

    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
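A minimal sketch of the two ideas above, assuming a hypothetical per-connection struct `conn_clock` (not the kernel's `struct tcp_sock`): the microsecond clock is read once and cached in a plain 64-bit value, and a 1 ms resolution TSval is derived from that cached value. For contrast, a TSval driven by a jiffies counter can only advance in steps of 1000/HZ ms. All names here are illustrative.

```c
#include <stdint.h>

/* Hypothetical connection state: caches one clock read, in usec. */
struct conn_clock {
	uint64_t mstamp_us; /* cached 1 usec clock value */
};

/* Refresh the cache once, e.g. when an incoming segment is processed,
 * so later users in the same code path reuse this value. */
static void conn_clock_refresh(struct conn_clock *c, uint64_t now_us)
{
	c->mstamp_us = now_us;
}

/* Derive a 32-bit, 1 ms resolution TSval from the cached usec clock.
 * Truncation to 32 bits is intentional: TSval wraps around. */
static uint32_t conn_tsval_ms(const struct conn_clock *c)
{
	return (uint32_t)(c->mstamp_us / 1000);
}

/* For comparison: a jiffies-driven TSval advances in steps of
 * 1000 / hz ms (4 ms at HZ=250, 10 ms at HZ=100).
 * Assumes hz divides 1000. */
static uint32_t jiffies_tsval_ms(uint64_t jiffies, uint32_t hz)
{
	return (uint32_t)(jiffies * (1000 / hz));
}
```

With a cached clock of 1234567 us, `conn_tsval_ms()` yields 1234, while one jiffy at HZ=250 already spans 4 ms.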
     

03 May, 2017

1 commit

  • Be careful when comparing tcp_time_stamp to some u32 quantity;
    otherwise the result can be surprising.

    Fixes: 7c106d7e782b ("[TCP]: TCP Low Priority congestion control")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
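The commit message does not spell out the surprise; one common cause with 32-bit timestamps is wraparound, offered here as an assumption for illustration. The sketch below uses hypothetical helper names, modeled on the signed-difference idiom of the kernel's `before()`/`after()` sequence-number helpers, and contrasts it with a plain `>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive comparison: gives the wrong answer once the counter wraps. */
static bool naive_after(uint32_t a, uint32_t b)
{
	return a > b;
}

/* Wrap-safe comparison: take the unsigned difference, then test it
 * as signed. Valid while the two values are less than 2^31 apart.
 * (The uint32_t -> int32_t conversion assumes two's complement,
 * which holds on all mainstream compilers.) */
static bool wrap_safe_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

Just after a wrap, `a = 5` is logically later than `b = 0xFFFFFFF0`: the naive test says no, the signed difference (21) says yes.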
     

22 Nov, 2016

1 commit

  • The undo_cwnd fallback in the stack doubles cwnd based on ssthresh,
    which undoes Reno's halving behaviour.

    It seems more appropriate to let congctl algorithms pair .ssthresh
    and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it
    up for all congestion algorithms that used to rely on the fallback.

    Cc: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
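A simplified simulation of why .ssthresh and .undo_cwnd should be paired, assuming a hypothetical `cc_state` struct and helper names (not kernel code). The undo rule doubles ssthresh, as the fallback above is described; that only restores the original window when ssthresh was obtained by halving, as in Reno. The floor of 2 packets is an assumption matching the usual Reno behaviour.

```c
#include <stdint.h>

/* Hypothetical, simplified sender state (not struct tcp_sock). */
struct cc_state {
	uint32_t cwnd;
	uint32_t ssthresh;
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Reno-style ssthresh on loss: half the current window, at least 2. */
static uint32_t reno_ssthresh(const struct cc_state *s)
{
	return max_u32(s->cwnd >> 1, 2);
}

/* Reno-style undo: doubling ssthresh recovers the pre-halving cwnd.
 * An algorithm that reduces cwnd by another factor needs its own undo. */
static uint32_t reno_undo_cwnd(const struct cc_state *s)
{
	return max_u32(s->cwnd, s->ssthresh << 1);
}
```

Starting from cwnd 20, a loss sets ssthresh to 10 and cwnd to 10; a spurious-loss undo then returns cwnd to 20.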
     

12 May, 2016

1 commit

  • Replace 2 arguments (cnt and rtt) in the congestion control modules'
    pkts_acked() function with a struct. This will allow adding more
    information without having to modify existing congestion control
    modules (tcp_nv in particular needs the bytes in flight when the packet
    was sent).

    As proposed by Neal Cardwell in his comments to the tcp_nv patch.

    Signed-off-by: Lawrence Brakmo
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Lawrence Brakmo
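A sketch of the design choice, using a hypothetical `ack_sample_sketch` struct and callback type rather than the kernel's exact definitions: the callback receives one pointer to a per-ACK sample instead of separate (cnt, rtt) arguments, so fields can be appended later without changing every module's signature. The example module, which tracks the minimum RTT, is also hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-ACK sample; new fields can be appended later
 * without touching existing callbacks. */
struct ack_sample_sketch {
	uint32_t pkts_acked; /* packets newly acknowledged by this ACK */
	int32_t rtt_us;      /* RTT sample in usec, negative if none */
};

/* Callback type: one pointer to the sample instead of (cnt, rtt). */
typedef void (*pkts_acked_fn)(void *ca_priv,
			      const struct ack_sample_sketch *sample);

/* Hypothetical module state: minimum RTT and total packets acked. */
struct min_rtt_ca {
	int32_t min_rtt_us; /* start at INT32_MAX, meaning "none yet" */
	uint64_t total_acked;
};

static void min_rtt_pkts_acked(void *ca_priv,
			       const struct ack_sample_sketch *sample)
{
	struct min_rtt_ca *ca = ca_priv;

	ca->total_acked += sample->pkts_acked;
	if (sample->rtt_us < 0)
		return; /* no valid RTT for this ACK */
	if (sample->rtt_us < ca->min_rtt_us)
		ca->min_rtt_us = sample->rtt_us;
}
```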
     

04 May, 2014

1 commit


27 Feb, 2014

1 commit

  • Upcoming congestion controls for TCP require usec resolution for RTT
    estimations. Millisecond resolution is simply not enough these days.

    FQ/pacing in DC environments also requires this change for finer control
    and removal of the bimodal behavior caused by the current hack in
    tcp_update_pacing_rate() for 'small rtt'.

    TCP_CONG_RTT_STAMP is no longer needed.

    As Julian Anastasov pointed out, we need to keep user compatibility:
    tcp_metrics used to export RTT and RTTVAR in msec resolution,
    so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
    to use the new attributes if provided by the kernel.

    In this example, the ss command displays an srtt of 32 usecs (10Gbit link):

    lpk51:~# ./ss -i dst lpk52
    Netid  State   Recv-Q Send-Q   Local Address:Port    Peer Address:Port
    tcp    ESTAB   0      1         10.246.11.51:42959   10.246.11.52:64614
          cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

    The updated iproute2 ip command displays:

    lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

    The old binary displays:

    lpk51:~# ip tcp_metrics | grep 10.246.11.52
    10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51

    With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Cc: Stephen Hemminger
    Cc: Yuchung Cheng
    Cc: Larry Brakmo
    Cc: Julian Anastasov
    Signed-off-by: David S. Miller

    Eric Dumazet
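The `send` figure in the ss output above is consistent with cwnd * mss * 8 bits divided by the smoothed RTT: 10 * 1448 * 8 = 115840 bits per 32 us, i.e. 3620 Mbit/s. Treat this formula as an assumption that reproduces the printed numbers, not as a statement of how ss is implemented. The small check below uses a hypothetical helper name.

```c
#include <stdint.h>

/* Estimated send rate: one cwnd worth of bytes per smoothed RTT.
 * Bits per microsecond equals megabits per second. */
static double send_rate_mbps(uint32_t cwnd, uint32_t mss_bytes,
			     double srtt_us)
{
	return (double)cwnd * mss_bytes * 8.0 / srtt_us;
}
```

The pacing_rate shown (7240.0Mbps) is exactly twice that figure. Had the RTT been rounded up to 1 ms, the same formula would report about 116 Mbit/s, roughly 31 times too low, which illustrates why millisecond resolution is not enough on such links.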
     

14 Feb, 2014

1 commit


05 Nov, 2013

1 commit

  • Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
    regardless of the number of packets. Consequently, slow start performance
    is highly dependent on the degree of the stretch ACKs caused by
    receiver or network ACK compression mechanisms (e.g., delayed-ACK,
    GRO, etc). But the slow start algorithm is meant to send twice as many
    packets as have left the network, so it should process a stretch ACK of
    degree N as if it were N ACKs of degree 1, then exit when cwnd exceeds
    ssthresh. A follow-up patch will use the remainder of N (if greater
    than 1) to adjust cwnd in the congestion avoidance phase.

    In addition this patch retires the experimental limited slow start
    (LSS) feature. LSS has multiple drawbacks but questionable benefit. The
    fractional cwnd increase in LSS requires a loop in slow start even
    though it's rarely used. Configuring such an increase step via a global
    sysctl for different BDPs seems hard. Finally, and most importantly, the
    slow start overshoot concern is now better covered by Hybrid slow
    start (hystart), which is enabled by default.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
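A simplified simulation contrasting the old per-ACK increase with the stretch-ACK-aware rule described above, assuming a hypothetical `ss_state` struct and helper names (not the kernel's code). In this sketch growth stops at ssthresh, a simplification of "exits when cwnd exceeds ssthresh", and the unused part of N is returned for congestion avoidance to consume.

```c
#include <stdint.h>

/* Hypothetical, simplified sender state (not struct tcp_sock). */
struct ss_state {
	uint32_t cwnd;
	uint32_t ssthresh;
};

/* Old behaviour: +1 per ACK, however many packets the ACK covers. */
static void slow_start_per_ack(struct ss_state *s)
{
	if (s->cwnd < s->ssthresh)
		s->cwnd++;
}

/* Stretch-ACK aware: an ACK covering `acked` packets counts as
 * `acked` ACKs of degree 1. Returns the part of `acked` not used
 * by slow start. */
static uint32_t slow_start_acked(struct ss_state *s, uint32_t acked)
{
	uint32_t room;

	if (s->cwnd >= s->ssthresh)
		return acked; /* not in slow start: nothing consumed */
	room = s->ssthresh - s->cwnd;
	if (acked <= room) {
		s->cwnd += acked;
		return 0;
	}
	s->cwnd = s->ssthresh;
	return acked - room;
}
```

For a stretch ACK covering 4 packets at cwnd 10, the old rule yields 11 and the new one 14; at cwnd 98 with ssthresh 100, an ACK covering 5 packets reaches 100 and leaves 3 for congestion avoidance.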
     

31 Mar, 2011

1 commit


10 Mar, 2011

1 commit


24 Nov, 2009

1 commit

  • On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
    > It should be of the form:
    >     if (x &&
    >         y)
    >
    > or:
    >     if (x && y)
    >
    > Fix patches, rather than complaints, for existing cases where things
    > do not follow this pattern are certainly welcome.

    Also collapsed some runs of multiple tabs into a single space.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

29 Jan, 2008

1 commit


31 Jul, 2007

1 commit

  • This patch changes the API for the callback that is done after an ACK is
    received. It solves a couple of issues:

    * Some congestion controls want a higher-resolution RTT value
    (controlled by the TCP_CONG_RTT_STAMP flag). These don't really want a
    ktime, but all compute an RTT in microseconds.

    * Other congestion controls could use RTT at jiffies resolution.

    To keep the API consistent, the units should be the same in both cases;
    only the resolution should change.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
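A sketch of "same units, different resolution", using hypothetical helper names: both paths below report RTT in microseconds, but the jiffies path can only produce multiples of 1000000/HZ us (4000 us at HZ=250), while the high-resolution path keeps microsecond granularity.

```c
#include <stdint.h>

/* Coarse path: convert a jiffies-resolution RTT to microseconds.
 * Assumes hz divides 1000000 (true for HZ=100, 250, 1000). */
static uint32_t rtt_jiffies_to_us(uint32_t rtt_jiffies, uint32_t hz)
{
	return rtt_jiffies * (1000000u / hz);
}

/* Fine path: elapsed time between two microsecond timestamps. */
static uint32_t rtt_from_usec_stamps(uint64_t sent_us, uint64_t acked_us)
{
	return (uint32_t)(acked_us - sent_us);
}
```

A congestion control module consuming either value can treat it uniformly as microseconds; only its precision differs.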
     

18 Jul, 2007

1 commit


16 Jun, 2007

1 commit

  • Commit 164891aadf1721fca4dce473bb0e0998181537c6 broke RTT
    sampling of congestion control modules. Inaccurate timestamps
    could be fed to them without providing any way for them to
    identify such cases. Previously, the RTT sampler was called only if
    FLAG_RETRANS_DATA_ACKED was not set, which nicely filtered out
    inaccurate timestamps. In addition, the new behavior could give an
    invalid timestamp (zero) to the RTT sampler if only skbs with
    TCPCB_RETRANS were ACKed. This solves both problems.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
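The filtering described above follows the idea of Karn's algorithm: an ACK that covers retransmitted data is ambiguous, so it should not yield an RTT sample. The sketch below uses a hypothetical `ack_info` summary and helper name (not the kernel's flags or structs) and also rejects the zero-timestamp case mentioned in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what one ACK acknowledged. */
struct ack_info {
	bool retrans_data_acked; /* some acked segment was retransmitted */
	uint64_t first_sent_us;  /* send time of the oldest acked segment
				    never retransmitted; 0 if none */
};

/* Return an RTT sample in usec, or -1 when no trustworthy sample
 * exists: retransmitted data was acked (ambiguous timing), or no
 * usable timestamp was recorded (the invalid zero case). */
static int64_t rtt_sample_us(const struct ack_info *ack, uint64_t now_us)
{
	if (ack->retrans_data_acked || ack->first_sent_us == 0)
		return -1;
	return (int64_t)(now_us - ack->first_sent_us);
}
```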
     

26 Apr, 2007

2 commits


29 Sep, 2006

1 commit


23 Sep, 2006

2 commits


18 Sep, 2006

1 commit


18 Jun, 2006

1 commit

  • TCP Low Priority is a distributed algorithm whose goal is to utilize only
    the excess network bandwidth as compared to the "fair share" of
    bandwidth targeted by TCP. Available from:
    http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf

    Original Author:
    Aleksandar Kuzmanovic

    See http://www-ece.rice.edu/networks/TCP-LP/ for their implementation.
    As of 2.6.13, Linux supports pluggable congestion control algorithms.
    Due to the limitations of the API, we make the following changes to
    the original TCP-LP implementation:
    o We use NewReno for most core CA handling, and only add some checks
      within cong_avoid.
    o Errors in the remote HZ are corrected, so the remote HZ is
      continuously checked and updated.
    o One-Way Delay (OWD) is calculated within rtt_sample, since OWD has
      a meaning similar to RTT. The buggy formula is also corrected.
    o The reaction to Early Congestion Indication (ECI) is handled within
      pkts_acked, as described in the pseudo code.
    o OWD is handled in relative format, where the local time stamp is in
      tcp_time_stamp format.

    Ported from 2.4.19 to 2.6.16 as a module by:
    Wong Hoi Sing Edison
    Hung Hing Lun

    Signed-off-by: Wong Hoi Sing Edison
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Wong Hoi Sing Edison
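A hedged sketch of the early congestion indication (ECI) reaction, based on a reading of the TCP-LP paper linked above rather than on the module's source: an indication fires when the smoothed OWD rises above owd_min plus a fraction of (owd_max - owd_min); the first indication halves cwnd and starts an inference period, and a further indication inside that period drops cwnd to 1. The struct, helper names, and the threshold parameter `thr_pct` (15 in the usage below) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TCP-LP style state; delays in arbitrary time units. */
struct lp_sketch {
	uint32_t cwnd;
	uint32_t owd_min;   /* smallest one-way delay observed */
	uint32_t owd_max;   /* largest one-way delay observed (>= owd_min) */
	uint64_t last_drop; /* time of the last reaction, 0 if none */
	uint64_t inference; /* length of the inference period */
};

/* Early congestion indication: smoothed OWD above
 * owd_min + thr_pct% of (owd_max - owd_min). */
static bool lp_early_congestion(const struct lp_sketch *lp, uint32_t sowd,
				uint32_t thr_pct)
{
	uint32_t thr = lp->owd_min +
		       thr_pct * (lp->owd_max - lp->owd_min) / 100;

	return sowd > thr;
}

/* Reaction: within the inference period, cut cwnd to 1;
 * otherwise halve it (never below 1). Records the reaction time. */
static void lp_react(struct lp_sketch *lp, uint64_t now)
{
	if (lp->last_drop && now - lp->last_drop < lp->inference)
		lp->cwnd = 1;
	else
		lp->cwnd = lp->cwnd / 2 > 1 ? lp->cwnd / 2 : 1;
	lp->last_drop = now;
}
```

With owd_min 100, owd_max 200 and a 15% threshold, a smoothed OWD of 120 triggers an indication while 110 does not; a first reaction halves cwnd 20 to 10, and a second one shortly afterwards drops it to 1.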