21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

22 Sep, 2018

1 commit

  • There are few places where TCP reads skb->skb_mstamp expecting
    a value in usec unit.

    skb->tstamp (aka skb->skb_mstamp) will soon store CLOCK_TAI nsec value.

    Add tcp_skb_timestamp_us() to provide proper conversion when needed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Jul, 2018

1 commit

  • Congestion control algorithms, which access the rate sample
    through the tcp_cong_control function, only have access to the maximum
    of the send and receive interval, for cases where the acknowledgment
    rate may be inaccurate due to ACK compression or decimation. Algorithms
    may want to use send rates and receive rates as separate signals.

    Signed-off-by: Deepti Raghavan
    Acked-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Deepti Raghavan
     

08 Dec, 2017

1 commit

  • Mark tcp_sock during a SACK reneging event and invalidate rate samples
    while marked. Such rate samples may overestimate bw by including packets
    that were SACKed before reneging.

    < ack 6001 win 10000 sack 7001:38001
    < ack 7001 win 0 sack 8001:38001 // Reneg detected
    > seq 7001:8001 // RTO, SACK cleared.
    < ack 38001 win 10000

    In above example the rate sample taken after the last ack will count
    7001-38001 as delivered while the actual delivery rate likely could
    be much lower i.e. 7001-8001.

    This patch adds a new field tcp_sock.sack_reneg and marks it when we
    declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
    the last rate sample was taken before moving back to TCP_CA_Open. This
    patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
    is set.

    Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
    Signed-off-by: Yousuk Seung
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Acked-by: Soheil Hassas Yeganeh
    Acked-by: Eric Dumazet
    Acked-by: Priyaranjan Jha
    Signed-off-by: David S. Miller

    Yousuk Seung
     

16 Jun, 2017

1 commit


18 May, 2017

1 commit

  • TCP Timestamps option is defined in RFC 7323

    Traditionally on linux, it has been tied to the internal
    'jiffies' variable, because it had been a cheap and good enough
    generator.

    For TCP flows on the Internet, 1 ms resolution would be much better
    than 4ms or 10ms (HZ=250 or HZ=100 respectively)

    For TCP flows in the DC, Google has used usec resolution for more
    than two years with great success [1]

    Receive size autotuning (DRS) is indeed more precise and converges
    faster to optimal window size.

    This patch converts tp->tcp_mstamp to a plain u64 value storing
    a 1 usec TCP clock.

    This choice will allow us to upstream the 1 usec TS option as
    discussed in IETF 97.

    [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf

    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Apr, 2017

1 commit


21 Sep, 2016

3 commits

  • This commit export two new fields in struct tcp_info:

    tcpi_delivery_rate: The most recent goodput, as measured by
    tcp_rate_gen(). If the socket is limited by the sending
    application (e.g., no data to send), it reports the highest
    measurement instead of the most recent. The unit is bytes per
    second (like other rate fields in tcp_info).

    tcpi_delivery_rate_app_limited: A boolean indicating if the goodput
    was measured when the socket's throughput was limited by the
    sending application.

    This delivery rate information can be useful for applications that
    want to know the current throughput the TCP connection is seeing,
    e.g. adaptive bitrate video streaming. It can also be very useful for
    debugging or troubleshooting.

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • This commit adds code to track whether the delivery rate represented
    by each rate_sample was limited by the application.

    Upon each transmit, we store in the is_app_limited field in the skb a
    boolean bit indicating whether there is a known "bubble in the pipe":
    a point in the rate sample interval where the sender was
    application-limited, and did not transmit even though the cwnd and
    pacing rate allowed it.

    This logic marks the flow app-limited on a write if *all* of the
    following are true:

    1) There is less than 1 MSS of unsent data in the write queue
    available to transmit.

    2) There is no packet in the sender's queues (e.g. in fq or the NIC
    tx queue).

    3) The connection is not limited by cwnd.

    4) There are no lost packets to retransmit.

    The tcp_rate_check_app_limited() code in tcp_rate.c determines whether
    the connection is application-limited at the moment. If the flow is
    application-limited, it sets the tp->app_limited field. If the flow is
    application-limited then that means there is effectively a "bubble" of
    silence in the pipe now, and this silence will be reflected in a lower
    bandwidth sample for any rate samples from now until we get an ACK
    indicating this bubble has exited the pipe: specifically, until we get
    an ACK for the next packet we transmit.

    When we send every skb we record in scb->tx.is_app_limited whether the
    resulting rate sample will be application-limited.

    The code in tcp_rate_gen() checks to see when it is safe to mark all
    known application-limited bubbles of silence as having exited the
    pipe. It does this by checking to see when the delivered count moves
    past the tp->app_limited marker. At this point it zeroes the
    tp->app_limited marker, as all known bubbles are out of the pipe.

    We make room for the tx.is_app_limited bit in the skb by borrowing a
    bit from the in_flight field used by NV to record the number of bytes
    in flight. The receive window in the TCP header is 16 bits, and the
    max receive window scaling shift factor is 14 (RFC 1323). So the max
    receive window offered by the TCP protocol is 2^(16+14) = 2^30. So we
    only need 30 bits for the tx.in_flight used by NV.

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Soheil Hassas Yeganeh
     
  • This patch generates data delivery rate (throughput) samples on a
    per-ACK basis. These rate samples can be used by congestion control
    modules, and specifically will be used by TCP BBR in later patches in
    this series.

    Key state:

    tp->delivered: Tracks the total number of data packets (original or not)
    delivered so far. This is an already-existing field.

    tp->delivered_mstamp: the last time tp->delivered was updated.

    Algorithm:

    A rate sample is calculated as (d1 - d0)/(t1 - t0) on a per-ACK basis:

    d1: the current tp->delivered after processing the ACK
    t1: the current time after processing the ACK

    d0: the prior tp->delivered when the acked skb was transmitted
    t0: the prior tp->delivered_mstamp when the acked skb was transmitted

    When an skb is transmitted, we snapshot d0 and t0 in its control
    block in tcp_rate_skb_sent().

    When an ACK arrives, it may SACK and ACK some skbs. For each SACKed
    or ACKed skb, tcp_rate_skb_delivered() updates the rate_sample struct
    to reflect the latest (d0, t0).

    Finally, tcp_rate_gen() generates a rate sample by storing
    (d1 - d0) in rs->delivered and (t1 - t0) in rs->interval_us.

    One caveat: if an skb was sent with no packets in flight, then
    tp->delivered_mstamp may be either invalid (if the connection is
    starting) or outdated (if the connection was idle). In that case,
    we'll re-stamp tp->delivered_mstamp.

    At first glance it seems t0 should always be the time when an skb was
    transmitted, but actually this could over-estimate the rate due to
    phase mismatch between transmit and ACK events. To track the delivery
    rate, we ensure that if packets are in flight then t0 and and t1 are
    times at which packets were marked delivered.

    If the initial and final RTTs are different then one may be corrupted
    by some sort of noise. The noise we see most often is sending gaps
    caused by delayed, compressed, or stretched acks. This either affects
    both RTTs equally or artificially reduces the final RTT. We approach
    this by recording the info we need to compute the initial RTT
    (duration of the "send phase" of the window) when we recorded the
    associated inflight. Then, for a filter to avoid bandwidth
    overestimates, we generalize the per-sample bandwidth computation
    from:

    bw = delivered / ack_phase_rtt

    to the following:

    bw = delivered / max(send_phase_rtt, ack_phase_rtt)

    In large-scale experiments, this filtering approach incorporating
    send_phase_rtt is effective at avoiding bandwidth overestimates due to
    ACK compression or stretched ACKs.

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Yuchung Cheng