18 Jul, 2007

1 commit


31 May, 2007

1 commit


03 May, 2007

1 commit

  • TCP has a transitional state, when SACK is not in use, during
    which this invariant is temporarily broken. Without SACK,
    tcp_clean_rtx_queue does not decrement sacked_out. Therefore,
    calls to tcp_sync_left_out made before sacked_out is corrected
    again by tcp_fastretrans_alert can trigger this trap, because
    sacked_out still counts a couple of segments that are already
    out of window.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

30 Apr, 2007

2 commits

  • This is a corner case where a less-than-MSS-sized piece of new data
    is waiting in the send queue. For F-RTO to work correctly, a
    new data segment must be sent at a certain point, or F-RTO cannot
    be used at all. RFC4138 allows overriding Nagle at that
    point.

    Implementation uses frto_counter states 2 and 3 to distinguish
    when Nagle override is needed.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • SACKED_ACKED and LOST are mutually exclusive with SACK, so
    their sum exceeding packets_out is a bug when SACK is in use.
    Eventually these bugs trigger traps in tcp_clean_rtx_queue
    with SACK, but it's much more informative to catch them here.

    Non-SACK TCP, however, can receive more than packets_out duplicate
    ACKs, each of which increments sacked_out, so it makes sense to do
    this kind of limiting for non-SACK TCP but not for SACK-enabled
    TCP. Perhaps the author had the opposite in mind but accidentally
    got the logic the wrong way around? Anyway, the non-SACK
    sacked_out incrementer code already deals with this issue before
    calling sync_left_out, so this trapping can be done
    unconditionally.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
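The invariant discussed in the two commits above (sacked_out + lost_out must not exceed packets_out) can be illustrated with a small stand-alone sketch. The struct `seg_counters` and the functions `counters_consistent`, `sync_left_out` and `count_dupack_nosack` are hypothetical simplifications, not the kernel's tcp_sock fields or code; in particular, the clamp below is a plain one, and the real non-SACK handling in tcp_input.c may differ.

```c
#include <assert.h>

/* Hypothetical, simplified subset of the TCP sender's counters. */
struct seg_counters {
    unsigned int packets_out; /* segments sent but not yet cumulatively ACKed */
    unsigned int sacked_out;  /* SACKed segments, or dupACK count without SACK */
    unsigned int lost_out;    /* segments marked lost */
};

/* With SACK a segment is either SACKED_ACKED or LOST, never both,
 * so the sum can never legitimately exceed packets_out. */
static int counters_consistent(const struct seg_counters *c)
{
    return c->sacked_out + c->lost_out <= c->packets_out;
}

/* Hypothetical stand-in for the sync step: trap unconditionally. */
static void sync_left_out(const struct seg_counters *c)
{
    assert(counters_consistent(c));
}

/* Without SACK every duplicate ACK bumps sacked_out, and a receiver may
 * send more dupACKs than there are segments in flight, so the
 * incrementer clamps sacked_out before the sync step runs. */
static void count_dupack_nosack(struct seg_counters *c)
{
    c->sacked_out++;
    if (c->sacked_out + c->lost_out > c->packets_out)
        c->sacked_out = c->packets_out > c->lost_out ?
                        c->packets_out - c->lost_out : 0;
    sync_left_out(c);
}
```

With packets_out = 3 and lost_out = 1, five duplicate ACKs leave sacked_out at 2 rather than 5, so the unconditional trap in `sync_left_out` never fires for the non-SACK case.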
     

26 Apr, 2007

11 commits

  • Make some simple changes to make the congestion control API faster/cleaner:
    * use ktime_t rather than timeval
    * merge RTT sampling into the existing ack callback;
      this means one indirect call instead of two per ACK
    * use flag bits to store options/settings

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This is a (mostly) automated change made using this magic:

    sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N' \
    -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N' \
    -e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp\([^{]*\n{\n\)|struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g' \
    -e 's|struct sock \*sk, struct tcp_sock \*tp|struct sock \*sk|g' -e 's|sk, tp\([^-]\)|sk\1|g'

    Fixed four unused variable (tp) warnings that were introduced.

    In addition, I manually added newlines after local variables and
    tweaked the positioning of function arguments.

    $ gcc --version
    gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
    ...
    $ codiff -fV built-in.o.old built-in.o.new
    net/ipv4/route.c:
    rt_cache_flush | +14
    1 function changed, 14 bytes added

    net/ipv4/tcp.c:
    tcp_setsockopt | -5
    tcp_sendpage | -25
    tcp_sendmsg | -16
    3 functions changed, 46 bytes removed

    net/ipv4/tcp_input.c:
    tcp_try_undo_recovery | +3
    tcp_try_undo_dsack | +2
    tcp_mark_head_lost | -12
    tcp_ack | -15
    tcp_event_data_recv | -32
    tcp_rcv_state_process | -10
    tcp_rcv_established | +1
    7 functions changed, 6 bytes added, 69 bytes removed, diff: -63

    net/ipv4/tcp_output.c:
    update_send_head | -9
    tcp_transmit_skb | +19
    tcp_cwnd_validate | +1
    tcp_write_wakeup | -17
    __tcp_push_pending_frames | -25
    tcp_push_one | -8
    tcp_send_fin | -4
    7 functions changed, 20 bytes added, 63 bytes removed, diff: -43

    built-in.o.new:
    18 functions changed, 40 bytes added, 178 bytes removed, diff: -138

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • The function is quite big, has several call sites, and offers
    nothing for the compiler to collapse when inlining.

    Besides, it's nicer to read in a .c file.

    Signed-off-by: Andi Kleen
    Signed-off-by: David S. Miller

    Andi Kleen
     
  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This allows the write queue implementation to be changed,
    for example, to one which allows fast interval searching.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Where appropriate, convert references to xtime.tv_sec to the
    get_seconds() helper function.

    Signed-off-by: James Morris
    Signed-off-by: David S. Miller

    James Morris
     
  • A new sysctl, tcp_frto_response, is added to select among these
    responses:
    - Rate halving based; reuses CA_CWR state (default)
    - Very conservative; used to be the only one available (=1)
    - Undo cwr; undoes ssthresh and cwnd reductions (=2)

    The rate-halving response requires a new parameter to
    tcp_enter_cwr, because FRTO has already reduced ssthresh and
    a second reduction there must be prevented. In addition,
    to keep lines within 80 columns, a local variable was
    added.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Signed-off-by: John Heffner
    Signed-off-by: David S. Miller

    John Heffner
     
  • This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
    than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
    NOT be entered when earlier fast recovery is underway."

    I think the RFC means to say (especially in light of
    Appendix B) that ...recovery is underway (not just fast recovery)
    or was underway when it was interrupted by an earlier (F-)RTO
    that hasn't yet been resolved (snd_una has not advanced enough).
    Thus, my interpretation is that whenever TCP has ever
    retransmitted anything other than the head, the basic version
    cannot be used, because the ordering assumptions on which FRTO
    is based then no longer hold.

    NewReno has only the head segment retransmitted at a time.
    Therefore, walk up to the first segment that has not been SACKed;
    if neither that segment nor anything before it has been
    retransmitted, we know for sure that nothing after the non-SACKed
    segment has been either. This assumption is valid because
    TCPCB_EVER_RETRANS does not leave holes: each non-SACKed segment
    is rexmitted in order.

    The check for retrans_out > 1 avoids a more expensive walk through
    the skb list, as we can know the result beforehand: F-RTO will not
    be allowed.

    A SACKed skb can turn into a non-SACKed one only in the extremely
    rare case of SACK reneging; in that case we might fail to detect
    retransmissions of segments other than the head. To get rid of
    that feature, the whole rexmit queue would have to be walked
    (always), or FRTO would have to be prevented when SACK reneging
    happens. Of course, an RTO should still trigger after reneging,
    which makes this issue even less likely to show up. And as long
    as the response is as conservative as it is now, nothing bad
    happens even then.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • In addition, removed inline.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
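The F-RTO eligibility check described in the RFC4138 commit above can be sketched over an array of per-segment flags instead of the kernel's skb list. The flag names `SEG_SACKED_ACKED` and `SEG_EVER_RETRANS` and the function `frto_allowed` are hypothetical; the sketch assumes, as the commit text does, that a retransmitted head segment is acceptable, so the walk starts at the second segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment flags, loosely modelled on TCPCB_* bits. */
#define SEG_SACKED_ACKED 0x01 /* segment covered by a SACK block */
#define SEG_EVER_RETRANS 0x02 /* segment has been retransmitted at least once */

/* Decide whether basic F-RTO may be used. flags[0] is the head of the
 * retransmission queue; n is the number of unacknowledged segments. */
static bool frto_allowed(const unsigned char *flags, size_t n,
                         unsigned int retrans_out)
{
    /* Cheap early exit: more than one retransmission outstanding means
     * something other than the head has been retransmitted. */
    if (retrans_out > 1)
        return false;

    /* The head may have been retransmitted, so start after it. */
    for (size_t i = 1; i < n; i++) {
        if (flags[i] & SEG_EVER_RETRANS)
            return false;
        /* Non-SACKed segments are retransmitted in order, so once the
         * first non-SACKed segment has been checked, nothing after it
         * can have been retransmitted either. */
        if (!(flags[i] & SEG_SACKED_ACKED))
            break;
    }
    return true;
}
```

For a queue whose head was retransmitted once, followed by one SACKed segment and then two un-SACKed ones, the walk stops at the third segment and F-RTO is allowed; if the SACKed segment in between had also been retransmitted, it would not be.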
     

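The `tp` argument removal described in the sed-based commit above follows a single pattern: a function that took both `struct sock *sk` and `struct tcp_sock *tp` now takes only `sk` and recomputes `tp` locally with `tcp_sk(sk)`. The sketch below uses minimal stand-in structs and a stand-in `tcp_sk()`, assuming a layout where the generic socket sits at the start of the TCP socket so the conversion is a plain cast; `bump_cwnd` is a hypothetical example function.

```c
#include <assert.h>

/* Hypothetical stand-ins: the TCP socket embeds the generic socket as
 * its first member, so a struct sock pointer can be cast back. */
struct sock {
    int state;
};

struct tcp_sock {
    struct sock sk;         /* must stay first for tcp_sk() to work */
    unsigned int snd_cwnd;
};

static struct tcp_sock *tcp_sk(struct sock *sk)
{
    return (struct tcp_sock *)sk;
}

/* Before the change, callers passed two pointers that always agreed:
 *     static void bump_cwnd(struct sock *sk, struct tcp_sock *tp);
 * After it, tp is derived inside the function from sk alone. */
static void bump_cwnd(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);

    tp->snd_cwnd++;
}
```

Each call site then passes one argument fewer, for example `bump_cwnd(&ts.sk)` instead of `bump_cwnd(&ts.sk, &ts)`.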
09 Feb, 2007

1 commit


05 Jan, 2007

1 commit

  • This reverts the new (unambiguous) definition of the TCP `before'
    relation. As pointed out in an example by Herbert Xu, there is
    existing code which implicitly requires the old definition in order
    to work correctly.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

23 Dec, 2006

1 commit

  • While looking at DCCP sequence numbers, I stumbled over a problem with
    the following definition of before in tcp.h:

    static inline int before(__u32 seq1, __u32 seq2)
    {
            return (__s32)(seq1-seq2) < 0;
    }

    Problem: This definition suffers from an ambiguity, i.e. always

    before(a, (a + 2^31) % 2^32) = 1
    before((a + 2^31) % 2^32, a) = 1

    In text: when the difference between a and b amounts to 2^31,
    the first argument is always considered `before' the second, so
    the function cannot decide. The reason is that, implicitly, 0 is
    `before' 1 ... 2^31-1 ... 2^31.

    Solution: There is a simple fix: define before in such a way that
    0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1.
    By not using the midpoint between 0 and 2^32, before can be made
    unambiguous.
    This is achieved by testing whether seq2-seq1 > 0 (using signed
    32-bit arithmetic).

    I attach a patch to codify this. Also, since the `after' relation
    is basically a redefinition of `before', it is now defined as a
    macro in terms of before.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
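The ambiguity described in the `before' commit above, and the behaviour the 05 Jan 2007 revert restored, can be checked directly. This sketch transcribes the quoted definition as `before_old` and the proposed seq2-seq1 > 0 test as `before_new`; both names are hypothetical, and <stdint.h> types stand in for the kernel's __u32/__s32. It assumes the uint32_t to int32_t conversion wraps modulo 2^32, which ISO C leaves implementation-defined but GCC and Clang define that way.

```c
#include <assert.h>
#include <stdint.h>

/* The definition quoted in the commit (the one later restored). */
static inline int before_old(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* The proposed definition: true only if seq2 is strictly ahead of seq1
 * by less than 2^31, so a distance of exactly 2^31 is "before" neither way. */
static inline int before_new(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) > 0;
}
```

For a = 12345 and b = a + 2^31, `before_old` reports each value as before the other, while `before_new` reports neither; for ordinary nearby values such as a and a + 1 the two agree. As the revert notes, some existing code implicitly relied on the old behaviour.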
     

03 Dec, 2006

7 commits


03 Aug, 2006

1 commit

  • According to RFC2012, tcpAttemptFails is defined as follows:

    tcpAttemptFails OBJECT-TYPE
        SYNTAX      Counter32
        MAX-ACCESS  read-only
        STATUS      current
        DESCRIPTION
                "The number of times TCP connections have made a direct
                transition to the CLOSED state from either the SYN-SENT
                state or the SYN-RCVD state, plus the number of times TCP
                connections have made a direct transition to the LISTEN
                state from the SYN-RCVD state."
        ::= { tcp 7 }

    Looking into RFC793, I found that these state changes occur
    under the following conditions:
    1. SYN-SENT -> CLOSED
    a) Received an ACK,RST segment in the SYN-SENT state.

    2. SYN-RCVD -> CLOSED
    b) Received a SYN segment in the SYN-RCVD state (entered from LISTEN).
    c) Received an RST segment in the SYN-RCVD state (entered from SYN-SENT).
    d) Received a SYN segment in the SYN-RCVD state (entered from SYN-SENT).

    3. SYN-RCVD -> LISTEN
    e) Received an RST segment in the SYN-RCVD state (entered from LISTEN).

    In my tests, these direct state transitions were not counted in
    tcpAttemptFails.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
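The RFC2012 definition quoted above reduces to three direct state transitions. A minimal sketch of that classification follows; the enum `tcp_state` and the function `counts_as_attempt_fail` are hypothetical and are not the kernel's TCP state constants or its MIB-update code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state enum; names follow RFC793, not a kernel header. */
enum tcp_state {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED
};

/* RFC2012 tcpAttemptFails counts direct transitions
 * SYN-SENT -> CLOSED, SYN-RCVD -> CLOSED and SYN-RCVD -> LISTEN. */
static bool counts_as_attempt_fail(enum tcp_state from, enum tcp_state to)
{
    if (to == ST_CLOSED)
        return from == ST_SYN_SENT || from == ST_SYN_RCVD;
    if (to == ST_LISTEN)
        return from == ST_SYN_RCVD;
    return false;
}
```

Every case a) to e) listed above maps to one of these three transitions, so each should increment the counter exactly once.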
     

09 Jul, 2006

1 commit

  • Certain subsystems in the stack (e.g., netfilter) can break the partial
    checksum on GSO packets. Until they're fixed, this patch allows this to
    work by recomputing the partial checksums through the GSO mechanism.

    Once they've all been converted to update the partial checksum instead of
    clearing it, this workaround can be removed.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

01 Jul, 2006

1 commit

  • This patch generalises the TSO-specific bits from sk_setup_caps by adding
    the sk_gso_type member to struct sock. This makes sk_setup_caps generic
    so that it can be used by TCPv6 or UFO.

    The only catch is that whoever uses this must provide a GSO implementation
    for their protocol which I think is a fair deal :) For now UFO continues to
    live without a GSO implementation which is OK since it doesn't use the sock
    caps field at the moment.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

30 Jun, 2006

1 commit

  • When GSO packets come from an untrusted source (e.g., a Xen guest domain),
    we need to verify the header integrity before passing it to the hardware.

    Since the first step in GSO is to verify the header, we can reuse that
    code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
    bit set can only be fed directly to devices with the corresponding bit
    NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
    is fed to the GSO engine which will allow the packet to be sent to the
    hardware if it passes the header check.

    This patch changes the sg flag to a full features flag. The same method
    can be used to implement TSO ECN support. We simply have to mark packets
    with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
    NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
    the packet, or segment the first MTU and pass the rest to the hardware for
    further segmentation.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

23 Jun, 2006

2 commits

  • This patch adds the GSO implementation for IPv4 TCP.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
    going to scale if we add any more segmentation methods (e.g., DCCP). So
    let's merge them.

    They were used to tell the protocol of a packet. This function has been
    subsumed by the new gso_type field. This is essentially a set of netdev
    feature bits (shifted by 16 bits) that are required to process a specific
    skb. As such, it's easy to tell whether a given device can process a GSO
    skb: you just have to AND the gso_type field with the netdev's features
    field.

    I've made gso_type a conjunction. The idea is that you have a base type
    (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
    For example, if we add a hardware TSO type that supports ECN, such devices
    would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
    have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
    packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
    to be emulated in software.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
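The device check described in the gso_type commit above (shift the gso_type bits up by 16, then AND them against the device's features) can be sketched as follows. The constants and `device_can_gso` are hypothetical stand-ins for the SKB_GSO_* and NETIF_F_* bits, whose real values are not given here; only the 16-bit shift and the base-type-plus-modifier idea come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only. */
#define GSO_SHIFT      16
#define GSO_TCPV4      (1u << 0)  /* base type: TCP over IPv4 */
#define GSO_TCPV4_ECN  (1u << 1)  /* modifier: packet has CWR set */

/* Device feature bits are the gso_type bits shifted up by GSO_SHIFT. */
#define F_TSO          (GSO_TCPV4 << GSO_SHIFT)
#define F_TSO_ECN      (GSO_TCPV4_ECN << GSO_SHIFT)

/* The device can segment the skb itself only if it has every feature
 * bit that the skb's gso_type requires. */
static bool device_can_gso(unsigned int gso_type, unsigned int dev_features)
{
    unsigned int needed = gso_type << GSO_SHIFT;

    return (dev_features & needed) == needed;
}
```

A CWR-marked packet (GSO_TCPV4 | GSO_TCPV4_ECN) is rejected by a device that advertises only F_TSO, which is the case the commit says must be emulated in software, while plain GSO_TCPV4 packets still go straight to the hardware.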
     

21 Jun, 2006

1 commit

  • * git://git.infradead.org/hdrcleanup-2.6: (63 commits)
    [S390] __FD_foo definitions.
    Switch to __s32 types in joystick.h instead of C99 types for consistency.
    Add to headers included for userspace in
    Move inclusion of out of user scope in asm-x86_64/mtrr.h
    Remove struct fddi_statistics from user view in
    Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
    Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
    Include and use __uXX types in
    Use __uXX types in , include too
    Remove private struct dx_hash_info from public view in
    Include and use __uXX types in
    Use __uXX types in for struct divert_blk et al.
    Use __u32 for elf_addr_t in , not u32. It's user-visible.
    Remove PPP_FCS from user view in , remove __P mess entirely
    Use __uXX types in user-visible structures in
    Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
    Use __uXX types for S390 DASD volume label definitions which are user-visible
    S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
    Remove unneeded inclusion of from
    Fix private integer types used in V4L2 ioctls.
    ...

    Manually resolve conflict in include/linux/mtd/physmap.h

    Linus Torvalds
     

18 Jun, 2006

5 commits


26 Apr, 2006

1 commit


31 Mar, 2006

1 commit