08 Oct, 2008

1 commit


01 Oct, 2008

1 commit

  • Current TCP code relies on the local port of the listening socket
    being the same as the destination port of the incoming
    connection. Port redirection, used by many transparent proxying
    techniques, obviously breaks this, so we have to store the original
    destination port.

    This patch extends struct inet_request_sock and stores the incoming
    destination port value there. It also modifies the handshake code to
    use that value as the source port when sending reply packets.

    Signed-off-by: KOVACS Krisztian
    Signed-off-by: David S. Miller
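
A minimal sketch of the idea (plain Python, not kernel code; every name below except struct inet_request_sock is hypothetical): keep the SYN's original destination port on the request record, and use it, rather than the listener's bound port, as the source port of the SYN-ACK.

```python
# Illustrative scenario: a transparent proxy redirects port 80 to a local
# listener bound on 3128. The reply must use the original port (80), not
# the listener's port. All names here are made up, not kernel APIs.

def make_request_sock(syn):
    """Create a request record, keeping the SYN's destination port
    (analogous to the field the patch adds to struct inet_request_sock)."""
    return {
        "remote_addr": syn["saddr"],
        "remote_port": syn["sport"],
        "loc_port": syn["dport"],  # original destination port, pre-redirect
    }

def build_synack(req):
    """Build the reply; its source port is the stored original port."""
    return {
        "saddr_port": req["loc_port"],   # NOT the listener's bound port
        "daddr": req["remote_addr"],
        "dport": req["remote_port"],
    }

# A SYN originally sent to port 80, redirected to a proxy listening on 3128:
syn = {"saddr": "10.0.0.1", "sport": 33333, "dport": 80}
synack = build_synack(make_request_sock(syn))
print(synack["saddr_port"])  # 80, the source port the client expects
```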


23 Sep, 2008

2 commits


22 Sep, 2008

1 commit


21 Sep, 2008

4 commits

  • Most importantly, avoid clearing with a cumulative ACK. Not clearing
    means that we no longer need O(n^2) processing in the resolution of
    each fast recovery.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

  • Both loops are quite similar, so they can be combined
    with little effort. As a result, forward_skb_hint becomes
    obsolete as well.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

  • The main benefit is that we can then freely point
    retransmit_skb_hint anywhere we want, because there is no
    longer a need to track the counter changes involved. Since
    the hint is really used only as a terminator, any unnecessary
    work is at most a one-time walk, and if some retransmissions
    are needed beyond that point later on, the walk is not a
    complete waste of time anyway.

    Since retransmit_high must be kept valid, all lost
    markers must ensure that.

    Now I also have learned how those "holes" in the
    rexmittable skbs can appear, mtu probe does them. So
    I removed the misleading comment as well.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

  • I.e., the difference between partial and full clearing no longer
    exists, since the SACK optimizations were dropped by the sacktag
    rewrite.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller


09 Sep, 2008

1 commit


04 Sep, 2008

1 commit


19 Jul, 2008

2 commits

  • This should fix the following bugs:
    * Connections with MD5 signatures produce invalid packets whenever SACK
    options are included
    * MD5 signatures are counted twice in the MSS calculations

    Behaviour changes:
    * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK

    This is because we can't fit any SACK blocks in a packet with MD5 + TS
    options. There was discussion about disabling SACK rather than TS in
    order to fit in better with old, buggy kernels, but that was deemed to
    be unnecessary.

    * SYNs with MD5 don't include a TS option

    See above.

    Additionally, it removes a bunch of duplicated logic for calculating
    options, which should help avoid this sort of issue in the future.

    Signed-off-by: Adam Langley
    Signed-off-by: David S. Miller
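
The option-space arithmetic behind the behaviour changes can be checked directly. A sketch: the 40-byte ceiling comes from TCP's 4-bit data offset, and the aligned sizes below follow the kernel's TCPOLEN_* constants (option formats from RFCs 1323, 2018 and 2385).

```python
# Why MD5 + timestamps leaves no room for SACK blocks.
MAX_TCP_OPTION_SPACE = 40
TCPOLEN_MD5SIG_ALIGNED = 20    # kind 19, length 18, padded to 20 bytes
TCPOLEN_TSTAMP_ALIGNED = 12    # kind 8, length 10, sent with two NOPs
TCPOLEN_SACK_BASE_ALIGNED = 4  # SACK option header, padded
TCPOLEN_SACK_PERBLOCK = 8      # each block: two 32-bit sequence numbers

remaining = (MAX_TCP_OPTION_SPACE
             - TCPOLEN_MD5SIG_ALIGNED
             - TCPOLEN_TSTAMP_ALIGNED)
max_sack_blocks = (remaining - TCPOLEN_SACK_BASE_ALIGNED) // TCPOLEN_SACK_PERBLOCK
print(max_sack_blocks)  # 0: no SACK blocks fit alongside MD5 + TS

# Dropping the timestamp option instead makes room for SACK blocks,
# which is why a SYN with MD5 + SACK + TS elicits MD5 + SACK:
max_blocks_without_ts = ((MAX_TCP_OPTION_SPACE - TCPOLEN_MD5SIG_ALIGNED
                          - TCPOLEN_SACK_BASE_ALIGNED)
                         // TCPOLEN_SACK_PERBLOCK)
print(max_blocks_without_ts)  # 2
```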

  • Currently, the MD5 code assumes that the SKBs are linear and, in the case
    that they aren't, happily goes off and hashes off the end of the SKB and
    into random memory.

    Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy
    Polyakov. Also includes a couple of missed route_caps from Stephen's patch
    in [2].

    [1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2
    [2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2

    Signed-off-by: Adam Langley
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller
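
The bug class is easy to illustrate outside the kernel. A hedged sketch with hashlib, where a list of byte strings stands in for an skb's head plus paged frags:

```python
import hashlib

# Fragmented data must be fed to the hash piece by piece; treating the
# first fragment as if it were the whole (linear) buffer would read past
# its end into unrelated memory in the kernel case.
fragments = [b"TCP ", b"segment ", b"payload"]

digest = hashlib.md5()
for frag in fragments:          # walk every fragment, as the fix does
    digest.update(frag)

# Feeding fragments in order is equivalent to hashing the linearized data.
assert digest.digest() == hashlib.md5(b"".join(fragments)).digest()
```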


18 Jul, 2008

1 commit


17 Jul, 2008

8 commits


15 Jun, 2008

1 commit


14 Jun, 2008

1 commit


13 Jun, 2008

1 commit

  • This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
    ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
    the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
    ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

    This change causes several problems, first reported by Ingo Molnar
    as a distcc-over-loopback regression where connections were getting
    stuck.

    Ilpo Järvinen first spotted the locking problems. The new function
    added by this code, tcp_defer_accept_check(), only has the
    child socket locked, yet it is modifying state of the parent
    listening socket.

    Fixing that is non-trivial at best, because we can't simply just grab
    the parent listening socket lock at this point, because it would
    create an ABBA deadlock. The normal ordering is parent listening
    socket --> child socket, but this code path would require the
    reverse lock ordering.

    Next is a problem noticed by Vitaliy Gusev, who noted:

    ----------------------------------------
    >--- a/net/ipv4/tcp_timer.c
    >+++ b/net/ipv4/tcp_timer.c
    >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
    >                goto death;
    >        }
    >
    >+       if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
    >+               tcp_send_active_reset(sk, GFP_ATOMIC);
    >+               goto death;

    Here socket sk is not attached to listening socket's request queue. tcp_done()
    will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
    release this sk) as socket is not DEAD. Therefore socket sk will be lost for
    freeing.
    ----------------------------------------

    Finally, Alexey Kuznetsov argues that there might not even be any
    real value or advantage to these new semantics even if we fix all
    of the bugs:

    ----------------------------------------
    Hiding sockets with only out-of-order data from accept() is the
    only thing which is impossible with the old approach. Is this really
    so valuable? My opinion: no, this is nothing but a new loophole
    to consume memory without control.
    ----------------------------------------

    So revert this thing for now.

    Signed-off-by: David S. Miller
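
The ABBA ordering problem described above can be sketched with two ordinary locks (illustrative Python, not the kernel's socket locking primitives):

```python
import threading

listener_lock = threading.Lock()  # stands in for the parent listening socket
child_lock = threading.Lock()     # stands in for the child socket

def accept_path():
    # normal ordering: parent listening socket --> child socket
    with listener_lock:
        with child_lock:
            pass

def defer_accept_check_path():
    # the reverted code holds only the child lock; taking the parent
    # lock here reverses the ordering, so racing with accept_path() on
    # another CPU can leave each side waiting forever for the other
    with child_lock:
        with listener_lock:
            pass

# Run serially, both paths complete; run concurrently, each thread can
# grab its first lock and then block on the other's -- the ABBA deadlock.
accept_path()
defer_accept_check_path()
```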


12 Jun, 2008

4 commits


11 Jun, 2008

1 commit

  • The tcp_unhash() method in include/net/tcp.h is no longer needed,
    as the unhash method in the tcp_prot structure is now inet_unhash
    (instead of tcp_unhash, as in the past); see the tcp_prot structure
    in net/ipv4/tcp_ipv4.c.

    So, this patch removes the tcp_unhash() declaration from
    include/net/tcp.h.

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller


16 Apr, 2008

1 commit


14 Apr, 2008

6 commits


10 Apr, 2008

1 commit

  • Allow the use of SACK and window scaling when syncookies are used
    and the client supports tcp timestamps. Options are encoded into
    the timestamp sent in the syn-ack and restored from the timestamp
    echo when the ack is received.

    Based on earlier work by Glenn Griffin.
    This patch avoids increasing the size of structs by encoding TCP
    options into the least significant bits of the timestamp and
    by not using any 'timestamp offset'.

    The downside is that the timestamp sent in the packet after the synack
    will increase by several seconds.

    Changes since v1:
    * Don't duplicate the timestamp echo decoding function; put it into
      ipv4/syncookie.c and have ipv6/syncookies.c use it.
    * Feedback from Glenn Griffin: fix a line indented with spaces, kill
      a redundant if ().

    Reviewed-by: Hagen Paul Pfeifer
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
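
The encoding idea can be sketched as bit packing (illustrative Python; the field widths and helper names are assumptions, not the patch's exact layout):

```python
TSBITS = 6                 # low bits of the timestamp reserved for options
TSMASK = (1 << TSBITS) - 1

def cookie_init_timestamp(now, wscale, sack_ok):
    """Overwrite the timestamp's low bits with the negotiated options."""
    options = (wscale & 0xF) | (int(sack_ok) << 4)
    ts = (now & ~TSMASK) | options
    if ts < now:           # never let the sent timestamp run backwards,
        ts += TSMASK + 1   # so it may jump ahead by several seconds
    return ts

def cookie_decode_options(ts_echo):
    """Recover the options from the timestamp echoed in the final ACK."""
    options = ts_echo & TSMASK
    return options & 0xF, bool(options >> 4)   # (wscale, sack_ok)

ts = cookie_init_timestamp(now=1_000_000, wscale=7, sack_ok=True)
assert ts >= 1_000_000
assert cookie_decode_options(ts) == (7, True)
```

Because the low bits carry options rather than clock ticks, the timestamp sent after the SYN-ACK can jump by several seconds, which is the downside the message mentions.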


08 Apr, 2008

1 commit

  • This fixes Bugzilla #10384.

    tcp_simple_retransmit increments L without any check whatsoever
    for overflowing S+L when Reno is in use.

    The simplest scenario I can currently think of is rather
    complex in practice (there might be some more straightforward
    cases though): if the MSS is reduced during MTU probing, it
    may end up marking everything lost, and if some duplicate ACKs
    arrived prior to that, sacked_out will be non-zero as well,
    leading to S+L > packets_out. tcp_clean_rtx_queue on the next
    cumulative ACK or tcp_fastretrans_alert on the next duplicate
    ACK will fix the S counter.

    A more straightforward (but questionable) solution would be to
    just call tcp_reset_reno_sack() in tcp_simple_retransmit, but
    that would negatively impact the probe's retransmission; i.e.,
    the retransmissions would not occur if some duplicate ACKs
    had arrived.

    So I had to add Reno sacked_out resetting to the CA_Loss state
    when the first cumulative ACK arrives (this stale sacked_out
    might actually be the explanation for the reports of left_out
    overflows in kernels prior to 2.6.23 and the S+L overflow reports
    on 2.6.24). However, this alone won't be enough to fix kernels
    before 2.6.24, because it builds on top of commit
    1b6d427bb7e ([TCP]: Reduce sacked_out with reno when purging
    write_queue) to keep the sacked_out from overflowing.

    Signed-off-by: Ilpo Järvinen
    Reported-by: Alessandro Suardi
    Signed-off-by: David S. Miller
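
The invariant at stake can be sketched numerically (illustrative Python; S is sacked_out, L is lost_out, and the numbers are made up):

```python
def left_out_ok(sacked_out, lost_out, packets_out):
    """The invariant the patch restores: S + L must never exceed the
    number of packets outstanding."""
    return sacked_out + lost_out <= packets_out

packets_out = 10
sacked_out = 3            # stale Reno count from earlier duplicate ACKs

# tcp_simple_retransmit marking everything lost without touching S:
lost_out = packets_out
assert not left_out_ok(sacked_out, lost_out, packets_out)  # S+L overflow

# The fix: reset the Reno sacked_out (in the real patch, when the first
# cumulative ACK arrives in the CA_Loss state).
sacked_out = 0
assert left_out_ok(sacked_out, lost_out, packets_out)
```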


24 Mar, 2008

1 commit

  • The first u32 copied from syncookie_secret is overwritten by the
    minute counter four lines below. After adjusting the destination
    address, the size of syncookie_secret can be reduced accordingly.

    AFAICS, the only other user of syncookie_secret[] is the ipv6
    syncookie support. Because ipv6 syncookies only grab 44 bytes from
    syncookie_secret[], this shouldn't affect them in any way.

    With fixes from Glenn Griffin.

    Signed-off-by: Florian Westphal
    Acked-by: Glenn Griffin
    Signed-off-by: David S. Miller
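
The layout change can be sketched in a few lines (illustrative Python; the real code fills an array of u32s used as the hash input, and the sizes here are made up):

```python
SECRET_WORDS = 17          # illustrative pre-patch size, in 32-bit words

def hash_input_before(secret, minute_count):
    tmp = list(secret)     # copies secret[0] ...
    tmp[0] = minute_count  # ... which is immediately overwritten here
    return tmp

def hash_input_after(secret, minute_count):
    # Put the counter first and copy the (now one-word-smaller) secret
    # after it; the hash input is identical for the same secret tail.
    return [minute_count] + list(secret)

old_secret = list(range(SECRET_WORDS))
new_secret = old_secret[1:]            # the first u32 was dead weight
assert hash_input_before(old_secret, 99) == hash_input_after(new_secret, 99)
```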
