21 Oct, 2010

2 commits


24 Sep, 2010

1 commit


13 Jul, 2010

1 commit

  • a new boolean flag no_autobind is added to structure proto to avoid the autobind
    calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
    TCP's sendmsg() and sendpage() pathes.

    Signed-off-by: Changli Gao
    ----
    include/net/inet_common.h | 4 ++++
    include/net/sock.h | 1 +
    include/net/tcp.h | 8 ++++----
    net/ipv4/af_inet.c | 15 +++++++++------
    net/ipv4/tcp.c | 11 +++++------
    net/ipv4/tcp_ipv4.c | 3 +++
    net/ipv6/af_inet6.c | 8 ++++----
    net/ipv6/tcp_ipv6.c | 3 +++
    8 files changed, 33 insertions(+), 20 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
     

27 Jun, 2010

1 commit

  • Allows use of ECN when syncookies are in effect by encoding ecn_ok
    into the syn-ack tcp timestamp.

    While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES.
    With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc
    removes the "if (0)".

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

16 Jun, 2010

1 commit

  • When syncookies are in effect, req->iif is left uninitialized.
    In case of e.g. link-local addresses the route lookup then fails
    and no syn-ack is sent.

    Rearrange things so ->iif is also initialized in the syncookie case.

    want_cookie can only be true when the isn was zero, thus move the want_cookie
    check into the "!isn" branch.

    Cc: Glenn Griffin
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Jun, 2010

1 commit


02 Jun, 2010

1 commit

  • There are more than a dozen occurrences of following code in the
    IPv6 stack:

    if (opt && opt->srcrt) {
    struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
    ipv6_addr_copy(&final, &fl.fl6_dst);
    ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
    final_p = &final;
    }

    Replace those with a helper. Note that the helper overrides final_p
    in all cases. This is ok as final_p was previously initialized to
    NULL when declared.

    Signed-off-by: Arnaud Ebalard
    Signed-off-by: David S. Miller

    Arnaud Ebalard
     

16 May, 2010

1 commit

  • TCP-MD5 sessions have intermittent failures, when route cache is
    invalidated. ip_queue_xmit() has to find a new route, calls
    sk_setup_caps(sk, &rt->u.dst), destroying the

    sk->sk_route_caps &= ~NETIF_F_GSO_MASK

    that MD5 desperately try to make all over its way (from
    tcp_transmit_skb() for example)

    So we send few bad packets, and everything is fine when
    tcp_transmit_skb() is called again for this socket.

    Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
    socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

    Reported-by: Bhaskar Dutta
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Apr, 2010

1 commit

  • This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism.

    Not to users of mapped address; the IPV6 and IPV4 socket options are seperate.
    The server does have to deal with both IPv4 and IPv6 socket options
    and the client has to handle the different for each family.

    On client:
    int ttl = 255;
    getaddrinfo(argv[1], argv[2], &hint, &result);

    for (rp = result; rp != NULL; rp = rp->ai_next) {
    s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (s < 0) continue;

    if (rp->ai_family == AF_INET) {
    setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
    } else if (rp->ai_family == AF_INET6) {
    setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS,
    &ttl, sizeof(ttl)))
    }

    if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) {
    ...

    On server:
    int minttl = 255 - maxhops;

    getaddrinfo(NULL, port, &hints, &result);
    for (rp = result; rp != NULL; rp = rp->ai_next) {
    s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (s < 0) continue;

    if (rp->ai_family == AF_INET6)
    setsockopt(s, IPPROTO_IPV6, IPV6_MINHOPCOUNT,
    &minttl, sizeof(minttl));
    setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));

    if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0)
    break
    ...

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

22 Apr, 2010

1 commit


21 Apr, 2010

4 commits

  • Commit 6651ffc8e8bdd5fb4b7d1867c6cfebb4f309512c
    ("ipv6: Fix tcp_v6_send_response transport header setting.")
    fixed one half of why ipv6 tcp response checksums were
    invalid, but it's not the whole story.

    If we're going to use CHECKSUM_PARTIAL for these things (which we are
    since commit 2e8e18ef52e7dd1af0a3bd1f7d990a1d0b249586 "tcp: Set
    CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"), we can't be setting
    buff->csum as we always have been here in tcp_v6_send_response. We
    need to leave it at zero.

    Kill that line and checksums are good again.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Conflicts:
    drivers/net/wireless/iwlwifi/iwl-6000.c
    net/core/dev.c

    David S. Miller
     
  • My recent patch to remove the open-coded checksum sequence in
    tcp_v6_send_response broke it as we did not set the transport
    header pointer on the new packet.

    Actually, there is code there trying to set the transport
    header properly, but it sets it for the wrong skb ('skb'
    instead of 'buff').

    This bug was introduced by commit
    a8fdf2b331b38d61fb5f11f3aec4a4f9fb2dedcb ("ipv6: Fix
    tcp_v6_send_response(): it didn't set skb transport header")

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Sparse can help us find endianness bugs, but we need to make some
    cleanups to be able to more easily spot real bugs.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Apr, 2010

1 commit

  • As Herbert Xu said: we should be able to simply replace ipfragok
    with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
    has droped ipfragok and set local_df value properly.

    The patch kills the ipfragok parameter of .queue_xmit().

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

12 Apr, 2010

2 commits

  • inet: Remove unused send_check length argument

    This patch removes the unused length argument from the send_check
    function in struct inet_connection_sock_af_ops.

    Signed-off-by: Herbert Xu
    Tested-by: Yinghai
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv6

    This patch moves the common code between tcp_v6_send_check and
    tcp_v6_gso_send_check into a new function __tcp_v6_send_check.

    It then uses the new function in tcp_v6_send_synack as well as
    tcp_v6_send_response so that they handle CHECKSUM_PARTIAL properly.

    Signed-off-by: Herbert Xu
    Tested-by: Yinghai
    Signed-off-by: David S. Miller

    Herbert Xu
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Mar, 2010

1 commit

  • Commit 6b03a53a (tcp: use limited socket backlog) added the possibility
    of dropping frames when backlog queue is full.

    Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
    possibility of dropping frames when TTL is under a given limit.

    This patch adds new SNMP MIB entries, named TCPBacklogDrop and
    TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line

    netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
    TCPBacklogDrop: 0
    TCPMinTTLDrop: 0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Mar, 2010

2 commits

  • sk_add_backlog -> __sk_add_backlog
    sk_add_backlog_limited -> sk_add_backlog

    Signed-off-by: Zhu Yi
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhu Yi
     
  • Make tcp adapt to the limited socket backlog change.

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: "Pekka Savola (ipv6)"
    Cc: Patrick McHardy
    Signed-off-by: Zhu Yi
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhu Yi
     

18 Jan, 2010

2 commits

  • __net_init/__net_exit are apparently not going away, so use them
    to full extent.

    In some cases __net_init was removed, because it was called from
    __net_exit code.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • Currently we don't increment SYN-ACK timeouts & retransmissions
    although we do increment the same stats for SYN. We seem to have lost
    the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
    (commit 2248761e in the netdev-vger-cvs tree).

    This patch fixes this issue. In the process we also rename the v4/v6
    syn/ack retransmit functions for clarity. We also add a new
    request_socket operations (syn_ack_timeout) so we can keep code in
    inet_connection_sock.c protocol agnostic.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

08 Jan, 2010

1 commit


16 Dec, 2009

1 commit

  • It creates a regression, triggering badness for SYN_RECV
    sockets, for example:

    [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
    [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
    [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
    [19148.023496] MSR: 00029032 CR: 24002442 XER: 00000000
    [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

    This is likely caused by the change in the 'estab' parameter
    passed to tcp_parse_options() when invoked by the functions
    in net/ipv4/tcp_minisocks.c

    But even if that is fixed, the ->conn_request() changes made in
    this patch series is fundamentally wrong. They try to use the
    listening socket's 'dst' to probe the route settings. The
    listening socket doesn't even have a route, and you can't
    get the right route (the child request one) until much later
    after we setup all of the state, and it must be done by hand.

    This stuff really isn't ready, so the best thing to do is a
    full revert. This reverts the following commits:

    f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d
    022c3f7d82f0f1c68018696f2f027b87b9bb45c2
    1aba721eba1d84a2defce45b950272cee1e6c72a
    cda42ebd67ee5fdf09d7057b5a4584d36fe8a335
    345cda2fd695534be5a4494f1b59da9daed33663
    dc343475ed062e13fc260acccaab91d7d80fd5b2
    05eaade2782fb0c90d3034fd7a7d5a16266182bb
    6a2a2d6bf8581216e08be15fcb563cfd6c430e1e

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Dec, 2009

1 commit

  • First patch changes __inet_hash_nolisten() and __inet6_hash()
    to get a timewait parameter to be able to unhash it from ehash
    at same time the new socket is inserted in hash.

    This makes sure timewait socket wont be found by a concurrent
    writer in __inet_check_established()

    Reported-by: kapil dakhane
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Dec, 2009

1 commit

  • This function walks the whole hashtable so there is no point in
    passing it a network namespace. Instead I purge all timewait
    sockets from dead network namespaces that I find. If the namespace
    is one of the once I am trying to purge I am guaranteed no new timewait
    sockets can be formed so this will get them all. If the namespace
    is one I am not acting for it might form a few more but I will
    call inet_twsk_purge again and shortly to get rid of them. In
    any even if the network namespace is dead timewait sockets are
    useless.

    Move the calls of inet_twsk_purge into batch_exit routines so
    that if I am killing a bunch of namespaces at once I will just
    call inet_twsk_purge once and save a lot of redundant unnecessary
    work.

    My simple 4k network namespace exit test the cleanup time dropped from
    roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell
    to about 2ms. 1ms for ipv4 and 1ms for ipv6.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

03 Dec, 2009

3 commits

  • Parse incoming TCP_COOKIE option(s).

    Calculate TCP_COOKIE option.

    Send optional data.

    This is a significantly revised implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    Requires:
    TCPCT part 1a: add request_values parameter for sending SYNACK
    TCPCT part 1b: generate Responder Cookie secret
    TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
    TCPCT part 1d: define TCP cookie option, extend existing struct's
    TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
    TCPCT part 1f: Initiator Cookie => Responder

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     
  • Data structures are carefully composed to require minimal additions.
    For example, the struct tcp_options_received cookie_plus variable fits
    between existing 16-bit and 8-bit variables, requiring no additional
    space (taking alignment into consideration). There are no additions to
    tcp_request_sock, and only 1 pointer in tcp_sock.

    This is a significantly revised implementation of an earlier (year-old)
    patch that no longer applies cleanly, with permission of the original
    author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

    The principle difference is using a TCP option to carry the cookie nonce,
    instead of a user configured offset in the data. This is more flexible and
    less subject to user configuration error. Such a cookie option has been
    suggested for many years, and is also useful without SYN data, allowing
    several related concepts to use the same extension option.

    "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
    http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html

    "Re: what a new TCP header might look like", May 12, 1998.
    ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail

    These functions will also be used in subsequent patches that implement
    additional features.

    Requires:
    TCPCT part 1a: add request_values parameter for sending SYNACK
    TCPCT part 1b: generate Responder Cookie secret
    TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS

    Signed-off-by: William.Allen.Simpson@gmail.com
    Signed-off-by: David S. Miller

    William Allen Simpson
     
  • Add optional function parameters associated with sending SYNACK.
    These parameters are not needed after sending SYNACK, and are not
    used for retransmission. Avoids extending struct tcp_request_sock,
    and avoids allocating kernel memory.

    Also affects DCCP as it uses common struct request_sock_ops,
    but this parameter is currently reserved for future use.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

14 Nov, 2009

1 commit

  • Define two symbols needed in both kernel and user space.

    Remove old (somewhat incorrect) kernel variant that wasn't used in
    most cases. Default should apply to both RMSS and SMSS (RFC2581).

    Replace numeric constants with defined symbols.

    Stand-alone patch, originally developed for TCPCT.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

06 Nov, 2009

1 commit

  • struct can_proto had a capability field which wasn't ever used. It is
    dropped entirely.

    struct inet_protosw had a capability field which can be more clearly
    expressed in the code by just checking if sock->type = SOCK_RAW.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

29 Oct, 2009

1 commit


19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    Goal is to transfert fields used at lookup time in the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path)

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Oct, 2009

1 commit


07 Oct, 2009

1 commit

  • Atis Elsts wrote:
    > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.

    Ok, I'll just drop that part, I'm not sure what should be done in this case.

    > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?

    I just sent that patch out too quickly, here's a better one with the updates.

    Add support for IPv6 route lookups using sk_mark.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     

15 Sep, 2009

2 commits

  • It was once upon time so that snd_sthresh was a 16-bit quantity.
    ...That has not been true for long period of time. I run across
    some ancient compares which still seem to trust such legacy.
    Put all that magic into a single place, I hopefully found all
    of them.

    Compile tested, though linking of allyesconfig is ridiculous
    nowadays it seems.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

04 Sep, 2009

1 commit

  • Here is a patch which fixes an issue observed when using TCP over IPv6
    and AH from IPsec.

    When a connection gets closed the 4-way method and the last ACK from
    the server gets dropped, the subsequent FINs from the client do not
    get ACKed because tcp_v6_send_response does not set the transport
    header pointer. This causes ah6_output to try to allocate a lot of
    memory, which typically fails, so the ACKs never make it out of the
    stack.

    I have reproduced the problem on kernel 2.6.7, but after looking at
    the latest kernel it seems the problem is still there.

    Signed-off-by: Cosmin Ratiu
    Signed-off-by: David S. Miller

    Cosmin Ratiu