08 Jan, 2006

1 commit


04 Jan, 2006

8 commits


11 Nov, 2005

2 commits

  • Minor spelling fixes for TCP code.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Here is the patch that introduces the generic skb_checksum_complete
    which also checks for hardware RX checksum faults. If that happens,
    it'll call netdev_rx_csum_fault which currently prints out a stack
    trace with the device name. In future it can turn off RX checksum.

    I've converted every spot under net/ that does RX checksum checks to
    use skb_checksum_complete or __skb_checksum_complete with the
    exceptions of:

    * Those places where checksums are done bit by bit. These will call
    netdev_rx_csum_fault directly.

    * The following have not been completely checked/converted:

    ipmr
    ip_vs
    netfilter
    dccp

    This patch is based on patches and suggestions from Stephen Hemminger
    and David S. Miller.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

09 Nov, 2005

1 commit

  • From: Jesper Juhl

    This is the net/ part of the big kfree cleanup patch.

    Remove pointless checks for NULL prior to calling kfree() in net/.

    Signed-off-by: Jesper Juhl
    Cc: "David S. Miller"
    Cc: Arnaldo Carvalho de Melo
    Acked-by: Marcel Holtmann
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: Andrew Morton

    Jesper Juhl
     

06 Nov, 2005

1 commit

  • This patch randomizes the port selected on bind() for connections
    to help with possible security attacks. It should also be faster
    in most cases because there is no need for a global lock.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephen Hemminger
     

04 Oct, 2005

1 commit

  • Arnaldo and I agreed it could be applied now, because I have other
    pending patches depending on this one (Thank you Arnaldo)

    (The other important patch moves skc_refcnt in a separate cache line,
    so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

    1) First some performance data :
    --------------------------------

    tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

    The most time critical code is :

    sk_for_each(sk, node, &head->chain) {
    if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
    goto hit; /* You sunk my battleship! */
    }

    The sk_for_each() does use prefetch() hints but only the begining of
    "struct sock" is prefetched.

    As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
    away from the begining of "struct sock", it has to bring into CPU
    cache cold cache line. Each iteration has to use at least 2 cache
    lines.

    This can be problematic if some chains are very long.

    2) The goal
    -----------

    The idea I had is to change things so that INET_MATCH() may return
    FALSE in 99% of cases only using the data already in the CPU cache,
    using one cache line per iteration.

    3) Description of the patch
    ---------------------------

    Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
    filling a 32 bits hole on 64 bits platform.

    struct sock_common {
    unsigned short skc_family;
    volatile unsigned char skc_state;
    unsigned char skc_reuse;
    int skc_bound_dev_if;
    struct hlist_node skc_node;
    struct hlist_node skc_bind_node;
    atomic_t skc_refcnt;
    + unsigned int skc_hash;
    struct proto *skc_prot;
    };

    Store in this 32 bits field the full hash, not masked by (ehash_size -
    1) Using this full hash as the first comparison done in INET_MATCH
    permits us immediatly skip the element without touching a second cache
    line in case of a miss.

    Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
    sk_hash and tw_hash) already contains the slot number if we mask with
    (ehash_size - 1)

    File include/net/inet_hashtables.h

    64 bits platforms :
    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
    (((__sk)->sk_hash == (__hash))
    ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \
    ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
    (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    32bits platforms:
    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
    (((__sk)->sk_hash == (__hash)) && \
    (inet_sk(__sk)->daddr == (__saddr)) && \
    (inet_sk(__sk)->rcv_saddr == (__daddr)) && \
    (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    - Adds a prefetch(head->chain.first) in
    __inet_lookup_established()/__tcp_v4_check_established() and
    __inet6_lookup_established()/__tcp_v6_check_established() and
    __dccp_v4_check_established() to bring into cache the first element of the
    list, before the {read|write}_lock(&head->lock);

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Aug, 2005

21 commits

  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
    minimal renaming/moving done in this changeset to ease review.

    Most of it is just changes of struct tcp_sock * to struct sock * parameters.

    With this we move to a state closer to two interesting goals:

    1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
    for any INET transport protocol that has struct inet_hashinfo and are
    derived from struct inet_connection_sock. Keeps the userspace API, that will
    just not display DCCP sockets, while newer versions of tools can support
    DCCP.

    2. INET generic transport pluggable Congestion Avoidance infrastructure, using
    the current TCP CA infrastructure with DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • That groups all of the tables and variables associated to the TCP timewait
    schedulling/recycling/killing code, that now can be isolated from the TCP
    specific code and used by other transport protocols, such as DCCP.

    Next changeset will move this code to net/ipv4/inet_timewait_sock.c

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This also moved inet_iif from tcp to inet_hashtables.h, as it is
    needed by the inet_lookup callers, perhaps this needs a bit of
    polishing, but for now seems fine.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Completing the previous changeset, this also generalises tcp_v4_synq_add,
    renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the
    DCCP tree, which I plan to merge RSN.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This creates struct inet_connection_sock, moving members out of struct
    tcp_sock that are shareable with other INET connection oriented
    protocols, such as DCCP, that in my private tree already uses most of
    these members.

    The functions that operate on these members were renamed, using a
    inet_csk_ prefix while not being moved yet to a new file, so as to
    ease the review of these changes.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • And also some TIME_WAIT functions.

    [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
    /tmp/before.size: 282955 13122 9312 305389 4a8ed net/ipv4/built-in.o
    /tmp/after.size: 281566 13122 9312 304000 4a380 net/ipv4/built-in.o
    [acme@toy net-2.6.14]$

    I kept them still inlined, will uninline at some point to see what
    would be the performance difference.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This paves the way to generalise the rest of the sock ID lookup
    routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
    kernels (where IPv6 is always built as a module):

    [root@qemu ~]# grep tw_sock /proc/slabinfo
    tw_sock_TCPv6 0 0 128 31 1
    tw_sock_TCP 0 0 96 41 1
    [root@qemu ~]#

    Now if a protocol wants to use the TIME_WAIT generic infrastructure it
    only has to set the sk_prot->twsk_obj_size field with the size of its
    inet_timewait_sock derived sock and proto_register will create
    sk_prot->twsk_slab, for now its only for INET sockets, but we can
    introduce timewait_sock later if some non INET transport protocolo
    wants to use this stuff.

    Next changesets will take advantage of this new infrastructure to
    generalise even more TCP code.

    [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
    /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o
    /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o
    [acme@toy net-2.6.14]$

    Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
    (qemu host)).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • [acme@toy net-2.6.14]$ grep built-in /tmp/before /tmp/after
    /tmp/before: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o
    /tmp/after: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o

    Will be used in DCCP, not exporting it right now not to get in Adrian
    Bunk's exported-but-not-used-on-modules radar 8)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • It really just makes the existing code be a helper function that
    tcp_v4_hash and tcp_unhash uses, specifying the right inet_hashinfo,
    tcp_hashinfo.

    One thing I'll investigate at some point is to have the inet_hashinfo
    pointer in sk_prot, so that we get all the hashtable information from
    the sk pointer, this can lead to some extra indirections that may well
    hurt performance/code size, we'll see. Ultimate idea would be that
    sk_prot would provide _all_ the information about a protocol
    implementation.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Also expose all of the tcp_hashinfo members, i.e. killing those
    tcp_ehash, etc macros, this will more clearly expose already generic
    functions and some that need just a bit of work to become generic, as
    we'll see in the upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This required moving tcp_bucket_cachep to inet_hashinfo.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This should really be in a inet_connection_sock, but I'm leaving it
    for a later optimization, when some more fields common to INET
    transport protocols now in tcp_sk or inet_sk will be chunked out into
    inet_connection_sock, for now its better to concentrate on getting the
    changes in the core merged to leave the DCCP tree with only DCCP
    specific code.

    Next changesets will take advantage of this move to generalise things
    like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later
    receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
    the future, when tcp_tw_bucket gets transformed into the struct
    timewait_sock hierarchy.

    tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
    moved to sk_prot.

    A cascade of incremental changes will ultimately make the tcp_lookup
    functions be fully generic.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This is to break down the complexity of the series of patches,
    making it very clear that this one just does:

    1. renames tcp_ prefixed hashtable functions and data structures that
    were already mostly generic to inet_ to share it with DCCP and
    other INET transport protocols.

    2. Removes not used functions (__tb_head & tb_head)

    3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
    tcp_v4_build_header)

    Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
    make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
    __tcp_put_port, generic and get others like tcp_destroy_sock closer to generic
    (tcp_orphan_count will go to sk->sk_prot to allow this).

    Eventually most of these functions will be used passing the transport protocol
    inet_hashinfo structure.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • To be shared with DCCP (and others), this is the start of a series of patches
    that will expose the already generic TCP hash table routines.

    The few changes noticed when calling gcc -S before/after on a pentium4 were of
    this type:

    movl 40(%esp), %edx
    cmpl %esi, 472(%edx)
    je .L168
    - pushl $291
    + pushl $272
    pushl $.LC0
    pushl $.LC1
    pushl $.LC2

    [acme@toy net-2.6.14]$ size net/ipv4/tcp_ipv4.before.o net/ipv4/tcp_ipv4.after.o
    text data bss dec hex filename
    17804 516 140 18460 481c net/ipv4/tcp_ipv4.before.o
    17804 516 140 18460 481c net/ipv4/tcp_ipv4.after.o

    Holler if some weird architecture has issues with things like this 8)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • From tcp_v4_rebuild_header, that already was pretty generic, I only
    needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and
    establish the requirement that INET transport layer protocols that
    want to use this function map TCP_SYN_SENT to its equivalent state.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • From tcp_v4_setup_caps, that always is preceded by a call to
    __sk_dst_set, so coalesce this sequence into sk_setup_caps, removing
    one call to a TCP function in the IP layer.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This operation was already generic and DCCP will use it.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

24 Aug, 2005

1 commit


09 Aug, 2005

1 commit

  • Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux
    kernel 2.6.13-rc5. Also weird use of indentation is changed in some
    places.

    Signed-off-by: Heikki Orsila
    Signed-off-by: David S. Miller

    Heikki Orsila
     

06 Jul, 2005

1 commit

  • Make TSO segment transmit size decisions at send time not earlier.

    The basic scheme is that we try to build as large a TSO frame as
    possible when pulling in the user data, but the size of the TSO frame
    output to the card is determined at transmit time.

    This is guided by tp->xmit_size_goal. It is always set to a multiple
    of MSS and tells sendmsg/sendpage how large an SKB to try and build.

    Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
    necessary and conditions warrant. These routines can also decide to
    "defer" in order to wait for more ACKs to arrive and thus allow larger
    TSO frames to be emitted.

    A general observation is that TSO elongates the pipe, thus requiring a
    larger congestion window and larger buffering especially at the sender
    side. Therefore, it is important that applications 1) get a large
    enough socket send buffer (this is accomplished by our dynamic send
    buffer expansion code) 2) do large enough writes.

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2005

2 commits