26 Apr, 2007

5 commits

  • For the places where we need a pointer to the transport header, it is
    still legal to touch skb->h.raw directly if just adding to,
    subtracting from or setting it to another layer header.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
    avoid the longer, open coded equivalent.

    Ditched a no-op in bnx2 in the process.

    I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • I noticed in oprofile study a cache miss in tcp_rcv_established() to read
    copied_seq.

    ffffffff80400a80 : /* tcp_rcv_established total: 4034293  
    2.0400 */

     55493  0.0281 :ffffffff80400bc9:   mov    0x4c8(%r12),%eax copied_seq
    543103  0.2746 :ffffffff80400bd1:   cmp    0x3e0(%r12),%eax   rcv_nxt    

    if (tp->copied_seq == tp->rcv_nxt &&
            len - tcp_header_len ucopy.len) {

    In this function, the cache line 0x4c0 -> 0x500 is used only for this
    reading 'copied_seq' field.

    rcv_wup and copied_seq should be next to rcv_nxt field, to lower number of
    active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...)

    As you suggested, I changed tcp_create_openreq_child() so that these fields
    are changed together, to avoid adding a new store buffer stall.

    Patch is 64bit friendly (no new hole because of alignment constraints)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Feb, 2007

1 commit

  • Move DSACK code outside the SACK fast-path checking code. If the DSACK
    determined that the information was too old we stayed with a partial cache
    copied. Most likely this matters very little since the next packet will not be
    DSACK and we will find it in the cache. but it's still not good form and there
    is little reason to couple the two checks.

    Since the SACK receive cache doesn't need the data to be in host order we also
    remove the ntohl in the checking loop.

    Signed-off-by: Baruch Even
    Signed-off-by: David S. Miller

    Baruch Even
     

03 Dec, 2006

4 commits


19 Oct, 2006

1 commit

  • This patch limits the amount of time you will defer sending a TSO segment
    to less than two clock ticks, or the time between two acks, whichever is
    longer.

    On slow links, deferring causes significant bursts. See attached plots,
    which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue
    for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness
    causes significant jitter, tends to overflow queues early (bad for short
    queues), and makes delay-based congestion control more difficult.

    Deferring by a couple clock ticks I believe will have a relatively small
    impact on performance.

    Signed-off-by: John Heffner
    Signed-off-by: David S. Miller

    John Heffner
     

29 Sep, 2006

3 commits


23 Jun, 2006

1 commit


21 Jun, 2006

1 commit

  • * git://git.infradead.org/hdrcleanup-2.6: (63 commits)
    [S390] __FD_foo definitions.
    Switch to __s32 types in joystick.h instead of C99 types for consistency.
    Add to headers included for userspace in
    Move inclusion of out of user scope in asm-x86_64/mtrr.h
    Remove struct fddi_statistics from user view in
    Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
    Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
    Include and use __uXX types in
    Use __uXX types in , include too
    Remove private struct dx_hash_info from public view in
    Include and use __uXX types in
    Use __uXX types in for struct divert_blk et al.
    Use __u32 for elf_addr_t in , not u32. It's user-visible.
    Remove PPP_FCS from user view in , remove __P mess entirely
    Use __uXX types in user-visible structures in
    Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
    Use __uXX types for S390 DASD volume label definitions which are user-visible
    S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
    Remove unneeded inclusion of from
    Fix private integer types used in V4L2 ioctls.
    ...

    Manually resolve conflict in include/linux/mtd/physmap.h

    Linus Torvalds
     

18 Jun, 2006

1 commit


26 Apr, 2006

1 commit


21 Mar, 2006

1 commit


04 Jan, 2006

3 commits


11 Nov, 2005

2 commits

  • Use "hints" to speed up the SACK processing. Various forms
    of this have been used by TCP developers (Web100, STCP, BIC)
    to avoid the 2x linear search of outstanding segments.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This is an updated version of the RFC3465 ABC patch originally
    for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
    bytes ack'd rather than packets when updating congestion control.

    The orignal ABC described in the RFC applied to a Reno style
    algorithm. For advanced congestion control there is little
    change after leaving slow start.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

30 Aug, 2005

7 commits

  • This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
    minimal renaming/moving done in this changeset to ease review.

    Most of it is just changes of struct tcp_sock * to struct sock * parameters.

    With this we move to a state closer to two interesting goals:

    1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
    for any INET transport protocol that has struct inet_hashinfo and are
    derived from struct inet_connection_sock. Keeps the userspace API, that will
    just not display DCCP sockets, while newer versions of tools can support
    DCCP.

    2. INET generic transport pluggable Congestion Avoidance infrastructure, using
    the current TCP CA infrastructure with DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • With this we're very close to getting all of the current TCP
    refactorings in my dccp-2.6 tree merged, next changeset will export
    some functions needed by the current DCCP code and then dccp-2.6.git
    will be born!

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This creates struct inet_connection_sock, moving members out of struct
    tcp_sock that are shareable with other INET connection oriented
    protocols, such as DCCP, that in my private tree already uses most of
    these members.

    The functions that operate on these members were renamed, using a
    inet_csk_ prefix while not being moved yet to a new file, so as to
    ease the review of these changes.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This paves the way to generalise the rest of the sock ID lookup
    routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
    kernels (where IPv6 is always built as a module):

    [root@qemu ~]# grep tw_sock /proc/slabinfo
    tw_sock_TCPv6 0 0 128 31 1
    tw_sock_TCP 0 0 96 41 1
    [root@qemu ~]#

    Now if a protocol wants to use the TIME_WAIT generic infrastructure it
    only has to set the sk_prot->twsk_obj_size field with the size of its
    inet_timewait_sock derived sock and proto_register will create
    sk_prot->twsk_slab, for now its only for INET sockets, but we can
    introduce timewait_sock later if some non INET transport protocolo
    wants to use this stuff.

    Next changesets will take advantage of this new infrastructure to
    generalise even more TCP code.

    [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
    /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o
    /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o
    [acme@toy net-2.6.14]$

    Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
    (qemu host)).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Lots of places just needs the states, not even linux/tcp.h, where this
    enum was, needs it.

    This speeds up development of the refactorings as less sources are
    rebuilt when things get moved from net/tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This should really be in a inet_connection_sock, but I'm leaving it
    for a later optimization, when some more fields common to INET
    transport protocols now in tcp_sk or inet_sk will be chunked out into
    inet_connection_sock, for now its better to concentrate on getting the
    changes in the core merged to leave the DCCP tree with only DCCP
    specific code.

    Next changesets will take advantage of this move to generalise things
    like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later
    receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in
    the future, when tcp_tw_bucket gets transformed into the struct
    timewait_sock hierarchy.

    tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets
    moved to sk_prot.

    A cascade of incremental changes will ultimately make the tcp_lookup
    functions be fully generic.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This is to break down the complexity of the series of patches,
    making it very clear that this one just does:

    1. renames tcp_ prefixed hashtable functions and data structures that
    were already mostly generic to inet_ to share it with DCCP and
    other INET transport protocols.

    2. Removes not used functions (__tb_head & tb_head)

    3. Removes some leftover prototypes in the headers (tcp_bucket_unlock &
    tcp_v4_build_header)

    Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can
    make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port,
    __tcp_put_port, generic and get others like tcp_destroy_sock closer to generic
    (tcp_orphan_count will go to sk->sk_prot to allow this).

    Eventually most of these functions will be used passing the transport protocol
    inet_hashinfo structure.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

06 Jul, 2005

1 commit

  • Make TSO segment transmit size decisions at send time not earlier.

    The basic scheme is that we try to build as large a TSO frame as
    possible when pulling in the user data, but the size of the TSO frame
    output to the card is determined at transmit time.

    This is guided by tp->xmit_size_goal. It is always set to a multiple
    of MSS and tells sendmsg/sendpage how large an SKB to try and build.

    Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
    necessary and conditions warrant. These routines can also decide to
    "defer" in order to wait for more ACKs to arrive and thus allow larger
    TSO frames to be emitted.

    A general observation is that TSO elongates the pipe, thus requiring a
    larger congestion window and larger buffering especially at the sender
    side. Therefore, it is important that applications 1) get a large
    enough socket send buffer (this is accomplished by our dynamic send
    buffer expansion code) 2) do large enough writes.

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2005

2 commits


19 Jun, 2005

3 commits

  • This chunks out the accept_queue and tcp_listen_opt code and moves
    them to net/core/request_sock.c and include/net/request_sock.h, to
    make it useful for other transport protocols, DCCP being the first one
    to use it.

    Next patches will rename tcp_listen_opt to accept_sock and remove the
    inline tcp functions that just call a reqsk_queue_ function.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Ok, this one just renames some stuff to have a better namespace and to
    dissassociate it from TCP:

    struct open_request -> struct request_sock
    tcp_openreq_alloc -> reqsk_alloc
    tcp_openreq_free -> reqsk_free
    tcp_openreq_fastfree -> __reqsk_free

    With this most of the infrastructure closely resembles a struct
    sock methods subset.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Kept this first changeset minimal, without changing existing names to
    ease peer review.

    Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
    has two new members:

    ->slab, that replaces tcp_openreq_cachep
    ->obj_size, to inform the size of the openreq descendant for
    a specific protocol

    The protocol specific fields in struct open_request were moved to a
    class hierarchy, with the things that are common to all connection
    oriented PF_INET protocols in struct inet_request_sock, the TCP ones
    in tcp_request_sock, that is an inet_request_sock, that is an
    open_request.

    I.e. this uses the same approach used for the struct sock class
    hierarchy, with sk_prot indicating if the protocol wants to use the
    open_request infrastructure by filling in sk_prot->rsk_prot with an
    or_calltable.

    Results? Performance is improved and TCP v4 now uses only 64 bytes per
    open request minisock, down from 96 without this patch :-)

    Next changeset will rename some of the structs, fields and functions
    mentioned above, struct or_calltable is way unclear, better name it
    struct request_sock_ops, s/struct open_request/struct request_sock/g,
    etc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds