04 Jan, 2006

3 commits

  • It's common enough to justify that; TCP still can't use it as it has the
    prequeueing stuff, which is still to be made generic in the not-so-distant
    future :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • So that we can share several timewait-socket-related functions and
    make the timewait mini-socket infrastructure closer to the request
    mini-socket one.

    Next changesets will take advantage of this, moving more code out of
    TCP and DCCP v4 and v6 to common infrastructure.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • It also looks like there were two places where the test on sk_err was
    missing from the event wait logic (in sk_stream_wait_connect and
    sk_stream_wait_memory), while the rest of the sock_error() users appear
    to be doing the right thing. This version of the patch fixes those and
    cleans up a few places that were testing ->sk_err directly.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
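
    A minimal, self-contained sketch of the pattern this patch describes (a
    wait loop that bails out as soon as a pending socket error is seen,
    mirroring what sock_error() does by reading and clearing the error
    field). The types and names below are simplified stand-ins, not the
    kernel's:

    #include <errno.h>

    /* Simplified stand-in for the relevant part of struct sock. */
    struct sock_model {
            int sk_err;        /* pending error, 0 if none */
            int connected;     /* stand-in for the TCP state check */
    };

    /* Mirrors the sock_error() idiom: report and clear the pending error. */
    static int sock_error_model(struct sock_model *sk)
    {
            int err = sk->sk_err;

            sk->sk_err = 0;
            return -err;
    }

    /*
     * Hypothetical wait-for-connect loop illustrating the fix: the error
     * test belongs inside the wait condition, so a socket that hits an
     * error while we sleep does not keep the caller waiting until the
     * timeout expires.
     */
    static int wait_connect_model(struct sock_model *sk, int *timeout)
    {
            while (!sk->connected) {
                    if (sk->sk_err)
                            return sock_error_model(sk);
                    if (--(*timeout) <= 0)
                            return -EAGAIN;
                    /* in the kernel this is where the task would sleep */
            }
            return 0;
    }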
     

11 Nov, 2005

1 commit

  • Use "hints" to speed up SACK processing. Various forms of this have
    been used by TCP developers (Web100, STCP, BIC) to avoid the 2x linear
    search of outstanding segments.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
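
    The commit text is terse, so here is a hedged, self-contained sketch of
    the general "hint" idea only: cache where the previous SACK walk stopped
    so the next walk can resume there instead of rescanning the whole queue
    of outstanding segments from the head. The fields and functions below
    are illustrative, not the actual tcp_sock hint fields:

    struct seg {
            unsigned int start_seq;
            int sacked;                /* already marked by an earlier walk? */
            struct seg *next;
    };

    /* Illustrative hint state: where the last scan left off. */
    struct sack_hint {
            struct seg *last_pos;      /* resume point for the next walk  */
            unsigned int last_seq;     /* highest sequence handled so far */
    };

    /* Mark segments below sacked_up_to, resuming from the cached hint
     * instead of walking the list from its head every time. */
    static void mark_sacked(struct seg *queue, struct sack_hint *hint,
                            unsigned int sacked_up_to)
    {
            struct seg *s = hint->last_pos ? hint->last_pos : queue;

            for (; s && s->start_seq < sacked_up_to; s = s->next)
                    s->sacked = 1;

            hint->last_pos = s;        /* next call starts here */
            hint->last_seq = sacked_up_to;
    }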
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change the
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
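
    A small illustration of the change described above: the flag type
    becomes a named typedef carrying the sparse-only annotation, so
    prototypes document themselves while the generated code is unchanged.
    The allocator prototype and the flag value are made-up examples, not
    kernel APIs:

    #ifdef __CHECKER__                      /* sparse */
    # define __nocast __attribute__((nocast))
    #else                                   /* gcc: expands to nothing */
    # define __nocast
    #endif

    typedef unsigned int __nocast gfp_t;

    #define EXAMPLE_GFP_KERNEL ((gfp_t)0x10u)   /* illustrative value only */

    /* Hypothetical allocator prototype: gfp_t now documents that this
     * argument is a set of allocation flags, not a length or a count. */
    void *example_alloc(unsigned long size, gfp_t flags);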
     

04 Oct, 2005

1 commit

  • Arnaldo and I agreed it could be applied now, because I have other
    pending patches depending on this one (Thank you Arnaldo)

    (The other important patch moves skc_refcnt into a separate cache line,
    so that SMP/NUMA performance doesn't suffer from cache line ping-pongs.)

    1) First some performance data :
    --------------------------------

    tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

    The most time critical code is :

    sk_for_each(sk, node, &head->chain) {
            if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
                    goto hit; /* You sunk my battleship! */
    }

    The sk_for_each() does use prefetch() hints, but only the beginning of
    "struct sock" is prefetched.

    As INET_MATCH's first comparison uses inet_sk(__sk)->daddr, which is far
    away from the beginning of "struct sock", it has to bring a cold cache
    line into the CPU cache. Each iteration has to touch at least 2 cache
    lines.

    This can be problematic if some chains are very long.

    2) The goal
    -----------

    The idea I had is to change things so that INET_MATCH() may return
    FALSE in 99% of cases using only the data already in the CPU cache,
    i.e. one cache line per iteration.

    3) Description of the patch
    ---------------------------

    Adds a new 'unsigned int skc_hash' field to 'struct sock_common',
    filling a 32-bit hole on 64-bit platforms.

    struct sock_common {
            unsigned short          skc_family;
            volatile unsigned char  skc_state;
            unsigned char           skc_reuse;
            int                     skc_bound_dev_if;
            struct hlist_node       skc_node;
            struct hlist_node       skc_bind_node;
            atomic_t                skc_refcnt;
    +       unsigned int            skc_hash;
            struct proto            *skc_prot;
    };

    Store in this 32-bit field the full hash, not masked by (ehash_size - 1).
    Using this full hash as the first comparison done in INET_MATCH lets us
    immediately skip the element, without touching a second cache line, in
    case of a miss.

    Remove the sk_hashent/tw_hashent fields, since skc_hash (aliased to
    sk_hash and tw_hash) already contains the slot number if we mask it
    with (ehash_size - 1).

    File include/net/inet_hashtables.h

    64-bit platforms:
    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) && \
             ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie)) && \
             ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    32-bit platforms:
    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) && \
             (inet_sk(__sk)->daddr == (__saddr)) && \
             (inet_sk(__sk)->rcv_saddr == (__daddr)) && \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    - Adds a prefetch(head->chain.first) in
    __inet_lookup_established()/__tcp_v4_check_established() and
    __inet6_lookup_established()/__tcp_v6_check_established() and
    __dccp_v4_check_established() to bring the first element of the list
    into cache before the {read|write}_lock(&head->lock);

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Sep, 2005

2 commits

  • I've finally found a potential cause of the sk_forward_alloc underflows
    that people have been reporting sporadically.

    When tcp_sendmsg tacks extra bits onto an existing TCP_PAGE, we don't
    check sk_forward_alloc even though a large amount of time may have
    elapsed since we allocated the page. In the meantime someone could have
    come along, liberated packets and reclaimed sk_forward_alloc memory.

    This patch makes tcp_sendmsg check sk_forward_alloc every time as we
    do in do_tcp_sendpages.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch introduces sk_stream_wmem_schedule as a short-hand for
    the sk_forward_alloc checking on egress.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
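
    A hedged sketch of what such a short-hand can look like, tying the two
    commits above together: first the cheap check against the socket's
    forward-allocated quota, falling back to the heavier charging path only
    when that quota is insufficient. The struct and the fallback function
    below are simplified stand-ins, not the kernel's:

    /* Simplified stand-in for the fields involved. */
    struct sock_model {
            int sk_forward_alloc;   /* bytes already reserved for this socket */
    };

    /* Placeholder for the slower path that charges memory against the
     * protocol-wide limits; it always refuses here so the sketch stays
     * self-contained. */
    static int charge_stream_memory(struct sock_model *sk, int size)
    {
            (void)sk;
            (void)size;
            return 0;
    }

    /*
     * Sketch of the short-hand: egress callers ask "may I queue size more
     * bytes?" before appending data, and most of the time the answer comes
     * from sk_forward_alloc without touching the global accounting.
     */
    static inline int stream_wmem_schedule(struct sock_model *sk, int size)
    {
            return size <= sk->sk_forward_alloc ||
                   charge_stream_memory(sk, size);
    }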
     

30 Aug, 2005

14 commits

  • Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • Protocols that make extensive use of SKB cloning,
    for example TCP, eat at least 2 allocations per
    packet sent as a result.

    To cut the kmalloc() count in half, we implement
    a pre-allocation scheme wherein we allocate
    2 sk_buff objects in advance, then use a simple
    reference count to free up the memory at the
    correct time.

    Based upon an initial patch by Thomas Graf and
    suggestions from Herbert Xu.
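
    (A simplified model of this pre-allocation scheme appears at the end of
    this list of commits.)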

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Reduces skb size by 8 bytes on 64-bit.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • This also moved inet_iif from tcp to inet_hashtables.h, as it is
    needed by the inet_lookup callers. Perhaps this needs a bit of
    polishing, but for now it seems fine.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This creates struct inet_connection_sock, moving members out of struct
    tcp_sock that are shareable with other INET connection-oriented
    protocols, such as DCCP, which in my private tree already uses most of
    these members.

    The functions that operate on these members were renamed, using an
    inet_csk_ prefix, while not yet being moved to a new file, so as to
    ease the review of these changes.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Out of tcp_create_openreq_child, will be used in
    dccp_create_openreq_child, and is a nice sock function anyway.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • And also some TIME_WAIT functions.

    [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
    /tmp/before.size: 282955 13122 9312 305389 4a8ed net/ipv4/built-in.o
    /tmp/after.size: 281566 13122 9312 304000 4a380 net/ipv4/built-in.o
    [acme@toy net-2.6.14]$

    I kept them inlined for now; I will uninline them at some point to see
    what the performance difference would be.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This paves the way to generalise the rest of the sock ID lookup
    routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
    kernels (where IPv6 is always built as a module):

    [root@qemu ~]# grep tw_sock /proc/slabinfo
    tw_sock_TCPv6 0 0 128 31 1
    tw_sock_TCP 0 0 96 41 1
    [root@qemu ~]#

    Now if a protocol wants to use the generic TIME_WAIT infrastructure it
    only has to set the sk_prot->twsk_obj_size field to the size of its
    inet_timewait_sock-derived sock, and proto_register will create
    sk_prot->twsk_slab. For now this is only for INET sockets, but we can
    introduce timewait_sock later if some non-INET transport protocol
    wants to use this stuff.

    Next changesets will take advantage of this new infrastructure to
    generalise even more TCP code.

    [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
    /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o
    /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o
    [acme@toy net-2.6.14]$

    Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
    (qemu host)).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Also expose all of the tcp_hashinfo members, i.e. kill those
    tcp_ehash, etc. macros. This will more clearly expose the already
    generic functions and some that need just a bit of work to become
    generic, as we'll see in the upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • From tcp_v4_setup_caps, which is always preceded by a call to
    __sk_dst_set; coalesce this sequence into sk_setup_caps, removing
    one call to a TCP function in the IP layer.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This operation was already generic and DCCP will use it.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
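
    As referenced in the SKB-cloning commit near the top of this list, here
    is a small self-contained model of the pre-allocation idea: allocate
    the original buffer and its future clone as one object and let a shared
    reference count decide when the pair is really freed. This is only a
    model of the technique, not the kernel's actual sk_buff fast-clone
    layout:

    #include <stddef.h>
    #include <stdlib.h>

    struct skb_model {
            int cloned;             /* is this buffer the clone slot? */
            /* ... packet data fields would live here ... */
    };

    /* One allocation carries the buffer, its pre-allocated clone slot and
     * a shared use count, so cloning never calls the allocator again. */
    struct skb_pair {
            struct skb_model orig;
            struct skb_model clone;
            int refs;               /* 1 = only orig live, 2 = clone too */
    };

    static struct skb_model *skb_alloc_model(void)
    {
            struct skb_pair *p = calloc(1, sizeof(*p));

            if (!p)
                    return NULL;
            p->refs = 1;
            return &p->orig;
    }

    /* Hand out the pre-allocated clone slot instead of allocating. */
    static struct skb_model *skb_clone_model(struct skb_model *skb)
    {
            struct skb_pair *p = (struct skb_pair *)skb;  /* orig is first */

            if (p->refs != 1)
                    return NULL;    /* clone slot already in use */
            p->refs = 2;
            p->clone.cloned = 1;
            return &p->clone;
    }

    /* The backing memory goes away only when both references are gone. */
    static void skb_free_model(struct skb_model *skb)
    {
            struct skb_pair *p = skb->cloned
                    ? (struct skb_pair *)((char *)skb -
                                          offsetof(struct skb_pair, clone))
                    : (struct skb_pair *)skb;

            if (--p->refs == 0)
                    free(p);
    }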
     

24 Aug, 2005

1 commit

  • The socket flag cleanups that went into 2.6.12-rc1 basically OR the
    flags of an old socket into the socket just being created.
    Unfortunately the new socket was just initialized by sock_init_data(),
    so it already has SOCK_ZAPPED set. As a result, zapped sockets are
    created and all incoming connections will fail due to this bug, which
    again was carefully replicated to at least AX.25, NET/ROM and ROSE.

    In order to keep the abstraction alive I've introduced sock_copy_flags()
    to copy the socket flags from one socket to another and used that
    instead of the bitwise copy thing. Anyway, the idea here has probably
    been to copy all flags, so sock_copy_flags() should be the right thing.
    With this the ham radio protocols are usable again, so I hope this will
    make it into 2.6.13.

    Signed-off-by: Ralf Baechle DL5RB
    Signed-off-by: David S. Miller

    Ralf Baechle
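
    A hedged sketch of the helper described above: copy the flag word
    wholesale instead of ORing it into a freshly initialized (and therefore
    already zapped) socket. The struct is a stand-in, not the real
    struct sock:

    /* Simplified stand-in for the flag-carrying part of struct sock. */
    struct sock_model {
            unsigned long sk_flags;
    };

    /*
     * What the buggy path effectively did: OR the old flags into a socket
     * that sock_init_data() had already marked zapped, so the zapped bit
     * could never be cleared by the copy.
     */
    static void copy_flags_by_or(struct sock_model *nsk,
                                 const struct sock_model *osk)
    {
            nsk->sk_flags |= osk->sk_flags;   /* keeps the stale zapped bit */
    }

    /* The fix: replicate the old socket's flags exactly. */
    static void sock_copy_flags_model(struct sock_model *nsk,
                                      const struct sock_model *osk)
    {
            nsk->sk_flags = osk->sk_flags;
    }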
     

06 Jul, 2005

1 commit

  • The ideal and most optimal layout for an SKB when doing
    scatter-gather is to put all the headers at skb->data, and
    all the user data in the page array.

    This makes SKB splitting and combining extremely simple,
    especially before a packet goes onto the wire the first
    time.

    So, when sk_stream_alloc_pskb() is given a zero size, make
    sure there is no skb_tailroom(). This is achieved by applying
    SKB_DATA_ALIGN() to the header length used here.

    Next, make select_size() in TCP output segmentation use a
    length of zero when NETIF_F_SG is true on the outgoing
    interface.

    Signed-off-by: David S. Miller

    David S. Miller
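
    A rough, self-contained sketch of the sizing decision described above:
    when the outgoing device can do scatter-gather, ask for a zero-length
    linear area so that only headers live at skb->data, and round the
    header reservation with an alignment macro so no stray tailroom
    appears. The constants and helpers below are illustrative stand-ins,
    not the kernel's:

    #include <stddef.h>

    #define EXAMPLE_NETIF_F_SG      0x1u    /* illustrative feature bit */
    #define EXAMPLE_CACHE_BYTES     64      /* illustrative alignment   */

    /* Stand-in for SKB_DATA_ALIGN(): round the header reservation up so
     * the allocated data area ends exactly where the headers end. */
    #define EXAMPLE_DATA_ALIGN(x) \
            (((x) + (EXAMPLE_CACHE_BYTES - 1)) & \
             ~(size_t)(EXAMPLE_CACHE_BYTES - 1))

    /* Sketch of the select_size() idea: with scatter-gather available,
     * request no linear payload at all and let user data go to pages. */
    static size_t select_linear_size(unsigned int dev_features, size_t mss)
    {
            if (dev_features & EXAMPLE_NETIF_F_SG)
                    return 0;       /* headers only in skb->data */
            return mss;             /* otherwise keep the linear payload */
    }

    /* Sketch of the allocation side: align the header length so that a
     * zero-size request really produces zero tailroom. */
    static size_t linear_alloc_size(size_t requested, size_t header_len)
    {
            return requested + EXAMPLE_DATA_ALIGN(header_len);
    }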
     

19 Jun, 2005

2 commits

  • Ok, this one just renames some stuff to have a better namespace and to
    disassociate it from TCP:

    struct open_request -> struct request_sock
    tcp_openreq_alloc -> reqsk_alloc
    tcp_openreq_free -> reqsk_free
    tcp_openreq_fastfree -> __reqsk_free

    With this, most of the infrastructure closely resembles a subset of the
    struct sock methods.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Kept this first changeset minimal, without changing existing names, to
    ease peer review.

    Basically tcp_openreq_alloc now receives the or_calltable, which in turn
    has two new members:

    ->slab, which replaces tcp_openreq_cachep
    ->obj_size, giving the size of the openreq descendant for
    a specific protocol

    The protocol-specific fields in struct open_request were moved to a
    class hierarchy, with the things that are common to all connection-
    oriented PF_INET protocols in struct inet_request_sock, and the TCP
    ones in tcp_request_sock, which is an inet_request_sock, which is an
    open_request.

    I.e. this uses the same approach used for the struct sock class
    hierarchy, with sk_prot indicating if the protocol wants to use the
    open_request infrastructure by filling in sk_prot->rsk_prot with an
    or_calltable.

    Results? Performance is improved and TCP v4 now uses only 64 bytes per
    open request minisock, down from 96 without this patch :-)

    The next changeset will rename some of the structs, fields and functions
    mentioned above; struct or_calltable is way too unclear, better to name
    it struct request_sock_ops, s/struct open_request/struct request_sock/g,
    etc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
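
    A hedged sketch of the shape of the per-protocol table described above:
    the ops structure carries the size of the protocol's request-sock
    subclass (and, in the kernel, the slab cache), and a generic allocator
    uses that size instead of a TCP-specific cache. The struct contents and
    names below are simplified stand-ins built from the commit text:

    #include <stdlib.h>

    struct request_sock_ops_model;

    /* Base mini-socket, shared by all connection-oriented protocols. */
    struct request_sock_model {
            const struct request_sock_ops_model *rsk_ops;
            /* ... fields common to every protocol ... */
    };

    /* Per-protocol table: obj_size is the size of the protocol's
     * request-sock subclass; ->slab would also live here in the kernel. */
    struct request_sock_ops_model {
            size_t obj_size;
    };

    /* A TCP-flavoured subclass: base struct first, protocol fields after. */
    struct tcp_request_sock_model {
            struct request_sock_model req;      /* must be first */
            unsigned int hypothetical_isn;      /* illustrative field */
    };

    /* reqsk_alloc-style helper: the size comes from the ops table, so the
     * same code can serve TCP, DCCP or any other user. */
    static struct request_sock_model *
    reqsk_alloc_model(const struct request_sock_ops_model *ops)
    {
            struct request_sock_model *req = calloc(1, ops->obj_size);

            if (req)
                    req->rsk_ops = ops;
            return req;
    }

    static const struct request_sock_ops_model tcp_reqsk_ops_model = {
            .obj_size = sizeof(struct tcp_request_sock_model),
    };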
     

01 May, 2005

2 commits

  • Some KernelDoc descriptions are updated to match the current code.
    No code changes.

    Signed-off-by: Martin Waitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Waitz
     
  • I have recompiled the Linux kernel 2.6.11.5 documentation for myself and
    our university students again. The documentation could be extended to
    cover more sources which are equipped with structured comments in recent
    2.6 kernels, and I have tried to proceed with that task. I have done
    this several times since 2.6.0 and it gets boring to make the same
    changes again and again. The Linux kernel compiles after the changes for
    the i386 and ARM targets. I have added references to some more files
    into the kernel-api book, and I have added some section names as well.
    So please check that the changes do not break anything and that the
    categories are not too skewed.

    I have changed kernel-doc to accept the "fastcall" and "asmlinkage"
    words reserved by kernel convention. Most of the other changes are
    modifications to the comments to make kernel-doc happy, accept some
    parameter descriptions and not bail out on errors. Changed to @pid in
    the description, moved some #ifdefs before comments to correct the
    function-to-comment bindings, etc.

    You can see result of the modified documentation build at
    http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz

    Some more sources are ready to be included in the kernel-doc generated
    documentation. These have been added to kernel-api for now. Some more
    section names were added, and probably some more chaos introduced, as a
    result of this quick cleanup work.

    Signed-off-by: Pavel Pisa
    Signed-off-by: Martin Waitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Pisa
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds