09 Jun, 2009

1 commit


16 Feb, 2009

2 commits

  • The sctp crc32c checksum is always generated in little endian.
    So, we clean up the code to treat it as little endian and remove
    all the __force casts.

    Suggested by Herbert Xu.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • This is a new version of my patch, now using a module parameter instead
    of a sysctl, so that the option is harder to find. Please note that,
    once the module is loaded, it is still possible to change the value of
    the parameter in /sys/module/sctp/parameters/, which is useful if you
    want to do performance comparisons without rebooting.

    Computation of SCTP checksums significantly affects the performance of
    SCTP. For example, using two dual-Opteron 246 machines connected by a GbE
    network, it was not possible to achieve more than ~730 Mbps, compared to
    941 Mbps after disabling SCTP checksums.
    Unfortunately, SCTP checksum offloading in NICs is not commonly
    available (yet).

    By default, checksums are still enabled, of course.

    Signed-off-by: Lucas Nussbaum
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Lucas Nussbaum
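
A minimal sketch of the little-endian handling described in the crc32c commit above: compute CRC32c, then serialize the result least significant byte first with plain shifts, so no endianness casts are needed. The names toy_crc32c and toy_put_le32 are hypothetical and the bitwise loop is an illustrative stand-in, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78.
 * Hypothetical helper for illustration only. */
uint32_t toy_crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Write a 32-bit value least significant byte first, on any host. */
void toy_put_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value & 0xFFu);
    out[1] = (uint8_t)((value >> 8) & 0xFFu);
    out[2] = (uint8_t)((value >> 16) & 0xFFu);
    out[3] = (uint8_t)((value >> 24) & 0xFFu);
}
```

Serializing byte by byte keeps the stored order independent of the host CPU, which is the property the cleanup relies on.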
     

23 Jan, 2009

1 commit

  • There is a race between sctp_rcv() and sctp_accept() where we
    have moved the association from the listening socket to the
    accepted socket, but sctp_rcv() processing cached the old
    socket and continues to use it.

    The easy solution is to check for the socket mismatch once we've
    grabbed the socket lock. If we hit a mismatch, that means
    that we are currently holding the lock on the listening socket,
    but the association is referencing a newly accepted socket. We need
    to drop the lock on the old socket and grab the lock on the new one.

    A more proper solution might be to create accepted sockets when
    the new association is established, similar to TCP. That would
    eliminate the race for 1-to-1 style sockets, but it would still
    exist for 1-to-many sockets where a user wishes to peel off an
    association. For now, we'll live with this easy solution as
    it addresses the problem.

    Reported-by: Michal Hocko
    Reported-by: Karsten Keil
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
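
The lock re-check described above can be sketched as follows. This is a single-threaded model under stated assumptions: toy_sock, toy_asoc and the integer "lock" are hypothetical stand-ins for the real socket, association and socket lock, used only to show the order of operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a socket with a toy lock flag, and an
 * association that points at the socket currently owning it. */
struct toy_sock { int locked; };
struct toy_asoc { struct toy_sock *sk; };

void toy_lock(struct toy_sock *sk)   { assert(!sk->locked); sk->locked = 1; }
void toy_unlock(struct toy_sock *sk) { assert(sk->locked);  sk->locked = 0; }

/* Lock the cached socket, then verify it still owns the association.
 * On a mismatch, drop the old lock and take the new socket's lock.
 * Returns the socket that is held on exit. */
struct toy_sock *toy_lock_owner(struct toy_asoc *asoc, struct toy_sock *cached)
{
    toy_lock(cached);
    if (asoc->sk != cached) {
        toy_unlock(cached);
        cached = asoc->sk;
        toy_lock(cached);
    }
    return cached;
}
```

The check has to happen after the first lock is taken; checking before locking would leave the same window open.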
     

23 Oct, 2008

1 commit

  • If an ICMP "packet too big" message is received with an MTU larger than
    the current PMTU, SCTP still accepts the message and syncs the PMTU of
    the association with the wrong MTU.

    Endpoint A                          Endpoint B
    (ESTABLISHED)                       (ESTABLISHED)
    ICMP --------->
    (packet too big, MTU too large)
                                        sync PMTU

    This patch fixes the problem by dropping that ICMP message.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
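
A minimal sketch of the check this fix implies, assuming a hypothetical toy_transport structure that only tracks the current path MTU. Treating an equal MTU as "ignore" is an assumption here; the commit message only mentions larger values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a transport's PMTU state. */
struct toy_transport { uint32_t pathmtu; };

/* Handle an ICMP "packet too big" report.
 * Returns 1 if the PMTU was lowered, 0 if the message was dropped. */
int toy_icmp_frag_needed(struct toy_transport *t, uint32_t reported_mtu)
{
    assert(t != NULL);
    /* A report that does not shrink the path MTU carries no useful
     * information, so it is ignored rather than synced. */
    if (reported_mtu >= t->pathmtu)
        return 0;
    t->pathmtu = reported_mtu;
    return 1;
}
```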
     

19 Jul, 2008

1 commit


17 Jul, 2008

1 commit


15 Jul, 2008

1 commit


20 Jun, 2008

1 commit

  • This patch adds validation of the Initiate Tag and chunk type when the
    Verification Tag is 0 while handling an ICMP message.

    RFC 4960, Appendix C. ICMP Handling

    ICMP6) An implementation MUST validate that the Verification Tag
    contained in the ICMP message matches the Verification Tag of the peer.
    If the Verification Tag is not 0 and does NOT match, discard the ICMP
    message. If it is 0 and the ICMP message contains enough bytes to
    verify that the chunk type is an INIT chunk and that the Initiate Tag
    matches the tag of the peer, continue with ICMP7. If the ICMP message
    is too short or the chunk type or the Initiate Tag does not match,
    silently discard the packet.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
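
The ICMP6 rule quoted above can be sketched as a small predicate. The function name and parameters are hypothetical; which stored tag the caller passes as expected_vtag and expected_init_tag is left open here. The sketch assumes the buffer starts at the first chunk header, with the 4-byte Initiate Tag in network byte order right after the 4-byte chunk header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CID_INIT    1u  /* INIT chunk type value in RFC 4960 */
#define TOY_INIT_PREFIX 8u  /* 4-byte chunk header + 4-byte Initiate Tag */

/* Returns 1 to continue with ICMP7, 0 to discard the ICMP message. */
int toy_icmp_tag_ok(uint32_t vtag, uint32_t expected_vtag,
                    uint32_t expected_init_tag,
                    const uint8_t *chunk, size_t avail)
{
    uint32_t init_tag;

    assert(chunk != NULL || avail == 0);

    /* Non-zero Verification Tag: it must simply match. */
    if (vtag != 0)
        return vtag == expected_vtag;

    /* Zero tag: need enough bytes to see an INIT chunk and its tag. */
    if (avail < TOY_INIT_PREFIX)
        return 0;
    if (chunk[0] != TOY_CID_INIT)
        return 0;

    init_tag = ((uint32_t)chunk[4] << 24) | ((uint32_t)chunk[5] << 16) |
               ((uint32_t)chunk[6] << 8)  |  (uint32_t)chunk[7];
    return init_tag == expected_init_tag;
}
```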
     

10 Apr, 2008

1 commit


18 Mar, 2008

2 commits


06 Mar, 2008

1 commit


05 Feb, 2008

1 commit

  • I was notified by Randy Stewart that lksctp claims to be
    "the reference implementation". First of all, "the
    reference implementation" was the original implementation
    of SCTP in userspace written by Randy and a few others.
    Second, after looking at the definition of 'reference implementation',
    we don't really meet the requirements.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     

29 Jan, 2008

2 commits


10 Nov, 2007

1 commit


11 Oct, 2007

1 commit


26 Sep, 2007

1 commit

  • RFC 4460 and future RFC 4960 (2960-bis) specify that packets
    with bundled INIT chunks need to be dropped. We currently do
    that only after processing any leading chunks. For OOTB chunks,
    since we already walk the entire packet, we should discard packets
    with bundled INITs.

    There are other chunks that MUST NOT be bundled, but the spec
    is silent on their treatment. Thus, we'll leave their treatment
    alone for the moment.

    Signed-off-by: Vlad Yasevich
    Acked-by: Wei Yongjun

    Vlad Yasevich
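
A minimal sketch of detecting a bundled INIT while walking a packet's chunks, assuming the buffer starts at the first chunk header and that each chunk is a 1-byte type, 1-byte flags and a 2-byte big-endian length, padded to a 4-byte boundary. The function name is hypothetical and the handling of malformed chunks (stop walking) is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CID_INIT 1u  /* INIT chunk type value in RFC 4960 */

/* Returns 1 if an INIT chunk shares the packet with any other chunk. */
int toy_has_bundled_init(const uint8_t *chunks, size_t len)
{
    size_t off = 0, nchunks = 0;
    int saw_init = 0;

    assert(chunks != NULL || len == 0);

    while (len - off >= 4) {
        size_t clen = ((size_t)chunks[off + 2] << 8) | chunks[off + 3];
        size_t padded;

        if (clen < 4 || clen > len - off)
            break;                       /* malformed or truncated chunk */
        if (chunks[off] == TOY_CID_INIT)
            saw_init = 1;
        nchunks++;

        padded = (clen + 3) & ~(size_t)3; /* chunks are 4-byte aligned */
        off += (padded > len - off) ? (len - off) : padded;
    }
    return saw_init && nchunks > 1;
}
```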
     

01 Aug, 2007

1 commit


14 Jun, 2007

2 commits

  • Currently, if the socket is owned by the user, we drop the ICMP
    message. As a result, SCTP forgets that the path MTU changed and
    never adjusts its estimate. This causes all subsequent
    packets to be fragmented. With this patch, we flag the association
    as needing to update its estimate based on the already updated
    routing information.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala

    Vlad Yasevich
     
  • Introduce a new function, sctp_transport_update_pmtu, that updates
    the transport's and destination cache's view of the path MTU.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala

    Vlad Yasevich
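
The deferred update described in the two commits above can be sketched like this. All names (pm_assoc, pm_update_pmtu, pm_icmp_frag_needed, pm_sync_before_send) are hypothetical, and the model collapses transport, association and routing state into one structure purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified association state. */
struct pm_assoc {
    uint32_t pathmtu;       /* current PMTU estimate */
    int      pmtu_pending;  /* set when an update had to be deferred */
    int      owned_by_user; /* socket is busy in process context */
};

/* Single place that applies a new path MTU. */
void pm_update_pmtu(struct pm_assoc *a, uint32_t mtu)
{
    assert(a != NULL);
    a->pathmtu = mtu;
    a->pmtu_pending = 0;
}

/* ICMP handler: apply now, or remember that an update is due. */
void pm_icmp_frag_needed(struct pm_assoc *a, uint32_t mtu)
{
    if (a->owned_by_user) {
        a->pmtu_pending = 1;   /* do not lose the event */
        return;
    }
    pm_update_pmtu(a, mtu);
}

/* Later, before sending: catch up from the routing information. */
void pm_sync_before_send(struct pm_assoc *a, uint32_t route_mtu)
{
    if (a->pmtu_pending)
        pm_update_pmtu(a, route_mtu);
}
```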
     

26 Apr, 2007

10 commits

  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used
    in further shrinking work, likely with the offsetization of other pointers,
    such as ->{data,tail,end}, at the cost of adds, that were minimized by the
    usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then
    accessed multiple times in each function, it also is not more expensive than
    before with regards to most of the handling of such headers, like setting one
    of these headers to another (transport to network, etc), or subtracting, adding
    to/from it, comparing them, etc.

    Now we have this layout for sk_buff on a x86_64 machine:

    [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
    struct sk_buff {
    struct sk_buff * next; /* 0 8 */
    struct sk_buff * prev; /* 8 8 */
    struct rb_node rb; /* 16 24 */
    struct sock * sk; /* 40 8 */
    ktime_t tstamp; /* 48 8 */
    struct net_device * dev; /* 56 8 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    struct net_device * input_dev; /* 64 8 */
    sk_buff_data_t transport_header; /* 72 4 */
    sk_buff_data_t network_header; /* 76 4 */
    sk_buff_data_t mac_header; /* 80 4 */

    /* XXX 4 bytes hole, try to pack */

    struct dst_entry * dst; /* 88 8 */
    struct sec_path * sp; /* 96 8 */
    char cb[48]; /* 104 48 */
    /* cacheline 2 boundary (128 bytes) was 24 bytes ago*/
    unsigned int len; /* 152 4 */
    unsigned int data_len; /* 156 4 */
    unsigned int mac_len; /* 160 4 */
    union {
    __wsum csum; /* 4 */
    __u32 csum_offset; /* 4 */
    }; /* 164 4 */
    __u32 priority; /* 168 4 */
    __u8 local_df:1; /* 172 1 */
    __u8 cloned:1; /* 172 1 */
    __u8 ip_summed:2; /* 172 1 */
    __u8 nohdr:1; /* 172 1 */
    __u8 nfctinfo:3; /* 172 1 */
    __u8 pkt_type:3; /* 173 1 */
    __u8 fclone:2; /* 173 1 */
    __u8 ipvs_property:1; /* 173 1 */

    /* XXX 2 bits hole, try to pack */

    __be16 protocol; /* 174 2 */
    void (*destructor)(struct sk_buff *); /* 176 8 */
    struct nf_conntrack * nfct; /* 184 8 */
    /* --- cacheline 3 boundary (192 bytes) --- */
    struct sk_buff * nfct_reasm; /* 192 8 */
    struct nf_bridge_info *nf_bridge; /* 200 8 */
    __u16 tc_index; /* 208 2 */
    __u16 tc_verd; /* 210 2 */
    dma_cookie_t dma_cookie; /* 212 4 */
    __u32 secmark; /* 216 4 */
    __u32 mark; /* 220 4 */
    unsigned int truesize; /* 224 4 */
    atomic_t users; /* 228 4 */
    unsigned char * head; /* 232 8 */
    unsigned char * data; /* 240 8 */
    unsigned char * tail; /* 248 8 */
    /* --- cacheline 4 boundary (256 bytes) --- */
    unsigned char * end; /* 256 8 */
    }; /* size: 264, cachelines: 5 */
    /* sum members: 260, holes: 1, sum holes: 4 */
    /* bit holes: 1, sum bit holes: 2 bits */
    /* last cacheline: 8 bytes */

    On 32 bits nothing changes, and pointers continue to be used with the compiler
    turning all this abstraction layer into dust. But there are some sk_buff
    validation tricks that are now possible, humm... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
    skb->mac to skb->mac_header, to match the names of the associated helpers
    (skb[_[re]set]_{transport,network,mac}_header).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For consistency with all the other skb->h.raw accessors.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For the quite common 'skb->h.raw - skb->data' sequence.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This time of the type:

    skb->nh.iph = (struct iphdr *)skb->data;

    That is completely equivalent to:

    skb->nh.raw = skb->data;

    Wonder why people love casts... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
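
A minimal sketch of the header-offset idea from the sk_buff commits above: a header position is stored as a 4-byte offset from the buffer head rather than an 8-byte pointer, and helper functions convert back to a pointer on demand. The toy_ names are hypothetical, and uint32_t stands in for sk_buff_data_t on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_data_t;  /* stand-in for sk_buff_data_t (64-bit case) */

struct toy_skb {
    unsigned char *head;             /* start of the buffer */
    unsigned char *data;             /* start of the current payload */
    toy_data_t     transport_header; /* offset from head, not a pointer */
};

/* Convert the stored offset back into a pointer. */
unsigned char *toy_transport_header(const struct toy_skb *skb)
{
    return skb->head + skb->transport_header;
}

/* Point the transport header at the current data position. */
void toy_reset_transport_header(struct toy_skb *skb)
{
    assert(skb->data >= skb->head);
    skb->transport_header = (toy_data_t)(skb->data - skb->head);
}

/* Point the transport header at data + offset. */
void toy_set_transport_header(struct toy_skb *skb, int offset)
{
    toy_reset_transport_header(skb);
    skb->transport_header += (toy_data_t)offset;
}
```

Because the field is a plain offset, comparing or subtracting two headers needs no pointer conversion, only the final access does.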
     

11 Feb, 2007

1 commit


03 Dec, 2006

6 commits


31 Oct, 2006

1 commit

  • Every time SCTP creates a temporary association, the stack hashes it,
    puts it on a list of endpoint associations and increments the backlog.
    However, the lifetime of a temporary association is the processing time
    of the current packet, and it's destroyed after that. In fact, we don't
    really want anyone else finding this association. There is no reason to
    do this extra work.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
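
A minimal sketch of skipping the bookkeeping for temporary associations, using hypothetical ta_ structures and counters in place of the real hash table, endpoint list and backlog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified state. */
struct ta_assoc    { int temp; int hashed; };
struct ta_endpoint { int nassocs; int backlog; };

/* Register an association with its endpoint. Temporary associations
 * live only while one packet is processed, so they are neither hashed,
 * listed, nor counted against the backlog. */
void ta_endpoint_add_asoc(struct ta_endpoint *ep, struct ta_assoc *asoc)
{
    assert(ep != NULL && asoc != NULL);
    if (asoc->temp)
        return;

    asoc->hashed = 1;
    ep->nassocs++;
    ep->backlog++;
}
```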