14 Jun, 2007

2 commits

  • Currently, if the socket is owned by the user, we drop the ICMP
    message. As a result, SCTP forgets that the path MTU changed and
    never adjusts its estimate. This causes all subsequent
    packets to be fragmented. With this patch, we flag the association
    so that it updates its estimate based on the already updated
    routing information.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala

    Vlad Yasevich
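
    The deferred update described in this commit can be sketched in plain C.
    This is a minimal userspace model, not the kernel code: the struct, its
    fields and both function names are hypothetical stand-ins, and the
    "routing cache" is reduced to a single integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a transport's PMTU state. */
struct toy_transport {
    uint32_t pathmtu;      /* the association's current estimate */
    uint32_t route_mtu;    /* MTU already recorded in the routing cache */
    bool     pmtu_pending; /* set when the update had to be deferred */
};

/* Called when an ICMP "fragmentation needed" message arrives. */
static void toy_icmp_frag_needed(struct toy_transport *t, bool sock_owned_by_user)
{
    assert(t != NULL);
    if (sock_owned_by_user) {
        /* Cannot touch the association now; flag it instead of forgetting. */
        t->pmtu_pending = true;
        return;
    }
    t->pathmtu = t->route_mtu;
}

/* Called later, once the association may be modified safely. */
static void toy_sync_pmtu(struct toy_transport *t)
{
    assert(t != NULL);
    if (t->pmtu_pending) {
        t->pathmtu = t->route_mtu;
        t->pmtu_pending = false;
    }
}
```

    The point of the flag is that the ICMP event is never lost: the estimate
    is simply brought in line with the routing information at the next safe
    opportunity.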
     
  • Introduce new function sctp_transport_update_pmtu that updates
    the transport's and destination cache's view of the path MTU.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala

    Vlad Yasevich
     

26 Apr, 2007

10 commits

  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
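
    As an illustration of the rule in this commit, the sketch below treats a
    partially checksummed packet like one that needs no verification. The
    enum and the helper are hypothetical names modelled on the kernel's
    CHECKSUM_* constants; the numeric values are arbitrary here.

```c
#include <stdbool.h>

/* Toy analogue of the ip_summed states; values are illustrative only. */
enum toy_ip_summed {
    TOY_CHECKSUM_NONE,
    TOY_CHECKSUM_UNNECESSARY,
    TOY_CHECKSUM_COMPLETE,
    TOY_CHECKSUM_PARTIAL,
};

/*
 * A looped-back packet marked PARTIAL never crossed a wire, so the
 * receive side can skip verification just as it does for UNNECESSARY.
 */
static bool toy_csum_unnecessary(enum toy_ip_summed s)
{
    return s == TOY_CHECKSUM_UNNECESSARY || s == TOY_CHECKSUM_PARTIAL;
}
```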
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
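
    The idea of storing headers as offsets from the buffer head can be
    sketched as follows. This is a userspace toy, assuming a 32-bit offset
    type; toy_data_t, struct toy_skb and the helper names are hypothetical
    and only mirror the shape of the kernel accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_data_t;      /* hypothetical analogue of sk_buff_data_t */

struct toy_skb {
    unsigned char *head;          /* start of the buffer */
    toy_data_t transport_header;  /* offsets from head, 4 bytes each */
    toy_data_t network_header;
};

/* Convert the stored offset back to a pointer only when one is needed. */
static unsigned char *toy_transport_header(const struct toy_skb *skb)
{
    return skb->head + skb->transport_header;
}

static void toy_set_transport_header(struct toy_skb *skb, const unsigned char *p)
{
    assert(p >= skb->head);
    skb->transport_header = (toy_data_t)(p - skb->head);
}

/* Header length is plain offset arithmetic; no pointer conversion required. */
static uint32_t toy_network_header_len(const struct toy_skb *skb)
{
    return skb->transport_header - skb->network_header;
}
```

    Differences and comparisons between headers work the same way on
    offsets as they did on pointers, which is why most callers are
    unaffected.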
     
  • With this we save 8 bytes per network packet, leaving a 4-byte hole to be used
    in further shrinking work, likely with the offsetization of other pointers
    such as ->{data,tail,end}. The cost is some extra adds, which were minimized by
    the usual practice of setting skb->{mac,nh,h}.raw to a local variable that is
    then accessed multiple times in each function. It is also no more expensive
    than before for most of the handling of such headers, like setting one of
    these headers to another (transport to network, etc), adding to or
    subtracting from it, comparing them, etc.

    Now we have this layout for sk_buff on an x86_64 machine:

    [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
    struct sk_buff {
            struct sk_buff * next;                  /*     0     8 */
            struct sk_buff * prev;                  /*     8     8 */
            struct rb_node rb;                      /*    16    24 */
            struct sock * sk;                       /*    40     8 */
            ktime_t tstamp;                         /*    48     8 */
            struct net_device * dev;                /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            struct net_device * input_dev;          /*    64     8 */
            sk_buff_data_t transport_header;        /*    72     4 */
            sk_buff_data_t network_header;          /*    76     4 */
            sk_buff_data_t mac_header;              /*    80     4 */

            /* XXX 4 bytes hole, try to pack */

            struct dst_entry * dst;                 /*    88     8 */
            struct sec_path * sp;                   /*    96     8 */
            char cb[48];                            /*   104    48 */
            /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
            unsigned int len;                       /*   152     4 */
            unsigned int data_len;                  /*   156     4 */
            unsigned int mac_len;                   /*   160     4 */
            union {
                    __wsum csum;                    /*           4 */
                    __u32 csum_offset;              /*           4 */
            };                                      /*   164     4 */
            __u32 priority;                         /*   168     4 */
            __u8 local_df:1;                        /*   172     1 */
            __u8 cloned:1;                          /*   172     1 */
            __u8 ip_summed:2;                       /*   172     1 */
            __u8 nohdr:1;                           /*   172     1 */
            __u8 nfctinfo:3;                        /*   172     1 */
            __u8 pkt_type:3;                        /*   173     1 */
            __u8 fclone:2;                          /*   173     1 */
            __u8 ipvs_property:1;                   /*   173     1 */

            /* XXX 2 bits hole, try to pack */

            __be16 protocol;                        /*   174     2 */
            void (*destructor)(struct sk_buff *);   /*   176     8 */
            struct nf_conntrack * nfct;             /*   184     8 */
            /* --- cacheline 3 boundary (192 bytes) --- */
            struct sk_buff * nfct_reasm;            /*   192     8 */
            struct nf_bridge_info *nf_bridge;       /*   200     8 */
            __u16 tc_index;                         /*   208     2 */
            __u16 tc_verd;                          /*   210     2 */
            dma_cookie_t dma_cookie;                /*   212     4 */
            __u32 secmark;                          /*   216     4 */
            __u32 mark;                             /*   220     4 */
            unsigned int truesize;                  /*   224     4 */
            atomic_t users;                         /*   228     4 */
            unsigned char * head;                   /*   232     8 */
            unsigned char * data;                   /*   240     8 */
            unsigned char * tail;                   /*   248     8 */
            /* --- cacheline 4 boundary (256 bytes) --- */
            unsigned char * end;                    /*   256     8 */
    }; /* size: 264, cachelines: 5 */
       /* sum members: 260, holes: 1, sum holes: 4 */
       /* bit holes: 1, sum bit holes: 2 bits */
       /* last cacheline: 8 bytes */

    On 32 bits nothing changes, and pointers continue to be used with the compiler
    turning all this abstraction layer into dust. But there are some sk_buff
    validation tricks that are now possible, humm... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
    skb->mac to skb->mac_header, to match the names of the associated helpers
    (skb[_[re]set]_{transport,network,mac}_header).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For consistency with all the other skb->h.raw accessors.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For the quite common 'skb->h.raw - skb->data' sequence.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
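
    A helper for that sequence could look like the sketch below. The
    struct and the name toy_transport_offset are hypothetical; the toy keeps
    the header as a pointer, matching the 'skb->h.raw' style quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer: only the two fields the subtraction needs. */
struct toy_skb {
    unsigned char *data;              /* start of the current payload */
    unsigned char *transport_header;  /* analogue of skb->h.raw */
};

/* Distance of the transport header from data, as one named operation. */
static ptrdiff_t toy_transport_offset(const struct toy_skb *skb)
{
    assert(skb != NULL);
    return skb->transport_header - skb->data;
}
```

    Wrapping the subtraction gives callers one place to change if the
    header representation changes later.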
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This time of the type:

    skb->nh.iph = (struct iphdr *)skb->data;

    That is completely equivalent to:

    skb->nh.raw = skb->data;

    Wonder why people love casts... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
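
    The equivalence claimed above follows from the header field being a
    union of pointers. A minimal sketch, assuming a platform where all
    object pointers share one representation (true of the mainstream ones);
    the struct and function names are hypothetical:

```c
#include <stdint.h>

struct toy_iphdr { uint8_t ver_ihl; uint8_t tos; uint16_t tot_len; };

struct toy_skb {
    unsigned char *data;
    union {
        struct toy_iphdr *iph;  /* typed view of the header */
        unsigned char    *raw;  /* untyped view of the same address */
    } nh;
};

/* The form with the cast ... */
static void toy_set_nh_typed(struct toy_skb *skb)
{
    skb->nh.iph = (struct toy_iphdr *)skb->data;
}

/* ... and the cast-free form store the same address in the same storage. */
static void toy_set_nh_raw(struct toy_skb *skb)
{
    skb->nh.raw = skb->data;
}
```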
     

11 Feb, 2007

1 commit


03 Dec, 2006

6 commits


31 Oct, 2006

2 commits

  • Every time SCTP creates a temporary association, the stack hashes it,
    puts it on a list of endpoint associations and increments the backlog.
    However, the lifetime of a temporary association is the processing time
    of a current packet and it's destroyed after that. In fact, we don't
    really want anyone else finding this association. There is no reason to
    do this extra work.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
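
    The change amounts to an early exit for temporary associations. The
    sketch below models that with hypothetical types and counters; it does
    not reproduce the real hashing or list code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_asoc { bool temp; bool hashed; };
struct toy_ep   { int nr_asocs; int backlog; };

/* Register an association with its endpoint, unless it is temporary. */
static void toy_add_asoc(struct toy_ep *ep, struct toy_asoc *asoc)
{
    assert(ep != NULL && asoc != NULL);
    if (asoc->temp)
        return;          /* lives only for the current packet: stay invisible */

    asoc->hashed = true; /* stands in for hashing and list insertion */
    ep->nr_asocs++;
    ep->backlog++;
}
```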
     
  • I was looking at a RHEL5 bug report involving Xen and SCTP
    (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212550).
    It turns out that SCTP wasn't written to handle skb fragments at
    all. The absence of any calls to skb_may_pull is testament to
    that.

    It just so happens that Xen creates fragmented packets more often
    than other scenarios (header & data split when going from domU to
    dom0). That's what caused this bug to show up.

    Until someone has the time to sit down and audit the entire net/sctp
    directory, here is a conservative and safe solution that simply
    linearises all packets on input.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
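
    What linearising means can be shown with a userspace toy: a packet
    whose bytes are split between a head area and separate fragments is
    copied into one contiguous buffer before anything parses it. The types
    and toy_linearize() are hypothetical; the kernel has its own helper,
    skb_linearize(), for real skbs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

struct toy_frag { const unsigned char *ptr; size_t len; };

struct toy_pkt {
    unsigned char  *head;       /* linear area, heap-allocated */
    size_t          head_len;
    struct toy_frag frags[TOY_MAX_FRAGS];
    size_t          nr_frags;
};

/* Returns 0 on success, -1 on allocation failure (packet left untouched). */
static int toy_linearize(struct toy_pkt *p)
{
    assert(p != NULL && p->head != NULL);
    if (p->nr_frags == 0)
        return 0;                       /* already linear */

    size_t total = p->head_len;
    for (size_t i = 0; i < p->nr_frags; i++)
        total += p->frags[i].len;

    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return -1;

    /* Copy the head, then each fragment, back to back. */
    memcpy(buf, p->head, p->head_len);
    size_t off = p->head_len;
    for (size_t i = 0; i < p->nr_frags; i++) {
        memcpy(buf + off, p->frags[i].ptr, p->frags[i].len);
        off += p->frags[i].len;
    }

    free(p->head);
    p->head = buf;
    p->head_len = total;
    p->nr_frags = 0;
    return 0;
}
```

    After this step, code that assumes every header and chunk is reachable
    through one pointer is safe, at the price of a copy for fragmented
    input.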
     

30 Sep, 2006

2 commits


23 Sep, 2006

2 commits

  • Function sk_filter() is called from the tcp_v{4,6}_rcv() functions with arg
    needlock = 0, while the socket is not locked at that moment. In order to
    avoid this and similar issues in the future, use RCU to protect reads of
    the sk->sk_filter field.

    Signed-off-by: Dmitry Mishin
    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev

    Dmitry Mishin
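
    The read side of this pattern can be approximated in userspace with
    C11 atomics. This sketch covers only the publish and read halves; the
    part of RCU that delays freeing the old filter until all readers have
    finished is not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_filter { int insn_count; };

struct toy_sock {
    _Atomic(struct toy_filter *) filter;  /* analogue of sk->sk_filter */
};

/* Reader: take one snapshot of the pointer and use only that snapshot. */
static int toy_filter_len(struct toy_sock *sk)
{
    struct toy_filter *f =
        atomic_load_explicit(&sk->filter, memory_order_acquire);
    return f ? f->insn_count : 0;
}

/* Writer: publish a fully initialised filter and hand back the old one. */
static struct toy_filter *toy_filter_attach(struct toy_sock *sk,
                                            struct toy_filter *newf)
{
    assert(sk != NULL);
    return atomic_exchange_explicit(&sk->filter, newf, memory_order_acq_rel);
}
```

    Readers need no socket lock here because they never see a half-updated
    pointer; the caller of toy_filter_attach() remains responsible for
    releasing the old filter only once no reader can still hold it.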
     
  • This patch adds more statistics info under /proc/net/sctp/snmp
    that should be useful for debugging. The additional events that
    are counted now include timer expirations, retransmits, packet
    and data chunk discards.

    The data chunk discards cover every case in which a data chunk
    is discarded, including high TSN, bad stream, duplicate TSN and the
    most useful one (out of receive buffer/rwnd).

    Also moved the SCTP MIB data structures from the generic include
    directories to include/sctp/sctp.h.

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Sridhar Samudrala
     

18 Jun, 2006

2 commits


20 May, 2006

2 commits

  • sctp_rcv().

    The goal is to hold the ref on the association/endpoint throughout the
    state-machine process. We accomplish this as follows:

    /* ref on the assoc/ep is taken during lookup */

    if (owned_by_user(sk))
            sctp_add_backlog(skb, sk);
    else
            inqueue_push(skb, sk);

    /* drop the ref on the assoc/ep */

    However, in sctp_add_backlog() we take the ref on assoc/ep and hold it
    while the skb is on the backlog queue. This allows us to get rid of the
    sock_hold/sock_put in the lookup routines.

    Now sctp_backlog_rcv() needs to account for a potential association move.
    In the unlikely event that the association moved, we need to retest whether
    the new socket is locked by the user. If we don't do this, we may have two
    packets racing up the stack toward the same socket and we can't deal with it.
    If the new socket is still locked, we'll just add the skb to its backlog,
    continuing to hold the ref on the association. This gets rid of the
    need to move packets from one backlog to another and it is also safe in
    case new packets arrive on the same backlog queue.

    The last step is to lock the new socket when we are moving the
    association to it. This is needed in case any new packets arrive on
    the association while it is being moved. We want these to go to the
    backlog since we would like to avoid the race between this new packet
    and a packet that may be sitting on the backlog queue of the old socket
    toward the same association.

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: Sridhar Samudrala

    Vladislav Yasevich
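
    The reference counting described in this commit can be traced with a
    small model. The types, counters and function names below are
    hypothetical, and queues are reduced to lengths; the sketch only shows
    which path takes and drops which reference.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_assoc { int refcnt; };
struct toy_sock  { bool owned_by_user; int backlog_len; int inqueue_len; };

static void toy_assoc_hold(struct toy_assoc *a) { a->refcnt++; }

static void toy_assoc_put(struct toy_assoc *a)
{
    assert(a->refcnt > 0);
    a->refcnt--;
}

/* Receive path: the lookup has already taken one reference on the assoc. */
static void toy_rcv(struct toy_assoc *a, struct toy_sock *sk)
{
    if (sk->owned_by_user) {
        toy_assoc_hold(a);   /* held for as long as the skb sits on the backlog */
        sk->backlog_len++;
    } else {
        sk->inqueue_len++;   /* processed right away */
    }
    toy_assoc_put(a);        /* drop the lookup reference */
}

/* Backlog processing, run when the user releases the socket. */
static void toy_backlog_rcv(struct toy_assoc *a, struct toy_sock *sk)
{
    assert(sk->backlog_len > 0);
    sk->backlog_len--;
    sk->inqueue_len++;
    toy_assoc_put(a);        /* drop the reference taken for the backlog */
}
```

    Because the backlogged skb carries its own reference, the association
    cannot disappear between queuing and processing, which is what makes
    the separate socket hold in the lookup unnecessary.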
     
  • Also fix some other cases where sk_err is not set for 1-1 style sockets.

    Signed-off-by: Sridhar Samudrala

    Sridhar Samudrala
     

25 Mar, 2006

1 commit

  • I was working on the ipip/xfrm problem and as usual I get side-tracked by
    other problems.

    As part of an attempt to change the IPv4 protocol handler calling
    convention I found that SCTP violated the existing convention.

    It's returning non-zero values after freeing the skb. This is doubly bad
    as 1) the skb gets resubmitted; 2) the return value is interpreted as a
    protocol number.

    This patch changes those return values to zero.

    IPv6 doesn't suffer from this problem because it uses a positive return
    value as an indication for resubmission. So the only effect of this patch
    there is to increment the IPSTATS_MIB_INDELIVERS counter which IMHO is
    the right thing to do.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
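
    The two failure modes listed above can be modelled in a few lines. The
    sketch assumes the IPv4 convention is that a negative handler return
    asks the dispatcher to resubmit the packet under protocol number -ret;
    the dispatcher, result struct and handlers are hypothetical.

```c
/* Signature of a toy protocol handler: 0 means "consumed". */
typedef int (*toy_handler_t)(void);

struct toy_result {
    int delivered;       /* 1 if counted as delivered */
    int resubmit_proto;  /* non-zero: packet resubmitted under this protocol */
};

/* Toy IPv4 dispatcher following the assumed convention. */
static struct toy_result toy_ip_deliver(toy_handler_t handler)
{
    struct toy_result r = { 0, 0 };
    int ret = handler();

    if (ret < 0)
        r.resubmit_proto = -ret;  /* the (already freed) skb would be reused */
    else
        r.delivered = 1;
    return r;
}

/* Before the patch: the skb is freed, yet a non-zero value is returned. */
static int toy_buggy_handler(void) { return -1; }

/* After the patch: freeing the skb goes together with returning zero. */
static int toy_fixed_handler(void) { return 0; }
```

    With the buggy handler the dispatcher both touches a freed packet and
    treats 1 as a protocol number; returning zero avoids both.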
     

18 Jan, 2006

3 commits


08 Jan, 2006

1 commit


04 Jan, 2006

1 commit


12 Nov, 2005

1 commit


30 Aug, 2005

1 commit


19 Jul, 2005

1 commit


09 Jul, 2005

1 commit


21 Jun, 2005

1 commit