21 Oct, 2005

3 commits

  • Turns out the problem has nothing to do with use-after-free or double-free.
    It's just that we're not clearing the CB area, and DCCP, unlike TCP, uses a
    CB format that's incompatible with IP.
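
    A minimal sketch of the kind of fix described above (the call site and
    surrounding code are assumptions for illustration, not the actual diff):
    wipe the skb control block before the packet is handed to the IP layer,
    so IP never interprets DCCP's CB layout.

        /* Sketch only: DCCP's CB layout is not IP-compatible, so clear
         * the CB area before letting ip_queue_xmit() use it. */
        memset(skb->cb, 0, sizeof(skb->cb));
        err = ip_queue_xmit(skb, 0);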

    Signed-off-by: Herbert Xu
    Signed-off-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
     
  • icmp_send doesn't use skb->sk at all so even if skb->sk has already
    been freed it can't cause crash there (it would've crashed somewhere
    else first, e.g., ip_queue_xmit).

    I found a double-free on an skb that could explain this though.
    dccp_sendmsg and dccp_write_xmit are a little confused as to what
    should free the packet when something goes wrong. Sometimes they
    both go for the ball and end up in each other's way.

    This patch makes dccp_write_xmit always free the packet no matter
    what. This makes sense since dccp_transmit_skb always frees the
    packet, which in turn comes from the fact that ip_queue_xmit always
    frees the packet.
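
    A rough caller-side sketch of the ownership rule this establishes
    (hypothetical code, including the dccp_write_xmit() arguments; not
    the actual diff): once the skb is handed down it is always consumed,
    so the caller must not free it again on error.

        /* Sketch only: dccp_write_xmit() now frees the skb on every
         * path, success or failure, mirroring ip_queue_xmit(), so
         * dccp_sendmsg() must not kfree_skb() it afterwards. */
        err = dccp_write_xmit(sk, skb, &timeo);
        if (err != 0)
                goto out_release;       /* skb already freed below us */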

    Signed-off-by: Herbert Xu
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
     
  • David S. Miller wrote:
    > One thing you can probably do for this bug is to mark data packets
    > explicitly somehow, perhaps in the SKB control block DCCP already
    > uses for other data. Put some boolean in there, set it true for
    > data packets. Then change the test in dccp_transmit_skb() as
    > appropriate to test the boolean flag instead of "skb_cloned(skb)".

    I agree. In fact we already have that flag, it's called skb->sk.
    So here is a patch to test that instead of skb_cloned().
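
    A sketch of the substituted test (the surrounding context in
    dccp_transmit_skb() is assumed, not taken from the actual diff): per
    the exchange above, packets queued by dccp_sendmsg() carry a socket
    reference in skb->sk while stack-generated control packets do not,
    so skb->sk already serves as the "data packet" boolean.

        /* Sketch only: flag data packets via skb->sk */
        const int is_data_packet = (skb->sk != NULL);   /* was: skb_cloned(skb) */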

    Signed-off-by: Herbert Xu
    Acked-by: Ian McDonald
    Signed-off-by: Arnaldo Carvalho de Melo

    Herbert Xu
     

11 Oct, 2005

2 commits


09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc's point of view we replaced unsigned int with a
    typedef) and documents what's going on far better.
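
    A hedged illustration of the change (the foo_alloc() prototype below
    is made up for the example; only the typedef comes from the commit):
    the sparse annotation moves into the type name, so declarations stay
    readable while sparse still checks the flag arguments.

        typedef unsigned int __nocast gfp_t;

        /* before: void *foo_alloc(size_t len, unsigned int __nocast gfp);
         * after:                                                          */
        void *foo_alloc(size_t len, gfp_t gfp);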

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

04 Oct, 2005

1 commit

  • Arnaldo and I agreed it could be applied now, because I have other
    pending patches depending on this one (Thank you Arnaldo)

    (The other important patch moves skc_refcnt in a separate cache line,
    so that SMP/NUMA performance doesn't suffer from cache-line ping-pongs)

    1) First, some performance data:
    --------------------------------

    tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

    The most time-critical code is:

    sk_for_each(sk, node, &head->chain) {
            if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
                    goto hit; /* You sunk my battleship! */
    }

    The sk_for_each() does use prefetch() hints, but only the beginning of
    "struct sock" is prefetched.

    As INET_MATCH's first comparison uses inet_sk(__sk)->daddr, which is far
    away from the beginning of "struct sock", it has to bring a cold cache
    line into the CPU cache. Each iteration has to use at least 2 cache
    lines.

    This can be problematic if some chains are very long.

    2) The goal
    -----------

    The idea I had is to change things so that INET_MATCH() may return
    FALSE in 99% of cases using only data already in the CPU cache,
    i.e. one cache line per iteration.

    3) Description of the patch
    ---------------------------

    Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
    filling a 32-bit hole on 64-bit platforms.

    struct sock_common {
            unsigned short          skc_family;
            volatile unsigned char  skc_state;
            unsigned char           skc_reuse;
            int                     skc_bound_dev_if;
            struct hlist_node       skc_node;
            struct hlist_node       skc_bind_node;
            atomic_t                skc_refcnt;
    +       unsigned int            skc_hash;
            struct proto            *skc_prot;
    };

    Store the full hash in this 32-bit field, not masked by (ehash_size - 1).
    Using this full hash as the first comparison done in INET_MATCH permits
    us to immediately skip the element without touching a second cache line
    in case of a miss.

    Remove the sk_hashent/tw_hashent fields, since skc_hash (aliased to
    sk_hash and tw_hash) already contains the slot number if we mask it
    with (ehash_size - 1).
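
    In other words (a sketch using the names above), the chain slot that
    sk_hashent used to cache can still be derived on demand from the full
    hash:

        /* Sketch only: recover the hash-table slot from the full hash */
        unsigned int slot = sk->sk_hash & (ehash_size - 1);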

    File include/net/inet_hashtables.h

    64-bit platforms:
    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) && \
             ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie)) && \
             ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    32-bit platforms:
    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
            (((__sk)->sk_hash == (__hash)) && \
             (inet_sk(__sk)->daddr == (__saddr)) && \
             (inet_sk(__sk)->rcv_saddr == (__daddr)) && \
             (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    - Adds a prefetch(head->chain.first) in
    __inet_lookup_established()/__tcp_v4_check_established() and
    __inet6_lookup_established()/__tcp_v6_check_established() and
    __dccp_v4_check_established() to bring the first element of the list
    into cache before the {read|write}_lock(&head->lock);
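
    A sketch of where that prefetch lands (surrounding code abridged and
    assumed, not the actual diff):

        prefetch(head->chain.first);    /* warm the chain head early...      */
        read_lock(&head->lock);         /* ...before walking head->chain     */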

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Sep, 2005

4 commits


17 Sep, 2005

2 commits


14 Sep, 2005

2 commits


13 Sep, 2005

1 commit


10 Sep, 2005

3 commits


09 Sep, 2005

9 commits


30 Aug, 2005

12 commits