13 May, 2006

1 commit

  • The classical IP over ATM code maintains its own IPv4
    ARP table, using the standard neighbour-table code. The
    neigh_table_init function adds this neighbour table to a linked list
    of all neighbor tables which is used by the functions neigh_delete()
    neigh_add() and neightbl_set(), all called by the netlink code.

    Once the ATM neighbour table is added to the list, there are two
    tables with family == AF_INET there, and ARP entries sent via netlink
    go into the first table with matching family. This is indeterminate
    and often wrong.

    To see the bug, on a kernel with CLIP enabled, create a standard IPv4
    ARP entry by pinging an unused address on a local subnet. Then attempt
    to complete that entry by doing

    ip neigh replace lladdr nud reachable

    Looking at the ARP tables by using

    ip neigh show

    will reveal two ARP entries for the same address. One of these can be
    found in /proc/net/arp, and the other in /proc/net/atm/arp.

    This patch adds a new function, neigh_table_init_no_netlink() which
    does everything the neigh_table_init() does, except add the table to
    the netlink all-arp-tables chain. In addition neigh_table_init() has a
    check that all tables on the chain have a distinct address family.
    The init call in clip.c is changed to call
    neigh_table_init_no_netlink().

    Since ATM ARP tables are rather more complicated than can currently be
    handled by the available rtattrs in the netlink protocol, no
    functionality is lost by this patch, and non-ATM ARP manipulation via
    netlink is rescued. A more complete solution would involve a rtattr
    for ATM ARP entries and some way for the netlink code to give
    neigh_add and friends more information than just address family with
    which to find the correct ARP table.

    [ I've changed the assertion checking in neigh_table_init() to not
    use BUG_ON() while holding neigh_tbl_lock. Instead we remember that
    we found an existing tbl with the same family, and after dropping
    the lock we'll give a diagnostic kernel log message and a stack dump.
    -DaveM ]

    Signed-off-by: Simon Kelley
    Signed-off-by: David S. Miller

    Simon Kelley
     

15 Apr, 2006

6 commits


21 Mar, 2006

3 commits


05 Mar, 2006

1 commit


14 Feb, 2006

1 commit


12 Jan, 2006

2 commits


11 Jan, 2006

1 commit


07 Jan, 2006

1 commit


04 Jan, 2006

1 commit

  • I noticed that some of 'struct proto_ops' used in the kernel may share
    a cache line used by locks or other heavily modified data. (default
    linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
    least)

    This patch makes sure a 'struct proto_ops' can be declared as const,
    so that all cpus can share all parts of it without false sharing.

    This is not mandatory : a driver can still use a read/write structure
    if it needs to (and eventually a __read_mostly)

    I made a global stubstitute to change all existing occurences to make
    them const.

    This should reduce the possibility of false sharing on SMP, and
    speedup some socket system calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Nov, 2005

5 commits


09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

08 Oct, 2005

1 commit


07 Oct, 2005

1 commit


05 Oct, 2005

1 commit

  • Fix implicit nocast warnings in atm code:
    net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type
    drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type

    Also use kzalloc() instead of kmalloc().

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Randy Dunlap
     

04 Oct, 2005

2 commits

  • The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
    introduces __in_dev_get_rcu() to cover the second case.

    1) RCU with refcnt should use in_dev_get().
    2) RCU without refcnt should use __in_dev_get_rcu().
    3) All others must hold RTNL and use __in_dev_get_rtnl().

    There is one exception in net/ipv4/route.c which is in fact a pre-existing
    race condition. I've marked it as such so that we remember to fix it.

    This patch is based on suggestions and prior work by Suzanne Wood and
    Paul McKenney.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Arnaldo and I agreed it could be applied now, because I have other
    pending patches depending on this one (Thank you Arnaldo)

    (The other important patch moves skc_refcnt in a separate cache line,
    so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

    1) First some performance data :
    --------------------------------

    tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

    The most time critical code is :

    sk_for_each(sk, node, &head->chain) {
    if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
    goto hit; /* You sunk my battleship! */
    }

    The sk_for_each() does use prefetch() hints but only the begining of
    "struct sock" is prefetched.

    As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
    away from the begining of "struct sock", it has to bring into CPU
    cache cold cache line. Each iteration has to use at least 2 cache
    lines.

    This can be problematic if some chains are very long.

    2) The goal
    -----------

    The idea I had is to change things so that INET_MATCH() may return
    FALSE in 99% of cases only using the data already in the CPU cache,
    using one cache line per iteration.

    3) Description of the patch
    ---------------------------

    Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
    filling a 32 bits hole on 64 bits platform.

    struct sock_common {
    unsigned short skc_family;
    volatile unsigned char skc_state;
    unsigned char skc_reuse;
    int skc_bound_dev_if;
    struct hlist_node skc_node;
    struct hlist_node skc_bind_node;
    atomic_t skc_refcnt;
    + unsigned int skc_hash;
    struct proto *skc_prot;
    };

    Store in this 32 bits field the full hash, not masked by (ehash_size -
    1) Using this full hash as the first comparison done in INET_MATCH
    permits us immediatly skip the element without touching a second cache
    line in case of a miss.

    Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
    sk_hash and tw_hash) already contains the slot number if we mask with
    (ehash_size - 1)

    File include/net/inet_hashtables.h

    64 bits platforms :
    #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
    (((__sk)->sk_hash == (__hash))
    ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \
    ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
    (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    32bits platforms:
    #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
    (((__sk)->sk_hash == (__hash)) && \
    (inet_sk(__sk)->daddr == (__saddr)) && \
    (inet_sk(__sk)->rcv_saddr == (__daddr)) && \
    (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

    - Adds a prefetch(head->chain.first) in
    __inet_lookup_established()/__tcp_v4_check_established() and
    __inet6_lookup_established()/__tcp_v6_check_established() and
    __dccp_v4_check_established() to bring into cache the first element of the
    list, before the {read|write}_lock(&head->lock);

    Signed-off-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Sep, 2005

2 commits


29 Sep, 2005

3 commits


10 Sep, 2005

1 commit


06 Sep, 2005

1 commit


30 Aug, 2005

1 commit

  • Remove the "list" member of struct sk_buff, as it is entirely
    redundant. All SKB list removal callers know which list the
    SKB is on, so storing this in sk_buff does nothing other than
    taking up some space.

    Two tricky bits were SCTP, which I took care of, and two ATM
    drivers which Francois Romieu fixed
    up.

    Signed-off-by: David S. Miller
    Signed-off-by: Francois Romieu

    David S. Miller
     

20 Jul, 2005

2 commits


13 Jul, 2005

1 commit


12 Jul, 2005

1 commit

  • Move the protocol specific config options out to the specific protocols.
    With this change net/Kconfig now starts to become readable and serve as a
    good basis for further re-structuring.

    The menu structure is left almost intact, except that indention is
    fixed in most cases. Most visible are the INET changes where several
    "depends on INET" are replaced with a single ifdef INET / endif pair.

    Several new files were created to accomplish this change - they are
    small but serve the purpose that config options are now distributed
    out where they belongs.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Sam Ravnborg