06 Oct, 2010

4 commits

  • fib_lookup() is converted to be called in an RCU-protected context, so no
    reference is taken and released on a contended cache line (fib_clntref).

    fib_table_lookup() and fib_semantic_match() get an additional parameter.

    struct fib_info gets an rcu_head field and is freed after an RCU grace
    period.

    Stress test:
    (Sending 160,000,000 UDP frames to the same neighbour,
    IP route cache disabled, dual E5540 @2.53GHz,
    32-bit kernel, FIB_HASH) (about the same results for FIB_TRIE)

    Before patch:

    real 1m31.199s
    user 0m13.761s
    sys 23m24.780s

    After patch:

    real 1m5.375s
    user 0m14.997s
    sys 15m50.115s
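    To put the timings above in relative terms, the small C helper below converts the time(1) readings to seconds and computes the percentage reduction. It only restates the quoted numbers. The helper names are illustrative and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style "XmY.Zs" reading (minutes, seconds) to seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage reduction going from 'before' to 'after'. */
static double reduction_pct(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Print the real and sys reductions for the fib_lookup() RCU stress test. */
void print_fib_rcu_gains(void)
{
    double real_before = to_seconds(1, 31.199), real_after = to_seconds(1, 5.375);
    double sys_before = to_seconds(23, 24.780), sys_after = to_seconds(15, 50.115);

    printf("real: %.1f%% less\n", reduction_pct(real_before, real_after));
    printf("sys:  %.1f%% less\n", reduction_pct(sys_before, sys_after));
}
```

    With the figures above, this reports roughly 28% less real time and 32% less system time after the patch.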

    Before patch profile:

    13044.00 15.4% __ip_route_output_key vmlinux
    8438.00 10.0% dst_destroy vmlinux
    5983.00 7.1% fib_semantic_match vmlinux
    5410.00 6.4% fib_rules_lookup vmlinux
    4803.00 5.7% neigh_lookup vmlinux
    4420.00 5.2% _raw_spin_lock vmlinux
    3883.00 4.6% rt_set_nexthop vmlinux
    3261.00 3.9% _raw_read_lock vmlinux
    2794.00 3.3% fib_table_lookup vmlinux
    2374.00 2.8% neigh_resolve_output vmlinux
    2153.00 2.5% dst_alloc vmlinux
    1502.00 1.8% _raw_read_lock_bh vmlinux
    1484.00 1.8% kmem_cache_alloc vmlinux
    1407.00 1.7% eth_header vmlinux
    1406.00 1.7% ipv4_dst_destroy vmlinux
    1298.00 1.5% __copy_from_user_ll vmlinux
    1174.00 1.4% dev_queue_xmit vmlinux
    1000.00 1.2% ip_output vmlinux

    After patch profile:

    13712.00 15.8% dst_destroy vmlinux
    8548.00 9.9% __ip_route_output_key vmlinux
    7017.00 8.1% neigh_lookup vmlinux
    4554.00 5.3% fib_semantic_match vmlinux
    4067.00 4.7% _raw_read_lock vmlinux
    3491.00 4.0% dst_alloc vmlinux
    3186.00 3.7% neigh_resolve_output vmlinux
    3103.00 3.6% fib_table_lookup vmlinux
    2098.00 2.4% _raw_read_lock_bh vmlinux
    2081.00 2.4% kmem_cache_alloc vmlinux
    2013.00 2.3% _raw_spin_lock vmlinux
    1763.00 2.0% __copy_from_user_ll vmlinux
    1763.00 2.0% ip_output vmlinux
    1761.00 2.0% ipv4_dst_destroy vmlinux
    1631.00 1.9% eth_header vmlinux
    1440.00 1.7% _raw_read_unlock_bh vmlinux

    Reference results, if the IP route cache is enabled:

    real 0m29.718s
    user 0m10.845s
    sys 7m37.341s

    25213.00 29.5% __ip_route_output_key vmlinux
    9011.00 10.5% dst_release vmlinux
    4817.00 5.6% ip_push_pending_frames vmlinux
    4232.00 5.0% ip_finish_output vmlinux
    3940.00 4.6% udp_sendmsg vmlinux
    3730.00 4.4% __copy_from_user_ll vmlinux
    3716.00 4.4% ip_route_output_flow vmlinux
    2451.00 2.9% __xfrm_lookup vmlinux
    2221.00 2.6% ip_append_data vmlinux
    1718.00 2.0% _raw_spin_lock_bh vmlinux
    1655.00 1.9% __alloc_skb vmlinux
    1572.00 1.8% sock_wfree vmlinux
    1345.00 1.6% kfree vmlinux

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Allow sysadmins to configure the number of multicast
    membership reports sent on a link failure event.

    Signed-off-by: Flavio Leitner
    Signed-off-by: David S. Miller

    Flavio Leitner
     
  • David

    This is the first step for RCU conversion of neigh code.

    Next patches will convert hash_buckets[] and "struct neighbour" to RCU
    protected objects.

    Thanks

    [PATCH net-next] net neigh: RCU conversion of neigh hash table

    Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
    neigh_table", a new structure is defined :

    struct neigh_hash_table {
        struct neighbour **hash_buckets;
        unsigned int hash_mask;
        __u32 hash_rnd;
        struct rcu_head rcu;
    };

    And "struct neigh_table" has an RCU protected pointer to such a
    neigh_hash_table.

    This means the signature of the (*hash)() function changes: we need to add a
    third parameter with the actual hash_rnd value, since it is no longer a
    neigh_table field.
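    The userspace sketch below shows why the seed has to travel with the table. The bucket index is computed only from fields of one neigh_hash_table, so a reader that grabs a single table pointer gets a consistent mask and seed. toy_neigh_hash(), neigh_hash_alloc(), neigh_bucket() and neigh_hash_free() are hypothetical. The mixing function is a toy and is not the kernel's hash. The rcu field and RCU publication/freeing are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct neighbour;  /* opaque in this sketch */

/* Same data fields as the new struct neigh_hash_table (rcu_head omitted). */
struct neigh_hash_table {
    struct neighbour **hash_buckets;
    unsigned int hash_mask;  /* bucket count - 1; bucket count is a power of two */
    uint32_t hash_rnd;       /* per-table random seed */
};

/* Hypothetical (*hash)() callback: hash_rnd is now an explicit third
 * parameter because it is no longer a neigh_table field. Toy mixing only. */
static uint32_t toy_neigh_hash(uint32_t key, int ifindex, uint32_t hash_rnd)
{
    uint32_t h = key ^ hash_rnd;
    h ^= (uint32_t)ifindex * 0x9e3779b1u;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    return h;
}

/* Allocate a table with 2^shift empty buckets and the given seed. */
static struct neigh_hash_table *neigh_hash_alloc(unsigned int shift, uint32_t rnd)
{
    struct neigh_hash_table *nht = malloc(sizeof(*nht));
    if (!nht)
        return NULL;
    nht->hash_buckets = calloc(1u << shift, sizeof(*nht->hash_buckets));
    if (!nht->hash_buckets) {
        free(nht);
        return NULL;
    }
    nht->hash_mask = (1u << shift) - 1;
    nht->hash_rnd = rnd;
    return nht;
}

static void neigh_hash_free(struct neigh_hash_table *nht)
{
    if (nht) {
        free(nht->hash_buckets);
        free(nht);
    }
}

/* Bucket index: the seed and the mask both come from the same table. */
static unsigned int neigh_bucket(const struct neigh_hash_table *nht,
                                 uint32_t key, int ifindex)
{
    assert(nht != NULL);
    return toy_neigh_hash(key, ifindex, nht->hash_rnd) & nht->hash_mask;
}
```

    When the table grows, a new table with its own seed can replace the old one as a unit. This is the property the RCU-protected pointer relies on.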

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In various situations, a device provides a packet to our stack and we
    drop it before it enters the protocol stack:
    - softnet backlog full (accounted in /proc/net/softnet_stat)
    - bad vlan tag (not accounted)
    - unknown/unregistered protocol (not accounted)

    We can maintain a per-device counter of such dropped frames at core level
    and automatically add it to the device-provided stats (rx_dropped), so
    that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).

    This is a generalization of commit 8990f468a (net: rx_dropped
    accounting), thus reverting it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2010

5 commits


04 Oct, 2010

3 commits

  • While doing stress tests with the IP route cache disabled and multiqueue
    devices, I noticed very high contention on one rwlock used in the
    neighbour code.

    When many CPUs are trying to send frames (possibly using a high
    performance multiqueue device) to the same neighbour, they fight for the
    neigh->lock rwlock in order to call neigh_hh_init(), and contend on
    hh->hh_refcnt (a pair of atomic_inc()/atomic_dec_and_test() calls).

    But we don't need to call neigh_hh_init() for dst entries that are used
    only once. It costs at least four atomic operations on two contended cache
    lines, plus the high contention on the neigh->lock rwlock.

    Introduce a new dst flag, DST_NOCACHE, that is set when a dst was not
    inserted in the route cache.

    With the stress test bench, sending 160000000 frames on one neighbour,
    results are :

    Before patch:

    real 2m28.406s
    user 0m11.781s
    sys 36m17.964s

    After patch:

    real 1m26.532s
    user 0m12.185s
    sys 20m3.903s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David S. Miller
     
  • Use RCU & RTNL protection for mfc_cache_array[]

    ipmr_cache_find() is called under rcu_read_lock().

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Oct, 2010

3 commits


30 Sep, 2010

4 commits


29 Sep, 2010

1 commit

  • This patch allows a host to be configured to respond to any address in
    a specified range as if it were local, without actually needing to
    configure the address on an interface. This is done through routing
    table configuration. For instance, to configure a host to respond
    to any address in 10.1/16 received on eth0 as a local address we can do:

    ip rule add from all iif eth0 lookup 200
    ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

    This host is now reachable by any 10.1/16 address (route lookup on
    input for packets received on eth0 can find the route). On output, the
    rule will not be matched so that this host can still send packets to
    10.1/16 (not sent on loopback). Presumably, external routing can be
    configured to make sense out of this.

    To make this work, we needed to modify the logic for finding the
    interface that is assigned a given source address for output
    (dev_ip_find). We perform a normal fib_lookup instead of just a
    lookup on the local table, and in the lookup we ignore the input
    interface for matching.

    This patch is useful to implement IP-anycast for subnets of virtual
    addresses.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

28 Sep, 2010

8 commits

  • This sets the number of active queues on a net device to match another.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • For RPS, we create a kobject for each RX queue based on the number of
    queues passed to alloc_netdev_mq(). However, drivers generally do not
    determine the number of hardware queues to use until much later, so
    this usually represents the maximum number the driver may use and not
    the actual number in use.

    For TX queues, drivers can update the actual number using
    netif_set_real_num_tx_queues(). Add a corresponding function for RX
    queues, netif_set_real_num_rx_queues().

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • Tunnels are going to use percpu for their accounting.

    They are going to use a new tstats field in net_device.

    skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()

    IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The Phonet stack assumes the presence of a Pipe Controller, either in the
    modem or in Application Processing Engine user-space, for the Pipe data.
    Nokia Slim Modems like the WG2.5 used in the ST-Ericsson U8500 platform do
    not implement a Pipe Controller.
    This patch adds a Pipe Controller implementation to the Phonet stack to
    support Pipe data over Phonet for Nokia Slim Modems.

    Signed-off-by: Kumar Sanghvi
    Acked-by: Linus Walleij
    Signed-off-by: David S. Miller

    Kumar Sanghvi
     
  • Fixes kernel bugzilla #16603

    tcp_sendmsg() truncates iov_len to an 'int', which causes a 4GB write to
    write zero bytes, for example.

    There is also the problem higher up of how verify_iovec() works. It
    wants to prevent the total length from looking like an error return
    value.

    However, it does this using 'int', but syscalls return 'long' (and
    thus signed 64-bit on 64-bit machines), so as written it could trigger
    false positives on 64-bit. So fix it to use 'long'.
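    The truncation is easy to reproduce in isolation. A 4 GiB length has all-zero low 32 bits, and a 3 GiB length has bit 31 set, so it turns negative and looks like an error return. This assumes a 32-bit int and the modular conversion GCC and Clang use. Converting an out-of-range value to a signed type is implementation-defined in C. len_as_int() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

_Static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

#define LEN_4G (UINT64_C(1) << 32)  /* low 32 bits are zero */
#define LEN_3G (UINT64_C(3) << 30)  /* bit 31 set */

/* Store a 64-bit length in an 'int', as a truncating code path would.
 * GCC/Clang keep the low-order 32 bits (implementation-defined in C). */
static int len_as_int(uint64_t len)
{
    return (int)len;
}
```

    Under these assumptions, len_as_int(LEN_4G) is 0 and len_as_int(LEN_3G) is -1073741824.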

    Reported-by: Olaf Bonorden
    Reported-by: Daniel Büse
    Reported-by: Andrew Morton
    Signed-off-by: David S. Miller

    David S. Miller
     
  • as done in ip_route_connect()

    Signed-off-by: Ulrich Weber
    Signed-off-by: David S. Miller

    Ulrich Weber
     
  • commit 8c0c709eea5cbab97fb464cd68b06f24acc58ee1
    Author: Johannes Berg
    Date: Wed Nov 25 17:46:15 2009 +0100

    mac80211: move cmntr flag out of rx flags

    moved the CMNTR flag into the skb RX flags for
    some aggregation cleanups, but this was wrong
    since the optimisation this flag tried to make
    requires that it is kept across the processing
    of multiple interfaces -- which isn't true for
    flags in the skb. The patch not only broke the
    optimisation, it also introduced a bug: under
    some (common!) circumstances the flag will be
    set on an already freed skb!

    However, investigating this in more detail, I
    found that most of the flags that we set should
    be per packet, _except_ for this one, due to
    a-MPDU processing. Additionally, the flags used
    for processing (currently just this one) need
    to be reset before processing a new packet.

    Since we haven't actually seen bugs reported as
    a result of the wrong flags handling (which is
    not too surprising -- the only real bug case I
    can come up with is an a-MSDU contained in an
    a-MPDU), I'll make a different fix for rc.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     
  • The old ieee80211_find_sta_by_hw method didn't properly
    find VIFs when there was more than one per AP. This caused
    AMPDU logic in ath9k to get the wrong VIF when trying to
    account for transmitted SKBs.

    This patch changes ieee80211_find_sta_by_hw to take a
    localaddr argument to distinguish between VIFs with the
    same AP but different local addresses. The method name
    is changed to ieee80211_find_sta_by_ifaddr.
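    The difference between the two lookups can be modeled with a flat table. Matching on the peer (AP) address alone returns the first VIF's station. Adding the local address selects the right one. The structures and functions here are hypothetical. Treating a NULL localaddr as "any VIF" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical station entry: peer address plus the local VIF address. */
struct sta_model {
    unsigned char addr[ETH_ALEN];
    unsigned char vif_addr[ETH_ALEN];
};

/* Old behaviour: peer address only, so the first matching VIF wins. */
static const struct sta_model *find_sta_by_peer(const struct sta_model *tbl,
                                                size_t n,
                                                const unsigned char *addr)
{
    assert(addr != NULL);
    for (size_t i = 0; i < n; i++)
        if (memcmp(tbl[i].addr, addr, ETH_ALEN) == 0)
            return &tbl[i];
    return NULL;
}

/* New behaviour: also match the local address (NULL means any VIF). */
static const struct sta_model *find_sta_by_ifaddr(const struct sta_model *tbl,
                                                  size_t n,
                                                  const unsigned char *addr,
                                                  const unsigned char *localaddr)
{
    assert(addr != NULL);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].addr, addr, ETH_ALEN) != 0)
            continue;
        if (localaddr && memcmp(tbl[i].vif_addr, localaddr, ETH_ALEN) != 0)
            continue;
        return &tbl[i];
    }
    return NULL;
}
```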

    Signed-off-by: Ben Greear
    Acked-by: Johannes Berg
    Signed-off-by: John W. Linville

    Ben Greear
     

27 Sep, 2010

6 commits

  • Conflicts:
    drivers/net/qlcnic/qlcnic_init.c
    net/ipv4/ip_output.c

    David S. Miller
     
  • Clean up a missing exit path in the ipv6 module init routines. In
    addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
    for the ipv6_addr_label_ops structure. But if module loading fails, or if the
    ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
    which leaves a now-bogus address on the pernet_list, leading to oopses in
    subsequent registrations. This patch cleans up both the failed load path and
    the unload path. Tested by myself with good results.
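    The fix follows the usual kernel unwind pattern. Every registration done during init gets a matching unregistration on the failure path and on module exit. In the sketch below, label_init_model() and label_cleanup_model() stand in for the register_pernet_subsys()/unregister_pernet_subsys() pair, and later_step_model() is a hypothetical init step that can fail. None of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks whether the (hypothetical) label ops are on the pernet list. */
static bool label_ops_registered;

static int label_init_model(void)     { label_ops_registered = true; return 0; }
static void label_cleanup_model(void) { label_ops_registered = false; }

/* Hypothetical later init step that can fail. */
static int later_step_model(bool fail) { return fail ? -1 : 0; }

/* Module init: undo the registration if a later step fails. */
static int addrconf_init_model(bool later_fails)
{
    assert(!label_ops_registered);  /* never register twice */
    int err = label_init_model();
    if (err < 0)
        return err;

    err = later_step_model(later_fails);
    if (err < 0)
        goto out_label;
    return 0;

out_label:
    label_cleanup_model();
    return err;
}

/* Module exit: the matching unregistration. */
static void addrconf_cleanup_model(void)
{
    label_cleanup_model();
}
```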

    Signed-off-by: Neil Horman

    include/net/addrconf.h | 1 +
    net/ipv6/addrconf.c | 11 ++++++++---
    net/ipv6/addrlabel.c | 5 +++++
    3 files changed, 14 insertions(+), 3 deletions(-)
    Signed-off-by: David S. Miller

    Neil Horman
     
  • The loopback driver uses dev->ml_priv to store its percpu stats pointer.
    It uses ugly casts "(void __percpu __force *)" to silence sparse
    complaints.

    Define a union to better document that we use ml_priv in the loopback
    driver, and define an lstats field with appropriate types.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The current SOCK_MIN_RCVBUF value is 256 bytes.

    It does not allow receiving the smallest possible frame, since socket
    sk_rmem_alloc/sk_rcvbuf accounting uses skb truesize. On 64-bit arches,
    sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
    headroom, and we go over the limit.

    With old kernels and 32-bit arches, we were under the limit if the
    network driver was doing copybreak.
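    The arithmetic is easy to check. The sketch uses only the three figures quoted above and treats 240 + 64 + payload as a lower bound on truesize. This is a simplification, since real truesize includes further overhead. The macro and helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define SOCK_MIN_RCVBUF_OLD 256  /* value quoted above */
#define SKB_STRUCT_64BIT    240  /* sizeof(struct sk_buff) on 64-bit, as quoted */
#define TYPICAL_HEADROOM     64  /* typical headroom, as quoted */

/* Lower bound on truesize for a frame carrying 'data_len' payload bytes. */
static unsigned int min_truesize(unsigned int data_len)
{
    return SKB_STRUCT_64BIT + TYPICAL_HEADROOM + data_len;
}

/* Could a receive buffer of 'rcvbuf' bytes account for one such skb? */
static bool fits(unsigned int rcvbuf, unsigned int data_len)
{
    assert(rcvbuf > 0);
    return min_truesize(data_len) <= rcvbuf;
}
```

    Even with an empty payload, the 304-byte minimum already exceeds 256.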

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Reset queue mapping when an skb is reentering the stack via a tunnel.
    On second pass, the queue mapping from the original device is no
    longer valid.

    Signed-off-by: Tom Herbert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
    offset and size fields. This patch saves 72 bytes per skb on i386, or
    128 bytes after rounding.
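    One way to arrive at the 72-byte figure: each fragment descriptor shrinks from a pointer plus two 32-bit fields to a pointer plus two 16-bit fields. That saves 4 bytes per slot on i386. With 18 fragment slots per skb (assumed here: 65536 / 4096 + 2 with 4 KiB pages), this gives 72 bytes. The sketch models the 32-bit pointer as uint32_t so the sizes match i386 on any host. The struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386-like model: a page pointer is 32 bits wide. */
typedef uint32_t page_ptr32;

/* Fragment descriptor with 32-bit offset/size fields. */
struct frag_wide {
    page_ptr32 page;
    uint32_t page_offset;
    uint32_t size;
};

/* Same descriptor with 16-bit fields (valid when PAGE_SIZE < 65536). */
struct frag_narrow {
    page_ptr32 page;
    uint16_t page_offset;
    uint16_t size;
};

/* Assumption: 18 fragment slots per skb with 4 KiB pages. */
#define MAX_SKB_FRAGS_MODEL 18

static size_t frags_saving(void)
{
    return MAX_SKB_FRAGS_MODEL *
           (sizeof(struct frag_wide) - sizeof(struct frag_narrow));
}
```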

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Sep, 2010

2 commits

  • If PM support is available, this is passed
    through the platform instead of being hard-coded
    in the core files.
    WoL on Magic Frame can be enabled using
    ethtool.

    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: David S. Miller

    Giuseppe Cavallaro
     
  • This patch fixes stale mac80211_tx_control_flags for
    filtered / retried frames.

    Because ieee80211_handle_filtered_frame feeds skbs back
    into the tx path, they have to be stripped of some tx
    flags so they won't confuse the stack, driver or device.

    Cc:
    Acked-by: Johannes Berg
    Signed-off-by: Christian Lamparter
    Signed-off-by: John W. Linville

    Christian Lamparter
     

24 Sep, 2010

1 commit


23 Sep, 2010

1 commit


22 Sep, 2010

2 commits