29 Oct, 2009

3 commits


28 Oct, 2009

3 commits


27 Oct, 2009

1 commit


24 Oct, 2009

2 commits

  • When handling a large number of netdevices, rtnl_dump_ifinfo()
    is very slow because it has O(N^2) complexity.

    Instead of scanning one single list, we can use the 256 sub lists
    of the dev_index hash table.

    This considerably speeds up "ip link" operations.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
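
    The idea above can be sketched in user space. This is only an illustration, not the kernel code: struct dev, struct dump_cursor and dump_some() are hypothetical names. A dump that is resumed many times only has to skip entries inside one short hash chain instead of rescanning one long list from its head.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 256            /* the 256 sub lists mentioned above */

struct dev {
    int ifindex;
    struct dev *next;           /* chain within one bucket */
};

struct dev_table {
    struct dev *bucket[NBUCKETS];
};

/* Position saved between two partial dumps (hypothetical layout). */
struct dump_cursor {
    unsigned int bucket;        /* bucket to resume in */
    unsigned int idx;           /* entries already emitted from that bucket */
};

static void dev_table_add(struct dev_table *t, struct dev *d)
{
    unsigned int h = (unsigned int)d->ifindex & (NBUCKETS - 1);

    d->next = t->bucket[h];
    t->bucket[h] = d;
}

/*
 * Emit up to 'room' ifindex values into out[], resuming at *cur.
 * Returns the number emitted; 0 means the dump is complete.
 */
static size_t dump_some(const struct dev_table *t, struct dump_cursor *cur,
                        int *out, size_t room)
{
    size_t n = 0;

    for (; cur->bucket < NBUCKETS; cur->bucket++, cur->idx = 0) {
        const struct dev *d;
        unsigned int i = 0;

        for (d = t->bucket[cur->bucket]; d; d = d->next, i++) {
            if (i < cur->idx)
                continue;       /* skip only within this short chain */
            if (n == room)
                return n;       /* buffer full, cursor points at this entry */
            out[n++] = d->ifindex;
            cur->idx = i + 1;
        }
    }
    return n;
}
```

    With N devices spread over 256 chains, each resume skips at most one chain's worth of entries, so the total work stays close to linear in N.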
     
  • SIT tunnels use one rwlock to protect their prl entries.

    This first patch adds RCU locking for prl management,
    with standard call_rcu() calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Oct, 2009

1 commit


21 Oct, 2009

2 commits

  • dst_negative_advice() should check for changed dst and reset
    sk_tx_queue_mapping accordingly. Pass sock to the callers of
    dst_negative_advice.

    (sk_reset_txq is defined just for use by dst_negative_advice. The
    only way I could find to get around this is to move dst_negative_()
    from dst.h to dst.c, include sock.h in dst.c, etc)

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     
  • Introduce sk_tx_queue_mapping; and functions that set, test and
    get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
    cache is set/reset, and in socket alloc. Using -1 for "not set" and
    storing valid txq values unchanged allows the tx path to use the value
    of sk_tx_queue_mapping directly instead of subtracting 1 on every
    tx.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
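
    A minimal user-space sketch of the -1 sentinel described above. struct sock_model, the txq_* helpers and pick_tx_queue() are hypothetical names for illustration, and the hash-modulo queue selection is an assumption, not the kernel's selection logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for struct sock: only the field discussed above. */
struct sock_model {
    int sk_tx_queue_mapping;    /* -1: nothing recorded, otherwise a queue index */
};

static void txq_clear(struct sock_model *sk)
{
    sk->sk_tx_queue_mapping = -1;
}

static void txq_set(struct sock_model *sk, int txq)
{
    assert(txq >= 0);
    sk->sk_tx_queue_mapping = txq;
}

static bool txq_recorded(const struct sock_model *sk)
{
    return sk->sk_tx_queue_mapping >= 0;
}

static int txq_get(const struct sock_model *sk)
{
    return sk->sk_tx_queue_mapping;
}

/* Hypothetical tx path: reuse the recorded queue, else pick and record one. */
static int pick_tx_queue(struct sock_model *sk, unsigned int hash,
                         unsigned int num_queues)
{
    int txq;

    if (txq_recorded(sk))
        return txq_get(sk);     /* value used directly, no "- 1" adjustment */

    txq = (int)(hash % num_queues);
    txq_set(sk, txq);
    return txq;
}
```

    Calling txq_clear() stands in for the reset that happens when the dst cache is set or reset; the next transmit then picks a queue again.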
     

20 Oct, 2009

2 commits


19 Oct, 2009

4 commits

  • The last users of skb_icv_walk are converted to ahash now,
    so skb_icv_walk is unused and can be removed.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • ah4 and ah6 are converted to ahash now, so we can remove the
    code for the obsolete hash algorithm.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • To support ahash algorithms, we add a pointer to a
    crypto_ahash to ah_data.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    The goal is to move the fields used at lookup time into the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path).

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
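
    Why the prefix matters can be shown with a small sketch. The struct names below are simplified stand-ins, not the kernel definitions. Once a field name is a macro, every identifier with that spelling is rewritten, so a short name such as daddr would collide with ordinary locals and parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for struct sock_common / struct inet_sock. */
struct sock_common_model {
    uint32_t skc_daddr;
    uint16_t skc_dport;
};

struct inet_sock_model {
    struct sock_common_model common;
};

/* Prefixed names can become macros that redirect into the common area. */
#define inet_daddr common.skc_daddr
#define inet_dport common.skc_dport

static uint32_t route_key(const struct inet_sock_model *inet, uint32_t daddr)
{
    /*
     * 'daddr' is an ordinary parameter here. An unprefixed
     * "#define daddr common.skc_daddr" would have rewritten it as well
     * and broken this function.
     */
    return daddr ? daddr : inet->inet_daddr;
}
```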
     

15 Oct, 2009

3 commits



13 Oct, 2009

4 commits

  • Storing the mask (size - 1) instead of the size allows fast path to be
    a bit faster.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
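
    A small sketch of the difference, assuming a hash table whose size is a power of two (the struct and function names are made up for illustration). Both functions return the same slot, but the second one avoids a subtraction on every lookup.

```c
#include <assert.h>
#include <stdint.h>

struct table_by_size { uint32_t size; };    /* size is a power of two */
struct table_by_mask { uint32_t mask; };    /* mask is size - 1 */

static uint32_t slot_by_size(const struct table_by_size *t, uint32_t hash)
{
    return hash & (t->size - 1);    /* extra subtraction in the fast path */
}

static uint32_t slot_by_mask(const struct table_by_mask *t, uint32_t hash)
{
    return hash & t->mask;          /* mask precomputed once at setup */
}
```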
     
  • recvmmsg means "receive multiple messages"; it reduces the number of
    syscalls and net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
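
    From user space, a batch receive might look like the sketch below. It uses the recvmmsg() wrapper as exposed by glibc on Linux (which requires _GNU_SOURCE); recv_batch(), BATCH and BUFSZ are names chosen for this example, and MSG_DONTWAIT is used so the call returns whatever is already queued.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for recvmmsg() and struct mmsghdr */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH 8
#define BUFSZ 64

/*
 * Receive up to BATCH datagrams from fd with one system call.
 * lens[i] is set to the length of datagram i. Returns the number of
 * datagrams received, or -1 on error (errno set by recvmmsg).
 */
static int recv_batch(int fd, char bufs[BATCH][BUFSZ], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    int i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, &timeout);
    for (i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

    If three datagrams are queued on the socket, a single recv_batch() call returns 3 where three separate recvmsg() calls would otherwise be needed.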
     
  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work it was
    requested that this feature be generalized so that any datagram oriented socket
    could make use of this option. As such I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     
  • ieee80211_rx() must be called with softirqs disabled
    since the networking stack requires this for netif_rx()
    and some code in mac80211 can assume that it can not
    be processing its own tasklet and this call at the same
    time.

    It may be possible to remove this requirement after a
    careful audit of mac80211 and doing any needed locking
    improvements in it along with disabling softirqs around
    netif_rx(). An alternative might be to push all packet
    processing to process context in mac80211, instead of
    to the tasklet, and add other synchronisation.

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     

12 Oct, 2009

1 commit

  • Since commit a98b65a3 (net: annotate struct sock bitfield), we lost
    8 bytes in struct sock on 64bit arches because of
    kmemcheck_bitfield_end(flags) misplacement.

    Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks,
    sk_protocol and sk_type in the 'flags' 32bits bitfield

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Oct, 2009

1 commit


08 Oct, 2009

3 commits

  • UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for
    several setups.

    4000 active UDP sockets -> 32 sockets per chain on average. An
    incoming frame has to look up all sockets to find the best match, so long
    chains hurt latency.

    Instead of a fixed size hash table that can't be perfect for every
    need, let the UDP stack choose its table size at boot time like tcp/ip
    route, using the alloc_large_system_hash() helper.

    Add an optional boot parameter, uhash_entries=x so that an admin can
    force a size between 256 and 65536 if needed, like thash_entries and
    rhash_entries.

    dmesg logs two new lines :
    [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

    Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
    debugging spinlocks.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
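
    The figures quoted above can be reproduced with a little arithmetic. The per-slot sizes used in the note below (8 and 16 bytes) are not stated in the message; they are inferred from the quoted numbers (4096 bytes / 512 entries, and 1 MByte / 65536 slots).

```c
#include <assert.h>
#include <stddef.h>

/* Average chain length, rounded up, for a table with 'slots' buckets. */
static unsigned int avg_chain(unsigned int sockets, unsigned int slots)
{
    return (sockets + slots - 1) / slots;
}

/* Memory used by the slot array itself. */
static size_t table_bytes(unsigned int slots, size_t slot_size)
{
    return (size_t)slots * slot_size;
}
```

    avg_chain(4000, 128) gives 32 (31.25 rounded up), while the same load on 512 slots gives 8. table_bytes(512, 8) is 4096 and table_bytes(65536, 16) is 1048576, matching the dmesg lines and the 1 MByte maximum.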
     
  • It's useful to provide firmware and hardware version to user space and have a
    generic interface to retrieve them. Users can provide the version information
    in bug reports etc.

    Add fields for firmware and hardware version to struct wiphy.

    (Dropped nl80211 bits for now and modified remaining bits in favor of
    ethtool. -- JWL)

    Cc: Kalle Valo
    Signed-off-by: John W. Linville

    Kalle Valo
     
  • Refactor wext to
    * split out iwpriv handling
    * split out iwspy handling
    * split out procfs support
    * allow cfg80211 to have wireless extensions compat code
    w/o CONFIG_WIRELESS_EXT

    After this, drivers need to
    - select WIRELESS_EXT - for wext support
    - select WEXT_PRIV - for iwpriv support
    - select WEXT_SPY - for iwspy support

    except cfg80211 -- which gets new hooks in wext-core.c
    and can then get wext handlers without CONFIG_WIRELESS_EXT.

    Wireless extensions procfs support is auto-selected
    based on PROC_FS and anything that requires the wext core
    (i.e. WIRELESS_EXT or CFG80211_WEXT).

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     

07 Oct, 2009

4 commits

  • All usages of structure net_proto_ops should be declared const.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Jarek Poplawski wrote:
    >
    >
    > Hmm... So you made me to do some "real" work here, and guess what?:
    > there is one serious checkpatch warning! ;-) Plus, this new parameter
    > should be added to the function description. Otherwise:
    > Signed-off-by: Jarek Poplawski
    >
    > Thanks,
    > Jarek P.
    >
    > PS: I guess full "Don't" would show we really mean it...

    Okay :) Here is the last round, before the night !

    Thanks again

    [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

    We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
    is running.

    # tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
    rate 0bit 0pps backlog 0b 0p requeues 0

    User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
    one (because no estimator is active)

    After this patch, tc command output is :
    $ tc -s -d qdisc
    qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

    We add a parameter to gnet_stats_copy_rate_est() function so that
    it can use gen_estimator_active(bstats, r), as suggested by Jarek.

    This parameter can be NULL if check is not necessary, (htb for
    example has a mandatory rate estimator)

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Eric Dumazet
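
    The behaviour described above can be modelled in a few lines. This is a simplified user-space model, not the kernel function: the struct layouts, estimator_active() and copy_rate_est() are stand-ins. The rate element is emitted only when an estimator is running, and a NULL stats pointer skips the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rate_est { unsigned int bps, pps; };

/* Stand-in for the lookup that tells whether an estimator is attached. */
struct basic_stats { bool estimator_running; };

/* Hypothetical dump buffer: records whether a rate element was emitted. */
struct dump {
    bool has_rate;
    struct rate_est rate;
};

static bool estimator_active(const struct basic_stats *b,
                             const struct rate_est *r)
{
    (void)r;
    return b->estimator_running;
}

/* Returns 0 on success; emitting nothing is not an error. */
static int copy_rate_est(struct dump *d, const struct basic_stats *b,
                         const struct rate_est *r)
{
    if (b && !estimator_active(b, r))
        return 0;               /* no estimator: do not report a fake rate */

    d->has_rate = true;
    d->rate = *r;
    return 0;
}
```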
     
  • IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
    mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
    deploy IPv6 unicast service to IPv4 sites to which it provides
    customer premise equipment. Like 6to4, it utilizes stateless IPv6 in
    IPv4 encapsulation in order to transit IPv4-only network
    infrastructure. Unlike 6to4, a 6rd service provider uses an IPv6
    prefix of its own in place of the fixed 6to4 prefix.

    With this option enabled, the SIT driver offers 6rd functionality by
    providing an additional ioctl API to configure the IPv6 prefix used
    instead of the static 2002::/16 of 6to4.

    Original patch was done by Alexandre Cassen
    based on old Internet-Draft.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki / 吉藤英明
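
    How a site prefix is derived can be sketched as below. sixrd_prefix() is a hypothetical helper, and it assumes the simple case where all 32 bits of the IPv4 address are embedded right after the provider prefix. With 2002::/16 it reproduces the 6to4 layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the delegated IPv6 prefix: the first 'plen' bits come from
 * 'prefix' (16 bytes, network order), the next 32 bits are the IPv4
 * address, and the remaining bits are zero. plen must be <= 96.
 */
static void sixrd_prefix(const uint8_t prefix[16], unsigned int plen,
                         uint32_t v4_host_order, uint8_t out[16])
{
    unsigned int i;

    assert(plen <= 96);
    memset(out, 0, 16);

    /* Copy the provider prefix bit by bit. */
    for (i = 0; i < plen; i++)
        if (prefix[i / 8] & (0x80u >> (i % 8)))
            out[i / 8] |= (uint8_t)(0x80u >> (i % 8));

    /* Append the 32 IPv4 bits, most significant bit first. */
    for (i = 0; i < 32; i++)
        if (v4_host_order & (0x80000000u >> i))
            out[(plen + i) / 8] |= (uint8_t)(0x80u >> ((plen + i) % 8));
}
```

    For IPv4 address 192.0.2.1, the 6to4 prefix 2002::/16 yields 2002:c000:201::/48, while an example provider prefix 2001:db8::/32 yields 2001:db8:c000:201::/64.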
     
  • An incoming datagram must bring a *lot* of cache lines into the cpu cache,
    in particular (other parts omitted: hash chains, ip route cache...):

    On 32bit arches :

    offsetof(struct sock, sk_rcvbuf) =0x30 (read)
    offsetof(struct sock, sk_lock) =0x34 (rw)

    offsetof(struct sock, sk_sleep) =0x50 (read)
    offsetof(struct sock, sk_rmem_alloc) =0x64 (rw)
    offsetof(struct sock, sk_receive_queue)=0x74 (rw)

    offsetof(struct sock, sk_forward_alloc)=0x98 (rw)

    offsetof(struct sock, sk_callback_lock)=0xcc (rw)
    offsetof(struct sock, sk_drops) =0xd8 (read if we add dropcount support, rw if frame dropped)
    offsetof(struct sock, sk_filter) =0xf8 (read)

    offsetof(struct sock, sk_socket) =0x138 (read)

    offsetof(struct sock, sk_data_ready) =0x15c (read)

    We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
    with no fasync() structures. (socket->fasync_list ptr is probably already in cache
    because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

    This avoids one cache line load per incoming packet for common cases (no fasync())

    We can leave (or even move in a future patch) sk->sk_socket in a cold location

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Oct, 2009

2 commits

  • We currently dirty a cache line to update tunnel device stats
    (tx_packets/tx_bytes). We had better use the txq->tx_bytes/tx_packets
    counters that already are present in cpu cache, in the cache
    line shared with txq->_xmit_lock

    This patch extends IPTUNNEL_XMIT() macro to use txq pointer
    provided by the caller.

    Also &tunnel->dev->stats can be replaced by &dev->stats

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The FIB algorithm for IPv4 is set at compile time, but the kernel goes through
    the overhead of function call indirection at runtime. Save some
    cycles by turning the indirect calls to direct calls to either
    hash or trie code.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
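
    A sketch of what the type change buys, using two hypothetical option handlers (not taken from any real protocol). With an unsigned length, one size comparison is enough; with a signed length, every handler has to remember its own negative check.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* New style: the length cannot be negative by construction. */
static int set_int_option(int *dst, const void *optval, unsigned int optlen)
{
    if (optlen < sizeof(int))
        return -EINVAL;
    memcpy(dst, optval, sizeof(int));
    return 0;
}

/* Old style: a signed length needs an explicit negative check as well. */
static int set_int_option_old(int *dst, const void *optval, int optlen)
{
    if (optlen < 0 || (size_t)optlen < sizeof(int))
        return -EINVAL;
    memcpy(dst, optval, sizeof(int));
    return 0;
}
```

    If the "optlen < 0" test in the old-style handler is forgotten, -1 is converted to a huge unsigned value in the comparison against sizeof(int) and passes the size check, which is the class of mistake the unsigned type rules out.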
     

29 Sep, 2009

1 commit

  • The move away from having drivers assign wireless handlers,
    in favour of making cfg80211 assign them, broke the sysfs
    registration (the wireless/ dir went missing) because the
    handlers are now assigned only after registration, which is
    too late.

    Fix this by special-casing cfg80211-based devices, all
    of which are required to have an ieee80211_ptr, in the
    sysfs code, and also using get_wireless_stats() to have
    the same values reported as in procfs.

    Signed-off-by: Johannes Berg
    Reported-by: Hugh Dickins
    Tested-by: Hugh Dickins
    Signed-off-by: John W. Linville

    Johannes Berg