06 Mar, 2010

2 commits

  • Rename: sk_add_backlog -> __sk_add_backlog,
    sk_add_backlog_limited -> sk_add_backlog.

    Signed-off-by: Zhu Yi
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhu Yi
     
  • Make UDP adapt to the limited socket backlog change.

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: "Pekka Savola (ipv6)"
    Cc: Patrick McHardy
    Signed-off-by: Zhu Yi
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zhu Yi
     

13 Feb, 2010

1 commit

  • The variable 'copied' is used in udp_recvmsg() to emphasize that the passed
    'len' is adjusted to fit the actual datagram length. But the same can be
    done by adjusting 'len' directly. This patch thus removes the indirection.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

18 Jan, 2010

1 commit


14 Dec, 2009

1 commit

  • Now that we can have a large UDP hash table, the udp_lib_get_port() loop
    should be converted to a do {} while (cond) form,
    or we don't enter it at all if the hash table size is exactly 65536.

    Reported-by: Yinghai Lu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
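    The wraparound issue can be shown with a toy user-space model (illustrative
    only, not the kernel code; function names are invented here): with a table of
    exactly 65536 slots, the 16-bit "end" index wraps back onto "first", so a
    while () loop never runs, whereas do {} while () visits every slot once.

```c
#include <stdint.h>

/* Never enters the loop when size == 65536: end wraps back to first. */
static unsigned int probe_while(uint16_t first, uint32_t size)
{
	uint16_t slot = first, end = (uint16_t)(first + size);
	unsigned int visited = 0;

	while (slot != end) {	/* size == 65536 => end == first: 0 probes */
		visited++;
		slot++;
	}
	return visited;
}

/* Probes at least once, so a full-size table is walked correctly. */
static unsigned int probe_do_while(uint16_t first, uint32_t size)
{
	uint16_t slot = first, end = (uint16_t)(first + size);
	unsigned int visited = 0;

	do {
		visited++;
		slot++;		/* uint16_t wraps at 65536 */
	} while (slot != end);
	return visited;
}
```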
     

24 Nov, 2009

1 commit

  • On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
    > It should be of the form:
    > if (x &&
    > y)
    >
    > or:
    > if (x && y)
    >
    > Fix patches, rather than complaints, for existing cases where things
    > do not follow this pattern are certainly welcome.

    Also collapsed some runs of multiple tabs to a single space.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

11 Nov, 2009

1 commit


09 Nov, 2009

6 commits

  • When skb_clone() fails, we should increment sk_drops and SNMP counters.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • UDP multicast rx path is a bit complex and can hold a spinlock
    for a long time.

    Using a small (32 or 64 entries) stack of socket pointers can help
    to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
    outside of the lock, in most cases.

    It's also a base for a future RCU conversion of multicast reception.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Lucian Adrian Grijincu
    Signed-off-by: David S. Miller

    Eric Dumazet
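    The batching idea reads roughly like this user-space analogue (a sketch under
    stated assumptions: the table, the BATCH size, and gather_then_process() are
    illustrative stand-ins, not the kernel code): collect matches into a small
    on-stack array while holding the lock, then do the expensive per-entry work
    unlocked.

```c
#include <pthread.h>

#define BATCH 64	/* small stack of pointers, as in the 32/64 above */

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static int gather_then_process(const int *table, int n, int key,
			       void (*expensive)(int))
{
	int batch[BATCH];
	int count = 0, i;

	/* Hold the lock only while selecting entries. */
	pthread_mutex_lock(&table_lock);
	for (i = 0; i < n && count < BATCH; i++)
		if (table[i] == key)
			batch[count++] = table[i];
	pthread_mutex_unlock(&table_lock);

	/* The analogue of skb_clone()/udp_queue_rcv_skb() runs unlocked. */
	for (i = 0; i < count; i++)
		if (expensive)
			expensive(batch[i]);

	return count;
}
```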
     
  • We first locate the (local port) hash chain head.
    If few sockets are in this chain, we proceed with the previous lookup algorithm.

    If too many sockets are listed, we take a look at the secondary
    (port, address) hash chain we added in the previous patch.

    We choose the shortest chain and proceed with an RCU lookup on the elected chain.

    But if we chose the (port, address) chain and fail to find a socket on the given
    address, we must try another lookup on the (port, INADDR_ANY) chain to find
    sockets not bound to a particular IP.

    -> No extra cost for typical setups, where the first lookup will probably
    be performed.

    RCU lookups everywhere; we don't acquire the spinlock.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
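    The selection step can be sketched like this (illustrative only: the
    threshold of 10 and the helper name are invented here, not taken from the
    patch):

```c
/* Decide whether to walk the secondary (port, address) chain instead of
 * the primary (port) chain, based on their lengths. */
static int choose_secondary(unsigned int primary_count,
			    unsigned int secondary_count)
{
	if (primary_count <= 10)	/* short chain: old algorithm is fine */
		return 0;
	/* otherwise, elect the shorter of the two chains */
	return secondary_count < primary_count;
}
```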
     
  • Extends udp_table to contain a secondary hash table.

    The socket anchor for this second hash is free, because UDP
    doesn't use skc_bind_node: we define a union to hold
    both skc_bind_node and a new hlist_nulls_node udp_portaddr_node.

    udp_lib_get_port() inserts sockets into the second hash chain
    (additional cost of one atomic op).

    udp_lib_unhash() deletes sockets from the second hash chain
    (additional cost of one atomic op).

    Note: no spinlock lockdep annotation is needed, because the
    lock for the secondary hash chain is always taken after the
    lock for the primary hash chain.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Union sk_hash with two u16 hashes for UDP (no extra memory taken).

    One 16-bit hash on the (local port) value (the previous UDP 'hash').

    One 16-bit hash on the (local address, local port) values, initialized
    but not yet used. This second hash uses a Jenkins hash for better
    distribution.

    Because the 'port' is xored in later, a partial hash is performed
    on local address + net_hash_mix(net).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
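    The split can be sketched as follows (a user-space illustration only: the
    avalanche step below is a toy stand-in for the kernel's jhash, and both
    function names are invented here). The point is that the address-dependent
    part is computed once and the port is folded in cheaply per lookup.

```c
#include <stdint.h>

/* Partial hash over the local address plus a per-netns mix. */
static uint32_t partial_hash(uint32_t addr, uint32_t net_mix)
{
	uint32_t h = addr ^ net_mix;

	/* toy mixing step standing in for jhash */
	h ^= h >> 16;
	h *= 0x45d9f3b;
	h ^= h >> 16;
	return h;
}

/* Fold the port in with an xor to get the 16-bit (port, address) hash. */
static uint16_t portaddr_hash(uint32_t addr, uint32_t net_mix, uint16_t port)
{
	return (uint16_t)(partial_hash(addr, net_mix) ^ port);
}
```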
     
  • Adds a counter in udp_hslot to keep an accurate count
    of sockets present in a chain.

    This will permit the upcoming UDP lookup algorithm to choose
    the shortest chain when the secondary hash is added.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Nov, 2009

1 commit


31 Oct, 2009

1 commit

  • On UDP sockets, we must call skb_free_datagram() with socket locked,
    or risk sk_forward_alloc corruption. This requirement is not respected
    in SUNRPC.

    Add a convenient helper, skb_free_datagram_locked(), and use it in SUNRPC.

    Reported-by: Francis Moreau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
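    The shape of the helper, as a user-space analogue (illustrative only: the
    struct, field, and function names below are invented stand-ins, not the
    kernel API): wrapping the free together with the owner's lock means callers
    such as the SUNRPC path cannot forget the locking.

```c
#include <pthread.h>
#include <stdlib.h>

struct owner {
	pthread_mutex_t lock;
	int outstanding;	/* stands in for sk_forward_alloc accounting */
};

/* Free a buffer and update the owner's accounting under its lock. */
static void buf_free_locked(struct owner *o, void *buf)
{
	pthread_mutex_lock(&o->lock);
	o->outstanding--;	/* accounting update, now race-free */
	free(buf);
	pthread_mutex_unlock(&o->lock);
}
```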
     

19 Oct, 2009

2 commits

  • - skb_kill_datagram() can increment sk->sk_drops itself, not callers.

    - UDP on IPv4 & IPv6: dropped frames (because of bad checksum or policy checks) increment sk_drops

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    The goal is to transfer fields used at lookup time into the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by the rx path).

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Oct, 2009

1 commit

  • sock_queue_rcv_skb() can update sk_drops itself, removing the need for
    callers to take care of it. This is more consistent, since
    sock_queue_rcv_skb() also reads sk_drops when queueing an skb.

    This adds sk_drops management to many protocols that did not care yet.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Oct, 2009

1 commit


13 Oct, 2009

2 commits

  • udp_poll() can in some circumstances drop frames with incorrect checksums.

    The problem is that we now have to lock the socket while dropping frames, or
    risk sk_forward_alloc corruption.

    This bug is present since commit 95766fff6b9a78d1
    ([UDP]: Add memory accounting.)

    While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Create a new socket-level option to report the number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work it was
    requested that this feature be generalized so that any datagram-oriented socket
    could make use of this option. As such, I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram-oriented protocols, it will
    also be accepted by non-datagram-oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols to which this option isn't applicable, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt-in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
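    From user space, consuming the option looks roughly like this (a sketch:
    SO_RXQ_OVFL and the cmsg plumbing are the real Linux interfaces, but the two
    helper functions here are illustrative; older headers may lack the constant,
    hence the fallback define):

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40	/* Linux value; absent from older headers */
#endif

/* Ask the kernel to attach the cumulative drop counter to received
 * datagrams; deltas are computed in user space. */
static int enable_rxq_ovfl(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one));
}

/* Pull the counter out of a received message's control data. */
static int get_drop_count(struct msghdr *msg, uint32_t *drops)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SO_RXQ_OVFL) {
			memcpy(drops, CMSG_DATA(cmsg), sizeof(*drops));
			return 0;
		}
	}
	return -1;	/* no counter attached */
}
```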
     

08 Oct, 2009

1 commit

  • UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for
    several setups.

    4000 active UDP sockets -> 32 sockets per chain on average. An
    incoming frame has to look up all sockets to find the best match, so long
    chains hurt latency.

    Instead of a fixed-size hash table that can't be perfect for every
    need, let the UDP stack choose its table size at boot time like the tcp/ip
    route cache, using the alloc_large_system_hash() helper.

    Add an optional boot parameter, uhash_entries=x, so that an admin can
    force a size between 256 and 65536 if needed, like thash_entries and
    rhash_entries.

    dmesg logs two new lines:
    [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

    Maximal size on 64-bit arches would be 65536 slots, i.e. 1 MByte for
    non-debugging spinlocks.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Oct, 2009

1 commit

  • This patch against v2.6.31 adds support for route lookup using sk_mark in some
    more places. The benefits from this patch are the following.
    First, SO_MARK option now has effect on UDP sockets too.
    Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
    lookup correctly if TCP sockets with SO_MARK were used.

    Signed-off-by: Atis Elsts
    Acked-by: Eric Dumazet

    Atis Elsts
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Sep, 2009

1 commit

  • Christoph Lameter pointed out that packet drops at the qdisc level were not
    accounted in SNMP counters. Only if the application sets IP_RECVERR are
    drops reported to the user (-ENOBUFS errors) and SNMP counters updated.

    IP_RECVERR is used to enable extended reliable error message passing,
    but these are not needed to update system wide SNMP stats.

    This patch changes things a bit to allow SNMP counters to be updated,
    regardless of IP_RECVERR being set or not on the socket.

    Example after a UDP tx flood:
    # netstat -s
    ...
    IP:
    1487048 outgoing packets dropped
    ...
    Udp:
    ...
    SndbufErrors: 1487048

    send() syscalls do, however, still return an OK status, so as not to
    break applications.

    Note : send() manual page explicitly says for -ENOBUFS error :

    "The output queue for a network interface was full.
    This generally indicates that the interface has stopped sending,
    but may be caused by transient congestion.
    (Normally, this does not occur in Linux. Packets are just silently
    dropped when a device queue overflows.) "

    This is not true for IP_RECVERR-enabled sockets: a send() syscall
    that hits a qdisc drop returns an ENOBUFS error.

    Many thanks to Christoph, David, and last but not least, Alexey !

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jul, 2009

1 commit


13 Jul, 2009

1 commit


18 Jun, 2009

1 commit

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take this offset into account when reporting
    sk_wmem_alloc to user space, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
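    The reporting rule can be sketched like this (a simplified user-space
    illustration of the idea behind the kernel's sk_wmem_alloc_get() helper; the
    function here is not the kernel code): sk_wmem_alloc now starts at 1 to keep
    the socket pinned while transmits are in flight, so any user-visible "queued
    bytes" value must subtract that offset.

```c
#include <stdatomic.h>

/* Report queued tx bytes, compensating for the initial offset of 1. */
static int wmem_alloc_get(atomic_int *sk_wmem_alloc)
{
	return atomic_load(sk_wmem_alloc) - 1;
}
```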
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set the dst attached to an skb:

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete the skb->dst field.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
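    The accessor pattern, with the types radically simplified for illustration
    (this is a user-space sketch, not the kernel's structs; refcnt-- stands in
    for dst_release()): routing all access through helpers lets the field's
    representation change later without touching callers.

```c
#include <stddef.h>

struct dst_entry { int refcnt; };
struct sk_buff { struct dst_entry *dst; };

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	return skb->dst;
}

static void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
	skb->dst = dst;
}

/* Replaces open-coded "dst_release(skb->dst); skb->dst = NULL;" */
static void skb_dst_drop(struct sk_buff *skb)
{
	if (skb->dst)
		skb->dst->refcnt--;	/* stands in for dst_release() */
	skb->dst = NULL;
}
```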
     

11 Apr, 2009

1 commit

  • Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6
    (ipv6: Fix conflict resolutions during ipv6 binding)
    introduced a regression where time-wait sockets were
    not treated correctly. This resulted in the following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
    IP: [] ipv4_rcv_saddr_equal+0x61/0x70
    ...
    Call Trace:
    [] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
    [] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
    [] inet_csk_get_port+0x1ee/0x400
    [] inet6_bind+0x1cf/0x3a0 [ipv6]
    [] ? sockfd_lookup_light+0x3c/0xd0
    [] sys_bind+0x89/0x100
    [] ? trace_hardirqs_on_thunk+0x3a/0x3c
    [] system_call_fastpath+0x16/0x1b

    Tested-by: Brian Haley
    Tested-by: Ed Tomlinson
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

26 Mar, 2009

1 commit


25 Mar, 2009

1 commit


24 Mar, 2009

1 commit

  • Reading zero bytes from /proc/net/udp or other similar files which use
    the same seq_file UDP infrastructure panics the kernel this way:

    =====================================
    [ BUG: bad unlock balance detected! ]
    -------------------------------------
    read/1985 is trying to release lock (&table->hash[i].lock) at:
    [] udp_seq_stop+0x27/0x29
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by read/1985:
    #0: (&p->lock){--..}, at: [] seq_read+0x38/0x348

    stack backtrace:
    Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
    Call Trace:
    [] ? udp_seq_stop+0x27/0x29
    [] print_unlock_inbalance_bug+0xd6/0xe1
    [] lock_release_non_nested+0x9e/0x1c6
    [] ? seq_read+0xb2/0x348
    [] ? mark_held_locks+0x68/0x86
    [] ? udp_seq_stop+0x27/0x29
    [] lock_release+0x15d/0x189
    [] _spin_unlock_bh+0x1e/0x34
    [] udp_seq_stop+0x27/0x29
    [] seq_read+0x2bb/0x348
    [] ? seq_read+0x0/0x348
    [] proc_reg_read+0x90/0xaf
    [] vfs_read+0xa6/0x103
    [] ? trace_hardirqs_on_caller+0x12f/0x153
    [] sys_read+0x45/0x69
    [] system_call_fastpath+0x16/0x1b
    BUG: scheduling while atomic: read/1985/0xffffff00
    INFO: lockdep is turned off.
    Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_s
    eq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rs
    rc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan]
    Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
    Call Trace:
    [] ? __debug_show_held_locks+0x1b/0x24
    [] __schedule_bug+0x7e/0x83
    [] schedule+0xce/0x838
    [] ? fsnotify_access+0x5f/0x67
    [] ? sysret_careful+0xb/0x37
    [] ? trace_hardirqs_on_caller+0x1f/0x153
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] sysret_careful+0x31/0x37
    read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6
    Kernel panic - not syncing: Aiee, killing interrupt handler!

    udp_seq_stop() tries to unlock a not-yet-locked spinlock. The lock was lost
    during the splitting of the global udp_hash_lock into per-chain spinlocks.

    Signed-off-by: Vitaly Mayatskikh
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Vitaly Mayatskikh
     

14 Mar, 2009

1 commit


16 Feb, 2009

1 commit


06 Feb, 2009

2 commits

  • Like the UDP header fix, pskb_may_pull() can potentially
    alter the SKB buffer. Thus the saddr and daddr pointers
    may point to the old skb->data buffer.

    I haven't seen corruptions, as corruption would only be seen if the old
    skb->data buffer were reallocated by another user and
    written into very quickly (or poisoned by SLAB debugging).

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • The UDP header pointer assignment must happen after calling
    pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB
    buffer.

    This was exposed by running multicast traffic through the NIU driver,
    as it won't prepull the protocol headers into the linear area on
    receive.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
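    The ordering rule generalizes beyond skbs; here is a user-space analogue
    (illustrative only: struct pkt, may_pull(), and get_header() are invented
    stand-ins, with realloc() playing the role of the linearization that can
    move the buffer). Any pointer into the buffer must be taken after the pull.

```c
#include <stdlib.h>

struct pkt {
	unsigned char *data;
	size_t linear_len;
	size_t total_len;
};

/* Ensure at least 'len' bytes are in the linear area; may realloc. */
static int may_pull(struct pkt *p, size_t len)
{
	if (len > p->total_len)
		return 0;
	if (len > p->linear_len) {
		unsigned char *fresh = realloc(p->data, len);

		if (!fresh)
			return 0;
		p->data = fresh;	/* old pointers into data are stale */
		p->linear_len = len;
	}
	return 1;
}

/* Correct order: pull first, then take the header pointer. */
static unsigned char *get_header(struct pkt *p, size_t hdr_len)
{
	if (!may_pull(p, hdr_len))
		return NULL;
	return p->data;		/* safe: taken after any reallocation */
}
```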
     

03 Feb, 2009

1 commit


27 Jan, 2009

1 commit

  • commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
    (udp: Improve port randomization) introduced a regression for the UDP bind()
    syscall to a null port (getting a random port) in case lots of ports are
    already in use.

    This is because we do about 28000 scans of very long chains (220 sockets per
    chain), with many spin_lock_bh()/spin_unlock_bh() calls.

    Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
    so that we scan chains at most once.

    Instead of 250 ms per bind() call, after the patch we get a time of 2.9 ms.

    Based on a report from Vitaly Mayatskikh

    Reported-by: Vitaly Mayatskikh
    Signed-off-by: Eric Dumazet
    Tested-by: Vitaly Mayatskikh
    Signed-off-by: David S. Miller

    Eric Dumazet
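    The bitmap idea can be sketched in user space (an illustration only: the
    helper name, the explicit privileged-port skip, and the flat "used" array
    are assumptions, not the kernel code). Every port in one hash chain
    satisfies port % UDP_HTABLE_SIZE == chain, so a 512-bit bitmap covers all
    candidates; one walk of the chain fills it, and finding a free port becomes
    a bitmap scan instead of repeated chain walks.

```c
#include <stdint.h>
#include <string.h>

#define UDP_HTABLE_SIZE 128
#define BITMAP_BITS (65536 / UDP_HTABLE_SIZE)	/* candidates per chain */

static int find_free_port(const uint16_t *used, int n, int chain)
{
	uint8_t bitmap[BITMAP_BITS / 8];
	int i;

	memset(bitmap, 0, sizeof(bitmap));

	for (i = 0; i < n; i++)			/* single chain traversal */
		if (used[i] % UDP_HTABLE_SIZE == chain) {
			int bit = used[i] / UDP_HTABLE_SIZE;

			bitmap[bit / 8] |= 1u << (bit % 8);
		}

	for (i = 0; i < BITMAP_BITS; i++)	/* scan for a clear bit */
		if (!(bitmap[i / 8] & (1u << (i % 8)))) {
			int port = i * UDP_HTABLE_SIZE + chain;

			if (port >= 1024)	/* skip privileged range */
				return port;
		}
	return -1;				/* chain fully occupied */
}
```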
     

26 Nov, 2008

1 commit