06 Mar, 2010
2 commits
-
sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Make udp adapt to the limited socket backlog change.
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: "Pekka Savola (ipv6)"
Cc: Patrick McHardy
Signed-off-by: Zhu Yi
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Feb, 2010
1 commit
-
The variable 'copied' is used in udp_recvmsg() to emphasize that the passed
'len' is adjusted to fit the actual datagram length. But the same can be
done by adjusting 'len' directly. This patch thus removes the indirection.

Signed-off-by: Gerrit Renker
Signed-off-by: David S. Miller
18 Jan, 2010
1 commit
-
__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
14 Dec, 2009
1 commit
-
Now that we can have a large UDP hash table, the udp_lib_get_port() loop
should be converted to a do {} while (cond) form;
otherwise we never enter it when the hash table size is exactly 65536.

Reported-by: Yinghai Lu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
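
The loop-form problem in this commit can be reproduced in isolation. The sketch below is not the kernel code: it assumes, purely for illustration, that both loop bounds are 16-bit values and that the end bound is computed as first + table_size, so a table size of 65536 wraps the end bound back onto the start. The function names scan_while() and scan_do_while() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: count how many slots a "test first" scan visits.
 * Both bounds are 16-bit, so first + 65536 wraps back onto first. */
unsigned int scan_while(uint16_t first, uint32_t table_size)
{
    uint16_t last = (uint16_t)(first + table_size);
    unsigned int visited = 0;

    assert(table_size >= 1 && table_size <= 65536);
    while (first != last) {     /* false immediately when table_size == 65536 */
        visited++;
        first++;
    }
    return visited;
}

/* Same scan in do {} while (cond) form: the body runs before the test. */
unsigned int scan_do_while(uint16_t first, uint32_t table_size)
{
    uint16_t last = (uint16_t)(first + table_size);
    unsigned int visited = 0;

    assert(table_size >= 1 && table_size <= 65536);
    do {
        visited++;
        first++;                /* wraps from 65535 to 0 */
    } while (first != last);
    return visited;
}
```

With a table of 65536 entries the first form visits no slot at all, while the second visits every slot exactly once; for smaller tables both forms behave the same.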
24 Nov, 2009
1 commit
-
On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
> It should be of the form:
>	if (x &&
>	    y)
>
> or:
>	if (x && y)
>
> Fix patches, rather than complaints, for existing cases where things
> do not follow this pattern are certainly welcome.

Also collapsed some multiple tabs to single space.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
11 Nov, 2009
1 commit
-
UDP bind() can be O(N^2) in some pathological cases.
Thanks to secondary hash tables, we can make it O(N)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Nov, 2009
6 commits
-
When skb_clone() fails, we should increment sk_drops and SNMP counters.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

It's also a base for a future RCU conversion of multicast reception.
Signed-off-by: Eric Dumazet
Signed-off-by: Lucian Adrian Grijincu
Signed-off-by: David S. Miller
-
We first locate the (local port) hash chain head.
If few sockets are in this chain, we proceed with the previous lookup algorithm.

If too many sockets are listed, we take a look at the secondary
(port, address) hash chain we added in the previous patch.

We choose the shortest chain and proceed with an RCU lookup on the elected chain.
But if we chose the (port, address) chain and fail to find a socket on the given address,
we must try another lookup on the (port, INADDR_ANY) chain to find a socket not bound
to a particular IP.

-> No extra cost for typical setups, where the first lookup will probably
be performed.

RCU lookups everywhere, we don't acquire a spinlock.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Extends udp_table to contain a secondary hash table.

The socket anchor for this second hash is free, because UDP
doesn't use skc_bind_node: we define a union to hold
both skc_bind_node and a new hlist_nulls_node, udp_portaddr_node.

udp_lib_get_port() inserts sockets into the second hash chain
(additional cost of one atomic op).

udp_lib_unhash() deletes sockets from the second hash chain
(additional cost of one atomic op).

Note: No spinlock lockdep annotation is needed, because the
lock for the secondary hash chain is always taken after the
lock for the primary hash chain.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Union sk_hash with two u16 hashes for UDP (no extra memory taken):
One 16-bit hash on the (local port) value (the previous UDP 'hash').
One 16-bit hash on the (local address, local port) values, initialized
but not yet used. This second hash uses the Jenkins hash for better
distribution.

Because the 'port' is XORed later, a partial hash is performed
on local address + net_hash_mix(net).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Adds a counter in udp_hslot to keep an accurate count
of sockets present in the chain.

This will permit the upcoming UDP lookup algorithm to choose
the shortest chain when the secondary hash is added.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
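
The role of the per-chain counter can be illustrated with a small user-space model. This is only a sketch: struct slot stands in for udp_hslot, the helper names are hypothetical, and resolving ties in favour of the primary chain is an assumption made here, not something the commit messages state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash slot that tracks its chain length. */
struct slot {
    unsigned int count;     /* number of sockets currently in the chain */
};

/* Keep the count exact on every insertion and removal. */
void slot_add(struct slot *s)
{
    s->count++;
}

void slot_del(struct slot *s)
{
    assert(s->count > 0);
    s->count--;
}

/* Return whichever chain is shorter; ties go to the primary chain
 * (an assumption of this sketch). */
const struct slot *pick_shortest(const struct slot *primary,
                                 const struct slot *secondary)
{
    assert(primary != NULL && secondary != NULL);
    return (secondary->count < primary->count) ? secondary : primary;
}
```

Because the count is maintained at insert and delete time, choosing a chain costs one comparison instead of a walk over both chains.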
06 Nov, 2009
1 commit
-
Conflicts:
drivers/net/usb/cdc_ether.c

All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.

Signed-off-by: David S. Miller
31 Oct, 2009
1 commit
-
On UDP sockets, we must call skb_free_datagram() with the socket locked,
or risk sk_forward_alloc corruption. This requirement is not respected
in SUNRPC.

Add a convenient helper, skb_free_datagram_locked(), and use it in SUNRPC.
Reported-by: Francis Moreau
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Oct, 2009
2 commits
-
- skb_kill_datagram() can increment sk->sk_drops itself, not callers.
- UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

The goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by the rx path).

This patch adds the inet_ prefix to the daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
15 Oct, 2009
1 commit
-
sock_queue_rcv_skb() can update sk_drops itself, removing the need for
callers to take care of it. This is more consistent since
sock_queue_rcv_skb() also reads sk_drops when queueing a skb.

This adds sk_drops management to many protocols that did not handle it yet.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Oct, 2009
1 commit
13 Oct, 2009
2 commits
-
udp_poll() can in some circumstances drop frames with incorrect checksums.
The problem is that we now have to lock the socket while dropping frames, or risk
sk_forward_alloc corruption.

This bug has been present since commit 95766fff6b9a78d1
([UDP]: Add memory accounting.)

While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Create a new socket level option to report number of queue overflows
Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch. It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt-in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
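
Note 1 of the SO_RXQ_OVFL commit leaves the delta computation to user space. A minimal sketch of the receiving side follows, assuming a Linux system whose headers define SO_RXQ_OVFL and assuming the cmsg payload is a 32-bit counter. The helper names (drops_between, enable_drop_reporting, read_drop_total) are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Drops between two readings of the running total.
 * Unsigned subtraction stays correct across a 32-bit wraparound. */
uint32_t drops_between(uint32_t previous_total, uint32_t current_total)
{
    return current_total - previous_total;
}

#ifdef SO_RXQ_OVFL
/* Ask the kernel to attach the running drop total to received datagrams. */
int enable_drop_reporting(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &on, sizeof(on));
}

/* Look for the drop total in the ancillary data filled in by recvmsg().
 * Returns 1 and stores the value in *total if present, 0 otherwise. */
int read_drop_total(struct msghdr *msg, uint32_t *total)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && total != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_RXQ_OVFL) {
            memcpy(total, CMSG_DATA(cmsg), sizeof(*total));
            return 1;
        }
    }
    return 0;
}
#endif
```

A receiver would remember the total seen with the previous datagram and pass both values to drops_between() to learn how many frames were lost in between.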
08 Oct, 2009
1 commit
-
UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain on average. An
incoming frame has to look up all sockets to find the best match, so long
chains hurt latency.

Instead of a fixed size hash table that can't be perfect for every
need, let the UDP stack choose its table size at boot time like tcp/ip
route, using the alloc_large_system_hash() helper.

Add an optional boot parameter, uhash_entries=x, so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines:
[ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
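
The figures quoted in the hash table sizing commit can be checked with a little arithmetic. The sketch below uses hypothetical helper names, and the per-slot sizes of 8 and 16 bytes are inferred from the numbers in the message (4096 bytes for 512 entries, 1 MByte for 65536 slots) rather than taken from the kernel structures.

```c
#include <assert.h>

/* Average number of sockets per chain, rounded up. */
unsigned int avg_chain_len(unsigned int sockets, unsigned int slots)
{
    assert(slots > 0);
    return (sockets + slots - 1) / slots;
}

/* Memory used by the table for a given per-slot size in bytes. */
unsigned long table_bytes(unsigned long slots, unsigned long slot_size)
{
    return slots * slot_size;
}
```

For 4000 sockets, avg_chain_len() gives 32 with the old 128-slot table and 1 with 65536 slots; table_bytes(512, 8) matches the 4096 bytes shown in dmesg, and table_bytes(65536, 16) gives the 1 MByte upper bound.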
02 Oct, 2009
1 commit
-
This patch against v2.6.31 adds support for route lookup using sk_mark in some
more places. The benefits from this patch are the following.
First, SO_MARK option now has effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
lookup correctly if TCP sockets with SO_MARK were used.

Signed-off-by: Atis Elsts
Acked-by: Eric Dumazet
01 Oct, 2009
1 commit
-
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller
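
Why an unsigned optlen helps can be shown with a toy length check. This is a sketch only: MAX_OPT_LEN and both function names are hypothetical, and the check is deliberately the incomplete kind that the commit message warns about.

```c
#include <assert.h>
#include <errno.h>

#define MAX_OPT_LEN 64      /* hypothetical upper bound for an option buffer */

/* Signed length: an upper-bound check alone lets negative values through. */
int check_len_signed(int optlen)
{
    if (optlen > MAX_OPT_LEN)
        return -EINVAL;
    return 0;               /* optlen == -1 reaches this point */
}

/* Unsigned length: a negative argument arrives as a huge value,
 * so the very same check rejects it. */
int check_len_unsigned(unsigned int optlen)
{
    if (optlen > MAX_OPT_LEN)
        return -EINVAL;
    return 0;
}
```

With the signed variant every implementation needs its own extra test for negative lengths; with the unsigned variant the type makes that case impossible to miss.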
03 Sep, 2009
1 commit
-
Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but this is not needed to update system-wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood:
# netstat -s
...
IP:
1487048 outgoing packets dropped
...
Udp:
...
SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to
break applications.

Note: the send() manual page explicitly says for the -ENOBUFS error:
"The output queue for a network interface was full.
This generally indicates that the interface has stopped sending,
but may be caused by transient congestion.
(Normally, this does not occur in Linux. Packets are just silently
dropped when a device queue overflows.) "

This is not true for IP_RECVERR-enabled sockets: a send() syscall
that hits a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey!
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Jul, 2009
1 commit
-
Pure style cleanups.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Jul, 2009
1 commit
-
- validate and forward GSO UDP/IPv4 packets from untrusted sources.
- do software UFO if the outgoing device doesn't support UFO.

Signed-off-by: Sridhar Samudrala
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
18 Jun, 2009
1 commit
-
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
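
The offset handling described in the sk_wmem_alloc commit amounts to subtracting the initial value before showing the counter to user space. The following user-space model is a sketch under the assumption that the counter starts at 1; struct wmem_model and its helpers are hypothetical names, not kernel API.

```c
#include <assert.h>

/* Hypothetical stand-in for a socket's write-memory counter. */
struct wmem_model {
    int wmem_alloc;     /* assumed to start at 1 rather than 0 */
};

void wmem_init(struct wmem_model *sk)
{
    sk->wmem_alloc = 1;
}

/* Account for bytes queued for transmission. */
void wmem_charge(struct wmem_model *sk, int bytes)
{
    sk->wmem_alloc += bytes;
}

/* Value suitable for /proc output or SIOCOUTQ: strip the initial offset. */
int wmem_reported(const struct wmem_model *sk)
{
    assert(sk->wmem_alloc >= 1);
    return sk->wmem_alloc - 1;
}
```

Without the subtraction an idle socket would report one byte of pending output.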
03 Jun, 2009
1 commit
-
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
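
The three accessors from the skb_dst commit can be modelled in user space to show what the conversion buys. This is a sketch with minimal stand-in structures: the private field name _dst and the toy reference count are assumptions for illustration and do not reflect the real struct sk_buff layout.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; the real structures carry far more state. */
struct dst_entry {
    int refcnt;
};

struct sk_buff {
    struct dst_entry *_dst;     /* hypothetical private field: use accessors */
};

/* Toy version of dropping one reference. */
static void dst_release(struct dst_entry *dst)
{
    if (dst)
        dst->refcnt--;
}

struct dst_entry *skb_dst(const struct sk_buff *skb)
{
    return skb->_dst;
}

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
    skb->_dst = dst;
}

/* Replaces the open-coded pair: dst_release(skb->dst); skb->dst = NULL; */
void skb_dst_drop(struct sk_buff *skb)
{
    dst_release(skb->_dst);
    skb->_dst = NULL;
}
```

Once every caller goes through the accessors, the storage behind them can change without touching the callers again.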
11 Apr, 2009
1 commit
-
Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6
(ipv6: Fix conflict resolutions during ipv6 binding)
introduced a regression where time-wait sockets were
not treated correctly. This resulted in the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
IP: [] ipv4_rcv_saddr_equal+0x61/0x70
...
Call Trace:
[] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
[] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
[] inet_csk_get_port+0x1ee/0x400
[] inet6_bind+0x1cf/0x3a0 [ipv6]
[] ? sockfd_lookup_light+0x3c/0xd0
[] sys_bind+0x89/0x100
[] ? trace_hardirqs_on_thunk+0x3a/0x3c
[] system_call_fastpath+0x16/0x1b

Tested-by: Brian Haley
Tested-by: Ed Tomlinson
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
26 Mar, 2009
1 commit
25 Mar, 2009
1 commit
-
The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal()
which at times wrongly identified intersections between addresses.
It particularly broke down under a few instances and caused erroneous
bind conflicts.

Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
24 Mar, 2009
1 commit
-
Reading zero bytes from /proc/net/udp or other similar files which use
the same seq_file udp infrastructure panics the kernel as follows:

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
read/1985 is trying to release lock (&table->hash[i].lock) at:
[] udp_seq_stop+0x27/0x29
but there are no more locks to release!

other info that might help us debug this:
1 lock held by read/1985:
#0: (&p->lock){--..}, at: [] seq_read+0x38/0x348

stack backtrace:
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
[] ? udp_seq_stop+0x27/0x29
[] print_unlock_inbalance_bug+0xd6/0xe1
[] lock_release_non_nested+0x9e/0x1c6
[] ? seq_read+0xb2/0x348
[] ? mark_held_locks+0x68/0x86
[] ? udp_seq_stop+0x27/0x29
[] lock_release+0x15d/0x189
[] _spin_unlock_bh+0x1e/0x34
[] udp_seq_stop+0x27/0x29
[] seq_read+0x2bb/0x348
[] ? seq_read+0x0/0x348
[] proc_reg_read+0x90/0xaf
[] vfs_read+0xa6/0x103
[] ? trace_hardirqs_on_caller+0x12f/0x153
[] sys_read+0x45/0x69
[] system_call_fastpath+0x16/0x1b
BUG: scheduling while atomic: read/1985/0xffffff00
INFO: lockdep is turned off.
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_seq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rsrc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan]
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
[] ? __debug_show_held_locks+0x1b/0x24
[] __schedule_bug+0x7e/0x83
[] schedule+0xce/0x838
[] ? fsnotify_access+0x5f/0x67
[] ? sysret_careful+0xb/0x37
[] ? trace_hardirqs_on_caller+0x1f/0x153
[] ? trace_hardirqs_on_thunk+0x3a/0x3f
[] sysret_careful+0x31/0x37
read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6
Kernel panic - not syncing: Aiee, killing interrupt handler!

udp_seq_stop() tries to unlock a spinlock that was never locked. The lock was lost
when the global udp_hash_lock was split into per-slot spinlocks.

Signed-off-by: Vitaly Mayatskikh
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Mar, 2009
1 commit
-
Signed-off-by: Neil Horman
include/linux/skbuff.h | 4 +++-
net/core/datagram.c | 2 +-
net/core/skbuff.c | 22 ++++++++++++++++++++++
net/ipv4/arp.c | 2 +-
net/ipv4/udp.c | 2 +-
net/packet/af_packet.c | 2 +-
6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller
16 Feb, 2009
1 commit
-
Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.

Signed-off-by: Patrick Ohly
Signed-off-by: David S. Miller
06 Feb, 2009
2 commits
-
Like the UDP header fix, pskb_may_pull() can potentially
alter the SKB buffer. Thus the saddr and daddr pointers
may point to the old skb->data buffer.

I haven't seen corruptions, as it is only seen if the old
skb->data buffer was reallocated by another user and
written into very quickly (or poisoned by SLAB debugging).

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
-
The UDP header pointer assignment must happen after calling
pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB
buffer.

This was exposed by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
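
The ordering rule behind both pskb_may_pull() fixes has a direct user-space analogue: a pointer into a buffer is only safe if it is taken after any step that may move the buffer. The sketch below is a model, not kernel code; struct buf, buf_may_pull() and header_after_pull() are hypothetical, and the pull step always relocates the data so that a stale pointer is certain to be wrong.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose storage may move when it is "pulled". */
struct buf {
    unsigned char *data;
    size_t len;
};

/* Stand-in for a "may pull" step: always copies the data to new storage.
 * Returns 1 on success, 0 if the request cannot be satisfied. */
int buf_may_pull(struct buf *b, size_t need)
{
    unsigned char *fresh;

    if (need > b->len)
        return 0;
    fresh = malloc(b->len);
    if (!fresh)
        return 0;
    memcpy(fresh, b->data, b->len);
    free(b->data);          /* any pointer taken earlier now dangles */
    b->data = fresh;
    return 1;
}

/* Correct order: pull first, then derive the header pointer. */
const unsigned char *header_after_pull(struct buf *b, size_t hdr_len)
{
    if (!buf_may_pull(b, hdr_len))
        return NULL;
    return b->data;         /* taken after the pull, so it is valid */
}
```

Taking the pointer before calling buf_may_pull() would leave it aimed at freed memory, which is the same class of bug the two commits fix for the UDP header and address pointers.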
03 Feb, 2009
1 commit
-
Commit 93821778def10ec1e69aa3ac10adee975dad4ff3 (udp: Fix rcv socket
locking) accidentally removed sk_drops increments for UDP IPV4
sockets.

This field can be used to detect incorrect sizing of socket receive
buffers.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Jan, 2009
1 commit
-
commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
(udp: Improve port randomization) introduced a regression for the UDP bind() syscall
to a null port (getting a random port) when a lot of ports are already in use.

This is because we do about 28000 scans of very long chains (220 sockets per chain),
with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE)
so that we scan chains at most once.

Instead of 250 ms per bind() call, we get a time of 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh
Reported-by: Vitaly Mayatskikh
Signed-off-by: Eric Dumazet
Tested-by: Vitaly Mayatskikh
Signed-off-by: David S. Miller
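
The bitmap idea from the bind() regression fix can be sketched as follows. The model assumes a plain port % TABLE_SIZE hash, so the 65536 / 128 = 512 ports sharing a chain need 512 bits, which is the 64 bytes mentioned in the message. The helper names are hypothetical, and the real code also has to handle locking and the configured port range.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TABLE_SIZE      128                     /* UDP_HTABLE_SIZE at the time */
#define PORTS_PER_CHAIN (65536 / TABLE_SIZE)    /* 512 ports share each chain */
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS    ((PORTS_PER_CHAIN + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Walk the chain once, setting one bit for every port already in use. */
void mark_used(unsigned long *bitmap, const unsigned short *chain, size_t n)
{
    size_t i;

    memset(bitmap, 0, BITMAP_WORDS * sizeof(unsigned long));
    for (i = 0; i < n; i++) {
        unsigned int bit = chain[i] / TABLE_SIZE;

        bitmap[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
    }
}

/* Lowest free port hashing to 'slot', or -1 if all of them are taken.
 * Only the bitmap is consulted; the chain is not walked again. */
int first_free_port(const unsigned long *bitmap, unsigned int slot)
{
    unsigned int bit;

    assert(slot < TABLE_SIZE);
    for (bit = 0; bit < PORTS_PER_CHAIN; bit++) {
        if (!(bitmap[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD))))
            return (int)(bit * TABLE_SIZE + slot);
    }
    return -1;
}
```

Each candidate port then costs a single bit test instead of another walk over a 220-entry chain.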
26 Nov, 2008
1 commit
-
Impact: Optimization
Like done in inet_unhash(), we can avoid taking a chain lock if
socket is not hashed in udp_unhash()

Triggered by close(socket(AF_INET, SOCK_DGRAM, 0));
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
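
The unhash optimization can be pictured with a toy model in which lock acquisitions are counted. Everything here is hypothetical (struct sk_model, struct chain_model, model_unhash()); the counter merely stands in for spin_lock_bh()/spin_unlock_bh() so that the saving is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket: the flag stands in for "is on a hash chain". */
struct sk_model {
    bool hashed;
};

/* Hypothetical chain that counts how often its lock was taken. */
struct chain_model {
    unsigned int lock_taken;
    unsigned int members;
};

void model_unhash(struct chain_model *chain, struct sk_model *sk)
{
    if (!sk->hashed)
        return;             /* never bound: nothing to remove, no lock */

    chain->lock_taken++;    /* stands in for spin_lock_bh() */
    assert(chain->members > 0);
    chain->members--;
    sk->hashed = false;
                            /* spin_unlock_bh() would go here */
}
```

A socket that is created and closed without ever being bound, as in the close(socket(...)) trigger above, never touches the chain lock in this model.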