30 Jul, 2015
1 commit
-
This patch creates sk_set_txhash and eliminates the protocol-specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
may also be called multiple times for the same socket; we'll need
this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
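
The idea in the message above can be sketched in userspace C. This is a minimal model, not the kernel code: struct sock_model and model_set_txhash are hypothetical names, rand() stands in for the kernel's random source, and reserving 0 for "unset" is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct sock; only the field
 * discussed in the commit message is modeled. */
struct sock_model {
	uint32_t txhash;	/* 0 means "no hash set" in this model */
};

/* Assign a fresh random transmit hash. No packet headers are inspected,
 * so the same helper serves IPv4 and IPv6 sockets. Calling it again
 * simply picks a new value, which is what a later rehash would rely on. */
static void model_set_txhash(struct sock_model *sk)
{
	assert(sk != NULL);
	/* rand() is a userspace placeholder; two calls are combined because
	 * RAND_MAX may be as small as 32767. */
	sk->txhash = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
	if (sk->txhash == 0)
		sk->txhash = 1;	/* keep 0 reserved for "unset" */
}
```

Because the value no longer depends on addresses or ports, calling the helper a second time on the same socket is harmless and just yields a different hash.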
16 Jul, 2015
1 commit
-
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet
Acked-by: Herbert Xu
Signed-off-by: David S. Miller
08 Jul, 2014
1 commit
-
For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:
95.02% CPU utilization
154/256/505 90/95/99% latencies
1.13042e+06 tps

Time in functions:
0.28% skb_flow_dissect
0.21% __skb_get_hash

After fix:
94.95% CPU utilization
156/254/485 90/95/99% latencies
1.15447e+06

Neither __skb_get_hash nor skb_flow_dissect appear in perf
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
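
The saving described above comes from hashing once per connection rather than once per packet. A rough userspace sketch follows; every name in it (conn_model, pkt_model, model_flow_hash and so on) is hypothetical, and the mixing function is a placeholder, not the kernel's flow hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn_model { uint32_t txhash; };			/* models sk_txhash */
struct pkt_model  { uint32_t hash; int hash_valid; };	/* models skb->hash */

static unsigned int hash_computations;	/* counts the expensive step */

/* Placeholder 4-tuple mixer; never returns 0 so that 0 can mean "unset". */
static uint32_t model_flow_hash(uint32_t saddr, uint32_t daddr,
				uint16_t sport, uint16_t dport)
{
	uint32_t h = saddr * 2654435761u;

	h ^= daddr * 2246822519u;
	h ^= ((uint32_t)sport << 16) | dport;
	hash_computations++;
	return h ? h : 1;
}

/* Done once, when the connection becomes established. */
static void model_connect(struct conn_model *c, uint32_t saddr,
			  uint32_t daddr, uint16_t sport, uint16_t dport)
{
	assert(c != NULL);
	c->txhash = model_flow_hash(saddr, daddr, sport, dport);
}

/* Done for every transmitted packet: a plain copy, no header parsing. */
static void model_set_hash_from_conn(struct pkt_model *p,
				     const struct conn_model *c)
{
	if (c->txhash) {
		p->hash = c->txhash;
		p->hash_valid = 1;
	}
}
```

In this model, sending any number of packets on one connection leaves hash_computations at 1.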
12 Jun, 2014
1 commit
-
Alexey gave an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk->sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
[] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
[] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
[] ip4_datagram_release_cb+0x46/0x390 ??:0
[] release_sock+0x17a/0x230 ./net/core/sock.c:2413
[] ip4_datagram_connect+0x462/0x5d0 ??:0
[] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[] SyS_connect+0xe/0x10 ./net/socket.c:1682
[] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
[] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
[] dst_release+0x45/0x80 ./net/core/dst.c:280
[] ip4_datagram_connect+0xa1/0x5d0 ??:0
[] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[] SyS_connect+0xe/0x10 ./net/socket.c:1682
[] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
[] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
[] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
[< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939
[] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
[] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
[] ip4_datagram_connect+0x317/0x5d0 ??:0
[] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
[] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
[] SyS_connect+0xe/0x10 ./net/socket.c:1682
[] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629

[2]
[196727.311203] general protection fault: 0000 [#1] SMP
[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
[196727.311377] RIP: 0010:[] [] ipv4_dst_destroy+0x4f/0x80
[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282
[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[196727.311713] Stack:
[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
[196727.311885] Call Trace:
[196727.311907]
[196727.311912] [] dst_destroy+0x32/0xe0
[196727.311959] [] dst_release+0x56/0x80
[196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0
[196727.312013] [] tcp_v4_rcv+0x7da/0x820
[196727.312041] [] ? ip_rcv_finish+0x360/0x360
[196727.312070] [] ? nf_hook_slow+0x7d/0x150
[196727.312097] [] ? ip_rcv_finish+0x360/0x360
[196727.312125] [] ip_local_deliver_finish+0xb2/0x230
[196727.312154] [] ip_local_deliver+0x4a/0x90
[196727.312183] [] ip_rcv_finish+0x119/0x360
[196727.312212] [] ip_rcv+0x22b/0x340
[196727.312242] [] ? macvlan_broadcast+0x160/0x160 [macvlan]
[196727.312275] [] __netif_receive_skb_core+0x512/0x640
[196727.312308] [] ? kmem_cache_alloc+0x13b/0x150
[196727.312338] [] __netif_receive_skb+0x21/0x70
[196727.312368] [] netif_receive_skb+0x31/0xa0
[196727.312397] [] napi_gro_receive+0xe8/0x140
[196727.312433] [] ixgbe_poll+0x551/0x11f0 [ixgbe]
[196727.312463] [] ? ip_rcv+0x22b/0x340
[196727.312491] [] net_rx_action+0x111/0x210
[196727.312521] [] ? __netif_receive_skb+0x21/0x70
[196727.312552] [] __do_softirq+0xd0/0x270
[196727.312583] [] call_softirq+0x1c/0x30
[196727.312613] [] do_softirq+0x55/0x90
[196727.312640] [] irq_exit+0x55/0x60
[196727.312668] [] do_IRQ+0x63/0xe0
[196727.312696] [] common_interrupt+0x6a/0x6a
[196727.312722]
[196727.313071] RIP [] ipv4_dst_destroy+0x4f/0x80
[196727.313100] RSP
[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky
Reported-by: dormando
Signed-off-by: Eric Dumazet
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert
Signed-off-by: David S. Miller
06 Dec, 2013
1 commit
-
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the possibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert
15 Nov, 2013
1 commit
-
ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.

Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet
Reported-by: Dave Jones
Signed-off-by: David S. Miller
22 Jan, 2013
1 commit
-
This implements a socket release callback function to check
if the socket cached route got invalid during the time
we owned the socket. The function is used from udp, raw
and ping sockets.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
09 May, 2011
1 commit
-
This is to make sure that an l2tp socket's inet cork flow is
fully filled in, when it's encapsulated in UDP.

Signed-off-by: David S. Miller
29 Apr, 2011
2 commits
-
Now that output route lookups update the flow with
destination address selection, we can fetch it from
fl4->daddr instead of rt->rt_dst

Signed-off-by: David S. Miller
-
Now that output route lookups update the flow with
source address selection, we can fetch it from
fl4->saddr instead of rt->rt_src

Signed-off-by: David S. Miller
28 Apr, 2011
1 commit
-
These functions are used together as a unit for route resolution
during connect(). They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object. That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.

Also, describe why this set of facilities is needed and how it works
in a big comment.

Signed-off-by: David S. Miller
Reviewed-by: Eric Dumazet
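
The calling pattern described above, where one caller-owned flow object is initialized once and only its ports are touched before the second lookup, can be sketched as follows. This is a simplified userspace model: flow_key, route_lookup, route_connect and route_newports are hypothetical names, and the stand-in lookup merely picks a fixed source address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow description owned by the caller (on its stack). */
struct flow_key {
	uint32_t daddr, saddr;
	uint16_t dport, sport;
	uint8_t  proto;
};

static unsigned int flow_inits;	/* how often a flow was fully initialized */

/* Stand-in for a route lookup: selects a source address if none is set. */
static int route_lookup(struct flow_key *fl)
{
	if (fl->saddr == 0)
		fl->saddr = 0x0a000001u;	/* arbitrary example address */
	return 0;
}

/* First step of connect(): initialize the flow once, then resolve. */
static int route_connect(struct flow_key *fl, uint32_t daddr,
			 uint16_t dport, uint8_t proto)
{
	assert(fl != NULL);
	memset(fl, 0, sizeof(*fl));
	fl->daddr = daddr;
	fl->dport = dport;
	fl->proto = proto;
	flow_inits++;
	return route_lookup(fl);
}

/* Second step, after a local port was allocated: reuse the same flow,
 * update only the port, and look the route up again if it changed. */
static int route_newports(struct flow_key *fl, uint16_t orig_sport,
			  uint16_t sport)
{
	assert(fl != NULL);
	if (sport == orig_sport)
		return 0;
	fl->sport = sport;
	return route_lookup(fl);
}
```

The first lookup supplies the address information that port allocation needs, and the second lookup sees the allocated port without the flow being rebuilt.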
03 Mar, 2011
1 commit
-
Instead of on the stack.
Signed-off-by: David S. Miller
02 Mar, 2011
1 commit
-
Since that's what the current vague "flags" thing means.
Signed-off-by: David S. Miller
24 Sep, 2010
1 commit
-
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Sep, 2010
1 commit
-
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).

The problem is that the following sequence:

fd = socket(...)
connect(fd, &remote, ...)

not only selects the remote end point (address and port), but also sets
the local address, while the UDP stack stored the socket in the secondary
hash table while its local address was INADDR_ANY (or the ipv6 equivalent).

The sequence is:
- autobind() : choose a random local port, insert socket in hash tables
  [while local address is INADDR_ANY]
- connect() : set remote address and port, change local address to IP
  given by a route lookup.

When an incoming UDP frame comes, if more than 10 sockets are found in
the primary hash table, we switch to the secondary table, and fail to find
the socket because its local address changed.

One solution to this problem is to rehash the datagram socket if needed.
We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.

This rehashing only takes care of the secondary hash table, since the primary
hash (based on local port only) is not changed.

Reported-by: Krzysztof Piotr Oledzki
Signed-off-by: Eric Dumazet
Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: David S. Miller
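
The failure mode and the fix described above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: the table size, the hash function and all names (dgram_sock, portaddr_slot, hash_insert, rehash, lookup) are hypothetical, and the real lookup path is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS   8u
#define ADDR_ANY 0u

struct dgram_sock {
	uint32_t local_addr;
	uint16_t local_port;
	unsigned int slot;		/* slot the socket is currently linked in */
	struct dgram_sock *next;
};

/* Secondary table, keyed on (local addr, local port). */
static struct dgram_sock *secondary[NSLOTS];

static unsigned int portaddr_slot(uint32_t addr, uint16_t port)
{
	return ((addr * 2654435761u) ^ port) % NSLOTS;
}

static void hash_insert(struct dgram_sock *sk)
{
	assert(sk != NULL);
	sk->slot = portaddr_slot(sk->local_addr, sk->local_port);
	sk->next = secondary[sk->slot];
	secondary[sk->slot] = sk;
}

static void hash_unlink(struct dgram_sock *sk)
{
	struct dgram_sock **pp = &secondary[sk->slot];

	while (*pp && *pp != sk)
		pp = &(*pp)->next;
	if (*pp)
		*pp = sk->next;
}

/* Move the socket to the slot matching its current local address. */
static void rehash(struct dgram_sock *sk)
{
	if (portaddr_slot(sk->local_addr, sk->local_port) != sk->slot) {
		hash_unlink(sk);
		hash_insert(sk);
	}
}

static struct dgram_sock *lookup(uint32_t addr, uint16_t port)
{
	struct dgram_sock *sk = secondary[portaddr_slot(addr, port)];

	for (; sk; sk = sk->next)
		if (sk->local_addr == addr && sk->local_port == port)
			return sk;
	return NULL;
}
```

Inserting a socket with ADDR_ANY and then changing its local_addr, as connect() does, leaves it linked in the old slot, so a lookup by the new address misses it until rehash() is called.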
13 Jul, 2010
1 commit
-
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Jun, 2010
1 commit
-
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Oct, 2009
1 commit
-
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

The goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path).

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Jul, 2008
1 commit
-
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
29 Jan, 2008
1 commit
-
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
04 Jun, 2007
1 commit
-
Signed-off-by: Wei Dong
Signed-off-by: David S. Miller
11 Feb, 2007
1 commit
-
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
09 Feb, 2007
1 commit
-
Do this even for non-blocking sockets. This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.
Signed-off-by: David S. Miller
29 Sep, 2006
1 commit
-
annotated address arguments (port number left alone for now); ditto
for inferred net-endian variables in callers.

Signed-off-by: Al Viro
Signed-off-by: David S. Miller
01 Jul, 2006
1 commit
-
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
30 Aug, 2005
2 commits
-
Of this type, mostly:
CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller
-
Lots of places just needs the states, not even linux/tcp.h, where this
enum was, needs it.

This speeds up development of the refactorings as less sources are
rebuilt when things get moved from net/tcp.h.

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller
17 Apr, 2005
1 commit
-
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!