30 Jul, 2015

1 commit

  • This patch creates sk_set_txhash and eliminates protocol specific
    inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
    random number instead of performing flow dissection. sk_set_txhash
    may also be called multiple times for the same socket;
    we'll need this when redoing the hash for negative routing advice.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
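
To illustrate the idea in the message above, here is a minimal userspace sketch of setting a random transmit hash on a socket. The names sock_sketch and sk_set_txhash_sketch are hypothetical stand-ins, rand() replaces whatever random source the kernel uses, and treating 0 as "no hash set" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct sock; only the field
 * discussed in the commit message is modelled. */
struct sock_sketch {
    uint32_t sk_txhash; /* assumption: 0 means "no hash set" */
};

/* Draw a random 32-bit value from rand(); userspace substitute only. */
uint32_t random_u32_sketch(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Assign a fresh random hash. No flow dissection is involved, and the
 * function may be called again later to pick a different hash. */
void sk_set_txhash_sketch(struct sock_sketch *sk)
{
    assert(sk != NULL);
    sk->sk_txhash = random_u32_sketch();
    if (sk->sk_txhash == 0)
        sk->sk_txhash = 1; /* keep 0 reserved for "unset" */
}
```

Calling sk_set_txhash_sketch() a second time on the same socket simply replaces the stored value, which is the property the message says is needed for rehashing.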
     

16 Jul, 2015

1 commit

  • ip6_datagram_connect() is doing a lot of socket changes without
    the socket being locked.

    This looks wrong, at least for udp_lib_rehash() which could corrupt
    lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

    Signed-off-by: Eric Dumazet
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Jul, 2014

1 commit

  • For a connected socket we can precompute the flow hash for setting
    in skb->hash on output. This is a performance advantage over
    calculating the skb->hash for every packet on the connection. The
    computation is done using the common hash algorithm to be consistent
    with computations done for packets of the connection in other states
    where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

    This patch adds sk_txhash to the sock structure. inet_set_txhash and
    ip6_set_txhash functions are added which are called from points in
    TCP and UDP where socket moves to established state.

    skb_set_hash_from_sk is a function which sets skb->hash from the
    sock txhash value. This is called in UDP and TCP transmit path when
    transmitting within the context of a socket.

    Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
    interface (in this case skb_get_hash called on every TX packet to
    create a UDP source port).

    Before fix:

    95.02% CPU utilization
    154/256/505 90/95/99% latencies
    1.13042e+06 tps

    Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

    After fix:

    94.95% CPU utilization
    156/254/485 90/95/99% latencies
    1.15447e+06 tps

    Neither __skb_get_hash nor skb_flow_dissect appear in perf

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

12 Jun, 2014

1 commit

  • Alexey gave an AddressSanitizer[1] report that finally gave a good hint
    at the origin of various problems already reported by Dormando
    in the past [2].

    The problem comes from the fact that UDP can have a lockless TX path, and
    concurrent threads can manipulate sk_dst_cache while another thread
    is holding the socket lock and calls __sk_dst_set() in
    ip4_datagram_release_cb() (this was added in linux-3.8).

    It seems that all we need to do is to use sk_dst_check() and
    sk_dst_set() so that all the writers hold same spinlock
    (sk->sk_dst_lock) to prevent corruptions.

    The TCP stack does not need this protection, as all sk_dst_cache writers
    hold the socket lock.

    [1]
    https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

    AddressSanitizer: heap-use-after-free in ipv4_dst_check
    Read of size 2 by thread T15453:
    [] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
    [] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
    [] ip4_datagram_release_cb+0x46/0x390 ??:0
    [] release_sock+0x17a/0x230 ./net/core/sock.c:2413
    [] ip4_datagram_connect+0x462/0x5d0 ??:0
    [] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
    [] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
    [] SyS_connect+0xe/0x10 ./net/socket.c:1682
    [] system_call_fastpath+0x16/0x1b
    ./arch/x86/kernel/entry_64.S:629

    Freed by thread T15455:
    [] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
    [] dst_release+0x45/0x80 ./net/core/dst.c:280
    [] ip4_datagram_connect+0xa1/0x5d0 ??:0
    [] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
    [] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
    [] SyS_connect+0xe/0x10 ./net/socket.c:1682
    [] system_call_fastpath+0x16/0x1b
    ./arch/x86/kernel/entry_64.S:629

    Allocated by thread T15453:
    [] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
    [] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
    [< inlined >] __ip_route_output_key+0x3e8/0xf70
    __mkroute_output ./net/ipv4/route.c:1939
    [] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
    [] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
    [] ip4_datagram_connect+0x317/0x5d0 ??:0
    [] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
    [] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
    [] SyS_connect+0xe/0x10 ./net/socket.c:1682
    [] system_call_fastpath+0x16/0x1b
    ./arch/x86/kernel/entry_64.S:629

    [2]
    [196727.311203] general protection fault: 0000 [#1] SMP
    [196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
    [196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
    [196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
    [196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
    [196727.311377] RIP: 0010:[] [] ipv4_dst_destroy+0x4f/0x80
    [196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282
    [196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
    [196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
    [196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
    [196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
    [196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
    [196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
    [196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [196727.311713] Stack:
    [196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
    [196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
    [196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
    [196727.311885] Call Trace:
    [196727.311907]
    [196727.311912] [] dst_destroy+0x32/0xe0
    [196727.311959] [] dst_release+0x56/0x80
    [196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0
    [196727.312013] [] tcp_v4_rcv+0x7da/0x820
    [196727.312041] [] ? ip_rcv_finish+0x360/0x360
    [196727.312070] [] ? nf_hook_slow+0x7d/0x150
    [196727.312097] [] ? ip_rcv_finish+0x360/0x360
    [196727.312125] [] ip_local_deliver_finish+0xb2/0x230
    [196727.312154] [] ip_local_deliver+0x4a/0x90
    [196727.312183] [] ip_rcv_finish+0x119/0x360
    [196727.312212] [] ip_rcv+0x22b/0x340
    [196727.312242] [] ? macvlan_broadcast+0x160/0x160 [macvlan]
    [196727.312275] [] __netif_receive_skb_core+0x512/0x640
    [196727.312308] [] ? kmem_cache_alloc+0x13b/0x150
    [196727.312338] [] __netif_receive_skb+0x21/0x70
    [196727.312368] [] netif_receive_skb+0x31/0xa0
    [196727.312397] [] napi_gro_receive+0xe8/0x140
    [196727.312433] [] ixgbe_poll+0x551/0x11f0 [ixgbe]
    [196727.312463] [] ? ip_rcv+0x22b/0x340
    [196727.312491] [] net_rx_action+0x111/0x210
    [196727.312521] [] ? __netif_receive_skb+0x21/0x70
    [196727.312552] [] __do_softirq+0xd0/0x270
    [196727.312583] [] call_softirq+0x1c/0x30
    [196727.312613] [] do_softirq+0x55/0x90
    [196727.312640] [] irq_exit+0x55/0x60
    [196727.312668] [] do_IRQ+0x63/0xe0
    [196727.312696] [] common_interrupt+0x6a/0x6a
    [196727.312722]
    [196727.313071] RIP [] ipv4_dst_destroy+0x4f/0x80
    [196727.313100] RSP
    [196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
    [196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

    Reported-by: Alexey Preobrazhensky
    Reported-by: dormando
    Signed-off-by: Eric Dumazet
    Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
    Cc: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
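
The rule stated in the message above, that every writer of the cached route must take the same lock, can be sketched in userspace with C11 atomics. Everything here is hypothetical: the struct names, the spinlock built from atomic_flag, and the reference counting are stand-ins. As a simplification, the reader in this sketch also takes the lock, which is not a claim about how the kernel reads sk_dst_cache.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct dst_sketch {
    atomic_int refcnt;
    int id;
};

struct sock_sketch {
    atomic_flag dst_lock;         /* stands in for sk->sk_dst_lock */
    struct dst_sketch *dst_cache; /* stands in for sk->sk_dst_cache */
};

void sock_init_sketch(struct sock_sketch *sk)
{
    atomic_flag_clear(&sk->dst_lock);
    sk->dst_cache = NULL;
}

struct dst_sketch *dst_alloc_sketch(int id)
{
    struct dst_sketch *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->refcnt, 1);
    d->id = id;
    return d;
}

/* Drop one reference; free the entry when the last one goes away. */
void dst_release_sketch(struct dst_sketch *d)
{
    if (d && atomic_fetch_sub(&d->refcnt, 1) == 1)
        free(d);
}

void dst_lock_sketch(struct sock_sketch *sk)
{
    while (atomic_flag_test_and_set_explicit(&sk->dst_lock,
                                             memory_order_acquire))
        ; /* spin */
}

void dst_unlock_sketch(struct sock_sketch *sk)
{
    atomic_flag_clear_explicit(&sk->dst_lock, memory_order_release);
}

/* Writer: swap the cached entry under the lock, release the old one
 * after dropping the lock. All writers must go through this path. */
void sk_dst_set_sketch(struct sock_sketch *sk, struct dst_sketch *new_dst)
{
    struct dst_sketch *old;

    dst_lock_sketch(sk);
    old = sk->dst_cache;
    sk->dst_cache = new_dst;
    dst_unlock_sketch(sk);
    dst_release_sketch(old);
}

/* Reader: take a reference while the lock keeps the entry alive. */
struct dst_sketch *sk_dst_get_sketch(struct sock_sketch *sk)
{
    struct dst_sketch *d;

    dst_lock_sketch(sk);
    d = sk->dst_cache;
    if (d)
        atomic_fetch_add(&d->refcnt, 1);
    dst_unlock_sketch(sk);
    return d;
}
```

A caller that obtained an entry through sk_dst_get_sketch() keeps a valid pointer even if another writer replaces the cache afterwards, because the replaced entry is only freed once its last reference is dropped.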
     

06 Dec, 2013

1 commit


15 Nov, 2013

1 commit

  • ip4_datagram_connect() is called from process context, so
    it should use IP_INC_STATS() instead of IP_INC_STATS_BH();
    otherwise we can deadlock on 32bit arches, or get corruptions of
    SNMP counters.

    Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
    Signed-off-by: Eric Dumazet
    Reported-by: Dave Jones
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Jan, 2013

1 commit


09 May, 2011

1 commit


29 Apr, 2011

2 commits


28 Apr, 2011

1 commit

  • These functions are used together as a unit for route resolution
    during connect(). They address the chicken-and-egg problem that
    exists when ports need to be allocated during connect() processing,
    yet such port allocations require addressing information from the
    routing code.

    It's currently more heavy handed than it needs to be, and in
    particular we allocate and initialize a flow object twice.

    Let the callers provide the on-stack flow object. That way we only
    need to initialize it once in the ip_route_connect() call.

    Later, if ip_route_newports() needs to do anything, it re-uses that
    flow object as-is except for the ports which it updates before the
    route re-lookup.

    Also, describe why this set of facilities is needed and how it works
    in a big comment.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David S. Miller
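
The calling pattern described above, in which the caller owns one flow object that is initialized once and later reused with only the ports changed, might look roughly like this in userspace. The struct, the function names, and the fake lookup are all hypothetical; the real ip_route_connect() and ip_route_newports() take different arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key; the caller keeps this on its stack. */
struct flow_sketch {
    uint32_t daddr, saddr;
    uint16_t sport, dport;
    int lookups; /* counts fake route lookups, for illustration */
};

/* Fake route lookup: fills in a source address if none is set yet. */
int route_lookup_sketch(struct flow_sketch *fl)
{
    if (fl->saddr == 0)
        fl->saddr = 0x0a000001u; /* 10.0.0.1, arbitrary */
    fl->lookups++;
    return 0;
}

/* First step of connect(): the only place the flow is initialized. */
int route_connect_sketch(struct flow_sketch *fl, uint32_t daddr,
                         uint16_t sport, uint16_t dport)
{
    assert(fl != NULL);
    memset(fl, 0, sizeof(*fl));
    fl->daddr = daddr;
    fl->sport = sport;
    fl->dport = dport;
    return route_lookup_sketch(fl);
}

/* Second step, after port allocation: reuse the flow as-is, update
 * only the ports, and repeat the lookup if they actually changed. */
int route_newports_sketch(struct flow_sketch *fl, uint16_t sport,
                          uint16_t dport)
{
    if (fl->sport == sport && fl->dport == dport)
        return 0;
    fl->sport = sport;
    fl->dport = dport;
    return route_lookup_sketch(fl);
}
```

The point of the design is that the second call never rebuilds the flow from scratch: the addresses chosen by the first lookup are carried over in the caller's object.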
     

03 Mar, 2011

1 commit


02 Mar, 2011

1 commit


24 Sep, 2010

1 commit


09 Sep, 2010

1 commit

  • commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
    added a secondary hash on UDP, hashed on (local addr, local port).

    The problem is that the following sequence:

    fd = socket(...)
    connect(fd, &remote, ...)

    not only selects remote end point (address and port), but also sets
    local address, while UDP stack stored in secondary hash table the socket
    while its local address was INADDR_ANY (or ipv6 equivalent)

    The sequence is:
    - autobind() : choose a random local port, insert socket in hash tables
    [while local address is INADDR_ANY]
    - connect() : set remote address and port, change local address to IP
    given by a route lookup.

    When an incoming UDP frame arrives, if more than 10 sockets are found in
    the primary hash table, we switch to the secondary table, and fail to find
    the socket because its local address changed.

    One solution to this problem is to rehash datagram socket if needed.

    We add a new rehash(struct sock *) method in "struct proto", and
    implement this method for UDP v4 & v6, using a common helper.

    This rehashing only takes care of secondary hash table, since primary
    hash (based on local port only) is not changed.

    Reported-by: Krzysztof Piotr Oledzki
    Signed-off-by: Eric Dumazet
    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: David S. Miller

    Eric Dumazet
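
The failure and the fix described above can be reproduced with a toy secondary hash table keyed on (local address, local port). This is a userspace sketch with hypothetical names and a made-up hash function; the lookup only probes the bucket for the exact address, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16

/* Hypothetical UDP socket with only the fields the table needs. */
struct usock_sketch {
    uint32_t rcv_saddr; /* 0 stands for INADDR_ANY */
    uint16_t port;
    unsigned slot2;     /* bucket the socket currently sits in */
    struct usock_sketch *next;
};

struct usock_sketch *hash2[NBUCKETS];

/* Made-up hash over (local address, local port). */
unsigned portaddr_hash_sketch(uint32_t addr, uint16_t port)
{
    return (addr ^ (addr >> 16) ^ port) % NBUCKETS;
}

void hash2_insert(struct usock_sketch *sk)
{
    sk->slot2 = portaddr_hash_sketch(sk->rcv_saddr, sk->port);
    sk->next = hash2[sk->slot2];
    hash2[sk->slot2] = sk;
}

void hash2_remove(struct usock_sketch *sk)
{
    struct usock_sketch **pp = &hash2[sk->slot2];

    while (*pp && *pp != sk)
        pp = &(*pp)->next;
    if (*pp)
        *pp = sk->next;
}

/* Move the socket if its (address, port) now hashes elsewhere. */
void udp_rehash_sketch(struct usock_sketch *sk)
{
    unsigned newslot = portaddr_hash_sketch(sk->rcv_saddr, sk->port);

    if (newslot != sk->slot2) {
        hash2_remove(sk);
        hash2_insert(sk);
    }
}

struct usock_sketch *hash2_lookup(uint32_t addr, uint16_t port)
{
    struct usock_sketch *sk = hash2[portaddr_hash_sketch(addr, port)];

    while (sk && !(sk->rcv_saddr == addr && sk->port == port))
        sk = sk->next;
    return sk;
}
```

A socket inserted while its address is still 0 becomes unreachable through hash2_lookup() once rcv_saddr changes, and becomes reachable again after udp_rehash_sketch() moves it to the bucket for its new address.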
     

13 Jul, 2010

1 commit


11 Jun, 2010

1 commit


19 Oct, 2009

1 commit

  • In order to have better cache layouts of struct sock (separate zones
    for rx/tx paths), we need this preliminary patch.

    The goal is to transfer fields used at lookup time into the first
    read-mostly cache line (inside struct sock_common) and move sk_refcnt
    to a separate cache line (only written by rx path).

    This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
    sport and id fields. This allows a future patch to define these
    fields as macros, like sk_refcnt, without name clashes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Jul, 2008

1 commit


29 Jan, 2008

1 commit


04 Jun, 2007

1 commit


11 Feb, 2007

1 commit


09 Feb, 2007

1 commit


29 Sep, 2006

1 commit


01 Jul, 2006

1 commit


30 Aug, 2005

2 commits

  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Lots of places just need the states; not even linux/tcp.h, where this
    enum was, needs it.

    This speeds up development of the refactorings as less sources are
    rebuilt when things get moved from net/tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds