18 Feb, 2017

1 commit

  • commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.

    Unlike IPv4, this control socket is shared by all cpus, so we cannot use
    it as a scratchpad area to annotate the mark that we pass to ip6_xmit().

    Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
    family caches the flowi6 structure in the sctp_transport structure, so
    we cannot use it to carry the mark unless we later reset it, which
    I discarded since it looks ugly to me.

    Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
    Suggested-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira
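    A minimal sketch of the calling convention described above: the mark travels as an explicit argument instead of being written into a control socket that every cpu shares. All names here (struct ctl_sock, struct pkt, xmit_with_mark) are hypothetical stand-ins, not the kernel's ip6_xmit() signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct ctl_sock {
	uint32_t sk_mark;	/* shared by all cpus: unsafe as a scratchpad */
};

struct pkt {
	uint32_t mark;
};

/* The mark is passed per call, so concurrent senders on different cpus
 * never overwrite each other's value through the shared socket. */
static void xmit_with_mark(const struct ctl_sock *sk, struct pkt *p, uint32_t mark)
{
	(void)sk;	/* the socket is only read, never used as scratch space */
	p->mark = mark;
}
```

    Because the socket is passed as const, the compiler rejects any attempt to stash the mark in it.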
     

30 Nov, 2016

1 commit


04 Nov, 2016

5 commits

  • While fuzzing kernel with syzkaller, Andrey reported a nasty crash
    in inet6_bind() caused by DCCP lacking a required method.

    Fixes: ab1e0a13d7029 ("[SOCK] proto: Add hashinfo member to struct proto")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Cc: Arnaldo Carvalho de Melo
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • dccp_v6_err() does not use pskb_may_pull() and might access garbage.

    We only need 4 bytes at the beginning of the DCCP header, like TCP,
    so the 8 bytes pulled in icmpv6_notify() are more than enough.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • dccp_v4_err() does not use pskb_may_pull() and might access garbage.

    We only need 4 bytes at the beginning of the DCCP header, like TCP,
    so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

    This patch might allow processing more ICMP messages, as some routers
    still limit the size of reflected bytes to 28 (RFC 792) instead
    of using extended lengths (RFC 1812 4.3.2.3).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
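    To illustrate why 4 bytes are enough: the DCCP generic header, like TCP, starts with the 16-bit source and destination ports, which is all an error handler needs to find the flow. This hypothetical parse_quoted_ports() refuses to read past the bytes actually available, the role pskb_may_pull() plays in the kernel. With an RFC 792-sized quote (28 bytes = a 20-byte IPv4 header without options + 8 bytes), 8 transport bytes remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Source port + destination port, 16 bits each. */
#define PORTS_LEN 4

/* Hypothetical helper: extract the ports from a transport header quoted
 * inside an ICMP error, only if at least PORTS_LEN bytes are present. */
static bool parse_quoted_ports(const uint8_t *th, size_t avail,
			       uint16_t *sport, uint16_t *dport)
{
	if (avail < PORTS_LEN)
		return false;	/* reading further would access garbage */
	*sport = (uint16_t)((th[0] << 8) | th[1]);	/* network byte order */
	*dport = (uint16_t)((th[2] << 8) | th[3]);
	return true;
}
```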
     
  • Andrey reported the following warning while fuzzing with syzkaller:

    WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
    ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
    0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
    Call Trace:
    [< inline >] __dump_stack lib/dump_stack.c:15
    [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
    [] panic+0x1bc/0x39d kernel/panic.c:179
    [] __warn+0x1cc/0x1f0 kernel/panic.c:542
    [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
    [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
    [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
    [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
    [] sock_release+0x8e/0x1d0 net/socket.c:570
    [] sock_close+0x16/0x20 net/socket.c:1017
    [] __fput+0x29d/0x720 fs/file_table.c:208
    [] ____fput+0x15/0x20 fs/file_table.c:244
    [] task_work_run+0xf8/0x170 kernel/task_work.c:116
    [< inline >] exit_task_work include/linux/task_work.h:21
    [] do_exit+0x883/0x2ac0 kernel/exit.c:828
    [] do_group_exit+0x10e/0x340 kernel/exit.c:931
    [] get_signal+0x634/0x15a0 kernel/signal.c:2307
    [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
    [] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156
    [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
    [] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259
    [] entry_SYSCALL_64_fastpath+0xc0/0xc2
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Kernel Offset: disabled

    Fix this the same way we did for TCP in commit 565b7b2d2e63
    ("tcp: do not send reset to already closed sockets").

    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
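    A hypothetical sketch of the idea suggested by the TCP commit title: a socket that is already closed (for example, after a Reset was received) is released without sending another Reset and without re-entering the CLOSED state, the transition that triggered the warning. The hyp_* names and the counters are illustrative only and do not reproduce the actual dccp_close() diff.

```c
#include <assert.h>

enum hyp_state { HYP_OPEN, HYP_CLOSED };

struct hyp_sock {
	enum hyp_state state;
	int resets_sent;	/* simulated Reset packets */
	int bad_transitions;	/* CLOSED -> CLOSED transitions (would WARN) */
};

static void hyp_set_state(struct hyp_sock *sk, enum hyp_state new_state)
{
	if (sk->state == HYP_CLOSED && new_state == HYP_CLOSED)
		sk->bad_transitions++;
	sk->state = new_state;
}

/* Close path with the early exit for already-closed sockets. */
static void hyp_close(struct hyp_sock *sk, int data_was_unread)
{
	if (sk->state == HYP_CLOSED)
		return;			/* nothing to reset, nothing to change */
	if (data_was_unread)
		sk->resets_sent++;	/* abort the live connection */
	hyp_set_state(sk, HYP_CLOSED);
}
```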
     
  • Andrey Konovalov reported the following error while fuzzing with syzkaller:

    IPv4: Attempt to release alive inet socket ffff880068e98940
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Modules linked in:
    CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff88006b9e0000 task.stack: ffff880068770000
    RIP: 0010:[] [] selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
    RSP: 0018:ffff8800687771c8 EFLAGS: 00010202
    RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
    RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
    RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
    R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
    R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
    FS: 00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
    Stack:
    ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
    ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
    ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
    Call Trace:
    [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
    [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
    [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
    [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
    [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216
    [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
    [< inline >] NF_HOOK ./include/linux/netfilter.h:255
    [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
    [< inline >] dst_input ./include/net/dst.h:507
    [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
    [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
    [< inline >] NF_HOOK ./include/linux/netfilter.h:255
    [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
    [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
    [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
    [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
    [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
    [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
    [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
    [< inline >] new_sync_write fs/read_write.c:499
    [] __vfs_write+0x334/0x570 fs/read_write.c:512
    [] vfs_write+0x17b/0x500 fs/read_write.c:560
    [< inline >] SYSC_write fs/read_write.c:607
    [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
    [] entry_SYSCALL_64_fastpath+0x1f/0xc2

    It turns out DCCP calls __sk_receive_skb(), and this broke when
    lookups no longer took a reference on listeners.

    Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
    so that sock_put() is used only when needed.

    Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
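    The @refcounted idea can be sketched as follows: the lookup tells the caller whether it took a reference (listeners are now found without one), and the receive path drops a reference only when it actually owns one. The structure and function names are hypothetical simplifications of __sk_receive_skb() and sock_put().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket with a plain reference count. */
struct hyp_sock {
	int refcnt;
	int rx_queued;
};

static void hyp_sock_put(struct hyp_sock *sk)
{
	sk->refcnt--;	/* real code frees the socket when this reaches zero */
}

/* Queue the packet, then release the lookup's reference only if one
 * was taken; a listener found without a reference is left untouched. */
static void hyp_receive_skb(struct hyp_sock *sk, bool refcounted)
{
	sk->rx_queued++;
	if (refcounted)
		hyp_sock_put(sk);
}
```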
     

30 Jul, 2016

1 commit

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - TPM core and driver updates/fixes
    - IPv6 security labeling (CALIPSO)
    - Lots of Apparmor fixes
    - Seccomp: remove 2-phase API, close hole where ptrace can change
    syscall #"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
    apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
    tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
    tpm: Factor out common startup code
    tpm: use devm_add_action_or_reset
    tpm2_i2c_nuvoton: add irq validity check
    tpm: read burstcount from TPM_STS in one 32-bit transaction
    tpm: fix byte-order for the value read by tpm2_get_tpm_pt
    tpm_tis_core: convert max timeouts from msec to jiffies
    apparmor: fix arg_size computation for when setprocattr is null terminated
    apparmor: fix oops, validate buffer size in apparmor_setprocattr()
    apparmor: do not expose kernel stack
    apparmor: fix module parameters can be changed after policy is locked
    apparmor: fix oops in profile_unpack() when policy_db is not present
    apparmor: don't check for vmalloc_addr if kvzalloc() failed
    apparmor: add missing id bounds check on dfa verification
    apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
    apparmor: use list_next_entry instead of list_entry_next
    apparmor: fix refcount race when finding a child profile
    apparmor: fix ref count leak when profile sha1 hash is read
    apparmor: check that xindex is in trans_table bounds
    ...

    Linus Torvalds
     

14 Jul, 2016

1 commit

  • DCCP verifies packet integrity, including length, at initial rcv in
    dccp_invalid_packet() and later pulls headers in dccp_enqueue_skb().

    A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
    skb_copy_datagram_msg interprets this as a negative value, so
    (correctly) fails with EFAULT. The negative length is reported in
    ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

    Introduce an sk_receive_skb variant that caps how small a filter
    program can trim packets, and call this in dccp with the header
    length. Excessively trimmed packets are now processed normally and
    queued for reception as 0B payloads.

    Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
    Signed-off-by: Willem de Bruijn
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Willem de Bruijn
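    A sketch of the wrap-around and of the cap, using size_t lengths. The helper names are hypothetical; the 12-byte header length is an assumption chosen for the example, and the kernel's "filter returned 0 means drop" case is not modelled. Capping at the header length means that pulling the header afterwards leaves a 0-byte payload instead of a wrapped, huge length.

```c
#include <assert.h>
#include <stddef.h>

/* Uncapped trim: the filter may keep fewer bytes than the header. */
static size_t trim_to(size_t len, size_t filter_len)
{
	return filter_len < len ? filter_len : len;
}

/* Capped trim: keep max(cap, filter_len), never growing the packet. */
static size_t trim_with_cap(size_t len, size_t filter_len, size_t cap)
{
	size_t keep = filter_len > cap ? filter_len : cap;
	return keep < len ? keep : len;
}

/* Naive header pull: wraps around if len < hdr_len. */
static size_t payload_len(size_t len, size_t hdr_len)
{
	return len - hdr_len;
}
```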
     

10 Jul, 2016

1 commit

  • In the prep work I did before enabling BH while handling socket backlog,
    I missed two points in DCCP:

    1) dccp_v4_ctl_send_reset() uses bh_lock_sock(), assuming BH were
    blocked. This is no longer always true.

    2) dccp_v4_route_skb() was using __IP_INC_STATS() instead of
    IP_INC_STATS().

    A similar fix was done for TCP, in commit 47dcc20a39d0
    ("ipv4: tcp: ip_send_unicast_reply() is not BH safe")

    Fixes: 7309f8821fd6 ("dccp: do not assume DCCP code is non preemptible")
    Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Jul, 2016

1 commit


28 Jun, 2016

1 commit

  • If set, the IPv6 options attached to the request socket will take
    precedence over the parent's options during both sending and child
    creation. If they're not set, the parent's options (if any) will be used.

    This is to allow the security_inet_conn_request() hook to modify the
    IPv6 options in just the same way that it already may do for IPv4.

    Signed-off-by: Huw Davies
    Signed-off-by: Paul Moore

    Huw Davies
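    The precedence rule above amounts to a simple fallback. This sketch uses a hypothetical struct hyp_txopts in place of struct ipv6_txoptions and a hypothetical pick_txopts() helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ipv6_txoptions. */
struct hyp_txopts {
	int id;
};

/* Request-socket options (e.g. set by a security hook) win; otherwise
 * use the parent's options, which may themselves be absent (NULL). */
static const struct hyp_txopts *
pick_txopts(const struct hyp_txopts *req_opt, const struct hyp_txopts *parent_opt)
{
	return req_opt ? req_opt : parent_opt;
}
```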
     

03 May, 2016

1 commit


28 Apr, 2016

6 commits


08 Apr, 2016

1 commit


05 Apr, 2016

1 commit

  • When a SYNFLOOD targets a non-SO_REUSEPORT listener, multiple
    cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.

    By letting listeners use SOCK_RCU_FREE infrastructure,
    we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt.

    Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
    only listeners are impacted by this change.

    Peak performance under SYNFLOOD is increased by ~33%:

    On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps.

    Most consuming functions are now skb_set_owner_w() and sock_wfree()
    contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Mar, 2016

1 commit

  • Now that SYN_RECV request sockets are installed in the ehash table, an
    ICMP handler can find a request socket while another cpu handles an
    incoming packet transforming this SYN_RECV request socket into an
    ESTABLISHED socket.

    We need to remove the now-obsolete WARN_ON(req->sk), since req->sk
    is set when a new child is created and added into the listener's
    accept queue.

    If this race happens, the ICMP handler will do nothing special.

    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Reported-by: Ben Lazarus
    Reported-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Feb, 2016

1 commit


19 Feb, 2016

1 commit

  • Ilya reported the following lockdep splat:

    kernel: =========================
    kernel: [ BUG: held lock freed! ]
    kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
    kernel: -------------------------
    kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
    kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [] inet_csk_reqsk_queue_add+0x28/0xa0
    kernel: 4 locks held by swapper/5/0:
    kernel: #0: (rcu_read_lock){......}, at: [] netif_receive_skb_internal+0x4b/0x1f0
    kernel: #1: (rcu_read_lock){......}, at: [] ip_local_deliver_finish+0x3f/0x380
    kernel: #2: (slock-AF_INET){+.-...}, at: [] sk_clone_lock+0x19b/0x440
    kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [] inet_csk_reqsk_queue_add+0x28/0xa0

    To properly fix this issue, inet_csk_reqsk_queue_add() needs
    to tell its callers whether the child has been queued
    into the accept queue.

    We also need to make sure the listener is still there before
    calling sk->sk_data_ready(), by holding a reference on it,
    since the reference carried by the child can disappear as
    soon as the child is put on the accept queue.

    Reported-by: Ilya Dryomov
    Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
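    The two parts of the fix can be sketched together: the queueing function reports whether the child was queued, and the caller pins the listener with its own reference before waking it up. Everything here (the hyp_* names, the listening flag standing in for a listener being dismantled, the wakeups counter standing in for sk->sk_data_ready()) is a hypothetical simplification.

```c
#include <assert.h>
#include <stdbool.h>

struct hyp_sock {
	int refcnt;
	bool listening;
	int accept_len;
	int wakeups;
};

static void hyp_hold(struct hyp_sock *sk) { sk->refcnt++; }
static void hyp_put(struct hyp_sock *sk) { sk->refcnt--; }

/* Report whether the child was queued; a dismantled listener refuses it. */
static bool hyp_reqsk_queue_add(struct hyp_sock *listener)
{
	if (!listener->listening)
		return false;
	listener->accept_len++;
	return true;
}

/* Caller side: hold our own reference, because the child's reference to
 * the listener may vanish as soon as the child is on the accept queue. */
static bool hyp_child_ready(struct hyp_sock *listener)
{
	bool queued;

	hyp_hold(listener);
	queued = hyp_reqsk_queue_add(listener);
	if (queued)
		listener->wakeups++;	/* stands in for sk->sk_data_ready() */
	hyp_put(listener);
	return queued;
}
```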
     

11 Feb, 2016

2 commits

  • This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
    groups. Doing so with a BPF filter will require access to the
    skb in question. This change plumbs the skb (and offset to payload
    data) through the call stack to the listening socket lookup
    implementations where it will be used in a following patch.

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
     
  • In order to support fast lookups for TCP sockets with SO_REUSEPORT,
    the function that adds sockets to the listening hash set needs
    to be able to check receive address equality. Since this equality
    check is different for IPv4 and IPv6, we will need two different
    socket hashing functions.

    This patch adds inet6_hash, identical to the existing inet_hash function,
    and updates the appropriate references. A following patch will
    differentiate the two by passing different comparison functions to
    __inet_hash.

    Additionally, in order to use the IPv6 address equality function from
    inet6_hashtables (which is compiled as a built-in object when IPv6 is
    enabled), it needs to be in a built-in object file as well. This patch
    moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
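    Passing a family-specific comparison function to one shared routine can be sketched with a function pointer. The address type and helpers below are hypothetical and much simpler than the kernel's ipv6_rcv_saddr_equal(), which also handles wildcard and v4-mapped addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound address: family (4 or 6) plus raw bytes. */
struct hyp_addr {
	int family;
	uint8_t bytes[16];
};

typedef bool (*saddr_equal_fn)(const struct hyp_addr *a, const struct hyp_addr *b);

static bool v4_equal(const struct hyp_addr *a, const struct hyp_addr *b)
{
	return a->family == 4 && b->family == 4 && memcmp(a->bytes, b->bytes, 4) == 0;
}

static bool v6_equal(const struct hyp_addr *a, const struct hyp_addr *b)
{
	return a->family == 6 && b->family == 6 && memcmp(a->bytes, b->bytes, 16) == 0;
}

/* Shared routine: index of a listener bound to the same address, or -1. */
static int find_same_addr(const struct hyp_addr *table, size_t n,
			  const struct hyp_addr *addr, saddr_equal_fn eq)
{
	for (size_t i = 0; i < n; i++)
		if (eq(&table[i], addr))
			return (int)i;
	return -1;
}
```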
     

04 Dec, 2015

2 commits

  • Conflicts:
    drivers/net/ethernet/renesas/ravb_main.c
    kernel/bpf/syscall.c
    net/ipv4/ipmr.c

    All three conflicts were cases of overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • While testing the np->opt RCU conversion, I found that UDP/IPv6 was
    using a mixture of xchg() and sk_dst_lock to protect concurrent changes
    to sk->sk_dst_cache, leading to possible corruptions and crashes.

    ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
    way to fix the mess is to remove sk_dst_lock completely, as we did for
    IPv4.

    __ip6_dst_store() and ip6_dst_store() share the same implementation.

    Since sk_setup_caps() can be called with or without the socket lock
    held, we have to use sk_dst_set() instead of __sk_dst_set().

    Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
    in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

    This is because ip6_dst_store() can be called from process context,
    without any lock held.

    As soon as the dst is installed in sk->sk_dst_cache, it can be freed
    by another cpu doing a concurrent ip6_dst_store().

    Dereferencing the dst before installing it is needed to make sure
    no use-after-free can trigger.

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
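    The ordering rule can be sketched with C11 atomics: read everything needed from the dst first, then publish it. The structures are hypothetical, and freeing the previous entry immediately is a simplification; real code relies on reference counting and RCU to defer the release until readers are done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct hyp_dst {
	unsigned int cookie;
};

struct hyp_sock {
	_Atomic(struct hyp_dst *) dst_cache;
	unsigned int dst_cookie;
};

static void hyp_dst_store(struct hyp_sock *sk, struct hyp_dst *dst)
{
	struct hyp_dst *old;

	sk->dst_cookie = dst->cookie;			/* 1: dereference first */
	old = atomic_exchange(&sk->dst_cache, dst);	/* 2: then publish */
	free(old);	/* simplified release of the previous entry */
}
```

    Swapping steps 1 and 2 would let a concurrent store on another cpu replace and free dst between the exchange and the cookie read.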
     

03 Dec, 2015

1 commit

  • This patch addresses multiple problems:

    UDP/RAW sendmsg() needs to get a stable struct ipv6_txoptions
    while the socket is not locked: other threads can change np->opt
    concurrently. Dmitry posted a syzkaller
    (http://github.com/google/syzkaller) program demonstrating a
    use-after-free.

    Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
    and dccp_v6_request_recv_sock() also need to use RCU protection
    to dereference np->opt once (before calling ipv6_dup_options()).

    This patch adds full RCU protection to np->opt.

    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Dec, 2015

1 commit

  • This patch is a cleanup to make the following patch easier to
    review.

    The goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
    from (struct socket)->flags to (struct socket_wq)->flags
    to benefit from RCU protection in sock_wake_async().

    To ease backports, we rename both constants.

    Two new helpers, sk_set_bit(int nr, struct sock *sk)
    and sk_clear_bit(int nr, struct sock *sk), are added so that
    the following patch can change their implementation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
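    Helpers in the spirit of sk_set_bit()/sk_clear_bit() can be sketched with C11 atomic bit operations. The flag numbers, struct hyp_socket_wq and the hyp_* helpers are hypothetical; the point is that callers name only a bit and a socket, so where the flags live can change in one place.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag numbers. */
enum { HYP_ASYNC_NOSPACE = 0, HYP_ASYNC_WAITDATA = 1 };

struct hyp_socket_wq {
	atomic_ulong flags;
};

struct hyp_sock {
	struct hyp_socket_wq *wq;
};

static void hyp_sk_set_bit(int nr, struct hyp_sock *sk)
{
	atomic_fetch_or(&sk->wq->flags, 1UL << nr);
}

static void hyp_sk_clear_bit(int nr, struct hyp_sock *sk)
{
	atomic_fetch_and(&sk->wq->flags, ~(1UL << nr));
}
```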
     

01 Dec, 2015

1 commit

  • The memory barrier in the helper wq_has_sleeper is needed by just
    about every user of waitqueue_active. This patch generalises it
    by making it take a wait_queue_head_t directly. The existing
    helper is renamed to skwq_has_sleeper.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
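    A sketch of a wq_has_sleeper()-style helper that takes the wait-queue head directly. A C11 sequentially consistent fence stands in for the kernel's smp_mb(); it orders the waker's earlier writes against the check for sleepers and must pair with a barrier on the sleeper's side. The hyp_* types are hypothetical, with a sleeper count replacing the real wait list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wait-queue head: just a count of sleepers. */
struct hyp_wait_queue_head {
	atomic_int nr_waiters;
};

static bool hyp_waitqueue_active(struct hyp_wait_queue_head *wq)
{
	return atomic_load_explicit(&wq->nr_waiters, memory_order_relaxed) != 0;
}

static bool hyp_wq_has_sleeper(struct hyp_wait_queue_head *wq)
{
	atomic_thread_fence(memory_order_seq_cst);	/* plays the role of smp_mb() */
	return hyp_waitqueue_active(wq);
}
```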
     

03 Nov, 2015

1 commit

  • IPv6 request sockets store a pointer to the skb containing the SYN
    packet to be able to transfer it to the full-blown socket when the 3WHS
    is done (ireq->pktopts -> np->pktoptions).

    As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for
    passive sessions"), when multiple cpus receive the 'ack' packet
    completing the 3WHS at the same time, we must transfer the skb only if
    we won the hashdance race.

    Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Nov, 2015

1 commit

  • This patch changes the use of struct timespec in
    dccp_probe to use struct timespec64 instead. timespec uses a 32-bit
    seconds field which will overflow in the year 2038 and beyond. timespec64
    uses a 64-bit seconds field. Note that the correctness of the code isn't
    changed, since the original code only uses the timestamps to compute a
    small elapsed interval. This patch is part of a larger attempt to remove
    instances of 32-bit timekeeping structures (timespec, timeval, time_t)
    from the kernel so it is easier to identify where the real 2038 issues
    are.

    Signed-off-by: Tina Ruchandani
    Signed-off-by: David S. Miller

    Tina Ruchandani
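    The elapsed-interval computation the patch relies on can be sketched with a 64-bit seconds field. struct hyp_ts64 is a hypothetical stand-in for the kernel's struct timespec64; the example interval straddles the last second representable in a signed 32-bit time_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit-seconds timestamp. */
struct hyp_ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

#define HYP_NSEC_PER_SEC 1000000000LL

/* Elapsed nanoseconds from start to end (end >= start). */
static int64_t hyp_elapsed_ns(struct hyp_ts64 start, struct hyp_ts64 end)
{
	return (end.tv_sec - start.tv_sec) * HYP_NSEC_PER_SEC +
	       (end.tv_nsec - start.tv_nsec);
}
```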
     

23 Oct, 2015

1 commit

  • Multiple cpus can process duplicates of incoming ACK messages
    matching a SYN_RECV request socket. This is a rare event under
    normal operations, but definitely can happen.

    Only one must win the race; otherwise corruption would occur.

    To fix this without adding new atomic ops, we use logic in
    inet_ehash_nolisten() to detect whether the request was present in the
    same ehash bucket where we try to insert the new child.

    If the request socket was not found, we have to undo the child creation.

    This actually removes a spin_lock()/spin_unlock() pair in
    reqsk_queue_unlink() for the fast path.

    Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
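    The "only one cpu wins" property can be illustrated with a compare-and-swap on a single hypothetical hash slot: the child replaces the request only if the request is still there, and the loser must undo its child. Note that this swaps in a different technique: the commit explicitly avoids new atomic ops and instead detects the request inside inet_ehash_nolisten(). The slot and hyp_* names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct hyp_sock {
	int id;
};

/* One hypothetical ehash slot, holding the request or its child. */
struct hyp_slot {
	_Atomic(struct hyp_sock *) sk;
};

/* Returns true for the single winner; losers must destroy their child. */
static bool hyp_install_child(struct hyp_slot *slot, struct hyp_sock *req,
			      struct hyp_sock *child)
{
	struct hyp_sock *expected = req;

	return atomic_compare_exchange_strong(&slot->sk, &expected, child);
}
```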
     

16 Oct, 2015

2 commits

  • Let's reduce the confusion about inet_csk_reqsk_queue_drop():
    in many cases we also need to release the reference on the request
    socket, so add a helper to do this, reducing code size and complexity.

    Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
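    The split between "drop" and "drop and put" can be sketched with explicit reference counts: unlinking releases the reference owned by the queue or hash, while the helper additionally releases the reference the caller obtained from its lookup. The hyp_* names and the single queued flag are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

struct hyp_request_sock {
	int refcnt;
	bool queued;	/* still linked in the listener's queue/hash */
};

static void hyp_reqsk_put(struct hyp_request_sock *req)
{
	req->refcnt--;	/* real code frees the request at zero */
}

/* Unlink and drop the link's own reference; the caller's stays intact. */
static void hyp_reqsk_queue_drop(struct hyp_request_sock *req)
{
	if (req->queued) {
		req->queued = false;
		hyp_reqsk_put(req);
	}
}

/* Unlink, then also release the reference held by the caller. */
static void hyp_reqsk_queue_drop_and_put(struct hyp_request_sock *req)
{
	hyp_reqsk_queue_drop(req);
	hyp_reqsk_put(req);
}
```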
     
  • This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14.

    At the time of above commit, tcp_req_err() and dccp_req_err()
    were dead code, as SYN_RECV request sockets were not yet in ehash table.

    The real bug was fixed later in a different commit.

    We need to revert it to avoid leaking a refcount on the request socket.

    inet_csk_reqsk_queue_drop_and_put() will be added in the following
    commit to make it clear that inet_csk_reqsk_queue_drop() does not
    release the reference owned by the caller.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Oct, 2015

1 commit

  • When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets
    become stale, meaning the 3WHS cannot complete.

    But the current behavior is wrong:
    incoming packets finding such stale sockets are dropped.

    Instead, we need to clean up the request socket and perform another
    lookup:
    - An incoming ACK will be answered with a RST.
    - A SYN rtx might find another listener, if available.
    - We expedite cleanup of request sockets and the old listener socket.

    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2015

1 commit

  • This patch makes dccp_bad_service_code return bool, since these
    particular functions only use either one or zero as their return
    value.

    dccp_list_has_service has also been made to return bool in this
    patchset.

    No functional change.

    Signed-off-by: Yaowei Bai
    Signed-off-by: David S. Miller

    Yaowei Bai
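    A sketch of a yes/no lookup returning bool rather than int, loosely modelled on a service-code list check. struct hyp_service_list and hyp_list_has_service() are hypothetical and do not reproduce the kernel's DCCP structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical service-code list. */
struct hyp_service_list {
	uint32_t nr;
	const uint32_t *codes;
};

/* bool documents that the result is a membership answer, not a count
 * or an error code. */
static bool hyp_list_has_service(const struct hyp_service_list *sl, uint32_t service)
{
	for (uint32_t i = 0; i < sl->nr; i++)
		if (sl->codes[i] == service)
			return true;
	return false;
}
```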
     

05 Oct, 2015

1 commit

  • inet_reqsk_alloc() is used to allocate a temporary request
    in order to generate a SYNACK with a cookie. Then later,
    syncookie validation also uses a temporary request.

    Since these paths already took a reference on the listener refcount,
    we can avoid a couple of atomic operations.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Oct, 2015

1 commit

  • In this patch, we insert request sockets into TCP/DCCP
    regular ehash table (where ESTABLISHED and TIMEWAIT sockets
    are) instead of using the per listener hash table.

    ACK packets find SYN_RECV pseudo sockets without having
    to find and lock the listener.

    In nominal conditions, this halves pressure on listener lock.

    Note that this will allow for SO_REUSEPORT refinements,
    so that we can select a listener using cpu/numa affinities instead
    of the prior 'consistent hash', since only SYN packets will
    apply this selection logic.

    We will shrink listen_sock in the following patch to ease
    code review.

    Signed-off-by: Eric Dumazet
    Cc: Ying Cai
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet