31 Aug, 2022

3 commits

  • commit 3ef3905aa3b5b3e222ee6eb0210bfd999417a8cc upstream.

    Got a crash when doing a pressure test of mptcp:

    ===========================================================================
    dst_release: dst:ffffa06ce6e5c058 refcnt:-1
    kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
    BUG: unable to handle kernel paging request at ffffa06ce6e5c058
    PGD 190a01067 P4D 190a01067 PUD 43fffb067 PMD 22e403063 PTE 8000000226e5c063
    Oops: 0011 [#1] SMP PTI
    CPU: 7 PID: 7823 Comm: kworker/7:0 Kdump: loaded Tainted: G E
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.2.1 04/01/2014
    Call Trace:
    ? skb_release_head_state+0x68/0x100
    ? skb_release_all+0xe/0x30
    ? kfree_skb+0x32/0xa0
    ? mptcp_sendmsg_frag+0x57e/0x750
    ? __mptcp_retrans+0x21b/0x3c0
    ? __switch_to_asm+0x35/0x70
    ? mptcp_worker+0x25e/0x320
    ? process_one_work+0x1a7/0x360
    ? worker_thread+0x30/0x390
    ? create_worker+0x1a0/0x1a0
    ? kthread+0x112/0x130
    ? kthread_flush_work_fn+0x10/0x10
    ? ret_from_fork+0x35/0x40
    ===========================================================================

    In __mptcp_alloc_tx_skb() the skb is allocated and skb->tcp_tsorted_anchor
    is initialized. Under memory pressure sk_wmem_schedule() returns false and
    the skb is released with kfree_skb(). At that point skb->_skb_refdst is not
    NULL, because _skb_refdst and tcp_tsorted_anchor are stored in the same
    memory, so kfree_skb() tries to release a dst and causes the crash.
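
    The aliasing described above can be reproduced in user space. What follows
    is a minimal sketch, not the kernel's struct sk_buff: fake_skb, list_node,
    init_anchor() and clear_anchor() are hypothetical names, and only the idea
    (a list head sharing storage with a refdst word through a union) is taken
    from the commit text.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a kernel list head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical, heavily reduced skb: both members share the same storage. */
struct fake_skb {
    union {
        uintptr_t refdst;          /* plays the role of _skb_refdst */
        struct list_node tsorted;  /* plays the role of tcp_tsorted_anchor */
    };
};

/* Initializing the list head makes it point at itself, so the aliased
 * refdst word is no longer zero afterwards. */
static void init_anchor(struct fake_skb *skb)
{
    skb->tsorted.next = &skb->tsorted;
    skb->tsorted.prev = &skb->tsorted;
}

/* One possible way (an assumption of this sketch) to make a later free
 * path safe again: wipe the anchor before refdst is interpreted. */
static void clear_anchor(struct fake_skb *skb)
{
    memset(&skb->tsorted, 0, sizeof(skb->tsorted));
}
```

    After init_anchor() a free routine that tests refdst for non-zero would
    believe a dst reference is attached, which mirrors the failure mode in the
    trace.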

    Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache")
    Reviewed-by: Paolo Abeni
    Signed-off-by: Yonglong Li
    Signed-off-by: Mat Martineau
    Link: https://lore.kernel.org/r/20220317220953.426024-1-mathew.j.martineau@linux.intel.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Yonglong Li
     
  • [ Upstream commit 657b991afb89d25fe6c4783b1b75a8ad4563670d ]

    While reading sysctl_max_skb_frags, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.
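
    For readers unfamiliar with the pattern, here is a user-space
    approximation. It is only a sketch: READ_ONCE_SKETCH, WRITE_ONCE_SKETCH,
    max_frags_sketch and frags_allowed() are hypothetical names, the macros
    assume the GCC/Clang __typeof__ extension, and the initial value is
    arbitrary. The point is that the tunable is loaded exactly once and the
    local copy is used for the rest of the function.

```c
/* Approximation of the kernel helpers for scalar types (assumption:
 * a compiler that supports __typeof__, such as GCC or Clang). */
#define READ_ONCE_SKETCH(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical tunable that another context may rewrite at any time. */
static int max_frags_sketch = 17;

/* Take one snapshot of the tunable and decide based on that snapshot. */
static int frags_allowed(int nr_frags)
{
    int limit = READ_ONCE_SKETCH(max_frags_sketch);

    return nr_frags < limit;
}
```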

    Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit f70cad1085d1e01d3ec73c1078405f906237feee ]

    We want to revert the skb TX cache, but MPTCP is currently
    using it unconditionally.

    Rework the MPTCP tx code, so that tcp_tx_skb_cache is not
    needed anymore: do the whole coalescing check, skb allocation,
    and skb initialization/update inside mptcp_sendmsg_frag(), much
    like the current TCP code.

    Reviewed-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     

03 Aug, 2022

2 commits

  • [ Upstream commit 02739545951ad4c1215160db7fbf9b7a918d3c0b ]

    While reading these sysctl variables, they can be changed concurrently.
    Thus, we need to add READ_ONCE() to their readers.

    - .sysctl_rmem
    - .sysctl_wmem
    - .sysctl_rmem_offset
    - .sysctl_wmem_offset
    - sysctl_tcp_rmem[1, 2]
    - sysctl_tcp_wmem[1, 2]
    - sysctl_decnet_rmem[1]
    - sysctl_decnet_wmem[1]
    - sysctl_tipc_rmem[1]

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • commit 780476488844e070580bfc9e3bc7832ec1cea883 upstream.

    While reading sysctl_tcp_moderate_rcvbuf, it can be changed
    concurrently. Thus, we need to add READ_ONCE() to its readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Kuniyuki Iwashima
     

09 Jun, 2022

1 commit

  • [ Upstream commit 0e203c324752e13d22624ab7ffafe934fa06ab50 ]

    Similar to the previous patch, for priority changes
    requested by the local PM.

    Reported-and-suggested-by: Davide Caratti
    Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     

25 May, 2022

4 commits

  • commit ae66fb2ba6c3dcaf8b9612b65aa949a1a4bed150 upstream.

    RFC 8684 section 3.7 describes several opportunities for a MPTCP
    connection to "fall back" to regular TCP early in the connection
    process, before it has been confirmed that MPTCP options can be
    successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
    acknowledges the first received data packet with a regular TCP header
    (no MPTCP options), fallback is allowed.

    If the recipient of that first data packet finds a MPTCP DSS checksum
    error, this provides an opportunity to fail gracefully with a TCP
    fallback rather than resetting the connection (as might happen if a
    checksum failure were detected later).

    This commit modifies the checksum failure code to attempt fallback on
    the initial subflow of a MPTCP connection, only if it's a failure in the
    first data mapping. In cases where the peer initiates the connection,
    requests checksums, is the first to send data, and the peer is sending
    incorrect checksums (see
    https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
    the connection to proceed as TCP rather than reset.
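
    The decision described above boils down to a two-condition predicate. The
    sketch below is an illustration under assumptions: csum_subflow,
    csum_action and on_csum_failure() are hypothetical names and do not
    correspond to the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical per-subflow state relevant to the decision. */
struct csum_subflow {
    bool is_initial;          /* first subflow of the MPTCP connection */
    bool first_mapping_done;  /* first data mapping already validated */
};

enum csum_action {
    CSUM_FALLBACK_TO_TCP,  /* continue as plain TCP */
    CSUM_FAIL              /* non-fallback path, for example a reset */
};

/* Fallback is attempted only on the initial subflow and only when the
 * checksum failure hits the very first data mapping. */
static enum csum_action on_csum_failure(const struct csum_subflow *sf)
{
    if (sf->is_initial && !sf->first_mapping_done)
        return CSUM_FALLBACK_TO_TCP;
    return CSUM_FAIL;
}
```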

    Fixes: dd8bcd1768ff ("mptcp: validate the data checksum")
    Acked-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    [mathew.j.martineau: backport: Resolved bitfield conflict in protocol.h]
    Signed-off-by: Mat Martineau
    Signed-off-by: Greg Kroah-Hartman

    Mat Martineau
     
  • [ Upstream commit ba2c89e0ea74a904d5231643245753d77422e7f5 ]

    The MPTCP code typecasts the checksum value to u16 and
    then converts it to big endian while storing the value into
    the MPTCP option.

    As a result, the wire encoding on little-endian hosts is
    wrong, which causes interoperability issues with other
    implementations or with hosts of different endianness.

    Address the issue by writing the unmodified __sum16 value into the packet.
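
    Why the unmodified value is the right one can be shown in user space. The
    sketch below assumes a checksum computed over 16-bit words loaded in host
    memory order; all function names are hypothetical. Because the
    one's-complement sum is byte-order independent, storing the folded result
    without conversion yields the correct wire bytes on any host, while
    re-encoding it as a number swaps the bytes on little-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
static uint16_t fold_and_invert(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over 16-bit words loaded in host memory order (len even). */
static uint16_t csum_host_order(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t word;

        memcpy(&word, buf + i, sizeof(word));
        sum += word;
    }
    return fold_and_invert(sum);
}

/* Reference: the same checksum computed on big-endian numeric values. */
static uint16_t csum_wire_value(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    return fold_and_invert(sum);
}

/* Correct: copy the host-order result into the packet unmodified. */
static void put_csum_unmodified(uint8_t *dst, uint16_t csum)
{
    memcpy(dst, &csum, sizeof(csum));
}

/* Wrong on little-endian hosts: treats the result as a plain number
 * and re-encodes it as big endian, which swaps the two bytes. */
static void put_csum_reencoded(uint8_t *dst, uint16_t csum)
{
    dst[0] = (uint8_t)(csum >> 8);
    dst[1] = (uint8_t)(csum & 0xff);
}
```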

    The MPTCP checksum is disabled by default; interoperating with systems
    that use the bad MPTCP-level csum encoding should cause fallback to TCP.

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275
    Fixes: c5b39e26d003 ("mptcp: send out checksum for DSS")
    Fixes: 390b95a5fb84 ("mptcp: receive checksum for DSS")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     
  • [ Upstream commit 8401e87f5a36d370cbf1e9d4ba602a553ce9324a ]

    This patch reused __mptcp_make_csum() in validate_data_csum() instead of
    open-coding.

    Signed-off-by: Geliang Tang
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Geliang Tang
     
  • [ Upstream commit c312ee219100e86143a1d3cc10b367bc43a0e0b8 ]

    This patch changed the type of the last parameter of __mptcp_make_csum()
    from __sum16 to __wsum. And export this function in protocol.h.

    Signed-off-by: Geliang Tang
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Geliang Tang
     

09 Mar, 2022

1 commit

  • commit 877d11f0332cd2160e19e3313e262754c321fa36 upstream.

    Syzkaller with UBSAN uncovered a scenario where a large number of
    DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
    timeout calculation:

    ================================================================================
    UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    Workqueue: events mptcp_worker
    Call Trace:

    __dump_stack lib/dump_stack.c:88 [inline]
    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
    ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
    __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
    mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
    __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
    mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
    process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
    worker_thread+0x95/0xe10 kernel/workqueue.c:2454
    kthread+0x2f4/0x3b0 kernel/kthread.c:377
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

    ================================================================================

    This change limits the maximum timeout by limiting the size of the
    shift, which keeps all intermediate values in-bounds.
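
    The clamping idea can be sketched as follows. This is an illustration
    under assumptions, not the kernel code: RTO_MIN_MS, RTO_MAX_MS,
    ilog2_u32() and datafin_timeout_ms() are hypothetical names and the
    constants are chosen only for the example.

```c
#include <stdint.h>

#define RTO_MIN_MS 200u     /* illustrative base timeout */
#define RTO_MAX_MS 120000u  /* illustrative upper bound */

/* Integer log2 for v > 0: position of the highest set bit. */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Exponential backoff with the shift count capped so that the shifted
 * value can neither overflow nor exceed the upper bound. */
static uint32_t datafin_timeout_ms(uint32_t retransmits)
{
    uint32_t max_shift = ilog2_u32(RTO_MAX_MS / RTO_MIN_MS);
    uint32_t timeout;

    if (retransmits > max_shift)
        retransmits = max_shift;
    timeout = RTO_MIN_MS << retransmits;
    return timeout < RTO_MAX_MS ? timeout : RTO_MAX_MS;
}
```

    With these constants the shift is capped at 9, so a retransmit count of
    32 (the value in the UBSAN report) no longer reaches the shift operator.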

    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
    Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN")
    Acked-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Mat Martineau
     

02 Mar, 2022

2 commits

  • commit f73c1194634506ab60af0debef04671fc431a435 upstream.

    The MPTCP in-kernel path manager has some constraints on the
    processing of incoming address announcements, so in edge scenarios
    it can end up dropping (ignoring) some of them.

    The above is not very limiting in practice since such scenarios are
    very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.

    This patch adds a few MIB counters to account for such drop events
    to allow easier introspection of the critical scenarios.

    Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr")
    Reviewed-by: Matthieu Baerts
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • commit 837cf45df163a3780bc04b555700231e95b31dc9 upstream.

    If an MPTCP endpoint received multiple consecutive incoming
    ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
    the current remote address value after the PM lock is released
    in mptcp_pm_nl_add_addr_received() and before such address
    is echoed.

    Fix the issue by caching the remote address value a little earlier
    and always using the cached value after releasing the PM lock.

    Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr")
    Reviewed-by: Matthieu Baerts
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

16 Feb, 2022

1 commit

  • [ Upstream commit 029744cd4bc6e9eb3bd833b4a033348296d34645 ]

    This change updates mptcp_pm_nl_create_listen_socket() to create
    listening sockets bound to IPv6 addresses (where IPv6 is supported).

    Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
    Acked-by: Geliang Tang
    Signed-off-by: Kishen Maloor
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Kishen Maloor
     

09 Feb, 2022

1 commit

  • commit 8e9eacad7ec7a9cbf262649ebf1fa6e6f6cc7d82 upstream.

    The MPTCP endpoint list is under RCU protection, guarded by the
    pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
    without acquiring the spinlock and outside any RCU critical section.

    This change addresses the issue by performing the lookup and the
    endpoint update under the pernet spinlock.

    [The upstream commit had to handle a lookup_by_id variable that is only
    present in 5.17. This version of the patch removes that variable, so
    the __lookup_addr() function only handles the lookup as it is
    implemented in 5.15 and 5.16. It also removes one 'const' keyword to
    prevent a warning due to differing const-ness in the 5.17 version of
    addresses_equal().]

    Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

27 Jan, 2022

3 commits

  • [ Upstream commit 110b6d1fe98fd7af9893992459b651594d789293 ]

    'ptr += 1;' was omitted in the original code.

    If the DSS is the last option -- which is what we have most of the
    time -- that's not an issue. But it is if we need to send something else
    afterwards, such as an RM_ADDR or an MP_PRIO.
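
    The effect of the missing advance is easy to see with a toy option
    writer. This is a sketch only: put_word() and write_two_options() are
    hypothetical helpers, not the kernel's option-writing functions.

```c
#include <stdint.h>

/* Hypothetical writer: emits one 32-bit option word and returns the
 * next free slot in the option area. */
static uint32_t *put_word(uint32_t *ptr, uint32_t word)
{
    *ptr = word;
    ptr += 1;  /* without this advance the next option overwrites this one */
    return ptr;
}

/* Write two options back to back, chaining the returned pointer. */
static uint32_t *write_two_options(uint32_t *ptr, uint32_t first,
                                   uint32_t second)
{
    ptr = put_word(ptr, first);
    ptr = put_word(ptr, second);
    return ptr;
}
```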

    Fixes: 1bff1e43a30e ("mptcp: optimize out option generation")
    Reviewed-by: Matthieu Baerts
    Signed-off-by: Geliang Tang
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Geliang Tang
     
  • [ Upstream commit 04fac2cae9422a3401c172571afbcfdd58fa5c7e ]

    When these two options had to be sent -- which is not common -- the DSS
    size was not being taken into account in the remaining size.

    Additionally, in this situation the reported size covered only the
    MP_FAIL option, which can cause issues if we end up writing more
    into the TCP options than previously announced.

    Here we use a dedicated variable for MP_FAIL size to keep the
    WARN_ON_ONCE() just after.

    Fixes: c25aeb4e0953 ("mptcp: MP_FAIL suboption sending")
    Acked-and-tested-by: Geliang Tang
    Signed-off-by: Matthieu Baerts
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Matthieu Baerts
     
  • [ Upstream commit f7d6a237d7422809d458d754016de2844017cb4d ]

    Since full-mesh endpoint support, the reception of a single ADD_ADDR
    option can cause the creation of multiple subflows. When such an option
    is accepted we increment 'add_addr_accepted' by one. When we receive
    a paired RM_ADDR option, we delete all the relevant subflows,
    decrementing 'add_addr_accepted' by one for each of them.

    We have a similar issue for 'local_addr_used'.

    Fix both by moving the PM endpoint accounting outside the subflow
    traversal.
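
    The accounting change can be illustrated with a small model. The sketch
    below uses hypothetical types (pm_sketch, sf_entry) and a hypothetical
    remove_addr() helper; it only shows the counter being adjusted once per
    removed address rather than once per closed subflow.

```c
#include <stddef.h>

/* Hypothetical path-manager counters. */
struct pm_sketch {
    unsigned int add_addr_accepted;
};

/* Hypothetical subflow record keyed by the remote address id. */
struct sf_entry {
    unsigned char remote_id;
    int closed;
};

/* Close every subflow that uses the removed address and return how many
 * were closed. The counter is touched once, after the traversal. */
static unsigned int remove_addr(struct pm_sketch *pm, struct sf_entry *sfs,
                                size_t n, unsigned char rm_id)
{
    unsigned int removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (sfs[i].remote_id != rm_id || sfs[i].closed)
            continue;
        sfs[i].closed = 1;
        removed++;
    }

    /* One accepted ADD_ADDR maps to one decrement, however many
     * subflows it produced. */
    if (removed && pm->add_addr_accepted > 0)
        pm->add_addr_accepted--;
    return removed;
}
```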

    Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     

22 Dec, 2021

4 commits

  • [ Upstream commit 3d79e3756ca90f7a6087b77b62c1d9c0801e0820 ]

    __mptcp_push_pending() may call mptcp_flush_join_list() with subflow
    socket lock held. If such call hits mptcp_sockopt_sync_all() then
    subsequently __mptcp_sockopt_sync() could try to lock the subflow
    socket for itself, causing a deadlock.

    sysrq: Show Blocked State
    task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000
    Call Trace:

    __schedule+0x2d6/0x10c0
    ? __mod_memcg_state+0x4d/0x70
    ? csum_partial+0xd/0x20
    ? _raw_spin_lock_irqsave+0x26/0x50
    schedule+0x4e/0xc0
    __lock_sock+0x69/0x90
    ? do_wait_intr_irq+0xa0/0xa0
    __lock_sock_fast+0x35/0x50
    mptcp_sockopt_sync_all+0x38/0xc0
    __mptcp_push_pending+0x105/0x200
    mptcp_sendmsg+0x466/0x490
    sock_sendmsg+0x57/0x60
    __sys_sendto+0xf0/0x160
    ? do_wait_intr_irq+0xa0/0xa0
    ? fpregs_restore_userregs+0x12/0xd0
    __x64_sys_sendto+0x20/0x30
    do_syscall_64+0x38/0x90
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f9ba546c2d0
    RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
    RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
    RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
    R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8

    Fix the issue by using __mptcp_flush_join_list() instead of plain
    mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
    Florian. The sockopt sync will be deferred to the workqueue.

    Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244
    Suggested-by: Florian Westphal
    Reviewed-by: Florian Westphal
    Signed-off-by: Maxim Galaganov
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Maxim Galaganov
     
  • [ Upstream commit d6692b3b97bdc165d150f4c1505751a323a80717 ]

    The mptcp ULP extension relies on sk->sk_sock_kern being set correctly:
    It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
    working for plain tcp sockets (any userspace-exposed socket).

    But in case of fallback, accept() can return a plain tcp sk.
    In such case, sk is still tagged as 'kernel' and setsockopt will work.

    This will crash the kernel: the subflow extension has a NULL ctx->conn
    mptcp socket:

    BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
    Call Trace:
    tcp_data_ready+0xf8/0x370
    [..]

    Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • [ Upstream commit 404cd9a22150f24acf23a8df2ad0c094ba379f57 ]

    TCP_ULP setsockopt cannot be used for mptcp because it is already
    used internally to plumb subflow (tcp) sockets to the mptcp layer.

    syzbot managed to trigger a crash for mptcp connections that are
    in fallback mode:

    KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
    CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
    RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
    [..]
    __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
    tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
    do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
    mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638

    Remove support for TCP_ULP setsockopt.

    Fixes: d9e4c1291810 ("mptcp: only admit explicitly supported sockopt")
    Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Florian Westphal
     
  • [ Upstream commit b0cdc5dbcf2ba0d99785da5aabf1b17943805b8a ]

    Currently, when deleting an endpoint the netlink PM traverses
    all the local MPTCP sockets, regardless of their status.

    If an MPTCP listener socket is bound to the IP matching the
    deleted endpoint, the listener TCP socket will be closed.
    That is unexpected; the PM should only affect data subflows.

    Additionally, syzbot was able to trigger a NULL ptr dereference
    due to the above:

    general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
    CPU: 1 PID: 6550 Comm: syz-executor122 Not tainted 5.16.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__lock_acquire+0xd7d/0x54a0 kernel/locking/lockdep.c:4897
    Code: 0f 0e 41 be 01 00 00 00 0f 86 c8 00 00 00 89 05 69 cc 0f 0e e9 bd 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 3c 02 00 0f 85 f3 2f 00 00 48 81 3b 20 75 17 8f 0f 84 52 f3 ff
    RSP: 0018:ffffc90001f2f818 EFLAGS: 00010016
    RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000
    RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
    R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000000
    R13: ffff88801b98d700 R14: 0000000000000000 R15: 0000000000000001
    FS: 00007f177cd3d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f177cd1b268 CR3: 000000001dd55000 CR4: 0000000000350ee0
    Call Trace:

    lock_acquire kernel/locking/lockdep.c:5637 [inline]
    lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
    _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
    finish_wait+0xc0/0x270 kernel/sched/wait.c:400
    inet_csk_wait_for_connect net/ipv4/inet_connection_sock.c:464 [inline]
    inet_csk_accept+0x7de/0x9d0 net/ipv4/inet_connection_sock.c:497
    mptcp_accept+0xe5/0x500 net/mptcp/protocol.c:2865
    inet_accept+0xe4/0x7b0 net/ipv4/af_inet.c:739
    mptcp_stream_accept+0x2e7/0x10e0 net/mptcp/protocol.c:3345
    do_accept+0x382/0x510 net/socket.c:1773
    __sys_accept4_file+0x7e/0xe0 net/socket.c:1816
    __sys_accept4+0xb0/0x100 net/socket.c:1846
    __do_sys_accept net/socket.c:1864 [inline]
    __se_sys_accept net/socket.c:1861 [inline]
    __x64_sys_accept+0x71/0xb0 net/socket.c:1861
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f177cd8b8e9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f177cd3d308 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
    RAX: ffffffffffffffda RBX: 00007f177ce13408 RCX: 00007f177cd8b8e9
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00007f177ce13400 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f177ce1340c
    R13: 00007f177cde1004 R14: 6d705f706374706d R15: 0000000000022000

    Fix the issue by explicitly skipping MPTCP sockets in TCP_LISTEN
    status.

    Reported-and-tested-by: syzbot+e4d843bb96a9431e6331@syzkaller.appspotmail.com
    Reviewed-by: Mat Martineau
    Fixes: 740d798e8767 ("mptcp: remove id 0 address")
    Signed-off-by: Paolo Abeni
    Link: https://lore.kernel.org/r/ebc7594cdd420d241fb2172ddb8542ba64717657.1639238695.git.pabeni@redhat.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Sasha Levin

    Paolo Abeni
     

01 Dec, 2021

2 commits

  • [ Upstream commit bcd97734318d1d87bb237dbc0a60c81237b0ac50 ]

    Scheduling a delack in mptcp_established_options_mp() is
    not a good idea: such function is called by tcp_send_ack() and
    the pending delayed ack will be cleared shortly after by the
    tcp_event_ack_sent() call in __tcp_transmit_skb().

    Instead use the mptcp delegated action infrastructure to
    schedule the delayed ack after the current bh processing completes.

    Additionally, move the schedule_3rdack_retransmission() helper
    into protocol.c to avoid making it visible in a different compilation
    unit.

    Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
    Reviewed-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Paolo Abeni
     
  • [ Upstream commit ee50e67ba0e17b1a1a8d76691d02eadf9e0f392c ]

    To compute the rtx timeout schedule_3rdack_retransmission() does multiple
    things in the wrong way: srtt_us is measured in usec/8 and the timeout
    itself is an absolute value.
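
    The two corrections named above (unit conversion and absolute deadline)
    can be sketched in user space. The function name and the microsecond
    clock are assumptions made for illustration; only the facts that the
    smoothed RTT is stored scaled by 8 and that the timer expects an absolute
    expiry come from the text.

```c
#include <stdint.h>

/* srtt_us_x8 holds the smoothed RTT in units of 1/8 microsecond, so the
 * value must be shifted right by 3 to obtain microseconds. The returned
 * deadline is absolute: the current time plus the converted RTT. */
static uint64_t rtx_deadline_us(uint64_t now_us, uint32_t srtt_us_x8)
{
    uint64_t rtt_us = srtt_us_x8 >> 3;

    return now_us + rtt_us;
}
```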

    Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
    Acked-by: Paolo Abeni
    Reviewed-by: Mat Martineau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     

19 Nov, 2021

1 commit

  • [ Upstream commit 0d199e4363b482badcedba764e2aceab53a4a10a ]

    When recovering after a link failure, snd_nxt should not be set to a
    lower value. Else, update of snd_nxt is broken because:

    msk->snd_nxt += ret; (where ret is number of bytes sent)

    assumes that snd_nxt always moves forward.
    After a reduction, it's possible that the snd_nxt update gets out of
    sync: the dfrag we just sent might have had a data sequence number even
    past recovery_snd_nxt.

    This change factors the common msk state update to a helper
    and updates snd_nxt based on the current dfrag data sequence number.

    The conditional is required for the recovery phase where we may
    re-transmit old dfrags that are before current snd_nxt.

    After this change, snd_nxt only moves forward and covers all in-sequence
    data that was transmitted.

    recovery_snd_nxt is retained to detect when recovery has completed.
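
    A forward-only update of this kind can be sketched as below. The names
    msk_sketch and update_post_push() are hypothetical, and the wrap-safe
    comparison relies on the usual two's-complement conversion; the sketch
    illustrates the rule, it is not the helper added by the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for 64-bit sequence numbers. */
static bool after64(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) > 0;
}

/* Hypothetical, reduced connection state. */
struct msk_sketch {
    uint64_t snd_nxt;
};

/* Derive the candidate snd_nxt from the fragment's own sequence number
 * and only ever move snd_nxt forward. Retransmitting an old fragment
 * during recovery therefore leaves snd_nxt unchanged. */
static void update_post_push(struct msk_sketch *msk, uint64_t dfrag_seq,
                             uint64_t already_sent)
{
    uint64_t snd_nxt_new = dfrag_seq + already_sent;

    if (after64(snd_nxt_new, msk->snd_nxt))
        msk->snd_nxt = snd_nxt_new;
}
```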

    Fixes: 1e1d9d6f119c5 ("mptcp: handle pending data on closed subflow")
    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Florian Westphal
     

28 Oct, 2021

1 commit

  • Using packetdrill it's possible to observe that the receiver key contains
    random values when clients transmit MP_CAPABLE with data and checksum (as
    specified in RFC8684 §3.1). Fix the layout of mptcp_out_options, to avoid
    using the skb extension copy when writing the MP_CAPABLE sub-option.

    Fixes: d7b269083786 ("mptcp: shrink mptcp_out_options struct")
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/233
    Reported-by: Poorva Sonparote
    Signed-off-by: Davide Caratti
    Signed-off-by: Mat Martineau
    Link: https://lore.kernel.org/r/20211027203855.264600-1-mathew.j.martineau@linux.intel.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
     

08 Oct, 2021

1 commit

  • recvmsg() can enter an infinite loop if the caller passes the
    MSG_WAITALL flag, the data present in the receive queue is not sufficient
    to fulfill the request, and no more data is received from the peer.

    When the above happens, mptcp_wait_data() will always return with
    no wait, as the MPTCP_DATA_READY flag checked by such function is
    set and never cleared in such code path.

    Leveraging the above, syzbot was able to trigger an RCU stall:

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu: 0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1
    (t=10500 jiffies g=13089 q=109)
    rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
    rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
    rcu: RCU grace-period kthread stack dump:
    task:rcu_preempt state:R running task stack:28696 pid: 14 ppid: 2 flags:0x00004000
    Call Trace:
    context_switch kernel/sched/core.c:4955 [inline]
    __schedule+0x940/0x26f0 kernel/sched/core.c:6236
    schedule+0xd3/0x270 kernel/sched/core.c:6315
    schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
    rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
    rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
    kthread+0x405/0x4f0 kernel/kthread.c:327
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
    rcu: Stack dump where RCU GP kthread last ran:
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
    RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
    RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
    RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
    RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
    RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
    Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
    RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
    RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
    RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
    RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
    R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
    R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
    FS: 0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
    test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
    mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
    release_sock+0xb4/0x1b0 net/core/sock.c:3204
    mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
    mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
    inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
    sock_recvmsg_nosec net/socket.c:944 [inline]
    ____sys_recvmsg+0x527/0x600 net/socket.c:2626
    ___sys_recvmsg+0x127/0x200 net/socket.c:2670
    do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
    __sys_recvmmsg net/socket.c:2843 [inline]
    __do_sys_recvmmsg net/socket.c:2866 [inline]
    __se_sys_recvmmsg net/socket.c:2859 [inline]
    __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fc200d2dc39
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
    RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
    R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
    R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003

    Fix the issue by replacing the MPTCP_DATA_READY bit with direct
    inspection of the msk receive queue.

    Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
    Fixes: 7a6a6cbc3e59 ("mptcp: recvmsg() can drain data from multiple subflow")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     

30 Sep, 2021

1 commit

  • Syzkaller reported a false positive deadlock involving
    the nl socket lock and the subflow socket lock:

    MPTCP: kernel_bind error, err=-98
    ============================================
    WARNING: possible recursive locking detected
    5.15.0-rc1-syzkaller #0 Not tainted
    --------------------------------------------
    syz-executor998/6520 is trying to acquire lock:
    ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738

    but task is already holding lock:
    ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
    ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(k-sk_lock-AF_INET);
    lock(k-sk_lock-AF_INET);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    3 locks held by syz-executor998/6520:
    #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
    #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
    #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
    #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
    #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720

    stack backtrace:
    CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:88 [inline]
    dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
    print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
    check_deadlock kernel/locking/lockdep.c:2987 [inline]
    validate_chain kernel/locking/lockdep.c:3776 [inline]
    __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
    lock_acquire kernel/locking/lockdep.c:5625 [inline]
    lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
    lock_sock_fast+0x36/0x100 net/core/sock.c:3229
    mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
    inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
    __sock_release net/socket.c:649 [inline]
    sock_release+0x87/0x1b0 net/socket.c:677
    mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
    mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
    genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
    genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
    genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
    netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
    genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
    netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
    netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
    sock_sendmsg_nosec net/socket.c:704 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:724
    sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
    kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
    kernel_sendpage net/socket.c:3501 [inline]
    sock_sendpage+0xe5/0x140 net/socket.c:1003
    pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
    splice_from_pipe_feed fs/splice.c:418 [inline]
    __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
    splice_from_pipe fs/splice.c:597 [inline]
    generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
    do_splice_from fs/splice.c:767 [inline]
    direct_splice_actor+0x110/0x180 fs/splice.c:936
    splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
    do_splice_direct+0x1b3/0x280 fs/splice.c:979
    do_sendfile+0xae9/0x1240 fs/read_write.c:1249
    __do_sys_sendfile64 fs/read_write.c:1314 [inline]
    __se_sys_sendfile64 fs/read_write.c:1300 [inline]
    __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f215cb69969
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
    RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
    RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
    R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
    R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000

    The problem originates from an incorrect lock annotation in the mptcp
    code and has only been visible since commit 2dcb96bacce3 ("net: core: Correct
    the sock::sk_lock.owned lockdep annotations"), but it has been present since
    the initial implementation of port-based endpoint support.

    This patch addresses the issue by introducing a nested variant of
    lock_sock_fast() and using it in the relevant code path.
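
    The role of a nesting annotation can be illustrated with a toy checker.
    This is a minimal user-space sketch, not lockdep itself: struct task_locks,
    toy_acquire() and the integer class/subclass values are all hypothetical.
    It assumes a checker that reports recursion when a task takes a lock of a
    class and subclass it already holds, which is why a distinct subclass
    silences the false positive.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* Hypothetical record of one lock held by the current task. */
struct held_lock {
	int lock_class;
	int subclass;
};

struct task_locks {
	struct held_lock held[MAX_HELD];
	size_t depth;
};

/*
 * Toy model of a recursion check: returns false when the same
 * (class, subclass) pair is already held, true otherwise.
 */
static bool toy_acquire(struct task_locks *t, int lock_class, int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].lock_class == lock_class &&
		    t->held[i].subclass == subclass)
			return false; /* would be reported as recursive locking */
	}
	if (t->depth == MAX_HELD)
		return false; /* toy limit reached */

	t->held[t->depth].lock_class = lock_class;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

    In this model, taking the msk lock with subclass 0 and then the subflow
    lock of the same class with subclass 0 is flagged, while taking the
    second lock with subclass 1 is accepted.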

    Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
    Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
    Suggested-by: Thomas Gleixner
    Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
    Signed-off-by: Paolo Abeni
    Reviewed-by: Thomas Gleixner
    Signed-off-by: David S. Miller

    Paolo Abeni
     

24 Sep, 2021

2 commits

  • Current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e.
    using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that
    matches the endpoint address. There is no reason for that, so we can just
    ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints
    can reconfigure their 'backup' flag even if no MPTCP sockets are open (or,
    more generally, in case the MP_PRIO message is not sent out).

    Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
    Signed-off-by: Davide Caratti
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • mptcp_token_get_sock() may return an mptcp socket that is in
    a different net namespace than the socket that received the token value.

    The mptcp syncookie code path had an explicit check for this;
    this change moves the test into the mptcp_token_get_sock() function.

    Eventually token.c should be converted to pernet storage, but
    such a change is not suitable for the net tree.
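
    Moving the namespace test into the lookup can be sketched as follows.
    This is a simplified user-space model, not the kernel code: struct net,
    struct msk, the flat table and token_get_sock() are hypothetical
    stand-ins for the real token container.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace handle; compared by address only. */
struct net {
	int id;
};

/* Hypothetical socket entry stored in the shared token table. */
struct msk {
	const struct net *net;
	uint32_t token;
};

/*
 * Return the entry matching the token, but only when it belongs to the
 * caller's namespace. An entry with the same token in a foreign
 * namespace is treated as "not found".
 */
static struct msk *token_get_sock(struct msk *table, size_t n,
				  const struct net *net, uint32_t token)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].token != token)
			continue;
		if (table[i].net != net)
			continue; /* right token, wrong namespace */
		return &table[i];
	}
	return NULL;
}
```

    With the check inside the lookup, every caller gets the same guarantee
    and none has to repeat the comparison.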

    Fixes: 2c5ebd001d4f0 ("mptcp: refactor token container")
    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
     

22 Sep, 2021

1 commit

  • Due to signed/unsigned comparison, the expression:

    info->size_goal - skb->len > 0

    evaluates to true when the size goal is smaller than the
    skb size. That results in lack of tx cache refill, so that
    the skb allocated by the core TCP code lacks the required
    MPTCP skb extensions.

    Due to the above, syzbot is able to trigger the following WARN_ON():

    WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
    Modules linked in:
    CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
    Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
    RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
    RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
    RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
    RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
    R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
    R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
    FS: 00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
    mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
    release_sock+0xb4/0x1b0 net/core/sock.c:3206
    sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
    mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
    inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
    sock_sendmsg_nosec net/socket.c:704 [inline]
    sock_sendmsg+0xcf/0x120 net/socket.c:724
    sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
    call_write_iter include/linux/fs.h:2163 [inline]
    new_sync_write+0x40b/0x640 fs/read_write.c:507
    vfs_write+0x7cf/0xae0 fs/read_write.c:594
    ksys_write+0x1ee/0x250 fs/read_write.c:647
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x4665f9
    Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
    RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
    RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
    R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000

    Fix the issue by rewriting the relevant expression to avoid
    sign-related problems. Note: size_goal is always >= 0.

    Additionally, ensure that the skb in the tx cache always carries
    the relevant extension.
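
    The pitfall can be reproduced in isolation. The sketch below assumes a
    signed size_goal and an unsigned length, as the message implies. The
    helper names are hypothetical and the fixed form is one possible
    rewrite, not necessarily the one used in the patch.

```c
#include <stdbool.h>

/*
 * Buggy form: int minus unsigned int is evaluated as unsigned int, so a
 * "negative" difference wraps to a large positive value and the test is
 * true whenever the two operands differ.
 */
static bool has_room_buggy(int size_goal, unsigned int len)
{
	return size_goal - len > 0;
}

/*
 * One possible rewrite: compare the operands directly. The cast is safe
 * only because size_goal is never negative.
 */
static bool has_room_fixed(int size_goal, unsigned int len)
{
	return (unsigned int)size_goal > len;
}
```

    For a size goal of 100 and a length of 200, the buggy form reports room
    while the direct comparison does not.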

    Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
    Fixes: 1094c6fe7280 ("mptcp: fix possible divide by zero")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

03 Sep, 2021

1 commit

  • Recent changes exposed a bug where specifically-timed requests to the
    path manager netlink API could trigger a divide-by-zero in
    __tcp_select_window(), as syzkaller does:

    divide error: 0000 [#1] SMP KASAN NOPTI
    CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016
    Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea
    RSP: 0018:ffff888031ccf020 EFLAGS: 00010216
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000
    RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002
    RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000
    R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000
    R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000
    FS: 00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0
    PKRU: 55555554
    Call Trace:
    tcp_select_window net/ipv4/tcp_output.c:264 [inline]
    __tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351
    __tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972
    __tcp_send_ack net/ipv4/tcp_output.c:3978 [inline]
    tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978
    mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654
    mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58
    mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328
    mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359
    genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
    genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
    genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
    netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
    genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
    netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
    netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
    netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
    sock_sendmsg_nosec net/socket.c:704 [inline]
    sock_sendmsg+0x14e/0x190 net/socket.c:724
    ____sys_sendmsg+0x709/0x870 net/socket.c:2403
    ___sys_sendmsg+0xff/0x170 net/socket.c:2457
    __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

    mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the
    first subflow in the MPTCP socket's connection list without validating
    that the subflow was in a suitable connection state. To address this,
    always validate subflow state when sending extra ACKs on subflows
    for address advertisement or subflow priority change.
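
    The idea of gating the extra ACK on subflow state can be sketched as
    below. This is a hypothetical user-space model: the state enum, struct
    subflow and send_extra_ack() are invented for illustration, and which
    states count as suitable is an assumption rather than what the patch
    checks.

```c
#include <stdbool.h>

/* Hypothetical, simplified states; not the kernel's TCP state machine. */
enum sf_state {
	SF_CLOSED,
	SF_SYN_SENT,
	SF_ESTABLISHED,
};

struct subflow {
	enum sf_state state;
	unsigned int acks_sent;
};

/* Assumption: only an established subflow may carry an extra ACK. */
static bool subflow_can_ack(const struct subflow *sf)
{
	return sf->state == SF_ESTABLISHED;
}

/*
 * Send an extra ACK only after validating the subflow state.
 * Returns false, without touching the subflow, when it is unsuitable.
 */
static bool send_extra_ack(struct subflow *sf)
{
	if (!subflow_can_ack(sf))
		return false;

	sf->acks_sent++; /* stands in for the real transmit path */
	return true;
}
```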

    Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Acked-by: Geliang Tang
    Signed-off-by: David S. Miller

    Mat Martineau
     

02 Sep, 2021

1 commit

  • Florian noted that if mptcp_alloc_tx_skb() allocation fails
    in __mptcp_push_pending(), we can end up invoking
    mptcp_push_release()/tcp_push() with a zero mss, causing
    a divide-by-zero error.

    This change addresses the issue by refactoring the skb allocation
    code: it first checks whether skb collapsing is certain to happen
    and allocates the skb only after that check. Skb allocation now
    happens only after the call to tcp_send_mss(), which
    correctly initializes mss_now.

    As side bonuses, we now fill the skb tx cache only when needed,
    and the output path is cleaned up a bit.

    v1 -> v2:
    - use lockdep_assert_held_once() - Jakub
    - fix indentation - Jakub

    Reported-by: Florian Westphal
    Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context")
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: Jakub Kicinski

    Paolo Abeni
     

01 Sep, 2021

1 commit

  • Fix the following coccicheck warning:
    ./net/mptcp/protocol.h:36:50-73: duplicated argument to & or |

    OPTION_MPTCP_MPJ_SYNACK is duplicated here; the duplicate
    should be OPTION_MPTCP_MPJ_ACK.
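
    The effect of a duplicated operand in a bitmask can be shown with a
    minimal sketch. The shortened macro names and bit positions below are
    hypothetical and only mirror the shape of the definition in protocol.h.

```c
/* Hypothetical bit assignments for the three MP_JOIN sub-options. */
#define OPT_MPJ_SYN	(1u << 0)
#define OPT_MPJ_SYNACK	(1u << 1)
#define OPT_MPJ_ACK	(1u << 2)

/* Duplicated operand: the ACK bit is silently missing from the mask. */
#define OPTS_MPJ_BUGGY	(OPT_MPJ_SYN | OPT_MPJ_SYNACK | OPT_MPJ_SYNACK)

/* Intended mask: all three sub-options are covered. */
#define OPTS_MPJ_FIXED	(OPT_MPJ_SYN | OPT_MPJ_SYNACK | OPT_MPJ_ACK)

/* True when any bit of opts falls inside mask. */
static int opts_match(unsigned int opts, unsigned int mask)
{
	return (opts & mask) != 0;
}
```

    With the buggy mask, a packet carrying only the ACK variant is not
    recognized as an MP_JOIN option; with the fixed mask it is.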

    Fixes: 74c7dfbee3e18 ("mptcp: consolidate in_opt sub-options fields in a bitmask")
    Signed-off-by: Wan Jiabing
    Acked-by: Paolo Abeni
    Reviewed-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Wan Jiabing
     

27 Aug, 2021

5 commits

  • Florian noted that the locking schema used by __mptcp_push_pending()
    is hard to follow. Let's add some more descriptive comments
    and drop an unneeded and confusing check.

    Suggested-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Most MPTCP packets carry a single MPTCP suboption: the
    DSS containing the mapping for the current packet.

    Check explicitly for the above, so that in such a scenario we
    replace most conditional statements with a single likely() one.
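
    The fast-path pattern can be sketched as follows. This is an
    illustrative model, not the kernel's option-handling code: the SUBOPT_*
    bits, struct handled and process_suboptions() are hypothetical, and
    likely() is defined locally on top of the GCC/Clang __builtin_expect
    builtin.

```c
/* Branch hint; assumes a GCC- or Clang-compatible compiler. */
#define likely(x)	__builtin_expect(!!(x), 1)

/* Hypothetical sub-option bits. */
#define SUBOPT_DSS	(1u << 0)
#define SUBOPT_ADD_ADDR	(1u << 1)
#define SUBOPT_MP_PRIO	(1u << 2)

/* Counters standing in for the real per-option handlers. */
struct handled {
	int dss;
	int add_addr;
	int mp_prio;
};

static void process_suboptions(unsigned int suboptions, struct handled *h)
{
	/* Common case: the packet carries the DSS and nothing else. */
	if (likely(suboptions == SUBOPT_DSS)) {
		h->dss++;
		return;
	}

	/* Slow path: test each sub-option individually. */
	if (suboptions & SUBOPT_DSS)
		h->dss++;
	if (suboptions & SUBOPT_ADD_ADDR)
		h->add_addr++;
	if (suboptions & SUBOPT_MP_PRIO)
		h->mp_prio++;
}
```

    The single equality test replaces the chain of per-option checks for
    the common packet, at the cost of one extra comparison on the rare
    multi-option packet.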

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • This makes input options processing more consistent with
    output ones and will simplify the next patch.

    Also avoid clearing the suboption field after processing
    it, since it's not needed.

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • This change reorders the mptcp_options_received fields
    to shrink the structure a bit and to ensure the most
    frequently used fields are all in the first cacheline.

    Sub-opt specific flags are moved out of the suboptions area,
    and we must now explicitly set them when the relevant
    suboption is parsed.

    There is a notable exception: 'csum_reqd' is used by both DSS
    and MPC suboptions, and keeping such field in the suboptions
    flag area will simplify the next patch.
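
    How field order affects structure size can be seen with a small,
    self-contained sketch. The two structs below are hypothetical and bear
    no relation to the real mptcp_options_received layout; the exact sizes
    depend on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Small and large fields interleaved: padding after each small field. */
struct opts_before {
	uint8_t  flag_a;
	uint64_t seq;
	uint8_t  flag_b;
	uint32_t token;
	uint8_t  flag_c;
};

/* Same fields, sorted from largest to smallest alignment. */
struct opts_after {
	uint64_t seq;
	uint32_t token;
	uint8_t  flag_a;
	uint8_t  flag_b;
	uint8_t  flag_c;
};

/* Bytes saved by the reordering on the current target (may be zero). */
static size_t bytes_saved(void)
{
	return sizeof(struct opts_before) - sizeof(struct opts_after);
}
```

    On a typical 64-bit ABI the reordered struct needs less padding, which
    also makes it easier to keep the hot fields within one cacheline.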

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Should be set only if the ingress packets present it, otherwise
    we can confuse csum validation.

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     

25 Aug, 2021

1 commit