16 Oct, 2020
1 commit
-
Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.

In both cases code was added on both sides in the same place
so just keep both.

Signed-off-by: Jakub Kicinski
11 Oct, 2020
2 commits
-
The msk can close MP_JOIN subflows if the initial handshake
fails. Currently such subflows are kept alive in the
conn_list until the msk itself is closed.

Beyond the wasted memory, we could end up sending the
DATA_FIN and the DATA_FIN ack on such a socket, even after a
reset.

Fixes: 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine")
Reviewed-by: Mat Martineau
Signed-off-by: Paolo Abeni
Signed-off-by: Jakub Kicinski
-
Additional/MP_JOIN subflows that do not pass some initial handshake
tests currently cause fallback to TCP. That is an RFC violation:
we should instead reset the subflow and leave the msk untouched.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/91
Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Mat Martineau
Signed-off-by: Paolo Abeni
Signed-off-by: Jakub Kicinski
09 Oct, 2020
1 commit
-
Using packetdrill it's possible to observe the same MPTCP DSN being acked
by different subflows with DACK4 and DACK8. This is in contrast with what
is specified in RFC8684 §3.3.2: if an MPTCP endpoint transmits a 64-bit wide
DSN, it MUST be acknowledged with a 64-bit wide DACK. Fix 'use_64bit_ack'
variable to make it a property of MPTCP sockets, not TCP subflows.

Fixes: a0c1d0eafd1e ("mptcp: Use 32-bit DATA_ACK when possible")
Acked-by: Paolo Abeni
Signed-off-by: Davide Caratti
Reviewed-by: Mat Martineau
Signed-off-by: Jakub Kicinski
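
A minimal illustrative model (Python, not kernel code) of why the DACK width belongs to the MPTCP socket rather than to each subflow. The class names, the dsn_received() method and the "latch on the first 64-bit DSN" update rule are assumptions made for this sketch; only the 'use_64bit_ack' name comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Msk:
    """Hypothetical stand-in for the MPTCP-level socket."""
    use_64bit_ack: bool = False  # one flag shared by every subflow

    def dsn_received(self, dsn_is_64bit: bool) -> None:
        # Assumed rule: once the peer sends a 64-bit DSN, keep using
        # 64-bit DACKs for the whole connection.
        if dsn_is_64bit:
            self.use_64bit_ack = True


@dataclass
class Subflow:
    """Hypothetical stand-in for a TCP subflow attached to an msk."""
    msk: Msk

    def dack_width(self) -> int:
        # The width is read from the msk, so all subflows agree.
        return 64 if self.msk.use_64bit_ack else 32
```

With the flag stored per subflow instead, two subflows could acknowledge the same DSN with different widths, which is the behaviour the commit reports.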
06 Oct, 2020
1 commit
-
Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were simpler overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller
04 Oct, 2020
1 commit
-
The MPTCP ADD_ADDR suboption with echo-flag=1 has no HMAC, so its size is
smaller than the one initially sent without echo-flag=1. We then need to
use the correct size everywhere when we need this echo bit.

Before this patch, the wrong size was reserved but the correct number of
bytes was written (and read): the remaining bytes contained garbage.

Fixes: 6a6c05a8b016 ("mptcp: send out ADD_ADDR with echo flag")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/95
Reported-and-tested-by: Davide Caratti
Acked-by: Geliang Tang
Signed-off-by: Matthieu Baerts
Signed-off-by: David S. Miller
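
The size difference described above can be sketched as follows. This is an illustrative Python helper, not the kernel code: the function name and parameters are hypothetical, and the byte counts follow the ADD_ADDR option layout in RFC 8684 (4 bytes of header and address ID, the address, an optional 2-byte port, and an 8-byte truncated HMAC that is present only when the echo flag is 0).

```python
HMAC_LEN = 8  # truncated HMAC carried only by a non-echo ADD_ADDR


def add_addr_len(ipv6: bool, echo: bool, port: bool = False) -> int:
    """Return the ADD_ADDR option length in bytes for the given variant."""
    size = 4                    # kind, length, subtype/flags, address ID
    size += 16 if ipv6 else 4   # the advertised address
    if port:
        size += 2               # optional port
    if not echo:
        size += HMAC_LEN        # an echo carries no HMAC
    return size
```

Reserving the non-echo length for an echo leaves 8 bytes of option space that are never written, which matches the "remaining bytes contained garbage" symptom.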
30 Sep, 2020
1 commit
-
The peer may send a DATA_FIN mapping with either a 32-bit or 64-bit
sequence number. When a 32-bit sequence number is received for the
DATA_FIN, it must be expanded to 64 bits before comparing it to the
last acked sequence number. This expansion was missing.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/93
Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Signed-off-by: Mat Martineau
Signed-off-by: David S. Miller
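
A minimal sketch of the kind of expansion the commit above refers to, written in Python for illustration. It is not the kernel helper: the function name is hypothetical, and the rule used here (pick the 64-bit value closest to a known 64-bit reference such as the last acked sequence number) is an assumption of this sketch.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def expand_seq32(ref64: int, seq32: int) -> int:
    """Expand seq32 to the 64-bit value nearest to ref64 (modulo 2**64)."""
    # Signed 32-bit distance from the low half of the reference to seq32.
    delta = (seq32 - (ref64 & MASK32)) & MASK32
    if delta >= 1 << 31:
        delta -= 1 << 32
    return (ref64 + delta) & MASK64
```

Comparing the raw 32-bit value against a 64-bit ack sequence fails as soon as the upper 32 bits of the reference are non-zero, which is why the expansion has to happen before the comparison.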
25 Sep, 2020
8 commits
-
This patch implemented the retransmission of ADD_ADDR when no ADD_ADDR echo
is received. It added a timer with the announced address. When the timeout
occurs, ADD_ADDR will be retransmitted.

Suggested-by: Mat Martineau
Suggested-by: Paolo Abeni
Acked-by: Paolo Abeni
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This patch added a new helper named mptcp_destroy_common containing the
shared code between mptcp_destroy() and mptcp_sock_destruct().

Suggested-by: Paolo Abeni
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This patch implemented the local subflow removing function,
mptcp_pm_remove_subflow, which simply called mptcp_pm_nl_rm_subflow_received
under the PM spin lock.

We use mptcp_pm_remove_subflow to remove a local subflow, so change its
argument from remote_id to local_id.

We check subflow->local_id in mptcp_pm_nl_rm_subflow_received to remove
a subflow.

Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Suggested-by: Mat Martineau
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This patch implements the remove announced addr and subflow logic in PM
netlink.

When the PM netlink removes an address, we traverse all the existing msk
sockets to find the relevant sockets.

We add a new list named anno_list in mptcp_pm_data, to record all the
announced addrs. In the traversing, we check if it has been recorded.
If it has been, we trigger the RM_ADDR signal.

We also check if this address is in conn_list. If it is, we remove the
subflow which is using this local address.

Since we call mptcp_pm_free_anno_list in mptcp_destroy, we need to move
__mptcp_init_sock before the mptcp_is_enabled check in mptcp_init_sock.

Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Suggested-by: Mat Martineau
Acked-by: Paolo Abeni
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
When the ADD_ADDR suboption has been received, we need to send out the same
ADD_ADDR suboption with echo-flag=1, and no HMAC.

Suggested-by: Mat Martineau
Reviewed-by: Mat Martineau
Signed-off-by: Geliang Tang
Signed-off-by: David S. Miller
-
This patch added the RM_ADDR option parsing logic:

We parsed the incoming options to find if the rm_addr option is received,
and called mptcp_pm_rm_addr_received to schedule PM work to a new status,
named MPTCP_PM_RM_ADDR_RECEIVED.

PM work got this status, and called mptcp_pm_nl_rm_addr_received to handle
it.

In mptcp_pm_nl_rm_addr_received, we closed the subflow matching the rm_id,
and updated PM counter.

Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Suggested-by: Mat Martineau
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This patch added a new signal named rm_addr_signal in PM. On outgoing path,
we called mptcp_pm_should_rm_signal to check if rm_addr_signal has been
set. If it has been, we sent out the RM_ADDR option.

Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This patch renamed addr_signal and the related functions with the explicit
word "add".

Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Signed-off-by: Geliang Tang
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
15 Sep, 2020
5 commits
-
Update the scheduler to a less trivial heuristic: cache
the last used subflow, and try to send on it a reasonably
long burst of data.

When the burst or the subflow send space is exhausted, pick
the subflow with the lower ratio between write space and
send buffer - that is, the subflow with the greater relative
amount of free space.

v1 -> v2:
- fix 32 bit build breakage due to 64bits div
- fix checkpatch issues (uint64_t -> u64)

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
So that it can be accessed easily from the subflow creation
helper. No functional change intended.

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
There is no need to use tcp_read_sock(); we can
simply drop the skb. Additionally try to look at the
next buffer for in-order data.

This both simplifies the code and avoids unneeded indirect
calls.

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
Add an RB-tree to cope with OoO (at MPTCP level) data.
__mptcp_move_skb() inserts "future" data into the RB tree,
eventually coalescing skbs as allowed by the
MPTCP DSN.

To simplify sequence accounting, move the DSN inside
the cb.

After successfully enqueuing in-sequence data, check
if we can use any data from the RB tree.

Additionally move the data_fin check after spooling
data from the OoO tree, otherwise we could miss shutdown
events.

The RB tree code is copied as verbatim as possible
from tcp_data_queue_ofo(), with a few simplifications
due to the fact that MPTCP doesn't need to cope with
sacks. All bugs here are added by me.

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
This is a prerequisite to allow receiving data from multiple
subflows without re-injection.

Instead of dropping the OoO - "future" data in
subflow_check_data_avail(), call into __mptcp_move_skbs()
and let the msk drop that.

To avoid code duplication factor out the mptcp_subflow_discard_data()
helper.

Note that __mptcp_move_skbs() can now find multiple subflows
with data avail (comprising to-be-discarded data), so must
update the byte counter incrementally.

v1 -> v2:
- fix checkpatch issues (unsigned -> unsigned int)

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
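
An illustrative Python model of the behaviour described in the commit above: every subflow hands its queued data to the msk, the msk delivers what is in order and drops the rest, and the byte counter is accumulated across subflows instead of being overwritten. All names are hypothetical. The sketch assumes whole-segment matching (no partial overlap) and assumes that discarded bytes are counted as moved; neither detail is stated in the commit message.

```python
def drain_subflow(queue, ack_seq):
    """Consume one subflow queue of (dsn, payload) pairs.

    Returns (new_ack_seq, delivered_bytes, moved_count).
    """
    delivered = bytearray()
    moved = 0
    while queue:
        dsn, payload = queue.pop(0)
        moved += len(payload)
        if dsn == ack_seq:
            # In-order data: deliver it and advance the msk-level ack.
            delivered += payload
            ack_seq += len(payload)
        # Otherwise the data is old or "future" and the msk drops it.
    return ack_seq, bytes(delivered), moved


def move_skbs(subflows, ack_seq):
    """Drain every subflow, accumulating the byte counter incrementally."""
    total = 0
    out = bytearray()
    for queue in subflows:
        ack_seq, data, moved = drain_subflow(queue, ack_seq)
        out += data
        total += moved  # "+=", since several subflows may contribute
    return ack_seq, bytes(out), total
```

Assigning the counter instead of adding to it would report only the last subflow's contribution once more than one subflow has data available.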
01 Aug, 2020
2 commits
-
JOIN requests do not work in syncookie mode -- for HMAC validation, the
peer's nonce and the mptcp token (to obtain the desired connection socket
the join is for) are required, but this information is only present in the
initial syn.

So either we need to drop all JOIN requests once a listening socket enters
syncookie mode, or we need to store enough state to reconstruct the request
socket later.

This adds a state table (1024 entries) to store the data present in the
MP_JOIN syn request and the random nonce used for the cookie syn/ack.

When an MP_JOIN ACK passes cookie validation, the table is consulted
to rebuild the request socket from it.

An alternate approach would be to "cancel" syn-cookie mode and force
MP_JOIN to always use a syn queue entry.

However, doing so brings the backlog over the configured queue limit.

v2: use req->syncookie, not (removed) want_cookie arg

Suggested-by: Paolo Abeni
Signed-off-by: Florian Westphal
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
Will be used to initialize the mptcp request socket when a MP_CAPABLE
request was handled in syncookie mode, i.e. when a TCP ACK containing a
MP_CAPABLE option is a valid syncookie value.

Normally (non-cookie case), MPTCP generates a unique 32 bit connection
ID and stores it in the MPTCP token storage to be able to retrieve the
mptcp socket for subflow joining.

In syncookie case, we do not want to store any state, so just generate the
unique ID and use it in the reply.

This means there is a small window where another connection could generate
the same token.

When Cookie ACK comes back, we check that the token has not been registered
in the meantime. If it was, the connection needs to fall back to TCP.

Changes in v2:
- use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
  (Eric Dumazet)

Signed-off-by: Florian Westphal
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
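
The stateless token handling described above can be modelled roughly as below. This is a Python sketch under stated assumptions, not the kernel implementation: the function names and the registered_tokens set are hypothetical stand-ins for the MPTCP token storage, and the token derivation (most significant 32 bits of SHA-256 of the key) follows RFC 8684.

```python
import hashlib

registered_tokens = set()  # hypothetical stand-in for the token storage


def token_from_key(key: bytes) -> int:
    """Most significant 32 bits of SHA-256(key), as in RFC 8684."""
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def syn_cookie_reply(key: bytes) -> int:
    """Cookie mode: derive the token for the reply, store nothing."""
    return token_from_key(key)


def cookie_ack(key: bytes) -> bool:
    """Return True if MPTCP can proceed, False to fall back to TCP."""
    token = token_from_key(key)
    if token in registered_tokens:
        # Another connection registered the same token in the meantime.
        return False
    registered_tokens.add(token)
    return True
```

The window mentioned in the commit is the time between syn_cookie_reply() and cookie_ack(): nothing is reserved in between, so the check has to be repeated when the ACK arrives.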
29 Jul, 2020
2 commits
-
Incoming DATA_FIN headers need to propagate the presence of the DATA_FIN
bit and the associated sequence number to the MPTCP layer, even when
arriving on a bare ACK that does not get added to the receive queue. Add
structure members to store the DATA_FIN information and helpers to set
and check those values.

Signed-off-by: Mat Martineau
Signed-off-by: David S. Miller
-
Since DATA_FIN information is the same for every subflow, store it only
in the mptcp_sock.

Signed-off-by: Mat Martineau
Signed-off-by: David S. Miller
24 Jul, 2020
1 commit
-
Currently accepted msk sockets become established only after
accept() returns the new sk to user-space.

As MP_JOIN requests are refused as per RFC spec on non fully
established sockets, the above causes mp_join self-tests
instabilities.

This change lets the msk enter the established status
as soon as it receives the 3rd ack and propagates the first
subflow fully established status on the msk socket.

Finally we can change the subflow acceptance condition to
take into account both the sock state and the msk fully
established flag.

Reviewed-by: Mat Martineau
Tested-by: Christoph Paasch
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
22 Jul, 2020
1 commit
-
Only used in token.c.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
10 Jul, 2020
1 commit
-
mptcp_token_iter_next() allows traversing all the MPTCP
sockets inside the token container belonging to the given
network namespace with a quite standard iterator semantic.

That will be used by the next patch, but keep the API generic,
as we plan to use this later for PM's sake.

Additionally export mptcp_token_get_sock(), as it also
will be used by the diag module.

Reviewed-by: Mat Martineau
Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller
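
A rough Python model of a resumable, namespace-filtered iterator over a bucketed token container, in the spirit of what the commit above describes. The cursor layout (a bucket index plus a position inside the bucket) and the dict-based socket records are assumptions of this sketch and are not taken from the kernel API.

```python
def token_iter_next(buckets, net, slot, num):
    """Return (sock, slot, num) for the next socket in `net`.

    `slot` is the bucket index and `num` the position inside that bucket
    at which to resume. When nothing is left, sock is None.
    """
    while slot < len(buckets):
        chain = buckets[slot]
        while num < len(chain):
            sock = chain[num]
            num += 1
            if sock["net"] == net:
                return sock, slot, num
        # Bucket exhausted: restart at the head of the next one.
        slot, num = slot + 1, 0
    return None, slot, num
```

A caller starts with slot = 0 and num = 0 and feeds the returned cursor back in until the returned socket is None; sockets that belong to other namespaces are skipped transparently.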
08 Jul, 2020
1 commit
-
We can re-use the existing work queue to handle path management
instead of a dedicated work queue. Just move pm_worker to protocol.c,
call it from the mptcp worker and get rid of the msk lock (already held).

Signed-off-by: Florian Westphal
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
02 Jul, 2020
1 commit
-
When mptcp is used, userspace doesn't read from the tcp (subflow)
socket but from the parent (mptcp) socket receive queue.

skbs are moved from the subflow socket to the mptcp rx queue either from
'data_ready' callback (if mptcp socket can be locked), a work queue, or
the socket receive function.

This means tcp_rcv_space_adjust() is never called and thus no receive
buffer size auto-tuning is done.

An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
function that moves skbs from subflow to mptcp socket.
While this enabled autotuning, it also meant tuning was done even if
userspace was reading the mptcp socket very slowly.

This adds mptcp_rcv_space_adjust() and calls it after userspace has
read data from the mptcp socket rx queue.

It's very similar to tcp_rcv_space_adjust, with two differences:
1. The rtt estimate is the largest one observed on a subflow
2. The rcvbuf size and window clamp of all subflows is adjusted
   to the mptcp-level rcvbuf.

Otherwise, we get spurious drops at tcp (subflow) socket level if
the skbs are not moved to the mptcp socket fast enough.

Before:
time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
[..]
ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 40823ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 23119ms) [ OK ]
ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5421ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 41446ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 23427ms) [ OK ]
ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5426ms) [ OK ]
Time: 1396 seconds

After:
ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 5417ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 5427ms) [ OK ]
ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5422ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 5415ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 5422ms) [ OK ]
ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5423ms) [ OK ]
Time: 296 seconds

Signed-off-by: Florian Westphal
Reviewed-by: Matthieu Baerts
Signed-off-by: David S. Miller
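
The two differences listed in the commit above can be illustrated with a simplified Python model. It is a sketch only: the class and field names are hypothetical, the growth rule (twice the bytes read in one interval, capped by rcvbuf_max) is a deliberate simplification and not the kernel's formula, and setting the window clamp equal to the buffer size is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    rtt_us: int
    rcvbuf: int
    window_clamp: int


@dataclass
class Msk:
    rcvbuf: int
    subflows: list
    space: int = 0       # most bytes read in any previous interval
    copied: int = 0      # bytes read by userspace in this interval
    t_start_us: int = 0  # start of the current interval


def rcv_space_adjust(msk, copied, now_us, rcvbuf_max):
    """Called after userspace has read `copied` bytes from the msk."""
    msk.copied += copied
    # Difference 1: use the largest rtt observed on any subflow.
    rtt_us = max(sf.rtt_us for sf in msk.subflows)
    if now_us - msk.t_start_us < rtt_us:
        return  # measurement interval not over yet
    if msk.copied > msk.space:
        msk.space = msk.copied
        new_buf = min(2 * msk.copied, rcvbuf_max)  # simplified growth
        if new_buf > msk.rcvbuf:
            msk.rcvbuf = new_buf
            # Difference 2: propagate to every subflow.
            for sf in msk.subflows:
                sf.rcvbuf = new_buf
                sf.window_clamp = new_buf
    msk.copied = 0
    msk.t_start_us = now_us
```

Because the function runs only when userspace actually reads, a slow reader does not trigger buffer growth, which is the problem the commit attributes to the earlier, unmerged approach.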
30 Jun, 2020
2 commits
-
When an MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on the socket's current state,
multiple faulty behaviours can be observed:

1) a WARN_ON() in subflow_data_ready() is hit
WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
[...]
CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
[...]
RIP: 0010:subflow_data_ready+0x18b/0x230
[...]
Call Trace:
tcp_data_queue+0xd2f/0x4250
tcp_rcv_state_process+0xb1c/0x49d3
tcp_v4_do_rcv+0x2bc/0x790
__release_sock+0x153/0x2d0
release_sock+0x4f/0x170
mptcp_shutdown+0x167/0x4e0
__sys_shutdown+0xe6/0x180
__x64_sys_shutdown+0x50/0x70
do_syscall_64+0x9a/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9

2) client is stuck forever in mptcp_sendmsg() because the socket is not
TCP_ESTABLISHED

crash> bt 4847
PID: 4847 TASK: ffff88814b2fb100 CPU: 1 COMMAND: "gh35"
#0 [ffff8881376ff680] __schedule at ffffffff97248da4
#1 [ffff8881376ff778] schedule at ffffffff9724a34f
#2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
#3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
#4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
#5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
#6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
#7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
#8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
#9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
#10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
#11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
#12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
RIP: 00007f126f6956ed RSP: 00007ffc2a320278 RFLAGS: 00000217
RAX: ffffffffffffffda RBX: 0000000020000044 RCX: 00007f126f6956ed
RDX: 0000000000000004 RSI: 00000000004007b8 RDI: 0000000000000003
RBP: 00007ffc2a3202a0 R8: 0000000000400720 R9: 0000000000400720
R10: 0000000000400720 R11: 0000000000000217 R12: 00000000004004b0
R13: 00007ffc2a320380 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
didn't complete.

$ tcpdump -tnnr bad.pcap
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0

Force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in mptcp_sendmsg().

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch
Suggested-by: Paolo Abeni
Signed-off-by: Davide Caratti
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
Keep using MPTCP sockets and use a "dummy mapping" in case of fallback
to regular TCP. When fallback is triggered, skip addition of the MPTCP
option on send.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
Co-developed-by: Paolo Abeni
Signed-off-by: Paolo Abeni
Signed-off-by: Davide Caratti
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
27 Jun, 2020
2 commits
-
Replace the radix tree with a hash table allocated
at boot time. The radix tree has some shortcomings:
a single lock is contended by all the mptcp operations,
the lookup currently uses such lock, and traversing
all the items would require a lock, too.

With a hash table instead we trade a little memory to
address all the above - a per bucket lock is used.

To hash the MPTCP sockets, we re-use the msk's sk_node
entry: the MPTCP sockets are never hashed by the stack.
Replace the existing hash proto callbacks with a dummy
implementation, annotating the above constraint.

Additionally refactor the token creation code to:
- limit the number of consecutive attempts to a fixed
  maximum. Hitting a hash bucket with a long chain is
  considered a failed attempt
- accept() no longer can fail due to token management.
- if token creation fails at connect() time, we do
  fallback to TCP (before the connection was closed)

v1 -> v2:
- fix "no newline at end of file" - Jakub

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
-
Add the missing annotation in some setup-only
functions.

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
26 Jun, 2020
1 commit
-
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.

Signed-off-by: David S. Miller
24 Jun, 2020
2 commits
-
Declare ipv4_specific once, in tcp.h where it belongs.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
ipv6_specific should be declared in tcp include files,
not mptcp.

This removes the following warning:
CHECK net/ipv6/tcp_ipv6.c
net/ipv6/tcp_ipv6.c:78:42: warning: symbol 'ipv6_specific' was not declared. Should it be static?

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Jun, 2020
1 commit
-
The msk ownership is transferred to the child socket at
3rd ack time, so that we avoid more lookups later. If the
request does not reach the 3rd ack, the MSK reference is
dropped at request sock release time.

As a side effect, fallback is now tracked by a NULL msk
reference instead of zeroed 'mp_join' field. This will
simplify the next patch.

Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
Signed-off-by: David S. Miller
16 Jun, 2020
2 commits
-
Use list_first_entry_or_null to simplify the code.
Signed-off-by: Geliang Tang
Signed-off-by: David S. Miller
-
We have defined MPTCP_PM_ADDR_MAX in pm_netlink.c, so drop this duplicate macro.
Fixes: 1b1c7a0ef7f3 ("mptcp: Add path manager interface")
Signed-off-by: Geliang Tang
Reviewed-by: Matthieu Baerts
Signed-off-by: David S. Miller
25 May, 2020
1 commit
-
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller