02 Apr, 2020

4 commits

  • Obtained with:

    $ make W=1 net/mptcp/token.o
    net/mptcp/token.c:53: warning: Function parameter or member 'req' not described in 'mptcp_token_new_request'
    net/mptcp/token.c:98: warning: Function parameter or member 'sk' not described in 'mptcp_token_new_connect'
    net/mptcp/token.c:133: warning: Function parameter or member 'conn' not described in 'mptcp_token_new_accept'
    net/mptcp/token.c:178: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy_request'
    net/mptcp/token.c:191: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy'

    Fixes: 79c0949e9a09 ("mptcp: Add key generation and token tree")
    Fixes: 58b09919626b ("mptcp: create msk early")
    Signed-off-by: Matthieu Baerts
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Matthieu Baerts
     
  • mptcp_subflow_data_available() is commonly called via
    ssk->sk_data_ready(); in this case the mptcp socket lock
    cannot be acquired.

    Therefore, while we can safely discard subflow data that
    was already received up to msk->ack_seq, we cannot be sure
    that 'subflow->data_avail' will still be valid at the time
    userspace wants to read the data -- a previous read on a
    different subflow might have carried this data already.

    In that (unlikely) event, msk->ack_seq will have been updated
    and will be ahead of the subflow dsn.

    We can check for this condition and skip/resync to the expected
    sequence number (see the sketch after this entry).

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
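
    A minimal sketch of the skip/resync check, assuming the field names
    shown here (msk->ack_seq, the subflow mapping fields) and a
    hypothetical subflow_discard_data() helper; before64() is the 64-bit
    sequence comparison helper from net/mptcp/protocol.h:

    /* Another subflow may already have carried (part of) this mapping to
     * userspace: if the msk-level ack_seq moved past the subflow dsn,
     * drop the stale span instead of reporting it as available.
     */
    u64 ack_seq = READ_ONCE(msk->ack_seq);
    u64 dsn = subflow->map_seq + subflow->map_offset;   /* current subflow dsn */

    if (before64(dsn, ack_seq)) {
            /* skip/resync to the expected sequence number */
            subflow_discard_data(ssk, ack_seq - dsn);   /* hypothetical helper */
    }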
     
  • This is needed at least until proper MPTCP-Level fin/reset
    signalling gets added:

    We wake parent when a subflow changes, but we should do this only
    when all subflows have closed, not just one.

    Schedule the mptcp worker and tell it to check eof state on all
    subflows.

    Only flag the mptcp socket as closed and wake userspace processes
    blocked in poll once all subflows have closed (see the sketch after
    this entry).

    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
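
    A minimal sketch of the eof check run by the worker, assuming the
    hypothetical rx_eof flag shown here; the msk is only flagged closed
    once no subflow can deliver further data:

    static void mptcp_check_for_eof(struct mptcp_sock *msk)
    {
            struct mptcp_subflow_context *subflow;
            struct sock *sk = (struct sock *)msk;
            int receivers = 0;

            /* walk every subflow on conn_list */
            mptcp_for_each_subflow(msk, subflow)
                    receivers += !subflow->rx_eof;      /* hypothetical flag */

            if (!receivers && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
                    /* all subflows closed: mark the msk and wake pollers */
                    sk->sk_shutdown |= RCV_SHUTDOWN;
                    sk->sk_data_ready(sk);
            }
    }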
     
  • Christoph Paasch reports the following crash:

    general protection fault [..]
    CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62
    RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471
    [..]
    queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
    do_raw_spin_lock include/linux/spinlock.h:181 [inline]
    spin_lock_bh include/linux/spinlock.h:343 [inline]
    __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278
    mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882
    [..]

    The problem is that the socket passed to mptcp_shutdown() isn't an
    mptcp socket; it's a plain tcp_sk. Thus, trying to access mptcp_sk
    specific members accesses garbage.

    The root cause is that accept() returns a fallback (tcp) socket, not an
    mptcp one. There is code in getpeername to detect this and override the
    socket's stream_ops. But this will only run when the accept() caller
    provided a sockaddr struct. "accept(fd, NULL, 0)" will therefore result
    in mptcp stream ops, but with sock->sk pointing at a tcp_sk.

    Update the existing fallback handling to detect this as well.

    Moreover, mptcp_shutdown did not have fallback handling, and
    mptcp_poll did it too late, so add it there as well.

    Reported-by: Christoph Paasch
    Tested-by: Christoph Paasch
    Reviewed-by: Mat Martineau
    Signed-off-by: Matthieu Baerts
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 Mar, 2020

15 commits

  • Expose a new netlink family to userspace to control the PM, setting:

    - list of local addresses to be signalled.
    - list of local addresses used to create subflows.
    - maximum number of add_addr options to react to

    When the msk is fully established, the PM netlink attempts to
    announce the 'signal' list via the ADD_ADDR option. Since we
    currently lack the ADD_ADDR echo (and related event) only the
    first addr is sent.

    After exhausting the 'announce' list, the PM tries to create
    subflow for each addr in 'local' list, waiting for each
    connection to be completed before attempting the next one.

    The idea is to add an additional PM hook for ADD_ADDR echo, to allow
    the PM netlink to announce multiple addresses, in sequence.

    Co-developed-by: Matthieu Baerts
    Signed-off-by: Matthieu Baerts
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • Exported via the same /proc file as the Linux TCP MIB counters, so "netstat -s"
    or "nstat" will show them automatically.

    The MPTCP MIB counters are allocated in a distinct pcpu area in order to
    avoid bloating/wasting TCP pcpu memory.

    Counters are allocated once the first MPTCP socket is created in a
    network namespace and free'd on exit.

    If no sockets have been allocated, all-zero mptcp counters are shown.

    The MIB counter list is taken from the multipath-tcp.org kernel, but
    only a few counters have been picked up so far. The counter list can
    be extended at any time later on; a sketch of the per-cpu counter
    scheme follows this entry.

    v2 -> v3:
    - remove 'inline' in foo.c files (David S. Miller)

    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
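
    A minimal sketch of the lazy per-cpu allocation, with hypothetical
    struct/field names; alloc_percpu() and this_cpu_inc() are the standard
    per-cpu primitives, mirroring how the TCP SNMP macros work:

    struct mptcp_mib {
            unsigned long counters[MPTCP_MIB_MAX];      /* hypothetical enum */
    };

    static bool mptcp_mib_alloc(struct net *net)
    {
            /* done once, when the first MPTCP socket shows up in the netns */
            struct mptcp_mib __percpu *mib = alloc_percpu(struct mptcp_mib);

            if (!mib)
                    return false;
            net->mib.mptcp_statistics = mib;            /* hypothetical field */
            return true;
    }

    /* bumping a counter is a cheap local per-cpu increment */
    #define MPTCP_INC_STATS(net, field) \
            this_cpu_inc((net)->mib.mptcp_statistics->counters[field])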
     
  • Add ulp-specific diagnostic functions, so that subflow information can be
    dumped to userspace programs like 'ss'.

    v2 -> v3:
    - uapi: use bit macros appropriate for userspace

    Co-developed-by: Matthieu Baerts
    Signed-off-by: Matthieu Baerts
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Davide Caratti
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Davide Caratti
     
  • On timeout event, schedule a work queue to do the retransmission.
    Retransmission code closely resembles the sendmsg() implementation and
    re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags'
    sake - and peeking the relevant dfrag from the rtx head (see the
    sketch after this entry).

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
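
    A minimal sketch of the retransmission path, assuming the rtx_queue
    list and the mptcp_data_frag layout; the msghdr is a dummy carrying
    only flags, and the oldest unacked dfrag is peeked from the rtx head:

    struct msghdr msg = { .msg_flags = MSG_DONTWAIT };  /* flags' sake only */
    struct mptcp_data_frag *dfrag;

    /* peek (do not remove) the oldest unacked fragment */
    dfrag = list_first_entry_or_null(&msk->rtx_queue,
                                     struct mptcp_data_frag, list);
    if (!dfrag)
            return;

    lock_sock(ssk);
    /* then re-use the regular sendmsg fragment helper with this dummy
     * msghdr and the stored dfrag, e.g.:
     *   mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, ...);
     * (exact argument list omitted here)
     */
    release_sock(ssk);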
     
  • This will simplify mptcp-level retransmission implementation
    in the next patch. If dfrag is provided by the caller, skip
    kernel space memory allocation and use data and metadata
    provided by the dfrag itself.

    Because a peer could ack data at TCP level but refrain from
    sending mptcp-level ACKs, we could grow the mptcp socket
    backlog indefinitely.

    We should thus block mptcp_sendmsg until the peer has acked some of the
    sent data.

    In order to be able to do so, increment the mptcp socket wmem_queued
    counter on memory allocation and decrement it when releasing the memory
    on mptcp-level ack reception (see the sketch after this entry).

    Because TCP performs sndbuf auto-tuning up to tcp_wmem_max[2], make
    this the mptcp sk_sndbuf limit.

    In the future we could experiment with autotuning as TCP does in
    tcp_sndbuf_expand().

    v2 -> v3:
    - remove 'inline' in foo.c files (David S. Miller)

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
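
    A minimal sketch of the accounting, with hypothetical charge/uncharge
    helper names; sk_wmem_queued_add(), sk_stream_memory_free() and
    sk_stream_wait_memory() are the standard socket helpers:

    /* charge freshly queued data to the mptcp socket */
    static void mptcp_charge_wmem(struct sock *sk, int size)
    {
            sk_wmem_queued_add(sk, size);       /* grows sk->sk_wmem_queued */
    }

    /* uncharge once an mptcp-level ack covers it */
    static void mptcp_uncharge_wmem(struct sock *sk, int size)
    {
            sk_wmem_queued_add(sk, -size);
            sk_mem_uncharge(sk, size);
    }

    /* in mptcp_sendmsg(): block until the peer acks some of the data */
    while (!sk_stream_memory_free(sk)) {        /* wmem_queued vs sk_sndbuf */
            ret = sk_stream_wait_memory(sk, &timeo);
            if (ret)
                    goto do_error;              /* assumed error label */
    }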
     
  • After adding wmem accounting for the mptcp socket we could get
    into a situation where the mptcp socket can't transmit more data,
    and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced
    because it currently will only remove entire dfrags.

    Allow advancing the dfrag head sequence and reduce wmem,
    even though this isn't correct (as we can't release the page).

    Because we will soon block on the mptcp sk in case wmem is too large,
    call sk_stream_write_space() in case we reduced the backlog, so a
    userspace task blocked in sendmsg or poll will be woken up (see the
    sketch after this entry).

    This isn't an issue if the send buffer is large, but it is when
    SO_SNDBUF is used to reduce it to a lower value.

    Note we can still get a deadlock for low SO_SNDBUF values in
    case both sides of the connection write to the socket: both could
    be blocked due to wmem being too small -- and current mptcp stack
    will only increment mptcp ack_seq on recv.

    This doesn't happen with the selftest as it uses poll() and
    will always call recv if there is data to read.

    Signed-off-by: Florian Westphal
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
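
    A minimal sketch of trimming the head dfrag on a partial ack, with
    assumed dfrag field names; sk_stream_write_space() is the standard
    wakeup for writers blocked in sendmsg()/poll():

    /* snd_una landed inside the head dfrag: shrink the acked part so the
     * wmem accounting drops, even though the page cannot be released yet
     */
    if (after64(snd_una, dfrag->data_seq)) {
            u64 delta = snd_una - dfrag->data_seq;

            dfrag->data_seq += delta;
            dfrag->offset   += delta;
            dfrag->data_len -= delta;
            sk_wmem_queued_add(sk, -(int)delta);
            cleaned = true;
    }

    if (cleaned && sk_stream_is_writeable(sk))
            sk_stream_write_space(sk);          /* wake blocked senders */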
     
  • Charge the data on the rtx queue to the master MPTCP socket, too.
    Such memory is uncharged when the data is acked/dequeued.

    Also account mptcp sockets inuse via a protocol specific pcpu
    counter.

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • The timer will be used to schedule retransmission. Its
    frequency is based on the current subflow RTO estimation and
    it is reset on every una_seq update (see the sketch after this
    entry).

    The timer is cleared for good by __mptcp_clear_xmit().

    Also clean the MPTCP rtx queue before each transmission.

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
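
    A minimal sketch of (re)arming the timer, assuming the RTO is taken
    from one subflow; sk_reset_timer(), icsk_rto and the icsk retransmit
    timer field are the standard inet_connection_sock pieces:

    static void mptcp_reset_timer(struct sock *sk, const struct sock *ssk)
    {
            struct inet_connection_sock *icsk = inet_csk(sk);
            unsigned long tout = inet_csk(ssk)->icsk_rto;   /* subflow RTO */

            if (!tout)
                    tout = TCP_RTO_MIN;
            /* re-run on every una_seq update; __mptcp_clear_xmit() stops it */
            sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + tout);
    }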
     
  • Keep the send page fragment on an MPTCP level retransmission queue.
    The queue entries are allocated inside the page frag allocator,
    acquiring an additional reference to the page for each list entry.

    Also switch to a custom page frag refill function, to ensure that
    the current page fragment can always host an MPTCP rtx queue entry
    (see the sketch after this entry).

    The MPTCP rtx queue is flushed at disconnect() and close() time.

    Note that now we need to call __mptcp_init_sock() regardless of mptcp
    enable status, as the destructor will try to walk the rtx_queue.

    v2 -> v3:
    - remove 'inline' in foo.c files (David S. Miller)

    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
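
    A minimal sketch of hosting the rtx entry inside the page fragment
    itself, with assumed mptcp_data_frag fields; skb_page_frag_refill()
    and get_page() are standard:

    /* custom refill: the frag must fit the queue entry plus some payload */
    static bool mptcp_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
    {
            return skb_page_frag_refill(sizeof(struct mptcp_data_frag) + 1,
                                        pfrag, sk->sk_allocation);
    }

    /* when queueing data: carve the entry from the frag and pin the page */
    dfrag = (struct mptcp_data_frag *)(page_address(pfrag->page) + pfrag->offset);
    dfrag->page   = pfrag->page;
    dfrag->offset = pfrag->offset + sizeof(*dfrag);     /* payload follows */
    get_page(dfrag->page);                              /* one ref per entry */
    list_add_tail(&dfrag->list, &msk->rtx_queue);       /* flushed at close() */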
     
  • Keep the per-msk unacked sequence number consistent: since we
    update per-msk data, use an atomic64 cmpxchg() to protect
    against concurrent updates from multiple subflows (see the
    sketch after this entry).

    Initialize snd_una at connect()/accept() time.

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
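
    A minimal sketch of the cmpxchg loop, assuming msk->snd_una is an
    atomic64_t and using an after64()-style 64-bit sequence comparison;
    concurrent subflows can race, but snd_una only ever moves forward:

    static void mptcp_update_snd_una(struct mptcp_sock *msk, u64 new_snd_una)
    {
            u64 old_snd_una = atomic64_read(&msk->snd_una);

            while (after64(new_snd_una, old_snd_una)) {
                    u64 prev;

                    prev = atomic64_cmpxchg(&msk->snd_una,
                                            old_snd_una, new_snd_una);
                    if (prev == old_snd_una)
                            break;              /* we won the race */
                    old_snd_una = prev;         /* another subflow moved it */
            }
    }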
     
  • Fill in more path manager functionality by adding a worker function and
    modifying the related stub functions to schedule the worker.

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Peter Krystad
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Peter Krystad
     
  • Subflow creation may be initiated by the path manager when
    the primary connection is fully established and a remote
    address has been received via ADD_ADDR.

    Create an in-kernel sock and use kernel_connect() to
    initiate connection.

    Passive sockets can't acquire the mptcp socket lock at
    subflow creation time, so an additional list protected by
    a new spinlock is used to track the MPJ subflows.

    This list is spliced into the conn_list tail every time the msk
    socket lock is acquired (see the sketch after this entry), so that
    it will not interfere with data flow on the original connection.

    Data flow and connection failover are not addressed by this commit.

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Co-developed-by: Matthieu Baerts
    Signed-off-by: Matthieu Baerts
    Signed-off-by: Peter Krystad
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Peter Krystad
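
    A minimal sketch of the two halves of the scheme; the splice matches
    the __mptcp_flush_join_list() frame visible in the 02 Apr backtrace
    above, while the lock/list field names are assumptions:

    /* passive side, msk socket lock unavailable: park the MPJ subflow */
    spin_lock_bh(&msk->join_list_lock);
    list_add_tail(&subflow->node, &msk->join_list);
    spin_unlock_bh(&msk->join_list_lock);

    /* run whenever the msk socket lock is acquired */
    static void __mptcp_flush_join_list(struct mptcp_sock *msk)
    {
            if (likely(list_empty(&msk->join_list)))
                    return;

            spin_lock_bh(&msk->join_list_lock);
            list_splice_tail_init(&msk->join_list, &msk->conn_list);
            spin_unlock_bh(&msk->join_list_lock);
    }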
     
  • Process the MP_JOIN option in a SYN packet with the same flow
    as MP_CAPABLE but when the third ACK is received add the
    subflow to the MPTCP socket subflow list instead of adding it to
    the TCP socket accept queue.

    The subflow is added at the end of the subflow list so it will not
    interfere with the existing subflows' operation, and no data is
    expected to be transmitted on it.

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Peter Krystad
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Peter Krystad
     
  • Add enough of a path manager interface to allow sending of ADD_ADDR
    when an incoming MPTCP connection is created. Capable of sending only
    a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection
    sock will need to be expanded to handle multiple interfaces and IPv6.
    Partial processing of the incoming ADD_ADDR is included so the path
    manager notification of that event happens at the proper time, which
    involves validating the incoming address information.

    This is a skeleton interface definition for events generated by
    MPTCP.

    Co-developed-by: Matthieu Baerts
    Signed-off-by: Matthieu Baerts
    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Co-developed-by: Mat Martineau
    Signed-off-by: Mat Martineau
    Signed-off-by: Peter Krystad
    Signed-off-by: David S. Miller

    Peter Krystad
     
  • Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6,
    and RM_ADDR suboptions.

    Co-developed-by: Matthieu Baerts
    Signed-off-by: Matthieu Baerts
    Co-developed-by: Paolo Abeni
    Signed-off-by: Paolo Abeni
    Signed-off-by: Peter Krystad
    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Peter Krystad
     

24 Mar, 2020

1 commit

  • It's still possible for packetdrill to hang in mptcp_sendmsg(), when the
    MPTCP socket falls back to regular TCP (e.g. after receiving unsupported
    flags/version during the three-way handshake). Adjust the MPTCP socket
    state earlier, to ensure correct functionality of mptcp_sendmsg() even
    in case of TCP fallback.

    Fixes: 767d3ded5fb8 ("net: mptcp: don't hang before sending 'MP capable with data'")
    Fixes: 1954b86016cf ("mptcp: Check connection state before attempting send")
    Signed-off-by: Davide Caratti
    Acked-by: Paolo Abeni
    Reviewed-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Davide Caratti
     

22 Mar, 2020

1 commit

  • Fixes gcc '-Wunused-but-set-variable' warning:

    net/mptcp/options.c: In function 'mptcp_established_options_dss':
    net/mptcp/options.c:338:7: warning:
    variable 'can_ack' set but not used [-Wunused-but-set-variable]

    Commit dc093db5cc05 ("mptcp: drop unneeded checks")
    left this variable behind unused; remove it.

    Signed-off-by: YueHaibing
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    YueHaibing
     

18 Mar, 2020

1 commit

  • After commit 58b09919626b ("mptcp: create msk early"), the
    msk socket is already available at subflow_syn_recv_sock()
    time. Let's move there the state update, to mirror more
    closely the first subflow state.

    The above will also help multiple subflow support.

    Signed-off-by: Paolo Abeni
    Reviewed-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Paolo Abeni
     

15 Mar, 2020

2 commits

  • After the previous patch subflow->conn is always != NULL and
    is never changed. We can drop a bunch of now unneeded checks.

    v1 -> v2:
    - rebased on top of commit 2398e3991bda ("mptcp: always
    include dack if possible.")

    Signed-off-by: Paolo Abeni
    Reviewed-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • This change moves the mptcp socket allocation from mptcp_accept() to
    subflow_syn_recv_sock(), so that subflow->conn is now always set
    for the non fallback scenario.

    It allows cleaning up mptcp_accept() a bit, reducing the additional
    locking, and will allow further cleanup in the next patch.

    Signed-off-by: Paolo Abeni
    Reviewed-by: Matthieu Baerts
    Signed-off-by: David S. Miller

    Paolo Abeni
     

12 Mar, 2020

1 commit

  • The following packetdrill script

    socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
    fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
    > S 0:0(0)
    < S. 0:0(0) ack 1 win 65535
    > . 1:1(0) ack 1 win 256
    getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
    fcntl(3, F_SETFL, O_RDWR) = 0
    write(3, ..., 1000) = 1000

    doesn't transmit the 1KB data packet after a successful three-way
    handshake using mp_capable with data, as required by protocol v1, and
    write() hangs forever:

    PID: 973 TASK: ffff97dd399cae80 CPU: 1 COMMAND: "packetdrill"
    #0 [ffffa9b94062fb78] __schedule at ffffffff9c90a000
    #1 [ffffa9b94062fc08] schedule at ffffffff9c90a4a0
    #2 [ffffa9b94062fc18] schedule_timeout at ffffffff9c90e00d
    #3 [ffffa9b94062fc90] wait_woken at ffffffff9c120184
    #4 [ffffa9b94062fcb0] sk_stream_wait_connect at ffffffff9c75b064
    #5 [ffffa9b94062fd20] mptcp_sendmsg at ffffffff9c8e801c
    #6 [ffffa9b94062fdc0] sock_sendmsg at ffffffff9c747324
    #7 [ffffa9b94062fdd8] sock_write_iter at ffffffff9c7473c7
    #8 [ffffa9b94062fe48] new_sync_write at ffffffff9c302976
    #9 [ffffa9b94062fed0] vfs_write at ffffffff9c305685
    #10 [ffffa9b94062ff00] ksys_write at ffffffff9c305985
    #11 [ffffa9b94062ff38] do_syscall_64 at ffffffff9c004475
    #12 [ffffa9b94062ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ca0008c
    RIP: 00007f959407eaf7 RSP: 00007ffe9e95a910 RFLAGS: 00000293
    RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f959407eaf7
    RDX: 00000000000003e8 RSI: 0000000001785fe0 RDI: 0000000000000008
    RBP: 0000000001785fe0 R8: 0000000000000000 R9: 0000000000000003
    R10: 0000000000000007 R11: 0000000000000293 R12: 00000000000003e8
    R13: 00007ffe9e95ae30 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

    Fix it by ensuring that the socket state is TCP_ESTABLISHED on reception
    of the third ack.

    Fixes: 1954b86016cf ("mptcp: Check connection state before attempting send")
    Suggested-by: Paolo Abeni
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
     

06 Mar, 2020

1 commit

  • Currently a passive MPTCP socket can skip including the DACK
    option - if the peer sends data before accept() completes.

    The above happens because the msk 'can_ack' flag is set
    only after the accept() call.

    Such missing DACK option may cause - as per RFC spec -
    unwanted fallback to TCP.

    This change addresses the issue using the key material
    available in the current subflow, if any, to create a suitable
    dack option when the msk ack seq is not yet available (see the
    sketch after this entry).

    v1 -> v2:
    - advance the generated ack after the initial MPC packet

    Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
    Signed-off-by: Paolo Abeni
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
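
    A minimal sketch of deriving the ack when msk->can_ack is not yet set;
    mptcp_crypto_key_sha() is the existing key->token/IDSN helper, while
    the option/field names around it are assumptions:

    if (!READ_ONCE(msk->can_ack)) {
            u64 ack_seq;

            /* derive the initial data sequence number from the peer key */
            mptcp_crypto_key_sha(subflow->remote_key, NULL, &ack_seq);
            ack_seq++;                          /* advance past the MPC data */
            opts->ext_copy.data_ack = ack_seq;
    } else {
            opts->ext_copy.data_ack = READ_ONCE(msk->ack_seq);
    }
    opts->ext_copy.use_ack = 1;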
     

04 Mar, 2020

3 commits

  • When a DATA_FIN is sent in a MPTCP DSS option that contains a data
    mapping, the DATA_FIN consumes one byte of space in the mapping. In this
    case, the DATA_FIN should only be included in the DSS option if its
    sequence number aligns with the end of the mapped data (see the sketch
    after this entry). Otherwise the subflow can send an incorrect implicit
    sequence number for the DATA_FIN, and the DATA_ACK for that sequence
    number would not close the MPTCP-level connection correctly.

    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Mat Martineau
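
    A minimal sketch of the alignment check, with assumed mapping field
    names; the DATA_FIN occupies one byte of data sequence space, so it is
    only folded into a mapping whose end matches its sequence number:

    /* data_fin_tx_seq: the DATA_FIN sequence number (assumed name) */
    if (data_fin_enabled &&
        mpext->data_seq + mpext->data_len == data_fin_tx_seq) {
            mpext->data_fin  = 1;
            mpext->data_len += 1;       /* the DATA_FIN itself is one byte */
    }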
     
  • Instead of reading the MPTCP-level sequence number when sending DATA_FIN,
    store the data in the subflow so it can be safely accessed when the
    subflow TCP headers are written to the packet without the MPTCP-level
    lock held. This also allows the MPTCP-level socket to close individual
    subflows without closing the MPTCP connection.

    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Mat Martineau
     
  • MPTCP should wait for an active connection or skip sending depending on
    the connection state, as TCP does (see the sketch after this entry).
    This happens before the possible passthrough to a regular TCP sendmsg
    because the subflow's socket type (MPTCP or TCP fallback) is not known
    until the connection is complete. This is also relevant at disconnect
    time, where data should not be sent in certain MPTCP-level connection
    states.

    Signed-off-by: Mat Martineau
    Signed-off-by: David S. Miller

    Mat Martineau
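
    A minimal sketch of the state check at the head of the send path;
    sk_stream_wait_connect() and the TCPF_* masks are the standard TCP
    helpers, the surrounding variables are assumed:

    /* as TCP does: only ESTABLISHED/CLOSE_WAIT sockets may carry new data */
    if (!((1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))) {
            ret = sk_stream_wait_connect(sk, &timeo);
            if (ret)
                    goto out;   /* not connected, and not going to be */
    }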
     

27 Feb, 2020

7 commits

  • syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
    callback, and was able to trigger a null pointer dereference:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
    Oops: 0010 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:0x0
    Code: Bad RIP value.
    RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
    RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
    RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
    RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
    R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
    R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
    FS: 0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
    netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
    smack_netlabel security/smack/smack_lsm.c:2425 [inline]
    smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
    security_inode_setsecurity+0xb2/0x140 security/security.c:1364
    __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
    vfs_setxattr fs/xattr.c:224 [inline]
    setxattr+0x335/0x430 fs/xattr.c:451
    __do_sys_fsetxattr fs/xattr.c:506 [inline]
    __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
    __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
    do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x440199
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
    RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
    R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
    R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
    Modules linked in:
    CR2: 0000000000000000

    Address the issue by adding a dummy icsk_sync_mss callback (see the
    sketch after this entry). To properly sync the subflows' mss and
    options list we need some additional infrastructure, which will land
    in net-next.

    Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
    Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket")
    Signed-off-by: Paolo Abeni
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
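
    A minimal sketch of the dummy callback; icsk_sync_mss is declared as
    int (*)(struct sock *, u32) in struct inet_connection_sock, so a no-op
    is enough to keep callers like cipso_v4_sock_setattr() off a NULL
    pointer (how it gets wired up at msk init time is assumed here):

    static int mptcp_sync_mss(struct sock *sk, u32 pmtu)
    {
            /* subflow mss/options syncing needs more infrastructure,
             * see the commit message; do nothing for now
             */
            return 0;
    }

    /* at msk init time */
    inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;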
     
  • Don't schedule the work queue right away, instead defer this
    to the lock release callback.

    This has the advantage that it will give the recv path a chance to
    complete -- this might have moved all pending packets from the
    subflow to the mptcp receive queue, which allows us to avoid the
    schedule_work() (see the sketch after this entry).

    Co-developed-by: Florian Westphal
    Signed-off-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni
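
    A minimal sketch of the deferral, assuming a hypothetical flag bit;
    the .release_cb hook in struct proto is the same mechanism TCP uses
    for tcp_release_cb(), invoked by release_sock() when the lock owner
    lets go:

    /* data path: the msk is owned, just remember that work is pending */
    if (sock_owned_by_user(sk)) {
            set_bit(MPTCP_WORK_PENDING, &msk->flags);   /* hypothetical bit */
            return;
    }

    /* .release_cb */
    static void mptcp_release_cb(struct sock *sk)
    {
            struct mptcp_sock *msk = mptcp_sk(sk);

            /* only schedule if the recv path did not already drain the
             * subflow while the lock was held
             */
            if (test_and_clear_bit(MPTCP_WORK_PENDING, &msk->flags) &&
                schedule_work(&msk->work))
                    sock_hold(sk);              /* the worker holds a ref */
    }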
     
  • We can't lock_sock() the mptcp socket from the subflow data_ready callback;
    it would result in an ABBA deadlock with the subflow socket lock.

    We can however grab the spinlock: if that succeeds and the mptcp socket
    is not owned at the moment, we can process the new skbs right away
    without deferring this to the work queue (see the sketch after this
    entry).

    This avoids the schedule_work and hence the small delay until the
    work item is processed.

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
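
    A minimal sketch of the fast path in the msk data_ready handler;
    bh_lock_sock() and sock_owned_by_user() are standard, the flag and
    the move_skbs_to_msk() helper are assumptions:

    static void mptcp_data_ready(struct sock *sk, struct sock *ssk)
    {
            struct mptcp_sock *msk = mptcp_sk(sk);

            set_bit(MPTCP_DATA_READY, &msk->flags);

            /* only the spinlock: lock_sock() here would ABBA-deadlock
             * against the subflow socket lock
             */
            bh_lock_sock(sk);
            if (!sock_owned_by_user(sk))
                    move_skbs_to_msk(msk, ssk);         /* process right away */
            else if (schedule_work(&msk->work))         /* defer to the worker */
                    sock_hold(sk);
            bh_unlock_sock(sk);

            sk->sk_data_ready(sk);                      /* wake POLLIN waiters */
    }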
     
  • Only used to discard stale data from the subflow, so move
    it where needed.

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • If userspace never drains the receive buffers we must stop draining
    the subflow socket(s) at some point.

    This adds the needed rmem accounting for this (see the sketch after
    this entry). If the threshold is reached, we stop draining the subflows.

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
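
    A minimal sketch of the check that bounds the draining, using the
    standard sk_rmem_alloc/sk_rcvbuf accounting; skb_set_owner_r() is what
    charges a moved skb to the msk:

    /* stop pulling skbs out of the subflow once the msk budget is gone */
    if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
            return false;       /* leave the rest queued on the subflow */

    skb_set_owner_r(skb, sk);   /* account the moved skb against the msk */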
     
  • If userspace is not reading data, all the mptcp-level acks contain the
    ack_seq from the last time userspace read data rather than the most
    recent in-sequence value.

    This causes pointless retransmissions for data that is already queued.

    The reason for this is that all the mptcp protocol level processing
    happens at mptcp_recv time.

    This adds a work queue to move skbs from the subflow sockets' receive
    queues to the mptcp socket receive queue (which was not used so far).

    This allows us to announce the correct mptcp ack sequence in a timely
    fashion, even when the application does not call recv() on the mptcp socket
    for some time.

    We still wake userspace tasks waiting for POLLIN immediately:
    If the mptcp level receive queue is empty (because the work queue is
    still pending) it can be filled from in-sequence subflow sockets at
    recv time without a need to wait for the worker.

    The skb_orphan when moving skbs from the subflow to the mptcp level is
    needed because the destructor (sock_rfree) relies on the skb->sk (ssk!)
    lock being taken (see the sketch after this entry).

    A followup patch will add the needed rmem accounting for the moved skbs.

    Other problem: in case the application behaves as expected and calls
    recv() as soon as the mptcp socket becomes readable, the work queue will
    only waste cpu cycles. This will also be addressed in followup patches.

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
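
    A minimal sketch of the move performed by the worker (mapping/offset
    bookkeeping omitted); __skb_unlink(), skb_orphan() and
    __skb_queue_tail() are the standard queue helpers:

    /* runs from the work queue, msk socket lock held */
    while ((skb = skb_peek(&ssk->sk_receive_queue)) != NULL) {
            __skb_unlink(skb, &ssk->sk_receive_queue);

            /* drop ssk's sock_rfree destructor: it relies on the ssk
             * lock, which is not held once the skb sits on the msk queue
             */
            skb_orphan(skb);

            __skb_queue_tail(&sk->sk_receive_queue, skb);   /* msk queue */

            /* simplified: real code walks the DSS mapping to advance ack_seq */
            msk->ack_seq += skb->len;
    }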
     
  • Will be extended with functionality in followup patches.
    The initial user is moving skbs from the subflows' receive queues to
    the mptcp-level receive queue.

    Signed-off-by: Paolo Abeni
    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Paolo Abeni