29 May, 2020
3 commits
-
Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space
without going through a fake uaccess.Signed-off-by: Christoph Hellwig
Signed-off-by: David S. Miller -
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess. Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.Signed-off-by: Christoph Hellwig
Acked-by: Sagi Grimberg
Acked-by: Jason Gunthorpe
Signed-off-by: David S. Miller -
Add a helper to directly set the SO_LINGER sockopt from kernel space
with onoff set to true and a linger time of 0 without going through a
fake uaccess.Signed-off-by: Christoph Hellwig
Acked-by: Sagi Grimberg
Signed-off-by: David S. Miller
24 Jul, 2018
1 commit
-
This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
connection requests. RDS/RDMA/IB uses a private data (struct
rds_ib_connect_private) exchange between endpoints at RDS connection
establishment time to support RDMA. This private data exchange uses a
32 bit integer to represent an IP address. This needs to be changed in
order to support IPv6. A new private data struct
rds6_ib_connect_private is introduced to handle this. To ensure
backward compatibility, an IPv6 capable RDS stack uses another RDMA
listener port (RDS_CM_PORT) to accept IPv6 connection. And it
continues to use the original RDS_PORT for IPv4 RDS connections. When
it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
send the connection set up request.v5: Fixed syntax problem (David Miller).
v4: Changed port history comments in rds.h (Sowmini Varadhan).
v3: Added support to set up IPv4 connection using mapped address
(David Miller).
Added support to set up connection between link local and non-link
addresses.
Various review comments from Santosh Shilimkar and Sowmini Varadhan.v2: Fixed bound and peer address scope mismatched issue.
Added back rds_connect() IPv6 changes.Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
24 Jan, 2018
1 commit
-
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.Signed-off-by: David S. Miller
23 Jan, 2018
1 commit
-
rds-tcp uses m_ack_seq to track the tcp ack# that indicates
that the peer has received a rds_message. The m_ack_seq is
used in rds_tcp_is_acked() to figure out when it is safe to
drop the rds_message from the RDS retransmit queue.The m_ack_seq must be calculated as an offset from the right
edge of the in-flight tcp buffer, i.e., it should be based on
the ->write_seq, not the ->snd_nxt.Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
02 Dec, 2017
1 commit
-
The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
to find the rds_connection entries marked for deletion as part
of the netns deletion under the protection of the rds_tcp_conn_lock.
Since the rds_tcp_conn_list tracks rds_tcp_connections (which
have a 1:1 mapping with rds_conn_path), multiple tc entries in
the rds_tcp_conn_list will map to a single rds_connection, and will
be deleted as part of the rds_conn_destroy() operation that is
done outside the rds_tcp_conn_lock.The rds_tcp_conn_list traversal done under the protection of
rds_tcp_conn_lock should not leave any doomed tc entries in
the list after the rds_tcp_conn_lock is released, else another
concurrently executiong netns delete (for a differnt netns) thread
may trip on these entries.Reported-by: syzbot
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
22 Jun, 2017
1 commit
-
If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.Signed-off-by: Sowmini Varadhan
Tested-by: Jenny Xu
Signed-off-by: David S. Miller
08 Mar, 2017
1 commit
-
Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in
rds_tcp_listen_data_ready") added the function
rds_tcp_listen_sock_def_readable() to handle the case when a
partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
null and we would hit a panic of the form
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: (null)
:
? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
tcp_data_queue+0x39d/0x5b0
tcp_rcv_established+0x2e5/0x660
tcp_v4_do_rcv+0x122/0x220
tcp_v4_rcv+0x8b7/0x980
:
In the above case, it is not fatal to encounter a NULL value for
ready- we should just drop the packet and let the flush of the
acceptor thread finish gracefully.In general, the tear-down sequence for listen() and accept() socket
that is ensured by this commit is:
rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
In rds_tcp_listen_stop():
serialize with, and prevent, further callbacks using lock_sock()
flush rds_wq
flush acceptor workq
sock_release(listen socket)Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
16 Jul, 2016
1 commit
-
As the existing comments in rds_tcp_listen_data_ready() indicate,
it is possible under some race-windows to get to this function with the
accept() socket. If that happens, we could run into a sequence wherebythread 1 thread 2
rds_tcp_accept_one() thread
sets up new_sock via ->accept().
The sk_user_data is now
sock_def_readable
data comes in for new_sock,
->sk_data_ready is called, and
we land in rds_tcp_listen_data_ready
rds_tcp_set_callbacks()
takes the sk_callback_lock and
sets up sk_user_data to be the cp
read_lock sk_callback_lock
ready = cp
unlock sk_callback_lock
page fault on readyIn the above sequence, we end up with a panic on a bad page reference
when trying to execute (*ready)(). Instead we need to call
sock_def_readable() safely, which is what this patch achieves.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
02 Jul, 2016
5 commits
-
This patch adds ->conn_path_connect callbacks in the rds_transport
that are used to set up a single connection path.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller -
The ->sk_user_data contains a pointer to the rds_conn_path
for the socket. Use this consistently in the rds_tcp_data_ready
callbacks to get the rds_conn_path for rds_recv_incoming.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller -
The socket callbacks should all operate on a struct rds_conn_path,
in preparation for a MP capable RDS-TCP.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller -
The struct rds_tcp_connection is the transport-specific private
data structure that tracks TCP information per rds_conn_path.
Modify this structure to have a back-pointer to the rds_conn_path
for which it is the ->cp_transport_data.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller -
Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
19 Jun, 2016
1 commit
-
Fix coding style issues in the following files:
ib_cm.c: add space
loop.c: convert spaces to tabs
sysctl.c: add space
tcp.h: convert spaces to tabs
tcp_connect.c:remove extra indentation in switch statement
tcp_recv.c: convert spaces to tabs
tcp_send.c: convert spaces to tabs
transport.c: move brace up one line on for statementSigned-off-by: Joshua Houghton
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
08 Jun, 2016
1 commit
-
When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed. Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
04 May, 2016
1 commit
-
An arbitration scheme for duelling SYNs is implemented as part of
commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
08 Aug, 2015
1 commit
-
Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subusys functions on 'modprobe -r' to clean up these
end points.Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
24 Nov, 2014
1 commit
-
instances get considerably simpler from that...
Signed-off-by: Al Viro
12 Apr, 2014
1 commit
-
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.Signed-off-by: David S. Miller
21 Oct, 2010
1 commit
-
The RDS protocol has lots of functions that should be
declared static. rds_message_get/add_version_extension is
removed since it defined but never used.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
09 Sep, 2010
3 commits
-
The trivial amount of memory saved isn't worth the cost of dealing with section
mismatches.Signed-off-by: Zach Brown
-
We now ask the transport to give us a rm for the congestion
map, and then we handle it normally. Previously, the
transport defined a function that we would call to send
a congestion map.Convert TCP and loop transports to new cong map method.
Signed-off-by: Andy Grover
-
Signed-off-by: Andy Grover
24 Aug, 2009
1 commit
-
This code allows RDS to be tunneled over a TCP connection.
RDMA operations are disabled when using TCP transport,
but this frees RDS from the IB/RDMA stack dependency, and allows
it to be used with standard Ethernet adapters, or in a VM.Signed-off-by: Andy Grover
Signed-off-by: David S. Miller