Eric Lee / smarc-fsl-linux-kernel

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword ... Browse Code »

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

02 Jul, 2020

1 commit

9ef845f89 rds: If one path needs re-connection, check all and re-connect ... Browse Code »

In testing with mprds enabled, Oracle Cluster nodes after reboot were
not able to communicate with others nodes and so failed to rejoin
the cluster. Peers with lower IP address initiated connection but the
node could not respond as it choose a different path and could not
initiate a connection as it had a higher IP address.

With this patch, when a node sends out a packet and the selected path
is down, all other paths are also checked and any down paths are
re-connected.

Reviewed-by: Ka-cheong Poon
Reviewed-by: David Edmondson
Signed-off-by: Somasundaram Krishnasamy
Signed-off-by: Rao Shoaib
Signed-off-by: David S. Miller

Rao Shoaib
2020-07-02 08:35:17 +0800

16 Apr, 2020

1 commit

7dba92037 net/rds: Use ERR_PTR for rds_message_alloc_sgs() ... Browse Code »

Returning the error code via a 'int *ret' when the function returns a
pointer is very un-kernely and causes gcc 10's static analysis to choke:

net/rds/message.c: In function ‘rds_message_map_pages’:
net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
358 | return ERR_PTR(ret);

Use a typical ERR_PTR return instead.

Signed-off-by: Jason Gunthorpe
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Jason Gunthorpe
2020-04-16 03:33:29 +0800

05 Sep, 2019

1 commit

842841ece Convert usage of IN_MULTICAST to ipv4_is_multicast ... Browse Code »

IN_MULTICAST's primary intent is as a uapi macro.

Elsewhere in the kernel we use ipv4_is_multicast consistently.

This patch unifies linux's multicast checks to use that function
rather than this macro.

Signed-off-by: Dave Taht
Reviewed-by: Toke Høiland-Jørgensen
Signed-off-by: David S. Miller

Dave Taht
2019-09-05 15:38:32 +0800

16 Aug, 2019

1 commit

11740ef44 rds: check for excessive looping in rds_send_xmit ... Browse Code »

Original commit from 2011 updated to include a change by
Yuval Shaia
that adds a new statistic counter "send_stuck_rm"
to capture the messages looping exessively
in the send path.

Signed-off-by: Gerd Rausch
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Andy Grover
2019-08-16 03:04:24 +0800

10 Jul, 2019

1 commit

616d37a07 rds: fix reordering with composite message notification ... Browse Code »

RDS composite message(rdma + control) user notification needs to be
triggered once the full message is delivered and such a fix was
added as part of commit 941f8d55f6d61 ("RDS: RDMA: Fix the composite
message user notification"). But rds_send_remove_from_sock is missing
data part notify check and hence at times the user don't get
notification which isn't desirable.

One way is to fix the rds_send_remove_from_sock to check of that case
but considering the ordering complexity with completion handler and
rdma + control messages are always dispatched back to back in same send
context, just delaying the signaled completion on rmda work request also
gets the desired behaviour. i.e Notifying application only after
RDMA + control message send completes. So patch updates the earlier
fix with this approach. The delay signaling completions of rdma op
till the control message send completes fix was done by Venkat
Venkatsubra in downstream kernel.

Reviewed-and-tested-by: Zhu Yanjun
Reviewed-by: Gerd Rausch
Signed-off-by: Santosh Shilimkar

Santosh Shilimkar
2019-07-10 12:45:41 +0800

05 Feb, 2019

2 commits

fd261ce6a rds: rdma: update rdma transport for tos ... Browse Code »

For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13)
to provide clients the ability to segregate traffic flows for
different type of data. RDMA CM abstract it for ULPs using
rdma_set_service_type(). Internally, each traffic flow is
represented by a connection with all of its independent resources
like that of a normal connection, and is differentiated by
service type. In other words, there can be multiple qp connections
between an IP pair and each supports a unique service type.

The feature has been added from RDSv4.1 onwards and supports
rolling upgrades. RDMA connection metadata also carries the tos
information to set up SL on end to end context. The original
code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved over period of time.

Reviewed-by: Sowmini Varadhan
Signed-off-by: Santosh Shilimkar
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun

Santosh Shilimkar
2019-02-05 06:59:13 +0800
3eb450367 rds: add type of service(tos) infrastructure ... Browse Code »

RDS Service type (TOS) is user-defined and needs to be configured
via RDS IOCTL interface. It must be set before initiating any
traffic and once set the TOS can not be changed. All out-going
traffic from the socket will be associated with its TOS.

Reviewed-by: Sowmini Varadhan
Signed-off-by: Santosh Shilimkar
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun

Santosh Shilimkar
2019-02-05 06:59:12 +0800

07 Jan, 2019

1 commit

eeb2c4fb6 rds: use DIV_ROUND_UP instead of ceil ... Browse Code »

Yes indeed, DIV_ROUND_UP is in kernel.h.

Signed-off-by: Jacob Wen
Signed-off-by: David S. Miller

Jacob Wen
2019-01-07 23:22:36 +0800

20 Dec, 2018

3 commits

d84e7bc05 rds: Fix warning. ... Browse Code »

>> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer

Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs")
Reported-by: kbuild test robot
Signed-off-by: David S. Miller

David S. Miller
2018-12-20 12:53:18 +0800
c75ab8a55 net/rds: remove user triggered WARN_ON in rds_sendmsg ... Browse Code »

per comment from Leon in rdma mailing list
https://lkml.org/lkml/2018/10/31/312 :

Please don't forget to remove user triggered WARN_ON.
https://lwn.net/Articles/769365/
"Greg Kroah-Hartman raised the problem of core kernel API code that will
use WARN_ON_ONCE() to complain about bad usage; that will not generate
the desired result if WARN_ON_ONCE() is configured to crash the machine.
He was told that the code should just call pr_warn() instead, and that
the called function should return an error in such situations. It was
generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
triggered from user space need to be fixed."

in addition harden rds_sendmsg to detect and overcome issues with
invalid sg count and fail the sendmsg.

Suggested-by: Leon Romanovsky
Acked-by: Santosh Shilimkar
Signed-off-by: shamir rabinovitch
Signed-off-by: David S. Miller

shamir rabinovitch
2018-12-20 02:27:58 +0800
ea010070d net/rds: fix warn in rds_message_alloc_sgs ... Browse Code »

redundant copy_from_user in rds_sendmsg system call expose rds
to issue where rds_rdma_extra_size walk the rds iovec and and
calculate the number pf pages (sgs) it need to add to the tail of
rds message and later rds_cmsg_rdma_args copy the rds iovec again
and re calculate the same number and get different result causing
WARN_ON in rds_message_alloc_sgs.

fix this by doing the copy_from_user only once per rds_sendmsg
system call.

When issue occur the below dump is seen:

WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
panic+0x2ad/0x55c kernel/panic.c:188
__warn.cold.8+0x20/0x45 kernel/panic.c:540
report_bug+0x254/0x2d0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
rds_cmsg_send net/rds/send.c:971 [inline]
rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:632
___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
__sys_sendmsg+0x11d/0x280 net/socket.c:2155
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44a859
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
Kernel Offset: disabled
Rebooting in 86400 seconds..

Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
Acked-by: Santosh Shilimkar
Signed-off-by: shamir rabinovitch
Signed-off-by: David S. Miller

shamir rabinovitch
2018-12-20 02:27:58 +0800

11 Oct, 2018

1 commit

9a4890bd6 rds: RDS (tcp) hangs on sendto() to unresponding address ... Browse Code »

In rds_send_mprds_hash(), if the calculated hash value is non-zero and
the MPRDS connections are not yet up, it will wait. But it should not
wait if the send is non-blocking. In this case, it should just use the
base c_path for sending the message.

Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Ka-Cheong Poon
2018-10-11 13:19:52 +0800

03 Aug, 2018

1 commit

89b1698c9 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The BTF conflicts were simple overlapping changes.

The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.

Signed-off-by: David S. Miller

David S. Miller
2018-08-03 01:55:32 +0800

02 Aug, 2018

1 commit

e65d4d963 rds: Remove IPv6 dependency ... Browse Code »

This patch removes the IPv6 dependency from RDS.

Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Ka-Cheong Poon
2018-08-02 00:32:35 +0800

27 Jul, 2018

1 commit

9e630bcb7 RDS: RDMA: Fix the NULL-ptr deref in rds_ib_get_mr ... Browse Code »

Registration of a memory region(MR) through FRMR/fastreg(unlike FMR)
needs a connection/qp. With a proxy qp, this dependency on connection
will be removed, but that needs more infrastructure patches, which is a
work in progress.

As an intermediate fix, the get_mr returns EOPNOTSUPP when connection
details are not populated. The MR registration through sendmsg() will
continue to work even with fast registration, since connection in this
case is formed upfront.

This patch fixes the following crash:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544
RSP: 0018:ffff8801b059f890 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e
RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068
RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200
R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200
R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000
FS: 00007f4d050af700(0000) GS:ffff8801db300000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271
rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357
rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4456d9
RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9
RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004
RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000
R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005
Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00
48
b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 3c 02
00 0f
85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00
RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP:
ffff8801b059f890
---[ end trace 7e1cea13b85473b0 ]---

Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com
Signed-off-by: Santosh Shilimkar
Signed-off-by: Avinash Repaka

Signed-off-by: David S. Miller

Avinash Repaka
2018-07-27 05:03:07 +0800

26 Jul, 2018

1 commit

dc66fe43b rds: send: Fix dead code in rds_sendmsg ... Browse Code »

Currently, code at label *out* is unreachable. Fix this by updating
variable *ret* with -EINVAL, so the jump to *out* can be properly
executed instead of directly returning from function.

Addresses-Coverity-ID: 1472059 ("Structurally dead code")
Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
Signed-off-by: Gustavo A. R. Silva
Acked-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2018-07-26 13:37:31 +0800

24 Jul, 2018

2 commits

1e2b44e78 rds: Enable RDS IPv6 support ... Browse Code »

This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
connection requests. RDS/RDMA/IB uses a private data (struct
rds_ib_connect_private) exchange between endpoints at RDS connection
establishment time to support RDMA. This private data exchange uses a
32 bit integer to represent an IP address. This needs to be changed in
order to support IPv6. A new private data struct
rds6_ib_connect_private is introduced to handle this. To ensure
backward compatibility, an IPv6 capable RDS stack uses another RDMA
listener port (RDS_CM_PORT) to accept IPv6 connection. And it
continues to use the original RDS_PORT for IPv4 RDS connections. When
it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
send the connection set up request.

v5: Fixed syntax problem (David Miller).

v4: Changed port history comments in rds.h (Sowmini Varadhan).

v3: Added support to set up IPv4 connection using mapped address
(David Miller).
Added support to set up connection between link local and non-link
addresses.
Various review comments from Santosh Shilimkar and Sowmini Varadhan.

v2: Fixed bound and peer address scope mismatched issue.
Added back rds_connect() IPv6 changes.

Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Ka-Cheong Poon
2018-07-24 12:17:44 +0800
eee2fa6ab rds: Changing IP address internal representation to struct in6_addr ... Browse Code »

This patch changes the internal representation of an IP address to use
struct in6_addr. IPv4 address is stored as an IPv4 mapped address.
All the functions which take an IP address as argument are also
changed to use struct in6_addr. But RDS socket layer is not modified
such that it still does not accept IPv6 address from an application.
And RDS layer does not accept nor initiate IPv6 connections.

v2: Fixed sparse warnings.

Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Ka-Cheong Poon
2018-07-24 12:17:44 +0800

11 Apr, 2018

1 commit

a43cced9a rds: MP-RDS may use an invalid c_path ... Browse Code »

rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
send a message. Suppose the RDS connection is not yet up. In
rds_send_mprds_hash(), it does

if (conn->c_npaths == 0)
wait_event_interruptible(conn->c_hs_waitq,
(conn->c_npaths != 0));

If it is interrupted before the connection is set up,
rds_send_mprds_hash() will return a non-zero hash value. Hence
rds_sendmsg() will use a non-zero c_path to send the message. But if
the RDS connection ends up to be non-MP capable, the message will be
lost as only the zero c_path can be used.

Signed-off-by: Ka-Cheong Poon
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Ka-Cheong Poon
2018-04-11 22:24:01 +0800

24 Feb, 2018

1 commit

79a5b9727 rds: rds_msg_zcopy should return error of null rm->data.op_mmp_znotifier ... Browse Code »

if either or both of MSG_ZEROCOPY and SOCK_ZEROCOPY have not been
specified, the rm->data.op_mmp_znotifier allocation will be skipped.
In this case, it is invalid ot pass down a cmsghdr with
RDS_CMSG_ZCOPY_COOKIE, so return EINVAL from rds_msg_zcopy for this
case.

Reported-by: syzbot+f893ae7bb2f6456dfbc3@syzkaller.appspotmail.com
Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
Signed-off-by: Sowmini Varadhan
Acked-by: Willem de Bruijn
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2018-02-24 01:30:52 +0800

22 Feb, 2018

1 commit

f90531135 rds: send: mark expected switch fall-through in rds_rm_size ... Browse Code »

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1465362 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva
Acked-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2018-02-22 03:18:18 +0800

17 Feb, 2018

2 commits

0cebaccef rds: zerocopy Tx support. ... Browse Code »

If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg() are pinned.

The pinning uses the accounting infrastructure added by
Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")

The payload bytes in the message may not be modified for the
duration that the message has been pinned. A multi-threaded
application using this infrastructure may thus need to be notified
about send-completion so that it can free/reuse the buffers
passed to rds_sendmsg(). Notification of send-completion will
identify each message-buffer by a cookie that the application
must specify as ancillary data to rds_sendmsg().
The ancillary data in this case has cmsg_level == SOL_RDS
and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.

Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Sowmini Varadhan
2018-02-17 05:04:17 +0800
ea8994cb0 rds: hold a sock ref from rds_message to the rds_sock ... Browse Code »

The existing model holds a reference from the rds_sock to the
rds_message, but the rds_message does not itself hold a sock_put()
on the rds_sock. Instead the m_rs field in the rds_message is
assigned when the message is queued on the sock, and nulled when
the message is dequeued from the sock.

We want to be able to notify userspace when the rds_message
is actually freed (from rds_message_purge(), after the refcounts
to the rds_message go to 0). At the time that rds_message_purge()
is called, the message is no longer on the rds_sock retransmit
queue. Thus the explicit reference for the m_rs is needed to
send a notification that will signal to userspace that
it is now safe to free/reuse any pages that may have
been pinned down for zerocopy.

This patch manages the m_rs assignment in the rds_message with
the necessary refcount book-keeping.

Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2018-02-17 05:04:16 +0800

09 Feb, 2018

1 commit

ebeeb1ad9 rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds… ... Browse Code »

… connection/workq management

An rds_connection can get added during netns deletion between lines 528
and 529 of

506 static void rds_tcp_kill_sock(struct net *net)
:
/* code to pull out all the rds_connections that should be destroyed */
:
528 spin_unlock_irq(&rds_tcp_conn_lock);
529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
530 rds_conn_destroy(tc->t_cpath->cp_conn);

Such an rds_connection would miss out the rds_conn_destroy()
loop (that cancels all pending work) and (if it was scheduled
after netns deletion) could trigger the use-after-free.

A similar race-window exists for the module unload path
in rds_tcp_exit -> rds_tcp_destroy_conns

Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
by checking check_net() before enqueuing new work or adding new
connections.

Concurrency with module-unload is handled by maintaining a module
specific flag that is set at the start of the module exit function,
and must be checked before enqueuing new work or adding new connections.

This commit refactors existing RDS_DESTROY_PENDING checks added by
commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with
connection teardown") and consolidates all the concurrency checks
listed above into the function rds_destroy_pending().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Sowmini Varadhan
2018-02-09 04:23:52 +0800

06 Jan, 2018

1 commit

3db6e0d17 rds: use RCU to synchronize work-enqueue with connection teardown ... Browse Code »

rds_sendmsg() can enqueue work on cp_send_w from process context, but
it should not enqueue this work if connection teardown has commenced
(else we risk enquing work after rds_conn_path_destroy() has assumed that
all work has been cancelled/flushed).

Similarly some other functions like rds_cong_queue_updates
and rds_tcp_data_ready are called in softirq context, and may end
up enqueuing work on rds_wq after rds_conn_path_destroy() has assumed
that all workqs are quiesced.

Check the RDS_DESTROY_PENDING bit and use rcu synchronization to avoid
all these races.

Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2018-01-06 02:39:18 +0800

27 Dec, 2017

1 commit

14e138a86 RDS: Check cmsg_len before dereferencing CMSG_DATA ... Browse Code »

RDS currently doesn't check if the length of the control message is
large enough to hold the required data, before dereferencing the control
message data. This results in following crash:

BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
[inline]
BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
net/rds/send.c:1066
Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157

CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
rds_rdma_bytes net/rds/send.c:1013 [inline]
rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
sock_sendmsg_nosec net/socket.c:628 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:638
___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
__sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
SYSC_sendmmsg net/socket.c:2139 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2134
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x43fe49
RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000

To fix this, we verify that the cmsg_len is large enough to hold the
data to be read, before proceeding further.

Reported-by: syzbot
Signed-off-by: Avinash Repaka
Acked-by: Santosh Shilimkar
Reviewed-by: Yuval Shaia
Signed-off-by: David S. Miller

Avinash Repaka
2017-12-27 23:37:23 +0800

08 Sep, 2017

1 commit

126f760ca rds: Fix incorrect statistics counting ... Browse Code »

In rds_send_xmit() there is logic to batch the sends. However, if
another thread has acquired the lock and has incremented the send_gen,
it is considered a race and we yield. The code incrementing the
s_send_lock_queue_raced statistics counter did not count this event
correctly.

This commit counts the race condition correctly.

Changes from v1:
- Removed check for *someone_on_xmit()*
- Fixed incorrect indentation

Signed-off-by: Håkon Bugge
Reviewed-by: Knut Omang
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Håkon Bugge
2017-09-08 11:07:13 +0800

06 Sep, 2017

1 commit

f530f39f5 rds: Fix non-atomic operation on shared flag variable ... Browse Code »

The bits in m_flags in struct rds_message are used for a plurality of
reasons, and from different contexts. To avoid any missing updates to
m_flags, use the atomic set_bit() instead of the non-atomic equivalent.

Signed-off-by: Håkon Bugge
Reviewed-by: Knut Omang
Reviewed-by: Wei Lin Guay
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Håkon Bugge
2017-09-06 05:49:49 +0800

21 Jul, 2017

1 commit

e623a48ee rds: Make sure updates to cp_send_gen can be observed ... Browse Code »

cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.

This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.

Normative reference from the Linux-Kernel Memory Model:

Loads from and stores to shared (but non-atomic) variables should
be protected with the READ_ONCE(), WRITE_ONCE(), and
ACCESS_ONCE().

Clause 5.1.2.4/25 in the C standard is also relevant.

Signed-off-by: Håkon Bugge
Reviewed-by: Knut Omang
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Håkon Bugge
2017-07-21 06:33:01 +0800

22 Jun, 2017

1 commit

69b92b5b7 rds: tcp: send handshake ping-probe from passive endpoint ... Browse Code »

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Signed-off-by: Sowmini Varadhan
Tested-by: Jenny Xu
Signed-off-by: David S. Miller

Sowmini Varadhan
2017-06-22 23:34:04 +0800

17 Jun, 2017

1 commit

00354de57 rds: tcp: various endian-ness fixes ... Browse Code »

Found when testing between sparc and x86 machines on different
subnets, so the address comparison patterns hit the corner cases and
brought out some bugs fixed by this patch.

Signed-off-by: Sowmini Varadhan
Tested-by: Imanti Mendez
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2017-06-17 00:45:15 +0800

03 Jan, 2017

4 commits

f9fb69adb RDS: make message size limit compliant with spec ... Browse Code »

RDS support max message size as 1M but the code doesn't check this
in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size
and its enforced irrespective of underlying transport.

Signed-off-by: Avinash Repaka
Signed-off-by: Santosh Shilimkar

Avinash Repaka
2017-01-03 06:02:57 +0800
941f8d55f RDS: RDMA: Fix the composite message user notification ... Browse Code »

When application sends an RDS RDMA composite message consist of
RDMA transfer to be followed up by non RDMA payload, it expect to
be notified *only* when the full message gets delivered. RDS RDMA
notification doesn't behave this way though.

Thanks to Venkat for debug and root casuing the issue
where only first part of the message(RDMA) was
successfully delivered but remainder payload delivery failed.
In that case, application should not be notified with
a false positive of message delivery success.

Fix this case by making sure the user gets notified only after
the full message delivery.

Reviewed-by: Venkat Venkatsubra
Signed-off-by: Santosh Shilimkar

Santosh Shilimkar
2017-01-03 06:02:54 +0800
584a8279a RDS: RDMA: return appropriate error on rdma map failures ... Browse Code »

The first message to a remote node should prompt a new
connection even if it is RDMA operation. For RDMA operation
the MR mapping can fail because connections is not yet up.

Since the connection establishment is asynchronous,
we make sure the map failure because of unavailable
connection reach to the user by appropriate error code.
Before returning to the user, lets trigger the connection
so that its ready for the next retry.

Signed-off-by: Santosh Shilimkar

Santosh Shilimkar
2017-01-03 06:02:46 +0800
bb7897631 RDS: mark few internal functions static to make sparse build happy ... Browse Code »

Fixes below warnings:
warning: symbol 'rds_send_probe' was not declared. Should it be static?
warning: symbol 'rds_send_ping' was not declared. Should it be static?
warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar

Santosh Shilimkar
2017-01-03 06:02:40 +0800

18 Nov, 2016

1 commit

905dd4184 RDS: TCP: Track peer's connection generation number ... Browse Code »

The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
but the RDS socket/connection layer on both sides stays
the same
(b) when the peer's RDS layer itself resets (e.g., due to module
reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.

Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-11-18 02:35:18 +0800

16 Jul, 2016

1 commit

5916e2c15 RDS: TCP: Enable multipath RDS for TCP ... Browse Code »

Use RDS probe-ping to compute how many paths may be used with
the peer, and to synchronously start the multiple paths. If mprds is
supported, hash outgoing traffic to one of multiple paths in rds_sendmsg()
when multipath RDS is supported by the transport.

CC: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-07-16 02:36:58 +0800

02 Jul, 2016

1 commit

226f7a7d9 RDS: Rework path specific indirections ... Browse Code »

Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.

Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-07-02 04:45:17 +0800

15 Jun, 2016

1 commit

d769ef81d RDS: Update rds_conn_shutdown to work with rds_conn_path ... Browse Code »

This commit changes rds_conn_shutdown to take a rds_conn_path *
argument, allowing it to shutdown paths other than c_path[0] for
MP-capable transports.

Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-06-15 14:50:44 +0800