24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
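A minimal user-space sketch of how such a pseudo-keyword can be defined, loosely modeled on the kernel's compiler-attribute header. The `__has_attribute` probe assumes GCC 5+ or Clang, and `count_levels()` is a hypothetical function that only shows deliberate fall-through.

```c
/* Expand to the fallthrough statement attribute when the compiler
 * supports it, otherwise to an empty statement. */
#ifdef __has_attribute
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each case deliberately falls into the next,
 * so level 2 counts three steps, level 1 two and level 0 one. */
static int count_levels(int level)
{
	int n = 0;

	switch (level) {
	case 2:
		n++;
		fallthrough;
	case 1:
		n++;
		fallthrough;
	case 0:
		n++;
		break;
	default:
		n = -1;
		break;
	}
	return n;
}
```

With `-Wimplicit-fallthrough`, the compiler stays quiet on the marked transitions but would warn if one of the `fallthrough;` statements were removed.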
     

02 Jul, 2020

1 commit

  • In testing with mprds enabled, Oracle Cluster nodes were not able to
    communicate with other nodes after a reboot and so failed to rejoin
    the cluster. Peers with a lower IP address initiated the connection, but
    the node could not respond because it chose a different path, and it
    could not initiate a connection itself because it had a higher IP address.

    With this patch, when a node sends out a packet and the selected path
    is down, all other paths are also checked and any down paths are
    re-connected.

    Reviewed-by: Ka-cheong Poon
    Reviewed-by: David Edmondson
    Signed-off-by: Somasundaram Krishnasamy
    Signed-off-by: Rao Shoaib
    Signed-off-by: David S. Miller

    Rao Shoaib
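A simplified model of the reconnect sweep described above, assuming each path exposes only an up/down flag. `struct path_model`, `request_reconnect()` and `kick_down_paths()` are hypothetical names, not the functions the patch changes, and a reconnect is reduced to a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one RDS connection with several paths. */
struct path_model {
	bool up;
	unsigned int reconnects;	/* reconnects requested on this path */
};

/* Stand-in for queueing a reconnect on one path. */
static void request_reconnect(struct path_model *p)
{
	p->reconnects++;
}

/* If the path picked for a send is down, sweep every path and request
 * a reconnect for each one that is down, not just the selected one.
 * Returns the number of paths kicked. */
static size_t kick_down_paths(struct path_model *paths, size_t npaths,
			      size_t selected)
{
	size_t i, kicked = 0;

	if (paths[selected].up)
		return 0;
	for (i = 0; i < npaths; i++) {
		if (!paths[i].up) {
			request_reconnect(&paths[i]);
			kicked++;
		}
	}
	return kicked;
}
```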
     

16 Apr, 2020

1 commit

  • Returning the error code via an 'int *ret' when the function returns a
    pointer is very un-kernely and causes gcc 10's static analysis to choke:

    net/rds/message.c: In function ‘rds_message_map_pages’:
    net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    358 | return ERR_PTR(ret);

    Use a typical ERR_PTR return instead.

    Signed-off-by: Jason Gunthorpe
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Jason Gunthorpe
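A user-space sketch of the ERR_PTR convention the fix switches to. The helpers are simplified copies of those in the kernel's `include/linux/err.h` (which also carry sparse and `__must_check` annotations), and `map_pages_sketch()` is a hypothetical stand-in for `rds_message_map_pages()`, assuming a 4 KiB page.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* The top MAX_ERRNO addresses are reserved to carry a negative errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator in the style of the fixed function: the error
 * travels in the returned pointer, so no 'int *ret' is needed and
 * there is no out-parameter the compiler could see as uninitialized. */
static void *map_pages_sketch(int nr_pages)
{
	void *pages;

	if (nr_pages <= 0)
		return ERR_PTR(-EINVAL);
	pages = calloc((size_t)nr_pages, 4096);
	if (!pages)
		return ERR_PTR(-ENOMEM);
	return pages;
}
```

A caller tests the result with `IS_ERR()` and, on failure, propagates `PTR_ERR()` as its own return code.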
     

05 Sep, 2019

1 commit

  • IN_MULTICAST's primary intent is as a uapi macro.

    Elsewhere in the kernel we use ipv4_is_multicast consistently.

    This patch unifies Linux's multicast checks to use that function
    rather than this macro.

    Signed-off-by: Dave Taht
    Reviewed-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Dave Taht
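For reference, the check can be written in user space as below. It mirrors the kernel helper's logic on a network-byte-order address (224.0.0.0/4 is multicast), but this copy and its `uint32_t` parameter are a sketch, not the kernel's `__be32` signature.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* 'addr_be' is in network byte order; the top four bits 1110 mark
 * a multicast (class D) address. */
static inline bool ipv4_is_multicast(uint32_t addr_be)
{
	return (addr_be & htonl(0xf0000000)) == htonl(0xe0000000);
}
```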
     

16 Aug, 2019

1 commit


10 Jul, 2019

1 commit

  • RDS composite message (RDMA + control) user notification needs to be
    triggered once the full message is delivered, and such a fix was
    added as part of commit 941f8d55f6d61 ("RDS: RDMA: Fix the composite
    message user notification"). But rds_send_remove_from_sock is missing
    the data part notify check, and hence at times the user doesn't get a
    notification, which isn't desirable.

    One way is to fix rds_send_remove_from_sock to check for that case,
    but considering the ordering complexity with the completion handler, and
    since RDMA + control messages are always dispatched back to back in the
    same send context, just delaying the signaled completion on the RDMA work
    request also gets the desired behaviour, i.e., notifying the application
    only after the RDMA + control message send completes. So this patch
    updates the earlier fix with this approach. The fix that delays signaling
    completion of the RDMA op until the control message send completes was
    done by Venkat Venkatsubra in the downstream kernel.

    Reviewed-and-tested-by: Zhu Yanjun
    Reviewed-by: Gerd Rausch
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
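The delayed-signaling idea can be modeled as follows, assuming a composite send is exactly one RDMA work request posted back to back with one control work request. `struct wr_model`, `post_composite()` and `on_send_complete()` are hypothetical, and no verbs API is called.

```c
#include <stdbool.h>

/* Hypothetical model of one work request (WR). */
struct wr_model {
	bool signaled;		/* generate a completion for this WR? */
};

struct composite_send {
	struct wr_model rdma;
	struct wr_model control;
	int user_notified;	/* notifications delivered to the app */
};

/* Leave the RDMA WR unsignaled and signal only the control WR, so the
 * single completion arrives after both parts were sent. */
static void post_composite(struct composite_send *cs)
{
	cs->rdma.signaled = false;
	cs->control.signaled = true;
	cs->user_notified = 0;
}

/* Completion handler.  With real verbs an unsignaled WR produces no
 * completion entry at all; the check just makes the model explicit. */
static void on_send_complete(struct composite_send *cs,
			     const struct wr_model *wr)
{
	if (wr->signaled)
		cs->user_notified++;
}
```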
     

05 Feb, 2019

2 commits

  • For RDMA transports, RDS TOS is an extension of IB QoS (Annex A13)
    to provide clients the ability to segregate traffic flows for
    different types of data. RDMA CM abstracts it for ULPs using
    rdma_set_service_type(). Internally, each traffic flow is
    represented by a connection with all of its independent resources,
    like that of a normal connection, and is differentiated by
    service type. In other words, there can be multiple QP connections
    between an IP pair, and each supports a unique service type.

    The feature has been added from RDSv4.1 onwards and supports
    rolling upgrades. RDMA connection metadata also carries the TOS
    information to set up the SL in the end-to-end context. The original
    code was developed by Bang Nguyen in the downstream kernel back in
    the 2.6.32 kernel days, and it has evolved over time.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
     
  • RDS service type (TOS) is user-defined and needs to be configured
    via the RDS IOCTL interface. It must be set before initiating any
    traffic, and once set the TOS cannot be changed. All outgoing
    traffic from the socket will be associated with its TOS.

    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    [yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
    Signed-off-by: Zhu Yanjun

    Santosh Shilimkar
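A small model of the two rules described in these commits: connections between one address pair are told apart by TOS, and a socket's TOS can only be set once, before traffic starts. All names are hypothetical, IPv4-only keys are assumed for brevity, and the real code path goes through the RDS ioctl rather than a plain function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Connections between one address pair are kept apart by service
 * type, so the lookup key includes the TOS. */
struct conn_key {
	uint32_t laddr, faddr;
	uint8_t tos;
};

static bool conn_key_eq(const struct conn_key *a, const struct conn_key *b)
{
	return a->laddr == b->laddr && a->faddr == b->faddr &&
	       a->tos == b->tos;
}

struct sock_model {
	uint8_t tos;
	bool tos_set;
	bool has_conn;		/* traffic already started */
};

/* Set-once rule: reject the change once traffic has started or a TOS
 * was already chosen. */
static int set_tos(struct sock_model *s, uint8_t tos)
{
	if (s->tos_set || s->has_conn)
		return -EINVAL;
	s->tos = tos;
	s->tos_set = true;
	return 0;
}
```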
     

07 Jan, 2019

1 commit


20 Dec, 2018

3 commits

  • >> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer

    Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs")
    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Per a comment from Leon on the rdma mailing list
    https://lkml.org/lkml/2018/10/31/312 :

    Please don't forget to remove user triggered WARN_ON.
    https://lwn.net/Articles/769365/
    "Greg Kroah-Hartman raised the problem of core kernel API code that will
    use WARN_ON_ONCE() to complain about bad usage; that will not generate
    the desired result if WARN_ON_ONCE() is configured to crash the machine.
    He was told that the code should just call pr_warn() instead, and that
    the called function should return an error in such situations. It was
    generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
    triggered from user space need to be fixed."

    In addition, harden rds_sendmsg to detect and overcome issues with an
    invalid sg count, and fail the sendmsg.

    Suggested-by: Leon Romanovsky
    Acked-by: Santosh Shilimkar
    Signed-off-by: shamir rabinovitch
    Signed-off-by: David S. Miller

    shamir rabinovitch
     
  • A redundant copy_from_user in the rds_sendmsg system call exposes RDS
    to an issue where rds_rdma_extra_size walks the RDS iovec and
    calculates the number of pages (sgs) it needs to add to the tail of
    the RDS message, and later rds_cmsg_rdma_args copies the RDS iovec
    again, recalculates the same number, and gets a different result,
    causing a WARN_ON in rds_message_alloc_sgs.

    Fix this by doing the copy_from_user only once per rds_sendmsg
    system call.

    When the issue occurs, the dump below is seen:

    WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x244/0x39d lib/dump_stack.c:113
    panic+0x2ad/0x55c kernel/panic.c:188
    __warn.cold.8+0x20/0x45 kernel/panic.c:540
    report_bug+0x254/0x2d0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
    RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
    Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
    RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
    RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
    RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
    RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
    R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
    R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
    rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
    rds_cmsg_send net/rds/send.c:971 [inline]
    rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
    sock_sendmsg_nosec net/socket.c:622 [inline]
    sock_sendmsg+0xd5/0x120 net/socket.c:632
    ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
    __sys_sendmsg+0x11d/0x280 net/socket.c:2155
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x44a859
    Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
    RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
    RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
    R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
    Kernel Offset: disabled
    Rebooting in 86400 seconds..

    Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
    Acked-by: Santosh Shilimkar
    Signed-off-by: shamir rabinovitch
    Signed-off-by: David S. Miller

    shamir rabinovitch
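A sketch of the idea behind these sendmsg fixes: copy the RDMA iovec from user space once, size the scatterlist from that single copy, and fail the call with -EINVAL on a bad vector instead of warning later. `struct iovec_sketch`, `pages_in_vec()` and `count_sgs()` are hypothetical, a 4 KiB page is assumed, and the user copy itself is left to the caller.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_SHIFT)

struct iovec_sketch {		/* hypothetical stand-in for an RDS iovec */
	uint64_t addr;
	uint64_t bytes;
};

/* Pages spanned by one vector; 0 for empty, wrapping or oversized
 * vectors. */
static unsigned int pages_in_vec(const struct iovec_sketch *v)
{
	if (v->addr + v->bytes <= v->addr || v->bytes > (uint64_t)UINT_MAX)
		return 0;
	return (unsigned int)(((v->addr + v->bytes + SKETCH_PAGE_SIZE - 1) >>
			       SKETCH_PAGE_SHIFT) -
			      (v->addr >> SKETCH_PAGE_SHIFT));
}

/* Size the scatterlist from the one kernel-side copy of the vectors.
 * Because sizing and mapping read the same copy, the count cannot
 * change in between; a bad vector fails the call instead of tripping
 * a WARN_ON() later. */
static long count_sgs(const struct iovec_sketch *vecs, size_t n)
{
	long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int nr = pages_in_vec(&vecs[i]);

		if (nr == 0)
			return -EINVAL;
		total += nr;
	}
	return total;
}
```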
     

11 Oct, 2018

1 commit

  • In rds_send_mprds_hash(), if the calculated hash value is non-zero and
    the MPRDS connections are not yet up, it will wait. But it should not
    wait if the send is non-blocking. In this case, it should just use the
    base c_path for sending the message.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
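A reduced model of the non-blocking rule, assuming `npaths` is 0 until the handshake completes and `hash` was already computed for the socket. `choose_path_nonblock()` is hypothetical, and the blocking wait is deliberately left out.

```c
#include <stdbool.h>

static int choose_path_nonblock(int hash, int npaths, bool nonblock)
{
	if (npaths == 0 && hash != 0) {
		/* Paths not known yet: a non-blocking sender must not
		 * wait, so it falls back to the base path 0. */
		if (nonblock)
			return 0;
		/* A blocking sender would wait for the handshake here;
		 * that part is omitted from this sketch. */
	}
	return hash;
}
```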
     

03 Aug, 2018

1 commit


02 Aug, 2018

1 commit


27 Jul, 2018

1 commit

  • Registration of a memory region (MR) through FRMR/fastreg (unlike FMR)
    needs a connection/QP. With a proxy QP, this dependency on the
    connection will be removed, but that needs more infrastructure patches,
    which are a work in progress.

    As an intermediate fix, get_mr returns EOPNOTSUPP when connection
    details are not populated. MR registration through sendmsg() will
    continue to work even with fast registration, since the connection in
    this case is formed up front.

    This patch fixes the following crash:
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Modules linked in:
    CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544
    RSP: 0018:ffff8801b059f890 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e
    RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068
    RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200
    R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200
    R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000
    FS: 00007f4d050af700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271
    rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357
    rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x4456d9
    RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9
    RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004
    RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000
    R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005
    Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 3c 02 00 0f 85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00
    RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: ffff8801b059f890
    ---[ end trace 7e1cea13b85473b0 ]---

    Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Avinash Repaka
    Signed-off-by: David S. Miller

    Avinash Repaka
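A sketch of the intermediate fix's shape, assuming the per-connection IB state is reached through a pointer that stays NULL until a connection exists. `struct ib_conn_sketch` and `get_mr_sketch()` are hypothetical stand-ins, not the real `rds_ib_get_mr()`.

```c
#include <errno.h>
#include <stddef.h>

struct ib_conn_sketch;		/* hypothetical per-connection IB state */

/* Fast registration needs a connection/QP, so refuse cleanly when none
 * is attached instead of dereferencing a NULL pointer. */
static int get_mr_sketch(const struct ib_conn_sketch *ic, int use_fastreg)
{
	if (use_fastreg && ic == NULL)
		return -EOPNOTSUPP;
	/* ... register the MR (through the connection's QP if fastreg) ... */
	return 0;
}
```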
     

26 Jul, 2018

1 commit

  • Currently, code at label *out* is unreachable. Fix this by updating
    variable *ret* with -EINVAL, so the jump to *out* can be properly
    executed instead of directly returning from the function.

    Addresses-Coverity-ID: 1472059 ("Structurally dead code")
    Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
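The corrected control flow follows the usual kernel single-exit pattern. `bind_sketch()` and its scratch buffer are hypothetical and only show why setting `ret` and jumping to `out` keeps the cleanup reachable.

```c
#include <errno.h>
#include <stdlib.h>

static int bind_sketch(int addr_ok)
{
	int ret = 0;
	char *scratch = malloc(64);

	if (!scratch)
		return -ENOMEM;
	if (!addr_ok) {
		/* Record the error and take the common exit path rather
		 * than returning here and skipping the cleanup. */
		ret = -EINVAL;
		goto out;
	}
	/* ... the real work would use 'scratch' here ... */
out:
	free(scratch);
	return ret;
}
```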
     

24 Jul, 2018

2 commits

  • This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
    listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
    connection requests. RDS/RDMA/IB uses a private data (struct
    rds_ib_connect_private) exchange between endpoints at RDS connection
    establishment time to support RDMA. This private data exchange uses a
    32-bit integer to represent an IP address. This needs to be changed in
    order to support IPv6. A new private data struct,
    rds6_ib_connect_private, is introduced to handle this. To ensure
    backward compatibility, an IPv6-capable RDS stack uses another RDMA
    listener port (RDS_CM_PORT) to accept IPv6 connections, and it
    continues to use the original RDS_PORT for IPv4 RDS connections. When
    it needs to communicate with an IPv6 peer, it uses RDS_CM_PORT to
    send the connection setup request.

    v5: Fixed syntax problem (David Miller).

    v4: Changed port history comments in rds.h (Sowmini Varadhan).

    v3: Added support to set up IPv4 connection using mapped address
    (David Miller).
    Added support to set up connection between link local and non-link
    addresses.
    Various review comments from Santosh Shilimkar and Sowmini Varadhan.

    v2: Fixed bound and peer address scope mismatch issue.
    Added back rds_connect() IPv6 changes.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     
  • This patch changes the internal representation of an IP address to use
    struct in6_addr. An IPv4 address is stored as an IPv4-mapped address.
    All the functions which take an IP address as an argument are also
    changed to use struct in6_addr. But the RDS socket layer is not
    modified, so it still does not accept IPv6 addresses from an
    application, and the RDS layer neither accepts nor initiates IPv6
    connections.

    v2: Fixed sparse warnings.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
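A user-space sketch of two ideas from these commits: an IPv4 address kept as an IPv4-mapped `struct in6_addr`, and the listener port chosen by address family. The port numbers are my assumption of the kernel's RDS_PORT and RDS_CM_PORT values and should be checked against the headers; `set_v4mapped()` and `connect_port()` are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; verify against the kernel's RDS headers. */
#define SKETCH_RDS_PORT    18634	/* original IPv4 RDMA listener */
#define SKETCH_RDS_CM_PORT 16385	/* listener that also takes IPv6 */

/* Store an IPv4 address (network byte order) as ::ffff:a.b.c.d. */
static void set_v4mapped(struct in6_addr *a, uint32_t v4_be)
{
	memset(a, 0, sizeof(*a));
	a->s6_addr[10] = 0xff;
	a->s6_addr[11] = 0xff;
	memcpy(&a->s6_addr[12], &v4_be, sizeof(v4_be));
}

/* Port used to send a connection request to a peer. */
static unsigned short connect_port(const struct in6_addr *peer)
{
	return IN6_IS_ADDR_V4MAPPED(peer) ? SKETCH_RDS_PORT
					  : SKETCH_RDS_CM_PORT;
}
```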
     

11 Apr, 2018

1 commit

  • rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
    send a message. Suppose the RDS connection is not yet up. In
    rds_send_mprds_hash(), it does

    if (conn->c_npaths == 0)
            wait_event_interruptible(conn->c_hs_waitq,
                                     (conn->c_npaths != 0));

    If it is interrupted before the connection is set up,
    rds_send_mprds_hash() will return a non-zero hash value. Hence
    rds_sendmsg() will use a non-zero c_path to send the message. But if
    the RDS connection turns out not to be MP capable, the message will be
    lost, as only the zero c_path can be used.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
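A reduced model of the decision after the handshake wait, assuming `wait_ret` is the wait's return value (nonzero when interrupted) and `npaths` is what the handshake has learned (0 while unknown). `hash_after_wait()` is hypothetical.

```c
static int hash_after_wait(int hash, int wait_ret, int npaths)
{
	/* Interrupted before the handshake finished: the peer may turn
	 * out not to be MP capable, so only path 0 is safe. */
	if (wait_ret != 0)
		return 0;
	/* Handshake done and the peer supports a single path. */
	if (npaths == 1)
		return 0;
	return hash;
}
```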
     

24 Feb, 2018

1 commit

  • If either or both of MSG_ZEROCOPY and SOCK_ZEROCOPY have not been
    specified, the rm->data.op_mmp_znotifier allocation will be skipped.
    In this case, it is invalid to pass down a cmsghdr with
    RDS_CMSG_ZCOPY_COOKIE, so return EINVAL from rds_msg_zcopy for this
    case.

    Reported-by: syzbot+f893ae7bb2f6456dfbc3@syzkaller.appspotmail.com
    Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Willem de Bruijn
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
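A minimal sketch of the added check, assuming the notifier is a pointer that stays NULL unless zerocopy was requested. `struct msg_sketch`, `struct znotifier_sketch` and `msg_zcopy_sketch()` are hypothetical stand-ins for the RDS structures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct znotifier_sketch {	/* hypothetical completion notifier */
	uint32_t cookie;
};

struct msg_sketch {		/* hypothetical stand-in for the message */
	struct znotifier_sketch *znotifier;	/* NULL unless zerocopy */
};

/* Accept a zerocopy cookie only when zerocopy was actually set up for
 * this message; otherwise there is nowhere to store it. */
static int msg_zcopy_sketch(struct msg_sketch *rm, uint32_t cookie)
{
	if (rm->znotifier == NULL)
		return -EINVAL;
	rm->znotifier->cookie = cookie;
	return 0;
}
```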
     

22 Feb, 2018

1 commit


17 Feb, 2018

2 commits

  • If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
    if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
    application pages sent down with rds_sendmsg() are pinned.

    The pinning uses the accounting infrastructure added by
    commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages").

    The payload bytes in the message may not be modified while the
    message is pinned. A multi-threaded application using this
    infrastructure may thus need to be notified
    about send-completion so that it can free/reuse the buffers
    passed to rds_sendmsg(). Notification of send-completion will
    identify each message-buffer by a cookie that the application
    must specify as ancillary data to rds_sendmsg().
    The ancillary data in this case has cmsg_level == SOL_RDS
    and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • The existing model holds a reference from the rds_sock to the
    rds_message, but the rds_message does not itself hold a reference
    (via sock_hold()) on the rds_sock. Instead, the m_rs field in the
    rds_message is
    assigned when the message is queued on the sock, and nulled when
    the message is dequeued from the sock.

    We want to be able to notify userspace when the rds_message
    is actually freed (from rds_message_purge(), after the refcounts
    to the rds_message go to 0). At the time that rds_message_purge()
    is called, the message is no longer on the rds_sock retransmit
    queue. Thus the explicit reference for the m_rs is needed to
    send a notification that will signal to userspace that
    it is now safe to free/reuse any pages that may have
    been pinned down for zerocopy.

    This patch manages the m_rs assignment in the rds_message with
    the necessary refcount book-keeping.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
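From the application side, the cookie is attached as ancillary data roughly as below. This is a sketch: the fallback values for `SOL_RDS` and `RDS_CMSG_ZCOPY_COOKIE` and the 32-bit cookie size are assumptions to be checked against `<linux/rds.h>`, `attach_zcopy_cookie()` is hypothetical, and the actual `sendmsg()` with `MSG_ZEROCOPY` on a socket with `SO_ZEROCOPY` set is not performed.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fallback values are assumptions; prefer the system headers. */
#ifndef SOL_RDS
#define SOL_RDS 276
#endif
#ifndef RDS_CMSG_ZCOPY_COOKIE
#define RDS_CMSG_ZCOPY_COOKIE 12
#endif

/* Attach a 32-bit cookie to 'msg' using the caller's aligned control
 * buffer.  Returns 0, or -1 if the buffer is too small. */
static int attach_zcopy_cookie(struct msghdr *msg, void *buf, size_t buflen,
			       uint32_t cookie)
{
	struct cmsghdr *cmsg;

	if (buflen < CMSG_SPACE(sizeof(cookie)))
		return -1;
	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(cookie));
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_RDS;
	cmsg->cmsg_type = RDS_CMSG_ZCOPY_COOKIE;
	cmsg->cmsg_len = CMSG_LEN(sizeof(cookie));
	memcpy(CMSG_DATA(cmsg), &cookie, sizeof(cookie));
	return 0;
}
```

The control buffer is usually declared as a union with `struct cmsghdr` so that it is suitably aligned, as in the `cmsg(3)` man page.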
     

09 Feb, 2018

1 commit

  • … connection/workq management

    An rds_connection can get added during netns deletion between lines 528
    and 529 of

    506 static void rds_tcp_kill_sock(struct net *net)
    :
            /* code to pull out all the rds_connections that should be destroyed */
    :
    528         spin_unlock_irq(&rds_tcp_conn_lock);
    529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
    530                 rds_conn_destroy(tc->t_cpath->cp_conn);

    Such an rds_connection would miss the rds_conn_destroy()
    loop (that cancels all pending work) and (if it was scheduled
    after netns deletion) could trigger a use-after-free.

    A similar race window exists for the module unload path
    in rds_tcp_exit() -> rds_tcp_destroy_conns().

    Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
    by checking check_net() before enqueuing new work or adding new
    connections.

    Concurrency with module-unload is handled by maintaining a module
    specific flag that is set at the start of the module exit function,
    and must be checked before enqueuing new work or adding new connections.

    This commit refactors existing RDS_DESTROY_PENDING checks added by
    commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with
    connection teardown") and consolidates all the concurrency checks
    listed above into the function rds_destroy_pending().

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Sowmini Varadhan
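The consolidated check can be modeled as one predicate. Here the netns state and the transport's unloading state are plain flags (assumptions standing in for `check_net()` and the per-transport module-unload flag), and all names are hypothetical.

```c
#include <stdbool.h>

struct net_sketch { bool alive; };		/* false once netns exit starts */
struct trans_sketch { bool unloading; };	/* set first in module exit */

struct conn_sketch {
	struct net_sketch *net;
	struct trans_sketch *trans;
};

/* One predicate covers both races; callers test it before queueing
 * new work or adding a new connection. */
static bool destroy_pending(const struct conn_sketch *c)
{
	return !c->net->alive || c->trans->unloading;
}
```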
     

06 Jan, 2018

1 commit

  • rds_sendmsg() can enqueue work on cp_send_w from process context, but
    it should not enqueue this work if connection teardown has commenced
    (else we risk enqueuing work after rds_conn_path_destroy() has assumed that
    all work has been cancelled/flushed).

    Similarly, some other functions like rds_cong_queue_updates
    and rds_tcp_data_ready are called in softirq context, and may end
    up enqueuing work on rds_wq after rds_conn_path_destroy() has assumed
    that all workqs are quiesced.

    Check the RDS_DESTROY_PENDING bit and use RCU synchronization to avoid
    all these races.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

27 Dec, 2017

1 commit

  • RDS currently doesn't check if the length of the control message is
    large enough to hold the required data, before dereferencing the control
    message data. This results in the following crash:

    BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline]
    BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
    Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157

    CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_address_description+0x73/0x250 mm/kasan/report.c:252
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x25b/0x340 mm/kasan/report.c:409
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
    rds_rdma_bytes net/rds/send.c:1013 [inline]
    rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
    sock_sendmsg_nosec net/socket.c:628 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:638
    ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
    __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
    SYSC_sendmmsg net/socket.c:2139 [inline]
    SyS_sendmmsg+0x35/0x60 net/socket.c:2134
    entry_SYSCALL_64_fastpath+0x1f/0x96
    RIP: 0033:0x43fe49
    RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
    RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
    R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000

    To fix this, we verify that the cmsg_len is large enough to hold the
    data to be read, before proceeding further.

    Reported-by: syzbot
    Signed-off-by: Avinash Repaka
    Acked-by: Santosh Shilimkar
    Reviewed-by: Yuval Shaia
    Signed-off-by: David S. Miller

    Avinash Repaka
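In user-space terms the fix amounts to checking `cmsg_len` against `CMSG_LEN()` of the payload before reading it. `read_cmsg_u64()` is a hypothetical helper, and the 8-byte payload is an assumption chosen to match the 8-byte read in the report.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Copy an 8-byte payload out of a control message only if the header
 * says the payload is really there. */
static int read_cmsg_u64(const struct cmsghdr *cmsg, uint64_t *out)
{
	if (cmsg->cmsg_len < CMSG_LEN(sizeof(*out)))
		return -EINVAL;
	memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
	return 0;
}
```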
     

08 Sep, 2017

1 commit

  • In rds_send_xmit() there is logic to batch the sends. However, if
    another thread has acquired the lock and has incremented the send_gen,
    it is considered a race and we yield. The code incrementing the
    s_send_lock_queue_raced statistics counter did not count this event
    correctly.

    This commit counts the race condition correctly.

    Changes from v1:
    - Removed check for *someone_on_xmit()*
    - Fixed incorrect indentation

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

06 Sep, 2017

1 commit

  • The bits in m_flags in struct rds_message are used for a plurality of
    reasons, and from different contexts. To avoid any missing updates to
    m_flags, use the atomic set_bit() instead of the non-atomic equivalent.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Reviewed-by: Wei Lin Guay
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
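A user-space analogue using C11 atomics: `atomic_fetch_or()` makes the read-modify-write indivisible, which is what the kernel's `set_bit()` provides over the non-atomic `__set_bit()`. The helper names are hypothetical.

```c
#include <stdatomic.h>

/* Concurrent setters of different bits cannot lose each other's
 * updates, which a plain 'flags |= mask' could. */
static void set_bit_sketch(unsigned int nr, atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << nr);
}

static int test_bit_sketch(unsigned int nr, atomic_ulong *flags)
{
	return (int)((atomic_load(flags) >> nr) & 1UL);
}
```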
     

21 Jul, 2017

1 commit

  • cp->cp_send_gen is treated as a normal variable, although it may be
    used by different threads.

    This is fixed by using {READ,WRITE}_ONCE when it is incremented and
    READ_ONCE when it is read outside the {acquire,release}_in_xmit
    protection.

    Normative reference from the Linux-Kernel Memory Model:

    Loads from and stores to shared (but non-atomic) variables should
    be protected with the READ_ONCE(), WRITE_ONCE(), and
    ACCESS_ONCE().

    Clause 5.1.2.4/25 in the C standard is also relevant.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
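Simplified user-space versions of the macros (the kernel's definitions do more checking), applied to a hypothetical `struct cpath_sketch`. They assume GCC or Clang for `__typeof__`, and the plain increment is only safe because, as in the patch, one thread at a time holds the transmit "lock".

```c
/* The volatile access makes the compiler emit exactly one load or
 * store and stops it from merging, caching or re-reading the value. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct cpath_sketch {
	unsigned long send_gen;
};

/* Bumped by the thread that currently owns transmission. */
static unsigned long bump_send_gen(struct cpath_sketch *cp)
{
	unsigned long gen = READ_ONCE(cp->send_gen) + 1;

	WRITE_ONCE(cp->send_gen, gen);
	return gen;
}

/* Read outside the lock, e.g. to notice that another sender raced. */
static int send_gen_changed(struct cpath_sketch *cp, unsigned long seen)
{
	return READ_ONCE(cp->send_gen) != seen;
}
```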
     

22 Jun, 2017

1 commit

  • The RDS handshake ping probe added by commit 5916e2c1554f
    ("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
    before the first data packet is sent to a peer. If the conversation
    is not bidirectional (i.e., one side is always passive and never
    invokes rds_sendmsg()) and the passive side restarts its rds_tcp
    module, a new HS ping probe needs to be sent, so that the number
    of paths can be re-established.

    This patch achieves that by sending a HS ping probe from
    rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
    a handshake probe with this peer yet).

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Jun, 2017

1 commit

  • These bugs were found when testing between SPARC and x86 machines on
    different subnets, where the address comparison patterns hit corner
    cases; this patch fixes them.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

03 Jan, 2017

4 commits

  • RDS supports a maximum message size of 1M, but the code doesn't check
    this in all cases. This patch fixes it for RDMA and non-RDMA messages
    and for the RDS MR size, and the limit is enforced irrespective of the
    underlying transport.

    Signed-off-by: Avinash Repaka
    Signed-off-by: Santosh Shilimkar

    Avinash Repaka
     
  • When an application sends an RDS RDMA composite message consisting of
    an RDMA transfer followed by a non-RDMA payload, it expects to
    be notified *only* when the full message gets delivered. RDS RDMA
    notification doesn't behave this way, though.

    Thanks to Venkat for debugging and root-causing the issue
    where only the first part of the message (RDMA) was
    successfully delivered but the remaining payload delivery failed.
    In that case, the application should not be notified with
    a false positive of message delivery success.

    Fix this case by making sure the user gets notified only after
    the full message delivery.

    Reviewed-by: Venkat Venkatsubra
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • The first message to a remote node should prompt a new
    connection even if it is an RDMA operation. For an RDMA operation,
    the MR mapping can fail because the connection is not yet up.

    Since the connection establishment is asynchronous,
    we make sure a map failure caused by an unavailable
    connection reaches the user with an appropriate error code.
    Before returning to the user, let's trigger the connection
    so that it's ready for the next retry.

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • Fixes the warnings below:
    warning: symbol 'rds_send_probe' was not declared. Should it be static?
    warning: symbol 'rds_send_ping' was not declared. Should it be static?
    warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
    warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?

    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
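Two of the send-path rules from these commits can be sketched together. The 1M limit is from the text, while `SKETCH_MAX_MSG_SIZE`, the structure, and the -EMSGSIZE/-EAGAIN return codes are assumptions, and `send_sketch()` is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MSG_SIZE (1U << 20)	/* the 1M limit */

struct conn_state_sketch {
	bool up;
	int connect_requests;	/* connection attempts triggered */
};

/* Reject oversized payloads for every transport; when an RDMA mapping
 * cannot proceed because the connection is not up yet, start the
 * connect before returning the error so a retry can succeed. */
static int send_sketch(struct conn_state_sketch *c, size_t payload_len,
		       bool has_rdma)
{
	if (payload_len > SKETCH_MAX_MSG_SIZE)
		return -EMSGSIZE;
	if (has_rdma && !c->up) {
		c->connect_requests++;
		return -EAGAIN;
	}
	return 0;
}
```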
     

18 Nov, 2016

1 commit

  • The RDS transport has to be able to distinguish between
    two types of failure events:
    (a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
    (b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
    In case (a) both sides must reconnect and continue the RDS messaging
    without any message loss or disruption to the message sequence numbers,
    and this is achieved by rds_send_path_reset().

    In case (b) we should reset all rds_connection state to the
    new incarnation of the peer. Examples of state that needs to
    be reset are the next expected rx sequence number from, or messages to be
    retransmitted to, the new incarnation of the peer.

    To achieve this, the RDS handshake probe added as part of
    commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
    is enhanced so that sender and receiver of the RDS ping-probe
    will add a generation number as part of the RDS_EXTHDR_GEN_NUM
    extension header. Each peer stores local and remote generation
    numbers as part of each rds_connection. Changes in generation
    number will be detected via incoming handshake probe ping
    request or response and will allow the receiver to reset rds_connection
    state.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
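A simplified model of the generation-number check, assuming 0 means "no generation seen yet" and that resetting means restoring the sequence number and dropping queued retransmissions. The structure and `handle_peer_gen()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct gen_conn_sketch {
	uint32_t my_gen;	/* our incarnation, sent in probes */
	uint32_t peer_gen;	/* last incarnation seen from the peer */
	uint64_t next_rx_seq;	/* next expected sequence number */
	int retrans_queued;	/* messages waiting to be retransmitted */
};

/* Called for each handshake ping request or response carrying the
 * peer's generation number.  Returns true if state was reset. */
static bool handle_peer_gen(struct gen_conn_sketch *c, uint32_t gen)
{
	if (c->peer_gen == 0) {		/* first probe: just record it */
		c->peer_gen = gen;
		return false;
	}
	if (gen == c->peer_gen)		/* case (a): same incarnation */
		return false;
	/* case (b): the peer's RDS layer restarted */
	c->peer_gen = gen;
	c->next_rx_seq = 0;
	c->retrans_queued = 0;
	return true;
}
```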
     

16 Jul, 2016

1 commit

  • Use the RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. When
    multipath RDS is supported by the transport, hash outgoing traffic to
    one of the multiple paths in rds_sendmsg().

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
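A hypothetical path picker, assuming the bound port is used as the hash input and plain modulo spreads it over the negotiated paths. The kernel uses its own hash, so this only illustrates the shape of the decision.

```c
#include <stdint.h>

/* npaths == 0 means the probe-ping has not completed yet, and
 * npaths == 1 means no multipath; both map to the base path 0. */
static unsigned int pick_path(uint16_t bound_port, unsigned int npaths)
{
	if (npaths <= 1)
		return 0;
	return bound_port % npaths;
}
```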
     

02 Jul, 2016

1 commit

  • Refactor code to avoid separate indirections for single-path
    and multipath transports. All transports (both single and mp-capable)
    will get a pointer to the rds_conn_path, and can trivially derive
    the rds_connection from the ->cp_conn.

    Acked-by: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

15 Jun, 2016

1 commit