13 Oct, 2018

1 commit

  • commit f394ad28feffbeebab77c8bf9a203bd49b957c9a upstream.

    Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches()
    without passing along the gfp_t flag. But rds_ib_recv_alloc_caches()
    and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that
    rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the
    correct flag instead of calling alloc_percpu().
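
    The flag pass-through described above can be sketched in userspace C.
    This is only an illustration under stated assumptions: alloc_flags_t,
    ALLOC_ATOMIC, ALLOC_MAY_SLEEP, struct cache, alloc_cache() and
    alloc_caches() are hypothetical stand-ins, not the kernel's gfp_t API
    or the RDS functions themselves.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for gfp_t: may the allocation sleep or not? */
typedef enum { ALLOC_ATOMIC, ALLOC_MAY_SLEEP } alloc_flags_t;

struct cache {
    alloc_flags_t flags_used;   /* records which flag reached the allocator */
    void *storage;
};

/* Innermost allocator: uses the flag it is handed instead of assuming one. */
static int alloc_cache(struct cache *c, size_t size, alloc_flags_t flags)
{
    assert(c != NULL);
    c->storage = calloc(1, size);   /* a real allocator would honor flags */
    if (!c->storage)
        return -1;
    c->flags_used = flags;
    return 0;
}

/* Middle layer: forwards the caller's flag to every cache it sets up. */
static int alloc_caches(struct cache *inc, struct cache *frag,
                        alloc_flags_t flags)
{
    if (alloc_cache(inc, 64, flags) != 0)
        return -1;
    if (alloc_cache(frag, 64, flags) != 0) {
        free(inc->storage);         /* undo the first allocation on failure */
        inc->storage = NULL;
        return -1;
    }
    return 0;
}
```

    The point of the sketch is that no layer picks a flag on its own: the
    context known to the outermost caller decides how the innermost
    allocation behaves.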

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Cc: Håkon Bugge
    Signed-off-by: Greg Kroah-Hartman

    Ka-Cheong Poon
     

26 Sep, 2018

1 commit

  • [ Upstream commit cc4dfb7f70a344f24c1c71e298deea0771dadcb2 ]

    When an rds sock is bound, it is inserted into the bind_hash_table,
    which is protected by RCU. But when the rds sock is released, it is
    freed immediately after being removed from this hash table, without
    respecting the RCU grace period. This can cause a use-after-free,
    as reported by syzbot.

    Mark the rds sock with SOCK_RCU_FREE before inserting it into the
    bind_hash_table, so that it is always freed after an RCU grace
    period.

    The other problem is in rds_find_bound(), the rds sock could be
    freed in between rhashtable_lookup_fast() and rds_sock_addref(),
    so we need to extend RCU read lock protection in rds_find_bound()
    to close this race condition.

    Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
    Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Signed-off-by: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

15 Sep, 2018

1 commit

  • [ Upstream commit 5941923da29e84bc9e2a1abb2c14fffaf8d71e2f ]

    Fix a static code checker warning:
    net/rds/ib_frmr.c:82 rds_ib_alloc_frmr() warn: passing zero to 'ERR_PTR'

    The error path for ib_alloc_mr failure should set err to PTR_ERR.
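
    Why ERR_PTR(0) is a problem can be shown with a small userspace
    imitation of the kernel's error-pointer helpers. This is a sketch, not
    the RDS code: fake_alloc_mr(), alloc_frmr_buggy() and
    alloc_frmr_fixed() are hypothetical names, and the helpers below only
    mimic the behavior of ERR_PTR(), PTR_ERR() and IS_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that fails with an encoded errno. */
static void *fake_alloc_mr(int fail)
{
    static int mr;
    return fail ? ERR_PTR(-ENOMEM) : (void *)&mr;
}

/* Buggy shape: err is still 0 when the failure is reported. */
static void *alloc_frmr_buggy(int fail)
{
    long err = 0;
    void *mr = fake_alloc_mr(fail);

    if (IS_ERR(mr))
        return ERR_PTR(err);    /* ERR_PTR(0) is NULL, not an error pointer */
    return mr;
}

/* Fixed shape: take the error code from the failed pointer first. */
static void *alloc_frmr_fixed(int fail)
{
    long err = 0;
    void *mr = fake_alloc_mr(fail);

    if (IS_ERR(mr)) {
        err = PTR_ERR(mr);
        return ERR_PTR(err);
    }
    return mr;
}
```

    A caller that tests the result with IS_ERR() would treat the NULL
    returned by the buggy variant as success and then dereference it.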

    Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
    Signed-off-by: YueHaibing
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
     

22 Jul, 2018

1 commit

  • commit f1693c63ab133d16994cc50f773982b5905af264 upstream.

    For the loop transport, which is a self loopback, remote port
    congestion updates aren't relevant. In fact, the xmit path already
    ignores them. The receive path needs to do the same.

    Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Santosh Shilimkar
     

21 Jun, 2018

1 commit

  • [ Upstream commit 91a825290ca4eae88603bc811bf74a45f94a3f46 ]

    The function rds_ib_setup_qp is calling rds_ib_get_client_data and
    should correspondingly call rds_ib_dev_put. This call was lost in
    the non-error path with the introduction of error handling done in
    commit 3b12f73a5c29 ("rds: ib: add error handle")
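
    The get/put pairing at issue can be sketched as follows. This is a
    simplified userspace model, assuming a plain integer reference count;
    struct dev, dev_get(), dev_put() and setup_qp() are hypothetical names
    and do not reproduce the RDS functions.

```c
#include <assert.h>

struct dev { int refcount; };

static struct dev *dev_get(struct dev *d)
{
    d->refcount++;
    return d;
}

static void dev_put(struct dev *d)
{
    assert(d->refcount > 0);
    d->refcount--;
}

/*
 * Hypothetical setup routine: a single exit label drops the reference
 * taken at the top, on the success path as well as on every error path.
 */
static int setup_qp(struct dev *d, int fail_step)
{
    int ret = 0;
    struct dev *ref = dev_get(d);

    if (fail_step == 1) {
        ret = -1;
        goto out;
    }
    if (fail_step == 2) {
        ret = -2;
        goto out;
    }
    /* success falls through to the same cleanup */
out:
    dev_put(ref);
    return ret;
}
```

    Returning early on success, before the label, is exactly the kind of
    change that leaks one reference per call.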

    Signed-off-by: Dag Moxnes
    Reviewed-by: Håkon Bugge
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dag Moxnes
     

30 May, 2018

1 commit

  • [ Upstream commit 84eef2b2187ed73c0e4520cbfeb874e964a0b56a ]

    Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
    accept socket") has a reference counting issue in TCP socket creation
    when accepting a new connection. The code uses sock_create_lite() to
    create a kernel socket. But it does not do __module_get() on the
    socket owner. When the connection is shutdown and sock_release() is
    called to free the socket, the owner's reference count is decremented
    and becomes incorrect. Note that this bug only shows up when the socket
    owner is configured as a kernel module.

    v2: Update comments

    Fixes: 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket")
    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ka-Cheong Poon
     

19 May, 2018

1 commit

  • [ Upstream commit eb80ca476ec11f67a62691a93604b405ffc7d80c ]

    syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
    from rds_cmsg_recv().

    Simply clear the structure, since we have holes there, or since
    rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
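
    A minimal userspace sketch of the "clear before filling" idea, assuming
    a record with a padding hole and a partially used array. struct
    trace_so, TRACE_MAX and fill_trace() are hypothetical and only loosely
    modeled on the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MAX 3     /* hypothetical upper bound on trace slots */

/* Hypothetical record; most ABIs leave a hole before the 64-bit array. */
struct trace_so {
    uint8_t  rx_traces;
    uint8_t  rx_trace_pos[TRACE_MAX];
    uint64_t rx_trace[TRACE_MAX];
};

/* Fill only `used` slots; the memset leaves every other byte zero. */
static void fill_trace(struct trace_so *t, uint8_t used)
{
    assert(t != NULL && used <= TRACE_MAX);
    memset(t, 0, sizeof(*t));   /* zero the whole object before filling */
    t->rx_traces = used;
    for (uint8_t i = 0; i < used; i++) {
        t->rx_trace_pos[i] = i;
        t->rx_trace[i] = 1000u + i;
    }
}
```

    Without the memset, the unused slots would still hold whatever was on
    the stack when the whole structure is copied out.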

    BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
    BUG: KMSAN: uninit-value in put_cmsg+0x600/0x870 net/core/scm.c:242
    CPU: 0 PID: 4459 Comm: syz-executor582 Not tainted 4.16.0+ #87
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:53
    kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
    kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
    kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x600/0x870 net/core/scm.c:242
    rds_cmsg_recv net/rds/recv.c:570 [inline]
    rds_recvmsg+0x2db5/0x3170 net/rds/recv.c:657
    sock_recvmsg_nosec net/socket.c:803 [inline]
    sock_recvmsg+0x1d0/0x230 net/socket.c:810
    ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205
    __sys_recvmsg net/socket.c:2250 [inline]
    SYSC_recvmsg+0x298/0x3c0 net/socket.c:2262
    SyS_recvmsg+0x54/0x80 net/socket.c:2257
    do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Santosh Shilimkar
    Cc: linux-rdma
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

26 Apr, 2018

1 commit

  • [ Upstream commit 2c0aa08631b86a4678dbc93b9caa5248014b4458 ]

    Scenario:
    1. A port goes down and failover occurs
    2. An application issues the rds_bind syscall

    PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6"
    #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
    #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
    #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
    #3 [ffff898e35f15b60] no_context at ffffffff8104854c
    #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
    #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
    #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
    #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282
    RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88
    RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000
    R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0
    R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
    #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
    #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
    #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

    PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap"
    #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
    #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
    #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
    #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
    #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
    #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
    #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
    #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
    #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

    PID: 45659                                  PID: 47039
    rds_ib_laddr_check
      /* create id_priv with a null event_handler */
      rdma_create_id
      rdma_bind_addr
        cma_acquire_dev
          /* add id_priv to cma_dev->id_list */
          cma_attach_to_dev
                                                cma_ndev_work_handler
                                                  /* event_handler is null */
                                                  id_priv->id.event_handler

    Signed-off-by: Guanglei Li
    Signed-off-by: Honglei Wang
    Reviewed-by: Junxiao Bi
    Reviewed-by: Yanjun Zhu
    Reviewed-by: Leon Romanovsky
    Acked-by: Santosh Shilimkar
    Acked-by: Doug Ledford
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Guanglei Li
     

19 Apr, 2018

1 commit

  • [ Upstream commit a43cced9a348901f9015f4730b70b69e7c41a9c9 ]

    rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
    send a message. Suppose the RDS connection is not yet up. In
    rds_send_mprds_hash(), it does

            if (conn->c_npaths == 0)
                    wait_event_interruptible(conn->c_hs_waitq,
                                             (conn->c_npaths != 0));

    If the wait is interrupted before the connection is set up,
    rds_send_mprds_hash() will return a non-zero hash value. Hence
    rds_sendmsg() will use a non-zero c_path to send the message. But if
    the RDS connection ends up being non-MP capable, the message will be
    lost, as only the zero c_path can be used.
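
    The intended outcome can be modeled in a few lines of userspace C.
    This is a simplified sketch, not the kernel logic: pick_path() is a
    hypothetical helper, `interrupted` stands for the result of the wait,
    and `npaths` for the path count observed afterwards.

```c
#include <assert.h>

/*
 * Hypothetical path selection: fall back to path 0 whenever the
 * multipath handshake result is unknown or the peer has a single path.
 */
static int pick_path(int hash, int npaths, int interrupted)
{
    assert(hash >= 0);
    if (interrupted)        /* wait was cut short: capability unknown */
        return 0;
    if (npaths <= 1)        /* peer is not multipath capable */
        return 0;
    return hash % npaths;   /* safe to spread over the known paths */
}
```

    Path 0 is the conservative choice because it exists whether or not
    the peer turns out to support multiple paths.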

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ka-Cheong Poon
     

12 Apr, 2018

1 commit

  • [ Upstream commit 7ae0c649c47f1c5d2db8cee6dd75855970af1669 ]

    If the rds_sock is not added to the bind_hash_table, we must
    reset rs_bound_addr so that rds_remove_bound will not trip on
    this rds_sock.

    rds_add_bound() does a rds_sock_put() in this failure path, so
    failing to reset rs_bound_addr will result in a socket refcount
    bug, and will trigger a WARN_ON with the stack shown below when
    the application subsequently tries to close the PF_RDS socket.

    WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
    rds_sock_destruct+0x15/0x30 [rds]
    :
    __sk_destruct+0x21/0x190
    rds_remove_bound.part.13+0xb6/0x140 [rds]
    rds_release+0x71/0x120 [rds]
    sock_release+0x1a/0x70
    sock_close+0xe/0x20
    __fput+0xd5/0x210
    task_work_run+0x82/0xa0
    do_exit+0x2ce/0xb30
    ? syscall_trace_enter+0x1cc/0x2b0
    do_group_exit+0x39/0xa0
    SyS_exit_group+0x10/0x10
    do_syscall_64+0x61/0x1a0

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sowmini Varadhan
     

25 Feb, 2018

2 commits

  • commit f10b4cff98c6977668434fbf5dd58695eeca2897 upstream.

    The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
    to find the rds_connection entries marked for deletion as part
    of the netns deletion under the protection of the rds_tcp_conn_lock.
    Since the rds_tcp_conn_list tracks rds_tcp_connections (which
    have a 1:1 mapping with rds_conn_path), multiple tc entries in
    the rds_tcp_conn_list will map to a single rds_connection, and will
    be deleted as part of the rds_conn_destroy() operation that is
    done outside the rds_tcp_conn_lock.

    The rds_tcp_conn_list traversal done under the protection of
    rds_tcp_conn_lock should not leave any doomed tc entries in
    the list after the rds_tcp_conn_lock is released, else another
    concurrently executing netns delete (for a different netns) thread
    may trip on these entries.

    Reported-by: syzbot
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sowmini Varadhan
     
  • commit 681648e67d43cf269c5590ecf021ed481f4551fc upstream.

    Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
    introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
    (and thus rds_tcp_dev_event notification) is only called from put_net()
    when all netns refcounts go to 0, but this cannot happen if the
    rds_connection itself is holding a c_net ref that it expects to
    release in rds_tcp_kill_sock.

    Instead, the rds_tcp_kill_sock callback should make sure to
    tear down state carefully, ensuring that the socket teardown
    is only done after all data-structures and workqs that depend
    on it are quiesced.

    The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
    refcounts on struct net") was to resolve a race condition reported by
    syzkaller where workqs for tx/rx/connect were triggered after the
    namespace was deleted. Those worker threads should have been
    cancelled/flushed before socket tear-down and indeed,
    rds_conn_path_destroy() does try to sequence this by doing
        /* cancel cp_send_w */
        /* cancel cp_recv_w */
        /* flush cp_down_w */
        /* free data structures */
    Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
    invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
    we ought to have satisfied the requirement that "socket-close is
    done after all other dependent state is quiesced". However,
    rds_conn_shutdown has a bug in that it *always* triggers the reconnect
    workq (and if connection is successful, we always restart tx/rx
    workqs so with the right timing, we risk the race conditions reported
    by syzkaller).

    Netns deletion is like module teardown: there is no need to restart
    a reconnect in this case. We can use the c_destroy_in_prog bit
    to avoid restarting the reconnect.
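
    The guard can be sketched like this. It is a toy model under the
    assumption that a single flag marks a connection as being destroyed;
    struct conn_model and conn_shutdown() are hypothetical names, and the
    counter stands in for queueing reconnect work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn_model {
    bool destroy_in_prog;   /* set once teardown has started */
    int reconnects_queued;  /* stands in for queued reconnect work */
};

/* Hypothetical shutdown: only a live connection schedules a reconnect. */
static void conn_shutdown(struct conn_model *c)
{
    assert(c != NULL);
    /* ... transport-specific shutdown would happen here ... */
    if (!c->destroy_in_prog)
        c->reconnects_queued++;
}
```

    With the guard in place, nothing is queued that could restart the
    tx/rx workers after the socket has been closed during teardown.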

    Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sowmini Varadhan
     

17 Jan, 2018

2 commits

  • [ Upstream commit 7d11f77f84b27cef452cee332f4e469503084737 ]

    Set rm->atomic.op_active to 0 when rds_pin_pages() fails
    or the user-supplied address is invalid.
    This prevents a NULL pointer dereference in rds_atomic_free_op().

    Signed-off-by: Mohamed Ghannam
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mohamed Ghannam
     
  • [ Upstream commit c095508770aebf1b9218e77026e48345d719b17c ]

    When args->nr_local is 0, nr_pages also becomes 0 because of the size
    calculation in rds_rm_size(). That value is later used to allocate
    pages for DMA, so this bug produces a heap out-of-bounds write
    to a specific memory region.
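
    A userspace sketch of rejecting the empty case before any size is
    computed. total_pages() and PAGE_SZ are hypothetical, and the errno
    chosen here is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SZ 4096u   /* assumed page size for the example */

/* Hypothetical size helper: total pages spanned by the caller's buffers. */
static long total_pages(const size_t *lens, size_t nr_local)
{
    long pages = 0;

    if (nr_local == 0)      /* nothing to map: reject instead of sizing to 0 */
        return -EINVAL;
    assert(lens != NULL);
    for (size_t i = 0; i < nr_local; i++) {
        if (lens[i] == 0)
            return -EINVAL;
        pages += (long)((lens[i] + PAGE_SZ - 1) / PAGE_SZ); /* round up */
    }
    return pages;
}
```

    Returning an error for a zero count keeps later code from allocating a
    zero-sized array and then writing into it.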

    Signed-off-by: Mohamed Ghannam
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mohamed Ghannam
     

03 Jan, 2018

1 commit

  • [ Upstream commit 14e138a86f6347c6199f610576d2e11c03bec5f0 ]

    RDS currently doesn't check whether the length of the control message
    is large enough to hold the required data before dereferencing the
    control message data. This results in the following crash:

    BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
    [inline]
    BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
    net/rds/send.c:1066
    Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157

    CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_address_description+0x73/0x250 mm/kasan/report.c:252
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x25b/0x340 mm/kasan/report.c:409
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
    rds_rdma_bytes net/rds/send.c:1013 [inline]
    rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
    sock_sendmsg_nosec net/socket.c:628 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:638
    ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
    __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
    SYSC_sendmmsg net/socket.c:2139 [inline]
    SyS_sendmmsg+0x35/0x60 net/socket.c:2134
    entry_SYSCALL_64_fastpath+0x1f/0x96
    RIP: 0033:0x43fe49
    RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
    RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
    R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000

    To fix this, we verify that the cmsg_len is large enough to hold the
    data to be read, before proceeding further.
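
    The length check can be illustrated with the standard control-message
    macros from <sys/socket.h>. This is a userspace sketch: struct
    demo_args and read_demo_args() are hypothetical, and the payload
    layout is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical payload carried in the control message. */
struct demo_args {
    uint64_t remote_addr;
    uint64_t nr_bytes;
};

/* Copy the payload out only if the control message is long enough. */
static int read_demo_args(struct cmsghdr *cmsg, struct demo_args *out)
{
    assert(cmsg != NULL && out != NULL);
    if (cmsg->cmsg_len < CMSG_LEN(sizeof(*out)))
        return -EINVAL;     /* too short: do not touch the data */
    memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
    return 0;
}
```

    CMSG_LEN() accounts for the header plus the payload, so comparing
    cmsg_len against it covers the whole read that follows.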

    Reported-by: syzbot
    Signed-off-by: Avinash Repaka
    Acked-by: Santosh Shilimkar
    Reviewed-by: Yuval Shaia
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Avinash Repaka
     

17 Dec, 2017

1 commit

  • [ Upstream commit f3069c6d33f6ae63a1668737bc78aaaa51bff7ca ]

    This is a fix for syzkaller719569, where memory registration was
    attempted without any underlying transport being loaded.

    Analysis of the case reveals that it is the setsockopt() RDS_GET_MR
    (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable.

    Here is an example stack trace when the bug is hit:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
    IP: __rds_rdma_map+0x36/0x440 [rds]
    PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4
    dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp
    coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
    ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
    iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd
    shpchp pcspkr i2c_i801 ioatdma ipmi_msghandler wmi lpc_ich mfd_core
    auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
    mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect
    sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm
    crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror
    dm_region_hash dm_log dm_mod
    CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2
    Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
    task: ffff882f9190db00 task.stack: ffffc9002b994000
    RIP: 0010:__rds_rdma_map+0x36/0x440 [rds]
    RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580
    RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002
    R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580
    R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc
    FS: 00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0
    Call Trace:
    rds_get_mr+0x56/0x80 [rds]
    rds_setsockopt+0x172/0x340 [rds]
    ? __fget_light+0x25/0x60
    ? __fdget+0x13/0x20
    SyS_setsockopt+0x80/0xe0
    do_syscall_64+0x67/0x1b0
    entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fbff9b117f9
    RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9
    RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b
    RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860
    R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000
    R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021

    Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48
    89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00
    83 b8 c0 00 00 00 00 0f 84 25 03 00 00 48 8b 06 48 8b 56 08

    The fix is to check the existence of an underlying transport in
    __rds_rdma_map().
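
    A sketch of such an existence check in userspace C. The structures,
    rdma_map() and demo_get_mr() are hypothetical, and the errno values
    are assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport with an optional memory-registration hook. */
struct transport {
    int (*get_mr)(int nents);
};

struct sock_model {
    const struct transport *trans;  /* NULL until a transport is attached */
};

static int demo_get_mr(int nents)
{
    return nents;   /* pretend every entry was registered */
}

/* Check that a transport exists before calling through it. */
static int rdma_map(const struct sock_model *rs, int nents)
{
    assert(rs != NULL);
    if (rs->trans == NULL)          /* no transport attached yet */
        return -ENOTCONN;
    if (rs->trans->get_mr == NULL)  /* transport cannot register memory */
        return -EOPNOTSUPP;
    return rs->trans->get_mr(nents);
}
```

    Both checks happen before the first dereference, so an unbound socket
    gets an error code instead of a NULL pointer dereference.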

    Signed-off-by: Håkon Bugge
    Reported-by: syzbot
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Håkon Bugge
     

10 Nov, 2017

1 commit

  • rds_ib_recv_refill() is a function that refills an IB receive
    queue. It can be called from both the CQE handler (tasklet) and a
    worker thread.

    Just after the call to ib_post_recv(), a debug message is printed with
    rdsdebug():

            ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
            rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
                     recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                     (long) ib_sg_dma_address(
                            ic->i_cm_id->device,
                            &recv->r_frag->f_sg),
                     ret);

    Now consider an invocation of rds_ib_recv_refill() from the worker
    thread, which is preemptible. Further, assume that the worker thread
    is preempted between the ib_post_recv() and rdsdebug() statements.

    Then, if the preemption is due to a receive CQE event, the
    rds_ib_recv_cqe_handler() will be invoked. This function processes
    receive completions, including freeing up data structures, such as the
    recv->r_frag.

    In this scenario, rds_ib_recv_cqe_handler() will process the receive
    WR posted above. That implies that recv->r_frag has been freed
    before the above rdsdebug() statement has been executed. When it is
    later executed, we will have a NULL pointer dereference:

    [ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    [ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
    [ 4088.082686] PGD 0 P4D 0
    [ 4088.085515] Oops: 0000 [#1] SMP
    [ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
    [ 4088.168486] fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
    [ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G OE 4.14.0-rc7.master.20171105.ol7.x86_64 #1
    [ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
    [ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
    [ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
    [ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
    [ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
    [ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
    [ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
    [ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
    [ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
    [ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
    [ 4088.280397] FS: 0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
    [ 4088.289434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
    [ 4088.303816] Call Trace:
    [ 4088.306557] rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
    [ 4088.312982] ? __dynamic_pr_debug+0x8c/0xb0
    [ 4088.317664] ? __queue_work+0x142/0x3c0
    [ 4088.321944] rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
    [ 4088.328370] cma_ib_handler+0xcd/0x280 [rdma_cm]
    [ 4088.333522] cm_process_work+0x25/0x120 [ib_cm]
    [ 4088.338580] cm_work_handler+0xd6b/0x17aa [ib_cm]
    [ 4088.343832] process_one_work+0x149/0x360
    [ 4088.348307] worker_thread+0x4d/0x3e0
    [ 4088.352397] kthread+0x109/0x140
    [ 4088.355996] ? rescuer_thread+0x380/0x380
    [ 4088.360467] ? kthread_park+0x60/0x60
    [ 4088.364563] ret_from_fork+0x25/0x30
    [ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
    [ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
    [ 4088.397772] CR2: 0000000000000020
    [ 4088.401505] ---[ end trace fe922e6ccf004431 ]---

    This bug was provoked by compiling rds out-of-tree with
    EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
    between the ib_post_recv() and rdsdebug() statements:

            /* XXX when can this fail? */
            ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
    +       if (can_wait)
    +               usleep_range(1000, 5000);
            rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
                     recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                     (long) ib_sg_dma_address(

    The fix is simply to move the rdsdebug() statement up before the
    ib_post_recv() and remove the printing of ret, which is taken care of
    anyway by the non-debug code.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Reviewed-by: Wei Lin Guay
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

03 Nov, 2017

1 commit

  • …el/git/gregkh/driver-core

    Pull initial SPDX identifiers from Greg KH:
    "License cleanup: add SPDX license identifiers to some files

    Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the
    'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
    binding shorthand, which can be used instead of the full boiler plate
    text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart
    and Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset
    of the use cases:

    - file had no licensing information in it.

    - file was a */uapi/* one with no licensing information in it,

    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to
    license had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied
    to a file was done in a spreadsheet of side by side results from
    the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files created by Philippe Ombredanne.
    Philippe prepared the base worksheet, and did an initial spot review
    of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537
    files assessed. Kate Stewart did a file by file comparison of the
    scanner results in the spreadsheet to determine which SPDX license
    identifier(s) to be applied to the file. She confirmed any
    determination that was not immediately clear with lawyers working with
    the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:

    - Files considered eligible had to be source code files.

    - Make and config files were included as candidates if they contained
    >5 lines of source

    - File already had some variant of a license header in it (even if <5
    lines).

    All documentation files were explicitly excluded.

    The following heuristics were used to determine which SPDX license
    identifiers to apply.

    - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.

    For non */uapi/* files that summary was:

    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0                                              11139

    and resulted in the first patch in this series.

    If that file was a */uapi/* path one, it was "GPL-2.0 WITH
    Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
    was:

    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                        930

    and resulted in the second patch in this series.

    - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point). Results summary:

    SPDX license identifier                            # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                        270
    GPL-2.0+ WITH Linux-syscall-note                       169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
    LGPL-2.1+ WITH Linux-syscall-note                       15
    GPL-1.0+ WITH Linux-syscall-note                        14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
    LGPL-2.0+ WITH Linux-syscall-note                        4
    LGPL-2.1 WITH Linux-syscall-note                         3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1

    and that resulted in the third patch in this series.

    - when the two scanners agreed on the detected license(s), that
    became the concluded license(s).

    - when there was disagreement between the two scanners (one detected
    a license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.

    - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply
    (and which scanner probably needed to revisit its heuristics).

    - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.

    - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.

    In total, over 70 hours of logged manual review was done on the
    spreadsheet to determine the SPDX license identifiers to apply to the
    source files by Kate, Philippe, Thomas and, in some cases,
    confirmation by lawyers working with the Linux Foundation.

    Kate also obtained a third independent scan of the 4.13 code base from
    FOSSology, and compared selected files where the other two scanners
    disagreed against that SPDX file, to see if there were new insights.
    The Windriver scanner is based on an older version of FOSSology in
    part, so they are related.

    Thomas did random spot checks in about 500 files from the spreadsheets
    for the uapi headers and agreed with the SPDX license identifier in the
    files he inspected. For the non-uapi files Thomas did random spot
    checks in about 15000 files.

    In the initial set of patches against 4.14-rc6, 3 files were found to have
    copy/paste license identifier errors, and have been fixed to reflect
    the correct identifier.

    Additionally Philippe spent 10 hours this week doing a detailed manual
    inspection and review of the 12,461 patched files from the initial
    patch version early this week with:

    - a full scancode scan run, collecting the matched texts, detected
    license ids and scores

    - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct

    - reviewing anything where there was no detection but the patch
    license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
    applied SPDX license was correct

    This produced a worksheet with 20 files needing minor correction. This
    worksheet was then exported into 3 different .csv files for the
    different types of files to be modified.

    These .csv files were then reviewed by Greg. Thomas wrote a script to
    parse the csv files and add the proper SPDX tag to the file, in the
    format that the file expected. This script was further refined by Greg
    based on the output to detect more types of files automatically and to
    distinguish between header and source .c files (which need different
    comment types.) Finally Greg ran the script using the .csv files to
    generate the patches.
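
    The script itself is not shown in this log. As a rough illustration
    of the idea only, the sketch below assumes a two-column CSV
    (path, license identifier) and operates on in-memory file contents;
    the function names and the CSV layout are hypothetical. The comment
    styles follow the kernel convention of "//" for .c files and
    "/* */" for headers; the "#" fallback for other file types is an
    assumption of this sketch.

```python
import csv
import io
from typing import Dict

TAG = "SPDX-License-Identifier:"


def spdx_line(path: str, license_id: str) -> str:
    """Format the tag using the comment style the file type expects."""
    if path.endswith(".h"):
        return f"/* {TAG} {license_id} */"   # header files
    if path.endswith(".c"):
        return f"// {TAG} {license_id}"      # C source files
    return f"# {TAG} {license_id}"           # assumed: Makefiles, scripts


def add_spdx_tag(source: str, path: str, license_id: str) -> str:
    """Return source with the tag prepended, unless one is already present."""
    if TAG in source:
        return source
    tag = spdx_line(path, license_id) + "\n"
    lines = source.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # Keep a shebang on the first line and put the tag right after it.
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + tag + "".join(lines[1:])
    return tag + source


def tag_from_csv(csv_text: str, files: Dict[str, str]) -> Dict[str, str]:
    """Apply every (path, license) row of the CSV to the given file contents."""
    tagged = dict(files)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        path, license_id = row
        if path in tagged:
            tagged[path] = add_spdx_tag(tagged[path], path, license_id)
    return tagged
```

    Working on a dictionary of file contents rather than on the tree
    keeps the sketch side-effect free; a real run would read and write
    the files and turn the result into patches.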

    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

    * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    License cleanup: add SPDX license identifier to uapi header files with a license
    License cleanup: add SPDX license identifier to uapi header files with no license
    License cleanup: add SPDX GPL-2.0 license identifier to files with no license

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

26 Oct, 2017

2 commits

  • The number of unsignaled work-requests posted to the IB send queue is
    tracked by a counter in the rds_ib_connection struct. When it reaches
    zero, or the caller explicitly asks for it, the send-signaled bit is
    set in send_flags and the counter is reset. This is performed by the
    rds_ib_set_wr_signal_state() function.

    However, this function is not always used, which yields inaccurate
    accounting. This commit fixes that, refactors some code bloat related
    to the matter, and makes the type of the actual parameter passed to
    the function consistent.
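
    The accounting described above can be modelled in a few lines. This
    is a user-space sketch, not the kernel code: the class names, the
    reset value MAX_UNSIGNALED_WRS, and the numeric value of
    IB_SEND_SIGNALED are assumptions made for illustration.

```python
from dataclasses import dataclass

IB_SEND_SIGNALED = 1 << 1   # illustrative flag value for this sketch
MAX_UNSIGNALED_WRS = 16     # hypothetical value the counter is reset to


@dataclass
class Connection:
    # Number of work requests that may still be posted unsignaled.
    unsignaled_wrs: int = MAX_UNSIGNALED_WRS


@dataclass
class SendWork:
    # Must be initialized before set_wr_signal_state() is called,
    # because the function only ORs a bit into it.
    send_flags: int = 0


def set_wr_signal_state(conn: Connection, send: SendWork, notify: bool) -> bool:
    """Mark the work request as signaled when the budget is used up
    or the caller asks for it; return True if it was marked."""
    if conn.unsignaled_wrs == 0 or notify:
        conn.unsignaled_wrs = MAX_UNSIGNALED_WRS
        send.send_flags |= IB_SEND_SIGNALED
        return True
    conn.unsignaled_wrs -= 1
    return False
```

    The point of routing every post through one helper is that the
    counter and the flag can never drift apart; any path that sets the
    flag by hand would leave the counter stale.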

    Signed-off-by: Håkon Bugge
    Signed-off-by: David S. Miller

    Håkon Bugge
     
  • send_flags needs to be initialized before calling
    rds_ib_set_wr_signal_state().

    Signed-off-by: Håkon Bugge
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

08 Sep, 2017

1 commit

  • In rds_send_xmit() there is logic to batch the sends. However, if
    another thread has acquired the lock and has incremented the send_gen,
    it is considered a race and we yield. The code incrementing the
    s_send_lock_queue_raced statistics counter did not count this event
    correctly.

    This commit counts the race condition correctly.

    Changes from v1:
    - Removed check for *someone_on_xmit()*
    - Fixed incorrect indentation

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

06 Sep, 2017

1 commit

  • The bits in m_flags in struct rds_message are used for several
    purposes, and from different contexts. To avoid any missing updates to
    m_flags, use the atomic set_bit() instead of the non-atomic equivalent.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Reviewed-by: Wei Lin Guay
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

29 Aug, 2017

1 commit


10 Aug, 2017

1 commit

  • The UDP offload conflict is dealt with by simply taking what is
    in net-next where we have removed all of the UFO handling code
    entirely.

    The TCP conflict was a case of local variables in a function
    being removed from both net and net-next.

    In netvsc we had an assignment right next to where a missing
    set of u64 stats sync object inits were added.

    Signed-off-by: David S. Miller

    David S. Miller
     

09 Aug, 2017

1 commit

  • In commit 7e3f2952eeb1 ("rds: don't let RDS shutdown a connection
    while senders are present"), refilling the receive queue was removed
    from rds_ib_recv(), along with the increment of
    s_ib_rx_refill_from_thread.

    Commit 73ce4317bf98 ("RDS: make sure we post recv buffers")
    re-introduces filling the receive queue from rds_ib_recv(), but does
    not add the statistics counter. rds_ib_recv() was later renamed to
    rds_ib_recv_path().

    This commit reintroduces the statistics counting of
    s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.

    Signed-off-by: Håkon Bugge
    Reviewed-by: Knut Omang
    Reviewed-by: Wei Lin Guay
    Reviewed-by: Shamir Rabinovitch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Håkon Bugge
     

04 Aug, 2017

1 commit

  • RDS over IB does not use multipath RDS, so the array
    of additional rds_conn_path structures is always superfluous
    in this case. Reduce the memory footprint of the rds module
    by making this a dynamic allocation predicated on whether
    the transport is mp_capable.

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Tested-by: Efrain Galaviz
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

21 Jul, 2017

2 commits


17 Jul, 2017

1 commit

  • We could end up executing rds_conn_shutdown before the rds_recv_worker
    thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
    sock_release and set sock->sk to null, which may interleave in bad
    ways with rds_recv_worker, e.g., it could result in:

    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
    [ffff881769f6fd70] release_sock at ffffffff815f337b
    [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
    [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
    [ffff881769f6fde0] process_one_work at ffffffff810a14c1
    [ffff881769f6fe40] worker_thread at ffffffff810a1940
    [ffff881769f6fec0] kthread at ffffffff810a6b1e

    Also, do not enqueue any new shutdown workq items when the connection is
    shutting down (this may happen for rds-tcp in softirq mode, if a FIN
    or CLOSE is received while the module is in the middle of an unload).

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

08 Jul, 2017

1 commit

  • There are two problems with calling sock_create_kern() from
    rds_tcp_accept_one():
    1. It sets up a new_sock->sk that is wasteful, because this ->sk
       is going to get replaced by inet_accept() in the subsequent ->accept().
    2. The new_sock->sk is a leaked reference in sock_graft(), which
       expects to find a null parent->sk.

    Avoid these problems by calling sock_create_lite().

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

05 Jul, 2017

4 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.
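
    To see why overflow matters, compare a plain wrapping 32-bit counter
    with a saturating one. The sketch below is a simplified model with
    hypothetical function names; it does not reproduce the kernel's
    refcount_t implementation, whose exact saturation behaviour depends
    on the kernel version.

```python
from typing import Tuple

UINT32_MAX = 0xFFFFFFFF


def wrapping_inc(count: int) -> int:
    """Plain 32-bit counter: incrementing past the maximum wraps to 0."""
    return (count + 1) & UINT32_MAX


def saturating_inc(count: int) -> int:
    """Saturating counter: once at the maximum, it stays there."""
    return count if count == UINT32_MAX else count + 1


def saturating_dec_and_test(count: int) -> Tuple[int, bool]:
    """Return (new_count, should_free).

    A saturated counter is never decremented, so the object is leaked
    rather than freed while references may still exist.
    """
    if count == UINT32_MAX:
        return count, False
    if count == 0:
        raise ValueError("refcount underflow")
    return count - 1, count - 1 == 0
```

    With the wrapping counter, enough increments bring the count back to
    a small value, so a later put can free an object that is still in
    use. The saturating variant trades that use-after-free for a leak.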

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

01 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

22 Jun, 2017

2 commits

  • If we are unloading the rds_tcp module, we can set linger to 1
    and drop pending packets to accelerate reconnect. The peer will
    end up resetting the connection based on new generation numbers
    of the new incarnation, so hanging on to unsent TCP packets via
    linger is mostly pointless in this case.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • The RDS handshake ping probe added by commit 5916e2c1554f
    ("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
    before the first data packet is sent to a peer. If the conversation
    is not bidirectional (i.e., one side is always passive and never
    invokes rds_sendmsg()) and the passive side restarts its rds_tcp
    module, a new HS ping probe needs to be sent, so that the number
    of paths can be re-established.

    This patch achieves that by sending a HS ping probe from
    rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
    a handshake probe with this peer yet).

    Signed-off-by: Sowmini Varadhan
    Tested-by: Jenny Xu
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

17 Jun, 2017

2 commits

  • Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP
    layer accepts the connection and then the rds_tcp_accept_one()
    callback is invoked to process the incoming connection.

    rds_tcp_accept_one() may reject the incoming SYN for a number of
    reasons, e.g., commit 1a0e100fb2c9 ("RDS: TCP: Force every connection
    to be initiated by numerically smaller IP address"), or because
    we are getting spammed by a malicious node that is triggering
    a flood of connection attempts to RDS_TCP_PORT. If the incoming
    SYN is rejected, no data would have been sent on the TCP socket,
    and we do not need to be in TIME_WAIT state, so we set linger on
    the TCP socket before closing, thereby closing the socket efficiently
    with a RST.
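
    The same effect can be demonstrated from user space with the
    SO_LINGER socket option. This is an analogue of the kernel change,
    not the patch itself: the helper names are hypothetical, and the
    struct layout (two C ints) is the one used on Linux.

```python
import socket
import struct


def enable_abortive_close(sock: socket.socket) -> None:
    """Turn linger on with a zero timeout.

    With l_onoff=1 and l_linger=0, close() discards unsent data and
    resets the connection instead of going through the normal FIN
    handshake, so the socket does not end up in TIME_WAIT.
    """
    # struct linger { int l_onoff; int l_linger; } as laid out on Linux.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)


def reject_connection(sock: socket.socket) -> None:
    """Close an unwanted connection abortively."""
    enable_abortive_close(sock)
    sock.close()
```

    A server that applies reject_connection() to a freshly accepted
    socket makes the peer see a connection reset, which is the behaviour
    the commit aims for when a connection attempt is turned away.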

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Found when testing between sparc and x86 machines on different
    subnets, so the address comparison patterns hit the corner cases and
    brought out some bugs fixed by this patch.

    Signed-off-by: Sowmini Varadhan
    Tested-by: Imanti Mendez
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan