27 Sep, 2019

1 commit

  • In rds_bind(), laddr_check is called without first checking whether
    it is NULL. In addition, rs_transport should be reset if
    rds_add_bound() fails.

    Fixes: c5c1a030a7db ("net/rds: An rds_sock is added too early to the hash table")
    Reported-by: syzbot+fae39afd2101a17ec624@syzkaller.appspotmail.com
    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
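
A minimal userspace sketch of the two points in this commit, assuming simplified stand-in types: guard an optional callback before calling through it, and undo the transport assignment when the later insertion step fails. The names `struct transport`, `struct sock_state`, `add_bound`, `accept_any`, and `bind_sketch`, and the choice of error codes, are hypothetical and are not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct transport {
    int (*laddr_check)(unsigned int addr);  /* optional, may be NULL */
};

struct sock_state {
    const struct transport *transport;      /* plays the role of rs_transport */
    int bound;
};

/* Example callback that accepts every address. */
static int accept_any(unsigned int addr)
{
    (void)addr;
    return 0;
}

/* Hypothetical stand-in for rds_add_bound(); fails on request. */
static int add_bound(struct sock_state *s, int force_fail)
{
    if (force_fail)
        return -EADDRINUSE;
    s->bound = 1;
    return 0;
}

static int bind_sketch(struct sock_state *s, const struct transport *t,
                       unsigned int addr, int force_fail)
{
    int ret;

    /* Check the callback pointer before calling through it. */
    if (!t || !t->laddr_check || t->laddr_check(addr) != 0)
        return -ENOPROTOOPT;

    s->transport = t;
    ret = add_bound(s, force_fail);
    if (ret)
        s->transport = NULL;  /* undo the assignment on failure */
    return ret;
}
```

With this shape, a failed bind leaves the socket state as it was before the call, so a later bind attempt starts from a clean slate.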
     

15 Sep, 2019

1 commit


11 Sep, 2019

1 commit

  • In rds_bind(), an rds_sock is added to the RDS bind hash table before
    rs_transport is set. This means that the socket can be found by the
    receive code path while rs_transport is still NULL. Because the
    receive code path dereferences rs_transport for the congestion update
    check, this can cause a panic. An rds_sock should not be added to the
    bind hash table before all the needed fields are set.

    Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
    Signed-off-by: Ka-Cheong Poon
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

05 Sep, 2019

1 commit

  • IN_MULTICAST's primary intent is as a uapi macro.

    Elsewhere in the kernel we use ipv4_is_multicast consistently.

    This patch unifies linux's multicast checks to use that function
    rather than this macro.

    Signed-off-by: Dave Taht
    Reviewed-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Dave Taht
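
For reference, the check performed by the in-kernel helper can be sketched in userspace as below. This is an illustration only: `is_multicast_be` is a hypothetical name, and it takes the address in network byte order, whereas the uapi macro IN_MULTICAST expects a host-order value.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when addr (network byte order) lies in 224.0.0.0/4.
 * Hypothetical userspace equivalent of a multicast range check.
 */
static bool is_multicast_be(uint32_t addr)
{
    return (addr & htonl(0xf0000000u)) == htonl(0xe0000000u);
}
```

Taking the address in network byte order avoids a byte swap at each call site that already holds a wire-format address.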
     

13 Apr, 2019

1 commit

  • syzbot is reporting an uninitialized value in rds_connect() [1] and
    rds_bind() [2]. This is because syzbot passes ulen == 0, whereas
    these functions assume that it is safe to access the sockaddr->family
    field in order to determine the minimal address length for validation.

    [1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
    [2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38f

    Reported-by: syzbot
    Reported-by: syzbot
    Signed-off-by: Tetsuo Handa
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Tetsuo Handa
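
The validation order this commit describes can be sketched as follows: check that the supplied length covers the family field before reading it, then check the family-specific minimum. This is a userspace approximation under stated assumptions; `check_addr_len` is a hypothetical helper and the returned error codes are illustrative.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Validate addr_len before touching any field of uaddr. */
static int check_addr_len(const struct sockaddr *uaddr, size_t addr_len)
{
    /* Too short to contain sa_family: do not read it at all. */
    if (addr_len < offsetof(struct sockaddr, sa_family) +
                   sizeof(uaddr->sa_family))
        return -EINVAL;

    switch (uaddr->sa_family) {
    case AF_INET:
        return addr_len < sizeof(struct sockaddr_in) ? -EINVAL : 0;
    case AF_INET6:
        return addr_len < sizeof(struct sockaddr_in6) ? -EINVAL : 0;
    default:
        return -EAFNOSUPPORT;
    }
}
```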
     

01 Feb, 2019

1 commit

  • syzbot was able to catch a bug in rds [1]

    The issue here is that the socket might be found in a hash table
    but that its refcount has already been set to 0 by another cpu.

    We need to use refcount_inc_not_zero() to be safe here.

    [1]

    refcount_t: increment on 0; use-after-free.
    WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
    WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
    panic+0x2cb/0x65c kernel/panic.c:214
    __warn.cold+0x20/0x48 kernel/panic.c:571
    report_bug+0x263/0x2b0 lib/bug.c:186
    fixup_bug arch/x86/kernel/traps.c:178 [inline]
    fixup_bug arch/x86/kernel/traps.c:173 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
    do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
    RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
    RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
    Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
    RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
    RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
    RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
    R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
    R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
    sock_hold include/net/sock.h:647 [inline]
    rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
    rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
    rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
    rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
    rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
    rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xdd/0x130 net/socket.c:631
    __sys_sendto+0x387/0x5f0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto net/socket.c:1796 [inline]
    __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
    do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x458089
    Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
    RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
    RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
    R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

    Fixes: cc4dfb7f70a3 ("rds: fix two RCU related problems")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Cc: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Eric Dumazet
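
The idea behind an increment-unless-zero operation can be sketched with C11 atomics. This is an illustration of the technique, not the kernel's refcount_inc_not_zero() implementation; `ref_inc_not_zero` is a hypothetical name and the sketch omits the overflow saturation that a hardened refcount type would add.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the object is already on its way to being freed.
 */
static bool ref_inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup path that finds an object in a shared table would call this and treat a false return as "not found", rather than unconditionally incrementing a count that may already have dropped to zero.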
     

12 Sep, 2018

1 commit

  • When a rds sock is bound, it is inserted into the bind_hash_table
    which is protected by RCU. But when releasing rds sock, after it
    is removed from this hash table, it is freed immediately without
    respecting RCU grace period. This could cause some use-after-free
    as reported by syzbot.

    Mark the rds sock with SOCK_RCU_FREE before inserting it into the
    bind_hash_table, so that it would be always freed after a RCU grace
    period.

    The other problem is in rds_find_bound(), the rds sock could be
    freed in between rhashtable_lookup_fast() and rds_sock_addref(),
    so we need to extend RCU read lock protection in rds_find_bound()
    to close this race condition.

    Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
    Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Signed-off-by: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Cong Wang
     

02 Aug, 2018

1 commit


24 Jul, 2018

2 commits

  • This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
    listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
    connection requests. RDS/RDMA/IB uses a private data (struct
    rds_ib_connect_private) exchange between endpoints at RDS connection
    establishment time to support RDMA. This private data exchange uses a
    32 bit integer to represent an IP address. This needs to be changed in
    order to support IPv6. A new private data struct
    rds6_ib_connect_private is introduced to handle this. To ensure
    backward compatibility, an IPv6 capable RDS stack uses another RDMA
    listener port (RDS_CM_PORT) to accept IPv6 connections, and it
    continues to use the original RDS_PORT for IPv4 RDS connections. When
    it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
    send the connection setup request.

    v5: Fixed syntax problem (David Miller).

    v4: Changed port history comments in rds.h (Sowmini Varadhan).

    v3: Added support to set up IPv4 connection using mapped address
    (David Miller).
    Added support to set up connection between link local and non-link
    addresses.
    Various review comments from Santosh Shilimkar and Sowmini Varadhan.

    v2: Fixed bound and peer address scope mismatched issue.
    Added back rds_connect() IPv6 changes.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     
  • This patch changes the internal representation of an IP address to use
    struct in6_addr. An IPv4 address is stored as an IPv4-mapped address.
    All the functions which take an IP address as an argument are also
    changed to use struct in6_addr. The RDS socket layer is not modified,
    however, so it still does not accept an IPv6 address from an
    application, and the RDS layer neither accepts nor initiates IPv6
    connections.

    v2: Fixed sparse warnings.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
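
To illustrate what storing an IPv4 address as an IPv4-mapped address means, here is a small userspace sketch that builds the ::ffff:a.b.c.d form in a struct in6_addr. The helper name `v4_to_mapped` is hypothetical; the kernel has its own helper for this conversion.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build the IPv4-mapped IPv6 address ::ffff:a.b.c.d from an IPv4 address. */
static struct in6_addr v4_to_mapped(struct in_addr v4)
{
    struct in6_addr out;

    memset(&out, 0, sizeof(out));         /* first 80 bits are zero */
    out.s6_addr[10] = 0xff;               /* next 16 bits are all ones */
    out.s6_addr[11] = 0xff;
    memcpy(&out.s6_addr[12], &v4.s_addr,  /* last 32 bits hold the IPv4 address */
           sizeof(v4.s_addr));
    return out;
}
```

Code that receives such an address can recognize it with IN6_IS_ADDR_V4MAPPED() and recover the IPv4 part from the last four bytes.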
     

28 Dec, 2017

1 commit

  • If the rds_sock is not added to the bind_hash_table, we must
    reset rs_bound_addr so that rds_remove_bound will not trip on
    this rds_sock.

    rds_add_bound() does a rds_sock_put() in this failure path, so
    failing to reset rs_bound_addr will result in a socket refcount
    bug, and will trigger a WARN_ON with the stack shown below when
    the application subsequently tries to close the PF_RDS socket.

    WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
    rds_sock_destruct+0x15/0x30 [rds]
    :
    __sk_destruct+0x21/0x190
    rds_remove_bound.part.13+0xb6/0x140 [rds]
    rds_release+0x71/0x120 [rds]
    sock_release+0x1a/0x70
    sock_close+0xe/0x20
    __fput+0xd5/0x210
    task_work_run+0x82/0xa0
    do_exit+0x2ce/0xb30
    ? syscall_trace_enter+0x1cc/0x2b0
    do_group_exit+0x39/0xa0
    SyS_exit_group+0x10/0x10
    do_syscall_64+0x61/0x1a0

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

29 Aug, 2017

1 commit


03 Jan, 2017

1 commit


16 Jul, 2016

1 commit

  • Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. When
    multipath RDS (mprds) is supported by the transport, hash outgoing
    traffic to one of the multiple paths in rds_sendmsg().

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
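
The path selection described above can be approximated as hashing a per-socket value onto the number of negotiated paths. The sketch below is an assumption-laden illustration: `mix32` is a hypothetical integer mixer (the kernel uses its own hash helper), and `pick_path`, its `seed` parameter, and the use of the bound port as the hash key are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical integer mixer, used only for this illustration. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/*
 * Map a socket (identified here by its bound port) to one of npaths paths.
 * Falls back to path 0 when the peer or transport offers a single path.
 */
static unsigned int pick_path(uint16_t bound_port, uint32_t seed,
                              unsigned int npaths)
{
    if (npaths <= 1)
        return 0;
    return mix32((uint32_t)bound_port ^ seed) % npaths;
}
```

Because the result depends only on per-socket values, all traffic from one socket stays on one path, which preserves ordering within that socket.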
     

03 Nov, 2015

1 commit

  • To further improve RDS connection scalability on massive systems
    where the number of sockets grows into the tens of thousands, a
    larger bind hashtable is needed. A pre-allocated 8K or 16K table is
    not very flexible in terms of memory utilisation. The rhashtable
    infrastructure gives us the flexibility to grow the hashtable based
    on use and also comes with built-in efficient bucket (chain) handling.

    Reviewed-by: David Miller
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     

13 Oct, 2015

1 commit


01 Oct, 2015

3 commits

  • One global lock protecting hash tables with 1024 buckets isn't
    efficient, and it shows up on massive systems with truckloads
    of RDS sockets serving multiple databases. The
    perf data clearly highlights the contention on the rw
    lock in these massive workloads.

    When the contention gets worse, the code gets into a state where
    it decides to back off on the lock. So while it has disabled interrupts,
    it sits and backs off on this lock get. This causes the system to
    become sluggish and eventually all sorts of bad things happen.

    The simple fix is to move the lock into the hash bucket and
    use per-bucket lock to improve the scalability.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • One needs to take an rds socket reference while using it and release
    it once done with it. The rds_add_bind() code path does not do that,
    so let's fix it.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • The RDS bind and release locking scheme is very inefficient. It
    uses RCU for maintaining the bind hash table, which is great, but
    it also needs to hold a spinlock for [add/remove]_bound(). So
    overall, the concurrent speedup of the hash table doesn't pay off.
    In fact, the blocking nature of synchronize_rcu() makes RDS
    socket shutdown too slow, which hurts RDS performance since
    connection shutdown and re-connect happen quite often to
    maintain the RC part of the protocol.

    So we make the locking scheme simpler and more efficient by
    replacing spin_locks with reader/writer locks and getting rid
    of RCU for the bind hash table.

    In a subsequent patch, we also convert the global lock to a
    per-bucket lock to reduce the global lock contention.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     

08 Aug, 2015

1 commit


01 Jun, 2015

1 commit

  • An application may deterministically attach the underlying transport for
    a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
    option at the SOL_RDS level. The integer argument to setsockopt must be
    one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
    must be specified before invoking bind(2) on the socket, and may only
    be used once on the socket. An attempt to set the option on a bound
    socket, or to invoke the option after a successful SO_RDS_TRANSPORT
    attachment, will return EOPNOTSUPP.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
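
A hedged sketch of how an application might use this option is shown below. The fallback constant values are assumptions intended to mirror <linux/rds.h> and should be verified against that header; `rds_attach_transport` is a hypothetical wrapper name. The descriptor passed in is expected to be a PF_RDS socket that has not yet been bound.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallback values assumed to match <linux/rds.h>; prefer the real header. */
#ifndef SOL_RDS
#define SOL_RDS 276
#endif
#ifndef SO_RDS_TRANSPORT
#define SO_RDS_TRANSPORT 8
#endif
#ifndef RDS_TRANS_TCP
#define RDS_TRANS_TCP 2
#endif

/*
 * Attach a transport to an unbound PF_RDS socket.
 * Returns 0 on success or a negative errno value on failure.
 */
static int rds_attach_transport(int fd, int transport)
{
    if (setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT,
                   &transport, sizeof(transport)) < 0)
        return -errno;
    return 0;
}
```

A caller would invoke `rds_attach_transport(fd, RDS_TRANS_TCP)` before bind(2); per the commit text, a second attempt on the same socket, or an attempt on a bound socket, is expected to fail with EOPNOTSUPP.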
     

15 Jan, 2014

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones. The list iterators look like this:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
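
To show what a three-argument entry iterator looks like in practice, here is a simplified userspace sketch. It is not the kernel's macro: `struct hnode`, `struct hhead`, `entry_or_null`, `for_each_entry3`, `struct item`, and `sum_items` are hypothetical names, and the sketch relies on the GCC/Clang `__typeof__` extension.

```c
#include <stddef.h>

/* Minimal singly linked node and head, standing in for the hlist types. */
struct hnode { struct hnode *next; };
struct hhead { struct hnode *first; };

/* Convert a node pointer to its containing entry, or NULL at list end. */
#define entry_or_null(ptr, type, member) \
    ((ptr) ? (type *)((char *)(ptr) - offsetof(type, member)) : NULL)

/* Three-argument iterator: the cursor is the entry itself, no extra node. */
#define for_each_entry3(pos, head, member)                                   \
    for (pos = entry_or_null((head)->first, __typeof__(*(pos)), member);     \
         pos;                                                                \
         pos = entry_or_null((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hnode link;
};

/* Walk the list with a single cursor variable and sum the values. */
static int sum_items(struct hhead *head)
{
    struct item *it;
    int sum = 0;

    for_each_entry3(it, head, link)
        sum += it->value;
    return sum;
}
```

The caller declares only the entry cursor; the separate node cursor that the old four-argument form required is gone.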
     

17 Jun, 2011

1 commit


09 Sep, 2010

3 commits


24 Aug, 2009

1 commit


27 Feb, 2009

1 commit