26 Sep, 2018

1 commit

  • [ Upstream commit cc4dfb7f70a344f24c1c71e298deea0771dadcb2 ]

    When an rds sock is bound, it is inserted into the bind_hash_table,
    which is protected by RCU. But when an rds sock is released, it is
    freed immediately after being removed from this hash table, without
    waiting for an RCU grace period. This can cause use-after-free bugs,
    as reported by syzbot.

    Mark the rds sock with SOCK_RCU_FREE before inserting it into the
    bind_hash_table, so that it is always freed only after an RCU grace
    period.

    The other problem is in rds_find_bound(): the rds sock could be
    freed between rhashtable_lookup_fast() and rds_sock_addref(), so
    the RCU read-side critical section in rds_find_bound() has to be
    extended to cover both calls and close this race (both changes are
    sketched after this entry).

    Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
    Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
    Cc: Sowmini Varadhan
    Cc: Santosh Shilimkar
    Cc: rds-devel@oss.oracle.com
    Signed-off-by: Cong Wang
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
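    A minimal sketch of the two changes described above, against the
    generic rhashtable and socket APIs. The table, parameter and field
    names (bind_hash_table, ht_parms, rs_bound_node, rds_rs_to_sk) follow
    the wording of the message but are illustrative, not verbatim kernel
    code:

        /* Sketch only: assumes kernel context (net/rds). */
        static int rds_add_bound_sketch(struct rds_sock *rs)
        {
                /* Freed only after an RCU grace period, so lockless
                 * readers of bind_hash_table never see freed memory. */
                sock_set_flag(rds_rs_to_sk(rs), SOCK_RCU_FREE);

                return rhashtable_insert_fast(&bind_hash_table,
                                              &rs->rs_bound_node, ht_parms);
        }

        static struct rds_sock *rds_find_bound_sketch(const void *key)
        {
                struct rds_sock *rs;

                rcu_read_lock();        /* held across lookup *and* addref */
                rs = rhashtable_lookup_fast(&bind_hash_table, key, ht_parms);
                if (rs)
                        rds_sock_addref(rs);    /* safe: rs cannot be freed yet */
                rcu_read_unlock();

                return rs;
        }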
     

12 Apr, 2018

1 commit

  • [ Upstream commit 7ae0c649c47f1c5d2db8cee6dd75855970af1669 ]

    If the rds_sock is not added to the bind_hash_table, we must
    reset rs_bound_addr so that rds_remove_bound will not trip on
    this rds_sock.

    rds_add_bound() does a rds_sock_put() in this failure path, so
    failing to reset rs_bound_addr results in a socket refcount bug
    and triggers a WARN_ON with the stack shown below when the
    application subsequently tries to close the PF_RDS socket (the
    failure path is sketched after this entry).

    WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
    rds_sock_destruct+0x15/0x30 [rds]
    :
    __sk_destruct+0x21/0x190
    rds_remove_bound.part.13+0xb6/0x140 [rds]
    rds_release+0x71/0x120 [rds]
    sock_release+0x1a/0x70
    sock_close+0xe/0x20
    __fput+0xd5/0x210
    task_work_run+0x82/0xa0
    do_exit+0x2ce/0xb30
    ? syscall_trace_enter+0x1cc/0x2b0
    do_group_exit+0x39/0xa0
    SyS_exit_group+0x10/0x10
    do_syscall_64+0x61/0x1a0

    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sowmini Varadhan
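    A hedged sketch of the fixed failure path; rs_bound_addr,
    rds_sock_put() and rds_add_bound() come from the message above,
    while the helper shape and the bind_hash_table / ht_parms /
    rs_bound_node names are illustrative:

        static int rds_add_bound_sketch(struct rds_sock *rs, __be32 addr)
        {
                int ret;

                rs->rs_bound_addr = addr;
                rds_sock_addref(rs);            /* reference held by the hash table */
                ret = rhashtable_insert_fast(&bind_hash_table,
                                              &rs->rs_bound_node, ht_parms);
                if (ret) {
                        rs->rs_bound_addr = 0;  /* the fix: rds_remove_bound()
                                                 * must not see a stale address */
                        rds_sock_put(rs);       /* drop the reference taken above */
                }
                return ret;
        }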
     

29 Aug, 2017

1 commit


03 Jan, 2017

1 commit


16 Jul, 2016

1 commit

  • Use RDS probe-ping to compute how many paths may be used with
    the peer, and to synchronously start the multiple paths. If the
    transport supports multipath RDS (mprds), hash outgoing traffic to
    one of the multiple paths in rds_sendmsg() (the hashing idea is
    sketched after this entry).

    CC: Santosh Shilimkar
    Signed-off-by: Sowmini Varadhan
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Sowmini Varadhan
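    A hedged sketch of the path-selection idea; the c_npaths / c_path
    fields match the mprds description above, but the helper itself and
    the choice of hashing on the local port are illustrative assumptions:

        /* Pick one of the negotiated paths for this flow.  Hashing on the
         * bound local port keeps a given socket's traffic on a single path. */
        static struct rds_conn_path *rds_pick_path_sketch(struct rds_connection *conn,
                                                          u16 local_port)
        {
                int hash = 0;

                if (conn->c_npaths > 1)         /* c_npaths learned via probe-ping */
                        hash = local_port % conn->c_npaths;

                return &conn->c_path[hash];
        }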
     

03 Nov, 2015

1 commit

  • To further improve RDS connection scalability on massive systems,
    where the number of sockets grows into the tens of thousands, a
    larger bind hash table is needed. A pre-allocated 8K or 16K table
    is not very flexible in terms of memory utilisation. The rhashtable
    infrastructure gives us the flexibility to grow the hash table on
    demand and also provides efficient built-in bucket (chain) handling
    (see the sketch after this entry).

    Reviewed-by: David Miller
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
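    A minimal sketch of moving the bind table onto the generic rhashtable
    infrastructure; struct rds_sock is assumed to come from net/rds/rds.h,
    and the field names (rs_bound_key, rs_bound_node) and sizing hint are
    illustrative assumptions:

        #include <linux/rhashtable.h>

        static struct rhashtable bind_hash_table;

        static const struct rhashtable_params ht_parms = {
                .nelem_hint  = 768,             /* start small, grow on demand */
                .key_len     = sizeof(u64),     /* e.g. address and port packed together */
                .key_offset  = offsetof(struct rds_sock, rs_bound_key),
                .head_offset = offsetof(struct rds_sock, rs_bound_node),
        };

        static int rds_bind_table_init_sketch(void)
        {
                return rhashtable_init(&bind_hash_table, &ht_parms);
        }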
     

13 Oct, 2015

1 commit


01 Oct, 2015

3 commits

  • One global lock protecting a hash table with 1024 buckets isn't
    efficient, and it shows up on massive systems with truckloads of
    RDS sockets serving multiple databases. The perf data clearly
    highlights the contention on this rw lock in such workloads.

    When the contention gets worse, the code backs off on the lock
    acquisition while interrupts are still disabled, which makes the
    system sluggish and eventually leads to all sorts of bad behaviour.

    The simple fix is to move the lock into the hash bucket and use a
    per-bucket lock to improve scalability (see the sketch after this
    entry).

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
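    A hedged sketch of the per-bucket locking idea; the bucket struct and
    the rs_bound_node / rs_bound_addr / rs_bound_port field names are
    illustrative rather than verbatim from net/rds/bind.c:

        #define BIND_HASH_SIZE 1024

        struct bind_hash_bucket {
                rwlock_t          lock;         /* protects only this chain */
                struct hlist_head head;
        };

        static struct bind_hash_bucket bind_hash_table[BIND_HASH_SIZE];

        static struct rds_sock *rds_bind_lookup_sketch(struct bind_hash_bucket *b,
                                                       __be32 addr, __be16 port)
        {
                struct rds_sock *rs;

                read_lock(&b->lock);            /* contention limited to one bucket */
                hlist_for_each_entry(rs, &b->head, rs_bound_node) {
                        if (rs->rs_bound_addr == addr && rs->rs_bound_port == port) {
                                rds_sock_addref(rs);
                                read_unlock(&b->lock);
                                return rs;
                        }
                }
                read_unlock(&b->lock);
                return NULL;
        }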
     
  • One needs to take an rds socket reference while using it and
    release it once done with it. The rds_add_bound() code path does
    not do that, so let's fix it (see the sketch after this entry).

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
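    A minimal sketch of the convention the fix enforces; the surrounding
    helper is illustrative, while rds_sock_addref()/rds_sock_put() are the
    reference helpers the message refers to:

        static void rds_use_bound_sock_sketch(struct rds_sock *rs)
        {
                rds_sock_addref(rs);    /* take a reference before use */

                /* ... use rs, e.g. finish the bind against it ... */

                rds_sock_put(rs);       /* release it once done with it */
        }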
     
  • The RDS bind and release locking scheme is very inefficient. It
    uses RCU to maintain the bind hash table, which is great, but it
    also needs to hold a spinlock for [add/remove]_bound(), so overall
    the concurrent hash-table speedup doesn't pay off. In fact, the
    blocking nature of synchronize_rcu() makes RDS socket shutdown too
    slow, which hurts RDS performance, since connection shutdown and
    re-connect happen quite often to maintain the RC part of the
    protocol.

    So we make the locking scheme simpler and more efficient by
    replacing the spinlocks with reader/writer locks and getting rid
    of RCU for the bind hash table (see the sketch after this entry).

    In a subsequent patch, we also convert the global lock to a
    per-bucket lock to reduce the global lock contention.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
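    A hedged sketch of the simplified scheme: one global rwlock and no
    RCU on the release path; the lock and node names are illustrative:

        static DEFINE_RWLOCK(rds_bind_lock);    /* split per bucket in a later patch */

        static void rds_remove_bound_sketch(struct rds_sock *rs)
        {
                unsigned long flags;

                write_lock_irqsave(&rds_bind_lock, flags);
                hlist_del_init(&rs->rs_bound_node);     /* no synchronize_rcu() on release */
                write_unlock_irqrestore(&rds_bind_lock, flags);
        }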
     

08 Aug, 2015

1 commit


01 Jun, 2015

1 commit

  • An application may deterministically attach the underlying transport for
    a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
    option at the SOL_RDS level. The integer argument to setsockopt must be
    one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
    must be specified before invoking bind(2) on the socket, and may only
    be used once on the socket. An attempt to set the option on a bound
    socket, or to invoke the option after a successful SO_RDS_TRANSPORT
    attachment, will return EOPNOTSUPP (usage is sketched after this
    entry).

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
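    A hedged userspace sketch of attaching the TCP transport before
    bind(2); it assumes a libc that defines PF_RDS and a uapi
    <linux/rds.h> that provides SOL_RDS, SO_RDS_TRANSPORT and
    RDS_TRANS_TCP, and the port number is an arbitrary example:

        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <linux/rds.h>

        int main(void)
        {
                struct sockaddr_in sin;
                int trans = RDS_TRANS_TCP;
                int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);

                if (fd < 0) {
                        perror("socket(PF_RDS)");
                        return 1;
                }

                /* Must be done before bind(2), and only once per socket. */
                if (setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, &trans, sizeof(trans)) < 0)
                        perror("setsockopt(SO_RDS_TRANSPORT)");

                memset(&sin, 0, sizeof(sin));
                sin.sin_family = AF_INET;
                sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
                sin.sin_port = htons(18634);    /* arbitrary example port */

                if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        perror("bind");

                /* A second SO_RDS_TRANSPORT setsockopt here would fail with EOPNOTSUPP. */
                return 0;
        }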
     

15 Jan, 2014

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were
    conceived differently from the list ones, which look like:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not
    only do they not really need it, it also prevents the iterator from
    looking exactly like the list iterator, which is unfortunate (a
    before/after example follows this entry).

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter;
    these were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
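    An illustrative before/after of the iterator change; the bucket,
    member and process() names are hypothetical:

        struct rds_sock *rs;
        struct hlist_node *node;        /* the now-redundant extra parameter */

        /* Before: the hlist iterator needed the extra node argument. */
        hlist_for_each_entry(rs, node, &bucket->head, rs_bound_node)
                process(rs);

        /* After: same shape as list_for_each_entry(). */
        hlist_for_each_entry(rs, &bucket->head, rs_bound_node)
                process(rs);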
     

17 Jun, 2011

1 commit


09 Sep, 2010

3 commits


24 Aug, 2009

1 commit


27 Feb, 2009

1 commit