26 Sep, 2018
1 commit
-
[ Upstream commit cc4dfb7f70a344f24c1c71e298deea0771dadcb2 ]
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Cc: Sowmini Varadhan
Cc: Santosh Shilimkar
Cc: rds-devel@oss.oracle.com
Signed-off-by: Cong Wang
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
12 Apr, 2018
1 commit
-
[ Upstream commit 7ae0c649c47f1c5d2db8cee6dd75855970af1669 ]
If the rds_sock is not added to the bind_hash_table, we must
reset rs_bound_addr so that rds_remove_bound will not trip on
this rds_sock.rds_add_bound() does a rds_sock_put() in this failure path, so
failing to reset rs_bound_addr will result in a socket refcount
bug, and will trigger a WARN_ON with the stack shown below when
the application subsequently tries to close the PF_RDS socket.WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
rds_sock_destruct+0x15/0x30 [rds]
:
__sk_destruct+0x21/0x190
rds_remove_bound.part.13+0xb6/0x140 [rds]
rds_release+0x71/0x120 [rds]
sock_release+0x1a/0x70
sock_close+0xe/0x20
__fput+0xd5/0x210
task_work_run+0x82/0xa0
do_exit+0x2ce/0xb30
? syscall_trace_enter+0x1cc/0x2b0
do_group_exit+0x39/0xa0
SyS_exit_group+0x10/0x10
do_syscall_64+0x61/0x1a0Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
29 Aug, 2017
1 commit
-
Make this const as it is either used during a copy operation or passed
to a const argument of the function rhltable_initSigned-off-by: Bhumika Goyal
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
03 Jan, 2017
1 commit
-
It's useful to know the IP address when RDS fails to bind a
connection. Thus, adding it to the error message.Orabug: 21894138
Reviewed-by: Wei Lin Guay
Signed-off-by: Santosh Shilimkar
16 Jul, 2016
1 commit
-
Use RDS probe-ping to compute how many paths may be used with
the peer, and to synchronously start the multiple paths. If mprds is
supported, hash outgoing traffic to one of multiple paths in rds_sendmsg()
when multipath RDS is supported by the transport.CC: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
03 Nov, 2015
1 commit
-
To further improve the RDS connection scalabilty on massive systems
where number of sockets grows into tens of thousands of sockets, there
is a need of larger bind hashtable. Pre-allocated 8K or 16K table is
not very flexible in terms of memory utilisation. The rhashtable
infrastructure gives us the flexibility to grow the hashtbable based
on use and also comes up with inbuilt efficient bucket(chain) handling.Reviewed-by: David Miller
Signed-off-by: Santosh Shilimkar
Signed-off-by: Santosh Shilimkar
Signed-off-by: David S. Miller
13 Oct, 2015
1 commit
-
The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller
01 Oct, 2015
3 commits
-
One global lock protecting hash-tables with 1024 buckets isn't
efficient and it shows up in a massive systems with truck
loads of RDS sockets serving multiple databases. The
perf data clearly highlights the contention on the rw
lock in these massive workloads.When the contention gets worse, the code gets into a state where
it decides to back off on the lock. So while it has disabled interrupts,
it sits and backs off on this lock get. This causes the system to
become sluggish and eventually all sorts of bad things happen.The simple fix is to move the lock into the hash bucket and
use per-bucket lock to improve the scalability.Signed-off-by: Santosh Shilimkar
Signed-off-by: Santosh Shilimkar -
One need to take rds socket reference while using it and release it
once done with it. rds_add_bind() code path does not do that so
lets fix it.Signed-off-by: Santosh Shilimkar
Signed-off-by: Santosh Shilimkar -
RDS bind and release locking scheme is very inefficient. It
uses RCU for maintaining the bind hash-table which is great but
it also needs to hold spinlock for [add/remove]_bound(). So
overall usecase, the hash-table concurrent speedup doesn't pay off.
In fact blocking nature of synchronize_rcu() makes the RDS
socket shutdown too slow which hurts RDS performance since
connection shutdown and re-connect happens quite often to
maintain the RC part of the protocol.So we make the locking scheme simpler and more efficient by
replacing spin_locks with reader/writer locks and getting rid
off rcu for bind hash-table.In subsequent patch, we also covert the global lock with per-bucket
lock to reduce the global lock contention.Signed-off-by: Santosh Shilimkar
Signed-off-by: Santosh Shilimkar
08 Aug, 2015
1 commit
-
Open the sockets calling sock_create_kern() with the correct struct net
pointer, and use that struct net pointer when verifying the
address passed to rds_bind().Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
01 Jun, 2015
1 commit
-
An application may deterministically attach the underlying transport for
a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer argument to setsockopt must be
one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
must be specified before invoking bind(2) on the socket, and may only
be used once on the socket. An attempt to set the option on a bound
socket, or to invoke the option after a successful SO_RDS_TRANSPORT
attachment, will return EOPNOTSUPP.Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller
15 Jan, 2014
1 commit
-
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.Signed-off-by: Aruna-Hewapathirane
Suggested-by: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
28 Feb, 2013
1 commit
-
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;type T;
expression a,c,d,e;
identifier b;
statement S;
@@-T b;
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jun, 2011
1 commit
-
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited()Signed-off-by: Manuel Zerpies
Signed-off-by: David S. Miller
09 Sep, 2010
3 commits
-
The RDS bind lookups are somewhat expensive in terms of CPU
time and locking overhead. This commit changes them into a
faster RCU based hash tree instead of the rbtrees they were using
before.On large NUMA systems it is a significant improvement.
Signed-off-by: Chris Mason
-
The bind_lock is almost entirely readonly, but it gets
hammered during normal operations and is a major bottleneck.This commit changes it to an rwlock, which takes it from 80%
of the system time on a big numa machine down to much lower
numbers.A better fix would involve RCU, which is done in a later commit
Signed-off-by: Chris Mason
-
Favor "if (foo)" style over "if (foo != NULL)".
Signed-off-by: Andy Grover
24 Aug, 2009
1 commit
-
Now that RDS transports are no longer compiled-in to RDS core,
there is now the possibility that they will not be loaded. This
adds a helpful suggestion when rds_bind() fails to find a transport.Signed-off-by: Andy Grover
Signed-off-by: David S. Miller
27 Feb, 2009
1 commit
-
Implement the RDS (Reliable Datagram Sockets) interface.
Signed-off-by: Andy Grover
Signed-off-by: David S. Miller