17 Nov, 2019

1 commit

  • Currently, WR sizes are updated from rds_ib_sysctl_max_send_wr and
    rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
    a connection that is already down when rds_ib_sysctl_max_send_wr or
    rds_ib_sysctl_max_recv_wr is updated will not pick up the new sizes
    when it comes back up.

    Move the resizing of WRs to rds_ib_setup_qp so that connections are
    set up with the most current WR sizes.
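
    A minimal sketch of the idea, assuming the existing
    rds_ib_ring_resize() helper is what moves (the exact placement inside
    rds_ib_setup_qp is an assumption):

    /* in rds_ib_setup_qp(), before sizing the QP/CQs: sample the
     * sysctls now, so every (re)connect uses the current values. */
    rds_ib_ring_resize(&ic->i_send_ring, rds_ib_sysctl_max_send_wr);
    rds_ib_ring_resize(&ic->i_recv_ring, rds_ib_sysctl_max_recv_wr);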

    Signed-off-by: Dag Moxnes
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Dag Moxnes
     

03 Oct, 2019

1 commit

  • rds_ibdev->ipaddr_list and rds_ibdev->conn_list are initialized
    after allocating some resources, such as the protection domain.
    If allocation of such resources fails, these uninitialized
    variables are accessed in rds_ib_dev_free() on the failure path. This
    can potentially crash the system. The code has been updated to
    initialize these variables very early in the function.
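
    A sketch of the pattern (the allocation and PD calls mirror
    net/rds/ib.c; error-path details are elided):

    rds_ibdev = kzalloc_node(sizeof(struct rds_ib_device), GFP_KERNEL,
                             ibdev_to_node(device));
    if (!rds_ibdev)
            return;

    /* Initialize these first, so rds_ib_dev_free() can walk the lists
     * safely even if a later allocation (e.g. the PD) fails. */
    INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
    INIT_LIST_HEAD(&rds_ibdev->conn_list);

    rds_ibdev->pd = ib_alloc_pd(device, 0);
    if (IS_ERR(rds_ibdev->pd))
            goto put_dev;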

    Signed-off-by: Dotan Barak
    Signed-off-by: Sudhakar Dindukurti
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Dotan Barak
     

27 Sep, 2019

1 commit

  • In rds_bind(), laddr_check is called without first checking whether it
    is NULL. Also, rs_transport should be reset if rds_add_bound() fails.
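
    A sketch of both checks (hedged; the errno and label names are
    assumptions):

    if (rs->rs_transport) {                         /* previously bound */
            trans = rs->rs_transport;
            if (!trans->laddr_check ||              /* new: may be unset */
                trans->laddr_check(sock_net(sock->sk),
                                   binding_addr, scope_id) != 0) {
                    ret = -ENOPROTOOPT;
                    goto out;
            }
    }
    ...
    ret = rds_add_bound(rs, binding_addr, &port, scope_id);
    if (ret)
            rs->rs_transport = NULL;                /* new: undo on failure */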

    Fixes: c5c1a030a7db ("net/rds: An rds_sock is added too early to the hash table")
    Reported-by: syzbot+fae39afd2101a17ec624@syzkaller.appspotmail.com
    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

26 Sep, 2019

1 commit


18 Sep, 2019

1 commit


16 Sep, 2019

1 commit

  • All entries in 'rds_ib_stat_names' are stringified versions
    of the corresponding "struct rds_ib_statistics" element
    without the "s_"-prefix.

    Fix entry 'ib_evt_handler_call' to do the same.
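
    For illustration, the convention pairs each member with its
    stringified, prefix-stripped name:

    /* struct rds_ib_statistics member */
    uint64_t        s_ib_evt_handler_call;

    /* matching rds_ib_stat_names[] entry */
    "ib_evt_handler_call",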

    Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance")
    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     

15 Sep, 2019

1 commit


11 Sep, 2019

1 commit

  • In rds_bind(), an rds_sock is added to the RDS bind hash table before
    rs_transport is set. This means that the socket can be found by the
    receive code path while rs_transport is still NULL, and the receive code
    path dereferences rs_transport for the congestion update check. This can
    cause a panic. An rds_sock should not be added to the bind hash table
    before all the needed fields are set.
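
    A sketch of the corrected ordering in rds_bind() (hedged; error
    handling elided):

    trans = rds_trans_get_preferred(sock_net(sock->sk), binding_addr,
                                    scope_id);
    if (!trans) { /* ... fail the bind ... */ }
    rs->rs_transport = trans;       /* finish initializing first... */
    ret = rds_add_bound(rs, binding_addr, &port, scope_id);
                                    /* ...then publish to the hash table */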

    Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
    Signed-off-by: Ka-Cheong Poon
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

05 Sep, 2019

1 commit

  • IN_MULTICAST's primary intent is as a uapi macro.

    Elsewhere in the kernel we use ipv4_is_multicast consistently.

    This patch unifies Linux's multicast checks to use that function
    rather than this macro.
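
    For reference, the existing helper from <linux/in.h> (this is the
    kernel's current definition, not new code):

    static inline bool ipv4_is_multicast(__be32 addr)
    {
            return (addr & htonl(0xf0000000)) == htonl(0xe0000000);
    }

    So a check like IN_MULTICAST(ntohl(addr)) becomes ipv4_is_multicast(addr),
    operating directly on the network-byte-order address.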

    Signed-off-by: Dave Taht
    Reviewed-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller

    Dave Taht
     

03 Sep, 2019

1 commit


28 Aug, 2019

2 commits


25 Aug, 2019

1 commit

  • From the IB specification, section 7.6.5 SERVICE LEVEL: the Service
    Level (SL) is used to identify different flows within an IBA subnet.
    It is carried in the local route header of the packet.

    Before this commit, running "rds-info -I" produced the output
    below:
    "
    RDS IB Connections:
    LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
    192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9
    192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9
    192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
    "
    After this commit, the output is as below:
    "
    RDS IB Connections:
    LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
    192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9
    192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9
    192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
    "

    The commit fe3475af3bdf ("net: rds: add per rds connection cache
    statistics") added cache_allocs to struct rds_info_rdma_connection
    as below:

    struct rds_info_rdma_connection {
    ...
            __u32           rdma_mr_max;
            __u32           rdma_mr_size;
            __u8            tos;
            __u32           cache_allocs;
    };

    The peer struct of struct rds_info_rdma_connection in rds-tools is:

    struct rds_info_rdma_connection {
    ...
            uint32_t        rdma_mr_max;
            uint32_t        rdma_mr_size;
            uint8_t         tos;
            uint8_t         sl;
            uint32_t        cache_allocs;
    };

    The difference between userspace and kernel is the member variable sl;
    in the kernel struct, it is missing. This mismatch in structure layout
    introduces risk, so this commit is needed to avoid it.
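
    Presumably the fix adds the missing member so the two layouts match
    again (a sketch; the placement mirrors the rds-tools struct above):

    struct rds_info_rdma_connection {
    ...
            __u8            tos;
            __u8            sl;     /* added to match rds-tools */
            __u32           cache_allocs;
    };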

    Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
    CC: Joe Jin
    CC: JUNXIAO_BI
    Suggested-by: Gerd Rausch
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     

24 Aug, 2019

1 commit

  • Add the RDMA cookie and RX timestamp to the usercopy whitelist.

    After the introduction of hardened usercopy whitelisting
    (https://lwn.net/Articles/727322/), a warning is displayed when the
    RDMA cookie or RX timestamp is copied to userspace:

    kernel: WARNING: CPU: 3 PID: 5750 at
    mm/usercopy.c:81 usercopy_warn+0x8e/0xa6
    [...]
    kernel: Call Trace:
    kernel: __check_heap_object+0xb8/0x11b
    kernel: __check_object_size+0xe3/0x1bc
    kernel: put_cmsg+0x95/0x115
    kernel: rds_recvmsg+0x43d/0x620 [rds]
    kernel: sock_recvmsg+0x43/0x4a
    kernel: ___sys_recvmsg+0xda/0x1e6
    kernel: ? __handle_mm_fault+0xcae/0xf79
    kernel: __sys_recvmsg+0x51/0x8a
    kernel: SyS_recvmsg+0x12/0x1c
    kernel: do_syscall_64+0x79/0x1ae

    When the whitelisting feature was introduced, the memory for the RDMA
    cookie and RX timestamp in RDS was not added to the whitelist, causing
    the warning above.
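
    The whitelisting itself happens where the backing slab is created; a
    sketch using the generic slab API (the RDS struct and field names here
    are assumptions):

    rds_ib_incoming_slab =
            kmem_cache_create_usercopy("rds_ib_incoming",
                                       sizeof(struct rds_ib_incoming),
                                       0, SLAB_HWCACHE_ALIGN,
                                       /* whitelisted region: the part
                                        * later copied via put_cmsg() */
                                       offsetof(struct rds_ib_incoming,
                                                ii_inc.i_usercopy),
                                       sizeof(struct rds_inc_usercopy),
                                       NULL);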

    Signed-off-by: Dag Moxnes
    Tested-by: Jenny
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Dag Moxnes
     

16 Aug, 2019

5 commits

  • Original commit from 2011, updated to include a change by
    Yuval Shaia
    that adds a new statistic counter "send_stuck_rm"
    to capture messages looping excessively
    in the send path.

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Andy Grover
     
  • In a previous commit, fields were added to "struct rds_statistics"
    but array "rds_stat_names" was not updated accordingly.

    Please note the inconsistent naming of the string representations
    that is done in the name of compatibility
    with the Oracle internal code-base.

    s_recv_bytes_added_to_socket -> "recv_bytes_added_to_sock"
    s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock"

    Fixes: 192a798f5299 ("RDS: add stat for socket recv memory usage")
    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • Signed-off-by: Chris Mason
    Signed-off-by: Bang Nguyen
    Signed-off-by: Gerd Rausch
    Signed-off-by: Somasundaram Krishnasamy
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Chris Mason
     
  • This will kick the RDS worker thread if we have been looping
    too long.

    Original commit from 2012 updated to include a change by
    Venkat Venkatsubra
    that triggers "must_wake" if "rds_ib_recv_refill_one" fails.

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Chris Mason
     
  • Add support of the socket options RDS6_INFO_SOCKETS and
    RDS6_INFO_RECV_MESSAGES which update the RDS_INFO_SOCKETS and
    RDS_INFO_RECV_MESSAGES options respectively. The old options work
    for IPv4 sockets only.

    Signed-off-by: Ka-Cheong Poon
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Ka-Cheong Poon
     

28 Jul, 2019

1 commit

  • In rds_rdma_cm_event_handler_cmn(), there are some if statements that
    check whether conn is NULL, such as on lines 65, 96 and 112.
    But conn is not checked before being used on line 108:

    trans->cm_connect_complete(conn, event);

    and on lines 140-143:

    rdsdebug("DISCONNECT event - dropping connection %pI6c->%pI6c\n",
             &conn->c_laddr, &conn->c_faddr);
    rds_conn_drop(conn);

    Thus, possible null-pointer dereferences may occur.

    To fix these bugs, conn is checked before being used, as sketched
    below.

    These bugs were found by STCheck, a static analysis tool written by us.
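
    A sketch of the fix, guarding both uses:

    if (conn)
            trans->cm_connect_complete(conn, event);
    ...
    if (conn) {
            rdsdebug("DISCONNECT event - dropping connection %pI6c->%pI6c\n",
                     &conn->c_laddr, &conn->c_faddr);
            rds_conn_drop(conn);
    }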

    Signed-off-by: Jia-Ju Bai
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Jia-Ju Bai
     

20 Jul, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Fix AF_XDP cq entry leak, from Ilya Maximets.

    2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.

    3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.

    4) Fix handling of neigh timers wrt. entries added by userspace, from
    Lorenzo Bianconi.

    5) Various cases of missing of_node_put(), from Nishka Dasgupta.

    6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.

    7) Various RDS layer fixes, from Gerd Rausch.

    8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
    Cong Wang.

    9) Fix FIB source validation checks over loopback, also from Cong Wang.

    10) Use promisc for unsupported number of filters, from Justin Chen.

    11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
    tcp: fix tcp_set_congestion_control() use from bpf hook
    ag71xx: fix return value check in ag71xx_probe()
    ag71xx: fix error return code in ag71xx_probe()
    usb: qmi_wwan: add D-Link DWM-222 A2 device ID
    bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
    net: dsa: sja1105: Fix missing unlock on error in sk_buff()
    gve: replace kfree with kvfree
    selftests/bpf: fix test_xdp_noinline on s390
    selftests/bpf: fix "valid read map access into a read-only array 1" on s390
    net/mlx5: Replace kfree with kvfree
    MAINTAINERS: update netsec driver
    ipv6: Unlink sibling route in case of failure
    liquidio: Replace vmalloc + memset with vzalloc
    udp: Fix typo in net/ipv4/udp.c
    net: bcmgenet: use promisc for unsupported filters
    ipv6: rt6_check should return NULL if 'from' is NULL
    tipc: initialize 'validated' field of received packets
    selftests: add a test case for rp_filter
    fib: relax source validation check for loopback packets
    mlxsw: spectrum: Do not process learned records with a dummy FID
    ...

    Linus Torvalds
     

18 Jul, 2019

7 commits

  • Otherwise, if an IB connection is torn down before "rds_ib_setup_qp"
    is called, the value of "ic->i_fastreg_wrs" is still at zero
    (as it wasn't initialized by "rds_ib_setup_qp").
    Consequently "rds_ib_conn_path_shutdown" will spin forever,
    waiting for it to go back to "RDS_IB_DEFAULT_FR_WR",
    which of course will never happen as there are no
    outstanding work requests.
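
    The fix initializes the counter where the connection itself is
    allocated, so the value is sane even if "rds_ib_setup_qp" never runs;
    a one-line sketch (placement in "rds_ib_conn_alloc" is inferred from
    the text):

    atomic_set(&ic->i_fastreg_wrs, RDS_IB_DEFAULT_FR_WR);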

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • Since "rds_ib_free_frmr" and "rds_ib_free_frmr_list" simply put
    the FRMR memory segments on the "drop_list" or "free_list",
    and it is the job of "rds_ib_flush_mr_pool" to reap those entries
    by ultimately issuing a "IB_WR_LOCAL_INV" work-request,
    we need to trigger and then wait for all those memory segments
    attached to a particular connection to be fully released before
    we can move on to release the QP, CQ, etc.

    So we make "rds_ib_conn_path_shutdown" wait for one more
    atomic_t called "i_fastreg_inuse_count" that keeps track of how
    many FRWR memory segments are out there marked "FRMR_IS_INUSE"
    (and also wake_up rds_ib_ring_empty_wait, as they go away).
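
    A sketch of the extended shutdown wait (hedged; modeled on the existing
    rds_ib_ring_empty_wait usage):

    wait_event(rds_ib_ring_empty_wait,
               rds_ib_ring_empty(&ic->i_recv_ring) &&
               atomic_read(&ic->i_fastreg_inuse_count) == 0);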

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • Fix a bug where fr_state first goes to FRMR_IS_STALE because of a failure
    of the IB_WR_LOCAL_INV operation, but then gets set back to "FRMR_IS_FREE"
    unconditionally, even though the operation failed.
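
    A sketch of the corrected transition in the completion handler (the
    exact guard is an assumption):

    if (frmr->fr_inv) {
            if (wc->status == IB_WC_SUCCESS)        /* only on success... */
                    frmr->fr_state = FRMR_IS_FREE;  /* ...else stay FRMR_IS_STALE */
            frmr->fr_inv = false;
    }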

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • Make function "rds_ib_try_reuse_ibmr" return NULL in case
    memory region could not be allocated, since callers
    simply check if the return value is not NULL.
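
    The shape of the change (sketch):

    static struct rds_ib_mr *rds_ib_try_reuse_ibmr(struct rds_ib_mr_pool *pool)
    {
            ...
            return NULL;    /* was ERR_PTR(-EAGAIN); callers only test for NULL */
    }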

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • In order to:
    1) avoid a silly bouncing between "clean_list" and "drop_list",
    triggered by function "rds_ib_reg_frmr" as it releases frmr
    regions whose state is not "FRMR_IS_FREE" right away, and

    2) prevent an invalid-access error in a race between a pending
    "IB_WR_LOCAL_INV" operation and a teardown ("dma_unmap_sg", "put_page")
    and de-registration ("ib_dereg_mr") of the corresponding
    memory region.

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • Waiting for activity on the "clean_list" to quiesce is no substitute
    for proper locking.

    We can have multiple threads competing for "llist_del_first"
    via "rds_ib_reuse_mr", and a single thread competing
    for "llist_del_all" and "llist_del_first" via "rds_ib_flush_mr_pool".

    Since "llist_del_first" depends on "list->first->next" not to change
    in the midst of the operation, simply waiting for all current calls
    to "rds_ib_reuse_mr" to quiesce across all CPUs is woefully inadequate:

    By the time "wait_clean_list_grace" is done iterating over all CPUs to see
    that there is no concurrent caller to "rds_ib_reuse_mr", a new caller may
    have just shown up on the first CPU.

    Furthermore, <linux/llist.h> explicitly calls out the need for locking:
    * Cases where locking is needed:
    * If we have multiple consumers with llist_del_first used in one consumer,
    * and llist_del_first or llist_del_all used in other consumers,
    * then a lock is needed.

    Also, while at it, drop the unused "pool" parameter
    from "list_to_llist_nodes".

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     
  • In the context of FRMR (ib_frmr.c):

    Memory regions make it onto the "clean_list" via "rds_ib_flush_mr_pool",
    after the memory region has been posted for invalidation via
    "rds_ib_post_inv".

    At that point in time, "fr_state" may still be in state "FRMR_IS_INUSE",
    since the only place where "fr_state" transitions to "FRMR_IS_FREE"
    is in "rds_ib_mr_cqe_handler", which is triggered by a tasklet.

    So in case we notice that "fr_state != FRMR_IS_FREE" (see below),
    we wait for "fr_inv_done" to trigger with a maximum of 10msec.
    Then we check again, and only put the memory region onto the drop_list
    (via "rds_ib_free_frmr") in case the situation remains unchanged.

    This avoids the problem of memory-regions bouncing between "clean_list"
    and "drop_list" before they even have a chance to be properly invalidated.

    Signed-off-by: Gerd Rausch
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Gerd Rausch
     

16 Jul, 2019

1 commit

  • Pull rdma updates from Jason Gunthorpe:
    "A smaller cycle this time. Notably we see another new driver, 'Soft
    iWarp', and the deletion of an ancient unused driver for nes.

    - Revise and simplify the signature offload RDMA MR APIs

    - More progress on hoisting object allocation boiler plate code out
    of the drivers

    - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib,
    i40iw

    - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc
    conversion

    - Removal of obsolete ib_ucm chardev and nes driver

    - netlink based discovery of chardevs and autoloading of the modules
    providing them

    - Move more of the rdmavt/hfi1 uapi to include/uapi/rdma

    - New driver 'siw' for software based iWarp running on top of netdev,
    much like rxe's software RoCE.

    - mlx5 feature to report events in their raw devx format to userspace

    - Expose per-object counters through rdma tool

    - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core
    from netdev"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits)
    RMDA/siw: Require a 64 bit arch
    RDMA/siw: Mark expected switch fall-throughs
    RDMA/core: Fix -Wunused-const-variable warnings
    rdma/siw: Remove set but not used variable 's'
    rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS
    RDMA/siw: Add missing rtnl_lock around access to ifa
    rdma/siw: Use proper enumerated type in map_cqe_status
    RDMA/siw: Remove unnecessary kthread create/destroy printouts
    IB/rdmavt: Fix variable shadowing issue in rvt_create_cq
    RDMA/core: Fix race when resolving IP address
    RDMA/core: Make rdma_counter.h compile stand alone
    IB/core: Work on the caller socket net namespace in nldev_newlink()
    RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
    RDMA/mlx5: Set RDMA DIM to be enabled by default
    RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
    RDMA/core: Provide RDMA DIM support for ULPs
    linux/dim: Implement RDMA adaptive moderation (DIM)
    IB/mlx5: Report correctly tag matching rendezvous capability
    docs: infiniband: add it to the driver-api bookset
    IB/mlx5: Implement VHCA tunnel mechanism in DEVX
    ...

    Linus Torvalds
     

10 Jul, 2019

5 commits

  • Connections with legitimate tos values can get into the usual
    connection race. It can result in a consumer reject. We don't want the
    tos value or protocol version to be demoted for such connections,
    otherwise peers would end up with different tos values, which can
    result in no connection being established. For example, a
    peer-initiated connection with, say, tos 8 could get downgraded to
    tos 0 by the usual connection racing, which is not desirable.

    The patch fixes the above issue, introduced by
    commit d021fabf525f ("rds: rdma: add consumer reject")

    Reported-by: Yanjun Zhu
    Tested-by: Yanjun Zhu
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     
  • The proper "tos" value needs to be returned
    to user-space (sockopt RDS_INFO_CONNECTIONS).

    Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure")
    Signed-off-by: Gerd Rausch
    Reviewed-by: Zhu Yanjun
    Signed-off-by: Santosh Shilimkar

    Gerd Rausch
     
  • Prior to
    commit d021fabf525f ("rds: rdma: add consumer reject")

    function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
    connection attempt by issuing a "rds_conn_drop".

    The commit mentioned above added a "break", eliminating
    the "fallthrough" case, and made the "rds_conn_drop" rather conditional:

    Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
    carries an integer value of "1" inside "private_data":

    if (!conn)
            break;
    err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
    if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
            pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
                    &conn->c_laddr, &conn->c_faddr);
            conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
            rds_conn_drop(conn);
    }
    rdsdebug("Connection rejected: %s\n",
             rdma_reject_msg(cm_id, event->status));
    break;
    /* FALLTHROUGH */

    A number of issues are worth mentioning here:
    #1) Previous versions of the RDS code simply rejected a connection
    by calling "rdma_reject(cm_id, NULL, 0);"
    So the value of the payload in "private_data" will not be "1",
    but "0".

    #2) Now the code has become dependent on host byte order and sizing.
    If one peer is big-endian, the other is little-endian,
    or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
    the *err check does not work as intended.

    #3) There is no check for "len" to see if the data behind *err is even valid.
    Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
    carry 148 bytes of zeroized payload.
    But that should probably not be relied upon here.

    #4) With the added "break;",
    we might as well drop the misleading "/* FALLTHROUGH */" comment.

    This commit does _not_ address issue #2, as the sender would have to
    agree on a byte order as well.

    Here is the sequence of messages in this observed error-scenario:
    Host-A is pre-QoS changes (excluding the commit mentioned above)
    Host-B is post-QoS changes (including the commit mentioned above)

    #1 Host-B
    issues a connection request via function "rds_conn_path_transition";
    the connection state transitions to "RDS_CONN_CONNECTING"

    #2 Host-A
    rejects the incompatible connection request (from #1)
    It does so by calling "rdma_reject(cm_id, NULL, 0);"

    #3 Host-B
    receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
    But since the code is changed in the way described above,
    it won't drop the connection here, simply because "*err == 0".

    #4 Host-A
    issues a connection request

    #5 Host-B
    receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
    and ends up calling "rds_ib_cm_handle_connect".
    But since the state is already in "RDS_CONN_CONNECTING"
    (as of #1) it will end up issuing a "rdma_reject" without
    dropping the connection:
    if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
            /* Wait and see - our connect may still be succeeding */
            rds_ib_stats_inc(s_ib_connect_raced);
    }
    goto out;

    #6 Host-A
    receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
    drops the connection and tries again (goto #4) until it gives up.

    Tested-by: Zhu Yanjun
    Signed-off-by: Gerd Rausch
    Signed-off-by: Santosh Shilimkar

    Gerd Rausch
     
  • This reverts commit 56012459310a1dbcc55c2dbf5500a9f7571402cb.

    RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
    "i_fastreg_wrs" to become incremented:

    while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
            atomic_inc(&ibmr->ic->i_fastreg_wrs);
            cpu_relax();
    }

    Looking at the original commit:

    commit 56012459310a ("RDS: IB: split the mr registration and
    invalidation path")

    In there, the "rds_ib_mr_cqe_handler" was changed in the following
    way:

    void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic,
                               struct ib_wc *wc)
    {
    ...
            if (frmr->fr_inv) {
                    frmr->fr_state = FRMR_IS_FREE;
                    frmr->fr_inv = false;
                    atomic_inc(&ic->i_fastreg_wrs);
            } else {
                    atomic_inc(&ic->i_fastunreg_wrs);
            }

    It looks like it's got it exactly backwards:

    Function "rds_ib_post_reg_frmr" keeps track of the outstanding
    requests via "i_fastreg_wrs".

    Function "rds_ib_post_inv" keeps track of the outstanding requests
    via "i_fastunreg_wrs" (post original commit). It also sets:
    frmr->fr_inv = true;

    However the completion handler "rds_ib_mr_cqe_handler" adjusts
    "i_fastreg_wrs" when "fr_inv" had been true, and adjusts
    "i_fastunreg_wrs" otherwise.

    The original commit was done in the name of performance:
    to remove the performance bottleneck

    No performance benefit could be observed with a fixed-up version
    of the original commit measured between two Oracle X7 servers,
    both equipped with Mellanox Connect-X5 HCAs.

    The prudent course of action is to revert this commit.

    Signed-off-by: Gerd Rausch
    Signed-off-by: Santosh Shilimkar

    Gerd Rausch
     
  • RDS composite message (rdma + control) user notification needs to be
    triggered once the full message is delivered, and such a fix was
    added as part of commit 941f8d55f6d61 ("RDS: RDMA: Fix the composite
    message user notification"). But rds_send_remove_from_sock is missing
    the data-part notify check, and hence at times the user doesn't get the
    notification, which isn't desirable.

    One way is to fix rds_send_remove_from_sock to check for that case,
    but considering the ordering complexity with the completion handler,
    and since rdma + control messages are always dispatched back to back
    in the same send context, just delaying the signaled completion on the
    rdma work request also gets the desired behaviour, i.e. notifying the
    application only after the RDMA + control message send completes. So
    this patch updates the earlier fix with that approach. The fix of
    delaying signaled completions of the rdma op till the control message
    send completes was done by Venkat Venkatsubra in the downstream kernel.
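
    The shape of the idea in the send path (heavily hedged; the real flag
    handling differs in detail):

    /* Composite message: leave the RDMA op unsignaled, so the user
     * notification only fires from the control message's completion. */
    if (rm->rdma.op_active && rm->data.op_active)
            send_flags &= ~IB_SEND_SIGNALED;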

    Reviewed-and-tested-by: Zhu Yanjun
    Reviewed-by: Gerd Rausch
    Signed-off-by: Santosh Shilimkar

    Santosh Shilimkar
     

29 Jun, 2019

1 commit

  • For dependencies in next patches.

    Resolve conflicts:
    - Use uverbs_get_cleared_udata() with new cq allocation flow
    - Continue to delete nes despite SPDX conflict
    - Resolve list appends in mlx5_command_str()
    - Use u16 for vport_rule stuff
    - Resolve list appends in struct ib_client

    Signed-off-by: Jason Gunthorpe

    Jason Gunthorpe
     

08 Jun, 2019

1 commit


07 Jun, 2019

1 commit

  • When the following test runs for several hours, the problem occurs.

    Server:
    rds-stress -r 1.1.1.16 -D 1M
    Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

    The following output appears:

    "
    Starting up....
    tsks  tx/s  rx/s  tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
       1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
       1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
       1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
       1     0     0       0.00     0.00     0.00     0.00    0.00  -1.00
    "
    From vmcore, we can find that clean_list is NULL.

    From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
    Then rds_ib_mr_pool_flush_worker calls
    "
    rds_ib_flush_mr_pool(pool, 0, NULL);
    "
    Then in function
    "
    int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                             int free_all, struct rds_ib_mr **ibmr_ret)
    "
    ibmr_ret is NULL.

    In the source code,
    "
    ...
    list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
    if (ibmr_ret)
            *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

    /* more than one entry in llist nodes */
    if (clean_nodes->next)
            llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
    ...
    "
    When ibmr_ret is NULL, llist_entry is not executed, and clean_nodes->next
    instead of clean_nodes is added to clean_list.
    So the first clean_nodes entry is discarded; it cannot be used again.
    The workqueue runs periodically, so more and more clean_nodes entries
    are discarded. Eventually clean_list becomes NULL, and the problem
    occurs.
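
    A sketch of the fix: only peel off the first node when there is an
    ibmr_ret to receive it, and re-add whatever remains (hedged):

    if (ibmr_ret) {
            *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
            clean_nodes = clean_nodes->next;
    }
    /* put all remaining clean_nodes back on the clean_list */
    if (clean_nodes)
            llist_add_batch(clean_nodes, clean_tail, &pool->clean_list);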

    Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
     

06 Jun, 2019

2 commits

  • When KASAN is enabled, after several rds connections are
    created and then "rmmod rds_rdma" is run, the following will
    appear:

    "
    BUG rds_ib_incoming (Not tainted): Objects remaining
    in rds_ib_incoming on __kmem_cache_shutdown()

    Call Trace:
    dump_stack+0x71/0xab
    slab_err+0xad/0xd0
    __kmem_cache_shutdown+0x17d/0x370
    shutdown_cache+0x17/0x130
    kmem_cache_destroy+0x1df/0x210
    rds_ib_recv_exit+0x11/0x20 [rds_rdma]
    rds_ib_exit+0x7a/0x90 [rds_rdma]
    __x64_sys_delete_module+0x224/0x2c0
    ? __ia32_sys_delete_module+0x2c0/0x2c0
    do_syscall_64+0x73/0x190
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    "
    This is an rds connection memory leak. The root cause is:
    When "rmmod rds_rdma" is run, rds_ib_remove_one calls
    rds_ib_dev_shutdown to drop the rds connections.
    rds_ib_dev_shutdown calls rds_conn_drop to drop the rds
    connections as below:
    "
    rds_conn_path_drop(&conn->c_path[0], false);
    "
    In the above, destroy is set to false.

    void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
    {
            atomic_set(&cp->cp_state, RDS_CONN_ERROR);

            rcu_read_lock();
            if (!destroy && rds_destroy_pending(cp->cp_conn)) {
                    rcu_read_unlock();
                    return;
            }
            queue_work(rds_wq, &cp->cp_down_w);
            rcu_read_unlock();
    }

    With destroy set to false, rds_destroy_pending is consulted, and the
    rds connections are not moved to ib_nodev_conns.
    So destroy is instead set to true to move the rds connections to
    ib_nodev_conns.
    In rds_ib_unregister_client, flush_workqueue is called to make rds_wq
    finish shutting down the rds connections. The function
    rds_ib_destroy_nodev_conns is called to finally shut down the rds
    connections. Then rds_ib_recv_exit is called to destroy the slabs.

    void rds_ib_recv_exit(void)
    {
            kmem_cache_destroy(rds_ib_incoming_slab);
            kmem_cache_destroy(rds_ib_frag_slab);
    }

    With this, the above slab memory leak will not occur again.
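
    A sketch of the core change (per the reasoning above):

    /* rds_ib_dev_shutdown: pass destroy == true, so rds_destroy_pending
     * is bypassed, the shutdown work is queued, and the connection ends
     * up on ib_nodev_conns for rds_ib_destroy_nodev_conns to reap. */
    rds_conn_path_drop(&conn->c_path[0], true);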

    From tests:
    256 rds connections
    [root@ca-dev14 ~]# time rmmod rds_rdma

    real 0m16.522s
    user 0m0.000s
    sys 0m8.152s
    512 rds connections
    [root@ca-dev14 ~]# time rmmod rds_rdma

    real 0m32.054s
    user 0m0.000s
    sys 0m15.568s

    To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed.
    And with 512 rds connections, about 32 seconds are needed.
    From ftrace, when one rds connection is destroyed:

    "
    19) | rds_conn_destroy [rds]() {
    19) 7.782 us | rds_conn_path_drop [rds]();
    15) | rds_shutdown_worker [rds]() {
    15) | rds_conn_shutdown [rds]() {
    15) 1.651 us | rds_send_path_reset [rds]();
    15) 7.195 us | }
    15) + 11.434 us | }
    19) 2.285 us | rds_cong_remove_conn [rds]();
    19) * 24062.76 us | }
    "
    So when many rds connections are destroyed, the function
    rds_ib_destroy_nodev_conns takes most of the time.

    Suggested-by: Håkon Bugge
    Signed-off-by: Zhu Yanjun
    Signed-off-by: David S. Miller

    Zhu Yanjun
     
  • The variable cache_allocs indicates how many frags (in KiB) are in one
    rds connection frag cache.
    The command "rds-info -Iv" outputs the rds connection cache
    statistics as below:
    "
    RDS IB Connections:
    LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
    1.1.1.14 1.1.1.14 58 255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
    send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
    rdma_mr_size=257, cache_allocs=12
    "
    This means that there are about 12 KiB of frags in this rds connection
    frag cache.
    Since rds.h in rds-tools is not related to the kernel rds.h, the change
    in the kernel rds.h does not affect rds-tools.
    rds-info from rds-tools 2.0.5 and 2.0.6 has been tested with this commit;
    it works well.

    Signed-off-by: Zhu Yanjun
    Signed-off-by: David S. Miller

    Zhu Yanjun
     

22 May, 2019

1 commit