Eric Lee / smarc-fsl-linux-kernel

19 Mar, 2019

1 commit

c5952168e rxrpc: Fix client call queueing, waiting for channel ... Browse Code »

[ Upstream commit 69ffaebb90369ce08657b5aea4896777b9d6e8fc ]

rxrpc_get_client_conn() adds a new call to the front of the waiting_calls
queue if the connection it's going to use already exists. This is bad as
it allows calls to get starved out.

Fix this by adding to the tail instead.

Also change the other enqueue point in the same function to put it on the
front (ie. when we have a new connection). This makes the point that in
the case of a new connection the new call goes at the front (though it
doesn't actually matter since the queue should be unoccupied).

Fixes: 45025bceef17 ("rxrpc: Improve management and caching of client connection objects")
Signed-off-by: David Howells
Reviewed-by: Marc Dionne
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Howells
2019-03-19 20:12:39 +0800

13 Feb, 2019

1 commit

21b8697e6 rxrpc: bad unlock balance in rxrpc_recvmsg ... Browse Code »

[ Upstream commit 6dce3c20ac429e7a651d728e375853370c796e8d ]

When either "goto wait_interrupted;" or "goto wait_error;"
paths are taken, socket lock has already been released.

This patch fixes following syzbot splat :

WARNING: bad unlock balance detected!
5.0.0-rc4+ #59 Not tainted
-------------------------------------
syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
[] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz-executor223/8256:
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798

stack backtrace:
CPU: 1 PID: 8256 Comm: syz-executor223 Not tainted 5.0.0-rc4+ #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_unlock_imbalance_bug kernel/locking/lockdep.c:3391 [inline]
print_unlock_imbalance_bug.cold+0x114/0x123 kernel/locking/lockdep.c:3368
__lock_release kernel/locking/lockdep.c:3601 [inline]
lock_release+0x67e/0xa00 kernel/locking/lockdep.c:3860
sock_release_ownership include/net/sock.h:1471 [inline]
release_sock+0x183/0x1c0 net/core/sock.c:2808
rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
sock_recvmsg_nosec net/socket.c:794 [inline]
sock_recvmsg net/socket.c:801 [inline]
sock_recvmsg+0xd0/0x110 net/socket.c:797
__sys_recvfrom+0x1ff/0x350 net/socket.c:1845
__do_sys_recvfrom net/socket.c:1863 [inline]
__se_sys_recvfrom net/socket.c:1859 [inline]
__x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:1859
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446379
Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe5da89fd98 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 0000000000446379
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
R13: 0000000000000000 R14: 0000000000000000 R15: 20c49ba5e353f7cf

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: Eric Dumazet
Cc: David Howells
Reported-by: syzbot
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2019-02-13 02:47:22 +0800

23 Nov, 2018

1 commit

36b057507 rxrpc: Fix lockup due to no error backoff after ack transmit error ... Browse Code »

[ Upstream commit c7e86acfcee30794dc99a0759924bf7b9d43f1ca ]

If the network becomes (partially) unavailable, say by disabling IPv6, the
background ACK transmission routine can get itself into a tizzy by
proposing immediate ACK retransmission. Since we're in the call event
processor, that happens immediately without returning to the workqueue
manager.

The condition should clear after a while when either the network comes back
or the call times out.

Fix this by:

(1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
This will allow a certain amount of time to elapse before we try
again.

(2) Enforce a return to the workqueue manager after a certain number of
iterations of the call processing loop.

(3) Add a backoff delay that increases the delay on deferred ACKs by a
jiffy per failed transmission to a limit of HZ. The backoff delay is
cleared on a successful return from kernel_sendmsg().

(4) Cancel calls immediately if the opening sendmsg fails. The layer
above can arrange retransmission or rotate to another server.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-11-23 15:17:07 +0800

04 Nov, 2018

1 commit

fd54c188b Revert "net: simplify sock_poll_wait" ... Browse Code »

[ Upstream commit 89ab066d4229acd32e323f1569833302544a4186 ]

This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317.

This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
internal TCP socket for the initial handshake with the remote peer.
Whenever the SMC connection can not be established this TCP socket is
used as a fallback. All socket operations on the SMC socket are then
forwarded to the TCP socket. In case of poll, the file->private_data
pointer references the SMC socket because the TCP socket has no file
assigned. This causes tcp_poll to wait on the wrong socket.

Signed-off-by: Karsten Graul
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Karsten Graul
2018-11-04 21:50:51 +0800

16 Oct, 2018

4 commits

1890fea79 rxrpc: Fix a missing rxrpc_put_peer() in the error_report handler ... Browse Code »

Fix a missing call to rxrpc_put_peer() on the main path through the
rxrpc_error_report() function. This manifests itself as a ref leak
whenever an ICMP packet or other error comes in.

In commit f334430316e7, the hand-off of the ref to a work item was removed
and was not replaced with a put.

Fixes: f334430316e7 ("rxrpc: Fix error distribution")
Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2018-10-16 14:13:42 +0800
7ec8dc96e rxrpc: Fix incorrect conditional on IPV6 ... Browse Code »

The udpv6_encap_enable() function is part of the ipv6 code, and if that is
configured as a loadable module and rxrpc is built in then a build failure
will occur because the conditional check is wrong:

net/rxrpc/local_object.o: In function `rxrpc_lookup_local':
local_object.c:(.text+0x2688): undefined reference to `udpv6_encap_enable'

Use the correct config symbol (CONFIG_AF_RXRPC_IPV6) in the conditional
check rather than CONFIG_IPV6 as that will do the right thing.

Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Reported-by: kbuild-all@01.org
Reported-by: Arnd Bergmann
Signed-off-by: David Howells
Reviewed-by: Arnd Bergmann
Signed-off-by: David S. Miller

David Howells
2018-10-16 13:19:46 +0800
d6672a5a9 rxrpc: use correct kvec num when sending BUSY response packet ... Browse Code »

Fixes gcc '-Wunused-but-set-variable' warning:

net/rxrpc/output.c: In function 'rxrpc_reject_packets':
net/rxrpc/output.c:527:11: warning:
variable 'ioc' set but not used [-Wunused-but-set-variable]

'ioc' is the correct kvec num when sending a BUSY (or an ABORT) response
packet.

Fixes: ece64fec164f ("rxrpc: Emit BUSY packets when supposed to rather than ABORTs")
Signed-off-by: YueHaibing
Signed-off-by: David Howells
Signed-off-by: David S. Miller

YueHaibing
2018-10-16 13:08:17 +0800
d7b4c24f4 rxrpc: Fix an uninitialised variable ... Browse Code »

Fix an uninitialised variable introduced by the last patch. This can cause
a crash when a new call comes in to a local service, such as when an AFS
fileserver calls back to the local cache manager.

Fixes: c1e15b4944c9 ("rxrpc: Fix the packet reception routine")
Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2018-10-16 13:07:36 +0800

09 Oct, 2018

3 commits

c1e15b494 rxrpc: Fix the packet reception routine ... Browse Code »

The rxrpc_input_packet() function and its call tree was built around the
assumption that data_ready() handler called from UDP to inform a kernel
service that there is data to be had was non-reentrant. This means that
certain locking could be dispensed with.

This, however, turns out not to be the case with a multi-queue network card
that can deliver packets to multiple cpus simultaneously. Each of those
cpus can be in the rxrpc_input_packet() function at the same time.

Fix by adding or changing some structure members:

(1) Add peer->rtt_input_lock to serialise access to the RTT buffer.

(2) Make conn->service_id into a 32-bit variable so that it can be
cmpxchg'd on all arches.

(3) Add call->input_lock to serialise access to the Rx/Tx state. Note
that although the Rx and Tx states are (almost) entirely separate,
there's no point completing the separation and having separate locks
since it's a bi-phasal RPC protocol rather than a bi-direction
streaming protocol. Data transmission and data reception do not take
place simultaneously on any particular call.

and making the following functional changes:

(1) In rxrpc_input_data(), hold call->input_lock around the core to
prevent simultaneous producing of packets into the Rx ring and
updating of tracking state for a particular call.

(2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
The bit test and bit clear can then be combined. No further locking
is needed here.

(3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
the ACK packet. The superseded ACK check is then done both before and
after the lock is taken.

The handing of ackinfo data is split, parsing before the lock is taken
and processing with it held. This is keyed on rxMTU being non-zero.

Congestion management is also done within the locked section.

(4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
rotation. The ACKALL packet carries no information and is only really
useful after all packets have been transmitted since it's imprecise.

(5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
prevent calls being simultaneously implicitly ended on two cpus and
also to prevent any races with incoming call setup.

(6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
on a connection. It is only permitted to happen once for a
connection.

(7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
rx->incoming_lock to see if someone else set up the call, connection
or peer whilst we were getting there. We can't trust the values from
the earlier routing check unless we pin refs on them - which we want
to avoid.

Further, we need to allow for an incoming call to have its state
changed on another CPU between us making it live and us adjusting it
because the conn is now in the RXRPC_CONN_SERVICE state.

(8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
to the RTT buffer. Don't need to lock around setting peer->rtt.

For reference, the inventory of state-accessing or state-altering functions
used by the packet input procedure is:

> rxrpc_input_packet()
* PACKET CHECKING

* ROUTING
> rxrpc_post_packet_to_local()
> rxrpc_find_connection_rcu() - uses RCU
> rxrpc_lookup_peer_rcu() - uses RCU
> rxrpc_find_service_conn_rcu() - uses RCU
> idr_find() - uses RCU

* CONNECTION-LEVEL PROCESSING
- Service upgrade
- Can only happen once per conn
! Changed to use cmpxchg
> rxrpc_post_packet_to_conn()
- Setting conn->hi_serial
- Probably safe not using locks
- Maybe use cmpxchg

* CALL-LEVEL PROCESSING
> Old-call checking
> rxrpc_input_implicit_end_call()
> rxrpc_call_completed()
> rxrpc_queue_call()
! Need to take rx->incoming_lock
> __rxrpc_disconnect_call()
> rxrpc_notify_socket()
> rxrpc_new_incoming_call()
- Uses rx->incoming_lock for the entire process
- Might be able to drop this earlier in favour of the call lock
> rxrpc_incoming_call()
! Conflicts with rxrpc_input_implicit_end_call()
> rxrpc_send_ping()
- Don't need locks to check rtt state
> rxrpc_propose_ACK

* PACKET DISTRIBUTION
> rxrpc_input_call_packet()
> rxrpc_input_data()
* QUEUE DATA PACKET ON CALL
> rxrpc_reduce_call_timer()
- Uses timer_reduce()
! Needs call->input_lock()
> rxrpc_receiving_reply()
! Needs locking around ack state
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_proto_abort()
> rxrpc_input_dup_data()
- Fills the Rx buffer
- rxrpc_propose_ACK()
- rxrpc_notify_socket()

> rxrpc_input_ack()
* APPLY ACK PACKET TO CALL AND DISCARD PACKET
> rxrpc_input_ping_response()
- Probably doesn't need any extra locking
! Need READ_ONCE() on call->ping_serial
> rxrpc_input_check_for_lost_ack()
- Takes call->lock to consult Tx buffer
> rxrpc_peer_add_rtt()
! Needs to take a lock (peer->rtt_input_lock)
! Could perhaps manage with cmpxchg() and xadd() instead
> rxrpc_input_requested_ack
- Consults Tx buffer
! Probably needs a lock
> rxrpc_peer_add_rtt()
> rxrpc_propose_ack()
> rxrpc_input_ackinfo()
- Changes call->tx_winsize
! Use cmpxchg to handle change
! Should perhaps track serial number
- Uses peer->lock to record MTU specification changes
> rxrpc_proto_abort()
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_input_soft_acks()
- Consults the Tx buffer
> rxrpc_congestion_management()
- Modifies the Tx annotations
! Needs call->input_lock()
> rxrpc_queue_call()

> rxrpc_input_abort()
* APPLY ABORT PACKET TO CALL AND DISCARD PACKET
> rxrpc_set_call_completion()
> rxrpc_notify_socket()

> rxrpc_input_ackall()
* APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()

> rxrpc_reject_packet()

There are some functions used by the above that queue the packet, after
which the procedure is terminated:

- rxrpc_post_packet_to_local()
- local->event_queue is an sk_buff_head
- local->processor is a work_struct
- rxrpc_post_packet_to_conn()
- conn->rx_queue is an sk_buff_head
- conn->processor is a work_struct
- rxrpc_reject_packet()
- local->reject_queue is an sk_buff_head
- local->processor is a work_struct

And some that offload processing to process context:

- rxrpc_notify_socket()
- Uses RCU lock
- Uses call->notify_lock to call call->notify_rx
- Uses call->recvmsg_lock to queue recvmsg side
- rxrpc_queue_call()
- call->processor is a work_struct
- rxrpc_propose_ACK()
- Uses call->lock to wrap __rxrpc_propose_ACK()

And a bunch that complete a call, all of which use call->state_lock to
protect the call state:

- rxrpc_call_completed()
- rxrpc_set_call_completion()
- rxrpc_abort_call()
- rxrpc_proto_abort()
- Also uses rxrpc_queue_call()

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells

David Howells
2018-10-09 05:42:04 +0800
647530924 rxrpc: Fix connection-level abort handling ... Browse Code »

Fix connection-level abort handling to cache the abort and error codes
properly so that a new incoming call can be properly aborted if it races
with the parent connection being aborted by another CPU.

The abort_code and error parameters can then be dropped from
rxrpc_abort_calls().

Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
Signed-off-by: David Howells

David Howells
2018-10-09 05:42:04 +0800
298bc15b2 rxrpc: Only take the rwind and mtu values from latest ACK ... Browse Code »

Move the out-of-order and duplicate ACK packet check to before the call to
rxrpc_input_ackinfo() so that the receive window size and MTU size are only
checked in the latest ACK packet and don't regress.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells

David Howells
2018-10-09 05:42:04 +0800

08 Oct, 2018

4 commits

dfe995224 rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window() ... Browse Code »

Carry the call state out of the locked section in rxrpc_rotate_tx_window()
rather than sampling it afterwards. This is only used to select tracepoint
data, but could have changed by the time we do the tracepoint.

Signed-off-by: David Howells

David Howells
2018-10-08 22:57:36 +0800
c479d5f2c rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() ... Browse Code »

We should only call the function to end a call's Tx phase if we rotated the
marked-last packet out of the transmission buffer.

Make rxrpc_rotate_tx_window() return an indication of whether it just
rotated the packet marked as the last out of the transmit buffer, carrying
the information out of the locked section in that function.

We can then check the return value instead of examining RXRPC_CALL_TX_LAST.

Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
Signed-off-by: David Howells

David Howells
2018-10-08 22:55:20 +0800
bfd282111 rxrpc: Don't need to take the RCU read lock in the packet receiver ... Browse Code »

We don't need to take the RCU read lock in the rxrpc packet receive
function because it's held further up the stack in the IP input routine
around the UDP receive routines.

Fix this by dropping the RCU read lock calls from rxrpc_input_packet().
This simplifies the code.

Fixes: 70790dbe3f66 ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
Signed-off-by: David Howells

David Howells
2018-10-08 22:45:56 +0800
5271953ca rxrpc: Use the UDP encap_rcv hook ... Browse Code »

Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception
in which a packet is placed onto the UDP receive queue and then immediately
removed again by rxrpc. Going via the queue in this manner seems like it
should be unnecessary.

This does, however, require the invention of a value to place in encap_type
as that's one of the conditions to switch packets out to the encap_rcv
hook. Possibly the value doesn't actually matter for anything other than
sockopts on the UDP socket, which aren't accessible outside of rxrpc
anyway.

This seems to cut a bit of time out of the time elapsed between each
sk_buff being timestamped and turning up in rxrpc (the final number in the
following trace excerpts). I measured this by making the rxrpc_rx_packet
trace point print the time elapsed between the skb being timestamped and
the current time (in ns), e.g.:

... 424.278721: rxrpc_rx_packet: ... ACK 25026

So doing a 512MiB DIO read from my test server, with an unmodified kernel:

N min max sum mean stddev
27605 2626 7581 7.83992e+07 2840.04 181.029

and with the patch applied:

N min max sum mean stddev
27547 1895 12165 6.77461e+07 2459.29 255.02

Signed-off-by: David Howells

David Howells
2018-10-08 22:45:18 +0800

05 Oct, 2018

2 commits

2cfa22716 rxrpc: Fix the data_ready handler ... Browse Code »

Fix the rxrpc_data_ready() function to pick up all packets and to not miss
any. There are two problems:

(1) The sk_data_ready pointer on the UDP socket is set *after* it is
bound. This means that it's open for business before we're ready to
dequeue packets and there's a tiny window exists in which a packet can
sneak onto the receive queue, but we never know about it.

Fix this by setting the pointers on the socket prior to binding it.

(2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was
an error on the transmission side, even though we set the
sk_error_report hook. Because rxrpc_data_ready() returns immediately
in such a case, it never actually removes its packet from the receive
queue.

Fix this by abstracting out the UDP dequeuing and checksumming into a
separate function that keeps hammering on skb_recv_udp() until it
returns -EAGAIN, passing the packets extracted to the remainder of the
function.

and two potential problems:

(3) It might be possible in some circumstances or in the future for
packets to be being added to the UDP receive queue whilst rxrpc is
running consuming them, so the data_ready() handler might get called
less often than once per packet.

Allow for this by fully draining the queue on each call as (2).

(4) If a packet fails the checksum check, the code currently returns after
discarding the packet without checking for more.

Allow for this by fully draining the queue on each call as (2).

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells
Acked-by: Paolo Abeni

David Howells
2018-10-05 21:21:59 +0800
5e33a23ba rxrpc: Fix some missed refs to init_net ... Browse Code »

Fix some refs to init_net that should've been changed to the appropriate
network namespace.

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells
Acked-by: Paolo Abeni

David Howells
2018-10-05 21:21:59 +0800

28 Sep, 2018

7 commits

f33443031 rxrpc: Fix error distribution ... Browse Code »

Fix error distribution by immediately delivering the errors to all the
affected calls rather than deferring them to a worker thread. The problem
with the latter is that retries and things can happen in the meantime when we
want to stop that sooner.

To this end:

(1) Stop the error distributor from removing calls from the error_targets
list so that peer->lock isn't needed to synchronise against other adds
and removals.

(2) Require the peer's error_targets list to be accessed with RCU, thereby
avoiding the need to take peer->lock over distribution.

(3) Don't attempt to affect a call's state if it is already marked complete.

Signed-off-by: David Howells

David Howells
2018-09-28 17:33:17 +0800
37a675e76 rxrpc: Fix transport sockopts to get IPv4 errors on an IPv6 socket ... Browse Code »

It seems that enabling IPV6_RECVERR on an IPv6 socket doesn't also turn on
IP_RECVERR, so neither local errors nor ICMP-transported remote errors from
IPv4 peer addresses are returned to the AF_RXRPC protocol.

Make the sockopt setting code in rxrpc_open_socket() fall through from the
AF_INET6 case to the AF_INET case to turn on all the AF_INET options too in
the AF_INET6 case.

Fixes: f2aeed3a591f ("rxrpc: Fix error reception on AF_INET6 sockets")
Signed-off-by: David Howells

David Howells
2018-09-28 17:33:10 +0800
0099dc589 rxrpc: Make service call handling more robust ... Browse Code »

Make the following changes to improve the robustness of the code that sets
up a new service call:

(1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
service ID check and pass that along to rxrpc_new_incoming_call().
This means that I can remove the check from rxrpc_new_incoming_call()
without the need to worry about the socket attached to the local
endpoint getting replaced - which would invalidate the check.

(2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
done once. The peer is passed to rxrpc_new_incoming_call(), thereby
saving the need to repeat the search.

This also reduces the possibility of rxrpc_publish_service_conn()
BUG()'ing due to the detection of a duplicate connection, despite the
initial search done by rxrpc_find_connection_rcu() having turned up
nothing.

This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
non-reentrant and the result of the initial search should still hold
true, but it has proven possible to hit.

I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
the iteration over the hash table if it finds a matching peer with a
zero usage count, but I don't know for sure since it's only ever been
hit once that I know of.

Another possibility is that a bug in rxrpc_data_ready() that checked
the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
might've let through a packet that caused a spurious and invalid call
to be set up. That is addressed in another patch.

(3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
usage count rather than stopping and returning not found, just in case
there's another peer record behind it in the bucket.

(4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
rather either use the peer cached in (2) or, if one wasn't found,
preemptively install a new one.

Fixes: 8496af50eb38 ("rxrpc: Use RCU to access a peer's service connection tree")
Signed-off-by: David Howells

David Howells
2018-09-28 17:32:49 +0800
403fc2a13 rxrpc: Improve up-front incoming packet checking ... Browse Code »

Do more up-front checking on incoming packets to weed out invalid ones and
also ones aimed at services that we don't support.

Whilst we're at it, replace the clearing of call and skew if we don't find
a connection with just initialising the variables to zero at the top of the
function.

Signed-off-by: David Howells

David Howells
2018-09-28 17:32:31 +0800
ece64fec1 rxrpc: Emit BUSY packets when supposed to rather than ABORTs ... Browse Code »

In the input path, a received sk_buff can be marked for rejection by
setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
(such as an abort code) in skb->priority. The rejection is handled by
queueing the sk_buff up for dealing with in process context. The output
code reads the mark and priority and, theoretically, generates an
appropriate response packet.

However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
message with a random abort code is generated (since skb->priority wasn't
set to anything).

Fix this by outputting the appropriate sort of packet.

Also, whilst we're at it, most of the marks are no longer used, so remove
them and rename the remaining two to something more obvious.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells

David Howells
2018-09-28 17:32:19 +0800
b604dd988 rxrpc: Fix RTT gathering ... Browse Code »

Fix RTT information gathering in AF_RXRPC by the following means:

(1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.

(2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
collects it, set it at that point.

(3) Allow ACKs to be requested on the last packet of a client call, but
not a service call. We need to be careful lest we undo:

bf7d620abf22c321208a4da4f435e7af52551a21
Author: David Howells
Date: Thu Oct 6 08:11:51 2016 +0100
rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase

but that only really applies to service calls that we're handling,
since the client side gets to send the final ACK (or not).

(4) When about to transmit an ACK or DATA packet, record the Tx timestamp
before only; don't update the timestamp afterwards.

(5) Switch the ordering between recording the serial and recording the
timestamp to always set the serial number first. The serial number
shouldn't be seen referenced by an ACK packet until we've transmitted
the packet bearing it - so in the Rx path, we don't need the timestamp
until we've checked the serial number.

Fixes: cf1a6474f807 ("rxrpc: Add per-peer RTT tracker")
Signed-off-by: David Howells

David Howells
2018-09-28 17:32:03 +0800
dc71db34e rxrpc: Fix checks as to whether we should set up a new call ... Browse Code »

There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
flag in the packet type field rather than in the packet flags field.

Fix this by creating a pair of helper functions to check whether the packet
is going to the client or to the server and use them generally.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells

David Howells
2018-09-28 17:31:20 +0800

27 Sep, 2018

1 commit

092ffc51f rxrpc: Remove dup code from rxrpc_find_connection_rcu() ... Browse Code »

rxrpc_find_connection_rcu() initialises variable k twice with the same
information. Remove one of the initialisations.

Signed-off-by: David Howells

David Howells
2018-09-27 22:13:07 +0800

16 Aug, 2018

1 commit

9a76aba02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Highlights:

- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.

- Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
Luca Coelho.

- Re-enable ASPM in r8169, from Kai-Heng Feng.

- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.

- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumluation.

- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.

- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

- Lots of cleanups and fixes in L2TP code from Guillaume Nault.

- Add IPSEC offload support to netdevsim, from Shannon Nelson.

- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.

- Add UDP GSO support to mlx5e, from Boris Pismenny.

- Support offloading of Team LAG in NFP, from John Hurley.

- Allow to configure TX queue selection based upon RX queue, from
Amritha Nambiar.

- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.

- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.

- Busyloop optimizations in vhost_net, from Toshiaki Makita.

- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.

- Add parameter support to devlink, from Moshe Shemesh.

- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.

- Lots of prepatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.

- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.

- Support regions and region snapshots in devlink, from Alex Vesker.

- Allow to attach XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.

- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

- Use PHYLIB in r8169 driver, from Heiner Kallweit.

- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.

- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.

- Support for templates in packet scheduler classifier, from Jiri
Pirko.

- IPV6 support in RDS, from Ka-Cheong Poon.

- Native tproxy support in nf_tables, from Máté Eckl.

- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.

- Improvde handling of ACKs on hole repairs, from Yuchung Cheng"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...

Linus Torvalds
2018-08-16 06:04:25 +0800

14 Aug, 2018

1 commit

de5d1b39e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking/atomics update from Thomas Gleixner:
"The locking, atomics and memory model brains delivered:

- A larger update to the atomics code which reworks the ordering
barriers, consolidates the atomic primitives, provides the new
atomic64_fetch_add_unless() primitive and cleans up the include
hell.

- Simplify cmpxchg() instrumentation and add instrumentation for
xchg() and cmpxchg_double().

- Updates to the memory model and documentation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/atomics: Rework ordering barriers
locking/atomics: Instrument cmpxchg_double*()
locking/atomics: Instrument xchg()
locking/atomics: Simplify cmpxchg() instrumentation
locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
tools/memory-model: Rename litmus tests to comply to norm7
tools/memory-model/Documentation: Fix typo, smb->smp
sched/Documentation: Update wake_up() & co. memory-barrier guarantees
locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
sched/core: Use smp_mb() in wake_woken_function()
tools/memory-model: Add informal LKMM documentation to MAINTAINERS
locking/atomics/Documentation: Describe atomic_set() as a write operation
tools/memory-model: Make scripts executable
tools/memory-model: Remove ACCESS_ONCE() from model
tools/memory-model: Remove ACCESS_ONCE() from recipes
locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
tools/memory-model: Add litmus test for full multicopy atomicity
locking/refcount: Always allow checked forms
...

Linus Torvalds
2018-08-14 03:23:39 +0800

12 Aug, 2018

1 commit

98ed1e642 rxrpc: remove redundant static int 'zero' ... Browse Code »

The static int 'zero' is defined but is never used hence it is
redundant and can be removed. The use of this variable was removed
with commit a158bdd3247b ("rxrpc: Fix call timeouts").

Cleans up clang warning:
warning: 'zero' defined but not used [-Wunused-const-variable=]

Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller

Colin Ian King
2018-08-12 02:25:18 +0800

10 Aug, 2018

1 commit

a736e0746 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Overlapping changes in RXRPC, changing to ktime_get_seconds() whilst
adding some tracepoints.

Signed-off-by: David S. Miller

David S. Miller
2018-08-10 02:52:36 +0800

09 Aug, 2018

1 commit

330bdcfad rxrpc: Fix the keepalive generator [ver #2] ... Browse Code »

AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open. The implementation is incorrect in the following ways:

(1) It mixes up ktime_t and time64_t types.

(2) It uses ktime_get_real(), the output of which may jump forward or
backward due to adjustments to the time of day.

(3) If the current time jumps forward too much or jumps backwards, the
generator function will crank the base of the time ring round one slot
at a time (ie. a 1s period) until it catches up, spewing out VERSION
packets as it goes.

Fix the problem by:

(1) Only using time64_t. There's no need for sub-second resolution.

(2) Use ktime_get_seconds() rather than ktime_get_real() so that time
isn't perceived to go backwards.

(3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
parts:

(a) The "worker" function that manages the buckets and the timer.

(b) The "dispatch" function that takes the pending peers and
potentially transmits a keepalive packet before putting them back
in the ring into the slot appropriate to the revised last-Tx time.

(4) Taking everything that's pending out of the ring and splicing it into
a temporary collector list for processing.

In the case that there's been a significant jump forward, the ring
gets entirely emptied and then the time base can be warped forward
before the peers are processed.

The warping can't happen if the ring isn't empty because the slot a
peer is in is keepalive-time dependent, relative to the base time.

(5) Limit the number of iterations of the bucket array when scanning it.

(6) Set the timer to skip any empty slots as there's no point waking up if
there's nothing to do yet.

This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started. The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
...
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
EIP: lock_acquire+0x69/0x80
...
Call Trace:
? rxrpc_peer_keepalive_worker+0x5e/0x350
? _raw_spin_lock_bh+0x29/0x60
? rxrpc_peer_keepalive_worker+0x5e/0x350
? rxrpc_peer_keepalive_worker+0x5e/0x350
? __lock_acquire+0x3d3/0x870
? process_one_work+0x110/0x340
? process_one_work+0x166/0x340
? process_one_work+0x110/0x340
? worker_thread+0x39/0x3c0
? kthread+0xdb/0x110
? cancel_delayed_work+0x90/0x90
? kthread_stop+0x70/0x70
? ret_from_fork+0x19/0x24

Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Reported-by: kernel test robot
Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2018-08-09 10:10:26 +0800

04 Aug, 2018

2 commits

eb9950eb3 rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to caller ... Browse Code »

Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow
non-contiguous iovs to be passed down, thereby permitting file reading to
be simplified in the AFS filesystem in a future patch.

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2018-08-04 03:46:20 +0800
54424d389 rxrpc: Reuse SKCIPHER_REQUEST_ON_STACK buffer ... Browse Code »

The use of SKCIPHER_REQUEST_ON_STACK() will trigger FRAME_WARN warnings
(when less than 2048) once the VLA is no longer hidden from the check:

net/rxrpc/rxkad.c:398:1: warning: the frame size of 1152 bytes is larger than 1024 bytes [-Wframe-larger-than=]
net/rxrpc/rxkad.c:242:1: warning: the frame size of 1152 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This passes the initial SKCIPHER_REQUEST_ON_STACK allocation to the leaf
functions for reuse. Two requests allocated on the stack is not needed
when only one is used at a time.

Signed-off-by: Kees Cook
Acked-by: Arnd Bergmann
Signed-off-by: David Howells
Signed-off-by: David S. Miller

Kees Cook
2018-08-04 00:48:12 +0800

03 Aug, 2018

2 commits

89b1698c9 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The BTF conflicts were simple overlapping changes.

The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.

Signed-off-by: David S. Miller

David S. Miller
2018-08-03 01:55:32 +0800
770b26de1 rxrpc: Remove set but not used variable 'nowj' ... Browse Code »

Fixes gcc '-Wunused-but-set-variable' warning:

net/rxrpc/proc.c: In function 'rxrpc_call_seq_show':
net/rxrpc/proc.c:66:29: warning:
variable 'nowj' set but not used [-Wunused-but-set-variable]
unsigned long timeout = 0, nowj;
^

Signed-off-by: Wei Yongjun
Signed-off-by: David Howells
Signed-off-by: David S. Miller

Wei Yongjun
2018-08-03 01:18:20 +0800

02 Aug, 2018

1 commit

c01f6c9b3 rxrpc: Fix user call ID check in rxrpc_service_prealloc_one ... Browse Code »

There just check the user call ID isn't already in use, hence should
compare user_call_ID with xcall->user_call_ID, which is current
node's user_call_ID.

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Suggested-by: David Howells
Signed-off-by: YueHaibing
Signed-off-by: David Howells
Signed-off-by: David S. Miller

YueHaibing
2018-08-02 02:49:49 +0800

01 Aug, 2018

5 commits

d0b35a420 rxrpc: Transmit more ACKs during data reception ... Browse Code »

Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() -
which transfers data to the target buffers - if we previously had an Rx
underrun (ie. we returned -EAGAIN because we ran out of received data).
This lets the server know what we've managed to receive something.

Also flush any outstanding ACK after calling the function if it hit -EAGAIN
to let the server know we processed some data.

It might be better to send more ACKs, possibly on a time-based scheme, but
that needs some more consideration.

With this and some additional AFS patches, it is possible to get large
unencrypted O_DIRECT reads to be almost as fast as NFS over TCP. It looks
like it might be theoretically possible to improve performance yet more for
a server running a single operation as investigation of packet timestamps
indicates that the server keeps stalling.

The issue appears to be that rxrpc runs in to trouble with ACK packets
getting batched together (up to ~32 at a time) somewhere between the IP
transmit queue on the client and the ethernet receive queue on the server.

However, this case isn't too much of a worry as even a lightly loaded
server should be receiving sufficient packet flux to flush the ACK packets
to the UDP socket.

Signed-off-by: David Howells

David Howells
2018-08-01 20:28:24 +0800
a71a2651b rxrpc: Propose, but don't immediately transmit, the final ACK for a call ... Browse Code »

The final ACK that closes out an rxrpc call needs to be transmitted by the
client unless we're going to follow up with a DATA packet for a new call on
the same channel (which implicitly ACK's the previous call, thereby saving
an ACK).

Currently, we don't do that, so if no follow on call is immediately
forthcoming, the server will resend the last DATA packet - at which point
rxrpc_conn_retransmit_call() will be triggered and will (re)send the final
ACK. But the server has to hold on to the last packet until the ACK is
received, thereby holding up its resources.

Fix the client side to propose a delayed final ACK, to be transmitted after
a short delay, assuming the call isn't superseded by a new one.

Signed-off-by: David Howells

David Howells
2018-08-01 20:28:24 +0800
4075295ab rxrpc: Increase the size of a call's Rx window ... Browse Code »

Increase the size of a call's Rx window from 32 to 63 - ie. one less than
the size of the ring buffer. This makes large data transfers perform
better when the Tx window on the other side is around 64 (as is the case
with Auristor's YFS fileserver).

If the server window size is ~32 or smaller, this should make no
difference.

Signed-off-by: David Howells

David Howells
2018-08-01 20:28:23 +0800
4272d3034 rxrpc: Trace socket notification ... Browse Code »

Trace notifications from the softirq side of the socket to the
process-context side.

Signed-off-by: David Howells

David Howells
2018-08-01 20:28:23 +0800
4764c0da6 rxrpc: Trace packet transmission ... Browse Code »

Trace successful packet transmission (kernel_sendmsg() succeeded, that is)
in AF_RXRPC. We can share the enum that defines the transmission points
with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be
applicable to both.

Also, save the internal call->debug_id in the rxrpc_channel struct so that
it can be used in retransmission trace lines.

Signed-off-by: David Howells

David Howells
2018-08-01 20:28:23 +0800