Eric Lee / smarc-fsl-linux-kernel

21 Jun, 2018

1 commit

29475c404 rxrpc: Fix the min security level for kernel calls ... Browse Code »

[ Upstream commit 93864fc3ffcc4bf70e96cfb5cc6e941630419ad0 ]

Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.

Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-06-21 03:02:56 +0800

04 Feb, 2018

1 commit

1392633ba rxrpc: Fix service endpoint expiry ... Browse Code »

[ Upstream commit f859ab61875978eeaa539740ff7f7d91f5d60006 ]

RxRPC service endpoints expire like they're supposed to by the following
means:

(1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
global service conn timeout, otherwise the first rxrpc_net struct to
die will cause connections on all others to expire immediately from
then on.

(2) Mark local service endpoints for which the socket has been closed
(->service_closed) so that the expiration timeout can be much
shortened for service and client connections going through that
endpoint.

(3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
count reaches 1, not 0, as idle conns have a 1 count.

(4) The accumulator for the earliest time we might want to schedule for
should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
the comparison functions use signed arithmetic.

(5) Simplify the expiration handling, adding the expiration value to the
idle timestamp each time rather than keeping track of the time in the
past before which the idle timestamp must go to be expired. This is
much easier to read.

(6) Ignore the timeouts if the net namespace is dead.

(7) Restart the service reaper work item rather the client reaper.

Signed-off-by: David Howells
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-02-04 00:39:01 +0800

22 Oct, 2017

1 commit

6cb3ece96 rxrpc: Don't release call mutex on error pointer ... Browse Code »

Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
call pointer actually holds an error value.

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Marc Dionne
Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2017-10-22 10:05:39 +0800

29 Aug, 2017

2 commits

c038a58cc rxrpc: Allow failed client calls to be retried ... Browse Code »

Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.

Two new functions are provided:

(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.

(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.

A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.

Signed-off-by: David Howells

David Howells
2017-08-29 17:55:20 +0800
3ec0efde5 rxrpc: Remove some excess whitespace ... Browse Code »

Remove indentation from some blank lines.

Signed-off-by: David Howells

David Howells
2017-08-29 17:55:20 +0800

01 Jul, 2017

2 commits

41c6d650f net: convert sock.sk_refcnt from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-01 22:39:08 +0800
14afee4b6 net: convert sock.sk_wmem_alloc from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-01 22:39:08 +0800

08 Jun, 2017

2 commits

e754eba68 rxrpc: Provide a cmsg to specify the amount of Tx data for a call ... Browse Code »

Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.

Signed-off-by: David Howells

David Howells
2017-06-08 00:15:46 +0800
515559ca2 rxrpc: Provide a getsockopt call to query what cmsgs types are supported ... Browse Code »

Provide a getsockopt() call that can query what cmsg types are supported by
AF_RXRPC.

David Howells
2017-06-08 00:15:46 +0800

05 Jun, 2017

3 commits

4722974d9 rxrpc: Implement service upgrade ... Browse Code »

Implement AuriStor's service upgrade facility. There are three problems
that this is meant to deal with:

(1) Various of the standard AFS RPC calls have IPv4 addresses in their
requests and/or replies - but there's no room for including IPv6
addresses.

(2) Definition of IPv6-specific RPC operations in the standard operation
sets has not yet been achieved.

(3) One could envision the creation a new service on the same port that as
the original service. The new service could implement improved
operations - and the client could try this first, falling back to the
original service if it's not there.

Unfortunately, certain servers ignore packets addressed to a service
they don't implement and don't respond in any way - not even with an
ABORT. This means that the client must then wait for the call timeout
to occur.

What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats. Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.

In the protocol, the procedure is then as follows:

(1) To request an upgrade, the first DATA packet in a new connection must
have the userStatus set to 1 (this is normally 0). The userStatus
value is normally ignored by the server.

(2) If the server doesn't support upgrading, the reply packets will
contain the same service ID as for the first request packet.

(3) If the server does support upgrading, all future reply packets on that
connection will contain the new service ID and the new service ID will
be applied to *all* further calls on that connection as well.

(4) The RPC op used to probe the upgrade must take the same request data
as the shadow call in the upgrade set (but may return a different
reply). GetCapability RPC ops were added to all standard sets for
just this purpose. Ops where the request formats differ cannot be
used for probing.

(5) The client must wait for completion of the probe before sending any
further RPC ops to the same destination. It should then use the
service ID that recvmsg() reported back in all future calls.

(6) The shadow service must have call definitions for all the operation
IDs defined by the original service.

To support service upgrading, a server should:

(1) Call bind() twice on its AF_RXRPC socket before calling listen().
Each bind() should supply a different service ID, but the transport
addresses must be the same. This allows the server to receive
requests with either service ID.

(2) Enable automatic upgrading by calling setsockopt(), specifying
RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
unsigned shorts as the argument:

unsigned short optval[2];

This specifies a pair of service IDs. They must be different and must
match the service IDs bound to the socket. Member 0 is the service ID
to upgrade from and member 1 is the service ID to upgrade to.

Signed-off-by: David Howells

David Howells
2017-06-05 21:30:49 +0800
28036f448 rxrpc: Permit multiple service binding ... Browse Code »

Permit bind() to be called on an AF_RXRPC socket more than once (currently
maximum twice) to bind multiple listening services to it. There are some
restrictions:

(1) All bind() calls involved must have a non-zero service ID.

(2) The service IDs must all be different.

(3) The rest of the address (notably the transport part) must be the same
in all (a single UDP socket is shared).

(4) This must be done before listen() or sendmsg() is called.

This allows someone to connect to the service socket with different service
IDs and lays the foundation for service upgrading.

The service ID used by an incoming call can be extracted from the msg_name
returned by recvmsg().

Signed-off-by: David Howells

David Howells
2017-06-05 21:30:49 +0800
68d6d1ae5 rxrpc: Separate the connection's protocol service ID from the lookup ID ... Browse Code »

Keep the rxrpc_connection struct's idea of the service ID that is exposed
in the protocol separate from the service ID that's used as a lookup key.

This allows the protocol service ID on a client connection to get upgraded
without making the connection unfindable for other client calls that also
would like to use the upgraded connection.

The connection's actual service ID is then returned through recvmsg() by
way of msg_name.

Whilst we're at it, we get rid of the last_service_id field from each
channel. The service ID is per-connection, not per-call and an entire
connection is upgraded in one go.

Signed-off-by: David Howells

David Howells
2017-06-05 21:30:49 +0800

26 May, 2017

1 commit

2baec2c3f rxrpc: Support network namespacing ... Browse Code »

Support network namespacing in AF_RXRPC with the following changes:

(1) All the local endpoint, peer and call lists, locks, counters, etc. are
moved into the per-namespace record.

(2) All the connection tracking is moved into the per-namespace record
with the exception of the client connection ID tree, which is kept
global so that connection IDs are kept unique per-machine.

(3) Each namespace gets its own epoch. This allows each network namespace
to pretend to be a separate client machine.

(4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
the contents reflect the namespace.

fs/afs/ should be okay with this patch as it explicitly requires the current
net namespace to be init_net to permit a mount to proceed at the moment. It
will, however, need updating so that cells, IP addresses and DNS records are
per-namespace also.

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2017-05-26 01:15:11 +0800

02 Mar, 2017

1 commit

540b1c48c rxrpc: Fix deadlock between call creation and sendmsg/recvmsg ... Browse Code »

All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:

(1) If a number of calls on the same socket are in the process of
connection to the same peer, a maximum of four concurrent live calls
are permitted before further calls need to wait for a slot.

(2) If a call is waiting for a slot, it is deep inside sendmsg() or
rxrpc_kernel_begin_call() and the entry function is holding the socket
lock.

(3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
from servicing the other calls as they need to take the socket lock to
do so.

(4) The socket is stuck until a call is aborted and makes its slot
available to the waiter.

Fix this by:

(1) Provide each call with a mutex ('user_mutex') that arbitrates access
by the users of rxrpc separately for each specific call.

(2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
they've got a call and taken its mutex.

Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
set but someone else has the lock. Should I instead only return
EWOULDBLOCK if there's nothing currently to be done on a socket, and
sleep in this particular instance because there is something to be
done, but we appear to be blocked by the interrupt handler doing its
ping?

(3) Make rxrpc_new_client_call() unlock the socket after allocating a new
call, locking its user mutex and adding it to the socket's call tree.
The call is returned locked so that sendmsg() can add data to it
immediately.

From the moment the call is in the socket tree, it is subject to
access by sendmsg() and recvmsg() - even if it isn't connected yet.

(4) Lock new service calls in the UDP data_ready handler (in
rxrpc_new_incoming_call()) because they may already be in the socket's
tree and the data_ready handler makes them live immediately if a user
ID has already been preassigned.

Note that the new call is locked before any notifications are sent
that it is live, so doing mutex_trylock() *ought* to always succeed.
Userspace is prevented from doing sendmsg() on calls that are in a
too-early state in rxrpc_do_sendmsg().

(5) Make rxrpc_new_incoming_call() return the call with the user mutex
held so that a ping can be scheduled immediately under it.

Note that it might be worth moving the ping call into
rxrpc_new_incoming_call() and then we can drop the mutex there.

(6) Make rxrpc_accept_call() take the lock on the call it is accepting and
release the socket after adding the call to the socket's tree. This
is slightly tricky as we've dequeued the call by that point and have
to requeue it.

Note that requeuing emits a trace event.

(7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
new mutex immediately and don't bother with the socket mutex at all.

This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.

Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.

We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).

Signed-off-by: David Howells
Tested-by: Marc Dionne
Signed-off-by: David S. Miller

David Howells
2017-03-02 01:50:58 +0800

09 Jan, 2017

1 commit

210f03531 rxrpc: Allow listen(sock, 0) to be used to disable listening ... Browse Code »

Allow listen() with a backlog of 0 to be used to disable listening on an
AF_RXRPC socket. This also releases any preallocation, thereby making it
easier for a kernel service to account for all allocated call structures
when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be
closed and reopened.

Signed-off-by: David Howells

David Howells
2017-01-09 19:10:02 +0800

15 Dec, 2016

1 commit

444306129 rxrpc: abstract away knowledge of IDR internals ... Browse Code »

Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox
Reviewed-by: David Howells
Tested-by: Kirill A. Shutemov
Cc: Konstantin Khlebnikov
Cc: Ross Zwisler
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2016-12-15 08:04:10 +0800

06 Oct, 2016

1 commit

b63452c11 rxrpc: Accesses of rxrpc_local::service need to be RCU managed ... Browse Code »

struct rxrpc_local->service is marked __rcu - this means that accesses of
it need to be managed using RCU wrappers. There are two such places in
rxrpc_release_sock() where the value is checked and cleared. Fix this by
using the appropriate wrappers.

Signed-off-by: David Howells

David Howells
2016-10-06 15:11:48 +0800

30 Sep, 2016

1 commit

1e9e5c952 rxrpc: Reduce the rxrpc_local::services list to a pointer ... Browse Code »

Reduce the rxrpc_local::services list to just a pointer as we don't permit
multiple service endpoints to bind to a single transport endpoints (this is
excluded by rxrpc_lookup_local()).

The reason we don't allow this is that if you send a request to an AFS
filesystem service, it will try to talk back to your cache manager on the
port you sent from (this is how file change notifications are handled). To
prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
sockets share a UDP socket if at least one of them has a service bound.

Signed-off-by: David Howells

David Howells
2016-09-30 05:57:47 +0800

17 Sep, 2016

2 commits

71f3ca408 rxrpc: Improve skb tracing ... Browse Code »

Improve sk_buff tracing within AF_RXRPC by the following means:

(1) Use an enum to note the event type rather than plain integers and use
an array of event names rather than a big multi ?: list.

(2) Distinguish Rx from Tx packets and account them separately. This
requires the call phase to be tracked so that we know what we might
find in rxtx_buffer[].

(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
event type.

(4) A pair of 'rotate' events are added to indicate packets that are about
to be rotated out of the Rx and Tx windows.

(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
packet loss injection recording.

Signed-off-by: David Howells

David Howells
2016-09-17 18:24:04 +0800
d19127473 rxrpc: Make IPv6 support conditional on CONFIG_IPV6 ... Browse Code »

Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.

Without this, the following can be seen:

net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'

Reported-by: kbuild test robot
Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2016-09-17 15:58:45 +0800

14 Sep, 2016

3 commits

75b54cb57 rxrpc: Add IPv6 support ... Browse Code »

Add IPv6 support to AF_RXRPC. With this, AF_RXRPC sockets can be created:

service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);

instead of:

service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

The AFS filesystem doesn't support IPv6 at the moment, though, since that
requires upgrades to some of the RPC calls.

Note that a good portion of this patch is replacing "%pI4:%u" in print
statements with "%pISpc" which is able to handle both protocols and print
the port.

Signed-off-by: David Howells

David Howells
2016-09-14 06:09:13 +0800
cd5892c75 rxrpc: Create an address for sendmsg() to bind unbound socket with ... Browse Code »

Create an address for sendmsg() to bind unbound socket with rather than
using a completely blank address otherwise the transport socket creation
will fail because it will try to use address family 0.

We use the address family specified in the protocol argument when the
AF_RXRPC socket was created and SOCK_DGRAM as the default. For anything
else, bind() must be used.

Signed-off-by: David Howells

David Howells
2016-09-14 06:09:13 +0800
cbd00891d rxrpc: Adjust the call ref tracepoint to show kernel API refs ... Browse Code »

Adjust the call ref tracepoint to show references held on a call by the
kernel API separately as much as possible and add an additional trace to at
the allocation point from the preallocation buffer for an incoming call.

Note that this doesn't show the allocation of a client call for the kernel
separately at the moment.

Signed-off-by: David Howells

David Howells
2016-09-14 05:38:30 +0800

08 Sep, 2016

3 commits

248f219cb rxrpc: Rewrite the data and ack handling code ... Browse Code »

Rewrite the data and ack handling code such that:

(1) Parsing of received ACK and ABORT packets and the distribution and the
filing of DATA packets happens entirely within the data_ready context
called from the UDP socket. This allows us to process and discard ACK
and ABORT packets much more quickly (they're no longer stashed on a
queue for a background thread to process).

(2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
keep track of the offset and length of the content of each packet in
the sk_buff metadata. This means we don't do any allocation in the
receive path.

(3) Jumbo DATA packet parsing is now done in data_ready context. Rather
than cloning the packet once for each subpacket and pulling/trimming
it, we file the packet multiple times with an annotation for each
indicating which subpacket is there. From that we can directly
calculate the offset and length.

(4) A call's receive queue can be accessed without taking locks (memory
barriers do have to be used, though).

(5) Incoming calls are set up from preallocated resources and immediately
made live. They can than have packets queued upon them and ACKs
generated. If insufficient resources exist, DATA packet #1 is given a
BUSY reply and other DATA packets are discarded).

(6) sk_buffs no longer take a ref on their parent call.

To make this work, the following changes are made:

(1) Each call's receive buffer is now a circular buffer of sk_buff
pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
between the call and the socket. This permits each sk_buff to be in
the buffer multiple times. The receive buffer is reused for the
transmit buffer.

(2) A circular buffer of annotations (rxtx_annotations) is kept parallel
to the data buffer. Transmission phase annotations indicate whether a
buffered packet has been ACK'd or not and whether it needs
retransmission.

Receive phase annotations indicate whether a slot holds a whole packet
or a jumbo subpacket and, if the latter, which subpacket. They also
note whether the packet has been decrypted in place.

(3) DATA packet window tracking is much simplified. Each phase has just
two numbers representing the window (rx_hard_ack/rx_top and
tx_hard_ack/tx_top).

The hard_ack number is the sequence number before base of the window,
representing the last packet the other side says it has consumed.
hard_ack starts from 0 and the first packet is sequence number 1.

The top number is the sequence number of the highest-numbered packet
residing in the buffer. Packets between hard_ack+1 and top are
soft-ACK'd to indicate they've been received, but not yet consumed.

Four macros, before(), before_eq(), after() and after_eq() are added
to compare sequence numbers within the window. This allows for the
top of the window to wrap when the hard-ack sequence number gets close
to the limit.

Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
to indicate when rx_top and tx_top point at the packets with the
LAST_PACKET bit set, indicating the end of the phase.

(4) Calls are queued on the socket 'receive queue' rather than packets.
This means that we don't need have to invent dummy packets to queue to
indicate abnormal/terminal states and we don't have to keep metadata
packets (such as ABORTs) around

(5) The offset and length of a (sub)packet's content are now passed to
the verify_packet security op. This is currently expected to decrypt
the packet in place and validate it.

However, there's now nowhere to store the revised offset and length of
the actual data within the decrypted blob (there may be a header and
padding to skip) because an sk_buff may represent multiple packets, so
a locate_data security op is added to retrieve these details from the
sk_buff content when needed.

(6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
individually secured and needs to be individually decrypted. The code
to do this is broken out into rxrpc_recvmsg_data() and shared with the
kernel API. It now iterates over the call's receive buffer rather
than walking the socket receive queue.

Additional changes:

(1) The timers are condensed to a single timer that is set for the soonest
of three timeouts (delayed ACK generation, DATA retransmission and
call lifespan).

(2) Transmission of ACK and ABORT packets is effected immediately from
process-context socket ops/kernel API calls that cause them instead of
them being punted off to a background work item. The data_ready
handler still has to defer to the background, though.

(3) A shutdown op is added to the AF_RXRPC socket so that the AFS
filesystem can shut down the socket and flush its own work items
before closing the socket to deal with any in-progress service calls.

Future additional changes that will need to be considered:

(1) Make sure that a call doesn't hog the front of the queue by receiving
data from the network as fast as userspace is consuming it to the
exclusion of other calls.

(2) Transmit delayed ACKs from within recvmsg() when we've consumed
sufficiently more packets to avoid the background work item needing to
run.

Signed-off-by: David Howells

David Howells
2016-09-08 18:10:12 +0800
00e907127 rxrpc: Preallocate peers, conns and calls for incoming service requests ... Browse Code »

Make it possible for the data_ready handler called from the UDP transport
socket to completely instantiate an rxrpc_call structure and make it
immediately live by preallocating all the memory it might need. The idea
is to cut out the background thread usage as much as possible.

[Note that the preallocated structs are not actually used in this patch -
that will be done in a future patch.]

If insufficient resources are available in the preallocation buffers, it
will be possible to discard the DATA packet in the data_ready handler or
schedule a BUSY packet without the need to schedule an attempt at
allocation in a background thread.

To this end:

(1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a
maximum number each of the listen backlog size. The backlog size is
limited to a maxmimum of 32. Only this many of each can be in the
preallocation buffer.

(2) For userspace sockets, the preallocation is charged initially by
listen() and will be recharged by accepting or rejecting pending
new incoming calls.

(3) For kernel services {,re,dis}charging of the preallocation buffers is
handled manually. Two notifier callbacks have to be provided before
kernel_listen() is invoked:

(a) An indication that a new call has been instantiated. This can be
used to trigger background recharging.

(b) An indication that a call is being discarded. This is used when
the socket is being released.

A function, rxrpc_kernel_charge_accept() is called by the kernel
service to preallocate a single call. It should be passed the user ID
to be used for that call and a callback to associate the rxrpc call
with the kernel service's side of the ID.

(4) Discard the preallocation when the socket is closed.

(5) Temporarily bump the refcount on the call allocated in
rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
preallocation ref on service calls unconditionally. This will no
longer be necessary once the preallocation is used.

Note that this does not yet control the number of active service calls on a
client - that will come in a later patch.

A future development would be to provide a setsockopt() call that allows a
userspace server to manually charge the preallocation buffer. This would
allow user call IDs to be provided in advance and the awkward manual accept
stage to be bypassed.

Signed-off-by: David Howells

David Howells
2016-09-08 18:10:12 +0800
de8d6c740 rxrpc: Convert rxrpc_local::services to an hlist ... Browse Code »

Convert the rxrpc_local::services list to an hlist so that it can be
accessed under RCU conditions more readily.

Signed-off-by: David Howells

David Howells
2016-09-08 18:10:11 +0800

07 Sep, 2016

2 commits

8d94aa381 rxrpc: Calls shouldn't hold socket refs ... Browse Code »

rxrpc calls shouldn't hold refs on the sock struct. This was done so that
the socket wouldn't go away whilst the call was in progress, such that the
call could reach the socket's queues.

However, we can mark the socket as requiring an RCU release and rely on the
RCU read lock.

To make this work, we do:

(1) rxrpc_release_call() removes the call's call user ID. This is now
only called from socket operations and not from the call processor:

rxrpc_accept_call() / rxrpc_kernel_accept_call()
rxrpc_reject_call() / rxrpc_kernel_reject_call()
rxrpc_kernel_end_call()
rxrpc_release_calls_on_socket()
rxrpc_recvmsg()

Though it is also called in the cleanup path of
rxrpc_accept_incoming_call() before we assign a user ID.

(2) Pass the socket pointer into rxrpc_release_call() rather than getting
it from the call so that we can get rid of uninitialised calls.

(3) Fix call processor queueing to pass a ref to the work queue and to
release that ref at the end of the processor function (or to pass it
back to the work queue if we have to requeue).

(4) Skip out of the call processor function asap if the call is complete
and don't requeue it if the call is complete.

(5) Clean up the call immediately that the refcount reaches 0 rather than
trying to defer it. Actual deallocation is deferred to RCU, however.

(6) Don't hold socket refs for allocated calls.

(7) Use the RCU read lock when queueing a message on a socket and treat
the call's socket pointer according to RCU rules and check it for
NULL.

We also need to use the RCU read lock when viewing a call through
procfs.

(8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
if this hasn't been done yet so that we can then disconnect the call.
Once the call is disconnected, it won't have any access to the
connection struct and the UDP socket for the call work processor to be
able to send the ACK. Terminal retransmission will be handled by the
connection processor.

(9) Release all calls immediately on the closing of a socket rather than
trying to defer this. Incomplete calls will be aborted.

The call refcount model is much simplified. Refs are held on the call by:

(1) A socket's user ID tree.

(2) A socket's incoming call secureq and acceptq.

(3) A kernel service that has a call in progress.

(4) A queued call work processor. We have to take care to put any call
that we failed to queue.

(5) sk_buffs on a socket's receive queue. A future patch will get rid of
this.

Whilst we're at it, we can do:

(1) Get rid of the RXRPC_CALL_EV_RELEASE event. Release is now done
entirely from the socket routines and never from the call's processor.

(2) Get rid of the RXRPC_CALL_DEAD state. Calls now end in the
RXRPC_CALL_COMPLETE state.

(3) Get rid of the rxrpc_call::destroyer work item. Calls are now torn
down when their refcount reaches 0 and then handed over to RCU for
final cleanup.

(4) Get rid of the rxrpc_call::deadspan timer. Calls are cleaned up
immediately they're finished with and don't hang around.
Post-completion retransmission is handled by the connection processor
once the call is disconnected.

(5) Get rid of the dead call expiry setting as there's no longer a timer
to set.

(6) rxrpc_destroy_all_calls() can just check that the call list is empty.

Signed-off-by: David Howells

David Howells
2016-09-07 22:33:20 +0800
fff72429c rxrpc: Improve the call tracking tracepoint ... Browse Code »

Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:

(1) Getting and putting refs for the socket call user ID tree.

(2) Getting and putting refs for queueing and failing to queue the call
processor work item.

Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.

An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.

Signed-off-by: David Howells

David Howells
2016-09-07 22:30:22 +0800

05 Sep, 2016

1 commit

5f2d9c443 rxrpc: Randomise epoch and starting client conn ID values ... Browse Code »

Create a random epoch value rather than a time-based one on startup and set
the top bit to indicate that this is the case.

Also create a random starting client connection ID value. This will be
incremented from here as new client connections are created.

Signed-off-by: David Howells

David Howells
2016-09-05 04:41:39 +0800

02 Sep, 2016

1 commit

d001648ec rxrpc: Don't expose skbs to in-kernel users [ver #2] ... Browse Code »

Don't expose skbs to in-kernel users, such as the AFS filesystem, but
instead provide a notification hook the indicates that a call needs
attention and another that indicates that there's a new call to be
collected.

This makes the following possibilities more achievable:

(1) Call refcounting can be made simpler if skbs don't hold refs to calls.

(2) skbs referring to non-data events will be able to be freed much sooner
rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
will be able to consult the call state.

(3) We can shortcut the receive phase when a call is remotely aborted
because we don't have to go through all the packets to get to the one
cancelling the operation.

(4) It makes it easier to do encryption/decryption directly between AFS's
buffers and sk_buffs.

(5) Encryption/decryption can more easily be done in the AFS's thread
contexts - usually that of the userspace process that issued a syscall
- rather than in one of rxrpc's background threads on a workqueue.

(6) AFS will be able to wait synchronously on a call inside AF_RXRPC.

To make this work, the following interface function has been added:

int rxrpc_kernel_recv_data(
struct socket *sock, struct rxrpc_call *call,
void *buffer, size_t bufsize, size_t *_offset,
bool want_more, u32 *_abort_code);

This is the recvmsg equivalent. It allows the caller to find out about the
state of a specific call and to transfer received data into a buffer
piecemeal.

afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
logic between them. They don't wait synchronously yet because the socket
lock needs to be dealt with.

Five interface functions have been removed:

rxrpc_kernel_is_data_last()
rxrpc_kernel_get_abort_code()
rxrpc_kernel_get_error_number()
rxrpc_kernel_free_skb()
rxrpc_kernel_data_consumed()

As a temporary hack, sk_buffs going to an in-kernel call are queued on the
rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
in-kernel user. To process the queue internally, a temporary function,
temp_deliver_data() has been added. This will be replaced with common code
between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
future patch.

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2016-09-02 07:43:27 +0800

30 Aug, 2016

1 commit

4de48af66 rxrpc: Pass struct socket * to more rxrpc kernel interface functions ... Browse Code »

Pass struct socket * to more rxrpc kernel interface functions. They should
be starting from this rather than the socket pointer in the rxrpc_call
struct if they need to access the socket.

I have left:

rxrpc_kernel_is_data_last()
rxrpc_kernel_get_abort_code()
rxrpc_kernel_get_error_number()
rxrpc_kernel_free_skb()
rxrpc_kernel_data_consumed()

unmodified as they're all about to be removed (and, in any case, don't
touch the socket).

Signed-off-by: David Howells

David Howells
2016-08-30 23:07:53 +0800

23 Aug, 2016

1 commit

df844fd46 rxrpc: Use a tracepoint for skb accounting debugging ... Browse Code »

Use a tracepoint to log various skb accounting points to help in debugging
refcounting errors.

Signed-off-by: David Howells

David Howells
2016-08-23 22:27:24 +0800

13 Jul, 2016

1 commit

8addc0440 rxrpc: Fix error handling in af_rxrpc_init() ... Browse Code »

security initialized after alloc workqueue, so we should exit security
before destroy workqueue in the error handing.

Fixes: 648af7fca159 ("rxrpc: Absorb the rxkad security module")
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2016-07-13 02:07:38 +0800

06 Jul, 2016

2 commits

dee46364c rxrpc: Add RCU destruction for connections and calls ... Browse Code »

Add RCU destruction for connections and calls as the RCU lookup from the
transport socket data_ready handler is going to come along shortly.

Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage
into the destruction code for the objects that need it (locals and
connections) and add the extra RCU barrier required for connection cleanup.

Signed-off-by: David Howells

David Howells
2016-07-06 17:43:51 +0800
eb9b9d227 rxrpc: Check that the client conns cache is empty before module removal ... Browse Code »

Check that the client conns cache is empty before module removal and bug if
not, listing any offending connections that are still present. Unfortunately,
if there are connections still around, then the transport socket is still
unexpectedly open and active, so we can't just unallocate the connections.

Signed-off-by: David Howells

David Howells
2016-07-06 17:43:51 +0800

22 Jun, 2016

5 commits

aa390bbe2 rxrpc: Kill off the rxrpc_transport struct ... Browse Code »

The rxrpc_transport struct is now redundant, given that the rxrpc_peer
struct is now per peer port rather than per peer host, so get rid of it.

Service connection lists are transferred to the rxrpc_peer struct, as is
the conn_lock. Previous patches moved the client connection handling out
of the rxrpc_transport struct and discarded the connection bundling code.

Signed-off-by: David Howells

David Howells
2016-06-22 21:00:23 +0800
999b69f89 rxrpc: Kill the client connection bundle concept ... Browse Code »

Kill off the concept of maintaining a bundle of connections to a particular
target service to increase the number of call slots available for any
beyond four for that service (there are four call slots per connection).

This will make cleaning up the connection handling code easier and
facilitate removal of the rxrpc_transport struct. Bundling can be
reintroduced later if necessary.

Signed-off-by: David Howells

David Howells
2016-06-22 16:20:55 +0800
5627cc8b9 rxrpc: Provide more refcount helper functions ... Browse Code »

Provide refcount helper functions for connections so that the code doesn't
touch local or connection usage counts directly.

Also make it such that local and peer put functions can take a NULL
pointer.

Signed-off-by: David Howells

David Howells
2016-06-22 16:17:51 +0800
f4552c2d2 rxrpc: Validate the net address given to rxrpc_kernel_begin_call() ... Browse Code »

Validate the net address given to rxrpc_kernel_begin_call() before using
it.

Whilst this should be mostly unnecessary for in-kernel users, it does clear
the tail of the address struct in case we want to hash or compare the whole
thing.

Signed-off-by: David Howells

David Howells
2016-06-22 16:17:51 +0800
4a3388c80 rxrpc: Use IDR to allocate client conn IDs on a machine-wide basis ... Browse Code »

Use the IDR facility to allocate client connection IDs on a machine-wide
basis so that each client connection has a unique identifier. When the
connection ID space wraps, we advance the epoch by 1, thereby effectively
having a 62-bit ID space. The IDR facility is then used to look up client
connections during incoming packet routing instead of using an rbtree
rooted on the transport.

This change allows for the removal of the transport in the future and also
means that client connections can be looked up directly in the data-ready
handler by connection ID.

The ID management code is placed in a new file, conn-client.c, to which all
the client connection-specific code will eventually move.

Note that the IDR tree gets very expensive on memory if the connection IDs
are widely scattered throughout the number space, so we shall need to
retire connections that have, say, an ID more than four times the maximum
number of client conns away from the current allocation point to try and
keep the IDs concentrated. We will also need to retire connections from an
old epoch.

Also note that, for the moment, a pointer to the transport has to be passed
through into the ID allocation function so that we can take a BH lock to
prevent a locking issue against in-BH lookup of client connections. This
will go away later when RCU is used for server connections also.

Signed-off-by: David Howells

David Howells
2016-06-22 16:10:02 +0800