21 Jun, 2018

1 commit

  • [ Upstream commit 93864fc3ffcc4bf70e96cfb5cc6e941630419ad0 ]

    Fix the kernel call initiation to set the minimum security level for kernel
    initiated calls (such as from kAFS) from the sockopt value.

    Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

04 Feb, 2018

1 commit

  • [ Upstream commit f859ab61875978eeaa539740ff7f7d91f5d60006 ]

    Make RxRPC service endpoints expire as they're supposed to by the
    following means:

    (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
    global service conn timeout, otherwise the first rxrpc_net struct to
    die will cause connections on all others to expire immediately from
    then on.

    (2) Mark local service endpoints for which the socket has been closed
    (->service_closed) so that the expiration timeout can be much
    shortened for service and client connections going through that
    endpoint.

    (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
    count reaches 1, not 0, as idle conns have a 1 count.

    (4) The accumulator for the earliest time we might want to schedule for
    should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
    the comparison functions use signed arithmetic.

    (5) Simplify the expiration handling, adding the expiration value to the
    idle timestamp each time rather than keeping track of the time in the
    past before which the idle timestamp must go to be expired. This is
    much easier to read.

    (6) Ignore the timeouts if the net namespace is dead.

    (7) Restart the service reaper work item rather than the client reaper.
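
    Point (4) can be reproduced outside the kernel. The following is a minimal
    userspace sketch, not the kernel code: sketch_time_before() imitates the
    signed-difference comparison that the jiffies comparison helpers perform,
    and SKETCH_MAX_JIFFY_OFFSET is a stand-in that assumes the kernel constant
    is roughly LONG_MAX / 2. It also assumes a two's complement platform where
    the unsigned-to-signed conversion wraps.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for MAX_JIFFY_OFFSET; assumed to be roughly LONG_MAX / 2. */
#define SKETCH_MAX_JIFFY_OFFSET ((unsigned long)((LONG_MAX >> 1) - 1))

/* Wrap-safe "a is earlier than b": look at the sign of the difference. */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Fold one candidate expiry time into the running "earliest" accumulator,
 * as a reaper might do while walking its list of idle connections.
 */
static unsigned long sketch_note_expiry(unsigned long earliest,
					unsigned long candidate)
{
	return sketch_time_before(candidate, earliest) ? candidate : earliest;
}
```

    Seeded with ULONG_MAX, the accumulator never moves, because ULONG_MAX looks
    like "one tick in the past" to a signed comparison. Seeded with the current
    time plus SKETCH_MAX_JIFFY_OFFSET, a nearer expiry replaces it as intended.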

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

22 Oct, 2017

1 commit

  • Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
    call pointer actually holds an error value.
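
    The shape of the bug can be sketched in userspace as follows. This is an
    illustration only: the sketch_ helpers imitate the kernel's error-pointer
    convention, and the call structure, its mutex_held flag and the allocator
    are hypothetical stand-ins for the real rxrpc code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of the error-pointer helpers. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_call {
	int mutex_held;	/* 1 while the (imaginary) user mutex is locked */
};

/* Hypothetical allocator: returns a locked call or an error pointer. */
static struct sketch_call *sketch_new_call(struct sketch_call *storage)
{
	if (!storage)
		return sketch_err_ptr(-ENOMEM);
	storage->mutex_held = 1;
	return storage;
}

/* Only touch the mutex when the pointer really refers to a call. */
static struct sketch_call *sketch_begin_call(struct sketch_call *storage)
{
	struct sketch_call *call = sketch_new_call(storage);

	if (!sketch_is_err(call))
		call->mutex_held = 0;	/* the unlock step */
	return call;
}
```

    Without the sketch_is_err() check, the unlock step would dereference a
    small negative number dressed up as a pointer.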

    Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

29 Aug, 2017

2 commits

  • Allow a client call that failed on network error to be retried, provided
    that the Tx queue still holds DATA packet 1. This allows an operation to
    be submitted to another server or another address for the same server
    without having to repackage and re-encrypt the data so far processed.

    Two new functions are provided:

    (1) rxrpc_kernel_check_call() - This is used to find out the completion
    state of a call to guess whether it can be retried and whether it
    should be retried.

    (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
    connection, reset the state and submit it as a new client call to a
    new address. The new address need not match the previous address.

    A call may be retried even if not all of the data has been loaded into it
    yet; a partially constructed packet will be retained at the same point it
    was at when the error condition was detected. msg_data_left() can be used
    to find out how much data was packaged before the error occurred.

    Signed-off-by: David Howells

    David Howells
     
  • Remove indentation from some blank lines.

    Signed-off-by: David Howells

    David Howells
     

01 Jul, 2017

2 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to the absence of a _hint()
    version of the refcount API. If the _hint() version must
    be used, we might need to revisit the API.
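
    To illustrate the difference from a plain atomic counter, here is a minimal
    C11 sketch of an "increment unless zero" operation that saturates instead
    of wrapping. It is not the kernel implementation; sketch_refcount and its
    helper are hypothetical names, and the saturation point (UINT_MAX) is an
    assumption made for the example.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_refcount {
	atomic_uint refs;
};

/*
 * Take a reference only if the object is still live (count != 0).
 * A saturated counter stays pinned at UINT_MAX rather than wrapping to 0,
 * so an overflow leaks the object instead of freeing it while in use.
 */
static bool sketch_refcount_inc_not_zero(struct sketch_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* already dead: no new refs */
		if (old == UINT_MAX)
			return true;	/* saturated: leave it pinned */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}
```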

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

08 Jun, 2017

2 commits

  • Provide a control message that can be specified on the first sendmsg() of a
    client call or the first sendmsg() of a service response to indicate the
    total length of the data to be transmitted for that call.

    Currently, because the length of the payload of an encrypted DATA packet is
    encrypted in front of the data, the packet cannot be encrypted until we
    know how much data it will hold.

    By specifying the length at the beginning of the transmit phase, each DATA
    packet length can be set before we start loading data from userspace (where
    several sendmsg() calls may contribute to a particular packet).

    An error will be returned if too little or too much data is presented in
    the Tx phase.
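
    The length check described above can be sketched as simple bookkeeping.
    This is an illustration under assumptions, not the AF_RXRPC code: the
    structure, the function and the choice of EMSGSIZE as the error are all
    hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

struct sketch_tx {
	long long remaining;	/* bytes still expected; -1 = none declared */
};

/*
 * Account for one sendmsg()-like chunk of len bytes. more is true when
 * further chunks will follow. Returns 0 or a negative errno; EMSGSIZE is
 * used purely for illustration.
 */
static int sketch_tx_account(struct sketch_tx *tx, long long len, bool more)
{
	if (tx->remaining < 0)
		return 0;		/* no total declared: nothing to check */
	if (len > tx->remaining)
		return -EMSGSIZE;	/* too much data presented */
	if (!more && len != tx->remaining)
		return -EMSGSIZE;	/* too little data presented */
	tx->remaining -= len;
	return 0;
}
```

    Because the remaining total is known from the first chunk onwards, the
    size of every packet, including the last, is known before it is filled.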

    Signed-off-by: David Howells

    David Howells
     
  • Provide a getsockopt() call that can query what cmsg types are supported by
    AF_RXRPC.

    David Howells
     

05 Jun, 2017

3 commits

  • Implement AuriStor's service upgrade facility. There are three problems
    that this is meant to deal with:

    (1) Various of the standard AFS RPC calls have IPv4 addresses in their
    requests and/or replies - but there's no room for including IPv6
    addresses.

    (2) Definition of IPv6-specific RPC operations in the standard operation
    sets has not yet been achieved.

    (3) One could envision the creation of a new service on the same port as
    the original service. The new service could implement improved
    operations - and the client could try this first, falling back to the
    original service if it's not there.

    Unfortunately, certain servers ignore packets addressed to a service
    they don't implement and don't respond in any way - not even with an
    ABORT. This means that the client must then wait for the call timeout
    to occur.

    What service upgrade does is to see if the connection is marked as being
    'upgradeable' and if so, change the service ID in the server and thus the
    request and reply formats. Note that the upgrade isn't mandatory - a
    server that supports only the original call set will ignore the upgrade
    request.

    In the protocol, the procedure is then as follows:

    (1) To request an upgrade, the first DATA packet in a new connection must
    have the userStatus set to 1 (this is normally 0). The userStatus
    value is normally ignored by the server.

    (2) If the server doesn't support upgrading, the reply packets will
    contain the same service ID as for the first request packet.

    (3) If the server does support upgrading, all future reply packets on that
    connection will contain the new service ID and the new service ID will
    be applied to *all* further calls on that connection as well.

    (4) The RPC op used to probe the upgrade must take the same request data
    as the shadow call in the upgrade set (but may return a different
    reply). GetCapability RPC ops were added to all standard sets for
    just this purpose. Ops where the request formats differ cannot be
    used for probing.

    (5) The client must wait for completion of the probe before sending any
    further RPC ops to the same destination. It should then use the
    service ID that recvmsg() reported back in all future calls.

    (6) The shadow service must have call definitions for all the operation
    IDs defined by the original service.

    To support service upgrading, a server should:

    (1) Call bind() twice on its AF_RXRPC socket before calling listen().
    Each bind() should supply a different service ID, but the transport
    addresses must be the same. This allows the server to receive
    requests with either service ID.

    (2) Enable automatic upgrading by calling setsockopt(), specifying
    RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
    unsigned shorts as the argument:

    unsigned short optval[2];

    This specifies a pair of service IDs. They must be different and must
    match the service IDs bound to the socket. Member 0 is the service ID
    to upgrade from and member 1 is the service ID to upgrade to.
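
    The constraints on the optval pair, and the effect of an upgrade request,
    can be sketched as two small pure functions. These are hypothetical
    helpers written for illustration; they make no AF_RXRPC calls and do not
    claim to match the kernel's own validation code.

```c
#include <stdbool.h>

/*
 * Check an { upgrade-from, upgrade-to } pair against the two service IDs
 * bound to the socket: the IDs must differ and each must be a bound ID.
 */
static bool sketch_upgrade_pair_ok(const unsigned short bound[2],
				   const unsigned short optval[2])
{
	if (optval[0] == optval[1])
		return false;
	return (optval[0] == bound[0] && optval[1] == bound[1]) ||
	       (optval[0] == bound[1] && optval[1] == bound[0]);
}

/*
 * Service ID a new connection ends up on: upgraded only if the client
 * asked for an upgrade and addressed the upgrade-from service.
 */
static unsigned short sketch_effective_service(const unsigned short optval[2],
					       unsigned short requested,
					       bool upgrade_requested)
{
	if (upgrade_requested && requested == optval[0])
		return optval[1];
	return requested;
}
```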

    Signed-off-by: David Howells

    David Howells
     
  • Permit bind() to be called on an AF_RXRPC socket more than once (currently
    maximum twice) to bind multiple listening services to it. There are some
    restrictions:

    (1) All bind() calls involved must have a non-zero service ID.

    (2) The service IDs must all be different.

    (3) The rest of the address (notably the transport part) must be the same
    in all (a single UDP socket is shared).

    (4) This must be done before listen() or sendmsg() is called.
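
    The four restrictions can be modelled with a small state check. This is a
    sketch that covers only service binds; the structure, the function and
    the error codes are hypothetical and are not taken from the kernel.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_SERVICES 2

struct sketch_sock {
	unsigned short port;	/* transport part shared by all binds */
	unsigned short service[SKETCH_MAX_SERVICES];
	int nr_services;
	bool active;		/* listen() or sendmsg() already called */
};

/* Error codes are illustrative choices only. */
static int sketch_bind(struct sketch_sock *sk, unsigned short port,
		       unsigned short service_id)
{
	if (sk->active)
		return -EINVAL;			/* rule 4: too late */
	if (service_id == 0)
		return -EINVAL;			/* rule 1: non-zero ID */
	if (sk->nr_services == SKETCH_MAX_SERVICES)
		return -EINVAL;			/* at most two binds */
	for (int i = 0; i < sk->nr_services; i++)
		if (sk->service[i] == service_id)
			return -EADDRINUSE;	/* rule 2: IDs differ */
	if (sk->nr_services > 0 && sk->port != port)
		return -EINVAL;			/* rule 3: same transport */
	sk->port = port;
	sk->service[sk->nr_services++] = service_id;
	return 0;
}
```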

    This allows someone to connect to the service socket with different service
    IDs and lays the foundation for service upgrading.

    The service ID used by an incoming call can be extracted from the msg_name
    returned by recvmsg().

    Signed-off-by: David Howells

    David Howells
     
  • Keep the rxrpc_connection struct's idea of the service ID that is exposed
    in the protocol separate from the service ID that's used as a lookup key.

    This allows the protocol service ID on a client connection to get upgraded
    without making the connection unfindable for other client calls that also
    would like to use the upgraded connection.

    The connection's actual service ID is then returned through recvmsg() by
    way of msg_name.

    Whilst we're at it, we get rid of the last_service_id field from each
    channel. The service ID is per-connection, not per-call and an entire
    connection is upgraded in one go.

    Signed-off-by: David Howells

    David Howells
     

26 May, 2017

1 commit

  • Support network namespacing in AF_RXRPC with the following changes:

    (1) All the local endpoint, peer and call lists, locks, counters, etc. are
    moved into the per-namespace record.

    (2) All the connection tracking is moved into the per-namespace record
    with the exception of the client connection ID tree, which is kept
    global so that connection IDs are kept unique per-machine.

    (3) Each namespace gets its own epoch. This allows each network namespace
    to pretend to be a separate client machine.

    (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
    the contents reflect the namespace.

    fs/afs/ should be okay with this patch as it explicitly requires the current
    net namespace to be init_net to permit a mount to proceed at the moment. It
    will, however, need updating so that cells, IP addresses and DNS records are
    per-namespace also.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

02 Mar, 2017

1 commit

  • All the routines by which rxrpc is accessed from the outside are serialised
    by means of the socket lock (sendmsg, recvmsg, bind,
    rxrpc_kernel_begin_call(), ...) and this presents a problem:

    (1) If a number of calls on the same socket are in the process of
    connecting to the same peer, a maximum of four concurrent live calls
    are permitted before further calls need to wait for a slot.

    (2) If a call is waiting for a slot, it is deep inside sendmsg() or
    rxrpc_kernel_begin_call() and the entry function is holding the socket
    lock.

    (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
    from servicing the other calls as they need to take the socket lock to
    do so.

    (4) The socket is stuck until a call is aborted and makes its slot
    available to the waiter.

    Fix this by:

    (1) Provide each call with a mutex ('user_mutex') that arbitrates access
    by the users of rxrpc separately for each specific call.

    (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
    they've got a call and taken its mutex.

    Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
    set but someone else has the lock. Should I instead only return
    EWOULDBLOCK if there's nothing currently to be done on a socket, and
    sleep in this particular instance because there is something to be
    done, but we appear to be blocked by the interrupt handler doing its
    ping?

    (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
    call, locking its user mutex and adding it to the socket's call tree.
    The call is returned locked so that sendmsg() can add data to it
    immediately.

    From the moment the call is in the socket tree, it is subject to
    access by sendmsg() and recvmsg() - even if it isn't connected yet.

    (4) Lock new service calls in the UDP data_ready handler (in
    rxrpc_new_incoming_call()) because they may already be in the socket's
    tree and the data_ready handler makes them live immediately if a user
    ID has already been preassigned.

    Note that the new call is locked before any notifications are sent
    that it is live, so doing mutex_trylock() *ought* to always succeed.
    Userspace is prevented from doing sendmsg() on calls that are in a
    too-early state in rxrpc_do_sendmsg().

    (5) Make rxrpc_new_incoming_call() return the call with the user mutex
    held so that a ping can be scheduled immediately under it.

    Note that it might be worth moving the ping call into
    rxrpc_new_incoming_call() and then we can drop the mutex there.

    (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
    release the socket after adding the call to the socket's tree. This
    is slightly tricky as we've dequeued the call by that point and have
    to requeue it.

    Note that requeuing emits a trace event.

    (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
    new mutex immediately and don't bother with the socket mutex at all.
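
    The lock hand-off at the heart of this fix can be sketched with POSIX
    mutexes. This is a userspace illustration under assumptions, not the
    kernel code: the structures and sketch_lock_call() are hypothetical, and
    a single pointer stands in for the socket's call tree.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_call {
	pthread_mutex_t user_mutex;
};

struct sketch_sock {
	pthread_mutex_t lock;
	struct sketch_call *call;	/* stands in for the call tree */
};

/*
 * Find the call under the socket lock, take the call's own mutex, then
 * drop the socket lock so that other calls on the socket can progress.
 * Returns 0 with call->user_mutex held, or a negative errno.
 */
static int sketch_lock_call(struct sketch_sock *sk, bool nonblock,
			    struct sketch_call **out)
{
	struct sketch_call *call;
	int ret = 0;

	pthread_mutex_lock(&sk->lock);
	call = sk->call;
	if (!call) {
		ret = -ENOENT;
	} else if (nonblock) {
		/* The MSG_DONTWAIT case: report rather than sleep. */
		if (pthread_mutex_trylock(&call->user_mutex) != 0)
			ret = -EWOULDBLOCK;
	} else {
		pthread_mutex_lock(&call->user_mutex);
	}
	pthread_mutex_unlock(&sk->lock);

	if (ret == 0)
		*out = call;
	return ret;
}
```

    On return the socket lock is free again, so a waiter stuck on one call no
    longer stops sendmsg() or recvmsg() from servicing the others.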

    This patch has the nice bonus that calls on the same socket are now to some
    extent parallelisable.

    Note that we might want to move rxrpc_service_prealloc() calls out from the
    socket lock and give it its own lock, so that we don't hang progress in
    other calls because we're waiting for the allocator.

    We probably also want to avoid calling rxrpc_notify_socket() from within
    the socket lock (rxrpc_accept_call()).

    Signed-off-by: David Howells
    Tested-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
     

09 Jan, 2017

1 commit

  • Allow listen() with a backlog of 0 to be used to disable listening on an
    AF_RXRPC socket. This also releases any preallocation, thereby making it
    easier for a kernel service to account for all allocated call structures
    when shutting down the service.

    The socket cannot thereafter have listening reenabled, but must rather be
    closed and reopened.

    Signed-off-by: David Howells

    David Howells
     

15 Dec, 2016

1 commit

  • Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
    to IDR_SIZE.

    Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Reviewed-by: David Howells
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

06 Oct, 2016

1 commit


30 Sep, 2016

1 commit

  • Reduce the rxrpc_local::services list to just a pointer as we don't permit
    multiple service endpoints to bind to a single transport endpoint (this is
    excluded by rxrpc_lookup_local()).

    The reason we don't allow this is that if you send a request to an AFS
    filesystem service, it will try to talk back to your cache manager on the
    port you sent from (this is how file change notifications are handled). To
    prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
    sockets share a UDP socket if at least one of them has a service bound.

    Signed-off-by: David Howells

    David Howells
     

17 Sep, 2016

2 commits

  • Improve sk_buff tracing within AF_RXRPC by the following means:

    (1) Use an enum to note the event type rather than plain integers and use
    an array of event names rather than a big multi ?: list.

    (2) Distinguish Rx from Tx packets and account them separately. This
    requires the call phase to be tracked so that we know what we might
    find in rxtx_buffer[].

    (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
    event type.

    (4) A pair of 'rotate' events are added to indicate packets that are about
    to be rotated out of the Rx and Tx windows.

    (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
    packet loss injection recording.
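
    The enum-plus-name-table pattern from point (1) looks roughly like the
    sketch below. The event names are hypothetical and chosen only to show
    the Rx/Tx split, rotate and lost events mentioned above; the real enum
    has its own members.

```c
#include <stdbool.h>

/* Hypothetical event set, Rx events first and Tx events second. */
enum sketch_skb_trace {
	sketch_skb_rx_got,
	sketch_skb_rx_freed,
	sketch_skb_rx_rotated,
	sketch_skb_rx_lost,
	sketch_skb_tx_got,
	sketch_skb_tx_freed,
	sketch_skb_tx_rotated,
	sketch_skb_tx_lost,
	sketch_skb__nr_trace
};

/* Designated initialisers keep each name next to its enum value. */
static const char *const sketch_skb_traces[sketch_skb__nr_trace] = {
	[sketch_skb_rx_got]	= "Rx GOT",
	[sketch_skb_rx_freed]	= "Rx FREED",
	[sketch_skb_rx_rotated]	= "Rx ROTATED",
	[sketch_skb_rx_lost]	= "Rx LOST",
	[sketch_skb_tx_got]	= "Tx GOT",
	[sketch_skb_tx_freed]	= "Tx FREED",
	[sketch_skb_tx_rotated]	= "Tx ROTATED",
	[sketch_skb_tx_lost]	= "Tx LOST",
};

/* One table lookup replaces a long chain of ?: operators. */
static const char *sketch_skb_trace_name(enum sketch_skb_trace why)
{
	if ((unsigned int)why >= sketch_skb__nr_trace)
		return "???";
	return sketch_skb_traces[why];
}

/* Lets the caller account Rx and Tx packets separately. */
static bool sketch_skb_is_rx(enum sketch_skb_trace why)
{
	return why < sketch_skb_tx_got;
}
```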

    Signed-off-by: David Howells

    David Howells
     
  • Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
    This is then made conditional on CONFIG_IPV6.

    Without this, the following can be seen:

    net/built-in.o: In function `rxrpc_init_peer':
    >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'

    Reported-by: kbuild test robot
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

14 Sep, 2016

3 commits

  • Add IPv6 support to AF_RXRPC. With this, AF_RXRPC sockets can be created:

    service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);

    instead of:

    service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

    The AFS filesystem doesn't support IPv6 at the moment, though, since that
    requires upgrades to some of the RPC calls.

    Note that a good portion of this patch is replacing "%pI4:%u" in print
    statements with "%pISpc" which is able to handle both protocols and print
    the port.

    Signed-off-by: David Howells

    David Howells
     
  • Create an address for sendmsg() to bind an unbound socket with, rather than
    using a completely blank address; otherwise the transport socket creation
    will fail because it will try to use address family 0.

    We use the address family specified in the protocol argument when the
    AF_RXRPC socket was created and SOCK_DGRAM as the default. For anything
    else, bind() must be used.

    Signed-off-by: David Howells

    David Howells
     
  • Adjust the call ref tracepoint to show references held on a call by the
    kernel API separately as much as possible and add an additional trace at
    the allocation point from the preallocation buffer for an incoming call.

    Note that this doesn't show the allocation of a client call for the kernel
    separately at the moment.

    Signed-off-by: David Howells

    David Howells
     

08 Sep, 2016

3 commits

  • Rewrite the data and ack handling code such that:

    (1) Parsing of received ACK and ABORT packets and the distribution and the
    filing of DATA packets happens entirely within the data_ready context
    called from the UDP socket. This allows us to process and discard ACK
    and ABORT packets much more quickly (they're no longer stashed on a
    queue for a background thread to process).

    (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
    keep track of the offset and length of the content of each packet in
    the sk_buff metadata. This means we don't do any allocation in the
    receive path.

    (3) Jumbo DATA packet parsing is now done in data_ready context. Rather
    than cloning the packet once for each subpacket and pulling/trimming
    it, we file the packet multiple times with an annotation for each
    indicating which subpacket is there. From that we can directly
    calculate the offset and length.

    (4) A call's receive queue can be accessed without taking locks (memory
    barriers do have to be used, though).

    (5) Incoming calls are set up from preallocated resources and immediately
    made live. They can then have packets queued upon them and ACKs
    generated. If insufficient resources exist, DATA packet #1 is given a
    BUSY reply and other DATA packets are discarded.

    (6) sk_buffs no longer take a ref on their parent call.

    To make this work, the following changes are made:

    (1) Each call's receive buffer is now a circular buffer of sk_buff
    pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
    between the call and the socket. This permits each sk_buff to be in
    the buffer multiple times. The receive buffer is reused for the
    transmit buffer.

    (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
    to the data buffer. Transmission phase annotations indicate whether a
    buffered packet has been ACK'd or not and whether it needs
    retransmission.

    Receive phase annotations indicate whether a slot holds a whole packet
    or a jumbo subpacket and, if the latter, which subpacket. They also
    note whether the packet has been decrypted in place.

    (3) DATA packet window tracking is much simplified. Each phase has just
    two numbers representing the window (rx_hard_ack/rx_top and
    tx_hard_ack/tx_top).

    The hard_ack number is the sequence number before the base of the window,
    representing the last packet the other side says it has consumed.
    hard_ack starts from 0 and the first packet is sequence number 1.

    The top number is the sequence number of the highest-numbered packet
    residing in the buffer. Packets between hard_ack+1 and top are
    soft-ACK'd to indicate they've been received, but not yet consumed.

    Four macros, before(), before_eq(), after() and after_eq() are added
    to compare sequence numbers within the window. This allows for the
    top of the window to wrap when the hard-ack sequence number gets close
    to the limit.

    Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
    to indicate when rx_top and tx_top point at the packets with the
    LAST_PACKET bit set, indicating the end of the phase.

    (4) Calls are queued on the socket 'receive queue' rather than packets.
    This means that we don't have to invent dummy packets to queue to
    indicate abnormal/terminal states and we don't have to keep metadata
    packets (such as ABORTs) around.

    (5) The offset and length of a (sub)packet's content are now passed to
    the verify_packet security op. This is currently expected to decrypt
    the packet in place and validate it.

    However, there's now nowhere to store the revised offset and length of
    the actual data within the decrypted blob (there may be a header and
    padding to skip) because an sk_buff may represent multiple packets, so
    a locate_data security op is added to retrieve these details from the
    sk_buff content when needed.

    (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
    individually secured and needs to be individually decrypted. The code
    to do this is broken out into rxrpc_recvmsg_data() and shared with the
    kernel API. It now iterates over the call's receive buffer rather
    than walking the socket receive queue.
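
    The buffer layout from points (1) to (3) can be sketched as follows. This
    is a simplified, single-threaded illustration, not the kernel code: the
    sketch_ names, the ring size and the window checks are assumptions, and
    the memory barriers a lockless version would need are omitted. The
    comparison helpers assume a two's complement platform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 64u		/* must be a power of two */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1)

/* Wrap-safe sequence comparisons via the signed 32-bit difference. */
static bool sketch_before_eq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }
static bool sketch_after(uint32_t a, uint32_t b)     { return (int32_t)(a - b) > 0; }

struct sketch_rx {
	const void *buffer[SKETCH_RING_SIZE];	/* packet pointers */
	uint8_t annotation[SKETCH_RING_SIZE];	/* per-slot notes */
	uint32_t hard_ack;			/* last sequence consumed */
	uint32_t top;				/* highest sequence buffered */
};

/* File a packet or jumbo subpacket under seq; false if not accepted. */
static bool sketch_rx_file(struct sketch_rx *rx, uint32_t seq,
			   const void *pkt, uint8_t annotation)
{
	uint32_t ix = seq & SKETCH_RING_MASK;

	if (sketch_before_eq(seq, rx->hard_ack))
		return false;	/* already consumed */
	if (sketch_after(seq, rx->hard_ack + SKETCH_RING_SIZE))
		return false;	/* beyond the window */
	if (rx->buffer[ix])
		return false;	/* duplicate */
	rx->buffer[ix] = pkt;
	rx->annotation[ix] = annotation;
	if (sketch_after(seq, rx->top))
		rx->top = seq;
	return true;
}

/* Consume the next in-order packet, advancing hard_ack; NULL if absent. */
static const void *sketch_rx_consume(struct sketch_rx *rx)
{
	uint32_t seq = rx->hard_ack + 1;
	uint32_t ix = seq & SKETCH_RING_MASK;
	const void *pkt = rx->buffer[ix];

	if (pkt) {
		rx->buffer[ix] = NULL;
		rx->hard_ack = seq;
	}
	return pkt;
}
```

    Because the same pointer may be filed in several slots with different
    annotations, a jumbo packet needs no cloning: each slot's annotation says
    which subpacket it represents.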

    Additional changes:

    (1) The timers are condensed to a single timer that is set for the soonest
    of three timeouts (delayed ACK generation, DATA retransmission and
    call lifespan).

    (2) Transmission of ACK and ABORT packets is effected immediately from
    process-context socket ops/kernel API calls that cause them instead of
    them being punted off to a background work item. The data_ready
    handler still has to defer to the background, though.

    (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
    filesystem can shut down the socket and flush its own work items
    before closing the socket to deal with any in-progress service calls.

    Future additional changes that will need to be considered:

    (1) Make sure that a call doesn't hog the front of the queue by receiving
    data from the network as fast as userspace is consuming it to the
    exclusion of other calls.

    (2) Transmit delayed ACKs from within recvmsg() when we've consumed
    sufficiently more packets to avoid the background work item needing to
    run.

    Signed-off-by: David Howells

    David Howells
     
  • Make it possible for the data_ready handler called from the UDP transport
    socket to completely instantiate an rxrpc_call structure and make it
    immediately live by preallocating all the memory it might need. The idea
    is to cut out the background thread usage as much as possible.

    [Note that the preallocated structs are not actually used in this patch -
    that will be done in a future patch.]

    If insufficient resources are available in the preallocation buffers, it
    will be possible to discard the DATA packet in the data_ready handler or
    schedule a BUSY packet without the need to schedule an attempt at
    allocation in a background thread.

    To this end:

    (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs, up
    to a maximum number of each equal to the listen backlog size. The
    backlog size is limited to a maximum of 32. Only this many of each can
    be in the preallocation buffer.

    (2) For userspace sockets, the preallocation is charged initially by
    listen() and will be recharged by accepting or rejecting pending
    new incoming calls.

    (3) For kernel services {,re,dis}charging of the preallocation buffers is
    handled manually. Two notifier callbacks have to be provided before
    kernel_listen() is invoked:

    (a) An indication that a new call has been instantiated. This can be
    used to trigger background recharging.

    (b) An indication that a call is being discarded. This is used when
    the socket is being released.

    A function, rxrpc_kernel_charge_accept() is called by the kernel
    service to preallocate a single call. It should be passed the user ID
    to be used for that call and a callback to associate the rxrpc call
    with the kernel service's side of the ID.

    (4) Discard the preallocation when the socket is closed.

    (5) Temporarily bump the refcount on the call allocated in
    rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
    preallocation ref on service calls unconditionally. This will no
    longer be necessary once the preallocation is used.
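
    The charge/take cycle described in this list can be sketched as a small
    ring of preallocated objects. This is a single-threaded illustration with
    hypothetical names; a real producer/consumer split between process context
    and the data_ready handler would also need memory barriers.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BACKLOG 32

struct sketch_call {
	unsigned long user_id;
};

struct sketch_backlog {
	struct sketch_call *slot[SKETCH_MAX_BACKLOG];
	unsigned int head;	/* next slot to fill (process context) */
	unsigned int tail;	/* next slot to take (data_ready context) */
	unsigned int limit;	/* listen backlog, <= SKETCH_MAX_BACKLOG */
};

/* Charge one preallocated call; false if the buffer is already full. */
static bool sketch_charge(struct sketch_backlog *b, struct sketch_call *call)
{
	if (b->head - b->tail >= b->limit)
		return false;
	b->slot[b->head++ % SKETCH_MAX_BACKLOG] = call;
	return true;
}

/*
 * Take a call for an incoming DATA packet without allocating. NULL means
 * the caller should schedule a BUSY packet or discard the DATA packet.
 */
static struct sketch_call *sketch_take(struct sketch_backlog *b)
{
	if (b->head == b->tail)
		return NULL;
	return b->slot[b->tail++ % SKETCH_MAX_BACKLOG];
}
```

    Taking from the ring never allocates, which is what makes it usable from
    the data_ready handler; recharging happens later in process context.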

    Note that this does not yet control the number of active service calls on a
    client - that will come in a later patch.

    A future development would be to provide a setsockopt() call that allows a
    userspace server to manually charge the preallocation buffer. This would
    allow user call IDs to be provided in advance and the awkward manual accept
    stage to be bypassed.

    Signed-off-by: David Howells

    David Howells
     
  • Convert the rxrpc_local::services list to an hlist so that it can be
    accessed under RCU conditions more readily.

    Signed-off-by: David Howells

    David Howells
     

07 Sep, 2016

2 commits

  • rxrpc calls shouldn't hold refs on the sock struct. This was done so that
    the socket wouldn't go away whilst the call was in progress, such that the
    call could reach the socket's queues.

    However, we can mark the socket as requiring an RCU release and rely on the
    RCU read lock.

    To make this work, we do:

    (1) rxrpc_release_call() removes the call's call user ID. This is now
    only called from socket operations and not from the call processor:

    rxrpc_accept_call() / rxrpc_kernel_accept_call()
    rxrpc_reject_call() / rxrpc_kernel_reject_call()
    rxrpc_kernel_end_call()
    rxrpc_release_calls_on_socket()
    rxrpc_recvmsg()

    Though it is also called in the cleanup path of
    rxrpc_accept_incoming_call() before we assign a user ID.

    (2) Pass the socket pointer into rxrpc_release_call() rather than getting
    it from the call so that we can get rid of uninitialised calls.

    (3) Fix call processor queueing to pass a ref to the work queue and to
    release that ref at the end of the processor function (or to pass it
    back to the work queue if we have to requeue).

    (4) Skip out of the call processor function asap if the call is complete
    and don't requeue it if the call is complete.

    (5) Clean up the call immediately that the refcount reaches 0 rather than
    trying to defer it. Actual deallocation is deferred to RCU, however.

    (6) Don't hold socket refs for allocated calls.

    (7) Use the RCU read lock when queueing a message on a socket and treat
    the call's socket pointer according to RCU rules and check it for
    NULL.

    We also need to use the RCU read lock when viewing a call through
    procfs.

    (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
    if this hasn't been done yet so that we can then disconnect the call.
    Once the call is disconnected, it won't have any access to the
    connection struct and the UDP socket for the call work processor to be
    able to send the ACK. Terminal retransmission will be handled by the
    connection processor.

    (9) Release all calls immediately on the closing of a socket rather than
    trying to defer this. Incomplete calls will be aborted.

    The call refcount model is much simplified. Refs are held on the call by:

    (1) A socket's user ID tree.

    (2) A socket's incoming call secureq and acceptq.

    (3) A kernel service that has a call in progress.

    (4) A queued call work processor. We have to take care to put any call
    that we failed to queue.

    (5) sk_buffs on a socket's receive queue. A future patch will get rid of
    this.
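
    The ref-passing rule for the call processor, item (4) here and item (3)
    in the first list, can be sketched as below. This is a single-threaded
    illustration with hypothetical names and a plain int in place of an
    atomic counter; sketch_queue_work() merely imitates a work queue that
    refuses an item that is already pending.

```c
#include <stdbool.h>

struct sketch_call {
	int usage;	/* reference count */
	bool queued;	/* work item already pending? */
	bool complete;
};

/* Stand-in for a work queue: fails if the item is already pending. */
static bool sketch_queue_work(struct sketch_call *call)
{
	if (call->queued)
		return false;
	call->queued = true;
	return true;
}

/* Queue the processor, handing it a ref; put the ref if queueing fails. */
static void sketch_queue_call(struct sketch_call *call)
{
	call->usage++;
	if (!sketch_queue_work(call))
		call->usage--;
}

/* The processor drops its ref on exit unless it requeues itself. */
static void sketch_process_call(struct sketch_call *call, bool more_to_do)
{
	call->queued = false;
	if (!call->complete && more_to_do && sketch_queue_work(call))
		return;		/* ref passed back to the work queue */
	call->usage--;
}
```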

    Whilst we're at it, we can do:

    (1) Get rid of the RXRPC_CALL_EV_RELEASE event. Release is now done
    entirely from the socket routines and never from the call's processor.

    (2) Get rid of the RXRPC_CALL_DEAD state. Calls now end in the
    RXRPC_CALL_COMPLETE state.

    (3) Get rid of the rxrpc_call::destroyer work item. Calls are now torn
    down when their refcount reaches 0 and then handed over to RCU for
    final cleanup.

    (4) Get rid of the rxrpc_call::deadspan timer. Calls are cleaned up
    immediately they're finished with and don't hang around.
    Post-completion retransmission is handled by the connection processor
    once the call is disconnected.

    (5) Get rid of the dead call expiry setting as there's no longer a timer
    to set.

    (6) rxrpc_destroy_all_calls() can just check that the call list is empty.

    Signed-off-by: David Howells

    David Howells
     
  • Improve the call tracking tracepoint by showing more differentiation
    between some of the put and get events, including:

    (1) Getting and putting refs for the socket call user ID tree.

    (2) Getting and putting refs for queueing and failing to queue the call
    processor work item.

    Note that these aren't necessarily used in this patch, but will be taken
    advantage of in future patches.

    An enum is added for the event subtype numbers rather than coding them
    directly as decimal numbers and a table of 3-letter strings is provided
    rather than a sequence of ?: operators.

    Signed-off-by: David Howells

    David Howells
     

05 Sep, 2016

1 commit


02 Sep, 2016

1 commit

  • Don't expose skbs to in-kernel users, such as the AFS filesystem, but
    instead provide a notification hook that indicates that a call needs
    attention and another that indicates that there's a new call to be
    collected.

    This makes the following possibilities more achievable:

    (1) Call refcounting can be made simpler if skbs don't hold refs to calls.

    (2) skbs referring to non-data events will be able to be freed much sooner
    rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
    will be able to consult the call state.

    (3) We can shortcut the receive phase when a call is remotely aborted
    because we don't have to go through all the packets to get to the one
    cancelling the operation.

    (4) It makes it easier to do encryption/decryption directly between AFS's
    buffers and sk_buffs.

    (5) Encryption/decryption can more easily be done in the AFS's thread
    contexts - usually that of the userspace process that issued a syscall
    - rather than in one of rxrpc's background threads on a workqueue.

    (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.

    To make this work, the following interface function has been added:

    int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
                               void *buffer, size_t bufsize, size_t *_offset,
                               bool want_more, u32 *_abort_code);

    This is the recvmsg equivalent. It allows the caller to find out about the
    state of a specific call and to transfer received data into a buffer
    piecemeal.
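
    The piecemeal transfer can be pictured with a userspace mock. This is a
    sketch under stated assumptions, not the kernel implementation:
    demo_recv_data() and struct demo_call are hypothetical, the sock and
    _abort_code parameters are left out, and the return convention (0, 1,
    -EAGAIN, -EBADMSG, -EMSGSIZE) is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a call's receive state; not struct rxrpc_call. */
struct demo_call {
	const unsigned char *rx; /* bytes received so far */
	size_t rx_len;           /* number of valid bytes in rx */
	size_t rx_pos;           /* bytes already handed to the caller */
	bool rx_complete;        /* true once the last packet has arrived */
};

/*
 * Fill buffer[*_offset..bufsize) from the call. Assumed convention:
 *   0         buffer filled and more data is expected (want_more set)
 *   1         buffer filled and the call's data is finished
 *   -EAGAIN   not enough data yet; call again, *_offset keeps our place
 *   -EBADMSG  the data ended before the caller's expectation was met
 *   -EMSGSIZE data is left over although want_more was not set
 */
static int demo_recv_data(struct demo_call *call, void *buffer,
			  size_t bufsize, size_t *_offset, bool want_more)
{
	size_t want, avail, n;

	assert(*_offset <= bufsize);
	want = bufsize - *_offset;
	avail = call->rx_len - call->rx_pos;
	n = want < avail ? want : avail;

	memcpy((unsigned char *)buffer + *_offset, call->rx + call->rx_pos, n);
	*_offset += n;
	call->rx_pos += n;

	if (*_offset < bufsize)
		return call->rx_complete ? -EBADMSG : -EAGAIN;
	if (want_more)
		return (call->rx_complete && call->rx_pos == call->rx_len) ?
			-EBADMSG : 0;
	if (call->rx_pos < call->rx_len)
		return -EMSGSIZE;
	return call->rx_complete ? 1 : -EAGAIN;
}
```

    A caller would typically pull a fixed-size header with want_more set,
    reset its offset to 0, and then pull the body, retrying on -EAGAIN with
    the same offset variable so that a partly filled buffer is resumed
    rather than restarted.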

    afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
    logic between them. They don't wait synchronously yet because the socket
    lock needs to be dealt with.

    Five interface functions have been removed:

    rxrpc_kernel_is_data_last()
    rxrpc_kernel_get_abort_code()
    rxrpc_kernel_get_error_number()
    rxrpc_kernel_free_skb()
    rxrpc_kernel_data_consumed()

    As a temporary hack, sk_buffs going to an in-kernel call are queued on the
    rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
    in-kernel user. To process the queue internally, a temporary function,
    temp_deliver_data(), has been added. This will be replaced with common code
    between the rxrpc_recvmsg() path and the rxrpc_kernel_recv_data() path in a
    future patch.

    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

30 Aug, 2016

1 commit

  • Pass struct socket * to more rxrpc kernel interface functions. They should
    be starting from this rather than the socket pointer in the rxrpc_call
    struct if they need to access the socket.

    I have left:

    rxrpc_kernel_is_data_last()
    rxrpc_kernel_get_abort_code()
    rxrpc_kernel_get_error_number()
    rxrpc_kernel_free_skb()
    rxrpc_kernel_data_consumed()

    unmodified as they're all about to be removed (and, in any case, don't
    touch the socket).

    Signed-off-by: David Howells

    David Howells
     

23 Aug, 2016

1 commit


13 Jul, 2016

1 commit


06 Jul, 2016

2 commits

  • Add RCU destruction for connections and calls as the RCU lookup from the
    transport socket data_ready handler is going to come along shortly.

    Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage
    into the destruction code for the objects that need it (locals and
    connections) and add the extra RCU barrier required for connection cleanup.

    Signed-off-by: David Howells

    David Howells
     
  • Check that the client conns cache is empty before module removal and BUG if
    it is not, listing any connections that are still present. Unfortunately,
    if there are connections still around, then the transport socket is still
    unexpectedly open and active, so we can't just free the connections.

    Signed-off-by: David Howells

    David Howells
     

22 Jun, 2016

5 commits

  • The rxrpc_transport struct is now redundant, given that the rxrpc_peer
    struct is now per peer port rather than per peer host, so get rid of it.

    Service connection lists are transferred to the rxrpc_peer struct, as is
    the conn_lock. Previous patches moved the client connection handling out
    of the rxrpc_transport struct and discarded the connection bundling code.

    Signed-off-by: David Howells

    David Howells
     
  • Kill off the concept of maintaining a bundle of connections to a particular
    target service in order to provide more than four call slots for that
    service (there are four call slots per connection).

    This will make cleaning up the connection handling code easier and
    facilitate removal of the rxrpc_transport struct. Bundling can be
    reintroduced later if necessary.

    Signed-off-by: David Howells

    David Howells
     
  • Provide refcount helper functions for connections so that the code doesn't
    touch local or connection usage counts directly.

    Also make it such that local and peer put functions can take a NULL
    pointer.

    Signed-off-by: David Howells

    David Howells
     
  • Validate the net address given to rxrpc_kernel_begin_call() before using
    it.

    Whilst this should be mostly unnecessary for in-kernel users, it does clear
    the tail of the address struct in case we want to hash or compare the whole
    thing.
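
    Why clearing the tail matters can be shown with a small sketch. The
    struct and function below are hypothetical and much simpler than the
    real address struct; the point is only that two addresses with identical
    meaningful bytes compare equal with memcmp() once the unused tail has
    been zeroed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 10 }; /* illustrative values */

/* Hypothetical address container; not the kernel's address struct. */
struct demo_addr {
	unsigned short family;  /* DEMO_AF_INET or DEMO_AF_INET6 */
	unsigned short len;     /* number of meaningful bytes in data[] */
	unsigned char data[16]; /* address bytes, then an unused tail */
};

/* Whole-struct memcmp() is only meaningful if there is no padding. */
static_assert(sizeof(struct demo_addr) == 4 + 16, "unexpected padding");

/* Check family and length, then zero whatever lies beyond the address. */
static int demo_validate_addr(struct demo_addr *a)
{
	size_t need;

	switch (a->family) {
	case DEMO_AF_INET:
		need = 4;
		break;
	case DEMO_AF_INET6:
		need = 16;
		break;
	default:
		return -EAFNOSUPPORT;
	}
	if (a->len != need)
		return -EINVAL;

	/* Clear the tail so hashing or comparing the whole struct works. */
	memset(a->data + need, 0, sizeof(a->data) - need);
	return 0;
}
```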

    Signed-off-by: David Howells

    David Howells
     
  • Use the IDR facility to allocate client connection IDs on a machine-wide
    basis so that each client connection has a unique identifier. When the
    connection ID space wraps, we advance the epoch by 1, thereby effectively
    having a 62-bit ID space. The IDR facility is then used to look up client
    connections during incoming packet routing instead of using an rbtree
    rooted on the transport.

    This change allows for the removal of the transport in the future and also
    means that client connections can be looked up directly in the data-ready
    handler by connection ID.

    The ID management code is placed in a new file, conn-client.c, to which all
    the client connection-specific code will eventually move.

    Note that the IDR tree gets very expensive on memory if the connection IDs
    are widely scattered throughout the number space, so we shall need to
    retire connections that have, say, an ID more than four times the maximum
    number of client conns away from the current allocation point to try and
    keep the IDs concentrated. We will also need to retire connections from an
    old epoch.
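
    The wrap-and-advance behaviour and the proposed retirement rule can be
    sketched as follows. This is not the IDR-based code from the patch: a
    plain counter stands in for the IDR, all demo_* names are hypothetical,
    and the split into a 30-bit connection counter (the low two bits of the
    ID being left for the four call slots) plus a 32-bit epoch is an
    assumption used here to arrive at 62 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CID_SHIFT 2           /* low 2 bits select one of 4 channels */
#define DEMO_CID_SPACE (1u << 30)  /* assumed 30-bit connection counter */

struct demo_id_state {
	uint32_t epoch;
	uint32_t next_id; /* 1 .. DEMO_CID_SPACE - 1; 0 is never handed out */
};

/* Hand out the next connection ID, advancing the epoch when we wrap. */
static uint32_t demo_alloc_cid(struct demo_id_state *st)
{
	uint32_t id;

	assert(st->next_id > 0 && st->next_id < DEMO_CID_SPACE);
	id = st->next_id++;
	if (st->next_id == DEMO_CID_SPACE) {
		st->next_id = 1;
		st->epoch++;
	}
	return id << DEMO_CID_SHIFT;
}

/*
 * Retirement rule from the text: retire a connection from an old epoch,
 * or one whose ID lies more than four times the maximum number of client
 * conns behind the current allocation point.
 */
static bool demo_should_retire(const struct demo_id_state *st,
			       uint32_t conn_epoch, uint32_t cid,
			       uint32_t max_conns)
{
	uint32_t id = cid >> DEMO_CID_SHIFT;

	if (conn_epoch != st->epoch)
		return true;
	return (uint64_t)(st->next_id - id) > (uint64_t)4 * max_conns;
}
```

    Keeping live IDs within a bounded distance of the allocation point is
    what keeps a sparse tree structure such as the IDR from growing many
    nearly empty nodes.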

    Also note that, for the moment, a pointer to the transport has to be passed
    through into the ID allocation function so that we can take a BH lock to
    prevent a locking issue against in-BH lookup of client connections. This
    will go away later when RCU is used for server connections also.

    Signed-off-by: David Howells

    David Howells