30 Sep, 2016

12 commits

  • The call timer's concept of an inactive call timeout (of which there are
    three) is one that has the same expiration time as the call expiration
    timeout (the expiration timer is never inactive). However, I'm not
    resetting the timeouts when they expire, leading to repeated processing of
    expired timeouts when other timeout events occur.

    Fix this by:

    (1) Move the timer expiry detection into rxrpc_set_timer() inside the
    locked section. This means that if a timeout is set that will expire
    immediately, we deal with it immediately.

    (2) If a timeout is at or before now then it has expired. When an expiry
    is detected, an event is raised, the timeout is automatically
    inactivated and the event processor is queued.

    (3) If a timeout is at or after the expiry timeout then it is inactive.
    Inactive timeouts do not contribute to the timer setting.

    (4) The call timer callback can now just call rxrpc_set_timer() to handle
    things.

    (5) The call processor work function now checks the event flags rather
    than checking the timeouts directly.
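
    The expired/inactive rules in (2) and (3) can be sketched in userspace C
    as below. This is only an illustration, not the kernel code: the struct,
    the event flag names and the use of plain integers for times are all
    assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical event flags and timeout fields, for illustration only. */
enum { EV_RESEND = 1 << 0, EV_ACK = 1 << 1, EV_EXPIRED = 1 << 2 };

struct call_timers {
	uint64_t resend_at;	/* inactive when >= expire_at */
	uint64_t ack_at;	/* inactive when >= expire_at */
	uint64_t expire_at;	/* never inactive */
	unsigned int events;	/* raised event flags */
};

/* Check one timeout: if it has expired, raise its event and inactivate it;
 * if it is still pending, let it pull the timer setting earlier. */
static void check_timeout(struct call_timers *c, uint64_t *t, uint64_t now,
			  unsigned int event, uint64_t *next)
{
	if (*t >= c->expire_at)
		return;			/* inactive: does not contribute */
	if (*t <= now) {
		c->events |= event;	/* expired: raise the event ... */
		*t = c->expire_at;	/* ... and inactivate the timeout */
		return;
	}
	if (*t < *next)
		*next = *t;
}

/* Returns the time at which the timer should next fire.  A caller would
 * queue the event processor if c->events is non-zero afterwards. */
static uint64_t set_timer(struct call_timers *c, uint64_t now)
{
	uint64_t next = c->expire_at;

	if (c->expire_at <= now) {
		c->events |= EV_EXPIRED;
		return now;
	}
	check_timeout(c, &c->resend_at, now, EV_RESEND, &next);
	check_timeout(c, &c->ack_at, now, EV_ACK, &next);
	return next;
}
```

    Because an expired timeout is reset to the expiry time, a second call at
    the same instant does not process it again.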

    Signed-off-by: David Howells

    David Howells
     
  • Keep the call timeouts as ktimes rather than jiffies so that they can be
    expressed as functions of RTT.

    Signed-off-by: David Howells

    David Howells
     
  • Remove error from struct rxrpc_skb_priv as it is no longer used.

    Signed-off-by: David Howells

    David Howells
     
  • The offset field in struct rxrpc_skb_priv is unnecessary as the value can
    always be calculated.

    Signed-off-by: David Howells

    David Howells
     
  • When we receive an ACK from the peer that tells us what the peer's receive
    window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller
    than ssthresh.
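
    As a minimal sketch of the clamp, assuming a hypothetical congestion
    state struct counted in packets (the names are not taken from the
    kernel source):

```c
/* Hypothetical congestion state, counted in packets. */
struct cong_state {
	unsigned int cwnd;
	unsigned int ssthresh;
};

/* Apply the receive window (rwind) that the peer advertised in an ACK:
 * ssthresh is only ever reduced to rwind, never raised to it. */
static void apply_peer_rwind(struct cong_state *cs, unsigned int rwind)
{
	if (rwind < cs->ssthresh)
		cs->ssthresh = rwind;
}
```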

    Signed-off-by: David Howells

    David Howells
     
  • Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying
    on cwnd getting incremented beyond ssthresh and the window size, the mode
    being shifted and then cwnd being corrected.

    We need to make sure we switch into CA mode so that we stop marking every
    packet for ACK.
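
    A rough sketch of the mode switch follows. The struct, the enum names
    and the one-packet window increment per ACK are assumptions made for
    illustration; only the "switch once cwnd reaches ssthresh" rule and the
    link to ACK marking come from the description above.

```c
#include <stdbool.h>

enum cong_mode { MODE_SLOW_START, MODE_CONGEST_AVOIDANCE };

/* Hypothetical congestion state, counted in packets. */
struct cong_state {
	unsigned int cwnd;
	unsigned int ssthresh;
	enum cong_mode mode;
};

/* Handle an ACK that newly acknowledges data whilst in slow start. */
static void slow_start_on_ack(struct cong_state *cs)
{
	if (cs->mode != MODE_SLOW_START)
		return;
	cs->cwnd += 1;
	/* Switch as soon as cwnd reaches ssthresh rather than waiting for
	 * it to be incremented beyond ssthresh and corrected afterwards. */
	if (cs->cwnd >= cs->ssthresh)
		cs->mode = MODE_CONGEST_AVOIDANCE;
}

/* Every packet is marked for ACK only whilst in slow start. */
static bool mark_every_packet_for_ack(const struct cong_state *cs)
{
	return cs->mode == MODE_SLOW_START;
}
```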

    Signed-off-by: David Howells

    David Howells
     
  • Note the serial number of the packet being ACK'd in the congestion
    management trace rather than the serial number of the ACK packet. Whilst
    the serial number of the ACK packet is useful for matching ACK packets in
    the output of Wireshark, the serial number that the ACK is in response to
    is of more use in working out how different trace lines relate.

    Signed-off-by: David Howells

    David Howells
     
  • Set the request-ACK on more DATA packets whilst we're in slow start mode so
    that we get sufficient ACKs back to supply information to configure the
    window.

    Signed-off-by: David Howells

    David Howells
     
  • Reduce the rxrpc_local::services list to just a pointer as we don't permit
    multiple service endpoints to bind to a single transport endpoint (this is
    excluded by rxrpc_lookup_local()).

    The reason we don't allow this is that if you send a request to an AFS
    filesystem service, it will try to talk back to your cache manager on the
    port you sent from (this is how file change notifications are handled). To
    prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
    sockets share a UDP socket if at least one of them has a service bound.

    Signed-off-by: David Howells

    David Howells
     
  • In rxrpc_activate_channels(), the connection cache state is checked outside
    of the lock, which means it can change whilst we're waking calls up,
    thereby changing whether or not we're allowed to wake calls up.

    Fix this by moving the check inside the locked region. The check to see if
    all the channels are currently busy can stay outside of the locked region.

    Whilst we're at it:

    (1) Split the locked section out into its own function so that we can call
    it from other places in a later patch.

    (2) Determine the mask of channels dependent on the state as we're going
    to add another state in a later patch that will restrict the number of
    simultaneous calls to 1 on a connection.

    Signed-off-by: David Howells

    David Howells
     
  • In rxrpc_send_data_packet() make the loss-injection path return through the
    same code as the transmission path so that the RTT determination is
    initiated and any future timer shuffling will be done, despite the packet
    having been binned.

    Whilst we're at it:

    (1) Add to the tx_data tracepoint an indication of whether or not we're
    retransmitting a data packet.

    (2) When we're deciding whether or not to request an ACK, rather than
    checking if we're in fast-retransmit mode check instead if we're
    retransmitting.

    (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
    not altering the sk_buff refcount nor are we just seeing it after
    getting it off the Tx list.

    (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

    (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.

    Signed-off-by: David Howells

    David Howells
     
  • Exclusive connections are currently reusable (which they shouldn't be)
    because rxrpc_alloc_client_connection() checks the exclusive flag in the
    rxrpc_connection struct before it's initialised from the function
    parameters. This means that the DONT_REUSE flag doesn't get set.

    Fix this by checking the function parameters for the exclusive flag.
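
    The ordering problem can be illustrated with a simplified userspace
    sketch. The struct layouts and the function name here are hypothetical
    stand-ins, not the kernel definitions:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-reduced stand-ins for the real structures. */
struct conn_params {
	bool exclusive;
};

struct connection {
	struct conn_params params;
	bool dont_reuse;
};

static struct connection *alloc_client_connection(const struct conn_params *cp)
{
	struct connection *conn = calloc(1, sizeof(*conn));

	if (!conn)
		return NULL;

	/* The buggy form tested conn->params.exclusive here, but that field
	 * is still zeroed at this point, so the flag was never set.  Test
	 * the function parameters instead. */
	if (cp->exclusive)
		conn->dont_reuse = true;

	conn->params = *cp;	/* initialised only after the check */
	return conn;
}
```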

    Signed-off-by: David Howells

    David Howells
     

25 Sep, 2016

8 commits

  • Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A
    tracepoint is added to log the state of the congestion management algorithm
    and the decisions it makes.

    Notes:

    (1) Since we send fixed-size DATA packets (apart from the final packet in
    each phase), counters and calculations are in terms of packets rather
    than bytes.

    (2) The ACK packet carries the equivalent of TCP SACK.

    (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
    suited to SACK of a small number of packets. It seems that, almost
    inevitably, by the time three 'duplicate' ACKs have been seen, we have
    narrowed the loss down to one or two missing packets, and the
    FLIGHT_SIZE calculation ends up as 2.

    (4) In rxrpc_resend(), if there was no data that apparently needed
    retransmission, we transmit a PING ACK to ask the peer to tell us what
    its Rx window state is.

    Signed-off-by: David Howells

    David Howells
     
  • If we've sent all the request data in a client call but haven't seen any
    sign of the reply data yet, schedule an ACK to be sent to the server to
    find out if the reply data got lost.

    If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to
    demand a response to find out whether we need to retransmit.

    If the server says it has received all of the data, we send an IDLE ACK to
    tell the server that we haven't received anything in the receive phase as
    yet.

    To make this work, a non-immediate PING ACK must carry a delay. I've chosen
    the same as the IDLE ACK for the moment.

    Signed-off-by: David Howells

    David Howells
     
  • Generate a summary of the Tx buffer packet state when an ACK is received
    for use in a later patch that does congestion management.

    Signed-off-by: David Howells

    David Howells
     
  • When determining the resend timer value, we have a value in nsec but the
    timer is in jiffies which may be a million or more times more coarse.
    nsecs_to_jiffies() rounds down - which means that the resend timeout
    expressed as jiffies is very likely earlier than the one expressed as
    nanoseconds from which it was derived.

    The problem is that rxrpc_resend() gets triggered by the timer, but can't
    then find anything to resend yet. It sets the timer again - but gets
    kicked off immediately again and again until the nanosecond-based expiry
    time is reached and we actually retransmit.

    Fix this by adding 1 to the jiffies-based resend_at value to counteract the
    rounding and make sure that the timer happens after the nanosecond-based
    expiry is passed.
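
    The rounding effect can be shown with a small userspace calculation.
    HZ is assumed to be 1000 here (one jiffy per millisecond), and the
    helper names are stand-ins rather than the kernel functions:

```c
#include <stdint.h>

#define HZ_ASSUMED	1000ULL			/* assumption: 1000 jiffies/sec */
#define NSEC_PER_SEC	1000000000ULL
#define NSEC_PER_JIFFY	(NSEC_PER_SEC / HZ_ASSUMED)

/* Stand-in for a nanoseconds-to-jiffies conversion that rounds down. */
static uint64_t ns_to_jiffies_round_down(uint64_t ns)
{
	return ns / NSEC_PER_JIFFY;
}

/* Compute the jiffies-based resend time, adding 1 to counteract the
 * rounding so that the timer cannot fire before the nanosecond-based
 * expiry has passed. */
static uint64_t resend_at_jiffies(uint64_t now_jiffies, uint64_t delay_ns)
{
	return now_jiffies + ns_to_jiffies_round_down(delay_ns) + 1;
}
```

    With a 2.5ms delay, the rounded-down value is 2 jiffies, which would
    fire 0.5ms early; with the adjustment the timer is set 3 jiffies ahead.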

    Alternatives would be to adjust the timestamp on the packets to align
    with the jiffy scale or to switch back to using jiffy timestamps.

    Signed-off-by: David Howells

    David Howells
     
  • Clear the ACK reason, ACK timer and resend timer when entering the client
    reply phase when the first DATA packet is received. New ACKs will be
    proposed once the data is queued.

    The resend timer is no longer relevant and we need to cancel ACKs scheduled
    to probe for a lost reply.

    Signed-off-by: David Howells

    David Howells
     
  • In a client call, include the serial number of the last DATA packet of the
    reply in the final ACK.

    Signed-off-by: David Howells

    David Howells
     
  • Send an immediate ACK if we fill in a hole in the buffer left by an
    out-of-sequence packet. This may allow the congestion management in the peer
    to avoid a retransmission if packets got reordered on the wire.

    Signed-off-by: David Howells

    David Howells
     
  • Send an ACK if we haven't sent one for the last two packets we've received.
    This keeps the other end apprised of where we've got to - which is
    important if they're doing slow-start.

    We do this in recvmsg so that we can dispatch a packet directly without the
    need to wake up the background thread.

    This should possibly be made configurable in future.

    Signed-off-by: David Howells

    David Howells
     

23 Sep, 2016

15 commits

  • Add a tracepoint to log in rxrpc_resend() which packets will be
    retransmitted. Note that if a positive ACK comes in whilst we have dropped
    the lock to retransmit another packet, the actual retransmission may not
    happen, though some of the effects will (such as altering the congestion
    management).

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log proposed ACKs, including whether the proposal is
    used to update a pending ACK or is discarded in favour of an earlier,
    higher priority ACK.

    Whilst we're at it, get rid of the rxrpc_acks() function and access the
    name array directly. We do, however, need to validate the ACK reason
    number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
    array.
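
    The kind of validation meant here might look like the following sketch.
    The reason codes and name strings are made up for the example and are
    deliberately shorter than the real table:

```c
/* Hypothetical, shortened set of ACK reasons; the last entry is a catch-all
 * used for any out-of-range value. */
enum { ACK_NONE, ACK_REQUESTED, ACK_PING, ACK__INVALID };

static const char *const ack_names[ACK__INVALID + 1] = {
	"---", "REQ", "PNG", "-?-"
};

/* Clamp an untrusted reason number from the wire before indexing the
 * name array so that the lookup cannot overrun it. */
static const char *ack_name(unsigned int reason)
{
	if (reason > ACK__INVALID)
		reason = ACK__INVALID;
	return ack_names[reason];
}
```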

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log received packets that get discarded due to Rx
    packet loss.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log transmission of DATA packets (including loss
    injection).

    Adjust the ACK transmission tracepoint to include the packet serial number
    and to line this up with the DATA transmission display.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log call timer initiation, setting and expiry.

    Signed-off-by: David Howells

    David Howells
     
  • rxrpc_send_call_packet() is invoking the tx_ack tracepoint before it checks
    whether there's an ACK to transmit (another thread may jump in and transmit
    it).

    Fix this by only invoking the tracepoint if we get a valid ACK to transmit.

    Further, only allocate a serial number if we're going to actually transmit
    something.

    Signed-off-by: David Howells

    David Howells
     
  • When the last packet of data to be transmitted on a call is queued, tx_top
    is set and then the RXRPC_CALL_TX_LAST flag is set. Unfortunately, this
    leaves a race in the ACK processing side of things because the flag affects
    the interpretation of tx_top and also allows us to start receiving reply
    data before we've finished transmitting.

    To fix this, make the following changes:

    (1) rxrpc_queue_packet() now sets a marker in the annotation buffer
    instead of setting the RXRPC_CALL_TX_LAST flag.

    (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the
    same context as the routines that use it.

    (3) rxrpc_end_tx_phase() is simplified to just shift the call state.
    The Tx window must have been rotated before calling to discard the
    last packet.

    (4) rxrpc_receiving_reply() is added to handle the arrival of the first
    DATA packet of a reply to a client call (which is an implicit ACK of
    the Tx phase).

    (5) The last part of rxrpc_input_ack() is reordered to perform Tx
    rotation, then soft-ACK application and then to end the phase if we've
    rotated the last packet. In the event of a terminal ACK, the soft-ACK
    application will be skipped as nAcks should be 0.

    (6) rxrpc_input_ackall() now has to rotate as well as ending the phase.

    In addition:

    (7) Alter the transmit tracepoint to log the rotation of the last packet.

    (8) Remove the no-longer relevant queue_reqack tracepoint note. The
    ACK-REQUESTED packet header flag is now set as needed when we actually
    transmit the packet and may vary by retransmission.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the call timer in the following ways:

    (1) If call->resend_at or call->ack_at are before or equal to the current
    time, then ignore that timeout.

    (2) If call->expire_at is before or equal to the current time, then don't
    set the timer at all (possibly we should queue the call).

    (3) Don't skip modifying the timer if timer_pending() is true. This
    indicates that the timer is working, not that it has expired and is
    running/waiting to run its expiry handler.

    Also call rxrpc_set_timer() to start the call timer going rather than
    calling add_timer().

    Signed-off-by: David Howells

    David Howells
     
  • When rxrpc_input_soft_acks() is parsing the soft-ACKs from an ACK packet,
    it updates the Tx packet annotations in the annotation buffer. If a
    soft-ACK is an ACK, then we overwrite unack'd, nak'd or to-be-retransmitted
    states and that is fine; but if the soft-ACK is a NACK, we overwrite the
    to-be-retransmitted state with a nak, which isn't fine.

    Instead, we need to let any scheduled retransmission stand if the packet
    was NAK'd.

    Note that we don't reissue a resend if the annotation is in the
    to-be-retransmitted state because someone else must've scheduled the
    resend already.
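
    The intended annotation update can be sketched as below. The enum and
    function are hypothetical simplifications; the return value indicates
    whether a resend would newly need to be scheduled:

```c
#include <stdbool.h>

/* Hypothetical annotation states for a packet in the Tx buffer. */
enum tx_anno { ANNO_UNACK, ANNO_ACK, ANNO_NAK, ANNO_RETRANS };

/* Apply one soft-ACK entry to a packet's annotation.  Returns true if the
 * caller should schedule a resend as a result. */
static bool apply_soft_ack(enum tx_anno *anno, bool is_ack)
{
	if (is_ack) {
		/* An ACK may overwrite any of the other states. */
		*anno = ANNO_ACK;
		return false;
	}

	/* A NACK must let an already-scheduled retransmission stand, and
	 * there is no need to reissue the resend in that case. */
	if (*anno == ANNO_RETRANS)
		return false;

	*anno = ANNO_NAK;
	return true;
}
```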

    Signed-off-by: David Howells

    David Howells
     
  • When a DATA packet has its initial transmission, we may need to start or
    adjust the resend timer. Without this we end up relying on being sent a
    NACK to initiate the resend.

    Signed-off-by: David Howells

    David Howells
     
  • before_eq() and friends should be used to compare serial numbers (when not
    checking for (non)equality) rather than casting to int, subtracting and
    checking the result.
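
    For reference, wrap-safe comparison helpers of this kind are typically
    built as in the sketch below. These are userspace re-creations for
    illustration (the typedef is an assumption), showing why a named helper
    is clearer than open-coding the cast and subtraction at each call site:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t serial_t;	/* assumed 32-bit wrapping serial number */

/* Each helper interprets the wrapped unsigned difference as signed, so the
 * comparison stays correct across the 0xffffffff -> 0 wrap.  (The conversion
 * to int32_t relies on the usual two's-complement behaviour.) */
static bool before(serial_t a, serial_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool before_eq(serial_t a, serial_t b)
{
	return (int32_t)(a - b) <= 0;
}

static bool after(serial_t a, serial_t b)
{
	return (int32_t)(a - b) > 0;
}

static bool after_eq(serial_t a, serial_t b)
{
	return (int32_t)(a - b) >= 0;
}
```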

    Signed-off-by: David Howells

    David Howells
     
  • ktime_add_ms() should be used to add the resend time (in ms) rather than
    ktime_add_ns().
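
    The size of the error is easy to see with userspace stand-ins for the
    two helpers (the sketch_ names and the typedef are assumptions; the
    time value is taken to be a signed count of nanoseconds):

```c
#include <stdint.h>

typedef int64_t sketch_ktime_t;		/* nanoseconds */

#define NSEC_PER_MSEC	1000000LL

/* Stand-in for adding a nanosecond count to a time. */
static sketch_ktime_t sketch_ktime_add_ns(sketch_ktime_t kt, int64_t ns)
{
	return kt + ns;
}

/* Stand-in for adding a millisecond count to a time. */
static sketch_ktime_t sketch_ktime_add_ms(sketch_ktime_t kt, int64_t ms)
{
	return kt + ms * NSEC_PER_MSEC;
}
```

    Passing a resend time of, say, 4 (meaning 4ms) to the nanosecond helper
    advances the time by only 4ns, a factor of a million too short.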

    Signed-off-by: David Howells

    David Howells
     
  • Make sure that sendmsg() gets woken up if the call it is waiting for
    completes abnormally.

    Signed-off-by: David Howells

    David Howells
     
  • Don't send an IDLE ACK at the end of the transmission of the response to a
    service call. The service end resends DATA packets until the client sends an
    ACK that hard-acks all the sent data. At that point, the call is complete.

    Signed-off-by: David Howells

    David Howells
     
  • Set the timestamp on sk_buffs holding packets to be transmitted before
    queueing them because the moment the packet is on the queue it can be seen
    by the retransmission algorithm - which may see a completely random
    timestamp.

    If the retransmission algorithm sees such a timestamp, it may retransmit
    the packet and, in future, tell the congestion management algorithm that
    the retransmit timer expired.

    Signed-off-by: David Howells

    David Howells
     

22 Sep, 2016

5 commits

  • We don't want to send a PING ACK for every new incoming call as that just
    adds to the network traffic. Instead, we send a PING ACK to the first
    three that we receive and then once per second thereafter.
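
    One way to express this limit is sketched below. The per-peer state
    struct, its field names and the millisecond clock are assumptions for
    the example, not the kernel's bookkeeping:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state for limiting PING ACKs. */
struct peer_ping_state {
	unsigned int nr_pinged;		/* saturates at 3 */
	uint64_t last_ping_ms;
};

/* Decide whether a new incoming call from this peer should trigger a
 * PING ACK: always for the first three, then at most once per second. */
static bool should_ping_new_call(struct peer_ping_state *p, uint64_t now_ms)
{
	if (p->nr_pinged >= 3 && now_ms - p->last_ping_ms < 1000)
		return false;

	if (p->nr_pinged < 3)
		p->nr_pinged++;
	p->last_ping_ms = now_ms;
	return true;
}
```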

    This could probably be made adjustable in future.

    Signed-off-by: David Howells

    David Howells
     
  • Reduce the number of ACK-Requests we set on DATA packets that we're sending
    to reduce network traffic. We set the flag on odd-numbered DATA packets to
    start off the RTT cache until we have at least three entries in it and then
    probe once per second thereafter to keep it topped up.

    This could be made tunable in future.

    Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
    packets as we transmit them and not stored statically in the sk_buff.
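
    A sketch of the decision made at transmission time, under the
    assumption of a hypothetical state struct that tracks how many RTT
    samples are cached and when an ACK was last requested:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the RTT cache state. */
struct rtt_cache_state {
	unsigned int rtt_usage;		/* number of RTT samples held */
	uint64_t last_req_ms;		/* when an ACK was last requested */
};

/* Decide, as a DATA packet is transmitted, whether to set the request-ACK
 * flag in its header.  Nothing is stored with the packet itself. */
static bool want_request_ack(struct rtt_cache_state *st, uint32_t seq,
			     uint64_t now_ms)
{
	bool request;

	if (st->rtt_usage < 3)
		request = (seq & 1) != 0;	/* odd-numbered packets */
	else
		request = now_ms - st->last_req_ms >= 1000;

	if (request)
		st->last_req_ms = now_ms;
	return request;
}
```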

    Signed-off-by: David Howells

    David Howells
     
  • In addition to sending a PING ACK to gain RTT data, we can set the
    RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The
    ACK packet contains the serial number of the packet it is in response to,
    so we can look through the Tx buffer for a matching DATA packet.

    This requires that the data packets be stamped with the time of
    transmission as a ktime rather than having the resend_at time in jiffies.

    This further requires the resend code to do the resend determination in
    ktimes and convert to jiffies to set the timer.
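
    The lookup described above could be sketched like this. The record
    struct and the flat array are simplifications invented for the example;
    the real Tx buffer is organised differently:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a transmitted DATA packet. */
struct tx_record {
	uint32_t serial;	/* serial number it was sent with */
	int64_t sent_at_ns;	/* transmission timestamp */
};

/* Look for the DATA packet that a REQUESTED-ACK is responding to and, if
 * found, derive an RTT sample from its transmission timestamp. */
static bool rtt_from_requested_ack(const struct tx_record *tx, size_t n,
				   uint32_t acked_serial, int64_t ack_rx_ns,
				   int64_t *rtt_ns)
{
	for (size_t i = 0; i < n; i++) {
		if (tx[i].serial == acked_serial) {
			*rtt_ns = ack_rx_ns - tx[i].sent_at_ns;
			return true;
		}
	}
	return false;	/* no match, e.g. already rotated out */
}
```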

    Signed-off-by: David Howells

    David Howells
     
  • Expedite the transmission of a response to a PING ACK by sending it from
    sendmsg if one is pending. We're most likely to see a PING ACK during the
    client call Tx phase as the other side may use it to determine a number of
    parameters, such as the client's receive window size, the RTT and whether
    the client is doing slow start (similar to RFC 5681).

    If we don't expedite it, it's left to the background processing thread to
    transmit.

    Signed-off-by: David Howells

    David Howells
     
  • Send a PING ACK packet to the peer when we get a new incoming call from a
    peer we don't have a record for. The PING RESPONSE ACK packet will tell us
    the following about the peer:

    (1) its receive window size

    (2) its MTU sizes

    (3) its support for jumbo DATA packets

    (4) if it supports slow start (similar to RFC 5681)

    (5) an estimate of the RTT

    This is necessary because the peer won't normally send us an ACK until it
    gets to the Rx phase and we send it a packet, but we would like to know
    some of this information before we start sending packets.

    A pair of tracepoints are added so that RTT determination can be observed.

    Signed-off-by: David Howells

    David Howells