03 Sep, 2016

1 commit

  • When we send broadcasts in clusters of more 70-80 nodes, we sometimes
    see the broadcast link resetting because of an excessive number of
    retransmissions. This is caused by a combination of two factors:

    1) A 'NACK crunch", where loss of broadcast packets is discovered
    and NACK'ed by several nodes simultaneously, leading to multiple
    redundant broadcast retransmissions.

    2) The fact that the NACKS as such also are sent as broadcast, leading
    to excessive load and packet loss on the transmitting switch/bridge.

    This commit deals with the latter problem, by moving sending of
    broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
    to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
    bits in word 8 of the said message for this purpose, and introduce a
    new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change
    backwards compatible.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

16 Apr, 2016

1 commit

  • According to the link FSM, a received traffic packet can take a link
    from state ESTABLISHING to ESTABLISHED, but the link can still not be
    fully set up in one atomic operation. This means that even if the the
    very first packet on the link is a traffic packet with sequence number
    1 (one), it has to be dropped and retransmitted.

    This can be avoided if we let the mentioned packet be preceded by a
    LINK_PROTOCOL/STATE message, which takes up the endpoint before the
    arrival of the traffic.

    We add this small feature in this commit.

    This is a fully compatible change.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

07 Mar, 2016

1 commit

  • Until now, we have kept a pre-allocated protocol message header
    aggregated into struct tipc_link. Apart from adding unnecessary
    footprint to the link instances, this requires extra code both to
    initialize and re-initialize it.

    We now remove this sub-optimization. This change also makes it
    possible to clean up the function tipc_build_proto_msg() and remove
    a couple of small functions that were accessing the mentioned header.
    In particular, we can replace all occurrences of the local function
    call link_own_addr(link) with the generic tipc_own_addr(net).

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

06 Feb, 2016

1 commit

  • Changing certain link attributes (link tolerance and link priority)
    from the TIPC management tool is supposed to automatically take
    effect at both endpoints of the affected link.

    Currently the media address is not instantiated for the link and is
    used uninstantiated when crafting protocol messages designated for the
    peer endpoint. This means that changing a link property currently
    results in the property being changed on the local machine but the
    protocol message designated for the peer gets lost. Resulting in
    property discrepancy between the endpoints.

    In this patch we resolve this by using the media address from the
    link entry and using the bearer transmit function to send it. Hence,
    we can now eliminate the redundant function tipc_link_prot_xmit() and
    the redundant field tipc_link::media_addr.

    Fixes: 2af5ae372a4b (tipc: clean up unused code and structures)
    Reviewed-by: Jon Maloy
    Reported-by: Jason Hu
    Signed-off-by: Richard Alpe
    Signed-off-by: David S. Miller

    Richard Alpe
     

21 Nov, 2015

3 commits

  • The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
    has been significantly reduced over the last couple of years.

    We now root out the last traces of this practice.
    There are no functional changes in this commit.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We move the definition of struct tipc_link from link.h to link.c in
    order to minimize its exposure to the rest of the code.

    When needed, we define new functions to make it possible for external
    entities to access and set data in the link.

    Apart from the above, there are no functional changes.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In our effort to have less code and include dependencies between
    entities such as node, link and bearer, we try to narrow down
    the exposed interface towards the node as much as possible.

    In this commit, we move the definition of struct tipc_node, along
    with many of its associated function declarations, from node.h to
    node.c. We also move some function definitions from link.c and
    name_distr.c to node.c, since they access fields in struct tipc_node
    that should not be externally visible. The moved functions are renamed
    according to new location, and made static whenever possible.

    There are no functional changes in this commit.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

9 commits

  • After the previous changes in this series, we can now remove some
    unused code and structures, both in the broadcast, link aggregation
    and link code.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • With the recent commit series, we have established a one-way dependency
    between the link aggregation (struct tipc_node) instances and their
    pertaining tipc_link instances. This has enabled quite significant code
    and structure simplifications.

    In this commit, we eliminate the field 'owner', which points to an
    instance of struct tipc_node, from struct tipc_link, and replace it with
    a pointer to struct net, which is the only external reference now needed
    by a link instance.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have only been supporting a fix MTU size of 1500 bytes
    for all broadcast media, irrespective of their actual capability.

    We now make the broadcast MTU adaptable to the carrying media, i.e.,
    we use the smallest MTU supported by any of the interfaces attached
    to TIPC.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The code path for receiving broadcast packets is currently distinct
    from the unicast path. This leads to unnecessary code and data
    duplication, something that can be avoided with some effort.

    We now introduce separate per-peer tipc_link instances for handling
    broadcast packet reception. Each receive link keeps a pointer to the
    common, single, broadcast link instance, and can hence handle release
    and retransmission of send buffers as if they belonged to the own
    instance.

    Furthermore, we let each unicast link instance keep a reference to both
    the pertaining broadcast receive link, and to the common send link.
    This makes it possible for the unicast links to easily access data for
    broadcast link synchronization, as well as for carrying acknowledges for
    received broadcast packets.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have tried to support both the newer, dedicated broadcast
    synchronization mechanism along with the older, less safe, RESET_MSG/
    ACTIVATE_MSG based one. The latter method has turned out to be a hazard
    in a highly dynamic cluster, so we find it safer to disable it completely
    when we find that the former mechanism is supported by the peer node.

    For this purpose, we now introduce a new capabability bit,
    TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
    syncronization is supported by the present node. The new bit is conveyed
    between peers in the 'capabilities' field of neighbor discovery messages.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • This commit simplifies the broadcast link transmission function, by
    leveraging previous changes to the link transmission function and the
    broadcast transmission link life cycle.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Realizing that unicast is just a special case of broadcast, we also see
    that we can go in the other direction, i.e., that modest changes to the
    current unicast link can make it generic enough to support broadcast.

    The following changes are introduced here:

    - A new counter ("ackers") in struct tipc_link, to indicate how many
    peers need to ack a packet before it can be released.
    - A corresponding counter in the skb user area, to keep track of how
    many peers a are left to ack before a buffer can be released.
    - A new counter ("acked"), to keep persistent track of how far a peer
    has acked at the moment, i.e., where in the transmission queue to
    start updating buffers when the next ack arrives. This is to avoid
    double acknowledgements from a peer, with inadvertent relase of
    packets as a result.
    - A more generic tipc_link_retrans() function, where retransmit starts
    from a given sequence number, instead of the first packet in the
    transmision queue. This is to minimize the number of retransmitted
    packets on the broadcast media.

    When the new functionality is taken into use in the next commits,
    we expect it to have minimal effect on unicast mode performance.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast link instance (struct tipc_link) used for sending is
    currently aggregated into struct tipc_bclink. This means that we cannot
    use the regular tipc_link_create() function for initiating the link, but
    do instead have to initiate numerous fields directly from the
    bcast_init() function.

    We want to reduce dependencies between the broadcast functionality
    and the inner workings of tipc_link. In this commit, we introduce
    a new function tipc_bclink_create() to link.c, and allocate the
    instance of the link separately using this function.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In reality, the link implementation is already independent from
    struct tipc_bearer, in that it doesn't store any reference to it.
    However, we still pass on a pointer to a bearer instance in the
    function tipc_link_create(), just to have it extract some
    initialization information from it.

    I later commits, we need to create instances of tipc_link without
    having any associated struct tipc_bearer. To facilitate this, we
    want to extract the initialization data already in the creator
    function in node.c, before calling tipc_link_create(), and pass
    this info on as individual parameters in the call.

    This commit introduces this change.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

16 Oct, 2015

4 commits

  • The change made in the previous commit revealed a small flaw in the way
    the node FSM is updated. When the function tipc_node_link_down() is
    called for the last link to a node, we should check whether this was
    caused by a local reset or by a received RESET message from the peer.
    In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
    the node FSM, so that it is ready to re-establish contact. If this is
    not done, the peer node will sometimes have to go through a second
    establish cycle before the link becomes stable.

    We fix this in this commit by conditionally issuing the mentioned
    event in the function tipc_node_link_down(). We also move LINK_RESET
    FSM even away from the link_reset() function and into the caller
    function, partially because it is easier to follow the code when state
    changes are gathered at a limited number of locations, partially
    because there will be cases in future commits where we don't want the
    link to go RESET mode when link_reset() is called.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • When a link is taken down because of a node local event, such as
    disabling of a bearer or an interface, we currently leave it to the
    peer node to discover the broken communication. The default time for
    such failure discovery is 1.5-2 seconds.

    If we instead allow the terminating link endpoint to send out a RESET
    message at the moment it is reset, we can achieve the impression that
    both endpoints are going down instantly. Since this is a very common
    scenario, we find it worthwhile to make this small modification.

    Apart from letting the link produce the said message, we also have to
    ensure that the interface is able to transmit it before TIPC is
    detached. We do this by performing the disabling of a bearer in three
    steps:

    1) Disable reception of TIPC packets from the interface in question.
    2) Take down the links, while allowing them so send out a RESET message.
    3) Disable transmission of TIPC packets on the interface.

    Apart from this, we now have to react on the NETDEV_GOING_DOWN event,
    instead of as currently the NEDEV_DOWN event, to ensure that such
    transmission is possible during the teardown phase.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Link establishing, just like link teardown, is a non-atomic action, in
    the sense that discovering that conditions are right to establish a link,
    and the actual adding of the link to one of the node's send slots is done
    in two different lock contexts. The link FSM is designed to help bridging
    the gap between the two contexts in a safe manner.

    We have now discovered a weakness in the implementaton of this FSM.
    Because we directly let the link go from state LINK_ESTABLISHING to
    state LINK_ESTABLISHED already in the first lock context, we are unable
    to distinguish between a fully established link, i.e., a link that has
    been added to its slot, and a link that has not yet reached the second
    lock context. It may hence happen that a manual intervention, e.g., when
    disabling an interface, causes the function tipc_node_link_down() to try
    removing the link from the node slots, decrementing its active link
    counter etc, although the link was never added there in the first place.

    We solve this by delaying the actual state change until we reach the
    second lock context, inside the function tipc_node_link_up(). This
    makes it possible for potentail callers of __tipc_node_link_down() to
    know if they should proceed or not, and the problem is solved.

    Unforunately, the situation described above also has a second problem.
    Since there by necessity is a tipc_node_link_up() call pending once
    the node lock has been released, we must defuse that call by setting
    the link back from LINK_ESTABLISHING to LINK_RESET state. This forces
    us to make a slight modification to the link FSM, which will now look
    as follows.

    +------------------------------------+
    |RESET_EVT |
    | |
    | +--------------+
    | +-----------------| SYNCHING |-----------------+
    | |FAILURE_EVT +--------------+ PEER_RESET_EVT|
    | | A | |
    | | | | |
    | | | | |
    | | |SYNCH_ |SYNCH_ |
    | | |BEGIN_EVT |END_EVT |
    | | | | |
    | V | V V
    | +-------------+ +--------------+ +------------+
    | | RESETTING || PEER_RESET |
    | +-------------+ FAILURE_ +--------------+ PEER_ +------------+
    | | EVT | A RESET_EVT |
    | | | | |
    | | +----------------+ | |
    | RESET_EVT| |RESET_EVT | |
    | | | | |
    | | | |ESTABLISH_EVT |
    | | | +-------------+ | |
    | | | | RESET_EVT | | |
    | | | | | | |
    | V V V | | |
    | +-------------+ +--------------+ RESET_EVT|
    +--->| RESET |--------->| ESTABLISHING |
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The sequence number of an incoming packet is currently only checked
    for less than, equality to, or bigger than the next expected number,
    meaning that the receive window in practice becomes one half sequence
    number cycle, or U16_MAX/2. This does not make sense, and may not even
    be safe if there are extreme delays in the network. Any packet sent by
    the peer during the ongoing cycle must belong inside his current send
    window, or should otherwise be dropped if possible.

    Since a link endpoint cannot know its peer's current send window, it
    has to base this sanity check on a worst-case assumption, i.e., that
    the peer is using a maximum sized window of 8191 packets. Using this
    assumption, we now add a check that the sequence number is not bigger
    than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks
    done, so that the receive window test is performed before the gap test.
    This way, we are guaranteed that no packet with illegal sequence numbers
    are ever added to the deferred queue.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

31 Jul, 2015

6 commits

  • We simplify the link creation function tipc_link_create() and the way
    the link struct it is connected to the node struct. In particular, we
    remove the duplicate initialization of some fields which are anyway set
    in tipc_link_reset().

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have been handling link failover and synchronization
    by using an additional link state variable, "exec_mode". This variable
    is not independent of the link FSM state, something causing a risk of
    inconsistencies, apart from the fact that it clutters the code.

    The conditions are now in place to define a new link FSM that covers
    all existing use cases, including failover and synchronization, and
    eliminate the "exec_mode" field altogether. The FSM must also support
    non-atomic resetting of links, which will be introduced later.

    The new link FSM is shown below, with 7 states and 8 events.
    Only events leading to state change are shown as edges.

    +------------------------------------+
    |RESET_EVT |
    | |
    | +--------------+
    | +-----------------| SYNCHING |-----------------+
    | |FAILURE_EVT +--------------+ PEER_RESET_EVT|
    | | A | |
    | | | | |
    | | | | |
    | | |SYNCH_ |SYNCH_ |
    | | |BEGIN_EVT |END_EVT |
    | | | | |
    | V | V V
    | +-------------+ +--------------+ +------------+
    | | RESETTING || PEER_RESET |
    | +-------------+ FAILURE_ +--------------+ PEER_ +------------+
    | | EVT | A RESET_EVT |
    | | | | |
    | | | | |
    | | +--------------+ | |
    | RESET_EVT| |RESET_EVT |ESTABLISH_EVT |
    | | | | |
    | | | | |
    | V V | |
    | +-------------+ +--------------+ RESET_EVT|
    +--->| RESET |--------->| ESTABLISHING |
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The implementation of the link FSM currently takes decisions about and
    sends out link protocol messages. This is unnecessary, since such
    actions are not the result of any link state change, and are even
    decided based on non-FSM state information ("silent_intv_cnt").

    We now move the sending of unicast link protocol messages to the
    function tipc_link_timeout(), and the initial broadcast synchronization
    message to tipc_node_link_up(). The latter is done because a link
    instance should not need to know whether it is the first or second
    link to a destination. Such information is now restricted to and
    handled by the link aggregation layer in node.c

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Link failover and synchronization have until now been handled by the
    links themselves, forcing them to have knowledge about and to access
    parallel links in order to make the two algorithms work correctly.

    In this commit, we move the control part of this functionality to the
    link aggregation level in node.c, which is the right location for this.
    As a result, the two algorithms become easier to follow, and the link
    implementation becomes simpler.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In line with our effort to let the node level have full control over
    its links, we want to move all link reset calls from link.c to node.c.
    Some of the calls can be moved by simply moving the calling function,
    when this is the right thing to do. For the remaining calls we use
    the now established technique of returning a TIPC_LINK_DOWN_EVT
    flag from tipc_link_rcv(), whereafter we perform the reset call when
    the call returns.

    This change serves as a preparation for the coming commits.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The function tipc_link_activate() is redundant, since it mostly performs
    settings that have already been done in a preceding tipc_link_reset().

    There are three exceptions to this:
    - The actual state change to TIPC_LINK_WORKING. This should anyway be done
    in the FSM, and not in a separate function.
    - Registration of the link with the bearer. This should be done by the
    node, since we don't want the link to have any knowledge about its
    specific bearer.
    - Call to tipc_node_link_up() for user access registration. With the new
    role distribution between link aggregation and link level this becomes
    the wrong call order; tipc_node_link_up() should instead be called
    directly as a result of a TIPC_LINK_UP event, hence by the node itself.

    This commit implements those changes.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2015

7 commits

  • We convert packet/message reception according to the same principle
    we have been using for message sending and timeout handling:

    We move the function tipc_rcv() to node.c, hence handling the initial
    packet reception at the link aggregation level. The function grabs
    the node lock, selects the receiving link, and accesses it via a new
    call tipc_link_rcv(). This function appends buffers to the input
    queue for delivery upwards, but it may also append outgoing packets
    to the xmit queue, just as we do during regular message sending. The
    latter will happen when buffers are forwarded from the link backlog,
    or when retransmission is requested.

    Upon return of this function, and after having released the node lock,
    tipc_rcv() delivers/tranmsits the contents of those queues, but it may
    also perform actions such as link activation or reset, as indicated by
    the return flags from the link.

    This reduces the number of cpu cycles spent inside the node spinlock,
    and reduces contention on that lock.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In our effort to move control of the links to the link aggregation
    layer, we move the perodic link supervision timer to struct tipc_node.
    The new timer is shared between all links belonging to the node, thus
    saving resources, while still kicking the FSM on both its pertaining
    links at each expiration.

    The current link timer and corresponding functions are removed.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We create a second, simpler, link timer function, tipc_link_timeout().
    The new function makes use of the new FSM function introduced in the
    previous commit, and just like it, takes a buffer queue as parameter.
    It returns an event bit field and potentially a link protocol packet
    to the caller.

    The existing timer function, link_timeout(), is still needed for a
    while, so we redesign it to become a wrapper around the new function.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The link FSM implementation is currently unnecessarily complex.
    It sometimes checks for conditional state outside the FSM data
    before deciding next state, and often performs actions directly
    inside the FSM logics.

    In this commit, we create a second, simpler FSM implementation,
    that as far as possible acts only on states and events that it is
    strictly defined for, and postpone any actions until it is finished
    with its decisions. It also returns an event flag field and an a
    buffer queue which may potentially contain a protocol message to
    be sent by the caller.

    Unfortunately, we cannot yet make the FSM "clean", in the sense
    that its decisions are only based on FSM state and event, and that
    state changes happen only here. That will have to wait until the
    activate/reset logics has been cleaned up in a future commit.

    We also rename the link states as follows:

    WORKING_WORKING -> TIPC_LINK_WORKING
    WORKING_UNKNOWN -> TIPC_LINK_PROBING
    RESET_UNKNOWN -> TIPC_LINK_RESETTING
    RESET_RESET -> TIPC_LINK_ESTABLISHING

    The existing FSM function, link_state_event(), is still needed for
    a while, so we redesign it to make use of the new function.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The status flag LINK_STOPPED is not needed any more, since the
    mechanism for delayed deletion of links has been removed.
    Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
    because we can just as well start the link timer directly from
    inside tipc_link_create().

    We eliminate these flags in this commit.

    Instead of the above flags, we now introduce three new link modes,
    TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
    indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
    the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
    blocks timer-driven protocol messages to be sent out, and any change
    to the link FSM. Since the modes are mutually exclusive, we convert
    them to state values, and rename the 'flags' field in struct tipc_link
    to 'exec_mode'.

    Finally, we move the #defines for link FSM states and events from link.h
    into enums inside the file link.c, which is the real usage scope of
    these definitions.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, message sending is performed through a deep call chain,
    where the node spinlock is grabbed and held during a significant
    part of the transmission time. This is clearly detrimental to
    overall throughput performance; it would be better if we could send
    the message after the spinlock has been released.

    In this commit, we do instead let the call revert on the stack after
    the buffer chain has been added to the transmission queue, whereafter
    clones of the buffers are transmitted to the device layer outside the
    spinlock scope.

    As a further step in our effort to separate the roles of the node
    and link entities we also move the function tipc_link_xmit() to
    node.c, and rename it to tipc_node_xmit().

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • At present, the link input queue and the name distributor receive
    queues are fields aggregated in struct tipc_link. This is a hazard,
    because a link might be deleted while a receiving socket still keeps
    reference to one of the queues.

    This commit fixes this bug. However, rather than adding yet another
    reference counter to the critical data path, we move the two queues
    to safe ground inside struct tipc_node, which is already protected, and
    let the link code only handle references to the queues. This is also
    in line with planned later changes in this area.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

29 Jun, 2015

1 commit

  • In commit 1f66d161ab3d8b518903fa6c3f9c1f48d6919e74
    ("tipc: introduce starvation free send algorithm")
    we introduced a counter per priority level for buffers
    in the link backlog queue. We also introduced a new
    function tipc_link_purge_backlog(), to reset these
    counters to zero when the link is reset.

    Unfortunately, we missed to call this function when
    the broadcast link is reset, with the result that the
    values of these counters might be permanently skewed
    when new nodes are attached. This may in the worst case
    lead to permananent, but spurious, broadcast link
    congestion, where no broadcast packets can be sent at
    all.

    We fix this bug with this commit.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 May, 2015

4 commits

  • We change the sequence number checkpointing that is performed
    by the timer in order to discover if the peer is active. Currently,
    we store a checkpoint of the next expected sequence number "rcv_nxt"
    at each timer expiration, and compare it to the current expected
    number at next timeout expiration. Instead, we now use the already
    existing field "silent_intv_cnt" for this task. We step the counter
    at each timeout expiration, and zero it at each valid received packet.
    If no valid packet has been received from the peer after "abort_limit"
    number of silent timer intervals, the link is declared faulty and reset.

    We also remove the multiple instances of timer activation from inside
    the FSM function "link_state_event()", and now do it at only one place;
    at the end of the timer function itself.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We rename some fields in struct tipc_link, in order to give them more
    descriptive names:

    next_in_no -> rcv_nxt
    next_out_no-> snd_nxt
    fsm_msg_cnt-> silent_intv_cnt
    cont_intv -> keepalive_intv
    last_retransmitted -> last_retransm

    There are no functional changes in this commit.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Although the sequence number in the TIPC protocol is 16 bits, we have
    until now stored it internally as an unsigned 32 bits integer.
    We got around this by always doing explicit modulo-65535 operations
    whenever we need to access a sequence number.

    We now make the incoming and outgoing sequence numbers to unsigned
    16-bit integers, and remove the modulo operations where applicable.

    We also move the arithmetic inline functions for 16 bit integers
    to core.h, and the function buf_seqno() to msg.h, so they can easily
    be accessed from anywhere in the code.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Since commit 4b475e3f2f8e4e241de101c8240f1d74d0470494
    ("tipc: eliminate delayed link deletion at link failover") the extra
    boolean parameter "shutting_down" is not any longer needed for the
    functions bearer_disable() and tipc_link_delete_list().

    Furhermore, the function tipc_link_reset_links(), called from
    bearer_reset() is now unnecessary. We can just as well delete
    all the links, as we do in bearer_disable(), and start over with
    creating new links.

    This commit introduces those changes.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

03 Apr, 2015

2 commits

  • When a link is being established, the two endpoints advertise their
    respective interface MTU in the transmitted RESET and ACTIVATE messages.
    If there is any difference, the lower of the two MTUs will be selected
    for use by both endpoints.

    However, as a remnant of earlier attempts to introduce TIPC level
    routing. there also exists an MTU discovery mechanism. If an intermediate
    node has a lower MTU than the two endpoints, they will discover this
    through a bisectional approach, and finally adopt this MTU for common use.

    Since there is no TIPC level routing, and probably never will be,
    this mechanism doesn't make any sense, and only serves to make the
    link level protocol unecessarily complex.

    In this commit, we eliminate the MTU discovery algorithm,and fall back
    to the simple MTU advertising approach. This change is fully backwards
    compatible.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • When a bearer is disabled manually, all its links have to be reset
    and deleted. However, if there is a remaining, parallel link ready
    to take over a deleted link's traffic, we currently delay the delete
    of the removed link until the failover procedure is finished. This
    is because the remaining link needs to access state from the reset
    link, such as the last received packet number, and any partially
    reassembled buffer, in order to perform a successful failover.

    In this commit, we do instead move the state data over to the new
    link, so that it can fulfill the procedure autonomously, without
    accessing any data on the old link. This means that we can now
    proceed and delete all pertaining links immediately when a bearer
    is disabled. This saves us from some unnecessary complexity in such
    situations.

    We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
    ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
    FAILOVER_MSG and SYNCH_MSG respectively.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy