17 Jun, 2020

1 commit

  • Currently, updating binding table (add service binding to
    name table/withdraw a service binding) is being sent over replicast.
    However, if we are scaling up clusters to > 100 nodes/containers this
    method is less affection because of looping through nodes in a cluster one
    by one.

    It is worth to use broadcast to update a binding service. This way, the
    binding table can be updated on all peer nodes in one shot.

    Broadcast is used when all peer nodes, as indicated by a new capability
    flag TIPC_NAMED_BCAST, support reception of this message type.

    Four problems need to be considered when introducing this feature.
    1) When establishing a link to a new peer node we still update this by a
    unicast 'bulk' update. This may lead to race conditions, where a later
    broadcast publication/withdrawal bypass the 'bulk', resulting in
    disordered publications, or even that a withdrawal may arrive before the
    corresponding publication. We solve this by adding an 'is_last_bulk' bit
    in the last bulk messages so that it can be distinguished from all other
    messages. Only when this message has arrived do we open up for reception
    of broadcast publications/withdrawals.

    2) When a first legacy node is added to the cluster all distribution
    will switch over to use the legacy 'replicast' method, while the
    opposite happens when the last legacy node leaves the cluster. This
    entails another risk of message disordering that has to be handled. We
    solve this by adding a sequence number to the broadcast/replicast
    messages, so that disordering can be discovered and corrected. Note
    however that we don't need to consider potential message loss or
    duplication at this protocol level.

    3) Bulk messages don't contain any sequence numbers, and will always
    arrive in order. Hence we must exempt those from the sequence number
    control and deliver them unconditionally. We solve this by adding a new
    'is_bulk' bit in those messages so that they can be recognized.

    4) Legacy messages, which don't contain any new bits or sequence
    numbers, but neither can arrive out of order, also need to be exempt
    from the initial synchronization and sequence number check, and
    delivered unconditionally. Therefore, we add another 'is_not_legacy' bit
    to all new messages so that those can be distinguished from legacy
    messages and the latter delivered directly.

    v1->v2:
    - fix warning issue reported by kbuild test robot
    - add santiy check to drop the publication message with a sequence
    number that is lower than the agreed synch point

    Signed-off-by: kernel test robot
    Signed-off-by: Hoang Huu Le
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Hoang Huu Le
     

01 Apr, 2018

1 commit

  • With the new RB tree structure for service ranges it becomes possible to
    solve an old problem; - we can now allow overlapping service ranges in
    the table.

    When inserting a new service range to the tree, we use 'lower' as primary
    key, and when necessary 'upper' as secondary key.

    Since there may now be multiple service ranges matching an indicated
    'lower' value, we must also add the 'upper' value to the functions
    used for removing publications, so that the correct, corresponding
    range item can be found.

    These changes guarantee that a well-formed publication/withdrawal item
    from a peer node never will be rejected, and make it possible to
    eliminate the problematic backlog functionality we currently have for
    handling such cases.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

18 Mar, 2018

1 commit

  • We rename some lists and fields in struct publication both to make
    the naming more consistent and to better reflect their roles. We
    also update the descriptions of those lists.

    node_list -> local_publ
    cluster_list -> all_publ
    pport_list -> binding_sock
    ref -> port

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

21 Nov, 2015

1 commit

  • The file name_distr.c currently contains three functions,
    named_cluster_distribute(), tipc_publ_subcscribe() and
    tipc_publ_unsubscribe() that all directly access fields in
    struct tipc_node. We want to eliminate such dependencies, so
    we move those functions to the file node.c and rename them to
    tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
    respectively.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

06 Feb, 2015

1 commit

  • TIPC handles message cardinality and sequencing at the link layer,
    before passing messages upwards to the destination sockets. During the
    upcall from link to socket no locks are held. It is therefore possible,
    and we see it happen occasionally, that messages arriving in different
    threads and delivered in sequence still bypass each other before they
    reach the destination socket. This must not happen, since it violates
    the sequentiality guarantee.

    We solve this by adding a new input buffer queue to the link structure.
    Arriving messages are added safely to the tail of that queue by the
    link, while the head of the queue is consumed, also safely, by the
    receiving socket. Sequentiality is secured per socket by only allowing
    buffers to be dequeued inside the socket lock. Since there may be multiple
    simultaneous readers of the queue, we use a 'filter' parameter to reduce
    the risk that they peek the same buffer from the queue, hence also
    reducing the risk of contention on the receiving socket locks.

    This solves the sequentiality problem, and seems to cause no measurable
    performance degradation.

    A nice side effect of this change is that lock handling in the functions
    tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
    will enable future simplifications of those functions.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

13 Jan, 2015

3 commits

  • If net namespace is supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable of node address must be moved to tipc_net structure to
    satisfy the requirement. It's turned out that users also can assign
    node address for every namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC name table is used to store the mapping relationship between
    TIPC service name and socket port ID. When tipc supports namespace,
    it allows users to publish service names only owned by a certain
    namespace. Therefore, every namespace must have its private name
    table to prevent service names published to one namespace from being
    contaminated by other service names in another namespace. Therefore,
    The name table global variable (ie, nametbl) and its lock must be
    moved to tipc_net structure, and a parameter of namespace must be
    added for necessary functions so that they can obtain name table
    variable defined in tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Global variables associated with node table are below:
    - node table list (node_htable)
    - node hash table list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make node table support namespace, above global variables must be
    moved to tipc_net structure in order to keep secret for different
    namespaces. As a consequence, these variables are allocated and
    initialized when namespace is created, and deallocated when namespace
    is destroyed. After the change, functions associated with these
    variables have to utilize a namespace pointer to access them. So
    adding namespace pointer as a parameter of these functions is the
    major change made in the commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

27 Nov, 2014

1 commit

  • The node subscribe infrastructure represents a virtual base class, so
    its users, such as struct tipc_port and struct publication, can derive
    its implemented functionalities. However, after the removal of struct
    tipc_port, struct publication is left as its only single user now. So
    defining an abstract infrastructure for one user becomes no longer
    reasonable. If corresponding new functions associated with the
    infrastructure are moved to name_table.c file, the node subscription
    infrastructure can be removed as well.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

02 Sep, 2014

1 commit

  • TIPC name table updates are distributed asynchronously in a cluster,
    entailing a risk of certain race conditions. E.g., if two nodes
    simultaneously issue conflicting (overlapping) publications, this may
    not be detected until both publications have reached a third node, in
    which case one of the publications will be silently dropped on that
    node. Hence, we end up with an inconsistent name table.

    In most cases this conflict is just a temporary race, e.g., one
    node is issuing a publication under the assumption that a previous,
    conflicting, publication has already been withdrawn by the other node.
    However, because of the (rtt related) distributed update delay, this
    may not yet hold true on all nodes. The symptom of this failure is a
    syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

    In this commit we add a resiliency queue at the receiving end of
    the name table distributor. When insertion of an arriving publication
    fails, we retain it in this queue for a short amount of time, assuming
    that another update will arrive very soon and clear the conflict. If so
    happens, we insert the publication, otherwise we drop it.

    The (configurable) retention value defaults to 2000 ms. Knowing from
    experience that the situation described above is extremely rare, there
    is no risk that the queue will accumulate any large number of items.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
     

17 Jul, 2014

1 commit

  • In a previous commit series ("tipc: new unicast transmission code")
    we introduced a new message sending function, tipc_link_xmit2(),
    and moved the unicast data users over to use that function. We now
    let the internal name table distributor do the same.

    The interaction between the name distributor and the node/link
    layer also becomes significantly simpler, so we can eliminate
    the function tipc_link_names_xmit().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

06 May, 2014

1 commit


29 Apr, 2014

1 commit

  • Commit a89778d8baf19cd7e728d81121a294a06cedaad1 ("tipc: add support
    for link state subscriptions") introduced below possible deadlock
    scenario:

    CPU0 CPU1
    T0: tipc_publish() link_timeout()
    T1: tipc_nametbl_publish() [grab node lock]*
    T2: [grab nametbl write lock]* link_state_event()
    T3: named_cluster_distribute() link_activate()
    T4: [grab node lock]* tipc_node_link_up()
    T5: tipc_nametbl_publish()
    T6: [grab nametble write lock]*

    The opposite order of holding nametbl write lock and node lock on
    above two different paths may result in a deadlock. If we move the
    the delivery of named messages via link out of name nametbl lock,
    the reverse order of holding locks will be eliminated, as a result,
    the deadlock will be killed as well.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

19 Feb, 2014

1 commit

  • Rename the following functions, which are shorter and more in line
    with common naming practice in the network subsystem.

    tipc_bclink_send_msg->tipc_bclink_xmit
    tipc_bclink_recv_pkt->tipc_bclink_rcv
    tipc_disc_recv_msg->tipc_disc_rcv
    tipc_link_send_proto_msg->tipc_link_proto_xmit
    link_recv_proto_msg->tipc_link_proto_rcv
    link_send_sections_long->tipc_link_iovec_long_xmit
    tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
    tipc_link_send_sync->tipc_link_sync_xmit
    tipc_link_recv_sync->tipc_link_sync_rcv
    tipc_link_send_buf->__tipc_link_xmit
    tipc_link_send->tipc_link_xmit
    tipc_link_send_names->tipc_link_names_xmit
    tipc_named_recv->tipc_named_rcv
    tipc_link_recv_bundle->tipc_link_bundle_rcv
    tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
    link_send_long_buf->tipc_link_frag_xmit

    tipc_multicast->tipc_port_mcast_xmit
    tipc_port_recv_mcast->tipc_port_mcast_rcv
    tipc_port_reject_sections->tipc_port_iovec_reject
    tipc_port_recv_proto_msg->tipc_port_proto_rcv
    tipc_connect->tipc_port_connect
    __tipc_connect->__tipc_port_connect
    __tipc_disconnect->__tipc_port_disconnect
    tipc_disconnect->tipc_port_disconnect
    tipc_shutdown->tipc_port_shutdown
    tipc_port_recv_msg->tipc_port_rcv
    tipc_port_recv_sections->tipc_port_iovec_rcv

    release->tipc_release
    accept->tipc_accept
    bind->tipc_bind
    get_name->tipc_getname
    poll->tipc_poll
    send_msg->tipc_sendmsg
    send_packet->tipc_send_packet
    send_stream->tipc_send_stream
    recv_msg->tipc_recvmsg
    recv_stream->tipc_recv_stream
    connect->tipc_connect
    listen->tipc_listen
    shutdown->tipc_shutdown
    setsockopt->tipc_setsockopt
    getsockopt->tipc_getsockopt

    Above changes have no impact on current users of the functions.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

11 Feb, 2007

1 commit


18 Jan, 2006

1 commit


13 Jan, 2006

4 commits