19 Aug, 2019

1 commit

  • The policy for handling the skb list locks on the send and receive paths
    is simple.

    - On the send path we never need to grab the lock on the 'xmitq' list
    when the destination is an external node.

    - On the receive path we always need to grab the lock on the 'inputq'
    list, irrespective of source node.

    However, when transmitting node local messages those will eventually
    end up on the receive path of a local socket, meaning that the argument
    'xmitq' in tipc_node_xmit() will become the 'inputq' argument in the
    function tipc_sk_rcv(). This has been handled by always initializing
    the spinlock of the 'xmitq' list at message creation, just in case it
    may end up on the receive path later, and despite knowing that the lock
    in most cases never will be used.

    This approach is inaccurate and confusing, and has also concealed the
    fact that the stated 'no lock grabbing' policy for the send path is
    violated in some cases.

    We now clean this up by never initializing the lock at message creation,
    instead doing this at the moment we find that the message actually will
    enter the receive path. At the same time we fix the four locations
    where we incorrectly access the spinlock on the send/error path.

    This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
    is initialised") which has now become redundant.

    CC: Eric Dumazet
    Reported-by: Chris Packham
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Reviewed-by: Xin Long
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Jun, 2019

1 commit

  • Syzbot reported a memleak caused by grp members' deferredq list not
    purged when the grp is deleted.

    The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) holds
    in tipc_group_filter_msg(), so that the skb stays in deferredq.

    So fix it by calling __skb_queue_purge for each member's deferredq
    in tipc_group_delete() when a tipc sk leaves the grp.
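
    The fix can be pictured with the minimal user-space sketch below. It is
    not the kernel code: struct buf, struct member, queue_purge() and
    group_delete() are hypothetical stand-ins for sk_buff, tipc_member,
    __skb_queue_purge() and tipc_group_delete().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for sk_buff and tipc_member. */
struct buf { struct buf *next; };
struct member { struct buf *deferredq; struct member *next; };

/* Free every buffer still parked on one member's deferred queue. */
static int queue_purge(struct buf **q)
{
    int freed = 0;

    assert(q != NULL);
    while (*q) {
        struct buf *b = *q;

        *q = b->next;
        free(b);
        freed++;
    }
    return freed;
}

/* Delete all members; purge each deferred queue first so nothing leaks. */
static int group_delete(struct member *head)
{
    int freed = 0;

    while (head) {
        struct member *m = head;

        head = m->next;
        freed += queue_purge(&m->deferredq);
        free(m);
    }
    return freed;   /* number of deferred buffers released */
}
```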

    Fixes: b87a5ea31c93 ("tipc: guarantee group unicast doesn't bypass group broadcast")
    Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Xin Long
     

28 Apr, 2019

1 commit

  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)
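
    Why the flag cannot simply be added everywhere can be shown with a small
    standalone sketch. The three constants mirror the netlink UAPI header;
    nest_type_noflag(), nest_type() and type_masked() are hypothetical
    helpers that only model the type field, not the real nla_nest_start()
    family.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the netlink UAPI header; redefined so the sketch is standalone. */
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Old behaviour: the nest header carries the bare attribute type. */
static uint16_t nest_type_noflag(int attrtype)
{
    assert(attrtype >= 0 && attrtype < NLA_F_NET_BYTEORDER);
    return (uint16_t)attrtype;
}

/* New behaviour: the same type with NLA_F_NESTED or'ed in. */
static uint16_t nest_type(int attrtype)
{
    return (uint16_t)(nest_type_noflag(attrtype) | NLA_F_NESTED);
}

/* What a flag-aware parser does before comparing types. */
static int type_masked(uint16_t nla_type)
{
    return nla_type & NLA_TYPE_MASK;
}
```

    A parser comparing the raw field against the attribute number stops
    matching once the flag is set, while one that masks first is unaffected.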

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

17 Mar, 2019

1 commit


19 Oct, 2018

1 commit


22 Jul, 2018

1 commit

  • Fixes the following sparse warnings:

    net/tipc/link.c:376:5: warning: symbol 'link_bc_rcv_gap' was not declared. Should it be static?
    net/tipc/link.c:823:6: warning: symbol 'link_prepare_wakeup' was not declared. Should it be static?
    net/tipc/link.c:959:6: warning: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
    net/tipc/link.c:1009:5: warning: symbol 'tipc_link_retrans' was not declared. Should it be static?
    net/tipc/monitor.c:687:5: warning: symbol '__tipc_nl_add_monitor_peer' was not declared. Should it be static?
    net/tipc/group.c:230:20: warning: symbol 'tipc_group_find_member' was not declared. Should it be static?

    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

19 Jul, 2018

1 commit


30 Jun, 2018

1 commit


06 Mar, 2018

1 commit


28 Feb, 2018

1 commit

  • In commit 60c253069632 ("tipc: fix race between poll() and
    setsockopt()") we introduced a pointer from struct tipc_group to the
    'group_is_connected' flag in struct tipc_sock, so that this field can
    be checked without dereferencing the group pointer of the latter struct.

    The initial value for this flag is correctly set to 'false' when a
    group is created, but we miss the case when no group is created at
    all, in which case the initial value should be 'true'. This has the
    effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
    POLLOUT if they request so.

    This commit corrects this bug.
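
    A minimal sketch of the arrangement, under the assumption that the flag
    lives in the socket and the group only points at it. The struct layouts
    and the helpers sk_init(), sk_join() and sk_writable() are hypothetical
    and much reduced compared to the real tipc_sock and tipc_group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct group;

/* Hypothetical, reduced socket: the flag lives here, not in the group. */
struct sock {
    bool group_is_connected;
    struct group *group;
};

struct group {
    bool *open;     /* points at the owning socket's flag */
};

/* No group at all: the socket must start out writable. */
static void sk_init(struct sock *sk)
{
    assert(sk);
    sk->group = NULL;
    sk->group_is_connected = true;
}

/* Creating a group resets the flag to false through the pointer. */
static void sk_join(struct sock *sk, struct group *grp)
{
    assert(sk && grp);
    sk->group = grp;
    grp->open = &sk->group_is_connected;
    *grp->open = false;
}

/* poll() side: reads only the socket's own field, never sk->group. */
static bool sk_writable(const struct sock *sk)
{
    return sk->group_is_connected;
}
```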

    Fixes: 60c253069632 ("tipc: fix race between poll() and setsockopt()")
    Reported-by: Hoang Le
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Feb, 2018

1 commit

  • We rename struct tipc_server to struct tipc_topsrv. This reflects its now
    specialized role as topology server. Accordingly, we change or add function
    prefixes to make it clearer which functionality those belong to.

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

20 Jan, 2018

1 commit

  • Letting tipc_poll() dereference a socket's pointer to struct tipc_group
    entails a race risk, as the group item may be deleted in a concurrent
    tipc_sk_join() or tipc_sk_leave() thread.

    We now move the 'open' flag in struct tipc_group to struct tipc_sock,
    and let the former retain only a pointer to the moved field. This will
    eliminate the race risk.

    Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

10 Jan, 2018

9 commits

  • The current criterion for returning POLLOUT from a group member socket is
    too simplistic. It basically returns POLLOUT as soon as the group has
    external destinations, something obviously leading to a lot of spinning
    during destination congestion situations. At the same time, the internal
    congestion handling is unnecessarily complex.

    We now change this as follows.

    - We introduce an 'open' flag in struct tipc_group. This flag is used
    only to help poll() get the setting of POLLOUT right, and *not* for
    congestion handling as such. This means that a user can choose to
    ignore an EAGAIN for a destination and go on sending messages to
    other destinations in the group if he wants to.

    - The flag is set to false every time we return EAGAIN on a send call.

    - The flag is set to true every time any member, i.e., not necessarily
    the member that caused EAGAIN, is removed from the small_win list.

    - We remove the group member 'usr_pending' flag. The size of the send
    window and presence in the 'small_win' list are sufficient criteria
    for recognizing congestion.

    This solution seems to be a reasonable compromise between 'anycast',
    which is normally not waiting for POLLOUT for a specific destination,
    and the other three send modes, which are.
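
    The rules above can be sketched as follows. This is an illustration
    only: the reduced struct group and the helpers group_send(),
    small_win_remove() and group_pollout() are hypothetical names, and the
    congestion test itself is passed in as a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced group state; field names are illustrative. */
struct group {
    bool open;            /* consulted by poll() only */
    int  small_win_cnt;   /* members currently on the small_win list */
};

/* Send attempt: a congested destination yields EAGAIN and closes the flag. */
static int group_send(struct group *g, bool dest_congested)
{
    assert(g);
    if (dest_congested) {
        g->open = false;
        return -EAGAIN;
    }
    return 0;
}

/* Any member leaving the small_win list reopens the group for poll(). */
static void small_win_remove(struct group *g)
{
    assert(g && g->small_win_cnt > 0);
    g->small_win_cnt--;
    g->open = true;
}

/* poll() helper: report writability from the flag alone. */
static bool group_pollout(const struct group *g)
{
    return g->open;
}
```

    Because the flag is not tied to the member that caused EAGAIN, the
    caller stays free to keep sending to other destinations.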

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • When a member joins a group, it also indicates a binding scope. This
    makes it possible to create both node local groups, invisible to other
    nodes, as well as cluster global groups, visible everywhere.

    In order to avoid that different members end up having permanently
    differing views of group size and membership, we must inhibit locally
    and globally bound members from joining the same group.

    We do this by using the binding scope as an additional separator between
    groups. I.e., a member must ignore all membership events from sockets
    using a different scope than itself, and all lookups for message
    destinations must require an exact match between the message's lookup
    scope and the potential target's binding scope.

    Apart from making it possible to create local groups using the same
    identity on different nodes, a side effect of this is that it now also
    becomes possible to create a cluster global group with the same identity
    across the same nodes, without interfering with the local groups.
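
    The exact-match rule can be sketched like this. The enum and struct
    names, as well as dest_matches() and accept_member_event(), are
    hypothetical and only illustrate how the scope acts as an extra key.

```c
#include <assert.h>
#include <stdbool.h>

enum scope { SCOPE_CLUSTER, SCOPE_NODE };   /* illustrative names */

/* Hypothetical binding table entry for one group member. */
struct binding {
    unsigned int type;
    unsigned int instance;
    enum scope scope;
};

/* A destination qualifies only if the scopes match exactly. */
static bool dest_matches(const struct binding *b, unsigned int type,
                         unsigned int instance, enum scope lookup_scope)
{
    assert(b);
    return b->type == type && b->instance == instance &&
           b->scope == lookup_scope;
}

/* A member ignores membership events from sockets bound with another scope. */
static bool accept_member_event(enum scope own, enum scope origin)
{
    return own == origin;
}
```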

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Currently, when a user is subscribing for binding table publications,
    he will receive a PUBLISH event for all already existing matching items
    in the binding table.

    However, a group socket making a subscription doesn't need this initial
    status update from the binding table, because it has already scanned it
    during the join operation. Worse, the multiplicatory effect of issuing
    mutual events for dozens or hundreds of group members within a short time
    frame puts a heavy load on the topology server, with the end result that
    scale out operations on a big group tend to take much longer than needed.

    We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
    subscriptions, so that this initial avalanche of events is suppressed.
    This change, along with the previous commit, significantly improves the
    range and speed of group scale out operations.

    We keep the new option internal for the tipc driver, at least for now.
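
    The effect of the filter can be sketched as below. The numeric value of
    SUB_NO_STATUS, the struct pub layout and the function subscribe() are
    assumptions made for the illustration, not the topology server code.

```c
#include <assert.h>
#include <stddef.h>

#define SUB_NO_STATUS 0x80u   /* assumed bit value, for illustration only */

/* Hypothetical binding table entry. */
struct pub { unsigned int type; };

/* Returns how many initial PUBLISH events a new subscriber is sent. */
static int subscribe(const struct pub *tbl, size_t n,
                     unsigned int type, unsigned int filter)
{
    int events = 0;
    size_t i;

    assert(tbl || n == 0);
    if (filter & SUB_NO_STATUS)
        return 0;               /* suppress the initial status update */
    for (i = 0; i < n; i++)
        if (tbl[i].type == type)
            events++;
    return events;
}
```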

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • When a socket is joining a group, we look up in the binding table to
    find if there are already other members of the group present. This is
    used for being able to return EAGAIN instead of EHOSTUNREACH if the
    user proceeds directly to a send attempt.

    However, the information in the binding table can be used to directly
    set the created member in state MBR_PUBLISHED and send a JOIN message
    to the peer, instead of waiting for a topology PUBLISH event to do this.
    When there are many members in a group, the propagation time for such
    events can be significant, and we can save time during the join
    operation if we use the initial lookup result fully.

    In this commit, we eliminate the member state MBR_DISCOVERED which has
    been the result of the initial lookup, and do instead go directly to
    MBR_PUBLISHED, which initiates the setup.

    After this change, the tipc_member FSM looks as follows:

         +-----------+
    ---->| PUBLISHED |-----------------------------------------------+
    PUB- +-----------+                                LEAVE/WITHDRAW |
    LISH       |JOIN                                                 |
               |     +-------------------------------------------+   |
               |     |                            LEAVE/WITHDRAW |   |
               |     |                +------------+             |   |
               |     |   +----------->|  PENDING   |---------+   |   |
               |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
               |     |   |              |   |       WITHDRAW |   |   |
               |     |   |   +----------+   |                |   |   |
               |     |   |   |revert/maxactv|                |   |   |
               |     |   |   V              V                V   V   V
               |   +----------+  msg  +------------+       +-----------+
               +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
               |   +----------+       +-----+------+ LEAVE/+-----------+DOWN
               |        A   A               |      WITHDRAW A   A    A  EVT
               |        |   |               |RECLAIM        |   |    |
               |        |   |REMIT          V               |   |    |
               |        |   |== adv   +------------+        |   |    |
               |        |   +---------| RECLAIMING |--------+   |    |
               |        |             +-----+------+   LEAVE/   |    |
               |        |                   |REMIT   WITHDRAW   |    |
               |        |                   |< adv              |    |
               |        |msg/               V            LEAVE/ |    |
               |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
               |        +-------------|  REMITTED  |------------+    |
               |                      +------------+                 |
               |PUBLISH                                              |
    JOIN +-----------+                                LEAVE/WITHDRAW |
    ---->|  JOINING  |-----------------------------------------------+
         +-----------+

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy

    Signed-off-by: David S. Miller

    Jon Maloy
     
  • After the changes in the previous commit the group LEAVE sequence
    can be simplified.

    We now let the arrival of a LEAVE message unconditionally issue a group
    DOWN event to the user. When a topology WITHDRAW event is received, the
    member, if it is still there, is set to state LEAVING, but we only issue a
    group DOWN event when the link to the peer node is gone, so that no
    LEAVE message is to be expected.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In the current implementation, a group socket receiving topology
    events about other members just converts the topology event message
    into a group event message and stores it until it reaches the right
    state to issue it to the user. This complicates the code unnecessarily,
    and becomes impractical when we in the coming commits will need to
    create and issue membership events independently.

    In this commit, we change this so that we just notice the type and
    origin of the incoming topology event, and then drop the buffer. Only
    when it is time to actually send a group event to the user do we
    explicitly create a new message and send it upwards.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Analysis reveals that the member state MBR_QUARANTINED in reality is
    unnecessary, and can be replaced by the state MBR_JOINING at all
    occurrences.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We handle a corner case in the function tipc_group_update_rcv_win().
    During extreme pressure it might happen that a message receiver has all
    its active senders in RECLAIMING or REMITTED mode, meaning that there
    is nobody to reclaim advertisements from if an additional sender tries
    to go active.

    Currently we just set the new sender to ACTIVE anyway, hence at least
    theoretically opening up for a receiver queue overflow by exceeding the
    MAX_ACTIVE limit. The correct solution to this is to instead add the
    member to the pending queue, while letting the oldest member in that
    queue revert to JOINED state.

    In this commit we refactor the code for handling message arrival from
    a JOINED member, both to make it more comprehensible and to cover the
    case described above.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • - We remove the 'reclaiming' member list in struct tipc_group, since
    it doesn't serve any purpose.

    - We simplify the GRP_REMIT_MSG branch of tipc_group_protocol_rcv().

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

09 Jan, 2018

1 commit


06 Jan, 2018

2 commits

  • We simplify the sorting algorithm in tipc_update_member(). We also make
    the remaining conditional call to this function unconditional, since the
    same condition now is tested for inside the said function.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We rename some functions and variables, to make their purpose clearer.

    - tipc_group::congested -> tipc_group::small_win. Members in this list
    are not necessarily (and typically not) congested. Instead, they may
    *potentially* be subject to congestion because their send window is
    less than ADV_IDLE, and therefore need to be checked during message
    transmission.

    - tipc_group_is_receiver() -> tipc_group_is_sender(). This socket will
    accept messages coming from members fulfilling this condition, i.e.,
    they are senders from this member's viewpoint.

    - tipc_group_is_enabled() -> tipc_group_is_receiver(). Members
    fulfilling this condition will accept messages sent from the current
    socket, i.e., they are receivers from its viewpoint.

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

03 Jan, 2018

1 commit

  • In commit 04d7b574b245 ("tipc: add multipoint-to-point flow control") we
    introduced a protocol for preventing buffer overflow when many group
    members try to simultaneously send messages to the same receiving member.

    Stress test of this mechanism has revealed a couple of related bugs:

    - When the receiving member receives an advertisement REMIT message from
    one of the senders, it will sometimes prematurely activate a pending
    member and send it the remitted advertisement, although the upper
    limit for active senders has been reached. This leads to accumulation
    of illegal advertisements, and eventually to messages being dropped
    because of receive buffer overflow.

    - When the receiving member leaves REMITTED state while a received
    message is being read, we fail to look at the pending queue to
    activate the oldest pending peer. This leads to some pending senders
    being starved out, and never getting the opportunity to profit from
    the remitted advertisement.

    We fix the former in the function tipc_group_proto_rcv() by returning
    directly from the function once it becomes clear that the remitting
    peer cannot leave REMITTED state at that point.

    We fix the latter in the function tipc_group_update_rcv_win() by looking
    up and activating the longest pending peer when it becomes clear that the
    remitting peer now can leave REMITTED state.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

27 Dec, 2017

2 commits

  • When a group member receives a member WITHDRAW event, this might have
    two reasons: either the peer member is leaving the group, or the link
    to the member's node has been lost.

    In the latter case we need to issue a DOWN event to the user right away,
    and let function tipc_group_filter_msg() perform delete of the member
    item. However, in this case we fail to change the state of the member
    item to MBR_LEAVING, so the member item is not deleted, and we have a
    memory leak.

    We now distinguish better between the four sub-cases of a WITHDRAW event
    and make sure that each case is handled correctly.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In commit 2f487712b893 ("tipc: guarantee that group broadcast doesn't
    bypass group unicast") we introduced a mechanism that requires the first
    (replicated) broadcast sent after a unicast to be acknowledged by all
    receivers before permitting sending of the next (true) broadcast.

    The counter for keeping track of the number of acknowledges to expect
    is based on the tipc_group::member_cnt variable. But this misses that
    some of the known members may not be ready for reception, and will never
    acknowledge the message, either because they haven't fully joined the
    group or because they are leaving the group. Such members are identified
    by not fulfilling the condition tested for in the function
    tipc_group_is_enabled().

    We now set the counter for the actual number of acks to receive at the
    moment the message is sent, by just counting the number of recipients
    satisfying the tipc_group_is_enabled() test.
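
    Counting at send time can be sketched as follows. The struct member
    below is a hypothetical reduction in which the boolean 'enabled' stands
    for the outcome of the tipc_group_is_enabled() test; expected_acks() is
    an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical member record; 'enabled' models tipc_group_is_enabled(). */
struct member { bool enabled; };

/* Count the acks to expect: only members that are ready for reception. */
static int expected_acks(const struct member *mbrs, size_t n)
{
    int acks = 0;
    size_t i;

    assert(mbrs || n == 0);
    for (i = 0; i < n; i++)
        if (mbrs[i].enabled)
            acks++;
    return acks;
}
```

    Using the plain member count instead would leave the sender waiting for
    acknowledges that joining or leaving members never send.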

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

21 Dec, 2017

1 commit

  • When we receive a JOIN message from a peer member, the message may
    contain an advertised window value ADV_IDLE that permits removing the
    member in question from the tipc_group::congested list. However, since
    the removal has been made conditional on that the advertised window is
    *not* ADV_IDLE, we miss this case. This has the effect that a sender
    sometimes may enter a state of permanent, false, broadcast congestion.

    We fix this by unconditionally removing the member from the congested
    list before calling tipc_member_update(), which might potentially sort
    it into the list again.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

20 Dec, 2017

1 commit

  • When, during a join operation, or during message transmission, a group
    member needs to be added to the group's 'congested' list, we sort it
    into the list in ascending order, according to its current advertised
    window size. However, we miss the case when the member is already on
    that list. This will have the result that the member, after the window
    size has been decremented, might be at the wrong position in that list.
    This again may have the effect that we during broadcast and multicast
    transmissions miss the fact that a destination is not yet ready for
    reception, and we end up sending anyway. From this point on, the
    behavior during the remaining session is unpredictable, e.g., with
    underflowing window sizes.

    We now correct this bug by unconditionally removing the member from
    the list before (re-)sorting it in.
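
    A minimal sketch of the remove-then-sort step, using a hand-rolled
    circular doubly linked list rather than the kernel's list_head. The
    struct member layout and the helpers list_init(), list_del_init() and
    update_member() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical member with an intrusive, circular doubly linked node. */
struct member {
    struct member *prev, *next;   /* both point to self when unlinked */
    int window;                   /* current advertised window */
};

static void list_init(struct member *m)
{
    m->prev = m->next = m;
}

/* Unlink m; harmless if m is not on any list. */
static void list_del_init(struct member *m)
{
    m->prev->next = m->next;
    m->next->prev = m->prev;
    list_init(m);
}

/* Re-sort m into the list headed by 'head', ascending by window. */
static void update_member(struct member *head, struct member *m, int window)
{
    struct member *pos;

    assert(head && m && m != head);
    m->window = window;
    list_del_init(m);             /* always remove first, even if listed */
    for (pos = head->next; pos != head; pos = pos->next)
        if (pos->window > m->window)
            break;
    /* Insert m before pos, or at the tail when pos == head. */
    m->prev = pos->prev;
    m->next = pos;
    pos->prev->next = m;
    pos->prev = m;
}
```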

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

19 Dec, 2017

2 commits

  • A group member going into state LEAVING should never go back to any
    other state before it is finally deleted. However, this might happen
    if the socket needs to send out a RECLAIM message during this interval.
    Since we forget to remove the leaving member from the group's 'active'
    or 'pending' list, the member might be selected for reclaiming, change
    state to RECLAIMING, and get stuck in this state instead of being
    deleted. This might lead to suppression of the expected 'member down'
    event to the receiver.

    We fix this by removing the member from all lists, except the RB tree,
    at the moment it goes into state LEAVING.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Group messages are not supposed to be returned to sender when the
    destination socket disappears. This is done correctly for regular
    traffic messages, by setting the 'dest_droppable' bit in the header.
    But we forget to do that in group protocol messages. This has the effect
    that such messages may sometimes bounce back to the sender, be perceived
    as a legitimate peer message, and wreak general havoc for the rest of
    the session. In particular, we have seen that a member in state LEAVING
    may go back to state RECLAIMED or REMITTED, hence causing suppression
    of an otherwise expected 'member down' event to the user.

    We fix this by setting the 'dest_droppable' bit even in group protocol
    messages.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

28 Nov, 2017

1 commit


21 Nov, 2017

1 commit

  • When the function tipc_group_filter_msg() finds that a member event
    indicates that the member is leaving the group, it first deletes the
    member instance, and then purges the message queue being handled
    by the call. But the message queue is an aggregated field in the
    just deleted item, leading the purge call to access freed memory.

    We fix this by swapping the order of the two actions.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

13 Oct, 2017

7 commits

  • We already have point-to-multipoint flow control within a group. But
    we also need the opposite: a scheme which can handle that potentially
    hundreds of sources may try to send messages to the same destination
    simultaneously without causing buffer overflow at the recipient. This
    commit adds such a mechanism.

    The algorithm works as follows:

    - When a member detects a new, joining member, it initially sets its
    state to JOINED and advertises a minimum window to the new member.
    This window is chosen so that the new member can send exactly one
    maximum sized message, or several smaller ones, to the recipient
    before it must stop and wait for an additional advertisement. This
    minimum window ADV_IDLE is set to 65 1kB blocks.

    - When a member receives the first data message from a JOINED member,
    it changes the state of the latter to ACTIVE, and advertises a larger
    window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can
    continue sending with minimal disturbances to the data flow.

    - The active members are kept in a dedicated linked list. Each time a
    message is received from an active member, it will be moved to the
    tail of that list. This way, we keep a record of which members have
    been most (tail) and least (head) recently active.

    - There is a maximum number (16) of permitted simultaneous active
    senders per receiver. When this limit is reached, the receiver will
    not advertise anything immediately to a new sender, but instead put
    it in a PENDING state, and add it to a corresponding queue. At the
    same time, it will pick the least recently active member, send it an
    advertisement RECLAIM message, and set this member to state
    RECLAIMING.

    - The reclaimee member has to respond with a REMIT message, meaning that
    it goes back to a send window of ADV_IDLE, and returns its unused
    advertised blocks beyond that value to the reclaiming member.

    - When the reclaiming member receives the REMIT message, it unlinks
    the reclaimee from its active list, resets its state to JOINED, and
    notes that it is now back at ADV_IDLE advertised blocks to that
    member. If there are still unread data messages sent out by
    reclaimee before the REMIT, the member goes into an intermediate
    state REMITTED, where it stays until the said messages have been
    consumed.

    - The returned advertised blocks can now be re-advertised to the
    pending member, which is now set to state ACTIVE and added to
    the active member list.

    - To be proactive, i.e., to minimize the risk that any member will
    end up in the pending queue, we start reclaiming resources already
    when the number of active members exceeds 3/4 of the permitted
    maximum.
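
    The receiver's decision on the first data message from a JOINED sender
    can be sketched as below. The constants come from the description
    above; struct receiver, on_first_msg() and the exact point at which the
    3/4 threshold is tested are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ADV_IDLE      65                    /* minimum window, 1 kB blocks */
#define ADV_ACTIVE    (12 * ADV_IDLE)       /* window of an active sender */
#define MAX_ACTIVE    16                    /* simultaneous active senders */
#define RECLAIM_LIMIT (MAX_ACTIVE * 3 / 4)  /* reclaim proactively above this */

enum mbr_state { JOINED, ACTIVE, PENDING };

/* Hypothetical, reduced receiver state. */
struct receiver { int active_cnt; };

/*
 * Returns the sender's new state. *window is raised to ADV_ACTIVE when the
 * sender goes active and left alone otherwise; *reclaim tells the caller to
 * send RECLAIM to the least recently active sender.
 */
static enum mbr_state on_first_msg(struct receiver *r, int *window,
                                   bool *reclaim)
{
    assert(r && window && reclaim);
    if (r->active_cnt >= MAX_ACTIVE) {
        *reclaim = true;          /* nothing new is advertised for now */
        return PENDING;
    }
    r->active_cnt++;
    *window = ADV_ACTIVE;
    *reclaim = r->active_cnt > RECLAIM_LIMIT;
    return ACTIVE;
}
```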

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The following scenario is possible:
    - A user sends a broadcast message, and thereafter immediately leaves
    the group.
    - The LEAVE message, following a different path than the broadcast,
    arrives ahead of the broadcast, and the sending member is removed
    from the receiver's list.
    - The broadcast message arrives, but is dropped because the sender
    now is unknown to the recipient.

    We fix this by sequence numbering membership events, just like ordinary
    unicast messages. Currently, when a JOIN is sent to a peer, it contains
    a synchronization point, i.e., the sequence number of the next sent
    broadcast, in order to give the receiver a start synchronization point.
    We now let even LEAVE messages contain such an "end synchronization"
    point, so that the recipient can delay the removal of the sending member
    until it knows that all messages have been received.

    The received synchronization points are added as sequence numbers to the
    generated membership events, making it possible to handle them almost
    the same way as regular unicasts in the receiving filter function. In
    particular, a DOWN event with a too high sequence number will be kept
    in the reordering queue until the missing broadcast(s) arrive and have
    been delivered.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We need a mechanism guaranteeing that group unicasts sent out from a
    socket are not bypassed by later sent broadcasts from the same socket.
    We do this as follows:

    - Each time a unicast is sent, we set the broadcast method for the
    socket to "replicast" and "mandatory". This forces the first
    subsequent broadcast message to follow the same network and data path
    as the preceding unicast to a destination, hence preventing it from
    overtaking the latter.

    - In order to make the 'same data path' statement above true, we let
    group unicasts pass through the multicast link input queue, instead
    of as previously through the unicast link input queue.

    - In the first broadcast following a unicast, we set a new header flag,
    requiring all recipients to immediately acknowledge its reception.

    - During the period before all the expected acknowledges are received,
    the socket refuses to accept any more broadcast attempts, i.e., by
    blocking or returning EAGAIN. This period should typically not be
    longer than a few microseconds.

    - When all acknowledges have been received, the sending socket will
    open up for subsequent broadcasts, this time giving the link layer
    freedom to itself select the best transmission method.

    - The forced and/or abrupt transmission method changes described above
    may lead to broadcasts arriving out of order to the recipients. We
    remedy this by introducing code that checks and if necessary
    re-orders such messages at the receiving end.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Group unicast messages don't follow the same path as broadcast messages,
    and there is a high risk that unicasts sent from a socket might bypass
    previously sent broadcasts from the same socket.

    We fix this by letting all unicast messages carry the sequence number of
    the next sent broadcast from the same node, but without updating this
    number at the receiver. This way, a receiver can check and if necessary
    re-order such messages before they are added to the socket receive buffer.
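
    The receive-side check can be sketched as follows. seq_more() uses
    wrap-safe 16-bit arithmetic; struct rcv_state, the verdict enum and
    filter() are hypothetical, and dropping a stale broadcast is an
    assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe 16-bit comparison: true when 'left' is ahead of 'right'. */
static bool seq_more(uint16_t left, uint16_t right)
{
    return (uint16_t)(right - left) >= 32768u;
}

enum verdict { DELIVER, DEFER, DROP };

/* Hypothetical per-sender receive state. */
struct rcv_state { uint16_t bc_rcv_nxt; };

/* Decide what to do with one arriving group message from that sender. */
static enum verdict filter(struct rcv_state *s, uint16_t seqno, bool is_bcast)
{
    assert(s);
    if (seq_more(seqno, s->bc_rcv_nxt))
        return DEFER;             /* an earlier broadcast is still missing */
    if (!is_bcast)
        return DELIVER;           /* unicast: bc_rcv_nxt is not advanced */
    if (seqno != s->bc_rcv_nxt)
        return DROP;              /* assumed handling of a stale broadcast */
    s->bc_rcv_nxt++;
    return DELIVER;
}
```

    A unicast carrying sequence number 5 is thus held back while the
    receiver still expects broadcast 3, and released once broadcasts 3 and
    4 have been delivered.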

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The previously introduced message transport to all group members is
    based on the tipc multicast service, but is logically a broadcast
    service within the group, and that is what we call it.

    We now add functionality for sending messages to all group members
    having a certain identity. Correspondingly, we call this feature 'group
    multicast'. The service is using unicast when only one destination is
    found, otherwise it will use the bearer broadcast service to transfer
    the messages. In the latter case, the receiving members filter arriving
    messages by looking at the intended destination instance. If there is
    no match, the message will be dropped, while still being considered
    received and read as seen by the flow control mechanism.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In this commit, we make it possible to send connectionless unicast
    messages to any member corresponding to the given member identity,
    when there is more than one such member. The sender must use a
    TIPC_ADDR_NAME address to achieve this effect.

    We also perform load balancing between the destinations, i.e., we
    primarily select one which has advertised sufficient send window
    to not cause a block/EAGAIN delay, if any. This mechanism is
    overlaid on the always present round-robin selection.
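
    One way to picture the selection is the sketch below: start at the
    round-robin position and take the first destination whose advertised
    window is large enough, falling back to the plain round-robin choice
    when none is. struct dest and pick_dest() are hypothetical names, and
    the real lookup operates on the binding table rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destination record. */
struct dest { int window; };      /* advertised send window, in blocks */

/*
 * Pick one of n destinations for a message needing 'need' blocks.
 * *next holds the round-robin position and is advanced past the choice.
 */
static size_t pick_dest(const struct dest *d, size_t n, size_t *next, int need)
{
    size_t first, chosen, i;

    assert(d && next && n > 0);
    first = *next % n;
    chosen = first;               /* fallback: plain round-robin */
    for (i = 0; i < n; i++) {
        size_t idx = (first + i) % n;

        if (d[idx].window >= need) {
            chosen = idx;         /* no block/EAGAIN delay expected here */
            break;
        }
    }
    *next = (chosen + 1) % n;
    return chosen;
}
```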

    Anycast messages are subject to the same start synchronization
    and flow control mechanism as group broadcast messages.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We now make it possible to send connectionless unicast messages
    within a communication group. To send a message, the sender can use
    either a direct port address, aka port identity, or an indirect port
    name to be looked up.

    Messages of this type are subject to the same start synchronization
    and flow control mechanism as group broadcast messages.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy