21 Jan, 2017

3 commits

  • If the bearer carrying multicast messages supports broadcast, those
    messages will be sent to all cluster nodes, irrespective of whether
    these nodes host any actual destination sockets or not. This is clearly
    wasteful if the cluster is large and there are only a few real
    destinations for the message being sent.

    In this commit we extend the eligibility of the newly introduced
    "replicast" transmit option. We now make it possible for a user to
    select which method he wants to be used, either as a mandatory setting
    via setsockopt(), or as a relative setting where we let the broadcast
    layer decide which method to use based on the ratio between cluster
    size and the message's actual number of destination nodes.

    In the latter case, a sending socket must stick to a previously
    selected method until it enters an idle period of at least 5 seconds.
    This eliminates the risk of message reordering caused by method change,
    i.e., when changes to cluster size or number of destinations would
    otherwise mandate a new method to be used.
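
    The selection rule described above can be sketched as follows. This is
    a minimal user-space illustration, not the kernel code: the struct, the
    function name and the percentage threshold are assumptions made for the
    example; only the 5 second idle period comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

enum xmit_method { XMIT_BROADCAST, XMIT_REPLICAST };

/* Hypothetical per-socket state; field names are illustrative only. */
struct mc_method {
    bool mandatory;          /* true when the user forced a method */
    enum xmit_method cur;    /* method currently in use */
    uint64_t expires_ms;     /* time until which 'cur' must be kept */
};

#define METHOD_IDLE_MS 5000u /* idle period from the commit text */

/*
 * Pick a transmit method. 'threshold_pct' is an assumed tunable: replicast
 * is chosen when the destinations are at most that percentage of the
 * cluster size.
 */
static enum xmit_method select_method(struct mc_method *m, unsigned dests,
                                      unsigned cluster_size,
                                      unsigned threshold_pct, uint64_t now_ms)
{
    if (m->mandatory)
        return m->cur;

    /* Socket is still active: keep the method and extend the window. */
    if (now_ms < m->expires_ms) {
        m->expires_ms = now_ms + METHOD_IDLE_MS;
        return m->cur;
    }

    /* Socket has been idle long enough: re-evaluate from the ratio. */
    m->cur = (dests * 100u <= cluster_size * threshold_pct)
                 ? XMIT_REPLICAST : XMIT_BROADCAST;
    m->expires_ms = now_ms + METHOD_IDLE_MS;
    return m->cur;
}
```

    Because the method can only change after an idle period, packets sent
    via the old method have had time to be delivered before the first
    packet goes out via the new one.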

    Reviewed-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • TIPC multicast messages are currently carried over a reliable
    'broadcast link', making use of the underlying media's ability to
    transport packets as L2 broadcast or IP multicast to all nodes in
    the cluster.

    When the used bearer is lacking that ability, we can instead emulate
    the broadcast service by replicating and sending the packets over as
    many unicast links as needed to reach all identified destinations.
    We now introduce a new TIPC link-level 'replicast' service that does
    this.
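
    As a rough illustration of the idea, replicast amounts to a loop over
    the destination nodes. The callback type, the log struct and all names
    below are hypothetical and only serve the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that sends one copy to one node over unicast. */
typedef int (*ucast_send_fn)(uint32_t node, const void *pkt, size_t len,
                             void *ctx);

/*
 * Emulate broadcast by sending one copy of 'pkt' to each destination node.
 * Returns the number of nodes for which the send callback returned 0.
 */
static size_t replicast_xmit(const uint32_t *dests, size_t ndests,
                             const void *pkt, size_t len,
                             ucast_send_fn send, void *ctx)
{
    size_t ok = 0;

    for (size_t i = 0; i < ndests; i++)
        if (send(dests[i], pkt, len, ctx) == 0)
            ok++;
    return ok;
}

/* Example callback: records what was sent instead of touching a link. */
struct send_log {
    uint32_t last_node;
    size_t calls;
};

static int log_send(uint32_t node, const void *pkt, size_t len, void *ctx)
{
    struct send_log *log = ctx;

    (void)pkt;
    (void)len;
    log->last_node = node;
    log->calls++;
    return 0;
}
```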

    Reviewed-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • As a further preparation for the upcoming 'replicast' functionality,
    we add some necessary structs and functions for looking up and returning
    a list of all nodes that host destinations for a given multicast message.
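
    A destination node list of this kind could look roughly like the
    sketch below. The fixed-size array and the names are assumptions for
    illustration; the point is that several destination sockets on the
    same node yield a single entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEST_NODES 64   /* arbitrary bound for this sketch */

/* Hypothetical container: unique addresses of nodes hosting a destination. */
struct dest_nodes {
    uint32_t node[MAX_DEST_NODES];
    size_t count;
};

/* Add 'node' unless it is already present; returns true if it was added. */
static bool dest_nodes_add(struct dest_nodes *dn, uint32_t node)
{
    for (size_t i = 0; i < dn->count; i++)
        if (dn->node[i] == node)
            return false;
    if (dn->count == MAX_DEST_NODES)
        return false;
    dn->node[dn->count++] = node;
    return true;
}
```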

    Reviewed-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

30 Oct, 2016

1 commit

  • In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
    criteria") we tried to fix a problem with the initial synchronization
    of broadcast link acknowledge values. Unfortunately that solution is
    not sufficient to solve the issue.

    We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
    non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
    initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
    broadcast acknowledge numbers, leading to premature opening of the
    broadcast link. When the bypassed packets finally arrive, they are
    inadvertently accepted, and the already correctly initialized
    acknowledge number in the broadcast receive link is overwritten by
    the invalid (zero) value of the said packets. After this the broadcast
    link goes stale.

    We now fix this by marking the packets where we know the acknowledge
    value is or may be invalid, and then ignoring the acks from those.

    To this purpose, we claim an unused bit in the header to indicate that
    the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
    synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
    packets, plus those LINK_PROTOCOL packets sent out before the broadcast
    links are fully synchronized.
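
    The receive-side check can be sketched as below. The bit position and
    the struct layouts are assumptions for illustration and do not reflect
    the real TIPC header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, chosen only for this sketch. */
#define HDR_BC_ACK_INVALID (1u << 0)

struct proto_hdr {
    uint32_t flags;
    uint16_t bc_ack;     /* broadcast acknowledge number */
};

struct bc_rcv_link {
    uint16_t acked;      /* last accepted broadcast acknowledge number */
};

/* Accept the ack only when the sender did not mark it as invalid. */
static bool bc_ack_rcv(struct bc_rcv_link *l, const struct proto_hdr *h)
{
    if (h->flags & HDR_BC_ACK_INVALID)
        return false;
    l->acked = h->bc_ack;
    return true;
}
```

    A sender that never sets the bit is treated exactly as before, so a
    check of this shape does not reject packets from older nodes.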

    This minor protocol update is fully backwards compatible.

    Reported-by: John Thompson
    Tested-by: John Thompson
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

03 Sep, 2016

1 commit

  • When we send broadcasts in clusters of more than 70-80 nodes, we sometimes
    see the broadcast link resetting because of an excessive number of
    retransmissions. This is caused by a combination of two factors:

    1) A 'NACK crunch', where loss of broadcast packets is discovered
    and NACK'ed by several nodes simultaneously, leading to multiple
    redundant broadcast retransmissions.

    2) The fact that the NACKs themselves are also sent as broadcast, leading
    to excessive load and packet loss on the transmitting switch/bridge.

    This commit deals with the latter problem, by moving sending of
    broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
    to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
    bits in word 8 of the said message for this purpose, and introduce a
    new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change
    backwards compatible.
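
    Carrying a 10-bit value in an otherwise unused part of a header word,
    guarded by a capability check, can be sketched as follows. The shift,
    the capability value and the helper names are assumptions; only the
    field width comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define NACK_FIELD_BITS  10u
#define NACK_FIELD_MASK  ((1u << NACK_FIELD_BITS) - 1u)  /* 0x3ff */
#define NACK_FIELD_SHIFT 0u          /* assumed position inside the word */

#define CAP_BCAST_STATE_NACK (1u << 1)  /* illustrative value only */

/* Store 'val' in the 10-bit field, leaving all other bits untouched. */
static uint32_t word_set_nack(uint32_t word, uint32_t val)
{
    word &= ~(NACK_FIELD_MASK << NACK_FIELD_SHIFT);
    return word | ((val & NACK_FIELD_MASK) << NACK_FIELD_SHIFT);
}

static uint32_t word_get_nack(uint32_t word)
{
    return (word >> NACK_FIELD_SHIFT) & NACK_FIELD_MASK;
}

/* Only use the unicast STATE path when the peer advertises support. */
static bool peer_takes_state_nack(uint32_t peer_caps)
{
    return (peer_caps & CAP_BCAST_STATE_NACK) != 0;
}
```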

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

07 Mar, 2016

1 commit

  • Until now, we have kept a pre-allocated protocol message header
    aggregated into struct tipc_link. Apart from adding unnecessary
    footprint to the link instances, this requires extra code both to
    initialize and re-initialize it.

    We now remove this sub-optimization. This change also makes it
    possible to clean up the function tipc_build_proto_msg() and remove
    a couple of small functions that were accessing the mentioned header.
    In particular, we can replace all occurrences of the local function
    call link_own_addr(link) with the generic tipc_own_addr(net).

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Nov, 2015

1 commit

  • In our effort to have less code and include dependencies between
    entities such as node, link and bearer, we try to narrow down
    the exposed interface towards the node as much as possible.

    In this commit, we move the definition of struct tipc_node, along
    with many of its associated function declarations, from node.h to
    node.c. We also move some function definitions from link.c and
    name_distr.c to node.c, since they access fields in struct tipc_node
    that should not be externally visible. The moved functions are renamed
    according to new location, and made static whenever possible.

    There are no functional changes in this commit.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

8 commits

  • After the previous changes in this series, we can now remove some
    unused code and structures, both in the broadcast, link aggregation
    and link code.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have only been supporting a fixed MTU size of 1500 bytes
    for all broadcast media, irrespective of their actual capability.

    We now make the broadcast MTU adaptable to the carrying media, i.e.,
    we use the smallest MTU supported by any of the interfaces attached
    to TIPC.
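
    In its simplest form the rule is a minimum over the attached
    interfaces. The function name and the fallback parameter below are
    assumptions for this sketch.

```c
#include <stddef.h>

/*
 * Return the smallest MTU among the attached bearers, or 'fallback'
 * when no bearer is attached.
 */
static unsigned bcast_mtu(const unsigned *bearer_mtu, size_t n,
                          unsigned fallback)
{
    unsigned mtu;

    if (n == 0)
        return fallback;
    mtu = bearer_mtu[0];
    for (size_t i = 1; i < n; i++)
        if (bearer_mtu[i] < mtu)
            mtu = bearer_mtu[i];
    return mtu;
}
```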

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have been keeping track of the exact set of broadcast
    destinations through the helper structure tipc_node_map. This leads us to
    have to maintain a whole infrastructure for supporting this, including
    a pseudo-bearer and a number of functions to manipulate both the bearers
    and the node map correctly. Apart from the complexity, this approach is
    also limiting, as struct tipc_node_map only can support cluster local
    broadcast if we want to avoid it becoming excessively large. We want to
    eliminate this limitation, in order to enable introduction of scoped
    multicast in the future.

    A closer analysis reveals that it is unnecessary to maintain this "full
    set" overview; it is sufficient to keep a counter per bearer, indicating
    how many nodes can be reached via this bearer at the moment. The protocol
    is now robust enough to handle transitional discrepancies between the
    nominal number of reachable destinations, as expected by the broadcast
    protocol itself, and the number which is actually reachable at the
    moment. The initial broadcast synchronization, in conjunction with the
    retransmission mechanism, ensures that all packets will eventually be
    acknowledged by the correct set of destinations.
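
    The per-bearer bookkeeping then reduces to a handful of counters. The
    sketch below uses hypothetical names and an assumed bearer limit.

```c
#include <stddef.h>

#define MAX_BEARERS 3   /* assumed bound for this sketch */

struct bc_base {
    unsigned dests[MAX_BEARERS];   /* peers reachable via each bearer */
};

/* A peer became reachable via 'bearer'. */
static void bc_add_peer(struct bc_base *bb, size_t bearer)
{
    if (bearer < MAX_BEARERS)
        bb->dests[bearer]++;
}

/* A peer is no longer reachable via 'bearer'. */
static void bc_remove_peer(struct bc_base *bb, size_t bearer)
{
    if (bearer < MAX_BEARERS && bb->dests[bearer] > 0)
        bb->dests[bearer]--;
}

/* Number of peers currently reachable via 'bearer'. */
static unsigned bc_reachable(const struct bc_base *bb, size_t bearer)
{
    return bearer < MAX_BEARERS ? bb->dests[bearer] : 0;
}
```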

    This commit introduces these changes.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The code path for receiving broadcast packets is currently distinct
    from the unicast path. This leads to unnecessary code and data
    duplication, something that can be avoided with some effort.

    We now introduce separate per-peer tipc_link instances for handling
    broadcast packet reception. Each receive link keeps a pointer to the
    common, single, broadcast link instance, and can hence handle release
    and retransmission of send buffers as if they belonged to the own
    instance.

    Furthermore, we let each unicast link instance keep a reference to both
    the pertaining broadcast receive link, and to the common send link.
    This makes it possible for the unicast links to easily access data for
    broadcast link synchronization, as well as for carrying acknowledges for
    received broadcast packets.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast link instance (struct tipc_link) used for sending is
    currently aggregated into struct tipc_bclink. This means that we cannot
    use the regular tipc_link_create() function for initiating the link, but
    do instead have to initiate numerous fields directly from the
    bcast_init() function.

    We want to reduce dependencies between the broadcast functionality
    and the inner workings of tipc_link. In this commit, we introduce
    a new function tipc_bclink_create() to link.c, and allocate the
    instance of the link separately using this function.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast transmission link is currently instantiated when the
    network subsystem is started, i.e., on order from user space via netlink.

    This forces the broadcast transmission code to do unnecessary tests for
    the existence of the transmission link, both in single node mode and
    in network mode.

    In this commit, we do instead create the link during initialization of
    the name space, and remove it when it is stopped. The fact that the
    transmission link now has a guaranteed longer life cycle than any of its
    potential clients paves the way for further code simplifications
    and optimizations.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast lock will need to be acquired outside bcast.c in a later
    commit. For this reason, we move the lock to struct tipc_net. Consistent
    with the changes in the previous commit, we also introduce two new
    functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
    currently using tipc_bclink_lock()/unlock() will be phased out during
    the coming commits in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, a number of structure and function definitions related
    to the broadcast functionality are unnecessarily exposed in the file
    bcast.h. This obscures the fact that the external interface towards
    the broadcast link in fact is very narrow, and causes unnecessary
    recompilations of other files when anything changes in those
    definitions.

    In this commit, we move as many of those definitions as is currently
    possible to the file bcast.c.

    We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
    since the name does not correctly describe the contents of this
    struct, and will do so even less in the future, and because we want
    to use the term 'link' more appropriately in the functionality
    introduced later in this series.

    Finally, we rename a couple of functions, such as tipc_bclink_xmit()
    and others that will be kept in the future, to include the term 'bcast'
    instead.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2015

1 commit

  • We convert packet/message reception according to the same principle
    we have been using for message sending and timeout handling:

    We move the function tipc_rcv() to node.c, hence handling the initial
    packet reception at the link aggregation level. The function grabs
    the node lock, selects the receiving link, and accesses it via a new
    call tipc_link_rcv(). This function appends buffers to the input
    queue for delivery upwards, but it may also append outgoing packets
    to the xmit queue, just as we do during regular message sending. The
    latter will happen when buffers are forwarded from the link backlog,
    or when retransmission is requested.

    Upon return of this function, and after having released the node lock,
    tipc_rcv() delivers/transmits the contents of those queues, but it may
    also perform actions such as link activation or reset, as indicated by
    the return flags from the link.

    This reduces the number of cpu cycles spent inside the node spinlock,
    and reduces contention on that lock.
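
    The structure described above, queueing under the lock and acting
    after releasing it, can be sketched as below. A boolean stands in for
    the node spinlock, integers stand in for packets, and all names and
    the flag value are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8
#define LINK_UP_EVT 0x1   /* hypothetical return flag */

struct queue {
    int buf[QLEN];
    size_t len;
};

struct node {
    bool locked;          /* stand-in for the node spinlock */
    int delivered;
    int transmitted;
    bool link_up;
};

static void q_add(struct queue *q, int pkt)
{
    if (q->len < QLEN)
        q->buf[q->len++] = pkt;
}

/* Runs under the node lock: only queues work and returns event flags. */
static int link_rcv(int pkt, struct queue *inputq, struct queue *xmitq)
{
    if (pkt < 0) {            /* stand-in for a retransmission request */
        q_add(xmitq, -pkt);
        return 0;
    }
    if (pkt == 0)             /* stand-in for a link activation message */
        return LINK_UP_EVT;
    q_add(inputq, pkt);
    return 0;
}

static void node_rcv(struct node *n, int pkt)
{
    struct queue inputq = { {0}, 0 }, xmitq = { {0}, 0 };
    int rc;

    n->locked = true;
    rc = link_rcv(pkt, &inputq, &xmitq);
    n->locked = false;        /* released before any delivery happens */

    if (rc & LINK_UP_EVT)
        n->link_up = true;
    n->delivered += (int)inputq.len;
    n->transmitted += (int)xmitq.len;
}
```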

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

10 May, 2015

1 commit

  • Add the ability to get or set the broadcast link window through the
    new netlink API. The functionality was unintentionally missing from
    the new netlink API. Adding this means that we also fix the breakage
    in the old API when coming through the compat layer.

    Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat)
    Reported-by: Tomi Ollila
    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     

30 Mar, 2015

1 commit

  • [ 60.988363] ======================================================
    [ 60.988754] [ INFO: possible circular locking dependency detected ]
    [ 60.989152] 3.19.0+ #194 Not tainted
    [ 60.989377] -------------------------------------------------------
    [ 60.989781] swapper/3/0 is trying to acquire lock:
    [ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [] tipc_link_retransmit+0x1aa/0x240 [tipc]
    [ 60.990743]
    [ 60.990743] but task is already holding lock:
    [ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
    [ 60.991738]
    [ 60.991738] which lock already depends on the new lock.
    [ 60.991738]
    [ 60.992174]
    [ 60.992174] the existing dependency chain (in reverse order) is:
    [ 60.992174]
    -> #1 (&(&bclink->lock)->rlock){+.-...}:
    [ 60.992174] [] lock_acquire+0x9c/0x140
    [ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
    [ 60.992174] [] tipc_bclink_lock+0x8e/0xa0 [tipc]
    [ 60.992174] [] tipc_bclink_add_node+0x97/0xf0 [tipc]
    [ 60.992174] [] tipc_node_link_up+0xf5/0x110 [tipc]
    [ 60.992174] [] link_state_event+0x2b3/0x4f0 [tipc]
    [ 60.992174] [] tipc_link_proto_rcv+0x24c/0x418 [tipc]
    [ 60.992174] [] tipc_rcv+0x827/0xac0 [tipc]
    [ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
    [ 60.992174] [] __netif_receive_skb_core+0x746/0x980
    [ 60.992174] [] __netif_receive_skb+0x21/0x70
    [ 60.992174] [] netif_receive_skb_internal+0x35/0x130
    [ 60.992174] [] napi_gro_receive+0x158/0x1d0
    [ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
    [ 60.992174] [] e1000_clean+0x267/0x990
    [ 60.992174] [] net_rx_action+0x150/0x360
    [ 60.992174] [] __do_softirq+0x123/0x360
    [ 60.992174] [] irq_exit+0x8e/0xb0
    [ 60.992174] [] do_IRQ+0x65/0x110
    [ 60.992174] [] ret_from_intr+0x0/0x13
    [ 60.992174] [] arch_cpu_idle+0xf/0x20
    [ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
    [ 60.992174] [] start_secondary+0x13a/0x150
    [ 60.992174]
    -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
    [ 60.992174] [] __lock_acquire+0x163d/0x1ca0
    [ 60.992174] [] lock_acquire+0x9c/0x140
    [ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
    [ 60.992174] [] tipc_link_retransmit+0x1aa/0x240 [tipc]
    [ 60.992174] [] tipc_bclink_rcv+0x611/0x640 [tipc]
    [ 60.992174] [] tipc_rcv+0x616/0xac0 [tipc]
    [ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
    [ 60.992174] [] __netif_receive_skb_core+0x746/0x980
    [ 60.992174] [] __netif_receive_skb+0x21/0x70
    [ 60.992174] [] netif_receive_skb_internal+0x35/0x130
    [ 60.992174] [] napi_gro_receive+0x158/0x1d0
    [ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
    [ 60.992174] [] e1000_clean+0x267/0x990
    [ 60.992174] [] net_rx_action+0x150/0x360
    [ 60.992174] [] __do_softirq+0x123/0x360
    [ 60.992174] [] irq_exit+0x8e/0xb0
    [ 60.992174] [] do_IRQ+0x65/0x110
    [ 60.992174] [] ret_from_intr+0x0/0x13
    [ 60.992174] [] arch_cpu_idle+0xf/0x20
    [ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
    [ 60.992174] [] start_secondary+0x13a/0x150
    [ 60.992174]
    [ 60.992174] other info that might help us debug this:
    [ 60.992174]
    [ 60.992174] Possible unsafe locking scenario:
    [ 60.992174]
    [ 60.992174]        CPU0                    CPU1
    [ 60.992174]        ----                    ----
    [ 60.992174]   lock(&(&bclink->lock)->rlock);
    [ 60.992174]                                lock(&(&n_ptr->lock)->rlock);
    [ 60.992174]                                lock(&(&bclink->lock)->rlock);
    [ 60.992174]   lock(&(&n_ptr->lock)->rlock);
    [ 60.992174]
    [ 60.992174] *** DEADLOCK ***
    [ 60.992174]
    [ 60.992174] 3 locks held by swapper/3/0:
    [ 60.992174] #0: (rcu_read_lock){......}, at: [] __netif_receive_skb_core+0x71/0x980
    [ 60.992174] #1: (rcu_read_lock){......}, at: [] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
    [ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
    [ 60.992174]

    The correct sequence for grabbing n_ptr->lock and bclink->lock is
    that the former is taken first and the latter second, which is exactly
    what happens on CPU1. But when retransmission on the broadcast link
    fails, bclink->lock is first taken in tipc_bclink_rcv(), and
    n_ptr->lock is subsequently taken in link_retransmit_failure(), called
    by tipc_link_retransmit(), as demonstrated on CPU0. As a result,
    deadlock occurs.

    If the order in which CPU0 takes the two locks is reversed, the
    deadlock risk is eliminated. Therefore, the node lock originally taken
    in link_retransmit_failure() is moved to tipc_bclink_rcv() so that it
    is obtained before the bclink lock. A precondition for this change is
    that the response to a bclink reset event is moved from
    tipc_bclink_unlock() to tipc_node_unlock().
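
    The ordering rule (node lock first, bclink lock second) can be
    expressed as a rank check. The following is a single-threaded sketch
    with hypothetical names; it only validates acquisition order and does
    no real locking.

```c
#include <stdbool.h>

/* Lower rank must be taken first: node lock, then bclink lock. */
enum lock_rank { RANK_NONE = 0, RANK_NODE = 1, RANK_BCLINK = 2 };

struct lock_ctx {
    enum lock_rank held[2];
    int depth;
};

/* Succeeds only if 'rank' is higher than the most recently taken lock. */
static bool ordered_lock(struct lock_ctx *c, enum lock_rank rank)
{
    if (c->depth == 2)
        return false;
    if (c->depth > 0 && c->held[c->depth - 1] >= rank)
        return false;
    c->held[c->depth++] = rank;
    return true;
}

/* Release the most recently taken lock. */
static void ordered_unlock(struct lock_ctx *c)
{
    if (c->depth > 0)
        c->depth--;
}
```

    With this rule, the CPU1 sequence from the report is accepted, while
    the original CPU0 sequence (bclink lock, then node lock) is rejected.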

    Reviewed-by: Erik Hugne
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

10 Feb, 2015

1 commit

  • Add functionality for safely appending string data to a TLV without
    keeping write count in the caller.

    Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     

06 Feb, 2015

3 commits

  • In a previous commit in this series we resolved a race problem during
    unicast message reception.

    Here, we resolve the same problem at multicast reception. We apply the
    same technique: an input queue serializing the delivery of arriving
    buffers. The main difference is that here we do it in two steps.
    First, the broadcast link feeds arriving buffers into the tail of an
    arrival queue, whose head is consumed at the socket level, and where
    destination lookup is performed. Second, if the lookup is successful,
    the resulting buffer clones are fed into a second queue, the input
    queue. This queue is consumed at reception in the socket just like
    in the unicast case. Both queues are protected by the same lock, the
    one of the input queue.
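
    The two steps can be sketched with two simple queues. Locking is left
    out, and the message, queue and binding types are assumptions made for
    the example.

```c
#include <stddef.h>
#include <stdint.h>

#define QLEN 16

struct msg {
    uint32_t type;    /* multicast name type the message is sent to */
    uint32_t dport;   /* destination port, filled in after lookup */
};

struct mqueue {
    struct msg m[QLEN];
    size_t head, tail;
};

static int mq_push(struct mqueue *q, struct msg m)
{
    if (q->tail == QLEN)
        return -1;
    q->m[q->tail++] = m;
    return 0;
}

static int mq_pop(struct mqueue *q, struct msg *out)
{
    if (q->head == q->tail)
        return -1;
    *out = q->m[q->head++];
    return 0;
}

/* Hypothetical binding table entry: 'port' listens on 'type'. */
struct binding {
    uint32_t type;
    uint32_t port;
};

/*
 * Drain the arrival queue, look up destinations, and feed one clone per
 * matching port into the input queue. Returns the number of clones queued.
 */
static size_t mcast_lookup(struct mqueue *arrvq, struct mqueue *inputq,
                           const struct binding *b, size_t nb)
{
    struct msg m;
    size_t clones = 0;

    while (mq_pop(arrvq, &m) == 0) {
        for (size_t i = 0; i < nb; i++) {
            if (b[i].type != m.type)
                continue;
            struct msg c = m;
            c.dport = b[i].port;
            if (mq_push(inputq, c) == 0)
                clones++;
        }
    }
    return clones;
}
```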

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The structure 'tipc_port_list' is used to collect port numbers
    representing multicast destination sockets on a receiving node.
    The list is not based on a standard linked list, and is in reality
    optimized for the uncommon case that there is more than one
    multicast destination per node. This makes the list handling
    unnecessarily complex, and as a consequence, even the socket
    multicast reception becomes more complex.

    In this commit, we replace 'tipc_port_list' with a new 'struct
    tipc_plist', which is based on a standard list. We give the new
    list stack (push/pop) semantics, something that simplifies
    the implementation of the function tipc_sk_mcast_rcv().
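
    A port list with stack semantics could look like the user-space sketch
    below. The names are hypothetical, and returning 0 for an empty list
    assumes that 0 is never a valid port number.

```c
#include <stdint.h>
#include <stdlib.h>

struct plist_item {
    uint32_t port;
    struct plist_item *next;
};

struct plist {
    struct plist_item *top;
};

/* Push a port onto the list; returns 0 on success, -1 if out of memory. */
static int plist_push(struct plist *pl, uint32_t port)
{
    struct plist_item *it = malloc(sizeof(*it));

    if (!it)
        return -1;
    it->port = port;
    it->next = pl->top;
    pl->top = it;
    return 0;
}

/* Pop the most recently pushed port; returns 0 when the list is empty. */
static uint32_t plist_pop(struct plist *pl)
{
    struct plist_item *it = pl->top;
    uint32_t port;

    if (!it)
        return 0;
    port = it->port;
    pl->top = it->next;
    free(it);
    return port;
}
```

    A receive loop can then pop ports until it gets 0, with no separate
    iterator or index to maintain.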

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The most common usage of namespace information is when we fetch the
    own node address from the net structure. This leads to a lot of
    passing around of a parameter of type 'struct net *' between
    functions just to make them able to obtain this address.

    However, in many cases this is unnecessary. The own node address
    is readily available as a member of both struct tipc_sock and
    tipc_link, and can be fetched from there instead.
    The fact that the vast majority of functions in socket.c and link.c
    anyway are maintaining a pointer to their respective base structures
    makes this option even more compelling.

    In this commit, we introduce the inline functions tsk_own_node()
    and link_own_node() to make it easy for functions to fetch the node
    address from those structs instead of having to pass along and
    dereference the namespace struct.

    In particular, we make calls to the msg_xx() functions in msg.{h,c}
    context independent by directly passing them the own node address
    as parameter when needed. Those functions should be regarded as
    leaves in the code dependency tree, and it is hence desirable to
    keep them namespace unaware.

    Apart from a potential positive effect on cache behavior, these
    changes make it easier to introduce the changes that will follow
    later in this series.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

13 Jan, 2015

4 commits

  • The TIPC broadcast link is statically established and its relevant
    state is maintained in the global variables "bcbearer", "bclink" and
    "bcl". To allow different namespaces to own different broadcast link
    instances, these variables must be moved to the tipc_net structure,
    and broadcast link instances must be allocated and initialized when
    a namespace is created.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The bearer list, defined as a global variable, is used to store bearer
    instances. When TIPC supports net namespaces, bearers created in
    one namespace must be isolated from those allocated in other
    namespaces, which requires that the bearer list (bearer_list)
    be moved to the tipc_net structure. As a result, a net namespace
    pointer has to be passed to functions which access the bearer list.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The global variables associated with the node table are listed below:
    - node hash table (node_htable)
    - node list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make the node table namespace aware, the above global variables
    must be moved to the tipc_net structure so that they are kept
    separate between namespaces. As a consequence, these variables are
    allocated and initialized when a namespace is created, and deallocated
    when the namespace is destroyed. After the change, functions associated
    with these variables have to use a namespace pointer to access them.
    Adding a namespace pointer as a parameter to these functions is
    therefore the major change made in this commit.
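
    The general shape of such a conversion is sketched below: former
    globals become members of a per-namespace struct, and every accessor
    takes a pointer to it. The struct and function names are hypothetical
    and do not mirror the real tipc_net layout.

```c
#include <stdint.h>

/* Hypothetical per-namespace state replacing file-scope globals. */
struct ns_state {
    uint32_t net_id;
    unsigned num_nodes;
    unsigned num_links;
};

/* Called when a namespace is created. */
static void ns_init(struct ns_state *ns, uint32_t net_id)
{
    ns->net_id = net_id;
    ns->num_nodes = 0;
    ns->num_links = 0;
}

/* Accessors take the namespace explicitly instead of using globals. */
static void ns_node_add(struct ns_state *ns)
{
    ns->num_nodes++;
}

static unsigned ns_node_count(const struct ns_state *ns)
{
    return ns->num_nodes;
}
```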

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Involve the namespace infrastructure: make the "tipc_net_id" global
    variable per-namespace, and rename it to "net_id". For the conversion
    to work, an instance of the networking namespace must be passed to
    the relevant functions, allowing them to access the per-namespace
    "net_id" variable.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

27 Nov, 2014

1 commit


22 Nov, 2014

1 commit

  • Add TIPC_NL_LINK_GET command to the new tipc netlink API.

    This command supports dumping all information about all links
    (including the broadcast link) or getting all information about a
    specific link (not the broadcast link).

    The information about a link includes name, transmission info,
    properties and link statistics.

    As the tipc broadcast link is special we unfortunately have to treat
    it specially. It is a deliberate decision not to abstract the
    broadcast link on this (API) level.

    Netlink logical layout of link response message:
        -> port
        -> name
        -> MTU
        -> RX
        -> TX
        -> up flag
        -> active flag
        -> properties
            -> priority
            -> tolerance
            -> window
        -> statistics
            -> rx_info
            -> rx_fragments
            -> rx_fragmented
            -> rx_bundles
            -> rx_bundled
            -> tx_info
            -> tx_fragments
            -> tx_fragmented
            -> tx_bundles
            -> tx_bundled
            -> msg_prof_tot
            -> msg_len_cnt
            -> msg_len_tot
            -> msg_len_p0
            -> msg_len_p1
            -> msg_len_p2
            -> msg_len_p3
            -> msg_len_p4
            -> msg_len_p5
            -> msg_len_p6
            -> rx_states
            -> rx_probes
            -> rx_nacks
            -> rx_deferred
            -> tx_states
            -> tx_probes
            -> tx_nacks
            -> tx_acks
            -> retransmitted
            -> duplicates
            -> link_congs
            -> max_queue
            -> avg_queue

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     

08 Oct, 2014

1 commit

  • One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc:
    use pseudo message to wake up sockets after link congestion") was
    to handle link congestion abatement in a uniform way for both unicast
    and multicast transmit. However, the latter doesn't work correctly,
    and has been broken since the referenced commit was applied.

    If a user now sends a burst of multicast messages that is big
    enough to cause broadcast link congestion, it will be put to sleep,
    and not be woken up when the congestion abates as it should be.

    This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
    is set correctly, but in the wrong field. Instead of setting it in the
    'action_flags' field of the arrival node struct, it is by mistake set
    in the dummy node struct that is owned by the broadcast link, where it
    will never be tested for. Second, we cannot use the same flag for waking
    up unicast and multicast users, since the function tipc_node_unlock()
    needs to pick the wakeup pseudo messages to deliver from different
    queues. It must hence be able to distinguish between the two cases.

    This commit solves this problem by adding a new flag
    TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
    The latter is to be called by tipc_node_unlock() when the named flag,
    now set in the correct field, is encountered.
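
    The need for two flags can be illustrated as follows. The flag values,
    the struct and the waiter counters are assumptions for this sketch;
    the real code delivers wakeup pseudo messages from two queues instead.

```c
#define WAKEUP_USERS       (1u << 0)   /* illustrative values only */
#define WAKEUP_BCAST_USERS (1u << 1)

struct node_state {
    unsigned action_flags;
    int ucast_waiters;    /* users sleeping on unicast link congestion */
    int bcast_waiters;    /* users sleeping on broadcast link congestion */
    int woken;
};

/* On unlock, each flag selects which set of waiters to wake up. */
static void node_unlock(struct node_state *n)
{
    unsigned flags = n->action_flags;

    n->action_flags = 0;
    if (flags & WAKEUP_USERS) {
        n->woken += n->ucast_waiters;
        n->ucast_waiters = 0;
    }
    if (flags & WAKEUP_BCAST_USERS) {
        n->woken += n->bcast_waiters;
        n->bcast_waiters = 0;
    }
}
```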

    v2: using explicit 'unsigned int' declaration instead of 'uint', as
    per comment from David Miller.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Jul, 2014

3 commits

  • After the previous commit, we can now give the functions with temporary
    names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
    names.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We can now remove a number of functions which have become obsolete
    and unreferenced through this commit series. There are no functional
    changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We add a new broadcast link transmit function in bclink.c and a new
    receive function in socket.c. The purpose is to move the branching
    between external and internal destination down to the link layer,
    just as we have done with unicast in earlier commits. We also make
    use of the new link-independent fragmentation support that was
    introduced in an earlier commit series.

    This gives a shorter and simpler code path, and makes it possible
    to obtain copy-free buffer delivery to all node local destination
    sockets.

    The new transmission code is added in parallel with the existing one,
    and will be used by the socket multicast send function in the next
    commit in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

06 May, 2014

2 commits


23 Apr, 2014

1 commit

  • The node map variable 'nodes' in the bearer structure is only used by
    bclink. When bclink accesses it, bc_lock is held. But when it is changed,
    for instance in tipc_bearer_add_dest() or tipc_bearer_remove_dest(),
    bc_lock is not taken at all. To avoid inconsistent data, we should
    always grab bc_lock while accessing the node map variable.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

19 Feb, 2014

1 commit

  • Rename the following functions to names that are shorter and more in
    line with common naming practice in the network subsystem.

    tipc_bclink_send_msg->tipc_bclink_xmit
    tipc_bclink_recv_pkt->tipc_bclink_rcv
    tipc_disc_recv_msg->tipc_disc_rcv
    tipc_link_send_proto_msg->tipc_link_proto_xmit
    link_recv_proto_msg->tipc_link_proto_rcv
    link_send_sections_long->tipc_link_iovec_long_xmit
    tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
    tipc_link_send_sync->tipc_link_sync_xmit
    tipc_link_recv_sync->tipc_link_sync_rcv
    tipc_link_send_buf->__tipc_link_xmit
    tipc_link_send->tipc_link_xmit
    tipc_link_send_names->tipc_link_names_xmit
    tipc_named_recv->tipc_named_rcv
    tipc_link_recv_bundle->tipc_link_bundle_rcv
    tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
    link_send_long_buf->tipc_link_frag_xmit

    tipc_multicast->tipc_port_mcast_xmit
    tipc_port_recv_mcast->tipc_port_mcast_rcv
    tipc_port_reject_sections->tipc_port_iovec_reject
    tipc_port_recv_proto_msg->tipc_port_proto_rcv
    tipc_connect->tipc_port_connect
    __tipc_connect->__tipc_port_connect
    __tipc_disconnect->__tipc_port_disconnect
    tipc_disconnect->tipc_port_disconnect
    tipc_shutdown->tipc_port_shutdown
    tipc_port_recv_msg->tipc_port_rcv
    tipc_port_recv_sections->tipc_port_iovec_rcv

    release->tipc_release
    accept->tipc_accept
    bind->tipc_bind
    get_name->tipc_getname
    poll->tipc_poll
    send_msg->tipc_sendmsg
    send_packet->tipc_send_packet
    send_stream->tipc_send_stream
    recv_msg->tipc_recvmsg
    recv_stream->tipc_recv_stream
    connect->tipc_connect
    listen->tipc_listen
    shutdown->tipc_shutdown
    setsockopt->tipc_setsockopt
    getsockopt->tipc_getsockopt

    The above changes have no impact on current users of the functions.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

18 Jun, 2013

1 commit


01 May, 2012

1 commit

  • Some of the comment blocks are floating in limbo between two
    functions, or between blocks of code. Delete the extra line
    feeds between any comment and its associated following block
    of code, to be consistent with the majority of the rest of
    the kernel. Also delete trailing newlines at EOF and fix
    a couple of trivial typos in existing comments.

    This is a 100% cosmetic change with no runtime impact. We get
    rid of over 500 lines of non-code, and being blank line deletes,
    they won't even show up as noise in git blame.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

07 Feb, 2012

1 commit

  • Completely redesigns broadcast link ACK and NACK mechanisms to prevent
    spurious retransmit requests in dual LAN networks, and to prevent the
    broadcast link from stalling due to the failure of a receiving node to
    acknowledge receiving a broadcast message or request its retransmission.

    Note: These changes only impact the timing of when ACK and NACK messages
    are sent, and not the basic broadcast link protocol itself, so
    interoperability with nodes using the "classic" algorithms is maintained.

    The revised algorithms are as follows:

    1) An explicit ACK message is still sent after receiving 16 in-sequence
    messages, and implicit ACK information continues to be carried in other
    unicast link message headers (including link state messages). However,
    the timing of explicit ACKs is now based on the receiving node's absolute
    network address rather than its relative network address to ensure that
    the failure of another node does not delay the ACK beyond its 16 message
    target.

    2) A NACK message is now typically sent only when a message gap persists
    for two consecutive incoming link state messages; this ensures that a
    suspected gap is not confirmed until both LANs in a dual LAN network have
    had an opportunity to deliver the message, thereby preventing spurious NACKs.
    A NACK message can also be generated by the arrival of a single link state
    message, if the deferred queue is so big that the current message gap
    cannot be the result of "normal" mis-ordering due to the use of dual LANs
    (or one LAN using a bonded interface). Since link state messages typically
    arrive at different nodes at different times the problem of multiple nodes
    issuing identical NACKs simultaneously is inherently avoided.

    3) Nodes continue to "peek" at NACK messages sent by other nodes. If
    another node requests retransmission of a message gap suspected (but not
    yet confirmed) by the peeking node, the peeking node forgets about the
    gap and does not generate a duplicate retransmit request. (If the peeking
    node subsequently fails to receive the lost message, later link state
    messages will cause it to rediscover and confirm the gap and send another
    NACK.)

    4) Message gap "equality" is now determined by the start of the gap only.
    This is sufficient to deal with the most common cases of message loss,
    and eliminates the need for complex end of gap computations.

    5) A peeking node no longer tries to determine whether it should send a
    complementary NACK, since the most common cases of message loss don't
    require it to be sent. Consequently, the node no longer examines the
    "broadcast tag" field of a NACK message when peeking.
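
    Rules 1) to 4) above can be sketched as a small receiver-side state
    machine. This is a simplified illustration, not the kernel
    implementation: the bc_rcv_state structure, the function names and the
    DEFERRED_LIMIT threshold are assumptions, and sequence number
    wraparound and stale link state information are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACK_INTERVAL   16   /* explicit ACK after 16 in-sequence messages */
#define DEFERRED_LIMIT  8   /* hypothetical "deferred queue is so big" threshold */

/* Hypothetical per-sender receive state kept by a node. */
struct bc_rcv_state {
    uint32_t own_addr;   /* absolute network address of this node */
    uint32_t last_in;    /* last in-sequence broadcast message received */
    uint32_t last_sent;  /* last message the sender reports having sent */
    uint32_t deferred;   /* out-of-order messages waiting in the deferred queue */
    unsigned int oos;    /* link state messages seen while the gap persists */
};

/* Rule 1: the ACK slot depends only on the node's own absolute address,
 * so the failure of another node cannot shift it. */
static bool ack_due(const struct bc_rcv_state *s, uint32_t seqno)
{
    return ((seqno - s->own_addr) % ACK_INTERVAL) == 0;
}

/* Rule 2: called for each incoming link state message.
 * Returns true when a NACK should be sent. */
static bool link_state_rcv(struct bc_rcv_state *s, uint32_t last_sent)
{
    s->last_sent = last_sent;
    if (s->last_sent == s->last_in) {
        s->oos = 0;              /* in sync, no gap */
        return false;
    }
    s->oos++;
    /* The first sighting only makes the gap "suspected", unless the
     * deferred queue is too large for ordinary dual-LAN reordering. */
    if (s->oos == 1 && s->deferred < DEFERRED_LIMIT)
        return false;
    s->oos = 0;                  /* gap confirmed: NACK and start over */
    return true;
}

/* Rules 3 and 4: on seeing another node's NACK, forget our own suspected
 * gap if it starts at the same message; only the gap start is compared. */
static void peek_nack(struct bc_rcv_state *s, uint32_t gap_start)
{
    if (gap_start == s->last_in + 1)
        s->oos = 0;
}
```

    With these assumptions, a node that has received up to message 10 and
    is told twice in a row that message 12 was sent returns false from the
    first link_state_rcv() call and true from the second, unless a matching
    peek_nack() call in between resets the count.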

    Signed-off-by: Allan Stephens
    Signed-off-by: Paul Gortmaker

    Allan Stephens
     

30 Dec, 2011

1 commit