Eric Lee / smarc-fsl-linux-kernel

21 Jan, 2017

3 commits

01fd12bb1 tipc: make replicast a user selectable option

If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destination sockets or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.

In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method he wants to be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.

In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.
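A minimal C sketch of the selection rule described above, assuming a millisecond clock and a hypothetical percentage threshold (`ratio_pct`) for comparing the number of destination nodes with the cluster size. The struct, the function names and the exact threshold formula are illustrative assumptions, not the kernel's code. Because every send pushes the expiry time forward, the method can only change after 5 seconds without sending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: names and the threshold rule are assumptions. */
#define METHOD_EXPIRE_MS 5000UL	/* idle period from the text above */

struct mc_method {
	bool rcast;		/* true: replicast, false: broadcast */
	bool mandatory;		/* method forced by the user via setsockopt() */
	unsigned long expires;	/* earliest time (ms) a change is allowed */
};

/* Prefer replicast when the destination nodes are few compared with the
 * cluster size; ratio_pct is an assumed tuning knob, not a TIPC setting. */
static bool prefer_rcast(int dests, int cluster_size, int ratio_pct)
{
	assert(cluster_size > 0);
	return dests * 100 <= cluster_size * ratio_pct;
}

void select_xmit_method(struct mc_method *m, int dests, int cluster_size,
			int ratio_pct, unsigned long now_ms)
{
	unsigned long exp = m->expires;

	/* Every send pushes the expiry forward, so the method can only
	 * change after an idle period of at least METHOD_EXPIRE_MS. */
	m->expires = now_ms + METHOD_EXPIRE_MS;
	if (m->mandatory || now_ms < exp)
		return;
	m->rcast = prefer_rcast(dests, cluster_size, ratio_pct);
}
```

With a 10% ratio, a socket that starts with 2 destinations in a 100-node cluster picks replicast and keeps it for a burst to 90 nodes sent 1 second later; only after more than 5 idle seconds does it re-evaluate and switch to broadcast.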

Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-21 01:10:17 +0800
a853e4c6d tipc: introduce replicast as transport option for multicast

TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the bearer in use lacks that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.
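A rough sketch of the replicast idea: each unicast link leading to a destination node gets its own copy of the packet. `struct ulink` and `ulink_xmit()` are hypothetical stand-ins that only count traffic; a real link would queue the copy for transmission rather than free it immediately.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a unicast link that only counts its traffic. */
struct ulink {
	unsigned int peer;	/* destination node address */
	int sent;		/* packets transmitted */
	size_t bytes;		/* bytes transmitted */
};

/* Stand-in for a real per-link transmit. It takes ownership of the
 * buffer; this model just accounts for it and frees it. */
static void ulink_xmit(struct ulink *l, unsigned char *buf, size_t len)
{
	l->sent++;
	l->bytes += len;
	free(buf);
}

/* Emulate broadcast by giving each destination's unicast link its own
 * copy of the packet. Returns the number of copies sent, or -1 if a
 * copy could not be allocated. */
int replicast(struct ulink *links, size_t nlinks,
	      const unsigned char *pkt, size_t len)
{
	int copies = 0;

	assert(pkt != NULL && len > 0);
	for (size_t i = 0; i < nlinks; i++) {
		unsigned char *copy = malloc(len);

		if (!copy)
			return -1;
		memcpy(copy, pkt, len);
		ulink_xmit(&links[i], copy, len);
		copies++;
	}
	return copies;
}
```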

Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-21 01:10:17 +0800
2ae0b8af1 tipc: add functionality to lookup multicast destination nodes

As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
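An illustrative sketch of such a lookup over a flat, hypothetical publication table: every publication whose range overlaps the multicast range contributes its node, and each node is listed only once. The real TIPC name table is organized differently; the flat array is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat name table entry: a socket on 'node' has bound the
 * service range [lower, upper] of 'type'. */
struct publication {
	unsigned int type, lower, upper;
	unsigned int node;
};

static bool contains(const unsigned int *v, size_t n, unsigned int x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

/* Fill 'nodes' with every node hosting at least one destination for
 * the multicast range [lower, upper] of 'type', each node only once.
 * Returns the number of nodes found (at most 'max'). */
size_t lookup_dest_nodes(const struct publication *tbl, size_t ntbl,
			 unsigned int type, unsigned int lower,
			 unsigned int upper, unsigned int *nodes, size_t max)
{
	size_t n = 0;

	assert(lower <= upper);
	for (size_t i = 0; i < ntbl && n < max; i++) {
		const struct publication *p = &tbl[i];

		if (p->type != type || p->upper < lower || p->lower > upper)
			continue;	/* no overlap with the requested range */
		if (!contains(nodes, n, p->node))
			nodes[n++] = p->node;
	}
	return n;
}
```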

Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-21 01:10:16 +0800

30 Oct, 2016

1 commit

06bd2b1ed tipc: fix broadcast link synchronization problem

In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.

To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.
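A small sketch of the marking scheme, assuming the flag lives in some 32-bit header word at a made-up bit position (`HDR_ACK_INVALID_BIT`). Only the rule itself, setting the bit on send and ignoring the ack on receive, comes from the text above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up position of the "ack invalid" bit within a header word. */
#define HDR_ACK_INVALID_BIT 22u

void hdr_set_ack_invalid(uint32_t *word, bool invalid)
{
	if (invalid)
		*word |= UINT32_C(1) << HDR_ACK_INVALID_BIT;
	else
		*word &= ~(UINT32_C(1) << HDR_ACK_INVALID_BIT);
}

bool hdr_ack_invalid(uint32_t word)
{
	return (word >> HDR_ACK_INVALID_BIT) & 1u;
}

/* Receiver side: adopt the peer's broadcast ack only when the sender
 * has not flagged it as invalid. */
void rcv_bc_ack(uint16_t *acked, uint32_t hdr_word, uint16_t ack)
{
	assert(acked);
	if (hdr_ack_invalid(hdr_word))
		return;
	*acked = ack;
}
```

In this model, a delayed "bulk" packet carrying a zero ack can no longer overwrite a correctly initialized value.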

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson
Tested-by: John Thompson
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-10-30 05:21:09 +0800

03 Sep, 2016

1 commit

02d11ca20 tipc: transfer broadcast nacks in link state messages

When we send broadcasts in clusters of more than 70-80 nodes, we sometimes
see the broadcast link resetting because of an excessive number of
retransmissions. This is caused by a combination of two factors:

1) A 'NACK crunch', where loss of broadcast packets is discovered
and NACK'ed by several nodes simultaneously, leading to multiple
redundant broadcast retransmissions.

2) The fact that the NACKs themselves are also sent as broadcast, leading
to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem, by moving sending of
broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
bits in word 8 of the said message for this purpose, and introduce a
new capability bit, TIPC_BCAST_STATE_NACK, in order to keep the change
backwards compatible.
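A sketch of carrying a NACK in a STATE message, assuming the 10 bits hold a gap size in the low bits of word 8 and that the capability is a single bit in the peer's capability mask. Both placements, the constant values and the function names are hypothetical. When the peer lacks the capability, the caller falls back to the old BCAST_PROTOCOL/NACK message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the capability bit and the placement of the
 * 10-bit field in the low bits of word 8 are assumptions. */
#define CAP_BCAST_STATE_NACK (1u << 3)
#define BC_GAP_MASK 0x3ffu	/* 10 bits: values 0..1023 */

void set_nack_gap(uint32_t hdr[], uint16_t gap)
{
	/* Preserve the other 22 bits of word 8. */
	hdr[8] = (hdr[8] & ~BC_GAP_MASK) | (gap & BC_GAP_MASK);
}

uint16_t get_nack_gap(const uint32_t hdr[])
{
	return (uint16_t)(hdr[8] & BC_GAP_MASK);
}

/* Put a broadcast NACK into a unicast STATE message if the peer
 * understands it. Returns false when the caller must fall back to a
 * separate BCAST_PROTOCOL/NACK message. */
bool state_msg_add_nack(uint32_t hdr[], uint16_t peer_caps, uint16_t gap)
{
	if (!(peer_caps & CAP_BCAST_STATE_NACK))
		return false;
	assert(gap <= BC_GAP_MASK);
	set_nack_gap(hdr, gap);
	return true;
}
```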

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-09-03 08:10:24 +0800

07 Mar, 2016

1 commit

e74a386d7 tipc: remove pre-allocated message header in link struct

Until now, we have kept a pre-allocated protocol message header
aggregated into struct tipc_link. Apart from adding unnecessary
footprint to the link instances, this requires extra code both to
initialize and re-initialize it.

We now remove this sub-optimization. This change also makes it
possible to clean up the function tipc_build_proto_msg() and remove
a couple of small functions that were accessing the mentioned header.
In particular, we can replace all occurrences of the local function
call link_own_addr(link) with the generic tipc_own_addr(net).

Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-03-07 12:01:20 +0800

21 Nov, 2015

1 commit

5be9c0867 tipc: narrow down exposure of struct tipc_node

In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.

In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to their new location, and made static whenever possible.

There are no functional changes in this commit.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-11-21 03:06:10 +0800

24 Oct, 2015

8 commits

2af5ae372 tipc: clean up unused code and structures

After the previous changes in this series, we can now remove some
unused code and structures in the broadcast, link aggregation
and link code.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:47 +0800
959e1781a tipc: introduce jumbo frame support for broadcast

Until now, we have only been supporting a fixed MTU size of 1500 bytes
for all broadcast media, irrespective of their actual capability.

We now make the broadcast MTU adaptable to the carrying media, i.e.,
we use the smallest MTU supported by any of the interfaces attached
to TIPC.
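The MTU rule amounts to taking a minimum over the attached interfaces. A minimal sketch, assuming the MTUs are already collected in an array; returning 1500 bytes (the old fixed value) when no interface is attached is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define LEGACY_BCAST_MTU 1500u	/* the former fixed value */

/* Broadcast MTU = smallest MTU among the interfaces attached to TIPC.
 * Falling back to 1500 when nothing is attached is an assumption. */
unsigned int bcast_mtu(const unsigned int *mtus, size_t n)
{
	unsigned int mtu = 0;

	assert(mtus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (mtu == 0 || mtus[i] < mtu)
			mtu = mtus[i];
	return mtu ? mtu : LEGACY_BCAST_MTU;
}
```

Taking the minimum means a single 1500-byte interface keeps the whole broadcast link at 1500 bytes; jumbo frames only pay off when every attached interface supports them.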

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:40 +0800
b06b281e7 tipc: simplify bearer level broadcast

Until now, we have been keeping track of the exact set of broadcast
destinations through the helper structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map can only support cluster-local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.

A closer analysis reveals that it is unnecessary to maintain this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.
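A minimal sketch of the counter-per-bearer idea; `struct bearer` and the helper names are hypothetical. The bearer no longer knows which nodes it reaches, only how many, and leaves the question of exactly who must acknowledge to the broadcast protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bearer that only counts reachable peers, instead of
 * keeping a map of which nodes they are. */
struct bearer {
	int up_peers;
};

void bearer_add_dest(struct bearer *b)
{
	b->up_peers++;
}

void bearer_remove_dest(struct bearer *b)
{
	assert(b->up_peers > 0);
	b->up_peers--;
}

/* A broadcast is worth sending on this bearer only if someone can
 * currently hear it. */
bool bearer_bcast_useful(const struct bearer *b)
{
	return b->up_peers > 0;
}
```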

This commit introduces these changes.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:39 +0800
526669866 tipc: let broadcast packet reception use new link receive function

The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.

We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to its own
instance.

Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledgements for
received broadcast packets.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:37 +0800
323019069 tipc: use explicit allocation of broadcast send link

The broadcast link instance (struct tipc_link) used for sending is
currently aggregated into struct tipc_bclink. This means that we cannot
use the regular tipc_link_create() function for initializing the link,
but instead have to initialize numerous fields directly from the
bcast_init() function.

We want to reduce dependencies between the broadcast functionality
and the inner workings of tipc_link. In this commit, we introduce
a new function tipc_bclink_create() to link.c, and allocate the
instance of the link separately using this function.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:30 +0800
5fd9fd635 tipc: create broadcast transmission link at namespace init

The broadcast transmission link is currently instantiated when the
network subsystem is started, i.e., on command from user space via netlink.

This forces the broadcast transmission code to do unnecessary tests for
the existence of the transmission link, both in single-node mode and
in network mode.

In this commit, we do instead create the link during initialization of
the name space, and remove it when it is stopped. The fact that the
transmission link now has a guaranteed longer life cycle than any of its
potential clients paves the way for further code simplifications
and optimizations.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:27 +0800
0043550b0 tipc: move broadcast link lock to struct tipc_net

The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introduce two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:25 +0800
6beb19a62 tipc: move bcast definitions to bcast.c

Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.

In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.

We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.

Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:24 +0800

21 Jul, 2015

1 commit

d999297c3 tipc: reduce locking scope during packet reception

We convert packet/message reception according to the same principle
we have been using for message sending and timeout handling:

We move the function tipc_rcv() to node.c, hence handling the initial
packet reception at the link aggregation level. The function grabs
the node lock, selects the receiving link, and accesses it via a new
call tipc_link_rcv(). This function appends buffers to the input
queue for delivery upwards, but it may also append outgoing packets
to the xmit queue, just as we do during regular message sending. The
latter will happen when buffers are forwarded from the link backlog,
or when retransmission is requested.

Upon return from this function, and after having released the node lock,
tipc_rcv() delivers/transmits the contents of those queues, but it may
also perform actions such as link activation or reset, as indicated by
the return flags from the link.

This reduces the number of cpu cycles spent inside the node spinlock,
and reduces contention on that lock.
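A userspace sketch of the narrowed locking scope, using a pthread mutex as the node lock. The names, the fixed-size queues and the trivial gap handling (queueing the missing sequence number as a retransmission request) are assumptions; the point is that only link state and local queues are touched while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QLEN 16

struct queue {
	int pkts[QLEN];
	size_t n;
};

struct node {
	pthread_mutex_t lock;	/* the node lock */
	int rcv_nxt;		/* next expected sequence number */
};

static void q_push(struct queue *q, int pkt)
{
	assert(q->n < QLEN);
	q->pkts[q->n++] = pkt;
}

/* Runs under the node lock: update link state and fill local queues,
 * nothing else. A gap queues the missing number as a (hypothetical)
 * retransmission request; old duplicates are dropped. */
static void link_rcv(struct node *n, int seqno,
		     struct queue *inputq, struct queue *xmitq)
{
	if (seqno == n->rcv_nxt) {
		n->rcv_nxt++;
		q_push(inputq, seqno);
	} else if (seqno > n->rcv_nxt) {
		q_push(xmitq, n->rcv_nxt);
	}
}

/* Returns the number of packets to deliver upwards and stores in *nxmit
 * the number of packets to transmit; both are handled after the node
 * lock has been released. */
size_t node_rcv(struct node *n, int seqno, size_t *nxmit)
{
	struct queue inputq = { .n = 0 }, xmitq = { .n = 0 };

	pthread_mutex_lock(&n->lock);
	link_rcv(n, seqno, &inputq, &xmitq);
	pthread_mutex_unlock(&n->lock);

	/* Delivery to sockets and transmission on the bearer would happen
	 * here, outside the lock. */
	*nxmit = xmitq.n;
	return inputq.n;
}
```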

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:16 +0800

10 May, 2015

1 commit

670f4f881 tipc: add broadcast link window set/get to nl api

Add the ability to get or set the broadcast link window through the
new netlink API. The functionality was unintentionally missing from
the new netlink API. Adding this means that we also fix the breakage
in the old API when coming through the compat layer.

Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat)
Reported-by: Tomi Ollila
Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2015-05-10 04:40:02 +0800

30 Mar, 2015

1 commit

b952b2bef tipc: fix potential deadlock when all links are reset

[ 60.988363] ======================================================
[ 60.988754] [ INFO: possible circular locking dependency detected ]
[ 60.989152] 3.19.0+ #194 Not tainted
[ 60.989377] -------------------------------------------------------
[ 60.989781] swapper/3/0 is trying to acquire lock:
[ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.990743]
[ 60.990743] but task is already holding lock:
[ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.991738]
[ 60.991738] which lock already depends on the new lock.
[ 60.991738]
[ 60.992174]
[ 60.992174] the existing dependency chain (in reverse order) is:
[ 60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[ 60.992174] [] lock_acquire+0x9c/0x140
[ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174] [] tipc_bclink_add_node+0x97/0xf0 [tipc]
[ 60.992174] [] tipc_node_link_up+0xf5/0x110 [tipc]
[ 60.992174] [] link_state_event+0x2b3/0x4f0 [tipc]
[ 60.992174] [] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[ 60.992174] [] tipc_rcv+0x827/0xac0 [tipc]
[ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [] __netif_receive_skb+0x21/0x70
[ 60.992174] [] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [] napi_gro_receive+0x158/0x1d0
[ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [] e1000_clean+0x267/0x990
[ 60.992174] [] net_rx_action+0x150/0x360
[ 60.992174] [] __do_softirq+0x123/0x360
[ 60.992174] [] irq_exit+0x8e/0xb0
[ 60.992174] [] do_IRQ+0x65/0x110
[ 60.992174] [] ret_from_intr+0x0/0x13
[ 60.992174] [] arch_cpu_idle+0xf/0x20
[ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [] start_secondary+0x13a/0x150
[ 60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[ 60.992174] [] __lock_acquire+0x163d/0x1ca0
[ 60.992174] [] lock_acquire+0x9c/0x140
[ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.992174] [] tipc_bclink_rcv+0x611/0x640 [tipc]
[ 60.992174] [] tipc_rcv+0x616/0xac0 [tipc]
[ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [] __netif_receive_skb+0x21/0x70
[ 60.992174] [] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [] napi_gro_receive+0x158/0x1d0
[ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [] e1000_clean+0x267/0x990
[ 60.992174] [] net_rx_action+0x150/0x360
[ 60.992174] [] __do_softirq+0x123/0x360
[ 60.992174] [] irq_exit+0x8e/0xb0
[ 60.992174] [] do_IRQ+0x65/0x110
[ 60.992174] [] ret_from_intr+0x0/0x13
[ 60.992174] [] arch_cpu_idle+0xf/0x20
[ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [] start_secondary+0x13a/0x150
[ 60.992174]
[ 60.992174] other info that might help us debug this:
[ 60.992174]
[ 60.992174] Possible unsafe locking scenario:
[ 60.992174]
[ 60.992174]        CPU0                    CPU1
[ 60.992174]        ----                    ----
[ 60.992174]   lock(&(&bclink->lock)->rlock);
[ 60.992174]                                lock(&(&n_ptr->lock)->rlock);
[ 60.992174]                                lock(&(&bclink->lock)->rlock);
[ 60.992174]   lock(&(&n_ptr->lock)->rlock);
[ 60.992174]
[ 60.992174] *** DEADLOCK ***
[ 60.992174]
[ 60.992174] 3 locks held by swapper/3/0:
[ 60.992174] #0: (rcu_read_lock){......}, at: [] __netif_receive_skb_core+0x71/0x980
[ 60.992174] #1: (rcu_read_lock){......}, at: [] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174]

The correct sequence for grabbing n_ptr->lock and bclink->lock is to
take the former first and the latter afterwards, which is exactly what
happened on CPU1. But when a broadcast link retransmission fails,
bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is
then taken in link_retransmit_failure(), called by
tipc_link_retransmit(), as demonstrated on CPU0. As a result, a
deadlock occurs.

If the order of taking the two locks on CPU0 is reversed, the deadlock
risk is eliminated. Therefore, the node lock originally taken in
link_retransmit_failure() is moved to tipc_bclink_rcv(), so that it is
obtained before the bclink lock. A prerequisite for this adjustment is
that the response to the bclink reset event must be moved from
tipc_bclink_unlock() to tipc_node_unlock().
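A userspace sketch of the resulting lock discipline, with pthread mutexes standing in for the two spinlocks; all names are hypothetical. Every path takes the node lock before the bclink lock, and a failed retransmission only sets a flag, which is read when the node lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for n_ptr->lock and bclink->lock. */
struct locks {
	pthread_mutex_t node_lock;
	pthread_mutex_t bclink_lock;
	bool reset_pending;	/* set under the locks, acted on later */
};

/* Modelled on handling the reset in tipc_node_unlock(): read and clear
 * the flag under node_lock, then release it. Returns true if the
 * caller should now reset the links. */
static bool node_unlock(struct locks *l)
{
	bool reset = l->reset_pending;

	l->reset_pending = false;
	pthread_mutex_unlock(&l->node_lock);
	return reset;
}

/* Always node_lock first, bclink_lock second, so the opposite order
 * seen on CPU0 in the lockdep report never occurs. */
bool bclink_rcv(struct locks *l, bool retransmit_failed)
{
	pthread_mutex_lock(&l->node_lock);
	pthread_mutex_lock(&l->bclink_lock);
	if (retransmit_failed)
		l->reset_pending = true;	/* no extra lock taken here */
	pthread_mutex_unlock(&l->bclink_lock);
	return node_unlock(l);
}
```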

Reviewed-by: Erik Hugne
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-03-30 03:40:27 +0800

10 Feb, 2015

1 commit

f2b3b2d4c tipc: convert legacy nl link stat to nl compat

Add functionality for safely appending string data to a TLV without
keeping write count in the caller.
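A sketch of such a helper, assuming a simplified TLV (`struct tlv_buf`, hypothetical, not the real TIPC TLV descriptor) whose header tracks the number of bytes written. `vsnprintf()` bounds each write, so callers can append repeatedly without tracking offsets, and output is truncated rather than overflowing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified TLV whose header records how many value
 * bytes have been written; not the real TIPC TLV descriptor layout. */
struct tlv_buf {
	unsigned short len;	/* value bytes written so far */
	unsigned short type;
	char value[256];	/* always NUL-terminated */
};

void tlv_init(struct tlv_buf *t, unsigned short type)
{
	memset(t, 0, sizeof(*t));
	t->type = type;
}

/* Append formatted text at the current end of the value, truncating
 * safely when full. Returns the number of characters appended. */
int tlv_sprintf(struct tlv_buf *t, const char *fmt, ...)
{
	size_t room = sizeof(t->value) - t->len;
	va_list ap;
	int n;

	assert(t->len < sizeof(t->value));
	if (room <= 1)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(t->value + t->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	if ((size_t)n >= room)
		n = (int)(room - 1);	/* output was truncated */
	t->len += (unsigned short)n;
	return n;
}
```

A caller building a link statistics report can simply issue successive `tlv_sprintf(&t, ...)` calls; the running length lives in the TLV rather than in a local counter.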

Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.

Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Richard Alpe
2015-02-10 05:20:47 +0800

06 Feb, 2015

3 commits

cb1b72809 tipc: eliminate race condition at multicast reception

In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, whose head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock,
namely that of the input queue.
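A simplified userspace model of the two-step scheme, with one pthread mutex guarding both queues. For brevity the lookup is reduced to a fixed number of local destination sockets (`nsocks`, hypothetical), it is done while holding the lock, and the queues do not wrap; these are simplifications, not how the kernel code is structured.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QMAX 32

struct pktq {
	int pkt[QMAX];
	size_t head, tail;
};

/* Hypothetical receive side: one lock (the input queue's) guards both
 * the arrival queue and the input queue. */
struct mc_rcv {
	pthread_mutex_t lock;
	struct pktq arrvq;	/* filled by the broadcast link */
	struct pktq inputq;	/* consumed by the receiving socket */
};

static void q_push(struct pktq *q, int p)
{
	assert(q->tail < QMAX);
	q->pkt[q->tail++] = p;
}

static int q_pop(struct pktq *q, int *p)
{
	if (q->head == q->tail)
		return 0;
	*p = q->pkt[q->head++];
	return 1;
}

/* Step 1: the broadcast link appends arriving packets at the tail. */
void bclink_deliver(struct mc_rcv *r, int pkt)
{
	pthread_mutex_lock(&r->lock);
	q_push(&r->arrvq, pkt);
	pthread_mutex_unlock(&r->lock);
}

/* Step 2: take packets from the head, "look up" destinations, and feed
 * one clone per destination socket into the input queue, preserving
 * arrival order. */
void mcast_rcv(struct mc_rcv *r, int nsocks)
{
	int pkt;

	pthread_mutex_lock(&r->lock);
	while (q_pop(&r->arrvq, &pkt))
		for (int i = 0; i < nsocks; i++)
			q_push(&r->inputq, pkt);
	pthread_mutex_unlock(&r->lock);
}
```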

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-02-06 08:00:03 +0800
3c724acdd tipc: simplify socket multicast reception

The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination sockets on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there is more than one
multicast destination per node. This makes the list handling
unnecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, something that simplifies
the implementation of the function tipc_sk_mcast_rcv().
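A userspace sketch of a port list with stack semantics. The kernel version is built on struct list_head, which is replaced here by a plain singly linked list; the names and the use of 0 as an "empty" return value are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for a list_head based port list. */
struct plist {
	uint32_t port;
	struct plist *next;
};

bool plist_push(struct plist **top, uint32_t port)
{
	struct plist *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->port = port;
	e->next = *top;
	*top = e;
	return true;
}

/* Pops the most recently pushed port; returns 0 when the list is empty
 * (0 is assumed not to be a valid port number in this sketch). */
uint32_t plist_pop(struct plist **top)
{
	struct plist *e = *top;
	uint32_t port;

	if (!e)
		return 0;
	port = e->port;
	*top = e->next;
	free(e);
	return port;
}
```

A receive loop can then drain the list with `while ((port = plist_pop(&dports)) != 0)`, delivering one buffer per port, which keeps the common single-destination case as simple as the rare multi-destination one.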

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-02-06 08:00:03 +0800
c5898636c tipc: reduce usage of context info in socket and link

The most common usage of namespace information is when we fetch the
own node address from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
already maintain a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namespace-unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-02-06 08:00:01 +0800

13 Jan, 2015

4 commits

1da465683 tipc: make tipc broadcast link support net namespace

The TIPC broadcast link is statically established, and its state is
maintained in the global variables "bcbearer", "bclink" and "bcl".
To allow different namespaces to own different broadcast link
instances, these variables must be moved to the tipc_net structure,
and the broadcast link instances must be allocated and initialized
when a namespace is created.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
7f9f95d9d tipc: make bearer list support net namespace

The bearer list, defined as a global variable, is used to store bearer
instances. When tipc supports net namespaces, bearers created in
one namespace must be isolated from those allocated in other
namespaces, which requires that the bearer list (bearer_list)
be moved to the tipc_net structure. As a result, a net namespace
pointer has to be passed to the functions which access the bearer list.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800
f2f9800d4 tipc: make tipc node table aware of net namespace

The global variables associated with the node table are:
- node hash table (node_htable)
- node list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make the node table support namespaces, the above global variables
must be moved to the tipc_net structure in order to keep them private
to each namespace. As a consequence, these variables are allocated and
initialized when a namespace is created, and deallocated when the
namespace is destroyed. After the change, functions associated with
these variables have to use a namespace pointer to access them, so
adding a namespace pointer as a parameter to these functions is the
major change made in this commit.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800
c93d3baa2 tipc: involve namespace infrastructure

Involve the namespace infrastructure: make the global variable
"tipc_net_id" per-namespace, and rename it to "net_id". For this
conversion to succeed, an instance of the network namespace must be
passed to the relevant functions, allowing them to access the
per-namespace "net_id" variable.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800

27 Nov, 2014

1 commit

a6ca10944 tipc: use generic SKB list APIs to manage TIPC outgoing packet chains

Use the standard SKB list APIs associated with struct sk_buff_head to
manage the socket outgoing packet chain and the name table outgoing
packet chain, making the relevant code simpler and more readable.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-11-27 01:30:17 +0800

22 Nov, 2014

1 commit

7be57fc69 tipc: add link get/dump to new netlink api

Add TIPC_NL_LINK_GET command to the new tipc netlink API.

This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).

The information about a link includes name, transmission info,
properties and link statistics.

As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.

Netlink logical layout of link response message:
    -> port
        -> name
        -> MTU
        -> RX
        -> TX
        -> up flag
        -> active flag
        -> properties
            -> priority
            -> tolerance
            -> window
        -> statistics
            -> rx_info
            -> rx_fragments
            -> rx_fragmented
            -> rx_bundles
            -> rx_bundled
            -> tx_info
            -> tx_fragments
            -> tx_fragmented
            -> tx_bundles
            -> tx_bundled
            -> msg_prof_tot
            -> msg_len_cnt
            -> msg_len_tot
            -> msg_len_p0
            -> msg_len_p1
            -> msg_len_p2
            -> msg_len_p3
            -> msg_len_p4
            -> msg_len_p5
            -> msg_len_p6
            -> rx_states
            -> rx_probes
            -> rx_nacks
            -> rx_deferred
            -> tx_states
            -> tx_probes
            -> tx_nacks
            -> tx_acks
            -> retransmitted
            -> duplicates
            -> link_congs
            -> max_queue
            -> avg_queue

Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2014-11-22 04:01:30 +0800

08 Oct, 2014

1 commit

908344cdd tipc: fix bug in multicast congestion handling

One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.

If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be woken up when the congestion abates, as it should be.

This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never be tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.

This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.
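A sketch of why two distinct flags are needed: the unlock path inspects `action_flags` and drains a different wakeup queue for each flag. The flag names come from the text above, but their values, `struct node_model` and its counter fields are assumptions standing in for the real wakeup queues.

```c
#include <assert.h>

/* Flag names are from the commit text; their values are assumptions. */
enum {
	TIPC_WAKEUP_USERS       = 1 << 0,
	TIPC_WAKEUP_BCAST_USERS = 1 << 1,
};

struct node_model {
	unsigned int action_flags;
	int unicast_wakeups;	/* times the link wakeup queue was drained */
	int bcast_wakeups;	/* times the broadcast wakeup queue was drained */
};

/* Because the flags differ, the unlock path knows which queue of wakeup
 * pseudo-messages to deliver; both may be pending at once. */
void node_unlock_model(struct node_model *n)
{
	unsigned int flags;

	assert(n);
	flags = n->action_flags;
	n->action_flags = 0;
	if (flags & TIPC_WAKEUP_USERS)
		n->unicast_wakeups++;
	if (flags & TIPC_WAKEUP_BCAST_USERS)
		n->bcast_wakeups++;	/* where tipc_bclink_wakeup_user() would run */
}
```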

v2: using explicit 'unsigned int' declaration instead of 'uint', as
per comment from David Miller.

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Maloy
2014-10-08 02:50:15 +0800

17 Jul, 2014

3 commits

9fbfb8b12 tipc: rename temporarily named functions

After the previous commit, we can now give the functions with temporary
names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
names.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-07-17 12:38:19 +0800
c4116e105 tipc: remove unreferenced functions

We can now remove a number of functions which have become obsolete
and unreferenced through this commit series. There are no functional
changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-07-17 12:38:19 +0800
078bec826 tipc: add new functions for multicast and broadcast distribution

We add a new broadcast link transmit function in bclink.c and a new
receive function in socket.c. The purpose is to move the branching
between external and internal destination down to the link layer,
just as we have done with unicast in earlier commits. We also make
use of the new link-independent fragmentation support that was
introduced in an earlier commit series.

This gives a shorter and simpler code path, and makes it possible
to obtain copy-free buffer delivery to all node-local destination
sockets.

The new transmission code is added in parallel with the existing one,
and will be used by the socket multicast send function in the next
commit in this series.

Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-07-17 12:38:18 +0800

06 May, 2014

2 commits

3f5a12bd9 tipc: avoid to asynchronously reset all links

Postpone the actions of resetting all links until after the bclink
lock is released, thereby avoiding resetting all links asynchronously.

Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-05-06 05:26:45 +0800
eb8b00f5f tipc: convert allocations of global variables associated with bclink

Convert the global variables associated with bclink from static to
dynamic allocation, which simplifies initialisation of the bclink
instance. This also makes it easier for TIPC to support namespaces
in the future.
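A minimal sketch of the static-to-dynamic conversion, assuming a hypothetical `bclink_state` struct and hypothetical `bclink_create()`/`bclink_destroy()` helpers (not the kernel's actual names or fields). Because each instance is allocated on demand rather than living in static storage, more than one can exist at a time, which is the property later namespace support would rely on.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical broadcast-link state that previously lived in static storage,
 * e.g. "static struct bclink_state bcl;" allowing exactly one instance. */
struct bclink_state {
	unsigned int node_count;
	unsigned int window;
};

/* Allocate and initialise one instance; returns 0 or -ENOMEM. */
static int bclink_create(struct bclink_state **out, unsigned int window)
{
	struct bclink_state *b;

	assert(out != NULL);
	b = calloc(1, sizeof(*b)); /* zeroed, like static storage */
	if (!b)
		return -ENOMEM;
	b->window = window;
	*out = b;
	return 0;
}

/* Free an instance and clear the caller's pointer. */
static void bclink_destroy(struct bclink_state **b)
{
	free(*b);
	*b = NULL;
}
```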

Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-05-06 05:26:45 +0800

23 Apr, 2014

1 commit

28dd94187 tipc: use bc_lock to protect node map in bearer structure

The node map variable 'nodes' in the bearer structure is only used by
bclink. When bclink accesses it, bc_lock is held. However, when the map
is changed, for instance in tipc_bearer_add_dest() or
tipc_bearer_remove_dest(), bc_lock is not taken at all. To avoid
inconsistent data, we should always grab bc_lock while accessing the
node map variable.
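The locking rule can be illustrated with a small C sketch in which a pthread mutex stands in for bc_lock (the kernel uses a spinlock) and a 64-bit bitmap plus a count stands in for the bearer's node map. The names and the 64-node limit are assumptions for illustration. The point is that the bitmap and the count are always updated together under the same lock the reader takes, so a reader never sees one changed without the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_NODES 64 /* assumption: the sketch's bitmap holds 64 nodes */

/* Hypothetical node map: one bit per peer node, plus a node count. */
struct node_map {
	uint64_t bits;
	int count;
};

/* Stand-in for bc_lock: the lock the broadcast reader already holds. */
static pthread_mutex_t bc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: add a node, updating bits and count under bc_lock. */
static void bearer_add_dest(struct node_map *m, unsigned int node)
{
	uint64_t mask;

	assert(node < MAX_NODES);
	mask = UINT64_C(1) << node;
	pthread_mutex_lock(&bc_lock);
	if (!(m->bits & mask)) {
		m->bits |= mask;
		m->count++;
	}
	pthread_mutex_unlock(&bc_lock);
}

/* Writer: remove a node, again under bc_lock. */
static void bearer_remove_dest(struct node_map *m, unsigned int node)
{
	uint64_t mask;

	assert(node < MAX_NODES);
	mask = UINT64_C(1) << node;
	pthread_mutex_lock(&bc_lock);
	if (m->bits & mask) {
		m->bits &= ~mask;
		m->count--;
	}
	pthread_mutex_unlock(&bc_lock);
}

/* Reader: takes the same lock, so it sees a consistent map. */
static int node_map_count(struct node_map *m)
{
	int c;

	pthread_mutex_lock(&bc_lock);
	c = m->count;
	pthread_mutex_unlock(&bc_lock);
	return c;
}
```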

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Reviewed-by: Erik Hugne
Tested-by: Erik Hugne
Signed-off-by: David S. Miller

Ying Xue
2014-04-23 09:17:53 +0800

19 Feb, 2014

1 commit

247f0f3c3 tipc: align tipc function names with common naming practice in the network ...

Rename the following functions so that their names are shorter and
more in line with common naming practice in the network subsystem.

tipc_bclink_send_msg->tipc_bclink_xmit
tipc_bclink_recv_pkt->tipc_bclink_rcv
tipc_disc_recv_msg->tipc_disc_rcv
tipc_link_send_proto_msg->tipc_link_proto_xmit
link_recv_proto_msg->tipc_link_proto_rcv
link_send_sections_long->tipc_link_iovec_long_xmit
tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
tipc_link_send_sync->tipc_link_sync_xmit
tipc_link_recv_sync->tipc_link_sync_rcv
tipc_link_send_buf->__tipc_link_xmit
tipc_link_send->tipc_link_xmit
tipc_link_send_names->tipc_link_names_xmit
tipc_named_recv->tipc_named_rcv
tipc_link_recv_bundle->tipc_link_bundle_rcv
tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
link_send_long_buf->tipc_link_frag_xmit

tipc_multicast->tipc_port_mcast_xmit
tipc_port_recv_mcast->tipc_port_mcast_rcv
tipc_port_reject_sections->tipc_port_iovec_reject
tipc_port_recv_proto_msg->tipc_port_proto_rcv
tipc_connect->tipc_port_connect
__tipc_connect->__tipc_port_connect
__tipc_disconnect->__tipc_port_disconnect
tipc_disconnect->tipc_port_disconnect
tipc_shutdown->tipc_port_shutdown
tipc_port_recv_msg->tipc_port_rcv
tipc_port_recv_sections->tipc_port_iovec_rcv

release->tipc_release
accept->tipc_accept
bind->tipc_bind
get_name->tipc_getname
poll->tipc_poll
send_msg->tipc_sendmsg
send_packet->tipc_send_packet
send_stream->tipc_send_stream
recv_msg->tipc_recvmsg
recv_stream->tipc_recv_stream
connect->tipc_connect
listen->tipc_listen
shutdown->tipc_shutdown
setsockopt->tipc_setsockopt
getsockopt->tipc_getsockopt

The above changes have no impact on current users of the functions.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-02-19 06:31:59 +0800

18 Jun, 2013

1 commit

ae8509c42 tipc: cosmetic realignment of function arguments

No runtime code changes here, just a realignment of the function
arguments so that continuation lines start in the same column as the
first argument, fitting as many arguments as possible on an 80-character
line.

Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Paul Gortmaker
2013-06-18 06:53:01 +0800

01 May, 2012

1 commit

617d3c7a5 tipc: compress out gratuitous extra carriage returns

Some of the comment blocks are floating in limbo between two
functions, or between blocks of code. Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel. Also delete trailing newlines at EOF and fix
a couple of trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact. We get
rid of over 500 lines of non-code, and since these are all blank-line
deletions, they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-05-01 03:53:56 +0800

07 Feb, 2012

1 commit

7a54d4a99 tipc: Major redesign of broadcast link ACK/NACK algorithms

Completely redesigns broadcast link ACK and NACK mechanisms to prevent
spurious retransmit requests in dual LAN networks, and to prevent the
broadcast link from stalling due to the failure of a receiving node to
acknowledge receiving a broadcast message or request its retransmission.

Note: These changes only impact the timing of when ACK and NACK messages
are sent, and not the basic broadcast link protocol itself, so
interoperability with nodes using the "classic" algorithms is maintained.

The revised algorithms are as follows:

1) An explicit ACK message is still sent after receiving 16 in-sequence
messages, and implicit ACK information continues to be carried in other
unicast link message headers (including link state messages). However,
the timing of explicit ACKs is now based on the receiving node's absolute
network address rather than its relative network address to ensure that
the failure of another node does not delay the ACK beyond its 16 message
target.

2) A NACK message is now typically sent only when a message gap persists
for two consecutive incoming link state messages; this ensures that a
suspected gap is not confirmed until both LANs in a dual LAN network have
had an opportunity to deliver the message, thereby preventing spurious NACKs.
A NACK message can also be generated by the arrival of a single link state
message, if the deferred queue is so big that the current message gap
cannot be the result of "normal" mis-ordering due to the use of dual LANs
(or one LAN using a bonded interface). Since link state messages typically
arrive at different nodes at different times, the problem of multiple nodes
issuing identical NACKs simultaneously is inherently avoided.

3) Nodes continue to "peek" at NACK messages sent by other nodes. If
another node requests retransmission of a message gap suspected (but not
yet confirmed) by the peeking node, the peeking node forgets about the
gap and does not generate a duplicate retransmit request. (If the peeking
node subsequently fails to receive the lost message, later link state
messages will cause it to rediscover and confirm the gap and send another
NACK.)

4) Message gap "equality" is now determined by the start of the gap only.
This is sufficient to deal with the most common cases of message loss,
and eliminates the need for complex end-of-gap computations.

5) A peeking node no longer tries to determine whether it should send a
complementary NACK, since the most common cases of message loss don't
require it to be sent. Consequently, the node no longer examines the
"broadcast tag" field of a NACK message when peeking.

Signed-off-by: Allan Stephens
Signed-off-by: Paul Gortmaker

Allan Stephens
2012-02-07 05:59:18 +0800

30 Dec, 2011

1 commit

4584310b4 tipc: rename struct port_list to struct tipc_port_list

Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-12-30 10:53:29 +0800