20 Feb, 2016
1 commit
-
tipc_bcast_unlock need to be unlocked in error path.
Signed-off-by: Insu Yun
Signed-off-by: David S. Miller
17 Feb, 2016
1 commit
-
In commit 5266698661401a ("tipc: let broadcast packet reception
use new link receive function") we introduced a new per-node
broadcast reception link instance. This link is created at the
moment the node itself is created. Unfortunately, the allocation
is done after the node instance has already been added to the node
lookup hash table. This creates a potential race condition, where
arriving broadcast packets are able to find and access the node
before it has been fully initialized, and before the above mentioned
link has been created. The result is occasional crashes in the function
tipc_bcast_rcv(), which is trying to access the not-yet existing link.We fix this by deferring the addition of the node instance until after
it has been fully initialized in the function tipc_node_create().Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
30 Jan, 2016
1 commit
-
In 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
to events")', we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.Unfortunately, the same function tipc_subscrp_create() handles
subscription cancel request. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().
Fixes: commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")Reviewed-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
26 Dec, 2015
1 commit
-
By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller
04 Dec, 2015
2 commits
-
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.cAll three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller
-
Commit 5405ff6e15f40f2f ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.This commit fixes this bug.
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
01 Dec, 2015
1 commit
-
The memory barrier in the helper wq_has_sleeper is needed by just
about every user of waitqueue_active. This patch generalises it
by making it take a wait_queue_head_t directly. The existing
helper is renamed to skwq_has_sleeper.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
25 Nov, 2015
1 commit
-
Coverity says:
*** CID 1338065: Error handling issues (CHECKED_RETURN)
/net/tipc/udp_media.c: 162 in tipc_udp_send_msg()
156 struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
157 struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
158 struct sk_buff *clone;
159 struct rtable *rt;
160
161 if (skb_headroom(skb) < UDP_MIN_HEADROOM)
>>> CID 1338065: Error handling issues (CHECKED_RETURN)
>>> Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times).
162 pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
163
164 clone = skb_clone(skb, GFP_ATOMIC);
165 skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
166 ub = rcu_dereference_rtnl(b->media_ptr);
167 if (!ub) {When expanding buffer headroom over udp tunnel with pskb_expand_head(),
it's unfortunate that we don't check its return value. As a result, if
the function returns an error code due to the lack of memory, it may
cause unpredictable consequence as we unconditionally consider that
it's always successful.Fixes: e53567948f82 ("tipc: conditionally expand buffer headroom over udp tunnel")
Reported-by:
Cc: Stephen Hemminger
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
24 Nov, 2015
1 commit
-
Even if we drain receive queue thoroughly in tipc_release() after tipc
socket is removed from rhashtable, it is possible that some packets
are in flight because some CPU runs receiver and did rhashtable lookup
before we removed socket. They will achieve receive queue, but nobody
delete them at all. To avoid this leak, we register a private socket
destructor to purge receive queue, meaning releasing packets pending
on receive queue will be delayed until the last reference of tipc
socket will be released.Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
21 Nov, 2015
9 commits
-
Since commit 5266698661401afc5e ("tipc: let broadcast packet
reception use new link receive function") the broadcast send
link state was meant to always be set to LINK_ESTABLISHED, since
we don't need this link to follow the regular link FSM rules. It
was also the intention that this state anyway shouldn't impact
the run-time working state of the link, since the latter in
reality is controlled by the number of registered peers.We have now discovered that this assumption is not quite correct.
If the broadcast link is reset because of too many retransmissions,
its state will inadvertently go to LINK_RESETTING, and never go
back to LINK_ESTABLISHED, because the LINK_FAILURE event was not
anticipated. This will work well once, but if it happens a second
time, the reset on a link in LINK_RESETTING has has no effect, and
neither the broadcast link nor the unicast links will go down as
they should.Furthermore, it is confusing that the management tool shows that
this link is in UP state when that obviously isn't the case.We now ensure that this state strictly follows the true working
state of the link. The state is set to LINK_ESTABLISHED when
the number of peers is non-zero, and to LINK_RESET otherwise.Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
has been significantly reduced over the last couple of years.We now root out the last traces of this practice.
There are no functional changes in this commit.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
We move the definition of struct tipc_link from link.h to link.c in
order to minimize its exposure to the rest of the code.When needed, we define new functions to make it possible for external
entities to access and set data in the link.Apart from the above, there are no functional changes.
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to new location, and made static whenever possible.There are no functional changes in this commit.
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
According to the node FSM a node in state SELF_UP_PEER_UP cannot
change state inside a lock context, except when a TUNNEL_PROTOCOL
(SYNCH or FAILOVER) packet arrives. However, the node's individual
links may still change state.Since each link now is protected by its own spinlock, we finally have
the conditions in place to convert the node spinlock to an rwlock_t.
If the node state and arriving packet type are rigth, we can let the
link directly receive the packet under protection of its own spinlock
and the node lock in read mode. In all other cases we use the node
lock in write mode. This enables full concurrent execution between
parallel links during steady-state traffic situations, i.e., 99+ %
of the time.This commit implements this change.
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
As a preparation to allow parallel links to work more independently
from each other we introduce a per-link spinlock, to be stored in the
struct nodes's link entry area. Since the node lock still is a regular
spinlock there is no increase in parallellism at this stage.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The function tipc_node_check_state() contains the core logics
for handling link synchronization and failover. For this reason,
it is important to keep it as comprehensible as possible.In this commit, we make three small cleanups.
1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
packet confirms that the peer has lost contact, there will be no
further action in this function. To make this clearer, we return
from the function directly after the state change.2) Since commit 0f8b8e28fb3241f9fd ("tipc: eliminate risk of stalled
link synchronization") only the logically first TUNNEL_PROTO/SYNCH
packet can alter the link state and set the synch point,
independently of arrival order. Hence, there is not any longer any
need to adjust the synch value in case such packets arrive in
disorder. We remove this adjustment.3) It is the intention that any message arriving on any of the links
may trig a check for and possible termination of a node SYNCH state.
A redundant and unnoticed check for tipc_link_is_synching() obviously
beats this purpose, with the effect that only packets arriving on the
synching link may currently end the synch state. We remove this check.
This change will further shorten the synchronization period between
parallel links.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
04 Nov, 2015
1 commit
-
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.Signed-off-by: David S. Miller
02 Nov, 2015
1 commit
-
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.We now linearize all such buffers before they are delivered up to the
generic reception layer.In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().Fixes: commit d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
25 Oct, 2015
1 commit
-
TO: "David S. Miller"
CC: netdev@vger.kernel.org
CC: Jon Maloy
CC: Ying Xue
CC: tipc-discussion@lists.sourceforge.net
CC: linux-kernel@vger.kernel.orgSigned-off-by: Fengguang Wu
Acked-by: Jon Maloy
Signed-off-by: David S. Miller
24 Oct, 2015
17 commits
-
After the previous changes in this series, we can now remove some
unused code and structures, both in the broadcast, link aggregation
and link code.There are no functional changes in this commit.
Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Correct synchronization of the broadcast link at first contact between
two nodes is dependent on the assumption that the binding table "bulk"
update passes via the same link as the initial broadcast syncronization
message, i.e., via the first link that is established.This is not guaranteed in the current implementation. If two link
come up very close to each other in time, the "bulk" may quite well
pass via the second link, and hence void the guarantee of a correct
initial synchronization before the broadcast link is opened.This commit makes two small changes to strengthen this guarantee.
1) We let the second established link occupy slot 1 of the
"active_links" array, while the first link will retain slot 0.
(This is in reality a cosmetic change, we could just as well keep
the current, opposite order)2) We let the name distributor always use link selector/slot 0 when
it sends it binding table updates.The extra traffic bias on the first link caused by this change should
be negligible, since binding table updates constitutes a very small
fraction of the total traffic.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
With the recent commit series, we have established a one-way dependency
between the link aggregation (struct tipc_node) instances and their
pertaining tipc_link instances. This has enabled quite significant code
and structure simplifications.In this commit, we eliminate the field 'owner', which points to an
instance of struct tipc_node, from struct tipc_link, and replace it with
a pointer to struct net, which is the only external reference now needed
by a link instance.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Since all packet transmitters (link, bcast, discovery) are now sending
consumable buffer clones to the bearer layer, we can remove the
redundant buffer cloning that is perfomed in the lower level functions
tipc_l2_send_msg() and tipc_udp_send_msg().Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
The neighbor discovery function currently uses the function
tipc_bearer_send() for transmitting packets, assuming that the
sent buffers are not consumed by the called function.We want to change this, in order to avoid unnecessary buffer cloning
elswhere in the code.This commit introduces a new function tipc_bearer_skb() which consumes
the sent buffers, and let the discoverer functions use this new call
instead. The discoverer does now itself perform the cloning when
that is necessary.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Until now, we have only been supporting a fix MTU size of 1500 bytes
for all broadcast media, irrespective of their actual capability.We now make the broadcast MTU adaptable to the carrying media, i.e.,
we use the smallest MTU supported by any of the interfaces attached
to TIPC.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Until now, we have been keeping track of the exact set of broadcast
destinations though the help structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map only can support cluster local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.A closer analysis reveals that it is unnecessary maintaining this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.This commit introduces these changes.
Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single, broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to the own
instance.Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledges for
received broadcast packets.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Until now, we have tried to support both the newer, dedicated broadcast
synchronization mechanism along with the older, less safe, RESET_MSG/
ACTIVATE_MSG based one. The latter method has turned out to be a hazard
in a highly dynamic cluster, so we find it safer to disable it completely
when we find that the former mechanism is supported by the peer node.For this purpose, we now introduce a new capabability bit,
TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
syncronization is supported by the present node. The new bit is conveyed
between peers in the 'capabilities' field of neighbor discovery messages.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
This commit simplifies the broadcast link transmission function, by
leveraging previous changes to the link transmission function and the
broadcast transmission link life cycle.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Realizing that unicast is just a special case of broadcast, we also see
that we can go in the other direction, i.e., that modest changes to the
current unicast link can make it generic enough to support broadcast.The following changes are introduced here:
- A new counter ("ackers") in struct tipc_link, to indicate how many
peers need to ack a packet before it can be released.
- A corresponding counter in the skb user area, to keep track of how
many peers a are left to ack before a buffer can be released.
- A new counter ("acked"), to keep persistent track of how far a peer
has acked at the moment, i.e., where in the transmission queue to
start updating buffers when the next ack arrives. This is to avoid
double acknowledgements from a peer, with inadvertent relase of
packets as a result.
- A more generic tipc_link_retrans() function, where retransmit starts
from a given sequence number, instead of the first packet in the
transmision queue. This is to minimize the number of retransmitted
packets on the broadcast media.When the new functionality is taken into use in the next commits,
we expect it to have minimal effect on unicast mode performance.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
The broadcast link instance (struct tipc_link) used for sending is
currently aggregated into struct tipc_bclink. This means that we cannot
use the regular tipc_link_create() function for initiating the link, but
do instead have to initiate numerous fields directly from the
bcast_init() function.We want to reduce dependencies between the broadcast functionality
and the inner workings of tipc_link. In this commit, we introduce
a new function tipc_bclink_create() to link.c, and allocate the
instance of the link separately using this function.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
In reality, the link implementation is already independent from
struct tipc_bearer, in that it doesn't store any reference to it.
However, we still pass on a pointer to a bearer instance in the
function tipc_link_create(), just to have it extract some
initialization information from it.I later commits, we need to create instances of tipc_link without
having any associated struct tipc_bearer. To facilitate this, we
want to extract the initialization data already in the creator
function in node.c, before calling tipc_link_create(), and pass
this info on as individual parameters in the call.This commit introduces this change.
Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
The broadcast transmission link is currently instantiated when the
network subsystem is started, i.e., on order from user space via netlink.This forces the broadcast transmission code to do unnecessary tests for
the existence of the transmission link, as well in single mode node as
in network mode.In this commit, we do instead create the link during initialization of
the name space, and remove it when it is stopped. The fact that the
transmission link now has a guaranteed longer life cycle than any of its
potential clients paves the way for further code simplifcations
and optimizations.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introducee two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.There are no functional changes in this commit.
Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.hThe openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.Signed-off-by: David S. Miller
22 Oct, 2015
2 commits
-
In commit d999297c3dbbe ("tipc: reduce locking scope during packet reception")
we altered the packet retransmission function. Since then, when
restransmitting packets, we create a clone of the original buffer
using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of
the area we want to have copied, but also the smallest possible TIPC
packet size. The value of MIN_H_SIZE is 24.Unfortunately, __pskb_copy() also has the effect that the headroom
of the cloned buffer takes the size MIN_H_SIZE. This is too small
for carrying the packet over the UDP tunnel bearer, which requires
a minimum headroom of 28 bytes. A change to just use pskb_copy()
lets the clone inherit the original headroom of 80 bytes, but also
assumes that the copied data area is of at least that size, something
that is not always the case. So that is not a viable solution.We now fix this by adding a check for sufficient headroom in the
transmit function of udp_media.c, and expanding it when necessary.Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception")
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The current code for message reassembly is erroneously assuming that
the the first arriving fragment buffer always is linear, and then goes
ahead resetting the fragment list of that buffer in anticipation of
more arriving fragments.However, if the buffer already happens to be non-linear, we will
inadvertently drop the already attached fragment list, and later
on trig a BUG() in __pskb_pull_tail().We see this happen when running fragmented TIPC multicast across UDP,
something made possible since
commit d0f91938bede ("tipc: add ip/udp media type")We fix this by not resetting the fragment list when the buffer is non-
linear, and by initiatlizing our private fragment list tail pointer to
the tail of the existing fragment list.Fixes: commit d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller