15 Sep, 2019
1 commit
-
Minor overlapping changes in the btusb and ixgbe drivers.
Signed-off-by: David S. Miller
05 Sep, 2019
1 commit
-
Unlike kfree(p), kfree_rcu(p, rcu) won't do NULL pointer check. When
tipc_nametbl_remove_publ returns NULL, the panic below happens:BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
RIP: 0010:__call_rcu+0x1d/0x290
Call Trace:
tipc_publ_notify+0xa9/0x170 [tipc]
tipc_node_write_unlock+0x8d/0x100 [tipc]
tipc_node_link_down+0xae/0x1d0 [tipc]
tipc_node_check_dest+0x3ea/0x8f0 [tipc]
? tipc_disc_rcv+0x2c7/0x430 [tipc]
tipc_disc_rcv+0x2c7/0x430 [tipc]
? tipc_rcv+0x6bb/0xf20 [tipc]
tipc_rcv+0x6bb/0xf20 [tipc]
? ip_route_input_slow+0x9cf/0xb10
tipc_udp_recv+0x195/0x1e0 [tipc]
? tipc_udp_is_known_peer+0x80/0x80 [tipc]
udp_queue_rcv_skb+0x180/0x460
udp_unicast_rcv_skb.isra.56+0x75/0x90
__udp4_lib_rcv+0x4ce/0xb90
ip_local_deliver_finish+0x11c/0x210
ip_local_deliver+0x6b/0xe0
? ip_rcv_finish+0xa9/0x410
ip_rcv+0x273/0x362Fixes: 97ede29e80ee ("tipc: convert name table read-write lock to RCU")
Reported-by: Li Shuang
Signed-off-by: Xin Long
Signed-off-by: David S. Miller
19 Aug, 2019
1 commit
-
The policy for handling the skb list locks on the send and receive paths
is simple.- On the send path we never need to grab the lock on the 'xmitq' list
when the destination is an exernal node.- On the receive path we always need to grab the lock on the 'inputq'
list, irrespective of source node.However, when transmitting node local messages those will eventually
end up on the receive path of a local socket, meaning that the argument
'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the
function tipc_sk_rcv(). This has been handled by always initializing
the spinlock of the 'xmitq' list at message creation, just in case it
may end up on the receive path later, and despite knowing that the lock
in most cases never will be used.This approach is inaccurate and confusing, and has also concealed the
fact that the stated 'no lock grabbing' policy for the send path is
violated in some cases.We now clean up this by never initializing the lock at message creation,
instead doing this at the moment we find that the message actually will
enter the receive path. At the same time we fix the four locations
where we incorrectly access the spinlock on the send/error path.This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
is initialised") which has now become redundant.CC: Eric Dumazet
Reported-by: Chris Packham
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Reviewed-by: Xin Long
Signed-off-by: David S. Miller
13 Jul, 2019
1 commit
-
tipc_named_node_up() creates a skb list. It passes the list to
tipc_node_xmit() which has some code paths that can call
skb_queue_purge() which relies on the list->lock being initialised.The spin_lock is only needed if the messages end up on the receive path
but when the list is created in tipc_named_node_up() we don't
necessarily know if it is going to end up there.Once all the skb list users are updated in tipc it will then be possible
to update them to use the unlocked variants of the skb list functions
and initialise the lock when we know the message will follow the receive
path.Signed-off-by: Chris Packham
Acked-by: Jon Maloy
Signed-off-by: David S. Miller
23 Oct, 2018
1 commit
-
We have seen the following race scenario:
1) named_distribute() builds a "bulk" message, containing a PUBLISH
item for a certain publication. This is based on the contents of
the binding tables's 'cluster_scope' list.
2) tipc_named_withdraw() removes the same publication from the list,
bulds a WITHDRAW message and distributes it to all cluster nodes.
3) tipc_named_node_up(), which was calling named_distribute(), sends
out the bulk message built under 1)
4) The WITHDRAW message arrives at the just detected node, finds
no corresponding publication, and is dropped.
5) The PUBLISH item arrives at the same node, is added to its binding
table, and remains there forever.This arrival disordering was earlier taken care of by the backlog queue,
originally added for a different purpose, which was removed in the
commit referred to below, but we now need a different solution.
In this commit, we replace the rcu lock protecting the 'cluster_scope'
list with a regular RW lock which comprises even the sending of the
bulk message. This both guarantees both the list integrity and the
message sending order. We will later add a commit which cleans up
this code further.Note that this commit needs recently added commit d3092b2efca1 ("tipc:
fix unsafe rcu locking when accessing publication list") to apply
cleanly.Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table")
Reported-by: Tuong Lien Tong
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
16 Oct, 2018
1 commit
-
The binding table's 'cluster_scope' list is rcu protected to handle
races between threads changing the list and those traversing the list at
the same moment. We have now found that the function named_distribute()
uses the regular list_for_each() macro to traverse the said list.
Likewise, the function tipc_named_withdraw() is removing items from the
same list using the regular list_del() call. When these two functions
execute in parallel we see occasional crashes.This commit fixes this by adding the missing _rcu() suffixes.
Signed-off-by: Tung Nguyen
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
01 Apr, 2018
1 commit
-
With the new RB tree structure for service ranges it becomes possible to
solve an old problem; - we can now allow overlapping service ranges in
the table.When inserting a new service range to the tree, we use 'lower' as primary
key, and when necessary 'upper' as secondary key.Since there may now be multiple service ranges matching an indicated
'lower' value, we must also add the 'upper' value to the functions
used for removing publications, so that the correct, corresponding
range item can be found.These changes guarantee that a well-formed publication/withdrawal item
from a peer node never will be rejected, and make it possible to
eliminate the problematic backlog functionality we currently have for
handling such cases.Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
24 Mar, 2018
2 commits
-
We add a 128-bit node identity, as an alternative to the currently used
32-bit node address.For the sake of compatibility and to minimize message header changes
we retain the existing 32-bit address field. When not set explicitly by
the user, this field will be filled with a hash value generated from the
much longer node identity, and be used as a shorthand value for the
latter.We permit either the address or the identity to be set by configuration,
but not both, so when the address value is set by a legacy user the
corresponding 128-bit node identity is generated based on the that value.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
As a preparation to changing the addressing structure of TIPC we replace
all direct accesses to the tipc_net::own_addr field with the function
dedicated for this, tipc_own_addr().There are no changes to program logics in this commit.
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
18 Mar, 2018
3 commits
-
We rename some lists and fields in struct publication both to make
the naming more consistent and to better reflect their roles. We
also update the descriptions of those lists.node_list -> local_publ
cluster_list -> all_publ
pport_list -> binding_sock
ref -> portThere are no functional changes in this commit.
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The size of struct publication can be reduced further. Membership in
lists 'nodesub_list' and 'local_list' is mutually exlusive, in that
remote publications use the former and local publications the latter.
We replace the two lists with one single, named 'binding_node' which
reflects what it really is.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
As a consequence of the previous commit we nan now eliminate zone scope
related lists in the name table. We start with name_table::publ_list[3],
which can now be replaced with two lists, one for node scope publications
and one for cluster scope publications.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
17 Jan, 2017
1 commit
-
Until now, we allocate memory always with GFP_ATOMIC flag.
When the system is under memory pressure and a user tries to send,
the send fails due to low memory. However, the user application
can wait for free memory if we allocate it using GFP_KERNEL flag.In this commit, we use allocate memory with GFP_KERNEL for all user
allocation.Reported-by: Rune Torgersen
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
30 Oct, 2016
1 commit
-
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.This minor protocol update is fully backwards compatible.
Reported-by: John Thompson
Tested-by: John Thompson
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
02 Sep, 2016
1 commit
-
In a dual bearer configuration, if the second tipc link becomes
active while the first link still has pending nametable "bulk"
updates, it randomly leads to reset of the second link.When a link is established, the function named_distribute(),
fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
with NAME_DISTRIBUTOR message for each PUBLICATION.
However, the function named_distribute() allocates the buffer by
increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
This consumes the space allocated for TUNNEL_PROTOCOL.When establishing the second link, the link shall tunnel all the
messages in the first link queue including the "bulk" update.
As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
the link mtu the transmission fails (-EMSGSIZE).Thus, the synch point based on the message count of the tunnel
packets is never reached leading to link timeout.In this commit, we adjust the size of name distributor message so that
they can be tunnelled.Reviewed-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
12 Apr, 2016
2 commits
-
If a peer node becomes unavailable, in addition to removing the
nametable entries from this node we also need to purge all deferred
updates associated with this node.Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
Nametable updates received from the network that cannot be applied
immediately are placed on a defer queue. This queue is global to the
TIPC module, which might cause problems when using TIPC in containers.
To prevent nametable updates from escaping into the wrong namespace,
we make the queue pernet instead.Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
21 Nov, 2015
2 commits
-
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
24 Oct, 2015
1 commit
-
Correct synchronization of the broadcast link at first contact between
two nodes is dependent on the assumption that the binding table "bulk"
update passes via the same link as the initial broadcast syncronization
message, i.e., via the first link that is established.This is not guaranteed in the current implementation. If two link
come up very close to each other in time, the "bulk" may quite well
pass via the second link, and hence void the guarantee of a correct
initial synchronization before the broadcast link is opened.This commit makes two small changes to strengthen this guarantee.
1) We let the second established link occupy slot 1 of the
"active_links" array, while the first link will retain slot 0.
(This is in reality a cosmetic change, we could just as well keep
the current, opposite order)2) We let the name distributor always use link selector/slot 0 when
it sends it binding table updates.The extra traffic bias on the first link caused by this change should
be negligible, since binding table updates constitutes a very small
fraction of the total traffic.Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller
21 Jul, 2015
2 commits
-
Currently, message sending is performed through a deep call chain,
where the node spinlock is grabbed and held during a significant
part of the transmission time. This is clearly detrimental to
overall throughput performance; it would be better if we could send
the message after the spinlock has been released.In this commit, we do instead let the call revert on the stack after
the buffer chain has been added to the transmission queue, whereafter
clones of the buffers are transmitted to the device layer outside the
spinlock scope.As a further step in our effort to separate the roles of the node
and link entities we also move the function tipc_link_xmit() to
node.c, and rename it to tipc_node_xmit().Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.We now group those into a new struct 'tipc_link_entry', and intoduce
one single array consisting of such enties. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
30 Mar, 2015
1 commit
-
TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
28 Feb, 2015
1 commit
-
The TIPC name distributor pushes topology updates to the cluster
neighbors. Currently this is done in a unicast manner, and the
skb holding the update is cloned for each cluster member. This
is unnecessary, as we only modify the destnode field in the header
so we change it to do pskb_copy instead.Signed-off-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
06 Feb, 2015
2 commits
-
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.This solves the sequentiality problem, and seems to cause no measurable
performance degradation.A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
13 Jan, 2015
3 commits
-
If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller -
TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller -
Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
09 Dec, 2014
3 commits
-
Convert tipc name table read-write lock to RCU. After this change,
a new spin lock is used to protect name table on write side while
RCU is applied on read side.Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Tested-by: Erik Hugne
Signed-off-by: David S. Miller -
Name table locking policy is going to be adjusted from read-write
lock protection to RCU lock protection in the future commits. But
its essential precondition is to convert the allocation way of name
table from static to dynamic mode.Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Tested-by: Erik Hugne
Signed-off-by: David S. Miller -
The size variable is introduced in publ_list struct to help us exactly
calculate SKB buffer sizes needed by publications when all publications
in name table are delivered in bulk in named_distribute(). But if
publication SKB buffer size is assumed to MTU, the size variable in
publ_list struct can be completely eliminated at the cost of wasting
a bit memory space for last SKB.Signed-off-by: Ying Xue
Signed-off-by: Tero Aho
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Tested-by: Erik Hugne
Signed-off-by: David S. Miller
27 Nov, 2014
2 commits
-
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller -
The node subscribe infrastructure represents a virtual base class, so
its users, such as struct tipc_port and struct publication, can derive
its implemented functionalities. However, after the removal of struct
tipc_port, struct publication is left as its only single user now. So
defining an abstract infrastructure for one user becomes no longer
reasonable. If corresponding new functions associated with the
infrastructure are moved to name_table.c file, the node subscription
infrastructure can be removed as well.Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
11 Sep, 2014
1 commit
-
This fixes the following sparse warnings:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return bool upon success, rather than a
potentially freed pointer.Signed-off-by: Erik Hugne
Reported-by: Dan Carpenter
Signed-off-by: David S. Miller
02 Sep, 2014
2 commits
-
TIPC name table updates are distributed asynchronously in a cluster,
entailing a risk of certain race conditions. E.g., if two nodes
simultaneously issue conflicting (overlapping) publications, this may
not be detected until both publications have reached a third node, in
which case one of the publications will be silently dropped on that
node. Hence, we end up with an inconsistent name table.In most cases this conflict is just a temporary race, e.g., one
node is issuing a publication under the assumption that a previous,
conflicting, publication has already been withdrawn by the other node.
However, because of the (rtt related) distributed update delay, this
may not yet hold true on all nodes. The symptom of this failure is a
syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".In this commit we add a resiliency queue at the receiving end of
the name table distributor. When insertion of an arriving publication
fails, we retain it in this queue for a short amount of time, assuming
that another update will arrive very soon and clear the conflict. If so
happens, we insert the publication, otherwise we drop it.The (configurable) retention value defaults to 2000 ms. Knowing from
experience that the situation described above is extremely rare, there
is no risk that the queue will accumulate any large number of items.Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller -
We need to perform the same actions when processing deferred name
table updates, so this functionality is moved to a separate
function.Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller
17 Jul, 2014
2 commits
-
After the previous commit, we can now give the functions with temporary
names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
names.There are no functional changes in this commit.
Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller -
In a previous commit series ("tipc: new unicast transmission code")
we introduced a new message sending function, tipc_link_xmit2(),
and moved the unicast data users over to use that function. We now
let the internal name table distributor do the same.The interaction between the name distributor and the node/link
layer also becomes significantly simpler, so we can eliminate
the function tipc_link_names_xmit().Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller
06 May, 2014
1 commit
-
Postpone the actions of delivering name tables until after node
lock is released, avoiding to do it under asynchronous context.Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller