Eric Lee / smarc-fsl-linux-kernel

16 Jun, 2016

1 commit

35c55c987 tipc: add neighbor monitoring framework ... Browse Code »

TIPC based clusters are by default set up with full-mesh link
connectivity between all nodes. Those links are expected to provide
a short failure detection time, by default set to 1500 ms. Because
of this, the background load for neighbor monitoring in an N-node
cluster increases with a factor N on each node, while the overall
monitoring traffic through the network infrastructure increases at
a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
scale well beyond ~100 nodes unless we significantly increase failure
discovery tolerance.

This commit introduces a framework and an algorithm that drastically
reduces this background load, while basically maintaining the original
failure detection times across the whole cluster. Using this algorithm,
background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
now have to actively monitor 38 neighbors in a 400-node cluster, instead
of as before 399.

This "Overlapping Ring Supervision Algorithm" is completely distributed
and employs no centralized or coordinated state. It goes as follows:

- Each node makes up a linearly ascending, circular list of all its N
known neighbors, based on their TIPC node identity. This algorithm
must be the same on all nodes.

- The node then selects the next M = sqrt(N) - 1 nodes downstream from
itself in the list, and chooses to actively monitor those. This is
called its "local monitoring domain".

- It creates a domain record describing the monitoring domain, and
piggy-backs this in the data area of all neighbor monitoring messages
(LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
the cluster eventually (default within 400 ms) will learn about
its monitoring domain.

- Whenever a node discovers a change in its local domain, e.g., a node
has been added or has gone down, it creates and sends out a new
version of its node record to inform all neighbors about the change.

- A node receiving a domain record from anybody outside its local domain
matches this against its own list (which may not look the same), and
chooses to not actively monitor those members of the received domain
record that are also present in its own list. Instead, it relies on
indications from the direct monitoring nodes if an indirectly
monitored node has gone up or down. If a node is indicated lost, the
receiving node temporarily activates its own direct monitoring towards
that node in order to confirm, or not, that it is actually gone.

- Since each node is actively monitoring sqrt(N) downstream neighbors,
each node is also actively monitored by the same number of upstream
neighbors. This means that all non-direct monitoring nodes normally
will receive sqrt(N) indications that a node is gone.

- A major drawback with ring monitoring is how it handles failures that
cause massive network partitionings. If both a lost node and all its
direct monitoring neighbors are inside the lost partition, the nodes in
the remaining partition will never receive indications about the loss.
To overcome this, each node also chooses to actively monitor some
nodes outside its local domain. Those nodes are called remote domain
"heads", and are selected in such a way that no node in the cluster
will be more than two direct monitoring hops away. Because of this,
each node, apart from monitoring the member of its local domain, will
also typically monitor sqrt(N) remote head nodes.

- As an optimization, local list status, domain status and domain
records are marked with a generation number. This saves senders from
unnecessarily conveying unaltered domain records, and receivers from
performing unneeded re-adaptations of their node monitoring list, such
as re-assigning domain heads.

- As a measure of caution we have added the possibility to disable the
new algorithm through configuration. We do this by keeping a threshold
value for the cluster size; a cluster that grows beyond this value
will switch from full-mesh to ring monitoring, and vice versa when
it shrinks below the value. This means that if the threshold is set to
a value larger than any anticipated cluster size (default size is 32)
the new algorithm is effectively disabled. A patch set for altering the
threshold value and for listing the table contents will follow shortly.

- This change is fully backwards compatible.

Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-06-16 05:06:28 +0800

12 Apr, 2016

1 commit

541726abe tipc: make dist queue pernet ... Browse Code »

Nametable updates received from the network that cannot be applied
immediately are placed on a defer queue. This queue is global to the
TIPC module, which might cause problems when using TIPC in containers.
To prevent nametable updates from escaping into the wrong namespace,
we make the queue pernet instead.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2016-04-12 03:22:20 +0800

21 Nov, 2015

1 commit

1d7e1c259 tipc: reduce code dependency between binding table and node layer ... Browse Code »

The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-11-21 03:06:10 +0800

24 Oct, 2015

4 commits

2af5ae372 tipc: clean up unused code and structures ... Browse Code »

After the previous changes in this series, we can now remove some
unused code and structures, both in the broadcast, link aggregation
and link code.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:47 +0800
526669866 tipc: let broadcast packet reception use new link receive function ... Browse Code »

The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.

We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single, broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to the own
instance.

Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledges for
received broadcast packets.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:37 +0800
0043550b0 tipc: move broadcast link lock to struct tipc_net ... Browse Code »

The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introducee two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:25 +0800
6beb19a62 tipc: move bcast definitions to bcast.c ... Browse Code »

Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.

In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.

We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.

Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-10-24 21:56:24 +0800

31 Jul, 2015

1 commit

440d8963c tipc: clean up link creation ... Browse Code »

We simplify the link creation function tipc_link_create() and the way
the link struct it is connected to the node struct. In particular, we
remove the duplicate initialization of some fields which are anyway set
in tipc_link_reset().

Tested-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-31 08:25:15 +0800

21 Jul, 2015

1 commit

d999297c3 tipc: reduce locking scope during packet reception ... Browse Code »

We convert packet/message reception according to the same principle
we have been using for message sending and timeout handling:

We move the function tipc_rcv() to node.c, hence handling the initial
packet reception at the link aggregation level. The function grabs
the node lock, selects the receiving link, and accesses it via a new
call tipc_link_rcv(). This function appends buffers to the input
queue for delivery upwards, but it may also append outgoing packets
to the xmit queue, just as we do during regular message sending. The
latter will happen when buffers are forwarded from the link backlog,
or when retransmission is requested.

Upon return of this function, and after having released the node lock,
tipc_rcv() delivers/tranmsits the contents of those queues, but it may
also perform actions such as link activation or reset, as indicated by
the return flags from the link.

This reduces the number of cpu cycles spent inside the node spinlock,
and reduces contention on that lock.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:16 +0800

15 May, 2015

2 commits

e4bf4f769 tipc: simplify packet sequence number handling ... Browse Code »

Although the sequence number in the TIPC protocol is 16 bits, we have
until now stored it internally as an unsigned 32 bits integer.
We got around this by always doing explicit modulo-65535 operations
whenever we need to access a sequence number.

We now make the incoming and outgoing sequence numbers to unsigned
16-bit integers, and remove the modulo operations where applicable.

We also move the arithmetic inline functions for 16 bit integers
to core.h, and the function buf_seqno() to msg.h, so they can easily
be accessed from anywhere in the code.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
a6bf70f79 tipc: simplify include dependencies ... Browse Code »

When we try to add new inline functions in the code, we sometimes
run into circular include dependencies.

The main problem is that the file core.h, which really should be at
the root of the dependency chain, instead is a leaf. I.e., core.h
includes a number of header files that themselves should be allowed
to include core.h. In reality this is unnecessary, because core.h does
not need to know the full signature of any of the structs it refers to,
only their type declaration.

In this commit, we remove all dependencies from core.h towards any
other tipc header file.

As a consequence of this change, we can now move the function
tipc_own_addr(net) from addr.c to addr.h, and make it inline.

There are no functional changes in this commit.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:45 +0800

10 Feb, 2015

2 commits

941787b82 tipc: remove tipc_snprintf ... Browse Code »

tipc_snprintf() was heavily utilized by the old netlink API which no
longer exists (now netlink compat).

In this patch we swap tipc_snprintf() to the identical scnprintf() in
the only remaining occurrence.

Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Richard Alpe
2015-02-10 05:20:49 +0800
bfb3e5dd8 tipc: move and rename the legacy nl api to "nl compat" ... Browse Code »

The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.

Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Richard Alpe
2015-02-10 05:20:47 +0800

13 Jan, 2015

11 commits

bafa29e34 tipc: make tipc random value aware of net namespace ... Browse Code »

After namespace is supported, each namespace should own its private
random value. So the global variable representing the random value
must be moved to tipc_net structure.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
a62fbccec tipc: make subscriber server support net namespace ... Browse Code »

TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
347475395 tipc: make tipc node address support net namespace ... Browse Code »

If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
4ac1c8d0e tipc: name tipc name table support net namespace ... Browse Code »

TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
e05b31f4b tipc: make tipc socket support net namespace ... Browse Code »

Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
1da465683 tipc: make tipc broadcast link support net namespace ... Browse Code »

TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:33 +0800
7f9f95d9d tipc: make bearer list support net namespace ... Browse Code »

Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800
f2f9800d4 tipc: make tipc node table aware of net namespace ... Browse Code »

Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800
c93d3baa2 tipc: involve namespace infrastructure ... Browse Code »

Involve namespace infrastructure, make the "tipc_net_id" global
variable aware of per namespace, and rename it to "net_id". In
order that the conversion can be successfully done, an instance
of networking namespace must be passed to relevant functions,
allowing them to access the "net_id" variable of per namespace.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:32 +0800
859fc7c0c tipc: cleanup core.c and core.h files ... Browse Code »

Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:31 +0800
2f55c4378 tipc: remove unnecessary wrapper functions of kernel timer APIs ... Browse Code »

Not only some wrapper function like k_term_timer() is empty, but also
some others including k_start_timer() and k_cancel_timer() don't return
back any value to its caller, what's more, there is no any component
in the kernel world to do such thing. Therefore, these timer interfaces
defined in tipc module should be purged.

Signed-off-by: Ying Xue
Tested-by: Tero Aho
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-01-13 05:24:31 +0800

09 Jan, 2015

1 commit

07f6c4bc0 tipc: convert tipc reference table to use generic rhashtable ... Browse Code »

As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.

If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.

Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().

Signed-off-by: Ying Xue
Acked-by: Jon Maloy
Acked-by: Erik Hugne
Cc: Thomas Graf
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Ying Xue
2015-01-09 11:47:14 +0800

27 Nov, 2014

1 commit

58311d169 tipc: eliminate two pseudo message types of BUNDLE_OPEN and BUNDLE_CLOSED ... Browse Code »

The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
used to flag whether or not more messages can be bundled into a data
packet in the outgoing transmission queue. Obviously, no more messages
can be appended after the packet has been sent and is waiting to be
acknowledged and deleted. These message types do in reality represent
a send-side local implementation flag, and are not defined as part of
the protocol. It is therefore safe to move it to to where it belongs,
that is, the control area (TIPC_SKB_CB) of the buffer.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-11-27 01:30:17 +0800

22 Nov, 2014

1 commit

0655f6a86 tipc: add bearer disable/enable to new netlink api ... Browse Code »

A new netlink API for tipc that can disable or enable a tipc bearer.

The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.

The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.

Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE

Netlink logical layout of bearer enable message:
-> bearer
-> name
[ -> domain ]
[
-> properties
-> priority
]

Netlink logical layout of bearer disable message:
-> bearer
-> name

Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2014-11-22 04:01:29 +0800

02 Sep, 2014

1 commit

a5325ae5b tipc: add name distributor resiliency queue ... Browse Code »

TIPC name table updates are distributed asynchronously in a cluster,
entailing a risk of certain race conditions. E.g., if two nodes
simultaneously issue conflicting (overlapping) publications, this may
not be detected until both publications have reached a third node, in
which case one of the publications will be silently dropped on that
node. Hence, we end up with an inconsistent name table.

In most cases this conflict is just a temporary race, e.g., one
node is issuing a publication under the assumption that a previous,
conflicting, publication has already been withdrawn by the other node.
However, because of the (rtt related) distributed update delay, this
may not yet hold true on all nodes. The symptom of this failure is a
syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

In this commit we add a resiliency queue at the receiving end of
the name table distributor. When insertion of an arriving publication
fails, we retain it in this queue for a short amount of time, assuming
that another update will arrive very soon and clear the conflict. If so
happens, we insert the publication, otherwise we drop it.

The (configurable) retention value defaults to 2000 ms. Knowing from
experience that the situation described above is extremely rare, there
is no risk that the queue will accumulate any large number of items.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller

Erik Hugne
2014-09-02 08:51:48 +0800

24 Aug, 2014

1 commit

50100a5e3 tipc: use pseudo message to wake up sockets after link congestion ... Browse Code »

The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-08-24 02:18:33 +0800

15 May, 2014

2 commits

38504c28a tipc: improve and extend media address conversion functions ... Browse Code »

TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:

1) A "raw" format as obtained from the device. This format is known
only by the media specific adapter code in eth_media.c and
ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
which can be referenced and passed around by the generic media-
unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
discovery messages.

Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.

We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try to as far
as possible uniform commenting, variable names and usage of these
functions, with the purpose of making them more comprehensible.

We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.

Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-05-15 03:19:48 +0800
37e22164a tipc: rename and move message reassembly function ... Browse Code »

The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassemby function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.

In this commit rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.

Signed-off-by: Jon Maloy
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Jon Paul Maloy
2014-05-15 03:19:48 +0800

06 May, 2014

1 commit

52ff87205 tipc: purge signal handler infrastructure ... Browse Code »

In the previous commits of this series, we removed all asynchronous
actions which were based on the tasklet handler - "tipc_k_signal()".

So the moment has now come when we can completely remove the tasklet
handler infrastructure. That is done with this commit.

Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2014-05-06 05:26:45 +0800

23 Apr, 2014

1 commit

ef13a262c tipc: replace config_mutex lock with RTNL lock ... Browse Code »

There have two paths where we can configure or change bearer status:
one is that bearer is configured from user space with tipc-config
tool; another one is that bearer is changed by notification events
from its attached interface. On the first path, one dedicated
config_mutex lock is guarded; on the latter path, RTNL lock has been
placed to serialize the process of dealing with interface events.
So, if RTNL lock is also used to protect the first path, this will
not only extremely help us simplify current locking policy, but also
config_mutex lock can be deleted as well.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Reviewed-by: Erik Hugne
Tested-by: Erik Hugne
Signed-off-by: David S. Miller

Ying Xue
2014-04-23 09:17:52 +0800

28 Mar, 2014

1 commit

5902385a2 tipc: obsolete the remote management feature ... Browse Code »

Due to the lacking of any credential, it's allowed to accept commands
requested from remote nodes to query the local node status, which is
prone to involve potential security risks. Instead, if we login to
a remote node with ssh command, this approach is not only more safe
than the remote management feature, but also it can give us more
permissions like changing the remote node configuration. So it's
reasonable for us to obsolete the remote management feature now.

Signed-off-by: Ying Xue
Reviewed-by: Erik Hugne
Signed-off-by: David S. Miller

Ying Xue
2014-03-28 01:08:36 +0800

22 Feb, 2014

1 commit

970122fdf tipc: make bearer set up in module insertion stage ... Browse Code »

Accidentally a side effect is involved by commit 6e967adf7(tipc:
relocate common functions from media to bearer). Now tipc stack
handler of receiving packets from netdevices as well as netdevice
notification handler are registered when bearer is enabled rather
than tipc module initialization stage, but the two handlers are
both unregistered in tipc module exit phase. If tipc module is
inserted and then immediately removed, the following warning
message will appear:

"dev_remove_pack: ffffffffa0380940 not found"

This is because in module insertion stage tipc stack packet handler
is not registered at all, but in module exit phase dev_remove_pack()
needs to remove it. Of course, dev_remove_pack() cannot find tipc
protocol handler from the kernel protocol handler list so that the
warning message is printed out.

But if registering the two handlers is adjusted from enabling bearer
phase into inserting module stage, the warning message will be
eliminated. Due to this change, tipc_core_start_net() and
tipc_core_stop_net() can be deleted as well.

Reported-by: Wang Weidong
Cc: Jon Maloy
Cc: Erik Hugne
Signed-off-by: Ying Xue
Reviewed-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2014-02-22 13:00:15 +0800

14 Feb, 2014

1 commit

64380a04d tipc: fix message corruption bug for deferred packets ... Browse Code »

If a packet received on a link is out-of-sequence, it will be
placed on a deferred queue and later reinserted in the receive
path once the preceding packets have been processed. The problem
with this is that it will be subject to the buffer adjustment from
link_recv_buf_validate twice. The second adjustment for 20 bytes
header space will corrupt the packet.

We solve this by tagging the deferred packets and bail out from
receive buffer validation for packets that have already been
subjected to this.

Signed-off-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2014-02-14 05:35:05 +0800

17 Dec, 2013

1 commit

776a74ce0 tipc: Use <linux/uaccess.h> instead of <asm/uaccess.h> ... Browse Code »

As warned by checkpatch.pl, use #include
instead of

Reviewed-by: Jon Maloy
Reviewed-by: Erik Hugne
Signed-off-by: Wang Weidong
Signed-off-by: David S. Miller

wangweidong
2013-12-17 01:48:35 +0800

20 Oct, 2013

1 commit

c1b1203d6 net: misc: Remove extern from function prototypes ... Browse Code »

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2013-10-20 07:12:11 +0800

18 Jun, 2013

2 commits

c5fa7b3cf tipc: introduce new TIPC server infrastructure ... Browse Code »

TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.

As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policiy is required
in order to protect TIPC internal resources.

To eliminate the need for this complex lock policiy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the toplogy and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.

The new service also solves another problem:

As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.

Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.

As both sending and receive messages are now finished in processes,
subscribed events cannot be dropped any more.

As of this commit, this new server infrastructure is built, but
not actually yet called by the existing TIPC code, but since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2013-06-18 06:53:00 +0800
cc79dd1ba tipc: change socket buffer overflow control to respect sk_rcvbuf ... Browse Code »

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command. It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.

Originally-by: Jon Maloy
Cc: Neil Horman
Cc: Jon Maloy
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: Ying Xue
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2013-06-18 06:53:00 +0800