09 Aug, 2019
1 commit
-
Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface. The clones are dropped at reception
and have no functional role except making the traffic visible.The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.v3 - Function dev_nit_active used to check for network taps.
- Procedure netif_rx_ni used to send cloned messages to loopback device.Signed-off-by: John Rutherford
Acked-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller
08 Jul, 2019
1 commit
-
For these places are protected by rcu_read_lock, we change from
rcu_dereference_rtnl to rcu_dereference, as there is no need to
check if rtnl lock is held.For these places are protected by rtnl_lock, we change from
rcu_dereference_rtnl to rtnl_dereference/rcu_dereference_protected,
as no extra memory barriers are needed under rtnl_lock() which also
protects tn->bearer_list[] and dev->tipc_ptr/b->media_ptr updating.rcu_dereference_rtnl will be only used in the places where it could
be under rcu_read_lock or rtnl_lock.Signed-off-by: Xin Long
Signed-off-by: David S. Miller
28 Apr, 2019
2 commits
-
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected acceptedSplit out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute sizeThe default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecatedUsing spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller -
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)Signed-off-by: Michal Kubecek
Acked-by: Jiri Pirko
Acked-by: David Ahern
Signed-off-by: David S. Miller
28 Dec, 2018
1 commit
-
bearer_disable() already calls kfree_rcu() to free struct tipc_bearer,
we don't need to call kfree() again.Fixes: cb30a63384bc ("tipc: refactor function tipc_enable_bearer()")
Reported-by: syzbot+b981acf1fb240c0c128b@syzkaller.appspotmail.com
Cc: Ying Xue
Cc: Jon Maloy
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
20 Dec, 2018
2 commits
-
The commit adds the new trace_event for TIPC bearer, L2 device event:
trace_tipc_l2_device_event()
Also, it puts the trace at the tipc_l2_device_event() function, then
the device/bearer events and related info can be traced out during
runtime when needed.Acked-by: Ying Xue
Tested-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller -
As for the sake of debugging/tracing, the commit enables tracepoints in
TIPC along with some general trace_events as shown below. It also
defines some 'tipc_*_dump()' functions that allow to dump TIPC object
data whenever needed, that is, for general debug purposes, ie. not just
for the trace_events.The following trace_events are now available:
- trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
e.g. message type, user, droppable, skb truesize, cloned skb, etc.- trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
queues, e.g. TIPC link transmq, socket receive queue, etc.- trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
sk state, sk type, connection type, rmem_alloc, socket queues, etc.- trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
link state, silent_intv_cnt, gap, bc_gap, link queues, etc.- trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
node state, active links, capabilities, link entries, etc.How to use:
Put the trace functions at any places where we want to dump TIPC data
or events.Note:
a) The dump functions will generate raw data only, that is, to offload
the trace event's processing, it can require a tool or script to parse
the data but this should be simple.b) The trace_tipc_*_dump() should be reserved for a failure cases only
(e.g. the retransmission failure case) or where we do not expect to
happen too often, then we can consider enabling these events by default
since they will almost not take any effects under normal conditions,
but once the rare condition or failure occurs, we get the dumped data
fully for post-analysis.For other trace purposes, we can reuse these trace classes as template
but different events.c) A trace_event is only effective when we enable it. To enable the
TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
directory in the 'debugfs' file system. Normally, they are located at:/sys/kernel/debug/tracing/events/tipc/
For example:
To enable the tipc_link_dump event:
echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
To enable all the TIPC trace_events:
echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
To collect the trace data:
cat trace
or
cat trace_pipe > /trace.out &
To disable all the TIPC trace_events:
echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
To clear the trace buffer:
echo > trace
d) Like the other trace_events, the feature like 'filter' or 'trigger'
is also usable for the tipc trace_events.
For more details, have a look at:Documentation/trace/ftrace.txt
MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
Acked-by: Ying Xue
Tested-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
Signed-off-by: David S. Miller
04 Oct, 2018
1 commit
-
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.Signed-off-by: David S. Miller
26 Sep, 2018
1 commit
-
If we detect that under lying carrier detects errors and goes down,
we reset the bearer.Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
11 Sep, 2018
1 commit
-
An SKB is not on a list if skb->next is NULL.
Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.Signed-off-by: David S. Miller
27 Jul, 2018
1 commit
-
when tipc_own_id failed to obtain node identity,dev_put should
be call before return -EINVAL.Fixes: 682cd3cf946b ("tipc: confgiure and apply UDP bearer MTU on running links")
Signed-off-by: YueHaibing
Signed-off-by: David S. Miller
05 Jul, 2018
1 commit
-
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.Warning level 2 was used: -Wimplicit-fallthrough=2
Signed-off-by: Gustavo A. R. Silva
Acked-by: Ying Xue
Signed-off-by: David S. Miller
20 Apr, 2018
2 commits
-
Currently, we have option to configure MTU of UDP media. The configured
MTU takes effect on the links going up after that moment. I.e, a user
has to reset bearer to have new value applied across its links. This is
confusing and disturbing on a running cluster.We now introduce the functionality to change the default UDP bearer MTU
in struct tipc_bearer. Additionally, the links are updated dynamically,
without any need for a reset, when bearer value is changed. We leverage
the existing per-link functionality and the design being symetrical to
the confguration of link tolerance.Acked-by: Jon Maloy
Signed-off-by: GhantaKrishnamurthy MohanKrishna
Signed-off-by: David S. Miller -
In previous commit, we changed the default emulated MTU for UDP bearers
to 14k.This commit adds the functionality to set/change the default value
by configuring new MTU for UDP media. UDP bearer(s) have to be disabled
and enabled back for the new MTU to take effect.Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: GhantaKrishnamurthy MohanKrishna
Signed-off-by: David S. Miller
24 Mar, 2018
5 commits
-
Selecting and explicitly configuring a TIPC node identity may be
unwanted in some cases.In this commit we introduce a default setting if the identity has not
been set at the moment the first bearer is enabled. We do this by
using a raw copy of a unique identifier from the used interface: MAC
address in the case of an L2 bearer, IPv4/IPv6 address in the case
of a UDP bearer.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
When a 32-bit node address is generated from a 128-bit identifier,
there is a risk of collisions which must be discovered and handled.We do this as follows:
- We don't apply the generated address immediately to the node, but do
instead initiate a 1 sec trial period to allow other cluster members
to discover and handle such collisions.- During the trial period the node periodically sends out a new type
of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
to all the other nodes in the cluster.- When a node is receiving such a message, it must check that the
presented 32-bit identifier either is unused, or was used by the very
same peer in a previous session. In both cases it accepts the request
by not responding to it.- If it finds that the same node has been up before using a different
address, it responds with a DSC_TRIAL_FAIL_MSG containing that
address.- If it finds that the address has already been taken by some other
node, it generates a new, unused address and returns it to the
requester.- During the trial period the requesting node must always be prepared
to accept a failure message, i.e., a message where a peer suggests a
different (or equal) address to the one tried. In those cases it
must apply the suggested value as trial address and restart the trial
period.This algorithm ensures that in the vast majority of cases a node will
have the same address before and after a reboot. If a legacy user
configures the address explicitly, there will be no trial period and
messages, so this protocol addition is completely backwards compatible.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format, - it is sub-divided
into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id.However, the 'zone' and 'cluster' levels have in reality never been
fully implemented,and never will be. The result of this has been
that the first 20 bits the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to node identity, -it is from now on only a 32-bit integer with no
assumed internal structure.Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.The rules for accepting discovery requests/responses from neighboring
nodes now become:- If the user is using legacy address format on both peers, reception
of discovery messages is subject to the legacy lookup domain check
in addition to the cluster id check.- Otherwise, the discovery request/response is always accepted, provided
both peers have the same network id.This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
To facilitate the coming changes in the neighbor discovery functionality
we make some renaming and refactoring of that code. The functional changes
in this commit are trivial, e.g., that we move the message sending call in
tipc_disc_timeout() outside the spinlock protected region.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
As a preparation for the next commits we try to reduce the footprint of
the function tipc_enable_bearer(), while hopefully making is simpler to
follow.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
20 Feb, 2018
1 commit
15 Feb, 2018
5 commits
-
Currently, the default link tolerance set in struct tipc_bearer only
has effect on links going up after that moment. I.e., a user has to
reset all the node's links across that bearer to have the new value
applied. This is too limiting and disturbing on a running cluster to
be useful.We now change this so that also already existing links are updated
dynamically, without any need for a reset, when the bearer value is
changed. We leverage the already existing per-link functionality
for this to achieve the wanted effect.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller -
Introduce __tipc_nl_media_set() which doesn't hold RTNL lock.
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller -
Introduce __tipc_nl_bearer_set() which doesn't holding RTNL lock.
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller -
Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock.
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller -
Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock.
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller
27 Dec, 2017
1 commit
-
Fix memory leak in tipc_enable_bearer() if enable_media() fails, and
cleanup with bearer_disable() if tipc_mon_create() fails.Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Tommi Rantala
Signed-off-by: David S. Miller
07 Sep, 2017
1 commit
-
The net device is already stored in the 'net' variable, so no need to call
dev_net() again.Signed-off-by: Kleber Sacilotto de Souza
Acked-by: Ying Xue
Signed-off-by: David S. Miller
02 Sep, 2017
1 commit
-
Three cases of simple overlapping changes.
Signed-off-by: David S. Miller
30 Aug, 2017
1 commit
-
For a bond slave device as a tipc bearer, the dev represents the bond
interface and orig_dev represents the slave in tipc_l2_rcv_msg().
Since we decode the tipc_ptr from bonding device (dev), we fail to
find the bearer and thus tipc links are not established.In this commit, we register the tipc protocol callback per device and
look for tipc bearer from both the devices.Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
22 Aug, 2017
1 commit
-
When the broadcast send link after 100 attempts has failed to
transfer a packet to all peers, we consider it stale, and reset
it. Thereafter it needs to re-synchronize with the peers, something
currently done by just resetting and re-establishing all links to
all peers. This has turned out to be overkill, with potentially
unwanted consequences for the remaining cluster.A closer analysis reveals that this can be done much simpler. When
this kind of failure happens, for reasons that may lie outside the
TIPC protocol, it is typically only one peer which is failing to
receive and acknowledge packets. It is hence sufficient to identify
and reset the links only to that peer to resolve the situation, without
having to reset the broadcast link at all. This solution entails a much
lower risk of negative consequences for the own node as well as for
the overall cluster.We implement this change in this commit.
Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
15 Aug, 2017
1 commit
-
On L2 bearers, the TIPC broadcast function is sending out packets using
the corresponding L2 broadcast address. At reception, we filter such
packets under the assumption that they will also be delivered as
broadcast packets.This assumption doesn't always hold true. Under high load, we have seen
that a switch may convert the destination address and deliver the packet
as a PACKET_MULTICAST, something leading to inadvertently dropped
packets and a stale and reset broadcast link.We fix this by extending the reception filtering to accept packets of
type PACKET_MULTICAST.Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
14 Apr, 2017
2 commits
-
This is an add-on to the previous patch that passes the extended ACK
structure where it's already available by existing genl_info or extack
function arguments.This was done with this spatch (with some manual adjustment of
indentation):@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
extack)
...>
}@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
Signed-off-by: Johannes Berg
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
21 Jan, 2017
1 commit
-
As a preparation for the 'replicast' functionality we are going to
introduce in the next commits, we need the broadcast base structure to
store whether bearer broadcast is available at all from the currently
used bearer or bearers.We do this by adding a new function tipc_bearer_bcast_support() to
the bearer layer, and letting the bearer selection function in
bcast.c use this to give a new boolean field, 'bcast_support' the
appropriate value.Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
03 Dec, 2016
1 commit
-
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek
Reported-by: Qian Zhang (张谦)
Acked-by: Ying Xue
Signed-off-by: David S. Miller
27 Aug, 2016
2 commits
-
Add UDP bearer options to netlink bearer get message. This is used by
the tipc user space tool to display UDP options.The UDP bearer information is passed using either a sockaddr_in or
sockaddr_in6 structs. This means the user space receiver should
intermediately store the retrieved data in a large enough struct
(sockaddr_strage) before casting to the proper IP version type.Signed-off-by: Richard Alpe
Reviewed-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller -
This patch introduces UDP replicast. A concept where we emulate
multicast by sending multiple unicast messages to configured peers.The purpose of replicast is mainly to be able to use TIPC in cloud
environments where IP multicast is disabled. Using replicas to unicast
multicast messages is costly as we have to copy each skb and send the
copies individually.Signed-off-by: Richard Alpe
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
24 Aug, 2016
1 commit
-
Use kfree_skb() instead of kfree() to free sk_buff.
Fixes: 0d051bf93c06 ("tipc: make bearer packet filtering generic")
Signed-off-by: Wei Yongjun
Acked-by: Ying Xue
Signed-off-by: David S. Miller
19 Aug, 2016
1 commit
-
In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
layer") we introduced a method of filtering out messages while a bearer
is being reset, to avoid that links may be re-created and come back in
working state while we are still in the process of shutting them down.This solution works well, but is limited to only work with L2 media, which
is insufficient with the increasing use of UDP as carrier media.We now replace this solution with a more generic one, by introducing a
new flag "up" in the generic struct tipc_bearer. This field will be set
and reset at the same locations as with the previous solution, while
the packet filtering is moved to the generic code for the sending side.
On the receiving side, the filtering is still done in media specific
code, but now including the UDP bearer.Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
27 Jul, 2016
1 commit
-
Introduce a new function to get the bearer name from
its id. This is used in subsequent commit.Reviewed-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller