09 Aug, 2019

1 commit

  • Since node internal messages are passed directly to the socket, it is not
    possible to observe those messages via tcpdump or wireshark.

    We now remedy this by making it possible to clone such messages and send
    the clones to the loopback interface. The clones are dropped at reception
    and have no functional role except making the traffic visible.

    The feature is enabled if network taps are active for the loopback device.
    pcap filtering restrictions require the messages to be presented to the
    receiving side of the loopback device.

    v3 - Function dev_nit_active used to check for network taps.
    - Procedure netif_rx_ni used to send cloned messages to loopback device.

    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    John Rutherford
     

08 Jul, 2019

1 commit

  • For the places that are protected by rcu_read_lock, we change from
    rcu_dereference_rtnl to rcu_dereference, as there is no need to
    check if rtnl lock is held.

    For the places that are protected by rtnl_lock, we change from
    rcu_dereference_rtnl to rtnl_dereference/rcu_dereference_protected,
    as no extra memory barriers are needed under rtnl_lock(), which also
    protects updates to tn->bearer_list[] and dev->tipc_ptr/b->media_ptr.

    rcu_dereference_rtnl will only be used in the places that could be
    running under either rcu_read_lock or rtnl_lock.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
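
The four strictness options described in the netlink validation commit above can be illustrated with a minimal userspace sketch. This is not the kernel's __nla_validate_parse(): the Strict flag names, the parse_attrs() and nla() helpers, and the length-only policy table are assumptions made for illustration. The sketch only models the attribute layout (16-bit length, 16-bit type, payload padded to 4 bytes) and shows how each flag turns a tolerated condition into an error.

```python
import struct
from enum import IntFlag

NLA_HDRLEN = 4  # 16-bit length + 16-bit type


class Strict(IntFlag):
    """Hypothetical flag names mirroring the options in the commit text."""
    TRAILING = 1      # reject leftover bytes after the last attribute
    MAXTYPE = 2       # reject attribute types above the known maximum
    UNSPEC = 4        # reject attributes that have no policy entry
    STRICT_ATTRS = 8  # require the exact payload size from the policy


def nla(nla_type, payload):
    """Build one attribute: header, payload, padding to a 4-byte boundary."""
    header = struct.pack("=HH", NLA_HDRLEN + len(payload), nla_type)
    return header + payload + b"\0" * (-len(payload) % 4)


def parse_attrs(buf, policy, maxtype, flags=Strict(0)):
    """Parse attributes; policy maps type -> expected payload length."""
    attrs = {}
    off = 0
    while len(buf) - off >= NLA_HDRLEN:
        nla_len, nla_type = struct.unpack_from("=HH", buf, off)
        if nla_len < NLA_HDRLEN or off + nla_len > len(buf):
            break  # malformed header: the remainder counts as trailing data
        payload = buf[off + NLA_HDRLEN:off + nla_len]
        off += (nla_len + 3) & ~3
        if nla_type > maxtype:
            if flags & Strict.MAXTYPE:
                raise ValueError(f"unknown attribute type {nla_type}")
            continue  # liberal mode silently skips unknown types
        expected = policy.get(nla_type)
        if expected is None:
            if flags & Strict.UNSPEC:
                raise ValueError(f"no policy for attribute {nla_type}")
        elif len(payload) < expected:
            raise ValueError(f"attribute {nla_type} too short")
        elif len(payload) > expected and flags & Strict.STRICT_ATTRS:
            raise ValueError(f"attribute {nla_type} has wrong size")
        attrs[nla_type] = payload
    if off < len(buf) and flags & Strict.TRAILING:
        raise ValueError("trailing data after attributes")
    return attrs
```

In this model, the old strict behaviour corresponds to Strict.TRAILING | Strict.MAXTYPE, and the old liberal behaviour corresponds to passing no flags at all.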
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
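
The compatibility concern in the NLA_F_NESTED commit above can be shown with a small userspace sketch. The flag and mask values match the netlink UAPI header; the helper names (put_attr, nest, nest_noflag, raw_type, masked_type) are hypothetical and only loosely mirror the kernel functions named in the commit.

```python
import struct

NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_TYPE_MASK = 0xFFFF & ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)


def put_attr(nla_type, payload):
    """Serialize one attribute with a 4-byte header and 4-byte padding."""
    header = struct.pack("=HH", 4 + len(payload), nla_type)
    return header + payload + b"\0" * (-len(payload) % 4)


def nest_noflag(nla_type, children):
    """Old behaviour: a nested container without NLA_F_NESTED set."""
    return put_attr(nla_type, b"".join(children))


def nest(nla_type, children):
    """New behaviour: the container type carries NLA_F_NESTED."""
    return put_attr(nla_type | NLA_F_NESTED, b"".join(children))


def raw_type(buf):
    """What an application sees if it reads nla_type directly."""
    return struct.unpack_from("=HH", buf)[1]


def masked_type(buf):
    """What an application sees if it masks out the flag bits first."""
    return raw_type(buf) & NLA_TYPE_MASK
```

An application comparing raw_type() against its expected attribute number would stop matching once the flag is set, which is why existing callers were moved to the noflag variant instead of changing behaviour silently.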
     

28 Dec, 2018

1 commit

  • bearer_disable() already calls kfree_rcu() to free struct tipc_bearer,
    we don't need to call kfree() again.

    Fixes: cb30a63384bc ("tipc: refactor function tipc_enable_bearer()")
    Reported-by: syzbot+b981acf1fb240c0c128b@syzkaller.appspotmail.com
    Cc: Ying Xue
    Cc: Jon Maloy
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

20 Dec, 2018

2 commits

  • This commit adds a new trace_event for TIPC bearer L2 device events:

    trace_tipc_l2_device_event()

    It places the trace in the tipc_l2_device_event() function, so that
    device/bearer events and related info can be traced at runtime when
    needed.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • For debugging and tracing, this commit enables tracepoints in TIPC
    along with the general trace_events listed below. It also defines some
    'tipc_*_dump()' functions that dump TIPC object data whenever needed,
    that is, for general debug purposes and not just for the trace_events.

    The following trace_events are now available:

    - trace_tipc_skb_dump(): traces and dumps TIPC msg & skb data,
    e.g. message type, user, droppable, skb truesize, cloned skb, etc.

    - trace_tipc_list_dump(): traces and dumps any TIPC buffers or
    queues, e.g. TIPC link transmq, socket receive queue, etc.

    - trace_tipc_sk_dump(): traces and dumps TIPC socket data, e.g.
    sk state, sk type, connection type, rmem_alloc, socket queues, etc.

    - trace_tipc_link_dump(): traces and dumps TIPC link data, e.g.
    link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

    - trace_tipc_node_dump(): traces and dumps TIPC node data, e.g.
    node state, active links, capabilities, link entries, etc.

    How to use:
    Put the trace functions wherever we want to dump TIPC data or events.

    Note:
    a) The dump functions generate raw data only, in order to offload the
    trace event's processing. A tool or script may be needed to parse the
    data, but this should be simple.

    b) The trace_tipc_*_dump() events should be reserved for failure cases
    only (e.g. the retransmission failure case) or for conditions that we
    do not expect to happen often. We can then consider enabling these
    events by default, since they have almost no effect under normal
    conditions, but once the rare condition or failure occurs, we get the
    full dumped data for post-analysis.

    For other trace purposes, we can reuse these trace classes as templates
    for different events.

    c) A trace_event is only effective when we enable it. To enable the
    TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
    directory in the 'debugfs' file system. Normally, they are located at:

    /sys/kernel/debug/tracing/events/tipc/

    For example:

    To enable the tipc_link_dump event:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable

    To enable all the TIPC trace_events:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/enable

    To collect the trace data:

    cat trace

    or

    cat trace_pipe > /trace.out &

    To disable all the TIPC trace_events:

    echo 0 > /sys/kernel/debug/tracing/events/tipc/enable

    To clear the trace buffer:

    echo > trace

    d) As with other trace_events, features such as 'filter' and 'trigger'
    can also be used with the TIPC trace_events.
    For more details, have a look at:

    Documentation/trace/ftrace.txt

    MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
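
The echo commands in the tracepoint commit above can be wrapped in a small helper when they need to be scripted. The following is a minimal sketch, assuming the tracing directory is mounted at the location given in the commit message; the function names enable_path() and set_tipc_events() are hypothetical, and writing to the real files normally requires root.

```python
from pathlib import Path

# Default location quoted in the commit message; it may differ per system.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def enable_path(event=None, tracing_dir=TRACING_DIR):
    """Return the 'enable' file for one TIPC event, or for all of them."""
    base = Path(tracing_dir) / "events" / "tipc"
    return base / event / "enable" if event else base / "enable"


def set_tipc_events(enabled, event=None, tracing_dir=TRACING_DIR):
    """Equivalent of 'echo 1 > .../enable' or 'echo 0 > .../enable'."""
    target = enable_path(event, tracing_dir)
    target.write_text("1\n" if enabled else "0\n")
    return target
```

For example, set_tipc_events(True, "tipc_link_dump") corresponds to the single-event echo shown in the commit message, and set_tipc_events(False) to disabling all TIPC trace_events.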
     

04 Oct, 2018

1 commit


26 Sep, 2018

1 commit


11 Sep, 2018

1 commit


27 Jul, 2018

1 commit


05 Jul, 2018

1 commit


20 Apr, 2018

2 commits

  • Currently, we have an option to configure the MTU of UDP media. The
    configured MTU takes effect only on links going up after that moment,
    i.e., a user has to reset the bearer to have the new value applied
    across its links. This is confusing and disturbing on a running cluster.

    We now introduce the functionality to change the default UDP bearer MTU
    in struct tipc_bearer. Additionally, the links are updated dynamically,
    without any need for a reset, when the bearer value is changed. We
    leverage the existing per-link functionality, and the design is
    symmetrical to the configuration of link tolerance.

    Acked-by: Jon Maloy
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     
  • In the previous commit, we changed the default emulated MTU for UDP
    bearers to 14k.

    This commit adds the functionality to set/change the default value
    by configuring a new MTU for UDP media. UDP bearers have to be disabled
    and re-enabled for the new MTU to take effect.

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     

24 Mar, 2018

5 commits

  • Selecting and explicitly configuring a TIPC node identity may be
    unwanted in some cases.

    In this commit we introduce a default setting if the identity has not
    been set at the moment the first bearer is enabled. We do this by
    using a raw copy of a unique identifier from the used interface: MAC
    address in the case of an L2 bearer, IPv4/IPv6 address in the case
    of a UDP bearer.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but do
    instead initiate a 1 sec trial period to allow other cluster members
    to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node receives such a message, it must check that the
    presented 32-bit identifier either is unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
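
The trial-period rules in the address collision commit above can be sketched as a small simulation. This is an illustration under stated assumptions, not the kernel implementation: the Peer class, its table layout, the "next free address" strategy in _unused_addr(), and the settle_address() loop are all hypothetical. A return value of None stands for "accept by not responding", and an integer stands for the address carried in a failure message.

```python
from dataclasses import dataclass, field

ADDR_MASK = 0xFFFFFFFF


@dataclass
class Peer:
    # addr -> node identity that is using, or last used, that address
    table: dict = field(default_factory=dict)

    def _unused_addr(self, start):
        """Assumed strategy: pick the next non-zero address not in the table."""
        addr = (start + 1) & ADDR_MASK
        while addr in self.table or addr == 0:
            addr = (addr + 1) & ADDR_MASK
        return addr

    def on_trial(self, node_id, trial_addr):
        """React to a trial message; None means silent acceptance."""
        previous = next((a for a, i in self.table.items() if i == node_id), None)
        if previous is not None and previous != trial_addr:
            return previous  # same node was up before with another address
        owner = self.table.get(trial_addr)
        if owner is None or owner == node_id:
            return None  # unused, or used by the very same peer
        return self._unused_addr(trial_addr)  # taken by some other node


def settle_address(node_id, addr, peers, max_rounds=8):
    """Requester side: restart the trial whenever a peer suggests an address."""
    for _ in range(max_rounds):
        replies = [p.on_trial(node_id, addr) for p in peers]
        suggestions = [r for r in replies if r is not None]
        if not suggestions:
            return addr  # trial period passed without objection
        addr = suggestions[0]
    raise RuntimeError("no address settled within the allowed rounds")
```

Because a peer answers with the previously used address when it recognizes the node identity, a rebooted node normally ends up with the same address it had before.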
     
  • Nominally, TIPC organizes network nodes into a three-level network
    hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
    hierarchy is reflected in the node address format: it is sub-divided
    into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.

    However, the 'zone' and 'cluster' levels have in reality never been
    fully implemented, and never will be. The result of this has been
    that the first 20 bits of the node identity structure have been wasted,
    and the usable node identity range within a cluster has been limited
    to 12 bits. This is starting to become a problem.

    In the following commits, we will need to be able to connect between
    nodes which are using the whole 32-bit value space of the node address.
    We therefore remove the restrictions on which values can be assigned
    to node identity: it is from now on only a 32-bit integer with no
    assumed internal structure.

    Isolation between clusters is now achieved only by setting different
    values for the 'network id' field used during neighbor discovery, in
    practice leading to the latter becoming the new cluster identity.

    The rules for accepting discovery requests/responses from neighboring
    nodes now become:

    - If the user is using legacy address format on both peers, reception
    of discovery messages is subject to the legacy lookup domain check
    in addition to the cluster id check.

    - Otherwise, the discovery request/response is always accepted, provided
    both peers have the same network id.

    This secures backwards compatibility for users who have been using zone
    or cluster identities as cluster separators, instead of the intended
    'network id'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
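
The legacy address layout and the new discovery acceptance rules from the commit above can be sketched as follows. The bit layout (8-bit zone, 12-bit cluster, 12-bit node) comes from the commit text; the function names and the boolean parameters of accept_discovery() are hypothetical simplifications of the real checks.

```python
def make_legacy_addr(zone, cluster, node):
    """Pack a legacy <zone.cluster.node> triple into a 32-bit address."""
    if not (0 <= zone <= 0xFF and 0 <= cluster <= 0xFFF and 0 <= node <= 0xFFF):
        raise ValueError("field out of range for the legacy format")
    return (zone << 24) | (cluster << 12) | node


def split_legacy_addr(addr):
    """Unpack a 32-bit address into its legacy (zone, cluster, node) fields."""
    return (addr >> 24) & 0xFF, (addr >> 12) & 0xFFF, addr & 0xFFF


def accept_discovery(own_net_id, peer_net_id, both_legacy, in_lookup_domain):
    """Decide whether a discovery request/response should be accepted."""
    if own_net_id != peer_net_id:
        return False  # the network id now acts as the cluster identity
    if both_legacy:
        return in_lookup_domain  # legacy peers keep the domain check
    return True
```

With only 12 bits for the node field, the legacy format allows node ids up to 4095 within a cluster, which is the limitation the commit removes.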
     
  • To facilitate the coming changes in the neighbor discovery functionality,
    we rename and refactor some of that code. The functional changes in this
    commit are trivial, e.g., we move the message sending call in
    tipc_disc_timeout() outside the spinlock protected region.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a preparation for the next commits we try to reduce the footprint of
    the function tipc_enable_bearer(), while hopefully making it simpler to
    follow.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

20 Feb, 2018

1 commit


15 Feb, 2018

5 commits


27 Dec, 2017

1 commit


07 Sep, 2017

1 commit


02 Sep, 2017

1 commit


30 Aug, 2017

1 commit

  • For a bond slave device as a tipc bearer, the dev represents the bond
    interface and orig_dev represents the slave in tipc_l2_rcv_msg().
    Since we decode the tipc_ptr from bonding device (dev), we fail to
    find the bearer and thus tipc links are not established.

    In this commit, we register the tipc protocol callback per device and
    look for tipc bearer from both the devices.

    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

22 Aug, 2017

1 commit

  • When the broadcast send link after 100 attempts has failed to
    transfer a packet to all peers, we consider it stale, and reset
    it. Thereafter it needs to re-synchronize with the peers, something
    currently done by just resetting and re-establishing all links to
    all peers. This has turned out to be overkill, with potentially
    unwanted consequences for the remaining cluster.

    A closer analysis reveals that this can be done much more simply. When
    this kind of failure happens, for reasons that may lie outside the
    TIPC protocol, it is typically only one peer which is failing to
    receive and acknowledge packets. It is hence sufficient to identify
    and reset the links only to that peer to resolve the situation, without
    having to reset the broadcast link at all. This solution entails a much
    lower risk of negative consequences for the node itself as well as for
    the overall cluster.

    We implement this change in this commit.

    Reviewed-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 Aug, 2017

1 commit

  • On L2 bearers, the TIPC broadcast function is sending out packets using
    the corresponding L2 broadcast address. At reception, we filter such
    packets under the assumption that they will also be delivered as
    broadcast packets.

    This assumption doesn't always hold true. Under high load, we have seen
    that a switch may convert the destination address and deliver the packet
    as a PACKET_MULTICAST, something leading to inadvertently dropped
    packets and a stale and reset broadcast link.

    We fix this by extending the reception filtering to accept packets of
    type PACKET_MULTICAST.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
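
The change to the L2 reception filter described above can be sketched as a comparison of the old and new acceptance tests. The packet type constants match linux/if_packet.h; the two function names are hypothetical, and the sketch assumes the filter is a simple check on the packet type.

```python
# Packet type values as defined in linux/if_packet.h
PACKET_HOST = 0
PACKET_BROADCAST = 1
PACKET_MULTICAST = 2
PACKET_OTHERHOST = 3


def accept_before_fix(pkt_type):
    """Old filter: unicast to this host and L2 broadcast only."""
    return pkt_type in (PACKET_HOST, PACKET_BROADCAST)


def accept_after_fix(pkt_type):
    """New filter: also accept frames delivered as L2 multicast."""
    return pkt_type in (PACKET_HOST, PACKET_BROADCAST, PACKET_MULTICAST)
```

A broadcast frame that a switch delivers as PACKET_MULTICAST is dropped by the first function and accepted by the second, while frames addressed to other hosts are rejected by both.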
     

14 Apr, 2017

2 commits

  • This is an add-on to the previous patch that passes the extended ACK
    structure where it's already available by existing genl_info or extack
    function arguments.

    This was done with this spatch (with some manual adjustment of
    indentation):

    @@
    expression A, B, C, D, E;
    identifier fn, info;
    @@
    fn(..., struct genl_info *info, ...) {
    ...
    -nlmsg_parse(A, B, C, D, E, NULL)
    +nlmsg_parse(A, B, C, D, E, info->extack)
    ...
    }

    @@
    expression A, B, C, D, E;
    identifier fn, info;
    @@
    fn(..., struct genl_info *info, ...) {
    extack)
    ...>
    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {
    ...
    -nlmsg_parse(A, B, C, D, E, NULL)
    +nlmsg_parse(A, B, C, D, E, extack)
    ...
    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    Signed-off-by: Johannes Berg
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Pass the new extended ACK reporting struct to all of the generic
    netlink parsing functions. For now, pass NULL in almost all callers
    (except for some in the core.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

21 Jan, 2017

1 commit

  • As a preparation for the 'replicast' functionality we are going to
    introduce in the next commits, we need the broadcast base structure to
    store whether bearer broadcast is available at all from the currently
    used bearer or bearers.

    We do this by adding a new function tipc_bearer_bcast_support() to
    the bearer layer, and letting the bearer selection function in
    bcast.c use this to give a new boolean field, 'bcast_support' the
    appropriate value.

    Reviewed-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

03 Dec, 2016

1 commit

  • Qian Zhang (张谦) reported a potential socket buffer overflow in
    tipc_msg_build() which is also known as CVE-2016-8632: due to
    insufficient checks, a buffer overflow can occur if MTU is too short for
    even tipc headers. As anyone can set device MTU in a user/net namespace,
    this issue can be abused by a regular user.

    As agreed in the discussion on Ben Hutchings' original patch, we should
    check the MTU at the moment a bearer is attached rather than for each
    processed packet. We also need to repeat the check when bearer MTU is
    adjusted to new device MTU. UDP case also needs a check to avoid
    overflow when calculating bearer MTU.

    Fixes: b97bf3fd8f6a ("[TIPC] Initial merge")
    Signed-off-by: Michal Kubecek
    Reported-by: Qian Zhang (张谦)
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Michal Kubeček
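
The approach in the CVE-2016-8632 fix above, checking the MTU when a bearer is attached and again when the device MTU changes, can be sketched as below. The header sizes, the resulting minimum of 100 bytes, and the 28-byte UDP reserve (IPv4 header plus UDP header) are assumptions chosen for illustration, as are the Bearer class and its method names.

```python
# Assumed sizes for illustration only
MAX_H_SIZE = 60   # largest TIPC message header
INT_H_SIZE = 40   # internal message header
MIN_BEARER_MTU = MAX_H_SIZE + INT_H_SIZE
UDP_RESERVE = 20 + 8  # IPv4 header + UDP header


def mtu_bad(dev_mtu, reserve=0):
    """True if the device MTU cannot even hold the TIPC headers."""
    return dev_mtu < MIN_BEARER_MTU + reserve


class Bearer:
    def __init__(self, name, dev_mtu, reserve=0):
        # Check once at attach time instead of for every packet.
        if mtu_bad(dev_mtu, reserve):
            raise ValueError(f"MTU {dev_mtu} too low for bearer {name}")
        self.name = name
        self.reserve = reserve
        self.mtu = dev_mtu - reserve
        self.enabled = True

    def on_dev_mtu_change(self, new_mtu):
        """Repeat the check when the device MTU is adjusted."""
        if mtu_bad(new_mtu, self.reserve):
            self.enabled = False  # refuse to run with an unsafe MTU
            return False
        self.mtu = new_mtu - self.reserve
        return True
```

Subtracting the reserve only after the check also avoids the underflow that the commit mentions for the UDP case.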
     

27 Aug, 2016

2 commits

  • Add UDP bearer options to netlink bearer get message. This is used by
    the tipc user space tool to display UDP options.

    The UDP bearer information is passed using either a sockaddr_in or a
    sockaddr_in6 struct. This means the user space receiver should
    intermediately store the retrieved data in a large enough struct
    (sockaddr_storage) before casting to the proper IP version type.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • This patch introduces UDP replicast, a concept where we emulate
    multicast by sending multiple unicast messages to configured peers.

    The purpose of replicast is mainly to be able to use TIPC in cloud
    environments where IP multicast is disabled. Using replicas to unicast
    multicast messages is costly, as we have to copy each skb and send the
    copies individually.

    Signed-off-by: Richard Alpe
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
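
The replicast idea described above, emulating multicast with one unicast copy per configured peer, can be sketched in a few lines. The replicast() function and its send callback are hypothetical; in a real program the callback could wrap socket.sendto() on a UDP socket.

```python
def replicast(payload, peers, send):
    """Send an individual copy of payload to every configured peer.

    send is any callable taking (data, peer_address). The return value is
    the number of copies sent, which is the per-message cost of replicast.
    """
    sent = 0
    for peer in peers:
        send(bytes(payload), peer)  # each destination gets its own copy
        sent += 1
    return sent
```

The cost grows linearly with the number of peers, which is the trade-off the commit message points out compared with a single IP multicast transmission.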
     

24 Aug, 2016

1 commit


19 Aug, 2016

1 commit

  • In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
    layer") we introduced a method of filtering out messages while a bearer
    is being reset, to prevent links from being re-created and coming back
    into working state while we are still in the process of shutting them
    down.

    This solution works well, but is limited to only work with L2 media, which
    is insufficient with the increasing use of UDP as carrier media.

    We now replace this solution with a more generic one, by introducing a
    new flag "up" in the generic struct tipc_bearer. This field will be set
    and reset at the same locations as with the previous solution, while
    the packet filtering is moved to the generic code for the sending side.
    On the receiving side, the filtering is still done in media specific
    code, but now including the UDP bearer.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

27 Jul, 2016

1 commit