18 Nov, 2018

1 commit

  • We get the following warning:

    [ 47.926140] 32-bit node address hash set to 2010a0a
    [ 47.927202]
    [ 47.927433] ================================
    [ 47.928050] WARNING: inconsistent lock state
    [ 47.928661] 4.19.0+ #37 Tainted: G E
    [ 47.929346] --------------------------------
    [ 47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
    [ 47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0
    [ 47.930116] {SOFTIRQ-ON-W} state was registered at:
    [ 47.930116] _raw_spin_lock+0x29/0x60
    [ 47.930116] rht_deferred_worker+0x556/0x810
    [ 47.930116] process_one_work+0x1f5/0x540
    [ 47.930116] worker_thread+0x64/0x3e0
    [ 47.930116] kthread+0x112/0x150
    [ 47.930116] ret_from_fork+0x3a/0x50
    [ 47.930116] irq event stamp: 14044
    [ 47.930116] hardirqs last enabled at (14044): [] __local_bh_enable_ip+0x7a/0xf0
    [ 47.938117] hardirqs last disabled at (14043): [] __local_bh_enable_ip+0x41/0xf0
    [ 47.938117] softirqs last enabled at (14028): [] irq_enter+0x5e/0x60
    [ 47.938117] softirqs last disabled at (14029): [] irq_exit+0xb5/0xc0
    [ 47.938117]
    [ 47.938117] other info that might help us debug this:
    [ 47.938117] Possible unsafe locking scenario:
    [ 47.938117]
    [ 47.938117] CPU0
    [ 47.938117] ----
    [ 47.938117] lock(&(&ht->lock)->rlock);
    [ 47.938117]
    [ 47.938117] lock(&(&ht->lock)->rlock);
    [ 47.938117]
    [ 47.938117] *** DEADLOCK ***
    [ 47.938117]
    [ 47.938117] 2 locks held by swapper/3/0:
    [ 47.938117] #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280
    [ 47.938117] #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc]
    [ 47.938117]
    [ 47.938117] stack backtrace:
    [ 47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.19.0+ #37
    [ 47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 47.938117] Call Trace:
    [ 47.938117]
    [ 47.938117] dump_stack+0x5e/0x8b
    [ 47.938117] print_usage_bug+0x1ed/0x1ff
    [ 47.938117] mark_lock+0x5b5/0x630
    [ 47.938117] __lock_acquire+0x4c0/0x18f0
    [ 47.938117] ? lock_acquire+0xa6/0x180
    [ 47.938117] lock_acquire+0xa6/0x180
    [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] _raw_spin_lock+0x29/0x60
    [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] tipc_sk_reinit+0xb0/0x410 [tipc]
    [ 47.938117] ? mark_held_locks+0x6f/0x90
    [ 47.938117] ? __local_bh_enable_ip+0x7a/0xf0
    [ 47.938117] ? lockdep_hardirqs_on+0x20/0x1a0
    [ 47.938117] tipc_net_finalize+0xbf/0x180 [tipc]
    [ 47.938117] tipc_disc_timeout+0x509/0x540 [tipc]
    [ 47.938117] ? call_timer_fn+0x5/0x280
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] call_timer_fn+0xa1/0x280
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] run_timer_softirq+0x1f2/0x4d0
    [ 47.938117] __do_softirq+0xfc/0x413
    [ 47.938117] irq_exit+0xb5/0xc0
    [ 47.938117] smp_apic_timer_interrupt+0xac/0x210
    [ 47.938117] apic_timer_interrupt+0xf/0x20
    [ 47.938117]
    [ 47.938117] RIP: 0010:default_idle+0x1c/0x140
    [ 47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b
    [ 47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    [ 47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001
    [ 47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200
    [ 47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
    [ 47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200
    [ 47.938117] ? default_idle+0x1a/0x140
    [ 47.938117] do_idle+0x1bc/0x280
    [ 47.938117] cpu_startup_entry+0x19/0x20
    [ 47.938117] start_secondary+0x187/0x1c0
    [ 47.938117] secondary_startup_64+0xa4/0xb0

    The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is
    calling the function rhashtable_walk_enter() within a timer interrupt.
    We fix this by executing tipc_net_finalize() in work queue context.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
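
The deferral described above can be modelled in user space. The following is a minimal sketch, not the kernel patch: struct work, work_schedule(), work_run_all() and disc_timeout_model() are hypothetical stand-ins for the kernel workqueue API and for tipc_disc_timeout(), and net_finalize() merely stands in for tipc_net_finalize(). The point is only that the timer path records that work is needed, and the work itself runs later in another context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: a callback plus a "already queued" flag. */
struct work {
    void (*fn)(struct work *);
    bool pending;
};

#define QUEUE_LEN 8
static struct work *queue[QUEUE_LEN];
static size_t queued;
static int finalize_calls;

/* Called from the "timer" path: only records that work is needed. */
bool work_schedule(struct work *w)
{
    assert(w != NULL && w->fn != NULL);
    if (w->pending || queued == QUEUE_LEN)
        return false;
    w->pending = true;
    queue[queued++] = w;
    return true;
}

/* Called later from "process" context, outside the timer callback. */
void work_run_all(void)
{
    for (size_t i = 0; i < queued; i++) {
        queue[i]->pending = false;
        queue[i]->fn(queue[i]);
    }
    queued = 0;
}

/* Stand-in for the finalize step that must not run in timer context. */
void net_finalize(struct work *w)
{
    (void)w;
    finalize_calls++;
}

int net_finalize_calls(void)
{
    return finalize_calls;
}

/* Stand-in for the timer callback: defer instead of calling directly. */
void disc_timeout_model(struct work *w)
{
    work_schedule(w);
}
```

Firing disc_timeout_model() twice before work_run_all() runs the finalize step once, because the pending flag suppresses the duplicate request.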
     

07 Jul, 2018

2 commits

  • The duplicate address discovery protocol is not safe against two
    discoverers running in parallel. The one executing first after the
    trial period is over will set the node address and change its own
    message type to DSC_REQ_MSG. The one executing last may find that the
    node address is already set, and never change message type, with the
    result that its links may never be established.

    In this commit we ensure that the message type is always set correctly
    after the trial period is over.

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • With the duplicate address discovery protocol for tipc node addresses
    we introduced a one second trial period before a node is allocated a
    hash number to use as address.

    Unfortunately, we fail to handle the case when a regular LINK REQUEST/
    RESPONSE arrives from a cluster node during the trial period. Such
    messages are not ignored as they should be, leading to link setup
    attempts while the node still has no address.

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

26 Mar, 2018

1 commit


24 Mar, 2018

6 commits

  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but do
    instead initiate a 1 sec trial period to allow other cluster members
    to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node is receiving such a message, it must check that the
    presented 32-bit identifier either is unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
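
The receiver-side rules listed above can be sketched as a small user-space function. This is a sketch under stated assumptions, not the kernel code: struct known_node, trial_check() and the "increment until unused" strategy for picking a replacement address are all hypothetical. A return value of false means "accept by staying silent"; true means a failure reply carrying *suggested would be sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a peer seen earlier: 128-bit id and 32-bit address. */
struct known_node {
    uint8_t id[16];
    uint32_t addr;
};

static bool addr_in_use(const struct known_node *tbl, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return true;
    return false;
}

/* Returns true if a failure reply is needed; *suggested is the address to send. */
bool trial_check(const struct known_node *tbl, size_t n,
                 const uint8_t id[16], uint32_t trial_addr,
                 uint32_t *suggested)
{
    uint32_t cand = trial_addr;

    assert(suggested != NULL);

    /* Same peer seen in a previous session? */
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].id, id, 16) == 0) {
            if (tbl[i].addr == trial_addr)
                return false;          /* same address as before: accept */
            *suggested = tbl[i].addr;  /* remind peer of its old address */
            return true;
        }
    }

    if (!addr_in_use(tbl, n, trial_addr))
        return false;                  /* unused address: accept */

    /* Taken by another node: propose an unused one (assumed strategy). */
    do {
        cand++;
    } while (cand == 0 || addr_in_use(tbl, n, cand));
    *suggested = cand;
    return true;
}
```

The requesting side would then apply the suggested value as its new trial address and restart the trial period, as the last rule above describes.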
     
  • We add a 128-bit node identity, as an alternative to the currently used
    32-bit node address.

    For the sake of compatibility and to minimize message header changes
    we retain the existing 32-bit address field. When not set explicitly by
    the user, this field will be filled with a hash value generated from the
    much longer node identity, and be used as a shorthand value for the
    latter.

    We permit either the address or the identity to be set by configuration,
    but not both, so when the address value is set by a legacy user the
    corresponding 128-bit node identity is generated based on that value.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
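
To illustrate how a 128-bit identity can be reduced to a 32-bit shorthand, here is a minimal sketch that XOR-folds the four 32-bit words of the identity. The function name node_id_fold32() and the folding scheme are assumptions made for illustration; the commit message does not say which hash TIPC uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a 16-byte identity into 32 bits by XOR-ing its four 32-bit words.
 * Words are read big-endian so the result does not depend on host byte
 * order. Illustrative only; not claimed to be the hash used by TIPC.
 */
uint32_t node_id_fold32(const uint8_t id[16])
{
    uint32_t hash = 0;

    assert(id != NULL);
    for (int i = 0; i < 16; i += 4) {
        uint32_t word = ((uint32_t)id[i] << 24) |
                        ((uint32_t)id[i + 1] << 16) |
                        ((uint32_t)id[i + 2] << 8) |
                        (uint32_t)id[i + 3];
        hash ^= word;
    }
    return hash;
}
```

Any such reduction maps many identities to one address: in this sketch, an identity whose only non-zero word is the first one folds to the same value as an identity carrying that word in the second position. That is why a shorthand of this kind needs collision handling.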
     
  • As a preparation to changing the addressing structure of TIPC we replace
    all direct accesses to the tipc_net::own_addr field with the function
    dedicated for this, tipc_own_addr().

    There are no changes to program logic in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The removal of an internal structure of the node address has an unwanted
    side effect.
    - Currently, if a user is sending an anycast message with destination
    domain 0, the tipc_nametbl_translate() function will use the
    'closest-first' algorithm to first look for a node local destination,
    and only when no such is found, will it resort to the cluster global
    'round-robin' lookup algorithm.
    - Current users can get around this, and enforce unconditional use of
    global round-robin by indicating a destination as Z.0.0 or Z.C.0.
    - This option disappears when we make the node address flat, since the
    lookup algorithm has no way of recognizing this case. So, as long as
    there are node local destinations, the algorithm will always select
    one of those, and there is nothing the sender can do to change this.

    We solve this by eliminating the 'closest-first' option, which was never
    a good idea anyway, for non-legacy users, but only for those. To
    distinguish between legacy users and non-legacy users we introduce a new
    flag 'legacy_addr_format' in struct tipc_core, to be set when the user
    configures a legacy-style Z.C.N node address. Hence, when a legacy user
    indicates a zero lookup domain, 'closest-first' is selected, and in all
    other cases we use 'round-robin'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
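
The selection rule in the last paragraph above can be sketched as follows. This is a simplified user-space model: struct dest, pick_dest() and the cursor-based round-robin are hypothetical, and the flag parameter only mirrors the 'legacy_addr_format' flag named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination entry: the node it lives on and a port number. */
struct dest {
    uint32_t node;
    uint32_t port;
};

/*
 * Pick a destination index. 'closest-first' applies only when a legacy
 * user passes lookup domain 0; every other case uses round-robin.
 */
size_t pick_dest(const struct dest *d, size_t n, uint32_t own_node,
                 bool legacy_addr_format, uint32_t domain, size_t *cursor)
{
    size_t idx;

    assert(d != NULL && n > 0 && cursor != NULL);

    if (legacy_addr_format && domain == 0) {
        for (size_t i = 0; i < n; i++)
            if (d[i].node == own_node)
                return i;       /* node-local destination wins */
    }

    /* Round-robin over all destinations, local or not. */
    idx = *cursor % n;
    *cursor = idx + 1;
    return idx;
}
```

With three destinations on nodes 5, 7 and 9 and own node 7, a legacy caller with domain 0 always gets index 1, while a non-legacy caller cycles through 0, 1, 2, 0 and so on.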
     
  • Nominally, TIPC organizes network nodes into a three-level network
    hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
    hierarchy is reflected in the node address format: it is subdivided
    into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.

    However, the 'zone' and 'cluster' levels have in reality never been
    fully implemented, and never will be. The result of this has been
    that the first 20 bits of the node identity structure have been wasted,
    and the usable node identity range within a cluster has been limited
    to 12 bits. This is starting to become a problem.

    In the following commits, we will need to be able to connect between
    nodes which are using the whole 32-bit value space of the node address.
    We therefore remove the restrictions on which values can be assigned
    to node identity; it is from now on only a 32-bit integer with no
    assumed internal structure.

    Isolation between clusters is now achieved only by setting different
    values for the 'network id' field used during neighbor discovery, in
    practice leading to the latter becoming the new cluster identity.

    The rules for accepting discovery requests/responses from neighboring
    nodes now become:

    - If the user is using legacy address format on both peers, reception
    of discovery messages is subject to the legacy lookup domain check
    in addition to the cluster id check.

    - Otherwise, the discovery request/response is always accepted, provided
    both peers have the same network id.

    This secures backwards compatibility for users who have been using zone
    or cluster identities as cluster separators, instead of the intended
    'network id'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
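
For readers unfamiliar with the legacy layout, the 8/12/12-bit split described above can be sketched as plain bit packing. The helper names zcn_pack(), zcn_zone(), zcn_cluster() and zcn_node() are hypothetical and chosen for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy layout described above: 8-bit zone, 12-bit cluster, 12-bit node. */
#define ZONE_BITS    8
#define CLUSTER_BITS 12
#define NODE_BITS    12

/* Pack a legacy <Z.C.N> triple into one 32-bit address. */
uint32_t zcn_pack(uint32_t zone, uint32_t cluster, uint32_t node)
{
    assert(zone < (1u << ZONE_BITS));
    assert(cluster < (1u << CLUSTER_BITS));
    assert(node < (1u << NODE_BITS));
    return (zone << (CLUSTER_BITS + NODE_BITS)) |
           (cluster << NODE_BITS) | node;
}

/* Extract the individual fields again. */
uint32_t zcn_zone(uint32_t addr)
{
    return addr >> (CLUSTER_BITS + NODE_BITS);
}

uint32_t zcn_cluster(uint32_t addr)
{
    return (addr >> NODE_BITS) & ((1u << CLUSTER_BITS) - 1);
}

uint32_t zcn_node(uint32_t addr)
{
    return addr & ((1u << NODE_BITS) - 1);
}
```

Under this layout the address 1.1.10 packs to 0x0100100a, and only the low 12 bits distinguish nodes within a cluster. With the flat format, all 32 bits are available for that purpose.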
     
  • To facilitate the coming changes in the neighbor discovery functionality
    we make some renaming and refactoring of that code. The functional changes
    in this commit are trivial, e.g., that we move the message sending call in
    tipc_disc_timeout() outside the spinlock protected region.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

01 Nov, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Jon Maloy
    Cc: Ying Xue
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Cc: tipc-discussion@lists.sourceforge.net
    Signed-off-by: Kees Cook
    Signed-off-by: David S. Miller

    Kees Cook
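
The idea behind passing the timer pointer to the callback is that the callback recovers its enclosing object from the embedded timer. A user-space sketch of that pattern follows; struct timer, timer_init(), timer_fire() and the container_of_member() macro are hypothetical stand-ins, not the kernel's timer_setup() and from_timer() themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer: the callback receives the timer itself. */
struct timer {
    void (*fn)(struct timer *);
};

/* Object that embeds a timer, as a discoverer-like structure might. */
struct discoverer {
    int timeouts;
    struct timer timer;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of_member(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void timer_init(struct timer *t, void (*fn)(struct timer *))
{
    assert(t != NULL && fn != NULL);
    t->fn = fn;
}

/* Stand-in for the timer expiring. */
void timer_fire(struct timer *t)
{
    t->fn(t);
}

/* Callback: gets only the timer pointer, derives the owner from it. */
void disc_timeout(struct timer *t)
{
    struct discoverer *d =
        container_of_member(t, struct discoverer, timer);

    d->timeouts++;
}
```

No separate context argument is stored or cast here; the owner is computed from the timer's position inside the enclosing structure.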
     

17 Jan, 2017

1 commit

  • Until now, we have always allocated memory with the GFP_ATOMIC flag.
    When the system is under memory pressure and a user tries to send,
    the send fails due to low memory. However, the user application
    can wait for free memory if we allocate it using the GFP_KERNEL flag.

    In this commit, we allocate memory with GFP_KERNEL for all user
    allocations.

    Reported-by: Rune Torgersen
    Acked-by: Jon Maloy
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    Parthasarathy Bhuvaragan
     

29 Jun, 2016

1 commit

  • The UDP msg2addr function tipc_udp_msg2addr() can return -EINVAL which
    prior to this patch was unhandled in the caller.

    Signed-off-by: Richard Alpe
    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     

08 Apr, 2016

1 commit

  • When enabling a bearer we create a 'neighbor discoverer' instance by
    calling the function tipc_disc_create() before the bearer is actually
    registered in the list of enabled bearers. Because of this, the very
    first discovery broadcast message, created by the mentioned function,
    is lost, since it cannot find any valid bearer to use. Furthermore,
    the used send function, tipc_bearer_xmit_skb() does not free the given
    buffer when it cannot find a bearer, resulting in the leak of exactly
    one send buffer each time a bearer is enabled.

    This commit fixes this problem by introducing two changes:

    1) Instead of attempting to send the discovery message directly, we let
    tipc_disc_create() return the discovery buffer to the calling
    function, tipc_enable_bearer(), so that the latter can send it
    when the enabling sequence is finished.

    2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
    functions at the bearer layer, we now free the indicated buffer or
    buffer chain when a valid bearer cannot be found.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
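
Both changes can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the kernel code: struct buf, struct bearer, bearer_xmit(), disc_create() and enable_bearer() are simplified, hypothetical stand-ins for the functions named above, and a counter of outstanding buffers is used to make a leak visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BEARERS 3

struct buf { char data[32]; };
struct bearer { int sent; };

static struct bearer *bearer_list[MAX_BEARERS];
static int bufs_live;   /* outstanding buffers, to make leaks visible */

struct buf *buf_alloc(void)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (b)
        bufs_live++;
    return b;
}

void buf_free(struct buf *b)
{
    if (b) {
        bufs_live--;
        free(b);
    }
}

int bufs_outstanding(void)
{
    return bufs_live;
}

/* Change 2: the buffer is consumed even when no valid bearer is found. */
int bearer_xmit(unsigned int id, struct buf *b)
{
    struct bearer *br = id < MAX_BEARERS ? bearer_list[id] : NULL;

    assert(b != NULL);
    if (!br) {
        buf_free(b);
        return -1;
    }
    br->sent++;
    buf_free(b);    /* stand-in for the device consuming the buffer */
    return 0;
}

/* Change 1: create the first message, but leave sending to the caller. */
struct buf *disc_create(void)
{
    return buf_alloc();
}

int enable_bearer(unsigned int id, struct bearer *br)
{
    struct buf *first;

    assert(id < MAX_BEARERS && br != NULL);
    first = disc_create();
    if (!first)
        return -1;
    bearer_list[id] = br;           /* register the bearer first ... */
    return bearer_xmit(id, first);  /* ... then send the first message */
}
```

Sending to an unregistered bearer now leaves no buffer outstanding, and the first discovery message goes out because the bearer is registered before the send.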
     

21 Nov, 2015

1 commit

  • The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
    has been significantly reduced over the last couple of years.

    We now root out the last traces of this practice.
    There are no functional changes in this commit.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

2 commits

  • The neighbor discovery function currently uses the function
    tipc_bearer_send() for transmitting packets, assuming that the
    sent buffers are not consumed by the called function.

    We want to change this, in order to avoid unnecessary buffer cloning
    elsewhere in the code.

    This commit introduces a new function tipc_bearer_skb() which consumes
    the sent buffers, and lets the discoverer functions use this new call
    instead. The discoverer now performs the cloning itself when that is
    necessary.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have tried to support both the newer, dedicated broadcast
    synchronization mechanism along with the older, less safe, RESET_MSG/
    ACTIVATE_MSG based one. The latter method has turned out to be a hazard
    in a highly dynamic cluster, so we find it safer to disable it completely
    when we find that the former mechanism is supported by the peer node.

    For this purpose, we now introduce a new capability bit,
    TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
    synchronization is supported by the present node. The new bit is conveyed
    between peers in the 'capabilities' field of neighbor discovery messages.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

31 Jul, 2015

1 commit

  • The node lock is currently grabbed and released in the function
    tipc_disc_rcv() in the file discover.c. As a preparation for the next
    commits, we need to move this node lock handling, along with the code
    area it is covering, to node.c.

    This commit introduces this change.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2015

2 commits

  • As a step towards turning links into node internal entities, we move the
    creation of links from the neighbor discovery logics to the node's link
    control logics.

    We also create an additional entry for the link's media address in the
    newly introduced struct tipc_link_entry, since this is where it is
    needed in the upcoming commits. The current copy in struct tipc_link
    is kept for now, but will be removed later.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • struct 'tipc_node' currently contains two arrays for link attributes,
    one for the link pointers, and one for the usable link MTUs.

    We now group those into a new struct 'tipc_link_entry', and introduce
    one single array consisting of such entries. Apart from being a cosmetic
    improvement, this is a starting point for the strict master-slave
    relation between node and link that we will introduce in the following
    commits.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

30 Mar, 2015

1 commit

  • The TIPC node hash table is protected with the RCU lock on the read
    side. tipc_node_find() looks up a node object by node address by
    iterating over the hash table. Since the entire traversal inside
    tipc_node_find() is guarded by the RCU read lock, the lookup itself is
    safe. However, when callers use the node object returned by
    tipc_node_find(), no RCU read lock is held any more. Therefore, using
    the returned object is unsafe for callers of tipc_node_find().

    We now introduce a reference counter for the node structure. Before
    tipc_node_find() returns a node object to its caller, it first
    increases the reference counter. Accordingly, when the caller is done
    with the object, it decreases the counter again. This prevents a node
    being used by one thread from being freed by another thread.

    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
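
The get/put discipline described above can be sketched in user space as follows. This is a single-threaded illustration with a plain integer counter; struct node, node_create(), node_find() and node_put() are hypothetical simplifications, and a real implementation would need an atomic counter and proper table locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
    uint32_t addr;
    unsigned int refcnt;
};

static int freed;   /* number of nodes actually released */

/* A new node starts with one reference, owned by the table. */
struct node *node_create(uint32_t addr)
{
    struct node *n = malloc(sizeof(*n));

    if (n) {
        n->addr = addr;
        n->refcnt = 1;
    }
    return n;
}

/* Drop one reference; the last one out frees the node. */
void node_put(struct node *n)
{
    assert(n != NULL && n->refcnt > 0);
    if (--n->refcnt == 0) {
        freed++;
        free(n);
    }
}

/* Lookup takes a reference on behalf of the caller before returning. */
struct node *node_find(struct node **tbl, size_t len, uint32_t addr)
{
    for (size_t i = 0; i < len; i++) {
        if (tbl[i] && tbl[i]->addr == addr) {
            tbl[i]->refcnt++;
            return tbl[i];
        }
    }
    return NULL;
}

int nodes_freed(void)
{
    return freed;
}
```

If the table drops its reference while a caller still holds one from node_find(), the node stays valid until that caller calls node_put().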
     

15 Mar, 2015

1 commit

  • The TIPC protocol spec has defined a 13 bit capability bitmap in
    the neighbor discovery header, as a means to maintain compatibility
    between different code and protocol generations. Until now this field
    has been unused.

    We now introduce the basic framework for exchanging capabilities
    between nodes at first contact. After exchange, a peer node's
    capabilities are stored as a 16 bit bitmap in struct tipc_node.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
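
Storing and querying such a bitmap can be sketched as below. The bit positions and the names DEMO_CAP_BCAST_SYNCH, DEMO_CAP_OTHER, struct peer, peer_store_caps() and peer_supports() are assumptions for illustration; only the 13-bit field width and the 16-bit storage come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPS_FIELD_MASK      0x1fffu    /* 13-bit field in the header */
#define DEMO_CAP_BCAST_SYNCH (1u << 1)  /* hypothetical bit position */
#define DEMO_CAP_OTHER       (1u << 5)  /* hypothetical bit position */

/* Per-peer state: capabilities learned at first contact. */
struct peer {
    uint16_t capabilities;
};

/* Keep only the 13 capability bits from the received header field. */
void peer_store_caps(struct peer *p, uint32_t hdr_field)
{
    assert(p != NULL);
    p->capabilities = (uint16_t)(hdr_field & CAPS_FIELD_MASK);
}

/* True only if the peer advertised every bit in 'cap'. */
bool peer_supports(const struct peer *p, uint16_t cap)
{
    assert(p != NULL);
    return (p->capabilities & cap) == cap;
}
```

A node would typically enable an optional mechanism only when peer_supports() reports the corresponding bit, and fall back to the older behaviour otherwise.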
     

06 Mar, 2015

1 commit

  • The payload area following the TIPC discovery message header is an
    opaque area defined by the media. INT_H_SIZE was enough for
    Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing
    information.

    Signed-off-by: Erik Hugne
    Signed-off-by: David S. Miller

    Erik Hugne
     

06 Feb, 2015

1 commit

  • The most common usage of namespace information is when we fetch the
    own node address from the net structure. This leads to a lot of
    passing around of a parameter of type 'struct net *' between
    functions just to make them able to obtain this address.

    However, in many cases this is unnecessary. The own node address
    is readily available as a member of both struct tipc_sock and
    tipc_link, and can be fetched from there instead.
    The fact that the vast majority of functions in socket.c and link.c
    anyway are maintaining a pointer to their respective base structures
    makes this option even more compelling.

    In this commit, we introduce the inline functions tsk_own_node()
    and link_own_node() to make it easy for functions to fetch the node
    address from those structs instead of having to pass along and
    dereference the namespace struct.

    In particular, we make calls to the msg_xx() functions in msg.{h,c}
    context independent by directly passing them the own node address
    as parameter when needed. Those functions should be regarded as
    leaves in the code dependency tree, and it is hence desirable to
    keep them namespace unaware.

    Apart from a potential positive effect on cache behavior, these
    changes make it easier to introduce the changes that will follow
    later in this series.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

05 Feb, 2015

1 commit

  • Instances of struct node are created in the function tipc_disc_rcv()
    under the assumption that there is no race between received discovery
    messages arriving from the same node. This assumption is wrong.
    When we use more than one bearer, it is possible that discovery
    messages from the same node arrive at the same moment, resulting in
    creation of two instances of struct tipc_node. This may later cause
    confusion during link establishment, and may result in one of the links
    never becoming activated.

    We fix this by making lookup and potential creation of nodes atomic.
    Instead of first looking up the node and, in case of failure, creating
    it, we now start by looking up the node inside node_link_create(), and
    return a reference to that one if found. Otherwise, we go ahead and
    create the node as we did before.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

13 Jan, 2015

6 commits

  • Once namespaces are supported, each namespace should own its private
    random value. The global variable representing the random value
    must therefore be moved to the tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • If net namespace is supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable holding the node address must be moved to the tipc_net
    structure. As a result, users can also assign a node address to each
    namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The bearer list, defined as a global variable, is used to store bearer
    instances. When tipc supports net namespaces, bearers created in one
    namespace must be isolated from those allocated in other namespaces,
    which requires that the bearer list (bearer_list) be moved to the
    tipc_net structure. As a result, a net namespace pointer has to be
    passed to functions which access the bearer list.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The global variables associated with the node table are listed below:
    - node hash table (node_htable)
    - node list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make the node table support namespaces, the above global variables
    must be moved to the tipc_net structure so that they are kept private
    to each namespace. As a consequence, these variables are allocated and
    initialized when a namespace is created, and deallocated when the
    namespace is destroyed. After the change, functions associated with
    these variables have to use a namespace pointer to access them, so
    adding a namespace pointer as a parameter of these functions is the
    major change made in this commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Using the namespace infrastructure, make the "tipc_net_id" global
    variable per-namespace, and rename it to "net_id". For the conversion
    to work, a network namespace instance must be passed to the relevant
    functions, allowing them to access the per-namespace "net_id"
    variable.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Some timer wrapper functions, such as k_term_timer(), are empty, and
    others, including k_start_timer() and k_cancel_timer(), do not return
    any value to their callers. Moreover, no other component in the kernel
    uses such wrappers. Therefore, these timer interfaces defined in the
    tipc module should be purged.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

15 May, 2014

2 commits

  • The function tipc_disc_rcv(), which is handling received neighbor
    discovery messages, is perceived as messy, and it is hard to verify
    its correctness by code inspection. The fact that the task it is set
    to resolve is fairly complex does not make the situation better.

    In this commit we try to take a more systematic approach to the
    problem. We define a decision machine which takes three state flags
    as input, and produces three action flags as output. We then walk
    through all permutations of the state flags, and for each of them we
    describe verbally what is going on, and set zero or more of
    the action flags. The action flags indicate what should be done once
    the decision machine has finished its job, while the last part of the
    function deals with performing those actions.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • TIPC currently handles two media specific addresses: Ethernet MAC
    addresses and InfiniBand addresses. Those are kept in three different
    formats:

    1) A "raw" format as obtained from the device. This format is known
    only by the media specific adapter code in eth_media.c and
    ib_media.c.
    2) A "generic" internal format, in the form of struct tipc_media_addr,
    which can be referenced and passed around by the generic media-
    unaware code.
    3) A serialized version of the latter, to be conveyed in neighbor
    discovery messages.

    Conversion between the three formats can only be done by the media
    specific code, so we have function pointers for this purpose in
    struct tipc_media. Here, the media adapters can install their own
    conversion functions at startup.

    We now introduce a new such function, 'raw2addr()', whose purpose
    is to convert from format 1 to format 2 above. We also try, as far as
    possible, to make commenting, variable names and usage of these
    functions uniform, with the purpose of making them more comprehensible.

    We can now also remove the function tipc_l2_media_addr_set(), whose
    job is done better by the new function.

    Finally, we expand the field for serialized addresses (format 3)
    in discovery messages from 20 to 32 bytes. This is permitted
    according to the spec, and reduces the risk of problems when we
    add new media in the future.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

28 Apr, 2014

1 commit

  • The commit a8b9b96e959f3c035af20b1bd2ba67b0b7269b19 ("tipc: fix race
    in disc create/delete") leads to the following static checker warning:

    net/tipc/discover.c:352 tipc_disc_create()
    warn: possible memory leak of 'req'

    The risk of a memory leak is real. In particular, when allocating
    memory for "req->buf" fails, tipc_disc_create() does not free the
    memory it has already allocated, but returns directly with the ENOMEM
    error code, so the memory is leaked.

    Reported-by: Dan Carpenter
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
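
The error path in question follows a common pattern: a two-step allocation where the second step fails. A minimal user-space sketch of the corrected pattern is shown below. All names here (struct req, req_create(), req_destroy(), limited_alloc(), counted_free()) are hypothetical; the allocator deliberately refuses large requests so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct req {
    char *buf;
    size_t len;
};

static int live;   /* outstanding allocations, to make a leak observable */

/* Hypothetical allocator that refuses large requests, to force a failure. */
void *limited_alloc(size_t n)
{
    void *p = n > 1024 ? NULL : malloc(n);

    if (p)
        live++;
    return p;
}

void counted_free(void *p)
{
    if (p) {
        live--;
        free(p);
    }
}

int live_allocs(void)
{
    return live;
}

/* Allocate the request, then its buffer; undo step one if step two fails. */
int req_create(struct req **out, size_t buf_len)
{
    struct req *r;

    assert(out != NULL);
    r = limited_alloc(sizeof(*r));
    if (!r)
        return -ENOMEM;

    r->buf = limited_alloc(buf_len);
    if (!r->buf) {
        counted_free(r);    /* without this line, 'r' would be leaked */
        return -ENOMEM;
    }
    r->len = buf_len;
    *out = r;
    return 0;
}

void req_destroy(struct req *r)
{
    if (r) {
        counted_free(r->buf);
        counted_free(r);
    }
}
```

Requesting a buffer larger than the allocator's limit returns -ENOMEM and leaves the count of live allocations at zero.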
     

23 Apr, 2014

2 commits

  • Commit a21a584d6720ce349b05795b9bcfab3de8e58419 (tipc: fix neighbor
    detection problem after hw address change) introduces a race condition
    involving tipc_disc_delete() and tipc_disc_add/remove_dest that can
    cause TIPC to dereference the pointer to the bearer discovery request
    structure after it has been freed since a stray pointer is left in the
    bearer structure.

    In order to fix the issue, the process of resetting the discovery
    request handler is optimized: the discovery request handler and request
    buffer are just reset instead of being freed, allocated and initialized.
    As the request pointer is always valid and the request's lock is taken
    while the request handler is reset, the race no longer happens.

    Reported-by: Erik Hugne
    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Currently, on both the message transmission and reception paths, the
    read lock of tipc_net_lock must be held before a bearer is accessed,
    while the write lock of tipc_net_lock has to be taken before a bearer
    is configured. Although this ensures that the bearer is always valid on
    the two data paths, it binds link and bearer closely together.

    So, as part of the effort to remove tipc_net_lock, the locking
    policy for bearer protection is adjusted as follows: on the two
    data paths, RCU is used, and on the bearer configuration path, the
    RTNL lock is applied.

    Now RCU only covers the message reception path. To make it possible
    to protect the message transmission path with RCU, a link should not
    use its stored bearer pointer to access the bearer. Instead, it should
    use the bearer identity of its attached bearer as an index to get the
    bearer instance from the bearer_list array, which decouples bearer and
    link. As a result, the bearer on the message transmission path can be
    safely protected by RCU when we access the bearer_list array within
    RCU lock protection.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

29 Mar, 2014

1 commit

  • The node discovery domain is assigned when a bearer is enabled.
    In the previous commit we reflected this attribute directly in the
    bearer structure, since it is needed to reinitialize the node
    discovery mechanism after a hardware address change.

    There's no need to replicate this attribute anywhere else, so we
    remove it from the tipc_link_req structure.

    Signed-off-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
     

19 Feb, 2014

1 commit

  • Rename the following functions to names that are shorter and more
    in line with common naming practice in the network subsystem.

    tipc_bclink_send_msg->tipc_bclink_xmit
    tipc_bclink_recv_pkt->tipc_bclink_rcv
    tipc_disc_recv_msg->tipc_disc_rcv
    tipc_link_send_proto_msg->tipc_link_proto_xmit
    link_recv_proto_msg->tipc_link_proto_rcv
    link_send_sections_long->tipc_link_iovec_long_xmit
    tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
    tipc_link_send_sync->tipc_link_sync_xmit
    tipc_link_recv_sync->tipc_link_sync_rcv
    tipc_link_send_buf->__tipc_link_xmit
    tipc_link_send->tipc_link_xmit
    tipc_link_send_names->tipc_link_names_xmit
    tipc_named_recv->tipc_named_rcv
    tipc_link_recv_bundle->tipc_link_bundle_rcv
    tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
    link_send_long_buf->tipc_link_frag_xmit

    tipc_multicast->tipc_port_mcast_xmit
    tipc_port_recv_mcast->tipc_port_mcast_rcv
    tipc_port_reject_sections->tipc_port_iovec_reject
    tipc_port_recv_proto_msg->tipc_port_proto_rcv
    tipc_connect->tipc_port_connect
    __tipc_connect->__tipc_port_connect
    __tipc_disconnect->__tipc_port_disconnect
    tipc_disconnect->tipc_port_disconnect
    tipc_shutdown->tipc_port_shutdown
    tipc_port_recv_msg->tipc_port_rcv
    tipc_port_recv_sections->tipc_port_iovec_rcv

    release->tipc_release
    accept->tipc_accept
    bind->tipc_bind
    get_name->tipc_getname
    poll->tipc_poll
    send_msg->tipc_sendmsg
    send_packet->tipc_send_packet
    send_stream->tipc_send_stream
    recv_msg->tipc_recvmsg
    recv_stream->tipc_recv_stream
    connect->tipc_connect
    listen->tipc_listen
    shutdown->tipc_shutdown
    setsockopt->tipc_setsockopt
    getsockopt->tipc_getsockopt

    The above changes have no impact on current users of the functions.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

08 Jan, 2014

1 commit

  • Currently, only 'bearer_lock' is used to protect struct link_req in
    the function disc_timeout(). This is unsafe, since the member fields
    'num_nodes' and 'timer_intv' might be accessed simultaneously by the
    three different threads below, none of which grabs bearer_lock in the
    critical region:

    link_activate()
      tipc_bearer_add_dest()
        tipc_disc_add_dest()
          req->num_nodes++;

    tipc_link_reset()
      tipc_bearer_remove_dest()
        tipc_disc_remove_dest()
          req->num_nodes--
          disc_update()
            read req->num_nodes
            write req->timer_intv

    disc_timeout()
      read req->num_nodes
      read/write req->timer_intv

    Without lock protection, the only symptom of a race is that discovery
    messages occasionally may not be sent out. This is not fatal, since such
    messages are best-effort anyway. On the other hand, since discovery
    messages are not time critical, adding a protecting lock brings no
    serious overhead either. So we add a new, dedicated spinlock in
    order to guarantee absolute data consistency in link_req objects.
    This also helps reduce the overall role of the bearer_lock, which
    we want to remove completely in a later commit series.
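
    The effect of a dedicated per-request lock can be sketched in
    user-space C. This is an illustration under stated assumptions: an
    atomic_flag spin loop stands in for the kernel spinlock, and the
    interval constants and the doubling rule in disc_timeout() are
    simplified, hypothetical stand-ins for the real timer logic.

```c
#include <assert.h>
#include <stdatomic.h>

#define REQ_INTV_INIT 125u    /* hypothetical initial interval (ms) */
#define REQ_INTV_SLOW 60000u  /* hypothetical upper bound (ms) */

struct link_req {
    atomic_flag lock;         /* dedicated lock for the fields below */
    int num_nodes;
    unsigned int timer_intv;
};

static void req_lock(struct link_req *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
}

static void req_unlock(struct link_req *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

static void req_init(struct link_req *r)
{
    atomic_flag_clear(&r->lock);
    r->num_nodes = 0;
    r->timer_intv = REQ_INTV_INIT;
}

/* Caller must hold r->lock. Restart fast discovery once no nodes remain. */
static void disc_update(struct link_req *r)
{
    if (r->num_nodes == 0 && r->timer_intv > REQ_INTV_INIT)
        r->timer_intv = REQ_INTV_INIT;
}

static void disc_add_dest(struct link_req *r)
{
    req_lock(r);
    r->num_nodes++;
    req_unlock(r);
}

static void disc_remove_dest(struct link_req *r)
{
    req_lock(r);
    assert(r->num_nodes > 0);
    r->num_nodes--;
    disc_update(r);
    req_unlock(r);
}

/* Timer handler: back off by doubling the interval up to the bound.
 * Returns the interval to use for the next timer run. */
static unsigned int disc_timeout(struct link_req *r)
{
    unsigned int next;

    req_lock(r);
    if (r->timer_intv * 2 <= REQ_INTV_SLOW)
        r->timer_intv *= 2;
    else
        r->timer_intv = REQ_INTV_SLOW;
    next = r->timer_intv;
    req_unlock(r);
    return next;
}
```

    All three call chains now read and write num_nodes and timer_intv
    under the same lock, so no update can be lost between them.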

    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

10 Dec, 2013

1 commit

  • struct 'tipc_bearer' is a generic representation of the underlying
    media type, and exists in a one-to-one relationship to each interface
    TIPC is using. The struct contains a 'blocked' flag that mirrors the
    operational and execution state of the represented interface, and is
    updated through notification calls from the latter. The users of
    tipc_bearer are checking this flag before each attempt to send a
    packet via the interface.

    This state mirroring serves no purpose in the current code base. TIPC
    links will not discover a media failure any faster through this
    mechanism, and in reality the flag only adds overhead at packet
    sending and reception.

    Furthermore, the fact that the flag needs to be protected by a spinlock
    aggregated into tipc_bearer has turned out to cause a serious and
    completely unnecessary deadlock problem.

            CPU0                                    CPU1
            ----                                    ----
    Time 0: bearer_disable()                        link_timeout()
    Time 1:   spin_lock_bh(&b_ptr->lock)              tipc_link_push_queue()
    Time 2:   tipc_link_delete()                        tipc_bearer_blocked(b_ptr)
    Time 3:     k_cancel_timer(&req->timer)               spin_lock_bh(&b_ptr->lock)
    Time 4:       del_timer_sync(&req->timer)

    I.e., del_timer_sync() on CPU0 never returns, because the timer handler
    on CPU1 is waiting for the bearer lock.

    We eliminate the 'blocked' flag from struct tipc_bearer, along with all
    tests on this flag. This not only resolves the deadlock, but also
    simplifies and speeds up the data path execution of TIPC. It also fits
    well into our ongoing effort to make the locking policy simpler and
    more manageable.

    An effect of this change is that we can get rid of functions such as
    tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
    We replace the latter with a new function, tipc_reset_bearer(), which
    resets all links associated to the bearer immediately after an
    interface goes down.

    A user might notice one slight change in link behavior after this
    change. When an interface goes down (e.g. through a NETDEV_DOWN
    event), all attached links will be reset immediately, instead of
    leaving it to each link to detect the failure through a timer-driven
    mechanism. We consider this an improvement, and see no obvious risks
    with the new behavior.

    Signed-off-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne