08 Sep, 2020

1 commit

  • In commit fdeba99b1e58
    ("tipc: fix use-after-free in tipc_bcast_get_mode"), we tried
    to make sure the tipc_net_finalize_work work item had finished if it
    was enqueued. But calling flush_scheduled_work() affects not just that
    work item but any scheduled work. This turned out to be overkill
    and caused a deadlock, as syzbot reported:

    ======================================================
    WARNING: possible circular locking dependency detected
    5.9.0-rc2-next-20200828-syzkaller #0 Not tainted
    ------------------------------------------------------
    kworker/u4:6/349 is trying to acquire lock:
    ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13e0 kernel/workqueue.c:2777

    but task is already holding lock:
    ffffffff8a879430 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x9b/0xb10 net/core/net_namespace.c:565

    [...]
    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(pernet_ops_rwsem);
                                   lock(&sb->s_type->i_mutex_key#13);
                                   lock(pernet_ops_rwsem);
      lock((wq_completion)events);

    *** DEADLOCK ***
    [...]

    v1:
    To fix the original issue, we replace the above call by introducing
    a bit flag. When a namespace is cleaned up, the bit flag is set to zero and:
    - tipc_net_finalize() simply returns immediately.
    - tipc_net_finalize_work is not enqueued into the scheduled work queue.

    v2:
    Use the cancel_work_sync() helper to make sure ONLY
    tipc_net_finalize_work() is stopped before releasing the bcbase object.

    Reported-by: syzbot+d5aa7e0385f6a5d0f4fd@syzkaller.appspotmail.com
    Fixes: fdeba99b1e58 ("tipc: fix use-after-free in tipc_bcast_get_mode")
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Huu Le
    Signed-off-by: Jakub Kicinski

    Hoang Huu Le
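The v1 idea (a per-namespace flag that makes finalization a no-op and blocks new work once cleanup starts) can be sketched in userspace C11 as below. This is a minimal illustration under stated assumptions: `demo_net`, `demo_schedule_finalize()` and `demo_net_finalize()` are hypothetical names, not TIPC code, and the workqueue is reduced to a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace state; not TIPC's struct tipc_net. */
struct demo_net {
	atomic_bool active;   /* cleared when the namespace is cleaned up */
	int queued_work;      /* stands in for an enqueued work item */
	unsigned int own_addr;
};

static void demo_net_init(struct demo_net *n)
{
	atomic_init(&n->active, true);
	n->queued_work = 0;
	n->own_addr = 0;
}

/* Namespace cleanup: from now on no new finalize work is accepted. */
static void demo_net_cleanup(struct demo_net *n)
{
	atomic_store(&n->active, false);
}

/* Refuse to enqueue finalize work for a namespace being torn down. */
static bool demo_schedule_finalize(struct demo_net *n)
{
	if (!atomic_load(&n->active))
		return false;
	n->queued_work++;
	return true;
}

/* The finalize step returns immediately once the flag is cleared. */
static void demo_net_finalize(struct demo_net *n, unsigned int addr)
{
	if (!atomic_load(&n->active))
		return;
	n->own_addr = addr;
}
```

A flag check alone still leaves a window between the check and the enqueue, which is one reason the v2 revision instead relies on cancel_work_sync() to wait for just this one work item rather than flushing every scheduled work.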
     

21 Dec, 2019

1 commit

  • To enable iproute2/tipc to generate backwards compatible
    printouts and validate command parameters for nodes using a
    node address, it needs to be able to read the legacy
    address flag from the kernel. The legacy address flag records
    the way in which the node identity was originally specified.

    The legacy address flag is requested by the netlink message
    TIPC_NL_ADDR_LEGACY_GET. If the flag is set, the attribute
    TIPC_NLA_NET_ADDR_LEGACY is set in the return message.

    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    John Rutherford
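On the userspace side, a flag attribute such as TIPC_NLA_NET_ADDR_LEGACY carries no payload, so "set" simply means "present in the reply". The sketch below walks a buffer of netlink attributes looking for one type. Assumptions: `struct demo_nlattr` mirrors the layout of `struct nlattr` from `<linux/netlink.h>`, and `DEMO_NLA_NET_ADDR_LEGACY` is a made-up number, not the real enum value. A real tool would use a helper library rather than this hand-rolled loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: nla_len includes the 4-byte header. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define DEMO_NLA_ALIGN(len)      (((len) + 3) & ~(size_t)3) /* 4-byte alignment */
#define DEMO_NLA_TYPE_MASK       0x3fffu  /* strips NLA_F_NESTED / NLA_F_NET_BYTEORDER */
#define DEMO_NLA_NET_ADDR_LEGACY 7        /* hypothetical value for illustration */

/* Return true if an attribute of 'type' appears at this nesting level. */
static bool demo_has_attr(const uint8_t *buf, size_t len, uint16_t type)
{
	while (len >= sizeof(struct demo_nlattr)) {
		struct demo_nlattr hdr;

		memcpy(&hdr, buf, sizeof(hdr));   /* avoid unaligned access */
		if (hdr.nla_len < sizeof(hdr) || hdr.nla_len > len)
			break;                    /* malformed: stop walking */
		if ((hdr.nla_type & DEMO_NLA_TYPE_MASK) == type)
			return true;
		size_t step = DEMO_NLA_ALIGN((size_t)hdr.nla_len);
		if (step >= len)
			break;                    /* no room for another attribute */
		buf += step;
		len -= step;
	}
	return false;
}
```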
     

13 Nov, 2019

1 commit

  • In commit 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address
    hash values"), the 32-bit node address is only generated after a
    one-second trial period has expired. However, the self's addr in struct
    tipc_monitor is not updated when the node address is generated, so it
    always remains at its initial value of zero. As a result, the sorting
    algorithm using this value does not work as expected, and neither does
    the neighbor monitoring framework.

    In this commit, we add a fix to update self's addr when the 32-bit node
    address is generated.

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally, this allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet, so that the build breaks if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc., as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
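The four independent strictness options and their combinations can be illustrated with a toy validator. Assumptions: the `demo_` flag names mirror the options listed in the commit (TRAILING, MAXTYPE, UNSPEC, STRICT_ATTRS), but the simplified attribute and policy structs are hypothetical; the kernel's real logic lives in `__nla_validate_parse()` and works on raw `nlattr` buffers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flags modeled on the four options above; values chosen for illustration. */
enum demo_validation {
	DEMO_VALIDATE_LIBERAL      = 0,
	DEMO_VALIDATE_TRAILING     = 1 << 0, /* reject trailing data */
	DEMO_VALIDATE_MAXTYPE      = 1 << 1, /* reject type > maxtype */
	DEMO_VALIDATE_UNSPEC       = 1 << 2, /* reject NLA_UNSPEC-like entries */
	DEMO_VALIDATE_STRICT_ATTRS = 1 << 3, /* require exact attribute size */
};
/* "*_deprecated_strict()" = TRAILING | MAXTYPE; new code gets everything. */
#define DEMO_VALIDATE_DEPRECATED_STRICT (DEMO_VALIDATE_TRAILING | DEMO_VALIDATE_MAXTYPE)
#define DEMO_VALIDATE_STRICT (DEMO_VALIDATE_DEPRECATED_STRICT | \
			      DEMO_VALIDATE_UNSPEC | DEMO_VALIDATE_STRICT_ATTRS)

struct demo_policy { bool unspec; size_t len; };   /* expected payload length */
struct demo_attr { unsigned int type; size_t len; };

/* Returns 0 if the message is accepted under 'flags', -1 if rejected. */
static int demo_validate(const struct demo_attr *attrs, size_t n, size_t trailing,
			 const struct demo_policy *pol, unsigned int maxtype,
			 unsigned int flags)
{
	if (trailing && (flags & DEMO_VALIDATE_TRAILING))
		return -1;
	for (size_t i = 0; i < n; i++) {
		const struct demo_attr *a = &attrs[i];

		if (a->type > maxtype) {
			if (flags & DEMO_VALIDATE_MAXTYPE)
				return -1;
			continue;               /* liberal: ignore unknown type */
		}
		const struct demo_policy *p = &pol[a->type];
		if (p->unspec) {
			if (flags & DEMO_VALIDATE_UNSPEC)
				return -1;
			continue;               /* liberal: accept unchecked */
		}
		if (a->len < p->len)
			return -1;              /* too short is always rejected */
		if (a->len != p->len && (flags & DEMO_VALIDATE_STRICT_ATTRS))
			return -1;              /* strict: longer is rejected too */
	}
	return 0;
}
```

Keeping the options as separate bits is what lets old policies opt into a single check (for example UNSPEC) without taking on the others.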
     
  • Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
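The compatibility concern can be shown in a few lines: code that compares `nla_type` directly stops matching once the kernel sets NLA_F_NESTED, while code that masks the flag bits keeps working. The bit values below match the uapi definitions of NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK in `<linux/netlink.h>`; the `DEMO_` copies and the helper functions are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK. */
#define DEMO_NLA_F_NESTED        (1 << 15)
#define DEMO_NLA_F_NET_BYTEORDER (1 << 14)
#define DEMO_NLA_TYPE_MASK       ((uint16_t)~(DEMO_NLA_F_NESTED | DEMO_NLA_F_NET_BYTEORDER))

/* Fragile: stops matching as soon as the kernel sets NLA_F_NESTED. */
static bool naive_is_type(uint16_t nla_type, uint16_t want)
{
	return nla_type == want;
}

/* Robust: mask out the flag bits before comparing. */
static bool masked_is_type(uint16_t nla_type, uint16_t want)
{
	return (nla_type & DEMO_NLA_TYPE_MASK) == want;
}

/* Generic parsers (e.g. dissectors) can now tell nests apart. */
static bool is_nested(uint16_t nla_type)
{
	return (nla_type & DEMO_NLA_F_NESTED) != 0;
}
```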
     

27 Mar, 2019

1 commit

  • When running a syz script, a panic occurred:

    [ 156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc]
    [ 156.094315] Call Trace:
    [ 156.094844]
    [ 156.095306] dump_stack+0x7c/0xc0
    [ 156.097346] print_address_description+0x65/0x22e
    [ 156.100445] kasan_report.cold.3+0x37/0x7a
    [ 156.102402] tipc_disc_timeout+0x9c9/0xb20 [tipc]
    [ 156.106517] call_timer_fn+0x19a/0x610
    [ 156.112749] run_timer_softirq+0xb51/0x1090

    It was caused by the netns being freed without the discoverer timer
    being deleted, while the netns would later be accessed in the timer handler.

    The timer should have been deleted by tipc_net_stop() when cleaning up a
    netns. However, since commit 52dfae5c85a4 ("tipc: obtain node identity
    from interface by default"), tipc has been able to enable a bearer and
    start d->timer without the local node_addr being set, which caused the
    timer not to be deleted in tipc_net_stop().

    So fix it in tipc_net_stop() by checking the local node_id instead
    of the local node_addr, as Jon suggested.

    While at it, remove the call to tipc_nametbl_withdraw() there, since
    tipc_nametbl_stop() will take care of freeing the nametbl afterwards.

    Fixes: 52dfae5c85a4 ("tipc: obtain node identity from interface by default")
    Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Xin Long
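Why the guard condition matters can be sketched with a hypothetical namespace struct: a cleanup path gated on the 32-bit address is skipped while that address is still 0, even though a bearer (and its timer) may already be running; gating on the 128-bit identity, as the commit does, still performs the cleanup. `demo_net`, `demo_stop_by_addr()` and `demo_stop_by_id()` are illustrative names, not TIPC functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NODE_ID_LEN 16

/* Hypothetical per-namespace state. */
struct demo_net {
	uint8_t node_id[DEMO_NODE_ID_LEN]; /* 128-bit identity */
	uint32_t node_addr;                /* 32-bit address, may still be 0 */
	bool disc_timer_armed;             /* discoverer timer running */
};

static bool demo_id_is_set(const struct demo_net *n)
{
	static const uint8_t zero[DEMO_NODE_ID_LEN];
	return memcmp(n->node_id, zero, sizeof(zero)) != 0;
}

/* Old-style guard: cleanup skipped while the 32-bit address is unset. */
static void demo_stop_by_addr(struct demo_net *n)
{
	if (!n->node_addr)
		return;
	n->disc_timer_armed = false;       /* stands in for deleting the timer */
}

/* Guard as in the fix: keyed on the node identity instead. */
static void demo_stop_by_id(struct demo_net *n)
{
	if (!demo_id_is_set(n))
		return;
	n->disc_timer_armed = false;
}
```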
     

18 Nov, 2018

1 commit

  • We get the following warning:

    [ 47.926140] 32-bit node address hash set to 2010a0a
    [ 47.927202]
    [ 47.927433] ================================
    [ 47.928050] WARNING: inconsistent lock state
    [ 47.928661] 4.19.0+ #37 Tainted: G E
    [ 47.929346] --------------------------------
    [ 47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
    [ 47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0
    [ 47.930116] {SOFTIRQ-ON-W} state was registered at:
    [ 47.930116] _raw_spin_lock+0x29/0x60
    [ 47.930116] rht_deferred_worker+0x556/0x810
    [ 47.930116] process_one_work+0x1f5/0x540
    [ 47.930116] worker_thread+0x64/0x3e0
    [ 47.930116] kthread+0x112/0x150
    [ 47.930116] ret_from_fork+0x3a/0x50
    [ 47.930116] irq event stamp: 14044
    [ 47.930116] hardirqs last enabled at (14044): [] __local_bh_enable_ip+0x7a/0xf0
    [ 47.938117] hardirqs last disabled at (14043): [] __local_bh_enable_ip+0x41/0xf0
    [ 47.938117] softirqs last enabled at (14028): [] irq_enter+0x5e/0x60
    [ 47.938117] softirqs last disabled at (14029): [] irq_exit+0xb5/0xc0
    [ 47.938117]
    [ 47.938117] other info that might help us debug this:
    [ 47.938117] Possible unsafe locking scenario:
    [ 47.938117]
    [ 47.938117]        CPU0
    [ 47.938117]        ----
    [ 47.938117]   lock(&(&ht->lock)->rlock);
    [ 47.938117]   <Interrupt>
    [ 47.938117]     lock(&(&ht->lock)->rlock);
    [ 47.938117]
    [ 47.938117] *** DEADLOCK ***
    [ 47.938117]
    [ 47.938117] 2 locks held by swapper/3/0:
    [ 47.938117] #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280
    [ 47.938117] #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc]
    [ 47.938117]
    [ 47.938117] stack backtrace:
    [ 47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 4.19.0+ #37
    [ 47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 47.938117] Call Trace:
    [ 47.938117]
    [ 47.938117] dump_stack+0x5e/0x8b
    [ 47.938117] print_usage_bug+0x1ed/0x1ff
    [ 47.938117] mark_lock+0x5b5/0x630
    [ 47.938117] __lock_acquire+0x4c0/0x18f0
    [ 47.938117] ? lock_acquire+0xa6/0x180
    [ 47.938117] lock_acquire+0xa6/0x180
    [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] _raw_spin_lock+0x29/0x60
    [ 47.938117] ? rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] rhashtable_walk_enter+0x36/0xb0
    [ 47.938117] tipc_sk_reinit+0xb0/0x410 [tipc]
    [ 47.938117] ? mark_held_locks+0x6f/0x90
    [ 47.938117] ? __local_bh_enable_ip+0x7a/0xf0
    [ 47.938117] ? lockdep_hardirqs_on+0x20/0x1a0
    [ 47.938117] tipc_net_finalize+0xbf/0x180 [tipc]
    [ 47.938117] tipc_disc_timeout+0x509/0x540 [tipc]
    [ 47.938117] ? call_timer_fn+0x5/0x280
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] call_timer_fn+0xa1/0x280
    [ 47.938117] ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
    [ 47.938117] run_timer_softirq+0x1f2/0x4d0
    [ 47.938117] __do_softirq+0xfc/0x413
    [ 47.938117] irq_exit+0xb5/0xc0
    [ 47.938117] smp_apic_timer_interrupt+0xac/0x210
    [ 47.938117] apic_timer_interrupt+0xf/0x20
    [ 47.938117]
    [ 47.938117] RIP: 0010:default_idle+0x1c/0x140
    [ 47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b
    [ 47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    [ 47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001
    [ 47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200
    [ 47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
    [ 47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200
    [ 47.938117] ? default_idle+0x1a/0x140
    [ 47.938117] do_idle+0x1bc/0x280
    [ 47.938117] cpu_startup_entry+0x19/0x20
    [ 47.938117] start_secondary+0x187/0x1c0
    [ 47.938117] secondary_startup_64+0xa4/0xb0

    The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is
    calling the function rhashtable_walk_enter() within a timer interrupt.
    We fix this by executing tipc_net_finalize() in work queue context.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

08 Aug, 2018

1 commit

  • Commit 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread
    safe") tried to make setting the node address thread safe by using
    node_list_lock to serialize the whole process of setting the node
    address in tipc_net_finalize(). But this causes the following
    interrupt-unsafe locking scenario:

               CPU0                    CPU1
               ----                    ----
        rht_deferred_worker()
          rhashtable_rehash_table()
            lock(&(&ht->lock)->rlock)
                                       tipc_nl_compat_doit()
                                         tipc_net_finalize()
                                           local_irq_disable();
                                           lock(&(&tn->node_list_lock)->rlock);
                                           tipc_sk_reinit()
                                             rhashtable_walk_enter()
                                               lock(&(&ht->lock)->rlock);
        <Interrupt>
          tipc_disc_rcv()
            tipc_node_check_dest()
              tipc_node_create()
                lock(&(&tn->node_list_lock)->rlock);

               *** DEADLOCK ***

    When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
    disable BH. So if an interrupt happens after the lock is taken, it can
    create an inverse lock ordering between ht->lock and tn->node_list_lock.
    As a consequence, a deadlock might happen.

    The inverse lock ordering above arises because node_list_lock was
    never designed to serialize the setting of the node address.

    As cmpxchg() guarantees that the CAS (compare-and-swap) operation is
    atomic, we use it instead of node_list_lock to ensure that setting the
    node address completes atomically. This also avoids the potential
    deadlock.

    Fixes: 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread safe")
    Signed-off-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
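The compare-and-swap pattern can be shown in userspace with C11's `atomic_compare_exchange_strong()`, the standard-library counterpart of the kernel's `cmpxchg()`. This is a sketch assuming that "0 means not yet set"; `own_addr` and `demo_set_own_addr_once()` are hypothetical, and the follow-up work the real function performs is not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical own address for one namespace; 0 means "not yet set". */
static _Atomic uint32_t own_addr;

/*
 * Install 'addr' only if no address is set yet. Exactly one concurrent
 * caller wins and gets true; all others see false and must skip any
 * follow-up work (e.g. publishing the node in the name table).
 * No lock is taken, so no lock ordering can be inverted.
 */
static bool demo_set_own_addr_once(uint32_t addr)
{
	uint32_t expected = 0;
	return atomic_compare_exchange_strong(&own_addr, &expected, addr);
}
```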
     

07 Jul, 2018

1 commit

  • The setting of the node address is not thread safe, meaning that
    two discoverers may decide to set it simultaneously, resulting in a
    duplicate entry in the name table. We fix that with this commit.

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Apr, 2018

1 commit

  • syzbot reported a crash in __tipc_nl_net_set() caused by NULL dereference.

    We need to check that both TIPC_NLA_NET_NODEID and TIPC_NLA_NET_NODEID_W1
    are present.

    We also need to make sure that userland provided u64 attributes.

    Fixes: d50ccc2d3909 ("tipc: add 128-bit node identifier")
    Signed-off-by: Eric Dumazet
    Cc: Jon Maloy
    Cc: Ying Xue
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
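The two checks can be sketched against a hypothetical parsed-attribute table (NULL data meaning "absent"): test presence of both halves before touching either, then insist each really is a 64-bit value. The `DEMO_NODEID*` indices and `struct demo_attr` are illustrative. In the kernel, the size check is normally expressed through an `NLA_U64` policy entry rather than hand-written code, and this sketch demands exactly 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute indices and parsed table entries. */
enum { DEMO_NODEID = 1, DEMO_NODEID_W1 = 2, DEMO_MAX = 2 };
struct demo_attr { const void *data; size_t len; };

/* Fill id[0..1] from the two attributes; return 0, or -1 (think -EINVAL). */
static int demo_get_node_id(const struct demo_attr tb[DEMO_MAX + 1], uint64_t id[2])
{
	/* Both halves must be present before either is dereferenced... */
	if (!tb[DEMO_NODEID].data || !tb[DEMO_NODEID_W1].data)
		return -1;
	/* ...and each must really carry a u64, not a shorter payload. */
	if (tb[DEMO_NODEID].len != sizeof(uint64_t) ||
	    tb[DEMO_NODEID_W1].len != sizeof(uint64_t))
		return -1;
	memcpy(&id[0], tb[DEMO_NODEID].data, sizeof(uint64_t));
	memcpy(&id[1], tb[DEMO_NODEID_W1].data, sizeof(uint64_t));
	return 0;
}
```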
     

01 Apr, 2018

1 commit

  • With the new RB tree structure for service ranges it becomes possible to
    solve an old problem: we can now allow overlapping service ranges in
    the table.

    When inserting a new service range to the tree, we use 'lower' as primary
    key, and when necessary 'upper' as secondary key.

    Since there may now be multiple service ranges matching an indicated
    'lower' value, we must also add the 'upper' value to the functions
    used for removing publications, so that the correct, corresponding
    range item can be found.

    These changes guarantee that a well-formed publication/withdrawal item
    from a peer node will never be rejected, and make it possible to
    eliminate the problematic backlog functionality we currently have for
    handling such cases.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
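The two-level key ('lower' first, then 'upper') and the reason removal needs both bounds can be illustrated with a comparator. As a simplification, this sketch uses a sorted array with `qsort()`/`bsearch()` in place of the RB tree the commit describes; `struct demo_range` and the helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_range { uint32_t lower, upper; };

/* Primary key 'lower', secondary key 'upper'. */
static int demo_range_cmp(const void *a, const void *b)
{
	const struct demo_range *x = a, *y = b;

	if (x->lower != y->lower)
		return x->lower < y->lower ? -1 : 1;
	if (x->upper != y->upper)
		return x->upper < y->upper ? -1 : 1;
	return 0;
}

/* Several ranges may share 'lower', so a lookup for removal needs both bounds. */
static struct demo_range *demo_find(struct demo_range *v, size_t n,
				    uint32_t lower, uint32_t upper)
{
	struct demo_range key = { lower, upper };
	return bsearch(&key, v, n, sizeof(*v), demo_range_cmp);
}
```

Usage: after sorting `{10,30}, {10,20}, {5,50}` the order becomes `{5,50}, {10,20}, {10,30}`, and looking up `(10, 30)` finds exactly one entry even though two ranges start at 10.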
     

24 Mar, 2018

5 commits

  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but
    instead initiate a one-second trial period to allow other cluster
    members to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node receives such a message, it must check that the
    presented 32-bit identifier is either unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
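The receiver-side decision in the trial protocol above can be summarized as a small function. This is a sketch under loud assumptions: `struct demo_peer` uses a 64-bit `id` as a stand-in for the 128-bit identity, the names are hypothetical, and the commit does not say how a fresh unused address is chosen, so the linear probe here is only a placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_verdict { DEMO_ACCEPT, DEMO_FAIL_PREV_ADDR, DEMO_FAIL_NEW_ADDR };

/* What the receiver knows about peers; 'id' stands in for the 128-bit identity. */
struct demo_peer { uint64_t id; uint32_t addr; };

static bool demo_addr_in_use(const struct demo_peer *known, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (known[i].addr == addr)
			return true;
	return false;
}

/* Decide how to answer a trial request (req_id, req_addr). */
static enum demo_verdict demo_check_trial(const struct demo_peer *known, size_t n,
					  uint64_t req_id, uint32_t req_addr,
					  uint32_t *suggest)
{
	for (size_t i = 0; i < n; i++) {
		if (known[i].id != req_id)
			continue;
		if (known[i].addr == req_addr)
			return DEMO_ACCEPT;       /* same peer, same address: stay silent */
		*suggest = known[i].addr;         /* peer was up before with another address */
		return DEMO_FAIL_PREV_ADDR;
	}
	if (!demo_addr_in_use(known, n, req_addr))
		return DEMO_ACCEPT;               /* unused address: stay silent */
	/* Taken by another node: suggest an unused one (placeholder linear probe). */
	uint32_t cand = req_addr + 1;
	while (cand == 0 || demo_addr_in_use(known, n, cand))
		cand++;
	*suggest = cand;
	return DEMO_FAIL_NEW_ADDR;
}
```

"Accepting" is modeled as returning DEMO_ACCEPT, which corresponds to sending nothing back. The requester's side (adopt the suggested value and restart the trial period) is not shown.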
     
  • We add a 128-bit node identity, as an alternative to the currently used
    32-bit node address.

    For the sake of compatibility and to minimize message header changes
    we retain the existing 32-bit address field. When not set explicitly by
    the user, this field will be filled with a hash value generated from the
    much longer node identity, and be used as a shorthand value for the
    latter.

    We permit either the address or the identity to be set by configuration,
    but not both, so when the address value is set by a legacy user the
    corresponding 128-bit node identity is generated based on that value.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
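One way to derive a 32-bit shorthand from a 128-bit identity is to XOR-fold its four 32-bit words, falling back to OR-ing them if the XOR happens to be 0 (which is not usable as an address). We believe TIPC uses a fold of roughly this shape, but treat this as an assumption rather than TIPC's exact hash; `demo_hash128to32()` is a hypothetical name. The reverse direction (deriving an identity from a legacy address) is not shown.

```c
#include <stdint.h>

#define DEMO_NODE_ID_LEN 16

/* Read a big-endian 32-bit word from a byte array. */
static uint32_t demo_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Fold a 128-bit identity into 32 bits; avoid 0 where possible. */
static uint32_t demo_hash128to32(const uint8_t id[DEMO_NODE_ID_LEN])
{
	uint32_t w0 = demo_be32(id),     w1 = demo_be32(id + 4);
	uint32_t w2 = demo_be32(id + 8), w3 = demo_be32(id + 12);
	uint32_t res = w0 ^ w1 ^ w2 ^ w3;

	return res ? res : (w0 | w1 | w2 | w3);
}
```

Because many identities fold to the same value, collisions are possible, which is why the trial-period protocol described in the previous commit exists.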
     
  • As a preparation to changing the addressing structure of TIPC we replace
    all direct accesses to the tipc_net::own_addr field with the function
    dedicated for this, tipc_own_addr().

    There are no changes to program logic in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The removal of an internal structure of the node address has an unwanted
    side effect.
    - Currently, if a user is sending an anycast message with destination
    domain 0, the tipc_nametbl_translate() function will use the
    'closest-first' algorithm to first look for a node local destination,
    and only when no such destination is found will it resort to the
    cluster global 'round-robin' lookup algorithm.
    - Current users can get around this and enforce unconditional use of
    global round-robin by indicating a destination as Z.0.0 or Z.C.0.
    - This option disappears when we make the node address flat, since the
    lookup algorithm has no way of recognizing this case. So, as long as
    there are node local destinations, the algorithm will always select
    one of those, and there is nothing the sender can do to change this.

    We solve this by eliminating the 'closest-first' option (which was
    never a good idea anyway) for non-legacy users, but only for them. To
    distinguish between legacy users and non-legacy users we introduce a new
    flag 'legacy_addr_format' in struct tipc_core, to be set when the user
    configures a legacy-style Z.C.N node address. Hence, when a legacy user
    indicates a zero lookup domain, 'closest-first' is selected, and in all
    other cases we use 'round-robin'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
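The selection rule at the end of the commit message can be expressed as a small decision function plus a toy selector. Everything here is hypothetical illustration: the names are ours, the candidate list is a plain array of node addresses, and filtering by a non-zero lookup domain is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_algo { DEMO_CLOSEST_FIRST, DEMO_ROUND_ROBIN };

/* Only a legacy-format user with a zero lookup domain keeps closest-first. */
static enum demo_algo demo_pick_algo(bool legacy_addr_format, uint32_t domain)
{
	if (legacy_addr_format && domain == 0)
		return DEMO_CLOSEST_FIRST;
	return DEMO_ROUND_ROBIN;
}

struct demo_rr { unsigned int next; };   /* round-robin cursor */

/* Pick an index among n (> 0) candidate node addresses. */
static unsigned int demo_select(enum demo_algo algo, const uint32_t *nodes,
				unsigned int n, uint32_t self, struct demo_rr *rr)
{
	if (algo == DEMO_CLOSEST_FIRST)
		for (unsigned int i = 0; i < n; i++)
			if (nodes[i] == self)
				return i;        /* prefer a node-local destination */
	unsigned int i = rr->next % n;           /* otherwise cluster-wide round-robin */
	rr->next++;
	return i;
}
```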
     
  • Nominally, TIPC organizes network nodes into a three-level network
    hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
    hierarchy is reflected in the node address format: it is sub-divided
    into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.

    However, the 'zone' and 'cluster' levels have in reality never been
    fully implemented, and never will be. The result of this has been
    that the first 20 bits of the node identity structure have been wasted,
    and the usable node identity range within a cluster has been limited
    to 12 bits. This is starting to become a problem.

    In the following commits, we will need to be able to connect between
    nodes which are using the whole 32-bit value space of the node address.
    We therefore remove the restrictions on which values can be assigned
    to the node identity: from now on it is just a 32-bit integer with no
    assumed internal structure.

    Isolation between clusters is now achieved only by setting different
    values for the 'network id' field used during neighbor discovery, in
    practice leading to the latter becoming the new cluster identity.

    The rules for accepting discovery requests/responses from neighboring
    nodes now become:

    - If the user is using legacy address format on both peers, reception
    of discovery messages is subject to the legacy lookup domain check
    in addition to the cluster id check.

    - Otherwise, the discovery request/response is always accepted, provided
    both peers have the same network id.

    This secures backwards compatibility for users who have been using zone
    or cluster identities as cluster separators, instead of the intended
    'network id'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
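The legacy Z.C.N layout (8-bit zone, 12-bit cluster, 12-bit node) and the discovery acceptance rule can be sketched as follows. The bit positions follow directly from the field widths in the commit. `demo_in_scope()` reflects our understanding of the legacy domain check (0 matches everything, Z.0.0 a zone, Z.C.0 a cluster), and all `demo_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy Z.C.N layout: zone in bits 31..24, cluster 23..12, node 11..0. */
static uint32_t demo_zone(uint32_t a)    { return a >> 24; }
static uint32_t demo_cluster(uint32_t a) { return (a >> 12) & 0xfff; }
static uint32_t demo_node(uint32_t a)    { return a & 0xfff; }

static uint32_t demo_addr(uint32_t z, uint32_t c, uint32_t n)
{
	return z << 24 | c << 12 | n;
}

/* Legacy domain check: 0 = anything, Z.C.0 = that cluster, Z.0.0 = that zone. */
static bool demo_in_scope(uint32_t domain, uint32_t addr)
{
	if (!domain || domain == addr)
		return true;
	if (domain == (addr & 0xfffff000u))
		return true;
	if (domain == (addr & 0xff000000u))
		return true;
	return false;
}

/* Acceptance of a discovery message, following the two rules above. */
static bool demo_accept_disc(bool legacy_both, uint32_t own_net_id,
			     uint32_t peer_net_id, uint32_t domain, uint32_t peer_addr)
{
	if (own_net_id != peer_net_id)
		return false;                    /* network id separates clusters */
	if (legacy_both)
		return demo_in_scope(domain, peer_addr);
	return true;
}
```

Usage: `demo_addr(1, 1, 10)` yields `0x0100100a`, i.e. the legacy address 1.1.10. With the flat format, the same 32 bits are just an integer, and only the network id decides acceptance.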
     

18 Mar, 2018

1 commit

  • Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
    aspects handled the same way, both on the publishing node and on the
    receiving nodes.

    Despite previous ambitions to the contrary, this is never going to change,
    so we accept the consequence and obsolete TIPC_ZONE_SCOPE and related
    macros/functions. Whenever a user attempts a bind() or a sendmsg()
    using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
    remain compatible with users and remote nodes still using ZONE_SCOPE.

    Furthermore, the non-formalized scope value 0 has always been permitted
    for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
    We now permit it even as binding scope, but for compatibility reasons we
    choose to not change the value of TIPC_CLUSTER_SCOPE.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
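The internal translation described above amounts to a small normalization step. The numeric values below are what we understand the uapi header `<linux/tipc.h>` to define for TIPC_ZONE_SCOPE, TIPC_CLUSTER_SCOPE and TIPC_NODE_SCOPE. Treat them, and the hypothetical `demo_normalize_scope()` helper, as illustrative.

```c
/* Scope values as we understand them from <linux/tipc.h>. */
#define DEMO_ZONE_SCOPE    1
#define DEMO_CLUSTER_SCOPE 2
#define DEMO_NODE_SCOPE    3

/* Treat ZONE_SCOPE and the informal value 0 as CLUSTER_SCOPE; keep others.
 * The externally visible value of CLUSTER_SCOPE itself is left unchanged. */
static int demo_normalize_scope(int scope)
{
	if (scope == 0 || scope == DEMO_ZONE_SCOPE)
		return DEMO_CLUSTER_SCOPE;
	return scope;
}
```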
     

15 Feb, 2018

1 commit


14 Apr, 2017

2 commits

  • This is an add-on to the previous patch that passes the extended ACK
    structure where it's already available through existing genl_info or
    extack function arguments.

    This was done with this spatch (with some manual adjustment of
    indentation):

    @@
    expression A, B, C, D, E;
    identifier fn, info;
    @@
    fn(..., struct genl_info *info, ...) {
    ...
    -nlmsg_parse(A, B, C, D, E, NULL)
    +nlmsg_parse(A, B, C, D, E, info->extack)
    ...
    }

    @@
    expression A, B, C, D, E;
    identifier fn, info;
    @@
    fn(..., struct genl_info *info, ...) {
    extack)
    ...>
    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D, E;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {
    ...
    -nlmsg_parse(A, B, C, D, E, NULL)
    +nlmsg_parse(A, B, C, D, E, extack)
    ...
    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C, D;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    @@
    expression A, B, C;
    identifier fn, extack;
    @@
    fn(..., struct netlink_ext_ack *extack, ...) {

    }

    Signed-off-by: Johannes Berg
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Pass the new extended ACK reporting struct to all of the generic
    netlink parsing functions. For now, pass NULL in almost all callers
    (except for some in the core.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

18 Feb, 2017

1 commit

  • There are two problems with the function tipc_sk_reinit. Firstly,
    it's doing a manual walk over an rhashtable. This is broken as
    an rhashtable can be resized and if you manually walk over it
    during a resize then you may miss entries.

    Secondly, it's missing memory barriers, as previously the code used
    spinlocks, which provide the barriers implicitly.

    This patch fixes both problems.

    Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...")
    Signed-off-by: Herbert Xu
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Herbert Xu
     

08 Mar, 2016

1 commit


07 Mar, 2016

1 commit

  • Until now, we have kept a pre-allocated protocol message header
    aggregated into struct tipc_link. Apart from adding unnecessary
    footprint to the link instances, this requires extra code both to
    initialize and re-initialize it.

    We now remove this sub-optimization. This change also makes it
    possible to clean up the function tipc_build_proto_msg() and remove
    a couple of small functions that were accessing the mentioned header.
    In particular, we can replace all occurrences of the local function
    call link_own_addr(link) with the generic tipc_own_addr(net).

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

2 commits

  • The broadcast transmission link is currently instantiated when the
    network subsystem is started, i.e., on order from user space via netlink.

    This forces the broadcast transmission code to do unnecessary tests for
    the existence of the transmission link, in single-node mode as well as
    in network mode.

    In this commit, we instead create the link during initialization of
    the name space, and remove it when it is stopped. The fact that the
    transmission link now has a guaranteed longer life cycle than any of its
    potential clients paves the way for further code simplifications
    and optimizations.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, a number of structure and function definitions related
    to the broadcast functionality are unnecessarily exposed in the file
    bcast.h. This obscures the fact that the external interface towards
    the broadcast link in fact is very narrow, and causes unnecessary
    recompilations of other files when anything changes in those
    definitions.

    In this commit, we move as many of those definitions as is currently
    possible to the file bcast.c.

    We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
    since the name does not correctly describe the contents of this
    struct, and will do so even less in the future, and because we want
    to use the term 'link' more appropriately in the functionality
    introduced later in this series.

    Finally, we rename a couple of functions, such as tipc_bclink_xmit()
    and others that will be kept in the future, to include the term 'bcast'
    instead.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 May, 2015

1 commit

  • When we try to add new inline functions in the code, we sometimes
    run into circular include dependencies.

    The main problem is that the file core.h, which really should be at
    the root of the dependency chain, instead is a leaf. I.e., core.h
    includes a number of header files that themselves should be allowed
    to include core.h. In reality this is unnecessary, because core.h does
    not need to know the full definition of any of the structs it refers to,
    only their type declaration.

    In this commit, we remove all dependencies from core.h towards any
    other tipc header file.

    As a consequence of this change, we can now move the function
    tipc_own_addr(net) from addr.c to addr.h, and make it inline.

    There are no functional changes in this commit.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
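The point about core.h needing only type declarations is the standard C forward-declaration technique. A root header can declare `struct X;` and pass `struct X *` around without including the header that defines it. Only code that dereferences the struct (such as an inline accessor like tipc_own_addr()) needs the full definition. The names below are hypothetical stand-ins, collapsed into one file for illustration.

```c
#include <stdint.h>

/* --- what a root header like core.h can get away with --- */
struct demo_net;                                   /* forward declaration only */
uint32_t demo_own_addr(const struct demo_net *net); /* pointers need no definition */

/* --- what a later header (think addr.h) provides --- */
struct demo_net {
	uint32_t own_addr;
};

/* With the full definition visible, the accessor can be a static inline. */
static inline uint32_t demo_own_addr_inline(const struct demo_net *net)
{
	return net->own_addr;
}

uint32_t demo_own_addr(const struct demo_net *net)
{
	return demo_own_addr_inline(net);
}
```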
     

10 Feb, 2015

3 commits


13 Jan, 2015

7 commits

  • If net namespaces are supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable holding the node address must be moved to the tipc_net
    structure to satisfy the requirement. As a result, users can also
    assign a node address to each namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
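The pattern shared by this series (moving a global into per-namespace state) can be sketched as follows. `struct demo_net` and its accessors are hypothetical. In the kernel, such per-namespace data is typically reached through `net_generic()` rather than passed around directly as it is here.

```c
#include <stdint.h>

/* Before: one global shared by every namespace, e.g.
 *     static uint32_t tipc_own_addr;
 * After: each namespace carries its own copy. */
struct demo_net {
	uint32_t own_addr;
};

static void demo_set_addr(struct demo_net *net, uint32_t addr)
{
	net->own_addr = addr;
}

static uint32_t demo_get_addr(const struct demo_net *net)
{
	return net->own_addr;
}
```

The same shape applies to the name table, socket table, broadcast link and bearer list changes below: functions gain a namespace argument and look up their state there instead of in a global.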
     
  • TIPC name table is used to store the mapping relationship between
    TIPC service name and socket port ID. When tipc supports namespace,
    it allows users to publish service names only owned by a certain
    namespace. Therefore, every namespace must have its private name
    table to prevent service names published to one namespace from being
    contaminated by other service names in another namespace. So the
    name table global variable (i.e., nametbl) and its lock must be
    moved to the tipc_net structure, and a namespace parameter must be
    added to the necessary functions so that they can obtain the name table
    variable defined in the tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
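    A minimal userspace sketch of the isolation this provides. It assumes a flat fixed-size table instead of TIPC's real name table and omits the lock. Each struct tipc_net owns its own table, and the publish/lookup helpers take the namespace state as a parameter, as the commit describes. nt_publish(), nt_lookup(), MAX_PUBL and the field layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PUBL 16                  /* hypothetical fixed capacity */

struct publication {
	uint32_t type, instance;         /* service name */
	uint32_t port;                   /* socket port ID bound to that name */
};

struct name_table {
	struct publication publ[MAX_PUBL];
	size_t count;
	/* the real table also carries its own lock; omitted here */
};

struct tipc_net {
	struct name_table nametbl;       /* one private table per namespace */
};

/* Publish a name in one namespace only; returns 0 on success, -1 if full. */
static int nt_publish(struct tipc_net *tn, uint32_t type, uint32_t inst,
		      uint32_t port)
{
	struct name_table *t = &tn->nametbl;

	if (t->count == MAX_PUBL)
		return -1;
	t->publ[t->count++] = (struct publication){ type, inst, port };
	return 0;
}

/* Look a name up in the given namespace; returns the port, or 0 if absent. */
static uint32_t nt_lookup(const struct tipc_net *tn, uint32_t type,
			  uint32_t inst)
{
	const struct name_table *t = &tn->nametbl;

	for (size_t i = 0; i < t->count; i++)
		if (t->publ[i].type == type && t->publ[i].instance == inst)
			return t->publ[i].port;
	return 0;
}
```

    The same service name can then be bound to different ports in different namespaces without either binding being visible from the other namespace.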
     
  • Currently, the tipc socket table is statically allocated as a global
    variable. Through it, we can look up a socket instance by port ID,
    insert a new socket instance into the table, and delete a socket from
    the table. But when tipc supports net namespaces, each namespace must
    own its own socket table. So the global socket table variable must be
    redefined in the tipc_net structure. As a consequence, a new socket
    table will be allocated when a new namespace is created, and the
    socket table will be deallocated when the namespace is destroyed.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
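    The lifetime rule in this commit (allocate on namespace creation, free on destruction) might look like this in a simplified userspace form. In the kernel, such hooks are normally driven by the init/exit callbacks of struct pernet_operations; here they are plain functions. The table is a direct-indexed array rather than TIPC's real structure, and all names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical socket table: slot i holds the socket bound to port ID i. */
struct sk_table {
	size_t size;
	void **slots;
};

struct tipc_net {
	struct sk_table *sk_tbl;         /* per namespace, no longer a global */
};

/* Namespace creation: allocate its private table. */
static int sk_table_create(struct tipc_net *tn, size_t size)
{
	struct sk_table *t = malloc(sizeof(*t));

	if (!t)
		return -1;
	t->slots = calloc(size, sizeof(*t->slots));
	if (!t->slots) {
		free(t);
		return -1;
	}
	t->size = size;
	tn->sk_tbl = t;
	return 0;
}

/* Namespace destruction: release its table. */
static void sk_table_destroy(struct tipc_net *tn)
{
	free(tn->sk_tbl->slots);
	free(tn->sk_tbl);
	tn->sk_tbl = NULL;
}

static int sk_insert(struct tipc_net *tn, uint32_t port, void *sk)
{
	struct sk_table *t = tn->sk_tbl;

	if (port >= t->size || t->slots[port])
		return -1;                   /* out of range or port already in use */
	t->slots[port] = sk;
	return 0;
}

static void *sk_lookup(struct tipc_net *tn, uint32_t port)
{
	return port < tn->sk_tbl->size ? tn->sk_tbl->slots[port] : NULL;
}

static void sk_delete(struct tipc_net *tn, uint32_t port)
{
	if (port < tn->sk_tbl->size)
		tn->sk_tbl->slots[port] = NULL;
}
```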
     
  • The TIPC broadcast link is statically established, and its relevant
    state is maintained in the global variables "bcbearer", "bclink" and
    "bcl". To allow different namespaces to own different broadcast link
    instances, these variables must be moved to the tipc_net structure,
    and broadcast link instances will be allocated and initialized when
    a namespace is created.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The bearer list, defined as a global variable, is used to store
    bearer instances. When tipc supports net namespaces, bearers created
    in one namespace must be isolated from those created in other
    namespaces, which requires the bearer list (bearer_list) to be moved
    to the tipc_net structure. As a result, a net namespace pointer has
    to be passed to the functions that access the bearer list.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The global variables associated with the node table are:
    - node hash table (node_htable)
    - node list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make the node table support namespaces, the above global variables
    must be moved to the tipc_net structure so that they are private to
    each namespace. As a consequence, these variables are allocated and
    initialized when a namespace is created, and deallocated when the
    namespace is destroyed. After the change, functions associated with
    these variables have to use a namespace pointer to access them, so
    adding a namespace pointer parameter to these functions is the major
    change made in this commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
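    A simplified userspace sketch of the resulting layout: the hash table and the list that used to be globals become members of struct tipc_net, and lookup and creation take that struct as a parameter. The table size, the masking hash and the helper names node_find()/node_create() are assumptions, and the lock and link counter are left out.

```c
#include <stdint.h>
#include <stdlib.h>

#define NODE_HTABLE_SIZE 512         /* assumed power of two, so masking works */

struct tipc_node {
	uint32_t addr;
	struct tipc_node *hash_next;     /* next node in the same hash bucket */
	struct tipc_node *list_next;     /* next node in the namespace's node list */
};

/* Former globals, now one instance per namespace. */
struct tipc_net {
	struct tipc_node *node_htable[NODE_HTABLE_SIZE]; /* lookup by address */
	struct tipc_node *node_list;                     /* walk all nodes */
	unsigned int num_nodes;
	/* node_list_lock and the link counter are omitted in this sketch */
};

static unsigned int node_hash(uint32_t addr)
{
	return addr & (NODE_HTABLE_SIZE - 1);
}

/* Every accessor now takes the namespace state as its first argument. */
static struct tipc_node *node_find(struct tipc_net *tn, uint32_t addr)
{
	struct tipc_node *n;

	for (n = tn->node_htable[node_hash(addr)]; n; n = n->hash_next)
		if (n->addr == addr)
			return n;
	return NULL;
}

static struct tipc_node *node_create(struct tipc_net *tn, uint32_t addr)
{
	struct tipc_node *n = node_find(tn, addr);
	unsigned int h = node_hash(addr);

	if (n)
		return n;                    /* already known in this namespace */
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->addr = addr;
	n->hash_next = tn->node_htable[h];   /* link into the hash bucket */
	tn->node_htable[h] = n;
	n->list_next = tn->node_list;        /* and into the node list */
	tn->node_list = n;
	tn->num_nodes++;
	return n;
}
```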
     
  • Involve the namespace infrastructure: make the "tipc_net_id" global
    variable per-namespace, and rename it to "net_id". For this
    conversion to succeed, an instance of the network namespace must be
    passed to the relevant functions, allowing them to access the
    per-namespace "net_id" variable.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
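    What "passing the namespace instead of reading a global" means in practice can be sketched in userspace. Here struct net is a stand-in with one private-data slot per subsystem, loosely modelled on the kernel's net_generic() mechanism. pernet_slot, the init/exit functions and the getter/setter are hypothetical names, not TIPC's; only the net_id field name comes from the commit message.

```c
#include <stdlib.h>

#define MAX_PERNET_SLOTS 4           /* hypothetical capacity */

/* Stand-in for the kernel's struct net: one private-data slot per subsystem. */
struct net {
	void *gen[MAX_PERNET_SLOTS];
};

/* Per-namespace tipc state; net_id used to be a global variable. */
struct tipc_net {
	int net_id;                      /* TIPC network identity of this namespace */
};

static unsigned int pernet_slot;     /* hypothetical: slot index assigned to tipc */

/* Namespace creation: allocate this namespace's tipc state. */
static int tipc_init_net_sketch(struct net *net, int default_id)
{
	struct tipc_net *tn = calloc(1, sizeof(*tn));

	if (!tn)
		return -1;
	tn->net_id = default_id;
	net->gen[pernet_slot] = tn;
	return 0;
}

/* Namespace destruction: release it again. */
static void tipc_exit_net_sketch(struct net *net)
{
	free(net->gen[pernet_slot]);
	net->gen[pernet_slot] = NULL;
}

/* Callers now pass the namespace instead of reading a global. */
static int tipc_get_net_id(struct net *net)
{
	struct tipc_net *tn = net->gen[pernet_slot];

	return tn->net_id;
}

static void tipc_set_net_id(struct net *net, int id)
{
	struct tipc_net *tn = net->gen[pernet_slot];

	tn->net_id = id;
}
```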
     

22 Nov, 2014

2 commits

  • Add TIPC_NL_NET_SET command to the new tipc netlink API.

    This command can set the network id and network (tipc) address.

    Netlink logical layout of network set message:
    -> net
    [ -> id ]
    [ -> address ]

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
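    The nested layout above (a net attribute optionally containing id and address) can be illustrated with a hand-rolled encoder. This is a simplified sketch. The attribute header mirrors the shape of struct nlattr (16-bit length, 16-bit type, payload padded to 4 bytes), but the ATTR_* constants are placeholders rather than the real TIPC_NLA_* values, and a real client would normally use a netlink library.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder attribute types; the real values live in linux/tipc_netlink.h. */
enum { ATTR_NET = 1 };
enum { ATTR_NET_ID = 1, ATTR_NET_ADDR = 2 };

#define ALIGN4(len) (((len) + 3u) & ~3u)

struct attr_hdr {                    /* same shape as struct nlattr */
	uint16_t len;                    /* header + payload, without padding */
	uint16_t type;
};

/* Append a u32 attribute at buf + off; returns the next aligned offset. */
static size_t put_u32(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct attr_hdr h = { (uint16_t)(sizeof(h) + sizeof(val)), type };

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), &val, sizeof(val));
	return off + ALIGN4(h.len);
}

/* Build "-> net [-> id] [-> address]"; NULL skips an optional child. */
static size_t build_net_set(uint8_t *buf, const uint32_t *id,
			    const uint32_t *addr)
{
	struct attr_hdr nest = { 0, ATTR_NET };
	size_t off = sizeof(nest);       /* leave room for the nest header */

	if (id)
		off = put_u32(buf, off, ATTR_NET_ID, *id);
	if (addr)
		off = put_u32(buf, off, ATTR_NET_ADDR, *addr);
	nest.len = (uint16_t)off;        /* nest length covers all children */
	memcpy(buf, &nest, sizeof(nest));
	return off;
}
```

    The reply described in the next entry has the same nested shape, with only the id child present.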
     
  • Add TIPC_NL_NET_GET command to the new tipc netlink API.

    This command dumps the network id of the node.

    Netlink logical layout of returned network data:
    -> net
    -> id

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     

24 Aug, 2014

2 commits

  • We move the inline functions in the file port.h to socket.c, and modify
    their names accordingly.

    We move struct tipc_port and some macros to socket.h.

    Finally, we remove the file port.h.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The functions tipc_port_get_ports() and tipc_port_reinit() scan over
    all sockets/ports to access each of them. This is done by using a
    dedicated linked list, 'tipc_socks' where all sockets are members. The
    list is in turn protected by a spinlock, 'port_list_lock', while each
    socket is locked by using port_lock at the moment of access.

    In order to reduce complexity and risk of deadlock, we want to get
    rid of the linked list and the accompanying spinlock.

    This is what we do in this commit. Instead of the linked list, we use
    the port registry to scan across the sockets. We also add usage of
    bh_lock_sock() inside the scope of port_lock in both functions, as a
    preparation for the complete removal of port_lock.

    Finally, we move the functions from port.c to socket.c, and rename them
    to tipc_sk_sock_show() and tipc_sk_reinit() respectively.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
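    The change replaces a walk over a dedicated list with a walk over the port registry. The following is a simplified userspace sketch. It assumes a fixed-size array as the registry (TIPC's real registry differs), and all names are hypothetical. Every occupied slot is a socket, so no separate tipc_socks list or port_list_lock is needed to enumerate them.

```c
#include <stddef.h>
#include <stdint.h>

#define REGISTRY_SIZE 64             /* hypothetical registry capacity */

struct sk_sketch {
	uint32_t portid;
	uint32_t own_node;               /* hypothetical cached own-node address */
};

/* The port registry already indexes every socket, so no extra list is kept. */
struct port_registry {
	struct sk_sketch *entries[REGISTRY_SIZE];
};

/* Visit every registered socket, in the spirit of tipc_sk_reinit(). */
static size_t sk_reinit_sketch(struct port_registry *reg, uint32_t new_addr)
{
	size_t visited = 0;

	for (size_t i = 0; i < REGISTRY_SIZE; i++) {
		struct sk_sketch *sk = reg->entries[i];

		if (!sk)
			continue;            /* empty registry slot */
		/* per the commit, a per-socket lock (bh_lock_sock) is taken here */
		sk->own_node = new_addr;
		visited++;
	}
	return visited;
}
```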