09 Apr, 2018

1 commit

  • Commit 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
    tried to fix the crash but failed; the crash is still 100% reproducible
    with it applied.

    In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, so it
    is not correct to retrieve its NETLINK_CB(). Instead, like other protocol
    diag implementations, we should use NETLINK_CB(cb->skb).sk here.

    Reported-by:
    Fixes: 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
    Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
    Cc: GhantaKrishnamurthy MohanKrishna
    Cc: Jon Maloy
    Cc: Ying Xue
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

04 Apr, 2018

2 commits

  • To fetch UID info for socket diagnostics, we determine the
    namespace of user context using tipc socket instance. This
    may cause namespace violation, as the kernel will remap based
    on UID.

    We fix this by fetching namespace info using the calling userspace
    netlink socket.

    Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
    Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
    Acked-by: Jon Maloy
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     
  • When an item of struct tipc_subscription is created, we fail to
    initialize the two lists aggregated into the struct. This has so far
    never been a problem, since the items are just added to a root
    object by list_add(), which does not require the addee list to be
    pre-initialized. However, syzbot is provoking situations where this
    addition fails, whereupon the attempted removal of the item from
    the list causes a crash.

    This problem seems to have always been around, even though the code
    for creating this object was rewritten in commit 242e82cc95f6 ("tipc:
    collapse subscription creation functions"), which is still in net-next.

    We fix this for that commit by initializing the two lists properly.
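
    The failure mode can be illustrated in userspace. This is a minimal
    sketch, not the kernel's list.h: struct list_node, struct sub_sketch and
    its two member names are hypothetical stand-ins. It shows why adding
    never needs a pre-initialized item while removal does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for a circular doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

/* Make a node point at itself: an empty, self-consistent list. */
static void list_node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert 'item' right after 'head'. Only 'head' must be valid beforehand. */
static void list_node_add(struct list_node *item, struct list_node *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/*
 * Unlink 'item'. This dereferences item->next and item->prev, so a node
 * that was zero-allocated and never added or initialized would crash here.
 */
static void list_node_del(struct list_node *item)
{
    assert(item->next != NULL && item->prev != NULL);
    item->next->prev = item->prev;
    item->prev->next = item->next;
    list_node_init(item);
}

static bool list_node_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Hypothetical subscription-like object aggregating two lists. */
struct sub_sketch {
    struct list_node list_a;
    struct list_node list_b;
};

/* Initializing both lists at creation makes every error path safe. */
static void sub_sketch_init(struct sub_sketch *s)
{
    list_node_init(&s->list_a);
    list_node_init(&s->list_b);
}
```

    With sub_sketch_init() called at creation time, list_node_del() on an
    item that was never added is a harmless no-op.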

    Fixes: 242e82cc95f6 ("tipc: collapse subscription creation functions")
    Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

01 Apr, 2018

4 commits

  • gcc points out that the combined length of the fixed-length inputs to
    l->name is larger than the destination buffer size:

    net/tipc/link.c: In function 'tipc_link_create':
    net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
    into a region of size between 26 and 58 [-Werror=format-overflow=]
    sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);

    net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
    (assuming 75) into a destination of size 60
    sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);

    A detailed analysis reveals that the theoretical maximum length of
    a link name is:
    max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
    16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
    Since we also need space for a trailing zero we now set MAX_LINK_NAME
    to 68.

    Just to be on the safe side we also replace the sprintf() call with
    snprintf().
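
    The arithmetic and the bounded formatting can be checked with a small
    userspace sketch. The SKETCH_* names and sketch_link_name() are
    hypothetical; only the component sizes, the format string and the value
    68 come from the text above.

```c
#include <stdio.h>

/* Component limits from the analysis above, excluding terminators. */
enum {
    SKETCH_MAX_ADDR_STR  = 16,
    SKETCH_MAX_IF_NAME   = 15,
    /* self_str ':' if_name '-' peer_str ':' if_name */
    SKETCH_MAX_LINK_LEN  = SKETCH_MAX_ADDR_STR + 1 + SKETCH_MAX_IF_NAME + 1 +
                           SKETCH_MAX_ADDR_STR + 1 + SKETCH_MAX_IF_NAME,
    SKETCH_MAX_LINK_NAME = 68
};

/*
 * Format a link name into 'buf' without ever writing past 'size' bytes.
 * Returns the length the full name would have had, as snprintf() does,
 * so the caller can detect truncation by comparing against 'size'.
 */
static int sketch_link_name(char *buf, size_t size, const char *self_str,
                            const char *if_name, const char *peer_str)
{
    return snprintf(buf, size, "%s:%s-%s:unknown",
                    self_str, if_name, peer_str);
}
```

    A return value greater than or equal to the buffer size means the name
    was truncated, which sprintf() could not have reported.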

    Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address
    hash values")
    Reported-by: Arnd Bergmann

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • With the new RB tree structure for service ranges it becomes possible to
    solve an old problem: we can now allow overlapping service ranges in
    the table.

    When inserting a new service range to the tree, we use 'lower' as primary
    key, and when necessary 'upper' as secondary key.

    Since there may now be multiple service ranges matching an indicated
    'lower' value, we must also add the 'upper' value to the functions
    used for removing publications, so that the correct, corresponding
    range item can be found.

    These changes guarantee that a well-formed publication/withdrawal item
    from a peer node never will be rejected, and make it possible to
    eliminate the problematic backlog functionality we currently have for
    handling such cases.
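
    The two-level ordering can be sketched as a comparison function. This
    is an illustration only: struct range_key and both helpers are
    hypothetical, and the actual tree code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key identifying one service range in the tree. */
struct range_key {
    uint32_t lower;
    uint32_t upper;
};

/* Build a key, insisting on a well-formed range. */
static struct range_key range_key_make(uint32_t lower, uint32_t upper)
{
    struct range_key k = { lower, upper };

    assert(lower <= upper);
    return k;
}

/*
 * Order by 'lower' first; 'upper' only breaks ties.
 * Returns a negative value, zero or a positive value like memcmp().
 */
static int range_key_cmp(const struct range_key *a, const struct range_key *b)
{
    if (a->lower != b->lower)
        return a->lower < b->lower ? -1 : 1;
    if (a->upper != b->upper)
        return a->upper < b->upper ? -1 : 1;
    return 0;
}
```

    Two overlapping ranges such as 10-20 and 10-30 compare as different
    keys, which is why removal needs 'upper' as well as 'lower' to find the
    right item.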

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The tipc_nametbl_translate() function is ugly and hard to
    follow. This can be improved somewhat by introducing a stack variable
    for holding the publication list to be used and re-ordering the if-
    clauses for selection of algorithm.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The current design of the binding table has an unnecessarily
    memory-consuming and complex data structure. It aggregates the service
    range items into an array, which is expanded by a factor of two every time it
    becomes too small to hold a new item. Furthermore, the arrays never
    shrink when the number of ranges diminishes.

    We now replace this array with an RB tree that is holding the range
    items as tree nodes, each range directly holding a list of bindings.

    This, along with a few name changes, improves both readability and
    volume of the code, as well as reducing memory consumption and hopefully
    improving cache hit rate.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

28 Mar, 2018

1 commit


27 Mar, 2018

2 commits


26 Mar, 2018

1 commit


24 Mar, 2018

8 commits

  • Selecting and explicitly configuring a TIPC node identity may be
    unwanted in some cases.

    In this commit we introduce a default setting if the identity has not
    been set at the moment the first bearer is enabled. We do this by
    using a raw copy of a unique identifier from the used interface: MAC
    address in the case of an L2 bearer, IPv4/IPv6 address in the case
    of a UDP bearer.
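
    A raw copy of this kind might look as follows. This is a sketch under
    assumptions: the 16-byte identity size is taken from the 128-bit node
    identity described elsewhere in this log, and the zero padding and the
    function name are illustrative choices, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NODE_ID_LEN 16 /* 128-bit identity, assumed size */

/*
 * Copy a raw interface identifier (6-byte MAC, 4-byte IPv4 or 16-byte
 * IPv6 address) into a zero-padded node identity buffer.
 */
static void sketch_default_node_id(uint8_t id[SKETCH_NODE_ID_LEN],
                                   const uint8_t *raw, size_t raw_len)
{
    assert(raw_len <= SKETCH_NODE_ID_LEN);
    memset(id, 0, SKETCH_NODE_ID_LEN);
    memcpy(id, raw, raw_len);
}
```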

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but do
    instead initiate a 1 sec trial period to allow other cluster members
    to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node is receiving such a message, it must check that the
    presented 32-bit identifier either is unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.
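
    The receiving side of the rules above can be sketched as a pure
    decision function. Everything here is hypothetical scaffolding: the
    table layout, the names, the priority given to the "same node, other
    address" case, and the way an unused address is picked (simple
    increment) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ID_LEN 16

/* One peer this node already knows about. Addresses are assumed unique. */
struct known_node {
    uint8_t id[SKETCH_ID_LEN];
    uint32_t addr;
};

enum trial_verdict { TRIAL_ACCEPT, TRIAL_FAIL };

/* TRIAL_ACCEPT means "stay silent"; TRIAL_FAIL carries a suggestion. */
struct trial_reply {
    enum trial_verdict verdict;
    uint32_t suggested;
};

static bool addr_in_use(const struct known_node *tbl, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return true;
    return false;
}

/* Assumption: probe upwards from the contested address until one is free. */
static uint32_t pick_unused_addr(const struct known_node *tbl, size_t n,
                                 uint32_t start)
{
    uint32_t candidate = start + 1;

    while (addr_in_use(tbl, n, candidate))
        candidate++;
    return candidate;
}

static struct trial_reply check_trial(const struct known_node *tbl, size_t n,
                                      const uint8_t *id, uint32_t trial_addr)
{
    struct trial_reply reply = { TRIAL_ACCEPT, trial_addr };
    const struct known_node *same_id = NULL;
    const struct known_node *same_addr = NULL;

    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].id, id, SKETCH_ID_LEN) == 0)
            same_id = &tbl[i];
        if (tbl[i].addr == trial_addr)
            same_addr = &tbl[i];
    }

    if (same_id && same_id->addr != trial_addr) {
        /* Same peer was up before with another address: suggest that one. */
        reply.verdict = TRIAL_FAIL;
        reply.suggested = same_id->addr;
    } else if (same_addr && same_addr != same_id) {
        /* Address belongs to a different node: suggest an unused one. */
        reply.verdict = TRIAL_FAIL;
        reply.suggested = pick_unused_addr(tbl, n, trial_addr);
    }
    return reply;
}
```

    A requester receiving TRIAL_FAIL would adopt the suggested value as its
    new trial address and restart the trial period.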

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We add a 128-bit node identity, as an alternative to the currently used
    32-bit node address.

    For the sake of compatibility and to minimize message header changes
    we retain the existing 32-bit address field. When not set explicitly by
    the user, this field will be filled with a hash value generated from the
    much longer node identity, and be used as a shorthand value for the
    latter.

    We permit either the address or the identity to be set by configuration,
    but not both, so when the address value is set by a legacy user the
    corresponding 128-bit node identity is generated based on that value.
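
    To illustrate how a 128-bit identity can be reduced to a 32-bit
    shorthand, here is a simple XOR-folding sketch. It is not claimed to be
    the hash this commit uses; the function name and the folding scheme are
    assumptions chosen only to show the idea and why collisions can occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 16-byte identity into 32 bits by XOR-ing its four 32-bit words. */
static uint32_t sketch_id_to_addr(const uint8_t id[16])
{
    uint32_t hash = 0;

    assert(id != NULL);
    for (int i = 0; i < 16; i += 4) {
        uint32_t word = (uint32_t)id[i] << 24 | (uint32_t)id[i + 1] << 16 |
                        (uint32_t)id[i + 2] << 8 | (uint32_t)id[i + 3];
        hash ^= word;
    }
    return hash;
}
```

    Any reduction from 128 to 32 bits maps many identities to the same
    address; with this fold, moving the same four bytes to another word
    position already yields an identical result.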

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a preparation to changing the addressing structure of TIPC we replace
    all direct accesses to the tipc_net::own_addr field with the function
    dedicated for this, tipc_own_addr().

    There are no changes to program logic in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The removal of an internal structure of the node address has an unwanted
    side effect.
    - Currently, if a user is sending an anycast message with destination
    domain 0, the tipc_nametbl_translate() function will use the
    'closest-first' algorithm to first look for a node local destination,
    and only when no such destination is found will it resort to the
    cluster global 'round-robin' lookup algorithm.
    - Current users can get around this, and enforce unconditional use of
    global round-robin by indicating a destination as Z.0.0 or Z.C.0.
    - This option disappears when we make the node address flat, since the
    lookup algorithm has no way of recognizing this case. So, as long as
    there are node local destinations, the algorithm will always select
    one of those, and there is nothing the sender can do to change this.

    We solve this by eliminating the 'closest-first' option, which was never
    a good idea anyway, for non-legacy users, but only for those. To
    distinguish between legacy users and non-legacy users we introduce a new
    flag 'legacy_addr_format' in struct tipc_core, to be set when the user
    configures a legacy-style Z.C.N node address. Hence, when a legacy user
    indicates a zero lookup domain 'closest-first' is selected, and in all
    other cases we use 'round-robin'.
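
    The resulting selection rule is small enough to state as code. A
    minimal sketch, with hypothetical enum and function names; only the
    'legacy_addr_format' flag and the zero-domain condition come from the
    description above.

```c
#include <stdbool.h>
#include <stdint.h>

enum lookup_algo { LOOKUP_CLOSEST_FIRST, LOOKUP_ROUND_ROBIN };

/* Closest-first survives only for legacy users asking for domain 0. */
static enum lookup_algo select_lookup(bool legacy_addr_format, uint32_t domain)
{
    if (legacy_addr_format && domain == 0)
        return LOOKUP_CLOSEST_FIRST;
    return LOOKUP_ROUND_ROBIN;
}
```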

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Nominally, TIPC organizes network nodes into a three-level network
    hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
    hierarchy is reflected in the node address format: it is sub-divided
    into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.

    However, the 'zone' and 'cluster' levels have in reality never been
    fully implemented, and never will be. The result of this has been
    that the first 20 bits of the node identity structure have been wasted,
    and the usable node identity range within a cluster has been limited
    to 12 bits. This is starting to become a problem.

    In the following commits, we will need to be able to connect between
    nodes which are using the whole 32-bit value space of the node address.
    We therefore remove the restrictions on which values can be assigned
    to node identity: it is from now on only a 32-bit integer with no
    assumed internal structure.
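
    For reference, the legacy 8/12/12 split described above can be sketched
    with shifts and masks. The helper names are hypothetical, and placing
    the zone in the most significant bits is an assumption consistent with
    "the first 20 bits" being zone and cluster.

```c
#include <assert.h>
#include <stdint.h>

/* Top 8 bits: zone id. */
static uint32_t legacy_zone(uint32_t addr)
{
    return addr >> 24;
}

/* Middle 12 bits: cluster id. */
static uint32_t legacy_cluster(uint32_t addr)
{
    return (addr >> 12) & 0xfffu;
}

/* Low 12 bits: node id, i.e. at most 4096 values per cluster. */
static uint32_t legacy_node(uint32_t addr)
{
    return addr & 0xfffu;
}

/* Compose a Z.C.N address; each part must fit its bit field. */
static uint32_t legacy_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
    assert(zone <= 0xffu && cluster <= 0xfffu && node <= 0xfffu);
    return zone << 24 | cluster << 12 | node;
}
```

    With a flat address none of these helpers apply: all 32 bits simply
    identify the node.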

    Isolation between clusters is now achieved only by setting different
    values for the 'network id' field used during neighbor discovery, in
    practice leading to the latter becoming the new cluster identity.

    The rules for accepting discovery requests/responses from neighboring
    nodes now become:

    - If the user is using legacy address format on both peers, reception
    of discovery messages is subject to the legacy lookup domain check
    in addition to the cluster id check.

    - Otherwise, the discovery request/response is always accepted, provided
    both peers have the same network id.
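
    These two rules can be sketched as a predicate. The function and
    parameter names are hypothetical, and the legacy lookup domain check is
    represented only by its boolean outcome, since its details are not
    given here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a discovery request/response from a peer is accepted.
 * 'in_legacy_domain' is the result of the legacy lookup domain check and
 * is consulted only when both peers use the legacy address format.
 */
static bool accept_discovery(uint32_t own_net_id, uint32_t peer_net_id,
                             bool own_legacy, bool peer_legacy,
                             bool in_legacy_domain)
{
    if (own_net_id != peer_net_id)
        return false;
    if (own_legacy && peer_legacy)
        return in_legacy_domain;
    return true;
}
```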

    This secures backwards compatibility for users who have been using zone
    or cluster identities as cluster separators, instead of the intended
    'network id'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • To facilitate the coming changes in the neighbor discovery functionality
    we rename and refactor parts of that code. The functional changes
    in this commit are trivial, e.g., that we move the message sending call in
    tipc_disc_timeout() outside the spinlock protected region.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a preparation for the next commits we try to reduce the footprint of
    the function tipc_enable_bearer(), while hopefully making it simpler to
    follow.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

23 Mar, 2018

3 commits

  • Currently when tipc is unable to queue a received message on a
    socket, the message is rejected back to the sender with error
    TIPC_ERR_OVERLOAD. However, the application on this socket
    has no knowledge about these discards.

    In this commit, we increment the sk_drops counter when tipc
    is unable to queue a received message, and export sk_drops
    using tipc socket diagnostics.

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     
  • This commit adds socket diagnostics capability for AF_TIPC in netlink
    family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).

    The following are key design considerations:
    - config TIPC_DIAG has default y, like INET_DIAG.
    - only requests with flag NLM_F_DUMP are supported (dump all).
    - tipc_sock_diag_req message is introduced to send filter parameters.
    - the response attributes are TLVs, some nested.

    To avoid exposing data structures between diag and tipc modules and
    avoid code duplication, the following additions are required:
    - export tipc_nl_sk_walk function to reuse socket iterator.
    - export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
    - create a sock_diag response message in __tipc_add_sock_diag defined
    in diag.c and use the above exported tipc_sk_fill_sock_diag
    to fill response.

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     
  • The current socket iterator function tipc_nl_sk_dump, handles socket
    locks and calls __tipc_nl_add_sk for each socket.
    To reuse this logic in sock_diag implementation, we do minor
    modifications to make these functions generic as described below.

    In this commit, we add two new functions __tipc_nl_sk_walk,
    __tipc_nl_add_sk_info and modify tipc_nl_sk_dump, __tipc_nl_add_sk
    accordingly.

    In __tipc_nl_sk_walk we:
    1. acquire and release socket locks
    2. for each socket, execute the specified callback function

    In __tipc_nl_add_sk we:
    - Move the netlink attribute insertion to __tipc_nl_add_sk_info.

    tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument.

    sock_diag will use these generic functions in a later commit.
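
    The shape of such a generic walker can be sketched in userspace. All
    names below are hypothetical, and the 'locked' flag merely stands in
    for taking and releasing the real socket lock around the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket: 'locked' models whether the socket lock is currently held. */
struct sketch_sock {
    int portid;
    bool locked;
};

typedef int (*sk_handler_t)(struct sketch_sock *sk, void *arg);

/*
 * Visit every socket, holding its lock for the duration of the callback.
 * Stops at the first callback that returns non-zero and propagates it.
 */
static int sketch_sk_walk(struct sketch_sock *socks, size_t n,
                          sk_handler_t handler, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        int err;

        socks[i].locked = true;
        err = handler(&socks[i], arg);
        socks[i].locked = false;
        if (err)
            return err;
    }
    return 0;
}

/* Example callback: count sockets, relying on the walker for locking. */
static int count_cb(struct sketch_sock *sk, void *arg)
{
    assert(sk->locked);
    ++*(int *)arg;
    return 0;
}
```

    A dump function and a diagnostics function can then share the walker
    and differ only in the callback they pass.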

    There is no functional change in this commit.

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: Parthasarathy Bhuvaragan
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     

18 Mar, 2018

5 commits

  • We rename some lists and fields in struct publication both to make
    the naming more consistent and to better reflect their roles. We
    also update the descriptions of those lists.

    node_list -> local_publ
    cluster_list -> all_publ
    pport_list -> binding_sock
    ref -> port

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The size of struct publication can be reduced further. Membership in
    lists 'nodesub_list' and 'local_list' is mutually exclusive, in that
    remote publications use the former and local publications the latter.
    We replace the two lists with one single, named 'binding_node' which
    reflects what it really is.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a further consequence of the previous commits, we can also remove
    the member 'zone_list' in struct name_info and struct publication.
    Instead, we now let the member cluster_list take over the role of a
    container of all publications of a given .
    We also remove the counters for the size of those lists, since
    they don't serve any purpose.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a consequence of the previous commit we can now eliminate zone scope
    related lists in the name table. We start with name_table::publ_list[3],
    which can now be replaced with two lists, one for node scope publications
    and one for cluster scope publications.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
    aspects handled the same way, both on the publishing node and on the
    receiving nodes.

    Despite previous ambitions to the contrary, this is never going to change,
    so we take the consequence of this and obsolete TIPC_ZONE_SCOPE and related
    macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt
    using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
    remain compatible with users and remote nodes still using ZONE_SCOPE.

    Furthermore, the non-formalized scope value 0 has always been permitted
    for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
    We now permit it even as binding scope, but for compatibility reasons we
    choose to not change the value of TIPC_CLUSTER_SCOPE.
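
    The internal translation can be sketched as a one-line normalization.
    The SKETCH_* constants and their numeric values are illustrative
    assumptions, as is the function name; the mapping itself follows the
    description above.

```c
/* Illustrative scope values; the real constants live in the TIPC uapi header. */
enum {
    SKETCH_ZONE_SCOPE    = 1,
    SKETCH_CLUSTER_SCOPE = 2,
    SKETCH_NODE_SCOPE    = 3
};

/*
 * Map the obsolete zone scope and the informal value 0 to cluster scope.
 * Every other value is passed through unchanged.
 */
static int sketch_normalize_scope(int scope)
{
    if (scope == 0 || scope == SKETCH_ZONE_SCOPE)
        return SKETCH_CLUSTER_SCOPE;
    return scope;
}
```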

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

13 Mar, 2018

1 commit

  • TIPC appears to be self-contained, and other pernet_operations
    do not seem to touch its entities.

    tipc_net_ops appear to be divided per net, so they should be safe
    to execute in parallel for several nets at the same time.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

08 Mar, 2018

1 commit


06 Mar, 2018

1 commit


28 Feb, 2018

1 commit

  • In commit 60c253069632 ("tipc: fix race between poll() and
    setsockopt()") we introduced a pointer from struct tipc_group to the
    'group_is_connected' flag in struct tipc_sock, so that this field can
    be checked without dereferencing the group pointer of the latter struct.

    The initial value for this flag is correctly set to 'false' when a
    group is created, but we miss the case when no group is created at
    all, in which case the initial value should be 'true'. This has the
    effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
    POLLOUT if they request so.

    This commit corrects this bug.

    Fixes: 60c253069632 ("tipc: fix race between poll() and setsockopt()")
    Reported-by: Hoang Le
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

20 Feb, 2018

3 commits

  • David S. Miller
     
  • syzbot reported a scheduling while atomic issue at netns
    destruction time:

    BUG: sleeping function called from invalid context at net/core/sock.c:2769
    in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3
    5 locks held by kworker/u4:3/85:
    #0: ((wq_completion)"%s""netns"){+.+.}, at: []
    process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
    #1: (net_cleanup_work){+.+.}, at: []
    process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
    #2: (net_sem){++++}, at: [] cleanup_net+0x23f/0xd20
    net/core/net_namespace.c:494
    #3: (net_mutex){+.+.}, at: [] cleanup_net+0xa7d/0xd20
    net/core/net_namespace.c:496
    #4: (&(&srv->idr_lock)->rlock){+...}, at: []
    spin_lock_bh include/linux/spinlock.h:315 [inline]
    #4: (&(&srv->idr_lock)->rlock){+...}, at: []
    tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685
    CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Workqueue: netns cleanup_net
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128
    __might_sleep+0x95/0x190 kernel/sched/core.c:6081
    lock_sock_nested+0x37/0x110 net/core/sock.c:2769
    lock_sock include/net/sock.h:1463 [inline]
    tipc_release+0x103/0xff0 net/tipc/socket.c:572
    sock_release+0x8d/0x1e0 net/socket.c:594
    tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696
    tipc_exit_net+0x15/0x40 net/tipc/core.c:96
    ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148
    cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529
    process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
    worker_thread+0x223/0x1990 kernel/workqueue.c:2247
    kthread+0x33c/0x400 kernel/kthread.c:238
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429

    This is caused by tipc_topsrv_stop() releasing the listener socket
    with the idr lock held. This changeset addresses the issue by moving
    the release operation outside that lock.

    Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com
    Fixes: 0ef897be12b8 ("tipc: separate topology server listener socket from subcsriber sockets")
    Signed-off-by: Paolo Abeni
    Acked-by: ///jon
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we
    re-introduced an old bug on the error path in the function
    tipc_topsrv_kern_subscr(). We now re-introduce the correction too.

    Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Feb, 2018

6 commits

  • We rename struct tipc_server to struct tipc_topsrv. This reflects its now
    specialized role as topology server. Accordingly, we change or add function
    prefixes to make it clearer which functionality those belong to.

    There are no functional changes in this commit.

    Acked-by: Ying.Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We move the listener socket to struct tipc_server and give it its own
    work item. This makes it easier to follow the code, and entails some
    simplifications in the reception code in subscriber sockets.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • In order to narrow the interface and dependencies between the topology
    server and the subscription/binding table functionality we move struct
    tipc_server inside the file server.c. This requires some code
    adaptations in other files, but those are mostly minor.

    The most important change is that we have to move the start/stop
    functions for the topology server to server.c, where they logically
    belong anyway.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Since we now have removed struct tipc_subscriber from the code, and
    only struct tipc_subscription remains, there is no longer need for long
    and awkward prefixes to distinguish between their pertaining functions.

    We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is
    a purely cosmetic change.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • After the previous changes it becomes logical to collapse the two-level
    creation of subscription instances into one. We do that here.

    We also rename the creation and deletion functions for more consistency.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Because of the requirement for total distribution transparency, users
    send subscriptions and receive topology events in their own host format.
    It is up to the topology server to determine this format and do the
    correct conversions to and from its own host format when needed.

    Until now, this has been handled in a rather non-transparent way inside
    the topology server and subscriber code, leading to unnecessary
    complexity when creating subscriptions and issuing events.

    We now improve this situation by adding two new macros, tipc_sub_read()
    and tipc_evt_write(). Both macros calculate the need for
    conversion internally before performing their respective operations.
    Hence, all handling of such conversions becomes transparent to the rest
    of the code.
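
    The idea of conversion-on-access can be sketched with plain functions.
    This is not the macros' actual definition: all sketch_* names are
    hypothetical, and detecting the peer's byte order from a flags field
    that always has at least one known low bit set is an assumption made
    for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t sketch_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Assumption: the flags field always has at least one bit of
 * 'known_flags_mask' set when read in the right byte order, so finding
 * none of them means the sender used the opposite byte order.
 */
static bool sketch_need_swap(uint32_t raw_flags, uint32_t known_flags_mask)
{
    return (raw_flags & known_flags_mask) == 0;
}

/* Read a subscription field, converting from the user's format if needed. */
static uint32_t sketch_sub_read(uint32_t raw_field, bool swap)
{
    return swap ? sketch_swab32(raw_field) : raw_field;
}

/* Produce an event field in the user's format from a host-order value. */
static uint32_t sketch_evt_write(uint32_t host_value, bool swap)
{
    return swap ? sketch_swab32(host_value) : host_value;
}
```

    Callers never test the byte order themselves; they pass the swap
    decision along and always work with host-order values.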

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy