23 Oct, 2018

1 commit

  • We have seen the following race scenario:
    1) named_distribute() builds a "bulk" message, containing a PUBLISH
    item for a certain publication. This is based on the contents of
    the binding tables's 'cluster_scope' list.
    2) tipc_named_withdraw() removes the same publication from the list,
    bulds a WITHDRAW message and distributes it to all cluster nodes.
    3) tipc_named_node_up(), which was calling named_distribute(), sends
    out the bulk message built under 1)
    4) The WITHDRAW message arrives at the just detected node, finds
    no corresponding publication, and is dropped.
    5) The PUBLISH item arrives at the same node, is added to its binding
    table, and remains there forever.

    This arrival disordering was earlier taken care of by the backlog queue,
    originally added for a different purpose, which was removed in the
    commit referred to below, but we now need a different solution.
    In this commit, we replace the rcu lock protecting the 'cluster_scope'
    list with a regular RW lock which comprises even the sending of the
    bulk message. This both guarantees both the list integrity and the
    message sending order. We will later add a commit which cleans up
    this code further.

    Note that this commit needs recently added commit d3092b2efca1 ("tipc:
    fix unsafe rcu locking when accessing publication list") to apply
    cleanly.

    Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table")
    Reported-by: Tuong Lien Tong
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

28 Aug, 2018

1 commit

  • In function tipc_dest_push, the 32bit variables 'node' and 'port'
    are stored separately in uppper and lower part of 64bit 'value'.
    Then this value is assigned to dst->value which is a union like:
    union
    {
    struct {
    u32 port;
    u32 node;
    };
    u64 value;
    }
    This works on little-endian machines like x86 but fails on big-endian
    machines.

    The fix remove the 'value' stack parameter and even the 'value'
    member of the union in tipc_dest, assign the 'node' and 'port' member
    directly with the input parameter to avoid the endian issue.

    Fixes: a80ae5306a73 ("tipc: improve destination linked list")
    Signed-off-by: Zhenbo Gao
    Acked-by: Jon Maloy
    Signed-off-by: Haiqing Bai
    Signed-off-by: David S. Miller

    Haiqing Bai
     

13 Apr, 2018

1 commit

  • When a topology subscription is created, we may encounter (or KASAN
    may provoke) a failure to create a corresponding service instance in
    the binding table. Instead of letting the tipc_nametbl_subscribe()
    report the failure back to the caller, the function just makes a warning
    printout and returns, without incrementing the subscription reference
    counter as expected by the caller.

    This makes the caller believe that the subscription was successful, so
    it will at a later moment try to unsubscribe the item. This involves
    a sub_put() call. Since the reference counter never was incremented
    in the first place, we get a premature delete of the subscription item,
    followed by a "use-after-free" warning.

    We fix this by adding a return value to tipc_nametbl_subscribe() and
    make the caller aware of the failure to subscribe.

    This bug seems to always have been around, but this fix only applies
    back to the commit shown below. Given the low risk of this happening
    we believe this to be sufficient.

    Fixes: commit 218527fe27ad ("tipc: replace name table service range
    array with rb tree")
    Reported-by: syzbot+aa245f26d42b8305d157@syzkaller.appspotmail.com

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

01 Apr, 2018

2 commits

  • With the new RB tree structure for service ranges it becomes possible to
    solve an old problem; - we can now allow overlapping service ranges in
    the table.

    When inserting a new service range to the tree, we use 'lower' as primary
    key, and when necessary 'upper' as secondary key.

    Since there may now be multiple service ranges matching an indicated
    'lower' value, we must also add the 'upper' value to the functions
    used for removing publications, so that the correct, corresponding
    range item can be found.

    These changes guarantee that a well-formed publication/withdrawal item
    from a peer node never will be rejected, and make it possible to
    eliminate the problematic backlog functionality we currently have for
    handling such cases.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The current design of the binding table has an unnecessary memory
    consuming and complex data structure. It aggregates the service range
    items into an array, which is expanded by a factor two every time it
    becomes too small to hold a new item. Furthermore, the arrays never
    shrink when the number of ranges diminishes.

    We now replace this array with an RB tree that is holding the range
    items as tree nodes, each range directly holding a list of bindings.

    This, along with a few name changes, improves both readability and
    volume of the code, as well as reducing memory consumption and hopefully
    improving cache hit rate.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

18 Mar, 2018

4 commits

  • We rename some lists and fields in struct publication both to make
    the naming more consistent and to better reflect their roles. We
    also update the descriptions of those lists.

    node_list -> local_publ
    cluster_list -> all_publ
    pport_list -> binding_sock
    ref -> port

    There are no functional changes in this commit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The size of struct publication can be reduced further. Membership in
    lists 'nodesub_list' and 'local_list' is mutually exlusive, in that
    remote publications use the former and local publications the latter.
    We replace the two lists with one single, named 'binding_node' which
    reflects what it really is.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a further consequence of the previous commits, we can also remove
    the member 'zone_list 'in struct name_info and struct publication.
    Instead, we now let the member cluster_list take over the role a
    container of all publications of a given .
    We also remove the counters for the size of those lists, since
    they don't serve any purpose.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a consequence of the previous commit we nan now eliminate zone scope
    related lists in the name table. We start with name_table::publ_list[3],
    which can now be replaced with two lists, one for node scope publications
    and one for cluster scope publications.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Feb, 2018

1 commit

  • Because of the requirement for total distribution transparency, users
    send subscriptions and receive topology events in their own host format.
    It is up to the topology server to determine this format and do the
    correct conversions to and from its own host format when needed.

    Until now, this has been handled in a rather non-transparent way inside
    the topology server and subscriber code, leading to unnecessary
    complexity when creating subscriptions and issuing events.

    We now improve this situation by adding two new macros, tipc_sub_read()
    and tipc_evt_write(). Both those functions calculate the need for
    conversion internally before performing their respective operations.
    Hence, all handling of such conversions become transparent to the rest
    of the code.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

16 Jan, 2018

1 commit

  • In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we
    inadvertently broke non-group multicast transmission when changing the
    parameter 'domain' to 'scope' in the function
    tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding
    change in the calling function, with the result that the lookup always
    fails.

    A closer anaysis reveals that this parameter is not needed at all.
    Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
    current implementation this will be delivered to all matching
    destinations except those which are published with NODE_SCOPE on other
    nodes. Since such publications never will be visible on the sending node
    anyway, it makes no sense to discriminate by scope at all.

    We now remove this parameter altogether.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

10 Jan, 2018

2 commits

  • When a member joins a group, it also indicates a binding scope. This
    makes it possible to create both node local groups, invisible to other
    nodes, as well as cluster global groups, visible everywhere.

    In order to avoid that different members end up having permanently
    differing views of group size and memberhip, we must inhibit locally
    and globally bound members from joining the same group.

    We do this by using the binding scope as an additional separator between
    groups. I.e., a member must ignore all membership events from sockets
    using a different scope than itself, and all lookups for message
    destinations must require an exact match between the message's lookup
    scope and the potential target's binding scope.

    Apart from making it possible to create local groups using the same
    identity on different nodes, a side effect of this is that it now also
    becomes possible to create a cluster global group with the same identity
    across the same nodes, without interfering with the local groups.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • Currently, when a user is subscribing for binding table publications,
    he will receive a PUBLISH event for all already existing matching items
    in the binding table.

    However, a group socket making a subscriptions doesn't need this initial
    status update from the binding table, because it has already scanned it
    during the join operation. Worse, the multiplicatory effect of issuing
    mutual events for dozens or hundreds group members within a short time
    frame put a heavy load on the topology server, with the end result that
    scale out operations on a big group tend to take much longer than needed.

    We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
    subscriptions, so that this initial avalanche of events is suppressed.
    This change, along with the previous commit, significantly improves the
    range and speed of group scale out operations.

    We keep the new option internal for the tipc driver, at least for now.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

13 Oct, 2017

3 commits

  • In this commit, we make it possible to send connectionless unicast
    messages to any member corresponding to the given member identity,
    when there is more than one such member. The sender must use a
    TIPC_ADDR_NAME address to achieve this effect.

    We also perform load balancing between the destinations, i.e., we
    primarily select one which has advertised sufficient send window
    to not cause a block/EAGAIN delay, if any. This mechanism is
    overlayed on the always present round-robin selection.

    Anycast messages are subject to the same start synchronization
    and flow control mechanism as group broadcast messages.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • As a preparation for introducing flow control for multicast and datagram
    messaging we need a more strictly defined framework than we have now. A
    socket must be able keep track of exactly how many and which other
    sockets it is allowed to communicate with at any moment, and keep the
    necessary state for those.

    We therefore introduce a new concept we have named Communication Group.
    Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
    The call takes four parameters: 'type' serves as group identifier,
    'instance' serves as an logical member identifier, and 'scope' indicates
    the visibility of the group (node/cluster/zone). Finally, 'flags' makes
    it possible to set certain properties for the member. For now, there is
    only one flag, indicating if the creator of the socket wants to receive
    a copy of broadcast or multicast messages it is sending via the socket,
    and if wants to be eligible as destination for its own anycasts.

    A group is closed, i.e., sockets which have not joined a group will
    not be able to send messages to or receive messages from members of
    the group, and vice versa.

    Any member of a group can send multicast ('group broadcast') messages
    to all group members, optionally including itself, using the primitive
    send(). The messages are received via the recvmsg() primitive. A socket
    can only be member of one group at a time.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We often see a need for a linked list of destination identities,
    sometimes containing a port number, sometimes a node identity, and
    sometimes both. The currently defined struct u32_list is not generic
    enough to cover all cases, so we extend it to contain two u32 integers
    and rename it to struct tipc_dest_list.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Maloy
     

21 Jan, 2017

1 commit


04 Jan, 2017

1 commit

  • During multicast reception we currently use a simple linked list with
    push/pop semantics to store port numbers.

    We now see a need for a more generic list for storing values of type
    u32. We therefore make some modifications to this list, while replacing
    the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
    functions which will come to use in the next commits.

    Acked-by: Parthasarathy Bhuvaragan
    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

10 Feb, 2015

1 commit


06 Feb, 2015

2 commits

  • In a previous commit in this series we resolved a race problem during
    unicast message reception.

    Here, we resolve the same problem at multicast reception. We apply the
    same technique: an input queue serializing the delivery of arriving
    buffers. The main difference is that here we do it in two steps.
    First, the broadcast link feeds arriving buffers into the tail of an
    arrival queue, which head is consumed at the socket level, and where
    destination lookup is performed. Second, if the lookup is successful,
    the resulting buffer clones are fed into a second queue, the input
    queue. This queue is consumed at reception in the socket just like
    in the unicast case. Both queues are protected by the same lock, -the
    one of the input queue.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The structure 'tipc_port_list' is used to collect port numbers
    representing multicast destination socket on a receiving node.
    The list is not based on a standard linked list, and is in reality
    optimized for the uncommon case that there are more than one
    multicast destinations per node. This makes the list handling
    unecessarily complex, and as a consequence, even the socket
    multicast reception becomes more complex.

    In this commit, we replace 'tipc_port_list' with a new 'struct
    tipc_plist', which is based on a standard list. We give the new
    list stack (push/pop) semantics, someting that simplifies
    the implementation of the function tipc_sk_mcast_rcv().

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

13 Jan, 2015

2 commits

  • TIPC name table is used to store the mapping relationship between
    TIPC service name and socket port ID. When tipc supports namespace,
    it allows users to publish service names only owned by a certain
    namespace. Therefore, every namespace must have its private name
    table to prevent service names published to one namespace from being
    contaminated by other service names in another namespace. Therefore,
    The name table global variable (ie, nametbl) and its lock must be
    moved to tipc_net structure, and a parameter of namespace must be
    added for necessary functions so that they can obtain name table
    variable defined in tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Global variables associated with node table are below:
    - node table list (node_htable)
    - node hash table list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make node table support namespace, above global variables must be
    moved to tipc_net structure in order to keep secret for different
    namespaces. As a consequence, these variables are allocated and
    initialized when namespace is created, and deallocated when namespace
    is destroyed. After the change, functions associated with these
    variables have to utilize a namespace pointer to access them. So
    adding namespace pointer as a parameter of these functions is the
    major change made in the commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Dec, 2014

2 commits

  • Convert tipc name table read-write lock to RCU. After this change,
    a new spin lock is used to protect name table on write side while
    RCU is applied on read side.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Name table locking policy is going to be adjusted from read-write
    lock protection to RCU lock protection in the future commits. But
    its essential precondition is to convert the allocation way of name
    table from static to dynamic mode.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

27 Nov, 2014

1 commit

  • The node subscribe infrastructure represents a virtual base class, so
    its users, such as struct tipc_port and struct publication, can derive
    its implemented functionalities. However, after the removal of struct
    tipc_port, struct publication is left as its only single user now. So
    defining an abstract infrastructure for one user becomes no longer
    reasonable. If corresponding new functions associated with the
    infrastructure are moved to name_table.c file, the node subscription
    infrastructure can be removed as well.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

22 Nov, 2014

1 commit

  • Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.

    This command supports dumping the name table of all nodes.

    Netlink logical layout of name table response message:
    -> name table
    -> publication
    -> type
    -> lower
    -> upper
    -> scope
    -> node
    -> ref
    -> key

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
     

18 Jun, 2013

1 commit


01 May, 2012

1 commit

  • Some of the comment blocks are floating in limbo between two
    functions, or between blocks of code. Delete the extra line
    feeds between any comment and its associated following block
    of code, to be consistent with the majority of the rest of
    the kernel. Also delete trailing newlines at EOF and fix
    a couple trivial typos in existing comments.

    This is a 100% cosmetic change with no runtime impact. We get
    rid of over 500 lines of non-code, and being blank line deletes,
    they won't even show up as noise in git blame.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

25 Feb, 2012

1 commit

  • Streamlines the logic that prevents an application from binding a
    reserved TIPC name type to a port by moving the check to the code
    that handles a socket bind() operation. This allows internal TIPC
    subsystems to bind a reserved name without having to set an atomic
    flag to gain permission to use such a name. (This simplification is
    now possible due to the elimination of support for TIPC's native API.)

    Signed-off-by: Allan Stephens
    Signed-off-by: Paul Gortmaker

    Allan Stephens
     

30 Dec, 2011

2 commits


25 Jun, 2011

1 commit

  • Modifies the main circular linked lists of publications used in TIPC's
    name table to use the standard kernel linked list type. This change
    simplifies the deletion of an existing publication by eliminating
    the need to search up to three lists to locate the publication.
    The use of standard list routines also helps improve the readability
    of the name table code by make it clearer what each list operation
    being performed is actually doing.

    Signed-off-by: Allan Stephens
    Signed-off-by: Paul Gortmaker

    Allan Stephens
     

02 Jan, 2011

1 commit

  • Cleans up TIPC's source code to eliminate deviations from generally
    accepted coding conventions relating to leading/trailing white space
    and white space around commas, braces, cases, and sizeof.

    These changes are purely cosmetic and do not alter the operation of TIPC
    in any way.

    Signed-off-by: Allan Stephens
    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Allan Stephens
     

03 Sep, 2008

1 commit


11 Feb, 2007

1 commit


18 Jan, 2006

1 commit


13 Jan, 2006

3 commits