16 Jun, 2016

1 commit

  • TIPC based clusters are by default set up with full-mesh link
    connectivity between all nodes. Those links are expected to provide
    a short failure detection time, by default set to 1500 ms. Because
    of this, the background load for neighbor monitoring in an N-node
    cluster increases with a factor N on each node, while the overall
    monitoring traffic through the network infrastructure increases at
    a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
    scale well beyond ~100 nodes unless we significantly increase failure
    discovery tolerance.

    This commit introduces a framework and an algorithm that drastically
    reduces this background load, while basically maintaining the original
    failure detection times across the whole cluster. Using this algorithm,
    background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
    at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
    now have to actively monitor 38 neighbors in a 400-node cluster, instead
    of as before 399.

    This "Overlapping Ring Supervision Algorithm" is completely distributed
    and employs no centralized or coordinated state. It goes as follows:

    - Each node makes up a linearly ascending, circular list of all its N
    known neighbors, based on their TIPC node identity. This algorithm
    must be the same on all nodes.

    - The node then selects the next M = sqrt(N) - 1 nodes downstream from
    itself in the list, and chooses to actively monitor those. This is
    called its "local monitoring domain".

    - It creates a domain record describing the monitoring domain, and
    piggy-backs this in the data area of all neighbor monitoring messages
    (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
    the cluster eventually (default within 400 ms) will learn about
    its monitoring domain.

    - Whenever a node discovers a change in its local domain, e.g., a node
    has been added or has gone down, it creates and sends out a new
    version of its domain record to inform all neighbors about the change.

    - A node receiving a domain record from anybody outside its local domain
    matches this against its own list (which may not look the same), and
    chooses to not actively monitor those members of the received domain
    record that are also present in its own list. Instead, it relies on
    indications from the direct monitoring nodes if an indirectly
    monitored node has gone up or down. If a node is indicated lost, the
    receiving node temporarily activates its own direct monitoring towards
    that node in order to confirm, or not, that it is actually gone.

    - Since each node is actively monitoring sqrt(N) downstream neighbors,
    each node is also actively monitored by the same number of upstream
    neighbors. This means that all non-direct monitoring nodes normally
    will receive sqrt(N) indications that a node is gone.

    - A major drawback with ring monitoring is how it handles failures that
    cause massive network partitionings. If both a lost node and all its
    direct monitoring neighbors are inside the lost partition, the nodes in
    the remaining partition will never receive indications about the loss.
    To overcome this, each node also chooses to actively monitor some
    nodes outside its local domain. Those nodes are called remote domain
    "heads", and are selected in such a way that no node in the cluster
    will be more than two direct monitoring hops away. Because of this,
    each node, apart from monitoring the members of its local domain, will
    also typically monitor sqrt(N) remote head nodes.

    - As an optimization, local list status, domain status and domain
    records are marked with a generation number. This saves senders from
    unnecessarily conveying unaltered domain records, and receivers from
    performing unneeded re-adaptations of their node monitoring list, such
    as re-assigning domain heads.

    - As a measure of caution we have added the possibility to disable the
    new algorithm through configuration. We do this by keeping a threshold
    value for the cluster size; a cluster that grows beyond this value
    will switch from full-mesh to ring monitoring, and vice versa when
    it shrinks below the value. This means that if the threshold is set to
    a value larger than any anticipated cluster size (default size is 32)
    the new algorithm is effectively disabled. A patch set for altering the
    threshold value and for listing the table contents will follow shortly.

    - This change is fully backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
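
    As a rough illustration of the first two steps of the algorithm
    described above, the following Python sketch builds the sorted ring
    and picks the local monitoring domain. It assumes that N counts every
    node in the cluster, the local one included, and that sqrt(N) is
    rounded down. The names local_domain() and monitored_count() are
    hypothetical and do not exist in the TIPC code; selection of remote
    domain heads is not modelled.

```python
import math


def local_domain(self_id, node_ids):
    """Return the nodes that self_id actively monitors in its local domain.

    Assumption: N is the full cluster size (self included) and
    sqrt(N) is rounded down.
    """
    # Every node sorts the same identities the same way, so all rings agree.
    ring = sorted(set(node_ids) | {self_id})
    n = len(ring)
    m = max(math.isqrt(n) - 1, 0)  # M = sqrt(N) - 1
    start = ring.index(self_id)
    # Walk downstream from self, wrapping around the end of the list.
    return [ring[(start + k) % n] for k in range(1, m + 1)]


def monitored_count(n):
    """Approximate per-node load: local domain members plus remote heads."""
    return 2 * (math.isqrt(n) - 1)
```

    For a 400-node cluster, monitored_count(400) gives 38, which matches
    the figure quoted in the commit message, compared to 399 with
    full-mesh monitoring.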
     

12 Apr, 2016

1 commit

  • Nametable updates received from the network that cannot be applied
    immediately are placed on a defer queue. This queue is global to the
    TIPC module, which might cause problems when using TIPC in containers.
    To prevent nametable updates from escaping into the wrong namespace,
    we make the queue pernet instead.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     

21 Nov, 2015

1 commit

  • The file name_distr.c currently contains three functions,
    named_cluster_distribute(), tipc_publ_subscribe() and
    tipc_publ_unsubscribe() that all directly access fields in
    struct tipc_node. We want to eliminate such dependencies, so
    we move those functions to the file node.c and rename them to
    tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
    respectively.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

4 commits

  • After the previous changes in this series, we can now remove some
    unused code and structures in the broadcast, link aggregation
    and link code.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The code path for receiving broadcast packets is currently distinct
    from the unicast path. This leads to unnecessary code and data
    duplication, something that can be avoided with some effort.

    We now introduce separate per-peer tipc_link instances for handling
    broadcast packet reception. Each receive link keeps a pointer to the
    common, single, broadcast link instance, and can hence handle release
    and retransmission of send buffers as if they belonged to the own
    instance.

    Furthermore, we let each unicast link instance keep a reference to both
    the pertaining broadcast receive link, and to the common send link.
    This makes it possible for the unicast links to easily access data for
    broadcast link synchronization, as well as for carrying acknowledges for
    received broadcast packets.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast lock will need to be acquired outside bcast.c in a later
    commit. For this reason, we move the lock to struct tipc_net. Consistent
    with the changes in the previous commit, we also introduce two new
    functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
    currently using tipc_bclink_lock()/unlock() will be phased out during
    the coming commits in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, a number of structure and function definitions related
    to the broadcast functionality are unnecessarily exposed in the file
    bcast.h. This obscures the fact that the external interface towards
    the broadcast link in fact is very narrow, and causes unnecessary
    recompilations of other files when anything changes in those
    definitions.

    In this commit, we move as many of those definitions as is currently
    possible to the file bcast.c.

    We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
    since the name does not correctly describe the contents of this
    struct, and will do so even less in the future, and because we want
    to use the term 'link' more appropriately in the functionality
    introduced later in this series.

    Finally, we rename a couple of functions, such as tipc_bclink_xmit()
    and others that will be kept in the future, to include the term 'bcast'
    instead.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

31 Jul, 2015

1 commit

  • We simplify the link creation function tipc_link_create() and the way
    the link struct is connected to the node struct. In particular, we
    remove the duplicate initialization of some fields which are anyway set
    in tipc_link_reset().

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2015

1 commit

  • We convert packet/message reception according to the same principle
    we have been using for message sending and timeout handling:

    We move the function tipc_rcv() to node.c, hence handling the initial
    packet reception at the link aggregation level. The function grabs
    the node lock, selects the receiving link, and accesses it via a new
    call tipc_link_rcv(). This function appends buffers to the input
    queue for delivery upwards, but it may also append outgoing packets
    to the xmit queue, just as we do during regular message sending. The
    latter will happen when buffers are forwarded from the link backlog,
    or when retransmission is requested.

    Upon return of this function, and after having released the node lock,
    tipc_rcv() delivers/transmits the contents of those queues, but it may
    also perform actions such as link activation or reset, as indicated by
    the return flags from the link.

    This reduces the number of cpu cycles spent inside the node spinlock,
    and reduces contention on that lock.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
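
    The pattern described in this commit, where the link only fills
    queues while the node lock is held and the actual delivery and
    transmission happen after the lock is released, can be sketched as
    follows. This is a simplified Python model, not the kernel code; the
    Link and Node classes and their attributes are hypothetical.

```python
import threading
from collections import deque


class Link:
    """Toy link: decides what to deliver and what to send, but does neither."""

    def __init__(self):
        self.backlog = deque()

    def rcv(self, packet, inputq, xmitq):
        inputq.append(packet)          # queued for delivery upwards
        while self.backlog:            # forward packets waiting in the backlog
            xmitq.append(self.backlog.popleft())
        return 0                       # return flags; none are set in this toy


class Node:
    def __init__(self):
        self.lock = threading.Lock()
        self.link = Link()
        self.delivered = []
        self.sent = []

    def rcv(self, packet):
        inputq, xmitq = [], []
        with self.lock:
            # Only queue manipulation happens under the node lock.
            flags = self.link.rcv(packet, inputq, xmitq)
        # Lock released: delivery and transmission run without holding it.
        self.delivered.extend(inputq)
        self.sent.extend(xmitq)
        return flags
```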
     

15 May, 2015

2 commits

  • Although the sequence number in the TIPC protocol is 16 bits, we have
    until now stored it internally as an unsigned 32-bit integer.
    We got around this by always doing explicit modulo-65536 operations
    whenever we need to access a sequence number.

    We now convert the incoming and outgoing sequence numbers to unsigned
    16-bit integers, and remove the modulo operations where applicable.

    We also move the arithmetic inline functions for 16 bit integers
    to core.h, and the function buf_seqno() to msg.h, so they can easily
    be accessed from anywhere in the code.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
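
    To illustrate what 16-bit sequence number arithmetic has to handle,
    here is a small Python sketch of wrap-around addition and comparison.
    In C, an unsigned 16-bit type wraps implicitly; Python integers are
    unbounded, so the mask is written out explicitly. The helper names
    are hypothetical and are not necessarily those of the inline
    functions moved to core.h.

```python
SEQ_MASK = 0xFFFF  # sequence numbers are 16 bits wide


def seq_add(seqno, delta):
    """Advance a sequence number, wrapping from 65535 back to 0."""
    return (seqno + delta) & SEQ_MASK


def less_eq(left, right):
    """True if left is at or before right, taking wrap-around into account."""
    return ((right - left) & SEQ_MASK) < 0x8000


def less(left, right):
    """True if left is strictly before right."""
    return left != right and less_eq(left, right)
```

    With these helpers, sequence number 65535 is considered to precede 0,
    since the counter wraps after 65535.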
     
  • When we try to add new inline functions in the code, we sometimes
    run into circular include dependencies.

    The main problem is that the file core.h, which really should be at
    the root of the dependency chain, instead is a leaf. I.e., core.h
    includes a number of header files that themselves should be allowed
    to include core.h. In reality this is unnecessary, because core.h does
    not need to know the full signature of any of the structs it refers to,
    only their type declaration.

    In this commit, we remove all dependencies from core.h towards any
    other tipc header file.

    As a consequence of this change, we can now move the function
    tipc_own_addr(net) from addr.c to addr.h, and make it inline.

    There are no functional changes in this commit.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

10 Feb, 2015

2 commits

  • tipc_snprintf() was heavily utilized by the old netlink API which no
    longer exists (now netlink compat).

    In this patch we swap tipc_snprintf() to the identical scnprintf() in
    the only remaining occurrence.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • The new netlink API is no longer "v2" but rather the standard API and
    the legacy API is now "nl compat". We split them into separate
    start/stop and put them in different files in order to further
    distinguish them.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     

13 Jan, 2015

11 commits

  • Once namespaces are supported, each namespace should own its private
    random value. So the global variable representing the random value
    must be moved to the tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC establishes one subscriber server which allows users to subscribe
    to the name service status events they are interested in. Once tipc
    supports namespaces, one dedicated tipc stack instance is created for
    each namespace, and each instance can be regarded as one independent
    TIPC node. As a result, a subscriber server must be built for each
    namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • If net namespaces are supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable holding the node address must be moved to the tipc_net
    structure. As a result, users can also assign a node address to
    every namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The TIPC name table stores the mapping between TIPC service names
    and socket port IDs. When tipc supports namespaces, it allows users
    to publish service names owned only by a certain namespace. Therefore,
    every namespace must have its private name table to prevent service
    names published in one namespace from being contaminated by service
    names in another namespace. The name table global variable (i.e.,
    nametbl) and its lock must therefore be moved to the tipc_net
    structure, and a namespace parameter must be added to the functions
    that need it so that they can obtain the name table variable defined
    in the tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
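
    The general shape of this conversion, which recurs throughout the
    series, is that a former global is reached through a per-namespace
    structure instead. A minimal Python sketch, assuming a hypothetical
    TipcNet class and a dictionary standing in for the kernel's
    per-namespace lookup:

```python
class TipcNet:
    """Per-namespace state; stands in for the tipc_net structure."""

    def __init__(self):
        self.nametbl = {}   # service name -> port id, private to one namespace


_nets = {}                  # namespace -> TipcNet (hypothetical lookup table)


def tipc_net(ns):
    """Return the state for namespace ns, creating it on first use."""
    return _nets.setdefault(ns, TipcNet())


def publish(ns, name, port):
    # The namespace argument is the new parameter every function needs.
    tipc_net(ns).nametbl[name] = port


def lookup(ns, name):
    return tipc_net(ns).nametbl.get(name)
```

    A name published in one namespace is then invisible in any other:
    after publish("ns1", (1000, 1), 42), lookup("ns2", (1000, 1)) finds
    nothing.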
     
  • The tipc socket table is currently statically allocated as a global
    variable. Through it, we can look up a socket instance by port ID,
    insert a new socket instance into the table, and delete a socket from
    the table. But when tipc supports net namespaces, each namespace must
    own its specific socket table. So the global socket table variable
    must be redefined in the tipc_net structure. As a consequence, a new
    socket table will be allocated when a new namespace is created, and
    deallocated when the namespace is destroyed.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The TIPC broadcast link is statically established and its relevant
    states are maintained in the global variables "bcbearer", "bclink" and
    "bcl". To allow different namespaces to own different broadcast link
    instances, these variables must be moved to the tipc_net structure, and
    broadcast link instances must be allocated and initialized when a
    namespace is created.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The bearer list, defined as a global variable, is used to store bearer
    instances. When tipc supports net namespaces, bearers created in
    one namespace must be isolated from those allocated in other
    namespaces, which requires that the bearer list (bearer_list)
    be moved to the tipc_net structure. As a result, a net namespace
    pointer has to be passed to functions which access the bearer list.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The global variables associated with the node table are:
    - node hash table (node_htable)
    - node list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make the node table support namespaces, the above global variables
    must be moved to the tipc_net structure so that they are kept private
    to each namespace. As a consequence, these variables are allocated and
    initialized when a namespace is created, and deallocated when the
    namespace is destroyed. After the change, functions associated with
    these variables have to use a namespace pointer to access them. Adding
    a namespace pointer as a parameter of these functions is therefore the
    major change made in this commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Introduce the namespace infrastructure, make the "tipc_net_id" global
    variable per namespace, and rename it to "net_id". For this
    conversion to work, an instance of the networking namespace must be
    passed to the relevant functions, allowing them to access the
    per-namespace "net_id" variable.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The files core.h and core.c should only handle initialization and
    shutdown of the tipc module, so everything that is not closely
    associated with those two tasks should be moved to appropriate places.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Some timer wrapper functions, like k_term_timer(), are empty, and
    others, including k_start_timer() and k_cancel_timer(), do not return
    any value to their callers. Moreover, no other component in the
    kernel wraps the timer interfaces in this way. Therefore, these timer
    interfaces defined in the tipc module should be purged.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Jan, 2015

1 commit

  • As the tipc reference table is statically allocated, the memory it
    requests at stack initialization is quite large even though the
    maximum port number is currently restricted to 8191, a number that
    has already become insufficient in practice. But if the maximum
    number of ports were raised to its theoretical value of 2^32, the
    memory consumed would reach an unacceptable size. Apart from
    this, heavy tipc users spend a considerable amount of time in
    tipc_sk_get() due to the read-lock on ref_table_lock.

    Converting the tipc reference table to the generic rhashtable resolves
    both disadvantages: the new resizable hash table avoids locking on
    lookup, and less memory is required initially. For example, 256
    hash bucket slots are requested at the beginning instead of
    allocating the entire 8191 slots as before. The hash table will
    grow if entries exceed 75% of the table size, up to a total table size
    of 1M, and it will automatically shrink if usage falls below 30%,
    but never below the minimum table size of 256.

    This commit also converts ref_table_lock to a separate mutex to protect
    hash table mutations on the write side. Lastly, it defers the release
    of the socket reference using call_rcu() to allow using an RCU
    read-side protected call to rhashtable_lookup().

    Signed-off-by: Ying Xue
    Acked-by: Jon Maloy
    Acked-by: Erik Hugne
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
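
    The resizing policy quoted in this commit message can be expressed as
    a small decision function. The sketch below is in Python and is not
    rhashtable code; it assumes that "1M" means 2^20 slots and that the
    table doubles or halves in a single step, neither of which is stated
    in the message. The name next_table_size() is hypothetical.

```python
MIN_SIZE = 256        # minimum table size from the commit message
MAX_SIZE = 1 << 20    # "1M", interpreted here as 2^20 slots (assumption)


def next_table_size(size, entries):
    """Return the table size to use given the current size and entry count."""
    # Grow when usage exceeds 75%, unless the table is already at its maximum.
    if entries * 4 > size * 3 and size < MAX_SIZE:
        return size * 2
    # Shrink when usage falls below 30%, but never below the minimum.
    if entries * 10 < size * 3 and size > MIN_SIZE:
        return size // 2
    return size
```

    Integer comparisons are used instead of floating-point percentages so
    that the thresholds are exact.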
     

27 Nov, 2014

1 commit

  • The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
    used to flag whether or not more messages can be bundled into a data
    packet in the outgoing transmission queue. Obviously, no more messages
    can be appended after the packet has been sent and is waiting to be
    acknowledged and deleted. These message types do in reality represent
    a send-side local implementation flag, and are not defined as part of
    the protocol. It is therefore safe to move them to where they belong,
    that is, the control area (TIPC_SKB_CB) of the buffer.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

22 Nov, 2014

1 commit

  • A new netlink API for tipc that can disable or enable a tipc bearer.

    The new API is separated from the old API because of a bug in the
    user space client (tipc-config). The problem is that older versions
    of tipc-config have a very low receive limit and adding commands to
    the legacy genl_opts struct causes the ctrl_getfamily() response
    message to grow, subsequently breaking the tool.

    The new API utilizes netlink policies for input validation, where the
    top-level netlink attributes are tipc-logical entities, like bearer.
    The top-level entities then contain nested attributes, in this case
    a name, nested link properties and a domain.

    Netlink commands implemented in this patch:
    TIPC_NL_BEARER_ENABLE
    TIPC_NL_BEARER_DISABLE

    Netlink logical layout of bearer enable message:
    -> bearer
        -> name
        [ -> domain ]
        [
        -> properties
            -> priority
        ]

    Netlink logical layout of bearer disable message:
    -> bearer
        -> name

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Richard Alpe
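
    The nested layout of the bearer enable message can be pictured as a
    nested dictionary. The Python sketch below checks such a dictionary
    against the layout given above; it is only a model of the structure,
    not of the netlink wire format or of the kernel's policy tables. The
    value types (string name, integer domain and priority) and the
    example bearer name are assumptions made for illustration, and
    validate_bearer_enable() is a hypothetical helper.

```python
def validate_bearer_enable(msg):
    """Check a nested dict against the bearer enable layout described above."""
    bearer = msg.get("bearer")
    if not isinstance(bearer, dict):
        return False
    if not isinstance(bearer.get("name"), str):
        return False                      # name is mandatory
    if "domain" in bearer and not isinstance(bearer["domain"], int):
        return False                      # domain is optional
    props = bearer.get("properties", {})  # properties are optional
    if not isinstance(props, dict):
        return False
    if "priority" in props and not isinstance(props["priority"], int):
        return False
    return True


# Illustrative messages; the bearer name and priority are made-up values.
enable_msg = {"bearer": {"name": "eth:eth0", "properties": {"priority": 10}}}
disable_msg = {"bearer": {"name": "eth:eth0"}}
```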
     

02 Sep, 2014

1 commit

  • TIPC name table updates are distributed asynchronously in a cluster,
    entailing a risk of certain race conditions. E.g., if two nodes
    simultaneously issue conflicting (overlapping) publications, this may
    not be detected until both publications have reached a third node, in
    which case one of the publications will be silently dropped on that
    node. Hence, we end up with an inconsistent name table.

    In most cases this conflict is just a temporary race, e.g., one
    node is issuing a publication under the assumption that a previous,
    conflicting, publication has already been withdrawn by the other node.
    However, because of the (rtt related) distributed update delay, this
    may not yet hold true on all nodes. The symptom of this failure is a
    syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

    In this commit we add a resiliency queue at the receiving end of
    the name table distributor. When insertion of an arriving publication
    fails, we retain it in this queue for a short amount of time, assuming
    that another update will arrive very soon and clear the conflict. If so
    happens, we insert the publication, otherwise we drop it.

    The (configurable) retention value defaults to 2000 ms. Knowing from
    experience that the situation described above is extremely rare, there
    is no risk that the queue will accumulate any large number of items.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
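
    The behaviour of the resiliency queue can be sketched as follows. This
    is a simplified Python model under stated assumptions: publications
    are plain (lower, upper) ranges, time is passed in explicitly in
    milliseconds, and deferred entries are retried whenever a withdrawal
    arrives. The NameTable class and its methods are hypothetical and do
    not mirror the kernel data structures.

```python
class NameTable:
    def __init__(self, retention_ms=2000):
        self.retention_ms = retention_ms
        self.published = {}     # (lower, upper) -> publishing node
        self.deferred = []      # (expiry time in ms, range, node)

    def _insert(self, rng, node):
        lo, hi = rng
        for plo, phi in self.published:
            if lo <= phi and plo <= hi:
                return False    # overlaps an existing publication
        self.published[rng] = node
        return True

    def publish(self, rng, node, now_ms):
        # On conflict, keep the update for a while instead of dropping it.
        if not self._insert(rng, node):
            self.deferred.append((now_ms + self.retention_ms, rng, node))

    def withdraw(self, rng, now_ms):
        self.published.pop(rng, None)
        self._retry(now_ms)

    def _retry(self, now_ms):
        still_waiting = []
        for expiry, rng, node in self.deferred:
            if now_ms > expiry:
                continue        # retention time has passed: drop it
            if not self._insert(rng, node):
                still_waiting.append((expiry, rng, node))
        self.deferred = still_waiting
```

    If node B publishes a range that overlaps one still held by node A,
    the update waits in the queue; when A's withdrawal arrives within the
    retention time, B's publication is inserted.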
     

24 Aug, 2014

1 commit

  • The current link implementation keeps a linked list of blocked ports/
    sockets that is populated when there is link congestion. The purpose
    of this is to let the link know which users to wake up when the
    congestion abates.

    This adds unnecessary complexity to the data structure and the code,
    since it forces us to involve the link each time we want to delete
    a socket. It also forces us to grab the spinlock port_lock within
    the scope of node_lock. We want to get rid of this direct dependence,
    as well as the deadlock hazard resulting from the usage of port_lock.

    In this commit, we instead let the link keep a list of "wakeup" pseudo
    messages for use in such situations. Those messages are sent to the
    pending sockets via the ordinary message reception path, and wake up
    the socket's owner when they are received.

    This enables us to get rid of the 'waiting_ports' linked lists in struct
    tipc_port that manifest this direct reference. As a consequence, we can
    eliminate another BH entry into the socket, and hence the need to grab
    port_lock. This is a further step in our effort to remove port_lock
    altogether.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 May, 2014

2 commits

  • TIPC currently handles two media specific addresses: Ethernet MAC
    addresses and InfiniBand addresses. Those are kept in three different
    formats:

    1) A "raw" format as obtained from the device. This format is known
    only by the media specific adapter code in eth_media.c and
    ib_media.c.
    2) A "generic" internal format, in the form of struct tipc_media_addr,
    which can be referenced and passed around by the generic media-
    unaware code.
    3) A serialized version of the latter, to be conveyed in neighbor
    discovery messages.

    Conversion between the three formats can only be done by the media
    specific code, so we have function pointers for this purpose in
    struct tipc_media. Here, the media adapters can install their own
    conversion functions at startup.

    We now introduce a new such function, 'raw2addr()', whose purpose
    is to convert from format 1 to format 2 above. We also try, as far
    as possible, to make commenting, variable names and usage of these
    functions uniform, with the purpose of making them more comprehensible.

    We can now also remove the function tipc_l2_media_addr_set(), whose
    job is done better by the new function.

    Finally, we expand the field for serialized addresses (format 3)
    in discovery messages from 20 to 32 bytes. This is permitted
    according to the spec, and reduces the risk of problems when we
    add new media in the future.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The function tipc_link_frag_rcv() is in reality a re-entrant generic
    message reassembly function that has nothing in particular to do with
    the link, where it is defined now. This becomes obvious when we see
    the need to call the function from other places in the code.

    In this commit, we rename it to tipc_buf_append() and move it to the file
    msg.c. We also simplify its signature by moving the tail pointer to
    the control block of the head buffer, hence making the head buffer
    self-contained.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
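
    The idea of keeping the tail pointer in the head buffer, so that the
    head is self-contained and appending needs no extra state from the
    caller, can be sketched in Python as below. The Buf class, the
    buf_append() signature and the explicit "last" flag are hypothetical
    simplifications; they are not the signature of tipc_buf_append().

```python
class Buf:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.tail = None    # only meaningful on the head buffer


def buf_append(head, frag, last=False):
    """Append frag to the chain at head; return (head, reassembled or None)."""
    if head is None:
        head = frag         # first fragment becomes the head
        head.tail = frag
    else:
        head.tail.next = frag   # constant time: no walk along the chain
        head.tail = frag
    if not last:
        return head, None
    # Last fragment: hand back the complete message and release the head.
    parts, node = [], head
    while node is not None:
        parts.append(node.data)
        node = node.next
    return None, b"".join(parts)
```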
     

06 May, 2014

1 commit

  • In the previous commits of this series, we removed all asynchronous
    actions which were based on the tasklet handler - "tipc_k_signal()".

    So the moment has now come when we can completely remove the tasklet
    handler infrastructure. That is done with this commit.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

23 Apr, 2014

1 commit

  • There are two paths through which we can configure or change bearer
    status: one is that the bearer is configured from user space with the
    tipc-config tool; the other is that the bearer is changed by
    notification events from its attached interface. The first path is
    guarded by a dedicated config_mutex lock; on the latter path, the RTNL
    lock serializes the handling of interface events. So, if the RTNL lock
    is also used to protect the first path, this will not only greatly
    simplify the current locking policy, but also allow the config_mutex
    lock to be deleted.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Tested-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

28 Mar, 2014

1 commit

  • Because no credentials are required, commands from remote nodes that
    query the local node status are currently accepted, which poses a
    potential security risk. Logging in to a remote node with ssh is not
    only safer than the remote management feature, but also gives us more
    permissions, such as changing the remote node configuration. So it is
    reasonable for us to obsolete the remote management feature now.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

22 Feb, 2014

1 commit

  • Commit 6e967adf7 (tipc: relocate common functions from media to
    bearer) accidentally introduced a side effect. The tipc stack
    handler for receiving packets from netdevices and the netdevice
    notification handler are now registered when a bearer is enabled
    rather than at tipc module initialization, but the two handlers are
    both unregistered in the tipc module exit phase. If the tipc module is
    inserted and then immediately removed, the following warning
    message will appear:

    "dev_remove_pack: ffffffffa0380940 not found"

    This is because the tipc stack packet handler is not registered at
    all during module insertion, but dev_remove_pack() tries to remove it
    during module exit. dev_remove_pack() cannot find the tipc protocol
    handler in the kernel protocol handler list, so the warning message
    is printed.

    If registration of the two handlers is moved from the bearer enabling
    phase to the module insertion stage, the warning message is
    eliminated. Due to this change, tipc_core_start_net() and
    tipc_core_stop_net() can be deleted as well.

    Reported-by: Wang Weidong
    Cc: Jon Maloy
    Cc: Erik Hugne
    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue
     

14 Feb, 2014

1 commit

  • If a packet received on a link is out-of-sequence, it will be
    placed on a deferred queue and later reinserted in the receive
    path once the preceding packets have been processed. The problem
    with this is that it will be subject to the buffer adjustment from
    link_recv_buf_validate twice. The second adjustment for 20 bytes
    header space will corrupt the packet.

    We solve this by tagging the deferred packets and bailing out from
    receive buffer validation for packets that have already been
    subjected to this.

    Signed-off-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     

17 Dec, 2013

1 commit


20 Oct, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

18 Jun, 2013

2 commits

  • TIPC has two internal servers, one providing a subscription
    service for topology events, and another providing the
    configuration interface. These servers have previously been running
    in BH context, accessing the TIPC-port (aka native) API directly.
    Apart from these servers, even the TIPC socket implementation is
    partially built on this API.

    As this API may simultaneously be called via different paths and in
    different contexts, a complex and costly lock policy is required
    in order to protect TIPC internal resources.

    To eliminate the need for this complex lock policy, we introduce
    a new, generic service API that uses kernel sockets for message
    passing instead of the native API. Once the topology and configuration
    servers are converted to use this new service, all code pertaining
    to the native API can be removed. This entails a significant
    reduction in code size and complexity, and opens up for a complete
    rework of the locking policy in TIPC.

    The new service also solves another problem:

    As the current topology server works in BH context, it cannot easily
    be blocked when sending of events fails due to congestion. In such
    cases events may have to be silently dropped, something that is
    unacceptable. Therefore, the new service keeps a dedicated outbound
    queue receiving messages from BH context. Once messages are
    inserted into this queue, we will immediately schedule a work item on
    a special workqueue. This way, messages/events from the topology server
    are in reality sent in process context, and the server can block
    if necessary.

    Analogously, there is a new workqueue for receiving messages. Once a
    notification about an arriving message is received in BH context, we
    schedule a work item on the receive workqueue to do the job of
    receiving the message in process context.

    As both sending and receiving of messages are now completed in process
    context, subscribed events can no longer be dropped.

    As of this commit, this new server infrastructure is built, but
    not yet called by the existing TIPC code. Since the conversion
    changes required in order to use it are significant, the addition
    is kept here as a separate commit.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue
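
    The outbound half of this design, where a context that must not block
    only enqueues and a separate worker does the potentially blocking
    send, can be modelled in user space as below. This Python sketch uses
    a thread and a queue in place of a kernel workqueue; the Server class
    and its method names are hypothetical.

```python
import queue
import threading


class Server:
    def __init__(self):
        self.outq = queue.Queue()      # dedicated outbound queue
        self.sent = []
        worker = threading.Thread(target=self._send_work, daemon=True)
        worker.start()

    def post_event(self, event):
        """Called from a context that must not block: only enqueue."""
        self.outq.put_nowait(event)

    def _send_work(self):
        """Runs in its own thread, so it is free to block while sending."""
        while True:
            event = self.outq.get()
            self.sent.append(event)    # stand-in for a blocking send
            self.outq.task_done()

    def flush(self):
        """Wait until every queued event has been handled."""
        self.outq.join()
```

    Because events wait in the queue rather than being sent directly from
    the non-blocking context, congestion delays them instead of forcing
    them to be dropped.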
     
  • As per feedback from the netdev community, we change the buffer
    overflow protection algorithm in receiving sockets so that it
    always respects the nominal upper limit set in sk_rcvbuf.

    Instead of scaling up from a small sk_rcvbuf value, which leads to
    violation of the configured sk_rcvbuf limit, we now calculate the
    weighted per-message limit by scaling down from a much bigger value,
    still in the same field, according to the importance priority of the
    received message.

    To allow for administrative tunability of the socket receive buffer
    size, we create a tipc_rmem sysctl variable to allow the user to
    configure an even bigger value via the sysctl command. It is an array
    of three values (min/default/max) to be consistent with things like
    tcp_rmem.

    By default, the value initialized in tipc_rmem[1] is equal to the
    receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
    This value is also set as the default value of sk_rcvbuf.

    Originally-by: Jon Maloy
    Cc: Neil Horman
    Cc: Jon Maloy
    [Ying: added sysctl variation to Jon's original patch]
    Signed-off-by: Ying Xue
    [PG: don't compile sysctl.c if not config'd; add Documentation]
    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue
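
    One way to scale the per-message limit down from sk_rcvbuf according
    to message importance is sketched below in Python. It assumes that
    the limit is halved for each importance level below critical, using
    shifts; the commit message does not spell out the exact formula, so
    this is an illustration rather than the implemented calculation. The
    function name rcvbuf_limit() is used here only for the sketch.

```python
# TIPC importance levels, lowest to highest.
LOW, MEDIUM, HIGH, CRITICAL = range(4)


def rcvbuf_limit(sk_rcvbuf, importance):
    """Per-message limit: all of sk_rcvbuf for CRITICAL, halved per level below."""
    return (sk_rcvbuf >> CRITICAL) << importance
```

    With this scheme the limit never exceeds sk_rcvbuf, which is the
    property the netdev feedback asked for.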