17 Nov, 2019

1 commit


15 Nov, 2019

1 commit

  • The tipc prefix for log messages generated by tipc was
    removed in commit 07f6c4bc048a ("tipc: convert tipc reference
    table to use generic rhashtable").

    This is still a useful prefix so add it back.
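
    Such a prefix is conventionally supplied through the kernel's pr_fmt()
    mechanism; the sketch below shows only that general idiom and is not
    necessarily the exact hunk in this commit:

    #define pr_fmt(fmt) "tipc: " fmt        /* must be defined before the includes */

    #include <linux/printk.h>

    static void demo_warn(void)
    {
            pr_warn("no bearer available\n");   /* logged as "tipc: no bearer available" */
    }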

    Signed-off-by: Matt Bennett
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Matt Bennett
     

09 Nov, 2019

1 commit

  • This commit offers an option to encrypt and authenticate all messaging,
    including the neighbor discovery messages. The most advanced algorithm
    currently supported is AEAD AES-GCM (as in IPsec or TLS). All
    encryption/decryption is done at the bearer layer, just before leaving
    or after entering TIPC.
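
    For reference only, a minimal synchronous AES-GCM round trip with the
    generic kernel crypto API could look like the sketch below; the real
    crypto.c works asynchronously and manages keys, nonces and TFM pools
    itself, so all names here are illustrative:

    #include <crypto/aead.h>
    #include <linux/scatterlist.h>
    #include <linux/err.h>

    /* Encrypt 'len' bytes in place; 'buf' needs 16 extra bytes for the tag. */
    static int demo_gcm_encrypt(const u8 *key, unsigned int keylen,
                                u8 *iv, u8 *buf, unsigned int len)
    {
            struct crypto_aead *tfm;
            struct aead_request *req;
            struct scatterlist sg;
            DECLARE_CRYPTO_WAIT(wait);
            int rc;

            tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
            if (IS_ERR(tfm))
                    return PTR_ERR(tfm);

            rc = crypto_aead_setkey(tfm, key, keylen);
            if (!rc)
                    rc = crypto_aead_setauthsize(tfm, 16);  /* 128-bit tag */
            if (rc)
                    goto out;

            req = aead_request_alloc(tfm, GFP_KERNEL);
            if (!req) {
                    rc = -ENOMEM;
                    goto out;
            }
            sg_init_one(&sg, buf, len + 16);
            aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
                                      crypto_req_done, &wait);
            aead_request_set_ad(req, 0);            /* no associated data here */
            aead_request_set_crypt(req, &sg, &sg, len, iv);

            rc = crypto_wait_req(crypto_aead_encrypt(req), &wait);
            aead_request_free(req);
    out:
            crypto_free_aead(tfm);
            return rc;
    }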

    Supported features:
    - Encryption & authentication of all TIPC messages (header + data);
    - Two symmetric-key modes: Cluster and Per-node;
    - Automatic key switching;
    - Key-expired revoking (sequence number wrapped);
    - Lock-free encryption/decryption (RCU);
    - Asynchronous crypto, Intel AES-NI supported;
    - Multiple cipher transforms;
    - Logs & statistics;

    Two key modes:
    - Cluster key mode: One single key is used for both TX & RX in all
    nodes in the cluster.
    - Per-node key mode: Each nodes in the cluster has one specific TX key.
    For RX, a node requires its peers' TX key to be able to decrypt the
    messages from those peers.

    Key setting from user-space is performed via netlink by a user program
    (e.g. the iproute2 'tipc' tool).

    Internal key state machine:

                                  Attach   Align(RX)
                                      +-+  +-+
                                      | V  | V
         +---------+      Attach     +---------+
         |  IDLE   |---------------->| PENDING |(user = 0)
         +---------+                 +---------+
            A   A                   Switch|  A
            |   |                         |  |
            |   | Free(switch/revoked)    |  |
          (Free)|   +---------------------+   |  |Timeout
            |                    (TX) |   |  |(RX)
            |                         |   |  |
            |                         |   v  |
         +---------+      Switch     +---------+
         | PASSIVE |<----------------|  ACTIVE |
         +---------+       (RX)      +---------+
        (user = 1)                  (user >= 1)

    The number of TFMs is 10 by default and can be changed via the procfs
    'net/tipc/max_tfms'. For now, for simplicity, this file is also used
    to print the crypto statistics at runtime:

    echo 0xfff1 > /proc/sys/net/tipc/max_tfms

    The patch defines a new TIPC version (v7) for the encryption message
    (preserving backward compatibility as well). The message is basically
    encapsulated as follows:

    +----------------------------------------------------------+
    | TIPCv7 encryption  | Original TIPCv2    | Authentication |
    | header             | packet (encrypted) | Tag            |
    +----------------------------------------------------------+

    The throughput is about ~40% of the non-encrypted throughput for small
    messages and ~9% for large messages. With support from hardware crypto,
    i.e. the Intel AES-NI CPU instructions, the throughput increases up to
    ~85% for small messages and ~55% for large messages.

    By default, the new feature is inactive (i.e. no encryption) until user
    sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
    in the kernel configuration to enable/disable the new code when needed.

    MAINTAINERS: add the two new tipc files 'crypto.h' & 'crypto.c'.

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     

30 Oct, 2019

1 commit

  • Currently, TIPC transports intra-node user data messages directly from
    socket to socket, hence shortcutting all the lower layers of the
    communication stack. This gives TIPC very good intra-node performance,
    both regarding throughput and latency.

    We now introduce a similar mechanism for TIPC data traffic across
    network namespaces located in the same kernel. On the send path, the
    call chain is as always accompanied by the sending node's network
    namespace pointer. However, once we have reliably established that the
    receiving node is represented by a namespace on the same host, we just
    replace the namespace pointer with that of the receiving node/namespace,
    and follow the regular socket receive path through the receiving
    node. This technique gives us a throughput similar to the node internal
    throughput, several times larger than if we let the traffic go through
    the full network stacks. As a comparison, max throughput for 64k
    messages is four times larger than TCP throughput for the same type of
    traffic.

    To meet any security concerns, the following should be noted.

    - All nodes joining a cluster are supposed to have been certified
    and authenticated by mechanisms outside TIPC. This is no different for
    nodes/namespaces on the same host; they have to auto discover each
    other using the attached interfaces, and establish links which are
    supervised via the regular link monitoring mechanism. Hence, a kernel
    local node has no other way to join a cluster than any other node, and
    has to obey the policies set in the IP or device layers of the stack.

    - Only when a sender has established with 100% certainty that the peer
    node is located in a kernel local namespace does it choose to let user
    data messages, and only those, take the crossover path to the receiving
    node/namespace.

    - If the receiving node/namespace is removed, its namespace pointer
    is invalidated at all peer nodes, and their neighbor link monitoring
    will eventually note that this node is gone.

    - To ensure the "100% certainty" criteria, and prevent any possible
    spoofing, received discovery messages must contain a proof that the
    sender knows a common secret. We use the hash mix of the sending
    node/namespace for this purpose, since it can be accessed directly by
    all other namespaces in the kernel. Upon reception of a discovery
    message, the receiver checks this proof against all the local
    namespaces' hash_mix values. If it finds a match, then that, along with
    a matching node id and cluster id, is deemed sufficient proof that
    the peer node in question is in a local namespace, and a wormhole can
    be opened (a rough sketch of this check follows after this list).

    - We should also consider that TIPC is intended to be a cluster local
    IPC mechanism (just like e.g. UNIX sockets) rather than a network
    protocol, and hence we think it can be justified to allow it to shortcut the
    lower protocol layers.
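
    A rough, self-contained sketch of the namespace check described in the
    fourth point above (names are illustrative; the real code also verifies
    node and cluster identity and keeps a reference to the peer namespace):

    #include <net/net_namespace.h>
    #include <net/netns/hash.h>

    /* Return the local namespace whose hash mix matches the value carried
     * in a received discovery message, or NULL if the peer is not on this
     * host.
     */
    static struct net *demo_find_peer_ns(u32 peer_hash_mix)
    {
            struct net *tmp;

            rcu_read_lock();
            for_each_net_rcu(tmp) {
                    if (net_hash_mix(tmp) == peer_hash_mix) {
                            rcu_read_unlock();
                            return tmp;     /* peer lives in this kernel */
                    }
            }
            rcu_read_unlock();
            return NULL;
    }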

    Regarding traceability, we should notice that since commit 6c9081a3915d
    ("tipc: add loopback device tracking") it is possible to follow the node
    internal packet flow by just activating tcpdump on the loopback
    interface. This will be true even for this mechanism; by activating
    tcpdump on the involved nodes' loopback interfaces, their inter-namespace
    messaging can easily be tracked.

    v2:
    - update 'net' pointer when node left/rejoined
    v3:
    - grab read/write lock when using node ref obj
    v4:
    - clone traffics between netns to loopback

    Suggested-by: Jon Maloy
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     

09 Aug, 2019

1 commit

  • Since node internal messages are passed directly to the socket, it is not
    possible to observe those messages via tcpdump or wireshark.

    We now remedy this by making it possible to clone such messages and send
    the clones to the loopback interface. The clones are dropped at reception
    and have no functional role except making the traffic visible.

    The feature is enabled if network taps are active for the loopback device.
    pcap filtering restrictions require the messages to be presented to the
    receiving side of the loopback device.
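
    The core of that mechanism can be sketched roughly as below (names are
    illustrative; dev_nit_active() and netif_rx_ni() are the helpers named
    in the v3 note below):

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static void demo_clone_to_loopback(struct net *net, struct sk_buff *skb)
    {
            struct net_device *lo = net->loopback_dev;
            struct sk_buff *clone;

            if (!dev_nit_active(lo))        /* no taps (tcpdump etc.) attached */
                    return;

            clone = skb_clone(skb, GFP_ATOMIC);
            if (!clone)
                    return;

            clone->dev = lo;
            netif_rx_ni(clone);             /* visible to pcap, then dropped */
    }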

    v3 - Function dev_nit_active used to check for network taps.
    - Procedure netif_rx_ni used to send cloned messages to loopback device.

    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    John Rutherford
     

20 Mar, 2019

1 commit

  • As a preparation for introducing a smooth switching between the
    replicast and broadcast methods for multicast messages, we have to
    introduce a new capability flag TIPC_MCAST_RBCTL to handle this new
    feature.

    During a cluster upgrade a node can come back with this new capability,
    which must also be reflected in the cluster capabilities field.
    The new feature is only applicable if all nodes in the cluster support
    this new capability.
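
    The cluster capabilities field mentioned above is essentially the
    intersection of all member nodes' capability bits; a hedged sketch
    (flag value and names are made up for illustration):

    #include <linux/types.h>

    #define DEMO_MCAST_RBCTL  (1 << 7)      /* hypothetical capability bit */

    /* Recompute the cluster-wide capabilities as the AND of all members;
     * the new multicast switching is used only if the bit survives this.
     */
    static u16 demo_cluster_caps(const u16 *node_caps, int n)
    {
            u16 caps = ~0;
            int i;

            for (i = 0; i < n; i++)
                    caps &= node_caps[i];
            return caps;
    }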

    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     

01 Apr, 2018

1 commit

  • The current design of the binding table has an unnecessarily memory-
    consuming and complex data structure. It aggregates the service range
    items into an array, which is expanded by a factor of two every time it
    becomes too small to hold a new item. Furthermore, the arrays never
    shrink when the number of ranges diminishes.

    We now replace this array with an RB tree that is holding the range
    items as tree nodes, each range directly holding a list of bindings.
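
    A hedged sketch of the new node shape (field names are illustrative,
    not necessarily those used in name_table.c):

    #include <linux/rbtree.h>
    #include <linux/list.h>
    #include <linux/types.h>

    struct demo_service_range {
            u32 lower;                 /* first instance covered by the range */
            u32 upper;                 /* last instance covered by the range */
            struct rb_node tree_node;  /* keyed on the range in the rb tree */
            struct list_head all_publ; /* bindings held directly by the range */
    };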

    This, along with a few name changes, improves readability and reduces
    the volume of the code, as well as reducing memory consumption and
    hopefully improving the cache hit rate.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

24 Mar, 2018

3 commits

  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but do
    instead initiate a 1 sec trial period to allow other cluster members
    to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node receives such a message, it must check that the
    presented 32-bit identifier is either unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We add a 128-bit node identity, as an alternative to the currently used
    32-bit node address.

    For the sake of compatibility and to minimize message header changes
    we retain the existing 32-bit address field. When not set explicitly by
    the user, this field will be filled with a hash value generated from the
    much longer node identity, and be used as a shorthand value for the
    latter.

    We permit either the address or the identity to be set by configuration,
    but not both, so when the address value is set by a legacy user the
    corresponding 128-bit node identity is generated based on that value.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • The removal of an internal structure of the node address has an unwanted
    side effect.
    - Currently, if a user is sending an anycast message with destination
    domain 0, the tipc_nametbl_translate() function will use the
    'closest-first' algorithm to first look for a node local destination,
    and only when no such is found, will it resort to the cluster global
    'round-robin' lookup algorithm.
    - Current users can get around this, and enforce unconditional use of
    global round-robin by indicating a destination as Z.0.0 or Z.C.0.
    - This option disappears when we make the node address flat, since the
    lookup algorithm has no way of recognizing this case. So, as long as
    there are node local destinations, the algorithm will always select
    one of those, and there is nothing the sender can do to change this.

    We solve this by eliminating the 'closest-first' option, which was never
    a good idea anyway, for non-legacy users, but only for those. To
    distinguish between legacy users and non-legacy users we introduce a new
    flag 'legacy_addr_format' in struct tipc_core, to be set when the user
    configures a legacy-style Z.C.N node address. Hence, when a legacy user
    indicates a zero lookup domain 'closest-first' is selected, and in all
    other cases we use 'round-robin'.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

18 Mar, 2018

1 commit

  • As a consequence of the previous commit we can now eliminate zone scope
    related lists in the name table. We start with name_table::publ_list[3],
    which can now be replaced with two lists, one for node scope publications
    and one for cluster scope publications.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

17 Feb, 2018

1 commit

  • We rename struct tipc_server to struct tipc_topsrv. This reflects its now
    specialized role as topology server. Accordingly, we change or add function
    prefixes to make it clearer which functionality they belong to.

    There are no functional changes in this commit.

    Acked-by: Ying.Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

09 Jan, 2018

1 commit

  • Preempt counter APIs have been split out, currently, hardirq.h just
    includes irq_enter/exit APIs which are not used by TIPC at all.

    So, remove the unused hardirq.h.

    Signed-off-by: Yang Shi
    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Cc: Jon Maloy
    Cc: "David S. Miller"
    Signed-off-by: David S. Miller

    Yang Shi
     

13 Oct, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
            g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction which isn't necessary if the variable is
    unsigned, because x86_64 zero-extends by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
            ...
            ptr = ng->ptr[id - 1];
            ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation with the register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires a REX
    prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to
    be used, which is longer than [r8].

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

16 Jun, 2016

1 commit

  • TIPC based clusters are by default set up with full-mesh link
    connectivity between all nodes. Those links are expected to provide
    a short failure detection time, by default set to 1500 ms. Because
    of this, the background load for neighbor monitoring in an N-node
    cluster increases with a factor N on each node, while the overall
    monitoring traffic through the network infrastructure increases at
    a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
    scale well beyond ~100 nodes unless we significantly increase failure
    discovery tolerance.

    This commit introduces a framework and an algorithm that drastically
    reduces this background load, while basically maintaining the original
    failure detection times across the whole cluster. Using this algorithm,
    background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
    at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
    now have to actively monitor 38 neighbors in a 400-node cluster, instead
    of as before 399.

    This "Overlapping Ring Supervision Algorithm" is completely distributed
    and employs no centralized or coordinated state. It goes as follows:

    - Each node makes up a linearly ascending, circular list of all its N
    known neighbors, based on their TIPC node identity. This algorithm
    must be the same on all nodes.

    - The node then selects the next M = sqrt(N) - 1 nodes downstream from
    itself in the list, and chooses to actively monitor those. This is
    called its "local monitoring domain" (see the sketch after this list).

    - It creates a domain record describing the monitoring domain, and
    piggy-backs this in the data area of all neighbor monitoring messages
    (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
    the cluster eventually (default within 400 ms) will learn about
    its monitoring domain.

    - Whenever a node discovers a change in its local domain, e.g., a node
    has been added or has gone down, it creates and sends out a new
    version of its node record to inform all neighbors about the change.

    - A node receiving a domain record from anybody outside its local domain
    matches this against its own list (which may not look the same), and
    chooses to not actively monitor those members of the received domain
    record that are also present in its own list. Instead, it relies on
    indications from the direct monitoring nodes if an indirectly
    monitored node has gone up or down. If a node is indicated lost, the
    receiving node temporarily activates its own direct monitoring towards
    that node in order to confirm, or not, that it is actually gone.

    - Since each node is actively monitoring sqrt(N) downstream neighbors,
    each node is also actively monitored by the same number of upstream
    neighbors. This means that all non-direct monitoring nodes normally
    will receive sqrt(N) indications that a node is gone.

    - A major drawback with ring monitoring is how it handles failures that
    cause massive network partitions. If both a lost node and all its
    direct monitoring neighbors are inside the lost partition, the nodes in
    the remaining partition will never receive indications about the loss.
    To overcome this, each node also chooses to actively monitor some
    nodes outside its local domain. Those nodes are called remote domain
    "heads", and are selected in such a way that no node in the cluster
    will be more than two direct monitoring hops away. Because of this,
    each node, apart from monitoring the members of its local domain, will
    also typically monitor sqrt(N) remote head nodes.

    - As an optimization, local list status, domain status and domain
    records are marked with a generation number. This saves senders from
    unnecessarily conveying unaltered domain records, and receivers from
    performing unneeded re-adaptations of their node monitoring list, such
    as re-assigning domain heads.

    - As a measure of caution we have added the possibility to disable the
    new algorithm through configuration. We do this by keeping a threshold
    value for the cluster size; a cluster that grows beyond this value
    will switch from full-mesh to ring monitoring, and vice versa when
    it shrinks below the value. This means that if the threshold is set to
    a value larger than any anticipated cluster size (default size is 32)
    the new algorithm is effectively disabled. A patch set for altering the
    threshold value and for listing the table contents will follow shortly.

    - This change is fully backwards compatible.
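
    A rough, self-contained sketch of the local-domain selection from the
    second point of the list (the real monitor code works on live peer
    structures, so names and types here are purely illustrative):

    #include <linux/sort.h>
    #include <linux/kernel.h>

    static int demo_cmp_addr(const void *a, const void *b)
    {
            u32 x = *(const u32 *)a, y = *(const u32 *)b;

            return x < y ? -1 : x > y;
    }

    /* Pick the next M = sqrt(N) - 1 peers downstream of 'self' in the
     * ascending, circular peer list; returns the number of domain members.
     */
    static int demo_build_domain(u32 self, u32 *peers, int n, u32 *dom)
    {
            int i, start = 0, m = int_sqrt(n) - 1;

            sort(peers, n, sizeof(u32), demo_cmp_addr, NULL);
            while (start < n && peers[start] < self)
                    start++;                /* first peer after 'self' */
            for (i = 0; i < m && i < n; i++)
                    dom[i] = peers[(start + i) % n];
            return i;
    }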

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

12 Apr, 2016

1 commit

  • Nametable updates received from the network that cannot be applied
    immediately are placed on a defer queue. This queue is global to the
    TIPC module, which might cause problems when using TIPC in containers.
    To prevent nametable updates from escaping into the wrong namespace,
    we make the queue pernet instead.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     

21 Nov, 2015

1 commit

  • The file name_distr.c currently contains three functions,
    named_cluster_distribute(), tipc_publ_subscribe() and
    tipc_publ_unsubscribe() that all directly access fields in
    struct tipc_node. We want to eliminate such dependencies, so
    we move those functions to the file node.c and rename them to
    tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
    respectively.

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

24 Oct, 2015

4 commits

  • After the previous changes in this series, we can now remove some
    unused code and structures, both in the broadcast, link aggregation
    and link code.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The code path for receiving broadcast packets is currently distinct
    from the unicast path. This leads to unnecessary code and data
    duplication, something that can be avoided with some effort.

    We now introduce separate per-peer tipc_link instances for handling
    broadcast packet reception. Each receive link keeps a pointer to the
    common, single, broadcast link instance, and can hence handle release
    and retransmission of send buffers as if they belonged to their own
    instance.

    Furthermore, we let each unicast link instance keep a reference to both
    the pertaining broadcast receive link, and to the common send link.
    This makes it possible for the unicast links to easily access data for
    broadcast link synchronization, as well as for carrying acknowledgments
    for received broadcast packets.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The broadcast lock will need to be acquired outside bcast.c in a later
    commit. For this reason, we move the lock to struct tipc_net. Consistent
    with the changes in the previous commit, we also introduce two new
    functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
    currently using tipc_bclink_lock()/unlock() will be phased out during
    the coming commits in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Currently, a number of structure and function definitions related
    to the broadcast functionality are unnecessarily exposed in the file
    bcast.h. This obscures the fact that the external interface towards
    the broadcast link in fact is very narrow, and causes unnecessary
    recompilations of other files when anything changes in those
    definitions.

    In this commit, we move as many of those definitions as is currently
    possible to the file bcast.c.

    We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
    since the name does not correctly describe the contents of this
    struct, and will do so even less in the future, and because we want
    to use the term 'link' more appropriately in the functionality
    introduced later in this series.

    Finally, we rename a couple of functions, such as tipc_bclink_xmit()
    and others that will be kept in the future, to include the term 'bcast'
    instead.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

31 Jul, 2015

1 commit

  • We simplify the link creation function tipc_link_create() and the way
    the link struct is connected to the node struct. In particular, we
    remove the duplicate initialization of some fields which are anyway set
    in tipc_link_reset().

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2015

1 commit

  • We convert packet/message reception according to the same principle
    we have been using for message sending and timeout handling:

    We move the function tipc_rcv() to node.c, hence handling the initial
    packet reception at the link aggregation level. The function grabs
    the node lock, selects the receiving link, and accesses it via a new
    call tipc_link_rcv(). This function appends buffers to the input
    queue for delivery upwards, but it may also append outgoing packets
    to the xmit queue, just as we do during regular message sending. The
    latter will happen when buffers are forwarded from the link backlog,
    or when retransmission is requested.

    Upon return of this function, and after having released the node lock,
    tipc_rcv() delivers/transmits the contents of those queues, but it may
    also perform actions such as link activation or reset, as indicated by
    the return flags from the link.

    This reduces the number of cpu cycles spent inside the node spinlock,
    and reduces contention on that lock.
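
    The locking pattern described here can be sketched as below;
    demo_link_rcv() stands in for tipc_link_rcv() and everything else is
    illustrative:

    #include <linux/skbuff.h>
    #include <linux/spinlock.h>

    struct demo_node {
            spinlock_t lock;
            /* ... per-node link state ... */
    };

    /* Stand-in for tipc_link_rcv(): would append buffers for upward
     * delivery to inputq and protocol/retransmit packets to xmitq.
     */
    static void demo_link_rcv(struct demo_node *n, struct sk_buff *skb,
                              struct sk_buff_head *inputq,
                              struct sk_buff_head *xmitq)
    {
            __skb_queue_tail(inputq, skb);
    }

    static void demo_node_rcv(struct demo_node *n, struct sk_buff *skb)
    {
            struct sk_buff_head inputq, xmitq;

            __skb_queue_head_init(&inputq);
            __skb_queue_head_init(&xmitq);

            spin_lock_bh(&n->lock);
            demo_link_rcv(n, skb, &inputq, &xmitq); /* may fill both queues */
            spin_unlock_bh(&n->lock);

            /* deliver inputq upwards and transmit xmitq outside the lock,
             * keeping time spent inside the node spinlock to a minimum
             */
    }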

    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 May, 2015

2 commits

  • Although the sequence number in the TIPC protocol is 16 bits, we have
    until now stored it internally as an unsigned 32-bit integer.
    We got around this by always doing explicit modulo-65535 operations
    whenever we need to access a sequence number.

    We now make the incoming and outgoing sequence numbers unsigned 16-bit
    integers, and remove the modulo operations where applicable.

    We also move the arithmetic inline functions for 16 bit integers
    to core.h, and the function buf_seqno() to msg.h, so they can easily
    be accessed from anywhere in the code.
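
    For illustration, helpers of this kind could look like the following
    (names are made up; with u16 operands the 2^16 wrap-around is handled
    by ordinary unsigned arithmetic):

    #include <linux/types.h>

    static inline int demo_seq_less(u16 left, u16 right)
    {
            return (u16)(right - left) < 0x8000 && left != right;
    }

    static inline u16 demo_seq_gap(u16 earlier, u16 later)
    {
            return later - earlier;         /* wraps naturally in 16 bits */
    }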

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • When we try to add new inline functions in the code, we sometimes
    run into circular include dependencies.

    The main problem is that the file core.h, which really should be at
    the root of the dependency chain, instead is a leaf. I.e., core.h
    includes a number of header files that themselves should be allowed
    to include core.h. In reality this is unnecessary, because core.h does
    not need to know the full definition of any of the structs it refers to,
    only their type declaration.
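
    In other words, a plain type declaration is enough for a header that
    only stores pointers; a trivial illustration (the struct name
    demo_core_ctx is made up):

    struct tipc_node;                       /* declaration only, no definition */
    struct tipc_bearer;

    struct demo_core_ctx {
            struct tipc_node *self;         /* pointers compile without the full
                                             * struct definitions being visible */
            struct tipc_bearer *bearer;
    };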

    In this commit, we remove all dependencies from core.h towards any
    other tipc header file.

    As a consequence of this change, we can now move the function
    tipc_own_addr(net) from addr.c to addr.h, and make it inline.

    There are no functional changes in this commit.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

10 Feb, 2015

2 commits

  • tipc_snprintf() was heavily utilized by the old netlink API which no
    longer exists (now netlink compat).

    In this patch we replace tipc_snprintf() with the equivalent scnprintf() in
    the only remaining occurrence.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • The new netlink API is no longer "v2" but rather the standard API and
    the legacy API is now "nl compat". We split them into separate
    start/stop and put them in different files in order to further
    distinguish them.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     

13 Jan, 2015

11 commits

  • After namespace support is added, each namespace should own its
    private random value. So the global variable representing the random
    value must be moved to the tipc_net structure.
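
    The pattern used throughout this series can be sketched as follows
    ('demo_net' and 'demo_net_id' stand in for tipc_net and tipc_net_id;
    this is an illustration, not the actual code):

    #include <net/net_namespace.h>
    #include <net/netns/generic.h>

    static unsigned int demo_net_id __read_mostly;

    struct demo_net {
            u32 random;             /* previously a module-wide global */
    };

    static u32 demo_random(struct net *net)
    {
            struct demo_net *dn = net_generic(net, demo_net_id);

            return dn->random;      /* each namespace sees its own value */
    }

    The per-net struct itself would be registered once, e.g. with
    register_pernet_subsys(), which allocates an instance for every
    namespace that is created.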

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC establishes one subscriber server which allows users to subscribe
    to the status of the name services they are interested in. After tipc
    supports namespaces, one dedicated tipc stack instance is created for
    each namespace, and each instance can be deemed an independent TIPC
    node. As a result, a subscriber server must be built for each namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • If net namespace is supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable holding the node address must be moved to the tipc_net
    structure to satisfy this requirement. As a result, users can also
    assign a node address for every namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC name table is used to store the mapping relationship between
    TIPC service names and socket port IDs. When tipc supports namespaces,
    it allows users to publish service names owned only by a certain
    namespace. Therefore, every namespace must have its private name
    table to prevent service names published in one namespace from being
    contaminated by service names in another namespace. Hence, the
    name table global variable (ie, nametbl) and its lock must be
    moved to the tipc_net structure, and a namespace parameter must be
    added to the necessary functions so that they can obtain the name
    table defined in the tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Currently the tipc socket table is statically allocated as a global
    variable. Through it, we can look up a socket instance by port ID,
    insert a new socket instance into the table, and delete a socket from
    the table. But when tipc supports net namespaces, each namespace must
    own its specific socket table. So the global socket table variable
    must be redefined in the tipc_net structure. As a consequence, a new
    socket table will be allocated when a new namespace is created, and
    a socket table will be deallocated when a namespace is destroyed.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC broadcast link is statically established and its relevant states
    are maintained with the global variables: "bcbearer", "bclink" and
    "bcl". Allowing different namespace to own different broadcast link
    instances, these variables must be moved to tipc_net structure and
    broadcast link instances would be allocated and initialized when
    namespace is created.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • The bearer list, defined as a global variable, is used to store bearer
    instances. When tipc supports net namespaces, bearers created in
    one namespace must be isolated from those allocated in other
    namespaces, which requires that the bearer list (bearer_list)
    be moved to the tipc_net structure. As a result, a net namespace
    pointer has to be passed to functions which access the bearer list.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Global variables associated with the node table are listed below:
    - node hash table (node_htable)
    - node list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make the node table support namespaces, the above global variables
    must be moved to the tipc_net structure in order to keep them private
    to each namespace. As a consequence, these variables are allocated and
    initialized when a namespace is created, and deallocated when the
    namespace is destroyed. After the change, functions associated with
    these variables have to use a namespace pointer to access them, so
    adding a namespace pointer as a parameter to these functions is the
    major change made in this commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Involving the namespace infrastructure, make the "tipc_net_id" global
    variable per-namespace, and rename it to "net_id". In order for the
    conversion to be successful, an instance of a networking namespace
    must be passed to the relevant functions, allowing them to access the
    per-namespace "net_id" variable.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Only the work of initializing and shutting down the tipc module is
    done in the core.h and core.c files, so everything that is not closely
    associated with those two tasks should be moved to more appropriate
    places.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Not only is a wrapper function like k_term_timer() empty, but others,
    including k_start_timer() and k_cancel_timer(), don't return any value
    to their caller; moreover, no other component in the kernel does such
    a thing. Therefore, these timer interfaces defined in the tipc module
    should be purged.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Jan, 2015

1 commit

  • As the tipc reference table is statically allocated, the memory it
    requests at stack initialization is quite big even though the maximum
    port number is currently restricted to 8191, a number that has already
    become insufficient in practice. But if the maximum number of ports
    were allowed to reach its theoretical value of 2^32, the consumed
    memory size would become ridiculously unacceptable. Apart from
    this, heavy tipc users spend a considerable amount of time in
    tipc_sk_get() due to the read-lock on ref_table_lock.

    If the tipc reference table is converted to the generic rhashtable,
    both of the above disadvantages are resolved: making use of the new
    resizable hash table avoids locking on lookup, and a smaller memory
    size is required at the initial stage; for example, 256 hash bucket
    slots are requested at the beginning instead of allocating the entire
    8191 slots as in the old mode. The hash table will grow if entries
    exceed 75% of the table size, up to a total table size of 1M, and it
    will automatically shrink if usage falls below 30%, but the minimum
    table size is limited to 256.
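
    A hedged sketch of rhashtable parameters matching the sizes quoted
    above (the struct layout and names are illustrative, not the actual
    tipc_sock):

    #include <linux/rhashtable.h>

    struct demo_sock {
            u32 portid;                     /* lookup key */
            struct rhash_head node;
            struct rcu_head rcu;            /* freeing deferred via call_rcu() */
    };

    static const struct rhashtable_params demo_rht_params = {
            .nelem_hint = 192,              /* ~256 bucket slots initially */
            .head_offset = offsetof(struct demo_sock, node),
            .key_offset = offsetof(struct demo_sock, portid),
            .key_len = sizeof(u32),
            .max_size = 1048576,            /* grow (above 75% load) up to 1M */
            .min_size = 256,                /* never shrink below 256 */
            .automatic_shrinking = true,    /* shrink when load drops below 30% */
    };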

    Also converts ref_table_lock to a separate mutex to protect hash table
    mutations on write side. Lastly defers the release of the socket
    reference using call_rcu() to allow using an RCU read-side protected
    call to rhashtable_lookup().

    Signed-off-by: Ying Xue
    Acked-by: Jon Maloy
    Acked-by: Erik Hugne
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue