18 Dec, 2019

1 commit

  • [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ]

    In order to set/get/dump, the tipc uses the generic netlink
    infrastructure. So, when tipc module is inserted, init function
    calls genl_register_family().
    After genl_register_family(), set/get/dump commands are immediately
    allowed and these callbacks internally use the net_generic.
    net_generic is allocated by register_pernet_device() but this
    is called after genl_register_family() in the __init function.
    So, these callbacks would use un-initialized net_generic.

    Test commands:
    #SHELL1
    while :
    do
    modprobe tipc
    modprobe -rv tipc
    done

    #SHELL2
    while :
    do
    tipc link list
    done

    Splat looks like:
    [ 59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
    [ 59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ 59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
    [ 59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
    [ 59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
    [ 59.622550][ T2780] NET: Registered protocol family 30
    [ 59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
    [ 59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
    [ 59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
    [ 59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
    [ 59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
    [ 59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
    [ 59.624639][ T2788] FS: 00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
    [ 59.624645][ T2788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 59.625875][ T2780] tipc: Started in single node mode
    [ 59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
    [ 59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 59.636478][ T2788] Call Trace:
    [ 59.637025][ T2788] tipc_nl_add_bc_link+0x179/0x1470 [tipc]
    [ 59.638219][ T2788] ? lock_downgrade+0x6e0/0x6e0
    [ 59.638923][ T2788] ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
    [ 59.639533][ T2788] ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
    [ 59.640160][ T2788] ? mutex_lock_io_nested+0x1380/0x1380
    [ 59.640746][ T2788] tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
    [ 59.641356][ T2788] ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
    [ 59.642088][ T2788] ? __skb_ext_del+0x270/0x270
    [ 59.642594][ T2788] genl_lock_dumpit+0x85/0xb0
    [ 59.643050][ T2788] netlink_dump+0x49c/0xed0
    [ 59.643529][ T2788] ? __netlink_sendskb+0xc0/0xc0
    [ 59.644044][ T2788] ? __netlink_dump_start+0x190/0x800
    [ 59.644617][ T2788] ? __mutex_unlock_slowpath+0xd0/0x670
    [ 59.645177][ T2788] __netlink_dump_start+0x5a0/0x800
    [ 59.645692][ T2788] genl_rcv_msg+0xa75/0xe90
    [ 59.646144][ T2788] ? __lock_acquire+0xdfe/0x3de0
    [ 59.646692][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.647340][ T2788] ? genl_lock_dumpit+0xb0/0xb0
    [ 59.647821][ T2788] ? genl_unlock+0x20/0x20
    [ 59.648290][ T2788] ? genl_parallel_done+0xe0/0xe0
    [ 59.648787][ T2788] ? find_held_lock+0x39/0x1d0
    [ 59.649276][ T2788] ? genl_rcv+0x15/0x40
    [ 59.649722][ T2788] ? lock_contended+0xcd0/0xcd0
    [ 59.650296][ T2788] netlink_rcv_skb+0x121/0x350
    [ 59.650828][ T2788] ? genl_family_rcv_msg_attrs_parse+0x320/0x320
    [ 59.651491][ T2788] ? netlink_ack+0x940/0x940
    [ 59.651953][ T2788] ? lock_acquire+0x164/0x3b0
    [ 59.652449][ T2788] genl_rcv+0x24/0x40
    [ 59.652841][ T2788] netlink_unicast+0x421/0x600
    [ ... ]

    Fixes: 7e4369057806 ("tipc: fix a slab object leak")
    Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
    Signed-off-by: Taehee Yoo
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

15 Nov, 2019

1 commit

  • The tipc prefix for log messages generated by tipc was
    removed in commit 07f6c4bc048a ("tipc: convert tipc reference
    table to use generic rhashtable").

    This is still a useful prefix so add it back.

    Signed-off-by: Matt Bennett
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Matt Bennett
     

09 Aug, 2019

1 commit

  • Since node internal messages are passed directly to the socket, it is not
    possible to observe those messages via tcpdump or wireshark.

    We now remedy this by making it possible to clone such messages and send
    the clones to the loopback interface. The clones are dropped at reception
    and have no functional role except making the traffic visible.

    The feature is enabled if network taps are active for the loopback device.
    pcap filtering restrictions require the messages to be presented to the
    receiving side of the loopback device.

    v3 - Function dev_nit_active used to check for network taps.
    - Procedure netif_rx_ni used to send cloned messages to loopback device.

    Signed-off-by: John Rutherford
    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    John Rutherford
     

23 Jun, 2019

1 commit

  • This patch is to fix a dst defcnt leak, which can be reproduced by doing:

    # ip net a c; ip net a s; modprobe tipc
    # ip net e s ip l a n eth1 type veth peer n eth1 netns c
    # ip net e c ip l s lo up; ip net e c ip l s eth1 up
    # ip net e s ip l s lo up; ip net e s ip l s eth1 up
    # ip net e c ip a a 1.1.1.2/8 dev eth1
    # ip net e s ip a a 1.1.1.1/8 dev eth1
    # ip net e c tipc b e m udp n u1 localip 1.1.1.2
    # ip net e s tipc b e m udp n u1 localip 1.1.1.1
    # ip net d c; ip net d s; rmmod tipc

    and it will get stuck and keep logging the error:

    unregister_netdevice: waiting for lo to become free. Usage count = 1

    The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx
    path with udp_early_demux == 1, and this dst (eventually holding lo dev)
    can't be released as bearer's removal in tipc pernet .exit happens after
    lo dev's removal, default_device pernet .exit.

    "There are two distinct types of pernet_operations recognized: subsys and
    device. At creation all subsys init functions are called before device
    init functions, and at destruction all device exit functions are called
    before subsys exit function."

    So by calling register_pernet_device instead to register tipc_net_ops, the
    pernet .exit() will be invoked earlier than loopback dev's removal when a
    netns is being destroyed, as fou/gue does.

    Note that vxlan and geneve udp tunnels don't have this issue, as the udp
    sock is released in their device ndo_stop().

    This fix is also necessary for tipc dst_cache, which will hold dsts on tx
    path and I will introduce in my next patch.

    Reported-by: Li Shuang
    Signed-off-by: Xin Long
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Xin Long
     

21 May, 2019

1 commit

  • Error message printed:
    modprobe: ERROR: could not insert 'tipc': Address family not
    supported by protocol.
    when modprobe tipc after the following patch: switch order of
    device registration, commit 7e27e8d6130c
    ("tipc: switch order of device registration to fix a crash")

    Because sock_create_kern(net, AF_TIPC, ...) called by
    tipc_topsrv_create_listener() in the initialization process
    of tipc_init_net(), so tipc_socket_init() must be execute before that.
    Meanwhile, tipc_net_id need to be initialized when sock_create()
    called, and tipc_socket_init() is no need to be called for each namespace.

    I add a variable tipc_topsrv_net_ops, and split the
    register_pernet_subsys() of tipc into two parts, and split
    tipc_socket_init() with initialization of pernet params.

    By the way, I fixed resources rollback error when tipc_bcast_init()
    failed in tipc_init_net().

    Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash")
    Signed-off-by: Junwei Hu
    Reported-by: Wang Wang
    Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com
    Reviewed-by: Kang Zhou
    Reviewed-by: Suanming Mou
    Signed-off-by: David S. Miller

    Junwei Hu
     

18 May, 2019

2 commits

  • This reverts commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e.

    More revisions coming up.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Error message printed:
    modprobe: ERROR: could not insert 'tipc': Address family not
    supported by protocol.
    when modprobe tipc after the following patch: switch order of
    device registration, commit 7e27e8d6130c
    ("tipc: switch order of device registration to fix a crash")

    Because sock_create_kern(net, AF_TIPC, ...) is called by
    tipc_topsrv_create_listener() in the initialization process
    of tipc_net_ops, tipc_socket_init() must be execute before that.

    I move tipc_socket_init() into function tipc_init_net().

    Fixes: 7e27e8d6130c
    ("tipc: switch order of device registration to fix a crash")
    Signed-off-by: Junwei Hu
    Reported-by: Wang Wang
    Reviewed-by: Kang Zhou
    Reviewed-by: Suanming Mou
    Signed-off-by: David S. Miller

    Junwei Hu
     

17 May, 2019

1 commit

  • When tipc is loaded while many processes try to create a TIPC socket,
    a crash occurs:
    PANIC: Unable to handle kernel paging request at virtual
    address "dfff20000000021d"
    pc : tipc_sk_create+0x374/0x1180 [tipc]
    lr : tipc_sk_create+0x374/0x1180 [tipc]
    Exception class = DABT (current EL), IL = 32 bits
    Call trace:
    tipc_sk_create+0x374/0x1180 [tipc]
    __sock_create+0x1cc/0x408
    __sys_socket+0xec/0x1f0
    __arm64_sys_socket+0x74/0xa8
    ...

    This is due to race between sock_create and unfinished
    register_pernet_device. tipc_sk_insert tries to do
    "net_generic(net, tipc_net_id)".
    but tipc_net_id is not initialized yet.

    So switch the order of the two to close the race.

    This can be reproduced with multiple processes doing socket(AF_TIPC, ...)
    and one process doing module removal.

    Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
    Signed-off-by: Junwei Hu
    Reported-by: Wang Wang
    Reviewed-by: Xiaogang Wang
    Signed-off-by: David S. Miller

    Junwei Hu
     

20 Mar, 2019

1 commit

  • As a preparation for introducing a smooth switching between replicast
    and broadcast method for multicast message, We have to introduce a new
    capability flag TIPC_MCAST_RBCTL to handle this new feature.

    During a cluster upgrade a node can come back with this new capabilities
    which also must be reflected in the cluster capabilities field.
    The new feature is only applicable if all node in the cluster supports
    this new capability.

    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     

28 Mar, 2018

1 commit


24 Mar, 2018

2 commits

  • When a 32-bit node address is generated from a 128-bit identifier,
    there is a risk of collisions which must be discovered and handled.

    We do this as follows:
    - We don't apply the generated address immediately to the node, but do
    instead initiate a 1 sec trial period to allow other cluster members
    to discover and handle such collisions.

    - During the trial period the node periodically sends out a new type
    of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
    to all the other nodes in the cluster.

    - When a node is receiving such a message, it must check that the
    presented 32-bit identifier either is unused, or was used by the very
    same peer in a previous session. In both cases it accepts the request
    by not responding to it.

    - If it finds that the same node has been up before using a different
    address, it responds with a DSC_TRIAL_FAIL_MSG containing that
    address.

    - If it finds that the address has already been taken by some other
    node, it generates a new, unused address and returns it to the
    requester.

    - During the trial period the requesting node must always be prepared
    to accept a failure message, i.e., a message where a peer suggests a
    different (or equal) address to the one tried. In those cases it
    must apply the suggested value as trial address and restart the trial
    period.

    This algorithm ensures that in the vast majority of cases a node will
    have the same address before and after a reboot. If a legacy user
    configures the address explicitly, there will be no trial period and
    messages, so this protocol addition is completely backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     
  • We add a 128-bit node identity, as an alternative to the currently used
    32-bit node address.

    For the sake of compatibility and to minimize message header changes
    we retain the existing 32-bit address field. When not set explicitly by
    the user, this field will be filled with a hash value generated from the
    much longer node identity, and be used as a shorthand value for the
    latter.

    We permit either the address or the identity to be set by configuration,
    but not both, so when the address value is set by a legacy user the
    corresponding 128-bit node identity is generated based on the that value.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

13 Mar, 2018

1 commit

  • TIPC looks concentrated in itself, and other pernet_operations
    seem not touching its entities.

    tipc_net_ops look pernet-divided, and they should be safe to
    be executed in parallel for several net the same time.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

16 Jun, 2016

1 commit

  • TIPC based clusters are by default set up with full-mesh link
    connectivity between all nodes. Those links are expected to provide
    a short failure detection time, by default set to 1500 ms. Because
    of this, the background load for neighbor monitoring in an N-node
    cluster increases with a factor N on each node, while the overall
    monitoring traffic through the network infrastructure increases at
    a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
    scale well beyond ~100 nodes unless we significantly increase failure
    discovery tolerance.

    This commit introduces a framework and an algorithm that drastically
    reduces this background load, while basically maintaining the original
    failure detection times across the whole cluster. Using this algorithm,
    background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
    at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
    now have to actively monitor 38 neighbors in a 400-node cluster, instead
    of as before 399.

    This "Overlapping Ring Supervision Algorithm" is completely distributed
    and employs no centralized or coordinated state. It goes as follows:

    - Each node makes up a linearly ascending, circular list of all its N
    known neighbors, based on their TIPC node identity. This algorithm
    must be the same on all nodes.

    - The node then selects the next M = sqrt(N) - 1 nodes downstream from
    itself in the list, and chooses to actively monitor those. This is
    called its "local monitoring domain".

    - It creates a domain record describing the monitoring domain, and
    piggy-backs this in the data area of all neighbor monitoring messages
    (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
    the cluster eventually (default within 400 ms) will learn about
    its monitoring domain.

    - Whenever a node discovers a change in its local domain, e.g., a node
    has been added or has gone down, it creates and sends out a new
    version of its node record to inform all neighbors about the change.

    - A node receiving a domain record from anybody outside its local domain
    matches this against its own list (which may not look the same), and
    chooses to not actively monitor those members of the received domain
    record that are also present in its own list. Instead, it relies on
    indications from the direct monitoring nodes if an indirectly
    monitored node has gone up or down. If a node is indicated lost, the
    receiving node temporarily activates its own direct monitoring towards
    that node in order to confirm, or not, that it is actually gone.

    - Since each node is actively monitoring sqrt(N) downstream neighbors,
    each node is also actively monitored by the same number of upstream
    neighbors. This means that all non-direct monitoring nodes normally
    will receive sqrt(N) indications that a node is gone.

    - A major drawback with ring monitoring is how it handles failures that
    cause massive network partitionings. If both a lost node and all its
    direct monitoring neighbors are inside the lost partition, the nodes in
    the remaining partition will never receive indications about the loss.
    To overcome this, each node also chooses to actively monitor some
    nodes outside its local domain. Those nodes are called remote domain
    "heads", and are selected in such a way that no node in the cluster
    will be more than two direct monitoring hops away. Because of this,
    each node, apart from monitoring the member of its local domain, will
    also typically monitor sqrt(N) remote head nodes.

    - As an optimization, local list status, domain status and domain
    records are marked with a generation number. This saves senders from
    unnecessarily conveying unaltered domain records, and receivers from
    performing unneeded re-adaptations of their node monitoring list, such
    as re-assigning domain heads.

    - As a measure of caution we have added the possibility to disable the
    new algorithm through configuration. We do this by keeping a threshold
    value for the cluster size; a cluster that grows beyond this value
    will switch from full-mesh to ring monitoring, and vice versa when
    it shrinks below the value. This means that if the threshold is set to
    a value larger than any anticipated cluster size (default size is 32)
    the new algorithm is effectively disabled. A patch set for altering the
    threshold value and for listing the table contents will follow shortly.

    - This change is fully backwards compatible.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

04 May, 2016

1 commit

  • There are two flow control mechanisms in TIPC; one at link level that
    handles network congestion, burst control, and retransmission, and one
    at connection level which' only remaining task is to prevent overflow
    in the receiving socket buffer. In TIPC, the latter task has to be
    solved end-to-end because messages can not be thrown away once they
    have been accepted and delivered upwards from the link layer, i.e, we
    can never permit the receive buffer to overflow.

    Currently, this algorithm is message based. A counter in the receiving
    socket keeps track of number of consumed messages, and sends a dedicated
    acknowledge message back to the sender for each 256 consumed message.
    A counter at the sending end keeps track of the sent, not yet
    acknowledged messages, and blocks the sender if this number ever reaches
    512 unacknowledged messages. When the missing acknowledge arrives, the
    socket is then woken up for renewed transmission. This works well for
    keeping the message flow running, as it almost never happens that a
    sender socket is blocked this way.

    A problem with the current mechanism is that it potentially is very
    memory consuming. Since we don't distinguish between small and large
    messages, we have to dimension the socket receive buffer according
    to a worst-case of both. I.e., the window size must be chosen large
    enough to sustain a reasonable throughput even for the smallest
    messages, while we must still consider a scenario where all messages
    are of maximum size. Hence, the current fix window size of 512 messages
    and a maximum message size of 66k results in a receive buffer of 66 MB
    when truesize(66k) = 131k is taken into account. It is possible to do
    much better.

    This commit introduces an algorithm where we instead use 1024-byte
    blocks as base unit. This unit, always rounded upwards from the
    actual message size, is used when we advertise windows as well as when
    we count and acknowledge transmitted data. The advertised window is
    based on the configured receive buffer size in such a way that even
    the worst-case truesize/msgsize ratio always is covered. Since the
    smallest possible message size (from a flow control viewpoint) now is
    1024 bytes, we can safely assume this ratio to be less than four, which
    is the value we are now using.

    This way, we have been able to reduce the default receive buffer size
    from 66 MB to 2 MB with maintained performance.

    In order to keep this solution backwards compatible, we introduce a
    new capability bit in the discovery protocol, and use this throughout
    the message sending/reception path to always select the right unit.

    Acked-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

12 Apr, 2016

1 commit

  • Nametable updates received from the network that cannot be applied
    immediately are placed on a defer queue. This queue is global to the
    TIPC module, which might cause problems when using TIPC in containers.
    To prevent nametable updates from escaping into the wrong namespace,
    we make the queue pernet instead.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     

24 Oct, 2015

1 commit

  • The broadcast transmission link is currently instantiated when the
    network subsystem is started, i.e., on order from user space via netlink.

    This forces the broadcast transmission code to do unnecessary tests for
    the existence of the transmission link, as well in single mode node as
    in network mode.

    In this commit, we do instead create the link during initialization of
    the name space, and remove it when it is stopped. The fact that the
    transmission link now has a guaranteed longer life cycle than any of its
    potential clients paves the way for further code simplifcations
    and optimizations.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

05 May, 2015

1 commit

  • When a topology server accepts a connection request from its client,
    it allocates a connection instance and a tipc_subscriber structure
    object. The former is used to communicate with client, and the latter
    is often treated as a subscriber which manages all subscription events
    requested from a same client. When a topology server receives a request
    of subscribing name services from a client through the connection, it
    creates a tipc_subscription structure instance which is seen as a
    subscription recording what name services are subscribed. In order to
    manage all subscriptions from a same client, topology server links
    them into the subscrp_list of the subscriber. So subscriber and
    subscription completely represents different meanings respectively,
    but function names associated with them make us so confused that we
    are unable to easily tell which function is against subscriber and
    which is to subscription. So we want to eliminate the confusion by
    renaming them.

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

01 Apr, 2015

1 commit

  • When remove TIPC module, there is a warning to remind us that a slab
    object is leaked like:

    root@localhost:~# rmmod tipc
    [ 19.056226] =============================================================================
    [ 19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
    [ 19.058736] -----------------------------------------------------------------------------
    [ 19.058736]
    [ 19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
    [ 19.061915] INFO: Object 0xffff880014668000 @offset=0
    [ 19.062717] kmem_cache_destroy TIPC: Slab cache still has objects

    This is because the listening socket of TIPC topology server is not
    closed before TIPC proto handler is unregistered with proto_unregister().
    However, as the socket is closed in tipc_exit_net() which is called by
    unregister_pernet_subsys() during unregistering TIPC namespace operation,
    the warning can be eliminated if calling unregister_pernet_subsys() is
    moved before calling proto_unregister().

    Fixes: e05b31f4bf89 ("tipc: make tipc socket support net namespace")
    Reviewed-by: Erik Hugne
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

10 Feb, 2015

2 commits

  • Add TIPC_CMD_NOOP to compat layer and remove the old framework.

    All legacy nl commands are now converted to the compat layer in
    netlink_compat.c.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     
  • The new netlink API is no longer "v2" but rather the standard API and
    the legacy API is now "nl compat". We split them into separate
    start/stop and put them in different files in order to further
    distinguish them.

    Signed-off-by: Richard Alpe
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Richard Alpe
     

13 Jan, 2015

9 commits

  • After namespace is supported, each namespace should own its private
    random value. So the global variable representing the random value
    must be moved to tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC establishes one subscriber server which allows users to subscribe
    their interesting name service status. After tipc supports namespace,
    one dedicated tipc stack instance is created for each namespace, and
    each instance can be deemed as one independent TIPC node. As a result,
    subscriber server must be built for each namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • If net namespace is supported in tipc, each namespace will be treated
    as a separate tipc node. Therefore, every namespace must own its
    private tipc node address. This means the "tipc_own_addr" global
    variable of node address must be moved to tipc_net structure to
    satisfy the requirement. It's turned out that users also can assign
    node address for every namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC name table is used to store the mapping relationship between
    TIPC service name and socket port ID. When tipc supports namespace,
    it allows users to publish service names only owned by a certain
    namespace. Therefore, every namespace must have its private name
    table to prevent service names published to one namespace from being
    contaminated by other service names in another namespace. Therefore,
    The name table global variable (ie, nametbl) and its lock must be
    moved to tipc_net structure, and a parameter of namespace must be
    added for necessary functions so that they can obtain name table
    variable defined in tipc_net structure.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Now tipc socket table is statically allocated as a global variable.
    Through it, we can look up one socket instance with port ID, insert
    a new socket instance to the table, and delete a socket from the
    table. But when tipc supports net namespace, each namespace must own
    its specific socket table. So the global variable of socket table
    must be redefined in tipc_net structure. As a concequence, a new
    socket table will be allocated when a new namespace is created, and
    a socket table will be deallocated when namespace is destroyed.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Global variables associated with node table are below:
    - node table list (node_htable)
    - node hash table list (tipc_node_list)
    - node table lock (node_list_lock)
    - node number counter (tipc_num_nodes)
    - node link number counter (tipc_num_links)

    To make node table support namespace, above global variables must be
    moved to tipc_net structure in order to keep secret for different
    namespaces. As a consequence, these variables are allocated and
    initialized when namespace is created, and deallocated when namespace
    is destroyed. After the change, functions associated with these
    variables have to utilize a namespace pointer to access them. So
    adding namespace pointer as a parameter of these functions is the
    major change made in the commit.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Involve namespace infrastructure, make the "tipc_net_id" global
    variable aware of per namespace, and rename it to "net_id". In
    order that the conversion can be successfully done, an instance
    of networking namespace must be passed to relevant functions,
    allowing them to access the "net_id" variable of per namespace.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Only the works of initializing and shutting down tipc module are done
    in core.h and core.c files, so all stuffs which are not closely
    associated with the two tasks should be moved to appropriate places.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Remove redundant wrapper functions like tipc_core_start() and
    tipc_core_stop(), and directly move them to their callers, such
    as tipc_init() and tipc_exit(), having us clearly know what are
    really done in both initialization and deinitialzation functions.

    Signed-off-by: Ying Xue
    Tested-by: Tero Aho
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

09 Jan, 2015

1 commit

  • As tipc reference table is statically allocated, its memory size
    requested on stack initialization stage is quite big even if the
    maximum port number is just restricted to 8191 currently, however,
    the number already becomes insufficient in practice. But if the
    maximum ports is allowed to its theory value - 2^32, its consumed
    memory size will reach a ridiculously unacceptable value. Apart from
    this, heavy tipc users spend a considerable amount of time in
    tipc_sk_get() due to the read-lock on ref_table_lock.

    If tipc reference table is converted with generic rhashtable, above
    mentioned both disadvantages would be resolved respectively: making
    use of the new resizable hash table can avoid locking on the lookup;
    smaller memory size is required at initial stage, for example, 256
    hash bucket slots are requested at the beginning phase instead of
    allocating the entire 8191 slots in old mode. The hash table will
    grow if entries exceeds 75% of table size up to a total table size
    of 1M, and it will automatically shrink if usage falls below 30%,
    but the minimum table size is allowed down to 256.

    Also converts ref_table_lock to a separate mutex to protect hash table
    mutations on write side. Lastly defers the release of the socket
    reference using call_rcu() to allow using an RCU read-side protected
    call to rhashtable_lookup().

    Signed-off-by: Ying Xue
    Acked-by: Jon Maloy
    Acked-by: Erik Hugne
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Ying Xue
     

24 Aug, 2014

2 commits

  • The reference table is now 'socket aware' instead of being generic,
    and has in reality become a socket internal table. In order to be
    able to minimize the API exposed by the socket layer towards the rest
    of the stack, we now move the reference table definitions and functions
    into the file socket.c, and rename the functions accordingly.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We move the inline functions in the file port.h to socket.c, and modify
    their names accordingly.

    We move struct tipc_port and some macros to socket.h.

    Finally, we remove the file port.h.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

15 May, 2014

1 commit

  • Memory overhead when allocating big buffers for data transfer may
    be quite significant. E.g., truesize of a 64 KB buffer turns out
    to be 132 KB, 2 x the requested size.

    This invalidates the "worst case" calculation we have been
    using to determine the default socket receive buffer limit,
    which is based on the assumption that 1024x64KB = 67MB buffers
    may be queued up on a socket.

    Since TIPC connections cannot survive hitting the buffer limit,
    we have to compensate for this overhead.

    We do that in this commit by dividing the fix connection flow
    control window from 1024 (2*512) messages to 512 (2*256). Since
    older version nodes send out acks at 512 message intervals,
    compatibility with such nodes is guaranteed, although performance
    may be non-optimal in such cases.

    Signed-off-by: Jon Maloy
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

06 May, 2014

1 commit

  • In the previous commits of this series, we removed all asynchronous
    actions which were based on the tasklet handler - "tipc_k_signal()".

    So the moment has now come when we can completely remove the tasklet
    handler infrastructure. That is done with this commit.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

28 Mar, 2014

1 commit

  • Due to the lacking of any credential, it's allowed to accept commands
    requested from remote nodes to query the local node status, which is
    prone to involve potential security risks. Instead, if we login to
    a remote node with ssh command, this approach is not only more safe
    than the remote management feature, but also it can give us more
    permissions like changing the remote node configuration. So it's
    reasonable for us to obsolete the remote management feature now.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Signed-off-by: David S. Miller

    Ying Xue
     

06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Feb, 2014

2 commits

  • Accidentally a side effect is involved by commit 6e967adf7(tipc:
    relocate common functions from media to bearer). Now tipc stack
    handler of receiving packets from netdevices as well as netdevice
    notification handler are registered when bearer is enabled rather
    than tipc module initialization stage, but the two handlers are
    both unregistered in tipc module exit phase. If tipc module is
    inserted and then immediately removed, the following warning
    message will appear:

    "dev_remove_pack: ffffffffa0380940 not found"

    This is because in module insertion stage tipc stack packet handler
    is not registered at all, but in module exit phase dev_remove_pack()
    needs to remove it. Of course, dev_remove_pack() cannot find tipc
    protocol handler from the kernel protocol handler list so that the
    warning message is printed out.

    But if registering the two handlers is adjusted from enabling bearer
    phase into inserting module stage, the warning message will be
    eliminated. Due to this change, tipc_core_start_net() and
    tipc_core_stop_net() can be deleted as well.

    Reported-by: Wang Weidong
    Cc: Jon Maloy
    Cc: Erik Hugne
    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue
     
  • When tipc module is inserted, many tipc components are initialized
    one by one. During the initialization period, if one of them is
    failed, tipc_core_stop() will be called to stop all components
    whatever corresponding components are created or not. To avoid to
    release uncreated ones, relevant components have to add necessary
    enabled flags indicating whether they are created or not.

    But in the initialization stage, if one component is unsuccessfully
    created, we will just destroy successfully created components before
    the failed component instead of all components. All enabled flags
    defined in components, in turn, become redundant. Additionally it's
    also unnecessary to identify whether table.types is NULL in
    tipc_nametbl_stop() because name stable has been definitely created
    successfully when tipc_nametbl_stop() is called.

    Cc: Jon Maloy
    Cc: Erik Hugne
    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue