22 Oct, 2014

2 commits

  • When running tipcTC&tipcTS test suite, below lockdep unsafe locking
    scenario is reported:

    [ 1109.997854]
    [ 1109.997988] =================================
    [ 1109.998290] [ INFO: inconsistent lock state ]
    [ 1109.998575] 3.17.0-rc1+ #113 Not tainted
    [ 1109.998762] ---------------------------------
    [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [] tipc_sk_rcv+0x49/0x2b0 [tipc]
    [ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
    [ 1109.998762] [] __lock_acquire+0x6a0/0x1d80
    [ 1109.998762] [] lock_acquire+0x95/0x1e0
    [ 1109.998762] [] _raw_spin_lock+0x3e/0x80
    [ 1109.998762] [] tipc_sk_rcv+0x49/0x2b0 [tipc]
    [ 1109.998762] [] tipc_link_xmit+0xa8/0xc0 [tipc]
    [ 1109.998762] [] tipc_sendmsg+0x15f/0x550 [tipc]
    [ 1109.998762] [] tipc_connect+0x105/0x140 [tipc]
    [ 1109.998762] [] SYSC_connect+0xae/0xc0
    [ 1109.998762] [] SyS_connect+0xe/0x10
    [ 1109.998762] [] compat_SyS_socketcall+0xb8/0x200
    [ 1109.998762] [] sysenter_dispatch+0x7/0x1f
    [ 1109.998762] irq event stamp: 241060
    [ 1109.998762] hardirqs last enabled at (241060): [] __local_bh_enable_ip+0x6d/0xd0
    [ 1109.998762] hardirqs last disabled at (241059): [] __local_bh_enable_ip+0x2f/0xd0
    [ 1109.998762] softirqs last enabled at (241020): [] _local_bh_enable+0x22/0x50
    [ 1109.998762] softirqs last disabled at (241021): [] irq_exit+0x96/0xc0
    [ 1109.998762]
    [ 1109.998762] other info that might help us debug this:
    [ 1109.998762] Possible unsafe locking scenario:
    [ 1109.998762]
    [ 1109.998762] CPU0
    [ 1109.998762] ----
    [ 1109.998762] lock(slock-AF_TIPC);
    [ 1109.998762]
    [ 1109.998762] lock(slock-AF_TIPC);
    [ 1109.998762]
    [ 1109.998762] *** DEADLOCK ***
    [ 1109.998762]
    [ 1109.998762] 2 locks held by swapper/7/0:
    [ 1109.998762] #0: (rcu_read_lock){......}, at: [] __netif_receive_skb_core+0x69/0xb70
    [ 1109.998762] #1: (rcu_read_lock){......}, at: [] tipc_l2_rcv_msg+0x40/0x260 [tipc]
    [ 1109.998762]
    [ 1109.998762] stack backtrace:
    [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
    [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
    [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
    [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
    [ 1109.998762] Call Trace:
    [ 1109.998762] [] dump_stack+0x4e/0x68
    [ 1109.998762] [] print_usage_bug+0x1f1/0x202
    [ 1109.998762] [] ? save_stack_trace+0x2f/0x50
    [ 1109.998762] [] mark_lock+0x28c/0x2f0
    [ 1109.998762] [] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
    [ 1109.998762] [] __lock_acquire+0x5ad/0x1d80
    [ 1109.998762] [] ? trace_hardirqs_on+0xd/0x10
    [ 1109.998762] [] ? sched_clock_cpu+0x98/0xc0
    [ 1109.998762] [] ? local_clock+0x1b/0x30
    [ 1109.998762] [] ? lock_release_holdtime.part.29+0x1c/0x1a0
    [ 1109.998762] [] ? sched_clock_local+0x25/0x90
    [ 1109.998762] [] ? tipc_sk_get+0x60/0x80 [tipc]
    [ 1109.998762] [] lock_acquire+0x95/0x1e0
    [ 1109.998762] [] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
    [ 1109.998762] [] ? trace_hardirqs_on_caller+0xa6/0x1c0
    [ 1109.998762] [] _raw_spin_lock+0x3e/0x80
    [ 1109.998762] [] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
    [ 1109.998762] [] ? tipc_sk_get+0x60/0x80 [tipc]
    [ 1109.998762] [] tipc_sk_rcv+0x49/0x2b0 [tipc]
    [ 1109.998762] [] tipc_rcv+0x5ed/0x960 [tipc]
    [ 1109.998762] [] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
    [ 1109.998762] [] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
    [ 1109.998762] [] __netif_receive_skb_core+0x5e5/0xb70
    [ 1109.998762] [] ? __netif_receive_skb_core+0x69/0xb70
    [ 1109.998762] [] ? dev_gro_receive+0x259/0x4e0
    [ 1109.998762] [] __netif_receive_skb+0x26/0x70
    [ 1109.998762] [] netif_receive_skb_internal+0x2d/0x1f0
    [ 1109.998762] [] napi_gro_receive+0xd8/0x240
    [ 1109.998762] [] e1000_clean_rx_irq+0x2c4/0x530
    [ 1109.998762] [] e1000_clean+0x266/0x9c0
    [ 1109.998762] [] ? local_clock+0x1b/0x30
    [ 1109.998762] [] ? sched_clock_local+0x25/0x90
    [ 1109.998762] [] net_rx_action+0x141/0x310
    [ 1109.998762] [] ? handle_fasteoi_irq+0xe0/0x150
    [ 1109.998762] [] __do_softirq+0x116/0x4d0
    [ 1109.998762] [] irq_exit+0x96/0xc0
    [ 1109.998762] [] do_IRQ+0x67/0x110
    [ 1109.998762] [] common_interrupt+0x6f/0x6f
    [ 1109.998762] [] ? default_idle+0x37/0x250
    [ 1109.998762] [] ? default_idle+0x35/0x250
    [ 1109.998762] [] arch_cpu_idle+0xf/0x20
    [ 1109.998762] [] cpu_startup_entry+0x27d/0x4d0
    [ 1109.998762] [] start_secondary+0x188/0x1f0

    When intra-node messages are delivered from one process to another
    process, tipc_link_xmit() doesn't disable BH before it directly calls
    tipc_sk_rcv() on process context to forward messages to destination
    socket. Meanwhile, if messages delivered by remote node arrive at the
    node and their destinations are also the same socket, tipc_sk_rcv()
    running on process context might be preempted by tipc_sk_rcv() running
    BH context. As a result, the latter cannot obtain the socket lock as
    the lock was obtained by the former, however, the former has no chance
    to be run as the latter is owning the CPU now, so headlock happens. To
    avoid it, BH should be always disabled in tipc_sk_rcv().

    Signed-off-by: Ying Xue
    Reviewed-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Locking dependency detected below possible unsafe locking scenario:

    CPU0 CPU1
    T0: tipc_named_rcv() tipc_rcv()
    T1: [grab nametble write lock]* [grab node lock]*
    T2: tipc_update_nametbl() tipc_node_link_up()
    T3: tipc_nodesub_subscribe() tipc_nametbl_publish()
    T4: [grab node lock]* [grab nametble write lock]*

    The opposite order of holding nametbl write lock and node lock on
    above two different paths may result in a deadlock. If we move the
    the updating of the name table after link state named out of node
    lock, the reverse order of holding locks will be eliminated, and
    as a result, the deadlock risk.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

18 Oct, 2014

1 commit

  • In commit ec8a2e5621db2da24badb3969eda7fd359e1869f ("tipc: same receive
    code path for connection protocol and data messages") we omitted the
    the possiblilty that an arriving message extracted from a bundle buffer
    may be a multicast message. Such messages need to be to be delivered to
    the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
    small multicast messages arriving as members of a bundle buffer will be
    silently dropped.

    This commit corrects the error by considering this case in the function
    tipc_link_bundle_rcv().

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

08 Oct, 2014

1 commit

  • One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc:
    use pseudo message to wake up sockets after link congestion") was
    to handle link congestion abatement in a uniform way for both unicast
    and multicast transmit. However, the latter doesn't work correctly,
    and has been broken since the referenced commit was applied.

    If a user now sends a burst of multicast messages that is big
    enough to cause broadcast link congestion, it will be put to sleep,
    and not be waked up when the congestion abates as it should be.

    This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
    is set correctly, but in the wrong field. Instead of setting it in the
    'action_flags' field of the arrival node struct, it is by mistake set
    in the dummy node struct that is owned by the broadcast link, where it
    will never tested for. Second, we cannot use the same flag for waking
    up unicast and multicast users, since the function tipc_node_unlock()
    needs to pick the wakeup pseudo messages to deliver from different
    queues. It must hence be able to distinguish between the two cases.

    This commit solves this problem by adding a new flag
    TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
    The latter is to be called by tipc_node_unlock() when the named flag,
    now set in the correct field, is encountered.

    v2: using explicit 'unsigned int' declaration instead of 'uint', as
    per comment from David Miller.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
     

11 Sep, 2014

1 commit

  • This fixes the following sparse warnings:
    sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
    Also, the function is changed to return bool upon success, rather than a
    potentially freed pointer.

    Signed-off-by: Erik Hugne
    Reported-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Erik Hugne
     

02 Sep, 2014

2 commits

  • TIPC name table updates are distributed asynchronously in a cluster,
    entailing a risk of certain race conditions. E.g., if two nodes
    simultaneously issue conflicting (overlapping) publications, this may
    not be detected until both publications have reached a third node, in
    which case one of the publications will be silently dropped on that
    node. Hence, we end up with an inconsistent name table.

    In most cases this conflict is just a temporary race, e.g., one
    node is issuing a publication under the assumption that a previous,
    conflicting, publication has already been withdrawn by the other node.
    However, because of the (rtt related) distributed update delay, this
    may not yet hold true on all nodes. The symptom of this failure is a
    syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

    In this commit we add a resiliency queue at the receiving end of
    the name table distributor. When insertion of an arriving publication
    fails, we retain it in this queue for a short amount of time, assuming
    that another update will arrive very soon and clear the conflict. If so
    happens, we insert the publication, otherwise we drop it.

    The (configurable) retention value defaults to 2000 ms. Knowing from
    experience that the situation described above is extremely rare, there
    is no risk that the queue will accumulate any large number of items.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
     
  • We need to perform the same actions when processing deferred name
    table updates, so this functionality is moved to a separate
    function.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
     

30 Aug, 2014

1 commit

  • Commit 6c9808ce09f7 ("tipc: remove port_lock") accidentally involves
    a potential bug: when tipc socket instance(tsk) is not got with given
    reference number in tipc_sk_get(), tsk is set to NULL. Subsequently
    we jump to exit label where to decrease socket reference counter
    pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL,
    oops may happen because of touching a NULL pointer.

    Signed-off-by: Ying Xue
    Acked-by: Erik Hugne
    Acked-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     

24 Aug, 2014

15 commits

  • We complete the merging of the port and socket layer by aggregating
    the fields of struct tipc_port directly into struct tipc_sock, and
    moving the combined structure into socket.c.

    We also move all functions and macros that are not any longer
    exposed to the rest of the stack into socket.c, and rename them
    accordingly.

    Despite the size of this commit, there are no functional changes.
    We have only made such changes that are necessary due of the removal
    of struct tipc_port.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The reference table is now 'socket aware' instead of being generic,
    and has in reality become a socket internal table. In order to be
    able to minimize the API exposed by the socket layer towards the rest
    of the stack, we now move the reference table definitions and functions
    into the file socket.c, and rename the functions accordingly.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We move the inline functions in the file port.h to socket.c, and modify
    their names accordingly.

    We move struct tipc_port and some macros to socket.h.

    Finally, we remove the file port.h.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In this commit, we move the remaining functions in port.c to
    socket.c, and give them new names that correspond to their new
    location. We then remove the file port.c.

    There are only cosmetic changes to the moved functions.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In previous commits we have reduced usage of port_lock to a minimum,
    and complemented it with usage of bh_lock_sock() at the remaining
    locations. The purpose has been to remove this lock altogether, since
    it largely duplicates the role of bh_lock_sock. We are now ready to do
    this.

    However, we still need to protect the BH callers from inadvertent
    release of the socket while they hold a reference to it. We do this by
    replacing port_lock by a combination of a rw-lock protecting the
    reference table as such, and updating the socket reference counter while
    the socket is referenced from BH. This technique is more standard and
    comprehensible than the previous approach, and turns out to have a
    positive effect on overall performance.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In order to make tipc_sock the only entity referencable from other
    parts of the stack, we add a tipc_sock pointer instead of a tipc_port
    pointer to the registry. As a consequence, we also let the function
    tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port.
    We keep the function's name for now, since the lock still is owned by
    the port.

    This is another step in the direction of eliminating port_lock, replacing
    its usage with lock_sock() and bh_lock_sock().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The functions tipc_port_get_ports() and tipc_port_reinit() scan over
    all sockets/ports to access each of them. This is done by using a
    dedicated linked list, 'tipc_socks' where all sockets are members. The
    list is in turn protected by a spinlock, 'port_list_lock', while each
    socket is locked by using port_lock at the moment of access.

    In order to reduce complexity and risk of deadlock, we want to get
    rid of the linked list and the accompanying spinlock.

    This is what we do in this commit. Instead of the linked list, we use
    the port registry to scan across the sockets. We also add usage of
    bh_lock_sock() inside the scope of port_lock in both functions, as a
    preparation for the complete removal of port_lock.

    Finally, we move the functions from port.c to socket.c, and rename them
    to tipc_sk_sock_show() and tipc_sk_reinit() repectively.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • After the latest changes to the socket/port layer the existence of
    the functions tipc_port_init() and tipc_port_destroy() cannot be
    justified. They are both called only once, from tipc_sk_create() and
    tipc_sk_delete() respectively, and their functionality can better be
    merged into the latter two functions.

    This also entails that all remaining references to port_lock now are
    made from inside socket.c, something that will make it easier to remove
    this lock.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The function tipc_acknowledge() is a remnant from the obsolete native
    API. Currently, it grabs port_lock, before building an acknowledge
    message and sending it to the peer.

    Since all access to socket members now is protected by the socket lock,
    it has become unnecessary to grab port_lock here.

    In this commit, we remove the usage of port_lock, simplify the
    function, and move it to socket.c, renaming it to tipc_sk_send_ack().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
    native API. Their only task is to grab port_lock and call the functions
    __tipc_port_connect()/__tipc_port_disconnect() respectively, which will
    perform the actual state change.

    Since socket/port exection now is single-threaded the use of port_lock
    is not needed any more, so we can safely replace the two functions with
    their lock-free counterparts.

    In this commit, we remove the two functions. Furthermore, the contents
    of __tipc_port_disconnect() is so trivial that we choose to eliminate
    that function too, expanding its functionality into tipc_shutdown().
    __tipc_port_connect() is simplified, moved to socket.c, and given the
    more correct name tipc_sk_finish_conn(). Finally, we eliminate the
    function auto_connect(), and expand its contents into filter_connect().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • tipc_port_shutdown() is a remnant from the now obsolete native
    interface. As such it grabs port_lock in order to protect itself
    from concurrent BH processing.

    However, after the recent changes to the port/socket upcalls, sockets
    are now basically single-threaded, and all execution, except the read-only
    tipc_sk_timer(), is executing within the protection of lock_sock(). So
    the use of port_lock is not needed here.

    In this commit we eliminate the whole function, and merge it into its
    only caller, tipc_shutdown().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The last remaining BH upcall to the socket, apart for the message
    reception function tipc_sk_rcv(), is the timer function.

    We prefer to let this function continue executing in BH, since it only
    does read-acces to semi-permanent data, but we make three changes to it:

    1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
    of port_lock. This is a preparation for replacing port_lock with
    bh_lock_sock() at the locations where it is still used.

    2) We move the function from port.c to socket.c, as a further step
    of eliminating the port code level altogether.

    3) We let it make use of the newly introduced tipc_msg_create()
    function. This enables us to get rid of three context specific
    functions (port_create_self_abort_msg() etc.) in port.c

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In the current implementation, each 'struct tipc_node' instance keeps
    a linked list of those ports/sockets that are connected to the node
    represented by that struct. The purpose of this is to let the node
    object know which sockets to alert when it loses contact with its peer
    node, i.e., which sockets need to have their connections aborted.

    This entails an unwanted direct reference from the node structure
    back to the port/socket structure, and a need to grab port_lock
    when we have to make an upcall to the port. We want to get rid of
    this unecessary BH entry point into the socket, and also eliminate
    its use of port_lock.

    In this commit, we instead let the node struct keep list of "connected
    socket" structs, which each represents a connected socket, but is
    allocated independently by the node at the moment of connection. If
    the node loses contact with its peer node, the list is traversed, and
    a "connection abort" message is created for each entry in the list. The
    message is sent to it respective connected socket using the ordinary
    data path, and the receiving socket aborts its connections upon reception
    of the message.

    This enables us to get rid of the direct reference from 'struct node' to
    ´struct port', and another unwanted BH access point to the latter.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The current link implementation keeps a linked list of blocked ports/
    sockets that is populated when there is link congestion. The purpose
    of this is to let the link know which users to wake up when the
    congestion abates.

    This adds unnecessary complexity to the data structure and the code,
    since it forces us to involve the link each time we want to delete
    a socket. It also forces us to grab the spinlock port_lock within
    the scope of node_lock. We want to get rid of this direct dependence,
    as well as the deadlock hazard resulting from the usage of port_lock.

    In this commit, we instead let the link keep list of a "wakeup" pseudo
    messages for use in such situations. Those messages are sent to the
    pending sockets via the ordinary message reception path, and wake up
    the socket's owner when they are received.

    This enables us to get rid of the 'waiting_ports' linked lists in struct
    tipc_port that manifest this direct reference. As a consequence, we can
    eliminate another BH entry into the socket, and hence the need to grab
    port_lock. This is a further step in our effort to remove port_lock
    altogether.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The function tipc_msg_init() has turned out to be of limited value
    in many cases. It take too few parameters to be usable for creating
    a complete message, it makes too many assumptions about what the
    message should be used for, and it does not allocate any buffer to
    be returned to the caller.

    Therefore, we now introduce the new function tipc_msg_create(), which
    takes all the parameters needed to create a full message, and returns
    a buffer of the requested size. The new function will be very useful
    for the changes we will be doing in later commits in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

20 Aug, 2014

1 commit


17 Aug, 2014

1 commit

  • Commit 3b4f302d8578 ("tipc: eliminate
    redundant locking") introduced a bug by removing the sanity check
    for message importance, allowing programs to assign any value to
    the msg_user field. This will mess up the packet reception logic
    and may cause random link resets.

    Signed-off-by: Erik Hugne
    Signed-off-by: David S. Miller

    Erik Hugne
     

30 Jul, 2014

1 commit


29 Jul, 2014

1 commit

  • As per comment from David Miller, we try to make the buffer reassembly
    function more resilient to user errors than it is today.

    - We check that the "*buf" parameter always is set, since this is
    mandatory input.

    - We ensure that *buf->next always is set to NULL before linking in
    the buffer, instead of relying of the caller to have done this.

    - We ensure that the "tail" pointer in the head buffer's control
    block is initialized to NULL when the first fragment arrives.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

21 Jul, 2014

1 commit

  • Fixes the following sparse warnings:

    net/tipc/socket.c:545:5: warning:
    symbol 'tipc_sk_proto_rcv' was not declared. Should it be static?
    net/tipc/socket.c:2015:5: warning:
    symbol 'tipc_ioctl' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Wei Yongjun
     

17 Jul, 2014

8 commits

  • When we run broadcast packets over dual bearers/interfaces, the
    current transmission code is flipping bearers between each sent
    packet, with the purpose of leveraging the double bandwidth
    available. The receiving bclink is resequencing the packets if
    needed, so all messages are delivered upwards from the broadcast
    link in the correct order, even if they may arrive in concurrent
    interrupts.

    However, at the moment of delivery upwards to the socket, we release
    all spinlocks (bclink_lock, node_lock), so it is still possible
    that arriving messages bypass each other before they reach the socket
    queue.

    We fix this by applying the same technique we are using for unicast
    traffic. We use a link selector (i.e., the last bit of sending port
    number) to ensure that messages from the same sender socket always are
    sent over the same bearer. This guarantees sequential delivery between
    socket pairs, which is sufficient to satisfy the protocol spec, as well
    as all known user requirements.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • After the previous commit, we can now give the functions with temporary
    names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
    names.

    There are no functional changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We can now remove a number of functions which have become obsolete
    and unreferenced through this commit series. There are no functional
    changes in this commit.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In this commit, we convert the socket multicast send function to
    directly call the new multicast/broadcast function (tipc_bclink_xmit2())
    introduced in the previous commit. We do this instead of letting the
    call go via the now obsolete tipc_port_mcast_xmit(), hence saving
    a call level and some code complexity.

    We also remove the initial destination lookup at the message sending
    side, and replace that with an unconditional lookup at the receiving
    side, including on the sending node itself. This makes the destination
    lookup and message transfer more uniform than before.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We add a new broadcast link transmit function in bclink.c and a new
    receive function in socket.c. The purpose is to move the branching
    between external and internal destination down to the link layer,
    just as we have done with unicast in earlier commits. We also make
    use of the new link-independent fragmentation support that was
    introduced in an earlier commit series.

    This gives a shorter and simpler code path, and makes it possible
    to obtain copy-free buffer delivery to all node local destination
    sockets.

    The new transmission code is added in parallel with the existing one,
    and will be used by the socket multicast send function in the next
    commit in this series.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • We convert the link internal users (changeover protocol, broadcast
    synchronization) to use the new packet send function.

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In a previous commit series ("tipc: new unicast transmission code")
    we introduced a new message sending function, tipc_link_xmit2(),
    and moved the unicast data users over to use that function. We now
    let the internal name table distributor do the same.

    The interaction between the name distributor and the node/link
    layer also becomes significantly simpler, so we can eliminate
    the function tipc_link_names_xmit().

    Signed-off-by: Jon Maloy
    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Signed-off-by: David S. Miller

    David S. Miller
     

16 Jul, 2014

1 commit


12 Jul, 2014

1 commit

  • If the 'next' pointer of the last fragment buffer in a message is not
    zeroed before reassembly, we risk ending up with a corrupt message,
    since the reassembly function itself isn't doing this.

    Currently, when a buffer is retrieved from the deferred queue of the
    broadcast link, the next pointer is not cleared, with the result as
    described above.

    This commit corrects this, and thereby fixes a bug that may occur when
    long broadcast messages are transmitted across dual interfaces. The bug
    has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc:
    message reassembly using fragment chain")

    This commit should be applied to both net and net-next.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

09 Jul, 2014

2 commits

  • This fixes a regression bug caused by:
    067608e9d019d6477fd45dd948e81af0e5bf599f ("tipc: introduce direct
    iovec to buffer chain fragmentation function")

    If data is sent on a nonblocking socket and the destination link
    is congested, the buffer chain is leaked. We fix this by freeing
    the chain in this case.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Erik Hugne
     
  • Since commit 37e22164a8a3c39bdad45aa463b1e69a1fdf4110 ("tipc: rename and
    move message reassembly function") reassembly of long broadcast messages
    has been broken. This is because we test for a non-NULL return value
    of the *buf parameter as criteria for succesful reassembly. However, this
    parameter is left defined even after reception of the first fragment,
    when reassebly is still incomplete. This leads to a kernel crash as soon
    as a the first fragment of a long broadcast message is received.

    We fix this with this commit, by implementing a stricter behavior of the
    function and its return values.

    This commit should be applied to both net and net-next.

    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Jon Paul Maloy