20 Dec, 2018

5 commits

  • The commit adds the new trace_event for TIPC bearer, L2 device event:

    trace_tipc_l2_device_event()

    Also, it puts the trace at the tipc_l2_device_event() function, then
    the device/bearer events and related info can be traced out during
    runtime when needed.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC node object:

    trace_tipc_node_create()
    trace_tipc_node_delete()
    trace_tipc_node_lost_contact()
    trace_tipc_node_timeout()
    trace_tipc_node_link_up()
    trace_tipc_node_link_down()
    trace_tipc_node_reset_links()
    trace_tipc_node_fsm_evt()
    trace_tipc_node_check_state()

    Also, enables the traces for the following cases:
    - When a node is created/deleted;
    - When a node contact is lost;
    - When a node timer is timed out;
    - When a node link is up/down;
    - When all node links are reset;
    - When node state is changed;
    - When a skb comes and node state needs to be checked/updated.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC socket object:

    trace_tipc_sk_create()
    trace_tipc_sk_poll()
    trace_tipc_sk_sendmsg()
    trace_tipc_sk_sendmcast()
    trace_tipc_sk_sendstream()
    trace_tipc_sk_filter_rcv()
    trace_tipc_sk_advance_rx()
    trace_tipc_sk_rej_msg()
    trace_tipc_sk_drop_msg()
    trace_tipc_sk_release()
    trace_tipc_sk_shutdown()
    trace_tipc_sk_overlimit1()
    trace_tipc_sk_overlimit2()

    Also, enables the traces for the following cases:
    - When user creates a TIPC socket;
    - When user calls poll() on TIPC socket;
    - When user sends a dgram/mcast/stream message.
    - When a message is put into the socket 'sk_receive_queue';
    - When a message is released from the socket 'sk_receive_queue';
    - When a message is rejected (e.g. due to no port, invalid, etc.);
    - When a message is dropped (e.g. due to wrong message type);
    - When socket is released;
    - When socket is shutdown;
    - When socket rcvq's allocation is overlimit (> 90%);
    - When socket rcvq + bklq's allocation is overlimit (> 90%);
    - When the 'TIPC_ERR_OVERLOAD/2' issue happens;

    Note:
    a) All the socket traces are designed to be able to trace on a specific
    socket by either using the 'event filtering' feature on a known socket
    'portid' value or the sysctl file:

    /proc/sys/net/tipc/sk_filter

    The file determines a 'tuple' for what socket should be traced:

    (portid, sock type, name type, name lower, name upper)

    where:
    + 'portid' is the socket portid generated at socket creating, can be
    found in the trace outputs or the 'tipc socket list' command printouts;
    + 'sock type' is the socket type (1 = SOCK_TREAM, ...);
    + 'name type', 'name lower' and 'name upper' are the service name being
    connected to or published by the socket.

    Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e.
    the traces happen for every sockets with no filter.

    b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event
    which happens when the socket receive queue (and backlog queue) is
    about to be overloaded, when the queue allocation is > 90%. Then, when
    the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2
    issue can be traced.

    The trace event is designed as an 'upper watermark' notification that
    the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or
    actions can be triggerred in the meanwhile to see what is going on with
    the socket queue.

    In addition, the 'trace_tipc_sk_dump()' is also placed at the
    'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped
    for post-analysis.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • The commit adds the new trace_events for TIPC link object:

    trace_tipc_link_timeout()
    trace_tipc_link_fsm()
    trace_tipc_link_reset()
    trace_tipc_link_too_silent()
    trace_tipc_link_retrans()
    trace_tipc_link_bc_ack()
    trace_tipc_link_conges()

    And the traces for PROTOCOL messages at building and receiving:

    trace_tipc_proto_build()
    trace_tipc_proto_rcv()

    Note:
    a) The 'tipc_link_too_silent' event will only happen when the
    'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
    event is enabled). The benefit for this kind of event is that we can
    get an early indication about TIPC link loss issue due to timeout, then
    can do some necessary actions for troubleshooting.

    For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
    event occurs:

    echo 'enable_event:tipc:tipc_proto_rcv' > \
    events/tipc/tipc_link_too_silent/trigger

    And disable it when TIPC link is reset:

    echo 'disable_event:tipc:tipc_proto_rcv' > \
    events/tipc/tipc_link_reset/trigger

    b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
    trace TIPC retransmission issues.

    In addition, the commit adds the 'trace_tipc_list/link_dump()' at the
    'retransmission failure' case. Then, if the issue occurs, the link
    'transmq' along with the link data can be dumped for post-analysis.
    These dump events should be enabled by default since it will only take
    effect when the failure happens.

    The same approach is also applied for the faulty case that the
    validation of protocol message is failed.

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • As for the sake of debugging/tracing, the commit enables tracepoints in
    TIPC along with some general trace_events as shown below. It also
    defines some 'tipc_*_dump()' functions that allow to dump TIPC object
    data whenever needed, that is, for general debug purposes, ie. not just
    for the trace_events.

    The following trace_events are now available:

    - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
    e.g. message type, user, droppable, skb truesize, cloned skb, etc.

    - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
    queues, e.g. TIPC link transmq, socket receive queue, etc.

    - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
    sk state, sk type, connection type, rmem_alloc, socket queues, etc.

    - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
    link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

    - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
    node state, active links, capabilities, link entries, etc.

    How to use:
    Put the trace functions at any places where we want to dump TIPC data
    or events.

    Note:
    a) The dump functions will generate raw data only, that is, to offload
    the trace event's processing, it can require a tool or script to parse
    the data but this should be simple.

    b) The trace_tipc_*_dump() should be reserved for a failure cases only
    (e.g. the retransmission failure case) or where we do not expect to
    happen too often, then we can consider enabling these events by default
    since they will almost not take any effects under normal conditions,
    but once the rare condition or failure occurs, we get the dumped data
    fully for post-analysis.

    For other trace purposes, we can reuse these trace classes as template
    but different events.

    c) A trace_event is only effective when we enable it. To enable the
    TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
    directory in the 'debugfs' file system. Normally, they are located at:

    /sys/kernel/debug/tracing/events/tipc/

    For example:

    To enable the tipc_link_dump event:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable

    To enable all the TIPC trace_events:

    echo 1 > /sys/kernel/debug/tracing/events/tipc/enable

    To collect the trace data:

    cat trace

    or

    cat trace_pipe > /trace.out &

    To disable all the TIPC trace_events:

    echo 0 > /sys/kernel/debug/tracing/events/tipc/enable

    To clear the trace buffer:

    echo > trace

    d) Like the other trace_events, the feature like 'filter' or 'trigger'
    is also usable for the tipc trace_events.
    For more details, have a look at:

    Documentation/trace/ftrace.txt

    MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc

    Acked-by: Ying Xue
    Tested-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien