17 Sep, 2016

25 commits

  • Improve sk_buff tracing within AF_RXRPC by the following means:

    (1) Use an enum to note the event type rather than plain integers and use
    an array of event names rather than a big multi ?: list.

    (2) Distinguish Rx from Tx packets and account them separately. This
    requires the call phase to be tracked so that we know what we might
    find in rxtx_buffer[].

    (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
    event type.

    (4) A pair of 'rotate' events are added to indicate packets that are about
    to be rotated out of the Rx and Tx windows.

    (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
    packet loss injection recording.

    Signed-off-by: David Howells

    David Howells
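
    A minimal user-space sketch of the enum-plus-name-table approach described in
    (1) and (2). The skb_trace enum, the event names and the helper functions are
    hypothetical and do not reproduce the actual AF_RXRPC definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace event types. Rx and Tx events are kept distinct so the
 * two kinds of packet can be accounted separately. */
enum skb_trace {
	skb_rx_new, skb_rx_seen, skb_rx_got, skb_rx_freed,
	skb_rx_rotated, skb_rx_lost,
	skb_tx_new, skb_tx_seen, skb_tx_got, skb_tx_freed,
	skb_tx_rotated, skb_tx_lost,
	skb_trace__nr		/* number of event types; keep last */
};

/* One name per event, indexed by the enum through designated initialisers
 * instead of a long chain of ?: comparisons. */
static const char *const skb_trace_names[skb_trace__nr] = {
	[skb_rx_new]     = "Rx NEW", [skb_rx_seen]  = "Rx SEE",
	[skb_rx_got]     = "Rx GOT", [skb_rx_freed] = "Rx FRE",
	[skb_rx_rotated] = "Rx ROT", [skb_rx_lost]  = "Rx LOS",
	[skb_tx_new]     = "Tx NEW", [skb_tx_seen]  = "Tx SEE",
	[skb_tx_got]     = "Tx GOT", [skb_tx_freed] = "Tx FRE",
	[skb_tx_rotated] = "Tx ROT", [skb_tx_lost]  = "Tx LOS",
};

/* Look up the printable name of an event. */
static const char *skb_trace_name(enum skb_trace ev)
{
	assert((unsigned int)ev < skb_trace__nr);
	return skb_trace_names[ev];
}

/* Relies on every Tx event following every Rx event in the enum. */
static int skb_trace_is_tx(enum skb_trace ev)
{
	return ev >= skb_tx_new && ev < skb_trace__nr;
}
```

    Adding an event then means adding one enum value and one table entry, and a
    missing name shows up as a NULL entry rather than a silently wrong string.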
     
  • Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one
    uses an uninitialised variable.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to follow what recvmsg does within AF_RXRPC.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to follow the life of packets that get added to a call's
    receive buffer.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log information about ACK transmission.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to log information from received ACK packets.

    Signed-off-by: David Howells

    David Howells
     
  • Add a tracepoint to follow the insertion of a packet into the transmit
    buffer, its transmission and its rotation out of the buffer.

    Signed-off-by: David Howells

    David Howells
     
  • Add a pair of tracepoints, one to track rxrpc_connection struct ref
    counting and the other to track the client connection cache state.

    Signed-off-by: David Howells

    David Howells
     
  • Add additional call tracepoint points for noting call-connected,
    call-released and connection-failed events.

    Also fix one tracepoint that was using an integer instead of the
    corresponding enum value as the point type.

    Signed-off-by: David Howells

    David Howells
     
  • Print a symbolic packet type name for each valid received packet in the
    trace output, not just a number.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the basic transmit DATA packet content size at 1412 bytes so that DATA
    packets can be arbitrarily assembled into jumbo packets.

    In the future, I'm thinking of moving to keeping a jumbo packet header at
    the beginning of each packet in the Tx queue and creating the packet header
    on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
    be assembled on the spur of the moment for (re-)transmission.

    Signed-off-by: David Howells

    David Howells
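
    As a worked example of why a fixed content size helps, the sketch below
    computes jumbo packet layout, assuming (this is not stated in the commit) that
    every subpacket except the last is followed by a 4-byte jumbo header. Under
    that assumption, any queued DATA packet lands at a fixed, computable offset.
    The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DATA_LEN  1412u	/* fixed DATA content size, per the commit */
#define JUMBO_HDR 4u	/* assumed size of each jumbo subpacket header */

/* Offset of subpacket k (counting from 0) after the leading wire header. */
static size_t subpacket_offset(unsigned int k)
{
	return (size_t)k * (DATA_LEN + JUMBO_HDR);
}

/* Payload bytes, after the leading wire header, of a jumbo packet made of
 * n full subpackets: n data blocks plus n - 1 jumbo headers between them. */
static size_t jumbo_payload_len(unsigned int n)
{
	assert(n >= 1);
	return (size_t)n * DATA_LEN + (size_t)(n - 1) * JUMBO_HDR;
}
```

    With these assumptions, two subpackets occupy 2 * 1412 + 4 = 2828 bytes, and
    the third subpacket always starts at 2 * 1416 = 2832 bytes.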
     
  • rxrpc_send_call_packet() should use type in both its switch-statements
    rather than using pkt->whdr.type. This might give the compiler an easier
    job of uninitialised variable checking.

    Signed-off-by: David Howells

    David Howells
     
  • Don't transmit an ACK if call->ackr_reason is unset. There's the
    possibility of a race between recvmsg() sending an ACK and the background
    processing thread trying to send the same one.

    Signed-off-by: David Howells

    David Howells
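
    One way to close this kind of race is to make the check-and-clear of the
    pending ACK reason atomic, so that only one of the two senders sees it set.
    The sketch below models that with C11 atomic_exchange(); it illustrates the
    idea under that assumption and is not the kernel's mechanism. struct
    call_model is a hypothetical stand-in for the real call structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical call state: ackr_reason == 0 means no ACK is pending. */
struct call_model {
	atomic_uint ackr_reason;
	atomic_uint acks_sent;
};

/* May be called concurrently by recvmsg() and the background thread. The
 * pending reason is taken and cleared in one step, so at most one caller
 * sees it set and the same ACK cannot be transmitted twice. */
static bool maybe_send_ack(struct call_model *call)
{
	unsigned int reason = atomic_exchange(&call->ackr_reason, 0u);

	if (reason == 0)
		return false;		/* unset: nothing to transmit */
	atomic_fetch_add(&call->acks_sent, 1u);	/* stands in for sending */
	return true;
}
```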
     
  • Make the retransmission algorithm use for-loops instead of do-loops and
    move the counter increments into the for-statement increment slots.

    Though the do-loops are slightly more efficient since there will be at least
    one pass through each loop, the counter increments are harder to get
    right as the continue-statements skip them.

    Without this, if there are any positive acks within the loop, the do-loop
    will cycle forever because the counter increment is never done.

    Signed-off-by: David Howells

    David Howells
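
    A minimal sketch of the for-loop shape, with a hypothetical annotation array
    standing in for the Tx buffer. Because the increment lives in the
    for-statement, a continue on a positively ACKed slot still advances the
    counter, and an empty range makes no pass at all.

```c
#include <assert.h>

/* Hypothetical per-slot state for packets in the Tx buffer. */
enum annot { ACKED, UNACKED };

/* Count the slots in [from, to) that need retransmitting. */
static int count_retransmits(const enum annot *annots, int from, int to)
{
	int n = 0;

	for (int seq = from; seq < to; seq++) {
		if (annots[seq] == ACKED)
			continue;	/* seq is still incremented */
		n++;
	}
	return n;
}
```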
     
  • The soft-ACK parser doesn't increment the pointer into the soft-ACK list,
    resulting in the first ACK/NACK value being applied to all the relevant
    packets in the Tx queue. This has the potential to miss retransmissions
    and cause excessive retransmissions.

    Fix this by incrementing the pointer.

    Signed-off-by: David Howells

    David Howells
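
    A sketch of the corrected parser loop, assuming a soft-ACK list of one byte
    per packet in which 1 means ACK and 0 means NACK (the SOFT_ACK and SOFT_NACK
    values are assumptions here). The fix amounts to advancing the list pointer
    alongside the packet index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SOFT_NACK 0	/* assumed encoding of a soft-NACK entry */
#define SOFT_ACK  1	/* assumed encoding of a soft-ACK entry */

/* Record, for each Tx packet covered by the list, whether it was soft-ACKed.
 * acks++ moves to the next entry; without it, the first entry would be
 * applied to every packet. */
static void apply_soft_acks(const uint8_t *acks, size_t nr_acks,
			    uint8_t *acked)
{
	for (size_t i = 0; i < nr_acks; i++, acks++)
		acked[i] = (*acks == SOFT_ACK);
}
```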
     
  • If the last call on a client connection is released after the connection has
    had a bunch of calls allocated but before any DATA packets are sent (so
    that it's not yet marked RXRPC_CONN_EXPOSED), an assertion fails in
    rxrpc_disconnect_client_call().

    af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
    ------------[ cut here ]------------
    kernel BUG at ../net/rxrpc/conn_client.c:753!

    This is because it's expecting the conn to have been exposed and to have 2
    or more refs - but this isn't necessarily the case.

    Simply remove the assertion. This allows the conn to be moved into the
    inactive state and deleted if it isn't resurrected before the final put is
    called.

    Signed-off-by: David Howells

    David Howells
     
  • Call rxrpc_release_call() on getting an error in rxrpc_new_client_call()
    rather than trying to do the cleanup ourselves. This isn't a problem,
    provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to
    the calls tree, as the cleanup code fragments that would otherwise cause
    problems are conditional.

    Without this, we miss some of the cleanup.

    Signed-off-by: David Howells

    David Howells
     
  • In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set
    on it, then it's accounted for in rxrpc_nr_client_conns and may be on
    various lists - and this is cleaned up correctly.

    However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then
    the put routine returns rather than just skipping the extra bit of cleanup.

    Fix this by making the extra bit of cleanup conditional instead and always
    killing off the connection.

    This manifests itself as connections with a zero usage count hanging around
    in /proc/net/rxrpc_conns because the connection was allocated but then
    discarded due to a race with another process that set up a parallel
    connection, which was then shared instead.

    Signed-off-by: David Howells

    David Howells
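
    The control-flow change can be sketched as follows. struct conn, the
    counters and kill_conn() are hypothetical stand-ins for the rxrpc
    structures; only the accounting cleanup is guarded by the counted flag, and
    the connection is killed on both paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a client connection. */
struct conn {
	bool counted;		/* stands in for RXRPC_CONN_COUNTED */
};

static int nr_client_conns;	/* stands in for rxrpc_nr_client_conns */
static int nr_killed;

static void kill_conn(struct conn *conn)
{
	nr_killed++;
	free(conn);
}

static void put_one_client_conn(struct conn *conn)
{
	if (conn->counted) {
		/* Extra cleanup: unlink from lists (elided) and uncount. */
		nr_client_conns--;
	}
	kill_conn(conn);	/* reached whether or not the conn was counted */
}
```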
     
  • Purge the queue of to_be_accepted calls on socket release. Note that
    purging sock_calls doesn't release the ref owned by to_be_accepted.

    Probably the sock_calls list is redundant, given that the recvmsg_q, the
    to_be_accepted queue and the calls tree are all purged.

    Signed-off-by: David Howells

    David Howells
     
  • Record calls that need to be accepted using sk_acceptq_added(); otherwise
    the backlog counter goes negative because sk_acceptq_removed() is called.
    This causes the preallocator to malfunction.

    Calls that are preaccepted by AFS within the kernel aren't affected by
    this.

    Signed-off-by: David Howells

    David Howells
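
    The pairing requirement can be modelled with a plain counter. In the kernel,
    sk_acceptq_added() and sk_acceptq_removed() adjust sk->sk_ack_backlog; struct
    listener and the helpers below are a hypothetical user-space model of that,
    showing how a removal without a matching addition drives the count negative.

```c
#include <assert.h>

/* Hypothetical model of a listening socket's accept backlog. */
struct listener {
	int ack_backlog;	/* calls waiting to be accepted */
	int max_ack_backlog;
};

/* Models sk_acceptq_added(): a call now needs to be accepted. */
static void acceptq_added(struct listener *l)
{
	l->ack_backlog++;
}

/* Models sk_acceptq_removed(): a call has been accepted. */
static void acceptq_removed(struct listener *l)
{
	l->ack_backlog--;
}

static int acceptq_is_full(const struct listener *l)
{
	return l->ack_backlog > l->max_ack_backlog;
}
```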
     
  • The code for determining the last packet in rxrpc_recvmsg_data() has been
    using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points
    to the last packet or not. This isn't a good idea, however, as the input
    code may be running simultaneously on another CPU and that sets the flag
    *before* updating the top pointer.

    Fix this by the following means:

    (1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only.
    There's otherwise a synchronisation problem between detecting the flag
    and checking rx_top. This could probably be dealt with by appropriate
    application of memory barriers, but there's a simpler way.

    (2) Set RXRPC_CALL_RX_LAST after setting rx_top.

    (3) Make rxrpc_rotate_rx_window() consult the flags header field of the
    DATA packet it's about to discard to see if that was the last packet.
    Use this as the basis for ending the Rx phase. This shouldn't be a
    problem because the recvmsg side of things is guaranteed to see the
    packets in order.

    (4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:

    (a) the packet it has just processed is marked as RXRPC_LAST_PACKET, or

    (b) the call's Rx phase has been ended.

    Signed-off-by: David Howells

    David Howells
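
    A sketch of points (3) and (4): rotation consults the discarded packet's own
    flags, and the end-of-data test checks either the current packet's flag or
    the ended Rx phase. The LAST_PACKET value (assumed to match the
    RXRPC_LAST_PACKET header bit), struct rx_call and both helpers are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAST_PACKET 0x04	/* assumed value of the RXRPC_LAST_PACKET bit */

/* Hypothetical call state. */
struct rx_call {
	bool rx_phase_ended;
};

/* Rotation looks at the flags of the DATA packet being discarded. Since
 * recvmsg sees packets in order, this marks the true end of the data. */
static void rotate_rx_window(struct rx_call *call, uint8_t discarded_flags)
{
	if (discarded_flags & LAST_PACKET)
		call->rx_phase_ended = true;
}

/* Returns 1 at the end of the data, 0 if more may follow. */
static int recvmsg_data_done(const struct rx_call *call, uint8_t pkt_flags)
{
	if (pkt_flags & LAST_PACKET)
		return 1;	/* (a) this packet is the last one */
	if (call->rx_phase_ended)
		return 1;	/* (b) the Rx phase has been ended */
	return 0;
}
```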
     
  • Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data().

    Signed-off-by: David Howells

    David Howells
     
  • Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller,
    rxrpc_recvmsg_data(), so that it's clearer what's going on there.

    Signed-off-by: David Howells

    David Howells
     
  • Remove a tab that's on a line that should otherwise be blank.

    Signed-off-by: David Howells

    David Howells
     
  • Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
    This is then made conditional on CONFIG_IPV6.

    Without this, the following can be seen:

    net/built-in.o: In function `rxrpc_init_peer':
    >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'

    Reported-by: kbuild test robot
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

16 Sep, 2016

15 commits

  • John Crispin says:

    ====================
    net-next: dsa: add QCA8K support

    This series is based on the AR8xxx series posted by Matthieu Olivari in May
    2015. The following changes were made since then:

    * fixed the nitpicks from the previous review
    * updated to latest API
    * turned it into an mdio device
    * added callbacks for fdb, bridge offloading, stp, eee, port status
    * fixed several minor issues in the port setup and ARP learning
    * changed the namespacing of this driver to qca8k

    The driver has so far only been tested on qca8337/N. It should work on other QCA
    switches such as the qca8327 with minor changes.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch contains initial support for the QCA8337 switch. It
    will detect a QCA8337 switch, if present and declared in the DT.

    Each port will be represented through a standalone net_device interface,
    as for other DSA switches. The CPU can communicate with any of the ports by
    setting an IP address on an ethN interface. Most of the extra callbacks of the DSA
    subsystem are already supported, such as bridge offloading, stp, fdb.

    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
  • Add support for the 2-byte Qualcomm tag that gigabit switches such as
    the QCA8337/N might insert when receiving packets, or that we need
    to insert while targeting specific switch ports. The tag is inserted
    directly behind the ethernet header.

    Reviewed-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
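
    A sketch of inserting and stripping a 2-byte tag in a raw frame buffer. It
    assumes "behind the ethernet header" means after the destination and source
    MAC addresses and ahead of the EtherType; the tag value is treated as opaque
    (no bit layout is claimed), and insert_tag()/strip_tag() are hypothetical
    helpers, not the DSA tagger's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12	/* destination + source MAC addresses */
#define TAG_LEN 2

/* Insert tag (stored big-endian) into frame[0..len); the buffer must have
 * room for len + TAG_LEN bytes. Returns the new frame length. */
static size_t insert_tag(uint8_t *frame, size_t len, uint16_t tag)
{
	assert(len >= MAC_ADDRS_LEN);
	/* Shift everything after the MAC addresses right to make room. */
	memmove(frame + MAC_ADDRS_LEN + TAG_LEN, frame + MAC_ADDRS_LEN,
		len - MAC_ADDRS_LEN);
	frame[MAC_ADDRS_LEN] = (uint8_t)(tag >> 8);
	frame[MAC_ADDRS_LEN + 1] = (uint8_t)(tag & 0xff);
	return len + TAG_LEN;
}

/* Read the tag into *tag and remove it. Returns the new frame length. */
static size_t strip_tag(uint8_t *frame, size_t len, uint16_t *tag)
{
	assert(len >= MAC_ADDRS_LEN + TAG_LEN);
	*tag = (uint16_t)((frame[MAC_ADDRS_LEN] << 8) | frame[MAC_ADDRS_LEN + 1]);
	memmove(frame + MAC_ADDRS_LEN, frame + MAC_ADDRS_LEN + TAG_LEN,
		len - MAC_ADDRS_LEN - TAG_LEN);
	return len - TAG_LEN;
}
```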
     
  • Add device-tree binding for ar8xxx switch families.

    Cc: devicetree@vger.kernel.org
    Signed-off-by: John Crispin
    Reviewed-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    John Crispin
     
  • Remove the .owner field where the calls used already set it automatically.

    Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Remove an include that isn't needed.

    Signed-off-by: Wei Yongjun
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Fixes the following sparse warning:

    drivers/net/dsa/bcm_sf2.c:963:19: warning:
    symbol 'bcm_sf2_io_ops' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • When an skb replaces another one in the ooo queue, I forgot to also
    update tp->ooo_last_skb if the replaced skb was the last one
    in the queue.

    To fix this, we can simply reuse the code that runs after an insertion,
    trying to merge skbs to the right of the current skb.

    This not only fixes the bug, but also removes all small skbs that might
    be a subset of the new one.

    Example:

    We receive segments 2001:3001, 4001:5001

    Then we receive 2001:8001: we should replace 2001:3001 with the big
    skb, but also remove 4001:5001 from the queue to save space.

    packetdrill test demonstrating the bug:

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0

    +0 < S 0:0(0) win 32792
    +0 > S. 0:0(0) ack 1
    +0.100 < . 1:1(0) ack 1 win 1024
    +0 accept(3, ..., ...) = 4

    +0.01 < . 1001:2001(1000) ack 1 win 1024
    +0 > . 1:1(0) ack 1

    +0.01 < . 1001:3001(2000) ack 1 win 1024
    +0 > . 1:1(0) ack 1

    Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
    Signed-off-by: Eric Dumazet
    Reported-by: Yuchung Cheng
    Cc: Yaogong Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
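
    The example above can be replayed against a simplified model. The real queue
    is an RB tree of skbs with wraparound-safe sequence comparisons; this sketch
    instead uses a hypothetical sorted array of [start, end) segments, ignores
    wraparound and partial overlaps, and shares one "merge right" step between
    the insert and replace paths before refreshing the last-segment pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment [start, end), kept sorted by start. */
struct seg {
	unsigned int start, end;
};

struct ooo_queue {
	struct seg segs[16];
	size_t n;
	const struct seg *last;	/* plays the role of tp->ooo_last_skb */
};

static void ooo_remove_at(struct ooo_queue *q, size_t i)
{
	for (size_t j = i + 1; j < q->n; j++)
		q->segs[j - 1] = q->segs[j];
	q->n--;
}

static void ooo_insert(struct ooo_queue *q, struct seg s)
{
	size_t i = 0;

	while (i < q->n && q->segs[i].start < s.start)
		i++;

	if (i < q->n && q->segs[i].start == s.start) {
		if (q->segs[i].end >= s.end)
			return;			/* already queued in full */
		q->segs[i] = s;			/* replace the smaller segment */
	} else {
		assert(q->n < sizeof(q->segs) / sizeof(q->segs[0]));
		for (size_t j = q->n; j > i; j--)
			q->segs[j] = q->segs[j - 1];
		q->segs[i] = s;
		q->n++;
	}

	/* Shared "merge right" step: drop following segments that s covers. */
	while (i + 1 < q->n && q->segs[i + 1].end <= s.end)
		ooo_remove_at(q, i + 1);

	/* Refresh the last-segment pointer on every path, including replace. */
	q->last = &q->segs[q->n - 1];
}
```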
     
  • Sean Wang says:

    ====================
    mediatek: add enhancement into the existing reset flow

    The current driver only resets the DMA used by the descriptor rings,
    which can't guarantee recovery from all the various kinds of fatal
    errors, so the patch
    1) tries to reset, from scratch, the underlying hardware resources on
    the Mediatek SoC that the ethernet requires to run.
    2) refactors code to improve the reusability of existing code.
    3) handles the race condition between the reset flow and the
    callbacks registered with the core driver that access the hardware.
    4) introduces power domain usage to the hardware setup so that the
    hardware can be cleanly and completely restored to its initial state.

    Changes since v1:
    - fix the build error, when built as a module, caused by the undefined
    symbol pinctrl_bind_pins; pinctrl_select_state is used instead to
    accomplish the pin mux setup during the reset process.
    ====================

    Reviewed-by: Steve Wise
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add protection against the race condition between the reset
    process and hardware accesses made by the related callbacks.

    Signed-off-by: Sean Wang
    Signed-off-by: David S. Miller

    Sean Wang
     
  • struct mtk_eth already contains a struct regmap pointer, ethsys,
    to the address range of the internal circuit reset, so we reuse it
    to reset more internal blocks of the ethernet hardware, such as the
    packet processing engine (PPE) and the frame engine (FE), instead of
    using rstc, which deals with the FE only.

    Signed-off-by: Sean Wang
    Signed-off-by: David S. Miller

    Sean Wang
     
  • 1) The original driver only resets the DMA used by the descriptor
    rings, which can't guarantee recovery from all the various kinds of
    fatal errors, so the patch tries to reset, from scratch, the
    underlying hardware resources on the Mediatek SoC that the ethernet
    requires to run, including power, pin mux control, clocks and the
    internal circuits of the ethernet, in order to restore it to the
    initial state that a rebooted machine gives.

    2) Add a state variable inside struct mtk_eth to help distinguish
    whether mtk_hw_init is called for initialization during boot time
    or for re-initialization during the reset process.

    3) Add a ge_mode variable inside struct mtk_mac for restoring
    the interface mode of the current setup for the target MAC.

    4) Remove the __init attribute from the mtk_hw_init definition.

    Signed-off-by: Sean Wang
    Signed-off-by: David S. Miller

    Sean Wang
     
  • Introduce control of the power domain that the ethernet's digital
    circuits belong to into the hardware initialization and
    deinitialization flow, which helps the entire ethernet hardware block
    restart cleanly and completely, back to the initial state it has when
    the whole machine reboots.

    Signed-off-by: Sean Wang
    Signed-off-by: David S. Miller

    Sean Wang
     
  • This cleans up the error path inside the mtk_hw_init call, allowing it
    to exit appropriately when something fails, and also refactors the
    mtk_cleanup call to make part of its logic reusable on the error path.

    Signed-off-by: Sean Wang
    Signed-off-by: David S. Miller

    Sean Wang
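
    The error-path cleanup follows the common kernel goto-unwind idiom, sketched
    below. The three steps, their labels and the fail_at switch used to exercise
    the error path are hypothetical and are not taken from the mtk_eth driver;
    each label undoes only the steps that succeeded, in reverse order.

```c
#include <assert.h>

static int fail_at;		/* 1-based step to fail; 0 means none */
static int enabled[3];		/* which hypothetical steps are active */

static int step_enable(int i)
{
	if (fail_at == i + 1)
		return -1;
	enabled[i] = 1;
	return 0;
}

static void step_disable(int i)
{
	enabled[i] = 0;
}

static int hw_init(void)
{
	int err;

	err = step_enable(0);	/* e.g. power domain */
	if (err)
		goto err_out;
	err = step_enable(1);	/* e.g. clocks */
	if (err)
		goto err_disable_0;
	err = step_enable(2);	/* e.g. pin mux and internal reset */
	if (err)
		goto err_disable_1;
	return 0;

err_disable_1:
	step_disable(1);
err_disable_0:
	step_disable(0);
err_out:
	return err;
}
```

    Placing the teardown calls under shared labels is one way to reuse partial
    cleanup logic on the error path, which is the kind of sharing the commit
    describes for mtk_cleanup.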