31 Jul, 2015

12 commits

  • After the most recent changes, every access call to a link that
    may add messages to the link's input queue is followed by an
    explicit call to tipc_sk_rcv(), using a reference to the correct
    queue.

    This means that the potentially hazardous implicit delivery, using
    tipc_node_unlock() in combination with a binary flag and a cached
    queue pointer, has now become redundant.

    This commit removes this implicit delivery mechanism both for regular
    data messages and for binding table update messages.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
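
    A minimal userspace sketch of the explicit-delivery pattern, assuming a
    pthread mutex in place of the node lock. All names (struct node,
    struct msg_queue, link_rcv(), node_rcv(), sk_rcv()) are hypothetical
    stand-ins, not TIPC's API. Messages are collected into a caller-owned
    queue while the lock is held and handed over by an explicit call after
    unlocking, so the unlock path needs no flag or cached queue pointer.

```c
#include <pthread.h>
#include <stddef.h>

#define QLEN 16

/* Hypothetical message queue owned by the caller, not by the node. */
struct msg_queue {
	int msgs[QLEN];
	size_t len;
};

/* Hypothetical node with its own lock. */
struct node {
	pthread_mutex_t lock;
	int rx_count;
};

/* Stand-in for socket-level delivery; counts what it is handed. */
static int delivered;

static void sk_rcv(struct msg_queue *inputq)
{
	/* Consume every queued message; called with the node lock released. */
	delivered += (int)inputq->len;
	inputq->len = 0;
}

/* Link-level receive: may append to the input queue; caller holds the lock. */
static void link_rcv(struct node *n, int msg, struct msg_queue *inputq)
{
	n->rx_count++;
	if (inputq->len < QLEN)
		inputq->msgs[inputq->len++] = msg;
}

/* Node-level entry point: lock, access the link, unlock, deliver explicitly. */
static void node_rcv(struct node *n, int msg)
{
	struct msg_queue inputq = { .len = 0 };

	pthread_mutex_lock(&n->lock);
	link_rcv(n, msg, &inputq);
	pthread_mutex_unlock(&n->lock);

	/* Explicit delivery: the unlock step itself delivers nothing. */
	if (inputq.len)
		sk_rcv(&inputq);
}
```

    Because the queue lives on the caller's stack, the node does not have
    to remember pending deliveries between lock and unlock.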
     
  • In order to facilitate future improvements to the locking structure, we
    want to make resetting and establishing of links non-atomic. I.e., the
    functions tipc_node_link_up() and tipc_node_link_down() should be called
    from outside the node lock context, and grab/release the node lock
    themselves. This requires that we can freeze the link state from the
    moment it is set to RESETTING or PEER_RESET in one lock context until
    it is set to RESET or ESTABLISHING in a later context. The recently
    introduced link FSM makes this possible, so we are now ready to introduce
    the above change.

    This commit implements this.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The node lock is currently grabbed and released in the function
    tipc_disc_rcv() in the file discover.c. As a preparation for the next
    commits, we need to move this node lock handling, along with the code
    area it is covering, to node.c.

    This commit introduces this change.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Until now, we have been handling link failover and synchronization
    by using an additional link state variable, "exec_mode". This variable
    is not independent of the link FSM state, which creates a risk of
    inconsistencies, and it also clutters the code.

    The conditions are now in place to define a new link FSM that covers
    all existing use cases, including failover and synchronization, and
    eliminate the "exec_mode" field altogether. The FSM must also support
    non-atomic resetting of links, which will be introduced later.

    The new link FSM is shown below, with 7 states and 8 events.
    Only events leading to state change are shown as edges.

    [Link FSM diagram not recoverable from this copy. Legible labels:
    states RESET, RESETTING, PEER_RESET, SYNCHING and ESTABLISHING;
    events RESET_EVT, FAILURE_EVT, PEER_RESET_EVT, SYNCH_BEGIN_EVT,
    SYNCH_END_EVT and ESTABLISH_EVT.]
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
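
    The transitions named in this commit can be sketched as a switch-based
    state machine in C. This is a hypothetical illustration, not the code
    in TIPC's link.c: the enum names are made up, and only a subset of the
    edges, partly assumed (including the failover ones), is encoded. Any
    event without an edge leaves the state unchanged, which is also what
    keeps a RESETTING link frozen until RESET_EVT arrives.

```c
/* Hypothetical names for the 7 states and 8 events of the link FSM. */
enum link_state {
	LINK_RESET,
	LINK_RESETTING,
	LINK_PEER_RESET,
	LINK_ESTABLISHING,
	LINK_ESTABLISHED,
	LINK_SYNCHING,
	LINK_FAILINGOVER
};

enum link_event {
	LINK_RESET_EVT,
	LINK_FAILURE_EVT,
	LINK_PEER_RESET_EVT,
	LINK_ESTABLISH_EVT,
	LINK_SYNCH_BEGIN_EVT,
	LINK_SYNCH_END_EVT,
	LINK_FAILOVER_BEGIN_EVT,
	LINK_FAILOVER_END_EVT
};

/* Return the next state; events with no edge leave the state unchanged. */
static enum link_state link_fsm_evt(enum link_state s, enum link_event e)
{
	switch (s) {
	case LINK_RESETTING:
		if (e == LINK_RESET_EVT)
			return LINK_RESET;
		break;
	case LINK_RESET:
		if (e == LINK_PEER_RESET_EVT)
			return LINK_ESTABLISHING;
		if (e == LINK_FAILOVER_BEGIN_EVT)
			return LINK_FAILINGOVER;
		break;
	case LINK_PEER_RESET:
		if (e == LINK_RESET_EVT)
			return LINK_ESTABLISHING;
		break;
	case LINK_ESTABLISHING:
		if (e == LINK_ESTABLISH_EVT)
			return LINK_ESTABLISHED;
		break;
	case LINK_ESTABLISHED:
		if (e == LINK_FAILURE_EVT)
			return LINK_RESETTING;
		if (e == LINK_PEER_RESET_EVT)
			return LINK_PEER_RESET;
		if (e == LINK_SYNCH_BEGIN_EVT)
			return LINK_SYNCHING;
		break;
	case LINK_SYNCHING:
		if (e == LINK_FAILURE_EVT)
			return LINK_RESETTING;
		if (e == LINK_PEER_RESET_EVT)
			return LINK_PEER_RESET;
		if (e == LINK_SYNCH_END_EVT)
			return LINK_ESTABLISHED;
		break;
	case LINK_FAILINGOVER:
		if (e == LINK_FAILOVER_END_EVT)
			return LINK_RESET;
		break;
	}
	return s;
}
```

    Keeping the function a pure state mapping, with no I/O, leaves
    decisions such as sending protocol messages to the caller.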
     
  • The implementation of the link FSM currently decides when to send
    link protocol messages, and sends them. This is unnecessary, since such
    actions are not the result of any link state change, and are even
    decided based on non-FSM state information ("silent_intv_cnt").

    We now move the sending of unicast link protocol messages to the
    function tipc_link_timeout(), and the initial broadcast synchronization
    message to tipc_node_link_up(). The latter is done because a link
    instance should not need to know whether it is the first or second
    link to a destination. Such information is now restricted to, and
    handled by, the link aggregation layer in node.c.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Link failover and synchronization have until now been handled by the
    links themselves, forcing them to have knowledge about and to access
    parallel links in order to make the two algorithms work correctly.

    In this commit, we move the control part of this functionality to the
    link aggregation level in node.c, which is the right location for this.
    As a result, the two algorithms become easier to follow, and the link
    implementation becomes simpler.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • In the next commit, we will move link synch/failover orchestration to
    the link aggregation level. In order to do this, we first need to extend
    the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER,
    plus four new events to enter and leave those states.

    This commit introduces this change, without yet making use of it.
    The node FSM now looks as follows:

    [Node FSM diagram not recoverable from this copy. Legible labels:
    state NODE_FAILINGOVER and the partial state labels SELF_UP_,
    SELF_LEAVING and SELF_DOWN_; events PEER_DOWN_EVT, SELF_DOWN_EVT,
    SELF_UP_EVT and PEER_UP_EVT.]
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
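
    A hypothetical C sketch of how the two new node states could be
    entered and left. Only one pre-existing state is modelled (named
    SELF_UP_PEER_UP here as an assumption), and the four event names are
    illustrative; the commit text does not list them.

```c
/* Hypothetical subset of node states: one "both up" state plus the two new ones. */
enum node_state {
	SELF_UP_PEER_UP,
	NODE_SYNCHING,
	NODE_FAILINGOVER
};

/* Hypothetical names for the four new events. */
enum node_event {
	NODE_SYNCH_BEGIN_EVT,
	NODE_SYNCH_END_EVT,
	NODE_FAILOVER_BEGIN_EVT,
	NODE_FAILOVER_END_EVT
};

/* Return the next node state; unlisted events leave the state unchanged. */
static enum node_state node_fsm_evt(enum node_state s, enum node_event e)
{
	switch (s) {
	case SELF_UP_PEER_UP:
		if (e == NODE_SYNCH_BEGIN_EVT)
			return NODE_SYNCHING;
		if (e == NODE_FAILOVER_BEGIN_EVT)
			return NODE_FAILINGOVER;
		break;
	case NODE_SYNCHING:
		if (e == NODE_SYNCH_END_EVT)
			return SELF_UP_PEER_UP;
		break;
	case NODE_FAILINGOVER:
		if (e == NODE_FAILOVER_END_EVT)
			return SELF_UP_PEER_UP;
		break;
	}
	return s;
}
```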
     
  • In many cases the call order when a link is reset goes as follows:
    tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()

    This is not the right order if we want the node to be in control,
    so in this commit we change the order to:
    tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()

    The fact that tipc_link_reset() is now called from only one
    location with a well-defined state will also facilitate later
    simplifications of tipc_link_reset() and the link FSM.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
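
    The new call order can be made visible with a small C sketch that
    records each call in a trace string. node_xx(), node_link_down() and
    link_reset() are hypothetical stand-ins for tipc_node_xx(),
    tipc_node_link_down() and tipc_link_reset(); the point is only that
    the node calls down into the link, never the reverse.

```c
#include <string.h>

/* Hypothetical call trace, so the order of calls can be inspected. */
static char trace[64];

static void trace_add(const char *fn)
{
	/* +2 leaves room for the separating comma and the terminating NUL. */
	if (strlen(trace) + strlen(fn) + 2 < sizeof(trace)) {
		if (trace[0])
			strcat(trace, ",");
		strcat(trace, fn);
	}
}

/* Link level: resets its own state only, never calls back into the node. */
static void link_reset(void)
{
	trace_add("link_reset");
}

/* Node level: updates its view of the link first, then resets the link. */
static void node_link_down(void)
{
	trace_add("node_link_down");
	link_reset();
}

/* Some node-level operation that decides a link must go down. */
static void node_xx(void)
{
	trace_add("node_xx");
	node_link_down();
}
```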
     
  • In line with our effort to let the node level have full control over
    its links, we want to move all link reset calls from link.c to node.c.
    Some of the calls can be moved by simply moving the calling function,
    when this is the right thing to do. For the remaining calls we use
    the now established technique of returning a TIPC_LINK_DOWN_EVT
    flag from tipc_link_rcv() and performing the reset call after that
    call returns.

    This change serves as a preparation for the coming commits.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • The function tipc_link_activate() is redundant, since it mostly performs
    settings that have already been done in a preceding tipc_link_reset().

    There are three exceptions to this:
    - The actual state change to TIPC_LINK_WORKING. This should be done
    in the FSM anyway, not in a separate function.
    - Registration of the link with the bearer. This should be done by the
    node, since we don't want the link to have any knowledge about its
    specific bearer.
    - Call to tipc_node_link_up() for user access registration. With the new
    role distribution between link aggregation and link level this becomes
    the wrong call order; tipc_node_link_up() should instead be called
    directly as a result of a TIPC_LINK_UP event, hence by the node itself.

    This commit implements those changes.

    Tested-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2015-07-30

    Here's a set of Bluetooth & 802.15.4 patches intended for the 4.3 kernel.

    - Cleanups & fixes to mac802154
    - Refactoring of Intel Bluetooth HCI driver
    - Various coding style fixes to Bluetooth HCI drivers
    - Support for Intel Lightning Peak Bluetooth devices
    - Generic class code in interface descriptor in btusb to match more HW
    - Refactoring of Bluetooth HS code together with a new config option
    - Support for BCM4330B1 Broadcom UART controller

    Let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
    stopped accepting a hop limit from an RA if it is smaller than the
    current hop limit, for security reasons. But this behavior arguably
    breaks the RFC definition.

    RFC 4861, 6.3.4. Processing Received Router Advertisements
    A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
    and Retrans Timer) may contain a value denoting that it is
    unspecified. In such cases, the parameter should be ignored and the
    host should continue using whatever value it is already using.

    If the received Cur Hop Limit value is non-zero, the host SHOULD set
    its CurHopLimit variable to the received value.

    So add the sysctl option accept_ra_min_hop_limit to let the user choose
    the minimum hop limit value to accept from an RA, and set the default
    to 1 to comply with the RFC.

    Signed-off-by: Hangbin Liu
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Hangbin Liu
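
    A C sketch of the RA processing rule described above. The struct and
    function names are hypothetical; only the sysctl name
    accept_ra_min_hop_limit comes from the commit. A Cur Hop Limit of 0
    means "unspecified" and is ignored; any other value is applied only if
    it is at least the configured minimum, so the default of 1 accepts
    every specified value, as RFC 4861 asks.

```c
#include <stdint.h>

/* Hypothetical per-interface settings. */
struct ra_cfg {
	uint8_t hop_limit;               /* hop limit currently in use */
	uint8_t accept_ra_min_hop_limit; /* default 1 */
};

/*
 * Apply the Cur Hop Limit field of a received RA.
 * Returns 1 if the value was applied, 0 if it was ignored.
 */
static int ra_apply_hop_limit(struct ra_cfg *cfg, uint8_t cur_hop_limit)
{
	if (cur_hop_limit == 0)
		return 0; /* unspecified: keep the current value */
	if (cur_hop_limit < cfg->accept_ra_min_hop_limit)
		return 0; /* below the administrator's minimum */
	cfg->hop_limit = cur_hop_limit;
	return 1;
}
```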
     

30 Jul, 2015

15 commits


27 Jul, 2015

13 commits

  • kfree_skb() is correct here.

    Fixes: ffce41962ef6 ('lwtunnel: support dst output redirect function')
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • While doing experiments with reordering resilience, we found
    Linux senders were not able to send at full speed under reordering,
    because every incoming SACK was releasing one MSS.

    This patch removes the limitation, as we did for CWR state
    in commit a0ea700e409 ("tcp: tso: allow CA_CWR state in
    tcp_tso_should_defer()")

    Neal Cardwell had a concern about limited transmit, so
    Yuchung conducted experiments on GFE and found nothing
    that justified adding an extra check on the fast path:

        if (icsk->icsk_ca_state == TCP_CA_Disorder &&
            tcp_sk(sk)->reordering == sysctl_tcp_reordering)
                goto send_now;

    Signed-off-by: Eric Dumazet
    Signed-off-by: Yuchung Cheng
    Cc: Neal Cardwell
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
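
    The commit text does not show the patched condition, so this C sketch
    rests on an assumption: that the check in tcp_tso_should_defer()
    becomes a single ordered comparison against the Recovery state. The
    enum mirrors the ordering of the kernel's TCP congestion-avoidance
    states (Open, Disorder, CWR, Recovery, Loss); the function names are
    hypothetical.

```c
#include <stdbool.h>

/* Congestion-avoidance states, in the same order as the Linux enum. */
enum ca_state {
	CA_OPEN,
	CA_DISORDER,
	CA_CWR,
	CA_RECOVERY,
	CA_LOSS
};

/* Previous rule: only Open and CWR may defer a TSO send. */
static bool must_send_now_old(enum ca_state s)
{
	return s != CA_OPEN && s != CA_CWR;
}

/* Assumed new rule: Disorder may defer too; only Recovery and Loss force a send. */
static bool must_send_now_new(enum ca_state s)
{
	return s >= CA_RECOVERY;
}
```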
     
  • The patch checks neigh->nud_state before acquiring the writer lock.
    Note that rt6_probe() is only used when CONFIG_IPV6_ROUTER_PREF is set.

    Test setup: 40 udpflood processes and a /64 gateway route, with the
    gateway in NUD_PERMANENT state. Each test runs for 30s. The total
    number of completed sendto() calls at the end:

    Before: 55M
    After: 95M

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    CC: Julian Anastasov
    CC: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Martin KaFai Lau
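
    A userspace C sketch of the check-before-lock idea, assuming a pthread
    mutex as a stand-in for the neighbour entry's writer lock. The names
    are hypothetical and the state is reduced to "valid" versus "needs
    probing". The common case (for example a NUD_PERMANENT gateway)
    returns without touching the lock, and the state is re-checked once
    the lock is held.

```c
#include <pthread.h>

/* Hypothetical neighbour states: only "valid" vs "needs probing" matters. */
enum neigh_state { NEIGH_VALID, NEIGH_NEEDS_PROBE };

struct neigh {
	pthread_mutex_t lock;    /* stand-in for the writer lock */
	enum neigh_state state;
	int probes;
	int lock_taken;          /* counts how often the lock was acquired */
};

static void probe_if_needed(struct neigh *n)
{
	/* Unlocked pre-check; a concurrent version would need an atomic,
	 * READ_ONCE-style read here. */
	if (n->state == NEIGH_VALID)
		return;

	pthread_mutex_lock(&n->lock);
	n->lock_taken++;
	/* Re-check under the lock: the state may have changed meanwhile. */
	if (n->state == NEIGH_NEEDS_PROBE)
		n->probes++;
	pthread_mutex_unlock(&n->lock);
}
```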
     
  • This is preparatory work for the next patch, which removes write_lock
    from rt6_probe().

    1. Reduce the number of if (neigh) checks from 4 to 1.
    2. Bring the write_(un)lock() closer to the operations that the
    lock is protecting.

    Hopefully, the above makes rt6_probe() more readable.

    Signed-off-by: Martin KaFai Lau
    Cc: Hannes Frederic Sowa
    Cc: Julian Anastasov
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • It saves some lines and simplifies the code a bit when the state is
    returned by this function. It is also useful for handling a NULL entry.

    To avoid too long lines, I've also renamed lwtunnel_state_get() and
    lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

    CC: Thomas Graf
    CC: Roopa Prabhu
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Nicolas Dichtel
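
    A C sketch of why returning the state pointer helps, modelled loosely
    on the renamed lwtstate_get()/lwtstate_put() pair. struct lwt_state,
    state_get() and state_put() are hypothetical, and C11 atomics stand in
    for the kernel's refcount helpers. Because get() tolerates NULL and
    returns its argument, a caller can copy and reference a possibly-NULL
    state in one assignment.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted state object. */
struct lwt_state {
	atomic_int refcnt;
};

/* Take a reference; a NULL entry is tolerated and passed through. */
static struct lwt_state *state_get(struct lwt_state *s)
{
	if (s)
		atomic_fetch_add(&s->refcnt, 1);
	return s;
}

/* Drop a reference and free on the last put; NULL is a no-op. */
static void state_put(struct lwt_state *s)
{
	if (!s)
		return;
	if (atomic_fetch_sub(&s->refcnt, 1) == 1)
		free(s);
}
```

    For example, `copy = state_get(orig);` needs no separate NULL check
    before it.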
     
  • We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
    use ip6_rt_copy_init() to build a dst).

    CC: Thomas Graf
    CC: Roopa Prabhu
    Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • This function makes sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
    The check is already done in IPv4.

    CC: Thomas Graf
    CC: Roopa Prabhu
    Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Fix the following typo
    - unchainged -> unchanged

    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: David S. Miller

    subashab@codeaurora.org
     
  • Send notifications when a router port is added and when it is deleted
    or expires: reuse the already existing MDBA_ROUTER attribute and send
    NEWMDB/DELMDB netlink notifications respectively.

    Signed-off-by: Satish Ashok
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Satish Ashok
     
  • Retrieve the tunnel metadata for packets received by a net_device and
    provide it to ovs_vport_receive() for flow key extraction.

    [This hunk was in the GRE patch in the initial series and missed the
    cut for the initial submission for merging.]

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Signed-off-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following
    compilation error:
    LD init/built-in.o
    net/built-in.o: In function `vxlan_tnl_create':
    .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create'
    make: *** [vmlinux] Error 1

    CC: Thomas Graf
    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Currently, we do not notice if new alternative gateways
    are added. We can detect this by checking for an existing neigh
    entry. Also, gateways that are currently being probed (NUD_INCOMPLETE)
    can be skipped during round-robin probing.

    Suggested-by: Florian Westphal
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • A similar check was added to ip_rcv() but not to ipv6_rcv().

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: ipv6_rcv+0xfa/0x500
    Call Trace:
    ? ip_rcv+0x296/0x400
    ? packet_rcv+0x52/0x410
    __netif_receive_skb_core+0x63f/0x9a0
    ? br_handle_frame_finish+0x580/0x580 [bridge]
    ? update_rq_clock.part.81+0x1c/0x40
    __netif_receive_skb+0x18/0x60
    process_backlog+0x9f/0x150

    Fixes: ee122c79d422 ("vxlan: Flow based tunneling")
    Signed-off-by: Wei-Chun Chao
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Wei-Chun Chao
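
    The commit text does not show the added check, so this C sketch
    assumes it is a NULL test on the receiving device's per-protocol IPv6
    state, made before that state is dereferenced. struct net_dev,
    struct ip6_cfg and ipv6_rcv_sketch() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device IPv6 configuration. */
struct ip6_cfg {
	bool disable_ipv6;
};

struct net_dev {
	struct ip6_cfg *ip6; /* NULL when the device has no IPv6 state */
};

enum rx_verdict { RX_DROP, RX_ACCEPT };

/* Receive entry point: test for missing state before dereferencing it. */
static enum rx_verdict ipv6_rcv_sketch(const struct net_dev *dev)
{
	if (dev->ip6 == NULL || dev->ip6->disable_ipv6)
		return RX_DROP;
	return RX_ACCEPT;
}
```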