04 Sep, 2017

40 commits

  • This patch removes NF_CT_ASSERT() and instead uses WARN_ON().

    Signed-off-by: Varsha Rao

    Varsha Rao
     
  • Tested with an allmodconfig build.

    Signed-off-by: Florian Westphal

    Florian Westphal
     
  • Register a new limit stateful object type into the stateful object
    infrastructure.

    Signed-off-by: Pablo M. Bermudo Garay
    Signed-off-by: Pablo Neira Ayuso

    Pablo M. Bermudo Garay
     
  • Just a small refactor patch in order to improve the code readability.

    Signed-off-by: Pablo M. Bermudo Garay
    Signed-off-by: Pablo Neira Ayuso

    Pablo M. Bermudo Garay
     
  • This patch adds support for overloading stateful objects operations
    through the select_ops() callback, just as it is implemented for
    expressions.

    This change is needed for upcoming additions to the stateful objects
    infrastructure.

    Signed-off-by: Pablo M. Bermudo Garay
    Signed-off-by: Pablo Neira Ayuso

    Pablo M. Bermudo Garay
     
  • This patch adds a new feature to hashlimit that allows matching on the
    current packet/byte rate without rate limiting. This can be enabled
    with a new flag --hashlimit-rate-match. The match returns true if the
    current rate of packets is above/below the user specified value.

    The main difference between the existing algorithm and the new one is
    that the existing algorithm rate-limits the flow whereas the new
    algorithm does not. Instead it *classifies* the flow based on whether
    it is above or below a certain rate. I will demonstrate this with an
    example below. Let us assume this rule:

    iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain

    If the packet rate is 15/s, the existing algorithm would ACCEPT 10
    packets every second and send 5 packets to "new_chain".

    But with the new algorithm, as long as the rate of 15/s is sustained,
    all packets will continue to match and every packet is sent to new_chain.

    This new functionality will let us classify different flows based on
    their current rate, so that further decisions can be made on them based on
    what the current rate is.

    This is how the new algorithm works:
    We divide time into intervals of 1 (sec/min/hour) as specified by
    the user. We keep track of the number of packets/bytes processed in the
    current interval. After each interval we reset the counter to 0.

    When we receive a packet for match, we look at the packet rate
    during the current interval and the previous interval to make a
    decision:

    if [ prev_rate < user and cur_rate < user ]
        return Below
    else
        return Above

    Where cur_rate is the number of packets/bytes seen in the current
    interval, prev_rate is the number of packets/bytes seen in the previous
    interval and 'user' is the rate specified by the user.
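
    The classification rule above can be sketched as follows. This is only
    an illustrative model, not the kernel implementation: the class and
    field names are hypothetical, and it assumes that the current packet is
    counted before the comparison and that an interval with no traffic
    counts as a rate of 0.

```python
from dataclasses import dataclass


@dataclass
class RateMatchState:
    """Per-flow counters for the interval-based classifier (hypothetical names)."""
    interval: float        # interval length in seconds, chosen by the user
    interval_start: float  # start time of the current interval
    prev_rate: int = 0     # packets/bytes seen in the previous interval
    cur_rate: int = 0      # packets/bytes seen so far in the current interval

    def _roll(self, now: float) -> None:
        """Advance to the interval containing `now`, resetting the counter."""
        elapsed = int((now - self.interval_start) // self.interval)
        if elapsed <= 0:
            return
        # Exactly one interval passed: current becomes previous.
        # More than one passed: the previous interval saw no traffic.
        self.prev_rate = self.cur_rate if elapsed == 1 else 0
        self.cur_rate = 0
        self.interval_start += elapsed * self.interval

    def classify(self, now: float, user: int, amount: int = 1) -> str:
        """Count one packet (or `amount` bytes) and classify the flow."""
        self._roll(now)
        self.cur_rate += amount
        if self.prev_rate < user and self.cur_rate < user:
            return "below"
        return "above"


# A flow sending 15 packets per second against a user rate of 10.
state = RateMatchState(interval=1.0, interval_start=0.0)
results = [state.classify(i / 15, user=10) for i in range(30)]
```

    In this model the flow is classified as above from the tenth packet
    onward and stays there for as long as the 15/s rate is sustained,
    because the previous interval keeps exceeding the user rate.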

    We also provide flexibility to the user for choosing the time
    interval using the option --hashlimit-rate-interval. For example the
    user can keep a low rate like x/hour but still keep the interval as
    small as 1 second.

    To preserve backwards compatibility we have to add this feature in a new
    revision, so I've created revision 3 for hashlimit. The two new options
    we add are:

    --hashlimit-rate-match
    --hashlimit-rate-interval

    I have updated the help text to add these new options. Also added a few
    tests for the new options.

    Suggested-by: Igor Lubashev
    Reviewed-by: Josh Hunt
    Signed-off-by: Vishwanath Pai
    Signed-off-by: Pablo Neira Ayuso

    Vishwanath Pai
     
  • Jakub Kicinski says:

    ====================
    nfp: refactor app init, and minor flower fixes

    This series is a part 2 to what went into net as a simpler fix.
    In net we simply moved when existing callbacks are invoked to
    ensure flower app does not still use representors when lower
    netdev has already been destroyed. In this series we add a
    callback to notify apps when vNIC netdevs are fully initialized
    and they are about to be destroyed. This allows flower to spawn
    representors at the right time, while keeping the start/stop
    callbacks for what they are intended to be used - FW initialization
    over control channel.

    Patch 4 improves drop monitor interaction and patch 5 changes
    the default Kconfig selection of flower offload. Patch 6 fixes
    locking around representor updates which got lost in net-next.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When we moved to updating representors from a workqueue, grabbing
    the RTNL somehow got lost in the process. Restore it, and make
    sure RCU lock is not held while we are grabbing the RTNL. RCU
    protects the representor table, so since we will be under RTNL
    we can drop RCU lock as soon as we find the netdev pointer.
    RTNL is needed for the dev_set_mtu() call.

    Fixes: 2dff19622421 ("nfp: process MTU updates from firmware flower app")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • It's reasonable to assume that if user selects to build the NFP
    driver all offload capabilities will be enabled by default.
    Change the CONFIG_NFP_APP_FLOWER to default to enabled.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Use dev_consume_skb_any() in place of dev_kfree_skb_any()
    when control frame has been successfully processed in flower
    and on the driver's main TX completion path.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Since representors are now created with a separate callback,
    start/stop app callbacks can be moved again to their original
    location. They are intended for app-specific init/clean up
    over the control channel.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Create representors after lower vNIC is registered and destroy
    them before it is destroyed. Move the code out of start/stop
    callbacks directly into vnic_init/clean callbacks. Make sure
    SR-IOV callbacks don't try to create representors when lower
    device does not exist.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We currently only have one app callback for vNIC creation
    and destruction. This is insufficient, because some actions
    have to be taken before netdev is registered, after it's
    registered and after it's unregistered. Old callbacks
    were really corresponding to alloc/free actions. Rename
    them and add proper init/clean. Apps using representors
    will be able to use new callbacks to manage lifetime of
    upper devices.
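
    A rough sketch of the resulting callback ordering, for illustration
    only. The callback names (vnic_alloc, vnic_init, vnic_clean, vnic_free)
    and the placement of the clean step before unregistration are
    assumptions based on the descriptions in this series, not a copy of the
    driver code.

```python
class RecordingApp:
    """Hypothetical app that records the order in which callbacks fire."""

    def __init__(self, log):
        self.log = log

    def vnic_alloc(self, vnic):
        self.log.append(f"alloc {vnic}")   # netdev not registered yet

    def vnic_init(self, vnic):
        self.log.append(f"init {vnic}")    # netdev registered: upper devices may be created

    def vnic_clean(self, vnic):
        self.log.append(f"clean {vnic}")   # upper devices go away before the lower netdev

    def vnic_free(self, vnic):
        self.log.append(f"free {vnic}")    # netdev already unregistered


def bring_up(app, vnic, log):
    app.vnic_alloc(vnic)
    log.append(f"register {vnic}")         # stand-in for netdev registration
    app.vnic_init(vnic)


def tear_down(app, vnic, log):
    app.vnic_clean(vnic)
    log.append(f"unregister {vnic}")       # stand-in for netdev unregistration
    app.vnic_free(vnic)


log = []
app = RecordingApp(log)
bring_up(app, "vnic0", log)
tear_down(app, "vnic0", log)
```

    The point of the split is that alloc/free bracket the whole lifetime of
    the vNIC, while init/clean bracket only the period during which the
    lower netdev is registered.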

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2017-09-03

    This series from Tariq includes micro data path optimization for mlx5e
    netdevice driver.

    Mainly Tariq introduces the following changes to NAPI and RX handling
    path of the driver:
    - RX ring structure reorganizing
    - Trivial code refactoring and optimization
    - NAPI busy-poll for when fast UMR is in progress
    - Non-atomic state operations in NAPI context
    - Remove unnecessary fields from fast path structures
    - page-cache micro optimization
    - Rely on NAPI to avoid missing an IRQ for RX/TX shared NAPI contexts
    - Stop NAPI when irq changes affinity
    - Distribute RSS table among all RX rings
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Jiri Pirko says:

    ====================
    mlxsw: Offloading GRE tunnels

    Petr says:

    This patch series introduces to mlxsw driver support for offloading
    IP-in-IP tunnels in general, and for (subset of) GRE in particular.

    This patchset supports two ways of configuring GRE:

    - So called "hierarchical configuration", where the GRE device has a bound
    dummy device, which is in a different VRF. The VRF with host traffic is
    called "overlay", the one with encapsulated traffic is called "underlay".

    - So called "flat configuration", where the GRE device doesn't have a bound
    device, and overlay and underlay are both in the same VRF (possibly the
    default one).

    Two routes are then interesting: a route that directs traffic to a GRE
    device (which would typically be in overlay VRF, but could be in another
    one), and a local route for the tunnel's local address (in underlay).
    Handling of these two route types is then introduced as patches to support,
    respectively, IPv4 and IPv6 encapsulation and IPv4 decapsulation.

    The encap and decap routes then reference a loopback device, a new type of
    RIF introduced by this patchset for the specific use of offloading tunnels.

    The encap and decap code is abstract with respect to the particulars of
    individual L3 tunnel types. This patchset introduces support for GRE
    tunnels in particular.

    Limitations:

    - Each tunnel needs to have a different local address (within a given VRF).
    When two tunnels are used that are in conflict, FIB abort is triggered
    and the driver ceases offloading FIBs. Full handling of such
    configurations needs special setup in the hardware, such that the tunnels
    that share an address are dispatched correctly according to their key (or
    lack thereof). That's currently not implemented, and to keep things
    deterministic, the driver triggers FIB abort.

    - A next hop that uses an incompletely-specified tunnel (e.g. such that are
    used for LWT) is not offloaded, but doesn't trigger FIB abort like the
    above. If such routes end up being in a de facto conflict with other
    tunnels, then if there already is an offload for that address, the
    traffic for the conflicting tunnel will end up mismatching the
    configuration of the offloaded tunnel, and thus gets to slow path through
    an error trap.

    - GRE checksumming and sequence numbers are not supported, and TTL and TOS
    need to be set to inherit. Tunnels with a different configuration are not
    offloaded and their traffic is trapped to the slow path.

    Note in particular that TOS of inherit is not the default configuration
    and needs to be explicitly specified when the tunnel is created.

    - One case is not handled gracefully: if a change is made to the tunnel,
    e.g. through "ip tunnel change", it is not reflected in the driver.
    There is currently no notification mechanism for these changes.
    Introducing this mechanism and using it in the driver will be the subject
    of follow-up work. For now this limitation can be worked around by
    removing and re-adding the encap route.

    ---
    v1->v2:
    -fix order of patch 5
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch introduces callbacks and tunnel type to offload GRE tunnels.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • struct mlxsw_sp_rif is a router-private structure, and therefore
    everything related to it is as well: parameters, and derived RIF types
    including loopbacks. IPIP module needs access to some details of
    loopback interfaces, but exporting all the RIF shebang would create too
    large an interface.

    So instead export just the bare minimum necessary: accessors for RIF
    index and underlay VRF ID.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • These traps are generated for packets that fail checks for source IP,
    encapsulation type, or GRE key. Trap these packets to CPU for follow-up
    handling by the kernel, which will send ICMP destination unreachable
    responses.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The local route that points at IPIP's underlay device (decap route) can
    be present long before the GRE device. Thus when an encap route is
    added, it's necessary to look inside the underlay FIB if the decap route
    is already present. If so, the current trap offload needs to be
    withdrawn and replaced with a decap offload.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Unlike encapsulation, which is represented by a next hop forwarding to
    an IPIP tunnel, decapsulation is a type of local route. It is created
    for local routes whose prefix corresponds to the local address of one of
    offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
    next hops are removed), the decap offload is migrated back to a trap for
    resolution in slow path.

    This patch assumes that decap route is already present when encap route
    is added. A follow-up patch will fix this issue.

    Note that this patch only supports IPv4 underlay. Support for IPv6
    underlay will be subject to follow-up work apart from this patchset.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
    offloading of IPv6 overlay encapsulation.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This introduces some common code for tracking of offloaded IP-in-IP
    tunnels, and support for offloading IPv4 overlay encapsulating routes in
    particular. A follow-up patch will introduce IPv6 overlay as well.

    Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
    objects hooked up in mlxsw_sp_router. A network device that represents
    the tunnel is used as a key to look up the corresponding IPIP entry.
    Note that in the future, more general keying mechanism will be needed,
    because parts of the tunnel information can be provided by the route.

    IPIP entries are reference counted, because several next hops may end up
    using the same tunnel, and we only want to offload it once.

    Encapsulation path hooks into next hop handling. Routes that forward to
    a tunnel are now considered gateway routes, thus giving them the same
    treatment that other remote routes get. An IPIP next hop type is
    introduced.

    Details of individual tunnel types are kept in an array of
    mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
    known tunnel types, the next-hop is not considered an IPIP next hop.

    The list of IPIP tunnel types is currently empty; follow-up patches will
    add support for GRE. Traffic to IPIP tunnel types that are not
    explicitly recognized by the driver is trapped and handled in slow path.
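
    The reference-counted tracking of offloaded tunnels described above can
    be modelled roughly as below. This is a simplified sketch with
    hypothetical names (IpipEntry, IpipRegistry, offload_count); a plain
    Python list stands in for the linked list, and a counter stands in for
    programming the hardware.

```python
class IpipEntry:
    """One offloaded tunnel, keyed by the netdevice that represents it."""

    def __init__(self, netdev):
        self.netdev = netdev
        self.ref_count = 0


class IpipRegistry:
    """Hypothetical stand-in for the per-router list of IPIP entries."""

    def __init__(self):
        self.entries = []
        self.offload_count = 0  # how many times a tunnel was actually offloaded

    def find(self, netdev):
        for entry in self.entries:
            if entry.netdev == netdev:
                return entry
        return None

    def get(self, netdev):
        """Take a reference, offloading the tunnel on first use only."""
        entry = self.find(netdev)
        if entry is None:
            entry = IpipEntry(netdev)
            self.entries.append(entry)
            self.offload_count += 1
        entry.ref_count += 1
        return entry

    def put(self, entry):
        """Drop a reference, removing the entry when the last user is gone."""
        entry.ref_count -= 1
        if entry.ref_count == 0:
            self.entries.remove(entry)


registry = IpipRegistry()
first = registry.get("gre1")   # first next hop using the tunnel
second = registry.get("gre1")  # second next hop shares the same entry
```

    Two next hops forwarding to the same tunnel device end up sharing one
    entry, so the tunnel is offloaded once and released only after both
    next hops are gone.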

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • In the router, some next hops may reference an encapsulating netdevice,
    such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
    keep track of whether a given next hop is a regular Ethernet entry, or
    an IP-in-IP tunneling entry.

    To facilitate this book-keeping, add a type field to struct
    mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
    MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
    variant.

    There are several places where next hops are initialized in the IPv4
    path. Instead of replicating the logic at every one of them, factor it
    out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
    is actually protocol-neutral, so put it in mlxsw_sp_nexthop_type_fini(),
    but create a corresponding protocol-specific _fini function that
    dispatches to the protocol-neutral one.

    The IPv6 path is simpler, but for symmetry with IPv4, create the same
    suite of functions with corresponding logic.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • IPv6 counterpart of the previous patch: introduce a function to
    determine whether a given route is a gateway route.

    The new function takes a mlxsw_sp argument which follow-up patches will
    use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
    devices need to be considered gateway routes as well. That involves a
    bit more logic, so extract the current test to a separate function,
    where the logic can be later added.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • When offloading L3 tunnels, an adjacency entry is created that loops the
    packet back into the underlay router. Loopback interfaces then hold the
    corresponding information and are created for IP-in-IP netdevices.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Loopback RIFs, which will be introduced in a follow-up patch, differ
    from other RIFs in that they do not have a FID associated with them.

    To support this, demote FID allocation from mlxsw_sp_rif_create to
    configure op of the existing RIF types, and likewise the FID release
    from mlxsw_sp_rif_destroy to deconfigure op.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Details of individual tunnel types are kept in an array of
    mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
    determine whether a constructed RIF should be a loopback, and to decide
    whether a next hop references a tunnel.

    The list is currently empty; follow-up patches will add support for GRE.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The spectrum_ipip module that will be introduced in the follow-up
    patches needs to know the data type.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • To support IPIP, the driver needs to be able to construct an IPIP
    adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
    argument. Adjust the one existing caller.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Unlike other interface types, loopback RIFs do not have a MAC address.
    So drop the corresponding argument from mlxsw_reg_ritr_pack() and move
    it to a new function. Call that from callers of mlxsw_reg_ritr_pack().

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The RTDP register is used for configuring the tunnel decap properties of
    NVE and IPinIP.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type
    IP2ME with tunnel validity bit and tunnel pointer set. The necessary
    register fields are already available, so add a function to pack the
    RALUE as appropriate.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This enum is used with reg_ratr_trap_id, so move it next to the register
    definition.

    While at it, drop the enumerator initializers.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • So far, adjacencies have always been of type Ethernet (with value of 0),
    and thus there was no need to explicitly support RATR type. However to
    support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific
    attributes need to be added.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Update the register so that loopback RIFs can be created and loopback
    properties specified.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Antoine Tenart says:

    ====================
    net: mvpp2: improve the mac address retrieval logic

    This series aims at fixing the logic behind the MAC address retrieval in the
    PPv2 driver. A possible issue is also fixed in patch 3/3 to introduce fallbacks
    when the address given in the device tree isn't valid.

    Thanks!
    Antoine

    Since v2:
    - Patch 1/4 from v2 was applied on net (and net was merged in net-next).
    - Rebased on net-next.

    Since v1:
    - Rebased onto net (was on net-next).
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When using a mac address described in the device tree, a check is made
    to see if it is valid. When it's not, no fallback is defined. This
    patch tries to get the mac address from h/w (or use a random one if
    the h/w one isn't valid) when the dt mac address isn't valid.
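
    The fallback order described above can be sketched as follows. This is
    an illustrative model rather than the driver code: pick_mac and its
    arguments are hypothetical, the validity check assumes the usual rule
    that an address is valid when it is neither all-zero nor multicast, and
    the random address is assumed to be a locally administered unicast one.

```python
import os


def is_valid_ether_addr(addr: bytes) -> bool:
    """Valid means six bytes, not all zero, and not multicast."""
    return len(addr) == 6 and any(addr) and not (addr[0] & 0x01)


def random_ether_addr(rand=os.urandom) -> bytes:
    """Random unicast address with the locally administered bit set."""
    raw = bytearray(rand(6))
    raw[0] &= 0xFE  # clear the multicast bit
    raw[0] |= 0x02  # set the locally administered bit
    return bytes(raw)


def pick_mac(dt_mac, hw_mac, rand=os.urandom):
    """Prefer the device tree address, then the h/w one, then a random one."""
    if dt_mac is not None and is_valid_ether_addr(dt_mac):
        return dt_mac, "device tree"
    if hw_mac is not None and is_valid_ether_addr(hw_mac):
        return hw_mac, "hardware"
    return random_ether_addr(rand), "random"


# An all-zero device tree address falls back to the hardware address.
mac, source = pick_mac(bytes(6), bytes.fromhex("0050430a0b0c"))
```

    Returning the source alongside the address makes it easy for a caller
    to log which fallback was taken.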

    Signed-off-by: Antoine Tenart
    Signed-off-by: David S. Miller

    Antoine Tenart
     
  • The MAC retrieval logic is using a variable to store an h/w stored mac
    address and checks this mac against invalid ones before using it. But
    the mac address is only read from h/w when using PPv2.1. So when using
    PPv2.2 it defaults to its init state.

    This patch fixes the logic to only check if the h/w mac is valid when
    actually retrieving a mac from h/w.

    Signed-off-by: Antoine Tenart
    Signed-off-by: David S. Miller

    Antoine Tenart
     
  • The MAC retrieval logic is quite complicated (and broken). Move it
    to its own function to prepare for patches fixing the logic, so that
    reviews are easier.

    Signed-off-by: Antoine Tenart
    Signed-off-by: David S. Miller

    Antoine Tenart