04 Sep, 2017
40 commits
-
This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
Signed-off-by: Varsha Rao
-
Tested with an allmodconfig build.
Signed-off-by: Florian Westphal
-
Register a new limit stateful object type into the stateful object
infrastructure.

Signed-off-by: Pablo M. Bermudo Garay
Signed-off-by: Pablo Neira Ayuso
-
Just a small refactor patch in order to improve the code readability.
Signed-off-by: Pablo M. Bermudo Garay
Signed-off-by: Pablo Neira Ayuso
-
This patch adds support for overloading stateful object operations
through the select_ops() callback, just as is already done for
expressions.

This change is needed for upcoming additions to the stateful object
infrastructure.

Signed-off-by: Pablo M. Bermudo Garay
Signed-off-by: Pablo Neira Ayuso
-
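The select_ops() dispatch described in the commit above lets an object type pick its ops table per object, based on the object's attributes, instead of always using one static table. The following is a minimal sketch in C, not the nf_tables API. All struct, field and function names are hypothetical. A plain struct stands in for netlink attributes, and the packet/byte variants are only an illustrative assumption about the kind of overloading this enables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the attributes of a new object. */
struct obj_attrs {
	int variant;                    /* e.g. 0 = packet-based, 1 = byte-based */
};

/* Hypothetical ops table; a real one would hold init/eval/dump/destroy. */
struct obj_ops {
	const char *name;
};

/* A type offers either one static ops table or a select_ops() hook. */
struct obj_type {
	const struct obj_ops *ops;
	const struct obj_ops *(*select_ops)(const struct obj_attrs *attrs);
};

static const struct obj_ops limit_pkts_ops  = { .name = "limit-pkts" };
static const struct obj_ops limit_bytes_ops = { .name = "limit-bytes" };
static const struct obj_ops counter_ops     = { .name = "counter" };

/* Overloaded type: chooses the ops variant from the attributes. */
static const struct obj_ops *limit_select_ops(const struct obj_attrs *attrs)
{
	switch (attrs->variant) {
	case 0:
		return &limit_pkts_ops;
	case 1:
		return &limit_bytes_ops;
	default:
		return NULL;            /* unsupported variant: reject the object */
	}
}

static const struct obj_type limit_obj_type   = { .select_ops = limit_select_ops };
static const struct obj_type counter_obj_type = { .ops = &counter_ops };

/* Core side: use select_ops() when the type provides it, else the static ops. */
static const struct obj_ops *obj_resolve_ops(const struct obj_type *type,
					     const struct obj_attrs *attrs)
{
	assert(type != NULL && attrs != NULL);
	if (type->select_ops)
		return type->select_ops(attrs);
	return type->ops;
}
```

With this split, the core does not need to know about variants. Types without a select_ops() hook keep working unchanged, which mirrors how the commit describes reusing the existing expression mechanism.
-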
This patch adds a new feature to hashlimit that allows matching on the
current packet/byte rate without rate limiting. This can be enabled
with a new flag, --hashlimit-rate-match. The match returns true if the
current rate of packets is above/below the user-specified value.

The main difference between the existing algorithm and the new one is
that the existing algorithm rate-limits the flow, whereas the new
algorithm does not. Instead, it *classifies* the flow based on whether
it is above or below a certain rate. I will demonstrate this with an
example below. Let us assume this rule:

iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain

If the packet rate is 15/s, the existing algorithm would ACCEPT 10
packets every second and send 5 packets to "new_chain".

But with the new algorithm, as long as the rate of 15/s is sustained,
all packets will continue to match and every packet is sent to new_chain.

This new functionality will let us classify different flows based on
their current rate, so that further decisions can be made on them based on
what the current rate is.

This is how the new algorithm works:
We divide time into intervals of 1 (sec/min/hour) as specified by
the user. We keep track of the number of packets/bytes processed in the
current interval. After each interval we reset the counter to 0.

When we receive a packet for match, we look at the packet rate
during the current interval and the previous interval to make a
decision:

if [ prev_rate < user and cur_rate < user ]
    return Below
else
    return Above

Where cur_rate is the number of packets/bytes seen in the current
interval, prev_rate is the number of packets/bytes seen in the previous
interval and 'user' is the rate specified by the user.

We also give the user flexibility in choosing the time
interval using the option --hashlimit-rate-interval. For example, the user can
keep a low rate like x/hour but still keep the interval as small as 1
second.

To preserve backwards compatibility we have to add this feature in a new
revision, so I've created revision 3 for hashlimit. The two new options
we add are:

--hashlimit-rate-match
--hashlimit-rate-interval

I have updated the help text to add these new options, and also added a few
tests for the new options.

Suggested-by: Igor Lubashev
Reviewed-by: Josh Hunt
Signed-off-by: Vishwanath Pai
Signed-off-by: Pablo Neira Ayuso
-
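The rate-match decision from the hashlimit commit above can be sketched for a single flow as follows. This is a minimal illustration under several assumptions, not the xt_hashlimit implementation:

- the names are hypothetical;
- time is an abstract monotonic counter;
- the user rate has already been converted to a packet count per interval;
- each packet is counted before the flow is classified;
- a whole interval with no traffic resets the "previous" count to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-flow state; names are illustrative, not xt_hashlimit's. */
struct rate_state {
	uint64_t interval;      /* interval length in abstract time units */
	uint64_t start;         /* start time of the current interval */
	uint64_t prev_count;    /* packets seen in the previous interval */
	uint64_t cur_count;     /* packets seen so far in the current interval */
};

enum rate_class { RATE_BELOW, RATE_ABOVE };

static void rate_init(struct rate_state *s, uint64_t interval, uint64_t now)
{
	assert(interval > 0);
	s->interval = interval;
	s->start = now;
	s->prev_count = 0;
	s->cur_count = 0;
}

/* Count one packet seen at 'now' (monotonic, >= s->start) and classify the
 * flow against 'user_rate', assumed to be a packet count per interval. */
static enum rate_class rate_match(struct rate_state *s, uint64_t now,
				  uint64_t user_rate)
{
	uint64_t elapsed = now - s->start;

	if (elapsed >= s->interval) {
		/* The finished interval becomes "previous", unless a whole
		 * interval went by without any packets. */
		s->prev_count = elapsed < 2 * s->interval ? s->cur_count : 0;
		s->cur_count = 0;
		s->start += (elapsed / s->interval) * s->interval;
	}

	s->cur_count++;

	/* Below only if both the previous and the current interval are below. */
	if (s->prev_count < user_rate && s->cur_count < user_rate)
		return RATE_BELOW;
	return RATE_ABOVE;
}
```

Consider an interval of 1000 time units and a user rate of 10. A flow sending a steady 15 packets per interval gets 6 Above results in the very first interval (packets 10 through 15). After that, every packet is Above, because the previous interval already exceeded the rate. This matches the 15/s example in the commit: the flow is classified rather than limited.
-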
Jakub Kicinski says:
====================
nfp: refactor app init, and minor flower fixes

This series is part 2 of what went into net as a simpler fix.
In net we simply moved the point at which existing callbacks are
invoked, to ensure the flower app does not still use representors
when the lower netdev has already been destroyed. In this series we
add a callback to notify apps when vNIC netdevs are fully initialized
and when they are about to be destroyed. This allows flower to spawn
representors at the right time, while keeping the start/stop
callbacks for what they are intended for: FW initialization
over the control channel.

Patch 4 improves drop monitor interaction and patch 5 changes
the default Kconfig selection of flower offload. Patch 6 fixes
locking around representor updates, which got lost in net-next.
====================

Signed-off-by: David S. Miller
-
When we moved to updating representors from a workqueue, grabbing
the RTNL somehow got lost in the process. Restore it, and make
sure the RCU lock is not held while we are grabbing the RTNL. RCU
protects the representor table, so since we will be under RTNL
we can drop the RCU lock as soon as we find the netdev pointer.
RTNL is needed for the dev_set_mtu() call.

Fixes: 2dff19622421 ("nfp: process MTU updates from firmware flower app")
Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
It's reasonable to assume that if the user chooses to build the NFP
driver, all offload capabilities should be enabled by default.
Change CONFIG_NFP_APP_FLOWER to default to enabled.

Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
Use dev_consume_skb_any() in place of dev_kfree_skb_any()
when a control frame has been successfully processed in flower
and on the driver's main TX completion path.

Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
Since representors are now created with a separate callback,
the start/stop app callbacks can be moved back to their original
location. They are intended for app-specific init/clean up
over the control channel.

Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
Create representors after the lower vNIC is registered and destroy
them before it is destroyed. Move the code out of the start/stop
callbacks directly into the vnic_init/clean callbacks. Make sure
SR-IOV callbacks don't try to create representors when the lower
device does not exist.

Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
We currently only have one app callback each for vNIC creation
and destruction. This is insufficient, because some actions
have to be taken before the netdev is registered, after it's
registered and after it's unregistered. The old callbacks
really corresponded to alloc/free actions. Rename
them and add proper init/clean. Apps using representors
will be able to use the new callbacks to manage the lifetime of
upper devices.

Signed-off-by: Jakub Kicinski
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
-
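The callback split described in the commit above can be sketched as follows. The names (struct app_ops, vnic_create(), vnic_destroy()) are hypothetical and are not the nfp driver's API. Netdev registration is only simulated by a trace entry. Placing the clean step before unregistration follows the companion commit, which destroys representors before the lower vNIC is destroyed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical app callbacks mirroring the alloc/init/clean/free split. */
struct app_ops {
	int  (*vnic_alloc)(void *ctx);  /* before the netdev is registered */
	int  (*vnic_init)(void *ctx);   /* after registration, e.g. spawn representors */
	void (*vnic_clean)(void *ctx);  /* before unregistration, e.g. destroy them */
	void (*vnic_free)(void *ctx);   /* after the netdev is unregistered */
};

/* Records the order of events so the sequence can be inspected. */
struct trace {
	char log[128];
};

static void trace_add(struct trace *t, const char *ev)
{
	if (t->log[0])
		strncat(t->log, ",", sizeof(t->log) - strlen(t->log) - 1);
	strncat(t->log, ev, sizeof(t->log) - strlen(t->log) - 1);
}

static int  demo_alloc(void *ctx) { trace_add(ctx, "alloc"); return 0; }
static int  demo_init(void *ctx)  { trace_add(ctx, "init");  return 0; }
static void demo_clean(void *ctx) { trace_add(ctx, "clean"); }
static void demo_free(void *ctx)  { trace_add(ctx, "free"); }

static const struct app_ops demo_ops = {
	.vnic_alloc = demo_alloc,
	.vnic_init  = demo_init,
	.vnic_clean = demo_clean,
	.vnic_free  = demo_free,
};

/* Driver side; registration itself is only simulated by a trace entry. */
static int vnic_create(const struct app_ops *ops, struct trace *t)
{
	int err;

	assert(ops != NULL && t != NULL);
	err = ops->vnic_alloc(t);
	if (err)
		return err;
	trace_add(t, "register");
	err = ops->vnic_init(t);
	if (err) {
		/* unwind in reverse order if the post-register step fails */
		trace_add(t, "unregister");
		ops->vnic_free(t);
	}
	return err;
}

static void vnic_destroy(const struct app_ops *ops, struct trace *t)
{
	ops->vnic_clean(t);
	trace_add(t, "unregister");
	ops->vnic_free(t);
}
```

Calling vnic_create() and then vnic_destroy() with demo_ops records the sequence alloc, register, init, clean, unregister, free. The representor lifetime therefore nests strictly inside the registered lifetime of the lower netdev.
-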
Saeed Mahameed says:
====================
mlx5-updates-2017-09-03

This series from Tariq includes micro data path optimizations for the mlx5e
netdevice driver.

Mainly, Tariq introduces the following changes to the NAPI and RX handling
path of the driver:
- RX ring structure reorganizing
- Trivial code refactoring and optimization
- NAPI busy-poll for when fast UMR is in progress
- Non-atomic state operations in NAPI context
- Remove unnecessary fields from fast path structures
- page-cache micro optimization
- Rely on NAPI to avoid missing an IRQ for RX/TX shared NAPI contexts
- Stop NAPI when irq changes affinity
- Distribute RSS table among all RX rings
====================

Signed-off-by: David S. Miller
-
Jiri Pirko says:
====================
mlxsw: Offloading GRE tunnels

Petr says:
This patch series adds support to the mlxsw driver for offloading
IP-in-IP tunnels in general, and for (a subset of) GRE in particular.

This patchset supports two ways of configuring GRE:

- So-called "hierarchical configuration", where the GRE device has a bound
dummy device, which is in a different VRF. The VRF with host traffic is
called "overlay", the one with encapsulated traffic is called "underlay".

- So-called "flat configuration", where the GRE device doesn't have a bound
device, and overlay and underlay are both in the same VRF (possibly the
default one).

Two routes are then of interest: a route that directs traffic to a GRE
device (which would typically be in the overlay VRF, but could be in another
one), and a local route for the tunnel's local address (in the underlay).
Handling of these two route types is then introduced as patches to support,
respectively, IPv4 and IPv6 encapsulation and IPv4 decapsulation.

The encap and decap routes then reference a loopback device, a new type of
RIF introduced by this patchset for the specific use of offloading tunnels.

The encap and decap code is abstract with respect to the particulars of
individual L3 tunnel types. This patchset introduces support for GRE
tunnels in particular.

Limitations:
- Each tunnel needs to have a different local address (within a given VRF).
When two conflicting tunnels are used, FIB abort is triggered
and the driver ceases offloading FIBs. Full handling of such
configurations needs special setup in the hardware, such that the tunnels
that share an address are dispatched correctly according to their key (or
lack thereof). That's currently not implemented, and to keep things
deterministic, the driver triggers FIB abort.

- A next hop that uses an incompletely-specified tunnel (e.g. those
used for LWT) is not offloaded, but doesn't trigger FIB abort like the
above. If such routes end up being in a de facto conflict with other
tunnels, then if there already is an offload for that address, the
traffic for the conflicting tunnel will end up mismatching the
configuration of the offloaded tunnel, and thus reaches the slow path
through an error trap.

- GRE checksumming and sequence numbers are not supported, and TTL and TOS
need to be set to inherit. Tunnels with a different configuration are not
offloaded, and their traffic is trapped to the slow path.
Note in particular that a TOS of inherit is not the default configuration
and needs to be explicitly specified when the tunnel is created.

- The only case that is not handled gracefully is a change made to the
tunnel, e.g. through "ip tunnel change": such changes are not
reflected in the driver. There is currently no notification mechanism for
these changes. Introduction of this mechanism and its use in the
driver will be the subject of follow-up work. For now this limitation can be
worked around by removing and re-adding the encap route.
---
v1->v2:
- fix order of patch 5
====================

Signed-off-by: David S. Miller
-
This patch introduces callbacks and a tunnel type to offload GRE tunnels.
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
struct mlxsw_sp_rif is a router-private structure, and therefore
everything related to it is as well: parameters, and derived RIF types
including loopbacks. The IPIP module needs access to some details of
loopback interfaces, but exporting the whole RIF shebang would create too
large an interface.

So instead export just the bare minimum necessary: accessors for RIF
index and underlay VRF ID.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
These traps are generated for packets that fail checks for source IP,
encapsulation type, or GRE key. Trap these packets to the CPU for follow-up
handling by the kernel, which will send ICMP destination unreachable
responses.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The local route that points at IPIP's underlay device (decap route) can
be present long before the GRE device. Thus when an encap route is
added, it's necessary to check in the underlay FIB whether the decap route
is already present. If so, the current trap offload needs to be
withdrawn and replaced with a decap offload.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Unlike encapsulation, which is represented by a next hop forwarding to
an IPIP tunnel, decapsulation is a type of local route. It is created
for local routes whose prefix corresponds to the local address of one of
the offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
next hops are removed), the decap offload is migrated back to a trap for
resolution in the slow path.

This patch assumes that the decap route is already present when the encap
route is added. A follow-up patch will fix this issue.

Note that this patch only supports IPv4 underlay. Support for IPv6
underlay will be the subject of follow-up work outside this patchset.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
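The trap/decap transitions described in the two commits above can be modeled with one tunnel and one local route. This is a hypothetical sketch, not the driver's code. The names are illustrative, the IPv4 address is a plain uint32_t, and looking the route up in the underlay FIB is reduced to passing a pointer that may be NULL when no such route exists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of how a local route in the underlay is offloaded. */
enum local_route_action {
	LOCAL_ROUTE_TRAP,               /* packets go to the CPU (slow path) */
	LOCAL_ROUTE_DECAP,              /* hardware decapsulates the tunnel */
};

struct local_route {
	uint32_t addr;                  /* IPv4 address, host byte order */
	enum local_route_action action;
};

struct tunnel {
	uint32_t local_addr;            /* tunnel's local (underlay) address */
	unsigned int encap_nexthops;    /* offloaded encap next hops using it */
};

/* A local route is added: decap only if an offloaded tunnel already uses it. */
static void local_route_added(struct local_route *rt, const struct tunnel *t)
{
	rt->action = (t && t->encap_nexthops && t->local_addr == rt->addr)
		     ? LOCAL_ROUTE_DECAP : LOCAL_ROUTE_TRAP;
}

/* An encap next hop starts using the tunnel: if the matching local route
 * is already present, withdraw its trap and install a decap instead. */
static void tunnel_encap_added(struct tunnel *t, struct local_route *rt)
{
	t->encap_nexthops++;
	if (rt && rt->addr == t->local_addr)
		rt->action = LOCAL_ROUTE_DECAP;
}

/* The last encap next hop is gone: migrate the decap back to a trap. */
static void tunnel_encap_removed(struct tunnel *t, struct local_route *rt)
{
	assert(t->encap_nexthops > 0);
	if (--t->encap_nexthops == 0 && rt && rt->addr == t->local_addr)
		rt->action = LOCAL_ROUTE_TRAP;
}
```

The local route can therefore appear before or after the tunnel is offloaded and still end up in the right state. In both orders it stays a trap whenever no encap next hop uses the tunnel.
-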
Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
offloading of IPv6 overlay encapsulation.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
This introduces some common code for tracking offloaded IP-in-IP
tunnels, and support for offloading IPv4 overlay encapsulating routes in
particular. A follow-up patch will introduce IPv6 overlay as well.

Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
objects hooked up in mlxsw_sp_router. The network device that represents
the tunnel is used as a key to look up the corresponding IPIP entry.
Note that in the future, a more general keying mechanism will be needed,
because parts of the tunnel information can be provided by the route.

IPIP entries are reference counted, because several next hops may end up
using the same tunnel, and we only want to offload it once.

The encapsulation path hooks into next hop handling. Routes that forward to
a tunnel are now considered gateway routes, thus giving them the same
treatment that other remote routes get. An IPIP next hop type is
introduced.

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
known tunnel types, the next hop is not considered an IPIP next hop.

The list of IPIP tunnel types is currently empty; follow-up patches will
add support for GRE. Traffic to IPIP tunnel types that are not
explicitly recognized by the driver is trapped and handled in the slow path.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
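The reference-counted tunnel list described in the commit above can be sketched as follows. This is a single-threaded simplification with hypothetical names (struct ipip_entry, ipip_get(), ipip_put()), not the mlxsw code. The overlay netdevice pointer is the lookup key, taking the first reference stands in for offloading the tunnel, and dropping the last reference unlinks and frees the entry. Locking and hardware programming are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified IPIP entry keyed by the tunnel netdevice. */
struct ipip_entry {
	const void *ol_dev;             /* overlay (tunnel) device, used as key */
	unsigned int ref_count;         /* number of next hops using the tunnel */
	struct ipip_entry *next;
};

struct router {
	struct ipip_entry *ipip_list;
};

static struct ipip_entry *ipip_find(struct router *r, const void *ol_dev)
{
	struct ipip_entry *e;

	for (e = r->ipip_list; e; e = e->next)
		if (e->ol_dev == ol_dev)
			return e;
	return NULL;
}

/* Take a reference, creating (i.e. "offloading") the entry on first use. */
static struct ipip_entry *ipip_get(struct router *r, const void *ol_dev)
{
	struct ipip_entry *e = ipip_find(r, ol_dev);

	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return NULL;
		e->ol_dev = ol_dev;
		e->next = r->ipip_list;
		r->ipip_list = e;
	}
	e->ref_count++;
	return e;
}

/* Drop a reference; the last user unlinks and frees the entry.
 * 'e' must be an entry previously returned by ipip_get() on 'r'. */
static void ipip_put(struct router *r, struct ipip_entry *e)
{
	struct ipip_entry **pp;

	assert(e->ref_count > 0);
	if (--e->ref_count)
		return;
	for (pp = &r->ipip_list; *pp != e; pp = &(*pp)->next)
		;
	*pp = e->next;
	free(e);
}
```

Two next hops that forward through the same tunnel device share one entry, so the tunnel is offloaded once. It is torn down only after both next hops are gone.
-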
In the router, some next hops may reference an encapsulating netdevice,
such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
keep track of whether a given next hop is a regular Ethernet entry, or
an IP-in-IP tunneling entry.

To facilitate this book-keeping, add a type field to struct
mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
variant.

There are several places where next hops are initialized in the IPv4
path. Instead of replicating the logic at every one of them, factor it
out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
is actually protocol-neutral, so put it in mlxsw_sp_nexthop_type_fini(),
but create a corresponding protocol-specific _fini function that dispatches to
the protocol-neutral one.

The IPv6 path is simpler, but for symmetry with IPv4, create the same
suite of functions with corresponding logic.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
IPv6 counterpart of the previous patch: introduce a function to
determine whether a given route is a gateway route.

The new function takes an mlxsw_sp argument, which follow-up patches will
use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
devices need to be considered gateway routes as well. That involves a
bit more logic, so extract the current test to a separate function,
where the logic can be added later.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
When offloading L3 tunnels, an adjacency entry is created that loops the
packet back into the underlay router. Loopback interfaces then hold the
corresponding information and are created for IP-in-IP netdevices.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Loopback RIFs, which will be introduced in a follow-up patch, differ
from other RIFs in that they do not have a FID associated with them.

To support this, demote FID allocation from mlxsw_sp_rif_create to the
configure op of the existing RIF types, and likewise the FID release
from mlxsw_sp_rif_destroy to the deconfigure op.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
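The following is a hypothetical sketch of the refactoring described in the commit above. Generic create/destroy calls the per-type configure/deconfigure ops, and only the types that need a FID allocate and release one, so a loopback type can simply skip that step. The names, the "VLAN-style" RIF type and the single has_fid flag are illustrative only and are not mlxsw's RIF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RIF model: only the FID handling is represented. */
struct rif;

struct rif_ops {
	int  (*configure)(struct rif *rif);
	void (*deconfigure)(struct rif *rif);
};

struct rif {
	const struct rif_ops *ops;
	bool has_fid;
};

/* A FID-backed RIF type: FID allocation now lives in configure... */
static int vlan_rif_configure(struct rif *rif)
{
	rif->has_fid = true;
	return 0;
}

/* ...and the FID release lives in deconfigure. */
static void vlan_rif_deconfigure(struct rif *rif)
{
	rif->has_fid = false;
}

/* Loopback RIFs have no FID, so their ops simply skip that step. */
static int lb_rif_configure(struct rif *rif)
{
	(void)rif;
	return 0;
}

static void lb_rif_deconfigure(struct rif *rif)
{
	(void)rif;
}

static const struct rif_ops vlan_rif_ops = {
	.configure = vlan_rif_configure,
	.deconfigure = vlan_rif_deconfigure,
};

static const struct rif_ops lb_rif_ops = {
	.configure = lb_rif_configure,
	.deconfigure = lb_rif_deconfigure,
};

/* Generic create/destroy no longer touch FIDs directly. */
static int rif_create(struct rif *rif, const struct rif_ops *ops)
{
	assert(rif != NULL && ops != NULL);
	rif->ops = ops;
	rif->has_fid = false;
	return ops->configure(rif);
}

static void rif_destroy(struct rif *rif)
{
	rif->ops->deconfigure(rif);
}
```

Moving the FID step into the per-type ops means that adding the loopback type later requires no special case in the generic create/destroy path.
-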
Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
determine whether a constructed RIF should be a loopback, and to decide
whether a next hop references a tunnel.

The list is currently empty; follow-up patches will add support for GRE.
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The spectrum_ipip module that will be introduced in the follow-up
patches needs to know the data type.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
To support IPIP, the driver needs to be able to construct an IPIP
adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
argument. Adjust the one existing caller.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Unlike other interface types, loopback RIFs do not have a MAC address. So
drop the corresponding argument from mlxsw_reg_ritr_pack() and move it
to a new function. Call that function from the callers of mlxsw_reg_ritr_pack().

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
The RTDP register is used for configuring the tunnel decap properties of
NVE and IPinIP.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type
IP2ME with the tunnel validity bit and tunnel pointer set. The necessary
register fields are already available, so add a function to pack the
RALUE as appropriate.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
This enum is used with reg_ratr_trap_id, so move it next to the register
definition.

While at it, drop the enumerator initializers.
Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
So far, adjacencies have always been of type Ethernet (with a value of 0),
and thus there was no need to explicitly support the RATR type. However, to
support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific
attributes need to be added.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Update the register so that loopback RIFs can be created and loopback
properties specified.

Signed-off-by: Petr Machata
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Antoine Tenart says:
====================
net: mvpp2: improve the mac address retrieval logic

This series aims at fixing the logic behind the MAC address retrieval in the
PPv2 driver. A possible issue is also fixed in patch 3/3, which introduces fallbacks
for when the address given in the device tree isn't valid.

Thanks!
Antoine

Since v2:
- Patch 1/4 from v2 was applied on net (and net was merged into net-next).
- Rebased on net-next.

Since v1:
- Rebased onto net (was on net-next).
====================

Signed-off-by: David S. Miller
-
When using a MAC address described in the device tree, a check is made
to see if it is valid. When it's not, no fallback is defined. This
patch tries to get the MAC address from h/w (or uses a random one if
the h/w one isn't valid) when the DT MAC address isn't valid.

Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller
-
The MAC retrieval logic uses a variable to store a MAC address read
from h/w, and checks this MAC against invalid ones before using it. But
the MAC address is only read from h/w when using PPv2.1, so when using
PPv2.2 the variable stays in its initial state.

This patch fixes the logic to only check whether the h/w MAC is valid when
actually retrieving a MAC from h/w.

Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller
-
The MAC retrieval logic is quite complicated (and broken). Move
it to its own function to prepare for the patches fixing it, so that
reviews are easier.

Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller
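-
Taken together, the three mvpp2 commits above describe a retrieval order:

1. the device tree address, if valid;
2. otherwise the address read from h/w, if valid (only on PPv2.1, where it is actually read);
3. otherwise a random address.

The following is a minimal sketch of that order in C. get_mac() and its parameters are hypothetical and are not the mvpp2 code; a NULL pointer stands for "no address available from this source". The validity rule mirrors the kernel's is_valid_ether_addr(), which rejects multicast and all-zero addresses. The random address is made unicast and locally administered, as eth_random_addr() does. rand() is used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rule as is_valid_ether_addr(): not multicast and not all-zero. */
static bool mac_is_valid(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Random unicast, locally administered address (cf. eth_random_addr()). */
static void mac_random(uint8_t *a)
{
	for (int i = 0; i < ETH_ALEN; i++)
		a[i] = (uint8_t)(rand() & 0xff);
	a[0] &= 0xfe;   /* clear the multicast bit */
	a[0] |= 0x02;   /* set the locally administered bit */
}

enum mac_source { MAC_FROM_DT, MAC_FROM_HW, MAC_RANDOM };

/* Hypothetical helper: hw_mac is NULL when no h/w address was read,
 * e.g. on PPv2.2, so it is only validated when it actually exists. */
static enum mac_source get_mac(const uint8_t *dt_mac, const uint8_t *hw_mac,
			       uint8_t *out)
{
	if (dt_mac && mac_is_valid(dt_mac)) {
		memcpy(out, dt_mac, ETH_ALEN);
		return MAC_FROM_DT;
	}
	if (hw_mac && mac_is_valid(hw_mac)) {
		memcpy(out, hw_mac, ETH_ALEN);
		return MAC_FROM_HW;
	}
	mac_random(out);
	return MAC_RANDOM;
}
```

Passing NULL for hw_mac models the PPv2.2 case from the second commit. The helper then never validates an uninitialized buffer, and it falls straight through to the random address.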