19 Jul, 2018

1 commit

  • The offload_handle should be an opaque data cookie for the driver
    to use, much like the data cookie for a timer or alarm callback.
    Thus, the XFRM stack should not be checking for non-zero, because
    the driver might use that to store an array reference, which could
    be zero, or some other zero but meaningful value.

    We can remove the checks for non-zero because there are plenty of
    other attributes also being checked to see if there is an offload
    in place for the SA in question.
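
    A minimal sketch of the idea, using hypothetical stand-in types
    (demo_dev, demo_offload) rather than the real kernel structures: the
    stack decides whether an offload exists from another attribute (here
    the device pointer), so a driver is free to store 0 in the cookie.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's layout. */
struct demo_dev {
    int id;
};

struct demo_offload {
    struct demo_dev *dev;         /* non-NULL once an offload is in place */
    unsigned long offload_handle; /* opaque driver cookie; 0 is a legal value */
};

/* The stack checks for an offload without looking at the cookie. */
static bool demo_sa_is_offloaded(const struct demo_offload *xso)
{
    return xso->dev != NULL;
}

/* A driver that keeps a table index in the cookie; slot 0 is valid. */
static void demo_driver_add(struct demo_offload *xso, struct demo_dev *dev,
                            unsigned long slot)
{
    xso->dev = dev;
    xso->offload_handle = slot;
}
```

    With this shape, an SA stored in slot 0 is still reported as offloaded,
    which a "handle != 0" test would have missed.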

    Signed-off-by: Shannon Nelson
    Signed-off-by: Steffen Klassert

    Shannon Nelson
     

25 Jun, 2018

1 commit

  • Kristian Evensen says:
    In a project I am involved in, we are running ipsec (Strongswan) on
    different mt7621-based routers. Each router is configured as an
    initiator and has around 30 tunnels to different responders (running
    on misc. devices). Before the flow cache was removed (kernel 4.9), we
    got a combined throughput of around 70Mbit/s for all tunnels on one
    router. However, we recently switched to kernel 4.14 (4.14.48), and
    the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
    drop of around 20%. Reverting the flow cache removal restores, as
    expected, performance levels to that of kernel 4.9.

    When pcpu xdst exists, it has to be validated first before it can be
    used.

    A negative hit thus increases cost vs. no-cache.

    As the number of tunnels increases, the hit rate decreases, so this
    pcpu caching isn't a viable strategy.

    Furthermore, the xdst cache also needs to run with BH off, so when
    removing this the bh disable/enable pairs can be removed too.

    Kristian tested a 4.14.y backport of this change and reported
    increased performance:

    In our tests, the throughput reduction has been reduced from around -20%
    to -5%. We also see that the overall throughput is independent of the
    number of tunnels, while before the throughput was reduced as the number
    of tunnels increased.

    Reported-by: Kristian Evensen
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

23 Jun, 2018

1 commit

  • We already support setting an output mark at the xfrm_state.
    Unfortunately, this does not support the input direction or
    masking the marks that will be applied to the skb. This change
    adds support for applying a masked value in both directions.

    The existing XFRMA_OUTPUT_MARK number is reused for this purpose
    and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

    An additional XFRMA_SET_MARK_MASK attribute is added for setting the
    mask. If the mask attribute is not provided, it is set to 0xffffffff,
    keeping the existing 'full mask' semantics of XFRMA_OUTPUT_MARK.
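
    As a rough illustration of what applying a masked value means, the
    sketch below replaces only the bits selected by the mask. The names
    demo_mark, demo_mark_init and demo_apply_mark are hypothetical, and the
    has_mask flag is an assumed way to model a missing mask attribute.

```c
#include <stdint.h>

/* Hypothetical value/mask pair; field names are illustrative only. */
struct demo_mark {
    uint32_t v; /* value to apply */
    uint32_t m; /* bits of the packet mark that may be replaced */
};

/* has_mask == 0 models a request that carries no mask attribute. */
static struct demo_mark demo_mark_init(uint32_t value, int has_mask,
                                       uint32_t mask)
{
    struct demo_mark mk;

    mk.m = has_mask ? mask : 0xffffffffu; /* default: full mask */
    mk.v = value & mk.m;                  /* drop bits outside the mask */
    return mk;
}

/* Keep the unmasked bits of the packet mark, replace the masked ones. */
static uint32_t demo_apply_mark(uint32_t pkt_mark, struct demo_mark mk)
{
    return (pkt_mark & ~mk.m) | (mk.v & mk.m);
}
```

    With the default full mask the whole packet mark is overwritten, which
    matches the old output-mark behaviour; with a mask of 0xff only the low
    byte changes.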

    Co-developed-by: Tobias Brunner
    Co-developed-by: Eyal Birger
    Co-developed-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert
    Signed-off-by: Tobias Brunner
    Signed-off-by: Eyal Birger
    Signed-off-by: Lorenzo Colitti

    Steffen Klassert
     

30 Mar, 2018

1 commit

  • Currently, the driver registers its netdevice notifier from the
    pernet_operations::init method, which breaks modularity, because
    initialization of a net namespace and registration of netdevice
    notifiers are orthogonal actions. We don't have per-namespace
    netdevice notifiers; all of them are global for all devices in all
    namespaces.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

05 Mar, 2018

1 commit

  • If you take a GSO skb, and split it into packets, will the network
    length (L3 headers + L4 headers + payload) of those packets be small
    enough to fit within a given MTU?

    skb_gso_validate_mtu gives you the answer to that question. However,
    we recently added a way to validate the MAC length of a split GSO
    skb (L2+L3+L4+payload), and the names get confusing, so rename
    skb_gso_validate_mtu to skb_gso_validate_network_len.
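
    The difference between the two lengths can be sketched as follows.
    This is a simplified model under assumed names (demo_gso and the
    demo_* helpers are hypothetical), not the kernel implementation, which
    has to handle more cases than a fixed per-segment payload size.

```c
#include <stdbool.h>

/* Hypothetical summary of a GSO skb: header sizes plus segment payload. */
struct demo_gso {
    unsigned int mac_hdr_len;   /* L2 header bytes */
    unsigned int net_hdr_len;   /* L3 header bytes */
    unsigned int trans_hdr_len; /* L4 header bytes */
    unsigned int gso_size;      /* payload bytes per segment */
};

/* Network length of one segment: L3 + L4 + payload. */
static unsigned int demo_network_seglen(const struct demo_gso *g)
{
    return g->net_hdr_len + g->trans_hdr_len + g->gso_size;
}

/* MAC length of one segment: L2 + L3 + L4 + payload. */
static unsigned int demo_mac_seglen(const struct demo_gso *g)
{
    return g->mac_hdr_len + demo_network_seglen(g);
}

/* Would each segment fit within the given MTU? */
static bool demo_validate_network_len(const struct demo_gso *g,
                                      unsigned int mtu)
{
    return demo_network_seglen(g) <= mtu;
}

/* Would each segment, including its L2 header, fit within len? */
static bool demo_validate_mac_len(const struct demo_gso *g, unsigned int len)
{
    return demo_mac_seglen(g) <= len;
}
```

    For an Ethernet/IPv4/TCP packet with 14 + 20 + 20 header bytes and a
    1460-byte segment payload, the network length is 1500 and the MAC
    length is 1514, so the two checks can disagree for the same limit.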

    Signed-off-by: Daniel Axtens
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Daniel Axtens
     

26 Jan, 2018

1 commit


25 Jan, 2018

1 commit


19 Jan, 2018

1 commit


18 Jan, 2018

1 commit


21 Dec, 2017

1 commit

  • This adds a check for the required add and delete functions up front
    at registration time to be sure both are defined.

    Since both the features check and the registration check are looking
    at the same things, break out the check for both to call.

    Lastly, for some reason the feature check was setting xfrmdev_ops to
    NULL if the NETIF_F_HW_ESP bit was missing, which would probably
    surprise the driver later if the driver turned its NETIF_F_HW_ESP bit
    back on. We shouldn't be messing with the driver's callback list, so
    we stop doing that with this patch.
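
    One way to picture the shared check, as a sketch with hypothetical
    types (demo_netdev, demo_xfrmdev_ops) and a made-up feature bit: a
    device that advertises ESP offload must provide both callbacks, and
    the check only reads the device, it never clears the callback list.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_F_HW_ESP 0x1u /* hypothetical feature bit */

struct demo_xfrmdev_ops {
    int  (*state_add)(void *sa);
    void (*state_delete)(void *sa);
};

struct demo_netdev {
    unsigned int features;
    const struct demo_xfrmdev_ops *ops;
};

/* Shared by the registration path and the feature-change path. */
static bool demo_xfrm_api_ok(const struct demo_netdev *dev)
{
    if (!(dev->features & DEMO_F_HW_ESP))
        return true; /* no offload advertised, nothing to verify */

    return dev->ops && dev->ops->state_add && dev->ops->state_delete;
}

/* Callbacks of an imaginary driver, used only to exercise the check. */
static int demo_add(void *sa)
{
    (void)sa;
    return 0;
}

static void demo_delete(void *sa)
{
    (void)sa;
}
```

    Because the device is passed as const, turning the feature bit off and
    back on later leaves the driver's callbacks exactly as it set them.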

    Signed-off-by: Shannon Nelson
    Signed-off-by: Steffen Klassert

    Shannon Nelson
     

20 Dec, 2017

3 commits


16 Dec, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-12-15

    1) Currently we can add or update socket policies, but
    not clear them. Support clearing of socket policies
    too. From Lorenzo Colitti.

    2) Add documentation for the xfrm device offload api.
    From Shannon Nelson.

    3) Fix IPsec extended sequence numbers (ESN) for
    IPsec offloading. From Yossef Efraim.

    4) xfrm_dev_state_add function returns success even for
    unsupported options, fix this to fail in such cases.
    From Yossef Efraim.

    5) Remove a redundant xfrm_state assignment.
    From Aviv Heller.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Dec, 2017

1 commit

  • The xfrm_dev_state_add function returns success for unsupported HW SA
    options. As a result, the calling function creates a SW SA without a
    correlating HW SA, even though the IPsec device offloading option was
    chosen.
    These unsupported HW SA options are hard coded within the
    xfrm_dev_state_add function.
    SW backward compatibility will break if we add any of these options, as
    old HW will fail with new SW.

    This patch changes the behaviour to return -EINVAL when an unsupported
    option is chosen. This notifies the user application of the failure and
    avoids breaking backward compatibility for newly added HW SA options.
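
    The change in behaviour can be sketched like this. The option bits and
    the function name demo_dev_state_add are hypothetical; the point is
    only that a request containing a hard-coded unsupported option now
    fails with -EINVAL instead of silently falling back to a SW-only SA.

```c
#include <errno.h>

/* Hypothetical option bits, for illustration only. */
#define DEMO_OPT_ESN    0x1u
#define DEMO_OPT_ENCAP  0x2u
#define DEMO_OPT_TFCPAD 0x4u

/* Options this imaginary offload path cannot handle (hard coded). */
#define DEMO_UNSUPPORTED (DEMO_OPT_ENCAP | DEMO_OPT_TFCPAD)

/* Returns 0 when the SA can be offloaded, -EINVAL otherwise. */
static int demo_dev_state_add(unsigned int requested_opts)
{
    if (requested_opts & DEMO_UNSUPPORTED)
        return -EINVAL; /* tell the caller instead of reporting success */

    return 0;
}
```

    A caller that sees -EINVAL can report the failure to the user
    application rather than leaving it with an SA that was never offloaded.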

    Signed-off-by: Yossef Efraim
    Signed-off-by: Steffen Klassert

    Yossef Efraim
     

30 Nov, 2017

2 commits

  • The first member of an IPSEC route bundle chain sets its dst->path to
    the underlying ipv4/ipv6 route that carries the bundle.

    Stated another way, if one were to follow the xfrm_dst->child chain of
    the bundle, the final non-NULL pointer would be the path and point to
    either an ipv4 or an ipv6 route.

    This is largely used to make sure that PMTU events propagate down to
    the correct ipv4 or ipv6 route.

    When we don't have the top of an IPSEC bundle, 'dst->path == dst'.

    Move it down into xfrm_dst and key off of dst->xfrm.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David Miller
     
  • XFRM bundle child chains look like this:

    xdst1 --> xdst2 --> xdst3 --> path_dst

    All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
    The final child pointer in the chain, here called 'path_dst', is some
    other kind of route such as an ipv4 or ipv6 one.

    The xfrm output path pops routes, one at a time, via the child
    pointer, until we hit one which has a dst->xfrm pointer which
    is NULL.

    We can easily preserve the above mechanisms with child sitting
    only in the xfrm_dst structure. All children in the chain
    before we break out of the xfrm_output() loop have dst->xfrm
    non-NULL and are therefore xfrm_dst objects.

    Since we break out of the loop when we find dst->xfrm NULL, we
    will not try to dereference 'dst' as if it were an xfrm_dst.
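
    The walk described above can be sketched with simplified, hypothetical
    types (demo_dst, demo_xfrm_dst). The child pointer lives only in the
    outer structure, and it is read only while dst->xfrm is non-NULL, so
    the path route is never treated as an xfrm_dst.

```c
#include <stddef.h>

struct demo_state {
    int id;
};

/* Generic route: xfrm is non-NULL only for members of a bundle. */
struct demo_dst {
    struct demo_state *xfrm;
};

/* Bundle member: embeds the generic route and owns the child pointer. */
struct demo_xfrm_dst {
    struct demo_dst dst;    /* must stay first so the cast below is valid */
    struct demo_dst *child;
};

/* Only legal when dst->xfrm is non-NULL, i.e. dst is a demo_xfrm_dst. */
static struct demo_dst *demo_child(struct demo_dst *dst)
{
    return ((struct demo_xfrm_dst *)dst)->child;
}

/* Pop one route per transform; returns the path dst at the chain's end. */
static struct demo_dst *demo_walk(struct demo_dst *dst, int *transforms)
{
    *transforms = 0;
    while (dst->xfrm) {
        (*transforms)++;
        dst = demo_child(dst);
    }
    return dst; /* xfrm is NULL here, so no cast is attempted */
}
```

    For the chain xdst1 --> xdst2 --> xdst3 --> path_dst, the loop runs
    three times and stops on path_dst without dereferencing a child field
    that path_dst does not have.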

    Signed-off-by: David S. Miller

    David Miller
     

11 Sep, 2017

1 commit


11 Aug, 2017

1 commit

  • On systems that use mark-based routing it may be necessary for
    routing lookups to use marks in order for packets to be routed
    correctly. An example of such a system is Android, which uses
    socket marks to route packets via different networks.

    Currently, routing lookups in tunnel mode always use a mark of
    zero, making routing incorrect on such systems.

    This patch adds a new output_mark element to the xfrm state and
    a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
    mark differs from the existing xfrm mark in two ways:

    1. The xfrm mark is used to match xfrm policies and states, while
    the xfrm output mark is used to set the mark (and influence
    the routing) of the packets emitted by those states.
    2. The existing mark is constrained to be a subset of the bits of
    the originating socket or transformed packet, but the output
    mark is arbitrary and depends only on the state.

    The use of a separate mark provides additional flexibility. For
    example:

    - A packet subject to two transforms (e.g., transport mode inside
    tunnel mode) can have two different output marks applied to it,
    one for the transport mode SA and one for the tunnel mode SA.
    - On a system where socket marks determine routing, the packets
    emitted by an IPsec tunnel can be routed based on a mark that
    is determined by the tunnel, not by the marks of the
    unencrypted packets.
    - Support for setting the output marks can be introduced without
    breaking any existing setups that employ both mark-based
    routing and xfrm tunnel mode. Simply changing the code to use
    the xfrm mark for routing output packets could change
    behaviour in a way that breaks these setups.

    If the output mark is unspecified or set to zero, the mark is not
    set or changed.
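
    A tiny sketch of that last rule, with a hypothetical helper name: a
    zero output mark leaves the packet mark alone, while a non-zero one
    replaces it, so stacked transforms can each stamp their own mark.

```c
#include <stdint.h>

/* Mark carried by the packet after one transform has been applied. */
static uint32_t demo_mark_after_transform(uint32_t pkt_mark,
                                          uint32_t output_mark)
{
    /* Zero means "unspecified": keep whatever mark the packet had. */
    return output_mark ? output_mark : pkt_mark;
}
```

    Calling the helper twice models transport mode inside tunnel mode: the
    mark seen by the outer routing lookup is the tunnel SA's output mark,
    unless that one is zero.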

    Tested: make allyesconfig; make -j64
    Tested: https://android-review.googlesource.com/452776
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert

    Lorenzo Colitti
     

02 Aug, 2017

1 commit

  • IPSec crypto offload depends on the protocol-specific
    offload module (such as esp_offload.ko).

    When the user installs an SA with crypto-offload, load
    the offload module automatically, in the same way
    that the protocol module is loaded (such as esp.ko).

    Signed-off-by: Ilan Tayari
    Signed-off-by: Steffen Klassert

    Ilan Tayari
     

19 Jul, 2017

2 commits

  • Retain the last used xfrm_dst in a pcpu cache.
    On the next request, reuse this dst if the policies are the same.

    The cache will not help with strict RR workloads as there is no hit.

    The cache packet-path part is reasonably small. The notifier part is
    needed so we do not add long hangs when a device is dismantled but some
    pcpu xdst still holds a reference. There are also calls to the flush
    operation when userspace deletes SAs so modules can be removed.

    We need to run the dst_release on the correct cpu to avoid races with
    packet path. This is done by adding a work_struct for each cpu and then
    doing the actual test/release on each affected cpu via schedule_work_on().
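
    The lookup side of such a cache can be sketched as a single slot that
    remembers the last bundle and the policies it was built for. All names
    here (demo_cache, demo_policy, demo_bundle) are hypothetical, one slot
    stands in for the per-cpu slots, and reference counting, validation
    and flushing are left out.

```c
#include <stddef.h>

#define DEMO_MAX_POLS 2

struct demo_policy {
    int id;
};

struct demo_bundle {
    int id;
};

/* One slot per CPU in the design described above; one slot modelled. */
struct demo_cache {
    const struct demo_policy *pols[DEMO_MAX_POLS];
    int num_pols;
    struct demo_bundle *last;
};

/* Hit only when the request uses exactly the cached policies. */
static struct demo_bundle *demo_cache_lookup(const struct demo_cache *c,
                                             const struct demo_policy *const *pols,
                                             int num_pols)
{
    int i;

    if (!c->last || c->num_pols != num_pols)
        return NULL;
    for (i = 0; i < num_pols; i++)
        if (c->pols[i] != pols[i])
            return NULL;
    return c->last;
}

/* Remember the most recently used bundle, replacing the previous one. */
static void demo_cache_store(struct demo_cache *c,
                             const struct demo_policy *const *pols,
                             int num_pols, struct demo_bundle *b)
{
    int i;

    if (num_pols > DEMO_MAX_POLS)
        return; /* too many policies to remember; skip caching */
    for (i = 0; i < num_pols; i++)
        c->pols[i] = pols[i];
    c->num_pols = num_pols;
    c->last = b;
}
```

    Alternating between two policies evicts the slot on every request,
    which is the strict round-robin case where the cache never hits.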

    Test results using 4 network namespaces and null encryption:

    ns1        ns2           -> ns3           -> ns4
    netperf -> xfrm/null enc -> xfrm/null dec -> netserver

    what             TCP_STREAM   UDP_STREAM   UDP_RR
    Flow cache:        14644.61       294.35   327231.64
    No flow cache:     14349.81       242.64   202301.72
    Pcpu cache:        14629.70       292.21   205595.22

    UDP tests used 64-byte packets; tests ran for one minute each, and
    each value is the average over ten iterations.

    'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
    series but without this patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • After rcu conversions performance degradation in forward tests isn't that
    noticeable anymore.

    See next patch for some numbers.

    A followup patch could then also remove genid from the policies
    as we do not cache bundles anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

01 Jul, 2017

1 commit


24 Jun, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-06-23

    1) Use memdup_user to simplify xfrm_user_policy.
    From Geliang Tang.

    2) Make xfrm_dev_register static to silence a sparse warning.
    From Wei Yongjun.

    3) Use crypto_memneq to check the ICV in the AH protocol.
    From Sabrina Dubroca.

    4) Remove some unused variables in esp6.
    From Stephen Hemminger.

    5) Extend XFRM MIGRATE to allow changing the UDP encapsulation port.
    From Antony Antony.

    6) Include the UDP encapsulation port in km_migrate announcements.
    From Antony Antony.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Jun, 2017

1 commit

  • In commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") we
    made xfrm_device.o compile only when CONFIG_XFRM_OFFLOAD is enabled.
    But this leaves xfrm_dev_event() missing if we only enable the default
    XFRM options.

    Then, if we set down and unregister an interface with IPsec on it, there
    will be no xfrm_garbage_collect(), which causes the dev usage count to be
    held and produces an error like:

    unregister_netdevice: waiting for to become free. Usage count = 4

    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Hangbin Liu
    Signed-off-by: Steffen Klassert

    Hangbin Liu
     

19 May, 2017

1 commit


08 May, 2017

1 commit

  • Upon NETDEV_DOWN event, all xfrm_state objects which are bound to
    the device are flushed.

    The condition for this is wrong, though, testing dev->hw_features
    instead of dev->features. If a device has non-user-modifiable
    NETIF_F_HW_ESP, then its xfrm_state objects are not flushed,
    causing a crash later on after the device is deleted.

    Check dev->features instead of dev->hw_features.
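
    The difference between the two tests can be sketched as follows, using
    a hypothetical device structure and feature bit. In the kernel,
    features holds the currently active features while hw_features holds
    the ones the user may toggle, so an always-on offload bit appears only
    in features.

```c
#include <stdbool.h>

#define DEMO_F_HW_ESP 0x1u /* hypothetical feature bit */

struct demo_netdev {
    unsigned int features;    /* currently active features */
    unsigned int hw_features; /* features the user is allowed to toggle */
};

/* Old condition: misses devices whose ESP offload cannot be toggled. */
static bool demo_should_flush_old(const struct demo_netdev *dev)
{
    return (dev->hw_features & DEMO_F_HW_ESP) != 0;
}

/* Fixed condition: flush whenever ESP offload is active on the device. */
static bool demo_should_flush_fixed(const struct demo_netdev *dev)
{
    return (dev->features & DEMO_F_HW_ESP) != 0;
}
```

    For a device with the bit set in features but not in hw_features, only
    the fixed condition triggers the flush on NETDEV_DOWN.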

    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Ilan Tayari
    Signed-off-by: Steffen Klassert

    Ilan Tayari
     

14 Apr, 2017

3 commits

  • When we do IPsec offloading, we need a fallback for
    packets that were targeted to be IPsec offloaded but
    rerouted to a device that does not support IPsec offload.
    For that we add a function that checks the offloading
    features of the sending device and flags the
    requirement of a fallback before it calls the IPsec
    output function. The IPsec output function adds the IPsec
    trailer and does encryption if needed.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • This patch adds all the bits that are needed to do
    IPsec hardware offload for IPsec states and ESP packets.
    We add xfrmdev_ops to the net_device. xfrmdev_ops has
    function pointers that are needed to manage the xfrm
    states in the hardware and to do a per packet
    offloading decision.

    Joint work with:
    Ilan Tayari
    Guy Shapiro
    Yossi Kuperman

    Signed-off-by: Guy Shapiro
    Signed-off-by: Ilan Tayari
    Signed-off-by: Yossi Kuperman
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • This is needed for the upcoming IPsec device offloading.

    Signed-off-by: Steffen Klassert

    Steffen Klassert