03 Mar, 2018
1 commit
-
[ Upstream commit acf568ee859f098279eadf551612f103afdacb4e ]
This is an old bugbear of mine:
https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html
By crafting special packets, it is possible to cause recursion
in our kernel when processing transport-mode packets at levels
that are only limited by packet size.
The easiest one is with DNAT, but an even worse one is where
UDP encapsulation is used in which case you just have to insert
a UDP encapsulation header in between each level of recursion.
This patch avoids this problem by reinjecting transport-mode packets
through a tasklet.
Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks")
Signed-off-by: Herbert Xu
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
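The idea of replacing nested processing with deferred reinjection can be sketched in userspace C. This is only an illustration, not the kernel code: the queue stands in for the tasklet, and every name here (pkt_sketch, reinject_queue, reinject_deferred, input_one_layer, run_reinject_queue) is hypothetical. Each layer of a nested packet is handled by a fresh, shallow call from the drain loop, so the call depth stays constant however many layers the packet carries.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical packet: only counts the nested transport-mode layers left. */
struct pkt_sketch {
	int layers;
};

/* Hypothetical stand-in for the per-CPU queue a tasklet would drain. */
struct reinject_queue {
	struct pkt_sketch slots[QUEUE_CAP];
	size_t head, tail;
	int depth, max_depth;	/* call depth bookkeeping for the demo */
};

/* Instead of calling the input path directly, remember the packet. */
static void reinject_deferred(struct reinject_queue *q, struct pkt_sketch p)
{
	assert(q->tail - q->head < QUEUE_CAP);
	q->slots[q->tail++ % QUEUE_CAP] = p;
}

/* Handle one layer; an inner layer is queued, never processed by nesting. */
static void input_one_layer(struct reinject_queue *q, struct pkt_sketch p)
{
	q->depth++;
	if (q->depth > q->max_depth)
		q->max_depth = q->depth;
	if (p.layers > 1) {
		p.layers--;
		reinject_deferred(q, p);
	}
	q->depth--;
}

/* Stand-in for the tasklet body: drain the queue iteratively. */
static int run_reinject_queue(struct reinject_queue *q)
{
	int processed = 0;

	while (q->head != q->tail) {
		struct pkt_sketch p = q->slots[q->head++ % QUEUE_CAP];

		input_one_layer(q, p);
		processed++;
	}
	return processed;
}
```

With a direct call in place of reinject_deferred(), the depth would grow with the number of layers, which is the situation the commit message describes.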
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
31 Aug, 2017
1 commit
-
In conjunction with crypto offload [1], removing the ESP trailer by
hardware can potentially improve the performance by avoiding (1) a
cache miss incurred by reading the nexthdr field and (2) the necessity
to calculate the csum value of the trailer in order to keep skb->csum
valid.
This patch introduces the changes to the xfrm stack and merely serves
as an infrastructure. A subsequent patch to the mlx5 driver will put
this to good use.
[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html
Signed-off-by: Yossi Kuperman
Signed-off-by: Steffen Klassert
11 Aug, 2017
1 commit
-
On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.
Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.
This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:
1. The xfrm mark is used to match xfrm policies and states, while
the xfrm output mark is used to set the mark (and influence
the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
the originating socket or transformed packet, but the output
mark is arbitrary and depends only on the state.
The use of a separate mark provides additional flexibility. For
example:
- A packet subject to two transforms (e.g., transport mode inside
tunnel mode) can have two different output marks applied to it,
one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
emitted by an IPsec tunnel can be routed based on a mark that
is determined by the tunnel, not by the marks of the
unencrypted packets.
- Support for setting the output marks can be introduced without
breaking any existing setups that employ both mark-based
routing and xfrm tunnel mode. Simply changing the code to use
the xfrm mark for routing output packets could change
behaviour in a way that breaks these setups.
If the output mark is unspecified or set to zero, the mark is not
set or changed.
Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti
Signed-off-by: Steffen Klassert
02 Aug, 2017
2 commits
-
This patch allows local sockets to make use of XFRM GSO code path.
Signed-off-by: Steffen Klassert
Signed-off-by: Ilan Tayari
-
IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).
When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)
Signed-off-by: Ilan Tayari
Signed-off-by: Steffen Klassert
19 Jul, 2017
2 commits
-
Retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.
The cache will not help with strict RR workloads as there is no hit.
The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.
We need to run the dst_release on the correct cpu to avoid races with
packet path. This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().
Test results using 4 network namespaces and null encryption:
ns1        ns2              -> ns3              -> ns4
netperf -> xfrm/null enc    -> xfrm/null dec    -> netserver
what            TCP_STREAM  UDP_STREAM  UDP_RR
Flow cache:     14644.61    294.35      327231.64
No flow cache:  14349.81    242.64      202301.72
Pcpu cache:     14629.70    292.21      205595.22
UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.
'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
-
After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.
See next patch for some numbers.
A followup patch could then also remove genid from the policies
as we do not cache bundles anymore.
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller
05 Jul, 2017
3 commits
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
-
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
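Why a dedicated reference counter type helps can be shown with a small saturating counter built on C11 atomics. This is a simplified userspace sketch and not the kernel's refcount_t: the names ref_sketch, ref_init, ref_inc and ref_dec_and_test are hypothetical, and the kernel's warnings are omitted. A plain atomic counter would wrap from UINT_MAX to 0 and let the object be freed while still referenced; here the counter sticks at UINT_MAX instead, so the object leaks rather than being used after free.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct ref_sketch {
	atomic_uint refs;
};

static void ref_init(struct ref_sketch *r, unsigned int initial)
{
	assert(initial > 0);
	atomic_init(&r->refs, initial);
}

/* Take a reference; never wraps and never revives a released object. */
static void ref_inc(struct ref_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return;	/* released, or saturated: leave as is */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Drop a reference; returns true only when the last one is dropped. */
static bool ref_dec_and_test(struct ref_sketch *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* underflow or saturated */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```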
01 Jul, 2017
1 commit
-
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.
Signed-off-by: David S. Miller
24 Jun, 2017
1 commit
-
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-06-23
1) Use memdup_user to simplify xfrm_user_policy.
From Geliang Tang.
2) Make xfrm_dev_register static to silence a sparse warning.
From Wei Yongjun.
3) Use crypto_memneq to check the ICV in the AH protocol.
From Sabrina Dubroca.
4) Remove some unused variables in esp6.
From Stephen Hemminger.
5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
From Antony Antony.
6) Include the UDP encapsulation port to km_migrate announcements.
From Antony Antony.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller
07 Jun, 2017
3 commits
-
Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
to userland. Only add if XFRMA_ENCAP was in user migrate request.
Signed-off-by: Antony Antony
Reviewed-by: Richard Guy Briggs
Signed-off-by: Steffen Klassert
-
Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
netlink attribute XFRMA_ENCAP.
The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8)
could go to sleep for a few minutes and wake up. When a device wakes up,
the NAT mapping could have expired, and the device sends a MOBIKE
UPDATE_SA message to migrate the IPsec SA. The change could be a change
of UDP encapsulation port, IP address, or both.
Reported-by: Paul Wouters
Signed-off-by: Antony Antony
Reviewed-by: Richard Guy Briggs
Signed-off-by: Steffen Klassert
-
In commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") we
made xfrm_device.o compile only when CONFIG_XFRM_OFFLOAD is enabled.
But this makes xfrm_dev_event() go missing if we only enable the default
XFRM options.
Then if we set down and unregister an interface with IPsec on it, there
will be no xfrm_garbage_collect(), which will cause the dev usage count
to be held and an error like:
unregister_netdevice: waiting for to become free. Usage count = 4
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Hangbin Liu
Signed-off-by: Steffen Klassert
04 May, 2017
1 commit
-
When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
that dst. Unfortunately, the code that allocates and fills this copy
doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
passed. In multiple code paths (from raw_sendmsg, from TCP when
replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
passed to xfrm is actually an on-stack flowi4, so we end up reading
stuff from the stack past the end of the flowi4 struct.
Since xfrm_dst->origin isn't used anywhere following commit
ca116922afa8 ("xfrm: Eliminate "fl" and "pol" args to
xfrm_bundle_ok()."), just get rid of it. xfrm_dst->partner isn't used
either, so get rid of that too.
Fixes: 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces.")
Signed-off-by: Sabrina Dubroca
Signed-off-by: Steffen Klassert
14 Apr, 2017
5 commits
-
When we do IPsec offloading, we need a fallback for
packets that were targeted to be IPsec offloaded but
rerouted to a device that does not support IPsec offload.
For that we add a function that checks the offloading
features of the sending device and flags the
requirement of a fallback before it calls the IPsec
output function. The IPsec output function adds the IPsec
trailer and does encryption if needed.
Signed-off-by: Steffen Klassert
-
This patch adds all the bits that are needed to do
IPsec hardware offload for IPsec states and ESP packets.
We add xfrmdev_ops to the net_device. xfrmdev_ops has
function pointers that are needed to manage the xfrm
states in the hardware and to do a per packet
offloading decision.
Joint work with:
Ilan Tayari
Guy Shapiro
Yossi Kuperman
Signed-off-by: Guy Shapiro
Signed-off-by: Ilan Tayari
Signed-off-by: Yossi Kuperman
Signed-off-by: Steffen Klassert
-
This patch adds a gso_segment and xmit callback for the
xfrm_mode and implements these functions for tunnel and
transport mode.
Signed-off-by: Steffen Klassert
-
This is needed for the upcoming IPsec device offloading.
Signed-off-by: Steffen Klassert
-
We add a struct xfrm_type_offload so that we have the offloaded
codepath separated from the non-offloaded codepath. With this the
non-offloaded and the offloaded codepath can coexist.
Signed-off-by: Steffen Klassert
27 Mar, 2017
1 commit
-
Current addr4_match() code has special test for /0 prefixes because of
standard required undefined behaviour. However, it is possible to omit
it on 64-bit because shifting can be done within a 64-bit register and
then truncated to the expected value (which is 0 mask).
Implicit truncation by htonl() fits nicely into R32-within-R64 model
on x86-64.
Space savings: none (coincidence)
Branch savings: 1
Before:
movzx eax,BYTE PTR [rdi+0x2a] # ->prefixlen_d
test al,al
jne xfrm_selector_match + 0x23f
...
movzx eax,BYTE PTR [rbx+0x2b] # ->prefixlen_s
test al,al
je xfrm_selector_match + 0x1c7
After (no branches):
mov r8d,0x20
mov rdx,0xffffffffffffffff
mov esi,DWORD PTR [rsi+0x2c]
mov ecx,r8d
sub cl,BYTE PTR [rdi+0x2a]
xor esi,DWORD PTR [rbx]
mov rdi,rdx
xor eax,eax
shl rdi,cl
bswap edi
Signed-off-by: Alexey Dobriyan
Signed-off-by: Steffen Klassert
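The branch-free prefix match described above can be sketched in portable userspace C. This is an illustration under stated assumptions, not the kernel's addr4_match(): the name addr4_match_sketch is hypothetical, addresses are taken in network byte order, and an explicit 64-bit type replaces the kernel's reliance on a 64-bit long. Shifting a 32-bit value by 32 is undefined in C, but shifting a 64-bit all-ones value by 32 is well defined, and truncating the result to 32 bits yields the all-zero mask that a /0 prefix needs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compare two IPv4 addresses (network byte order) under a prefix length.
 * The mask is computed in 64 bits and truncated, so prefixlen == 0 needs
 * no special case: (~0 << 32) truncated to 32 bits is 0.
 */
static bool addr4_match_sketch(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	uint64_t wide;
	uint32_t mask;

	assert(prefixlen <= 32);
	wide = ~(uint64_t)0 << (32 - prefixlen);
	mask = htonl((uint32_t)wide);
	return ((a1 ^ a2) & mask) == 0;
}
```

For example, 192.168.0.1 and 192.168.0.255 match under /24, while any two addresses match under /0.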
24 Mar, 2017
2 commits
-
x86_64 is zero-extending arch so "unsigned int" is preferred over "int"
for address calculations and extending to size_t.
Space savings:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-24 (-24)
function               old   new   delta
xfrm_state_walk        708   696     -12
xfrm_selector_match    918   906     -12
Signed-off-by: Alexey Dobriyan
Signed-off-by: Steffen Klassert
-
Signed-off-by: Alexey Dobriyan
Signed-off-by: Steffen Klassert
15 Feb, 2017
4 commits
-
This patch adds GRO infrastructure and callbacks for ESP on
ipv4 and ipv6.
In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected into layer 2.
Signed-off-by: Steffen Klassert
-
We need to keep per packet offloading information across
the layers. So we extend the sec_path to carry this for
the input and output offload codepath.
Signed-off-by: Steffen Klassert
-
We need it in the ESP offload handlers, so export it.
Signed-off-by: Steffen Klassert
-
Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.
Signed-off-by: Steffen Klassert
09 Feb, 2017
4 commits
-
Only needed it to register the policy backend at init time.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
-
Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows to
constify afinfo later on.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
-
Nothing checks the return value. Also, the errors returned on unregister
are impossible (we only support INET and INET6, so no way
xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo'
itself).
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
-
Nothing writes to these structures (the module owner was not used).
While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
17 Jan, 2017
1 commit
-
This patch tries to avoid skb_cow_data on esp4.
On the encrypt side we add the IPsec tailbits
to the linear part of the buffer if there is
space on it. If there is no space on the linear
part, we add a page fragment with the tailbits to
the buffer and use separate src and dst scatterlists.
On the decrypt side, we leave the buffer as it is
if it is not cloned.
With this, we can avoid a linearization of the buffer
in most of the cases.
Joint work with:
Sowmini Varadhan
Ilan Tayari
Signed-off-by: Sowmini Varadhan
Signed-off-by: Ilan Tayari
Signed-off-by: Steffen Klassert
10 Jan, 2017
2 commits
-
xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.
While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.
A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
-
commit 44abdc3047aecafc141dfbaf1ed
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.
Use spatch to replace it with direct calls to rcu_read_unlock:
@@
struct xfrm_state_afinfo *a;
@@
- xfrm_state_put_afinfo(a);
+ rcu_read_unlock();
old:
 text  data  bss    dec   hex  filename
22570    72  424  23066  5a1a  xfrm_state.o
 1612     0    0   1612   64c  xfrm_output.o
new:
22554    72  424  23050  5a0a  xfrm_state.o
 1596     0    0   1596   63c  xfrm_output.o
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
23 Sep, 2016
1 commit
21 Sep, 2016
1 commit
-
Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set that value to NULL in
xfrm6_rcv_spi().
A new function xfrm6_rcv_tnl() that allows a value to be passed to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).
CC: Alexey Kodanev
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
10 Aug, 2016
1 commit
-
The xfrm_replay structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall
Signed-off-by: Steffen Klassert
28 Apr, 2016
1 commit
-
Not used anymore.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller