31 Aug, 2022
4 commits
-
[ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]
While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.While at it, we remove the unnecessary spaces in the doc.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit 17ecd4a4db4783392edd4944f5e8268205083f70 ]
When we try to transmit an skb with metadata_dst attached (i.e. dst->dev
== NULL) through xfrm interface we can hit a null pointer dereference[1]
in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a
loopback skb device when there's no policy which dereferences dst->dev
unconditionally. Not having dst->dev can be interepreted as it not being
a loopback device, so just add a check for a null dst_orig->dev.With this fix xfrm interface's Tx error counters go up as usual.
[1] net-next calltrace captured via netconsole:
BUG: kernel NULL pointer dereference, address: 00000000000000c0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60
Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02
RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80
RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000
R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0
Call Trace:
xfrmi_xmit+0xde/0x460
? tcf_bpf_act+0x13d/0x2a0
dev_hard_start_xmit+0x72/0x1e0
__dev_queue_xmit+0x251/0xd30
ip_finish_output2+0x140/0x550
ip_push_pending_frames+0x56/0x80
raw_sendmsg+0x663/0x10a0
? try_charge_memcg+0x3fd/0x7a0
? __mod_memcg_lruvec_state+0x93/0x110
? sock_sendmsg+0x30/0x40
sock_sendmsg+0x30/0x40
__sys_sendto+0xeb/0x130
? handle_mm_fault+0xae/0x280
? do_user_addr_fault+0x1e7/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7ff35cac1366
Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366
RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003
RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001
Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole]
CR2: 00000000000000c0CC: Steffen Klassert
CC: Daniel Borkmann
Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ]
x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
migrate.Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]
The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.Fix it by decreasing the reference count of `pols[0]` in that path.
Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong
Signed-off-by: Xin Tan
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
29 Jul, 2022
2 commits
-
[ Upstream commit 0968d2a441bf6afb551fd99e60fa65ed67068963 ]
While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin -
[ Upstream commit f85daf0e725358be78dfd208dea5fd665d8cb901 ]
xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of
pols[0]. This refcount can be dropped in xfrm_expand_policies() when
xfrm_expand_policies() return error. pols[0]'s refcount is balanced in
here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with
num_pols == 1 to drop this refcount when xfrm_expand_policies() return
error.This patch also fix an illegal address access. pols[0] will save a error
point when xfrm_policy_lookup fails. This lead to xfrm_pols_put to resolve
an illegal address in xfrm_bundle_lookup's error path.Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.
Fixes: 80c802f3073e ("xfrm: cache bundles instead of policies for outgoing flows")
Signed-off-by: Hangyu Hua
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
25 May, 2022
1 commit
-
[ Upstream commit b58b1f563ab78955d37e9e43e02790a85c66ac05 ]
This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API
complete"). The goal is to align userland API to the internal structures.Signed-off-by: Nicolas Dichtel
Reviewed-by: Antony Antony
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
08 Apr, 2022
1 commit
-
[ Upstream commit 4ff2980b6bd2aa6b4ded3ce3b7c0ccfab29980af ]
in tunnel mode, if outer interface(ipv4) is less, it is easily to let
inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message
is received. When send again, packets are fragmentized with 1280, they
are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2().According to RFC4213 Section3.2.2:
if (IPv4 path MTU - 20) is less than 1280
if packet is larger than 1280 bytes
Send ICMPv6 "packet too big" with MTU=1280
Drop packet
else
Encapsulate but do not set the Don't Fragment
flag in the IPv4 header. The resulting IPv4
packet might be fragmented by the IPv4 layer
on the encapsulator or by some router along
the IPv4 path.
endif
else
if packet is larger than (IPv4 path MTU - 20)
Send ICMPv6 "packet too big" with
MTU = (IPv4 path MTU - 20).
Drop packet.
else
Encapsulate and set the Don't Fragment flag
in the IPv4 header.
endif
endif
Packets should be fragmentized with ipv4 outer interface, so change it.After it is fragemtized with ipv4, there will be double fragmenation.
No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized,
then tunneled with IPv4(No.49& No.50), which obey spec. And received peer
cannot decrypt it rightly.48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000)
51 2002::10 2002::11 180 Echo (ping) request
52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below:
1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) requestSigned-off-by: Lina Wang
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
19 Mar, 2022
3 commits
-
[ Upstream commit e03c3bba351f99ad932e8f06baa9da1afc418e02 ]
xfrm_migrate cannot handle address family change of an xfrm_state.
The symptons are the xfrm_state will be migrated to a wrong address,
and sending as well as receiving packets wil be broken.This commit fixes it by breaking the original xfrm_state_clone
method into two steps so as to update the props.family before
running xfrm_init_state. As the result, xfrm_state's inner mode,
outer mode, type and IP header length in xfrm_state_migrate can
be updated with the new address family.Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1885354Signed-off-by: Yan Yan
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit c1aca3080e382886e2e58e809787441984a2f89b ]
This patch enables distinguishing SAs and SPs based on if_id during
the xfrm_migrate flow. This ensures support for xfrm interfaces
throughout the SA/SP lifecycle.When there are multiple existing SPs with the same direction,
the same xfrm_selector and different endpoint addresses,
xfrm_migrate might fail with ENODATA.Specifically, the code path for performing xfrm_migrate is:
Stage 1: find policy to migrate with
xfrm_migrate_policy_find(sel, dir, type, net)
Stage 2: find and update state(s) with
xfrm_migrate_state_find(mp, net)
Stage 3: update endpoint address(es) of template(s) with
xfrm_policy_migrate(pol, m, num_migrate)Currently "Stage 1" always returns the first xfrm_policy that
matches, and "Stage 3" looks for the xfrm_tmpl that matches the
old endpoint address. Thus if there are multiple xfrm_policy
with same selector, direction, type and net, "Stage 1" might
rertun a wrong xfrm_policy and "Stage 3" will fail with ENODATA
because it cannot find a xfrm_tmpl with the matching endpoint
address.The fix is to allow userspace to pass an if_id and add if_id
to the matching rule in Stage 1 and Stage 2 since if_id is a
unique ID for xfrm_policy and xfrm_state. For compatibility,
if_id will only be checked if the attribute is set.Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1668886Signed-off-by: Yan Yan
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
commit a3d9001b4e287fc043e5539d03d71a32ab114bcb upstream.
This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID
0 was meant to be used for configuring the policy/state without
matching for a specific interface (e.g., Cilium is affected, see
https://github.com/cilium/cilium/pull/18789 and
https://github.com/cilium/cilium/pull/19019).Signed-off-by: Kai Lueke
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2022
3 commits
-
commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.
This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.
Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
calculation in ipsec transport mode, resulting complete stalls of TCP
connections. This happens when the (P)MTU is 1280 or slighly larger.The desired formula for the MSS is:
MSS = (MTU - ESP_overhead) - IP header - TCP headerHowever, the above commit clamps the (MTU - ESP_overhead) to a
minimum of 1280, turning the formula into
MSS = max(MTU - ESP overhead, 1280) - IP header - TCP headerWith the (P)MTU near 1280, the calculated MSS is too large and the
resulting TCP packets never make it to the destination because they
are over the actual PMTU.The above commit also causes suboptimal double fragmentation in
xfrm tunnel mode, as described in
https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/The original problem the above commit was trying to fix is now fixed
by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
regression").Signed-off-by: Jiri Bohac
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream.
struct xfrm_user_offload has flags variable that received user input,
but kernel didn't check if valid bits were provided. It caused a situation
where not sanitized input was forwarded directly to the drivers.For example, XFRM_OFFLOAD_IPV6 define that was exposed, was used by
strongswan, but not implemented in the kernel at all.As a solution, check and sanitize input flags to forward
XFRM_OFFLOAD_INBOUND to the drivers.Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Leon Romanovsky
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream.
if_id will be always 0, because it was not yet initialized.
Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error")
Reported-by: Pavel Machek
Signed-off-by: Antony Antony
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman
27 Jan, 2022
7 commits
-
commit 23e7b1bfed61e301853b5e35472820d919498278 upstream.
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"),
clear the ECN bits from iph->tos when setting ->flowi4_tos.
This ensures that the last bit of ->flowi4_tos is cleared, so
ip_route_output_key_hash() isn't going to restrict the scope of the
route lookup.Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
to clear the high order bits.Found by code inspection, compile tested only.
Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups")
Signed-off-by: Guillaume Nault
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman -
commit bcf141b2eb551b3477b24997ebc09c65f117a803 upstream.
On egress side, xfrm lookup is called from __gre6_xmit() with the
fl6_gre_key field not initialized leading to policies selectors check
failure. Consequently, gre packets are sent without encryption.On ingress side, INET6_PROTO_NOPOLICY was set, thus packets were not
checked against xfrm policies. Like for egress side, fl6_gre_key should be
correctly set, this is now done in decode_session6().Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Cc: stable@vger.kernel.org
Signed-off-by: Ghalem Boudour
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit 4e484b3e969b52effd95c17f7a86f39208b2ccf4 ]
Kernel generates mapping change message, XFRM_MSG_MAPPING,
when a source port chage is detected on a input state with UDP
encapsulation set. Kernel generates a message for each IPsec packet
with new source port. For a high speed flow per packet mapping change
message can be excessive, and can overload the user space listener.Introduce rate limiting for XFRM_MSG_MAPPING message to the user space.
The rate limiting is configurable via netlink, when adding a new SA or
updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds.v1->v2 change:
update xfrm_sa_len()v2->v3 changes:
use u32 insted unsigned long to reduce size of struct xfrm_state
fix xfrm_ompat size Reported-by: kernel test robot
accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is presentCo-developed-by: Thomas Egerer
Signed-off-by: Thomas Egerer
Signed-off-by: Antony Antony
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 45a98ef4922def8c679ca7c454403d1957fe70e7 ]
The inner_ipproto saves the inner IP protocol of the plain
text packet. This allows vendor's IPsec feature making offload
decision at skb's features_check and configuring hardware at
ndo_start_xmit, current code implenetation did not handle the
case where IPsec is used in tunnel mode.Fix by handling the case when IPsec is used in tunnel mode by
reading the protocol of the plain text packet IP protocol.Fixes: fa4535238fb5 ("net/xfrm: Add inner_ipproto into sec_path")
Signed-off-by: Raed Salem
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 ]
xfrm ineterface does not allow xfrm if_id = 0
fail to create or update xfrm state and policy.With this commit:
ip xfrm policy add src 192.0.2.1 dst 192.0.2.2 dir out if_id 0
RTNETLINK answers: Invalid argumentip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 1 \
reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \
0x1111111111111111111111111111111111111111 96 if_id 0
RTNETLINK answers: Invalid argumentv1->v2 change:
- add Fixes: tagFixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces")
Signed-off-by: Antony Antony
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 8dce43919566f06e865f7e8949f5c10d8c2493f5 ]
xfrm interface if_id = 0 would cause xfrm policy lookup errors since
Commit 9f8550e4bd9d.Now explicitly fail to create an xfrm interface when if_id = 0
With this commit:
ip link add ipsec0 type xfrm dev lo if_id 0
Error: if_id must be non zero.v1->v2 change:
- add Fixes: tagFixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces")
Signed-off-by: Antony Antony
Reviewed-by: Eyal Birger
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit 7770a39d7c63faec6c4f33666d49a8cb664d0482 ]
copy_user_offload() will actually push a struct struct xfrm_user_offload,
which is different than (struct xfrm_state *)->xso
(struct xfrm_state_offload)Fixes: d77e38e612a01 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Eric Dumazet
Cc: Steffen Klassert
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
23 Sep, 2021
1 commit
-
As stated in the comment above xfrm_nlmsg_multicast(), rcu read lock must
be held before calling this function.Reported-by: syzbot+3d9866419b4aa8f985d6@syzkaller.appspotmail.com
Fixes: 703b94b93c19 ("xfrm: notify default policy on update")
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
15 Sep, 2021
2 commits
-
This configuration knob is very sensible, it should be notified when
changing.Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert -
>From a userland POV, this API was based on some magic values:
- dirmask and action were bitfields but meaning of bits
(XFRM_POL_DEFAULT_*) are not exported;
- action is confusing, if a bit is set, does it mean drop or accept?Let's try to simplify this uapi by using explicit field and macros.
Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
09 Sep, 2021
1 commit
-
Syzbot hit shift-out-of-bounds in xfrm_get_default. The problem was in
missing validation check for user data.up->dirmask comes from user-space, so we need to check if this value
is less than XFRM_USERPOLICY_DIRMASK_MAX to avoid shift-out-of-bounds bugs.Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Reported-and-tested-by: syzbot+b2be9dd8ca6f6c73ee2d@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin
Signed-off-by: Steffen Klassert
27 Aug, 2021
1 commit
-
ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-08-271) Remove an unneeded extra variable in esp4 esp_ssg_unref.
From Corey Minyard.2) Add a configuration option to change the default behaviour
to block traffic if there is no matching policy.
Joint work with Christian Langrock and Antony Antony.3) Fix a shift-out-of-bounce bug reported from syzbot.
From Pavel Skripkin.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
04 Aug, 2021
1 commit
-
Steffen Klassert says:
====================
pull request (net): ipsec 2021-08-041) Fix a sysbot reported memory leak in xfrm_user_rcv_msg.
From Pavel Skripkin.2) Revert "xfrm: policy: Read seqcount outside of rcu-read side
in xfrm_policy_lookup_bytype". This commit tried to fix a
lockin bug, but only cured some of the symptoms. A proper
fix is applied on top of this revert.3) Fix a locking bug on xfrm state hash resize. A recent change
on sequence counters accidentally repaced a spinlock by a mutex.
Fix from Frederic Weisbecker.4) Fix possible user-memory-access in xfrm_user_rcv_msg_compat().
From Dmitry Safonov.5) Add initialiation sefltest fot xfrm_spdattr_type_t.
From Dmitry Safonov.
====================Signed-off-by: David S. Miller
29 Jul, 2021
1 commit
-
We need to check up->dirmask to avoid shift-out-of-bounce bug,
since up->dirmask comes from userspace.Also, added XFRM_USERPOLICY_DIRMASK_MAX constant to uapi to inform
user-space that up->dirmask has maximum possible valueFixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
Reported-and-tested-by: syzbot+9cd5837a045bbee5b810@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin
Signed-off-by: Steffen Klassert
26 Jul, 2021
1 commit
-
The list_for_each_entry() iterator, "pos" in this code, can never be
NULL so the warning will never be printed.Signed-off-by: Harshvardhan Jha
Signed-off-by: Steffen Klassert
21 Jul, 2021
2 commits
-
The attribute-translator has to take in mind maxtype, that is
xfrm_link::nla_max. When it is set, attributes are not of xfrm_attr_type_t.
Currently, they can be only XFRMA_SPD_MAX (message XFRM_MSG_NEWSPDINFO),
their UABI is the same for 64/32-bit, so just copy them.Thanks to YueHaibing for reporting this:
In xfrm_user_rcv_msg_compat() if maxtype is not zero and less than
XFRMA_MAX, nlmsg_parse_deprecated() do not initialize attrs array fully.
xfrm_xlate32() will access uninit 'attrs[i]' while iterating all attrs
array.KASAN: probably user-memory-access in range [0x0000000041b58ab0-0x0000000041b58ab7]
CPU: 0 PID: 15799 Comm: syz-executor.2 Tainted: G W 5.14.0-rc1-syzkaller #0
RIP: 0010:nla_type include/net/netlink.h:1130 [inline]
RIP: 0010:xfrm_xlate32_attr net/xfrm/xfrm_compat.c:410 [inline]
RIP: 0010:xfrm_xlate32 net/xfrm/xfrm_compat.c:532 [inline]
RIP: 0010:xfrm_user_rcv_msg_compat+0x5e5/0x1070 net/xfrm/xfrm_compat.c:577
[...]
Call Trace:
xfrm_user_rcv_msg+0x556/0x8b0 net/xfrm/xfrm_user.c:2774
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2824
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:702 [inline]Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Cc:
Reported-by: YueHaibing
Signed-off-by: Dmitry Safonov
Signed-off-by: Steffen Klassert -
As the default we assume the traffic to pass, if we have no
matching IPsec policy. With this patch, we have a possibility to
change this default from allow to block. It can be configured
via netlink. Each direction (input/output/forward) can be
configured separately. With the default to block configuered,
we need allow policies for all packet flows we accept.
We do not use default policy lookup for the loopback device.v1->v2
- fix compiling when XFRM is disabled
- Reported-by: kernel test robotCo-developed-by: Christian Langrock
Signed-off-by: Christian Langrock
Co-developed-by: Antony Antony
Signed-off-by: Antony Antony
Signed-off-by: Steffen Klassert
02 Jul, 2021
2 commits
-
xfrm_bydst_resize() calls synchronize_rcu() while holding
hash_resize_mutex. But then on PREEMPT_RT configurations,
xfrm_policy_lookup_bytype() may acquire that mutex while running in an
RCU read side critical section. This results in a deadlock.In fact the scope of hash_resize_mutex is way beyond the purpose of
xfrm_policy_lookup_bytype() to just fetch a coherent and stable policy
for a given destination/direction, along with other details.The lower level net->xfrm.xfrm_policy_lock, which among other things
protects per destination/direction references to policy entries, is
enough to serialize and benefit from priority inheritance against the
write side. As a bonus, it makes it officially a per network namespace
synchronization business where a policy table resize on namespace A
shouldn't block a policy lookup on namespace B.Fixes: 77cc278f7b20 (xfrm: policy: Use sequence counters with associated lock)
Cc: stable@vger.kernel.org
Cc: Ahmed S. Darwish
Cc: Peter Zijlstra (Intel)
Cc: Varad Gautam
Cc: Steffen Klassert
Cc: Herbert Xu
Cc: David S. Miller
Signed-off-by: Frederic Weisbecker
Signed-off-by: Steffen Klassert -
This reverts commit d7b0408934c749f546b01f2b33d07421a49b6f3e.
This commit tried to fix a locking bug introduced by commit 77cc278f7b20
("xfrm: policy: Use sequence counters with associated lock"). As it
turned out, this patch did not really fix the bug. A proper fix
for this bug is applied on top of this revert.Signed-off-by: Steffen Klassert
01 Jul, 2021
2 commits
-
Pull networking updates from Jakub Kicinski:
"Core:- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_reportDevice APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)- don't require RCU read lock to be held around BPF hooks in NAPI
context- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
... -
Pull SELinux updates from Paul Moore:
- The slow_avc_audit() function is now non-blocking so we can remove
the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of
avc_has_perm().- Use kmemdup() instead of kcalloc()+copy when copying parts of the
SELinux policydb.- The InfiniBand device name is now passed by reference when possible
in the SELinux code, removing a strncpy().- Minor cleanups including: constification of avtab function args,
removal of useless LSM/XFRM function args, SELinux kdoc fixes, and
removal of redundant assignments.* tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
selinux: slow_avc_audit has become non-blocking
selinux: Fix kernel-doc
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
lsm_audit,selinux: pass IB device name by reference
selinux: Remove redundant assignment to rc
selinux: Corrected comment to match kernel-doc comment
selinux: delete selinux_xfrm_policy_lookup() useless argument
selinux: constify some avtab function arguments
selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
30 Jun, 2021
1 commit
-
Trivial conflict in net/netfilter/nf_tables_api.c.
Duplicate fix in tools/testing/selftests/net/devlink_port_split.py
- take the net-next version.skmsg, and L4 bpf - keep the bpf code but remove the flags
and err params.Signed-off-by: Jakub Kicinski
29 Jun, 2021
2 commits
-
Syzbot reported memory leak in xfrm_user_rcv_msg(). The
problem was is non-freed skb's frag_list.In skb_release_all() skb_release_data() will be called only
in case of skb->head != NULL, but netlink_skb_destructor()
sets head to NULL. So, allocated frag_list skb should be
freed manualy, since consume_skb() won't take care of itFixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Reported-and-tested-by: syzbot+fb347cf82c73a90efcca@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin
Signed-off-by: Steffen Klassert -
/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-06-281) Remove an unneeded error assignment in esp4_gro_receive().
From Yang Li.2) Add a new byseq state hashtable to find acquire states faster.
From Sabrina Dubroca.3) Remove some unnecessary variables in pfkey_create().
From zuoqilin.4) Remove the unused description from xfrm_type struct.
From Florian Westphal.5) Fix a spelling mistake in the comment of xfrm_state_ok().
From gushengxian.6) Replace hdr_off indirections by a small helper function.
From Florian Westphal.7) Remove xfrm4_output_finish and xfrm6_output_finish declarations,
they are not used anymore.From Antony Antony.8) Remove xfrm replay indirections.
From Florian Westphal.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
24 Jun, 2021
1 commit
-
Steffen Klassert says:
====================
pull request (net): ipsec 2021-06-231) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery.
From Sabrina Dubroca2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype
for the PREEMPT_RT case. From Varad Gautam.3) Remove a repeated declaration of xfrm_parse_spi.
From Shaokun Zhang.4) IPv4 beet mode can't handle fragments, but IPv6 does.
commit 68dc022d04eb ("xfrm: BEET mode doesn't support
fragments for inner packets") handled IPv4 and IPv6
the same way. Relax the check for IPv6 because fragments
are possible here. From Xin Long.5) Memory allocation failures are not reported for
XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct.
Fix this by moving both cases in front of the function.6) Fix a missing initialization in the xfrm offload fallback
fail case for bonding devices. From Ayush Sawal.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
23 Jun, 2021
1 commit
-
The inner_ipproto saves the inner IP protocol of the plain
text packet. This allows vendor's IPsec feature making offload
decision at skb's features_check and configuring hardware at
ndo_start_xmit.For example, ConnectX6-DX IPsec device needs the plaintext's
IP protocol to support partial checksum offload on
VXLAN/GENEVE packet over IPsec transport mode tunnel.Signed-off-by: Raed Salem
Signed-off-by: Huy Nguyen
Cc: Steffen Klassert
Acked-by: Steffen Klassert
Signed-off-by: Saeed Mahameed