05 Aug, 2020
1 commit
-
[ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]
In commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
it would take 'priority' to make a policy unique, and allow duplicated
policies with different 'priority' to be added, which is not expected
by userland, as Tobias reported in strongswan.To fix this duplicated policies issue, and also fix the issue in
commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
when doing add/del/get/update on user interfaces, this patch is to change
to look up a policy with both mark and mask by doing:mark.v == pol->mark.v && mark.m == pol->mark.m
and leave the check:
(mark & pol->mark.m) == pol->mark.v
for tx/rx path only.
As the userland expects an exact mark and mask match to manage policies.
v1->v2:
- make xfrm_policy_mark_match inline and fix the changelog as
Tobias suggested.Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
Reported-by: Tobias Brunner
Tested-by: Tobias Brunner
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
01 Jul, 2020
1 commit
-
[ Upstream commit 94579ac3f6d0820adc83b5dc5358ead0158101e9 ]
During IPsec performance testing, we see bad ICMP checksum. The error packet
has duplicated ESP trailer due to double validate_xmit_xfrm calls. The first call
is from ip_output, but the packet cannot be sent because
netif_xmit_frozen_or_stopped is true and the packet gets dev_requeue_skb. The second
call is from NET_TX softirq. However after the first call, the packet already
has the ESP trailer.Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
validate_xmit_xfrm to avoid duplicate ESP trailer insertion.Fixes: f6e27114a60a ("net: Add a xfrm validate function to validate_xmit_skb")
Signed-off-by: Huy Nguyen
Reviewed-by: Boris Pismenny
Reviewed-by: Raed Salem
Reviewed-by: Saeed Mahameed
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
03 Jun, 2020
6 commits
-
commit f6a23d85d078c2ffde79c66ca81d0a1dde451649 upstream.
This patch is to fix a crash:
[ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ ] general protection fault: 0000 [#1] SMP KASAN PTI
[ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
[ ] Call Trace:
[ ] xfrm6_local_error+0x1eb/0x300
[ ] xfrm_local_error+0x95/0x130
[ ] __xfrm6_output+0x65f/0xb50
[ ] xfrm6_output+0x106/0x46f
[ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
[ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
[ ] vxlan_xmit+0x6a0/0x4276 [vxlan]
[ ] dev_hard_start_xmit+0x165/0x820
[ ] __dev_queue_xmit+0x1ff0/0x2b90
[ ] ip_finish_output2+0xd3e/0x1480
[ ] ip_do_fragment+0x182d/0x2210
[ ] ip_output+0x1d0/0x510
[ ] ip_send_skb+0x37/0xa0
[ ] raw_sendmsg+0x1b4c/0x2b80
[ ] sock_sendmsg+0xc0/0x110This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
to get ipv6 info from a ipv4 sk.This issue was actually fixed by Commit 628e341f319f ("xfrm: make local
error reporting more robust"), but brought back by Commit 844d48746e4b
("xfrm: choose protocol family by skb protocol").So to fix it, we should call xfrm6_local_error() only when skb->protocol
is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol")
Reported-by: Xiumei Mu
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit ed17b8d377eaf6b4a01d46942b4c647378a79bdd upstream.
This waring can be triggered simply by:
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 1 mark 0 mask 0x10 #[1]
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 2 mark 0 mask 0x1 #[2]
# ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
priority 2 mark 0 mask 0x10 #[3]Then dmesg shows:
[ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548
[ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030
[ ] Call Trace:
[ ] xfrm_policy_inexact_insert+0x85/0xe50
[ ] xfrm_policy_insert+0x4ba/0x680
[ ] xfrm_add_policy+0x246/0x4d0
[ ] xfrm_user_rcv_msg+0x331/0x5c0
[ ] netlink_rcv_skb+0x121/0x350
[ ] xfrm_netlink_rcv+0x66/0x80
[ ] netlink_unicast+0x439/0x630
[ ] netlink_sendmsg+0x714/0xbf0
[ ] sock_sendmsg+0xe2/0x110The issue was introduced by Commit 7cb8a93968e3 ("xfrm: Allow inserting
policies with matching mark and different priorities"). After that, the
policies [1] and [2] would be able to be added with different priorities.However, policy [3] will actually match both [1] and [2]. Policy [1]
was matched due to the 1st 'return true' in xfrm_policy_mark_match(),
and policy [2] was matched due to the 2nd 'return true' in there. It
caused WARN_ON() in xfrm_policy_insert_list().This patch is to fix it by only (the same value and priority) as the
same policy in xfrm_policy_mark_match().Thanks to Yuehaibing, we could make this fix better.
v1->v2:
- check policy->mark.v == pol->mark.v only without mask.Fixes: 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities")
Reported-by: Xiumei Mu
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit c95c5f58b35ef995f66cb55547eee6093ab5fcb8 upstream.
Here is the steps to reproduce the problem:
ip netns add foo
ip netns add bar
ip -n foo link add xfrmi0 type xfrm dev lo if_id 42
ip -n foo link set xfrmi0 netns bar
ip netns del foo
ip netns del barWhich results to:
[ 186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI
[ 186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1
[ 186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 186.689420] Workqueue: netns cleanup_net
[ 186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface]
[ 186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08
[ 186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286
[ 186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0
[ 186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000
[ 186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040
[ 186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80
[ 186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8
[ 186.697738] FS: 0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000
[ 186.698705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0
[ 186.700221] Call Trace:
[ 186.700508] rollback_registered_many+0x32b/0x3fd
[ 186.701058] ? __rtnl_unlock+0x20/0x3d
[ 186.701494] ? arch_local_irq_save+0x11/0x17
[ 186.702012] unregister_netdevice_many+0x12/0x55
[ 186.702594] default_device_exit_batch+0x12b/0x150
[ 186.703160] ? prepare_to_wait_exclusive+0x60/0x60
[ 186.703719] cleanup_net+0x17d/0x234
[ 186.704138] process_one_work+0x196/0x2e8
[ 186.704652] worker_thread+0x1a4/0x249
[ 186.705087] ? cancel_delayed_work+0x92/0x92
[ 186.705620] kthread+0x105/0x10f
[ 186.706000] ? __kthread_bind_mask+0x57/0x57
[ 186.706501] ret_from_fork+0x35/0x40
[ 186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy
[ 186.710423] ---[ end trace 463bba18105537e5 ]---The problem is that x-netns xfrm interface are not removed when the link
netns is removed. This causes later this oops when thoses interfaces are
removed.Let's add a handler to remove all interfaces related to a netns when this
netns is removed.Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Christophe Gouault
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit a204aef9fd77dce1efd9066ca4e44eede99cd858 upstream.
An use-after-free crash can be triggered when sending big packets over
vxlan over esp with esp offload enabled:[] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
[] Call Trace:
[] dump_stack+0x75/0xa0
[] kasan_report+0x37/0x50
[] ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
[] ipv6_gso_segment+0x2c8/0x13c0
[] skb_mac_gso_segment+0x1cb/0x420
[] skb_udp_tunnel_segment+0x6b5/0x1c90
[] inet_gso_segment+0x440/0x1380
[] skb_mac_gso_segment+0x1cb/0x420
[] esp4_gso_segment+0xae8/0x1709 [esp4_offload]
[] inet_gso_segment+0x440/0x1380
[] skb_mac_gso_segment+0x1cb/0x420
[] __skb_gso_segment+0x2d7/0x5f0
[] validate_xmit_skb+0x527/0xb10
[] __dev_queue_xmit+0x10f8/0x2320 inner_network_header would be
set on vxlan_xmit() and xfrm4_tunnel_encap_add(), and the later one can
overwrite the former one. It causes skb_udp_tunnel_segment() to use a
wrong skb->inner_network_header, then the issue occurs.This patch is to fix it by calling xfrm_output_gso() instead when the
inner_protocol is set, in which gso_segment of inner_protocol will be
done first.While at it, also improve some code around.
Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6")
Reported-by: Xiumei Mu
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit 06a0afcfe2f551ff755849ea2549b0d8409fd9a0 upstream.
For transport mode, when ipv6 nexthdr is set, the packet format might
be like:----------------------------------------------------
| | dest | | | | ESP | ESP |
| IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
----------------------------------------------------and in __xfrm_transport_prep():
pskb_pull(skb, skb->mac_len + sizeof(ip6hdr) + x->props.header_len);
it will pull the data pointer to the wrong position, as it missed the
nexthdrs/dest opts.This patch is to fix it by using:
pskb_pull(skb, skb_transport_offset(skb) + x->props.header_len);
as we can be sure transport_header points to ESP header at that moment.
It also fixes a panic when packets with ipv6 nexthdr are sent over
esp6 transport mode:[ 100.473845] kernel BUG at net/core/skbuff.c:4325!
[ 100.478517] RIP: 0010:__skb_to_sgvec+0x252/0x260
[ 100.494355] Call Trace:
[ 100.494829] skb_to_sgvec+0x11/0x40
[ 100.495492] esp6_output_tail+0x12e/0x550 [esp6]
[ 100.496358] esp6_xmit+0x1d5/0x260 [esp6_offload]
[ 100.498029] validate_xmit_xfrm+0x22f/0x2e0
[ 100.499604] __dev_queue_xmit+0x589/0x910
[ 100.502928] ip6_finish_output2+0x2a5/0x5a0
[ 100.503718] ip6_output+0x6c/0x120
[ 100.505198] xfrm_output_resume+0x4bf/0x530
[ 100.508683] xfrm6_output+0x3a/0xc0
[ 100.513446] inet6_csk_xmit+0xa1/0xf0
[ 100.517335] tcp_sendmsg+0x27/0x40
[ 100.517977] sock_sendmsg+0x3e/0x60
[ 100.518648] __sys_sendto+0xee/0x160Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit afcaf61be9d1dbdee5ec186d1dcc67b6b692180f upstream.
For beet mode, when it's ipv6 inner address with nexthdrs set,
the packet format might be:----------------------------------------------------
| outer | | dest | | | ESP | ESP |
| IP hdr | ESP | opts.| TCP | Data | Trailer | ICV |
----------------------------------------------------The nexthdr from ESP could be NEXTHDR_HOP(0), so it should
continue processing the packet when nexthdr returns 0 in
xfrm_input(). Otherwise, when ipv6 nexthdr is set, the
packet will be dropped.I don't see any error cases that nexthdr may return 0. So
fix it by removing the check for nexthdr == 0.Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman
01 Apr, 2020
4 commits
-
commit 4c59406ed00379c8663f8663d82b2537467ce9d7 upstream.
After xfrm_add_policy add a policy, its ref is 2, then
xfrm_policy_timer
read_lock
xp->walk.dead is 0
....
mod_timer()
xfrm_policy_kill
policy->walk.dead = 1
....
del_timer(&policy->timer)
xfrm_pol_put //ref is 1
xfrm_pol_put //ref is 0
xfrm_policy_destroy
call_rcu
xfrm_pol_hold //ref is 1
read_unlock
xfrm_pol_put //ref is 0
xfrm_policy_destroy
call_rcuxfrm_policy_destroy is called twice, which may leads to
double free.Call Trace:
RIP: 0010:refcount_warn_saturate+0x161/0x210
...
xfrm_policy_timer+0x522/0x600
call_timer_fn+0x1b3/0x5e0
? __xfrm_decode_session+0x2990/0x2990
? msleep+0xb0/0xb0
? _raw_spin_unlock_irq+0x24/0x40
? __xfrm_decode_session+0x2990/0x2990
? __xfrm_decode_session+0x2990/0x2990
run_timer_softirq+0x5c5/0x10e0Fix this by use write_lock_bh in xfrm_policy_kill.
Fixes: ea2dea9dacc2 ("xfrm: remove policy lock when accessing policy->walk.dead")
Signed-off-by: YueHaibing
Acked-by: Timo Teräs
Acked-by: Herbert Xu
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit a1a7e3a36e01ca6e67014f8cf673cb8e47be5550 upstream.
Without doing verify_sec_ctx_len() check in xfrm_add_acquire(), it may be
out-of-bounds to access uctx->ctx_str with uctx->ctx_len, as noticed by
syz:BUG: KASAN: slab-out-of-bounds in selinux_xfrm_alloc_user+0x237/0x430
Read of size 768 at addr ffff8880123be9b4 by task syz-executor.1/11650Call Trace:
dump_stack+0xe8/0x16e
print_address_description.cold.3+0x9/0x23b
kasan_report.cold.4+0x64/0x95
memcpy+0x1f/0x50
selinux_xfrm_alloc_user+0x237/0x430
security_xfrm_policy_alloc+0x5c/0xb0
xfrm_policy_construct+0x2b1/0x650
xfrm_add_acquire+0x21d/0xa10
xfrm_user_rcv_msg+0x431/0x6f0
netlink_rcv_skb+0x15a/0x410
xfrm_netlink_rcv+0x6d/0x90
netlink_unicast+0x50e/0x6a0
netlink_sendmsg+0x8ae/0xd40
sock_sendmsg+0x133/0x170
___sys_sendmsg+0x834/0x9a0
__sys_sendmsg+0x100/0x1e0
do_syscall_64+0xe5/0x660
entry_SYSCALL_64_after_hwframe+0x6a/0xdfSo fix it by adding the missing verify_sec_ctx_len check there.
Fixes: 980ebd25794f ("[IPSEC]: Sync series - acquire insert")
Reported-by: Hangbin Liu
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit 171d449a028573b2f0acdc7f31ecbb045391b320 upstream.
It's not sufficient to do 'uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
uctx->ctx_len)' check only, as uctx->len may be greater than nla_len(rt),
in which case it will cause slab-out-of-bounds when accessing uctx->ctx_str
later.This patch is to fix it by return -EINVAL when uctx->len > nla_len(rt).
Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Xin Long
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman -
commit 03891f820c2117b19e80b370281eb924a09cf79f upstream.
This patch to handle the asynchronous unregister
device event so the device IPsec offload resources
could be cleanly released.Fixes: e4db5b61c572 ("xfrm: policy: remove pcpu policy cache")
Signed-off-by: Raed Salem
Reviewed-by: Boris Pismenny
Reviewed-by: Saeed Mahameed
Signed-off-by: Steffen Klassert
Signed-off-by: Greg Kroah-Hartman
06 Feb, 2020
2 commits
-
[ Upstream commit 8aaea2b0428b6aad7c7e22d3fddc31a78bb1d724 ]
When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.Signed-off-by: Xu Wang
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin -
[ Upstream commit f042365dbffea98fb8148c98c700402e8d099f02 ]
With an ebpf program that redirects packets through a xfrm interface,
packets are dropped because no dst is attached to skb.This could also be reproduced with an AF_PACKET socket, with the following
python script (xfrm1 is a xfrm interface):import socket
send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0)
# scapy
# p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request')
# raw(p)
req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00'
send_s.sendto(req, ('xfrm1', 0x800, 0, 0))It was also not possible to send an ip packet through an AF_PACKET socket
because a LL header was expected. Let's remove those LL header constraints.Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert
Signed-off-by: Sasha Levin
12 Nov, 2019
1 commit
-
An ESP packet could be decrypted in async mode if the input handler for
this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
reference in skb is held. Later xfrm_input() will be invoked again to
resume the processing.
If the transform state is still valid it would continue to release the
device reference and there won't be a problem; however if the transform
state is not valid when async resumption happens, the packet will be
dropped while the device reference is still being held.
When the device is deleted for some reason and the reference to this
device is not properly released, the kernel will keep logging like:unregister_netdevice: waiting for ppp2 to become free. Usage count = 1
The issue is observed when running IPsec traffic over a PPPoE device based
on a bridge interface. By terminating the PPPoE connection on the server
end for multiple times, the PPPoE device on the client side will eventually
get stuck on the above warning message.This patch will check the async mode first and continue to release device
reference in async resumption, before it is dropped due to invalid state.v2: Do not assign address family from outer_mode in the transform if the
state is invalidv3: Release device reference in the error path instead of jumping to resume
Fixes: 4ce3dbe397d7b ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
Signed-off-by: Xiaodong Xu
Reported-by: Bo Chen
Tested-by: Bo Chen
Signed-off-by: Steffen Klassert
07 Nov, 2019
1 commit
-
We leak the page that we use to create skb page fragments
when destroying the xfrm_state. Fix this by dropping a
page reference if a page was assigned to the xfrm_state.Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Reported-by: JD
Reported-by: Paul Wouters
Signed-off-by: Steffen Klassert
02 Oct, 2019
1 commit
-
commit 174e23810cd31
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.Suggested-by: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
15 Sep, 2019
1 commit
-
Minor overlapping changes in the btusb and ixgbe drivers.
Signed-off-by: David S. Miller
06 Sep, 2019
1 commit
-
Steffen Klassert says:
====================
pull request (net): ipsec 2019-09-051) Several xfrm interface fixes from Nicolas Dichtel:
- Avoid an interface ID corruption on changelink.
- Fix wrong intterface names in the logs.
- Fix a list corruption when changing network namespaces.
- Fix unregistation of the underying phydev.2) Fix a potential warning when merging xfrm_plocy nodes.
From Florian Westphal.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
28 Aug, 2019
1 commit
-
Minor conflict in r8169, bug fix had two versions in net
and net-next, take the net-next hunks.Signed-off-by: David S. Miller
25 Aug, 2019
1 commit
-
In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:- ip6gre_tunnel_xmit
- ip6gre_xmit_ipv6
- __gre6_xmit
- ip6_tnl_xmit
- if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
- icmpv6_send
- icmpv6_route_lookup
- xfrm_decode_session_reverse
- decode_session4
- oif = skb_dst(skb)->dev->ifindex; dev->ifindex; dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the skb_dst(skb)->dev is actually not needed as we
called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;So make a dst dev check here should be clean and safe.
v4: No changes.
v3: No changes.
v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Hangbin Liu
Tested-by: Jonathan Lemon
Signed-off-by: David S. Miller
20 Aug, 2019
1 commit
-
syzbot reported a splat:
xfrm_policy_inexact_list_reinsert+0x625/0x6e0 net/xfrm/xfrm_policy.c:877
CPU: 1 PID: 6756 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #57
Call Trace:
xfrm_policy_inexact_node_reinsert net/xfrm/xfrm_policy.c:922 [inline]
xfrm_policy_inexact_node_merge net/xfrm/xfrm_policy.c:958 [inline]
xfrm_policy_inexact_insert_node+0x537/0xb50 net/xfrm/xfrm_policy.c:1023
xfrm_policy_inexact_alloc_chain+0x62b/0xbd0 net/xfrm/xfrm_policy.c:1139
xfrm_policy_inexact_insert+0xe8/0x1540 net/xfrm/xfrm_policy.c:1182
xfrm_policy_insert+0xdf/0xce0 net/xfrm/xfrm_policy.c:1574
xfrm_add_policy+0x4cf/0x9b0 net/xfrm/xfrm_user.c:1670
xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2676
netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2477
xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2684
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x809/0x9a0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0xa70/0xd30 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg net/socket.c:657 [inline]There is no reproducer, however, the warning can be reproduced
by adding rules with ever smaller prefixes.The sanity check ("does the policy match the node") uses the prefix value
of the node before its updated to the smaller value.To fix this, update the prefix earlier. The bug has no impact on tree
correctness, this is only to prevent a false warning.Reported-by: syzbot+8cc27ace5f6972910b31@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
31 Jul, 2019
1 commit
-
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.Signed-off-by: Jonathan Lemon
Signed-off-by: David S. Miller
17 Jul, 2019
4 commits
-
With the current implementation, phydev cannot be removed:
$ ip link add dummy type dummy
$ ip link add xfrm1 type xfrm dev dummy if_id 1
$ ip l d dummy
kernel:[77938.465445] unregister_netdevice: waiting for dummy to become free. Usage count = 1Manage it like in ip tunnels, ie just keep the ifindex. Not that the side
effect, is that the phydev is now optional.Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel
Tested-by: Julien Floret
Signed-off-by: Steffen Klassert -
dev_net(dev) is the netns of the device and xi->net is the link netns,
where the device has been linked.
changelink() must operate in the link netns to avoid a corruption of
the xfrm lists.Note that xi->net and dev_net(xi->physdev) are always the same.
Before the patch, the xfrmi lists may be corrupted and can later trigger a
kernel panic.Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Julien Floret
Signed-off-by: Nicolas Dichtel
Tested-by: Julien Floret
Signed-off-by: Steffen Klassert -
The ifname is copied when the interface is created, but is never updated
later. In fact, this property is used only in one error message, where the
netdevice pointer is available, thus let's use it.Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert -
The new parameters must not be stored in the netdev_priv() before
validation, it may corrupt the interface. Note also that if data is NULL,
only a memset() is done.$ ip link add xfrm1 type xfrm dev lo if_id 1
$ ip link add xfrm2 type xfrm dev lo if_id 2
$ ip link set xfrm1 type xfrm dev lo if_id 2
RTNETLINK answers: File exists
$ ip -d link list dev xfrm1
5: xfrm1@lo: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 68 maxmtu 1500
xfrm if_id 0x2 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535=> "if_id 0x2"
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Nicolas Dichtel
Tested-by: Julien Floret
Signed-off-by: Steffen Klassert
09 Jul, 2019
1 commit
-
Two cases of overlapping changes, nothing fancy.
Signed-off-by: David S. Miller
06 Jul, 2019
2 commits
-
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2019-07-051) A lot of work to remove indirections from the xfrm code.
From Florian Westphal.2) Fix a WARN_ON with ipv6 that triggered because of a
forgotten break statement. From Florian Westphal.3) Remove xfrmi_init_net, it is not needed.
From Li RongQing.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
-
Steffen Klassert says:
====================
pull request (net): ipsec 2019-07-051) Fix xfrm selector prefix length validation for
inter address family tunneling.
From Anirudh Gupta.2) Fix a memleak in pfkey.
From Jeremy Sowden.3) Fix SA selector validation to allow empty selectors again.
From Nicolas Dichtel.4) Select crypto ciphers for xfrm_algo, this fixes some
randconfig builds. From Arnd Bergmann.5) Remove a duplicated assignment in xfrm_bydst_resize.
From Cong Wang.6) Fix a hlist corruption on hash rebuild.
From Florian Westphal.7) Fix a memory leak when creating xfrm interfaces.
From Nicolas Dichtel.Please pull or let me know if there are problems.
====================Signed-off-by: David S. Miller
03 Jul, 2019
2 commits
-
The following commands produce a backtrace and return an error but the xfrm
interface is created (in the wrong netns):
$ ip netns add foo
$ ip netns add bar
$ ip -n foo netns set bar 0
$ ip -n foo link add xfrmi0 link-netnsid 0 type xfrm dev lo if_id 23
RTNETLINK answers: Invalid argument
$ ip -n bar link ls xfrmi0
2: xfrmi0@lo: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00Here is the backtrace:
[ 79.879174] WARNING: CPU: 0 PID: 1178 at net/core/dev.c:8172 rollback_registered_many+0x86/0x3c1
[ 79.880260] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic ide_cd_mod ide_gd_mod cdrom ata_$
eneric ata_piix libata scsi_mod 8139too piix psmouse i2c_piix4 ide_core 8139cp mii i2c_core floppy
[ 79.883698] CPU: 0 PID: 1178 Comm: ip Not tainted 5.2.0-rc6+ #106
[ 79.884462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 79.885447] RIP: 0010:rollback_registered_many+0x86/0x3c1
[ 79.886120] Code: 01 e8 d7 7d c6 ff 0f 0b 48 8b 45 00 4c 8b 20 48 8d 58 90 49 83 ec 70 48 8d 7b 70 48 39 ef 74 44 8a 83 d0 04 00 00 84 c0 75 1f 0b e8 61 cd ff ff 48 b8 00 01 00 00 00 00 ad de 48 89 43 70 66
[ 79.888667] RSP: 0018:ffffc900015ab740 EFLAGS: 00010246
[ 79.889339] RAX: ffff8882353e5700 RBX: ffff8882353e56a0 RCX: ffff8882353e5710
[ 79.890174] RDX: ffffc900015ab7e0 RSI: ffffc900015ab7e0 RDI: ffff8882353e5710
[ 79.891029] RBP: ffffc900015ab7e0 R08: ffffc900015ab7e0 R09: ffffc900015ab7e0
[ 79.891866] R10: ffffc900015ab7a0 R11: ffffffff82233fec R12: ffffc900015ab770
[ 79.892728] R13: ffffffff81eb7ec0 R14: ffff88822ed6cf00 R15: 00000000ffffffea
[ 79.893557] FS: 00007ff350f31740(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
[ 79.894581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.895317] CR2: 00000000006c8580 CR3: 000000022c272000 CR4: 00000000000006f0
[ 79.896137] Call Trace:
[ 79.896464] unregister_netdevice_many+0x12/0x6c
[ 79.896998] __rtnl_newlink+0x6e2/0x73b
[ 79.897446] ? __kmalloc_node_track_caller+0x15e/0x185
[ 79.898039] ? pskb_expand_head+0x5f/0x1fe
[ 79.898556] ? stack_access_ok+0xd/0x2c
[ 79.899009] ? deref_stack_reg+0x12/0x20
[ 79.899462] ? stack_access_ok+0xd/0x2c
[ 79.899927] ? stack_access_ok+0xd/0x2c
[ 79.900404] ? __module_text_address+0x9/0x4f
[ 79.900910] ? is_bpf_text_address+0x5/0xc
[ 79.901390] ? kernel_text_address+0x67/0x7b
[ 79.901884] ? __kernel_text_address+0x1a/0x25
[ 79.902397] ? unwind_get_return_address+0x12/0x23
[ 79.903122] ? __cmpxchg_double_slab.isra.37+0x46/0x77
[ 79.903772] rtnl_newlink+0x43/0x56
[ 79.904217] rtnetlink_rcv_msg+0x200/0x24cIn fact, each time a xfrm interface was created, a netdev was allocated
by __rtnl_newlink()/rtnl_create_link() and then another one by
xfrmi_newlink()/xfrmi_create(). Only the second one was registered, it's
why the previous commands produce a backtrace: dev_change_net_namespace()
was called on a netdev with reg_state set to NETREG_UNINITIALIZED (the
first one).CC: Lorenzo Colitti
CC: Benedict Wong
CC: Steffen Klassert
CC: Shannon Nelson
CC: Antony Antony
CC: Eyal Birger
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Reported-by: Julien Floret
Signed-off-by: Nicolas Dichtel
Signed-off-by: Steffen Klassert -
syzbot reported following spat:
BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221
BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455
BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066
Workqueue: events xfrm_hash_rebuild
Call Trace:
__write_once_size include/linux/compiler.h:221 [inline]
hlist_del_rcu include/linux/rculist.h:455 [inline]
xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
process_one_work+0x814/0x1130 kernel/workqueue.c:2269
Allocated by task 8064:
__kmalloc+0x23c/0x310 mm/slab.c:3669
kzalloc include/linux/slab.h:742 [inline]
xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
ops_init+0x336/0x420 net/core/net_namespace.c:130
setup_net+0x212/0x690 net/core/net_namespace.c:316The faulting address is the address of the old chain head,
free'd by xfrm_hash_resize().In xfrm_hash_rehash(), chain heads get re-initialized without
any hlist_del_rcu:for (i = hmask; i >= 0; i--)
INIT_HLIST_HEAD(odst + i);Then, hlist_del_rcu() gets called on the about to-be-reinserted policy
when iterating the per-net list of policies.hlist_del_rcu() will then make chain->first be nonzero again:
static inline void __hlist_del(struct hlist_node *n)
{
struct hlist_node *next = n->next; // address of next element in list
struct hlist_node **pprev = n->pprev;// location of previous elem, this
// can point at chain->first
WRITE_ONCE(*pprev, next); // chain->first points to next elem
if (next)
next->pprev = pprev;Then, when we walk chainlist to find insertion point, we may find a
non-empty list even though we're supposedly reinserting the first
policy to an empty chain.To fix this first unlink all exact and inexact policies instead of
zeroing the list heads.Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh,
without fix KASAN catches the corruption as it happens, SLUB poisoning
detects it a bit later.Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com
Fixes: 1548bc4e0512 ("xfrm: policy: delete inexact policies from inexact list on hash rebuild")
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
02 Jul, 2019
1 commit
-
Fixes: 30846090a746 ("xfrm: policy: add sequence count to sync with hash resize")
Cc: Florian Westphal
Cc: Steffen Klassert
Signed-off-by: Cong Wang
Signed-off-by: Steffen Klassert
01 Jul, 2019
1 commit
-
esp4_get_mtu and esp6_get_mtu are exactly the same, the only difference
is a single sizeof() (ipv4 vs. ipv6 header).Merge both into xfrm_state_mtu() and remove the indirection.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
21 Jun, 2019
1 commit
-
kernelci.org reports failed builds on arc because of what looks
like an old missed 'select' statement:net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs':
xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash'I don't see this in randconfig builds on other architectures, but
it's fairly clear we want to select the hash code for it, like we
do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER
is also required even though it has not popped up in build tests.Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms")
Signed-off-by: Arnd Bergmann
Acked-by: Herbert Xu
Signed-off-by: Steffen Klassert
17 Jun, 2019
1 commit
-
After commit b38ff4075a80, the following command does not work anymore:
$ ip xfrm state add src 10.125.0.2 dst 10.125.0.1 proto esp spi 34 reqid 1 \
mode tunnel enc 'cbc(aes)' 0xb0abdba8b782ad9d364ec81e3a7d82a1 auth-trunc \
'hmac(sha1)' 0xe26609ebd00acb6a4d51fca13e49ea78a72c73e6 96 flag align4In fact, the selector is not mandatory, allow the user to provide an empty
selector.Fixes: b38ff4075a80 ("xfrm: Fix xfrm sel prefix length validation")
CC: Anirudh Gupta
Signed-off-by: Nicolas Dichtel
Acked-by: Herbert Xu
Signed-off-by: Steffen Klassert
14 Jun, 2019
1 commit
-
Pointer members of an object with static storage duration, if not
explicitly initialized, will be initialized to a NULL pointer. The
net namespace API checks if this pointer is not NULL before using it,
it are safe to remove the function.Signed-off-by: Li RongQing
Signed-off-by: Steffen Klassert
12 Jun, 2019
1 commit
-
net/xfrm/xfrm_input.c:378:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
skb->protocol = htons(ETH_P_IPV6);... the fallthrough then causes a bogus WARN_ON().
Reported-by: Stephen Rothwell
Fixes: 4c203b0454b ("xfrm: remove eth_proto value from xfrm_state_afinfo")
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
06 Jun, 2019
2 commits
-
Only a handful of xfrm_types exist, no need to have 512 pointers for them.
Reduces size of afinfo struct from 4k to 120 bytes on 64bit platforms.
Also, the unregister function doesn't need to return an error, no single
caller does anything useful with it.Just place a WARN_ON() where needed instead.
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert -
xfrm_prepare_input needs to lookup the state afinfo backend again to fetch
the address family ethernet protocol value.There are only two address families, so a switch statement is simpler.
While at it, use u8 for family and proto and remove the owner member --
its not used anywhere.Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert