05 Aug, 2020

1 commit

  • [ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

    Commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    made 'priority' part of what makes a policy unique, which allows duplicate
    policies that differ only in 'priority' to be added. Userland does not
    expect this, as Tobias reported in strongswan.

    To fix this duplicate-policy issue, and also the issue addressed by
    commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    this patch changes add/del/get/update on the user interfaces to look up
    a policy by both mark and mask:

    mark.v == pol->mark.v && mark.m == pol->mark.m

    and leave the check:

    (mark & pol->mark.m) == pol->mark.v

    for tx/rx path only.

    This is because userland expects an exact mark and mask match when
    managing policies.
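
    The two checks above can be sketched side by side. This is a minimal
    userspace illustration, not the kernel code: struct mark and both helper
    names are hypothetical stand-ins for the kernel's mark structure and its
    match helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's mark (value + mask). */
struct mark {
	uint32_t v; /* mark value */
	uint32_t m; /* mark mask */
};

/* Management path (add/del/get/update): exact value and mask match. */
static bool mark_match_exact(const struct mark *req, const struct mark *pol)
{
	return req->v == pol->v && req->m == pol->m;
}

/* tx/rx path: the packet mark, masked by the policy mask, equals the value. */
static bool mark_match_datapath(uint32_t pkt_mark, const struct mark *pol)
{
	return (pkt_mark & pol->m) == pol->v;
}
```

    With a policy of value 0x10 and mask 0x10, a request with the same value
    but mask 0xff is a different policy on the management path, while any
    packet mark with bit 0x10 set still matches on the data path.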

    v1->v2:
    - make xfrm_policy_mark_match inline and fix the changelog as
    Tobias suggested.

    Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
    Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    Reported-by: Tobias Brunner
    Tested-by: Tobias Brunner
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Long
     

01 Jul, 2020

1 commit

  • [ Upstream commit 94579ac3f6d0820adc83b5dc5358ead0158101e9 ]

    During IPsec performance testing, we see bad ICMP checksums. The error packet
    has a duplicated ESP trailer due to double validate_xmit_xfrm calls. The first call
    is from ip_output, but the packet cannot be sent because
    netif_xmit_frozen_or_stopped is true, so the packet gets dev_requeue_skb. The second
    call is from NET_TX softirq. However, after the first call, the packet already
    has the ESP trailer.

    Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
    validate_xmit_xfrm to avoid duplicate ESP trailer insertion.
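
    The idea of the fix can be sketched as an idempotence guard. This is a
    simplified userspace model under stated assumptions: struct pkt,
    PKT_XMIT_DONE and validate_xmit() are hypothetical names standing in for
    the skb, the XFRM_XMIT bit and validate_xmit_xfrm.

```c
#include <stdbool.h>
#include <stddef.h>

#define PKT_XMIT_DONE 0x1u /* hypothetical stand-in for the XFRM_XMIT bit */

struct pkt {
	unsigned int flags;
	size_t trailer_count; /* how many ESP trailers have been appended */
};

/* Returns true if the transform was applied by this call. */
static bool validate_xmit(struct pkt *p)
{
	if (p->flags & PKT_XMIT_DONE)
		return false; /* requeued packet: already transformed */

	p->trailer_count++; /* stands in for ESP trailer insertion */
	p->flags |= PKT_XMIT_DONE;
	return true;
}
```

    A packet that is requeued and passes through the function a second time
    keeps exactly one trailer.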

    Fixes: f6e27114a60a ("net: Add a xfrm validate function to validate_xmit_skb")
    Signed-off-by: Huy Nguyen
    Reviewed-by: Boris Pismenny
    Reviewed-by: Raed Salem
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Huy Nguyen
     

03 Jun, 2020

6 commits

  • commit f6a23d85d078c2ffde79c66ca81d0a1dde451649 upstream.

    This patch is to fix a crash:

    [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [ ] general protection fault: 0000 [#1] SMP KASAN PTI
    [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
    [ ] Call Trace:
    [ ] xfrm6_local_error+0x1eb/0x300
    [ ] xfrm_local_error+0x95/0x130
    [ ] __xfrm6_output+0x65f/0xb50
    [ ] xfrm6_output+0x106/0x46f
    [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
    [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
    [ ] vxlan_xmit+0x6a0/0x4276 [vxlan]
    [ ] dev_hard_start_xmit+0x165/0x820
    [ ] __dev_queue_xmit+0x1ff0/0x2b90
    [ ] ip_finish_output2+0xd3e/0x1480
    [ ] ip_do_fragment+0x182d/0x2210
    [ ] ip_output+0x1d0/0x510
    [ ] ip_send_skb+0x37/0xa0
    [ ] raw_sendmsg+0x1b4c/0x2b80
    [ ] sock_sendmsg+0xc0/0x110

    This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
    skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
    xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
    to get ipv6 info from an ipv4 sk.

    This issue was actually fixed by Commit 628e341f319f ("xfrm: make local
    error reporting more robust"), but brought back by Commit 844d48746e4b
    ("xfrm: choose protocol family by skb protocol").

    So to fix it, we should call xfrm6_local_error() only when skb->protocol
    is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
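
    The condition described above can be sketched as a small predicate. This
    is an illustration only: the function name and the constants are
    assumptions (the numeric values mirror ETH_P_IP, ETH_P_IPV6, AF_INET and
    AF_INET6 on Linux), and byte-order handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_IPV4   0x0800 /* same value as ETH_P_IP */
#define PROTO_IPV6   0x86DD /* same value as ETH_P_IPV6 */
#define FAMILY_INET  2      /* same value as AF_INET on Linux */
#define FAMILY_INET6 10     /* same value as AF_INET6 on Linux */

/*
 * The IPv6 error handler is only safe when both the packet and the
 * owning socket are IPv6; otherwise it would read ipv6 socket state
 * from an ipv4 socket.
 */
static bool use_v6_local_error(uint16_t skb_proto, int sk_family)
{
	return skb_proto == PROTO_IPV6 && sk_family == FAMILY_INET6;
}
```

    The crashing case from the trace, an IPv6 outer packet owned by an
    AF_INET socket, no longer selects the IPv6 handler.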

    Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol")
    Reported-by: Xiumei Mu
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit ed17b8d377eaf6b4a01d46942b4c647378a79bdd upstream.

    This warning can be triggered simply by:

    # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
    priority 1 mark 0 mask 0x10 #[1]
    # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
    priority 2 mark 0 mask 0x1 #[2]
    # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
    priority 2 mark 0 mask 0x10 #[3]

    Then dmesg shows:

    [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548
    [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030
    [ ] Call Trace:
    [ ] xfrm_policy_inexact_insert+0x85/0xe50
    [ ] xfrm_policy_insert+0x4ba/0x680
    [ ] xfrm_add_policy+0x246/0x4d0
    [ ] xfrm_user_rcv_msg+0x331/0x5c0
    [ ] netlink_rcv_skb+0x121/0x350
    [ ] xfrm_netlink_rcv+0x66/0x80
    [ ] netlink_unicast+0x439/0x630
    [ ] netlink_sendmsg+0x714/0xbf0
    [ ] sock_sendmsg+0xe2/0x110

    The issue was introduced by Commit 7cb8a93968e3 ("xfrm: Allow inserting
    policies with matching mark and different priorities"). After that, the
    policies [1] and [2] would be able to be added with different priorities.

    However, policy [3] will actually match both [1] and [2]. Policy [1]
    was matched due to the 1st 'return true' in xfrm_policy_mark_match(),
    and policy [2] was matched due to the 2nd 'return true' in there. It
    caused WARN_ON() in xfrm_policy_insert_list().

    This patch fixes it by treating only policies with the same mark value
    and priority as the same policy in xfrm_policy_mark_match().
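
    The double match can be replayed in a small userspace model. This is a
    sketch under assumptions: struct policy and both functions are
    hypothetical, and match_old() is only modelled on the two 'return true'
    paths mentioned above, so its exact conditions may differ from the
    pre-fix kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

struct policy {
	uint32_t mark_v;   /* mark value */
	uint32_t mark_m;   /* mark mask */
	uint32_t priority;
};

/* Assumed shape of the pre-fix check: two independent 'return true' paths. */
static bool match_old(const struct policy *cand, const struct policy *pol)
{
	uint32_t mark = cand->mark_v & cand->mark_m;

	if (cand->mark_v == pol->mark_v && cand->mark_m == pol->mark_m)
		return true; /* 1st 'return true': same value and mask */
	if ((mark & pol->mark_m) == pol->mark_v &&
	    cand->priority == pol->priority)
		return true; /* 2nd 'return true': masked match, same priority */
	return false;
}

/* Post-fix check as described: same mark value and same priority. */
static bool match_new(const struct policy *cand, const struct policy *pol)
{
	return cand->mark_v == pol->mark_v && cand->priority == pol->priority;
}
```

    Feeding in policies [1], [2] and [3] from the commands above, the old
    check reports [3] as equal to both [1] and [2], while the new check
    matches [3] against [2] only.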

    Thanks to Yuehaibing, we could make this fix better.

    v1->v2:
    - check policy->mark.v == pol->mark.v only without mask.

    Fixes: 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities")
    Reported-by: Xiumei Mu
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit c95c5f58b35ef995f66cb55547eee6093ab5fcb8 upstream.

    Here are the steps to reproduce the problem:
    ip netns add foo
    ip netns add bar
    ip -n foo link add xfrmi0 type xfrm dev lo if_id 42
    ip -n foo link set xfrmi0 netns bar
    ip netns del foo
    ip netns del bar

    Which results in:
    [ 186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI
    [ 186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1
    [ 186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    [ 186.689420] Workqueue: netns cleanup_net
    [ 186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface]
    [ 186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08
    [ 186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286
    [ 186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0
    [ 186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000
    [ 186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040
    [ 186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80
    [ 186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8
    [ 186.697738] FS: 0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000
    [ 186.698705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0
    [ 186.700221] Call Trace:
    [ 186.700508] rollback_registered_many+0x32b/0x3fd
    [ 186.701058] ? __rtnl_unlock+0x20/0x3d
    [ 186.701494] ? arch_local_irq_save+0x11/0x17
    [ 186.702012] unregister_netdevice_many+0x12/0x55
    [ 186.702594] default_device_exit_batch+0x12b/0x150
    [ 186.703160] ? prepare_to_wait_exclusive+0x60/0x60
    [ 186.703719] cleanup_net+0x17d/0x234
    [ 186.704138] process_one_work+0x196/0x2e8
    [ 186.704652] worker_thread+0x1a4/0x249
    [ 186.705087] ? cancel_delayed_work+0x92/0x92
    [ 186.705620] kthread+0x105/0x10f
    [ 186.706000] ? __kthread_bind_mask+0x57/0x57
    [ 186.706501] ret_from_fork+0x35/0x40
    [ 186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy
    [ 186.710423] ---[ end trace 463bba18105537e5 ]---

    The problem is that x-netns xfrm interfaces are not removed when the link
    netns is removed. This later causes this oops when those interfaces are
    removed.

    Let's add a handler to remove all interfaces related to a netns when this
    netns is removed.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Reported-by: Christophe Gouault
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Dichtel
     
  • commit a204aef9fd77dce1efd9066ca4e44eede99cd858 upstream.

    A use-after-free crash can be triggered when sending big packets over
    vxlan over esp with esp offload enabled:

    [] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
    [] Call Trace:
    [] dump_stack+0x75/0xa0
    [] kasan_report+0x37/0x50
    [] ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
    [] ipv6_gso_segment+0x2c8/0x13c0
    [] skb_mac_gso_segment+0x1cb/0x420
    [] skb_udp_tunnel_segment+0x6b5/0x1c90
    [] inet_gso_segment+0x440/0x1380
    [] skb_mac_gso_segment+0x1cb/0x420
    [] esp4_gso_segment+0xae8/0x1709 [esp4_offload]
    [] inet_gso_segment+0x440/0x1380
    [] skb_mac_gso_segment+0x1cb/0x420
    [] __skb_gso_segment+0x2d7/0x5f0
    [] validate_xmit_skb+0x527/0xb10
    [] __dev_queue_xmit+0x10f8/0x2320

    skb->inner_network_header would be set in both vxlan_xmit() and
    xfrm4_tunnel_encap_add(), and the latter can overwrite the former.
    This causes skb_udp_tunnel_segment() to use a wrong
    skb->inner_network_header, and then the issue occurs.

    This patch is to fix it by calling xfrm_output_gso() instead when the
    inner_protocol is set, in which gso_segment of inner_protocol will be
    done first.

    While at it, also clean up some of the surrounding code.

    Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6")
    Reported-by: Xiumei Mu
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit 06a0afcfe2f551ff755849ea2549b0d8409fd9a0 upstream.

    For transport mode, when ipv6 nexthdr is set, the packet format might
    be like:

    ----------------------------------------------------
    |        | dest |     |     |      |   ESP   | ESP |
    | IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
    ----------------------------------------------------

    and in __xfrm_transport_prep():

    pskb_pull(skb, skb->mac_len + sizeof(ip6hdr) + x->props.header_len);

    it will pull the data pointer to the wrong position, as it missed the
    nexthdrs/dest opts.

    This patch is to fix it by using:

    pskb_pull(skb, skb_transport_offset(skb) + x->props.header_len);

    as we can be sure transport_header points to ESP header at that moment.
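
    The difference between the two pull lengths can be worked through with
    plain arithmetic. This is a sketch under assumptions: struct frame and
    the helper names are hypothetical, and the model assumes the data pointer
    sits at the start of the link-layer header and that the transport header
    points at ESP.

```c
#include <stddef.h>

#define IP6_HDR_LEN 40u /* size of the fixed IPv6 header */

struct frame {
	size_t mac_len;     /* link-layer header length */
	size_t ext_hdr_len; /* IPv6 extension headers (e.g. dest opts) before ESP */
};

/* Old computation: ignores any extension headers in front of ESP. */
static size_t pull_len_old(const struct frame *f, size_t esp_hdr_len)
{
	return f->mac_len + IP6_HDR_LEN + esp_hdr_len;
}

/* Offset of the ESP header from the start of the frame data. */
static size_t transport_offset(const struct frame *f)
{
	return f->mac_len + IP6_HDR_LEN + f->ext_hdr_len;
}

/* New computation: starts from wherever the ESP header actually is. */
static size_t pull_len_new(const struct frame *f, size_t esp_hdr_len)
{
	return transport_offset(f) + esp_hdr_len;
}
```

    With a 14-byte link-layer header, 8 bytes of destination options and a
    16-byte ESP header, the old formula pulls 70 bytes and lands 8 bytes
    short; the new one pulls 78. Without extension headers both agree.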

    It also fixes a panic when packets with ipv6 nexthdr are sent over
    esp6 transport mode:

    [ 100.473845] kernel BUG at net/core/skbuff.c:4325!
    [ 100.478517] RIP: 0010:__skb_to_sgvec+0x252/0x260
    [ 100.494355] Call Trace:
    [ 100.494829] skb_to_sgvec+0x11/0x40
    [ 100.495492] esp6_output_tail+0x12e/0x550 [esp6]
    [ 100.496358] esp6_xmit+0x1d5/0x260 [esp6_offload]
    [ 100.498029] validate_xmit_xfrm+0x22f/0x2e0
    [ 100.499604] __dev_queue_xmit+0x589/0x910
    [ 100.502928] ip6_finish_output2+0x2a5/0x5a0
    [ 100.503718] ip6_output+0x6c/0x120
    [ 100.505198] xfrm_output_resume+0x4bf/0x530
    [ 100.508683] xfrm6_output+0x3a/0xc0
    [ 100.513446] inet6_csk_xmit+0xa1/0xf0
    [ 100.517335] tcp_sendmsg+0x27/0x40
    [ 100.517977] sock_sendmsg+0x3e/0x60
    [ 100.518648] __sys_sendto+0xee/0x160

    Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2")
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit afcaf61be9d1dbdee5ec186d1dcc67b6b692180f upstream.

    For beet mode, when it's ipv6 inner address with nexthdrs set,
    the packet format might be:

    ----------------------------------------------------
    | outer  |     | dest |     |      |   ESP   | ESP |
    | IP hdr | ESP | opts.| TCP | Data | Trailer | ICV |
    ----------------------------------------------------

    The nexthdr from ESP could be NEXTHDR_HOP(0), so it should
    continue processing the packet when nexthdr returns 0 in
    xfrm_input(). Otherwise, when ipv6 nexthdr is set, the
    packet will be dropped.

    I don't see any error case in which nexthdr may be 0, so
    fix it by removing the check for nexthdr == 0.
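
    The change in behaviour can be sketched as two predicates. This is an
    illustration, not the actual xfrm_input() code: both function names are
    hypothetical, and treating negative values as errors is an assumption.

```c
#include <stdbool.h>

#define NEXTHDR_HOP_VALUE 0 /* same value as NEXTHDR_HOP */

/* Before the fix (as described): nexthdr == 0 is treated like a failure. */
static bool continue_input_old(int nexthdr)
{
	return nexthdr > 0;
}

/* After the fix: only negative values (assumed to be errors) stop processing. */
static bool continue_input_new(int nexthdr)
{
	return nexthdr >= 0;
}
```

    A beet-mode packet whose ESP next header is NEXTHDR_HOP (0) is rejected
    by the first predicate and accepted by the second.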

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     

01 Apr, 2020

4 commits

  • commit 4c59406ed00379c8663f8663d82b2537467ce9d7 upstream.

    After xfrm_add_policy adds a policy, its ref is 2. Then:

                                 xfrm_policy_timer
                                   read_lock
                                   xp->walk.dead is 0
                                   ....
                                   mod_timer()
    xfrm_policy_kill
      policy->walk.dead = 1
      ....
      del_timer(&policy->timer)
        xfrm_pol_put //ref is 1
      xfrm_pol_put //ref is 0
        xfrm_policy_destroy
          call_rcu
                                     xfrm_pol_hold //ref is 1
                                   read_unlock
                                   xfrm_pol_put //ref is 0
                                     xfrm_policy_destroy
                                       call_rcu

    xfrm_policy_destroy is called twice, which may lead to a
    double free.

    Call Trace:
    RIP: 0010:refcount_warn_saturate+0x161/0x210
    ...
    xfrm_policy_timer+0x522/0x600
    call_timer_fn+0x1b3/0x5e0
    ? __xfrm_decode_session+0x2990/0x2990
    ? msleep+0xb0/0xb0
    ? _raw_spin_unlock_irq+0x24/0x40
    ? __xfrm_decode_session+0x2990/0x2990
    ? __xfrm_decode_session+0x2990/0x2990
    run_timer_softirq+0x5c5/0x10e0

    Fix this by using write_lock_bh in xfrm_policy_kill.

    Fixes: ea2dea9dacc2 ("xfrm: remove policy lock when accessing policy->walk.dead")
    Signed-off-by: YueHaibing
    Acked-by: Timo Teräs
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    YueHaibing
     
  • commit a1a7e3a36e01ca6e67014f8cf673cb8e47be5550 upstream.

    Without doing verify_sec_ctx_len() check in xfrm_add_acquire(), it may be
    out-of-bounds to access uctx->ctx_str with uctx->ctx_len, as noticed by
    syz:

    BUG: KASAN: slab-out-of-bounds in selinux_xfrm_alloc_user+0x237/0x430
    Read of size 768 at addr ffff8880123be9b4 by task syz-executor.1/11650

    Call Trace:
    dump_stack+0xe8/0x16e
    print_address_description.cold.3+0x9/0x23b
    kasan_report.cold.4+0x64/0x95
    memcpy+0x1f/0x50
    selinux_xfrm_alloc_user+0x237/0x430
    security_xfrm_policy_alloc+0x5c/0xb0
    xfrm_policy_construct+0x2b1/0x650
    xfrm_add_acquire+0x21d/0xa10
    xfrm_user_rcv_msg+0x431/0x6f0
    netlink_rcv_skb+0x15a/0x410
    xfrm_netlink_rcv+0x6d/0x90
    netlink_unicast+0x50e/0x6a0
    netlink_sendmsg+0x8ae/0xd40
    sock_sendmsg+0x133/0x170
    ___sys_sendmsg+0x834/0x9a0
    __sys_sendmsg+0x100/0x1e0
    do_syscall_64+0xe5/0x660
    entry_SYSCALL_64_after_hwframe+0x6a/0xdf

    So fix it by adding the missing verify_sec_ctx_len check there.

    Fixes: 980ebd25794f ("[IPSEC]: Sync series - acquire insert")
    Reported-by: Hangbin Liu
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit 171d449a028573b2f0acdc7f31ecbb045391b320 upstream.

    It's not sufficient to do 'uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
    uctx->ctx_len)' check only, as uctx->len may be greater than nla_len(rt),
    in which case it will cause slab-out-of-bounds when accessing uctx->ctx_str
    later.

    This patch fixes it by returning -EINVAL when uctx->len > nla_len(rt).
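
    The two length checks can be sketched together. This is a userspace
    illustration under assumptions: struct user_sec_ctx is a hypothetical,
    reduced stand-in for struct xfrm_user_sec_ctx, and attr_len plays the
    role of nla_len(rt).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced header: the context string would follow it. */
struct user_sec_ctx {
	uint16_t len;     /* total length claimed: header + ctx_len */
	uint16_t ctx_len; /* length of the context string */
};

static int verify_ctx_len(const struct user_sec_ctx *uctx, size_t attr_len)
{
	/* Added check: the claimed length must fit inside the attribute. */
	if (uctx->len > attr_len)
		return -EINVAL;

	/* Existing check: the header and string lengths must be consistent. */
	if (uctx->len != sizeof(*uctx) + uctx->ctx_len)
		return -EINVAL;

	return 0;
}
```

    A context that is internally consistent but larger than the attribute
    carrying it passes the second check alone, which is why the first check
    is needed.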

    Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit 03891f820c2117b19e80b370281eb924a09cf79f upstream.

    This patch handles the asynchronous unregister
    device event so that the device's IPsec offload resources
    can be released cleanly.

    Fixes: e4db5b61c572 ("xfrm: policy: remove pcpu policy cache")
    Signed-off-by: Raed Salem
    Reviewed-by: Boris Pismenny
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Raed Salem
     

06 Feb, 2020

2 commits

  • [ Upstream commit 8aaea2b0428b6aad7c7e22d3fddc31a78bb1d724 ]

    When doing an IPv6 tunnel PMTU update that ends in a call to
    __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh(), as there
    is no two-way communication.

    Signed-off-by: Xu Wang
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xu Wang
     
  • [ Upstream commit f042365dbffea98fb8148c98c700402e8d099f02 ]

    With an ebpf program that redirects packets through a xfrm interface,
    packets are dropped because no dst is attached to skb.

    This could also be reproduced with an AF_PACKET socket, with the following
    python script (xfrm1 is a xfrm interface):

    import socket
    send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0)
    # scapy
    # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request')
    # raw(p)
    req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00'
    send_s.sendto(req, ('xfrm1', 0x800, 0, 0))

    It was also not possible to send an ip packet through an AF_PACKET socket
    because a LL header was expected. Let's remove those LL header constraints.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Nicolas Dichtel
     

12 Nov, 2019

1 commit

  • An ESP packet could be decrypted in async mode if the input handler for
    this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
    reference in skb is held. Later xfrm_input() will be invoked again to
    resume the processing.
    If the transform state is still valid it would continue to release the
    device reference and there won't be a problem; however if the transform
    state is not valid when async resumption happens, the packet will be
    dropped while the device reference is still being held.
    When the device is deleted for some reason and the reference to this
    device is not properly released, the kernel will keep logging like:

    unregister_netdevice: waiting for ppp2 to become free. Usage count = 1

    The issue is observed when running IPsec traffic over a PPPoE device based
    on a bridge interface. After terminating the PPPoE connection on the server
    end multiple times, the PPPoE device on the client side will eventually
    get stuck on the above warning message.

    This patch will check the async mode first and continue to release device
    reference in async resumption, before it is dropped due to invalid state.

    v2: Do not assign address family from outer_mode in the transform if the
    state is invalid

    v3: Release device reference in the error path instead of jumping to resume

    Fixes: 4ce3dbe397d7b ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
    Signed-off-by: Xiaodong Xu
    Reported-by: Bo Chen
    Tested-by: Bo Chen
    Signed-off-by: Steffen Klassert

    Xiaodong Xu
     

07 Nov, 2019

1 commit

  • We leak the page that we use to create skb page fragments
    when destroying the xfrm_state. Fix this by dropping a
    page reference if a page was assigned to the xfrm_state.

    Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
    Reported-by: JD
    Reported-by: Paul Wouters
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Sep, 2019

1 commit


06 Sep, 2019

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2019-09-05

    1) Several xfrm interface fixes from Nicolas Dichtel:
    - Avoid an interface ID corruption on changelink.
    - Fix wrong interface names in the logs.
    - Fix a list corruption when changing network namespaces.
    - Fix unregistration of the underlying phydev.

    2) Fix a potential warning when merging xfrm_policy nodes.
    From Florian Westphal.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Aug, 2019

1 commit


25 Aug, 2019

1 commit

    In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
    e.g. with tunnel collect_md mode, which will cause a kernel crash.
    Here is what the code path looks like, for GRE:

    - ip6gre_tunnel_xmit
    - ip6gre_xmit_ipv6
    - __gre6_xmit
    - ip6_tnl_xmit
    - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
    - icmpv6_route_lookup
    - xfrm_decode_session_reverse
    - decode_session4
    - oif = skb_dst(skb)->dev->ifindex;

    The dst dev is set to NULL by default in this mode.
    We could not fix it in __metadata_dst_init() as there is no dev supplied.
    On the other hand, the skb_dst(skb)->dev is actually not needed as we
    called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
    used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

    So adding a dst dev check here should be clean and safe.

    v4: No changes.

    v3: No changes.

    v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
    in {ip_md, ip6}_tunnel_xmit.

    Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
    Signed-off-by: Hangbin Liu
    Tested-by: Jonathan Lemon
    Signed-off-by: David S. Miller

    Hangbin Liu
     

20 Aug, 2019

1 commit

  • syzbot reported a splat:
    xfrm_policy_inexact_list_reinsert+0x625/0x6e0 net/xfrm/xfrm_policy.c:877
    CPU: 1 PID: 6756 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #57
    Call Trace:
    xfrm_policy_inexact_node_reinsert net/xfrm/xfrm_policy.c:922 [inline]
    xfrm_policy_inexact_node_merge net/xfrm/xfrm_policy.c:958 [inline]
    xfrm_policy_inexact_insert_node+0x537/0xb50 net/xfrm/xfrm_policy.c:1023
    xfrm_policy_inexact_alloc_chain+0x62b/0xbd0 net/xfrm/xfrm_policy.c:1139
    xfrm_policy_inexact_insert+0xe8/0x1540 net/xfrm/xfrm_policy.c:1182
    xfrm_policy_insert+0xdf/0xce0 net/xfrm/xfrm_policy.c:1574
    xfrm_add_policy+0x4cf/0x9b0 net/xfrm/xfrm_user.c:1670
    xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2676
    netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2477
    xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2684
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x809/0x9a0 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0xa70/0xd30 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg net/socket.c:657 [inline]

    There is no reproducer, however, the warning can be reproduced
    by adding rules with ever smaller prefixes.

    The sanity check ("does the policy match the node") uses the prefix value
    of the node before it's updated to the smaller value.

    To fix this, update the prefix earlier. The bug has no impact on tree
    correctness, this is only to prevent a false warning.

    Reported-by: syzbot+8cc27ace5f6972910b31@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

31 Jul, 2019

1 commit


17 Jul, 2019

4 commits

  • With the current implementation, phydev cannot be removed:

    $ ip link add dummy type dummy
    $ ip link add xfrm1 type xfrm dev dummy if_id 1
    $ ip l d dummy
    kernel:[77938.465445] unregister_netdevice: waiting for dummy to become free. Usage count = 1

    Manage it as ip tunnels do, i.e. just keep the ifindex. Note that, as a
    side effect, the phydev is now optional.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Nicolas Dichtel
    Tested-by: Julien Floret
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     
  • dev_net(dev) is the netns of the device and xi->net is the link netns,
    where the device has been linked.
    changelink() must operate in the link netns to avoid a corruption of
    the xfrm lists.

    Note that xi->net and dev_net(xi->physdev) are always the same.

    Before the patch, the xfrmi lists may be corrupted and can later trigger a
    kernel panic.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Reported-by: Julien Floret
    Signed-off-by: Nicolas Dichtel
    Tested-by: Julien Floret
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     
  • The ifname is copied when the interface is created, but is never updated
    later. In fact, this property is used only in one error message, where the
    netdevice pointer is available, thus let's use it.

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     
  • The new parameters must not be stored in the netdev_priv() before
    validation, it may corrupt the interface. Note also that if data is NULL,
    only a memset() is done.

    $ ip link add xfrm1 type xfrm dev lo if_id 1
    $ ip link add xfrm2 type xfrm dev lo if_id 2
    $ ip link set xfrm1 type xfrm dev lo if_id 2
    RTNETLINK answers: File exists
    $ ip -d link list dev xfrm1
    5: xfrm1@lo: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 68 maxmtu 1500
    xfrm if_id 0x2 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

    => "if_id 0x2"
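
    The validate-then-commit ordering can be sketched in userspace. This is
    an illustration under assumptions: struct iface and change_if_id() are
    hypothetical, and the duplicate if_id check stands in for whatever
    validation changelink performs before returning "File exists".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct iface {
	uint32_t if_id;
};

/* Change the if_id of ifaces[idx]; on failure the old state is untouched. */
static int change_if_id(struct iface *ifaces, size_t n, size_t idx,
			uint32_t new_id)
{
	/* Validate against a local copy of the request, not stored state. */
	for (size_t i = 0; i < n; i++) {
		if (i != idx && ifaces[i].if_id == new_id)
			return -EEXIST; /* nothing has been stored yet */
	}

	ifaces[idx].if_id = new_id; /* commit only after validation passed */
	return 0;
}
```

    Replaying the commands above against this model, the rejected change
    leaves the first interface at if_id 1 instead of showing 0x2.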

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Nicolas Dichtel
    Tested-by: Julien Floret
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

09 Jul, 2019

1 commit


06 Jul, 2019

2 commits

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2019-07-05

    1) A lot of work to remove indirections from the xfrm code.
    From Florian Westphal.

    2) Fix a WARN_ON with ipv6 that triggered because of a
    forgotten break statement. From Florian Westphal.

    3) Remove xfrmi_init_net, it is not needed.
    From Li RongQing.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2019-07-05

    1) Fix xfrm selector prefix length validation for
    inter address family tunneling.
    From Anirudh Gupta.

    2) Fix a memleak in pfkey.
    From Jeremy Sowden.

    3) Fix SA selector validation to allow empty selectors again.
    From Nicolas Dichtel.

    4) Select crypto ciphers for xfrm_algo, this fixes some
    randconfig builds. From Arnd Bergmann.

    5) Remove a duplicated assignment in xfrm_bydst_resize.
    From Cong Wang.

    6) Fix a hlist corruption on hash rebuild.
    From Florian Westphal.

    7) Fix a memory leak when creating xfrm interfaces.
    From Nicolas Dichtel.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jul, 2019

2 commits

  • The following commands produce a backtrace and return an error but the xfrm
    interface is created (in the wrong netns):
    $ ip netns add foo
    $ ip netns add bar
    $ ip -n foo netns set bar 0
    $ ip -n foo link add xfrmi0 link-netnsid 0 type xfrm dev lo if_id 23
    RTNETLINK answers: Invalid argument
    $ ip -n bar link ls xfrmi0
    2: xfrmi0@lo: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00

    Here is the backtrace:
    [ 79.879174] WARNING: CPU: 0 PID: 1178 at net/core/dev.c:8172 rollback_registered_many+0x86/0x3c1
    [ 79.880260] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic ide_cd_mod ide_gd_mod cdrom ata_generic ata_piix libata scsi_mod 8139too piix psmouse i2c_piix4 ide_core 8139cp mii i2c_core floppy
    [ 79.883698] CPU: 0 PID: 1178 Comm: ip Not tainted 5.2.0-rc6+ #106
    [ 79.884462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    [ 79.885447] RIP: 0010:rollback_registered_many+0x86/0x3c1
    [ 79.886120] Code: 01 e8 d7 7d c6 ff 0f 0b 48 8b 45 00 4c 8b 20 48 8d 58 90 49 83 ec 70 48 8d 7b 70 48 39 ef 74 44 8a 83 d0 04 00 00 84 c0 75 1f 0b e8 61 cd ff ff 48 b8 00 01 00 00 00 00 ad de 48 89 43 70 66
    [ 79.888667] RSP: 0018:ffffc900015ab740 EFLAGS: 00010246
    [ 79.889339] RAX: ffff8882353e5700 RBX: ffff8882353e56a0 RCX: ffff8882353e5710
    [ 79.890174] RDX: ffffc900015ab7e0 RSI: ffffc900015ab7e0 RDI: ffff8882353e5710
    [ 79.891029] RBP: ffffc900015ab7e0 R08: ffffc900015ab7e0 R09: ffffc900015ab7e0
    [ 79.891866] R10: ffffc900015ab7a0 R11: ffffffff82233fec R12: ffffc900015ab770
    [ 79.892728] R13: ffffffff81eb7ec0 R14: ffff88822ed6cf00 R15: 00000000ffffffea
    [ 79.893557] FS: 00007ff350f31740(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
    [ 79.894581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 79.895317] CR2: 00000000006c8580 CR3: 000000022c272000 CR4: 00000000000006f0
    [ 79.896137] Call Trace:
    [ 79.896464] unregister_netdevice_many+0x12/0x6c
    [ 79.896998] __rtnl_newlink+0x6e2/0x73b
    [ 79.897446] ? __kmalloc_node_track_caller+0x15e/0x185
    [ 79.898039] ? pskb_expand_head+0x5f/0x1fe
    [ 79.898556] ? stack_access_ok+0xd/0x2c
    [ 79.899009] ? deref_stack_reg+0x12/0x20
    [ 79.899462] ? stack_access_ok+0xd/0x2c
    [ 79.899927] ? stack_access_ok+0xd/0x2c
    [ 79.900404] ? __module_text_address+0x9/0x4f
    [ 79.900910] ? is_bpf_text_address+0x5/0xc
    [ 79.901390] ? kernel_text_address+0x67/0x7b
    [ 79.901884] ? __kernel_text_address+0x1a/0x25
    [ 79.902397] ? unwind_get_return_address+0x12/0x23
    [ 79.903122] ? __cmpxchg_double_slab.isra.37+0x46/0x77
    [ 79.903772] rtnl_newlink+0x43/0x56
    [ 79.904217] rtnetlink_rcv_msg+0x200/0x24c

    In fact, each time an xfrm interface was created, a netdev was allocated
    by __rtnl_newlink()/rtnl_create_link() and then another one by
    xfrmi_newlink()/xfrmi_create(). Only the second one was registered, which
    is why the previous commands produce a backtrace: dev_change_net_namespace()
    was called on a netdev with reg_state set to NETREG_UNINITIALIZED (the
    first one).
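
    The double allocation can be illustrated with a small userspace toy
    model. This is only a sketch: the toy_* names are hypothetical, and it
    assumes that a namespace move simply refuses a device that was never
    registered. It is not the kernel code.

    ```c
    #include <errno.h>
    #include <stddef.h>

    /* Toy model, not kernel code: only the registration state matters here. */
    enum toy_reg_state { TOY_UNINITIALIZED, TOY_REGISTERED };

    struct toy_netdev {
        enum toy_reg_state reg_state;
        int netns_id;
    };

    /* Stand-in for registering a device: marks it as registered. */
    static int toy_register(struct toy_netdev *dev)
    {
        dev->reg_state = TOY_REGISTERED;
        return 0;
    }

    /* Stand-in for a namespace move (assumption: unregistered devices are refused). */
    static int toy_change_netns(struct toy_netdev *dev, int netns_id)
    {
        if (dev->reg_state != TOY_REGISTERED)
            return -EINVAL;
        dev->netns_id = netns_id;
        return 0;
    }

    /* Buggy pattern: ignore the device handed in by the core and register a second one. */
    static int toy_newlink_buggy(struct toy_netdev *core_dev,
                                 struct toy_netdev *private_dev)
    {
        (void)core_dev;               /* never registered, yet the core keeps using it */
        return toy_register(private_dev);
    }

    /* Fixed pattern: register the device the core already allocated. */
    static int toy_newlink_fixed(struct toy_netdev *core_dev)
    {
        return toy_register(core_dev);
    }
    ```

    In the buggy variant the caller still holds core_dev, so a later
    toy_change_netns() on it fails; in the fixed variant it succeeds.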

    CC: Lorenzo Colitti
    CC: Benedict Wong
    CC: Steffen Klassert
    CC: Shannon Nelson
    CC: Antony Antony
    CC: Eyal Birger
    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Reported-by: Julien Floret
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     
  • syzbot reported the following splat:

    BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221
    BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455
    BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
    Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066
    Workqueue: events xfrm_hash_rebuild
    Call Trace:
    __write_once_size include/linux/compiler.h:221 [inline]
    hlist_del_rcu include/linux/rculist.h:455 [inline]
    xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
    process_one_work+0x814/0x1130 kernel/workqueue.c:2269
    Allocated by task 8064:
    __kmalloc+0x23c/0x310 mm/slab.c:3669
    kzalloc include/linux/slab.h:742 [inline]
    xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
    xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
    xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
    ops_init+0x336/0x420 net/core/net_namespace.c:130
    setup_net+0x212/0x690 net/core/net_namespace.c:316

    The faulting address is the address of the old chain head,
    free'd by xfrm_hash_resize().

    In xfrm_hash_rebuild(), chain heads get re-initialized without
    any hlist_del_rcu:

    for (i = hmask; i >= 0; i--)
            INIT_HLIST_HEAD(odst + i);

    Then, hlist_del_rcu() gets called on the about-to-be-reinserted policy
    when iterating the per-net list of policies.

    hlist_del_rcu() will then make chain->first be nonzero again:

    static inline void __hlist_del(struct hlist_node *n)
    {
            struct hlist_node *next = n->next;    // address of next element in list
            struct hlist_node **pprev = n->pprev; // location of previous elem, this
                                                  // can point at chain->first
            WRITE_ONCE(*pprev, next);             // chain->first points to next elem
            if (next)
                    next->pprev = pprev;
    }

    Then, when we walk chainlist to find insertion point, we may find a
    non-empty list even though we're supposedly reinserting the first
    policy to an empty chain.

    To fix this, first unlink all exact and inexact policies instead of
    zeroing the list heads.
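
    The effect is easy to reproduce outside the kernel. The sketch below
    re-creates a minimal hlist in userspace (no RCU, no WRITE_ONCE; the two
    demo function names are hypothetical) and shows that deleting a stale
    node after zeroing the head makes the head non-empty again, while
    unlinking every node first leaves it empty.

    ```c
    #include <stddef.h>

    /* Minimal userspace re-creation of the kernel's hlist. */
    struct hlist_node { struct hlist_node *next, **pprev; };
    struct hlist_head { struct hlist_node *first; };

    static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
    {
        n->next = h->first;
        if (h->first)
            h->first->pprev = &n->next;
        h->first = n;
        n->pprev = &h->first;
    }

    static void hlist_del(struct hlist_node *n)
    {
        *n->pprev = n->next;              /* may write into a head that was zeroed */
        if (n->next)
            n->next->pprev = n->pprev;
    }

    /* Buggy order: zero the head, then unlink a node that still points at it.
     * Returns 1 if the supposedly empty head points at the second node again. */
    static int head_nonempty_after_buggy_reset(void)
    {
        struct hlist_head head = { NULL };
        struct hlist_node a, b;

        hlist_add_head(&b, &head);
        hlist_add_head(&a, &head);        /* head -> a -> b */

        head.first = NULL;                /* what INIT_HLIST_HEAD() does */
        hlist_del(&a);                    /* a.pprev still targets head.first */

        return head.first == &b;
    }

    /* Fixed order: unlink every node; the head empties itself.
     * Returns 1 if the head is empty afterwards. */
    static int head_empty_after_unlink_all(void)
    {
        struct hlist_head head = { NULL };
        struct hlist_node a, b;

        hlist_add_head(&b, &head);
        hlist_add_head(&a, &head);

        hlist_del(&a);
        hlist_del(&b);

        return head.first == NULL;
    }
    ```

    In the reported case the old head had additionally been freed, so the
    same write shows up as a use-after-free rather than a revived list.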

    Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh.
    Without the fix, KASAN catches the corruption as it happens; SLUB
    poisoning detects it a bit later.

    Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com
    Fixes: 1548bc4e0512 ("xfrm: policy: delete inexact policies from inexact list on hash rebuild")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

02 Jul, 2019

1 commit


01 Jul, 2019

1 commit


21 Jun, 2019

1 commit

  • kernelci.org reports failed builds on arc because of what looks
    like an old missed 'select' statement:

    net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs':
    xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash'

    I don't see this in randconfig builds on other architectures, but
    it's fairly clear we want to select the hash code for it, like we
    do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER
    is also required even though it has not popped up in build tests.

    Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms")
    Signed-off-by: Arnd Bergmann
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Arnd Bergmann
     

17 Jun, 2019

1 commit

  • After commit b38ff4075a80, the following command does not work anymore:
    $ ip xfrm state add src 10.125.0.2 dst 10.125.0.1 proto esp spi 34 reqid 1 \
    mode tunnel enc 'cbc(aes)' 0xb0abdba8b782ad9d364ec81e3a7d82a1 auth-trunc \
    'hmac(sha1)' 0xe26609ebd00acb6a4d51fca13e49ea78a72c73e6 96 flag align4

    In fact, the selector is not mandatory, so allow the user to provide an
    empty selector.
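
    A rough userspace sketch of such a check, assuming the validation bounds
    the prefix lengths per address family. The struct and function names are
    hypothetical and the error codes are illustrative, not taken from the
    patch.

    ```c
    #include <errno.h>
    #include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6 */

    /* Hypothetical, cut-down selector: only the fields the check needs. */
    struct toy_selector {
        unsigned short family;
        unsigned char  prefixlen_d;
        unsigned char  prefixlen_s;
    };

    /* Returns 0 if the selector is acceptable, a negative errno otherwise. */
    static int toy_check_selector(const struct toy_selector *sel)
    {
        switch (sel->family) {
        case AF_UNSPEC:
            /* Empty selector: nothing to validate, accept it. */
            return 0;
        case AF_INET:
            if (sel->prefixlen_d > 32 || sel->prefixlen_s > 32)
                return -EINVAL;
            return 0;
        case AF_INET6:
            if (sel->prefixlen_d > 128 || sel->prefixlen_s > 128)
                return -EINVAL;
            return 0;
        default:
            return -EINVAL;
        }
    }
    ```

    Without the AF_UNSPEC case, a state added with no selector at all would
    fall into the default branch and be rejected.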

    Fixes: b38ff4075a80 ("xfrm: Fix xfrm sel prefix length validation")
    CC: Anirudh Gupta
    Signed-off-by: Nicolas Dichtel
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

14 Jun, 2019

1 commit

  • Pointer members of an object with static storage duration, if not
    explicitly initialized, will be initialized to a NULL pointer. The
    net namespace API checks that this pointer is not NULL before using it,
    so it is safe to remove the function.
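
    As a plain C illustration (the toy_* names are hypothetical and only
    loosely shaped like per-namespace operations), an omitted hook in a
    static object is a null pointer, and a caller that tests the hook before
    invoking it treats the omission as "nothing to do":

    ```c
    #include <stddef.h>

    /* Hypothetical ops structure with two optional hooks. */
    struct toy_net_ops {
        int  (*init)(void);
        void (*exit)(void);
    };

    static void toy_exit(void)
    {
    }

    /* .init is deliberately omitted: with static storage it is a null pointer. */
    static struct toy_net_ops toy_ops = {
        .exit = toy_exit,
    };

    /* Caller that checks the hook before use, as the text says the API does. */
    static int toy_ops_init(const struct toy_net_ops *ops)
    {
        if (ops->init)
            return ops->init();
        return 0;
    }
    ```

    An init function with an empty body therefore adds nothing over leaving
    the member unset.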

    Signed-off-by: Li RongQing
    Signed-off-by: Steffen Klassert

    Li RongQing
     

12 Jun, 2019

1 commit

  • net/xfrm/xfrm_input.c:378:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
    skb->protocol = htons(ETH_P_IPV6);

    ... the fallthrough then causes a bogus WARN_ON().
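
    A userspace sketch of the pattern, assuming the switch ends in a default
    branch that warns. The function name and the `warned` flag are
    hypothetical stand-ins; the flag plays the role of WARN_ON().

    ```c
    #include <arpa/inet.h>    /* htons */
    #include <stdint.h>
    #include <sys/socket.h>   /* AF_INET, AF_INET6 */

    #define TOY_ETH_P_IP   0x0800
    #define TOY_ETH_P_IPV6 0x86DD

    /* Returns the ethernet protocol in network byte order for a family.
     * *warned is set to 1 only when the family is not handled. */
    static uint16_t toy_eth_proto(int family, int *warned)
    {
        uint16_t proto = 0;

        *warned = 0;
        switch (family) {
        case AF_INET:
            proto = htons(TOY_ETH_P_IP);
            break;
        case AF_INET6:
            proto = htons(TOY_ETH_P_IPV6);
            break;            /* without this break, control reaches default */
        default:
            *warned = 1;      /* stand-in for WARN_ON(1) */
            break;
        }
        return proto;
    }
    ```

    With the second break missing, every IPv6 packet would get the right
    protocol value and still trigger the warning.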

    Reported-by: Stephen Rothwell
    Fixes: 4c203b0454b ("xfrm: remove eth_proto value from xfrm_state_afinfo")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

06 Jun, 2019

2 commits

  • Only a handful of xfrm_types exist, no need to have 512 pointers for them.

    Reduces size of afinfo struct from 4k to 120 bytes on 64bit platforms.

    Also, the unregister function doesn't need to return an error, as no
    caller does anything useful with it.

    Just place a WARN_ON() where needed instead.
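
    The 4k figure follows from 512 pointers of 8 bytes each. The sketch
    below contrasts the two layouts; the struct and member names are
    hypothetical and the set of slots is only an example, not the real
    afinfo definition.

    ```c
    #include <stddef.h>

    struct toy_type;   /* opaque stand-in for a per-protocol handler */

    /* Assumed old layout: two tables indexed by the 8-bit protocol number. */
    struct toy_afinfo_old {
        const struct toy_type *type_map[256];
        const struct toy_type *type_offload_map[256];
    };

    /* Assumed new layout: one named slot per handler that actually exists. */
    struct toy_afinfo_new {
        const struct toy_type *esp;
        const struct toy_type *ah;
        const struct toy_type *comp;
        const struct toy_type *esp_offload;
    };

    /* IP protocol numbers for the slots above. */
    #define TOY_PROTO_ESP  50
    #define TOY_PROTO_AH   51
    #define TOY_PROTO_COMP 108

    /* Maps a protocol number to its slot, or NULL if there is none. */
    static const struct toy_type **toy_slot(struct toy_afinfo_new *afinfo,
                                            unsigned char proto)
    {
        switch (proto) {
        case TOY_PROTO_ESP:
            return &afinfo->esp;
        case TOY_PROTO_AH:
            return &afinfo->ah;
        case TOY_PROTO_COMP:
            return &afinfo->comp;
        default:
            return NULL;
        }
    }
    ```

    A lookup that used to index the table becomes a short switch, and an
    unknown protocol yields no slot instead of an empty table entry.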

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • xfrm_prepare_input needs to look up the state afinfo backend again to
    fetch the address family ethernet protocol value.

    There are only two address families, so a switch statement is simpler.
    While at it, use u8 for family and proto and remove the owner member --
    it's not used anywhere.

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal