31 Aug, 2022

4 commits

  • [ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ]

    While reading netdev_max_backlog, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    While at it, we remove the unnecessary spaces in the doc.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit 17ecd4a4db4783392edd4944f5e8268205083f70 ]

    When we try to transmit an skb with metadata_dst attached (i.e. dst->dev
    == NULL) through xfrm interface we can hit a null pointer dereference[1]
    in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a
    loopback skb device when there's no policy which dereferences dst->dev
    unconditionally. Not having dst->dev can be interpreted as it not being
    a loopback device, so just add a check for a null dst_orig->dev.
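
    The guarded check described above can be modelled roughly as follows.
    This is a toy Python sketch, not the kernel code: `Dev`, `Dst` and
    `is_loopback` are hypothetical names, and only the IFF_LOOPBACK value
    (0x8) is taken from the kernel headers.

```python
from dataclasses import dataclass
from typing import Optional

IFF_LOOPBACK = 0x8  # interface flag bit for loopback devices


@dataclass
class Dev:
    """Hypothetical stand-in for struct net_device (flags only)."""
    flags: int


@dataclass
class Dst:
    """Hypothetical stand-in for a dst entry."""
    dev: Optional[Dev]  # None models a metadata_dst with dst->dev == NULL


def is_loopback(dst: Dst) -> bool:
    """Treat a missing device as 'not loopback' instead of dereferencing it."""
    return dst.dev is not None and bool(dst.dev.flags & IFF_LOOPBACK)
```

    With a missing device the check simply reports "not loopback", so the
    default-policy handling proceeds instead of faulting.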

    With this fix xfrm interface's Tx error counters go up as usual.

    [1] net-next calltrace captured via netconsole:
    BUG: kernel NULL pointer dereference, address: 00000000000000c0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP
    CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
    RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60
    Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02
    RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80
    RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000
    R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880
    R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
    FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0
    Call Trace:

    xfrmi_xmit+0xde/0x460
    ? tcf_bpf_act+0x13d/0x2a0
    dev_hard_start_xmit+0x72/0x1e0
    __dev_queue_xmit+0x251/0xd30
    ip_finish_output2+0x140/0x550
    ip_push_pending_frames+0x56/0x80
    raw_sendmsg+0x663/0x10a0
    ? try_charge_memcg+0x3fd/0x7a0
    ? __mod_memcg_lruvec_state+0x93/0x110
    ? sock_sendmsg+0x30/0x40
    sock_sendmsg+0x30/0x40
    __sys_sendto+0xeb/0x130
    ? handle_mm_fault+0xae/0x280
    ? do_user_addr_fault+0x1e7/0x680
    ? kvm_read_and_reset_apf_flags+0x3b/0x50
    __x64_sys_sendto+0x20/0x30
    do_syscall_64+0x34/0x80
    entry_SYSCALL_64_after_hwframe+0x46/0xb0
    RIP: 0033:0x7ff35cac1366
    Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
    RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366
    RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003
    RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
    R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001

    Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole]
    CR2: 00000000000000c0

    CC: Steffen Klassert
    CC: Daniel Borkmann
    Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Nikolay Aleksandrov
     
  • [ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ]

    x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
    migrate.

    Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Antony Antony
     
  • [ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]

    The issue happens on an error path in __xfrm_policy_check(). When the
    fetching process of the object `pols[1]` fails, the function simply
    returns 0, forgetting to decrement the reference count of `pols[0]`,
    which is incremented earlier by either xfrm_sk_policy_lookup() or
    xfrm_policy_lookup(). This may result in memory leaks.

    Fix it by decreasing the reference count of `pols[0]` in that path.
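
    A minimal sketch of the leak and the fix, assuming a toy reference
    counter in place of struct xfrm_policy. `RefPolicy` and `policy_check`
    are hypothetical names used only for illustration.

```python
class RefPolicy:
    """Toy stand-in for struct xfrm_policy with only a reference count."""

    def __init__(self) -> None:
        self.refcnt = 1  # reference owned by the policy database

    def hold(self) -> None:
        self.refcnt += 1

    def put(self) -> None:
        self.refcnt -= 1


def policy_check(pol0: RefPolicy, second_lookup_ok: bool, fixed: bool) -> int:
    """Model the error path: return 0 when fetching pols[1] fails."""
    pol0.hold()  # reference taken by the first policy lookup
    if not second_lookup_ok:
        if fixed:
            pol0.put()  # the fix: drop pols[0] before bailing out
        return 0
    pol0.put()  # normal path releases the reference when done
    return 1
```

    Without the extra put() on the error path, the count never returns to
    its starting value and the policy can never be freed.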

    Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup")
    Signed-off-by: Xin Xiong
    Signed-off-by: Xin Tan
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Xiong
     

29 Jul, 2022

2 commits

  • [ Upstream commit 0968d2a441bf6afb551fd99e60fa65ed67068963 ]

    While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Kuniyuki Iwashima
     
  • [ Upstream commit f85daf0e725358be78dfd208dea5fd665d8cb901 ]

    xfrm_policy_lookup() calls xfrm_pol_hold_rcu() to take a reference on
    pols[0]. This reference can be dropped in xfrm_expand_policies() when
    xfrm_expand_policies() returns an error, at which point pols[0]'s
    refcount is balanced. But xfrm_bundle_lookup() will also call
    xfrm_pols_put() with num_pols == 1 to drop this reference when
    xfrm_expand_policies() returns an error, so it is dropped twice.

    This patch also fixes an illegal address access. pols[0] holds an error
    pointer when xfrm_policy_lookup() fails. This leads xfrm_pols_put() to
    dereference an illegal address in xfrm_bundle_lookup()'s error path.

    Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.

    Fixes: 80c802f3073e ("xfrm: cache bundles instead of policies for outgoing flows")
    Signed-off-by: Hangyu Hua
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Hangyu Hua
     

25 May, 2022

1 commit

  • [ Upstream commit b58b1f563ab78955d37e9e43e02790a85c66ac05 ]

    This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API
    complete"). The goal is to align userland API to the internal structures.

    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Antony Antony
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Nicolas Dichtel
     

08 Apr, 2022

1 commit

  • [ Upstream commit 4ff2980b6bd2aa6b4ded3ce3b7c0ccfab29980af ]

    In tunnel mode, if the MTU of the outer (IPv4) interface is small, the
    inner IPv6 MTU can easily drop below 1280. If so, a Packet Too Big ICMPv6
    message is received. When sending again, packets are fragmented at 1280,
    but they are still rejected with ICMPv6 (Packet Too Big) by xfrmi_xmit2().

    According to RFC4213 Section3.2.2:
    if (IPv4 path MTU - 20) is less than 1280
        if packet is larger than 1280 bytes
            Send ICMPv6 "packet too big" with MTU=1280
            Drop packet
        else
            Encapsulate but do not set the Don't Fragment
            flag in the IPv4 header. The resulting IPv4
            packet might be fragmented by the IPv4 layer
            on the encapsulator or by some router along
            the IPv4 path.
        endif
    else
        if packet is larger than (IPv4 path MTU - 20)
            Send ICMPv6 "packet too big" with
            MTU = (IPv4 path MTU - 20).
            Drop packet.
        else
            Encapsulate and set the Don't Fragment flag
            in the IPv4 header.
        endif
    endif
    Packets should be fragmentized with ipv4 outer interface, so change it.
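
    The RFC 4213 decision quoted above can be sketched as a small function.
    This models plain IPv6-in-IPv4 encapsulation with a 20-byte outer header
    only; ESP overhead is deliberately ignored, and `encapsulate_decision`
    and its return tuple are hypothetical.

```python
IPV6_MIN_MTU = 1280    # minimum link MTU every IPv6 path must support
IPV4_HEADER_LEN = 20   # outer IPv4 header without options


def encapsulate_decision(ipv4_path_mtu: int, packet_len: int):
    """Return (action, icmp_mtu, set_df) for one inner IPv6 packet.

    action   -- "drop" (after sending ICMPv6 Packet Too Big) or "encapsulate"
    icmp_mtu -- MTU reported in the ICMPv6 message, or None
    set_df   -- whether Don't Fragment is set on the outer header, or None
    """
    tunnel_mtu = ipv4_path_mtu - IPV4_HEADER_LEN
    if tunnel_mtu < IPV6_MIN_MTU:
        if packet_len > IPV6_MIN_MTU:
            return ("drop", IPV6_MIN_MTU, None)
        # Small enough for IPv6, so let the IPv4 layer fragment it.
        return ("encapsulate", None, False)
    if packet_len > tunnel_mtu:
        return ("drop", tunnel_mtu, None)
    return ("encapsulate", None, True)
```

    For an IPv4 path MTU of 1280, a 1280-byte inner packet is encapsulated
    without DF and left to IPv4 fragmentation, which is the behaviour this
    commit moves towards.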

    After it is fragmented with IPv4, there will be double fragmentation.
    No.48 & No.51 are IPv6 fragment packets; No.48 is fragmented a second
    time, then tunneled with IPv4 (No.49 & No.50), which obeys the spec. The
    receiving peer cannot decrypt it correctly.

    48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
    49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
    50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000)
    51 2002::10 2002::11 180 Echo (ping) request
    52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)

    xfrm6_noneed_fragment fixes the above issues. With it, the traffic looks like this:
    1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
    2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
    3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request

    Signed-off-by: Lina Wang
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Lina Wang
     

19 Mar, 2022

3 commits

  • [ Upstream commit e03c3bba351f99ad932e8f06baa9da1afc418e02 ]

    xfrm_migrate cannot handle address family change of an xfrm_state.
    The symptoms are that the xfrm_state will be migrated to a wrong address,
    and sending as well as receiving packets will be broken.

    This commit fixes it by breaking the original xfrm_state_clone
    method into two steps so as to update the props.family before
    running xfrm_init_state. As the result, xfrm_state's inner mode,
    outer mode, type and IP header length in xfrm_state_migrate can
    be updated with the new address family.

    Tested with additions to Android's kernel unit test suite:
    https://android-review.googlesource.com/c/kernel/tests/+/1885354

    Signed-off-by: Yan Yan
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Yan Yan
     
  • [ Upstream commit c1aca3080e382886e2e58e809787441984a2f89b ]

    This patch enables distinguishing SAs and SPs based on if_id during
    the xfrm_migrate flow. This ensures support for xfrm interfaces
    throughout the SA/SP lifecycle.

    When there are multiple existing SPs with the same direction,
    the same xfrm_selector and different endpoint addresses,
    xfrm_migrate might fail with ENODATA.

    Specifically, the code path for performing xfrm_migrate is:
    Stage 1: find policy to migrate with
    xfrm_migrate_policy_find(sel, dir, type, net)
    Stage 2: find and update state(s) with
    xfrm_migrate_state_find(mp, net)
    Stage 3: update endpoint address(es) of template(s) with
    xfrm_policy_migrate(pol, m, num_migrate)

    Currently "Stage 1" always returns the first xfrm_policy that
    matches, and "Stage 3" looks for the xfrm_tmpl that matches the
    old endpoint address. Thus if there are multiple xfrm_policy
    with same selector, direction, type and net, "Stage 1" might
    return a wrong xfrm_policy and "Stage 3" will fail with ENODATA
    because it cannot find an xfrm_tmpl with the matching endpoint
    address.

    The fix is to allow userspace to pass an if_id and add if_id
    to the matching rule in Stage 1 and Stage 2 since if_id is a
    unique ID for xfrm_policy and xfrm_state. For compatibility,
    if_id will only be checked if the attribute is set.
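
    A rough model of the Stage 1 lookup with the optional if_id match, as
    described in this message. `MigPolicy` and `find_policy_to_migrate` are
    hypothetical names, and using None for "attribute not set" is an
    assumption of this sketch, not the kernel's representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MigPolicy:
    """Toy policy: selector and direction plus if_id and template endpoint."""
    selector: str
    direction: str
    if_id: int
    endpoint: str


def find_policy_to_migrate(policies: List[MigPolicy], selector: str,
                           direction: str,
                           if_id: Optional[int] = None) -> Optional[MigPolicy]:
    """Return the first policy matching selector and direction.

    if_id is only compared when the caller supplied it, which keeps the
    old first-match behaviour for userspace that does not pass it.
    """
    for pol in policies:
        if pol.selector != selector or pol.direction != direction:
            continue
        if if_id is not None and pol.if_id != if_id:
            continue
        return pol
    return None
```

    With two policies that share a selector and direction, passing if_id
    selects the intended one instead of whichever happens to come first.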

    Tested with additions to Android's kernel unit test suite:
    https://android-review.googlesource.com/c/kernel/tests/+/1668886

    Signed-off-by: Yan Yan
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Yan Yan
     
  • commit a3d9001b4e287fc043e5539d03d71a32ab114bcb upstream.

    This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID
    0 was meant to be used for configuring the policy/state without
    matching for a specific interface (e.g., Cilium is affected, see
    https://github.com/cilium/cilium/pull/18789 and
    https://github.com/cilium/cilium/pull/19019).

    Signed-off-by: Kai Lueke
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Kai Lueke
     

09 Mar, 2022

3 commits

  • commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.

    This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.

    Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
    should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
    calculation in ipsec transport mode, resulting in complete stalls of TCP
    connections. This happens when the (P)MTU is 1280 or slightly larger.

    The desired formula for the MSS is:
    MSS = (MTU - ESP_overhead) - IP header - TCP header

    However, the above commit clamps the (MTU - ESP_overhead) to a
    minimum of 1280, turning the formula into
    MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header

    With the (P)MTU near 1280, the calculated MSS is too large and the
    resulting TCP packets never make it to the destination because they
    are over the actual PMTU.
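
    A worked example of the two formulas above, as a sketch. The 36-byte ESP
    overhead is an assumed figure chosen for illustration (real overhead
    depends on the algorithms and padding), and the header sizes assume IPv6
    (40 bytes) and TCP without options (20 bytes).

```python
IPV6_HEADER = 40     # fixed IPv6 header
TCP_HEADER = 20      # TCP header without options
IPV6_MIN_MTU = 1280  # the clamp value used by the reverted commit


def mss_desired(mtu: int, esp_overhead: int) -> int:
    """MSS = (MTU - ESP_overhead) - IP header - TCP header."""
    return (mtu - esp_overhead) - IPV6_HEADER - TCP_HEADER


def mss_clamped(mtu: int, esp_overhead: int) -> int:
    """MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header."""
    return max(mtu - esp_overhead, IPV6_MIN_MTU) - IPV6_HEADER - TCP_HEADER


def wire_size(mss: int, esp_overhead: int) -> int:
    """Size of a full-MSS segment once headers and ESP are added back."""
    return mss + TCP_HEADER + IPV6_HEADER + esp_overhead


ASSUMED_ESP_OVERHEAD = 36  # hypothetical value, for illustration only
```

    With a PMTU of 1280 and the assumed overhead, the desired formula gives
    an MSS of 1184 and a 1280-byte packet on the wire. The clamped formula
    gives an MSS of 1220 and a 1316-byte packet, which exceeds the PMTU.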

    The above commit also causes suboptimal double fragmentation in
    xfrm tunnel mode, as described in
    https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/

    The original problem the above commit was trying to fix is now fixed
    by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
    regression").

    Signed-off-by: Jiri Bohac
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Jiri Bohac
     
  • commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream.

    struct xfrm_user_offload has a flags variable that receives user input,
    but the kernel didn't check whether valid bits were provided. As a result,
    unsanitized input was forwarded directly to the drivers.

    For example, the XFRM_OFFLOAD_IPV6 define was exposed and used by
    strongswan, but not implemented in the kernel at all.

    As a solution, check and sanitize input flags to forward
    XFRM_OFFLOAD_INBOUND to the drivers.
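
    One way to picture the check-and-sanitize step, as a sketch only. It
    assumes that unknown bits are rejected with EINVAL and that the exposed
    but unimplemented XFRM_OFFLOAD_IPV6 bit is tolerated and then dropped;
    `sanitize_offload_flags` is a hypothetical name.

```python
import errno

XFRM_OFFLOAD_IPV6 = 1     # exposed in uapi, not implemented by the kernel
XFRM_OFFLOAD_INBOUND = 2  # the only bit drivers should ever see


def sanitize_offload_flags(flags: int) -> int:
    """Return the flags to forward to the driver, or raise on unknown bits."""
    known = XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND
    if flags & ~known:
        raise OSError(errno.EINVAL, "unknown offload flag bits")
    # Strip everything except the bit that drivers actually implement.
    return flags & XFRM_OFFLOAD_INBOUND
```

    A caller that sets both known bits therefore reaches the driver with
    XFRM_OFFLOAD_INBOUND only.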

    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Leon Romanovsky
     
  • commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream.

    if_id will always be 0, because it was not yet initialized.

    Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error")
    Reported-by: Pavel Machek
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Antony Antony
     

27 Jan, 2022

7 commits

  • commit 23e7b1bfed61e301853b5e35472820d919498278 upstream.

    Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"),
    clear the ECN bits from iph->tos when setting ->flowi4_tos.
    This ensures that the last bit of ->flowi4_tos is cleared, so
    ip_route_output_key_hash() isn't going to restrict the scope of the
    route lookup.

    Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
    to clear the high order bits.
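
    The difference between the two masks can be seen on a single TOS byte.
    This is an illustrative sketch with hypothetical helper names; the mask
    values are the usual ones (INET_ECN_MASK is 0x03, IPTOS_RT_MASK is 0x1C).

```python
INET_ECN_MASK = 0x03  # low two bits of the TOS byte carry ECN
IPTOS_RT_MASK = 0x1C  # legacy routing TOS bits only


def tos_strip_ecn(tos: int) -> int:
    """Clear only the ECN bits, keeping the high-order (DSCP) bits."""
    return tos & ~INET_ECN_MASK & 0xFF


def tos_rt_mask(tos: int) -> int:
    """Legacy masking: also discards the three high-order bits."""
    return tos & IPTOS_RT_MASK


EXAMPLE_TOS = 0xB9  # DSCP EF (0xB8) with the lowest ECN bit set
```

    For 0xB9, stripping ECN yields 0xB8 while IPTOS_RT_MASK yields 0x18. Both
    clear the last bit, but only the former keeps the high-order bits.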

    Found by code inspection, compile tested only.

    Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • commit bcf141b2eb551b3477b24997ebc09c65f117a803 upstream.

    On egress side, xfrm lookup is called from __gre6_xmit() with the
    fl6_gre_key field not initialized leading to policies selectors check
    failure. Consequently, gre packets are sent without encryption.

    On ingress side, INET6_PROTO_NOPOLICY was set, thus packets were not
    checked against xfrm policies. Like for egress side, fl6_gre_key should be
    correctly set, this is now done in decode_session6().

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ghalem Boudour
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Ghalem Boudour
     
  • [ Upstream commit 4e484b3e969b52effd95c17f7a86f39208b2ccf4 ]

    The kernel generates a mapping change message, XFRM_MSG_MAPPING,
    when a source port change is detected on an input state with UDP
    encapsulation set. The kernel generates a message for each IPsec packet
    with a new source port. For a high-speed flow, per-packet mapping change
    messages can be excessive and can overload the user space listener.

    Introduce rate limiting for XFRM_MSG_MAPPING message to the user space.

    The rate limiting is configurable via netlink, when adding a new SA or
    updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds.
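
    The idea of a per-SA threshold can be sketched with a toy limiter. This
    is not the kernel logic: `MappingNotifier` and its fields are
    hypothetical, and it simply suppresses repeat notifications for the same
    new source port until the threshold (in seconds) has elapsed.

```python
from typing import Optional


class MappingNotifier:
    """Toy per-SA rate limiter for mapping change notifications."""

    def __init__(self, mtimer_thresh_s: int) -> None:
        self.thresh = mtimer_thresh_s             # threshold in seconds
        self.reported_sport: Optional[int] = None  # last port reported
        self.reported_at = 0                       # time of last report

    def should_notify(self, now_s: int, new_sport: int) -> bool:
        """Return True if userspace should be told about new_sport now."""
        same_port = new_sport == self.reported_sport
        if same_port and now_s - self.reported_at < self.thresh:
            return False  # already reported recently, stay quiet
        self.reported_sport = new_sport
        self.reported_at = now_s
        return True
```

    A burst of packets from one new port then produces a single message per
    threshold interval instead of one per packet.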

    v1->v2 change:
    update xfrm_sa_len()

    v2->v3 changes:
    use u32 instead of unsigned long to reduce size of struct xfrm_state
    fix xfrm_compat size (Reported-by: kernel test robot)
    accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present

    Co-developed-by: Thomas Egerer
    Signed-off-by: Thomas Egerer
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Antony Antony
     
  • [ Upstream commit 45a98ef4922def8c679ca7c454403d1957fe70e7 ]

    The inner_ipproto saves the inner IP protocol of the plain
    text packet. This allows a vendor's IPsec feature to make the offload
    decision at the skb's features_check and to configure hardware at
    ndo_start_xmit. The current code implementation did not handle the
    case where IPsec is used in tunnel mode.

    Fix by handling the case when IPsec is used in tunnel mode by
    reading the IP protocol of the plain text packet.

    Fixes: fa4535238fb5 ("net/xfrm: Add inner_ipproto into sec_path")
    Signed-off-by: Raed Salem
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Raed Salem
     
  • [ Upstream commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 ]

    The xfrm interface does not allow xfrm if_id = 0, so
    fail to create or update an xfrm state or policy with it.

    With this commit:
    ip xfrm policy add src 192.0.2.1 dst 192.0.2.2 dir out if_id 0
    RTNETLINK answers: Invalid argument

    ip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 1 \
    reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \
    0x1111111111111111111111111111111111111111 96 if_id 0
    RTNETLINK answers: Invalid argument

    v1->v2 change:
    - add Fixes: tag

    Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces")
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Antony Antony
     
  • [ Upstream commit 8dce43919566f06e865f7e8949f5c10d8c2493f5 ]

    xfrm interface if_id = 0 would cause xfrm policy lookup errors since
    Commit 9f8550e4bd9d.

    Now explicitly fail to create an xfrm interface when if_id = 0

    With this commit:
    ip link add ipsec0 type xfrm dev lo if_id 0
    Error: if_id must be non zero.

    v1->v2 change:
    - add Fixes: tag

    Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces")
    Signed-off-by: Antony Antony
    Reviewed-by: Eyal Birger
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Antony Antony
     
  • [ Upstream commit 7770a39d7c63faec6c4f33666d49a8cb664d0482 ]

    copy_user_offload() will actually push a struct xfrm_user_offload,
    which is different from (struct xfrm_state *)->xso
    (struct xfrm_state_offload)

    Fixes: d77e38e612a01 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Eric Dumazet
    Cc: Steffen Klassert
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Eric Dumazet
     

23 Sep, 2021

1 commit

  • As stated in the comment above xfrm_nlmsg_multicast(), rcu read lock must
    be held before calling this function.

    Reported-by: syzbot+3d9866419b4aa8f985d6@syzkaller.appspotmail.com
    Fixes: 703b94b93c19 ("xfrm: notify default policy on update")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

15 Sep, 2021

2 commits

  • This configuration knob is very sensitive, so a notification should be
    sent when it changes.

    Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     
  • From a userland POV, this API was based on some magic values:
    - dirmask and action were bitfields but the meaning of the bits
    (XFRM_POL_DEFAULT_*) is not exported;
    - action is confusing, if a bit is set, does it mean drop or accept?

    Let's try to simplify this uapi by using explicit field and macros.

    Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

09 Sep, 2021

1 commit

  • Syzbot hit shift-out-of-bounds in xfrm_get_default. The problem was in
    missing validation check for user data.

    up->dirmask comes from user-space, so we need to check if this value
    is less than XFRM_USERPOLICY_DIRMASK_MAX to avoid shift-out-of-bounds bugs.

    Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
    Reported-and-tested-by: syzbot+b2be9dd8ca6f6c73ee2d@syzkaller.appspotmail.com
    Signed-off-by: Pavel Skripkin
    Signed-off-by: Steffen Klassert

    Pavel Skripkin
     

27 Aug, 2021

1 commit

  • ipsec-next

    Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2021-08-27

    1) Remove an unneeded extra variable in esp4 esp_ssg_unref.
    From Corey Minyard.

    2) Add a configuration option to change the default behaviour
    to block traffic if there is no matching policy.
    Joint work with Christian Langrock and Antony Antony.

    3) Fix a shift-out-of-bounds bug reported from syzbot.
    From Pavel Skripkin.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Aug, 2021

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2021-08-04

    1) Fix a syzbot reported memory leak in xfrm_user_rcv_msg.
    From Pavel Skripkin.

    2) Revert "xfrm: policy: Read seqcount outside of rcu-read side
    in xfrm_policy_lookup_bytype". This commit tried to fix a
    locking bug, but only cured some of the symptoms. A proper
    fix is applied on top of this revert.

    3) Fix a locking bug on xfrm state hash resize. A recent change
    on sequence counters accidentally replaced a spinlock by a mutex.
    Fix from Frederic Weisbecker.

    4) Fix possible user-memory-access in xfrm_user_rcv_msg_compat().
    From Dmitry Safonov.

    5) Add initialization selftest for xfrm_spdattr_type_t.
    From Dmitry Safonov.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Jul, 2021

1 commit

  • We need to check up->dirmask to avoid a shift-out-of-bounds bug,
    since up->dirmask comes from userspace.

    Also, add the XFRM_USERPOLICY_DIRMASK_MAX constant to uapi to inform
    user-space that up->dirmask has a maximum possible value.

    Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy")
    Reported-and-tested-by: syzbot+9cd5837a045bbee5b810@syzkaller.appspotmail.com
    Signed-off-by: Pavel Skripkin
    Signed-off-by: Steffen Klassert

    Pavel Skripkin
     

26 Jul, 2021

1 commit


21 Jul, 2021

2 commits

  • The attribute-translator has to keep maxtype in mind, that is
    xfrm_link::nla_max. When it is set, attributes are not of xfrm_attr_type_t.
    Currently, they can be only XFRMA_SPD_MAX (message XFRM_MSG_NEWSPDINFO),
    their UABI is the same for 64/32-bit, so just copy them.

    Thanks to YueHaibing for reporting this:
    In xfrm_user_rcv_msg_compat() if maxtype is not zero and less than
    XFRMA_MAX, nlmsg_parse_deprecated() does not initialize attrs array fully.
    xfrm_xlate32() will access uninit 'attrs[i]' while iterating over the attrs
    array.

    KASAN: probably user-memory-access in range [0x0000000041b58ab0-0x0000000041b58ab7]
    CPU: 0 PID: 15799 Comm: syz-executor.2 Tainted: G W 5.14.0-rc1-syzkaller #0
    RIP: 0010:nla_type include/net/netlink.h:1130 [inline]
    RIP: 0010:xfrm_xlate32_attr net/xfrm/xfrm_compat.c:410 [inline]
    RIP: 0010:xfrm_xlate32 net/xfrm/xfrm_compat.c:532 [inline]
    RIP: 0010:xfrm_user_rcv_msg_compat+0x5e5/0x1070 net/xfrm/xfrm_compat.c:577
    [...]
    Call Trace:
    xfrm_user_rcv_msg+0x556/0x8b0 net/xfrm/xfrm_user.c:2774
    netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
    xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2824
    netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
    netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
    netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
    sock_sendmsg_nosec net/socket.c:702 [inline]

    Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
    Cc:
    Reported-by: YueHaibing
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Steffen Klassert

    Dmitry Safonov
     
  • As the default we assume the traffic to pass, if we have no
    matching IPsec policy. With this patch, we have a possibility to
    change this default from allow to block. It can be configured
    via netlink. Each direction (input/output/forward) can be
    configured separately. With the default to block configured,
    we need allow policies for all packet flows we accept.
    We do not use default policy lookup for the loopback device.

    v1->v2
    - fix compiling when XFRM is disabled
    - Reported-by: kernel test robot

    Co-developed-by: Christian Langrock
    Signed-off-by: Christian Langrock
    Co-developed-by: Antony Antony
    Signed-off-by: Antony Antony
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

02 Jul, 2021

2 commits

  • xfrm_bydst_resize() calls synchronize_rcu() while holding
    hash_resize_mutex. But then on PREEMPT_RT configurations,
    xfrm_policy_lookup_bytype() may acquire that mutex while running in an
    RCU read side critical section. This results in a deadlock.

    In fact the scope of hash_resize_mutex is way beyond the purpose of
    xfrm_policy_lookup_bytype() to just fetch a coherent and stable policy
    for a given destination/direction, along with other details.

    The lower level net->xfrm.xfrm_policy_lock, which among other things
    protects per destination/direction references to policy entries, is
    enough to serialize and benefit from priority inheritance against the
    write side. As a bonus, it makes it officially a per network namespace
    synchronization business where a policy table resize on namespace A
    shouldn't block a policy lookup on namespace B.

    Fixes: 77cc278f7b20 (xfrm: policy: Use sequence counters with associated lock)
    Cc: stable@vger.kernel.org
    Cc: Ahmed S. Darwish
    Cc: Peter Zijlstra (Intel)
    Cc: Varad Gautam
    Cc: Steffen Klassert
    Cc: Herbert Xu
    Cc: David S. Miller
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steffen Klassert

    Frederic Weisbecker
     
  • This reverts commit d7b0408934c749f546b01f2b33d07421a49b6f3e.

    This commit tried to fix a locking bug introduced by commit 77cc278f7b20
    ("xfrm: policy: Use sequence counters with associated lock"). As it
    turned out, this patch did not really fix the bug. A proper fix
    for this bug is applied on top of this revert.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

01 Jul, 2021

2 commits

  • Pull networking updates from Jakub Kicinski:
    "Core:

    - BPF:
    - add syscall program type and libbpf support for generating
    instructions and bindings for in-kernel BPF loaders (BPF loaders
    for BPF), this is a stepping stone for signed BPF programs
    - infrastructure to migrate TCP child sockets from one listener to
    another in the same reuseport group/map to improve flexibility
    of service hand-off/restart
    - add broadcast support to XDP redirect

    - allow bypass of the lockless qdisc to improve performance (for
    pktgen: +23% with one thread, +44% with 2 threads)

    - add a simpler version of "DO_ONCE()" which does not require jump
    labels, intended for slow-path usage

    - virtio/vsock: introduce SOCK_SEQPACKET support

    - add getsockopt to retrieve netns cookie

    - ip: treat lowest address of an IPv4 subnet as ordinary unicast
    address allowing reclaiming of precious IPv4 addresses

    - ipv6: use prandom_u32() for ID generation

    - ip: add support for more flexible field selection for hashing
    across multi-path routes (w/ offload to mlxsw)

    - icmp: add support for extended RFC 8335 PROBE (ping)

    - seg6: add support for SRv6 End.DT46 behavior

    - mptcp:
    - DSS checksum support (RFC 8684) to detect middlebox meddling
    - support Connection-time 'C' flag
    - time stamping support

    - sctp: packetization Layer Path MTU Discovery (RFC 8899)

    - xfrm: speed up state addition with seq set

    - WiFi:
    - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
    - aggregation handling improvements for some drivers
    - minstrel improvements for no-ack frames
    - deferred rate control for TXQs to improve reaction times
    - switch from round robin to virtual time-based airtime scheduler

    - add trace points:
    - tcp checksum errors
    - openvswitch - action execution, upcalls
    - socket errors via sk_error_report

    Device APIs:

    - devlink: add rate API for hierarchical control of max egress rate
    of virtual devices (VFs, SFs etc.)

    - don't require RCU read lock to be held around BPF hooks in NAPI
    context

    - page_pool: generic buffer recycling

    New hardware/drivers:

    - mobile:
    - iosm: PCIe Driver for Intel M.2 Modem
    - support for Qualcomm MSM8998 (ipa)

    - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

    - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

    - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

    - NXP SJA1110 Automotive Ethernet 10-port switch

    - Qualcomm QCA8327 switch support (qca8k)

    - Mikrotik 10/25G NIC (atl1c)

    Driver changes:

    - ACPI support for some MDIO, MAC and PHY devices from Marvell and
    NXP (our first foray into MAC/PHY description via ACPI)

    - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

    - Mellanox/Nvidia NIC (mlx5)
    - NIC VF offload of L2 bridging
    - support IRQ distribution to Sub-functions

    - Marvell (prestera):
    - add flower and match all
    - devlink trap
    - link aggregation

    - Netronome (nfp): connection tracking offload

    - Intel 1GE (igc): add AF_XDP support

    - Marvell DPU (octeontx2): ingress ratelimit offload

    - Google vNIC (gve): new ring/descriptor format support

    - Qualcomm mobile (rmnet & ipa): inline checksum offload support

    - MediaTek WiFi (mt76)
    - mt7915 MSI support
    - mt7915 Tx status reporting
    - mt7915 thermal sensors support
    - mt7921 decapsulation offload
    - mt7921 enable runtime pm and deep sleep

    - Realtek WiFi (rtw88)
    - beacon filter support
    - Tx antenna path diversity support
    - firmware crash information via devcoredump

    - Qualcomm WiFi (wcn36xx)
    - Wake-on-WLAN support with magic packets and GTK rekeying

    - Micrel PHY (ksz886x/ksz8081): add cable test support"

    * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
    tcp: change ICSK_CA_PRIV_SIZE definition
    tcp_yeah: check struct yeah size at compile time
    gve: DQO: Fix off by one in gve_rx_dqo()
    stmmac: intel: set PCI_D3hot in suspend
    stmmac: intel: Enable PHY WOL option in EHL
    net: stmmac: option to enable PHY WOL with PMT enabled
    net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
    net: use netdev_info in ndo_dflt_fdb_{add,del}
    ptp: Set lookup cookie when creating a PTP PPS source.
    net: sock: add trace for socket errors
    net: sock: introduce sk_error_report
    net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
    net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
    net: dsa: include fdb entries pointing to bridge in the host fdb list
    net: dsa: include bridge addresses which are local in the host fdb list
    net: dsa: sync static FDB entries on foreign interfaces to hardware
    net: dsa: install the host MDB and FDB entries in the master's RX filter
    net: dsa: reference count the FDB addresses at the cross-chip notifier level
    net: dsa: introduce a separate cross-chip notifier type for host FDBs
    net: dsa: reference count the MDB entries at the cross-chip notifier level
    ...

    Linus Torvalds
     
  • Pull SELinux updates from Paul Moore:

    - The slow_avc_audit() function is now non-blocking so we can remove
    the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of
    avc_has_perm().

    - Use kmemdup() instead of kcalloc()+copy when copying parts of the
    SELinux policydb.

    - The InfiniBand device name is now passed by reference when possible
    in the SELinux code, removing a strncpy().

    - Minor cleanups including: constification of avtab function args,
    removal of useless LSM/XFRM function args, SELinux kdoc fixes, and
    removal of redundant assignments.

    * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
    selinux: slow_avc_audit has become non-blocking
    selinux: Fix kernel-doc
    selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
    lsm_audit,selinux: pass IB device name by reference
    selinux: Remove redundant assignment to rc
    selinux: Corrected comment to match kernel-doc comment
    selinux: delete selinux_xfrm_policy_lookup() useless argument
    selinux: constify some avtab function arguments
    selinux: simplify duplicate_policydb_cond_list() by using kmemdup()

    Linus Torvalds
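
The kmemdup() item above replaces a zeroing allocation plus a full copy with a single allocate-and-copy step. A minimal userspace sketch of the two patterns follows; the struct and both helper names are hypothetical stand-ins, and malloc()/calloc() stand in for the kernel allocators, so this only illustrates the idea and is not the SELinux code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a policydb element; not the real SELinux struct. */
struct cond_node_sketch {
    int cur_state;
    unsigned int expr_len;
};

/* Old pattern: zero the new array, then overwrite every byte with a copy. */
static struct cond_node_sketch *
dup_calloc_copy(const struct cond_node_sketch *src, size_t n)
{
    struct cond_node_sketch *dst = calloc(n, sizeof(*dst));

    if (!dst)
        return NULL;
    memcpy(dst, src, n * sizeof(*dst));
    return dst;
}

/* Userspace analogue of kmemdup(): allocate without zeroing, copy once. */
static void *memdup_sketch(const void *src, size_t len)
{
    void *dst;

    assert(src != NULL);
    dst = malloc(len);
    if (dst)
        memcpy(dst, src, len);
    return dst;
}
```

Both helpers return an equivalent copy; the second skips the redundant zeroing and is shorter at the call site.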
     

30 Jun, 2021

1 commit


29 Jun, 2021

2 commits

  • Syzbot reported a memory leak in xfrm_user_rcv_msg(). The
    problem was the skb's frag_list, which was never freed.

    In skb_release_all(), skb_release_data() is called only
    if skb->head != NULL, but netlink_skb_destructor()
    sets head to NULL. So the allocated frag_list skb must be
    freed manually, since consume_skb() won't take care of it.

    Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
    Reported-and-tested-by: syzbot+fb347cf82c73a90efcca@syzkaller.appspotmail.com
    Signed-off-by: Pavel Skripkin
    Signed-off-by: Steffen Klassert

    Pavel Skripkin
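
The leak described above can be modelled in userspace. The sketch below is a simplified assumption-laden model, not kernel code: struct skb_sketch, the live_skbs counter, and every function name are hypothetical, and the release logic only mirrors what the commit message says (a destructor clears head, so the data release that would free frag_list is skipped).

```c
#include <stdlib.h>

struct skb_sketch {
    unsigned char *head;                          /* data buffer */
    struct skb_sketch *frag_list;                 /* chained buffer */
    void (*destructor)(struct skb_sketch *skb);   /* optional hook */
};

static int live_skbs;   /* buffers allocated and not yet freed */

static struct skb_sketch *skb_alloc_sketch(void)
{
    struct skb_sketch *skb = calloc(1, sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(16);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->frag_list = NULL;
    skb->destructor = NULL;
    live_skbs++;
    return skb;
}

/* Models a destructor that takes over the head buffer and clears it. */
static void netlink_like_destructor(struct skb_sketch *skb)
{
    free(skb->head);
    skb->head = NULL;
}

/* Models the release path: data (and frag_list) is freed only if head is set. */
static void skb_free_sketch(struct skb_sketch *skb)
{
    if (!skb)
        return;
    if (skb->destructor)
        skb->destructor(skb);
    if (skb->head) {
        skb_free_sketch(skb->frag_list);
        free(skb->head);
    }
    free(skb);
    live_skbs--;
}

/* The fix in spirit: free frag_list by hand before the normal release. */
static void skb_free_with_frag_list(struct skb_sketch *skb)
{
    skb_free_sketch(skb->frag_list);
    skb->frag_list = NULL;
    skb_free_sketch(skb);
}
```

Calling skb_free_sketch() on a buffer whose destructor clears head leaves its frag_list allocated; skb_free_with_frag_list() releases both.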
     
  • /klassert/ipsec-next

    Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2021-06-28

    1) Remove an unneeded error assignment in esp4_gro_receive().
    From Yang Li.

    2) Add a new byseq state hashtable to find acquire states faster.
    From Sabrina Dubroca.

    3) Remove some unnecessary variables in pfkey_create().
    From zuoqilin.

    4) Remove the unused description from xfrm_type struct.
    From Florian Westphal.

    5) Fix a spelling mistake in the comment of xfrm_state_ok().
    From gushengxian.

    6) Replace hdr_off indirections by a small helper function.
    From Florian Westphal.

    7) Remove xfrm4_output_finish and xfrm6_output_finish declarations,
    they are not used anymore. From Antony Antony.

    8) Remove xfrm replay indirections.
    From Florian Westphal.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2021

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2021-06-23

    1) Don't return an MTU smaller than 1280 on IPv6 PMTU discovery.
    From Sabrina Dubroca.

    2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype
    for the PREEMPT_RT case. From Varad Gautam.

    3) Remove a repeated declaration of xfrm_parse_spi.
    From Shaokun Zhang.

    4) IPv4 BEET mode can't handle fragments, but IPv6 can.
    Commit 68dc022d04eb ("xfrm: BEET mode doesn't support
    fragments for inner packets") handled IPv4 and IPv6
    the same way. Relax the check for IPv6 because fragments
    are possible here. From Xin Long.

    5) Memory allocation failures are not reported for
    XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct.
    Fix this by moving both cases in front of the function.

    6) Fix a missing initialization in the xfrm offload fallback
    fail case for bonding devices. From Ayush Sawal.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
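
Item 1 of the pull request above amounts to applying a floor to the discovered path MTU, since 1280 is the minimum MTU every IPv6 link must support. A minimal sketch of that floor follows; the constant and helper names here are hypothetical and this is not the actual xfrm change.

```c
/* Minimum MTU an IPv6 link must support. */
#define IPV6_MIN_MTU_SKETCH 1280u

/* Hypothetical helper: never report a path MTU below the IPv6 minimum. */
static unsigned int ipv6_pmtu_floor_sketch(unsigned int discovered_mtu)
{
    if (discovered_mtu < IPV6_MIN_MTU_SKETCH)
        return IPV6_MIN_MTU_SKETCH;
    return discovered_mtu;
}
```

Values at or above 1280 pass through unchanged; anything smaller is raised to 1280.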
     

23 Jun, 2021

1 commit

  • The inner_ipproto field saves the inner IP protocol of the
    plaintext packet. This allows a vendor's IPsec feature to make
    offload decisions at the skb's features_check and to configure
    hardware at ndo_start_xmit.

    For example, the ConnectX6-DX IPsec device needs the plaintext's
    IP protocol to support partial checksum offload on
    VXLAN/GENEVE packets over an IPsec transport mode tunnel.

    Signed-off-by: Raed Salem
    Signed-off-by: Huy Nguyen
    Cc: Steffen Klassert
    Acked-by: Steffen Klassert
    Signed-off-by: Saeed Mahameed

    Huy Nguyen