24 Mar, 2019

1 commit

  • [ Upstream commit 660899ddf06ae8bb5bbbd0a19418b739375430c5 ]

    After moving an XFRM interface to another namespace it stays associated
    with the original namespace (net in `struct xfrm_if` and the list keyed
    with `xfrmi_net_id`), allowing processes in the new namespace to use
    SAs/policies that were created in the original namespace. For instance,
    this allows a keying daemon in one namespace to establish IPsec SAs for
    other namespaces without processes there having access to the keys or IKE
    credentials.

    This worked fine for outbound traffic. For inbound traffic, however, the
    lookup for the interfaces and the policies used the incorrect namespace
    (the one the XFRM interface was moved to).

    Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Tobias Brunner
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Tobias Brunner
     

15 Feb, 2019

2 commits

  • commit 35e6103861a3a970de6c84688c6e7a1f65b164ca upstream.

    The check assumes that in transport mode, the first template's family
    must match the address family of the policy selector.

    Syzkaller managed to build a template using MODE_ROUTEOPTIMIZATION
    with an ipv4-in-ipv6 chain, leading to the following splat:

    BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x1db/0x1854
    Read of size 4 at addr ffff888063e57aa0 by task a.out/2050
    xfrm_state_find+0x1db/0x1854
    xfrm_tmpl_resolve+0x100/0x1d0
    xfrm_resolve_and_create_bundle+0x108/0x1000 [..]

    The problem is that the addresses point into the flowi4 struct, but
    xfrm_state_find treats them as being ipv6 because templ->encap_family
    (AF_INET6 in the case of the reproducer) is used rather than family (AF_INET).

    This patch inverts the logic: Enforce 'template family must match
    selector' EXCEPT for tunnel and BEET mode.

    In BEET and Tunnel mode, xfrm_tmpl_resolve_one will have remote/local
    address pointers changed to point at the addresses found in the template,
    rather than the flowi ones, so no oob read will occur.

    Reported-by: 3ntr0py1337@gmail.com
    Reported-by: Daniel Borkmann
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
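
The inverted check described above can be sketched in userspace C. This is a minimal model and not the kernel code: `tmpl_model`, `tmpl_mode`, and `validate_tmpl_families` are hypothetical names, and tracking the family of the previous template in the chain is an assumption about how "must match" is carried through multi-template policies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the xfrm template modes. */
enum tmpl_mode {
    MODE_TRANSPORT,
    MODE_TUNNEL,
    MODE_ROUTEOPTIMIZATION,
    MODE_BEET
};

/* Hypothetical, reduced model of one user template. */
struct tmpl_model {
    int family;          /* 0 means "not set by userspace" */
    enum tmpl_mode mode;
};

/*
 * Returns 0 if the chain is acceptable, -EINVAL otherwise.
 * A template must keep the family of the previous step unless it is a
 * tunnel or BEET template, which supplies its own addresses.
 */
static int validate_tmpl_families(struct tmpl_model *t, size_t n, int sel_family)
{
    int prev_family = sel_family;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].family == 0)
            t[i].family = sel_family; /* unset family inherits the selector's */

        switch (t[i].mode) {
        case MODE_TUNNEL:
        case MODE_BEET:
            break; /* the family may change at this step */
        default:
            if (t[i].family != prev_family)
                return -EINVAL;
            break;
        }
        prev_family = t[i].family;
    }
    return 0;
}
```

With an AF_INET selector, a single AF_INET6 route-optimization template is rejected in this model, while an AF_INET6 tunnel template is accepted because its own addresses are used for the state lookup.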
     
  • commit e2612cd496e7b465711d219ea6118893d7253f52 upstream.

    Fixes 9b42c1f179a6, which changed the default route lookup behavior for
    tunnel mode SAs in the outbound direction to use the skb mark, whereas
    previously mark=0 was used if the output mark was unspecified. In
    mark-based routing schemes such as Android’s, this change in default
    behavior causes routing loops or lookup failures.

    This patch restores the default behavior of using a 0 mark while still
    incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is
    specified.

    Tested with additions to Android's kernel unit test suite:
    https://android-review.googlesource.com/c/kernel/tests/+/860150

    Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking")
    Signed-off-by: Benedict Wong
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Benedict Wong
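
The restored default can be illustrated with a small userspace sketch. It assumes that an SA with neither a set mark nor a mask is represented by both fields being zero; `smark_model` and `route_lookup_mark` are hypothetical names chosen for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-SA "set mark": value and mask. */
struct smark_model {
    uint32_t v;
    uint32_t m;
};

/*
 * Mark used for the tunnel-mode route lookup in this model:
 * - no set mark configured: fall back to mark 0 (the old default);
 * - otherwise: take the masked set mark and keep the remaining bits
 *   of the skb mark.
 */
static uint32_t route_lookup_mark(uint32_t skb_mark, const struct smark_model *smark)
{
    assert(smark != NULL);
    if (smark->v == 0 && smark->m == 0)
        return 0;
    return (smark->v & smark->m) | (skb_mark & ~smark->m);
}
```

The point of the early return is that an unspecified set mark no longer lets the skb mark leak into the route lookup, which is what caused loops in mark-based routing setups.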
     

13 Jan, 2019

3 commits

  • [ Upstream commit 0152eee6fc3b84298bb6a79961961734e8afa5b8 ]

    Since commit 222d7dbd258d ("net: prevent dst uses after free")
    skb_dst_force() might clear the dst_entry attached to the skb.
    The xfrm code doesn't expect this to happen, so we crash with
    a NULL pointer dereference in this case.

    Fix it by checking skb_dst(skb) for NULL after skb_dst_force()
    and drop the packet in case the dst_entry was cleared. We also
    move the skb_dst_force() to a codepath that is not used when
    the transformation was offloaded, because in this case we
    don't have a dst_entry attached to the skb.

    The output and forwarding path was already fixed by
    commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
    skb_dst_force clears the dst_entry.")

    Fixes: 222d7dbd258d ("net: prevent dst uses after free")
    Reported-by: Jean-Philippe Menil
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Steffen Klassert
     
  • [ Upstream commit ca92e173ab34a4f7fc4128bd372bd96f1af6f507 ]

    sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the
    hash mask.

    Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.")
    Signed-off-by: Benjamin Poirier
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Benjamin Poirier
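
The difference between a hash mask and a bucket count is a plain off-by-one, sketched below under the assumption that the table size is a power of two and the mask is `size - 1`. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For a power-of-two hash table, the mask is (buckets - 1), so the
 * value to report as "buckets count" is mask + 1.
 */
static uint32_t bucket_count_from_mask(uint32_t hmask)
{
    /* A valid mask has the form 2^k - 1, so mask & (mask + 1) is 0. */
    assert((hmask & (hmask + 1u)) == 0);
    return hmask + 1u;
}
```

Reporting the mask directly would show 7 for an 8-bucket table, which is the discrepancy the commit addresses.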
     
  • [ Upstream commit 533555e5cbb6aa2d77598917871ae5b579fe724b ]

    xfrm_output_one() does not return an error code when there is
    no dst_entry attached to the skb, so it is still possible to crash
    with a NULL pointer dereference in xfrm_output_resume(). Fix
    it by returning the error code -EHOSTUNREACH.

    Fixes: 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.")
    Signed-off-by: Wei Yongjun
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Wei Yongjun
     

29 Dec, 2018

1 commit

  • commit 4a135e538962cb00a9667c82e7d2b9e4d7cd7177 upstream.

    Commit 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct
    xfrm_state") moved xfrm state objects to use their own slab cache.
    However, it failed to adapt xfrm_user to use this new cache when
    freeing xfrm states.

    Fix this by introducing and using a new helper for freeing
    xfrm_state objects.

    Fixes: 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct xfrm_state")
    Reported-by: Pan Bian
    Cc: # v4.18+
    Signed-off-by: Mathias Krause
    Acked-by: Herbert Xu
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Mathias Krause
     

11 Oct, 2018

1 commit


02 Oct, 2018

2 commits

  • The device gro_cells has been initialized, so it should be freed;
    otherwise it will be leaked.

    Fixes: f203b76d78092faf2 ("xfrm: Add virtual xfrm interfaces")
    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Signed-off-by: Steffen Klassert

    Li RongQing
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2018-10-01

    1) Validate address prefix lengths in the xfrm selector,
    otherwise we may hit undefined behaviour in the
    address matching functions if the prefix is too
    big for the given address family.

    2) Fix skb leak on local message size errors.
    From Thadeu Lima de Souza Cascardo.

    3) We currently reset the transport header back to the network
    header after a transport mode transformation is applied. This
    leads to an incorrect transport header when multiple transport
    mode transformations are applied. Reset the transport header
    only after all transformations are already applied to fix this.
    From Sowmini Varadhan.

    4) We only support one offloaded xfrm, so reset crypto_done after
    the first transformation in xfrm_input(). Otherwise we may call
    the wrong input method for subsequent transformations.
    From Sowmini Varadhan.

    5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
    skb_dst_force does not really force a dst refcount anymore, it might
    clear it instead. xfrm code did not expect this, add a check to not
    dereference skb_dst() if it was cleared by skb_dst_force.

    6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
    read in xfrm_state_find. From Sean Tranchetti.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Sep, 2018

1 commit

  • XFRM mode parameters passed as part of the user templates
    in the IP_XFRM_POLICY are never properly validated. Passing
    values other than valid XFRM modes can cause stack-out-of-bounds
    reads to occur later in the XFRM processing:

    [ 140.535608] ================================================================
    [ 140.543058] BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x17e4/0x1cc4
    [ 140.550306] Read of size 4 at addr ffffffc0238a7a58 by task repro/5148
    [ 140.557369]
    [ 140.558927] Call trace:
    [ 140.558936] dump_backtrace+0x0/0x388
    [ 140.558940] show_stack+0x24/0x30
    [ 140.558946] __dump_stack+0x24/0x2c
    [ 140.558949] dump_stack+0x8c/0xd0
    [ 140.558956] print_address_description+0x74/0x234
    [ 140.558960] kasan_report+0x240/0x264
    [ 140.558963] __asan_report_load4_noabort+0x2c/0x38
    [ 140.558967] xfrm_state_find+0x17e4/0x1cc4
    [ 140.558971] xfrm_resolve_and_create_bundle+0x40c/0x1fb8
    [ 140.558975] xfrm_lookup+0x238/0x1444
    [ 140.558977] xfrm_lookup_route+0x48/0x11c
    [ 140.558984] ip_route_output_flow+0x88/0xc4
    [ 140.558991] raw_sendmsg+0xa74/0x266c
    [ 140.558996] inet_sendmsg+0x258/0x3b0
    [ 140.559002] sock_sendmsg+0xbc/0xec
    [ 140.559005] SyS_sendto+0x3a8/0x5a8
    [ 140.559008] el0_svc_naked+0x34/0x38
    [ 140.559009]
    [ 140.592245] page dumped because: kasan: bad access detected
    [ 140.597981] page_owner info is not active (free page?)
    [ 140.603267]
    [ 140.653503] ================================================================

    Signed-off-by: Sean Tranchetti
    Signed-off-by: Steffen Klassert

    Sean Tranchetti
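
A bounds check of the kind this commit describes can be sketched as follows. The enum values and the `mode_names` table are hypothetical stand-ins; the sketch only shows why an unvalidated mode is dangerous once it is used as an index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the valid template modes. */
enum {
    MODE_TRANSPORT,
    MODE_TUNNEL,
    MODE_ROUTEOPTIMIZATION,
    MODE_IN_TRIGGER,
    MODE_BEET,
    MODE_MAX /* number of valid modes, not a mode itself */
};

/* Example per-mode table: indexing it with mode >= MODE_MAX reads out of bounds. */
static const char *const mode_names[MODE_MAX] = {
    "transport", "tunnel", "ro", "in_trigger", "beet"
};

/* Reject anything that is not a known mode before it is stored or used. */
static int validate_tmpl_mode(unsigned int mode)
{
    return mode < MODE_MAX ? 0 : -EINVAL;
}

/* Table lookup that relies on the validation above. */
static const char *mode_name(unsigned int mode)
{
    if (validate_tmpl_mode(mode) != 0)
        return NULL;
    assert(mode < MODE_MAX); /* invariant established by the check */
    return mode_names[mode];
}
```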
     

11 Sep, 2018

1 commit

  • Since commit 222d7dbd258d ("net: prevent dst uses after free")
    skb_dst_force() might clear the dst_entry attached to the skb.
    The xfrm code doesn't expect this to happen, so we crash with
    a NULL pointer dereference in this case. Fix it by checking
    skb_dst(skb) for NULL after skb_dst_force() and dropping the packet
    in case the dst_entry was cleared.

    Fixes: 222d7dbd258d ("net: prevent dst uses after free")
    Reported-by: Tobias Hommel
    Reported-by: Kristian Evensen
    Reported-by: Wolfgang Walter
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

04 Sep, 2018

1 commit

  • We only support one offloaded xfrm (we do not have devices that
    can handle more than one offload), so reset crypto_done when
    iterating over multiple transforms in xfrm_input(), so that we
    can invoke the appropriate x->type->input for the non-offloaded
    transforms.

    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Sowmini Varadhan
    Signed-off-by: Steffen Klassert

    Sowmini Varadhan
     

03 Aug, 2018

2 commits

  • We don't validate the address prefix lengths in the xfrm
    selector we got from userspace. This can lead to undefined
    behaviour in the address matching functions if the prefix
    is too big for the given address family. Fix this by checking
    the prefixes and refusing SA/policy insertion when a prefix
    is invalid.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: Air Icy
    Signed-off-by: Steffen Klassert

    Steffen Klassert
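
The prefix check described above can be sketched in userspace C. `selector_model` is a hypothetical, reduced stand-in for the selector, and returning -EINVAL for unknown families is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical, reduced model of the selector fields that matter here. */
struct selector_model {
    uint16_t family;
    uint8_t prefixlen_s; /* source prefix length in bits */
    uint8_t prefixlen_d; /* destination prefix length in bits */
};

/* Returns 0 if both prefix lengths fit the address family, -EINVAL otherwise. */
static int validate_selector_prefixes(const struct selector_model *sel)
{
    unsigned int max_bits;

    assert(sel != NULL);
    switch (sel->family) {
    case AF_UNSPEC:
        return 0;           /* nothing to check without a family */
    case AF_INET:
        max_bits = 32;      /* IPv4 addresses are 32 bits wide */
        break;
    case AF_INET6:
        max_bits = 128;     /* IPv6 addresses are 128 bits wide */
        break;
    default:
        return -EINVAL;     /* assumption: unknown families are refused */
    }
    if (sel->prefixlen_s > max_bits || sel->prefixlen_d > max_bits)
        return -EINVAL;
    return 0;
}
```

A prefix length of 33 on an AF_INET selector is the kind of input that would otherwise reach the address matching code with a shift count larger than the address width.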
     
  • The BTF conflicts were simple overlapping changes.

    The virtio_net conflict was an overlap of a fix of a statistics counter,
    happening alongside a move over to a bona fide statistics structure
    rather than counting values on the stack.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Jul, 2018

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2018-07-27

    1) Extend the output_mark to also support the input direction
    and masking the mark values before applying to the skb.

    2) Add a new lookup key for the upcoming xfrm interfaces.

    3) Extend the xfrm lookups to match xfrm interface IDs.

    4) Add virtual xfrm interfaces. The purpose of these interfaces
    is to overcome the design limitations that the existing
    VTI devices have.

    The main limitations that we see with the current VTI are the
    following:

    VTI interfaces are L3 tunnels with configurable endpoints.
    For xfrm, the tunnel endpoints are already determined by the SA.
    So the VTI tunnel endpoints must be either the same as on the
    SA or wildcards. In case the VTI tunnel endpoints are the same as on
    the SA, we get a one-to-one correlation between the SA and
    the tunnel. So each SA needs its own tunnel interface.

    On the other hand, we can have only one VTI tunnel with
    wildcard src/dst tunnel endpoints in the system because the
    lookup is based on the tunnel endpoints. The existing tunnel
    lookup won't work with multiple tunnels with wildcard
    tunnel endpoints. Some use cases require more than one
    VTI tunnel of this type, for example if somebody has multiple
    namespaces and every namespace requires such a VTI.

    VTI needs separate interfaces for IPv4 and IPv6 tunnels.
    So when routing to a VTI, we have to know to which address
    family this traffic class is going to be encapsulated.
    This is a limitation because it makes routing more complex
    and it is not always possible to know what happens behind the
    VTI, e.g. when the VTI is moved to some namespace.

    VTI works just with tunnel mode SAs. We need generic interfaces
    that ensure transformation, regardless of the xfrm mode and
    the encapsulated address family.

    VTI is configured with a combination of GRE keys and xfrm marks.
    With this we have to deal with some extra cases in the generic
    tunnel lookup because the GRE keys on the VTI are actually
    not GRE keys, the GRE keys were just reused for something else.
    All extensions to the VTI interfaces would require adding
    even more complexity to the generic tunnel lookup.

    So to overcome this, we developed xfrm interfaces with the
    following design goals:

    It should be possible to tunnel IPv4 and IPv6 through the same
    interface.

    No limitation on xfrm mode (tunnel, transport and beet).

    Should be a generic virtual interface that ensures IPsec
    transformation, no need to know what happens behind the
    interface.

    Interfaces should be configured with a new key that must match a
    new policy/SA lookup key.

    The lookup logic should stay in the xfrm codebase, no need to
    change or extend generic routing and tunnel lookups.

    Should be possible to use IPsec hardware offloads of the underlying
    interface.

    5) Remove xfrm pcpu policy cache. This was added after the flowcache
    removal, but it turned out to make things even worse.
    From Florian Westphal.

    6) Allow to update the set mark on SA updates.
    From Nathan Harold.

    7) Convert some timestamps to time64_t.
    From Arnd Bergmann.

    8) Don't check the offload_handle in xfrm code,
    it is an opaque data cookie for the driver.
    From Shannon Nelson.

    9) Remove xfrmi interface ID from flowi. After this patch
    no generic code is touched anymore to do xfrm interface
    lookups. From Benedict Wong.

    10) Allow to update the xfrm interface ID on SA updates.
    From Nathan Harold.

    11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
    From YueHaibing.

    12) Return more detailed errors on xfrm interface creation.
    From Benedict Wong.

    13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
    From the kbuild test robot.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Jul, 2018

1 commit

  • net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used

    Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

    Generated by: scripts/coccinelle/api/ptr_ret.cocci

    Fixes: 44e2b838c24d ("xfrm: Return detailed errors from xfrmi_newlink")
    CC: Benedict Wong
    Signed-off-by: kbuild test robot
    Signed-off-by: Steffen Klassert

    kbuild test robot
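
The equivalence behind this warning can be shown with a userspace model of the error-pointer convention. The lowercase helpers below are hypothetical re-implementations written for illustration (errors encoded as the top 4095 pointer values is an assumption of the sketch); they are not the kernel's headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Decode the errno value stored in an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer lies in the range reserved for encoded errors. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* The pattern the checker flags: an explicit test followed by a decode. */
static int open_coded(const void *p)
{
    if (is_err(p))
        return (int)ptr_err(p);
    return 0;
}

/* The single-helper form the checker suggests instead. */
static int ptr_err_or_zero(const void *p)
{
    return is_err(p) ? (int)ptr_err(p) : 0;
}
```

Both forms return the same value for every input, so the change is purely a simplification.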
     

26 Jul, 2018

2 commits

  • Currently all failure modes of xfrm interface creation return EEXIST.
    This change improves the granularity of errnos provided by also
    returning ENODEV or EINVAL if failures happen in looking up the
    underlying interface, or a required parameter is not provided.

    This change has been tested against the Android Kernel Networking Tests,
    with additional xfrmi_newlink tests here:

    https://android-review.googlesource.com/c/kernel/tests/+/715755

    Signed-off-by: Benedict Wong
    Signed-off-by: Steffen Klassert

    Benedict Wong
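
The finer-grained errnos can be pictured with a small decision function. Everything in this sketch is hypothetical: the struct fields, the function name, and in particular the order in which the conditions are checked are assumptions made for illustration, not the kernel's control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a creation request looks like. */
struct newlink_req_model {
    bool has_required_param; /* a mandatory parameter was supplied */
    bool lower_dev_found;    /* the underlying interface was found */
    bool already_exists;     /* an identical interface already exists */
};

/* Map each failure mode to its own errno instead of always EEXIST. */
static int newlink_errno(const struct newlink_req_model *req)
{
    assert(req != NULL);
    if (!req->has_required_param)
        return -EINVAL;
    if (!req->lower_dev_found)
        return -ENODEV;
    if (req->already_exists)
        return -EEXIST;
    return 0;
}
```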
     
  • Fix a static code checker warning:

    net/xfrm/xfrm_policy.c:1836 xfrm_resolve_and_create_bundle() warn: passing zero to 'ERR_PTR'

    xfrm_tmpl_resolve returning 0 just means that no xdst was found, so
    return NULL instead of passing zero to ERR_PTR.

    Fixes: d809ec895505 ("xfrm: do not assume that template resolving always returns xfrms")
    Signed-off-by: YueHaibing
    Signed-off-by: Steffen Klassert

    YueHaibing
     

25 Jul, 2018

1 commit


20 Jul, 2018

2 commits

  • Allow attaching an SA to an xfrm interface id after
    the creation of the SA, so that tasks such as keying,
    which must be done as the SA is created, can remain
    separate from the decision on how to route traffic
    from an SA. This permits SA creation to be decomposed
    into three separate steps:
    1) allocation of a SPI
    2) algorithm and key negotiation
    3) insertion into the data path

    Signed-off-by: Nathan Harold
    Signed-off-by: Steffen Klassert

    Nathan Harold
     
  • In order to remove the performance impact of having the extra u32 in every
    single flowi, this change removes the flowi_xfrm struct, preferring to
    take the if_id as a method parameter where needed.

    In the inbound direction, if_id is only needed during the
    __xfrm_check_policy() function, and the if_id can be determined at that
    point based on the skb. As such, xfrmi_decode_session() is only called
    with the skb in __xfrm_check_policy().

    In the outbound direction, the only place where if_id is needed is the
    xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
    directly passed into the xfrm_lookup_with_ifid() call. All existing
    callers can still call xfrm_lookup(), which uses a default if_id of 0.

    This change does not change any behavior of XFRMIs except for improving
    overall system performance via flowi size reduction.

    This change has been tested against the Android Kernel Networking Tests:

    https://android.googlesource.com/kernel/tests/+/master/net/test

    Signed-off-by: Benedict Wong
    Signed-off-by: Steffen Klassert

    Benedict Wong
     

19 Jul, 2018

1 commit

  • The offload_handle should be an opaque data cookie for the driver
    to use, much like the data cookie for a timer or alarm callback.
    Thus, the XFRM stack should not be checking for non-zero, because
    the driver might use that to store an array reference, which could
    be zero, or some other zero but meaningful value.

    We can remove the checks for non-zero because there are plenty
    of other attributes also being checked to see if there is an offload
    in place for the SA in question.

    Signed-off-by: Shannon Nelson
    Signed-off-by: Steffen Klassert

    Shannon Nelson
     

11 Jul, 2018

1 commit

  • The lifetime management uses '__u64' timestamps on the user space
    interface, but 'unsigned long' for reading the current time in the kernel
    with get_seconds().

    While this is probably safe beyond y2038, it will still overflow in 2106,
    and the get_seconds() call is deprecated because of that.

    This changes the xfrm time handling to use time64_t consistently, along
    with reading the time using the safer ktime_get_real_seconds(). It still
    suffers from problems that can happen from a concurrent settimeofday()
    call or (to a lesser degree) a leap second update, but since the time
    stamps are part of the user API, there is nothing we can do to prevent
    that.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Steffen Klassert

    Arnd Bergmann
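
The 2106 limit follows from an unsigned 32-bit seconds counter, which is what 'unsigned long' amounts to on 32-bit architectures. The sketch below works through that arithmetic in userspace; it assumes a platform with a 64-bit `time_t`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Calendar year (UTC) of a given number of seconds since the Unix epoch. */
static int year_of_epoch_second(int64_t secs)
{
    time_t t = (time_t)secs;
    struct tm *tm;

    assert(sizeof(time_t) >= 8); /* assumption: 64-bit time_t */
    tm = gmtime(&t);
    assert(tm != NULL);
    return tm->tm_year + 1900;
}

/* What a 32-bit unsigned seconds counter does one second later. */
static uint32_t next_second_u32(uint32_t now)
{
    return now + 1u; /* unsigned arithmetic wraps around to 0 */
}
```

The largest signed 32-bit value falls in 2038 and the largest unsigned 32-bit value falls in 2106, after which a 32-bit counter wraps to zero. A 64-bit `time64_t` avoids both limits.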
     

01 Jul, 2018

1 commit

  • Allow UPDSA to change "set mark" to permit
    policy separation of packet routing decisions from
    SA keying in systems that use mark-based routing.

    The set mark, used as a routing and firewall mark
    for outbound packets, is made update-able which
    allows routing decisions to be handled independently
    of keying/SA creation. To maintain consistency with
    other optional attributes, the set mark is only
    updated if sent with a non-zero value.

    The per-SA lock and the xfrm_state_lock are taken in
    that order to avoid a deadlock with
    xfrm_timer_handler(), which also takes the locks in
    that order.

    Signed-off-by: Nathan Harold
    Signed-off-by: Steffen Klassert

    Nathan Harold
     

25 Jun, 2018

2 commits

  • Kristian Evensen says:
    In a project I am involved in, we are running ipsec (Strongswan) on
    different mt7621-based routers. Each router is configured as an
    initiator and has around ~30 tunnels to different responders (running
    on misc. devices). Before the flow cache was removed (kernel 4.9), we
    got a combined throughput of around 70Mbit/s for all tunnels on one
    router. However, we recently switched to kernel 4.14 (4.14.48), and
    the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
    drop of around 20%. Reverting the flow cache removal restores, as
    expected, performance levels to that of kernel 4.9.

    When pcpu xdst exists, it has to be validated first before it can be
    used.

    A negative hit thus increases cost vs. no-cache.

    As the number of tunnels increases, the hit rate decreases, so this
    pcpu caching isn't a viable strategy.

    Furthermore, the xdst cache also needs to run with BH off, so when
    removing this the bh disable/enable pairs can be removed too.

    Kristian tested a 4.14.y backport of this change and reported
    increased performance:

    In our tests, the throughput reduction has been reduced from around -20%
    to -5%. We also see that the overall throughput is independent of the
    number of tunnels, while before the throughput was reduced as the number
    of tunnels increased.

    Reported-by: Kristian Evensen
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • nlmsg_multicast() always frees the skb, so in case we cannot call
    it we must do that ourselves.

    Fixes: 21ee543edc0dea ("xfrm: fix race between netns cleanup and state expire notification")
    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

23 Jun, 2018

4 commits

  • This patch adds support for virtual xfrm interfaces.
    Packets that are routed through such an interface
    are guaranteed to be IPsec transformed or dropped.
    It is a generic virtual interface that ensures IPsec
    transformation, no need to know what happens behind
    the interface. This means that we can tunnel IPv4 and
    IPv6 through the same interface and support all xfrm
    modes (tunnel, transport and beet) on it.

    Co-developed-by: Lorenzo Colitti
    Co-developed-by: Benedict Wong
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Benedict Wong
    Signed-off-by: Steffen Klassert
    Acked-by: Shannon Nelson
    Tested-by: Benedict Wong
    Tested-by: Antony Antony
    Reviewed-by: Eyal Birger

    Steffen Klassert
     
  • This patch adds the xfrm interface id as a lookup key
    for xfrm states and policies. With this we can assign
    states and policies to virtual xfrm interfaces.

    Signed-off-by: Steffen Klassert
    Acked-by: Shannon Nelson
    Acked-by: Benedict Wong
    Tested-by: Benedict Wong
    Tested-by: Antony Antony
    Reviewed-by: Eyal Birger

    Steffen Klassert
     
  • We already support setting an output mark at the xfrm_state.
    Unfortunately, this does not support the input direction or
    masking the marks that will be applied to the skb. This change
    adds support for applying a masked value in both directions.

    The existing XFRMA_OUTPUT_MARK number is reused for this purpose
    and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

    An additional XFRMA_SET_MARK_MASK attribute is added for setting the
    mask. If the mask attribute is not provided, the mask is set to 0xffffffff,
    keeping the existing 'full mask' semantics of XFRMA_OUTPUT_MARK.

    Co-developed-by: Tobias Brunner
    Co-developed-by: Eyal Birger
    Co-developed-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert
    Signed-off-by: Tobias Brunner
    Signed-off-by: Eyal Birger
    Signed-off-by: Lorenzo Colitti

    Steffen Klassert
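
The mask semantics described in this commit can be sketched in userspace C. The struct and function names are hypothetical, a NULL pointer stands in for "attribute not provided", and combining the masked value with the untouched bits of the existing mark is an assumption about what "applying a masked value" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the set mark: value and mask. */
struct smark_model {
    uint32_t v;
    uint32_t m;
};

/*
 * Initialise the set mark from optional attributes.
 * A mark without a mask gets the full mask 0xffffffff, which matches
 * the old output-mark behaviour of overwriting the whole skb mark.
 */
static void smark_init(struct smark_model *s, const uint32_t *mark_attr,
                       const uint32_t *mask_attr)
{
    assert(s != NULL);
    if (mark_attr == NULL) {
        s->v = 0;
        s->m = 0;
        return;
    }
    s->v = *mark_attr;
    s->m = mask_attr ? *mask_attr : 0xffffffffu;
}

/* Replace only the masked bits of the current skb mark. */
static uint32_t smark_apply(uint32_t skb_mark, const struct smark_model *s)
{
    assert(s != NULL);
    return (s->v & s->m) | (skb_mark & ~s->m);
}
```

With a mask of 0xff, only the low byte of the skb mark is rewritten and the upper 24 bits pass through unchanged.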
     
  • Fix missing dst_release() when local broadcast or multicast traffic is
    xfrm policy blocked.

    For IPv4 this results in a dst leak: ip_route_output_flow() allocates
    dst_entry via __ip_route_output_key() and passes it to
    xfrm_lookup_route(). xfrm_lookup returns ERR_PTR(-EPERM) that is
    propagated. The dst that was allocated is never released.

    IPv4 local broadcast testcase:
    ping -b 192.168.1.255 &
    sleep 1
    ip xfrm policy add src 0.0.0.0/0 dst 192.168.1.255/32 dir out action block

    IPv4 multicast testcase:
    ping 224.0.0.1 &
    sleep 1
    ip xfrm policy add src 0.0.0.0/0 dst 224.0.0.1/32 dir out action block

    For IPv6 the missing dst_release() causes trouble e.g. when used in netns:
    ip netns add TEST
    ip netns exec TEST ip link set lo up
    ip link add dummy0 type dummy
    ip link set dev dummy0 netns TEST
    ip netns exec TEST ip addr add fd00::1111 dev dummy0
    ip netns exec TEST ip link set dummy0 up
    ip netns exec TEST ping -6 -c 5 ff02::1%dummy0 &
    sleep 1
    ip netns exec TEST ip xfrm policy add src ::/0 dst ff02::1 dir out action block
    wait
    ip netns del TEST

    After netns deletion we see:
    [ 258.239097] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 268.279061] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 278.367018] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 288.375259] unregister_netdevice: waiting for lo to become free. Usage count = 2

    Fixes: ac37e2515c1a ("xfrm: release dst_orig in case of error in xfrm_lookup()")
    Signed-off-by: Tommi Rantala
    Signed-off-by: Steffen Klassert

    Tommi Rantala
     

19 Jun, 2018

1 commit

  • struct xfrm_userpolicy_type has two holes, so we should not
    use a C99-style initializer.

    KMSAN report:

    BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:140 [inline]
    BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571
    CPU: 1 PID: 4520 Comm: syz-executor841 Not tainted 4.17.0+ #5
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1211
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1253
    copyout lib/iov_iter.c:140 [inline]
    _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571
    copy_to_iter include/linux/uio.h:106 [inline]
    skb_copy_datagram_iter+0x422/0xfa0 net/core/datagram.c:431
    skb_copy_datagram_msg include/linux/skbuff.h:3268 [inline]
    netlink_recvmsg+0x6f1/0x1900 net/netlink/af_netlink.c:1959
    sock_recvmsg_nosec net/socket.c:802 [inline]
    sock_recvmsg+0x1d6/0x230 net/socket.c:809
    ___sys_recvmsg+0x3fe/0x810 net/socket.c:2279
    __sys_recvmmsg+0x58e/0xe30 net/socket.c:2391
    do_sys_recvmmsg+0x2a6/0x3e0 net/socket.c:2472
    __do_sys_recvmmsg net/socket.c:2485 [inline]
    __se_sys_recvmmsg net/socket.c:2481 [inline]
    __x64_sys_recvmmsg+0x15d/0x1c0 net/socket.c:2481
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x446ce9
    RSP: 002b:00007fc307918db8 EFLAGS: 00000293 ORIG_RAX: 000000000000012b
    RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000446ce9
    RDX: 000000000000000a RSI: 0000000020005040 RDI: 0000000000000003
    RBP: 00000000006dbc20 R08: 0000000020004e40 R09: 0000000000000000
    R10: 0000000040000000 R11: 0000000000000293 R12: 0000000000000000
    R13: 00007ffc8d2df32f R14: 00007fc3079199c0 R15: 0000000000000001

    Uninit was stored to memory at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
    kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
    kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
    kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527
    __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:413
    __nla_put lib/nlattr.c:569 [inline]
    nla_put+0x276/0x340 lib/nlattr.c:627
    copy_to_user_policy_type net/xfrm/xfrm_user.c:1678 [inline]
    dump_one_policy+0xbe1/0x1090 net/xfrm/xfrm_user.c:1708
    xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013
    xfrm_dump_policy+0x1c0/0x2a0 net/xfrm/xfrm_user.c:1749
    netlink_dump+0x9b5/0x1550 net/netlink/af_netlink.c:2226
    __netlink_dump_start+0x1131/0x1270 net/netlink/af_netlink.c:2323
    netlink_dump_start include/linux/netlink.h:214 [inline]
    xfrm_user_rcv_msg+0x8a3/0x9b0 net/xfrm/xfrm_user.c:2577
    netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
    xfrm_netlink_rcv+0xb2/0xf0 net/xfrm/xfrm_user.c:2598
    netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
    netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
    sock_sendmsg_nosec net/socket.c:629 [inline]
    sock_sendmsg net/socket.c:639 [inline]
    ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
    __sys_sendmsg net/socket.c:2155 [inline]
    __do_sys_sendmsg net/socket.c:2164 [inline]
    __se_sys_sendmsg net/socket.c:2162 [inline]
    __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
    do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Local variable description: ----upt.i@dump_one_policy
    Variable was created at:
    dump_one_policy+0x78/0x1090 net/xfrm/xfrm_user.c:1689
    xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013

    Byte 130 of 137 is uninitialized
    Memory access starts at ffff88019550407f

    Fixes: c0144beaeca42 ("[XFRM] netlink: Use nla_put()/NLA_PUT() variantes")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Steffen Klassert
    Cc: Herbert Xu
    Signed-off-by: Steffen Klassert

    Eric Dumazet
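
Why holes matter can be shown with a userspace sketch. `userpolicy_type_model` is a hypothetical struct laid out so that typical ABIs insert padding after `type` and after `reserved2`; a designated initializer sets the named members but is not required to clear those padding bytes, whereas `memset()` clears every byte before the struct is copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with padding on typical ABIs (after type and reserved2). */
struct userpolicy_type_model {
    uint8_t type;
    uint16_t reserved1;
    uint8_t reserved2;
};

/*
 * Fill the struct so that every byte, padding included, is defined.
 * Clearing first and assigning afterwards avoids copying stale stack
 * contents to whoever receives the raw bytes.
 */
static void fill_policy_type(struct userpolicy_type_model *upt, uint8_t type)
{
    assert(upt != NULL);
    memset(upt, 0, sizeof(*upt));
    upt->type = type;
}
```

The design choice is to pay for one small `memset()` rather than depend on how a particular compiler treats padding in initializers.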
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independent
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds
     

05 Jun, 2018

1 commit

  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     

03 Jun, 2018

1 commit


31 May, 2018

1 commit


16 May, 2018

1 commit

  • Variant of proc_create_data that directly takes a seq_file show
    callback and deals with network namespaces in ->open and ->release.
    All callers of proc_create + single_open_net converted over, and
    single_{open,release}_net are removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

12 May, 2018

1 commit

  • The bpf syscall and selftests conflicts were trivial
    overlapping changes.

    The r8169 change involved moving the added mdelay from 'net' into a
    different function.

    A TLS close bug fix overlapped with the splitting of the TLS state
    into separate TX and RX parts. I just expanded the tests in the bug
    fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
    == X".

    Signed-off-by: David S. Miller

    David S. Miller
     

04 May, 2018

1 commit

  • struct xfrm_state is rather large (768 bytes here) and therefore wastes
    quite a lot of memory as it falls into the kmalloc-1024 slab cache,
    leaving 256 bytes of unused memory per XFRM state object -- a net waste
    of 25%.

    Using a dedicated slab cache for struct xfrm_state reduces the level of
    internal fragmentation to a minimum.

    On my configuration SLUB chooses to create a slab cache covering 4
    pages holding 21 objects, resulting in an average memory waste of ~13
    bytes per object -- a net waste of only 1.6%.

    In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
    states.

    Signed-off-by: Mathias Krause
    Signed-off-by: Steffen Klassert

    Mathias Krause
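
The figures in the commit message above can be re-derived with a short calculation. This is only a sketch under stated assumptions: a 4096-byte page size (not stated in the message), a simplified power-of-two model of the generic kmalloc caches, and hypothetical helper names chosen for illustration. The object size (768 bytes), the slab layout (4 pages, 21 objects) and the 10k state count are taken from the message.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, not stated in the commit message
OBJ_SIZE = 768     # size of struct xfrm_state quoted in the commit message


def kmalloc_bucket(size):
    """Smallest power of two >= size.

    Simplified model of the generic kmalloc caches; it ignores the
    non-power-of-two caches (96 and 192 bytes), which do not matter
    for a 768-byte object.
    """
    return 1 << (size - 1).bit_length()


def generic_waste(size):
    """Bytes left unused per object in the generic kmalloc cache."""
    return kmalloc_bucket(size) - size


def dedicated_waste(size, pages, objects):
    """Average bytes left unused per object in a dedicated slab."""
    return (pages * PAGE_SIZE - objects * size) / objects


def saved_bytes(size, pages, objects, count):
    """Memory saved for `count` objects by using the dedicated slab."""
    per_object = generic_waste(size) - dedicated_waste(size, pages, objects)
    return per_object * count


if __name__ == "__main__":
    old = generic_waste(OBJ_SIZE)
    new = dedicated_waste(OBJ_SIZE, pages=4, objects=21)
    print(f"kmalloc bucket:  {kmalloc_bucket(OBJ_SIZE)} bytes")
    print(f"generic waste:   {old} bytes "
          f"({old / kmalloc_bucket(OBJ_SIZE):.1%} of the bucket)")
    print(f"dedicated waste: {new:.1f} bytes "
          f"({new * 21 / (4 * PAGE_SIZE):.1%} of the slab)")
    saved = saved_bytes(OBJ_SIZE, 4, 21, 10_000)
    print(f"saved for 10k:   {saved / 2**20:.1f} MiB")
```

Under these assumptions the model gives a 1024-byte bucket with 256 wasted bytes (25%) for the generic cache, roughly 12 bytes (about 1.6%) per object for the dedicated slab, and about 2.3 MiB saved for 10k states, which is in line with the numbers quoted in the message.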
     

16 Apr, 2018

1 commit

  • We need to make sure that all states are really deleted
    before we check that the state lists are empty. Otherwise
    we trigger a warning.

    Fixes: baeb0dbbb5659 ("xfrm6_tunnel: exit_net cleanup check added")
    Reported-and-tested-by: syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com
    Signed-off-by: Steffen Klassert

    Steffen Klassert