03 Mar, 2018

1 commit

  • [ Upstream commit acf568ee859f098279eadf551612f103afdacb4e ]

    This is an old bugbear of mine:

    https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html

    By crafting special packets, it is possible to cause recursion
    in our kernel when processing transport-mode packets at levels
    that are only limited by packet size.

    The easiest one is with DNAT, but an even worse one is where
    UDP encapsulation is used, in which case you just have to insert
    a UDP encapsulation header in between each level of recursion.

    This patch avoids this problem by reinjecting transport-mode packets
    through a tasklet.

    Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks")
    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
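
The idea of deferring reinjection instead of recursing can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel patch: `struct pkt`, `struct reinject_queue`, `input_deferred()` and `drain()` are hypothetical names, and a simple ring buffer stands in for the tasklet's packet queue. Each nested transport-mode layer is queued for a later pass, so the call depth stays constant however many layers the packet carries.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical packet: 'layers' counts the nested transport-mode headers left. */
struct pkt { int layers; };

/* Stand-in for the tasklet's queue: a small ring buffer. */
struct reinject_queue {
    struct pkt items[QUEUE_CAP];
    size_t head, tail;              /* only ever incremented; indexed modulo QUEUE_CAP */
};

static int depth, max_depth, delivered;

static int queue_push(struct reinject_queue *q, struct pkt p)
{
    if (q->tail - q->head == QUEUE_CAP)
        return -1;                  /* queue full */
    q->items[q->tail++ % QUEUE_CAP] = p;
    return 0;
}

/* Strip one layer, then queue the packet instead of calling ourselves again. */
static void input_deferred(struct reinject_queue *q, struct pkt p)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (p.layers > 0) {
        p.layers--;
        (void)queue_push(q, p);     /* a full queue drops the packet in this sketch */
    } else {
        delivered++;
    }
    depth--;
}

/* Plays the role of the deferred context: process queued packets one by one. */
static void drain(struct reinject_queue *q)
{
    while (q->head != q->tail)
        input_deferred(q, q->items[q->head++ % QUEUE_CAP]);
}

/* Feed one packet with 'layers' nested headers; return the deepest call depth seen. */
static int run_nested(int layers)
{
    struct reinject_queue q = { .head = 0, .tail = 0 };
    struct pkt p = { layers };

    depth = max_depth = delivered = 0;
    assert(queue_push(&q, p) == 0);
    drain(&q);
    return max_depth;
}
```

Calling `run_nested(1000)` processes a thousand nested layers while `input_deferred()` is never active more than once at a time; a directly recursive version would instead nest one call per layer.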
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

31 Aug, 2017

1 commit

  • In conjunction with crypto offload [1], removing the ESP trailer by
    hardware can potentially improve the performance by avoiding (1) a
    cache miss incurred by reading the nexthdr field and (2) the necessity
    to calculate the csum value of the trailer in order to keep skb->csum
    valid.

    This patch introduces the changes to the xfrm stack and merely serves
    as infrastructure. A subsequent patch to the mlx5 driver will put this
    to good use.

    [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html

    Signed-off-by: Yossi Kuperman
    Signed-off-by: Steffen Klassert

    Yossi Kuperman
     

11 Aug, 2017

1 commit

  • On systems that use mark-based routing it may be necessary for
    routing lookups to use marks in order for packets to be routed
    correctly. An example of such a system is Android, which uses
    socket marks to route packets via different networks.

    Currently, routing lookups in tunnel mode always use a mark of
    zero, making routing incorrect on such systems.

    This patch adds a new output_mark element to the xfrm state and
    a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
    mark differs from the existing xfrm mark in two ways:

    1. The xfrm mark is used to match xfrm policies and states, while
    the xfrm output mark is used to set the mark (and influence
    the routing) of the packets emitted by those states.
    2. The existing mark is constrained to be a subset of the bits of
    the originating socket or transformed packet, but the output
    mark is arbitrary and depends only on the state.

    The use of a separate mark provides additional flexibility. For
    example:

    - A packet subject to two transforms (e.g., transport mode inside
    tunnel mode) can have two different output marks applied to it,
    one for the transport mode SA and one for the tunnel mode SA.
    - On a system where socket marks determine routing, the packets
    emitted by an IPsec tunnel can be routed based on a mark that
    is determined by the tunnel, not by the marks of the
    unencrypted packets.
    - Support for setting the output marks can be introduced without
    breaking any existing setups that employ both mark-based
    routing and xfrm tunnel mode. Simply changing the code to use
    the xfrm mark for routing output packets could change
    behaviour in a way that breaks these setups.

    If the output mark is unspecified or set to zero, the mark is not
    set or changed.

    Tested: make allyesconfig; make -j64
    Tested: https://android-review.googlesource.com/452776
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert

    Lorenzo Colitti
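
The rule described in this commit (a non-zero output mark overrides the packet mark, zero leaves it alone) can be sketched as follows. This is a minimal userspace illustration, not the kernel code: `struct sketch_sa`, `struct sketch_pkt` and `sketch_apply_output_mark()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an xfrm state and a packet. */
struct sketch_sa  { uint32_t output_mark; };
struct sketch_pkt { uint32_t mark; };

/* Apply the state's output mark; zero means "do not set or change the mark". */
static void sketch_apply_output_mark(const struct sketch_sa *sa,
                                     struct sketch_pkt *pkt)
{
    assert(sa && pkt);
    if (sa->output_mark)
        pkt->mark = sa->output_mark;
}
```

For a packet passing through a transport-mode SA and then a tunnel-mode SA, calling the helper once per SA leaves the packet carrying the mark of the last SA that specified one, which matches the "two different output marks" example in the commit message.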
     

02 Aug, 2017

2 commits


19 Jul, 2017

2 commits

  • Retain the last used xfrm_dst in a pcpu cache.
    On the next request, reuse this dst if the policies are the same.

    The cache will not help with strict RR workloads as there is no hit.

    The packet-path part of the cache is reasonably small. The notifier part
    is needed so that we do not add long hangs when a device is dismantled
    while some pcpu xdst still holds a reference. There are also calls to the
    flush operation when userspace deletes SAs, so that modules can be removed.

    We need to run the dst_release on the correct cpu to avoid races with
    packet path. This is done by adding a work_struct for each cpu and then
    doing the actual test/release on each affected cpu via schedule_work_on().

    Test results using 4 network namespaces and null encryption:

    ns1        ns2           -> ns3           -> ns4
    netperf -> xfrm/null enc -> xfrm/null dec -> netserver

    what            TCP_STREAM  UDP_STREAM  UDP_RR
    Flow cache:     14644.61    294.35      327231.64
    No flow cache:  14349.81    242.64      202301.72
    Pcpu cache:     14629.70    292.21      205595.22

    UDP tests used 64byte packets, tests ran for one minute each,
    value is average over ten iterations.

    'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
    series but without this patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
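
A per-CPU "last used" cache of this kind can be sketched in userspace C. The sketch below rests on several assumptions: all names (`sketch_cache_lookup()`, `sketch_cache_store()`, `sketch_cache_flush()` and the structs) are hypothetical, the CPU count is fixed, and reference counting, validity checks on the cached entry, and the per-CPU work items used for the release are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS  4   /* assumption: fixed CPU count for the sketch */
#define SKETCH_MAX_POLS 2

struct sketch_policy { int id; };
struct sketch_xdst   { int id; };

/* One slot per CPU: the last bundle used and the policies it was built for. */
struct sketch_pcpu_slot {
    const struct sketch_policy *pols[SKETCH_MAX_POLS];
    int num_pols;
    struct sketch_xdst *last;
};

static struct sketch_pcpu_slot sketch_cache[SKETCH_NR_CPUS];

/* Return the cached bundle only if it was stored for exactly these policies. */
static struct sketch_xdst *
sketch_cache_lookup(int cpu, const struct sketch_policy *const *pols, int num_pols)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    const struct sketch_pcpu_slot *slot = &sketch_cache[cpu];

    if (!slot->last || slot->num_pols != num_pols)
        return NULL;
    for (int i = 0; i < num_pols; i++)
        if (slot->pols[i] != pols[i])
            return NULL;
    return slot->last;
}

/* Remember the bundle just used on this CPU, replacing the previous entry. */
static void
sketch_cache_store(int cpu, const struct sketch_policy *const *pols, int num_pols,
                   struct sketch_xdst *xdst)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(num_pols >= 0 && num_pols <= SKETCH_MAX_POLS);
    struct sketch_pcpu_slot *slot = &sketch_cache[cpu];

    for (int i = 0; i < num_pols; i++)
        slot->pols[i] = pols[i];
    slot->num_pols = num_pols;
    slot->last = xdst;
}

/* Drop every cached entry, e.g. on device teardown or SA deletion. */
static void sketch_cache_flush(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        sketch_cache[cpu].last = NULL;
        sketch_cache[cpu].num_pols = 0;
    }
}
```

Because each slot holds a single entry, a workload that alternates between flows on the same CPU keeps overwriting the slot and rarely hits, which is consistent with the commit's remark about strict RR workloads.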
     
  • After rcu conversions performance degradation in forward tests isn't that
    noticeable anymore.

    See next patch for some numbers.

    A followup patch could then also remove genid from the policies
    as we do not cache bundles anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Jul, 2017

3 commits

  • The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
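
Why a dedicated reference-count type helps can be shown with a small saturating counter built on C11 atomics. This is a sketch loosely modelled on the idea of saturation, not the kernel's refcount_t implementation; `struct sketch_refcount`, `sketch_ref_inc()` and `sketch_ref_dec_and_test()` are hypothetical names. A plain counter incremented past its maximum wraps to zero and the object can be freed while still in use; here the counter pins at `UINT_MAX` instead, trading a possible leak for memory safety.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_refcount { atomic_uint refs; };

/* Take a reference; once the counter reaches UINT_MAX it stays there. */
static void sketch_ref_inc(struct sketch_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == UINT_MAX)
            return;                 /* saturated: never wrap around to 0 */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Drop a reference; return true only when the last reference went away. */
static bool sketch_ref_dec_and_test(struct sketch_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == UINT_MAX || old == 0)
            return false;           /* saturated or already zero: refuse to free */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

    return old == 1;                /* 'old' is the value seen before the decrement */
}
```

A caller would free the object only when `sketch_ref_dec_and_test()` returns true. A saturated counter never returns true, so the object is leaked rather than freed early.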
     
  • The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • The refcount_t type and its corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

01 Jul, 2017

1 commit


24 Jun, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-06-23

    1) Use memdup_user to simplify xfrm_user_policy.
    From Geliang Tang.

    2) Make xfrm_dev_register static to silence a sparse warning.
    From Wei Yongjun.

    3) Use crypto_memneq to check the ICV in the AH protocol.
    From Sabrina Dubroca.

    4) Remove some unused variables in esp6.
    From Stephen Hemminger.

    5) Extend XFRM MIGRATE to allow changing the UDP encapsulation port.
    From Antony Antony.

    6) Include the UDP encapsulation port in km_migrate announcements.
    From Antony Antony.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Jun, 2017

3 commits

  • Add XFRMA_ENCAP, the UDP encapsulation port, to the km_migrate
    announcement to userland. Only add it if XFRMA_ENCAP was in the user
    migrate request.

    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     
  • Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
    netlink attribute XFRMA_ENCAP.

    Devices that support the IKE MOBIKE extension (RFC-4555 Section 3.8)
    could go to sleep for a few minutes and wake up. When such a device wakes
    up, the NAT mapping could have expired, and the device sends a MOBIKE
    UPDATE_SA message to migrate the IPsec SA. The change could be a change
    of UDP encapsulation port, IP address, or both.

    Reported-by: Paul Wouters
    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     
  • In commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") we
    made xfrm_device.o compile only when CONFIG_XFRM_OFFLOAD is enabled.
    But this leaves xfrm_dev_event() missing if we enable only the default
    XFRM options.

    Then, if we set down and unregister an interface with IPsec on it, there
    will be no xfrm_garbage_collect(), which causes the dev usage count to be
    held, and we get an error like:

    unregister_netdevice: waiting for to become free. Usage count = 4

    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Hangbin Liu
    Signed-off-by: Steffen Klassert

    Hangbin Liu
     

04 May, 2017

1 commit

  • When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
    that dst. Unfortunately, the code that allocates and fills this copy
    doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
    passed. In multiple code paths (from raw_sendmsg, from TCP when
    replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
    passed to xfrm is actually an on-stack flowi4, so we end up reading
    stuff from the stack past the end of the flowi4 struct.

    Since xfrm_dst->origin isn't used anywhere following commit
    ca116922afa8 ("xfrm: Eliminate "fl" and "pol" args to
    xfrm_bundle_ok()."), just get rid of it. xfrm_dst->partner isn't used
    either, so get rid of that too.

    Fixes: 9d6ec938019c ("ipv4: Use flowi4 in public route lookup interfaces.")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Sabrina Dubroca
     

14 Apr, 2017

5 commits


27 Mar, 2017

1 commit

  • The current addr4_match() code has a special test for /0 prefixes because
    a full-width shift is undefined behaviour under the C standard. However,
    the test can be omitted on 64-bit because the shift can be done within a
    64-bit register and then truncated to the expected value (which is a 0
    mask).

    Implicit truncation by htonl() fits nicely into the R32-within-R64 model
    on x86-64.

    Space savings: none (coincidence)
    Branch savings: 1

    Before:

    movzx eax,BYTE PTR [rdi+0x2a] # ->prefixlen_d
    test al,al
    jne xfrm_selector_match + 0x23f
    ...
    movzx eax,BYTE PTR [rbx+0x2b] # ->prefixlen_s
    test al,al
    je xfrm_selector_match + 0x1c7

    After (no branches):

    mov r8d,0x20
    mov rdx,0xffffffffffffffff
    mov esi,DWORD PTR [rsi+0x2c]
    mov ecx,r8d
    sub cl,BYTE PTR [rdi+0x2a]
    xor esi,DWORD PTR [rbx]
    mov rdi,rdx
    xor eax,eax
    shl rdi,cl
    bswap edi

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Steffen Klassert

    Alexey Dobriyan
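
The shift-then-truncate trick can be written out in portable C. This is a sketch rather than the kernel function: `prefix_mask()` and `addr4_match_sketch()` are hypothetical names, and addresses are taken in host byte order so that no `htonl()` call is needed. Shifting a 32-bit value by 32 is undefined, but shifting a 64-bit value by 32 is well defined, and truncating the result to 32 bits gives the all-zero mask that a /0 prefix needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build the netmask for 'prefixlen' leading bits without a special case for /0. */
static uint32_t prefix_mask(unsigned int prefixlen)
{
    assert(prefixlen <= 32);
    /*
     * The shift count ranges from 0 to 32, always below the 64-bit width,
     * so the shift is defined; the cast keeps only the low 32 bits.
     */
    return (uint32_t)(~UINT64_C(0) << (32 - prefixlen));
}

/* True if a1 and a2 (host byte order) agree in their first 'prefixlen' bits. */
static bool addr4_match_sketch(uint32_t a1, uint32_t a2, unsigned int prefixlen)
{
    return ((a1 ^ a2) & prefix_mask(prefixlen)) == 0;
}
```

With this helper, 192.168.1.1 and 192.168.1.254 match under a /24 prefix but not under /32, and any two addresses match under /0 because the mask truncates to zero.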
     

24 Mar, 2017

2 commits


15 Feb, 2017

4 commits


09 Feb, 2017

4 commits


17 Jan, 2017

1 commit

  • This patch tries to avoid skb_cow_data on esp4.

    On the encrypt side we add the IPsec tailbits
    to the linear part of the buffer if there is
    space on it. If there is no space on the linear
    part, we add a page fragment with the tailbits to
    the buffer and use separate src and dst scatterlists.

    On the decrypt side, we leave the buffer as it is
    if it is not cloned.

    With this, we can avoid a linearization of the buffer
    in most of the cases.

    Joint work with:
    Sowmini Varadhan
    Ilan Tayari

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: Ilan Tayari
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

10 Jan, 2017

2 commits

  • xfrm_init_tempstate is always called from within rcu read side section.
    We can thus use a simpler function that doesn't call rcu_read_lock
    again.

    While at it, also make xfrm_init_tempstate return void; the
    return value was never tested.

    A followup patch will replace remaining callers of xfrm_state_get_afinfo
    with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
    get_afinfo interface.

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • commit 44abdc3047aecafc141dfbaf1ed
    ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
    xfrm_state_put_afinfo equivalent to rcu_read_unlock.

    Use spatch to replace it with direct calls to rcu_read_unlock:

    @@
    struct xfrm_state_afinfo *a;
    @@

    - xfrm_state_put_afinfo(a);
    + rcu_read_unlock();

    old:
       text    data     bss     dec     hex filename
      22570      72     424   23066    5a1a xfrm_state.o
       1612       0       0    1612     64c xfrm_output.o
    new:
      22554      72     424   23050    5a0a xfrm_state.o
       1596       0       0    1596     63c xfrm_output.o

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

23 Sep, 2016

1 commit


21 Sep, 2016

1 commit

  • Since commit 1625f4529957, vti6 is broken, all input packets are dropped
    (LINUX_MIB_XFRMINNOSTATES is incremented).

    XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
    xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set that value to NULL in
    xfrm6_rcv_spi().

    A new function xfrm6_rcv_tnl(), which allows a value to be passed to
    xfrm6_rcv_spi(), is added so that xfrm6_rcv() is left untouched (this
    function is used in several handlers).

    CC: Alexey Kodanev
    Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

10 Aug, 2016

1 commit


28 Apr, 2016

1 commit