02 Oct, 2019

1 commit

  • commit 174e23810cd31
    ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
    recycle always drop skb extensions. The additional skb_ext_del() that is
    performed via nf_reset on napi skb recycle is not needed anymore.

    Most nf_reset() calls in the stack are there so queued skb won't block
    'rmmod nf_conntrack' indefinitely.

    This removes the skb_ext_del from nf_reset, and renames it to a more
    fitting nf_reset_ct().

    In a few selected places, add a call to skb_ext_reset to make sure that
    no active extensions remain.

    I am submitting this for "net", because we're still early in the release
    cycle. The patch applies to net-next too, but I think the rename causes
    needless divergence between those trees.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 Apr, 2019

5 commits

  • This structure is now only 4 bytes, so it's more efficient
    to cache a copy rather than its address.

    No significant size difference in allmodconfig vmlinux.

    With non-modular kernel that has all XFRM options enabled, this
    series reduces vmlinux image size by ~11kb. All xfrm_mode
    indirections are gone and all modes are built-in.

    before (ipsec-next master):
    text data bss dec filename
    21071494 7233140 11104324 39408958 vmlinux.master

    after this series:
    21066448 7226772 11104324 39397544 vmlinux.patched

    With an allmodconfig kernel, the size increase is only 362 bytes,
    even though all the xfrm config options removed in this series are
    modular.

    before:
    text data bss dec filename
    15731286 6936912 4046908 26715106 vmlinux.master

    after this series:
    15731492 6937068 4046908 26715468 vmlinux

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • After the previous changes, xfrm_mode contains no function pointers anymore
    and the modules defining such a struct contain no code except init/exit
    functions that register the xfrm_mode struct with the xfrm core.

    Just place the xfrm modes in the core and remove the modules;
    the run-time xfrm_mode register/unregister functionality is removed.

    Before:

    text data bss dec filename
    7523 200 2364 10087 net/xfrm/xfrm_input.o
    40003 628 440 41071 net/xfrm/xfrm_state.o
    15730338 6937080 4046908 26714326 vmlinux

    After:

    7389 200 2364 9953 net/xfrm/xfrm_input.o
    40574 656 440 41670 net/xfrm/xfrm_state.o
    15730084 6937068 4046908 26714060 vmlinux

    The xfrm*_mode_{transport,tunnel,beet} modules are gone.

    v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
    ones rather than removing them.

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from
    ipv6 in case of CONFIG_IPV6=m.

    This change has virtually no effect on vmlinux size, but it reduces
    afinfo size and allows a followup patch to make xfrm modes const.

    v2: mark if (afinfo) tests as likely (Sabrina)
    re-fetch afinfo according to inner_mode in xfrm_prepare_input().

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • similar to previous patch: no external module dependencies,
    so we can avoid the indirection by placing this in the core.

    This change removes the last indirection from xfrm_mode and the
    xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.

    Before:
    text data bss dec hex filename
    3957 136 0 4093 ffd net/xfrm/xfrm_output.o
    587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o
    649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o
    625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o
    599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o
    After:
    text data bss dec hex filename
    5359 184 0 5543 15a7 net/xfrm/xfrm_output.o
    171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o
    171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o
    172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o
    172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o

    v2: fold the *encap_add functions into xfrm*_prepare_output
    preserve (move) output2 comment (Sabrina)
    use x->outer_mode->encap, not inner
    fix a build breakage on ppc (kbuild robot)

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     
  • Same as input indirection. Only exception: we need to export
    xfrm_outer_mode_output for pktgen.

    Increases size of vmlinux by about 163 bytes:
    Before:
    text data bss dec filename
    15730208 6936948 4046908 26714064 vmlinux

    After:
    15730311 6937008 4046908 26714227 vmlinux

    xfrm_inner_extract_output has no more external callers, make it static.

    v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
    add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
    add WARN_ON_ONCE for 'call AF_INET6 related output function, but
    CONFIG_IPV6=n' case.
    make xfrm_inner_extract_output static

    Signed-off-by: Florian Westphal
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

21 Dec, 2018

1 commit


20 Dec, 2018

1 commit

  • secpath_set is a wrapper for secpath_dup that will not perform
    an allocation if the secpath attached to the skb has a reference count
    of one, i.e., it doesn't need to be COW'ed.

    Also, secpath_dup doesn't attach the secpath to the skb, it leaves
    this to the caller.

    Use secpath_set in places that immediately assign the return value to
    skb.

    This allows removing skb->sp without touching these spots again.

    secpath_dup can eventually be removed in a followup patch.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
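
The copy-on-write behaviour described for secpath_set can be illustrated with a small userspace sketch. This is not the kernel implementation: `struct path_ext`, `struct pkt`, and `pkt_path_set` are hypothetical stand-ins, and plain integers replace the kernel's reference counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the secpath object and the skb. */
struct path_ext {
    int refcnt;     /* number of packets sharing this object */
    int len;        /* number of recorded entries */
    int entries[4];
};

struct pkt {
    struct path_ext *sp;
};

/* Return a path object that is attached to the packet and safe to modify.
 * A new object is allocated only when none is attached yet or when the
 * attached one is shared (reference count other than one). */
struct path_ext *pkt_path_set(struct pkt *p)
{
    struct path_ext *old, *sp;

    assert(p != NULL);
    old = p->sp;
    if (old && old->refcnt == 1)
        return old;                   /* sole owner: reuse, no allocation */

    sp = calloc(1, sizeof(*sp));
    if (!sp)
        return NULL;
    if (old) {
        memcpy(sp, old, sizeof(*sp)); /* copy-on-write */
        old->refcnt--;                /* drop this packet's reference */
    }
    sp->refcnt = 1;
    p->sp = sp;                       /* attach, so the caller need not */
    return sp;
}
```

In this model a caller that previously duplicated the object and assigned the result to the packet by hand only needs the single call, which is the simplification the commit message describes.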
     

28 Oct, 2018

1 commit

  • xfrm_output_one() does not return an error code when there is
    no dst_entry attached to the skb, so it is still possible to crash
    with a NULL pointer dereference in xfrm_output_resume(). Fix
    it by returning the error code -EHOSTUNREACH.

    Fixes: 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.")
    Signed-off-by: Wei Yongjun
    Signed-off-by: Steffen Klassert

    Wei Yongjun
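
A minimal userspace sketch of the pattern this fix relies on: after forcing a reference on the route, check whether the pointer was cleared and report an error instead of letting the caller dereference NULL. The types and the `pkt_dst_force`/`output_one` functions are hypothetical models, and the dead-route condition (a zero count) is an assumption made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical route object: a count of 0 models a route being torn down. */
struct route {
    int refcnt;
};

struct pkt {
    struct route *dst;
};

/* Loose model of skb_dst_force(): take a reference when the route is
 * still alive, otherwise clear the packet's route pointer. */
void pkt_dst_force(struct pkt *p)
{
    assert(p != NULL);
    if (!p->dst)
        return;
    if (p->dst->refcnt > 0)
        p->dst->refcnt++;
    else
        p->dst = NULL;
}

/* Returns 0 on success or a negative errno. The explicit error return
 * lets the caller stop before it touches p->dst. */
int output_one(struct pkt *p)
{
    pkt_dst_force(p);
    if (!p->dst)
        return -EHOSTUNREACH;
    return 0;
}
```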
     

04 Oct, 2018

1 commit


02 Oct, 2018

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2018-10-01

    1) Validate address prefix lengths in the xfrm selector,
    otherwise we may hit undefined behaviour in the
    address matching functions if the prefix is too
    big for the given address family.

    2) Fix skb leak on local message size errors.
    From Thadeu Lima de Souza Cascardo.

    3) We currently reset the transport header back to the network
    header after a transport mode transformation is applied. This
    leads to an incorrect transport header when multiple transport
    mode transformations are applied. Reset the transport header
    only after all transformations are already applied to fix this.
    From Sowmini Varadhan.

    4) We only support one offloaded xfrm, so reset crypto_done after
    the first transformation in xfrm_input(). Otherwise we may call
    the wrong input method for subsequent transformations.
    From Sowmini Varadhan.

    5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
    skb_dst_force does not really force a dst refcount anymore, it might
    clear it instead. xfrm code did not expect this, add a check to not
    dereference skb_dst() if it was cleared by skb_dst_force.

    6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
    read in xfrm_state_find. From Sean Tranchetti.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
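
Item 1 of the pull request (validating selector prefix lengths against the address family) can be sketched as below. This is an illustration under assumptions, not the kernel's validation code: `struct selector` and `selector_validate` are hypothetical, and the choice of error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, reduced selector: only the fields needed for the check. */
struct selector {
    int family;
    unsigned int prefixlen_s; /* source prefix length in bits */
    unsigned int prefixlen_d; /* destination prefix length in bits */
};

/* Reject prefix lengths that exceed the address width of the family,
 * so later address-matching code never shifts by an out-of-range count. */
int selector_validate(const struct selector *sel)
{
    unsigned int max_bits;

    assert(sel != NULL);
    switch (sel->family) {
    case AF_INET:
        max_bits = 32;
        break;
    case AF_INET6:
        max_bits = 128;
        break;
    default:
        return -EAFNOSUPPORT;
    }
    if (sel->prefixlen_s > max_bits || sel->prefixlen_d > max_bits)
        return -EINVAL;
    return 0;
}
```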
     

11 Sep, 2018

2 commits

  • Since commit 222d7dbd258d ("net: prevent dst uses after free")
    skb_dst_force() might clear the dst_entry attached to the skb.
    The xfrm code doesn't expect this to happen, so we crash with
    a NULL pointer dereference in this case. Fix it by checking
    skb_dst(skb) for NULL after skb_dst_force() and drop the packet
    in case the dst_entry was cleared.

    Fixes: 222d7dbd258d ("net: prevent dst uses after free")
    Reported-by: Tobias Hommel
    Reported-by: Kristian Evensen
    Reported-by: Wolfgang Walter
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • An SKB is not on a list if skb->next is NULL.

    Codify this convention into a helper function and use it
    where we are dequeueing an SKB and need to mark it as such.

    Signed-off-by: David S. Miller

    David S. Miller
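
The convention that a packet is off-list when its next pointer is NULL can be shown with a small singly linked queue. The names `struct pkt`, `pkt_mark_not_on_list`, and `pkt_dequeue` are hypothetical userspace stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet with an intrusive list link. */
struct pkt {
    struct pkt *next;
    int id;
};

/* The convention, codified in one place: next == NULL means "not on a list". */
void pkt_mark_not_on_list(struct pkt *p)
{
    assert(p != NULL);
    p->next = NULL;
}

/* Pop the first packet; the dequeued packet is marked as off-list so that
 * later code cannot follow a stale link into the old queue. */
struct pkt *pkt_dequeue(struct pkt **head)
{
    struct pkt *p;

    assert(head != NULL);
    p = *head;
    if (!p)
        return NULL;
    *head = p->next;
    pkt_mark_not_on_list(p);
    return p;
}
```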
     

23 Jun, 2018

1 commit

  • We already support setting an output mark at the xfrm_state, but
    unfortunately this does not support the input direction or
    masking the marks that will be applied to the skb. This change
    adds support for applying a masked value in both directions.

    The existing XFRMA_OUTPUT_MARK number is reused for this purpose
    and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

    An additional XFRMA_SET_MARK_MASK attribute is added for setting the
    mask. If the mask attribute is not provided, it is set to 0xffffffff,
    keeping the existing XFRMA_OUTPUT_MARK 'full mask' semantics.

    Co-developed-by: Tobias Brunner
    Co-developed-by: Eyal Birger
    Co-developed-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert
    Signed-off-by: Tobias Brunner
    Signed-off-by: Eyal Birger
    Signed-off-by: Lorenzo Colitti

    Steffen Klassert
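
The masked mark update described in this commit comes down to one bitwise expression: bits selected by the mask are taken from the configured value, all other bits of the packet mark are preserved. The sketch below uses hypothetical names (`struct mark_pair`, `apply_set_mark`) and is a userspace illustration rather than the kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value/mask pair, as would be configured on a state. */
struct mark_pair {
    uint32_t v; /* value to set */
    uint32_t m; /* mask selecting which bits of the packet mark to replace */
};

/* Replace only the masked bits of `mark`; keep the remaining bits. */
uint32_t apply_set_mark(uint32_t mark, const struct mark_pair *p)
{
    assert(p != NULL);
    return (p->v & p->m) | (mark & ~p->m);
}
```

With a mask of 0xffffffff the whole mark is overwritten, which corresponds to the 'full mask' default mentioned above; with a mask of 0xff only the low byte changes.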
     

16 Mar, 2018

1 commit


30 Nov, 2017

1 commit


31 Oct, 2017

1 commit

  • We reset the encapsulation field of the skb too early
    in xfrm_output. As a result, the GRE GSO handler does
    not segment the packets. This leads to a performance
    drop. We fix this by resetting the encapsulation
    field right before we do the transformation, when
    the inner headers become invalid.

    Fixes: f1bd7d659ef0 ("xfrm: Add encapsulation header offsets while SKB is not encrypted")
    Reported-by: Vicente De Luca
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

11 Aug, 2017

1 commit

  • On systems that use mark-based routing it may be necessary for
    routing lookups to use marks in order for packets to be routed
    correctly. An example of such a system is Android, which uses
    socket marks to route packets via different networks.

    Currently, routing lookups in tunnel mode always use a mark of
    zero, making routing incorrect on such systems.

    This patch adds a new output_mark element to the xfrm state and
    a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
    mark differs from the existing xfrm mark in two ways:

    1. The xfrm mark is used to match xfrm policies and states, while
    the xfrm output mark is used to set the mark (and influence
    the routing) of the packets emitted by those states.
    2. The existing mark is constrained to be a subset of the bits of
    the originating socket or transformed packet, but the output
    mark is arbitrary and depends only on the state.

    The use of a separate mark provides additional flexibility. For
    example:

    - A packet subject to two transforms (e.g., transport mode inside
    tunnel mode) can have two different output marks applied to it,
    one for the transport mode SA and one for the tunnel mode SA.
    - On a system where socket marks determine routing, the packets
    emitted by an IPsec tunnel can be routed based on a mark that
    is determined by the tunnel, not by the marks of the
    unencrypted packets.
    - Support for setting the output marks can be introduced without
    breaking any existing setups that employ both mark-based
    routing and xfrm tunnel mode. Simply changing the code to use
    the xfrm mark for routing output packets could
    change behaviour in a way that breaks these setups.

    If the output mark is unspecified or set to zero, the mark is not
    set or changed.

    Tested: make allyesconfig; make -j64
    Tested: https://android-review.googlesource.com/452776
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert

    Lorenzo Colitti
     

14 Apr, 2017

2 commits

  • Both esp4 and esp6 used to assume that the SKB payload is encrypted
    and therefore the inner_network and inner_transport offsets are
    not relevant.
    When doing crypto offload in the NIC, this is no longer the case
    and the NIC driver needs these offsets so it can do TX TCP checksum
    offloading.
    This patch sets the inner_network and inner_transport members of
    the SKB, as well as encapsulation, to reflect the actual positions
    of these headers, and removes them only once encryption is done
    on the payload.

    Signed-off-by: Ilan Tayari
    Signed-off-by: Steffen Klassert

    Ilan Tayari
     
  • This patch adds all the bits that are needed to do
    IPsec hardware offload for IPsec states and ESP packets.
    We add xfrmdev_ops to the net_device. xfrmdev_ops has
    function pointers that are needed to manage the xfrm
    states in the hardware and to do a per packet
    offloading decision.

    Joint work with:
    Ilan Tayari
    Guy Shapiro
    Yossi Kuperman

    Signed-off-by: Guy Shapiro
    Signed-off-by: Ilan Tayari
    Signed-off-by: Yossi Kuperman
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

10 Jan, 2017

1 commit

  • commit 44abdc3047aecafc141dfbaf1ed
    ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
    xfrm_state_put_afinfo equivalent to rcu_read_unlock.

    Use spatch to replace it with direct calls to rcu_read_unlock:

    @@
    struct xfrm_state_afinfo *a;
    @@

    - xfrm_state_put_afinfo(a);
    + rcu_read_unlock();

    old:
    text data bss dec hex filename
    22570 72 424 23066 5a1a xfrm_state.o
    1612 0 0 1612 64c xfrm_output.o
    new:
    22554 72 424 23050 5a0a xfrm_state.o
    1596 0 0 1596 63c xfrm_output.o

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

17 Mar, 2016

1 commit


16 Jan, 2016

1 commit

  • skb_gso_segment() uses the skb control block during segmentation.
    This patch adds 32 bytes of room for the previous control block, which
    will be copied into all resulting segments.

    This patch fixes a kernel crash when fragmenting forwarded packets.
    Fragmentation requires a valid IP CB in the skb for clearing IP options.
    The patch also removes the custom save/restore in the ovs code, which is
    now redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

08 Oct, 2015

3 commits


18 Sep, 2015

4 commits

  • In code review it was noticed that I had failed to add some blank lines
    in places where they are customarily used. Taking a second look at the
    code I have to agree blank lines would be nice so I have added them
    here.

    Reported-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Add a skb->sk parameter to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 May, 2015

1 commit


08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

21 Oct, 2014

1 commit

  • skb_gso_segment has three possible return values:
    1. a pointer to the first segmented skb
    2. an errno value (IS_ERR())
    3. NULL. This can happen when GSO is used for header verification.

    However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
    and would oops when NULL is returned.

    Note that these call sites should never actually see such a NULL return
    value; all callers mask out the GSO bits in the feature argument.

    However, there have been issues with some protocol handlers erroneously not
    respecting the specified feature mask in some cases.

    It is preferable to get 'have to turn off hw offloading, else slow' reports
    rather than 'kernel crashes'.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
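
The difference between the two checks can be reproduced in userspace. The helpers below are re-creations written for illustration (lower-case names to avoid suggesting they are the kernel macros); they assume the usual convention that the top 4095 values of the address space encode negative errno codes. `segments_usable` is a hypothetical wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True only for pointers in the error range; NULL is NOT an error here. */
int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* True for error pointers and for NULL. */
int is_err_or_null(const void *p)
{
    return p == NULL || is_err(p);
}

/* A caller that goes on to dereference the result must rule out both
 * failure modes, which is the point of the change described above. */
int segments_usable(const void *segs)
{
    return !is_err_or_null(segs);
}
```

A caller that tests only `is_err()` treats a NULL return as a valid segment list and dereferences it; testing `is_err_or_null()` turns that case into an ordinary failure path.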
     

10 Sep, 2014

1 commit


13 May, 2014

1 commit


19 Aug, 2013

1 commit

  • We need to choose the protocol family by skb->protocol. Otherwise we
    call the wrong xfrm{4,6}_local_error handler in case an ipv6 socket is
    used in ipv4 mode, in which case we should call down to xfrm4_local_error
    (ip6 sockets are a superset of ip4 ones).

    We are called before the ip_output functions, so skb->protocol is
    not reset.

    Cc: Steffen Klassert
    Acked-by: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
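
A sketch of the selection rule described here: the handler is chosen from the packet's protocol, not from the family of the socket that sent it. All names are hypothetical, and the protocol values are shown in host byte order for simplicity (an assumption; real code compares against network-order constants).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPV4 0x0800 /* EtherType for IPv4 */
#define PROTO_IPV6 0x86DD /* EtherType for IPv6 */

enum family { FAMILY_V4, FAMILY_V6 };
enum handler { HANDLER_NONE, HANDLER_V4, HANDLER_V6 };

/* Hypothetical packet: the sending socket's family plus the protocol
 * of the packet actually being sent. */
struct pkt {
    enum family sk_family;
    uint16_t protocol;
};

/* Pick the local-error handler from the packet protocol. sk_family is
 * deliberately ignored: an IPv6 socket may be sending IPv4 packets. */
enum handler pick_local_error(const struct pkt *p)
{
    assert(p != NULL);
    switch (p->protocol) {
    case PROTO_IPV4:
        return HANDLER_V4;
    case PROTO_IPV6:
        return HANDLER_V6;
    default:
        return HANDLER_NONE;
    }
}
```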
     

14 Aug, 2013

1 commit

  • In xfrm4 and xfrm6 we need to take care about sockets of the other
    address family. This could happen because a 6in4 or 4in6 tunnel could
    get protected by ipsec.

    Because we don't want to have a run-time dependency on ipv6 when only
    using ipv4 xfrm, we have to embed a pointer to the correct local_error
    function in xfrm_state_afinfo and look it up when returning an error
    depending on the socket address family.

    Thanks to vi0ss for the great bug report:

    v2:
    a) fix two more unsafe interpretations of skb->sk as ipv6 socket
    (xfrm6_local_dontfrag and __xfrm6_output)
    v3:
    a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
    building ipv6 as a module (thanks to Steffen Klassert)

    Reported-by:
    Cc: Steffen Klassert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
     

05 Jun, 2013

1 commit


23 May, 2013

1 commit

  • The error exit path needs err explicitly set. Otherwise it
    returns success and the only caller, xfrm_output_resume(),
    would oops in the skb_dst(skb)->ops dereference as skb_dst(skb) is
    NULL.

    Bug introduced in commit bb65a9cb (xfrm: removes a superfluous
    check and add a statistic).

    Signed-off-by: Timo Teräs
    Cc: Li RongQing
    Cc: Steffen Klassert
    Signed-off-by: David S. Miller

    Timo Teräs