04 Nov, 2018

1 commit

  • [ Upstream commit 215ab0f021c9fea3c18b75e7d522400ee6a49990 ]

    After commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ("vti6: fix PMTU caching
    and reporting on xmit"), some too big skbs might be passed down to
    __xfrm6_output, which then fails to transmit them but does not free the skb.
    This leaks the skb and, consequently, its dst references.

    After running pmtu.sh, that shows as failure to unregister devices in a namespace:

    [ 311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1

    The fix is to call kfree_skb in case of transmit failures.

    Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
    Signed-off-by: Thadeu Lima de Souza Cascardo
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Thadeu Lima de Souza Cascardo
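
The ownership rule behind this fix can be sketched in userspace C. This is a minimal illustration, not the kernel code: `mock_skb`, `mock_alloc_skb`, `mock_kfree_skb`, `mock_output` and the `live_skbs` counter are hypothetical names, and returning `-EMSGSIZE` for an oversized buffer is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the length matters here. */
struct mock_skb {
	size_t len;
};

/* Buffers currently allocated; a leak leaves this above zero. */
static int live_skbs;

static struct mock_skb *mock_alloc_skb(size_t len)
{
	struct mock_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->len = len;
	live_skbs++;
	return skb;
}

static void mock_kfree_skb(struct mock_skb *skb)
{
	assert(live_skbs > 0);
	free(skb);
	live_skbs--;
}

/*
 * The output function owns the buffer it is handed, so every exit path
 * has to consume it, including the error path.
 */
static int mock_output(struct mock_skb *skb, size_t mtu)
{
	if (skb->len > mtu) {
		mock_kfree_skb(skb);	/* free on transmit failure */
		return -EMSGSIZE;	/* assumed error code for this sketch */
	}

	/* Pretend to transmit; a lower layer would consume the buffer. */
	mock_kfree_skb(skb);
	return 0;
}
```

If the `mock_kfree_skb()` call in the error branch is removed, `live_skbs` stays at one after a failed send, which mirrors the "Usage count = 1" symptom in the log line above.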
     

14 Apr, 2017

1 commit

  • This patch adds all the bits that are needed to do
    IPsec hardware offload for IPsec states and ESP packets.
    We add xfrmdev_ops to the net_device. xfrmdev_ops has
    function pointers that are needed to manage the xfrm
    states in the hardware and to do a per packet
    offloading decision.

    Joint work with:
    Ilan Tayari
    Guy Shapiro
    Yossi Kuperman

    Signed-off-by: Guy Shapiro
    Signed-off-by: Ilan Tayari
    Signed-off-by: Yossi Kuperman
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflict was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Oct, 2015

1 commit

  • Commit 044a832a777 ("xfrm: Fix local error reporting crash
    with interfamily tunnels") moved the setting of skb->protocol
    behind the last access of the inner mode family to fix an
    interfamily crash. Unfortunately now skb->protocol might not
    be set at all, so we fail dispatch to the inner address family.
    As a result, the local error handler is not called and the
    mtu value is not reported back to userspace.

    We fix this by setting skb->protocol on message size errors
    before we call xfrm_local_error.

    Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels")
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

08 Oct, 2015

2 commits


02 Oct, 2015

1 commit


30 Sep, 2015

1 commit


18 Sep, 2015

4 commits

  • In code review it was noticed that I had failed to add some blank lines
    in places where they are customarily used. Taking a second look at the
    code I have to agree blank lines would be nice so I have added them
    here.

    Reported-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Add a skb->sk parameter to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
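
The netfilter change in this group, which replaces `dev_net(state->in?:state->out)` with `state->net`, can be illustrated with a small userspace mock. The struct layouts and the two helper names `hook_net_guessed` and `hook_net_passed` are assumptions made for this sketch and do not reproduce the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types; the real kernel structures carry far more state. */
struct net {
	int id;
};

struct net_device {
	struct net *nd_net;
};

struct nf_hook_state {
	struct net_device *in;
	struct net_device *out;
	struct net *net;	/* filled in by the caller of the hook */
};

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Old pattern: derive the namespace from whichever device is set. */
static struct net *hook_net_guessed(const struct nf_hook_state *state)
{
	return dev_net(state->in ? state->in : state->out);
}

/* New pattern: the call site already knows the namespace and passes it. */
static struct net *hook_net_passed(const struct nf_hook_state *state)
{
	assert(state->net != NULL);
	return state->net;
}
```

In the common case both helpers return the same namespace. They differ only where the call site passes something other than the first device's namespace, as in the exceptions listed in the commit message.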
     

04 Sep, 2015

1 commit

  • The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
    packets. That is, we perform fragmentation pre-encap rather than
    post-encap.

    A check was added later to ensure that proper MTU information is
    passed back for locally generated traffic. Unfortunately this
    check was performed on all IPsec packets, including transport-mode
    packets.

    What's more, the check failed to take GSO into account.

    The end result is that transport-mode GSO packets get dropped at
    the check.

    This patch fixes it by moving the tunnel mode check forward as well
    as adding the GSO check.

    Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
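
A simplified model of the corrected size check is sketched below. It only captures the two points named in the message: the check applies to tunnel mode alone, and GSO packets are judged by segment length rather than total length. The types and the function `mock_too_big` are hypothetical, and using `gso_seglen == 0` to mean "not a GSO packet" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum mock_mode {
	MOCK_MODE_TRANSPORT,
	MOCK_MODE_TUNNEL,
};

struct mock_pkt {
	enum mock_mode mode;
	size_t len;		/* total length of the packet */
	size_t gso_seglen;	/* per-segment length, 0 when not GSO */
};

/* Returns true when the packet should trigger a local MTU error. */
static bool mock_too_big(const struct mock_pkt *pkt, size_t mtu)
{
	/* Transport-mode packets are not fragmented pre-encap: skip them. */
	if (pkt->mode != MOCK_MODE_TUNNEL)
		return false;

	/* A GSO packet is segmented later, so compare the segment size. */
	if (pkt->gso_seglen)
		return pkt->gso_seglen > mtu;

	return pkt->len > mtu;
}
```

With this shape a 64000-byte transport-mode GSO packet with 1400-byte segments passes a 1500-byte MTU check, while a 2000-byte non-GSO tunnel-mode packet does not.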
     

08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

09 Feb, 2015

1 commit

  • We set the outer mode protocol too early. As a result, the
    local error handler might dispatch to the wrong address family
    and report the error to a wrong socket type. We fix this by
    setting the outer protocol to the skb after we accessed the
    inner mode for the last time, right before we do the actual
    encapsulation where we switch finally to the outer mode.

    Reported-by: Chris Ruehl
    Tested-by: Chris Ruehl
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

25 Aug, 2014

1 commit

  • This patch makes no changes to the logic of the code but simply addresses
    coding style issues as detected by checkpatch.

    Both objdump and diff -w show no differences.

    This patch removes some blank lines between the end of a function
    definition and the EXPORT_SYMBOL_GPL macro in order to prevent
    checkpatch warning that EXPORT_SYMBOL must immediately follow
    a function.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

24 May, 2014

1 commit

  • Conflicts:
    drivers/net/bonding/bond_alb.c
    drivers/net/ethernet/altera/altera_msgdma.c
    drivers/net/ethernet/altera/altera_sgdma.c
    net/ipv6/xfrm6_output.c

    Several cases of overlapping changes.

    The xfrm6_output.c has a bug fix which overlaps the renaming
    of skb->local_df to skb->ignore_df.

    In the Altera TSE driver cases, the register access cleanups
    in net-next overlapped with bug fixes done in net.

    Similarly a bug fix to send ALB packets in the bonding driver using
    the right source address overlaps with cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 May, 2014

1 commit

  • Conflicts:
    net/ipv4/ip_vti.c

    Steffen Klassert says:

    ====================
    pull request (net): ipsec 2014-05-15

    This pull request has a merge conflict in net/ipv4/ip_vti.c
    between commit 8d89dcdf80d8 ("vti: don't allow to add the same
    tunnel twice") and commit a32452366b72 ("vti4:Don't count header
    length twice"). It can be solved like it is done in linux-next.

    1) Fix an ipv6 xfrm output crash when a packet is rerouted
    by netfilter to not use IPsec.

    2) vti4 counts some header lengths twice leading to an incorrect
    device mtu. Fix this by counting these headers only once.

    3) We don't catch the case if an unsupported protocol is submitted
    to the xfrm protocol handlers, this can lead to NULL pointer
    dereferences. Fix this by adding the appropriate checks.

    4) vti6 may unregister pernet ops twice on init errors.
    Fix this by removing one of the calls to do it only once.
    From Mathias Krause.

    5) Set the vti tunnel mark before doing a lookup in the error
    handlers. Otherwise we don't find the correct xfrm state.
    ====================

    The conflict in ip_vti.c was simple, 'net' had a commit
    removing a line from vti_tunnel_init() and this tree
    being merged had a commit adding a line to the same
    location.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 May, 2014

1 commit

  • As suggested by several people, rename local_df to ignore_df,
    since it means "ignore df bit if it is set".

    Cc: Maciej Żenczykowski
    Cc: Florian Westphal
    Cc: David S. Miller
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    WANG Cong
     

16 Apr, 2014

1 commit

  • In the dst->output() path for ipv4, the code assumes the skb it has to
    transmit is attached to an inet socket, specifically via
    ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
    provider of the packet is an AF_PACKET socket.

    The dst->output() method gets an additional 'struct sock *sk'
    parameter. This needs a cascade of changes so that this parameter can
    be propagated from vxlan to final consumer.

    Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
    Reported-by: lucien xin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Apr, 2014

1 commit

  • The ipv6 xfrm output path is not aware that packets can be
    rerouted by NAT to not use IPsec. We crash in this case
    because we expect to have a xfrm state at the dst_entry.
    This crash happens if the ipv6 layer does IPsec and NAT
    or if we have an interfamily IPsec tunnel with ipv4 NAT.

    We fix this by checking for a NAT rerouted packet in each
    address family and dst_output() to the new destination
    in this case.

    Reported-by: Martin Pelikan
    Tested-by: Martin Pelikan
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

26 Aug, 2013

1 commit

  • In commit 0ea9d5e3e0e03a63b11392f5613378977dae7eca ("xfrm: introduce
    helper for safe determination of mtu") I switched the determination of
    ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in
    case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is
    never correct for ipv4 ipsec.

    This patch partly reverts 0ea9d5e3e0e03a63b11392f5613378977dae7eca
    ("xfrm: introduce helper for safe determination of mtu").

    Cc: Steffen Klassert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
     

19 Aug, 2013

1 commit


14 Aug, 2013

2 commits

  • skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we
    always have to make sure we are referring to the correct interpretation
    of skb->sk.

    We only depend on header defines to query the mtu, so we don't introduce
    a new dependency to ipv6 by this change.

    Cc: Steffen Klassert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
     
  • In xfrm4 and xfrm6 we need to take care about sockets of the other
    address family. This could happen because a 6in4 or 4in6 tunnel could
    get protected by ipsec.

    Because we don't want to have a run-time dependency on ipv6 when only
    using ipv4 xfrm we have to embed a pointer to the correct local_error
    function in xfrm_state_afinet and look it up when returning an error
    depending on the socket address family.

    Thanks to vi0ss for the great bug report:

    v2:
    a) fix two more unsafe interpretations of skb->sk as ipv6 socket
    (xfrm6_local_dontfrag and __xfrm6_output)
    v3:
    a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
    building ipv6 as a module (thanks to Steffen Klassert)

    Reported-by:
    Cc: Steffen Klassert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
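
The per-family dispatch described in the last commit can be sketched as a table of function pointers keyed by the socket's address family. Everything here is a hypothetical userspace mock: `mock_afinfo`, `mock_register_afinfo`, `mock_xfrm_local_error` and the `MOCK_AF_*` constants are illustration names, not the kernel API.

```c
#include <stddef.h>

enum {
	MOCK_AF_INET,
	MOCK_AF_INET6,
	MOCK_AF_MAX,
};

struct mock_sock {
	int family;
	unsigned int reported_mtu;
	int handled_by;		/* family whose handler ran, -1 if none */
};

typedef void (*local_error_fn)(struct mock_sock *sk, unsigned int mtu);

/* Per-family operations, registered only when that family is present. */
struct mock_afinfo {
	local_error_fn local_error;
};

static void mock_local_error4(struct mock_sock *sk, unsigned int mtu)
{
	sk->reported_mtu = mtu;
	sk->handled_by = MOCK_AF_INET;
}

static void mock_local_error6(struct mock_sock *sk, unsigned int mtu)
{
	sk->reported_mtu = mtu;
	sk->handled_by = MOCK_AF_INET6;
}

static const struct mock_afinfo mock_afinfo4 = { mock_local_error4 };
static const struct mock_afinfo mock_afinfo6 = { mock_local_error6 };

static const struct mock_afinfo *mock_afinfo_table[MOCK_AF_MAX];

static void mock_register_afinfo(int family, const struct mock_afinfo *afinfo)
{
	mock_afinfo_table[family] = afinfo;
}

/* Dispatch on the socket's family, not on the family of the caller. */
static int mock_xfrm_local_error(struct mock_sock *sk, unsigned int mtu)
{
	const struct mock_afinfo *afinfo;

	if (sk->family < 0 || sk->family >= MOCK_AF_MAX)
		return -1;

	afinfo = mock_afinfo_table[sk->family];
	if (!afinfo)
		return -1;	/* that family is not registered */

	afinfo->local_error(sk, mtu);
	return 0;
}
```

Because the handler is found through the table at run time, the IPv4 side never needs a link-time reference to the IPv6 handler, which is the dependency concern the message raises.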
     

01 Feb, 2012

1 commit

  • We don't check for NULL consistently in __xfrm6_output(). If "x" were
    NULL here it would lead to an OOPs later. I asked Steffen Klassert
    about this and he suggested that we remove the NULL check.

    On 10/29/11, Steffen Klassert wrote:
    >> net/ipv6/xfrm6_output.c
    >> 148
    >> 149 if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
    >> ^
    >
    > x can't be null here. It would be a bug if __xfrm6_output() is called
    > without a xfrm_state attached to the skb. I think we can just remove
    > this null check.

    Cc: Steffen Klassert
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     

23 Nov, 2011

1 commit


19 Oct, 2011

1 commit

  • Calling icmpv6_send() on a local message size error leads to
    an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
    to notify about the pmtu if the IPV6_DONTFRAG socket option is
    set on a udp or raw socket, according to RFC 3542, and use
    ipv6_local_error() otherwise.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

11 May, 2011

1 commit

  • As it is, we assign the outer mode's output function to the dst entry
    when we create the xfrm bundle. This leads to two problems in interfamily
    scenarios. We might insert ipv4 packets into ip6_fragment when called
    from xfrm6_output. The system crashes if we try to fragment an ipv4
    packet with ip6_fragment. This issue was introduced with git commit
    ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
    as needed). The second issue is, that we might insert ipv4 packets in
    netfilter6 and vice versa on interfamily scenarios.

    With this patch we assign the inner mode output function to the dst entry
    when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
    mode is used and the right fragmentation and netfilter functions are called.
    We switch then to outer mode with the output_finish functions.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

20 Dec, 2010

1 commit


25 Mar, 2010

1 commit


19 Feb, 2010

1 commit


03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of :
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
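
The three accessors listed in this commit can be mimicked in userspace as follows. This is a sketch under stated assumptions: the mock `struct sk_buff` holds only a private pointer named `_dst` (a hypothetical field name), and `dst_release` is reduced to a plain reference-count decrement.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types; only what the accessors need. */
struct dst_entry {
	int refcnt;
};

struct sk_buff {
	struct dst_entry *_dst;	/* hypothetical private field */
};

/* Simplified: drop one reference, tolerate NULL. */
static void dst_release(struct dst_entry *dst)
{
	if (dst) {
		assert(dst->refcnt > 0);
		dst->refcnt--;
	}
}

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	return skb->_dst;
}

static void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
	skb->_dst = dst;
}

/* Replaces the open-coded "dst_release(skb->dst); skb->dst = NULL;". */
static void skb_dst_drop(struct sk_buff *skb)
{
	dst_release(skb->_dst);
	skb->_dst = NULL;
}
```

Routing every access through these helpers is what lets the field itself be removed or changed later without touching each caller.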
     

07 Apr, 2009

1 commit

  • If an ipv4 packet (not locally generated with IP_DF flag not set) bigger
    than mtu size is supposed to go via a xfrm ipv6 tunnel, the packet size
    check in xfrm4_tunnel_check_size() is omitted and ipv6 drops the packet
    without sending a notice to the original sender of the ipv4 packet.

    Another issue is that ipv4 connection tracking does reassembling of
    incoming fragmented packets. If such a reassembled packet is supposed to
    go via a xfrm ipv6 tunnel it will be dropped, even if the original sender
    did proper fragmentation.

    According to RFC 2473 (section 7) tunnel ipv6 packets resulting from the
    encapsulation of an original packet are considered as locally generated
    packets. If such a packet passed the checks in xfrm{4,6}_tunnel_check_size()
    fragmentation is allowed according to RFC 2473 (section 7.1/7.2).

    This patch sets skb->local_df in xfrm6_prepare_output() to achieve
    fragmentation in this case.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

25 Mar, 2008

1 commit


13 Feb, 2008

1 commit

  • This is a long-standing bug in the IPsec IPv6 code that breaks
    when we emit an IPsec tunnel-mode datagram packet. The problem
    is that the code that emits the packet assumes the IPv6 stack
    will fragment it later, but the IPv6 stack assumes that whoever
    is emitting the packet is going to pre-fragment the packet.

    In the long term we need to fix both sides, e.g., to get the
    datagram code to pre-fragment as well as to get the IPv6 stack
    to fragment locally generated tunnel-mode packets.

    For now this patch does the second part which should make it
    work for the IPsec host case.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

29 Jan, 2008

4 commits

  • The IPv4 and IPv6 hook values are identical, yet some code tries to figure
    out the "correct" value by looking at the address family. Introduce NF_INET_*
    values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
    section for userspace compatibility.

    Signed-off-by: Patrick McHardy
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • The nhoff field isn't actually necessary in xfrm_input. For tunnel
    mode transforms we now throw away the output IP header so it makes no
    sense to fill in the nexthdr field. For transport mode we can now let
    the function transport_finish do the setting and it knows where the
    nexthdr field is.

    The only other thing that needs the nexthdr field to be set is the
    header extraction code. However, we can simply move the protocol
    extraction out of the generic header extraction.

    We want to minimise the amount of info we have to carry around between
    transforms as this simplifies the resumption process for async crypto.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As part of the work on asynchronous cryptographic operations, we need
    to be able to resume from the spot where they occur. As such, it
    helps if we isolate them to one spot.

    This patch moves most of the remaining family-specific processing into
    the common output code.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Most callers of the LOCAL_OUT chain will set the IP packet length
    before doing so. They also share the same output function dst_output.

    This patch creates a new function called ip6_local_out which does all
    of that and converts the appropriate users over to it.

    Apart from removing duplicate code, it will also help in merging the
    IPsec output path.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu