13 Jan, 2019

1 commit

  • [ Upstream commit 533555e5cbb6aa2d77598917871ae5b579fe724b ]

    xfrm_output_one() does not return an error code when there is
    no dst_entry attached to the skb, so it is still possible to crash
    with a NULL pointer dereference in xfrm_output_resume(). Fix
    it by returning the error code -EHOSTUNREACH.

    Fixes: 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.")
    Signed-off-by: Wei Yongjun
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Wei Yongjun
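The failure mode above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: struct model_skb, struct model_dst, model_output_one() and model_output_resume() are hypothetical stand-ins for the sk_buff, dst_entry, xfrm_output_one() and xfrm_output_resume() named in the commit. It shows that the resume loop dereferences the dst only after the per-step function returned 0, so a missing dst has to be reported as an error rather than as success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry in an xfrm bundle. */
struct model_dst {
	struct model_dst *child;  /* next dst in the bundle, NULL at the end */
	int is_xfrm;              /* nonzero while another transform remains */
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	struct model_dst *dst;    /* may be NULL, e.g. after a failed dst force */
	int transforms;           /* how many transforms were applied */
};

/* Models one transform step: never report success without a dst. */
static int model_output_one(struct model_skb *skb)
{
	if (skb->dst == NULL)
		return -EHOSTUNREACH;    /* returning 0 here would crash the caller */
	skb->transforms++;           /* stand-in for applying the SA */
	skb->dst = skb->dst->child;  /* pop to the next dst in the bundle */
	if (skb->dst == NULL)
		return -EHOSTUNREACH;
	return 0;
}

/* Models the resume loop: it trusts that 0 means skb->dst is valid. */
static int model_output_resume(struct model_skb *skb)
{
	int err;

	while ((err = model_output_one(skb)) == 0) {
		if (!skb->dst->is_xfrm)
			return 0;        /* plain route reached: hand off to output */
	}
	return err;
}
```

With the error code in place, a packet without a dst leaves the loop with -EHOSTUNREACH instead of reaching the skb->dst->is_xfrm dereference.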
     

04 Nov, 2018

1 commit

  • [ Upstream commit 9e1437937807b0122e8da1ca8765be2adca9aee6 ]

    Since commit 222d7dbd258d ("net: prevent dst uses after free")
    skb_dst_force() might clear the dst_entry attached to the skb.
    The xfrm code doesn't expect this to happen, so we crash with
    a NULL pointer dereference in this case. Fix it by checking
    skb_dst(skb) for NULL after skb_dst_force() and dropping the packet
    in case the dst_entry was cleared.

    Fixes: 222d7dbd258d ("net: prevent dst uses after free")
    Reported-by: Tobias Hommel
    Reported-by: Kristian Evensen
    Reported-by: Wolfgang Walter
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Steffen Klassert
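The "force, then re-check" pattern can be sketched in user space as follows. The names are hypothetical: model_dst_hold_safe() only roughly mirrors what dst_hold_safe() is for (refusing to pin an entry whose refcount has already dropped to zero), and model_skb_dst_force() is a simplified stand-in for skb_dst_force(). Returning -EHOSTUNREACH on the drop path is an assumption borrowed from the later fix listed above; this commit itself only drops the packet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: refcnt == 0 means the dst is being freed. */
struct model_dst { int refcnt; };

struct model_skb {
	struct model_dst *dst;
	int noref;    /* 1 if the skb borrows dst without holding a reference */
	int dropped;  /* set when the packet is discarded */
};

/* Take a reference only if the entry is not already dying. */
static int model_dst_hold_safe(struct model_dst *dst)
{
	if (dst->refcnt == 0)
		return 0;
	dst->refcnt++;
	return 1;
}

/* Turn a borrowed dst into a held one, or clear it if that is impossible. */
static void model_skb_dst_force(struct model_skb *skb)
{
	if (skb->noref && skb->dst) {
		if (!model_dst_hold_safe(skb->dst))
			skb->dst = NULL;
		skb->noref = 0;
	}
}

/* The caller must not assume the dst survived the force. */
static int model_prepare_output(struct model_skb *skb)
{
	model_skb_dst_force(skb);
	if (skb->dst == NULL) {
		skb->dropped = 1;
		return -EHOSTUNREACH;
	}
	return 0;
}
```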
     

30 May, 2018

1 commit

  • [ Upstream commit 46c0ef6e1eb95f619d9f62da4332749153db92f7 ]

    In xfrm_local_error(), rcu_read_unlock() should be called only when
    afinfo is not NULL, because xfrm_state_get_afinfo() already calls
    rcu_read_unlock() if afinfo is NULL.

    Fixes: af5d27c4e12b ("xfrm: remove xfrm_state_put_afinfo")
    Signed-off-by: Taehee Yoo
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
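The asymmetric locking contract described above (the getter returns with the read lock held on success, but releases it itself before returning NULL) can be modelled with a nesting counter. Everything here is a hypothetical user-space sketch: model_read_lock()/model_read_unlock() stand in for rcu_read_lock()/rcu_read_unlock(), and model_get_afinfo() knows only one family to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical stand-in for the RCU read-side section: a nesting counter. */
static int lock_depth;
static void model_read_lock(void)   { lock_depth++; }
static void model_read_unlock(void) { assert(lock_depth > 0); lock_depth--; }

struct model_afinfo { int family; };

static struct model_afinfo inet_afinfo = { AF_INET };

/* Like xfrm_state_get_afinfo(): lock held on success, dropped on failure. */
static struct model_afinfo *model_get_afinfo(int family)
{
	model_read_lock();
	if (family != inet_afinfo.family) {
		model_read_unlock();
		return NULL;
	}
	return &inet_afinfo;
}

/* Like the fixed xfrm_local_error(): unlock only when afinfo != NULL. */
static int model_local_error(int family)
{
	struct model_afinfo *afinfo = model_get_afinfo(family);

	if (!afinfo)
		return -1;  /* the getter already released the lock */
	/* ... afinfo->local_error(skb, mtu) would run here ... */
	model_read_unlock();
	return 0;
}
```

An unconditional unlock in model_local_error() would trip the assertion in model_read_unlock() on the NULL path, which is the imbalance the commit removes.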
     

31 Oct, 2017

1 commit

  • We reset the encapsulation field of the skb too early
    in xfrm_output. As a result, the GRE GSO handler does
    not segment the packets. This leads to a performance
    drop. We fix this by resetting the encapsulation
    field right before we do the transformation, when
    the inner headers become invalid.

    Fixes: f1bd7d659ef0 ("xfrm: Add encapsulation header offsets while SKB is not encrypted")
    Reported-by: Vicente De Luca
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

11 Aug, 2017

1 commit

  • On systems that use mark-based routing it may be necessary for
    routing lookups to use marks in order for packets to be routed
    correctly. An example of such a system is Android, which uses
    socket marks to route packets via different networks.

    Currently, routing lookups in tunnel mode always use a mark of
    zero, making routing incorrect on such systems.

    This patch adds a new output_mark element to the xfrm state and
    a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
    mark differs from the existing xfrm mark in two ways:

    1. The xfrm mark is used to match xfrm policies and states, while
    the xfrm output mark is used to set the mark (and influence
    the routing) of the packets emitted by those states.
    2. The existing mark is constrained to be a subset of the bits of
    the originating socket or transformed packet, but the output
    mark is arbitrary and depends only on the state.

    The use of a separate mark provides additional flexibility. For
    example:

    - A packet subject to two transforms (e.g., transport mode inside
    tunnel mode) can have two different output marks applied to it,
    one for the transport mode SA and one for the tunnel mode SA.
    - On a system where socket marks determine routing, the packets
    emitted by an IPsec tunnel can be routed based on a mark that
    is determined by the tunnel, not by the marks of the
    unencrypted packets.
    - Support for setting the output marks can be introduced without
    breaking any existing setups that employ both mark-based
    routing and xfrm tunnel mode. Simply changing the code to use
    the xfrm mark for routing output packets could change behaviour
    in a way that breaks these setups.

    If the output mark is unspecified or set to zero, the mark is not
    set or changed.

    Tested: make allyesconfig; make -j64
    Tested: https://android-review.googlesource.com/452776
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert

    Lorenzo Colitti
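The difference between the two marks can be sketched with simplified, hypothetical structs (model_state and model_skb are not the kernel's types). The match follows the usual value/mask form, (mark & mask) == value, used for state lookup, while the output mark is applied to emitted packets only when it is nonzero, as the commit message specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified SA state: only the mark-related fields. */
struct model_state {
	uint32_t mark_v;       /* xfrm mark value, used to match states */
	uint32_t mark_m;       /* xfrm mark mask */
	uint32_t output_mark;  /* XFRMA_OUTPUT_MARK; 0 means "unspecified" */
};

struct model_skb { uint32_t mark; };

/* Lookup side: does the packet's mark select this state? */
static int model_mark_match(const struct model_state *x, uint32_t mark)
{
	return (mark & x->mark_m) == x->mark_v;
}

/* Output side: set the mark on packets emitted by this state. */
static void model_apply_output_mark(const struct model_state *x,
				    struct model_skb *skb)
{
	if (x->output_mark)
		skb->mark = x->output_mark;  /* zero leaves the mark untouched */
}
```

Keeping the two fields separate means a tunnel SA can steer its encrypted packets onto a different routing table than the mark that selected it.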
     

14 Apr, 2017

2 commits

  • Both esp4 and esp6 used to assume that the SKB payload is encrypted
    and therefore the inner_network and inner_transport offsets are
    not relevant.
    When doing crypto offload in the NIC, this is no longer the case
    and the NIC driver needs these offsets so it can do TX TCP checksum
    offloading.
    This patch sets the inner_network and inner_transport members of
    the SKB, as well as encapsulation, to reflect the actual positions
    of these headers, and removes them only once encryption is done
    on the payload.

    Signed-off-by: Ilan Tayari
    Signed-off-by: Steffen Klassert

    Ilan Tayari
     
  • This patch adds all the bits that are needed to do
    IPsec hardware offload for IPsec states and ESP packets.
    We add xfrmdev_ops to the net_device. xfrmdev_ops has
    function pointers that are needed to manage the xfrm
    states in the hardware and to do a per packet
    offloading decision.

    Joint work with:
    Ilan Tayari
    Guy Shapiro
    Yossi Kuperman

    Signed-off-by: Guy Shapiro
    Signed-off-by: Ilan Tayari
    Signed-off-by: Yossi Kuperman
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

10 Jan, 2017

1 commit

  • commit 44abdc3047aecafc141dfbaf1ed
    ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
    xfrm_state_put_afinfo equivalent to rcu_read_unlock.

    Use spatch to replace it with direct calls to rcu_read_unlock:

    @@
    struct xfrm_state_afinfo *a;
    @@

    - xfrm_state_put_afinfo(a);
    + rcu_read_unlock();

    old:
       text    data     bss     dec     hex filename
      22570      72     424   23066    5a1a xfrm_state.o
       1612       0       0    1612     64c xfrm_output.o
    new:
       text    data     bss     dec     hex filename
      22554      72     424   23050    5a0a xfrm_state.o
       1596       0       0    1596     63c xfrm_output.o

    Signed-off-by: Florian Westphal
    Signed-off-by: Steffen Klassert

    Florian Westphal
     

17 Mar, 2016

1 commit


16 Jan, 2016

1 commit

  • skb_gso_segment() uses the skb control block during segmentation.
    This patch adds 32 bytes of room for the previous control block, which
    will be copied into all resulting segments.

    This patch fixes a kernel crash when fragmenting forwarded packets.
    Fragmentation requires a valid IP CB in the skb for clearing IP options.
    The patch also removes the custom save/restore in the OVS code, which is
    now redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

08 Oct, 2015

3 commits


18 Sep, 2015

4 commits

  • In code review it was noticed that I had failed to add some blank lines
    in places where they are customarily used. Taking a second look at the
    code I have to agree blank lines would be nice so I have added them
    here.

    Reported-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
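The okfn change can be pictured with a small user-space sketch. All types and functions here are hypothetical stand-ins (model_net, model_sock, model_sk_buff, model_nf_hook() and friends); only the shape follows the commit message: continuations receive the namespace as an argument, and an adapter in the spirit of dst_output_okfn accepts it and silently drops it for an output function that does not need it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, struct sock and struct sk_buff. */
struct model_net  { int id; };
struct model_sock { int id; };
struct model_sk_buff {
	int len;
	struct model_net *seen_net;  /* which namespace processed the packet */
};

/* okfn shape after the change: the namespace is an explicit argument. */
typedef int (*model_okfn_t)(struct model_net *net, struct model_sock *sk,
			    struct model_sk_buff *skb);

/* An output function that does not take a namespace. */
static int model_dst_output(struct model_sock *sk, struct model_sk_buff *skb)
{
	(void)sk;
	return skb->len;  /* pretend the whole packet was sent */
}

/* Adapter: accept net and silently drop it. */
static int model_dst_output_okfn(struct model_net *net, struct model_sock *sk,
				 struct model_sk_buff *skb)
{
	(void)net;
	return model_dst_output(sk, skb);
}

/* A continuation that uses the namespace it is handed instead of
 * re-deriving it from a device. */
static int model_finish(struct model_net *net, struct model_sock *sk,
			struct model_sk_buff *skb)
{
	skb->seen_net = net;
	return model_dst_output_okfn(net, sk, skb);
}

/* Models a netfilter hook that accepts the packet and calls okfn. */
static int model_nf_hook(struct model_net *net, struct model_sock *sk,
			 struct model_sk_buff *skb, model_okfn_t okfn)
{
	return okfn(net, sk, skb);
}
```

The caller that knows the packet's path picks the namespace once; every continuation down the chain then receives it without guessing.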
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Add an skb->sk argument to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 May, 2015

1 commit


08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    Second, there is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

21 Oct, 2014

1 commit

  • skb_gso_segment has three possible return values:
    1. a pointer to the first segmented skb
    2. an errno value (IS_ERR())
    3. NULL. This can happen when GSO is used for header verification.

    However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
    and would oops when NULL is returned.

    Note that these call sites should never actually see such a NULL return
    value; all callers mask out the GSO bits in the feature argument.

    However, there have been issues with some protocol handlers erroneously not
    respecting the specified feature mask in some cases.

    It is preferable to get 'have to turn off hw offloading, else slow' reports
    rather than 'kernel crashes'.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
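The three possible return values can be handled with the error-pointer idiom. The helpers below are a simplified user-space re-implementation of the kernel's ERR_PTR/PTR_ERR/IS_ERR/IS_ERR_OR_NULL (errors encoded in the top MAX_ERRNO addresses), not the kernel header itself; model_handle_segs() is a hypothetical caller, and mapping NULL to -EINVAL is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space re-implementation of the error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the last MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

struct model_seg { int n; };

/* Hypothetical caller: treat NULL like an error instead of walking it. */
static int model_handle_segs(struct model_seg *segs)
{
	if (IS_ERR_OR_NULL(segs))
		return segs ? (int)PTR_ERR(segs) : -EINVAL;
	return segs->n;  /* safe: segs is a real pointer here */
}
```

A caller that tested only IS_ERR() would fall through to segs->n with a NULL pointer, which is the oops the commit guards against.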
     

10 Sep, 2014

1 commit


13 May, 2014

1 commit


19 Aug, 2013

1 commit

  • We need to choose the protocol family by skb->protocol. Otherwise we
    call the wrong xfrm{4,6}_local_error handler in case an ipv6 socket is
    used in ipv4 mode, in which case we should call down to xfrm4_local_error
    (ip6 sockets are a superset of ip4 ones).

    We are called before the ip_output functions, so skb->protocol is
    not reset.

    Cc: Steffen Klassert
    Acked-by: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
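Choosing the family from the packet rather than from the socket can be sketched as below. The struct and function are hypothetical; the EtherType values 0x0800 and 0x86DD are the standard ones behind ETH_P_IP and ETH_P_IPV6, redefined locally so the sketch does not depend on Linux-only headers. skb->protocol is kept in network byte order, hence the htons() comparisons.

```c
#include <arpa/inet.h>   /* htons */
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

#define MODEL_ETH_P_IP   0x0800  /* same value as ETH_P_IP */
#define MODEL_ETH_P_IPV6 0x86DD  /* same value as ETH_P_IPV6 */

/* Hypothetical packet: protocol in network byte order, like skb->protocol. */
struct model_skb {
	uint16_t protocol;
	int sk_family;  /* family of the originating socket (not used below) */
};

/* Pick the error handler family from the packet, not the socket:
 * an AF_INET6 socket may be sending IPv4 traffic. */
static int model_error_family(const struct model_skb *skb)
{
	if (skb->protocol == htons(MODEL_ETH_P_IP))
		return AF_INET;
	if (skb->protocol == htons(MODEL_ETH_P_IPV6))
		return AF_INET6;
	return -1;  /* unknown protocol: report nothing */
}
```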
     

14 Aug, 2013

1 commit

  • In xfrm4 and xfrm6 we need to take care about sockets of the other
    address family. This could happen because a 6in4 or 4in6 tunnel could
    get protected by ipsec.

    Because we don't want to have a run-time dependency on ipv6 when only
    using ipv4 xfrm we have to embed a pointer to the correct local_error
    function in xfrm_state_afinfo and look it up when returning an error
    depending on the socket address family.

    Thanks to vi0ss for the great bug report:

    v2:
    a) fix two more unsafe interpretations of skb->sk as ipv6 socket
    (xfrm6_local_dontfrag and __xfrm6_output)
    v3:
    a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
    building ipv6 as a module (thanks to Steffen Klassert)

    Reported-by:
    Cc: Steffen Klassert
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Steffen Klassert

    Hannes Frederic Sowa
     

05 Jun, 2013

1 commit


23 May, 2013

1 commit

  • The error exit path needs err explicitly set. Otherwise it
    returns success and the only caller, xfrm_output_resume(),
    would oops in skb_dst(skb)->ops dereference as skb_dst(skb) is
    NULL.

    Bug introduced in commit bb65a9cb (xfrm: removes a superfluous
    check and add a statistic).

    Signed-off-by: Timo Teräs
    Cc: Li RongQing
    Cc: Steffen Klassert
    Signed-off-by: David S. Miller

    Timo Teräs
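The bug class fixed here, a goto into a shared error exit without setting err, can be shown with a hypothetical user-space function. model_state, its counter field and the choice of -EINVAL are assumptions for the sketch; the point is only that every jump to the error label must carry a nonzero err, or the caller sees success for a packet that was dropped.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical simplified state. */
struct model_state {
	int valid;
	int out_state_invalid;  /* stand-in for a MIB error counter */
};

static int model_output_one(struct model_state *x)
{
	int err = 0;

	if (!x->valid) {
		x->out_state_invalid++;
		err = -EINVAL;  /* without this line the function returns 0 */
		goto error;
	}
	/* ... transform the packet ... */
	return 0;

error:
	/* ... free the packet ... */
	return err;
}
```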
     

01 Feb, 2013

1 commit


07 Jan, 2013

1 commit

  • Remove the check whether x->km.state equals XFRM_STATE_VALID from
    xfrm_state_check_expire(), since that check is already done before
    xfrm_state_check_expire() is called.

    Add a LINUX_MIB_XFRMOUTSTATEINVALID statistic to record
    outbound errors due to an invalid xfrm state.

    Signed-off-by: Li RongQing
    Signed-off-by: Steffen Klassert

    Li RongQing
     

23 Mar, 2012

1 commit


28 Mar, 2011

2 commits


14 Mar, 2011

2 commits


17 Sep, 2010

1 commit


05 Jun, 2010

1 commit

  • xfrm triggers a warning if dst_pop() drops a refcount
    on a noref dst. This patch changes dst_pop() to
    skb_dst_pop(). skb_dst_pop() drops the refcnt only
    on a refcounted dst. Also we don't clone the child
    dst_entry, so it is not refcounted and we can use
    skb_dst_set_noref() in xfrm_output_one().

    Signed-off-by: Steffen Klassert
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Steffen Klassert
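The refcounted/noref distinction can be modelled by tagging the low bit of the stored dst pointer, similar in spirit to the kernel's SKB_DST_NOREF flag. All names below are hypothetical user-space stand-ins. model_skb_dst_pop() returns the child without taking a reference and drops a reference on the current dst only if the skb actually held one, which is why the child can then be attached with model_skb_dst_set_noref().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bit of refdst set means "borrowed, no reference held". */
#define MODEL_DST_NOREF 1UL

struct model_dst {
	int refcnt;
	struct model_dst *child;  /* next dst in the xfrm bundle */
};

struct model_skb { uintptr_t refdst; };

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return (struct model_dst *)(skb->refdst & ~MODEL_DST_NOREF);
}

static void model_skb_dst_set_noref(struct model_skb *skb,
				    struct model_dst *dst)
{
	skb->refdst = (uintptr_t)dst | MODEL_DST_NOREF;
}

/* Return the (unreferenced) child; release the current dst only if held. */
static struct model_dst *model_skb_dst_pop(struct model_skb *skb)
{
	struct model_dst *dst = model_skb_dst(skb);
	struct model_dst *child = dst->child;

	if (!(skb->refdst & MODEL_DST_NOREF))
		dst->refcnt--;
	skb->refdst = 0;
	return child;
}
```

Dropping unconditionally, as a plain dst_pop() would, decrements a refcount the skb never owned, which is the case the warning catches.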
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set the dst attached to an skb:

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
        dst_release(skb->dst)
        skb->dst = NULL;

    Delete the skb->dst field.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
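The accessor pattern can be sketched in user space with hypothetical stand-in types; only the three signatures' roles come from the commit message. The field is renamed so that direct access stands out, skb_dst_set() takes over the caller's reference, and skb_dst_drop() bundles the release with the NULL assignment that callers previously open-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the refcount matters here. */
struct model_dst { int refcnt; };
struct model_skb { struct model_dst *dst_; };  /* use the accessors below */

static void model_dst_release(struct model_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

static struct model_dst *model_skb_dst(const struct model_skb *skb)
{
	return skb->dst_;
}

/* The caller's reference on dst is transferred to the skb. */
static void model_skb_dst_set(struct model_skb *skb, struct model_dst *dst)
{
	skb->dst_ = dst;
}

/* Replaces "dst_release(skb->dst); skb->dst = NULL;". */
static void model_skb_dst_drop(struct model_skb *skb)
{
	model_dst_release(skb->dst_);
	skb->dst_ = NULL;
}
```

Because drop clears the pointer, calling it twice is harmless, whereas two open-coded dst_release() calls would release the reference twice.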
     

26 Nov, 2008

2 commits


30 Sep, 2008

1 commit