30 May, 2018

1 commit

  • [ Upstream commit d52e5a7e7ca49457dd31fc8b42fb7c0d58a31221 ]

    Prior to the rework of PMTU information storage in commit
    2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."),
    when a PMTU event advertising a PMTU smaller than
    net.ipv4.route.min_pmtu was received, we would disable setting the DF
    flag on packets by locking the MTU metric, and set the PMTU to
    net.ipv4.route.min_pmtu.

    Since then, we don't disable DF, and set PMTU to
    net.ipv4.route.min_pmtu, so the intermediate router that has this link
    with a small MTU will have to drop the packets.

    This patch reestablishes pre-2.6.39 behavior by splitting
    rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
    rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
    and is checked in ip_dont_fragment().

    One possible workaround is to set net.ipv4.route.min_pmtu to a value low
    enough to accommodate the lowest MTU encountered.
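
The bitfield split described above can be pictured with a minimal user-space sketch. This is not the kernel code: struct rt_sketch, sketch_update_pmtu() and sketch_dont_fragment() are hypothetical names, and only rt_mtu_locked, rt_pmtu and the min_pmtu clamping come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the PMTU fields of struct rtable (not the real layout). */
struct rt_sketch {
	unsigned int rt_mtu_locked:1;  /* 1: do not set DF on this path */
	unsigned int rt_pmtu:31;       /* learned path MTU */
};

/* Record a PMTU event: clamp to min_pmtu and lock the MTU when the
 * advertised value is smaller than min_pmtu. */
static void sketch_update_pmtu(struct rt_sketch *rt, unsigned int advertised,
			       unsigned int min_pmtu)
{
	assert(rt != NULL);
	if (advertised < min_pmtu) {
		rt->rt_mtu_locked = 1;
		rt->rt_pmtu = min_pmtu;
	} else {
		rt->rt_mtu_locked = 0;
		rt->rt_pmtu = advertised;
	}
}

/* Hypothetical analogue of the check in ip_dont_fragment(): DF is set only
 * when path MTU discovery is wanted and the MTU is not locked. */
static bool sketch_dont_fragment(const struct rt_sketch *rt, bool want_pmtud)
{
	return want_pmtud && !rt->rt_mtu_locked;
}
```

With min_pmtu at 552, an advertised PMTU of 300 would leave rt_pmtu at 552 with the lock bit set, so DF is no longer requested on that path.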

    Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

11 Aug, 2017

1 commit

  • On systems that use mark-based routing it may be necessary for
    routing lookups to use marks in order for packets to be routed
    correctly. An example of such a system is Android, which uses
    socket marks to route packets via different networks.

    Currently, routing lookups in tunnel mode always use a mark of
    zero, making routing incorrect on such systems.

    This patch adds a new output_mark element to the xfrm state and
    a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
    mark differs from the existing xfrm mark in two ways:

    1. The xfrm mark is used to match xfrm policies and states, while
    the xfrm output mark is used to set the mark (and influence
    the routing) of the packets emitted by those states.
    2. The existing mark is constrained to be a subset of the bits of
    the originating socket or transformed packet, but the output
    mark is arbitrary and depends only on the state.

    The use of a separate mark provides additional flexibility. For
    example:

    - A packet subject to two transforms (e.g., transport mode inside
    tunnel mode) can have two different output marks applied to it,
    one for the transport mode SA and one for the tunnel mode SA.
    - On a system where socket marks determine routing, the packets
    emitted by an IPsec tunnel can be routed based on a mark that
    is determined by the tunnel, not by the marks of the
    unencrypted packets.
    - Support for setting the output marks can be introduced without
    breaking any existing setups that employ both mark-based
    routing and xfrm tunnel mode. Simply changing the code to use
    the xfrm mark for routing output packets could change
    behaviour in a way that breaks these setups.

    If the output mark is unspecified or set to zero, the mark is not
    set or changed.
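
The zero-means-unchanged rule above can be sketched as follows. The types and the function name are hypothetical stand-ins, not the xfrm data structures; only the output_mark semantics are taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an xfrm state and an emitted packet. */
struct state_sketch {
	uint32_t output_mark;  /* 0 means "unspecified" */
};

struct packet_sketch {
	uint32_t mark;
};

/* Apply the state's output mark to a packet it emits. A zero output mark
 * leaves the packet's existing mark untouched. */
static void sketch_apply_output_mark(struct packet_sketch *pkt,
				     const struct state_sketch *x)
{
	assert(pkt && x);
	if (x->output_mark)
		pkt->mark = x->output_mark;
}
```

A packet carrying mark 7 keeps it when the state has no output mark, and is re-marked when the state specifies one.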

    Tested: make allyesconfig; make -j64
    Tested: https://android-review.googlesource.com/452776
    Signed-off-by: Lorenzo Colitti
    Signed-off-by: Steffen Klassert

    Lorenzo Colitti
     

19 Jul, 2017

2 commits

  • After rcu conversions performance degradation in forward tests isn't that
    noticeable anymore.

    See next patch for some numbers.

    A followup patch could then also remove genid from the policies
    as we do not cache bundles anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • revert c386578f1cdb4dac230395 ("xfrm: Let the flowcache handle its size by default.").

    Once we remove flow cache, we don't have a flow cache limit anymore.
    We must not allow (virtually) unlimited allocations of xfrm dst entries.
    Revert back to the old xfrm dst gc limits.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

09 Feb, 2017

3 commits


13 Sep, 2016

1 commit


11 Sep, 2016

1 commit


09 Sep, 2016

1 commit

  • Steffen Klassert says:

    ====================
    ipsec 2016-09-08

    1) Fix a crash when xfrm_dump_sa returns an error.
    From Vegard Nossum.

    2) Remove some incorrect WARN() on normal error handling.
    From Vegard Nossum.

    3) Ignore socket policies when rebuilding hash tables,
    socket policies are not inserted into the hash tables.
    From Tobias Brunner.

    4) Initialize and check tunnel pointers properly before
    we use them. From Alexey Kodanev.

    5) Fix l3mdev oif setting on xfrm dst lookups.
    From David Ahern.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Aug, 2016

1 commit

  • Subash reported that commit 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
    broke a wifi use case that uses fib rules and xfrms. The intent of
    42a7b32b73d6 was driven by VRFs with IPsec. As a compromise relax the
    use of oif in xfrm lookups to L3 master devices only (ie., oif is either
    an L3 master device or is enslaved to a master device).

    Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
    Reported-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: David Ahern
    Signed-off-by: Steffen Klassert

    David Ahern
     

17 Jun, 2016

1 commit

  • Modern C standards expect the '__inline__' keyword to come before the return
    type in a declaration, and we get a couple of warnings for this with "make W=1"
    in the xfrm{4,6}_policy.c files:

    net/ipv6/xfrm6_policy.c:369:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
    static int inline xfrm6_net_sysctl_init(struct net *net)
    net/ipv6/xfrm6_policy.c:374:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
    static void inline xfrm6_net_sysctl_exit(struct net *net)
    net/ipv4/xfrm4_policy.c:339:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
    static int inline xfrm4_net_sysctl_init(struct net *net)
    net/ipv4/xfrm4_policy.c:344:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
    static void inline xfrm4_net_sysctl_exit(struct net *net)
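
For illustration, the ordering the compiler asks for looks like this. The function name is hypothetical; the old ordering is shown only in a comment.

```c
#include <assert.h>

/* Old ordering, which triggers -Wold-style-declaration:
 *
 *     static int inline sketch_sysctl_init(void)
 *
 * Ordering accepted without a warning: 'inline' before the return type. */
static inline int sketch_sysctl_init(void)
{
	return 0;
}
```

Both orderings declare the same function; only the position of the keyword differs.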

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

23 Dec, 2015

1 commit


03 Nov, 2015

1 commit

  • Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
    templates; their dst_entries counters will never be used. Move the
    xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
    xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
    and dst_entries_destroy for each net namespace.

    The ipv4 and ipv6 xfrms each create dst_ops template, and perform
    dst_entries_init on the templates. The template values are copied to each
    net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops
    pcpuc_entries field is a percpu counter and cannot be used correctly by
    simply copying it to another object.

    The result of this is a very subtle bug; changes to the dst entries
    counter from one net namespace may sometimes get applied to a different
    net namespace dst entries counter. This is because of how the percpu
    counter works; it has a main count field as well as a pointer to the
    percpu variables. Each net namespace maintains its own main count
    variable, but all point to one set of percpu variables. When any net
    namespace happens to change one of the percpu variables to outside its
    small batch range, its count is moved to the net namespace's main count
    variable. So with multiple net namespaces operating concurrently, the
    dst_ops entries counter can stray from the actual value that it should
    be; if counts are consistently moved from one net namespace to another
    (which my testing showed is likely), then one net namespace winds up
    with a negative dst_ops count while another winds up with a continually
    increasing count, eventually reaching its gc_thresh limit, which causes
    all new traffic on the net namespace to fail with -ENOBUFS.
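
The drift described above can be reproduced with a small user-space model. This is a sketch under stated assumptions: struct pc_sketch and pc_sketch_add() are hypothetical and only imitate the main-count-plus-per-CPU-delta design; the CPU count and batch size are made-up values.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 2
#define SKETCH_BATCH   4   /* illustrative batch size */

/* Model of a percpu counter: a main count plus a pointer to per-CPU deltas.
 * Copying the struct copies the pointer, so the copies share the deltas. */
struct pc_sketch {
	int64_t count;
	int32_t *counters;
};

/* Add 'amount' on 'cpu'. Once the per-CPU delta leaves the batch range,
 * the whole delta is folded into the main count of the copy doing the add. */
static void pc_sketch_add(struct pc_sketch *pc, int cpu, int32_t amount)
{
	int32_t local;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	local = pc->counters[cpu] + amount;
	if (local >= SKETCH_BATCH || local <= -SKETCH_BATCH) {
		pc->count += local;
		pc->counters[cpu] = 0;
	} else {
		pc->counters[cpu] = local;
	}
}
```

If two copies made from one template share the delta array, entries added through one copy can be folded into the other copy's main count, which is how one namespace ends up negative while another keeps growing.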

    Signed-off-by: Dan Streetman
    Signed-off-by: Dan Streetman
    Signed-off-by: Steffen Klassert

    Dan Streetman
     

30 Oct, 2015

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2015-10-30

    1) The flow cache is limited by the flow cache limit which
    depends on the number of cpus and the xfrm garbage collector
    threshold which is independent of the number of cpus. This
    leads to the fact that on systems with more than 16 cpus
    we hit the xfrm garbage collector limit and refuse new
    allocations, so new flows are dropped. On systems with 16
    or less cpus, we hit the flowcache limit. In this case, we
    shrink the flow cache instead of refusing new flows.

    We increase the xfrm garbage collector threshold to INT_MAX
    to get the same behaviour, independent of the number of cpus.

    2) Fix some unaligned accesses on sparc systems.
    From Sowmini Varadhan.

    3) Fix some header checks in _decode_session4. We may call
    pskb_may_pull with a negative value converted to unsigned
    int from pskb_may_pull. This can lead to incorrect policy
    lookups. We fix this by a check of the data pointer position
    before we call pskb_may_pull.

    4) Reload skb header pointers after calling pskb_may_pull
    in _decode_session4 as this may change the pointers into
    the packet.

    5) Add a missing statistic counter on inner mode errors.

    Please pull or let me know if there are problems.
    ====================
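
Item 3 above hinges on what happens when a negative length reaches an unsigned parameter. The following sketch shows the conversion and a sign check made before the call. All names are hypothetical; sketch_may_pull() only stands in for a helper with an unsigned length and does not reproduce pskb_may_pull or the actual fix.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pull helper: succeeds if len bytes are available. */
static bool sketch_may_pull(unsigned int available, unsigned int len)
{
	return len <= available;
}

/* What an unsigned length parameter receives when handed a negative int. */
static unsigned int sketch_as_unsigned_len(int len)
{
	return (unsigned int)len;  /* -4 becomes UINT_MAX - 3 */
}

/* Guarded variant: inspect the sign before the conversion can happen.
 * Sets *negative so the caller can decide how to treat that case. */
static bool sketch_may_pull_checked(unsigned int available, int len,
				    bool *negative)
{
	assert(negative);
	*negative = len < 0;
	if (*negative)
		return false;
	return sketch_may_pull(available, (unsigned int)len);
}
```

Without the sign check, the helper sees a huge length rather than a negative one, so the caller cannot tell the two situations apart.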

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

2 commits


30 Sep, 2015

1 commit


29 Sep, 2015

1 commit

  • The xfrm flowcache size is limited by the flowcache limit
    (4096 * number of online cpus) and the xfrm garbage collector
    threshold (2 * 32768), whichever is reached first. This means
    that we can hit the garbage collector limit only on systems
    with more than 16 cpus. On such systems we simply refuse
    new allocations if we reach the limit, so new flows are dropped.
    On systems with 16 or less cpus, we hit the flowcache limit.
    In this case, we shrink the flow cache instead of refusing new
    flows.

    We increase the xfrm garbage collector threshold to INT_MAX
    to get the same behaviour, independent of the number of cpus.

    The xfrm garbage collector threshold can still be set below
    the flowcache limit to reduce the memory usage of the flowcache.
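
The 16-cpu crossover follows from the two numbers quoted above: 2 * 32768 = 65536 = 4096 * 16. A small sketch of that arithmetic, with hypothetical function names and with the tie at exactly 16 cpus resolved the way the message describes it (flowcache limit first):

```c
#include <assert.h>

/* Flowcache limit from the message: 4096 entries per online cpu. */
static long sketch_flowcache_limit(int online_cpus)
{
	assert(online_cpus > 0);
	return 4096L * online_cpus;
}

/* Old xfrm gc refusal limit from the message: 2 * 32768. */
static long sketch_gc_refuse_limit(void)
{
	return 2L * 32768;
}

/* Returns 1 when the gc limit is reached before the flowcache limit. */
static int sketch_gc_limit_hit_first(int online_cpus)
{
	return sketch_gc_refuse_limit() < sketch_flowcache_limit(online_cpus);
}
```

With 17 or more cpus the flowcache limit exceeds 65536, so the gc limit is the one reached first.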

    Tested-by: Dan Streetman
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

27 Sep, 2015

1 commit


18 Sep, 2015

1 commit

  • Steffen reported that the recent change to add oif to dst lookups breaks
    the VTI use case. The problem is that with the oif set in the flow struct
    the comparison to the nh_oif is triggered. Fix by splitting the
    FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
    bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
    nh oif (FLOWI_FLAG_SKIP_NH_OIF).

    Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")

    Signed-off-by: David Ahern
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David Ahern
     

16 Sep, 2015

1 commit


26 Aug, 2015

1 commit

    Directs route lookups to VRF table. Compiles out if NET_VRF is not
    enabled. With this patch it is possible to bring up ipsec tunnels
    in VRFs, even with duplicate network configuration.

    Signed-off-by: David Ahern
    Acked-by: Nikolay Aleksandrov
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    David Ahern
     

11 Aug, 2015

1 commit

  • Rules can be installed that direct route lookups to specific tables based
    on oif. Plumb the oif through the xfrm lookups so it gets set in the flow
    struct and passed to the resolver routines.

    Signed-off-by: David Ahern
    Signed-off-by: Steffen Klassert

    David Ahern
     

04 Apr, 2015

1 commit

  • The ipv4 code uses a mixture of coding styles. In some instances check
    for NULL pointer is done as x == NULL and sometimes as !x. !x is
    preferred according to checkpatch and this patch makes the code
    consistent by adopting the latter form.
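
The two spellings are equivalent, which is why the change is purely stylistic. A tiny sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Form being replaced: explicit comparison against NULL. */
static bool sketch_is_missing_old(const void *x)
{
	return x == NULL;
}

/* Form preferred by checkpatch: logical negation of the pointer. */
static bool sketch_is_missing_new(const void *x)
{
	return !x;
}
```

Both helpers return the same result for any pointer value.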

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

10 Mar, 2015

1 commit

  • After my change to neigh_hh_init to obtain the protocol from the
    neigh_table there are no more users of protocol in struct dst_ops.
    Remove the protocol field from dst_ops and all of its initializers.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

14 Mar, 2014

1 commit


01 Nov, 2013

1 commit

  • On some codepaths the skb does not have a dst entry
    when xfrm_decode_session() is called. So check for
    a valid skb_dst() before dereferencing the device
    interface index. We use 0 as the device index if
    there is no valid skb_dst(), or at reverse decoding
    we use skb_iif as device interface index.
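
The selection of the interface index described above can be sketched like this. The structs are simplified, hypothetical stand-ins: a NULL dst_dev models an skb without a valid skb_dst().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_sketch {
	int ifindex;
};

struct skb_sketch {
	const struct dev_sketch *dst_dev;  /* NULL models a missing skb_dst() */
	int skb_iif;                       /* incoming interface index */
};

/* Pick the interface index for session decoding: skb_iif when decoding in
 * reverse, otherwise the dst device index, or 0 if there is no dst. */
static int sketch_decode_ifindex(const struct skb_sketch *skb, bool reverse)
{
	int oif = 0;

	assert(skb != NULL);
	if (skb->dst_dev)
		oif = skb->dst_dev->ifindex;
	return reverse ? skb->skb_iif : oif;
}
```

The point of the check is that the device pointer is only dereferenced when a dst is actually attached.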

    Bug was introduced with git commit bafd4bd4dc
    ("xfrm: Decode sessions with output interface.").

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

28 Oct, 2013

1 commit

  • With the removal of the routing cache, we lost the
    option to tweak the garbage collector threshold
    along with the maximum routing cache size. So git
    commit 703fb94ec ("xfrm: Fix the gc threshold value
    for ipv4") moved back to a static threshold.

    It turned out that the current threshold before we
    start garbage collecting is much too small for some
    workloads, so increase it from 1024 to 32768. This
    means that we start the garbage collector if we have
    more than 32768 dst entries in the system and refuse
    new allocations if we are above 65536.
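
The two thresholds above can be read as a three-way decision. This sketch is a simplification with hypothetical names; it treats the refusal limit as twice the gc threshold, as the numbers in the message imply, and ignores that a gc run may itself bring the entry count back down.

```c
#include <assert.h>

enum sketch_action {
	SKETCH_ALLOC,          /* below the gc threshold */
	SKETCH_GC_THEN_ALLOC,  /* above the threshold: run gc first */
	SKETCH_REFUSE          /* above twice the threshold */
};

/* Decide what to do for a new dst entry given the current entry count. */
static enum sketch_action sketch_dst_alloc(long entries, long gc_thresh)
{
	assert(entries >= 0 && gc_thresh > 0);
	if (entries > 2 * gc_thresh)
		return SKETCH_REFUSE;
	if (entries > gc_thresh)
		return SKETCH_GC_THEN_ALLOC;
	return SKETCH_ALLOC;
}
```

With gc_thresh at 32768, garbage collection starts at 32769 entries and allocations are refused from 65537 entries on.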

    Reported-by: Wolfgang Walter
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

16 Sep, 2013

1 commit


06 Feb, 2013

2 commits


13 Nov, 2012

1 commit

  • The xfrm gc threshold value depends on ip_rt_max_size. This
    value was set to INT_MAX with the routing cache removal patch,
    so we start doing garbage collecting when we have INT_MAX/2
    IPsec routes cached. Fix this by going back to the static
    threshold of 1024 routes.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

09 Oct, 2012

1 commit

  • Add new flag to remember when route is via gateway.
    We will use it to allow rt_gateway to contain address of
    directly connected host for the cases when DST_NOCACHE is
    used or when the NH exception caches per-destination route
    without DST_NOCACHE flag, i.e. when routes are not used for
    other destinations. This way we force the neighbour
    resolving to work with the routed destination but we
    can use different address in the packet, feature needed
    for IPVS-DR where original packet for virtual IP is routed
    via route to real IP.

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

01 Aug, 2012

1 commit

  • When a device is unregistered, we have to purge all of the
    references to it that may exist in the entire system.

    If a route is uncached, we currently have no way of accomplishing
    this.

    So create a global list that is scanned when a network device goes
    down. This mirrors the logic in net/core/dst.c's dst_ifdown().

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Jul, 2012

4 commits