09 May, 2020

1 commit

  • percpu_counter_add() uses a default batch size which is quite big
    on platforms with 256 cpus. (2*256 -> 512)

    This means dst_entries_get_fast() can be off by +/- 2*(nr_cpus^2)
    (131072 on servers with 256 cpus)

    Reduce the batch size to something more reasonable, and
    add logic to ip6_dst_gc() to call dst_entries_get_slow()
    before calling the _very_ expensive fib6_run_gc() function.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     

25 Dec, 2019

1 commit

  • The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In IPv6 PMTU update function
    __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
    confirmed time.

    But for tunnel code, it will call pmtu before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst mac address changed and we still do the neigh
    confirm, we will not be able to update neigh cache and ping6 remote
    will failed.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
    to choose whether we should do neigh update or not. I will add the parameter
    in this patch and set all the callers to true to comply with the previous
    way, and fix the tunnel code one by one on later patches.

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Feb, 2017

1 commit

  • Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
    to lookup and confirm the neighbour. Its usage via the new helper
    dst_confirm_neigh() should be restricted to MSG_PROBE users for
    performance reasons.

    For XFRM prefer the last tunnel address, if present. With help
    from Steffen Klassert.

    Signed-off-by: Julian Anastasov
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Julian Anastasov
     

21 Jan, 2017

1 commit


08 Oct, 2015

2 commits


10 Mar, 2015

1 commit

  • After my change to neigh_hh_init to obtain the protocol from the
    neigh_table there are no more users of protocol in struct dst_ops.
    Remove the protocol field from dst_ops and all of it's initializers.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_counters too.

    We could have left percpu_counter_init() alone and added
    percpu_counter_init_gfp(); however, the number of users isn't that
    high and introducing _gfp variants to all percpu data structures would
    be quite ugly, so let's just do the conversion. This is the one with
    the most users. Other percpu data structures are a lot easier to
    convert.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-by: Jan Kara
    Acked-by: "David S. Miller"
    Cc: x86@kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andrew Morton

    Tejun Heo
     

20 Jul, 2012

1 commit


17 Jul, 2012

1 commit

  • This will be used so that we can compose a full flow key.

    Even though we have a route in this context, we need more. In the
    future the routes will be without destination address, source address,
    etc. keying. One ipv4 route will cover entire subnets, etc.

    In this environment we have to have a way to possess persistent storage
    for redirects and PMTU information. This persistent storage will exist
    in the FIB tables, and that's why we'll need to be able to rebuild a
    full lookup flow key here. Using that flow key will do a fib_lookup()
    and create/update the persistent entry.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Jul, 2012

1 commit


05 Jul, 2012

1 commit


16 Apr, 2012

1 commit


27 Nov, 2011

1 commit


18 Jul, 2011

1 commit


27 Jan, 2011

1 commit

  • Routing metrics are now copy-on-write.

    Initially a route entry points it's metrics at a read-only location.
    If a routing table entry exists, it will point there. Else it will
    point at the all zero metric place-holder called 'dst_default_metrics'.

    The writeability state of the metrics is stored in the low bits of the
    metrics pointer, we have two bits left to spare if we want to store
    more states.

    For the initial implementation, COW is implemented simply via kmalloc.
    However future enhancements will change this to place the writable
    metrics somewhere else, in order to increase sharing. Very likely
    this "somewhere else" will be the inetpeer cache.

    Note also that this means that metrics updates may transiently fail
    if we cannot COW the metrics successfully.

    But even by itself, this patch should decrease memory usage and
    increase cache locality especially for routing workloads. In those
    cases the read-only metric copies stay in place and never get written
    to.

    TCP workloads where metrics get updated, and those rare cases where
    PMTU triggers occur, will take a very slight performance hit. But
    that hit will be alleviated when the long-term writable metrics
    move to a more sharable location.

    Since the metrics storage went from a u32 array of RTAX_MAX entries to
    what is essentially a pointer, some retooling of the dst_entry layout
    was necessary.

    Most importantly, we need to preserve the alignment of the reference
    count so that it doesn't share cache lines with the read-mostly state,
    as per Eric Dumazet's alignment assertion checks.

    The only non-trivial bit here is the move of the 'flags' member into
    the writeable cacheline. This is OK since we are always accessing the
    flags around the same moment when we made a modification to the
    reference count.

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Dec, 2010

1 commit


14 Dec, 2010

1 commit

  • Make all RTAX_ADVMSS metric accesses go through a new helper function,
    dst_metric_advmss().

    Leave the actual default metric as "zero" in the real metric slot,
    and compute the actual default value dynamically via a new dst_ops
    AF specific callback.

    For stacked IPSEC routes, we use the advmss of the path which
    preserves existing behavior.

    Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
    advmss on pmtu updates. This inconsistency in advmss handling
    results in more raw metric accesses than I wish we ended up with.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Nov, 2010

1 commit

  • Presently the b43legacy build fails on an sh randconfig:

    In file included from include/net/dst.h:12,
    from drivers/net/wireless/b43legacy/xmit.c:32:
    include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
    include/net/dst_ops.h: In function 'dst_entries_get_fast':
    include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
    include/net/dst_ops.h: In function 'dst_entries_get_slow':
    include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
    include/net/dst_ops.h: In function 'dst_entries_add':
    include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
    include/net/dst_ops.h: In function 'dst_entries_init':
    include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
    include/net/dst_ops.h: In function 'dst_entries_destroy':
    include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
    make[5]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
    make[5]: *** Waiting for unfinished jobs....

    Signed-off-by: Paul Mundt
    Signed-off-by: David S. Miller

    Paul Mundt
     

12 Oct, 2010

1 commit

  • struct dst_ops tracks number of allocated dst in an atomic_t field,
    subject to high cache line contention in stress workload.

    Switch to a percpu_counter, to reduce number of time we need to dirty a
    central location. Place it on a separate cache line to avoid dirtying
    read only fields.

    Stress test :

    (Sending 160.000.000 UDP frames,
    IP route cache disabled, dual E5540 @2.53GHz,
    32bit kernel, FIB_TRIE, SLUB/NUMA)

    Before:

    real 0m51.179s
    user 0m15.329s
    sys 10m15.942s

    After:

    real 0m45.570s
    user 0m15.525s
    sys 9m56.669s

    With a small reordering of struct neighbour fields, subject of a
    following patch, (to separate refcnt from other read mostly fields)

    real 0m41.841s
    user 0m15.261s
    sys 8m45.949s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Sep, 2009

1 commit

  • struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
    but there is no fundamental reason for it. Embed it directly into
    struct netns_ipv6.

    For that:
    * move struct dst_ops into separate header to fix circular dependencies
    I honestly tried not to, it's pretty impossible to do other way
    * drop dynamical allocation, allocate together with netns

    For a change, remove struct dst_ops::dst_net, it's deducible
    by using container_of() given dst_ops pointer.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan