Eric Lee / smarc-fsl-linux-kernel

09 May, 2020

1 commit

cf86a086a net/dst: use a smaller percpu_counter batch for dst entries accounting ... Browse Code »

percpu_counter_add() uses a default batch size which is quite big
on platforms with 256 cpus. (2*256 -> 512)

This means dst_entries_get_fast() can be off by +/- 2*(nr_cpus^2)
(131072 on servers with 256 cpus)

Reduce the batch size to something more reasonable, and
add logic to ip6_dst_gc() to call dst_entries_get_slow()
before calling the _very_ expensive fib6_run_gc() function.

Signed-off-by: Eric Dumazet
Signed-off-by: Jakub Kicinski

Eric Dumazet
2020-05-09 12:33:33 +0800

25 Dec, 2019

1 commit

bd085ef67 net: add bool confirm_neigh parameter for dst_ops.update_pmtu ... Browse Code »

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller
Reviewed-by: Guillaume Nault
Acked-by: David Ahern
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller

Hangbin Liu
2019-12-25 14:28:54 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

08 Feb, 2017

1 commit

63fca65d0 net: add confirm_neigh method to dst_ops ... Browse Code »

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

For XFRM prefer the last tunnel address, if present. With help
from Steffen Klassert.

Signed-off-by: Julian Anastasov
Acked-by: Steffen Klassert
Signed-off-by: David S. Miller

Julian Anastasov
2017-02-08 02:07:46 +0800

21 Jan, 2017

1 commit

c2a2efbbf net: remove bh disabling around percpu_counter accesses ... Browse Code »

Shaohua Li made percpu_counter irq safe in commit 098faf5805c8
("percpu_counter: make APIs irq safe")

We can safely remove BH disable/enable sections around various
percpu_counter manipulations.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2017-01-21 00:27:22 +0800

08 Oct, 2015

2 commits

cf91a99da ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out ... Browse Code »

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:27:02 +0800
4ebdfba73 dst: Pass a sk into .local_out ... Browse Code »

For consistency with the other similar methods in the kernel pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed a prequel to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-10-08 19:26:55 +0800

10 Mar, 2015

1 commit

ddb3b6033 net: Remove protocol from struct dst_ops ... Browse Code »

After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-10 04:06:10 +0800

08 Sep, 2014

1 commit

908c7f194 percpu_counter: add @gfp to percpu_counter_init() ... Browse Code »

Percpu allocator now supports allocation mask. Add @gfp to
percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
with percpu_counters too.

We could have left percpu_counter_init() alone and added
percpu_counter_init_gfp(); however, the number of users isn't that
high and introducing _gfp variants to all percpu data structures would
be quite ugly, so let's just do the conversion. This is the one with
the most users. Other percpu data structures are a lot easier to
convert.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo
Acked-by: Jan Kara
Acked-by: "David S. Miller"
Cc: x86@kernel.org
Cc: Jens Axboe
Cc: "Theodore Ts'o"
Cc: Alexander Viro
Cc: Andrew Morton

Tejun Heo
2014-09-08 08:51:29 +0800

20 Jul, 2012

1 commit

d8f1641b5 net: Fix warnings in dst_ops.h ... Browse Code »

include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller

David S. Miller
2012-07-20 01:43:03 +0800

17 Jul, 2012

1 commit

6700c2709 net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() ... Browse Code »

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller

David S. Miller
2012-07-17 18:29:28 +0800

12 Jul, 2012

1 commit

e47a185b3 ipv4: Generalize ip_do_redirect() and hook into new dst_ops->redirect. ... Browse Code »

All of the redirect acceptance policy is now contained within.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 11:55:47 +0800

05 Jul, 2012

1 commit

f894cbf84 net: Add optional SKB arg to dst_ops->neigh_lookup(). ... Browse Code »

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 16:04:01 +0800

16 Apr, 2012

1 commit

95c961747 net: cleanup unsigned to unsigned int ... Browse Code »

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-04-16 00:44:40 +0800

27 Nov, 2011

1 commit

ebb762f27 net: Rename the dst_opt default_mtu method to mtu ... Browse Code »

We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2011-11-27 03:29:50 +0800

18 Jul, 2011

1 commit

d3aaeb38c net: Add ->neigh_lookup() operation to dst_ops ... Browse Code »
1

In the future dst entries will be neigh-less. In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.

Signed-off-by: David S. Miller

David S. Miller
2011-07-18 15:40:17 +0800

27 Jan, 2011

1 commit

62fa8a846 net: Implement read-only protection and COW'ing of metrics. ... Browse Code »

Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there. Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing. Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads. In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit. But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline. This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller

David S. Miller
2011-01-27 12:51:05 +0800

15 Dec, 2010

1 commit

d33e45533 net: Abstract default MTU metric calculation behind an accessor. ... Browse Code »

Like RTAX_ADVMSS, make the default calculation go through a dst_ops
method rather than caching the computation in the routing cache
entries.

Now dst metrics are pretty much left as-is when new entries are
created, thus optimizing metric sharing becomes a real possibility.

Signed-off-by: David S. Miller

David S. Miller
2010-12-15 05:01:14 +0800

14 Dec, 2010

1 commit

0dbaee3b3 net: Abstract default ADVMSS behind an accessor. ... Browse Code »

Make all RTAX_ADVMSS metric accesses go through a new helper function,
dst_metric_advmss().

Leave the actual default metric as "zero" in the real metric slot,
and compute the actual default value dynamically via a new dst_ops
AF specific callback.

For stacked IPSEC routes, we use the advmss of the path which
preserves existing behavior.

Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
advmss on pmtu updates. This inconsistency in advmss handling
results in more raw metric accesses than I wish we ended up with.

Signed-off-by: David S. Miller

David S. Miller
2010-12-14 04:52:14 +0800

08 Nov, 2010

1 commit

43b81f85e net dst: need linux/cache.h for ____cacheline_aligned_in_smp. ... Browse Code »

Presently the b43legacy build fails on an sh randconfig:

In file included from include/net/dst.h:12,
from drivers/net/wireless/b43legacy/xmit.c:32:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[5]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[5]: *** Waiting for unfinished jobs....

Signed-off-by: Paul Mundt
Signed-off-by: David S. Miller

Paul Mundt
2010-11-08 11:58:05 +0800

12 Oct, 2010

1 commit

fc66f95c6 net dst: use a percpu_counter to track entries ... Browse Code »

struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real 0m51.179s
user 0m15.329s
sys 10m15.942s

After:

real 0m45.570s
user 0m15.525s
sys 9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real 0m41.841s
user 0m15.261s
sys 8m45.949s

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-10-12 04:06:53 +0800

02 Sep, 2009

1 commit

86393e52c netns: embed ip6_dst_ops directly ... Browse Code »

struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2009-09-02 08:40:31 +0800