02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
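
For illustration only: in C sources the tag described above is a single comment line at the top of the file, such as `// SPDX-License-Identifier: GPL-2.0` (headers use the `/* ... */` comment form). The helper below is a minimal sketch of how a tool might detect such a tag on the first line of a file. The function name and the first-line-only rule are assumptions made for this example, not part of the patch or of any scanner named above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or scanner API): report whether the
 * first line of a source buffer contains an SPDX license tag.
 */
static bool first_line_has_spdx_tag(const char *src)
{
	static const char tag[] = "SPDX-License-Identifier:";
	const size_t tag_len = sizeof(tag) - 1;
	const char *eol = strchr(src, '\n');
	size_t len = eol ? (size_t)(eol - src) : strlen(src);
	size_t i;

	/* Look for the tag anywhere within the first line only. */
	for (i = 0; i + tag_len <= len; i++)
		if (memcmp(src + i, tag, tag_len) == 0)
			return true;
	return false;
}
```

A file whose tag appears only on a later line is reported as untagged by this sketch.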
17 Jul, 2017
1 commit
-
As discussed in Faro during Netfilter Workshop 2017, RB trees can be
used with RCU, using a seqlock.

Note that net/rxrpc/conn_service.c is already using this.

This patch converts inetpeer from AVL tree to RB tree, since it allows
to remove private AVL implementation in favor of shared RB code.

$ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
   text    data     bss     dec     hex filename
   3195      40     128    3363     d23 net/ipv4/inetpeer.before
   1562      24       0    1586     632 net/ipv4/inetpeer.after

The same technique can be used to speed up
net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
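
The retry idea behind a seqlock-protected lookup can be sketched in userspace C11. This is not the kernel's seqlock or rbtree code: it guards a two-word record rather than a tree, it assumes writers are already serialized by some other lock, and all names are hypothetical. It only shows the reader pattern, which repeats the read when the sequence counter was odd or changed while reading.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical record guarded by a sequence counter. */
struct guarded_pair {
	atomic_uint seq;        /* even: stable, odd: write in progress */
	_Atomic uint32_t key;
	_Atomic uint32_t val;
};

/* Writer side; assumes callers serialize writers externally. */
static void pair_write(struct guarded_pair *p, uint32_t key, uint32_t val)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->key, key, memory_order_relaxed);
	atomic_store_explicit(&p->val, val, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader side: lock-free, retries if a writer interfered. */
static void pair_read(struct guarded_pair *p, uint32_t *key, uint32_t *val)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		*key = atomic_load_explicit(&p->key, memory_order_relaxed);
		*val = atomic_load_explicit(&p->val, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Readers never block writers in this scheme; the cost of a concurrent update is a repeated read.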
01 Jul, 2017
1 commit
-
The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This avoids accidental
refcounter overflows that might lead to use-after-free
situations.
This conversion requires an overall +1 on the whole
refcounting scheme.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller
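
A simplified userspace sketch of the behaviour that motivates refcount_t: the counter refuses to go up from zero and saturates instead of wrapping. This is not the kernel implementation; the names and the choice of UINT_MAX as the saturation value are assumptions for illustration. One plausible reading of the "+1" remark is that, since an increment from zero is refused, a live but otherwise unreferenced object has to sit at 1 rather than 0.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter type, standing in for a reference counter. */
struct ref {
	atomic_uint count;
};

/* Take a reference unless the object is already dead (count == 0). */
static bool ref_get_unless_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;   /* never resurrect a freed object */
		if (old == UINT_MAX)
			return true;    /* saturated: stay pinned, do not wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller must free the object. */
static bool ref_put_and_test(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;   /* underflow or saturated: leave as is */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

A plain atomic counter would wrap to zero on overflow and let the object be freed while references remain; the saturating variant leaks the object instead.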
16 Dec, 2015
1 commit
-
David Ahern added a vif field in the a4 part of inetpeer_addr struct.
This broke IPv4 TCP fast open client side and more generally tcp metrics
cache, because inetpeer_addr_cmp() is now comparing two u32 instead of
one.

inetpeer_set_addr_v4() needs to properly init vif field, otherwise
the comparison result depends on uninitialized data.

Fixes: 192132b9a034 ("net: Add support for VRFs to inetpeer cache")
Reported-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Signed-off-by: David S. Miller
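
The failure mode can be reproduced with a small stand-in for the address key. The types and function names below are hypothetical, not the kernel's definitions; the sketch assumes a key made of an address word and a vif word that are compared as an array of 32-bit values, as the message describes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the IPv4 part of the key. */
struct v4_key {
	uint32_t addr;
	uint32_t vif;
};

/* The same storage viewed either as fields or as raw 32-bit words. */
union peer_key {
	struct v4_key a4;
	uint32_t words[2];
};

/* Word-by-word comparison over the first n words of the key. */
static int key_cmp(const union peer_key *a, const union peer_key *b, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (a->words[i] == b->words[i])
			continue;
		return a->words[i] < b->words[i] ? -1 : 1;
	}
	return 0;
}

/* Setter that initializes every word the comparison will look at. */
static void key_set_v4(union peer_key *k, uint32_t addr)
{
	k->a4.addr = addr;
	k->a4.vif = 0;  /* without this line, key_cmp() reads stale data */
}
```

With the vif assignment removed, two keys built from the same address can compare as different depending on whatever was previously in memory.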
29 Aug, 2015
4 commits
-
inetpeer caches based on address only, so duplicate IP addresses within
a namespace return the same cached entry. Enhance the ipv4 address key
to contain both the IPv4 address and VRF device index.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Move the inetpeer_addr_base union to inetpeer_addr and drop
inetpeer_addr_base.

Both the a6 and in6_addr overlays are not needed; drop the __be32 version
and rename in6 to a6 for consistency with ipv4. Add a new u32 array to
the union, which removes the need for the typecast in the compare function
and provides a consistent arg for both ipv4 and ipv6 addresses, making
the compare function more readable.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller
26 Aug, 2015
1 commit
-
Remove various inlined functions not referenced in the kernel.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
01 Apr, 2015
1 commit
-
In many places, the a6 field is typecast to struct in6_addr. As the
fields are in a union anyway, just add in6_addr type to the union and get rid
of the typecasting.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller
09 Sep, 2014
1 commit
-
inetpeer sequence numbers are no longer incremented, so no need to
check and flush the tree. The function that increments the sequence
number was already dead code and removed in "ipv4: remove unused
function" (068a6e18). Remove the code that checks for a change, too.

Verifying that v4_seq and v6_seq are never incremented and thus that
flush_check compares bp->flush_seq to 0 is trivial.

The second part of the change removes flush_check completely even
though bp->flush_seq is exactly !0 once, at initialization. This
change is correct because the time this branch is true is when
bp->root == peer_avl_empty_rcu, in which case the branch and
inetpeer_invalidate_tree are a NOOP.

Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
04 Jun, 2014
1 commit
-
Conflicts:
include/net/inetpeer.h
net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.
Signed-off-by: David S. Miller
03 Jun, 2014
2 commits
-
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.
inet_getid() must return the old value of peer->ip_id_count,
not the new one.

Let's revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Ideally, we would need to generate IP ID using a per destination IP
generator.

Linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation does not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkins hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
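
A minimal userspace sketch of the "array of generators" idea, under stated assumptions: the table size, the function names, and the use of Bob Jenkins' one-at-a-time hash are choices made for this example (the kernel has its own jhash helpers and its real code differs). Each destination address hashes to one counter, and a caller reserves a run of IDs by atomically adding the segment count and using the value the counter held before the addition.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ID_GENERATORS 2048u     /* assumed size; must be a power of two */

static atomic_uint id_generators[ID_GENERATORS];

/* Bob Jenkins' one-at-a-time hash over the four bytes of the key. */
static uint32_t jenkins_one_at_a_time(uint32_t key)
{
	uint32_t hash = 0;
	int i;

	for (i = 0; i < 4; i++) {
		hash += (key >> (8 * i)) & 0xff;
		hash += hash << 10;
		hash ^= hash >> 6;
	}
	hash += hash << 3;
	hash ^= hash >> 11;
	hash += hash << 15;
	return hash;
}

/*
 * Reserve `segs` consecutive IDs for destination `daddr` and return the
 * first one, i.e. the counter value before the increment.
 */
static uint16_t select_ident_segs(uint32_t daddr, unsigned int segs)
{
	uint32_t slot = jenkins_one_at_a_time(daddr) & (ID_GENERATORS - 1);

	return (uint16_t)atomic_fetch_add(&id_generators[slot], segs);
}
```

Different destinations may share a generator when their hashes collide; that is acceptable here because the goal stated above is only to avoid reusing an ID for the same destination within a short time.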
29 Dec, 2013
1 commit
-
inetpeer_invalidate_family defined but never used
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
22 Sep, 2013
1 commit
-
There is a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
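
A small C illustration of the point about extern, using hypothetical function names: the two prototypes below declare exactly the same thing, and the `auto` keyword on the local variable is equally redundant (in C as defined before C23 changed its meaning for type inference).

```c
/* These two declarations are equivalent; extern is implied for functions. */
extern int peer_count(void);
int peer_count(void);

int peer_count(void)
{
	return 0;
}

int counter_demo(void)
{
	auto int n = 1;         /* same as plain "int n = 1;" */

	return n + peer_count();
}
```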
11 Jul, 2012
2 commits
-
Maintaining this in the inetpeer entries was not the right way to do
this at all.

Signed-off-by: David S. Miller
-
With help from Lin Ming.
Signed-off-by: David S. Miller
11 Jun, 2012
3 commits
-
We handle NULL in rt{,6}_set_peer but then our caller will try to pass
that NULL pointer into inet_putpeer() which isn't ready for it.

Fix this by moving the NULL check one level up, and then remove the
now unnecessary NULL check from inetpeer_ptr_set_peer().

Reported-by: Eric Dumazet
Signed-off-by: David S. Miller
-
This implementation can deal with having many inetpeer roots, which is
a necessary prerequisite for per-FIB table rooted peer tables.

Each family (AF_INET, AF_INET6) has a sequence number which we bump
when we get a family invalidation request.

Each peer lookup cheaply checks whether the flush sequence of the
root we are using is out of date, and if so flushes it and updates
the sequence number.

Signed-off-by: David S. Miller
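
The lazy-flush scheme described above can be sketched as follows. The structure, the single global counter, and the function names are hypothetical simplifications: one family instead of two, and an entry count standing in for the tree.

```c
#include <stdbool.h>

/* Hypothetical root: remembers the family sequence it last synced with. */
struct peer_root {
	unsigned int flush_seq;
	unsigned int entries;   /* stand-in for the tree contents */
};

/* One sequence number per family; only one family is modelled here. */
static unsigned int family_seq;

/* Invalidation request: cheap, touches no root. */
static void family_invalidate(void)
{
	family_seq++;
}

/* Called at lookup time; flushes the root only if it is out of date. */
static bool root_flush_check(struct peer_root *root)
{
	if (root->flush_seq == family_seq)
		return false;

	root->entries = 0;              /* stand-in for flushing the tree */
	root->flush_seq = family_seq;
	return true;
}
```

The invalidation itself stays O(1) regardless of how many roots exist; each root pays for its own flush the next time it is used.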
-
We encode the pointer(s) into an unsigned long with one state bit.
The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.

Signed-off-by: David S. Miller
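
A sketch of the pointer-plus-state-bit encoding, with hypothetical type and function names. It assumes both pointed-to types are at least 2-byte aligned so that bit 0 of a valid pointer is always clear, uses `uintptr_t` where the message says unsigned long, and arbitrarily picks "bit set" to mean "this is a tree root".

```c
#include <stdbool.h>
#include <stdint.h>

struct peer { int placeholder; };       /* hypothetical peer entry */
struct peer_base { int placeholder; };  /* hypothetical tree root */

#define PTR_IS_BASE ((uintptr_t)1)      /* state bit kept in bit 0 */

/* One word that holds either a peer pointer or a tagged root pointer. */
struct peer_ptr {
	uintptr_t v;
};

static void ptr_set_base(struct peer_ptr *p, struct peer_base *base)
{
	p->v = (uintptr_t)base | PTR_IS_BASE;
}

static void ptr_set_peer(struct peer_ptr *p, struct peer *peer)
{
	p->v = (uintptr_t)peer;
}

static bool ptr_is_peer(const struct peer_ptr *p)
{
	return !(p->v & PTR_IS_BASE);
}

static struct peer *ptr_to_peer(const struct peer_ptr *p)
{
	return (struct peer *)p->v;
}

static struct peer_base *ptr_to_base(const struct peer_ptr *p)
{
	return (struct peer_base *)(p->v & ~PTR_IS_BASE);
}
```

Callers are expected to check `ptr_is_peer()` before choosing which accessor to use.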
10 Jun, 2012
3 commits
-
Otherwise we reference potentially non-existing members when
ipv6 is disabled.

Signed-off-by: David S. Miller
-
We only need one interface for this operation, since we always know
which inetpeer root we want to flush.

Signed-off-by: David S. Miller
-
Instead of net/ipv4/inetpeer.c
Signed-off-by: David S. Miller
09 Jun, 2012
2 commits
-
Add struct net as a parameter of inet_getpeer_v[4,6], and
use net to replace &init_net.

Also modify the callers to provide net for inet_getpeer_v[4,6].

Signed-off-by: Gao feng
Signed-off-by: David S. Miller
-
inetpeer does not currently support namespaces, so information
leaks across namespaces.

This patch moves the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.

Add struct pernet_operations inetpeer_ops to initialize pernet
inetpeer data.

Also change family_to_base and inet_getpeer to support namespaces.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller
07 Jun, 2012
1 commit
-
commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race:

Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.

inetpeer_invalidate_tree() waits for a RCU grace period before inserting
inetpeer tree into gc_list and waking the worker. At that time, no
concurrent lookup can find an inetpeer in this tree.

Signed-off-by: Eric Dumazet
Cc: Steffen Klassert
Acked-by: Steffen Klassert
Signed-off-by: David S. Miller
08 Mar, 2012
2 commits
-
As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
-
We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
03 Dec, 2011
1 commit
27 Nov, 2011
1 commit
-
Now inetpeer is the place where we cache redirect information for ipv4
destinations, we must be able to invalidate information when a route is
added/removed on host.

As inetpeer is not yet namespace aware, this patch adds a shared
redirect_genid, and a per inetpeer redirect_genid. This might be changed
later if inetpeer becomes ns aware.

Cache information for one inetpeer is valid as long as its
redirect_genid has the same value as global redirect_genid.

Reported-by: Arkadiusz Miśkiewicz
Tested-by: Arkadiusz Miśkiewicz
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
23 Nov, 2011
1 commit
-
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
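
A short illustration of the point, using a hypothetical 16-byte structure in place of struct in6_addr: plain assignment copies the whole structure, so no explicit memcpy-style helper is needed.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit address structure. */
struct addr128 {
	uint8_t bytes[16];
};

/* Structure assignment copies every member, including the array. */
static void copy_by_assignment(struct addr128 *dst, const struct addr128 *src)
{
	*dst = *src;
}
```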
27 Jul, 2011
1 commit
-
This allows us to move duplicated code in
(atomic_inc_not_zero() for now) to

Signed-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Jul, 2011
1 commit
-
IPv6 fragment identification generation is way behind what we use for
IPv4: it uses a single generator. It's not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Jun, 2011
2 commits
-
Profiles show false sharing in addr_compare() because refcnt/dtime
changes dirty the first inet_peer cache line, where the keys
used at lookup time are located. If many cpus are calling inet_getpeer() and
inet_putpeer(), or need frag ids, addr_compare() is in 2nd position in
"perf top".

Before patch, my udpflood bench (16 threads) on my 2x4x2 machine :
5784.00 9.7% csum_partial_copy_generic [kernel]
3356.00 5.6% addr_compare [kernel]
2638.00 4.4% fib_table_lookup [kernel]
2625.00 4.4% ip_fragment [kernel]
1934.00 3.2% neigh_lookup [kernel]
1617.00 2.7% udp_sendmsg [kernel]
1608.00 2.7% __ip_route_output_key [kernel]
1480.00 2.5% __ip_append_data [kernel]
1396.00 2.3% kfree [kernel]
1195.00 2.0% kmem_cache_free [kernel]
1157.00 1.9% inet_getpeer [kernel]
1121.00 1.9% neigh_resolve_output [kernel]
1012.00 1.7% dev_queue_xmit [kernel]
# time ./udpflood.sh
real 0m44.511s
user 0m20.020s
sys 11m22.780s

# time ./udpflood.sh
real 0m44.099s
user 0m20.140s
sys 11m15.870s

After patch, no more addr_compare() in profiles :
4171.00 10.7% csum_partial_copy_generic [kernel]
1787.00 4.6% fib_table_lookup [kernel]
1756.00 4.5% ip_fragment [kernel]
1234.00 3.2% udp_sendmsg [kernel]
1191.00 3.0% neigh_lookup [kernel]
1118.00 2.9% __ip_append_data [kernel]
1022.00 2.6% kfree [kernel]
993.00 2.5% __ip_route_output_key [kernel]
841.00 2.2% neigh_resolve_output [kernel]
816.00 2.1% kmem_cache_free [kernel]
658.00 1.7% ia32_sysenter_target [kernel]
632.00 1.6% kmem_cache_alloc_node [kernel]

# time ./udpflood.sh
real 0m41.587s
user 0m19.190s
sys 10m36.370s

# time ./udpflood.sh
real 0m41.486s
user 0m19.290s
sys 10m33.650s

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in first cache
line of object [where the links and keys used by lookups are located], we
might move it at the end of inet_peer structure to let this first cache
line mostly read by cpus.

Signed-off-by: Eric Dumazet
CC: Andi Kleen
CC: Tim Chen
Signed-off-by: David S. Miller
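
The layout change hinted at in the last paragraph can be sketched with a hypothetical node structure. The 64-byte cache line size, the field names, and the use of `alignas` are assumptions for illustration; the point is only that the frequently written counter no longer shares a cache line with the links and keys that lookups read.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* Hypothetical tree node, not the kernel's struct inet_peer. */
struct peer_node {
	/* Read-mostly part: touched by every lookup. */
	struct peer_node *left, *right;
	uint32_t key[4];

	/* Write-hot part: pushed onto its own cache line. */
	alignas(CACHE_LINE) unsigned int refcnt;
	uint32_t dtime;
};

/* Index of the cache line that a given structure offset falls into. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}
```

The price is a larger structure, since padding is inserted before the aligned member.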
23 Apr, 2011
1 commit
-
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Feb, 2011
2 commits
-
Validity of the cached PMTU information is indicated by its
expiration value being non-zero, just as per dst->expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller
-
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the ->daddr.family member,
replicating the family in every address we store is entirely
redundant.

Signed-off-by: David S. Miller
05 Feb, 2011
1 commit
-
Like metrics, the ICMP rate limiting bits are cached state about
a destination. So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller
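
The "allow when no peer is available" policy can be sketched with a simple token bucket. Only the NULL-means-allow rule comes from the message above; the bucket arithmetic, the burst constant, the abstract time unit, and all names are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RATE_BURST 6u   /* assumed maximum burst, in units of `timeout` */

/* Hypothetical per-destination rate state. */
struct peer_rate {
	uint32_t tokens;        /* accumulated credit, in time units */
	uint32_t last;          /* time of the previous check */
};

/*
 * Decide whether one message may be sent at time `now`, given that one
 * message costs `timeout` time units of credit.
 */
static bool peer_rate_allow(struct peer_rate *peer, uint32_t now,
			    uint32_t timeout)
{
	uint32_t tokens;

	if (!peer)
		return true;    /* no cached state: the policy is to allow */

	tokens = peer->tokens + (now - peer->last);
	peer->last = now;
	if (tokens > RATE_BURST * timeout)
		tokens = RATE_BURST * timeout;

	if (tokens >= timeout) {
		peer->tokens = tokens - timeout;
		return true;
	}
	peer->tokens = tokens;
	return false;
}
```

Failing open means a memory shortage can temporarily disable rate limiting for some destinations, but it never suppresses messages that should have been sent.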