21 Jul, 2012
2 commits
-
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions. (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller
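The caching condition described above can be sketched in userspace C. This is only an illustration, not the kernel code: the helper name and the `SKETCH_` constant are hypothetical, and the flag value is assumed for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit standing in for the kernel's RTCF_DIRECTSRC
 * (assumption: the exact value does not matter for this sketch). */
#define SKETCH_RTCF_DIRECTSRC 0x04000000u

/* Hypothetical helper: may this input route be stored in the cache?
 * Caching is skipped for DIRECTSRC routes and for a non-zero itag. */
static bool sketch_input_route_cacheable(uint32_t rt_flags, uint32_t itag)
{
    if (rt_flags & SKETCH_RTCF_DIRECTSRC)
        return false;
    if (itag != 0)
        return false;
    return true;
}
```

A caller would consult this helper once per input route lookup, before deciding whether to store the freshly built route.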
-
If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we look up the exception.
Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.
Signed-off-by: David S. Miller
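The ordering described in this commit (exception lookup first, then DST_HOST, then the caching decision) might look roughly like the following userspace sketch. All `sketch_` names, the struct layouts, and the flag value are assumptions made for illustration; the real rt_set_nexthop() is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DST_HOST 0x0001u /* illustrative bit for the DST_HOST flag */

struct sketch_exception { unsigned int pmtu; };
struct sketch_route     { unsigned int flags; unsigned int pmtu; };
struct sketch_nexthop   { struct sketch_route *cached_output; };

/* The caller has already looked up the exception (NULL if none).
 * A route with an exception is destination specific: it gets DST_HOST
 * and is not cached. A route without one covers a family of
 * destinations: DST_HOST stays clear and the route is stored in the
 * nexthop. Returns true when the route was cached. */
static bool sketch_set_nexthop(struct sketch_nexthop *nh,
                               struct sketch_route *rt,
                               const struct sketch_exception *fnhe)
{
    rt->flags = 0;
    rt->pmtu = 0;

    if (fnhe) {
        rt->flags |= SKETCH_DST_HOST;
        rt->pmtu = fnhe->pmtu;
        return false;
    }

    nh->cached_output = rt;
    return true;
}
```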
18 Jul, 2012
1 commit
-
free_nh_exceptions() should use rcu_dereference_protected(..., 1)
since it's called after one RCU grace period.

Also add some const-ification in recent code.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Jul, 2012
1 commit
-
In a regime where we have subnetted route entries, we need a way to
keep persistent storage for destination specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.
The initial implementation is a 2048 entry hash table with reclaiming
starting at chain length 5. A more sophisticated scheme can be
devised if that proves necessary.

Signed-off-by: David S. Miller
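A minimal userspace sketch of such a table follows, assuming a simple "reuse the oldest entry" reclaim policy once a chain holds 5 entries. Only the sizes 2048 and 5 come from the commit text; the struct layout, the hash, the logical clock, and all `sketch_` names are assumptions, and there is no locking or RCU here.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_HASH_SIZE     2048 /* table size named in the commit text */
#define SKETCH_RECLAIM_DEPTH 5    /* chain length at which reclaiming starts */

struct sketch_fnhe {
    struct sketch_fnhe *next;
    uint32_t daddr;  /* destination this exception applies to */
    uint32_t pmtu;   /* learned PMTU, 0 if none */
    uint64_t stamp;  /* logical time of last use */
};

struct sketch_fnhe_table {
    struct sketch_fnhe *chain[SKETCH_HASH_SIZE];
    uint64_t clock;
};

/* Assumed hash: fold the upper address bits into the low 11 bits. */
static unsigned int sketch_hash(uint32_t daddr)
{
    uint32_t h = daddr ^ (daddr >> 11) ^ (daddr >> 22);
    return h & (SKETCH_HASH_SIZE - 1);
}

static int sketch_chain_len(const struct sketch_fnhe_table *t, unsigned int bucket)
{
    int n = 0;
    for (const struct sketch_fnhe *e = t->chain[bucket]; e; e = e->next)
        n++;
    return n;
}

/* Find or create the exception for daddr. When the chain is already at
 * the reclaim depth, the least recently used entry is reused in place.
 * Returns NULL only if allocation fails. */
static struct sketch_fnhe *sketch_fnhe_get(struct sketch_fnhe_table *t, uint32_t daddr)
{
    struct sketch_fnhe **head = &t->chain[sketch_hash(daddr)];
    struct sketch_fnhe *e, *oldest = NULL;
    int depth = 0;

    for (e = *head; e; e = e->next) {
        if (e->daddr == daddr) {
            e->stamp = ++t->clock;
            return e;
        }
        if (!oldest || e->stamp < oldest->stamp)
            oldest = e;
        depth++;
    }

    if (depth >= SKETCH_RECLAIM_DEPTH) {
        e = oldest;                 /* reclaim: recycle the oldest entry */
    } else {
        e = calloc(1, sizeof(*e));
        if (!e)
            return NULL;
        e->next = *head;
        *head = e;
    }
    e->daddr = daddr;
    e->pmtu = 0;
    e->stamp = ++t->clock;
    return e;
}
```

Bounding the chain length keeps lookups cheap at the cost of occasionally forgetting a learned value, which matches the "simple first, refine later" stance of the commit message.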
11 Jul, 2012
1 commit
-
Rather than at every struct rtable creation.
Signed-off-by: David S. Miller
06 Jul, 2012
1 commit
-
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.
Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: David S. Miller
29 Jun, 2012
1 commit
-
If rpfilter is off (or the SKB has an IPSEC path) and there are no
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.

We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.

Having a way to know whether we need tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.

Signed-off-by: David S. Miller
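The guard described above can be sketched as follows. This is a userspace illustration under stated assumptions: the `sketch_` names are hypothetical, the slow path is a placeholder that only counts its invocations, and the itag it produces is not real classid logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the tclassid user counter (in the kernel it is only
 * modified under RTNL). */
static int sketch_tclassid_users;

/* Counts how often the expensive path runs, for illustration only. */
static unsigned int sketch_slow_path_calls;

/* Placeholder for the real work (full FIB lookup, itag computation). */
static int sketch_validate_slow(uint32_t src, uint32_t *itag)
{
    sketch_slow_path_calls++;
    *itag = src & 0xff; /* arbitrary placeholder value */
    return 0;
}

/* Fast path: with no reverse-path filtering to do and no tclassid
 * users, the only required effect is setting the itag to zero. */
static int sketch_validate_source(uint32_t src, bool rpfilter,
                                  bool has_ipsec_path, uint32_t *itag)
{
    bool need_rpf = rpfilter && !has_ipsec_path;

    if (!need_rpf && sketch_tclassid_users == 0) {
        *itag = 0;
        return 0;
    }
    return sketch_validate_slow(src, itag);
}
```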
18 Jun, 2012
1 commit
-
It makes no sense to execute this limit test every time we create a
routing cache entry.

We can't simply error out on these things since we've silently
accepted and truncated them forever.

Signed-off-by: David S. Miller
24 May, 2012
1 commit
-
We hit a kernel OOPS.
[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G W
3.0.8-137685-ge7742f9 #1
[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
[23899.904225] Call Trace:
[23899.989209] [] ? pgtable_bad+0x130/0x130
[23900.000416] [] __might_sleep+0x10a/0x110
[23900.007357] [] do_page_fault+0xd1/0x3c0
[23900.013764] [] ? restore_all+0xf/0xf
[23900.024024] [] ? napi_complete+0x8b/0x690
[23900.029297] [] ? pgtable_bad+0x130/0x130
[23900.123739] [] ? pgtable_bad+0x130/0x130
[23900.128955] [] error_code+0x5f/0x64
[23900.133466] [] ? pgtable_bad+0x130/0x130
[23900.138450] [] ? __ip_route_output_key+0x698/0x7c0
[23900.144312] [] ? __ip_route_output_key+0x38d/0x7c0
[23900.150730] [] ip_route_output_flow+0x1f/0x60
[23900.156261] [] ip4_datagram_connect+0x188/0x2b0
[23900.161960] [] ? _raw_spin_unlock_bh+0x1f/0x30
[23900.167834] [] inet_dgram_connect+0x36/0x80
[23900.173224] [] ? _copy_from_user+0x48/0x140
[23900.178817] [] sys_connect+0x9a/0xd0
[23900.183538] [] ? alloc_file+0xdc/0x240
[23900.189111] [] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Another CPU might still be accessing fi. Fix it by delaying the release.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thanks to Eric for the very detailed review and checking of the issue.
Signed-off-by: Yanmin Zhang
Signed-off-by: Kun Jiang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Apr, 2012
1 commit
02 Apr, 2012
1 commit
-
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller
29 Mar, 2012
1 commit
-
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells
12 Mar, 2012
1 commit
-
Use a more current kernel messaging style.
Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with _close are
now prefixed with _fini. Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
#define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)
$ size net/ipv4/built-in.o*
   text    data     bss     dec     hex filename
 887888   31558  249696 1169142  11d6f6 net/ipv4/built-in.o.new
 887934   31558  249800 1169292  11d78c net/ipv4/built-in.o.old

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
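To show how the proposed pr_fmt() prefix combines with "%s", __func__, here is a small userspace sketch. The `sketch_pr` macro is a hypothetical stand-in for the kernel's pr_* helpers that formats into a caller-supplied buffer, and the message text is made up for the example.

```c
#include <stdio.h>

/* Prefix applied to every message in this file, as the commit proposes. */
#define pr_fmt(fmt) "IPv4: " fmt

/* Hypothetical stand-in for pr_info() and friends: the format string is
 * passed through pr_fmt() so the prefix is added at compile time by
 * string literal concatenation. Requires at least one argument. */
#define sketch_pr(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* The function name comes from __func__ rather than being embedded in
 * the format string, so renaming the function keeps messages accurate. */
static int sketch_fini(char *buf, size_t len)
{
    return sketch_pr(buf, len, "%s: can't remove protocol\n", __func__);
}
```

Because the prefix lives in one macro, changing it later touches a single line rather than every message.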
17 Sep, 2011
1 commit
-
Commit 4670994d (net,rcu: convert call_rcu(fc_rport_free_rcu) to
kfree_rcu()) introduced a memory leak. This patch reverts it.

Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller
08 May, 2011
1 commit
-
The rcu callback fc_rport_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(fc_rport_free_rcu).

Signed-off-by: Lai Jiangshan
Acked-by: David S. Miller
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett
25 Mar, 2011
3 commits
-
Move the scope value out of the fib alias entries and into fib_info,
so that we always use the correct scope when recomputing the nexthop
cached source address.

Reported-by: Julian Anastasov
Signed-off-by: David S. Miller
-
Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.
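Lazy recomputation with a generation ID can be sketched like this. It is a userspace illustration only: the `sketch_` names, the global "primary address" variable, and the recompute counter are assumptions standing in for the kernel's address selection.

```c
#include <stdint.h>

static int sketch_addr_genid;           /* bumped on every address or interface event */
static uint32_t sketch_primary_addr = 0x0a000001u; /* stand-in for interface state */
static unsigned int sketch_recomputes;  /* counts expensive recomputations */

struct sketch_nh {
    uint32_t saddr;      /* cached source address */
    int saddr_genid;     /* generation in which saddr was computed */
};

/* Placeholder for the expensive source address search. */
static uint32_t sketch_select_saddr(void)
{
    sketch_recomputes++;
    return sketch_primary_addr;
}

/* Any of the three events listed above only bumps the generation;
 * nothing is recomputed at event time. */
static void sketch_addr_event(uint32_t new_primary)
{
    sketch_primary_addr = new_primary;
    sketch_addr_genid++;
}

/* The cached value is refreshed on first use after a generation change. */
static uint32_t sketch_nh_saddr(struct sketch_nh *nh)
{
    if (nh->saddr_genid != sketch_addr_genid) {
        nh->saddr = sketch_select_saddr();
        nh->saddr_genid = sketch_addr_genid;
    }
    return nh->saddr;
}
```

A nexthop would start with `saddr_genid` set to a value that cannot match (for example -1) so that the first lookup computes the address.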
Reported-by: Julian Anastasov
Signed-off-by: David S. Miller
-
Alessandro Suardi reported that we could not change route metrics:
ip ro change default .... advmss 1400
This regression came with commit 9c150e82ac50 (Allocate fib metrics
dynamically). fib_metrics is no longer an array, but a pointer to an
array.

Reported-by: Alessandro Suardi
Signed-off-by: Eric Dumazet
Tested-by: Alessandro Suardi
Signed-off-by: David S. Miller
13 Mar, 2011
3 commits
-
Signed-off-by: David S. Miller
-
To start doing these conversions, we need to add some temporary
flow4_* macros which will eventually go away when all the protocol
code paths are changed to work on AF specific flowi objects.

Signed-off-by: David S. Miller
-
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
Signed-off-by: David S. Miller
11 Mar, 2011
1 commit
-
Completely unused.
Signed-off-by: David S. Miller
09 Mar, 2011
1 commit
-
We have to use cfg->fc_scope not the final nh_scope value.
Reported-by: Julian Anastasov
Signed-off-by: David S. Miller
08 Mar, 2011
3 commits
-
When doing output route lookups, we have to select the source address
if the user has not specified an explicit one.

First, if the route has an explicit preferred source address
specified, then we use that.

Otherwise we search the route's outgoing interface for a suitable
address.

This search can be precomputed and cached at route insertion time.
The only missing part is that we have to refresh this precomputed
value any time addresses are added or removed from the interface, and
this is accomplished by fib_update_nh_saddrs().

Signed-off-by: David S. Miller
-
This eliminates a lot of pure overhead due to parameter
passing.

Signed-off-by: David S. Miller
-
fib_semantic_match() requires that if the type doesn't signal an
automatic error, it must be of type RTN_UNICAST, RTN_LOCAL,
RTN_BROADCAST, RTN_ANYCAST, or RTN_MULTICAST.

Checking this every route lookup is pointless work.
Instead validate it during route insertion, via fib_create_info().
Also, there was nothing making sure the type value was less than
RTN_MAX, so add that missing check while we're here.

Signed-off-by: David S. Miller
15 Feb, 2011
1 commit
-
Commit 0c838ff1ade7 (ipv4: Consolidate all default route selection
implementations.) forgot to remove one rcu_read_unlock() from
fib_select_default().

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
02 Feb, 2011
1 commit
-
To avoid confusion with the recently deleted fib_hash.c
code, use "fib_info_hash_*" instead of plain "fib_hash_*".

Signed-off-by: David S. Miller
01 Feb, 2011
2 commits
-
Both fib_trie and fib_hash have a local implementation of
fib_table_select_default(). This is completely unnecessary
code duplication.

Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.

Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route. The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.

Signed-off-by: David S. Miller
-
This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.

Signed-off-by: David S. Miller
29 Jan, 2011
2 commits
-
If there are no explicit metrics attached to a route, hook
fi->fib_metrics up to dst_default_metrics.

Signed-off-by: David S. Miller
-
This is the initial gateway towards super-sharing metrics
if they are all set to zero for a route.

Signed-off-by: David S. Miller
14 Jan, 2011
2 commits
-
Conflicts:
net/ipv4/route.c

Signed-off-by: Patrick McHardy
-
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.

Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependency.

Reported-by: Vladis Kletnieks
Signed-off-by: Patrick McHardy
18 Nov, 2010
1 commit
-
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
21 Oct, 2010
1 commit
-
Perf tools session at NFWS 2010 pointed out a false sharing on struct
fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit
only if needed (i.e. not already set).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
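The test-before-set pattern behind this fix can be sketched in a few lines. The struct, the flag value, and the helper name are assumptions for illustration; the point is that an unconditional `|=` always writes the cache line, while the guarded form writes at most once.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FA_S_ACCESSED 0x01u /* illustrative value for the accessed bit */

struct sketch_fib_alias {
    uint8_t fa_state;
};

/* Set the accessed bit only when it is not already set, so that hot
 * lookups read the shared cache line without dirtying it.
 * Returns true when a store was actually performed. */
static bool sketch_mark_accessed(struct sketch_fib_alias *fa)
{
    if (!(fa->fa_state & SKETCH_FA_S_ACCESSED)) {
        fa->fa_state |= SKETCH_FA_S_ACCESSED;
        return true;
    }
    return false;
}
```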
19 Oct, 2010
1 commit
-
Convert inetdev_by_index() to not increment in_dev refcount.
Callers hold RCU or RTNL, and should not decrement in_dev refcount.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Oct, 2010
1 commit
-
fib_lookup() converted to be called in RCU protected context, no
reference taken and released on a contended cache line (fib_clntref)

fib_table_lookup() and fib_semantic_match() get an additional parameter.
struct fib_info gets an rcu_head field, and is freed after an rcu grace
period.

Stress test :
(Sending 160.000.000 UDP frames on same neighbour,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_HASH) (about same results for FIB_TRIE)

Before patch :
real 1m31.199s
user 0m13.761s
sys 23m24.780s

After patch:
real 1m5.375s
user 0m14.997s
sys 15m50.115s

Before patch Profile :
13044.00 15.4% __ip_route_output_key vmlinux
8438.00 10.0% dst_destroy vmlinux
5983.00 7.1% fib_semantic_match vmlinux
5410.00 6.4% fib_rules_lookup vmlinux
4803.00 5.7% neigh_lookup vmlinux
4420.00 5.2% _raw_spin_lock vmlinux
3883.00 4.6% rt_set_nexthop vmlinux
3261.00 3.9% _raw_read_lock vmlinux
2794.00 3.3% fib_table_lookup vmlinux
2374.00 2.8% neigh_resolve_output vmlinux
2153.00 2.5% dst_alloc vmlinux
1502.00 1.8% _raw_read_lock_bh vmlinux
1484.00 1.8% kmem_cache_alloc vmlinux
1407.00 1.7% eth_header vmlinux
1406.00 1.7% ipv4_dst_destroy vmlinux
1298.00 1.5% __copy_from_user_ll vmlinux
1174.00 1.4% dev_queue_xmit vmlinux
1000.00 1.2% ip_output vmlinux

After patch Profile :
13712.00 15.8% dst_destroy vmlinux
8548.00 9.9% __ip_route_output_key vmlinux
7017.00 8.1% neigh_lookup vmlinux
4554.00 5.3% fib_semantic_match vmlinux
4067.00 4.7% _raw_read_lock vmlinux
3491.00 4.0% dst_alloc vmlinux
3186.00 3.7% neigh_resolve_output vmlinux
3103.00 3.6% fib_table_lookup vmlinux
2098.00 2.4% _raw_read_lock_bh vmlinux
2081.00 2.4% kmem_cache_alloc vmlinux
2013.00 2.3% _raw_spin_lock vmlinux
1763.00 2.0% __copy_from_user_ll vmlinux
1763.00 2.0% ip_output vmlinux
1761.00 2.0% ipv4_dst_destroy vmlinux
1631.00 1.9% eth_header vmlinux
1440.00 1.7% _raw_read_unlock_bh vmlinux

Reference results, if IP route cache is enabled :
real 0m29.718s
user 0m10.845s
sys 7m37.341s

25213.00 29.5% __ip_route_output_key vmlinux
9011.00 10.5% dst_release vmlinux
4817.00 5.6% ip_push_pending_frames vmlinux
4232.00 5.0% ip_finish_output vmlinux
3940.00 4.6% udp_sendmsg vmlinux
3730.00 4.4% __copy_from_user_ll vmlinux
3716.00 4.4% ip_route_output_flow vmlinux
2451.00 2.9% __xfrm_lookup vmlinux
2221.00 2.6% ip_append_data vmlinux
1718.00 2.0% _raw_spin_lock_bh vmlinux
1655.00 1.9% __alloc_skb vmlinux
1572.00 1.8% sock_wfree vmlinux
1345.00 1.6% kfree vmlinux

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
05 Oct, 2010
1 commit
-
Code style cleanups before upcoming functional changes.
C99 initializer for fib_props array.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
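For readers unfamiliar with C99 designated initializers, the style applied to fib_props looks roughly like the sketch below. The enum members, struct layout, and the particular error and scope values are illustrative assumptions, not the kernel's actual table.

```c
#include <errno.h>

/* Hypothetical, shortened list of route types. */
enum sketch_route_type {
    SKETCH_RTN_UNICAST,
    SKETCH_RTN_LOCAL,
    SKETCH_RTN_BLACKHOLE,
    SKETCH_RTN_UNREACHABLE,
    SKETCH_RTN_COUNT
};

enum sketch_scope {
    SKETCH_SCOPE_UNIVERSE,
    SKETCH_SCOPE_HOST
};

struct sketch_fib_prop {
    int error;               /* 0, or a negative errno for error routes */
    enum sketch_scope scope;
};

/* Each entry names its index and its fields, so the table stays correct
 * even if enum members or struct fields are reordered. */
static const struct sketch_fib_prop sketch_fib_props[SKETCH_RTN_COUNT] = {
    [SKETCH_RTN_UNICAST]     = { .error = 0,             .scope = SKETCH_SCOPE_UNIVERSE },
    [SKETCH_RTN_LOCAL]       = { .error = 0,             .scope = SKETCH_SCOPE_HOST },
    [SKETCH_RTN_BLACKHOLE]   = { .error = -EINVAL,       .scope = SKETCH_SCOPE_UNIVERSE },
    [SKETCH_RTN_UNREACHABLE] = { .error = -EHOSTUNREACH, .scope = SKETCH_SCOPE_UNIVERSE },
};

/* Look up the automatic error, if any, signalled by a route type. */
static int sketch_type_error(enum sketch_route_type type)
{
    return sketch_fib_props[type].error;
}
```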
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there. ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding. It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions. The script emitted errors for ~400
   files.

2. Each error was manually checked. Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others. This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell. Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros. Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>