Eric Lee / smarc-fsl-linux-kernel

31 Dec, 2011

1 commit

32b293a53 IPv6: Avoid taking write lock for /proc/net/ipv6_route

During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found it's calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table, taking a write lock seems unnecessary and would probably cause
network performance issues.

To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
* doing nothing
* reading ipv6 route table via proc
  (while :; do cat /proc/net/ipv6_route > /dev/null; done)
* reading ipv6 route table via rtnetlink
  (while :; do ip -6 route show table all > /dev/null; done)

* Load the ipv6 route table up with:
  * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done

* iperf commands:
  * client: iperf -i 1 -V -c
  * server: iperf -V -s

* iperf results - 3 runs each (in Mbits/sec)
  * nothing: client: 927,927,927 server: 927,927,927
  * proc: client: 179,97,96,113 server: 142,112,133
  * iproute: client: 928,927,928 server: 927,927,927

lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.

Signed-off-by: Josh Hunt
Signed-off-by: David S. Miller

Josh Hunt
2011-12-31 06:07:33 +0800

29 Dec, 2011

1 commit

d19185428 ipv6: Kill rt6i_dev and rt6i_expires defines.

It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.

Signed-off-by: David S. Miller

David S. Miller
2011-12-29 09:19:20 +0800

18 Jul, 2011

1 commit

9cbb7ecbc ipv6: Get rid of rt6i_nexthop macro.

It just makes it harder to see 1) what the code is doing
and 2) grep for all users of dst{->,.}neighbour

Signed-off-by: David S. Miller

David S. Miller
2011-07-18 14:11:35 +0800

25 Apr, 2011

1 commit

2a9e95070 net: Remove __KERNEL__ cpp checks from include/net

These header files are never installed for user consumption, so any
__KERNEL__ cpp checks are superfluous.

Projects should also not copy these files into their userland utility
sources and try to use them there. If they insist on doing so, the
onus is on them to sanitize the headers as needed.

Signed-off-by: David S. Miller

David S. Miller
2011-04-25 01:54:56 +0800

23 Apr, 2011

1 commit

b71d1d426 inet: constify ip headers and in6_addr

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-23 02:04:14 +0800

16 Apr, 2011

1 commit

c3968a857 ipv6: RTA_PREFSRC support for ipv6 route source address selection

[ipv6] Add support for RTA_PREFSRC

This patch allows a user to select the preferred source address
for a specific IPv6-Route. It can be set via a netlink message
setting RTA_PREFSRC to a valid IPv6 address which must be
up on the device the route will be bound to.

Signed-off-by: Daniel Walter
Signed-off-by: David S. Miller

Daniel Walter
2011-04-16 06:44:37 +0800

13 Mar, 2011

1 commit

4c9483b2f ipv6: Convert to use flowi6 where applicable.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:54 +0800

11 Feb, 2011

1 commit

6431cbc25 inet: Create a mechanism for upward inetpeer propagation into routes.

If we didn't have a routing cache, we would not be able to properly
propagate certain kinds of dynamic path attributes, for example
PMTU information and redirects.

The reason is that if we didn't have a routing cache, then there would
be no way to lookup all of the active cached routes hanging off of
sockets, tunnels, IPSEC bundles, etc.

Consider the case where we created a cached route, but no inetpeer
entry existed and also we were not asked to pre-COW the route metrics
and therefore did not force the creation of a new inetpeer entry.

If we later get a PMTU message, or a redirect, and store this
information in a new inetpeer entry, there is no way to teach that
cached route about the newly existing inetpeer entry.

The facilities implemented here handle this problem.

First we create a generation ID. When we create a cached route of any
kind, we remember the generation ID at the time of attachment. Any
time we force-create an inetpeer entry in response to new path
information, we bump that generation ID.

The dst_ops->check() callback is where the knowledge of this event
is propagated. If the global generation ID does not equal the one
stored in the cached route, and the cached route has not attached
to an inetpeer yet, we look it up and attach if one is found. Now
that we've updated the cached route's information, we update the
route's generation ID too.

This clears the way for implementing PMTU and redirects directly in
the inetpeer cache. There is absolutely no need to consult cached
route information in order to maintain this information.

At this point nothing bumps the inetpeer genids; that comes in the
later changes which handle PMTUs and redirects using inetpeers.

Signed-off-by: David S. Miller

David S. Miller
2011-02-11 05:33:41 +0800
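
The generation ID scheme in 6431cbc25 can be sketched as follows. This is a minimal userspace illustration, not the kernel implementation: struct demo_peer, struct demo_route, the one-slot "peer cache" and every demo_* name are hypothetical. A route remembers the generation it last saw; the check step looks for a peer only when the global generation has moved on and no peer is attached yet, then records the new generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an inetpeer entry and a cached route. */
struct demo_peer { int pmtu; };

struct demo_route {
    struct demo_peer *peer; /* NULL until a peer is attached */
    unsigned int genid;     /* generation seen at creation or last check */
};

unsigned int demo_peer_genid;      /* global generation ID */
struct demo_peer *demo_peer_slot;  /* one-entry stand-in for the peer cache */

/* Creating a cached route remembers the current generation. */
void demo_route_init(struct demo_route *rt)
{
    rt->peer = NULL;
    rt->genid = demo_peer_genid;
}

/* New path information arrived: force-create the peer, bump the generation. */
void demo_peer_force_create(struct demo_peer *p)
{
    demo_peer_slot = p;
    demo_peer_genid++;
}

/* Analogue of the check callback: pick up a peer created after the route. */
void demo_route_check(struct demo_route *rt)
{
    assert(rt != NULL);
    if (rt->genid != demo_peer_genid) {
        if (rt->peer == NULL)
            rt->peer = demo_peer_slot; /* lookup; may still find nothing */
        rt->genid = demo_peer_genid;
    }
}
```

The point of the counter is that routes which are already up to date pay only for one integer comparison per check.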

01 Dec, 2010

1 commit

b34193638 ipv6: Add infrastructure to bind inet_peer objects to routes.

They are only allowed on cached ipv6 routes.

Signed-off-by: David S. Miller

David S. Miller
2010-12-01 04:27:11 +0800

11 Jun, 2010

1 commit

d8d1f30b9 net-next: remove useless union keyword

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Changli Gao
2010-06-11 14:31:35 +0800

02 Apr, 2010

1 commit

bd2c77a0a ipv6 fib: Make rt6_info{} more cache-line aware.

The head element of rt6_info{} is dst_entry{}, and
IPv6 specific elements follow.

Because elements at the end of dst_entry{} are frequently
updated, it is not good to put frequently-used static
elements, such as rt6i_idev, rt6i_dst or rt6i_flags in the
same cache line.

On the other hand, fib6_table, rt6i_node or rt6i_gateway are
rarely used, so it is okay to stay in the same cache line.

Let's rearrange rt6_info{}.

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki / 吉藤英明
2010-04-02 09:41:41 +0800

19 Feb, 2010

1 commit

bbef49dac ipv6: use standard lists for FIB walks

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2010-02-19 06:30:17 +0800

13 Feb, 2010

1 commit

2bec5a369 ipv6: fib: fix crash when changing large fib while dumping it

When the fib size exceeds what can be dumped in a single skb, the
dump is suspended and resumed once the last skb has been received
by userspace. When the fib is changed while the dump is suspended,
the walker might contain stale pointers, causing a crash when the
dump is resumed.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [] fib6_walk_continue+0xbb/0x124 [ipv6]
PGD 5347a067 PUD 65c7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
RIP: 0010:[]
[] fib6_walk_continue+0xbb/0x124 [ipv6]
...
Call Trace:
[] ? mutex_spin_on_owner+0x59/0x71
[] inet6_dump_fib+0x11b/0x1b9 [ipv6]
[] netlink_dump+0x5b/0x19e
[] ? consume_skb+0x28/0x2a
[] netlink_recvmsg+0x1ab/0x2c6
[] ? netlink_unicast+0xfa/0x151
[] __sock_recvmsg+0x6d/0x79
[] sock_recvmsg+0xca/0xe3
[] ? autoremove_wake_function+0x0/0x38
[] ? radix_tree_lookup_slot+0xe/0x10
[] ? find_get_page+0x90/0xa5
[] ? filemap_fault+0x201/0x34f
[] ? fget_light+0x2f/0xac
[] ? verify_iovec+0x4f/0x94
[] sys_recvmsg+0x14d/0x223

Store the serial number when beginning to walk the fib and reload
pointers when continuing to walk after a change occurred. Similar
to other dumping functions, this might cause unrelated entries to
be missed when entries are deleted.

Tested-by: Ben Greear
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2010-02-13 04:06:35 +0800
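
The serial-number approach in 2bec5a369 can be sketched on a plain linked list. This is an illustration under assumptions, not the kernel's fib6 walker: all demo_* names are hypothetical, and "reloading pointers" is modelled here by re-walking from the head and skipping the number of entries already dumped.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node { int val; struct demo_node *next; };

struct demo_fib {
    struct demo_node *head;
    unsigned int sernum;   /* bumped on every modification */
};

struct demo_walker {
    struct demo_node *cur; /* cached position, may go stale */
    unsigned int sernum;   /* serial number seen when cur was loaded */
    int done;              /* entries dumped so far */
};

void demo_walk_start(struct demo_walker *w, const struct demo_fib *fib)
{
    w->cur = fib->head;
    w->sernum = fib->sernum;
    w->done = 0;
}

/* Dump up to budget entries into out; returns how many were written. */
int demo_walk_continue(struct demo_walker *w, const struct demo_fib *fib,
                       int *out, int budget)
{
    int n = 0;

    assert(w != NULL && fib != NULL);
    if (w->sernum != fib->sernum) {
        /* Table changed while suspended: do not trust the cached pointer. */
        struct demo_node *p = fib->head;

        for (int i = 0; i < w->done && p != NULL; i++)
            p = p->next;
        w->cur = p;
        w->sernum = fib->sernum;
    }
    while (w->cur != NULL && n < budget) {
        out[n++] = w->cur->val;
        w->cur = w->cur->next;
        w->done++;
    }
    return n;
}

/* Modification: unlink the first entry and bump the serial number. */
void demo_fib_del_head(struct demo_fib *fib)
{
    if (fib->head != NULL)
        fib->head = fib->head->next;
    fib->sernum++;
}
```

Dumping two of four entries, deleting the head, and resuming yields only the last entry: the third one is skipped, which mirrors the caveat in the commit message that unrelated entries may be missed when entries are deleted.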

04 Nov, 2009

1 commit

fd2c3ef76 net: cleanup include/net

This cleanup patch puts struct/union/enum opening braces
on the first line to ease grep games.

struct something
{

becomes :

struct something {

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-11-04 21:06:25 +0800

31 Jul, 2009

1 commit

a33bc5c15 xfrm: select sane defaults for xfrm[4|6] gc_thresh

Choose saner defaults for xfrm[4|6] gc_thresh values on init

Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024). Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.

For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache. Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.

For ipv6, it's a bit trickier. There is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems
sane to select a similar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).

Signed-off-by: Neil Horman
Signed-off-by: David S. Miller

Neil Horman
2009-07-31 09:52:15 +0800
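
The ipv4 arithmetic in a33bc5c15 works out as follows. This is a small sketch of the calculation as the commit message states it, with hypothetical function names; it is not the kernel code.

```c
#include <assert.h>
#include <limits.h>

/* Maximum route cache size: 16 entries per hash bucket. */
unsigned int demo_rt_max_size(unsigned int hash_buckets)
{
    assert(hash_buckets <= UINT_MAX / 16); /* guard against overflow */
    return 16 * hash_buckets;
}

/*
 * Garbage collection starts at gc_thresh and new allocations are refused
 * at 2 * gc_thresh, so gc_thresh is half the maximum route cache size.
 */
unsigned int demo_xfrm4_gc_thresh(unsigned int hash_buckets)
{
    return demo_rt_max_size(hash_buckets) / 2;
}
```

For example, 1024 hash buckets give a maximum of 16384 entries and a gc_thresh of 8192, so the refusal point of 2 * gc_thresh coincides with the route cache maximum.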

05 Mar, 2008

1 commit

8ed677896 [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace

The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-05 05:48:30 +0800

04 Mar, 2008

3 commits

5b7c931df [NETNS][IPV6] ip6_fib - add net to gc timer parameter

The fib tables are now relative to the network namespace. When the
garbage collector timer expires, we must have a network namespace
parameter in order to retrieve the tables. For now this is the
init_net, but we should be able to have a timer per namespace and use
the timer callback parameter to pass the network namespace from the
expired timer.

The timer callback, fib6_run_gc, is actually called synchronously by
some functions and asynchronously when the timer expires.

When the timer expires, the delay passed to fib6_run_gc is always
zero. So I changed fib6_run_gc so that it is no longer the timer
callback but a function called by the timer callback. I added a timer
callback whose only job is to retrieve the network namespace from the
data arg of the timer and call fib6_run_gc with a zero expiry time and
that network namespace. That makes the code cleaner for the
fib6_run_gc callers.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-04 15:28:58 +0800
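
The split described in 5b7c931df, a worker function plus a thin timer callback, can be sketched like this. All names are hypothetical and uintptr_t stands in for the opaque data argument of the timer; this is an illustration, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace with some gc bookkeeping. */
struct demo_net {
    unsigned long last_gc_expires;
    int gc_runs;
};

/* The worker: callable directly by synchronous callers with any delay. */
void demo_run_gc(unsigned long expires, struct demo_net *net)
{
    assert(net != NULL);
    net->last_gc_expires = expires;
    net->gc_runs++;
}

/*
 * The timer callback: recover the namespace from the opaque timer
 * argument and run the gc with a zero expiry time.
 */
void demo_gc_timer_cb(uintptr_t data)
{
    demo_run_gc(0, (struct demo_net *)data);
}
```

Synchronous callers keep a plain two-argument function, while the timer path only has to know how to unpack its single argument.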
f3db48517 [NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces

The function fib6_clean_all takes the network namespace as a
parameter. That allows flushing the routes related to a specific
network namespace.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-04 15:27:06 +0800
58f09b78b [NETNS][IPV6] ip6_fib - make it per network namespace

The fib tables for ipv6 are moved to the network namespace structure.
All references to them are made relative to the network namespace.

All external calls to the ip6_fib functions taking the network
namespace parameter are made using the init_net variable, so the
ip6_fib engine is ready for the namespaces but the callers are not yet.

Signed-off-by: Daniel Lezcano
Signed-off-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-03-04 15:25:27 +0800

08 Feb, 2008

1 commit

4e881a217 [IPV6] Minor cleanup: remove unused definitions in net/ip6_fib.h

This patch removes some unused definitions and one method typedef
declaration (f_pnode) in include/net/ip6_fib.h, as they are not used
in the kernel.

Signed-off-by: Rami Rosen
Signed-off-by: David S. Miller

Rami Rosen
2008-02-08 10:11:49 +0800

29 Jan, 2008

5 commits

a1b051405 [XFRM] IPv6: Fix dst/routing check at transformation.

IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
care about routing changes. It is required by MIPv6 and
off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Masahide NAKAMURA
2008-01-29 06:59:36 +0800
7e5449c21 [IPV6]: route6 remove ifdef for fib_rules

The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That allows removing some ifdefs in the route.c
file and makes the code a little clearer.

Signed-off-by: Daniel Lezcano
Acked-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

Daniel Lezcano
2008-01-29 06:56:59 +0800
9eb87f3f7 [IPV6]: Make fib6_rules_init to return an error code.

When the fib_rules initialization finishes, no return code is provided,
so there is no way for the caller to know whether the initialization
succeeded or failed. This patch fixes that.

Signed-off-by: Daniel Lezcano
Acked-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-01-29 06:56:46 +0800
d63bddbe9 [IPV6]: Make fib6_init to return an error code.

If there is an error in the initialization function, nothing is
reported to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano
Acked-by: Benjamin Thery
Signed-off-by: David S. Miller

Daniel Lezcano
2008-01-29 06:56:45 +0800
b4ce92775 [IPV6]: Move nfheader_len into rt6_info

The dst member nfheader_len is only used by IPv6. It's also currently
creating a rather ugly alignment hole in struct dst. Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2008-01-29 06:53:37 +0800
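
The effect of reordering fields to minimize holes, as b4ce92775 does for rt6_info, can be seen with two hypothetical structs that hold the same members in different orders. This is an illustration only; the member names are made up and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Small members placed between pointers leave padding after each one. */
struct demo_holey {
    void *first;
    unsigned char flag_a;
    void *second;
    unsigned char flag_b;
};

/* Same members, pointers first: the small members share one tail slot. */
struct demo_reordered {
    void *first;
    void *second;
    unsigned char flag_a;
    unsigned char flag_b;
};

/* Bytes saved by the reordering on the current ABI. */
size_t demo_bytes_saved(void)
{
    assert(sizeof(struct demo_holey) >= sizeof(struct demo_reordered));
    return sizeof(struct demo_holey) - sizeof(struct demo_reordered);
}
```

On a typical LP64 ABI the first layout occupies 32 bytes and the second 24, although the C standard does not guarantee those exact numbers.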

11 Oct, 2007

1 commit

a47ed4cd8 [IPV6] XFRM: Fix connected socket to use transformation.

When XFRM policy and state are ready after TCP connection is started,
the traffic should be transformed immediately, however it does not
on IPv6 TCP.

It depends on a dst cache replacement policy with connected socket.
It seems that the replacement is always done for IPv4, however, on
IPv6 case it is done only when routing cookie is changed.

This patch fixes that, so that a non-transformation dst can be changed
to a transformation one.
This behavior is required by MIPv6 and improves IPv6 IPsec.

Fixes by Masahide NAKAMURA.

Signed-off-by: Noriaki TAKAMIYA
Signed-off-by: Masahide NAKAMURA
Signed-off-by: David S. Miller

Noriaki TAKAMIYA
2007-10-11 07:48:32 +0800

26 Apr, 2007

1 commit

c127ea2c4 [IPv6]: Use rtnl registration interface

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2007-04-26 13:27:13 +0800

26 Mar, 2007

1 commit

f11e6659c [IPV6]: Fix routing round-robin locking.

As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.

Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf. The round robin code executes during
lookup and thus with the rwlock taken as a reader. A small local
spinlock tries to provide protection but this does not work at all
for two reasons:

1) The round-robin list manipulation, as coded, goes like this (with
read lock held):

walk routes finding head and tail

spin_lock();
rotate list using head and tail
spin_unlock();

While one thread is rotating the list, another thread can
end up with stale values of head and tail and then proceed
to corrupt the list when it gets the lock. This ends up causing
the OOPS in fib6_add() later on that many people have been hitting.

2) All the other code paths that run with the rwlock held as
a reader do not expect the list to change on them, they
expect it to remain completely fixed while they hold the
lock in that way.

So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.

Reimplement using a per-fib6_node round-robin pointer. This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself. We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.

The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code went in.

Signed-off-by: David S. Miller

David S. Miller
2007-03-26 09:48:05 +0800
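
The per-node round-robin pointer from f11e6659c can be sketched as below. This is a simplified userspace illustration with hypothetical names (struct demo_rt, struct demo_node, demo_rr_select, demo_rt_del), not the kernel implementation. Selection only advances a pointer and never relinks the list; removal, which happens on the writer side, resets the pointer when it refers to the entry being removed.

```c
#include <assert.h>
#include <stddef.h>

struct demo_rt { int id; struct demo_rt *next; };

struct demo_node {
    struct demo_rt *leaf;   /* list of routes at this node */
    struct demo_rt *rr_ptr; /* next route to hand out, or NULL */
};

/* Pick the next route in round-robin order without touching the list. */
struct demo_rt *demo_rr_select(struct demo_node *n)
{
    struct demo_rt *cur;

    assert(n != NULL);
    if (n->leaf == NULL)
        return NULL;
    cur = n->rr_ptr != NULL ? n->rr_ptr : n->leaf;
    n->rr_ptr = cur->next != NULL ? cur->next : n->leaf; /* wrap around */
    return cur;
}

/* Writer-side removal: unlink the route and reset a pointer to it. */
void demo_rt_del(struct demo_node *n, struct demo_rt *victim)
{
    struct demo_rt **pp;

    for (pp = &n->leaf; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    if (n->rr_ptr == victim)
        n->rr_ptr = NULL;
}
```

Because lookups no longer reorder the list, code that holds the lock as a reader sees a list that stays fixed, which is the property the commit message says the old rotation scheme violated.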

11 Feb, 2007

1 commit

7cc482634 [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer

This patch removes the next pointer from 'struct rt6_info.u' union,
and renames u.next to u.dst.rt6_next.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2007-02-11 15:20:40 +0800

14 Dec, 2006

1 commit

8bce65b95 [IPV6]: Make fib6_node subtree depend on IPV6_SUBTREES

Make fib6_node 'subtree' depend on IPV6_SUBTREES.

Signed-off-by: Kim Nordlund
Signed-off-by: David S. Miller

Kim Nordlund
2006-12-14 08:48:31 +0800

03 Dec, 2006

1 commit

7a3025b1b [IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}.

Otherwise, we will see a lot of casts...

Signed-off-by: YOSHIFUJI Hideaki

YOSHIFUJI Hideaki
2006-12-03 13:22:07 +0800

23 Sep, 2006

7 commits

77d16f450 [IPV6] ROUTE: Unify RT6_F_xxx and RT6_SELECT_F_xxx flags

Unify RT6_F_xxx and RT6_SELECT_F_xxx flags into
RT6_LOOKUP_F_xxx flags, and put them into ip6_route.h

Signed-off-by: YOSHIFUJI Hideaki
Acked-by: Ville Nuorvala

YOSHIFUJI Hideaki
2006-09-23 05:55:56 +0800
7fc33165a [IPV6] ROUTE: Put SUBTREE() as FIB6_SUBTREE() into ip6_fib.h for future use.

Based on MIPL2 kernel patch.

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: Ville Nuorvala
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki
2006-09-23 05:55:51 +0800
86872cb57 [IPv6] route: FIB6 configuration using struct fib6_config

Replaces the struct in6_rtmsg based interface originating from
the ioctl interface with a struct fib6_config based one. Allows
changing the interface without breaking the ioctl interface
and avoids passing on tons of parameters.

The recently introduced struct nl_info is used to pass on
netlink authorship information for notifications.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2006-09-23 05:55:12 +0800
90d41122f [IPV6] ip6_fib.c: make code static

Make the following needlessly global code static:
- fib6_walker_lock
- struct fib6_walker_list
- fib6_walk_continue()
- fib6_walk()

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Adrian Bunk
2006-09-23 05:54:38 +0800
8ce11e6a9 [NET]: Make code static.

This patch makes needlessly global code static.

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
2006-09-23 05:54:07 +0800
101367c2f [IPV6]: Policy Routing Rules

Adds support for policy routing rules including a new
local table for routes with a local destination.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2006-09-23 05:53:41 +0800
c71099acc [IPV6]: Multiple Routing Tables

Adds the framework to support multiple IPv6 routing tables.
Currently all automatically generated routes are put into the
same table. This could be changed at a later point after
considering the produced locking overhead.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2006-09-23 05:53:39 +0800

22 Jun, 2005

1 commit

0d51aa80a [IPV6]: V6 route events reported with wrong netlink PID and seq number

Essentially netlink at the moment always reports a pid and sequence of 0
for v6 route activities.
To understand the repercussions of this look at:
http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

While fixing this, I took the liberty to resolve the outstanding issue
of IPV6 routes inserted via ioctls to have the correct pids as well.

This patch tries to behave as close as possible to the v4 routes, i.e.
it maintains whatever PID the socket issuing the command owns as opposed to
the process. That made the patch a little bulky.

I have tested against both a netlink derived utility to add/del routes as
well as an ioctl derived one. The Quagga folks have tested against quagga.
This fixes the problem and so far hasn't been detected to introduce any
new issues.

Signed-off-by: Jamal Hadi Salim
Acked-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

Jamal Hadi Salim
2005-06-22 04:51:04 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800