05 Jan, 2012
2 commits
-
This ensures a linear behaviour when filling /proc/net/if_inet6 thus making
ifconfig run really fast on IPv6 only addresses. In fact, with this patch and
the IPv4 one sent a while ago, ifconfig will run in linear time regardless of
address type.

IPv4 related patch: f04565ddf52e401880f8ba51de0dff8ba51c99fd
dev: use name hash for dev_seq_ops
...

Some statistics (running ifconfig > /dev/null on a different setup):
iface count | IPv6 no-patch time | IPv6 patched time | IPv4 time
-----------------------------------------------------------------
       6250 |             0.23 s |            0.13 s |    0.11 s
      12500 |             0.62 s |            0.28 s |    0.22 s
      25000 |             2.91 s |            0.57 s |    0.46 s
      50000 |            11.37 s |            1.21 s |    0.94 s
     128000 |            86.78 s |            3.05 s |    2.54 s

Signed-off-by: Mihai Maruseac
Cc: Daniel Baluta
Signed-off-by: David S. Miller
-
Recently Dave noticed that a test we did in ipv6_add_addr, to see if we have a
next hop route for the interface we're adding an address to, was wrong (see commit
7ffbcecbeed91e5874e9a1cfc4c0cbb07dac3069). For one, it never triggers, and two,
it was completely wrong to begin with. This test was meant to cover this
section of RFC 4429:

3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration
* (modifies section 5.5) A host MAY choose to configure a new address
as an Optimistic Address. A host that does not know the SLLAO
of its router SHOULD NOT configure a new address as Optimistic.
A router SHOULD NOT configure an Optimistic Address.

This patch should bring us into proper compliance with the above clause. Since
we only add a SLAAC address after we've received a RA which may or may not
contain a source link layer address option, we can pass a pointer to that option
to addrconf_prefix_rcv (which may be null if the option is not present), and
only set the optimistic flag if the option was found in the RA.

Change notes:
(v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
a pointer to make its use more clear as per request from davem.

Signed-off-by: Neil Horman
CC: "David S. Miller"
CC: Hideaki YOSHIFUJI
Signed-off-by: David S. Miller
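
The RFC 4429 clause quoted in the commit above can be read as a simple predicate. The following is a minimal userspace sketch of that reading, not the kernel code: the flag value, the function names and the `optimistic_dad` knob are hypothetical, and the only input taken from the commit message is the boolean saying whether the RA carried a source link-layer address option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; it is not the kernel's definition. */
#define SKETCH_F_OPTIMISTIC 0x1u

/*
 * Pick the flags for a newly autoconfigured address.
 *   is_router      - node forwards packets; a router SHOULD NOT be optimistic
 *   optimistic_dad - Optimistic DAD is enabled (assumed configuration knob)
 *   sllao          - the RA carried a Source Link-Layer Address Option
 */
unsigned int sketch_slaac_flags(bool is_router, bool optimistic_dad, bool sllao)
{
    unsigned int flags = 0;

    /* Only a host that knows its router's link-layer address goes optimistic. */
    if (!is_router && optimistic_dad && sllao)
        flags |= SKETCH_F_OPTIMISTIC;
    return flags;
}

/* Usage: the optimistic flag is set only when every condition holds. */
void sketch_slaac_example(void)
{
    assert(sketch_slaac_flags(false, true, true) == SKETCH_F_OPTIMISTIC);
    assert(sketch_slaac_flags(false, true, false) == 0); /* no SLLAO in RA */
    assert(sketch_slaac_flags(true, true, true) == 0);   /* routers never  */
}
```

Passing a bool rather than the option pointer, as the v2 note describes, keeps the decision down to a single condition.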
31 Dec, 2011
1 commit
-
During some debugging I needed to look into how /proc/net/ipv6_route
operated, and in my digging I found it's calling fib6_clean_all(), which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table, taking a write lock seems unnecessary and would probably cause
network performance issues.

To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
* doing nothing
* reading ipv6 route table via proc
(while :; do cat /proc/net/ipv6_route > /dev/null; done)
* reading ipv6 route table via rtnetlink
(while :; do ip -6 route show table all > /dev/null; done)

* Load the ipv6 route table up with:
* for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done

* iperf commands:
* client: iperf -i 1 -V -c
* server: iperf -V -s

* iperf results - 3 runs each (in Mbits/sec)
* nothing: client: 927,927,927 server: 927,927,927
* proc: client: 179,97,96,113 server: 142,112,133
* iproute: client: 928,927,928 server: 927,927,927

lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.

Signed-off-by: Josh Hunt
Signed-off-by: David S. Miller
30 Dec, 2011
2 commits
-
In some of the rt6_bind_neighbour() call sites, it hasn't hooked
up the rt->dst.dev pointer yet, so we'd deref a NULL pointer when
obtaining dev->ifindex for the neighbour hash function computation.

Just pass the netdevice explicitly in to fix this problem.
Reported-by: Bjarke Istrup Pedersen
Signed-off-by: David S. Miller
-
I missed this while adding ipv6 support to inet_peer.
Signed-off-by: David S. Miller
29 Dec, 2011
4 commits
-
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.
Signed-off-by: David S. Miller
-
Also, create and use an rt6_bind_neighbour() in net/ipv6/route.c to
consolidate some common logic.

Signed-off-by: David S. Miller
-
In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.

Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller
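
As an illustration of the idea in the commit above, here is a minimal sketch of hashing a 128-bit key as four 32-bit words, each multiplied by its own random value. The function name, the `hash_bits` parameter and the choice to keep the top bits of the sum are assumptions made for this example; it is not presented as the kernel's hash function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KEY_WORDS 4 /* an IPv6 address is four 32-bit words */

/*
 * Hash a vector of 32-bit words by giving every element its own random
 * multiplier and summing the products modulo 2^32. The top hash_bits
 * bits of the sum select the bucket (an assumption of this sketch).
 */
uint32_t sketch_vector_hash(const uint32_t key[SKETCH_KEY_WORDS],
                            const uint32_t rnd[SKETCH_KEY_WORDS],
                            unsigned int hash_bits)
{
    uint32_t sum = 0;
    int i;

    assert(hash_bits >= 1 && hash_bits <= 32);
    for (i = 0; i < SKETCH_KEY_WORDS; i++)
        sum += key[i] * rnd[i]; /* unsigned arithmetic wraps mod 2^32 */

    return sum >> (32 - hash_bits);
}
```

If one random value were reused for all four words, swapping two words of the key would never change the sum; distinct per-element randoms avoid that class of guaranteed collisions.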
-
The route we have here is for the address being added to the interface,
ie. for input packet processing.

Therefore using that route to determine whether an output nexthop gateway
is known and resolved doesn't make any sense.

So, simply remove this test, it never triggered anyways.
Signed-off-by: David S. Miller
Acked-By: Neil Horman
27 Dec, 2011
1 commit
-
RDBG() wasn't even used, and the messages printed by RT6_DEBUG() were
far from useful. Just get rid of all this stuff, we can replace it
with something more suitable if we want.

Signed-off-by: David S. Miller
25 Dec, 2011
1 commit
24 Dec, 2011
1 commit
-
Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller
23 Dec, 2011
1 commit
-
Chris Boot reported crashes occurring in ipv6_select_ident().
[ 461.457562] RIP: 0010:[] [] ipv6_select_ident+0x31/0xa7
[ 461.578229] Call Trace:
[ 461.580742]
[ 461.582870] [] ? udp6_ufo_fragment+0x124/0x1a2
[ 461.589054] [] ? ipv6_gso_segment+0xc0/0x155
[ 461.595140] [] ? skb_gso_segment+0x208/0x28b
[ 461.601198] [] ? ipv6_confirm+0x146/0x15e
[nf_conntrack_ipv6]
[ 461.608786] [] ? nf_iterate+0x41/0x77
[ 461.614227] [] ? dev_hard_start_xmit+0x357/0x543
[ 461.620659] [] ? nf_hook_slow+0x73/0x111
[ 461.626440] [] ? br_parse_ip_options+0x19a/0x19a
[bridge]
[ 461.633581] [] ? dev_queue_xmit+0x3af/0x459
[ 461.639577] [] ? br_dev_queue_push_xmit+0x72/0x76
[bridge]
[ 461.646887] [] ? br_nf_post_routing+0x17d/0x18f
[bridge]
[ 461.653997] [] ? nf_iterate+0x41/0x77
[ 461.659473] [] ? br_flood+0xfa/0xfa [bridge]
[ 461.665485] [] ? nf_hook_slow+0x73/0x111
[ 461.671234] [] ? br_flood+0xfa/0xfa [bridge]
[ 461.677299] [] ?
nf_bridge_update_protocol+0x20/0x20 [bridge]
[ 461.684891] [] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
[ 461.691520] [] ? br_flood+0xfa/0xfa [bridge]
[ 461.697572] [] ? NF_HOOK.constprop.8+0x3c/0x56
[bridge]
[ 461.704616] [] ?
nf_bridge_push_encap_header+0x1c/0x26 [bridge]
[ 461.712329] [] ? br_nf_forward_finish+0x8a/0x95
[bridge]
[ 461.719490] [] ?
nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
[ 461.727223] [] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
[ 461.734292] [] ? nf_iterate+0x41/0x77
[ 461.739758] [] ? __br_deliver+0xa0/0xa0 [bridge]
[ 461.746203] [] ? nf_hook_slow+0x73/0x111
[ 461.751950] [] ? __br_deliver+0xa0/0xa0 [bridge]
[ 461.758378] [] ? NF_HOOK.constprop.4+0x56/0x56
[bridge]

This is caused by bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.

Problem is present since commit 87c48fa3b46 (ipv6: make fragment
identifications less predictable)

Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fall back to the 'no peer attached' handling.

Reported-by: Chris Boot
Tested-by: Chris Boot
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Dec, 2011
1 commit
-
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell
Signed-off-by: David S. Miller
16 Dec, 2011
1 commit
-
Conflicts:
drivers/net/ethernet/freescale/fsl_pq_mdio.c
net/batman-adv/translation-table.c
net/ipv6/route.c
14 Dec, 2011
2 commits
-
After commit 8e2ec639173f325977818c45011ee176ef2b11f6 ("ipv6: don't
use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
for setting the ANYCAST flag is now wrong.

'rt' will always now have a plen of 128, because it is set explicitly
to 128 by ip6_rt_copy.

So to restore the semantics of the test, check the destination prefix
length of 'ort'.

Signed-off-by: David S. Miller
-
Don't just succeed with a route that has a NULL neighbour attached.
This follows the behavior of addrconf_dst_alloc().

Allowing this kind of route to end up with a NULL neigh attached will
result in packet drops on output until the route is somehow
invalidated, since nothing will meanwhile try to look up the neigh
again.

A statistic is bumped for the case where we see a neigh-less route on
output, but the resulting packet drop is otherwise silent in nature,
and frankly it's a hard error for this to happen and ipv6 should do
what ipv4 does which is say something in the kernel logs.

Signed-off-by: David S. Miller
13 Dec, 2011
6 commits
-
This is not merged with the ipv4 match into xt_rpfilter.c
to avoid ipv6 module dependency issues.

Signed-off-by: Florian Westphal
Acked-by: David S. Miller
Signed-off-by: Pablo Neira Ayuso
-
This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make these values
per group of processes somehow. This is achieved in the
patches that follow in this patchset.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller
-
Same fix as 731abb9cb2 for ipip and sit tunnel.
Commit 1c5cae815d removed an explicit call to dev_alloc_name in
ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
will now create a valid name, however the tunnel keeps a copy of the
name in the private parms structure. Fix this by copying the name back
after register_netdevice has successfully returned.

This shows up if you do a simple tunnel add, followed by a tunnel show:
$ sudo ip tunnel add mode ipip remote 10.2.20.211
$ ip tunnel
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit
$ sudo ip tunnel add mode sit remote 10.2.20.212
$ ip tunnel
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16
sit%d: ioctl 89f8 failed: No such device
sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit

Cc: stable@vger.kernel.org
Signed-off-by: Ted Feng
Signed-off-by: David S. Miller
-
There is no obvious reason to add a default multicast route for loopback
devices; otherwise there would be a route entry whose dst.error is set to
-ENETUNREACH, which would block all multicast packets.

====================
[ more detailed explanation ]
The problem is that the resulting routing table depends on the order in
which interfaces are initialized, and in some situations that would block
all multicast packets. Suppose there are two interfaces on my computer
(lo and eth0). If we initialize 'lo' before 'eth0', the resulting routing
table (for multicast) would be

# ip -6 route show | grep ff00::
unreachable ff00::/8 dev lo metric 256 error -101
ff00::/8 dev eth0 metric 256

When sending multicast packets, the routing subsystem will return the first
route entry, which has its error set to -101 (ENETUNREACH).

I know the kernel will set the default ipv6 address for 'lo' when it is up
and won't set the default multicast route for it, but there is no reason to
stop the 'init' program from setting an address for 'lo', and that is exactly
what systemd did.

I am sure there is something wrong with either the kernel or systemd; currently
I think the kernel caused this problem.
====================
Signed-off-by: Li Wei
Signed-off-by: David S. Miller
10 Dec, 2011
1 commit
-
The UDP diag get_exact handler will require them to find a
socket by provided net, [sd]addr-s, [sd]ports and device.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
07 Dec, 2011
2 commits
-
And return error pointers.
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
06 Dec, 2011
1 commit
-
To reflect the fact that a reference is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller
Acked-by: Roland Dreier
05 Dec, 2011
1 commit
-
like rt6_lookup, but allows caller to pass in flowi6 structure.
Will be used by the upcoming ipv6 netfilter reverse path filter
match.

Signed-off-by: Florian Westphal
Acked-by: David S. Miller
Signed-off-by: Pablo Neira Ayuso
04 Dec, 2011
5 commits
-
It's only used in net/ipv6/route.c and the NULL device check is
superfluous for all of the existing call sites.

Just expand the __ndisc_lookup_errno() call at each location.
Signed-off-by: David S. Miller
-
1) x == NULL --> !x
2) x != NULL --> x
3) (x&BIT) --> (x & BIT)
4) (BIT1|BIT2) --> (BIT1 | BIT2)
5) proper argument and struct member alignment

Signed-off-by: David S. Miller
-
1) x == NULL --> !x
2) x != NULL --> x
3) if() --> if ()
4) while() --> while ()
5) (x & BIT) == 0 --> !(x & BIT)
6) (x&BIT) --> (x & BIT)
7) x=y --> x = y
8) (BIT1|BIT2) --> (BIT1 | BIT2)
9) if ((x & BIT)) --> if (x & BIT)
10) proper argument and struct member alignment

Signed-off-by: David S. Miller
-
While parsing through IPv6 extension headers, fragment headers are
skipped making them invisible to the caller. This reports the
fragment offset of the last header in order to make it possible to
determine whether the packet is fragmented and, if so, whether it is
a first or last fragment.

Signed-off-by: Jesse Gross
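
To show how a caller might use a reported fragment offset, here is a small sketch that classifies a packet from the 16-bit offset/flags field of the IPv6 Fragment header (13 bits of offset in 8-octet units, 2 reserved bits, then the M flag). The names are hypothetical and the field is assumed to be already converted to host byte order; this does not reproduce the interface added by the commit above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset/flags field of the Fragment header, in host byte order. */
#define SKETCH_FRAG_OFFSET_MASK 0xFFF8u /* top 13 bits: offset          */
#define SKETCH_FRAG_MORE        0x0001u /* low bit: more fragments (M)  */

enum sketch_frag_kind {
    SKETCH_UNFRAGMENTED,
    SKETCH_FIRST_FRAGMENT,
    SKETCH_MIDDLE_FRAGMENT,
    SKETCH_LAST_FRAGMENT
};

/* The offset is stored in 8-octet units shifted left by 3 bits, so
 * masking off the low 3 bits already yields the offset in bytes. */
unsigned int sketch_frag_byte_offset(uint16_t field)
{
    return field & SKETCH_FRAG_OFFSET_MASK;
}

enum sketch_frag_kind sketch_classify(bool saw_frag_hdr, uint16_t field)
{
    bool more = (field & SKETCH_FRAG_MORE) != 0;
    unsigned int offset = sketch_frag_byte_offset(field);

    /* No Fragment header, or a lone fragment carrying the whole packet. */
    if (!saw_frag_hdr || (offset == 0 && !more))
        return SKETCH_UNFRAGMENTED;
    if (offset == 0)
        return SKETCH_FIRST_FRAGMENT;
    return more ? SKETCH_MIDDLE_FRAGMENT : SKETCH_LAST_FRAGMENT;
}

/* Usage: a fragment at byte offset 1448 with M set is a middle fragment. */
void sketch_frag_example(void)
{
    assert(sketch_frag_byte_offset(0x05A9) == 1448);
    assert(sketch_classify(true, 0x05A9) == SKETCH_MIDDLE_FRAGMENT);
}
```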
03 Dec, 2011
1 commit
02 Dec, 2011
1 commit
-
This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a.
If we take the "try_again" goto, due to a checksum error,
the 'len' has already been truncated. So we won't compute
the same values as the original code did.

Reported-by: paul bilke
Signed-off-by: David S. Miller
01 Dec, 2011
2 commits
-
There is no need to use the 'delta' flag when adding a single source to the
interface filter source list.

Signed-off-by: Jun Zhao
Signed-off-by: David S. Miller
-
Let the core self-size the neigh entry based upon the key length.
Signed-off-by: David S. Miller
29 Nov, 2011
2 commits
-
Igor Maravic reported an error caused by jump_label_dec() being called
from IRQ context:

BUG: sleeping function called from invalid context at kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
1 lock held by swapper/0:
#0: (&n->timer){+.-...}, at: [] call_timer_fn+0x0/0x340
Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
Call Trace:
[] __might_sleep+0x137/0x1f0
[] mutex_lock_nested+0x2f/0x370
[] ? trace_hardirqs_off+0xd/0x10
[] ? local_clock+0x6f/0x80
[] ? lock_release_holdtime.part.22+0x15/0x1a0
[] ? sock_def_write_space+0x59/0x160
[] ? arp_error_report+0x3e/0x90
[] atomic_dec_and_mutex_lock+0x5d/0x80
[] jump_label_dec+0x1d/0x50
[] net_disable_timestamp+0x15/0x20
[] sock_disable_timestamp+0x45/0x50
[] __sk_free+0x80/0x200
[] ? sk_send_sigurg+0x70/0x70
[] ? arp_error_report+0x3e/0x90
[] sock_wfree+0x3a/0x70
[] skb_release_head_state+0x70/0x120
[] __kfree_skb+0x16/0x30
[] kfree_skb+0x49/0x170
[] arp_error_report+0x3e/0x90
[] neigh_invalidate+0x89/0xc0
[] neigh_timer_handler+0x9e/0x2a0
[] ? neigh_update+0x640/0x640
[] __do_softirq+0xc8/0x3a0

Since jump_label_{inc|dec} must be called from process context only,
we must defer jump_label_dec() if net_disable_timestamp() is called
from interrupt context.

Reported-by: Igor Maravic
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
We need to set np->mcast_hops to its default value at this moment;
otherwise, when we use it and find its value is -1, the logic to
get the default hop limit doesn't take multicast into account and will
return the wrong hop limit (IPV6_DEFAULT_HOPLIMIT), which is for unicast.

Signed-off-by: Li Wei
Signed-off-by: David S. Miller
27 Nov, 2011
1 commit
-
Conflicts:
net/ipv4/inet_diag.c