29 Jun, 2013
1 commit
-
Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assumed that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propagate mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.

Signed-off-by: Timo Teräs
Reviewed-by: Julian Anastasov
Signed-off-by: David S. Miller
13 Jun, 2013
1 commit
-
Reduce the uses of this unnecessary typedef.
Done via perl script:
$ git grep --name-only -w ctl_table net | \
xargs perl -p -i -e '\
sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
06 Jun, 2013
1 commit
-
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge. Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller
03 Jun, 2013
3 commits
-
commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthwhile to have a
separate genid to avoid excessive pmtu resets.

Cc: Steffen Klassert
Signed-off-by: Timo Teräs
Signed-off-by: David S. Miller
-
The tunnel devices call update_pmtu for each packet sent, which causes
contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
actually changed, and there is still plenty of time before the entry
expires.

Signed-off-by: Timo Teräs
Signed-off-by: David S. Miller
-
This reverts commit 05ab86c5 (xfrm4: Invalidate all ipv4 routes on
IPsec pmtu events). Flushing all cached entries is not needed.

Instead, invalidate only the related next hop dsts to recheck for
the added next hop exception where needed. This also fixes a subtle
race due to bumping generation ids before updating the pmtu.

Cc: Steffen Klassert
Signed-off-by: Timo Teräs
Signed-off-by: David S. Miller
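The fnhe_lock contention fix in this group (skip the update when the pmtu is unchanged and the entry is not close to expiry) can be illustrated with a small user-space sketch. This is not the kernel patch: `struct pmtu_state`, `pmtu_update_is_redundant()` and the choice of "more than half of the lifetime remaining" as the meaning of "plenty of time" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is earlier than b" for tick counters, written in the
 * style of the kernel's time_before() macro. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the PMTU state kept on a cached route. */
struct pmtu_state {
    unsigned int pmtu;      /* learned path MTU */
    unsigned long expires;  /* tick at which the learned value expires */
};

/* Returns true when an update can be skipped without taking any lock:
 * the MTU is unchanged and more than half of the lifetime is still left
 * (the "half" threshold is an assumption of this sketch). */
static bool pmtu_update_is_redundant(const struct pmtu_state *st,
                                     unsigned int new_pmtu,
                                     unsigned long now,
                                     unsigned long lifetime)
{
    assert(st != 0 && lifetime > 0);
    return st->pmtu == new_pmtu &&
           tick_before(now, st->expires - lifetime / 2);
}
```

A caller would check `pmtu_update_is_redundant()` first and only fall through to the locked update path when it returns false, so a tunnel calling update_pmtu for every packet rarely touches the lock.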
28 May, 2013
1 commit
-
Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
doesn't call __build_flow_key() directly but via
ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
getting pointer to IPv4 header of the ICMP redirect packet
rather than pointer to the embedded IPv4 header of the packet
initiating the redirect.

As a result, handling of ICMP redirects initiated by TCP packets
is broken. Issue was introduced by
4895c771c ("ipv4: Add FIB nexthop exceptions.")

Signed-off-by: Michal Kubecek
Signed-off-by: David S. Miller
22 Mar, 2013
1 commit
-
With decnet converted, we can finally get rid of rta_buf and the
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
20 Feb, 2013
1 commit
-
The variable ip_rt_gc_timeout is used only when
CONFIG_SYSCTL is selected.

Move such variables under CONFIG_SYSCTL.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller
19 Feb, 2013
1 commit
-
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

This is a little chaotic. This patch changes all uses of
proc_net_fops_create to proc_create. We can remove
proc_net_fops_create after this patch.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller
23 Jan, 2013
1 commit
-
git commit 9cb3a50c (ipv4: Invalidate the socket cached route on
pmtu events if possible) introduced a refcount problem. We don't
get a refcount on the route if we get it from __sk_dst_get(), but
we need one if we want to reuse this route because __sk_dst_set()
releases the refcount of the old route. This patch adds proper
refcount handling for that case. We introduce a 'new' flag to
indicate that we are going to use a new route and we release the
old route only if we replace it by a new one.

Reported-by: Julian Anastasov
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
22 Jan, 2013
1 commit
-
The route lookup in ipv4_sk_update_pmtu() might return a route
different from the route we cached at the socket. This is because
standard routes are per cpu, so each cpu has its own struct rtable.
This means that we do not invalidate the socket cached route if the
NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
uses. As a result, the cached route is reused until we disconnect.

With this patch we invalidate the socket cached route if possible.
If the socket is owned by the user, we can't update the cached
route directly. A followup patch will implement socket release
callback functions for datagram sockets to handle this case.

Reported-by: Yurij M. Plotnikov
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
17 Jan, 2013
2 commits
-
Routes with locked mtu should not use learned pmtu information,
so do not update the pmtu on these routes.

Reported-by: Julian Anastasov
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
-
The output route check was introduced with git commit 261663b0
(ipv4: Don't use the cached pmtu informations for input routes)
at a time when we cached the pmtu information on the
inetpeer. Now the pmtu information is back in the routes,
so this check is obsolete. It also had some unwanted side effects,
as reported by Timo Teras and Lukas Tribus.

Signed-off-by: Steffen Klassert
Acked-by: Timo Teräs
Signed-off-by: David S. Miller
08 Dec, 2012
1 commit
-
Commit f1ce3062c538 (ipv4: Remove 'rt_dst' from 'struct rtable') removed the
call to ipmr_get_route(), which gets the multicast parameters of the route.

I revert the part of the patch that removed this call. I think the goal was only
to get rid of the rt_dst field.

The patch is only compile-tested. My first idea was to remove ipmr_get_route()
because rt_fill_info() was the only user, but it seems the previous patch cleans
the code a bit too much ;-)

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
26 Nov, 2012
1 commit
-
Conflicts:
drivers/net/wireless/iwlwifi/pcie/tx.c

Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.

Signed-off-by: David S. Miller
23 Nov, 2012
1 commit
-
Starting from 3.6 we cache output routes for
multicasts only when using route to 224/4. For local receivers
we can set RTCF_LOCAL flag depending on the membership but
in such case we use maddr and saddr which are not caching
keys as before. Additionally, we can not use same place to
cache routes that differ in RTCF_LOCAL flag value.

Fix it by caching only RTCF_MULTICAST entries
without RTCF_LOCAL (send-only, no loopback). As a side effect,
we avoid unneeded lookup for fnhe when not caching because
multicasts are not redirected and they do not learn PMTU.

Thanks to Maxime Bizon for showing the caching
problems in __mkroute_output for 3.6 kernels: different
RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
ip_output call and the visible problem is that traffic can
not reach local receivers via loopback.

Reported-by: Maxime Bizon
Tested-by: Maxime Bizon
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
19 Nov, 2012
1 commit
-
In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
13 Nov, 2012
1 commit
-
The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we only start garbage collection when we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.

Signed-off-by: Steffen Klassert
19 Oct, 2012
1 commit
-
Currently we can not flush cached pmtu/redirect information via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exception if the old route is
expired when we bind a new route to a nh exception.

Signed-off-by: Steffen Klassert
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Oct, 2012
1 commit
-
Sparse complains about RTA_MARK, which should be in host order according
to the include file and its usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
09 Oct, 2012
7 commits
-
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

The returned route can be cached as follows:
- in NH exception: because the cached routes are not shared
with other destinations
- in FIB NH: when using gateway because all destinations for
NH share same gateway

As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.

Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
-
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. This way we force the neighbour
resolving to work with the routed destination but we
can use a different address in the packet, a feature needed
for IPVS-DR where the original packet for the virtual IP is routed
via the route to the real IP.

Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
-
Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.

Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
-
After "Cache input routes in fib_info nexthops" (commit
d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
e.g. from other input devices or from sources that are not in the same subnet.

As a result, we have to disable the caching of the RTCF_DOREDIRECT
flag and force source validation for the case when forwarding
traffic to the input device. If traffic comes from a directly connected
source we allow redirection as it was done before both changes.

Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled; this can avoid source address validation and
help with caching the routes.

After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.

Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
-
We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
-
When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exception and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't create a nh exception in
that case.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
-
Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.

With this patch we always invalidate or update the route on pmtu events.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
29 Sep, 2012
1 commit
-
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller
19 Sep, 2012
3 commits
-
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller
-
We don't use jhash anymore since route cache removal,
so we can get rid of get_random_bytes() calls for rt_genid
changes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Since route cache deletion (89aef8921bfbac22f), delay is no
longer used. Remove it.

Signed-off-by: Nicolas Dichtel
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Sep, 2012
1 commit
-
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman"
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
08 Sep, 2012
2 commits
-
We don't use jhash anymore since route cache removal,
so we can get rid of get_random_bytes() calls for rt_genid
changes.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Since route cache deletion (89aef8921bfbac22f), delay is no
longer used. Remove it.

Signed-off-by: Nicolas Dichtel
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Sep, 2012
2 commits
-
In ipv4_mtu there is some logic where we test for a non-zero value
and a timer expiration, then set the value to zero, and then, if the
value is zero, set it to a value based on the dst. Instead of
bothering with the extra steps it is easier to just clean up the logic so
that we set it to the dst based value if it is zero or if the timer has
expired.

Signed-off-by: Alexander Duyck
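The simplified control flow described in this commit can be sketched in user-space C as follows. The struct and function names (`struct route_mtu`, `effective_mtu()`) are hypothetical, and the dst-based fallback is reduced to a plain field; only the shape of the test ("zero or expired, then fall back") is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b" for tick counters, in the style of
 * the kernel's time_after_eq() macro. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Hypothetical, flattened view of the fields involved. */
struct route_mtu {
    unsigned int pmtu;      /* learned PMTU, 0 when nothing was learned */
    unsigned long expires;  /* tick at which the learned PMTU expires */
    unsigned int dst_mtu;   /* fallback value derived from the dst */
};

/* Single test instead of "check, zero, re-check": use the dst based value
 * when no PMTU is known or when the learned one has expired. */
static unsigned int effective_mtu(const struct route_mtu *rt,
                                  unsigned long now)
{
    assert(rt != 0);
    unsigned int mtu = rt->pmtu;

    if (!mtu || tick_after_eq(now, rt->expires))
        mtu = rt->dst_mtu;
    return mtu;
}
```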
-
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller
31 Aug, 2012
1 commit
-
The following lockdep splat was reported by Pavel Roskin:
[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240] #0: (slock-AF_INET){+.-...}, at: [] release_sock+0x2c/0xa0
[ 1570.586253] #1: (fnhe_lock){+.-...}, at: [] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271] [] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275] [] update_or_create_fnhe+0x15c/0x270
[ 1570.586278] [] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282] [] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285] [] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290] [] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293] [] tcp_release_cb+0xa4/0xb0
[ 1570.586296] [] release_sock+0x55/0xa0
[ 1570.586300] [] tcp_sendmsg+0x4af/0xf50
[ 1570.586305] [] inet_sendmsg+0x120/0x230
[ 1570.586308] [] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312] [] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315] [] ? sock_update_classid+0x130/0x3b0
[ 1570.586320] [] do_sock_write+0xc5/0xe0
[ 1570.586323] [] sock_aio_write+0x53/0x80
[ 1570.586328] [] do_sync_write+0xa3/0xe0
[ 1570.586332] [] vfs_write+0x165/0x180
[ 1570.586335] [] sys_write+0x45/0x90
[ 1570.586340] [] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet
Reported-by: Pavel Roskin
Signed-off-by: David S. Miller
24 Aug, 2012
1 commit
-
Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slows down multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
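The idea of testing list membership before taking the lock can be shown with a minimal user-space sketch. Everything here is illustrative: the list helpers mimic the kernel's circular `list_head` idiom but are written from scratch, and a counter stands in for acquiring rt_uncached_lock so the effect is observable without threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node, in the style of list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void node_init(struct list_node *n) { n->next = n; n->prev = n; }

/* A node that points at itself is not on any list. */
static bool node_is_linked(const struct list_node *n) { return n->next != n; }

static void node_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void node_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Hypothetical stand-ins: the counter models lock acquisitions. */
struct uncached_list {
    struct list_node head;
    unsigned long lock_acquisitions;
};

struct route {
    struct list_node uncached;
};

/* Only pay for the lock when the route really is on the uncached list. */
static void route_destroy(struct route *rt, struct uncached_list *ul)
{
    assert(rt != 0 && ul != 0);
    if (!node_is_linked(&rt->uncached))
        return;
    ul->lock_acquisitions++;        /* lock would be taken here */
    node_del_init(&rt->uncached);   /* unlink while "holding" the lock */
}
```

With this shape, a route that was allocated as uncacheable but never inserted into the list is destroyed without touching the shared lock at all.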
23 Aug, 2012
1 commit
-
Sylvain Munaut reported the following info:
- TCP connection get "stuck" with data in send queue when doing
"large" transfers ( like typing 'ps ax' on a ssh connection )
- Only happens on path where the PMTU is lower than the MTU of
the interface
- Is not present right after boot, it only appears 10-20min after
boot or so. (and that's inside the _same_ TCP connection, it works
fine at first and then in the same ssh session, it'll get stuck)
- Definitely seems related to fragments somehow since I see a router
sending ICMP message saying fragmentation is needed.
- Exact same setup works fine with kernel 3.5.1

Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.

With help from Julian Anastasov.

Reported-by: Sylvain Munaut
Signed-off-by: Eric Dumazet
CC: Julian Anastasov
Tested-by: Sylvain Munaut
Signed-off-by: David S. Miller
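The failure mode described in this commit can be reproduced in a few lines of user-space C. The sketch assumes that the helper in question only ever moves an expiry earlier (or sets it when unset), which is how the message's "does nothing because dst.expires is already set" is read here; `struct dst_sketch`, `set_expires_if_earlier()` and `rearm_expires()` are hypothetical names, not kernel functions, and this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is earlier than b" for tick counters. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

struct dst_sketch {
    unsigned long expires;  /* 0 means "no expiry set" */
};

/* Assumed old behaviour: set the expiry only if none is set yet or if the
 * new one is earlier. A stale expiry in the past is therefore never moved
 * forward again. */
static void set_expires_if_earlier(struct dst_sketch *d,
                                   unsigned long now, unsigned long timeout)
{
    assert(d != 0);
    unsigned long expires = now + timeout;

    if (expires == 0)
        expires = 1;        /* keep 0 reserved for "unset" */
    if (d->expires == 0 || tick_before(expires, d->expires))
        d->expires = expires;
}

/* Behaviour the commit asks for: always rearm, regardless of prior value. */
static void rearm_expires(struct dst_sketch *d,
                          unsigned long now, unsigned long timeout)
{
    assert(d != 0);
    unsigned long expires = now + timeout;

    d->expires = expires ? expires : 1;
}
```

In this model, a second PMTU event arriving once the first period has elapsed leaves the expiry stuck in the past with `set_expires_if_earlier()`, whereas `rearm_expires()` starts a fresh period.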