11 Sep, 2016
1 commit
-
No longer needed
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
24 Apr, 2016
1 commit
-
Conflicts were two cases of simple overlapping changes,
nothing serious.
In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.
Signed-off-by: David S. Miller
12 Apr, 2016
1 commit
-
Vivek reported a kernel exception deleting a VRF with an active
connection through it. The root cause is that the socket has a cached
reference to a dst that is destroyed. Converting the dst_destroy to
dst_release and letting proper reference counting kick in does not
work as the dst has a reference to the device which needs to be released
as well.
I talked to Hannes about this at netdev and he pointed out the ipv4 and
ipv6 dst handling has dst_ifdown for just this scenario. Rather than
continuing with the reinvented dst wheel in VRF just remove it and
leverage the ipv4 and ipv6 versions.
Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
08 Apr, 2016
1 commit
-
In inet_iif check if skb_rtable is NULL for the skb and return
skb->skb_iif if it is.
This change allows inet_iif to be called before the dst
information has been set in the skb (e.g. when doing socket based
UDP GRO).
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
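The fallback described in this commit can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; only the NULL check and the skb_iif fallback come from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rtable. */
struct rtable_sketch {
    int rt_iif;                       /* input interface recorded on the route */
};

/* Hypothetical stand-in for the kernel's struct sk_buff. */
struct sk_buff_sketch {
    int skb_iif;                      /* ifindex the packet arrived on */
    const struct rtable_sketch *rt;   /* NULL until dst information is set */
};

/* Return the input interface, falling back to skb_iif when no route is attached. */
static int inet_iif_sketch(const struct sk_buff_sketch *skb)
{
    if (skb->rt == NULL)
        return skb->skb_iif;
    return skb->rt->rt_iif;
}
```

With this shape, a caller that runs before routing (such as the socket based UDP GRO case mentioned above) still gets a usable interface index.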
17 Feb, 2016
1 commit
-
Signed-off-by: Nikolay Borisov
Signed-off-by: David S. Miller
05 Jan, 2016
1 commit
-
Commands run in a vrf context are not failing as expected on a route lookup:
root@kenny:~# ip ro ls table vrf-red
unreachable default
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
ping: Warning: source address might be selected on device other than vrf-red.
PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.
--- 10.100.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propagating a lookup failure to the user allows the command to fail as
expected:
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
connect: No route to host
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
07 Oct, 2015
2 commits
-
Add operation to l3mdev to lookup source address for a given flow.
Add support for the operation to VRF driver and convert existing
IPv4 hooks to use the new lookup.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
05 Oct, 2015
1 commit
-
ICMP packets are inspected to let them route together with the flow they
belong to, minimizing the chance that a problematic path will affect flows
on other paths, and so that anycast environments can work with ECMP.
Signed-off-by: Peter Nørlund
Signed-off-by: David S. Miller
30 Sep, 2015
2 commits
-
Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
27 Sep, 2015
1 commit
-
Conflicts:
net/ipv4/arp.c
The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.
Signed-off-by: David S. Miller
26 Sep, 2015
1 commit
-
Very soon, the TCP stack might call inet_csk_route_req(), which
calls ip_route_output_flow() with an unlocked listener socket,
so we need to make sure ip_route_output_flow() is not trying to
change any field from its socket argument.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Sep, 2015
1 commit
-
Steffen reported that the recent change to add oif to dst lookups breaks
the VTI use case. The problem is that with the oif set in the flow struct
the comparison to the nh_oif is triggered. Fix by splitting the
FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
nh oif (FLOWI_FLAG_SKIP_NH_OIF).
Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern
Acked-by: Steffen Klassert
Signed-off-by: David S. Miller
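The effect of splitting the flag can be sketched as follows. The flag values and the helper name are hypothetical and chosen only for illustration; the point is that the nexthop oif comparison is now controlled by its own bit, independent of the VRF cache bypass.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel's actual definitions may differ. */
#define FLAG_VRFSRC_SKETCH      0x1u  /* bypass the vrf device cache */
#define FLAG_SKIP_NH_OIF_SKETCH 0x2u  /* do not compare against the nexthop oif */

/* Decide whether a nexthop is acceptable for a flow with the given oif. */
static bool nh_oif_matches_sketch(unsigned int flags, int flow_oif, int nh_oif)
{
    /* No oif requested, or the caller asked to skip the comparison. */
    if (flow_oif == 0 || (flags & FLAG_SKIP_NH_OIF_SKETCH))
        return true;
    return flow_oif == nh_oif;
}
```

Under this sketch a VTI lookup that sets an oif without the skip flag is still compared against the nexthop, while a VRF lookup can set both flags.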
16 Sep, 2015
1 commit
-
Add the FIB table id to rtable to make the information available for
IPv4 as it is for IPv6.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
02 Sep, 2015
1 commit
-
A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.
Fixes:
4e3c89920cd3a ("net: Introduce VRF related flags and helpers")
15be405eb2ea9 ("net: Add inet_addr lookup by table")
30bbaa1950055 ("net: Fix up inet_addr_type checks")
021dd3b8a142d ("net: Add routes to the table associated with the device")
dc028da54ed35 ("inet: Move VRF table lookup to inlined function")
f6d3c19274c74 ("net: FIB tracepoints")
Signed-off-by: David Ahern
Reviewed-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
21 Aug, 2015
1 commit
-
Currently, the lwtunnel state resides in per-protocol data. This is
a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
The xmit function of the tunnel does not know whether the packet has been
routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
lwtstate data to dst_entry makes such inter-protocol tunneling possible.
As a bonus, this brings a nice diffstat.
Signed-off-by: Jiri Benc
Acked-by: Roopa Prabhu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
14 Aug, 2015
3 commits
-
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
Signed-off-by: Shrijeet Mukherjee
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
-
As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.
Signed-off-by: Shrijeet Mukherjee
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
22 Jul, 2015
1 commit
-
This patch adds support in ipv4 fib functions to parse user
provided encap attributes and attach encap state data to fib_nh
and rtable.
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
16 Jan, 2015
1 commit
-
RAW sockets with hdrinc suffer from contention on rt_uncached_lock
spinlock.
One solution is to use percpu lists, since most routes are destroyed
by the cpu that created them.
It is unclear why we even have to put these routes in uncached_list,
as all outgoing packets should be freed when a device is dismantled.
Signed-off-by: Eric Dumazet
Fixes: caacf05e5ad1 ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: David S. Miller
25 Mar, 2014
1 commit
-
ip_rt_dump does nothing after the IPv4 route cache removal, so we can remove it.
Signed-off-by: Li RongQing
Signed-off-by: David S. Miller
14 Jan, 2014
1 commit
-
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.
We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.
I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.
Because someone might have written software that probes
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.
The reason for this change is that path mtu information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.
Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.
Cc: Eric Dumazet
Cc: David Miller
Cc: John Heffner
Cc: Steffen Klassert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
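The selection rule described in this commit can be summarised in a short sketch. The struct, field and function names are hypothetical simplifications and not the code in net/ip.h; the three conditions (not forwarding, the ip_forward_use_pmtu knob, a locked mtu) are the ones named in the message.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the route state that matters here. */
struct route_sketch {
    unsigned int path_mtu;   /* path mtu known for this route */
    unsigned int dev_mtu;    /* mtu of the outgoing interface */
    bool mtu_locked;         /* mtu is locked on the route */
};

/*
 * Pick the mtu for a packet. Forwarded packets use the interface mtu
 * unless the knob is enabled or the route has a locked mtu.
 */
static unsigned int mtu_maybe_forward_sketch(const struct route_sketch *rt,
                                             bool forwarding,
                                             bool forward_use_pmtu)
{
    if (!forwarding || forward_use_pmtu || rt->mtu_locked)
        return rt->path_mtu;
    return rt->dev_mtu;
}
```

Locally generated traffic is unaffected in this sketch, which matches the statement that only the forwarding path stops trusting path mtu information.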
06 Dec, 2013
1 commit
-
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the possibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.
Signed-off-by: Steffen Klassert
06 Nov, 2013
1 commit
-
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.
Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or uses path mtu values, which could
well be forged in an attack.
(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the ICMP payload will not always contain sufficient
information to identify a flow.)
Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
This issue was previously discussed among the DNS community, e.g.
,
without leading to fixes.
This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.
Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.
Cc: David S. Miller
Suggested-by: Florian Weimer
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
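From user space, a socket would opt in through the IP_MTU_DISCOVER socket option. The sketch below is an illustration, not part of the patch: the helper name is hypothetical, and the fallback definition assumes the value 4 used by linux/in.h for libc headers that predate the option.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_PMTUDISC_INTERFACE
/* Assumed value from linux/in.h; older libc headers may not define it. */
#define IP_PMTUDISC_INTERFACE 4
#endif

/*
 * Ask the kernel to always use the interface mtu for this socket.
 * Returns 0 on success, -1 with errno set on failure (for example on
 * kernels that do not know this mode).
 */
static int use_interface_mtu_sketch(int fd)
{
    int mode = IP_PMTUDISC_INTERFACE;

    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}
```

A DNS server would call this once on its UDP socket after socket() and before sending replies.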
18 Oct, 2013
1 commit
-
Half of the rt_cache_stat fields are no longer used after IP
route cache removal, let's shrink this per cpu area.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
29 Sep, 2013
1 commit
-
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().
The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.
Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
Signed-off-by: Francesco Fusco
Signed-off-by: David S. Miller
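The override rule with its two sentinel values (tos == -1 and ttl == 0 meaning "not specified") can be sketched like this. The struct layouts and helper names are hypothetical simplifications and are not the kernel's get_rttos or struct ipcm_cookie.

```c
#include <stdint.h>

/* Hypothetical per-message values taken from ancillary data. */
struct ipcm_cookie_sketch {
    int16_t tos;   /* -1 when no IP_TOS cmsg was supplied */
    uint8_t ttl;   /* 0 when no IP_TTL cmsg was supplied */
};

/* Hypothetical per-socket defaults set with setsockopt(). */
struct sock_sketch {
    uint8_t tos;
    uint8_t ttl;
};

/* Per-message TOS wins over the socket value when present. */
static uint8_t effective_tos_sketch(const struct sock_sketch *sk,
                                    const struct ipcm_cookie_sketch *ipc)
{
    return ipc->tos != -1 ? (uint8_t)ipc->tos : sk->tos;
}

/* Per-message TTL wins over the socket value when present. */
static uint8_t effective_ttl_sketch(const struct sock_sketch *sk,
                                    const struct ipcm_cookie_sketch *ipc)
{
    return ipc->ttl != 0 ? ipc->ttl : sk->ttl;
}
```

The signed type for tos is what lets -1 act as a sentinel while still allowing a legitimate TOS of 0.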
23 Sep, 2013
1 commit
-
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
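A small illustration of the point, using a hypothetical function name: both declarations below are equivalent to the compiler, since functions have external linkage by default.

```c
/* With extern: the style being removed. */
extern int add_sketch(int a, int b);

/* Without extern: declares exactly the same function. */
int add_sketch(int a, int b);

/* A single definition satisfies both declarations. */
int add_sketch(int a, int b)
{
    return a + b;
}
```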
14 Aug, 2013
1 commit
-
skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we
always have to make sure we are referring to the correct interpretation
of skb->sk.
We only depend on header defines to query the mtu, so we don't introduce
a new dependency to ipv6 by this change.
Cc: Steffen Klassert
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: Steffen Klassert
04 Nov, 2012
1 commit
-
We can save a test in ip_rt_put(), considering dst_release() accepts
a NULL parameter, and dst is first element in rtable.
Add a BUILD_BUG_ON() to catch any change that could break this
assertion.
Signed-off-by: Eric Dumazet
Cc: Cong Wang
Acked-by: Cong Wang
Signed-off-by: David S. Miller
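The idea can be sketched outside the kernel as follows. All names are hypothetical stand-ins, and C11 _Static_assert is used in place of BUILD_BUG_ON(); the sketch only shows why the NULL test becomes redundant once dst sits at offset 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry. */
struct dst_entry_sketch {
    int refcnt;
};

/* Hypothetical stand-in for struct rtable; dst must stay the first member. */
struct rtable_sketch {
    struct dst_entry_sketch dst;
    int rt_flags;
};

/* Like the dst_release() described above, this accepts NULL. */
static void dst_release_sketch(struct dst_entry_sketch *dst)
{
    if (dst)
        dst->refcnt--;
}

static void ip_rt_put_sketch(struct rtable_sketch *rt)
{
    /* Compile-time check standing in for BUILD_BUG_ON(). */
    _Static_assert(offsetof(struct rtable_sketch, dst) == 0,
                   "dst must be the first member of rtable");
    /* With dst at offset 0, a NULL rt converts to a NULL dst pointer. */
    dst_release_sketch((struct dst_entry_sketch *)rt);
}
```

If someone later moves dst away from the start of the struct, the build fails instead of silently passing a bogus pointer for a NULL route.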
09 Oct, 2012
1 commit
-
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. This way we force the neighbour
resolving to work with the routed destination but we
can use a different address in the packet, a feature needed
for IPVS-DR where the original packet for the virtual IP is routed
via a route to the real IP.
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
19 Sep, 2012
1 commit
-
Since route cache deletion (89aef8921bfbac22f), delay is no
longer used. Remove it.
Signed-off-by: Nicolas Dichtel
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Aug, 2012
1 commit
-
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.
If a route is uncached, we currently have no way of accomplishing
this.
So create a global list that is scanned when a network device goes
down. This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: David S. Miller
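A minimal sketch of such a scan is shown below. The list layout and names are hypothetical, and re-pointing each matching route at a fallback device (for instance loopback) is an assumption made for illustration; the commit message itself only states that references must be purged.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct net_device_sketch {
    int ifindex;
};

/* Hypothetical uncached route entry on a singly linked global list. */
struct uncached_rt_sketch {
    struct net_device_sketch *dev;
    struct uncached_rt_sketch *next;
};

/*
 * Walk the list and move every route that still references `dev` over
 * to `fallback`. Returns the number of routes that were changed.
 */
static int rt_flush_dev_sketch(struct uncached_rt_sketch *head,
                               struct net_device_sketch *dev,
                               struct net_device_sketch *fallback)
{
    int changed = 0;

    for (struct uncached_rt_sketch *rt = head; rt != NULL; rt = rt->next) {
        if (rt->dev != dev)
            continue;
        rt->dev = fallback;
        changed++;
    }
    return changed;
}
```

A real implementation would also need locking around the list and device reference counting, both of which are left out here.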
27 Jul, 2012
1 commit
-
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.
Reinstate the noref path when we hit a cached route in the FIB
nexthops.
With help from Eric Dumazet.
Reported-by: Alexander Duyck
Signed-off-by: David S. Miller
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
24 Jul, 2012
1 commit
-
On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.
Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation has to be adjusted.
Signed-off-by: David S. Miller
21 Jul, 2012
4 commits
-
It's not really needed.
We only grabbed a reference to the fib_info for the sake of fib_info
local metrics.
However, fib_info objects are freed using RCU, as are therefore their
private metrics (if any).
We would have triggered a route cache flush if we eliminated a
reference to a fib_info object in the routing tables.
Therefore, any existing cached routes will first check and see that
they have been invalidated before an errant reference to these
metric values would occur.
Signed-off-by: David S. Miller
-
That is this value's only use, as a boolean to indicate whether
a route is an input route or not.
So implement it that way, using a u16 gap present in the struct
already.
Signed-off-by: David S. Miller
-
Never actually used.
It was being set on output routes to the original OIF specified in the
flow key used for the lookup.
Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.
Signed-off-by: David S. Miller
-
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.
The new interpretation is:
1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr
2) rt_gateway != 0, destination requires a nexthop gateway
Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.
Signed-off-by: David S. Miller
Tested-by: Vijay Subramanian
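The two cases listed above translate directly into a small helper. The sketch below uses a hypothetical cut-down struct and plain uint32_t addresses instead of the kernel's types, so it illustrates the rule only.

```c
#include <stdint.h>

/* Hypothetical cut-down route; only the gateway field matters here. */
struct rtable_sketch {
    uint32_t rt_gateway;   /* 0 means the destination is on-link */
};

/* Return the address the neighbour lookup should resolve. */
static uint32_t rt_nexthop_sketch(const struct rtable_sketch *rt, uint32_t daddr)
{
    if (rt->rt_gateway != 0)
        return rt->rt_gateway;   /* case 2: go via the gateway */
    return daddr;                /* case 1: on-link, use the packet's daddr */
}
```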