Eric Lee / smarc-fsl-linux-kernel

05 Nov, 2015

1 commit

e1b8d903c net: Fix prefsrc lookups

A bug report (https://bugzilla.kernel.org/show_bug.cgi?id=107071) noted
that the following ip command is failing with v4.3:

$ ip route add 10.248.5.0/24 dev bond0.250 table vlan_250 src 10.248.5.154
RTNETLINK answers: Invalid argument

021dd3b8a142d changed the lookup of the given preferred source address to
use the table id passed in, but this assumes the local entries are in the
given table, which is not necessarily true for non-VRF use cases. When
validating the preferred source, fall back to the local table on failure.

Fixes: 021dd3b8a142d ("net: Add routes to the table associated with the device")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-11-05 10:34:37 +0800
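
As a rough illustration of the fallback described in e1b8d903c, the lookup order can be modelled in a few lines of Python. This is a toy sketch, not the kernel code: tables are plain dicts of address sets, and `addr_is_local` and `valid_prefsrc` are hypothetical names chosen for this example.

```python
# Toy model of preferred-source validation with a local-table fallback.
# Table layout and helper names are hypothetical, not kernel identifiers.

LOCAL_TABLE = "local"

def addr_is_local(tables, table_id, addr):
    """Return True if `addr` has a local entry in the given table."""
    return addr in tables.get(table_id, set())

def valid_prefsrc(tables, table_id, prefsrc):
    """Check the requested table first, then fall back to the local table."""
    if addr_is_local(tables, table_id, prefsrc):
        return True
    return addr_is_local(tables, LOCAL_TABLE, prefsrc)

# Non-VRF case from the bug report: the address lives in the local table,
# while the route is being added to a separate policy table.
tables = {
    "local": {"10.248.5.154"},
    "vlan_250": set(),
}
```

In this model, `valid_prefsrc(tables, "vlan_250", "10.248.5.154")` succeeds only because of the second lookup; without the fallback the address would be rejected, which corresponds to the "Invalid argument" answer quoted above.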

04 Nov, 2015

1 commit

73186df8d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.

Signed-off-by: David S. Miller

David S. Miller
2015-11-04 02:41:45 +0800

03 Nov, 2015

1 commit

9920e48b8 ipv4: use l4 hash for locally generated multipath flows

This patch changes how the multipath hash is computed for locally
generated flows: now the hash comprises l4 information.

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.

v2 changes:
- use get_hash_from_flowi4() instead of implementing just another l4 hash
function

Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2015-11-03 03:38:43 +0800
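
The difference between the two hashes in 9920e48b8 can be sketched as follows. This is a simplified model under stated assumptions: SHA-256 stands in for the kernel's flow hash, and `l3_path`, `l4_path` and `_bucket` are hypothetical helpers, not kernel functions.

```python
import hashlib

def _bucket(fields, n_paths):
    """Map a tuple of flow fields to a path index with a stand-in hash."""
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

def l3_path(src, dst, sport, dport, n_paths):
    """L3 hash: only the addresses contribute; ports are ignored."""
    return _bucket((src, dst), n_paths)

def l4_path(src, dst, sport, dport, n_paths):
    """L4 hash: addresses and ports both contribute."""
    return _bucket((src, dst, sport, dport), n_paths)

# 64 connections between the same two hosts, differing only in source port.
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 443) for i in range(64)]
l3_used = {l3_path(*f, n_paths=2) for f in flows}
l4_used = {l4_path(*f, n_paths=2) for f in flows}
```

With the L3 variant every flow between the two hosts maps to the same path, so `l3_used` holds a single index. The L4 variant lets the port numbers influence the choice, so the same flows can be spread over the available paths.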

02 Nov, 2015

2 commits

c9b3292ee ipv4: update RTNH_F_LINKDOWN flag on UP event

When a nexthop is part of a multipath route, we should clear the
LINKDOWN flag when the link goes UP or when the first address is added.
This is needed because we always set LINKDOWN flag when DEAD flag
was set but now on UP the nexthop is not dead anymore. Examples when
LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:

- link goes down (LINKDOWN is set), then link goes UP and device
shows carrier OK but LINKDOWN remains set

- last address is deleted (LINKDOWN is set), then address is
added and device shows carrier OK but LINKDOWN remains set

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up

here add a multipath route where one nexthop is for dummy0:

ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
ifconfig dummy0 down
ifconfig dummy0 up

now ip route shows nexthop that is not dead. Now set the sysctl var:

echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown

now ip route will show a dead nexthop because the forgotten
RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD.

Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops")
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2015-11-02 05:57:39 +0800
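
The flag handling described in c9b3292ee can be pictured with a small state model. It is a sketch only: the bit values are illustrative, and `sync_down`, `sync_up_old`, `sync_up_fixed` and `reported_dead` are hypothetical names, not the kernel functions.

```python
RTNH_F_DEAD = 1        # illustrative bit values for this sketch
RTNH_F_LINKDOWN = 16

def sync_down(flags):
    """Device went down or lost its last address: mark dead and linkdown."""
    return flags | RTNH_F_DEAD | RTNH_F_LINKDOWN

def sync_up_old(flags):
    """Behaviour described as buggy: only DEAD is cleared on UP."""
    return flags & ~RTNH_F_DEAD

def sync_up_fixed(flags, carrier_ok):
    """Behaviour after the fix: LINKDOWN is cleared too when carrier is OK."""
    flags &= ~RTNH_F_DEAD
    if carrier_ok:
        flags &= ~RTNH_F_LINKDOWN
    return flags

def reported_dead(flags, ignore_routes_with_linkdown):
    """With the sysctl set, a LINKDOWN nexthop is reported as dead."""
    if flags & RTNH_F_DEAD:
        return True
    return bool(ignore_routes_with_linkdown and flags & RTNH_F_LINKDOWN)
```

Running a down/up cycle through `sync_up_old` leaves LINKDOWN set, so enabling the sysctl makes the nexthop look dead again, which matches the reproduction steps in the commit message. `sync_up_fixed` with carrier OK returns a clean flag word.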

4f823defd ipv4: fix to not remove local route on link down

When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by bringing back the 'force' variable.

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ifconfig dummy0 down
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1

Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops")
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2015-11-02 05:57:39 +0800

16 Oct, 2015

1 commit

51161aa98 net: Fix suspicious RCU usage in fib_rebalance

This command:
ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2

generated this suspicious RCU usage message:

[ 63.249262]
[ 63.249939] ===============================
[ 63.251571] [ INFO: suspicious RCU usage. ]
[ 63.253250] 4.3.0-rc3+ #298 Not tainted
[ 63.254724] -------------------------------
[ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!
[ 63.259450]
[ 63.259450] other info that might help us debug this:
[ 63.259450]
[ 63.262297]
[ 63.262297] rcu_scheduler_active = 1, debug_locks = 1
[ 63.264647] 1 lock held by ip/2870:
[ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x14
[ 63.268858]
[ 63.268858] stack backtrace:
[ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298
[ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 63.275745] 0000000000000001 ffff8800b8c9f8b8 ffffffff8125f73c ffff88013afcf301
[ 63.278185] ffff8800bab7a380 ffff8800b8c9f8e8 ffffffff8107bf30 ffff8800bb728000
[ 63.280634] ffff880139fe9a60 0000000000000000 ffff880139fe9a00 ffff8800b8c9f908
[ 63.283177] Call Trace:
[ 63.283959] [] dump_stack+0x4c/0x68
[ 63.285593] [] lockdep_rcu_suspicious+0xfa/0x103
[ 63.287500] [] __in_dev_get_rcu+0x48/0x4f
[ 63.289169] [] fib_rebalance+0x3e/0x127
[ 63.290753] [] ? rcu_read_unlock+0x3e/0x5f
[ 63.292442] [] fib_create_info+0xaf9/0xdcc
[ 63.294093] [] ? sched_clock_local+0x12/0x75
[ 63.295791] [] fib_table_insert+0x8c/0x451
[ 63.297493] [] ? fib_get_table+0x36/0x43
[ 63.299109] [] inet_rtm_newroute+0x43/0x51
[ 63.300709] [] rtnetlink_rcv_msg+0x182/0x195
[ 63.302334] [] ? trace_hardirqs_on+0xd/0xf
[ 63.303888] [] ? rtnl_lock+0x12/0x14
[ 63.305346] [] ? __rtnl_unlock+0x12/0x12
[ 63.306878] [] netlink_rcv_skb+0x3d/0x90
[ 63.308437] [] rtnetlink_rcv+0x21/0x28
[ 63.309916] [] netlink_unicast+0xfa/0x17f
[ 63.311447] [] netlink_sendmsg+0x297/0x2dc
[ 63.313029] [] sock_sendmsg_nosec+0x12/0x1d
[ 63.314597] [] ___sys_sendmsg+0x196/0x21b
[ 63.316125] [] ? native_sched_clock+0x1f/0x3c
[ 63.317671] [] ? sched_clock_local+0x12/0x75
[ 63.319185] [] ? sched_clock_cpu+0x9d/0xb6
[ 63.320693] [] ? __lock_is_held+0x32/0x54
[ 63.322145] [] ? __fget_light+0x4b/0x77
[ 63.323541] [] __sys_sendmsg+0x3d/0x5b
[ 63.324947] [] SyS_sendmsg+0xd/0x19
[ 63.326274] [] entry_SYSCALL_64_fastpath+0x12/0x6f

It looks like all of the code paths to fib_rebalance are under rtnl.

Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")
Cc: Peter Nørlund
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-10-16 15:57:55 +0800

07 Oct, 2015

1 commit

3ce58d843 net: Refactor path selection in __ip_route_output_key_hash

VRF device needs the same path selection following lookup to set source
address. Rather than duplicating code, move existing code into a
function that is exported to modules.

Code move only; no functional change.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-10-07 19:27:44 +0800

06 Oct, 2015

1 commit

0a837fe47 ipv4: Fix compilation errors in fib_rebalance

This fixes

net/built-in.o: In function `fib_rebalance':
fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'

and

net/built-in.o: In function `fib_rebalance':
net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'

Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")

Signed-off-by: Peter Nørlund
Signed-off-by: David S. Miller

Peter Nørlund
2015-10-06 14:48:09 +0800

05 Oct, 2015

1 commit

0e884c78e ipv4: L3 hash-based multipath

Replaces the per-packet multipath with a hash-based multipath using
source and destination address.

Signed-off-by: Peter Nørlund
Signed-off-by: David S. Miller

Peter Nørlund
2015-10-05 17:59:21 +0800
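
One way to picture hash-based selection over source and destination address, as introduced by 0e884c78e, is to give each nexthop a slice of the hash space proportional to its weight. The sketch below assumes that scheme for illustration; the commit message itself does not spell it out. `HASH_SPACE`, `upper_bounds`, `l3_hash` and `select_nexthop` are hypothetical names, and SHA-256 is a stand-in hash.

```python
import hashlib
import ipaddress

HASH_SPACE = 2 ** 31  # assumed size of the hash space for this sketch

def upper_bounds(weights):
    """Turn nexthop weights into cumulative upper bounds in the hash space."""
    total = sum(weights)
    bounds, running = [], 0
    for w in weights:
        running += w
        bounds.append(HASH_SPACE * running // total - 1)
    return bounds

def l3_hash(src, dst):
    """Stand-in hash over source and destination address only."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % HASH_SPACE

def select_nexthop(src, dst, bounds):
    """Pick the first nexthop whose upper bound covers the flow hash."""
    h = l3_hash(src, dst)
    for index, bound in enumerate(bounds):
        if h <= bound:
            return index
    return len(bounds) - 1
```

Because the hash depends only on the address pair, every packet of a given source/destination pair selects the same nexthop, unlike per-packet multipath, where consecutive packets of one flow may take different paths.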

02 Sep, 2015

1 commit

9b8ff5182 net: Make table id type u32

A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.

Fixes:
4e3c89920cd3a ("net: Introduce VRF related flags and helpers")
15be405eb2ea9 ("net: Add inet_addr lookup by table")
30bbaa1950055 ("net: Fix up inet_addr_type checks")
021dd3b8a142d ("net: Add routes to the table associated with the device")
dc028da54ed35 ("inet: Move VRF table lookup to inlined function")
f6d3c19274c74 ("net: FIB tracepoints")

Signed-off-by: David Ahern
Reviewed-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

David Ahern
2015-09-02 05:32:44 +0800

01 Sep, 2015

3 commits

c3a8d9474 tcp: use dctcp if enabled on the route to the initiator

Currently, the following case doesn't use DCTCP, even if it should:
A responder has, e.g., Cubic as system-wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.

We were thinking about how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: let's cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
doing only a single metric feature lookup inside tcp_ecn_create_request().

Joint work with Florian Westphal.

Signed-off-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Daniel Borkmann
2015-09-01 03:34:00 +0800
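
The rule stated in c3a8d9474 can be written down as a small predicate. This is a sketch under assumptions: the bit value of `DST_FEATURE_ECN_CA` is made up, the function name `ecn_ok` is hypothetical, and every other condition the real function evaluates is reduced to a single opaque input.

```python
DST_FEATURE_ECN_CA = 1 << 31  # hypothetical bit value; internal-only per the source

def ecn_ok(syn_has_ect0, route_features, ecn_ok_without_ect):
    """Decide ecn_ok for an incoming request.

    syn_has_ect0:       the SYN carried ECT(0) in its IP header
    route_features:     cached RTAX_FEATURES value of the route (one lookup)
    ecn_ok_without_ect: outcome of the ordinary negotiation path, treated
                        as an opaque input in this sketch
    """
    if syn_has_ect0:
        return bool(route_features & DST_FEATURE_ECN_CA)
    return ecn_ok_without_ect
```

The point of caching the flag in the route features is visible in the signature: the decision needs one integer from the route and no per-request lookup of the congestion control algorithm.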

b8d3e4163 fib, fib6: reject invalid feature bits

Feature bits that are invalid should not be accepted by the kernel.
Only the lower 4 bits may be configured, not the remaining ones.
Even of these 4, 2 are unused.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2015-09-01 03:34:00 +0800
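
The check described in b8d3e4163 amounts to a mask test. A minimal sketch, assuming the mask covers exactly the lower 4 bits as the message says; the names `RTAX_FEATURE_MASK` and `check_features` are used here for illustration only.

```python
import errno

RTAX_FEATURE_MASK = 0xF  # lower 4 bits, as described in the commit message

def check_features(value):
    """Return 0 if only configurable bits are set, else -EINVAL."""
    if value & ~RTAX_FEATURE_MASK:
        return -errno.EINVAL
    return 0
```

Any value with a bit set outside the mask, such as `0x10`, is rejected instead of being stored silently.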

6cf9dfd3b net: fib: move metrics parsing to a helper

fib_create_info() is already quite large, so before adding more
code to the metrics section move that to a helper, similar to
ip6_convert_metrics.

Suggested-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

Florian Westphal
2015-09-01 03:34:00 +0800

25 Aug, 2015

1 commit

127eb7cd3 lwt: Add cfg argument to build_state

Add cfg and family arguments to lwt build state functions. cfg is a void
pointer and will either be a pointer to a fib_config or fib6_config
structure. The family parameter indicates which one (either AF_INET
or AF_INET6).

LWT encapsulation implementations may use the fib configuration to build
the LWT state.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-08-25 01:34:40 +0800

21 Aug, 2015

2 commits

4c9bcd117 net: Fix nexthop lookups

Andreas reported breakage adding routes with local nexthops:
$ ip route show table main
...
172.28.0.0/24 dev vnf-xe1p0 proto kernel scope link src 172.28.0.16

$ ip route add 10.0.0.0/8 via 172.28.0.32 table 100 dev vnf-xe1p0
RTNETLINK answers: Resource temporarily unavailable

3bfd847203c changed the lookup to use the passed in table but for cases like
this the nexthop is in the local table rather than the passed in table.

Fixes: 3bfd847203c ("net: Use passed in table for nexthop lookups")
Reported-by: Andreas Schultz
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-21 05:42:39 +0800

e01286ef0 ipv4: Make fib_encap_match static

Make fib_encap_match() static as it isn't used outside the file.

Signed-off-by: Ying Xue
Reviewed-by: Jiri Benc
Signed-off-by: David S. Miller

Ying Xue
2015-08-21 05:12:23 +0800

19 Aug, 2015

1 commit

df383e624 lwtunnel: fix memory leak

The built lwtunnel_state struct has to be freed after comparison.

Fixes: 571e722676fe3 ("ipv4: support for fib route lwtunnel encap attributes")
Signed-off-by: Jiri Benc
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Jiri Benc
2015-08-19 10:11:19 +0800

17 Aug, 2015

1 commit

1e3136789 ipv4: fix refcount leak in fib_check_nh()

fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.

This patch solves the typical message at reboot time or device
dismantle:

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Fixes: 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet
Cc: David Ahern
Acked-by: David Ahern
Signed-off-by: David S. Miller

Eric Dumazet
2015-08-17 13:14:32 +0800

14 Aug, 2015

3 commits

3bfd84720 net: Use passed in table for nexthop lookups

If a user passes in a table for new routes, use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only in another table, and then a subsequent
route is added with a next hop using the connected route, i.e.,

$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1003
192.168.56.0/24 dev eth1 proto kernel scope link src 192.168.56.51

$ ip route ls table 10
1.1.1.0/24 dev eth2 scope link

Without this patch adding a nexthop route fails:

$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable

With this patch the route is added successfully.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-14 13:43:21 +0800
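
The example from 3bfd84720 can be replayed with Python's `ipaddress` module. This is a simplified model: tables are lists of connected prefixes, the default route is left out on the assumption that a gateway has to be directly reachable, and `gateway_reachable` and `can_add_route` are hypothetical helpers.

```python
import ipaddress

def gateway_reachable(table, gateway):
    """Return True if some connected prefix in `table` covers `gateway`."""
    gw = ipaddress.ip_address(gateway)
    return any(gw in ipaddress.ip_network(prefix) for prefix in table)

def can_add_route(tables, table_id, gateway, use_passed_table):
    """Validate a nexthop against either the main table or the passed table."""
    lookup_id = table_id if use_passed_table else "main"
    return gateway_reachable(tables[lookup_id], gateway)

# Connected prefixes taken from the 'ip route ls' output in the message.
tables = {
    "main": ["10.0.2.0/24", "169.254.0.0/16", "192.168.56.0/24"],
    10: ["1.1.1.0/24"],
}
```

With `use_passed_table=False` the gateway 1.1.1.10 is not covered by any connected prefix in the main table, which corresponds to "Network is unreachable". With `use_passed_table=True` the lookup runs against table 10 and the nexthop resolves.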

021dd3b8a net: Add routes to the table associated with the device

When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table associated with the device if there is one.

A part of this is directing prefsrc validations to the correct
table as well.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-14 13:43:21 +0800

30bbaa195 net: Fix up inet_addr_type checks

Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.

inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2015-08-14 13:43:21 +0800

01 Aug, 2015

1 commit

5510b3c2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.c

All four conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller

David S. Miller
2015-08-01 14:52:20 +0800

27 Jul, 2015

2 commits

5a6228a0b lwtunnel: change prototype of lwtunnel_state_get()

It saves some lines and simplifies the code a bit when the state is returned
by this function. It's also useful to handle a NULL entry.

To avoid too long lines, I've also renamed lwtunnel_state_get() and
lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

CC: Thomas Graf
CC: Roopa Prabhu
Signed-off-by: Nicolas Dichtel
Acked-by: Thomas Graf
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller

Nicolas Dichtel
2015-07-27 16:02:49 +0800

88f643203 ipv4: be more aggressive when probing alternative gateways

Currently, we do not notice if new alternative gateways
are added. We can do it by checking for a present neigh
entry. Also, gateways that are currently probed (NUD_INCOMPLETE)
can be skipped from round-robin probing.

Suggested-by: Florian Westphal
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2015-07-27 11:56:27 +0800

25 Jul, 2015

2 commits

2392debc2 ipv4: consider TOS in fib_select_default

fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes from requiring a specific TOS.

This patch solves the problem as follows:

- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.

- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.

For example, a packet with TOS 8 cannot use gw3 (not lowest
metric), gw4 (different TOS) or gw6 (not lowest metric);
all other gateways can be used:

tos 8 via gw1 metric 2 fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1

Reported-by: Hagen Paul Pfeifer
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2015-07-25 13:46:11 +0800
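
The selection rule in 2392debc2 can be checked against the commit's own example with a short sketch. Assumptions: routes listed without a metric are given metric 0, and `usable_gateways` is a hypothetical helper that models only the filtering, not the round-robin state (fa_default).

```python
def usable_gateways(aliases, pkt_tos):
    """Return gateways eligible as alternatives for a packet with `pkt_tos`.

    `aliases` is a list of (tos, gateway, metric) tuples.
    """
    eligible = []
    # Routes matching the packet TOS first, then routes that match any TOS.
    for wanted in dict.fromkeys((pkt_tos, 0)):
        group = [a for a in aliases if a[0] == wanted]
        if not group:
            continue
        best = min(metric for _, _, metric in group)
        eligible += [gw for _, gw, metric in group if metric == best]
    return eligible

# The alias list from the commit message; a missing metric is assumed to be 0.
aliases = [
    (8, "gw1", 2),
    (8, "gw2", 2),
    (8, "gw3", 3),
    (4, "gw4", 0),
    (0, "gw5", 0),
    (0, "gw6", 1),
]
```

For a packet with TOS 8 this yields gw1, gw2 and gw5, which agrees with the message: gw3 and gw6 lose on metric and gw4 on TOS.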

18a912e9a ipv4: fib_select_default should match the prefix

fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.

Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info")
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2015-07-25 13:46:09 +0800

22 Jul, 2015

1 commit

571e72267 ipv4: support for fib route lwtunnel encap attributes

This patch adds support in ipv4 fib functions to parse user
provided encap attributes and attach encap state data to fib_nh
and rtable.

Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller

Roopa Prabhu
2015-07-22 01:39:03 +0800

29 Jun, 2015

1 commit

96ac5cc96 ipv4: fix RCU lockdep warning from linkdown changes

The following lockdep splat was seen due to the wrong context for
grabbing in_dev.

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted
-------------------------------
include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/403:
#0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x19
#1: ((inetaddr_chain).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x35/0x6a

stack backtrace:
CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244
0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3
ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00
ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8
Call Trace:
[] dump_stack+0x4c/0x6e
[] ? up+0x39/0x3e
[] lockdep_rcu_suspicious+0xf7/0x100
[] fib_dump_info+0x227/0x3e2
[] rtmsg_fib+0xa6/0x116
[] fib_table_insert+0x316/0x355
[] fib_magic+0xb7/0xc7
[] fib_add_ifaddr+0xb1/0x13b
[] fib_inetaddr_event+0x36/0x90
[] notifier_call_chain+0x4c/0x71
[] __blocking_notifier_call_chain+0x4e/0x6a
[] blocking_notifier_call_chain+0x14/0x16
[] __inet_insert_ifa+0x1a5/0x1b3
[] inet_rtm_newaddr+0x350/0x35f
[] rtnetlink_rcv_msg+0x17b/0x18a
[] ? trace_hardirqs_on+0xd/0xf
[] ? netlink_deliver_tap+0x1cb/0x1f7
[] ? rtnl_newlink+0x72a/0x72a
...

This patch resolves that splat.

Signed-off-by: Andy Gospodarek
Reported-by: Sergey Senozhatsky
Signed-off-by: David S. Miller

Andy Gospodarek
2015-06-29 07:47:12 +0800

24 Jun, 2015

2 commits

0eeb075fa net: ipv4 sysctl option to ignore routes when nexthop link is down

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, the kernel will report to userspace that a
route is dead and will no longer resolve to this nexthop when performing a fib
lookup. This will signal to userspace that the route will not be
selected. The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down. This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected. The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache
80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down. Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache
80.0.0.2 dev p8p1 src 80.0.0.1
cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set. Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup

Signed-off-by: Andy Gospodarek
Signed-off-by: Dinesh Dutt
Acked-by: Scott Feldman
Signed-off-by: David S. Miller

Andy Gospodarek
2015-06-24 17:15:54 +0800
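
The effect of the sysctl in 0eeb075fa on route selection can be modelled with the two 90.0.0.0/24 routes from the output above. This is a sketch, not the kernel lookup: flag bit values are illustrative and `select_route` is a hypothetical helper that simply prefers the lowest metric.

```python
RTNH_F_DEAD = 1        # illustrative bit values for this sketch
RTNH_F_LINKDOWN = 16

def select_route(routes, ignore_linkdown):
    """Pick the lowest-metric usable route.

    `routes` is a list of (gateway, dev, metric, flags) tuples.
    """
    for gateway, dev, metric, flags in sorted(routes, key=lambda r: r[2]):
        if flags & RTNH_F_DEAD:
            continue
        if ignore_linkdown and flags & RTNH_F_LINKDOWN:
            continue
        return gateway, dev
    return None

# 90.0.0.0/24 from the example, with p8p1 link-down.
routes = [
    ("80.0.0.2", "p8p1", 1, RTNH_F_LINKDOWN),
    ("70.0.0.2", "p7p1", 2, 0),
]
```

With the sysctl enabled the metric 1 route through p8p1 is skipped and the lookup falls through to p7p1, as in the first listing. With it disabled the linkdown flag is informational only and the p8p1 route is still chosen.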

8a3d03166 net: track link-status of ipv4 nexthops

Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off. No action is taken,
but additional flags are passed to userspace to indicate carrier status.

This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.

v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

v5: Whitespace and variable declaration fixups suggested by Dave.

v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.

Signed-off-by: Andy Gospodarek
Signed-off-by: Dinesh Dutt
Acked-by: Scott Feldman
Signed-off-by: David S. Miller

Andy Gospodarek
2015-06-24 17:15:54 +0800

03 May, 2015

1 commit

7eee8cd4d ipv4: remove the unnecessary codes in fib_info_hash_move

The whole hlist will be moved, so there is no need to call hlist_del before
adding the hlist_node to the other hlist_head.

Signed-off-by: Li RongQing
Signed-off-by: David S. Miller

Li RongQing
2015-05-03 10:17:44 +0800

04 Apr, 2015

1 commit

51456b291 ipv4: coding style: comparison for equality with NULL

The ipv4 code uses a mixture of coding styles. In some instances the check
for a NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris
Signed-off-by: David S. Miller

Ian Morris
2015-04-04 00:11:15 +0800

01 Apr, 2015

2 commits

67b61f6c1 netlink: implement nla_get_in_addr and nla_get_in6_addr

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800

930345ea6 netlink: implement nla_put_in_addr and nla_put_in6_addr

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800

13 Mar, 2015

1 commit

efd7ef1c1 net: Kill hold_net release_net

hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008. Kill the code; it is long past due.

Signed-off-by: "Eric W. Biederman"
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 02:39:40 +0800

28 Feb, 2015

1 commit

56315f9e6 fib_trie: Convert fib_alias to hlist from list

There isn't any advantage to having it as a list and by making it an hlist
we make the fib_alias more compatible with the list_info in terms of the
type of list used.

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2015-02-28 05:37:06 +0800

26 Jan, 2015

1 commit

02525368f fib_trie: Move fib_find_alias to file where it is used

The function fib_find_alias is only accessed by functions in fib_trie.c, so
it makes sense to relocate it and make it static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2015-01-26 06:47:16 +0800

18 Jan, 2015

1 commit

053c095a8 netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

if (my_function(...))
/* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller

Johannes Berg
2015-01-18 14:03:45 +0800
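
The pitfall described in 053c095a8 is easy to reproduce in a toy model. The functions below are hypothetical stand-ins written in Python, not the netlink API: the "old" helper returns the message length, and the caller follows the usual convention that a non-zero return means failure.

```python
def nlmsg_end_old(msg):
    """Old-style helper: returns the message length, which is always > 0."""
    return len(msg)

def nlmsg_end_new(msg):
    """New-style helper: finalizes the message and returns nothing."""
    return None

def fill_old(msg):
    """Forwards the helper's return value, as in 'return nlmsg_end(...);'."""
    return nlmsg_end_old(msg)

def fill_new(msg):
    """Calls the void helper, then reports success explicitly."""
    nlmsg_end_new(msg)
    return 0

def dump(fill, msg):
    """Caller that treats any non-zero return as an error."""
    if fill(msg):
        return "error"
    return "ok"

msg = bytes(16)  # stand-in for a message that already holds its header
```

`dump(fill_old, msg)` takes the error branch even though nothing went wrong, because the positive length is read as a failure code. `dump(fill_new, msg)` behaves as the caller expects.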

06 Jan, 2015

1 commit

ea6976399 net: tcp: add RTAX_CC_ALGO fib handling

This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided exposing this implementation detail to user space. Instead,
we chose the netlink attribute that is exchanged with
user space to be the actual congestion control algorithm name, similar
to the setsockopt(2) API, in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot, as
it should have been stored in RTAX_FEATURES instead. We first thought
about reusing it for the congestion control key, but that brings more
complications and/or confusion than it is worth.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2015-01-06 11:55:24 +0800
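
The split between the user-visible name and the internal u32 key in ea6976399 can be sketched like this. Everything here is an assumption made for illustration: CRC32 stands in for whatever the kernel uses to derive the key, and `register`, `metric_from_netlink` and `metric_to_netlink` are hypothetical names.

```python
import zlib

_registered = {}  # key -> name, stands in for the list of loaded CC modules

def ca_key(name):
    """Derive a u32 key from the algorithm name (stand-in hash)."""
    return zlib.crc32(name.encode()) & 0xFFFFFFFF

def register(name):
    """Make an algorithm known, as loading its module would."""
    _registered[ca_key(name)] = name

def metric_from_netlink(name):
    """User space sends a name; the route stores only the key."""
    key = ca_key(name)
    if key not in _registered:
        raise ValueError(f"unknown congestion control: {name}")
    return key

def metric_to_netlink(key):
    """On dump, the key is translated back to the algorithm name."""
    return _registered[key]

register("cubic")
register("dctcp")
```

Because only names cross the user space boundary, the key derivation can change, or a third-party module can register a new name, without any change to the netlink interface.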

15 Oct, 2014

1 commit

f76936d07 ipv4: fix nexthop attlen check in fib_nh_match

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
nexthop via 192.168.122.15 dev eth0

The del command succeeds and the route is removed. After this patch is
applied, the route is correctly matched and the result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Jiri Pirko
2014-10-15 03:59:37 +0800