22 Feb, 2016
1 commit
-
Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
18 Dec, 2015
1 commit
-
Conflicts:
drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller
12 Dec, 2015
5 commits
-
The via address is optional for a single path route, yet is mandatory
when the multipath attribute is used:

# ip -f mpls route add 100 dev lo
# ip -f mpls route add 101 nexthop dev lo
RTNETLINK answers: Invalid argument

Make them consistent by making the via address optional when the
RTA_MULTIPATH attribute is being parsed so that both forms of
specifying the route work.

Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
When a via address isn't specified, the via table is left initialised
to 0 (NEIGH_ARP_TABLE), and the via address length also left
initialised to 0. This results in a via address array of length 0
being allocated (contiguous with route and nexthop array), meaning
that when a packet is sent using neigh_xmit the neighbour lookup and
creation will cause an out-of-bounds access when accessing the 4 bytes
of the IPv4 address it assumes it has been given a pointer to.

This could be fixed by allocating the 4 bytes of via address necessary
and leaving it as all zeroes. However, it seems wrong to me to use an
ipv4 nexthop (including possibly ARPing for 0.0.0.0) when the user
didn't specify to do so.

Instead, set the via address table to NEIGH_NR_TABLES to signify it
hasn't been specified and use this at forwarding time to signify a
neigh_xmit using an L2 address consisting of the device address. This
mechanism is the same as that used for both ARP and ND for loopback
interfaces and those flagged as no-arp, which are all we can really
support in this case.

Fixes: cf4b24f0024f ("mpls: reduce memory usage of routes")
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
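The sentinel approach in the out-of-bounds fix above can be pictured with a small userspace sketch. It is not the kernel code: the `sketch_` names, the enum values and the struct layout are all hypothetical, and only the idea (a table value of NEIGH_NR_TABLES meaning "no via address, use the device address") comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's neighbour table indices. */
enum sketch_neigh_table {
    SKETCH_NEIGH_ARP_TABLE = 0,
    SKETCH_NEIGH_ND_TABLE = 1,
    SKETCH_NEIGH_NR_TABLES = 2    /* reused to mean "no via address given" */
};

/* Hypothetical nexthop; field names are illustrative only. */
struct sketch_nexthop {
    uint8_t via_table;      /* one of enum sketch_neigh_table */
    uint8_t via_alen;       /* 0 when no via address was configured */
    const uint8_t *via;     /* via address bytes, or NULL */
};

/*
 * Choose the address handed to the transmit path: the configured via
 * address when there is one, otherwise the output device's own L2 address.
 * Returns the length of the chosen address and stores its pointer in *out.
 */
size_t sketch_pick_xmit_addr(const struct sketch_nexthop *nh,
                             const uint8_t *dev_addr, size_t dev_addr_len,
                             const uint8_t **out)
{
    assert(nh != NULL && out != NULL);
    if (nh->via_table == SKETCH_NEIGH_NR_TABLES) {
        *out = dev_addr;        /* never touches the zero-length via array */
        return dev_addr_len;
    }
    *out = nh->via;
    return nh->via_alen;
}
```

With this shape, a route added without a via address never causes a read of via bytes that were not allocated.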
The problem seen is that when adding a route with a nexthop with no
via address specified, iproute2 generates bogus output:

# ip -f mpls route add 100 dev lo
# ip -f mpls route list
100 via inet 0.0.8.0 dev lo

The reason for this is that the kernel generates an RTA_VIA attribute
with the family set to AF_INET, but the via address data having zero
length. The cause of family being AF_INET is that on route insert
cfg->rc_via_table is left set to 0, which just happens to be
NEIGH_ARP_TABLE which is then translated into AF_INET.

iproute2 doesn't validate the length prior to printing and so prints
garbage. Although it could be fixed to do the validation, I would
argue that AF_INET addresses should always be exactly 4 bytes so the
kernel is really giving userspace bogus data.

Therefore, avoid generating the RTA_VIA attribute when dumping the
route if the via address wasn't specified on add/modify. This is
indicated by NEIGH_ARP_TABLE and a zero via address length - if the
user specified a via address the address length would have been
validated such that it was 4 bytes. Although this is a change in
behaviour that is visible to userspace, I believe that what was
generated before was invalid and as such userspace wouldn't be
expecting it.

Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
If an L2 via address for an mpls nexthop is specified, the length of
the L2 address must match that expected by the output device,
otherwise it could access memory beyond the end of the via address
buffer in the route.

This check was present prior to commit f8efb73c97e2 ("mpls: multipath
route support"), but got lost in the refactoring, so add it back,
applying it to all nexthops in multipath routes.

Fixes: f8efb73c97e2 ("mpls: multipath route support")
Signed-off-by: Robert Shearman
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller -
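As an illustration of the restored L2 via length check described above, here is a minimal userspace sketch. The struct and function names are hypothetical and do not match the kernel's; it only shows the rule that an L2 via address must be exactly as long as the output device's address, applied to every nexthop of a route.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical nexthop description; field names are not the kernel's. */
struct sketch_nh {
    int via_is_l2;          /* nonzero when the via address is a link-layer address */
    size_t via_alen;        /* length of the configured via address */
    size_t dev_addr_len;    /* address length of this nexthop's output device */
};

/* Returns 0 when the nexthop is acceptable, -EINVAL on a length mismatch. */
int sketch_check_nh(const struct sketch_nh *nh)
{
    assert(nh != NULL);
    if (nh->via_is_l2 && nh->via_alen != nh->dev_addr_len)
        return -EINVAL;
    return 0;
}

/* Validate every nexthop of a (possibly multipath) route. */
int sketch_check_route(const struct sketch_nh *nhs, size_t nhn)
{
    for (size_t i = 0; i < nhn; i++) {
        int err = sketch_check_nh(&nhs[i]);
        if (err)
            return err;
    }
    return 0;
}
```

Rejecting the route at configuration time keeps the forwarding path free of per-packet length checks.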
This gets rid of the following compile warning:

net/mpls/mpls_iptunnel.c:40:5: warning: no previous prototype for
mpls_output [-Wmissing-prototypes]

Signed-off-by: Roopa Prabhu
Acked-by: Robert Shearman
Signed-off-by: David S. Miller
08 Dec, 2015
1 commit
-
Locally generated IPv4 and (probably) IPv6 packets are dropped because
skb->protocol isn't set. We could write wrappers to lwtunnel_output
for IPv4 and IPv6 that set the protocol accordingly and then call
lwtunnel_output, but mpls_output relies on the AF-specific type of dst
anyway to get the via address.

Therefore, make use of dst->dst_ops->family in mpls_output to
determine the type of nexthop and thus protocol of the packet instead
of checking skb->protocol.

Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Reported-by: Sam Russell
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
04 Dec, 2015
1 commit
-
Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
-----------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2

$ip link set dev swp1 down
$ip link show dev swp1
4: swp1: mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2

linkdown routes:
----------------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2

$ip link show dev swp1
4: swp1: mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1: mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2

Signed-off-by: Roopa Prabhu
Acked-by: Robert Shearman
Signed-off-by: David S. Miller
28 Oct, 2015
2 commits
-
Nexthops for MPLS routes have a via address field sized for the
largest via address that is expected, which is 32 bytes. This means
that in the most common case of having ipv4 via addresses, 28 bytes of
memory more than required are used per nexthop. In the other common
case of an ipv6 nexthop, 16 bytes more than required are
used. With large numbers of MPLS routes this extra memory usage could
start to become significant.

To avoid allocating memory for a maximum length via address when not
all of it is required, and to allow for ease of iterating over
nexthops, the via addresses are now stored in the same
memory block as the route and nexthops, but in an array after the end
of the array of nexthops. New accessors are provided to retrieve a
pointer to the via address.

To allow for O(1) access without having to store a pointer or offset
per nh, the via address for each nexthop is sized according to the
maximum via address for any nexthop in the route, which is stored in a
new route field, rt_max_alen, but this is in an existing hole in
struct mpls_route so it doesn't increase the size of the
structure. Each via address is ensured to be aligned to VIA_ALEN_ALIGN
to account for architectures that don't allow unaligned accesses.

Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller -
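The layout described in the memory-usage commit above (nexthop array first, then one equally sized via slot per nexthop) can be sketched as offset arithmetic. This is a userspace illustration under assumptions: the `sketch_` helpers are hypothetical, and the alignment unit is assumed to be `sizeof(unsigned long)` because the commit message names VIA_ALEN_ALIGN without giving its value. The savings quoted in the message follow from the old fixed 32-byte field: 32 - 4 = 28 bytes for IPv4 and 32 - 16 = 16 bytes for IPv6.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment unit; the commit names VIA_ALEN_ALIGN but not its value. */
#define SKETCH_VIA_ALEN_ALIGN sizeof(unsigned long)

/* Round n up to the next multiple of align (align must be a power of two). */
size_t sketch_align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Size of each via slot: the longest via address in the route, aligned. */
size_t sketch_via_stride(const size_t *via_alens, size_t nhn)
{
    size_t max_alen = 0;
    for (size_t i = 0; i < nhn; i++)
        if (via_alens[i] > max_alen)
            max_alen = via_alens[i];
    return sketch_align_up(max_alen, SKETCH_VIA_ALEN_ALIGN);
}

/*
 * Byte offset, from the start of the route's memory block, of the via
 * address belonging to nexthop `index`. Equal-sized slots make this O(1)
 * with no per-nexthop pointer or offset stored.
 */
size_t sketch_via_offset(size_t route_hdr_size, size_t nh_size, size_t nhn,
                         size_t stride, size_t index)
{
    assert(index < nhn);
    size_t via_base = sketch_align_up(route_hdr_size + nh_size * nhn,
                                      SKETCH_VIA_ALEN_ALIGN);
    return via_base + stride * index;
}
```

The trade-off is that a route mixing IPv4 and IPv6 nexthops pays the IPv6 slot size for every nexthop, in exchange for constant-time lookup.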
Fill in the via address length for the predefined IPv4 and IPv6
explicit-null label routes.

Fixes: f8efb73c97e2 ("mpls: multipath route support")
Signed-off-by: Robert Shearman
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller
23 Oct, 2015
2 commits
-
Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN), whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller -
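The two hash inputs listed in the flow-based selection commit above can be sketched as follows. This is a rough userspace model, not the kernel routine: FNV-1a is swapped in as the mixing function, the limit of 4 labels is an assumed value for MAX_MP_SELECT_LABELS, the entropy label is assumed to follow the entropy label indicator (label 7, RFC 6790), only IPv4-sized addresses are modelled, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SELECT_LABELS 4  /* assumed value for MAX_MP_SELECT_LABELS */
#define SKETCH_LABEL_ELI 7          /* entropy label indicator (RFC 6790) */

/* Hypothetical 3-tuple taken from an IPv4 header in the MPLS payload. */
struct sketch_l3 {
    uint32_t src;
    uint32_t dst;
    uint8_t proto;
};

/* Fold one 32-bit word into the hash, FNV-1a style (a stand-in hash). */
uint32_t sketch_mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Return the index of the nexthop chosen for this packet's flow. */
unsigned sketch_select_path(const uint32_t *labels, size_t n_labels,
                            const struct sketch_l3 *l3, unsigned nhn)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis */
    int entropy_next = 0;

    assert(nhn > 0);
    for (size_t i = 0; i < n_labels && i < SKETCH_MAX_SELECT_LABELS; i++) {
        h = sketch_mix(h, labels[i]);
        if (entropy_next)
            break;                  /* the entropy label was just hashed */
        if (labels[i] == SKETCH_LABEL_ELI)
            entropy_next = 1;       /* the next label is the entropy label */
    }
    if (l3 != NULL) {               /* 3-tuple only when an IP payload exists */
        h = sketch_mix(h, l3->src);
        h = sketch_mix(h, l3->dst);
        h = sketch_mix(h, l3->proto);
    }
    return h % nhn;
}
```

Because every packet of a flow carries the same labels and 3-tuple, all of them map to the same nexthop, which is what avoids reordering within a flow.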
This patch adds support for MPLS multipath routes.
Includes the following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
- 'struct mpls_nh' represents an mpls nexthop label forwarding entry
- moves mpls route and nexthop structures into internal.h
- A mpls_route can point to multiple mpls_nh structs
- the nexthops are maintained as an array (similar to ipv4 fib)
- In the process of restructuring, this patch also consistently changes
  all labels to u8
- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
  multipath routes similar to ipv4/v6 fib
- In this patch, the multipath route nexthop selection algorithm
  simply returns the first nexthop. It is replaced by a
  hash based algorithm from Robert Shearman in the next patch
- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
  Though mpls_route_update was implemented to update based on dev, it was
  never used that way, and the dev handling gets tricky with multiple
  nexthops: it cannot match against any single nexthop's dev. So, this patch
  removes the unused 'dev' handling in mpls_route_update.
- dead route/path handling will be implemented in a subsequent patch

Example:
$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
nexthop as to 800 via inet 40.1.1.2 dev swp3

Signed-off-by: Roopa Prabhu
Acked-by: Robert Shearman
Signed-off-by: David S. Miller
08 Oct, 2015
1 commit
-
The network namespace is already passed into dst_output; pass it into
dst->output, lwt->output and friends.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
01 Sep, 2015
1 commit
-
Fix a memory leak in the mpls netns init function in case of failure. If
register_net_sysctl fails then we need to free the ctl_table.

Fixes: 7720c01f3f59 ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
25 Aug, 2015
1 commit
-
Add cfg and family arguments to lwt build state functions. cfg is a void
pointer and will either be a pointer to a fib_config or fib6_config
structure. The family parameter indicates which one (either AF_INET
or AF_INET6).

LWT encapsulation implementations may use the fib configuration to build
the LWT state.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
21 Aug, 2015
1 commit
-
Currently, the lwtunnel state resides in per-protocol data. This is
a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
The xmit function of the tunnel does not know whether the packet has been
routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
lwtstate data to dst_entry makes such inter-protocol tunneling possible.

As a bonus, this brings a nice diffstat.

Signed-off-by: Jiri Benc
Acked-by: Roopa Prabhu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller
10 Aug, 2015
1 commit
-
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as an IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.

Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
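A minimal sketch of the payload-type idea in the Explicit NULL commit above: the route carries the expected payload type, and the first-nibble guess is only a fallback for routes that do not specify one. The enum and function names are hypothetical; the label values 0 (IPv4 Explicit NULL) and 2 (IPv6 Explicit NULL) are the reserved values from RFC 3032.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload-type attribute stored on a route. */
enum sketch_payload {
    SKETCH_PT_UNSPEC,
    SKETCH_PT_IPV4,
    SKETCH_PT_IPV6
};

/* Payload type to install for the predefined reserved-label routes. */
enum sketch_payload sketch_payload_for_label(uint32_t label)
{
    switch (label) {
    case 0:  return SKETCH_PT_IPV4;   /* IPv4 Explicit NULL */
    case 2:  return SKETCH_PT_IPV6;   /* IPv6 Explicit NULL */
    default: return SKETCH_PT_UNSPEC;
    }
}

/*
 * Decide how to treat the packet after the last label is popped.
 * The route's attribute wins; the first nibble is consulted only when
 * the route does not say what the payload is.
 */
enum sketch_payload sketch_egress_payload(enum sketch_payload route_pt,
                                          const uint8_t *payload, size_t len)
{
    assert(payload != NULL || len == 0);
    if (route_pt != SKETCH_PT_UNSPEC)
        return route_pt;
    if (len == 0)
        return SKETCH_PT_UNSPEC;
    switch (payload[0] >> 4) {
    case 4:  return SKETCH_PT_IPV4;
    case 6:  return SKETCH_PT_IPV6;
    default: return SKETCH_PT_UNSPEC;
    }
}
```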
07 Aug, 2015
2 commits
-
This patch adds a null dev check for the 'cfg->rc_via_table ==
NEIGH_LINK_TABLE or dev_get_by_index() failed' case.

Reported-by: Dan Carpenter
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller -
We recently changed this code from returning NULL to returning ERR_PTR.
There are some left over NULL assignments which we can remove. We can
preserve the error code from ip_route_output() instead of always
returning -ENODEV. Also these functions use a mix of gotos and direct
returns. There is no cleanup necessary so I changed the gotos to
direct returns.

Signed-off-by: Dan Carpenter
Acked-by: Roopa Prabhu
Acked-by: Robert Shearman
Signed-off-by: David S. Miller
04 Aug, 2015
1 commit
-
In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbitrary value of 16. Factor
this out into a #define for better maintainability and for
documentation.

Signed-off-by: Robert Shearman
Acked-by: Roopa Prabhu
Signed-off-by: David S. Miller
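The refactoring in the reserved-label commit above amounts to naming the constant once and testing against the name. A minimal sketch, with a hypothetical macro name (the commit message does not state the name chosen in the kernel):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; labels 0 to 15 are reserved by RFC 3032. */
#define SKETCH_LABEL_FIRST_UNRESERVED 16

/* True when the label falls in the reserved range. */
bool sketch_label_is_reserved(uint32_t label)
{
    return label < SKETCH_LABEL_FIRST_UNRESERVED;
}
```

Call sites then read as a question about the label rather than a comparison against a bare number.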
01 Aug, 2015
1 commit
-
Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch uses ipv6_stub_impl.ipv6_dst_lookup instead of
ip6_route_output. And wraps affected code under
IS_ENABLED(CONFIG_INET) and IS_ENABLED(CONFIG_IPV6).

Reported-by: kbuild test robot
Reported-by: Thomas Graf
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
23 Jul, 2015
1 commit
-
fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
22 Jul, 2015
3 commits
-
If user did not specify an oif, try and get it from the via address.
If failed to get device, return with -ENODEV.

Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller -
This implementation uses lwtunnel infrastructure to register
hooks for mpls tunnel encaps.

It picks cues from iptunnel_encaps infrastructure and previous
mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman

Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller -
Signed-off-by: Roopa Prabhu
Signed-off-by: David S. Miller
14 Jun, 2015
1 commit
12 Jun, 2015
1 commit
-
If a device is renamed and the original name is subsequently reused
for a new device, the following warning is generated:

sysctl duplicate entry: /net/mpls/conf/veth0//input
CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
Call Trace:
[] ? dump_stack+0x40/0x50
[] ? __register_sysctl_table+0x289/0x5a0
[] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
[] ? notifier_call_chain+0x4f/0x70
[] ? register_netdevice+0x2b2/0x480
[] ? veth_newlink+0x178/0x2d3 [veth]
[] ? rtnl_newlink+0x73c/0x8e0
[] ? rtnl_newlink+0x16a/0x8e0
[] ? __kmalloc_reserve.isra.30+0x32/0x90
[] ? rtnetlink_rcv_msg+0x8d/0x250
[] ? __alloc_skb+0x47/0x1f0
[] ? __netlink_lookup+0xab/0xe0
[] ? rtnetlink_rcv+0x30/0x30
[] ? netlink_rcv_skb+0xb0/0xd0
[] ? rtnetlink_rcv+0x24/0x30
[] ? netlink_unicast+0x107/0x1a0
[] ? netlink_sendmsg+0x50e/0x630
[] ? sock_sendmsg+0x3c/0x50
[] ? ___sys_sendmsg+0x27b/0x290
[] ? mem_cgroup_try_charge+0x88/0x110
[] ? mem_cgroup_commit_charge+0x56/0xa0
[] ? do_filp_open+0x30/0xa0
[] ? __sys_sendmsg+0x3e/0x80
[] ? system_call_fastpath+0x16/0x75

Fix this by unregistering the previous sysctl table (registered for
the path containing the original device name) and re-registering the
table for the path containing the new device name.

Fixes: 37bde79979c3 ("mpls: Per-device enabling of packet input")
Reported-by: Scott Feldman
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
09 Jun, 2015
1 commit
08 Jun, 2015
1 commit
-
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.

Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.

Fixes: 03c57747a702 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman
Acked-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
02 Jun, 2015
1 commit
-
When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.

So add a priority field so we can handle this properly.

IPv4/IPv6 get the highest priority with the implicit zero priority
field.

Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.

Suggested-by: Eric Dumazet
Suggested-by: Toshiaki Makita
Signed-off-by: David S. Miller
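The priority ordering described in the GRO commit above can be modelled as a sorted insert, where a lower priority value sits nearer the front. This is a userspace sketch using a fixed array in place of the kernel's linked list; every name in it is hypothetical, and only the priority values 0, 10 and 15 come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_OFFLOADS 8

/* Hypothetical offload entry: a name plus its scan priority. */
struct sketch_offload {
    const char *name;
    int priority;           /* lower value is scanned earlier */
};

struct sketch_offload_list {
    struct sketch_offload items[SKETCH_MAX_OFFLOADS];
    size_t count;
};

/*
 * Insert before the first entry with a strictly larger priority, so that
 * entries of equal priority keep their registration order.
 * Returns 0 on success, -1 when the list is full.
 */
int sketch_add_offload(struct sketch_offload_list *l,
                       const char *name, int priority)
{
    size_t pos = 0;

    assert(l != NULL);
    if (l->count == SKETCH_MAX_OFFLOADS)
        return -1;
    while (pos < l->count && l->items[pos].priority <= priority)
        pos++;
    /* Shift the tail up by one slot to open a gap at pos. */
    memmove(&l->items[pos + 1], &l->items[pos],
            (l->count - pos) * sizeof(l->items[0]));
    l->items[pos].name = name;
    l->items[pos].priority = priority;
    l->count++;
    return 0;
}
```

Registering MPLS (15), ethernet (10), IPv4 (0) and IPv6 (0) in that order yields the scan order IPv4, IPv6, ethernet, MPLS.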
10 May, 2015
1 commit
-
Since these are now visible to userspace it is nice to be consistent
with BSD (sys/netmpls/mpls.h in netBSD).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
06 May, 2015
1 commit
-
Move to include/uapi/linux/mpls.h to be externally visible.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
23 Apr, 2015
3 commits
-
The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller -
An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller -
Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
13 Mar, 2015
1 commit
-
Robert Shearman noticed that mpls_egress is failing to verify that
the bytes to be examined are in fact present in the packet before
mpls_egress reads those bytes.

As suggested by David Miller, reduce this to a single pskb_may_pull
call so that we don't do unnecessary work in the fast path.

Reported-by: Robert Shearman
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
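The pattern behind the mpls_egress fix above (one length check up front, then unchecked reads) can be shown with a plain-buffer analogue. It does not use pskb_may_pull or any skb type. The function and macro names are hypothetical, and the 9-byte minimum is an assumption chosen so that both the IPv4 TTL (offset 8) and the IPv6 hop limit (offset 7) are covered by the single check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed minimum: covers IPv4 TTL at offset 8 and IPv6 hop limit at offset 7. */
#define SKETCH_MIN_HDR_BYTES 9

/*
 * Read the TTL or hop limit of the encapsulated IP packet.
 * Returns 0 on success, -1 when the packet is too short or not IP.
 */
int sketch_read_ttl(const uint8_t *pkt, size_t len, uint8_t *ttl)
{
    assert(ttl != NULL);
    if (pkt == NULL || len < SKETCH_MIN_HDR_BYTES)
        return -1;              /* the single check, done before any read */
    switch (pkt[0] >> 4) {
    case 4:
        *ttl = pkt[8];          /* IPv4 TTL */
        return 0;
    case 6:
        *ttl = pkt[7];          /* IPv6 hop limit */
        return 0;
    default:
        return -1;
    }
}
```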
12 Mar, 2015
1 commit
-
CONFIG_MPLS=m doesn't result in a kernel module being built because it
applies to the net/mpls directory, rather than to .o files.

So revert the MPLS menuitem to being a boolean and make MPLS_GSO and
MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be
produced as desired.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Signed-off-by: David S. Miller
10 Mar, 2015
1 commit
-
Signed-off-by: Geert Uytterhoeven
Acked-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
09 Mar, 2015
1 commit
-
Remove a little bit of unnecessary work when transmitting a packet with
neigh_packet_xmit. Use the neighbour table index not the address family
as a parameter.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller