Eric Lee / smarc-fsl-linux-kernel

29 May, 2018

1 commit

cb1603948 vrf: add CRC32c offload to device features

SCTP sockets originated in a VRF can improve their performance if CRC32c
computation is delegated to underlying devices: update device features,
setting NETIF_F_SCTP_CRC. Iterating the following command in the topology
proposed with [1],

# ip vrf exec vrf-h2 netperf -H 192.0.2.1 -t SCTP_STREAM -- -m 10K

the measured throughput in Mbit/s improved from 2395 ± 1% to 2720 ± 1%.

[1] https://www.spinics.net/lists/netdev/msg486007.html

Signed-off-by: Davide Caratti
Reviewed-by: Marcelo Ricardo Leitner
Acked-by: David Ahern
Signed-off-by: David S. Miller

Davide Caratti
2018-05-29 10:55:13 +0800

18 Apr, 2018

1 commit

43b059a31 vrf: Move fib6_table into net_vrf

A later patch removes rt6i_table from rt6_info. Save the ipv6
table for a VRF in net_vrf. fib tables can not be deleted so
no reference counting or locking is required.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2018-04-18 11:41:15 +0800

02 Apr, 2018

1 commit

c0b458a94 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller

David S. Miller
2018-04-02 07:49:34 +0800

31 Mar, 2018

1 commit

82dd0d2a9 vrf: Fix use after free and double free in vrf_finish_output

Miguel reported an skb use after free / double free in vrf_finish_output
when neigh_output returns an error. The vrf driver should return after
the call to neigh_output as it takes over the skb on error path as well.

Patch is a simplified version of Miguel's patch which was written for 4.9,
and updated to top of tree.

Fixes: 8f58336d3f78a ("net: Add ethernet header for pass through VRF device")
Signed-off-by: Miguel Fadon Perlines
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2018-03-31 02:20:23 +0800

28 Mar, 2018

1 commit

2f635ceeb net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-03-28 01:18:09 +0800

05 Mar, 2018

1 commit

b75cc8f90 net/ipv6: Pass skb to route lookup

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.

Signed-off-by: David Ahern
Reviewed-by: Ido Schimmel
Reviewed-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

David Ahern
2018-03-05 02:04:22 +0800

28 Feb, 2018

1 commit

02df428ca net: Convert simple pernet_operations

These pernet_operations make pretty simple actions
like variable initialization on init, debug checks
on exit, and so on, and they obviously are able
to be executed in parallel with any others:

vrf_net_ops
lockd_net_ops
grace_net_ops
xfrm6_tunnel_net_ops
kcm_net_ops
tcf_net_ops

Signed-off-by: Kirill Tkhai
Signed-off-by: David S. Miller

Kirill Tkhai
2018-02-28 00:01:35 +0800

24 Feb, 2018

1 commit

1b71af605 net: fib_rules: Add new attribute to set protocol

For ages iproute2 has used `struct rtmsg` as the ancillary header for
FIB rules and in the process set the protocol value to RTPROT_BOOT.
Until cac56209a66 ("net: Allow a rule to track originating protocol")
the kernel rules code ignored the protocol value sent from userspace
and always returned 0 in notifications. To avoid incompatibility with
existing iproute2, send the protocol as a new attribute.

Fixes: cac56209a66 ("net: Allow a rule to track originating protocol")
Signed-off-by: Donald Sharp
Signed-off-by: David S. Miller

Donald Sharp
2018-02-24 04:47:20 +0800

22 Feb, 2018

1 commit

cac56209a net: Allow a rule to track originating protocol

Allow a rule that is being added/deleted/modified or
dumped to contain the originating protocol's id.

The protocol is handled just like a route's originating
protocol is. This is especially useful because there
is now a plethora of different user space programs
adding rules.

Allow the vrf device to specify that the kernel is the originator
of the rule created for this device.

Signed-off-by: Donald Sharp
Signed-off-by: David S. Miller

Donald Sharp
2018-02-22 06:49:24 +0800

16 Feb, 2018

1 commit

68e813aa4 net/ipv4: Remove fib table id from rtable

Remove rt_table_id from rtable. It was added for getroute to return the
table id that was hit in the lookup. With the changes for fibmatch the
table id can be extracted from the fib_info returned in the fib_result
so it no longer needs to be in rtable directly.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2018-02-16 04:41:42 +0800

26 Jan, 2018

1 commit

1e19c4d68 net: vrf: Add support for sends to local broadcast address

Sukumar reported that sends to the local broadcast address
(255.255.255.255) are broken. Check for the address in vrf driver
and do not redirect to the VRF device - similar to multicast
packets.

With this change sockets can use SO_BINDTODEVICE to specify an
egress interface and receive responses. Note: the egress interface
can not be a VRF device but needs to be the enslaved device.

https://bugzilla.kernel.org/show_bug.cgi?id=198521

Reported-by: Sukumar Gopalakrishnan
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2018-01-26 10:51:03 +0800

04 Nov, 2017

1 commit

2a171788b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2017-11-04 08:26:51 +0800

02 Nov, 2017

1 commit

18129a249 net: vrf: correct FRA_L3MDEV encode type

FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On
big endian architecture, this results in the l3mdev entry not being
added to the FIB rules.
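
The byte-order effect described above can be illustrated outside the kernel. The following is a user-space sketch, not the vrf driver code: encode_be32, encode_le32 and read_as_u8 are hypothetical helpers that lay out a 32-bit value in a chosen byte order and then read only the first payload byte, as a consumer expecting a U8 attribute would.

```c
#include <stdint.h>

/* Hypothetical helper: store v the way a big-endian CPU lays it out. */
static void encode_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: store v the way a little-endian CPU lays it out. */
static void encode_le32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* A reader that expects a U8 payload looks only at the first byte. */
static uint8_t read_as_u8(const uint8_t *payload)
{
    return payload[0];
}
```

With the value 1 written as a 32-bit quantity, the first byte is 1 in the little-endian layout but 0 in the big-endian layout, which is consistent with the flag being lost only on big endian machines.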

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: Jeff Barnhill
Acked-by: David Ahern
Signed-off-by: David S. Miller

Jeff Barnhill
2017-11-02 15:20:53 +0800

05 Oct, 2017

3 commits

de3baa3ed net: vrf: Add extack messages for enslave errors

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-10-05 12:39:33 +0800

42ab19ee9 net: Add extack to upper device linking

Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-10-05 12:39:33 +0800

33eaf2a6e net: Add extack to ndo_add_slave

Pass extack to do_set_master and down to ndo_add_slave

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-10-05 12:39:33 +0800

22 Sep, 2017

1 commit

a42412076 net: vrf: remove skb_dst_force() after skb_dst_set()

skb_dst_set(skb, dst) installs a normal (refcounted) dst, so there is no
point in using skb_dst_force(skb).

Signed-off-by: Eric Dumazet
Acked-by: David Ahern
Signed-off-by: David S. Miller

Eric Dumazet
2017-09-22 11:36:32 +0800

16 Sep, 2017

1 commit

ecf091171 net: vrf: avoid gcc-4.6 warning

When building an allmodconfig kernel with gcc-4.6, we get a rather
odd warning:

drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]

I have no idea what this warning is even trying to say, but it does
seem like a false positive. Reordering the initialization to match
the structure definition gets rid of the warning, and might also avoid
whatever gcc thinks is wrong here.

Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2017-09-16 05:22:21 +0800

14 Aug, 2017

1 commit

4f04256c9 net: vrf: Drop local rtable and rt6_info

The VRF cached rtable and rt6_info for local traffic are no longer
needed and actually prevent local traffic through enslaved devices.
Remove them.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-08-14 11:05:12 +0800

08 Aug, 2017

1 commit

53b948356 net: vrf: Add extack messages for newlink failures

Add extack error messages for failure paths creating vrf devices. Once
extack support is added to iproute2, we go from the unhelpful:
    $ ip li add foobar type vrf
    RTNETLINK answers: Invalid argument

to:
    $ ip li add foobar type vrf
    Error: VRF table id is missing

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-08-08 06:16:33 +0800

06 Jul, 2017

1 commit

f630c38ef vrf: fix bug_on triggered by rx when destroying a vrf

When destroying a VRF device we clean up the slaves in its ndo_uninit()
function, but that causes packets to be switched (skb->dev == vrf being
destroyed) even though we're past the point where the VRF should be
receiving any packets while it is being dismantled. This causes a BUG_ON
to trigger if we have raw sockets (trace below).
The reason is that the inetdev of the VRF has been destroyed but we're
still sending packets up the stack with it, so let's free the slaves in
the dellink callback as David Ahern suggested.

Note that this fix doesn't prevent packets from going up when the VRF
device is admin down.

[ 35.631371] ------------[ cut here ]------------
[ 35.631603] kernel BUG at net/ipv4/fib_frontend.c:285!
[ 35.631854] invalid opcode: 0000 [#1] SMP
[ 35.631977] Modules linked in:
[ 35.632081] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.12.0-rc7+ #45
[ 35.632247] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 35.632477] task: ffff88005ad68000 task.stack: ffff88005ad64000
[ 35.632632] RIP: 0010:fib_compute_spec_dst+0xfc/0x1ee
[ 35.632769] RSP: 0018:ffff88005ad67978 EFLAGS: 00010202
[ 35.632910] RAX: 0000000000000001 RBX: ffff880059a7f200 RCX: 0000000000000000
[ 35.633084] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82274af0
[ 35.633256] RBP: ffff88005ad679f8 R08: 000000000001ef70 R09: 0000000000000046
[ 35.633430] R10: ffff88005ad679f8 R11: ffff880037731cb0 R12: 0000000000000001
[ 35.633603] R13: ffff8800599e3000 R14: 0000000000000000 R15: ffff8800599cb852
[ 35.634114] FS: 0000000000000000(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
[ 35.634306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.634456] CR2: 00007f3563227095 CR3: 000000000201d000 CR4: 00000000000406e0
[ 35.634632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 35.634865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 35.635055] Call Trace:
[ 35.635271] ? __lock_acquire+0xf0d/0x1117
[ 35.635522] ipv4_pktinfo_prepare+0x82/0x151
[ 35.635831] raw_rcv_skb+0x17/0x3c
[ 35.636062] raw_rcv+0xe5/0xf7
[ 35.636287] raw_local_deliver+0x169/0x1d9
[ 35.636534] ip_local_deliver_finish+0x87/0x1c4
[ 35.636820] ip_local_deliver+0x63/0x7f
[ 35.637058] ip_rcv_finish+0x340/0x3a1
[ 35.637295] ip_rcv+0x314/0x34a
[ 35.637525] __netif_receive_skb_core+0x49f/0x7c5
[ 35.637780] ? lock_acquire+0x13f/0x1d7
[ 35.638018] ? lock_acquire+0x15e/0x1d7
[ 35.638259] __netif_receive_skb+0x1e/0x94
[ 35.638502] ? __netif_receive_skb+0x1e/0x94
[ 35.638748] netif_receive_skb_internal+0x74/0x300
[ 35.639002] ? dev_gro_receive+0x2ed/0x411
[ 35.639246] ? lock_is_held_type+0xc4/0xd2
[ 35.639491] napi_gro_receive+0x105/0x1a0
[ 35.639736] receive_buf+0xc32/0xc74
[ 35.639965] ? detach_buf+0x67/0x153
[ 35.640201] ? virtqueue_get_buf_ctx+0x120/0x176
[ 35.640453] virtnet_poll+0x128/0x1c5
[ 35.640690] net_rx_action+0x103/0x343
[ 35.640932] __do_softirq+0x1c7/0x4b7
[ 35.641171] run_ksoftirqd+0x23/0x5c
[ 35.641403] smpboot_thread_fn+0x24f/0x26d
[ 35.641646] ? sort_range+0x22/0x22
[ 35.641878] kthread+0x129/0x131
[ 35.642104] ? __list_add+0x31/0x31
[ 35.642335] ? __list_add+0x31/0x31
[ 35.642568] ret_from_fork+0x2a/0x40
[ 35.642804] Code: 05 bd 87 a3 00 01 e8 1f ef 98 ff 4d 85 f6 48 c7 c7 f0 4a 27 82 41 0f 94 c4 31 c9 31 d2 41 0f b6 f4 e8 04 71 a1 ff 45 84 e4 74 02 0b 0f b7 93 c4 00 00 00 4d 8b a5 80 05 00 00 48 03 93 d0 00
[ 35.644342] RIP: fib_compute_spec_dst+0xfc/0x1ee RSP: ffff88005ad67978

Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Reported-by: Chris Cormier
Signed-off-by: Nikolay Aleksandrov
Acked-by: David Ahern
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2017-07-06 23:46:07 +0800

27 Jun, 2017

2 commits

a8b8a889e net: add netlink_ext_ack argument to rtnl_link_ops.validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer
Acked-by: David Ahern
Signed-off-by: David S. Miller

Matthias Schiffer
2017-06-27 11:13:22 +0800

7a3f4a185 net: add netlink_ext_ack argument to rtnl_link_ops.newlink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer
Acked-by: David Ahern
Signed-off-by: David S. Miller

Matthias Schiffer
2017-06-27 11:13:21 +0800

18 Jun, 2017

2 commits

a4c2fd7f7 net: remove DST_NOCACHE flag

DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and removes this flag
completely.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:01 +0800

1cfb71eeb ipv6: take dst->__refcnt for insertion into fib6 tree

In IPv6 routing code, struct rt6_info is created for each static route
and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
ref count is not taken.
As explained in the previous patch, this leads to the need of the dst
garbage collector.

This patch holds ref count of dst before inserting the route into fib6
tree and properly releases the dst when deleting it from the fib6 tree
as a preparation in order to fully get rid of dst gc later.

Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
no user is referencing the dst.

And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
dst->__refcnt to 1.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Wei Wang
2017-06-18 10:54:00 +0800

16 Jun, 2017

1 commit

d58ff3512 networking: make skb_push & __skb_push return void pointers

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
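
The effect of returning void * can be sketched with a small user-space model. Everything below is hypothetical (model_buf, model_push, model_hdr and the helpers are illustration names, not kernel APIs), and the sketch does no bounds checking; it only shows that a void * return lets a structure pointer be assigned without a cast, while a direct byte store needs an explicit byte-pointer cast.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer that grows toward lower addresses, like headroom. */
struct model_buf {
    uint8_t storage[64];
    uint8_t *data;              /* start of the current payload */
};

struct model_hdr {
    uint8_t type;
    uint8_t flags;
};

void model_buf_init(struct model_buf *b)
{
    /* Empty buffer: data points one past the end of storage. */
    b->data = b->storage + sizeof(b->storage);
}

/* Prepend len bytes and return their start as void *. */
void *model_push(struct model_buf *b, size_t len)
{
    b->data -= len;
    return b->data;
}

void add_header(struct model_buf *b, uint8_t type)
{
    /* void * converts implicitly, so no cast is needed here. */
    struct model_hdr *h = model_push(b, sizeof(*h));

    h->type = type;
    h->flags = 0;
}

void add_tag(struct model_buf *b, uint8_t tag)
{
    /* Dereferencing the result directly needs an explicit byte cast. */
    *(uint8_t *)model_push(b, 1) = tag;
}
```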

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:40 +0800

09 Jun, 2017

1 commit

097d3c950 net: vrf: Make add_fib_rules per network namespace flag

Commit 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
adds the l3mdev FIB rule the first time a VRF device is created. However,
it only creates the rule once and only in the namespace the first device
is created - which may not be init_net. Fix by using the net_generic
capability to make the add_fib_rules flag per network namespace.
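
The difference between one global flag and one flag per namespace can be sketched in user space. This is a simplified model under stated assumptions, not the driver code: struct ns_state, ns_state_init and device_create are hypothetical names, and rules_installed exists only so the behaviour can be observed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-namespace driver state. */
struct ns_state {
    bool add_fib_rules;     /* true until the rule has been installed */
    int rules_installed;    /* number of rule installs, for illustration */
};

void ns_state_init(struct ns_state *ns)
{
    ns->add_fib_rules = true;
    ns->rules_installed = 0;
}

/* Called for each device created in the namespace described by ns. */
void device_create(struct ns_state *ns)
{
    if (ns->add_fib_rules) {
        ns->rules_installed++;      /* first device in this namespace */
        ns->add_fib_rules = false;
    }
}
```

Because each namespace carries its own flag, the first device created in every namespace installs the rule once, rather than only the first device created anywhere.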

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Reported-by: Petr Machata
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-06-09 07:27:42 +0800

08 Jun, 2017

1 commit

cf124db56 net: Fix inconsistent teardown and release of private netdev state.

Network devices can allocate resources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally also does a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
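
The split described above can be modelled in user space. The sketch below is an assumption-laden illustration, not the kernel implementation: struct model_dev and the model_* functions are hypothetical, and needs_free_dev stands in for the needs_free_netdev boolean named in the text.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device model with separate private-state teardown. */
struct model_dev {
    void *priv;                                     /* private resources */
    void (*priv_destructor)(struct model_dev *);    /* frees priv only */
    bool needs_free_dev;                            /* free dev on unregister */
};

/* Releases private state but never the device structure itself. */
void model_priv_destructor(struct model_dev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
}

/*
 * Failure path of registration: private state is released here,
 * while the device structure is left for the caller to free.
 */
void model_register_fail(struct model_dev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
}

/* Unregister: release private state, then optionally the device. */
bool model_unregister(struct model_dev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_dev) {
        free(dev);
        return true;    /* dev is gone; the caller must not touch it */
    }
    return false;
}
```

Keeping the final free out of the private destructor is what lets the failure path run it safely: the caller's own free of the device can no longer become a double free.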

Signed-off-by: David S. Miller

David S. Miller
2017-06-08 03:53:24 +0800

12 May, 2017

1 commit

1a4a5bf52 driver: vrf: Fix one possible use-after-free issue

The current code only handles the case where the skb is dropped. It can
hit a use-after-free when NF_HOOK returns 0, which means the skb was
stolen by a netfilter rule or hook.

When a netfilter rule or hook steals the skb and returns NF_STOLEN,
the skb belongs to that rule and other modules must not touch it
again. The rule may have queued the skb or freed it directly.

Use nf_hook instead of NF_HOOK to get the netfilter result, and check
its return value. Only a return value of 1 means the skb may proceed;
otherwise set the skb to NULL.

BTW, because vrf_rcv_finish is an empty function, there is no need to
invoke it even when nf_hook returns 1. But vrf_rcv_finish does need to
be modified to deal with the NF_STOLEN case.

There are two cases when skb is stolen.
1. The skb is stolen and freed directly.
There is nothing we need to do, and vrf_rcv_finish isn't invoked.
2. The skb is queued and reinjected again.
The vrf_rcv_finish would be invoked as okfn, so need to free the
skb in it.
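
The return-value check described above can be sketched as a user-space model. The names here (struct pkt, hook_accept, hook_steal, run_hook) are hypothetical and only mimic the convention stated in the text, where a verdict of 1 means the caller may continue with the packet.

```c
#include <stdlib.h>

struct pkt {
    int len;
};

/* Hypothetical hook type: returns 1 if the caller may continue. */
typedef int (*hook_fn)(struct pkt *p);

/* Leaves the packet with the caller. */
int hook_accept(struct pkt *p)
{
    (void)p;
    return 1;
}

/* Takes ownership of the packet and frees it. */
int hook_steal(struct pkt *p)
{
    free(p);
    return 0;
}

/* Returns the packet if the caller still owns it, else NULL. */
struct pkt *run_hook(hook_fn hook, struct pkt *p)
{
    if (hook(p) != 1)
        p = NULL;   /* dropped or stolen: never touch it again */
    return p;
}
```

Returning NULL for every verdict other than 1 gives the caller a single check to make before using the packet again.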

Signed-off-by: Gao Feng
Signed-off-by: David S. Miller

Gao Feng
2017-05-12 00:13:11 +0800

28 Apr, 2017

1 commit

26d31ac11 net: vrf: Do not allow looback to be moved to a VRF

Moving the loopback into a VRF breaks networking for the default VRF.
Since the VRF device is the loopback for VRF domains, there is no
reason to move the loopback. Given the repercussions, block attempts
to set lo into a VRF.

Signed-off-by: David Ahern
Reviewed-by: Greg Rose
Signed-off-by: David S. Miller

David Ahern
2017-04-28 04:49:43 +0800

20 Apr, 2017

1 commit

7b9f6da17 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2017-04-20 22:35:33 +0800

18 Apr, 2017

2 commits

c21ef3e34 net: rtnetlink: plumb extended ack to doit function

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
From here individual subsystems can be updated to set netlink_ext_ack as
needed.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-04-18 03:35:38 +0800

426c87caa net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule

Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-04-18 01:27:54 +0800

24 Mar, 2017

1 commit

16ae1f223 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/broadcom/genet/bcmmii.c
drivers/net/hyperv/netvsc.c
kernel/bpf/hashtab.c

Almost entirely overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2017-03-24 07:41:27 +0800

23 Mar, 2017

2 commits

a9ec54d1b net: vrf: performance improvements for IPv6

The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv6 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) from a +3% improvement for UDP which has a lookup
per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
which connects a socket for each request-response.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-03-23 02:19:48 +0800

dcdd43c41 net: vrf: performance improvements for IPv4

The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv4 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
< 1% overhead for connected sockets that leverage early demux and
avoid FIB lookups.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-03-23 02:19:48 +0800

22 Mar, 2017

1 commit

3dc857f0e net: vrf: Reset rt6i_idev in local dst after put

The VRF driver takes a reference to the inet6_dev on the VRF device for
its rt6_local dst when handling local traffic through the VRF device as
a loopback. When the device is deleted the driver does a put on the idev
but does not reset rt6i_idev in the rt6_info struct. When the dst is
destroyed, dst_destroy calls ip6_dst_destroy which does a second put for
what is essentially the same reference causing it to be prematurely freed.
Reset rt6i_idev after the put in the vrf driver.

Fixes: b4869aa2f881e ("net: vrf: ipv6 support for local traffic to local addresses")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-03-22 08:50:20 +0800

17 Mar, 2017

1 commit

fdeea7be8 net: vrf: Set slave's private flag before linking

Allow listeners of the subsequent CHANGEUPPER notification to retrieve
the VRF's table ID by calling l3mdev_fib_table() with the slave netdev.
Without this change, the netdev won't be considered an L3 slave and the
function would return 0.

This is consistent with other master devices such as bridge and bond that
set the slave's private flag before linking. It also makes
do_vrf_{add,del}_slave() symmetric.

Signed-off-by: Ido Schimmel
Acked-by: David Ahern
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2017-03-17 01:18:34 +0800

09 Mar, 2017

1 commit

f7887d40e vrf: Fix use-after-free in vrf_xmit

KASAN detected a use-after-free:

[ 269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr ffff8800350a21c0
[ 269.467067] Read of size 4 by task ssh/1879
[ 269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249
[ 269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 269.467067] Call Trace:
[ 269.467067] dump_stack+0x81/0xb6
[ 269.467067] kasan_object_err+0x21/0x78
[ 269.467067] kasan_report+0x2f7/0x450
[ 269.467067] ? vrf_xmit+0x7f1/0x827 [vrf]
[ 269.467067] ? ip_output+0xa4/0xdb
[ 269.467067] __asan_load4+0x6b/0x6d
[ 269.467067] vrf_xmit+0x7f1/0x827 [vrf]
...

Which corresponds to the skb access after xmit handling. Fix by saving
skb->len and using the saved value to update stats.
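
The pattern of saving the length before handing the buffer off can be shown with a user-space sketch. It is not the vrf_xmit code: struct pkt, struct tx_stats, xmit_consume and send_and_count are hypothetical names, and xmit_consume simply frees the packet to model a transmit path that takes ownership.

```c
#include <stddef.h>
#include <stdlib.h>

struct pkt {
    size_t len;
};

struct tx_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* Hypothetical transmit step that consumes (frees) the packet. */
int xmit_consume(struct pkt *p)
{
    free(p);
    return 0;
}

void send_and_count(struct pkt *p, struct tx_stats *st)
{
    size_t len = p->len;    /* saved while p is still valid */

    if (xmit_consume(p) == 0) {
        /* p may already be freed here; only the saved copy is used. */
        st->packets++;
        st->bytes += len;
    }
}
```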

Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-03-09 15:10:02 +0800

12 Feb, 2017

1 commit

c16ec1859 net: rename dst_neigh_output back to neigh_output

After the dst->pending_confirm flag was removed, we no longer
need to provide the dst arg to dst_neigh_output.
So, rename it to neigh_output as before commit 5110effee8fd
("net: Do delayed neigh confirmation.").

Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller

Julian Anastasov
2017-02-12 10:25:18 +0800