Eric Lee / smarc-fsl-linux-kernel

28 Dec, 2016

1 commit

df30f7408 openvswitch: upcall: Fix vlan handling.

The networking stack accelerates vlan tag handling by
keeping the topmost vlan header in the skb. This works as
long as the packet remains in the OVS datapath. But during
an OVS upcall the vlan header is pushed onto the packet.
When such a packet is sent back to the OVS datapath, the core
networking stack might not handle it correctly. The following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies the datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme
CC: Jiri Benc
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

pravin shelar
2016-12-28 01:28:07 +0800

21 Dec, 2016

1 commit

87e159c59 openvswitch: Add a missing break statement.

Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.
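The fall-through described above can be reproduced in isolation. The following is a minimal sketch, not the kernel code: the enum values and both validator functions are hypothetical stand-ins for the OVS_KEY_ATTR_* handling, and the tunnel check is reduced to a single flag.

```c
#include <stdbool.h>

/* Hypothetical attribute kinds standing in for OVS_KEY_ATTR_* values. */
enum key_attr { ATTR_ETHERNET, ATTR_TUNNEL };

/* Buggy shape: the Ethernet case has no break, so the tunnel check
 * also runs and overwrites the result. */
bool validate_set_buggy(enum key_attr type, bool is_tunnel_data)
{
    bool ok = false;

    switch (type) {
    case ATTR_ETHERNET:
        ok = true;              /* Ethernet addresses are acceptable */
        /* fall through - this is the bug: no break here */
    case ATTR_TUNNEL:
        ok = is_tunnel_data;    /* rejects anything that is not tunnel data */
        break;
    }
    return ok;
}

/* Fixed shape: each case ends with its own break. */
bool validate_set_fixed(enum key_attr type, bool is_tunnel_data)
{
    bool ok = false;

    switch (type) {
    case ATTR_ETHERNET:
        ok = true;
        break;
    case ATTR_TUNNEL:
        ok = is_tunnel_data;
        break;
    }
    return ok;
}
```

With the buggy variant, an Ethernet attribute is judged by the tunnel rule and rejected; with the fixed variant it validates on its own terms.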

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme
Acked-by: Pravin B Shelar
Acked-by: Jiri Benc
Signed-off-by: David S. Miller

Jarno Rajahalme
2016-12-21 03:07:41 +0800

04 Dec, 2016

1 commit

2745529ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer overlapped with some changes
to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkmann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller

David S. Miller
2016-12-04 01:29:53 +0800

01 Dec, 2016

1 commit

f92a80a99 openvswitch: Fix skb leak in IPv6 reassembly.

If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it
means that we still have a reference to the skb. We should free it
before returning from handle_fragments, as stated in the comment above.
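The ownership rule in this message can be sketched outside the kernel. This is an illustration under stated assumptions: struct buf, buf_free() and mock_gather() are hypothetical stand-ins for the skb, its free routine and nf_ct_frag6_gather(), and the mock simply returns whatever outcome the caller requests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an skb; `freed` records whether we released it. */
struct buf { bool freed; };

void buf_free(struct buf *b) { b->freed = true; }

/* Hypothetical stand-in for the gather step:
 *   0            reassembly complete, caller keeps the buffer
 *   -EINPROGRESS buffer was queued, the queue now owns it
 *   other error  nothing was queued, caller still owns the buffer */
int mock_gather(struct buf *b, int outcome)
{
    (void)b;
    return outcome;
}

int handle_fragments(struct buf *b, int outcome)
{
    int err = mock_gather(b, outcome);

    if (err) {
        /* Free only when ownership did not move to the queue. */
        if (err != -EINPROGRESS)
            buf_free(b);
        return err;
    }
    return 0;
}
```

The only case where the caller must not free is -EINPROGRESS; every other error path releases the buffer before returning.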

Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion")
CC: Florian Westphal
CC: Pravin B Shelar
CC: Joe Stringer
Signed-off-by: Daniele Di Proietto
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Daniele Di Proietto
2016-12-01 00:00:45 +0800

18 Nov, 2016

1 commit

c7d03a00b netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

MOVSX is a 3-byte instruction which isn't necessary if the variable is
unsigned because x86_64 zero-extends by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

And this function is used a lot, so those sign extensions add up.
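To compare the two index types outside the kernel, a pair of otherwise identical functions is enough. This is only a sketch with hypothetical function names; the exact instructions emitted depend on the compiler, its version and the optimization flags, so the generated assembly (for example from gcc -O2 -S) should be inspected rather than assumed.

```c
/* Signed index: on x86_64 the 32-bit value has to be widened with sign
 * extension before it can be used in the address computation. */
long load_signed(const long *p, int i)
{
    return p[i];
}

/* Unsigned index: writes to a 32-bit register already clear the upper
 * half, so no separate extension step should be required. */
long load_unsigned(const long *p, unsigned int i)
{
    return p[i];
}
```

Both functions return the same value for any valid non-negative index; the difference is confined to code size.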

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation with the register
allocator being used differently. gcc decides that some variable
needs to live in the new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
used, which is longer than [r8].

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function                                     old     new   delta
nfsd4_lock                                  3886    3959     +73
tipc_link_build_proto_msg                   1096    1140     +44
mac80211_hwsim_new_radio                    2776    2808     +32
tipc_mon_rcv                                1032    1058     +26
svcauth_gss_legacy_init                     1413    1429     +16
tipc_bcbase_select_primary                   379     392     +13
nfsd4_exchange_id                           1247    1260     +13
nfsd4_setclientid_confirm                    782     793     +11
...
put_client_renew_locked                      494     480     -14
ip_set_sockfn_get                            730     716     -14
geneve_sock_add                              829     813     -16
nfsd4_sequence_done                          721     703     -18
nlmclnt_lookup_host                          708     686     -22
nfsd4_lockt                                 1085    1063     -22
nfs_get_client                              1077    1050     -27
tcf_bpf_init                                1106    1076     -30
nfsd4_encode_fattr                          5997    5930     -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2016-11-18 23:59:15 +0800

14 Nov, 2016

1 commit

7d384846b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:

nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
nft add rule netdev x y drop

And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
option. perf report shows nf_tables calls in its top 10:

17.30% kpktgend_0 [nf_tables] [k] nft_do_chain
15.75% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
10.39% kpktgend_0 [nf_tables_netdev] [k] nft_do_chain_netdev

I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.

This rework contains more specifically, in strict order, these patches:

1) Remove compile-time debugging from core.

2) Remove obsolete comments that predate the rcu era. These days it is
well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling, this is only used by br_netfilter too.
We already have specific code to handle this from br_netfilter,
so remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place nf_state_hook pointer into xt_action_param structure, so
this structure fits into one single cacheline according to pahole.
This also implicitly affects nftables since it also relies on the
xt_action_param structure.

6) Move state->hook_entries into nf_queue entry. The hook_entries
pointer is only required by nf_queue(), so we can store this in the
queue entry instead.

7) use switch() statement to handle verdict cases.

8) Remove hook_entries field from nf_hook_state structure, this is only
required by nf_queue, so store it in nf_queue_entry structure.

9) Merge nf_iterate() into nf_hook_slow() that results in a much more
simple and readable function.

10) Handle NF_REPEAT away from the core, so far the only client is
nf_conntrack_in() and we can restart the packet processing using a
simple goto to jump back there when the TCP requires it.
This update required a second pass to fix fallout, fix from
Arnd Bergmann.

11) Set random seed from nft_hash when no seed is specified from
userspace.

12) Simplify nf_tables expression registration, in a much smarter way
to save lots of boiler plate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from
Davide Caratti.

14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
to recent generalization of the socket infrastructure, from Arnd
Bergmann.

15) Then, the ipset batch from Jozsef, he describes it as follows:

* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
in skbinfo get/init helper functions.
* Use kmalloc() in comment extension helper instead of kzalloc()
because it is unnecessary to zero out the area just before
explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
across all set types.
* Count non-static extension memory into memsize calculation for
userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
they can be derived from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
one level of indentation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
by Muhammad Falak R Wani, individually for the set type families.

16) Remove useless connlabel field in struct netns_ct, patch from
Florian Westphal.

17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
{ip,ip6,arp}tables code that uses this.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-11-14 11:41:25 +0800

13 Nov, 2016

8 commits

217ac77a3 openvswitch: allow L3 netdev ports

Allow ARPHRD_NONE interfaces to be added to ovs bridge.

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab
Signed-off-by: Simon Horman
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
91820da6a openvswitch: add Ethernet push and pop actions

It's not allowed to push Ethernet header in front of another Ethernet
header.

It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
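The two restrictions above amount to a small state check. The sketch below is an illustration, not the OVS action validator: struct pkt_state and both functions are hypothetical, and rejecting a pop when no Ethernet header is present is an added assumption that the message does not spell out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of what the validator tracks about a packet. */
struct pkt_state {
    bool has_eth;   /* an Ethernet header is present (L2 packet) */
    bool has_vlan;  /* a vlan tag is present */
};

/* Pushing is refused when an Ethernet header is already in front. */
int push_eth(struct pkt_state *s)
{
    if (s->has_eth)
        return -EINVAL;
    s->has_eth = true;
    return 0;
}

/* Popping is refused while a vlan tag is present, so an L3 packet
 * never carries a vlan tag. Refusing when there is no Ethernet
 * header at all is an assumption of this sketch. */
int pop_eth(struct pkt_state *s)
{
    if (!s->has_eth || s->has_vlan)
        return -EINVAL;
    s->has_eth = false;
    return 0;
}
```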

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab
Signed-off-by: Simon Horman
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
0a6410fbd openvswitch: netlink: support L3 packets

Extend the ovs flow netlink protocol to support L3 packets. Packets without
OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the
OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.

Push/pop vlan actions are only supported for Ethernet packets.
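The attribute rules in this message can be written down as a small classifier. This is a sketch under assumptions: struct key_attrs, classify_key() and vlan_action_allowed() are hypothetical names, and the two booleans stand for the presence of OVS_KEY_ATTR_ETHERNET and OVS_KEY_ATTR_ETHERTYPE in a parsed flow key.

```c
#include <errno.h>
#include <stdbool.h>

enum mac_proto { MAC_PROTO_NONE, MAC_PROTO_ETHERNET };

/* Hypothetical summary of which key attributes were supplied. */
struct key_attrs {
    bool has_ethernet;   /* OVS_KEY_ATTR_ETHERNET present */
    bool has_ethertype;  /* OVS_KEY_ATTR_ETHERTYPE present */
};

/* A key without the Ethernet attribute describes an L3 packet, and
 * such a key must carry the ethertype attribute. */
int classify_key(const struct key_attrs *a, enum mac_proto *out)
{
    if (a->has_ethernet) {
        *out = MAC_PROTO_ETHERNET;
        return 0;
    }
    if (!a->has_ethertype)
        return -EINVAL;
    *out = MAC_PROTO_NONE;
    return 0;
}

/* Push/pop vlan actions are accepted for Ethernet packets only. */
bool vlan_action_allowed(enum mac_proto proto)
{
    return proto == MAC_PROTO_ETHERNET;
}
```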

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab
Signed-off-by: Simon Horman
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
5108bbadd openvswitch: add processing of L3 packets

Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).

Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab
Signed-off-by: Simon Horman
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
1560a074d openvswitch: support MPLS push and pop for L3 packets

Update Ethernet header only if there is one.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
e2d9d8358 openvswitch: pass mac_proto to ovs_vport_send

We'll need it to alter packets sent to ARPHRD_NONE interfaces.

Change do_output() to use the actual L2 header size of the packet when
deciding on the minimum cutlen. The assumption here is that what matters is
not the output interface hard_header_len but rather the L2 header of the
particular packet. For example, ARPHRD_NONE tunnels that encapsulate
Ethernet should get at least the Ethernet header.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
329f45bc4 openvswitch: add mac_proto field to the flow key

Use a hole in the structure. We support only Ethernet so far and will add
support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.

It would be nice to use ARPHRD_ constants but those are u16 which would be
a waste. Thus define our own constants.

Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800
738314a08 openvswitch: use hard_header_len instead of hardcoded ETH_HLEN

On tx, use hard_header_len while deciding whether to refragment or drop the
packet. That way, all combinations are calculated correctly:

* L2 packet going to L2 interface (the L2 header len is subtracted),
* L2 packet going to L3 interface (the L2 header is included in the packet
length),
* L3 packet going to L3 interface.
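The three combinations listed above follow from a single subtraction. The sketch below is an illustration with hypothetical function names; it models only the length comparison, taking the output device's hard_header_len as a plain integer, and leaves out everything else the transmit path does.

```c
#include <stdbool.h>

/* Bytes that count against the MTU: the packet length minus the
 * link-layer header length of the output device, never below zero. */
int mtu_relevant_len(int pkt_len, int hard_header_len)
{
    int len = pkt_len - hard_header_len;

    return len > 0 ? len : 0;
}

/* True when the packet would need refragmenting or dropping. */
bool exceeds_mtu(int pkt_len, int hard_header_len, int mtu)
{
    return mtu_relevant_len(pkt_len, hard_header_len) > mtu;
}
```

For a 1514-byte Ethernet frame and an MTU of 1500: on an L2 interface (header length 14) the relevant length is 1500 and the frame fits; on an L3 interface (header length 0) the Ethernet header counts and the frame is too long. A 1500-byte L3 packet on an L3 interface fits.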

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-11-13 13:51:02 +0800

03 Nov, 2016

1 commit

08733a0cb netfilter: handle NF_REPEAT from nf_conntrack_in()

NF_REPEAT is only needed from nf_conntrack_in() under a very specific
case required by the TCP protocol tracker, we can handle this case
without returning to the core hook path. Handling of NF_REPEAT from the
nf_reinject() is left untouched.

Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2016-11-03 18:53:00 +0800

28 Oct, 2016

3 commits

56989f6d8 genetlink: mark families as __ro_after_init

Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.

In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've added __init annotations to clarify things), so all can
actually be marked __ro_after_init.

This protects the data structure from accidental corruption.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2016-10-28 04:16:09 +0800
489111e5c genetlink: statically initialize families

Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.

This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2016-10-28 04:16:09 +0800
a07ea4d99 genetlink: no longer support using static family IDs

Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.

Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)

Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2016-10-28 04:16:09 +0800

21 Oct, 2016

1 commit

91572088e net: use core MTU range checking in core net infra

geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function
- This one is also not as straight-forward and could use closer inspection
and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel
CC: Hannes Frederic Sowa
CC: Tom Herbert
CC: Daniel Borkmann
CC: Alexander Duyck
CC: Paolo Abeni
CC: Jiri Benc
CC: WANG Cong
CC: Roopa Prabhu
CC: Pravin B Shelar
CC: Sabrina Dubroca
CC: Patrick McHardy
CC: Stephen Hemminger
CC: Pravin Shelar
CC: Maxim Krasnyansky
Signed-off-by: Jarod Wilson
Signed-off-by: David S. Miller

Jarod Wilson
2016-10-21 02:51:09 +0800

20 Oct, 2016

2 commits

76e4cc773 openvswitch: remove unnecessary EXPORT_SYMBOLs

Some symbols exported to other modules are really used only by
openvswitch.ko. Remove the exports.

Tested by loading all 4 openvswitch modules, nothing breaks.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-10-20 03:11:55 +0800
f33eb0cf9 openvswitch: remove unused functions

ovs_vport_deferred_free is not used anywhere. It's the only caller of
free_vport_rcu thus this one can be removed, too.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-10-20 03:11:55 +0800

14 Oct, 2016

1 commit

8eed1cd4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2016-10-14 22:00:27 +0800

13 Oct, 2016

3 commits

3145c037e openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev

The internal device does support 802.1AD offloading since 018c1dda5ff1
("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
attributes").

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Eric Garver
Signed-off-by: David S. Miller

Jiri Benc
2016-10-13 22:03:23 +0800
72ec108d7 openvswitch: fix vlan subtraction from packet length

When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
header is not counted in skb->len. It doesn't make sense to subtract it.

Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Eric Garver
Signed-off-by: David S. Miller

Jiri Benc
2016-10-13 22:03:23 +0800
20ecf1e4e openvswitch: vlan: remove wrong likely statement

This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.

Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Acked-by: Eric Garver
Signed-off-by: David S. Miller

Jiri Benc
2016-10-13 22:03:23 +0800

12 Oct, 2016

1 commit

c66549ffd openvswitch: correctly fragment packet with mpls headers

If mpls headers were pushed to a defragmented packet, the refragmentation no
longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
network header has to be shifted after the mpls headers for the
fragmentation and restored afterwards.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2016-10-12 13:42:52 +0800

03 Oct, 2016

2 commits

85de4a210 openvswitch: use mpls_hdr

skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
instead.

Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-10-03 14:00:22 +0800
f7d49bce8 openvswitch: mpls: set network header correctly on key extract

After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when an MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2016-10-03 14:00:21 +0800

21 Sep, 2016

2 commits

2279994d0 openvswitch: avoid resetting flow key while installing new flow.

Since commit db74a3335e0f6 ("openvswitch: use percpu
flow stats"), flow alloc resets the flow-key. So there is no need
to reset the flow-key again if OVS is using a newly allocated
flow-key.

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

pravin shelar
2016-09-21 10:54:35 +0800
190aa3e77 openvswitch: Fix Frame-size larger than 1024 bytes warning.

There is no need to declare separate key on stack,
we can just use sw_flow->key to store the key directly.

This commit fixes following warning:

net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’:
net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes
is larger than 1024 bytes [-Wframe-larger-than=]

Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

pravin shelar
2016-09-21 10:54:35 +0800

19 Sep, 2016

2 commits

db74a3335 openvswitch: use percpu flow stats

Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.

On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.
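The layout idea behind per-CPU stats can be shown with a plain array. This is a deliberately simplified sketch: the slot count, struct names and functions are hypothetical, there is no locking or real per-CPU allocation as the kernel would use, and the CPU number is passed in by the caller.

```c
#define DEMO_NR_CPUS 4   /* hypothetical CPU count for the sketch */

struct flow_stats {
    unsigned long long packets;
    unsigned long long bytes;
};

/* One stats slot per CPU instead of one per NUMA node. */
struct flow {
    struct flow_stats per_cpu[DEMO_NR_CPUS];
};

/* Writers touch only the slot of the CPU they run on, so CPUs on the
 * same node no longer update one shared counter. */
void flow_stats_update(struct flow *f, int cpu, unsigned int len)
{
    f->per_cpu[cpu].packets++;
    f->per_cpu[cpu].bytes += len;
}

/* Readers pay the cost instead: they sum every slot. */
struct flow_stats flow_stats_read(const struct flow *f)
{
    struct flow_stats sum = { 0, 0 };

    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        sum.packets += f->per_cpu[cpu].packets;
        sum.bytes += f->per_cpu[cpu].bytes;
    }
    return sum;
}
```

The trade-off is more memory per flow and a slower read path in exchange for a write path that does not bounce a shared cache line between CPUs.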

This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there was no corruption on the slab cache.

Signed-off-by: Thadeu Lima de Souza Cascardo
Cc: pravin shelar
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-19 10:14:01 +0800
40773966c openvswitch: fix flow stats accounting when node 0 is not possible

On a system with only node 1 as possible, all statistics are going to be
accounted on node 0 as it will have a single writer.

However, when getting and clearing the statistics, node 0 is not going
to be considered, as it's not a possible node.

Tested that statistics are not zero on a system with only node 1
possible. Also compile-tested with CONFIG_NUMA off.

Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-19 10:14:01 +0800

16 Sep, 2016

1 commit

2679d0404 openvswitch: avoid deferred execution of recirc actions

The ovs kernel data path currently defers the execution of all
recirc actions until stack utilization is at a minimum.
This is too limiting for some packet forwarding scenarios due to
the small size of the deferred action FIFO (10 entries). For
example, broadcast traffic sent out more than 10 ports with
recirculation results in packet drops when the deferred action
FIFO becomes full, as reported here:

http://openvswitch.org/pipermail/dev/2016-March/067672.html

Since the current recursion depth is available (it is already tracked
by the exec_actions_level pcpu variable), we can use it to determine
whether to execute recirculation actions immediately (safe when
recursion depth is low) or defer execution until more stack space is
available.

With this change, the deferred action fifo size becomes a non-issue
for currently failing scenarios because it is no longer used when
there are three or fewer recursions through ovs_execute_actions().
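The decision described above reduces to a depth comparison in front of a bounded queue. The sketch below is an illustration only: the constant names, the enum and both functions are hypothetical, with the FIFO size (10) and the depth limit (3) taken from the message text rather than from the kernel source.

```c
#include <stdbool.h>

#define DEFERRED_FIFO_SIZE  10  /* FIFO size quoted in the message */
#define IMMEDIATE_DEPTH_MAX 3   /* "three or fewer recursions" */

enum recirc_mode { RECIRC_IMMEDIATE, RECIRC_DEFERRED };

/* Shallow recursion: run the recirculation now. Deep recursion: defer
 * it until the stack has unwound. */
enum recirc_mode choose_recirc_mode(int depth)
{
    return depth <= IMMEDIATE_DEPTH_MAX ? RECIRC_IMMEDIATE
                                        : RECIRC_DEFERRED;
}

/* Hypothetical bounded FIFO; only the fill level is modelled. */
struct deferred_fifo { int count; };

/* Returns false when the FIFO is full, which corresponds to the
 * packet drop described in the message. */
bool fifo_put(struct deferred_fifo *f)
{
    if (f->count >= DEFERRED_FIFO_SIZE)
        return false;
    f->count++;
    return true;
}
```

A broadcast to more than 10 ports at low recursion depth never touches the FIFO under this scheme, so the size limit only matters for deeply nested action execution.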

Suggested-by: Pravin Shelar
Signed-off-by: Lance Richardson
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Lance Richardson
2016-09-16 08:35:52 +0800

11 Sep, 2016

1 commit

ed227099d openvswitch: use alias for genetlink family names

When userspace tries to create datapaths and the module is not loaded,
it will simply fail. With this patch, the module will be automatically
loaded.

Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thadeu Lima de Souza Cascardo
2016-09-11 12:42:46 +0800

09 Sep, 2016

1 commit

018c1dda5 openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.

This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.

Signed-off-by: Thomas F Herbert
Signed-off-by: Eric Garver
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Eric Garver
2016-09-09 08:10:28 +0800

05 Sep, 2016

1 commit

76644232e openvswitch: Free tmpl with tmpl_free.

When an error occurs during conntrack template creation as part of
actions validation, we need to free the template. Previously we've been
using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate.

Signed-off-by: Joe Stringer
Signed-off-by: David S. Miller

Joe Stringer
2016-09-05 02:38:10 +0800

31 Aug, 2016

1 commit

48d2ab609 net: mpls: Fixups for GSO

As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. the inner protocol is not set so the gso segment functions for inner
protocol layers are not getting run, and

2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol, addressing item 1 above, and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.

After the inner protocol segmentation is done, the skb protocol
is set to mpls for each segment and the network and mac headers
are restored.

Reported-by: Lennert Buytenhek
Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-08-31 13:27:18 +0800

11 Aug, 2016

1 commit

4b5b9ba55 openvswitch: do not ignore netdev errors when creating tunnel vports

The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.

For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored giving the
appearance that the tunnel vport creation completed successfully.

Signed-off-by: Martynas Pumputis
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Martynas Pumputis
2016-08-11 14:13:23 +0800

06 Aug, 2016

1 commit

5ef9f289c OVS: Ignore negative headroom value

net_device->ndo_set_rx_headroom (introduced in
871b642adebe300be2e50aa5f65a418510f636ec) says

"Setting a negtaive value reset the rx headroom
to the default value".

It seems that the OVS implementation in
3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
dev->needed_headroom unconditionally.
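Why an unconditional store is harmful can be seen with a two-function sketch. The names here are hypothetical, the struct models only an unsigned needed_headroom field, and treating 0 as the default headroom is an assumption of the sketch rather than something the message states.

```c
/* Hypothetical device with an unsigned headroom field. */
struct mock_dev {
    unsigned short needed_headroom;
};

/* Unchecked: a negative request is converted to unsigned and becomes
 * a very large headroom value. */
void set_headroom_unchecked(struct mock_dev *dev, int new_hr)
{
    dev->needed_headroom = (unsigned short)new_hr;
}

/* Checked: a negative request means "reset to default", which this
 * sketch assumes to be 0. */
void set_headroom_checked(struct mock_dev *dev, int new_hr)
{
    if (new_hr < 0)
        new_hr = 0;
    dev->needed_headroom = (unsigned short)new_hr;
}
```

Passing -1 to the unchecked variant leaves the largest representable headroom in the field, which is the kind of value that can distort later reserved-space calculations.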

This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.

Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.

Thanks to Ben for some pointers from the crash dumps!

Cc: Benjamin Poirier
Cc: Paolo Abeni
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
Signed-off-by: Ian Wienand
Signed-off-by: David S. Miller

Ian Wienand
2016-08-06 12:06:11 +0800

04 Aug, 2016

1 commit

bce91f8a4 openvswitch: Remove incorrect WARN_ONCE().

ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted in somewhat confusing kernel log messages. This patch
removes this check and warning.

Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
Suggested-by: Joe Stringer
Signed-off-by: Jarno Rajahalme
Acked-by: Joe Stringer
Signed-off-by: David S. Miller

Jarno Rajahalme
2016-08-04 02:50:40 +0800