Eric Lee / smarc-fsl-linux-kernel

27 Nov, 2017

2 commits

67c8d22a7 openvswitch: fix the incorrect flow action alloc size ... Browse Code »

If we want to add a datapath flow, which has more than 500 vxlan outputs'
action, we will get the following error reports:
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Actions may not be safe on all matching packets
... ...

It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
this is not the root cause. For example, for a vxlan output action, we need
about 60 bytes for the nlattr, but after it is converted to the flow
action, it only occupies 24 bytes. This means that we can still support
more than 1000 vxlan output actions for a single datapath flow under the
the current 32k max limitation.

So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
shouldn't report EINVAL and keep it move on, as the judgement can be
done by the reserve_sfa_size.

Signed-off-by: zhangliping
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

zhangliping
2017-11-27 07:34:59 +0800
2734166e8 net: openvswitch: datapath: fix data type in queue_gso_packets ... Browse Code »

gso_type is being used in binary AND operations together with SKB_GSO_UDP.
The issue is that variable gso_type is of type unsigned short and
SKB_GSO_UDP expands to more than 16 bits:

SKB_GSO_UDP = 1 << 16

this makes any binary AND operation between gso_type and SKB_GSO_UDP to
be always zero, hence making some code unreachable and likely causing
undesired behavior.

Fix this by changing the data type of variable gso_type to unsigned int.

Addresses-Coverity-ID: 1462223
Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
Signed-off-by: Gustavo A. R. Silva
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2017-11-27 01:15:33 +0800

24 Nov, 2017

1 commit

0c19f846d net: accept UFO datagrams from tuntap and packet ... Browse Code »

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.

A UFO packet sent from the guest appears on the tap device:
host:
nc -l -p -u 8000 &
tcpdump -n -i tap0

guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt

Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:

./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure

Link: http://lkml.kernel.org/r/
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek
Signed-off-by: Willem de Bruijn
Acked-by: Jason Wang
Signed-off-by: David S. Miller

Willem de Bruijn
2017-11-24 00:37:35 +0800

15 Nov, 2017

1 commit

b74912a2f openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start ... Browse Code »

It seems that the intention of the code is to null check the value
returned by function genlmsg_put. But the current code is null
checking the address of the pointer that holds the value returned
by genlmsg_put.

Fix this by properly null checking the value returned by function
genlmsg_put in order to avoid a pontential null pointer dereference.

Addresses-Coverity-ID: 1461561 ("Dereference before null check")
Addresses-Coverity-ID: 1461562 ("Dereference null return value")
Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2017-11-15 13:16:07 +0800

14 Nov, 2017

3 commits

6dc14dc40 openvswitch: Using kfree_rcu() to simplify the code ... Browse Code »

The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Wei Yongjun
2017-11-14 21:02:08 +0800
06c2351fd openvswitch: Make local function ovs_nsh_key_attr_size() static ... Browse Code »

Fixes the following sparse warnings:

net/openvswitch/flow_netlink.c:340:8: warning:
symbol 'ovs_nsh_key_attr_size' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Wei Yongjun
2017-11-14 21:02:08 +0800
8a860c2bc openvswitch: Fix return value check in ovs_meter_cmd_features() ... Browse Code »

In case of error, the function ovs_meter_cmd_reply_start() returns
ERR_PTR() not NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Wei Yongjun
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Wei Yongjun
2017-11-14 21:02:08 +0800

13 Nov, 2017

3 commits

cd8a6c336 openvswitch: Add meter action support ... Browse Code »

Implements OVS kernel meter action support.

Signed-off-by: Andy Zhou
Signed-off-by: David S. Miller

Andy Zhou
2017-11-13 09:37:07 +0800
96fbc13d7 openvswitch: Add meter infrastructure ... Browse Code »

OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.

Signed-off-by: Andy Zhou
Signed-off-by: David S. Miller

Andy Zhou
2017-11-13 09:37:07 +0800
9602c01e5 openvswitch: export get_dp() API. ... Browse Code »

Later patches will invoke get_dp() outside of datapath.c. Export it.

Signed-off-by: Andy Zhou
Signed-off-by: David S. Miller

Andy Zhou
2017-11-13 09:37:07 +0800

08 Nov, 2017

1 commit

b2d0f5d5d openvswitch: enable NSH support ... Browse Code »

v16->17
- Fixed disputed check code: keep them in nsh_push and nsh_pop
but also add them in __ovs_nla_copy_actions

v15->v16
- Add csum recalculation for nsh_push, nsh_pop and set_nsh
pointed out by Pravin
- Move nsh key into the union with ipv4 and ipv6 and add
check for nsh key in match_validate pointed out by Pravin
- Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
- Check size in nsh_hdr_from_nlattr
- Fixed four small issues pointed out By Jiri and Eric

v13->v14
- Rename skb_push_nsh to nsh_push per Dave's comment
- Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
- Fix NSH header length check in set_nsh

v11->v12
- Fix missing changes old comments pointed out
- Fix new comments for v11

v10->v11
- Fix the left three disputable comments for v9
but not fixed in v10.

v9->v10
- Change struct ovs_key_nsh to
struct ovs_nsh_key_base base;
__be32 context[NSH_MD1_CONTEXT_SIZE];
- Fix new comments for v9

v8->v9
- Fix build error reported by daily intel build
because nsh module isn't selected by openvswitch

v7->v8
- Rework nested value and mask for OVS_KEY_ATTR_NSH
- Change pop_nsh to adapt to nsh kernel module
- Fix many issues per comments from Jiri Benc

v6->v7
- Remove NSH GSO patches in v6 because Jiri Benc
reworked it as another patch series and they have
been merged.
- Change it to adapt to nsh kernel module added by NSH
GSO patch series

v5->v6
- Fix the rest comments for v4.
- Add NSH GSO support for VxLAN-gpe + NSH and
Eth + NSH.

v4->v5
- Fix many comments by Jiri Benc and Eric Garver
for v4.

v3->v4
- Add new NSH match field ttl
- Update NSH header to the latest format
which will be final format and won't change
per its author's confirmation.
- Fix comments for v3.

v2->v3
- Change OVS_KEY_ATTR_NSH to nested key to handle
length-fixed attributes and length-variable
attriubte more flexibly.
- Remove struct ovs_action_push_nsh completely
- Add code to handle nested attribute for SET_MASKED
- Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
to transfer NSH header data.
- Fix comments and coding style issues by Jiri and Eric

v1->v2
- Change encap_nsh and decap_nsh to push_nsh and pop_nsh
- Dynamically allocate struct ovs_action_push_nsh for
length-variable metadata.

OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.

Signed-off-by: Yi Yang
Acked-by: Jiri Benc
Acked-by: Eric Garver
Acked-by: Pravin Shelar
Signed-off-by: David S. Miller

Yi Yang
2017-11-08 15:12:33 +0800

05 Nov, 2017

1 commit

9354d4520 openvswitch: reliable interface indentification in port dumps ... Browse Code »

This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.

Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed and generally
used in other netlink APIs, is the pair ifindex+netnsid.

The solution is addition of the ifindex+netnsid pair (or only ifindex if in
the same name space) to vport get/dump operation.

On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2017-11-05 20:49:17 +0800

04 Nov, 2017

1 commit

2a171788b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2017-11-04 08:26:51 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

22 Oct, 2017

1 commit

279badc2a openvswitch: conntrack: mark expected switch fall-through ... Browse Code »

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed a "fall through" comment on
its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller

Gustavo A. R. Silva
2017-10-22 09:01:26 +0800

11 Oct, 2017

1 commit

b8226962b openvswitch: add ct_clear action ... Browse Code »

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.

Signed-off-by: Eric Garver
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Eric Garver
2017-10-11 07:38:34 +0800

10 Oct, 2017

1 commit

ceaa001a1 openvswitch: Add erspan tunnel support. ... Browse Code »

Add erspan netlink interface for OVS.

Signed-off-by: William Tu
Cc: Pravin B Shelar
Signed-off-by: David S. Miller

William Tu
2017-10-10 11:45:50 +0800

05 Oct, 2017

1 commit

42ab19ee9 net: Add extack to upper device linking ... Browse Code »

Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2017-10-05 12:39:33 +0800

13 Sep, 2017

1 commit

5829e62ac openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()' ... Browse Code »

All other error handling paths in this function go through the 'error'
label. This one should do the same.

Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Christophe JAILLET
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Christophe JAILLET
2017-09-13 11:37:31 +0800

04 Sep, 2017

1 commit

b63f6044d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and asorted improvements for the Netfilter
codebase. More specifically, they are:

1) Add expection to hashes after timer initialization to prevent
access from another CPU that walks on the hashes and calls
del_timer(), from Florian Westphal.

2) Don't update nf_tables chain counters from hot path, this is only
used by the x_tables compatibility layer.

3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
Hooks are always guaranteed to run from rcu read side, so remove
nested rcu_read_lock() where possible. Patch from Taehee Yoo.

4) nf_tables new ruleset generation notifications include PID and name
of the process that has updated the ruleset, from Phil Sutter.

5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
the nf_family netdev family. Patch from Pablo M. Bermudo.

6) Add support for nft_fib in nf_tables netdev family, also from Pablo.

7) Use deferrable workqueue for conntrack garbage collection, to reduce
power consumption, from Patch from Subash Abhinov Kasiviswanathan.

8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
Westphal.

9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.

10) Drop references on conntrack removal path when skbuffs has escaped via
nfqueue, from Florian.

11) Don't queue packets to nfqueue with dying conntrack, from Florian.

12) Constify nf_hook_ops structure, from Florian.

13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter.

14) Add nla_strdup(), from Phil Sutter.

15) Rise nf_tables objects name size up to 255 chars, people want to use
DNS names, so increase this according to what RFC 1035 specifies.
Patch series from Phil Sutter.

16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
registration on demand, suggested by Eric Dumazet, patch from Florian.

17) Remove unused variables in compat_copy_entry_from_user both in
ip_tables and arp_tables code. Patch from Taehee Yoo.

18) Constify struct nf_conntrack_l4proto, from Julia Lawall.

19) Constify nf_loginfo structure, also from Julia.

20) Use a single rb root in connlimit, from Taehee Yoo.

21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.

22) Use audit_log() instead of open-coding it, from Geliang Tang.

23) Allow to mangle tcp options via nft_exthdr, from Florian.

24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
a fix for a miscalculation of the minimal length.

25) Simplify branch logic in h323 helper, from Nick Desaulniers.

26) Calculate netlink attribute size for conntrack tuple at compile
time, from Florian.

27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
From Florian.

28) Remove holes in nf_conntrack_l4proto structure, so it becomes
smaller. From Florian.

29) Get rid of print_tuple() indirection for /proc conntrack listing.
Place all the code in net/netfilter/nf_conntrack_standalone.c.
Patch from Florian.

30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
off. From Florian.

31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
Florian.

32) Fix broken indentation in ebtables extensions, from Colin Ian King.

33) Fix several harmless sparse warning, from Florian.

34) Convert netfilter hook infrastructure to use array for better memory
locality, joint work done by Florian and Aaron Conole. Moreover, add
some instrumentation to debug this.

35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
per batch, from Florian.

36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.

37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.

38) Remove unused code in the generic protocol tracker, from Davide
Caratti.

I think I will have material for a second Netfilter batch in my queue if
time allow to make it fit in this merge window.
====================

Signed-off-by: David S. Miller

David S. Miller
2017-09-04 08:08:42 +0800

25 Aug, 2017

1 commit

b3480fe05 netfilter: conntrack: make protocol tracker pointers const ... Browse Code »

Doesn't change generated code, but will make it easier to eventually
make the actual trackers themselvers const.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2017-08-25 00:52:33 +0800

22 Aug, 2017

1 commit

e2a7c34fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2017-08-22 08:06:42 +0800

17 Aug, 2017

1 commit

494bea39f openvswitch: fix skb_panic due to the incorrect actions attrlen ... Browse Code »

For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:

[] skb_put+0x43/0x44
[] skb_zerocopy+0x6c/0x1f4
[] queue_userspace_packet+0x3a3/0x448 [openvswitch]
[] ovs_dp_upcall+0x30/0x5c [openvswitch]
[] output_userspace+0x132/0x158 [openvswitch]
[] ? ip6_rcv_finish+0x74/0x77 [ipv6]
[] do_execute_actions+0xcc1/0xdc8 [openvswitch]
[] ovs_execute_actions+0x74/0x106 [openvswitch]
[] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
[] ? key_extract+0x63c/0x8d5 [openvswitch]
[] ovs_vport_receive+0xa1/0xc3 [openvswitch]
[...]

Also we can find that the actions_len is much little than the orig_len:
crash> struct sw_flow_actions 0xffff8812f539d000
struct sw_flow_actions {
rcu = {
next = 0xffff8812f5398800,
func = 0xffffe3b00035db32
},
orig_len = 1384,
actions_len = 592,
actions = 0xffff8812f539d01c
}

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee
Signed-off-by: Liping Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Liping Zhang
2017-08-17 05:12:37 +0800

12 Aug, 2017

1 commit

0ed80da51 openvswitch: Remove unnecessary newlines from OVS_NLERR uses ... Browse Code »

OVS_NLERR already adds a newline so these just add blank
lines to the logging.

Signed-off-by: Joe Perches
Acked-by: Joe Stringer
Signed-off-by: David S. Miller

Joe Perches
2017-08-12 05:52:47 +0800

02 Aug, 2017

1 commit

29fda25a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).

Signed-off-by: David S. Miller

David S. Miller
2017-08-02 01:07:50 +0800

25 Jul, 2017

1 commit

69ec932e3 openvswitch: fix potential out of bound access in parse_ct ... Browse Code »

Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Liping Zhang
2017-07-25 07:25:06 +0800

21 Jul, 2017

1 commit

7a68ada6e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2017-07-21 10:38:43 +0800

20 Jul, 2017

2 commits

c4b2bf6b4 openvswitch: Optimize operations for OvS flow_stats. ... Browse Code »

When calling the flow_free() to free the flow, we call many times
(cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
take up our CPU usage if we call the flow_free() frequently.
When we put all packets to userspace via upcall, and OvS will send
them back via netlink to ovs_packet_cmd_execute(will call flow_free).

The test topo is shown as below. VM01 sends TCP packets to VM02,
and OvS forward packtets. When testing, we use perf to report the
system performance.

VM01 --- OvS-VM --- VM02

Without this patch, perf-top show as below: The flow_free() is
3.02% CPU usage.

4.23% [kernel] [k] _raw_spin_unlock_irqrestore
3.62% [kernel] [k] __do_softirq
3.16% [kernel] [k] __memcpy
3.02% [kernel] [k] flow_free
2.42% libc-2.17.so [.] __memcpy_ssse3_back
2.18% [kernel] [k] copy_user_generic_unrolled
2.17% [kernel] [k] find_next_bit

When applied this patch, perf-top show as below: Not shown on
the list anymore.

4.11% [kernel] [k] _raw_spin_unlock_irqrestore
3.79% [kernel] [k] __do_softirq
3.46% [kernel] [k] __memcpy
2.73% libc-2.17.so [.] __memcpy_ssse3_back
2.25% [kernel] [k] copy_user_generic_unrolled
1.89% libc-2.17.so [.] _int_malloc
1.53% ovs-vswitchd [.] xlate_actions

With this patch, the TCP throughput(we dont use Megaflow Cache
+ Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
(maybe ~10% performance imporve).

This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
that the flow used. And we only check the flow_stats on the cpu we
used, and it is unncessary to check all possible cpu when getting,
cleaning, and updating the flow_stats. Adding the cpu_used_mask to
sw_flow struct does’t increase the cacheline number.

Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-20 04:49:39 +0800
c57c054eb openvswitch: Optimize updating for OvS flow_stats. ... Browse Code »

In the ovs_flow_stats_update(), we only use the node
var to alloc flow_stats struct. But this is not a
common case, it is unnecessary to call the numa_node_id()
everytime. This patch is not a bugfix, but there maybe
a small increase.

Signed-off-by: Tonghao Zhang
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-20 04:49:39 +0800

18 Jul, 2017

1 commit

880388aa3 net: Remove all references to SKB_GSO_UDP. ... Browse Code »

Such packets are no longer possible.

Signed-off-by: David S. Miller

David S. Miller
2017-07-18 00:52:58 +0800

16 Jul, 2017

1 commit

8b97ac5bd openvswitch: Fix for force/commit action failures ... Browse Code »

When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer
Signed-off-by: Greg Rose
Signed-off-by: David S. Miller

Greg Rose
2017-07-16 05:40:21 +0800

03 Jul, 2017

1 commit

52427fa06 openvswitch: fix mis-ordered comment lines for ovs_skb_cb ... Browse Code »

I was trying to wrap my head around meaning of mru, and realised
that the second line of the comment defining it had somehow
ended up after the line defining cutlen, leading to much confusion.

Reorder the lines to make sense.

Signed-off-by: Daniel Axtens
Signed-off-by: David S. Miller

Daniel Axtens
2017-07-03 20:54:00 +0800

02 Jul, 2017

1 commit

9cc9a5cb1 datapath: Avoid using stack larger than 1024. ... Browse Code »

When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:

CC [M] /root/ovs/datapath/linux/datapath.o
/root/ovs/datapath/linux/datapath.c: In function
'ovs_flow_cmd_set':
/root/ovs/datapath/linux/datapath.c:1221:1: warning:
the frame size of 1040 bytes is larger than 1024 bytes
[-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid
"Wframe-larger-than=1024" warning. Because mask is only
used to get actions, we new a function to save some
stack space.

Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Tonghao Zhang
2017-07-02 00:09:59 +0800

25 Jun, 2017

1 commit

3fcece12b net: store port/representator id in metadata_dst ... Browse Code »

Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id. This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission. The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski
Signed-off-by: Sridhar Samudrala
Signed-off-by: Simon Horman
Signed-off-by: David S. Miller

Jakub Kicinski
2017-06-25 23:42:01 +0800

21 Jun, 2017

1 commit

dc5321d79 vxlan: get rid of redundant vxlan_dev.flags ... Browse Code »

There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer
Signed-off-by: David S. Miller

Matthias Schiffer
2017-06-21 01:37:02 +0800

16 Jun, 2017

1 commit

b080db585 networking: convert many more places to skb_put_zero() ... Browse Code »

There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_zero(skb, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(skb_put(skb, len), 0, len);
+skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-06-16 23:48:35 +0800

15 Jun, 2017

1 commit

0ddead90b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller

David S. Miller
2017-06-15 23:59:32 +0800

08 Jun, 2017

1 commit

cf124db56 net: Fix inconsistent teardown and release of private netdev state. ... Browse Code »

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller

David S. Miller
2017-06-08 03:53:24 +0800

23 May, 2017

1 commit

218b6a5b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2017-05-23 11:32:48 +0800

20 May, 2017

1 commit

7529390d0 openvswitch: more accurate checksumming in queue_userspace_packet() ... Browse Code »

if skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs
CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid
corrupting such packets while queueing them towards userspace.

Signed-off-by: Davide Caratti
Signed-off-by: David S. Miller

Davide Caratti
2017-05-20 07:21:29 +0800