Eric Lee / smarc-fsl-linux-kernel

27 Oct, 2015

1 commit

bc845f677 ovs: do not allocate memory from offline numa node

[ Upstream commit 598c12d0ba6de9060f04999746eb1e015774044b ]

When openvswitch tries to allocate memory from offline numa node 0:
stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
[ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
This patch disables numa affinity in this case.

Signed-off-by: Konstantin Khlebnikov
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Konstantin Khlebnikov
2015-10-27 08:51:50 +0800
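
The fallback described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: `pick_alloc_node()` and the `online` map are hypothetical stand-ins for the kernel's `node_online()` check, and `NUMA_NO_NODE` is redefined locally with the value -1 that the kernel uses to mean "no node preference".

```c
#include <stdbool.h>

/* Local stand-in for the kernel constant meaning "no NUMA affinity". */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical helper: return the preferred node if it is online,
 * otherwise fall back to NUMA_NO_NODE so the allocator may use any node.
 * `online` is an assumed per-node online map with `nr_nodes` entries.
 */
static int pick_alloc_node(const bool *online, int nr_nodes, int preferred)
{
	if (preferred >= 0 && preferred < nr_nodes && online[preferred])
		return preferred;
	return NUMA_NO_NODE;
}
```

The result would then be passed as the node argument of a node-aware allocator, so an offline node 0 no longer trips the online check.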

03 Oct, 2015

1 commit

6d80e3507 openvswitch: Zero flows on allocation.

[ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ]

When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.

While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.

In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.

This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targeting per-packet flow operations.

Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Jesse Gross
2015-10-03 19:49:16 +0800
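
A small userspace sketch may help show why the partial-masking optimization leaves stale bytes behind. The function names and the flat byte-array key layout are assumptions made for illustration; the kernel's flow key structures are more involved.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Models the optimization: only bytes in [start, end) are written.
 * Bytes of dst outside that range keep whatever they held before.
 */
static void mask_key_range(uint8_t *dst, const uint8_t *src,
			   const uint8_t *mask, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		dst[i] = src[i] & mask[i];
}

/*
 * Models the behaviour for installed flows: every byte of dst is
 * written, so bytes with a zero mask become zero, not stale memory.
 */
static void mask_key_full(uint8_t *dst, const uint8_t *src,
			  const uint8_t *mask, size_t len)
{
	mask_key_range(dst, src, mask, 0, len);
}
```

With the range variant, anything later serialized from `dst` outside the masked range reflects prior buffer contents, which is the leak the commit message describes.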

04 Jun, 2015

1 commit

640b2b107 openvswitch: disable LRO

Currently, openvswitch tries to disable LRO from the user space. This does
not work correctly when the device added is a vlan interface, though.
Instead of dealing with possibly complex stacked cross name space relations
in the user space, do the same as bridging does and call dev_disable_lro in
the kernel.

Signed-off-by: Jiri Benc
Acked-by: Flavio Leitner
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jiri Benc
2015-06-04 10:39:35 +0800

16 Apr, 2015

1 commit

6c373ca89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) Add BQL support to via-rhine, from Tino Reichardt.

2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
can support hw switch offloading. From Florian Fainelli.

3) Allow 'ip address' commands to initiate multicast group join/leave,
from Madhu Challa.

4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
Borkmann.

6) Remove the ugly compat support in ARP for ugly layers like ax25,
rose, etc. And use this to clean up the neigh layer, then use it to
implement MPLS support. All from Eric Biederman.

7) Support L3 forwarding offloading in switches, from Scott Feldman.

8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
up route lookups even further. From Alexander Duyck.

9) Many improvements and bug fixes to the rhashtable implementation,
from Herbert Xu and Thomas Graf. In particular, in the case where
an rhashtable user bulk adds a large number of items into an empty
table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
established in the main hash table. Much less false sharing since
hash lookups go direct to the request sockets instead of having to
go first to the listener then to the request socks hashed
underneath. From Eric Dumazet.

13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6. From
Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
fm10k: Bump driver version to 0.15.2
fm10k: corrected VF multicast update
fm10k: mbx_update_max_size does not drop all oversized messages
fm10k: reset head instead of calling update_max_size
fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
fm10k: update xcast mode before synchronizing multicast addresses
fm10k: start service timer on probe
fm10k: fix function header comment
fm10k: comment next_vf_mbx flow
fm10k: don't handle mailbox events in iov_event path and always process mailbox
fm10k: use separate workqueue for fm10k driver
fm10k: Set PF queues to unlimited bandwidth during virtualization
fm10k: expose tx_timeout_count as an ethtool stat
fm10k: only increment tx_timeout_count in Tx hang path
fm10k: remove extraneous "Reset interface" message
fm10k: separate PF only stats so that VF does not display them
fm10k: use hw->mac.max_queues for stats
fm10k: only show actual queues, not the maximum in hardware
fm10k: allow creation of VLAN on default vid
fm10k: fix unused warnings
...

Linus Torvalds
2015-04-16 00:00:47 +0800

15 Apr, 2015

1 commit

4167e9b2c mm: remove GFP_THISNODE

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected. It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE. The problem doesn't end there,
however, because even that was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWARN
are set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity. Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should. It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.

Signed-off-by: David Rientjes
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Joonsoo Kim
Acked-by: Johannes Weiner
Cc: Mel Gorman
Cc: Pravin Shelar
Cc: Jarno Rajahalme
Cc: Li Zefan
Cc: Greg Thelen
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2015-04-15 07:49:03 +0800

08 Apr, 2015

1 commit

79b16aade udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().

That way we can make sure the output path of ipv4/ipv6 operates on
the UDP socket rather than whatever random thing happens to be in
skb->sk.

Based upon a patch by Jiri Pirko.

Signed-off-by: David S. Miller
Acked-by: Hannes Frederic Sowa

David Miller
2015-04-08 03:29:08 +0800

03 Apr, 2015

1 commit

9f0d34bc3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/usb/asix_common.c
drivers/net/usb/sr9800.c
drivers/net/usb/usbnet.c
include/linux/usb/usbnet.h
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller

David S. Miller
2015-04-03 04:16:53 +0800

01 Apr, 2015

3 commits

fa2d8ff4e openvswitch: Return vport module ref before destruction

Return module reference before invoking the respective vport
->destroy() function. This is needed as ovs_vport_del() is not
invoked inside an RCU read side critical section so the kfree
can occur immediately before returning to ovs_vport_del().

Returning the module reference before ->destroy() is safe because
the module unregistration is blocked on ovs_lock which we hold
while destroying the datapath.

Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules")
Reported-by: Pravin Shelar
Signed-off-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thomas Graf
2015-04-01 03:59:50 +0800
67b61f6c1 netlink: implement nla_get_in_addr and nla_get_in6_addr

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800
930345ea6 netlink: implement nla_put_in_addr and nla_put_in6_addr

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-04-01 01:58:35 +0800
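
As a rough picture of what the put/get helper pair from the two netlink commits above does, here is a self-contained userspace sketch that stores an IPv4 address as a length/type/value attribute and reads it back. The `sketch_*` names and the simplified header are assumptions for illustration; they are not the kernel helpers, which operate on socket buffers.

```c
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header + payload) and type. */
struct sketch_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/*
 * Append an IPv4 address (already in network byte order) as one attribute.
 * Returns the number of bytes written, or -1 if the buffer is too small.
 */
static int sketch_put_in_addr(uint8_t *buf, size_t buflen,
			      uint16_t type, uint32_t addr_be)
{
	struct sketch_nlattr hdr;

	if (buflen < sizeof(hdr) + sizeof(addr_be))
		return -1;
	hdr.nla_len = (uint16_t)(sizeof(hdr) + sizeof(addr_be));
	hdr.nla_type = type;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), &addr_be, sizeof(addr_be));
	return hdr.nla_len;
}

/* Read the address back; the caller must have validated the length. */
static uint32_t sketch_get_in_addr(const uint8_t *attr)
{
	uint32_t addr_be;

	memcpy(&addr_be, attr + sizeof(struct sketch_nlattr), sizeof(addr_be));
	return addr_be;
}
```

Passing the address as a plain 32-bit big-endian value mirrors the commit's remark that `__be32` is what most callers already hold.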

13 Mar, 2015

2 commits

0c5c9fb55 net: Introduce possible_net_t

Having to say
> #ifdef CONFIG_NET_NS
> struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
> struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman"
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 02:39:40 +0800
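
The possible_net_t idea from the commit above can be tried outside the kernel with a compact sketch. `struct net`, `init_net`, and `struct sketch_device` are stubs invented here so the example compiles on its own, and `CONFIG_NET_NS` is simply assumed to be enabled.

```c
#define CONFIG_NET_NS 1		/* assumed enabled for this sketch */

struct net { int id; };		/* stub; the real struct net is far larger */
struct net init_net;		/* stand-in for the initial namespace */

typedef struct {
#ifdef CONFIG_NET_NS
	struct net *net;
#endif
} possible_net_t;

/* Always a real function, so a typo in a caller is a compile error. */
static inline void write_pnet(possible_net_t *pnet, struct net *net)
{
#ifdef CONFIG_NET_NS
	pnet->net = net;
#else
	(void)pnet;
	(void)net;
#endif
}

static inline struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
	return pnet->net;
#else
	(void)pnet;
	return &init_net;
#endif
}

/* A structure embeds the wrapper unconditionally, with no #ifdef. */
struct sketch_device {
	possible_net_t nd_net;
};
```

Because the member always exists, code that touches it is type-checked in every configuration, which is the testing benefit the commit message points out.
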
efd7ef1c1 net: Kill hold_net release_net

hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008. Kill the code; it is long past due.

Signed-off-by: "Eric W. Biederman"
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-13 02:39:40 +0800

09 Mar, 2015

1 commit

7d5f41f27 mpls: Fix the openvswitch select of NET_MPLS_GSO

Fix the OPENVSWITCH Kconfig option and old Kconfigs by having
OPENVSWITCH select both NET_MPLS_GSO and MPLS.

A Kbuild test robot reported that when NET_MPLS_GSO is selected by
OPENVSWITCH the generated .config is broken because MPLS is not
selected.

Cc: Simon Horman
Fixes: cec9166ca4e mpls: Refactor how the mpls module is built
Reported-by: kbuild test robot
Signed-off-by: "Eric W. Biederman"
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller

Eric W. Biederman
2015-03-09 07:30:06 +0800

04 Mar, 2015

1 commit

f4f8e7385 openvswitch: Fix serialization of non-masked set actions.

Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
back to regular set actions, the inner attribute length was not changed,
i.e., double the length was being serialized. This patch fixes the bug.

Fixes: 83d2b9b ("net: openvswitch: Support masked set actions.")
Signed-off-by: Joe Stringer
Acked-by: Jarno Rajahalme
Signed-off-by: David S. Miller

Joe Stringer
2015-03-04 03:38:57 +0800

21 Feb, 2015

1 commit

7b4577a9d openvswitch: Fix net exit.

Open vSwitch allows moving an internal vport to a different namespace
while it is still connected to the bridge. But when the namespace is
deleted, OVS does not detach these vports, which results in a dangling
pointer to the netdevice and causes a kernel panic as follows.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
[] lookup_vport+0x21/0xd0 [openvswitch]
[] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
[] genl_family_rcv_msg+0x1bc/0x3e0
[] genl_rcv_msg+0x79/0xc0
[] netlink_rcv_skb+0xb9/0xe0
[] genl_rcv+0x2c/0x40
[] netlink_unicast+0x12d/0x1c0
[] netlink_sendmsg+0x34a/0x6b0
[] sock_sendmsg+0xa0/0xe0
[] ___sys_sendmsg+0x408/0x420
[] __sys_sendmsg+0x51/0x90
[] SyS_sendmsg+0x12/0x20
[] system_call_fastpath+0x12/0x17

Reported-by: Assaf Muller
Fixes: 46df7b81454 ("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar
Reviewed-by: Thomas Graf
Signed-off-by: David S. Miller

Pravin B Shelar
2015-02-21 04:32:08 +0800

15 Feb, 2015

1 commit

26ad0b835 openvswitch: Fix key serialization.

Fix typo where mask is used rather than key.

Fixes: 74ed7ab9264 ("openvswitch: Add support for unique flow IDs.")
Reported-by: Joe Stringer
Signed-off-by: Pravin B Shelar
Acked-by: Joe Stringer
Signed-off-by: David S. Miller

Pravin B Shelar
2015-02-15 12:20:40 +0800

12 Feb, 2015

2 commits

13101602c openvswitch: Add missing initialization in validate_and_copy_set_tun()

net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function

If ipv4_tun_from_nlattr() returns a different positive value than
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
validate_and_copy_set_tun() may return an undefined value instead of a
zero success indicator. Initialize err to zero to fix this.

Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension")
Signed-off-by: Geert Uytterhoeven
Acked-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Geert Uytterhoeven
2015-02-12 06:40:15 +0800
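
The uninitialized-return pattern fixed by the commit above is easy to reproduce in miniature. The sketch below uses invented names (`sketch_validate_and_copy()`, the `SKETCH_OPT_*` values) and only mirrors the control flow the commit message describes; it is not the kernel function.

```c
/* Hypothetical option types returned by a parser on success. */
enum sketch_opt {
	SKETCH_OPT_GENEVE = 1,
	SKETCH_OPT_VXLAN = 2
};

/* Stand-in for option validation; always succeeds in this sketch. */
static int sketch_validate_geneve(void)
{
	return 0;
}

static int sketch_validate_and_copy(int parse_result)
{
	int err = 0;	/* the fix: start from the success value */

	if (parse_result < 0)
		return parse_result;	/* propagate parse errors */

	switch (parse_result) {
	case SKETCH_OPT_GENEVE:
		err = sketch_validate_geneve();
		break;
	default:
		/* other positive values: nothing to validate, err stays 0 */
		break;
	}
	return err;
}
```

Without the initializer, the default branch would return whatever happened to be in `err`, which is what the compiler warning quoted in the commit flags.
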
b35725a28 openvswitch: Reset key metadata for packet execution.

The userspace packet execute command passes down the flow key for the
given packet. But userspace can skip some parameters with zero value.
Therefore the kernel needs to initialize the key metadata to zero.

Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Signed-off-by: Pravin B Shelar
Signed-off-by: David S. Miller

Pravin B Shelar
2015-02-12 06:40:15 +0800

10 Feb, 2015

1 commit

fd3137cd3 openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set

This avoids setting TUNNEL_VXLAN_OPT for VXLAN frames which don't
have any GBP metadata set. It is not invalid to set it but unnecessary.

Signed-off-by: Thomas Graf
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thomas Graf
2015-02-10 06:25:52 +0800

08 Feb, 2015

2 commits

ca539345f openvswitch: Initialize unmasked key and uid len

Flow alloc needs to initialize the unmasked key pointer. Otherwise
it can crash the kernel trying to free a random unmasked-key pointer.

general protection fault: 0000 [#1] SMP
3.19.0-rc6-net-next+ #457
Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
RIP: 0010:[] [] kfree+0xac/0x196
Call Trace:
[] flow_free+0x21/0x59 [openvswitch]
[] ovs_flow_free+0x21/0x23 [openvswitch]
[] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
[] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
[] ? nla_parse+0x4f/0xec
[] genl_family_rcv_msg+0x26d/0x2c9
[] ? __lock_acquire+0x90e/0x9aa
[] genl_rcv_msg+0x66/0x89
[] ? genl_family_rcv_msg+0x2c9/0x2c9
[] netlink_rcv_skb+0x3e/0x95
[] ? genl_rcv+0x18/0x37
[] genl_rcv+0x27/0x37
[] netlink_unicast+0x103/0x191
[] netlink_sendmsg+0x2c1/0x310
[] ? might_fault+0x50/0xa0
[] do_sock_sendmsg+0x5f/0x7a
[] sock_sendmsg+0xb/0xd
[] ___sys_sendmsg+0x1a3/0x218
[] ? get_close_on_exec+0x86/0x86
[] ? fsnotify+0x32c/0x348
[] ? fsnotify+0x7c/0x348
[] ? __fget+0xaa/0xbf
[] ? get_close_on_exec+0x86/0x86
[] __sys_sendmsg+0x3d/0x5e
[] SyS_sendmsg+0x14/0x16
[] system_call_fastpath+0x12/0x17

Fixes: 74ed7ab9264 ("openvswitch: Add support for unique flow IDs.")
CC: Joe Stringer
Reported-by: Or Gerlitz
Signed-off-by: Pravin B Shelar
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Pravin B Shelar
2015-02-08 16:51:14 +0800
83d2b9ba1 net: openvswitch: Support masked set actions.

OVS userspace already probes the openvswitch kernel module for
OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module
implementation of masked set actions.

The existing set action sets many fields at once. When only a subset
of the IP header fields, for example, should be modified, all the IP
fields need to be exact matched so that the other field values can be
copied to the set action. A masked set action allows modification of
an arbitrary subset of the supported header bits without requiring the
rest to be matched.

Masked set action is now supported for all writeable key types, except
for the tunnel key. The set tunnel action is an exception as any
input tunnel info is cleared before action processing starts, so there
is no tunnel info to mask.

The kernel module converts all (non-tunnel) set actions to masked set
actions. This makes action processing more uniform, and results in
less branching and duplicating the action processing code. When
returning actions to userspace, the fully masked set actions are
converted back to normal set actions. We use a kernel internal action
code to be able to tell the userspace provided and converted masked
set actions apart.

Signed-off-by: Jarno Rajahalme
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Jarno Rajahalme
2015-02-08 14:40:17 +0800
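
The core of a masked set, as described in the commit above, is a bitwise merge of old and new values. A minimal sketch, assuming the field is handled as a plain byte array and using a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a masked set to a header field of `len` bytes: bits selected by
 * the mask take the new value, all other bits keep their old contents.
 */
static void sketch_set_masked(uint8_t *field, const uint8_t *value,
			      const uint8_t *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		field[i] = (uint8_t)((field[i] & ~mask[i]) | (value[i] & mask[i]));
}
```

With an all-ones mask the result equals the new value, which is why an ordinary set action can be represented as a fully masked one.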

29 Jan, 2015

1 commit

b8693877a openvswitch: Add support for checksums on UDP tunnels.

Currently, it isn't possible to request checksums on the outer UDP
header of tunnels - the TUNNEL_CSUM flag is ignored. This adds
support for requesting that UDP checksums be computed on transmit
and properly reported if they are present on receive.

Signed-off-by: Jesse Gross
Signed-off-by: David S. Miller

Jesse Gross
2015-01-29 15:04:15 +0800

27 Jan, 2015

4 commits

74ed7ab92 openvswitch: Add support for unique flow IDs.

Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-01-27 07:45:50 +0800
272c2cf84 openvswitch: Use sw_flow_key_range for key ranges.

These minor tidyups make a future patch a little tidier.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-01-27 07:45:50 +0800
d29ab6f8a openvswitch: Refactor ovs_flow_tbl_insert().

Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
This tidies up a future patch.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-01-27 07:45:49 +0800
5b4237bbc openvswitch: Refactor ovs_nla_fill_match().

Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.

Signed-off-by: Joe Stringer
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Joe Stringer
2015-01-27 07:45:49 +0800

25 Jan, 2015

1 commit

af33c1ada vxlan: Eliminate dependency on UDP socket in transmit path

In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminates references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-01-25 15:15:40 +0800

18 Jan, 2015

1 commit

053c095a8 netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

if (my_function(...))
/* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for
Signed-off-by: David S. Miller

Johannes Berg
2015-01-18 14:03:45 +0800
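
The pitfall described in the commit above can be shown with a toy model. All names below are hypothetical; `old_style_end()` merely imitates a helper that returns a message length, which is always positive.

```c
/* Old-style helper: returns the message length, which is always > 0. */
static int old_style_end(int header_len, int payload_len)
{
	return header_len + payload_len;
}

/* Typical fill function that passed the helper's result straight through. */
static int old_style_fill(void)
{
	return old_style_end(16, 4);
}

/* New-style fill: finishing cannot fail, so report plain success. */
static int new_style_fill(void)
{
	(void)old_style_end(16, 4);	/* result no longer propagated */
	return 0;
}

/* Caller using the common "non-zero means error" convention. */
static int caller_sees_error(int (*fill)(void))
{
	return fill() != 0;
}
```

A caller written against the usual convention treats the old-style fill as failing every time, even though nothing went wrong; returning 0 explicitly removes that trap.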

15 Jan, 2015

7 commits

1dd144cf5 openvswitch: Support VXLAN Group Policy extension

Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.

ovs-vsctl add-port br0 vxlan0 -- \
set Interface vxlan0 type=vxlan options:exts=gbp

The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.

The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
internally just like Geneve options but transported as nested Netlink
attributes to user space.

Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.

The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented as mutually exclusive.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800
81bfe3c3c openvswitch: Allow for any level of nesting in flow attributes

nlattr_set() is currently hardcoded to two levels of nesting. This change
introduces struct ovs_len_tbl to define minimal length requirements plus
next level nesting tables to traverse the key attributes to arbitrary depth.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800
d91641d9b openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()

Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800
3511494ce vxlan: Group Policy extension

Implements support for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows managing labels to secure local resources. However,
distributed applications require ACLs to be implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow mapping security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

 Host 1:                      Host 2:

 Group A   Group B            Group B     Group A
 +-----+   +-------------+    +-------+   +-----+
 | lxc |   | SELinux CTX |    | httpd |   | VM  |
 +--+--+   +--+----------+    +---+---+   +--+--+
     \---+---/                     \----+---/
         |                              |
     +---+---+                      +---+---+
     | vxlan |                      | vxlan |
     +---+---+                      +---+---+
         +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP frames. The extension is therefore disabled by default
and needs to be specifically enabled:

ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
iptables:
host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

OVS:
# ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
# ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 14:11:41 +0800
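
The mark layout stated in the Group Policy commit above (group in the lower 16 bits, flags in the upper 16 bits) can be sketched with a few helpers. The `sketch_gbp_*` names are invented for illustration, and no meaning is assumed for individual flag bits.

```c
#include <stdint.h>

/* Lower 16 bits of the mark: group membership. */
static uint16_t sketch_gbp_group(uint32_t mark)
{
	return (uint16_t)(mark & 0xFFFFu);
}

/* Upper 16 bits of the mark: flags. */
static uint16_t sketch_gbp_flags(uint32_t mark)
{
	return (uint16_t)(mark >> 16);
}

/* Combine flags and group back into a 32-bit mark. */
static uint32_t sketch_gbp_mark(uint16_t flags, uint16_t group)
{
	return ((uint32_t)flags << 16) | group;
}
```

For the `0x200` mark used in the iptables example, the group is `0x200` and the flags field is zero.
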
3f3558bb5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller

David S. Miller
2015-01-15 13:53:17 +0800
1ba398041 openvswitch: packet messages need their own probe attribute

User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while remaining binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom
Tracked-down-by: Florian Westphal
Signed-off-by: Thomas Graf
Reviewed-by: Jesse Gross
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Thomas Graf
2015-01-15 05:49:44 +0800
3f4c1d87a openvswitch: Introduce ovs_tunnel_route_lookup

Introduce ovs_tunnel_route_lookup to consolidate route lookup
shared by vxlan, gre, and geneve ports.

Signed-off-by: Fan Du
Signed-off-by: David S. Miller

Fan Du
2015-01-15 05:32:06 +0800

14 Jan, 2015

2 commits

df8a39def net: rename vlan_tx_* helpers since "tx" is misleading there

The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Jiri Pirko
2015-01-14 06:51:08 +0800
a440edf1f openvswitch: Remove unnecessary version.h inclusion

version.h inclusion is not necessary as detected by versioncheck.

Signed-off-by: Syam Sidhardhan
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Syam Sidhardhan
2015-01-14 03:31:41 +0800

03 Jan, 2015

1 commit

24cc59d1e openvswitch: Consistently include VLAN header in flow and port stats.

Until now, when VLAN acceleration was in use, the bytes of the VLAN header
were not included in port or flow byte counters. They were however
included when VLAN acceleration was not used. This commit corrects the
inconsistency, by always including the VLAN header in byte counters.

Previous discussion at
http://openvswitch.org/pipermail/dev/2014-December/049521.html

Reported-by: Motonori Shindo
Signed-off-by: Ben Pfaff
Reviewed-by: Flavio Leitner
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Ben Pfaff
2015-01-03 05:14:20 +0800
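
The accounting rule from the commit above can be written down in a few lines. This is a sketch with invented types: `struct sketch_pkt` stands in for a packet whose VLAN tag is either inside the buffer or held out-of-band by VLAN acceleration, and the 4-byte tag size is that of a single 802.1Q header.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_VLAN_HLEN 4	/* size of one 802.1Q tag in bytes */

struct sketch_pkt {
	size_t len;		/* bytes present in the packet buffer */
	bool vlan_accel;	/* true if the tag is held out-of-band */
};

/* Length to add to byte counters: always counts the VLAN header. */
static size_t sketch_stats_len(const struct sketch_pkt *pkt)
{
	return pkt->len + (pkt->vlan_accel ? SKETCH_VLAN_HLEN : 0);
}
```

A tagged frame then contributes the same byte count whether or not acceleration stripped the tag from the buffer.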

27 Dec, 2014

1 commit

f8403a2e4 genetlink: pass only network namespace to genl_has_listeners()

There's no point in forcing the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2014-12-27 15:20:23 +0800

25 Dec, 2014

1 commit

4aa611881 openvswitch: fix odd_ptr_err.cocci warnings

net/openvswitch/vport-gre.c:188:5-11: inconsistent IS_ERR and PTR_ERR, PTR_ERR on line 189

PTR_ERR should access the value just tested by IS_ERR

Semantic patch information:
There can be false positives in the patch case, where it is the call
IS_ERR that is wrong.

Generated by: scripts/coccinelle/tests/odd_ptr_err.cocci

CC: Pravin B Shelar
Signed-off-by: Fengguang Wu
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller

Wu Fengguang
2014-12-25 04:18:09 +0800
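
For readers unfamiliar with the idiom behind the warning in the commit above, here is a userspace imitation of the error-pointer convention. The `sketch_*` helpers are local re-creations written for this example, assuming error codes occupy the top 4095 values of the address space; they are not the kernel macros themselves.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decode the errno value carried by an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer value lies in the reserved error range. */
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Consistent pairing: decode the same pointer that was just tested. */
static long sketch_check(const void *rt)
{
	if (sketch_is_err(rt))
		return sketch_ptr_err(rt);
	return 0;
}
```

The warning fires when the pointer tested and the pointer decoded differ, in which case the returned value is unrelated to the failure that was detected.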