Eric Lee / smarc-fsl-linux-kernel

21 Jul, 2015

31 commits

1a20cc254 tipc: introduce node contact FSM

The logic for determining when a node is permitted to establish
and maintain contact with its peer node becomes non-trivial in the
presence of multiple parallel links that may come and go independently.

A known failure scenario is that one endpoint registers both its links
to the peer as lost, cleans up its binding table, and prepares for a table
update once contact is re-established, while the other endpoint may
see its links reset and re-established one by one, hence seeing
no need to re-synchronize the binding table. To avoid this, a node
must not allow re-establishing contact until it has confirmation that
the peer has also lost both links.

Currently, the mechanism for handling this consists of setting and
resetting two state flags from different locations in the code. This
solution is hard to understand and maintain. A closer analysis even
reveals that it is not completely safe.

In this commit, we instead introduce an FSM that keeps track of
the conditions for when the node can establish and maintain links.
It has six states and four events, and is strictly based on explicit
knowledge about the node's own and the peer node's contact states.
Only events leading to state change are shown as edges in the figure
below.

[Figure: state diagram of the node contact FSM, only partly preserved.
State boxes still legible: SELF_UP/PEER_COMING, SELF_DOWN/PEER_LEAVING,
PEER_UP/SELF_COMING. Edge labels still legible: SELF_ESTBL_CONTACT,
PEER_ESTBL_CONTACT, SELF_LOST_CONTACT.]
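
The rule described above can be sketched as a small state machine. This is
an illustration in plain C, not the code added by this commit: all names are
hypothetical, and because the figure is incomplete here, the transitions are
one reading that is consistent with the prose (six states, four events, and
no return to contact until the peer's loss is confirmed).

```c
#include <assert.h>

/* Hypothetical names: six contact states and four events, as in the text. */
enum node_state {
	SELF_DOWN_PEER_DOWN,
	SELF_UP_PEER_COMING,
	SELF_COMING_PEER_UP,
	SELF_UP_PEER_UP,
	SELF_DOWN_PEER_LEAVING,
	SELF_LEAVING_PEER_DOWN,
	NUM_STATES
};

enum node_event {
	SELF_ESTBL_CONTACT,
	SELF_LOST_CONTACT,
	PEER_ESTBL_CONTACT,
	PEER_LOST_CONTACT,
	NUM_EVENTS
};

/* Return the next state; an event with no edge leaves the state unchanged. */
enum node_state node_fsm_evt(enum node_state s, enum node_event e)
{
	assert(s < NUM_STATES && e < NUM_EVENTS);

	switch (s) {
	case SELF_DOWN_PEER_DOWN:
		if (e == SELF_ESTBL_CONTACT) return SELF_UP_PEER_COMING;
		if (e == PEER_ESTBL_CONTACT) return SELF_COMING_PEER_UP;
		break;
	case SELF_UP_PEER_COMING:
		if (e == PEER_ESTBL_CONTACT) return SELF_UP_PEER_UP;
		if (e == SELF_LOST_CONTACT) return SELF_DOWN_PEER_LEAVING;
		break;
	case SELF_COMING_PEER_UP:
		if (e == SELF_ESTBL_CONTACT) return SELF_UP_PEER_UP;
		if (e == PEER_LOST_CONTACT) return SELF_LEAVING_PEER_DOWN;
		break;
	case SELF_UP_PEER_UP:
		if (e == SELF_LOST_CONTACT) return SELF_DOWN_PEER_LEAVING;
		if (e == PEER_LOST_CONTACT) return SELF_LEAVING_PEER_DOWN;
		break;
	case SELF_DOWN_PEER_LEAVING:
		/* We lost contact; wait until the peer confirms it did too. */
		if (e == PEER_LOST_CONTACT) return SELF_DOWN_PEER_DOWN;
		break;
	case SELF_LEAVING_PEER_DOWN:
		if (e == SELF_LOST_CONTACT) return SELF_DOWN_PEER_DOWN;
		break;
	default:
		break;
	}
	return s;
}
```

In this sketch, SELF_ESTBL_CONTACT is ignored in SELF_DOWN_PEER_LEAVING, which
is how the failure scenario above is avoided.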

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:16 +0800
8a1577c96 tipc: move link supervision timer to node level

In our effort to move control of the links to the link aggregation
layer, we move the periodic link supervision timer to struct tipc_node.
The new timer is shared between all links belonging to the node, thus
saving resources, while still kicking the FSM on both its pertaining
links at each expiration.

The current link timer and corresponding functions are removed.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:16 +0800
333ef69ed tipc: simplify link timer implementation

We create a second, simpler, link timer function, tipc_link_timeout().
The new function makes use of the new FSM function introduced in the
previous commit, and just like it, takes a buffer queue as parameter.
It returns an event bit field and potentially a link protocol packet
to the caller.

The existing timer function, link_timeout(), is still needed for a
while, so we redesign it to become a wrapper around the new function.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:16 +0800
6ab30f9cb tipc: improve link FSM implementation

The link FSM implementation is currently unnecessarily complex.
It sometimes checks for conditional state outside the FSM data
before deciding next state, and often performs actions directly
inside the FSM logic.

In this commit, we create a second, simpler FSM implementation
that, as far as possible, acts only on states and events that it is
strictly defined for, and postpones any actions until it is finished
with its decisions. It also returns an event flag field and a
buffer queue which may potentially contain a protocol message to
be sent by the caller.

Unfortunately, we cannot yet make the FSM "clean", in the sense
that its decisions are only based on FSM state and event, and that
state changes happen only here. That will have to wait until the
activate/reset logic has been cleaned up in a future commit.

We also rename the link states as follows:

WORKING_WORKING -> TIPC_LINK_WORKING
WORKING_UNKNOWN -> TIPC_LINK_PROBING
RESET_UNKNOWN -> TIPC_LINK_RESETTING
RESET_RESET -> TIPC_LINK_ESTABLISHING

The existing FSM function, link_state_event(), is still needed for
a while, so we redesign it to make use of the new function.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:15 +0800
426cc2b86 tipc: introduce new link protocol msg create function

As a preparation for later changes, we introduce a new function
tipc_link_build_proto_msg(). Instead of actually sending the created
protocol message, it only creates it and adds it to the head of a
skb queue provided by the caller.

Since we still need the existing function tipc_link_protocol_xmit()
for a while, we redesign it to make use of the new function.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:15 +0800
d3504c344 tipc: clean up definitions and usage of link flags

The status flag LINK_STOPPED is not needed any more, since the
mechanism for delayed deletion of links has been removed.
Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
because we can just as well start the link timer directly from
inside tipc_link_create().

We eliminate these flags in this commit.

Instead of the above flags, we now introduce three new link modes,
TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
blocks timer-driven protocol messages from being sent out, and any change
to the link FSM. Since the modes are mutually exclusive, we convert
them to state values, and rename the 'flags' field in struct tipc_link
to 'exec_mode'.

Finally, we move the #defines for link FSM states and events from link.h
into enums inside the file link.c, which is the real usage scope of
these definitions.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:15 +0800
af9b028e2 tipc: make media xmit call outside node spinlock context

Currently, message sending is performed through a deep call chain,
where the node spinlock is grabbed and held during a significant
part of the transmission time. This is clearly detrimental to
overall throughput performance; it would be better if we could send
the message after the spinlock has been released.

In this commit, we instead let the call return up the stack after
the buffer chain has been added to the transmission queue, whereafter
clones of the buffers are transmitted to the device layer outside the
spinlock scope.

As a further step in our effort to separate the roles of the node
and link entities we also move the function tipc_link_xmit() to
node.c, and rename it to tipc_node_xmit().

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:15 +0800
22d85c794 tipc: change sk_buffer handling in tipc_link_xmit()

When the function tipc_link_xmit() is given a buffer list for
transmission, it currently consumes the list both when transmission
is successful and when it fails, except for the special case when
it encounters link congestion.

This behavior is inconsistent, and needs to be corrected if we want
to avoid problems in later commits in this series.

In this commit, we change this to let the function consume the list
only when transmission is successful, and leave the list with the
sender in all other cases. We also modify the socket code so that
it adapts to this change, i.e., purges the list when a non-congestion
error code is returned.
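
The ownership rule above can be modelled in a few lines. This is a sketch in
plain C, not the kernel code: the list type, the return codes and both
function names are hypothetical stand-ins (the kernel works on sk_buff queues
and errno-style values).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the kernel uses errno-style values instead. */
enum xmit_rc { XMIT_OK = 0, XMIT_CONGESTED = -1, XMIT_FAILED = -2 };

/* Hypothetical stand-in for a buffer list (not the kernel's sk_buff_head). */
struct buf_list {
	int len;
};

/* Model of the new rule: the list is consumed only when sending succeeds. */
enum xmit_rc link_xmit(struct buf_list *list, int link_up, int congested)
{
	assert(list != NULL);
	if (!link_up)
		return XMIT_FAILED;     /* list is left with the sender */
	if (congested)
		return XMIT_CONGESTED;  /* list is left with the sender */
	list->len = 0;                  /* success: list consumed */
	return XMIT_OK;
}

/* Model of the adapted sender: purge on any non-congestion error. */
enum xmit_rc sender_xmit(struct buf_list *list, int link_up, int congested)
{
	enum xmit_rc rc = link_xmit(list, link_up, congested);

	if (rc != XMIT_OK && rc != XMIT_CONGESTED)
		list->len = 0;          /* sender still owns the list, so it purges */
	return rc;                      /* on congestion the list is not purged */
}
```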

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:15 +0800
36e78a463 tipc: use bearer index when looking up active links

struct tipc_node currently holds two arrays of link pointers; one,
indexed by bearer identity, which contains all links irrespective of
current state, and one two-slot array for the currently active link
or links. The latter array contains direct pointers into the elements
of the former. This has the effect that we cannot know the bearer id of
a link when accessing it via the "active_links[]" array without actually
dereferencing the pointer, something we want to avoid in some cases.

In this commit, we instead store the bearer identity in the
"active_links" array, and use this as an index to find the right element
in the overall link entry array. This change should be seen as a
preparation for the later commits in this series.
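
A minimal sketch of the indexing scheme, assuming simplified stand-in types:
the struct layouts, the constant values and the helper name below are
illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BEARERS       3     /* illustrative value */
#define INVALID_BEARER_ID (-1)  /* illustrative marker for an unused slot */

struct link {
	int mtu;
};

struct link_entry {
	struct link *link;
};

struct node {
	struct link_entry links[MAX_BEARERS]; /* all links, indexed by bearer id */
	int active_links[2];                  /* bearer ids rather than pointers */
};

/* Resolve an active slot to its link, or NULL if the slot is unused. */
struct link *node_active_link(struct node *n, int slot)
{
	int bearer_id;

	assert(slot == 0 || slot == 1);
	bearer_id = n->active_links[slot];
	if (bearer_id == INVALID_BEARER_ID)
		return NULL;
	return n->links[bearer_id].link;
}
```

With this layout the bearer id of an active link is available from
active_links[] directly, without dereferencing any link pointer.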

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:14 +0800
d39bbd445 tipc: move link input queue to tipc_node

At present, the link input queue and the name distributor receive
queues are fields aggregated in struct tipc_link. This is a hazard,
because a link might be deleted while a receiving socket still keeps
reference to one of the queues.

This commit fixes this bug. However, rather than adding yet another
reference counter to the critical data path, we move the two queues
to safe ground inside struct tipc_node, which is already protected, and
let the link code only handle references to the queues. This is also
in line with planned later changes in this area.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:14 +0800
d3a43b907 tipc: move link creation from neighbor discoverer to node

As a step towards turning links into node internal entities, we move the
creation of links from the neighbor discovery logic to the node's link
control logic.

We also create an additional entry for the link's media address in the
newly introduced struct tipc_link_entry, since this is where it is
needed in the upcoming commits. The current copy in struct tipc_link
is kept for now, but will be removed later.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:14 +0800
9d13ec65e tipc: introduce link entry structure to struct tipc_node

struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.

We now group those into a new struct 'tipc_link_entry', and introduce
one single array consisting of such entries. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-07-21 11:41:14 +0800
6acc23266 net: remove skb_frag_add_head

It's not used anywhere.

Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2015-07-21 11:38:49 +0800
bd265242c Merge branch 'offload_fwd_mark'

Scott Feldman says:

====================
switchdev: avoid duplicate packet forwarding

v3:

- Per Nicolas Dichtel review: remove errant empty union.

v2:

- Per davem review: in sk_buff, union fwd_mark with secmark to save space
since features appear to be mutually exclusive.
- Per Simon Horman review:
- fix grammar in switchdev.txt wrt fwd_mark
- remove some unrelated changes that snuck in

v1:

This patchset was previously submitted as RFC. No changes from the last
version (v2) sent under RFC. Including RFC version history here for reference.

RFC v2:

- s/fwd_mark/offload_fwd_mark
- use consume_skb rather than kfree_skb when dropping pkt on egress.
- Use Jiri's suggestion to use ifindex of one of the ports in a group
as the mark for all the ports in the group. This can be done with
no additional storage (no hashtable from v1). To pull it off, we
need some simple recursive routines to walk the netdev tree ensuring
all leaves in the tree (ports) in the same group (e.g. bridge)
belonging to the same switch device will have the same offload fwd mark.
Maybe someone sees a better design for the recursive routines? They're
not too bad, and should cover the stacked driver cases.

RFC v1:

With switchdev support for offloading L2/L3 forwarding data path to a
switch device, we have a general problem where both the device and the
kernel may forward the packet, resulting in duplicate packets on the wire.
Anytime a packet is forwarded by the device and a copy is sent to the CPU,
there is potential for duplicate forwarding, as the kernel may also do a
forwarding lookup and send the packet on the wire.

The specific problem this patch series is interested in solving is avoiding
duplicate packets on bridged ports. There was a previous RFC from Roopa
(http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
problem, but didn't solve the problem of mixed ports in the bridge from
different devices; there was no way to exclude some ports from forwarding
and include others. This RFC solves that problem by tagging the ingressing
packet with a unique mark, and then comparing the packet mark with the
egress port mark, and skip forwarding when there is a match. For the mixed
ports bridge case, only those ports with matching marks are skipped.

The switchdev port driver must do two things:

1) Generate a fwd_mark for each switch port, using some unique key of the
switch device (and optionally port). This is done when the port netdev
is registered or if the port's group membership changes (joins/leaves
a bridge, for example).

2) On packet ingress from port, mark the skb with the ingress port's
fwd_mark. If the device supports it, it's useful to only mark skbs
which were already forwarded by the device. If the device does not
support such indication, all skbs can be marked, even if they're
local dst.

Two new 32-bit fields are added to struct sk_buff and struct netdevice to
hold the fwd_mark. I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
tried using skb->mark for this purpose, but ebtables can overwrite the
skb->mark before the bridge gets it, so that will not work.

In general, this fwd_mark can be used for any case where a packet is
forwarded by the device and a copy is sent to the CPU, to avoid the kernel
re-forwarding the packet. sFlow is another use-case that comes to mind,
but I haven't explored the details.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-07-21 09:32:45 +0800
a48037e7c switchdev: update documentation for offload_fwd_mark

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-07-21 09:32:45 +0800
3f98a8e63 rocker: add offload_fwd_mark support

If the device flags an ingress packet as "fwd offload", mark the
skb->offload_fwd_mark using the ingress port's dev->offload_fwd_mark. This
will be the hint to the kernel that this packet has already been forwarded
by the device to egress ports matching skb->offload_fwd_mark.

For rocker, derive port dev->offload_fwd_mark based on device switch ID and
port ifindex. If port is bridged, use the bridge ifindex rather than the
port ifindex.

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-07-21 09:32:45 +0800
1a3b2ec93 switchdev: add offload_fwd_mark generator helper

skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex. group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.

The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.
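
The idea of one mark per {switch ID, group_ifindex} key can be sketched in
plain C as follows. This is an assumption-laden illustration: the fixed-size
open-addressing table, the 8-byte ID length and the name fwd_mark_get() are
all hypothetical, and the kernel helper uses its own hash table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_LEN     8   /* hypothetical switch ID length in bytes */
#define TABLE_SIZE 64  /* hypothetical table capacity */

struct mark_entry {
	uint8_t switch_id[ID_LEN];
	int group_ifindex;
	uint32_t mark;  /* 0 means the slot is free */
};

static struct mark_entry mark_table[TABLE_SIZE];
static uint32_t next_mark = 1;

/*
 * Return the mark for {switch_id, group_ifindex}, allocating a new non-zero
 * mark on first use. Returns 0 if the table is full.
 */
uint32_t fwd_mark_get(const uint8_t switch_id[ID_LEN], int group_ifindex)
{
	unsigned int h = (unsigned int)group_ifindex;
	unsigned int probe;
	int i;

	assert(switch_id != NULL);
	for (i = 0; i < ID_LEN; i++)
		h = h * 31 + switch_id[i];

	/* Linear probing; entries are never deleted in this sketch. */
	for (probe = 0; probe < TABLE_SIZE; probe++) {
		struct mark_entry *e = &mark_table[(h + probe) % TABLE_SIZE];

		if (e->mark == 0) {
			memcpy(e->switch_id, switch_id, ID_LEN);
			e->group_ifindex = group_ifindex;
			e->mark = next_mark++;
			return e->mark;
		}
		if (e->group_ifindex == group_ifindex &&
		    memcmp(e->switch_id, switch_id, ID_LEN) == 0)
			return e->mark;
	}
	return 0;
}
```

Ports that share a switch ID and a group (for example the same bridge) get
the same mark; any other combination gets a different one.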

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-07-21 09:32:44 +0800
d754f98b5 net: add phys ID compare helper to test if two IDs are the same

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-07-21 09:32:44 +0800
0c4f691ff net: don't reforward packets already forwarded by offload device

Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already forwarded by device. If so, drop skb. A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->offload_fwd_mark. The switchdev
port driver would assign a non-zero dev->offload_fwd_mark for each device port
netdev during registration, for example.
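
The egress check described above amounts to a two-line comparison. The sketch
below uses hypothetical stand-in structs and a hypothetical function name; it
is not the kernel code, only a model of the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the packet and the egress port netdev. */
struct pkt {
	uint32_t offload_fwd_mark;
};

struct port {
	uint32_t offload_fwd_mark;
};

/* True if the kernel should still forward the packet out of this port. */
bool should_forward(const struct pkt *p, const struct port *egress)
{
	assert(p != NULL && egress != NULL);

	/* A zero mark means the device did not forward this packet. */
	if (p->offload_fwd_mark == 0)
		return true;

	/* Matching marks: the device already sent it out of this port. */
	return p->offload_fwd_mark != egress->offload_fwd_mark;
}
```

For a bridge with ports from different devices, only the ports whose mark
matches the packet's mark are skipped.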

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Acked-by: Roopa Prabhu
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Scott Feldman
2015-07-21 09:32:44 +0800
8254973fa rocker: forward packets to CPU when port is joined to openvswitch

Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
There is scope to later refine what is passed up as per Open vSwitch flows
on a port.

This does not change the behaviour of rocker ports that are
not joined to Open vSwitch.

Signed-off-by: Simon Horman
Acked-by: Scott Feldman
Signed-off-by: David S. Miller

Simon Horman
2015-07-21 09:26:03 +0800
a7ce45a74 bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not defined

Fix:
net/bridge/br_if.c: In function 'br_dev_delete':
>> net/bridge/br_if.c:284:2: error: implicit declaration of function
>> 'br_multicast_dev_del' [-Werror=implicit-function-declaration]
br_multicast_dev_del(br);
^
cc1: some warnings being treated as errors

when igmp snooping is not defined.

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-07-21 07:19:03 +0800
a0a9f33bd net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not set

Newly created flows don't have flowi6_oif set (at least if the
associated socket is not interface-bound). This leads to a mismatch in
__xfrm6_selector_match() for policies which specify an interface in the
selector (sel->ifindex != 0).

Backtracing shows this happens in code-paths originating from e.g.
ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was
not tested for.)

In summary, this patch fixes policy matching on outgoing interface for
locally generated packets.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2015-07-21 03:59:32 +0800
22f94e625 bonding: trivial: remove unused variables

Get rid of these:
drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’:
drivers/net/bonding//bond_main.c:3754:6: warning: variable
‘slaves_in_agg’ set but not used [-Wunused-but-set-variable]
int slaves_in_agg;
^
CC [M] drivers/net/bonding//bond_3ad.o
drivers/net/bonding//bond_3ad.c: In function
‘ad_marker_response_received’:
drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’
set but not used [-Wunused-but-set-parameter]
static void ad_marker_response_received(struct bond_marker *marker,
^
drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set
but not used [-Wunused-but-set-parameter]
struct port *port)
^

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-07-21 03:50:32 +0800
5b1d0d8f5 Merge branch 'bridge-temp-and-perm'

Nikolay Aleksandrov says:

====================
bridge: multicast: temp and perm entries behaviour enhancements

Patch 01 adds a notify when a group is deleted via br_multicast_del_pg()
(on expire, on device delete or on device down).
Patch 02 changes how bridge device and bridge port delete and down/up are
handled. Until now on bridge down all groups were flushed, now only the
temp ones are (same for port), perm entries are flushed only on port or
bridge removal.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-07-21 03:49:11 +0800
e10177abf bridge: multicast: fix handling of temp and perm entries

When the bridge (or port) is brought down/up flush only temp entries and
leave the perm ones. Flush perm entries only when deleting the bridge
device or the associated port.

Signed-off-by: Satish Ashok
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Satish Ashok
2015-07-21 03:49:10 +0800
ef8299de7 bridge: multicast: notify on group delete

Group notifications were not sent when a group expired or was deleted
due to bridge/port device being deleted. So add br_mdb_notify() to
br_multicast_del_pg().

Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2015-07-21 03:49:10 +0800
03b6dc7d1 Merge branch 'bpf_cgroup_classid'

Daniel Borkmann says:

====================
BPF update

This small helper allows for accessing net_cls cgroups classid. Please
see individual patches for more details.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-07-21 03:41:30 +0800
8d20aabe1 ebpf: add helper to retrieve net_cls's classid cookie

It would be very useful to retrieve the net_cls's classid from an eBPF
program to allow for a more fine-grained classification, it could be
directly used or in conjunction with additional policies. I.e. docker,
but also tooling such as cgexec, can easily run applications via net_cls
cgroups:

cgcreate -g net_cls:/foo
echo 42 > foo/net_cls.classid
cgexec -g net_cls:foo

Thus, the respective classid cookie of foo can then be looked up on
the egress path to apply further policies. The helper is designed such
that a non-zero value returns the cgroup id.

Signed-off-by: Daniel Borkmann
Cc: Thomas Graf
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2015-07-21 03:41:30 +0800
b87a173e2 cls_cgroup: factor out classid retrieval

Split out the cgroups net_cls classid retrieval into its
own function, so that it can be reused later on from other parts of
the traffic control subsystem. If there's no skb->sk, then the small
helper returns 0 as well, which in cls_cgroup terms means 'could not
classify'.
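
The "no socket means 0" convention can be modelled as below. The two structs
and the function name are hypothetical stand-ins for illustration; the real
helper operates on struct sk_buff and may do more than this sketch shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the socket and the packet that refers to it. */
struct sock_stub {
	uint32_t classid;
};

struct pkt_stub {
	struct sock_stub *sk;  /* NULL when the packet has no socket */
};

/* Return the socket's classid, or 0 meaning 'could not classify'. */
uint32_t pkt_classid(const struct pkt_stub *p)
{
	assert(p != NULL);
	if (p->sk == NULL)
		return 0;
	return p->sk->classid;
}
```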

Signed-off-by: Daniel Borkmann
Cc: Thomas Graf
Signed-off-by: David S. Miller

Daniel Borkmann
2015-07-21 03:41:30 +0800
d9382bda4 enic: allow adaptive coalesce setting for msi/legacy intr

* Allow setting of adaptive coalescing setting for all types of interrupt.

* In msi & legacy intr, we use single interrupt for rx & tx. In this case
tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs.
Do not display tx_coal values for msi/intx. And do not allow user to set
this as well.

* Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings.
For other values, driver does not return error. So ethtool succeeds for
unsupported values. Introduce enic_coalesce_valid() function to validate
the coalescing values.

* If user requests for coalesce value greater than what adaptor supports,
driver uses the max value. We should at least log this.
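
The validation and clamping rules listed above can be sketched as two small
functions. This is a simplified model in plain C with hypothetical names and
types; it is not the driver's enic_coalesce_valid(), which works on the
ethtool coalesce structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum intr_mode { INTR_MSIX, INTR_MSI, INTR_INTX };

/* Hypothetical request: only the two fields discussed in the text. */
struct coal_req {
	unsigned int rx_usecs;
	unsigned int tx_usecs;
};

/* Return 0 if the request is acceptable for this interrupt mode, -1 if not. */
int coalesce_valid(enum intr_mode mode, const struct coal_req *req)
{
	assert(req != NULL);

	/* MSI and INTx share one interrupt for rx and tx: tx value is invalid. */
	if (mode != INTR_MSIX && req->tx_usecs != 0)
		return -1;
	return 0;
}

/* Clamp to the adapter maximum and log whenever clamping happens. */
unsigned int coalesce_clamp(unsigned int usecs, unsigned int max_usecs)
{
	if (usecs > max_usecs) {
		fprintf(stderr, "coalesce %u us exceeds max, using %u us\n",
			usecs, max_usecs);
		return max_usecs;
	}
	return usecs;
}
```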

Signed-off-by: Govindarajulu Varadarajan
Signed-off-by: David S. Miller

Govindarajulu Varadarajan
2015-07-21 03:39:34 +0800
fc865d6b4 enic: add adaptive coalescing intr for intx and msi poll

Adaptive interrupt coalescing is available for msix. This patch adds the support
for msi poll. Interface for adaptive interrupt coalescing is already added in
driver. We just did not enable it for legacy intr & msi.

enic_calc_int_moderation() & enic_set_int_moderation() are defined as static
after enic_poll. Since enic_poll needs it, move both of these function
definitions above enic_poll. No change in functionality.

Signed-off-by: Govindarajulu Varadarajan
Signed-off-by: David S. Miller

Govindarajulu Varadarajan
2015-07-21 03:39:34 +0800

16 Jul, 2015

9 commits

c15df306f ipv6: Remove unused arguments for __ipv6_dev_get_saddr().

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki
2015-07-16 16:00:56 +0800
0d0578815 Merge branch 'protodown'

Anuradha Karuppiah says:

====================
net: Introduce protodown flag.

User space daemons can detect errors in the network that need to be
notified to the switch device drivers.

Drivers can react to this error state by doing a phy-down on the
switch-port which would result in a carrier-off locally and on the directly
connected switch. Doing that would prevent loops and black-holes in the
network.

One such use case is the multi-chassis LAG application -

1. The MLAG application runs on peer switches (say Switch0 and Switch1)
synchronizing states, forwarding entries etc. between the two
switches over the peer-link (this is a link directly connecting the
two switches).
2. An MLAG election process designates one of the switches as a primary
(for e.g. Switch0 is primary and Switch1 is secondary).
3. The peer link plays a critical role in allowing Switch0-Switch1 to
function as a single LAG partner to the downstream dual-connected
servers. When the peer-link between the switches goes down we have a
split-brain situation. Switch0 and Switch1 are no longer in sync and
are acting independently. This can result in traffic loops and
traffic black-holing in the network.
4. To prevent these problems the MLAG application on the secondary
switch phy-downs the MLAG ports on detecting the peer-link down.
This will be seen as a carrier down on servers that are
dual-connected to Switch0 and Switch1.
5. Specifically a dual-connected server will see a carrier-down on the
port connected to the MLAG secondary, Switch1, and will stop using
that port for traffic TX. So traffic black holing is prevented.

v6 to v7:
Removed some unnecessary code in response to review comments.

v5 to v6:
Replaced proto_flags with a simple proto_down boolean attribute in
response to Dave's comments.

v4 to v5:
Changed the ip link display format for protodown to match the set as
recommended by Stephen.

v3 to v4:
I have moved protodown out of IFF_XXX and introduced a separate
proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
switch port errors. This is in response to Stephen's comments that
adding a new IFF_XXX may break user space.

I have used rocker as the sample switch driver. And to test this
functionality I used the qemu-rocker patch that Scott sent out in
response to the v3 posting (needed to set link up/down when phy is
enabled/disabled).

v1 to v2:
Based on Dave's suggestion I have moved out aggregating of error bits
across applications to a user space framework. This patch now simply
notifies an aggregated error bit to drivers enabling them to handle
the error gracefully.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-07-16 12:39:40 +0800
c30552461 rocker: Handle protodown notifications.

protodown can be set by user space applications like MLAG on detecting
errors on a switch port. This patch provides sample switch driver changes
for handling protodown. Rocker PHYS disables the port in response to
protodown.

Signed-off-by: Anuradha Karuppiah
Signed-off-by: Andy Gospodarek
Signed-off-by: Roopa Prabhu
Signed-off-by: Wilson Kok
Signed-off-by: David S. Miller

Anuradha Karuppiah
2015-07-16 12:39:40 +0800
88d6378bd netlink: changes for setting and clearing protodown via netlink.

Signed-off-by: Anuradha Karuppiah
Signed-off-by: Andy Gospodarek
Signed-off-by: Roopa Prabhu
Signed-off-by: Wilson Kok
Signed-off-by: David S. Miller

Anuradha Karuppiah
2015-07-16 12:39:40 +0800
d746d707a net core: Add protodown support.

This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.

The switch driver can react to protodown notification by doing a phys down
on the associated switch port.

Signed-off-by: Anuradha Karuppiah
Signed-off-by: Andy Gospodarek
Signed-off-by: Roopa Prabhu
Signed-off-by: Wilson Kok
Signed-off-by: David S. Miller

Anuradha Karuppiah
2015-07-16 12:39:40 +0800
07e6a97da ibmveth: add support for TSO6

This patch adds support for a new method of signalling the firmware
that TSO packets are being sent. The new method removes the need to
alter the ip and tcp checksums and allows TSO6 support.

Signed-off-by: Thomas Falcon
Signed-off-by: David S. Miller

Thomas Falcon
2015-07-16 12:34:56 +0800
2de8530ba hv_netvsc: Add close of RNDIS filter into change mtu call

The current change mtu call only stops tx before removing RNDIS filter.
In case the ring buffer is not empty, the rndis_filter_device_remove() may
hang on removing the buffers.

This patch closes the RNDIS filter before removing it, and adds a
gradual waiting loop that runs until the ring is empty. The change_mtu
hang issue under heavy traffic is solved by this patch.

Signed-off-by: Haiyang Zhang
Reviewed-by: K. Y. Srinivasan
Signed-off-by: David S. Miller

Haiyang Zhang
2015-07-16 12:11:31 +0800
c0b8da1e7 ipv6: Fix finding best source address in ipv6_dev_get_saddr().

Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available. Plus, it introduced
possible NULL pointer dereference.

Bug was reported by Erik Kline.
Based on patch proposed by Hajime Tazaki.

Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
iterate over all interfaces when finding source address
on specific interface.")
Signed-off-by: YOSHIFUJI Hideaki
Acked-by: Hajime Tazaki
Acked-by: Erik Kline
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki/吉藤英明
2015-07-16 12:06:13 +0800
9243b25b2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-07-14

This series contains updates to i40e and i40evf only.

Joe Stringer and Jesse Gross add a ndo_features_check function to ensure
that the i40e driver does not try to offload packets that exceed 80 bytes
in length.

Anjali adds additional stats to track flow director ATR and SB current
state and flow director flush count which will help reduce the need for verbose
debug logs with respect to flow director. Also refines an error message
to avoid confusion, so that it indicates what may have really happened
when the init_shared_code() call possibly fails.

Pawel adds new fields to the capabilities structures to handle Flex-10
device/function capabilities which is needed to support Flex-10 configs.

Jesse improves the transmit performance by adding a prefetch for the
next transmit descriptor to be used when we know there are more coming.

Mitch modifies i40evf driver to handle/allow an abundance of vectors.
Currently the driver only maps transmit and receive queues to a single
MSI-X vector per queue if there are exactly enough vectors for this, but
if we have too many vectors, it will fail and allocate queues to vectors
in a suboptimal manner. So change the condition check to allow for an
excess number of vectors and simply not use the extras. Also update the
driver to just return success if the user attempts to set a port VLAN on
a VF that already has the same port VLAN configured, instead of going
through unnecessary filter removals & adds. Fix the MAC filters for VFs,
which were being programmed with 0 for the VLAN value when there was no
VLAN assigned. Instead, we must use -1 to indicate that no VLAN is in
use. Fix the VF disable code, which was not properly cleaning up the VF
and would leave the VF in an indeterminate state, so fix this by
notifying the VF and then calling the normal VF reset routine. Fix the
logic in the driver so that MAC filters are added and removed correctly
and add a check for the driver's hardware MAC address so that this
filter does not get removed incorrectly.

Carolyn removes incorrect #ifdef's which should not have been added in
the first place and with the #ifdef's removed, make the necessary
changes in the driver to resolve compile errors.

Greg updates the admin queue command header defines.

v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov
====================

Signed-off-by: David S. Miller

David S. Miller
2015-07-16 08:31:14 +0800