Eric Lee / smarc-fsl-linux-kernel

04 May, 2016

4 commits

79e8dc8b8 ipv6/ila: fix nlsize calculation for lwtunnel

The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.

Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
CC: Tom Herbert
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2016-05-04 04:21:33 +0800
bd7c5f983 RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.

An arbitration scheme for duelling SYNs is implemented as part of
commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.

The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).

Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).
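
As an illustration only, the two state checks described above can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the kernel code: a pthread mutex stands in for t_conn_lock, and the struct, the enum values and both function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical connection states; the kernel has more of them. */
enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_UP };

/* Hypothetical stand-in for struct rds_tcp_connection. */
struct tcp_conn {
	pthread_mutex_t conn_lock;  /* stands in for t_conn_lock */
	enum conn_state state;
	int syns_sent;              /* counts SYNs this sketch "sends" */
};

/* Connect path: send a SYN only if the connection is not already UP. */
bool conn_connect(struct tcp_conn *tc)
{
	bool sent = false;

	assert(tc);
	pthread_mutex_lock(&tc->conn_lock);
	if (tc->state != CONN_UP) {
		tc->syns_sent++;
		sent = true;
	}
	pthread_mutex_unlock(&tc->conn_lock);
	return sent;
}

/* Accept path: arbitration may continue only from UP or CONNECTING. */
bool conn_accept_may_continue(struct tcp_conn *tc)
{
	bool ok;

	assert(tc);
	pthread_mutex_lock(&tc->conn_lock);
	ok = tc->state == CONN_UP || tc->state == CONN_CONNECTING;
	pthread_mutex_unlock(&tc->conn_lock);
	return ok;
}
```

Both paths read the state only while holding the same lock, so neither can act on a state the other is in the middle of changing.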

Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-05-04 04:03:44 +0800
eb1928402 RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock

There is a race condition between rds_send_xmit -> rds_tcp_xmit
and the code that deals with resolution of duelling syns added
by commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()").

Specifically, we may end up dereferencing a null pointer in rds_send_xmit
if we have the interleaving sequence:
rds_tcp_accept_one                  rds_send_xmit

                                    conn is RDS_CONN_UP, so
                                    invoke rds_tcp_xmit

                                    tc = conn->c_transport_data
rds_tcp_restore_callbacks
    /* reset t_sock */
                                    null ptr deref from tc->t_sock

The race condition can be avoided without adding the overhead of
additional locking in the xmit path: have rds_tcp_accept_one wait
for rds_tcp_xmit threads to complete before resetting callbacks.
The synchronization can be done in the same manner as rds_conn_shutdown().
First set the rds_conn_state to something other than RDS_CONN_UP
(so that new threads cannot get into rds_tcp_xmit()), then wait for
RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
threads in rds_tcp_xmit are done.

Fixes: 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan
Acked-by: Santosh Shilimkar
Signed-off-by: David S. Miller

Sowmini Varadhan
2016-05-04 04:03:44 +0800
996e80218 net: Disable segmentation if checksumming is not supported

The mlx4 and mlx5 drivers do not support IPv6 checksum offload for
tunnels. With this being the case, we should disable GSO in addition to
the checksum offload features when we find that a device cannot perform a
checksum on a given packet type.
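
As an illustration only, the rule can be sketched as a feature-mask filter. This is a minimal sketch under stated assumptions: the FEAT_* bits and the function name are hypothetical and are not the kernel's netdev feature flags or the function this commit changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration; not the kernel's values. */
#define FEAT_CSUM (UINT64_C(1) << 0)  /* checksum offload */
#define FEAT_GSO  (UINT64_C(1) << 1)  /* segmentation offload */
#define FEAT_SG   (UINT64_C(1) << 2)  /* unrelated feature, left untouched */

/*
 * If the device cannot checksum this packet type, drop the segmentation
 * bit together with the checksum bit.
 */
uint64_t harmonize_features(uint64_t features, bool can_checksum)
{
	if (!can_checksum)
		features &= ~(FEAT_CSUM | FEAT_GSO);
	return features;
}
```

Unrelated feature bits pass through unchanged in both cases.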

Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2016-05-04 04:00:54 +0800

03 May, 2016

3 commits

6071bd1aa netem: Segment GSO packets on enqueue

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[ 788.073771] ---------------------[ cut here ]---------------------------
[ 788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[ 788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[ 788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[ 788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W
------------ 3.10.0-327.el7.x86_64 #1
[ 788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[ 788.542260] ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[ 788.576332] ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[ 788.611943] 0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[ 788.647241] Call Trace:
[ 788.658817] [] dump_stack+0x19/0x1b
[ 788.686193] [] warn_slowpath_common+0x70/0xb0
[ 788.713803] [] warn_slowpath_fmt+0x5c/0x80
[ 788.741314] [] ? ___ratelimit+0x93/0x100
[ 788.767018] [] skb_warn_bad_offload+0xcd/0xda
[ 788.796117] [] skb_checksum_help+0x17c/0x190
[ 788.823392] [] netem_enqueue+0x741/0x7c0 [sch_netem]
[ 788.854487] [] dev_queue_xmit+0x2a8/0x570
[ 788.880870] [] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution, I think, is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.
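
As an illustration only, the "segment, corrupt the first, enqueue the rest" idea can be sketched in user-space C. This is a minimal sketch under stated assumptions: it works on payload lengths only, ignores headers, and both function names are hypothetical rather than taken from sch_netem.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split a payload of payload_len bytes into segments of at most
 * gso_size bytes, writing each segment length into seg_len[].
 * Returns the number of segments, or 0 if they do not fit in max_segs.
 */
size_t segment_payload(size_t payload_len, size_t gso_size,
		       size_t *seg_len, size_t max_segs)
{
	size_t n = 0;

	assert(gso_size > 0);
	while (payload_len > 0) {
		size_t chunk = payload_len < gso_size ? payload_len : gso_size;

		if (n == max_segs)
			return 0;
		seg_len[n++] = chunk;
		payload_len -= chunk;
	}
	return n;
}

/*
 * Mark which segments get corrupted: only the first one. The remaining
 * segments are enqueued unchanged.
 */
void pick_corrupted(int *corrupt, size_t nsegs)
{
	for (size_t i = 0; i < nsegs; i++)
		corrupt[i] = (i == 0);
}
```

With a gso_size of 1448, a 3000-byte payload becomes three segments, and only the first is marked for corruption.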

Tested successfully by myself on the latest net kernel, to which this applies.

Signed-off-by: Neil Horman
CC: Jamal Hadi Salim
CC: "David S. Miller"
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com
CC: stephen@networkplumber.org
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Neil Horman
2016-05-03 12:33:14 +0800
9b40d5aae Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
In this small batch of patches you have:
- a fix for our Distributed ARP Table that makes sure that the input
provided to the hash function during a query is the same as the one
provided during an insert (so to prevent false negatives), by Antonio
Quartulli
- a fix for our new protocol implementation B.A.T.M.A.N. V that ensures
that a hard interface is properly re-activated when it is brought down
and then up again, by Antonio Quartulli
- two fixes respectively to the reference counting of the tt_local_entry
and neigh_node objects, by Sven Eckelmann. Such a bug is rather severe
as it would prevent the netdev objects referenced by batman-adv from
being released after shutdown.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-05-03 12:17:38 +0800
9c5d1bc2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips,
from Sara Sharon.

2) Fix SKB size checks in batman-adv stack on receive, from Sven
Eckelmann.

3) Leak fix on mac80211 interface add error paths, from Johannes Berg.

4) Cannot invoke napi_disable() with BH disabled in myri10ge driver,
fix from Stanislaw Gruszka.

5) Fix sign extension problem when computing feature masks in
net_gso_ok(), from Marcelo Ricardo Leitner.

6) lan78xx driver doesn't count packets and packet lengths in its
statistics properly, fix from Woojung Huh.

7) Fix the buffer allocation sizes in pegasus USB driver, from Petko
Manolov.

8) Fix refcount overflows in bpf, from Alexei Starovoitov.

9) Unified dst cache handling introduced a preempt warning in
ip_tunnel, fix by resetting rather than setting the cached route.
From Paolo Abeni.

10) Listener hash collision test fix in soreuseport, from Craig Gallek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
tipc: only process unicast on intended node
cxgb3: fix out of bounds read
net/smscx5xx: use the device tree for mac address
soreuseport: Fix TCP listener hash collision
net: l2tp: fix reversed udp6 checksum flags
ip_tunnel: fix preempt warning in ip tunnel creation/updating
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
drivers: net: cpsw: use of_phy_connect() in fixed-link case
dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
drivers: net: cpsw: fix segfault in case of bad phy-handle
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
gre: reject GUE and FOU in collect metadata mode
pegasus: fixes reported packet length
pegasus: fixes URB buffer allocation size;
...

Linus Torvalds
2016-05-03 00:40:42 +0800

02 May, 2016

4 commits

b7f8fe251 gre: do not pull header in ICMP error processing

iptunnel_pull_header expects that IP header was already pulled; with this
expectation, it pulls the tunnel header. This is not true in gre_err.
Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points
to the IP header.

We cannot pull the tunnel header in this path. It's just a matter of not
calling iptunnel_pull_header - we don't need any of its effects.

Fixes: bda7bb463436 ("gre: Allow multiple protocol listener for gre protocol.")
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2016-05-02 12:19:58 +0800
efe790502 tipc: only process unicast on intended node

We have observed complete lock up of broadcast-link transmission due to
unacknowledged packets never being removed from the 'transmq' queue. This
is traced to nodes having their ack field set beyond the sequence number
of packets that have actually been transmitted to them.
Consider an example where node 1 has sent 10 packets to node 2 on a
link and node 3 has sent 20 packets to node 2 on another link. We
see examples of an ack from node 2 destined for node 3 being treated as
an ack from node 2 at node 1. This leads to the ack on the node 1 to node
2 link being increased to 20 even though we have only sent 10 packets.
When node 1 does get around to sending further packets, none of the
packets with sequence numbers less than 21 are actually removed from the
transmq.
To resolve this we reinstate some code lost in commit d999297c3dbb ("tipc:
reduce locking scope during packet reception") which ensures that only
messages destined for the receiving node are processed by that node. This
prevents the sequence numbers from getting out of sync and resolves the
packet leakage, thereby resolving the broadcast-link transmission
lock-ups we observed.
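
As an illustration only, the destination check can be sketched in user-space C. This is a minimal sketch under stated assumptions: the struct and function names are hypothetical, sequence-number wraparound is ignored, and the real TIPC receive path does much more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link state: highest sequence number acked by the peer. */
struct link_state {
	uint32_t acked;
};

/*
 * Apply an incoming unicast ack only if the message is destined for
 * this node. Returns true if the message was processed.
 */
bool rcv_unicast_ack(struct link_state *l, uint32_t self_node,
		     uint32_t dest_node, uint32_t ack)
{
	assert(l);
	if (dest_node != self_node)
		return false;       /* meant for another node: ignore */
	if (ack > l->acked)
		l->acked = ack;
	return true;
}
```

In the example from the text, an ack of 20 destined for node 3 is dropped at node 1, so the ack state of the node 1 to node 2 link stays at or below the 10 packets actually sent.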

While we are aware that this change only patches over a root problem that
we still haven't identified, this is a sanity test that it is always
legitimate to do. It will remain in the code even after we identify and
fix the real problem.

Reviewed-by: Chris Packham
Reviewed-by: John Thompson
Signed-off-by: Hamish Martin
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Hamish Martin
2016-05-02 09:03:30 +0800
90e5d0db2 soreuseport: Fix TCP listener hash collision

I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener bucket.
The same error does not exist in the UDP path.
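
As an illustration only, the missing comparison can be sketched in user-space C. This is a minimal sketch under stated assumptions: the struct and function are hypothetical, and the real group-matching code checks other conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified listener; the kernel's socket structures differ. */
struct listener {
	uint32_t uid;   /* owning user */
	uint16_t port;  /* local listening port */
};

/*
 * Two listeners found in the same hash bucket belong to the same
 * reuseport group only if both the owner and the port match. Leaving
 * out the port comparison is the kind of bug described above.
 */
bool same_reuseport_group(const struct listener *a, const struct listener *b)
{
	assert(a && b);
	return a->uid == b->uid && a->port == b->port;
}
```

Without the port comparison, two sockets of the same user on different ports would be grouped together whenever they share a bucket.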

Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Craig Gallek
2016-05-02 07:36:54 +0800
018f82585 net: l2tp: fix reversed udp6 checksum flags

This patch fixes a bug that makes the handling of udp6 checksums on a
udp6-encapsulated l2tp tunnel the opposite of what the userspace program
requests.

When the flag `L2TP_ATTR_UDP_ZERO_CSUM6_RX` is set by userspace, the udp6
checksums of packets received on the l2tp tunnel being created are
expected to be ignored. In `l2tp_netlink.c`:
`l2tp_nl_cmd_tunnel_create()`, `cfg.udp6_zero_rx_checksums` is set
according to the flag, and then passed to `l2tp_core.c`:
`l2tp_tunnel_create()` and then `l2tp_tunnel_sock_create()`. In
`l2tp_tunnel_sock_create()`, `udp_conf.use_udp6_rx_checksums` is set
to the same value as `cfg.udp6_zero_rx_checksums`. However, if we want the
checksum to be ignored, `udp_conf.use_udp6_rx_checksums` should be set
to `false`, i.e. to the opposite value. The same applies to
`udp_conf.use_udp6_tx_checksums`.
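
As an illustration only, the inversion can be sketched in user-space C. This is a minimal sketch under stated assumptions: the two structs are hypothetical stand-ins that reuse the field names quoted above, `udp6_zero_tx_checksums` is assumed by symmetry with the rx field, and `fill_udp_conf()` is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tunnel configuration (cfg). */
struct tunnel_cfg {
	bool udp6_zero_tx_checksums;  /* assumed by symmetry with rx */
	bool udp6_zero_rx_checksums;
};

/* Hypothetical stand-in for the UDP socket configuration (udp_conf). */
struct udp_port_conf {
	bool use_udp6_tx_checksums;
	bool use_udp6_rx_checksums;
};

/* "Zero checksum" requested means "do not use checksums": invert. */
void fill_udp_conf(const struct tunnel_cfg *cfg, struct udp_port_conf *conf)
{
	assert(cfg && conf);
	conf->use_udp6_tx_checksums = !cfg->udp6_zero_tx_checksums;
	conf->use_udp6_rx_checksums = !cfg->udp6_zero_rx_checksums;
}
```

Copying the values without the negation reproduces the reversed behavior this patch fixes.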

Signed-off-by: Miao Wang
Acked-by: James Chapman
Signed-off-by: David S. Miller

Wang Shanker
2016-05-02 07:32:16 +0800

30 Apr, 2016

1 commit

f27337e16 ip_tunnel: fix preempt warning in ip tunnel creation/updating

After the commit e09acddf873b ("ip_tunnel: replace dst_cache with generic
implementation"), a preemption debug warning is triggered when ip4
tunnels are updated; the dst cache helper needs to be invoked in
unpreemptible context.

We don't need to load the cache on tunnel update, so this commit fixes
the warning by replacing the load with a dst cache reset, which is
preempt safe.

Fixes: e09acddf873b ("ip_tunnel: replace dst_cache with generic implementation")
Reported-by: Eric Dumazet
Signed-off-by: Paolo Abeni
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Paolo Abeni
2016-04-30 02:11:46 +0800

29 Apr, 2016

10 commits

abe59c652 batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node

The batadv_neigh_node was specific to a batadv_hardif_neigh_node and held
an implicit reference to it. But this reference was never stored in the form of
a pointer in the batadv_neigh_node itself. Instead
batadv_neigh_node_release depends on a consistent state of
hard_iface->neigh_list and that batadv_hardif_neigh_get always returns the
batadv_hardif_neigh_node object which it has a reference for. But
batadv_hardif_neigh_get cannot guarantee that because it is working only
with rcu_read_lock on this list. It can therefore happen that a neigh_addr
is in this list twice or that batadv_hardif_neigh_get cannot find the
batadv_hardif_neigh_node for a neigh_addr due to some other list
operations taking place at the same time.

Instead add a batadv_hardif_neigh_node pointer directly in
batadv_neigh_node which will be used for the reference counter decremented
on release of batadv_neigh_node.
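
As an illustration only, holding the reference through a stored pointer can be sketched in user-space C. This is a minimal sketch under stated assumptions: the structs and functions are hypothetical, and the plain integer counter stands in for the kernel's atomic reference counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified objects; not the batman-adv structures. */
struct parent {
	int refcount;
};

struct child {
	struct parent *parent;  /* explicit pointer backing the reference */
};

/* Take a reference on the parent and record it in the new child. */
struct child *child_create(struct parent *p)
{
	struct child *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	p->refcount++;
	c->parent = p;
	return c;
}

/* Drop exactly the reference taken at creation; no list lookup needed. */
void child_release(struct child *c)
{
	assert(c->parent->refcount > 0);
	c->parent->refcount--;
	free(c);
}
```

Because release follows the stored pointer, it no longer depends on a list lookup that can miss or return a duplicate entry.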

Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface")
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Sven Eckelmann
2016-04-29 19:46:11 +0800
a33d970d0 batman-adv: Fix reference counting of vlan object for tt_local_entry

The batadv_tt_local_entry was specific to a batadv_softif_vlan and held an
implicit reference to it. But this reference was never stored in the form of a
pointer in the tt_local_entry itself. Instead batadv_tt_local_remove,
batadv_tt_local_table_free and batadv_tt_local_purge_pending_clients depend
on a consistent state of bat_priv->softif_vlan_list and that
batadv_softif_vlan_get always returns the batadv_softif_vlan object which
it has a reference for. But batadv_softif_vlan_get cannot guarantee that
because it is working only with rcu_read_lock on this list. It can
therefore happen that a vid is in this list twice or that
batadv_softif_vlan_get cannot find the batadv_softif_vlan for a vid due to
some other list operations taking place at the same time.

Instead add a batadv_softif_vlan pointer directly in batadv_tt_local_entry
which will be used for the reference counter decremented on release of
batadv_tt_local_entry.

Fixes: 35df3b298fc8 ("batman-adv: fix TT VLAN inconsistency on VLAN re-add")
Signed-off-by: Sven Eckelmann
Acked-by: Antonio Quartulli
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Sven Eckelmann
2016-04-29 19:46:11 +0800
b6cf5d499 batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event

At the moment there is no explicit reactivation of a hard-interface
upon NETDEV_UP event. In case of B.A.T.M.A.N. IV the interface is
reactivated as soon as the next OGM is scheduled for sending, but this
mechanism does not work with B.A.T.M.A.N. V. The latter does not rely
on the same scheduling mechanism as its predecessor and for this reason
the hard-interface remains deactivated forever after being brought down
once.

This patch fixes the reactivation mechanism by adding a new routing API
which explicitly allows each algorithm to perform any needed operation
upon interface re-activation.

This API is optional and is implemented by B.A.T.M.A.N. V only; it
just takes care of setting the iface status to ACTIVE.

Signed-off-by: Antonio Quartulli
Signed-off-by: Marek Lindner

Antonio Quartulli
2016-04-29 19:46:11 +0800
2871734e8 batman-adv: fix DAT candidate selection (must use vid)

Now that DAT is VLAN aware, it must use the VID when
computing the DHT address of the candidate nodes where
an entry is going to be stored/retrieved.

Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Antonio Quartulli
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner

Antonio Quartulli
2016-04-29 19:46:10 +0800
6fa9bffbc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client

Linus Torvalds
2016-04-29 09:59:24 +0800
946b636f1 gre: reject GUE and FOU in collect metadata mode

The collect metadata mode supports neither GUE nor FOU. This might be
implemented later; until then, we should reject such config.

I think this is okay to be changed. It's unlikely anyone has such
configuration (as it doesn't work anyway) and we may need a way to
distinguish whether it's supported or not by the kernel later.

For backwards compatibility with iproute2, it's not possible to just check
the attribute presence (iproute2 always includes the attribute); the actual
value has to be checked, too.

Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2016-04-29 05:09:37 +0800
2090714e1 gre: build header correctly for collect metadata tunnels

In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
case, the interface is ARPHRD_IPGRE and the protocol to be used for
encapsulation is skb->protocol.

Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc
Acked-by: Pravin B Shelar
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller

Jiri Benc
2016-04-29 05:02:45 +0800
a64b04d86 gre: do not assign header_ops in collect metadata mode

In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel
is incorrectly assumed to be mGRE in NBMA mode (see commit 6a5f44d7a048c).
This is not the case, we're controlling the encapsulation addresses by
lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata
mode does not make sense.

Although it would be more user friendly to reject requests that specify
both the collect metadata flag and a remote/local IP address, this would
break current users of gretap or introduce ugly code and differences in
handling ipgre and gretap configuration. Keep the current behavior of
remote/local IP address being ignored in such case.

v3: Back to v1, added explanation paragraph.
v2: Reject configuration specifying both remote/local address and collect
metadata flag.

Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc
Signed-off-by: David S. Miller

Jiri Benc
2016-04-29 05:02:44 +0800
12395d064 Merge tag 'mac80211-for-davem-2016-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a single fix, for a per-CPU memory leak in a
(root user triggerable) error case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2016-04-29 04:55:26 +0800
956a7ffe0 Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
In this patchset you can find the following fixes:

1) check skb size to avoid reading beyond its border when delivering
payloads, by Sven Eckelmann
2) initialize last_seen time in neigh_node object to prevent cleanup
routine from accidentally purging it, by Marek Lindner
3) release "recently added" slave interfaces upon virtual/batman
interface shutdown, by Sven Eckelmann
4) properly decrease router object reference counter upon routing table
update, by Sven Eckelmann
5) release queue slots when purging OGM packets of deactivating slave
interface, by Linus Lüssing

Patch 2 and 3 have no "Fixes:" tag because the offending commits date
back to when batman-adv was not yet officially in the net tree.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-04-29 04:42:40 +0800

27 Apr, 2016

1 commit

e6436be21 mac80211: fix statistics leak if dev_alloc_name() fails

In the case that dev_alloc_name() fails, e.g. because the name was
given by the user and already exists, we need to clean up properly
and free the per-CPU statistics. Fix that.

Cc: stable@vger.kernel.org
Fixes: 5a490510ba5f ("mac80211: use per-CPU TX/RX statistics")
Signed-off-by: Johannes Berg

Johannes Berg
2016-04-27 16:06:58 +0800

26 Apr, 2016

4 commits

38bd10c44 net: ipv6: Delete host routes on an ifdown

It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similarly to IPv4. As always the devil is in the
details and the IPv6 stack has too many behavioral differences from IPv4
making the simple idea more complicated than it needs to be.

The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:

1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
rt6_fill_node while handling a route dump request.

2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
fib6_del

3. Panic in fib6_purge_rt because rt6i_ref count is not 1.

The root cause of all these is references related to the host route for
an address that is retained.

So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be re-generated on an up there is
no longer a need for the l3mdev fix up. On the 'admin up' side move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once versus on UP and CHANGE events.

All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered this patch takes the conservative approach and evicts
all host routes rather than trying to determine when it can be re-used
and when it can not. That can be a later optimization if desired.

Signed-off-by: David Ahern
Signed-off-by: David S. Miller

David Ahern
2016-04-26 23:48:26 +0800
6a923934c Revert "ipv6: Revert optional address flusing on ifdown."

This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61.

Ok, this puts the feature back. I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.

Signed-off-by: David S. Miller

David S. Miller
2016-04-26 23:47:41 +0800
841645b5f ipv6: Revert optional address flusing on ifdown.

This reverts the following three commits:

70af921db6f8835f4b11c65731116560adb00c14
799977d9aafbf0ca0b9c39b04cbfb16db71302c9
f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac

The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller

David S. Miller
2016-04-26 03:33:55 +0800
6c1ea260f libceph: make authorizer destruction independent of ceph_auth_client

Starting the kernel client with cephx disabled and then enabling cephx
and restarting userspace daemons can result in a crash:

[262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
[262671.531460] IP: [] kfree+0x5a/0x130
[262671.584334] PGD 0
[262671.635847] Oops: 0000 [#1] SMP
[262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
[262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
[262672.268937] Workqueue: ceph-msgr con_work [libceph]
[262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
[262672.428330] RIP: 0010:[] [] kfree+0x5a/0x130
[262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286
[262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
[262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
[262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
[262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
[262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
[262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
[262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
[262673.417556] Stack:
[262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
[262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
[262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
[262673.805230] Call Trace:
[262673.859116] [] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
[262673.968705] [] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
[262674.078852] [] put_osd+0x45/0x80 [libceph]
[262674.134249] [] remove_osd+0xae/0x140 [libceph]
[262674.189124] [] __reset_osd+0x103/0x150 [libceph]
[262674.243749] [] kick_requests+0x223/0x460 [libceph]
[262674.297485] [] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
[262674.350813] [] dispatch+0x4e/0x720 [libceph]
[262674.403312] [] try_read+0x3d1/0x1090 [libceph]
[262674.454712] [] ? dequeue_entity+0x152/0x690
[262674.505096] [] con_work+0xcb/0x1300 [libceph]
[262674.555104] [] process_one_work+0x14e/0x3d0
[262674.604072] [] worker_thread+0x11a/0x470
[262674.652187] [] ? rescuer_thread+0x310/0x310
[262674.699022] [] kthread+0xd2/0xf0
[262674.744494] [] ? kthread_create_on_node+0x1c0/0x1c0
[262674.789543] [] ret_from_fork+0x3f/0x70
[262674.834094] [] ? kthread_create_on_node+0x1c0/0x1c0

What happens is the following:

(1) new MON session is established
(2) old "none" ac is destroyed
(3) new "cephx" ac is constructed
...
(4) old OSD session (w/ "none" authorizer) is put
ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)

osd->o_auth.authorizer in the "none" case is just a bare pointer into
ac, which contains a single static copy for all services. By the time
we get to (4), "none" ac, freed in (2), is long gone. On top of that,
a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
so we end up trying to destroy a "none" authorizer with a "cephx"
destructor operating on invalid memory!

To fix this, decouple authorizer destruction from ac and do away with
a single static "none" authorizer by making a copy for each OSD or MDS
session. Authorizers themselves are independent of ac and so there is
no reason for destroy_authorizer() to be an ac op. Make it an op on
the authorizer itself by turning ceph_authorizer into a real struct.
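
As an illustration only, making destruction an op on the authorizer itself can be sketched in user-space C. This is a minimal sketch under stated assumptions: the structs, the `destroyed_flag` member and all function names are hypothetical and are not the libceph definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified authorizer; not the libceph definition. */
struct authorizer {
	void (*destroy)(struct authorizer *a);  /* op lives on the object */
};

struct none_authorizer {
	struct authorizer base;  /* must be first so the cast below is valid */
	int *destroyed_flag;     /* set on destruction, for demonstration */
};

static void none_destroy(struct authorizer *a)
{
	struct none_authorizer *na = (struct none_authorizer *)a;

	*na->destroyed_flag = 1;
	free(na);
}

/* Each session gets its own copy instead of sharing one static instance. */
struct authorizer *none_create(int *destroyed_flag)
{
	struct none_authorizer *na = malloc(sizeof(*na));

	if (!na)
		return NULL;
	na->base.destroy = none_destroy;
	na->destroyed_flag = destroyed_flag;
	return &na->base;
}

/* The caller needs no auth client to pick the right destructor. */
void authorizer_destroy(struct authorizer *a)
{
	assert(a && a->destroy);
	a->destroy(a);
}
```

Since each object carries its own destructor, replacing the auth client cannot cause a "none" authorizer to be torn down by a "cephx" destructor.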

Fixes: http://tracker.ceph.com/issues/15447

Reported-by: Alan Zhang
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-04-26 02:54:13 +0800

25 Apr, 2016

4 commits

391a20333 ipv4/fib: don't warn when primary address is missing if in_dev is dead

After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids the warning if the in_dev is dead.

Signed-off-by: Paolo Abeni
Signed-off-by: David S. Miller

Paolo Abeni
2016-04-25 11:26:29 +0800
45ebcce56 bridge: mdb: Marking port-group as offloaded

There is a race condition when updating the mdb offload flag without using
the multicast_lock. This reverts commit 9e8430f8d60d98 ("bridge: mdb:
Passing the port-group pointer to br_mdb module").

This patch marks an offloaded MDB entry as "offload" by setting
MDB_PG_FLAGS_OFFLOAD in the port-group flags.

When a switchdev PORT_MDB operation succeeds and adds a multicast group, the
completion callback "br_mdb_complete" is invoked. The completion function
locks the multicast_lock and finds the right net_bridge_port_group and
marks it as offloaded.

Fixes: 9e8430f8d60d98 ("bridge: mdb: Passing the port-group pointer to br_mdb module")
Reported-by: Nikolay Aleksandrov
Signed-off-by: Elad Raz
Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Elad Raz
2016-04-25 02:23:32 +0800
6dd684c0f bridge: mdb: Common function for mdb entry translation

There is duplicate code that translates br_mdb_entry to br_ip; let's wrap it
in a common function.

Signed-off-by: Elad Raz
Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Elad Raz
2016-04-25 02:23:32 +0800
7ceb2afbd switchdev: Adding complete operation to deferred switchdev ops

When using a switchdev deferred operation (SWITCHDEV_F_DEFER), the operation
is executed in a different context and the application doesn't have any way
to get the real status of the operation.

Adding a completion callback fixes that. This patch adds fields to
switchdev_attr and switchdev_obj, including a "complete_priv" field which is
used by the "complete" callback.

An application can set a complete function which will be called once the
operation has executed.
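
As an illustration only, a deferred operation with a completion callback can be sketched in user-space C. This is a minimal sketch under stated assumptions: `struct deferred_op`, its members other than the "complete" and "complete_priv" names quoted above, and all functions are hypothetical, and the callback signature is simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical deferred operation; not the switchdev structures. */
struct deferred_op {
	int (*run)(void *arg);                  /* the operation itself */
	void *arg;
	void (*complete)(int err, void *priv);  /* optional callback */
	void *complete_priv;                    /* passed to complete() */
};

/* Execute the operation later, then report its status if asked to. */
void deferred_op_process(struct deferred_op *op)
{
	int err;

	assert(op && op->run);
	err = op->run(op->arg);
	if (op->complete)
		op->complete(err, op->complete_priv);
}

/* Example operation and callback used to exercise the sketch. */
int op_always_fails(void *arg)
{
	(void)arg;
	return -1;
}

void record_status(int err, void *priv)
{
	*(int *)priv = err;
}
```

A caller that sets `complete` to `record_status` and `complete_priv` to the address of an int finds the status of the operation in that int after `deferred_op_process()` has run.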

Signed-off-by: Elad Raz
Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Acked-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Elad Raz
2016-04-25 02:23:32 +0800

24 Apr, 2016

5 commits

c4fdb6cff batman-adv: Fix broadcast/ogm queue limit on a removed interface

When a single interface is removed while a broadcast or ogm packet is
still pending, the forward packet is freed without releasing the
queue slots again.

This patch fixes this issue.
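
The accounting rule behind this fix can be sketched as follows: whichever path frees a forward packet must give back the queue slot that packet holds. This is a simplified model with hypothetical names (struct fwd_queue, struct fwd_packet, fwd_packet_init(), fwd_packet_free()); the batman-adv code uses its own counters and structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue with a limited number of slots. */
struct fwd_queue {
    int slots_left;
};

/* Hypothetical forward packet; queue is NULL when no slot is held. */
struct fwd_packet {
    struct fwd_queue *queue;
};

/* Take a slot for a new packet, or fail when the queue is full. */
static int fwd_packet_init(struct fwd_packet *pkt, struct fwd_queue *q)
{
    assert(pkt != NULL && q != NULL);
    if (q->slots_left == 0)
        return -1;
    q->slots_left--;
    pkt->queue = q;
    return 0;
}

/* Every free path, including interface removal, returns the slot. */
static void fwd_packet_free(struct fwd_packet *pkt)
{
    assert(pkt != NULL);
    if (pkt->queue) {
        pkt->queue->slots_left++;
        pkt->queue = NULL;
    }
}
```

Tying the release to the free function itself, rather than to the normal send path only, means the limit cannot drift when packets are dropped early.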

Fixes: 6d5808d4ae1b ("batman-adv: Add missing hardif_free_ref in forw_packet_free")
Signed-off-by: Linus Lüssing
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Linus Lüssing
2016-04-24 15:41:56 +0800
d1a65f174 batman-adv: Reduce refcnt of removed router when updating route ... Browse Code »

_batadv_update_route rcu_dereferences orig_ifinfo->router outside of a
spinlock protected region to print some information messages to the debug
log. But this pointer is not checked again when the new pointer is assigned
in the spinlock protected region. Thus it can happen that the value of
orig_ifinfo->router changed in the meantime and thus the reference counter
of the wrong router gets reduced after the spinlock protected region.

Just rcu_dereferencing the value of orig_ifinfo->router inside the spinlock
protected region (which also sets the new pointer) is enough to get the
correct old router object.
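
A simplified user-space sketch of the fix: the old pointer is read in the same critical section that installs the new one, so the reference that gets dropped always belongs to the object that was actually replaced. The types and functions (struct router, struct orig_ifinfo, swap_router(), update_route()) are hypothetical models, with a C11 atomic_flag spin lock standing in for the spinlock and plain integers standing in for the kernel reference counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical router object with a plain reference counter. */
struct router {
    int refcount;
};

struct orig_ifinfo {
    atomic_flag    lock;     /* stand-in for the spinlock */
    struct router *router;   /* currently selected router, may be NULL */
};

/* Install new_router and return the router it replaced. */
static struct router *swap_router(struct orig_ifinfo *info,
                                  struct router *new_router)
{
    struct router *old;

    assert(info != NULL);
    while (atomic_flag_test_and_set(&info->lock))
        ; /* spin until the lock is free */

    old = info->router;          /* read inside the critical section */
    info->router = new_router;
    if (new_router)
        new_router->refcount++;  /* reference now held by info->router */

    atomic_flag_clear(&info->lock);
    return old;
}

/* Drop the reference of the router that was really replaced. */
static void update_route(struct orig_ifinfo *info, struct router *new_router)
{
    struct router *old = swap_router(info, new_router);

    if (old)
        old->refcount--;
}
```

Any pointer read before taking the lock is used for logging at most; it never decides whose reference counter is reduced.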

Fixes: e1a5382f978b ("batman-adv: Make orig_node->router an rcu protected pointer")
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Sven Eckelmann
2016-04-24 15:41:25 +0800
f2d23861b batman-adv: Deactivate TO_BE_ACTIVATED hardif on shutdown ... Browse Code »

The shutdown of a batman-adv interface can happen with one of its slave
interfaces still being in the BATADV_IF_TO_BE_ACTIVATED state. A possible
reason for it is that the routing algorithm BATMAN_V was selected and
batadv_schedule_bat_ogm was not yet called for this interface. This slave
interface still has to be set to BATADV_IF_INACTIVE or the batman-adv
interface will never reduce its usage counter and thus never gets shut down.

This problem can be simulated via:

$ modprobe dummy
$ modprobe batman-adv routing_algo=BATMAN_V
$ ip link add bat0 type batadv
$ ip link set dummy0 master bat0
$ ip link set dummy0 up
$ ip link del bat0
unregister_netdevice: waiting for bat0 to become free. Usage count = 3

Reported-by: Matthias Schiffer
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Sven Eckelmann
2016-04-24 15:40:23 +0800
e48474ed8 batman-adv: init neigh node last seen field ... Browse Code »

Signed-off-by: Marek Lindner
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann
Signed-off-by: Antonio Quartulli

Marek Lindner
2016-04-24 15:39:19 +0800
c78296665 batman-adv: Check skb size before using encapsulated ETH+VLAN header ... Browse Code »

The encapsulated ethernet and VLAN header may be outside the received
ethernet frame. Thus the skb buffer size has to be checked before it can be
parsed to find out if it encapsulates another batman-adv packet.
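
The check can be illustrated with a small user-space sketch that works on a raw byte buffer instead of an skb. The function encap_is_batman() and its helper are hypothetical; the header sizes and the 0x8100 VLAN ethertype are standard Ethernet values, and 0x4305 is the ETH_P_BATMAN value from the kernel headers. Each header is read only after the remaining length has been verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN      14u      /* dst MAC + src MAC + ethertype */
#define VLAN_TAG_LEN      4u      /* 802.1Q tag */
#define ETHERTYPE_VLAN   0x8100u
#define ETHERTYPE_BATMAN 0x4305u  /* ETH_P_BATMAN */

/* Read a big-endian 16-bit value from the buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return 1 if the encapsulated frame carries batman-adv, 0 otherwise.
 * A buffer that is too short for the headers is treated as "no". */
static int encap_is_batman(const uint8_t *payload, size_t len)
{
    size_t hdr_len = ETH_HDR_LEN;
    uint16_t proto;

    assert(payload != NULL);

    /* The ethernet header must fit before its ethertype is read. */
    if (len < hdr_len)
        return 0;
    proto = read_be16(payload + 12);

    if (proto == ETHERTYPE_VLAN) {
        /* The VLAN tag must fit as well before looking behind it. */
        hdr_len += VLAN_TAG_LEN;
        if (len < hdr_len)
            return 0;
        proto = read_be16(payload + 16);
    }

    return proto == ETHERTYPE_BATMAN;
}
```

A truncated frame is rejected by the length checks instead of being read past the end of the buffer.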

Fixes: 420193573f11 ("batman-adv: softif bridge loop avoidance")
Signed-off-by: Sven Eckelmann
Signed-off-by: Marek Lindner
Signed-off-by: Antonio Quartulli

Sven Eckelmann
2016-04-24 15:37:21 +0800

22 Apr, 2016

4 commits

c5edde3a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix memory leak in iwlwifi, from Matti Gottlieb.

2) Add missing registration of netfilter arp_tables into initial
namespace, from Florian Westphal.

3) Fix potential NULL deref in DecNET routing code.

4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
Ivanov.

5) Fix dst ref counting in VRF, from David Ahern.

6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.

7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.

8) Revalidate IPV6 datagram socket cached routes properly, particularly
with UDP, from Martin KaFai Lau.

9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.

10) Fix stats typing in bcmgenet driver, from Eric Dumazet.

11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handling,
from Joe Stringer.

12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.

13) atl2 doesn't actually support scatter-gather, so don't advertise the
feature. From Ben Hutchings.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
openvswitch: use flow protocol when recalculating ipv6 checksums
Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
atl2: Disable unimplemented scatter/gather feature
net/mlx4_en: Split SW RX dropped counter per RX ring
net/mlx4_core: Don't allow to VF change global pause settings
net/mlx4_core: Avoid repeated calls to pci enable/disable
net/mlx4_core: Implement pci_resume callback
net: phy: spi_ks8895: Don't leak references to SPI devices
net: ethernet: davinci_emac: Fix platform_data overwrite
net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
qede: Fix single MTU sized packet from firmware GRO flow
qede: Fix setting Skb network header
qede: Fix various memory allocation error flows for fastpath
tcp: Merge tx_flags and tskey in tcp_shifted_skb
tcp: Merge tx_flags and tskey in tcp_collapse_retrans
drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
openvswitch: Orphan skbs before IPv6 defrag
Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
VSOCK: Only check error on skb_recv_datagram when skb is NULL
...

Linus Torvalds
2016-04-22 03:57:34 +0800
b4f70527f openvswitch: use flow protocol when recalculating ipv6 checksums ... Browse Code »

When using masked actions, the ipv6_proto field of an action
to set IPv6 fields may be zero rather than the prevailing protocol,
which will result in skipping checksum recalculation.

This patch resolves the problem by relying on the protocol
in the flow key rather than that in the set field action.

Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.")
Cc: Jarno Rajahalme
Signed-off-by: Simon Horman
Signed-off-by: David S. Miller

Simon Horman
2016-04-22 03:28:47 +0800
cfea5a688 tcp: Merge tx_flags and tskey in tcp_shifted_skb ... Browse Code »

After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible. tx_flags and tskey also have to be
merged.

This patch reuses tcp_skb_collapse_tstamp() to handle
them.

BPF Output Before:
~~~~~

BPF Output After:
~~~~~
-2024 [007] d.s. 88.644374: : ee_data:14599

Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Neal Cardwell
Cc: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Cc: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Tested-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-22 02:40:55 +0800
082ac2d51 tcp: Merge tx_flags and tskey in tcp_collapse_retrans ... Browse Code »

If two skbs are merged/collapsed during retransmission, the current
logic does not merge the tx_flags and tskey. The end result is
the SCM_TSTAMP_ACK timestamp could be missing for a packet.

The patch:
1. Merges the tx_flags
2. Overwrites the prev_skb's tskey with the next_skb's tskey
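
The two steps can be sketched on a reduced model of the per-skb timestamp state. struct tx_info, the TX_TSTAMP_* flag values and collapse_tstamp() are hypothetical stand-ins chosen for illustration; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamp request flags. */
#define TX_TSTAMP_SW   0x1u
#define TX_TSTAMP_ACK  0x2u
#define TX_TSTAMP_ANY  (TX_TSTAMP_SW | TX_TSTAMP_ACK)

/* Reduced model of the timestamp state attached to one skb. */
struct tx_info {
    uint8_t  tx_flags;
    uint32_t tskey;
};

/* Fold the timestamp state of next into prev when the two are collapsed. */
static void collapse_tstamp(struct tx_info *prev, const struct tx_info *next)
{
    uint8_t tsflags;

    assert(prev != NULL && next != NULL);
    tsflags = next->tx_flags & TX_TSTAMP_ANY;

    if (tsflags) {
        prev->tx_flags |= tsflags;   /* 1. merge the tx_flags */
        prev->tskey = next->tskey;   /* 2. take over the later tskey */
    }
}
```

When next carries no timestamp request, prev is left untouched, so a collapse does not disturb a key that prev already holds.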

BPF Output Before:
~~~~~~

BPF Output After:
~~~~~~
packetdrill-2092 [001] d.s. 453.998486: : ee_data:1459

Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792
0.100 > S. 0:0(0) ack 1
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
0.200 write(4, ..., 11680) = 11680
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0

0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1

0.300 < . 1:1(0) ack 1 win 257
0.300 < . 1:1(0) ack 1 win 257
0.300 < . 1:1(0) ack 1 win 257
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 13141 win 257

0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2

Signed-off-by: Martin KaFai Lau
Cc: Eric Dumazet
Cc: Neal Cardwell
Cc: Soheil Hassas Yeganeh
Cc: Willem de Bruijn
Cc: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Tested-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller

Martin KaFai Lau
2016-04-22 02:40:55 +0800