Eric Lee / smarc-fsl-linux-kernel

12 Feb, 2016

1 commit

5de6ac75d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix BPF handling of branch offset adjustments on backjumps, from
Daniel Borkmann.

2) Make sure selinux knows about SOCK_DESTROY netlink messages, from
Lorenzo Colitti.

3) Fix openvswitch tunnel mtu regression, from David Wragg.

4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric
Dumazet.

5) Fix SCTP user hmacid byte ordering bug, from Xin Long.

6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov
Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
geneve: Relax MTU constraints
vxlan: Relax MTU constraints
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities.
selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
sctp: translate network order to host order when users get a hmacid
enic: increment devcmd2 result ring in case of timeout
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
net:Add sysctl_max_skb_frags
tcp: do not drop syn_recv on all icmp reports
ipv6: fix a lockdep splat
unix: correctly track in-flight fds in sending process user_struct
update be2net maintainers' email addresses
dwc_eth_qos: Reset hardware before PHY start
ipv6: addrconf: Fix recursive spin lock call

Linus Torvalds
2016-02-12 03:00:34 +0800

10 Feb, 2016

1 commit

7e059158d vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices

Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
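
As a rough illustration of the arithmetic, the sketch below subtracts a per-tunnel encapsulation overhead from the maximum IP packet size of 65535 bytes. The helper name and the 50-byte overhead mentioned afterwards are assumptions made for illustration, not values taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define IP_MAX_PACKET 65535u   /* limit of the 16-bit IPv4 total length field */

/* Hypothetical helper: largest inner packet that still fits once
 * `overhead` bytes of encapsulation headers are added. */
static uint32_t max_tunnel_mtu(uint32_t overhead)
{
    assert(overhead <= IP_MAX_PACKET);
    return IP_MAX_PACKET - overhead;
}
```

With a hypothetical 50-byte overhead, max_tunnel_mtu(50) yields 65485.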

Signed-off-by: David Wragg
Signed-off-by: David S. Miller

David Wragg
2016-02-10 18:50:03 +0800

09 Feb, 2016

4 commits

461547f31 flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.

v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
cause us to check for the flow label.

Reported-by: Sowmini Varadhan
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2016-02-09 20:07:48 +0800

7a84bd466 sctp: translate network order to host order when users get a hmacid

Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte order when setting a hmacid,
but the same issue also exists when getting a hmacid.

We fix it by changing hmacids to host order when users get them with
getsockopt.
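
The conversion can be pictured with a small user-space sketch. It assumes the identifiers are 16-bit values kept in network byte order; the helper name is hypothetical and this is not the kernel code itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy HMAC identifiers stored in network byte
 * order into a caller-supplied buffer in host byte order. */
static void hmacids_to_host(const uint16_t *net_ids, uint16_t *host_ids, size_t n)
{
    assert(net_ids != NULL && host_ids != NULL);
    for (size_t i = 0; i < n; i++)
        host_ids[i] = ntohs(net_ids[i]);
}
```

On a big-endian host ntohs() is a no-op, which is why a missing conversion of this kind only shows up on little-endian machines.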

Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-02-09 17:53:16 +0800

5f74f82ea net:Add sysctl_max_skb_frags

Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.

Signed-off-by: Hans Westgaard Ry
Reviewed-by: Håkon Bugge
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Hans Westgaard Ry
2016-02-09 17:28:06 +0800

9cf749036 tcp: do not drop syn_recv on all icmp reports

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

A specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: Petr Novopashenniy
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-02-09 17:15:37 +0800

08 Feb, 2016

2 commits

44c3d0c1c ipv6: fix a lockdep splat

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

First one should use rcu_dereference_protected() as we own the spinlock.

Second one should be a normal assignment, as no barrier is needed.

Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2016-02-08 23:33:32 +0800

415e3d3e9 unix: correctly track in-flight fds in sending process user_struct

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrarily
deplete the original file-opener's resource limit for the maximum of
open files. Instead, the sending process and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann
Cc: David Herrmann
Cc: Willy Tarreau
Cc: Linus Torvalds
Suggested-by: Linus Torvalds
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2016-02-08 23:30:42 +0800

06 Feb, 2016

2 commits

16186a82d ipv6: addrconf: Fix recursive spin lock call

An RCU stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Back trace below -

INFO: rcu_preempt self-detected stall on CPU { 1} (t=2100 jiffies
g=3992 c=3991 q=4471)
Task dump for CPU 1:
kworker/1:0 R running task 12168 15 2 0x00000002
Workqueue: ipv6_addrconf addrconf_dad_work
Call trace:
[] el1_irq+0x68/0xdc
[] _raw_write_lock_bh+0x20/0x30
[] __ipv6_dev_ac_inc+0x64/0x1b4
[] addrconf_join_anycast+0x9c/0xc4
[] __ipv6_ifa_notify+0x160/0x29c
[] ipv6_ifa_notify+0x50/0x70
[] addrconf_dad_work+0x314/0x334
[] process_one_work+0x244/0x3fc
[] worker_thread+0x2f8/0x418
[] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet
Cc: Erik Kline
Cc: Hannes Frederic Sowa
Signed-off-by: Subash Abhinov Kasiviswanathan
Acked-by: Hannes Frederic Sowa
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

subashab@codeaurora.org
2016-02-06 16:08:15 +0800

5d6a6a75e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph fixes from Sage Weil:
"We have a few wire protocol compatibility fixes, ports of a few recent
CRUSH mapping changes, and a couple error path fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: MOSDOpReply v7 encoding
libceph: advertise support for TUNABLES5
crush: decode and initialize chooseleaf_stable
crush: add chooseleaf_stable tunable
crush: ensure take bucket value is valid
crush: ensure bucket id is valid before indexing buckets array
ceph: fix snap context leak in error path
ceph: checking for IS_ERR instead of NULL

Linus Torvalds
2016-02-06 11:52:57 +0800

05 Feb, 2016

5 commits

b0b31a8ff libceph: MOSDOpReply v7 encoding

Empty request_redirect_t (struct ceph_request_redirect in the kernel
client) is now encoded with a bool. NEW_OSDOPREPLY_ENCODING feature
bit overlaps with already supported CRUSH_TUNABLES5.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-02-05 01:26:08 +0800

b9b519b78 crush: decode and initialize chooseleaf_stable

Also add missing \n while at it.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-02-05 01:25:58 +0800

dc6ae6d8e crush: add chooseleaf_stable tunable

Add a tunable to fix the bug that chooseleaf may cause unnecessary pg
migrations when some device fails.

Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-02-05 01:25:55 +0800

56a4f3091 crush: ensure take bucket value is valid

Ensure that the take argument is a valid bucket ID before indexing the
buckets array.

Reflects ceph.git commit 93ec538e8a667699876b72459b8ad78966d89c61.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-02-05 01:25:50 +0800

f224a6915 crush: ensure bucket id is valid before indexing buckets array

We were indexing the buckets array without verifying the index was
within the [0,max_buckets) range. This could happen because
a multistep rule does not have enough buckets and has CRUSH_ITEM_NONE
for an intermediate result, which would feed in CRUSH_ITEM_NONE and
make us crash.
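
A minimal sketch of such a range check follows. It assumes, for illustration only, that a bucket with negative id b is stored at index -1 - b and that the "no item" sentinel is 0x7fffffff; the helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ITEM_NONE 0x7fffffff   /* assumed sentinel meaning "no item" */

/* Hypothetical helper: true only if `id` maps to an index inside
 * [0, max_buckets). Assumed mapping: bucket id b lives at -1 - b. */
static bool bucket_index_valid(int32_t id, int32_t max_buckets)
{
    int64_t idx = -1 - (int64_t)id;   /* widened so the subtraction cannot overflow */

    return idx >= 0 && idx < max_buckets;
}
```

Under these assumptions the sentinel maps to a negative index and is rejected before the array is touched.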

Reflects ceph.git commit 976a24a326da8931e689ee22fce35feab5b67b76.

Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil

Ilya Dryomov
2016-02-05 01:25:23 +0800

02 Feb, 2016

1 commit

34229b277 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.

1) Fix on-channel cancellation in mac80211, from Johannes Berg.

2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.

3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.

4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.

5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.

6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.

7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.

8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel

9) Handle fragments properly in ipv4 early socket demux, from Eric
Dumazet.

10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...

Linus Torvalds
2016-02-02 07:56:08 +0800

31 Jan, 2016

1 commit

53729eb17 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2016-01-30

Here's a set of important Bluetooth fixes for the 4.5 kernel:

- Two fixes to 6LoWPAN code (one fixing a potential crash)
- Fix LE pairing with devices using both public and random addresses
- Fix allocation of dynamic LE PSM values
- Fix missing COMPATIBLE_IOCTL for UART line discipline

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-01-31 07:32:42 +0800

30 Jan, 2016

10 commits

99b4dd9f2 tcp: avoid cwnd undo after receiving ECN

RFC 4015 section 3.4 says the TCP sender MUST refrain from
reversing the congestion control state when the ACK signals
congestion through the ECN-Echo flag. Currently we may not
always do that when prior_ssthresh is reset upon receiving
ACKs with ECE marks. This patch fixes that.

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
2016-01-30 15:03:56 +0800

3d45296ab irda: fix a potential use-after-free in ircomm_param_request

self->ctrl_skb is protected by self->spinlock, so we should not
access it outside the lock. Move the debugging printk inside.

Reported-by: Dmitry Vyukov
Cc: Samuel Ortiz
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-01-30 14:56:46 +0800

1cdda9187 ipv6/udp: use sticky pktinfo egress ifindex on connect()

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Paolo Abeni
2016-01-30 12:31:26 +0800

6f21c96a7 ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

The current implementation of ip6_dst_lookup_tail basically
ignores the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modifies
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() now becomes a static inline function built on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules now need a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Acked-by: David Ahern
Signed-off-by: David S. Miller

Paolo Abeni
2016-01-30 12:31:26 +0800

aa3a02209 netlink: not trim skb for mmaped socket when dump

We should not trim the skb for an mmaped socket, since its buffer size is
fixed and userspace reads it as a frame whose data equals head. An mmaped
socket does not call recvmsg, which means max_recvmsg_len is 0, and
skb_reserve was not called before commit db65a3aaf29e.

Fixes: db65a3aaf29e (netlink: Trim skb to alloc size to avoid MSG_TRUNC)
Signed-off-by: Ken-ichirou MATSUZAWA
Signed-off-by: David S. Miller

Ken-ichirou MATSUZAWA
2016-01-30 12:25:17 +0800

a5829f536 fib_trie: Fix shift by 32 in fib_table_lookup

The fib_table_lookup function had a shift by 32 that triggered a UBSAN
warning. This was due to the fact that I had placed the shift first and
then followed it with the check for the suffix length to ignore the
undefined behavior. If we reorder this so that we verify the suffix is
less than 32 before shifting the value we can avoid the issue.
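
The reordering can be sketched in plain C as below. Shifting a 32-bit value by 32 or more is undefined behavior, so the width check has to run before the shift. The helper name is hypothetical and the sketch only mirrors the ordering idea, not the actual fib_table_lookup() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: report whether any bit above the low
 * `suffix_len` bits of `index` is set. The length is checked first,
 * so the shift is only evaluated when it is well defined. */
static bool bits_above_suffix(uint32_t index, unsigned int suffix_len)
{
    if (suffix_len >= 32)
        return false;   /* the whole word is suffix; nothing lies above it */

    return (index >> suffix_len) != 0;
}
```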

Reported-by: Toralf Förster
Signed-off-by: Alexander Duyck
Signed-off-by: David S. Miller

Alexander Duyck
2016-01-30 11:41:00 +0800

52b79e2bd ipv4: ipconfig: avoid unused ic_proto_used symbol

When CONFIG_PROC_FS, CONFIG_IP_PNP_BOOTP, CONFIG_IP_PNP_DHCP and
CONFIG_IP_PNP_RARP are all disabled, we get a warning about the
ic_proto_used variable being unused:

net/ipv4/ipconfig.c:146:12: error: 'ic_proto_used' defined but not used [-Werror=unused-variable]

This avoids the warning, by making the definition conditional on
whether a dynamic IP configuration protocol is configured. If not,
we know that the value is always zero, so we can optimize away the
variable and all code that depends on it.

Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2016-01-30 11:39:09 +0800

df3eb6cd6 net_sched: drr: check for NULL pointer in drr_dequeue

There are cases where qdisc_dequeue_peeked can return NULL, and the result
is dereferenced later on in the function.

Similarly to the other qdisc dequeue functions, check whether the skb
pointer is NULL and if it is, goto out.
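
The pattern looks roughly like the user-space sketch below. The queue and packet types and the peek helper are hypothetical stand-ins, not the qdisc structures; only the "check for NULL, then goto out" shape is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a packet and a one-slot queue. */
struct pkt { unsigned int len; };
struct queue { struct pkt *head; unsigned int deficit; };

/* May return NULL when nothing is queued. */
static struct pkt *queue_peek(struct queue *q)
{
    return q->head;
}

static struct pkt *queue_dequeue(struct queue *q)
{
    struct pkt *p = queue_peek(q);

    if (p == NULL)
        goto out;               /* nothing to send: skip every use of p */

    if (p->len > q->deficit) {  /* safe: p is known to be non-NULL here */
        p = NULL;
        goto out;
    }
    q->deficit -= p->len;
    q->head = NULL;
out:
    return p;
}
```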

Signed-off-by: Bernie Harris
Reviewed-by: Cong Wang
Signed-off-by: David S. Miller

Bernie Harris
2016-01-30 09:26:44 +0800

4d5cfcba2 tipc: fix connection abort during subscription cancel

In 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
to events")', we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.

Unfortunately, the same function tipc_subscrp_create() handles
subscription cancel request. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.

In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().
Fixes: commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")

Reviewed-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-01-30 04:14:21 +0800

63e51b6a2 ipv4: early demux should be aware of fragments

We should not assume a valid protocol header is present,
as this is not the case for IPv4 fragments.

Let's avoid extra cache line misses and potential bugs
if we actually find a socket and incorrectly use its dst.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-30 04:14:20 +0800

29 Jan, 2016

12 commits

cff10ce7b Bluetooth: Fix incorrect removing of IRKs

The commit cad20c278085d893ebd616cd20c0747a8e9d53c7 was supposed to
fix handling of devices first using public addresses and then
switching to RPAs after pairing. Unfortunately it missed a couple of
key places in the code.

1. When evaluating which devices should be removed from the existing
white list we also need to consider whether we have an IRK for them or
not, i.e. a call to hci_find_irk_by_addr() is needed.

2. In smp_notify_keys() we should not be requiring the knowledge of
the RPA, but should simply keep the IRK around if the other conditions
require it.

Signed-off-by: Johan Hedberg
Signed-off-by: Marcel Holtmann
Cc: stable@vger.kernel.org # 4.4+

Johan Hedberg
2016-01-29 18:47:24 +0800

a2342c5fe Bluetooth: L2CAP: Fix setting chan src info before adding PSM/CID

At least the l2cap_add_psm() routine depends on the source address
type being properly set to know what auto-allocation ranges to use, so
the assignment to l2cap_chan needs to happen before this.

Signed-off-by: Johan Hedberg
Signed-off-by: Marcel Holtmann

Johan Hedberg
2016-01-29 18:47:24 +0800

92594a511 Bluetooth: L2CAP: Fix auto-allocating LE PSM values

The LE dynamic PSM range is different from BR/EDR (0x0080 - 0x00ff)
and doesn't have requirements relating to parity, so separate checks
are needed.

Signed-off-by: Johan Hedberg
Signed-off-by: Marcel Holtmann

Johan Hedberg
2016-01-29 18:47:24 +0800

114f9f1e0 Bluetooth: L2CAP: Introduce proper defines for PSM ranges

Having proper defines makes the code a bit more readable; it also avoids
duplicating hard-coded values since these are also needed when
auto-allocating PSM values (in a subsequent patch).

Signed-off-by: Johan Hedberg
Signed-off-by: Marcel Holtmann

Johan Hedberg
2016-01-29 18:47:24 +0800

ff5d74977 tcp: beware of alignments in tcp_get_info()

With some combinations of user provided flags in netlink command,
it is possible to call tcp_get_info() with a buffer that is not 8-byte
aligned.

It does matter on some arches, so we need to use put_unaligned() to
store the u64 fields.
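
A user-space analogue of an unaligned 64-bit store is sketched below. It uses memcpy(), which makes no alignment assumption about the destination, as a stand-in for the kernel's put_unaligned(); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 64-bit value at an address that may not
 * be 8-byte aligned. A direct *(uint64_t *)dst = val could fault on
 * architectures that require natural alignment. */
static void store_u64_unaligned(void *dst, uint64_t val)
{
    memcpy(dst, &val, sizeof(val));
}

/* Matching load, again without any alignment requirement on src. */
static uint64_t load_u64_unaligned(const void *src)
{
    uint64_t val;

    memcpy(&val, src, sizeof(val));
    return val;
}
```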

Current iproute2 package does not trigger this particular issue.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2016-01-29 14:49:30 +0800

4f2c6ae5c switchdev: Require RTNL mutex to be held when sending FDB notifications

When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Ido Schimmel
2016-01-29 08:21:31 +0800

bd7c5e315 Merge tag 'mac80211-for-davem-2016-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Here's a first set of fixes for the 4.5-rc cycle:
* make regulatory messages much less verbose by default
* various remain-on-channel fixes
* scheduled scanning fixes with hardware restart
* a PS-Poll handling fix; was broken just recently
* bugfix to avoid buffering non-bufferable MMPDUs
* world regulatory domain data fix
* a fix for scanning causing other work to get stuck
* hwsim: revert an older problematic patch that caused some
userspace tools to have issues - not that big a deal as
it's a debug only driver though
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2016-01-29 08:05:45 +0800

d88270eef tcp: fix tcp_mark_head_lost to check skb len before fragmenting

This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.

For example, suppose the sender sends 4 1-byte packets and has the
last 3 packets SACKed. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.

This sounds very counterintuitive, but as the commit description for
the related commit c0638c247f55 ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c247f55 commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.

The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is by-passed because FACK is disabled.
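
The decision can be sketched as a small length calculation. The helper below is hypothetical and only illustrates the idea of comparing the requested prefix against the real skb length before splitting; it is not the code in tcp_mark_head_lost().

```c
#include <assert.h>

/* Hypothetical helper: number of bytes to mark lost when `pkts`
 * packets of `mss` bytes are wanted from an skb of `skb_len` bytes.
 * If the requested prefix does not fit, the entire skb is marked lost
 * and no fragmenting is attempted. */
static unsigned int lost_prefix_len(unsigned int skb_len, unsigned int mss,
                                    unsigned int pkts)
{
    unsigned long long want = (unsigned long long)pkts * mss;

    return want < skb_len ? (unsigned int)want : skb_len;
}
```

For the 3-byte skb with pcount = 3 described above, asking for 2 packets at an assumed MSS of 1460 gives 3, that is, the whole skb.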

Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Neal Cardwell
2016-01-29 08:02:48 +0800

8282f2744 inet: frag: Always orphan skbs inside ip_defrag()

Later parts of the stack (including fragmentation) expect that there is
never a socket attached to frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.

While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.

kernel BUG at net/ipv4/ip_output.c:586!
[...]
Call Trace:

[] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
[] ovs_fragment+0xcc/0x214 [openvswitch]
[] ? dst_discard_out+0x20/0x20
[] ? dst_ifdown+0x80/0x80
[] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
[] ? mod_timer_pending+0x65/0x210
[] ? __lock_acquire+0x3db/0x1b90
[] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
[] ? __lock_is_held+0x54/0x70
[] do_output.isra.29+0xe3/0x1b0 [openvswitch]
[] do_execute_actions+0xe11/0x11f0 [openvswitch]
[] ? __lock_is_held+0x54/0x70
[] ovs_execute_actions+0x32/0xd0 [openvswitch]
[] ovs_dp_process_packet+0x85/0x140 [openvswitch]
[] ? __lock_is_held+0x54/0x70
[] ovs_execute_actions+0xb2/0xd0 [openvswitch]
[] ovs_dp_process_packet+0x85/0x140 [openvswitch]
[] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
[] ovs_vport_receive+0x5d/0xa0 [openvswitch]
[] ? __lock_acquire+0x3db/0x1b90
[] ? __lock_acquire+0x3db/0x1b90
[] ? __lock_acquire+0x3db/0x1b90
[] ? internal_dev_xmit+0x5/0x140 [openvswitch]
[] internal_dev_xmit+0x6c/0x140 [openvswitch]
[] ? internal_dev_xmit+0x5/0x140 [openvswitch]
[] dev_hard_start_xmit+0x2b9/0x5e0
[] ? netif_skb_features+0xd1/0x1f0
[] __dev_queue_xmit+0x800/0x930
[] ? __dev_queue_xmit+0x50/0x930
[] ? mark_held_locks+0x71/0x90
[] ? neigh_resolve_output+0x106/0x220
[] dev_queue_xmit+0x10/0x20
[] neigh_resolve_output+0x178/0x220
[] ? ip_finish_output2+0x1ff/0x590
[] ip_finish_output2+0x1ff/0x590
[] ? ip_finish_output2+0x7e/0x590
[] ip_do_fragment+0x831/0x8a0
[] ? ip_copy_metadata+0x1b0/0x1b0
[] ip_fragment.constprop.49+0x43/0x80
[] ip_finish_output+0x17c/0x340
[] ? nf_hook_slow+0xe4/0x190
[] ip_output+0x70/0x110
[] ? ip_fragment.constprop.49+0x80/0x80
[] ip_local_out+0x39/0x70
[] ip_send_skb+0x19/0x40
[] ip_push_pending_frames+0x33/0x40
[] icmp_push_reply+0xea/0x120
[] icmp_reply.constprop.23+0x1ed/0x230
[] icmp_echo.part.21+0x4e/0x50
[] ? __lock_is_held+0x54/0x70
[] ? rcu_read_lock_held+0x5e/0x70
[] icmp_echo+0x36/0x70
[] icmp_rcv+0x271/0x450
[] ip_local_deliver_finish+0x127/0x3a0
[] ? ip_local_deliver_finish+0x41/0x3a0
[] ip_local_deliver+0x60/0xd0
[] ? ip_rcv_finish+0x560/0x560
[] ip_rcv_finish+0xdd/0x560
[] ip_rcv+0x283/0x3e0
[] ? match_held_lock+0x192/0x200
[] ? inet_del_offload+0x40/0x40
[] __netif_receive_skb_core+0x392/0xae0
[] ? process_backlog+0x8e/0x230
[] ? mark_held_locks+0x71/0x90
[] __netif_receive_skb+0x18/0x60
[] process_backlog+0x78/0x230
[] ? process_backlog+0xdd/0x230
[] net_rx_action+0x155/0x400
[] __do_softirq+0xcc/0x420
[] ? ip_finish_output2+0x217/0x590
[] do_softirq_own_stack+0x1c/0x30

[] do_softirq+0x4e/0x60
[] __local_bh_enable_ip+0xa8/0xb0
[] ip_finish_output2+0x240/0x590
[] ? ip_do_fragment+0x831/0x8a0
[] ip_do_fragment+0x831/0x8a0
[] ? ip_copy_metadata+0x1b0/0x1b0
[] ip_fragment.constprop.49+0x43/0x80
[] ip_finish_output+0x17c/0x340
[] ? nf_hook_slow+0xe4/0x190
[] ip_output+0x70/0x110
[] ? ip_fragment.constprop.49+0x80/0x80
[] ip_local_out+0x39/0x70
[] ip_send_skb+0x19/0x40
[] ip_push_pending_frames+0x33/0x40
[] raw_sendmsg+0x7d3/0xc30
[] ? __lock_acquire+0x3db/0x1b90
[] ? inet_sendmsg+0xc7/0x1d0
[] ? __lock_is_held+0x54/0x70
[] inet_sendmsg+0x10a/0x1d0
[] ? inet_sendmsg+0x5/0x1d0
[] sock_sendmsg+0x38/0x50
[] ___sys_sendmsg+0x25f/0x270
[] ? handle_mm_fault+0x8dd/0x1320
[] ? _raw_spin_unlock+0x27/0x40
[] ? __do_page_fault+0x1e2/0x460
[] ? __fget_light+0x66/0x90
[] __sys_sendmsg+0x42/0x80
[] SyS_sendmsg+0x12/0x20
[] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
RIP [] ip_do_fragment+0x892/0x8a0
RSP

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer
Signed-off-by: David S. Miller

Joe Stringer
2016-01-29 08:00:46 +0800

47faa1e4c sctp: remove the dead field of sctp_transport

After we use refcnt to check if transport is alive, the dead field can be
removed from sctp_transport.

The traversal of transport_addr_list in procfs dump is using
list_for_each_entry_rcu, no need to check if it has been freed.

sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event are
protected by sock lock, so it's not necessary to check dead, either.
Also, the timers are cancelled when sctp_transport_free() is
called, that is, it doesn't wait for refcnt to reach 0 to cancel them.

Signed-off-by: Xin Long
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-01-29 07:59:32 +0800

fba4c330c sctp: hold transport before we access t->asoc in sctp proc

Previously, before rhashtable, /proc assoc listing was done by
read-locking the entire hash entry and dumping all assocs at once, so we
were sure that the assoc wasn't freed because it wouldn't be possible to
remove it from the hash meanwhile.

Now we use rhashtable to list transports, and dump entries one by one.
That is, now we have to check if the assoc is still a good one, as the
transport we got may be being freed.

Signed-off-by: Xin Long
Reviewed-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-01-29 07:59:32 +0800

1eed67793 sctp: fix the transport dead race check by using atomic_add_unless on refcnt

Now when __sctp_lookup_association is running in BH, it will try to
check if t->dead is set, but meanwhile other CPUs may be freeing this
transport and this assoc and if it happens that
__sctp_lookup_association checked t->dead a bit too early, it may think
that the association is still good while it was already freed.

So we fix this race by using atomic_add_unless in sctp_transport_hold.
After we get one transport from hashtable, we will hold it only when
this transport's refcnt is not 0, so that we can make sure t->asoc
cannot be freed before we hold the asoc again.
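
The "take a reference only while the count is nonzero" idea can be sketched with C11 atomics as follows. The struct and function names are hypothetical, and the compare-and-swap loop stands in for the kernel's atomic_add_unless(); it is not the sctp_transport_hold() implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object carrying only a reference count. */
struct transport {
    atomic_int refcnt;
};

/* Increment refcnt unless it is already 0. Returns true when a
 * reference was taken, false when the object is already on its way
 * to being freed and must not be used. */
static bool transport_hold(struct transport *t)
{
    int old = atomic_load(&t->refcnt);

    while (old != 0) {
        /* On failure, `old` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&t->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets false back simply treats the lookup as a miss instead of touching the object.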

Note that sctp association is not freed using RCU so we can't use
atomic_add_unless() with it as it may just be too late for that either.

Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path")
Reported-by: Vlad Yasevich
Signed-off-by: Xin Long
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Xin Long
2016-01-29 07:59:32 +0800

26 Jan, 2016

1 commit

6736fde96 rfkill: fix rfkill_fop_read wait_event usage

The code within wait_event_interruptible() is called with
!TASK_RUNNING, so mustn't call any functions that can sleep,
like mutex_lock().

Since we re-check the list_empty() in a loop after the wait,
it's safe to simply use list_empty() without locking.

This bug has existed forever, but was only discovered now
because all userspace implementations, including the default
'rfkill' tool, use poll() or select() to get a readable fd
before attempting to read.

Cc: stable@vger.kernel.org
Fixes: c64fb01627e24 ("rfkill: create useful userspace interface")
Reported-by: Dmitry Vyukov
Signed-off-by: Johannes Berg

Johannes Berg
2016-01-26 18:32:05 +0800