Eric Lee / smarc-fsl-linux-kernel

20 Sep, 2016

1 commit

d979a39d7 cgroup: duplicate cgroup reference when cloning sockets ... Browse Code »

When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner
Acked-by: Tejun Heo
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: [4.5+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2016-09-20 06:36:17 +0800

17 Sep, 2016

1 commit

87ee1280f Merge tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd bugfix from Bruce Fields:
"Fix a memory corruption bug that I introduced in 4.7"

* tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux:
svcauth_gss: Revert 64c59a3726f2 ("Remove unnecessary allocation")

Linus Torvalds
2016-09-17 08:00:26 +0800

13 Sep, 2016

2 commits

2c937eb4d Merge tag 'nfs-for-4.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

Stable patches:
- We must serialise LAYOUTGET and LAYOUTRETURN to ensure correct
state accounting
- Fix the CREATE_SESSION slot number

Bugfixes:
- sunrpc: fix a UDP memory accounting regression
- NFS: Fix an error reporting regression in nfs_file_write()
- pNFS: Fix further layout stateid issues
- RPC/rdma: Revert 3d4cf35bd4fa ("xprtrdma: Reply buffer
exhaustion...")
- RPC/rdma: Fix receive buffer accounting"

* tag 'nfs-for-4.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.1: Fix the CREATE_SESSION slot number accounting
xprtrdma: Fix receive buffer accounting
xprtrdma: Revert 3d4cf35bd4fa ("xprtrdma: Reply buffer exhaustion...")
pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs
pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present
pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID
pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
NFS: Fix error reporting in nfs_file_write()
sunrpc: fix UDP memory accounting

Linus Torvalds
2016-09-13 05:13:45 +0800
bf2c4b6f9 svcauth_gss: Revert 64c59a3726f2 ("Remove unnecessary allocation") ... Browse Code »

rsc_lookup steals the passed-in memory to avoid doing an allocation of
its own, so we can't just pass in a pointer to memory that someone else
is using.

If we really want to avoid allocation there then maybe we should
preallocate somwhere, or reference count these handles.

For now we should revert.

On occasion I see this on my server:

kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: events do_cache_clean [sunrpc]
kernel: task: ffff8808541d8000 task.stack: ffff880854344000
kernel: RIP: 0010:[] [] kfree+0x155/0x180
kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
kernel: Stack:
kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
kernel: Call Trace:
kernel: [] rsc_free+0x16/0x90 [auth_rpcgss]
kernel: [] rsc_put+0x16/0x30 [auth_rpcgss]
kernel: [] cache_clean+0x2e0/0x300 [sunrpc]
kernel: [] do_cache_clean+0xe/0x70 [sunrpc]
kernel: [] process_one_work+0x1ff/0x3b0
kernel: [] worker_thread+0x2bc/0x4a0
kernel: [] ? rescuer_thread+0x3a0/0x3a0
kernel: [] kthread+0xe4/0xf0
kernel: [] ret_from_fork+0x1f/0x40
kernel: [] ? kthread_stop+0x110/0x110
kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
kernel: RIP [] kfree+0x155/0x180
kernel: RSP
kernel: ---[ end trace 3fdec044969def26 ]---

It seems to be most common after a server reboot where a client has been
using a Kerberos mount, and reconnects to continue its workload.

Signed-off-by: Chuck Lever
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields

Chuck Lever
2016-09-13 04:57:16 +0800

12 Sep, 2016

1 commit

da499f8f5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Mostly small sets of driver fixes scattered all over the place.

1) Mediatek driver fixes from Sean Wang. Forward port not written
correctly during TX map, missed handling of EPROBE_DEFER, and
mistaken use of put_page() instead of skb_free_frag().

2) Fix socket double-free in KCM code, from WANG Cong.

3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
using the dcbx buffers before initializing them.

4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
double fib removals and an error handling fix in
mlxsw_sp_module_init().

5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
Ertman.

6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.

7) TCP's rcv_wup not initialized properly when using fastopen, from
Neal Cardwell.

8) Don't use uninitialized flow keys in flow dissector, from Gao
Feng.

9) Use after free in l2tp module unload, from Sabrina Dubroca.

10) Fix interrupt registry ordering issues in smsc911x driver, from
Jeremy Linton.

11) Fix crashes in bonding having to do with enslaving and rx_handler,
from Mahesh Bandewar.

12) AF_UNIX deadlock fixes from Linus.

13) In mlx5 driver, don't read skb->xmit_mode after it might have been
freed from the TX reclaim path. From Tariq Toukan.

14) Fix a bug from 2015 in TCP Yeah where the congestion window does
not increase, from Artem Germanov.

15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.

16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
Leitner.

17) Fix deletion of VRF routes, from Mark Tomlinson.

18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
net/mlx4_en: Fix panic on xmit while port is down
net/mlx4_en: Fixes for DCBX
net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
net: ethernet: renesas: sh_eth: add POST registers for rz
drivers: net: phy: mdio-xgene: Add hardware dependency
dwc_eth_qos: do not register semi-initialized device
sctp: identify chunks that need to be fragmented at IP level
mlxsw: spectrum: Set port type before setting its address
mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
nfp: don't pad frames on receive
nfp: drop support for old firmware ABIs
nfp: remove linux/version.h includes
tcp: cwnd does not increase in TCP YeAH
net/mlx5e: Fix parsing of vlan packets when updating lro header
net/mlx5e: Fix global PFC counters replication
net/mlx5e: Prevent casting overflow
net/mlx5e: Move an_disable_cap bit to a new position
net/mlx5e: Fix xmit_more counter race issue
tcp: fastopen: avoid negative sk_forward_alloc
...

Linus Torvalds
2016-09-12 22:56:06 +0800

10 Sep, 2016

1 commit

7303a1475 sctp: identify chunks that need to be fragmented at IP level ... Browse Code »

Previously, without GSO, it was easy to identify it: if the chunk didn't
fit and there was no data chunk in the packet yet, we could fragment at
IP level. So if there was an auth chunk and we were bundling a big data
chunk, it would fragment regardless of the size of the auth chunk. This
also works for the context of PMTU reductions.

But with GSO, we cannot distinguish such PMTU events anymore, as the
packet is allowed to exceed PMTU.

So we need another check: to ensure that the chunk that we are adding,
actually fits the current PMTU. If it doesn't, trigger a flush and let
it be fragmented at IP level in the next round.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2016-09-10 10:18:33 +0800

09 Sep, 2016

3 commits

db7196a0d tcp: cwnd does not increase in TCP YeAH ... Browse Code »

Commit 76174004a0f19785a328f40388e87e982bbf69b9
(tcp: do not slow start when cwnd equals ssthresh )
introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual
ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP
connection and kernel 4.3 and above (including 4.8-rc4) shows bandwidth
~100KB/s.
That is caused by stalled cwnd when cwnd equals ssthresh. This patch
fixes it by proper increasing cwnd in this case.

Signed-off-by: Artem Germanov
Acked-by: Dmitry Adamushko
Signed-off-by: David S. Miller

Artem Germanov
2016-09-09 08:16:12 +0800
76061f631 tcp: fastopen: avoid negative sk_forward_alloc ... Browse Code »

When DATA and/or FIN are carried in a SYN/ACK message or SYN message,
we append an skb in socket receive queue, but we forget to call
sk_forced_mem_schedule().

Effect is that the socket has a negative sk->sk_forward_alloc as long as
the message is not read by the application.

Josh Hunt fixed a similar issue in commit d22e15371811 ("tcp: fix tcp
fin memory accounting")

Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet
Reviewed-by: Josh Hunt
Signed-off-by: David S. Miller

Eric Dumazet
2016-09-09 07:08:10 +0800
40e3012e6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ... Browse Code »

Steffen Klassert says:

====================
ipsec 2016-09-08

1) Fix a crash when xfrm_dump_sa returns an error.
From Vegard Nossum.

2) Remove some incorrect WARN() on normal error handling.
From Vegard Nossum.

3) Ignore socket policies when rebuilding hash tables,
socket policies are not inserted into the hash tables.
From Tobias Brunner.

4) Initialize and check tunnel pointers properly before
we use it. From Alexey Kodanev.

5) Fix l3mdev oif setting on xfrm dst lookups.
From David Ahern.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-09-09 04:12:37 +0800

07 Sep, 2016

5 commits

751eb6b60 ipv6: addrconf: fix dev refcont leak when DAD failed ... Browse Code »

In general, when DAD detected IPv6 duplicate address, ifp->state
will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work, the call tree should be like this:

ndisc_recv_ns
-> addrconf_dad_failure addrconf_mod_dad_work
-> schedule addrconf_dad_work()
-> addrconf_dad_stop()
Signed-off-by: David S. Miller

Wei Yongjun
2016-09-07 05:17:49 +0800
5a56a0b3a net: Don't delete routes in different VRFs ... Browse Code »

When deleting an IP address from an interface, there is a clean-up of
routes which refer to this local address. However, there was no check to
see that the VRF matched. This meant that deletion wasn't confined to
the VRF it should have been.

To solve this, a new field has been added to fib_info to hold a table
id. When removing fib entries corresponding to a local ip address, this
table id is also used in the comparison.

The table id is populated when the fib_info is created. This was already
done in some places, but not in ip_rt_ioctl(). This has now been fixed.

Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device")
Acked-by: David Ahern
Tested-by: David Ahern
Signed-off-by: Mark Tomlinson
Signed-off-by: David S. Miller

Mark Tomlinson
2016-09-07 04:56:13 +0800
05c974669 xprtrdma: Fix receive buffer accounting ... Browse Code »

An RPC can terminate before its reply arrives, if a credential
problem or a soft timeout occurs. After this happens, xprtrdma
reports it is out of Receive buffers.

A Receive buffer is posted before each RPC is sent, and returned to
the buffer pool when a reply is received. If no reply is received
for an RPC, that Receive buffer remains posted. But xprtrdma tries
to post another when the next RPC is sent.

If this happens a few dozen times, there are no receive buffers left
to be posted at send time. I don't see a way for a transport
connection to recover at that point, and it will spit warnings and
unnecessarily delay RPCs on occasion for its remaining lifetime.

Commit 1e465fd4ff47 ("xprtrdma: Replace send and receive arrays")
removed a little bit of logic to detect this case and not provide
a Receive buffer so no more buffers are posted, and then transport
operation continues correctly. We didn't understand what that logic
did, and it wasn't commented, so it was removed as part of the
overhaul to support backchannel requests.

Restore it, but be wary of the need to keep extra Receives posted
to deal with backchannel requests.

Fixes: 1e465fd4ff47 ("xprtrdma: Replace send and receive arrays")
Signed-off-by: Chuck Lever
Reviewed-by: Anna Schumaker
Signed-off-by: Trond Myklebust

Chuck Lever
2016-09-07 03:59:35 +0800
78d506e1b xprtrdma: Revert 3d4cf35bd4fa ("xprtrdma: Reply buffer exhaustion...") ... Browse Code »

Receive buffer exhaustion, if it were to actually occur, would be
catastrophic. However, when there are no reply buffers to post, that
means all of them have already been posted and are waiting for
incoming replies. By design, there can never be more RPCs in flight
than there are available receive buffers.

A receive buffer can be left posted after an RPC exits without a
received reply; say, due to a credential problem or a soft timeout.
This does not result in fewer posted receive buffers than there are
pending RPCs, and there is already logic in xprtrdma to deal
appropriately with this case.

It also looks like the "+ 2" that was removed was accidentally
accommodating the number of extra receive buffers needed for
receiving backchannel requests. That will need to be addressed by
another patch.

Fixes: 3d4cf35bd4fa ("xprtrdma: Reply buffer exhaustion can be...")
Signed-off-by: Chuck Lever
Reviewed-by: Anna Schumaker
Signed-off-by: Trond Myklebust

Chuck Lever
2016-09-07 03:59:35 +0800
03c2778a9 ipv6: release dst in ping_v6_sendmsg ... Browse Code »

Neither the failure or success paths of ping_v6_sendmsg release
the dst it acquires. This leads to a flood of warnings from
"net/core/dst.c:288 dst_release" on older kernels that
don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.

That patch optimistically hoped this had been fixed post 3.10, but
it seems at least one case wasn't, where I've seen this triggered
a lot from machines doing unprivileged icmp sockets.

Cc: Martin Lau
Signed-off-by: Dave Jones
Acked-by: Martin KaFai Lau
Signed-off-by: David S. Miller

Dave Jones
2016-09-07 03:54:17 +0800

05 Sep, 2016

3 commits

6e1ce3c34 af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' ... Browse Code »

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded.

The two are independent, but using the same lock makes for a nasty
deadlock due to ordering with regards to filesystem locking. The bind
locking would want to nest outside the VSF pathname locking, but the IO
locking wants to nest inside some of those same locks.

We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.

Acked-by: Rainer Weikusat
Acked-by: Al Viro
Signed-off-by: Linus Torvalds
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Linus Torvalds
2016-09-05 04:29:29 +0800
38f7bd94a Revert "af_unix: Fix splice-bind deadlock" ... Browse Code »

This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1.

It turns out that it just replaces one deadlock with another one: we can
still get the wrong lock ordering with the readlock due to overlayfs
calling back into the filesystem layer and still taking the vfs locks
after the readlock.

The proper solution ends up being to just split the readlock into two
pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
(taken *inside* the filesystem locks). The two locks are independent
anyway.

Signed-off-by: Linus Torvalds
Reviewed-by: Shmulik Ladkani
Signed-off-by: David S. Miller

Linus Torvalds
2016-09-05 04:29:29 +0800
24b27fc4c bonding: Fix bonding crash ... Browse Code »

Following few steps will crash kernel -

(a) Create bonding master
> modprobe bonding miimon=50
(b) Create macvlan bridge on eth2
> ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
type macvlan
(c) Now try adding eth2 into the bond
> echo +eth2 > /sys/class/net/bond0/bonding/slaves

Bonding does lots of things before checking if the device enslaved is
busy or not.

In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Mahesh Bandewar
2016-09-05 02:41:12 +0800

03 Sep, 2016

2 commits

a41bd25ae sunrpc: fix UDP memory accounting ... Browse Code »

The commit f9b2ee714c5c ("SUNRPC: Move UDP receive data path
into a workqueue context"), as a side effect, moved the
skb_free_datagram() call outside the scope of the related socket
lock, but UDP sockets require such lock to be held for proper
memory accounting.
Fix it by replacing skb_free_datagram() with
skb_free_datagram_locked().

Fixes: f9b2ee714c5c ("SUNRPC: Move UDP receive data path into a workqueue context")
Reported-and-tested-by: Jan Stancek
Signed-off-by: Paolo Abeni
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Trond Myklebust

Paolo Abeni
2016-09-03 22:00:49 +0800
2f86953e7 l2tp: fix use-after-free during module unload ... Browse Code »

Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
-> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
l2tp_tunnel_destruct).

By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
destroying the socket, the private data reserved via the net_generic
mechanism has already been freed, but l2tp_tunnel_destruct() actually
uses this data.

Make sure tunnel deletion for the netns has completed before returning
from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
then waiting for RCU callbacks to complete.

Fixes: 167eb17e0b17 ("l2tp: create tunnel sockets in the right namespace")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller

Sabrina Dubroca
2016-09-03 02:44:44 +0800

02 Sep, 2016

7 commits

ab3438016 ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit() ... Browse Code »

Commit 8eb30be0352d0916 ("ipv6: Create ip6_tnl_xmit") unsets
flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit().
Since xfrm_selector_match() relies on this info, IPv6 packets
sent by an ip6tunnel cannot be properly selected by their
protocols after removing it. This patch puts flowi6_proto back.

Cc: stable@vger.kernel.org
Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit")
Signed-off-by: Eli Cooper
Signed-off-by: David S. Miller

Eli Cooper
2016-09-02 14:41:24 +0800
635c223cf rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly ... Browse Code »

The original codes depend on that the function parameters are evaluated from
left to right. But the parameter's evaluation order is not defined in C
standard actually.

When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys,
hashrnd) with some compilers or environment, the keys passed to
flow_keys_have_l4 is not initialized.

Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")

Acked-by: Eric Dumazet
Signed-off-by: Gao Feng
Signed-off-by: David S. Miller

Gao Feng
2016-09-02 13:45:03 +0800
28b346cbc tcp: fastopen: fix rcv_wup initialization for TFO server on SYN/data ... Browse Code »

Yuchung noticed that on the first TFO server data packet sent after
the (TFO) handshake, the server echoed the TCP timestamp value in the
SYN/data instead of the timestamp value in the final ACK of the
handshake. This problem did not happen on regular opens.

The tcp_replace_ts_recent() logic that decides whether to remember an
incoming TS value needs tp->rcv_wup to hold the latest receive
sequence number that we have ACKed (latest tp->rcv_nxt we have
ACKed). This commit fixes this issue by ensuring that a TFO server
properly updates tp->rcv_wup to match tp->rcv_nxt at the time it sends
a SYN/ACK for the SYN/data.

Reported-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: David S. Miller

Neal Cardwell
2016-09-02 07:40:15 +0800
85a3d4a93 net: bridge: don't increment tx_dropped in br_do_proxy_arp ... Browse Code »

pskb_may_pull may fail due to various reasons (e.g. alloc failure), but the
skb isn't changed/dropped and processing continues so we shouldn't
increment tx_dropped.

CC: Kyeyoon Park
CC: Roopa Prabhu
CC: Stephen Hemminger
CC: bridge@lists.linux-foundation.org
Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP")
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller

Nikolay Aleksandrov
2016-09-02 07:35:30 +0800
29c994e36 netconf: add a notif when settings are created ... Browse Code »

All changes are notified, but the initial state was missing.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2016-09-02 06:18:08 +0800
d26c638c1 ipv6: add missing netconf notif when 'all' is updated ... Browse Code »

The 'default' value was not advertised.

Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status")
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2016-09-02 06:18:08 +0800
d2f394dc4 tipc: fix random link resets while adding a second bearer ... Browse Code »

In a dual bearer configuration, if the second tipc link becomes
active while the first link still has pending nametable "bulk"
updates, it randomly leads to reset of the second link.

When a link is established, the function named_distribute(),
fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
with NAME_DISTRIBUTOR message for each PUBLICATION.
However, the function named_distribute() allocates the buffer by
increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
This consumes the space allocated for TUNNEL_PROTOCOL.

When establishing the second link, the link shall tunnel all the
messages in the first link queue including the "bulk" update.
As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
the link mtu the transmission fails (-EMSGSIZE).

Thus, the synch point based on the message count of the tunnel
packets is never reached leading to link timeout.

In this commit, we adjust the size of name distributor message so that
they can be tunnelled.

Reviewed-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-09-02 01:12:26 +0800

01 Sep, 2016

2 commits

c0338aff2 kcm: fix a socket double free ... Browse Code »

Dmitry reported a double free on kcm socket, which could
be easily reproduced by:

#include
#include

int main()
{
int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0);
syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0);
return 0;
}

This is because on the error path, after we install
the new socket file, we call sock_release() to clean
up the socket, which leaves the fd pointing to a freed
socket. Fix this by calling sys_close() on that fd
directly.

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: Dmitry Vyukov
Cc: Tom Herbert
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-09-01 12:00:19 +0800
9264251ee bridge: re-introduce 'fix parsing of MLDv2 reports' ... Browse Code »

commit bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with
INCLUDE and no sources as a leave") seems to have accidentally reverted
commit 47cc84ce0c2f ("bridge: fix parsing of MLDv2 reports"). This
commit brings back a change to br_ip6_multicast_mld2_report() where
parsing of MLDv2 reports stops when the first group is successfully
added to the MDB cache.

Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave")
Signed-off-by: Davide Caratti
Acked-by: Nikolay Aleksandrov
Acked-by: Thadeu Lima de Souza Cascardo
Signed-off-by: David S. Miller

Davide Caratti
2016-09-01 00:29:58 +0800

31 Aug, 2016

3 commits

2df5d103a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Allow nf_tables reject expression from input, forward and output hooks,
since only there the routing information is available, otherwise we crash.

2) Fix unsafe list iteration when flushing timeout and accouting objects.

3) Fix refcount leak on timeout policy parsing failure.

4) Unlink timeout object for unconfirmed conntracks too

5) Missing validation of pkttype mangling from bridge family.

6) Fix refcount leak on ebtables on second lookup for the specific
bridge match extension, this patch from Sabrina Dubroca.

7) Remove unnecessary ip_hdr() in nf_tables_netdev family.

Patches from 1-5 and 7 from Liping Zhang.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-31 13:02:09 +0800
15543692a Merge tag 'mac80211-for-davem-2016-08-30' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three little fixes:
* revert a recent wext patch, which Ben Hutchings noticed was
wrong, and it turns out not to be necessary for any driver

* fix an infinite loop that can occur under certain conditions
in mac80211's TDLS code (depending on regulatory information)

* add a cfg80211_get_station() static inline when cfg80211 isn't
built, to allow other modules to not have to depend on it for it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2016-08-31 12:34:48 +0800
0cf21c660 Merge tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

Stable patches:
- Fix a refcount leak in nfs_callback_up_net
- Fix an Oopsable condition when the flexfile pNFS driver connection
to the DS fails
- Fix an Oopsable condition in NFSv4.1 server callback races
- Ensure pNFS clients stop doing I/O to the DS if their lease has
expired, as required by the NFSv4.1 protocol

Bugfixes:
- Fix potential looping in the NFSv4.x migration code
- Patch series to close callback races for OPEN, LAYOUTGET and
LAYOUTRETURN
- Silence WARN_ON when NFSv4.1 over RDMA is in use
- Fix a LAYOUTCOMMIT race in the pNFS/blocks client
- Fix pNFS timeout issues when the DS fails"

* tag 'nfs-for-4.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.x: Fix a refcount leak in nfs_callback_up_net
NFS4: Avoid migration loops
pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence
NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
NFSv4.1: Defer bumping the slot sequence number until we free the slot
NFSv4.1: Delay callback processing when there are referring triples
NFSv4.1: Fix Oopsable condition in server callback races
SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
pnfs/blocklayout: update last_write_offset atomically with extents
pNFS: The client must not do I/O to the DS if it's lease has expired
pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls
pNFS/flexfiles: Set reasonable default retrans values for the data channel
NFS: Allow the mount option retrans=0
pNFS/flexfiles: Fix layoutstat periodic reporting

Linus Torvalds
2016-08-31 02:14:02 +0800

30 Aug, 2016

2 commits

c73c24849 netfilter: nf_tables_netdev: remove redundant ip_hdr assignment ... Browse Code »

We have already use skb_header_pointer to get the ip header pointer,
so there's no need to use ip_hdr again. Moreover, in NETDEV INGRESS
hook, ip header maybe not linear, so use ip_hdr is not appropriate,
remove it.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-30 17:41:04 +0800
554d072e7 mac80211: TDLS: don't require beaconing for AP BW ... Browse Code »

Stop downgrading TDLS chandef when reaching the AP BW. The AP provides
the necessary regulatory protection in this case.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=153961, which
reported an infinite loop here.

Reported-by: Kamil Toman
Signed-off-by: Arik Nemtsov
Signed-off-by: Johannes Berg

Arik Nemtsov
2016-08-30 14:03:41 +0800

27 Aug, 2016

1 commit

5c1f5b457 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth ... Browse Code »

Johan Hedberg says:

====================
pull request: bluetooth 2016-08-25

Here are a couple of important Bluetooth fixes for the 4.8 kernel:

- Memory leak fix for HCI requests
- Fix sk_filter handling with L2CAP
- Fix sock_recvmsg behavior when MSG_TRUNC is not set

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller

David S. Miller
2016-08-27 12:09:17 +0800

26 Aug, 2016

4 commits

166ee5b87 qdisc: fix a module refcount leak in qdisc_create_dflt() ... Browse Code »

Should qdisc_alloc() fail, we must release the module refcount
we got right before.

Fixes: 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
Signed-off-by: Eric Dumazet
Acked-by: John Fastabend
Acked-by: John Fastabend
Signed-off-by: David S. Miller

Eric Dumazet
2016-08-26 07:44:20 +0800
a5de125dd tipc: fix the error handling in tipc_udp_enable() ... Browse Code »

Fix to return a negative error code in enable_mcast() error handling
case, and release udp socket when necessary.

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2016-08-26 07:32:34 +0800
4f34228b6 Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set ... Browse Code »

Similar to bt_sock_recvmsg MSG_TRUNC shall be checked using the original
flags not msg_flags.

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800
90a56f72e Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set ... Browse Code »

Commit b5f34f9420b50c9b5876b9a2b68e96be6d629054 attempt to introduce
proper handling for MSG_TRUNC but recv and variants should still work
as read if no flag is passed, but because the code may set MSG_TRUNC to
msg->msg_flags that shall not be used as it may cause it to be behave as
if MSG_TRUNC is always, so instead of using it this changes the code to
use the flags parameter which shall contain the original flags.

Signed-off-by: Luiz Augusto von Dentz
Signed-off-by: Marcel Holtmann

Luiz Augusto von Dentz
2016-08-26 02:58:47 +0800

25 Aug, 2016

2 commits

4249fc1f0 netfilter: ebtables: put module reference when an incorrect extension is found ... Browse Code »

commit bcf493428840 ("netfilter: ebtables: Fix extension lookup with
identical name") added a second lookup in case the extension that was
found during the first lookup matched another extension with the same
name, but didn't release the reference on the incorrect module.

Fixes: bcf493428840 ("netfilter: ebtables: Fix extension lookup with identical name")
Signed-off-by: Sabrina Dubroca
Acked-by: Phil Sutter
Signed-off-by: Pablo Neira Ayuso

Sabrina Dubroca
2016-08-25 19:18:06 +0800
960fa72f6 netfilter: nft_meta: improve the validity check of pkttype set expr ... Browse Code »

"meta pkttype set" is only supported on prerouting chain with bridge
family and ingress chain with netdev family.

But the validate check is incomplete, and the user can add the nft
rules on input chain with bridge family, for example:
# nft add table bridge filter
# nft add chain bridge filter input {type filter hook input \
priority 0 \;}
# nft add chain bridge filter test
# nft add rule bridge filter test meta pkttype set unicast
# nft add rule bridge filter input jump test

This patch fixes the problem.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso

Liping Zhang
2016-08-25 19:12:03 +0800