Eric Lee / smarc-fsl-linux-kernel

13 Feb, 2018

8 commits

981f20bc7 soreuseport: fix mem leak in reuseport_add_sock() ... Browse Code »

[ Upstream commit 4db428a7c9ab07e08783e0fcdc4ca0f555da0567 ]

reuseport_add_sock() needs to deal with attaching a socket having
its own sk_reuseport_cb, after a prior
setsockopt(SO_ATTACH_REUSEPORT_?BPF)

Without this fix, not only a WARN_ONCE() was issued, but we were also
leaking memory.

Thanks to sysbot and Eric Biggers for providing us nice C repros.

------------[ cut here ]------------
socket already in reuseport group
WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119
reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x211/0x2d0 lib/bug.c:184
fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079

Fixes: ef456144da8e ("soreuseport: define reuseport groups")
Signed-off-by: Eric Dumazet
Reported-by: syzbot+c0ea2226f77a42936bf7@syzkaller.appspotmail.com
Acked-by: Craig Gallek

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-02-13 17:19:48 +0800
456add4c9 ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only ... Browse Code »

[ Upstream commit 7ece54a60ee2ba7a386308cae73c790bd580589c ]

If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket. However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.

This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.

In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here. The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.

It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED). Only one of the socket will
receive packet.

The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed. The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.

Thanks to Calvin Owens who created an easy
reproduction which leads to a fix.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Martin KaFai Lau
2018-02-13 17:19:48 +0800
c04818aba tcp_bbr: fix pacing_gain to always be unity when using lt_bw ... Browse Code »

[ Upstream commit 3aff3b4b986e51bcf4ab249e5d48d39596e0df6a ]

This commit fixes the pacing_gain to remain at BBR_UNIT (1.0) when
using lt_bw and returning from the PROBE_RTT state to PROBE_BW.

Previously, when using lt_bw, upon exiting PROBE_RTT and entering
PROBE_BW the bbr_reset_probe_bw_mode() code could sometimes randomly
end up with a cycle_idx of 0 and hence have bbr_advance_cycle_phase()
set a pacing gain above 1.0. In such cases this would result in a
pacing rate that is 1.25x higher than intended, potentially resulting
in a high loss rate for a little while until we stop using the lt_bw a
bit later.

This commit is a stable candidate for kernels back as far as 4.9.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
Reported-by: Beyers Cronje
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Neal Cardwell
2018-02-13 17:19:48 +0800
07ca93e31 net: ipv6: send unsolicited NA after DAD ... Browse Code »

[ Upstream commit c76fe2d98c726224a975a0d0198c3fb50406d325 ]

Unsolicited IPv6 neighbor advertisements should be sent after DAD
completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
addresses and have those sent by addrconf_dad_completed after DAD.

Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up")
Reported-by: Vivek Venkatraman
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Ahern
2018-02-13 17:19:48 +0800
799a34d5b Revert "defer call to mem_cgroup_sk_alloc()" ... Browse Code »

[ Upstream commit edbe69ef2c90fc86998a74b08319a01c508bd497 ]

This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").

Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.

Actually the free-after-use problem was fixed by
commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
sk_clone_lock()") for the cgroup pointer.

So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.

Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
mem_cgroup_sk_alloc(). I see no reasons why bumping the root
memcg counter is a good reason to panic, and there are no realistic
ways to hit it.

Signed-off-by: Roman Gushchin
Cc: Eric Dumazet
Cc: David S. Miller
Cc: Johannes Weiner
Cc: Tejun Heo
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Roman Gushchin
2018-02-13 17:19:48 +0800
6d35430fd tcp: release sk_frag.page in tcp_disconnect ... Browse Code »

[ Upstream commit 9b42d55a66d388e4dd5550107df051a9637564fc ]

socket can be disconnected and gets transformed back to a listening
socket, if sk_frag.page is not released, which will be cloned into
a new socket by sk_clone_lock, but the reference count of this page
is increased, lead to a use after free or double free issue

Signed-off-by: Li RongQing
Cc: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Li RongQing
2018-02-13 17:19:47 +0800
166f27322 net: igmp: add a missing rcu locking section ... Browse Code »

[ Upstream commit e7aadb27a5415e8125834b84a74477bfbee4eff5 ]

Newly added igmpv3_get_srcaddr() needs to be called under rcu lock.

Timer callbacks do not ensure this locking.

=============================
WARNING: suspicious RCU usage
4.15.0+ #200 Not tainted
-----------------------------
./include/linux/inetdevice.h:216 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
3 locks held by syzkaller616973/4074:
#0: (&mm->mmap_sem){++++}, at: [] __do_page_fault+0x32d/0xc90 arch/x86/mm/fault.c:1355
#1: ((&im->timer)){+.-.}, at: [] lockdep_copy_map include/linux/lockdep.h:178 [inline]
#1: ((&im->timer)){+.-.}, at: [] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1316
#2: (&(&im->lock)->rlock){+.-.}, at: [] spin_lock_bh include/linux/spinlock.h:315 [inline]
#2: (&(&im->lock)->rlock){+.-.}, at: [] igmpv3_send_report+0x98/0x5b0 net/ipv4/igmp.c:600

stack backtrace:
CPU: 0 PID: 4074 Comm: syzkaller616973 Not tainted 4.15.0+ #200
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:

__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4592
__in_dev_get_rcu include/linux/inetdevice.h:216 [inline]
igmpv3_get_srcaddr net/ipv4/igmp.c:329 [inline]
igmpv3_newpack+0xeef/0x12e0 net/ipv4/igmp.c:389
add_grhead.isra.27+0x235/0x300 net/ipv4/igmp.c:432
add_grec+0xbd3/0x1170 net/ipv4/igmp.c:565
igmpv3_send_report+0xd5/0x5b0 net/ipv4/igmp.c:605
igmp_send_report+0xc43/0x1050 net/ipv4/igmp.c:722
igmp_timer_expire+0x322/0x5c0 net/ipv4/igmp.c:831
call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2d7/0xb85 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:541 [inline]
smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:938

Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Signed-off-by: Eric Dumazet
Reported-by: syzbot

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-02-13 17:19:47 +0800
2726946df ip6mr: fix stale iterator ... Browse Code »

[ Upstream commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
WARNING: bad unlock balance detected!
4.15.0-rc3+ #128 Not tainted
syzkaller971460/3195 is trying to release lock (mrt_lock) at:
[] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syzkaller971460/3195:
#0: (&p->lock){+.+.}, at: [] seq_read+0xd5/0x13d0
fs/seq_file.c:165

stack backtrace:
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
__lock_release kernel/locking/lockdep.c:3775 [inline]
lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
__raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
_raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
traverse+0x3bc/0xa00 fs/seq_file.c:135
seq_read+0x96a/0x13d0 fs/seq_file.c:189
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
BUG: sleeping function called from invalid context at lib/usercopy.c:25
in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
INFO: lockdep is turned off.
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
__might_sleep+0x95/0x190 kernel/sched/core.c:6013
__might_fault+0xab/0x1d0 mm/memory.c:4525
_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
copy_to_user include/linux/uaccess.h:155 [inline]
seq_read+0xcb4/0x13d0 fs/seq_file.c:279
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
lib/usercopy.c:26

Reported-by: syzbot
Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Nikolay Aleksandrov
2018-02-13 17:19:47 +0800

08 Feb, 2018

1 commit

c9daf8144 nl80211: Sanitize array index in parse_txq_params ... Browse Code »

commit 259d8c1e984318497c84eef547bbb6b1d9f4eb05

Wireless drivers rely on parse_txq_params to validate that txq_params->ac
is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx()
handler is called. Use a new helper, array_index_nospec(), to sanitize
txq_params->ac with respect to speculation. I.e. ensure that any
speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).

Reported-by: Christian Lamparter
Reported-by: Elena Reshetova
Signed-off-by: Dan Williams
Signed-off-by: Thomas Gleixner
Acked-by: Johannes Berg
Cc: linux-arch@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: linux-wireless@vger.kernel.org
Cc: torvalds@linux-foundation.org
Cc: "David S. Miller"
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman

Dan Williams
2018-02-08 03:12:23 +0800

04 Feb, 2018

10 commits

ca4b61373 SUNRPC: Allow connect to return EHOSTUNREACH ... Browse Code »

[ Upstream commit 4ba161a793d5f43757c35feff258d9f20a082940 ]

Reported-by: Dmitry Vyukov
Signed-off-by: Trond Myklebust
Tested-by: Dmitry Vyukov
Signed-off-by: Anna Schumaker
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2018-02-04 00:39:11 +0800
14a4e9f6b sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1 ... Browse Code »

[ Upstream commit 52a395896a051a3d5c34fba67c324f69ec5e67c6 ]

When doing asoc reset, if the sender of the response has already sent some
chunk and increased asoc->next_tsn before the duplicate request comes, the
response will use the old result with an incorrect sender next_tsn.

Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after
the sender of the response has performed the asoc reset and before the
peer has confirmed it, and it's value is still asoc->next_tsn original
value minus 1.

This patch sets sender next_tsn for the old result with ctsn_ack_point
plus 1 when processing the duplicate request, to make sure the sender
next_tsn value peer gets will be always right.

Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-02-04 00:39:04 +0800
55f3de731 sctp: avoid flushing unsent queue when doing asoc reset ... Browse Code »

[ Upstream commit 159f2a7456c6ae95c1e1a58e8b8ec65ef12d51cf ]

Now when doing asoc reset, it cleans up sacked and abandoned queues
by calling sctp_outq_free where it also cleans up unsent, retransmit
and transmitted queues.

It's safe for the sender of response, as these 3 queues are empty at
that time. But when the receiver of response is doing the reset, the
users may already enqueue some chunks into unsent during the time
waiting the response, and these chunks should not be flushed.

To void the chunks in it would be removed, it moves the queue into a
temp list, then gets it back after sctp_outq_free is done.

The patch also fixes some incorrect comments in
sctp_process_strreset_tsnreq.

Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-02-04 00:39:04 +0800
d4c72a410 sctp: only allow the asoc reset when the asoc outq is empty ... Browse Code »

[ Upstream commit 5c6144a0eb5366ae07fc5059301b139338f39bbd ]

As it says in rfc6525#section5.1.4, before sending the request,

C2: The sender has either no outstanding TSNs or considers all
outstanding TSNs abandoned.

Prior to this patch, it tried to consider all outstanding TSNs abandoned
by dropping all chunks in all outqs with sctp_outq_free (even including
sacked, retransmit and transmitted queues) when doing this reset, which
is too aggressive.

To make it work gently, this patch will only allow the asoc reset when
the sender has no outstanding TSNs by checking if unsent, transmitted
and retransmit are all empty with sctp_outq_is_empty before sending
and processing the request.

Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-02-04 00:39:04 +0800
841211271 mac80211: fix the update of path metric for RANN frame ... Browse Code »

[ Upstream commit fbbdad5edf0bb59786a51b94a9d006bc8c2da9a2 ]

The previous path metric update from RANN frame has not considered
the own link metric toward the transmitting mesh STA. Fix this.

Reported-by: Michael65535
Signed-off-by: Chun-Yeow Yeoh
Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Chun-Yeow Yeoh
2018-02-04 00:39:03 +0800
e23090a7d mac80211: use QoS NDP for AP probing ... Browse Code »

[ Upstream commit 7b6ddeaf27eca72795ceeae2f0f347db1b5f9a30 ]

When connected to a QoS/WMM AP, mac80211 should use a QoS NDP
for probing it, instead of a regular non-QoS one, fix this.

Change all the drivers to *not* allow QoS NDP for now, even
though it looks like most of them should be OK with that.

Signed-off-by: Johannes Berg
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Johannes Berg
2018-02-04 00:39:03 +0800
9be97a9ab openvswitch: fix the incorrect flow action alloc size ... Browse Code »

[ Upstream commit 67c8d22a73128ff910e2287567132530abcf5b71 ]

If we want to add a datapath flow, which has more than 500 vxlan outputs'
action, we will get the following error reports:
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Actions may not be safe on all matching packets
... ...

It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
this is not the root cause. For example, for a vxlan output action, we need
about 60 bytes for the nlattr, but after it is converted to the flow
action, it only occupies 24 bytes. This means that we can still support
more than 1000 vxlan output actions for a single datapath flow under the
the current 32k max limitation.

So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
shouldn't report EINVAL and keep it move on, as the judgement can be
done by the reserve_sfa_size.

Signed-off-by: zhangliping
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

zhangliping
2018-02-04 00:39:03 +0800
1392633ba rxrpc: Fix service endpoint expiry ... Browse Code »

[ Upstream commit f859ab61875978eeaa539740ff7f7d91f5d60006 ]

RxRPC service endpoints expire like they're supposed to by the following
means:

(1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
global service conn timeout, otherwise the first rxrpc_net struct to
die will cause connections on all others to expire immediately from
then on.

(2) Mark local service endpoints for which the socket has been closed
(->service_closed) so that the expiration timeout can be much
shortened for service and client connections going through that
endpoint.

(3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
count reaches 1, not 0, as idle conns have a 1 count.

(4) The accumulator for the earliest time we might want to schedule for
should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
the comparison functions use signed arithmetic.

(5) Simplify the expiration handling, adding the expiration value to the
idle timestamp each time rather than keeping track of the time in the
past before which the idle timestamp must go to be expired. This is
much easier to read.

(6) Ignore the timeouts if the net namespace is dead.

(7) Restart the service reaper work item rather the client reaper.

Signed-off-by: David Howells
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-02-04 00:39:01 +0800
b89372f23 rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls ... Browse Code »

[ Upstream commit 9faaff593404a9c4e5abc6839a641635d7b9d0cd ]

Provide a different lockdep key for rxrpc_call::user_mutex when the call is
made on a kernel socket, such as by the AFS filesystem.

The problem is that lockdep registers a false positive between userspace
calling the sendmsg syscall on a user socket where call->user_mutex is held
whilst userspace memory is accessed whereas the AFS filesystem may perform
operations with mmap_sem held by the caller.

In such a case, the following warning is produced.

======================================================
WARNING: possible circular locking dependency detected
4.14.0-fscache+ #243 Tainted: G E
------------------------------------------------------
modpost/16701 is trying to acquire lock:
(&vnode->io_lock){+.+.}, at: [] afs_begin_vnode_operation+0x33/0x77 [kafs]

but task is already holding lock:
(&mm->mmap_sem){++++}, at: [] __do_page_fault+0x1ef/0x486

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++}:
__might_fault+0x61/0x89
_copy_from_iter_full+0x40/0x1fa
rxrpc_send_data+0x8dc/0xff3
rxrpc_do_sendmsg+0x62f/0x6a1
rxrpc_sendmsg+0x166/0x1b7
sock_sendmsg+0x2d/0x39
___sys_sendmsg+0x1ad/0x22b
__sys_sendmsg+0x41/0x62
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #2 (&call->user_mutex){+.+.}:
__mutex_lock+0x86/0x7d2
rxrpc_new_client_call+0x378/0x80e
rxrpc_kernel_begin_call+0xf3/0x154
afs_make_call+0x195/0x454 [kafs]
afs_vl_get_capabilities+0x193/0x198 [kafs]
afs_vl_lookup_vldb+0x5f/0x151 [kafs]
afs_create_volume+0x2e/0x2f4 [kafs]
afs_mount+0x56a/0x8d7 [kafs]
mount_fs+0x6a/0x109
vfs_kern_mount+0x67/0x135
do_mount+0x90b/0xb57
SyS_mount+0x72/0x98
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #1 (k-sk_lock-AF_RXRPC){+.+.}:
lock_sock_nested+0x74/0x8a
rxrpc_kernel_begin_call+0x8a/0x154
afs_make_call+0x195/0x454 [kafs]
afs_fs_get_capabilities+0x17a/0x17f [kafs]
afs_probe_fileserver+0xf7/0x2f0 [kafs]
afs_select_fileserver+0x83f/0x903 [kafs]
afs_fetch_status+0x89/0x11d [kafs]
afs_iget+0x16f/0x4f8 [kafs]
afs_mount+0x6c6/0x8d7 [kafs]
mount_fs+0x6a/0x109
vfs_kern_mount+0x67/0x135
do_mount+0x90b/0xb57
SyS_mount+0x72/0x98
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #0 (&vnode->io_lock){+.+.}:
lock_acquire+0x174/0x19f
__mutex_lock+0x86/0x7d2
afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_fetch_data+0x80/0x12a [kafs]
afs_readpages+0x314/0x405 [kafs]
__do_page_cache_readahead+0x203/0x2ba
filemap_fault+0x179/0x54d
__do_fault+0x17/0x60
__handle_mm_fault+0x6d7/0x95c
handle_mm_fault+0x24e/0x2a3
__do_page_fault+0x301/0x486
do_page_fault+0x236/0x259
page_fault+0x22/0x30
__clear_user+0x3d/0x60
padzero+0x1c/0x2b
load_elf_binary+0x785/0xdc7
search_binary_handler+0x81/0x1ff
do_execveat_common.isra.14+0x600/0x888
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

other info that might help us debug this:

Chain exists of:
&vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&call->user_mutex);
lock(&mm->mmap_sem);
lock(&vnode->io_lock);

*** DEADLOCK ***

1 lock held by modpost/16701:
#0: (&mm->mmap_sem){++++}, at: [] __do_page_fault+0x1ef/0x486

stack backtrace:
CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
dump_stack+0x67/0x8e
print_circular_bug+0x341/0x34f
check_prev_add+0x11f/0x5d4
? add_lock_to_list.isra.12+0x8b/0x8b
? add_lock_to_list.isra.12+0x8b/0x8b
? __lock_acquire+0xf77/0x10b4
__lock_acquire+0xf77/0x10b4
lock_acquire+0x174/0x19f
? afs_begin_vnode_operation+0x33/0x77 [kafs]
__mutex_lock+0x86/0x7d2
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_fetch_data+0x80/0x12a [kafs]
afs_readpages+0x314/0x405 [kafs]
__do_page_cache_readahead+0x203/0x2ba
? filemap_fault+0x179/0x54d
filemap_fault+0x179/0x54d
__do_fault+0x17/0x60
__handle_mm_fault+0x6d7/0x95c
handle_mm_fault+0x24e/0x2a3
__do_page_fault+0x301/0x486
do_page_fault+0x236/0x259
page_fault+0x22/0x30
RIP: 0010:__clear_user+0x3d/0x60
RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
padzero+0x1c/0x2b
load_elf_binary+0x785/0xdc7
search_binary_handler+0x81/0x1ff
do_execveat_common.isra.14+0x600/0x888
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x89/0x1be
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fdb6009ee07
RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000

Signed-off-by: David Howells
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-02-04 00:39:01 +0800
92c131beb rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing ... Browse Code »

[ Upstream commit 03a6c82218b9a87014b2c6c4e178294fdc8ebd8a ]

The caller of rxrpc_accept_call() must release the lock on call->user_mutex
returned by that function.

Signed-off-by: David Howells
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

David Howells
2018-02-04 00:39:00 +0800

31 Jan, 2018

21 commits

ca0a09672 bpf: fix 32-bit divide by zero ... Browse Code »

[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]

due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman

Alexei Starovoitov
2018-01-31 21:03:50 +0800
6fde36d5c bpf: introduce BPF_JIT_ALWAYS_ON config ... Browse Code »

[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next

Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman

Alexei Starovoitov
2018-01-31 21:03:49 +0800
c2fd0b217 net: ipv4: Make "ip route get" match iif lo rules again. ... Browse Code »

[ Upstream commit 6503a30440962f1e1ccb8868816b4e18201218d4 ]

Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.

Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.

inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.

Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti
Acked-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Lorenzo Colitti
2018-01-31 21:03:49 +0800
ed10b9aff tls: reset crypto_info when do_tls_setsockopt_tx fails ... Browse Code »

[ Upstream commit 6db959c82eb039a151d95a0f8b7dea643657327a ]

The current code copies directly from userspace to ctx->crypto_send, but
doesn't always reinitialize it to 0 on failure. This causes any
subsequent attempt to use this setsockopt to fail because of the
TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually
ready.

This should result in a correctly set up socket after the 3rd call, but
currently it does not:

size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
struct tls12_crypto_info_aes_gcm_128 crypto_good = {
.info.version = TLS_1_2_VERSION,
.info.cipher_type = TLS_CIPHER_AES_GCM_128,
};

struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
crypto_bad_type.info.cipher_type = 42;

setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sabrina Dubroca
2018-01-31 21:03:48 +0800
2f54941c8 tls: return -EBUSY if crypto_info is already set ... Browse Code »

[ Upstream commit 877d17c79b66466942a836403773276e34fe3614 ]

do_tls_setsockopt_tx returns 0 without doing anything when crypto_info
is already set. Silent failure is confusing for users.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sabrina Dubroca
2018-01-31 21:03:48 +0800
3a28f04bc tls: fix sw_ctx leak ... Browse Code »

[ Upstream commit cf6d43ef66f416282121f436ce1bee9a25199d52 ]

During setsockopt(SOL_TCP, TLS_TX), if initialization of the software
context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't
reassign ctx->priv_ctx to NULL, so we can't even do another attempt to
set it up on the same socket, as it will fail with -EEXIST.

Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Sabrina Dubroca
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sabrina Dubroca
2018-01-31 21:03:48 +0800
a022bbe39 net/tls: Only attach to sockets in ESTABLISHED state ... Browse Code »

[ Upstream commit d91c3e17f75f218022140dee18cf515292184a8f ]

Calling accept on a TCP socket with a TLS ulp attached results
in two sockets that share the same ulp context.
The ulp context is freed while a socket is destroyed, so
after one of the sockets is released, the second second will
trigger a use after free when it tries to access the ulp context
attached to it.
We restrict the TLS ulp to sockets in ESTABLISHED state
to prevent the scenario above.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ilya Lesokhin
2018-01-31 21:03:48 +0800
48606bb1e netlink: reset extack earlier in netlink_rcv_skb ... Browse Code »

[ Upstream commit cd443f1e91ca600a092e780e8250cd6a2954b763 ]

Move up the extack reset/initialization in netlink_rcv_skb, so that
those 'goto ack' will not skip it. Otherwise, later on netlink_ack
may use the uninitialized extack and cause kernel crash.

Fixes: cbbdf8433a5f ("netlink: extack needs to be reset each time through loop")
Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com
Signed-off-by: Xin Long
Acked-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-01-31 21:03:48 +0800
3eae0ba8c netlink: extack needs to be reset each time through loop ... Browse Code »

[ Upstream commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570 ]

syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.

Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Ahern
2018-01-31 21:03:48 +0800
3c6e5f2f5 sctp: reinit stream if stream outcnt has been change by sinit in sendmsg ... Browse Code »

[ Upstream commit 625637bf4afa45204bd87e4218645182a919485a ]

After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.

However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.

This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.

Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long
Acked-by: Neil Horman
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-01-31 21:03:47 +0800
80f327285 flow_dissector: properly cap thoff field ... Browse Code »

[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]

syzbot reported yet another crash [1] that is caused by
insufficient validation of DODGY packets.

Two bugs are happening here to trigger the crash.

1) Flow dissection leaves with incorrect thoff field.

2) skb_probe_transport_header() sets transport header to this invalid
thoff, even if pointing after skb valid data.

3) qdisc_pkt_len_init() reads out-of-bound data because it
trusts tcp_hdrlen(skb)

Possible fixes :

- Full flow dissector validation before injecting bad DODGY packets in
the stack.
This approach was attempted here : https://patchwork.ozlabs.org/patch/
861874/

- Have more robust functions in the core.
This might be needed anyway for stable versions.

This patch fixes the flow dissection issue.

[1]
CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:355 [inline]
kasan_report+0x23b/0x360 mm/kasan/report.c:413
__asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
__tcp_hdrlen include/linux/tcp.h:35 [inline]
tcp_hdrlen include/linux/tcp.h:40 [inline]
qdisc_pkt_len_init net/core/dev.c:3160 [inline]
__dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
packet_snd net/packet/af_packet.c:2943 [inline]
packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
sock_sendmsg_nosec net/socket.c:628 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:638
sock_write_iter+0x31a/0x5d0 net/socket.c:907
call_write_iter include/linux/fs.h:1776 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:482
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value")
Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Reported-by: syzbot
Acked-by: Jason Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-01-31 21:03:47 +0800
dd7e1cbd2 gso: validate gso_type in GSO handlers ... Browse Code »

[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn
Acked-by: Jason Wang
Reviewed-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2018-01-31 21:03:47 +0800
8af27b14b ip6_gre: init dev->mtu and dev->hard_header_len correctly ... Browse Code »

[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") moved dev->mtu initialization
from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
result, the previously set values, before ndo_init(), are
reset in the following cases:

* rtnl_create_link() can update dev->mtu from IFLA_MTU
parameter.

* ip6gre_tnl_link_config() is invoked before ndo_init() in
netlink and ioctl setup, so ndo_init() can reset MTU
adjustments with the lower device MTU as well, dev->mtu
and dev->hard_header_len.

Not applicable for ip6gretap because it has one more call
to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
parameter if a user sets it manually on a device creation,
and fix the second one by moving ip6gre_tnl_link_config()
call after register_netdevice().

Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
Signed-off-by: Alexey Kodanev
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Alexey Kodanev
2018-01-31 21:03:47 +0800
8de7fb3df tipc: fix a memory leak in tipc_nl_node_get_link() ... Browse Code »

[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov
Cc: Jon Maloy
Cc: Ying Xue
Signed-off-by: Cong Wang
Acked-by: Ying Xue
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2018-01-31 21:03:46 +0800
a940c1964 sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf ... Browse Code »

[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]

After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.

However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.

This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.

Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-01-31 21:03:46 +0800
f2e957097 sctp: do not allow the v4 socket to bind a v4mapped v6 address ... Browse Code »

[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]

The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.

The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.

This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.

Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman
Signed-off-by: Xin Long
Acked-by: Marcelo Ricardo Leitner
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Xin Long
2018-01-31 21:03:46 +0800
d3048a12f net/tls: Fix inverted error codes to avoid endless loop ... Browse Code »

[ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]

sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().

Signed-off-by: Robert Hering
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

r.hering@avm.de
2018-01-31 21:03:45 +0800
32e57f8c5 net: tcp: close sock if net namespace is exiting ... Browse Code »

[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Dan Streetman
2018-01-31 21:03:45 +0800
450449fff net: qdisc_pkt_len_init() should be more robust ... Browse Code »

[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Cc: Jason Wang
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-01-31 21:03:45 +0800
d9bee33e3 net: igmp: fix source address check for IGMPv3 reports ... Browse Code »

[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall
Signed-off-by: Felix Fietkau
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Felix Fietkau
2018-01-31 21:03:45 +0800
347217078 ipv6: ip6_make_skb() needs to clear cork.base.dst ... Browse Code »

[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2018-01-31 21:03:45 +0800