Eric Lee / smarc-fsl-linux-kernel

18 Feb, 2017

35 commits

eee1550b3 Linux 4.9.11 Browse Code »

Greg Kroah-Hartman
2017-02-18 22:11:56 +0800
724aedaa5 x86/fpu/xstate: Fix xcomp_bv in XSAVES header ... Browse Code »

commit dffba9a31c7769be3231c420d4b364c92ba3f1ac upstream.

The compacted-format XSAVES area is determined at boot time and
never changed after. The field xsave.header.xcomp_bv indicates
which components are in the fixed XSAVES format.

In fpstate_init() we did not set xcomp_bv to reflect the XSAVES
format since at the time there is no valid data.

However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(),
as in commit:

b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve()

and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode
app, a #GP occurs. This can be easily triggered by doing valgrind on
a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and
others:

https://bugzilla.kernel.org/show_bug.cgi?id=190061

Fix it by setting xcomp_bv correctly.

This patch also moves the xcomp_bv initialization to the proper
place, which was in copyin_to_xsaves() as of:

4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area

which fixed the bug too, but it's more efficient and cleaner to
initialize things once per boot, not for every signal handling
operation.

Reported-by: Kevin Hao
Reported-by: Joakim Tjernlund
Signed-off-by: Yu-cheng Yu
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Ravi V. Shankar
Cc: Thomas Gleixner
Cc: haokexin@gmail.com
Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com
[ Combined it with 4c833368f0bf. ]
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Yu-cheng Yu
2017-02-18 22:11:44 +0800
0d4c19ee6 tcp: don't annotate mark on control socket from tcp_v6_send_response() ... Browse Code »

commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.

Unlike ipv4, this control socket is shared by all cpus so we cannot use
it as scratchpad area to annotate the mark that we pass to ip6_xmit().

Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
family caches the flowi6 structure in the sctp_transport structure, so
we cannot use to carry the mark unless we later on reset it back, which
I discarded since it looks ugly to me.

Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
Suggested-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Pablo Neira
2017-02-18 22:11:44 +0800
0e0751cdf net/mlx5: Don't unlock fte while still using it ... Browse Code »

commit 0fd758d6112f867b2cc6df0f6a856048ff99b211 upstream.

When adding a new rule to an fte, we need to hold the fte lock
until we add that rule to the fte and increase the fte ref count.

Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API")
Signed-off-by: Mark Bloch
Signed-off-by: Saeed Mahameed
Signed-off-by: Leon Romanovsky
Signed-off-by: Greg Kroah-Hartman

Mark Bloch
2017-02-18 22:11:43 +0800
7c4c32a29 tcp: fix mark propagation with fwmark_reflect enabled ... Browse Code »

commit bf99b4ded5f8a4767dbb9d180626f06c51f9881f upstream.

Otherwise, RST packets generated by the TCP stack for non-existing
sockets always have mark 0.
The mark from the original packet is assigned to the netns_ipv4/6
socket used to send the response so that it can get copied into the
response skb when the socket sends it.

Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti
Signed-off-by: Pau Espin Pedrol
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

Pau Espin Pedrol
2017-02-18 22:11:43 +0800
16a3fbe52 igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() ... Browse Code »

[ Upstream commit 9c8bb163ae784be4f79ae504e78c862806087c54 ]

In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.

Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann
Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Hangbin Liu
2017-02-18 22:11:43 +0800
53a76d633 mld: do not remove mld souce list info when set link down ... Browse Code »

[ Upstream commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba ]

This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.

Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().

Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().

Signed-off-by: Hangbin Liu
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Hangbin Liu
2017-02-18 22:11:43 +0800
5b1bb4cbd l2tp: do not use udp_ioctl() ... Browse Code »

[ Upstream commit 72fb96e7bdbbdd4421b0726992496531060f3636 ]

udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(

L2TP should use its own handler, because it really does not
look the same.

SIOCINQ for instance should not assume UDP checksum or headers.

Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.

While crashes only happen on recent kernels (after commit
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.

Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Acked-by: Paolo Abeni
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:43 +0800
12758a282 net: dsa: Do not destroy invalid network devices ... Browse Code »

[ Upstream commit 382e1eea2d983cd2343482c6a638f497bb44a636 ]

dsa_slave_create() can fail, and dsa_user_port_unapply() will properly check
for the network device not being NULL before attempting to destroy it. We were
not setting the slave network device as NULL if dsa_slave_create() failed, so
we would later on be calling dsa_slave_destroy() on a now free'd and
unitialized network device, causing crashes in dsa_slave_destroy().

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Florian Fainelli
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Florian Fainelli
2017-02-18 22:11:43 +0800
a700cf26a ping: fix a null pointer dereference ... Browse Code »

[ Upstream commit 73d2c6678e6c3af7e7a42b1e78cd0211782ade32 ]

Andrey reported a kernel crash:

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880060048040 task.stack: ffff880069be8000
RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
FS: 00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
Call Trace:
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2

This is because we miss a check for NULL pointer for skb_peek() when
the queue is empty. Other places already have the same check.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Andrey Konovalov
Tested-by: Andrey Konovalov
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

WANG Cong
2017-02-18 22:11:43 +0800
828495418 packet: round up linear to header len ... Browse Code »

[ Upstream commit 57031eb794906eea4e1c7b31dc1e2429c0af0c66 ]

Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.

Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.

Round up the linear length to be at least as long as the smallest of
the two.

Reported-by: Dmitry Vyukov
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-02-18 22:11:43 +0800
6ebde312a net: introduce device min_header_len ... Browse Code »

[ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]

The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan
Signed-off-by: Willem de Bruijn
Acked-by: Eric Dumazet
Acked-by: Sowmini Varadhan
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-02-18 22:11:43 +0800
4cd036211 sit: fix a double free on error path ... Browse Code »

[ Upstream commit d7426c69a1942b2b9b709bf66b944ff09f561484 ]

Dmitry reported a double free in sit_init_net():

kernel BUG at mm/percpu.c:689!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
FS: 00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
free_percpu+0x212/0x520 mm/percpu.c:1264
ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
ops_init+0x10a/0x530 net/core/net_namespace.c:115
setup_net+0x2ed/0x690 net/core/net_namespace.c:291
copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
SYSC_unshare kernel/fork.c:2281 [inline]
SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
entry_SYSCALL_64_fastpath+0x1f/0xc2

This is because when tunnel->dst_cache init fails, we free dev->tstats
once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
redundant but its ndo_uinit() does not seem enough to clean up everything
here. So avoid this by setting dev->tstats to NULL after the first free,
at least for -net.

Reported-by: Dmitry Vyukov
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

WANG Cong
2017-02-18 22:11:43 +0800
2b7f50d67 lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled ... Browse Code »

[ Upstream commit 2bd137de531367fb573d90150d1872cb2a2095f7 ]

An error was reported upgrading to 4.9.8:
root@Typhoon:~# ip route add default table 210 nexthop dev eth0 via 10.68.64.1
weight 1 nexthop dev eth0 via 10.68.64.2 weight 1
RTNETLINK answers: Operation not supported

The problem occurs when CONFIG_LWTUNNEL is not enabled and a multipath
route is submitted.

The point of lwtunnel_valid_encap_type_attr is catch modules that
need to be loaded before any references are taken with rntl held. With
CONFIG_LWTUNNEL disabled, there will be no modules to load so the
lwtunnel_valid_encap_type_attr stub should just return 0.

Fixes: 9ed59592e3e3 ("lwtunnel: fix autoload of lwt modules")
Reported-by: pupilla@libero.it
Signed-off-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David Ahern
2017-02-18 22:11:42 +0800
00eff2ebb sctp: avoid BUG_ON on sctp_wait_for_sndbuf ... Browse Code »

[ Upstream commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 ]

Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.

This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.

Acked-by: Alexander Popov
Signed-off-by: Marcelo Ricardo Leitner
Reviewed-by: Xin Long
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Marcelo Ricardo Leitner
2017-02-18 22:11:42 +0800
4400acce6 mlx4: Invoke softirqs after napi_reschedule ... Browse Code »

[ Upstream commit bd4ce941c8d5b862b2f83364be5dbe8fc8ab48f8 ]

mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame and the following message may be logged:
NOHZ: local_softirq_pending 08

The problem is the same as what was described in commit ec13ee80145c
("virtio_net: invoke softirqs after __napi_schedule") and this patch
applies the same fix to mlx4.

Fixes: 07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails")
Cc: Eric Dumazet
Signed-off-by: Benjamin Poirier
Acked-by: Eric Dumazet
Reviewed-by: Tariq Toukan
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Benjamin Poirier
2017-02-18 22:11:42 +0800
970390fd5 catc: Use heap buffer for memory size test ... Browse Code »

[ Upstream commit 2d6a0e9de03ee658a9adc3bfb2f0ca55dff1e478 ]

Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2017-02-18 22:11:42 +0800
61bf9f381 catc: Combine failure cleanup code in catc_probe() ... Browse Code »

[ Upstream commit d41149145f98fe26dcd0bfd1d6cc095e6e041418 ]

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2017-02-18 22:11:42 +0800
e898f6f00 rtl8150: Use heap buffers for all register access ... Browse Code »

[ Upstream commit 7926aff5c57b577ab0f43364ff0c59d968f6a414 ]

Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2017-02-18 22:11:42 +0800
878b015bc pegasus: Use heap buffers for all register access ... Browse Code »

[ Upstream commit 5593523f968bc86d42a035c6df47d5e0979b5ace ]

Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
References: https://bugs.debian.org/852556
Reported-by: Lisandro Damián Nicanor Pérez Meyer
Tested-by: Lisandro Damián Nicanor Pérez Meyer
Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2017-02-18 22:11:42 +0800
b90cb484c macvtap: read vnet_hdr_size once ... Browse Code »

[ Upstream commit 837585a5375c38d40361cfe64e6fd11e1addb936 ]

When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.

Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.

Signed-off-by: Willem de Bruijn
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-02-18 22:11:42 +0800
26989c9d9 tun: read vnet_hdr_sz once ... Browse Code »

[ Upstream commit e1edab87faf6ca30cd137e0795bc73aa9a9a22ec ]

When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.

Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).

Signed-off-by: Willem de Bruijn
Reported-by: Dmitry Vyukov
CC: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Willem de Bruijn
2017-02-18 22:11:42 +0800
0f895f51a tcp: avoid infinite loop in tcp_splice_read() ... Browse Code »

[ Upstream commit ccf7abb93af09ad0868ae9033d1ca8108bdaec82 ]

Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.

__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.

This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.

Again, this gem was found by syzkaller tool.

Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Cc: Willy Tarreau
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:42 +0800
1e340bb22 ipv6: tcp: add a missing tcp_v6_restore_cb() ... Browse Code »

[ Upstream commit ebf6c9cb23d7e56eec8575a88071dec97ad5c6e2 ]

Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()

A similar bug was fixed in commit 8ce48623f0cf ("ipv6: tcp: restore
IP6CB for pktoptions skbs"), but I missed another spot.

tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
ae1768bbb ip6_gre: fix ip6gre_err() invalid reads ... Browse Code »

[ Upstream commit 7892032cfe67f4bde6fc2ee967e45a8fbaf33756 ]

Andrey Konovalov reported out of bound accesses in ip6gre_err()

If GRE flags contains GRE_KEY, the following expression
*(((__be32 *)p) + (grehlen / 4) - 1)

accesses data ~40 bytes after the expected point, since
grehlen includes the size of IPv6 headers.

Let's use a "struct gre_base_hdr *greh" pointer to make this
code more readable.

p[1] becomes greh->protocol.
grhlen is the GRE header length.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
66cdd4347 netlabel: out of bound access in cipso_v4_validate() ... Browse Code »

[ Upstream commit d71b7896886345c53ef1d84bda2bc758554f5d61 ]

syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()

Fixes: 20e2a8648596 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Cc: Paul Moore
Acked-by: Paul Moore
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
f5b544466 ipv4: keep skb->dst around in presence of IP options ... Browse Code »

[ Upstream commit 34b2cef20f19c87999fff3da4071e66937db9644 ]

Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.

ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.

We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.

Thanks to syzkaller team for finding this bug.

Fixes: d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
d5b6fd775 net: use a work queue to defer net_disable_timestamp() work ... Browse Code »

[ Upstream commit 5fa8bbda38c668e56b0c6cdecced2eac2fe36dec ]

Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.

Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.

As Cong suggested, we now use a work queue.

It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.

netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.

[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: []
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: []
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559

Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang
Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
455a45778 stmmac: Discard masked flags in interrupt status register ... Browse Code »

[ Upstream commit 0a764db103376cf69d04449b10688f3516cc0b88 ]

DW GMAC databook says the following about bits in "Register 15 (Interrupt
Mask Register)":
--------------------------->8-------------------------
When set, this bit __disables_the_assertion_of_the_interrupt_signal__
because of the setting of XXX bit in Register 14 (Interrupt
Status Register).
--------------------------->8-------------------------

In fact even if we mask one bit in the mask register it doesn't prevent
corresponding bit to appear in the status register, it only disables
interrupt generation for corresponding event.

But currently we expect a bit different behavior: status bits to be in
sync with their masks, i.e. if mask for bit A is set in the mask
register then bit A won't appear in the interrupt status register.

This was proven to be incorrect assumption, see discussion here [1].
That misunderstanding causes unexpected behaviour of the GMAC, for
example we were happy enough to just see bogus messages about link
state changes.

So from now on we'll be only checking bits that really may trigger an
interrupt.

[1] https://lkml.org/lkml/2016/11/3/413

Signed-off-by: Alexey Brodkin
Cc: Giuseppe Cavallaro
Cc: Fabrice Gasnier
Cc: Joachim Eastwood
Cc: Phil Reid
Cc: David Miller
Cc: Alexandre Torgue
Cc: Vineet Gupta
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Alexey Brodkin
2017-02-18 22:11:41 +0800
ca876dff1 tcp: fix 0 divide in __tcp_select_window() ... Browse Code »

[ Upstream commit 06425c308b92eaf60767bc71d359f4cbc7a561f8 ]

syszkaller fuzzer was able to trigger a divide by zero, when
TCP window scaling is not enabled.

SO_RCVBUF can be used not only to increase sk_rcvbuf, also
to decrease it below current receive buffers utilization.

If mss is negative or 0, just return a zero TCP window.

Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
e6fbace87 ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() ... Browse Code »

[ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]

Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.

Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Dan Carpenter
2017-02-18 22:11:41 +0800
a7fe4e5d0 ipv6: fix ip6_tnl_parse_tlv_enc_lim() ... Browse Code »

[ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]

This function suffers from multiple issues.

First one is that pskb_may_pull() may reallocate skb->head,
so the 'raw' pointer needs either to be reloaded or not used at all.

Second issue is that NEXTHDR_DEST handling does not validate
that the options are present in skb->data, so we might read
garbage or access non existent memory.

With help from Willem de Bruijn.

Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Cc: Willem de Bruijn
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:41 +0800
6c8556f6e net/sched: matchall: Fix configuration race ... Browse Code »

[ Upstream commit fd62d9f5c575f0792f150109f1fd24a0d4b3f854 ]

In the current version, the matchall internal state is split into two
structs: cls_matchall_head and cls_matchall_filter. This makes little
sense, as matchall instance supports only one filter, and there is no
situation where one exists and the other does not. In addition, that led
to some races when filter was deleted while packet was processed.

Unify that two structs into one, thus simplifying the process of matchall
creation and deletion. As a result, the new, delete and get callbacks have
a dummy implementation where all the work is done in destroy and change
callbacks, as was done in cls_cgroup.

Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Reported-by: Daniel Borkmann
Signed-off-by: Yotam Gigi
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Yotam Gigi
2017-02-18 22:11:41 +0800
64cc7ef5c net/mlx5e: Fix update of hash function/key via ethtool ... Browse Code »

[ Upstream commit a100ff3eef193d2d79daf98dcd97a54776ffeb78 ]

Modifying TIR hash should change selected fields bitmask in addition to
the function and key.

Formerly, Only on ethool mlx5e_set_rxfh "ethtoo -X" we would not set this
field resulting in zeroing of its value, which means no packet fields are
used for RX RSS hash calculation thus causing all traffic to arrive in
RQ[0].

On driver load out of the box we don't have this issue, since the TIR
hash is fully created from scratch.

Tested:
ethtool -X ethX hkey
ethtool -X ethX hfunc
ethtool -X ethX equal

All cases are verified with TCP Multi-Stream traffic over IPv4 & IPv6.

Fixes: bdfc028de1b3 ("net/mlx5e: Fix ethtool RX hash func configuration change")
Signed-off-by: Gal Pressman
Signed-off-by: Saeed Mahameed
Signed-off-by: Greg Kroah-Hartman

Gal Pressman
2017-02-18 22:11:40 +0800
adf86d59b can: Fix kernel panic at security_sock_rcv_skb ... Browse Code »

[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:

[] security_sock_rcv_skb+0x4c/0x60
[] sk_filter+0x41/0x210
[] sock_queue_rcv_skb+0x53/0x3a0
[] raw_rcv+0x2a3/0x3c0
[] can_rcv_filter+0x12b/0x370
[] can_receive+0xd9/0x120
[] can_rcv+0xab/0x100
[] __netif_receive_skb_core+0xd8c/0x11f0
[] __netif_receive_skb+0x24/0xb0
[] process_backlog+0x127/0x280
[] net_rx_action+0x33b/0x4f0
[] __do_softirq+0x184/0x440
[] do_softirq_own_stack+0x1c/0x30

[] do_softirq.part.18+0x3b/0x40
[] do_softirq+0x1d/0x20
[] netif_rx_ni+0xe5/0x110
[] slcan_receive_buf+0x507/0x520
[] flush_to_ldisc+0x21c/0x230
[] process_one_work+0x24f/0x670
[] worker_thread+0x9d/0x6f0
[] ? rescuer_thread+0x480/0x480
[] kthread+0x12c/0x150
[] ret_from_fork+0x3f/0x70

Reported-by: Zhang Yanmin
Signed-off-by: Eric Dumazet
Acked-by: Oliver Hartkopp
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Eric Dumazet
2017-02-18 22:11:40 +0800

15 Feb, 2017

5 commits

390caeedd Linux 4.9.10 Browse Code »

Greg Kroah-Hartman
2017-02-15 07:26:10 +0800
e5c2e5147 perf/core: Fix crash in perf_event_read() ... Browse Code »

commit 451d24d1e5f40bad000fa9abe36ddb16fc9928cb upstream.

Alexei had his box explode because doing read() on a package
(rapl/uncore) event that isn't currently scheduled in ends up doing an
out-of-bounds load.

Rework the code to more explicitly deal with event->oncpu being -1.

Reported-by: Alexei Starovoitov
Tested-by: Alexei Starovoitov
Tested-by: David Carrillo-Cisneros
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: eranian@google.com
Fixes: d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG")
Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Peter Zijlstra
2017-02-15 07:25:43 +0800
de65c300c perf diff: Fix segfault on 'perf diff -o N' option ... Browse Code »

commit 8381cdd0e32dd748bd34ca3ace476949948bd793 upstream.

The -o/--order option is to select column number to sort a diff result.

It does the job by adding a hpp field at the beginning of the sort list.
But it should not be added to the output field list as it has no
callbacks required by a output field.

During the setup_sorting(), the perf_hpp__setup_output_field() appends
the given sort keys to the output field if it's not there already.

Originally it was checked by fmt->list being non-empty. But commit
3f931f2c4274 ("perf hists: Make hpp setup function generic") changed it
to check the ->equal callback.

Anyways, we don't need to add the pseudo hpp field to the output field
list since it won't be used for output. So just skip fields if they
have no ->color or ->entry callbacks.

Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Peter Zijlstra
Fixes: 3f931f2c4274 ("perf hists: Make hpp setup function generic")
Link: http://lkml.kernel.org/r/20170118051457.30946-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Greg Kroah-Hartman

Namhyung Kim
2017-02-15 07:25:42 +0800
85df621b1 perf diff: Fix -o/--order option behavior (again) ... Browse Code »

commit a1c9f97f0b64e6337d9cfcc08c134450934fdd90 upstream.

Commit 21e6d8428664 ("perf diff: Use perf_hpp__register_sort_field
interface") changed list_add() to perf_hpp__register_sort_field().

This resulted in a behavior change since the field was added to the tail
instead of the head. So the -o option is mostly ignored due to its
order in the list.

This patch fixes it by adding perf_hpp__prepend_sort_field().

Signed-off-by: Namhyung Kim
Acked-by: Jiri Olsa
Cc: Peter Zijlstra
Fixes: 21e6d8428664 ("perf diff: Use perf_hpp__register_sort_field interface")
Link: http://lkml.kernel.org/r/20170118051457.30946-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Greg Kroah-Hartman

Namhyung Kim
2017-02-15 07:25:42 +0800
6b4af0dab stacktrace, lockdep: Fix address, newline ugliness ... Browse Code »

commit bfeda41d06d85ad9d52f2413cfc2b77be5022f75 upstream.

Since KERN_CONT became meaningful again, lockdep stack traces have had
annoying extra newlines, like this:

[ 5.561122] -> #1 (B){+.+...}:
[ 5.561528]
[ 5.561532] [] lock_acquire+0xc3/0x210
[ 5.562178]
[ 5.562181] [] mutex_lock_nested+0x74/0x6d0
[ 5.562861]
[ 5.562880] [] init_btrfs_fs+0x21/0x196 [btrfs]
[ 5.563717]
[ 5.563721] [] do_one_initcall+0x52/0x1b0
[ 5.564554]
[ 5.564559] [] do_init_module+0x5f/0x209
[ 5.565357]
[ 5.565361] [] load_module+0x218d/0x2b80
[ 5.566020]
[ 5.566021] [] SyS_finit_module+0xeb/0x120
[ 5.566694]
[ 5.566696] [] entry_SYSCALL_64_fastpath+0x1f/0xc2

That's happening because each printk() call now gets printed on its own
line, and we do a separate call to print the spaces before the symbol.
Fix it by doing the printk() directly instead of using the
print_ip_sym() helper.

Additionally, the symbol address isn't very helpful, so let's get rid of
that, too. The final result looks like this:

[ 5.194518] -> #1 (B){+.+...}:
[ 5.195002] lock_acquire+0xc3/0x210
[ 5.195439] mutex_lock_nested+0x74/0x6d0
[ 5.196491] do_one_initcall+0x52/0x1b0
[ 5.196939] do_init_module+0x5f/0x209
[ 5.197355] load_module+0x218d/0x2b80
[ 5.197792] SyS_finit_module+0xeb/0x120
[ 5.198251] entry_SYSCALL_64_fastpath+0x1f/0xc2

Suggested-by: Linus Torvalds
Signed-off-by: Omar Sandoval
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: kernel-team@fb.com
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/43b4e114724b2bdb0308fa86cb33aa07d3d67fad.1486510315.git.osandov@fb.com
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Omar Sandoval
2017-02-15 07:25:42 +0800