18 Feb, 2017

35 commits

  • Greg Kroah-Hartman
     
  • commit dffba9a31c7769be3231c420d4b364c92ba3f1ac upstream.

    The compacted-format XSAVES area is determined at boot time and
    never changed after. The field xsave.header.xcomp_bv indicates
    which components are in the fixed XSAVES format.

    In fpstate_init() we did not set xcomp_bv to reflect the XSAVES
    format since at the time there was no valid data.

    However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(),
    as in commit:

    b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve()

    and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode
    app, a #GP occurs. This can be easily triggered by doing valgrind on
    a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and
    others:

    https://bugzilla.kernel.org/show_bug.cgi?id=190061

    Fix it by setting xcomp_bv correctly.

    This patch also moves the xcomp_bv initialization to the proper
    place, which was in copyin_to_xsaves() as of:

    4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area

    which fixed the bug too, but it's more efficient and cleaner to
    initialize things once per boot, not for every signal handling
    operation.

    Reported-by: Kevin Hao
    Reported-by: Joakim Tjernlund
    Signed-off-by: Yu-cheng Yu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Ravi V. Shankar
    Cc: Thomas Gleixner
    Cc: haokexin@gmail.com
    Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com
    [ Combined it with 4c833368f0bf. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yu-cheng Yu
     
  • commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.

    Unlike ipv4, this control socket is shared by all cpus so we cannot use
    it as scratchpad area to annotate the mark that we pass to ip6_xmit().

    Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
    family caches the flowi6 structure in the sctp_transport structure, so
    we cannot use it to carry the mark unless we reset it afterwards, an
    approach I discarded since it looks ugly to me.

    Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
    Suggested-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira
     
  • commit 0fd758d6112f867b2cc6df0f6a856048ff99b211 upstream.

    When adding a new rule to an fte, we need to hold the fte lock
    until we add that rule to the fte and increase the fte ref count.

    Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API")
    Signed-off-by: Mark Bloch
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Greg Kroah-Hartman

    Mark Bloch
     
  • commit bf99b4ded5f8a4767dbb9d180626f06c51f9881f upstream.

    Otherwise, RST packets generated by the TCP stack for non-existing
    sockets always have mark 0.
    The mark from the original packet is assigned to the netns_ipv4/6
    socket used to send the response so that it can get copied into the
    response skb when the socket sends it.

    Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
    Cc: Lorenzo Colitti
    Signed-off-by: Pau Espin Pedrol
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pau Espin Pedrol
     
  • [ Upstream commit 9c8bb163ae784be4f79ae504e78c862806087c54 ]

    In function igmpv3/mld_add_delrec() we allocate pmc and put it in
    idev->mc_tomb, so we should free it when we don't need it in del_delrec().
    But I incorrectly removed kfree(pmc) in the latest two patches. Fix it now.

    Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
    Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
    Reported-by: Daniel Borkmann
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba ]

    This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
    souce list..."). In mld_del_delrec(), we now restore all source filter
    info instead of flushing it.

    Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
    we should not remove source list info when the link is set down. Remove
    igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
    ipv6_mc_down().

    Also clear all source info after igmp6_group_dropped() instead of in it
    because ipv6_mc_down() will call igmp6_group_dropped().

    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit 72fb96e7bdbbdd4421b0726992496531060f3636 ]

    udp_ioctl(), as its name suggests, is used by UDP protocols,
    but is also used by L2TP :(

    L2TP should use its own handler, because it really does not
    look the same.

    SIOCINQ for instance should not assume UDP checksum or headers.

    Thanks to Andrey and syzkaller team for providing the report
    and a nice reproducer.

    While crashes only happen on recent kernels (after commit
    7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
    probably needs to be backported to older kernels.

    Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
    Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 382e1eea2d983cd2343482c6a638f497bb44a636 ]

    dsa_slave_create() can fail, and dsa_user_port_unapply() will properly check
    for the network device not being NULL before attempting to destroy it. We were
    not setting the slave network device to NULL if dsa_slave_create() failed, so
    we would later on be calling dsa_slave_destroy() on a now freed and
    uninitialized network device, causing crashes in dsa_slave_destroy().

    Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit 73d2c6678e6c3af7e7a42b1e78cd0211782ade32 ]

    Andrey reported a kernel crash:

    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff880060048040 task.stack: ffff880069be8000
    RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
    RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
    RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
    RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
    RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
    RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
    R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
    R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
    FS: 00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
    Call Trace:
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
    sock_sendmsg_nosec net/socket.c:635 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:645
    SYSC_sendto+0x660/0x810 net/socket.c:1687
    SyS_sendto+0x40/0x50 net/socket.c:1655
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    This is because we miss a check for NULL pointer for skb_peek() when
    the queue is empty. Other places already have the same check.
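
    A minimal sketch of that check, written as plain userspace C rather than
    kernel code. The pkt, pkt_queue, queue_peek() and first_pending_len() names
    are hypothetical stand-ins for sk_buff, the socket write queue and
    skb_peek(); only the NULL test mirrors what the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and its queue; not the kernel types. */
struct pkt {
    struct pkt *next;
    int len;
};

struct pkt_queue {
    struct pkt *head;
};

/* Like skb_peek(): returns the first packet, or NULL when the queue is empty. */
struct pkt *queue_peek(const struct pkt_queue *q)
{
    assert(q != NULL);
    return q->head;
}

/* Returns the length of the first pending packet, or -1 if nothing is queued. */
int first_pending_len(const struct pkt_queue *q)
{
    struct pkt *p = queue_peek(q);

    if (!p)        /* empty queue: bail out instead of dereferencing NULL */
        return -1;
    return p->len;
}
```

    Calling first_pending_len() on an empty queue takes the early return
    instead of dereferencing a NULL pointer.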

    Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     
  • [ Upstream commit 57031eb794906eea4e1c7b31dc1e2429c0af0c66 ]

    Link layer protocols may unconditionally pull headers, as Ethernet
    does in eth_type_trans. Ensure that the entire link layer header
    always lies in the skb linear segment. tpacket_snd has such a check.
    Extend this to packet_snd.

    Variable length link layer headers complicate the computation
    somewhat. Here skb->len may be smaller than dev->hard_header_len.

    Round up the linear length to be at least as long as the smallest of
    the two.
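
    The rounding rule can be sketched as below. This is an illustration under
    assumptions, not the patch itself: linear_len() and its parameter names
    are hypothetical, with pkt_len playing the role of skb->len and
    hard_header_len that of dev->hard_header_len.

```c
#include <assert.h>
#include <stddef.h>

size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }
size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }

/*
 * Bytes to place in the linear segment: never fewer than the link layer
 * header, except when the whole packet is shorter than that header.
 */
size_t linear_len(size_t requested, size_t pkt_len, size_t hard_header_len)
{
    return max_sz(requested, min_sz(pkt_len, hard_header_len));
}
```

    With a 14-byte header, a 100-byte packet that asked for no linear data
    gets 14 linear bytes, while a 10-byte packet gets only its own 10.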

    Reported-by: Dmitry Vyukov
    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]

    The stack must not pass packets to device drivers that are shorter
    than the minimum link layer header length.

    Previously, packet sockets would drop packets smaller than or equal
    to dev->hard_header_len, but this has false positives. Zero length
    payload is used over Ethernet. Other link layer protocols support
    variable length headers. Support for validation of these protocols
    removed the min length check for all protocols.

    Introduce an explicit dev->min_header_len parameter and drop all
    packets below this value. Initially, set it to non-zero only for
    Ethernet and loopback. Other protocols can follow in a patch to
    net-next.

    Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
    Reported-by: Sowmini Varadhan
    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit d7426c69a1942b2b9b709bf66b944ff09f561484 ]

    Dmitry reported a double free in sit_init_net():

    kernel BUG at mm/percpu.c:689!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    BIOS Google 01/01/2011
    task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
    RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
    RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
    RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
    RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
    RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
    R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
    R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
    FS: 00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
    DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Call Trace:
    free_percpu+0x212/0x520 mm/percpu.c:1264
    ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
    sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
    ops_init+0x10a/0x530 net/core/net_namespace.c:115
    setup_net+0x2ed/0x690 net/core/net_namespace.c:291
    copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
    create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
    unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
    SYSC_unshare kernel/fork.c:2281 [inline]
    SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    This is because when tunnel->dst_cache init fails, we free dev->tstats
    once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
    redundant but its ndo_uinit() does not seem enough to clean up everything
    here. So avoid this by setting dev->tstats to NULL after the first free,
    at least for -net.
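
    A minimal userspace sketch of the idea, assuming a hypothetical
    tunnel_dev structure and dev_free_stats() helper, with free() standing in
    for free_percpu(). Clearing the pointer after the first free turns any
    later free of the same field into a harmless no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a device that owns a statistics allocation. */
struct tunnel_dev {
    long *tstats;
};

/* Free the stats once and forget the pointer so later cleanups are no-ops. */
void dev_free_stats(struct tunnel_dev *dev)
{
    assert(dev != NULL);
    free(dev->tstats);      /* free(NULL) does nothing */
    dev->tstats = NULL;     /* a second cleanup path now sees NULL */
}
```

    Calling dev_free_stats() from two different error paths is then safe,
    because the second call only ever passes NULL to free().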

    Reported-by: Dmitry Vyukov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     
  • [ Upstream commit 2bd137de531367fb573d90150d1872cb2a2095f7 ]

    An error was reported upgrading to 4.9.8:
    root@Typhoon:~# ip route add default table 210 nexthop dev eth0 via 10.68.64.1
    weight 1 nexthop dev eth0 via 10.68.64.2 weight 1
    RTNETLINK answers: Operation not supported

    The problem occurs when CONFIG_LWTUNNEL is not enabled and a multipath
    route is submitted.

    The point of lwtunnel_valid_encap_type_attr is to catch modules that
    need to be loaded before any references are taken with rtnl held. With
    CONFIG_LWTUNNEL disabled, there will be no modules to load so the
    lwtunnel_valid_encap_type_attr stub should just return 0.

    Fixes: 9ed59592e3e3 ("lwtunnel: fix autoload of lwt modules")
    Reported-by: pupilla@libero.it
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 ]

    Alexander Popov reported that an application may trigger a BUG_ON in
    sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
    waiting on it to queue more data and meanwhile another thread peels off
    the association being used by the first thread.

    This patch replaces the BUG_ON call with a proper error handling. It
    will return -EPIPE to the original sendmsg call, similarly to what would
    have been done if the association wasn't found in the first place.
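
    The shape of that change can be sketched as follows. The structures and
    check_assoc_owner() are hypothetical simplifications; the sketch only
    shows a runtime condition being reported with -EPIPE instead of being
    treated as a fatal assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket and association models. */
struct sock_model {
    int id;
};

struct assoc_model {
    const struct sock_model *sk;   /* socket that currently owns the association */
};

/*
 * After waking up, verify that the association still belongs to the socket
 * we were sending on. If it was peeled off meanwhile, report an error.
 */
int check_assoc_owner(const struct assoc_model *asoc, const struct sock_model *sk)
{
    if (asoc->sk != sk)
        return -EPIPE;
    return 0;
}
```

    The caller can then pass -EPIPE back to the sender instead of bringing
    the whole system down.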

    Acked-by: Alexander Popov
    Signed-off-by: Marcelo Ricardo Leitner
    Reviewed-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Ricardo Leitner
     
  • [ Upstream commit bd4ce941c8d5b862b2f83364be5dbe8fc8ab48f8 ]

    mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
    in a deterministic time frame and the following message may be logged:
    NOHZ: local_softirq_pending 08

    The problem is the same as what was described in commit ec13ee80145c
    ("virtio_net: invoke softirqs after __napi_schedule") and this patch
    applies the same fix to mlx4.

    Fixes: 07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails")
    Cc: Eric Dumazet
    Signed-off-by: Benjamin Poirier
    Acked-by: Eric Dumazet
    Reviewed-by: Tariq Toukan
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Benjamin Poirier
     
  • [ Upstream commit 2d6a0e9de03ee658a9adc3bfb2f0ca55dff1e478 ]

    Allocating USB buffers on the stack is not portable, and no longer
    works on x86_64 (with VMAP_STACK enabled as per default).

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • [ Upstream commit d41149145f98fe26dcd0bfd1d6cc095e6e041418 ]

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • [ Upstream commit 7926aff5c57b577ab0f43364ff0c59d968f6a414 ]

    Allocating USB buffers on the stack is not portable, and no longer
    works on x86_64 (with VMAP_STACK enabled as per default).

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • [ Upstream commit 5593523f968bc86d42a035c6df47d5e0979b5ace ]

    Allocating USB buffers on the stack is not portable, and no longer
    works on x86_64 (with VMAP_STACK enabled as per default).

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    References: https://bugs.debian.org/852556
    Reported-by: Lisandro Damián Nicanor Pérez Meyer
    Tested-by: Lisandro Damián Nicanor Pérez Meyer
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • [ Upstream commit 837585a5375c38d40361cfe64e6fd11e1addb936 ]

    When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
    Data length is verified to be greater than or equal to expected header
    length tun->vnet_hdr_sz before copying.

    Macvtap functions read the value once, but unless READ_ONCE is used,
    the compiler may ignore this and read multiple times. Enforce a single
    read and locally cached value to avoid updates between test and use.
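
    A userspace sketch of the single-read pattern. READ_ONCE_INT is a
    simplified, int-only imitation of the kernel's READ_ONCE, and tun_model
    and payload_len() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simplified int-only imitation of READ_ONCE: forces exactly one load. */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

/* Hypothetical model of a device whose header size may change concurrently. */
struct tun_model {
    int vnet_hdr_sz;
};

/* Returns the payload length after the header, or -1 if the data is too short. */
int payload_len(struct tun_model *tun, int total_len)
{
    int hdr_sz = READ_ONCE_INT(tun->vnet_hdr_sz);   /* read once, cache locally */

    if (total_len < hdr_sz)         /* test ... */
        return -1;
    return total_len - hdr_sz;      /* ... and use the very same value */
}
```

    Because both the comparison and the subtraction use hdr_sz, a concurrent
    update of vnet_hdr_sz cannot slip in between the test and the use.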

    Signed-off-by: Willem de Bruijn
    Suggested-by: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit e1edab87faf6ca30cd137e0795bc73aa9a9a22ec ]

    When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
    Data length is verified to be greater than or equal to expected header
    length tun->vnet_hdr_sz before copying.

    Read this value once and cache locally, as it can be updated between
    the test and use (TOCTOU).

    Signed-off-by: Willem de Bruijn
    Reported-by: Dmitry Vyukov
    CC: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit ccf7abb93af09ad0868ae9033d1ca8108bdaec82 ]

    Splicing from TCP socket is vulnerable when a packet with URG flag is
    received and stored into receive queue.

    __tcp_splice_read() returns 0, and sk_wait_data() immediately
    returns since there is the problematic skb in queue.

    This is a nice way to burn cpu (aka infinite loop) and trigger
    soft lockups.

    Again, this gem was found by syzkaller tool.

    Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: Willy Tarreau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit ebf6c9cb23d7e56eec8575a88071dec97ad5c6e2 ]

    Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()

    A similar bug was fixed in commit 8ce48623f0cf ("ipv6: tcp: restore
    IP6CB for pktoptions skbs"), but I missed another spot.

    tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts

    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 7892032cfe67f4bde6fc2ee967e45a8fbaf33756 ]

    Andrey Konovalov reported out of bound accesses in ip6gre_err()

    If GRE flags contains GRE_KEY, the following expression
    *(((__be32 *)p) + (grehlen / 4) - 1)

    accesses data ~40 bytes after the expected point, since
    grehlen includes the size of IPv6 headers.

    Let's use a "struct gre_base_hdr *greh" pointer to make this
    code more readable.

    p[1] becomes greh->protocol.
    grehlen is the GRE header length.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit d71b7896886345c53ef1d84bda2bc758554f5d61 ]

    syzkaller found another out of bound access in ip_options_compile(),
    or more exactly in cipso_v4_validate()

    Fixes: 20e2a8648596 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
    Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: Paul Moore
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 34b2cef20f19c87999fff3da4071e66937db9644 ]

    Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
    is accessed.

    ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
    are present.

    We could refine the test to the presence of ts_needtime or srr,
    but IP options are not often used, so let's be conservative.

    Thanks to syzkaller team for finding this bug.

    Fixes: d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 5fa8bbda38c668e56b0c6cdecced2eac2fe36dec ]

    Dmitry reported a warning [1] showing that we were calling
    net_disable_timestamp() -> static_key_slow_dec() from a non
    process context.

    Grabbing a mutex while holding a spinlock or rcu_read_lock()
    is not allowed.

    As Cong suggested, we now use a work queue.

    It is possible netstamp_clear() exits while netstamp_needed_deferred
    is not zero, but it is probably not worth trying to do better than that.

    netstamp_needed_deferred atomic tracks the exact number of deferred
    decrements.
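
    The deferral scheme can be modelled in userspace with C11 atomics. This
    is a sketch under assumptions: stamp_needed stands in for the static key
    count, stamp_clear_work() for the work item, and all names are
    hypothetical. No real work queue is involved.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int stamp_needed;            /* models the static key enable count */
atomic_int stamp_needed_deferred;   /* decrements requested but not yet applied */

/* Safe in atomic context: only records the request, never sleeps. */
void disable_stamp_atomic_ctx(void)
{
    atomic_fetch_add(&stamp_needed_deferred, 1);
    /* kernel code would schedule the work item here */
}

/* Runs later in process context, where taking a mutex is allowed. */
void stamp_clear_work(void)
{
    int deferred = atomic_exchange(&stamp_needed_deferred, 0);

    while (deferred-- > 0)
        atomic_fetch_sub(&stamp_needed, 1);
}
```

    Requests that arrive while the worker is running simply leave the
    deferred counter non-zero and are handled by the next run.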

    [1]
    [ INFO: suspicious RCU usage. ]
    4.10.0-rc5+ #192 Not tainted
    -------------------------------
    ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
    critical section!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 0
    2 locks held by syz-executor14/23111:
    #0: (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
    include/net/sock.h:1454 [inline]
    #0: (sk_lock-AF_INET6){+.+.+.}, at: []
    rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
    #1: (rcu_read_lock){......}, at: [] nf_hook
    include/linux/netfilter.h:201 [inline]
    #1: (rcu_read_lock){......}, at: []
    __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

    stack backtrace:
    CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
    01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:15 [inline]
    dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
    lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
    rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
    ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
    __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
    mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
    atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
    __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
    static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
    net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
    sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
    __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
    sk_destruct+0x47/0x80 net/core/sock.c:1460
    __sk_free+0x57/0x230 net/core/sock.c:1468
    sock_wfree+0xae/0x120 net/core/sock.c:1645
    skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
    skb_release_all+0x15/0x60 net/core/skbuff.c:668
    __kfree_skb+0x15/0x20 net/core/skbuff.c:684
    kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
    inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
    inet_frag_put include/net/inet_frag.h:133 [inline]
    nf_ct_frag6_gather+0x1106/0x3840
    net/ipv6/netfilter/nf_conntrack_reasm.c:617
    ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
    nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
    nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
    nf_hook include/linux/netfilter.h:212 [inline]
    __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
    ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
    ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
    ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
    rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
    rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
    sock_sendmsg_nosec net/socket.c:635 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:645
    sock_write_iter+0x326/0x600 net/socket.c:848
    do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
    do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
    vfs_writev+0x87/0xc0 fs/read_write.c:911
    do_writev+0x110/0x2c0 fs/read_write.c:944
    SYSC_writev fs/read_write.c:1017 [inline]
    SyS_writev+0x27/0x30 fs/read_write.c:1014
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    RIP: 0033:0x445559
    RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
    RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
    RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
    R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
    BUG: sleeping function called from invalid context at
    kernel/locking/mutex.c:752
    in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
    INFO: lockdep is turned off.
    CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
    01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:15 [inline]
    dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
    ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
    __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
    mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
    atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
    __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
    static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
    net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
    sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
    __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
    sk_destruct+0x47/0x80 net/core/sock.c:1460
    __sk_free+0x57/0x230 net/core/sock.c:1468
    sock_wfree+0xae/0x120 net/core/sock.c:1645
    skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
    skb_release_all+0x15/0x60 net/core/skbuff.c:668
    __kfree_skb+0x15/0x20 net/core/skbuff.c:684
    kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
    inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
    inet_frag_put include/net/inet_frag.h:133 [inline]
    nf_ct_frag6_gather+0x1106/0x3840
    net/ipv6/netfilter/nf_conntrack_reasm.c:617
    ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
    nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
    nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
    nf_hook include/linux/netfilter.h:212 [inline]
    __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
    ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
    ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
    ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
    rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
    rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
    inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
    sock_sendmsg_nosec net/socket.c:635 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:645
    sock_write_iter+0x326/0x600 net/socket.c:848
    do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
    do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
    vfs_writev+0x87/0xc0 fs/read_write.c:911
    do_writev+0x110/0x2c0 fs/read_write.c:944
    SYSC_writev fs/read_write.c:1017 [inline]
    SyS_writev+0x27/0x30 fs/read_write.c:1014
    entry_SYSCALL_64_fastpath+0x1f/0xc2
    RIP: 0033:0x445559

    Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
    Suggested-by: Cong Wang
    Reported-by: Dmitry Vyukov
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 0a764db103376cf69d04449b10688f3516cc0b88 ]

    DW GMAC databook says the following about bits in "Register 15 (Interrupt
    Mask Register)":
    --------------------------->8-------------------------
    When set, this bit __disables_the_assertion_of_the_interrupt_signal__
    because of the setting of XXX bit in Register 14 (Interrupt
    Status Register).
    --------------------------->8-------------------------

    In fact, even if we mask a bit in the mask register, that does not
    prevent the corresponding bit from appearing in the status register; it
    only disables interrupt generation for the corresponding event.

    But currently we expect slightly different behavior: status bits to be in
    sync with their masks, i.e. if the mask for bit A is set in the mask
    register then bit A won't appear in the interrupt status register.

    This was proven to be an incorrect assumption, see the discussion here [1].
    That misunderstanding causes unexpected behaviour of the GMAC; for
    example, we were lucky enough to see only bogus messages about link
    state changes.

    So from now on we'll be only checking bits that really may trigger an
    interrupt.
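
    In code, the rule amounts to discarding masked bits from the status word
    before acting on it. The helper below is a hypothetical sketch, not the
    driver function; it assumes, as the databook quote says, that a set mask
    bit disables the corresponding interrupt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the status bits whose interrupt is enabled.
 * A set bit in mask means "interrupt disabled" for that event.
 */
uint32_t irq_bits_to_handle(uint32_t status, uint32_t mask)
{
    return status & ~mask;
}
```

    A status bit that is raised but masked is therefore ignored rather than
    reported as an event.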

    [1] https://lkml.org/lkml/2016/11/3/413

    Signed-off-by: Alexey Brodkin
    Cc: Giuseppe Cavallaro
    Cc: Fabrice Gasnier
    Cc: Joachim Eastwood
    Cc: Phil Reid
    Cc: David Miller
    Cc: Alexandre Torgue
    Cc: Vineet Gupta
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Brodkin
     
  • [ Upstream commit 06425c308b92eaf60767bc71d359f4cbc7a561f8 ]

    The syzkaller fuzzer was able to trigger a divide by zero when
    TCP window scaling is not enabled.

    SO_RCVBUF can be used not only to increase sk_rcvbuf, but also
    to decrease it below the current receive buffer utilization.

    If mss is negative or 0, just return a zero TCP window.
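
    A minimal sketch of the guard. window_rounded_to_mss() is a hypothetical
    helper and not the kernel's window selection code; it only shows the
    division being skipped when mss is not positive. The free_space check is
    an extra assumption of this sketch.

```c
#include <assert.h>

/* Round the available space down to a multiple of mss; 0 if mss is unusable. */
int window_rounded_to_mss(int free_space, int mss)
{
    if (mss <= 0)           /* would divide by zero or by a negative value */
        return 0;
    if (free_space <= 0)    /* extra guard in this sketch: nothing to offer */
        return 0;
    return (free_space / mss) * mss;
}
```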

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]

    Casting is a high precedence operation but "off" and "i" are in terms of
    bytes so we need to have some parentheses here.
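
    The precedence issue can be reproduced in a few lines of standalone C.
    The tlv_model structure and both helpers are hypothetical; they only
    contrast a cast applied before the addition with one applied after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte option layout, used only to give the cast a size. */
struct tlv_model {
    uint8_t type;
    uint8_t len;
    uint8_t value[2];
};

/* Cast binds first, so off is scaled by sizeof(struct tlv_model). */
const struct tlv_model *tlv_at_scaled(const uint8_t *buf, size_t off)
{
    return (const struct tlv_model *)buf + off;
}

/* Parentheses make the addition happen in bytes, then the cast applies. */
const struct tlv_model *tlv_at(const uint8_t *buf, size_t off)
{
    return (const struct tlv_model *)(buf + off);
}
```

    With off equal to 2, tlv_at() points 2 bytes into the buffer, while
    tlv_at_scaled() points 2 * sizeof(struct tlv_model) bytes in.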

    Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
    Signed-off-by: Dan Carpenter
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • [ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]

    This function suffers from multiple issues.

    First one is that pskb_may_pull() may reallocate skb->head,
    so the 'raw' pointer needs either to be reloaded or not used at all.
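
    The first issue is the classic stale pointer after reallocation. The
    sketch below is a loose userspace model in which realloc() plays the role
    of the head reallocation; buf_model, buf_may_pull() and read_opt_len()
    are hypothetical names. The point is to keep an offset and derive the
    pointer only after the call that may move the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose storage may move when it grows. */
struct buf_model {
    unsigned char *head;
    size_t len;
};

/* Ensure at least need bytes are available; may move b->head. */
int buf_may_pull(struct buf_model *b, size_t need)
{
    unsigned char *grown;

    if (need <= b->len)
        return 1;
    grown = realloc(b->head, need);
    if (!grown)
        return 0;
    memset(grown + b->len, 0, need - b->len);   /* zero the new tail */
    b->head = grown;
    b->len = need;
    return 1;
}

/* Read the length byte of an option at byte offset off, or -1 on failure. */
int read_opt_len(struct buf_model *b, size_t off)
{
    const unsigned char *opt;

    if (!buf_may_pull(b, off + 2))
        return -1;
    opt = b->head + off;    /* derived after the possible move, never before */
    return opt[1];
}
```

    Holding on to a pointer computed before buf_may_pull() would risk
    reading freed memory whenever the buffer had to grow.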

    Second issue is that NEXTHDR_DEST handling does not validate
    that the options are present in skb->data, so we might read
    garbage or access non existent memory.

    With help from Willem de Bruijn.

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit fd62d9f5c575f0792f150109f1fd24a0d4b3f854 ]

    In the current version, the matchall internal state is split into two
    structs: cls_matchall_head and cls_matchall_filter. This makes little
    sense, as a matchall instance supports only one filter, and there is no
    situation where one exists and the other does not. In addition, that led
    to some races when a filter was deleted while a packet was being processed.

    Unify these two structs into one, thus simplifying the process of matchall
    creation and deletion. As a result, the new, delete and get callbacks have
    a dummy implementation where all the work is done in destroy and change
    callbacks, as was done in cls_cgroup.

    Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
    Reported-by: Daniel Borkmann
    Signed-off-by: Yotam Gigi
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yotam Gigi
     
  • [ Upstream commit a100ff3eef193d2d79daf98dcd97a54776ffeb78 ]

    Modifying TIR hash should change selected fields bitmask in addition to
    the function and key.

    Formerly, only in the ethtool path (mlx5e_set_rxfh, "ethtool -X") we
    would not set this field, which zeroed its value. That means no packet
    fields are used for RX RSS hash calculation, causing all traffic to
    arrive in RQ[0].

    On driver load out of the box we don't have this issue, since the TIR
    hash is fully created from scratch.

    Tested:
    ethtool -X ethX hkey
    ethtool -X ethX hfunc
    ethtool -X ethX equal

    All cases are verified with TCP Multi-Stream traffic over IPv4 & IPv6.

    Fixes: bdfc028de1b3 ("net/mlx5e: Fix ethtool RX hash func configuration change")
    Signed-off-by: Gal Pressman
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Greg Kroah-Hartman

    Gal Pressman
     
  • [ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]

    Zhang Yanmin reported crashes [1] and provided a patch adding a
    synchronize_rcu() call in can_rx_unregister().

    The main problem seems to be that the sockets themselves are not RCU
    protected.

    If CAN uses RCU for delivery, then sockets should be freed only after
    one RCU grace period.

    Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
    ease stable backports with the following fix instead.

    [1]
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] selinux_socket_sock_rcv_skb+0x65/0x2a0

    Call Trace:

    [] security_sock_rcv_skb+0x4c/0x60
    [] sk_filter+0x41/0x210
    [] sock_queue_rcv_skb+0x53/0x3a0
    [] raw_rcv+0x2a3/0x3c0
    [] can_rcv_filter+0x12b/0x370
    [] can_receive+0xd9/0x120
    [] can_rcv+0xab/0x100
    [] __netif_receive_skb_core+0xd8c/0x11f0
    [] __netif_receive_skb+0x24/0xb0
    [] process_backlog+0x127/0x280
    [] net_rx_action+0x33b/0x4f0
    [] __do_softirq+0x184/0x440
    [] do_softirq_own_stack+0x1c/0x30

    [] do_softirq.part.18+0x3b/0x40
    [] do_softirq+0x1d/0x20
    [] netif_rx_ni+0xe5/0x110
    [] slcan_receive_buf+0x507/0x520
    [] flush_to_ldisc+0x21c/0x230
    [] process_one_work+0x24f/0x670
    [] worker_thread+0x9d/0x6f0
    [] ? rescuer_thread+0x480/0x480
    [] kthread+0x12c/0x150
    [] ret_from_fork+0x3f/0x70

    Reported-by: Zhang Yanmin
    Signed-off-by: Eric Dumazet
    Acked-by: Oliver Hartkopp
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

15 Feb, 2017

5 commits

  • Greg Kroah-Hartman
     
  • commit 451d24d1e5f40bad000fa9abe36ddb16fc9928cb upstream.

    Alexei had his box explode because doing read() on a package
    (rapl/uncore) event that isn't currently scheduled in ends up doing an
    out-of-bounds load.

    Rework the code to more explicitly deal with event->oncpu being -1.

    Reported-by: Alexei Starovoitov
    Tested-by: Alexei Starovoitov
    Tested-by: David Carrillo-Cisneros
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: eranian@google.com
    Fixes: d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG")
    Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 8381cdd0e32dd748bd34ca3ace476949948bd793 upstream.

    The -o/--order option selects the column number used to sort a diff result.

    It does the job by adding a hpp field at the beginning of the sort list.
    But it should not be added to the output field list as it has no
    callbacks required by an output field.

    During the setup_sorting(), the perf_hpp__setup_output_field() appends
    the given sort keys to the output field if it's not there already.

    Originally it was checked by fmt->list being non-empty. But commit
    3f931f2c4274 ("perf hists: Make hpp setup function generic") changed it
    to check the ->equal callback.

    Anyway, we don't need to add the pseudo hpp field to the output field
    list since it won't be used for output. So just skip fields if they
    have no ->color or ->entry callbacks.

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Peter Zijlstra
    Fixes: 3f931f2c4274 ("perf hists: Make hpp setup function generic")
    Link: http://lkml.kernel.org/r/20170118051457.30946-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Namhyung Kim
     
  • commit a1c9f97f0b64e6337d9cfcc08c134450934fdd90 upstream.

    Commit 21e6d8428664 ("perf diff: Use perf_hpp__register_sort_field
    interface") changed list_add() to perf_hpp__register_sort_field().

    This resulted in a behavior change since the field was added to the tail
    instead of the head. So the -o option is mostly ignored due to its
    order in the list.

    This patch fixes it by adding perf_hpp__prepend_sort_field().
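
    To make the head-versus-tail difference concrete, here is a minimal
    sketch using a hand-rolled singly linked list. The real perf code uses
    struct list_head, and the struct field, struct field_list and helper
    names below are hypothetical simplifications rather than the perf API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal sort-key list entry. */
struct field {
	const char *name;
	struct field *next;
};

struct field_list {
	struct field *head;
	struct field *tail;
};

/* Append: the new key goes to the tail, so it is consulted last. */
static void register_sort_field(struct field_list *l, struct field *f)
{
	assert(l != NULL && f != NULL);
	f->next = NULL;
	if (l->tail)
		l->tail->next = f;
	else
		l->head = f;
	l->tail = f;
}

/* Prepend: the new key goes to the head, so it becomes the primary key. */
static void prepend_sort_field(struct field_list *l, struct field *f)
{
	assert(l != NULL && f != NULL);
	f->next = l->head;
	l->head = f;
	if (!l->tail)
		l->tail = f;
}
```

    Appending the -o field after the default keys leaves it at the end of
    the list, where earlier keys usually decide the order first; prepending
    it makes it the first key compared.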

    Signed-off-by: Namhyung Kim
    Acked-by: Jiri Olsa
    Cc: Peter Zijlstra
    Fixes: 21e6d8428664 ("perf diff: Use perf_hpp__register_sort_field interface")
    Link: http://lkml.kernel.org/r/20170118051457.30946-2-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Namhyung Kim
     
  • commit bfeda41d06d85ad9d52f2413cfc2b77be5022f75 upstream.

    Since KERN_CONT became meaningful again, lockdep stack traces have had
    annoying extra newlines, like this:

    [ 5.561122] -> #1 (B){+.+...}:
    [ 5.561528]
    [ 5.561532] [] lock_acquire+0xc3/0x210
    [ 5.562178]
    [ 5.562181] [] mutex_lock_nested+0x74/0x6d0
    [ 5.562861]
    [ 5.562880] [] init_btrfs_fs+0x21/0x196 [btrfs]
    [ 5.563717]
    [ 5.563721] [] do_one_initcall+0x52/0x1b0
    [ 5.564554]
    [ 5.564559] [] do_init_module+0x5f/0x209
    [ 5.565357]
    [ 5.565361] [] load_module+0x218d/0x2b80
    [ 5.566020]
    [ 5.566021] [] SyS_finit_module+0xeb/0x120
    [ 5.566694]
    [ 5.566696] [] entry_SYSCALL_64_fastpath+0x1f/0xc2

    That's happening because each printk() call now gets printed on its own
    line, and we do a separate call to print the spaces before the symbol.
    Fix it by doing the printk() directly instead of using the
    print_ip_sym() helper.

    Additionally, the symbol address isn't very helpful, so let's get rid of
    that, too. The final result looks like this:

    [ 5.194518] -> #1 (B){+.+...}:
    [ 5.195002] lock_acquire+0xc3/0x210
    [ 5.195439] mutex_lock_nested+0x74/0x6d0
    [ 5.196491] do_one_initcall+0x52/0x1b0
    [ 5.196939] do_init_module+0x5f/0x209
    [ 5.197355] load_module+0x218d/0x2b80
    [ 5.197792] SyS_finit_module+0xeb/0x120
    [ 5.198251] entry_SYSCALL_64_fastpath+0x1f/0xc2

    Suggested-by: Linus Torvalds
    Signed-off-by: Omar Sandoval
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
    Link: http://lkml.kernel.org/r/43b4e114724b2bdb0308fa86cb33aa07d3d67fad.1486510315.git.osandov@fb.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Omar Sandoval