25 Dec, 2017

2 commits

  • [ Upstream commit 16320f363ae128d9b9c70e60f00f2a572f57c23d ]

    To allow canceling all packets of a connection.

    Reviewed-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: Peng Tao
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
     
  • [ Upstream commit 36d277bac8080202684e67162ebb157f16631581 ]

    So that we can cancel a queued pkt later if necessary.

    Signed-off-by: Peng Tao
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Peng Tao
     

20 Dec, 2017

7 commits

  • [ Upstream commit 99260132fde7bddc6e0132ce53da94d1c9ccabcb ]

    The original code only took into consideration the largest header
    possible after the IB_BTH_BYTES. This was incorrect, as the largest
    possible header size is the largest possible combination of headers we
    might run into. The new code accounts for all possible headers in the
    largest possible combination and subtracts that from the MTU to make
    sure that all packets will fit on the wire.

    Link: https://www.spinics.net/lists/linux-rdma/msg54558.html
    Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
    Signed-off-by: Parav Pandit
    Reviewed-by: Daniel Jurgens
    Reported-by: Roland Dreier
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Parav Pandit
     
  • [ Upstream commit 592e254502041f953e84d091eae2c68cba04c10b ]

    _calc_vm_trans() does not handle the situation when some of the passed
    flags are 0 (which can happen if these VM flags do not make sense for
    the architecture). Improve the _calc_vm_trans() macro to return 0 in
    such situation. Since all passed flags are constant, this does not add
    any runtime overhead.

    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
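
    The guard this commit describes can be sketched in plain C. The macro
    name CALC_FLAG_TRANS and the flag values in the note below are
    hypothetical, chosen only for illustration; this is not the kernel's
    definition, just one way to return 0 when either flag is 0:

```c
/*
 * Hypothetical sketch: translate flag bit1 in x into flag bit2.
 * Assumes bit1 and bit2 are each either 0 or a single set bit.
 * If either flag is 0 the result is 0; with constant operands the
 * compiler can fold the whole expression at build time.
 */
#define CALC_FLAG_TRANS(x, bit1, bit2)                            \
    ((!(bit1) || !(bit2)) ? 0UL :                                 \
     ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1))       \
                       : ((x) & (bit1)) / ((bit1) / (bit2))))
```

    With these illustrative values, CALC_FLAG_TRANS(0x5UL, 0x4UL, 0x10UL)
    evaluates to 0x10, while passing 0 for either flag evaluates to 0
    rather than reaching a division by zero.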
     
  • [ Upstream commit c962cff17dfa11f4a8227ac16de2b28aea3312e4 ]

    Revert: dc6db24d2476 ("x86/acpi: Set persistent cpuid nodeid mapping when booting")

    The mapping of "cpuid nodeid" is established at boot time via ACPI
    tables to keep associations of workqueues and other node related items
    consistent across cpu hotplug.

    But ACPI tables are unreliable, and failures with that boot time mapping
    have been reported on machines where the ACPI table and the physical
    information retrieved at actual hotplug are inconsistent.

    Revert the mapping implementation so it can be replaced with a less error
    prone approach.

    Signed-off-by: Dou Liyang
    Tested-by: Xiaolong Ye
    Cc: rjw@rjwysocki.net
    Cc: linux-acpi@vger.kernel.org
    Cc: guzheng1@huawei.com
    Cc: izumi.taku@jp.fujitsu.com
    Cc: lenb@kernel.org
    Link: http://lkml.kernel.org/r/1488528147-2279-2-git-send-email-douly.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dou Liyang
     
  • [ Upstream commit d7175373f2745ed4abe5b388d5aabd06304f801e ]

    The implicit transition time tells initiators the min time
    to wait before timing out a transition. We currently schedule
    the transition to occur in tg_pt_gp_implicit_trans_secs
    seconds so there is no room for delays. If
    core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
    needs to write out info to a remote file, then the initiator can
    easily time out the operation.

    Signed-off-by: Mike Christie
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mike Christie
     
  • [ Upstream commit 4cbe4dac82e423ecc9a0ba46af24a860853259f4 ]

    Some Hypervisors detach VFs from VMs by instantly causing an FLR event
    to be generated for a VF.

    In the mlx4 case, this will cause that VF's comm channel to be disabled
    before the VM has an opportunity to invoke the VF device's "shutdown"
    method.

    For such Hypervisors, there is a race condition between the VF's
    shutdown method and its internal-error detection/reset thread.

    The internal-error detection/reset thread (which runs every 5 seconds) also
    detects a disabled comm channel. If the internal-error detection/reset
    flow wins the race, we still get delays (while that flow tries repeatedly
    to detect comm-channel recovery).

    The cited commit fixed the command timeout problem when the
    internal-error detection/reset flow loses the race.

    This commit avoids the unneeded delays when the internal-error
    detection/reset flow wins.

    Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
    Signed-off-by: Jack Morgenstein
    Reported-by: Simon Xiao
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jack Morgenstein
     
  • commit 541b6fe63023f3059cf85d47ff2767a3e42a8e44 upstream.

    According to USB Specification 2.0 table 9-4,
    wMaxPacketSize is a bitfield. Endpoint's maxpacket
    is laid out in bits 10:0. For high-speed,
    high-bandwidth isochronous endpoints, bits 12:11
    contain a multiplier to tell us how many
    transactions we want to try per uframe.

    This means that if we want an isochronous endpoint
    to issue 3 transfers of 1024 bytes per uframe,
    wMaxPacketSize should contain the value:

    1024 | (2 << 11)

    or 5120 (0x1400). In order to make Host and
    Peripheral controller drivers' life easier, we're
    adding a helper which returns bits 12:11. Note that
    the helper does not check the endpoint type or the
    gadget's speed; that is left for drivers to handle.

    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

    Felipe Balbi
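
    The bit layout described in this commit message can be sketched as
    follows. The macro and function names are hypothetical and are not
    the helper the commit adds; only the layout (bits 10:0 for the packet
    size, bits 12:11 for the multiplier field) comes from the text above:

```c
#include <stdint.h>

/* Hypothetical names; the layout follows the commit message. */
#define EP_MAXP_SIZE_MASK  0x07ffu                        /* bits 10:0  */
#define EP_MAXP_MULT_SHIFT 11
#define EP_MAXP_MULT_MASK  (0x3u << EP_MAXP_MULT_SHIFT)   /* bits 12:11 */

/* Packet size in bytes, taken from bits 10:0. */
static inline unsigned int ep_maxp_size(uint16_t wmaxp)
{
    return wmaxp & EP_MAXP_SIZE_MASK;
}

/* Raw value of bits 12:11; no endpoint type or speed check is done. */
static inline unsigned int ep_maxp_mult_bits(uint16_t wmaxp)
{
    return (wmaxp & EP_MAXP_MULT_MASK) >> EP_MAXP_MULT_SHIFT;
}

/* Build a wMaxPacketSize value from a size and the bits 12:11 field. */
static inline uint16_t ep_maxp_encode(unsigned int size, unsigned int mult_bits)
{
    return (uint16_t)((size & EP_MAXP_SIZE_MASK) |
                      ((mult_bits & 0x3u) << EP_MAXP_MULT_SHIFT));
}
```

    Encoding a size of 1024 with a field value of 2 gives 0x1400 (5120),
    matching the worked value in the commit message.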
     
  • commit af3ff8045bbf3e32f1a448542e73abb4c8ceb6f1 upstream.

    Because the HMAC template didn't check that its underlying hash
    algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))"
    through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC
    being used without having been keyed, resulting in sha3_update() being
    called without sha3_init(), causing a stack buffer overflow.

    This is a very old bug, but it seems to have only started causing real
    problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3)
    because the innermost hash's state is ->import()ed from a zeroed buffer,
    and it just so happens that other hash algorithms are fine with that,
    but SHA-3 is not. However, there could be arch or hardware-dependent
    hash algorithms also affected; I couldn't test everything.

    Fix the bug by introducing a function crypto_shash_alg_has_setkey()
    which tests whether a shash algorithm is keyed. Then update the HMAC
    template to require that its underlying hash algorithm is unkeyed.

    Here is a reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>

    int main()
    {
            int algfd;
            struct sockaddr_alg addr = {
                    .salg_type = "hash",
                    .salg_name = "hmac(hmac(sha3-512-generic))",
            };
            char key[4096] = { 0 };

            algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
            bind(algfd, (const struct sockaddr *)&addr, sizeof(addr));
            /* Setting the key on the outer HMAC triggers the overflow. */
            setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
    }

    Here was the KASAN report from syzbot:

    BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341 [inline]
    BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
    Write of size 4096 at addr ffff8801cca07c40 by task syzkaller076574/3044

    CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_address_description+0x73/0x250 mm/kasan/report.c:252
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x25b/0x340 mm/kasan/report.c:409
    check_memory_region_inline mm/kasan/kasan.c:260 [inline]
    check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
    memcpy+0x37/0x50 mm/kasan/kasan.c:303
    memcpy include/linux/string.h:341 [inline]
    sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
    crypto_shash_update+0xcb/0x220 crypto/shash.c:109
    shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151
    crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
    hmac_finup+0x182/0x330 crypto/hmac.c:152
    crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
    shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172
    crypto_shash_digest+0xc4/0x120 crypto/shash.c:186
    hmac_setkey+0x36a/0x690 crypto/hmac.c:66
    crypto_shash_setkey+0xad/0x190 crypto/shash.c:64
    shash_async_setkey+0x47/0x60 crypto/shash.c:207
    crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200
    hash_setkey+0x40/0x90 crypto/algif_hash.c:446
    alg_setkey crypto/af_alg.c:221 [inline]
    alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254
    SYSC_setsockopt net/socket.c:1851 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1830
    entry_SYSCALL_64_fastpath+0x1f/0x96

    Reported-by: syzbot
    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
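
    The shape of the fix (test whether the underlying hash is keyed, then
    refuse to wrap it) can be sketched in userspace C. Everything below is
    a hypothetical model: the struct, the function names and the two
    example algorithms are assumptions for illustration and are not the
    kernel's crypto API:

```c
#include <errno.h>
#include <stddef.h>

typedef int (*setkey_fn)(const unsigned char *key, size_t len);

/* Placeholder installed for algorithms that take no key. */
static int no_setkey(const unsigned char *key, size_t len)
{
    (void)key; (void)len;
    return -ENOSYS;
}

/* Stand-in for a real keyed algorithm's setkey. */
static int keyed_setkey(const unsigned char *key, size_t len)
{
    (void)key; (void)len;
    return 0;
}

struct hash_alg_sketch {
    const char *name;
    setkey_fn setkey;
};

/* An algorithm is keyed if it has a real setkey, not the placeholder. */
static int alg_has_setkey(const struct hash_alg_sketch *alg)
{
    return alg->setkey != no_setkey;
}

/* The template only accepts unkeyed underlying hashes. */
static int hmac_template_check(const struct hash_alg_sketch *inner)
{
    return alg_has_setkey(inner) ? -EINVAL : 0;
}

static const struct hash_alg_sketch plain_hash = { "plain-hash", no_setkey };
static const struct hash_alg_sketch keyed_hash = { "keyed-hash", keyed_setkey };
```

    In this model, wrapping plain_hash is accepted and wrapping keyed_hash
    is rejected with -EINVAL, which is the behaviour that would stop a
    nested "hmac(hmac(...))" instantiation.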
     

16 Dec, 2017

2 commits

  • [ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ]

    Alexander Potapenko reported use of uninitialized memory [1]

    This happens when inserting a request socket into TCP ehash,
    in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.

    Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for
    mixed v4/v6 sockets")

    Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6
    ordering fix") missed the opportunity to get rid of
    hlist_nulls_add_tail_rcu() :

    Both UDP sockets and TCP/DCCP listeners no longer use
    __sk_nulls_add_node_rcu() for their hash insertion.

    Since all other sockets have a unique 4-tuple, the reuseport status
    has no special meaning, so we can always use hlist_nulls_add_head_rcu()
    for them and save a few cycles/instructions.

    [1]

    ==================================================================
    BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     
     __dump_stack lib/dump_stack.c:16
     dump_stack+0x185/0x1d0 lib/dump_stack.c:52
     kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
     __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
     __sk_nulls_add_node_rcu ./include/net/sock.h:684
     inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
     reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
     inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
     tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
     tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
     tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
     tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
     tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
     ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
     NF_HOOK ./include/linux/netfilter.h:248
     ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
     dst_input ./include/net/dst.h:477
     ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
     NF_HOOK ./include/linux/netfilter.h:248
     ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
     __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
     __netif_receive_skb net/core/dev.c:4336
     netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
     napi_skb_finish net/core/dev.c:4858
     napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
     e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
     e1000_clean_rx_irq+0x1492/0x1d30 drivers/net/ethernet/intel/e1000/e1000_main.c:4474
     e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
     napi_poll net/core/dev.c:5500
     net_rx_action+0x73c/0x1820 net/core/dev.c:5566
     __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
     invoke_softirq kernel/softirq.c:364
     irq_exit+0x203/0x240 kernel/softirq.c:405
     exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
     do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
     common_interrupt+0x86/0x86

    Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
    Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
    Signed-off-by: Eric Dumazet
    Reported-by: Alexander Potapenko
    Acked-by: Craig Gallek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit a4abd7a80addb4a9547f7dfc7812566b60ec505c ]

    The qmi_wwan minidriver supports a 'raw-ip' mode where frames are
    received without any ethernet header. This causes alignment issues
    because the skbs allocated by usbnet are "IP aligned".

    Fix by allowing minidrivers to disable the additional alignment
    offset. This is implemented using a per-device flag, since the same
    minidriver also supports 'ethernet' mode.

    Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
    Reported-and-tested-by: Jay Foster
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bjørn Mork
     

14 Dec, 2017

6 commits

  • [ Upstream commit 36a3d1dd4e16bcd0d2ddfb4a2ec7092f0ae0d931 ]

    If the amount of resources allocated to a gen_pool exceeds 2^32 then the
    avail atomic overflows and this causes problems when clients try and
    borrow resources from the pool. This is only expected to be an issue on
    64 bit systems.

    Add the <linux/atomic.h> header to pull in atomic_long* operations, so
    that 32 bit systems continue to use atomic32_t but 64 bit systems can
    use atomic64_t.

    Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com
    Signed-off-by: Stephen Bates
    Reviewed-by: Logan Gunthorpe
    Reviewed-by: Mathieu Desnoyers
    Reviewed-by: Daniel Mentz
    Cc: Jonathan Corbet
    Cc: Andrew Morton
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Stephen Bates
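
    The overflow this commit describes can be reproduced with ordinary
    fixed-width integers. This is a minimal sketch with hypothetical
    function names; it uses unsigned types so that wraparound is well
    defined in C, whereas the kernel counter is an atomic type:

```c
#include <stdint.h>

/* 32-bit availability counter: wraps once the total reaches 2^32. */
static uint32_t avail_add32(uint32_t avail, uint64_t size)
{
    return avail + (uint32_t)size;
}

/* 64-bit availability counter: holds the full total. */
static uint64_t avail_add64(uint64_t avail, uint64_t size)
{
    return avail + size;
}
```

    Adding two chunks of 2^31 bytes leaves the 32-bit counter at 0 while
    the 64-bit counter holds 2^32, which is why a long-sized counter is
    needed on 64 bit systems.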
     
  • [ Upstream commit 7807e086a2d1f69cc1a57958cac04fea79fc2112 ]

    gpmc_probe_onenand_child returns success even on gpmc_onenand_init
    failure. Fix that.

    Signed-off-by: Ladislav Michl
    Acked-by: Roger Quadros
    Signed-off-by: Tony Lindgren
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ladislav Michl
     
  • commit c0c379e2931b05facef538e53bf3b21f283d9a0b upstream.

    Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
    last pmdp_huge_get_and_clear_notify() user is gone.

    Let's drop the helper.

    Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Cc: Dave Hansen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    [jwang: adjust context for 4.9]
    Signed-off-by: Jack Wang
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     
  • commit af97a77bc01ce49a466f9d4c0125479e2e2230b6 upstream.

    Thanks to the scripts/leaking_addresses.pl script, it was found that
    some EFI values should not be readable by non-root users.

    So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to
    make this easier, and use it in other places at the same time.

    Reported-by: Linus Torvalds
    Tested-by: Dave Young
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Ard Biesheuvel
    Cc: H. Peter Anvin
    Cc: Matt Fleming
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • commit c2e8fbf908afd81ad502b567a6639598f92c9b9d upstream.

    The rps_resp buffer in ata_device is a DMA target, but it isn't
    explicitly cacheline aligned. Due to this, adjacent fields can be
    overwritten with stale data from memory on non-coherent architectures.
    As a result, the kernel is sometimes unable to communicate with an SATA
    device behind a SAS expander.

    Fix this by ensuring that the rps_resp buffer is cacheline aligned.

    This issue is similar to that fixed by Commit 84bda12af31f93 ("libata:
    align ap->sector_buf") and Commit 4ee34ea3a12396f35b26 ("libata: Align
    ata_device's id on a cacheline").

    Signed-off-by: Huacai Chen
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Huacai Chen
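
    The alignment issue can be illustrated with C11 alignas. The struct
    names, field names, buffer size and the 64-byte cacheline size are all
    assumptions made for this sketch; they are not taken from struct
    ata_device:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE 64   /* assumption: 64-byte cachelines */

/* DMA buffer shares a cacheline with the neighbouring fields. */
struct dev_unaligned_sketch {
    uint32_t flags;
    uint8_t  resp[32];         /* DMA target */
    uint32_t state;
};

/*
 * DMA buffer starts on its own cacheline. Placing it last means the
 * struct's tail padding keeps other fields out of that line.
 */
struct dev_aligned_sketch {
    uint32_t flags;
    uint32_t state;
    alignas(ASSUMED_CACHELINE) uint8_t resp[32];   /* DMA target */
};
```

    With this layout the buffer offset and the struct size are both
    multiples of the assumed cacheline size, so a cache invalidate or
    writeback covering the buffer cannot touch flags or state.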
     
  • commit 860dd4424f344400b491b212ee4acb3a358ba9d9 upstream.

    Provide the dummy version of dma_get_cache_alignment that always returns
    1 even if CONFIG_HAS_DMA is not set, so that drivers and subsystems can
    use it without ifdefs.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     

10 Dec, 2017

6 commits

  • commit 81cf4a45360f70528f1f64ba018d61cb5767249a upstream.

    As most BOS descriptors are longer than their header
    'struct usb_dev_cap_header', comparing solely against the header length
    is not sufficient to avoid out-of-bounds access to BOS descriptors.

    This patch adds descriptor type specific length check in
    usb_get_bos_descriptor() to fix the issue.

    Signed-off-by: Masakazu Mokuno
    Signed-off-by: Greg Kroah-Hartman

    Masakazu Mokuno
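
    A type-specific length check of the kind this commit describes might
    look like the sketch below. The function names, the capability type
    codes and the minimum sizes are assumptions for illustration, not
    values taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_HEADER_LEN 3   /* bLength, bDescriptorType, bDevCapabilityType */

/* Assumed capability type codes and minimum sizes, for illustration. */
enum { CAP_EXT = 2, CAP_SS = 3, CAP_CONTAINER_ID = 4 };

static size_t cap_min_len(uint8_t cap_type)
{
    switch (cap_type) {
    case CAP_EXT:          return 7;
    case CAP_SS:           return 10;
    case CAP_CONTAINER_ID: return 20;
    default:               return CAP_HEADER_LEN;
    }
}

/* Returns 1 if the capability at buf can be accessed safely, else 0. */
static int cap_is_valid(const uint8_t *buf, size_t remaining)
{
    if (remaining < CAP_HEADER_LEN)
        return 0;

    uint8_t len = buf[0];
    uint8_t type = buf[2];

    /* Generic checks: the header fits and the descriptor is in bounds. */
    if (len < CAP_HEADER_LEN || len > remaining)
        return 0;

    /* Type-specific check: long enough for the structure it claims. */
    return len >= cap_min_len(type);
}
```

    A descriptor that claims a longer capability type but reports only
    the header length passes the generic checks and is rejected by the
    type-specific one.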
     
  • commit a009e975da5c7d42a7f5eaadc54946eb5f76c9af upstream.

    The dma_fence.error field (formerly known as dma_fence.status) is an
    optional field that may be set by drivers before calling
    dma_fence_signal(). The field can be used to indicate that the fence
    completed with an error rather than successfully, and is visible to other
    consumers of the fence and to userspace via sync_file.

    This patch renames the field from status to error so that its meaning is
    hopefully more clear (and distinct from dma_fence_get_status() which is
    a composite between the error state and signal state) and adds a helper
    that validates the preconditions of when it is suitable to adjust the
    error field.

    Signed-off-by: Chris Wilson
    Reviewed-by: Daniel Vetter
    Reviewed-by: Sumit Semwal
    Signed-off-by: Sumit Semwal
    Link: http://patchwork.freedesktop.org/patch/msgid/20170104141222.6992-3-chris@chris-wilson.co.uk
    [s/dma_fence/fence/g - gregkh]
    Cc: Jisheng Zhang
    Signed-off-by: Greg Kroah-Hartman

    Chris Wilson
     
  • commit d6c99f4bf093a58d3ab47caaec74b81f18bc4e3f upstream.

    The fence->status is an optional field that is only valid once the fence
    has been signaled. (Driver may fill the fence->status with an error code
    prior to calling dma_fence_signal().) Given the restriction upon its
    validity, wrap querying of the fence->status into a helper
    dma_fence_get_status().

    Signed-off-by: Chris Wilson
    Reviewed-by: Daniel Vetter
    Reviewed-by: Sumit Semwal
    Signed-off-by: Sumit Semwal
    Link: http://patchwork.freedesktop.org/patch/msgid/20170104141222.6992-2-chris@chris-wilson.co.uk
    [s/dma_fence/fence/g - gregkh]
    Cc: Jisheng Zhang
    Signed-off-by: Greg Kroah-Hartman

    Chris Wilson
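
    The helper semantics described in this commit and the preceding one
    can be modelled as below. The struct and function names are
    hypothetical, and the return convention (0 while unsignaled, 1 on
    success, a negative error code otherwise) is an assumption of this
    sketch rather than a quote from the patch:

```c
#include <stdbool.h>

struct fence_sketch {
    bool signaled;
    int  error;     /* 0, or a negative error code set before signaling */
};

/* The error is only meaningful once the fence has been signaled. */
static int fence_get_status(const struct fence_sketch *f)
{
    if (!f->signaled)
        return 0;
    return f->error < 0 ? f->error : 1;
}

/*
 * Validate the preconditions before recording an error: the fence must
 * not be signaled yet and the code must be negative.
 * Returns 0 on success, -1 if a precondition is violated.
 */
static int fence_set_error(struct fence_sketch *f, int error)
{
    if (f->signaled || error >= 0)
        return -1;
    f->error = error;
    return 0;
}
```

    Routing every query through one helper keeps callers from reading the
    error field before the fence has signaled.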
     
  • commit 8111477663813caa1a4469cfe6afaae36cd04513 upstream.

    Often we have the task of comparing two seqno known to be on the same
    context, so provide a common __dma_fence_is_later().

    Signed-off-by: Chris Wilson
    Cc: Sumit Semwal
    Cc: Sean Paul
    Cc: Gustavo Padovan
    Reviewed-by: Sean Paul
    Signed-off-by: Gustavo Padovan
    Link: http://patchwork.freedesktop.org/patch/msgid/20170629125930.821-1-chris@chris-wilson.co.uk
    [renamed to __fence_is_later() - gregkh]
    Cc: Jisheng Zhang
    Signed-off-by: Greg Kroah-Hartman

    Chris Wilson
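
    Comparing two sequence numbers on the same context is usually done
    with a wraparound-safe subtraction. The sketch below shows that common
    technique under a hypothetical name, assuming 32-bit sequence numbers;
    it is not presented as the body of the helper this commit adds:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if seqno a is later than seqno b on the same context.
 * The unsigned subtraction wraps modulo 2^32 and the signed view of the
 * difference gives the direction, so the comparison stays correct
 * across a counter wrap. The conversion to int32_t relies on the usual
 * two's-complement behaviour of common compilers.
 */
static inline bool seqno_is_later(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}
```

    A plain a > b comparison would report the wrong order once the
    counter wraps from 0xffffffff back to 0.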
     
  • [ Upstream commit 0911d0041c22922228ca52a977d7b0b0159fee4b ]

    Some ->page_mkwrite handlers may return VM_FAULT_RETRY as their return
    code (GFS2 or Lustre can definitely do this). However, VM_FAULT_RETRY
    from ->page_mkwrite is completely unhandled by the mm code and results
    in locking the page and mapping it writably, which is definitely not
    what the caller wanted.

    Fix Lustre and block_page_mkwrite_ret() used by other filesystems
    (notably GFS2) to return VM_FAULT_NOPAGE instead which results in
    bailing out from the fault code, the CPU then retries the access, and we
    fault again effectively doing what the handler wanted.

    Link: http://lkml.kernel.org/r/20170203150729.15863-1-jack@suse.cz
    Signed-off-by: Jan Kara
    Reported-by: Al Viro
    Reviewed-by: Jinshan Xiong
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • [ Upstream commit 475113d937adfd150eb82b5e2c5507125a68e7af ]

    It's possible to set up PEBS events to get only errors and not
    any data, like on SNB-X (model 45) and IVB-EP (model 62)
    via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

    This leads to a soft lock up, because the error path of the
    intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
    for error PEBS interrupts, so in case you're getting ONLY
    errors you don't have a way to stop the event when it's over
    the max_samples_per_tick limit:

    NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
    ...
    RIP: 0010:[] [] smp_call_function_single+0xe2/0x140
    ...
    Call Trace:
    ? trace_hardirqs_on_caller+0xf5/0x1b0
    ? perf_cgroup_attach+0x70/0x70
    perf_install_in_context+0x199/0x1b0
    ? ctx_resched+0x90/0x90
    SYSC_perf_event_open+0x641/0xf90
    SyS_perf_event_open+0x9/0x10
    do_syscall_64+0x6c/0x1f0
    entry_SYSCALL64_slow_path+0x25/0x25

    Add perf_event_account_interrupt() which does the interrupt
    and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
    error path.

    We keep the pending_kill and pending_wakeup logic only in the
    __perf_event_overflow() path, because they make sense only if
    there's any data to deliver.

    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     

05 Dec, 2017

2 commits

  • commit cf33c1ee5254c6a430bc1538232b49c3ea13e613 upstream.

    This patch tries to fix the build error on MIPS. The reason is that MIPS
    has already defined the PTR macro, which conflicts with the PTR macro
    in include/uapi/linux/bcache.h.

    [fixed by mlyle: corrected a line-length issue]

    Signed-off-by: Huacai Chen
    Reviewed-by: Michael Lyle
    Signed-off-by: Michael Lyle
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Huacai Chen
     
  • commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.

    Patch series "device-dax: fix unaligned munmap handling"

    When device-dax is operating in huge-page mode we want it to behave like
    hugetlbfs and fail attempts to split vmas into unaligned ranges. It
    would be messy to teach the munmap path about device-dax alignment
    constraints in the same (hstate) way that hugetlbfs communicates this
    constraint. Instead, these patches introduce a new ->split() vm
    operation.

    This patch (of 2):

    The device-dax interface has similar constraints as hugetlbfs in that it
    requires the munmap path to unmap in huge page aligned units. Rather
    than add more custom vma handling code in __split_vma() introduce a new
    vm operation to perform this vma specific check.

    Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
    Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
    Signed-off-by: Dan Williams
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
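
    The vma-specific check that a ->split() operation would perform for a
    huge-page mapping can be sketched as a simple alignment test. The
    function name and signature are hypothetical and are not the
    callback's real prototype; the alignment is assumed to be a power of
    two:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Refuse to split a mapping at an address that is not aligned to the
 * mapping's huge-page size. align must be a power of two.
 * Returns 0 if the split is allowed, -EINVAL otherwise.
 */
static int split_allowed(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) ? -EINVAL : 0;
}
```

    With a 2 MiB alignment, a split at 0x200000 is allowed, while a split
    at 0x201000 fails with -EINVAL, mirroring how hugetlbfs rejects
    unaligned ranges.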
     

30 Nov, 2017

1 commit


24 Nov, 2017

2 commits

  • commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.

    In reset_deferred_meminit() we determine number of pages that must not
    be deferred. We initialize pages for at least 2G of memory, but also
    pages for reserved memory in this node.

    The reserved memory is determined in this function:
    memblock_reserved_memory_within(), which operates over physical
    addresses, and returns size in bytes. However, reset_deferred_meminit()
    assumes that this function operates with pfns, and returns page
    count.

    The result is that in the best case machine boots slower than expected
    due to initializing more pages than needed in single thread, and in the
    worst case panics because fewer than needed pages are initialized early.

    Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
    Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
    Signed-off-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Pavel Tatashin
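
    The unit mismatch this commit describes comes down to converting
    between page frame numbers and physical addresses at the call
    boundary. The sketch below uses hypothetical helper names and assumes
    4 KiB pages; it does not reproduce the kernel's code:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Page frame number to physical address (bytes). */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Size in bytes to a page count, rounding down. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}
```

    Under these assumptions, 1 GiB of reserved memory is 262144 pages;
    treating the byte count as a page count would overestimate it by a
    factor of 4096.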
     
  • [ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ]

    When ipvs runs in two different network namespaces on the same host and
    one ipvs forwards network traffic to the ipvs in the other namespace, the
    'ipvs_property' flag makes the second ipvs have no effect. So we should
    clear 'ipvs_property' when the SKB's network namespace changes.

    Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()")
    Signed-off-by: Ye Yin
    Signed-off-by: Wei Zhou
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ye Yin
     

21 Nov, 2017

4 commits

  • [ Upstream commit feb0869d90e51ce8b6fd8a46588465b1b5a26d09 ]

    Consistently use types from linux/types.h to fix the following
    linux/rds.h userspace compilation errors:

    /usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
    uint8_t name[32];
    /usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
    uint64_t value;
    /usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
    uint64_t next_tx_seq;
    /usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
    uint64_t next_rx_seq;
    /usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
    uint8_t transport[TRANSNAMSIZ]; /* null term ascii */
    /usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
    uint8_t flags;
    /usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
    uint64_t seq;
    /usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
    uint32_t len;
    /usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
    uint8_t flags;
    /usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
    uint32_t sndbuf;
    /usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
    uint32_t rcvbuf;
    /usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
    uint64_t inum;
    /usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
    uint64_t hdr_rem;
    /usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
    uint64_t data_rem;
    /usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
    uint32_t last_sent_nxt;
    /usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
    uint32_t last_expected_una;
    /usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
    uint32_t last_seen_una;
    /usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
    uint8_t src_gid[RDS_IB_GID_LEN];
    /usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
    uint8_t dst_gid[RDS_IB_GID_LEN];
    /usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
    uint32_t max_send_wr;
    /usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
    uint32_t max_recv_wr;
    /usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
    uint32_t max_send_sge;
    /usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
    uint32_t rdma_mr_max;
    /usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
    uint32_t rdma_mr_size;
    /usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
    typedef uint64_t rds_rdma_cookie_t;
    /usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
    uint64_t addr;
    /usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
    uint64_t bytes;
    /usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
    uint64_t cookie_addr;
    /usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
    uint64_t flags;
    /usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
    uint64_t cookie_addr;
    /usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
    uint64_t flags;
    /usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
    uint64_t flags;
    /usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
    uint64_t local_vec_addr;
    /usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
    uint64_t nr_local;
    /usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
    uint64_t flags;
    /usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
    uint64_t user_token;
    /usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
    uint64_t local_addr;
    /usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
    uint64_t remote_addr;
    /usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
    uint64_t compare;
    /usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
    uint64_t swap;
    /usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
    uint64_t add;
    /usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
    uint64_t compare;
    /usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
    uint64_t swap;
    /usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
    uint64_t compare_mask;
    /usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
    uint64_t swap_mask;
    /usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
    uint64_t add;
    /usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
    uint64_t nocarry_mask;
    /usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
    uint64_t flags;
    /usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
    uint64_t user_token;
    /usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
    uint64_t user_token;
    /usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
    int32_t status;

    Signed-off-by: Dmitry V. Levin
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dmitry V. Levin
     
  • [ Upstream commit 1786dbf3702e33ce3afd2d3dbe630bd04b1d2e58 ]

    On the kernel side, sockaddr_storage is #define'd to
    __kernel_sockaddr_storage. Replacing struct sockaddr_storage with
    struct __kernel_sockaddr_storage defined by <linux/socket.h> fixes
    the following linux/rds.h userspace compilation error:

    /usr/include/linux/rds.h:226:26: error: field 'dest_addr' has incomplete type
    struct sockaddr_storage dest_addr;
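
A minimal sketch of the idea, assuming a Linux userspace build where the kernel UAPI headers are installed. The struct `rds_like_args` and the helper `dest_addr_size()` are hypothetical and are not taken from linux/rds.h; they only show that a field of type struct __kernel_sockaddr_storage is a complete type once <linux/socket.h> is included, without needing <sys/socket.h>.

```c
#include <stddef.h>
#include <linux/socket.h>   /* provides struct __kernel_sockaddr_storage */

/* Hypothetical UAPI-style argument block, loosely modelled on the kind of
 * struct that embeds a destination address. Not the real rds.h layout. */
struct rds_like_args {
    struct __kernel_sockaddr_storage dest_addr;  /* complete type here */
    unsigned long long flags;
};

/* Size in bytes of the embedded address storage. */
size_t dest_addr_size(void)
{
    return sizeof(((struct rds_like_args *)0)->dest_addr);
}
```

With plain struct sockaddr_storage and no libc socket header included, the same field declaration would fail with the "incomplete type" error quoted above.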

    Signed-off-by: Dmitry V. Levin
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dmitry V. Levin
     
  • This reverts commit ad50561ba7a664bc581826c9d57d137fcf17bfa5.

    There was a mixup with the commit messages for two upstream commits
    that have the same subject line.

    This revert will be followed by the two commits with proper commit
    messages.

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     
  • [ Upstream commit d97556c8012015901a3ce77f46960078139cd79d ]

    We need to also have OFFPULLUDENABLE bit set to use the off mode pull values.
    Otherwise the line is pulled down internally if no external pull exists.

    This has some documentation at:

    http://processors.wiki.ti.com/index.php/Optimizing_OMAP35x_and_AM/DM37x_OFF_mode_PAD_configuration

    Note that the value is still glitchy during off mode transitions as documented
    in spz319f.pdf "Advisory 1.45". It's best to use external pulls instead of
    relying on the internal ones for off mode and even then anything pulled up
    will get driven down momentarily on off mode restore for GPIO banks other
    than bank1.

    Signed-off-by: Tony Lindgren
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tony Lindgren
     

18 Nov, 2017

6 commits

  • commit 59b6986dbfcdab96a971f9663221849de79a7556 upstream.

    Allocate a task management request structure for all task management
    requests, including task reassignment. This change prevents the
    se_tmr->response assignment from dereferencing an uninitialized
    se_tmr pointer.

    Reported-by: Moshe David
    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Cc: Moshe David
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
     
  • commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream.

    This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.

    It was not a good idea. The custom hash table was a much better
    fit for this purpose.

    A fast lookup is not essential, in fact for most cases there is no lookup
    at all because original tuple is not taken and can be used as-is.
    What needs to be fast is insertion and deletion.

    rhlist removal, however, requires a rhlist walk.
    We can have thousands of entries in such a list if source port/addresses
    are reused for multiple flows. If this happens, removal requests are so
    expensive that deletions of a few thousand flows can take several
    seconds(!).

    The advantages that we got from rhashtable are:
    1) table auto-sizing
    2) multiple locks

    1) would be nice to have, but it is not essential as we have at
    most one lookup per new flow, so even a million flows in the bysource
    table are not a problem compared to current deletion cost.
    2) is easy to add to custom hash table.

    I tried to add hlist_node to rhlist to speed up rhltable_remove but this
    isn't doable without changing semantics. rhltable_remove_fast will
    check that the to-be-deleted object is part of the table and that
    requires a list walk that we want to avoid.

    Furthermore, using hlist_node increases size of struct rhlist_head, which
    in turn increases nf_conn size.
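
The cost difference described above can be sketched in userspace C. This is an illustration under stated assumptions, not the rhashtable or conntrack code: `struct snode`, `struct hnode`, and all three functions are hypothetical names. The first list must walk the chain to find the victim's predecessor; the second stores a back-pointer in each node, in the style of the kernel's hlist, so unlinking needs no walk.

```c
#include <stddef.h>

/* Singly linked chain: removal walks from the head to find the victim,
 * so the cost grows with the chain length. */
struct snode { struct snode *next; };

/* Returns the number of nodes visited up to and including the victim,
 * or -1 if the victim is not on the chain. */
int slist_remove(struct snode **head, struct snode *victim)
{
    int visited = 0;

    for (struct snode **pp = head; *pp; pp = &(*pp)->next) {
        visited++;
        if (*pp == victim) {
            *pp = victim->next;
            victim->next = NULL;
            return visited;
        }
    }
    return -1;
}

/* hlist-style node: pprev points at whichever pointer refers to this
 * node (the head or the previous node's next field). */
struct hnode { struct hnode *next, **pprev; };

void hlist_add_head_sketch(struct hnode **head, struct hnode *n)
{
    n->next = *head;
    if (*head)
        (*head)->pprev = &n->next;
    *head = n;
    n->pprev = head;
}

/* Constant-time unlink: no walk, but also no check that the node is
 * actually a member of any particular chain. */
void hlist_del_sketch(struct hnode *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

The trade-off matches the one the message names: the back-pointer costs one extra pointer per node, and the constant-time unlink cannot verify membership.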

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
    Reported-by: Ivan Babrou
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 06f877d613be3621604c2520ec0351d9fbdca15f ]

    In my first attempt to fix the lockdep splat, I forgot we could
    enter inet_csk_route_req() with a freshly allocated request socket,
    for which refcount has not yet been elevated, due to complex
    SLAB_TYPESAFE_BY_RCU rules.

    We either are in rcu_read_lock() section _or_ we own a refcount on the
    request.

    Correct RCU verb to use here is rcu_dereference_check(), although it is
    not possible to prove we actually own a reference on a shared
    refcount :/

    In v2, I added an ireq_opt_deref() helper and used it in three places, to
    fix other possible splats.

    [ 49.844590] lockdep_rcu_suspicious+0xea/0xf3
    [ 49.846487] inet_csk_route_req+0x53/0x14d
    [ 49.848334] tcp_v4_route_req+0xe/0x10
    [ 49.850174] tcp_conn_request+0x31c/0x6a0
    [ 49.851992] ? __lock_acquire+0x614/0x822
    [ 49.854015] tcp_v4_conn_request+0x5a/0x79
    [ 49.855957] ? tcp_v4_conn_request+0x5a/0x79
    [ 49.858052] tcp_rcv_state_process+0x98/0xdcc
    [ 49.859990] ? sk_filter_trim_cap+0x2f6/0x307
    [ 49.862085] tcp_v4_do_rcv+0xfc/0x145
    [ 49.864055] ? tcp_v4_do_rcv+0xfc/0x145
    [ 49.866173] tcp_v4_rcv+0x5ab/0xaf9
    [ 49.868029] ip_local_deliver_finish+0x1af/0x2e7
    [ 49.870064] ip_local_deliver+0x1b2/0x1c5
    [ 49.871775] ? inet_del_offload+0x45/0x45
    [ 49.873916] ip_rcv_finish+0x3f7/0x471
    [ 49.875476] ip_rcv+0x3f1/0x42f
    [ 49.876991] ? ip_local_deliver_finish+0x2e7/0x2e7
    [ 49.878791] __netif_receive_skb_core+0x6d3/0x950
    [ 49.880701] ? process_backlog+0x7e/0x216
    [ 49.882589] __netif_receive_skb+0x1d/0x5e
    [ 49.884122] process_backlog+0x10c/0x216
    [ 49.885812] net_rx_action+0x147/0x3df

    Fixes: a6ca7abe53633 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
    Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races")
    Signed-off-by: Eric Dumazet
    Reported-by: kernel test robot
    Reported-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit c92e8c02fe664155ac4234516e32544bec0f113d ]

    syzkaller found another bug in DCCP/TCP stacks [1]

    For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
    ireq->pktopts race"), we need to make sure we do not access
    ireq->opt unless we own the request sock.

    Note the opt field is renamed to ireq_opt to ease grep games.

    [1]
    BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
    Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295

    CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    print_address_description+0x73/0x250 mm/kasan/report.c:252
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x25b/0x340 mm/kasan/report.c:409
    __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
    ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
    tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
    tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
    tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
    __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
    tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
    tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
    tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
    tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
    ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
    dst_input include/net/dst.h:464 [inline]
    ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
    __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
    __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
    netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
    netif_receive_skb+0xae/0x390 net/core/dev.c:4611
    tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
    tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
    tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
    call_write_iter include/linux/fs.h:1770 [inline]
    new_sync_write fs/read_write.c:468 [inline]
    __vfs_write+0x68a/0x970 fs/read_write.c:481
    vfs_write+0x18f/0x510 fs/read_write.c:543
    SYSC_write fs/read_write.c:588 [inline]
    SyS_write+0xef/0x220 fs/read_write.c:580
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x40c341
    RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
    RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
    RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
    R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000

    Allocated by task 3295:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
    save_stack+0x43/0xd0 mm/kasan/kasan.c:447
    set_track mm/kasan/kasan.c:459 [inline]
    kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
    __do_kmalloc mm/slab.c:3725 [inline]
    __kmalloc+0x162/0x760 mm/slab.c:3734
    kmalloc include/linux/slab.h:498 [inline]
    tcp_v4_save_options include/net/tcp.h:1962 [inline]
    tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
    tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
    tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
    tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
    tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
    tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
    ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
    dst_input include/net/dst.h:464 [inline]
    ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
    __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
    __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
    netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
    netif_receive_skb+0xae/0x390 net/core/dev.c:4611
    tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
    tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
    tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
    call_write_iter include/linux/fs.h:1770 [inline]
    new_sync_write fs/read_write.c:468 [inline]
    __vfs_write+0x68a/0x970 fs/read_write.c:481
    vfs_write+0x18f/0x510 fs/read_write.c:543
    SYSC_write fs/read_write.c:588 [inline]
    SyS_write+0xef/0x220 fs/read_write.c:580
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Freed by task 3306:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
    save_stack+0x43/0xd0 mm/kasan/kasan.c:447
    set_track mm/kasan/kasan.c:459 [inline]
    kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
    __cache_free mm/slab.c:3503 [inline]
    kfree+0xca/0x250 mm/slab.c:3820
    inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
    __sk_destruct+0xfd/0x910 net/core/sock.c:1560
    sk_destruct+0x47/0x80 net/core/sock.c:1595
    __sk_free+0x57/0x230 net/core/sock.c:1603
    sk_free+0x2a/0x40 net/core/sock.c:1614
    sock_put include/net/sock.h:1652 [inline]
    inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
    tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
    tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
    ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
    dst_input include/net/dst.h:464 [inline]
    ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
    NF_HOOK include/linux/netfilter.h:249 [inline]
    ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
    __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
    __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
    netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
    netif_receive_skb+0xae/0x390 net/core/dev.c:4611
    tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
    tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
    tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
    call_write_iter include/linux/fs.h:1770 [inline]
    new_sync_write fs/read_write.c:468 [inline]
    __vfs_write+0x68a/0x970 fs/read_write.c:481
    vfs_write+0x18f/0x510 fs/read_write.c:543
    SYSC_write fs/read_write.c:588 [inline]
    SyS_write+0xef/0x220 fs/read_write.c:580
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 0ad646c81b2182f7fa67ec0c8c825e0ee165696d ]

    register_netdevice() could fail early when we have an invalid
    dev name, in which case ->ndo_uninit() is not called. For tun
    device, this is a problem because a timer etc. are already
    initialized and it expects ->ndo_uninit() to clean them up.

    We could move these initializations into a ->ndo_init() so
    that register_netdevice() knows better, however this is still
    complicated due to the logic in tun_detach().

    Therefore, I choose to just call dev_get_valid_name() before
    register_netdevice(), which is quicker and much easier to audit.
    And for this specific case, it is already enough.
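
The ordering argument can be modelled in plain C. This is a userspace sketch, not the tun driver: `struct fake_dev`, `fake_name_valid()`, and `fake_dev_create()` are hypothetical, and the name rules here are simplified assumptions rather than the kernel's actual checks. The point is only that validating the name first means a rejected name leaves no initialized state behind that a cleanup hook would have to undo.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct fake_dev {
    char name[16];
    bool timer_armed;  /* stands in for state only an uninit hook tears down */
};

/* Simplified, assumed name rules: non-empty, fits the buffer, and
 * contains neither '/' nor a space. */
bool fake_name_valid(const char *name)
{
    size_t len = strlen(name);

    if (len == 0 || len >= sizeof(((struct fake_dev *)0)->name))
        return false;
    return strchr(name, '/') == NULL && strchr(name, ' ') == NULL;
}

/* Returns 0 on success, -1 on a bad name. On failure nothing has been
 * initialized, so there is nothing to clean up. */
int fake_dev_create(struct fake_dev *dev, const char *name)
{
    if (!fake_name_valid(name))
        return -1;             /* fail before touching any state */

    strcpy(dev->name, name);   /* safe: length was checked above */
    dev->timer_armed = true;   /* initialization only after validation */
    return 0;
}
```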

    Fixes: 96442e42429e ("tuntap: choose the txq based on rxq")
    Reported-by: Dmitry Alexeev
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 2b7cda9c35d3b940eb9ce74b30bbd5eb30db493d ]

    Based on SNMP values provided by Roman, Yuchung made the observation
    that some crashes in tcp_sacktag_walk() might be caused by MTU probing.

    Looking at tcp_mtu_probe(), I found that when a new skb was placed
    in front of the write queue, we were not updating tcp highest sack.

    If one skb is freed because all its content was copied to the new skb
    (for MTU probing), then tp->highest_sack could point to a now freed skb.

    Bad things would then happen, including infinite loops.

    This patch renames tcp_highest_sack_combine() and uses it
    from tcp_mtu_probe() to fix the bug.

    Note that I also removed one test against tp->sacked_out,
    since we want to replace tp->highest_sack regardless of whatever
    condition, since keeping a stale pointer to freed skb is a recipe
    for disaster.
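
The stale-pointer hazard can be shown with a small userspace model. This is a sketch under assumptions, not the TCP code: `struct buf`, `struct queue`, `cache_replace()`, and `probe_absorb_head()` are hypothetical names, and the queue is reduced to a singly linked list with one cached pointer standing in for tp->highest_sack.

```c
#include <stdlib.h>

struct buf { struct buf *next; int len; };

struct queue {
    struct buf *head;
    struct buf *highest;  /* cached pointer into the queue */
};

/* If the cache refers to 'old', repoint it at 'new_buf'. There is no
 * extra condition: a stale pointer is never acceptable. */
void cache_replace(struct queue *q, struct buf *old, struct buf *new_buf)
{
    if (q->highest == old)
        q->highest = new_buf;
}

/* Copy the head buffer's content into a newly allocated probe buffer
 * placed at the front of the queue, then free the old head.
 * Returns the probe, or NULL if the queue is empty or malloc fails. */
struct buf *probe_absorb_head(struct queue *q)
{
    struct buf *old = q->head;
    struct buf *probe;

    if (!old)
        return NULL;
    probe = malloc(sizeof(*probe));
    if (!probe)
        return NULL;

    probe->len = old->len;
    probe->next = old->next;
    q->head = probe;

    cache_replace(q, old, probe);  /* must run before the free below */
    free(old);
    return probe;
}
```

Dropping the cache_replace() call would leave q->highest pointing at freed memory, which is the failure mode the message describes.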

    Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
    Signed-off-by: Eric Dumazet
    Reported-by: Alexei Starovoitov
    Reported-by: Roman Gushchin
    Reported-by: Oleksandr Natalenko
    Acked-by: Alexei Starovoitov
    Acked-by: Neal Cardwell
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

15 Nov, 2017

2 commits

  • commit 3510c7aa069aa83a2de6dab2b41401a198317bdc upstream.

    The recent fix for adding rwsem nesting annotation was using the given
    "hop" argument as the lock subclass key. Although the idea itself
    works, it may trigger a kernel warning like:
    BUG: looking up invalid subclass: 8
    ....
    since the lockdep has a smaller number of subclasses (8) than we
    currently allow for the hops there (10).

    The current definition is merely a sanity check for avoiding overly
    deep delivery paths, and 8 hops are already enough. So, as a
    quick fix, just set the max hops to the same value as the max lockdep
    subclasses.
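
A minimal sketch of the constraint, assuming only the two numbers quoted above (8 lockdep subclasses, formerly 10 hops). The macro names and `subclass_for_hop()` are hypothetical and are not the ALSA sequencer code. Tying the hop limit to the subclass count guarantees that any hop value accepted by the depth check is also a valid subclass index.

```c
/* Value quoted in the message above; the name is an assumption. */
#define FAKE_MAX_LOCKDEP_SUBCLASSES 8

/* The hop limit follows the subclass limit instead of being larger. */
#define FAKE_MAX_HOPS FAKE_MAX_LOCKDEP_SUBCLASSES

/* Returns the lock subclass to use for a delivery at depth 'hop',
 * or -1 if the delivery path is too deep to continue. */
int subclass_for_hop(int hop)
{
    if (hop < 0 || hop >= FAKE_MAX_HOPS)
        return -1;
    return hop;  /* always within 0 .. FAKE_MAX_LOCKDEP_SUBCLASSES - 1 */
}
```

With a hop limit of 10, depths 8 and 9 would pass the depth check but fall outside the valid subclass range, which corresponds to the "invalid subclass: 8" warning.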

    Fixes: 1f20f9ff57ca ("ALSA: seq: Fix nested rwsem annotation for lockdep splat")
    Reported-by: syzbot
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Takashi Iwai
     
  • commit 7c4788950ba5922fde976d80b72baf46f14dee8d upstream.

    I recently encountered wreckage because access_ok() was used where it
    should not be. Add an explicit WARN when access_ok() is used wrongly.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra