31 Jan, 2019

40 commits

  • commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.

    The function truncate_node frees the page with f2fs_put_page. However,
    the page index is read after that. So, the patch reads the index before
    freeing the page.
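
    The ordering issue can be illustrated with a minimal user-space sketch.
    The toy_page type and the toy_*() helpers are hypothetical stand-ins,
    not the f2fs API; the only point is that a field needed later is copied
    out before the reference is dropped.

    ```c
    #include <stdlib.h>

    /* Hypothetical stand-in for a reference-counted node page. */
    struct toy_page {
        unsigned long index;
        int refcount;
    };

    struct toy_page *toy_get_page(unsigned long index)
    {
        struct toy_page *page = malloc(sizeof(*page));

        if (!page)
            return NULL;
        page->index = index;
        page->refcount = 1;
        return page;
    }

    /* Drops one reference; the page is freed when the count reaches zero. */
    void toy_put_page(struct toy_page *page)
    {
        if (--page->refcount == 0)
            free(page);
    }

    /* Returns the index of the truncated page, read before the final put. */
    unsigned long toy_truncate(struct toy_page *page)
    {
        unsigned long index = page->index; /* read while still valid */

        toy_put_page(page);                /* page may be freed here */
        return index;                      /* uses the saved copy only */
    }
    ```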

    Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
    Cc:
    Signed-off-by: Pan Bian
    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Pan Bian
     
  • commit 867cefb4cb1012f42cada1c7d1f35ac8dd276071 upstream.

    Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
    sched_clock() interface") broke Xen guest time handling across
    migration:

    [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
    [ 187.251137] OOM killer disabled.
    [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
    [ 187.252299] suspending xenstore...
    [ 187.266987] xen:grant_table: Grant tables using version 1 layout
    [18446743811.706476] OOM killer enabled.
    [18446743811.706478] Restarting tasks ... done.
    [18446743811.720505] Setting capacity to 16777216

    Fix that by setting xen_sched_clock_offset at resume time to ensure a
    monotonic clock value.
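
    The timestamps in the log above sit about 262 seconds below 2^64 ns
    (roughly 18446744073.7 s), which is what an unsigned 64-bit subtraction
    produces when the offset is larger than the raw clock. A minimal sketch
    of the idea follows, assuming a simplified model in which the guest
    clock is the raw clock minus an offset; all toy_* names are
    hypothetical and this is not the Xen code itself.

    ```c
    #include <stdint.h>

    /* Hypothetical model: guest clock = raw host clock - offset. */
    static uint64_t toy_offset_ns;

    void toy_set_offset(uint64_t offset_ns)
    {
        toy_offset_ns = offset_ns;
    }

    uint64_t toy_sched_clock(uint64_t raw_ns)
    {
        return raw_ns - toy_offset_ns; /* wraps if raw_ns < offset */
    }

    /* On resume: choose an offset so the clock continues from last_ns. */
    void toy_resume(uint64_t raw_ns, uint64_t last_ns)
    {
        toy_offset_ns = raw_ns - last_ns;
    }
    ```

    With the offset recomputed at resume, the first reading after
    migration equals the last reading before it, even if the new host's
    raw clock is smaller than the old offset.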

    [boris: replaced pr_info() with pr_info_once() in xen_callback_vector()
    to avoid printing with incorrect timestamp during resume (as we
    haven't re-adjusted the clock yet)]

    Fixes: f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
    Cc: # 4.11
    Reported-by: Hans van Kranenburg
    Signed-off-by: Juergen Gross
    Tested-by: Hans van Kranenburg
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Juergen Gross
     
  • commit 38669ba205d178d2d38bfd194a196d65a44d5af2 upstream.

    sched_clock() is expected to start counting from 0 when the system boots.

    Add an offset, xen_sched_clock_offset (similar to how it is done for
    other hypervisors, e.g. kvm_sched_clock_offset), so that sched_clock()
    counts from 0 when time is first initialized.

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-14-pasha.tatashin@oracle.com
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Pavel Tatashin
     
  • commit 2229f70b5bbb025e1394b61007938a68060afbfb upstream.

    In order to support pvclock vdso on Xen we need to set up the time
    info page for vcpu 0 and register the page with Xen using the
    VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
    will also forcefully update the pvti which will set some of the
    necessary flags for vdso. Afterwards we check if it supports the
    PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
    vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
    will be later on used when mapping the vdso image.

    The xen headers are also updated to include the new hypercall for
    registering the secondary vcpu_time_info struct.

    Signed-off-by: Joao Martins
    Reviewed-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Joao Martins
     
  • commit b888808093113ae7d63d213272d01fea4b8329ed upstream.

    Specifically check for PVCLOCK_TSC_STABLE_BIT and, if this bit is set,
    set it in the pvclock flags as well. This allows the Xen clocksource to
    use it, speeding up xen_clocksource_read() callers (e.g. sched_clock()).

    Signed-off-by: Joao Martins
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Joao Martins
     
  • commit 9f08890ab906abaf9d4c1bad8111755cbd302260 upstream.

    Right now there is only a pvclock_pvti_cpu0_va() which is defined
    on kvmclock since:

    commit dac16fba6fc5
    ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

    The only user of this interface so far is kvm. This commit adds a
    setter function for the pvti page and moves pvclock_pvti_cpu0_va
    to pvclock, which is a more generic place to have it; and would
    allow other PV clocksources to use it, such as Xen.

    While moving pvclock_pvti_cpu0_va into pvclock, rename also this
    function to pvclock_get_pvti_cpu0_va (including its call sites)
    to be symmetric with the setter (pvclock_set_pvti_cpu0_va).

    Signed-off-by: Joao Martins
    Acked-by: Andy Lutomirski
    Acked-by: Paolo Bonzini
    Acked-by: Thomas Gleixner
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Joao Martins
     
  • commit 001f60e1f662a6dee1630a2915401aaf5959d479 upstream.

    In the event of moving pvclock_pvti_cpu0_va() definition to common
    pvclock code, this function would return a value on non KVM guests.
    Later on this would fail with a GPF on ptp_kvm_init when running on a
    Xen guest. Therefore, ptp_kvm_init() should check whether it is running
    in a KVM guest.

    Signed-off-by: Joao Martins
    Acked-by: Radim Krčmář
    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Joao Martins
     
  • commit f068090426ea8d72c408ebd42953a82a88e2282c upstream.

    Ensure that the shared_hcd pointer is valid when calling usb_put_hcd()

    The shared_hcd is removed and freed in xhci by first calling
    usb_remove_hcd(xhci->shared_hcd), and later
    usb_put_hcd(xhci->shared_hcd)

    After commit fe190ed0d602 ("xhci: Do not halt the host until both HCD have
    disconnected their devices.") the shared_hcd was never properly put as
    xhci->shared_hcd was set to NULL before usb_put_hcd(xhci->shared_hcd) was
    called.

    shared_hcd (USB3) is removed before primary hcd (USB2).
    While removing the primary hcd we might need to handle xhci interrupts
    to cleanly remove last USB2 devices, therefore we need to set
    xhci->shared_hcd to NULL before removing the primary hcd to let xhci
    interrupt handler know shared_hcd is no longer available.

    xhci-plat.c, xhci-histb.c and xhci-mtk first create both their HCDs before
    adding them, so to keep the correct reverse removal order, use a temporary
    shared_hcd variable for them.
    For more details see commit 4ac53087d6d4 ("usb: xhci: plat: Create both
    HCDs before adding them")
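
    The removal order described above can be sketched as follows. The
    toy_hcd and toy_xhci types and the toy_*() helpers are hypothetical
    stand-ins for the USB core structures and calls; the sketch only shows
    why a temporary copy of the pointer is needed once xhci->shared_hcd has
    been cleared.

    ```c
    #include <stddef.h>

    /* Hypothetical stand-ins for the two HCDs and the controller state. */
    struct toy_hcd {
        int added; /* 1 while registered */
        int refs;  /* reference count */
    };

    struct toy_xhci {
        struct toy_hcd *main_hcd;   /* USB2, primary */
        struct toy_hcd *shared_hcd; /* USB3, shared */
    };

    static void toy_remove_hcd(struct toy_hcd *hcd) { hcd->added = 0; }
    static void toy_put_hcd(struct toy_hcd *hcd) { hcd->refs--; }

    void toy_xhci_remove(struct toy_xhci *xhci)
    {
        struct toy_hcd *shared_hcd = xhci->shared_hcd; /* keep a copy */

        toy_remove_hcd(shared_hcd);     /* USB3 side goes first */
        xhci->shared_hcd = NULL;        /* tell the irq handler it is gone */
        toy_remove_hcd(xhci->main_hcd); /* may still handle interrupts */
        toy_put_hcd(shared_hcd);        /* copy is valid, reference dropped */
        toy_put_hcd(xhci->main_hcd);
    }
    ```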

    Fixes: fe190ed0d602 ("xhci: Do not halt the host until both HCD have disconnected their devices.")
    Cc: Joel Stanley
    Cc: Chunfeng Yun
    Cc: Thierry Reding
    Cc: Jianguo Sun
    Cc:
    Reported-by: Jack Pham
    Tested-by: Jack Pham
    Tested-by: Peter Chen
    Signed-off-by: Mathias Nyman
    Signed-off-by: Sudip Mukherjee
    Signed-off-by: Greg Kroah-Hartman

    Mathias Nyman
     
  • commit bd6742249b9ca918565e4e3abaa06665e587f4b5 upstream.

    OUT endpoint requests may sometimes have this flag set when
    preparing to be submitted to HW indicating that there is an
    additional TRB chained to the request for alignment purposes.
    If that request is removed before the controller can execute the
    transfer (e.g. ep_dequeue/ep_disable), the request will not go
    through the dwc3_gadget_ep_cleanup_completed_request() handler
    and will not have its needs_extra_trb flag cleared when
    dwc3_gadget_giveback() is called. This same request could be
    later requeued for a new transfer that does not require an
    extra TRB and if it is successfully completed, the cleanup
    and TRB reclamation will incorrectly process the additional TRB
    which belongs to the next request, and incorrectly advances the
    TRB dequeue pointer, thereby messing up calculation of the next
    request's actual/remaining count when it completes.

    The right thing to do here is to ensure that the flag is cleared
    before it is given back to the function driver. A good place
    to do that is in dwc3_gadget_del_and_unmap_request().
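
    A reduced sketch of the idea, assuming a hypothetical toy_request that
    carries only the flag discussed here: because the flag is cleared in
    the common teardown path, a request that was dequeued early and later
    requeued no longer causes an extra TRB to be reclaimed.

    ```c
    #include <stdbool.h>

    /* Hypothetical, reduced request: only the state discussed above. */
    struct toy_request {
        bool needs_extra_trb;
        bool queued;
    };

    /* Common teardown path, reached by completed and dequeued requests. */
    void toy_del_and_unmap_request(struct toy_request *req)
    {
        req->queued = false;
        req->needs_extra_trb = false; /* never leak the flag to a requeue */
    }

    /* Number of TRBs the cleanup would reclaim for this request. */
    int toy_trbs_to_reclaim(const struct toy_request *req)
    {
        return req->needs_extra_trb ? 2 : 1;
    }
    ```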

    Fixes: c6267a51639b ("usb: dwc3: gadget: align transfers to wMaxPacketSize")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jack Pham
    Signed-off-by: Felipe Balbi
    [jackp: backport to
    Signed-off-by: Greg Kroah-Hartman

    Jack Pham
     
  • commit 5cbab6303b4791a3e6713dfe2c5fda6a867f9adc upstream.

    Under heavy load if we don't have any pre-allocated rsps left, we
    dynamically allocate a rsp, but we are not actually allocating memory
    for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
    fields (req->rsp->status) in nvmet_req_init() will result in crash.

    To fix this, allocate the memory for nvme_completion by calling
    nvmet_rdma_alloc_rsp()
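
    The failure mode can be sketched in user space, assuming hypothetical
    toy_rsp and toy_completion types: allocating the outer object alone
    leaves its inner pointer unusable, so the fallback path has to run the
    same inner allocation as the pre-allocated pool does.

    ```c
    #include <stdlib.h>

    struct toy_completion {
        unsigned short status;
    };

    struct toy_rsp {
        struct toy_completion *cqe; /* must point to real memory */
    };

    /* Allocates the inner completion; returns 0 on success, -1 on failure. */
    int toy_alloc_rsp(struct toy_rsp *rsp)
    {
        rsp->cqe = calloc(1, sizeof(*rsp->cqe));
        return rsp->cqe ? 0 : -1;
    }

    /* Fallback used when the pre-allocated pool is empty. */
    struct toy_rsp *toy_get_rsp_dynamic(void)
    {
        struct toy_rsp *rsp = malloc(sizeof(*rsp));

        if (!rsp)
            return NULL;
        if (toy_alloc_rsp(rsp)) { /* the step that was missing */
            free(rsp);
            return NULL;
        }
        return rsp;
    }

    void toy_put_rsp(struct toy_rsp *rsp)
    {
        free(rsp->cqe);
        free(rsp);
    }
    ```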

    Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")

    Cc:
    Reviewed-by: Max Gurtovoy
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Raju Rangoju
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Raju Rangoju
     
  • commit ad1f824948e4ed886529219cf7cd717d078c630d upstream.

    Signed-off-by: Israel Rukshin
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Cc: Raju Rangoju
    Signed-off-by: Greg Kroah-Hartman

    Israel Rukshin
     
  • commit 60f1bf29c0b2519989927cae640cd1f50f59dc7f upstream.

    When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read
    from pcpu_devices->lowcore. However, due to prefixing, that will result
    in reading from absolute address 0 on that CPU. We have to go via the
    actual lowcore instead.

    This means that right now, we will read lc->nodat_stack == 0 and
    therefore work on a very wrong stack.

    This bug essentially broke rebooting under QEMU TCG (which reports
    a low address protection exception). It is also broken under KVM,
    where it can be easily triggered with 1 VCPU.

    :/# echo 1 > /proc/sys/kernel/sysrq
    :/# echo b > /proc/sysrq-trigger
    [ 28.476745] sysrq: SysRq : Resetting
    [ 28.476793] Kernel stack overflow.
    [ 28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
    [ 28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
    [ 28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140)
    [ 28.476861] R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    [ 28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000
    [ 28.476864] 0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0
    [ 28.476864] 000000000010dff8 0000000000000000 0000000000000000 0000000000000000
    [ 28.476865] 000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000
    [ 28.476887] Krnl Code: 0000000000115bfe: 4170f000 la %r7,0(%r15)
    [ 28.476887] 0000000000115c02: 41f0a000 la %r15,0(%r10)
    [ 28.476887] #0000000000115c06: e370f0980024 stg %r7,152(%r15)
    [ 28.476887] >0000000000115c0c: c0e5fffff86e brasl %r14,114ce8
    [ 28.476887] 0000000000115c12: 41f07000 la %r15,0(%r7)
    [ 28.476887] 0000000000115c16: a7f4ffa8 brc 15,115b66
    [ 28.476887] 0000000000115c1a: 0707 bcr 0,%r7
    [ 28.476887] 0000000000115c1c: 0707 bcr 0,%r7
    [ 28.476901] Call Trace:
    [ 28.476902] Last Breaking-Event-Address:
    [ 28.476920] [] arch_call_rest_init+0x22/0x80
    [ 28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue.
    [ 28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
    [ 28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
    [ 28.476932] Call Trace:

    Fixes: 2f859d0dad81 ("s390/smp: reduce size of struct pcpu")
    Cc: stable@vger.kernel.org # 4.0+
    Reported-by: Cornelia Huck
    Signed-off-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    David Hildenbrand
     
  • Upstream commit:

    f775b13eedee ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

    introduced a bug, which was later fixed by upstream commit:

    5663d8f9bbe4 ("kvm: x86: fix WARN due to uninitialized guest FPU state")

    For reasons unknown, both commits were initially passed-over for
    inclusion in the 4.14 stable branch despite being tagged for stable.
    Eventually, someone noticed that the fixup, commit 5663d8f9bbe4, was
    missing from stable[1], and so it was queued up for 4.14 and included in
    release v4.14.79.

    Even later, the original buggy patch, commit f775b13eedee, was also
    applied to the 4.14 stable branch. Through an unlucky coincidence, the
    incorrect ordering did not generate a conflict between the two patches,
    and led to v4.14.94 and later releases containing a spurious call to
    kvm_load_guest_fpu() in kvm_arch_vcpu_ioctl_run(). As a result, KVM may
    reload stale guest FPU state, e.g. after accepting an INIT event. This
    can manifest as crashes during boot, segfaults, failed checksums and so
    on and so forth.

    Remove the unwanted kvm_{load,put}_guest_fpu() calls, i.e. make
    kvm_arch_vcpu_ioctl_run() look like commit 5663d8f9bbe4 was backported
    after commit f775b13eedee.

    [1] https://www.spinics.net/lists/stable/msg263931.html

    Fixes: 4124a4cff344 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
    Cc: stable@vger.kernel.org
    Cc: Sasha Levin
    Cc: Greg Kroah-Hartman
    Cc: Peter Xu
    Cc: Rik van Riel
    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Reported-by: Roman Mamedov
    Reported-by: Thomas Lindroth
    Signed-off-by: Sean Christopherson
    Acked-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
     
  • commit 52a76235d0c4dd259cd0df503afed4757c04ba1d upstream.

    Currently we are using all the available FIFO size in the RQS and
    TQS fields. This will not work correctly in multi-queue IPs
    because the total FIFO size must be split among the enabled queues.

    Correct this by computing the available fifo size per queue and
    setting the right value in TQS and RQS fields.
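
    A small sketch of the per-queue computation. The encoding used here
    (size in 256-byte blocks, minus one) is an assumption made for
    illustration and should be checked against the hardware databook; the
    function name is hypothetical.

    ```c
    /*
     * Assumption: the queue-size field holds the per-queue FIFO size in
     * 256-byte blocks, minus one.
     */
    unsigned int toy_queue_size_field(unsigned int total_fifo_bytes,
                                      unsigned int enabled_queues)
    {
        unsigned int per_queue;

        if (enabled_queues == 0)
            enabled_queues = 1; /* avoid dividing by zero */

        per_queue = total_fifo_bytes / enabled_queues;
        if (per_queue < 256)
            return 0;           /* smallest encodable size */

        return per_queue / 256 - 1;
    }
    ```

    Under that assumption, a 16384-byte FIFO gives a field value of 63
    with one queue and 15 with four queues, rather than 63 for every
    queue.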

    Signed-off-by: Jose Abreu
    Cc: David S. Miller
    Cc: Joao Pinto
    Cc: Giuseppe Cavallaro
    Cc: Alexandre Torgue
    Signed-off-by: David S. Miller
    Cc: Niklas Cassel
    Signed-off-by: Greg Kroah-Hartman

    Jose Abreu
     
  • This reverts commit e65cd9a20343ea90f576c24c38ee85ab6e7d5fec.

    Tommi T. Rantala notes:

    PTRACE_SECCOMP_GET_METADATA was only added in 4.16
    (26500475ac1b499d8636ff281311d633909f5d20)

    And it's also breaking seccomp_bpf.c compilation for me:

    seccomp_bpf.c: In function ‘get_metadata’:
    seccomp_bpf.c:2878:26: error: storage size of ‘md’ isn’t known
    struct seccomp_metadata md;

    Signed-off-by: Sasha Levin

    Sasha Levin
     
  • [ Upstream commit 1fe627da30331024f453faef04d500079b901107 ]

    libdwfl parses an ELF file itself and creates mappings for the
    individual sections. perf on the other hand sees raw mmap events which
    represent individual sections. When we encounter an address pointing
    into a mapping with pgoff != 0, we must take that into account and
    report the file at the non-offset base address.
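
    The address arithmetic can be sketched as below, assuming a
    hypothetical toy_map that holds just the start address and file offset
    of one mmap event. The base address is where file offset 0 would be
    mapped.

    ```c
    #include <stdint.h>

    /* Hypothetical description of one mmap event as perf would see it. */
    struct toy_map {
        uint64_t start; /* virtual address where the mapping begins */
        uint64_t pgoff; /* offset of the mapping within the file */
    };

    /* Address at which file offset 0 would be mapped. */
    uint64_t toy_elf_base(const struct toy_map *map)
    {
        return map->start - map->pgoff;
    }
    ```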

    This fixes unwinding with libdwfl in some cases. E.g. for a file like:

    ```
    #include <complex>
    #include <future>
    #include <iostream>
    #include <mutex>
    #include <random>
    #include <vector>

    using namespace std;

    mutex g_mutex;

    double worker()
    {
        lock_guard<mutex> guard(g_mutex);
        uniform_real_distribution<double> uniform(-1E5, 1E5);
        default_random_engine engine;
        double s = 0;
        for (int i = 0; i < 1000; ++i) {
            s += norm(complex<double>(uniform(engine), uniform(engine)));
        }
        cout << s << endl;
        return s;
    }

    int main()
    {
        vector<future<double>> results;
        for (int i = 0; i < 10000; ++i) {
            results.push_back(async(launch::async, worker));
        }
        return 0;
    }
    ```

    Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp -o cpp-locking`,
    then record it with `perf record --call-graph dwarf -e
    sched:sched_switch`.

    When you analyze it with `perf script` and libunwind, you should see:

    ```
    cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
    7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
    7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
    7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
    7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
    7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
    7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
    7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
    7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
    7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator
    7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl>
    7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
    563b9cb502f7 std::ostream::operator<(std::__invoke_other, double (*&&)())+0x2b (inlined)
    563b9cb506fb std::__invoke_result::type std::__invoke(double (*&&)())+0x2b (inlined)
    563b9cb506fb decltype (__invoke((_S_declval)())) std::thread::_Invoker >::_M_invoke(std::_Index_tuple)+0x2b (inlined)
    563b9cb506fb std::thread::_Invoker >::operator()()+0x2b (inlined)
    563b9cb506fb std::__future_base::_Task_setter, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker >, dou>
    563b9cb506fb std::_Function_handler (), std::__future_base::_Task_setter
    563b9cb507e8 std::function ()>::operator()() const+0x28 (inlined)
    563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function ()>*, bool*)+0x28 (/ssd/milian/>
    7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so)
    563b9cb51149 __gthread_once+0xe9 (inlined)
    563b9cb51149 void std::call_once ()>*, bool*)>
    563b9cb51149 std::__future_base::_State_baseV2::_M_set_result(std::function ()>, bool)+0xe9 (inlined)
    563b9cb51149 std::__future_base::_Async_state_impl >, double>::_Async_state_impl(std::thread::_Invoker >&&)::{lambda()#1}::op>
    563b9cb51149 void std::__invoke_impl >, double>::_Async_state_impl(std::thread::_Invoker
    563b9cb51149 std::__invoke_result >, double>::_Async_state_impl(std::thread::_Invoker >>
    563b9cb51149 decltype (__invoke((_S_declval)())) std::thread::_Invoker >, double>::_Async_state_>
    563b9cb51149 std::thread::_Invoker >, double>::_Async_state_impl(std::thread::_Invoker
    563b9cb51149 std::thread::_State_impl >, double>::_Async_state_impl(std::thread>
    7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25)
    7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
    7f38e42ccb22 __GI___clone+0x42 (inlined)
    ```

    Before this patch, using libdwfl, you would see:

    ```
    cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
    a041161e77950c5c [unknown] ([unknown])
    ```

    With this patch applied, we get a bit further in unwinding:

    ```
    cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
    7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
    7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
    7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
    7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
    7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
    7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
    7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
    7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
    7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
    7f38e463fa6d std::basic_streambuf >::sputn(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator >::_M_put(char const*, long)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::__write(std::ostreambuf_iterator >, char const*, int)+0x1cd (inlined)
    7f38e463fa6d std::ostreambuf_iterator > std::num_put > >::_M_insert_float(std::ostreambuf_iterator
    7f38e464bd70 std::num_put > >::put(std::ostreambuf_iterator >, std::ios_base&, char, double) const+0x90 (inl>
    7f38e464bd70 std::ostream& std::ostream::_M_insert(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
    563b9cb502f7 std::ostream::operator<
    ```

    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20181029141644.3907-1-milian.wolff@kdab.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Milian Wolff
     
  • [ Upstream commit 3d20c6246690219881786de10d2dda93f616d0ac ]

    Path passed to libdw for unwinding doesn't include symfs path
    if specified, so unwinding fails because ELF file is not found.

    Similar to unwinding with libunwind, pass symsrc_filename instead
    of long_name. If there is no symsrc_filename, fallback to long_name.

    Signed-off-by: Martin Vuille
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/r/20180211212420.18388-1-jpmv27@aim.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Martin Vuille
     
  • commit 0c9b1965faddad7534b6974b5b36c4ad37998f8e upstream.

    User space programs using poll() on /dev/vcs devices are not woken up
    when a screen size change occurs. Let's fix that.

    Signed-off-by: Nicolas Pitre
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Pitre
     
  • commit 93171ba6f1deffd82f381d36cb13177872d023f6 upstream.

    Kyungtae Kim detected a potential integer overflow in bcm_[rx|tx]_setup()
    when the conversion into ktime multiplies the given value with NSEC_PER_USEC
    (1000).

    Reference: https://marc.info/?l=linux-can&m=154732118819828&w=2

    Add a check for the given tv_usec, so that the value stays below one second.
    Additionally limit the tv_sec value to a reasonable value for CAN related
    use-cases of 400 days and ensure all values to be positive.
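
    A sketch of such a range check, assuming a hypothetical toy_timeval
    and constant names chosen for illustration. Zero is treated as
    acceptable here, which is an assumption of the sketch.

    ```c
    #include <stdbool.h>

    #define TOY_USEC_PER_SEC  1000000L
    #define TOY_TIMER_SEC_MAX (400L * 24 * 60 * 60) /* 400 days */

    /* Hypothetical interval as supplied by user space. */
    struct toy_timeval {
        long tv_sec;
        long tv_usec;
    };

    /* True if the interval is negative, too large, or not normalised. */
    bool toy_is_invalid_tv(const struct toy_timeval *tv)
    {
        return tv->tv_sec < 0 || tv->tv_sec > TOY_TIMER_SEC_MAX ||
               tv->tv_usec < 0 || tv->tv_usec >= TOY_USEC_PER_SEC;
    }
    ```

    Rejecting tv_usec values of one second or more keeps the later
    multiplication by NSEC_PER_USEC (1000) below 10^9, which fits in a
    32-bit signed integer.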

    Reported-by: Kyungtae Kim
    Tested-by: Oliver Hartkopp
    Signed-off-by: Oliver Hartkopp
    Cc: linux-stable # >= 2.6.26
    Tested-by: Kyungtae Kim
    Acked-by: Andre Naujoks
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: Greg Kroah-Hartman

    Oliver Hartkopp
     
  • commit 7b12c8189a3dc50638e7d53714c88007268d47ef upstream.

    This patch reverts commit 7da11ba5c506
    ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")

    After the introduction of this change we encountered the following new
    error message on various i.MX platforms (flexcan):

    | flexcan 53fc8000.can can0: __can_get_echo_skb: BUG! Trying to echo non
    | existing skb: can_priv::echo_skb[0]

    The introduction of the message was a mistake because
    priv->echo_skb[idx] = NULL is perfectly valid in the following case: If
    CAN_RAW_LOOPBACK is disabled (setsockopt) in applications, the pkt_type
    of the tx skb's given to can_put_echo_skb is set to PACKET_LOOPBACK. In
    this case can_put_echo_skb will not set priv->echo_skb[idx]. It is
    therefore kept NULL.

    As an additional argument for the revert: The order of check and usage
    of idx was changed. idx is used to access an array element before
    checking its boundaries.
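
    Both points can be shown in a reduced sketch, assuming a hypothetical
    toy_priv with a small fixed array: the index is validated before the
    array is touched, and an empty slot is returned as NULL without being
    treated as an error.

    ```c
    #include <stddef.h>

    #define TOY_ECHO_SKB_MAX 4

    struct toy_priv {
        void *echo_skb[TOY_ECHO_SKB_MAX]; /* NULL means "no echo queued" */
    };

    /* Returns the stored buffer, or NULL if idx is invalid or the slot is empty. */
    void *toy_get_echo_skb(struct toy_priv *priv, unsigned int idx)
    {
        void *skb;

        if (idx >= TOY_ECHO_SKB_MAX) /* bounds check comes first */
            return NULL;

        skb = priv->echo_skb[idx];   /* only now touch the array */
        priv->echo_skb[idx] = NULL;
        return skb;                  /* NULL here is valid, not a bug */
    }
    ```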

    Signed-off-by: Manfred Schlaegl
    Fixes: 7da11ba5c506 ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")
    Cc: linux-stable
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: Greg Kroah-Hartman

    Manfred Schlaegl
     
  • commit 8208d1708b88b412ca97f50a6d951242c88cbbac upstream.

    The way we allocate events works fine in most cases, except
    when multiple PCI devices share an ITS-visible DevID, and that
    one of them is trying to use MultiMSI allocation.

    In that case, our allocation is not guaranteed to be zero-based
    anymore, and we have to make sure we allocate it on a boundary
    that is compatible with the PCI Multi-MSI constraints.

    Fix this by allocating the full region upfront instead of iterating
    over the number of MSIs. MSI-X are always allocated one by one,
    so this shouldn't change anything on that front.
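
    The alignment constraint can be sketched with a toy allocator. The
    flag array and function name are hypothetical; the sketch reserves a
    whole power-of-two region at once, starting on a multiple of its size,
    instead of handing out events one at a time.

    ```c
    #include <stdbool.h>

    #define TOY_NR_EVENTS 32

    /* One flag per event ID; true means "in use". Hypothetical layout. */
    static bool toy_event_map[TOY_NR_EVENTS];

    /*
     * Reserve 'count' contiguous events starting on a multiple of 'count'.
     * 'count' must be a power of two. Returns the first event, or -1.
     */
    int toy_alloc_events(unsigned int count)
    {
        unsigned int base, i;

        if (count == 0 || (count & (count - 1)) || count > TOY_NR_EVENTS)
            return -1;

        for (base = 0; base + count <= TOY_NR_EVENTS; base += count) {
            for (i = 0; i < count; i++)
                if (toy_event_map[base + i])
                    break;
            if (i < count)
                continue; /* region partly used, try the next one */

            for (i = 0; i < count; i++)
                toy_event_map[base + i] = true;
            return (int)base;
        }
        return -1;
    }
    ```

    If another device sharing the same DevID already holds event 0, a
    request for four events lands on event 4 rather than on events 1 to 4.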

    Fixes: b48ac83d6bbc2 ("irqchip: GICv3: ITS: MSI support")
    Cc: stable@vger.kernel.org
    Reported-by: Ard Biesheuvel
    Tested-by: Ard Biesheuvel
    Signed-off-by: Marc Zyngier
    Signed-off-by: Greg Kroah-Hartman

    Marc Zyngier
     
  • commit 93ad0fc088c5b4631f796c995bdd27a082ef33a6 upstream.

    The recent commit which prevented a division by 0 issue in the alarm timer
    code broke posix CPU timers as an unwanted side effect.

    The reason is that the common rearm code checks for timer->it_interval
    being 0 now. What went unnoticed is that the posix cpu timer setup does not
    initialize timer->it_interval as it stores the interval in CPU timer
    specific storage. The reason for the separate storage is historical as the
    posix CPU timers always had a 64bit nanoseconds representation internally
    while timer->it_interval is type ktime_t which used to be a modified
    timespec representation on 32bit machines.

    Instead of reverting the offending commit and fixing the alarmtimer issue
    in the alarmtimer code, store the interval in timer->it_interval at CPU
    timer setup time so the common code check works. This also repairs the
    existing inconsistency of the posix CPU timer code which kept a single shot
    timer armed despite the interval being 0.

    The separate storage can be removed in mainline, but that needs to be a
    separate commit as the current one has to be backported to stable kernels.

    Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug")
    Reported-by: H.J. Lu
    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 7e6fc2f50a3197d0e82d1c0e86282976c9e6c8a4 upstream.

    The outb() function takes parameters value and port, in that order. Fix
    the parameters used in the KASLR i8254 fallback code.
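
    The argument order can be made concrete with a mock that records the
    write instead of touching hardware. The toy_* names are hypothetical,
    and the constants (0x43 as the PIT command port, a read-back command
    for counter 0) are given for illustration only.

    ```c
    #include <stdint.h>

    #define TOY_I8254_PORT_CONTROL 0x43 /* PIT mode/command register */
    #define TOY_I8254_CMD_READBACK 0xC0
    #define TOY_I8254_SELECT_CNT0  0x02

    /* Mock that records the last write instead of touching hardware. */
    static uint8_t toy_last_value;
    static uint16_t toy_last_port;

    void toy_outb(uint8_t value, uint16_t port)
    {
        toy_last_value = value;
        toy_last_port = port;
    }

    void toy_latch_counter0(void)
    {
        /* value first, port second */
        toy_outb(TOY_I8254_CMD_READBACK | TOY_I8254_SELECT_CNT0,
                 TOY_I8254_PORT_CONTROL);
    }

    uint8_t toy_get_last_value(void) { return toy_last_value; }
    uint16_t toy_get_last_port(void) { return toy_last_port; }
    ```

    Swapping the two arguments would send the port number as data to an
    unrelated port, and because both are plain integers the compiler
    would not complain.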

    Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions")
    Signed-off-by: Daniel Drake
    Signed-off-by: Thomas Gleixner
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: linux@endlessm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com
    Signed-off-by: Greg Kroah-Hartman

    Daniel Drake
     
  • commit e1812933b17be7814f51b6c310c5d1ced7a9a5f5 upstream.

    There was a bug where the per-mm pkey state was not being preserved across
    fork() in the child. fork() is performed in the pkey selftests, but all of
    the pkey activity is performed in the parent. The child does not perform
    any actions sensitive to pkey state.

    To make the test more sensitive to these kinds of bugs, add a fork() where
    the parent exits, and execution continues in the child.

    To achieve this let the key exhaustion test not terminate at the first
    allocation failure and fork after 2*NR_PKEYS loops and continue in the
    child.

    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: peterz@infradead.org
    Cc: mpe@ellerman.id.au
    Cc: will.deacon@arm.com
    Cc: luto@kernel.org
    Cc: jroedel@suse.de
    Cc: stable@vger.kernel.org
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Michael Ellerman
    Cc: Will Deacon
    Cc: Andy Lutomirski
    Cc: Joerg Roedel
    Link: https://lkml.kernel.org/r/20190102215657.585704B7@viggo.jf.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit a31e184e4f69965c99c04cc5eb8a4920e0c63737 upstream.

    Memory protection key behavior should be the same in a child as it was
    in the parent before a fork. But, there is a bug that resets the
    state in the child at fork instead of preserving it.

    The creation of new mm's is a bit convoluted. At fork(), the code
    does:

    1. memcpy() the parent mm to initialize child
    2. mm_init() to initialize some select stuff
    3. dup_mmap() to create true copies that memcpy() did not do right

    For pkeys two bits of state need to be preserved across a fork:
    'execute_only_pkey' and 'pkey_allocation_map'.

    Those are preserved by the memcpy(), but mm_init() invokes
    init_new_context() which overwrites 'execute_only_pkey' and
    'pkey_allocation_map' with "new" values.

    The author of the code erroneously believed that init_new_context is *only*
    called at execve()-time. But, alas, init_new_context() is used at execve()
    and fork().

    The result is that, after a fork(), the child's pkey state ends up looking
    like it does after an execve(), which is totally wrong. pkeys that are
    already allocated can be allocated again, for instance.

    To fix this, add code called by dup_mmap() to copy the pkey state from
    parent to child explicitly. Also add a comment above init_new_context() to
    make it more clear to the next poor sod what this code is used for.
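
    The sequence above can be modelled in userspace. This is only a toy
    sketch: struct toy_mm, the toy_* helpers and the reset values are
    hypothetical stand-ins, not the kernel's structures or code. It shows
    how a reset in step 2 undoes what memcpy() preserved, and how an
    explicit copy in step 3 restores it.

```c
#include <string.h>

/* Hypothetical stand-in for the pkey fields of an mm (not the kernel struct). */
struct toy_mm {
	int execute_only_pkey;
	unsigned int pkey_allocation_map;
};

/* Models init_new_context(): resets pkey state to assumed "fresh" values. */
static void toy_init_new_context(struct toy_mm *mm)
{
	mm->execute_only_pkey = -1;
	mm->pkey_allocation_map = 0x1;	/* assumption: only pkey 0 allocated */
}

/* Models the fix: copy the pkey state from parent to child explicitly. */
static void toy_dup_pkeys(const struct toy_mm *oldmm, struct toy_mm *mm)
{
	mm->execute_only_pkey = oldmm->execute_only_pkey;
	mm->pkey_allocation_map = oldmm->pkey_allocation_map;
}

/* Models fork(); 'fixed' selects whether the explicit copy is applied. */
static struct toy_mm toy_fork(const struct toy_mm *parent, int fixed)
{
	struct toy_mm child;

	memcpy(&child, parent, sizeof(child));	/* 1. copy the parent mm */
	toy_init_new_context(&child);		/* 2. overwrites the pkey state */
	if (fixed)
		toy_dup_pkeys(parent, &child);	/* 3. restore it from the parent */
	return child;
}
```

    With a parent holding allocation map 0xb, toy_fork(&parent, 0) yields a
    child with map 0x1 (the bug), while toy_fork(&parent, 1) yields 0xb.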

    Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
    Signed-off-by: Dave Hansen
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: peterz@infradead.org
    Cc: mpe@ellerman.id.au
    Cc: will.deacon@arm.com
    Cc: luto@kernel.org
    Cc: jroedel@suse.de
    Cc: stable@vger.kernel.org
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Michael Ellerman
    Cc: Will Deacon
    Cc: Andy Lutomirski
    Cc: Joerg Roedel
    Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Dave Hansen
     
  • commit 5cc244a20b86090c087073c124284381cdf47234 upstream.

    The single-step debugging of KVM guests on x86 is broken: if we run
    gdb 'stepi' command at the breakpoint when the guest interrupts are
    enabled, RIP always jumps to native_apic_mem_write(). Then other
    nasty effects follow.

    Long investigation showed that on Jun 7, 2017 the
    commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
    introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
    be called without X86_EFLAGS_TF set.

    Let's fix it. Please consider that for -stable.

    Signed-off-by: Alexander Popov
    Cc: stable@vger.kernel.org
    Fixes: c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Alexander Popov
     
  • commit 1856b9f7bcc8e9bdcccc360aabb56fbd4dd6c565 upstream.

    The dm-crypt cipher specification in a mapping table is defined as:
    cipher[:keycount]-chainmode-ivmode[:ivopts]
    or (new crypt API format):
    capi:cipher_api_spec-ivmode[:ivopts]

    For ESSIV, the parameter includes hash specification, for example:
    aes-cbc-essiv:sha256

    The implementation expected that additional IV option to never include
    another dash '-' character.

    But, with SHA3, there are names like sha3-256; so the mapping table
    parser fails:

    dmsetup create test --table "0 8 crypt aes-cbc-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"
    or (new crypt API format)
    dmsetup create test --table "0 8 crypt capi:cbc(aes)-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"

    device-mapper: crypt: Ignoring unexpected additional cipher options
    device-mapper: table: 253:0: crypt: Error creating IV
    device-mapper: ioctl: error adding target to table

    Fix the dm-crypt constructor to ignore additional dash in IV options and
    also remove a bogus warning (that is ignored anyway).
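
    A minimal userspace sketch of the parsing idea, not the dm-crypt code
    itself: treat only the first two dashes as separators and take
    everything after the first ':' that follows them as the IV options.
    struct toy_spec and toy_parse_spec() are hypothetical names, and the
    sketch only handles the full three-part old-style format.

```c
#include <string.h>

/* Hypothetical result of splitting "cipher[:keycount]-chainmode-ivmode[:ivopts]". */
struct toy_spec {
	char *cipher;		/* may still carry ":keycount" */
	char *chainmode;
	char *ivmode;
	char *ivopts;		/* NULL when no options are given */
};

/*
 * Splits 'buf' in place. Dashes inside ivopts (e.g. "sha3-256") are kept.
 * Returns 0 on success, -1 if fewer than three dash-separated parts exist.
 */
static int toy_parse_spec(char *buf, struct toy_spec *out)
{
	char *p;

	out->cipher = buf;

	p = strchr(buf, '-');		/* first dash ends the cipher */
	if (!p)
		return -1;
	*p++ = '\0';
	out->chainmode = p;

	p = strchr(p, '-');		/* second dash ends the chainmode */
	if (!p)
		return -1;
	*p++ = '\0';
	out->ivmode = p;

	p = strchr(p, ':');		/* optional ivopts: rest of the string */
	if (p)
		*p++ = '\0';
	out->ivopts = p;
	return 0;
}
```

    For "aes-cbc-essiv:sha3-256" this yields ivmode "essiv" and ivopts
    "sha3-256" instead of cutting the hash name at its dash.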

    Cc: stable@vger.kernel.org # 4.12+
    Signed-off-by: Milan Broz
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Milan Broz
     
  • commit d445bd9cec1a850c2100fcf53684c13b3fd934f2 upstream.

    Commit 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next
    stage processing") changed process_prepared_discard_passdown_pt1() to
    increment all the blocks being discarded until after the passdown had
    completed to avoid them being prematurely reused.

    IO issued to a thin device that breaks sharing with a snapshot, followed
    by a discard issued to snapshot(s) that previously shared the block(s),
    results in passdown_double_checking_shared_status() being called to
    iterate through the blocks double checking their reference count is zero
    and issuing the passdown if so. So a side effect of commit 00a0ea33b495
    is passdown_double_checking_shared_status() was broken.

    Fix this by checking if the block reference count is greater than 1.
    Also, rename dm_pool_block_is_used() to dm_pool_block_is_shared().

    Fixes: 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing")
    Cc: stable@vger.kernel.org # 4.9+
    Reported-by: ryan.p.norwood@gmail.com
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit 11189c1089da413aa4b5fd6be4c4d47c78968819 upstream.

    The _DSM function number validation only happens to succeed when the
    generic Linux command number translation corresponds with a
    DSM-family-specific function number. This breaks NVDIMM-N
    implementations that correctly implement _LSR, _LSW, and _LSI, but do
    not happen to publish support for DSM function numbers 4, 5, and 6.

    Recall that the support for _LS{I,R,W} family of methods results in the
    DIMM being marked as supporting those command numbers at
    acpi_nfit_register_dimms() time. The DSM function mask is only used for
    ND_CMD_CALL support of non-NVDIMM_FAMILY_INTEL devices.

    Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command...")
    Cc:
    Link: https://github.com/pmem/ndctl/issues/78
    Reported-by: Sujith Pandel
    Tested-by: Sujith Pandel
    Reviewed-by: Vishal Verma
    Reviewed-by: Jeff Moyer
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • commit 5e9e38d0db1d29efed1dd4cf9a70115d33521be7 upstream.

    In preparation for using function number 0 as an error value, prevent it
    from being considered a valid function value by acpi_nfit_ctl().

    Cc:
    Cc: stuart hayes
    Fixes: e02fb7264d8a ("nfit: add Microsoft NVDIMM DSM command set...")
    Reported-by: Jeff Moyer
    Reviewed-by: Jeff Moyer
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • commit d77651a227f8920dd7ec179b84e400cce844eeb3 upstream.

    An integer overflow may arise in uinput_validate_absinfo() if "max - min"
    can't be represented by an "int". We should check for overflow before
    trying to use the result.
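
    As an illustration only (toy_abs_range_ok() is a hypothetical helper,
    not the uinput code), the check can be sketched in userspace with the
    GCC 5+/Clang builtin __builtin_sub_overflow(), which returns true when
    the result does not fit the destination type:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if "max - min" is a usable, representable range. */
static bool toy_abs_range_ok(int min, int max)
{
	int range;

	if (min > max)
		return false;

	/* Reject ranges whose width cannot be represented by an int. */
	if (__builtin_sub_overflow(max, min, &range))
		return false;

	return true;
}
```

    For example, the range [0, 255] is accepted, while [INT_MIN, INT_MAX]
    is rejected because its width exceeds INT_MAX.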

    Reported-by: Kyungtae Kim
    Reviewed-by: Peter Hutterer
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • commit f0907827a8a9152aedac2833ed1b674a7b2a44f2 upstream.

    This adds wrappers for the __builtin overflow checkers present in gcc
    5.1+ as well as fallback implementations for earlier compilers. It's not
    that easy to implement the fully generic __builtin_X_overflow(T1 a, T2
    b, T3 *d) in macros, so the fallback code assumes that T1, T2 and T3 are
    the same. We obviously don't want the wrappers to have different
    semantics depending on $GCC_VERSION, so we also insist on that even when
    using the builtins.

    There are a few problems with the 'a+b < a' idiom for checking for
    overflow: For signed types, it relies on undefined behaviour and is
    not actually complete (it doesn't check underflow;
    e.g. INT_MIN+INT_MIN == 0 isn't caught). Due to type promotion it
    is wrong for all types (signed and unsigned) narrower than
    int. Similarly, when a and b do not have the same type, there are
    subtle cases like

    u32 a;

    if (a + sizeof(foo) < a)
        return -EOVERFLOW;
    a += sizeof(foo);

    where the test is always false on 64 bit platforms. Add to that that it
    is not always possible to determine the types involved at a glance.
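
    The mixed-type case can be reproduced in userspace. This sketch uses
    uint64_t as an assumed stand-in for a 64-bit size_t so that it behaves
    the same on any platform; the toy_* names are hypothetical. The
    builtin checks the result against the type of the destination, so the
    32-bit wrap is caught.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Open-coded idiom: 'a' is converted to 64 bits before the addition, so
 * for a small 'n' (such as a sizeof result) the sum cannot wrap and the
 * test is always false.
 */
static bool toy_naive_would_overflow(uint32_t a, uint64_t n)
{
	return a + n < a;
}

/* Asks whether a + n still fits the 32-bit type that will hold the sum. */
static bool toy_checked_would_overflow(uint32_t a, uint64_t n)
{
	uint32_t sum;

	return __builtin_add_overflow(a, n, &sum);
}
```

    With a == UINT32_MAX and n == 16, the open-coded test reports no
    overflow although "a += n" would wrap, while the builtin reports it.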

    The new overflow.h is somewhat bulky, but that's mostly a result of
    trying to be type-generic, complete (e.g. catching not only overflow
    but also signed underflow) and not relying on undefined behaviour.

    Linus is of course right [1] that for unsigned subtraction a-b, the
    right way to check for overflow (underflow) is "b > a" and not
    "__builtin_sub_overflow(a, b, &d)", but that's just one out of six cases
    covered here, and included mostly for completeness.

    So is it worth it? I think it is, if nothing else for the documentation
    value of seeing

    if (check_add_overflow(a, b, &d))
        return -EGOAWAY;
    do_stuff_with(d);

    instead of the open-coded (and possibly wrong and/or incomplete and/or
    UBsan-tickling)

    if (a+b < a)
        return -EGOAWAY;
    do_stuff_with(a+b);

    While gcc does recognize the 'a+b < a' idiom for testing unsigned add
    overflow, it doesn't do nearly as well for unsigned multiplication
    (there's also no single well-established idiom). So using
    check_mul_overflow in kcalloc and friends may also make gcc generate
    slightly better code.
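
    For the multiplication case, a userspace sketch of a kcalloc-style
    size computation might look like the following. toy_array_size() is a
    hypothetical helper built on the GCC 5+/Clang builtin, not the
    check_mul_overflow() wrapper added by this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: stores n * size in *total and returns true, or
 * returns false when the product does not fit in a size_t.
 */
static bool toy_array_size(size_t n, size_t size, size_t *total)
{
	return !__builtin_mul_overflow(n, size, total);
}
```

    A caller would refuse the allocation when toy_array_size() returns
    false rather than allocating a wrapped, too-small buffer.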

    [1] https://lkml.org/lkml/2015/11/2/658

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Rasmus Villemoes
     
  • commit fe2bfd0d40c935763812973ce15f5764f1c12833 upstream.

    Add support for the SteelSeries Stratus Duo, a wireless Xbox 360
    controller. The Stratus Duo ships with a USB dongle to enable wireless
    connectivity, but it can also function as a wired controller by connecting
    it directly to a PC via USB, hence the need for two USB PIDs. 0x1430 is the
    dongle, and 0x1431 is the controller.

    Signed-off-by: Tom Panfil
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Tom Panfil
     
  • commit ef68e831840c40c7d01b328b3c0f5d8c4796c232 upstream.

    When executing add_credits() we currently call cifs_reconnect()
    if the number of credits is zero and there are no requests in
    flight. In this case we may call cifs_reconnect() recursively
    twice and cause memory corruption given the following sequence
    of functions:

    mid1.callback() -> add_credits() -> cifs_reconnect() ->
    -> mid2.callback() -> add_credits() -> cifs_reconnect().

    Fix this by not calling cifs_reconnect() in add_credits()
    and checking for zero credits in the demultiplex thread.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit ec678eae746dd25766a61c4095e2b649d3b20b09 upstream.

    We do need to account for credits received in error responses
    to read requests on encrypted sessions.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit 8004c78c68e894e4fd5ac3c22cc22eb7dc24cabc upstream.

    Currently we mark MID as malformed if we get an error from server
    in a read response. This leads to not properly processing credits
    in the readv callback. Fix this by marking such a response as
    normal received response and process it appropriately.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.

    When doing MTU i/o we need to leave some credits for
    possible reopen requests and other operations happening
    in parallel. Currently we leave 1 credit which is not
    enough even for reopen only: we need at least 2 credits
    if durable handle reconnect fails. Also there may be
    other operations at the same time including compounding
    ones which require 3 credits at a time each. Fix this
    by leaving 8 credits which is big enough to cover most
    scenarios.

    Was able to reproduce this when server was configured
    to give out fewer credits than usual.

    The proper fix would be to reconnect a file handle first
    and then obtain credits for an MTU request but this leads
    to bigger code changes and should happen in other patches.

    Cc:
    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit ba50bf1ce9a51fc97db58b96d01306aa70bc3979 upstream.

    fc96df16a1ce is good and can already fix the "return stack garbage" issue,
    but let's also improve hv_ringbuffer_get_debuginfo(), which would silently
    return stack garbage, if people forget to check channel->state or
    ring_info->ring_buffer, when using the function in the future.

    Having an error check in the function would eliminate the potential risk.
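
    The pattern can be sketched in userspace as follows. The toy_* types
    and function are hypothetical and much simpler than the vmbus ones;
    the point is only that the getter itself refuses to report on a ring
    that has no buffer, so a caller that checks the return value never
    prints uninitialized data.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ring and its debug info. */
struct toy_ring {
	void *ring_buffer;		/* NULL while the channel is not open */
	unsigned int read_index;
	unsigned int write_index;
};

struct toy_debuginfo {
	unsigned int current_read_index;
	unsigned int current_write_index;
};

/* Returns 0 and fills *info, or -EINVAL (leaving *info untouched). */
static int toy_get_debuginfo(const struct toy_ring *ring,
			     struct toy_debuginfo *info)
{
	if (!ring->ring_buffer)
		return -EINVAL;

	info->current_read_index = ring->read_index;
	info->current_write_index = ring->write_index;
	return 0;
}
```

    A sysfs-style caller would return the error to userspace instead of
    formatting the contents of 'info'.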

    Add a Fixes tag to indicate the patch dependency.

    Fixes: fc96df16a1ce ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels")
    Cc: stable@vger.kernel.org
    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Signed-off-by: Stephen Hemminger
    Signed-off-by: Dexuan Cui
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Dexuan Cui
     
  • commit da8ced360ca8ad72d8f41f5c8fcd5b0e63e1555f upstream.

    Hyper-V memory hotplug protocol has 2M granularity and in Linux x86 we use
    128M. To deal with it we implement partial section onlining by registering
    custom page onlining callback (hv_online_page()). Later, when more memory
    arrives we try to online the 'tail' (see hv_bring_pgs_online()).

    It was found that in some cases this 'tail' onlining causes issues:

    BUG: Bad page state in process kworker/0:2 pfn:109e3a
    page:ffffe08344278e80 count:0 mapcount:1 mapping:0000000000000000 index:0x0
    flags: 0xfffff80000000()
    raw: 000fffff80000000 dead000000000100 dead000000000200 0000000000000000
    raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    page dumped because: nonzero mapcount
    ...
    Workqueue: events hot_add_req [hv_balloon]
    Call Trace:
    dump_stack+0x5c/0x80
    bad_page.cold.112+0x7f/0xb2
    free_pcppages_bulk+0x4b8/0x690
    free_unref_page+0x54/0x70
    hv_page_online_one+0x5c/0x80 [hv_balloon]
    hot_add_req.cold.24+0x182/0x835 [hv_balloon]
    ...

    Turns out that we now have deferred struct page initialization for memory
    hotplug so e.g. memory_block_action() in drivers/base/memory.c does
    pages_correctly_probed() check and in that check it avoids inspecting
    struct pages and checks sections instead. But in Hyper-V balloon driver we
    do PageReserved(pfn_to_page()) check and this is now wrong.

    Switch to checking online_section_nr() instead.

    Signed-off-by: Vitaly Kuznetsov
    Cc: stable@kernel.org
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • commit fc01d8c61ce02c034e67378cd3e645734bc18c8c upstream.

    Fix __might_sleep warning[1] in tty/n_hdlc.c read due to copy_to_user
    call while current is TASK_INTERRUPTIBLE. This is a false positive
    since the code path does not depend on current state remaining
    TASK_INTERRUPTIBLE. The loop breaks out and sets TASK_RUNNING after
    calling copy_to_user.

    This patch suppresses the warning by setting TASK_RUNNING before calling
    copy_to_user.

    [1] https://syzkaller.appspot.com/bug?id=17d5de7f1fcab794cb8c40032f893f52de899324

    Signed-off-by: Paul Fulghum
    Reported-by: syzbot
    Cc: Tetsuo Handa
    Cc: Alan Cox
    Cc: stable
    Acked-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Paul Fulghum