21 Dec, 2018

22 commits

  • commit 6512276d97b160d90b53285bd06f7f201459a7e3 upstream.

    If a locker taking the qspinlock slowpath reads a lock value indicating
    that only the pending bit is set, then it will spin whilst the
    concurrent pending->locked transition takes effect.

    Unfortunately, there is no guarantee that such a transition will ever be
    observed since concurrent lockers could continuously set pending and
    hand over the lock amongst themselves, leading to starvation. Whilst
    this would probably resolve in practice, it means that it is not
    possible to prove liveness properties about the lock and means that lock
    acquisition time is unbounded.

    Rather than removing the pending->locked spinning from the slowpath
    altogether (which has been shown to heavily penalise a 2-threaded
    locking stress test on x86), this patch replaces the explicit spinning
    with a call to atomic_cond_read_relaxed and allows the architecture to
    provide a bound on the number of spins. For architectures that can
    respond to changes in cacheline state in their smp_cond_load implementation,
    it should be sufficient to use the default bound of 1.
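
    To illustrate the shape of the change, here is a small standalone C11
    sketch of a bounded pending-wait (illustrative names such as PENDING_VAL,
    PENDING_LOOPS and wait_for_pending are not the kernel's; the real slowpath
    uses atomic_cond_read_relaxed() with an architecture-provided bound):

    #include <stdatomic.h>
    #include <stdio.h>

    #define PENDING_VAL   0x100 /* pending bit set, lock/tail bits clear */
    #define PENDING_LOOPS 1     /* stand-in for the per-arch spin bound  */

    static _Atomic unsigned int lockval = PENDING_VAL;

    /* Bounded wait for a concurrent pending->locked hand-over. */
    static unsigned int wait_for_pending(void)
    {
            unsigned int val = atomic_load_explicit(&lockval, memory_order_relaxed);
            int cnt = PENDING_LOOPS;

            while (val == PENDING_VAL && cnt-- > 0)
                    val = atomic_load_explicit(&lockval, memory_order_relaxed);

            return val; /* fall through to the queueing path; no unbounded spin */
    }

    int main(void)
    {
            printf("lock word after bounded wait: %#x\n", wait_for_pending());
            return 0;
    }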

    Suggested-by: Waiman Long
    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Waiman Long
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: boqun.feng@gmail.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: paulmck@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1524738868-31318-4-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 95bcade33a8af38755c9b0636e36a36ad3789fe6 upstream.

    When a locker ends up queuing on the qspinlock locking slowpath, we
    initialise the relevant mcs node and publish it indirectly by updating
    the tail portion of the lock word using xchg_tail. If we find that there
    was a pre-existing locker in the queue, we subsequently update their
    ->next field to point at our node so that we are notified when it's our
    turn to take the lock.

    This can be roughly illustrated as follows:

    /* Initialise the fields in node and encode a pointer to node in tail */
    tail = initialise_node(node);

    /*
     * Exchange tail into the lockword using an atomic read-modify-write
     * operation with release semantics
     */
    old = xchg_tail(lock, tail);

    /* If there was a pre-existing waiter ... */
    if (old & _Q_TAIL_MASK) {
            prev = decode_tail(old);
            smp_read_barrier_depends();

            /* ... then update their ->next field to point to node. */
            WRITE_ONCE(prev->next, node);
    }

    The conditional update of prev->next therefore relies on the address
    dependency from the result of xchg_tail ensuring order against the
    prior initialisation of node. However, since the release semantics of
    the xchg_tail operation apply only to the write portion of the RmW,
    then this ordering is not guaranteed and it is possible for the CPU
    to return old before the writes to node have been published, consequently
    allowing us to point prev->next to an uninitialised node.

    This patch fixes the problem by making the update of prev->next a RELEASE
    operation, which also removes the reliance on dependency ordering.
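
    The fix itself is a one-liner in spirit: the plain store that publishes
    the node becomes a store with RELEASE semantics (smp_store_release() being
    the kernel primitive for this). A standalone C11 sketch of the
    publish-with-release pattern, using illustrative types rather than the
    kernel's qspinlock structures:

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdio.h>

    struct mcs_node {
            int locked;
            _Atomic(struct mcs_node *) next;
    };

    static void publish(struct mcs_node *prev, struct mcs_node *node)
    {
            /* Initialise the node first ... */
            node->locked = 0;
            atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

            /*
             * ... then make it reachable with RELEASE semantics, so that a
             * consumer reading prev->next also observes the initialisation.
             */
            atomic_store_explicit(&prev->next, node, memory_order_release);
    }

    int main(void)
    {
            struct mcs_node head = { 1, NULL };
            struct mcs_node node;

            publish(&head, &node);
            printf("published: %p\n", (void *)atomic_load(&head.next));
            return 0;
    }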

    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1518528177-19169-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • commit 548095dea63ffc016d39c35b32c628d033638aca upstream.

    Queued spinlocks are not used by DEC Alpha, and furthermore operations
    such as READ_ONCE() and release/relaxed RMW atomics are being changed
    to imply smp_read_barrier_depends(). This commit therefore removes the
    now-redundant smp_read_barrier_depends() from queued_spin_lock_slowpath(),
    and adjusts the comments accordingly.

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Sasha Levin

    Paul E. McKenney
     
  • commit 25896d073d8a0403b07e6dec56f58e6c33678207 upstream.

    It is troublesome to add a diagnostic like this to the Makefile
    parse stage because the top-level Makefile could be parsed with
    a stale include/config/auto.conf.

    Once you are hit by the error about a non-retpoline compiler, the
    compilation still breaks even after disabling CONFIG_RETPOLINE.

    The easiest fix is to move this check to the "archprepare" like
    this commit did:

    829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: Masahiro Yamada
    Acked-by: Zhenzhong Duan
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Zhenzhong Duan
    Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
    Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com
    Link: https://lkml.org/lkml/2018/12/4/206
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Cc: Gi-Oh Kim
    Signed-off-by: Greg Kroah-Hartman

    Masahiro Yamada
     
  • commit d55d8be0747c96db28a1d08fc24d22ccd9b448ac upstream.

    Some new variants require different firmwares.

    Signed-off-by: Junwei Zhang
    Reviewed-by: Alex Deucher
    Signed-off-by: Alex Deucher
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Junwei Zhang
     
  • commit cf66b8a0ba142fbd1bf10ac8f3ae92d1b0cb7b8f upstream.

    Braswell is really picky about having our writes posted to memory before
    we execute or else the GPU may see stale values. A wmb() is insufficient
    as it only ensures the writes are visible to other cores; we need a full
    mb() to ensure the writes are in memory and visible to the GPU.

    The most frequent failure in flushing before execution is that we see
    stale PTE values and execute the wrong pages.

    References: 987abd5c62f9 ("drm/i915/execlists: Force write serialisation into context image vs execution")
    Signed-off-by: Chris Wilson
    Cc: Mika Kuoppala
    Cc: Tvrtko Ursulin
    Cc: Joonas Lahtinen
    Cc: stable@vger.kernel.org
    Reviewed-by: Tvrtko Ursulin
    Link: https://patchwork.freedesktop.org/patch/msgid/20181206084431.9805-3-chris@chris-wilson.co.uk
    (cherry picked from commit 490b8c65b9db45896769e1095e78725775f47b3e)
    Signed-off-by: Joonas Lahtinen
    Signed-off-by: Greg Kroah-Hartman

    Chris Wilson
     
  • commit 63238173b2faf3d6b85a416f1c69af6c7be2413f upstream.

    This reverts commit 7f3ef5dedb146e3d5063b6845781ad1bb59b92b5.

    It causes new warnings [1] on shutdown when running the Google Kevin or
    Scarlet (RK3399) boards under Chrome OS. Presumably our usage of DRM is
    different than what Marc and Heiko test.

    We're looking at a different approach (e.g., [2]) to replace this, but
    IMO the revert should be taken first, as it already propagated to
    -stable.

    [1] Report here:
    http://lkml.kernel.org/lkml/20181205030127.GA200921@google.com

    WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
    ...
    Call trace:
    drm_mode_config_cleanup+0x1c4/0x294
    rockchip_drm_unbind+0x4c/0x8c
    component_master_del+0x88/0xb8
    rockchip_drm_platform_remove+0x2c/0x44
    rockchip_drm_platform_shutdown+0x20/0x2c
    platform_drv_shutdown+0x2c/0x38
    device_shutdown+0x164/0x1b8
    kernel_restart_prepare+0x40/0x48
    kernel_restart+0x20/0x68
    ...
    Memory manager not clean during takedown.
    WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x34/0x44
    ...
    drm_mm_takedown+0x34/0x44
    rockchip_drm_unbind+0x64/0x8c
    component_master_del+0x88/0xb8
    rockchip_drm_platform_remove+0x2c/0x44
    rockchip_drm_platform_shutdown+0x20/0x2c
    platform_drv_shutdown+0x2c/0x38
    device_shutdown+0x164/0x1b8
    kernel_restart_prepare+0x40/0x48
    kernel_restart+0x20/0x68
    ...

    [2] https://patchwork.kernel.org/patch/10556151/
    https://www.spinics.net/lists/linux-rockchip/msg21342.html
    [PATCH] drm/rockchip: shutdown drm subsystem on shutdown

    Fixes: 7f3ef5dedb14 ("drm/rockchip: Allow driver to be shutdown on reboot/kexec")
    Cc: Jeffy Chen
    Cc: Robin Murphy
    Cc: Vicente Bergas
    Cc: Marc Zyngier
    Cc: Heiko Stuebner
    Cc: stable@vger.kernel.org
    Signed-off-by: Brian Norris
    Signed-off-by: Heiko Stuebner
    Link: https://patchwork.freedesktop.org/patch/msgid/20181205181657.177703-1-briannorris@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Brian Norris
     
  • commit 24199c5436f267399afed0c4f1f57663c0408f57 upstream.

    Noticed this while working on redoing the reference counting scheme in
    the DP MST helpers. Nouveau doesn't attempt to call
    drm_dp_mst_topology_mgr_destroy() at all, which leaves it leaking all of
    the resources of the drm_dp_mst_topology_mgr and its child mstbs and ports.

    Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream")
    Signed-off-by: Lyude Paul
    Cc: # v4.10+
    Signed-off-by: Ben Skeggs
    Signed-off-by: Greg Kroah-Hartman

    Lyude Paul
     
  • commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream.

    The arch_teardown_msi_irqs() function assumes that controller ops
    pointers were already checked in arch_setup_msi_irqs(), but this
    assumption is wrong: arch_teardown_msi_irqs() can be called even when
    arch_setup_msi_irqs() returns an error (-ENOSYS).

    This can happen in the following scenario:
    - msi_capability_init() calls pci_msi_setup_msi_irqs()
    - pci_msi_setup_msi_irqs() returns -ENOSYS
    - msi_capability_init() notices the error and calls free_msi_irqs()
    - free_msi_irqs() calls pci_msi_teardown_msi_irqs()

    This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and
    pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just
    aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs().

    The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure
    seems legit, as it does additional cleanup; e.g.
    list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do
    happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs()
    is called and need to be cleaned up if that fails).
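
    The defensive shape of such a fix can be sketched in standalone C
    (hypothetical names, not the powerpc code): teardown must not assume that
    setup succeeded, so it re-checks the ops pointers before calling through
    them.

    #include <errno.h>
    #include <stdio.h>

    struct controller_ops {
            int  (*setup_msi_irqs)(int nvec);
            void (*teardown_msi_irqs)(void);
    };

    static int arch_setup(struct controller_ops *ops, int nvec)
    {
            if (!ops || !ops->setup_msi_irqs)
                    return -ENOSYS;         /* the caller still runs cleanup */
            return ops->setup_msi_irqs(nvec);
    }

    static void arch_teardown(struct controller_ops *ops)
    {
            /* We can get here even though setup returned -ENOSYS. */
            if (!ops || !ops->teardown_msi_irqs)
                    return;
            ops->teardown_msi_irqs();
    }

    int main(void)
    {
            struct controller_ops none = { 0 }; /* controller without MSI ops */

            if (arch_setup(&none, 1) < 0)
                    arch_teardown(&none);       /* must be safe, not crash */
            puts("teardown tolerated a failed setup");
            return 0;
    }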

    Fixes: 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()")
    Cc: stable@vger.kernel.org # v3.18+
    Signed-off-by: Radu Rendec
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Radu Rendec
     
  • commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream.

    The following commands will cause a memory leak:

    # cd /sys/kernel/tracing
    # mkdir instances/foo
    # echo schedule > instances/foo/set_ftrace_filter
    # rmdir instances/foo

    The reason is that the hashes that hold the filters to set_ftrace_filter and
    set_ftrace_notrace are not freed if they contain any data on the instance
    and the instance is removed.

    Found by kmemleak detector.

    Cc: stable@vger.kernel.org
    Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 3cec638b3d793b7cacdec5b8072364b41caeb0e1 upstream.

    When create_event_filter() fails in set_trigger_filter(), the filter may
    still be allocated and needs to be freed. The caller expects
    data->filter to be updated with the new filter, even if the new filter
    failed (we could add an error message by setting the set_str parameter of
    create_event_filter(), but that's another update).

    But because the error path simply returned, the filter was left dangling
    and nothing could free it.

    Found by kmemleak detector.
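
    The ownership rule the fix restores can be shown with a small standalone C
    sketch (hypothetical names, not the tracing code): the create step hands
    the possibly half-built filter back through its out parameter even on
    failure, and the caller always publishes it so the normal teardown path
    frees it.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct filter { char *str; };
    struct trigger_data { struct filter *filter; };

    /* May return an error while still handing back an allocated filter. */
    static int create_filter(const char *s, struct filter **out)
    {
            struct filter *f = calloc(1, sizeof(*f));

            *out = f;
            if (!f)
                    return -ENOMEM;
            f->str = strdup(s);
            return -EINVAL;                 /* pretend the string failed to parse */
    }

    static void free_filter(struct filter *f)
    {
            if (!f)
                    return;
            free(f->str);
            free(f);
    }

    static int set_trigger(struct trigger_data *data, const char *s)
    {
            struct filter *f = NULL;
            int ret = create_filter(s, &f);

            /* Publish even on error, so the teardown that frees data->filter
             * reclaims it instead of leaking it. */
            free_filter(data->filter);
            data->filter = f;
            return ret;
    }

    int main(void)
    {
            struct trigger_data data = { NULL };

            set_trigger(&data, "bogus filter");
            free_filter(data.filter);       /* teardown: nothing leaks */
            return 0;
    }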

    Cc: stable@vger.kernel.org
    Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
    Reviewed-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 687cf4412a343a63928a5c9d91bdc0f522939d43 upstream.

    Otherwise dm_bitset_cursor_begin() returns -ENODATA. Other calls to
    dm_bitset_cursor_begin() have similar negative checks.

    Fixes inability to create a cache in passthrough mode (even though doing
    so makes no sense).

    Fixes: 0d963b6e65 ("dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty")
    Cc: stable@vger.kernel.org
    Reported-by: David Teigland
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     
  • commit f6c367585d0d851349d3a9e607c43e5bea993fa1 upstream.

    Sending a DM event before a thin-pool state change is about to happen is
    a bug. It wasn't realized until it became clear that userspace response
    to the event raced with the actual state change that the event was
    meant to notify about.

    Fix this by first updating internal thin-pool state to reflect what the
    DM event is being issued about. This fixes a long-standing racey/buggy
    userspace device-mapper-test-suite 'resize_io' test that would get an
    event but not find the state it was looking for -- so it would just go
    on to hang because no other events caused the test to reevaluate the
    thin-pool's state.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     
  • commit 76f4e2c3b6a560cdd7a75b87df543e04d05a9e5f upstream.

    cpu_is_mmp2() was equivalent to cpu_is_pj4(), which wouldn't be correct
    for multiplatform kernels. Fix it by also considering mmp_chip_id, as is
    done for cpu_is_pxa168() and cpu_is_pxa910() above.

    Moreover, it is only available with CONFIG_CPU_MMP2 and thus doesn't work
    on DT-based MMP2 machines. Enable it on CONFIG_MACH_MMP2_DT too.

    Note: CONFIG_CPU_MMP2 is only used for machines that use board files
    instead of DT. It should perhaps be renamed. I'm not doing it now, because
    I don't have a better idea.

    Signed-off-by: Lubomir Rintel
    Acked-by: Arnd Bergmann
    Cc: stable@vger.kernel.org
    Signed-off-by: Olof Johansson
    Signed-off-by: Greg Kroah-Hartman

    Lubomir Rintel
     
  • commit 2e64ff154ce6ce9a8dc0f9556463916efa6ff460 upstream.

    When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection.

    Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this
    incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent
    to userspace.

    Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR
    inside of fuse_file_put.
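
    The structure of the fix can be sketched as follows (simplified,
    hypothetical types; not the actual fuse code): the release path picks its
    opcode from the isdir flag, and the no_open short-circuit only ever
    applies to regular files.

    #include <stdbool.h>
    #include <stdio.h>

    enum opcode { RELEASE, RELEASEDIR };

    struct conn { bool no_open; };

    static void send_release(struct conn *fc, bool isdir)
    {
            enum opcode op = isdir ? RELEASEDIR : RELEASE;

            /*
             * no_open is only about regular files: a directory release must
             * still reach userspace even when FUSE_OPEN returned ENOSYS.
             */
            if (!isdir && fc->no_open) {
                    puts("RELEASE skipped (no_open)");
                    return;
            }
            puts(op == RELEASEDIR ? "sending RELEASEDIR" : "sending RELEASE");
    }

    int main(void)
    {
            struct conn fc = { .no_open = true };

            send_release(&fc, false);       /* dropped, as before the fix   */
            send_release(&fc, true);        /* directories are always sent  */
            return 0;
    }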

    Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'")
    Cc: # v3.14
    Signed-off-by: Chad Austin
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Chad Austin
     
  • commit b704441e38f645dcfba1348ca3cc1ba43d1a9f31 upstream.

    We observed some premature timeouts on a virtualization platform; the logs
    look like this:

    case 1:
    [159525.255629] mmc1: Internal clock never stabilised.
    [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
    [159525.256049] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00001002
    ...
    [159525.257205] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x0000fa03
    From the clock control register dump, we are pretty sure the clock was
    stabilized.

    case 2:
    [ 914.550127] mmc1: Reset 0x2 never completed.
    [ 914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
    [ 914.550608] mmc1: sdhci: Sys addr: 0x00000010 | Version: 0x00001002

    After checking the sdhci code, we found the timeout check actually has a
    little window in which the CPU can be scheduled out and, when it comes
    back, the time it recorded or the check it made is no longer valid.
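
    The robust loop shape is the usual one: after the deadline has passed,
    sample the hardware condition one final time before declaring a timeout,
    so that being scheduled out between the last poll and the deadline test
    cannot turn a healthy device into a spurious failure. A standalone sketch
    of that pattern (illustrative only, not the sdhci code):

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static bool hw_ready(void)
    {
            return true;    /* stand-in for reading the controller register */
    }

    /* Returns true if the condition became true before we gave up. */
    static bool wait_ready(long timeout_ms)
    {
            struct timespec start, now;

            clock_gettime(CLOCK_MONOTONIC, &start);
            for (;;) {
                    bool timedout;

                    clock_gettime(CLOCK_MONOTONIC, &now);
                    timedout = (now.tv_sec - start.tv_sec) * 1000L +
                               (now.tv_nsec - start.tv_nsec) / 1000000L > timeout_ms;

                    /*
                     * Check the condition *after* deciding whether the deadline
                     * passed: even a long preemption between the two cannot
                     * produce a false "never stabilised" result.
                     */
                    if (hw_ready())
                            return true;
                    if (timedout)
                            return false;
            }
    }

    int main(void)
    {
            printf("ready: %d\n", wait_ready(150));
            return 0;
    }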

    Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Alek Du
    Acked-by: Adrian Hunter
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Alek Du
     
  • commit e8cde625bfe8a714a856e1366bcbb259d7346095 upstream.

    Since v2.6.22 or so there have been reports [1] about OMAP MMC being
    broken on OMAP15XX based hardware (OMAP5910 and OMAP310). The breakage
    seems to have been caused by commit 46a6730e3ff9 ("mmc-omap: Fix
    omap to use MMC_POWER_ON") that changed clock enabling to be done
    on MMC_POWER_ON. This can happen multiple times in a row, and on 15XX
    the hardware doesn't seem to like it and the MMC just stops responding.
    Fix by memorizing the power mode and doing the init only when necessary.
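
    The shape of the fix, as a standalone sketch (illustrative names, not the
    mmci-omap code): remember the last power mode acted upon and run the
    power-up initialisation only when the mode actually changes.

    #include <stdio.h>

    enum power_mode { POWER_OFF, POWER_UP, POWER_ON };

    struct host {
            enum power_mode power_mode;     /* last mode we acted on */
    };

    static void set_power(struct host *host, enum power_mode mode)
    {
            if (host->power_mode == mode)
                    return;                 /* repeated MMC_POWER_ON: no-op */

            host->power_mode = mode;
            if (mode == POWER_ON)
                    puts("running clock/power init once");
    }

    int main(void)
    {
            struct host host = { POWER_OFF };

            set_power(&host, POWER_ON);
            set_power(&host, POWER_ON);     /* now harmless */
            return 0;
    }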

    Before the patch (on Palm TE):

    mmc0: new SD card at address b368
    mmcblk0: mmc0:b368 SDC 977 MiB
    mmci-omap mmci-omap.0: command timeout (CMD18)
    mmci-omap mmci-omap.0: command timeout (CMD13)
    mmci-omap mmci-omap.0: command timeout (CMD13)
    mmci-omap mmci-omap.0: command timeout (CMD12) [x 6]
    mmci-omap mmci-omap.0: command timeout (CMD13) [x 6]
    mmcblk0: error -110 requesting status
    mmci-omap mmci-omap.0: command timeout (CMD8)
    mmci-omap mmci-omap.0: command timeout (CMD18)
    mmci-omap mmci-omap.0: command timeout (CMD13)
    mmci-omap mmci-omap.0: command timeout (CMD13)
    mmci-omap mmci-omap.0: command timeout (CMD12) [x 6]
    mmci-omap mmci-omap.0: command timeout (CMD13) [x 6]
    mmcblk0: error -110 requesting status
    mmcblk0: recovery failed!
    print_req_error: I/O error, dev mmcblk0, sector 0
    Buffer I/O error on dev mmcblk0, logical block 0, async page read
    mmcblk0: unable to read partition table

    After the patch:

    mmc0: new SD card at address b368
    mmcblk0: mmc0:b368 SDC 977 MiB
    mmcblk0: p1

    The patch is based on a fix and analysis done by Ladislav Michl.

    Tested on OMAP15XX/OMAP310 (Palm TE), OMAP1710 (Nokia 770)
    and OMAP2420 (Nokia N810).

    [1] https://marc.info/?t=123175197000003&r=1&w=2

    Fixes: 46a6730e3ff9 ("mmc-omap: Fix omap to use MMC_POWER_ON")
    Reported-by: Ladislav Michl
    Reported-by: Andrzej Zaborowski
    Tested-by: Ladislav Michl
    Acked-by: Tony Lindgren
    Signed-off-by: Aaro Koskinen
    Cc: stable@vger.kernel.org
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Aaro Koskinen
     
  • commit 3238c359acee4ab57f15abb5a82b8ab38a661ee7 upstream.

    We need to invalidate the caches *before* clearing the buffer via the
    non-cacheable alias, else in the worst case __dma_flush_area() may
    write back dirty lines over the top of our nice new zeros.

    Fixes: dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag")
    Cc: # 4.18.x-
    Acked-by: Will Deacon
    Signed-off-by: Robin Murphy
    Signed-off-by: Catalin Marinas
    Signed-off-by: Greg Kroah-Hartman

    Robin Murphy
     
  • commit 01e881f5a1fca4677e82733061868c6d6ea05ca7 upstream.

    Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
    could trigger a harmless false positive WARN_ON. Check that the vma is
    already registered before checking VM_MAYWRITE to shut off the false
    positive warning.

    Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
    Cc:
    Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
    Signed-off-by: Andrea Arcangeli
    Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
    Acked-by: Mike Rapoport
    Acked-by: Hugh Dickins
    Acked-by: Peter Xu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andrea Arcangeli
     
  • commit a538e3ff9dabcdf6c3f477a373c629213d1c3066 upstream.

    Matthew pointed out that the ioctx_table is susceptible to spectre v1,
    because the index can be controlled by an attacker. The below patch
    should mitigate the attack for all of the aio system calls.
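
    The standard mitigation is to clamp the attacker-controllable index with a
    branchless mask before it is used for the table lookup; in the kernel this
    is what array_index_nospec() provides. A userspace sketch of the masking
    idea, modeled on the generic helper (not the aio hunk itself):

    #include <stdio.h>

    #define BITS_PER_LONG (8 * sizeof(unsigned long))

    /*
     * All-ones when index < size, all-zeroes otherwise, computed without a
     * conditional branch that could be mispredicted.
     */
    static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
    {
            return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
    }

    static unsigned long clamp_index(unsigned long index, unsigned long size)
    {
            return index & index_mask_nospec(index, size);
    }

    int main(void)
    {
            unsigned long table_size = 8;

            /*
             * An in-bounds index passes through; an out-of-bounds one collapses
             * to 0, so a speculative table[index] load cannot reach
             * attacker-chosen memory.
             */
            printf("%lu %lu\n", clamp_index(5, table_size),
                   clamp_index(1000, table_size));
            return 0;
    }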

    Cc: stable@vger.kernel.org
    Reported-by: Matthew Wilcox
    Reported-by: Dan Carpenter
    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jeff Moyer
     
  • commit 478b6767ad26ab86d9ecc341027dd09a87b1f997 upstream.

    Pin PH11 is used on various A83T boards to detect a change in the OTG
    port's ID pin, as in when an OTG host cable is plugged in.

    The incorrect offset meant the gpiochip/irqchip was activating the wrong
    pin for interrupts.

    Fixes: 4730f33f0d82 ("pinctrl: sunxi: add allwinner A83T PIO controller support")
    Cc:
    Signed-off-by: Chen-Yu Tsai
    Acked-by: Maxime Ripard
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

    Chen-Yu Tsai
     
  • [ Upstream commit 8e7df2b5b7f245c9bd11064712db5cb69044a362 ]

    While it uses %pK, there are still few reasons to allow reading this file
    as non-root.

    Suggested-by: Linus Torvalds
    Acked-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin

    Ingo Molnar
     

17 Dec, 2018

18 commits

  • Greg Kroah-Hartman
     
  • commit f9bfe4e6a9d08d405fe7b081ee9a13e649c97ecf upstream.

    tcp_tso_should_defer() can return true in three different cases:

    1) We are cwnd-limited
    2) We are rwnd-limited
    3) We are application limited.

    Neal pointed out that my recent fix went too far, since
    it assumed that if we were not in the 1) case, we must be rwnd-limited.

    Fix this by properly populating the is_cwnd_limited and
    is_rwnd_limited booleans.

    After this change, we can finally move the silly check for FIN
    flag only for the application-limited case.

    The same move for the EOR bit will be handled in net-next,
    since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
    with eor mark (MSG_EOR)") is scheduled for linux-4.21.

    Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
    and checking none of them was rwnd_limited in the chrono_stat
    output from "ss -ti" command.

    Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
    Signed-off-by: Eric Dumazet
    Suggested-by: Neal Cardwell
    Reviewed-by: Neal Cardwell
    Acked-by: Soheil Hassas Yeganeh
    Reviewed-by: Yuchung Cheng
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit 36d842194a57f1b21fbc6a6875f2fa2f9a7f8679 upstream.

    When running with KASAN, the following trace is produced:

    [ 62.535888]

    ==================================================================
    [ 62.544930] BUG: KASAN: slab-out-of-bounds in
    gut_hw_stats+0x122/0x230 [hfi1]
    [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task
    kworker/0:1/14

    [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted
    4.19.0-test-build-kasan+ #8
    [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
    SE5C610.86B.01.01.0019.101220160604 10/12/2016
    [ 62.587951] Workqueue: events work_for_cpu_fn
    [ 62.594050] Call Trace:
    [ 62.598023] dump_stack+0xc6/0x14c
    [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f
    [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59
    [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.622985] print_address_description+0x6c/0x23c
    [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1]
    [ 62.636108] kasan_report.cold.6+0x241/0x308
    [ 62.642365] get_hw_stats+0x122/0x230 [hfi1]
    [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.655088] ? __kmalloc+0x110/0x240
    [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1]
    [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core]
    [ 62.673972] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core]
    [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core]
    [ 62.695340] ? show_hfi+0x50/0x50 [hfi1]
    [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core]
    [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380
    [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.723735] ? vmalloc_node+0x5c/0x70
    [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
    [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt]
    [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt]
    [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt]
    [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt]
    [ 62.766364] ? mutex_unlock+0x1d/0x40
    [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230
    [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt]
    [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt]
    [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1]
    [ 62.802969] ? memcpy+0x34/0x50
    [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1]
    [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1]
    [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1]
    [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1]
    [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0
    [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0
    [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1]
    [ 62.860958] init_one+0x67f/0xae0 [hfi1]
    [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.873876] ? wait_woken+0x130/0x130
    [ 62.879860] ? read_word_at_a_time+0xe/0x20
    [ 62.886329] ? strscpy+0x14b/0x280
    [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1]
    [ 62.898405] local_pci_probe+0x70/0xd0
    [ 62.904295] ? pci_device_shutdown+0x90/0x90
    [ 62.910833] work_for_cpu_fn+0x29/0x40
    [ 62.916750] process_one_work+0x584/0x960
    [ 62.922974] ? rcu_work_rcufn+0x40/0x40
    [ 62.928991] ? __schedule+0x396/0xdc0
    [ 62.934806] ? __sched_text_start+0x8/0x8
    [ 62.941020] ? pick_next_task_fair+0x68b/0xc60
    [ 62.947674] ? run_rebalance_domains+0x260/0x260
    [ 62.954471] ? __list_add_valid+0x29/0xa0
    [ 62.960607] ? move_linked_works+0x1c7/0x230
    [ 62.967077] ?
    trace_event_raw_event_workqueue_execute_start+0x140/0x140
    [ 62.976248] ? mutex_lock+0xa6/0x100
    [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10
    [ 62.988795] ? __switch_to+0x37a/0x710
    [ 62.994731] worker_thread+0x62e/0x9d0
    [ 63.000602] ? max_active_store+0xf0/0xf0
    [ 63.006828] ? __switch_to_asm+0x40/0x70
    [ 63.012932] ? __switch_to_asm+0x34/0x70
    [ 63.019013] ? __switch_to_asm+0x40/0x70
    [ 63.025042] ? __switch_to_asm+0x34/0x70
    [ 63.031030] ? __switch_to_asm+0x40/0x70
    [ 63.037006] ? __schedule+0x396/0xdc0
    [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0
    [ 63.049323] ? kthread+0x59/0x1d0
    [ 63.054594] ? ret_from_fork+0x35/0x40
    [ 63.060257] ? __sched_text_start+0x8/0x8
    [ 63.066212] ? schedule+0xcf/0x250
    [ 63.071529] ? __wake_up_common+0x110/0x350
    [ 63.077794] ? __schedule+0xdc0/0xdc0
    [ 63.083348] ? wait_woken+0x130/0x130
    [ 63.088963] ? finish_task_switch+0x1f1/0x520
    [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40
    [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0
    [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18
    [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50
    [ 63.121754] ? max_active_store+0xf0/0xf0
    [ 63.127753] kthread+0x1ae/0x1d0
    [ 63.132894] ? kthread_bind+0x30/0x30
    [ 63.138422] ret_from_fork+0x35/0x40

    [ 63.146973] Allocated by task 14:
    [ 63.152077] kasan_kmalloc+0xbf/0xe0
    [ 63.157471] __kmalloc+0x110/0x240
    [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1]
    [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1]
    [ 63.175244] init_one+0x551/0xae0 [hfi1]
    [ 63.181065] local_pci_probe+0x70/0xd0
    [ 63.186759] work_for_cpu_fn+0x29/0x40
    [ 63.192310] process_one_work+0x584/0x960
    [ 63.198163] worker_thread+0x62e/0x9d0
    [ 63.203843] kthread+0x1ae/0x1d0
    [ 63.208874] ret_from_fork+0x35/0x40

    [ 63.217203] Freed by task 1:
    [ 63.221844] __kasan_slab_free+0x12e/0x180
    [ 63.227844] kfree+0x92/0x1a0
    [ 63.232570] single_release+0x3a/0x60
    [ 63.238024] __fput+0x1d9/0x480
    [ 63.242911] task_work_run+0x139/0x190
    [ 63.248440] exit_to_usermode_loop+0x191/0x1a0
    [ 63.254814] do_syscall_64+0x301/0x330
    [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 63.270199] The buggy address belongs to the object at
    ffff88080e8d5500
    which belongs to the cache kmalloc-4096 of size 4096
    [ 63.287247] The buggy address is located 3632 bytes inside of
    4096-byte region [ffff88080e8d5500, ffff88080e8d6500)
    [ 63.303564] The buggy address belongs to the page:
    [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0
    mapping:ffff88081380e840 index:0x0 compound_mapcount: 0
    [ 63.323102] flags: 0x2fffff80008100(slab|head)
    [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001
    ffff88081380e840
    [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff
    0000000000000000
    [ 63.350564] page dumped because: kasan: bad access detected

    [ 63.361974] Memory state around the buggy address:
    [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00
    [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc
    fc fc fc
    [ 63.398944] ^
    [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc
    fc fc fc
    [ 63.426099]
    ==================================================================

    The trace happens because get_hw_stats() assumes there is room in the
    memory allocated in init_cntrs() to accommodate the driver counters.
    Unfortunately, that routine only allocated space for the device
    counters.

    Fix by ensuring the allocation has room for the additional driver
    counters.

    Cc: # v4.14+
    Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface")
    Reviewed-by: Mike Marciniczyn
    Reviewed-by: Mike Ruhl
    Signed-off-by: Piotr Stankiewicz
    Signed-off-by: Dennis Dalessandro
    Signed-off-by: Doug Ledford
    Signed-off-by: Greg Kroah-Hartman

    Piotr Stankiewicz
     
  • commit bde1a7459623a66c2abec4d0a841e4b06cc88d9a upstream.

    If a headphone or headset is plugged into the jack and the system is then
    rebooted, there is a chance that the headphone produces no sound. Running
    the headphone mode procedure once after boot fixes the issue.
    This also applies to ALC234, ALC274 and ALC294.

    Signed-off-by: Kailang Yang
    Cc:
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Kailang Yang
     
  • commit fa9c98e4b975bb3192ed6af09d9fa282ed3cd8a0 upstream.

    The initial commit referred to the 'SYNC_STATUS' register to get the
    clock configuration; however, according to my local reverse-engineering
    notes on packet dumps, this is wrong. It should be the 'CLOCK_CONFIG'
    register. ff400_dump_clock_config() is actually programmed correctly.

    This commit fixes the bug.

    Cc: # v4.12+
    Fixes: 76fdb3a9e13a ('ALSA: fireface: add support for Fireface 400')
    Signed-off-by: Takashi Sakamoto
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Takashi Sakamoto
     
  • commit fd29edc7232bc19f969e8f463138afc5472b3d5f upstream.

    gcc 8.1.0 generates the following warnings.

    drivers/staging/speakup/kobjects.c: In function 'punc_store':
    drivers/staging/speakup/kobjects.c:522:2: warning:
    'strncpy' output truncated before terminating nul
    copying as many bytes from a string as its length
    drivers/staging/speakup/kobjects.c:504:6: note: length computed here

    drivers/staging/speakup/kobjects.c: In function 'synth_store':
    drivers/staging/speakup/kobjects.c:391:2: warning:
    'strncpy' output truncated before terminating nul
    copying as many bytes from a string as its length
    drivers/staging/speakup/kobjects.c:388:8: note: length computed here

    Using strncpy() is indeed less than perfect since the length of data to
    be copied has already been determined with strlen(). Replace strncpy()
    with memcpy() to address the warning and optimize the code a little.
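
    The pattern behind both the warning and the fix, in a standalone sketch:
    once the length has been measured with strlen(), strncpy() adds nothing
    (with a bound equal to the source length it will not NUL-terminate), so a
    memcpy() of the measured bytes plus an explicit terminator states the
    intent directly.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            const char *src = "say hello";
            char dst[64];
            size_t len = strlen(src);       /* length already known here */

            if (len >= sizeof(dst))
                    len = sizeof(dst) - 1;  /* clamp, as the caller must anyway */

            /*
             * strncpy(dst, src, len) would copy the same bytes but leave dst
             * unterminated (len == strlen(src)), which is exactly what gcc 8
             * warns about; memcpy() plus an explicit NUL is clearer.
             */
            memcpy(dst, src, len);
            dst[len] = '\0';

            puts(dst);
            return 0;
    }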

    Signed-off-by: Guenter Roeck
    Reviewed-by: Samuel Thibault
    Signed-off-by: Greg Kroah-Hartman

    Guenter Roeck
     
  • commit 320f35b7bf8cccf1997ca3126843535e1b95e9c4 upstream.

    Since commit bb21ce0ad227 we always enforce per-mirror stateid.
    However, this makes sense only for v4+ servers.

    Signed-off-by: Tigran Mkrtchyan
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Tigran Mkrtchyan
     
  • commit 0b548e33e6cb2bff240fdaf1783783be15c29080 upstream.

    Fengguang reported soft lockups while running the rbtree and interval
    tree test modules. The logic for these tests all occur in init phase,
    and we currently are pounding with the default values for number of
    nodes and number of iterations of each test. Reduce the latter by two
    orders of magnitude. This does not influence the value of the tests in
    that one thousand times by default is enough to get the picture.

    Link: http://lkml.kernel.org/r/20171109161715.xai2dtwqw2frhkcm@linux-n805
    Signed-off-by: Davidlohr Bueso
    Reported-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • [ Upstream commit c14376de3a1befa70d9811ca2872d47367b48767 ]

    wake_klogd is a local variable in console_unlock(). The information
    is lost when the console_lock owner is using the busy wait added by
    commit dbdda842fe96f8932 ("printk: Add console owner and waiter
    logic to load balance console writes"). The following race is
    possible:

    CPU0                                    CPU1

    console_unlock()

      for (;;)
         /* calling console for last message */

                                            printk()
                                              log_store()
                                                log_next_seq++;

         /* see new message */
         if (seen_seq != log_next_seq) {
            wake_klogd = true;
            seen_seq = log_next_seq;
         }

         console_lock_spinning_enable();

                                              if (console_trylock_spinning())
                                                 /* spinning */

         if (console_lock_spinning_disable_and_check()) {
            printk_safe_exit_irqrestore(flags);
            return;

                                              console_unlock()
                                                if (seen_seq != log_next_seq) {
                                                /* already seen */
                                                /* nothing to do */

    Result: Nobody would wakeup klogd.

    One solution would be to make a global variable from wake_klogd.
    But then we would need to manipulate it under a lock or so.

    This patch wakes klogd also when console_lock is passed to the
    spinning waiter. It looks like the right way to go. Also userspace
    should have a chance to see and store any "flood" of messages.

    Note that the very late klogd wake up was a historic solution.
    It made sense on single CPU systems or when sys_syslog() operations
    were synchronized using the big kernel lock like in v2.1.113.
    But it is questionable these days.

    Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
    Link: http://lkml.kernel.org/r/20180226155734.dzwg3aovqnwtvkoy@pathway.suse.cz
    Cc: Steven Rostedt
    Cc: linux-kernel@vger.kernel.org
    Cc: Tejun Heo
    Suggested-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Sasha Levin

    Petr Mladek
     
  • [ Upstream commit fd5f7cde1b85d4c8e09ca46ce948e008a2377f64 ]

    This patch, basically, reverts commit 6b97a20d3a79 ("printk:
    set may_schedule for some of console_trylock() callers").
    That commit was a mistake, it introduced a big dependency
    on the scheduler, by enabling preemption under console_sem
    in printk()->console_unlock() path, which is rather too
    critical. The patch did not significantly reduce the
    possibilities of printk() lockups, but made it possible to
    stall printk(), as has been reported by Tetsuo Handa [1].

    Another issue is that preemption under console_sem also
    messes up with Steven Rostedt's hand off scheme, by making
    it possible to sleep with console_sem both in console_unlock()
    and in vprintk_emit(), after acquiring the console_sem
    ownership (anywhere between printk_safe_exit_irqrestore() in
    console_trylock_spinning() and printk_safe_enter_irqsave()
    in console_unlock()). This makes hand off less likely and,
    at the same time, may result in a significant amount of
    pending logbuf messages. Preempted console_sem owner makes
    it impossible for other CPUs to emit logbuf messages, but
    does not make it impossible for other CPUs to append new
    messages to the logbuf.

    Reinstate the old behavior and make printk() non-preemptible.
    Should any printk() lockup reports arrive they must be handled
    in a different way.

    [1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
    Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
    Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
    To: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: Tejun Heo
    Cc: akpm@linux-foundation.org
    Cc: linux-mm@kvack.org
    Cc: Cong Wang
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Mathieu Desnoyers
    Cc: Byungchul Park
    Cc: Pavel Machek
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Reported-by: Tetsuo Handa
    Reviewed-by: Steven Rostedt (VMware)
    Signed-off-by: Petr Mladek
    Signed-off-by: Sasha Levin

    Sergey Senozhatsky
     
  • [ Upstream commit c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab ]

    The commit ("printk: Add console owner and waiter logic to load balance
    console writes") made vprintk_emit() and console_unlock() even more
    complicated.

    This patch extracts the new code into 3 helper functions. They should
    help to keep it rather self-contained. It will be easier to use and
    maintain.

    This patch just shuffles the existing code. It does not change
    the functionality.

    Link: http://lkml.kernel.org/r/20180112160837.GD24497@linux.suse
    Cc: akpm@linux-foundation.org
    Cc: linux-mm@kvack.org
    Cc: Cong Wang
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Mathieu Desnoyers
    Cc: Tetsuo Handa
    Cc: rostedt@home.goodmis.org
    Cc: Byungchul Park
    Cc: Tejun Heo
    Cc: Pavel Machek
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Steven Rostedt (VMware)
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Sasha Levin

    Petr Mladek
     
  • [ Upstream commit dbdda842fe96f8932bae554f0adf463c27c42bc7 ]

    This patch implements what I discussed in Kernel Summit. I added
    lockdep annotation (hopefully correctly), and it hasn't had any splats
    (since I fixed some bugs in the first iterations). It did catch
    problems when I had the owner covering too much. But now that the owner
    is only set when actively calling the consoles, lockdep has stayed
    quiet.

    Here's the design again:

    I added a "console_owner" which is set to a task that is actively
    writing to the consoles. It is *not* the same as the owner of the
    console_lock. It is only set when doing the calls to the console
    functions. It is protected by a console_owner_lock which is a raw spin
    lock.

    There is a console_waiter. This is set when there is an active console
    owner that is not current, and waiter is not set. This too is protected
    by console_owner_lock.

    In printk() when it tries to write to the consoles, we have:

    if (console_trylock())
            console_unlock();

    Now I added an else, which will check if there is an active owner, and
    no current waiter. If that is the case, then console_waiter is set, and
    the task goes into a spin until it is no longer set.

    When the active console owner finishes writing the current message to
    the consoles, it grabs the console_owner_lock and sees if there is a
    waiter, and clears console_owner.

    If there is a waiter, then it breaks out of the loop, clears the waiter
    flag (because that will release the waiter from its spin), and exits.
    Note, it does *not* release the console semaphore. Because it is a
    semaphore, there is no owner. Another task may release it. This means
    that the waiter is guaranteed to be the new console owner! Which it
    becomes.

    Then the waiter calls console_unlock() and continues to write to the
    consoles.

    If another task comes along and does a printk() it too can become the
    new waiter, and we wash rinse and repeat!

    By Petr Mladek about possible new deadlocks:

    The thing is that we move console_sem only to printk() call
    that normally calls console_unlock() as well. It means that
    the transferred owner should not bring new type of dependencies.
    As Steven said somewhere: "If there is a deadlock, it was
    there even before."

    We could look at it from this side. The possible deadlock would
    look like:

    CPU0                                    CPU1

    console_unlock()

      console_owner = current;

                                            spin_lockA()
                                              printk()
                                                spin = true;
                                                while (...)

      call_console_drivers()
        spin_lockA()

    This would be a deadlock. CPU0 would wait for the lock A.
    While CPU1 would own the lockA and would wait for CPU0
    to finish calling the console drivers and pass the console_sem
    owner.

    But if the above is true than the following scenario was
    already possible before:

    CPU0

    spin_lockA()
      printk()
        console_unlock()
          call_console_drivers()
            spin_lockA()

    By other words, this deadlock was there even before. Such
    deadlocks are prevented by using printk_deferred() in
    the sections guarded by the lock A.

    By Steven Rostedt:

    To demonstrate the issue, this module has been shown to lock up a
    system with 4 CPUs and a slow console (like a serial console). It is
    also able to lock up a 8 CPU system with only a fast (VGA) console, by
    passing in "loops=100". The changes in this commit prevent this module
    from locking up the system.

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/delay.h>
    #include <linux/workqueue.h>
    #include <linux/percpu.h>
    #include <linux/preempt.h>

    static bool stop_testing;
    static unsigned int loops = 1;

    static void preempt_printk_workfn(struct work_struct *work)
    {
            int i;

            while (!READ_ONCE(stop_testing)) {
                    for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
                            preempt_disable();
                            pr_emerg("%5d%-75s\n", smp_processor_id(),
                                     " XXX NOPREEMPT");
                            preempt_enable();
                    }
                    msleep(1);
            }
    }

    static struct work_struct __percpu *works;

    static void finish(void)
    {
            int cpu;

            WRITE_ONCE(stop_testing, true);
            for_each_online_cpu(cpu)
                    flush_work(per_cpu_ptr(works, cpu));
            free_percpu(works);
    }

    static int __init test_init(void)
    {
            int cpu;

            works = alloc_percpu(struct work_struct);
            if (!works)
                    return -ENOMEM;

            /*
             * This is just a test module. This will break if you
             * do any CPU hot plugging between loading and
             * unloading the module.
             */

            for_each_online_cpu(cpu) {
                    struct work_struct *work = per_cpu_ptr(works, cpu);

                    INIT_WORK(work, &preempt_printk_workfn);
                    schedule_work_on(cpu, work);
            }

            return 0;
    }

    static void __exit test_exit(void)
    {
            finish();
    }

    module_param(loops, uint, 0);
    module_init(test_init);
    module_exit(test_exit);
    MODULE_LICENSE("GPL");

    Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
    Cc: akpm@linux-foundation.org
    Cc: linux-mm@kvack.org
    Cc: Cong Wang
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Mathieu Desnoyers
    Cc: Tetsuo Handa
    Cc: Byungchul Park
    Cc: Tejun Heo
    Cc: Pavel Machek
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Steven Rostedt (VMware)
    [pmladek@suse.com: Commit message about possible deadlocks]
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Sasha Levin

    Steven Rostedt (VMware)
     
  • This reverts commit c9b8d580b3fb0ab65d37c372aef19a318fda3199.

    This is just a technical revert to make the printk fix apply cleanly,
    this patch will be re-picked in about 3 commits.

    Sasha Levin
     
  • [ Upstream commit 164f7e586739d07eb56af6f6d66acebb11f315c8 ]

    ocfs2_get_dentry() calls iput(inode) to drop the reference count of
    inode, and if the reference count hits 0, inode is freed. However, in
    this function, it then reads inode->i_generation, which may result in a
    use after free bug. Move the put operation later.
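
    The pattern of the fix, in a small standalone sketch with a toy refcounted
    object (not the ocfs2 code): capture what you need before dropping the
    reference, or equivalently drop the reference only after the last use.

    #include <stdio.h>
    #include <stdlib.h>

    struct inode {
            int refcount;
            unsigned int i_generation;
    };

    static void iput(struct inode *inode)
    {
            if (--inode->refcount == 0)
                    free(inode);            /* inode may be gone after this */
    }

    int main(void)
    {
            struct inode *inode = malloc(sizeof(*inode));
            unsigned int gen;

            if (!inode)
                    return 1;
            inode->refcount = 1;
            inode->i_generation = 42;

            /*
             * Buggy order: iput(inode) followed by inode->i_generation is a
             * use-after-free. Fixed order: read the field first, put last.
             */
            gen = inode->i_generation;
            iput(inode);

            printf("generation %u\n", gen);
            return 0;
    }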

    Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
    Fixes: 781f200cb7a ("ocfs2: Remove masklog ML_EXPORT.")
    Signed-off-by: Pan Bian
    Reviewed-by: Andrew Morton
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Pan Bian
     
  • [ Upstream commit 8de456cf87ba863e028c4dd01bae44255ce3d835 ]

    CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to
    recursive calls.

    fill_pool
    kmemleak_ignore
    make_black_object
    put_object
    __call_rcu (kernel/rcu/tree.c)
    debug_rcu_head_queue
    debug_object_activate
    debug_object_init
    fill_pool
    kmemleak_ignore
    make_black_object
    ...

    So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly
    allocated debug objects at all.

    Link: http://lkml.kernel.org/r/20181126165343.2339-1-cai@gmx.us
    Signed-off-by: Qian Cai
    Suggested-by: Catalin Marinas
    Acked-by: Waiman Long
    Acked-by: Catalin Marinas
    Cc: Thomas Gleixner
    Cc: Yang Shi
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Qian Cai
     
  • [ Upstream commit c7d7d620dcbd2a1c595092280ca943f2fced7bbd ]

    hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it then
    reads node->this when dumping an error message on an error path, which may
    result in a use-after-free bug. This patch frees the node only when it is
    no longer used.

    Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com
    Signed-off-by: Pan Bian
    Reviewed-by: Andrew Morton
    Cc: Ernesto A. Fernandez
    Cc: Joe Perches
    Cc: Viacheslav Dubeyko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Pan Bian
     
  • [ Upstream commit ce96a407adef126870b3f4a1b73529dd8aa80f49 ]

    hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it
    then reads node->this when dumping an error message on an error path, which
    may result in a use-after-free bug. This patch frees the node only when
    it is never again used.

    Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com
    Fixes: a1185ffa2fc ("HFS rewrite")
    Signed-off-by: Pan Bian
    Reviewed-by: Andrew Morton
    Cc: Joe Perches
    Cc: Ernesto A. Fernandez
    Cc: Viacheslav Dubeyko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Pan Bian
     
  • [ Upstream commit 8f416836c0d50b198cad1225132e5abebf8980dc ]

    init_currently_empty_zone() will adjust pgdat->nr_zones and set it to
    'zone_idx(zone) + 1' unconditionally. This is correct in the normal
    case, but not exact in the hot-plug situation.

    This function is used in two places:

    * free_area_init_core()
    * move_pfn_range_to_zone()

    In the first case, we are sure the zone index increases monotonically,
    while in the second one this is under the user's control.

    One way to reproduce this is:
    ----------------------------

    1. create a virtual machine with empty node1

    -m 4G,slots=32,maxmem=32G \
    -smp 4,maxcpus=8 \
    -numa node,nodeid=0,mem=4G,cpus=0-3 \
    -numa node,nodeid=1,mem=0G,cpus=4-7

    2. hot-add cpu 3-7

    cpu-add [3-7]

    3. hot-add memory to node1

    object_add memory-backend-ram,id=ram0,size=1G
    device_add pc-dimm,id=dimm0,memdev=ram0,node=1

    4. online the memory in the following order

    echo online_movable > memory47/state
    echo online > memory40/state

    After this, node1 will have its nr_zones equal to (ZONE_NORMAL + 1)
    instead of (ZONE_MOVABLE + 1).

    Michal said:
    "Having an incorrect nr_zones might result in all sorts of problems
    which would be quite hard to debug (e.g. reclaim not considering the
    movable zone). I do not expect many users would suffer from this it
    but still this is trivial and obviously right thing to do so
    backporting to the stable tree shouldn't be harmful (last famous
    words)"

    Link: http://lkml.kernel.org/r/20181117022022.9956-1-richard.weiyang@gmail.com
    Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")
    Signed-off-by: Wei Yang
    Acked-by: Michal Hocko
    Reviewed-by: Oscar Salvador
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Signed-off-by: Sasha Levin

    Wei Yang