01 Apr, 2020

13 commits

  • [ Upstream commit 6002059d7882c3512e6ac52fa82424272ddfcd5c ]

    During initialization the driver issues a software reset command and
    then waits for the system status to change back to "ready" state.

    However, before issuing the reset command the driver does not check that
    the system is actually in "ready" state. On Spectrum-{1,2} systems this
    was always the case as the hardware initialization time is very short.
    On Spectrum-3 systems this is no longer the case. This results in the
    software reset command timing-out and the driver failing to load:

    [ 6.347591] mlxsw_spectrum3 0000:06:00.0: Cmd exec timed-out (opcode=40(ACCESS_REG),opcode_mod=0,in_mod=0)
    [ 6.358382] mlxsw_spectrum3 0000:06:00.0: Reg cmd access failed (reg_id=9023(mrsr),type=write)
    [ 6.368028] mlxsw_spectrum3 0000:06:00.0: cannot register bus device
    [ 6.375274] mlxsw_spectrum3: probe of 0000:06:00.0 failed with error -110

    Fix this by waiting for the system to become ready both before issuing
    the reset command and afterwards. In case of failure, print the last
    system status to aid in debugging.
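
    A minimal sketch of the wait, reset, wait sequence described above. It
    assumes a hypothetical device model (fake_dev, read_status) that counts
    polls; the real driver reads a firmware status register and uses a
    time-based timeout, so this only illustrates the ordering.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical status values for this sketch. */
    enum sys_status { SYS_BOOTING = 0, SYS_READY = 1 };

    /* Hypothetical device model: reports "ready" after a number of polls. */
    struct fake_dev {
        int polls_until_ready;
    };

    static enum sys_status read_status(struct fake_dev *dev)
    {
        if (dev->polls_until_ready > 0) {
            dev->polls_until_ready--;
            return SYS_BOOTING;
        }
        return SYS_READY;
    }

    /* Poll until ready or until max_polls attempts are used up.
     * *last keeps the last status seen, for the error message. */
    static bool wait_ready(struct fake_dev *dev, int max_polls,
                           enum sys_status *last)
    {
        assert(dev && last);
        for (int i = 0; i < max_polls; i++) {
            *last = read_status(dev);
            if (*last == SYS_READY)
                return true;
        }
        return false;
    }

    /* Returns 0 on success, -1 on timeout. */
    static int sw_reset(struct fake_dev *dev, int max_polls, int reset_cost)
    {
        enum sys_status last = SYS_BOOTING;

        /* Added step: do not issue the reset until the device is ready. */
        if (!wait_ready(dev, max_polls, &last)) {
            fprintf(stderr, "not ready before reset (status=%d)\n", last);
            return -1;
        }

        /* Issue the reset: the device is busy again for a while. */
        dev->polls_until_ready = reset_cost;

        if (!wait_ready(dev, max_polls, &last)) {
            fprintf(stderr, "not ready after reset (status=%d)\n", last);
            return -1;
        }
        return 0;
    }
    ```

    Both failure paths print the last observed status, mirroring the
    debugging aid mentioned in the commit message.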

    Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ido Schimmel
     
  • [ Upstream commit b06d072ccc4b1acd0147b17914b7ad1caa1818bb ]

    Only attach macsec to ethernet devices.

    Syzbot was able to trigger a KMSAN warning in macsec_handle_frame
    by attaching to a phonet device.

    Macvlan has a similar check in macvlan_port_create.
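
    The kind of check described above can be sketched as follows. The
    types and names here (dev_type, netdev, attach) are hypothetical
    stand-ins, not the kernel's definitions; the sketch only shows
    rejecting a non-ethernet lower device before attaching.

    ```c
    #include <assert.h>
    #include <errno.h>

    /* Hypothetical link types for this sketch. */
    enum dev_type { DEV_ETHER, DEV_PHONET };

    struct netdev {
        enum dev_type type;
    };

    /* Returns 0 if the lower device is acceptable, -EINVAL otherwise. */
    static int attach(const struct netdev *real_dev)
    {
        assert(real_dev);
        if (real_dev->type != DEV_ETHER)
            return -EINVAL;
        return 0;
    }
    ```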

    v1->v2
    - fix commit message typo

    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit dddeb30bfc43926620f954266fd12c65a7206f07 ]

    The call chain

    inet_dump_fib()
    fib_table_dump()
    fn_trie_dump_leaf()
    hlist_for_each_entry_rcu()

    runs without rcu_read_lock() held, which triggers a warning:

    WARNING: suspicious RCU usage
    -----------------------------
    net/ipv4/fib_trie.c:2216 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/1923:
    #0: ffffffff8ce76e40 (rtnl_mutex){+.+.}, at: netlink_dump+0xd6/0x840

    Call Trace:
    dump_stack+0xa1/0xea
    lockdep_rcu_suspicious+0x103/0x10d
    fn_trie_dump_leaf+0x581/0x590
    fib_table_dump+0x15f/0x220
    inet_dump_fib+0x4ad/0x5d0
    netlink_dump+0x350/0x840
    __netlink_dump_start+0x315/0x3e0
    rtnetlink_rcv_msg+0x4d1/0x720
    netlink_rcv_skb+0xf0/0x220
    rtnetlink_rcv+0x15/0x20
    netlink_unicast+0x306/0x460
    netlink_sendmsg+0x44b/0x770
    __sys_sendto+0x259/0x270
    __x64_sys_sendto+0x80/0xa0
    do_syscall_64+0x69/0xf4
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    Fixes: 18a8021a7be3 ("net/ipv4: Plumb support for filtering route dumps")
    Signed-off-by: Qian Cai
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Qian Cai
     
  • [ Upstream commit 3a303cfdd28d5f930a307c82e8a9d996394d5ebd ]

    port->hsr is used in hsr_handle_frame(), which is the rx_handler
    callback.
    The hsr master and slaves are initialized in hsr_add_port().
    This function initializes several pointers, including port->hsr, after
    registering the rx_handler.
    So the rx_handler routine could use an uninitialized pointer.
    To fix this, the pointers should be initialized before
    registering the rx_handler.
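
    A single-threaded sketch of the ordering, under the assumption that a
    frame can arrive as soon as the handler is registered. All names
    (hsr_priv, port, deliver_frame, add_port) are hypothetical and much
    simpler than the real structures; the real code also has to deal with
    concurrency, which this sketch does not model.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct hsr_priv {
        int id;
    };

    struct port {
        struct hsr_priv *hsr;                 /* back-pointer used by the handler */
        int (*rx_handler)(struct port *port); /* NULL until registered */
    };

    /* Receive callback: dereferences port->hsr. */
    static int handle_frame(struct port *port)
    {
        return port->hsr ? port->hsr->id : -1; /* -1 marks the would-be crash */
    }

    /* Stand-in for a frame arriving; returns 0 if no handler is registered. */
    static int deliver_frame(struct port *port)
    {
        return port->rx_handler ? port->rx_handler(port) : 0;
    }

    static void add_port(struct port *port, struct hsr_priv *hsr)
    {
        assert(port && hsr);
        port->hsr = hsr;                 /* 1. initialize every pointer first */
        port->rx_handler = handle_frame; /* 2. only then publish the handler */
    }
    ```

    With the two assignments swapped, a frame delivered between them would
    reach handle_frame() with port->hsr still unset.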

    Test commands:
    ip netns del left
    ip netns del right
    modprobe -rv veth
    modprobe -rv hsr
    killall ping
    modprobe hsr
    ip netns add left
    ip netns add right
    ip link add veth0 type veth peer name veth1
    ip link add veth2 type veth peer name veth3
    ip link add veth4 type veth peer name veth5
    ip link set veth1 netns left
    ip link set veth3 netns right
    ip link set veth4 netns left
    ip link set veth5 netns right
    ip link set veth0 up
    ip link set veth2 up
    ip link set veth0 address fc:00:00:00:00:01
    ip link set veth2 address fc:00:00:00:00:02
    ip netns exec left ip link set veth1 up
    ip netns exec left ip link set veth4 up
    ip netns exec right ip link set veth3 up
    ip netns exec right ip link set veth5 up
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip netns exec left ip link add hsr1 type hsr slave1 veth1 slave2 veth4
    ip netns exec left ip a a 192.168.100.2/24 dev hsr1
    ip netns exec left ip link set hsr1 up
    ip netns exec left ip n a 192.168.100.1 dev hsr1 lladdr \
    fc:00:00:00:00:01 nud permanent
    ip netns exec left ip n r 192.168.100.1 dev hsr1 lladdr \
    fc:00:00:00:00:01 nud permanent
    for i in {1..100}
    do
    ip netns exec left ping 192.168.100.1 &
    done
    ip netns exec left hping3 192.168.100.1 -2 --flood &
    ip netns exec right ip link add hsr2 type hsr slave1 veth3 slave2 veth5
    ip netns exec right ip a a 192.168.100.3/24 dev hsr2
    ip netns exec right ip link set hsr2 up
    ip netns exec right ip n a 192.168.100.1 dev hsr2 lladdr \
    fc:00:00:00:00:02 nud permanent
    ip netns exec right ip n r 192.168.100.1 dev hsr2 lladdr \
    fc:00:00:00:00:02 nud permanent
    for i in {1..100}
    do
    ip netns exec right ping 192.168.100.1 &
    done
    ip netns exec right hping3 192.168.100.1 -2 --flood &
    while :
    do
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip link del hsr0
    done

    Splat looks like:
    [ 120.954938][ C0] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1]I
    [ 120.957761][ C0] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
    [ 120.959064][ C0] CPU: 0 PID: 1511 Comm: hping3 Not tainted 5.6.0-rc5+ #460
    [ 120.960054][ C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 120.962261][ C0] RIP: 0010:hsr_addr_is_self+0x65/0x2a0 [hsr]
    [ 120.963149][ C0] Code: 44 24 18 70 73 2f c0 48 c1 eb 03 48 8d 04 13 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2 f2 4
    [ 120.966277][ C0] RSP: 0018:ffff8880d9c09af0 EFLAGS: 00010206
    [ 120.967293][ C0] RAX: 0000000000000006 RBX: 1ffff1101b38135f RCX: 0000000000000000
    [ 120.968516][ C0] RDX: dffffc0000000000 RSI: ffff8880d17cb208 RDI: 0000000000000000
    [ 120.969718][ C0] RBP: 0000000000000030 R08: ffffed101b3c0e3c R09: 0000000000000001
    [ 120.972203][ C0] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 0000000000000000
    [ 120.973379][ C0] R13: ffff8880aaf80100 R14: ffff8880aaf800f2 R15: ffff8880aaf80040
    [ 120.974410][ C0] FS: 00007f58e693f740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
    [ 120.979794][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 120.980773][ C0] CR2: 00007ffcb8b38f29 CR3: 00000000afe8e001 CR4: 00000000000606f0
    [ 120.981945][ C0] Call Trace:
    [ 120.982411][ C0]
    [ 120.982848][ C0] ? hsr_add_node+0x8c0/0x8c0 [hsr]
    [ 120.983522][ C0] ? rcu_read_lock_held+0x90/0xa0
    [ 120.984159][ C0] ? rcu_read_lock_sched_held+0xc0/0xc0
    [ 120.984944][ C0] hsr_handle_frame+0x1db/0x4e0 [hsr]
    [ 120.985597][ C0] ? hsr_nl_nodedown+0x2b0/0x2b0 [hsr]
    [ 120.986289][ C0] __netif_receive_skb_core+0x6bf/0x3170
    [ 120.992513][ C0] ? check_chain_key+0x236/0x5d0
    [ 120.993223][ C0] ? do_xdp_generic+0x1460/0x1460
    [ 120.993875][ C0] ? register_lock_class+0x14d0/0x14d0
    [ 120.994609][ C0] ? __netif_receive_skb_one_core+0x8d/0x160
    [ 120.995377][ C0] __netif_receive_skb_one_core+0x8d/0x160
    [ 120.996204][ C0] ? __netif_receive_skb_core+0x3170/0x3170
    [ ... ]

    Reported-by: syzbot+fcf5dd39282ceb27108d@syzkaller.appspotmail.com
    Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 0fda7600c2e174fe27e9cf02e78e345226e441fa ]

    The debug check must be done after unregister_netdevice_many() call --
    the list_del() for this is done inside .ndo_stop.

    Fixes: 2843a25348f8 ("geneve: speedup geneve tunnels dismantle")
    Reported-and-tested-by:
    Cc: Haishuang Yan
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit f1f20a8666c55cb534b8f3fc1130eebf01a06155 ]

    During backpressure, the driver reclaims descriptors in much smaller
    batches, even if the hardware indicates more to reclaim. So, fix the
    check that restarts the Txq during backpressure to look at how many
    descriptors the hardware has indicated can be reclaimed, not at how
    many descriptors the driver has actually reclaimed. Once the Txq is
    restarted, the driver will reclaim even more descriptors when the Tx
    path is entered again.
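
    The difference between the two checks can be sketched as below. The
    batch limit, the half-ring restart threshold and all names (txq,
    tx_completion, RECLAIM_BATCH) are assumptions made for illustration,
    not values taken from the driver.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define RECLAIM_BATCH 16u /* hypothetical per-call reclaim limit */

    struct txq {
        unsigned int size;   /* total descriptors in the ring */
        unsigned int in_use; /* descriptors the driver has not reclaimed yet */
        bool stopped;        /* true while the queue is under backpressure */
    };

    /* hw_done: descriptors the hardware reports as completed.
     * Returns how many descriptors were reclaimed in this call. */
    static unsigned int tx_completion(struct txq *q, unsigned int hw_done)
    {
        assert(q && hw_done <= q->in_use);
        unsigned int batch = hw_done < RECLAIM_BATCH ? hw_done : RECLAIM_BATCH;
        unsigned int hw_in_use = q->in_use - hw_done;

        q->in_use -= batch;

        /* Decide on the hardware's view, not on the smaller batch. */
        if (q->stopped && hw_in_use < q->size / 2)
            q->stopped = false;
        return batch;
    }
    ```

    A check based on q->in_use after the batch would keep the queue
    stopped here even though the hardware has already freed most of the
    ring.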

    Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rahul Lakkireddy
     
  • [ Upstream commit 7affd80802afb6ca92dba47d768632fbde365241 ]

    commit 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page")
    reverted back to getting Tx CIDX updates via DMA, instead of interrupts,
    introduced by commit d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE
    doorbell queue timer")

    However, it missed reverting back several code changes where Tx CIDX
    updates are not explicitly requested during backpressure when using
    interrupt mode. These missed changes cause slow recovery during
    backpressure because the corresponding interrupt no longer comes and
    hence results in Tx throughput drop.

    So, revert these missed code changes as well, which allows
    explicitly requesting Tx CIDX updates when backpressure happens.
    This lets the corresponding interrupt with the Tx CIDX update message
    be generated, which speeds up recovery and restores
    throughput.

    Fixes: 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page")
    Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rahul Lakkireddy
     
  • commit 024aa8732acb7d2503eae43c3fe3504d0a8646d0 upstream.

    Note that the EC GPE processing need not be synchronized in
    acpi_s2idle_wake() after invoking acpi_ec_dispatch_gpe(), because
    that function checks the GPE status and dispatches its handler if
    need be and the SCI action handler is not going to run anyway at
    that point.

    Moreover, it is better to drain all of the pending ACPI events
    before restoring the working-state configuration of GPEs in
    acpi_s2idle_restore(), because those events are likely to be related
    to system wakeup, in which case they will not be relevant going
    forward.

    Rework the code to take these observations into account.

    Tested-by: Kenneth R. Crudup
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • [ Upstream commit d2f8bfa4bff5028bc40ed56b4497c32e05b0178f ]

    It has turned out that the sdhci-tegra controller requires the R1B response,
    for commands that have this response associated with them. So, converting
    from an R1B to an R1 response for a CMD6 for example, leads to problems
    with the HW busy detection support.

    Fix this by informing the mmc core about the requirement, via setting the
    host cap, MMC_CAP_NEED_RSP_BUSY.

    Reported-by: Bitan Biswas
    Reported-by: Peter Geis
    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Sowjanya Komatineni
    Tested-By: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 055e04830d4544c57f2a5192a26c9e25915c29c0 ]

    It has turned out that the sdhci-omap controller requires the R1B response,
    for commands that have this response associated with them. So, converting
    from an R1B to an R1 response for a CMD6 for example, leads to problems
    with the HW busy detection support.

    Fix this by informing the mmc core about the requirement, via setting the
    host cap, MMC_CAP_NEED_RSP_BUSY.

    Reported-by: Naresh Kamboju
    Reported-by: Anders Roxell
    Reported-by: Faiz Abbas
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Faiz Abbas
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 18d200460cd73636d4f20674085c39e32b4e0097 ]

    The busy timeout for the CMD5 to put the eMMC into sleep state, is specific
    to the card. Potentially the timeout may exceed the host->max_busy_timeout.
    If that becomes the case, mmc_sleep() converts from using an R1B response
    to an R1 response, as to prevent the host from doing HW busy detection.

    However, it has turned out that some hosts require an R1B response no
    matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
    that, if the R1B gets enforced, the host becomes fully responsible for
    managing the needed busy timeout, in one way or the other.

    Suggested-by: Sowjanya Komatineni
    Cc:
    Link: https://lore.kernel.org/r/20200311092036.16084-1-ulf.hansson@linaro.org
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 43cc64e5221cc6741252b64bc4531dd1eefb733d ]

    The busy timeout that is computed for each erase/trim/discard operation,
    can become quite long and may thus exceed the host->max_busy_timeout. If
    that becomes the case, mmc_do_erase() converts from using an R1B response
    to an R1 response, as to prevent the host from doing HW busy detection.

    However, it has turned out that some hosts require an R1B response no
    matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
    that, if the R1B gets enforced, the host becomes fully responsible for
    managing the needed busy timeout, in one way or the other.

    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Sowjanya Komatineni
    Tested-by: Faiz Abbas
    Tested-By: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 1292e3efb149ee21d8d33d725eeed4e6b1ade963 ]

    It has turned out that some host controllers require R1B for CMD6 and
    other commands that have R1B associated with them. Therefore invent a new
    host cap, MMC_CAP_NEED_RSP_BUSY, to let them specify this.

    In __mmc_switch(), let's check the flag and use it to prevent R1B responses
    from being converted into R1. Note that this also means that the host is
    on its own when it comes to managing the busy timeout.
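
    The decision described above can be sketched as a small predicate.
    The flag value, the struct and the function name are hypothetical;
    treating a zero max_busy_timeout_ms as "no limit" is an assumption of
    this sketch.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical flag value for illustration; not the kernel's definition. */
    #define CAP_NEED_RSP_BUSY (1u << 0)

    enum rsp_type { RSP_R1, RSP_R1B };

    struct host {
        unsigned int caps;
        unsigned int max_busy_timeout_ms; /* 0 means "no limit" in this sketch */
    };

    /* Pick the response type for a command that normally uses R1B. */
    static enum rsp_type pick_response(const struct host *host,
                                       unsigned int timeout_ms)
    {
        assert(host);
        bool too_long = host->max_busy_timeout_ms &&
                        timeout_ms > host->max_busy_timeout_ms;

        /* Fall back to R1 only when the host tolerates it. */
        if (too_long && !(host->caps & CAP_NEED_RSP_BUSY))
            return RSP_R1;
        return RSP_R1B;
    }
    ```

    A host that sets the capability always gets R1B and therefore has to
    handle an over-long busy timeout itself.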

    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Sowjanya Komatineni
    Tested-by: Faiz Abbas
    Tested-By: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     

25 Mar, 2020

27 commits

  • Greg Kroah-Hartman
     
  • commit ae62cf5eb2792d9a818c2d93728ed92119357017 upstream.

    Newer GCC warns about possible truncations of two generated path names as
    we're concatenating the configurable sysfs and debugfs path prefixes
    with a filename and placing the results in buffers of the same size as
    the maximum length of the prefixes.

    snprintf(d->name, MAX_STR_LEN, "gb_loopback%u", dev_id);

    snprintf(d->sysfs_entry, MAX_SYSFS_PATH, "%s%s/",
    t->sysfs_prefix, d->name);

    snprintf(d->debugfs_entry, MAX_SYSFS_PATH, "%sraw_latency_%s",
    t->debugfs_prefix, d->name);

    Fix this by separating the maximum path length from the maximum prefix
    length and reducing the latter enough to fit the generated strings.

    Note that we also need to reduce the device-name buffer size, as GCC
    isn't smart enough to figure out that we only ever use MAX_STR_LEN
    bytes of it.

    Fixes: 6b0658f68786 ("greybus: tools: Add tools directory to greybus repo and add loopback")
    Signed-off-by: Johan Hovold
    Link: https://lore.kernel.org/r/20200312110151.22028-4-johan@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit f16023834863932f95dfad13fac3fc47f77d2f29 upstream.

    Newer GCC warns about a possible truncation of a generated sysfs path
    name as we're concatenating a directory path with a file name and
    placing the result in a buffer that is half the size of the maximum
    length of the directory path (which is user controlled).

    loopback_test.c: In function 'open_poll_files':
    loopback_test.c:651:31: warning: '%s' directive output may be truncated writing up to 511 bytes into a region of size 255 [-Wformat-truncation=]
    651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
    | ^~
    loopback_test.c:651:3: note: 'snprintf' output between 16 and 527 bytes into a destination of size 255
    651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Fix this by making sure the buffer is large enough for the concatenated
    strings.
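
    Independently of buffer sizing, truncation can also be detected at
    run time from the snprintf() return value. A minimal sketch, with a
    hypothetical helper name (build_path) that is not part of the tool:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Join prefix and name into dst as "<prefix><name>/".
     * Returns 0 on success, -1 if the result would not fit in dst. */
    static int build_path(char *dst, size_t dst_len,
                          const char *prefix, const char *name)
    {
        assert(dst && prefix && name);
        int n = snprintf(dst, dst_len, "%s%s/", prefix, name);

        /* snprintf() returns the length it wanted to write, excluding
         * the terminating NUL; n >= dst_len means the output was cut. */
        return (n < 0 || (size_t)n >= dst_len) ? -1 : 0;
    }
    ```

    This does not silence the compiler warning by itself; it only makes a
    truncated path an explicit error instead of a silently wrong file name.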

    Fixes: 6b0658f68786 ("greybus: tools: Add tools directory to greybus repo and add loopback")
    Fixes: 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify")
    Signed-off-by: Johan Hovold
    Link: https://lore.kernel.org/r/20200312110151.22028-3-johan@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit e8dca30f7118461d47e1c3510d0e31b277439151 upstream.

    CTA-861-F explicitly states that for RGB colorspace colorimetry should
    be set to "none". Fix that.

    Acked-by: Laurent Pinchart
    Fixes: def23aa7e982 ("drm: bridge: dw-hdmi: Switch to V4L bus format and encodings")
    Signed-off-by: Jernej Skrabec
    Link: https://patchwork.freedesktop.org/patch/msgid/20200304232512.51616-2-jernej.skrabec@siol.net
    Signed-off-by: Greg Kroah-Hartman

    Jernej Skrabec
     
  • commit 98fd5c723730f560e5bea919a64ac5b83d45eb72 upstream.

    When we send PDU data, we want to optimize the tcp stack
    operation if we have more data to send. So we set MSG_MORE
    when:
    - We have more fragments coming in the batch, or
    - We have more data to send in this PDU
    - We don't have a data digest trailer
    - We optimize with the SUCCESS flag and omit the NVMe completion
    (used if sq_head pointer update is disabled)

    This addresses a regression in QD=1 with SUCCESS flag optimization
    as we unconditionally set MSG_MORE when we didn't actually have
    more data to send.

    Fixes: 70583295388a ("nvmet-tcp: implement C2HData SUCCESS optimization")
    Reported-by: Mark Wunderlich
    Tested-by: Mark Wunderlich
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Keith Busch
    Signed-off-by: Greg Kroah-Hartman

    Sagi Grimberg
     
  • commit f50b7dacccbab2b9e3ef18f52a6dcc18ed2050b9 upstream.

    On a system configured to trigger a crash_kexec() reboot, when only one CPU
    is online and another CPU panics while starting up, crash_smp_send_stop()
    will fail to send any STOP message to the other, already online core,
    so that core fails to freeze and its registers are not properly saved.

    Moreover, even if the proper messages are sent (case CPUs > 2),
    it will similarly fail to account for the booting CPU when executing
    the final stop wait-loop, potentially resulting in some CPU not
    being waited for to shut down before rebooting.

    A tangible effect of this behaviour can be observed when, after a panic
    with kexec enabled and loaded, on the following reboot triggered by kexec,
    the cpu that could not be successfully stopped fails to come back online:

    [ 362.291022] ------------[ cut here ]------------
    [ 362.291525] kernel BUG at arch/arm64/kernel/cpufeature.c:886!
    [ 362.292023] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    [ 362.292400] Modules linked in:
    [ 362.292970] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.6.0-rc4-00003-gc780b890948a #105
    [ 362.293136] Hardware name: Foundation-v8A (DT)
    [ 362.293382] pstate: 200001c5 (nzCv dAIF -PAN -UAO)
    [ 362.294063] pc : has_cpuid_feature+0xf0/0x348
    [ 362.294177] lr : verify_local_elf_hwcaps+0x84/0xe8
    [ 362.294280] sp : ffff800011b1bf60
    [ 362.294362] x29: ffff800011b1bf60 x28: 0000000000000000
    [ 362.294534] x27: 0000000000000000 x26: 0000000000000000
    [ 362.294631] x25: 0000000000000000 x24: ffff80001189a25c
    [ 362.294718] x23: 0000000000000000 x22: 0000000000000000
    [ 362.294803] x21: ffff8000114aa018 x20: ffff800011156a00
    [ 362.294897] x19: ffff800010c944a0 x18: 0000000000000004
    [ 362.294987] x17: 0000000000000000 x16: 0000000000000000
    [ 362.295073] x15: 00004e53b831ae3c x14: 00004e53b831ae3c
    [ 362.295165] x13: 0000000000000384 x12: 0000000000000000
    [ 362.295251] x11: 0000000000000000 x10: 00400032b5503510
    [ 362.295334] x9 : 0000000000000000 x8 : ffff800010c7e204
    [ 362.295426] x7 : 00000000410fd0f0 x6 : 0000000000000001
    [ 362.295508] x5 : 00000000410fd0f0 x4 : 0000000000000000
    [ 362.295592] x3 : 0000000000000000 x2 : ffff8000100939d8
    [ 362.295683] x1 : 0000000000180420 x0 : 0000000000180480
    [ 362.296011] Call trace:
    [ 362.296257] has_cpuid_feature+0xf0/0x348
    [ 362.296350] verify_local_elf_hwcaps+0x84/0xe8
    [ 362.296424] check_local_cpu_capabilities+0x44/0x128
    [ 362.296497] secondary_start_kernel+0xf4/0x188
    [ 362.296998] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000)
    [ 362.298652] SMP: stopping secondary CPUs
    [ 362.300615] Starting crashdump kernel...
    [ 362.301168] Bye!
    [ 0.000000] Booting Linux on physical CPU 0x0000000003 [0x410fd0f0]
    [ 0.000000] Linux version 5.6.0-rc4-00003-gc780b890948a (crimar01@e120937-lin) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #105 SMP PREEMPT Fri Mar 6 17:00:42 GMT 2020
    [ 0.000000] Machine model: Foundation-v8A
    [ 0.000000] earlycon: pl11 at MMIO 0x000000001c090000 (options '')
    [ 0.000000] printk: bootconsole [pl11] enabled
    .....
    [ 0.138024] rcu: Hierarchical SRCU implementation.
    [ 0.153472] its@2f020000: unable to locate ITS domain
    [ 0.154078] its@2f020000: Unable to locate ITS domain
    [ 0.157541] EFI services will not be available.
    [ 0.175395] smp: Bringing up secondary CPUs ...
    [ 0.209182] psci: failed to boot CPU1 (-22)
    [ 0.209377] CPU1: failed to boot: -22
    [ 0.274598] Detected PIPT I-cache on CPU2
    [ 0.278707] GICv3: CPU2: found redistributor 1 region 0:0x000000002f120000
    [ 0.285212] CPU2: Booted secondary processor 0x0000000001 [0x410fd0f0]
    [ 0.369053] Detected PIPT I-cache on CPU3
    [ 0.372947] GICv3: CPU3: found redistributor 2 region 0:0x000000002f140000
    [ 0.378664] CPU3: Booted secondary processor 0x0000000002 [0x410fd0f0]
    [ 0.401707] smp: Brought up 1 node, 3 CPUs
    [ 0.404057] SMP: Total of 3 processors activated.

    Make crash_smp_send_stop() account also for the online status of the
    calling CPU while evaluating how many CPUs are effectively online: this way
    the right number of STOPs is sent and the registers of all other stopped
    cores are properly saved.

    Fixes: 78fd584cdec05 ("arm64: kdump: implement machine_crash_shutdown()")
    Acked-by: Mark Rutland
    Signed-off-by: Cristian Marussi
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Cristian Marussi
     
  • commit d0bab0c39e32d39a8c5cddca72e5b4a3059fe050 upstream.

    On a system with only one CPU online, when another CPU panics while
    starting up, smp_send_stop() will fail to send any STOP message to the
    other, already online core, resulting in a system still responsive and
    alive at the end of the panic procedure.

    [ 186.700083] CPU3: shutdown
    [ 187.075462] CPU2: shutdown
    [ 187.162869] CPU1: shutdown
    [ 188.689998] ------------[ cut here ]------------
    [ 188.691645] kernel BUG at arch/arm64/kernel/cpufeature.c:886!
    [ 188.692079] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
    [ 188.692444] Modules linked in:
    [ 188.693031] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-rc4-00001-g338d25c35a98 #104
    [ 188.693175] Hardware name: Foundation-v8A (DT)
    [ 188.693492] pstate: 200001c5 (nzCv dAIF -PAN -UAO)
    [ 188.694183] pc : has_cpuid_feature+0xf0/0x348
    [ 188.694311] lr : verify_local_elf_hwcaps+0x84/0xe8
    [ 188.694410] sp : ffff800011b1bf60
    [ 188.694536] x29: ffff800011b1bf60 x28: 0000000000000000
    [ 188.694707] x27: 0000000000000000 x26: 0000000000000000
    [ 188.694801] x25: 0000000000000000 x24: ffff80001189a25c
    [ 188.694905] x23: 0000000000000000 x22: 0000000000000000
    [ 188.694996] x21: ffff8000114aa018 x20: ffff800011156a38
    [ 188.695089] x19: ffff800010c944a0 x18: 0000000000000004
    [ 188.695187] x17: 0000000000000000 x16: 0000000000000000
    [ 188.695280] x15: 0000249dbde5431e x14: 0262cbe497efa1fa
    [ 188.695371] x13: 0000000000000002 x12: 0000000000002592
    [ 188.695472] x11: 0000000000000080 x10: 00400032b5503510
    [ 188.695572] x9 : 0000000000000000 x8 : ffff800010c80204
    [ 188.695659] x7 : 00000000410fd0f0 x6 : 0000000000000001
    [ 188.695750] x5 : 00000000410fd0f0 x4 : 0000000000000000
    [ 188.695836] x3 : 0000000000000000 x2 : ffff8000100939d8
    [ 188.695919] x1 : 0000000000180420 x0 : 0000000000180480
    [ 188.696253] Call trace:
    [ 188.696410] has_cpuid_feature+0xf0/0x348
    [ 188.696504] verify_local_elf_hwcaps+0x84/0xe8
    [ 188.696591] check_local_cpu_capabilities+0x44/0x128
    [ 188.696666] secondary_start_kernel+0xf4/0x188
    [ 188.697150] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000)
    [ 188.698639] ---[ end trace 3f12ca47652f7b72 ]---
    [ 188.699160] Kernel panic - not syncing: Attempted to kill the idle task!
    [ 188.699546] Kernel Offset: disabled
    [ 188.699828] CPU features: 0x00004,20c02008
    [ 188.700012] Memory Limit: none
    [ 188.700538] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

    [root@arch ~]# echo Helo
    Helo
    [root@arch ~]# cat /proc/cpuinfo | grep proce
    processor : 0

    Make smp_send_stop() account also for the online status of the calling CPU
    while evaluating how many CPUs are effectively online: this way, the right
    number of STOPs is sent, so enforcing a proper freeze of the system at the
    end of panic even under the above conditions.
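
    The counting rule can be sketched in user space as follows. The
    online map is passed in explicitly and NR_CPUS is an arbitrary size
    chosen for this sketch; in the kernel the equivalent information
    comes from the CPU online mask.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define NR_CPUS 8 /* hypothetical size for this sketch */

    /* Number of online CPUs other than this_cpu. */
    static unsigned int num_other_online_cpus(const bool online[NR_CPUS],
                                              unsigned int this_cpu)
    {
        assert(online && this_cpu < NR_CPUS);
        unsigned int count = 0;

        for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
            if (online[cpu])
                count++;

        /* Subtract the caller only if it is itself counted as online. */
        return count - (online[this_cpu] ? 1u : 0u);
    }
    ```

    Subtracting one unconditionally gives zero when the caller is a
    secondary CPU that is still booting and only CPU 0 is online, which is
    the case in which no STOP was sent.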

    Fixes: 08e875c16a16c ("arm64: SMP support")
    Reported-by: Dave Martin
    Acked-by: Mark Rutland
    Signed-off-by: Cristian Marussi
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Cristian Marussi
     
  • commit 3b36b13d5e69d6f51ff1c55d1b404a74646c9757 upstream.

    Commit 317d9313925c ("ALSA: hda/realtek - Set default power save node to
    0") makes the ALC225 have pop noise on S3 resume and cold boot.

    So partially revert this commit for ALC225 to fix the regression.

    Fixes: 317d9313925c ("ALSA: hda/realtek - Set default power save node to 0")
    BugLink: https://bugs.launchpad.net/bugs/1866357
    Signed-off-by: Kai-Heng Feng
    Link: https://lore.kernel.org/r/20200311061328.17614-1-kai.heng.feng@canonical.com
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Kai-Heng Feng
     
  • commit 8d67743653dce5a0e7aa500fcccb237cde7ad88e upstream.

    The recent futex inode life time fix changed the ordering of the futex key
    union struct members, but forgot to adjust the hash function accordingly.

    As a result the hashing omits the leading 64bit and even hashes beyond the
    futex key causing a bad hash distribution which led to a ~100% performance
    regression.

    Hand in the futex key pointer instead of a random struct member and make
    the size calculation based on the struct offset.
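
    A sketch of hashing from the start of the key up to a field offset.
    The key layout is hypothetical and FNV-1a stands in for the kernel's
    hash function, so only the pointer and length handling is meant to
    carry over.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical key layout: two hashed words, then a trailing field. */
    struct key {
        uint64_t id;
        unsigned long address;
        unsigned int offset;
    };

    /* FNV-1a, used here as a stand-in hash over a byte range. */
    static uint32_t hash_bytes(const void *data, size_t len)
    {
        const unsigned char *p = data;
        uint32_t h = 2166136261u;

        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 16777619u;
        }
        return h;
    }

    static uint32_t hash_key(const struct key *key)
    {
        assert(key);
        /* Start at the key itself and stop where the trailing field
         * begins, so the length follows any reordering of members. */
        return hash_bytes(key, offsetof(struct key, offset)) ^ key->offset;
    }
    ```

    Passing the address of one member together with a hard-coded length
    breaks as soon as the members are reordered, which is the failure the
    commit message describes.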

    Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
    Reported-by: Rong Chen
    Decoded-by: Linus Torvalds
    Signed-off-by: Thomas Gleixner
    Tested-by: Rong Chen
    Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream.

    As reported by Jann, ihold() does not in fact guarantee inode
    persistence. And instead of making it so, replace the usage of inode
    pointers with a per boot, machine wide, unique inode identifier.

    This sequence number is global, but shared (file backed) futexes are
    rare enough that this should not become a performance issue.
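
    One way to hand out such identifiers is a global counter assigned
    lazily on first use. The sketch below uses C11 atomics and
    hypothetical names (inode_like, get_inode_seq); it illustrates the
    idea and is not the kernel's implementation.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdint.h>

    struct inode_like {
        _Atomic uint64_t seq; /* 0 means "no identifier assigned yet" */
    };

    /* Global counter shared by all objects. */
    static _Atomic uint64_t next_seq;

    static uint64_t get_inode_seq(struct inode_like *inode)
    {
        assert(inode);
        uint64_t old = atomic_load(&inode->seq);

        if (old)
            return old;

        for (;;) {
            uint64_t fresh = atomic_fetch_add(&next_seq, 1) + 1;
            uint64_t expected = 0;

            if (fresh == 0)
                continue; /* 0 is reserved, skip it on wraparound */

            /* Install the new value unless another caller won the race,
             * in which case use the value that caller installed. */
            if (atomic_compare_exchange_strong(&inode->seq, &expected, fresh))
                return fresh;
            return expected;
        }
    }
    ```

    Because the identifier is a plain number rather than a pointer, a key
    built from it cannot dangle after the object is freed.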

    Reported-by: Jann Horn
    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream.

    Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
    __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
    the vunmap() code-path. While this change was necessary to maintain
    correctness on x86-32-pae kernels, it also adds additional cycles for
    architectures that don't need it.

    Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
    severe performance regressions in micro-benchmarks because it now also
    calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But
    the vmalloc_sync_all() implementation on x86-64 is only needed for newly
    created mappings.

    To avoid the unnecessary work on x86-64 and to gain the performance
    back, split up vmalloc_sync_all() into two functions:

    * vmalloc_sync_mappings(), and
    * vmalloc_sync_unmappings()

    Most call-sites to vmalloc_sync_all() only care about new mappings being
    synchronized. The only exception is the new call-site added in the
    above mentioned commit.

    Shile Zhang directed us to a report of an 80% regression in reaim
    throughput.

    Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
    Reported-by: kernel test robot
    Reported-by: Shile Zhang
    Signed-off-by: Joerg Roedel
    Signed-off-by: Andrew Morton
    Tested-by: Borislav Petkov
    Acked-by: Rafael J. Wysocki [GHES]
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc:
    Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
    Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
    Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     
  • commit d72520ad004a8ce18a6ba6cde317f0081b27365a upstream.

    Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped
    out") supported writing THP to a swap device but forgot to upgrade an
    older commit df8c94d13c7e ("page-flags: define behavior of FS/IO-related
    flags on compound pages") which could trigger a crash during THP
    swapping out with DEBUG_VM_PGFLAGS=y,

    kernel BUG at include/linux/page-flags.h:317!

    page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
    page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0
    anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked)

    end_swap_bio_write()
      SetPageError(page)
        VM_BUG_ON_PAGE(1 && PageCompound(page))


    bio_endio+0x297/0x560
    dec_pending+0x218/0x430 [dm_mod]
    clone_endio+0xe4/0x2c0 [dm_mod]
    bio_endio+0x297/0x560
    blk_update_request+0x201/0x920
    scsi_end_request+0x6b/0x4b0
    scsi_io_completion+0x509/0x7e0
    scsi_finish_command+0x1ed/0x2a0
    scsi_softirq_done+0x1c9/0x1d0
    __blk_mqnterrupt+0xf/0x20

    Fix by checking PF_NO_TAIL in those places instead.
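
    The difference between the two flag policies can be sketched in user
    space. This is only an illustration: struct toy_page and the toy_*
    helpers are hypothetical stand-ins, not the kernel's struct page or its
    PF_* macros, and a rejected call here corresponds to the point where
    the kernel would trip VM_BUG_ON_PAGE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct toy_page {
    struct toy_page *head;  /* NULL for a base page, else the compound head */
    bool error;             /* stands in for the PG_error flag */
};

static bool toy_is_compound(const struct toy_page *p)
{
    return p->head != NULL;
}

static bool toy_is_tail(const struct toy_page *p)
{
    return p->head != NULL && p->head != p;
}

/* PF_NO_COMPOUND-like policy: reject head and tail pages alike. */
static bool toy_set_error_no_compound(struct toy_page *p)
{
    assert(p != NULL);
    if (toy_is_compound(p))
        return false;   /* the kernel would trip VM_BUG_ON_PAGE here */
    p->error = true;
    return true;
}

/* PF_NO_TAIL-like policy: reject tail pages only, so a THP head is fine. */
static bool toy_set_error_no_tail(struct toy_page *p)
{
    assert(p != NULL);
    if (toy_is_tail(p))
        return false;
    p->error = true;
    return true;
}
```

    With a head page that points to itself, the first helper refuses to set
    the flag while the second one accepts it, which is the behaviour a THP
    being written to swap needs.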

    Fixes: bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out")
    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: "Huang, Ying"
    Acked-by: Rafael Aquini
    Cc:
    Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Qian Cai
     
  • commit 0715e6c516f106ed553828a671d30ad9a3431536 upstream.

    Sachin reports [1] a crash in SLUB __slab_alloc():

    BUG: Kernel NULL pointer dereference on read at 0x000073b0
    Faulting instruction address: 0xc0000000003d55f4
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in:
    CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
    NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
    REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest)
    MSR: 8000000000009033 CR: 24004844 XER: 00000000
    CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
    GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
    GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
    GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
    GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
    GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
    GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
    GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
    GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
    NIP ___slab_alloc+0x1f4/0x760
    LR __slab_alloc+0x34/0x60
    Call Trace:
    ___slab_alloc+0x334/0x760 (unreliable)
    __slab_alloc+0x34/0x60
    __kmalloc_node+0x110/0x490
    kvmalloc_node+0x58/0x110
    mem_cgroup_css_online+0x108/0x270
    online_css+0x48/0xd0
    cgroup_apply_control_enable+0x2ec/0x4d0
    cgroup_mkdir+0x228/0x5f0
    kernfs_iop_mkdir+0x90/0xf0
    vfs_mkdir+0x110/0x230
    do_mkdirat+0xb0/0x1a0
    system_call+0x5c/0x68

    This is a PowerPC platform with the following NUMA topology:

    available: 2 nodes (0-1)
    node 0 cpus:
    node 0 size: 0 MB
    node 0 free: 0 MB
    node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    node 1 size: 35247 MB
    node 1 free: 30907 MB
    node distances:
    node   0   1
      0:  10  40
      1:  40  10

    possible numa nodes: 0-31

    This only happens with a mmotm patch "mm/memcontrol.c: allocate
    shrinker_map on appropriate NUMA node" [2] which effectively calls
    kmalloc_node for each possible node. SLUB however only allocates
    kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
    node_to_mem_node to return such valid node for other nodes since commit
    a561ce00b09e ("slub: fall back to node_to_mem_node() node if allocating
    on memoryless node"). This is however not true in this configuration
    where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
    thus it contains zeroes and get_partial() ends up accessing
    non-allocated kmem_cache_node.

    A related issue was reported by Bharata (originally by Ramachandran) [3]
    where a similar PowerPC configuration, but with a mainline kernel without
    patch [2], ends up allocating large amounts of pages by kmalloc-1k and
    kmalloc-512. This seems to have the same underlying issue with
    node_to_mem_node() not behaving as expected, and might also lead to an
    infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].

    This patch should fix both issues by not relying on node_to_mem_node()
    anymore and instead simply falling back to NUMA_NO_NODE, when
    kmalloc_node(node) is attempted for a node that's not online, or has no
    usable memory. The "usable memory" condition is also changed from
    node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
    the condition that SLUB uses to allocate kmem_cache_node structures.
    The check in get_partial() is removed completely, as the checks in
    ___slab_alloc() are now sufficient to prevent get_partial() being
    reached with an invalid node.
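
    The fallback rule described above can be sketched in user space as
    follows. The toy_* names and the bitmask are assumptions made for this
    illustration; the bitmask stands in for the N_NORMAL_MEMORY node state
    and is not how the kernel stores it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUMA_NO_NODE (-1)   /* same value as the kernel's NUMA_NO_NODE */
#define TOY_MAX_NODES    32

/* Bit n set: node n is online and has normal memory (N_NORMAL_MEMORY stand-in). */
static uint32_t toy_normal_memory_nodes;

static bool toy_node_has_normal_memory(int node)
{
    assert(node >= 0 && node < TOY_MAX_NODES);
    return (toy_normal_memory_nodes >> node) & 1u;
}

/*
 * Decide which node an allocation may target: a request for a node that is
 * offline or has no usable memory falls back to "no preference" instead of
 * being redirected through a per-node lookup table.
 */
static int toy_pick_node(int requested)
{
    if (requested != TOY_NUMA_NO_NODE &&
        !toy_node_has_normal_memory(requested))
        return TOY_NUMA_NO_NODE;
    return requested;
}
```

    In the topology from the report only node 1 has memory, so a request
    for node 0 or for any of the possible nodes 2-31 would be treated as a
    request without a node preference.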

    [1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
    [2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
    [3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
    [4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/

    Fixes: a561ce00b09e ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
    Reported-by: Sachin Sant
    Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Tested-by: Sachin Sant
    Tested-by: Bharata B Rao
    Reviewed-by: Srikar Dronamraju
    Cc: Mel Gorman
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Christopher Lameter
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Joonsoo Kim
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Kirill Tkhai
    Cc: Vlastimil Babka
    Cc: Nathan Lynch
    Cc:
    Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
    Debugged-by: Srikar Dronamraju
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     
  • commit 5076190daded2197f62fe92cf69674488be44175 upstream.

    This is just a cleanup addition to Jann's fix to properly update the
    transaction ID for the slub slowpath in commit fd4d9c7d0c71 ("mm: slub:
    add missing TID bump..").

    The transaction ID is what protects us against any concurrent accesses,
    but we should really also make sure to make the 'freelist' comparison
    itself always use the same freelist value that we then used as the new
    next free pointer.

    Jann points out that if we do all of this carefully, we could skip the
    transaction ID update for all the paths that only remove entries from
    the lists, and only update the TID when adding entries (to avoid the ABA
    issue with cmpxchg and list handling re-adding a previously seen value).

    But this patch just does the "make sure to cmpxchg the same value we
    used" part rather than trying to be clever.
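
    The "cmpxchg the same value we used" idea can be sketched with C11
    atomics. This is a user-space illustration under stated assumptions:
    toy_obj, toy_freelist, toy_free() and toy_alloc() are hypothetical, the
    list is a plain global rather than per-CPU data, and the transaction ID
    that the kernel pairs with the freelist is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_obj {
    struct toy_obj *next;
};

static _Atomic(struct toy_obj *) toy_freelist;

/* Put an object back on the free list. */
static void toy_free(struct toy_obj *obj)
{
    assert(obj != NULL);

    /* Read the list head exactly once ... */
    struct toy_obj *snapshot = atomic_load(&toy_freelist);

    do {
        /* ... use that snapshot as the object's next pointer ... */
        obj->next = snapshot;
        /*
         * ... and compare against the very same snapshot. On failure the
         * call stores the current head into `snapshot`, so the retry links
         * and compares one consistent value again.
         */
    } while (!atomic_compare_exchange_weak(&toy_freelist, &snapshot, obj));
}

/* Single-threaded helper for trying the sketch out; not safe concurrently. */
static struct toy_obj *toy_alloc(void)
{
    struct toy_obj *obj = atomic_load(&toy_freelist);

    if (obj)
        atomic_store(&toy_freelist, obj->next);
    return obj;
}
```

    Reading the head twice, once for the next pointer and once for the
    comparison, is what the sketch avoids: both uses come from the same
    local variable.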

    Acked-by: Jann Horn
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 1b53734bd0b2feed8e7761771b2e76fc9126ea0c upstream.

    This fixes a possible lost wakeup introduced by commit a218cc491420.
    Originally modifications to ep->wq were serialized by ep->wq.lock, but
    in commit a218cc491420 ("epoll: use rwlock in order to reduce
    ep_poll_callback() contention") a new rw lock was introduced in order to
    relax fd event path, i.e. callers of ep_poll_callback() function.

    After the change ep_modify and ep_insert (both are called on epoll_ctl()
    path) were switched to ep->lock, but ep_poll (epoll_wait) was using
    ep->wq.lock on wqueue list modification.

    The bug doesn't lead to any wqueue list corruption, because the wake up
    path and list modifications were serialized by ep->wq.lock internally,
    but the actual waitqueue_active() check prior to the wake_up() call can
    be reordered with modifications of the ep ready list, thus a wake up can
    be lost.

    And yes, this can be healed by an explicit smp_mb():

    list_add_tail(&epi->rdlink, &ep->rdllist);
    smp_mb();
    if (waitqueue_active(&ep->wq))
            wake_up(&ep->wq);

    But let's keep it simple: the current patch replaces ep->wq.lock with
    ep->lock for wqueue modifications, so the wake up path always observes
    the activeness of the wqueue correctly.

    Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
    Reported-by: Max Neunhoeffer
    Signed-off-by: Roman Penyaev
    Signed-off-by: Andrew Morton
    Tested-by: Max Neunhoeffer
    Cc: Jakub Kicinski
    Cc: Christopher Kohlhoff
    Cc: Davidlohr Bueso
    Cc: Jason Baron
    Cc: Jes Sorensen
    Cc: [5.1+]
    Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
    References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
    Bisected-by: Max Neunhoeffer
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Roman Penyaev
     
  • commit 12e967fd8e4e6c3d275b4c69c890adc838891300 upstream.

    Jann has brought up a very interesting point [1]. While shared pages
    are excluded from MADV_PAGEOUT normally, CoW pages can be easily
    reclaimed that way. This can lead to all sorts of hard to debug
    problems. E.g. performance problems outlined by Daniel [2].

    There are runtime environments where a substantial amount of memory is
    shared among security domains via CoW, and an easy way to reclaim that
    memory, which MADV_{COLD,PAGEOUT} offers, can lead either to
    performance degradation for the parent process, which might be more
    privileged, or even to side channel attacks.

    The feasibility of the latter is not really clear to me TBH but there is
    no real reason for exposure at this stage. It seems there is no real
    use case to depend on reclaiming CoW memory via madvise at this stage so
    it is much easier to simply disallow it and this is what this patch
    does. Put simply, MADV_{PAGEOUT,COLD} can operate only on
    exclusively owned memory, which is a straightforward semantic.
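
    A minimal sketch of the resulting rule, assuming that "exclusively
    owned" is judged by a mapping count of exactly one (an assumption of
    this illustration, as are struct toy_page and the toy_* helpers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page record: only the number of mappings matters here. */
struct toy_page {
    int mapcount;
};

/* A page qualifies only when a single mapping refers to it. */
static bool toy_exclusively_owned(const struct toy_page *page)
{
    assert(page != NULL);
    return page->mapcount == 1;
}

/* Count the pages in a range that a pageout/cold hint would act on. */
static size_t toy_count_reclaimable(const struct toy_page *pages, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (toy_exclusively_owned(&pages[i]))
            count++;
    return count;
}
```

    Pages still shared through CoW have more than one mapping and are
    simply skipped, so the hint cannot push another domain's pages out.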

    [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
    [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com

    Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
    Reported-by: Jann Horn
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Daniel Colascione
    Cc: Dave Hansen
    Cc: "Joel Fernandes (Google)"
    Cc:
    Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     
  • commit d41e2f3bd54699f85b3d6f45abd09fa24a222cb9 upstream.

    In section_deactivate(), pfn_to_page() doesn't work any more after
    ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case. It
    causes a hot remove failure:

    kernel BUG at mm/page_alloc.c:4806!
    invalid opcode: 0000 [#1] SMP PTI
    CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:free_pages+0x85/0xa0
    Call Trace:
    __remove_pages+0x99/0xc0
    arch_remove_memory+0x23/0x4d
    try_remove_memory+0xc8/0x130
    __remove_memory+0xa/0x11
    acpi_memory_device_remove+0x72/0x100
    acpi_bus_trim+0x55/0x90
    acpi_device_hotplug+0x2eb/0x3d0
    acpi_hotplug_work_fn+0x1a/0x30
    process_one_work+0x1a7/0x370
    worker_thread+0x30/0x380
    kthread+0x112/0x130
    ret_from_fork+0x35/0x40

    Let's move the ->section_mem_map resetting after
    depopulate_section_memmap() to fix it.

    [akpm@linux-foundation.org: remove unneeded initialization, per David]
    Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
    Signed-off-by: Baoquan He
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Oscar Salvador
    Cc: Mike Rapoport
    Cc:
    Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Baoquan He
     
  • commit e26733e0d0ec6798eca93daa300bc3f43616127f upstream.

    Prior to this commit, we only directly check the affected cgroup's
    memory.high against its usage. However, it's possible that we are being
    reclaimed as a result of hitting an ancestor memory.high and should be
    penalised based on that, instead.

    This patch changes memory.high overage throttling to use the largest
    overage in its ancestors when considering how many penalty jiffies to
    charge. This makes sure that we penalise poorly behaving cgroups in the
    same way regardless of at what level of the hierarchy memory.high was
    breached.
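
    The ancestor walk can be sketched in user space as follows. struct
    toy_memcg, the fixed-point shift and the helper names are assumptions
    made for this illustration, and the conversion from overage to penalty
    jiffies is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRECISION_SHIFT 20  /* fixed-point precision, assumed for the sketch */

/* Hypothetical cgroup node: usage and memory.high, both in pages. */
struct toy_memcg {
    struct toy_memcg *parent;
    uint64_t usage;
    uint64_t high;
};

/* Overage of one cgroup relative to its own memory.high, 0 if within limit. */
static uint64_t toy_overage(const struct toy_memcg *cg)
{
    uint64_t high = cg->high ? cg->high : 1;   /* avoid dividing by zero */

    if (cg->usage <= high)
        return 0;
    return ((cg->usage - high) << TOY_PRECISION_SHIFT) / high;
}

/* Largest overage found between `cg` and the root of the hierarchy. */
static uint64_t toy_max_overage(const struct toy_memcg *cg)
{
    uint64_t max = 0;

    assert(cg != NULL);
    for (; cg; cg = cg->parent) {
        uint64_t overage = toy_overage(cg);

        if (overage > max)
            max = overage;
    }
    return max;
}
```

    A child that is within its own limit still inherits the overage of a
    parent that breached memory.high, which is the case the previous code
    missed.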

    Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
    Reported-by: Johannes Weiner
    Signed-off-by: Chris Down
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Tejun Heo
    Cc: Michal Hocko
    Cc: Nathan Chancellor
    Cc: Roman Gushchin
    Cc: [5.4.x+]
    Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Chris Down
     
  • commit d397a45fc741c80c32a14e2de008441e9976f50c upstream.

    Commit 0e4b01df8659 had a bunch of fixups to use the right division
    method. However, it seems that after all that it still wasn't right --
    div_u64 takes a 32-bit divisor.

    The headroom is still large (2^32 pages), so on mundane systems you
    won't hit this, but this should definitely be fixed.
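
    The effect of a 32-bit divisor parameter can be reproduced in user
    space. The two helpers below are stand-ins with the same parameter
    widths as the kernel's div_u64() and div64_u64(); they are not the
    kernel implementations.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit dividend, 32-bit divisor: the shape of div_u64(). */
static uint64_t toy_div_u64(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* 64-bit dividend, 64-bit divisor: the shape of div64_u64(). */
static uint64_t toy_div64_u64(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}
```

    Passing a divisor of 2^32 + 2 to the first helper keeps only the low
    32 bits, so the division is silently done by 2. Callers whose divisor
    can exceed 32 bits need the second form.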

    Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
    Reported-by: Johannes Weiner
    Signed-off-by: Chris Down
    Signed-off-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Cc: Michal Hocko
    Cc: Nathan Chancellor
    Cc: [5.4.x+]
    Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Chris Down
     
  • commit 7d36665a5886c27ca4c4d0afd3ecc50b400f3587 upstream.

    When an eventfd that monitors multiple memory thresholds of a cgroup is
    closed, the kernel deletes all events related to this eventfd. If
    another eventfd starts monitoring a memory threshold of this cgroup
    before all events are deleted, the kernel crashes:

    BUG: kernel NULL pointer dereference, address: 0000000000000004
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
    Oops: 0002 [#1] SMP PTI
    CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
    Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
    Workqueue: events memcg_event_remove
    RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
    RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
    RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
    RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
    R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
    R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
    FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
    Call Trace:
    memcg_event_remove+0x32/0x90
    process_one_work+0x172/0x380
    worker_thread+0x49/0x3f0
    kthread+0xf8/0x130
    ret_from_fork+0x35/0x40
    CR2: 0000000000000004

    We can reproduce this problem as follows:

    1. We create a new cgroup subdirectory and a new eventfd, and then we
    monitor multiple memory thresholds of the cgroup through this eventfd.

    2. We close this eventfd, and __mem_cgroup_usage_unregister_event()
    will be called multiple times to delete all events related to this
    eventfd.

    The first time __mem_cgroup_usage_unregister_event() is called, the
    kernel will clear all items related to this eventfd in
    thresholds->primary.

    Since there is currently only one eventfd, thresholds->primary becomes
    empty, so the kernel will set thresholds->primary and thresholds->spare
    to NULL. If at this time the user creates a new eventfd and monitors
    the memory threshold of this cgroup, the kernel will re-initialize
    thresholds->primary.

    Then when __mem_cgroup_usage_unregister_event() is called for the
    second time, because thresholds->primary is not empty, the system will
    access thresholds->spare, but thresholds->spare is NULL, which will
    trigger a crash.

    In general, the longer it takes to delete all events related to this
    eventfd, the easier it is to trigger this problem.

    The solution is to check whether the thresholds associated with the
    eventfd have already been cleared when deleting the event. If so, we do
    nothing.
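
    The check can be sketched in user space like this. The toy_* types are
    hypothetical and use fixed-size arrays without locking or RCU; the
    point is only the early return when nothing belongs to the eventfd any
    more.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

struct toy_entry {
    int eventfd;
    unsigned long threshold;
};

struct toy_array {
    size_t size;
    struct toy_entry entries[TOY_MAX_ENTRIES];
};

struct toy_thresholds {
    struct toy_array *primary;
    struct toy_array *spare;
};

/* Remove every entry owned by `eventfd`; returns how many were removed. */
static size_t toy_unregister(struct toy_thresholds *t, int eventfd)
{
    size_t i, kept = 0, removed = 0;

    if (!t->primary)
        return 0;
    assert(t->primary->size <= TOY_MAX_ENTRIES);

    for (i = 0; i < t->primary->size; i++) {
        if (t->primary->entries[i].eventfd == eventfd)
            removed++;
        else
            kept++;
    }

    /* Nothing left for this eventfd: return before touching t->spare. */
    if (removed == 0)
        return 0;

    if (kept == 0) {            /* last user gone: drop both arrays */
        t->primary = NULL;
        t->spare = NULL;
        return removed;
    }

    /* Copy the survivors into the spare array, then swap the two. */
    struct toy_array *new = t->spare;
    assert(new != NULL);
    new->size = 0;
    for (i = 0; i < t->primary->size; i++)
        if (t->primary->entries[i].eventfd != eventfd)
            new->entries[new->size++] = t->primary->entries[i];
    t->spare = t->primary;
    t->primary = new;
    return removed;
}
```

    Without the early return, a second call for the old eventfd would reach
    the copy step with a NULL spare array, which corresponds to the crash
    described above.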

    [akpm@linux-foundation.org: fix comment, per Kirill]
    Fixes: 907860ed381a ("cgroups: make cftype.unregister_event() void-returning")
    Signed-off-by: Chunguang Xu
    Signed-off-by: Andrew Morton
    Acked-by: Michal Hocko
    Acked-by: Kirill A. Shutemov
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc:
    Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Chunguang Xu
     
  • commit 283f87c0d5d32b4a5c22636adc559bca82196ed3 upstream.

    The operands of time_after() are in the wrong order in both instances
    in the sys-t driver. Fix that.
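
    As a reminder of the argument order, here is a user-space sketch of a
    wraparound-safe comparison in the style of time_after(a, b), which is
    true when a is later than b. The toy_* names are hypothetical, and the
    cast relies on the usual two's complement conversion.

```c
#include <assert.h>
#include <stdbool.h>

/* True when tick value `a` is later than `b`, even across a wraparound. */
static bool toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* True once `now` has moved past `deadline`: current time goes first. */
static bool toy_deadline_passed(unsigned long now, unsigned long deadline)
{
    return toy_time_after(now, deadline);
}
```

    Swapping the two arguments inverts the result, so a check meant to fire
    after a deadline would instead fire only before it.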

    Signed-off-by: Alexander Shishkin
    Reviewed-by: Andy Shevchenko
    Fixes: 39f10239df75 ("stm class: p_sys-t: Add support for CLOCKSYNC packets")
    Fixes: d69d5e83110f ("stm class: Add MIPI SyS-T protocol support")
    Cc: stable@vger.kernel.org # v4.20+
    Link: https://lore.kernel.org/r/20200317062215.15598-3-alexander.shishkin@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Alexander Shishkin
     
  • commit b216a8e7908cd750550c0480cf7d2b3a37f06954 upstream.

    drm_lease_create takes ownership of leases. And leases will be released
    by drm_master_put.

    drm_master_put
      ->drm_master_destroy
        ->idr_destroy

    So we needn't call idr_destroy again.

    Reported-and-tested-by: syzbot+05835159fe322770fe3d@syzkaller.appspotmail.com
    Signed-off-by: Qiujun Huang
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter
    Link: https://patchwork.freedesktop.org/patch/msgid/1584518030-4173-1-git-send-email-hqjagain@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Qiujun Huang
     
  • commit 5bbc6604a62814511c32f2e39bc9ffb2c1b92cbe upstream.

    The offset into the array was specified in bytes but should
    be in terms of 32-bit words. Also prevent large reads that
    would cause a buffer overread.
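
    Both points, converting the byte offset into a word index and bounding
    the read, can be sketched as below. toy_read_words() and its parameters
    are hypothetical and only illustrate the arithmetic; this is not the
    driver's read handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy up to `size` bytes, starting at byte offset `pos`, from a table of
 * 32-bit words into `out`. Returns the number of bytes copied.
 */
static size_t toy_read_words(const uint32_t *table, size_t nwords,
                             size_t pos, uint32_t *out, size_t size)
{
    size_t copied = 0;
    size_t idx;

    assert(table != NULL && out != NULL);

    /* Only whole, aligned words are supported in this sketch. */
    if ((pos & 3) || (size & 3))
        return 0;

    idx = pos / 4;              /* byte offset -> word index */

    /* Stop at the end of the table even if the caller asked for more. */
    while (size >= 4 && idx < nwords) {
        out[copied / 4] = table[idx++];
        copied += 4;
        size -= 4;
    }
    return copied;
}
```

    Using `pos` directly as an index would read four times too far into the
    table, and without the `idx < nwords` bound a large request would run
    past its end.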

    v2: Read from correct offset from internal storage buffer.

    Signed-off-by: Tom St Denis
    Acked-by: Christian König
    Reviewed-by: Alex Deucher
    Signed-off-by: Alex Deucher
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Tom St Denis
     
  • commit 236ebc20d9afc5e9ff52f3cf3f365a91583aac10 upstream.

    During a rename whiteout, if btrfs_whiteout_for_rename() returns an error
    we can end up returning from btrfs_rename() with the log context object
    still in the root's log context list - this happens if 'sync_log' was
    set to true before we called btrfs_whiteout_for_rename() and it is
    dangerous because we end up with a corrupt linked list (root->log_ctxs)
    as the log context object was allocated on the stack.

    After btrfs_rename() returns, any task that is running btrfs_sync_log()
    concurrently can end up crashing because that linked list is traversed by
    btrfs_sync_log() (through btrfs_remove_all_log_ctxs()). That results in
    the same issue that commit e6c617102c7e4 ("Btrfs: fix log context list
    corruption after rename exchange operation") fixed.

    Fixes: d4682ba03ef618 ("Btrfs: sync log after logging new name")
    CC: stable@vger.kernel.org # 4.19+
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 045706bff837ee89c13f1ace173db71922c1c40b upstream.

    libtraceevent (used by perf and trace-cmd) failed to parse the
    xhci_urb_dequeue trace event. This is because the user space trace
    event format parsing is not a full C compiler. It can handle some basic
    logic, but is not meant to be able to handle everything C can do.

    In cases where a trace event field needs to be converted from a number
    to a string, there's the __print_symbolic() macro that should be used:

    See samples/trace_events/trace-events-sample.h

    Some xhci trace events open-coded __print_symbolic(), causing user
    space tools to fail to parse them. The open-coded logic has to be
    replaced with __print_symbolic() instead.

    CC: stable@vger.kernel.org
    Reported-by: Tzvetomir Stoyanov
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206531
    Fixes: 5abdc2e6e12ff ("usb: host: xhci: add urb_enqueue/dequeue/giveback tracers")
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Mathias Nyman
    Link: https://lore.kernel.org/r/20200306150858.21904-2-mathias.nyman@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 3568b88944fef28db3ee989b957da49ffc627ede upstream.

    The syscall number of compat_clock_getres was erroneously set to 247
    (__NR_io_cancel!) instead of 264. This causes the vDSO fallback of
    clock_getres() to land on the wrong syscall for compat tasks.

    Fix the numbering.

    Cc:
    Fixes: 53c489e1dfeb6 ("arm64: compat: Add missing syscall numbers")
    Acked-by: Catalin Marinas
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Vincenzo Frascino
     
  • commit 5d892919fdd0cefd361697472d4e1b174a594991 upstream.

    I have hit the following build error:

    armv7a-hardfloat-linux-gnueabi-ld: drivers/rtc/rtc-max8907.o: in function `max8907_rtc_probe':
    rtc-max8907.c:(.text+0x400): undefined reference to `regmap_irq_get_virq'

    max8907 should select REGMAP_IRQ

    Fixes: 94c01ab6d7544 ("rtc: add MAX8907 RTC driver")
    Cc: stable
    Signed-off-by: Corentin Labbe
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Corentin Labbe