05 Aug, 2020

6 commits

  • [ Upstream commit 1748f6a2cbc4694523f16da1c892b59861045b9d ]

    The rcu_dereference call in rht_ptr_rcu is completely bogus because
    we've already dereferenced the value in __rht_ptr and operated on it.
    This causes potential double readings which could be fatal. The RCU
    dereference must occur prior to the comparison in __rht_ptr.

    This patch changes the order of RCU dereference so that it is done
    first and the result is then fed to __rht_ptr. The RCU marking
    changes have been minimised using casts which will be removed in
    a follow-up patch.
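
    As an illustration of the ordering requirement (a sketch with hypothetical
    types and a hypothetical untag helper, not the real rhashtable code), the
    single RCU read has to happen before the value is masked or compared:

    /* Sketch only -- hypothetical types/helpers, not the rhashtable code. */
    struct entry;

    struct bucket {
            struct entry __rcu *head;
    };

    static struct entry *bucket_head_rcu(struct bucket *b)
    {
            /* One rcu_dereference(), done first... */
            struct entry *e = rcu_dereference(b->head);

            /* ...then the helper operates on that single snapshot,
             * rather than re-reading the pointer a second time. */
            return untag_ptr(e);
    }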

    Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...")
    Reported-by: "Gong, Sishuai"
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Herbert Xu
     
  • [ Upstream commit 7d0314b11cdd92bca8b89684c06953bf114605fc ]

    When setting the PF interface up/down, notify the firmware to update
    uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.

    This behavior will prevent sending traffic out on the uplink port when
    the PF is down, such as traffic sent from a VF interface which is still
    up. Currently, when calling mlx5e_open/close(), the driver only sends a
    PAOS command to notify the firmware to set the physical port state to
    up/down; however, this is not sufficient. When a VF is in "auto" state,
    it follows the uplink state, which was not updated on mlx5e_open/close()
    before this patch.

    When switchdev mode is enabled and uplink representor is first enabled,
    set the uplink port state value back to its FW default "AUTO".

    Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down")
    Signed-off-by: Ron Diskin
    Reviewed-by: Roi Dayan
    Reviewed-by: Moshe Shemesh
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Sasha Levin

    Ron Diskin
     
  • [ Upstream commit 101dde4207f1daa1fda57d714814a03835dccc3f ]

    The commits "xfrm: Move dst->path into struct xfrm_dst"
    and "net: Create and use new helper xfrm_dst_child()."
    changed xfrm bundle handling under the assumption
    that xdst->path and dst->child are not a NULL pointer
    only if dst->xfrm is not a NULL pointer. That is true
    with one exception. If the xfrm hold queue is used
    to wait until a SA is installed by the key manager,
    we create a dummy bundle without a valid dst->xfrm
    pointer. The current xfrm bundle handling crashes
    in that case. Fix this by extending the NULL check
    of dst->xfrm with a test of the DST_XFRM_QUEUE flag.
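
    A sketch of the extended test (simplified from an xfrm_dst_path()/
    xfrm_dst_child()-style helper): dummy bundles carry the DST_XFRM_QUEUE
    flag but no dst->xfrm, so the flag has to be part of the check before
    the xfrm_dst fields are trusted.

    /* Sketch only -- simplified from the helpers described above. */
    static inline struct dst_entry *bundle_path(const struct dst_entry *dst)
    {
            if (dst->xfrm || (dst->flags & DST_XFRM_QUEUE))
                    return ((const struct xfrm_dst *)dst)->path;

            return (struct dst_entry *)dst;
    }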

    Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
    Fixes: b92cf4aab8e6 ("net: Create and use new helper xfrm_dst_child().")
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Steffen Klassert
     
  • [ Upstream commit 4f47e8ab6ab796b5380f74866fa5287aca4dcc58 ]

    Commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    took 'priority' into account when deciding whether a policy is unique,
    allowing duplicated policies that differ only in 'priority' to be added.
    This is not expected by userland, as Tobias reported in strongswan.

    To fix this duplicated-policies issue, and also fix the issue in
    commit ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list"),
    this patch changes add/del/get/update on the user interfaces to look up
    a policy by matching both mark and mask:

    mark.v == pol->mark.v && mark.m == pol->mark.m

    and leaves the check:

    (mark & pol->mark.m) == pol->mark.v

    for the tx/rx path only, since userland expects an exact mark and mask
    match when managing policies.

    v1->v2:
    - make xfrm_policy_mark_match inline and fix the changelog as
    Tobias suggested.
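
    A sketch of the resulting inline helper for the user-interface lookups
    (the masked comparison quoted above stays on the tx/rx path):

    /* Exact-match on both value and mask, used for add/del/get/update. */
    static inline bool xfrm_policy_mark_match(const struct xfrm_mark *mark,
                                              struct xfrm_policy *pol)
    {
            return mark->v == pol->mark.v && mark->m == pol->mark.m;
    }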

    Fixes: 295fae568885 ("xfrm: Allow user space manipulation of SPD mark")
    Fixes: ed17b8d377ea ("xfrm: fix a warning in xfrm_policy_insert_list")
    Reported-by: Tobias Brunner
    Tested-by: Tobias Brunner
    Signed-off-by: Xin Long
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Xin Long
     
  • commit 6989310f5d4327e8595664954edd40a7f99ddd0d upstream.

    Use offsetof to calculate offset of a field to take advantage of
    compiler built-in version when possible, and avoid UBSAN warning when
    compiling with Clang:

    ==================================================================
    UBSAN: Undefined behaviour in net/wireless/wext-core.c:525:14
    member access within null pointer of type 'struct iw_point'
    CPU: 3 PID: 165 Comm: kworker/u16:3 Tainted: G S W 4.19.23 #43
    Workqueue: cfg80211 __cfg80211_scan_done [cfg80211]
    Call trace:
    dump_backtrace+0x0/0x194
    show_stack+0x20/0x2c
    __dump_stack+0x20/0x28
    dump_stack+0x70/0x94
    ubsan_epilogue+0x14/0x44
    ubsan_type_mismatch_common+0xf4/0xfc
    __ubsan_handle_type_mismatch_v1+0x34/0x54
    wireless_send_event+0x3cc/0x470
    ___cfg80211_scan_done+0x13c/0x220 [cfg80211]
    __cfg80211_scan_done+0x28/0x34 [cfg80211]
    process_one_work+0x170/0x35c
    worker_thread+0x254/0x380
    kthread+0x13c/0x158
    ret_from_fork+0x10/0x18
    ===================================================================
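
    The pattern being replaced computes a member offset by dereferencing a
    NULL pointer, which UBSAN flags as undefined behaviour; offsetof() uses
    the compiler built-in instead. A standalone sketch (struct definition
    simplified):

    #include <stddef.h>
    #include <stdio.h>

    struct iw_point {
            void *pointer;
            unsigned short length;
            unsigned short flags;
    };

    int main(void)
    {
            /* Old style: member access within a null pointer (UB). */
            size_t off_old = (size_t)&((struct iw_point *)NULL)->length;

            /* Equivalent, well-defined form using the built-in. */
            size_t off_new = offsetof(struct iw_point, length);

            printf("%zu %zu\n", off_old, off_new);
            return 0;
    }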

    Signed-off-by: Pi-Hsun Shih
    Reviewed-by: Nick Desaulniers
    Link: https://lore.kernel.org/r/20191204081307.138765-1-pihsun@chromium.org
    Signed-off-by: Johannes Berg
    Signed-off-by: Nick Desaulniers
    Signed-off-by: Greg Kroah-Hartman

    Pi-Hsun Shih
     
  • commit 54a485e9ec084da1a4b32dcf7749c7d760ed8aa5 upstream.

    The lookaside count is improperly initialized to the size of the
    Receive Queue with the additional +1. In the traces below, the
    RQ size is 384, so the count was set to 385.

    The lookaside count is then rarely refreshed. Note the high and
    incorrect count in the trace below:

    rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c
    qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
    rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1]

    Cc: # 5.4.x
    Reviewed-by: Kaike Wan
    Signed-off-by: Mike Marciniszyn
    Tested-by: Honggang Li
    Signed-off-by: Jason Gunthorpe
    Signed-off-by: Greg Kroah-Hartman

    Mike Marciniszyn
     

01 Aug, 2020

1 commit

  • [ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ]

    Previously TLP could send multiple probes of new data in one
    flight. This happens when the sender is cwnd limited. After the
    initial TLP containing new data is sent, the sender receives another
    ACK that acks part of the inflight. It may then re-arm another TLP timer
    to send more, if no further ACK returns before the next TLP timeout
    (PTO) expires. In theory the sender could keep sending TLPs
    until the send queue is depleted. This only happens if the sender sees
    such an irregular, uncommon ACK pattern, but it is generally undesirable
    behavior, especially during congestion.

    The original TLP design restricts it to one TLP probe per inflight, as
    published in "Reducing Web Latency: the Virtue of Gentle Aggression",
    SIGCOMM 2013. This patch changes TLP to send at most one probe
    per inflight.

    Note that if the sender is app-limited, TLP retransmits old data
    and does not have this issue.
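
    A pseudocode-level sketch of the new rule, with hypothetical field and
    function names rather than the real tcp_sock state:

    /* Sketch only: at most one TLP probe of new data per flight. */
    static void tlp_timeout(struct flight *f)
    {
            if (f->app_limited) {
                    retransmit_last_segment(f);     /* old data: unaffected */
                    return;
            }

            if (f->tlp_new_data_sent)
                    return;             /* this flight was already probed */

            f->tlp_new_data_sent = true;    /* cleared once the flight is acked */
            send_one_new_segment(f);
    }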

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yuchung Cheng
     

29 Jul, 2020

7 commits

  • commit 5df96f2b9f58a5d2dc1f30fe7de75e197f2c25f2 upstream.

    Commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended
    device during destroy") broke integrity recalculation.

    The problem is dm_suspended() returns true not only during suspend,
    but also during resume. So this race condition could occur:
    1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
    2. integrity_recalc (&ic->recalc_work) preempts the current thread
    3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
    4. integrity_recalc exits and no recalculating is done.

    To fix this race condition, add a function dm_post_suspending that is
    only true during the postsuspend phase and use it instead of
    dm_suspended().
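
    A sketch of the resulting check in the recalculation worker (assuming
    the new dm_post_suspending() helper is true only during the postsuspend
    phase, as described above):

    /* Sketch: don't abort a recalc work item that was queued from
     * dm_integrity_resume() just because the device is still resuming. */
    static void integrity_recalc(struct work_struct *w)
    {
            struct dm_integrity_c *ic =
                    container_of(w, struct dm_integrity_c, recalc_work);

            /* was: if (unlikely(dm_suspended(ic->ti))) goto unlock_ret; */
            if (unlikely(dm_post_suspending(ic->ti)))
                    goto unlock_ret;

            /* ... recalculate integrity tags ... */
    unlock_ret:
            return;
    }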

    Signed-off-by: Mikulas Patocka
    Fixes: adc0daad366b ("dm: report suspended device during destroy")
    Cc: stable@vger.kernel.org # v4.18+
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • commit 85ca6b17e2bb96b19caac3b02c003d670b66de96 upstream.

    The Lenovo Miix 2 10 has a keyboard dock with extra speakers in the dock.
    Rather than the ALC5672's GPIO1 pin being used as an IRQ to the CPU, it
    is actually used to enable the amplifier for these speakers
    (the IRQ to the CPU comes directly from the jack-detect switch).

    Add a quirk for having an ext speaker-amplifier enable pin on GPIO1
    and replace the Lenovo Miix 2 10's dmi_system_id table entry's wrong
    GPIO_DEV quirk (which needs to be renamed to GPIO1_IS_IRQ) with the
    new RT5670_GPIO1_IS_EXT_SPK_EN quirk, so that we enable the external
    speaker-amplifier as necessary.

    Also update the ident field for the dmi_system_id table entry, the
    Miix models are not Thinkpads.

    Fixes: 67e03ff3f32f ("ASoC: codecs: rt5670: add Thinkpad Tablet 10 quirk")
    Signed-off-by: Hans de Goede
    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1786723
    Link: https://lore.kernel.org/r/20200628155231.71089-4-hdegoede@redhat.com
    Signed-off-by: Mark Brown
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • commit de2b41be8fcccb2f5b6c480d35df590476344201 upstream.

    On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
    page-aligned, but the end of the .bss..page_aligned section is not
    guaranteed to be page-aligned.

    As a result, objects from other .bss sections may end up on the same 4k
    page as the idt_table, and will accidentally get mapped read-only during
    boot, causing unexpected page-faults when the kernel writes to them.

    This could be worked around by making the objects in the page aligned
    sections page sized, but that's wrong.

    Explicit sections which store only page aligned objects have an implicit
    guarantee that the object is alone in the page in which it is placed. That
    works for all objects except the last one. That's inconsistent.

    Enforcing page-sized objects for these sections would break memory
    sanitizers, because the object becomes artificially larger than it
    should be and out-of-bounds access becomes legitimate.

    Align the end of the .bss..page_aligned and .data..page_aligned sections
    on page size so all objects placed in these sections are guaranteed to
    have their own page.

    [ tglx: Amended changelog ]

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org
    Signed-off-by: Greg Kroah-Hartman

    Joerg Roedel
     
  • commit e0b3e0b1a04367fc15c07f44e78361545b55357c upstream.

    The !ATOMIC_IOMAP version of io_mapping_init_wc will always return
    success, even when the ioremap fails.

    Since the ATOMIC_IOMAP version returns NULL when the init fails, and
    callers check for a NULL return on error, this is unexpected.

    During a device probe, where the ioremap failed, a crash can look like
    this:

    BUG: unable to handle page fault for address: 0000000000210000
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    Oops: 0002 [#1] PREEMPT SMP
    CPU: 0 PID: 177 Comm:
    RIP: 0010:fill_page_dma [i915]
    gen8_ppgtt_create [i915]
    i915_ppgtt_create [i915]
    intel_gt_init [i915]
    i915_gem_init [i915]
    i915_driver_probe [i915]
    pci_device_probe
    really_probe
    driver_probe_device

    The remap failure occurred much earlier in the probe. If it had been
    propagated, the driver would have exited with an error.

    Return NULL on ioremap failure.
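
    A simplified sketch of the fixed !ATOMIC_IOMAP fallback, propagating
    the ioremap_wc() failure instead of reporting success unconditionally:

    /* Sketch: return NULL (like the ATOMIC_IOMAP variant) on failure. */
    static inline struct io_mapping *
    io_mapping_init_wc(struct io_mapping *iomap,
                       resource_size_t base, unsigned long size)
    {
            iomap->iomem = ioremap_wc(base, size);
            if (!iomap->iomem)
                    return NULL;            /* previously fell through */

            iomap->base = base;
            iomap->size = size;
            return iomap;
    }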

    [akpm@linux-foundation.org: detect ioremap_wc() errors earlier]

    Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping")
    Signed-off-by: Michael J. Ruhl
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Mike Rapoport
    Cc: Andy Shevchenko
    Cc: Chris Wilson
    Cc: Daniel Vetter
    Cc:
    Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michael J. Ruhl
     
  • [ Upstream commit bd024e82e4cd95c7f1a475a55f99871936c2b2db ]

    Although mmiowb() is concerned only with serialising MMIO writes occurring
    in contexts where a spinlock is held, the call to mmiowb_set_pending()
    from the MMIO write accessors can occur in preemptible contexts, such
    as during driver probe() functions where ordering between CPUs is not
    usually a concern, assuming that the task migration path provides the
    necessary ordering guarantees.

    Unfortunately, the default implementation of mmiowb_set_pending() is not
    preempt-safe, as it makes use of a per-cpu variable to track its
    internal state. This has been reported to generate the following splat
    on riscv:

    | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
    | caller is regmap_mmio_write32le+0x1c/0x46
    | CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-hfu+ #1
    | Call Trace:
    | walk_stackframe+0x0/0x7a
    | dump_stack+0x6e/0x88
    | regmap_mmio_write32le+0x18/0x46
    | check_preemption_disabled+0xa4/0xaa
    | regmap_mmio_write32le+0x18/0x46
    | regmap_mmio_write+0x26/0x44
    | regmap_write+0x28/0x48
    | sifive_gpio_probe+0xc0/0x1da

    Although it's possible to fix the driver in this case, other splats have
    been seen from other drivers, including the infamous 8250 UART, and so
    it's better to address this problem in the mmiowb core itself.

    Fix mmiowb_set_pending() by using the raw_cpu_ptr() to get at the mmiowb
    state and then only updating the 'mmiowb_pending' field if we are not
    preemptible (i.e. we have a non-zero nesting count).
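
    A simplified sketch of the fixed helper: raw_cpu_ptr() avoids the
    smp_processor_id()-in-preemptible warning, and the per-cpu state is
    only written when the nesting count shows a spinlock is actually held:

    /* Sketch, simplified from the asm-generic mmiowb tracking. */
    static inline void mmiowb_set_pending(void)
    {
            struct mmiowb_state *ms = raw_cpu_ptr(&__mmiowb_state);

            /* Outside spin_lock() the count is zero and a migration
             * between CPUs cannot lose an mmiowb() we care about. */
            if (likely(ms->nesting_count))
                    ms->mmiowb_pending = ms->nesting_count;
    }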

    Cc: Arnd Bergmann
    Cc: Paul Walmsley
    Cc: Guo Ren
    Cc: Michael Ellerman
    Reported-by: Palmer Dabbelt
    Reported-by: Emil Renner Berthing
    Tested-by: Emil Renner Berthing
    Reviewed-by: Palmer Dabbelt
    Acked-by: Palmer Dabbelt
    Link: https://lore.kernel.org/r/20200716112816.7356-1-will@kernel.org
    Signed-off-by: Will Deacon
    Signed-off-by: Sasha Levin

    Will Deacon
     
  • [ Upstream commit c463bb2a8f8d7d97aa414bf7714fc77e9d3b10df ]

    This event code represents the state of a removable cover of a device.
    Value 0 means that the cover is open or removed, value 1 means that the
    cover is closed.

    Reviewed-by: Sebastian Reichel
    Acked-by: Tony Lindgren
    Signed-off-by: Merlijn Wajer
    Link: https://lore.kernel.org/r/20200612125402.18393-2-merlijn@wizzup.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Sasha Levin

    Merlijn Wajer
     
  • [ Upstream commit 6348dd291e3653534a9e28e6917569bc9967b35b ]

    There is a sleep-while-atomic bug when accessing dmabuf->name under a
    mutex in dmabuffs_dname(). It is caused by the SELinux permission
    checks on a process when it validates the files inherited from fork()
    by traversing them through iterate_fd() (which walks the files under a
    spin_lock) and calling match_file() (security/selinux/hooks.c), where
    the permission checks happen. The audit information is logged using
    dump_common_audit_data(), which calls d_path() to get the file path
    name. If the file check happens on a dmabuf fd, it ends up in
    dmabuffs_dname() and takes a mutex to access dmabuf->name. The flow
    is as follows:
    flush_unauthorized_files()
    iterate_fd()
    spin_lock() --> Start of the atomic section.
    match_file()
    file_has_perm()
    avc_has_perm()
    avc_audit()
    slow_avc_audit()
    common_lsm_audit()
    dump_common_audit_data()
    audit_log_d_path()
    d_path()
    dmabuffs_dname()
    mutex_lock()--> Sleep while atomic.

    Call trace captured (on 4.19 kernels) is below:
    ___might_sleep+0x204/0x208
    __might_sleep+0x50/0x88
    __mutex_lock_common+0x5c/0x1068
    __mutex_lock_common+0x5c/0x1068
    mutex_lock_nested+0x40/0x50
    dmabuffs_dname+0xa0/0x170
    d_path+0x84/0x290
    audit_log_d_path+0x74/0x130
    common_lsm_audit+0x334/0x6e8
    slow_avc_audit+0xb8/0xf8
    avc_has_perm+0x154/0x218
    file_has_perm+0x70/0x180
    match_file+0x60/0x78
    iterate_fd+0x128/0x168
    selinux_bprm_committing_creds+0x178/0x248
    security_bprm_committing_creds+0x30/0x48
    install_exec_creds+0x1c/0x68
    load_elf_binary+0x3a4/0x14e0
    search_binary_handler+0xb0/0x1e0

    So, use a spinlock to access dmabuf->name to avoid sleeping while atomic.
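
    A sketch of the resulting name access (roughly what the description
    implies; a dedicated spinlock guards only the name string):

    /* Sketch: never sleep here -- d_path() may be called under the
     * spin_lock taken by iterate_fd(). */
    static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
    {
            struct dma_buf *dmabuf = dentry->d_fsdata;
            char name[32];
            ssize_t ret = 0;

            spin_lock(&dmabuf->name_lock);          /* was a mutex */
            if (dmabuf->name)
                    ret = strscpy(name, dmabuf->name, sizeof(name));
            spin_unlock(&dmabuf->name_lock);

            return dynamic_dname(dentry, buffer, buflen, "/%s:%s",
                                 dentry->d_name.name, ret > 0 ? name : "");
    }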

    Cc: [5.3+]
    Signed-off-by: Charan Teja Kalla
    Reviewed-by: Michael J. Ruhl
    Acked-by: Christian König
    [sumits: added comment to spinlock_t definition to avoid warning]
    Signed-off-by: Sumit Semwal
    Link: https://patchwork.freedesktop.org/patch/msgid/a83e7f0d-4e54-9848-4b58-e1acdbe06735@codeaurora.org
    Signed-off-by: Sasha Levin

    Charan Teja Kalla
     

22 Jul, 2020

11 commits

  • commit aadf9dcef9d4cd68c73a4ab934f93319c4becc47 upstream.

    The trace symbol printer (__print_symbolic()) ignores symbols that map to
    an empty string and prints the hex value instead.

    Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid
    this.

    Fixes: b54a134a7de4 ("rxrpc: Fix handling of enums-to-string translation in tracing")
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit a50ca29523b18baea548bdf5df9b4b923c2bb4f6 upstream.

    This adds more hardware IDs for Elan touchpads found in various Lenovo
    laptops.

    Signed-off-by: Dave Wang
    Link: https://lore.kernel.org/r/000201d5a8bd$9fead3f0$dfc07bd0$@emc.com.tw
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dave Wang
     
  • commit f794db6841e5480208f0c3a3ac1df445a96b079e upstream.

    Until this commit the mainline kernel version (this version) of the
    vboxguest module contained a bug where it defined
    VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG using
    _IOC(_IOC_READ | _IOC_WRITE, 'V', ...) instead of
    _IO('V', ...) as the out-of-tree VirtualBox upstream version does.

    Since the VirtualBox userspace bits are always built against VirtualBox
    upstream's headers, this means that so far the mainline kernel version
    of the vboxguest module has been failing these 2 ioctls with -ENOTTY.
    I guess VBGL_IOCTL_VMMDEV_REQUEST_BIG is simply never used, so we never
    hit that one, and so far the vboxguest driver has failed to actually log
    any messages passed to it through VBGL_IOCTL_LOG.

    This commit changes the VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG
    defines to match the out of tree VirtualBox upstream vboxguest version,
    while keeping compatibility with the old wrong request defines so as
    to not break the kernel ABI in case someone has been using the old
    request defines.
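
    For illustration, the two styles of definition differ only in whether
    direction bits are encoded into the request number (the 'V' type is
    real, but the request number and size below are placeholders, not the
    actual values):

    /* Old, mainline-only encoding: direction bits included. */
    #define VBGL_IOCTL_LOG_OLD      _IOC(_IOC_READ | _IOC_WRITE, 'V', 9, 64)

    /* Matches the out-of-tree VirtualBox headers: plain _IO(). */
    #define VBGL_IOCTL_LOG_NEW      _IO('V', 9)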

    Fixes: f6ddd094f579 ("virt: Add vboxguest driver for Virtual Box Guest integration UAPI")
    Cc: stable@vger.kernel.org
    Acked-by: Arnd Bergmann
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Hans de Goede
    Link: https://lore.kernel.org/r/20200709120858.63928-2-hdegoede@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • [ Upstream commit e8639e1c986a8a9d0f94549170f6db579376c3ae ]

    The RTC modules on am3 and am4 need quirk handling to unlock and lock
    them for reset so let's add the quirk handling based on what we already
    have for legacy platform data. In later patches we will simply drop the
    RTC related platform data and the old quirk handling.

    Signed-off-by: Tony Lindgren
    Signed-off-by: Sasha Levin

    Tony Lindgren
     
  • [ Upstream commit bfe373f608cf81b7626dfeb904001b0e867c5110 ]

    Else there may be magic numbers in /sys/kernel/debug/block/*/state.

    Signed-off-by: Hou Tao
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Hou Tao
     
  • [ Upstream commit 14b032b8f8fce03a546dcf365454bec8c4a58d7d ]

    In order for no_refcnt and is_data to be the lowest-order two bits in
    'val', we have to pad out the bitfield of the u8.

    Fixes: ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()")
    Reported-by: Guenter Roeck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

    When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway; the in_interrupt() check
    would terminate this function if it were called there. And for
    sk_alloc(), skcd->val is always zero. So it's safe to factor out the
    code to make it more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() aligning cgroup pointers to at least 4 bytes;
    ARCH_KMALLOC_MINALIGN is certainly larger than that.
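
    A simplified sketch of the factored-out clone path: the new bit in
    skcd->val records whether the original socket skipped the cgroup
    reference (because allocation was disabled), so the clone makes the
    same decision:

    /* Sketch only -- simplified from the description above. */
    void cgroup_sk_clone(struct sock_cgroup_data *skcd)
    {
            if (!skcd->val)
                    return;

            if (skcd->no_refcnt)    /* original never took a reference */
                    return;

            /* sk_clone_lock() copied the cgroup pointer, so the clone
             * needs its own reference. */
            cgroup_get(sock_cgroup_ptr(skcd));
    }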

    This bug seems to have been present from the beginning; commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it but not completely. It was not easy to trigger until
    the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 469aceddfa3ed16e17ee30533fae45e90f62efd8 ]

    Toshiaki pointed out that we now have two very similar functions to extract
    the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
    that the unbounded parsing loop makes it possible for maliciously crafted
    packets to loop through potentially hundreds of tags.

    Fix both of these issues by consolidating the two parsing functions and
    limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
    switch over __vlan_get_protocol() to use skb_header_pointer() instead of
    pskb_may_pull(), to avoid the possible side effects of the latter and keep
    the skb pointer 'const' through all the parsing functions.
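
    A simplified sketch of the bounded parse: skb_header_pointer() copies
    each candidate VLAN header into a stack buffer (so the skb stays
    const), and the walk gives up after a fixed number of tags:

    /* Sketch only -- function name and start-offset handling simplified. */
    #define VLAN_MAX_DEPTH  8

    static __be16 vlan_get_l3_proto(const struct sk_buff *skb)
    {
            __be16 type = skb->protocol;
            unsigned int offset = ETH_HLEN, depth = VLAN_MAX_DEPTH;

            while (eth_type_vlan(type)) {
                    struct vlan_hdr vhdr, *vh;

                    vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
                    if (!vh || !--depth)
                            return 0;       /* malformed or too deeply nested */

                    type = vh->h_vlan_encapsulated_proto;
                    offset += VLAN_HLEN;
            }

            return type;
    }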

    v2:
    - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)

    Reported-by: Toshiaki Makita
    Reported-by: Daniel Borkmann
    Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     
  • [ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

    There are a couple of places in net/sched/ that check skb->protocol and act
    on the value there. However, in the presence of VLAN tags, the value stored
    in skb->protocol can be inconsistent based on whether VLAN acceleration is
    enabled. The commit quoted in the Fixes tag below fixed the users of
    skb->protocol to use a helper that will always see the VLAN ethertype.

    However, most of the callers don't actually handle the VLAN ethertype, but
    expect to find the IP header type in the protocol field. This means that
    things like changing the ECN field, or parsing diffserv values, stops
    working if there's a VLAN tag, or if there are multiple nested VLAN
    tags (QinQ).

    To fix this, change the helper to take an argument that indicates whether
    the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
    make sure to skip all of them, so behaviour is consistent even in QinQ
    mode.

    To make the helper usable from the ECN code, move it to if_vlan.h instead
    of pkt_sched.h.
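
    A sketch of the helper with its new argument: ECN/diffserv-style callers
    pass skip_vlan = true and always see the L3 ethertype, even for QinQ,
    while callers that really want the VLAN ethertype pass false:

    /* Sketch, simplified from the helper described above. */
    static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
    {
            if (!skip_vlan)
                    /* VLAN acceleration moves the tag out of the payload. */
                    return skb_vlan_tag_present(skb) ? skb->vlan_proto
                                                     : skb->protocol;

            return __vlan_get_protocol(skb);    /* bounded VLAN-tag walk */
    }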

    v3:
    - Remove empty lines
    - Move vlan variable definitions inside loop in skb_protocol()
    - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
    bpf_skb_ecn_set_ce()

    v2:
    - Use eth_type_vlan() helper in skb_protocol()
    - Also fix code that reads skb->protocol directly
    - Change a couple of 'if/else if' statements to switch constructs to avoid
    calling the helper twice

    Reported-by: Ilya Ponetayev
    Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     
  • [ Upstream commit 394de110a73395de2ca4516b0de435e91b11b604 ]

    The packets from tunnel devices (e.g. bareudp) may have only
    metadata in the dst pointer of the skb. Hence a pointer check of
    neigh_lookup is needed in dst_neigh_lookup_skb.

    The kernel crashes when packets from a bareudp device are processed in
    the kernel neighbour subsystem.

    [ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [ 133.385240] #PF: supervisor instruction fetch in kernel mode
    [ 133.385828] #PF: error_code(0x0010) - not-present page
    [ 133.386603] PGD 0 P4D 0
    [ 133.386875] Oops: 0010 [#1] SMP PTI
    [ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15
    [ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 133.391076] RIP: 0010:0x0
    [ 133.392401] Code: Bad RIP value.
    [ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
    [ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
    [ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
    [ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
    [ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
    [ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
    [ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
    [ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
    [ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 133.404933] Call Trace:
    [ 133.405169]
    [ 133.405367] __neigh_update+0x5a4/0x8f0
    [ 133.405734] arp_process+0x294/0x820
    [ 133.406076] ? __netif_receive_skb_core+0x866/0xe70
    [ 133.406557] arp_rcv+0x129/0x1c0
    [ 133.406882] __netif_receive_skb_one_core+0x95/0xb0
    [ 133.407340] process_backlog+0xa7/0x150
    [ 133.407705] net_rx_action+0x2af/0x420
    [ 133.408457] __do_softirq+0xda/0x2a8
    [ 133.408813] asm_call_on_stack+0x12/0x20
    [ 133.409290]
    [ 133.409519] do_softirq_own_stack+0x39/0x50
    [ 133.410036] do_softirq+0x50/0x60
    [ 133.410401] __local_bh_enable_ip+0x50/0x60
    [ 133.410871] ip_finish_output2+0x195/0x530
    [ 133.411288] ip_output+0x72/0xf0
    [ 133.411673] ? __ip_finish_output+0x1f0/0x1f0
    [ 133.412122] ip_send_skb+0x15/0x40
    [ 133.412471] raw_sendmsg+0x853/0xab0
    [ 133.412855] ? insert_pfn+0xfe/0x270
    [ 133.413827] ? vvar_fault+0xec/0x190
    [ 133.414772] sock_sendmsg+0x57/0x80
    [ 133.415685] __sys_sendto+0xdc/0x160
    [ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0
    [ 133.417679] ? __audit_syscall_exit+0x1d9/0x280
    [ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0
    [ 133.419819] __x64_sys_sendto+0x24/0x30
    [ 133.420848] do_syscall_64+0x4d/0x90
    [ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 133.422833] RIP: 0033:0x7fe013689c03
    [ 133.423749] Code: Bad RIP value.
    [ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
    [ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
    [ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
    [ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
    [ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
    [ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
    [ 133.444045] CR2: 0000000000000000
    [ 133.445082] ---[ end trace f4aeee1958fd1638 ]---
    [ 133.446236] RIP: 0010:0x0
    [ 133.447180] Code: Bad RIP value.
    [ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
    [ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
    [ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
    [ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
    [ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
    [ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
    [ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
    [ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
    [ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt
    [ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
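
    A sketch of the guarded lookup described above: metadata-only dsts from
    tunnel devices provide no neigh_lookup operation, so the pointer has to
    be checked before the call.

    /* Sketch only -- simplified from the description above. */
    static inline struct neighbour *
    dst_neigh_lookup_skb(const struct dst_entry *dst, struct sk_buff *skb)
    {
            struct neighbour *n = NULL;

            if (dst->ops->neigh_lookup)
                    n = dst->ops->neigh_lookup(dst, skb, NULL);

            return IS_ERR(n) ? NULL : n;
    }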

    Fixes: aaa0c23cb901 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
    Signed-off-by: Martin Varghese
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin Varghese
     
  • [ Upstream commit 1e82a62fec613844da9e558f3493540a5b7a7b67 ]

    A potential deadlock can occur during registering or unregistering a
    new generic netlink family between the main nl_table_lock and the
    cb_lock where each thread wants the lock held by the other, as
    demonstrated below.

    1) Thread 1 is performing a netlink_bind() operation on a socket. As part
    of this call, it will call netlink_lock_table(), incrementing the
    nl_table_users count to 1.
    2) Thread 2 is registering (or unregistering) a genl_family via the
    genl_(un)register_family() API. The cb_lock semaphore will be taken for
    writing.
    3) Thread 1 will call genl_bind() as part of the bind operation to handle
    subscribing to GENL multicast groups at the request of the user. It will
    attempt to take the cb_lock semaphore for reading, but it will fail and
    be scheduled away, waiting for Thread 2 to finish the write.
    4) Thread 2 will call netlink_table_grab() during the (un)registration
    call. However, as Thread 1 has incremented nl_table_users, it will not
    be able to proceed, and both threads will be stuck waiting for the
    other.

    genl_bind() is a noop, unless a genl_family implements the mcast_bind()
    function to handle setting up family-specific multicast operations. Since
    no one in-tree uses this functionality as Cong pointed out, simply removing
    the genl_bind() function will remove the possibility for deadlock, as there
    is no attempt by Thread 1 above to take the cb_lock semaphore.

    Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
    Suggested-by: Cong Wang
    Acked-by: Johannes Berg
    Reported-by: kernel test robot
    Signed-off-by: Sean Tranchetti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sean Tranchetti
     

16 Jul, 2020

3 commits

  • commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

    When evaluating access control over kallsyms visibility, credentials at
    open() time need to be used, not the "current" creds (though in BPF's
    case, this has likely always been the same). Plumb access to associated
    file->f_cred down through bpf_dump_raw_ok() and its callers now that
    kallsysm_show_value() has been refactored to take struct cred.

    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: bpf@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 160251842cd35a75edfb0a1d76afa3eb674ff40a upstream.

    In order to perform future tests against the cred saved during open(),
    switch kallsyms_show_value() to operate on a cred, and have all current
    callers pass current_cred(). This makes it very obvious where callers
    are checking the wrong credential in their "read" contexts. These will
    be fixed in the coming patches.

    Additionally switch return value to bool, since it is always used as a
    direct permission check, not a 0-on-success, negative-on-error style
    function return.
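
    A sketch of the refactored interface and a typical caller (the wrapper
    name below is hypothetical):

    bool kallsyms_show_value(const struct cred *cred);

    /* A caller now checks the credentials captured at open() time
     * instead of implicitly using current_cred() in its read path. */
    static unsigned long maybe_censor(unsigned long addr, const struct cred *cred)
    {
            return kallsyms_show_value(cred) ? addr : 0UL;
    }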

    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • [ Upstream commit f79a732a8325dfbd570d87f1435019d7e5501c6d ]

    On partial_drain completion we should be in the SNDRV_PCM_STATE_RUNNING
    state, so set that for partially draining streams in
    snd_compr_drain_notify(), and use a flag to mark such streams.

    While at it, add locking around the stream state change in
    snd_compr_drain_notify() as well.

    Fixes: f44f2a5417b2 ("ALSA: compress: fix drain calls blocking other compress functions (v6)")
    Reviewed-by: Srinivas Kandagatla
    Tested-by: Srinivas Kandagatla
    Reviewed-by: Charles Keepax
    Tested-by: Charles Keepax
    Signed-off-by: Vinod Koul
    Link: https://lore.kernel.org/r/20200629134737.105993-4-vkoul@kernel.org
    Signed-off-by: Takashi Iwai
    Signed-off-by: Sasha Levin

    Vinod Koul
     

09 Jul, 2020

1 commit

  • commit 34c86f4c4a7be3b3e35aa48bd18299d4c756064d upstream.

    The locking in af_alg_release_parent is broken as the BH socket
    lock can only be taken if there is a code-path to handle the case
    where the lock is owned by process-context. Instead of adding
    such handling, we can fix this by changing the ref counts to
    atomic_t.

    This patch also modifies the main refcnt to include both normal
    and nokey sockets. This way we don't have to fudge the nokey
    ref count when a socket changes from nokey to normal.
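
    A minimal sketch of the release path with an atomic_t refcount (field
    names follow the description; the nokey accounting is omitted), which
    removes the need to take the BH socket lock here:

    /* Sketch only. */
    void af_alg_release_parent(struct sock *sk)
    {
            struct alg_sock *ask = alg_sk(sk);

            sk = ask->parent;
            ask = alg_sk(sk);

            if (atomic_dec_and_test(&ask->refcnt))
                    sock_put(sk);
    }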

    Credits go to Mauricio Faria de Oliveira who diagnosed this bug
    and sent a patch for it:

    https://lore.kernel.org/linux-crypto/20200605161657.535043-1-mfo@canonical.com/

    Reported-by: Brian Moyles
    Reported-by: Mauricio Faria de Oliveira
    Fixes: 37f96694cf73 ("crypto: af_alg - Use bh_lock_sock in...")
    Cc:
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Herbert Xu
     

01 Jul, 2020

6 commits

  • [ Upstream commit 97dd1abd026ae4e6a82fa68645928404ad483409 ]

    qed_chain_get_element_left{,_u32} returned 0 when the difference
    between the producer and consumer page count was equal to the total
    page count.
    Fix this by conditionally (instead of unconditionally) expanding the
    producer value. This allows eliminating the normalization against the
    total page count, which was the cause of this bug.

    Misc: replace open-coded constants with common defines.

    Fixes: a91eb52abb50 ("qed: Revisit chain implementation")
    Signed-off-by: Alexander Lobakin
    Signed-off-by: Igor Russkikh
    Signed-off-by: Michal Kalderon
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Alexander Lobakin
     
  • [ Upstream commit 7dfc06a0f25b593a9f51992f540c0f80a57f3629 ]

    It is possible that the first event in the event log is not actually a
    log header at all, but rather a normal event. This leads to the cast in
    __calc_tpm2_event_size being an invalid conversion, which means that
    the values read are effectively garbage. Depending on the first event's
    contents, this leads either to apparently normal behaviour, a crash or
    a freeze.

    While this behaviour of the firmware is not in accordance with the
    TCG Client EFI Specification, this happens on a Dell Precision 5510
    with the TPM enabled but hidden from the OS ("TPM On" disabled, state
    otherwise untouched). The EFI firmware claims that the TPM is present
    and active and that it supports the TCG 2.0 event log format.

    Fortunately, this can be worked around by simply checking the header
    of the first event and the event log header signature itself.

    Commit b4f1874c6216 ("tpm: check event log version before reading final
    events") addressed a similar issue also found on Dell models.

    Fixes: 6b0326190205 ("efi: Attempt to get the TCG2 event log in the boot stub")
    Signed-off-by: Fabian Vogt
    Link: https://lore.kernel.org/r/1927248.evlx2EsYKh@linux-e202.suse.de
    Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1165773
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Sasha Levin

    Fabian Vogt
     
  • [ Upstream commit 94579ac3f6d0820adc83b5dc5358ead0158101e9 ]

    During IPsec performance testing, we see bad ICMP checksums. The bad packet
    has a duplicated ESP trailer due to double validate_xmit_xfrm calls. The first
    call is from ip_output, but the packet cannot be sent because
    netif_xmit_frozen_or_stopped is true and the packet gets requeued via
    dev_requeue_skb. The second call is from the NET_TX softirq. However, after
    the first call, the packet already has the ESP trailer.

    Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
    validate_xmit_xfrm to avoid duplicate ESP trailer insertion.

    Fixes: f6e27114a60a ("net: Add a xfrm validate function to validate_xmit_skb")
    Signed-off-by: Huy Nguyen
    Reviewed-by: Boris Pismenny
    Reviewed-by: Raed Salem
    Reviewed-by: Saeed Mahameed
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Huy Nguyen
     
  • [ Upstream commit 471e39df96b9a4c4ba88a2da9e25a126624d7a9c ]

    If a socket is set ipv6only, it will still send IPv4 addresses in the
    INIT and INIT_ACK packets. This potentially misleads the peer into using
    them, which then would cause association termination.

    The fix is to not add IPv4 addresses to ipv6only sockets.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: Corey Minyard
    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Corey Minyard
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Marcelo Ricardo Leitner
     
  • [ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]

    Clearing the sock TX queue in sk_set_socket() might cause unexpected
    out-of-order transmit when called from sock_orphan(), as outstanding
    packets can pick a different TX queue and bypass the ones already queued.

    This is undesired in general. More specifically, it breaks the in-order
    scheduling property guarantee for device-offloaded TLS sockets.

    Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
    explicitly only where needed.

    Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
    Signed-off-by: Tariq Toukan
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tariq Toukan
     
  • [ Upstream commit fb7861d14c8d7edac65b2fcb6e8031cb138457b2 ]

    In the current code, ->ndo_start_xmit() can be executed recursively only
    10 times because of stack memory.
    But in the case of vxlan, a recursion limit of 10 results in
    a stack overflow.
    The nesting of interfaces is already limited to a depth of 8,
    and there is no critical reason for the recursion limit to be 10,
    so it makes sense to use the same value as the interface nesting
    depth limit.

    Test commands:
    ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
    ip link set vxlan10 up
    ip a a 192.168.10.1/24 dev vxlan10
    ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent

    for i in {9..0}
    do
    let A=$i+1
    ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
    ip link set vxlan$i up
    ip a a 192.168.$i.1/24 dev vxlan$i
    ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
    bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
    done
    hping3 192.168.10.2 -2 -d 60000

    Splat looks like:
    [ 103.814237][ T1127] =============================================================================
    [ 103.871955][ T1127] BUG kmalloc-2k (Tainted: G B ): Padding overwritten. 0x00000000897a2e4f-0x000
    [ 103.873187][ T1127] -----------------------------------------------------------------------------
    [ 103.873187][ T1127]
    [ 103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
    [ 103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G B 5.7.0+ #575
    [ 103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 103.883006][ T1127] Call Trace:
    [ 103.883324][ T1127] dump_stack+0x96/0xdb
    [ 103.883716][ T1127] slab_err+0xad/0xd0
    [ 103.884106][ T1127] ? _raw_spin_unlock+0x1f/0x30
    [ 103.884620][ T1127] ? get_partial_node.isra.78+0x140/0x360
    [ 103.885214][ T1127] slab_pad_check.part.53+0xf7/0x160
    [ 103.885769][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.886316][ T1127] check_slab+0x97/0xb0
    [ 103.886763][ T1127] alloc_debug_processing+0x84/0x1a0
    [ 103.887308][ T1127] ___slab_alloc+0x5a5/0x630
    [ 103.887765][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.888265][ T1127] ? lock_downgrade+0x730/0x730
    [ 103.888762][ T1127] ? pskb_expand_head+0x110/0xe10
    [ 103.889244][ T1127] ? __slab_alloc+0x3e/0x80
    [ 103.889675][ T1127] __slab_alloc+0x3e/0x80
    [ 103.890108][ T1127] __kmalloc_node_track_caller+0xc7/0x420
    [ ... ]

    Fixes: 11a766ce915f ("net: Increase xmit RECURSION_LIMIT to 10.")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     

24 Jun, 2020

5 commits

  • commit 9b38cc704e844e41d9cf74e647bff1d249512cb3 upstream.

    Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
    My test was also able to trigger lockdep output:

    ============================================
    WARNING: possible recursive locking detected
    5.6.0-rc6+ #6 Not tainted
    --------------------------------------------
    sched-messaging/2767 is trying to acquire lock:
    ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

    but task is already holding lock:
    ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(kretprobe_table_locks[i].lock));
    lock(&(kretprobe_table_locks[i].lock));

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by sched-messaging/2767:
    #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    stack backtrace:
    CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
    Call Trace:
    dump_stack+0x96/0xe0
    __lock_acquire.cold.57+0x173/0x2b7
    ? native_queued_spin_lock_slowpath+0x42b/0x9e0
    ? lockdep_hardirqs_on+0x590/0x590
    ? __lock_acquire+0xf63/0x4030
    lock_acquire+0x15a/0x3d0
    ? kretprobe_hash_lock+0x52/0xa0
    _raw_spin_lock_irqsave+0x36/0x70
    ? kretprobe_hash_lock+0x52/0xa0
    kretprobe_hash_lock+0x52/0xa0
    trampoline_handler+0xf8/0x940
    ? kprobe_fault_handler+0x380/0x380
    ? find_held_lock+0x3a/0x1c0
    kretprobe_trampoline+0x25/0x50
    ? lock_acquired+0x392/0xbc0
    ? _raw_spin_lock_irqsave+0x50/0x70
    ? __get_valid_kprobe+0x1f0/0x1f0
    ? _raw_spin_unlock_irqrestore+0x3b/0x40
    ? finish_task_switch+0x4b9/0x6d0
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70

    The code within the kretprobe handler checks for probe reentrancy,
    so we won't trigger any _raw_spin_lock_irqsave probe in there.

    The problem is outside the kretprobe handler, in kprobe_flush_task, where we call:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave

    where _raw_spin_lock_irqsave triggers the kretprobe and installs
    kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

    The kretprobe_trampoline handler is then executed with the
    kretprobe_table_locks already locked, and the first thing it does is
    lock kretprobe_table_locks ;-) The whole lockup path looks like:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

    ---> kretprobe_table_locks locked

    kretprobe_trampoline
    trampoline_handler
    kretprobe_hash_lock(current, &head, &flags);
    Cc: "Gustavo A . R . Silva"
    Cc: Anders Roxell
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Reported-by: "Ziqian SUN (Zamir)"
    Acked-by: Masami Hiramatsu
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     
  • [ Upstream commit 15b81ce5abdc4b502aa31dff2d415b79d2349d2f ]

    For optimized block readers not holding a mutex, the "number of sectors"
    64-bit value is protected from tearing on 32-bit architectures by a
    sequence counter.

    Disable preemption before entering that sequence counter's write side
    critical section. Otherwise, the read side can preempt the write side
    section and spin for the entire scheduler tick. If the reader belongs to
    a real-time scheduling class, it can spin forever and the kernel will
    livelock.
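
    A sketch of the write-side pattern on 32-bit SMP (field names as in the
    partition code of that era; simplified):

    /* Readers spin while the sequence count is odd, so the writer must
     * not be preempted inside the critical section. */
    preempt_disable();
    write_seqcount_begin(&part->nr_sects_seq);
    part->nr_sects = size;
    write_seqcount_end(&part->nr_sects_seq);
    preempt_enable();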

    Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
    Cc:
    Signed-off-by: Ahmed S. Darwish
    Reviewed-by: Sebastian Andrzej Siewior
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Ahmed S. Darwish
     
  • [ Upstream commit 7f6225e446cc8dfa4c3c7959a4de3dd03ec277bf ]

    __jbd2_journal_abort_hard() is no longer used, so now we can merge
    __jbd2_journal_abort_hard() and __journal_abort_soft() into
    jbd2_journal_abort() and remove the two of them.

    Signed-off-by: zhangyi (F)
    Reviewed-by: Jan Kara
    Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    zhangyi (F)
     
  • [ Upstream commit b5292111de9bb70cba3489075970889765302136 ]

    Commit 130f4caf145c ("libata: Ensure ata_port probe has completed before
    detach") may cause a system freeze during suspend.

    Using async_synchronize_full() in PM callbacks is wrong, since async
    callbacks that are already scheduled may wait for not-yet-scheduled
    callbacks, causing a circular dependency.

    Instead of using a big hammer like async_synchronize_full(), use an async
    cookie to make sure the port probe is synced, without affecting other
    scheduled PM callbacks.

    Fixes: 130f4caf145c ("libata: Ensure ata_port probe has completed before detach")
    Suggested-by: John Garry
    Signed-off-by: Kai-Heng Feng
    Tested-by: John Garry
    BugLink: https://bugs.launchpad.net/bugs/1867983
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Kai-Heng Feng
     
  • [ Upstream commit cc7eac1e4afdd151085be4d0341a155760388653 ]

    EHCI/OHCI controllers on R-Car Gen3 SoCs can, very rarely, get stuck
    after a full-/low-speed USB device is disconnected. To detect and
    recover from such a situation, the controllers require a special
    procedure which polls the EHCI PORTSC register and changes the OHCI
    functional state.

    So, this patch adds a polling timer to the ehci-platform driver; if the
    ehci driver detects the issue via the EHCI PORTSC register, it removes
    the companion device (the OHCI controller) to change the OHCI
    functional state to USB Reset once, and then adds the companion device
    again.

    Signed-off-by: Yoshihiro Shimoda
    Acked-by: Alan Stern
    Link: https://lore.kernel.org/r/1580114262-25029-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Sasha Levin

    Yoshihiro Shimoda