09 Mar, 2019

2 commits

  • init_data_structures_once() is called for the first time before RCU has
    been initialized. Make sure that init_rcu_head() is called before the
    RCU head is used and after RCU has been initialized.
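
    A minimal sketch of the resulting pattern (illustrative only; the
    structure and function names below are hypothetical, not taken from
    the patch):

        #include <linux/init.h>
        #include <linux/rcupdate.h>

        struct foo {
                struct rcu_head rcu;    /* eventually passed to call_rcu() */
        };

        static struct foo static_foo;   /* exists before rcu_init() runs */

        /* Initcalls run after rcu_init(), so the debug-objects tracking
         * behind init_rcu_head() is available at this point. */
        static int __init foo_rcu_setup(void)
        {
                init_rcu_head(&static_foo.rcu);
                return 0;
        }
        early_initcall(foo_rcu_setup);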

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: longman@redhat.com
    Link: https://lkml.kernel.org/r/c20aa0f0-42ab-a884-d931-7d4ec2bf0cdc@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Clang warns about a tentative array definition without a length:

    kernel/locking/lockdep.c:845:12: error: tentative array definition assumed to have one element [-Werror]

    There is no real reason to do this here, so just set the same length as
    in the real definition later in the same file. It has to be hidden in
    an #ifdef or annotated __maybe_unused though, to avoid the unused-variable
    warning if CONFIG_PROVE_LOCKING is disabled.
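
    A condensed illustration of the warning and the fix (a sketch, not
    the actual lockdep declarations):

        #define NR_ENTRIES 16

        /*
         * Before: a length-less tentative definition, which Clang rejects
         * ("tentative array definition assumed to have one element"):
         *
         *     static int entries[];
         *
         * After: give the forward declaration the same length as the real
         * definition, and hide it when its only users are compiled out so
         * it cannot trigger an unused-variable warning (alternatively,
         * annotate it __maybe_unused):
         */
        #ifdef CONFIG_PROVE_LOCKING
        static int entries[NR_ENTRIES];
        #endif

        /* ... code that uses entries[] under CONFIG_PROVE_LOCKING ... */

        #ifdef CONFIG_PROVE_LOCKING
        static int entries[NR_ENTRIES] = { 0 };    /* the real definition */
        #endif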

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Bart Van Assche
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Jiri Olsa
    Cc: Joel Fernandes (Google)
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stephane Eranian
    Cc: Steven Rostedt (VMware)
    Cc: Tetsuo Handa
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Waiman Long
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/20190307075222.3424524-1-arnd@arndb.de
    Signed-off-by: Ingo Molnar

    Arnd Bergmann
     

06 Mar, 2019

2 commits

  • Pull perf updates from Ingo Molnar:
    "Lots of tooling updates - too many to list, here's a few highlights:

    - Various subcommand updates to 'perf trace', 'perf report', 'perf
    record', 'perf annotate', 'perf script', 'perf test', etc.

    - CPU and NUMA topology and affinity handling improvements,

    - HW tracing and HW support updates:
       - Intel PT updates
       - ARM CoreSight updates
       - vendor HW event updates

    - BPF updates

    - Tons of infrastructure updates, both on the build system and the
    library support side

    - Documentation updates.

    - ... and lots of other changes, see the changelog for details.

    Kernel side updates:

    - Tighten up kprobes blacklist handling, reduce the number of places
    where developers can install a kprobe and hang/crash the system.

    - Fix/enhance vma address filter handling.

    - Various PMU driver updates, small fixes and additions.

    - refcount_t conversions

    - BPF updates

    - error code propagation enhancements

    - misc other changes"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
    perf script python: Add Python3 support to syscall-counts-by-pid.py
    perf script python: Add Python3 support to syscall-counts.py
    perf script python: Add Python3 support to stat-cpi.py
    perf script python: Add Python3 support to stackcollapse.py
    perf script python: Add Python3 support to sctop.py
    perf script python: Add Python3 support to powerpc-hcalls.py
    perf script python: Add Python3 support to net_dropmonitor.py
    perf script python: Add Python3 support to mem-phys-addr.py
    perf script python: Add Python3 support to failed-syscalls-by-pid.py
    perf script python: Add Python3 support to netdev-times.py
    perf tools: Add perf_exe() helper to find perf binary
    perf script: Handle missing fields with -F +..
    perf data: Add perf_data__open_dir_data function
    perf data: Add perf_data__(create_dir|close_dir) functions
    perf data: Fail check_backup in case of error
    perf data: Make check_backup work over directories
    perf tools: Add rm_rf_perf_data function
    perf tools: Add pattern name checking to rm_rf
    perf tools: Add depth checking to rm_rf
    perf data: Add global path holder
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The biggest part of this tree is the new auto-generated atomics API
    wrappers by Mark Rutland.

    The primary motivation was to allow instrumentation without uglifying
    the primary source code.

    The linecount increase comes from adding the auto-generated files to
    the Git space as well:

    include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++--
    include/asm-generic/atomic-long.h         | 1174 ++++++++++---
    include/linux/atomic-fallback.h           | 2295 +++++++++++++++++++++++++
    include/linux/atomic.h                    | 1241 +------------

    I preferred this approach, so that the full call stack of the (already
    complex) locking APIs is still fully visible in 'git grep'.

    But if this is excessive we could certainly hide them.

    There's a separate build-time mechanism to determine whether the
    headers are out of date (they should never be stale if we do our job
    right).

    Anyway, nothing from this should be visible to regular kernel
    developers.

    Other changes:

    - Add support for dynamic keys, which removes a source of false
    positives in the workqueue code, among other things (Bart Van
    Assche)

    - Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)

    - qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)

    - misc other updates and enhancements"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
    locking/lockdep: Shrink struct lock_class_key
    locking/lockdep: Add module_param to enable consistency checks
    lockdep/lib/tests: Test dynamic key registration
    lockdep/lib/tests: Fix run_tests.sh
    kernel/workqueue: Use dynamic lockdep keys for workqueues
    locking/lockdep: Add support for dynamic keys
    locking/lockdep: Verify whether lock objects are small enough to be used as class keys
    locking/lockdep: Check data structure consistency
    locking/lockdep: Reuse lock chains that have been freed
    locking/lockdep: Fix a comment in add_chain_cache()
    locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
    locking/lockdep: Reuse list entries that are no longer in use
    locking/lockdep: Free lock classes that are no longer in use
    locking/lockdep: Update two outdated comments
    locking/lockdep: Make it easy to detect whether or not inside a selftest
    locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
    locking/lockdep: Initialize the locks_before and locks_after lists earlier
    locking/lockdep: Make zap_class() remove all matching lock order entries
    locking/lockdep: Reorder struct lock_class members
    locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
    ...

    Linus Torvalds
     

28 Feb, 2019

21 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • And move the whole lot under CONFIG_DEBUG_LOCKDEP.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • A shortcoming of the current lockdep implementation is that it requires
    lock keys to be allocated statically. That forces all instances of lock
    objects that occur in a given data structure to share a lock key. Since
    lock dependency analysis groups lock objects per key, sharing lock
    keys can cause false positive lockdep reports. Make it possible to avoid
    such false positive reports by allowing lock keys to be allocated
    dynamically. Require that dynamically allocated lock keys are
    registered before use by calling lockdep_register_key(). Complain about
    attempts to register the same lock key pointer twice without calling
    lockdep_unregister_key() between successive registration calls.

    The purpose of the new lock_keys_hash[] data structure that keeps
    track of all dynamic keys is twofold:

    - Verify whether the lockdep_register_key() and lockdep_unregister_key()
    functions are used correctly.

    - Prevent lockdep_init_map() from complaining when it encounters a
    dynamically allocated key.
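
    A minimal usage sketch of the new interface (illustrative; 'struct foo'
    and the helpers around it are hypothetical, only the lockdep calls are
    real):

        #include <linux/lockdep.h>
        #include <linux/slab.h>
        #include <linux/spinlock.h>

        struct foo {
                spinlock_t lock;
                struct lock_class_key key;      /* dynamically allocated key */
        };

        static struct foo *foo_create(void)
        {
                struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

                if (!f)
                        return NULL;
                lockdep_register_key(&f->key);          /* must precede use */
                spin_lock_init(&f->lock);
                lockdep_set_class(&f->lock, &f->key);   /* per-instance class */
                return f;
        }

        static void foo_destroy(struct foo *f)
        {
                lockdep_unregister_key(&f->key);        /* before freeing the key */
                kfree(f);
        }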

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-19-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-18-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Debugging lockdep data structure inconsistencies is challenging. Add
    code that verifies data structure consistency at runtime. That code is
    disabled by default because it is very CPU intensive.
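
    The general shape of such a runtime check, sketched with a small
    list-membership helper (illustrative; names and details are assumptions
    rather than the exact patch):

        #include <linux/list.h>

        /* Returns true if list entry @e occurs on the list with head @h. */
        static bool in_list(struct list_head *e, struct list_head *h)
        {
                struct list_head *f;

                list_for_each(f, h) {
                        if (e == f)
                                return true;
                }
                return false;
        }

    Checks built on helpers like this walk all lock classes and verify, for
    example, that every lock order entry appears on the list of the class it
    claims to belong to.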

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-17-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • A previous patch introduced a lock chain leak. Fix that leak by reusing
    lock chains that have been freed.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-16-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Reflect that add_chain_cache() is always called with the graph lock held.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-15-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • This patch does not change any functionality but makes the next patch in
    this series easier to read.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-14-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Instead of abandoning elements of list_entries[] that are no longer in
    use, make alloc_list_entry() reuse array elements that have been freed.
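
    A condensed sketch of the reuse scheme, with free slots tracked in a
    bitmap (illustrative; the real function additionally warns and turns
    lockdep off when the array is exhausted):

        static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
        static DECLARE_BITMAP(list_entries_in_use, MAX_LOCKDEP_ENTRIES);

        static struct lock_list *alloc_list_entry(void)
        {
                int idx = find_first_zero_bit(list_entries_in_use,
                                              ARRAY_SIZE(list_entries));

                if (idx >= ARRAY_SIZE(list_entries))
                        return NULL;            /* all entries in use */
                __set_bit(idx, list_entries_in_use);
                return list_entries + idx;
        }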

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-13-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
    Instead of leaving lock classes that are no longer in use in the
    lock_classes array, reuse entries from that array that are no longer
    in use. Maintain a linked list of free lock classes with list head
    'free_lock_classes'. Only add freed lock classes to that list after
    a grace period, to avoid a lock_classes[] element being reused while
    an RCU reader is still accessing it. Since the lockdep selftests run
    in a context where sleeping is not allowed, and since the selftests
    require that lock resetting/zapping works with debug_locks off, make
    the behavior of lockdep_free_key_range() and lockdep_reset_lock()
    depend on whether or not these are called from the context of the
    lockdep selftests.

    Thanks to Peter for having shown how to modify get_pending_free()
    such that that function does not have to sleep.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-12-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • synchronize_sched() has been removed recently. Update the comments that
    refer to synchronize_sched().

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Fixes: 51959d85f32d ("lockdep: Replace synchronize_sched() with synchronize_rcu()") # v5.0-rc1
    Link: https://lkml.kernel.org/r/20190214230058.196511-11-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • The patch that frees unused lock classes will modify the behavior of
    lockdep_free_key_range() and lockdep_reset_lock() depending on whether
    or not these functions are called from the context of the lockdep
    selftests. Hence make it easy to detect whether or not lockdep code
    is called from the context of a lockdep selftest.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-10-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • This patch does not change the behavior of these functions but makes the
    patch that frees unused lock classes easier to read.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-9-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
    This patch does not change any functionality. A later patch will
    reuse lock classes that have been freed. In combination with that
    patch, this patch will have the effect of initializing lock class
    order lists once instead of every time a lock class structure is
    reinitialized.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-8-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Make sure that all lock order entries that refer to a class are removed
    from the list_entries[] array when a kernel module is unloaded.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-7-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Make sure that add_chain_cache() returns 0 and does not modify the
    chain hash if nr_chain_hlocks == MAX_LOCKDEP_CHAIN_HLOCKS before this
    function is called.
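
    The intent, in condensed form (a sketch; the real function also emits
    a diagnostic and disables lockdep when the table is exhausted):

        /* Bail out before the chain is added to the hash when the
         * chain_hlocks[] table has no room left for this chain: */
        if (nr_chain_hlocks + chain->depth > MAX_LOCKDEP_CHAIN_HLOCKS)
                return 0;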

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-5-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • Lock chains are only tracked with CONFIG_PROVE_LOCKING=y. Do not report
    the memory required for the lock chain array if CONFIG_PROVE_LOCKING=n.
    See also commit:

    ca58abcb4a6d ("lockdep: sanitise CONFIG_PROVE_LOCKING")

    Include the size of the chain_hlocks[] array.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-4-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
    Change the sizeof(array element type) * (array size) expressions into
    sizeof(array). This fixes the size computations of the
    classhash_table[] and chainhash_table[] arrays.

    The reason is that commit:

    a63f38cc4ccf ("locking/lockdep: Convert hash tables to hlists")

    changed the type of the elements of those arrays from 'struct
    list_head' into 'struct hlist_head'.
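
    The difference, in condensed form (sketch):

        static struct hlist_head classhash_table[4096];

        /* Fragile: silently goes stale when the element type changes,
         * e.g. from struct list_head (two pointers) to struct hlist_head
         * (one pointer): */
        size_t fragile = sizeof(struct list_head) * 4096;

        /* Robust: always tracks the actual array definition: */
        size_t robust = sizeof(classhash_table);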

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-3-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
    Use %zu to format size_t instead of %lu, so that the compiler does
    not complain about a mismatch between the format specifier and its
    argument on 32-bit systems.
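
    In condensed form (illustrative values):

        size_t bytes = sizeof(lock_classes);

        /* %zu matches size_t on both 32-bit and 64-bit; %lu would warn
         * on 32-bit, where size_t is unsigned int: */
        printk(" memory used by lock dependency info: %zu kB\n",
               bytes / 1024);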

    Signed-off-by: Bart Van Assche
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Johannes Berg
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: johannes.berg@intel.com
    Cc: tj@kernel.org
    Link: https://lkml.kernel.org/r/20190214230058.196511-2-bvanassche@acm.org
    Signed-off-by: Ingo Molnar

    Bart Van Assche
     
  • With the > 4 nesting levels case handled by the commit:

    d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels")

    the BUG_ON() call in encode_tail() will never actually be triggered.

    Remove it.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1551057253-3231-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

13 Feb, 2019

2 commits

  • …/linux-rcu into core/rcu

    Pull the latest RCU tree from Paul E. McKenney:

    - Additional cleanups after RCU flavor consolidation
    - Grace-period forward-progress cleanups and improvements
    - Documentation updates
    - Miscellaneous fixes
    - spin_is_locked() conversions to lockdep
    - SPDX changes to RCU source and header files
    - SRCU updates
    - Torture-test updates, including nolibc updates and moving
    nolibc to tools/include

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
    Some lockdep functions can be involved in breakpoint handling, and
    probing those functions can cause breakpoint recursion.

    Prohibit probing on those functions by adding them to the kprobes
    blacklist.

    Signed-off-by: Masami Hiramatsu
    Cc: Alexander Shishkin
    Cc: Andrea Righi
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/154998810578.31052.1680977921449292812.stgit@devbox
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

10 Feb, 2019

2 commits


08 Feb, 2019

1 commit

    commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the
    rtmutex") changed the locking rules in the futex code so that the hash
    bucket lock is no longer held while the waiter is enqueued into the
    rtmutex wait list. This made the lock and the unlock path symmetric,
    but unfortunately the possible early exit from __rt_mutex_proxy_start()
    due to a detected deadlock was not updated accordingly. That allows a
    concurrent unlocker to observe inconsistent state, which triggers the
    warning in the unlock path.

    futex_lock_pi()                     futex_unlock_pi()
      lock(hb->lock)
      queue(hb_waiter)                  lock(hb->lock)
      lock(rtmutex->wait_lock)
      unlock(hb->lock)
                                        // acquired hb->lock
                                        hb_waiter = futex_top_waiter()
                                        lock(rtmutex->wait_lock)
      __rt_mutex_proxy_start()
         ---> fail
              remove(rtmutex_waiter);
         ---> returns -EDEADLOCK
      unlock(rtmutex->wait_lock)
                                        // acquired wait_lock
                                        wake_futex_pi()
                                        rt_mutex_next_owner()
                                          --> returns NULL
                                          --> WARN

      lock(hb->lock)
      unqueue(hb_waiter)

    The problem is caused by the remove(rtmutex_waiter) in the failure case of
    __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
    hash bucket but no waiter on the rtmutex, i.e. inconsistent state.

    The original commit handles this correctly for the other early return cases
    (timeout, signal) by delaying the removal of the rtmutex waiter until the
    returning task reacquired the hash bucket lock.

    Treat the failure case of __rt_mutex_proxy_start() in the same way and let
    the existing cleanup code handle the eventual handover of the rtmutex
    gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
    removal for the failure case, so that the other callsites are still
    operating correctly.

    Add proper comments to the code so all these details are fully documented.

    Thanks to Peter for helping with the analysis and writing the really
    valuable code comments.

    Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex")
    Reported-by: Heiko Carstens
    Co-developed-by: Peter Zijlstra
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Tested-by: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: linux-s390@vger.kernel.org
    Cc: Stefan Liebler
    Cc: Sebastian Sewior
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de

    Thomas Gleixner
     

04 Feb, 2019

4 commits

    Track the number of slowpath locking operations that are being done
    without any MCS node available, and rename lock_index[123] to make
    them more descriptive.

    Using these stat counters is one way to find out if a code path is
    being exercised.

    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: James Morse
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: SRINIVAS
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Zhenzhong Duan
    Link: https://lkml.kernel.org/r/1548798828-16156-3-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
    Four queue nodes per CPU are allocated to enable up to 4 nesting
    levels using the per-CPU nodes. Nested NMIs are possible in some
    architectures. Still, it is very unlikely that we will ever hit more
    than 4 nesting levels with contention in the slowpath.

    When that rare condition happens, however, it is likely that the system
    will hang or crash shortly after that. It is not good and we need to
    handle this exception case.

    This is done by spinning directly on the lock using repeated trylock.
    This alternative code path should only be used when there are nested
    NMIs. Assuming that the locks used by those NMI handlers will not be
    heavily contended, simple test-and-set (TAS) locking should work out
    fine.
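
    A condensed sketch of the fallback path (close to, but not necessarily
    identical to, the actual change):

        if (unlikely(idx >= MAX_NODES)) {
                /* No per-CPU MCS node left: fall back to a simple
                 * test-and-set spin on the lock word itself. */
                while (!queued_spin_trylock(lock))
                        cpu_relax();
                goto release;
        }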

    Suggested-by: Peter Zijlstra
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: James Morse
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: SRINIVAS
    Cc: Thomas Gleixner
    Cc: Zhenzhong Duan
    Link: https://lkml.kernel.org/r/1548798828-16156-2-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
    Some users, specifically futexes and rwsems, required fixes that
    allow the callers to be safe when wakeups occur before they are
    expected by wake_up_q(). Such callers also rely on their own
    reference counting and, until now, were depending on wake_q taking
    an additional task reference for them. With the wake_q_add() call
    being moved down, this can no longer be the case. As such we end up
    with double task-refcounting overhead, and these rather core callers
    care about that.

    This patch introduces a wake_q_add_safe() call for callers that have
    already done the refcounting and whose task is therefore 'safe' from
    the wake_q point of view, in that a reference is held throughout the
    entire queue/wakeup cycle. wake_q_add() keeps taking its own internal
    reference, while wake_q_add_safe() consumes the reference the caller
    already holds.
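
    A condensed sketch of how the two variants differ for a caller
    (illustrative; tsk and other_tsk stand for tasks the caller already
    tracks, and the call sites are not the actual rwsem/futex ones):

        DEFINE_WAKE_Q(wake_q);

        /* Plain variant: wake_q takes (and later drops) its own task
         * reference: */
        wake_q_add(&wake_q, tsk);

        /* _safe variant: the caller already holds a reference and hands
         * it over; the wake queue consumes it even if the task has been
         * woken before wake_up_q() runs: */
        get_task_struct(other_tsk);
        wake_q_add_safe(&wake_q, other_tsk);

        wake_up_q(&wake_q);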

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Waiman Long
    Cc: Will Deacon
    Cc: Xie Yongji
    Cc: Yongji Xie
    Cc: andrea.parri@amarulasolutions.com
    Cc: lilin24@baidu.com
    Cc: liuqi16@baidu.com
    Cc: nixun@baidu.com
    Cc: yuanlinsi01@baidu.com
    Cc: zhangyu31@baidu.com
    Link: https://lkml.kernel.org/r/20181218195352.7orq3upiwfdbrdne@linux-r8p5
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
    Tetsuo Handa had reported that he saw an incorrect "downgrading a
    read lock" warning right after a previous lockdep warning. It is
    likely that the previous warning turned off lock debugging, leaving
    lockdep in an inconsistent state that led to the lock downgrade
    warning.

    Fix that by adding a check for debug_locks at the beginning of
    __lock_downgrade().
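
    The shape of the fix, in condensed form (the rest of the function is
    abbreviated):

        static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
        {
                if (unlikely(!debug_locks))
                        return 0;       /* lock debugging already turned off */

                /* ... existing downgrade checks and state update ... */
                return 1;
        }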

    Reported-by: Tetsuo Handa
    Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     

26 Jan, 2019

1 commit

  • Beyond a certain point in the CPU-hotplug offline process, timers get
    stranded on the outgoing CPU, and won't fire until that CPU comes back
    online, which might well be never. This commit therefore adds a hook
    in torture_onoff_init() that is invoked from torture_offline(), which
    rcutorture uses to occasionally wait for a grace period. This should
    result in failures for RCU implementations that rely on stranded timers
    eventually firing in the absence of the CPU coming back online.

    Reported-by: Sebastian Andrzej Siewior
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

21 Jan, 2019

4 commits

    It makes the code more self-explanatory and documents throughout the
    code what each magic number refers to:

    - state (Hardirq/Softirq)
    - direction (used in or enabled above state)
    - read or write

    We can even remove some comments that were compensating for the lack of
    those constant names.
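
    The resulting encoding, in condensed form (a sketch along the lines
    of the patch):

        /* Lowest bit: read/write, next bit: direction (USED_IN vs.
         * ENABLED), remaining bits: the state (hardirq/softirq/...): */
        #define LOCK_USAGE_READ_MASK  1
        #define LOCK_USAGE_DIR_MASK   2
        #define LOCK_USAGE_STATE_MASK (~(LOCK_USAGE_READ_MASK | LOCK_USAGE_DIR_MASK))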

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/1545973321-24422-3-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The enum mark_type appears a bit artificial here. We can directly pass
    the base enum lock_usage_bit value to mark_held_locks(). All we need
    then is to add the read index for each lock if necessary. It makes the
    code clearer.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/1545973321-24422-2-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
    Tetsuo Handa had reported that he saw an incorrect "downgrading a
    read lock" warning right after a previous lockdep warning. It is
    likely that the previous warning turned off lock debugging, leaving
    lockdep in an inconsistent state that led to the lock downgrade
    warning.

    Fix that by adding a check for debug_locks at the beginning of
    __lock_downgrade().

    Debugged-by: Tetsuo Handa
    Reported-by: Tetsuo Handa
    Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
    Signed-off-by: Waiman Long
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • Because wake_q_add() can imply an immediate wakeup (cmpxchg failure
    case), we must not rely on the wakeup being delayed. However, commit:

    e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task")

    relies on exactly that behaviour in that the wakeup must not happen
    until after we clear waiter->task.

    [ peterz: Added changelog. ]

    Signed-off-by: Xie Yongji
    Signed-off-by: Zhang Yu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task")
    Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.com
    Signed-off-by: Ingo Molnar

    Xie Yongji
     

05 Jan, 2019

1 commit