09 Jan, 2021

2 commits

  • [ Upstream commit 31784cff7ee073b34d6eddabb95e3be2880a425c ]

    In preparation for converting exec_update_mutex to a rwsem so that
    multiple readers can execute in parallel and not deadlock, add
    down_read_interruptible. This is needed for perf_event_open to be
    converted (with no semantic changes) from working on a mutex to
    working on a rwsem.
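
    For illustration, a minimal sketch of the interruptible reader-side
    pattern this enables; the rwsem and the caller are stand-ins, not the
    actual perf_event_open() code:

    #include <linux/rwsem.h>

    static int do_read_side_work(struct rw_semaphore *sem)
    {
            int err;

            /* Returns -EINTR if a signal arrives while waiting. */
            err = down_read_interruptible(sem);
            if (err)
                    return err;

            /* Read-side section: multiple readers may run here in parallel. */

            up_read(sem);
            return 0;
    }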

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
    Signed-off-by: Sasha Levin

    Eric W. Biederman
     
  • [ Upstream commit 0f9368b5bf6db0c04afc5454b1be79022a681615 ]

    In preparation for converting exec_update_mutex to a rwsem so that
    multiple readers can execute in parallel and not deadlock, add
    down_read_killable_nested. This is needed so that kcmp_lock
    can be converted from working on mutexes to working on rw_semaphores.
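
    For illustration, a sketch of the kcmp_lock()-style ordered double
    acquisition this enables (simplified; the real kernel/kcmp.c code may
    differ in details):

    #include <linux/kernel.h>
    #include <linux/lockdep.h>
    #include <linux/rwsem.h>

    static int lock_two_rwsems_read(struct rw_semaphore *l1,
                                    struct rw_semaphore *l2)
    {
            int err;

            if (l2 > l1)
                    swap(l1, l2);           /* stable ordering avoids ABBA */

            err = down_read_killable(l1);
            if (!err && l1 != l2) {
                    /* Nested annotation: second instance of the same class. */
                    err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
                    if (err)
                            up_read(l1);
            }
            return err;
    }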

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
    Signed-off-by: Sasha Levin

    Eric W. Biederman
     

17 Nov, 2020

1 commit

  • A warning was hit when running xfstests/generic/068 in a Hyper-V guest:

    [...] ------------[ cut here ]------------
    [...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
    [...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170
    [...] ...
    [...] Workqueue: events pwq_unbound_release_workfn
    [...] RIP: 0010:check_flags.part.0+0x165/0x170
    [...] ...
    [...] Call Trace:
    [...] lock_is_held_type+0x72/0x150
    [...] ? lock_acquire+0x16e/0x4a0
    [...] rcu_read_lock_sched_held+0x3f/0x80
    [...] __send_ipi_one+0x14d/0x1b0
    [...] hv_send_ipi+0x12/0x30
    [...] __pv_queued_spin_unlock_slowpath+0xd1/0x110
    [...] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
    [...] .slowpath+0x9/0xe
    [...] lockdep_unregister_key+0x128/0x180
    [...] pwq_unbound_release_workfn+0xbb/0xf0
    [...] process_one_work+0x227/0x5c0
    [...] worker_thread+0x55/0x3c0
    [...] ? process_one_work+0x5c0/0x5c0
    [...] kthread+0x153/0x170
    [...] ? __kthread_bind_mask+0x60/0x60
    [...] ret_from_fork+0x1f/0x30

    The cause of the problem is that we have the call chain
    lockdep_unregister_key() -> lockdep_unlock() -> arch_spin_unlock() ->
    __pv_queued_spin_unlock_slowpath() -> pv_kick() -> __send_ipi_one() ->
    trace_hyperv_send_ipi_one().

    Although this particular warning is triggered because Hyper-V has a
    trace point in IPI sending, in general arch_spin_unlock() may call
    another function that has a trace point in it, so put arch_spin_lock()
    and arch_spin_unlock() under the lock_recursion protection to fix this
    problem and avoid similar ones.
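
    The shape of the fix, as a sketch (the lock and per-CPU guard names
    here are stand-ins, not copied from kernel/locking/lockdep.c):

    #include <linux/percpu.h>
    #include <linux/spinlock.h>

    static arch_spinlock_t graph_lock_sketch = __ARCH_SPIN_LOCK_UNLOCKED;
    static DEFINE_PER_CPU(unsigned int, lockdep_recursion_sketch);

    static void graph_lock_acquire(void)
    {
            /* IRQs are already disabled by the callers at this point. */
            __this_cpu_inc(lockdep_recursion_sketch);
            arch_spin_lock(&graph_lock_sketch);
    }

    static void graph_lock_release(void)
    {
            /* May end up in a traced path, e.g. a paravirt IPI on unlock. */
            arch_spin_unlock(&graph_lock_sketch);
            /* Drop the recursion guard only after the unlock is done. */
            __this_cpu_dec(lockdep_recursion_sketch);
    }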

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com

    Boqun Feng
     

11 Nov, 2020

1 commit

  • Chris Wilson reported a problem spotted by check_chain_key(): a chain
    key got changed in validate_chain() because we modify the ->read in
    validate_chain() to skip checks for dependency adding, and ->read is
    taken into calculation for chain key since commit f611e8cf98ec
    ("lockdep: Take read/write status in consideration when generate
    chainkey").

    Fix this by not modifying ->read in validate_chain(), based on two
    facts: a) since we now support recursive read lock detection, there is
    no need to skip checks for dependency adding for recursive readers;
    b) given a), there is only one case left (nest_lock) where we want to
    skip checks in validate_chain(), so we simply remove the modification
    of ->read and rely on the return value of check_deadlock() to skip the
    dependency adding.

    Reported-by: Chris Wilson
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com

    Boqun Feng
     

02 Nov, 2020

1 commit

  • Pull locking fixes from Thomas Gleixner:
    "A couple of locking fixes:

    - Fix incorrect failure injection handling in the futex code

    - Prevent a preemption warning in lockdep when tracking
    local_irq_enable() and interrupts are already enabled

    - Remove more raw_cpu_read() usage from lockdep which causes state
    corruption on !X86 architectures.

    - Make the nr_unused_locks accounting in lockdep correct again"

    * tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Fix nr_unused_locks accounting
    locking/lockdep: Remove more raw_cpu_read() usage
    futex: Fix incorrect should_fail_futex() handling
    lockdep: Fix preemption WARN for spurious IRQ-enable

    Linus Torvalds
     

31 Oct, 2020

2 commits

  • Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
    usage_traceoverflow") breaks the nr_unused_locks validation code
    triggered by /proc/lockdep_stats.

    By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
    indicator for accounting nr_unused_locks; simplify by using any first
    bit.

    Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
    Reported-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Chris Wilson
    Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • I initially thought raw_cpu_read() was OK, since if it is !0 we have
    IRQs disabled and can't get migrated, so if we get migrated both CPUs
    must have 0 and it doesn't matter which 0 we read.

    And while that is true, it isn't the whole story: on pretty much all
    architectures (except x86) this can result in computing the address for
    one CPU, getting migrated, the old CPU continuing execution with another
    task (possibly setting recursion) and then the new CPU reading the value
    of the old CPU, which is no longer 0.

    Similar to:

    baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")

    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

22 Oct, 2020

1 commit

  • It is valid (albeit uncommon) to call local_irq_enable() without first
    having called local_irq_disable(). In this case we enter
    lockdep_hardirqs_on*() with IRQs enabled and trip a preemption warning
    for using __this_cpu_read().

    Use this_cpu_read() instead to avoid the warning.
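
    A minimal sketch of the difference, with an illustrative per-CPU
    variable:

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(int, hardirqs_enabled_sketch);

    static int irqs_marked_enabled(void)
    {
            /*
             * this_cpu_read() disables preemption around the access, so it
             * is safe even when IRQs (and thus preemption) are enabled;
             * __this_cpu_read() would trip the preemption debug check here.
             */
            return this_cpu_read(hardirqs_enabled_sketch);
    }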

    Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
    Reported-by: syzbot+53f8ce8bbc07924b6417@syzkaller.appspotmail.com
    Reported-by: kernel test robot
    Signed-off-by: Peter Zijlstra (Intel)

    Peter Zijlstra
     

19 Oct, 2020

1 commit

  • Pull RCU changes from Ingo Molnar:

    - Debugging for smp_call_function()

    - RT raw/non-raw lock ordering fixes

    - Strict grace periods for KASAN

    - New smp_call_function() torture test

    - Torture-test updates

    - Documentation updates

    - Miscellaneous fixes

    [ This doesn't actually pull the tag - I've dropped the last merge from
    the RCU branch due to questions about the series. - Linus ]

    * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    smp: Make symbol 'csd_bug_count' static
    kernel/smp: Provide CSD lock timeout diagnostics
    smp: Add source and destination CPUs to __call_single_data
    rcu: Shrink each possible cpu krcp
    rcu/segcblist: Prevent useless GP start if no CBs to accelerate
    torture: Add gdb support
    rcutorture: Allow pointer leaks to test diagnostic code
    rcutorture: Hoist OOM registry up one level
    refperf: Avoid null pointer dereference when buf fails to allocate
    rcutorture: Properly synchronize with OOM notifier
    rcutorture: Properly set rcu_fwds for OOM handling
    torture: Add kvm.sh --help and update help message
    rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
    torture: Update initrd documentation
    rcutorture: Replace HTTP links with HTTPS ones
    locktorture: Make function torture_percpu_rwsem_init() static
    torture: document --allcpus argument added to the kvm.sh script
    rcutorture: Output number of elapsed grace periods
    rcutorture: Remove KCSAN stubs
    rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
    ...

    Linus Torvalds
     

09 Oct, 2020

4 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Steve reported that lockdep_assert*irq*(), when nested inside lockdep
    itself, will trigger a false-positive.

    One example is the stack-trace code, as called from inside lockdep,
    triggering tracing, which in turn calls RCU, which then uses
    lockdep_assert_irqs_disabled().

    Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
    Reported-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    Basically print_lock_class_header()'s for loop is out of sync with the
    size of ->usage_traces[].

    Also clean things up a bit while at it, to avoid such mishaps in the future.

    Fixes: 23870f122768 ("locking/lockdep: Fix "USED" <- "IN-NMI" inversions")
    Debugged-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Tested-by: Qian Cai
    Link: https://lkml.kernel.org/r/20200930094937.GE2651@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • …k/linux-rcu into core/rcu

    Pull v5.10 RCU changes from Paul E. McKenney:

    - Debugging for smp_call_function().

    - Strict grace periods for KASAN. The point of this series is to find
    RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
    Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
    further disabled by default. Finally, the help text includes
    a goodly list of scary caveats.

    - New smp_call_function() torture test.

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

29 Sep, 2020

1 commit

    Qian Cai reported a BFS_EQUEUEFULL warning [1] after the read-recursive
    deadlock detection was merged into the tip tree recently. Unlike the
    previous lockdep graph searching, which iterated every lock class (every
    node in the graph) exactly once, the graph searching for read-recursive
    deadlock detection needs to iterate every lock dependency (every edge in
    the graph) once. As a result, the maximum memory cost of the circular
    queue changes from O(V), where V is the number of lock classes (nodes or
    vertices) in the graph, to O(E), where E is the number of lock
    dependencies (edges), because every lock class or dependency gets
    enqueued once in the BFS. Therefore we hit the BFS_EQUEUEFULL case.

    However, we don't actually need to enqueue all dependencies for the BFS,
    because every time we enqueue a dependency, we almost always enqueue all
    the other dependencies in the same dependency list ("almost" because we
    currently check before enqueueing, so if a dependency doesn't pass the
    check stage we won't enqueue it; however, we can always do those two
    steps in the reverse order). Based on this, we can enqueue only the
    first dependency from a dependency list, and every time we want to fetch
    a new dependency to work on, we can either:

    1) fetch the dependency next to the current dependency in the
    dependency list,
    or

    2) if the dependency in 1) doesn't exist, fetch a dependency from
    the queue.

    With this approach, the "max bfs queue depth" for an x86_64_defconfig +
    lockdep and selftest config kernel can be decreased from:

    max bfs queue depth: 201

    to (after applying this patch):

    max bfs queue depth: 61

    While I'm at it, clean up the code logic a little (e.g. return directly
    instead of setting a "ret" value and jumping to the "exit" label).
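
    A userspace toy of the queueing trick (simplified data structures, not
    the kernel's lock_list/__bfs() code): only the first edge of a node's
    adjacency list is enqueued, and its siblings are reached by walking
    ->next once that edge is taken off the queue, so the queue holds O(V)
    entries instead of O(E).

    #include <stdio.h>

    struct edge { int to, next; };            /* next: sibling index or -1 */

    /* Graph: 0 -> 1, 0 -> 2, 2 -> 1 */
    static struct edge edges[] = {
            { .to = 1, .next = 1 }, { .to = 2, .next = -1 },
            { .to = 1, .next = -1 },
    };
    static int first_edge[3] = { 0, -1, 2 };  /* first edge per node */
    static int enqueued[3];                   /* each list is enqueued once */

    int main(void)
    {
            int queue[8], head = 0, tail = 0, max_depth = 0;

            queue[tail++] = first_edge[0];    /* start the BFS at node 0 */
            enqueued[0] = 1;

            while (head < tail) {
                    /* Walk all siblings of the dequeued edge first. */
                    for (int e = queue[head++]; e != -1; e = edges[e].next) {
                            int to = edges[e].to;

                            printf("visit edge -> node %d\n", to);
                            if (first_edge[to] != -1 && !enqueued[to]) {
                                    queue[tail++] = first_edge[to];
                                    enqueued[to] = 1;
                            }
                            if (tail - head > max_depth)
                                    max_depth = tail - head;
                    }
            }
            printf("max queue depth: %d\n", max_depth);
            return 0;
    }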

    [1]: https://lore.kernel.org/lkml/17343f6f7f2438fc376125384133c5ba70c2a681.camel@redhat.com/

    Reported-by: Qian Cai
    Reported-by: syzbot+62ebe501c1ce9a91f68c@syzkaller.appspotmail.com
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200917080210.108095-1-boqun.feng@gmail.com

    Boqun Feng
     

16 Sep, 2020

1 commit

  • The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
    that percpu-rwsem is a blocking primitive, should be just fine.

    However, file_end_write() is used from IRQ context and will cause
    load-store issues on architectures where the per-cpu accessors are not
    natively irq-safe.

    Fix it by using the IRQ-safe this_cpu_*() for operations on
    read_count. This will generate more expensive code on a number of
    platforms, which might cause a performance regression for some of the
    other percpu-rwsem users.

    If any such is reported, we can consider alternative solutions.
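
    The shape of the change, as a sketch; the counter below is a stand-in
    for the percpu-rwsem read_count, not the real field:

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned int, read_count_sketch);

    static void reader_enter(void)
    {
            /*
             * this_cpu_inc() is IRQ-safe on every architecture, unlike
             * __this_cpu_inc(), which may be a non-atomic load/modify/store
             * and can be corrupted by an IRQ doing the same update.
             */
            this_cpu_inc(read_count_sketch);
    }

    static void reader_exit(void)
    {
            this_cpu_dec(read_count_sketch);
    }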

    Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
    Signed-off-by: Hou Tao
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Oleg Nesterov
    Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com

    Hou Tao
     

03 Sep, 2020

1 commit

    During the LPC RCU BoF Paul asked how come the "USED" <- "IN-NMI"
    detection doesn't work for rcu_read_lock().

    Looking into this turned up a typo:

    -       if (!(class->usage_mask & LOCK_USED))
    +       if (!(class->usage_mask & LOCKF_USED))

    fixing that will indeed cause rcu_read_lock() to insta-splat :/

    The above typo means that instead of testing for: 0x100 (1 <<
    LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 <<
    LOCK_ENABLED_HARDIRQ).

    So instead of testing for _any_ used lock, it will only match any lock
    used with interrupts enabled.

    The rcu_read_lock() annotation uses .check=0, which means it will not
    set any of the interrupt bits and will thus never match.

    In order to properly fix the situation and allow rcu_read_lock() to
    correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by
    having .read users set USED_READ and test USED, pure read-recursive
    locks are permitted.
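
    A small userspace illustration of the bit confusion and of the split;
    the numeric values follow the text above, except LOCK_USED_READ, which
    is assumed:

    #include <stdio.h>

    enum { LOCK_ENABLED_HARDIRQ = 3, LOCK_USED = 8, LOCK_USED_READ = 9 };
    #define LOCKF_USED      (1U << LOCK_USED)       /* 0x100 */
    #define LOCKF_USED_READ (1U << LOCK_USED_READ)

    int main(void)
    {
            unsigned int used = LOCKF_USED;            /* used, IRQ bits clear */
            unsigned int read_only = LOCKF_USED_READ;  /* .read user after split */

            /* Buggy test: '& LOCK_USED' really tests bit 3, ENABLED_HARDIRQ. */
            printf("buggy check sees it as used: %d\n", !!(used & LOCK_USED));
            /* Fixed test against the mask matches any used lock. */
            printf("fixed check sees it as used: %d\n", !!(used & LOCKF_USED));
            /* Pure read-recursive users only set USED_READ, so USED skips them. */
            printf("read-only user matches USED:  %d\n", !!(read_only & LOCKF_USED));
            return 0;
    }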

    Fixes: f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
    Signed-off-by: Ingo Molnar
    Tested-by: Masami Hiramatsu
    Acked-by: Paul E. McKenney
    Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net

    peterz@infradead.org
     

26 Aug, 2020

14 commits

    Currently, the chainkey of a lock chain is a hash sum of the class_idx
    of all the held locks; the read/write status is not taken into
    consideration when generating the chainkey. This could result in a
    problem if we have:

    P1()
    {
            read_lock(B);
            lock(A);
    }

    P2()
    {
            lock(A);
            read_lock(B);
    }

    P3()
    {
            lock(A);
            write_lock(B);
    }

    , and P1(), P2(), P3() run one by one. When running P2(), lockdep
    detects that such a lock chain A -> B is not a deadlock and adds it to
    the chain cache; then, when running P3(), even though it is a deadlock,
    we could miss it because of the chain cache hit. This can be confirmed
    by the self testcase "chain cached mixed R-L/L-W".

    To resolve this, we use the concept of "hlock_id" to generate the
    chainkey; an hlock_id is a tuple (hlock->class_idx, hlock->read), which
    fits in a u16 type. With this, the chainkeys are different if the lock
    sequences have the same locks but different read/write status.

    Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
    array now stores the "hlock_id"s rather than lock_class indexes.
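
    A userspace illustration of the idea; the hash and the bit layout are
    simplified stand-ins for lockdep's actual chain-key code:

    #include <stdio.h>
    #include <stdint.h>

    /* Fold class index and read/write status into one 16-bit id. */
    static uint16_t hlock_id(uint16_t class_idx, uint8_t read)
    {
            return (uint16_t)(class_idx | ((uint16_t)read << 13));
    }

    static uint64_t chain_key_add(uint64_t key, uint16_t id)
    {
            return key * 31 + id;           /* stand-in for the real hash */
    }

    int main(void)
    {
            /* Chain "A(write), B(read)" vs "A(write), B(write)". */
            uint64_t k1 = chain_key_add(chain_key_add(0, hlock_id(1, 0)),
                                        hlock_id(2, 1));
            uint64_t k2 = chain_key_add(chain_key_add(0, hlock_id(1, 0)),
                                        hlock_id(2, 0));

            printf("same classes, different read/write -> different keys: %d\n",
                   k1 != k2);
            return 0;
    }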

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com

    Boqun Feng
     
    Since we have all the fundamentals to handle recursive read locks, we now
    add them into the dependency graph.

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com

    Boqun Feng
     
    Currently, in safe->unsafe detection, lockdep misses the fact that a
    LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
    cause deadlock too, for example:

    P1                          P2
    <irq disabled>
    write_lock(l1);             <irq enabled>
                                read_lock(l2);
    write_lock(l2);
                                <in irq>
                                read_lock(l1);

    Actually, all of the following cases may cause deadlocks:

    LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
    LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
    LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
    LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

    To fix this, we need to 1) change the calculation of exclusive_mask() so
    that READ bits are not dropped and 2) always call usage() in
    mark_lock_irq() to check usage deadlocks, even when the new usage of the
    lock is READ.

    Besides, adjust usage_match() and usage_accumulate() to the recursive
    read lock changes.

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com

    Boqun Feng
     
    check_redundant() will report redundancy if it finds a path that could
    replace the about-to-add dependency in the BFS search. With the
    recursive read lock changes, we certainly need to change the match
    function for check_redundant(), because the path needs to match not only
    the lock class but also the dependency kinds. For example, if the
    about-to-add dependency @prev -> @next is A -(SN)-> B, and we find a
    path A -(S*)-> .. -(*R)-> B in the dependency graph with __bfs() (for
    simplicity, we can also say we find an -(SR)-> path from A to B), we
    cannot replace the dependency with that path in the BFS search. Because
    the -(SN)-> dependency can make a strong path with a following -(S*)->
    dependency, however an -(SR)-> path cannot.

    Further, we can replace an -(SN)-> dependency with an -(EN)-> path; that
    means if we find a path which is stronger than or equal to the
    about-to-add dependency, we can report the redundancy. By "stronger", it
    means both the start and the end of the path are not weaker than the
    start and the end of the dependency (E is "stronger" than S and N is
    "stronger" than R), so that we can replace the dependency with that
    path.

    To make sure we find a path whose start point is not weaker than the
    about-to-add dependency, we use a trick: the ->only_xr of the root
    (start point) of __bfs() is initialized as @prev->read == 0, therefore
    if @prev is E, __bfs() will pick only -(E*)-> for the first dependency,
    otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

    To make sure we find a path whose end point is not weaker than the
    about-to-add dependency, we adjust the match function for __bfs() in
    check_redundant(): we check for the case that either @next is R
    (anything is not weaker than it) or the end point of the path is N
    (which is not weaker than anything).

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com

    Boqun Feng
     
    Currently, lockdep only has limited support for deadlock detection of
    recursive read locks.

    This patch supports deadlock detection for recursive read locks. The
    basic idea is:

    We are about to add dependency B -> A into the dependency graph, and we
    use check_noncircular() to find whether we have a strong dependency path
    A -> .. -> B, so that we would have a strong dependency circle (a closed
    strong dependency path):

    A -> .. -> B -> A

    , which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->.

    Since A -> .. -> B is already a strong dependency path, if either
    B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B ->
    A is strong, otherwise it is not. So we introduce a new match function
    hlock_conflict() to replace class_equal() for the deadlock check in
    check_noncircular().

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com

    Boqun Feng
     
  • The "match" parameter of __bfs() is used for checking whether we hit a
    match in the search, therefore it should return a boolean value rather
    than an integer for better readability.

    This patch then changes the return type of the function parameter and the
    match functions to bool.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com

    Boqun Feng
     
    Now we have four types of dependencies in the dependency graph, and not
    all the paths carry real dependencies (the dependencies that may cause
    a deadlock), for example:

    Given lock A and B, if we have:

    CPU1                        CPU2
    =============               ==============
    write_lock(A);              read_lock(B);
    read_lock(B);               write_lock(A);

    (assuming read_lock(B) is a recursive reader)

    then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a
    dependency path A -(ER)-> B -(SN)-> A.

    In lockdep w/o recursive locks, a dependency path from A to A
    means a deadlock. However, the above case is obviously not a
    deadlock, because no one holds B exclusively, therefore no one
    waits for the other to release B, so whichever of CPU1 and CPU2
    gets A first will run without blocking.

    As a result, dependency path A -(ER)-> B -(SN)-> A is not a
    real/strong dependency that could cause a deadlock.

    From the observation above, we know that for a dependency path to be
    real/strong, no two adjacent dependencies can be as -(*R)-> -(S*)->.

    Now our mission is to make __bfs() traverse only the strong dependency
    paths, which is simple: we record whether we only have -(*R)-> for the
    previous lock_list of the path in lock_list::only_xr, and when we pick a
    dependency in the traversal, we 1) filter out -(S*)-> dependencies if
    the previous lock_list only has -(*R)-> dependencies (i.e. ->only_xr is
    true) and 2) set the next lock_list::only_xr to true if we only have
    -(*R)-> left after we filter out dependencies based on 1), otherwise,
    set it to false.

    With this extension for __bfs(), we now need to initialize the root of
    __bfs() properly (with a correct ->only_xr); to do so, we introduce some
    helper functions, which also clean up the __bfs() root initialization
    code a little bit.
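
    A minimal sketch of that filtering rule, with assumed field names
    rather than lockdep's actual lock_list layout:

    #include <stdbool.h>

    struct dep_entry {
            bool er, en, sr, sn;    /* which of E->R, E->N, S->R, S->N exist */
            bool only_xr;           /* reached via -(*R)-> dependencies only */
    };

    /* Returns false if stepping into @next cannot be part of a strong path. */
    static bool pick_dep(const struct dep_entry *prev, struct dep_entry *next)
    {
            bool has_e = next->er || next->en;      /* any -(E*)-> available */
            bool has_n = next->en || next->sn;      /* any -(*N)-> available */

            /* Rule 1): -(*R)-> must not be followed by -(S*)->. */
            if (prev->only_xr && !has_e)
                    return false;                   /* only -(S*)-> left: skip */

            /* Rule 2): record whether only -(*R)-> choices remain. */
            if (prev->only_xr)
                    next->only_xr = !next->en;      /* only -(ER)-> usable */
            else
                    next->only_xr = !has_n;         /* no -(*N)-> at all */

            return true;
    }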

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-8-boqun.feng@gmail.com

    Boqun Feng
     
  • To add recursive read locks into the dependency graph, we need to store
    the types of dependencies for the BFS later. There are four types of
    dependencies:

    * Exclusive -> Non-recursive dependencies: EN
    e.g. write_lock(prev) held and try to acquire write_lock(next)
    or non-recursive read_lock(next), which can be represented as
    "prev -(EN)-> next"

    * Shared -> Non-recursive dependencies: SN
    e.g. read_lock(prev) held and try to acquire write_lock(next) or
    non-recursive read_lock(next), which can be represented as
    "prev -(SN)-> next"

    * Exclusive -> Recursive dependencies: ER
    e.g. write_lock(prev) held and try to acquire recursive
    read_lock(next), which can be represented as "prev -(ER)-> next"

    * Shared -> Recursive dependencies: SR
    e.g. read_lock(prev) held and try to acquire recursive
    read_lock(next), which can be represented as "prev -(SR)-> next"

    So we use 4 bits for the presence of each type in lock_list::dep. Helper
    functions and macros are also introduced to convert a pair of locks into
    lock_list::dep bit and maintain the addition of different types of
    dependencies.
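
    A toy encoding of those four kinds into bits; the values and the helper
    are illustrative, lockdep's real encoding may differ:

    /* hlock->read convention: 0 = writer, 1 = non-recursive reader,
     * 2 = recursive reader. */
    enum dep_bit { DEP_SR = 0, DEP_ER = 1, DEP_SN = 2, DEP_EN = 3 };

    static unsigned int calc_dep_bit(int prev_read, int next_read)
    {
            unsigned int shared    = (prev_read != 0); /* prev held as a reader */
            unsigned int recursive = (next_read == 2); /* next is recursive read */

            return (shared ? 0 : 1) + (recursive ? 0 : 2); /* SR=0 ER=1 SN=2 EN=3 */
    }

    #define calc_dep_mask(p, n)     (1U << calc_dep_bit((p), (n)))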

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-7-boqun.feng@gmail.com

    Boqun Feng
     
  • lock_list::distance is always not greater than MAX_LOCK_DEPTH (which
    is 48 right now), so a u16 will fit. This patch reduces the size of
    lock_list::distance to save space, so that we can introduce other fields
    to help detect recursive read lock deadlocks without increasing the size
    of lock_list structure.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-6-boqun.feng@gmail.com

    Boqun Feng
     
  • Currently, __bfs() will do a breadth-first search in the dependency
    graph and visit each lock class in the graph exactly once, so for
    example, in the following graph:

    A ---------> B
    |            ^
    |            |
    +----------> C

    a __bfs() call starting at A will visit B through dependency A -> B and
    visit C through dependency A -> C, and that's it; IOW, __bfs() will not
    visit dependency C -> B.

    This is OK for now, as we only have strong dependencies in the
    dependency graph, so whenever there is a traverse path from A to B in
    __bfs(), it means A has strong dependencies to B (IOW, B depends on A
    strongly). So no need to visit all dependencies in the graph.

    However, as we are going to add recursive-read locks into the dependency
    graph, not all the paths mean strong dependencies any more; in the same
    example above, dependency A -> B may be a weak dependency while the
    traversal A -> C -> B may be a strong dependency path. With the old
    way of __bfs() (i.e. visiting every lock class exactly once), we would
    miss the strong dependency path, which would result in failing to find
    a deadlock. To cure this for the future, we need to find a way for
    __bfs() to visit each dependency, rather than each class, exactly once
    in the search until we find a match.

    The solution is simple:

    We used to mark lock_class::lockdep_dependency_gen_id to indicate a
    class has been visited in __bfs(), now we change the semantics a little
    bit: we now mark lock_class::lockdep_dependency_gen_id to indicate _all
    the dependencies_ in its lock_{after,before} have been visited in the
    __bfs() (note we only take one direction in a __bfs() search). In this
    way, every dependency is guaranteed to be visited until we find a match.

    Note: the checks in mark_lock_accessed() and lock_accessed() are
    removed, because after this modification, we may call these two
    functions on @source_entry of __bfs(), which may not be the entry in
    "list_entries"

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-5-boqun.feng@gmail.com

    Boqun Feng
     
  • __bfs() could return four magic numbers:

    1: search succeeds, but none match.
    0: search succeeds, and finds one match.
    -1: search fails because the cq is full.
    -2: search fails because an invalid node is found.

    This patch cleans things up by using an enum type for the return value
    of __bfs() and its friends; this improves the readability of the code
    and could further help if we want to extend the BFS.
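
    A sketch of such an enum, modelled on the list above (BFS_EQUEUEFULL is
    the spelling seen elsewhere in this log, the other names are
    assumptions):

    enum bfs_result {
            BFS_EINVALIDNODE = -2,  /* an invalid node was found */
            BFS_EQUEUEFULL   = -1,  /* the circular queue is full */
            BFS_RMATCH       =  0,  /* search succeeded and found a match */
            BFS_RNOMATCH     =  1,  /* search succeeded, nothing matched */
    };

    /* Negative values are errors, non-negative values are search results. */
    #define bfs_error(res)  ((res) < 0)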

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-4-boqun.feng@gmail.com

    Boqun Feng
     
    On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
    read lock, actually it's only recursive if in_interrupt() is true. So
    change the annotation accordingly to catch more deadlocks.

    Note we used to treat read_lock() as a pure recursive read lock in
    lib/locking-selftest.c, and this is useful, especially for the lockdep
    development selftest, so we keep this behaviour via a variable that
    forces switching the lock annotation for read_lock().
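
    A sketch of the resulting annotation choice; the helper and the knob
    name are illustrative, not necessarily the kernel's exact code:

    #include <linux/kconfig.h>
    #include <linux/preempt.h>
    #include <linux/types.h>

    /* Selftest knob: force the old pure-recursive annotation. */
    extern bool force_read_lock_recursive;

    static bool read_lock_is_recursive(void)
    {
            return force_read_lock_recursive ||
                   !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
                   in_interrupt();
    }

    /* The read_lock() annotation then picks lock_acquire_shared_recursive()
     * when this returns true, and lock_acquire_shared() otherwise. */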

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-2-boqun.feng@gmail.com

    Boqun Feng
     
    The lockdep tracepoints are under the lockdep recursion counter, which
    has a bunch of nasty side effects:

    - TRACE_IRQFLAGS doesn't work across the entire tracepoint

    - RCU-lockdep doesn't see the tracepoints either, hiding numerous
    "suspicious RCU usage" warnings.

    Pull the trace_lock_*() tracepoints completely out from under the
    lockdep recursion handling and rely completely on the trace-level
    recursion handling -- also, tracing *SHOULD* not be taking locks in any
    case.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org

    Peter Zijlstra
     
    Sven reported that commit a21ee6055c30 ("lockdep: Change
    hardirq{s_enabled,_context} to per-cpu variables") caused trouble on
    s390 because their this_cpu_*() primitives disable preemption, which
    then lands back in tracing.

    On the one hand, per-cpu ops should use preempt_*able_notrace() and
    raw_local_irq_*(); on the other hand, we can trivially use raw_cpu_*()
    ops for this.

    Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
    Reported-by: Sven Schnelle
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200821085348.192346882@infradead.org

    Peter Zijlstra
     

25 Aug, 2020

1 commit


11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    sequence count has now lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

07 Aug, 2020

2 commits

  • Pull KVM updates from Paolo Bonzini:
    "s390:
    - implement diag318

    x86:
    - Report last CPU for debugging
    - Emulate smaller MAXPHYADDR in the guest than in the host
    - .noinstr and tracing fixes from Thomas
    - nested SVM page table switching optimization and fixes

    Generic:
    - Unify shadow MMU cache data structures across architectures"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
    KVM: SVM: Fix sev_pin_memory() error handling
    KVM: LAPIC: Set the TDCR settable bits
    KVM: x86: Specify max TDP level via kvm_configure_mmu()
    KVM: x86/mmu: Rename max_page_level to max_huge_page_level
    KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
    KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
    KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
    KVM: VMX: Make vmx_load_mmu_pgd() static
    KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
    KVM: VMX: Drop a duplicate declaration of construct_eptp()
    KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
    KVM: Using macros instead of magic values
    MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
    KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
    KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
    KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
    KVM: VMX: Add guest physical address check in EPT violation and misconfig
    KVM: VMX: introduce vmx_need_pf_intercept
    KVM: x86: update exception bitmap on CPUID changes
    KVM: x86: rename update_bp_intercept to update_exception_bitmap
    ...

    Linus Torvalds
     
  • Pull sched/fifo updates from Ingo Molnar:
    "This adds the sched_set_fifo*() encapsulation APIs to remove static
    priority level knowledge from non-scheduler code.

    The three APIs for non-scheduler code to set SCHED_FIFO are:

    - sched_set_fifo()
    - sched_set_fifo_low()
    - sched_set_normal()

    These are two FIFO priority levels: default (high), and a 'low'
    priority level, plus sched_set_normal() to set the policy back to
    non-SCHED_FIFO.

    Since the changes affect a lot of non-scheduler code, we kept this in
    a separate tree"

    * tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    sched,tracing: Convert to sched_set_fifo()
    sched: Remove sched_set_*() return value
    sched: Remove sched_setscheduler*() EXPORTs
    sched,psi: Convert to sched_set_fifo_low()
    sched,rcutorture: Convert to sched_set_fifo_low()
    sched,rcuperf: Convert to sched_set_fifo_low()
    sched,locktorture: Convert to sched_set_fifo()
    sched,irq: Convert to sched_set_fifo()
    sched,watchdog: Convert to sched_set_fifo()
    sched,serial: Convert to sched_set_fifo()
    sched,powerclamp: Convert to sched_set_fifo()
    sched,ion: Convert to sched_set_normal()
    sched,powercap: Convert to sched_set_fifo*()
    sched,spi: Convert to sched_set_fifo*()
    sched,mmc: Convert to sched_set_fifo*()
    sched,ivtv: Convert to sched_set_fifo*()
    sched,drm/scheduler: Convert to sched_set_fifo*()
    sched,msm: Convert to sched_set_fifo*()
    sched,psci: Convert to sched_set_fifo*()
    sched,drbd: Convert to sched_set_fifo*()
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit


05 Aug, 2020

1 commit

  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • Pull locking updates from Ingo Molnar:

    - LKMM updates: mostly documentation changes, but also some new litmus
    tests for atomic ops.

    - KCSAN updates: the most important change is that GCC 11 now has all
    fixes in place to support KCSAN, so GCC support can be enabled again.
    Also more annotations.

    - futex updates: minor cleanups and simplifications

    - seqlock updates: merge preparatory changes/cleanups for the
    'associated locks' facilities.

    - lockdep updates:
    - simplify IRQ trace event handling
    - add various new debug checks
    - simplify header dependencies, split out ,
    decouple lockdep from other low level headers some more
    - fix NMI handling

    - misc cleanups and smaller fixes

    * tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    kcsan: Improve IRQ state trace reporting
    lockdep: Refactor IRQ trace events fields into struct
    seqlock: lockdep assert non-preemptibility on seqcount_t write
    lockdep: Add preemption enabled/disabled assertion APIs
    seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
    seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
    seqlock: Reorder seqcount_t and seqlock_t API definitions
    seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
    seqlock: Properly format kernel-doc code samples
    Documentation: locking: Describe seqlock design and usage
    locking/qspinlock: Do not include atomic.h from qspinlock_types.h
    locking/atomic: Move ATOMIC_INIT into linux/types.h
    lockdep: Move list.h inclusion into lockdep.h
    locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
    futex: Remove unused or redundant includes
    futex: Consistently use fshared as boolean
    futex: Remove needless goto's
    futex: Remove put_futex_key()
    rwsem: fix commas in initialisation
    docs: locking: Replace HTTP links with HTTPS ones
    ...

    Linus Torvalds
     

03 Aug, 2020

1 commit


01 Aug, 2020

1 commit


31 Jul, 2020

1 commit

  • Refactor the IRQ trace events fields, used for printing information
    about the IRQ trace events, into a separate struct 'irqtrace_events'.

    This improves readability by separating the information only used in
    reporting, as well as enables (simplified) storing/restoring of
    irqtrace_events snapshots.

    No functional change intended.

    Signed-off-by: Marco Elver
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200729110916.3920464-1-elver@google.com
    Signed-off-by: Ingo Molnar

    Marco Elver