09 Jan, 2021

2 commits

  • [ Upstream commit 31784cff7ee073b34d6eddabb95e3be2880a425c ]

    In preparation for converting exec_update_mutex to a rwsem so that
    multiple readers can execute in parallel and not deadlock, add
    down_read_interruptible. This is needed for perf_event_open to be
    converted (with no semantic changes) from working on a mutex to
    working on a rwsem.
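
    For illustration, a minimal sketch of the interruptible reader-side
    pattern this enables; the rwsem and the caller are stand-ins, not the
    actual perf_event_open() code:

    #include <linux/rwsem.h>

    static int do_read_side_work(struct rw_semaphore *sem)
    {
            int err;

            /* Returns -EINTR if a signal arrives while waiting. */
            err = down_read_interruptible(sem);
            if (err)
                    return err;

            /* Read-side section: multiple readers may run here in parallel. */

            up_read(sem);
            return 0;
    }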

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
    Signed-off-by: Sasha Levin

    Eric W. Biederman
     
  • [ Upstream commit 0f9368b5bf6db0c04afc5454b1be79022a681615 ]

    In preparation for converting exec_update_mutex to a rwsem so that
    multiple readers can execute in parallel and not deadlock, add
    down_read_killable_nested. This is needed so that kcmp_lock
    can be converted from working on mutexes to working on rw_semaphores.
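
    For illustration, a sketch of the kcmp_lock()-style ordered double
    acquisition this enables (simplified; the real kernel/kcmp.c code may
    differ in details):

    #include <linux/kernel.h>
    #include <linux/lockdep.h>
    #include <linux/rwsem.h>

    static int lock_two_rwsems_read(struct rw_semaphore *l1,
                                    struct rw_semaphore *l2)
    {
            int err;

            if (l2 > l1)
                    swap(l1, l2);           /* stable ordering avoids ABBA */

            err = down_read_killable(l1);
            if (!err && l1 != l2) {
                    /* Nested annotation: second instance of the same class. */
                    err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
                    if (err)
                            up_read(l1);
            }
            return err;
    }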

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
    Signed-off-by: Sasha Levin

    Eric W. Biederman
     

17 Nov, 2020

1 commit

  • A warning was hit when running xfstests/generic/068 in a Hyper-V guest:

    [...] ------------[ cut here ]------------
    [...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
    [...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170
    [...] ...
    [...] Workqueue: events pwq_unbound_release_workfn
    [...] RIP: 0010:check_flags.part.0+0x165/0x170
    [...] ...
    [...] Call Trace:
    [...] lock_is_held_type+0x72/0x150
    [...] ? lock_acquire+0x16e/0x4a0
    [...] rcu_read_lock_sched_held+0x3f/0x80
    [...] __send_ipi_one+0x14d/0x1b0
    [...] hv_send_ipi+0x12/0x30
    [...] __pv_queued_spin_unlock_slowpath+0xd1/0x110
    [...] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
    [...] .slowpath+0x9/0xe
    [...] lockdep_unregister_key+0x128/0x180
    [...] pwq_unbound_release_workfn+0xbb/0xf0
    [...] process_one_work+0x227/0x5c0
    [...] worker_thread+0x55/0x3c0
    [...] ? process_one_work+0x5c0/0x5c0
    [...] kthread+0x153/0x170
    [...] ? __kthread_bind_mask+0x60/0x60
    [...] ret_from_fork+0x1f/0x30

    The cause of the problem is that we have the call chain
    lockdep_unregister_key() -> lockdep_unlock() -> arch_spin_unlock() ->
    __pv_queued_spin_unlock_slowpath() -> pv_kick() -> __send_ipi_one() ->
    trace_hyperv_send_ipi_one().

    Although this particular warning is triggered because Hyper-V has a
    trace point in IPI sending, in general arch_spin_unlock() may call
    another function that has a trace point in it, so put arch_spin_lock()
    and arch_spin_unlock() under the lock_recursion protection to fix this
    problem and avoid similar ones.
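
    The shape of the fix, as a sketch (the lock and per-CPU guard names
    here are stand-ins, not copied from kernel/locking/lockdep.c):

    #include <linux/percpu.h>
    #include <linux/spinlock.h>

    static arch_spinlock_t graph_lock_sketch = __ARCH_SPIN_LOCK_UNLOCKED;
    static DEFINE_PER_CPU(unsigned int, lockdep_recursion_sketch);

    static void graph_lock_acquire(void)
    {
            /* IRQs are already disabled by the callers at this point. */
            __this_cpu_inc(lockdep_recursion_sketch);
            arch_spin_lock(&graph_lock_sketch);
    }

    static void graph_lock_release(void)
    {
            /* May end up in a traced path, e.g. a paravirt IPI on unlock. */
            arch_spin_unlock(&graph_lock_sketch);
            /* Drop the recursion guard only after the unlock is done. */
            __this_cpu_dec(lockdep_recursion_sketch);
    }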

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com

    Boqun Feng
     

11 Nov, 2020

1 commit

  • Chris Wilson reported a problem spotted by check_chain_key(): a chain
    key got changed in validate_chain() because we modify the ->read in
    validate_chain() to skip checks for dependency adding, and ->read is
    taken into calculation for chain key since commit f611e8cf98ec
    ("lockdep: Take read/write status in consideration when generate
    chainkey").

    Fix this by not modifying ->read in validate_chain(), based on two
    facts: a) since we now support recursive read lock detection, there is
    no need to skip checks for dependency adding for recursive readers;
    b) given a), there is only one case left (nest_lock) where we want to
    skip checks in validate_chain(), so we simply remove the modification
    of ->read and rely on the return value of check_deadlock() to skip the
    dependency adding.

    Reported-by: Chris Wilson
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com

    Boqun Feng
     

02 Nov, 2020

1 commit

  • Pull locking fixes from Thomas Gleixner:
    "A couple of locking fixes:

    - Fix incorrect failure injection handling in the futex code

    - Prevent a preemption warning in lockdep when tracking
    local_irq_enable() and interrupts are already enabled

    - Remove more raw_cpu_read() usage from lockdep which causes state
    corruption on !X86 architectures.

    - Make the nr_unused_locks accounting in lockdep correct again"

    * tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Fix nr_unused_locks accounting
    locking/lockdep: Remove more raw_cpu_read() usage
    futex: Fix incorrect should_fail_futex() handling
    lockdep: Fix preemption WARN for spurious IRQ-enable

    Linus Torvalds
     

31 Oct, 2020

2 commits

  • Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
    usage_traceoverflow") breaks the nr_unused_locks validation code
    triggered by /proc/lockdep_stats.

    By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
    indicator for accounting nr_unused_locks; simplify by using any first
    bit.

    Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
    Reported-by: Chris Wilson
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Chris Wilson
    Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • I initially thought raw_cpu_read() was OK, since if it is !0 we have
    IRQs disabled and can't get migrated, so if we get migrated both CPUs
    must have 0 and it doesn't matter which 0 we read.

    And while that is true, it isn't the whole story: on pretty much all
    architectures (except x86) this can result in computing the address for
    one CPU, getting migrated, the old CPU continuing execution with another
    task (possibly setting recursion) and then the new CPU reading the value
    of the old CPU, which is no longer 0.

    Similar to:

    baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")

    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net

    Peter Zijlstra
     

22 Oct, 2020

1 commit

  • It is valid (albeit uncommon) to call local_irq_enable() without first
    having called local_irq_disable(). In this case we enter
    lockdep_hardirqs_on*() with IRQs enabled and trip a preemption warning
    for using __this_cpu_read().

    Use this_cpu_read() instead to avoid the warning.
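
    A minimal sketch of the difference, with an illustrative per-CPU
    variable:

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(int, hardirqs_enabled_sketch);

    static int irqs_marked_enabled(void)
    {
            /*
             * this_cpu_read() disables preemption around the access, so it
             * is safe even when IRQs (and thus preemption) are enabled;
             * __this_cpu_read() would trip the preemption debug check here.
             */
            return this_cpu_read(hardirqs_enabled_sketch);
    }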

    Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
    Reported-by: syzbot+53f8ce8bbc07924b6417@syzkaller.appspotmail.com
    Reported-by: kernel test robot
    Signed-off-by: Peter Zijlstra (Intel)

    Peter Zijlstra
     

19 Oct, 2020

1 commit

  • Pull RCU changes from Ingo Molnar:

    - Debugging for smp_call_function()

    - RT raw/non-raw lock ordering fixes

    - Strict grace periods for KASAN

    - New smp_call_function() torture test

    - Torture-test updates

    - Documentation updates

    - Miscellaneous fixes

    [ This doesn't actually pull the tag - I've dropped the last merge from
    the RCU branch due to questions about the series. - Linus ]

    * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    smp: Make symbol 'csd_bug_count' static
    kernel/smp: Provide CSD lock timeout diagnostics
    smp: Add source and destination CPUs to __call_single_data
    rcu: Shrink each possible cpu krcp
    rcu/segcblist: Prevent useless GP start if no CBs to accelerate
    torture: Add gdb support
    rcutorture: Allow pointer leaks to test diagnostic code
    rcutorture: Hoist OOM registry up one level
    refperf: Avoid null pointer dereference when buf fails to allocate
    rcutorture: Properly synchronize with OOM notifier
    rcutorture: Properly set rcu_fwds for OOM handling
    torture: Add kvm.sh --help and update help message
    rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
    torture: Update initrd documentation
    rcutorture: Replace HTTP links with HTTPS ones
    locktorture: Make function torture_percpu_rwsem_init() static
    torture: document --allcpus argument added to the kvm.sh script
    rcutorture: Output number of elapsed grace periods
    rcutorture: Remove KCSAN stubs
    rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
    ...

    Linus Torvalds
     

09 Oct, 2020

4 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Steve reported that lockdep_assert*irq*(), when nested inside lockdep
    itself, will trigger a false-positive.

    One example is the stack-trace code, as called from inside lockdep,
    triggering tracing, which in turn calls RCU, which then uses
    lockdep_assert_irqs_disabled().

    Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
    Reported-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    Basically print_lock_class_header()'s for loop is out of sync with the
    size of ->usage_traces[].

    Also clean things up a bit while at it, to avoid such mishaps in the future.

    Fixes: 23870f122768 ("locking/lockdep: Fix "USED" <- "IN-NMI" inversions")
    Debugged-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Tested-by: Qian Cai
    Link: https://lkml.kernel.org/r/20200930094937.GE2651@hirez.programming.kicks-ass.net

    Peter Zijlstra
     
  • …k/linux-rcu into core/rcu

    Pull v5.10 RCU changes from Paul E. McKenney:

    - Debugging for smp_call_function().

    - Strict grace periods for KASAN. The point of this series is to find
    RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
    Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
    further disabled by default. Finally, the help text includes
    a goodly list of scary caveats.

    - New smp_call_function() torture test.

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

29 Sep, 2020

1 commit

    Qian Cai reported a BFS_EQUEUEFULL warning [1] after the read-recursive
    deadlock detection was merged into the tip tree recently. Unlike the
    previous lockdep graph searching, which iterated every lock class (every
    node in the graph) exactly once, the graph searching for read-recursive
    deadlock detection needs to iterate every lock dependency (every edge in
    the graph) once. As a result, the maximum memory cost of the circular
    queue changes from O(V), where V is the number of lock classes (nodes or
    vertices) in the graph, to O(E), where E is the number of lock
    dependencies (edges), because every lock class or dependency gets
    enqueued once in the BFS. Therefore we hit the BFS_EQUEUEFULL case.

    However, we don't actually need to enqueue all dependencies for the BFS,
    because every time we enqueue a dependency, we almost always enqueue all
    the other dependencies in the same dependency list ("almost" because we
    currently check before enqueueing, so if a dependency doesn't pass the
    check stage we won't enqueue it; however, we can always do those two
    steps in the reverse order). Based on this, we can enqueue only the
    first dependency from a dependency list, and every time we want to fetch
    a new dependency to work on, we can either:

    1) fetch the dependency next to the current dependency in the
    dependency list,
    or

    2) if the dependency in 1) doesn't exist, fetch a dependency from
    the queue.

    With this approach, the "max bfs queue depth" for an x86_64_defconfig +
    lockdep and selftest config kernel can be decreased from:

    max bfs queue depth: 201

    to (after applying this patch):

    max bfs queue depth: 61

    While I'm at it, clean up the code logic a little (e.g. return directly
    instead of setting a "ret" value and jumping to the "exit" label).
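
    A userspace toy of the queueing trick (simplified data structures, not
    the kernel's lock_list/__bfs() code): only the first edge of a node's
    adjacency list is enqueued, and its siblings are reached by walking
    ->next once that edge is taken off the queue, so the queue holds O(V)
    entries instead of O(E).

    #include <stdio.h>

    struct edge { int to, next; };            /* next: sibling index or -1 */

    /* Graph: 0 -> 1, 0 -> 2, 2 -> 1 */
    static struct edge edges[] = {
            { .to = 1, .next = 1 }, { .to = 2, .next = -1 },
            { .to = 1, .next = -1 },
    };
    static int first_edge[3] = { 0, -1, 2 };  /* first edge per node */
    static int enqueued[3];                   /* each list is enqueued once */

    int main(void)
    {
            int queue[8], head = 0, tail = 0, max_depth = 0;

            queue[tail++] = first_edge[0];    /* start the BFS at node 0 */
            enqueued[0] = 1;

            while (head < tail) {
                    /* Walk all siblings of the dequeued edge first. */
                    for (int e = queue[head++]; e != -1; e = edges[e].next) {
                            int to = edges[e].to;

                            printf("visit edge -> node %d\n", to);
                            if (first_edge[to] != -1 && !enqueued[to]) {
                                    queue[tail++] = first_edge[to];
                                    enqueued[to] = 1;
                            }
                            if (tail - head > max_depth)
                                    max_depth = tail - head;
                    }
            }
            printf("max queue depth: %d\n", max_depth);
            return 0;
    }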

    [1]: https://lore.kernel.org/lkml/17343f6f7f2438fc376125384133c5ba70c2a681.camel@redhat.com/

    Reported-by: Qian Cai
    Reported-by: syzbot+62ebe501c1ce9a91f68c@syzkaller.appspotmail.com
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200917080210.108095-1-boqun.feng@gmail.com

    Boqun Feng
     

16 Sep, 2020

1 commit

  • The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
    that percpu-rwsem is a blocking primitive, should be just fine.

    However, file_end_write() is used from IRQ context and will cause
    load-store issues on architectures where the per-cpu accessors are not
    natively irq-safe.

    Fix it by using the IRQ-safe this_cpu_*() for operations on
    read_count. This will generate more expensive code on a number of
    platforms, which might cause a performance regression for some of the
    other percpu-rwsem users.

    If any such is reported, we can consider alternative solutions.
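
    The shape of the change, as a sketch; the counter below is a stand-in
    for the percpu-rwsem read_count, not the real field:

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned int, read_count_sketch);

    static void reader_enter(void)
    {
            /*
             * this_cpu_inc() is IRQ-safe on every architecture, unlike
             * __this_cpu_inc(), which may be a non-atomic load/modify/store
             * and can be corrupted by an IRQ doing the same update.
             */
            this_cpu_inc(read_count_sketch);
    }

    static void reader_exit(void)
    {
            this_cpu_dec(read_count_sketch);
    }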

    Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
    Signed-off-by: Hou Tao
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Oleg Nesterov
    Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com

    Hou Tao
     

03 Sep, 2020

1 commit

    During the LPC RCU BoF Paul asked how come the "USED" <- "IN-NMI"
    detection doesn't work for rcu_read_lock().

    Looking into this turned up a typo:

    -       if (!(class->usage_mask & LOCK_USED))
    +       if (!(class->usage_mask & LOCKF_USED))

    fixing that will indeed cause rcu_read_lock() to insta-splat :/

    The above typo means that instead of testing for: 0x100 (1 <<
    LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 <<
    LOCK_ENABLED_HARDIRQ).

    So instead of testing for _any_ used lock, it will only match any lock
    used with interrupts enabled.

    The rcu_read_lock() annotation uses .check=0, which means it will not
    set any of the interrupt bits and will thus never match.

    In order to properly fix the situation and allow rcu_read_lock() to
    correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by
    having .read users set USED_READ and test USED, pure read-recursive
    locks are permitted.
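
    A small userspace illustration of the bit confusion and of the split;
    the numeric values follow the text above, except LOCK_USED_READ, which
    is assumed:

    #include <stdio.h>

    enum { LOCK_ENABLED_HARDIRQ = 3, LOCK_USED = 8, LOCK_USED_READ = 9 };
    #define LOCKF_USED      (1U << LOCK_USED)       /* 0x100 */
    #define LOCKF_USED_READ (1U << LOCK_USED_READ)

    int main(void)
    {
            unsigned int used = LOCKF_USED;            /* used, IRQ bits clear */
            unsigned int read_only = LOCKF_USED_READ;  /* .read user after split */

            /* Buggy test: '& LOCK_USED' really tests bit 3, ENABLED_HARDIRQ. */
            printf("buggy check sees it as used: %d\n", !!(used & LOCK_USED));
            /* Fixed test against the mask matches any used lock. */
            printf("fixed check sees it as used: %d\n", !!(used & LOCKF_USED));
            /* Pure read-recursive users only set USED_READ, so USED skips them. */
            printf("read-only user matches USED:  %d\n", !!(read_only & LOCKF_USED));
            return 0;
    }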

    Fixes: f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
    Signed-off-by: Ingo Molnar
    Tested-by: Masami Hiramatsu
    Acked-by: Paul E. McKenney
    Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net

    peterz@infradead.org
     

26 Aug, 2020

14 commits

    Currently, the chainkey of a lock chain is a hash sum of the class_idx
    of all the held locks; the read/write status is not taken into
    consideration when generating the chainkey. This could result in a
    problem if we have:

    P1()
    {
            read_lock(B);
            lock(A);
    }

    P2()
    {
            lock(A);
            read_lock(B);
    }

    P3()
    {
            lock(A);
            write_lock(B);
    }

    , and P1(), P2(), P3() run one by one. When running P2(), lockdep
    detects that such a lock chain A -> B is not a deadlock and adds it to
    the chain cache; then, when running P3(), even though it is a deadlock,
    we could miss it because of the chain cache hit. This can be confirmed
    by the self testcase "chain cached mixed R-L/L-W".

    To resolve this, we use the concept of "hlock_id" to generate the
    chainkey; an hlock_id is a tuple (hlock->class_idx, hlock->read), which
    fits in a u16 type. With this, the chainkeys are different if the lock
    sequences have the same locks but different read/write status.

    Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
    array now stores the "hlock_id"s rather than lock_class indexes.
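
    A userspace illustration of the idea; the hash and the bit layout are
    simplified stand-ins for lockdep's actual chain-key code:

    #include <stdio.h>
    #include <stdint.h>

    /* Fold class index and read/write status into one 16-bit id. */
    static uint16_t hlock_id(uint16_t class_idx, uint8_t read)
    {
            return (uint16_t)(class_idx | ((uint16_t)read << 13));
    }

    static uint64_t chain_key_add(uint64_t key, uint16_t id)
    {
            return key * 31 + id;           /* stand-in for the real hash */
    }

    int main(void)
    {
            /* Chain "A(write), B(read)" vs "A(write), B(write)". */
            uint64_t k1 = chain_key_add(chain_key_add(0, hlock_id(1, 0)),
                                        hlock_id(2, 1));
            uint64_t k2 = chain_key_add(chain_key_add(0, hlock_id(1, 0)),
                                        hlock_id(2, 0));

            printf("same classes, different read/write -> different keys: %d\n",
                   k1 != k2);
            return 0;
    }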

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com

    Boqun Feng
     
    Since we have all the fundamentals to handle recursive read locks, we now
    add them into the dependency graph.

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com

    Boqun Feng
     
    Currently, in safe->unsafe detection, lockdep misses the fact that a
    LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
    cause deadlock too, for example:

    P1                          P2
    <irq disabled>
    write_lock(l1);             <irq enabled>
                                read_lock(l2);
    write_lock(l2);
                                <in irq>
                                read_lock(l1);

    Actually, all of the following cases may cause deadlocks:

    LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
    LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
    LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
    LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

    To fix this, we need to 1) change the calculation of exclusive_mask() so
    that READ bits are not dropped and 2) always call usage() in
    mark_lock_irq() to check usage deadlocks, even when the new usage of the
    lock is READ.

    Besides, adjust usage_match() and usage_accumulate() to the recursive
    read lock changes.

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com

    Boqun Feng
     
    check_redundant() will report redundancy if it finds a path that could
    replace the about-to-add dependency in the BFS search. With the
    recursive read lock changes, we certainly need to change the match
    function for check_redundant(), because the path needs to match not only
    the lock class but also the dependency kinds. For example, if the
    about-to-add dependency @prev -> @next is A -(SN)-> B, and we find a
    path A -(S*)-> .. -(*R)-> B in the dependency graph with __bfs() (for
    simplicity, we can also say we find an -(SR)-> path from A to B), we
    cannot replace the dependency with that path in the BFS search. Because
    the -(SN)-> dependency can make a strong path with a following -(S*)->
    dependency, however an -(SR)-> path cannot.

    Further, we can replace an -(SN)-> dependency with an -(EN)-> path; that
    means if we find a path which is stronger than or equal to the
    about-to-add dependency, we can report the redundancy. By "stronger", it
    means both the start and the end of the path are not weaker than the
    start and the end of the dependency (E is "stronger" than S and N is
    "stronger" than R), so that we can replace the dependency with that
    path.

    To make sure we find a path whose start point is not weaker than the
    about-to-add dependency, we use a trick: the ->only_xr of the root
    (start point) of __bfs() is initialized as @prev->read == 0, therefore
    if @prev is E, __bfs() will pick only -(E*)-> for the first dependency,
    otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

    To make sure we find a path whose end point is not weaker than the
    about-to-add dependency, we adjust the match function for __bfs() in
    check_redundant(): we check for the case that either @next is R
    (anything is not weaker than it) or the end point of the path is N
    (which is not weaker than anything).

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com

    Boqun Feng
     
    Currently, lockdep only has limited support for deadlock detection of
    recursive read locks.

    This patch supports deadlock detection for recursive read locks. The
    basic idea is:

    We are about to add dependency B -> A into the dependency graph, and we
    use check_noncircular() to find whether we have a strong dependency path
    A -> .. -> B, so that we would have a strong dependency circle (a closed
    strong dependency path):

    A -> .. -> B -> A

    , which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->.

    Since A -> .. -> B is already a strong dependency path, if either
    B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B ->
    A is strong, otherwise it is not. So we introduce a new match function
    hlock_conflict() to replace class_equal() for the deadlock check in
    check_noncircular().

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com

    Boqun Feng
     
  • The "match" parameter of __bfs() is used for checking whether we hit a
    match in the search, therefore it should return a boolean value rather
    than an integer for better readability.

    This patch then changes the return type of the function parameter and the
    match functions to bool.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com

    Boqun Feng
     
    Now we have four types of dependencies in the dependency graph, and not
    all the paths carry real dependencies (the dependencies that may cause
    a deadlock), for example:

    Given lock A and B, if we have:

    CPU1                        CPU2
    =============               ==============
    write_lock(A);              read_lock(B);
    read_lock(B);               write_lock(A);

    (assuming read_lock(B) is a recursive reader)

    then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a
    dependency path A -(ER)-> B -(SN)-> A.

    In lockdep w/o recursive locks, a dependency path from A to A
    means a deadlock. However, the above case is obviously not a
    deadlock, because no one holds B exclusively, therefore no one
    waits for the other to release B, so whichever of CPU1 and CPU2
    gets A first will run without blocking.

    As a result, dependency path A -(ER)-> B -(SN)-> A is not a
    real/strong dependency that could cause a deadlock.

    From the observation above, we know that for a dependency path to be
    real/strong, no two adjacent dependencies can be as -(*R)-> -(S*)->.

    Now our mission is to make __bfs() traverse only the strong dependency
    paths, which is simple: we record whether we only have -(*R)-> for the
    previous lock_list of the path in lock_list::only_xr, and when we pick a
    dependency in the traversal, we 1) filter out -(S*)-> dependencies if
    the previous lock_list only has -(*R)-> dependencies (i.e. ->only_xr is
    true) and 2) set the next lock_list::only_xr to true if we only have
    -(*R)-> left after we filter out dependencies based on 1), otherwise,
    set it to false.

    With this extension for __bfs(), we now need to initialize the root of
    __bfs() properly (with a correct ->only_xr); to do so, we introduce some
    helper functions, which also clean up the __bfs() root initialization
    code a little bit.
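
    A minimal sketch of that filtering rule, with assumed field names
    rather than lockdep's actual lock_list layout:

    #include <stdbool.h>

    struct dep_entry {
            bool er, en, sr, sn;    /* which of E->R, E->N, S->R, S->N exist */
            bool only_xr;           /* reached via -(*R)-> dependencies only */
    };

    /* Returns false if stepping into @next cannot be part of a strong path. */
    static bool pick_dep(const struct dep_entry *prev, struct dep_entry *next)
    {
            bool has_e = next->er || next->en;      /* any -(E*)-> available */
            bool has_n = next->en || next->sn;      /* any -(*N)-> available */

            /* Rule 1): -(*R)-> must not be followed by -(S*)->. */
            if (prev->only_xr && !has_e)
                    return false;                   /* only -(S*)-> left: skip */

            /* Rule 2): record whether only -(*R)-> choices remain. */
            if (prev->only_xr)
                    next->only_xr = !next->en;      /* only -(ER)-> usable */
            else
                    next->only_xr = !has_n;         /* no -(*N)-> at all */

            return true;
    }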

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-8-boqun.feng@gmail.com

    Boqun Feng
     
  • To add recursive read locks into the dependency graph, we need to store
    the types of dependencies for the BFS later. There are four types of
    dependencies:

    * Exclusive -> Non-recursive dependencies: EN
    e.g. write_lock(prev) held and try to acquire write_lock(next)
    or non-recursive read_lock(next), which can be represented as
    "prev -(EN)-> next"

    * Shared -> Non-recursive dependencies: SN
    e.g. read_lock(prev) held and try to acquire write_lock(next) or
    non-recursive read_lock(next), which can be represented as
    "prev -(SN)-> next"

    * Exclusive -> Recursive dependencies: ER
    e.g. write_lock(prev) held and try to acquire recursive
    read_lock(next), which can be represented as "prev -(ER)-> next"

    * Shared -> Recursive dependencies: SR
    e.g. read_lock(prev) held and try to acquire recursive
    read_lock(next), which can be represented as "prev -(SR)-> next"

    So we use 4 bits for the presence of each type in lock_list::dep. Helper
    functions and macros are also introduced to convert a pair of locks into
    lock_list::dep bit and maintain the addition of different types of
    dependencies.
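
    A toy encoding of those four kinds into bits; the values and the helper
    are illustrative, lockdep's real encoding may differ:

    /* hlock->read convention: 0 = writer, 1 = non-recursive reader,
     * 2 = recursive reader. */
    enum dep_bit { DEP_SR = 0, DEP_ER = 1, DEP_SN = 2, DEP_EN = 3 };

    static unsigned int calc_dep_bit(int prev_read, int next_read)
    {
            unsigned int shared    = (prev_read != 0); /* prev held as a reader */
            unsigned int recursive = (next_read == 2); /* next is recursive read */

            return (shared ? 0 : 1) + (recursive ? 0 : 2); /* SR=0 ER=1 SN=2 EN=3 */
    }

    #define calc_dep_mask(p, n)     (1U << calc_dep_bit((p), (n)))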

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-7-boqun.feng@gmail.com

    Boqun Feng
     
  • lock_list::distance is always not greater than MAX_LOCK_DEPTH (which
    is 48 right now), so a u16 will fit. This patch reduces the size of
    lock_list::distance to save space, so that we can introduce other fields
    to help detect recursive read lock deadlocks without increasing the size
    of lock_list structure.

    Suggested-by: Peter Zijlstra
    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-6-boqun.feng@gmail.com

    Boqun Feng
     
  • Currently, __bfs() will do a breadth-first search in the dependency
    graph and visit each lock class in the graph exactly once, so for
    example, in the following graph:

    A ---------> B
    |            ^
    |            |
    +----------> C

    a __bfs() call starting at A will visit B through dependency A -> B and
    visit C through dependency A -> C, and that's it; IOW, __bfs() will not
    visit dependency C -> B.

    This is OK for now, as we only have strong dependencies in the
    dependency graph, so whenever there is a traverse path from A to B in
    __bfs(), it means A has strong dependencies to B (IOW, B depends on A
    strongly). So no need to visit all dependencies in the graph.

    However, as we are going to add recursive-read locks into the dependency
    graph, not all the paths mean strong dependencies any more; in the same
    example above, dependency A -> B may be a weak dependency while the
    traversal A -> C -> B may be a strong dependency path. With the old
    way of __bfs() (i.e. visiting every lock class exactly once), we would
    miss the strong dependency path, which would result in failing to find
    a deadlock. To cure this for the future, we need to find a way for
    __bfs() to visit each dependency, rather than each class, exactly once
    in the search until we find a match.

    The solution is simple:

    We used to mark lock_class::lockdep_dependency_gen_id to indicate a
    class has been visited in __bfs(), now we change the semantics a little
    bit: we now mark lock_class::lockdep_dependency_gen_id to indicate _all
    the dependencies_ in its lock_{after,before} have been visited in the
    __bfs() (note we only take one direction in a __bfs() search). In this
    way, every dependency is guaranteed to be visited until we find a match.

    Note: the checks in mark_lock_accessed() and lock_accessed() are
    removed, because after this modification, we may call these two
    functions on @source_entry of __bfs(), which may not be the entry in
    "list_entries"

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-5-boqun.feng@gmail.com

    Boqun Feng
     
  • __bfs() could return four magic numbers:

    1: search succeeds, but none match.
    0: search succeeds, and finds one match.
    -1: search fails because the cq is full.
    -2: search fails because an invalid node is found.

    This patch cleans things up by using an enum type for the return value
    of __bfs() and its friends; this improves the readability of the code
    and could further help if we want to extend the BFS.
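
    A sketch of such an enum, modelled on the list above (BFS_EQUEUEFULL is
    the spelling seen elsewhere in this log, the other names are
    assumptions):

    enum bfs_result {
            BFS_EINVALIDNODE = -2,  /* an invalid node was found */
            BFS_EQUEUEFULL   = -1,  /* the circular queue is full */
            BFS_RMATCH       =  0,  /* search succeeded and found a match */
            BFS_RNOMATCH     =  1,  /* search succeeded, nothing matched */
    };

    /* Negative values are errors, non-negative values are search results. */
    #define bfs_error(res)  ((res) < 0)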

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-4-boqun.feng@gmail.com

    Boqun Feng
     
    On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
    read lock, actually it's only recursive if in_interrupt() is true. So
    change the annotation accordingly to catch more deadlocks.

    Note we used to treat read_lock() as a pure recursive read lock in
    lib/locking-selftest.c, and this is useful, especially for the lockdep
    development selftest, so we keep this behaviour via a variable that
    forces switching the lock annotation for read_lock().
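
    A sketch of the resulting annotation choice; the helper and the knob
    name are illustrative, not necessarily the kernel's exact code:

    #include <linux/kconfig.h>
    #include <linux/preempt.h>
    #include <linux/types.h>

    /* Selftest knob: force the old pure-recursive annotation. */
    extern bool force_read_lock_recursive;

    static bool read_lock_is_recursive(void)
    {
            return force_read_lock_recursive ||
                   !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
                   in_interrupt();
    }

    /* The read_lock() annotation then picks lock_acquire_shared_recursive()
     * when this returns true, and lock_acquire_shared() otherwise. */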

    Signed-off-by: Boqun Feng
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200807074238.1632519-2-boqun.feng@gmail.com

    Boqun Feng
     
    The lockdep tracepoints are under the lockdep recursion counter, which
    has a bunch of nasty side effects:

    - TRACE_IRQFLAGS doesn't work across the entire tracepoint

    - RCU-lockdep doesn't see the tracepoints either, hiding numerous
    "suspicious RCU usage" warnings.

    Pull the trace_lock_*() tracepoints completely out from under the
    lockdep recursion handling and rely completely on the trace-level
    recursion handling -- also, tracing *SHOULD* not be taking locks in any
    case.

    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org

    Peter Zijlstra
     
    Sven reported that commit a21ee6055c30 ("lockdep: Change
    hardirq{s_enabled,_context} to per-cpu variables") caused trouble on
    s390 because their this_cpu_*() primitives disable preemption, which
    then lands back in tracing.

    On the one hand, per-cpu ops should use preempt_*able_notrace() and
    raw_local_irq_*(); on the other hand, we can trivially use raw_cpu_*()
    ops for this.

    Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
    Reported-by: Sven Schnelle
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Thomas Gleixner
    Acked-by: Rafael J. Wysocki
    Tested-by: Marco Elver
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200821085348.192346882@infradead.org

    Peter Zijlstra
     

25 Aug, 2020

1 commit


11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    sequence count has now lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

07 Aug, 2020

2 commits

  • Pull KVM updates from Paolo Bonzini:
    "s390:
    - implement diag318

    x86:
    - Report last CPU for debugging
    - Emulate smaller MAXPHYADDR in the guest than in the host
    - .noinstr and tracing fixes from Thomas
    - nested SVM page table switching optimization and fixes

    Generic:
    - Unify shadow MMU cache data structures across architectures"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
    KVM: SVM: Fix sev_pin_memory() error handling
    KVM: LAPIC: Set the TDCR settable bits
    KVM: x86: Specify max TDP level via kvm_configure_mmu()
    KVM: x86/mmu: Rename max_page_level to max_huge_page_level
    KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
    KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
    KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
    KVM: VMX: Make vmx_load_mmu_pgd() static
    KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
    KVM: VMX: Drop a duplicate declaration of construct_eptp()
    KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
    KVM: Using macros instead of magic values
    MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
    KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
    KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
    KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
    KVM: VMX: Add guest physical address check in EPT violation and misconfig
    KVM: VMX: introduce vmx_need_pf_intercept
    KVM: x86: update exception bitmap on CPUID changes
    KVM: x86: rename update_bp_intercept to update_exception_bitmap
    ...

    Linus Torvalds
     
  • Pull sched/fifo updates from Ingo Molnar:
    "This adds the sched_set_fifo*() encapsulation APIs to remove static
    priority level knowledge from non-scheduler code.

    The three APIs for non-scheduler code to set SCHED_FIFO are:

    - sched_set_fifo()
    - sched_set_fifo_low()
    - sched_set_normal()

    These are two FIFO priority levels: default (high), and a 'low'
    priority level, plus sched_set_normal() to set the policy back to
    non-SCHED_FIFO.

    Since the changes affect a lot of non-scheduler code, we kept this in
    a separate tree"

    * tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    sched,tracing: Convert to sched_set_fifo()
    sched: Remove sched_set_*() return value
    sched: Remove sched_setscheduler*() EXPORTs
    sched,psi: Convert to sched_set_fifo_low()
    sched,rcutorture: Convert to sched_set_fifo_low()
    sched,rcuperf: Convert to sched_set_fifo_low()
    sched,locktorture: Convert to sched_set_fifo()
    sched,irq: Convert to sched_set_fifo()
    sched,watchdog: Convert to sched_set_fifo()
    sched,serial: Convert to sched_set_fifo()
    sched,powerclamp: Convert to sched_set_fifo()
    sched,ion: Convert to sched_set_normal()
    sched,powercap: Convert to sched_set_fifo*()
    sched,spi: Convert to sched_set_fifo*()
    sched,mmc: Convert to sched_set_fifo*()
    sched,ivtv: Convert to sched_set_fifo*()
    sched,drm/scheduler: Convert to sched_set_fifo*()
    sched,msm: Convert to sched_set_fifo*()
    sched,psci: Convert to sched_set_fifo*()
    sched,drbd: Convert to sched_set_fifo*()
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit


05 Aug, 2020

1 commit

  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

04 Aug, 2020

1 commit

  • Pull locking updates from Ingo Molnar:

    - LKMM updates: mostly documentation changes, but also some new litmus
    tests for atomic ops.

    - KCSAN updates: the most important change is that GCC 11 now has all
    fixes in place to support KCSAN, so GCC support can be enabled again.
    Also more annotations.

    - futex updates: minor cleanups and simplifications

    - seqlock updates: merge preparatory changes/cleanups for the
    'associated locks' facilities.

    - lockdep updates:
    - simplify IRQ trace event handling
    - add various new debug checks
    - simplify header dependencies, split out ,
    decouple lockdep from other low level headers some more
    - fix NMI handling

    - misc cleanups and smaller fixes

    * tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    kcsan: Improve IRQ state trace reporting
    lockdep: Refactor IRQ trace events fields into struct
    seqlock: lockdep assert non-preemptibility on seqcount_t write
    lockdep: Add preemption enabled/disabled assertion APIs
    seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
    seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
    seqlock: Reorder seqcount_t and seqlock_t API definitions
    seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
    seqlock: Properly format kernel-doc code samples
    Documentation: locking: Describe seqlock design and usage
    locking/qspinlock: Do not include atomic.h from qspinlock_types.h
    locking/atomic: Move ATOMIC_INIT into linux/types.h
    lockdep: Move list.h inclusion into lockdep.h
    locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
    futex: Remove unused or redundant includes
    futex: Consistently use fshared as boolean
    futex: Remove needless goto's
    futex: Remove put_futex_key()
    rwsem: fix commas in initialisation
    docs: locking: Replace HTTP links with HTTPS ones
    ...

    Linus Torvalds
     

03 Aug, 2020

1 commit


01 Aug, 2020

1 commit


31 Jul, 2020

1 commit

  • Refactor the IRQ trace events fields, used for printing information
    about the IRQ trace events, into a separate struct 'irqtrace_events'.

    This improves readability by separating the information only used in
    reporting, as well as enables (simplified) storing/restoring of
    irqtrace_events snapshots.

    No functional change intended.

    Signed-off-by: Marco Elver
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200729110916.3920464-1-elver@google.com
    Signed-off-by: Ingo Molnar

    Marco Elver