locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()

commit 54cf809b9512be95f53ed4a5e3b631d1ac42f0fa upstream.

Similar to commits:

  51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
  d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")

qspinlock suffers from the fact that the _Q_LOCKED_VAL store is
unordered inside the ACQUIRE of the lock.

And while this is not a problem for the regular mutual exclusive
critical section usage of spinlocks, it breaks creative locking like:

	spin_lock(A)			spin_lock(B)
	spin_unlock_wait(B)		if (!spin_is_locked(A))
	do_something()			  do_something()

In that both CPUs can end up running do_something at the same time,
because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait()
spin_is_locked() loads (even on x86!!).

To avoid making the normal case slower, add smp_mb()s to the less used
spin_unlock_wait() / spin_is_locked() side of things to avoid this
problem.

Reported-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Reported-by: Giovanni Gherdovich <ggherdovich@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Peter Zijlstra · Greg Kroah-Hartman
1 parent df8ad62006
Showing 1 changed file with 26 additions and 1 deletion
include/asm-generic/qspinlock.h
@@ -27,7 +27,30 @@
  */
 static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 {
-	return atomic_read(&lock->val);
+	/*
+	 * queued_spin_lock_slowpath() can ACQUIRE the lock before
+	 * issuing the unordered store that sets _Q_LOCKED_VAL.
+	 *
+	 * See both smp_cond_acquire() sites for more detail.
+	 *
+	 * This however means that in code like:
+	 *
+	 *   spin_lock(A)		spin_lock(B)
+	 *   spin_unlock_wait(B)	spin_is_locked(A)
+	 *   do_something()		do_something()
+	 *
+	 * Both CPUs can end up running do_something() because the store
+	 * setting _Q_LOCKED_VAL will pass through the loads in
+	 * spin_unlock_wait() and/or spin_is_locked().
+	 *
+	 * Avoid this by issuing a full memory barrier between the spin_lock()
+	 * and the loads in spin_unlock_wait() and spin_is_locked().
+	 *
+	 * Note that regular mutual exclusion doesn't care about this
+	 * delayed store.
+	 */
+	smp_mb();
+	return atomic_read(&lock->val) & _Q_LOCKED_MASK;
 }
  
 /**
@@ -107,6 +130,8 @@
  */
 static inline void queued_spin_unlock_wait(struct qspinlock *lock)
 {
+	/* See queued_spin_is_locked() */
+	smp_mb();
 	while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
 		cpu_relax();
 }
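
The race described in the commit message is the classic store-buffering pattern: each CPU stores to its own lock word and then loads the other one. The following is a user-space sketch of that pattern, not kernel code. It assumes C11 atomics and pthreads; a relaxed store stands in for the unordered _Q_LOCKED_VAL store, a single relaxed load stands in for spin_unlock_wait() / spin_is_locked(), and atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(). The names lock_a_word, lock_b_word, store_then_load() and run_trials() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the "locked" state of locks A and B. */
static atomic_int lock_a_word, lock_b_word;
static bool use_fence;	/* written only before the threads are created */

/* Mark our own lock as held, then look at the other lock. */
static int store_then_load(atomic_int *mine, atomic_int *other)
{
	/* Models the unordered store that sets _Q_LOCKED_VAL. */
	atomic_store_explicit(mine, 1, memory_order_relaxed);

	/* Models the smp_mb() the patch adds in front of the load. */
	if (use_fence)
		atomic_thread_fence(memory_order_seq_cst);

	/* Models the load in spin_unlock_wait() / spin_is_locked(). */
	return atomic_load_explicit(other, memory_order_relaxed);
}

static void *cpu0(void *seen)
{
	*(int *)seen = store_then_load(&lock_a_word, &lock_b_word);
	return NULL;
}

static void *cpu1(void *seen)
{
	*(int *)seen = store_then_load(&lock_b_word, &lock_a_word);
	return NULL;
}

/*
 * Run n trials and return how many times both threads saw the other
 * lock as free, i.e. how often both would have run do_something().
 */
int run_trials(int n, bool fence)
{
	int both = 0;

	assert(n >= 0);
	use_fence = fence;
	for (int i = 0; i < n; i++) {
		pthread_t t0, t1;
		int saw_b = 1, saw_a = 1;

		atomic_store(&lock_a_word, 0);
		atomic_store(&lock_b_word, 0);
		if (pthread_create(&t0, NULL, cpu0, &saw_b) ||
		    pthread_create(&t1, NULL, cpu1, &saw_a))
			abort();
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (!saw_b && !saw_a)
			both++;
	}
	return both;
}
```

With fence set to true, the C11 memory model forbids the outcome in which both loads return 0, so run_trials() returns 0. With fence set to false the outcome is permitted and the count may be nonzero, although spawning fresh threads per trial makes the window narrow and a given run may not hit it. Build with -pthread where the toolchain requires it.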