Commit 3a6bfbc91df04b081a44d419e0260bad54abddf7

Authored by Davidlohr Bueso
Committed by Ingo Molnar
1 parent acf5937726

arch, locking: Ciao arch_mutex_cpu_relax()

The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that on s390 the common cpu_relax() call yields the CPU, which
hurts the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call lives in the mutex
header, every user must include mutex.h, and the name is misleading
as well.

This patch (i) renames the call to cpu_relax_lowlatency() ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like the regular cpu_relax()
functions. On all archs except s390, cpu_relax_lowlatency() is simply
cpu_relax(), and thus we can take it out of mutex.h. While this can seem
redundant, I believe it is a good choice as it moves arch-specific logic
out of the generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.
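
As a minimal sketch of what the change boils down to (not taken
verbatim from the patch: the per-arch split is collapsed into a
single #ifdef here for brevity, and the spin_until_set() helper is
hypothetical, shown only as a typical call site):

#include <linux/compiler.h>	/* barrier(), ACCESS_ONCE() */
#include <asm/processor.h>	/* cpu_relax() */

#ifndef cpu_relax_lowlatency
# ifdef CONFIG_S390
/* s390: cpu_relax() may yield the virtual CPU to the hypervisor, far
 * too slow inside a spin loop, so use a plain compiler barrier. */
#  define cpu_relax_lowlatency() barrier()
# else
/* Everyone else: the usual polite busy-wait (e.g. "rep; nop" on x86). */
#  define cpu_relax_lowlatency() cpu_relax()
# endif
#endif

/* Hypothetical caller, mirroring the spin loops in kernel/locking/: */
static inline void spin_until_set(int *flag)
{
	while (!ACCESS_ONCE(*flag))
		cpu_relax_lowlatency();	/* low-latency spin, never yields */
}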

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Showing 36 changed files with 51 additions and 25 deletions

arch/alpha/include/asm/processor.h
... ... @@ -57,6 +57,7 @@
57 57 ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
58 58  
59 59 #define cpu_relax() barrier()
  60 +#define cpu_relax_lowlatency() cpu_relax()
60 61  
61 62 #define ARCH_HAS_PREFETCH
62 63 #define ARCH_HAS_PREFETCHW
arch/arc/include/asm/processor.h
... ... @@ -62,6 +62,8 @@
62 62 #define cpu_relax() do { } while (0)
63 63 #endif
64 64  
  65 +#define cpu_relax_lowlatency() cpu_relax()
  66 +
65 67 #define copy_segments(tsk, mm) do { } while (0)
66 68 #define release_segments(mm) do { } while (0)
67 69  
arch/arm/include/asm/processor.h
... ... @@ -82,6 +82,8 @@
82 82 #define cpu_relax() barrier()
83 83 #endif
84 84  
  85 +#define cpu_relax_lowlatency() cpu_relax()
  86 +
85 87 #define task_pt_regs(p) \
86 88 ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
87 89  
arch/arm64/include/asm/processor.h
... ... @@ -129,6 +129,7 @@
129 129 unsigned long get_wchan(struct task_struct *p);
130 130  
131 131 #define cpu_relax() barrier()
  132 +#define cpu_relax_lowlatency() cpu_relax()
132 133  
133 134 /* Thread switching */
134 135 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
arch/avr32/include/asm/processor.h
... ... @@ -92,6 +92,7 @@
92 92 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
93 93  
94 94 #define cpu_relax() barrier()
  95 +#define cpu_relax_lowlatency() cpu_relax()
95 96 #define cpu_sync_pipeline() asm volatile("sub pc, -2" : : : "memory")
96 97  
97 98 struct cpu_context {
arch/blackfin/include/asm/processor.h
... ... @@ -99,7 +99,7 @@
99 99 #define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp)
100 100  
101 101 #define cpu_relax() smp_mb()
102   -
  102 +#define cpu_relax_lowlatency() cpu_relax()
103 103  
104 104 /* Get the Silicon Revision of the chip */
105 105 static inline uint32_t __pure bfin_revid(void)
arch/c6x/include/asm/processor.h
... ... @@ -121,6 +121,7 @@
121 121 #define KSTK_ESP(task) (task_pt_regs(task)->sp)
122 122  
123 123 #define cpu_relax() do { } while (0)
  124 +#define cpu_relax_lowlatency() cpu_relax()
124 125  
125 126 extern const struct seq_operations cpuinfo_op;
126 127  
arch/cris/include/asm/processor.h
... ... @@ -63,6 +63,7 @@
63 63 #define init_stack (init_thread_union.stack)
64 64  
65 65 #define cpu_relax() barrier()
  66 +#define cpu_relax_lowlatency() cpu_relax()
66 67  
67 68 void default_idle(void);
68 69  
arch/hexagon/include/asm/processor.h
... ... @@ -56,6 +56,7 @@
56 56 }
57 57  
58 58 #define cpu_relax() __vmyield()
  59 +#define cpu_relax_lowlatency() cpu_relax()
59 60  
60 61 /*
61 62 * Decides where the kernel will search for a free chunk of vm space during
arch/ia64/include/asm/processor.h
... ... @@ -548,6 +548,7 @@
548 548 }
549 549  
550 550 #define cpu_relax() ia64_hint(ia64_hint_pause)
  551 +#define cpu_relax_lowlatency() cpu_relax()
551 552  
552 553 static inline int
553 554 ia64_get_irr(unsigned int vector)
arch/m32r/include/asm/processor.h
... ... @@ -133,6 +133,7 @@
133 133 #define KSTK_ESP(tsk) ((tsk)->thread.sp)
134 134  
135 135 #define cpu_relax() barrier()
  136 +#define cpu_relax_lowlatency() cpu_relax()
136 137  
137 138 #endif /* _ASM_M32R_PROCESSOR_H */
arch/m68k/include/asm/processor.h
... ... @@ -176,6 +176,7 @@
176 176 #define task_pt_regs(tsk) ((struct pt_regs *) ((tsk)->thread.esp0))
177 177  
178 178 #define cpu_relax() barrier()
  179 +#define cpu_relax_lowlatency() cpu_relax()
179 180  
180 181 #endif
arch/metag/include/asm/processor.h
... ... @@ -155,6 +155,7 @@
155 155 #define user_stack_pointer(regs) ((regs)->ctx.AX[0].U0)
156 156  
157 157 #define cpu_relax() barrier()
  158 +#define cpu_relax_lowlatency() cpu_relax()
158 159  
159 160 extern void setup_priv(void);
160 161  
arch/microblaze/include/asm/processor.h
... ... @@ -22,6 +22,7 @@
22 22 extern const struct seq_operations cpuinfo_op;
23 23  
24 24 # define cpu_relax() barrier()
  25 +# define cpu_relax_lowlatency() cpu_relax()
25 26  
26 27 #define task_pt_regs(tsk) \
27 28 (((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
arch/mips/include/asm/processor.h
... ... @@ -367,6 +367,7 @@
367 367 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
368 368  
369 369 #define cpu_relax() barrier()
  370 +#define cpu_relax_lowlatency() cpu_relax()
370 371  
371 372 /*
372 373 * Return_address is a replacement for __builtin_return_address(count)
arch/mn10300/include/asm/processor.h
... ... @@ -68,7 +68,9 @@
68 68 extern void identify_cpu(struct mn10300_cpuinfo *);
69 69 extern void print_cpu_info(struct mn10300_cpuinfo *);
70 70 extern void dodgy_tsc(void);
  71 +
71 72 #define cpu_relax() barrier()
  73 +#define cpu_relax_lowlatency() cpu_relax()
72 74  
73 75 /*
74 76 * User space process size: 1.75GB (default).
arch/openrisc/include/asm/processor.h
... ... @@ -101,6 +101,7 @@
101 101 #define init_stack (init_thread_union.stack)
102 102  
103 103 #define cpu_relax() barrier()
  104 +#define cpu_relax_lowlatency() cpu_relax()
104 105  
105 106 #endif /* __ASSEMBLY__ */
106 107 #endif /* __ASM_OPENRISC_PROCESSOR_H */
arch/parisc/include/asm/processor.h
... ... @@ -338,6 +338,7 @@
338 338 #define KSTK_ESP(tsk) ((tsk)->thread.regs.gr[30])
339 339  
340 340 #define cpu_relax() barrier()
  341 +#define cpu_relax_lowlatency() cpu_relax()
341 342  
342 343 /* Used as a macro to identify the combined VIPT/PIPT cached
343 344 * CPUs which require a guarantee of coherency (no inequivalent
arch/powerpc/include/asm/processor.h
... ... @@ -400,6 +400,8 @@
400 400 #define cpu_relax() barrier()
401 401 #endif
402 402  
  403 +#define cpu_relax_lowlatency() cpu_relax()
  404 +
403 405 /* Check that a certain kernel stack pointer is valid in task_struct p */
404 406 int validate_sp(unsigned long sp, struct task_struct *p,
405 407 unsigned long nbytes);
arch/s390/include/asm/processor.h
... ... @@ -217,7 +217,7 @@
217 217 barrier();
218 218 }
219 219  
220   -#define arch_mutex_cpu_relax() barrier()
  220 +#define cpu_relax_lowlatency() barrier()
221 221  
222 222 static inline void psw_set_key(unsigned int key)
223 223 {
arch/score/include/asm/processor.h
... ... @@ -24,6 +24,7 @@
24 24 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
25 25  
26 26 #define cpu_relax() barrier()
  27 +#define cpu_relax_lowlatency() cpu_relax()
27 28 #define release_thread(thread) do {} while (0)
28 29  
29 30 /*
arch/sh/include/asm/processor.h
... ... @@ -97,6 +97,7 @@
97 97  
98 98 #define cpu_sleep() __asm__ __volatile__ ("sleep" : : : "memory")
99 99 #define cpu_relax() barrier()
  100 +#define cpu_relax_lowlatency() cpu_relax()
100 101  
101 102 void default_idle(void);
102 103 void stop_this_cpu(void *);
arch/sparc/include/asm/processor_32.h
... ... @@ -119,6 +119,8 @@
119 119 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
120 120  
121 121 #define cpu_relax() barrier()
  122 +#define cpu_relax_lowlatency() cpu_relax()
  123 +
122 124 extern void (*sparc_idle)(void);
123 125  
124 126 #endif
arch/sparc/include/asm/processor_64.h
... ... @@ -216,6 +216,7 @@
216 216 "nop\n\t" \
217 217 ".previous" \
218 218 ::: "memory")
  219 +#define cpu_relax_lowlatency() cpu_relax()
219 220  
220 221 /* Prefetch support. This is tuned for UltraSPARC-III and later.
221 222 * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
arch/tile/include/asm/processor.h
... ... @@ -266,6 +266,8 @@
266 266 barrier();
267 267 }
268 268  
  269 +#define cpu_relax_lowlatency() cpu_relax()
  270 +
269 271 /* Info on this processor (see fs/proc/cpuinfo.c) */
270 272 struct seq_operations;
271 273 extern const struct seq_operations cpuinfo_op;
arch/unicore32/include/asm/processor.h
... ... @@ -71,6 +71,7 @@
71 71 unsigned long get_wchan(struct task_struct *p);
72 72  
73 73 #define cpu_relax() barrier()
  74 +#define cpu_relax_lowlatency() cpu_relax()
74 75  
75 76 #define task_pt_regs(p) \
76 77 ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
arch/x86/include/asm/processor.h
... ... @@ -696,6 +696,8 @@
696 696 rep_nop();
697 697 }
698 698  
  699 +#define cpu_relax_lowlatency() cpu_relax()
  700 +
699 701 /* Stop speculative execution and prefetching of modified code. */
700 702 static inline void sync_core(void)
701 703 {
arch/x86/um/asm/processor.h
... ... @@ -25,7 +25,8 @@
25 25 __asm__ __volatile__("rep;nop": : :"memory");
26 26 }
27 27  
28   -#define cpu_relax() rep_nop()
  28 +#define cpu_relax() rep_nop()
  29 +#define cpu_relax_lowlatency() cpu_relax()
29 30  
30 31 #include <asm/processor-generic.h>
31 32  
arch/xtensa/include/asm/processor.h
... ... @@ -182,6 +182,7 @@
182 182 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->areg[1])
183 183  
184 184 #define cpu_relax() barrier()
  185 +#define cpu_relax_lowlatency() cpu_relax()
185 186  
186 187 /* Special register access. */
187 188  
include/linux/mutex.h
... ... @@ -176,9 +176,5 @@
176 176  
177 177 extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
178 178  
179   -#ifndef arch_mutex_cpu_relax
180   -# define arch_mutex_cpu_relax() cpu_relax()
181   -#endif
182   -
183 179 #endif /* __LINUX_MUTEX_H */
kernel/locking/mcs_spinlock.c
1   -
2 1 #include <linux/percpu.h>
3   -#include <linux/mutex.h>
4 2 #include <linux/sched.h>
5 3 #include "mcs_spinlock.h"
6 4  
... ... @@ -79,7 +77,7 @@
79 77 break;
80 78 }
81 79  
82   - arch_mutex_cpu_relax();
  80 + cpu_relax_lowlatency();
83 81 }
84 82  
85 83 return next;
... ... @@ -120,7 +118,7 @@
120 118 if (need_resched())
121 119 goto unqueue;
122 120  
123   - arch_mutex_cpu_relax();
  121 + cpu_relax_lowlatency();
124 122 }
125 123 return true;
126 124  
... ... @@ -146,7 +144,7 @@
146 144 if (smp_load_acquire(&node->locked))
147 145 return true;
148 146  
149   - arch_mutex_cpu_relax();
  147 + cpu_relax_lowlatency();
150 148  
151 149 /*
152 150 * Or we race against a concurrent unqueue()'s step-B, in which
kernel/locking/mcs_spinlock.h
... ... @@ -27,7 +27,7 @@
27 27 #define arch_mcs_spin_lock_contended(l) \
28 28 do { \
29 29 while (!(smp_load_acquire(l))) \
30   - arch_mutex_cpu_relax(); \
  30 + cpu_relax_lowlatency(); \
31 31 } while (0)
32 32 #endif
33 33  
... ... @@ -104,7 +104,7 @@
104 104 return;
105 105 /* Wait until the next pointer is set */
106 106 while (!(next = ACCESS_ONCE(node->next)))
107   - arch_mutex_cpu_relax();
  107 + cpu_relax_lowlatency();
108 108 }
109 109  
110 110 /* Pass lock to next waiter. */
kernel/locking/mutex.c
... ... @@ -146,7 +146,7 @@
146 146 if (need_resched())
147 147 break;
148 148  
149   - arch_mutex_cpu_relax();
  149 + cpu_relax_lowlatency();
150 150 }
151 151 rcu_read_unlock();
152 152  
... ... @@ -464,7 +464,7 @@
464 464 * memory barriers as we'll eventually observe the right
465 465 * values at the cost of a few extra spins.
466 466 */
467   - arch_mutex_cpu_relax();
  467 + cpu_relax_lowlatency();
468 468 }
469 469 osq_unlock(&lock->osq);
470 470 slowpath:
kernel/locking/qrwlock.c
... ... @@ -20,7 +20,6 @@
20 20 #include <linux/cpumask.h>
21 21 #include <linux/percpu.h>
22 22 #include <linux/hardirq.h>
23   -#include <linux/mutex.h>
24 23 #include <asm/qrwlock.h>
25 24  
26 25 /**
... ... @@ -35,7 +34,7 @@
35 34 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
36 35 {
37 36 while ((cnts & _QW_WMASK) == _QW_LOCKED) {
38   - arch_mutex_cpu_relax();
  37 + cpu_relax_lowlatency();
39 38 cnts = smp_load_acquire((u32 *)&lock->cnts);
40 39 }
41 40 }
... ... @@ -75,7 +74,7 @@
75 74 * to make sure that the write lock isn't taken.
76 75 */
77 76 while (atomic_read(&lock->cnts) & _QW_WMASK)
78   - arch_mutex_cpu_relax();
  77 + cpu_relax_lowlatency();
79 78  
80 79 cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
81 80 rspin_until_writer_unlock(lock, cnts);
... ... @@ -114,7 +113,7 @@
114 113 cnts | _QW_WAITING) == cnts))
115 114 break;
116 115  
117   - arch_mutex_cpu_relax();
  116 + cpu_relax_lowlatency();
118 117 }
119 118  
120 119 /* When no more readers, set the locked flag */
... ... @@ -125,7 +124,7 @@
125 124 _QW_LOCKED) == _QW_WAITING))
126 125 break;
127 126  
128   - arch_mutex_cpu_relax();
  127 + cpu_relax_lowlatency();
129 128 }
130 129 unlock:
131 130 arch_spin_unlock(&lock->lock);
kernel/locking/rwsem-xadd.c
... ... @@ -329,7 +329,7 @@
329 329 if (need_resched())
330 330 break;
331 331  
332   - arch_mutex_cpu_relax();
  332 + cpu_relax_lowlatency();
333 333 }
334 334 rcu_read_unlock();
335 335  
... ... @@ -381,7 +381,7 @@
381 381 * memory barriers as we'll eventually observe the right
382 382 * values at the cost of a few extra spins.
383 383 */
384   - arch_mutex_cpu_relax();
  384 + cpu_relax_lowlatency();
385 385 }
386 386 osq_unlock(&sem->osq);
387 387 done:
lib/lockref.c
1 1 #include <linux/export.h>
2 2 #include <linux/lockref.h>
3   -#include <linux/mutex.h>
4 3  
5 4 #if USE_CMPXCHG_LOCKREF
6 5  
... ... @@ -29,7 +28,7 @@
29 28 if (likely(old.lock_count == prev.lock_count)) { \
30 29 SUCCESS; \
31 30 } \
32   - arch_mutex_cpu_relax(); \
  31 + cpu_relax_lowlatency(); \
33 32 } \
34 33 } while (0)
35 34