Commit a6537be9324c67b41f6d98f5a60a1bd5a8e02861
Committed by Linus Torvalds
1 parent: 23f78d4a03
Exists in master and in 20 other branches
[PATCH] pi-futex: rt mutex docs
Add rt-mutex documentation.

[rostedt@goodmis.org: update rt-mutex-design.txt per Randy Dunlap's suggestions]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Showing 3 changed files with 981 additions and 0 deletions
Documentation/pi-futex.txt

Lightweight PI-futexes
----------------------

We are calling them lightweight for 3 reasons:

 - in the user-space fastpath a PI-enabled futex involves no kernel work
   (or any other PI complexity) at all. No registration, no extra kernel
   calls - just pure fast atomic ops in userspace.

 - even in the slowpath, the system call and scheduling pattern is very
   similar to normal futexes.

 - the in-kernel PI implementation is streamlined around the mutex
   abstraction, with strict rules that keep the implementation
   relatively simple: only a single owner may own a lock (i.e. no
   read-write lock support), only the owner may unlock a lock, no
   recursive locking, etc.

Priority Inheritance - why?
---------------------------

The short reply: user-space PI helps achieve/improve determinism for
user-space applications. In the best case, it can help achieve
determinism and well-bound latencies. Even in the worst case, PI will
improve the statistical distribution of locking-related application
delays.

The longer reply:
-----------------

Firstly, sharing locks between multiple tasks is a common programming
technique that often cannot be replaced with lockless algorithms. As we
can see in the kernel [which is a quite complex program in itself],
lockless structures are rather the exception than the norm - the current
ratio of lockless vs. locky code for shared data structures is somewhere
between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
algorithms often endangers the ability to do robust reviews of said code.
I.e. critical RT apps often choose lock structures to protect critical
data structures, instead of lockless algorithms. Furthermore, there are
cases (like shared hardware, or other resource limits) where lockless
access is mathematically impossible.

Media players (such as Jack) are an example of reasonable application
design with multiple tasks (with multiple priority levels) sharing
short-held locks: for example, a high-prio audio playback thread is
combined with medium-prio construct-audio-data threads and low-prio
display-colory-stuff threads. Add video and decoding to the mix and
we've got even more priority levels.

So once we accept that synchronization objects (locks) are an
unavoidable fact of life, and once we accept that multi-task userspace
apps have a very fair expectation of being able to use locks, we've got
to think about how to offer the option of a deterministic locking
implementation to user-space.

Most of the technical counter-arguments against doing priority
inheritance only apply to kernel-space locks. But user-space locks are
different: there we cannot disable interrupts or make the task
non-preemptible in a critical section, so the 'use spinlocks' argument
does not apply (user-space spinlocks have the same priority inversion
problems as other user-space locking constructs). In fact, pretty much
the only technique that currently enables good determinism for userspace
locks (such as futex-based pthread mutexes) is priority inheritance:

Currently (without PI), if a high-prio and a low-prio task share a lock
[this is a quite common scenario for most non-trivial RT applications],
even if all critical sections are coded carefully to be deterministic
(i.e. all critical sections are short in duration and only execute a
limited number of instructions), the kernel cannot guarantee any
deterministic execution of the high-prio task: any medium-priority task
could preempt the low-prio task while it holds the shared lock and
executes the critical section, and could delay it indefinitely.

Implementation:
---------------

As mentioned before, the userspace fastpath of PI-enabled pthread
mutexes involves no kernel work at all - they behave quite similarly to
normal futex-based locks: a value of 0 means unlocked, and a value==TID
means locked. (This is the same method as used by list-based robust
futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
entering the kernel.

To handle the slowpath, we have added two new futex ops:

  FUTEX_LOCK_PI
  FUTEX_UNLOCK_PI

If the lock-acquire fastpath fails [i.e. an atomic transition from 0 to
TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
remaining work: if there is no futex-queue attached to the futex address
yet, then the code looks up the task that owns the futex [it has put its
own TID into the futex value], and attaches a 'PI state' structure to
the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
kernel-based synchronization object. The 'other' task is made the owner
of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
futex value. Then this task tries to lock the rt-mutex, on which it
blocks. Once it returns, it has the mutex acquired, and it sets the
futex value to its own TID and returns. Userspace has no other work to
perform - it now owns the lock, and the futex value contains
FUTEX_WAITERS|TID.

If the unlock side fastpath succeeds [i.e. userspace manages to do a
TID -> 0 atomic transition of the futex value], then no kernel work is
triggered.

If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on
behalf of userspace - and it also unlocks the attached
pi_state->rt_mutex and thus wakes up any potential waiters.

Note that under this approach, contrary to previous PI-futex approaches,
there is no prior 'registration' of a PI-futex [which is not quite
possible anyway, due to existing ABI properties of pthread mutexes].

Also, under this scheme, 'robustness' and 'PI' are two orthogonal
properties of futexes, and all four combinations are possible: futex,
robust-futex, PI-futex, robust+PI-futex.

More details about priority inheritance can be found in
Documentation/rt-mutex.txt.
Documentation/rt-mutex-design.txt

#
# Copyright (c) 2006 Steven Rostedt
# Licensed under the GNU Free Documentation License, Version 1.2
#

RT-mutex implementation design
------------------------------

This document tries to describe the design of the rtmutex.c implementation.
It doesn't describe the reasons why rtmutex.c exists. For that please see
Documentation/rt-mutex.txt. Although this document does explain problems
that can happen without this code, it does so only to build up the concepts
needed to understand what the code actually does.

The goal of this document is to help others understand the priority
inheritance (PI) algorithm that is used, as well as the reasons for the
decisions that were made to implement PI in the manner that was done.


Unbounded Priority Inversion
----------------------------

Priority inversion is when a lower priority process executes while a higher
priority process wants to run. This happens for several reasons, and
most of the time it can't be helped. Anytime a high priority process wants
to use a resource that a lower priority process has (a mutex for example),
the high priority process must wait until the lower priority process is done
with the resource. This is a priority inversion. What we want to prevent
is something called unbounded priority inversion. That is when the high
priority process is prevented from running by a lower priority process for
an undetermined amount of time.

The classic example of unbounded priority inversion is where you have three
processes, let's call them processes A, B, and C, where A is the highest
priority process, C is the lowest, and B is in between. A tries to grab a lock
that C owns and must wait, letting C run to release the lock. But in the
meantime, B executes, and since B is of a higher priority than C, it preempts C,
but by doing so, it is in fact preempting A, which is a higher priority process.
Now there's no way of knowing how long A will be sleeping waiting for C
to release the lock, because for all we know, B is a CPU hog and will
never give C a chance to release the lock. This is called unbounded priority
inversion.

Here's a little ASCII art to show the problem.

   grab lock L1 (owned by C)
     |
A ---+
        C preempted by B
          |
C    +----+

B         +-------->
                B now keeps A from running.


Priority Inheritance (PI)
-------------------------

There are several ways to solve this issue, but other ways are out of scope
for this document. Here we only discuss PI.

PI is where a process inherits the priority of another process if the other
process blocks on a lock owned by the current process. To make this easier
to understand, let's use the previous example, with processes A, B, and C again.

This time, when A blocks on the lock owned by C, C would inherit the priority
of A. So now if B becomes runnable, it would not preempt C, since C now has
the high priority of A. As soon as C releases the lock, it loses its
inherited priority, and A then can continue with the resource that C had.

Terminology
-----------

Here I explain some terminology that is used in this document to help describe
the design that is used to implement PI.

PI chain - The PI chain is an ordered series of locks and processes that cause
           processes to inherit priorities from a previous process that is
           blocked on one of its locks. This is described in more detail
           later in this document.

mutex    - In this document, to differentiate the locks that implement PI
           from the spin locks that are used in the PI code, the PI locks
           will from now on be called mutexes.

lock     - In this document from now on, I will use the term lock when
           referring to spin locks that are used to protect parts of the PI
           algorithm. These locks disable preemption for UP (when
           CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from
           entering critical sections simultaneously.

spin lock - Same as lock above.

waiter   - A waiter is a struct that is stored on the stack of a blocked
           process. Since the scope of the waiter is within the code for
           a process being blocked on the mutex, it is fine to allocate
           the waiter on the process's stack (local variable). This
           structure holds a pointer to the task, as well as the mutex that
           the task is blocked on. It also has the plist node structures to
           place the task in the waiter_list of a mutex as well as the
           pi_list of a mutex owner task (described below).

           waiter is sometimes used in reference to the task that is waiting
           on a mutex. This is the same as waiter->task.

waiters  - A list of processes that are blocked on a mutex.

top waiter - The highest priority process waiting on a specific mutex.

top pi waiter - The highest priority process waiting on one of the mutexes
                that a specific process owns.

Note: task and process are used interchangeably in this document, mostly to
      differentiate between two processes that are being described together.


PI chain
--------

The PI chain is a list of processes and mutexes that may cause priority
inheritance to take place. Multiple chains may converge, but a chain
would never diverge, since a process can't be blocked on more than one
mutex at a time.

Example:

   Processes: A, B, C, D, E
   Mutexes:   L1, L2, L3, L4

   A owns: L1
           B blocked on L1
           B owns L2
                  C blocked on L2
                  C owns L3
                         D blocked on L3
                         D owns L4
                                E blocked on L4

The chain would be:

   E->L4->D->L3->C->L2->B->L1->A

To show where two chains merge, we could add another process F and
another mutex L5 where B owns L5 and F is blocked on mutex L5.

The chain for F would be:

   F->L5->B->L1->A

Since a process may own more than one mutex, but never be blocked on more than
one, the chains merge.

Here we show both chains:

   E->L4->D->L3->C->L2-+
                       |
                       +->B->L1->A
                       |
                 F->L5-+

For PI to work, the processes at the right end of these chains (or we may
also call it the Top of the chain) must be equal to or higher in priority
than the processes to the left or below in the chain.

Also since a mutex may have more than one process blocked on it, we can
have multiple chains merge at mutexes. If we add another process G that is
blocked on mutex L2:

   G->L2->B->L1->A

And once again, to show how this can grow, I will show the merging chains
again.

   E->L4->D->L3->C-+
                   +->L2-+
                   |     |
                 G-+     +->B->L1->A
                         |
                   F->L5-+


Plist
-----

Before I go further and talk about how the PI chain is stored through lists
on both mutexes and processes, I'll explain the plist. This is similar to
the struct list_head functionality that is already in the kernel.
The implementation of plist is out of scope for this document, but it is
very important to understand what it does.

There are a few differences between plist and list, the most important one
being that plist is a priority sorted linked list. This means that the
priorities of the plist are sorted, such that it takes O(1) to retrieve the
highest priority item in the list. Obviously this is useful to store processes
based on their priorities.

Another difference, which is important for implementation, is that, unlike
list, the head of the list is a different element than the nodes of a list.
So the head of the list is declared as struct plist_head and nodes that will
be added to the list are declared as struct plist_node.


Mutex Waiter List
-----------------

Every mutex keeps track of all the waiters that are blocked on it. The mutex
has a plist to store these waiters by priority. This list is protected by
a spin lock that is located in the struct of the mutex. This lock is called
wait_lock. Since the modification of the waiter list is never done in
interrupt context, the wait_lock can be taken without disabling interrupts.


Task PI List
------------

To keep track of the PI chains, each process has its own PI list. This is
a list of all top waiters of the mutexes that are owned by the process.
Note that this list only holds the top waiters and not all waiters that are
blocked on mutexes owned by the process.

The top of the task's PI list is always the highest priority task that
is waiting on a mutex that is owned by the task. So if the task has
inherited a priority, it will always be the priority of the task that is
at the top of this list.

This list is stored in the task structure of a process as a plist called
pi_list. This list is protected by a spin lock also in the task structure,
called pi_lock. This lock may also be taken in interrupt context, so when
locking the pi_lock, interrupts must be disabled.


Depth of the PI Chain
---------------------

The maximum depth of the PI chain is not dynamic, and could actually be
defined. But it is very complex to figure out, since it depends on all
the nesting of mutexes. Let's look at the example where we have 3 mutexes,
L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
The following shows a locking order of L1->L2->L3, but may not actually
be directly nested that way.

void func1(void)
{
	mutex_lock(L1);

	/* do anything */

	mutex_unlock(L1);
}

void func2(void)
{
	mutex_lock(L1);
	mutex_lock(L2);

	/* do something */

	mutex_unlock(L2);
	mutex_unlock(L1);
}

void func3(void)
{
	mutex_lock(L2);
	mutex_lock(L3);

	/* do something else */

	mutex_unlock(L3);
	mutex_unlock(L2);
}

void func4(void)
{
	mutex_lock(L3);

	/* do something again */

	mutex_unlock(L3);
}

Now we add 4 processes that run each of these functions separately.
Processes A, B, C, and D run functions func1, func2, func3 and func4
respectively, such that D runs first and A last. With D being preempted
in func4 in the "do something again" area, we have a locking state that
follows:

D owns L3
       C blocked on L3
       C owns L2
              B blocked on L2
              B owns L1
                     A blocked on L1

And thus we have the chain A->L1->B->L2->C->L3->D.

This gives us a PI depth of 4 (four processes), but looking at any of the
functions individually, it seems as though they only have at most a locking
depth of two. So, although the locking depth is defined at compile time,
it still is very difficult to find all the possibilities of that depth.

Now since mutexes can be defined by user-land applications, we don't want a DoS
type of application that nests large amounts of mutexes to create a large
PI chain, and have the code holding spin locks while looking at a large
amount of data. So to prevent this, the implementation not only enforces
a maximum lock depth, but also only holds at most two different locks at a
time, as it walks the PI chain. More about this below.


Mutex owner and flags
---------------------

The mutex structure contains a pointer to the owner of the mutex. If the
mutex is not owned, this owner is set to NULL. Since all architectures
have the task structure on at least a four byte alignment (and if this is
not true, the rtmutex.c code will be broken!), this allows for the two
least significant bits to be used as flags. This part is also described
in Documentation/rt-mutex.txt, but will also be briefly described here.

Bit 0 is used as the "Pending Owner" flag. This is described later.
Bit 1 is used as the "Has Waiters" flag. This is also described later
      in more detail, but is set whenever there are waiters on a mutex.


cmpxchg Tricks
--------------

Some architectures implement an atomic cmpxchg (Compare and Exchange). This
is used (when applicable) to keep the fast path of grabbing and releasing
mutexes short.

cmpxchg is basically the following function performed atomically:

unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
{
	unsigned long T = *A;
	if (*A == *B) {
		*A = *C;
	}
	return T;
}
#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)

This is really nice to have, since it allows you to only update a variable
if the variable is what you expect it to be. You know it succeeded if
the return value (the old value of A) is equal to B.

The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If
the architecture does not support CMPXCHG, then this macro is simply set
to fail every time. But if CMPXCHG is supported, then this helps
enormously in keeping the fast path short.

The use of rt_mutex_cmpxchg with the flags in the owner field helps optimize
the system for architectures that support it. This will also be explained
later in this document.


Priority adjustments
--------------------

The implementation of the PI code in rtmutex.c has several places where a
process must adjust its priority. With the help of the pi_list of a
process, it is rather easy to know what needs to be adjusted.

The functions implementing the task adjustments are rt_mutex_adjust_prio,
__rt_mutex_adjust_prio (same as the former, but expects the task pi_lock
to already be taken), rt_mutex_getprio, and rt_mutex_setprio.

rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio.

rt_mutex_getprio returns the priority that the task should have. Either the
task's own normal priority, or if a process of a higher priority is waiting on
a mutex owned by the task, then that higher priority should be returned.
Since the pi_list of a task holds a priority-ordered list of all the top
waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs
to compare the top pi waiter to its own normal priority, and return the higher
priority back.

(Note: if looking at the code, you will notice that the lower number of
 prio is returned. This is because the prio field in the task structure
 is an inverse order of the actual priority. So a "prio" of 5 is
 of higher priority than a "prio" of 10.)

__rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the
result does not equal the task's current priority, then rt_mutex_setprio
is called to adjust the priority of the task to the new priority.
Note that rt_mutex_setprio is defined in kernel/sched.c to implement the
actual change in priority.

It is interesting to note that __rt_mutex_adjust_prio can either increase
or decrease the priority of the task. In the case that a higher priority
process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio
would increase/boost the task's priority. But if a higher priority task
were for some reason to leave the mutex (timeout or signal), this same function
would decrease/unboost the priority of the task. That is because the pi_list
always contains the highest priority task that is waiting on a mutex owned
by the task, so we only need to compare the priority of that top pi waiter
to the normal priority of the given task.


High level overview of the PI chain walk
----------------------------------------

The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain.

The implementation has gone through several iterations, and has ended up
with what we believe is the best. It walks the PI chain by only grabbing
at most two locks at a time, and is very efficient.

rt_mutex_adjust_prio_chain can be used either to boost or lower process
priorities.

rt_mutex_adjust_prio_chain is called with a task to be checked for PI
(de)boosting (the owner of a mutex that a process is blocking on), a flag to
check for deadlocking, the mutex that the task owns, and a pointer to a waiter
that is the process's waiter struct that is blocked on the mutex (although this
parameter may be NULL for deboosting).

For this explanation, I will not mention deadlock detection. This explanation
will try to stay at a high level.

When this function is called, there are no locks held. That also means
that the state of the owner and lock can change when entering this function.

Before this function is called, the task has already had rt_mutex_adjust_prio
performed on it. This means that the task is set to the priority that it
should be at, but the plist nodes of the task's waiter have not been updated
with the new priorities, and this task may not be in the proper locations
in the pi_lists and wait_lists that the task is blocked on. This function
solves all that.

A loop is entered, where task is the owner to be checked for PI changes that
was passed by parameter (for the first iteration). The pi_lock of this task is
taken to prevent any more changes to the pi_list of the task. This also
prevents new tasks from completing the blocking on a mutex that is owned by this
task.

If the task is not blocked on a mutex then the loop is exited. We are at
the top of the PI chain.

A check is now done to see if the original waiter (the process that is blocked
on the current mutex) is the top pi waiter of the task. That is, is this
waiter on the top of the task's pi_list? If it is not, it either means that
there is another process higher in priority that is blocked on one of the
mutexes that the task owns, or that the waiter has just woken up via a signal
or timeout and has left the PI chain. In either case, the loop is exited, since
we don't need to do any more changes to the priority of the current task, or any
task that owns a mutex that this current task is waiting on. A priority chain
walk is only needed when a new top pi waiter is made to a task.

The next check sees if the task's waiter plist node has a priority equal to
the priority the task is set at. If they are equal, then we are done with
the loop. Remember that the function started with the priority of the
task adjusted, but the plist nodes that hold the task in other processes'
pi_lists have not been adjusted.

Next, we look at the mutex that the task is blocked on. The mutex's wait_lock
is taken with a spin_trylock, because the locking order of the
pi_lock and wait_lock goes in the opposite direction. If we fail to grab the
lock, the pi_lock is released, and we restart the loop.

Now that we have both the pi_lock of the task as well as the wait_lock of
the mutex the task is blocked on, we update the task's waiter's plist node
that is located on the mutex's wait_list.

Now we release the pi_lock of the task.

Next the owner of the mutex has its pi_lock taken, so we can update the
task's entry in the owner's pi_list. If the task is the highest priority
process on the mutex's wait_list, then we remove the previous top waiter
from the owner's pi_list, and replace it with the task.

Note: It is possible that the task was the current top waiter on the mutex,
      in which case the task is not yet on the owner's pi_list. This
      is OK, since plist_del does nothing if the plist node is not on any
      list.

If the task was not the top waiter of the mutex, but it was before we
did the priority updates, that means we are deboosting/lowering the
task. In this case, the task is removed from the pi_list of the owner,
and the new top waiter is added.

Lastly, we unlock both the pi_lock of the task, as well as the mutex's
wait_lock, and continue the loop again. On the next iteration of the
loop, the previous owner of the mutex will be the task that will be
processed.
487 | + | |
488 | +Note: One might think that the owner of this mutex might have changed | |
489 | + since we just grab the mutex's wait_lock. And one could be right. | |
490 | + The important thing to remember is that the owner could not have | |
491 | + become the task that is being processed in the PI chain, since | |
492 | + we have taken that task's pi_lock at the beginning of the loop. | |
493 | + So as long as there is an owner of this mutex that is not the same | |
494 | + process as the tasked being worked on, we are OK. | |
495 | + | |
      Looking closely at the code, one might be confused. The check for the
      end of the PI chain is when the task isn't blocked on anything or the
      task's waiter structure "task" element is NULL. This check is
      protected only by the task's pi_lock. But the code to unlock the mutex
      sets the task's waiter structure "task" element to NULL with only
      the protection of the mutex's wait_lock, which was not taken yet.
      Isn't this a race condition if the task becomes the new owner?

      The answer is No! The trick is the spin_trylock of the mutex's
      wait_lock. If we fail that lock, we release the pi_lock of the
      task and continue the loop, doing the end of PI chain check again.

      In the code to release the lock, the wait_lock of the mutex is held
      the entire time, and it is not let go when we grab the pi_lock of the
      new owner of the mutex. So if the switch to a new owner were to happen
      after the check for the end of the PI chain and the grabbing of the
      wait_lock, the unlocking code would spin on the new owner's pi_lock
      but never give up the wait_lock. So the PI chain loop is guaranteed to
      fail the spin_trylock on the wait_lock, release the pi_lock, and
      try again.

      If you don't quite understand the above, that's OK. You don't have to,
      unless you really want to make a proof out of it ;)


Pending Owners and Lock stealing
--------------------------------

One of the flags in the owner field of the mutex structure is "Pending Owner".
What this means is that an owner was chosen by the process releasing the
mutex, but that owner has yet to wake up and actually take the mutex.

Why is this important? Why can't we just give the mutex to another process
and be done with it?

The PI code is there to help with real-time processes, and to let the highest
priority process run as long as possible with as little latency and delay as
possible. If a high priority process owns a mutex that a lower priority
process is blocked on, when the mutex is released it would be given to the
lower priority process. What if the higher priority process wants to take
that mutex again? The high priority process would fail to take the mutex
that it just gave up, and would need to boost the lower priority process,
paying the full latency of that critical section (since the low priority
process just entered it).

There's no reason a high priority process that gives up a mutex should be
penalized if it tries to take that mutex again. If the new owner of the
mutex has not woken up yet, there's no reason that the higher priority process
could not take that mutex away.

To solve this, we introduced Pending Ownership and Lock Stealing. When a
new process is given a mutex that it was blocked on, it is only given
pending ownership. This means that it's the new owner, unless a higher
priority process comes in and tries to grab that mutex. If a higher priority
process does come along and wants that mutex, we let the higher priority
process "steal" the mutex from the pending owner (only if it is still pending)
and continue with the mutex.


Taking of a mutex (The walk through)
------------------------------------

OK, now let's take a look at the detailed walk through of what happens when
taking a mutex.

The first thing that is tried is the fast taking of the mutex. This is
done when we have CMPXCHG enabled (otherwise the fast taking automatically
fails). Only when the owner field of the mutex is NULL can the lock be
taken with the CMPXCHG, with nothing else needing to be done.

If there is contention on the lock, whether it is owned or has a pending
owner, we take the slow path (rt_mutex_slowlock).

The slow path function is where the task's waiter structure is created on
the stack. This is because the waiter structure is only needed for the
scope of this function. The waiter structure holds the nodes to store
the task on the wait_list of the mutex, and if need be, the pi_list of
the owner.

The wait_lock of the mutex is taken since the slow path of unlocking the
mutex also takes this lock.

We then call try_to_take_rt_mutex. This is where an architecture that
does not implement CMPXCHG would always grab the lock (if there's no
contention).

try_to_take_rt_mutex is used every time the task tries to grab a mutex in the
slow path. The first thing that is done here is an atomic setting of
the "Has Waiters" flag of the mutex's owner field. Yes, this could really
be false, because if the mutex has no owner, there are no waiters and
the current task also won't have any waiters. But we don't have the lock
yet, so we assume we are going to be a waiter. The reason for this is to
play nice with those architectures that do have CMPXCHG. By setting this flag
now, the owner of the mutex can't release the mutex without going into the
slow unlock path, and it would then need to grab the wait_lock, which this
code currently holds. So setting the "Has Waiters" flag forces the owner
to synchronize with this code.

Now that we know that we can't have any races with the owner releasing the
mutex, we check to see if we can take the ownership. This is done if the
mutex doesn't have an owner, or if we can steal the mutex from a pending
owner. Let's look at the situations we have here.

 1) Has owner that is pending
 ----------------------------

 The mutex has an owner, but it hasn't woken up and the mutex flag
 "Pending Owner" is set. The first check is to see if the owner isn't the
 current task. This is because this function is also used for the pending
 owner to grab the mutex. When a pending owner wakes up, it checks to see
 if it can take the mutex, and this is done if the owner is already set to
 itself. If so, we succeed and leave the function, clearing the "Pending
 Owner" bit.

 If the pending owner is not current, we check to see if the current priority
 is higher than the pending owner's. If not, we fail the function and return.

 There's also something special about a pending owner: a pending owner is
 never blocked on a mutex. So there is no PI chain to worry about. It also
 means that if the mutex doesn't have any waiters, there's no accounting needed
 to update the pending owner's pi_list, since we only worry about processes
 blocked on the current mutex.

 If there are waiters on this mutex, and we just stole the ownership, we need
 to take the top waiter, remove it from the pi_list of the pending owner, and
 add it to the current process's pi_list. Note that at this moment, the
 pending owner is no longer on the list of waiters. This is fine, since the
 pending owner would add itself back when it realizes that it had the
 ownership stolen from itself. When the pending owner tries to grab the mutex,
 it will fail in try_to_take_rt_mutex if the owner field points to another
 process.

 2) No owner
 -----------

 If there is no owner (or we successfully stole the lock), we set the owner
 of the mutex to current, and set the flag of "Has Waiters" if the current
 mutex actually has waiters, or we clear the flag if it doesn't. See, it was
 OK that we set that flag early, since now it is cleared.

 3) Failed to grab ownership
 ---------------------------

 The most interesting case is when we fail to take ownership. This means that
 there exists an owner, or there's a pending owner with equal or higher
 priority than the current task.

We'll continue on the failed case.

If the mutex has a timeout, we set up a timer to go off to break us out
of this wait if we failed to get the mutex after a specified amount of time.

Now we enter a loop that will continue to try to take ownership of the mutex,
or fail from a timeout or signal.

Once again we try to take the mutex. This will usually fail the first time
in the loop, since it had just failed to get the mutex. But the second time
in the loop, this would likely succeed, since the task would likely be
the pending owner.

If the mutex is taken with TASK_INTERRUPTIBLE, a check for signals and
timeout is done here.

The waiter structure has a "task" field that points to the task that is
blocked on the mutex. This field can be NULL the first time it goes through
the loop or if the task is a pending owner and had its mutex stolen. If the
"task" field is NULL then we need to set up the accounting for it.

Task blocks on mutex
--------------------

The accounting of a mutex and process is done with the waiter structure of
the process. The "task" field is set to the process, and the "lock" field
to the mutex. The plist nodes are initialized to the process's current
priority.

Since the wait_lock was taken at the entry of the slow lock, we can safely
add the waiter to the wait_list. If the current process is the highest
priority process currently waiting on this mutex, then we remove the
previous top waiter process (if it exists) from the pi_list of the owner,
and add the current process to that list. Since the pi_list of the owner
has changed, we call rt_mutex_adjust_prio on the owner to see if the owner
should adjust its priority accordingly.

If the owner is also blocked on a lock, and had its pi_list changed
(or deadlock checking is on), we unlock the wait_lock of the mutex and go
ahead and run rt_mutex_adjust_prio_chain on the owner, as described earlier.

Now all locks are released, and if the current process is still blocked on a
mutex (waiter "task" field is not NULL), then we go to sleep (call schedule).

Waking up in the loop
---------------------

The schedule call can then wake up for a few reasons:
  1) we were given pending ownership of the mutex.
  2) we received a signal and were TASK_INTERRUPTIBLE
  3) we had a timeout and were TASK_INTERRUPTIBLE

In any of these cases, we continue the loop and once again try to grab the
ownership of the mutex. If we succeed, we exit the loop; otherwise, on a
signal or timeout, we exit the loop. If we had the mutex stolen, we simply
add ourselves back on the lists and go back to sleep.

Note: For various reasons, because of timeout and signals, the steal mutex
      algorithm needs to be careful. This is because the current process is
      still on the wait_list. And because of dynamic changing of priorities,
      especially on SCHED_OTHER tasks, the current process can be the
      highest priority task on the wait_list.

Failed to get mutex on Timeout or Signal
----------------------------------------

If a timeout or signal occurred, the waiter's "task" field would not be
NULL and the task needs to be taken off the wait_list of the mutex and
perhaps the pi_list of the owner. If this process was a high priority
process, then rt_mutex_adjust_prio_chain needs to be executed again on the
owner, but this time it will be lowering the priorities.


Unlocking the Mutex
-------------------

The unlocking of a mutex also has a fast path for those architectures with
CMPXCHG. Since the taking of a mutex on contention always sets the
"Has Waiters" flag of the mutex's owner, we use this to know if we need to
take the slow path when unlocking the mutex. If the mutex doesn't have any
waiters, the owner field of the mutex would equal the current process and
the mutex can be unlocked by just replacing the owner field with NULL.

If the owner field has the "Has Waiters" bit set (or CMPXCHG is not
available), the slow unlock path is taken.

The first thing done in the slow unlock path is to take the wait_lock of the
mutex. This synchronizes the locking and unlocking of the mutex.

A check is made to see if the mutex has waiters or not. On architectures that
do not have CMPXCHG, this is the location where the owner of the mutex will
determine if a waiter needs to be awoken or not. On architectures that
do have CMPXCHG, that check is done in the fast path, but it is still needed
in the slow path too. If a waiter of a mutex woke up because of a signal
or timeout between the time the owner failed the fast path CMPXCHG check and
the grabbing of the wait_lock, the mutex may not have any waiters, thus the
owner still needs to make this check. If there are no waiters then the mutex
owner field is set to NULL, the wait_lock is released, and nothing more is
needed.

If there are waiters, then we need to wake one up and give that waiter
pending ownership.

In the wake up code, the pi_lock of the current owner is taken. The top
waiter of the lock is found and removed from the wait_list of the mutex
as well as the pi_list of the current owner. The "task" field of the new
pending owner's waiter structure is set to NULL, and the owner field of the
mutex is set to the new owner with the "Pending Owner" bit set, as well
as the "Has Waiters" bit if there still are other processes blocked on the
mutex.

The pi_lock of the previous owner is released, and the new pending owner's
pi_lock is taken. Remember that this is the trick used to prevent the race
condition in rt_mutex_adjust_prio_chain from adding itself as a waiter
on the mutex.

We now clear the "pi_blocked_on" field of the new pending owner, and if
the mutex still has waiters pending, we add the new top waiter to the pi_list
of the pending owner.

Finally we unlock the pi_lock of the pending owner and wake it up.


Contact
-------

For updates on this document, please email Steven Rostedt <rostedt@goodmis.org>


Credits
-------

Author: Steven Rostedt <rostedt@goodmis.org>

Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap

Updates
-------

This document was originally written for 2.6.17-rc3-mm1
Documentation/rt-mutex.txt
RT-mutex subsystem with PI support
----------------------------------

RT-mutexes with priority inheritance are used to support PI-futexes,
which enable pthread_mutex_t priority inheritance attributes
(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
about PI-futexes.]

This technology was developed in the -rt tree and streamlined for
pthread_mutex support.

Basic principles:
-----------------

RT-mutexes extend the semantics of simple mutexes by the priority
inheritance protocol.

A low priority owner of a rt-mutex inherits the priority of a higher
priority waiter until the rt-mutex is released. If the temporarily
boosted owner blocks on a rt-mutex itself, it propagates the priority
boosting to the owner of the other rt_mutex it gets blocked on. The
priority boosting is immediately removed once the rt_mutex has been
unlocked.

This approach allows us to shorten the blocking of high-prio tasks on
mutexes which protect shared resources. Priority inheritance is not a
magic bullet for poorly designed applications, but it allows
well-designed applications to use userspace locks in critical parts of
a high priority thread, without losing determinism.

The enqueueing of the waiters into the rtmutex waiter list is done in
priority order. For equal priorities FIFO order is chosen. For each
rtmutex, only the top priority waiter is enqueued into the owner's
priority waiters list. This list too queues in priority order. Whenever
the top priority waiter of a task changes (for example it timed out or
got a signal), the priority of the owner task is readjusted. [The
priority enqueueing is handled by "plists", see include/linux/plist.h
for more details.]

RT-mutexes are optimized for fastpath operations and have no internal
locking overhead when locking an uncontended mutex or unlocking a mutex
without waiters. The optimized fastpath operations require cmpxchg
support. [If that is not available then the rt-mutex internal spinlock
is used.]

The state of the rt-mutex is tracked via the owner field of the rt-mutex
structure:

rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1
are used to keep track of the "owner is pending" and "rtmutex has
waiters" state.

 owner          bit1    bit0
 NULL           0       0       mutex is free (fast acquire possible)
 NULL           0       1       invalid state
 NULL           1       0       Transitional state*
 NULL           1       1       invalid state
 taskpointer    0       0       mutex is held (fast release possible)
 taskpointer    0       1       task is pending owner
 taskpointer    1       0       mutex is held and has waiters
 taskpointer    1       1       task is pending owner and mutex has waiters

Pending-ownership handling is a performance optimization:
pending-ownership is assigned to the first (highest priority) waiter of
the mutex, when the mutex is released. The thread is woken up and once
it starts executing it can acquire the mutex. Until the mutex is taken
by it (bit 0 is cleared), a competing higher priority thread can "steal"
the mutex, which puts the woken up thread back on the waiters list.

The pending-ownership optimization is especially important for the
uninterrupted workflow of high-prio tasks which repeatedly
take/release locks that have lower-prio waiters. Without this
optimization the higher-prio thread would ping-pong to the lower-prio
task [because at unlock time we always assign a new owner].

(*) The "mutex has waiters" bit gets set to take the lock. If the lock
doesn't already have an owner, this bit is quickly cleared if there are
no waiters. So this is a transitional state to synchronize with looking
at the owner field of the mutex and the mutex owner releasing the lock.