Commit 8161239a8bcce9ad6b537c04a1fa3b5c68bae693

Authored by Lai Jiangshan
Committed by Steven Rostedt
1 parent 6fb1b30425

rtmutex: Simplify PI algorithm and make highest prio task get lock

In the current rtmutex, the pending owner may be boosted by the tasks
in the rtmutex's wait list when the pending owner is deboosted
or a task in the wait list is boosted. This boosting is unnecessary,
because the pending owner has not really taken the rtmutex yet,
so it is not reasonable.

Example.

time1:
A (high prio) owns the rtmutex.
B (mid prio) and C (low prio) are in the wait list.

time2:
A releases the lock, and B becomes the pending owner.
A (or another high-prio task) continues to run. B's prio is lower
than A's, so B is just queued on the runqueue.

time3:
A (or the other high-prio task) sleeps, but some time has passed.
B's and C's priorities changed in that period (time2 ~ time3)
due to boosting or deboosting, and C now has a higher priority
than B. Is it reasonable that C has to boost B and help B
get the rtmutex?

No! Such boosting is unrelated and unneeded before B really
owns the rtmutex. We should give C a chance to beat B and
win the rtmutex.

This is the motivation for this patch. The patch *ensures* that
only the top waiter or a higher-priority task can take the lock.

How?
1) We do not dequeue the top waiter on unlock; if the top waiter
   changes, the old top waiter will fail to take the lock and go back
   to sleep.
2) When acquiring the lock, a task gets it if the lock is not taken and:
   there is no waiter, OR it has higher priority than the waiters,
   OR it is the top waiter (see the sketch after this list).
3) Whenever the top waiter changes, the new top waiter is woken up.
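
A minimal standalone model of rule 2), with made-up types rather than the
kernel's (the real check is the new try_to_take_rt_mutex() in the diff
below); lower numerical prio means higher priority, as in the kernel:

    #include <stdbool.h>
    #include <stdio.h>

    struct model_lock {
            bool owned;            /* a real owner holds the lock */
            bool has_waiters;
            int  top_waiter_prio;  /* valid only if has_waiters */
    };

    /* Can a task of priority 'prio' take the lock right now? */
    static bool can_take(const struct model_lock *l, int prio, bool is_top_waiter)
    {
            if (l->owned)
                    return false;      /* a real owner exists */
            if (!l->has_waiters)
                    return true;       /* no waiter at all */
            if (prio < l->top_waiter_prio)
                    return true;       /* higher priority than every waiter */
            return is_top_waiter;      /* otherwise only the top waiter wins */
    }

    int main(void)
    {
            struct model_lock l = { .owned = false, .has_waiters = true,
                                    .top_waiter_prio = 50 };

            printf("%d\n", can_take(&l, 40, false)); /* 1: C outranks waiter B */
            printf("%d\n", can_take(&l, 50, true));  /* 1: B is the top waiter  */
            printf("%d\n", can_take(&l, 60, false)); /* 0: must block and wait  */
            return 0;
    }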

The algorithm is much simpler than before: there is no pending owner
and no boosting of a pending owner.

Other advantages of this patch:
1) The states of an rtmutex are reduced by half, which makes the code
   easier to read (a sketch of the new owner-field encoding follows
   this list).
2) The code becomes shorter.
3) The top waiter is not dequeued until it really takes the lock,
   so waiters retain FIFO order when the lock is stolen.
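
A standalone sketch of the simplified owner-field encoding: only a single
tag bit remains, mirroring rt_mutex_set_owner()/rt_mutex_owner() and
RT_MUTEX_HAS_WAITERS in the diff below (the struct and function names
here are made up):

    #include <stdint.h>
    #include <stdio.h>

    #define HAS_WAITERS 1UL    /* bit 0, like RT_MUTEX_HAS_WAITERS */

    struct task { int prio; };

    /* Pack the owner pointer and the "has waiters" bit into one word. */
    static void *encode_owner(struct task *owner, int has_waiters)
    {
            uintptr_t val = (uintptr_t)owner;

            if (has_waiters)
                    val |= HAS_WAITERS;
            return (void *)val;
    }

    /* Recover the owner pointer, masking off the tag bit. */
    static struct task *decode_owner(void *field)
    {
            return (struct task *)((uintptr_t)field & ~HAS_WAITERS);
    }

    int main(void)
    {
            struct task t = { .prio = 10 };
            void *field = encode_owner(&t, 1);

            printf("%d\n", decode_owner(field)->prio);                    /* 10 */
            printf("%d\n", decode_owner(encode_owner(NULL, 1)) == NULL);  /* 1  */
            return 0;
    }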

Neither advantage nor disadvantage:
1) Even though we may wake up multiple waiters (any time the top waiter
   changes), we hardly cause a "thundering herd";
   the number of woken tasks is likely 1 or very small.
2) Two APIs are changed (a usage sketch follows below).
   rt_mutex_owner()      no longer returns a pending owner; it returns NULL
                         when the top waiter is about to take the lock.
   rt_mutex_next_owner() always returns the top waiter and will not return
                         NULL while there are waiters, because the top
                         waiter is not dequeued.

   I have fixed the code that uses these APIs.
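
   A sketch of the resulting caller pattern (the helper name below is made
   up; the real code is the futex.c fixup hunk in the diff): with the
   rtmutex's wait_lock held, a NULL rt_mutex_owner() means the top waiter
   is about to take the lock, so fall back to rt_mutex_next_owner().

       /* Hypothetical helper; mirrors the futex.c fixup path below. */
       static struct task_struct *owner_or_next_owner(struct rt_mutex *lock)
       {
               struct task_struct *owner;

               raw_spin_lock(&lock->wait_lock);
               owner = rt_mutex_owner(lock);
               if (!owner)
                       owner = rt_mutex_next_owner(lock);
               raw_spin_unlock(&lock->wait_lock);

               return owner;
       }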

Needs updating after this patch is accepted:
1) Documentation/*
2) the test case scripts/rt-tester/t4-l2-pi-deboost.tst

Signed-off-by:  Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D3012D5.4060709@cn.fujitsu.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Showing 4 changed files with 127 additions and 230 deletions

kernel/futex.c
... ... @@ -1556,10 +1556,10 @@
1556 1556  
1557 1557 /*
1558 1558 * We are here either because we stole the rtmutex from the
1559   - * pending owner or we are the pending owner which failed to
1560   - * get the rtmutex. We have to replace the pending owner TID
1561   - * in the user space variable. This must be atomic as we have
1562   - * to preserve the owner died bit here.
  1559 + * previous highest priority waiter or we are the highest priority
  1560 + * waiter but failed to get the rtmutex the first time.
  1561 + * We have to replace the newowner TID in the user space variable.
  1562 + * This must be atomic as we have to preserve the owner died bit here.
1563 1563 *
1564 1564 * Note: We write the user space value _before_ changing the pi_state
1565 1565 * because we can fault here. Imagine swapped out pages or a fork
... ... @@ -1608,8 +1608,8 @@
1608 1608  
1609 1609 /*
1610 1610 * To handle the page fault we need to drop the hash bucket
1611   - * lock here. That gives the other task (either the pending
1612   - * owner itself or the task which stole the rtmutex) the
  1611 + * lock here. That gives the other task (either the highest priority
  1612 + * waiter itself or the task which stole the rtmutex) the
1613 1613 * chance to try the fixup of the pi_state. So once we are
1614 1614 * back from handling the fault we need to check the pi_state
1615 1615 * after reacquiring the hash bucket lock and before trying to
1616 1616  
1617 1617  
1618 1618  
... ... @@ -1685,18 +1685,20 @@
1685 1685 /*
1686 1686 * pi_state is incorrect, some other task did a lock steal and
1687 1687 * we returned due to timeout or signal without taking the
1688   - * rt_mutex. Too late. We can access the rt_mutex_owner without
1689   - * locking, as the other task is now blocked on the hash bucket
1690   - * lock. Fix the state up.
  1688 + * rt_mutex. Too late.
1691 1689 */
  1690 + raw_spin_lock(&q->pi_state->pi_mutex.wait_lock);
1692 1691 owner = rt_mutex_owner(&q->pi_state->pi_mutex);
  1692 + if (!owner)
  1693 + owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
  1694 + raw_spin_unlock(&q->pi_state->pi_mutex.wait_lock);
1693 1695 ret = fixup_pi_state_owner(uaddr, q, owner);
1694 1696 goto out;
1695 1697 }
1696 1698  
1697 1699 /*
1698 1700 * Paranoia check. If we did not take the lock, then we should not be
1699   - * the owner, nor the pending owner, of the rt_mutex.
  1701 + * the owner of the rt_mutex.
1700 1702 */
1701 1703 if (rt_mutex_owner(&q->pi_state->pi_mutex) == current)
1702 1704 printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p "
kernel/rtmutex-debug.c
... ... @@ -215,7 +215,6 @@
215 215 put_pid(waiter->deadlock_task_pid);
216 216 TRACE_WARN_ON(!plist_node_empty(&waiter->list_entry));
217 217 TRACE_WARN_ON(!plist_node_empty(&waiter->pi_list_entry));
218   - TRACE_WARN_ON(waiter->task);
219 218 memset(waiter, 0x22, sizeof(*waiter));
220 219 }
221 220  
kernel/rtmutex.c
... ... @@ -20,41 +20,34 @@
20 20 /*
21 21 * lock->owner state tracking:
22 22 *
23   - * lock->owner holds the task_struct pointer of the owner. Bit 0 and 1
24   - * are used to keep track of the "owner is pending" and "lock has
25   - * waiters" state.
  23 + * lock->owner holds the task_struct pointer of the owner. Bit 0
  24 + * is used to keep track of the "lock has waiters" state.
26 25 *
27   - * owner bit1 bit0
28   - * NULL 0 0 lock is free (fast acquire possible)
29   - * NULL 0 1 invalid state
30   - * NULL 1 0 Transitional State*
31   - * NULL 1 1 invalid state
32   - * taskpointer 0 0 lock is held (fast release possible)
33   - * taskpointer 0 1 task is pending owner
34   - * taskpointer 1 0 lock is held and has waiters
35   - * taskpointer 1 1 task is pending owner and lock has more waiters
  26 + * owner bit0
  27 + * NULL 0 lock is free (fast acquire possible)
  28 + * NULL 1 lock is free and has waiters and the top waiter
  29 + * is going to take the lock*
  30 + * taskpointer 0 lock is held (fast release possible)
  31 + * taskpointer 1 lock is held and has waiters**
36 32 *
37   - * Pending ownership is assigned to the top (highest priority)
38   - * waiter of the lock, when the lock is released. The thread is woken
39   - * up and can now take the lock. Until the lock is taken (bit 0
40   - * cleared) a competing higher priority thread can steal the lock
41   - * which puts the woken up thread back on the waiters list.
42   - *
43 33 * The fast atomic compare exchange based acquire and release is only
44   - * possible when bit 0 and 1 of lock->owner are 0.
  34 + * possible when bit 0 of lock->owner is 0.
45 35 *
46   - * (*) There's a small time where the owner can be NULL and the
47   - * "lock has waiters" bit is set. This can happen when grabbing the lock.
48   - * To prevent a cmpxchg of the owner releasing the lock, we need to set this
49   - * bit before looking at the lock, hence the reason this is a transitional
50   - * state.
  36 + * (*) It also can be a transitional state when grabbing the lock
  37 + * with ->wait_lock is held. To prevent any fast path cmpxchg to the lock,
  38 + * we need to set the bit0 before looking at the lock, and the owner may be
  39 + * NULL in this small time, hence this can be a transitional state.
  40 + *
  41 + * (**) There is a small time when bit 0 is set but there are no
  42 + * waiters. This can happen when grabbing the lock in the slow path.
  43 + * To prevent a cmpxchg of the owner releasing the lock, we need to
  44 + * set this bit before looking at the lock.
51 45 */
52 46  
53 47 static void
54   -rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner,
55   - unsigned long mask)
  48 +rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
56 49 {
57   - unsigned long val = (unsigned long)owner | mask;
  50 + unsigned long val = (unsigned long)owner;
58 51  
59 52 if (rt_mutex_has_waiters(lock))
60 53 val |= RT_MUTEX_HAS_WAITERS;
61 54  
62 55  
... ... @@ -203,15 +196,14 @@
203 196 * reached or the state of the chain has changed while we
204 197 * dropped the locks.
205 198 */
206   - if (!waiter || !waiter->task)
  199 + if (!waiter)
207 200 goto out_unlock_pi;
208 201  
209 202 /*
210 203 * Check the orig_waiter state. After we dropped the locks,
211   - * the previous owner of the lock might have released the lock
212   - * and made us the pending owner:
  204 + * the previous owner of the lock might have released the lock.
213 205 */
214   - if (orig_waiter && !orig_waiter->task)
  206 + if (orig_waiter && !rt_mutex_owner(orig_lock))
215 207 goto out_unlock_pi;
216 208  
217 209 /*
... ... @@ -254,6 +246,17 @@
254 246  
255 247 /* Release the task */
256 248 raw_spin_unlock_irqrestore(&task->pi_lock, flags);
  249 + if (!rt_mutex_owner(lock)) {
  250 + /*
  251 + * If the requeue above changed the top waiter, then we need
  252 + * to wake the new top waiter up to try to get the lock.
  253 + */
  254 +
  255 + if (top_waiter != rt_mutex_top_waiter(lock))
  256 + wake_up_process(rt_mutex_top_waiter(lock)->task);
  257 + raw_spin_unlock(&lock->wait_lock);
  258 + goto out_put_task;
  259 + }
257 260 put_task_struct(task);
258 261  
259 262 /* Grab the next task */
260 263  
261 264  
262 265  
... ... @@ -296,78 +299,16 @@
296 299 }
297 300  
298 301 /*
299   - * Optimization: check if we can steal the lock from the
300   - * assigned pending owner [which might not have taken the
301   - * lock yet]:
302   - */
303   -static inline int try_to_steal_lock(struct rt_mutex *lock,
304   - struct task_struct *task)
305   -{
306   - struct task_struct *pendowner = rt_mutex_owner(lock);
307   - struct rt_mutex_waiter *next;
308   - unsigned long flags;
309   -
310   - if (!rt_mutex_owner_pending(lock))
311   - return 0;
312   -
313   - if (pendowner == task)
314   - return 1;
315   -
316   - raw_spin_lock_irqsave(&pendowner->pi_lock, flags);
317   - if (task->prio >= pendowner->prio) {
318   - raw_spin_unlock_irqrestore(&pendowner->pi_lock, flags);
319   - return 0;
320   - }
321   -
322   - /*
323   - * Check if a waiter is enqueued on the pending owners
324   - * pi_waiters list. Remove it and readjust pending owners
325   - * priority.
326   - */
327   - if (likely(!rt_mutex_has_waiters(lock))) {
328   - raw_spin_unlock_irqrestore(&pendowner->pi_lock, flags);
329   - return 1;
330   - }
331   -
332   - /* No chain handling, pending owner is not blocked on anything: */
333   - next = rt_mutex_top_waiter(lock);
334   - plist_del(&next->pi_list_entry, &pendowner->pi_waiters);
335   - __rt_mutex_adjust_prio(pendowner);
336   - raw_spin_unlock_irqrestore(&pendowner->pi_lock, flags);
337   -
338   - /*
339   - * We are going to steal the lock and a waiter was
340   - * enqueued on the pending owners pi_waiters queue. So
341   - * we have to enqueue this waiter into
342   - * task->pi_waiters list. This covers the case,
343   - * where task is boosted because it holds another
344   - * lock and gets unboosted because the booster is
345   - * interrupted, so we would delay a waiter with higher
346   - * priority as task->normal_prio.
347   - *
348   - * Note: in the rare case of a SCHED_OTHER task changing
349   - * its priority and thus stealing the lock, next->task
350   - * might be task:
351   - */
352   - if (likely(next->task != task)) {
353   - raw_spin_lock_irqsave(&task->pi_lock, flags);
354   - plist_add(&next->pi_list_entry, &task->pi_waiters);
355   - __rt_mutex_adjust_prio(task);
356   - raw_spin_unlock_irqrestore(&task->pi_lock, flags);
357   - }
358   - return 1;
359   -}
360   -
361   -/*
362 302 * Try to take an rt-mutex
363 303 *
364   - * This fails
365   - * - when the lock has a real owner
366   - * - when a different pending owner exists and has higher priority than current
367   - *
368 304 * Must be called with lock->wait_lock held.
  305 + *
  306 + * @lock: the lock to be acquired.
  307 + * @task: the task which wants to acquire the lock
  308 + * @waiter: the waiter that is queued to the lock's wait list. (could be NULL)
369 309 */
370   -static int try_to_take_rt_mutex(struct rt_mutex *lock)
  310 +static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
  311 + struct rt_mutex_waiter *waiter)
371 312 {
372 313 /*
373 314 * We have to be careful here if the atomic speedups are
374 315  
375 316  
376 317  
... ... @@ -390,15 +331,52 @@
390 331 */
391 332 mark_rt_mutex_waiters(lock);
392 333  
393   - if (rt_mutex_owner(lock) && !try_to_steal_lock(lock, current))
  334 + if (rt_mutex_owner(lock))
394 335 return 0;
395 336  
  337 + /*
  338 + * It will get the lock because of one of these conditions:
  339 + * 1) there is no waiter
  340 + * 2) higher priority than waiters
  341 + * 3) it is top waiter
  342 + */
  343 + if (rt_mutex_has_waiters(lock)) {
  344 + if (task->prio >= rt_mutex_top_waiter(lock)->list_entry.prio) {
  345 + if (!waiter || waiter != rt_mutex_top_waiter(lock))
  346 + return 0;
  347 + }
  348 + }
  349 +
  350 + if (waiter || rt_mutex_has_waiters(lock)) {
  351 + unsigned long flags;
  352 + struct rt_mutex_waiter *top;
  353 +
  354 + raw_spin_lock_irqsave(&task->pi_lock, flags);
  355 +
  356 + /* remove the queued waiter. */
  357 + if (waiter) {
  358 + plist_del(&waiter->list_entry, &lock->wait_list);
  359 + task->pi_blocked_on = NULL;
  360 + }
  361 +
  362 + /*
  363 + * We have to enqueue the top waiter(if it exists) into
  364 + * task->pi_waiters list.
  365 + */
  366 + if (rt_mutex_has_waiters(lock)) {
  367 + top = rt_mutex_top_waiter(lock);
  368 + top->pi_list_entry.prio = top->list_entry.prio;
  369 + plist_add(&top->pi_list_entry, &task->pi_waiters);
  370 + }
  371 + raw_spin_unlock_irqrestore(&task->pi_lock, flags);
  372 + }
  373 +
396 374 /* We got the lock. */
397 375 debug_rt_mutex_lock(lock);
398 376  
399   - rt_mutex_set_owner(lock, current, 0);
  377 + rt_mutex_set_owner(lock, task);
400 378  
401   - rt_mutex_deadlock_account_lock(lock, current);
  379 + rt_mutex_deadlock_account_lock(lock, task);
402 380  
403 381 return 1;
404 382 }
... ... @@ -436,6 +414,9 @@
436 414  
437 415 raw_spin_unlock_irqrestore(&task->pi_lock, flags);
438 416  
  417 + if (!owner)
  418 + return 0;
  419 +
439 420 if (waiter == rt_mutex_top_waiter(lock)) {
440 421 raw_spin_lock_irqsave(&owner->pi_lock, flags);
441 422 plist_del(&top_waiter->pi_list_entry, &owner->pi_waiters);
442 423  
443 424  
... ... @@ -472,21 +453,18 @@
472 453 /*
473 454 * Wake up the next waiter on the lock.
474 455 *
475   - * Remove the top waiter from the current tasks waiter list and from
476   - * the lock waiter list. Set it as pending owner. Then wake it up.
  456 + * Remove the top waiter from the current tasks waiter list and wake it up.
477 457 *
478 458 * Called with lock->wait_lock held.
479 459 */
480 460 static void wakeup_next_waiter(struct rt_mutex *lock)
481 461 {
482 462 struct rt_mutex_waiter *waiter;
483   - struct task_struct *pendowner;
484 463 unsigned long flags;
485 464  
486 465 raw_spin_lock_irqsave(&current->pi_lock, flags);
487 466  
488 467 waiter = rt_mutex_top_waiter(lock);
489   - plist_del(&waiter->list_entry, &lock->wait_list);
490 468  
491 469 /*
492 470 * Remove it from current->pi_waiters. We do not adjust a
493 471  
494 472  
495 473  
496 474  
... ... @@ -495,43 +473,19 @@
495 473 * lock->wait_lock.
496 474 */
497 475 plist_del(&waiter->pi_list_entry, &current->pi_waiters);
498   - pendowner = waiter->task;
499   - waiter->task = NULL;
500 476  
501   - rt_mutex_set_owner(lock, pendowner, RT_MUTEX_OWNER_PENDING);
  477 + rt_mutex_set_owner(lock, NULL);
502 478  
503 479 raw_spin_unlock_irqrestore(&current->pi_lock, flags);
504 480  
505   - /*
506   - * Clear the pi_blocked_on variable and enqueue a possible
507   - * waiter into the pi_waiters list of the pending owner. This
508   - * prevents that in case the pending owner gets unboosted a
509   - * waiter with higher priority than pending-owner->normal_prio
510   - * is blocked on the unboosted (pending) owner.
511   - */
512   - raw_spin_lock_irqsave(&pendowner->pi_lock, flags);
513   -
514   - WARN_ON(!pendowner->pi_blocked_on);
515   - WARN_ON(pendowner->pi_blocked_on != waiter);
516   - WARN_ON(pendowner->pi_blocked_on->lock != lock);
517   -
518   - pendowner->pi_blocked_on = NULL;
519   -
520   - if (rt_mutex_has_waiters(lock)) {
521   - struct rt_mutex_waiter *next;
522   -
523   - next = rt_mutex_top_waiter(lock);
524   - plist_add(&next->pi_list_entry, &pendowner->pi_waiters);
525   - }
526   - raw_spin_unlock_irqrestore(&pendowner->pi_lock, flags);
527   -
528   - wake_up_process(pendowner);
  481 + wake_up_process(waiter->task);
529 482 }
530 483  
531 484 /*
532   - * Remove a waiter from a lock
  485 + * Remove a waiter from a lock and give up
533 486 *
534   - * Must be called with lock->wait_lock held
  487 + * Must be called with lock->wait_lock held and
  488 + * have just failed to try_to_take_rt_mutex().
535 489 */
536 490 static void remove_waiter(struct rt_mutex *lock,
537 491 struct rt_mutex_waiter *waiter)
538 492  
539 493  
... ... @@ -543,12 +497,14 @@
543 497  
544 498 raw_spin_lock_irqsave(&current->pi_lock, flags);
545 499 plist_del(&waiter->list_entry, &lock->wait_list);
546   - waiter->task = NULL;
547 500 current->pi_blocked_on = NULL;
548 501 raw_spin_unlock_irqrestore(&current->pi_lock, flags);
549 502  
550   - if (first && owner != current) {
  503 + if (!owner)
  504 + return;
551 505  
  506 + if (first) {
  507 +
552 508 raw_spin_lock_irqsave(&owner->pi_lock, flags);
553 509  
554 510 plist_del(&waiter->pi_list_entry, &owner->pi_waiters);
555 511  
556 512  
... ... @@ -614,21 +570,19 @@
614 570 * or TASK_UNINTERRUPTIBLE)
615 571 * @timeout: the pre-initialized and started timer, or NULL for none
616 572 * @waiter: the pre-initialized rt_mutex_waiter
617   - * @detect_deadlock: passed to task_blocks_on_rt_mutex
618 573 *
619 574 * lock->wait_lock must be held by the caller.
620 575 */
621 576 static int __sched
622 577 __rt_mutex_slowlock(struct rt_mutex *lock, int state,
623 578 struct hrtimer_sleeper *timeout,
624   - struct rt_mutex_waiter *waiter,
625   - int detect_deadlock)
  579 + struct rt_mutex_waiter *waiter)
626 580 {
627 581 int ret = 0;
628 582  
629 583 for (;;) {
630 584 /* Try to acquire the lock: */
631   - if (try_to_take_rt_mutex(lock))
  585 + if (try_to_take_rt_mutex(lock, current, waiter))
632 586 break;
633 587  
634 588 /*
635 589  
... ... @@ -645,39 +599,11 @@
645 599 break;
646 600 }
647 601  
648   - /*
649   - * waiter->task is NULL the first time we come here and
650   - * when we have been woken up by the previous owner
651   - * but the lock got stolen by a higher prio task.
652   - */
653   - if (!waiter->task) {
654   - ret = task_blocks_on_rt_mutex(lock, waiter, current,
655   - detect_deadlock);
656   - /*
657   - * If we got woken up by the owner then start loop
658   - * all over without going into schedule to try
659   - * to get the lock now:
660   - */
661   - if (unlikely(!waiter->task)) {
662   - /*
663   - * Reset the return value. We might
664   - * have returned with -EDEADLK and the
665   - * owner released the lock while we
666   - * were walking the pi chain.
667   - */
668   - ret = 0;
669   - continue;
670   - }
671   - if (unlikely(ret))
672   - break;
673   - }
674   -
675 602 raw_spin_unlock(&lock->wait_lock);
676 603  
677 604 debug_rt_mutex_print_deadlock(waiter);
678 605  
679   - if (waiter->task)
680   - schedule_rt_mutex(lock);
  606 + schedule_rt_mutex(lock);
681 607  
682 608 raw_spin_lock(&lock->wait_lock);
683 609 set_current_state(state);
684 610  
... ... @@ -698,12 +624,11 @@
698 624 int ret = 0;
699 625  
700 626 debug_rt_mutex_init_waiter(&waiter);
701   - waiter.task = NULL;
702 627  
703 628 raw_spin_lock(&lock->wait_lock);
704 629  
705 630 /* Try to acquire the lock again: */
706   - if (try_to_take_rt_mutex(lock)) {
  631 + if (try_to_take_rt_mutex(lock, current, NULL)) {
707 632 raw_spin_unlock(&lock->wait_lock);
708 633 return 0;
709 634 }
710 635  
711 636  
... ... @@ -717,12 +642,14 @@
717 642 timeout->task = NULL;
718 643 }
719 644  
720   - ret = __rt_mutex_slowlock(lock, state, timeout, &waiter,
721   - detect_deadlock);
  645 + ret = task_blocks_on_rt_mutex(lock, &waiter, current, detect_deadlock);
722 646  
  647 + if (likely(!ret))
  648 + ret = __rt_mutex_slowlock(lock, state, timeout, &waiter);
  649 +
723 650 set_current_state(TASK_RUNNING);
724 651  
725   - if (unlikely(waiter.task))
  652 + if (unlikely(ret))
726 653 remove_waiter(lock, &waiter);
727 654  
728 655 /*
... ... @@ -737,14 +664,6 @@
737 664 if (unlikely(timeout))
738 665 hrtimer_cancel(&timeout->timer);
739 666  
740   - /*
741   - * Readjust priority, when we did not get the lock. We might
742   - * have been the pending owner and boosted. Since we did not
743   - * take the lock, the PI boost has to go.
744   - */
745   - if (unlikely(ret))
746   - rt_mutex_adjust_prio(current);
747   -
748 667 debug_rt_mutex_free_waiter(&waiter);
749 668  
750 669 return ret;
... ... @@ -762,7 +681,7 @@
762 681  
763 682 if (likely(rt_mutex_owner(lock) != current)) {
764 683  
765   - ret = try_to_take_rt_mutex(lock);
  684 + ret = try_to_take_rt_mutex(lock, current, NULL);
766 685 /*
767 686 * try_to_take_rt_mutex() sets the lock waiters
768 687 * bit unconditionally. Clean this up.
... ... @@ -992,7 +911,7 @@
992 911 {
993 912 __rt_mutex_init(lock, NULL);
994 913 debug_rt_mutex_proxy_lock(lock, proxy_owner);
995   - rt_mutex_set_owner(lock, proxy_owner, 0);
  914 + rt_mutex_set_owner(lock, proxy_owner);
996 915 rt_mutex_deadlock_account_lock(lock, proxy_owner);
997 916 }
998 917  
... ... @@ -1008,7 +927,7 @@
1008 927 struct task_struct *proxy_owner)
1009 928 {
1010 929 debug_rt_mutex_proxy_unlock(lock);
1011   - rt_mutex_set_owner(lock, NULL, 0);
  930 + rt_mutex_set_owner(lock, NULL);
1012 931 rt_mutex_deadlock_account_unlock(proxy_owner);
1013 932 }
1014 933  
1015 934  
1016 935  
... ... @@ -1034,20 +953,14 @@
1034 953  
1035 954 raw_spin_lock(&lock->wait_lock);
1036 955  
1037   - mark_rt_mutex_waiters(lock);
1038   -
1039   - if (!rt_mutex_owner(lock) || try_to_steal_lock(lock, task)) {
1040   - /* We got the lock for task. */
1041   - debug_rt_mutex_lock(lock);
1042   - rt_mutex_set_owner(lock, task, 0);
  956 + if (try_to_take_rt_mutex(lock, task, NULL)) {
1043 957 raw_spin_unlock(&lock->wait_lock);
1044   - rt_mutex_deadlock_account_lock(lock, task);
1045 958 return 1;
1046 959 }
1047 960  
1048 961 ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
1049 962  
1050   - if (ret && !waiter->task) {
  963 + if (ret && !rt_mutex_owner(lock)) {
1051 964 /*
1052 965 * Reset the return value. We might have
1053 966 * returned with -EDEADLK and the owner
... ... @@ -1056,6 +969,10 @@
1056 969 */
1057 970 ret = 0;
1058 971 }
  972 +
  973 + if (unlikely(ret))
  974 + remove_waiter(lock, waiter);
  975 +
1059 976 raw_spin_unlock(&lock->wait_lock);
1060 977  
1061 978 debug_rt_mutex_print_deadlock(waiter);
1062 979  
... ... @@ -1110,12 +1027,11 @@
1110 1027  
1111 1028 set_current_state(TASK_INTERRUPTIBLE);
1112 1029  
1113   - ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter,
1114   - detect_deadlock);
  1030 + ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
1115 1031  
1116 1032 set_current_state(TASK_RUNNING);
1117 1033  
1118   - if (unlikely(waiter->task))
  1034 + if (unlikely(ret))
1119 1035 remove_waiter(lock, waiter);
1120 1036  
1121 1037 /*
... ... @@ -1125,14 +1041,6 @@
1125 1041 fixup_rt_mutex_waiters(lock);
1126 1042  
1127 1043 raw_spin_unlock(&lock->wait_lock);
1128   -
1129   - /*
1130   - * Readjust priority, when we did not get the lock. We might have been
1131   - * the pending owner and boosted. Since we did not take the lock, the
1132   - * PI boost has to go.
1133   - */
1134   - if (unlikely(ret))
1135   - rt_mutex_adjust_prio(current);
1136 1044  
1137 1045 return ret;
1138 1046 }
kernel/rtmutex_common.h
... ... @@ -91,25 +91,13 @@
91 91 /*
92 92 * lock->owner state tracking:
93 93 */
94   -#define RT_MUTEX_OWNER_PENDING 1UL
95   -#define RT_MUTEX_HAS_WAITERS 2UL
96   -#define RT_MUTEX_OWNER_MASKALL 3UL
  94 +#define RT_MUTEX_HAS_WAITERS 1UL
  95 +#define RT_MUTEX_OWNER_MASKALL 1UL
97 96  
98 97 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
99 98 {
100 99 return (struct task_struct *)
101 100 ((unsigned long)lock->owner & ~RT_MUTEX_OWNER_MASKALL);
102   -}
103   -
104   -static inline struct task_struct *rt_mutex_real_owner(struct rt_mutex *lock)
105   -{
106   - return (struct task_struct *)
107   - ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
108   -}
109   -
110   -static inline unsigned long rt_mutex_owner_pending(struct rt_mutex *lock)
111   -{
112   - return (unsigned long)lock->owner & RT_MUTEX_OWNER_PENDING;
113 101 }
114 102  
115 103 /*