31 Oct, 2011

1 commit

  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.
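
    For illustration, a minimal sketch of the kind of file this targets
    (my_helper is a made-up example, not something from the patch):

    #include <linux/export.h>       /* provides EXPORT_SYMBOL */

    int my_helper(void)
    {
            return 0;
    }
    EXPORT_SYMBOL(my_helper);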

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Sep, 2011

1 commit

  • There is no reason to have the spin_lock protecting the semaphore
    preemptible on -rt. Annotate it as a raw_spinlock.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    (On -rt this also resolves a lockdep complaint about
    rt_mutex.wait_lock not being initialized.)
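
    Roughly, the shape of the change (a sketch, not the literal diff):

    struct semaphore {
            raw_spinlock_t          lock;           /* was: spinlock_t */
            unsigned int            count;
            struct list_head        wait_list;
    };

    void down(struct semaphore *sem)
    {
            unsigned long flags;

            raw_spin_lock_irqsave(&sem->lock, flags);  /* was: spin_lock_irqsave() */
            if (likely(sem->count > 0))
                    sem->count--;
            else
                    __down(sem);
            raw_spin_unlock_irqrestore(&sem->lock, flags);
    }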

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

06 Aug, 2008

1 commit

  • Change __down_common() to use signal_pending_state() instead of
    open-coding the check.

    The changes in kernel/semaphore.o are just artifacts; the state checks
    are optimized away.
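
    A sketch of the idea (simplified, not the exact hunk): the two
    open-coded checks in the wait loop,

            if (state == TASK_INTERRUPTIBLE && signal_pending(task))
                    goto interrupted;
            if (state == TASK_KILLABLE && fatal_signal_pending(task))
                    goto interrupted;

    collapse into the helper, which performs the same state-dependent test:

            if (signal_pending_state(state, task))
                    goto interrupted;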

    Signed-off-by: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

01 Jul, 2008

1 commit

  • The moment mmiotrace is enabled, I hit a NULL deref in:

    IP: [] __trace_special+0x17c/0x23a
    Call Trace:
    [] ftrace_special+0x6f/0x9a
    [] down+0x19/0x4a
    [] acquire_console_sem+0x42/0x58
    [] con_flush_chars+0x28/0x43
    [] write_chan+0x22e/0x334
    [] ? default_wake_function+0x0/0xf
    [] tty_write+0x195/0x228
    [] ? write_chan+0x0/0x334
    [] vfs_write+0xae/0x137
    [] sys_write+0x47/0x70
    [] system_call_after_swapgs+0x7b/0x80

    which means 'entry' in __trace_special() is NULL.

    [ mingo@elte.hu: that ftrace_special() was a leftover. ]

    Signed-off-by: Pekka Paalanen
    Cc: Steven Rostedt
    Cc: proski@gnu.org
    Cc: "Vegard Nossum"
    Signed-off-by: Ingo Molnar

    Pekka Paalanen
     

24 May, 2008

1 commit


11 May, 2008

1 commit

  • This reverts commit bf726eab3711cf192405d21688a4b21e07b6188a, as it has
    been reported to cause a regression with processes stuck in __down(),
    apparently because of a missing wakeup.

    Quoth Sven Wegener:
    "I'm currently investigating a regression that has showed up with my
    last git pull yesterday. Bisecting the commits showed bf726e
    "semaphore: fix" to be the culprit, reverting it fixed the issue.

    Symptoms: During heavy filesystem usage (e.g. a kernel compile) I get
    several compiler processes in uninterruptible sleep, blocking all i/o
    on the filesystem. System is an Intel Core 2 Quad running a 64bit
    kernel and userspace. Filesystem is xfs on top of lvm. See below for
    the output of sysrq-w."

    See

    http://lkml.org/lkml/2008/5/10/45

    for full report.

    In the meantime, we can just fix the BKL performance regression by
    reverting back to the good old BKL spinlock implementation instead,
    since any sleeping lock will generally perform badly, especially if it
    tries to be fair.

    Reported-by: Sven Wegener
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 May, 2008

1 commit

  • Yanmin Zhang reported:

    | Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more than 40%
    | regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton,
    | and Itanium Montecito. Bisect located the patch below:
    |
    | 64ac24e738823161693bf791f87adc802cf529ff is first bad commit
    | commit 64ac24e738823161693bf791f87adc802cf529ff
    | Author: Matthew Wilcox
    | Date: Fri Mar 7 21:55:58 2008 -0500
    |
    | Generic semaphore implementation
    |
    | After I manually reverted the patch against 2.6.26-rc1 while fixing
    | lots of conflicts/errors, aim7 regression became less than 2%.

    I reproduced the AIM7 workload and can confirm Yanmin's findings that
    v2.6.26-rc1 regresses over v2.6.25 - by over 67% here.

    Looking at the workload I found and fixed what I believe to be the real
    bug causing the AIM7 regression: it was inefficient wakeup / scheduling
    / locking behavior of the new generic semaphore code, causing suboptimal
    performance.

    The problem comes from the following code. The new semaphore code does
    this on down():

    spin_lock_irqsave(&sem->lock, flags);
    if (likely(sem->count > 0))
            sem->count--;
    else
            __down(sem);
    spin_unlock_irqrestore(&sem->lock, flags);

    and this on up():

    spin_lock_irqsave(&sem->lock, flags);
    if (likely(list_empty(&sem->wait_list)))
            sem->count++;
    else
            __up(sem);
    spin_unlock_irqrestore(&sem->lock, flags);

    where __up() does:

    list_del(&waiter->list);
    waiter->up = 1;
    wake_up_process(waiter->task);

    and where __down() does this in essence:

    list_add_tail(&waiter.list, &sem->wait_list);
    waiter.task = task;
    waiter.up = 0;
    for (;;) {
            [...]
            spin_unlock_irq(&sem->lock);
            timeout = schedule_timeout(timeout);
            spin_lock_irq(&sem->lock);
            if (waiter.up)
                    return 0;
    }

    The fastpath looks good and obvious, but note the following property of
    the contended path: if there's a task on the ->wait_list, the up() of
    the current owner will "pass over" ownership to that waiting task, in a
    wake-one manner, via the waiter->up flag and by removing the waiter from
    the wait list.

    That is all fine in principle, but as implemented in
    kernel/semaphore.c it also creates a nasty, hidden source of contention!

    The contention comes from the following property of the new semaphore
    code: the new owner owns the semaphore exclusively, even if it is not
    running yet.

    So if the old owner, even if just a few instructions later, does a
    down() [lock_kernel()] again, it will be blocked and will have to wait
    on the new owner to eventually be scheduled (possibly on another CPU)!
    Or if another task gets to lock_kernel() sooner than the "new owner" is
    scheduled, it will be blocked unnecessarily and for a very long time
    when there are 2000 tasks running.

    I.e. the implementation of the new semaphore code does wake-one and
    lock ownership in a very restrictive way - it does not allow
    opportunistic re-locking of the lock at all and keeps the scheduler from
    picking task order intelligently.

    This kind of scheduling, with 2000 AIM7 processes running, creates awful
    cross-scheduling between those 2000 tasks, causes reduced parallelism, a
    throttled runqueue length and a lot of idle time. With an increasing
    number of CPUs it causes exponentially worse behavior in AIM7, as the
    chance of a newly woken new-owner task actually running anytime soon
    gets smaller and smaller.

    Note that it takes just a tiny bit of contention for the 'new-semaphore
    catastrophe' to happen: the wakeup latencies get added to whatever small
    contention there is, and quickly snowball out of control!

    I believe Yanmin's findings and numbers support this analysis too.

    The best fix for this problem is to use the same scheduling logic that
    the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
    wanted because we do not want to over-schedule), but also allow
    opportunistic locking of the lock even if a wakee is already "in
    flight".

    The patch below implements this new logic. With this patch applied the
    AIM7 regression is largely fixed on my quad testbox:

    # v2.6.25 vanilla:
    ..................
    Tasks   Jobs/Min   JTI   Real    CPU   Jobs/sec/task
     2000    56096.4    91  207.5  789.7          0.4675
     2000    55894.4    94  208.2  792.7          0.4658

    # v2.6.26-rc1-166-gc0a1811 vanilla:
    ...................................
    Tasks   Jobs/Min   JTI   Real    CPU   Jobs/sec/task
     2000    33230.6    83  350.3  784.5          0.2769
     2000    31778.1    86  366.3  783.6          0.2648

    # v2.6.26-rc1-166-gc0a1811 + semaphore-speedup:
    ...............................................
    Tasks   Jobs/Min   JTI   Real    CPU   Jobs/sec/task
     2000    55707.1    92  209.0  795.6          0.4642
     2000    55704.4    96  209.0  796.0          0.4642

    i.e. a 67% speedup. We are now back to within 1% of the v2.6.25
    performance levels and have zero idle time during the test, as expected.

    Btw., interactivity also improved dramatically with the fix - for
    example console-switching became almost instantaneous during this
    workload (which after all is running 2000 tasks at once!); without the
    patch it was stuck for a minute at times.

    There's another nice side-effect of this speedup patch: the new generic
    semaphore code got even smaller:

    text  data  bss   dec  hex  filename
    1241     0    0  1241  4d9  semaphore.o.before
    1207     0    0  1207  4b7  semaphore.o.after

    (because the waiter.up complication got removed.)

    Longer-term we should look into using the mutex code for the generic
    semaphore code as well - but it's not easy due to legacies, and it's
    outside the scope of v2.6.26 and outside the scope of this patch as
    well.

    Bisected-by: "Zhang, Yanmin"
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

17 Apr, 2008

5 commits

  • Move documentation from semaphore.h to semaphore.c as requested by
    Andrew Morton. Also reformat to kernel-doc style and add some more
    notes about the implementation.
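
    For reference, kernel-doc comments take roughly this shape (an
    illustrative sketch, not the actual text that was moved):

    /**
     * down - acquire the semaphore
     * @sem: the semaphore to be acquired
     *
     * Acquires the semaphore.  If no more tasks are allowed to acquire
     * the semaphore, the calling task will sleep until it is released.
     */
    void down(struct semaphore *sem);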

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • By removing the negative values of 'count' and relying on the wait_list to
    indicate whether we have any waiters, we can simplify the implementation
    by removing the protection against an unlikely race condition. Thanks to
    David Howells for his suggestions.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • ACPI currently emulates a timeout for semaphores with calls to
    down_trylock and sleep. This produces horrible behaviour in terms of
    fairness and excessive wakeups. Now that we have a unified semaphore
    implementation, adding a real down_timeout is almost trivial.
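
    Roughly, the new primitive mirrors down() (a sketch of the shape, with
    the timeout given in jiffies):

    int down_timeout(struct semaphore *sem, long timeout)
    {
            unsigned long flags;
            int result = 0;

            spin_lock_irqsave(&sem->lock, flags);
            if (likely(sem->count > 0))
                    sem->count--;
            else
                    result = __down_timeout(sem, timeout);
            spin_unlock_irqrestore(&sem->lock, flags);

            return result;  /* 0 on success, -ETIME if the timeout expired */
    }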

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • down_killable() is the functional counterpart of mutex_lock_killable().
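
    Typical use looks like this (an illustrative sketch; my_sem is a
    made-up semaphore):

    if (down_killable(&my_sem))
            return -EINTR;          /* a fatal signal arrived while sleeping */
    /* ... critical section ... */
    up(&my_sem);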

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Semaphores are no longer performance-critical, so a generic C
    implementation is better for maintainability, debuggability and
    extensibility. Thanks to Peter Zijlstra for fixing the lockdep
    warning. Thanks to Harvey Harrison for pointing out that the
    unlikely() was unnecessary.

    Signed-off-by: Matthew Wilcox
    Acked-by: Ingo Molnar

    Matthew Wilcox