Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

03 Nov, 2009

8 commits

38dc63459 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Remove some debug messages producing too much noise
PM: Fix warning on suspend errors
PM / Hibernate: Add newline to load_image() fail path
PM / Hibernate: Fix error handling in save_image()
PM / Hibernate: Fix blkdev refleaks
PM / yenta: Split resume into early and late parts (rev. 4)

Linus Torvalds
2009-11-03 23:52:57 +0800
1d5107509 Correct nr_processes() when CPUs have been unplugged

nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.

Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.

The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
stat->nlink = proc_root.nlink + nr_processes();

You don't have to be all that unlucky for nr_processes() to return a
negative value, leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens, things like
"ls /proc" can start to fail because they get -EOVERFLOW from some
stat() call.

Example with some debugging inserted to show what goes on:
# ps haux|wc -l
nr_processes: CPU0: 90
nr_processes: CPU1: 1030
nr_processes: CPU2: -900
nr_processes: CPU3: -136
nr_processes: TOTAL: 84
proc_root_getattr. nlink 12 + nr_processes() 84 = 96
84
# echo 0 >/sys/devices/system/cpu/cpu1/online
# ps haux|wc -l
nr_processes: CPU0: 85
nr_processes: CPU2: -901
nr_processes: CPU3: -137
nr_processes: TOTAL: -953
proc_root_getattr. nlink 12 + nr_processes() -953 = -941
75
# stat /proc/
nr_processes: CPU0: 84
nr_processes: CPU2: -901
nr_processes: CPU3: -137
nr_processes: TOTAL: -954
proc_root_getattr. nlink 12 + nr_processes() -954 = -942
File: `/proc/'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 3h/3d Inode: 1 Links: 4294966354
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2009-11-03 09:06:55.000000000 +0000
Modify: 2009-11-03 09:06:55.000000000 +0000
Change: 2009-11-03 09:06:55.000000000 +0000

I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().

This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.

Signed-off-by: Ian Campbell
Cc: Andrew Morton
Cc: Rusty Russell
Signed-off-by: Linus Torvalds

Ian Campbell
2009-11-03 23:52:39 +0800
bf9fd67a0 PM / Hibernate: Add newline to load_image() fail path

Finish a line by \n when load_image fails in the middle of loading.

Signed-off-by: Jiri Slaby
Acked-by: Pavel Machek
Signed-off-by: Rafael J. Wysocki

Jiri Slaby
2009-11-03 18:03:09 +0800
4ff277f9e PM / Hibernate: Fix error handling in save_image()

There are too many retval variables in save_image(). Thus error return
value from snapshot_read_next() may be ignored and only part of the
snapshot (successfully) written.

Remove 'error' variable, invert the condition in the do-while loop
and convert the loop to use only 'ret' variable.

Switch the rest of the function to consider only 'ret'.

Also make sure we end printed line by \n if an error occurs.

Signed-off-by: Jiri Slaby
Acked-by: Pavel Machek
Signed-off-by: Rafael J. Wysocki

Jiri Slaby
2009-11-03 18:02:43 +0800
76b57e613 PM / Hibernate: Fix blkdev refleaks

While cruising through the swsusp code I found a few blkdev reference
leaks of resume_bdev.

swsusp_read: remove blkdev_put altogether. Some fail paths do
    not do that.
swsusp_check: make sure we always put a reference on fail paths.
software_resume: all fail paths between swsusp_check and swsusp_read
    omit swsusp_close. Add it in those cases. And since
    swsusp_read doesn't drop the reference anymore, do
    it here unconditionally.

[rjw: Fixed a small coding style issue.]

Signed-off-by: Jiri Slaby
Signed-off-by: Rafael J. Wysocki

Jiri Slaby
2009-11-03 18:01:46 +0800
3fe866ca6 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix spurious wakeup for requeue_pi really

Linus Torvalds
2009-11-03 01:46:33 +0800
bce8fc4cb Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Remove -Wcast-align
perf tools: Fix compatibility with libelf 0.8 and autodetect
perf events: Don't generate events for the idle task when exclude_idle is set
perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers

Linus Torvalds
2009-11-03 01:46:06 +0800
a5e3013d6 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove cpu arg from the rb_time_stamp() function
tracing: Fix comment typo and documentation example
tracing: Fix trace_seq_printf() return value
tracing: Update *ppos instead of filp->f_pos

Linus Torvalds
2009-11-03 01:45:44 +0800

30 Oct, 2009

2 commits

8633322c5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
sched: move rq_weight data array out of .percpu
percpu: allow pcpu_alloc() to be called with IRQs off

Linus Torvalds
2009-10-30 00:19:29 +0800
9532faeb2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-param-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-param-fixes:
param: fix setting arrays of bool
param: fix NULL comparison on oom
param: fix lots of bugs with writing charp params from sysfs, by leaking mem.

Linus Torvalds
2009-10-30 00:18:20 +0800

29 Oct, 2009

11 commits

3242f9804 Merge branch 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6

* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
HWPOISON: fix invalid page count in printk output
HWPOISON: Allow schedule_on_each_cpu() from keventd
HWPOISON: fix /proc/meminfo alignment
HWPOISON: fix oops on ksm pages
HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
HWPOISON: return early on non-LRU pages
HWPOISON: Add brief hwpoison description to Documentation
HWPOISON: Clean up PR_MCE_KILL interface

Linus Torvalds
2009-10-29 23:20:00 +0800
fefcfd431 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Move drop_futex_key_refs out of spinlock'ed region
rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang
rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
rcu: Prevent RCU IPI storms in presence of high call_rcu() load
futex: Check for NULL keys in match_futex
futex: Handle spurious wake up

Linus Torvalds
2009-10-29 23:12:20 +0800
37c2ca241 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf timechart: Improve the visual appearance of scheduler delays
perf timechart: Fix the wakeup-arrows that point to non-visible processes
perf top: Fix --delay_secs 0 division by zero
perf tools: Bump version to 0.0.2
perf_event: Adjust frequency and unthrottle for non-group-leader events

Linus Torvalds
2009-10-29 23:12:00 +0800
6e958d73c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Do less agressive buddy clearing
sched: Disable SD_PREFER_LOCAL for MC/CPU domains

Linus Torvalds
2009-10-29 23:10:38 +0800
8c85dd873 sysctl: fix false positives when PROC_SYSCTL=n

Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408

Signed-off-by: Alexey Dobriyan
Reported-by: Peter Teoh
Cc: "Eric W. Biederman"
Cc: "Rafael J. Wysocki"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-10-29 22:39:30 +0800
478988d3b cgroup: fix strstrip() misuse

cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip(). This causes slightly inconsistent behavior.

example:
=========================
# cd /mnt/cgroup/hoge
# cat memory.swappiness
60
# echo "59 " > memory.swappiness
# cat memory.swappiness
59
# echo " 58" > memory.swappiness
bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan
Acked-by: Paul Menage
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-10-29 22:39:25 +0800
0d0df599f connector: fix regression introduced by sid connector

Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([] 0x13fe04100)
[] sk_filter+0x9a/0xd0
[] netlink_broadcast+0x2c0/0x53c
[] cn_netlink_send+0x272/0x2b0
[] proc_sid_connector+0xc4/0xd4
[] __set_special_pids+0x58/0x90
[] sys_setsid+0xb4/0xd8
[] sysc_noemu+0x10/0x16
[] 0x41616cb266

The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger
Cc: Scott James Remnant
Cc: Matt Helsley
Cc: David S. Miller
Acked-by: Oleg Nesterov
Acked-by: Evgeniy Polyakov
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christian Borntraeger
2009-10-29 22:39:25 +0800
3c7d76e37 param: fix setting arrays of bool

We create a dummy struct kernel_param on the stack for parsing each
array element, but we didn't initialize the flags word. This matters
for arrays of type "bool", where the flag indicates if it really is
an array of bools or unsigned int (old-style).

Reported-by: Takashi Iwai
Signed-off-by: Rusty Russell
Cc: stable@kernel.org

Rusty Russell
2009-10-29 06:26:20 +0800
d553ad864 param: fix NULL comparison on oom

kp->arg is always true: it's the contents of that pointer we care about.

Reported-by: Takashi Iwai
Signed-off-by: Rusty Russell
Cc: stable@kernel.org

Rusty Russell
2009-10-29 06:26:18 +0800
65afac7d8 param: fix lots of bugs with writing charp params from sysfs, by leaking mem.

e180a6b7759a "param: fix charp parameters set via sysfs" fixed the case
where charp parameters written via sysfs were freed, leaving drivers
accessing random memory.

Unfortunately, storing a flag in the kparam struct was a bad idea: it's
rodata so setting it causes an oops on some archs. But that's not all:

1) module_param_array() on charp doesn't work reliably, since we use an
uninitialized temporary struct kernel_param.
2) there's a fundamental race if a module uses this parameter and then
it's changed: they will still access the old, freed, memory.

The simplest fix (ie. for 2.6.32) is to never free the memory. This
prevents all these problems, at cost of a memory leak. In practice, there
are only 18 places where a charp is writable via sysfs, and all are
root-only writable.

Reported-by: Takashi Iwai
Cc: Sitsofe Wheeler
Cc: Frederic Weisbecker
Cc: Christof Schmitt
Signed-off-by: Rusty Russell
Cc: stable@kernel.org

Rusty Russell
2009-10-29 06:26:17 +0800
11df6dddc futex: Fix spurious wakeup for requeue_pi really

The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test), nor does it use the wake_list of futex_wake(), which were
the reasons for commit 41890f2 (futex: Handle spurious wake up).

See the debugging discussion on LKML, Message-ID:

The changes in this fix to the wait_requeue_pi path were considered to
be a likely unnecessary, but harmless safety net. But it turns out that
due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
as EAGAIN we built an endless loop in the code path which correctly
returns EWOULDBLOCK.

Spurious wakeups in wait_requeue_pi code path are unlikely so we do
the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
it deal with the spurious wakeup.

Cc: Darren Hart
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: John Stultz
Cc: Dinakar Guniguntala
LKML-Reference:
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2009-10-29 03:34:34 +0800

28 Oct, 2009

1 commit

4a6cc4bd3 sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced the percpu array update_shares_data, whose size
is proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configurations, as ia64 allows only 64k for the .percpu section.

Fix this by allocating this array dynamically and keeping only a pointer
to it percpu.

The per-cpu handling doesn't impose a significant performance penalty on
the potentially contended path in tg_shares_up().

Before:

...
ffffffff8104337c: 65 48 8b 14 25 20 cd mov %gs:0xcd20,%rdx
ffffffff81043383: 00 00
ffffffff81043385: 48 c7 c0 00 e1 00 00 mov $0xe100,%rax
ffffffff8104338c: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff81043393: 00
ffffffff81043394: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff8104339b: 00
ffffffff8104339c: 48 01 d0 add %rdx,%rax
ffffffff8104339f: 49 8d 94 24 08 01 00 lea 0x108(%r12),%rdx
ffffffff810433a6: 00
ffffffff810433a7: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433ac: 48 89 45 b0 mov %rax,-0x50(%rbp)
ffffffff810433b0: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433b5: 48 89 55 c0 mov %rdx,-0x40(%rbp)
...

After:

...
ffffffff8104337c: 65 8b 04 25 28 cd 00 mov %gs:0xcd28,%eax
ffffffff81043383: 00
ffffffff81043384: 48 98 cltq
ffffffff81043386: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi
ffffffff8104338d: 00
ffffffff8104338e: 48 8b 15 d3 7f 76 00 mov 0x767fd3(%rip),%rdx # ffffffff817ab368
ffffffff81043395: 48 8b 34 c5 00 ee 6d mov -0x7e921200(,%rax,8),%rsi
ffffffff8104339c: 81
ffffffff8104339d: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp)
ffffffff810433a4: 00
ffffffff810433a5: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff810433aa: 48 89 7d c0 mov %rdi,-0x40(%rbp)
ffffffff810433ae: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp)
ffffffff810433b5: 00
ffffffff810433b6: bb 00 04 00 00 mov $0x400,%ebx
ffffffff810433bb: 48 01 f2 add %rsi,%rdx
ffffffff810433be: 48 89 55 b0 mov %rdx,-0x50(%rbp)
...

Signed-off-by: Jiri Kosina
Acked-by: Ingo Molnar
Signed-off-by: Tejun Heo

Jiri Kosina
2009-10-28 23:26:00 +0800

24 Oct, 2009

4 commits

6d3f1e12f tracing: Remove cpu arg from the rb_time_stamp() function

The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.

Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Jiri Olsa
2009-10-24 17:07:51 +0800
67b394f7f tracing: Fix comment typo and documentation example

Trivial patch to fix a documentation example and to fix a
comment.

Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Jiri Olsa
2009-10-24 17:07:50 +0800
3e69533b5 tracing: Fix trace_seq_printf() return value

trace_seq_printf() return value is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written. This is not
adequate here, because:

trace_seq_printf(s, "");

is perfectly legal, and returning 0 would indicate that it
failed.

We can always see the amount written by looking at the before
and after values of s->len. This is not quite the same use as
printf. We only care if the string was successfully written to
the buffer or not.

Make trace_seq_printf() return 0 if the trace oversizes the
buffer's free space, 1 otherwise.

Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Jiri Olsa
2009-10-24 17:07:50 +0800
cf8517cf9 tracing: Update *ppos instead of filp->f_pos

Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().

Signed-off-by: Jiri Olsa
Signed-off-by: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Jiri Olsa
2009-10-24 17:07:49 +0800

23 Oct, 2009

2 commits

54f440760 perf events: Don't generate events for the idle task when exclude_idle is set

Getting samples for the idle task is often not interesting, so
don't generate them when exclude_idle is set for the event in
question.

Signed-off-by: Søren Sandmann Pedersen
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar

Soeren Sandmann
2009-10-23 15:35:02 +0800
721a669b7 perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers

Make the hrtimer based events work for sysprof.

Whenever a swevent is scheduled out, the hrtimer is canceled.
When it is scheduled back in, the timer is restarted. This
happens every scheduler tick, which means the timer never
expired because it was getting repeatedly restarted over and
over with the same period.

To fix that, save the remaining time when disabling; when
reenabling, use that saved time as the period instead of the
user-specified sampling period.

Also, move the starting and stopping of the hrtimers to helper
functions instead of duplicating the code.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Soeren Sandmann
2009-10-23 15:35:02 +0800

22 Oct, 2009

1 commit

04bf7539c PM: Make warning in suspend_test_finish() less likely to happen

Increase TEST_SUSPEND_SECONDS to 10 so the warning in
suspend_test_finish() doesn't annoy the users of slower systems so much.

Also, make the warning print the suspend-resume cycle time, so that we
know why the warning actually triggered.

Patch prepared during the hacking session at the Kernel Summit in Tokyo.

Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2009-10-22 07:23:45 +0800

19 Oct, 2009

1 commit

65a644643 HWPOISON: Allow schedule_on_each_cpu() from keventd

Right now when calling schedule_on_each_cpu() from keventd there
is a deadlock because it tries to schedule a work item on the current CPU
too. This happens via lru_add_drain_all() in hwpoison.

Just call the function for the current CPU in this case. This is actually
faster too.

Debugging with Fengguang Wu & Max Asbock

Signed-off-by: Andi Kleen

Andi Kleen
2009-10-19 13:29:22 +0800

16 Oct, 2009

3 commits

89061d3d5 futex: Move drop_futex_key_refs out of spinlock'ed region

When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.

Dropping the reference may ultimately lead to a call to
"iput_final" and subsequently call into filesystem-specific code,
which may be non-atomic.

It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.

Originally-From: Helge Bahmann
Signed-off-by: Darren Hart
Cc:
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: Dinakar Guniguntala
Cc: John Stultz
Cc: Sven-Thorsten Dietrich
Cc: John Kacur
LKML-Reference:
Signed-off-by: Ingo Molnar

Darren Hart
2009-10-16 16:19:18 +0800
bd0704111 Merge the right tty-fixes branch

* branch 'tty-fixes'
tty: use the new 'flush_delayed_work()' helper to do ldisc flush
workqueue: add 'flush_delayed_work()' to run and wait for delayed work
tty: Make flush_to_ldisc() locking more robust

Linus Torvalds
2009-10-16 05:59:24 +0800
237c80c5c rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang

If the following sequence of events occurs, then
TREE_PREEMPT_RCU will hang waiting for a grace period to
complete, eventually OOMing the system:

o A TREE_PREEMPT_RCU build of the kernel is booted on a system
with more than 64 physical CPUs present (32 on a 32-bit system).
Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
with RCU_FANOUT set to a sufficiently small value that the
physical CPUs populate two or more leaf rcu_node structures.

o A task is preempted in an RCU read-side critical section
while running on a CPU corresponding to a given leaf rcu_node
structure.

o All CPUs corresponding to this same leaf rcu_node structure
record quiescent states for the current grace period.

o All of these same CPUs go offline (hence the need for enough
physical CPUs to populate more than one leaf rcu_node structure).
This causes the preempted task to be moved to the root rcu_node
structure.

At this point, there is nothing left to cause the quiescent
state to be propagated up the rcu_node tree, so the current
grace period never completes.

The simplest fix, especially after considering the deadlock
possibilities, is to detect this situation when the last CPU is
offlined, and to set that CPU's ->qsmask bit in its leaf
rcu_node structure. This will cause the next invocation of
force_quiescent_state() to end the grace period.

Without this fix, this hang can be triggered in an hour or so on
some machines with rcutorture and random CPU onlining/offlining.
With this fix, these same machines pass a full 10 hours of this
sort of abuse.

Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul E. McKenney
2009-10-16 02:33:01 +0800

15 Oct, 2009

7 commits

019129d59 rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU

For the short term, map synchronize_rcu_expedited() to
synchronize_rcu() for TREE_PREEMPT_RCU and to
synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably
more complex and intrusive.

Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul E. McKenney
2009-10-15 17:17:17 +0800
37c72e56f rcu: Prevent RCU IPI storms in presence of high call_rcu() load

As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only every blimit number of callbacks
(defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.

Reported-by: Nick Piggin
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Paul E. McKenney
2009-10-15 17:17:16 +0800
d6047d79b Merge branch 'tty-fixes'

* branch 'tty-fixes':
tty: use the new 'flush_delayed_work()' helper to do ldisc flush
workqueue: add 'flush_delayed_work()' to run and wait for delayed work
Make flush_to_ldisc properly handle parallel calls

Linus Torvalds
2009-10-15 06:34:55 +0800
ee67e6cbe Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
oprofile: warn on freeing event buffer too early
oprofile: fix race condition in event_buffer free
lockdep: Use cpu_clock() for lockstat

Linus Torvalds
2009-10-15 06:25:35 +0800
f061d83a2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix missing kernel-doc notation
Revert "x86, timers: Check for pending timers after (device) interrupts"
sched: Update the clock of runqueue select_task_rq() selected

Linus Torvalds
2009-10-15 06:25:04 +0800
e345fe1ad Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing/filters: Fix memory leak when setting a filter
tracing: fix trace_vprintk call

Linus Torvalds
2009-10-15 06:24:51 +0800
8c53e4631 workqueue: add 'flush_delayed_work()' to run and wait for delayed work

It basically turns a delayed work into an immediate work, and then waits
for it to finish, thus allowing you to force (and wait for) an immediate
flush of a delayed work.

We'll want to use this in the tty layer to clean up tty_flush_to_ldisc().

Acked-by: Oleg Nesterov
[ Fixed to use 'del_timer_sync()' as noted by Oleg ]
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-10-15 06:11:35 +0800