03 Apr, 2010

3 commits

  • This patch just states the fact that the cpusets/cpuhotplug interaction
    is broken and removes the deadlockable code which only pretends to work.

    - cpuset_lock() doesn't really work. It is needed for
    cpuset_cpus_allowed_locked() but we can't take this lock in
    try_to_wake_up()->select_fallback_rq() path.

    - cpuset_lock() is deadlockable. Suppose that a task T bound to a CPU takes
    callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex,
    stop_machine() preempts T, and then migration_call(CPU_DEAD) tries to take
    cpuset_lock() and hangs forever, because the CPU is already dead and thus
    T can't be scheduled.

    - cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
    which is not irq-safe, but try_to_wake_up() can be called from irq.

    Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
    we currently do without CONFIG_CPUSETS.

    Also, with or without this patch, with or without CONFIG_CPUSETS, the
    callers of select_fallback_rq() can race with each other or with
    set_cpus_allowed() paths.

    The subsequent patches try to fix these problems.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • This is left over from commit 7c9414385e ("sched: Remove USER_SCHED")

    Signed-off-by: Li Zefan
    Acked-by: Dhaval Giani
    Signed-off-by: Peter Zijlstra
    Cc: David Howells
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
  • Merge reason: update to latest upstream

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

30 Mar, 2010

6 commits


27 Mar, 2010

6 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1
    x86/PCI: for host bridge address space collisions, show conflicting resource
    frv/PCI: remove redundant warnings
    x86/PCI: remove redundant warnings
    PCI: don't say we claimed a resource if we failed
    PCI quirk: Disable MSI on VIA K8T890 systems
    PCI quirk: RS780/RS880: work around missing MSI initialization
    PCI quirk: only apply CX700 PCI bus parking quirk if external VT6212L is present
    PCI: complain about devices that seem to be broken
    PCI: print resources consistently with %pR
    PCI: make disabled window printk style match the enabled ones
    PCI: break out primary/secondary/subordinate for readability
    PCI: for address space collisions, show conflicting resource
    resources: add interfaces that return conflict information
    PCI: cleanup error return for pcix get and set mmrbc functions
    PCI: fix access of PCI_X_CMD by pcix get and set mmrbc functions
    PCI: kill off pci_register_set_vga_state() symbol export.
    PCI: fix return value from pcix_get_max_mmrbc()

    Linus Torvalds
     
  • Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    time: Fix accumulation bug triggered by long delay.
    posix-cpu-timers: Reset expire cache when no timer is running
    timer stats: Fix del_timer_sync() and try_to_del_timer_sync()
    clockevents: Sanitize min_delta_ns adjustment and prevent overflows

    Linus Torvalds
     
  • Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align

    Linus Torvalds
     
  • Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Use proper type in sched_getaffinity()
    kernel/sched.c: Suppress unused var warning
    sched: sched_getaffinity(): Allow less than NR_CPUS length

    Linus Torvalds
     
  • Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Move two IRQ functions from .init.text to .text
    genirq: Protect access to irq_desc->action in can_request_irq()
    genirq: Prevent oneshot irq thread race

    Linus Torvalds
     
  • Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Remove excessive early_res debug output
    softlockup: Stop spurious softlockup messages due to overflow
    rcu: Fix local_irq_disable() CONFIG_PROVE_RCU=y false positives
    rcu: Fix tracepoints & lockdep false positive
    rcu: Make rcu_read_lock_bh_held() allow for disabled BH

    Linus Torvalds
     

25 Mar, 2010

3 commits


24 Mar, 2010

3 commits


23 Mar, 2010

1 commit

  • The logarithmic accumulation done in the timekeeping has some overflow
    protection that limits the max shift value. That means it will take
    more than shift loops to accumulate all of the cycles. This causes
    the shift decrement to underflow, which causes the loop to never exit.

    The simplest fix would be to simply do:
    if (shift)
            shift--;

    However that is not optimal. Since we know the cycle offset is larger
    than interval << shift, the above would make shift drop to zero, and
    then we would be spinning for quite a while, accumulating one interval
    chunk at a time.

    Instead, this patch only decreases shift if the offset is smaller
    than cycle_interval << shift. This makes sure we accumulate using
    the largest chunks possible without overflowing tick_length, and limits
    the number of iterations through the loop.
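
    A minimal, self-contained sketch of that rule (plain C with made-up
    values, not the actual kernel/time/timekeeping.c code): accumulate in
    interval << shift chunks, and only shrink shift once a full chunk no
    longer fits in the remaining offset.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint64_t cycle_interval = 1000;   /* cycles per tick (made up) */
            uint64_t offset = 9000000;        /* backlog after a long delay */
            unsigned int shift = 8;           /* capped by overflow protection */

            while (offset >= cycle_interval) {
                    uint64_t chunk = cycle_interval << shift;

                    if (offset >= chunk)
                            offset -= chunk;  /* accumulate one large chunk */
                    else if (shift > 0)
                            shift--;          /* shrink only when a chunk no longer fits */
            }
            printf("leftover: %llu cycles, final shift %u\n",
                   (unsigned long long)offset, shift);
            return 0;
    }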

    This issue was found and reported by Sonic Zhang, who also tested the fix.
    Many thanks for your explanation and testing!

    Reported-by: Sonic Zhang
    Signed-off-by: John Stultz
    Tested-by: Sonic Zhang
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    John Stultz
     

22 Mar, 2010

1 commit


19 Mar, 2010

2 commits

  • The ring buffer uses 4 byte alignment while recording events into the
    buffer, even on 64bit machines. This saves space when there are lots
    of events being recorded at 4 byte boundaries.

    The ring buffer has a zero copy method to write into the buffer: space
    is reserved and then committed. This may cause problems when an 8 byte
    word is written at a 4 byte (rather than 8 byte) alignment. For x86 and
    PPC this is not an issue, but on some architectures this would cause an
    out-of-alignment exception.

    This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
    if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
    it forces the ring buffer event header to be 8 bytes and not 4,
    and will align the length of the data to be 8 byte aligned.
    This keeps the data payload at 8 byte alignments and will allow these
    machines to run without issue.

    The trick to this is that the header can be either 4 bytes or 8 bytes
    depending on the length of the data payload. The 4 byte header
    has a length field that supports up to 112 bytes. If the length of
    the data is more than 112, the length field is set to zero, and the actual
    length is stored in the next 4 bytes after the header.

    When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
    a zero in the 4 byte header, so the length is always stored in the
    following 4 bytes, even for a small data load. It also forces the length
    of the data load to be 8 byte aligned. The combination of these two
    guarantees that the data is always at an 8 byte alignment.
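
    A hedged sketch of that header-size rule (macro and function names here
    are illustrative, not the actual ring buffer code; the 112 byte limit is
    taken from the text above):

    #include <stdio.h>
    #include <stddef.h>

    #ifdef HAVE_EFFICIENT_UNALIGNED_ACCESS  /* stand-in for the kernel option */
    # define RB_ALIGN     4                 /* events may be packed at 4 bytes */
    # define RB_FORCE_EXT 0                 /* short lengths can stay in the header */
    #else
    # define RB_ALIGN     8                 /* keep 64-bit payloads naturally aligned */
    # define RB_FORCE_EXT 1                 /* always spill the length after the header */
    #endif

    /* Round the payload length up to the required alignment. */
    static inline size_t rb_align_length(size_t len)
    {
            return (len + RB_ALIGN - 1) & ~(size_t)(RB_ALIGN - 1);
    }

    /* 4 byte header if the length fits and short headers are allowed;
     * otherwise the in-header length is zeroed and the real length goes
     * in the following 4 bytes, giving an 8 byte header. */
    static inline size_t rb_header_size(size_t len)
    {
            return (!RB_FORCE_EXT && len <= 112) ? 4 : 8;
    }

    int main(void)
    {
            printf("len 40:  header %zu, padded payload %zu\n",
                   rb_header_size(40), rb_align_length(40));
            printf("len 200: header %zu, padded payload %zu\n",
                   rb_header_size(200), rb_align_length(200));
            return 0;
    }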

    Tested-by: Frederic Weisbecker
    (on sparc64)
    Reported-by: Frederic Weisbecker
    Acked-by: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
    perf: Fix unexported generic perf_arch_fetch_caller_regs
    perf record: Don't try to find buildids in a zero sized file
    perf: export perf_trace_regs and perf_arch_fetch_caller_regs
    perf, x86: Fix hw_perf_enable() event assignment
    perf, ppc: Fix compile error due to new cpu notifiers
    perf: Make the install relative to DESTDIR if specified
    kprobes: Calculate the index correctly when freeing the out-of-line execution slot
    perf tools: Fix sparse CPU numbering related bugs
    perf_event: Fix oops triggered by cpu offline/online
    perf: Drop the obsolete profile naming for trace events
    perf: Take a hot regs snapshot for trace events
    perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
    perf/x86-64: Use frame pointer to walk on irq and process stacks
    lockdep: Move lock events under lockdep recursion protection
    perf report: Print the map table just after samples for which no map was found
    perf report: Add multiple event support
    perf session: Change perf_session post processing functions to take histogram tree
    perf session: Add storage for seperating event types in report
    perf session: Change add_hist_entry to take the tree root instead of session
    perf record: Add ID and to recorded event data when recording multiple events
    ...

    Linus Torvalds
     

17 Mar, 2010

2 commits

    perf_arch_fetch_caller_regs() is exported for the overridden x86
    version, but not for the generic weak version.

    As a general rule, weak functions should not have their symbol
    exported in the same file in which they are defined.

    So let's export it in trace_event_perf.c, as it is used by trace
    events only.

    This fixes:

    ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
    ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!

    -v2: And also only build it if trace events are enabled.
    -v3: Fix changelog mistake
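
    For readers unfamiliar with the pattern, a minimal stand-alone
    illustration of a weak default definition (the function name is made up,
    and the in-kernel export itself is done with the EXPORT_SYMBOL family of
    macros from the using file, which a userspace sketch cannot show):

    #include <stdio.h>

    /* Weak default: used only if no strong definition is linked in; an
     * architecture-specific object file could silently override it. */
    __attribute__((weak)) void fetch_caller_regs_demo(void)
    {
            printf("generic weak default\n");
    }

    int main(void)
    {
            fetch_caller_regs_demo();
            return 0;
    }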

    Reported-by: Stephen Rothwell
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Xiao Guangrong
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Using the proper type fixes the following compiler warning:

    kernel/sched.c:4850: warning: comparison of distinct pointer types lacks a cast
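
    For context, a self-contained illustration of one common way this warning
    arises: the type check inside the kernel's min() macro. The macros below
    are simplified clones of the ones in <linux/kernel.h>, and the variables
    are made up, not the actual sched.c code.

    #include <stdio.h>
    #include <stddef.h>

    #define min(x, y) ({                                           \
            typeof(x) _min1 = (x);                                 \
            typeof(y) _min2 = (y);                                 \
            (void) (&_min1 == &_min2); /* warns if types differ */ \
            _min1 < _min2 ? _min1 : _min2; })

    #define min_t(type, x, y) ({                                   \
            type _min1 = (x);                                      \
            type _min2 = (y);                                      \
            _min1 < _min2 ? _min1 : _min2; })

    int main(void)
    {
            size_t len = 16;
            unsigned int limit = 32;

            /* min(len, limit) would trigger "comparison of distinct pointer
             * types lacks a cast"; forcing a common type avoids it. */
            printf("%zu\n", min_t(size_t, len, limit));
            return 0;
    }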

    Signed-off-by: KOSAKI Motohiro
    Cc: torvalds@linux-foundation.org
    Cc: travis@sgi.com
    Cc: peterz@infradead.org
    Cc: drepper@redhat.com
    Cc: rja@sgi.com
    Cc: sharyath@in.ibm.com
    Cc: steiner@sgi.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

16 Mar, 2010

3 commits

  • On UP:

    kernel/sched.c: In function 'wake_up_new_task':
    kernel/sched.c:2631: warning: unused variable 'cpu'

    Signed-off-by: Andrew Morton
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Andrew Morton
     
  • This was left over from "7c9414385e sched: Remove USER_SCHED"

    Signed-off-by: Dan Carpenter
    Acked-by: Dhaval Giani
    Cc: Kay Sievers
    Cc: Greg Kroah-Hartman
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dan Carpenter
     
  • Disabling BH can stand in for rcu_read_lock_bh(), and this patch
    updates rcu_read_lock_bh_held() to allow for this. In order to
    avoid include-file hell, this function is moved out of line to
    kernel/rcupdate.c.

    This fixes a false positive RCU warning.

    Reported-by: Arnd Bergmann
    Reported-by: Eric Dumazet
    Signed-off-by: Paul E. McKenney
    Acked-by: Lai Jiangshan
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

15 Mar, 2010

1 commit

  • [ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ]

    Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons.
    Unfortunately, the glibc sched interface has the following definition:

    # define __CPU_SETSIZE 1024
    # define __NCPUBITS (8 * sizeof (__cpu_mask))
    typedef unsigned long int __cpu_mask;
    typedef struct
    {
            __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
    } cpu_set_t;

    That means that if NR_CPUS is bigger than 1024, cpu_set_t creates an
    ABI issue ...

    More recently, Sharyathi Nagesh reported that the following test program
    triggers a mysterious syscall failure:

    -----------------------------------------------------------------------
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <errno.h>

    int main()
    {
            cpu_set_t set;
            if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0)
                    printf("\n Call is failing with:%d", errno);
    }
    -----------------------------------------------------------------------

    This is because the kernel assumes that the len argument of
    sched_getaffinity() is bigger than NR_CPUS, which is no longer
    a valid assumption.

    Now we are faced with the following annoying dilemma, due to
    the limitations of the glibc interface built in years ago:

    (1) if we change glibc's __CPU_SETSIZE definition, we lose
    binary compatibility for _all_ applications.

    (2) if we don't change it, we lose binary compatibility for
    Sharyathi's use case.

    I therefore propose to change the rule for the len argument of
    sched_getaffinity().

    Old:
    len should be bigger than NR_CPUS
    New:
    len should be bigger than the maximum possible cpu id

    This creates the following behavior:

    (A) On a real 4096-CPU machine, the above test program still
    returns -EINVAL.

    (B) With NR_CPUS=4096 but fewer than 1024 CPUs in the machine (almost
    all machines in the world), the above runs successfully.

    Fortunately, big SGI machines are mainly used for HPC, which means
    their users can rebuild their programs.

    IOW, we hope they are not annoyed by this issue ...
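
    As an illustration of living with either rule from userspace (a hedged
    sketch using glibc's CPU_ALLOC helpers, not part of the patch), a
    program can simply grow its cpu mask until the kernel accepts the
    length:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <errno.h>

    int main(void)
    {
            int ncpus = 64;                  /* initial guess, grown on EINVAL */

            for (;;) {
                    cpu_set_t *set = CPU_ALLOC(ncpus);
                    size_t size = CPU_ALLOC_SIZE(ncpus);

                    if (!set)
                            return 1;
                    if (sched_getaffinity(0, size, set) == 0) {
                            printf("affinity mask fits in %zu bytes\n", size);
                            CPU_FREE(set);
                            return 0;
                    }
                    CPU_FREE(set);
                    if (errno != EINVAL)     /* a real failure, not a short mask */
                            return 1;
                    ncpus *= 2;              /* mask too small for this kernel */
            }
    }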

    Reported-by: Sharyathi Nagesh
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Ulrich Drepper
    Acked-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Jack Steiner
    Cc: Russ Anderson
    Cc: Mike Travis
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

14 Mar, 2010

4 commits

  • Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix pick_next_highest_task_rt() for cgroups
    sched: Cleanup: remove unused variable in try_to_wake_up()
    x86: Fix sched_clock_cpu for systems with unsynchronized TSC

    Linus Torvalds
     
  • Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    locking: Make sparse work with inline spinlocks and rwlocks
    x86/mce: Fix RCU lockdep splats
    rcu: Increase RCU CPU stall timeouts if PROVE_RCU
    ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
    rcu: Suppress RCU lockdep warnings during early boot
    rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
    rcu: Suppress __mpol_dup() false positive from RCU lockdep
    rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
    rcu: Add control variables to lockdep_rcu_dereference() diagnostics
    rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
    rcu: Use wrapper function instead of exporting tasklist_lock
    sched, rcu: Fix rcu_dereference() for RCU-lockdep
    rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
    rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
    x86/gart: Unexport gart_iommu_aperture

    Fix trivial conflicts in kernel/trace/ftrace.c

    Linus Torvalds
     
  • Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Do not record user stack trace from NMI context
    tracing: Disable buffer switching when starting or stopping trace
    tracing: Use same local variable when resetting the ring buffer
    function-graph: Init curr_ret_stack with ret_stack
    ring-buffer: Move disabled check into preempt disable section
    function-graph: Add tracing_thresh support to function_graph tracer
    tracing: Update the comm field in the right variable in update_max_tr
    function-graph: Use comment notation for func names of dangling '}'
    function-graph: Fix unused reference to ftrace_set_func()
    tracing: Fix warning in s_next of trace file ops
    tracing: Include irqflags headers from trace clock

    Linus Torvalds
     
  • Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Provide generic perf_sample_data initialization
    MAINTAINERS: Add Arnaldo as tools/perf/ co-maintainer
    perf trace: Don't use pager if scripting
    perf trace/scripting: Remove extraneous header read
    perf, ARM: Modify kuser rmb() call to compile for Thumb-2
    x86/stacktrace: Don't dereference bad frame pointers
    perf archive: Don't try to collect files without a build-id
    perf_events, x86: Fixup fixed counter constraints
    perf, x86: Restrict the ANY flag
    perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
    perf, x86: add some IBS macros to perf_event.h
    perf, x86: make IBS macros available in perf_event.h
    hw-breakpoints: Remove stub unthrottle callback
    x86/hw-breakpoints: Remove the name field
    perf: Remove pointless breakpoint union
    perf lock: Drop the buffers multiplexing dependency
    perf lock: Fix and add misc documentally things
    percpu: Add __percpu sparse annotations to hw_breakpoint

    Linus Torvalds
     

13 Mar, 2010

5 commits

  • A bug was found with Li Zefan's ftrace_stress_test that caused applications
    to segfault during the test.

    Placing a tracing_off() in the segfault code, and examining several
    traces, I found that the following was always the case. The lock tracer
    was enabled (lockdep being required) and userstack was enabled. Testing
    this out, I just enabled the two, but that was not good enough. I needed
    to run something else that could trigger it. Running a load like hackbench
    did not work, but executing a new program would. The following would
    trigger the segfault within seconds:

    # echo 1 > /debug/tracing/options/userstacktrace
    # echo 1 > /debug/tracing/events/lock/enable
    # while :; do ls > /dev/null ; done

    Enabling the function graph tracer and looking at what was happening,
    I finally noticed that all crashes happened just after an NMI.

    1) | copy_user_handle_tail() {
    1) | bad_area_nosemaphore() {
    1) | __bad_area_nosemaphore() {
    1) | no_context() {
    1) | fixup_exception() {
    1) 0.319 us | search_exception_tables();
    1) 0.873 us | }
    [...]
    1) 0.314 us | __rcu_read_unlock();
    1) 0.325 us | native_apic_mem_write();
    1) 0.943 us | }
    1) 0.304 us | rcu_nmi_exit();
    [...]
    1) 0.479 us | find_vma();
    1) | bad_area() {
    1) | __bad_area() {

    After capturing several traces of failures, all of them happened
    after an NMI. Curious about this, I added a trace_printk() to the NMI
    handler to read the regs->ip to see where the NMI happened, and I
    found out it was here:

    ffffffff8135b660 :
    ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp
    ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800

    What was happening is that the NMI would happen at the place that a page
    fault occurred. It would call rcu_read_lock() which was traced by
    the lock events, and the user_stack_trace would run. This would trigger
    a page fault inside the NMI. I do not see where the CR2 register is
    saved or restored in NMI handling. This means that it would corrupt
    the page fault handling that the NMI interrupted.

    The reason the while loop of ls helped trigger the bug was that
    each execution of ls would cause lots of pages to be faulted in,
    increasing the chances of the race happening.

    The simple solution is to not allow user stack traces in NMI context.
    After this patch, I ran the above "ls" test for a couple of hours
    without any issues. Without this patch, the bug would trigger in less
    than a minute.

    Cc: stable@kernel.org
    Reported-by: Li Zefan
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    When the trace iterator is read, tracing_start() and tracing_stop()
    are called to stop tracing while the iterator is processing the trace
    output.

    These functions disable both the standard buffer and the max latency
    buffer. But if the wakeup tracer is running, it can switch these
    buffers between the two disables:

    buffer = global_trace.buffer;
    if (buffer)
            ring_buffer_record_disable(buffer);

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    The ftrace code that resets the ring buffer references the buffer
    with a local variable, but then uses tr->buffer as the parameter to
    the reset. If the wakeup tracer is running, which can swap tr->buffer
    with the max saved buffer, this can break the requirement of disabling
    the buffer before the reset.

    buffer = tr->buffer;
    ring_buffer_record_disable(buffer);
    synchronize_sched();
    __tracing_reset(tr->buffer, cpu);

    If the tr->buffer is swapped, then the reset is not happening to the
    buffer that was disabled. This will cause the ring buffer to fail.
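
    The general shape of the fix, as a stand-alone sketch (plain C with
    made-up names, not the ftrace code): load the shared pointer once and
    use that same local for both the disable and the reset, so a concurrent
    swap cannot make the two steps act on different buffers.

    #include <stdatomic.h>
    #include <stdio.h>

    struct buf { int disabled; };

    static struct buf buf_a, buf_b;
    static _Atomic(struct buf *) current_buf = &buf_a;  /* may be swapped to &buf_b */

    void reset_buffer(void)
    {
            struct buf *b = atomic_load(&current_buf);   /* read the pointer once */

            b->disabled = 1;    /* disable ... */
            /* Racy shape (what the commit removes): re-reading current_buf
             * for the reset could hit the other buffer if a swap happened
             * after the disable:
             *     atomic_load(&current_buf)->disabled = 0;
             */
            b->disabled = 0;    /* ... and reset the same buffer we disabled */
    }

    int main(void)
    {
            reset_buffer();
            printf("buf_a.disabled=%d buf_b.disabled=%d\n",
                   buf_a.disabled, buf_b.disabled);
            return 0;
    }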

    Found with Li Zefan's ftrace_stress_test.

    Cc: stable@kernel.org
    Reported-by: Lai Jiangshan
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    If the graph tracer is active and a task is forked but the allocation of
    the process's graph stack fails, it can cause a crash later on.

    This is due to the temporary stack being NULL, but the curr_ret_stack
    variable is copied from the parent. If it is not -1, then in
    ftrace_graph_probe_sched_switch() the following:

    for (index = next->curr_ret_stack; index >= 0; index--)
            next->ret_stack[index].calltime += timestamp;

    will cause a kernel OOPS.

    Found with Li Zefan's ftrace_stress_test.

    Cc: stable@kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The ring buffer resizing and resetting relies on a schedule RCU
    action. The buffers are disabled, a synchronize_sched() is called
    and then the resize or reset takes place.

    But this only works if the disabling of the buffers is done within the
    preempt-disabled section; otherwise there is a window in which the
    buffers can be written to while a reset or resize takes place.

    Cc: stable@kernel.org
    Reported-by: Li Zefan
    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan