Eric Lee / smarc-fsl-linux-kernel

13 Oct, 2010

1 commit

d01343244 ring-buffer: Fix typo of time extends per page

Time stamps for the ring buffer are created by the difference between
two events. Each page of the ring buffer holds a full 64 bit timestamp.
Each event has a 27 bit delta stamp from the last event. The unit of time
is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
happen more than 134 milliseconds apart, a time extend is inserted
to add more bits for the delta. The time extend has 59 bits, which
is good for ~18 years.
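
The two capacity figures above follow directly from the bit widths. A minimal sketch of the arithmetic, assuming only the 27-bit and 59-bit widths and the nanosecond unit given in the message (the helper name is hypothetical):

```python
# Sketch only: the bit widths (27 and 59) and the nanosecond unit come from
# the commit message; the helper name is made up for illustration.
NSEC_PER_SEC = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def max_delta_seconds(bits: int) -> float:
    """Largest span, in seconds, that a `bits`-wide nanosecond delta can hold."""
    return (2 ** bits) / NSEC_PER_SEC


delta_ms = max_delta_seconds(27) * 1000                   # about 134 ms
extend_years = max_delta_seconds(59) / SECONDS_PER_YEAR   # about 18 years
print(f"27-bit delta: ~{delta_ms:.0f} ms")
print(f"59-bit time extend: ~{extend_years:.1f} years")
```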

Currently the time extend is committed separately from the event.
If an event is discarded before it is committed, due to filtering,
the time extend still exists. If all events are being filtered, then
after ~134 milliseconds a new time extend will be added to the buffer.

This can only happen till the end of the page. Since each page holds
a full timestamp, there is no reason to add a time extend to the
beginning of a page. Time extends can only fill a page that has actual
data at the beginning, so there is no fear that time extends will fill
more than a page without any data.

When reading an event, a loop is made to skip over time extends
since they are only used to maintain the time stamp and are never
given to the caller. As a paranoid check to prevent the loop running
forever, with the knowledge that time extends may only fill a page,
a check is made that tests the iteration of the loop, and if the
iteration is more than the number of time extends that can fit in a page
a warning is printed and the ring buffer is disabled (all of ftrace
is also disabled with it).
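
The skip loop and its paranoid check can be pictured roughly as follows. This is a toy model and not the kernel implementation: events are plain tuples, and the limit of 510 is an assumed illustrative value.

```python
# Toy model, not the kernel implementation: events are (kind, payload) tuples
# and TIME_EXTENDS_PER_PAGE is an assumed limit chosen for illustration.
TIME_EXTENDS_PER_PAGE = 510


def next_data_event(events, limit=TIME_EXTENDS_PER_PAGE):
    """Return (timestamp, payload) for the first event that is not a time extend.

    Raises RuntimeError if more time extends are seen than fit in a page,
    mirroring the check that warns and disables the ring buffer.
    """
    timestamp = 0
    skipped = 0
    for kind, payload in events:
        if kind == "time_extend":
            skipped += 1
            if skipped > limit:
                raise RuntimeError("too many time extends: ring buffer disabled")
            timestamp += payload      # time extends only advance the clock
            continue
        return timestamp, payload     # a real event is handed to the caller
    return timestamp, None            # ran out of events
```

If the limit is set too low, as it was with the typo, a page legitimately full of time extends trips the check even though nothing is wrong.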

There is another event type that is called a TIMESTAMP which can
hold 64 bits of data in the theoretical case that two events happen
18 years apart. This code has not been implemented, but the name
of this event exists, as well as the structure for it. The
size of a TIMESTAMP is 16 bytes, whereas a time extend is only
8 bytes. The macro used to calculate how many time extends can fit on
a page used the TIMESTAMP size instead of the time extend size,
cutting the amount in half.
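
To see the effect of the wrong size, here is a small sketch of the division. The 16-byte and 8-byte sizes are from the message; the usable page size of 4080 bytes (a 4096-byte page minus an assumed 16-byte header) and all names are assumptions for illustration.

```python
# Assumptions: a 4096-byte page with a 16-byte page header (4080 usable bytes)
# is illustrative; the 16- and 8-byte sizes are from the commit message.
BUF_PAGE_SIZE = 4096 - 16
TIMESTAMP_SIZE = 16               # bytes, the unimplemented TIMESTAMP event
TIME_EXTEND_SIZE = 8              # bytes, a time extend
MAX_DELTA_SEC = 2 ** 27 / 1e9     # ~0.134 s between forced time extends

buggy_limit = BUF_PAGE_SIZE // TIMESTAMP_SIZE     # what the typo computed
fixed_limit = BUF_PAGE_SIZE // TIME_EXTEND_SIZE   # what actually fits

print(buggy_limit, fixed_limit)                   # 255 510
print(f"false warning after ~{buggy_limit * MAX_DELTA_SEC:.0f} s of filtered events")
```

Under these assumptions the halved limit is reached after roughly 34 seconds of fully filtered tracing, which is in line with the estimate of about 35 seconds given in the message.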

The following test case can easily trigger the warning since we only
need to have half the page filled with time extends to trigger the
warning:

# cd /sys/kernel/debug/tracing/
# echo function > current_tracer
# echo 'common_pid < 0' > events/ftrace/function/filter
# echo > trace
# echo 1 > trace_marker
# sleep 120
# cat trace

This enables the function tracer, sets the filter to trace only
functions where the process id is negative (no events), clears
the trace buffer to ensure that we have nothing in the buffer,
writes to trace_marker to add an event to the beginning of a page,
sleeps for 2 minutes (only 35 seconds is probably needed, but this
guarantees the bug), and finally reads the trace, which
triggers the bug.

This patch fixes the typo and prevents the false positive of that warning.

Reported-by: Hans J. Koch
Tested-by: Hans J. Koch
Cc: Thomas Gleixner
Cc: Stable Kernel
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-10-13 00:06:43 +0800

10 Sep, 2010

3 commits

f2955b490 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
perf symbols: Fix multiple initialization of symbol system
perf: Fix CPU hotplug
perf, trace: Fix module leak
tracing/kprobe: Fix handling of C-unlike argument names
tracing/kprobes: Fix handling of argument names
perf probe: Fix handling of arguments names
perf probe: Fix return probe support
tracing/kprobe: Fix a memory leak in error case
tracing: Do not allow llseek to set_ftrace_filter

Linus Torvalds
2010-09-10 22:31:24 +0800
df0916255 tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread

Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash. This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer. This causes a NULL ptr deref with possible security
implications. Tracked as CVE-2010-3079.

Cc: Robert Swiecki
Cc: Eugene Teo
Cc:
Signed-off-by: Chris Wright
Signed-off-by: Steven Rostedt

Chris Wright
2010-09-10 10:43:49 +0800
9cb627d5f perf, trace: Fix module leak

Commit 1c024eca (perf, trace: Optimize tracepoints by using
per-tracepoint-per-cpu hlist to track events) caused a module
refcount leak.

Reported-And-Tested-by: Avi Kivity
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Li Zefan
2010-09-10 02:38:51 +0800

09 Sep, 2010

2 commits

79637a41e Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
gcc-4.6: kernel/*: Fix unused but set warnings
mutex: Fix annotations to include it in kernel-locking docbook
pid: make setpgid() system call use RCU read-side critical section
MAINTAINERS: Add RCU's public git tree

Linus Torvalds
2010-09-09 02:13:42 +0800
9c55cb12c tracing: Do not allow llseek to set_ftrace_filter

Reading the file set_ftrace_filter does three things.

1) shows whether or not filters are set for the function tracer
2) shows what functions are set for the function tracer
3) shows what triggers are set on any functions

3 is independent from 1 and 2.

The way this file currently works is that it is a state machine,
and as you read it, it may change state. But this assumption breaks
when you use lseek() on the file. The state machine gets out of sync
and the t_show() may use the wrong pointer and cause a kernel oops.

Luckily, this will only kill the app that does the lseek, but the app
dies while holding a mutex. This prevents anyone else from using the
set_ftrace_filter file (or any other function tracing file for that matter).

A real fix for this is to rewrite the code, but that is too much for
a -rc release or stable. This patch simply disables llseek on the
set_ftrace_filter file for now, and we can do the proper fix for the
next major release.

Reported-by: Robert Swiecki
Cc: Chris Wright
Cc: Tavis Ormandy
Cc: Eugene Teo
Cc: vendor-sec@lst.de
Cc:
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-09-09 00:08:01 +0800

08 Sep, 2010

3 commits

da34634fd tracing/kprobe: Fix handling of C-unlike argument names

Check whether the argument name is invalid (not a C-like symbol name). This
keeps the event format simple.
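
As a rough illustration of such a check, the sketch below treats a name as "C-like" if it starts with a letter or underscore and continues with letters, digits, or underscores. That definition and the function name are assumptions; the kernel's actual check may differ.

```python
import re

# Assumption: "C-like" means a letter or underscore followed by letters,
# digits, or underscores. The kernel's actual check may differ.
_C_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c_like_name(name: str) -> bool:
    """Return True if `name` looks like a C identifier."""
    return _C_NAME.fullmatch(name) is not None


for candidate in ("IP", "arg1", "%ax", "$stack", "1abc"):
    print(candidate, is_c_like_name(candidate))
```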

Reported-by: Srikar Dronamraju
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
LKML-Reference:
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo

Masami Hiramatsu
2010-09-08 22:47:19 +0800
aba91595c tracing/kprobes: Fix handling of argument names

Automatically set an "argN" name for each argument that has no specified name.
Since dynamic trace events (kprobe_events) accept special characters in their
arguments, the event format can show those special characters (e.g. '$', '%', '+').
However, perf can't parse such formats because these characters (especially
'%') corrupt the format. This sets an "argN" name for those arguments if the user
omitted the argument names.

E.g.
# echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events
# cat tracing/kprobe_events
p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack
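
The naming rule in the example above can be sketched as below. Detecting an existing name by looking for '=' is a simplifying assumption, and the function name is hypothetical.

```python
def name_arguments(args):
    """Give every argument a name, defaulting to argN by position.

    Assumption: an argument already has a name if it contains '='.
    """
    named = []
    for position, arg in enumerate(args, start=1):
        if "=" in arg:
            named.append(arg)                       # user supplied NAME=fetch
        else:
            named.append(f"arg{position}={arg}")    # fall back to argN
    return named


print(" ".join(name_arguments(["%ax", "IP=%ip", "$stack"])))
# prints: arg1=%ax IP=%ip arg3=$stack
```

Note that N is the argument's position, so the third argument becomes arg3 even though only one unnamed argument precedes it.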

Reported-by: Srikar Dronamraju
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
LKML-Reference:
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo

Masami Hiramatsu
2010-09-08 22:47:19 +0800
61a527362 tracing/kprobe: Fix a memory leak in error case

Fix a memory leak which happens when a field name conflicts with others. In
the error case, free_trace_probe() frees all arguments up to nr_args, so this
increments nr_args at the beginning of the loop instead of the end.
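
Why the position of the increment matters can be shown with a toy model. Here a list stands in for allocated memory and the slice past nr_args stands in for what the cleanup routine never reaches; none of this is the kernel code.

```python
# Toy model of the leak: `allocated` stands in for heap memory, and the
# cleanup (free_trace_probe() in the message) only reaches the first
# nr_args entries.
def leaked_on_conflict(names, increment_first):
    """Return the entries that cleanup would miss if a name conflict occurs."""
    allocated, nr_args, seen = [], 0, set()
    for name in names:
        if increment_first:
            nr_args += 1                  # fixed order: count the slot up front
        allocated.append(name)            # "allocate" the argument
        if name in seen:                  # field name conflict: error path
            return allocated[nr_args:]    # entries cleanup never reaches
        seen.add(name)
        if not increment_first:
            nr_args += 1                  # old order: count only after success
    return []


print(leaked_on_conflict(["a", "b", "a"], increment_first=False))  # ['a']
print(leaked_on_conflict(["a", "b", "a"], increment_first=True))   # []
```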

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Mathieu Desnoyers
LKML-Reference:
Signed-off-by: Masami Hiramatsu
Signed-off-by: Arnaldo Carvalho de Melo

Masami Hiramatsu
2010-09-08 22:47:18 +0800

05 Sep, 2010

1 commit

b3bd3de66 gcc-4.6: kernel/*: Fix unused but set warnings

No real bugs I believe, just some dead code.

Signed-off-by: Andi Kleen
Cc: Peter Zijlstra
Cc: andi@firstfloor.org
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

Andi Kleen
2010-09-05 20:36:58 +0800

01 Sep, 2010

1 commit

3aaba20f2 tracing: Fix a race in function profile

While we are reading trace_stat/functionX and someone just
disabled function_profile at that time, we can trigger this:

divide error: 0000 [#1] PREEMPT SMP
...
EIP is at function_stat_show+0x90/0x230
...

This fix just takes the ftrace_profile_lock and checks if
rec->counter is 0. If it's 0, we know the profile buffer
has been reset.
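
The shape of the fix, take the lock and then bail out on a zero counter before dividing, can be sketched as follows. The lock object and the dict-based record are hypothetical stand-ins for ftrace_profile_lock and rec.

```python
import threading

# Hypothetical stand-ins: `profile_lock` models ftrace_profile_lock and `rec`
# is a dict with "counter" and "time" fields; this is not the kernel code.
profile_lock = threading.Lock()


def average_time(rec):
    """Return the average time per call, or None if the profile was reset."""
    with profile_lock:
        if rec["counter"] == 0:       # buffer was reset underneath us
            return None
        return rec["time"] // rec["counter"]


print(average_time({"counter": 4, "time": 100}))  # 25
print(average_time({"counter": 0, "time": 0}))    # None
```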

Signed-off-by: Li Zefan
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Steven Rostedt

Li Zefan
2010-09-01 04:46:23 +0800

25 Aug, 2010

1 commit

151772dbf tracing/trace_stack: Fix stack trace on ppc64

save_stack_trace() stores the instruction pointer, not the
function descriptor. On ppc64 the trace stack code currently
dereferences the instruction pointer and shows 8 bytes of
instructions in our backtraces:

# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (26 entries)
        -----    ----   --------
  0)     5424     112   0x6000000048000004
  1)     5312     160   0x60000000ebad01b0
  2)     5152     160   0x2c23000041c20030
  3)     4992     240   0x600000007c781b79
  4)     4752     160   0xe84100284800000c
  5)     4592     192   0x600000002fa30000
  6)     4400     256   0x7f1800347b7407e0
  7)     4144     208   0xe89f0108f87f0070
  8)     3936     272   0xe84100282fa30000

Since we aren't dealing with function descriptors, use %pS
instead of %pF to fix it:

# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (26 entries)
        -----    ----   --------
  0)     5424     112   ftrace_call+0x4/0x8
  1)     5312     160   .current_io_context+0x28/0x74
  2)     5152     160   .get_io_context+0x48/0xa0
  3)     4992     240   .cfq_set_request+0x94/0x4c4
  4)     4752     160   .elv_set_request+0x60/0x84
  5)     4592     192   .get_request+0x2d4/0x468
  6)     4400     256   .get_request_wait+0x7c/0x258
  7)     4144     208   .__make_request+0x49c/0x610
  8)     3936     272   .generic_make_request+0x390/0x434

Signed-off-by: Anton Blanchard
Cc: rostedt@goodmis.org
Cc: fweisbec@gmail.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Anton Blanchard
2010-08-25 19:08:48 +0800

16 Aug, 2010

1 commit

d244b6bd4 Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into trace/tip/perf/urgent-4

Conflicts:
kernel/trace/trace_events.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt
2010-08-16 23:17:30 +0800

14 Aug, 2010

1 commit

1aa54bca6 tracing: Sanitize value returned from write(trace_marker, "...", len)

When userspace code writes a string that is not newline-terminated to the
trace_marker file, the write handler appends a newline and returns the number
of bytes written to the trace buffer, so
write(fd, "abc", 3) will return 4

That's unexpected and unfortunately it confuses glibc's fprintf function.

Example:
#include <stdio.h>

int main(void) {
	fprintf(stderr, "abc");
	return 0;
}

$ gcc test.c -o test
$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
$ ./test 2>/sys/kernel/debug/tracing/trace_marker

results in an infinite loop:
write(fd, "abc", 3) = 4
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
(...)

...and kernel trace buffer full of empty markers.

Fix it by sanitizing write return value.
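
One plausible way to sanitize the return value is to clamp it to the requested length. The model below is a sketch under that assumption, with a hypothetical function standing in for the write handler:

```python
def marker_write(user_data: bytes, sanitize: bool = True) -> int:
    """Model a trace_marker write; return the byte count reported to the caller."""
    # The handler appends a newline when the caller did not supply one.
    stored = user_data if user_data.endswith(b"\n") else user_data + b"\n"
    written = len(stored)             # bytes that landed in the trace buffer
    if sanitize and written > len(user_data):
        written = len(user_data)      # never report more than was requested
    return written


print(marker_write(b"abc", sanitize=False))  # 4, the surprising old behaviour
print(marker_write(b"abc"))                  # 3
```

A caller that sees write() return more than it asked for has no sane way to advance its buffer, which is why clamping removes the retry loop shown above.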

Signed-off-by: Marcin Slusarz
LKML-Reference:
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: Steven Rostedt

Marcin Slusarz
2010-08-14 03:23:16 +0800

13 Aug, 2010

1 commit

2a37a3df5 tracing/events: Convert format output to seq_file

Two new events were added that broke the current format output.

Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout

The reason is that their print_fmt exceeded a page size. Since the output
of the format used simple_read_from_buffer and trace_seq, it was limited
to a page size in output.

This patch converts the printing of the format of an event into seq_file,
which allows output larger than a page to be shown.

I diffed all event formats comparing the output with and without this
patch. All matched except for the above two, which showed just:

FORMAT TOO BIG

without this patch, but now display their output properly with this patch.

v2: Remove updating *pos in seq start function.
[ Thanks to Li Zefan for pointing that out ]

Reviewed-by: Li Zefan
Cc: Martin K. Petersen
Cc: Kei Tokunaga
Cc: James Bottomley
Cc: Tomohiro Kusumi
Cc: Xiao Guangrong
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-08-13 04:59:29 +0800

12 Aug, 2010

1 commit

8d57a98cc block: add secure discard

Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.

Signed-off-by: Adrian Hunter
Acked-by: Jens Axboe
Cc: Kyungmin Park
Cc: Madhusudhan Chikkature
Cc: Christoph Hellwig
Cc: Ben Gardiner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Hunter
2010-08-12 23:43:30 +0800

11 Aug, 2010

1 commit

2f9e825d3 Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.

Linus Torvalds
2010-08-11 06:22:42 +0800

08 Aug, 2010

5 commits

78417334b Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing

* 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
do_coredump: Do not take BKL
init: Remove the BKL from startup code

Linus Torvalds
2010-08-08 08:06:54 +0800
3b7433b8a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c

Linus Torvalds
2010-08-08 03:42:58 +0800
62c2a7d96 block: push BKL into blktrace ioctls

The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.

It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.

Signed-off-by: Arnd Bergmann
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Arnd Bergmann
2010-08-08 00:26:08 +0800
7b6d91dae block: unify flags for struct bio and struct request

Remove the current bio flags and reuse the request flags for the bio, too.
This makes it easier to trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superfluous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:20:39 +0800
33659ebba block: remove wrappers for request type/flags

Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct request. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:17:56 +0800

07 Aug, 2010

5 commits

b62ad9ab1 Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um: Fix read_persistent_clock fallout
kgdb: Do not access xtime directly
powerpc: Clean up obsolete code relating to decrementer and timebase
powerpc: Rework VDSO gettimeofday to prevent time going backwards
clocksource: Add __clocksource_updatefreq_hz/khz methods
x86: Convert common clocksources to use clocksource_register_hz/khz
timekeeping: Make xtime and wall_to_monotonic static
hrtimer: Cleanup direct access to wall_to_monotonic
um: Convert to use read_persistent_clock
timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
powerpc: Cleanup xtime usage
powerpc: Simplify update_vsyscall
time: Kill off CONFIG_GENERIC_TIME
time: Implement timespec_add
x86: Fix vtime/file timestamp inconsistencies

Trivial conflicts in Documentation/feature-removal-schedule.txt

Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
per Thomas' earlier merge commit 47916be4e28c ("Merge branch
'powerpc.cherry-picks' into timers/clocksource")

Linus Torvalds
2010-08-07 04:18:29 +0800
18fab912d tracing: Fix ring_buffer_read_page reading out of page boundary

With the configuration: CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch:

[PATCH]x86: make spurious_fault check correct pte bit

Function call graph trace with the following will trigger a page fault.

# cd /sys/kernel/debug/tracing/
# echo function_graph > current_tracer
# cat per_cpu/cpu1/trace_pipe_raw > /dev/null

BUG: unable to handle kernel paging request at ffff880006e99000
IP: [] rb_event_length+0x1/0x3f
PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/net/lo/operstate
CPU 1
Modules linked in:

Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
RIP: 0010:[] [] rb_event_length+0x1/0x3f
RSP: 0018:ffff880006475e38 EFLAGS: 00010006
RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
FS: 00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
Stack:
ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
Call Trace:
[] ? ring_buffer_read_page+0x15a/0x24a
[] ? return_to_handler+0x15/0x2f
[] tracing_buffers_read+0xb9/0x164
[] vfs_read+0xaf/0x150
[] return_to_handler+0x0/0x2f
[] __bad_area_nosemaphore+0x17e/0x1a1
[] return_to_handler+0x0/0x2f
[] bad_area_nosemaphore+0x13/0x15
Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
RIP [] rb_event_length+0x1/0x3f
RSP
CR2: ffff880006e99000
---[ end trace a6877bb92ccb36bb ]---

The root cause is that ring_buffer_read_page() may read beyond the page
boundary, because the boundary checking is done after reading. This is
fixed by doing the boundary checking before reading.
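
The ordering issue can be shown with a toy page in which each event starts with a one-byte length. This layout and the names are illustrative assumptions, not the ring buffer's real format.

```python
# Toy model: each event on the page starts with a one-byte total length and
# `commit` is the number of valid bytes. This is not the real event layout.
def read_events(page: bytes, commit: int):
    """Collect events, validating the position before every read."""
    events, pos = [], 0
    while pos < commit:               # boundary check happens before the read
        size = page[pos]              # safe: pos is known to be inside the page
        if size == 0:                 # defensive stop for a malformed toy page
            break
        events.append(page[pos:pos + size])
        pos += size
    return events


page = bytes([2, 0xAA, 3, 0xBB, 0xCC])
print(read_events(page, commit=len(page)))
```

A variant that read `page[pos]` first and compared `pos` against `commit` afterwards would index one byte past the end on the final iteration, which is the analogue of touching the unmapped page in the oops above.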

Reported-by: Shaohua Li
Cc:
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Steven Rostedt

Huang Ying
2010-08-07 02:34:45 +0800
c4efd6b56 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
sched: No need for bootmem special cases
sched: Revert nohz_ratelimit() for now
sched: Reduce update_group_power() calls
sched: Update rq->clock for nohz balanced cpus
sched: Fix spelling of sibling
sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
sched: thread_group_cputime: Simplify, document the "alive" check
sched: Remove the obsolete exit_state/signal hacks
sched: task_tick_rt: Remove the obsolete ->signal != NULL check
sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
sched: Fix comments to make them DocBook happy
sched: Fix fix_small_capacity
powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
powerpc: Enable asymmetric SMT scheduling on POWER7
sched: Add asymmetric group packing option for sibling domain
sched: Fix capacity calculations for SMT4
sched: Change nohz idle load balancing logic to push model
...

Linus Torvalds
2010-08-07 00:39:22 +0800
4aed2fd8e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c

Linus Torvalds
2010-08-07 00:30:52 +0800
575570f02 tracing: Fix an unallocated memory access in function_graph

With CONFIG_DEBUG_PAGEALLOC, I observed an unallocated memory access in
the function_graph tracer. It appears we find a small entry in the ring buffer,
but we access it as a large entry. The access runs past the end of the page
and touches an unallocated page.

Cc:
Signed-off-by: Shaohua Li
LKML-Reference:
[ Added a comment to explain the problem - SDR ]
Signed-off-by: Steven Rostedt

Shaohua Li
2010-08-07 00:19:15 +0800

05 Aug, 2010

2 commits

19063c776 ftrace,kdb: Allow dumping a specific cpu's buffer with ftdump

In systems with more than one processor it is desirable to look at the
per cpu trace buffers.

Signed-off-by: Jason Wessel
Acked-by: Steven Rostedt
CC: Frederic Weisbecker

Jason Wessel
2010-08-05 22:22:23 +0800
955b61e59 ftrace,kdb: Extend kdb to be able to dump the ftrace buffer

Add in a helper function to allow the kdb shell to dump the ftrace
buffer.

Modify trace.c to expose the capability to iterate over the ftrace
buffer in a read only capacity.

Signed-off-by: Jason Wessel
Acked-by: Steven Rostedt
CC: Frederic Weisbecker

Jason Wessel
2010-08-05 22:22:23 +0800

04 Aug, 2010

1 commit

9da79ab83 tracing/kprobes: unregister_trace_probe needs to be called under mutex

The comment in unregister_trace_probe() says probe_lock will be held when it
gets called. However, there is a case where it might be called without
probe_lock being held. Also, since we are traversing the probe_list and
deleting an element from it, probe_lock should be held.

This was first pointed out in the uprobes traceevent review by Frederic
Weisbecker (http://lkml.org/lkml/2010/5/12/106).

Cc: Ingo Molnar
Cc: Masami Hiramatsu
Acked-by: Masami Hiramatsu
Acked-by: Steven Rostedt
LKML-Reference:
Signed-off-by: Srikar Dronamraju
Signed-off-by: Arnaldo Carvalho de Melo

Srikar Dronamraju
2010-08-04 23:41:23 +0800

02 Aug, 2010

1 commit

669336e4c perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call

We use synchronize_sched() to ensure a tracepoint won't be called
while/after we release the perf buffers it references.

But the tracepoint API has its own helper for that:
tracepoint_synchronize_unregister(). Use it instead as it's
self-explanatory and eases maintenance.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: Li Zefan

Frederic Weisbecker
2010-08-02 07:30:56 +0800

27 Jul, 2010

1 commit

592913ecb time: Kill off CONFIG_GENERIC_TIME

Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.

Signed-off-by: John Stultz
LKML-Reference:
Signed-off-by: Thomas Gleixner

John Stultz
2010-07-27 18:40:54 +0800

23 Jul, 2010

2 commits

3a01736e7 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

Ingo Molnar
2010-07-23 15:10:29 +0800
24a461d53 trace: strlen() return doesn't account for the NULL

We need to add one to the strlen() return value because of the terminating
NUL character. The type->name here generally comes from the kernel and I
don't think any of them come close to being MAX_TRACER_SIZE (100)
characters long, so this is basically a cleanup.

Signed-off-by: Dan Carpenter
LKML-Reference:
Signed-off-by: Steven Rostedt

Dan Carpenter
2010-07-23 02:56:41 +0800

22 Jul, 2010

1 commit

dca45ad8a Merge branch 'linus' into sched/core

Merge reason: Move from the -rc3 to the almost-rc6 base.

Signed-off-by: Ingo Molnar

Ingo Molnar
2010-07-22 03:45:08 +0800

21 Jul, 2010

4 commits

ef710e100 tracing: Shrink max latency ringbuffer if unnecessary

Documentation/trace/ftrace.txt says

buffer_size_kb:

This sets or displays the number of kilobytes each CPU
buffer can hold. The tracer buffers are the same size
for each CPU. The displayed number is the size of the
CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
than requested, the rest of the page will be used,
making the actual allocation bigger than requested.
( Note, the size may not be a multiple of the page size
due to buffer management overhead. )

This can only be updated when the current_tracer
is set to "nop".

But this is incorrect. Currently the total memory consumption is
'buffer_size_kb x CPUs x 2'.

Why the factor of two? Because ftrace implicitly allocates
a buffer for max latency too.

This has sad results when an admin wants to use a large buffer (for
full logging and detailed analysis). For example, if an admin
has a 24-CPU machine and writes 200MB to buffer_size_kb, the system
consumes ~10GB of memory (200MB x 24 x 2). Wasting 5GB of memory is
usually unacceptable.
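
The figures in this example can be checked with a few lines of arithmetic. The function and parameter names are hypothetical; only the formula 'buffer_size_kb x CPUs x 2' and the 200MB / 24 CPU inputs come from the message.

```python
def trace_memory_mb(buffer_size_mb, cpus, max_latency_buffer=True):
    """Total tracing memory: one buffer per CPU, doubled by the max latency copy."""
    copies = 2 if max_latency_buffer else 1
    return buffer_size_mb * cpus * copies


total = trace_memory_mb(200, 24)                              # 9600 MB
needed = trace_memory_mb(200, 24, max_latency_buffer=False)   # 4800 MB
print(f"total {total} MB, of which {total - needed} MB is the max latency copy")
```

So the "~10GB" and "5GB" in the message are rounded forms of 9600 MB and 4800 MB.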

Fortunately, almost all users don't use the max latency feature.
The max latency buffer can be disabled easily.

This patch shrinks the max latency buffer if it is
unnecessary.

Signed-off-by: KOSAKI Motohiro
LKML-Reference:
Signed-off-by: Steven Rostedt

KOSAKI Motohiro
2010-07-21 22:20:17 +0800
bc289ae98 tracing: Reduce latency and remove percpu trace_seq

__print_flags() and __print_symbolic() use percpu trace_seq:

1) Its memory is allocated at compile time, so it wastes memory if we don't use tracing.
2) It is percpu data, so it wastes even more memory on multi-CPU systems.
3) It disables preemption while executing its core routine
"trace_seq_printf(s, "%s: ", #call);", which introduces latency.

So we move this trace_seq to struct trace_iterator.

Signed-off-by: Lai Jiangshan
LKML-Reference:
Signed-off-by: Steven Rostedt

Lai Jiangshan
2010-07-21 10:05:34 +0800
985023dee trace: Reorder struct ring_buffer_per_cpu to remove padding on 64bit

Reorder the structure to remove 8 bytes of padding on 64-bit builds.
This shrinks the size to 128 bytes, allowing allocation from a smaller
slab and needing one fewer cache line.
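
The general effect of member order on padding can be demonstrated with ctypes, which lays structures out using the platform's C alignment rules. The two structures below are made-up examples and have nothing to do with the real fields of struct ring_buffer_per_cpu.

```python
import ctypes


class Padded(ctypes.Structure):
    # 4-byte, 8-byte, 4-byte: where 64-bit integers are 8-byte aligned,
    # padding follows each 4-byte member.
    _fields_ = [("a", ctypes.c_uint32),
                ("b", ctypes.c_uint64),
                ("c", ctypes.c_uint32)]


class Reordered(ctypes.Structure):
    # Same members, widest first: the two 4-byte fields share one 8-byte slot.
    _fields_ = [("b", ctypes.c_uint64),
                ("a", ctypes.c_uint32),
                ("c", ctypes.c_uint32)]


print(ctypes.sizeof(Padded), ctypes.sizeof(Reordered))
```

On a typical 64-bit build the first size should be 24 and the second 16, an 8-byte saving from reordering alone; the exact numbers depend on the platform ABI.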

Signed-off-by: Richard Kennedy
LKML-Reference:
Signed-off-by: Steven Rostedt

Richard Kennedy
2010-07-21 09:58:44 +0800
e870e9a12 tracing: Allow to disable cmdline recording

We found that even enabling a single trace event that will rarely be
triggered can add big overhead to context switch.

(lmbench context switch test)
-------------------------------------------------
 2p/0K  2p/16K  2p/64K  8p/16K  8p/64K  16p/16K  16p/64K
 ctxsw   ctxsw   ctxsw   ctxsw   ctxsw    ctxsw    ctxsw
------  ------  ------  ------  ------  -------  -------
  2.19     2.3    2.21    2.56    2.13     2.54     2.07
  2.39    2.51    2.35    2.75    2.27     2.81     2.24

The overhead is 6% ~ 11%.
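
The quoted range can be reproduced from the table, assuming the first row of numbers is the baseline and the second row is the run with a trace event enabled (the message does not label the rows).

```python
# Assumption: first row = no trace event enabled, second row = one enabled.
baseline = [2.19, 2.30, 2.21, 2.56, 2.13, 2.54, 2.07]
with_event = [2.39, 2.51, 2.35, 2.75, 2.27, 2.81, 2.24]

overheads = [(after - before) / before * 100
             for before, after in zip(baseline, with_event)]
print(f"{min(overheads):.1f}% to {max(overheads):.1f}%")
```

Under that reading the per-column overhead runs from a little over 6% to a little under 11%, matching the stated range.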

It's because when a trace event is enabled 3 tracepoints (sched_switch,
sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.

We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
to allow disabling cmdline recording.

Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt

Li Zefan
2010-07-21 09:52:33 +0800

20 Jul, 2010

1 commit

b444786f1 tracing: Use generic_file_llseek for debugfs

The default for llseek will change to no_llseek,
so the tracing debugfs files need to add explicit
.llseek assignments. Since we're dealing with regular
files from a VFS perspective, use generic_file_llseek.

Signed-off-by: Arnd Bergmann
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: John Kacur
Cc: Li Zefan
LKML-Reference:
Signed-off-by: Frederic Weisbecker

Arnd Bergmann
2010-07-20 20:31:24 +0800