19 Oct, 2010
10 commits
-
Add an interface to allow usage of jump_labels with atomic counters.
Signed-off-by: Peter Zijlstra
Acked-by: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
Now that there are still only a few users around, rename things to make
them more consistent.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
hw_breakpoint creation needs to account stuff per-task to ensure there
are always sufficient hardware resources to back these things due to
ptrace.

With the perf per-pmu context changes, the event initialization no
longer has access to the event context, for the simple reason that we
need to first find the pmu (result of initialization) before we can
find the context.

This makes hw_breakpoints unhappy, because it can no longer do per-task
accounting. Cure this by frobbing a task pointer in the event::hw
bits for now...

Signed-off-by: Peter Zijlstra
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar -
So that we can pass the task pointer to the event allocation, so that
we can use task associated data during event initialization.

Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Currently it looks like find_lively_task_by_vpid() takes a task ref
and relies on find_get_context() to drop it.

The problem is that perf_event_create_kernel_counter() shouldn't be
dropping task refs.

Signed-off-by: Peter Zijlstra
Acked-by: Frederic Weisbecker
Acked-by: Matt Helsley
LKML-Reference:
Signed-off-by: Ingo Molnar -
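A minimal user-space sketch of the ownership rule in the entry above, assuming a plain integer reference count. The names `struct task`, `get_task()`, `put_task()`, `create_counter()` and `open_for_task()` are hypothetical and are not the kernel functions; the point is only that the function that takes the reference is also the one that drops it, while the callee merely borrows the task.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted task. */
struct task {
    int refs;
};

static void get_task(struct task *t) { t->refs++; }
static void put_task(struct task *t) { t->refs--; }

/* Callee borrows the task: it neither takes nor drops a reference. */
static int create_counter(struct task *t)
{
    return t->refs > 0 ? 0 : -1;
}

/* The caller that looked the task up owns the reference and drops it. */
static int open_for_task(struct task *t)
{
    int err;

    get_task(t);                /* the lookup takes the reference */
    err = create_counter(t);    /* reference count is unchanged in here */
    put_task(t);                /* dropped by the function that took it */
    return err;
}
```

With this split, a caller that never took a reference can call `create_counter()` without its count being decremented behind its back.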
Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
the move_group path in perf_event_open().

Since we cannot de-construct the group (we rely on it to move the events), we
have to simply ignore the double attach. The group state is context invariant
and doesn't need changing.

Reported-by: Matt Fleming
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like waking up a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to make do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.

Signed-off-by: Peter Zijlstra
Acked-by: Kyle McMartin
Acked-by: Martin Schwidefsky
[ various fixes ]
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Ingo Molnar -
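The deferred-callback idea in the entry above can be sketched in user space with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: `struct work_item`, `work_queue()`, `work_run()` and `wake_reader()` are hypothetical names, and the self-interrupt is only marked by a comment. The producer side uses no locks and no allocation, which is what a restricted context such as an NMI handler needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a deferred-work item. */
struct work_item {
    struct work_item *next;
    atomic_bool pending;                /* true while the item is queued */
    void (*func)(struct work_item *);
};

/* Head of a lock-free LIFO list of queued items. */
static _Atomic(struct work_item *) work_list = NULL;

/*
 * Queue an item from a restricted context: no locks, no allocation.
 * Returns false if the item was already queued.
 */
static bool work_queue(struct work_item *w)
{
    bool expected = false;

    /* Claim the item so that it is on the list at most once. */
    if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
        return false;

    /* Lock-free push onto the list head. */
    struct work_item *head = atomic_load(&work_list);
    do {
        w->next = head;
    } while (!atomic_compare_exchange_weak(&work_list, &head, w));

    /* A real implementation would raise a self-interrupt here. */
    return true;
}

/* Drain the list from a context where the callbacks are safe to run. */
static int work_run(void)
{
    int n = 0;
    struct work_item *w = atomic_exchange(&work_list, NULL);

    while (w) {
        struct work_item *next = w->next;   /* read before it can be re-queued */

        atomic_store(&w->pending, false);
        w->func(w);
        n++;
        w = next;
    }
    return n;
}

/* Example callback: stands in for "wake up a task to drain buffers". */
static int wakeups;
static void wake_reader(struct work_item *w)
{
    (void)w;
    wakeups++;
}
```

Calling `work_run()` at the tail of an interrupt handler corresponds to the fallback the message describes for architectures without a self-interrupt.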
The group_sched_in() function uses a transactional approach to schedule
a group of events. In a group, either all events can be scheduled or
none are. To schedule each event in, the function calls event_sched_in().
In case of error, event_sched_out() is called on each event in the group.

The problem is that event_sched_out() does not completely cancel the
effects of event_sched_in(). Furthermore, event_sched_out() changes the
state of the event as if it had run, which is not true in this particular
case.

Those inconsistencies impact time tracking fields and may lead to events
in a group not all reporting the same time_enabled and time_running values.
This is demonstrated with the example below:

$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
11423 baclears (32.85% scaling, ena=829181, run=556827)
7671 baclears (0.00% scaling, ena=556827, run=556827)

2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
11705 baclears (57.83% scaling, ena=962822, run=405995)
11705 baclears (57.83% scaling, ena=962822, run=405995)

Notice that in the first group, the last baclears event does not
report the same timings as its siblings.

This issue comes from the fact that tstamp_stopped is updated
by event_sched_out() as if the event had actually run.

To solve the issue, we must ensure that, in case of error, there is
no change in the event state whatsoever. That means timings must
remain as they were when entering group_sched_in().

To do this we defer updating tstamp_running until we know the
transaction succeeded. Therefore, we have split event_sched_in()
in two parts separating the update to tstamp_running.

Similarly, in case of error, we do not want to update tstamp_stopped.
Therefore, we have split event_sched_out() in two parts separating
the update to tstamp_stopped.

With this patch, we now get the following output:

$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
11243 baclears (71.75% scaling, ena=1093330, run=308841)
11243 baclears (71.75% scaling, ena=1093330, run=308841)

1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
9253 baclears (0.00% scaling, ena=784489, run=784489)
9253 baclears (0.00% scaling, ena=784489, run=784489)

Note that the uneven timing between groups is a side effect of
the process spending most of its time sleeping, i.e., not enough
event rotations (but that's a separate issue).

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
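The all-or-nothing scheduling with deferred timestamps described in the entry above can be sketched as follows. This is a reduced model, not the kernel code: the message does not name the two halves of the split functions, so `event_sched_in_notime()` and `event_sched_out_notime()` are hypothetical names, and `struct event` keeps only the fields the sketch needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ev_state { EV_INACTIVE, EV_ACTIVE };

/* Hypothetical, heavily reduced event. */
struct event {
    enum ev_state state;
    bool fits;                      /* pretend the hardware has room for it */
    unsigned long tstamp_running;
    unsigned long tstamp_stopped;
};

/* First half of scheduling in: claim the resource, touch no timestamps. */
static int event_sched_in_notime(struct event *e)
{
    if (!e->fits)
        return -1;
    e->state = EV_ACTIVE;
    return 0;
}

/* Undo for a failed transaction: release only, touch no timestamps. */
static void event_sched_out_notime(struct event *e)
{
    e->state = EV_INACTIVE;
}

/* All-or-nothing scheduling of a group at time 'now'. */
static int group_sched_in(struct event *grp, size_t n, unsigned long now)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (event_sched_in_notime(&grp[i]) != 0)
            goto rollback;
    }

    /* Commit: the timestamps change only once every event is in. */
    for (i = 0; i < n; i++)
        grp[i].tstamp_running = now;
    return 0;

rollback:
    /* Undo the events scheduled so far, newest first. */
    while (i-- > 0)
        event_sched_out_notime(&grp[i]);
    return -1;
}
```

After a failed call every event in the group still carries the timestamps it had on entry, which is the invariant the message asks for.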
PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x7 (to count all possibilities).

PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x3 (to count all possibilities).

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Robert Richter
Cc: # as far back as it applies
LKML-Reference:
Signed-off-by: Ingo Molnar -
You can only call update_context_time() when the context
is active, i.e., the thread it is attached to is still running.

However, perf_event_read() can be called even when the context
is inactive, e.g., when the user read()s the counters. The call to
update_context_time() must be conditioned on the status of
the context; otherwise, bogus time_enabled and time_running values may
be returned. Here is an example on AMD64. The task program
is an example from libpfm4. The -p option prints deltas every 1s.

$ task -p -e cpu_clk_unhalted sleep 5
2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)

Whereas if you don't read deltas, e.g., no call to perf_event_read() until
the process terminates:

$ task -e cpu_clk_unhalted sleep 5
2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)

Notice that time_enabled and time_running are bogus in the first example,
causing bogus scaling.

This patch fixes the problem by conditionally calling update_context_time()
in perf_event_read().

Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar
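A small worked sketch of the numbers in the entry above. The percentages printed by the task program are consistent with 100 * (1 - run/ena), and the usual multiplexing estimate extrapolates a raw count by ena/run; both helpers below are illustrative and their names are assumptions. `struct ctx` and `update_time_if_active()` are likewise hypothetical and only show the shape of the guard the patch describes: the clock of an inactive context is left alone.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of the enabled time during which the event was not counting. */
static double scaling_pct(uint64_t ena, uint64_t run)
{
    if (ena == 0)
        return 0.0;
    return 100.0 * (1.0 - (double)run / (double)ena);
}

/* Usual multiplexing estimate: extrapolate the raw count to the enabled time. */
static uint64_t scaled_count(uint64_t raw, uint64_t ena, uint64_t run)
{
    if (run == 0)
        return 0;
    return (uint64_t)((double)raw * (double)ena / (double)run);
}

/* Hypothetical context: its time only advances while 'active' is set. */
struct ctx {
    int active;
    uint64_t time;          /* accumulated context time */
    uint64_t timestamp;     /* when 'time' was last brought up to date */
};

static void update_time_if_active(struct ctx *c, uint64_t now)
{
    if (!c->active)
        return;             /* inactive context: do not advance the clock */
    c->time += now - c->timestamp;
    c->timestamp = now;
}
```

For the bogus last line of the first example, ena=5,000,359,984 and run=2,319,270 give roughly 99.95%, which matches the output and shows how an enabled time that kept advancing while the task slept distorts the scaling.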
17 Oct, 2010
1 commit
16 Oct, 2010
2 commits
-
…l/git/rostedt/linux-2.6-trace into perf/core
-
The C version of recordmcount is compiled to a binary, which will
end up located in the objtree. If the kernel is built with O=path,
the srctree will not include the binary recordmcount caller.

Cc: Michal Marek
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Steven Rostedt
15 Oct, 2010
12 commits
-
The file kernel/trace/ftrace.c references the mcount() call to
convert the mcount() callers to nops. But because it references
mcount(), the mcount() address is placed in the relocation table.

The C version of recordmcount reads the relocation table of all
object files, and it will add all references to mcount to the
__mcount_loc table that is used to find the places that call mcount()
and change the call to a nop. When recordmcount finds the mcount reference
in kernel/trace/ftrace.o, it saves that location even though the code
is not a call, but references mcount as data.

On boot up, when all calls are converted to nops, the code has a safety
check to determine what op code it is actually replacing before it
replaces it. If that op code at the address does not match, then
a warning is printed and the function tracer is disabled.

The reference to mcount in ftrace.c causes this warning to trigger,
since the reference is not a call to mcount(). The ftrace.c file is
not compiled with the -pg flag, so no calls to mcount() should be
expected.

This patch simply makes recordmcount.c skip the kernel/trace/ftrace.c
file. This was the same solution used by the perl version of
recordmcount.

Reported-by: Ingo Molnar
Cc: John Reiser
Signed-off-by: Steven Rostedt -
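The safety check described in the entry above can be illustrated with a user-space sketch that patches a byte buffer instead of kernel text. It assumes x86, where `call rel32` is opcode 0xe8 followed by a 32-bit displacement; `patch_call_to_nop()`, `make_call()` and `tracer_disabled` are hypothetical names. A location that holds the address of mcount as data does not match the expected call bytes, so the patch is refused.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INSN_SIZE 5

/* x86 "call rel32": opcode 0xe8 followed by a 32-bit displacement. */
static void make_call(uint8_t insn[INSN_SIZE], int32_t rel)
{
    insn[0] = 0xe8;
    memcpy(&insn[1], &rel, sizeof(rel));
}

/* One common 5-byte x86 nop encoding. */
static const uint8_t nop5[INSN_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

static int tracer_disabled;     /* hypothetical stand-in for "tracer is off" */

/*
 * Replace the instruction at 'site' with a nop, but only if the bytes
 * there are exactly the call we expect. Returns 0 on success, -1 otherwise.
 */
static int patch_call_to_nop(uint8_t *site, int32_t expected_rel)
{
    uint8_t expected[INSN_SIZE];

    make_call(expected, expected_rel);
    if (memcmp(site, expected, INSN_SIZE) != 0) {
        tracer_disabled = 1;    /* unexpected bytes: refuse to patch */
        return -1;
    }
    memcpy(site, nop5, INSN_SIZE);
    return 0;
}
```

Skipping ftrace.o in recordmcount keeps such a data reference out of the table in the first place, so the check never sees it.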
Make !CONFIG_PM function stubs static inline and remove section
attribute.

Signed-off-by: Robert Richter
-
Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
cleanup on failure) caused oprofile_perf_exit to be called
in the cleanup path of oprofile_perf_init. The __exit tag
for oprofile_perf_exit should therefore be dropped.

The same has to be done for exit_driverfs as well, as this
function is called from oprofile_perf_exit. Else, we get
the following two linker errors.

LD .tmp_vmlinux1
`oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1

LD .tmp_vmlinux1
`exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Anand Gadiyar
Cc: Will Deacon
Signed-off-by: Robert Richter -
oprofile_perf.c needs to include platform_device.h.
Otherwise we get the following build break.

CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':

Signed-off-by: Anand Gadiyar
Cc: Matt Fleming
Cc: Will Deacon
Signed-off-by: Robert Richter -
Conflicts:
arch/arm/oprofile/common.c
kernel/perf_event.c -
…nel/git/rostedt/linux-2.6-trace into perf/core
-
The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.

Suggested-by: Ingo Molnar
Cc:
Cc: Michal Marek
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser
Signed-off-by: Steven Rostedt -
…ic/random-tracing into perf/core
-
The elf reader for recordmcount.c had duplicate functions for both
32 bit and 64 bit elf handling. This was due to the need of using
the 32 and 64 bit elf structures.

This patch consolidates the two by using macros to define the 32
and 64 bit names in a recordmcount.h file, and then by just defining
a RECORD_MCOUNT_64 macro and including recordmcount.h twice we
create the functions for both the 32 bit version as well as the
64 bit version using one code source.

Cc: John Reiser
Signed-off-by: Steven Rostedt -
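The entry above generates the 32 bit and 64 bit functions by including one header twice under different macro settings. A single-file analogue of that idea, swapping the double inclusion for a function-generating macro, might look like the sketch below. The structures `shdr32` and `shdr64`, the macro `DEFINE_SECTION_END` and the generated function names are all hypothetical stand-ins, not the contents of recordmcount.h.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-ins for the 32 bit and 64 bit ELF section headers. */
struct shdr32 { uint32_t sh_offset; uint32_t sh_size; };
struct shdr64 { uint64_t sh_offset; uint64_t sh_size; };

/*
 * The body is written once. Expanding the macro with different arguments
 * plays the role of including a shared header under different macro
 * settings: each expansion yields one width-specific function.
 */
#define DEFINE_SECTION_END(suffix, shdr_type, word_type)            \
    static word_type section_end_##suffix(const shdr_type *sh)      \
    {                                                               \
        return (word_type)(sh->sh_offset + sh->sh_size);            \
    }

DEFINE_SECTION_END(32, struct shdr32, uint32_t)
DEFINE_SECTION_END(64, struct shdr64, uint64_t)
```

Either way there is one source of truth for the logic, so a fix applies to both widths at once.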
This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.

After verifying this works, other archs can add:
HAVE_C_MCOUNT_RECORD
in their Kconfig and they will use the C version of recordmcount
instead of the perl version.

Cc:
Cc: Michal Marek
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser
Signed-off-by: Steven Rostedt -
Currently, the mcount callers are found with a perl script that does
an objdump on every file in the kernel. This is a C version of that
same code which should reduce the time needed to compile
the kernel with dynamic ftrace enabled.

Signed-off-by: John Reiser
[ Updated the code to include .text.unlikely section as well as
changing the format to follow Linux coding style. ]
Signed-off-by: Steven Rostedt
-
In x86, faults exit by executing the iret instruction, which then
reenables NMIs if we faulted in NMI context. Then if a fault
happens in NMI, another NMI can nest after the fault exits.

But we don't yet support nested NMIs because we have only one NMI
stack. To prevent that, check that vmalloc and kmemcheck
faults don't happen in this context. Most of the other kernel faults
in NMIs can be more easily spotted by finding explicit
copy_from,to_user() calls on review.

Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: H. Peter Anvin
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
14 Oct, 2010
6 commits
-
Since text_poke_smp() definitely depends on actual
stop_machine() on SMP, add that dependency to Kconfig.

Signed-off-by: Masami Hiramatsu
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers
LKML-Reference:
Signed-off-by: Ingo Molnar -
Use __stop_machine() in text_poke_smp() because the caller
must get online_cpus before calling text_poke_smp(), but
stop_machine() does it again. We don't need it.

Signed-off-by: Masami Hiramatsu
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers
LKML-Reference:
Signed-off-by: Ingo Molnar -
Define dummy __stop_machine() function even when
CONFIG_STOP_MACHINE=n. This getcpu-required version of
stop_machine() will be used from text_poke_smp().

Signed-off-by: Masami Hiramatsu
Acked-by: Tejun Heo
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fix selftest to clear flags field for reusing probes
because the flags field can be modified by Kprobes.
This also sets kprobe.addr to NULL instead of 0.

Signed-off-by: Masami Hiramatsu
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference:
Signed-off-by: Ingo Molnar -
Update kprobes.txt about interrupts disabled state inside
kprobes handlers, because optimized probe/boosted kretprobe
run without disabling interrupts on x86.

Signed-off-by: Masami Hiramatsu
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference:
Signed-off-by: Ingo Molnar -
Fix this linux-next build failure that Stephen reported:

arch/arm/kernel/perf_event.c: In function 'armpmu_event_init':
arch/arm/kernel/perf_event.c:543: error: request for member 'num_events' in something not a structure or union

Reported-by: Stephen Rothwell
Cc: Peter Zijlstra
Cc: paulus
LKML-Reference:
Signed-off-by: Ingo Molnar
13 Oct, 2010
1 commit
-
Fix

kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast

when building 36-rc6 on a 32-bit due to the strict type check failing
in the min() macro.

Signed-off-by: Borislav Petkov
Cc: Chase Douglas
Cc: Steven Rostedt
Cc: Ingo Molnar
LKML-Reference:
Signed-off-by: Frederic Weisbecker
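The warning in the entry above comes from a minimum macro that compares the addresses of two temporaries so that the compiler complains when the operand types differ. The sketch below reproduces that style of check and shows one common way around it, forcing both operands to a named type. It relies on GNU C extensions (statement expressions and `__typeof__`), so it assumes GCC or Clang; `strict_min`, `min_as` and `clamp_len` are illustrative names, and the entry does not state which fix the commit actually used.

```c
#include <assert.h>

/*
 * Type-checked minimum: the pointer comparison has no runtime effect,
 * but it makes the compiler warn about "comparison of distinct pointer
 * types" when x and y have different types.
 */
#define strict_min(x, y) ({                 \
    __typeof__(x) _a = (x);                 \
    __typeof__(y) _b = (y);                 \
    (void)(&_a == &_b);                     \
    _a < _b ? _a : _b; })

/* Variant that converts both operands to one named type first. */
#define min_as(type, x, y) ({               \
    type _a = (x);                          \
    type _b = (y);                          \
    _a < _b ? _a : _b; })

static unsigned long long clamp_len(unsigned long long len, unsigned int width)
{
    /* strict_min(len, width) would warn here: the types differ. */
    return min_as(unsigned long long, len, width);
}
```

A type that is `unsigned long` on 64-bit but something else on 32-bit is a typical reason such a warning shows up on only one of the two.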
12 Oct, 2010
8 commits
-
Oprofile counters are set up when profiling is disabled. Thus, writing
to oprofilefs has no immediate effect. Changes are updated only after
oprofile is reenabled.

To keep userland and kernel states synchronized, we now allow
configuration of oprofile only if profiling is disabled. In this case
it checks if the profiler is running and then disables write access to
oprofilefs by returning -EBUSY. The change should be backward
compatible with current oprofile userland daemon.

Acked-by: Maynard Johnson
Cc: William Cohen
Cc: Suravee Suthikulpanit
Signed-off-by: Robert Richter -
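The write guard described in the entry above can be sketched as a setter that refuses changes while profiling is active. This is a user-space illustration only: `profiler_running`, `counter_config` and `config_write()` are hypothetical names, and a real implementation would also need locking around the check and the update.

```c
#include <assert.h>
#include <errno.h>

static int profiler_running;            /* hypothetical global state */
static unsigned long counter_config;    /* hypothetical setting */

/* Accept a configuration write only while the profiler is stopped. */
static int config_write(unsigned long val)
{
    if (profiler_running)
        return -EBUSY;      /* kernel state stays in sync with userland */
    counter_config = val;
    return 0;
}
```

Rejecting the write outright, instead of accepting it and applying it later, means userland never believes a setting is in effect when it is not.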
Conflicts:
arch/arm/oprofile/common.c

Signed-off-by: Robert Richter
-
There is duplicate cleanup code in the init and exit functions. Now,
oprofile_arch_exit() is also used if oprofile_arch_init() fails.

Acked-by: Will Deacon
Signed-off-by: Robert Richter -
This patch simplifies op_create_counter() by removing the if/else if paths
and the return code variable, returning directly from the function instead.

Acked-by: Will Deacon
Signed-off-by: Robert Richter -
This patch removes some unnecessary goto statements.
Acked-by: Will Deacon
Signed-off-by: Robert Richter -
Conflicts:
arch/arm/oprofile/common.c

Signed-off-by: Robert Richter
-
This patch fixes a resource leak on failure, where the
oprofilefs and some counters may not be released properly.

Signed-off-by: Robert Richter
Acked-by: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Cc: # .35.x
LKML-Reference:
Signed-off-by: Ingo Molnar