19 Oct, 2010
6 commits
-
So that we can pass the task pointer to the event allocation, so that
we can use task associated data during event initialization.Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Currently it looks like find_lively_task_by_vpid() takes a task ref
and relies on find_get_context() to drop it.The problem is that perf_event_create_kernel_counter() shouldn't be
dropping task refs.Signed-off-by: Peter Zijlstra
Acked-by: Frederic Weisbecker
Acked-by: Matt Helsley
LKML-Reference:
Signed-off-by: Ingo Molnar -
Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
the move_group path in perf_event_open().Since we cannot de-construct the group (we rely on it to move the events), we
have to simply ignore the double attach. The group state is context invariant
and doesn't need changing.Reported-by: Matt Fleming
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.Signed-off-by: Peter Zijlstra
Acked-by: Kyle McMartin
Acked-by: Martin Schwidefsky
[ various fixes ]
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Ingo Molnar -
The group_sched_in() function uses a transactional approach to schedule
a group of events. In a group, either all events can be scheduled or
none are. To schedule each event in, the function calls event_sched_in().
In case of error, event_sched_out() is called on each event in the group.The problem is that event_sched_out() does not completely cancel the
effects of event_sched_in(). Furthermore event_sched_out() changes the
state of the event as if it had run which is not true is this particular
case.Those inconsistencies impact time tracking fields and may lead to events
in a group not all reporting the same time_enabled and time_running values.
This is demonstrated with the example below:$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
11423 baclears (32.85% scaling, ena=829181, run=556827)
7671 baclears (0.00% scaling, ena=556827, run=556827)2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
11705 baclears (57.83% scaling, ena=962822, run=405995)
11705 baclears (57.83% scaling, ena=962822, run=405995)Notice that in the first group, the last baclears event does not
report the same timings as its siblings.This issue comes from the fact that tstamp_stopped is updated
by event_sched_out() as if the event had actually run.To solve the issue, we must ensure that, in case of error, there is
no change in the event state whatsoever. That means timings must
remain as they were when entering group_sched_in().To do this we defer updating tstamp_running until we know the
transaction succeeded. Therefore, we have split event_sched_in()
in two parts separating the update to tstamp_running.Similarly, in case of error, we do not want to update tstamp_stopped.
Therefore, we have split event_sched_out() in two parts separating
the update to tstamp_stopped.With this patch, we now get the following output:
$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
11243 baclears (71.75% scaling, ena=1093330, run=308841)
11243 baclears (71.75% scaling, ena=1093330, run=308841)1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
9253 baclears (0.00% scaling, ena=784489, run=784489)
9253 baclears (0.00% scaling, ena=784489, run=784489)Note that the uneven timing between groups is a side effect of
the process spending most of its time sleeping, i.e., not enough
event rotations (but that's a separate issue).Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
You can only call update_context_time() when the context
is active, i.e., the thread it is attached to is still running.However, perf_event_read() can be called even when the context
is inactive, e.g., user read() the counters. The call to
update_context_time() must be conditioned on the status of
the context, otherwise, bogus time_enabled, time_running may
be returned. Here is an example on AMD64. The task program
is an example from libpfm4. The -p prints deltas every 1s.$ task -p -e cpu_clk_unhalted sleep 5
2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)Whereas if you don't read deltas, e.g., no call to perf_event_read() until
the process terminates:$ task -e cpu_clk_unhalted sleep 5
2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)Notice that time_enable, time_running are bogus in the first example
causing bogus scaling.This patch fixes the problem, by conditionally calling update_context_time()
in perf_event_read().Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: stable@kernel.org
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Oct, 2010
5 commits
-
Conflicts:
arch/arm/oprofile/common.c
kernel/perf_event.c -
…nel/git/rostedt/linux-2.6-trace into perf/core
-
The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.Suggested-by: Ingo Molnar
Cc:
Cc: Michal Marek
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser
Signed-off-by: Steven Rostedt -
…ic/random-tracing into perf/core
-
This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.After verifying this works, other archs can add:
HAVE_C_MCOUNT_RECORD
in its Kconfig and it will use the C version of recordmcount
instead of the perl version.Cc:
Cc: Michal Marek
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser
Signed-off-by: Steven Rostedt
14 Oct, 2010
1 commit
-
Fix selftest to clear flags field for reusing probes
because the flags field can be modified by Kprobes.
This also set NULL to kprobe.addr instead of 0.Signed-off-by: Masami Hiramatsu
Cc: Rusty Russell
Cc: Ananth N Mavinakayanahalli
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference:
Signed-off-by: Ingo Molnar
13 Oct, 2010
1 commit
-
Fix
kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a castwhen building 36-rc6 on a 32-bit due to the strict type check failing
in the min() macro.Signed-off-by: Borislav Petkov
Cc: Chase Douglas
Cc: Steven Rostedt
Cc: Ingo Molnar
LKML-Reference:
Signed-off-by: Frederic Weisbecker
12 Oct, 2010
1 commit
11 Oct, 2010
1 commit
-
Introduce perf_pmu_name() helper function that returns the name of the
pmu. This gives us a generic way to get the name of a pmu regardless of
how an architecture identifies it internally.Signed-off-by: Matt Fleming
Acked-by: Peter Zijlstra
Acked-by: Paul Mundt
Signed-off-by: Robert Richter
08 Oct, 2010
1 commit
-
Conflicts:
arch/x86/kernel/module.cMerge reason: Resolve the conflict, pick up fixes.
Signed-off-by: Ingo Molnar
06 Oct, 2010
2 commits
-
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh
generic-ipi: Fix deadlock in __smp_call_function_single -
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling. That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.Future fixups:
- move the module list handling code into kernel/module.c where it
belongs.
- get rid of 'module_bug_list' and just use the regular list of modules
(called 'modules' - imagine that) that we already create and maintain
for other reasons.Reported-and-tested-by: Thomas Gleixner
Cc: Rusty Russell
Cc: Adrian Bunk
Cc: Andrew Morton
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
04 Oct, 2010
1 commit
-
This patch fixes an error in perf_event_open() when the pid
provided by the user is invalid. find_lively_task_by_vpid()
does not return NULL on error but an error code. Without the
fix the error code was silently passed to find_get_context()
which would eventually cause a invalid pointer dereference.Signed-off-by: Stephane Eranian
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
LKML-Reference:
Signed-off-by: Ingo Molnar
02 Oct, 2010
1 commit
-
The kfifo_dma family of functions use sg_mark_end() on the last element in
their scatterlist. This forces use of a fresh scatterlist for each DMA
operation, which makes recycling a single scatterlist impossible.Change the behavior of the kfifo_dma functions to match the usage of the
dma_map_sg function. This means that users must respect the returned
nents value. The sample code is updated to reflect the change.This bug is trivial to cause: call kfifo_dma_in_prepare() such that it
prepares a scatterlist with a single entry comprising the whole fifo.
This is the case when you map the entirety of a newly created empty fifo.
This causes the setup_sgl() function to mark the first scatterlist entry
as the end of the chain, no matter what comes after it.Afterwards, add and remove some data from the fifo such that another call
to kfifo_dma_in_prepare() will create two scatterlist entries. It returns
nents=2. However, due to the previous sg_mark_end() call, sg_is_last()
will now return true for the first scatterlist element. This causes the
sample code to print a single scatterlist element when it should print
two.By removing the call to sg_mark_end(), we make the API as similar as
possible to the DMA mapping API. All users are required to respect the
returned nents.Signed-off-by: Ira W. Snyder
Cc: Stefani Seibold
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Sep, 2010
1 commit
-
…stedt/linux-2.6-trace into perf/core
23 Sep, 2010
6 commits
-
The below bug in fork led to the rmap walk finding the parent huge-pmd
twice instead of just once, because the anon_vma_chain objects of the
child vma still point to the vma->vm_mm of the parent.The patch fixes it by making the rmap walk accurate during fork. It's not
a big deal normally but it worth being accurate considering the cost is
the same.Signed-off-by: Andrea Arcangeli
Acked-by: Johannes Weiner
Acked-by: Rik van Riel
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make use of the jump label infrastructure for tracepoints.
Signed-off-by: Jason Baron
LKML-Reference:
Signed-off-by: Steven Rostedt -
Add a jump_label_text_reserved(void *start, void *end), so that other
pieces of code that want to modify kernel text, can first verify that
jump label has not reserved the instruction.Acked-by: Masami Hiramatsu
Signed-off-by: Jason Baron
LKML-Reference:
Signed-off-by: Steven Rostedt -
Initialize the workqueue data structures *before* they are registered
so that they are ready for callbacks.Signed-off-by: Jason Baron
LKML-Reference:
Signed-off-by: Steven Rostedt -
base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statment. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current usecase that these are being implemented for.Acked-by: David S. Miller
Signed-off-by: Jason Baron
LKML-Reference:[ cleaned up some formating ]
Signed-off-by: Steven Rostedt
-
Conflicts:
kernel/hw_breakpoint.cMerge reason: resolve the conflict.
Signed-off-by: Ingo Molnar
22 Sep, 2010
2 commits
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix nohz balance kick
sched: Fix user time incorrectly accounted as system time on 32-bit -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hw breakpoints: Fix pid namespace bug
x86: Fix instruction breakpoint encoding
oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540)
kprobes: Fix Kconfig dependency
21 Sep, 2010
3 commits
-
The per-pmu per-cpu context patch converted things from
get_cpu_var() to this_cpu_ptr(), but that only works if
rcu_read_lock() actually disables preemption, and since
there is no such guarantee, we need to fix that.Use the newly introduced {get,put}_cpu_ptr().
Signed-off-by: Peter Zijlstra
Cc: Tejun Heo
LKML-Reference:
Signed-off-by: Ingo Molnar -
Merge reason: Pick up the latest fixes in -rc5.
Signed-off-by: Ingo Molnar
-
There's a situation where the nohz balancer will try to wake itself:
cpu-x is idle which is also ilb_cpu
got a scheduler tick during idle
and the nohz_kick_needed() in trigger_load_balance() checks for
rq_x->nr_running which might not be zero (because of someone waking a
task on this rq etc) and this leads to the situation of the cpu-x
sending a kick to itself.And this can cause a lockup.
Avoid this by not marking ourself eligible for kicking.
Signed-off-by: Suresh Siddha
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
17 Sep, 2010
6 commits
-
Revert the timer per cpu-context timers because of unfortunate
nohz interaction. Fixing that would have been somewhat ugly, so
go back to driving things from the regular tick. Provide a
jiffies interval feature for people who want slower rotations.Signed-off-by: Peter Zijlstra
Cc: Stephane Eranian
Cc: Robert Richter
Cc: Yinghai Lu
LKML-Reference:
Signed-off-by: Ingo Molnar -
Use the right cpu-context.. spotted by preempt warning on
hot-unplugSigned-off-by: Peter Zijlstra
Cc: Stephane Eranian
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Ingo Molnar -
Aside from allowing software events into a !software group,
allow adding !software events to pure software groups.Once we've moved the software group and attached the first
!software event, the group will no longer be a pure software
group and hence no longer be eligible for movement, at which
point the straight ctx comparison is correct again.Signed-off-by: Peter Zijlstra
Cc: Stephane Eranian
Cc: Robert Richter
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar -
Events were not grouped anymore. The reason was that in
perf_event_open(), the field event->group_leader was
initialized before the function looked up the group_fd
to find the event leader. This patch fixes this by
reordering the code correctly.Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Robert Richter
LKML-Reference:
Signed-off-by: Ingo Molnar -
Hardware breakpoints can't be registered within pid namespaces
because tsk->pid is passed rather than the pid in the current
namespace.(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )
This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.Much thanks to Frederic Weisbecker for doing
the bulk of the work finding this bug.Reported-by: Robin Green
Signed-off-by: Matt Helsley
Signed-off-by: Peter Zijlstra
Cc: Prasad
Cc: Arnaldo Carvalho de Melo
Cc: Steven Rostedt
Cc: Will Deacon
Cc: Mahesh Salgaonkar
Cc: 2.6.33-2.6.35
LKML-Reference:
Signed-off-by: Ingo Molnar
Signed-off-by: Frederic Weisbecker -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: add documentation
15 Sep, 2010
2 commits
-
This removes following warnings when build with C=1
warning: context imbalance in 'kretprobe_hash_lock' - wrong count at exit
warning: context imbalance in 'kretprobe_table_lock' - wrong count at exit
warning: context imbalance in 'kretprobe_hash_unlock' - unexpected unlock
warning: context imbalance in 'kretprobe_table_unlock' - unexpected unlockSigned-off-by: Namhyung Kim
Acked-by: Masami Hiramatsu
LKML-Reference:
Signed-off-by: Ingo Molnar -
Make following (internal) functions static to make sparse
happier :-)* get_optimized_kprobe: only called from static functions
* kretprobe_table_unlock: _lock function is static
* kprobes_optinsn_template_holder: never called but holding asm codeSigned-off-by: Namhyung Kim
Acked-by: Masami Hiramatsu
LKML-Reference:
Signed-off-by: Ingo Molnar