21 Dec, 2011

1 commit

  • This event counts the number of reference core CPU cycles.
    Reference means that the event increments at a constant rate which
    is not subject to core CPU frequency adjustments. The event may
    not count when the processor is in a halted (low power) state.
    As such, it may not be equivalent to wall clock time. However,
    when the processor is not in a halted state, the event keeps
    a constant correlation with wall clock time. A usage sketch
    follows this entry.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1323559734-3488-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
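
    A minimal usage sketch (not part of the patch): it assumes a kernel
    that already exposes PERF_COUNT_HW_REF_CPU_CYCLES and counts
    reference cycles for a workload in the current task.

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* perf_event_open() has no glibc wrapper; go through syscall(). */
    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        uint64_t count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_REF_CPU_CYCLES; /* reference (unscaled) cycles */
        attr.disabled = 1;

        fd = perf_event_open(&attr, 0, -1, -1, 0);  /* current task, any CPU */
        if (fd < 0)
            return 1;

        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... workload under measurement ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("ref-cycles: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }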
     

06 Dec, 2011

2 commits

  • jump_label patching is a very expensive operation that involves pausing all
    CPUs. The patching of the perf_sched_events jump_label is easily controllable
    from userspace by an unprivileged user.

    When the user runs a loop like this:

    "while true; do perf stat -e cycles true; done"

    ... the performance of my test application that just increments a counter
    for one second drops by 4%.

    This is on a 16 cpu box with my test application using only one of
    them. An impact on a real server doing real work will be worse.

    Performance of the KVM PMU drops nearly 50% due to jump_label for "perf
    record", since the KVM PMU implementation creates and destroys perf events
    frequently.

    This patch introduces a way to rate limit jump_label patching and uses
    it to fix the above problem.

    I believe that as jump_label use spreads, the problem will become more
    common and thus solving it in generic code is appropriate. Also, fixing
    it in the perf code would mean moving jump_label accounting logic into
    perf code, with all the ifdefs in case of a JUMP_LABEL=n kernel. With this
    patch all details are nicely hidden inside the jump_label code.

    Signed-off-by: Gleb Natapov
    Acked-by: Jason Baron
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111127155909.GO2557@redhat.com
    Signed-off-by: Ingo Molnar

    Gleb Natapov
     
  • Gleb writes:

    > Currently pmu is disabled and re-enabled on each timer interrupt even
    > when no rotation or frequency adjustment is needed. On Intel CPU this
    > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal
    > it does not cause significant slowdown, but when running perf in a virtual
    > machine it leads to 20% slowdown on my machine.

    Cure this by keeping a perf_event_context::nr_freq counter that counts the
    number of active events that require frequency adjustments and use this in a
    similar fashion to the already existing nr_events != nr_active test in
    perf_rotate_context().

    By being able to exclude both rotation and frequency adjustments a priori
    for the common case, we can avoid the otherwise superfluous PMU disable.

    Suggested-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

05 Dec, 2011

1 commit

  • When you do:
    $ perf record -e cycles,cycles,cycles noploop 10

    You expect about 10,000 samples for each event, i.e., 10s at
    1000 samples/sec. However, this is not what's happening. You
    get far fewer samples, maybe 3700 samples/event:

    $ perf report -D | tail -15
    Aggregated stats:
    TOTAL events: 10998
    MMAP events: 66
    COMM events: 2
    SAMPLE events: 10930
    cycles stats:
    TOTAL events: 3644
    SAMPLE events: 3644
    cycles stats:
    TOTAL events: 3642
    SAMPLE events: 3642
    cycles stats:
    TOTAL events: 3644
    SAMPLE events: 3644

    On an Intel Nehalem or even AMD64, there are 4 counters capable
    of measuring cycles, so there is plenty of space to measure those
    events without multiplexing (even with the NMI watchdog active).
    And even with multiplexing, we'd expect roughly the same number
    of samples per event.

    The root of the problem was that when the event that caused the buffer
    to become full was not the first event passed on the cmdline, the user
    notification would get lost. The notification was sent to the file
    descriptor of the overflowed event but the perf tool was not polling
    on it. The perf tool aggregates all samples into a single buffer,
    i.e., the buffer of the first event. Consequently, it assumes
    notifications for any event will come via that descriptor.

    The seemingly straightforward solution of moving the waitq into the
    ringbuffer object doesn't work because of lifetime issues. One could
    perf_event_set_output() on an fd that you're also blocking on and cause
    the old rb object to be freed while its waitq would still be
    referenced by the blocked thread -> FAIL.

    Therefore link all events to the ringbuffer and broadcast the wakeup
    from the ringbuffer object to all possible events that could be waited
    upon. This is rather ugly and we're open to better solutions, but it
    works for now. A sketch of the tool-side fd setup that exposes the
    problem follows this entry.

    Reported-by: Stephane Eranian
    Finished-by: Stephane Eranian
    Reviewed-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
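
    A rough sketch of the tool-side setup involved (the fds, page counts and
    helper names below are illustrative, not taken from the perf tool
    sources): every event's output is redirected into the first event's ring
    buffer with PERF_EVENT_IOC_SET_OUTPUT and only the first fd is polled,
    which is why a wakeup delivered to another fd used to be lost.

    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <stddef.h>
    #include <poll.h>

    #define DATA_PAGES 8   /* data area must be a power-of-two page count */

    /* fd[0..n-1] are perf_event_open() fds for the same task/CPU. */
    static void *map_and_redirect(int *fd, int n, size_t page_size)
    {
        void *rb;
        int i;

        /* Only fd[0] gets a buffer (1 metadata page + DATA_PAGES). */
        rb = mmap(NULL, (DATA_PAGES + 1) * page_size,
                  PROT_READ | PROT_WRITE, MAP_SHARED, fd[0], 0);

        /* All other events feed their samples into that same buffer. */
        for (i = 1; i < n; i++)
            ioctl(fd[i], PERF_EVENT_IOC_SET_OUTPUT, fd[0]);

        return rb;
    }

    static void wait_for_data(int *fd)
    {
        /*
         * Only fd[0] is polled. Before the fix, an overflow wakeup was
         * posted to the fd of the event that overflowed, so it could be
         * missed here; with the fix the ring buffer broadcasts the
         * wakeup to every event attached to it.
         */
        struct pollfd pfd = { .fd = fd[0], .events = POLLIN };

        poll(&pfd, 1, -1);
    }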
     

06 Oct, 2011

1 commit

  • The two new attributes exclude_guest and exclude_host can
    be used by user-space to tell the kernel to set up the
    performance counter to count only while the CPU is in
    guest or in host mode, as sketched after this entry.

    An additional check is also introduced to make sure
    user-space does not try to exclude both guest and host mode
    from counting.

    Signed-off-by: Joerg Roedel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.com
    Signed-off-by: Ingo Molnar

    Joerg Roedel
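
    A small sketch (not from the patch) of how user space might request
    host-only counting with the new bits; the helper name is illustrative.

    #include <linux/perf_event.h>
    #include <string.h>

    static void setup_host_only_cycles(struct perf_event_attr *attr)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_HARDWARE;
        attr->config = PERF_COUNT_HW_CPU_CYCLES;
        attr->exclude_guest = 1;  /* do not count while the CPU runs guest code */
        /* Setting attr->exclude_host = 1 as well would exclude everything
         * and is rejected by the new sanity check. */
    }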
     

29 Aug, 2011

1 commit

  • The current cgroup context switch code was incorrect leading
    to bogus counts. Furthermore, as soon as there was an active
    cgroup event on a CPU, the context switch cost on that CPU
    would increase by a significant amount as demonstrated by a
    simple ping/pong example:

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10684.51 ctxsw/s

    Now start a cgroup perf stat:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    6674.61 ctxsw/s

    That's a 37% penalty.

    Note that pong is not even in the monitored cgroup.

    The results shown by perf stat are bogus:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    Performance counter stats for 'sleep 100':

    CPU1 cycles test
    CPU1 16,984,189,138 cycles # 0.000 GHz

    The second 'cycles' event should report a count @ CPU clock
    (here 2.4GHz) as it is counting across all cgroups.

    The patch below fixes the bogus accounting and bypasses any
    cgroup switches in case the outgoing and incoming tasks are
    in the same cgroup.

    With this patch the same test now yields:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10775.30 ctxsw/s

    Start perf stat with cgroup:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Run pong outside the cgroup:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10687.80 ctxsw/s

    The penalty is now less than 2%.

    And the results for perf stat are correct:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    Now perf stat reports the correct counts for
    the non-cgroup event.

    If we run pong inside the cgroup, then we also get the
    correct counts:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 22,297,726,205 cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    10.001457237 seconds time elapsed

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

01 Jul, 2011

7 commits

  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • Add a NODE level to the generic cache events, which is used to measure
    local vs. remote memory accesses. Like all other cache events, an
    ACCESS is HIT+MISS; if there is no way to distinguish between reads
    and writes, count reads only, etc. The encoding is sketched after this
    entry.

    The below needs filling out for !x86 (which I filled out with
    unsupported events).

    I'm fairly sure ARM can leave it like that since it doesn't strike me as
    an architecture that even has NUMA support. SH might have something since
    it does appear to have some NUMA bits.

    Sparc64, PowerPC and MIPS certainly want a good look there since they
    clearly are NUMA capable.

    Signed-off-by: Peter Zijlstra
    Cc: David Miller
    Cc: Anton Blanchard
    Cc: David Daney
    Cc: Deng-Cheng Zhu
    Cc: Paul Mundt
    Cc: Will Deacon
    Cc: Robert Richter
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
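
    A sketch of how the new level can be requested through the generic
    cache-event encoding (config = id | op << 8 | result << 16, as for the
    other cache events); the helper name is just illustrative.

    #include <linux/perf_event.h>
    #include <string.h>

    static void setup_node_read_misses(struct perf_event_attr *attr)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_HW_CACHE;
        /* A NODE miss roughly corresponds to a remote memory access. */
        attr->config = PERF_COUNT_HW_CACHE_NODE |
                       (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                       (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    }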
     
  • This patch improves the code managing the extra shared registers
    used for offcore_response events on Intel Nehalem/Westmere. The
    idea is to use static allocation instead of dynamic allocation.
    This simplifies greatly the get and put constraint routines for
    those events.

    The patch also renames per_core to shared_regs because the same
    data structure gets used whether or not HT is on. When HT is
    off, those events still need coordination because they use
    an extra MSR that has to be shared within an event group.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Since only samples call perf_output_sample() it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
      the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
      perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Reorder perf_event_context to remove 8 bytes of 64 bit alignment padding
    shrinking its size to 192 bytes, allowing it to fit into a smaller slab
    and use one fewer cache lines.

    Signed-off-by: Richard Kennedy
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1307460819.1950.5.camel@castor.rsk
    Signed-off-by: Ingo Molnar

    Richard Kennedy
     

09 Jun, 2011

1 commit

  • And create the internal perf events header.

    v2: Keep an internal inlined perf_output_copy()

    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Stephane Eranian
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1305827704-5607-1-git-send-email-fweisbec@gmail.com
    [ v3: use clearer 'ring_buffer' and 'rb' naming ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

04 Jun, 2011

1 commit

  • Fix include/linux/perf_event.h comments to be consistent with
    the actual #define names. This is trivial, but it can be a bit
    confusing when first reading through the file.

    Signed-off-by: Vince Weaver
    Cc: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031757090.29381@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     

29 Apr, 2011

1 commit

  • Add two generic hardware events: front-end and back-end stalled cycles.

    These events measure conditions when the CPU is executing code but its
    capabilities are not fully utilized. Understanding such situations and
    analyzing them is an important sub-task of code optimization workflows.

    Both events limit performance: most front-end stalls tend to be caused
    by branch misprediction or instruction-fetch cache misses, while back-end
    stalls can be caused by various resource shortages or inefficient
    instruction scheduling (see the sketch after this entry).

    Front-end stalls are the more important ones: code cannot run fast
    if the instruction stream is not being kept up.

    An over-utilized back-end can cause front-end stalls and thus
    needs to be watched as well.

    The exact composition is very program logic and instruction mix
    dependent.

    We use the terms 'stall', 'front-end' and 'back-end' loosely and
    try to use the best available events from specific CPUs that
    approximate these concepts.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
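
    A sketch, assuming a kernel that exposes the two new generic constants,
    of opening a stall counter for the current task; the wrapper and helper
    names are illustrative.

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <string.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    static int open_stall_counter(unsigned long long config, int group_fd)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = config;  /* PERF_COUNT_HW_STALLED_CYCLES_{FRONTEND,BACKEND} */
        return perf_event_open(&attr, 0, -1, group_fd, 0);
    }

    The perf tool exposes the same events under the names
    stalled-cycles-frontend and stalled-cycles-backend, e.g.:

    $ perf stat -e stalled-cycles-frontend,stalled-cycles-backend ./workload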
     

05 Apr, 2011

1 commit

  • Introduce:

    static __always_inline bool static_branch(struct jump_label_key *key);

    instead of the old JUMP_LABEL(key, label) macro.

    In this way, jump labels become really easy to use:

    Define:

    struct jump_label_key jump_key;

    Can be used as:

    if (static_branch(&jump_key))
    do unlikely code

    enable/disable via:

    jump_label_inc(&jump_key);
    jump_label_dec(&jump_key);

    that's it!

    For the jump labels disabled case, the static_branch() becomes an
    atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
    atomic_dec() operations. We show testing results for this change below.

    Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.

    Since we now require a 'struct jump_label_key *key', we can store a pointer into
    the jump table addresses. In this way, we can enable/disable jump labels, in
    basically constant time. This change allows us to completely remove the previous
    hashtable scheme. Thanks to Peter Zijlstra for this re-write.

    Testing:

    I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
    configurations, where tracepoints were disabled.

    jump label configured in
    avg: 815.6

    jump label *not* configured in (using atomic reads)
    avg: 800.1

    jump label *not* configured in (regular reads)
    avg: 803.4

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Jason Baron
    Suggested-by: H. Peter Anvin
    Tested-by: David Daney
    Acked-by: Ralf Baechle
    Acked-by: David S. Miller
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Jason Baron
     

31 Mar, 2011

2 commits

  • Fixes generated by 'codespell' and manually reviewed.

    Signed-off-by: Lucas De Marchi

    Lucas De Marchi
     
  • Jiri reported:

    |
    | - once an event is created by sys_perf_event_open, task context
    | is created and it stays even if the event is closed, until the
    | task is finished ... thats what I see in code and I assume it's
    | correct
    |
    | - when the task opens event, perf_sched_events jump label is
    | incremented and following callbacks are started from scheduler
    |
    | __perf_event_task_sched_in
    | __perf_event_task_sched_out
    |
    | These callback *in/out set/unset cpuctx->task_ctx value to the
    | task context.
    |
    | - close is called on event on CPU 0:
    | - the task is scheduled on CPU 0
    | - __perf_event_task_sched_in is called
    | - cpuctx->task_ctx is set
    | - perf_sched_events jump label is decremented and == 0
    | - __perf_event_task_sched_out is not called
    | - cpuctx->task_ctx on CPU 0 stays set
    |
    | - exit is called on CPU 1:
    | - the task is scheduled on CPU 1
    | - perf_event_exit_task is called
    | - task_ctx_sched_out unsets cpuctx->task_ctx on CPU 1
    | - put_ctx destroys the context
    |
    | - another call of perf_rotate_context on CPU 0 will use invalid
    | task_ctx pointer, and eventually panic.
    |

    Cure this in the simplest possible way by partially reverting the
    jump_label optimization for the sched_out case.

    Reported-and-tested-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: # .37+
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Mar, 2011

1 commit

  • This patch solves a stale pointer problem in
    update_cgrp_time_from_cpuctx(). The cpuctx->cgrp
    was not cleared on all possible event exit paths,
    including:

    close()
    perf_release()
    perf_release_kernel()
    list_del_event()

    This patch fixes list_del_event() to clear cpuctx->cgrp
    when there are no cgroup events left in the context.

    [ This second version makes the code compile when
    CONFIG_CGROUP_PERF is not enabled. We unconditionally define
    perf_cpu_context->cgrp. ]

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: perfmon2-devel@lists.sf.net
    Cc: paulus@samba.org
    Cc: davem@davemloft.net
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

04 Mar, 2011

1 commit

  • Change logs against Andi's original version:

    - Extends perf_event_attr::config to config{,1,2} (Peter Zijlstra)
    - Fixed a major event scheduling issue. There cannot be a ref++ on an
      event that has already done ref++ once without calling
      put_constraint() in between. (Stephane Eranian)
    - Use thread_cpumask for percore allocation. (Lin Ming)
    - Use MSR names in the extra reg lists. (Lin Ming)
    - Remove redundant "c = NULL" in intel_percore_constraints
    - Fix comment of perf_event_attr::config1

    Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event
    that can be used to monitor any offcore accesses from a core.
    This is a very useful event for various tunings, and it's
    also needed to implement the generic LLC-* events correctly.

    Unfortunately this event requires programming a mask in a separate
    register. And worse this separate register is per core, not per
    CPU thread.

    This patch:

    - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
    The extra parameters are passed by user space in the
    perf_event_attr::config1 field.

    - Adds support to the Intel perf_event core to schedule per
    core resources. This adds fairly generic infrastructure that
    can also be used for other per core resources.
    The basic code is patterned after the similar AMD northbridge
    constraints code. A sketch of the user-space side (config1) follows
    this entry.

    Thanks to Stephane Eranian who pointed out some problems
    in the original version and suggested improvements.

    Signed-off-by: Andi Kleen
    Signed-off-by: Lin Ming
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andi Kleen
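
    A sketch of how the extra parameter might be passed from user space.
    The raw encoding 0x01b7 (event 0xB7, umask 0x01, i.e. OFFCORE_RESPONSE_0
    on Nehalem/Westmere) is given only as an illustration, and the actual
    response mask bits must come from the SDM for the CPU in question.

    #include <linux/perf_event.h>
    #include <string.h>

    static void setup_offcore_response(struct perf_event_attr *attr,
                                       unsigned long long response_mask)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_RAW;
        attr->config = 0x01b7;           /* OFFCORE_RESPONSE_0: event 0xB7, umask 0x01 */
        attr->config1 = response_mask;   /* written to the shared per-core extra MSR */
    }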
     

16 Feb, 2011

2 commits

  • By pre-computing the maximum number of samples per tick we can avoid a
    multiplication and a conditional since MAX_INTERRUPTS >
    max_samples_per_tick.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This kernel patch adds the ability to filter monitoring based on
    container groups (cgroups). This is for use in per-cpu mode only.

    The cgroup to monitor is passed as a file descriptor in the pid
    argument to the syscall. The file descriptor must be opened to
    the cgroup name in the cgroup filesystem. For instance, if the
    cgroup name is foo and cgroupfs is mounted in /cgroup, then the
    file descriptor is opened to /cgroup/foo. Cgroup mode is
    activated by passing PERF_FLAG_PID_CGROUP in the flags argument
    to the syscall.

    For instance to measure in cgroup foo on CPU1 assuming
    cgroupfs is mounted under /cgroup:

    struct perf_event_attr attr;
    int cgroup_fd, fd;

    cgroup_fd = open("/cgroup/foo", O_RDONLY);
    fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
    close(cgroup_fd);

    Signed-off-by: Stephane Eranian
    [ added perf_cgroup_{exit,attach} ]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

16 Dec, 2010

3 commits

  • Simple sysfs enumeration of the PMUs.

    Use an "event_source" bus, and add PMU devices using their name.

    Each PMU device has a type attribute which contains the value needed
    for perf_event_attr::type to identify this PMU (see the sketch after
    this entry).

    This is the minimal stub needed to start using this interface,
    we'll consider extending the sysfs usage later.

    Cc: Kay Sievers
    Cc: Greg KH
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
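
    A sketch of how a tool could resolve a PMU name to its type value
    through the new sysfs layout; the helper name is illustrative and the
    PMU name is whatever was passed at registration time (e.g. "cpu").

    #include <linux/perf_event.h>
    #include <stdio.h>
    #include <string.h>

    static int pmu_type_from_sysfs(const char *pmu, struct perf_event_attr *attr)
    {
        char path[256];
        FILE *f;
        int type;

        snprintf(path, sizeof(path),
                 "/sys/bus/event_source/devices/%s/type", pmu);
        f = fopen(path, "r");
        if (!f)
            return -1;
        if (fscanf(f, "%d", &type) != 1) {
            fclose(f);
            return -1;
        }
        fclose(f);

        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = type;   /* the value exported by the PMU's type attribute */
        return 0;
    }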
     
  • Extend the perf_pmu_register() interface to allow for named and
    dynamic pmu types.

    Because we need to support the existing static types we cannot use
    dynamic types for everything, hence provide a type argument.

    If we want to enumerate the PMUs they need a name, provide one.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Merge reason: We want to apply a dependent patch.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

05 Dec, 2010

2 commits

  • If perf_event_attr.sample_id_all is set, it will add the PERF_SAMPLE_ identity
    info:

    TID, TIME, ID, CPU, STREAM_ID

    as a trailer, so that older perf tools can process new files, just ignoring the
    extra payload. A sketch of the attribute setup follows this entry.

    With this it's possible to do further analysis of problems in the event stream,
    like detecting reordering of MMAP and FORK events, etc.

    V2: Fixup header size in comm, mmap and task processing, as we have to take into
    account different sample_types for each matching event, noticed by Thomas Gleixner.

    Thomas also noticed a problem in v2 where, if we didn't have space in the buffer,
    we wouldn't restore the header size.

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
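
    A sketch of the attribute setup on the tool side, assuming the kernel
    supports the new bit; the identity fields appended to non-sample records
    follow the event's sample_type selection. The helper name is illustrative.

    #include <linux/perf_event.h>
    #include <string.h>

    static void setup_sampling_with_id_all(struct perf_event_attr *attr)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = PERF_TYPE_HARDWARE;
        attr->config = PERF_COUNT_HW_CPU_CYCLES;
        attr->sample_period = 100000;
        attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                            PERF_SAMPLE_ID | PERF_SAMPLE_CPU |
                            PERF_SAMPLE_STREAM_ID;
        /* Also append the selected id fields to MMAP, COMM, FORK, ... records. */
        attr->sample_id_all = 1;
        attr->mmap = 1;
        attr->comm = 1;
        attr->task = 1;
    }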
     
  • Those will be made available in sample-like events such as MMAP, EXEC, etc. in a
    follow-up patch. So precalculate the extra id header space and have a separate
    routine to fill them up.

    V2: Thomas noticed that the id header needs to be precalculated at
    inherit_events too:

    LKML-Reference:

    Tested-by: Thomas Gleixner
    Reviewed-by: Thomas Gleixner
    Acked-by: Ian Munsie
    Acked-by: Peter Zijlstra
    Acked-by: Thomas Gleixner
    Cc: Frédéric Weisbecker
    Cc: Ian Munsie
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

26 Nov, 2010

2 commits