31 Aug, 2011

1 commit

  • We detected a serious issue with PERF_SAMPLE_READ and
    timing information when events were being multiplexed.

    Samples would have time_running > time_enabled. That
    was easy to reproduce with a libpfm4 example (run 3
    times to cause multiplexing on Core 2):

    $ syst_smpl -e uops_retired:freq=1 &
    $ syst_smpl -e uops_retired:freq=1 &
    $ syst_smpl -e uops_retired:freq=1 &
    IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
    syst_smpl: WARNING: time_running > time_enabled
    63277537998 uops_retired:freq=1 , scaled

    The bug was not present in kernels up to (and including) 3.0. It
    turns out the bug was introduced by the following commit:

    commit c4794295917ebeda8013b6cb9c8d71ab4f74a1fa

    events: Move lockless timer calculation into helper function

    The parameters of the function were reversed, yet the call sites
    were not updated to reflect the change. That led to time_running
    and time_enabled being swapped. This had no effect when there was
    no multiplexing, because in that case time_running = time_enabled,
    but it would show up in any other scenario.
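
    For reference, user space derives a scaled count from these two
    times; a minimal sketch in C (types from <linux/types.h>, fd opened
    with attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
    PERF_FORMAT_TOTAL_TIME_RUNNING):

    struct read_format {
            __u64 value;         /* raw count */
            __u64 time_enabled;  /* ns the event was enabled */
            __u64 time_running;  /* ns the event was on the PMU */
    } rf;

    read(fd, &rf, sizeof(rf));
    /* compensate for multiplexing; only sane if running <= enabled */
    __u64 scaled = rf.time_running ?
            (__u64)((double)rf.value * rf.time_enabled / rf.time_running) : 0;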

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
    Signed-off-by: Ingo Molnar

    Eric B Munson
     

29 Aug, 2011

1 commit

  • The current cgroup context switch code was incorrect, leading
    to bogus counts. Furthermore, as soon as there was an active
    cgroup event on a CPU, the context switch cost on that CPU
    would increase by a significant amount as demonstrated by a
    simple ping/pong example:

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10684.51 ctxsw/s

    Now start a cgroup perf stat:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    6674.61 ctxsw/s

    That's a 37% penalty.

    Note that pong is not even in the monitored cgroup.

    The results shown by perf stat are bogus:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    Performance counter stats for 'sleep 100':

    CPU1 <not counted> cycles test
    CPU1 16,984,189,138 cycles # 0.000 GHz

    The second 'cycles' event should report a count @ CPU clock
    (here 2.4GHz) as it is counting across all cgroups.

    The patch below fixes the bogus accounting and bypasses any
    cgroup switches in case the outgoing and incoming tasks are
    in the same cgroup.
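
    The bypass is essentially a pointer comparison at context-switch
    time; a sketch of the idea (close to, but not literally, the patch):

    static void perf_cgroup_sched_out(struct task_struct *task,
                                      struct task_struct *next)
    {
            struct perf_cgroup *cgrp1 = perf_cgroup_from_task(task);
            struct perf_cgroup *cgrp2 = perf_cgroup_from_task(next);

            /*
             * Only touch the cgroup switch machinery when the outgoing
             * and incoming tasks are in different cgroups.
             */
            if (cgrp1 != cgrp2)
                    perf_cgroup_switch(task, PERF_CGROUP_SWOUT);
    }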

    With this patch the same test now yields:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10775.30 ctxsw/s

    Start perf stat with cgroup:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Run pong outside the cgroup:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10687.80 ctxsw/s

    The penalty is now less than 2%.

    And the results for perf stat are correct:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 <not counted> cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    Now perf stat reports the correct counts
    for the non-cgroup event.

    If we run pong inside the cgroup, then we also get the
    correct counts:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 22,297,726,205 cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    10.001457237 seconds time elapsed

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

22 Jul, 2011

1 commit

  • A PMU type id can be allocated dynamically, so the perf_event_attr::type
    range check performed when copying the attribute from userspace to the
    kernel is no longer valid.
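
    The dropped test was the static range check in the attribute-copy
    path, along these lines (a sketch; dynamically allocated PMU ids
    are numbered above PERF_TYPE_MAX, which is what made the check
    wrong):

    if (attr->type >= PERF_TYPE_MAX)
            return -EINVAL;   /* rejects valid dynamic PMU types */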

    Signed-off-by: Lin Ming
    Cc: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309421396-17438-4-git-send-email-ming.m.lin@intel.com
    Signed-off-by: Ingo Molnar

    Lin Ming
     

01 Jul, 2011

8 commits

  • KVM needs one-shot samples, since a PMC programmed to -X will fire after X
    events and then again after 2^40 events (i.e. variable period).
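
    One-shot behaviour builds on the existing event_limit/refresh
    machinery: each overflow decrements the limit, and at zero the
    event disables itself until re-armed. A sketch of the in-kernel
    use this enables (event assumed to come from
    perf_event_create_kernel_counter()):

    /* arm the counter for exactly one more overflow;
     * after that fires, the event switches itself off */
    perf_event_refresh(event, 1);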

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • The perf_event overflow handler does not receive any caller-derived
    argument, so many callers need to resort to looking up the perf_event
    in their local data structure. This is ugly and doesn't scale if a
    single callback services many perf_events.

    Fix by adding a context parameter to perf_event_create_kernel_counter()
    (and derived hardware breakpoints APIs) and storing it in the perf_event.
    The field can be accessed from the callback as event->overflow_handler_context.
    All callers are updated.

    Signed-off-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • Since only samples call perf_output_sample(), it's much saner (and more
    correct) to put the sample logic in there than in the
    perf_output_begin()/perf_output_end() pair.

    Saves a useless argument, reduces conditionals and shrinks
    struct perf_output_handle, win!

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The nmi parameter indicated if we could do wakeups from the current
    context, if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; the PMI is in fact an NMI, or we run irq_work_run
      from the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since the tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing, cannot
      perform wakeups and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.
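
    The visible API change is in the overflow path; roughly:

    /* before */
    int perf_event_overflow(struct perf_event *event, int nmi,
                            struct perf_sample_data *data,
                            struct pt_regs *regs);

    /* after: wakeups always go through irq_work, so nmi is gone */
    int perf_event_overflow(struct perf_event *event,
                            struct perf_sample_data *data,
                            struct pt_regs *regs);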

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The event tracing infrastructure exposes two timers which should be updated
    each time the value of the counter is updated. Currently, these timers are
    only updated when userspace calls read() on the fd associated with an event.
    This means that counters which are read exclusively via the mmap'd page
    never have their timers updated. This patch ensures that the timers are
    updated each time the values in the mmap'd page are updated.
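
    The timers in question are the two fields exposed in the mmap'd
    control page (struct perf_event_mmap_page):

    __u64 time_enabled;  /* time event active */
    __u64 time_running;  /* time event on cpu */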

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Take the timer calculation from perf_output_read and move it into a helper
    function, for use anywhere timer values are needed but the ctx->lock cannot
    be taken.

    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net
    Signed-off-by: Ingo Molnar

    Eric B Munson
     
  • Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the
    ring-buffer logic: make perf_buffer_alloc() do everything needed")),
    the perf_buffer_init() code has been mis-setting the buffer watermark
    if perf_event_attr.wakeup_events has a non-zero value.

    This is because perf_event_attr.wakeup_events is a union with
    perf_event_attr.wakeup_watermark.
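
    The union in question, from struct perf_event_attr:

    union {
            __u32 wakeup_events;    /* wakeup every n events */
            __u32 wakeup_watermark; /* bytes before wakeup   */
    };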

    This commit re-enables the check for perf_event_attr.watermark being
    set before continuing with setting a non-default watermark.

    This bug is most noticeable when you are trying to use
    PERF_EVENT_IOC_REFRESH with a value larger than one and
    perf_event_attr.wakeup_events is set to one. In this case the buffer
    watermark will be set to 1 and you will get extraneous POLL_IN
    overflows rather than POLL_HUP as expected.

    [ avoid using attr.wakeup_events when attr.watermark is set ]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc:
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     

09 Jun, 2011

1 commit

  • And create the internal perf events header.

    v2: Keep an internal inlined perf_output_copy()

    Signed-off-by: Frederic Weisbecker
    Acked-by: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Stephane Eranian
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1305827704-5607-1-git-send-email-fweisbec@gmail.com
    [ v3: use clearer 'ring_buffer' and 'rb' naming ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

07 Jun, 2011

1 commit

  • A lost Quilt refresh of 2c29ef0fef8 (perf: Simplify and fix
    __perf_install_in_context()) is causing grief and lockups,
    reported by Jiri Olsa.

    When installing an event in a task context, there are a number of
    issues:

    - there might not be an existing task context, in which case
      we should install the now current context;

    - there might already be a context, not the current one, in
      which case we should de-schedule the old and install the new;

    these cases were dealt with in the lost refresh; however, one
    further case was found in testing:

    - there might already be a context, the current one, in which
      case we should still de-schedule, and should take care
      to re-install it (note that task_ctx_sched_out() clears
      cpuctx->task_ctx).
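
    Condensed, handling that last case means __perf_install_in_context()
    must do something like this before re-scheduling (a sketch of the
    control flow, not the literal patch):

    struct perf_event_context *task_ctx = cpuctx->task_ctx;

    /* schedule out any active task context, the current one included;
     * task_ctx_sched_out() clears cpuctx->task_ctx, so the context must
     * be re-installed when everything is scheduled back in */
    if (task_ctx)
            task_ctx_sched_out(task_ctx);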

    Reported-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1307399008.2497.971.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 May, 2011

1 commit

  • Ben changed the cgroup API in commit f780bdb7c1c (cgroups: add
    per-thread subsystem callbacks) in an incompatible way, but
    forgot to convert the perf cgroup bits.

    Avoid compile warnings and runtime splats and convert perf too ;-)

    Acked-by: Ben Blum
    Cc: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1306767651.1200.2990.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 May, 2011

9 commits

  • Since perf_install_in_context() will now install a context when we
    add the first event, we can de-schedule the context when the last
    event is removed.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192142.090431763@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order to always call list_del_event() on the correct cpu if the
    event is part of an active context and avoid having to do two IPIs,
    change the close() semantics slightly.

    The current perf_event_disable() call would disable a whole group if
    the event that's being closed is the group leader, whereas the new
    code keeps the group siblings enabled.

    People should not rely on this behaviour and I don't think they do,
    but in case we find they do, the fix is easy and we have to take the
    double IPI cost.

    Signed-off-by: Peter Zijlstra
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/20110409192142.038377551@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This was scattered out - refactor it into a single function.
    No change in functionality.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.979862055@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Instead of tracking if a context is active or not, track which events
    of the context are active. By making it a bitmask of
    EVENT_PINNED|EVENT_FLEXIBLE we can simplify some of the scheduling
    routines since it can avoid adding events that are already active.
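
    The mask uses the existing event_type_t bits:

    enum event_type_t {
            EVENT_FLEXIBLE = 0x1,
            EVENT_PINNED   = 0x2,
            EVENT_ALL      = EVENT_FLEXIBLE | EVENT_PINNED,
    };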

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.930282378@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently __perf_install_in_context() will try and schedule in the
    event irrespective of our event scheduling rules; that is, we
    normally schedule CPU-pinned, TASK-pinned, CPU-flexible, then
    TASK-flexible events, but when creating a new event we simply try
    to schedule it on top of whatever is already on the PMU. This can
    lead to errors for pinned events.

    Therefore, simplify things and simply schedule everything out, add the
    event to the corresponding context and schedule everything back in.

    This also nicely handles the case where with
    __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI can come right in the middle
    of schedule, before we managed to call perf_event_task_sched_in().
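
    In code, the simplification boils down to a sequence like the
    following (a sketch; helper names as used elsewhere in this series):

    perf_ctx_lock(cpuctx, task_ctx);
    ctx_sched_out(ctx, cpuctx, EVENT_ALL);          /* everything out  */
    add_event_to_ctx(event, ctx);                   /* add the new one */
    perf_event_sched_in(cpuctx, task_ctx, current); /* everything back */
    perf_ctx_unlock(cpuctx, task_ctx);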

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.870894224@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Make task_ctx_sched_*() imply EVENT_ALL, since anything less will not
    actually have scheduled the task in/out at all.

    Since there's no site that schedules all of a task in (due to the
    interleave with flexible cpuctx) we can remove this function.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.817893268@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently we only hold one ctx->lock at a time, which results in us
    flipping back and forth between cpuctx->ctx.lock and task_ctx->lock.

    Avoid this and gain large atomic regions by holding both locks. We
    nest the task lock inside the cpu lock, since with task scheduling we
    might have to change task ctx while holding the cpu ctx lock.
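
    The nesting is captured by a small helper pair; the lock side looks
    like this (matching the shape of the helper the patch introduces):

    static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
                              struct perf_event_context *ctx)
    {
            raw_spin_lock(&cpuctx->ctx.lock);
            if (ctx)
                    raw_spin_lock(&ctx->lock); /* task ctx nests inside */
    }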

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.769881865@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Small cleanup to how we refcount in find_get_context(), this also
    allows us to use put_ctx() to free things instead of using kfree().

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.719340481@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Oleg noted that ctx_sched_out() disables the PMU even though it might
    not actually do anything; avoid the needless PMU-disabling.

    Reported-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110409192141.665385503@chello.nl
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 May, 2011

1 commit

  • Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So
    explicitly push the wakeup (including signals) when requested.
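
    For reference, the userspace side that requests SIGIO on a perf fd
    is the standard async-I/O fcntl setup (a sketch):

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);
    fcntl(fd, F_SETOWN, getpid());  /* deliver SIGIO to this process */
    /* before this fix the signal could be lost unless the ring buffer
     * was also mmap()ed; now the wakeup is pushed regardless */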

    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc:
    Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

20 May, 2011

1 commit

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
    Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
    net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
    batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
    batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
    batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
    net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
    net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
    net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
    perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
    perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
    net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
    security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
    net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
    net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
    net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
    ...

    Linus Torvalds
     

03 May, 2011

2 commits

  • As part of the events subsystem unification, relocate hw_breakpoint.c
    into its new destination.

    Cc: Frederic Weisbecker
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     
  • mv kernel/perf_event.c -> kernel/events/core.c. From there, all further
    sensible splitting can happen. The idea is that, with perf_event.c
    becoming pretty sizable and with the advent of the marriage with ftrace,
    splitting the functionality into its logical parts should help speed up
    the unification and manage the complexity of the subsystem.

    Signed-off-by: Borislav Petkov

    Borislav Petkov