07 Nov, 2011
1 commit
-
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
03 Nov, 2011
1 commit
-
This reverts commit 144060fee07e9c22e179d00819c83c86fbcbf82c.
It causes a resume regression for Andi on his Acer Aspire 1830T post
3.1. The screen just stays black after wakeup.Also, it really looks like the wrong way to suspend and resume perf
events: I think they should be done as part of the CPU suspend and
resume, rather than as a notifier that does smp_call_function().Reported-by: Andi Kleen
Acked-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Signed-off-by: Linus Torvalds
01 Nov, 2011
2 commits
-
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".The difference between mlocking and pinning is:
A. mlocked pages are marked with PG_mlocked and are exempt from
swapping. Page migration may move them around though.
They are kept on a special LRU list.B. Pinned pages cannot be moved because something needs to
directly access physical memory. They may not be on any
LRU list.I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.This patch introduces a separate counter for pinned pages and
accounts them seperately.Signed-off-by: Christoph Lameter
Cc: Mike Marciniszyn
Cc: Roland Dreier
Cc: Sean Hefty
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
These files were getting via an implicit non-obvious
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason. Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.Signed-off-by: Paul Gortmaker
26 Sep, 2011
1 commit
-
Merge reason: Pick up the latest upstream fixes.
Signed-off-by: Ingo Molnar
31 Aug, 2011
1 commit
-
We detected a serious issue with PERF_SAMPLE_READ and
timing information when events were being multiplexing.Samples would have time_running > time_enabled. That
was easy to reproduce with a libpfm4 example (ran 3
times to cause multiplexing on Core 2):$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
syst_smpl: WARNING: time_running > time_enabled
63277537998 uops_retired:freq=1 , scaledThe bug was not present in kernel up to (and including) 3.0. It turns
out the bug was introduced by the following commit:commit c4794295917ebeda8013b6cb9c8d71ab4f74a1fa
events: Move lockless timer calculation into helper function
The parameters of the function got reversed yet the call sites
were not updated to reflect the change. That lead to time_running
and time_enabled being swapped. That had no effect when there was
no multiplexing because in that case time_running = time_enabled
but it would show up in any other scenario.Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
Signed-off-by: Ingo Molnar
29 Aug, 2011
1 commit
-
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:$ ./pong
Both processes pinned to CPU1, running for 10s
10684.51 ctxsw/sNow start a cgroup perf stat:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100$ ./pong
Both processes pinned to CPU1, running for 10s
6674.61 ctxsw/sThat's a 37% penalty.
Note that pong is not even in the monitored cgroup.
The results shown by perf stat are bogus:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100Performance counter stats for 'sleep 100':
CPU1 cycles test
CPU1 16,984,189,138 cycles # 0.000 GHzThe second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.With this patch the same test now yields:
$ ./pong
Both processes pinned to CPU1, running for 10s
10775.30 ctxsw/sStart perf stat with cgroup:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Run pong outside the cgroup:
$ /pong
Both processes pinned to CPU1, running for 10s
10687.80 ctxsw/sThe penalty is now less than 2%.
And the results for perf stat are correct:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Performance counter stats for 'sleep 10':
CPU1 cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHzNow perf stat reports the correct counts for
for the non cgroup event.If we run pong inside the cgroup, then we also get the
correct counts:$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Performance counter stats for 'sleep 10':
CPU1 22,297,726,205 cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHz10.001457237 seconds time elapsed
Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar
14 Aug, 2011
2 commits
-
Currently, an event's 'pmu' field is set after pmu::event_init() is
called. This means that pmu::event_init() must figure out which struct
pmu the event was initialised from. This makes it difficult to
consolidate common event initialisation code for similar PMUs, and
very difficult to implement drivers for PMUs which can have multiple
instances (e.g. a USB controller PMU, a GPU PMU, etc).This patch sets the 'pmu' field before initialising the event, allowing
event init code to identify the struct pmu instance easily. In the
event of failure to initialise an event, the event is destroyed via
kfree() without calling perf_event::destroy(), so this shouldn't
result in bad behaviour even if the destroy field was set before
failure to initialise was noted.Signed-off-by: Mark Rutland
Reviewed-by: Will Deacon
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1313062280-19123-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar -
Francis reports that s2r gets him spurious NMIs, this is because the
suspend code leaves the boot cpu up and running.Cure this by adding a suspend notifier. The problem is that hotplug
and suspend are completely un-serialized and the PM notifiers run
before the suspend cpu unplug of all but the boot cpu.This leaves a window where the user can initialize another hotplug
operation (either remove or add a cpu) resulting in either one too
many or one too few hotplug ops. Thus we cannot use the hotplug code
for the suspend case.There's another reason to not use the hotplug code, which is that the
hotplug code totally destroys the perf state, we can do better for
suspend and simply remove all counters from the PMU so that we can
re-instate them on resume.Reported-by: Francis Moreau
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-1cvevybkgmv4s6v5y37t4847@git.kernel.org
Signed-off-by: Ingo Molnar
22 Jul, 2011
1 commit
-
PMU type id can be allocated dynamically, so perf_event_attr::type check
when copying attribute from userspace to kernel is not valid.Signed-off-by: Lin Ming
Cc: Robert Richter
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1309421396-17438-4-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar
01 Jul, 2011
8 commits
-
KVM needs one-shot samples, since a PMC programmed to -X will fire after X
events and then again after 2^40 events (i.e. variable period).Signed-off-by: Avi Kivity
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar -
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure. This is ugly and doesn't scale if a
single callback services many perf_events.Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.Signed-off-by: Avi Kivity
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar -
Since only samples call perf_output_sample() its much saner (and more
correct) to put the sample logic in there than in the
perf_output_begin()/perf_output_end() pair.Saves a useless argument, reduces conditionals and shrinks
struct perf_output_handle, win!Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
Signed-off-by: Ingo Molnar -
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.Signed-off-by: Peter Zijlstra
Cc: Michael Cree
Cc: Will Deacon
Cc: Deng-Cheng Zhu
Cc: Anton Blanchard
Cc: Eric B Munson
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: David S. Miller
Cc: Frederic Weisbecker
Cc: Jason Wessel
Cc: Don Zickus
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar -
The event tracing infrastructure exposes two timers which should be updated
each time the value of the counter is updated. Currently, these counters are
only updated when userspace calls read() on the fd associated with an event.
This means that counters which are read via the mmap'd page exclusively never
have their timers updated. This patch adds ensures that the timers are updated
each time the values in the mmap'd page are updated.Signed-off-by: Eric B Munson
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar -
Take the timer calculation from perf_output_read and move it to a helper
function for any place that needs timer values but cannot take the ctx->lock.Signed-off-by: Eric B Munson
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar -
Signed-off-by: Eric B Munson
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net
Signed-off-by: Ingo Molnar -
Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the
ring-buffer logic: make perf_buffer_alloc() do everything needed"),
the perf_buffer_init_code() has been mis-setting the buffer watermark
if perf_event_attr.wakeup_events has a non-zero value.This is because perf_event_attr.wakeup_events is a union with
perf_event_attr.wakeup_watermark.This commit re-enables the check for perf_event_attr.watermark being
set before continuing with setting a non-default watermark.This bug is most noticable when you are trying to use PERF_IOC_REFRESH
with a value larger than one and perf_event_attr.wakeup_events is set to
one. In this case the buffer watermark will be set to 1 and you will
get extraneous POLL_IN overflows rather than POLL_HUP as expected.[ avoid using attr.wakeup_events when attr.watermark is set ]
Signed-off-by: Vince Weaver
Signed-off-by: Peter Zijlstra
Cc:
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar
09 Jun, 2011
1 commit
-
And create the internal perf events header.
v2: Keep an internal inlined perf_output_copy()
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra
Cc: Borislav Petkov
Cc: Stephane Eranian
Cc: Arnaldo Carvalho de Melo
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1305827704-5607-1-git-send-email-fweisbec@gmail.com
[ v3: use clearer 'ring_buffer' and 'rb' naming ]
Signed-off-by: Ingo Molnar
07 Jun, 2011
1 commit
-
A lost Quilt refresh of 2c29ef0fef8 (perf: Simplify and fix
__perf_install_in_context()) is causing grief and lockups,
reported by Jiri Olsa.When installing an event in a task context, there's a number of
issues:- there might not be an existing task context, in which case
we should install the now current context;- there might already be a context, not the current one, in
which case we should de-schedule the old and install the new;these cases were dealt with in the lost refresh, however there is one
further case that was found in testing:- there might already be a context, the current one, in which
case we should still de-schedule, and should take care
to re-install it (note that task_ctx_sched_out() clears
cpuctx->task_ctx).Reported-by: Jiri Olsa
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1307399008.2497.971.camel@laptop
Signed-off-by: Ingo Molnar
04 Jun, 2011
2 commits
-
Conflicts:
tools/perf/util/python.cMerge reason: resolve the conflict with perf/urgent.
Signed-off-by: Ingo Molnar
-
…/linux into perf/urgent
31 May, 2011
1 commit
-
Ben changed the cgroup API in commit f780bdb7c1c (cgroups: add
per-thread subsystem callbacks) in an incompatible way, but
forgot to convert the perf cgroup bits.Avoid compile warnings and runtime splats and convert perf too ;-)
Acked-by: Ben Blum
Cc: Stephane Eranian
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1306767651.1200.2990.camel@twins
Signed-off-by: Ingo Molnar
29 May, 2011
9 commits
-
Since perf_install_in_context() will now install a context when we
add the first event, we can de-schedule the context when the last
event is removed.Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192142.090431763@chello.nl
Signed-off-by: Ingo Molnar -
In order to always call list_del_event() on the correct cpu if the
event is part of an active context and avoid having to do two IPIs,
change the close() semantics slightly.The current perf_event_disable() call would disable a whole group if
the event that's being closed is the group leader, whereas the new
code keeps the group siblings enabled.People should not rely on this behaviour and I don't think they do,
but in case we find they do, the fix is easy and we have to take the
double IPI cost.Signed-off-by: Peter Zijlstra
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/20110409192142.038377551@chello.nl
Signed-off-by: Ingo Molnar -
This was scattered out - refactor it into a single function.
No change in functionality.Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.979862055@chello.nl
Signed-off-by: Ingo Molnar -
Instead of tracking if a context is active or not, track which events
of the context are active. By making it a bitmask of
EVENT_PINNED|EVENT_FLEXIBLE we can simplify some of the scheduling
routines since it can avoid adding events that are already active.Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.930282378@chello.nl
Signed-off-by: Ingo Molnar -
Currently __perf_install_in_context() will try and schedule in the
event irrespective of our event scheduling rules, that is, we try to
schedule CPU-pinned, TASK-pinned, CPU-flexible, TASK-flexible, but
when creating a new event we simply try and schedule it on top of
whatever is already on the PMU, this can lead to errors for pinned
events.Therefore, simplify things and simply schedule everything out, add the
event to the corresponding context and schedule everything back in.This also nicely handles the case where with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI can come right in the middle
of schedule, before we managed to call perf_event_task_sched_in().Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.870894224@chello.nl
Signed-off-by: Ingo Molnar -
Make task_ctx_sched_*() imply EVENT_ALL, since anything less will not
actually have scheduled the task in/out at all.Since there's no site that schedules all of a task in (due to the
interleave with flexible cpuctx) we can remove this function.Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.817893268@chello.nl
Signed-off-by: Ingo Molnar -
Currently we only hold one ctx->lock at a time, which results in us
flipping back and forth between cpuctx->ctx.lock and task_ctx->lock.Avoid this and gain large atomic regions by holding both locks. We
nest the task lock inside the cpu lock, since with task scheduling we
might have to change task ctx while holding the cpu ctx lock.Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.769881865@chello.nl
Signed-off-by: Ingo Molnar -
Small cleanup to how we refcount in find_get_context(), this also
allows us to use put_ctx() to free things instead of using kfree().Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.719340481@chello.nl
Signed-off-by: Ingo Molnar -
Oleg noted that ctx_sched_out() disables the PMU even though it might
not actually do something, avoid needless PMU-disabling.Reported-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20110409192141.665385503@chello.nl
Signed-off-by: Ingo Molnar
28 May, 2011
1 commit
-
Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So
explicitly push the wakeup (including signals) when requested.Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra
Cc:
Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.org
Signed-off-by: Ingo Molnar
20 May, 2011
1 commit
-
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
...
04 May, 2011
1 commit
-
Fix a few inconsistent style bits that were added over the past few
months.Cc: Peter Zijlstra
Link: http://lkml.kernel.org/n/tip-yv4hwf9yhnzoada8pcpb3a97@git.kernel.org
Signed-off-by: Ingo Molnar
03 May, 2011
2 commits
-
As part of the events sybsystem unification, relocate hw_breakpoint.c
into its new destination.Cc: Frederic Weisbecker
Signed-off-by: Borislav Petkov -
mv kernel/perf_event.c -> kernel/events/core.c. From there, all further
sensible splitting can happen. The idea is that due to perf_event.c
becoming pretty sizable and with the advent of the marriage with ftrace,
splitting functionality into its logical parts should help speeding up
the unification and to manage the complexity of the subsystem.Signed-off-by: Borislav Petkov