07 Aug, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
sched: No need for bootmem special cases
sched: Revert nohz_ratelimit() for now
sched: Reduce update_group_power() calls
sched: Update rq->clock for nohz balanced cpus
sched: Fix spelling of sibling
sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
sched: thread_group_cputime: Simplify, document the "alive" check
sched: Remove the obsolete exit_state/signal hacks
sched: task_tick_rt: Remove the obsolete ->signal != NULL check
sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
sched: Fix comments to make them DocBook happy
sched: Fix fix_small_capacity
powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
powerpc: Enable asymmetric SMT scheduling on POWER7
sched: Add asymmetric group packing option for sibling domain
sched: Fix capacity calculations for SMT4
sched: Change nohz idle load balancing logic to push model
...
18 Jun, 2010
1 commit
-
Merge reason: Update to the latest -rc.
10 Jun, 2010
1 commit
-
…ic/random-tracing into perf/core
09 Jun, 2010
12 commits
-
Since all modifications to event->count (and ->prev_count
and ->period_left) are now local to a cpu, change them to local64_t so
we avoid the LOCK'ed ops.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
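A minimal sketch of what the local64_t conversion buys, assuming the
local64 API from <asm-generic/local64.h>; the struct and helpers here
are hypothetical, for illustration only:

    #include <linux/types.h>
    #include <asm/local64.h>

    struct event_counts {
            local64_t count;        /* only ever modified on its own cpu */
    };

    static void counts_add(struct event_counts *c, long delta)
    {
            /* plain add: no LOCK-prefixed instruction is needed for
             * data that only the local cpu writes */
            local64_add(delta, &c->count);
    }

    static s64 counts_read(struct event_counts *c)
    {
            return local64_read(&c->count);
    }

-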
Only child counters adding back their values into the parent counter
are responsible for cross-cpu updates to event->count.
So if we pull that out into a new child_count variable, we get an
event->count that is only modified locally.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Create a helper function for those sites that want to read the event count.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
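A minimal sketch of such a helper, assuming the local64_t count and
the atomic64_t child_count introduced by the two patches above:

    static u64 perf_event_count(struct perf_event *event)
    {
            /* the cpu-local part plus what children folded back in */
            return local64_read(&event->count) +
                   atomic64_read(&event->child_count);
    }

-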
Currently there are perf_buffer_alloc() + perf_buffer_init() + some
separate bits; fold it all into a single perf_buffer_alloc() and only
leave the attachment to the event separate.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Rename to clarify code.
s/perf_mmap_data/perf_buffer/g and selective s/data/buffer/g
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Clarify some of the transactional group scheduling API details
and change it so that a successful ->commit_txn also closes
the transaction.
Signed-off-by: Peter Zijlstra
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
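A hedged sketch of the resulting calling convention; event_sched_in()
and the error handling are simplified, and the function is illustrative
rather than the kernel's group_sched_in(). A successful ->commit_txn
closes the transaction, so ->cancel_txn is only needed on failure:

    static int group_sched_in_sketch(struct pmu *pmu,
                                     struct perf_event *leader)
    {
            struct perf_event *sub;

            pmu->start_txn(pmu);

            if (event_sched_in(leader))
                    goto fail;

            list_for_each_entry(sub, &leader->sibling_list, group_entry) {
                    if (event_sched_in(sub))
                            goto fail;
            }

            if (!pmu->commit_txn(pmu))
                    return 0;       /* success also closed the txn */
    fail:
            pmu->cancel_txn(pmu);
            return -EAGAIN;
    }

-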
Add the capability to track data mmap()s. This can be used together
with PERF_SAMPLE_ADDR for data profiling.
Signed-off-by: Anton Blanchard
[Updated code for stable perf ABI]
Signed-off-by: Eric B Munson
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
__DO_TRACE() already calls the callbacks under rcu_read_lock_sched(),
which is sufficient for our needs; avoid doing it again.
Signed-off-by: Peter Zijlstra
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Inline perf_swevent_put_recursion_context into perf_tp_event(), this
shrinks the per trace template code footprint and saves a function
call.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
For people who otherwise get to write cpu_clock(smp_processor_id()),
there is now local_clock().
Also, as per suggestion from Andrew, provide some documentation on
the various clock interfaces, and minimize the unsigned long long vs
u64 mess.
Signed-off-by: Peter Zijlstra
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Jens Axboe
LKML-Reference:
Signed-off-by: Ingo Molnar
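Illustratively, the call-site change the new helper enables:

    /* before: the current cpu's clock, spelled out by hand */
    u64 ts = cpu_clock(smp_processor_id());

    /* after: the same thing via the new helper */
    u64 ts2 = local_clock();

-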
Drop this argument now that we always want to rewind only to the
state of the first caller.
It means frame pointers are not necessary anymore to reliably get
the source of an event. But this also means we need this helper
to be a macro now, as an inline function is not an option since
we need to know when to provide a default implementation.
Signed-off-by: Frederic Weisbecker
Signed-off-by: Paul Mackerras
Cc: David Miller
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
-
Frederic reported that frequency driven swevents didn't work properly
and even caused a division-by-zero error.
It turns out there are two bugs; the division-by-zero comes from a
failure to deal with that in perf_calculate_period().
The other was more interesting and turned out to be a wrong comparison
in perf_adjust_period(). The comparison was between an s64 and a u64
and got implicitly converted to an unsigned comparison. The problem is
that period_left is typically < 0, so it ended up being always true.
Cure this by making the local period variables s64.
Reported-by: Frederic Weisbecker
Tested-by: Frederic Weisbecker
Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
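A hypothetical userspace reduction of the comparison bug (not the
kernel code itself): comparing an s64 against a u64 promotes the
signed operand to unsigned, so a negative period_left compares as a
huge positive value:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int64_t period_left = -100;     /* typically negative */
            uint64_t sample_period = 5000;

            /* period_left is implicitly converted to unsigned here,
             * so this branch is taken for any negative value */
            if (period_left > sample_period)
                    printf("unsigned comparison: always true\n");

            /* the fix: keep both operands signed */
            if (period_left > (int64_t)sample_period)
                    printf("signed comparison: not taken here\n");

            return 0;
    }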
03 Jun, 2010
1 commit
-
Frederic reported that because swevents handling doesn't disable IRQs
anymore, we can get a recursion of perf_adjust_period(), once from
overflow handling and once from the tick.
If both call ->disable, we get a double hlist_del_rcu() and trigger
a LIST_POISON2 dereference.
Since we don't actually need to stop/start a swevent to re-program
the hardware (there is no hardware to program), simply nop out these
callbacks for the swevent pmu.
Reported-by: Frederic Weisbecker
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
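The shape of that fix, sketched with hedged names (the pmu callbacks
of this era were ->enable/->disable): since there is no hardware to
program, the swevent versions simply do nothing:

    /* nothing to (re)program for a software event */
    static int swevent_enable_nop(struct perf_event *event)
    {
            return 0;
    }

    static void swevent_disable_nop(struct perf_event *event)
    {
    }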
31 May, 2010
4 commits
-
If a sample size crosses to the next page boundary, the copy
will be made in more than one step. However we forget to advance
the source offset for the next copy, leading to unexpected double
copies that completely mess up the traces.
This fixes various kinds of bad traces that have irrelevant
data inside, as an example:

geany-4979 [001] 5758.077775: sched_switch: prev_comm=! prev_pid=121
prev_prio=0 prev_state=S|D|Z|X|x ==> next_comm= next_pid=7497072
next_prio=0

Signed-off-by: Frederic Weisbecker
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
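An illustrative chunked-copy loop (hypothetical names, not the kernel
function) showing the class of bug: after each partial copy, the
source offset must advance along with the destination offset:

    #include <stddef.h>
    #include <string.h>

    /* Copy len bytes from src into an array of fixed-size pages. */
    static void copy_chunked(void *pages[], size_t page_size,
                             size_t dst_off, const char *src, size_t len)
    {
            while (len) {
                    size_t off = dst_off % page_size;
                    size_t chunk = page_size - off;

                    if (chunk > len)
                            chunk = len;

                    memcpy((char *)pages[dst_off / page_size] + off,
                           src, chunk);

                    dst_off += chunk;
                    src += chunk;   /* the advance the bug forgot */
                    len -= chunk;
            }
    }

-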
The transactional API patch between the generic and model-specific
code introduced several important bugs with event scheduling, at
least on X86. If you had pinned events, e.g., watchdog, and were
over-committing the PMU, you would get bogus counts. The bug was
showing up on Intel CPUs because events would move around more
often than on AMD. But the problem also existed on AMD, though it
was harder to expose.
The issues were:
- group_sched_in() was missing a cancel_txn() in the error path
- cpuc->n_added was not properly maintained, leading to missing
actions in hw_perf_enable(), i.e., n_running being 0. You cannot
update n_added until you know the transaction has succeeded. In
case of a failed transaction n_added was not adjusted back.
- in case of failed transactions, event_sched_out() was called
and eventually invoked x86_disable_event() to touch the HW reg.
But with transactions, on X86, event_sched_in() does not touch
HW registers, it simply collects events into a list. Thus, you
could end up calling x86_disable_event() on a counter which
did not correspond to the current event when idx != -1.
The patch modifies the generic and X86 code to avoid all those problems.
First, we keep track of the number of events added last. In case the
transaction fails, we subtract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.
Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.
With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.
Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Group siblings don't pin each other or the parent, so when we destroy
events we must make sure to clean up all cross-referencing pointers.
In particular, for destruction of a group leader we must be able to
find all its siblings and remove their reference to it.
This means that detaching an event from its context must not detach it
from the group, otherwise we can end up failing to clear all pointers.
Solve this by clearly separating the attachment to a context and
attachment to a group, and keep the group composed until we destroy
the events.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
In order to move toward separate buffer objects, rework the whole
perf_mmap_data construct to be a more self-sufficient entity, one
with its own lifetime rules.
This greatly sanitizes the whole output redirection code, which
was riddled with bugs and races.
Signed-off-by: Peter Zijlstra
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar
28 May, 2010
1 commit
-
Once anon_inode_getfd() is called, you can't expect *anything* about
the struct file that the descriptor points to - another thread might
be doing whatever it likes with the descriptor table at that point.
Cc: stable
Signed-off-by: Al Viro
21 May, 2010
8 commits
-
Since we know tracepoints come from kernel context,
avoid conditionals that try and establish that very
fact.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Sanity checks cost instructions.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Reduce code and data by using the knowledge that for
!PERF_USE_VMALLOC data_order is always 0.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Reduce the clutter in perf_output_copy() by keeping
an iterator in perf_output_handle.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
RO mmap()s don't update the tail pointer, so
comparing against it for determining the written data
size doesn't really do any good.
Keep track of when we last did a wakeup, and compare
against that.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since we want to ensure buffers only have a single
writer, we must avoid creating one with multiple.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Avoid the swevent hash-table by using per-tracepoint
hlists.
Also, avoid conditionals on the fast path by ordering
with probe unregister so that we should never get on
the callback path without the data being there.
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar
-
A writer that gets a reference to the buffer handle disables
preemption. When we put that reference, we check if we are
the outermost writer and if not, we simply return and defer
the head update to the outermost writer. The problem here
is that preemption is only re-enabled by the outermost writer,
which produces a preemption count imbalance for every nested
writer that exits.
So just don't forget to always re-enable preemption when we
put the buffer reference, whoever we are.
Fixes lots of sleeping-in-atomic warnings, visible with lock
events recording.
Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
Cc: Stephane Eranian
Cc: Robert Richter
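A hypothetical sketch of the fixed put path (struct and helper names
invented for illustration): the preempt_enable() becomes unconditional,
balancing the preempt_disable() taken by every writer, nested or not:

    static void buffer_put_ref(struct output_buffer *buf)
    {
            /* only the outermost writer publishes the new head */
            if (local_dec_and_test(&buf->nest))
                    publish_head(buf);

            /* every writer balances its own preempt_disable() */
            preempt_enable();
    }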
20 May, 2010
2 commits
-
…eric/random-tracing into perf/core
-
The software events hlist doesn't fully comply with the new
rcu checks api.
We need to consider three different sides that access the hlist:

- the hlist allocation/release side. This side happens when an
event is created or released; accesses to the hlist are
serialized under the cpuctx mutex.

- the events insertion/removal in the hlist. This side is always
serialized against the above one. The hlist is always present
during such operations. This side happens when a software event
is scheduled in/out. The serialization that ensures the software
event is really attached to the context is made under the
ctx->lock.

- events triggering. This is the read side; it can happen
concurrently with any update side.

This patch deals with them one by one and anticipates the
separate rcu mem space patches in preparation.
This patch fixes various annoying rcu warnings.
Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Paul Mackerras
19 May, 2010
7 commits
-
Since the x86 XCHG instruction implies LOCK, avoid its use
by using a sequence count instead.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since there is now only a single writer, we can use
local_t instead and avoid all these pesky LOCK insns.
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since we can now assume there is only a single writer
to each buffer, we can remove the per-cpu lock thingy and
use a simple nest-count to the same effect.
This removes the need to disable IRQs.
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Since we have had working per-task-per-cpu events for
a while now, disallow mmap() on per-task inherited
events. Those things were a performance problem
anyway, and doing away with them allows us to optimize
the buffer somewhat by assuming there is only a
single writer.
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
Ensure cpu bound buffers live on the right NUMA node.
Suggested-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
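A sketch of such node-aware allocation (the helper is hypothetical;
cpu == -1 is assumed to mean the event is not cpu bound):

    static void *alloc_buffer_page(int cpu)
    {
            int node = (cpu == -1) ? numa_node_id() : cpu_to_node(cpu);
            struct page *page;

            /* place the page on the bound cpu's home node */
            page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
            return page ? page_address(page) : NULL;
    }

-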
In case the sampling buffer has no "payload" pages,
nr_pages is 0. The problem is that the error path in
perf_output_begin() skips to a label which assumes
perf_output_lock() has been issued, which is not the
case. That triggers a WARN_ON() in
perf_output_unlock().
This patch fixes the problem by skipping
perf_output_unlock() in case data->nr_pages is 0.
Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
-
When we've got but a single event per tracepoint
there is no reason to try and multiplex it, so don't.
Signed-off-by: Peter Zijlstra
Tested-by: Ingo Molnar
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar
11 May, 2010
2 commits
-
Corey reported that the value scale times of group siblings are not
updated when the monitored task dies.
The problem appears to be that we only update the group leader's
time values; fix it by updating the whole group.
Reported-by: Corey Ashford
Signed-off-by: Peter Zijlstra
Cc: Paul Mackerras
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: # .34.x
LKML-Reference:
Signed-off-by: Ingo Molnar
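A minimal sketch of the shape of the fix, assuming an
update_event_times() helper that refreshes a single event's enabled
and running times: walk the leader and every sibling, not the leader
alone:

    static void update_group_times(struct perf_event *leader)
    {
            struct perf_event *event;

            update_event_times(leader);
            list_for_each_entry(event, &leader->sibling_list, group_entry)
                    update_event_times(event);
    }

-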
Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't
work as expected if the task the counters were attached to quit
before the read() call.
The cause is that we unconditionally destroy the grouping when
we remove counters from their context. Fix this by splitting off
the group destroy from the list removal such that
perf_event_remove_from_context() does not do this, and change
perf_event_release() to do so.
Reported-by: Corey Ashford
Reported-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Paul Mackerras
Cc: # .34.x
LKML-Reference:
Signed-off-by: Ingo Molnar
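An illustrative sketch of that split (helper names invented; the list
field names follow the perf_event conventions used elsewhere in this
log): removal from the context leaves the sibling list alone, and only
the final release tears the group apart:

    /* removal from the context: the group stays composed */
    static void remove_from_context_sketch(struct perf_event *event)
    {
            list_del(&event->event_entry);
    }

    /* final release: now it is safe to destroy the grouping */
    static void release_sketch(struct perf_event *leader)
    {
            struct perf_event *sub, *tmp;

            remove_from_context_sketch(leader);
            list_for_each_entry_safe(sub, tmp, &leader->sibling_list,
                                     group_entry)
                    list_del_init(&sub->group_entry);
    }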