07 Aug, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
    sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug
    sched: No need for bootmem special cases
    sched: Revert nohz_ratelimit() for now
    sched: Reduce update_group_power() calls
    sched: Update rq->clock for nohz balanced cpus
    sched: Fix spelling of sibling
    sched, cpuset: Drop __cpuexit from cpu hotplug callbacks
    sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check()
    sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()
    sched: thread_group_cputime: Simplify, document the "alive" check
    sched: Remove the obsolete exit_state/signal hacks
    sched: task_tick_rt: Remove the obsolete ->signal != NULL check
    sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless
    sched: Fix comments to make them DocBook happy
    sched: Fix fix_small_capacity
    powerpc: Exclude arch_sd_sibiling_asym_packing() on UP
    powerpc: Enable asymmetric SMT scheduling on POWER7
    sched: Add asymmetric group packing option for sibling domain
    sched: Fix capacity calculations for SMT4
    sched: Change nohz idle load balancing logic to push model
    ...

    Linus Torvalds
     

18 Jun, 2010

1 commit


10 Jun, 2010

1 commit


09 Jun, 2010

12 commits

  • Since all modifications to event->count (and ->prev_count
    and ->period_left) are now local to a CPU, change them to local64_t
    so we avoid the LOCK'ed ops (a minimal usage sketch follows this
    entry).

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
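    The entry above relies on the kernel's local64_t type; as a minimal,
    kernel-style sketch (not a standalone program; the structure and its
    field names are illustrative, only the local64_* calls are the real
    API from <linux/local64.h>):

        #include <linux/local64.h>

        struct counter_sketch {
                local64_t  count;        /* only ever modified on its own CPU */
                atomic64_t child_count;  /* cross-CPU sums still need atomics */
        };

        static void add_local_delta(struct counter_sketch *c, long delta)
        {
                /* cheap: no LOCK-prefixed instruction is needed, because
                 * the value is never modified from another CPU */
                local64_add(delta, &c->count);
        }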
     
  • Only child counters adding back their values into the parent counter
    are responsible for cross-cpu updates to event->count.

    So if we pull that out into a new child_count variable, we get an
    event->count that is only modified locally.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Create a helper function for those sites that want to read the event count.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
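    A hedged sketch of what such a read helper could look like, tying this
    entry together with the child_count change above (it reuses the
    counter_sketch structure from the earlier sketch; the function name is
    illustrative):

        /* total = locally updated count + values folded back by children */
        static u64 read_event_count(struct counter_sketch *c)
        {
                return local64_read(&c->count) +
                       atomic64_read(&c->child_count);
        }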
     
  • Currently there is perf_buffer_alloc() + perf_buffer_init() + some
    separate bits; fold it all into a single perf_buffer_alloc() and
    leave only the attachment to the event separate.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Rename to clarify code.

    s/perf_mmap_data/perf_buffer/g and selective s/data/buffer/g

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Clarify some of the transactional group scheduling API details
    and change it so that a successful ->commit_txn also closes
    the transaction (a sketch of the calling convention follows this
    entry).

    Signed-off-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
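    A sketch of the clarified calling convention, using hypothetical types
    and names (the real struct pmu callbacks and their arguments differ
    between kernel versions):

        struct txn_pmu {
                void (*start_txn)(struct txn_pmu *pmu);
                int  (*commit_txn)(struct txn_pmu *pmu); /* 0 on success */
                void (*cancel_txn)(struct txn_pmu *pmu); /* undo adds    */
        };

        static int group_sched_in_sketch(struct txn_pmu *pmu,
                                         int (*sched_in)(int idx), int n)
        {
                int i;

                pmu->start_txn(pmu);

                for (i = 0; i < n; i++) {
                        if (sched_in(i)) {
                                /* failure inside the group: abort explicitly */
                                pmu->cancel_txn(pmu);
                                return -1;
                        }
                }

                /* a successful ->commit_txn also closes the transaction,
                 * so no cancel_txn() is needed on this path */
                return pmu->commit_txn(pmu);
        }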
     
  • Add the capability to track data mmap()s. This can be used together
    with PERF_SAMPLE_ADDR for data profiling (see the sketch after this
    entry).

    Signed-off-by: Anton Blanchard
    [Updated code for stable perf ABI]
    Signed-off-by: Eric B Munson
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric B Munson
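    A user-space sketch of how this could be requested, assuming the new
    perf_event_attr bit is the mmap_data flag found in later kernels and
    that PERF_SAMPLE_ADDR is also wanted for the data-profiling use case
    mentioned above:

        #include <linux/perf_event.h>
        #include <string.h>

        static void init_attr(struct perf_event_attr *attr)
        {
                memset(attr, 0, sizeof(*attr));
                attr->size        = sizeof(*attr);
                attr->sample_type = PERF_SAMPLE_ADDR; /* record data addresses */
                attr->mmap        = 1;                /* executable mmap()s    */
                attr->mmap_data   = 1;                /* new: data mmap()s too */
        }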
     
  • __DO_TRACE() already calls the callbacks under rcu_read_lock_sched(),
    which is sufficient for our needs, avoid doing it again.

    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Inline perf_swevent_put_recursion_context into perf_tp_event(), this
    shrinks the per trace template code footprint and saves a function
    call.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • For people who would otherwise write cpu_clock(smp_processor_id()),
    there is now local_clock().

    Also, as per suggestion from Andrew, provide some documentation on
    the various clock interfaces, and minimize the unsigned long long vs
    u64 mess.

    Signed-off-by: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
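    In other words (both interfaces exist; the exact behaviour depends on
    the architecture's sched_clock() stability):

        u64 t_old = cpu_clock(smp_processor_id()); /* the old spelling */
        u64 t_new = local_clock();                 /* the new helper   */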
     
  • Drop this argument now that we always want to rewind only to the
    state of the first caller.
    It means frame pointers are not necessary anymore to reliably get
    the source of an event. But this also means we need this helper
    to be a macro now, as an inline function is not an option since
    we need to know when to provide a default implementation.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Paul Mackerras
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • Frederic reported that frequency driven swevents didn't work properly
    and even caused a division-by-zero error.

    It turns out there are two bugs: the division-by-zero comes from a
    failure to deal with that case in perf_calculate_period().

    The other was more interesting and turned out to be a wrong comparison
    in perf_adjust_period(). The comparison was between an s64 and u64 and
    got implicitly converted to an unsigned comparison. The problem is
    that period_left is typically < 0, so it ended up being always true.

    Cure this by making the local period variables s64.

    Reported-by: Frederic Weisbecker
    Tested-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
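    A self-contained illustration of the implicit-conversion pitfall
    described above (plain C rather than the kernel code; the variable
    names only echo the description):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
                int64_t  period_left = -100;  /* typically negative, as noted */
                uint64_t period      = 5000;

                /* period_left is converted to an unsigned type, so a negative
                 * value becomes huge and the test is always true */
                if (period_left > period)
                        printf("unsigned comparison: taken (wrong)\n");

                /* keeping both operands signed gives the intended result */
                if (period_left > (int64_t)period)
                        printf("signed comparison: taken\n");
                else
                        printf("signed comparison: not taken (correct)\n");

                return 0;
        }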
     

03 Jun, 2010

1 commit

  • Frederic reported that because swevents handling doesn't disable IRQs
    anymore, we can get a recursion of perf_adjust_period(), once from
    overflow handling and once from the tick.

    If both call ->disable, we get a double hlist_del_rcu() and trigger
    a LIST_POISON2 dereference.

    Since we don't actually need to stop/start a swevent to re-program
    the hardware (there is no hardware to program), simply nop out these
    callbacks for the swevent pmu.

    Reported-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 May, 2010

4 commits

  • If a sample crosses a page boundary, the copy is made in more than
    one step. However, we forget to advance the source offset for the
    next copy, leading to unexpected double copies that completely mess
    up the traces (a simplified model of the fix follows this entry).

    This fixes various kinds of bad traces that have irrelevant
    data inside, as an example:

    geany-4979 [001] 5758.077775: sched_switch: prev_comm=! prev_pid=121
    prev_prio=0 prev_state=S|D|Z|X|x ==> next_comm= next_pid=7497072
    next_prio=0

    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
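    A simplified, self-contained model of the fix (the real perf output
    code is different; this only shows why the source offset must advance
    together with the destination offset):

        #include <stdio.h>
        #include <string.h>

        #define CHUNK 4  /* stand-in for the page-sized copy step */

        static void copy_chunked(char *dst, const char *src, size_t len)
        {
                size_t done = 0;

                while (done < len) {
                        size_t n = len - done;

                        if (n > CHUNK)
                                n = CHUNK;

                        /* advancing 'src + done' is the fix: without it every
                         * chunk re-copies the start of the sample */
                        memcpy(dst + done, src + done, n);
                        done += n;
                }
        }

        int main(void)
        {
                char out[16] = { 0 };

                copy_chunked(out, "abcdefghijk", 11);
                printf("%s\n", out); /* prints: abcdefghijk */
                return 0;
        }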
     
  • The transactional API patch between the generic and model-specific
    code introduced several important bugs with event scheduling, at
    least on X86. If you had pinned events, e.g., watchdog, and were
    over-committing the PMU, you would get bogus counts. The bug was
    showing up on Intel CPUs because events would move around more
    often than on AMD. But the problem also existed on AMD, though it
    was harder to expose.

    The issues were:

    - group_sched_in() was missing a cancel_txn() in the error path

    - cpuc->n_added was not properly maintained, leading to missing
    actions in hw_perf_enable(), i.e., n_running being 0. You cannot
    update n_added until you know the transaction has succeeded. In the
    case of a failed transaction, n_added was not adjusted back.

    - in case of failed transactions, event_sched_out() was called
    and eventually invoked x86_disable_event() to touch the HW reg.
    But with transactions, on X86, event_sched_in() does not touch
    HW registers, it simply collects events into a list. Thus, you
    could end up calling x86_disable_event() on a counter which
    did not correspond to the current event when idx != -1.

    The patch modifies the generic and X86 code to avoid all those problems.

    First, we keep track of the number of events added last. In case the
    transaction fails, we subtract them from n_added. This approach is
    necessary (as opposed to delaying updates to n_added) because not all
    event updates use the transaction API, e.g., single events.

    Second, we encapsulate the event_sched_in() and event_sched_out() in
    group_sched_in() inside the transaction. That makes the operations
    symmetrical and you can also detect that you are inside a transaction
    and skip the HW reg access by checking cpuc->group_flag.

    With this patch, you can now overcommit the PMU even with pinned
    system-wide events present and still get valid counts.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
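    A sketch of the first part of that accounting, with illustrative names
    (the real counters live in the x86 cpu_hw_events structure and the
    exact bookkeeping is more involved):

        struct cpu_events_sketch {
                int n_added; /* events added since the last hw_perf_enable()  */
                int n_txn;   /* events added by the currently open transaction */
        };

        static void txn_start(struct cpu_events_sketch *c)
        {
                c->n_txn = 0;
        }

        static void txn_add_event(struct cpu_events_sketch *c)
        {
                c->n_added++;
                c->n_txn++;
        }

        static void txn_cancel(struct cpu_events_sketch *c)
        {
                /* the transaction failed: take back what it contributed,
                 * leaving additions made outside any transaction intact */
                c->n_added -= c->n_txn;
                c->n_txn = 0;
        }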
     
  • Group siblings don't pin each other or the parent, so when we destroy
    events we must make sure to clean up all cross referencing pointers.

    In particular, for destruction of a group leader we must be able to
    find all its siblings and remove their reference to it.

    This means that detaching an event from its context must not detach it
    from the group, otherwise we can end up failing to clear all pointers.

    Solve this by clearly separating the attachment to a context and
    attachment to a group, and keep the group composed until we destroy
    the events.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In order to move toward separate buffer objects, rework the whole
    perf_mmap_data construct to be a more self-sufficient entity, one
    with its own lifetime rules.

    This greatly sanitizes the whole output redirection code, which
    was riddled with bugs and races.

    Signed-off-by: Peter Zijlstra
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 May, 2010

1 commit


21 May, 2010

8 commits

  • Since we know tracepoints come from kernel context,
    avoid conditionals that try to establish that very
    fact.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Sanity checks cost instructions.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Reduce code and data by using the knowledge that for
    !PERF_USE_VMALLOC data_order is always 0.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Reduce the clutter in perf_output_copy() by keeping
    an iterator in perf_output_handle.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • RO mmap()s don't update the tail pointer, so
    comparing against it for determining the written data
    size doesn't really do any good.

    Keep track of when we last did a wakeup, and compare
    against that.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we want to ensure buffers only have a single
    writer, we must avoid creating one with multiple writers.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Avoid the swevent hash-table by using per-tracepoint
    hlists.

    Also, avoid conditionals on the fast path by ordering
    with probe unregister so that we should never get on
    the callback path without the data being there.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • A writer that gets a reference to the buffer handle disables
    preemption. When we put that reference, we check whether we are
    the outermost writer and, if not, we simply return and defer
    the head update to the outermost writer. The problem here
    is that preemption is only re-enabled by the outermost writer,
    which produces a preemption count imbalance for every nested
    writer that exits.

    So always re-enable preemption when we put the buffer reference,
    whoever we are (see the sketch after this entry).

    Fixes lots of sleeping-while-atomic warnings, visible when
    recording lock events.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    Cc: Robert Richter

    Frederic Weisbecker
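    A kernel-style sketch of the fix (the structure and function names are
    illustrative; only preempt_disable()/preempt_enable() are the real
    API):

        struct handle_sketch {
                int nest; /* nesting depth of writers on this CPU */
        };

        static void writer_get(struct handle_sketch *h)
        {
                preempt_disable();
                h->nest++;
        }

        static void writer_put(struct handle_sketch *h)
        {
                if (--h->nest == 0) {
                        /* outermost writer: publish the new head here */
                }

                /* the fix: balance the preempt_disable() unconditionally,
                 * for nested writers as well as the outermost one */
                preempt_enable();
        }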
     

20 May, 2010

2 commits

  • …eric/random-tracing into perf/core

    Ingo Molnar
     
  • The software events hlist doesn't fully comply with the new
    RCU checks API.

    We need to consider three different sides that access the hlist:

    - the hlist allocation/release side. This happens when an
    event is created or released; accesses to the hlist are
    serialized under the cpuctx mutex.

    - the events insertion/removal in the hlist. This side is always
    serialized against the one above, and the hlist is always present
    during such operations. It happens when a software event is
    scheduled in or out. The serialization that ensures the software
    event is really attached to the context is done under ctx->lock.

    - events triggering. This is the read side, it can happen
    concurrently with any update side.

    This patch deals with them one by one and anticipates the
    separate RCU memory-space patches in preparation.

    It fixes various annoying RCU warnings.

    Reported-by: Paul E. McKenney
    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras

    Frederic Weisbecker
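    A sketch of the three access patterns, using a hypothetical structure
    in place of the real per-CPU software-event bookkeeping; the RCU
    primitives themselves (rcu_dereference_protected(), rcu_dereference(),
    lockdep_is_held()) are the real kernel API:

        struct swhash_sketch {
                struct mutex       mutex; /* serializes allocation/release */
                raw_spinlock_t     lock;  /* serializes insertion/removal  */
                struct hlist_head *hlist; /* RCU-protected pointer         */
        };

        /* 1) allocation/release side: no RCU read lock is needed, but the
         *    accessor documents which lock makes the dereference safe */
        static struct hlist_head *alloc_side(struct swhash_sketch *sw)
        {
                return rcu_dereference_protected(sw->hlist,
                                lockdep_is_held(&sw->mutex));
        }

        /* 2) insertion/removal happens under sw->lock and publishes with
         *    the _rcu list helpers (hlist_add_head_rcu(), hlist_del_rcu()) */

        /* 3) triggering (read side): runs inside rcu_read_lock() and can
         *    race with either update side */
        static struct hlist_head *read_side(struct swhash_sketch *sw)
        {
                return rcu_dereference(sw->hlist);
        }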
     

19 May, 2010

7 commits


11 May, 2010

2 commits

  • Corey reported that the value scale times of group siblings are not
    updated when the monitored task dies.

    The problem appears to be that we only update the group leader's
    time values; fix it by updating the whole group (see the sketch
    after this entry).

    Reported-by: Corey Ashford
    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: # .34.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
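    Roughly the shape of the fix, assuming the perf_event layout of that
    era (sibling_list/group_entry) and the internal update_event_times()
    helper; treat it as a sketch rather than the exact patch:

        static void update_group_times_sketch(struct perf_event *leader)
        {
                struct perf_event *sibling;

                /* update the leader's times, then walk every sibling
                 * instead of stopping at the leader */
                update_event_times(leader);
                list_for_each_entry(sibling, &leader->sibling_list, group_entry)
                        update_event_times(sibling);
        }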
     
  • Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't
    work as expected if the task the counters were attached to quit
    before the read() call.

    The cause is that we unconditionally destroy the grouping when
    we remove counters from their context. Fix this by splitting the
    group destruction off from the list removal, so that
    perf_event_remove_from_context() no longer destroys the group,
    and change perf_event_release() to do so instead.

    Reported-by: Corey Ashford
    Reported-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: # .34.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra