02 May, 2013

1 commit


23 Apr, 2013

1 commit

  • Provide a new helper that helps full dynticks CPUs avoid
    stopping their tick when there are events in the local
    rotation list.

    This way we make sure that perf_event_task_tick() is serviced
    on demand.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Stephane Eranian
    Cc: Jiri Olsa

    Frederic Weisbecker
     

08 Apr, 2013

1 commit


01 Apr, 2013

3 commits

  • This patch adds PERF_SAMPLE_DATA_SRC.

    PERF_SAMPLE_DATA_SRC collects the data source, i.e., where
    the data associated with the sampled instruction
    came from. The information is stored in a perf_mem_data_src
    structure. It contains opcode, memory level, tlb, snoop and
    lock information, subject to availability in hardware.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: ak@linux.intel.com
    Cc: acme@redhat.com
    Cc: jolsa@redhat.com
    Cc: namhyung.kim@lge.com
    Link: http://lkml.kernel.org/r/1359040242-8269-8-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     
  • For some events it's useful to weight samples with a
    hardware-provided number. This expresses how expensive the
    action the sample represents was. This allows the profiler to
    scale the samples to be more informative to the programmer.

    There is already the period which is used similarly, but it
    means something different, so I chose to not overload it.
    Instead a new sample type for WEIGHT is added.

    This can be used for multiple things. Initially it is used for TSX
    abort costs and profiling by memory latencies (so that
    expensive loads appear higher up in the histograms). The concept
    is quite generic and can be extended to many other kinds of
    events or architectures, as long as the hardware provides
    suitable auxiliary values. In principle it could also be used
    for software tracepoints.

    This adds the generic glue. A new optional sample format for a
    64-bit weight value.

    Signed-off-by: Andi Kleen
    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: jolsa@redhat.com
    Cc: namhyung.kim@lge.com
    Link: http://lkml.kernel.org/r/1359040242-8269-5-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     
  • This patch adds a flags field to each event constraint.
    It can be used to store event specific features which can
    then later be used by scheduling code or low-level x86 code.

    The flags are propagated into event->hw.flags during the
    get_event_constraint() call. They are cleared during the
    put_event_constraint() call.

    This mechanism is going to be used by the PEBS-LL patches.
    It avoids defining yet another table to hold event specific
    information.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: ak@linux.intel.com
    Cc: jolsa@redhat.com
    Cc: namhyung.kim@lge.com
    Link: http://lkml.kernel.org/r/1359040242-8269-4-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

27 Mar, 2013

1 commit

  • This patch extends Jiri's changes to make generic
    events mapping visible via sysfs. The patch extends
    the mechanism to non-generic events by allowing
    the mappings to be hardcoded in strings.

    This mechanism will be used by the PEBS-LL patch
    later on.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: ak@linux.intel.com
    Cc: acme@redhat.com
    Cc: jolsa@redhat.com
    Cc: namhyung.kim@lge.com
    Link: http://lkml.kernel.org/r/1359040242-8269-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar
    [ fixed up conflict with 2663960 "perf: Make EVENT_ATTR global" ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

18 Mar, 2013

1 commit

  • Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
    suspend/resume") introduces a link failure since
    perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

    arch/x86/power/built-in.o: In function `restore_processor_state':
    (.text+0x45c): undefined reference to `perf_restore_debug_store'

    Fix it by defining the dummy function appropriately.

    Signed-off-by: David Rientjes
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    David Rientjes
     

16 Mar, 2013

1 commit

  • This patch fixes a kernel crash when using precise sampling (PEBS)
    after a suspend/resume. It turns out the CPU notifier code is not invoked
    on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
    by the kernel and keeps its power-on/resume value of 0, causing any PEBS
    measurement to crash when running on CPU0.

    The workaround is to add a hook in the actual resume code to restore
    the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0,
    the DS_AREA will be restored twice, but this is harmless.

    Reported-by: Linus Torvalds
    Signed-off-by: Stephane Eranian
    Signed-off-by: Linus Torvalds

    Stephane Eranian
     

06 Mar, 2013

1 commit

  • Move struct perf_cgroup_info and perf_cgroup to
    kernel/perf/core.c, and then we can remove include of cgroup.h.

    Signed-off-by: Li Zefan
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/513568A0.6020804@huawei.com
    Signed-off-by: Ingo Molnar

    Li Zefan
     

09 Feb, 2013

1 commit

  • sys_perf_event_open()->perf_init_event(event) is called before
    find_get_context(event); this means that event->ctx == NULL when
    class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called, and thus it
    can't know if this event is per-task or system-wide.

    This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
    this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
    The patch also moves ->bp_target up so that it can overlap with the
    new member; this can help the compiler generate better code.

    trace_uprobe_register() will use it for prefiltering to avoid the
    unnecessary breakpoints in mm's we do not want to trace.

    ->tp_target doesn't have its own reference, but we can rely on the
    fact that either sys_perf_event_open() holds a reference, or it is
    equal to event->ctx->task. So this pointer is always valid until
    free_event().

    Also add the "struct list_head tp_list" into this union. It is not
    strictly necessary, but it can simplify the next changes and we can
    add it for free.

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     

01 Feb, 2013

1 commit

  • Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
    available to all architectures.

    Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass
    in the variable name as a parameter.

    Changelog[v2]:
    - [Jiri Olsa] No need to define PMU_EVENT_PTR()

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Anton Blanchard
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: linuxppc-dev@ozlabs.org
    Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Sukadev Bhattiprolu
     

24 Oct, 2012

2 commits

  • The perf_cpu_notifier() macro invokes smp_processor_id()
    multiple times. Optimize it by using a local variable.

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Paul E. McKenney
    Cc: peterz@infradead.org
    Cc: acme@ghostprotocols.net
    Link: http://lkml.kernel.org/r/20121016075817.3572.76733.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • The CPU_STARTING notifiers are supposed to be run with irqs
    disabled. But the perf_cpu_notifier() macro invokes them without
    doing that. Fix it.

    Signed-off-by: Srivatsa S. Bhat
    Reviewed-by: Paul E. McKenney
    Cc: peterz@infradead.org
    Cc: acme@ghostprotocols.net
    Link: http://lkml.kernel.org/r/20121016075809.3572.47848.stgit@srivatsabhat.in.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     

13 Oct, 2012

1 commit


05 Oct, 2012

1 commit

  • Stephane thought the perf_cpu_context::active_pmu name confusing and
    suggested using 'unique_pmu' instead.

    This pointer points to an arbitrary ('random') pmu sharing the
    cpuctx instance. By limiting a for_each_pmu loop to those cases
    where cpuctx->unique_pmu matches the pmu, we get a loop over
    unique cpuctx instances.

    Suggested-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

02 Oct, 2012

1 commit

  • Pull perf update from Ingo Molnar:
    "Lots of changes in this cycle as well, with hundreds of commits from
    over 30 contributors. Most of the activity was on the tooling side.

    Higher level changes:

    - New 'perf kvm' analysis tool, from Xiao Guangrong.

    - New 'perf trace' system-wide tracing tool

    - uprobes fixes + cleanups from Oleg Nesterov.

    - Lots of patches to make perf build on Android out of the box, from
    Irina Tirdea

    - Extend ftrace function tracing utility to be more dynamic for its
    users. It allows for data passing to the callback functions, as
    well as reading regs as if a breakpoint were to trigger at function
    entry.

    The main goal of this patch series was to allow kprobes to use
    ftrace as an optimized probe point when a probe is placed on an
    ftrace nop. With lots of help from Masami Hiramatsu, and going
    through lots of iterations, we finally came up with a good
    solution.

    - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng.

    - Various tracing updates from Steve Rostedt

    - Clean up and improve 'perf sched' performance by eliminating lots
    of needless calls to libtraceevent.

    - Event group parsing support, from Jiri Olsa

    - UI/gtk refactorings and improvements from Namhyung Kim

    - Add support for non-tracepoint events in perf script python, from
    Feng Tang

    - Add --symbols to 'script', similar to the one in 'report', from
    Feng Tang.

    Infrastructure enhancements and fixes:

    - Convert the trace builtins to use the growing evsel/evlist
    tracepoint infrastructure, removing several open-coded constructs
    like switch-like series of strcmp calls to dispatch events, etc.
    Basically what had already been showcased in 'perf sched'.

    - Add evsel constructor for tracepoints, that uses libtraceevent just
    to parse the /format events file, use it in a new 'perf test' to
    make sure the libtraceevent format parsing regressions can be more
    readily caught.

    - Some strange errors were happening in some builds, but not in
    linux-next, reported by several people; the problem was that some
    parser-related files, generated during the build, didn't have
    proper make deps, fix from Eric Sandeen.

    - Introduce struct and cache information about the environment where
    a perf.data file was captured, from Namhyung Kim.

    - Fix handling of unresolved samples when --symbols is used in
    'report', from Feng Tang.

    - Add union member access support to 'probe', from Hyeoncheol Lee.

    - Fixups to die() removal, from Namhyung Kim.

    - Render fixes for the TUI, from Namhyung Kim.

    - Don't enable annotation in non symbolic view, from Namhyung Kim.

    - Fix pipe mode in 'report', from Namhyung Kim.

    - Move related stats code from stat to util/, will be used by the
    'stat' kvm tool, from Xiao Guangrong.

    - Remove die()/exit() calls from several tools.

    - Resolve vdso callchains, from Jiri Olsa

    - Don't pass const char pointers to basename, so that we can
    unconditionally use libgen.h and thus avoid ifdef BIONIC lines,
    from David Ahern

    - Refactor hist formatting so that it can be reused with the GTK
    browser, From Namhyung Kim

    - Fix build for another rbtree.c change, from Adrian Hunter.

    - Make 'perf diff' command work with evsel hists, from Jiri Olsa.

    - Use the only field_sep var that is set up: symbol_conf.field_sep,
    fix from Jiri Olsa.

    - .gitignore compiled python binaries, from Namhyung Kim.

    - Get rid of die() in more libtraceevent places, from Namhyung Kim.

    - Rename libtraceevent 'private' struct member to 'priv' so that it
    works in C++, from Steven Rostedt

    - Remove lots of exit()/die() calls from tools so that the main perf
    exit routine can take place, from David Ahern

    - Fix x86 build on x86-64, from David Ahern.

    - {int,str,rb}list fixes from Suzuki K Poulose

    - perf.data header fixes from Namhyung Kim

    - Allow user to indicate objdump path, needed in cross environments,
    from Maciek Borzecki

    - Fix hardware cache event name generation, fix from Jiri Olsa

    - Add round trip test for sw, hw and cache event names, catching the
    problem Jiri fixed; after Jiri's patch, the test passes
    successfully.

    - Clean target should do clean for lib/traceevent too, fix from David
    Ahern

    - Check the right variable for allocation failure, fix from Namhyung
    Kim

    - Set up evsel->tp_format regardless of evsel->name being set
    already, fix from Namhyung Kim

    - Oprofile fixes from Robert Richter.

    - Remove perf_event_attr needless version inflation, from Jiri Olsa

    - Introduce libtraceevent strerror like error reporting facility,
    from Namhyung Kim

    - Add pmu mappings to perf.data header and use event names from cmd
    line, from Robert Richter

    - Fix include order for bison/flex-generated C files, from Ben
    Hutchings

    - Build fixes and documentation corrections from David Ahern

    - Assorted cleanups from Robert Richter

    - Let O= make invocations handle relative paths, from Steven Rostedt

    - perf script python fixes, from Feng Tang.

    - Initial bash completion support, from Frederic Weisbecker

    - Allow building without libelf, from Namhyung Kim.

    - Support DWARF CFI based unwind to have callchains when %bp based
    unwinding is not possible, from Jiri Olsa.

    - Symbol resolution fixes, while fixing support PPC64 files with an
    .opt ELF section was the end goal, several fixes for code that
    handles all architectures and cleanups are included, from Cody
    Schafer.

    - Assorted fixes for Documentation and build in 32 bit, from Robert
    Richter

    - Cache the libtraceevent event_format associated to each evsel
    early, so that we avoid relookups, i.e. calling pevent_find_event
    repeatedly when processing tracepoint events.

    [ This is to reduce the surface contact with libtraceevent and
    make clear what it is that the perf tools need from that lib: so
    far, parsing the common and per-event fields. ]

    - Don't stop the build if the audit libraries are not installed, fix
    from Namhyung Kim.

    - Fix bfd.h/libbfd detection with recent binutils, from Markus
    Trippelsdorf.

    - Improve warning message when libunwind devel packages not present,
    from Jiri Olsa"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
    perf trace: Add aliases for some syscalls
    perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables
    perf tools: Check libaudit availability for perf-trace builtin
    perf hists: Add missing period_* fields when collapsing a hist entry
    perf trace: New tool
    perf evsel: Export the event_format constructor
    perf evsel: Introduce rawptr() method
    perf tools: Use perf_evsel__newtp in the event parser
    perf evsel: The tracepoint constructor should store sys:name
    perf evlist: Introduce set_filter() method
    perf evlist: Rename set_filters method to apply_filters
    perf test: Add test to check we correctly parse and match syscall open parms
    perf evsel: Handle endianity in intval method
    perf evsel: Know if byte swap is needed
    perf tools: Allow handling a NULL cpu_map as meaning "all cpus"
    perf evsel: Improve tracepoint constructor setup
    tools lib traceevent: Fix error path on pevent_parse_event
    perf test: Fix build failure
    trace: Move trace event enable from fs_initcall to core_initcall
    tracing: Add an option for disabling markers
    ...

    Linus Torvalds
     

13 Sep, 2012

1 commit

  • The current implementation simply ignores attribute flags. Thus, there is
    no notification to userland of unsupported features. Check the syscall's
    attribute flags to let userland know if a feature is supported by the
    kernel. This is also needed to distinguish future kernels that
    might support a feature.

    Cc: v3.5..
    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120910093018.GO8285@erda.amd.com
    Signed-off-by: Ingo Molnar

    Robert Richter
     

04 Sep, 2012

2 commits

  • While debugging a warning message on PowerPC while using hardware
    breakpoints, it was discovered that when perf_event_disable is invoked
    through hw_breakpoint_handler function with interrupts disabled, a
    subsequent IPI in the code path would trigger a WARN_ON_ONCE message in
    the smp_call_function_single function.

    This patch calls __perf_event_disable() when interrupts are already
    disabled, instead of perf_event_disable().

    Reported-by: Edjunior Barbosa Machado
    Signed-off-by: K.Prasad
    [naveen.n.rao@linux.vnet.ibm.com: v3: Check to make sure we target current task]
    Signed-off-by: Naveen N. Rao
    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain
    [ Fixed build error on MIPS. ]
    Signed-off-by: Ingo Molnar

    K.Prasad
     
  • Don't mess with file refcounts (or keep a reference to file, for
    that matter) in perf_event. Use explicit refcount of its own
    instead. Deal with the race between the final reference to event
    going away and new children getting created for it by use of
    atomic_long_inc_not_zero() in inherit_event(); just have the
    latter free what it had allocated and return NULL, that works
    out just fine (children of siblings of something doomed are
    created as singletons, same as if the child of leader had been
    created and immediately killed).

    Signed-off-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
    Signed-off-by: Ingo Molnar

    Al Viro
     

23 Aug, 2012

1 commit

  • Stashing version 4 under version 3 and removing version 4, because both
    version changes were within a single patchset.

    Reported-by: Peter Zijlstra
    Signed-off-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/20120822083540.GB1003@krava.brq.redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

10 Aug, 2012

5 commits

  • Introducing the following bits to the perf_event_attr struct:

    - exclude_callchain_kernel to filter out kernel callchain
    from the sample dump

    - exclude_callchain_user to filter out user callchain
    from the sample dump

    We need to be able to disable standard user callchain dump when we use
    the dwarf cfi callchain mode, because frame pointer based user
    callchains are useless in this mode.

    Also implementing exclude_callchain_kernel to have a complete set
    of options.

    Signed-off-by: Jiri Olsa
    [ Added kernel callchains filtering ]
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.com
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • Introducing the PERF_SAMPLE_STACK_USER sample type bit to trigger the
    dump of the user level stack on sample. The size of the dump is
    specified by the sample_stack_user value.

    Being able to dump parts of the user stack, starting from the stack
    pointer, will be useful for post-mortem DWARF CFI based stack
    unwinding.

    Added the HAVE_PERF_USER_STACK_DUMP config option to determine whether
    the architecture provides user stack dump on perf event samples. This
    needs access to the user stack pointer, which is not unified across
    architectures. Enabling this for the x86 architecture.

    Signed-off-by: Jiri Olsa
    Original-patch-by: Frederic Weisbecker
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Introducing perf_output_skip function to be able to skip data within the
    perf ring buffer.

    When writing data into the perf ring buffer, we first reserve the
    needed space in the ring buffer and then copy the actual data.

    There's a possibility we won't be able to fill all the reserved space
    with data, so we need a way to skip the remaining bytes.

    This is going to be useful when storing the user stack dump, where we
    might end up with less data than we originally requested.

    Signed-off-by: Jiri Olsa
    Acked-by: Frederic Weisbecker
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Adding a generic way to use the __output_copy function with a specific
    copy function via the DEFINE_PERF_OUTPUT_COPY macro.

    Using this to add a new __output_copy_user function that provides
    output copy from user pointers. For x86 the copy_from_user_nmi
    function is used, and __copy_from_user_inatomic for the rest of the
    architectures.

    This new function will be used in user stack dump on sample, coming in
    next patches.

    Signed-off-by: Jiri Olsa
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Ingo Molnar
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • Introducing the PERF_SAMPLE_REGS_USER sample type bit to trigger the
    dump of user level registers on sample. The registers we want to dump
    are specified by the sample_regs_user bitmask.

    Only user level registers are dumped at the moment. Meaning the register
    values of the user space context as it was before the user entered the
    kernel for whatever reason (syscall, irq, exception, or a PMI happening
    in userspace).

    The layout of the sample_regs_user bitmap is described in
    asm/perf_regs.h for archs that support register dump.

    This is going to be useful to bring Dwarf CFI based stack unwinding on
    top of samples.

    Original-patch-by: Frederic Weisbecker
    [ Dump registers ABI specification. ]
    Signed-off-by: Jiri Olsa
    Suggested-by: Stephane Eranian
    Cc: "Frank Ch. Eigler"
    Cc: Arun Sharma
    Cc: Benjamin Redelings
    Cc: Corey Ashford
    Cc: Cyrill Gorcunov
    Cc: Frank Ch. Eigler
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Ulrich Drepper
    Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

31 Jul, 2012

1 commit

  • A few events are interesting not only for the current task.
    For example, sched_stat_* events are interesting for the task
    which wakes up. For this reason, it would be good if such
    events were delivered to a target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belongs to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, then it can be analyzed by
    perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

18 Jun, 2012

1 commit

  • Originally from Peter Zijlstra. The helper migrates perf events
    from one cpu to another.

    Signed-off-by: Zheng Yan
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1339741902-8449-5-git-send-email-zheng.z.yan@intel.com
    Signed-off-by: Ingo Molnar

    Yan, Zheng
     

06 Jun, 2012

2 commits

  • The rdpmc instruction is faster than the equivalent rdmsr call,
    so use it when possible in the kernel.

    The perfctr kernel patches did this, after extensive testing showed
    rdpmc to always be faster (One can look in etc/costs in the perfctr-2.6
    package to see a historical list of the overhead).

    I have done some tests on a 3.2 kernel, the kernel module I used
    was included in the first posting of this patch:

                    rdmsr          rdpmc
    Core2 T9900:    203.9 cycles   30.9 cycles
    AMD fam0fh:      56.2 cycles    9.8 cycles
    Atom 6/28/2:    129.7 cycles   50.6 cycles

    The speedup of using rdpmc is large.

    [ It's probably possible (and desirable) to do this without
    requiring a new field in the hw_perf_event structure, but
    the fixed events make this tricky. ]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1203011724030.26934@cl320.eecs.utk.edu
    Signed-off-by: Ingo Molnar

    Vince Weaver
     
  • Stack depth of 255 seems excessive, given that copy_from_user_nmi()
    could be slow.

    Signed-off-by: Arun Sharma
    Cc: Linus Torvalds
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1334961696-19580-3-git-send-email-asharma@fb.com
    Signed-off-by: Ingo Molnar

    Arun Sharma
     

31 May, 2012

1 commit

  • We faced a segmentation fault in perf top -G at a very high sampling
    rate due to a corrupted callchain. While the root cause was not
    revealed (I failed to figure it out), this patch tries to protect us
    from the segfault in such cases.

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Namhyung Kim
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Sunjin Yang
    Link: http://lkml.kernel.org/r/1338443007-24857-2-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

23 May, 2012

1 commit

  • This reverts commit cb04ff9ac424 ("sched, perf: Use a single
    callback into the scheduler").

    Before this change was introduced, the process switch worked
    like this (wrt. to perf event schedule):

    schedule (prev, next)
    - schedule out all perf events for prev
    - switch to next
    - schedule in all perf events for current (next)

    After the commit, the process switch looks like:

    schedule (prev, next)
    - schedule out all perf events for prev
    - schedule in all perf events for (next)
    - switch to next

    The problem is that after we schedule perf events in, the pmu
    is enabled and we can receive events even before we make the
    switch to next - so "current" is still the prev process (event
    SAMPLE data are filled based on the value of the "current"
    process).

    That's exactly what we see for the test__PERF_RECORD test. We receive
    SAMPLES with the PID of the process that our tracee is scheduled
    from.

    Discussed with Peter Zijlstra:

    > Bah!, yeah I guess reverting is the right thing for now. Sad
    > though.
    >
    > So by having the two hooks we have a black-spot between them
    > where we receive no events at all, this black-spot covers the
    > hand-over of current and we thus don't receive the 'wrong'
    > events.
    >
    > I rather liked we could do away with both that black-spot and
    > clean up the code a little, but apparently people rely on it.

    Signed-off-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc: acme@redhat.com
    Cc: paulus@samba.org
    Cc: cjashfor@linux.vnet.ibm.com
    Cc: fweisbec@gmail.com
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

09 May, 2012

2 commits

  • We can easily use a single callback for both sched-in and sched-out. This
    reduces the code footprint in the scheduler path as well as removes
    the PMU black spot otherwise present between the out and in callback.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-o56ajxp1edwqg6x9d31wb805@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We always need to pass the last sample period to
    perf_sample_data_init(), otherwise the event distribution will be
    wrong. Thus, modifying the function interface to take the required
    period as an argument. So basically a pattern like this:

    perf_sample_data_init(&data, ~0ULL);
    data.period = event->hw.last_period;

    will now be like that:

    perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

    Avoids uninitialized data.period and simplifies code.

    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
    Signed-off-by: Ingo Molnar

    Robert Richter
     

24 Mar, 2012

1 commit

  • Having the build time assertion in the header makes the perf
    build fail on x86 with:

    ../../include/linux/perf_event.h:411:32: error: variably modified \
    ‘__assert_mmap_data_head_offset’ at file scope [-Werror]

    I'm moving the build-time validation out of the header, because that
    is better than weakening the perf build warn/error checks.

    Signed-off-by: Jiri Olsa
    Cc: acme@redhat.com
    Cc: a.p.zijlstra@chello.nl
    Cc: paulus@samba.org
    Cc: cjashfor@linux.vnet.ibm.com
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/1332513680-7870-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

23 Mar, 2012

1 commit

  • Complete the syscall-less self-profiling feature and address
    all complaints, namely:

    - capabilities, so we can detect what is actually available at runtime

    Add a capabilities field to perf_event_mmap_page to indicate
    what is actually available for use.

    - on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending
    properly.

    - ABI documentation as to how all this stuff works.

    Also improve the documentation for the new features.

    Signed-off-by: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Mar, 2012

1 commit

  • Add a sysfs 'format' attribute group for the pmu device; it
    contains a syntax description of how to construct raw events.

    The event configuration is described by the following
    struct perf_event_attr attributes:

    config
    config1
    config2

    Each sysfs attribute within the 'format' attribute group describes
    the mapping of a name to a bitfield definition within one of the
    above attributes.

    eg:
    "/sys/.../format/event" contains "config:0-7"
    "/sys/.../format/umask" contains "config:8-15"
    "/sys/.../format/usr" contains "config:16"

    The attribute value syntax is:

    line: config ':' bits
    config: 'config' | 'config1' | 'config2'
    bits: bits ',' bit_term | bit_term
    bit_term: VALUE '-' VALUE | VALUE

    Also add format attribute definitions for the x86 cpu PMUs.

    Acked-by: Peter Zijlstra
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

09 Mar, 2012

1 commit

  • This patch adds reference sizes for revision 1
    and 2 of the perf_event ABI, i.e., the size of
    the perf_event_attr struct.

    With Rev1: config2 was added = +8 bytes
    With Rev2: branch_sample_type was added = +8 bytes

    Adds the definition for Rev1, Rev2.

    This is useful for tools trying to decode the revision
    numbers based on the size of the struct.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-16-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

05 Mar, 2012

2 commits

  • With branch stack sampling, it is possible to filter by priv levels.

    In system-wide mode, that means it is possible to capture only user
    level branches. The builtin SW LBR filter needs to disassemble code
    based on LBR captured addresses. For that, it needs to know the task
    the addresses are associated with. Because of context switches, the
    content of the branch stack buffer may contain addresses from
    different tasks.

    We need a callback on context switch to either flush the branch stack
    or save it. This patch adds a new callback in struct pmu which is called
    during context switches. The callback is called only when necessary,
    that is, when a system-wide context has at least one event which uses
    PERF_SAMPLE_BRANCH_STACK. The callback is never called for a
    per-thread context.

    In this version, the Intel x86 code simply flushes (resets) the LBR
    on context switches (fills it with zeroes). Those zeroed branches are
    then filtered out by the SW filter.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch adds the ability to sample taken branches to the
    perf_event interface.

    The ability to capture taken branches is very useful for all
    sorts of analysis. For instance, basic block profiling, call
    counts, statistical call graph.

    This new capability requires hardware assist and as such may
    not be available on all HW platforms. On Intel x86 it is
    implemented on top of the Last Branch Record (LBR) facility.

    To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
    bit must be set in attr->sample_type.

    Sampled taken branches may be filtered by type and/or priv
    levels.

    The patch adds a new field, called branch_sample_type, to the
    perf_event_attr structure. It contains a bitmask of filters
    to apply to the sampled taken branches.

    Filters may be implemented in HW. If the HW filter does not exist
    or is not good enough, some arch may also implement a SW filter.

    The following generic filters are currently defined:
    - PERF_SAMPLE_BRANCH_USER
    only branches whose targets are at the user level

    - PERF_SAMPLE_BRANCH_KERNEL
    only branches whose targets are at the kernel level

    - PERF_SAMPLE_BRANCH_HV
    only branches whose targets are at the hypervisor level

    - PERF_SAMPLE_BRANCH_ANY
    any type of branches (subject to priv level filters)

    - PERF_SAMPLE_BRANCH_ANY_CALL
    any call branches (may incl. syscalls on some archs)

    - PERF_SAMPLE_BRANCH_ANY_RETURN
    any return branches (may incl. syscall returns on some archs)

    - PERF_SAMPLE_BRANCH_IND_CALL
    indirect call branches
    Obviously, filters may be combined. The priv level bits are optional;
    if not provided, the priv level of the associated event is used. It
    is possible to collect branches at a priv level different from that
    of the associated event. Use of the kernel and hv priv levels is
    subject to permissions and availability (hv).

    The number of taken branch records present in each sample may vary
    based on the HW, the type of sampled branches, and the executed code.
    Therefore each sample records the number of taken branch entries it
    contains.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian