24 Dec, 2011

1 commit

  • The default input file for perf report is not handled the same way as
    perf record handles its output file. This leads to unexpected
    behavior of perf report and the other commands listed below. E.g.:

    # perf record -a -e cpu-cycles sleep 2 | perf report | cat
    failed to open perf.data: No such file or directory (try 'perf record' first)

    While perf record writes to a fifo, perf report tries to read
    perf.data. This patch changes this to accept fifos as input files.
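A minimal sketch of the defaulting logic described above, assuming the check is made on the descriptor perf report would read from (normally standard input) using fstat() and S_ISFIFO(). The helper name choose_input_name() is hypothetical and is not perf's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: an explicit, non-empty name always wins.
 * Otherwise, if fd (normally STDIN_FILENO) is a FIFO, read from it ("-");
 * if it is not, or fstat() fails, fall back to "perf.data".
 */
static const char *choose_input_name(const char *given, int fd)
{
	struct stat st;

	if (given != NULL && given[0] != '\0')
		return given;
	if (fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode))
		return "-";
	return "perf.data";
}
```

In the pipe example above, standard input of perf report is the read end of a pipe, so the sketch would select "-" instead of perf.data.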

    Applies to the following commands:

    perf annotate
    perf buildid-list
    perf evlist
    perf kmem
    perf lock
    perf report
    perf sched
    perf script
    perf timechart

    Also fixes char const* -> const char* type declaration for filename
    strings.

    v2:
    * Prevent potential null pointer access to input_name in
    builtin-report.c. Needed due to removal of patch "perf report: Setup
    browser if stdout is a pipe"

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1323248577-11268-5-git-send-email-robert.richter@amd.com
    Signed-off-by: Robert Richter
    Signed-off-by: Arnaldo Carvalho de Melo

    Robert Richter
     

22 Dec, 2011

1 commit

  • perf report does not take a command from the command line.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1323703017-6060-8-git-send-email-namhyung@gmail.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

20 Dec, 2011

1 commit

  • The '--call-graph' command line option can receive an optional
    print_limit argument, which was undocumented. Also, use strtoul() to
    parse the option, since its type is u32.
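A hedged sketch of strict u32 parsing with strtoul(). parse_print_limit() is a hypothetical helper; the real option parser may split the --call-graph string differently before reaching this field.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser for the print_limit field. The target type is u32,
 * so parse an unsigned integer and reject trailing junk, signs and
 * values that do not fit in 32 bits.
 */
static int parse_print_limit(const char *tok, uint32_t *out)
{
	char *end;
	unsigned long val;

	/* Require a leading digit: strtoul() would silently accept "-1". */
	if (tok == NULL || !isdigit((unsigned char)tok[0]))
		return -1;
	errno = 0;
	val = strtoul(tok, &end, 10);
	if (errno != 0 || *end != '\0' || val > UINT32_MAX)
		return -1;
	*out = (uint32_t)val;
	return 0;
}
```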

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1323703017-6060-2-git-send-email-namhyung@gmail.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

28 Nov, 2011

7 commits

  • Currently the meaning of -C varies by perf command: for perf-top,
    perf-stat, perf-record it means cpu list. For perf-report it means comm
    list. Then perf-annotate, perf-report and perf-script use -c for cpu
    list.

    Fix annotate, report and script to use -C for cpu list to be consistent
    with top, stat and record. This means report needs to use -c for comm
    list, which is a backward-incompatible change.

    v1 -> v2
    - update perf-script.txt too

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321209008-7004-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • To better reflect that it has become the base class for all tools: it
    must be embedded in each tool struct, and it is where common stuff will
    be put.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-qgpc4msetqlwr8y2k7537cxe@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Reducing the exposure of perf_session further, so that we can use the
    classes in cases where no perf.data file is created.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-stua66dcscsezzrcdugvbmvd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that we don't need to have that many globals.

    Next steps will remove the 'session' pointer, that in most cases is
    not needed.

    Then we can rename perf_event_ops to 'perf_tool' that better describes
    this class hierarchy.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-wp4djox7x6w1i2bab1pt4xxp@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Paving the way to remove these globals when we change the perf_event_ops
    to receive as a first parameter a pointer to a perf_event_ops that will
    then provide access to perf_report via container_of.
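The container_of pattern referred to above, as a simplified sketch. The structs tool_ops and report_state are hypothetical stand-ins for perf_event_ops and perf_report, not their real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Classic container_of: recover the enclosing struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified ops table passed to every callback. */
struct tool_ops {
	int (*sample)(struct tool_ops *ops, unsigned long ip);
};

/* Hypothetical tool state that used to live in globals. */
struct report_state {
	struct tool_ops ops;      /* embedded "base class" */
	unsigned long nr_samples; /* per-tool state instead of a global */
};

static int report_sample(struct tool_ops *ops, unsigned long ip)
{
	/* The callback only receives the ops pointer... */
	struct report_state *rep = container_of(ops, struct report_state, ops);

	(void)ip;
	rep->nr_samples++; /* ...but still reaches the tool's own state. */
	return 0;
}
```

Because the ops struct is embedded rather than pointed to, no extra "private data" pointer is needed; the offset of the member is enough to get back to the enclosing tool.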

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-2eh2vi2nb5z3tg1lvoxv09xu@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Since we have it in evsel->hists.callchain_cursor, remove it from
    perf_session.

    One more step in disentangling several places from requiring a
    perf_session pointer.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-rxr5dj3di7ckyfmnz0naku1z@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Since symbol__alloc_hists needs it, keep it in the symbol_conf struct
    to avoid passing it around to many functions.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-cwv8ysvpywzjq4v3xtbd4zwv@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

08 Oct, 2011

2 commits

  • And add the annotation output knobs to all the tools that have
    integrated annotation (top, report).

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-gnlob67mke6sji2kf4nstp7m@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The goal of this patch is to include more information about the host
    environment in the perf.data file so it is more self-descriptive. Over
    time, profiles are captured on various machines and it becomes hard to
    track what was recorded, on what machine and when.

    This patch provides a way to solve this by extending the perf.data file
    with basic information about the host machine. To add those extensions,
    we leverage the feature bits capabilities of the perf.data format. The
    change is backward compatible with existing perf.data files.

    We define the following useful new extensions:
    - HEADER_HOSTNAME: the hostname
    - HEADER_OSRELEASE: the kernel release number
    - HEADER_ARCH: the hw architecture
    - HEADER_CPUDESC: generic CPU description
    - HEADER_NRCPUS: number of online/avail cpus
    - HEADER_CMDLINE: perf command line
    - HEADER_VERSION: perf version
    - HEADER_TOPOLOGY: cpu topology
    - HEADER_EVENT_DESC: full event description (attrs)
    - HEADER_CPUID: easy-to-parse low level CPU identification

    The small granularity of the entries makes it easier to extend the
    format without breaking backward compatibility. Many entries are
    provided as ASCII strings.
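A sketch of why per-feature bits keep the format extensible, assuming a 64-bit feature bitmap and an invented FEAT_* numbering that does not match perf's real HEADER_* values. A reader decodes the bits it knows and ignores the rest; in a real file each set bit would also need a locatable section so that an unknown one can be skipped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical numbering; the real perf.data feature IDs differ. */
enum feat_id {
	FEAT_HOSTNAME,
	FEAT_OSRELEASE,
	FEAT_ARCH,
	FEAT_NR_KNOWN, /* number of features this reader understands */
	FEAT_MAX = 64  /* size of the on-disk bitmap, in bits */
};

struct feat_bitmap {
	uint64_t bits[FEAT_MAX / 64];
};

static void feat_set(struct feat_bitmap *fb, unsigned int id)
{
	fb->bits[id / 64] |= UINT64_C(1) << (id % 64);
}

static int feat_test(const struct feat_bitmap *fb, unsigned int id)
{
	return (int)((fb->bits[id / 64] >> (id % 64)) & 1);
}

/* Walk every set bit: decode known features, skip ones from newer writers. */
static unsigned int count_known_features(const struct feat_bitmap *fb)
{
	unsigned int id, known = 0;

	for (id = 0; id < FEAT_MAX; id++) {
		if (!feat_test(fb, id))
			continue;
		if (id < FEAT_NR_KNOWN)
			known++; /* a real reader would decode the section here */
		/* else: unknown feature, ignored, so old readers keep working */
	}
	return known;
}
```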

    Perf report/script have been modified to print the basic information as
    easy-to-parse ASCII strings. Extended information about CPU and NUMA
    topology may be requested with the -I option.

    Thanks to David Ahern for reviewing and testing the many versions of
    this patch.

    $ perf report --stdio
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # HEADER_CPU_TOPOLOGY info available, use -I to display
    # HEADER_NUMA_TOPOLOGY info available, use -I to display
    # ========
    #
    ...

    $ perf report --stdio -I
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # sibling cores : 0-3
    # sibling threads : 0
    # sibling threads : 1
    # sibling threads : 2
    # sibling threads : 3
    # node0 meminfo : total = 8320608 kB, free = 7571024 kB
    # node0 cpu list : 0-3
    # ========
    #
    ...

    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
    Signed-off-by: Stephane Eranian
    [ committer notes: Use --show-info in the tools as was in the docs, rename
    perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
    conflict with f69b64f7 "perf: Support setting the disassembler style" ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

07 Oct, 2011

3 commits

  • This allows passing a timer to be run periodically, which will update
    the hists tree that then gets refreshed on the screen, just like the
    Live mode (symbol entries, annotation) we already have in 'perf top
    --tui'.

    Will be used by the new hist_entry/hists based 'top' tool.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-2r44qd8oe4sagzcgoikl8qzc@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Just like --show-nr-samples, to help in diagnosing problems in the
    tools.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-1lr7ejdjfvy2uwy2wkmatcpq@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • So that we can reuse hists__fprintf in the new perf top tool.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-huazw48x05h8r9niz5cf63za@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Sep, 2011

2 commits

  • In the past we tried to avoid printing the name of the event when just
    one event was found in the perf.data file. After some refactorings, it
    ended up not printing the event name if just one hist_entry was found in
    one of the events.

    Fix it by always printing the name of the event, even if just one is
    found.

    Reported-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-kikr0c7ou55bd9caok8569rf@git.kernel.org
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Add -M option to report/annotate to pass directly to objdump. This
    allows using -M intel for Intel-style disassembler syntax, which is
    useful for people who are accustomed to the Intel syntax.
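A hypothetical sketch of forwarding a -M style to objdump when building its command line. build_objdump_cmd() and the exact argument order are assumptions; a real implementation would also need to quote the file name if the command goes through a shell.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical command builder: add "-M <style>" only when the user asked
 * for a disassembler style, otherwise keep objdump's default syntax.
 * Returns 0 on success, -1 if the buffer was too small.
 */
static int build_objdump_cmd(char *buf, size_t len, const char *style,
			     const char *file)
{
	int n;

	if (style != NULL && style[0] != '\0')
		n = snprintf(buf, len, "objdump -d -M %s %s", style, file);
	else
		n = snprintf(buf, len, "objdump -d %s", file);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```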

    Link: http://lkml.kernel.org/r/1316122302-24306-2-git-send-email-andi@firstfloor.org
    [committer note: Add missing Documentation bits, fixup conflicts with 3e6a2a7]
    Cc: Frederic Weisbecker
    Cc: Stephane Eranian
    Signed-off-by: Andi Kleen
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

03 Aug, 2011

1 commit

  • So that we get a proper warning in the TUI in cases like:

    $ perf report --stdio -g fractal,0.5,caller --sort pid
    Selected -g but no callchain data. Did you call 'perf record' without -g?
    $

    The --stdio case is ok because it uses fprintf; ui__warning is needed
    to figure out whether --stdio or --tui is being used.

    Cc: Arun Sharma
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Sam Liao
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ag9fz2wd17mbbfjsbznq1wms@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

05 Jul, 2011

1 commit

  • Add an option to perf report/annotate/script to specify which
    CPUs to operate on. This enables us to take a single system wide
    profile and analyse each CPU (or group of CPUs) in isolation.

    This was useful when profiling a multiprocess workload where the
    bottleneck was on one CPU but this was hidden in the overall
    profile. Per process and per thread breakdowns didn't help
    because multiple processes were running on each CPU and no
    single process consumed an entire CPU.

    The patch converts the list of CPUs returned by cpu_map__new
    into a bitmap for fast lookup. I wanted to use -C to be
    consistent with perf top/record/stat, but unfortunately perf
    report already uses -C.
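A sketch of the list-to-bitmap conversion, assuming at most 256 CPUs. Unlike the patch, which starts from the list returned by cpu_map__new, this hypothetical cpu_bitmap_parse() parses a "0-3,6" style string itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 256 /* assumption for the sketch */
#define BITS_PER_WORD (8 * sizeof(unsigned long))

struct cpu_bitmap {
	unsigned long words[(MAX_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

static void cpu_set(struct cpu_bitmap *bm, unsigned int cpu)
{
	bm->words[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* O(1) membership test, done once per sample. */
static int cpu_isset(const struct cpu_bitmap *bm, unsigned int cpu)
{
	if (cpu >= MAX_CPUS)
		return 0;
	return (int)((bm->words[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL);
}

/* Parse "0-3,6" style lists once, up front. Returns 0, or -1 on bad input. */
static int cpu_bitmap_parse(struct cpu_bitmap *bm, const char *list)
{
	const char *p = list;

	memset(bm, 0, sizeof(*bm));
	while (*p != '\0') {
		char *end;
		unsigned long first, last;

		if (!isdigit((unsigned char)*p))
			return -1;
		first = strtoul(p, &end, 10);
		last = first;
		if (*end == '-') {
			if (!isdigit((unsigned char)end[1]))
				return -1;
			last = strtoul(end + 1, &end, 10);
		}
		if (first > last || last >= MAX_CPUS)
			return -1;
		for (unsigned long cpu = first; cpu <= last; cpu++)
			cpu_set(bm, (unsigned int)cpu);
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return 0;
}
```

The list is parsed once; filtering each sample by its CPU then costs a single cpu_isset() bit test instead of a scan of the list.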

    v2: Incorporate suggestions from David Ahern:
    - Added -c to perf script
    - Check that SAMPLE_CPU is set when -c is used
    - Update documentation

    v3: Create perf_session__cpu_bitmap()

    Signed-off-by: Anton Blanchard
    Acked-by: David Ahern
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
    Signed-off-by: Ingo Molnar

    Anton Blanchard
     

30 Jun, 2011

2 commits

  • We don't need to display the parent field if the parent
    sorting machinery is only used for parent filtering
    (as in "-p foo").

    However if parent filtering is used in combination with
    explicit parent sorting (-s parent), we want to
    display it.

    Result with:

    perf report -p kernel_thread -s parent

    Before:

    # Overhead Parent symbol
    # ........ .............
    #
    0.07%
    |
    --- ioread8
    ata_sff_check_status
    ata_sff_tf_load
    ata_sff_qc_issue
    ata_bmdma_qc_issue
    ata_qc_issue
    ata_scsi_translate
    ata_scsi_queuecmd
    scsi_dispatch_cmd
    scsi_request_fn
    __blk_run_queue
    __make_request
    generic_make_request
    submit_bio
    submit_bh
    journal_submit_commit_record
    jbd2_journal_commit_transaction
    kjournald2
    kthread
    kernel_thread_helper

    After:

    # Overhead Parent symbol
    # ........ .............
    #
    0.07% kernel_thread_helper
    |
    --- ioread8
    ata_sff_check_status
    ata_sff_tf_load
    ata_sff_qc_issue
    ata_bmdma_qc_issue
    ata_qc_issue
    ata_scsi_translate
    ata_scsi_queuecmd
    scsi_dispatch_cmd
    scsi_request_fn
    __blk_run_queue
    __make_request
    generic_make_request
    submit_bio
    submit_bh
    journal_submit_commit_record
    jbd2_journal_commit_transaction
    kjournald2
    kthread
    kernel_thread_helper

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Ahern
    Cc: Sam Liao

    Frederic Weisbecker
     
  • Add a "caller/callee" option to support an inverted butterfly report.
    In the inverted report (with the caller option), the call graph starts
    from the callee's ancestor. Users can use this view to catch the
    system's performance bottlenecks from a sysprof-like view. Using this
    option with a specified sort order like pid gives a high-level view of
    call graph statistics.

    Also add "-G" alias for inverted call graph.
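A minimal sketch of the two walk orders, assuming the sampled callchain is stored leaf first (ips[0] is where the sample hit). chain_walk() and the ORDER_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum chain_order { ORDER_CALLEE, ORDER_CALLER };

/*
 * Hypothetical helper: "callee" order walks the chain as stored, from the
 * sampled function outwards; "caller" order walks it from the outermost
 * ancestor down, which is the inverted view.
 */
static void chain_walk(const unsigned long *ips, size_t nr,
		       enum chain_order order, unsigned long *out)
{
	for (size_t i = 0; i < nr; i++)
		out[i] = (order == ORDER_CALLEE) ? ips[i] : ips[nr - 1 - i];
}
```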

    Signed-off-by: Sam Liao
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Ahern
    Signed-off-by: Frederic Weisbecker

    Sam Liao
     

28 May, 2011

1 commit


26 May, 2011

1 commit

  • Perf uses /proc/modules to figure out where kernel modules are loaded.

    With the advent of kptr_restrict, non-root users get zeroes for all module
    start addresses.

    So check if kptr_restrict is nonzero and don't generate the synthetic
    PERF_RECORD_MMAP events for them.

    Warn the user about it in perf record and in perf report.

    In perf report the reference relocation symbol being zero means that
    kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't
    use it to fixup symbol addresses when using a valid kallsyms (in the buildid
    cache) or vmlinux (in the vmlinux path) build-id located automatically or
    specified by the user.

    Provide an explanation about it in 'perf report' if kernel samples were taken,
    checking if a suitable vmlinux or kallsyms was found/specified.

    Restricted /proc/kallsyms files no longer go to the buildid cache.
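A hedged sketch of the first check, assuming the knob is read as a plain integer from a file (normally /proc/sys/kernel/kptr_restrict). read_kptr_restrict() and should_synthesize_modules() are hypothetical names, and the sketch ignores that some kptr_restrict settings still expose addresses to privileged users.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the integer in the file, or -1 if it cannot be read or parsed. */
static int read_kptr_restrict(const char *path)
{
	FILE *fp = fopen(path, "r");
	int val;

	if (fp == NULL)
		return -1;
	if (fscanf(fp, "%d", &val) != 1)
		val = -1;
	fclose(fp);
	return val;
}

/*
 * Only synthesize module mmap events when module addresses are visible.
 * An unreadable knob (-1) is treated conservatively as restricted.
 */
static int should_synthesize_modules(int kptr_restrict)
{
	return kptr_restrict == 0;
}
```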

    Example:

    [acme@emilia ~]$ perf record -F 100000 sleep 1

    WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
    /proc/sys/kernel/kptr_restrict.

    Samples in kernel functions may not be resolved if a suitable vmlinux file is
    not found in the buildid cache or in the vmlinux path.

    Samples in kernel modules won't be resolved at all.

    If some relocation was applied (e.g. kexec) symbols may be misresolved even
    with a suitable vmlinux or kallsyms file.

    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
    [acme@emilia ~]$

    [acme@emilia ~]$ perf report --stdio
    Kernel address maps (/proc/{kallsyms,modules}) were restricted,
    check /proc/sys/kernel/kptr_restrict before running 'perf record'.

    If some relocation was applied (e.g. kexec) symbols may be misresolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. .....................
    #
    20.24% sleep [kernel.kallsyms] [k] page_fault
    20.04% sleep [kernel.kallsyms] [k] filemap_fault
    19.78% sleep [kernel.kallsyms] [k] __lru_cache_add
    19.69% sleep ld-2.12.so [.] memcpy
    14.71% sleep [kernel.kallsyms] [k] dput
    4.70% sleep [kernel.kallsyms] [k] flush_signal_handlers
    0.73% sleep [kernel.kallsyms] [k] perf_event_comm
    0.11% sleep [kernel.kallsyms] [k] native_write_msr_safe

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    This is because it found a suitable vmlinux (build-id checked) in
    /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
    file name).

    If we remove that file from the vmlinux path:

    [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
    /lib/modules/2.6.39-rc7+/build/vmlinux.OFF
    [acme@emilia ~]$ perf report --stdio
    [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
    not found, continuing without symbols

    Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
    /proc/sys/kernel/kptr_restrict before running 'perf record'.

    As no suitable kallsyms nor vmlinux was found, kernel samples can't be
    resolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ......
    #
    80.31% sleep [kernel.kallsyms] [k] 0xffffffff8103425a
    19.69% sleep ld-2.12.so [.] memcpy

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    Reported-by: Stephane Eranian
    Suggested-by: David Miller
    Cc: Dave Jones
    Cc: David Miller
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

24 Mar, 2011

1 commit

  • Resolve the sample->id to an evsel, since the most advanced tools,
    report and annotate, need it, and the others will too when they evolve
    to properly support multi-event perf.data files.

    Good also because it does an extra validation, checking that the ID is
    valid when present. When that is not the case, the overhead is just a
    branch + function call (perf_evlist__id2evsel).

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

10 Mar, 2011

1 commit

  • So that we can reuse things like the id to attr lookup routine
    (perf_evlist__id2evsel) that uses a hash table instead of the linear
    lookup done in the older perf_header_attr routines, etc.

    Also to make evsels/evlist a more pervasive API, simplifying use of the
    emerging perf lib.
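A simplified sketch of an id-to-evsel hash lookup as opposed to a linear scan. The structs and the low-bits "hash" are hypothetical stand-ins, not perf_evlist__id2evsel() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_HASH_BITS 8
#define ID_HASH_SIZE (1u << ID_HASH_BITS)

/* Hypothetical, simplified stand-ins; not perf's actual structs. */
struct evsel {
	const char *name;
};

struct sample_id {
	uint64_t id;
	struct evsel *evsel;
	struct sample_id *next; /* chaining within a bucket */
};

struct id_table {
	struct sample_id *heads[ID_HASH_SIZE];
};

/* A simple mask stands in for a real hash function. */
static unsigned int id_hash(uint64_t id)
{
	return (unsigned int)(id & (ID_HASH_SIZE - 1));
}

static void id_table_add(struct id_table *t, struct sample_id *sid)
{
	unsigned int h = id_hash(sid->id);

	sid->next = t->heads[h];
	t->heads[h] = sid;
}

/* Only the entries sharing one bucket are compared. */
static struct evsel *id_table_lookup(const struct id_table *t, uint64_t id)
{
	for (struct sample_id *s = t->heads[id_hash(id)]; s != NULL; s = s->next)
		if (s->id == id)
			return s->evsel;
	return NULL; /* unknown ID: lets the caller reject the sample */
}
```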

    Cc: Arun Sharma
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

07 Mar, 2011

2 commits

  • When multiple events were used in 'perf record', allow the user to
    choose which one is wanted before showing the per event histograms.

    Annotations will be performed on the chosen event.

    Allow going back and forth from event to event quickly using just the
    arrow keys and enter.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: William Cohen
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • By creating a perf_evlist out of the attributes in the perf.data file
    header, so that we can use evlists and evsels when reading recorded
    sessions in addition to when we record sessions.

    More work is needed to let tools allow the user to select which
    events are wanted when browsing sessions, be it just one or a subset of
    them, aggregated or shown at the same time but with different
    indications on the UI, to allow seeing workloads through different
    views at the same time.

    But the overall goal/trend is to more uniformly use evsels and evlists.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

18 Feb, 2011

1 commit

  • [root@emilia ~]# perf report --stdio
    The perf.data file has no samples!
    [root@emilia ~]#

    The TUI shows a popup warning message with the same message.

    Reported-by: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Steven Rostedt
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

11 Feb, 2011

1 commit


09 Feb, 2011

1 commit

  • Since we'll need it when implementing the live annotate TUI browser.

    This also simplifies things a bit by having the list head for the source
    code in the dynamically allocated part of struct annotation. That way we
    don't have to pass it around: it can be found from the struct symbol
    that is passed everywhere.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

05 Feb, 2011

2 commits

  • The perf annotate tool continues aggregating everything in just one
    histogram, but to support the top model, add support for one histogram
    per evsel in the evlist.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • They will be used by perf top, so that we have just one set of routines
    to do annotation.

    Rename "struct sym_priv" to "struct annotation", etc, to clarify this
    code a bit.

    Rename "struct sym_ext" to "struct source_line", to give it a meaningful
    name that clarifies that it is the result of an addr2line call, sorted
    by the percentage with which one particular source code line appeared
    in the annotation.

    And since we're moving things around also rename 'sym_hist->ip' to
    'sym_hist->addr' as we want to do data structure annotation at some
    point.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

01 Feb, 2011

1 commit


30 Jan, 2011

2 commits


23 Jan, 2011

3 commits

  • To make the callchain API naming more consistent.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • The callchains are fed with an array of a fixed size.
    As a result we iterate over each callchain three times:

    - 1st to resolve symbols
    - 2nd to filter out context boundaries
    - 3rd for the insertion into the tree

    This also involves some pairs of memory allocation/deallocation
    every time we insert a callchain, for the filtered out array of
    addresses and for the array of symbols that comes along.

    Instead, feed the callchains through a linked list with persistent
    allocations. It brings several pros like:

    - Merge the 1st and 2nd iterations in one. That was possible before
    but in a way that would involve allocating an array slightly larger
    than necessary because we don't know in advance the number of context
    boundaries to filter out.

    - Far fewer allocations/deallocations. The linked list keeps
    persistent empty entries for the next usages and is extendable at
    will.

    - Makes it easier for multiple sources of callchains to feed a
    stacktrace together. This is deemed to pave the way for cfi based
    callchains wherein traditional frame pointer based kernel
    stacktraces will precede cfi based user ones, producing an overall
    callchain whose size is hardly predictable. This requirement
    makes the static array obsolete and makes a linked list based
    iterator a much more flexible fit.

    Basic testing on a big perf file containing callchains (~ 176 MB)
    has shown a throughput gain of about 11% with perf report.
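A sketch of the persistent-allocation idea, assuming a singly linked list that is rewound rather than freed between samples. The cursor_* names are hypothetical and the real callchain cursor differs in detail.

```c
#include <assert.h>
#include <stdlib.h>

struct cursor_node {
	unsigned long ip;
	struct cursor_node *next;
};

struct cursor {
	struct cursor_node *first; /* every node ever allocated */
	struct cursor_node **last; /* where the next append writes */
	unsigned int nr;           /* entries valid for the current sample */
	unsigned int nr_alloc;     /* nodes allocated so far (for illustration) */
};

static void cursor_init(struct cursor *c)
{
	c->first = NULL;
	c->last = &c->first;
	c->nr = 0;
	c->nr_alloc = 0;
}

/* Keep the nodes, just rewind: the next sample reuses them. */
static void cursor_reset(struct cursor *c)
{
	c->last = &c->first;
	c->nr = 0;
}

static int cursor_append(struct cursor *c, unsigned long ip)
{
	struct cursor_node *node = *c->last;

	if (node == NULL) { /* grow only when the existing nodes run out */
		node = calloc(1, sizeof(*node));
		if (node == NULL)
			return -1;
		*c->last = node;
		c->nr_alloc++;
	}
	node->ip = ip;
	c->last = &node->next;
	c->nr++;
	return 0;
}

static void cursor_free(struct cursor *c)
{
	struct cursor_node *n = c->first;

	while (n != NULL) {
		struct cursor_node *next = n->next;

		free(n);
		n = next;
	}
	cursor_init(c);
}
```

Once the list has grown to the longest callchain seen, later samples append without calling calloc() at all, and there is no fixed upper bound on the chain length.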

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Arnaldo Carvalho de Melo

    Frederic Weisbecker
     
  • Using %L[uxd] has issues on some architectures, like ppc64. Fix it
    by making our 64 bit integers typedefs of stdint.h types and using
    PRI[ux]64 like, for instance, git does.

    Reported by Denis Kirjanov, who provided a patch for one case; I went
    and changed all cases.
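A small example of the portable formatting described above. %L with integer conversions is a glibc extension rather than standard C, while the <inttypes.h> macros expand to the right length modifier for each platform. format_sample() is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit integer typedef on top of stdint.h, as the commit describes. */
typedef uint64_t u64;

/* PRIu64/PRIx64 expand to "lu" or "llu" as the platform requires. */
static int format_sample(char *buf, size_t len, u64 period, u64 addr)
{
	return snprintf(buf, len, "period=%" PRIu64 " addr=%#" PRIx64,
			period, addr);
}
```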

    Reported-by: Denis Kirjanov
    Tested-by: Denis Kirjanov
    LKML-Reference:
    Cc: Denis Kirjanov
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Pingtian Han
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

22 Dec, 2010

2 commits

  • The symfs argument allows analysis of a perf.data file using a locally
    accessible filesystem tree with debug symbols - e.g., a tree created
    during image builds, an sshfs mount, loop mounted KVM disk images, USB
    keys, initrds, etc. Anything with an OS tree can be analyzed from
    anywhere without the need to populate a local data store with build-ids.
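A hedged sketch of the path prefixing, assuming symfs is simply prepended to each DSO path with trailing slashes trimmed. symfs_path() is hypothetical and does not claim to reproduce perf's exact handling of the symfs="/" variants mentioned in the committer notes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up dso_path under the symfs root instead of
 * "/". A NULL or empty symfs means the local filesystem.
 * Returns 0 on success, -1 if the buffer was too small.
 */
static int symfs_path(char *buf, size_t len, const char *symfs,
		      const char *dso_path)
{
	const char *root = symfs ? symfs : "";
	size_t root_len = strlen(root);
	int n;

	/* Drop trailing slashes so "/" and "/mnt/img/" don't produce "//". */
	while (root_len > 0 && root[root_len - 1] == '/')
		root_len--;
	n = snprintf(buf, len, "%.*s%s", (int)root_len, root, dso_path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```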

    Committer notes:

    o Fixed up symfs="/" variants handling.

    o Prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
    outside the symfs directory.

    LKML-Reference:
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • This patch changes perf report to ask for the ID info on all events by
    default if recording from multiple CPUs.

    Perf report, annotate and diff will now process the events in order if
    the kernel is able to provide timestamps on all events. This ensures
    that events such as COMM and MMAP which are necessary to correctly
    interpret samples are processed prior to those samples so that they are
    attributed correctly.
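A simplified sketch of timestamp ordering, assuming all events are buffered and sorted at once with qsort(); the real tools may reorder incrementally instead, and the event struct here is hypothetical. qsort() is not stable, so events with equal timestamps may be processed in either order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified event record. */
enum ev_type { EV_COMM, EV_MMAP, EV_SAMPLE };

struct event {
	uint64_t time;
	enum ev_type type;
};

static int cmp_time(const void *a, const void *b)
{
	const struct event *ea = a, *eb = b;

	/* Compare instead of subtracting: a u64 difference won't fit in int. */
	return (ea->time > eb->time) - (ea->time < eb->time);
}

/*
 * Events from per-CPU buffers can arrive out of order. Sorting by
 * timestamp makes a COMM or MMAP that happened before a sample get
 * processed first, so the sample resolves to the right comm and DSO.
 */
static void order_events(struct event *evs, size_t nr)
{
	qsort(evs, nr, sizeof(*evs), cmp_time);
}
```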

    Before:
    # perf record ./cachetest
    # perf report

    # Events: 6K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ...............................
    #
    74.11% :3259 [unknown] [k] 0x4a6c
    1.50% cachetest ld-2.11.2.so [.] 0x1777c
    1.46% :3259 [kernel.kallsyms] [k] .perf_event_mmap_ctx
    1.25% :3259 [kernel.kallsyms] [k] restore
    0.74% :3259 [kernel.kallsyms] [k] ._raw_spin_lock
    0.71% :3259 [kernel.kallsyms] [k] .filemap_fault
    0.66% :3259 [kernel.kallsyms] [k] .memset
    0.54% cachetest [kernel.kallsyms] [k] .sha_transform
    0.54% :3259 [kernel.kallsyms] [k] .copy_4K_page
    0.54% :3259 [kernel.kallsyms] [k] .find_get_page
    0.52% :3259 [kernel.kallsyms] [k] .trace_hardirqs_off
    0.50% :3259 [kernel.kallsyms] [k] .__do_fault

    After:
    # perf report

    # Events: 6K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ...............................
    #
    44.28% cachetest cachetest [.] sumArrayNaive
    22.53% cachetest cachetest [.] sumArrayOptimal
    6.59% cachetest ld-2.11.2.so [.] 0x1777c
    2.13% cachetest [unknown] [k] 0x340
    1.46% cachetest [kernel.kallsyms] [k] .perf_event_mmap_ctx
    1.25% cachetest [kernel.kallsyms] [k] restore
    0.74% cachetest [kernel.kallsyms] [k] ._raw_spin_lock
    0.71% cachetest [kernel.kallsyms] [k] .filemap_fault
    0.66% cachetest [kernel.kallsyms] [k] .memset
    0.54% cachetest [kernel.kallsyms] [k] .copy_4K_page
    0.54% cachetest [kernel.kallsyms] [k] .find_get_page
    0.54% cachetest [kernel.kallsyms] [k] .sha_transform
    0.52% cachetest [kernel.kallsyms] [k] .trace_hardirqs_off
    0.50% cachetest [kernel.kallsyms] [k] .__do_fault

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ian Munsie
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian Munsie