20 Jun, 2009

2 commits

  • Building perfcounter tools raises the following warnings:

    builtin-record.c: In function ‘atexit_header’:
    builtin-record.c:464: error: ignoring return value of ‘pwrite’, declared with attribute warn_unused_result
    builtin-record.c: In function ‘__cmd_record’:
    builtin-record.c:503: error: ignoring return value of ‘read’, declared with attribute warn_unused_result

    builtin-report.c: In function ‘__cmd_report’:
    builtin-report.c:1403: error: ignoring return value of ‘read’, declared with attribute warn_unused_result

    This patch handles these IO return values.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
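
    As an illustration of handling these return values, here is a minimal
    sketch of a read wrapper that treats short reads, EINTR and premature
    end-of-file explicitly. The helper name read_full() is hypothetical and
    is not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or premature end of file.
 * (Hypothetical helper, for illustration only.)
 */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t ret = read(fd, p, len);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (ret == 0)
			return -1;		/* EOF before len bytes arrived */
		p += ret;
		len -= (size_t)ret;
	}
	return 0;
}
```

    A caller would check the result and report the failure instead of
    silently continuing with a partially filled buffer.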
     
  • On 64-bit powerpc, __u64 is defined to be unsigned long rather than
    unsigned long long. This causes compiler warnings every time we
    print a __u64 value with %Lx.

    Rather than changing __u64, we define our own u64 to be unsigned long
    long on all architectures, and similarly s64 as signed long long.
    For consistency we also define u32, s32, u16, s16, u8 and s8. These
    definitions are put in a new header, types.h, because these definitions
    are needed in util/string.h and util/symbol.h.

    The main change here is the mechanical change of __[us]{64,32,16,8}
    to remove the "__". The other changes are:

    * Create types.h
    * Include types.h in perf.h, util/string.h and util/symbol.h
    * Add types.h to the LIB_H definition in Makefile
    * Added (u64) casts in process_overflow_event() and print_sym_table()
    to kill two remaining warnings.

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
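
    The idea can be sketched as follows, assuming a types.h-style typedef
    block. The helper format_u64_hex() is hypothetical, and the ISO C
    spelling %llx is used here in place of %Lx.

```c
#include <stddef.h>
#include <stdio.h>

/* Same underlying type on every architecture, so one format string fits. */
typedef unsigned long long u64;
typedef signed long long   s64;

/*
 * Format v as 16 zero-padded hex digits.
 * buf must hold at least 17 bytes. (Hypothetical helper.)
 */
static void format_u64_hex(char *buf, size_t len, u64 v)
{
	snprintf(buf, len, "%016llx", v);
}
```

    Because u64 is always unsigned long long, the format string never has
    to change between 32-bit and 64-bit builds.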
     

19 Jun, 2009

2 commits


18 Jun, 2009

8 commits

  • Make it easier to use parent filtering - default to a filtered
    output. Also add the parent column so that we get collapsing but
    don't display it by default.

    Add --no-exclude-other to override this.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Make use of the new ->data_tail mechanism to tell kernel-space
    about user-space draining the data stream. Emit lost events
    (and display them) if they happen.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
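
    A rough sketch of the consumer side of such a head/tail protocol is
    shown below. The struct ring_ctl layout and the ring_drain() helper are
    hypothetical stand-ins rather than the real mmap control page, the
    buffer size is assumed to be a power of two, and lost-event reporting
    is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the control page shared with the kernel. */
struct ring_ctl {
	volatile uint64_t data_head;	/* advanced by the producer */
	volatile uint64_t data_tail;	/* advanced by the consumer */
};

/*
 * Copy unread bytes out of the ring, then publish the new tail so the
 * producer knows that space may be reused.
 * size must be a power of two. Returns the number of bytes copied.
 */
static size_t ring_drain(struct ring_ctl *ctl, const unsigned char *data,
			 size_t size, unsigned char *out, size_t out_len)
{
	uint64_t head = ctl->data_head;
	uint64_t tail = ctl->data_tail;
	size_t n = 0;

	__sync_synchronize();	/* read head before reading the data */
	while (tail != head && n < out_len) {
		out[n++] = data[tail & (size - 1)];
		tail++;
	}
	__sync_synchronize();	/* finish reading before publishing tail */
	ctl->data_tail = tail;
	return n;
}
```

    Without the tail update the producer cannot tell whether unread data
    is about to be overwritten, which is what makes lost events detectable.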
     
  • On 64-bit powerpc, perf needs to be built as a 64-bit executable.
    This arranges to add the -m64 flag to CFLAGS if we are running on
    a 64-bit machine, indicated by the result of uname -m ending in "64".
    This means that we'll use -m64 on x86_64 machines as well.

    Signed-off-by: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Introduce isprint() to print out raw event dumps to ASCII, etc.

    (This is an extension to upstream Git's ctype.c.)

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    [ removed openssl.h inclusion from util.h - it leaked ctype.h ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add boundary checks for call-chain events. In case of corrupted
    entries we could crash otherwise.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
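
    A minimal sketch of such a boundary check, assuming a call-chain record
    that starts with an entry count followed by the entries. The struct
    chain_hdr layout and chain_is_valid() are hypothetical and only
    illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call-chain layout: an entry count followed by the entries. */
struct chain_hdr {
	uint64_t nr;
	uint64_t ips[];
};

/*
 * Return 1 if a chain claiming hdr->nr entries fits in the `avail` bytes
 * that remain in the event (counted from the start of hdr), 0 otherwise.
 */
static int chain_is_valid(const struct chain_hdr *hdr, size_t avail)
{
	if (avail < sizeof(*hdr))
		return 0;
	/* Divide rather than multiply so a huge nr cannot overflow. */
	return hdr->nr <= (avail - sizeof(*hdr)) / sizeof(uint64_t);
}
```

    A reader would skip (or report) any event for which the check fails
    instead of walking past the end of the record.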
     
  • Instead of the ambiguous 'call' naming use the much more
    specific 'parent' naming:

    - rename --call to --parent

    - rename --sort call to --sort parent

    - rename [unmatched] to [other] - to signal that this is not
    an error but the inverse set

    Also add pagefaults to the default parent-symbol pattern,
    as it's a 'syscall overhead category' in a sense.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The Git utils came with a ctype replacement that doesn't provide
    isprint(). Add a replacement.

    Solves a build bug on certain distros.

    Signed-off-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Implement sorting by callchain symbols, --sort .

    It will create a new column which will show a match to
    --call $regex or "[unmatched]".

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Jun, 2009

4 commits

  • Yong Wang reported the following compiler warning:

    builtin-report.c: In function 'process_overflow_event':
    builtin-report.c:984: error: cast to pointer from integer of different size

    Which happens because we try to print ->ips[] out with a limited
    format, losing the high 32 bits. Print it out using %016Lx instead.

    Reported-by: Yong Wang
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Take advantage of call-graph perfcounter sampling/recording to
    display a non-trivial histogram: the true, collapsed/summarized
    cost measurement, on a per system call total overhead basis:

    aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
    aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
    #
    # (3536 samples)
    #
    # Overhead Symbol
    # ........ ......
    #
    40.75% [k] sys_write
    40.21% [k] sys_read
    4.44% [k] do_nmi
    ...

    This is done by accounting each (reliable) call-chain that chains back
    to a given system call to that system call function.

    [ So in the above example we can see that hackbench spends about 40% of
    its total time somewhere in sys_write() and 40% somewhere in
    sys_read(), the rest of the time is spent in user-space. The time
    is not spent in sys_write() _itself_ but in one of its many child
    functions. ]

    Or, a recording of a kernel build (with the source files already in the page-cache):

    $ perf record -g -m 512 -f -- make -j32 kernel
    $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi

    4.14% [k] do_page_fault
    1.20% [k] sys_write
    1.10% [k] sys_open
    0.63% [k] sys_exit_group
    0.48% [k] smp_apic_timer_interrupt
    0.37% [k] sys_read
    0.37% [k] sys_execve
    0.20% [k] sys_mmap
    0.18% [k] sys_close
    0.14% [k] sys_munmap
    0.13% [k] sys_poll
    0.09% [k] sys_newstat
    0.07% [k] sys_clone
    0.06% [k] sys_newfstat
    0.05% [k] sys_access
    0.05% [k] schedule

    Shows the true total cost of each syscall variant that gets used
    during a kernel build. This profile reveals that pagefaults are
    the costliest, followed by read()/write().

    An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
    per 100 seconds of kernel build-time. (this was done with HZ=1000)

    The summary is done in 'perf report', i.e. in the post-processing
    stage - so once we have a good call-graph recording, this type of
    non-trivial high-level analysis becomes possible.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Frederic Weisbecker
    Cc: Pekka Enberg
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
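
    The accounting step can be sketched as a walk over the resolved symbols
    of one call chain, charging the sample to the first frame that matches
    a parent pattern. The chain_parent() helper and the example pattern are
    hypothetical. The fallback label "[other]" follows the naming used
    elsewhere in this log.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Walk a call chain from the innermost frame outwards and return the first
 * symbol name that matches the parent pattern, or "[other]" if none does.
 * (Hypothetical helper, for illustration only.)
 */
static const char *chain_parent(const char *const *syms, size_t nr,
				const regex_t *parent_re)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (syms[i] && regexec(parent_re, syms[i], 0, NULL, 0) == 0)
			return syms[i];
	}
	return "[other]";
}
```

    Summing samples per returned name then yields a per-syscall total that
    includes time spent in child functions.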
     
  • Recording with -a (or with -p) can race with tasks going away:

    couldn't open /proc/8440/maps

    Causing an early exit() and no recording done.

    Do not abort the recording session - instead just skip that task.

    Also, only print the warnings under -v.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
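
    A minimal sketch of the "skip instead of abort" behaviour, assuming a
    hypothetical helper open_task_maps() and a verbose flag passed in by
    the caller:

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Try to open /proc/<pid>/maps. The task may already have exited, so a
 * failure is not fatal: warn only when verbose, and let the caller skip it.
 * (Hypothetical helper, for illustration only.)
 */
static FILE *open_task_maps(pid_t pid, int verbose)
{
	char path[64];
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
	fp = fopen(path, "r");
	if (!fp && verbose)
		fprintf(stderr, "couldn't open %s\n", path);
	return fp;	/* NULL means: skip this task, keep recording */
}
```

    The caller simply continues with the next task when NULL is returned.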
     
  • Add the first steps of call-graph profiling:

    - add the -c (--call-graph) option to perf record
    - parse the call-graph record and print it out under -D (--dump-trace)

    The call-graph data is not put into the histogram yet, but it
    can be seen that it's being processed correctly:

    0x3ce0 [0x38]: event: 35
    .
    . ... raw event: size 56 bytes
    . 0000: 23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff #.....8........
    . 0010: 60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00 `...`..........
    . 0020: d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00 .........a.A6..
    . 0030: 04 92 e6 41 36 00 00 00 .a.A6..
    .
    0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
    ... chain: u:2, k:1, nr:3
    ..... 0: 0xffffffff810edfd4
    ..... 1: 0x3641ed61a0
    ..... 2: 0x3641e69204
    ... thread: perf:2912
    ...... dso: [kernel]

    This shows a 3-entry call-graph with one kernel-space and two user-space
    entries.

    Cc: Frederic Weisbecker
    Cc: Pekka Enberg
    Cc: Arjan van de Ven
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Jun, 2009

1 commit

  • Print out events in hex dump format, when -D is specified:

    0x4868 [0x48]: event: 1
    .
    . ... raw event: size 72 bytes
    . 0000: 01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00 ......H..r...r.
    . 0010: 00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00 ..@.>....0.....
    . 0020: 00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62 ......../usr/li
    . 0030: 36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e 64/libelf-0.141
    . 0040: 73 6f 00 00 00 00 00 00 f-0.141
    .
    0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so

    This helps the debugging of mis-parsing of data files, and helps
    the addition of new sample/trace formats.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
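
    One line of such a dump could be produced roughly as follows. This is a
    sketch with a hypothetical helper hex_dump_line(). The exact column
    layout of the real output is not reproduced, and printable characters
    are detected with a plain ASCII range check.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format up to 16 bytes as "OFFS: xx xx ...  ascii", replacing
 * non-printable bytes with '.' in the ASCII column.
 * out_len must be at least 72. Returns the number of characters written
 * (excluding the terminating NUL), or -1 if the buffer is too small.
 */
static int hex_dump_line(char *out, size_t out_len, unsigned int offset,
			 const unsigned char *p, size_t n)
{
	size_t i;
	int len;

	if (out_len < 72)
		return -1;
	if (n > 16)
		n = 16;
	len = snprintf(out, out_len, "%04x:", offset);
	for (i = 0; i < n; i++)
		len += snprintf(out + len, out_len - len, " %02x", p[i]);
	len += snprintf(out + len, out_len - len, "  ");
	for (i = 0; i < n; i++) {
		int printable = p[i] >= 0x20 && p[i] <= 0x7e;

		len += snprintf(out + len, out_len - len, "%c",
				printable ? p[i] : '.');
	}
	return len;
}
```

    Calling it once per 16-byte slice of the raw event gives the hex and
    ASCII columns side by side.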
     

13 Jun, 2009

7 commits

  • - fix addr2line on userspace binary: don't only check kernel image.
    - fix string allocation size for path: missing ending null char room
    - fix overflow in symbol extra info

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • If -vv (very verbose) is specified, print out raw data
    in the following format:

    $ perf stat -vv -r 3 ./loop_1b_instructions

    [ perf stat: executing run #1 ... ]
    [ perf stat: executing run #2 ... ]
    [ perf stat: executing run #3 ... ]

    debug: runtime[0]: 235871872
    debug: walltime[0]: 236646752
    debug: runtime_cycles[0]: 755150182
    debug: counter/0[0]: 235871872
    debug: counter/1[0]: 235871872
    debug: counter/2[0]: 235871872
    debug: scaled[0]: 0
    debug: counter/0[1]: 2
    debug: counter/1[1]: 235870662
    debug: counter/2[1]: 235870662
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235870437
    debug: counter/2[2]: 235870437
    debug: scaled[2]: 0
    debug: counter/0[3]: 140
    debug: counter/1[3]: 235870298
    debug: counter/2[3]: 235870298
    debug: scaled[3]: 0
    debug: counter/0[4]: 755150182
    debug: counter/1[4]: 235870145
    debug: counter/2[4]: 235870145
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001411258
    debug: counter/1[5]: 235868838
    debug: counter/2[5]: 235868838
    debug: scaled[5]: 0
    debug: counter/0[6]: 27897
    debug: counter/1[6]: 235868560
    debug: counter/2[6]: 235868560
    debug: scaled[6]: 0
    debug: counter/0[7]: 2910
    debug: counter/1[7]: 235868151
    debug: counter/2[7]: 235868151
    debug: scaled[7]: 0
    debug: runtime[0]: 235980257
    debug: walltime[0]: 236770942
    debug: runtime_cycles[0]: 755114546
    debug: counter/0[0]: 235980257
    debug: counter/1[0]: 235980257
    debug: counter/2[0]: 235980257
    debug: scaled[0]: 0
    debug: counter/0[1]: 3
    debug: counter/1[1]: 235980049
    debug: counter/2[1]: 235980049
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235979907
    debug: counter/2[2]: 235979907
    debug: scaled[2]: 0
    debug: counter/0[3]: 135
    debug: counter/1[3]: 235979780
    debug: counter/2[3]: 235979780
    debug: scaled[3]: 0
    debug: counter/0[4]: 755114546
    debug: counter/1[4]: 235979652
    debug: counter/2[4]: 235979652
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001439771
    debug: counter/1[5]: 235979304
    debug: counter/2[5]: 235979304
    debug: scaled[5]: 0
    debug: counter/0[6]: 23723
    debug: counter/1[6]: 235979050
    debug: counter/2[6]: 235979050
    debug: scaled[6]: 0
    debug: counter/0[7]: 2213
    debug: counter/1[7]: 235978820
    debug: counter/2[7]: 235978820
    debug: scaled[7]: 0
    debug: runtime[0]: 235888002
    debug: walltime[0]: 236700533
    debug: runtime_cycles[0]: 754881504
    debug: counter/0[0]: 235888002
    debug: counter/1[0]: 235888002
    debug: counter/2[0]: 235888002
    debug: scaled[0]: 0
    debug: counter/0[1]: 2
    debug: counter/1[1]: 235887793
    debug: counter/2[1]: 235887793
    debug: scaled[1]: 0
    debug: counter/0[2]: 1
    debug: counter/1[2]: 235887645
    debug: counter/2[2]: 235887645
    debug: scaled[2]: 0
    debug: counter/0[3]: 135
    debug: counter/1[3]: 235887499
    debug: counter/2[3]: 235887499
    debug: scaled[3]: 0
    debug: counter/0[4]: 754881504
    debug: counter/1[4]: 235887368
    debug: counter/2[4]: 235887368
    debug: scaled[4]: 0
    debug: counter/0[5]: 1001401731
    debug: counter/1[5]: 235887024
    debug: counter/2[5]: 235887024
    debug: scaled[5]: 0
    debug: counter/0[6]: 24212
    debug: counter/1[6]: 235886786
    debug: counter/2[6]: 235886786
    debug: scaled[6]: 0
    debug: counter/0[7]: 1824
    debug: counter/1[7]: 235886560
    debug: counter/2[7]: 235886560
    debug: scaled[7]: 0

    Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):

    235.913377 task-clock-msecs # 0.997 CPUs ( +- 0.011% )
    2 context-switches # 0.000 M/sec ( +- 0.000% )
    1 CPU-migrations # 0.000 M/sec ( +- 0.000% )
    136 page-faults # 0.001 M/sec ( +- 0.730% )
    755048744 cycles # 3200.534 M/sec ( +- 0.009% )
    1001417586 instructions # 1.326 IPC ( +- 0.001% )
    25277 cache-references # 0.107 M/sec ( +- 3.988% )
    2315 cache-misses # 0.010 M/sec ( +- 9.845% )

    0.236706075 seconds time elapsed.

    This allows the summary stats to be validated.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Add the --repeat feature to perf stat, which repeats a given
    command up to 100 times, collects the stats and calculates an
    average and a stddev.

    For example, the following oneliner 'perf stat' command runs hackbench
    5 times and prints a tabulated result of all metrics, with averages
    and noise levels (in percentage) printed:

    aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
    Time: 0.117
    Time: 0.108
    Time: 0.089
    Time: 0.088
    Time: 0.100

    Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

    1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% )
    47706 context-switches # 0.038 M/sec ( +- 19.706% )
    387 CPU-migrations # 0.000 M/sec ( +- 3.608% )
    17793 page-faults # 0.014 M/sec ( +- 0.354% )
    3770941606 cycles # 3031.329 M/sec ( +- 4.621% )
    1566372416 instructions # 0.415 IPC ( +- 2.703% )
    16783421 cache-references # 13.492 M/sec ( +- 5.202% )
    7128590 cache-misses # 5.730 M/sec ( +- 7.420% )

    0.118924455 seconds time elapsed.

    The goal of this feature is to make these statistics reliable
    and to show how many times a command has to be repeated
    for the noise to go down to an acceptable level.

    (The -v option can be used to see a line printed out as each run progresses.)

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
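
    The average and noise computation can be sketched as below. This is an
    illustration under stated assumptions, not the tool's code: the names
    are hypothetical, the population form of the standard deviation is
    assumed (the commit text does not say which form is used), and a small
    Newton iteration stands in for sqrt() so the sketch needs no libm.

```c
#include <stddef.h>

/* Newton's iteration for the square root, to avoid depending on libm. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	if (x <= 0.0)
		return 0.0;
	for (i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

struct run_stats {
	double avg;
	double stddev;
	double noise_pct;	/* stddev as a percentage of avg */
};

/* Summarize one metric over n repeated runs. */
static struct run_stats summarize_runs(const double *v, size_t n)
{
	struct run_stats s = { 0.0, 0.0, 0.0 };
	double sum = 0.0, sq = 0.0;
	size_t i;

	if (n == 0)
		return s;
	for (i = 0; i < n; i++)
		sum += v[i];
	s.avg = sum / n;
	for (i = 0; i < n; i++)
		sq += (v[i] - s.avg) * (v[i] - s.avg);
	s.stddev = sqrt_newton(sq / n);	/* population form: an assumption */
	if (s.avg != 0.0)
		s.noise_pct = 100.0 * s.stddev / s.avg;
	return s;
}
```

    Watching noise_pct shrink as the repeat count grows is one way to judge
    how many runs are enough.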
     
  • - use IPC for the instruction normalization output
    - CPUs for the CPU utilization factor value.
    - print out time elapsed like the other rows
    - tidy up the task-clocks/cpu-clocks printout

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • It can be very annoying to scroll down perf annotated output
    until we find relevant overhead.

    Using the -l option, you can now have a small summary sorted per
    overhead at the beginning of the output.

    Example:

    ./perf annotate -l -k ../../vmlinux -s __lock_acquire

    Sorted summary for file ../../vmlinux
    ----------------------------------------------

    12.04 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
    4.61 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
    3.77 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1775
    3.56 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
    2.93 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
    2.83 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2545
    2.30 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
    2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2388
    2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
    2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
    2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:138
    1.88 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2548
    1.47 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
    1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
    1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
    1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1654
    1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
    1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2592
    1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
    1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740

    [...]

    Only overheads above 0.5% are summarized.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
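
    The sorted summary boils down to a descending sort plus a cutoff. A
    minimal sketch, with a hypothetical struct line_cost and helper
    sort_summary():

```c
#include <stdlib.h>

struct line_cost {
	const char *path;	/* "file:line" string for the source line */
	double percent;		/* overhead attributed to that line */
};

/* qsort comparator: highest overhead first. */
static int cmp_cost_desc(const void *a, const void *b)
{
	const struct line_cost *x = a, *y = b;

	if (x->percent < y->percent)
		return 1;
	if (x->percent > y->percent)
		return -1;
	return 0;
}

/*
 * Sort by overhead, descending, and return how many leading entries
 * exceed min_percent (the ones worth printing in the summary).
 */
static size_t sort_summary(struct line_cost *v, size_t n, double min_percent)
{
	size_t kept = 0;

	qsort(v, n, sizeof(*v), cmp_cost_desc);
	while (kept < n && v[kept].percent > min_percent)
		kept++;
	return kept;
}
```

    With min_percent set to 0.5 this matches the cutoff described above.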
     
  • When we have a colored line in perf annotate, i.e. a middle/high
    overhead one, it's sometimes useful to get the matching line
    and filename from the source file. This patch also prepares for
    a subsequent one, which will print a sorted summary of
    middle/high overhead lines at the beginning of the output.

    Filename:Line pairs have the same color as the IP lines they refer to.

    It can be slow because it relies on addr2line. We could also
    use objdump with -l, but that implies we would have to buffer the
    objdump output and parse it to filter the relevant lines, since
    we want to print a sorted summary at the beginning.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Help out arch porters who want to support perf counters by listing some
    basic requirements.

    Signed-off-by: Mike Frysinger
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Frysinger
     

12 Jun, 2009

3 commits

  • Provide for means of extending the perf_counter_attr in a 'natural' way.

    We allow growing the structure by appending fields at the end by specifying
    the full structure size inside it.

    When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
    When an old kernel sees a larger (new) structure, it will verify the tail
    consists of 0s, otherwise fail.

    If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
    native attribute size back into the provided structure.

    Furthermore, add some attribute verification, so that we'll fail counter
    creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
    the __reserved fields).

    (This ABI detail is introduced while keeping the existing syscall ABI.)

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
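
    The size negotiation described above can be sketched in plain C as
    follows. Everything here is a simplified assumption: struct demo_attr
    is a hypothetical stand-in for the real attribute structure,
    copy_attr() is not the kernel's function, and the minimum-size rule is
    reduced to "at least the size field itself".

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute struct; only the leading size field matters here. */
struct demo_attr {
	uint32_t size;		/* size of the structure as the caller knows it */
	uint32_t type;
	uint64_t config;
	uint64_t newer_field;	/* a field an older caller does not know about */
};

/*
 * Copy a caller-supplied attribute blob into *kattr. The blob starts with
 * its own size. Smaller (older) blobs are zero-padded; larger (newer) blobs
 * are accepted only if the bytes this side does not understand are all zero.
 */
static int copy_attr(struct demo_attr *kattr, void *ubuf)
{
	unsigned char *p = ubuf;
	uint32_t usize;
	size_t i;

	memcpy(&usize, p, sizeof(usize));
	if (usize < sizeof(usize))
		return -EINVAL;
	if (usize > sizeof(*kattr)) {
		for (i = sizeof(*kattr); i < usize; i++) {
			if (p[i] != 0) {
				/* Report the size this side supports. */
				uint32_t ksize = sizeof(*kattr);

				memcpy(p, &ksize, sizeof(ksize));
				return -E2BIG;
			}
		}
		usize = sizeof(*kattr);
	}
	memset(kattr, 0, sizeof(*kattr));
	memcpy(kattr, p, usize);
	kattr->size = sizeof(*kattr);
	return 0;
}
```

    On -E2BIG the caller can read the written-back size and retry with a
    structure the other side understands.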
     
  • Up until now record has worked on the assumption that type=0, config=0
    was a suitable configuration - which it is. Let's make this a little more
    explicit and more readable via the use of proper symbols.

    [ Impact: cleanup ]

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Otherwise all L1-instruction aliases will be recognized as
    L1-data by strcasestr() when calling function parse_aliases.

    Signed-off-by: Yong Wang
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yong Wang
     

11 Jun, 2009

3 commits

  • Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • A build error slipped in:

    builtin-report.c: In function ‘hist_entry__fprintf’:
    builtin-report.c:711: error: format ‘%12d’ expects type ‘int’, but argument 3 has type ‘uint64_t’

    Because we got a bit sloppy with those types. uint64_t really sucks,
    because there's no printf format for it. So standardize on __u64
    instead - for all types that go to or come from the ABI (which is __u64),
    or for values that need to be large enough even on 32-bit.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • When we use variable period sampling, add the period to the sample
    data and use that to normalize the samples.

    Signed-off-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

10 Jun, 2009

2 commits

  • Currently report and stat catch SIGINT (and others) without altering
    their exit state. This means that things like:

    while :; do perf stat ./foo ; done

    Loops become hard-to-interrupt, because bash never sees perf terminate
    due to interruption. Fix this.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
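
    A common way to achieve this, sketched here with hypothetical names, is
    to remember the signal in the handler and, once cleanup is done,
    restore the default action and re-send the signal so the parent shell
    sees a death-by-signal. Whether the patch does exactly this is not
    stated in the message above.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t caught_signal;	/* 0 means none yet */

/* Signal handler: only record which signal arrived. */
static void record_signal(int sig)
{
	caught_signal = sig;
}

/*
 * Call after the final output has been printed: if a signal interrupted
 * the run, restore its default action and re-send it, so the process
 * terminates by that signal rather than with a normal exit status.
 */
static void exit_like_signalled(void)
{
	if (!caught_signal)
		return;
	signal(caught_signal, SIG_DFL);
	kill(getpid(), caught_signal);
}
```

    A shell loop such as the one above then stops on Ctrl-C, because bash
    can tell that the child was interrupted.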
     
  • Create the counter in a disabled state and only enable it after we
    mmap() the buffer, this allows us to see the first few samples (and
    observe the frequency ramp).

    Furthermore, print the period in the verbose report.

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Jun, 2009

2 commits

  • The rule is:

    - high overhead: red
    - mid overhead: green
    - low overhead: normal (white/black)

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
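
    The rule maps naturally onto a small lookup function. In this sketch
    the threshold values are hypothetical (the message above gives no
    numbers), and standard ANSI escape sequences stand in for whatever
    color helpers the tool uses.

```c
#define COLOR_RED	"\033[31m"
#define COLOR_GREEN	"\033[32m"
#define COLOR_NORMAL	""

/* Hypothetical thresholds, in percent; the commit text gives no numbers. */
#define HIGH_OVERHEAD	5.0
#define MID_OVERHEAD	0.5

/* Pick the color prefix for a line with the given overhead percentage. */
static const char *overhead_color(double percent)
{
	if (percent >= HIGH_OVERHEAD)
		return COLOR_RED;
	if (percent >= MID_OVERHEAD)
		return COLOR_GREEN;
	return COLOR_NORMAL;
}
```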
     
  • This patch adds support for profiling JIT generated code to 'perf
    report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
    symbols map that is parsed when looking and displaying symbols.

    Thanks to Peter Zijlstra for his help with this patch!

    Example "perf report" output with the Jato JIT:

    #
    # (40311 samples)
    #
    # Overhead Command Shared Object Symbol
    # ........ ................ ......................... ......
    #
    97.80% jato /tmp/perf-11915.map [.] Fibonacci.fib(I)I
    0.56% jato 00000000b7fa023b 0x000000b7fa023b
    0.45% jato /tmp/perf-11915.map [.] Fibonacci.main([Ljava/lang/String;)V
    0.38% jato [kernel] [k] get_page_from_freelist
    0.06% jato [kernel] [k] kunmap_atomic
    0.05% jato ./jato [.] utf8Hash
    0.04% jato ./jato [.] executeJava
    0.04% jato ./jato [.] defineClass

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Pekka Enberg
    Cc: a.p.zijlstra@chello.nl
    Cc: acme@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Pekka Enberg
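
    Reading such a map file can be sketched as below. The line format is an
    assumption not spelled out in the message above: one symbol per line as
    "START SIZE NAME", with START and SIZE in hexadecimal. The struct and
    helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct jit_sym {
	unsigned long long start;
	unsigned long long size;
	char name[256];
};

/* Build the per-process map path, e.g. /tmp/perf-11915.map. */
static void jit_map_path(char *buf, size_t len, int pid)
{
	snprintf(buf, len, "/tmp/perf-%d.map", pid);
}

/*
 * Parse one line, assuming "START SIZE NAME" with START and SIZE in hex.
 * Returns 0 on success, -1 on malformed input.
 */
static int jit_parse_line(const char *line, struct jit_sym *sym)
{
	if (sscanf(line, "%llx %llx %255[^\n]",
		   &sym->start, &sym->size, sym->name) != 3)
		return -1;
	return 0;
}
```

    A JIT would append one such line for every method it compiles, and the
    report side would look sampled addresses up in the parsed ranges.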
     

08 Jun, 2009

1 commit

  • Before:

    7549326754 cycles # 3201.811 M/sec
    10007594937 instructions # 4244.408 M/sec

    After:

    7542051194 cycles # 3201.996 M/sec
    10007743852 instructions # 4248.811 M/sec # 1.327 per cycle

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
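
    The added "per cycle" figure is the instruction count divided by the
    cycle count. A minimal sketch, with a hypothetical helper name:

```c
/* Instructions per cycle; returns 0.0 when no cycles were counted. */
static double instructions_per_cycle(unsigned long long instructions,
				     unsigned long long cycles)
{
	if (cycles == 0)
		return 0.0;
	return (double)instructions / (double)cycles;
}
```

    For the "After" numbers above, 10007743852 / 7542051194 rounds to the
    1.327 shown in the output.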
     

07 Jun, 2009

5 commits

  • Before:

    $ perf report
    failed to open file: No such file or directory

    After:

    $ perf report
    failed to open file: perf.data (try 'perf record' first)

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • If perf is run on a !CONFIG_PERF_COUNTER kernel right now it
    bails out with no messages or with confusing messages.

    Standardize this case some more and explain the situation.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • On architectures/CPUs without PMU support but with perfcounters
    enabled 'perf record' currently fails because it cannot create a
    cycle based hw-perfcounter.

    Fall back to the cpu-clock-tick sw-perfcounter in this case, which
    is hrtimer based and will always work (as long as perfcounters
    are enabled).

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • On architectures/CPUs without PMU support but with perfcounters
    enabled 'perf top' currently fails because it cannot create a
    cycle based hw-perfcounter.

    Fall back to the cpu-clock-tick sw-perfcounter in this case, which
    is hrtimer based and will always work (as long as perfcounters
    are enabled).

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Before:

    $ perf stat ~/hackbench 5

    error: syscall returned with -1 (No such device)

    After:

    $ perf stat ~/hackbench 5
    Time: 1.640

    Performance counter stats for '/home/mingo/hackbench 5':

    6524.570382 task-clock-ticks # 3.838 CPU utilization factor
    35704 context-switches # 0.005 M/sec
    191 CPU-migrations # 0.000 M/sec
    8958 page-faults # 0.001 M/sec
    cycles
    instructions
    cache-references
    cache-misses

    Wall-clock time elapsed: 1699.999995 msecs

    Also add -v (--verbose) option to allow the printing of failed
    counter opens.

    Plus don't print 'inf' if wall-time is zero (due to jiffies granularity),
    instead skip the printing of the CPU utilization factor.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
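
    The zero wall-time guard can be sketched like this. The helper
    format_cpu_util() is hypothetical. It produces the annotation only when
    the division is meaningful.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the "CPU utilization factor" annotation into buf, or an empty
 * string when the wall-clock time is zero and the ratio would be 'inf'.
 */
static void format_cpu_util(char *buf, size_t len,
			    double task_msecs, double wall_msecs)
{
	if (len == 0)
		return;
	if (wall_msecs > 0.0)
		snprintf(buf, len, "# %.3f CPU utilization factor",
			 task_msecs / wall_msecs);
	else
		buf[0] = '\0';
}
```

    With the numbers from the example above, 6524.570382 / 1699.999995
    gives the 3.838 factor shown.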