24 Dec, 2011

3 commits

  • The features HEADER_TRACE_INFO and HEADER_BUILD_ID are handled
    differently when writing the feature section. All other features are
    simply disabled on failure and writing the section goes on without
    returning an error. There is no reason for these special cases. This
    patch unifies handling of the features.

    This should be ok since all features can be parsed independently.
    Offset and size of a feature's block are stored in struct
    perf_file_section right after the data block of perf.data (see
    perf_session__write_header()). Thus, if a feature does not exist, other features
    can be processed anyway.
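    A minimal sketch of why independent feature blocks make this safe,
    assuming a simplified layout: a 64-bit feature bitmap plus one
    (offset, size) descriptor per set bit, stored in bit order. The struct
    and function names are hypothetical stand-ins, not perf's own code.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for perf's struct perf_file_section. */
    struct feat_section {
        uint64_t offset; /* where this feature's data starts in the file */
        uint64_t size;   /* length of that data in bytes */
    };

    /* A per-feature parser returns 0 on success, non-zero on failure. */
    typedef int (*feat_parser)(int feat, const struct feat_section *sec);

    /*
     * Walk every feature bit that is set. One descriptor is stored per set
     * bit, so a failing parser only loses its own feature: the loop still
     * advances to the next descriptor.
     * Returns the number of features that were parsed successfully.
     */
    static int process_features(uint64_t feat_bits,
                                const struct feat_section *secs,
                                feat_parser parse)
    {
        int ok = 0;
        size_t idx = 0;

        for (int feat = 0; feat < 64; feat++) {
            if (!(feat_bits & (UINT64_C(1) << feat)))
                continue;           /* feature absent from this file */
            if (parse(feat, &secs[idx]) == 0)
                ok++;
            idx++;                  /* advance even if parsing failed */
        }
        return ok;
    }

    /* Example parser: treats an empty section as a failure. */
    static int parse_nonempty(int feat, const struct feat_section *sec)
    {
        (void)feat;
        return sec->size != 0 ? 0 : -1;
    }
    ```

    With three set bits whose middle section is empty, two features still
    parse: the bad one is skipped without disturbing its neighbours.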

    Also move the special code for HEADER_BUILD_ID out to write_build_id().

    v2:
    * perf record now throws an error if buildids cannot be generated;
    this can be disabled with the --no-buildid option.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1323248577-11268-6-git-send-email-robert.richter@amd.com
    Signed-off-by: Robert Richter
    Signed-off-by: Arnaldo Carvalho de Melo

    Robert Richter
     
  • Now that we automatically point users at it, let's provide them some
    guidance so that they hopefully don't just get mysterious EINVALs
    from the kernel.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1324301972-22740-4-git-send-email-nelhage@nelhage.com
    Signed-off-by: Nelson Elhage
    [ committer note: Made it work after 50a682c ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Nelson Elhage
     
  • This failure is most likely due to running up against the
    kernel.perf_event_mlock_kb sysctl, so we can tell the user what to do to
    fix the issue.
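    A sketch of turning such an mmap failure into actionable advice,
    assuming, as the text suggests, that EPERM from mmap() on a perf fd
    usually means the mlock limit was hit. Only the sysctl path comes from
    the text; the helper names and message wording are hypothetical.

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Read kernel.perf_event_mlock_kb via procfs; -1 if unavailable. */
    static long read_mlock_kb(void)
    {
        long kb = -1;
        FILE *f = fopen("/proc/sys/kernel/perf_event_mlock_kb", "r");

        if (f) {
            if (fscanf(f, "%ld", &kb) != 1)
                kb = -1;
            fclose(f);
        }
        return kb;
    }

    /* Format a user-facing hint for an mmap failure into buf. */
    static int mmap_failure_hint(int err, long mlock_kb, char *buf, size_t len)
    {
        if (err == EPERM)
            return snprintf(buf, len,
                            "Permission error mapping pages.\n"
                            "Consider increasing "
                            "/proc/sys/kernel/perf_event_mlock_kb"
                            " (current value: %ld),\n"
                            "or try again with a smaller -m value.\n",
                            mlock_kb);
        return snprintf(buf, len, "failed to mmap: %s\n", strerror(err));
    }
    ```

    A caller would use it as
    `mmap_failure_hint(errno, read_mlock_kb(), msg, sizeof(msg))` right
    after the failing mmap().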

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1324301972-22740-3-git-send-email-nelhage@nelhage.com
    Signed-off-by: Nelson Elhage
    Signed-off-by: Arnaldo Carvalho de Melo

    Nelson Elhage
     

20 Dec, 2011

1 commit

  • The problem is that when SAMPLE_PERIOD is not set, the kernel generates
    a number of samples in proportion to an event's period. The number of
    these samples may be too large, and the kernel throttles all samples
    above a defined limit.

    E.g.: I want to trace when a process sleeps. I created a process which
    sleeps for 1ms and for 4ms. perf got 100 events in both cases.

    swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns]
    swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns]

    In the first case the kernel wants to send 4499585 events and in the
    second case it wants to send 1386750 events. perf report shows that the
    process sleeps for an equal time in both places.

    Instead, we can get just one sample with the period attribute set. As a
    result, less data is transferred between kernel and user space and
    we avoid throttling of samples.
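    A toy model of the effect, not kernel code: with a per-event sample cap
    (100 here, matching the "100 events in both cases" above) both sleeps
    weigh the same, so a report that counts samples shows equal time. If
    each sleep instead yields one sample whose period (PERF_SAMPLE_PERIOD)
    carries the delay in nanoseconds, summing periods restores the real
    proportion. The cap and function names are hypothetical.

    ```c
    #include <stdint.h>

    #define SAMPLE_CAP 100 /* hypothetical throttling limit */

    /* Without a period: one sample per unit of delay, capped. */
    static uint64_t weight_by_count(uint64_t delay_ns)
    {
        return delay_ns < SAMPLE_CAP ? delay_ns : SAMPLE_CAP;
    }

    /* With a period: a single sample whose period is the whole delay. */
    static uint64_t weight_by_period(uint64_t delay_ns)
    {
        return delay_ns;
    }
    ```

    For the two delays in the example, weight_by_count() gives 100 for
    both, while weight_by_period() keeps the roughly 3:1 ratio.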

    The patch "events: Don't divide events if it has field period" added a
    kernel part of this functionality.

    Acked-by: Arun Sharma
    Cc: Arun Sharma
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: devel@openvz.org
    Link: http://lkml.kernel.org/r/1324391565-1369947-1-git-send-email-avagin@openvz.org
    Signed-off-by: Andrew Vagin
    Signed-off-by: Arnaldo Carvalho de Melo

    Andrew Vagin
     

29 Nov, 2011

1 commit

  • At first tools were required to do that, but while writing the python
    bindings to simplify the API I made them auto-allocate when needed.

    This just makes record, stat and top use that auto allocation,
    simplifying them a bit.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-iokhcvkzzijr3keioubx8hlq@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

28 Nov, 2011

8 commits


26 Oct, 2011

2 commits

  • As it will exit the tool after the user is notified.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-vy06m8xzlvkhr8tk7nylhbng@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The __perf_evsel__open routine was grouping, per cpu, just the threads
    of that specific event, when we want to group all threads in all events
    under the first fd opened on that cpu.

    So pass the xyarray of the first event, from which the other events
    can get that first per-cpu fd.
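    A sketch of that grouping rule, assuming a plain fd[event][cpu][thread]
    array in place of perf's xyarray and an injectable open function; all
    names are hypothetical. The first fd opened on each cpu (event 0,
    thread 0) becomes the group leader for every event and thread on that
    cpu.

    ```c
    #define MAX_EVENTS  4
    #define MAX_CPUS    4
    #define MAX_THREADS 4

    /* Stand-in for sys_perf_event_open(): returns an fd or -1. */
    typedef int (*open_fn)(int event, int cpu, int thread, int group_fd);

    static int open_grouped(int fd[MAX_EVENTS][MAX_CPUS][MAX_THREADS],
                            int nr_events, int nr_cpus, int nr_threads,
                            open_fn do_open)
    {
        for (int e = 0; e < nr_events; e++)
            for (int c = 0; c < nr_cpus; c++)
                for (int t = 0; t < nr_threads; t++) {
                    /* Leader: the first fd opened on this cpu. */
                    int group_fd = (e == 0 && t == 0) ? -1 : fd[0][c][0];

                    fd[e][c][t] = do_open(e, c, t, group_fd);
                    if (fd[e][c][t] < 0)
                        return -1;
                }
        return 0;
    }

    /* Demo opener: hands out fake fds and records the last group_fd. */
    static int last_group_fd = -2;
    static int fake_next_fd = 3;

    static int fake_open(int event, int cpu, int thread, int group_fd)
    {
        (void)event; (void)cpu; (void)thread;
        last_group_fd = group_fd;
        return fake_next_fd++;
    }
    ```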

    At some point top and record will switch to using perf_evlist__open that
    takes care of this detail and probably will also handle the fallback
    from hw to soft counters, etc.

    Reported-by: Deng-Cheng Zhu
    Tested-by: Deng-Cheng Zhu
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ebm34rh098i9y9v4cytfdp0x@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

08 Oct, 2011

1 commit

  • The goal of this patch is to include more information about the host
    environment into the perf.data so it is more self-descriptive. Over time,
    profiles are captured on various machines and it becomes hard to track
    what was recorded, on what machine and when.

    This patch provides a way to solve this by extending the perf.data file
    with basic information about the host machine. To add those extensions,
    we leverage the feature bits capabilities of the perf.data format. The
    change is backward compatible with existing perf.data files.

    We define the following useful new extensions:
    - HEADER_HOSTNAME: the hostname
    - HEADER_OSRELEASE: the kernel release number
    - HEADER_ARCH: the hw architecture
    - HEADER_CPUDESC: generic CPU description
    - HEADER_NRCPUS: number of online/avail cpus
    - HEADER_CMDLINE: perf command line
    - HEADER_VERSION: perf version
    - HEADER_TOPOLOGY: cpu topology
    - HEADER_EVENT_DESC: full event description (attrs)
    - HEADER_CPUID: easy-to-parse low level CPU identification

    The small granularity for the entries is to make it easier to extend
    without breaking backward compatibility. Many entries are provided as
    ASCII strings.

    Perf report/script have been modified to print the basic information as
    easy-to-parse ASCII strings. Extended information about CPU and NUMA
    topology may be requested with the -I option.
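    Many of these entries are ASCII strings. A minimal sketch of one way to
    store such a string so a reader can parse or skip it on its own: a
    32-bit length followed by the NUL-terminated string, zero-padded to an
    alignment boundary. The 64-byte alignment and the helper name are
    assumptions for illustration; perf's util/header.c defines the real
    on-disk encoding.

    ```c
    #include <stdint.h>
    #include <string.h>

    #define STR_ALIGN 64 /* assumed alignment for string entries */

    /*
     * Encode s into out as: u32 padded length (host byte order), then the
     * string and its NUL, zero-filled up to the padded length.
     * Returns the bytes written, or 0 if cap is too small.
     */
    static size_t encode_string(const char *s, unsigned char *out, size_t cap)
    {
        size_t olen = strlen(s) + 1;                     /* include NUL */
        size_t len = (olen + STR_ALIGN - 1) / STR_ALIGN * STR_ALIGN;
        uint32_t len32 = (uint32_t)len;

        if (cap < sizeof(len32) + len)
            return 0;
        memcpy(out, &len32, sizeof(len32));
        memset(out + sizeof(len32), 0, len);
        memcpy(out + sizeof(len32), s, olen);
        return sizeof(len32) + len;
    }
    ```

    Because the length is stored up front, a reader that does not care
    about a given string can skip it without parsing its contents.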

    Thanks to David Ahern for reviewing and testing the many versions of
    this patch.

    $ perf report --stdio
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # HEADER_CPU_TOPOLOGY info available, use -I to display
    # HEADER_NUMA_TOPOLOGY info available, use -I to display
    # ========
    #
    ...

    $ perf report --stdio -I
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # sibling cores : 0-3
    # sibling threads : 0
    # sibling threads : 1
    # sibling threads : 2
    # sibling threads : 3
    # node0 meminfo : total = 8320608 kB, free = 7571024 kB
    # node0 cpu list : 0-3
    # ========
    #
    ...

    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
    Signed-off-by: Stephane Eranian
    [ committer notes: Use --show-info in the tools as was in the docs, rename
    perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
    conflict with f69b64f7 "perf: Support setting the disassembler style" ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

06 Oct, 2011

1 commit


30 Sep, 2011

1 commit

  • When a program crashes under perf there is no message about it, unlike
    when running it from bash. This can be confusing and lead to wrong
    actions during debugging.

    Print fatal signals in perf stat/record.

    Thanks to Furat Afram for finding the problem originally.
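    A sketch of the idea: after waiting for the workload, check whether it
    was killed by a signal and report that the way a shell would. psignal()
    and the W* macros are standard POSIX; the helper name is hypothetical.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /*
     * Wait for pid and, if it died from a signal, print the signal's
     * description to stderr (e.g. "name: Segmentation fault").
     * Returns the signal number, 0 for a normal exit, -1 on error.
     */
    static int report_child_exit(pid_t pid, const char *name)
    {
        int status;

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSIGNALED(status)) {
            psignal(WTERMSIG(status), name);
            return WTERMSIG(status);
        }
        return 0;
    }
    ```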

    Link: http://lkml.kernel.org/r/1316122302-24306-1-git-send-email-andi@firstfloor.org
    Cc: Frederic Weisbecker
    Cc: Stephane Eranian
    Signed-off-by: Andi Kleen
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
     

24 Sep, 2011

1 commit

  • perf-record currently creates events enabled. When doing a system wide
    collection (-a arg) this causes data collection for perf's
    initialization activities -- e.g., perf_event__synthesize_threads().

    For some events (e.g., context switch S/W event or tracepoints like
    syscalls) perf's initialization causes a lot of events to be captured,
    frequently generating "Check IO/CPU overload!" warnings on larger
    systems (e.g., 2 socket, quad core, hyperthreading).

    perf's initialization phase can be skipped by creating events
    disabled and then enabling them once the initialization is done.
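    A sketch of the create-disabled-then-enable pattern. attr.disabled and
    the PERF_EVENT_IOC_ENABLE ioctl are part of the kernel's perf_event
    ABI; the helper names and the array of already-opened fds are
    hypothetical.

    ```c
    #include <linux/perf_event.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Prepare an attr so the counter starts out disabled. */
    static void init_disabled_attr(struct perf_event_attr *attr,
                                   __u32 type, __u64 config)
    {
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = type;
        attr->config = config;
        attr->disabled = 1; /* don't count perf's own setup work */
    }

    /* Call once initialization (e.g. thread synthesis) is done. */
    static int enable_all(const int *fds, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (ioctl(fds[i], PERF_EVENT_IOC_ENABLE, 0) < 0)
                return -1;
        return 0;
    }
    ```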

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

18 Aug, 2011

1 commit

  • The group event scheduling command line option is missing in perf
    record/stat.

    Add it to perf record/stat, matching the one in perf top.

    Reported-by: Andi Kleen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1313577727.2754.5.camel@hp6530s
    Signed-off-by: Lin Ming
    Signed-off-by: Arnaldo Carvalho de Melo

    Lin Ming
     

25 Jul, 2011

1 commit

  • To remove the last case of access to the FD() macro outside the library.

    Inspired by a patch by Borislav that moved the FD() macro to util.h; for
    namespace reasons I preferred to constrain it to ev{sel,list}.c.

    Cc: Borislav Petkov
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-qn893qsstcg366tkucu649qj@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

21 Jul, 2011

1 commit

  • Move the option parameter out of the parse_events function and add a
    new parse_events_option function instead.

    The option parameter is used only to carry the "struct perf_evlist"
    pointer for chaining new events. Removing it enables us to call
    parse_events from other places without using the option
    parameter.
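    A sketch of the split, with hypothetical minimal types standing in for
    perf's struct perf_evlist and struct option: the core parser takes the
    list directly, and a thin callback adapter only unwraps the list from
    the option's value pointer. The comma counting is a placeholder, not
    real event parsing.

    ```c
    #include <stddef.h>

    /* Hypothetical, heavily simplified stand-ins for perf's types. */
    struct evlist { int nr_entries; };
    struct option { void *value; };

    /* Core routine: callable from anywhere that has an evlist. */
    static int parse_events(struct evlist *evlist, const char *str)
    {
        if (str == NULL || *str == '\0')
            return -1;
        /* Placeholder: one entry per comma-separated name. */
        evlist->nr_entries++;
        for (const char *p = str; *p; p++)
            if (*p == ',')
                evlist->nr_entries++;
        return 0;
    }

    /* Option-callback adapter: unwraps the evlist and delegates. */
    static int parse_events_option(const struct option *opt, const char *str,
                                   int unset)
    {
        (void)unset;
        return parse_events(opt->value, str);
    }
    ```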

    Signed-off-by: Jiri Olsa
    Cc: acme@redhat.com
    Cc: a.p.zijlstra@chello.nl
    Cc: paulus@samba.org
    Link: http://lkml.kernel.org/r/1310635534-4013-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

28 May, 2011

1 commit


26 May, 2011

1 commit

  • Perf uses /proc/modules to figure out where kernel modules are loaded.

    With the advent of kptr_restrict, non-root users get zeroes for all module
    start addresses.

    So check if kptr_restrict is non-zero and don't generate the synthetic
    PERF_RECORD_MMAP events for them.

    Warn the user about it in perf record and in perf report.

    In perf report, a zero reference relocation symbol means that
    kptr_restrict was set, so /proc/kallsyms had only zeroed addresses. In
    that case, don't use it to fix up symbol addresses when using a valid
    kallsyms (in the buildid cache) or vmlinux (in the vmlinux path), whether
    located automatically by build-id or specified by the user.

    Provide an explanation about it in 'perf report' if kernel samples were taken,
    checking if a suitable vmlinux or kallsyms was found/specified.

    Restricted /proc/kallsyms files no longer go to the buildid cache.
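    A sketch of the check, assuming, as the text says, that any non-zero
    kptr_restrict value is treated as restricted. Parsing is split from
    file reading so it can be exercised without procfs; the function names
    are hypothetical.

    ```c
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Interpret the contents of /proc/sys/kernel/kptr_restrict. */
    static bool kptr_value_restricts(const char *contents)
    {
        return contents != NULL && strtol(contents, NULL, 10) != 0;
    }

    /* Read the sysctl; a missing file (older kernels) means unrestricted. */
    static bool kptr_is_restricted(void)
    {
        char buf[16];
        bool restricted = false;
        FILE *f = fopen("/proc/sys/kernel/kptr_restrict", "r");

        if (f) {
            if (fgets(buf, sizeof(buf), f))
                restricted = kptr_value_restricts(buf);
            fclose(f);
        }
        return restricted;
    }
    ```

    When kptr_is_restricted() returns true, the tool would skip
    synthesizing module PERF_RECORD_MMAP events and print the warning.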

    Example:

    [acme@emilia ~]$ perf record -F 100000 sleep 1

    WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
    /proc/sys/kernel/kptr_restrict.

    Samples in kernel functions may not be resolved if a suitable vmlinux file is
    not found in the buildid cache or in the vmlinux path.

    Samples in kernel modules won't be resolved at all.

    If some relocation was applied (e.g. kexec) symbols may be misresolved even
    with a suitable vmlinux or kallsyms file.

    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
    [acme@emilia ~]$

    [acme@emilia ~]$ perf report --stdio
    Kernel address maps (/proc/{kallsyms,modules}) were restricted,
    check /proc/sys/kernel/kptr_restrict before running 'perf record'.

    If some relocation was applied (e.g. kexec) symbols may be misresolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. .....................
    #
    20.24% sleep [kernel.kallsyms] [k] page_fault
    20.04% sleep [kernel.kallsyms] [k] filemap_fault
    19.78% sleep [kernel.kallsyms] [k] __lru_cache_add
    19.69% sleep ld-2.12.so [.] memcpy
    14.71% sleep [kernel.kallsyms] [k] dput
    4.70% sleep [kernel.kallsyms] [k] flush_signal_handlers
    0.73% sleep [kernel.kallsyms] [k] perf_event_comm
    0.11% sleep [kernel.kallsyms] [k] native_write_msr_safe

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    This is because it found a suitable vmlinux (build-id checked) in
    /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
    file name).

    If we remove that file from the vmlinux path:

    [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
    /lib/modules/2.6.39-rc7+/build/vmlinux.OFF
    [acme@emilia ~]$ perf report --stdio
    [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
    not found, continuing without symbols

    Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
    /proc/sys/kernel/kptr_restrict before running 'perf record'.

    As no suitable kallsyms nor vmlinux was found, kernel samples can't be
    resolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ......
    #
    80.31% sleep [kernel.kallsyms] [k] 0xffffffff8103425a
    19.69% sleep ld-2.12.so [.] memcpy

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    Reported-by: Stephane Eranian
    Suggested-by: David Miller
    Cc: Dave Jones
    Cc: David Miller
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 May, 2011

1 commit

  • The PERF_EVENT_IOC_SET_OUTPUT ioctl was returning -EINVAL when using
    --pid to monitor multithreaded apps, as a ring buffer can only be shared
    by events on the same thread when not doing per-cpu monitoring.

    Fix it by using per thread ring buffers.
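    A sketch of the buffer-count rule implied by the test runs in this
    message (11 mmaps for --pid on 11 threads, one with --cpu 1, two with
    -a on a two-core machine, one for --tid); the function name is
    hypothetical.

    ```c
    #include <stdbool.h>

    /*
     * How many ring buffers (mmaps) are needed.
     * - No cpu list: events are per thread and a buffer can only be shared
     *   by events on the same thread, so one mmap per thread.
     * - A cpu list (--cpu or -a): one buffer per cpu, shared by the events
     *   of all monitored threads on that cpu.
     */
    static int nr_ring_buffers(bool have_cpu_list, int nr_cpus, int nr_threads)
    {
        return have_cpu_list ? nr_cpus : nr_threads;
    }
    ```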

    Tested with:

    [root@felicio ~]# tuna -t 26131 -CP | nl
      1            thread                ctxt_switches
      2    pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
      3  26131 OTHER      0      0,1  10814276      2397830 chromium-browse
      4    642 OTHER      0      0,1     14688            0 chromium-browse
      5  26148 OTHER      0      0,1    713602       115479 chromium-browse
      6  26149 OTHER      0      0,1    801958         2262 chromium-browse
      7  26150 OTHER      0      0,1   1271128          248 chromium-browse
      8  26151 OTHER      0      0,1         3            0 chromium-browse
      9  27049 OTHER      0      0,1     36796            9 chromium-browse
     10    618 OTHER      0      0,1     14711            0 chromium-browse
     11    661 OTHER      0      0,1     14593            0 chromium-browse
     12  29048 OTHER      0      0,1     28125            0 chromium-browse
     13  26143 OTHER      0      0,1   2202789          781 chromium-browse
    [root@felicio ~]#

    So 11 threads under pid 26131, then:

    [root@felicio ~]# perf record -F 50000 --pid 26131

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7fa4a2538000-7fa4a25b9000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    2 7fa4a25b9000-7fa4a263a000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    3 7fa4a263a000-7fa4a26bb000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    4 7fa4a26bb000-7fa4a273c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    5 7fa4a273c000-7fa4a27bd000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    6 7fa4a27bd000-7fa4a283e000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    7 7fa4a283e000-7fa4a28bf000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    8 7fa4a28bf000-7fa4a2940000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    9 7fa4a2940000-7fa4a29c1000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    10 7fa4a29c1000-7fa4a2a42000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    11 7fa4a2a42000-7fa4a2ac3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    11 mmaps: since we didn't specify any CPU list, we need one mmap per
    thread, and:

    [root@felicio ~]# perf record -F 50000 --pid 26131
    ^M
    ^C[ perf record: Woken up 79 times to write data ]
    [ perf record: Captured and wrote 20.614 MB perf.data (~900639 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 371310 26131
    2 96516 26148
    3 95694 26149
    4 95203 26150
    5 7291 26143
    6 87 27049
    7 76 661
    8 60 29048
    9 47 618
    10 43 642
    [root@felicio ~]#

    OK, one of the threads, 26151, was quiescent, so there are no samples
    for it, but all the others are there.

    Then, if I specify one CPU:

    [root@felicio ~]# perf record -F 50000 --pid 26131 --cpu 1
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.680 MB perf.data (~29730 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 8444 26131
    2 2584 26149
    3 2518 26148
    4 2324 26150
    5 123 26143
    6 9 661
    7 9 29048
    [root@felicio ~]#

    This machine has two cores, so fewer threads appeared on the radar, and:

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7f484b922000-7f484b9a3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    Just one mmap, as now we can use just one per-cpu buffer instead of the
    per-thread needed in the previous case.

    For global profiling:

    [root@felicio ~]# perf record -F 50000 -a
    ^C[ perf record: Woken up 26 times to write data ]
    [ perf record: Captured and wrote 7.128 MB perf.data (~311412 samples) ]

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7fb49b435000-7fb49b4b6000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    2 7fb49b4b6000-7fb49b537000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    It uses per-cpu buffers.

    For just one thread:

    [root@felicio ~]# perf record -F 50000 --tid 26148
    ^C[ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.330 MB perf.data (~14426 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 9969 26148
    [root@felicio ~]#

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7f286a51b000-7f286a59c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    Tested-by: David Ahern
    Tested-by: Lin Ming
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

15 Apr, 2011

1 commit

  • perf stat doesn't mmap and it's perfectly fine for it to use task-bound
    counters with inheritance.

    So set the attr.inherit on the caller and leave the syscall itself to
    validate it.

    When the mmap fails perf_evlist__mmap will just emit a warning if this
    is the failure reason.

    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/r/20110414170121.GC3229@ghostprotocols.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

31 Mar, 2011

1 commit

  • The default setting of perf record is to mmap 128 pages if the user
    did not override it with -m.

    However, the page size may vary across architectures, giving a
    different default buffer size on each.

    Moreover, the kernel side still has a default maximum of 512 KiB + 1
    page of mlocked memory for unprivileged users. 128 + 1 pages with a
    page size > 4096 exceeds this threshold.

    Thus, better adapt to this limitation and set the default number of
    pages to fit in those 512 KiB + 1 page.
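    Worked arithmetic for the new default, as a sketch (the helper name is
    hypothetical): with 4 KiB pages the result is the old 128 pages, while
    with 64 KiB pages it drops to 8, so 8 data pages plus the header page
    stay within 512 KiB + 1 page.

    ```c
    #include <unistd.h>

    /* Data pages that fit in the 512 KiB unprivileged mlock budget. */
    static unsigned long default_mmap_pages(unsigned long page_size)
    {
        return (512UL * 1024) / page_size;
    }
    ```

    At runtime the page size would come from
    `sysconf(_SC_PAGESIZE)`.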

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

30 Mar, 2011

2 commits

  • Resend of patch sent back in January 2011 in light of recent confusion around
    unsupported events for a given platform.

    Improve sys_perf_event_open ENOENT return handling in top and record, just
    like 5a3446b does for stat.

    Retry of Arnaldo's patch using ui_warning instead of die, which allows the
    fallback from hardware cycles to software clock.
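    A sketch of warn-and-fall-back instead of dying, assuming an injectable
    open function (hypothetical open_fn) so the logic can be tested without
    perf privileges. The attr fields and event constants are the kernel
    ABI's; treating both ENOENT and ENODEV as "unsupported" and the warning
    text are assumptions for illustration.

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <linux/perf_event.h>

    /* Stand-in for sys_perf_event_open(): fd, or -1 with errno set. */
    typedef int (*open_fn)(struct perf_event_attr *attr);

    static int open_cycles_with_fallback(struct perf_event_attr *attr,
                                         open_fn do_open)
    {
        int fd = do_open(attr);

        if (fd < 0 && (errno == ENOENT || errno == ENODEV) &&
            attr->type == PERF_TYPE_HARDWARE &&
            attr->config == PERF_COUNT_HW_CPU_CYCLES) {
            /* Warn instead of dying, then retry with a software clock. */
            fprintf(stderr, "Warning: cycles not supported, "
                            "falling back to cpu-clock\n");
            attr->type = PERF_TYPE_SOFTWARE;
            attr->config = PERF_COUNT_SW_CPU_CLOCK;
            fd = do_open(attr);
        }
        return fd;
    }

    /* Demo opener that pretends the hardware PMU is absent. */
    static int open_without_pmu(struct perf_event_attr *attr)
    {
        if (attr->type == PERF_TYPE_HARDWARE) {
            errno = ENOENT;
            return -1;
        }
        return 42; /* fake fd */
    }
    ```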

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    LKML-Reference:
    Signed-off-by: David Ahern
    [ committer note: Some adjustments to make it apply to newer codebase ]
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • We have to deal with the TUI mode in perf top, so that we don't end up
    with a garbled screen when, say, a non root user on a machine with a
    paranoid setting (the default) tries to use 'perf top'.

    Introduce a ui__warning_paranoid() routine shared by top and record that
    tells the user the valid values for /proc/sys/kernel/perf_event_paranoid.
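    A sketch of the lookup such a warning can be built on. The level
    descriptions reflect the perf_event_paranoid semantics of kernels from
    this period; newer kernels differ, so treat the table as an assumption.
    The function name is hypothetical.

    ```c
    #include <stddef.h>

    /* Meaning of /proc/sys/kernel/perf_event_paranoid for unprivileged users. */
    static const char *paranoid_level_desc(int level)
    {
        switch (level) {
        case -1: return "Not paranoid at all";
        case 0:  return "Disallow raw tracepoint access for unpriv";
        case 1:  return "Disallow cpu events for unpriv";
        case 2:  return "Disallow kernel profiling for unpriv";
        default: return NULL; /* unknown level */
        }
    }
    ```

    A warning routine could loop over levels -1 to 2 and print each
    description next to its value.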

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

10 Mar, 2011

1 commit

  • So that we can reuse things like the id to attr lookup routine
    (perf_evlist__id2evsel) that uses a hash table instead of the linear
    lookup done in the older perf_header_attr routines, etc.

    Also, to make evsel/evlist a more pervasive API, simplifying use of the
    emerging perf lib.
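    A sketch of the id-to-event lookup idea behind perf_evlist__id2evsel:
    a small chained hash table instead of a linear scan over every attr's
    ids. The types, table size and modulo hash are assumptions for
    illustration, not perf's implementation.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define ID_HASH_SIZE 256

    struct evsel { const char *name; };

    struct sample_id {
        uint64_t id;             /* kernel-assigned sample id */
        struct evsel *evsel;     /* event this id belongs to */
        struct sample_id *next;  /* chain for colliding ids */
    };

    struct id_table { struct sample_id *heads[ID_HASH_SIZE]; };

    static void id_table_add(struct id_table *t, struct sample_id *sid)
    {
        size_t b = sid->id % ID_HASH_SIZE;

        sid->next = t->heads[b];
        t->heads[b] = sid;
    }

    /* Average O(1) lookup instead of walking all ids linearly. */
    static struct evsel *id2evsel(const struct id_table *t, uint64_t id)
    {
        for (struct sample_id *s = t->heads[id % ID_HASH_SIZE]; s; s = s->next)
            if (s->id == id)
                return s->evsel;
        return NULL;
    }
    ```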

    Cc: Arun Sharma
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

02 Mar, 2011

1 commit

  • We currently set the filters after we mmap the events. This is a race
    that lets undesired events record themselves in the buffer before we
    have had time to set the filters.

    So set the filters before the events can be recorded. That also moves
    filter setting into the library, so that tools other than perf record
    can do filtering more easily later.
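    A sketch of applying a tracepoint filter to every event fd before any
    buffer is mapped. PERF_EVENT_IOC_SET_FILTER is the kernel's ioctl for
    this; the helper name and its stop-at-first-error policy are
    illustrative.

    ```c
    #include <linux/perf_event.h>
    #include <stddef.h>
    #include <sys/ioctl.h>

    /*
     * Apply filter to each event fd. Call after opening the events and
     * before mmapping them. Returns 0, or -1 at the first fd that rejects
     * the filter.
     */
    static int apply_filters(const int *fds, size_t n, const char *filter)
    {
        for (size_t i = 0; i < n; i++)
            if (ioctl(fds[i], PERF_EVENT_IOC_SET_FILTER, filter) < 0)
                return -1;
        return 0;
    }
    ```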

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt

    Frederic Weisbecker
     

17 Feb, 2011

1 commit

  • While testing the --filter option I noticed that we were writing lots of
    unneeded stuff to the perf.data header when the filter ioctl fails, so
    move the atexit(atexit_header) call to after we create the counters
    successfully.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

16 Feb, 2011

1 commit

  • This patch adds the ability to filter monitoring based on container groups
    (cgroups) for both perf stat and perf record. It is possible to monitor
    multiple cgroups in parallel. There is one cgroup per event. The cgroups to
    monitor are passed via a new -G option followed by a comma separated list of
    cgroup names.

    The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
    finds the corresponding directory in the cgroup filesystem and opens it. It
    then passes that file descriptor to the kernel.
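    A sketch of the name-to-fd step, assuming the caller already knows the
    cgroup mount point (perf locates it itself; the path in the usage
    below is hypothetical). The opened directory fd is what goes to
    perf_event_open() as its pid argument, together with the
    PERF_FLAG_PID_CGROUP flag. As the -G example below suggests, an empty
    name leaves that event without a cgroup.

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Build "<mnt>/<name>"; returns 0, or -1 if it doesn't fit. */
    static int cgroup_path(char *buf, size_t len, const char *mnt,
                           const char *name)
    {
        int n = snprintf(buf, len, "%s/%s", mnt, name);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Open the cgroup directory; -1 for an empty name or on failure. */
    static int open_cgroup_fd(const char *mnt, const char *name)
    {
        char path[4096];

        if (*name == '\0' || cgroup_path(path, sizeof(path), mnt, name) < 0)
            return -1;
        return open(path, O_RDONLY);
    }
    ```

    For example, `open_cgroup_fd("/sys/fs/cgroup/perf_event", "test1")`
    would open that cgroup's directory if it exists.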

    Example:

    $ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
    Performance counter stats for 'sleep 1':

    2,368,667,414 cycles test1
    2,369,661,459 cycles
    cycles test2

    1.001856890 seconds time elapsed

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

11 Feb, 2011

1 commit


10 Feb, 2011

1 commit

  • Jeff Moyer reported these messages:

    Warning: ... trying to fall back to cpu-clock-ticks

    couldn't open /proc/-1/status
    couldn't open /proc/-1/maps
    [ls output]
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.008 MB perf.data (~363 samples) ]

    That led me and David Ahern to see that something was fishy in the thread
    synthesizing routines, at least for the case where the workload is started
    from 'perf record': as -1 is the default for target_tid in the 'perf record
    --tid' parameter, we were somehow trying to synthesize the PERF_RECORD_MMAP
    and PERF_RECORD_COMM events for the thread -1, a bug.

    So I investigated this and noticed that when we introduced support for
    recording a process and its threads using --pid some bugs were introduced,
    and that the way to fix it was, instead of passing the target_tid to the
    event synthesizing routines, to pass the thread_map that has the list of
    threads for a --pid or just the single thread for a --tid.
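    A sketch of building such a thread map for --pid by listing
    /proc/<pid>/task, assuming a hypothetical fixed-capacity struct; a
    --tid map would simply hold the single tid. The synthesizing code can
    then loop over the map's nr entries and never sees the -1 placeholder.

    ```c
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define MAX_THREADS 1024 /* hypothetical capacity for the sketch */

    struct thread_map {
        int nr;
        pid_t map[MAX_THREADS];
    };

    /* Fill tm with every thread of pid; returns 0, or -1 on error. */
    static int thread_map_by_pid(struct thread_map *tm, pid_t pid)
    {
        char path[64];
        struct dirent *d;
        DIR *dir;

        snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
        dir = opendir(path);
        if (!dir)
            return -1;

        tm->nr = 0;
        while ((d = readdir(dir)) != NULL && tm->nr < MAX_THREADS) {
            if (d->d_name[0] == '.')
                continue; /* skip "." and ".." */
            tm->map[tm->nr++] = (pid_t)atoi(d->d_name);
        }
        closedir(dir);
        return 0;
    }
    ```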

    Checked in the following ways:

    On an 8-way machine, run cyclictest:

    [root@emilia ~]# perf record cyclictest -a -t -n -p99 -i100 -d50
    policy: fifo: loadavg: 0.00 0.13 0.31 2/139 28798

    T: 0 (28791) P:99 I:100 C: 25072 Min: 4 Act: 5 Avg: 6 Max: 122
    T: 1 (28792) P:98 I:150 C: 16715 Min: 4 Act: 6 Avg: 5 Max: 27
    T: 2 (28793) P:97 I:200 C: 12534 Min: 4 Act: 5 Avg: 4 Max: 8
    T: 3 (28794) P:96 I:250 C: 10028 Min: 4 Act: 5 Avg: 5 Max: 96
    T: 4 (28795) P:95 I:300 C: 8357 Min: 5 Act: 6 Avg: 5 Max: 12
    T: 5 (28796) P:94 I:350 C: 7163 Min: 5 Act: 6 Avg: 5 Max: 12
    T: 6 (28797) P:93 I:400 C: 6267 Min: 4 Act: 5 Avg: 5 Max: 9
    T: 7 (28798) P:92 I:450 C: 5571 Min: 4 Act: 5 Avg: 5 Max: 9
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.108 MB perf.data (~4719 samples) ]

    [root@emilia ~]#

    This will create one extra thread per CPU:

    [root@emilia ~]# tuna -t cyclictest -CP
              thread                ctxt_switches
      pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
    28825 OTHER      0     0xff      2169          671 cyclictest
    28832 FIFO      93        6     52338            1 cyclictest
    28833 FIFO      92        7     46524            1 cyclictest
    28826 FIFO      99        0    209360            1 cyclictest
    28827 FIFO      98        1    139577            1 cyclictest
    28828 FIFO      97        2    104686            0 cyclictest
    28829 FIFO      96        3     83751            1 cyclictest
    28830 FIFO      95        4     69794            1 cyclictest
    28831 FIFO      94        5     59825            1 cyclictest
    [root@emilia ~]#

    So we should expect only samples for the above 9 threads when using the
    --dump-raw-trace|-D perf report switch to look at the column with the tid:

    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    629 28825
    110 28826
    491 28827
    308 28828
    198 28829
    621 28830
    225 28831
    203 28832
    89 28833
    [root@emilia ~]#

    So for workloads started by 'perf record' it seems to work. Now for
    existing workloads, just run cyclictest first, without 'perf record':

    [root@emilia ~]# tuna -t cyclictest -CP
              thread                ctxt_switches
      pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
    28859 OTHER      0     0xff       594          200 cyclictest
    28864 FIFO      95        4     16587            1 cyclictest
    28865 FIFO      94        5     14219            1 cyclictest
    28866 FIFO      93        6     12443            0 cyclictest
    28867 FIFO      92        7     11062            1 cyclictest
    28860 FIFO      99        0     49779            1 cyclictest
    28861 FIFO      98        1     33190            1 cyclictest
    28862 FIFO      97        2     24895            1 cyclictest
    28863 FIFO      96        3     19918            1 cyclictest
    [root@emilia ~]#

    and then later did:

    [root@emilia ~]# perf record --pid 28859 sleep 3
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.027 MB perf.data (~1195 samples) ]
    [root@emilia ~]#

    To collect 3 seconds' worth of samples for pid 28859 and its children:

    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    15 28859
    33 28860
    19 28861
    13 28862
    13 28863
    10 28864
    11 28865
    9 28866
    255 28867
    [root@emilia ~]#

    Works, last thing is to check if looking at just one of those threads also works:

    [root@emilia ~]# perf record --tid 28866 sleep 3
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.006 MB perf.data (~242 samples) ]
    [root@emilia ~]# perf report -D | grep RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort | uniq -c
    3 28866
    [root@emilia ~]#

    Works too.

    Reported-by: Jeff Moyer
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Jeff Moyer
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

31 Jan, 2011

1 commit

  • So that we don't have to pass it around to the several methods that
    need it, simplifying usage.

    There is one case where we don't have the thread/cpu map in advance:
    the parsing routines used by top, stat and record, where we have to
    wait until all options are parsed to know whether a cpu or thread list
    was passed, and only then create those maps.

    For that case consolidate the cpu and thread map creation via
    perf_evlist__create_maps() out of the code in top and record, while also
    providing a perf_evlist__set_maps() for cases where multiple evlists
    share maps or for when maps that represent CPU sockets, for instance,
    get crafted out of topology information or subsets of threads in a
    particular application are to be monitored, providing more granularity
    in specifying which cpus and threads to monitor.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

30 Jan, 2011

3 commits


24 Jan, 2011

1 commit

  • To untangle it from struct thread handling, that is tied to symbols, etc.

    Right now in the python bindings I'm working on I need just a subset of
    the util/ files, untangling it allows me to do that.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo