20 Dec, 2011

1 commit

  • The problem is that when SAMPLE_PERIOD is not set, the kernel generates
    a number of samples in proportion to an event's period. The number of
    these samples may be too large, and the kernel throttles all samples
    above a defined limit.

    E.g.: I want to trace when a process sleeps. I created a process which
    sleeps for 1ms and for 4ms. perf got 100 events in both cases.

    swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns]
    swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns]

    In the first case the kernel wants to send 4499585 events and in the
    second case it wants to send 1386750 events. perf report shows that the
    process sleeps for an equal time in both places.

    Instead, we can get a single sample with a period attribute. As a
    result, less data is transferred between the kernel and user space, and
    we avoid throttling of samples.

    The patch "events: Don't divide events if it has field period" added the
    kernel part of this functionality.
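
A minimal sketch of how a consumer could account for such samples, assuming a hypothetical `struct sample` with a `period` field (perf's real data structures differ): the tool sums each sample's period instead of counting samples, so one sample with period 4499585 weighs as much as 4499585 unit samples would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sample record; perf's real structures differ. */
struct sample {
	uint64_t period;	/* weight carried by this single sample */
};

/* Sum the weights of all samples instead of counting the samples. */
static uint64_t total_weight(const struct sample *samples, size_t nr)
{
	uint64_t sum = 0;
	size_t i;

	assert(samples != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		sum += samples[i].period;
	return sum;
}
```

With the two delays from the trace above, two samples are enough to represent the whole sleep time.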

    Acked-by: Arun Sharma
    Cc: Arun Sharma
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: devel@openvz.org
    Link: http://lkml.kernel.org/r/1324391565-1369947-1-git-send-email-avagin@openvz.org
    Signed-off-by: Andrew Vagin
    Signed-off-by: Arnaldo Carvalho de Melo

    Andrew Vagin
     

28 Nov, 2011

4 commits


13 Oct, 2011

1 commit

  • To do that we needed to stop using newtForm, as we don't want libnewt to
    catch the xterm resize signal.

    Remove some more newt calls and instead use the underlying libslang
    directly. In time tools/perf will use just libslang.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-h1824yjiru5n2ivz4bseizwj@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

08 Oct, 2011

1 commit

  • The goal of this patch is to include more information about the host
    environment in the perf.data file so it is more self-descriptive. Over
    time, profiles are captured on various machines and it becomes hard to
    track what was recorded, on what machine and when.

    This patch provides a way to solve this by extending the perf.data file
    with basic information about the host machine. To add those extensions,
    we leverage the feature bits capabilities of the perf.data format. The
    change is backward compatible with existing perf.data files.

    We define the following useful new extensions:
    - HEADER_HOSTNAME: the hostname
    - HEADER_OSRELEASE: the kernel release number
    - HEADER_ARCH: the hw architecture
    - HEADER_CPUDESC: generic CPU description
    - HEADER_NRCPUS: number of online/avail cpus
    - HEADER_CMDLINE: perf command line
    - HEADER_VERSION: perf version
    - HEADER_TOPOLOGY: cpu topology
    - HEADER_EVENT_DESC: full event description (attrs)
    - HEADER_CPUID: easy-to-parse low level CPU identification

    The small granularity for the entries is to make it easier to extend
    without breaking backward compatibility. Many entries are provided as
    ASCII strings.
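
A minimal sketch of the feature-bit idea, under stated assumptions: the `FEAT_*` numbers, the bitmap capacity and the helper names are illustrative and do not reflect the actual perf.data layout. A writer sets one bit per section it emits; a reader that finds a bit unset (or does not know the bit at all) simply skips that section, which is what keeps older files readable.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Illustrative feature numbers; the real perf.data values may differ. */
enum feat {
	FEAT_HOSTNAME,
	FEAT_OSRELEASE,
	FEAT_ARCH,
	FEAT_MAX = 256		/* assumed bitmap capacity */
};

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define FEAT_WORDS	((FEAT_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct feat_bitmap {
	unsigned long bits[FEAT_WORDS];
};

/* Mark a feature section as present in the file. */
static void feat_set(struct feat_bitmap *map, unsigned int feat)
{
	assert(feat < FEAT_MAX);
	map->bits[feat / BITS_PER_WORD] |= 1UL << (feat % BITS_PER_WORD);
}

/* Unknown or out-of-range features read as "not present". */
static bool feat_has(const struct feat_bitmap *map, unsigned int feat)
{
	if (feat >= FEAT_MAX)
		return false;
	return (map->bits[feat / BITS_PER_WORD] >> (feat % BITS_PER_WORD)) & 1UL;
}
```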

    Perf report/script have been modified to print the basic information as
    easy-to-parse ASCII strings. Extended information about CPU and NUMA
    topology may be requested with the -I option.

    Thanks to David Ahern for reviewing and testing the many versions of
    this patch.

    $ perf report --stdio
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # HEADER_CPU_TOPOLOGY info available, use -I to display
    # HEADER_NUMA_TOPOLOGY info available, use -I to display
    # ========
    #
    ...

    $ perf report --stdio -I
    # ========
    # captured on : Mon Sep 26 15:22:14 2011
    # hostname : quad
    # os release : 3.1.0-rc4-tip
    # perf version : 3.1.0-rc4
    # arch : x86_64
    # nrcpus online : 4
    # nrcpus avail : 4
    # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    # cpuid : GenuineIntel,6,15,11
    # total memory : 8105360 kB
    # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
    # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
    # sibling cores : 0-3
    # sibling threads : 0
    # sibling threads : 1
    # sibling threads : 2
    # sibling threads : 3
    # node0 meminfo : total = 8320608 kB, free = 7571024 kB
    # node0 cpu list : 0-3
    # ========
    #
    ...

    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
    Signed-off-by: Stephane Eranian
    [ committer notes: Use --show-info in the tools as was in the docs, rename
    perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
    conflict with f69b64f7 "perf: Support setting the disassembler style" ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
     

23 Jan, 2011

2 commits


12 Oct, 2010

1 commit

  • Changes:
    v4: Fix the cosmetic issue of redundant dot-ops
    v3: Change rmb() to use SYNC
    v2: Include mips unistd.h and define rmb()/cpu_relax() in tools/perf/perf.h

    Signed-off-by: Deng-Cheng Zhu
    Acked-by: Ralf Baechle
    Cc: David Daney
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Signed-off-by: Ingo Molnar

    Deng-Cheng Zhu
     

18 May, 2010

1 commit


19 Apr, 2010

1 commit


26 Mar, 2010

1 commit


12 Mar, 2010

1 commit


04 Mar, 2010

1 commit

  • The Thumb-2 instruction set does not provide an encoding
    for sub pc, r0, #95 as present in the rmb() definition used
    by perf. This results in compilation failure when using a
    compiler targeting an instruction set other than ARM.

    This patch redefines rmb() for ARM by casting the address
    of the kuser helper to a function pointer, therefore getting
    the compiler to take care of making the call.
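
A sketch of the technique described above. On ARM Linux the `__kuser_memory_barrier` helper sits at the fixed address 0xffff0fa0 in the kuser helper page; calling it through a function pointer leaves the choice of call instruction to the compiler. The non-ARM fallback and the `read_flag_then_barrier` helper are assumptions added only so the sketch builds elsewhere.

```c
#include <assert.h>

#if defined(__arm__) && defined(__linux__)
/*
 * Cast the helper's address to a function pointer and call it, so the
 * compiler emits a call that is valid in both ARM and Thumb-2 code.
 */
#define rmb()	((void (*)(void))0xffff0fa0)()
#else
/* Assumption for this sketch: use a compiler builtin on other targets. */
#define rmb()	__sync_synchronize()
#endif

/* Hypothetical helper: read a flag, then order later loads after it. */
static int read_flag_then_barrier(const volatile int *flag)
{
	int seen;

	assert(flag);
	seen = *flag;
	rmb();
	return seen;
}
```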

    Patch taken against tip/master.

    Signed-off-by: Will Deacon
    Cc: Russell King - ARM Linux
    Cc: Jamie Iles
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Will Deacon
     

11 Dec, 2009

1 commit

  • Add definitions of rmb() and cpu_relax() and include the ARM
    unistd.h header. The __kuser_memory_barrier helper in the helper
    page is used to provide the correct memory barrier depending on
    the CPU type.

    [ The rmb() will work on v6 and v7, segfault on v5. Dynamic
    detection to add v5 support will be added later. ]

    Signed-off-by: Jamie Iles
    Cc: Russell King
    Cc: Peter Zijlstra
    Cc: Mikael Pettersson
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jamie Iles
     

19 Nov, 2009

1 commit


26 Oct, 2009

1 commit


21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out of its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

04 Sep, 2009

1 commit


23 Jul, 2009

1 commit

  • …nel/git/peterz/linux-2.6-perf

    * 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits)
    perf_counter tools: Give perf top inherit option
    perf_counter tools: Fix vmlinux symbol generation breakage
    perf_counter: Detect debugfs location
    perf_counter: Add tracepoint support to perf list, perf stat
    perf symbol: C++ demangling
    perf: avoid structure size confusion by using a fixed size
    perf_counter: Fix throttle/unthrottle event logging
    perf_counter: Improve perf stat and perf record option parsing
    perf_counter: PERF_SAMPLE_ID and inherited counters
    perf_counter: Plug more stack leaks
    perf: Fix stack data leak
    perf_counter: Remove unused variables
    perf_counter: Make call graph option consistent
    perf_counter: Add perf record option to log addresses
    perf_counter: Log vfork as a fork event
    perf_counter: Synthesize VDSO mmap event
    perf_counter: Make sure we dont leak kernel memory to userspace
    perf_counter tools: Fix index boundary check
    perf_counter: Fix the tracepoint channel to perfcounters
    perf_counter, x86: Extend perf_counter Pentium M support
    ...

    Linus Torvalds
     

11 Jul, 2009

1 commit

  • …x/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf report: Add "Fractal" mode output - support callchains with relative overhead rate
    perf_counter tools: callchains: Manage the cumul hits on the fly
    perf report: Change default callchain parameters
    perf report: Use a modifiable string for default callchain options
    perf report: Warn on callchain output request from non-callchain file
    x86: atomic64: Inline atomic64_read() again
    x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
    x86: atomic64: Improve atomic64_xchg()
    x86: atomic64: Export APIs to modules
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
    x86: atomic64: Fix unclean type use in atomic64_xchg()
    x86: atomic64: Make atomic_read() type-safe
    x86: atomic64: Reduce size of functions
    x86: atomic64: Improve atomic64_add_return()
    x86: atomic64: Improve cmpxchg8b()
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
    x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
    perf report: Annotate variable initialization
    ...

    Linus Torvalds
     

10 Jul, 2009

1 commit

  • Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
    enable/disable both its counters. We use this for the
    global enable/disable, and clear all config bits (except EN)
    to disable individual counters.

    Actual ia32 hardware doesn't support lfence, so use a locked
    op without side-effect to implement a full barrier.
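
A sketch of such a barrier, assuming GCC-style inline assembly: a locked add of zero to the top of the stack changes no data but still acts as a full memory barrier on x86. The macro name `full_barrier`, the `publish_then_read` helper and the non-x86 fallback are illustrative assumptions.

```c
#include <assert.h>

#if defined(__i386__)
/* Locked add of 0 to the top of the stack: no data change, full ordering. */
#define full_barrier() \
	__asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#elif defined(__x86_64__)
/* Same idea with the 64-bit stack pointer. */
#define full_barrier() \
	__asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#else
/* Assumption for this sketch: use a compiler builtin on other targets. */
#define full_barrier() __sync_synchronize()
#endif

/* Hypothetical helper: store a value, fence, then read it back. */
static int publish_then_read(volatile int *slot, int value)
{
	assert(slot);
	*slot = value;
	full_barrier();
	return *slot;
}
```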

    perf stat and perf record seem to function correctly.

    [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]

    Signed-off-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Vince Weaver
     

05 Jul, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (27 commits)
    parisc: use generic atomic64 on 32-bit
    parisc: superio: fix build breakage
    parisc: Fix PCI resource allocation on non-PAT SBA machines
    parisc: perf: wire up sys_perf_counter_open
    parisc: add task_pt_regs macro
    parisc: wire sys_perf_counter_open to sys_ni_syscall
    parisc: inventory.c, fix bloated stack frame
    parisc: processor.c, fix bloated stack frame
    parisc: fix compile warning in mm/init.c
    parisc: remove dead code from sys_parisc32.c
    parisc: wire up rt_tgsigqueueinfo
    parisc: ensure broadcast tlb purge runs single threaded
    parisc: fix "delay!" timer handling
    parisc: fix mismatched parenthesis in memcpy.c
    parisc: Fix gcc 4.4 warning in lba_pci.c
    parisc: add parameter to read_cr16()
    parisc: decode_exc.c should include kernel.h
    parisc: remove obsolete hw_interrupt_type
    parisc: fix irq compile bugs in arch/parisc/kernel/irq.c
    parisc: advertise PCI devs after "assign_resources"
    ...

    Manually fixed up trivial conflicts in tools/perf/perf.h due to addition
    of SH vs HPPA perf-counter support.

    Linus Torvalds
     

03 Jul, 2009

1 commit


02 Jul, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: LCDC dcache flush for deferred io
    sh: Fix compiler error and include the definition of IS_ERR_VALUE
    sh: re-add LCDC fbdev support to the Migo-R defconfig
    sh: fix se7724 ceu names
    sh: ms7724se: Enable sh_eth in defconfig.
    arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
    sh: ms7724se: Add sh_eth support
    nommu: provide follow_pfn().
    sh: Kill off unused DEBUG_BOOTMEM symbol.
    perf_counter tools: add cpu_relax()/rmb() definitions for sh.
    sh64: Hook up page fault events for software perf counters.
    sh: Hook up page fault events for software perf counters.
    sh: make set_perf_counter_pending() static inline.
    clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.

    Linus Torvalds
     

01 Jul, 2009

2 commits

  • Enable -Wextra. This found a few real bugs plus a number
    of signed/unsigned type mismatches/uncleanlinesses. It
    also required a few annotations.

    All things considered it was still worth it, so let's try with
    this enabled for now.

    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
    perf report: Add --symbols parameter
    perf report: Add --comms parameter
    perf report: Add --dsos parameter
    perf_counter tools: Adjust only prelinked symbol's addresses
    perf_counter: Provide a way to enable counters on exec
    perf_counter tools: Reduce perf stat measurement overhead/skew
    perf stat: Use percentages for scaling output
    perf_counter, x86: Update x86_pmu after WARN()
    perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
    perf stat: Improve output
    perf stat: Fix multi-run stats
    perf stat: Add -n/--null option to run without counters
    perf_counter tools: Remove dead code
    perf_counter: Complete counter swap
    perf report: Print sorted callchains per histogram entries
    perf_counter tools: Prepare a small callchain framework
    perf record: Fix unhandled io return value
    perf_counter tools: Add alias for 'l1d' and 'l1i'
    perf-report: Add bare minimum PERF_EVENT_READ parsing
    perf-report: Add modes for inherited stats and no-samples
    ...

    Linus Torvalds
     

26 Jun, 2009

2 commits

  • We plan to display the callchains depending on some user-configurable
    parameters.

    To gather the callchain stats from the recorded stream in a fast way,
    this patch introduces an ad hoc radix tree adapted for callchains and also
    an rbtree to sort these callchains once we have gathered all events
    from the stream.
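
A simplified sketch of the gathering step, not the patch's implementation: it uses a plain prefix tree with one node per frame (no path compression, so not a true radix tree) and leaves out the rbtree sorting pass. `struct chain_node`, `get_child` and `chain_add` are hypothetical names. Chains that share leading frames share nodes, so merging a chain costs one walk down the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type: one node per distinct call-path prefix. */
struct chain_node {
	uint64_t ip;			/* address of this frame */
	uint64_t hit;			/* chains ending exactly here */
	struct chain_node *child;	/* first child */
	struct chain_node *sibling;	/* next node with the same parent */
};

/* Find the child of 'parent' for 'ip', creating it on first use. */
static struct chain_node *get_child(struct chain_node *parent, uint64_t ip)
{
	struct chain_node *n;

	for (n = parent->child; n; n = n->sibling)
		if (n->ip == ip)
			return n;

	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->ip = ip;
	n->sibling = parent->child;
	parent->child = n;
	return n;
}

/* Merge one recorded callchain into the tree; returns 0, or -1 if out of memory. */
static int chain_add(struct chain_node *root, const uint64_t *ips, size_t nr)
{
	struct chain_node *n = root;
	size_t i;

	assert(root);
	for (i = 0; i < nr; i++) {
		n = get_child(n, ips[i]);
		if (!n)
			return -1;
	}
	n->hit++;
	return 0;
}
```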

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Create a structured file format that includes the full
    perf_counter_attr and all its relevant counter IDs so that
    the reporting program has full information.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Jun, 2009

1 commit


22 Jun, 2009

1 commit


20 Jun, 2009

1 commit

  • On 64-bit powerpc, __u64 is defined to be unsigned long rather than
    unsigned long long. This causes compiler warnings every time we
    print a __u64 value with %Lx.

    Rather than changing __u64, we define our own u64 to be unsigned long
    long on all architectures, and similarly s64 as signed long long.
    For consistency we also define u32, s32, u16, s16, u8 and s8. These
    definitions are put in a new header, types.h, because these definitions
    are needed in util/string.h and util/symbol.h.

    The main change here is the mechanical change of __[us]{64,32,16,8}
    to remove the "__". The other changes are:

    * Create types.h
    * Include types.h in perf.h, util/string.h and util/symbol.h
    * Add types.h to the LIB_H definition in Makefile
    * Added (u64) casts in process_overflow_event() and print_sym_table()
    to kill two remaining warnings.
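
A sketch of what such definitions could look like; the exact contents of types.h are not shown in this message, so the layout here is an assumption, and `format_u64` is a hypothetical helper. Because `u64` is `unsigned long long` on every architecture, a single format specifier matches it everywhere (the standard `%llx` is used here in place of the `%Lx` spelling mentioned above).

```c
#include <assert.h>
#include <stdio.h>

/* Types that are 'long long' on every architecture, unlike __u64. */
typedef unsigned long long u64;
typedef signed long long   s64;

/* Format a u64 as hex; %llx matches the typedef on all targets. */
static int format_u64(char *buf, size_t len, u64 val)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%llx", val);
}
```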

    Signed-off-by: Paul Mackerras
    Acked-by: Peter Zijlstra
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     

19 Jun, 2009

1 commit


12 Jun, 2009

1 commit

  • Provide for means of extending the perf_counter_attr in a 'natural' way.

    We allow growing the structure by appending fields at the end by specifying
    the full structure size inside it.

    When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
    When an old kernel sees a larger (new) structure, it will verify the tail
    consists of 0s, otherwise fail.

    If we fail due to a size mismatch, we return -E2BIG and write the kernel's
    native attribute size back into the provided structure.

    Furthermore, add some attribute verification, so that we'll fail counter
    creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
    the __reserved fields).

    (This ABI detail is introduced while keeping the existing syscall ABI.)
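
A user-space sketch of the size handshake described above, not the kernel's code: `struct demo_attr` and `copy_attr` are hypothetical, and rejecting inputs too small to hold the size field with -EINVAL is an assumption of this sketch. Older (smaller) input is zero-padded, newer (larger) input is accepted only when its unknown tail is all zeros, and on mismatch the known size is written back.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute layout; the first field holds the caller's size. */
struct demo_attr {
	uint32_t size;
	uint32_t type;
	uint64_t config;
};

/*
 * Copy a caller-supplied attribute of 'usize' bytes into 'kattr'.
 * Smaller (older) input is zero-padded; larger (newer) input is accepted
 * only if every byte beyond the known structure is zero.
 */
static int copy_attr(struct demo_attr *kattr, void *uattr, uint32_t usize)
{
	const unsigned char *bytes = uattr;
	uint32_t ksize = sizeof(*kattr);
	uint32_t i;

	assert(kattr && uattr);
	if (usize < sizeof(kattr->size))
		return -EINVAL;	/* assumption: too small to carry a size */

	for (i = ksize; i < usize; i++) {
		if (bytes[i]) {
			/* Report the size this side understands. */
			memcpy(uattr, &ksize, sizeof(ksize));
			return -E2BIG;
		}
	}

	memset(kattr, 0, sizeof(*kattr));
	memcpy(kattr, uattr, usize < ksize ? usize : ksize);
	return 0;
}
```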

    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Jun, 2009

1 commit