28 May, 2011

5 commits

  • We now just warn the user about the fact and go on providing just
    userspace samples.

    This fixes a problem when no vmlinux is explicitly passed by the user,
    thus symbol_conf.vmlinux_name is NULL, no suitable vmlinux is found, and
    then we get:

    aldebaran:~> perf top -p 7557
    [kernel.kallsyms] with build id 44d9a989eabbd79e486bc079d6b743d397c204e0
    not found, continuing without symbols
    The (null) file can't be used

    Reported-by: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-cj2g81hn64wv2bipmqk4fy2m@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Reported-by: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-cyl5zmi1nu35vyu7l5im2pyv@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-weqbs0tkk2u0qp1xxdxxosfg@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • perf_evsel__alloc_fd allocates an array of file descriptors with the
    memory initialized to 0. The array has dimensions for cpus and threads.

    Later, __perf_evsel__open calls sys_perf_event_open for each cpu and thread
    dimension. If the open fails for any of the cpus or threads then the fd's
    for this event are closed and the fd entry in the array is set to -1. Now,
    if the first attempt fails for the event (e.g., the event is not supported)
    the remaining dimensions (cpu > 0 and thread > 0) are not touched and left
    at the initialized value of 0.

    builtin-stat catches ENOENT and ENOSYS failures and allows the command to
    continue. The end result is that stat attempts to read from an fd of 0 which
    of course is stdin and so the command hangs until you type ctrl-D.

    Resolve by initializing the array to -1 since an fd < 0 is already
    handled.
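
    A minimal sketch of the idea, not the actual perf code: struct fd_table and
    both helpers below are hypothetical names used only for illustration. Every
    slot starts at -1, so a reader that skips fd < 0 never touches stdin.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-event fd array: ncpus x nthreads slots. */
struct fd_table {
	int ncpus;
	int nthreads;
	int *fd;	/* row-major: fd[cpu * nthreads + thread] */
};

/* Allocate the table and mark every slot "not open" (-1) instead of 0 (stdin). */
static int fd_table_alloc(struct fd_table *t, int ncpus, int nthreads)
{
	size_t n = (size_t)ncpus * (size_t)nthreads;
	size_t i;

	t->fd = malloc(n * sizeof(*t->fd));
	if (t->fd == NULL)
		return -1;
	t->ncpus = ncpus;
	t->nthreads = nthreads;
	for (i = 0; i < n; i++)
		t->fd[i] = -1;
	return 0;
}

/* Count the slots a reader may use: only descriptors >= 0 are considered open. */
static int fd_table_readable(const struct fd_table *t)
{
	int cpu, thread, count = 0;

	for (cpu = 0; cpu < t->ncpus; cpu++)
		for (thread = 0; thread < t->nthreads; thread++)
			if (t->fd[cpu * t->nthreads + thread] >= 0)
				count++;
	return count;
}
```

    With this initialization, a table whose first open failed reports no
    readable slots, rather than a set of zeroes that look like valid fds.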

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1306511914-8016-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     
  • Suggested-by: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-i1p8vrhq7xveyui6t1sc914e@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

26 May, 2011

3 commits

  • Where /usr/include/linux/const.h is not present, e.g. RHEL5.

    Reported-by: Srikar Dronamraju
    Cc: Srikar Dronamraju
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-ypcw2mu0w7dl1rrc6ncz3pee@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Perf uses /proc/modules to figure out where kernel modules are loaded.

    With the advent of kptr_restrict, non root users get zeroes for all module
    start addresses.

    So check if kptr_restrict is non zero and don't generate the synthetic
    PERF_RECORD_MMAP events for them.

    Warn the user about it in perf record and in perf report.

    In perf report the reference relocation symbol being zero means that
    kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't
    use it to fixup symbol addresses when using a valid kallsyms (in the buildid
    cache) or vmlinux (in the vmlinux path) build-id located automatically or
    specified by the user.

    Provide an explanation about it in 'perf report' if kernel samples were taken,
    checking if a suitable vmlinux or kallsyms was found/specified.

    A restricted /proc/kallsyms no longer goes into the buildid cache.
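
    The check described above could look roughly like the sketch below. The
    function names are hypothetical, and treating an unreadable file as "not
    restricted" is an assumption of this sketch, not something the commit
    states.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* A non-zero kptr_restrict value means kernel addresses are hidden. */
static bool kptr_value_is_restricted(const char *text)
{
	return strtol(text, NULL, 10) != 0;
}

/*
 * Read the sysctl file at 'path' (normally /proc/sys/kernel/kptr_restrict).
 * Assumption: a missing or unreadable file counts as "not restricted".
 */
static bool kptr_restricted(const char *path)
{
	char line[32];
	bool restricted = false;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return false;
	if (fgets(line, sizeof(line), fp) != NULL)
		restricted = kptr_value_is_restricted(line);
	fclose(fp);
	return restricted;
}
```

    A caller would skip the synthetic module events and print the warning
    whenever kptr_restricted("/proc/sys/kernel/kptr_restrict") returns true.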

    Example:

    [acme@emilia ~]$ perf record -F 100000 sleep 1

    WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check
    /proc/sys/kernel/kptr_restrict.

    Samples in kernel functions may not be resolved if a suitable vmlinux file is
    not found in the buildid cache or in the vmlinux path.

    Samples in kernel modules won't be resolved at all.

    If some relocation was applied (e.g. kexec) symbols may be misresolved even
    with a suitable vmlinux or kallsyms file.

    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ]
    [acme@emilia ~]$

    [acme@emilia ~]$ perf report --stdio
    Kernel address maps (/proc/{kallsyms,modules}) were restricted,
    check /proc/sys/kernel/kptr_restrict before running 'perf record'.

    If some relocation was applied (e.g. kexec) symbols may be misresolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. .....................
    #
    20.24% sleep [kernel.kallsyms] [k] page_fault
    20.04% sleep [kernel.kallsyms] [k] filemap_fault
    19.78% sleep [kernel.kallsyms] [k] __lru_cache_add
    19.69% sleep ld-2.12.so [.] memcpy
    14.71% sleep [kernel.kallsyms] [k] dput
    4.70% sleep [kernel.kallsyms] [k] flush_signal_handlers
    0.73% sleep [kernel.kallsyms] [k] perf_event_comm
    0.11% sleep [kernel.kallsyms] [k] native_write_msr_safe

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    This is because it found a suitable vmlinux (build-id checked) in
    /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long
    file name).

    If we remove that file from the vmlinux path:

    [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \
    /lib/modules/2.6.39-rc7+/build/vmlinux.OFF
    [acme@emilia ~]$ perf report --stdio
    [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562
    not found, continuing without symbols

    Kernel address maps (/proc/{kallsyms,modules}) were restricted, check
    /proc/sys/kernel/kptr_restrict before running 'perf record'.

    As no suitable kallsyms nor vmlinux was found, kernel samples can't be
    resolved.

    Samples in kernel modules can't be resolved as well.

    # Events: 13 cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... ................. ......
    #
    80.31% sleep [kernel.kallsyms] [k] 0xffffffff8103425a
    19.69% sleep ld-2.12.so [.] memcpy

    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    [acme@emilia ~]$

    Reported-by: Stephane Eranian
    Suggested-by: David Miller
    Cc: Dave Jones
    Cc: David Miller
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Jesper Juhl
    Cc: Tom Zanussi
    Cc: Arnaldo Carvalho de Melo
    Cc: trivial@kernel.org
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1105261011290.17400@swampdragon.chaosbits.net
    Signed-off-by: Ingo Molnar

    Jesper Juhl
     

24 May, 2011

4 commits

  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools: Fix sample type size calculation in 32 bits archs
    profile: Use vzalloc() rather than vmalloc() & memset()

    Linus Torvalds
     
  • The shift used here to count the number of bits set in
    the mask doesn't work above the low part for archs that
    are not 64 bits.

    Fix the constant used for the shift.
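
    As an illustration only: the sketch below assumes the bits are counted by
    testing each position against a shifted constant, and that the fix amounts
    to making that constant 64 bits wide everywhere. The function names are
    hypothetical.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit mask by testing each bit position. */
static int count_mask_bits(uint64_t mask)
{
	int bits = 0;
	int i;

	for (i = 0; i < 64; i++) {
		/* 1ULL is at least 64 bits wide on every arch; 1UL may be only 32. */
		if (mask & (1ULL << i))
			bits++;
	}
	return bits;
}

/* Assumption: each selected sample field occupies one 64-bit word. */
static int sample_size_bytes(uint64_t sample_type_mask)
{
	return count_mask_bits(sample_type_mask) * (int)sizeof(uint64_t);
}
```

    With a 32-bit constant, bits above position 31 would never be counted, so
    the computed sample size would be too small on 32-bit archs.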

    This fixes a 32-bit perf top failure reported by Eric Dumazet:

    Can't parse sample, err = -14
    Can't parse sample, err = -14
    ...

    Reported-and-tested-by: Eric Dumazet
    Signed-off-by: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian

    Frederic Weisbecker
     
  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools: Fix sample size bit operations
    perf tools: Fix ommitted mmap data update on remap
    watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh
    watchdog: Disable watchdog when thresh is zero
    watchdog: Only disable/enable watchdog if neccessary
    watchdog: Fix rounding bug in get_sample_period()
    perf tools: Propagate event parse error handling
    perf tools: Robustify dynamic sample content fetch
    perf tools: Pre-check sample size before parsing
    perf tools: Move evlist sample helpers to evlist area
    perf tools: Remove junk code in mmap size handling
    perf tools: Check we are able to read the event size on mmap

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    b43: fix comment typo reqest -> request
    Haavard Skinnemoen has left Atmel
    cris: typo in mach-fs Makefile
    Kconfig: fix copy/paste-ism for dell-wmi-aio driver
    doc: timers-howto: fix a typo ("unsgined")
    perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
    md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
    treewide: fix a few typos in comments
    regulator: change debug statement be consistent with the style of the rest
    Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
    audit: acquire creds selectively to reduce atomic op overhead
    rtlwifi: don't touch with treewide double semicolon removal
    treewide: cleanup continuations and remove logging message whitespace
    ath9k_hw: don't touch with treewide double semicolon removal
    include/linux/leds-regulator.h: fix syntax in example code
    tty: fix typo in descripton of tty_termios_encode_baud_rate
    xtensa: remove obsolete BKL kernel option from defconfig
    m68k: fix comment typo 'occcured'
    arch:Kconfig.locks Remove unused config option.
    treewide: remove extra semicolons
    ...

    Linus Torvalds
     

23 May, 2011

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-ktest:
    ktest: Allow options to be used by other options
    ktest: Create variables for the ktest config files
    ktest: Reboot after each patchcheck run
    ktest: Reboot to good kernel after every bisect run
    ktest: If test failed due to timeout, print that
    ktest: Fix post install command

    Linus Torvalds
     
  • What we want is to count the number of bits in the mask,
    not some other random operation written in the middle
    of the night.

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1306148788-6179-2-git-send-email-fweisbec@gmail.com
    [ Fixed perf_event__names[] alignment which was nearby and hurting my eyes ... ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Commit eac9eacee16 "perf tools: Check we are able to read the event
    size on mmap" brought a check to ensure we can read the size of the
    event before dereferencing it, and do a remap otherwise to move the
    buffer forward.

    However that remap was omitting all the necessary work to
    update the new page offset, head, and to unmap previous pages,
    etc...

    To fix this, gather all the code that fetches the event in a
    separate helper which does all the necessary checks about the
    header/event size and tells us anytime a remap is needed.
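
    A rough sketch of such a helper, under stated assumptions: struct ev_header,
    enum fetch_status and fetch_event() are hypothetical names, and the
    FETCH_BAD_EVENT case is an addition of this sketch. It only shows the two
    size checks (header first, then the whole event) that decide whether a
    remap is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event header; only the size field matters here. */
struct ev_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total event size in bytes, header included */
};

enum fetch_status { FETCH_OK, FETCH_NEED_REMAP, FETCH_BAD_EVENT };

/*
 * Look at the event starting at offset 'head' inside a mapped window of
 * 'mmap_size' bytes. The header is copied into 'hdr' once it is known to
 * be fully inside the window.
 */
static enum fetch_status fetch_event(const unsigned char *buf, size_t head,
				     size_t mmap_size, struct ev_header *hdr)
{
	/* First check: can the header itself be read? */
	if (head > mmap_size || mmap_size - head < sizeof(*hdr))
		return FETCH_NEED_REMAP;

	memcpy(hdr, buf + head, sizeof(*hdr));

	/* An event smaller than its own header cannot be valid. */
	if (hdr->size < sizeof(*hdr))
		return FETCH_BAD_EVENT;

	/* Second check: does the whole event fit in the window? */
	if (mmap_size - head < hdr->size)
		return FETCH_NEED_REMAP;

	return FETCH_OK;
}
```

    On FETCH_NEED_REMAP the caller would unmap the old pages, advance the page
    offset, reset head, and map the next window before calling the helper again.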

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/1306148788-6179-3-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

22 May, 2011

7 commits


21 May, 2011

3 commits

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
    There are cases where one ktest option may be used within another
    ktest option. Allow them to be reused just like config variables,
    but they are evaluated at test time, not at config processing time.

    Thus having something like:

    MAKE_CMD = make ARCH=${ARCH}

    TEST_START
    ARCH = powerpc

    TEST_START
    ARCH = arm

    Will have the arch defined for each test iteration.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • I found that I constantly reuse information for each test case.
    It would be nice to just define a variable to reuse.

    For example I may have:

    TEST_START
    [...]
    TEST = ssh root@mybox /path/to/my/script

    TEST_START
    [...]
    TEST = ssh root@mybox /path/to/my/script

    [etc]

    The issue is, I may want to change that script or one of the other
    fields. Then I need to update each line individually.

    With the addition of config variables (variables only used during parsing
    the config) we can simplify the config files. These variables can
    also be defined multiple times and each time the new value will
    overwrite the old value.

    The convention to use a config variable over a ktest option is to use :=
    instead of =.

    Now we could do:

    USER := root
    TARGET := mybox
    TEST_SCRIPT := /path/to/my/script
    TEST_CASE := ${USER}@${TARGET} ${TEST_SCRIPT}

    TEST_START
    [...]
    TEST = ${TEST_CASE}

    TEST_START
    [...]
    TEST = ${TEST_CASE}

    [etc]

    Now we just need to update the variables at the top.
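
    ktest itself is a Perl script; the C sketch below only illustrates the
    substitution rule described above and is not ktest code. All names in it
    are hypothetical, and leaving undefined ${NAME} references untouched is an
    assumption of the sketch.

```c
#include <string.h>

struct config_var {
	const char *name;
	const char *value;
};

/* Find 'name' (of length 'len') in the table; NULL when it is undefined. */
static const char *lookup_var(const struct config_var *vars, size_t nvars,
			      const char *name, size_t len)
{
	size_t i;

	/* Walk backwards so a later definition overrides an earlier one. */
	for (i = nvars; i > 0; i--) {
		const struct config_var *v = &vars[i - 1];

		if (strlen(v->name) == len && strncmp(v->name, name, len) == 0)
			return v->value;
	}
	return NULL;
}

/*
 * Copy 'in' to 'out', replacing each ${NAME} with its value.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int expand_vars(const struct config_var *vars, size_t nvars,
		       const char *in, char *out, size_t outsz)
{
	size_t used = 0;

	if (outsz == 0)
		return -1;

	while (*in != '\0') {
		const char *chunk = in;	/* default: copy one literal character */
		size_t chunk_len = 1;
		const char *end = NULL;
		const char *val = NULL;

		if (in[0] == '$' && in[1] == '{')
			end = strchr(in + 2, '}');
		if (end != NULL)
			val = lookup_var(vars, nvars, in + 2, (size_t)(end - (in + 2)));

		if (val != NULL) {
			chunk = val;
			chunk_len = strlen(val);
			in = end + 1;
		} else {
			in++;
		}

		if (used + chunk_len >= outsz)
			return -1;
		memcpy(out + used, chunk, chunk_len);
		used += chunk_len;
	}
	out[used] = '\0';
	return 0;
}
```

    Given USER, TARGET and TEST_SCRIPT as defined above, expanding the
    TEST_CASE line once at parse time yields a plain string that each TEST
    option can then reuse.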

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

20 May, 2011

7 commits

  • The patches being checked may not leave the kernel in a state
    that the next run will allow the new kernel to be copied to the
    machine. Reboot to a known good kernel before continuing to the
    next kernel to test.

    Added option PATCHCHECK_SLEEP_TIME for the max time to sleep between
    patchcheck reboots.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    Reboot after each bisect run regardless of whether the bisect passed
    or failed. The test may just be to boot the kernel and that kernel
    may not have a way to copy the next kernel to it. Reboot to a known
    good kernel after each bisect run.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    If the test failed due to a boot timeout, print a message saying
    so. Otherwise the user will be confused as to why their test just failed.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    The command to run post install (for those that want initrds) was
    broken. Instead of substituting the $KERNEL_VERSION variable, it was
    replacing the entire command with nothing.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
    Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
    net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
    batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
    batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
    batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
    net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
    net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
    net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
    perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
    perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
    net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
    security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
    net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
    net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
    net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
    ...

    Linus Torvalds
     
  • …kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
    sched: Fix and optimise calculation of the weight-inverse
    sched: Avoid going ahead if ->cpus_allowed is not changed
    sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
    sched: Remove unused parameters from sched_fork() and wake_up_new_task()
    sched: Shorten the construction of the span cpu mask of sched domain
    sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
    sched: Remove unused 'this_best_prio arg' from balance_tasks()
    sched: Remove noop in alloc_rt_sched_group()
    sched: Get rid of lock_depth
    sched: Remove obsolete comment from scheduler_tick()
    sched: Fix sched_domain iterations vs. RCU
    sched: Next buddy hint on sleep and preempt path
    sched: Make set_*_buddy() work on non-task entities
    sched: Remove need_migrate_task()
    sched: Move the second half of ttwu() to the remote cpu
    sched: Restructure ttwu() some more
    sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
    sched: Remove rq argument from ttwu_stat()
    sched: Remove rq->lock from the first half of ttwu()
    sched: Drop rq->lock from sched_exec()
    ...

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix rt_rq runtime leakage bug

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (107 commits)
    perf stat: Add more cache-miss percentage printouts
    perf stat: Add -d -d and -d -d -d options to show more CPU events
    ftrace/kbuild: Add recordmcount files to force full build
    ftrace: Add self-tests for multiple function trace users
    ftrace: Modify ftrace_set_filter/notrace to take ops
    ftrace: Allow dynamically allocated function tracers
    ftrace: Implement separate user function filtering
    ftrace: Free hash with call_rcu_sched()
    ftrace: Have global_ops store the functions that are to be traced
    ftrace: Add ops parameter to ftrace_startup/shutdown functions
    ftrace: Add enabled_functions file
    ftrace: Use counters to enable functions to trace
    ftrace: Separate hash allocation and assignment
    ftrace: Create a global_ops to hold the filter and notrace hashes
    ftrace: Use hash instead for FTRACE_FL_FILTER
    ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functions
    perf bench, x86: Add alternatives-asm.h wrapper
    x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit
    x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
    x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
    ...

    Linus Torvalds
     

19 May, 2011

3 commits

  • Print out the cache-miss percentage as well if the cache refs were
    collected, for all the generic cache event types.

    Before:

    11,103,723,230 dTLB-loads # 622.471 M/sec ( +- 0.30% )
    87,065,337 dTLB-load-misses # 4.881 M/sec ( +- 0.90% )

    After:

    11,353,713,242 dTLB-loads # 626.020 M/sec ( +- 0.35% )
    113,393,472 dTLB-load-misses # 1.00% of all dTLB cache hits ( +- 0.49% )

    Also highlight too-high percentages in ASCII color when it's executed on the console.
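
    The percentage in the "After" output is simply misses divided by
    references. A minimal sketch, with a hypothetical function name and an
    assumed policy of reporting 0 when no references were counted:

```c
/* Percentage of 'misses' relative to 'refs'; 0.0 when no refs were counted. */
static double miss_ratio_percent(double misses, double refs)
{
	if (refs <= 0.0)
		return 0.0;
	return misses / refs * 100.0;
}
```

    For the dTLB figures above, 113,393,472 misses over 11,353,713,242 loads
    gives roughly 1.00%.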

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-lkhwxsevdbd9a8nymx0vxc3y@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Print even more detailed statistics if requested via perf stat -d:

    -d: detailed events, L1 and LLC data cache
    -d -d: more detailed events, dTLB and iTLB events
    -d -d -d: very detailed events, adding prefetch events

    Full output looks like this now:

    Performance counter stats for '/home/mingo/hackbench 10' (5 runs):

    1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% )
    49,068 context-switches # 0.029 M/sec ( +- 16.66% )
    8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% )
    17,397 page-faults # 0.010 M/sec ( +- 0.46% )
    2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%]
    1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%]
    743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%]
    1,314,416,379 instructions # 0.56 insns per cycle
    # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%]
    272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%]
    3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%]
    449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%]
    22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%]
    6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%]
    1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%]
    411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%]
    27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%]
    464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%]
    10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%]
    1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%]
    117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%]
    4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%]
    1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%]

    0.195622057 seconds time elapsed ( +- 6.84% )

    Also clean up the attribute construction code to be appending, and factor
    it out into add_default_attributes().

    Tweak the coverage percentage printout a bit, so that it's easier to view it
    alongside the +- stddev column.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • perf bench needs this to build the kernel's memcpy routine:

    In file included from bench/mem-memcpy-x86-64-asm.S:2:0:
    bench/../../../arch/x86/lib/memcpy_64.S:7:33: fatal error: asm/alternative-asm.h: No such file or directory

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-c5d41xibgullk8h2280q4gv0@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

18 May, 2011

1 commit

  • This patch fixes an issue with event parsing.
    The following commit appears to have broken the
    ability to specify a comma separated list of events:

    commit ceb53fbf6dbb1df26d38379a262c6981fe73dd36
    Author: Ingo Molnar
    Date: Wed Apr 27 04:06:33 2011 +0200

    perf stat: Fail more clearly when an invalid modifier is specified

    This patch fixes this while preserving the desired effect:

    $ perf stat -e instructions:u,instructions:k ls /dev/null /dev/null

    Performance counter stats for 'ls /dev/null':

    365956 instructions:u # 0.00 insns per cycle
    731806 instructions:k # 0.00 insns per cycle

    0.001108862 seconds time elapsed

    $ perf stat -e task-clock-msecs true
    invalid event modifier: '-msecs'
    Run 'perf list' for a list of valid events and modifiers

    Signed-off-by: Stephane Eranian
    Cc: acme@redhat.com
    Cc: peterz@infradead.org
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/20110517133619.GA6999@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

16 May, 2011

1 commit


15 May, 2011

2 commits

  • The PERF_EVENT_IOC_SET_OUTPUT ioctl was returning -EINVAL when using
    --pid when monitoring multithreaded apps, as we can only share a ring
    buffer for events on the same thread if not doing per cpu.

    Fix it by using per thread ring buffers.
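
    The resulting rule, as observed in the test runs that follow, can be
    sketched like this. The function name and parameters are hypothetical and
    the real selection logic in perf is more involved.

```c
#include <stdbool.h>

/*
 * Decide how many ring buffers to map.
 * 'per_cpu' is true when profiling is tied to CPUs (a CPU list was given,
 * or system-wide mode); otherwise one buffer is needed per monitored thread.
 */
static int nr_ring_buffers(bool per_cpu, int nr_cpus, int nr_threads)
{
	return per_cpu ? nr_cpus : nr_threads;
}
```

    This matches the runs in this message: 11 buffers for --pid with 11
    threads, one for --pid with --cpu 1, two for -a on a two-core machine, and
    one for a single --tid.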

    Tested with:

    [root@felicio ~]# tuna -t 26131 -CP | nl
    1 thread ctxt_switches
    2 pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
    3 26131 OTHER 0 0,1 10814276 2397830 chromium-browse
    4 642 OTHER 0 0,1 14688 0 chromium-browse
    5 26148 OTHER 0 0,1 713602 115479 chromium-browse
    6 26149 OTHER 0 0,1 801958 2262 chromium-browse
    7 26150 OTHER 0 0,1 1271128 248 chromium-browse
    8 26151 OTHER 0 0,1 3 0 chromium-browse
    9 27049 OTHER 0 0,1 36796 9 chromium-browse
    10 618 OTHER 0 0,1 14711 0 chromium-browse
    11 661 OTHER 0 0,1 14593 0 chromium-browse
    12 29048 OTHER 0 0,1 28125 0 chromium-browse
    13 26143 OTHER 0 0,1 2202789 781 chromium-browse
    [root@felicio ~]#

    So 11 threads under pid 26131, then:

    [root@felicio ~]# perf record -F 50000 --pid 26131

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7fa4a2538000-7fa4a25b9000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    2 7fa4a25b9000-7fa4a263a000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    3 7fa4a263a000-7fa4a26bb000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    4 7fa4a26bb000-7fa4a273c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    5 7fa4a273c000-7fa4a27bd000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    6 7fa4a27bd000-7fa4a283e000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    7 7fa4a283e000-7fa4a28bf000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    8 7fa4a28bf000-7fa4a2940000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    9 7fa4a2940000-7fa4a29c1000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    10 7fa4a29c1000-7fa4a2a42000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    11 7fa4a2a42000-7fa4a2ac3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    11 mmaps, one per thread since we didn't specify any CPU list, so we need one
    mmap per thread and:

    [root@felicio ~]# perf record -F 50000 --pid 26131
    ^M
    ^C[ perf record: Woken up 79 times to write data ]
    [ perf record: Captured and wrote 20.614 MB perf.data (~900639 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 371310 26131
    2 96516 26148
    3 95694 26149
    4 95203 26150
    5 7291 26143
    6 87 27049
    7 76 661
    8 60 29048
    9 47 618
    10 43 642
    [root@felicio ~]#

    Ok, one of the threads, 26151 was quiescent, so no samples there, but all the
    others are there.

    Then, if I specify one CPU:

    [root@felicio ~]# perf record -F 50000 --pid 26131 --cpu 1
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.680 MB perf.data (~29730 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 8444 26131
    2 2584 26149
    3 2518 26148
    4 2324 26150
    5 123 26143
    6 9 661
    7 9 29048
    [root@felicio ~]#

    This machine has two cores, so fewer threads appeared on the radar, and:

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7f484b922000-7f484b9a3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    Just one mmap, as now we can use just one per-cpu buffer instead of the
    per-thread needed in the previous case.

    For global profiling:

    [root@felicio ~]# perf record -F 50000 -a
    ^C[ perf record: Woken up 26 times to write data ]
    [ perf record: Captured and wrote 7.128 MB perf.data (~311412 samples) ]

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7fb49b435000-7fb49b4b6000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    2 7fb49b4b6000-7fb49b537000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    It uses per-cpu buffers.

    For just one thread:

    [root@felicio ~]# perf record -F 50000 --tid 26148
    ^C[ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.330 MB perf.data (~14426 samples) ]

    [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
    1 9969 26148
    [root@felicio ~]#

    [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
    1 7f286a51b000-7f286a59c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
    [root@felicio ~]#

    Tested-by: David Ahern
    Tested-by: Lin Ming
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The perf_evlist__create_maps was discarding the --cpu parameter when a
    --pid or --tid was specified, fix that.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    Link: http://lkml.kernel.org/r/20110426204401.GB1746@ghostprotocols.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

12 May, 2011

1 commit