23 Oct, 2013
1 commit
-
Before this patch, looking at 'perf bench sched pipe' behavior over
'top' only told us that something related to perf is running:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19934 mingo 20 0 54836 1296 952 R 18.6 0.0 0:00.56 perf
19935 mingo 20 0 54836 384 36 S 18.6 0.0 0:00.56 perf

After the patch it's clearly visible what's going on:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19744 mingo 20 0 125m 3536 2644 R 68.2 0.0 0:01.12 sched-pipe
19745 mingo 20 0 125m 1172 276 R 68.2 0.0 0:01.12 sched-pipe

The benchmark-subsystem name is concatenated with the individual
testcase name.

Unfortunately 'perf top' does not show the reconfigured name, possibly
because it caches ->comm[] values and does not recognize changes to
them?

Also clean up a few bits in builtin-bench.c while at it and reorganize
the code and the output strings to be consistent.

Use iterators to access the various arrays. Rename 'suites' concept to
'benchmark collection' and the 'bench_suite' to 'benchmark/bench'. The
many repetitions of 'suite' made the code harder to read and understand.

The new output is:
comet:~/tip/tools/perf> ./perf bench
Usage:
perf bench [] []

# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
all: All benchmarks

comet:~/tip/tools/perf> ./perf bench sched
# List of available benchmarks for collection 'sched':
messaging: Benchmark for scheduling and IPC
pipe: Benchmark for pipe() between two processes
all: Test all scheduler benchmarks

comet:~/tip/tools/perf> ./perf bench mem
# List of available benchmarks for collection 'mem':
memcpy: Benchmark for memcpy()
memset: Benchmark for memset() tests
all: Test all memory benchmarks

comet:~/tip/tools/perf> ./perf bench numa
# List of available benchmarks for collection 'numa':
mem: Benchmark for NUMA workloads
all: Test all NUMA benchmarks

Individual benchmark modules were not touched.
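The renaming described above can be sketched roughly as follows. This is a minimal sketch, assuming Linux and assuming prctl(PR_SET_NAME) as the mechanism; the helpers build_comm_name(), set_comm_name() and get_comm_name() are hypothetical names used for illustration and are not taken from builtin-bench.c.

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: join "<collection>-<benchmark>", e.g. "sched-pipe". */
static int build_comm_name(char *buf, size_t len,
			   const char *collection, const char *bench)
{
	int n = snprintf(buf, len, "%s-%s", collection, bench);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: set the calling thread's name, which is what 'top'
 * shows in the COMMAND column. The kernel keeps at most 15 characters plus
 * the terminating NUL, so longer names are silently truncated.
 */
static int set_comm_name(const char *name)
{
	return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* Read the current name back; buf must hold at least 16 bytes. */
static int get_comm_name(char *buf)
{
	return prctl(PR_GET_NAME, (unsigned long)buf, 0UL, 0UL, 0UL);
}
```

Calling build_comm_name(name, sizeof(name), "sched", "pipe") followed by set_comm_name(name) would give the task the "sched-pipe" name seen in the 'top' output above.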
Signed-off-by: Ingo Molnar
Cc: David Ahern
Cc: Hitoshi Mitake
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/20131023123756.GA17871@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo
09 Oct, 2013
1 commit
-
Standardize all the feature flags based on the HAVE_{FEATURE}_SUPPORT naming convention:
HAVE_ARCH_X86_64_SUPPORT
HAVE_BACKTRACE_SUPPORT
HAVE_CPLUS_DEMANGLE_SUPPORT
HAVE_DWARF_SUPPORT
HAVE_ELF_GETPHDRNUM_SUPPORT
HAVE_GTK2_SUPPORT
HAVE_GTK_INFO_BAR_SUPPORT
HAVE_LIBAUDIT_SUPPORT
HAVE_LIBELF_MMAP_SUPPORT
HAVE_LIBELF_SUPPORT
HAVE_LIBNUMA_SUPPORT
HAVE_LIBUNWIND_SUPPORT
HAVE_ON_EXIT_SUPPORT
HAVE_PERF_REGS_SUPPORT
HAVE_SLANG_SUPPORT
HAVE_STRLCPY_SUPPORT

Cc: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Namhyung Kim
Cc: David Ahern
Cc: Jiri Olsa
Link: http://lkml.kernel.org/n/tip-u3zvqejddfZhtrbYbfhi3spa@git.kernel.org
Signed-off-by: Ingo Molnar
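For illustration, code guarded by one of these flags might look like the sketch below. HAVE_LIBNUMA_SUPPORT is taken from the list above; the assumption is that the build system passes it on the compiler command line (for example as -DHAVE_LIBNUMA_SUPPORT). The helper names numa_bench_available() and numa_bench_status() are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: report whether the NUMA benchmarks were compiled in.
 * The flag is expected to come from the build system, not from this file.
 */
static int numa_bench_available(void)
{
#ifdef HAVE_LIBNUMA_SUPPORT
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: a human-readable status line for the same check. */
static const char *numa_bench_status(void)
{
	return numa_bench_available() ? "numa: enabled"
				      : "numa: not compiled in";
}
```

With a single naming scheme, every optional feature can be tested with the same #ifdef HAVE_..._SUPPORT pattern.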
30 Jan, 2013
2 commits
-
Commit "perf: Add 'perf bench numa mem'..." added a NUMA performance
benchmark to perf. Make this optional and test for required
dependencies.

Signed-off-by: Peter Hurley
Acked-by: Ingo Molnar
Cc: Ingo Molnar
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1359337882-21821-1-git-send-email-peter@hurleysoftware.com
Signed-off-by: Arnaldo Carvalho de Melo
-
Add a suite of NUMA performance benchmarks.
The goal was to simulate the behavior and access patterns of real NUMA
workloads, via a wide range of parameters, so this tool goes well
beyond simple bzero() measurements that most NUMA micro-benchmarks use:

 - It processes the data and creates a chain of data dependencies,
   like a real workload would. Neither the compiler, nor the
   kernel (via KSM and other optimizations) nor the CPU can
   eliminate parts of the workload.

 - It randomizes the initial state and also randomizes the target
   addresses of the processing - it's not a simple forward scan
   of addresses.

 - It provides flexible options to set process, thread and memory
   relationship information: -G sets "global" memory shared between
   all test processes, -P sets "process" memory shared by all
   threads of a process and -T sets "thread" private memory.

 - There's a NUMA convergence monitoring and convergence latency
   measurement option via -c and -m.

 - Micro-sleeps and synchronization can be injected to provoke lock
   contention and scheduling, via the -u and -S options. This simulates
   IO and contention.

 - The -x option instructs the workload to 'perturb' itself artificially
   every N seconds, by moving to the first and last CPU of the system
   periodically. This way the stability of convergence equilibrium and
   the number of steps taken for the scheduler to reach equilibrium again
   can be measured.

 - The amount of work can be specified via the -l loop count, and/or
   via a -s seconds-timeout value.

 - CPU and node memory binding options, to test hard binding scenarios.
   THP can be turned on and off via madvise() calls.

 - Live reporting of convergence progress in an 'at a glance' output format.
   Printing of convergence and deconvergence events.

The 'perf bench numa mem -a' option will start an array of about 30
individual tests that will each output such measurements:

# Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
5x5-bw-thread, 20.276, secs, runtime-max/thread
5x5-bw-thread, 20.004, secs, runtime-min/thread
5x5-bw-thread, 20.155, secs, runtime-avg/thread
5x5-bw-thread, 0.671, %, spread-runtime/thread
5x5-bw-thread, 21.153, GB, data/thread
5x5-bw-thread, 528.818, GB, data-total
5x5-bw-thread, 0.959, nsecs, runtime/byte/thread
5x5-bw-thread, 1.043, GB/sec, thread-speed
5x5-bw-thread, 26.081, GB/sec, total-speed

See the help text and the code for more details.
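The idea of a data-dependency chain with non-sequential target addresses, as listed in the first two points above, can be sketched as follows. This is only an illustration under assumptions: dependent_walk() is a hypothetical function, the mixing constants are arbitrary, and this is not the access pattern that 'perf bench numa mem' actually implements.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk 'buf' for 'steps' iterations. Each index is derived from the value
 * computed in the previous step, so the loads form a dependency chain and
 * the addresses are not a simple forward scan. The result is returned so
 * the compiler cannot discard the work.
 */
static uint64_t dependent_walk(uint64_t *buf, size_t nr_words,
			       size_t steps, uint64_t seed)
{
	uint64_t val = seed;
	size_t i, idx;

	if (!nr_words)
		return 0;

	for (i = 0; i < steps; i++) {
		idx = (size_t)(val % nr_words);	/* depends on previous result */
		val = val * 6364136223846793005ULL + buf[idx] +
		      1442695040888963407ULL;
		buf[idx] = val;			/* modify the memory as well */
	}
	return val;
}
```

Because every step needs the value produced by the one before it, neither the compiler nor the CPU can skip or reorder parts of the walk away.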
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Cc: Steven Rostedt
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hugh Dickins
Signed-off-by: Ingo Molnar
25 Jan, 2013
1 commit
-
perf bench prints a header message for the bench suite before starting
the benchmark. However, if stdout is redirected to a file and the bench
suite forks child processes, this message (and possibly other debugging
messages too) will be repeated multiple times.

$ perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.100 [sec]
$ perf bench sched messaging > result.txt
$ wc -l result.txt
391

The file contained many "Running sched/messaging benchmark..." lines.
This happened because stdout becomes fully buffered when it is
redirected, and the forked child processes inherited the unflushed
buffer. Other lines are printed after reaping all those tasks.

So fix it by flushing stdout before starting bench suites.
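The effect can be reproduced with a small standalone sketch. It assumes a POSIX system; count_headers_after_fork() is a hypothetical helper, and a tmpfile() stream stands in for the redirected stdout because both are fully buffered.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write one header line to a fully buffered stream, fork 'nr_children'
 * children that exit immediately, and return how many header lines ended
 * up in the file. Returns -1 on error.
 */
static int count_headers_after_fork(int nr_children, int flush_first)
{
	FILE *out = tmpfile();	/* regular file: fully buffered */
	char line[128];
	int i, count = 0;

	if (!out)
		return -1;

	fprintf(out, "# Running sched/messaging benchmark...\n");
	if (flush_first)
		fflush(out);	/* empty the buffer before it is duplicated */

	for (i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			fclose(out);
			return -1;
		}
		if (pid == 0)
			exit(0);	/* exit() flushes the child's buffer copy */
		waitpid(pid, NULL, 0);
	}

	fflush(out);
	rewind(out);
	while (fgets(line, sizeof(line), out))
		if (!strncmp(line, "# Running", 9))
			count++;
	fclose(out);
	return count;
}
```

Without the early fflush(), each child writes its own copy of the pending header when it exits; with it, the header appears once.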
Signed-off-by: Namhyung Kim
Acked-by: Hitoshi Mitake
Cc: Hitoshi Mitake
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1357637966-8216-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
11 Sep, 2012
1 commit
-
perf defines both __used and __unused macros to use for marking
unused variables. The macro __used is defined to
__attribute__((__unused__)), which contradicts the kernel definition to
__attribute__((__used__)) for new gcc versions. On Android, __used is
also defined in system headers and this leads to warnings like:

warning: '__used__' attribute ignored

__unused is not defined in the kernel and is not a standard definition.
If __unused is included everywhere instead of __used, this leads to
conflicts with glibc headers, since glibc has variables with this name
in its headers.

The best approach is to use __maybe_unused, the definition used in the
kernel for __attribute__((unused)). In this way there is only one
definition in perf sources (instead of 2 definitions that point to the
same thing: __used and __unused) and it works on both Linux and Android.
This patch simply replaces all instances of __used and __unused with
__maybe_unused.

Signed-off-by: Irina Tirdea
Acked-by: Pekka Enberg
Cc: David Ahern
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
[ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
Signed-off-by: Arnaldo Carvalho de Melo
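A minimal sketch of how the annotation is used, assuming a GCC-compatible compiler. The definition matches the one the message above cites from the kernel; cmd_example() is a hypothetical function that only exists for this illustration.

```c
#include <stddef.h>

/* Guarded in case a system header already provides the macro. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/*
 * Hypothetical command handler: the signature requires 'argv' and 'prefix',
 * but this implementation ignores them. The annotation keeps
 * -Wunused-parameter quiet without changing the signature.
 */
static int cmd_example(int argc, const char **argv __maybe_unused,
		       const char *prefix __maybe_unused)
{
	return argc;
}
```

The attribute only tells the compiler that the parameter may be unused; it stays legal to use the parameter later.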
28 Jun, 2012
1 commit
-
The current perf-bench documentation has a couple of typos and even
lacks an entire description of the mem subsystem. Fix it.

Reported-by: Ingo Molnar
Signed-off-by: Namhyung Kim
Acked-by: Hitoshi Mitake
Cc: Hitoshi Mitake
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
25 Jan, 2012
1 commit
-
This simply clones the respective memcpy() implementation.
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/4F16D743020000780006D735@nat28.tlf.novell.com
Signed-off-by: Jan Beulich
Signed-off-by: Arnaldo Carvalho de Melo
18 May, 2010
1 commit
-
OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
tools is to set something that has an enum type, that is builtin
compatible with unsigned int.

Several string constifications were done to make OPT_STRING require a
const char * type.

Cc: Frédéric Weisbecker
Cc: Mike Galbraith
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Tom Zanussi
LKML-Reference:
Signed-off-by: Arnaldo Carvalho de Melo
14 Dec, 2009
1 commit
-
This patch adds a new "all" pseudo subsystem and an "all" pseudo
suite. These are for testing all suites of all subsystems, or
all suites of one subsystem.

(This patch also contains a few trivial comment fixes for
bench/* and output style fixes. I judged that there is no
need to split them into individual patches.)

Example of use:
| % ./perf bench sched all # Test all suites of sched subsystem
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.414 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 10.999 [sec]
|
| 10.999317 usecs/op
| 90914 ops/sec
|
| % ./perf bench all # Test all suites of all subsystems
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.420 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 11.741 [sec]
|
| 11.741346 usecs/op
| 85169 ops/sec
|
| # Running mem/memcpy benchmark...
| # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ...
|
| 808.407437 MB/Sec

Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
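One way to picture an "all" pseudo suite is a table-driven dispatcher in which the pseudo entry has no function of its own. The sketch below is an assumption-laden illustration, not the code from this patch: struct bench, run_bench(), the stub benchmarks and the nr_runs counter are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*bench_fn_t)(void);

struct bench {
	const char *name;
	bench_fn_t fn;		/* NULL marks the "all" pseudo entry */
};

static int nr_runs;		/* counts executed benchmarks, illustration only */

static int bench_messaging(void) { nr_runs++; return 0; }
static int bench_pipe(void)      { nr_runs++; return 0; }

static struct bench sched_benchmarks[] = {
	{ "messaging", bench_messaging },
	{ "pipe",      bench_pipe },
	{ "all",       NULL },
	{ NULL,        NULL },
};

/*
 * Run the benchmark called 'name', or every real benchmark in the table
 * when 'name' is "all". Returns the number of benchmarks run, or -1 if
 * nothing matched.
 */
static int run_bench(struct bench *table, const char *name)
{
	struct bench *b;
	int all = !strcmp(name, "all");
	int ran = 0;

	for (b = table; b->name; b++) {
		if (!b->fn)
			continue;	/* skip the pseudo entry itself */
		if (all || !strcmp(b->name, name)) {
			b->fn();
			ran++;
		}
	}
	return ran ? ran : -1;
}
```

Keeping "all" in the same table means it shows up in listings next to the real suites while the dispatcher treats it specially.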
19 Nov, 2009
1 commit
-
'perf bench mem memcpy' is a benchmark suite for measuring memcpy()
performance.

Example on an Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz:
| % perf bench mem memcpy -l 1GB
| # Running mem/memcpy benchmark...
| # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ...
|
| 726.216412 MB/Sec

Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Frederic Weisbecker
LKML-Reference:
[ v2: updated changelog, clarified history of builtin-bench.c ]
Signed-off-by: Ingo Molnar
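The kind of measurement reported above can be sketched as timing a single memcpy() and dividing the size by the elapsed time. This is a rough sketch under assumptions, not the code of the benchmark itself: memcpy_mb_per_sec() is a hypothetical helper, and wall-clock timing with gettimeofday() is assumed.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Copy 'len' bytes once and return the throughput in MB/sec, or -1.0 on error. */
static double memcpy_mb_per_sec(size_t len)
{
	struct timeval start, end;
	volatile char sink;
	double secs, mb;
	char *src, *dst;

	if (!len)
		return -1.0;

	src = malloc(len);
	dst = malloc(len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	/* Touch both buffers first so page faults are not part of the timing. */
	memset(src, 0x5a, len);
	memset(dst, 0, len);

	gettimeofday(&start, NULL);
	memcpy(dst, src, len);
	gettimeofday(&end, NULL);

	sink = dst[len - 1];	/* keep the copy from being optimized away */
	(void)sink;

	secs = (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_usec - start.tv_usec) / 1e6;
	if (secs <= 0.0)
		secs = 1e-9;	/* timer too coarse for this size; avoid dividing by zero */

	mb = (double)len / (1024.0 * 1024.0);
	free(src);
	free(dst);
	return mb / secs;
}
```

A single short copy is noisy, so a real measurement would use a larger size or repeat the copy several times.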
11 Nov, 2009
1 commit
-
This patch makes the output of perf bench more friendly.
The current style of output, keeping the user waiting
and printing everything suddenly when we finish,
may confuse users.

So I improved it:
| % perf bench sched messaging
| # Running sched/messaging benchmark...
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar
10 Nov, 2009
1 commit
-
This patch modifies builtin-bench.c for processing common
options. The first option added is "--format".
Users of perf bench will be able to specify the output style with
--format.

Usage example:

% ./perf bench sched messaging # with no style specified
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

Total time:1.431 sec

% ./perf bench --format=simple sched messaging # specified simple
1.431

Signed-off-by: Hitoshi Mitake
Cc: Peter Zijlstra
Cc: Paul Mackerras
LKML-Reference:
Signed-off-by: Ingo Molnar
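A common option such as --format could be handled along the lines of the sketch below. The names enum bench_format, parse_format() and format_total_time() are hypothetical, and accepting the string "default" is an assumption; only "simple" appears in the usage example above.

```c
#include <stdio.h>
#include <string.h>

enum bench_format {
	BENCH_FORMAT_DEFAULT,
	BENCH_FORMAT_SIMPLE,
	BENCH_FORMAT_UNKNOWN,
};

/* Map the --format argument to a style; NULL means the option was not given. */
static enum bench_format parse_format(const char *str)
{
	if (!str || !strcmp(str, "default"))
		return BENCH_FORMAT_DEFAULT;
	if (!strcmp(str, "simple"))
		return BENCH_FORMAT_SIMPLE;
	return BENCH_FORMAT_UNKNOWN;
}

/* Render the elapsed time in the requested style; returns -1 for unknown styles. */
static int format_total_time(char *buf, size_t len,
			     enum bench_format fmt, double secs)
{
	switch (fmt) {
	case BENCH_FORMAT_DEFAULT:
		return snprintf(buf, len, "Total time:%.3f sec", secs);
	case BENCH_FORMAT_SIMPLE:
		return snprintf(buf, len, "%.3f", secs);
	default:
		return -1;
	}
}
```

Parsing the option once in the common code lets every benchmark print its result through the same style switch.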
08 Nov, 2009
1 commit
-
This patch adds builtin-bench.c
builtin-bench.c is a general framework for benchmark suites.

Signed-off-by: Hitoshi Mitake
Cc: Rusty Russell
Cc: Peter Zijlstra
Cc: Mike Galbraith
Cc: Arnaldo Carvalho de Melo
Cc: fweisbec@gmail.com
Cc: Jiri Kosina
LKML-Reference:
Signed-off-by: Ingo Molnar