23 Oct, 2013

1 commit

  • Before this patch, looking at 'perf bench sched pipe' behavior over
    'top' only told us that something related to perf is running:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    19934 mingo 20 0 54836 1296 952 R 18.6 0.0 0:00.56 perf
    19935 mingo 20 0 54836 384 36 S 18.6 0.0 0:00.56 perf

    After the patch it's clearly visible what's going on:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    19744 mingo 20 0 125m 3536 2644 R 68.2 0.0 0:01.12 sched-pipe
    19745 mingo 20 0 125m 1172 276 R 68.2 0.0 0:01.12 sched-pipe

    The benchmark-subsystem name is concatenated with the individual
    testcase name.
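
    As a rough illustration of the renaming described above, the sketch
    below joins a collection name and a benchmark name with '-' and
    installs the result as the task name via prctl(PR_SET_NAME). The helper
    name set_bench_comm() is hypothetical and is not taken from the patch.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper (an assumption, not the patch's code): build
 * "<collection>-<benchmark>" and install it as the calling task's comm.
 * The kernel keeps at most 15 characters plus the terminating NUL.
 */
int set_bench_comm(const char *collection, const char *bench)
{
	char name[16];

	snprintf(name, sizeof(name), "%s-%s", collection, bench);
	return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}
```

    Calling set_bench_comm("sched", "pipe") would make tools that read the
    task's comm, such as 'top', show 'sched-pipe' instead of 'perf'.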

    Unfortunately 'perf top' does not show the reconfigured name, possibly
    because it caches ->comm[] values and does not recognize changes to
    them?

    Also clean up a few bits in builtin-bench.c while at it and reorganize
    the code and the output strings to be consistent.

    Use iterators to access the various arrays. Rename 'suites' concept to
    'benchmark collection' and the 'bench_suite' to 'benchmark/bench'. The
    many repetitions of 'suite' made the code harder to read and understand.
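
    A minimal sketch of what iterator-style access to sentinel-terminated
    tables can look like. The struct layout, the macro names and
    find_bench() are assumptions made for illustration; the table entries
    are copied from the listing shown below in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified tables; names and layout are assumptions. */
struct bench {
	const char *name;
	const char *summary;
};

struct collection {
	const char *name;
	const char *summary;
	const struct bench *benchmarks;
};

static const struct bench sched_benchmarks[] = {
	{ "messaging", "Benchmark for scheduling and IPC" },
	{ "pipe", "Benchmark for pipe() between two processes" },
	{ NULL, NULL }	/* sentinel: ends the array */
};

static const struct bench mem_benchmarks[] = {
	{ "memcpy", "Benchmark for memcpy()" },
	{ "memset", "Benchmark for memset() tests" },
	{ NULL, NULL }
};

static const struct collection collections[] = {
	{ "sched", "Scheduler and IPC benchmarks", sched_benchmarks },
	{ "mem", "Memory access benchmarks", mem_benchmarks },
	{ NULL, NULL, NULL }
};

/* Iterators hide the sentinel checks from the callers. */
#define for_each_collection(coll) \
	for (coll = collections; coll->name; coll++)

#define for_each_bench(coll, bench) \
	for (bench = coll->benchmarks; bench && bench->name; bench++)

/* Look up one benchmark by collection and name; NULL if not found. */
const struct bench *find_bench(const char *coll_name, const char *bench_name)
{
	const struct collection *coll;
	const struct bench *bench;

	for_each_collection(coll) {
		if (strcmp(coll->name, coll_name))
			continue;
		for_each_bench(coll, bench) {
			if (!strcmp(bench->name, bench_name))
				return bench;
		}
	}
	return NULL;
}
```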

    The new output is:

    comet:~/tip/tools/perf> ./perf bench
    Usage:
    perf bench [] []

    # List of all available benchmark collections:

    sched: Scheduler and IPC benchmarks
    mem: Memory access benchmarks
    numa: NUMA scheduling and MM benchmarks
    all: All benchmarks

    comet:~/tip/tools/perf> ./perf bench sched

    # List of available benchmarks for collection 'sched':

    messaging: Benchmark for scheduling and IPC
    pipe: Benchmark for pipe() between two processes
    all: Test all scheduler benchmarks

    comet:~/tip/tools/perf> ./perf bench mem

    # List of available benchmarks for collection 'mem':

    memcpy: Benchmark for memcpy()
    memset: Benchmark for memset() tests
    all: Test all memory benchmarks

    comet:~/tip/tools/perf> ./perf bench numa

    # List of available benchmarks for collection 'numa':

    mem: Benchmark for NUMA workloads
    all: Test all NUMA benchmarks

    Individual benchmark modules were not touched.

    Signed-off-by: Ingo Molnar
    Cc: David Ahern
    Cc: Hitoshi Mitake
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20131023123756.GA17871@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ingo Molnar
     

09 Oct, 2013

1 commit

  • Standardize all the feature flags based on the HAVE_{FEATURE}_SUPPORT naming convention:

    HAVE_ARCH_X86_64_SUPPORT
    HAVE_BACKTRACE_SUPPORT
    HAVE_CPLUS_DEMANGLE_SUPPORT
    HAVE_DWARF_SUPPORT
    HAVE_ELF_GETPHDRNUM_SUPPORT
    HAVE_GTK2_SUPPORT
    HAVE_GTK_INFO_BAR_SUPPORT
    HAVE_LIBAUDIT_SUPPORT
    HAVE_LIBELF_MMAP_SUPPORT
    HAVE_LIBELF_SUPPORT
    HAVE_LIBNUMA_SUPPORT
    HAVE_LIBUNWIND_SUPPORT
    HAVE_ON_EXIT_SUPPORT
    HAVE_PERF_REGS_SUPPORT
    HAVE_SLANG_SUPPORT
    HAVE_STRLCPY_SUPPORT

    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Link: http://lkml.kernel.org/n/tip-u3zvqejddfZhtrbYbfhi3spa@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

30 Jan, 2013

2 commits

  • Commit "perf: Add 'perf bench numa mem'..." added a NUMA performance
    benchmark to perf. Make this optional and test for required
    dependencies.

    Signed-off-by: Peter Hurley
    Acked-by: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1359337882-21821-1-git-send-email-peter@hurleysoftware.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Peter Hurley
     
  • Add a suite of NUMA performance benchmarks.

    The goal was to simulate the behavior and access patterns of real NUMA
    workloads, via a wide range of parameters, so this tool goes well
    beyond simple bzero() measurements that most NUMA micro-benchmarks use:

    - It processes the data and creates a chain of data dependencies,
    like a real workload would. Neither the compiler, nor the
    kernel (via KSM and other optimizations) nor the CPU can
    eliminate parts of the workload.

    - It randomizes the initial state and also randomizes the target
    addresses of the processing - it's not a simple forward scan
    of addresses.

    - It provides flexible options to set process, thread and memory
    relationship information: -G sets "global" memory shared between
    all test processes, -P sets "process" memory shared by all
    threads of a process and -T sets "thread" private memory.

    - There's a NUMA convergence monitoring and convergence latency
    measurement option via -c and -m.

    - Micro-sleeps and synchronization can be injected to provoke lock
    contention and scheduling, via the -u and -S options. This simulates
    IO and contention.

    - The -x option instructs the workload to 'perturb' itself artificially
    every N seconds, by moving to the first and last CPU of the system
    periodically. This way the stability of convergence equilibrium and
    the number of steps taken for the scheduler to reach equilibrium again
    can be measured.

    - The amount of work can be specified via the -l loop count, and/or
    via a -s seconds-timeout value.

    - CPU and node memory binding options, to test hard binding scenarios.
    THP can be turned on and off via madvise() calls.

    - Live reporting of convergence progress in an 'at a glance' output format.
    Printing of convergence and deconvergence events.
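
    The first two points above (a chain of data dependencies and
    randomized target addresses) can be sketched as follows. This is an
    illustration of the idea only, not the benchmark's actual processing
    loop; the function names and the xorshift generator are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Small xorshift generator; 'state' must be nonzero. */
static uint64_t next_rand(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/*
 * Hypothetical sketch: fill 'buf' with random data, then perform 'steps'
 * read-modify-write accesses. Each target index depends on the value
 * produced by the previous access, so no step can be skipped or reordered.
 * 'nr_words' must be nonzero.
 */
uint64_t dependent_walk(uint64_t *buf, size_t nr_words, size_t steps,
			uint64_t seed)
{
	uint64_t state = seed ? seed : 1;
	uint64_t val = 0;
	size_t i;

	for (i = 0; i < nr_words; i++)
		buf[i] = next_rand(&state);	/* randomized initial state */

	for (i = 0; i < steps; i++) {
		/* the address depends on earlier results: not a forward scan */
		size_t idx = (size_t)((val + next_rand(&state)) % nr_words);

		val += buf[idx];
		buf[idx] = val;			/* write back: memory is modified */
	}
	return val;
}
```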

    The 'perf bench numa mem -a' option will start an array of about 30
    individual tests that will each output such measurements:

    # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
    5x5-bw-thread, 20.276, secs, runtime-max/thread
    5x5-bw-thread, 20.004, secs, runtime-min/thread
    5x5-bw-thread, 20.155, secs, runtime-avg/thread
    5x5-bw-thread, 0.671, %, spread-runtime/thread
    5x5-bw-thread, 21.153, GB, data/thread
    5x5-bw-thread, 528.818, GB, data-total
    5x5-bw-thread, 0.959, nsecs, runtime/byte/thread
    5x5-bw-thread, 1.043, GB/sec, thread-speed
    5x5-bw-thread, 26.081, GB/sec, total-speed

    See the help text and the code for more details.
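
    The derived rows of the sample output appear consistent with simple
    ratios of the measured rows. The formulas below are inferred from the
    printed numbers, not quoted from the code, and they assume that 1 GB
    means 10^9 bytes; the struct and function names are hypothetical.

```c
/* Measured values, to be filled in from rows of the sample output. */
struct numa_sample {
	double runtime_max;	/* secs, runtime-max/thread */
	double runtime_min;	/* secs, runtime-min/thread */
	double data_per_thread;	/* GB, data/thread */
	double data_total;	/* GB, data-total */
};

/* Half of the max-min gap, relative to the slowest thread, in percent. */
double spread_pct(const struct numa_sample *s)
{
	return (s->runtime_max - s->runtime_min) / 2.0 / s->runtime_max * 100.0;
}

/* secs per GB equals nsecs per byte when 1 GB is 10^9 bytes. */
double nsecs_per_byte(const struct numa_sample *s)
{
	return s->runtime_max / s->data_per_thread;
}

double thread_speed(const struct numa_sample *s)	/* GB/sec */
{
	return s->data_per_thread / s->runtime_max;
}

double total_speed(const struct numa_sample *s)	/* GB/sec */
{
	return s->data_total / s->runtime_max;
}
```

    Fed with 20.276, 20.004, 21.153 and 528.818 from the sample, these
    ratios land within rounding of the printed spread-runtime/thread,
    runtime/byte/thread, thread-speed and total-speed rows.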

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Jan, 2013

1 commit

  • perf bench prints a header message for the bench suite before starting
    the benchmark. However, if stdout is redirected to a file and the bench
    suite forks child processes, this message (and possibly other debugging
    messages too) is repeated multiple times.

    $ perf bench sched messaging
    # Running sched/messaging benchmark...
    # 20 sender and receiver processes per group
    # 10 groups == 400 processes run

    Total time: 0.100 [sec]

    $ perf bench sched messaging > result.txt
    $ wc -l result.txt
    391

    The file contains many "Running sched/messaging benchmark..." lines.
    This is because redirection makes stdout fully buffered, and every
    forked child inherits the still-unflushed buffer and writes it out
    again. The other lines are printed after all those tasks have been
    reaped.

    So fix it by flushing stdout before starting bench suites.
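
    The effect and the fix can be reproduced with a small sketch. The
    function name and its parameters are illustrative and do not come from
    the patch; the stream is passed in so that the behavior can be checked
    against a fully-buffered file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: write a header to 'out', flush it, then fork
 * 'nr_children' children that exit immediately.
 * Returns 0 on success, -1 if fork() fails.
 */
int run_forked(FILE *out, int nr_children)
{
	int i, ret = 0;

	fprintf(out, "# Running sched/messaging benchmark...\n");

	/*
	 * Without this flush, every child would inherit the buffered header
	 * and write its own copy of it when it calls exit().
	 */
	fflush(out);

	for (i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			ret = -1;
			break;
		}
		if (pid == 0)
			exit(0);	/* exit() flushes inherited stdio buffers */
	}

	while (wait(NULL) > 0)
		;			/* reap all children */

	return ret;
}
```

    Removing the fflush() call and writing to a file instead of a terminal
    yields one header line per child plus one for the parent.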

    Signed-off-by: Namhyung Kim
    Acked-by: Hitoshi Mitake
    Cc: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1357637966-8216-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

11 Sep, 2012

1 commit

  • perf defines both the __used and __unused macros for marking unused
    variables. __used is defined as __attribute__((__unused__)), which
    contradicts the kernel definition, __attribute__((__used__)), for new
    gcc versions. On Android, __used is also defined in system headers and
    this leads to warnings like: warning: '__used__' attribute ignored

    __unused is not defined in the kernel and is not a standard definition.
    If __unused is used everywhere instead of __used, this leads to
    conflicts with glibc headers, since glibc has variables with this name
    in its headers.

    The best approach is to use __maybe_unused, the definition used in the
    kernel for __attribute__((unused)). In this way there is only one
    definition in perf sources (instead of 2 definitions that point to the
    same thing: __used and __unused) and it works on both Linux and Android.
    This patch simply replaces all instances of __used and __unused with
    __maybe_unused.
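
    A minimal sketch of the resulting usage. The macro definition mirrors
    the one described above; the benchmark entry point is hypothetical.

```c
/* Same meaning as the kernel's definition described above. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/*
 * Hypothetical benchmark entry point: 'argv' and 'prefix' are part of the
 * common signature but are not needed here, so they are marked to keep
 * -Wunused-parameter quiet.
 */
int bench_example(int argc, const char **argv __maybe_unused,
		  const char *prefix __maybe_unused)
{
	return argc;
}
```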

    Signed-off-by: Irina Tirdea
    Acked-by: Pekka Enberg
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
    [ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Irina Tirdea
     

28 Jun, 2012

1 commit

  • The current perf-bench documentation has a couple of typos and
    lacks any description of the mem subsystem. Fix it.

    Reported-by: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Acked-by: Hitoshi Mitake
    Cc: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

25 Jan, 2012

1 commit

  • This simply clones the respective memcpy() implementation.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/4F16D743020000780006D735@nat28.tlf.novell.com
    Signed-off-by: Jan Beulich
    Signed-off-by: Arnaldo Carvalho de Melo

    Jan Beulich
     

18 May, 2010

1 commit

  • OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these
    tools is to set something that has an enum type, that is builtin
    compatible with unsigned int.

    Several string constifications were done to make OPT_STRING require a
    const char * type.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     

14 Dec, 2009

1 commit

  • This patch adds a new "all" pseudo subsystem and an "all" pseudo
    suite. They run every suite of every subsystem, or every suite of
    one subsystem.

    (This patch also contains a few trivial comment fixes for
    bench/* and output style fixes. I judged that there was no
    need to split them into individual patches.)

    Example of use:

    | % ./perf bench sched all # Test all suites of sched subsystem
    | # Running sched/messaging benchmark...
    | # 20 sender and receiver processes per group
    | # 10 groups == 400 processes run
    |
    | Total time: 0.414 [sec]
    |
    | # Running sched/pipe benchmark...
    | # Extecuted 1000000 pipe operations between two tasks
    |
    | Total time: 10.999 [sec]
    |
    | 10.999317 usecs/op
    | 90914 ops/sec
    |
    | % ./perf bench all # Test all suites of all subsystems
    | # Running sched/messaging benchmark...
    | # 20 sender and receiver processes per group
    | # 10 groups == 400 processes run
    |
    | Total time: 0.420 [sec]
    |
    | # Running sched/pipe benchmark...
    | # Extecuted 1000000 pipe operations between two tasks
    |
    | Total time: 11.741 [sec]
    |
    | 11.741346 usecs/op
    | 85169 ops/sec
    |
    | # Running mem/memcpy benchmark...
    | # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ...
    |
    | 808.407437 MB/Sec
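
    One way an "all" pseudo suite can be dispatched is sketched below. The
    table layout, the stub benchmarks and run_sched() are assumptions for
    illustration, not the code added by this patch.

```c
#include <stddef.h>
#include <string.h>

typedef int (*bench_fn_t)(void);

struct suite {
	const char *name;
	bench_fn_t fn;		/* NULL marks the "all" pseudo suite */
};

/* Stub benchmarks standing in for the real ones. */
static int bench_messaging(void) { return 0; }
static int bench_pipe(void) { return 0; }

static const struct suite sched_suites[] = {
	{ "messaging", bench_messaging },
	{ "pipe", bench_pipe },
	{ "all", NULL },
	{ NULL, NULL }		/* sentinel */
};

/*
 * Run the named suite of the sched subsystem, or every real suite when
 * the name is "all". Returns the number of suites run, or -1 if the name
 * matched nothing.
 */
int run_sched(const char *name)
{
	int all = !strcmp(name, "all");
	int ran = 0;
	const struct suite *s;

	for (s = sched_suites; s->name; s++) {
		if (!s->fn)
			continue;	/* skip the pseudo entry itself */
		if (all || !strcmp(s->name, name)) {
			s->fn();
			ran++;
		}
	}
	return ran ? ran : -1;
}
```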

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

19 Nov, 2009

1 commit

  • 'perf bench mem memcpy' is a benchmark suite for measuring memcpy()
    performance.

    Example on an Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz:

    | % perf bench mem memcpy -l 1GB
    | # Running mem/memcpy benchmark...
    | # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ...
    |
    | 726.216412 MB/Sec
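
    A minimal sketch of such a throughput measurement. It times the copies
    with clock_gettime(CLOCK_MONOTONIC), which is an assumption and may
    differ from how the suite itself measures time; the function name and
    parameters are hypothetical, and 1 MB is taken as 2^20 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical sketch: copy 'len' bytes 'iterations' times and return the
 * throughput in MB/sec, or -1.0 on invalid arguments, allocation failure
 * or a failed copy check.
 */
double memcpy_mb_per_sec(size_t len, int iterations)
{
	struct timespec start, end;
	double secs;
	char *src, *dst;
	int i, ok;

	if (!len || iterations <= 0)
		return -1.0;

	src = malloc(len);
	dst = malloc(len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	/* Touch both buffers first so page faults are not timed. */
	memset(src, 0x5a, len);
	memset(dst, 0, len);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		memcpy(dst, src, len);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Read the result back so the copies have an observable effect. */
	ok = dst[len - 1] == 0x5a;
	free(src);
	free(dst);
	if (!ok)
		return -1.0;

	secs = (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	if (secs <= 0.0)
		secs = 1e-9;	/* clamp to the clock's resolution */

	return (double)len * iterations / (1024.0 * 1024.0) / secs;
}
```

    A call such as memcpy_mb_per_sec(1 << 20, 1024) copies 1 GB in total
    in 1 MB chunks.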

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    [ v2: updated changelog, clarified history of builtin-bench.c ]
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

11 Nov, 2009

1 commit

  • This patch makes the output of perf bench more friendly.
    The current style of output, which keeps the user waiting
    and then prints everything at once when the benchmark finishes,
    may confuse users.

    So I improved it:

    | % perf bench sched messaging
    | # Running sched/messaging benchmark...
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

10 Nov, 2009

1 commit

  • This patch modifies builtin-bench.c for processing common
    options. The first option added is "--format".
    Users of perf bench will be able to specify output style by
    --format.

    Usage example:

    % ./perf bench sched messaging # with no style specified
    (20 sender and receiver processes per group)
    (10 groups == 400 processes run)

    Total time:1.431 sec

    % ./perf bench --format=simple sched messaging # specified simple
    1.431
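
    A sketch of how one result can be rendered in both styles. The enum
    and function names are hypothetical; only the two output shapes follow
    the usage example above, with the simple style reduced to the bare
    number so that scripts can parse it.

```c
#include <stdio.h>

/* Hypothetical names for the two output styles. */
enum bench_format {
	BENCH_FORMAT_DEFAULT,
	BENCH_FORMAT_SIMPLE
};

/*
 * Render a total time in the requested style into 'buf'.
 * Returns the snprintf() result: the number of characters needed.
 */
int format_total_time(enum bench_format format, double secs,
		      char *buf, size_t len)
{
	switch (format) {
	case BENCH_FORMAT_SIMPLE:
		/* machine-friendly: just the number */
		return snprintf(buf, len, "%.3f", secs);
	case BENCH_FORMAT_DEFAULT:
	default:
		/* human-friendly: labelled value with unit */
		return snprintf(buf, len, "Total time:%.3f sec", secs);
	}
}
```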

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

08 Nov, 2009

1 commit